Can Games Fix What’s Wrong with Computer Security Education?

We had the pleasure of Zachary Peterson visiting UCL on a Cyber Security Fulbright Scholarship. The title is from his presentation given at our annual ACE-CSR event in November 2016.

Zachary Peterson is an associate professor of computer science at Cal Poly, San Luis Obispo. The key problem he is trying to solve is that the educational system is producing many fewer computer security professionals than are needed; an article he’d seen just two days before the ACE meeting noted a 73% rise in job vacancies in the last year despite a salary premium of 9% over other IT jobs. This information is backed up by the 2014 Taulbee survey, which found that the number of computer security PhDs has declined to 4% of the US total. Lack of diversity, which sees security dominated by white and Asian males, is a key contributing factor. Peterson believes that diversity is not only important as a matter of fairness, but essential because white males are increasingly a demographic minority in the US and because monocultures create perceptual blindness. New perspectives are especially needed in computer security as present approaches are not solving the problem on their own.

Peterson believes that the numbers are so bad because security is under-represented in both the computer science curriculum and in curriculum standards. The ACM 2013 curriculum guidelines recommend only three contact hours (also known as credit hours) in computer security in an entire undergraduate computer science degree. These are typically relegated to an upper-level elective class, and subject to a long chain of prerequisites, so they are only ever seen by a self-selected group who have survived years of attrition – which disproportionately affects women. The result is to create a limited number of specialists, unnecessarily constrain the student body, and limit the time students have to practice before joining the workforce. In addition, the self-selected group who do study security late in their academic careers have developed both set habits and their mind set before encountering an engineering task. Changing security into a core competency and teaching it as early as secondary school is essential but has challenges: security can be hard, and pushing it to the forefront may worsen existing problems seen in computer science more broadly, such as the solitary, anti-social, creativity-deficient image perception of the discipline.

Peterson believes games can help improve this situation. CTFTime, which tracks games events, reports a recent explosion in cyber security games to over 56 games events per year since 2013. These games, if done correctly, can teach core security skills in an entertaining – and social – way, with an element of competition. Strategic thinking, understanding an adversary’s motivation, rule interpretation, and rule-breaking are essential for both game-playing and security engineering.

Continue reading Can Games Fix What’s Wrong with Computer Security Education?

MAMADROID: Detecting Android Malware by Building Markov Chains of Behavioral Models

Now making up 85% of mobile devices, Android smartphones have become profitable targets for cybercriminals, allowing them to bypass two factor authentication or steal sensitive information such as credit cards details or login credentials.

Smartphones have limited battery and memory available, therefore, the defences that can be deployed on them have limitations. For these reasons, malware detection is usually performed in a centralised fashion by Android market operators. As previous work and even recent news have shown, however, even Google Play Store is not able to detect all malicious apps; to make things even worse, there are countries in which Google Play Store is blocked. This forces users to resort to third party markets, which are usually performing less careful malware checks.

Previous malware detection studies focused on models based on permissions or on specific API calls. While the first method is prone to false positives, the latter needs constant retraining, because apps as well as the Android framework itself are constantly changing.

Our intuition is that, while malicious and benign apps may call the same API calls during their execution, the reason why those calls are made may be different, resulting in them being called in a different order. For this reason, we decided to rely on sequences of calls that, as explained later, we abstract to higher level for performance, feasibility, and robustness reasons. To implement this idea we created MaMaDroid, a system for Android malware detection.

MaMaDroid

MaMaDroid is built by combining four different phases:

  • Call graph extraction: starting from the apk file of an app, we extract the call graph of the analysed sample.
  • Sequence extraction: from the call graph, we extract the different potential paths as sequences of API calls and abstract all those calls to higher levels.
  • Markov Chain modelling: all the samples got their sequences of abstracted calls, and these sequences can be modelled as transitions among states of a Markov Chain.
  • Classification: Given the probabilities of transition between states of the chains as features set, we apply machine learning to detect malicious apps.
Four phases of MaMaDroid

Call graph extraction

MaMaDroid is a system based only on static analysis. To analyse the app, we use off-the-shelf tools, such as Soot and FlowDroid for the first step of the system.

Sequence Extraction

Taking the call graph as input, we extract the sequences of functions potentially called by the program and, by identifying the set of entry nodes, enumerate all the possible paths and output them as sequences of API calls.

Example call graph in which we can observe 3 different potential paths, or sequences, starting from the root node

Continue reading MAMADROID: Detecting Android Malware by Building Markov Chains of Behavioral Models

Battery Status Not Included: Assessing Privacy in W3C Web Standards

Designing standards with privacy in mind should be a standard in itself. Historically this was not always the case and the idea of designing systems with privacy is relatively new – it dates from the beginning of this decade. One of the milestones is accepting this view on the IETF level, dating 2013.

This note and the referenced research work focusses on designing standards with privacy in mind. The World Wide Web Consortium (W3C) is one of the most important standardization bodies. W3C is standardizing something everyone – billions of people – use every day for entertainment or work, the web and how web browsers work. It employs a focused, lengthy, but very open and consensus-driven environment in order to make sure that community voice is always heard during the drafting of new specifications. The actual inner workings of W3C are surprisingly complex. One interesting observation is that the W3C Process Document document does not actually mention privacy reviews at all. Although in practice this kind of reviews are always the case.

However, as the web along with its complexity and web browser features constantly grow, and as browsers and the web are expected to be designed in an ever-rapid manner, considering privacy becomes a challenge. But this is what is needed in order to obtain a good specification and well-thought browser features, with good threat models, and well designed privacy strategies.

My recent work, together with the Princeton team, Arvind Narayanan and Steven Englehardt, is providing insight into the standardization process at W3C, specifically the privacy areas of standardization. We point out a number of issues and room for improvements. We also provide recommendations as well as broad evidence of a wide misuse and abuse (in tracking and fingerprinting) of a browser mechanism called Battery Status API.

Battery Status API is a browser feature that was meant to allow websites access the information concerning the battery state of a user device. This seemingly innocuous mechanism initially had no identified privacy concerns. However, following my previous research work in collaboration with Gunes Acar this view has changed (Leaking Battery and a later note).

The work I authored in 2015 identifies a number or privacy risks of the API. Specifically – the issue of potential exploitation of the API to tracking, as well as the potential possibility of recovering the raw value of the battery capacity – something that was not foreseen to be accessed by web sites. Ultimately, this work has led to a fix in Firefox browser and an update to W3C specification. But the matter was not over yet.

Continue reading Battery Status Not Included: Assessing Privacy in W3C Web Standards

What the CIA hack and leak teaches us about the bankruptcy of current “Cyber” doctrines

Wikileaks just published a trove of documents resulting from a hack of the CIA Engineering Development Group, the part of the spying agency that is in charge of developing hacking tools. The documents seem genuine and catalog, among other things, a number of exploits against widely deployed commodity devices and systems, including Android, iPhone, OS X and Windows. Also smart TVs. This hack, with appropriate background, teaches us a lesson or two about the direction of public policy related to “cyber” in the US and the UK.

Routine proliferation of weaponry and tactics

The CIA hack is in many ways extraordinary, in that it allowed the attackers to gain access to the source code of the hacking tools of the agency – an extraordinary act of proliferation of attack technologies. In other ways, it is mundane in that it is neither the first, nor probably the last hack or leak of catastrophic proportions to occur to a US/UK government department in charge of offensive cyber operations.

This list of leaks of government attack technologies, illustrates that when it comes to cyber-weaponry the risk of proliferation is not merely theoretical, but very real. In fact it seems to be happening all the time.

I find it particularly amusing – and those in charge of those agencies should probably find it embarrassing – that NSA and GCHQ go around presenting themselves as national technical authorities in assurance; they provide advice to others on how to not get hacked; they keep asserting that they can be trusted to operate extremely dangerous spying infrastructures; and handle in secret extremely dangerous zero-day exploits. Yet, they seem to be routinely hacked and have their secret documents leaked. Instead of chasing whistleblowers and journalists, policy makers should probably take note that there is not a high-enough level of assurance to secure cyber-weaponry, and for sure it is not to be found within those agencies.

In fact the risk of proliferation is at the very heart of cyber attack, and integral to it, even without hacking or leaking from inside government. Many of us quietly laughed at the bureaucratic nightmare discussed in the recent CIA leak, describing the difficulty of classifying the cyber attack techniques while at the same time deploying them on target system. As the press release summarizes:

To attack its targets, the CIA usually requires that its implants communicate with their control programs over the internet. If CIA implants, Command & Control and Listening Post software were classified, then CIA officers could be prosecuted or dismissed for violating rules that prohibit placing classified information onto the Internet. Consequently the CIA has secretly made most of its cyber spying/war code unclassified.

This illustrates very clearly a key dynamic in hacking: once a hacker uses an exploit against an adversary system, there is a very real risk the exploit is captured by monitoring and intrusion detection systems of the target, and then weponized to hack other computers, at a low cost. This is very well established and researched, and such “honey pot” infrastructures have been used in the academic and commercial community for some time to detect and study potentially new attacks. This is not the premise of sophisticated defenders, the explanation of how honeypots work is on Wikipedia! The Flame malware, and Stuxnet before, were in fact found in the wild.

In that respect cyber-war is not like war at all. The weapons you use will be turned against you immediately, and your effective use of weapons relies on your very own infrastructures being utterly vulnerable to them.

What “Cyber” doctrine?

The constant leaks and hacks, leading to proliferation of exploits and hacking tools from the heart of government, as well through operations, should deeply inform policy makers when making choices about “cyber” doctrines. First, it is probably time to ditch the awkward term “Cyber”.

Continue reading What the CIA hack and leak teaches us about the bankruptcy of current “Cyber” doctrines

Double-Fetch Situations Turn into Double-Fetch Vulnerabilities: A Study of Double Fetches in the Linux Kernel

We held our annual ACE-CSR event on 2 November 2016. This post and the title is drawn from the second talk by Dr Jens Krinke, a senior lecturer in the Centre for Research on Evolution, Search and Testing, and the Software Systems Engineering group at UCL, who outlined work he conducted with Pengfei Wang that found flaws in the Linux kernel.

The work Krinke presented, completed over the last year, revolves around double-fetch problems, which got their name in 2008 in a study that found exploitable vulnerabilities in the Windows kernel. In an updated study in 2013, Mateusz “j00ru” Jurczyk and Gynvael Coldwind produced a paper, which examined the Windows kernel and found 89 potential issues, 36 of which were highly exploitable. Most of the paper focused on how to exploit these bugs; today, if you find such a vulnerability you can download an exploit from Github.

Many of these vulnerabilities have now been fixed, but Jurczyk and Gynvael’s analysis was extremely slow, because they essentially simulated Windows over and over again, with the result that it took them 15 hours just to get to the boot screen. Because the analysis was dynamic, they were only able to cover a subset of the drivers present in the system, and they said outright that they knew there were other vulnerabilities present. Krinke was sure there were vulnerabilities of this type in Linux, but no one had audited the Linux kernel in detail.

Krinke and Wang wanted to audit Linux, looking at the source code directly and including all the drivers – which was essential, because, as it turned out, roughly 80% of the problems they found were in the drivers.

Double-fetch problems are memory access conflicts that arise in the interface between the operating system kernel and an application when data is changed between the time it’s checked and the time it’s actually used. It works this way: every process a user starts has its own virtual memory space that is isolated from other user memory spaces; only the kernel can see all memory spaces. However, there must be interaction, and this is done through system calls. So, a user selects some data and asks the system to operate on it. The data remains in the user space but the system copies it to the kernel and checks the data is valid before using it. The upshot is that the kernel may fetch the data twice: once for checking and once for use. A malicious application that intercedes between those two fetches and changes the data after it’s been checked but before it’s used can cause system crashes.

If an error like this arises outside of the operating system, it causes software errors, but this affects only you; when it happens in the operating system, an attacker can take over the complete system.

Analysing this process is extremely hard, especially because exactly what’s occurring is unknown beforehand. Krinke and Wang instead created a two-step analysis. First, a pattern-based double-fetch analysis based on Coccinelle examined the source code to find occurrences; these were then manually analysed to determine how the data was used and what the consequences were. The results were divided into three categories: size checking; type selection; and shallow copy. In the 40,000 source code files comprising Linux, they found 90 such situations, of which 57 were in drivers, five were true bugs. All of them have been confirmed and three are exploitable vulnerabilities. In checking back, Krinke and Wang found that some of these bugs are at least ten years old.

Krinke and Wang also applied their analysis also to Android (a variant of Linux) and FreeBSD. The latter, which is more security-oriented in design, had no problems. Android had only three bugs, but they also found a problem that had been fixed in Linux (but not in Android). After studying these three operating systems with Coccinelle, they can say that after these bugs have been fixed, that all three are likely to be free of double-fetch vulnerabilities. As the bug was in the file system, Android apps with native binary code that interacts with the phone at the file system level can also exploit it.

Diversity is our strength

On Friday evening, US President Donald Trump signed an executive order suspending visas to citizens of seven countries for at least 90 days. Among the many other implications of this ban — none of which we want to minimise with our focus on the implications for academics — this now implies that (1) students who are citizens (even dual citizens) of these countries are now unable to study in the US or attend conferences there, and (2) academics who are citizens of these countries and who legally work and live in the US are now unable to leave (to, say, attend conferences or visit another academic institution), as they would not be allowed back in.

We receive many inquiries each year from strong applicants from these seven countries, and according to a statement issued by many US-based academics, more than 3,000 Iranian students received PhDs from American universities in the past 3 years. Across our nine faculty members, we currently have funding available for numerous PhD students and postdoctoral researchers. If any student is stranded outside of the US, we of course hope that they are able to make it back quickly, but have funding for internships that would allow them to work from here in the interim. In organising conferences, we and our wider UCL colleagues are doing all we can to organise them in places without such bans in place, and where that is not possible to enable remote participation.

Most of all, as a group that prides itself on the quality and openness of its research and on its international reach, we would like to re-affirm our commitment to working with the best possible students and academics, regardless of their religion or their country of origin (or indeed anything aside from their scientific contributions). To quote a statement from the International Association of Cryptologic Research (IACR), “the open exchange of ideas requires freedom of movement.” To address the full effects of this ban we of course need far more international cooperation, but we hope that even our small actions can help mitigate the damage that has already been done to our friends and colleagues, both within and outside of the US, and that promises to continue to be done in the future.

Nicolas Courtois
Emiliano de Cristofaro
George Danezis
Jens Groth
Sarah Meiklejohn
Steven Murdoch
David Pym
Angela Sasse
Gianluca Stringhini

On the security and privacy of the ultrasound tracking ecosystem

In April 2016, the US Federal Trade Commission (FTC) sent warning letters to 12 Google Play app developers. The letters were addressed to those who incorporated the SilverPush framework in their apps, and reminded developers who used tracking software to explicitly inform their users (as seen in Section 5 of the Federal Trade Commission Act). The incident was covered by popular press and privacy concerns were raised. Shortly after, SilverPush claimed no active partnerships in the US and the buzz subsided.

Unfortunately, as the incident was resolved relatively fast, very few technical details of the technology were made public. To fill in this gap and understand the potential security implications, we conducted an in-depth study of the SilverPush framework and all the associated technologies.

The development of the framework was motivated by a fast-increasing interest of the marketing industry in products performing high-accuracy user tracking, and their derivative monetization schemes. This resulted in a high demand for cross-device tracking techniques with increased accuracy and reduced prerequisites.

The SilverPush framework fulfilled both of these requirements, as it provided a novel way to track users between their devices (e.g., TV, smartphone), without any user actions (e.g., login to a single platform from all their devices). To achieve that, the framework realized a previously unseen cross-device tracking technique (i.e., ultrasound cross-device tracking, uXDT) that offered high tracking accuracy, and came with various desirable features (e.g., easy to deploy, imperceptible by users). What differentiated that framework from existing ones was the use of high-frequency, inaudible ultrasonic beacons (uBeacons) as a medium/channel for identifier transmission between the user’s devices. This is also offered a major advantage to uXDT, against other competing technologies, as uBeacons can be emitted by most commercial speakers and captured by most commercial microphones. This eliminates the need for specialized emission and/or capturing equipment.

Aspects of a little-known ecosystem

The low deployment cost of the technology fueled the growth of a whole ecosystem of frameworks and applications that use uBeacons for various purposes, such as proximity marketing, audience analytics, and device pairing. The ecosystem is built around the near-ultrasonic transmission channel, and enables marketers to profile users.

Unfortunately, users are often given limited information on the ecosystem’s inner workings. This lack of transparency has been the target of great criticism from the users, the security community and the regulators. Moreover, our security analysis revealed a false assumption in the uBeacon threat model that can be exploited by state-level adversaries to launch complex attacks, including one that de-anonymizes the users of anonymity networks (e.g., Tor).

On top of these, a more fundamental shortcoming of the ecosystem is the violation of the least privilege principle, as a consequence of the access to the device’s microphone. More specifically, any app that wants to employ ultrasound-based mechanisms needs to gain full access to the device’s microphone, as there is currently no way to gain access only to the ultrasound spectrum. This clearly violates the least privilege principle, as the app has now access to all audible frequencies and allows a potentially malicious developer to request access to the microphone for ultrasound-pairing purposes, and then use it to eavesdrop the user’s discussions. This also results in any ultrasound-enabled apps to risk being perceived as “potentially malicious” by the users.

Mitigation

To address these shortcomings, we developed a set of countermeasures aiming to provide protection to the users in the short and medium term. The first one is an extension for the Google Chrome browser, which filters out all ultrasounds from the audio output of the websites loaded by the user. The extension actively prevents web pages from emitting inaudible sounds, and thus completely thwarts any unsolicited ultrasound-tracking attempts. Furthermore, we developed a patch for the Android permission system that allows finer-grained control over the audio channel, and forces applications to declare their intention to capture sound from the inaudible spectrum. This will properly separate the permissions for listening to audible sound and sound in the high-frequency spectrum, and will enable the end users to selectively filter the ultrasound frequencies out of the signal acquired by the smartphone microphone.

More importantly, we argue that the ultrasound ecosystem can be made secure only with the standardization of the ultrasound beacon format. During this process, the threat model will be revised and the necessary security features for uBeacons will be specified. Once this process is completed, APIs for handling uBeacons can be implemented in all major operating systems. Such an API would provide methods for uBeacon discovery, processing, generation and emission, similar to those found in the Bluetooth Low Energy APIs. Thereafter, all ultrasound-enabled apps will need access only to this API, and not to the device’s microphone. Thus, solving the problem of over-privileging that exposed the user’s sensitive data to third-party apps.

Discussion

Our work provides an early warning on the risks looming in the ultrasound ecosystem, and lays the foundations for the secure use of this set of technologies. However, it also raises several questions regarding the security of the audio channel. For instance, in a recent incident a journalist accidentally injected commands to several amazon echo devices, which then allegedly tried to order products online. This underlines the need for security features in the audio channel. Unfortunately, due to the variety of use cases, a universal solution that could be applied to the lower communication layers seems unlikely. Instead, solutions must be sought in the higher communications layers (e.g., application layers), and should be the outcome of careful threat modeling.

Inaugural Lecture: Zero-Knowledge Proofs

We held our annual ACE-CSR event in November 2016. The last talk was my inaugural lecture to full professor. I did not write the summary below myself, hence the use of third person, not because I now consider myself royalty! 🙂

In introducing Jens Groth, professor of cryptology, George Danezis, head of the information security group, commented that Groth’s work provided the only viable solutions to many of the hard privacy problems he himself was tackling. To most qualified engineers, he said, the concept of zero-knowledge proofs seems impossible: the idea is to show the properties of a secret without revealing them. A zero-knowledge proof could, for example, verify the result of a computation on some data without revealing the data itself. Most engineers believe that you must choose between integrity and confidentiality; Groth has proved this is not true. In addition, Danezis praised Groth’s work as highly creative, characterised by great mathematical depth and subtlety, and admired Groth’ willingness to speak his mind fearlessly even to government funders. Angela Sasse, head of the department, called Groth’s work “security tools we’re going to need for future generations”, and noted that simultaneously with these other accomplishments Groth helped put in place the foundation for the group as it is today.

Groth, jokingly opted to structure his talk around papers he’s had rejected to illustrate how hard it can be to publish innovative research. The concept of zero-knowledge proofs originated with a 1985 paper by Shafi Goldwasser, Silvio Micali, and Charles Rackoff. Zero-knowledge proofs have three characteristics: completeness (the prover can convince the verifier that the statement is true); soundness (the claiming prover cannot convince the verifier when the statement is false); and secrecy (no information other than the truth of the statement is leaked, even when the prover is interacting with a verifier who cheats). Groth illustrated the latter idea with a simple card trick: he asked an audience member to choose a card and then say whether the card was a heart or not. If the respondent shows all the cards that are not hearts, counting these proves that the selected card must be a heart without revealing what it is.

Zero-knowledge proofs can be extended to think about more complicated statements. Groth listed some examples:

  • Assert that a logical formula has an assignment to the variables that makes it true
  • Verify that a graph is Hamiltonian – that is, there is a path that touches each vertex exactly once
  • That a set of inputs into a Boolean circuit will produce an output of 1
  • Any statement of the general form U belongs to some NP-language L

Groth could see many possible applications for these proofs: signatures, encryption, electronic cash, electronic auctions, internet voting, multiparty computation, and verifiable cloud computing. His overall career has focused on building versatile and efficient proofs with the goal of moving them from being expensive and slow to being just a fraction of the cost of the task that’s being executed so that people would stop thinking about the cost and just toss them in as a standard part of any transaction.

Continue reading Inaugural Lecture: Zero-Knowledge Proofs

Security intrusions as mechanisms

The practice of security often revolves around figuring out what (malicious act) happened to a system. This historical inquiry is the focus of forensics, specifically when the inquiry regards a policy violation (such as a law). The results of forensic investigation might be used to fix the impacted system, attribute the attack to adversaries, or build more resilient systems going forwards. However, to execute any of these purposes, the investigator first must discover the mechanism of the intrusion.

As discussed at an ACE seminar last October, one common framework for this discovery task is the intrusion kill chain. Mechanisms, mechanistic explanation, and mechanism discovery have highly-developed meanings in the biological and social sciences, but the word is not often used in information security. In a recent paper, we argue that incident response and forensics investigators would be well-served to make use of the existing literature on mechanisms, as thinking about intrusion kill chains as mechanisms is a productive and useful way to frame the work.

To some extent, thinking mechanistically is a description of what (certain) scientists do. But the mechanisms literature within philosophy of science is not merely descriptive. The normative benefits extolled include that thinking mechanistically is an effective heuristic for searching out useful explanations; mechanisms provide the most coherent unity to complex fields of study; and that mechanistic explanation is necessary to guide selection among potential studies given limited experimental resources, experiment design decisions, and interpretation of statistical results. I previously argued that capricious use of biological metaphors is bad for information security. We are keenly aware that these benefits of mechanistic explanation need to apply to security as and for security, not merely because they work in other sciences.

Our paper demonstrates how we can cast the intrusion kill chain, the diamond model, and other models of security intrusions as mechanistic models. This casting begins to demonstrate the mosaic unity of information security. Campaigns are made up of attacks. Attacks, as modeled by the kill chain, have multiple steps. In a specific attack, the delivery step might be accomplished by a drive-by-download. So we demonstrate how drive-by-downloads are a mechanism, one among many possible delivery mechanisms. This description is a schema to be filled in during a particular drive-by download incident with a specific URL and specific javascript, etc. The mechanistic schema of the delivery mechanism informs the investigator because it indicates what types of network addresses to look for, and how to fit them into the explanation quickly. This process is what Lindley Darden calls schema instantiation in the mechanism discovery literature.

Our argument is not that good forensics investigators do not do such mechanism discovery strategies. Rather, it is precisely that good investigators do do them. But we need to describe what it is good investigators in fact do. We do not currently, and that lack makes teaching new investigators particularly difficult. Thinking about intrusions as mechanisms unlocks an expansive literature on good ways to do mechanism discovery. This literature will make it easier to codify what good investigators do, which among other benefits allows us to better teach sound methodological practices to incoming investigators.

Our paper on this topic was published in the open-access Journal of Cybersecurity, as Thinking about intrusion kill chains as mechanisms, by Jonathan M. Spring and Eric Hatleback.

Privacy analysis of the W3C Proximity Sensor specification

Mobile developers are familiar with proximity sensors. These provide information about the distance of an object from the device providing access to the sensor (i.e. smartphone, tablet, laptop, or another Web of Things device). This object is usually the user’s head or hand.

Proximity sensors provide binary values such as far (from the object) or near (respectively), or a more verbose readout, in centimeters.

There are many useful applications for proximity. For example, an application can turn off the screen if the user holds the device close to his/her head (face detection). It is also handy for avoiding the execution of undesired actions, possibly arising from scratching the head with the phone’s screen.

The Proximity Sensor API specification is being standardized by W3C. Every web site will be able to access this information. Let’s focus on the privacy engineering point of view.

Proximity is the distance between an object and device. The latest version of W3C Proximity Sensors API takes advantage of a soon-to-be standard Generic Sensors API. The sensor provides the proximity distance in centimeters.

The use of Proximity Sensors is simple; an example displaying the current distance in centimeters using the modern syntax of Generic Sensors API is as follows:

let sensor = new ProximitySensor();

sensor.start();

sensor.onchange = function(){console.log(event.reading.distance)};  

This syntax might later allow requesting various proximity sensors (if the device has more than one), including those based on the sensor’s position (such as “rear” or “front”), but the current implementations generally still use the legacy version from the previous edition of the spec:

window.addEventListener('deviceproximity', function(event) {  
console.log(event.value)  
});

From the perspective of privacy considerations, there is currently no significant difference between those two API versions.

Privacy analysis of Proximity Sensor API

Proximity sensor readout is not providing much data, so it may appear there are no privacy implications, but there are good reasons for performing such analysis. Let’s list two:

Firstly, designing new standards and systems with privacy in mind — privacy by design — is a required practice and a good idea. Secondly, in some circumstances even potentially insignificant mechanisms can still bring consequences from privacy point of view. For example, multiple identifiers might be helpful in deanonymizing users. Non-obvious data leaks can surface, as well.

So let’s discuss some of the possible issues.

The average distance from the device to the user’s face (i.e. object) could be used to differentiate and discriminate between users. While the severity is hard to establish, future changes (i.e. influencing with other external data) cannot be ruled out.

If proximity patterns would be individually-attributable, this would offer a possibility to enhance user profiling based on the analysis of device use patterns. For example, the following could be obtained and analyzed:

  • Frequency of the user’s patterns of device use (i.e. waving a hand in front of the proximity sensor)
  • Frequency of the user’s zooming in and out (i.e. device close to the user’s head)
  • Patterns of use. Does the way the user hold the device vary during the day? How?
  • What are the mechanics of holding the device close to the user’s head? Can the distance vary?

Some users may use the mobile operating system’s zoom capability to increase the font or images, but others might casually prefer to hold the device closer to their eyes. Such behavioral differences can also be of note.

Proximity sensors can provide the following data: the current proximity distance and max, the maximum distance readout supported by the device. In case the value of max would differ between various implementations (e.g. among browsers, devices), it could form an identifier. And the two (max, distance) combined could also be used as short-lived identifiers. Moreover, it is not so difficult to imagine a situation where those identifiers are actually not so short-lived.

Recommendations

First of all, is there a need to provide a verbose proximity readout at all? For example, is providing readouts of proximity (distance) value up to 150 cm necessary?

In general, a device (browser, a Web of Things device) should be capable of informing the user when a web site accesses proximity information. The user should also be able to inspect which web sites – and how frequently – accessed the API.

Finally, the sensors should provide an adequately verbose readout of the distance.

Proximity Sensors should also be subject to permissions.

Demonstration

At the moment, the implementations of proximity sensors are limited. But a demonstration is running on my SensorsPrivacy research project (tested against Firefox Mobile on Android). In my case (Nexus 5), Firefox offered data in centimeters, but the verbosity was limited to 5 centimeters on my device. Again, this is a matter of hardware and software. In principle, the accuracy could be much higher and one can imagine an accuracy a sensor providing more accurate readouts (even up to 1 meter).

The screen dump below shows how the example demonstrates the use of proximity data (and whether it works on a browser). The site changes its background color if an object is placed near the proximity sensor of a device.

The readout from the demonstration shows a relative time between events (first column), and the proximity distance (second column).

398  5  
1011 0  
404  5  
607  0  

The sensor on my device is reporting two values of distance: 0 (near) and 5 (far), in centimeters. In general, the granularity is subject to the following: standard, implementation and hardware.

And this suggests something, of course. So let’s complement the privacy analysis from the previous section to incorporate a timing analysis.

Even such a limited readout can help performing behavioural analysis, simply by profiling based on time series. As we can see, the sensor readout is quite sensitive in respect to time and can measure with sub-200 ms granularity. The demonstration proof of principle is also showing the minimum relative detected time between events. Feel free to test the performance of your fingers and/or hardware!

Summary

When designing a standard project or an implementation, paying attention to details is imperative. This includes consideration of even potential risks. This is especially the case if the software or systems will be used by millions or hundreds of millions of users, i.e. you are aiming for success.

Finally, standards and specifications are ideal for issuing guidance and good practices

 

This post originally appeared on Security, Privacy & Tech Inquiries, the blog of Lukasz Olejnik. An accompanying demonstration is available on SensorsPrivacy (a project studing privacy of web sensors).