A Critical Analysis of Genome Privacy Research

The relationship between genomics and privacy-enhancing technologies (PETs) has been an intense one for the better part of the last decade. Ever since Wang et al.’s paper, “Learning your identity and disease from research papers: Information leaks in genome wide association study”, received the PET Award in 2011, more and more research papers have appeared in leading conferences and journals. In fact, a new research community has steadily grown over the past few years, also thanks to several events, such as Dagstuhl Seminars, the iDash competition series, or the annual GenoPri workshop. As of December 2017, the community website genomeprivacy.org lists more than 200 scientific publications, and dozens of research groups and companies working on this topic.

dagstuhl
Participants of the 2015 Dagstuhl Seminar on Genome Privacy

Progress vs Privacy

The rise of genome privacy research does not come as a surprise to many. On the one hand, genomics has made tremendous progress over the past few years. Sequencing costs have dropped from millions of dollars to less than a thousand, which means that it will soon be possible to easily digitize the full genetic makeup of an individual and run complex genetic tests via computer algorithms. Also, researchers have been able to link more and more genetic features to predisposition of diseases (e.g., Alzheimer’s or diabetes), or to cure patients with rare genetic disorders. Overall, this progress is bringing us closer to a new era of “Precision Medicine”, where diagnosis and treatment can be tailored to individuals based on their genome and thus become cheaper and more effective. Ambitious initiatives, including in UK and in US, are already taking place with the goal of sequencing the genomes of millions of individuals in order to create bio-repositories and make them available for research purposes. At the same time, a private sector for direct-to-consumer genetic testing services is booming, with companies like 23andMe and AncestryDNA already having millions of customers.

23andme
Example of 23andme “Health Overview” test results. (Image from: https://www.singularityweblog.com/)

On the other hand, however, the very same progress also prompts serious ethical and privacy concerns. Genomic data contains highly sensitive information, such as predisposition to mental and physical diseases, as well as ethnic heritage. And it does not only contain information about the individual, but also about their relatives. Since many biological features are hereditary, access to genomic data of an individual essentially means access to that of close relatives as well. Moreover, genomic data is hard to anonymize: for instance, well-known results have demonstrated the feasibility of identifying people (down to their last name) who have participated in genetic research studies just by cross-referencing their genomic information with publicly available data.

Overall, there are a couple of privacy issues that are specific to genomic data, for instance its almost perpetual sensitivity. If someone gets ahold of your genome 30 years from now, that might be still as sensitive as today, e.g., for your children. Even if there may be no immediate risks from genomic data disclosure, things might change. New correlations between genetic features and phenotypical traits might be discovered, with potential effects on perceived suitability to certain jobs or on health insurance premiums. Or, in a nightmare scenario, racist and discriminatory ideologies might become more prominent and target certain groups of people based on their genetic ancestry.

Alt-right trolls are arguing over genetic tests they think “prove” their whiteness. (Image taken from Vice News)

Making Sense of PETs for Genome Privacy

Motivated by the need to reconcile privacy protection with progress in genomics, the research community has begun to experiment with the use of PETs for securely testing and studying the human genome. In our recent paper, Systematizing Genomic Privacy Research – A Critical Analysis, we take a step back. We set to evaluate research results using PETs in the context of genomics, introducing and executing a methodology to systematize work in the field, ultimately aiming to elicit the challenges and the obstacles that might hinder their real-life deployment.

Continue reading A Critical Analysis of Genome Privacy Research

Chainspace: A Sharded Smart Contracts Platform

Thanks to their resilience, integrity, and transparency properties, blockchains have gained much traction recently, with applications ranging from banking and energy sector to legal contracts and healthcare. Blockchains initially received attention as Bitcoin’s underlying technology. But for all its success as a popular cryptocurrency, Bitcoin suffers from scalability issues: with a current block size of 1MB and 10 minute inter-block interval, its throughput is capped at about 7 transactions per second, and a client that creates a transaction has to wait for about 10 minutes to confirm that it has been added to the blockchain. This is several orders of magnitude slower that what mainstream payment processing companies like Visa currently offer: transactions are confirmed within a few seconds, and have ahigh throughput of 2,000 transactions per second on average, peaking up to 56,000 transactions per second. A reparametrization of Bitcoin can somewhat assuage these issues, increasing throughput to to 27 transactions per second and 12 second latency. Smart contract platforms, such as Ethereum inherit those scalability limitations. More significant improvements, however, call for a fundamental redesign of the blockchain paradigm.

This week we published a pre-print of our new Chainspace system—a distributed ledger platform for high-integrity and transparent processing of transactions within a decentralized system. Chainspace uses smart contracts to offer extensibility, rather than catering to specific applications such as Bitcoin for a currency, or certificate transparency for certificate verification. Unlike Ethereum, Chainspace’s sharded architecture allows for a ledger linearly scalable since only the nodes concerned with the transaction have to process it. Our modest testbed of 60 cores achieves 350 transactions per second. In comparison, Bitcoin achieves a peak rate of less than 7 transactions per second for over 6k full nodes, and Ethereum currently processes 4 transactions per second (of a theoretical maximum of 25). Moreover, Chainspace is agnostic to the smart contract language, or identity infrastructure, and supports privacy features through modern zero-knowledge techniques. We have released the Chainspace whitepaper, and the code is available as an open-source project on GitHub.

System Overview

The figure above illustrates the system design of Chainspace. Chainspace is comprised of a network of infrastructure nodes that manage valid objects and ensure that only valid transactions on those objects are committed.  Let’s look at the data model of Chainspace first. An object represents a unit of data in the Chainspace system (e.g., a bank account), and is in one of the following three states: active (can be used by a transaction), locked (is being processed by an existing transaction), or inactive (was used by a previous transaction).  Objects also have a type that determines the unique identifier of the smart contract that defines them. Smart contract procedures can operate on active objects only, while inactive objects are retained just for the purposes of audit. Chainspace allows composition of smart contracts from different authors to provide ecosystem features. Each smart contract is associated with a checker to enable private processing of transactions on infrastructure nodes since checkers do not take any secret local parameters. Checkers are pure functions (i.e., deterministic, and have no side-effects) that return a boolean value.

Now, a valid transaction accepts active input objects along with other ancillary information, and generates output objects (e.g., transfers money to another bank account). To achieve high transaction throughput and low latency, Chainspace organizes nodes into shards that manage the state of objects, keep track of their validity, and record transactions aborted or committed. We implemented this using Sharded Byzantine Atomic Commit (S-BAC)—a protocol that composes existing Byzantine Fault Tolerant (BFT) agreement and atomic commit primitives in a novel way. Here is how the protocol works:

  • Intra-shard agreement. Within each shard, all honest nodes ensure that they consistently agree on accepting or rejecting a transaction.
  • Inter-shard agreement. Across shards, nodes must ensure that transactions are committed if all shards are willing to commit the transaction, and rejected (or aborted) if any shards decide to abort the transaction.

Consensus on committing (or aborting) transactions takes place in parallel across different shards. A nice property of S-BAC’s atomic commit protocol is that the entire shard—rather than a third party—acts as a coordinator. This is in contrast to other sharding-based systems with cryptocurrency application like OmniLedger or RSCoin where an untrusted client acts as the coordinator, and is incentivized to act honestly. Such incentives do not hold for a generalized platform like Chainspace where objects may have shared ownership.

Continue reading Chainspace: A Sharded Smart Contracts Platform

Creating scalable distributed ledgers for DECODE

Since the introduction of Bitcoin in 2008, blockchains have gone from a niche cryptographic novelty to a household name. Ethereum expanded the applicability of such technologies, beyond managing monetary value, to general computing with smart contracts. However, we have so far only scratched the surface of what can be done with such “Distributed Ledgers”.

The EU Horizon 2020 DECODE project aims to expand those technologies to support local economy initiatives, direct democracy, and decentralization of services, such as social networking, sharing economy, and discursive and participatory platforms. Today, these tend to be highly centralized in their architecture.

There is a fundamental contradiction between how modern services harness the work and resources of millions of users, and how they are technically implemented. The promise of the sharing economy is to coordinate people who want to provide resources with people who want to use them, for instance spare rooms in the case of Airbnb; rides in the case of Uber; spare couches of in the case of couchsurfing; and social interactions in the case of Facebook.

These services appear to be provided in a peer-to-peer, and disintermediated fashion. And, to some extent, they are less mediated at the application level thanks to their online nature. However, the technical underpinnings of those services are based on the extreme opposite design philosophy: all users technically mediate their interactions through a very centralized service, hosted on private data centres. The big internet service companies leverage their centralized position to extract value out of user or providers of services – becoming de facto monopolies in many case.

When it comes to privacy and security properties, those centralized services force users to trust them absolutely, and offer little on the way of transparency to even allow users to monitor the service practices to ground that trust. A recent example illustrating this problem was Uber, the ride sharing service, providing a different view to drivers and riders about the fare that was being paid for a ride – forcing drivers to compare what they receive with what riders pay to ensure they were getting a fair deal. Since Uber, like many other services, operate in a non-transparent manner, its functioning depends on users absolute to ensure fairness.

The lack of user control and transparency of modern online services goes beyond monetary and economic concerns. Recently, the Guardian has published the guidelines used by Facebook to moderate abusive or illegal user postings. While, moderation has a necessary social function, the exact boundaries of what constitutes abuse came into question: some forms of harms to children or holocaust denial were ignored, while material of artistic or political value has been suppressed.

Even more worryingly, the opaque algorithms being used to promote and propagate posts have been associated with creating a filter bubble effect, influencing elections, and dark adverts, only visible to particular users, are able to flout standards of fair political advertising. It is a fact of the 21st century that a key facet of the discursive process of democracy will take place on online social platforms. However, their centralized, opaque and advertising-driven form is incompatible with their function as a tool for democracy.

Finally, the revelations of Edward Snowden relating to mass surveillance, also illustrate how the technical centralization of services erodes privacy at an unprecedented scale. The NSA PRISM program coerced internet services to provide access to data on their services under a FISA warrant, not protecting the civil liberties of non-American persons. At the same time, the UPSTREAM program collected bulk information between data centres making all economic, social and political activities taking place on those services transparent to US authorities. While users struggle to understand how those services operate, governments (often foreign) have total visibility. This is a complete inversion of the principles of liberal democracy, where usually we would expect citizens to have their privacy protected, while those in position of authority and power are expected to be accountable.

The problems of accountability, transparency and privacy are social, but are also based on the fundamental centralized architecture underpinning those services. To address them, the DECODE project brings together technical, legal, social experts from academia, alongside partners from local government and industry. Together they are tasked to develop architectures that are compatible with the social values of transparency, user and community control, and privacy.

The role of UCL Computer Science, as a partner, is to provide technical options into two key technical areas: (1) the scalability of secure decentralized distributed ledgers that can support millions or billions of users while providing high-integrity and transparency to operations; (2) mechanisms for protecting user privacy despite the decentralized and transparent infrastructure. The latter may seem like an oxymoron: how can transparency and privacy be reconciled? However, thanks to advances in modern cryptography, it is possible to ensure that operations were correctly performed on a ledger, without divulging private user data – a family of techniques known as zero-knowledge.

I am particularly proud of the UCL team we have put together that is associated with this project, and strengthens considerably our existing expertise in distributed ledgers.

I will be leading and coordinating the work. I have a long standing interest, and track record, in privacy enhancing technologies and peer-to-peer computing, as well as scalable distributed ledgers – such as the RSCoin currency proposal. Shehar Bano, an expert on systems and networking, has joined us as a post-doctoral researcher after completing her thesis at Cambridge. Alberto Sonnino will be doing his thesis on distributed ledgers and privacy, as well as hardware and IoT applications related to ledgers, after completing his MSc in Information Security at UCL last year. Mustafa Al-Bassam, is also associated with the project and works on high-integrity and scalable ledger technologies, after completing his degree at Kings College London – he is funded by the Turing Institute to work on such technologies. Those join our wider team of UCL CS faculty, with research interests in distributed ledgers, including Sarah Meiklejohn, Nicolas Courtois and Tomaso Aste and their respective teams.

 

This post also appears on the DECODE project blog.

Top ten obstacles along distributed ledgers’ path to adoption

In January 2009, Bitcoin was released into the world by its pseudonymous founder, Satoshi Nakamoto. In the ensuing years, this cryptocurrency and its underlying technology, called the blockchain, have gone on a rollercoaster ride that few could have predicted at the time of its deployment. It’s been praised by governments around the world, and people have predicted that “the blockchain” will one day be like “the Internet.” It’s been banned by governments around the world, and people have declared it “adrift” and “dead.”

After years in which discussions focused entirely on Bitcoin, people began to realize the more abstract potential of the blockchain, and “next-generation” platforms such as Ethereum, Steem, and Zcash were launched. More established companies also realized the value in the more abstract properties of the blockchain — resilience, integrity, etc. — and repurposed it for their particular industries to create an even wider class of technologies called distributed ledgers, and to form industrial consortia such as R3 and Hyperledger. These more general distributed ledgers can look, to varying degrees, quite unlike blockchains, and have a somewhat clearer (or at least different) path to adoption given their association with established partners in industry.

Amidst many unknowns, what is increasingly clear is that, even if they might not end up quite like “the Internet,” distributed ledgers — in one form or another — are here to stay. Nevertheless, a long path remains from where we are now to widespread adoption and there are many important decisions to be made that will affect the security and usability of any final product. In what follows, we present the top ten obstacles along this path, and highlight in some cases both the problem and what we as a community can do (and have been doing) to address them. By necessity, many interesting aspects of distributed ledgers, both in terms of problems and solutions, have been omitted, and the focus is largely technical in nature.

10. Usability: why use distributed ledgers?

The problem, in short. What do end users actually want from distributed ledgers, if anything? In other words, distributed ledgers are being discussed as the solution to problems in many industries, but what is it that the full public verifiability (or accountability, immutability, etc.) of distributed ledgers really maps to in terms of what end users want?

9. Governance: who makes the rules?

The problem, in short. The beauty of distributed ledgers is that no one entity gets to control the decisions made by the network; in Bitcoin, e.g., coins are generated or transferred from one party to another only if a majority of the peers in the network agree on the validity of this action. While this process becomes threatened if any one peer becomes too powerful, there is a larger question looming over the operation of these decentralized networks: who gets to decide which actions are valid in the first place? The truth is that all these networks operate according to a defined set of rules, and that “who makes the rules matters at least as much as who enforces them.”

In this process of making the rules, even the most decentralized networks turn out to be heavily centralized, as recent issues in cryptocurrency governance demonstrate. These increasingly common collapses threaten to harm the value of these cryptocurrencies, and reveal the issues associated with ad-hoc forms of governance. Thus, the problem is not just that we don’t know how to govern these technologies, but that — somewhat ironically — we need more transparency around how these structures operate and who is responsible for which aspects of governance.

8. Meaningful comparisons: which is better?

The problem. Bitcoin was the first cryptocurrency to be based on the architecture we now refer to as the blockchain, but it certainly isn’t the last; there are now thousands of alternative cryptocurrencies out there, each with its own unique selling point. Ethereum offers a more expressive scripting language and maintains state, Litecoin allows for faster block creation than Bitcoin, and each new ICO (Initial Coin Offering) promises a shiny feature of its own. Looking beyond blockchains, there are numerous proposals for cryptocurrencies based on consensus protocols other than proof-of-work and proposals in non-currency-related settings, such as Certificate Transparency, R3 Corda, and Hyperledger Fabric, that still fit under the broad umbrella of distributed ledgers.

Continue reading Top ten obstacles along distributed ledgers’ path to adoption

Caveat emptor: Privacy could turn UK’s genomic dream into a nightmare

Raise your hand if, over the past couple of years, you have not heard of whole genome sequencing (usually abbreviated as WGS), or at least read a sensational headline or two about how fast its costs are dropping. In a nutshell, WGS is used to determine an organism’s complete DNA sequence. But it is actually not the only way to analyze our DNA — in fact, genetic testing has been used in clinical settings for decades, e.g., to diagnose patients with known genetic conditions. Seven-time Wimbledon champion Pete Sampras is a beta-thalassemia carrier – a condition that affects the formation of beta-globin chains, ultimately leading to red blood cells not being formed correctly. Testing for thalassemia, usually triggered by family history or a blood test showing low mean corpuscular volume, is done with a number of simple in-vitro techniques.

The availability of affordable whole genome sequencing not only prompts new hopes toward the discovery and diagnosis of rare/unknown genetic conditions, but also enables researchers to better understand the relationship between the genome and predisposition to diseases, response to treatment, etc. Overall, progress makes it increasingly feasible to envision a not-so-distant future where individuals will undergo sequencing once, making their digitized genome easily available for doctors, clinicians, and third-parties. This would also allow us to use computational algorithms to analyze the genome as a whole, as opposed to expensive, slower, targeted in-vitro tests.

Along these lines is last week’s announcement by Prof. Dame Sally Davies, UK’s Chief Medical Officer, calling the NHS to deliver her “genomic dream” within five years, with whole genome sequencing becoming “as standard as blood tests and biopsies.” As detailed in her annual report, a large number of patients in the UK already undergo genetic testing at least once in their life, and for a wide range of reasons, including the aforementioned thalassemia diagnosis, screening for cancer predisposition triggered by high family incidence, or determining the best course of action in cancer treatment. So wouldn’t it make sense to sequence the genome once and keep the data available for life? My answer is yes, but with a number of bold and double underlined caveats.

The first one is with respect to the security concerns prompted by the need to store data of extreme sensitivity like genomic data. The genome obviously contains information about ethnic heritage and predisposition to diseases/conditions, possibly including mental disorders. Data breaches of sensitive information, including health and medical data, sadly happen on a daily basis. But certain security threats are actually specific to genomic data and much more worrisome. For instance, due to its hereditary nature, access to a genome essentially implies access to that of close relatives as well, including offspring, so one’s decision to publish/donate their genome is also being made for their siblings, kids/grandkids, etc. So sensitivity does not degrade over time, but persists long after a patient’s death. In fact, it might even increase, as new aspects of the genome are studied and discovered. As a consequence, Prof. Dame Davies’ dream could easily turn into a nightmare without adequate investments toward sound security measures, that involve both technical tools (such as upgrading of obsolete hardware) as well as education, awareness, and practices that do not simply shift burden onto clinicians and practitioners, but incorporate security in their design and not as an after-the-fact.

Another concern is with allowing researchers to use the genomic data collected by the NHS, along with medical history, for research purposes – e.g., to discover genetic mutations that are responsible for certain traits or diseases. This requires building a meaningful trust relationship between the NHS/Government and patients, which cannot happen without healing the wounds from recent incidents like the care.data debacle or Google DeepMind’s use of personal NHS records. Instead, the annual report seems to include security/anonymity promises we cannot realistically maintain, while, worse yet, promoting a rhetoric of greater good trumping privacy concerns, as well as seemingly pushing a choice between donating data and access to the best care. It is misleading to use terms like “de-identification” of genomic data as an effective protection tool, while proper anonymization is inherently impossible due to its peculiar combination of unique and hereditary features, as demonstrated by a wide array of scientific results. Rather, we should make it clear that data can never be fully anonymized, or protected with 100% guarantees.

Overall, I believe that patients should not be automatically enrolled in sequencing programs. Even if they are given an option to later withdraw, once the data is out there it is impossible to delete all copies of it. Rather, patients should voluntarily decide to join through an effective informed consent mechanism. This proves to be challenging against a background in which information that can be extracted/inferred from genomes may rapidly change: what if in the future a new mutation responsible for early on-set Alzheimer’s is discovered? What if the NHS is privatized? Encouraging results with respect to education and informed consent, however, do exist. For instance, the Personal Genome Project is a good example of effective strategies to help volunteers understand the risks and could be used to inform future NHS-run sequencing programs.

 

An edited version of this article was originally published on the BMJ.

Battery Status Not Included: Assessing Privacy in W3C Web Standards

Designing standards with privacy in mind should be a standard in itself. Historically this was not always the case and the idea of designing systems with privacy is relatively new – it dates from the beginning of this decade. One of the milestones is accepting this view on the IETF level, dating 2013.

This note and the referenced research work focusses on designing standards with privacy in mind. The World Wide Web Consortium (W3C) is one of the most important standardization bodies. W3C is standardizing something everyone – billions of people – use every day for entertainment or work, the web and how web browsers work. It employs a focused, lengthy, but very open and consensus-driven environment in order to make sure that community voice is always heard during the drafting of new specifications. The actual inner workings of W3C are surprisingly complex. One interesting observation is that the W3C Process Document document does not actually mention privacy reviews at all. Although in practice this kind of reviews are always the case.

However, as the web along with its complexity and web browser features constantly grow, and as browsers and the web are expected to be designed in an ever-rapid manner, considering privacy becomes a challenge. But this is what is needed in order to obtain a good specification and well-thought browser features, with good threat models, and well designed privacy strategies.

My recent work, together with the Princeton team, Arvind Narayanan and Steven Englehardt, is providing insight into the standardization process at W3C, specifically the privacy areas of standardization. We point out a number of issues and room for improvements. We also provide recommendations as well as broad evidence of a wide misuse and abuse (in tracking and fingerprinting) of a browser mechanism called Battery Status API.

Battery Status API is a browser feature that was meant to allow websites access the information concerning the battery state of a user device. This seemingly innocuous mechanism initially had no identified privacy concerns. However, following my previous research work in collaboration with Gunes Acar this view has changed (Leaking Battery and a later note).

The work I authored in 2015 identifies a number or privacy risks of the API. Specifically – the issue of potential exploitation of the API to tracking, as well as the potential possibility of recovering the raw value of the battery capacity – something that was not foreseen to be accessed by web sites. Ultimately, this work has led to a fix in Firefox browser and an update to W3C specification. But the matter was not over yet.

Continue reading Battery Status Not Included: Assessing Privacy in W3C Web Standards

On the security and privacy of the ultrasound tracking ecosystem

In April 2016, the US Federal Trade Commission (FTC) sent warning letters to 12 Google Play app developers. The letters were addressed to those who incorporated the SilverPush framework in their apps, and reminded developers who used tracking software to explicitly inform their users (as seen in Section 5 of the Federal Trade Commission Act). The incident was covered by popular press and privacy concerns were raised. Shortly after, SilverPush claimed no active partnerships in the US and the buzz subsided.

Unfortunately, as the incident was resolved relatively fast, very few technical details of the technology were made public. To fill in this gap and understand the potential security implications, we conducted an in-depth study of the SilverPush framework and all the associated technologies.

The development of the framework was motivated by a fast-increasing interest of the marketing industry in products performing high-accuracy user tracking, and their derivative monetization schemes. This resulted in a high demand for cross-device tracking techniques with increased accuracy and reduced prerequisites.

The SilverPush framework fulfilled both of these requirements, as it provided a novel way to track users between their devices (e.g., TV, smartphone), without any user actions (e.g., login to a single platform from all their devices). To achieve that, the framework realized a previously unseen cross-device tracking technique (i.e., ultrasound cross-device tracking, uXDT) that offered high tracking accuracy, and came with various desirable features (e.g., easy to deploy, imperceptible by users). What differentiated that framework from existing ones was the use of high-frequency, inaudible ultrasonic beacons (uBeacons) as a medium/channel for identifier transmission between the user’s devices. This is also offered a major advantage to uXDT, against other competing technologies, as uBeacons can be emitted by most commercial speakers and captured by most commercial microphones. This eliminates the need for specialized emission and/or capturing equipment.

Aspects of a little-known ecosystem

The low deployment cost of the technology fueled the growth of a whole ecosystem of frameworks and applications that use uBeacons for various purposes, such as proximity marketing, audience analytics, and device pairing. The ecosystem is built around the near-ultrasonic transmission channel, and enables marketers to profile users.

Unfortunately, users are often given limited information on the ecosystem’s inner workings. This lack of transparency has been the target of great criticism from the users, the security community and the regulators. Moreover, our security analysis revealed a false assumption in the uBeacon threat model that can be exploited by state-level adversaries to launch complex attacks, including one that de-anonymizes the users of anonymity networks (e.g., Tor).

On top of these, a more fundamental shortcoming of the ecosystem is the violation of the least privilege principle, as a consequence of the access to the device’s microphone. More specifically, any app that wants to employ ultrasound-based mechanisms needs to gain full access to the device’s microphone, as there is currently no way to gain access only to the ultrasound spectrum. This clearly violates the least privilege principle, as the app has now access to all audible frequencies and allows a potentially malicious developer to request access to the microphone for ultrasound-pairing purposes, and then use it to eavesdrop the user’s discussions. This also results in any ultrasound-enabled apps to risk being perceived as “potentially malicious” by the users.

Mitigation

To address these shortcomings, we developed a set of countermeasures aiming to provide protection to the users in the short and medium term. The first one is an extension for the Google Chrome browser, which filters out all ultrasounds from the audio output of the websites loaded by the user. The extension actively prevents web pages from emitting inaudible sounds, and thus completely thwarts any unsolicited ultrasound-tracking attempts. Furthermore, we developed a patch for the Android permission system that allows finer-grained control over the audio channel, and forces applications to declare their intention to capture sound from the inaudible spectrum. This will properly separate the permissions for listening to audible sound and sound in the high-frequency spectrum, and will enable the end users to selectively filter the ultrasound frequencies out of the signal acquired by the smartphone microphone.

More importantly, we argue that the ultrasound ecosystem can be made secure only with the standardization of the ultrasound beacon format. During this process, the threat model will be revised and the necessary security features for uBeacons will be specified. Once this process is completed, APIs for handling uBeacons can be implemented in all major operating systems. Such an API would provide methods for uBeacon discovery, processing, generation and emission, similar to those found in the Bluetooth Low Energy APIs. Thereafter, all ultrasound-enabled apps will need access only to this API, and not to the device’s microphone. Thus, solving the problem of over-privileging that exposed the user’s sensitive data to third-party apps.

Discussion

Our work provides an early warning on the risks looming in the ultrasound ecosystem, and lays the foundations for the secure use of this set of technologies. However, it also raises several questions regarding the security of the audio channel. For instance, in a recent incident a journalist accidentally injected commands to several amazon echo devices, which then allegedly tried to order products online. This underlines the need for security features in the audio channel. Unfortunately, due to the variety of use cases, a universal solution that could be applied to the lower communication layers seems unlikely. Instead, solutions must be sought in the higher communications layers (e.g., application layers), and should be the outcome of careful threat modeling.

Inaugural Lecture: Zero-Knowledge Proofs

We held our annual ACE-CSR event in November 2016. The last talk was my inaugural lecture to full professor. I did not write the summary below myself, hence the use of third person, not because I now consider myself royalty! 🙂

In introducing Jens Groth, professor of cryptology, George Danezis, head of the information security group, commented that Groth’s work provided the only viable solutions to many of the hard privacy problems he himself was tackling. To most qualified engineers, he said, the concept of zero-knowledge proofs seems impossible: the idea is to show the properties of a secret without revealing them. A zero-knowledge proof could, for example, verify the result of a computation on some data without revealing the data itself. Most engineers believe that you must choose between integrity and confidentiality; Groth has proved this is not true. In addition, Danezis praised Groth’s work as highly creative, characterised by great mathematical depth and subtlety, and admired Groth’ willingness to speak his mind fearlessly even to government funders. Angela Sasse, head of the department, called Groth’s work “security tools we’re going to need for future generations”, and noted that simultaneously with these other accomplishments Groth helped put in place the foundation for the group as it is today.

Groth, jokingly opted to structure his talk around papers he’s had rejected to illustrate how hard it can be to publish innovative research. The concept of zero-knowledge proofs originated with a 1985 paper by Shafi Goldwasser, Silvio Micali, and Charles Rackoff. Zero-knowledge proofs have three characteristics: completeness (the prover can convince the verifier that the statement is true); soundness (the claiming prover cannot convince the verifier when the statement is false); and secrecy (no information other than the truth of the statement is leaked, even when the prover is interacting with a verifier who cheats). Groth illustrated the latter idea with a simple card trick: he asked an audience member to choose a card and then say whether the card was a heart or not. If the respondent shows all the cards that are not hearts, counting these proves that the selected card must be a heart without revealing what it is.

Zero-knowledge proofs can be extended to think about more complicated statements. Groth listed some examples:

  • Assert that a logical formula has an assignment to the variables that makes it true
  • Verify that a graph is Hamiltonian – that is, there is a path that touches each vertex exactly once
  • That a set of inputs into a Boolean circuit will produce an output of 1
  • Any statement of the general form U belongs to some NP-language L

Groth could see many possible applications for these proofs: signatures, encryption, electronic cash, electronic auctions, internet voting, multiparty computation, and verifiable cloud computing. His overall career has focused on building versatile and efficient proofs with the goal of moving them from being expensive and slow to being just a fraction of the cost of the task that’s being executed so that people would stop thinking about the cost and just toss them in as a standard part of any transaction.

Continue reading Inaugural Lecture: Zero-Knowledge Proofs

Privacy analysis of the W3C Proximity Sensor specification

Mobile developers are familiar with proximity sensors. These provide information about the distance of an object from the device providing access to the sensor (i.e. smartphone, tablet, laptop, or another Web of Things device). This object is usually the user’s head or hand.

Proximity sensors provide binary values such as far (from the object) or near (respectively), or a more verbose readout, in centimeters.

There are many useful applications for proximity. For example, an application can turn off the screen if the user holds the device close to his/her head (face detection). It is also handy for avoiding the execution of undesired actions, possibly arising from scratching the head with the phone’s screen.

The Proximity Sensor API specification is being standardized by W3C. Every web site will be able to access this information. Let’s focus on the privacy engineering point of view.

Proximity is the distance between an object and device. The latest version of W3C Proximity Sensors API takes advantage of a soon-to-be standard Generic Sensors API. The sensor provides the proximity distance in centimeters.

The use of Proximity Sensors is simple; an example displaying the current distance in centimeters using the modern syntax of Generic Sensors API is as follows:

let sensor = new ProximitySensor();

sensor.start();

sensor.onchange = function(){console.log(event.reading.distance)};  

This syntax might later allow requesting various proximity sensors (if the device has more than one), including those based on the sensor’s position (such as “rear” or “front”), but the current implementations generally still use the legacy version from the previous edition of the spec:

window.addEventListener('deviceproximity', function(event) {  
console.log(event.value)  
});

From the perspective of privacy considerations, there is currently no significant difference between those two API versions.

Privacy analysis of Proximity Sensor API

Proximity sensor readout is not providing much data, so it may appear there are no privacy implications, but there are good reasons for performing such analysis. Let’s list two:

Firstly, designing new standards and systems with privacy in mind — privacy by design — is a required practice and a good idea. Secondly, in some circumstances even potentially insignificant mechanisms can still bring consequences from privacy point of view. For example, multiple identifiers might be helpful in deanonymizing users. Non-obvious data leaks can surface, as well.

So let’s discuss some of the possible issues.

The average distance from the device to the user’s face (i.e. object) could be used to differentiate and discriminate between users. While the severity is hard to establish, future changes (i.e. influencing with other external data) cannot be ruled out.

If proximity patterns would be individually-attributable, this would offer a possibility to enhance user profiling based on the analysis of device use patterns. For example, the following could be obtained and analyzed:

  • Frequency of the user’s patterns of device use (i.e. waving a hand in front of the proximity sensor)
  • Frequency of the user’s zooming in and out (i.e. device close to the user’s head)
  • Patterns of use. Does the way the user hold the device vary during the day? How?
  • What are the mechanics of holding the device close to the user’s head? Can the distance vary?

Some users may use the mobile operating system’s zoom capability to increase the font or images, but others might casually prefer to hold the device closer to their eyes. Such behavioral differences can also be of note.

Proximity sensors can provide the following data: the current proximity distance and max, the maximum distance readout supported by the device. In case the value of max would differ between various implementations (e.g. among browsers, devices), it could form an identifier. And the two (max, distance) combined could also be used as short-lived identifiers. Moreover, it is not so difficult to imagine a situation where those identifiers are actually not so short-lived.

Recommendations

First of all, is there a need to provide a verbose proximity readout at all? For example, is providing readouts of proximity (distance) value up to 150 cm necessary?

In general, a device (browser, a Web of Things device) should be capable of informing the user when a web site accesses proximity information. The user should also be able to inspect which web sites – and how frequently – accessed the API.

Finally, the sensors should provide an adequately verbose readout of the distance.

Proximity Sensors should also be subject to permissions.

Demonstration

At the moment, the implementations of proximity sensors are limited. But a demonstration is running on my SensorsPrivacy research project (tested against Firefox Mobile on Android). In my case (Nexus 5), Firefox offered data in centimeters, but the verbosity was limited to 5 centimeters on my device. Again, this is a matter of hardware and software. In principle, the accuracy could be much higher and one can imagine an accuracy a sensor providing more accurate readouts (even up to 1 meter).

The screen dump below shows how the example demonstrates the use of proximity data (and whether it works on a browser). The site changes its background color if an object is placed near the proximity sensor of a device.

The readout from the demonstration shows a relative time between events (first column), and the proximity distance (second column).

398  5  
1011 0  
404  5  
607  0  

The sensor on my device is reporting two values of distance: 0 (near) and 5 (far), in centimeters. In general, the granularity is subject to the following: standard, implementation and hardware.

And this suggests something, of course. So let’s complement the privacy analysis from the previous section to incorporate a timing analysis.

Even such a limited readout can help performing behavioural analysis, simply by profiling based on time series. As we can see, the sensor readout is quite sensitive in respect to time and can measure with sub-200 ms granularity. The demonstration proof of principle is also showing the minimum relative detected time between events. Feel free to test the performance of your fingers and/or hardware!

Summary

When designing a standard project or an implementation, paying attention to details is imperative. This includes consideration of even potential risks. This is especially the case if the software or systems will be used by millions or hundreds of millions of users, i.e. you are aiming for success.

Finally, standards and specifications are ideal for issuing guidance and good practices

 

This post originally appeared on Security, Privacy & Tech Inquiries, the blog of Lukasz Olejnik. An accompanying demonstration is available on SensorsPrivacy (a project studing privacy of web sensors).

A Privacy Enhancing Architecture for Secure Wearable Devices

Why do we need security on wearable devices? The primary reason comes from the fact that, being in direct contact with the user, wearable devices have access to very private and sensitive user’s information more often than traditional technologies. The huge and increasing diversity of wearable technologies makes almost any kind of information at risk, going from medical records to personal habits and lifestyle. For that reason, when considering wearables, it is particularly important to introduce appropriate technologies to protect these data, and it is primary that both the user and the engineer are aware of the exact amount of collected information as well as the potential threats pending on the user’s privacy. Moreover, it has also to be considered that the privacy of the wearable’s user is not the only one at risk. In fact, more and more devices are not limited to record the user’s activity, but can also gather information about people standing around.

This blog post presents a flexible privacy enhancing system from its architecture to the prototyping level. The system takes advantage from anonymous credentials and is based on the protocols developed by M. Chase, S. Meiklejohn, and G. Zaverucha in Algebraic MACs and Keyed-Verification Anonymous Credentials. Three main entities are involved in this multi-purpose system: a main server, wearable devices and localisation beacons. In this multipurpose architecture, the server firstly issue some anonymous credentials to the wearables. Then, each time a wearable reach a particular physical location (gets close to a localisation beacon) where it desires to perform an action, it starts presenting its credentials in order to ask the server the execution of that a particular action. Both the design of the wearable and the server remain generic and scalable in order to encourage further enhancements and easy integration into real-world applications; i.e., the central server can manage an arbitrary number of devices, each device can posses an arbitrary number of credentials and the coverage area of the localisation system is arbitrarily extendable.

Architecture

The complete system’s architecture can be modelled as depicted in the figure below. Roughly speaking, a web interface is used to manage and display the device’s functions. Each user and admin access the system from that interface.

complete_architecture

During the setup phase, the server issues the credentials to a selected device (according to the algorithms presented in Algebraic MACs and Keyed-Verification Anonymous Credentials) granting it a given privilege level. The credentials’ issuance is a short-range process. In fact, the wearable needs to be physically close to the server to allow the admins to physically verify, once and for all, the identity of the wearables’ users. In order to improve security and battery life, the wearable only communicates using extremely low-power and short-range radio waves (dotted line on the figure). The server beacons can be seen as continuities of the main server and have essentially two roles: the first is to operate as an interface between the wearables and the server, and the second is to act as a RF localisation system. Each time the wearable granted with enough privileges reaches some particular physical location (gets close to a localisation beacon), it starts presenting its credentials in order to prove to the server that it possesses credentials over some attributes (without revealing them), and that these credentials have been previously issued by the server itself. Note that the system preserves anonymity only if many users are involved (for each privilege level), but this is a classic requirement of anonymous systems. Finally, once the credentials have been successfully verified by the server, the server issues a signed request to an external entity (which can be, for instance, an automatic door, an alarm system or any compatible IoT entity) to perform the requested action.

Continue reading A Privacy Enhancing Architecture for Secure Wearable Devices