Measuring mobility without violating privacy – a case study of the London Underground

In the run-up to this year’s Privacy Enhancing Technologies Symposium (PETS 2019), I noticed some decidedly non-privacy-enhancing behaviour. Transport for London (TfL) announced they will be tracking the wifi MAC addresses of devices carried through London Underground stations. Before a MAC address is stored it will be hashed with a key, but since this key will remain unchanged for an extended period (2 years), it will be possible to track the movements of an individual over this period through this pseudonymous ID. These traces are likely enough to link records back to an individual, given some knowledge of that person’s distinctive travel patterns. Also, for as long as the key is retained, it would be trivial for TfL (or someone who stole the key) to convert someone’s MAC address into its pseudonymised form and indisputably learn that person’s movements.

TfL argues that under the General Data Protection Regulation (GDPR), they don’t need the consent of the individuals they monitor because they are acting in the public interest. Indeed, others have pointed out the value to society of knowing how people typically move through underground stations. But the GDPR also requires that organisations minimise the amount of personal data they collect. Could the same goal be achieved if TfL irreversibly anonymised wifi MAC addresses rather than just pseudonymising them? For example, they could truncate the hashed MAC address so that many devices share the same truncated anonymous ID. How would this affect the calculation of statistics of movement patterns within underground stations? I posed these questions in a presentation at the PETS 2019 rump session, and in this article, I’ll explain why a set of algorithms designed to violate people’s privacy can be applied to collect wifi mobility information while protecting passenger privacy.
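
To make the difference concrete, here is a minimal Python sketch of keyed hashing versus truncated keyed hashing of a MAC address. It is only an illustration of the general idea: the key, hash function and truncation length below are assumptions for this example, not details of TfL’s actual scheme.

```python
import hmac
import hashlib

# Assumed parameters for illustration only: TfL's actual key management,
# hash function and any truncation length are not public.
SECRET_KEY = b"rotate-me-regularly"   # hypothetical hashing key
TRUNCATE_BYTES = 2                    # keep only 16 bits -> many collisions

def pseudonymise(mac: str) -> str:
    """Keyed hash of a MAC address: anyone holding the key can recompute
    it for a known MAC address and so re-identify that device."""
    digest = hmac.new(SECRET_KEY, mac.encode(), hashlib.sha256).digest()
    return digest.hex()

def anonymise(mac: str) -> str:
    """Truncated keyed hash: many devices share each ID, so individual
    journeys can no longer be singled out, but counts remain usable."""
    digest = hmac.new(SECRET_KEY, mac.encode(), hashlib.sha256).digest()
    return digest[:TRUNCATE_BYTES].hex()

if __name__ == "__main__":
    mac = "a4:5e:60:c2:91:0f"
    print("pseudonymous ID:", pseudonymise(mac))
    print("anonymous ID:   ", anonymise(mac))
```

With a two-byte truncation there are only 65,536 possible IDs, so at a busy station many devices will share each one, which is precisely what makes the ID anonymous rather than pseudonymous.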

It’s important to emphasise that TfL’s goal is not to track past Underground customers but to predict the behaviour of future passengers. Inferring past behaviours from the traces of wifi records may be one means to this end, but it is not the end in itself, and TfL creates legal risk for itself by holding this data. The inferences from this approach aren’t even going to be correct: wifi users are unlikely to be typical passengers, and behaviour will change over time. TfL’s hope is that the inferred profiles will be useful enough to inform business decisions. Privacy-preserving measurement techniques should be judged by the business value of the passenger models they create, not by how accurately they can follow individual passengers around underground stations in the past. As the saying goes, “all models are wrong, but some are useful”.

Simulating privacy-preserving mobility measurement

To explore this space, I built a simple simulation of Euston Station inspired by one of the TfL case studies. In my simulation, there are two platforms (A and B) and six types of passengers. Some travel from platform A to B; some from B to A; others enter and leave the station at the same platform (A or B). Passengers travelling between platforms can take either the fast route (2 minutes on average) or the slow route (4 minutes on average). Passengers enter the station at a Poisson arrival rate averaging one per second. The probabilities that each new passenger is of a particular type are shown in the figure below. The goal of the simulation is to infer the number of passengers of each type from observations of wifi measurements taken at platforms A and B.
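
A minimal sketch of such a simulation is below. The 2- and 4-minute route averages and the one-per-second Poisson arrival rate come from the description above; the per-type probabilities and the exponential journey-time assumption are placeholders for illustration only, not the figures from the article.

```python
import random

# Passenger types: travel A->B or B->A via a fast or slow route, or enter
# and leave at the same platform (A only or B only).  The probabilities
# below are placeholders -- the article's actual figures are in its figure.
PASSENGER_TYPES = {
    "A->B fast": 0.25,
    "A->B slow": 0.15,
    "B->A fast": 0.25,
    "B->A slow": 0.15,
    "A only":    0.10,
    "B only":    0.10,
}
ROUTE_MINUTES = {"fast": 2.0, "slow": 4.0}   # average journey times
ARRIVAL_RATE = 1.0                           # passengers per second (Poisson)

def simulate(duration_s: int, seed: int = 0):
    """Yield (arrival_time, passenger_type, journey_seconds) tuples."""
    rng = random.Random(seed)
    t = 0.0
    while t < duration_s:
        t += rng.expovariate(ARRIVAL_RATE)          # Poisson inter-arrivals
        ptype = rng.choices(list(PASSENGER_TYPES),
                            weights=PASSENGER_TYPES.values())[0]
        if "fast" in ptype or "slow" in ptype:
            mean = ROUTE_MINUTES["fast" if "fast" in ptype else "slow"] * 60
            journey = rng.expovariate(1.0 / mean)   # assumed exponential
        else:
            journey = rng.expovariate(1.0 / 60)     # dwell at one platform
        yield t, ptype, journey

if __name__ == "__main__":
    counts = {}
    for _, ptype, _ in simulate(duration_s=3600):
        counts[ptype] = counts.get(ptype, 0) + 1
    print(counts)
```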

Continue reading Measuring mobility without violating privacy – a case study of the London Underground

Tracing transactions across cryptocurrency ledgers

The Bitcoin whitepaper acknowledges the risk of revealing the owners of addresses, stating that “if the owner of a key is revealed, linking could reveal other transactions that belonged to the same owner.” Five years later, we have seen many projects that look at de-anonymising entities in Bitcoin. Such projects use techniques such as address tagging and clustering to tie many addresses to one entity, making it easier to analyse the movement of funds. This is not limited to Bitcoin: similar analysis has been applied to alternative cryptocurrencies such as Zcash and Monero. Tracing transactions on-chain is thus a known and studied problem.

But we have recently seen a shift towards entities performing cross-currency trades. For example, the WannaCry hackers laundered over $142,000 worth of Bitcoin ransoms across cryptocurrencies. The issue here is that cross-chain transactions appear indistinguishable from native transactions on-chain. For example, to trade Bitcoin for Monero, a user sends the exchange bitcoin and, in return, the exchange sends the user monero. These two transactions occur on separate chains and do not appear to be connected, so the swap itself is obscured. This level of obscurity can be used to hide the original flow of coins, giving users an additional form of anonymity.

It is thus important to ask whether we can analyse such transactions and, if so, how and to what extent. In our paper, being presented today at the USENIX Security Symposium, we (Haaroon Yousaf, George Kappos and Sarah Meiklejohn) answer these questions.

Our Research

In summary, we scraped and linked over 1.3 million transactions across different blockchains from the service ShapeShift. In doing so, we found over 100,000 cases where users converted coins into another currency and then moved straight back to the original one, identified a Bitcoin address associated with CoinPayments.net as a very popular destination for users to shift to, and saw that scammers preferred shifting their Ethereum to Bitcoin and Monero.

We collected and analysed 13 months of transaction data across eight different blockchains to identify how users interacted with this service. In doing so, we developed new heuristics and identified various patterns of cross-currency trades.
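
To give a flavour of what such linking involves, the toy sketch below pairs a deposit into a ShapeShift address on one chain with a payout on another chain shortly afterwards. The record format and matching rule are simplified assumptions for illustration; the paper’s heuristics are based on ShapeShift’s own API data and are considerably more careful.

```python
from dataclasses import dataclass

@dataclass
class ChainTx:
    chain: str       # e.g. "BTC", "ETH", "XMR"
    txid: str
    address: str
    amount: float
    timestamp: int   # unix time

def link_shifts(deposits, payouts, max_delay=3600):
    """Toy linking rule: pair a deposit into a known ShapeShift address with
    the first payout on a different chain that follows it within max_delay
    seconds.  This only illustrates the idea of cross-chain linking."""
    links, used = [], set()
    for d in sorted(deposits, key=lambda tx: tx.timestamp):
        for p in sorted(payouts, key=lambda tx: tx.timestamp):
            if p.txid in used or p.chain == d.chain:
                continue
            if 0 <= p.timestamp - d.timestamp <= max_delay:
                links.append((d, p))
                used.add(p.txid)
                break
    return links

# Hypothetical example: one BTC deposit followed by an XMR payout.
deposits = [ChainTx("BTC", "btc-tx-1", "shapeshift-deposit-addr", 0.5, 1_500_000_000)]
payouts  = [ChainTx("XMR", "xmr-tx-9", "user-payout-addr", 30.0, 1_500_000_600)]
print(link_shifts(deposits, payouts))
```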

What is ShapeShift? 

ShapeShift is a lightweight, non-custodial service that allows users to trade coins directly from one currency to another (a cross-currency shift). The service facilitates the entire trade, essentially letting users swap their coins against its own supply. ShapeShift and Changelly are examples of such services.

Continue reading Tracing transactions across cryptocurrency ledgers

Thoughts on the Libra blockchain: too centralised, not private, and won’t help the unbanked

Facebook recently announced a new project, Libra, whose mission is to be “a simple global currency and financial infrastructure that empowers billions of people”. The announcement has predictably been met with scepticism by organisations like Privacy International, regulators in the U.S. and Europe, and the media at large. This is wholly justified given the look of the project’s website, which features claims of poverty reduction, job creation, and more generally empowering billions of people, wrapped in a dubious marketing package.

To start off, there is the (at least for now) permissioned aspect of the system. One appealing aspect of cryptocurrencies is their potential for decentralisation and censorship resistance. It wasn’t uncommon to see the story of PayPal freezing WikiLeaks’ account in the first few slides of a cryptocurrency talk, motivating its purpose. Now, PayPal and other well-known providers of payment services are the ones operating nodes in Libra.

There is some valid criticism to be made about the permissioned aspect of a system that describes itself as a public good when other cryptocurrencies are permissionless. In practice, however, permissionless cryptocurrencies are essentially centralised, with inefficient, energy-wasting mechanisms like Proof-of-Work requiring large investments from any party wishing to contribute.

There is a roadmap towards decentralisation, but it is vague. Decentralisation, whether at the network or governance level, hasn’t been achieved even in a priori decentralised cryptocurrencies. In this sense, Libra hasn’t really done worse so far. It already involves more members than there are important Bitcoin or Ethereum miners, for example, and they are also more diverse. However, this is more a fault of existing cryptocurrencies than a virtue of Libra.

Continue reading Thoughts on the Libra blockchain: too centralised, not private, and won’t help the unbanked

How Accidental Data Breaches can be Facilitated by Windows 10 and macOS Mojave

Inadequate user interface designs in Windows 10 and macOS Mojave can cause accidental data breaches through inconsistent language, insecure default options, and unclear or incomprehensible information. Users could accidentally leak sensitive personal data. Data controllers in companies might be unknowingly non-compliant with the GDPR’s legal obligations for data erasure.

At the upcoming Annual Privacy Forum 2019 in Rome, I will be presenting the results of a recent study conducted with my colleague Mark Warner, exploring the inadequate design of user interfaces (UI) as a contributing factor in accidental data breaches from USB memory sticks. The paper titled “Fight to be Forgotten: Exploring the Efficacy of Data Erasure in Popular Operating Systems” will be published in the conference proceedings at a later date but the accepted version is available now.

Privacy and security risks from decommissioned memory chips

The process of decommissioning memory chips (e.g. USB sticks, hard drives, and memory cards) can create risks for data protection. Researchers have repeatedly found sensitive data on devices acquired from second-hand markets. Sometimes this data was from the previous owners, other times from third parties. In some cases, highly sensitive data from vulnerable people was found; for example, Jones et al. found videos of children at a high school in the UK on a second-hand USB stick.

Data found this way had frequently been deleted but not erased, creating the risk that any tech-savvy future owner could access it using legally available, free-to-download software (e.g., FTK Imager Lite 3.4.3.3). Findings from these studies also indicate that the previous owners intended to erase these files and prevent future access by unauthorised individuals, but failed to do so sufficiently. Moreover, these risks likely extend from the second-hand market to recycled memory chips – a practice encouraged under Directive 2012/19/EU on ‘waste electrical and electronic equipment’.
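
The distinction between deleting and erasing can be illustrated with a toy ‘disk image’: deletion removes only the reference to the data, while erasure overwrites the data itself. This is a deliberate simplification; real file systems and forensic tools are far more involved.

```python
# Toy illustration of "deleted" vs "erased" data.  A real file system is far
# more complex, but the principle is the same: removing the reference to the
# data does not remove the data itself.

image = bytearray(1024)                 # pretend raw storage medium
secret = b"patient records: ..."
offset = 128
image[offset:offset + len(secret)] = secret
catalog = {"records.txt": (offset, len(secret))}   # toy directory entry

# "Delete": drop the directory entry only -- the bytes stay on the medium.
del catalog["records.txt"]
print(b"patient records" in bytes(image))   # True: recoverable by scanning

# "Erase": overwrite the bytes themselves before discarding the device.
image[offset:offset + len(secret)] = bytes(len(secret))
print(b"patient records" in bytes(image))   # False: the data is gone
```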

The implications for data security and data protection are substantial. End-users and companies alike could accidentally cause breaches of their own or their customers’ sensitive personal data. The protection of personal data is enshrined in Article 8 of the Charter of Fundamental Rights of the European Union, and the General Data Protection Regulation (GDPR) lays down rules and regulations for the protection of this fundamental right. For example, data processors could find themselves inadvertently in violation of Article 17 GDPR Right to Erasure (‘right to be forgotten’) if, despite their best intentions, they failed to erase a customer’s personal data – regardless of whether that data was subsequently breached.

Seemingly minor design choices, the potential for major implications

The indication that people might fail to properly erase files from storage, despite their apparent intention to do so, is a strong sign of system failure. We have known for more than twenty years that users’ unintentional failure at a task is often caused by the way in which these mechanisms are implemented, and by users’ lack of knowledge. In our case, these mechanisms are – for most users – the UI of Windows and macOS. When investigating these mechanisms, we found seemingly minor design choices that might facilitate unintentional data breaches. A few examples are shown below and are expanded upon in the full publication of our work.

Continue reading How Accidental Data Breaches can be Facilitated by Windows 10 and macOS Mojave

Protecting human rights by avoiding regulatory capture within surveillance oversight

Regulation is in the news again as a result of the Home Office blocking surveillance expert Eric Kind from taking up his role as Head of Investigation at the Investigatory Powers Commissioner’s Office (IPCO) – the newly created agency responsible for regulating organisations managing surveillance, including the Home Office. Ordinarily, it would be unheard of for a regulated organisation to be able to veto the appointment of staff to its regulator, particularly one established through statute as being independent. However, the Home Office was able to do so here by refusing to issue the security clearance required for Kind to do his job. The Investigatory Powers Commissioner can’t override this decision, the Home Office doesn’t have to explain its reasoning, and there is no appeal process.

Behaviour like this can lead to regulatory capture – where the influence of the regulated organisation steers regulation away from the public interest and toward the interests of the organisations being regulated. The mechanism of blocking security clearances is specific to activities relating to the military and intelligence, but the phenomenon of regulatory capture is more widespread. Consequently, regulatory capture has been well studied, and there’s a body of work describing tried and tested ways to resist it. If the organisations responsible for surveillance regulation were to apply these recommendations, it would improve both the privacy of the public and trust in the agencies carrying out surveillance. When we combine these techniques with advanced cryptography, we can do better still.

Regulatory capture is also a problem in finance – likely contributing to high-profile scandals like Libor manipulation and payment-protection-insurance mis-selling. In previous articles, we’ve discussed how regulators’ sluggish response to new fraud techniques has led to victims unfairly footing the bill. Such behaviour by regulators is rarely the result of outright corruption – regulatory capture is often more subtle. For example, the skills needed by the regulator may only be available by hiring staff from the regulated organisations, who bring their culture and mindset along with them. Regulators’ staff often find career opportunities within the regulator limited, and so are reluctant to take a hard line against the regulated organisation and close off the option of getting a job there later – likely at a much higher salary. Regulatory capture resulting from this sharing of staff and their corresponding culture is, I think, a key reason why surveillance oversight bodies have insufficient regard for the public interest.

Continue reading Protecting human rights by avoiding regulatory capture within surveillance oversight

New threat models in the face of British intelligence and the Five Eyes’ new end-to-end encryption interception strategy

As more and more services and messaging applications implement end-to-end encryption, law enforcement organisations and intelligence agencies have become increasingly concerned about the prospect of “going dark”. This is when law enforcement has the legal right to access a communication (i.e. through a warrant) but doesn’t have the technical capability to do so, because the communication is end-to-end encrypted.

Earlier proposals from politicians took the approach of outright banning end-to-end encryption, which was met with fierce criticism by experts and the tech industry. The intelligence community has been slightly more nuanced, promoting protocols that allow for key escrow, where messages would also be encrypted under an additional key (e.g. one controlled by the government). Such protocols have been promoted by intelligence agencies as recently as 2016 and as early as the 1990s, but were also met with fierce criticism.

More recently, there has been a new set of legislation in the UK, statements from the Five Eyes, and proposals from intelligence officials that put forward a “different” way of defeating end-to-end encryption: one akin to key escrow but enabled on a “per-warrant” basis rather than by default. Let’s look at how this may affect threat models in applications that use end-to-end encryption in the future.

Legislation

On the 31st of August 2018, the governments of the United States, the United Kingdom, Canada, Australia and New Zealand (collectively known as the “Five Eyes”) released a “Statement of Principles on Access to Evidence and Encryption”, where they outlined their position on encryption.

The statement says:

Privacy laws must prevent arbitrary or unlawful interference, but privacy is not absolute. It is an established principle that appropriate government authorities should be able to seek access to otherwise private information when a court or independent authority has authorized such access based on established legal standards.

The statement goes on to set out that technology companies have a mutual responsibility with government authorities to enable this process. At the end of the statement, it describes how technology companies should provide government authorities access to private information:

The Governments of the Five Eyes encourage information and communications technology service providers to voluntarily establish lawful access solutions to their products and services that they create or operate in our countries. Governments should not favor a particular technology; instead, providers may create customized solutions, tailored to their individual system architectures that are capable of meeting lawful access requirements. Such solutions can be a constructive approach to current challenges.

Should governments continue to encounter impediments to lawful access to information necessary to aid the protection of the citizens of our countries, we may pursue technological, enforcement, legislative or other measures to achieve lawful access solutions.

Their position effectively boils down to requiring technology companies to provide a technical means to fulfil court warrants requiring them to hand over the private data of certain individuals, while leaving the implementation open to each technology company.
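
One way to picture the resulting threat model is that, instead of a fixed escrow key, the provider adds an extra recipient to a conversation only when served with a warrant. The sketch below is a deliberately simplified, symmetric-key illustration of that idea using the Python `cryptography` package; it is not any agency’s actual proposal, and real messengers use public-key cryptography rather than pre-shared symmetric keys.

```python
from cryptography.fernet import Fernet

# Each message is encrypted once with a fresh content key; the content key is
# then wrapped for every recipient.  (Symmetric keys are used here only to
# keep the sketch short.)
def send(message: bytes, recipient_keys: dict) -> dict:
    content_key = Fernet.generate_key()
    return {
        "ciphertext": Fernet(content_key).encrypt(message),
        "wrapped_keys": {name: Fernet(k).encrypt(content_key)
                         for name, k in recipient_keys.items()},
    }

def read(bundle: dict, name: str, key: bytes) -> bytes:
    content_key = Fernet(key).decrypt(bundle["wrapped_keys"][name])
    return Fernet(content_key).decrypt(bundle["ciphertext"])

alice, bob = Fernet.generate_key(), Fernet.generate_key()
warrant_key = Fernet.generate_key()   # hypothetical per-warrant recipient

# Normal operation: only Alice and Bob can read.
bundle = send(b"hello", {"alice": alice, "bob": bob})

# Under a warrant, the provider silently adds one more recipient.
bundle = send(b"hello", {"alice": alice, "bob": bob, "warrant": warrant_key})
print(read(bundle, "warrant", warrant_key))   # b'hello'
```

The point of the sketch is that nothing Alice and Bob see necessarily changes when the extra recipient is added, which is exactly why this approach changes the threat model for end-to-end encrypted applications.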

Continue reading New threat models in the face of British intelligence and the Five Eyes’ new end-to-end encryption interception strategy

Coconut E-Petition Implementation

An interesting new multi-authority selective-disclosure credential scheme called Coconut was recently released, which has the potential to enable applications that were not possible before. Selective-disclosure credential systems allow a credential (with one or more attributes) to be issued to a user, who can later unlinkably reveal or “show” that credential for purposes of authentication or authorisation. The user can also show specific attributes of the credential, or a specific function of the attributes embedded in it; e.g. if the user is issued an identification credential with an attribute x representing their age, say x = 25, they can show that x > 21 without revealing the exact value of x.

High-level overview of Coconut, from the Coconut paper

A particular use-case for this scheme is to create a privacy-preserving e-petition system. There are a number of anonymous electronic petition systems currently being developed, but all lack important security properties: (i) unlinkability – the ability to break the link between users and the specific petitions they signed, and (ii) multi-authority – the absence of a single trusted third party in the system. Through multi-authority selective-disclosure credentials like Coconut, these systems can achieve unlinkability without relying on a single trusted third party. For example, if there are 100 eligible users with a valid credential, and there are a total of 75 signatures when the petition closes, it is not possible to know which 75 of the 100 actually signed the petition.
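
As a rough, non-cryptographic analogy of how double-signing can be prevented without linking a signer across petitions, consider deriving a per-petition pseudonym from a secret only the user holds. Coconut achieves this with credentials and zero-knowledge proofs; the hash-based sketch below provides none of Coconut’s issuance or verifiability properties and is only meant to convey the unlinkability idea.

```python
import hashlib
import secrets

# Each user holds a private secret (in Coconut this would be embedded in
# their credential).  For each petition, the user derives a pseudonym that
# is stable within that petition but unlinkable across petitions.
def petition_pseudonym(user_secret: bytes, petition_id: str) -> str:
    return hashlib.sha256(user_secret + petition_id.encode()).hexdigest()

class Petition:
    def __init__(self, petition_id: str):
        self.petition_id = petition_id
        self.signatures = set()

    def sign(self, user_secret: bytes) -> bool:
        """Record a signature; re-signing with the same secret is rejected."""
        pseudonym = petition_pseudonym(user_secret, self.petition_id)
        if pseudonym in self.signatures:
            return False
        self.signatures.add(pseudonym)
        return True

alice = secrets.token_bytes(32)
petition = Petition("ban-foo-2019")
print(petition.sign(alice))   # True  -- first signature accepted
print(petition.sign(alice))   # False -- double-signing detected
# The pseudonym for a different petition shares nothing visible with the
# first, so the two signatures cannot be linked to the same person.
print(petition_pseudonym(alice, "save-bar-2019"))
```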

Continue reading Coconut E-Petition Implementation

What We Disclose When We Choose Not To Disclose: Privacy Unraveling Around Explicit HIV Disclosure Fields

For many gay and bisexual men, mobile dating or “hook-up” apps are a regular and important part of their lives. Many of these apps now ask users for HIV status information to create a more open dialogue around sexual health, to reduce the spread of the virus, and to help fight HIV-related stigma. Yet if a user wants to keep their HIV status private from other app users, this can be more challenging than one might first imagine. While most apps provide users with the choice to keep their status undisclosed with some form of “prefer not to say” option, our recent study, which we describe in a paper being presented today at the ACM Conference on Computer-Supported Cooperative Work and Social Computing 2018, finds that privacy may “unravel” around users who choose this non-disclosure option, which could limit disclosure choice.

Privacy unraveling is a theory developed by Peppet, who suggests that people will self-disclose their personal information when doing so is easy, low-cost, and personally beneficial. Privacy may then unravel around those who keep their information undisclosed, as they are assumed to be “hiding” undesirable information, and are stigmatised and penalised as a consequence.

In our study, we explored the online views of Grindr users and found concerns over assumptions developing around HIV non-disclosures. For users who believe themselves to be HIV negative, the personal benefits of disclosing are high and the social costs low. In contrast, for HIV positive users, the personal benefits of disclosing are low, whilst the costs are high due to the stigma that HIV still attracts. As a result, people may assume that those not disclosing possess the low gain, high cost status, and are therefore HIV positive.

We developed a series of conceptual designs that utilise Peppet’s proposed limits to privacy unraveling. One of these designs is intended to artificially increase the cost of disclosing an HIV negative status. We suggest time and financial cost as two resources that could be used to artificially increase disclosure cost. For example, users reporting to be HIV negative could be asked to watch an educational awareness video on HIV prior to disclosing (time), or only those users who had a premium subscription could be permitted to disclose their status (financial). An alternative (or parallel) approach is to reduce the high cost of disclosing an HIV positive status by designing in mechanisms to reduce social stigma around the condition. For example, all users could be offered the option to sign up to “living stigma-free”, which could also appear on their profile to signal their pledge to others.

Another design approach is to create uncertainty over whether users are aware of their own status. We suggest that profiles disclosing an HIV negative status for more than 6 months be switched automatically to undisclosed unless the user reports a recent HIV test. This could act as a testing reminder, as well as increasing uncertainty over the reason for non-disclosures. We also suggest increasing uncertainty or ambiguity around HIV status disclosure fields by clustering undisclosed fields together. This may create uncertainty around the particular field the user is concerned about disclosing. Finally, design could be used to cultivate norms around non-disclosures. For example, HIV status disclosure could be limited to HIV positive users, with non-disclosures then assumed to indicate an HIV negative status rather than an HIV positive one.
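
As a concrete example, the proposed 6-month rule could be expressed as a simple profile-display function. This is a sketch of the design idea only, with assumed field names; it is not code from Grindr or any other app.

```python
from datetime import date, timedelta
from typing import Optional

SIX_MONTHS = timedelta(days=182)   # roughly six months

def displayed_status(status: str, last_test: Optional[date], today: date) -> str:
    """Sketch of the proposed rule: an HIV-negative status reverts to
    'undisclosed' unless backed by a test within the last six months."""
    if status == "negative" and (last_test is None or today - last_test > SIX_MONTHS):
        return "undisclosed"
    return status

print(displayed_status("negative", date(2018, 1, 10), date(2018, 11, 1)))  # undisclosed
print(displayed_status("negative", date(2018, 9, 1), date(2018, 11, 1)))   # negative
```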

In our paper, we discuss some of the potential benefits and pitfalls of implementing Peppet’s proposed limits in design, and suggest further work needed to better understand the impact privacy unraveling could have in online social environments like these. We explore ways our community could contribute to building systems that reduce its effect in order to promote disclosure choice around this type of sensitive information.


This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 675730.

On Location, Time, and Membership: Studying How Aggregate Location Data Can Harm Users’ Privacy

The increasing availability of location and mobility data enables a number of applications, e.g., enhanced navigation services and parking, context-based recommendations, or waiting time predictions at restaurants, which have great potential to improve the quality of life in modern cities. However, the large-scale collection of location data also raises privacy concerns, as mobility patterns may reveal sensitive attributes about users, e.g., home and work places, lifestyles, or even political or religious inclinations.

Service providers, i.e., companies with access to location data, often use aggregation as a privacy-preserving strategy when releasing these data to third parties for various analytics tasks. The idea is that, by grouping together users’ traces, the data no longer contains information enabling inferences about individuals such as those mentioned above, while still yielding useful insights about crowds. For instance, Waze constructs aggregate traffic models to improve navigation within cities, while Uber provides aggregate data for urban planning purposes. Similarly, CityMapper’s Smart Ride application aims to identify gaps in transportation networks based on traces and rankings collected from users’ mobile devices, while Telefonica monetises aggregate location statistics through advertising as part of the Smart Steps project.

That’s great, right? Well, our paper, “Knock Knock, Who’s There? Membership Inference on Aggregate Location Data”, published at NDSS 2018, shows that aggregate location time-series can in fact be used to infer information about individual users. In particular, we demonstrate that aggregate locations are prone to a privacy attack known as membership inference: a malicious entity aims to identify whether a specific individual contributed her data to the aggregation. We demonstrate the feasibility of this type of privacy attack in a proof-of-concept setting designed for an academic evaluation, and in a real-world setting where we apply membership inference attacks in the context of the Aircloak challenge, the first bounty program for anonymised data re-identification.

Membership Inference Attacks on Aggregate Location Time-Series

Our NDSS’18 paper studies membership inference attacks on aggregate location time-series indicating the number of people transiting in a certain area at a given time. That is, we show that an adversary with some “prior knowledge” about users’ movements is able to train a machine learning classifier and use it to infer the presence of a specific individual’s data in the aggregates.

We experiment with different types of prior knowledge. On the one hand, we simulate powerful adversaries that know the real locations of a subset of users in the database during the aggregation period (e.g., telecommunications providers, which have location information about their clients); on the other hand, we model weaker adversaries that only know past statistics about user groups (i.e., reproducing a setting of continuous data release). Overall, we find that the adversarial prior knowledge significantly influences the effectiveness of the attack.
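
A stripped-down version of such an attack can be sketched with synthetic data and a generic classifier, as below. The mobility model, features, and group sizes here are illustrative assumptions; the paper’s datasets and adversarial models are considerably richer.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_users, n_regions, n_epochs = 200, 10, 24

# Synthetic mobility: each user visits a preferred region with high
# probability in every epoch -- a crude stand-in for real location traces.
prefs = rng.integers(0, n_regions, size=n_users)

def user_trace(u):
    visited = rng.integers(0, n_regions, size=n_epochs)
    stays = rng.random(n_epochs) < 0.7
    visited[stays] = prefs[u]
    return np.eye(n_regions, dtype=int)[visited]        # epochs x regions

target = 0
X, y = [], []
for _ in range(600):
    include = rng.random() < 0.5
    group = list(rng.choice(np.arange(1, n_users), size=30, replace=False))
    if include:
        group[0] = target                                # swap one member for the target
    aggregate = sum(user_trace(u) for u in group)        # per-epoch, per-region counts
    X.append(aggregate.flatten())
    y.append(int(include))

X_train, X_test, y_train, y_test = train_test_split(
    np.array(X), np.array(y), test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print("membership inference accuracy:", clf.score(X_test, y_test))
```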

Continue reading On Location, Time, and Membership: Studying How Aggregate Location Data Can Harm Users’ Privacy

Improving the auditability of access to data requests

Data is increasingly collected and shared, with potential benefits for both individuals and society as a whole, but people cannot always be confident that their data will be shared and used appropriately. Decisions made with the help of sensitive data can greatly affect lives, so there is a need for ways to hold data processors accountable. This requires not only ways to audit these data processors, but also ways to verify that the reported results of an audit are accurate, while protecting the privacy of individuals whose data is involved.

We (Alexander Hicks, Vasilios Mavroudis, Mustafa Al-Bassam, Sarah Meiklejohn and Steven Murdoch) present a system, VAMS, that allows individuals to check accesses to their sensitive personal data, enables auditors to detect violations of policy, and allows publicly verifiable, privacy-preserving statistics to be published. VAMS has been implemented twice: as a permissioned distributed ledger using Hyperledger Fabric, and as a verifiable log-backed map using Trillian. The paper and the code are available.
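
To give a flavour of the kind of tamper-evident logging involved, the sketch below builds a minimal hash-chained, append-only log of access requests. This is a simplified illustration of the general idea of a verifiable log, not the VAMS implementation, which builds on Hyperledger Fabric and Trillian; the record fields are hypothetical.

```python
import hashlib
import json

class AuditLog:
    """Append-only, tamper-evident log of data-access requests.  Each entry
    commits to the previous one, so any later modification is detectable."""
    def __init__(self):
        self.entries = []

    def append(self, request: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else "0" * 64
        payload = json.dumps(request, sort_keys=True)
        entry_hash = hashlib.sha256((prev + payload).encode()).hexdigest()
        self.entries.append({"request": request, "prev": prev, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        prev = "0" * 64
        for e in self.entries:
            payload = json.dumps(e["request"], sort_keys=True)
            if e["prev"] != prev or \
               e["hash"] != hashlib.sha256((prev + payload).encode()).hexdigest():
                return False
            prev = e["hash"]
        return True

log = AuditLog()
log.append({"requester": "force-42", "subject": "pseudonym-123", "purpose": "communications data"})
log.append({"requester": "gp-7", "subject": "pseudonym-123", "purpose": "treatment"})
print(log.verify())                                     # True
log.entries[0]["request"]["purpose"] = "tampered"
print(log.verify())                                     # False
```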

Use cases and setting

Our work is motivated by two scenarios: controlling the access of law-enforcement personnel to communication records and controlling the access of healthcare professionals to medical data.

The UK Home Office states that 95% of serious and organised criminal cases make use of communications data. Annual reports published by the IOCCO (now operating under the IPCO name) provide some information about the request and use of communications data. There were over 750,000 requests for data in 2016, a portion of which were audited to provide the usage statistics and errors that can be found in the published report.

Not only is it important that requests are auditable, the requested data can also be used as evidence in legal proceedings. In this case, it is necessary to ensure the integrity of the data or to rely on representatives of data providers and expert witnesses, the latter being more expensive and requiring trust in third parties.

In the healthcare case, individuals usually consent for their GP or any medical professional they interact with to have access to relevant medical records, but may have concerns about the way their information is then used or shared.  The NHS regularly shares data with researchers or companies like DeepMind, sometimes in ways that may reduce the trust levels of individuals, despite the potential benefits to healthcare.

Continue reading Improving the auditability of access to data requests