Measuring mobility without violating privacy – a case study of the London Underground

In the run-up to this year’s Privacy Enhancing Technologies Symposium (PETS 2019), I noticed some decidedly non-privacy-enhancing behaviour. Transport for London (TfL) announced they will be tracking the wifi MAC addresses of devices being carried on London Underground stations. Before storing a MAC address it will be hashed with a key, but since this key will remain unchanged for an extended period (2 years), it will be possible to track the movements of an individual over this period through this pseudonymous ID. These traces are likely sufficient to link records back to an individual, given some knowledge of that person’s distinctive travel patterns. Also, for as long as the key is retained, it would be trivial for TfL (or someone who stole the key) to convert someone’s MAC address into its pseudonymised form and indisputably learn that person’s movements.

TfL argues that under the General Data Protection Regulation (GDPR), they don’t need the consent of individuals they monitor because they are acting in the public interest. Indeed, others have pointed out the value to society of knowing how people typically move through underground stations. But the GDPR also requires that organisations minimise the amount of personal data they collect. Could the same goal be achieved if TfL irreversibly anonymised wifi MAC addresses rather than just pseudonymising them? For example, they could truncate the hashed MAC address so that many devices all share the same truncated anonymous ID. How would this affect the calculation of statistics of movement patterns within underground stations? I posed these questions in a presentation at the PETS 2019 rump session, and in this article, I’ll explain how a set of algorithms designed to violate people’s privacy can be applied to collect wifi mobility information while protecting passenger privacy.
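
To make the difference concrete, here is a minimal sketch in Python (not TfL’s actual pipeline; the key, MAC address, and truncation length are illustrative assumptions) contrasting a keyed pseudonymous ID with a truncated anonymous ID:

    import hashlib
    import hmac

    SECRET_KEY = b"example-key"  # assumption: stands in for TfL's long-lived hashing key

    def pseudonymous_id(mac: str) -> str:
        # Keyed hash of the MAC address: effectively unique per device, so the same
        # device can be followed for as long as the key stays the same.
        return hmac.new(SECRET_KEY, mac.encode(), hashlib.sha256).hexdigest()

    def truncated_id(mac: str, bits: int = 16) -> str:
        # Keep only the first `bits` bits of the keyed hash, so many devices share
        # the same ID and a single ID no longer singles out one passenger.
        digest = hmac.new(SECRET_KEY, mac.encode(), hashlib.sha256).digest()
        value = int.from_bytes(digest, "big") >> (len(digest) * 8 - bits)
        return format(value, "04x")

    mac = "aa:bb:cc:dd:ee:ff"
    print(pseudonymous_id(mac))  # traceable pseudonym
    print(truncated_id(mac))     # coarse ID shared by roughly 1 in 2**16 devices

With a 16-bit truncation, roughly one in 65,536 devices maps to each ID, so aggregate counts per platform remain usable while any single ID no longer picks out an individual passenger.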

It’s important to emphasise that TfL’s goal is not to track past Underground customers but to predict the behaviour of future passengers. Inferring past behaviours from the traces of wifi records may be one means to this end, but it is not the end in itself, and TfL creates legal risk for itself by holding this data. The inferences from this approach aren’t even going to be correct: wifi users are unlikely to be typical passengers and behaviour will change over time. TfL’s hope is that the inferred profiles will be useful enough to inform business decisions. Privacy-preserving measurement techniques should be judged by the business value of the passenger models they create, not by how accurately they can follow individual passengers around underground stations in the past. As the saying goes, “all models are wrong, but some are useful”.

Simulating privacy-preserving mobility measurement

To explore this space, I built a simple simulation of Euston Station inspired by one of the TfL case studies. In my simulation, there are two platforms (A and B) and six types of passengers. Some travel from platform A to B; some from B to A; others enter and leave the station at the same platform (A or B). Passengers travelling between platforms can take either the fast route (taking 2 minutes on average) or the slow route (taking 4 minutes on average). Passengers enter the station at a Poisson arrival rate averaging one per second. The probabilities that each new passenger is of a particular type are shown in the figure below. The goal of the simulation is to infer the number of passengers of each type from observations of wifi measurements taken at platforms A and B.
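
The sketch below (Python, with illustrative type probabilities and timings rather than the values in the figure) shows the shape of such a simulation: passengers arrive as a Poisson process, are assigned a type, and are given an exit time based on their route.

    import random

    # Passenger types: (entry platform, exit platform, route). The probabilities
    # below are illustrative placeholders, not the values from the figure.
    TYPES = {
        ("A", "B", "fast"): 0.25,
        ("A", "B", "slow"): 0.15,
        ("B", "A", "fast"): 0.25,
        ("B", "A", "slow"): 0.15,
        ("A", "A", None): 0.10,  # enter and leave at platform A
        ("B", "B", None): 0.10,  # enter and leave at platform B
    }
    MEAN_TRAVEL = {"fast": 120.0, "slow": 240.0}  # seconds between platforms

    def simulate(duration_s: float = 3600.0, arrival_rate: float = 1.0):
        """Return (type, entry_time, exit_time) tuples for one simulated hour."""
        t, passengers = 0.0, []
        while t < duration_s:
            t += random.expovariate(arrival_rate)  # Poisson arrivals, ~1 per second
            ptype = random.choices(list(TYPES), weights=list(TYPES.values()))[0]
            route = ptype[2]
            dwell = random.expovariate(1 / MEAN_TRAVEL[route]) if route else 60.0
            passengers.append((ptype, t, t + dwell))
        return passengers

    counts = {}
    for ptype, _, _ in simulate():
        counts[ptype] = counts.get(ptype, 0) + 1
    print(counts)  # the ground truth the wifi observations would try to recover

The simulated wifi observations at platforms A and B are then the only information available for inferring these per-type counts.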

Continue reading Measuring mobility without violating privacy – a case study of the London Underground

A Reflection on the Waves Of Malice: Malicious File Distribution on the Web (part 2)

The first part of this article introduced the malicious file download dataset and the delivery network structure. This final part explores the types of files delivered, discusses how the network varies over time, and concludes with challenges for the research community.

The Great Divide: A PUP Ecosystem and a Malware Ecosystem

We found a notable divide in the delivery of PUP and malware. First, there is much more PUP than malware in the wild: we found PUP-to-malware ratios of 5:1 by number of SHA-2s, and 17:2 by number of raw downloads. Second, we found that mixed delivery mechanisms of PUP and malware are not uncommon (e.g., see our Opencandy case study in the paper). Third, the highly connected Giant Component is predominantly a PUP Ecosystem (8:1 PUP-to-malware by number of SHA-2s), while the many “islands” of download activity outside of this component are predominantly a Malware Ecosystem (1.78:1 malware-to-PUP by number of SHA-2s).

Comparing the structures of the two ecosystems, we found that the PUP Ecosystem leverages a higher degree of IP address and autonomous system (AS) usage per domain and per dropper than the Malware Ecosystem, possibly indicating higher CDN usage or the use of evasive fast-flux techniques to change IP addresses (though, given earlier results, the former is the more likely). On the other hand, the Malware Ecosystem delivered fewer SHA-2s per domain than the PUP Ecosystem, while the overall numbers of raw downloads remained the same, which could again indicate a disparity in the use of CDNs between the two ecosystems (i.e., CDNs typically deliver a wide range of content). At the same time, fewer suspicious SHA-2s being delivered per domain could also be attributable to evasive techniques (e.g., malicious sites delivering a few types of files before changing domain) or to distributors in this ecosystem dealing with fewer clients and smaller operations.

We tried to estimate the number of PPIs in the wild by defining a PPI service as a network-only component (or group of components aggregated by e2LD) that delivered more than one type of malware or PUP family. Using this heuristic, we estimated a lower bound of 394 PPIs operating on that day, 215 of which were in the PUP Ecosystem. In terms of proportions, we found that the largest individual PPIs in the PUP and Malware Ecosystems involved about 99% and 24% of all e2LDs and IPs in their ecosystems, respectively.
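
As a rough illustration of this heuristic (a Python sketch with a made-up input schema, not the paper’s actual pipeline), one could count the distinct families delivered by each component and flag those that deliver more than one:

    from collections import defaultdict

    def estimate_ppis(deliveries):
        """Apply the heuristic: a component delivering more than one distinct
        malware/PUP family is counted as a candidate PPI service.

        `deliveries` is a list of (component_id, family) pairs, with components
        already aggregated by e2LD; the schema here is illustrative only."""
        families_per_component = defaultdict(set)
        for component, family in deliveries:
            families_per_component[component].add(family)
        return [c for c, fams in families_per_component.items() if len(fams) > 1]

    example = [
        ("component-1", "opencandy"), ("component-1", "installcore"),
        ("component-2", "zbot"),
        ("component-3", "dealply"), ("component-3", "zbot"),
    ]
    print(estimate_ppis(example))  # ['component-1', 'component-3']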

With there being a number of possible explanations for these structural differences between ecosystems, and such a high degree of potential PPI usage in the wild (especially within the PUP Ecosystem), this is clearly an area in which further research is required.

Keeping Track of the Waves

The final part of the study involved tracking these infrastructures and their activities over time. Firstly, we generated tracking signatures of the network-only (server-side) and file-only (client-side) delivery infrastructures. In essence, this involved tracking the root and trunk nodes in a component, which typically had the highest node degrees and thus were more likely to be stable, as opposed to the leaf nodes, which were more likely to be ephemeral.
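
The sketch below (Python with networkx, on a toy graph rather than the paper’s data or exact method) illustrates the general idea: take the highest-degree nodes of a delivery component as its relatively stable signature, which can then be matched against later snapshots.

    import networkx as nx

    def component_signature(g, top_n=3):
        """Signature of a delivery component: its highest-degree (root/trunk) nodes,
        which are more likely to persist across snapshots than ephemeral leaf nodes."""
        ranked = sorted(g.degree, key=lambda node_deg: node_deg[1], reverse=True)
        return frozenset(node for node, _ in ranked[:top_n])

    # Toy component: a dropper SHA-2 and a domain fanning out to payloads and an IP.
    g = nx.Graph()
    g.add_edges_from([
        ("dropper-sha2", "evil.example"),
        ("dropper-sha2", "198.51.100.7"),
        ("evil.example", "payload-sha2-1"),
        ("evil.example", "payload-sha2-2"),
    ])
    print(component_signature(g))  # the stable core to track from day to day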

Continue reading A Reflection on the Waves Of Malice: Malicious File Distribution on the Web (part 2)

A Reflection on the Waves Of Malice: Malicious File Distribution on the Web (part 1)

The French cybercrime unit, C3N, along with the FBI and Avast, recently took down the Retadup botnet that infected more than 850,000 computers, mostly in South America. Though this takedown operation was successful, the botnet was created as early as 2016, with the operators reportedly making millions of euros since. It is clear that large-scale analysis, monitoring, and detection of malicious downloads and botnet activity, even as far back as 2016, is still highly relevant today in the ongoing battle against increasingly sophisticated cybercriminals.

Malware delivery has undergone an impressive evolution since its inception in the 1980s, moving from being an amateur endeavor to a well-oiled criminal business. Delivery methods have evolved from the human-centric transfer of physical media (e.g., floppy disks), sending of malicious emails, and social engineering, to the automated delivery mechanisms of drive-by downloads (malicious code execution on websites and web advertisements), packaged exploit kits (software packages that fingerprint user browsers for specific exploits to maximise the coverage of potential victims), and pay-per-install (PPI) schemes (botnets that are rented out to other cybercriminals).

Furthermore, in recent times, researchers have uncovered the parallel economy of potentially unwanted programs (PUP), which share many traits with the malware ecosystem (such as their delivery through social engineering and PPI networks), while being primarily controlled by different actors. However, with the exception of some types of PUP, such as adware and spyware, PUP has generally been regarded as an annoyance rather than a direct threat to security.

Using the download metadata of millions of users worldwide from 2015/16, we (Colin C. Ife, Yun Shen, Steven J. Murdoch, Gianluca Stringhini) carried out a comprehensive measurement study over the short term (a 24-hour period), the medium term (daily, over the course of a month), and the long term (weekly, over the course of a year) to characterise the structure of this complex malicious file delivery ecosystem on the Web and how it evolves over time. This work answers some key questions while posing some more, and it exemplifies some significant issues that continue to hinder security research on unwanted software activity.

An Overview

There were three main research questions that influenced this study, which we will traverse in the following sections of this post:

    1. What does the malicious file delivery ecosystem look like?
    2. How do the networks that deliver only malware, only PUP, or both compare in structure?
    3. How do these file delivery infrastructures and their activities change over time?

For full technical details, you can refer to our paper – Waves of Malice: A Longitudinal Measurement of the Malicious File Delivery Ecosystem on the Web – published in the proceedings of and presented at the ACM AsiaCCS 2019 conference.

The Data

The dataset was provided (and pre-sanitised) by Symantec and consisted of 129 million download events generated by 12 million users. Each download event contained information such as the timestamp, the SHA-2s of the downloaded file and its parent file, the filename, the size (in bytes), the referrer URL, the host URLs (landing pages after redirection) of the download and parent file, and the IP address hosting the download.
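
For illustration, a single download event can be thought of as a record along the following lines (the field names here are ours, for exposition, and not Symantec’s actual schema):

    from dataclasses import dataclass

    @dataclass
    class DownloadEvent:
        """One download event from the telemetry (illustrative field names)."""
        timestamp: int        # time of the download
        file_sha2: str        # SHA-2 of the downloaded file
        parent_sha2: str      # SHA-2 of the parent file that triggered the download
        filename: str
        size_bytes: int
        referrer_url: str     # referrer of the download
        host_url: str         # landing page of the download after redirection
        parent_host_url: str  # landing page of the parent file
        server_ip: str        # IP address hosting the download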

Continue reading A Reflection on the Waves Of Malice: Malicious File Distribution on the Web (part 1)

Beyond Regulators’ Concerns, Facebook’s Libra Cryptocurrency Faces another Big Challenge: The Risk of Fraud

Facebook has attracted attention through the announcement of their blockchain-based payment network, Libra. This won’t be the first payment system Facebook has launched, but what makes Facebook’s Libra distinctive is that rather than transferring euros or dollars, the network is designed for a new cryptocurrency, also called Libra. This currency is backed by a reserve of nationally-issued currencies, and so Facebook hopes it will avoid the high volatility of cryptocurrencies like Bitcoin. As a result, Libra won’t be attractive to currency speculators, but Facebook hopes it will, therefore, be useful for its stated goal – to be a “simple global currency and financial infrastructure that empowers billions of people.”

Reducing currency volatility is only one step towards meeting this goal of scaling cryptocurrencies to billions of users. The Libra blockchain design addresses how the network can maintain the high throughput and low transaction fees needed to compete with existing payment networks like Visa or MasterCard. However, a question that is equally important but as yet unanswered is how Facebook will develop a secure authentication and fraud prevention system that can scale to billions of users while maintaining good usability and low cost.

Facebook designed the Libra network, but in contrast to traditional payment networks, the Libra network is open. Anyone can send transactions through the network, and anyone can write programs (known as “smart contracts”) that control how, and under what conditions, funds can move between Libra accounts. To comply with anti-money-laundering regulations, Know Your Customer (KYC) checks will be performed, but only when Libra enters or leaves the network through exchanges. Transactions moving funds within the network should be accepted if they meet the criteria set out in the applicable smart contract, regardless of who sent them.

The Libra network isn’t even restricted to transactions transferring the Libra currency. Facebook has explicitly designed the Libra blockchain to make it easy for anyone to implement their own currency and benefit from the same technical facilities that Facebook designed for its currency. Other blockchains have tried this. For example, Ethereum has spawned hundreds of special-purpose currencies. But programming a smart contract to implement a new currency is difficult, and errors can be costly. The programming language for smart contracts within the Libra network is designed to help developers avoid some of the most common mistakes.

Facebook’s Libra and Securing the Calibra Wallet

There’s more to setting up an effective currency than just the technology: regulatory compliance, a network of exchanges, and monetary policy are essential. Facebook, through setting up the Libra Association, is focusing its efforts here solely on the Libra currency. The widespread expectation is, therefore, that at least initially the Libra cryptocurrency will be the dominant use of the network, and most users will send and receive funds through the Calibra wallet smartphone app, developed by a Facebook subsidiary. From the perspective of the vast majority of the world, the Calibra wallet will be synonymous with Facebook’s Libra, and so damage to trust in Calibra will damage the reputation of Libra as a whole.

Continue reading Beyond Regulators’ Concerns, Facebook’s Libra Cryptocurrency Faces another Big Challenge: The Risk of Fraud

Tracing transactions across cryptocurrency ledgers

The Bitcoin whitepaper identifies the risks of revealing the owners of addresses. It states that “if the owner of a key is revealed, linking could reveal other transactions that belonged to the same owner.” Five years later, we have seen many projects which look at de-anonymising entities in Bitcoin. Such projects use techniques such as address tagging and clustering to tie many addresses to one entity, making it easier to analyse the movement of funds. However, this is not limited to Bitcoin: similar analysis is possible on alternative cryptocurrencies such as Zcash and Monero. Thus tracing transactions on-chain is a known and studied problem.

But we have recently seen a shift towards entities performing cross-currency trades. For example, the WannaCry hackers laundered over $142,000 worth of Bitcoin from ransoms across cryptocurrencies. The issue here is that cross-chain transactions appear to be indistinguishable from native transactions on-chain. For example, to trade Bitcoin for Monero, one would send bitcoin to the exchange, and in return, the exchange sends the user some coins in Monero. These two transactions occur on separate chains and do not appear to be connected, so the actual swap is effectively obscured. This level of obscurity can be used to hide the original flow of coins, giving users an additional form of anonymity.

Thus it is important to ask whether we can analyse such transactions, and if so, how and to what extent. In our paper being presented today at the USENIX Security Symposium, we (Haaroon Yousaf, George Kappos, and Sarah Meiklejohn) answer these questions.

Our Research

In summary, we scraped and linked over 1.3 million transactions across different blockchains from the service ShapeShift. In doing so, we found over 100,000 cases where users converted coins to another currency and then moved right back to the original one, identified that a Bitcoin address associated with CoinPayments.net is a very popular destination for users to shift to, and saw that scammers preferred shifting their Ethereum to Bitcoin and Monero.

We collected and analysed 13 months of transaction data across eight different blockchains to identify how users interacted with this service. In doing so, we developed new heuristics and identified various patterns of cross-currency trades.
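
To give a flavour of what such a heuristic can look like (a toy Python sketch under simplifying assumptions, not the heuristics developed in the paper), one might link an on-chain deposit to a ShapeShift trade when the currency and amount agree and the timing is plausible:

    def link_deposits_to_trades(deposits, trades, window_s=1800, tolerance=1e-6):
        """Toy linking heuristic. `deposits` are on-chain transactions, e.g.
        {"txid": "...", "currency": "BTC", "amount": 0.5, "time": 1000};
        `trades` are ShapeShift records, e.g.
        {"in_currency": "BTC", "in_amount": 0.5, "out_currency": "XMR", "time": 1200}.
        A deposit matches a trade if the currency and amount agree (within a
        tolerance) and the deposit precedes the trade by at most `window_s` seconds."""
        links = []
        for d in deposits:
            for t in trades:
                if (d["currency"] == t["in_currency"]
                        and abs(d["amount"] - t["in_amount"]) < tolerance
                        and 0 <= t["time"] - d["time"] <= window_s):
                    links.append((d["txid"], t))
        return links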

What is ShapeShift? 

ShapeShift is a lightweight, non-custodial service that allows users to trade coins directly from one currency to another (a cross-currency shift). The service acts as the counterparty that facilitates the entire trade, allowing users to essentially swap their coins with its own supply. ShapeShift and Changelly are examples of such services.

Continue reading Tracing transactions across cryptocurrency ledgers

Next version of Android might introduce new security risks for online banking, 2FA, and more

Google is preparing new functionality for Android that will allow apps to retrieve and auto-fill security codes from SMS. Last year Apple introduced a similar feature to iOS and macOS, for which we discovered security risks for online banking, two-factor authentication, and other services. Will Google come up with a better design? In this post, we analyse what we know about this feature so far. 


The latest developer beta of Google Play Services (18.7.13 beta) contains code fragments that show a new Android permission to automatically retrieve verification codes from text messages. This feature has not yet been fully implemented, but the available code allows for some analysis and early evaluation for possible security risks, akin to similar risks we demonstrated in 2018 for the Security Code AutoFill feature in iOS and macOS.

Background

It seems that Google is updating the “Autofill Framework”, introduced with Android 8.0 in 2017, to include the new functionality. Previously, this framework’s sole purpose was to support the autofill functionality of password managers in Android apps and websites. The code fragments of this new feature reveal the names and descriptions of the associated system setting and corresponding runtime permission requests, shown below.

The likely UI of the new setting in Android to enable/disable SMS Code Auto-fill.
The likely UI of the new runtime permission request in Android to deny or allow an application’s access to the SMS Code Auto-fill feature.

Continue reading Next version of Android might introduce new security risks for online banking, 2FA, and more

Confirmation of Payee is coming, but will it protect bank customers from fraud?

The Payment Systems Regulator (PSR) has just announced that the UK’s six largest banks must check whether the name of the recipient of a transfer matches what the sender thinks. This new feature should help address a security loophole in online payments: the name of the recipient of transfers is ignored, contrary to expectations and unlike cheques. This improved security should make some fraud more difficult, but banks must be prevented from exploiting the change to unfairly shift liability for the remaining crime onto the victims.

The PSR’s target is for checks to be fully implemented by March 2020, somewhat later than their initial promise to Parliament of September 2018 and subsequent target of July 2019. The new proposal, known as Confirmation of Payee, also only covers the six largest banking groups, but this should cover 90% of transfers. Its goal is to defend against criminals who trick victims into transferring funds under the false pretence that the money is going to the victim’s new account, whereas it is really going to the criminal. The losses from such fraud, known as push payment scams, are often life-changing, resulting in misery for the victims.

Checks on the recipient name will make this particular scam harder, so while unlikely to prevent all types of push payment scams, they will hopefully force criminals to adopt strategies that are easier to prevent. The risk that consumer representatives and regulators will need to watch out for is that these new security measures could result in victims being unfairly held liable. This scenario is, unfortunately, likely because the voluntary consumer protection code for push payment scams excuses the bank from liability if they show the customer a Confirmation of Payee warning.

Warning fatigue and misaligned incentives

In my response to the consultation over this consumer protection code, I raised the issue of “warning fatigue” – that customers will be shown many irrelevant warnings while they do online banking, and this reduces the likelihood that customers will notice important ones. Even Confirmation of Payee warnings will frequently be wrong, such as if the recipient’s bank account is under a different name to what the sender expects. If the two names are very dissimilar, the sender won’t be given more details, but if the name entered is close to the name in bank records, the sender will be told the correct name and asked to compare.
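
As a toy illustration of this three-way outcome (a Python sketch, not the matching algorithm banks will actually use; the thresholds are assumptions), a Confirmation of Payee check might look like this:

    from difflib import SequenceMatcher

    def check_payee(entered_name, name_on_record):
        """Toy Confirmation of Payee check: strong match, close match (reveal the
        recorded name so the sender can compare), or no match (reveal nothing)."""
        similarity = SequenceMatcher(None, entered_name.lower(),
                                     name_on_record.lower()).ratio()
        if similarity > 0.95:
            return "match"
        if similarity > 0.7:
            return f"close match - the account is held under '{name_on_record}'"
        return "no match - no further details given"

    print(check_payee("J Smith", "John Smith"))        # close match, name revealed
    print(check_payee("J Smith", "Acme Widgets Ltd"))  # no match, nothing revealed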

Continue reading Confirmation of Payee is coming, but will it protect bank customers from fraud?

UCL’s Centre for Doctoral Training in Cybersecurity

It has become increasingly apparent that the world’s cybersecurity challenges will not be resolved by specialists working in isolation.

Indeed, it has become clear that the challenges that arise from the integration of emerging technologies into existing social, commercial, legal and political systems will not be resolved by specialists working in isolation. Rather, these complex problems require the efforts of people who can cross disciplinary boundaries, communicate beyond their own fields, and comprehend the context in which others operate. Computer science, information security, encryption, criminology, psychology, international relations, public policy, philosophy of science, legal studies, and economics combine to form the ecosystem within which cybersecurity problems and solutions are found, but training people to think and work across these boundaries has proven difficult.

UCL is delighted to have been awarded funding by the UK’s Engineering and Physical Sciences Research Council (EPSRC) to establish a Centre for Doctoral Training (CDT) in Cybersecurity that will help to establish a cadre of leaders in security with the breadth of perspective and depth of skills required to handle the complex challenges in security faced by our society. The CDT is led by Prof Madeline Carr (Co-Director; UCL Science, Technology, and Public Policy), Prof Shane Johnson (Co-Director; UCL Security and Crime Science), and Prof David Pym (Director; UCL Programming Principles, Logic, and Verification (PPLV) and Information Security).

The CDT is an exciting collaboration that brings together research teams in three of UCL’s departments – Computer Science, Security and Crime Science, and Science, Technology, Engineering, and Public Policy – in order to increase the capacity of the UK to respond to future information and cybersecurity challenges. Through an interdisciplinary approach, the CDT will train cohorts of highly skilled experts drawn from across the spectrum of the engineering and social sciences, able to become the next generation of UK leaders in industry and government, public policy, and scientific research. The CDT will equip them with a broad understanding of all sub-fields of cybersecurity, as well as specialized knowledge and transferable skills to be able to operate professionally in business, academic, and policy circles.

The CDT will admit candidates with a strong background in STEM (CS, Mathematics, Engineering, Physics) or Social Sciences (Psychology, Sociology, International Relations, Public Policy, Crime Science, Economics, and Management), either recent graduates or mid-career. Each will be trained in research and innovation skills across the multidisciplinary facets of cybersecurity (computing, crime science, management, and public policy), and will then specialise within a discipline, gaining industrial experience through joint industrial projects and internships.

For more information, including directions for applications, please visit the cybersecurity CDT website.

Hiring Research Assistants and PhD students

We’re happy to announce that we have several open positions!

Privacy & machine learning

Emiliano De Cristofaro has at least one post-doc position in privacy and machine learning. The researcher will work with him and others in UCL’s InfoSec group. For a sample of our recent work in the field, please see Emiliano’s publications on this topic.

Please email jobs@emilianodc.com with questions or apply directly before 25 July 2019.

Note that we would be keen to hear both from PhD students looking for part-time research work and from people looking for longer-term full-time post-doctoral positions.

Web measurements

Multiple positions are available in the context of a project based at the Alan Turing Institute on cyberbullying and cyberhate, led by Emiliano De Cristofaro and Gareth Tyson. The project will primarily focus on measurements research, i.e., gathering and analysing various types of social datasets.

For a sample of our recent work in this space, please see Emiliano’s publications on this topic.

Again, we would be keen to hear both from PhD students looking for part-time research work and from people looking for longer-term full-time post-doctoral positions.

Please email edecristofaro@turing.ac.uk if you have questions.

PhD positions with Philipp Jovanovic

Members of the InfoSec group are always looking for talented PhD students to join their team. If you would like to investigate opportunities, please do check their website for details of their research interests and contact instructions. We are particularly happy to announce that Philipp Jovanovic will join our group as an Associate Professor starting in January 2020, and he is inviting applications for PhD students.

Philipp’s research interests broadly cover applied cryptography, privacy, and decentralised systems. His current work focuses on building scalable, privacy-preserving, decentralised protocols (such as ByzCoin, RandHound, OmniLedger, or Calypso). He has also worked on a wide variety of other security-related topics in the past, including design and analysis of symmetric cryptographic primitives, side-channel attacks and countermeasures, and the security analysis of systems deployed in the real world such as the Transport Layer Security (TLS) protocol or the Open Smart Grid Protocol (OSGP).

For an overview of his work, please visit Philipp’s website.

If you’re interested in working with Philipp as a PhD student, please email philipp@jovanovic.io.

New CDT in cybersecurity

We have several PhD positions funded through the new Centre for Doctoral Training in Cybersecurity (CDT). Please see the article about the CDT for more details and instructions to apply.

Will dispute resolution be Libra’s Achilles’ heel?

Facebook’s new cryptocurrency, Libra, has the ambitious goal of being the “financial infrastructure that empowers billions of people”. This aspiration will only be achievable if the user experience (UX) of Libra and associated technologies is competitive with existing payment channels. Now, Facebook has an excellent track record of building high-quality websites and mobile applications, but good UX goes further than just having an aesthetically pleasing and fast user interface. We can already see aspects of Libra’s design that will have consequences for the experience of its users making payments.

For example, the basket of assets that underlies the Libra currency should ensure that its value is not too volatile in terms of the currencies represented within the reserve, so easing international payments. However, Libra’s value will fluctuate against every other currency, creating a challenge for domestic payments. People won’t be paid their salary in Libra any time soon, nor will rents be denominated in Libra. If the public is expected to hold significant value in Libra, fluctuations in the currency markets could make the difference between someone being able to pay their rent or not – certainly an unwelcome user experience.

Whether the public will consider the advantages of Libra worth the exposure to the foibles of market fluctuations is an open question, but in this post, I’m mostly going to discuss the consequences of another design decision baked into Libra: that transactions are irrevocable. Once a transaction is accepted by the validator network, the user may proceed “knowing that the transaction can never be changed or reversed”. This is a common design decision within cryptocurrencies because it ensures that companies, governments and regulators should be unable to revoke payments they dislike. When coupled with anonymity or decentralisation, to prevent blacklisted transactions from being blocked beforehand, irrevocability creates a censorship-resistant payment system.

Mitigating the cost of irrevocable transactions

Libra isn’t decentralised, nor is it anonymous, so it is unlikely to be particularly resistant to censorship over matters where there is an international consensus. Irrevocability does, however, make fraud easier because once stolen funds are gone, they cannot be reinstated, even if the fraud is identified. Other cryptocurrencies share Libra’s irrevocability (at least in theory), but they are designed for technically sophisticated users, and their risk of theft can be balanced against the potentially substantial gains (and losses) that can be made from volatile cryptocurrencies. While irrevocability is common within cryptocurrencies, it is not within the broader payments industry. Exposing billions of people to the risk of their Libra holdings being stolen, without the potential for recourse, isn’t good UX. I’ve argued that irrevocable transactions protect the interests of financial institutions over those of the public, and are the wrong default for payments. Eventually, public pressure and regulatory intervention forced UK banks to revoke fraudulent transactions and to take on the risk when they are unable to do so, rather than passing it on to the victims. The same argument applies to Libra, and if fraud becomes common, Libra will face the same pressures as UK banks did.

Continue reading Will dispute resolution be Libra’s Achilles’ heel?