Moving towards security and privacy experiments for the real world

Jono and I recently presented our joint paper with Simon and Angela at the Learning from Authoritative Security Experiment Results (LASER) Workshop in San Jose, CA, USA. The workshop was co-located with the IEEE Security and Privacy Symposium. LASER has a different focus each year; in 2016, presented papers explored new approaches to computer security experiments that are repeatable and can be shared across communities.

Through our LASER paper, “Towards robust experimental design for user studies in security and privacy”, we wanted to advance the quest for better experiment design and execution. We proposed the following five principles for conducting robust experiments into usable security and privacy:

  1. Give participants a primary task
  2. Ensure participants experience realistic risk
  3. Avoid priming the participants
  4. Perform experiments double-blind whenever possible
  5. Define these elements precisely: threat model; security; privacy and usability

Understanding users and their interaction with security is a blind spot for many security practitioners and designers. Learning from prior studies within and outside our research group, we have defined principles for conducting robust experiments into usable security and privacy. These principles are informed by efforts in other fields such as biology, qualitative research methods, and medicine, where four overarching experiment-design factors guided our principles:

Internal validity – The experiment is of “suitable scope to achieve the reported results” and is not “susceptible to systematic error”.

External validity – The result of the experiment “is not solely an artifact of the laboratory setting”.

Containment  – There are no “confounds” in the results, and no experimental “effects are a threat to safety” of the participants, the environment, or society generally.

Transparency – “There are no explanatory gaps in the experimental mechanism” and the explanatory “diagram for the experimental mechanism is complete”, in that it covers all relevant entities and activities.

Continue reading Moving towards security and privacy experiments for the real world

Adblocking and Counter-Blocking: A Slice of the Arms Race

anti-adblocking message from WIRED
If you use an adblocker, you are probably familiar with messages of the kind shown above, asking you to either disable your adblocker, or to consider supporting the host website via a donation or subscription. This is the battle du jour in the ongoing adblocking arms race — and it’s one we explore in our new report Adblocking and Counter-Blocking: A Slice of the Arms Race.

The reasons for the rising popularity of adblockers include improved browsing experience, better privacy, and protection against malvertising. As a result, online advertising revenue is gravely threatened by adblockers, prompting publishers to actively detect adblock users, and subsequently block them or otherwise coerce the user to disable the adblocker — practices we refer to as anti-adblocking. While there has been a degree of sound and fury on the topic, until now we haven’t been able to understand the scale, mechanism and dynamics of anti-adblocking. This is the gap we have started to address, together with researchers from the University of Cambridge, Stony Brook University, University College London, University of California Berkeley, Queen Mary University of London and International Computer Science Institute (Berkeley). We address some of these questions by leveraging a novel approach for identifying third-party services shared across multiple websites to present a first characterization of anti-adblocking across the Alexa Top-5K websites.

We find that at least 6.7% of Alexa Top-5K websites employ anti-adblocking, with the practices finding adoption across a diverse mix of publishers; particularly publishers of “General News”, “Blogs/Wiki”, and “Entertainment” categories. It turns out that these websites owe their anti-adblocking capabilities to 14 unique scripts pulled from 12 unique domains. Unsurprisingly, the most popular domains are those that have skin in the game — Google, Taboola, Outbrain, Ensighten and Pagefair — the latter being a company that specialises in anti-adblocking services. Then there are in-house anti-adblocking solutions that are distributed by a domain to client websites belonging to the same organisation: TripAdvisor distributes an anti-adblocking script to its eight websites with different country code top-level domains, while adult websites (all hosted by MindGeek) turn to DoublePimp. Finally, we visited a sample website for each anti-adblocking script via AdBlock Plus, Ghostery and Privacy Badger, and discovered that half of the 12 anti-adblocking suppliers are counter-blocked by at least one adblocker — suggesting that the arms race has already entered the next level.

It is hard to say how many levels deeper the adblocking arms race might go. While anti-adblocking may provide temporary relief to publishers, it is essentially band-aid solution to mask a deeper issue — the disequilibrium between ads (and, particularly, their behavioural tracking back-end) and information. Any long term solution must address the reasons that brought users to adblockers in the first place. In the meantime, as the arms race continues to escalate, we hope that studies such as ours will bring transparency to this opaque subject, and inform policy that moves us out of the current deadlock.


“Ad-Blocking and Counter Blocking: A Slice of the Arms Races” by Rishab Nithyanand, Sheharbano Khattak, Mobin Javed, Narseo Vallina-Rodriguez, Marjan Falahrastegar, Julia E. Powles, Emiliano De Cristofaro, Hamed Haddadi, and Steven J. Murdoch. arXiv:1605.05077v1 [cs.CR], May 2016.

This post also appears on the University of Cambridge Computer Laboratory Security Group blog, Light Blue Touchpaper.

On the hunt for Facebook’s army of fake likes

As social networks are increasingly relied upon to engage with people worldwide, it is crucial to understand and counter fraudulent activities. One of these is “like farming” – the process of artificially inflating the number of Facebook page likes. To counter them, researchers worldwide have designed detection algorithms to distinguish between genuine likes and artificial ones generated by farm-controlled accounts. However, it turns out that more sophisticated farms can often evade detection tools, including those deployed by Facebook.

What is Like Farming?

Facebook pages allow their owners to publicize products and events and in general to get in touch with customers and fans. They can also promote them via targeted ads – in fact, more than 40 million small businesses reportedly have active pages, and almost 2 million of them use Facebook’s advertising platform.

At the same time, as the number of likes attracted by a Facebook page is considered a measure of its popularity, an ecosystem of so-called “like farms” has emerged that inflate the number of page likes. Farms typically do so either to later sell these pages to scammers at an increased resale/marketing value or as a paid service to page owners. Costs for like farms’ services are quite volatile, but they typically range between $10 and $100 per 100 likes, also depending on whether one wants to target specific regions — e.g., likes from US users are usually more expensive.

Screenshot from
Screenshot from

How do farms operate?

There are a number of possible way farms can operate, and ultimately this dramatically influences not only their cost but also how hard it is to detect them. One obvious way is to instruct fake accounts, however, opening a fake account is somewhat cumbersome, since Facebook now requires users to solve a CAPTCHA and/or enter a code received via SMS. Another strategy is to rely on compromised accounts, i.e., by controlling real accounts whose credentials have been illegally obtained from password leaks or through malware. For instance, fraudsters could obtain Facebook passwords through a malicious browser extension on the victim’s computer, by hijacking a Facebook app, via social engineering attacks, or finding credentials leaked from other websites (and dumped on underground forums) that are also valid on Facebook.

Continue reading On the hunt for Facebook’s army of fake likes

Privately gathering statistics and training simple models

Last week, Luca Melis has presented our NDSS16 paper “Efficient Private Statistics with Succinct Sketches“, where we show how to privately and efficiently aggregate data from many sources and/or large streams, and then use the aggregate to extract useful statistics and train simple machine learning models.

Our work is motivated by a few “real-world” problems:

  • Media broadcasting providers like the BBC (with which we collaborate) routinely collect data from their users about videos they have watched (e.g., on BBC’s iPlayer) in order to provide users with personalized suggestions for other videos, based on recommender systems like Item k-Nearest Neighbor (ItemKNN)
  • Urban and transport planning committees, such as London’s mass transport operators, need to gather statistics about people’s movements and commutes, e.g., to improve transportation services and predict near-future trends and anomalies on a short notice.
  • Network infrastructures like the Tor network need to gather traffic statistics, like the number of, and traffic generated by, Tor hidden services, in order to tune design decisions as well as convince their founders the infrastructure is used for the intended purposes.

While different in their application, these examples exhibit a common feature: the need for providers to aggregate large amounts of sensitive information from large numbers of data sources, in order to produce aggregate statistics and possibly train machine learning models.

Prior work has proposed a few cryptographic tools for privacy-enhanced computation that could be use for private collection of statistics. For instance, by relying on homomorphic encryption and/or secret sharing, an untrusted aggregator can receive encrypted readings from users and only decrypt their sum. However, these require users to perform a number of cryptographic operations, and transmit a number of ciphertexts, linear in the size of their inputs, which makes it impractical for the scenarios discussed above, whereby inputs to be aggregated are quite large. For instance, if we use ItemKNN for the recommendations, we would need to aggregate values for “co-views” (i.e., videos that have been watched by the same user) of hundreds of videos at the time – thus, each user would have to encrypt and transfer hundreds of thousands of values at the time.

Scaling private aggregation

We tackle the problem from two points of view: an “algorithmic” one and a “system” one. That is, we have worked both on the design of the necessary cryptographic and data structure tools, as well as on making it easy for application developers to easily support these tools in web and mobile applications.

Our intuition is that, in many scenarios, it might be enough to collect estimates of statistics and trade off an upper-bounded error with significant efficiency gains. For instance, the accuracy of a recommender system might not be really affected if the statistics we need to train the model are approximated with a small error.

Continue reading Privately gathering statistics and training simple models

“Do you see what I see?” ask Tor users, as a large number of websites reject them but accept non-Tor users

If you use an anonymity network such as Tor on a regular basis, you are probably familiar with various annoyances in your web browsing experience, ranging from pages saying “Access denied” to having to solve CAPTCHAs before continuing. Interestingly, these hurdles disappear if the same website is accessed without Tor. The growing trend of websites extending this kind of “differential treatment” to anonymous users undermines Tor’s overall utility, and adds a new dimension to the traditional threats to Tor (attacks on user privacy, or governments blocking access to Tor). There is plenty of anecdotal evidence about Tor users experiencing difficulties in browsing the web, for example the user-reported catalog of services blocking Tor. However, we don’t have sufficient detail about the problem to answer deeper questions like: how prevalent is differential treatment of Tor on the web; are there any centralized players with Tor-unfriendly policies that have a magnified effect on the browsing experience of Tor users; can we identify patterns in where these Tor-unfriendly websites are hosted (or located), and so forth.

Today we present our paper on this topic: “Do You See What I See? Differential Treatment of Anonymous Users” at the Network and Distributed System Security Symposium (NDSS). Together with researchers from the University of Cambridge, University College London, University of California, Berkeley and International Computer Science Institute (Berkeley), we conducted comprehensive network measurements to shed light on websites that block Tor. At the network layer, we scanned the entire IPv4 address space on port 80 from Tor exit nodes. At the application layer, we fetch the homepage from the most popular 1,000 websites (according to Alexa) from all Tor exit nodes. We compare these measurements with a baseline from non-Tor control measurements, and uncover significant evidence of Tor blocking. We estimate that at least 1.3 million IP addresses that would otherwise allow a TCP handshake on port 80 block the handshake if it originates from a Tor exit node. We also show that at least 3.67% of the most popular 1,000 websites block Tor users at the application layer.

Continue reading “Do you see what I see?” ask Tor users, as a large number of websites reject them but accept non-Tor users

Scaling Tor hidden services

Tor hidden services offer several security advantages over normal websites:

  • both the client requesting the webpage and the server returning it can be anonymous;
  • websites’ domain names (.onion addresses) are linked to their public key so are hard to impersonate; and
  • there is mandatory encryption from the client to the server.

However, Tor hidden services as originally implemented did not take full advantage of parallel processing, whether from a single multi-core computer or from load-balancing over multiple computers. Therefore once a single hidden service has hit the limit of vertical scaling (getting faster CPUs) there is not the option of horizontal scaling (adding more CPUs and more computers). There are also bottle-necks in the Tor networks, such as the 3–10 introduction points that help to negotiate the connection between the hidden service and the rendezvous point that actually carries the traffic.

For my MSc Information Security project at UCL, supervised by Steven Murdoch with the assistance of Alec Muffett and other Security Infrastructure engineers at Facebook in London, I explored possible techniques for improving the horizontal scalability of Tor hidden services. More precisely, I was looking at possible load balancing techniques to offer better performance and resiliency against hardware/network failures. The focus of the research was aimed at popular non-anonymous hidden services, where the anonymity of the service provider was not required; an example of this could be Facebook’s .onion address.

One approach I explored was to simply run multiple hidden service instances using the same private key (and hence the same .onion address). Each hidden service periodically uploads its own descriptor, which describes the available introduction points, to six hidden service directories on a distributed hash table. The hidden service instance chosen by the client depends on which hidden service instance most recently uploaded its descriptor. In theory this approach allows an arbitrary number of hidden service instances, where each periodically uploads its own descriptors, overwriting those of others.

This approach can work for popular hidden services because, with the large number of clients, some will be using the descriptor most recently uploaded, while others will have cached older versions and continue to use them. However my experiments showed that the distribution of the clients over the hidden service instances set up in this way is highly non-uniform.

I therefore ran experiments on a private Tor network using the Shadow network simulator running multiple hidden service instances, and measuring the load distribution over time. The experiments were devised such that the instances uploaded their descriptors simultaneously, which resulted in different hidden service directories receiving different descriptors. As a result, clients connecting to a hidden service would be balanced more uniformly over the available instances.

Continue reading Scaling Tor hidden services

What are the social costs of contactless fraud?

Contactless payments are in the news again: in the UK the spending limit has been increased from £20 to £30 per transaction, and in Australia the Victoria Police has argued that contactless payments are to blame for an extra 100 cases of credit card fraud per week. These frauds are where multiple transactions are put through, keeping each under the AUS $100 (about £45) limit. UK news coverage has instead focussed on the potential for cross-channel fraud: where card details are skimmed from contactless cards then used for fraudulent online purchases. In a demonstration, Which? skimmed volunteers cards at a distance then bought a £3,000 TV with the card numbers and expiry dates recorded.

The media have been presenting contactless payments are insecure; the response from the banking industry is to point out that customers are not liable for the fraudulent transactions. Both are in some ways correct, but in other ways are missing the point.

The law in the UK (Payment Services Regulations (PSR) 2009, Regulation 62) indeed does say that the customers are entitled to a refund for fraudulent transactions. However a bank will only do this if they are convinced the customer has not authorised the transaction, and was not negligent. In my experience, a customer who is unable to clearly, concisely and confidently explain why they are entitled to a refund runs a high risk of not getting one. This fact will disproportionately disadvantage the more vulnerable members of society.

Continue reading What are the social costs of contactless fraud?

Experimenting with SSL Vulnerabilities in Android Apps

As the number of always-on, always-connected smartphones increase, so does the amount of personal and sensitive information they collect and transmit. Thus, it is crucial to secure traffic exchanged by these devices, especially considering that mobile users might connect to open Wi-Fi networks or even fake cell towers. The go-to protocol to secure network connection is HTTPS i.e., HTTP over SSL/TLS.

In the Android ecosystem, applications (apps for short), support HTTPS on sockets by relying on the, android.webkit,,,,, and org.apache.http packages of the Android SDK. These packages are used to create HTTP/HTTPS connections, administer and verify certificates and keys, and instantiate TrustManager and HostnameVerifier interfaces, which are in turn used in the SSL certificate validation logic.

A TrustManager manages the certificates of all Certificate Authorities (CAs) used to assess a certificate’s validity. Only root CAs trusted by Android are contained in the default TrustManager. A HostnameVerifier performs hostname verification whenever a URL’s hostname does not match the hostname in the peer’s identification credentials.

While browsers provide users with visual feedback that their communication is secured (via the lock symbol) as well as certificate validation issues, non-browser apps do so less extensively and effectively. This shortcoming motivates the need to scrutinize the security of network connections used by apps to transmit user sensitive data. We found that some of the most popular Android apps insufficiently secure these connections, putting users’ passwords, credit card details and chat messages at risk.

Continue reading Experimenting with SSL Vulnerabilities in Android Apps

Measuring Internet Censorship

Norwegian writer Mette Newth once wrote that: “censorship has followed the free expressions of men and women like a shadow throughout history.” Indeed, as we develop innovative and more effective tools to gather and create information, new means to control, erase and censor that information evolve alongside it. But how do we study Internet censorship?

Organisations such as Reporters Without Borders, Freedom House, or the Open Net Initiative periodically report on the extent of censorship worldwide. But as countries that are fond of censorship are not particularly keen to share details, we must resort to probing filtered networks, i.e., generating requests from within them to see what gets blocked and what gets through. We cannot hope to record all the possible censorship-triggering events, so our understanding of what is or isn’t acceptable to the censor will only ever be partial. And of course it’s risky, or even outright illegal, to probe the censor’s limits within countries with strict censorship and surveillance programs.

This is why the leak of 600GB of logs from hardware appliances used to filter internet traffic in and out of Syria was a unique opportunity to examine the workings of a real-world internet censorship apparatus.

Leaked by the hacktivist group Telecomix, the logs cover a period of nine days in 2011, drawn from seven Blue Coat SG-9000 internet proxies. The sale of equipment like this to countries such as Syria is banned by the US and EU. California-based manufacturer Blue Coat Systems denied making the sales but confirmed the authenticity of the logs – and Dubai-based firm Computerlinks FZCO later settled on a US$2.8m fine for unlawful export. In 2013, researchers at the University of Toronto’s Citizen Lab demonstrated how authoritarian regimes in Saudi Arabia, UAE, Qatar, Yemen, Egypt and Kuwait all rely on US-made equipment like those from Blue Coat or McAfee’s SmartFilter software to perform filtering.

Continue reading Measuring Internet Censorship

Understanding Online Dating Scams

Our research on online dating scams will be presented at the  Conference on Detection of Intrusions and Malware and Vulnerability Assessment (DIMVA) that will be held in Milan in July. This work was a collaboration with colleagues working for Jiayuan, the largest online dating site in China, and is the first large-scale measurement of online dating scams, comprising a dataset of more than 500k accounts used by scammers on Jiayuan across 2012 and 2013.

As someone who has spent a considerable amount of time researching ways to mitigate malicious activity on online services, online dating scams picked my interest for a number of reasons. First, online dating sites operate following completely different dynamics compared to traditional online social networks. On a regular social network (say Facebook or Linkedin) users connect with people they know in real life, and any request to connect from an unknown person is considered unsolicited and potentially malicious. Many malicious content detection systems (including my own) leverage this observation to detect malicious accounts. Putting people who don’t know each other in contact, however, is the main purpose of online dating sites – for this reason, traditional methods to detect fake and malevolent accounts cannot be applied to this context, and the development of a new threat model is required. As a second differentiator, online dating users tend to use the site only for the first contact, and move to other media (text messages, instant messaging) after that. Although that is fine for regular use, it makes it more difficult to track scammers, because the online dating site loses visibility of the messages exchanged between users after they have left the site. Third, online dating scams have a strong human component, which differentiates them heavily from traditional malicious activity on online services such as spam, phishing, or malware.

We identified three types of scams happening on Jiayuan. The first one involves advertising of  escort services or illicit goods, and is very similar to traditional spam. The other two are far more interesting and specific to the online dating landscape. One type of scammers are what we call swindlers. For this scheme, the scammer starts a long-distance relationship with an emotionally vulnerable victim, and eventually asks her for money, for example to purchase the flight ticket to visit her. Needless to say, after the money has been transferred the scammer disappears. Another interesting type of scams that we identified are what we call dates for profit. In this scheme, attractive young ladies are hired by the owners of fancy restaurants. The scam then consists in having the ladies contact people on the dating site, taking them on a date at the restaurant, having the victim pay for the meal, and never arranging a second date. This scam is particularly interesting, because there are good chances that the victim will never realize that he’s been scammed – in fact, he probably had a good time.

In the paper we analyze the accounts that we detected belonging to the different scam types, and extract typical information about the demographics that scammers pose as in their accounts, as well as the demographics of their victims. For example, we show that swindlers usually pose as widowed mid-aged men and target widowed women. We then analyze the modus operandi of scam accounts, showing that specific types of scam accounts have a higher chance of getting the attention of their victims and receiving replies than regular users. Finally, we show that the activity performed on the site by scammers is mostly manual, and that the use of infected computers and botnet to spread content – which is prominent on other online services – is minimal.

We believe that the observations provided in this paper will shed some light on a so far understudied problem in the field of computer security, and will help researchers in developing systems that can automatically detect such scam accounts and block them before they have a chance to reach their victims.

The full paper is available on my website.

Update (2015-05-15): There is press coverage of this paper in Schneier on Security and BuzzFeed.