Adblocking and Counter-Blocking: A Slice of the Arms Race

anti-adblocking message from WIRED
If you use an adblocker, you are probably familiar with messages of the kind shown above, asking you to either disable your adblocker, or to consider supporting the host website via a donation or subscription. This is the battle du jour in the ongoing adblocking arms race — and it’s one we explore in our new report Adblocking and Counter-Blocking: A Slice of the Arms Race.

The reasons for the rising popularity of adblockers include an improved browsing experience, better privacy, and protection against malvertising. As a result, online advertising revenue is gravely threatened by adblockers, prompting publishers to actively detect adblock users, and subsequently block them or otherwise coerce them into disabling the adblocker — practices we refer to as anti-adblocking. While there has been a degree of sound and fury on the topic, until now we haven’t been able to understand the scale, mechanisms and dynamics of anti-adblocking. This is the gap we have started to address, together with researchers from the University of Cambridge, Stony Brook University, University College London, University of California Berkeley, Queen Mary University of London and the International Computer Science Institute (Berkeley). We address some of these questions by leveraging a novel approach for identifying third-party services shared across multiple websites to present a first characterization of anti-adblocking across the Alexa Top-5K websites.

We find that at least 6.7% of Alexa Top-5K websites employ anti-adblocking, with the practice finding adoption across a diverse mix of publishers, particularly in the “General News”, “Blogs/Wiki”, and “Entertainment” categories. It turns out that these websites owe their anti-adblocking capabilities to 14 unique scripts pulled from 12 unique domains. Unsurprisingly, the most popular domains are those that have skin in the game — Google, Taboola, Outbrain, Ensighten and Pagefair — the latter being a company that specialises in anti-adblocking services. Then there are in-house anti-adblocking solutions that are distributed by a domain to client websites belonging to the same organisation: TripAdvisor distributes an anti-adblocking script to its eight websites with different country code top-level domains, while adult websites (all hosted by MindGeek) turn to DoublePimp. Finally, we visited a sample website for each anti-adblocking script using AdBlock Plus, Ghostery and Privacy Badger, and discovered that half of the 12 anti-adblocking suppliers are counter-blocked by at least one adblocker — suggesting that the arms race has already entered the next level.

It is hard to say how many levels deeper the adblocking arms race might go. While anti-adblocking may provide temporary relief to publishers, it is essentially a band-aid solution that masks a deeper issue — the disequilibrium between ads (and, particularly, their behavioural tracking back-end) and information. Any long-term solution must address the reasons that brought users to adblockers in the first place. In the meantime, as the arms race continues to escalate, we hope that studies such as ours will bring transparency to this opaque subject, and inform policy that moves us out of the current deadlock.

 

“Ad-Blocking and Counter Blocking: A Slice of the Arms Race” by Rishab Nithyanand, Sheharbano Khattak, Mobin Javed, Narseo Vallina-Rodriguez, Marjan Falahrastegar, Julia E. Powles, Emiliano De Cristofaro, Hamed Haddadi, and Steven J. Murdoch. arXiv:1605.05077v1 [cs.CR], May 2016.

This post also appears on the University of Cambridge Computer Laboratory Security Group blog, Light Blue Touchpaper.

On the hunt for Facebook’s army of fake likes

As social networks are increasingly relied upon to engage with people worldwide, it is crucial to understand and counter fraudulent activities. One of these is “like farming” – the process of artificially inflating the number of Facebook page likes. To counter it, researchers worldwide have designed detection algorithms to distinguish between genuine likes and artificial ones generated by farm-controlled accounts. However, it turns out that more sophisticated farms can often evade detection tools, including those deployed by Facebook.

What is Like Farming?

Facebook pages allow their owners to publicize products and events and in general to get in touch with customers and fans. They can also promote them via targeted ads – in fact, more than 40 million small businesses reportedly have active pages, and almost 2 million of them use Facebook’s advertising platform.

At the same time, as the number of likes attracted by a Facebook page is considered a measure of its popularity, an ecosystem of so-called “like farms” has emerged that inflate the number of page likes. Farms typically do so either to later sell these pages to scammers at an increased resale/marketing value or as a paid service to page owners. Costs for like farms’ services are quite volatile, but they typically range between $10 and $100 per 100 likes, also depending on whether one wants to target specific regions — e.g., likes from US users are usually more expensive.

Screenshot from http://www.getmesomelikes.co.uk/

How do farms operate?

There are a number of possible ways farms can operate, and ultimately this dramatically influences not only their cost but also how hard they are to detect. One obvious way is to use fake accounts; however, opening a fake account is somewhat cumbersome, since Facebook now requires users to solve a CAPTCHA and/or enter a code received via SMS. Another strategy is to rely on compromised accounts, i.e., to control real accounts whose credentials have been illegally obtained from password leaks or through malware. For instance, fraudsters could obtain Facebook passwords through a malicious browser extension on the victim’s computer, by hijacking a Facebook app, via social engineering attacks, or by finding credentials leaked from other websites (and dumped on underground forums) that are also valid on Facebook.

Continue reading On the hunt for Facebook’s army of fake likes

Insecure by design: protocols for encrypted phone calls

The MIKEY-SAKKE protocol is being promoted by the UK government as a better way to secure phone calls. The reality is that MIKEY-SAKKE is designed to offer minimal security while allowing undetectable mass surveillance, through the introduction of a backdoor based on mandatory key-escrow. This weakness has implications which go further than just the security of phone calls.

The current state of security for phone calls leaves a lot to be desired. Land-line calls are almost entirely unencrypted, and cellphone calls are also unencrypted except for the radio link between the handset and the phone network. While the latest cryptography standards for cellphones (3G and 4G) are reasonably strong, it is possible to force a phone to fall back to older standards with easy-to-break cryptography, if any. The vast majority of phones will not reveal to their user whether such an attack is under way.

The only reason that eavesdropping on land-line calls is not commonplace is that getting access to the closed phone networks is not as easy as getting access to the more open Internet, and that cellphone cryptography designers relied on the equipment necessary to intercept the radio link being affordable only by well-funded government intelligence agencies, and not by criminals or for corporate espionage. That might have been true in the past, but it is certainly no longer the case, with the necessary equipment now available for $1,500. Governments, companies and individuals are increasingly looking for better security.

A second driver for better phone call encryption is the convergence of Internet and phone networks. The LTE (Long-Term Evolution) 4G cellphone standard – developed by the 3rd Generation Partnership Project (3GPP) – carries voice calls over IP packets, and desktop phones in companies are increasingly carrying voice over IP (VoIP) too. Because voice calls may travel over the Internet, whatever security was offered by the closed phone networks is gone, and so other security mechanisms are needed.

Like Internet data encryption, voice encryption can broadly be categorised as either link encryption, where each intermediary may encrypt data before passing it onto the next, or end-to-end encryption, where communications are encrypted such that only the legitimate end-points can have access to the unencrypted communication. End-to-end encryption is preferable for security because it avoids intermediaries being able to eavesdrop on communications and gives the end-points assurance that communications will indeed be encrypted all the way to their other communication partner.

Current cellphone encryption standards are link encryption: the phone encrypts calls between it and the phone network using cryptographic keys stored on the Subscriber Identity Module (SIM). Within the phone network, encryption may also be present but the network provider still has access to unencrypted data, so even ignoring the vulnerability to fall-back attacks on the radio link, the network providers and their suppliers are weak points that are tempting for attackers to compromise. Recent examples of such attacks include the compromise of the phone networks of Vodafone in Greece (2004) and Belgacom in Belgium (2012), and the SIM card supplier Gemalto in France (2010). The identity of the Vodafone Greece hacker remains unknown (though the NSA is suspected) but the attacks against Belgacom and Gemalto were carried out by the UK signals intelligence agency – GCHQ – and only publicly revealed through the Snowden leaks, so it is quite possible there are other attacks which remain hidden.

Email is typically only secured by link encryption, if at all, with HTTPS encrypting access to most webmail and Transport Layer Security (TLS) sometimes encrypting other communication protocols that carry email (SMTP, IMAP and POP). Again, the fact that intermediaries have access to plaintext creates a vulnerability, as demonstrated by the 2009 hack of Google’s Gmail likely originating from China. End-to-end email encryption is possible using the OpenPGP or S/MIME protocols but their use is not common, primarily due to their poor usability, which in turn is at least partially a result of having to stay compatible with older insecure email standards.

In contrast, instant messaging applications had more opportunity to start with a clean slate (because there is no expectation of compatibility among different networks) and so this is where much innovation in terms of end-to-end security has taken place. Secure voice communication, however, has had less attention than instant messaging, so in the remainder of the article we shall examine what should be expected of a secure voice communication system, and in particular see how one of the latest and upcoming protocols, MIKEY-SAKKE, which comes with UK government backing, meets these criteria.

MIKEY-SAKKE and Secure Chorus

MIKEY-SAKKE is the security protocol behind the Secure Chorus voice (and also video) encryption standard, commissioned and designed by GCHQ through their information security arm, CESG. GCHQ have announced that they will only certify voice encryption products through their Commercial Product Assurance (CPA) security evaluation scheme if the product implements MIKEY-SAKKE and Secure Chorus. As a result, MIKEY-SAKKE has a monopoly over the vast majority of classified UK government voice communication and so companies developing secure voice communication systems must implement it in order to gain access to this market. GCHQ can also set requirements for what products are used in the public sector, as well as by companies operating critical national infrastructure.

UK government standards are also influential in guiding purchase decisions outside of government, and we are already seeing MIKEY-SAKKE marketed commercially as “government-grade security”, capitalising on its approval for use in the UK government. For this reason, and also because GCHQ have provided implementers with a free open source library to make it easier and cheaper to deploy Secure Chorus, we can expect wide use of MIKEY-SAKKE in industry and possibly among the public. It is therefore important to consider whether MIKEY-SAKKE is appropriate for wide-scale use. For the reasons outlined in the remainder of this article, the answer is no – MIKEY-SAKKE is designed to offer minimal security while allowing undetectable mass surveillance through key-escrow, not to provide effective security.
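The escrow at the heart of this criticism comes from MIKEY-SAKKE’s use of identity-based cryptography, in which the network operator’s key management server holds a master secret from which every user’s private key is derived. The structural problem can be illustrated with a toy Python sketch — this shows only the key-derivation structure, not the actual SAKKE algorithm, and all names are hypothetical:

```python
import hashlib
import hmac

# Toy identity-based key derivation (illustrative only, NOT SAKKE):
# the key management server derives every user's private key from a
# single master secret, so whoever holds the master secret can
# recover any user's key after the fact -- the essence of key escrow.

MASTER_SECRET = b"held by the network operator"

def user_private_key(identity: str) -> bytes:
    """Derive a user's private key from their public identity."""
    return hmac.new(MASTER_SECRET, identity.encode(), hashlib.sha256).digest()

# Alice's phone is provisioned with her key by the server...
alice_key = user_private_key("alice@example.com")

# ...but anyone who later obtains the master secret can re-derive it
# without ever touching Alice's device.
recovered = user_private_key("alice@example.com")
assert recovered == alice_key
```

By contrast, in an end-to-end design based on authenticated Diffie–Hellman key agreement, the session key exists only at the two endpoints, so there is no single secret whose compromise or compelled disclosure retroactively exposes every call.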

Continue reading Insecure by design: protocols for encrypted phone calls

Scaling Tor hidden services

Tor hidden services offer several security advantages over normal websites:

  • both the client requesting the webpage and the server returning it can be anonymous;
  • websites’ domain names (.onion addresses) are linked to their public key so are hard to impersonate; and
  • there is mandatory encryption from the client to the server.

However, Tor hidden services as originally implemented did not take full advantage of parallel processing, whether from a single multi-core computer or from load-balancing over multiple computers. Therefore once a single hidden service has hit the limit of vertical scaling (getting faster CPUs) there is not the option of horizontal scaling (adding more CPUs and more computers). There are also bottlenecks in the Tor network, such as the 3–10 introduction points that help to negotiate the connection between the hidden service and the rendezvous point that actually carries the traffic.

For my MSc Information Security project at UCL, supervised by Steven Murdoch with the assistance of Alec Muffett and other Security Infrastructure engineers at Facebook in London, I explored possible techniques for improving the horizontal scalability of Tor hidden services. More precisely, I was looking at possible load balancing techniques to offer better performance and resiliency against hardware/network failures. The focus of the research was aimed at popular non-anonymous hidden services, where the anonymity of the service provider was not required; an example of this could be Facebook’s .onion address.

One approach I explored was to simply run multiple hidden service instances using the same private key (and hence the same .onion address). Each hidden service periodically uploads its own descriptor, which describes the available introduction points, to six hidden service directories on a distributed hash table. The hidden service instance chosen by the client depends on which hidden service instance most recently uploaded its descriptor. In theory this approach allows an arbitrary number of hidden service instances, where each periodically uploads its own descriptors, overwriting those of others.
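Concretely, this setup amounts to giving every Tor instance an identical hidden-service key. A minimal torrc sketch (the paths are hypothetical; the private_key file inside HiddenServiceDir is copied to every instance so that they all derive the same .onion address):

```
# torrc fragment, identical on each instance (hypothetical paths).
# Copying the same private_key into every instance's
# HiddenServiceDir makes them all publish descriptors for the
# same .onion address.
HiddenServiceDir /var/lib/tor/hidden_service/
HiddenServicePort 80 127.0.0.1:8080
```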

This approach can work for popular hidden services because, with the large number of clients, some will be using the descriptor most recently uploaded, while others will have cached older versions and continue to use them. However my experiments showed that the distribution of the clients over the hidden service instances set up in this way is highly non-uniform.
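The skew arises because every client that fetches a fresh descriptor lands on whichever instance uploaded last. A toy simulation, with entirely hypothetical parameters, shows the effect:

```python
import random

# Toy simulation (hypothetical parameters) of hidden service
# instances sharing one .onion address and overwriting each other's
# descriptors. A client that fetches a fresh descriptor reaches the
# most recent uploader; a client with a stale cached descriptor
# reaches whichever instance was "current" when it cached.

random.seed(0)
NUM_INSTANCES = 4
NUM_CLIENTS = 10_000
CACHE_PROBABILITY = 0.3   # fraction of clients on a stale descriptor

instances = list(range(NUM_INSTANCES))
current = instances[-1]   # last instance to upload its descriptor

load = [0] * NUM_INSTANCES
for _ in range(NUM_CLIENTS):
    if random.random() < CACHE_PROBABILITY:
        # Stale cache: any previously-current instance, uniformly.
        load[random.choice(instances)] += 1
    else:
        # Fresh descriptor: always the most recent uploader.
        load[current] += 1

print(load)  # heavily skewed towards the last uploader
```

Under these assumptions the most recent uploader absorbs well over half the clients, mirroring the highly non-uniform distribution observed in the experiments.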

I therefore ran experiments on a private Tor network using the Shadow network simulator running multiple hidden service instances, and measuring the load distribution over time. The experiments were devised such that the instances uploaded their descriptors simultaneously, which resulted in different hidden service directories receiving different descriptors. As a result, clients connecting to a hidden service would be balanced more uniformly over the available instances.

Continue reading Scaling Tor hidden services

Experimenting with SSL Vulnerabilities in Android Apps

As the number of always-on, always-connected smartphones increases, so does the amount of personal and sensitive information they collect and transmit. Thus, it is crucial to secure traffic exchanged by these devices, especially considering that mobile users might connect to open Wi-Fi networks or even fake cell towers. The go-to protocol for securing network connections is HTTPS, i.e., HTTP over SSL/TLS.

In the Android ecosystem, applications (apps for short) support HTTPS on sockets by relying on the android.net, android.webkit, java.net, javax.net, java.security, javax.security.cert, and org.apache.http packages of the Android SDK. These packages are used to create HTTP/HTTPS connections, administer and verify certificates and keys, and instantiate TrustManager and HostnameVerifier interfaces, which are in turn used in the SSL certificate validation logic.

A TrustManager manages the certificates of all Certificate Authorities (CAs) used to assess a certificate’s validity. Only root CAs trusted by Android are contained in the default TrustManager. A HostnameVerifier performs hostname verification whenever a URL’s hostname does not match the hostname in the peer’s identification credentials.
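This class of bug is not Android-specific: in any TLS stack, disabling certificate or hostname verification yields a connection that is still encrypted but will happily accept a man-in-the-middle. A sketch using Python’s standard ssl module (used here purely for illustration, not as the Android API) shows what a no-op TrustManager or HostnameVerifier effectively amounts to:

```python
import ssl

# Secure default: verify the certificate chain against trusted CAs
# and check that the certificate matches the requested hostname.
secure = ssl.create_default_context()
assert secure.verify_mode == ssl.CERT_REQUIRED
assert secure.check_hostname is True

# What a broken TrustManager/HostnameVerifier amounts to: the TLS
# handshake still happens, but any certificate -- including an
# attacker's self-signed one -- is accepted for any hostname.
insecure = ssl.create_default_context()
insecure.check_hostname = False        # cf. a no-op HostnameVerifier
insecure.verify_mode = ssl.CERT_NONE   # cf. an accept-all TrustManager

# Traffic is encrypted either way, but with the second context a
# man-in-the-middle can interpose unnoticed.
```

Note that check_hostname must be disabled before verify_mode is set to CERT_NONE; Python refuses the combination in the other order.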

While browsers provide users with visual feedback that their communication is secured (via the lock symbol), as well as warnings about certificate validation issues, non-browser apps do so less extensively and effectively. This shortcoming motivates the need to scrutinize the security of network connections used by apps to transmit user sensitive data. We found that some of the most popular Android apps insufficiently secure these connections, putting users’ passwords, credit card details and chat messages at risk.

Continue reading Experimenting with SSL Vulnerabilities in Android Apps

UCL Code Breaking Competition

Modern security systems frequently rely on complex cryptography to fulfil their goals and so it is important for security practitioners to have a good understanding of how cryptographic systems work and how they can fail. The Cryptanalysis (COMPGA18/COMPM068) module in UCL’s MSc Information Security provides students with the foundational knowledge to analyse cryptographic systems, whether as part of system development in industry or as academic research.

To give students a more realistic (and enjoyable) experience there is no written exam for this module; instead the students are evaluated based on coursework and a code breaking competition.

UCL has a strong tradition of experimental research and we have been running many student competitions and hacking events in the past. In March 2013 a team directed by Dr Courtois won the UK University Cipher Challenge 2013 award, held as part of the UK Cyber Security Challenge.

This year the competition has been about finding cryptographically significant events in a real-life financial system. The competition (open both to UCL students and those of other London universities) requires the study of random number generators, elliptic curve cryptography, hash functions, exploration of large datasets, programming and experimentation, data visualisation, graphs and statistics.

We are pleased to announce the winners of the competition:

  • Joint 1st prize: Gemma Bartlett. Grade obtained 92/100.
  • Joint 1st prize: Vasilios Mavroudis. Grade obtained 92/100.
  • 2nd prize: David Kohan Marzagão.  Grade obtained 82/100.

About the winners:


  • Gemma Bartlett (left) is in her final year at UCL studying for an M.Eng. in Mathematical Computation with a focus on Information Security. Her particular interests include digital forensics. She will be starting a job in this field after graduation.
  • Vasilios Mavroudis (middle) received his B.Sc. in Applied Informatics from the University of Macedonia, Greece in 2012. He is currently pursuing an M.Sc. in Information Security at UCL. In the past, he has worked as a security researcher at Deutsche Bank, the University of California, Santa Barbara, and the Centre for Research and Technology Hellas (CERTH). His research interests include network and systems security, malware, and applied cryptography.
  • David Kohan Marzagão (right) is currently undertaking a PhD in Computer Science under the supervision of Peter McBurney at King’s College London.  In 2014, he received his BSc in Mathematics at the University of São Paulo, Brazil. His research interests include cryptography, multi-agent systems, graph theory, and random walks.

Understanding Online Dating Scams

Our research on online dating scams will be presented at the Conference on Detection of Intrusions and Malware and Vulnerability Assessment (DIMVA) that will be held in Milan in July. This work was a collaboration with colleagues working for Jiayuan, the largest online dating site in China, and is the first large-scale measurement of online dating scams, comprising a dataset of more than 500k accounts used by scammers on Jiayuan across 2012 and 2013.

As someone who has spent a considerable amount of time researching ways to mitigate malicious activity on online services, online dating scams piqued my interest for a number of reasons. First, online dating sites operate following completely different dynamics compared to traditional online social networks. On a regular social network (say Facebook or LinkedIn) users connect with people they know in real life, and any request to connect from an unknown person is considered unsolicited and potentially malicious. Many malicious content detection systems (including my own) leverage this observation to detect malicious accounts. Putting people who don’t know each other in contact, however, is the main purpose of online dating sites – for this reason, traditional methods to detect fake and malevolent accounts cannot be applied to this context, and the development of a new threat model is required. As a second differentiator, online dating users tend to use the site only for the first contact, and move to other media (text messages, instant messaging) after that. Although that is fine for regular use, it makes it more difficult to track scammers, because the online dating site loses visibility of the messages exchanged between users after they have left the site. Third, online dating scams have a strong human component, which differentiates them heavily from traditional malicious activity on online services such as spam, phishing, or malware.

We identified three types of scams happening on Jiayuan. The first one involves advertising of escort services or illicit goods, and is very similar to traditional spam. The other two are far more interesting and specific to the online dating landscape. One type of scammer is what we call the swindler. In this scheme, the scammer starts a long-distance relationship with an emotionally vulnerable victim, and eventually asks her for money, for example to purchase the flight ticket to visit her. Needless to say, after the money has been transferred the scammer disappears. Another interesting type of scam that we identified is what we call dates for profit. In this scheme, attractive young ladies are hired by the owners of fancy restaurants. The scam then consists of having the ladies contact people on the dating site, taking them on a date at the restaurant, having the victim pay for the meal, and never arranging a second date. This scam is particularly interesting, because there are good chances that the victim will never realize that he’s been scammed – in fact, he probably had a good time.

In the paper we analyze the accounts that we detected belonging to the different scam types, and extract typical information about the demographics that scammers pose as in their accounts, as well as the demographics of their victims. For example, we show that swindlers usually pose as widowed middle-aged men and target widowed women. We then analyze the modus operandi of scam accounts, showing that specific types of scam accounts have a higher chance of getting the attention of their victims and receiving replies than regular users. Finally, we show that the activity performed on the site by scammers is mostly manual, and that the use of infected computers and botnets to spread content – which is prominent on other online services – is minimal.

We believe that the observations provided in this paper will shed some light on a so far understudied problem in the field of computer security, and will help researchers in developing systems that can automatically detect such scam accounts and block them before they have a chance to reach their victims.

The full paper is available on my website.

Update (2015-05-15): There is press coverage of this paper in Schneier on Security and BuzzFeed.

Teaching cybersecurity to criminologists

I recently had the pleasure of teaching my first module at UCL, an introduction to cybersecurity for students in the SECReT doctoral training centre.

The module had been taught before, but always from a fairly computer-science-heavy perspective. Given that the students had largely no background in computer science, and that my joint appointment in the Department of Security and Crime Science has given me at least some small insight into what aspects of cybersecurity criminologists might find interesting, I chose to design the lecture material largely from scratch. I tried to balance the technical components of cybersecurity that I felt everyone needed to know (which, perhaps unsurprisingly, included a fair amount of cryptography) with high-level design principles and the overarching question of how we define security. Although I say I designed the curriculum from scratch, I of course ended up borrowing heavily from others, most notably from the lecture and exam material of my former supervisor’s undergraduate cybersecurity module (thanks, Stefan!) and from George’s lecture material for Introduction to Computer Security. If anyone’s curious, the lecture material is available on my website.

As I said, the students in the Crime Science department (and in particular the ones taking this module) had little to no background in computer science.  Instead, they had a diverse set of academic backgrounds: psychology, political science, forensics, etc. One of the students’ proposed dissertation titles was “Using gold nanoparticles on metal oxide semiconducting gas sensors to increase sensitivity when detecting illicit materials, such as explosives,” so it’s an understatement to say that we were approaching cybersecurity from different directions!

With that in mind, one of the first things I did in my first lecture was to take a poll on who was familiar with certain concepts (e.g., SSH, malware, the structure of the Internet), and what people were interested in learning about (e.g., digital forensics, cryptanalysis, anonymity). I don’t know what I was expecting, but the responses really blew me away! The students overwhelmingly wanted to hear about how to secure themselves on the Internet, both in terms of personal security habits (e.g., using browser extensions) and in terms of understanding what and how things might go wrong. Almost the whole class specifically requested Tor, and a few had even used it before.

This theme of being (pleasantly!) surprised continued throughout the term.  When I taught certificates, the students asked not for more details on how they work, but if there was a body responsible for governing certificate authorities and if it was possible to sue them if they misbehave. When I taught authentication, we played a Scattergories-style game to weigh the pros and cons of various authentication mechanisms, and they came up with answers like “a con of backup security questions is that they reveal cultural trends that may then be used to reveal age, ethnicity, gender, etc.”

There’s still a month and a half left until the students take the exam, so it’s too soon to say how effective it was at teaching them cybersecurity, but for me the experience was a clear success and one that I look forward to repeating and refining in the future.

Tor: the last bastion of online anonymity, but is it still secure after Silk Road?

The Silk Road trial has concluded, with Ross Ulbricht found guilty of running the anonymous online marketplace for illegal goods. But questions remain over how the FBI found its way through Tor, the software that allows anonymous, untraceable use of the web, to gather the evidence against him.

The development of anonymising software such as Tor and Bitcoin has forced law enforcement to develop the expertise needed to identify those using them. But if anything, what we know about the FBI’s case suggests it was tip-offs, inside men, confessions, and Ulbricht’s own errors that were responsible for his conviction.

This is the main problem with these systems: breaking or circumventing anonymity software is hard, but it’s easy to build up evidence against an individual once you can target surveillance, and wait for them to slip up.

The problem

A design decision in the early days of the internet led to a problem: every message sent is tagged with the numerical Internet Protocol (IP) addresses that identify the source and destination computers. The network address indicates how and where to route the message, but there is no equivalent indicating the identity of the sender or intended recipient.

This conflation of addressing and identity is bad for privacy. Any internet traffic you send or receive will have your IP address attached to it. Typically a computer will only have one public IP address at a time, which means your online activity can be linked together using that address. Whether you like it or not, marketers, criminals or investigators use this sort of profiling without consent all the time. IP addresses are allocated geographically and on a per-organisation basis, so it’s even possible to pinpoint a surprisingly accurate location.

This conflation of addressing and identity is also bad for security. The routing protocols which establish the best route between two points on the internet are not secure, and have been exploited by attackers to take control of (hijack) IP addresses they don’t legitimately own. Such attackers then have access to network traffic destined for the hijacked IP addresses, and also to anything the legitimate owner of the IP addresses should have access to.

Continue reading Tor: the last bastion of online anonymity, but is it still secure after Silk Road?