Exploring an Attack on Image Scaling Algorithms

In their 2019 publication ‘Seeing is Not Believing: Camouflage Attacks on Image Scaling Algorithms’, Xiao et al. demonstrated a fascinating and frightening exploit on a few commonly used and popular scaling algorithms. Through what Quiring et al. referred to as adversarial preprocessing, they created an attack image that closely resembles one image (decoy) but portrays a completely different image (payload) when scaled down. In their example (below), an image of sheep could scale down and suddenly show a wolf.

On the left, a group of sheep can be seen in a slightly stretched out photo (the decoy). When scaled down to the correct dimensions (right), the image shows a grey wolf (payload). This is an example of an attack image.
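To make the idea concrete, here is a minimal sketch of the simplest possible case: a nearest-neighbour scaler, which only ever reads a small, predictable subset of the source pixels. Overwriting just those pixels with the payload leaves most of the decoy untouched. The function names and the tiny grayscale arrays are assumptions made for illustration; this is not the authors' code, and scalers that interpolate between pixels (bilinear, bicubic) need a more involved construction than the one shown here.

```javascript
// Nearest-neighbour downscale of a grayscale image stored as an array of rows.
function downscaleNearest(image, outH, outW) {
  const inH = image.length;
  const inW = image[0].length;
  const out = [];
  for (let y = 0; y < outH; y++) {
    const srcY = Math.floor((y * inH) / outH);
    const row = [];
    for (let x = 0; x < outW; x++) {
      const srcX = Math.floor((x * inW) / outW);
      row.push(image[srcY][srcX]);
    }
    out.push(row);
  }
  return out;
}

// Copy the decoy, then overwrite only the pixels the scaler will sample.
function craftAttackImage(decoy, payload) {
  const inH = decoy.length;
  const inW = decoy[0].length;
  const outH = payload.length;
  const outW = payload[0].length;
  const attack = decoy.map((row) => row.slice());
  for (let y = 0; y < outH; y++) {
    for (let x = 0; x < outW; x++) {
      const srcY = Math.floor((y * inH) / outH);
      const srcX = Math.floor((x * inW) / outW);
      attack[srcY][srcX] = payload[y][x];
    }
  }
  return attack;
}

// Illustrative data: an 8 x 8 uniform "decoy" and a 2 x 2 "payload".
const decoy = Array.from({ length: 8 }, () => Array(8).fill(200));
const payload = [
  [0, 50],
  [100, 150],
];
const attack = craftAttackImage(decoy, payload);
console.log(downscaleNearest(attack, 2, 2)); // the payload reappears after scaling
```

In this toy example only 4 of the 64 pixels differ from the decoy, which is why the full-size image still looks like the decoy to a human viewer.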

These attack images can be used in a number of scenarios, particularly in data poisoning of deep learning datasets and covert dissemination of information. Deep learning models require large datasets for training. A series of carefully crafted attack images planted in public datasets can poison these models, for example by reducing the accuracy of object classification. Essentially all models are trained with images scaled down to a fixed size (e.g. 229 × 229) to reduce the computational load, so these attack images are highly likely to work if their dimensions are correctly configured. As these attack images hide their malicious payload in plain sight, they also evade detection. Xiao et al. described how an attack image could be crafted for a specific device (e.g. an iPhone XS) so that the iPhone XS browser renders the malicious image instead of the decoy image. This technique could be used to propagate a payload, such as illegal advertisements, discreetly.

The natural stealthiness of this attack is a dangerous factor, but on top of that, it is also relatively easy to replicate. Xiao et al. published their own source code in a GitHub repository, which anyone can run to create their own attack images. The maths behind the method is also well described in the paper, allowing my group to replicate the attack for coursework assigned to us for UCL’s Computer Security II module, without referencing the paper authors’ source code. Our implementation of the attack is available at our GitHub repository. This coursework required us to select an attack detailed in a conference paper and replicate it. While working on the coursework, we discovered a relatively simple way to stop these attack images from working and even allow the original content to be viewed. This is shown in the series of images below.


Creating scalable distributed ledgers for DECODE

Since the introduction of Bitcoin in 2008, blockchains have gone from a niche cryptographic novelty to a household name. Ethereum expanded the applicability of such technologies, beyond managing monetary value, to general computing with smart contracts. However, we have so far only scratched the surface of what can be done with such “Distributed Ledgers”.

The EU Horizon 2020 DECODE project aims to expand those technologies to support local economy initiatives, direct democracy, and decentralization of services, such as social networking, sharing economy, and discursive and participatory platforms. Today, these tend to be highly centralized in their architecture.

There is a fundamental contradiction between how modern services harness the work and resources of millions of users, and how they are technically implemented. The promise of the sharing economy is to coordinate people who want to provide resources with people who want to use them, for instance spare rooms in the case of Airbnb; rides in the case of Uber; spare couches in the case of Couchsurfing; and social interactions in the case of Facebook.

These services appear to be provided in a peer-to-peer and disintermediated fashion. And, to some extent, they are less mediated at the application level thanks to their online nature. However, the technical underpinnings of those services are based on the extreme opposite design philosophy: all users technically mediate their interactions through a very centralized service, hosted on private data centres. The big internet service companies leverage their centralized position to extract value out of users or providers of services – becoming de facto monopolies in many cases.

When it comes to privacy and security properties, those centralized services force users to trust them absolutely, and offer little in the way of transparency to even allow users to monitor the service practices to ground that trust. A recent example illustrating this problem was Uber, the ride sharing service, providing a different view to drivers and riders about the fare that was being paid for a ride – forcing drivers to compare what they receive with what riders pay to ensure they were getting a fair deal. Since Uber, like many other services, operates in a non-transparent manner, its functioning depends on users trusting it absolutely to ensure fairness.

The lack of user control and transparency of modern online services goes beyond monetary and economic concerns. The Guardian recently published the guidelines used by Facebook to moderate abusive or illegal user postings. While moderation has a necessary social function, the exact boundaries of what constitutes abuse came into question: some forms of harm to children or holocaust denial were ignored, while material of artistic or political value has been suppressed.

Even more worryingly, the opaque algorithms being used to promote and propagate posts have been associated with creating a filter bubble effect and influencing elections, while dark adverts, visible only to particular users, are able to flout standards of fair political advertising. It is a fact of the 21st century that a key facet of the discursive process of democracy will take place on online social platforms. However, their centralized, opaque and advertising-driven form is incompatible with their function as a tool for democracy.

Finally, the revelations of Edward Snowden relating to mass surveillance also illustrate how the technical centralization of services erodes privacy at an unprecedented scale. The NSA PRISM program coerced internet services to provide access to data on their services under a FISA warrant, not protecting the civil liberties of non-American persons. At the same time, the UPSTREAM program collected bulk information between data centres, making all economic, social and political activities taking place on those services transparent to US authorities. While users struggle to understand how those services operate, governments (often foreign) have total visibility. This is a complete inversion of the principles of liberal democracy, where usually we would expect citizens to have their privacy protected, while those in positions of authority and power are expected to be accountable.

The problems of accountability, transparency and privacy are social, but are also based on the fundamental centralized architecture underpinning those services. To address them, the DECODE project brings together technical, legal and social experts from academia, alongside partners from local government and industry. Together they are tasked with developing architectures that are compatible with the social values of transparency, user and community control, and privacy.

The role of UCL Computer Science, as a partner, is to provide technical options in two key technical areas: (1) the scalability of secure decentralized distributed ledgers that can support millions or billions of users while providing high integrity and transparency to operations; (2) mechanisms for protecting user privacy despite the decentralized and transparent infrastructure. The latter may seem like an oxymoron: how can transparency and privacy be reconciled? However, thanks to advances in modern cryptography, it is possible to ensure that operations were correctly performed on a ledger, without divulging private user data – a family of techniques known as zero-knowledge proofs.

I am particularly proud of the UCL team we have put together that is associated with this project, and strengthens considerably our existing expertise in distributed ledgers.

I will be leading and coordinating the work. I have a long-standing interest, and track record, in privacy enhancing technologies and peer-to-peer computing, as well as scalable distributed ledgers – such as the RSCoin currency proposal. Shehar Bano, an expert on systems and networking, has joined us as a post-doctoral researcher after completing her thesis at Cambridge. Alberto Sonnino will be doing his thesis on distributed ledgers and privacy, as well as hardware and IoT applications related to ledgers, after completing his MSc in Information Security at UCL last year. Mustafa Al-Bassam is also associated with the project and works on high-integrity and scalable ledger technologies, after completing his degree at King’s College London – he is funded by the Turing Institute to work on such technologies. They join our wider team of UCL CS faculty with research interests in distributed ledgers, including Sarah Meiklejohn, Nicolas Courtois and Tomaso Aste and their respective teams.

This post also appears on the DECODE project blog.

Preventing phishing won’t stop ransomware spreading

Ransomware is in the news again, with Reckitt Benckiser reporting that disruption caused by the NotPetya ransomware could have cost them up to £100 million. In response to this news, just as with every previous ransomware incident, the security industry started giving out advice – almost universally emphasising the importance of not opening phishing emails.

The problem is that this advice won’t work. Putting aside the fact that such advice is often so vague as to be impossible to put into action, the cause of recent ransomware outbreaks is not people opening phishing emails:

  • WannaCry, which notably caused severe disruption to the NHS, spread by automated scanning of computers vulnerable to an NSA-developed exploit. Although the starting point was initially assumed to be a phishing email, this was later debunked – only network scanning was used.
  • The Mole Ransomware attack that hit many organisations, including UCL, was initially thought to be spread by employees clicking on links in phishing emails. Subsequent analysis found this was incorrect and most likely the malware spread through malicious advertisements on legitimate websites.
  • NotPetya was initially thought to have been spread through Russian or Ukrainian phishing emails (explaining why that part of the world was so badly affected). It turned out not to have involved phishing at all: the outbreak started through a tampered software update to the MEDoc tax accounting software mandated by the Ukrainian government. Once inside an organisation, NotPetya then spread using the same exploit as WannaCry or by compromising administrative credentials.

Here are three major incidents, making international news, and the standard advice to “be vigilant” when opening emails or clicking links would have been useless. Is it any surprise that security advice gets ignored?

Not only is common anti-phishing advice unhelpful but it shifts blame to individuals (who are not in a position to prevent or mitigate most attacks) away from the IT industry and staff (who are). It also misleads management into thinking that they can “blame-and-train” their employees rather than investing in well engineered preventative security mechanisms and IT systems that can recover from compromise.

And there are things that can be done which have been shown to be effective, not just against the current outbreaks but many in the past and likely future. WannaCry would have been prevented by applying software updates, but the NotPetya outbreak was caused by a software update. The industry needs to act promptly to ensure that software updates are safe and reliable before customers become even more wary about installing them.

The spread of WannaCry and NotPetya within companies could have been prevented or slowed through better operational practices such as segmenting networks and limiting the use of administrative privilege. We’ve known this approach to be effective, but better tools and practices are needed to avoid enhanced security mechanisms being a drag on an organisation’s productivity.

Mole could have been prevented by ad-blocking browser extensions. The advertising industry is in open war against ad-blocking because it harms their income stream, but while they keep on spreading malware through their networks I have limited sympathy.

Well maintained and protected backups are essential to allow recovery, whether from ransomware, purely destructive attacks, or hardware failure. The security techniques above are effective, but these measures will not prevent every attack so mechanisms are needed to efficiently deal with the aftermath.

Most importantly, we need to move away from security being a set of traditions passed from generation to generation with little or no reason to believe they are effective (so-called “best practice”) to well-engineered systems following rigorous, evidence-based guidance on state-of-the-art cybersecurity principles, standards and practices.

On the security and privacy of the ultrasound tracking ecosystem

In April 2016, the US Federal Trade Commission (FTC) sent warning letters to 12 Google Play app developers. The letters were addressed to those who incorporated the SilverPush framework in their apps, and reminded developers who used tracking software to explicitly inform their users (as seen in Section 5 of the Federal Trade Commission Act). The incident was covered by popular press and privacy concerns were raised. Shortly after, SilverPush claimed no active partnerships in the US and the buzz subsided.

Unfortunately, as the incident was resolved relatively fast, very few technical details of the technology were made public. To fill in this gap and understand the potential security implications, we conducted an in-depth study of the SilverPush framework and all the associated technologies.

The development of the framework was motivated by a fast-increasing interest of the marketing industry in products performing high-accuracy user tracking, and their derivative monetization schemes. This resulted in a high demand for cross-device tracking techniques with increased accuracy and reduced prerequisites.

The SilverPush framework fulfilled both of these requirements, as it provided a novel way to track users between their devices (e.g., TV, smartphone), without any user actions (e.g., login to a single platform from all their devices). To achieve that, the framework realized a previously unseen cross-device tracking technique (i.e., ultrasound cross-device tracking, uXDT) that offered high tracking accuracy, and came with various desirable features (e.g., easy to deploy, imperceptible by users). What differentiated that framework from existing ones was the use of high-frequency, inaudible ultrasonic beacons (uBeacons) as a medium/channel for identifier transmission between the user’s devices. This also gave uXDT a major advantage over competing technologies, as uBeacons can be emitted by most commercial speakers and captured by most commercial microphones. This eliminates the need for specialized emission and/or capturing equipment.
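To illustrate why ordinary audio hardware is enough, the following minimal sketch checks a buffer of audio samples for energy at a single near-ultrasonic frequency using the Goertzel algorithm. Everything specific in it is an assumption made for illustration: the 44.1 kHz sample rate, the 19 kHz carrier, the function names and the synthetic test tones. It does not reflect how SilverPush or any other framework actually encodes its beacons.

```javascript
// Generate n samples of a unit-amplitude sine tone (synthetic test input).
function makeTone(freqHz, sampleRate, n) {
  const samples = new Array(n);
  for (let i = 0; i < n; i++) {
    samples[i] = Math.sin((2 * Math.PI * freqHz * i) / sampleRate);
  }
  return samples;
}

// Goertzel algorithm: power at one target frequency, normalised so that a
// unit-amplitude tone exactly on the target frequency gives roughly 1.
function tonePower(samples, sampleRate, targetHz) {
  const n = samples.length;
  const k = Math.round((n * targetHz) / sampleRate); // nearest frequency bin
  const coeff = 2 * Math.cos((2 * Math.PI * k) / n);
  let s1 = 0;
  let s2 = 0;
  for (const x of samples) {
    const s0 = x + coeff * s1 - s2;
    s2 = s1;
    s1 = s0;
  }
  const power = s1 * s1 + s2 * s2 - coeff * s1 * s2;
  return power / ((n / 2) * (n / 2));
}

// Hypothetical detector: the 0.25 threshold is an arbitrary illustrative choice.
function containsBeaconTone(samples, sampleRate, beaconHz, threshold = 0.25) {
  return tonePower(samples, sampleRate, beaconHz) > threshold;
}

// A 19 kHz tone is below the 22.05 kHz Nyquist limit of 44.1 kHz audio,
// so a standard sample rate can represent it.
console.log(containsBeaconTone(makeTone(19000, 44100, 4410), 44100, 19000));
console.log(containsBeaconTone(makeTone(1000, 44100, 4410), 44100, 19000));
```

The first call feeds the detector a synthetic 19 kHz tone and the second a 1 kHz tone standing in for audible sound; only the first should cross the threshold.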

Aspects of a little-known ecosystem

The low deployment cost of the technology fueled the growth of a whole ecosystem of frameworks and applications that use uBeacons for various purposes, such as proximity marketing, audience analytics, and device pairing. The ecosystem is built around the near-ultrasonic transmission channel, and enables marketers to profile users.

Unfortunately, users are often given limited information on the ecosystem’s inner workings. This lack of transparency has been the target of great criticism from the users, the security community and the regulators. Moreover, our security analysis revealed a false assumption in the uBeacon threat model that can be exploited by state-level adversaries to launch complex attacks, including one that de-anonymizes the users of anonymity networks (e.g., Tor).

On top of these, a more fundamental shortcoming of the ecosystem is the violation of the least privilege principle, as a consequence of the access to the device’s microphone. More specifically, any app that wants to employ ultrasound-based mechanisms needs to gain full access to the device’s microphone, as there is currently no way to gain access only to the ultrasound spectrum. This clearly violates the least privilege principle, as the app now has access to all audible frequencies, which allows a potentially malicious developer to request access to the microphone for ultrasound-pairing purposes, and then use it to eavesdrop on the user’s discussions. This also means that any ultrasound-enabled app risks being perceived as “potentially malicious” by the users.

Mitigation

To address these shortcomings, we developed a set of countermeasures aiming to provide protection to the users in the short and medium term. The first one is an extension for the Google Chrome browser, which filters out all ultrasounds from the audio output of the websites loaded by the user. The extension actively prevents web pages from emitting inaudible sounds, and thus completely thwarts any unsolicited ultrasound-tracking attempts. Furthermore, we developed a patch for the Android permission system that allows finer-grained control over the audio channel, and forces applications to declare their intention to capture sound from the inaudible spectrum. This will properly separate the permissions for listening to audible sound and sound in the high-frequency spectrum, and will enable the end users to selectively filter the ultrasound frequencies out of the signal acquired by the smartphone microphone.

More importantly, we argue that the ultrasound ecosystem can be made secure only with the standardization of the ultrasound beacon format. During this process, the threat model will be revised and the necessary security features for uBeacons will be specified. Once this process is completed, APIs for handling uBeacons can be implemented in all major operating systems. Such an API would provide methods for uBeacon discovery, processing, generation and emission, similar to those found in the Bluetooth Low Energy APIs. Thereafter, all ultrasound-enabled apps will need access only to this API, and not to the device’s microphone. This would solve the problem of over-privileging that exposes the user’s sensitive data to third-party apps.

Discussion

Our work provides an early warning on the risks looming in the ultrasound ecosystem, and lays the foundations for the secure use of this set of technologies. However, it also raises several questions regarding the security of the audio channel. For instance, in a recent incident a journalist accidentally injected commands into several Amazon Echo devices, which then allegedly tried to order products online. This underlines the need for security features in the audio channel. Unfortunately, due to the variety of use cases, a universal solution that could be applied to the lower communication layers seems unlikely. Instead, solutions must be sought in the higher communication layers (e.g., application layers), and should be the outcome of careful threat modeling.

Battery Status readout as a privacy risk

Privacy risks and threats arise even in seemingly innocuous mechanisms. It is a fairly regular issue.

Over a year ago, I was researching the risk of the W3C Battery Status API. The mechanism allows a web site to read the battery level of a device (smartphone, laptop, etc.). One of the positive use cases may be, for example, stopping the execution of intensive operations if the battery is running low.

Our privacy analysis of Battery Status API revealed interesting results.

Privacy analysis of Battery API

The battery status provides the following information:

  • the current level of battery (format: 0.0–1.0, for empty and full battery, respectively)
  • time to a full discharge of battery (in seconds)
  • time to a full charge of battery, if connected to a charger (in seconds)

These items are updated whenever a new value is supplied by the operating system.
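In browsers that implement the API, these values are exposed as the level, dischargingTime and chargingTime properties of the BatteryManager object returned by navigator.getBattery(). As a minimal sketch, the helper below copies the three readouts into a plain object. The helper name snapshotBattery is made up for this example, and the object passed to it is a stand-in with invented values so that the snippet also runs outside a browser.

```javascript
// Copy the three readouts listed above out of a BatteryManager-like object.
function snapshotBattery(battery) {
  return {
    level: battery.level, // 0.0 (empty) to 1.0 (full)
    dischargingTime: battery.dischargingTime, // seconds until empty
    chargingTime: battery.chargingTime, // seconds until full
  };
}

// In a browser that exposes the API, the real object would come from:
//   navigator.getBattery().then((battery) => console.log(snapshotBattery(battery)));

// Stand-in values for illustration only (a device that is not charging).
const exampleReadout = snapshotBattery({
  level: 0.43,
  dischargingTime: 8940,
  chargingTime: Infinity,
});
console.log(exampleReadout);
```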

It turns out that privacy risks may surface even in this kind of – seemingly innocuous – data and access mechanisms.

Frequency of changes

The frequency of changes in the readouts reported by the Battery Status API potentially allows the monitoring of users’ computer use habits; for example, it could enable analysis of how frequently the user’s device is under heavy use. This could lead to behavioral analysis.

Additionally, identical computer installations deployed in standard environments (e.g. at schools, work offices, etc.) are often behind a NAT. In simple terms, NAT allows a number of users to browse the Internet with a single, externally seen IP address. The ability to observe any differences between otherwise identical computer installations potentially allows particular users to be identified (and targeted?).

Battery readouts as identifiers

The information provided by the Battery Status API is not always subject to rapid changes. In other words, this information may be static for a period of time; this in turn may give rise to a short-lived identifier. The situation gets especially interesting when we consider a scenario of users sometimes clearing standard web identifiers (such as cookies). In such a case, a web script could potentially analyse identifiers provided by Battery Status API, and this information then could possibly even lead to re-creation of other identifiers. A simple sketch follows.
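One way to picture this in code is the hypothetical illustration below, which is not a real tracking script. It assumes a site has recorded the battery readout and a timestamp for each visit, joins the three values into a single string, and treats two visits as the same device when the strings match within a short time window. The function names, the visit object shape and the 30-second window are all assumptions chosen for the example.

```javascript
// Join the three readouts into one comparable string (hypothetical helper).
function batteryFingerprint(readout) {
  return [readout.level, readout.dischargingTime, readout.chargingTime].join("|");
}

// Link two visits if their readouts match and they are close together in time.
// The 30-second default window is an arbitrary illustrative choice.
function looksLikeSameDevice(visitA, visitB, maxGapMs = 30000) {
  const closeInTime = Math.abs(visitA.timestamp - visitB.timestamp) <= maxGapMs;
  const sameReadout =
    batteryFingerprint(visitA.battery) === batteryFingerprint(visitB.battery);
  return closeInTime && sameReadout;
}

// Invented example: a visit with cookies, then a visit 10 seconds later
// after the cookies were cleared, both showing the same battery state.
const beforeClearing = {
  timestamp: 1000000,
  battery: { level: 0.43, dischargingTime: 8940, chargingTime: Infinity },
};
const afterClearing = {
  timestamp: 1010000,
  battery: { level: 0.43, dischargingTime: 8940, chargingTime: Infinity },
};
console.log(looksLikeSameDevice(beforeClearing, afterClearing));
```

Because the readout changes as the battery drains, any link made this way only holds for a short time, which is what makes it a short-lived identifier rather than a permanent one.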


Analyzing privacy aspects of the W3C Vibration API

When making web standards, multiple scenarios possibly affecting privacy are considered. This includes even extreme ones; and this is a good thing. It’s best to predict the creative use and abuse of web features, before they are exploited.

Vibration API

The mechanism allowing websites to utilize a device’s vibration motor is called the Vibration API. The mechanism allows a device to be vibrated in particular patterns. The argument to the vibrate() function is a list called a pattern. The values at odd positions in the list (first, third, and so on) are vibration durations in milliseconds, and the values at even positions are the still periods between them. For example, a web designer can make the device vibrate for a specific duration, say 50 ms, and follow that with a still period of 100 ms using the following call:

navigator.vibrate([50, 100])
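To show how a longer pattern is read, the following minimal sketch labels each entry as a vibration or a pause and adds up the total duration. The helper names describePattern and totalDuration are made up for this example and are not part of the Vibration API; only navigator.vibrate() is.

```javascript
// Label each entry of a vibration pattern. Array index 0, 2, 4, ... (the
// first, third, fifth entries) vibrate; the entries in between are pauses.
function describePattern(pattern) {
  return pattern.map((ms, index) => ({
    action: index % 2 === 0 ? "vibrate" : "pause",
    ms,
  }));
}

// Total time the pattern takes to play, in milliseconds.
function totalDuration(pattern) {
  return pattern.reduce((sum, ms) => sum + ms, 0);
}

const examplePattern = [50, 100, 200];
console.log(describePattern(examplePattern));
console.log(totalDuration(examplePattern));

// In a browser that supports the API, the same pattern would be played with:
//   navigator.vibrate(examplePattern);
```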

In certain circumstances this can create several interesting potential privacy risks. Let’s look at the Vibration API from a privacy point of view. I will consider a number of scenarios on various technical levels.

Toy de-anonymisation scenario

One potential risk is the identification of a particular person in real life. Imagine several people in the same room placing their devices on a table. At some point, one person’s device vibrates in specific patterns. This individual might then become marked to a potential observer.

How could such a script be delivered? One possibility is through web advertising infrastructures. These offer capabilities of targeting individuals with considerable accuracy (with respect to their location).


Adblocking and Counter-Blocking: A Slice of the Arms Race

anti-adblocking message from WIRED
If you use an adblocker, you are probably familiar with messages of the kind shown above, asking you to either disable your adblocker, or to consider supporting the host website via a donation or subscription. This is the battle du jour in the ongoing adblocking arms race — and it’s one we explore in our new report Adblocking and Counter-Blocking: A Slice of the Arms Race.

The reasons for the rising popularity of adblockers include improved browsing experience, better privacy, and protection against malvertising. As a result, online advertising revenue is gravely threatened by adblockers, prompting publishers to actively detect adblock users, and subsequently block them or otherwise coerce the user to disable the adblocker — practices we refer to as anti-adblocking. While there has been a degree of sound and fury on the topic, until now we haven’t been able to understand the scale, mechanism and dynamics of anti-adblocking. This is the gap we have started to address, together with researchers from the University of Cambridge, Stony Brook University, University College London, University of California Berkeley, Queen Mary University of London and International Computer Science Institute (Berkeley). We address some of these questions by leveraging a novel approach for identifying third-party services shared across multiple websites to present a first characterization of anti-adblocking across the Alexa Top-5K websites.

We find that at least 6.7% of Alexa Top-5K websites employ anti-adblocking, with the practices finding adoption across a diverse mix of publishers; particularly publishers of “General News”, “Blogs/Wiki”, and “Entertainment” categories. It turns out that these websites owe their anti-adblocking capabilities to 14 unique scripts pulled from 12 unique domains. Unsurprisingly, the most popular domains are those that have skin in the game — Google, Taboola, Outbrain, Ensighten and Pagefair — the latter being a company that specialises in anti-adblocking services. Then there are in-house anti-adblocking solutions that are distributed by a domain to client websites belonging to the same organisation: TripAdvisor distributes an anti-adblocking script to its eight websites with different country code top-level domains, while adult websites (all hosted by MindGeek) turn to DoublePimp. Finally, we visited a sample website for each anti-adblocking script via AdBlock Plus, Ghostery and Privacy Badger, and discovered that half of the 12 anti-adblocking suppliers are counter-blocked by at least one adblocker — suggesting that the arms race has already entered the next level.

It is hard to say how many levels deeper the adblocking arms race might go. While anti-adblocking may provide temporary relief to publishers, it is essentially a band-aid solution that masks a deeper issue: the disequilibrium between ads (and, particularly, their behavioural tracking back-end) and information. Any long-term solution must address the reasons that brought users to adblockers in the first place. In the meantime, as the arms race continues to escalate, we hope that studies such as ours will bring transparency to this opaque subject, and inform policy that moves us out of the current deadlock.

“Ad-Blocking and Counter Blocking: A Slice of the Arms Race” by Rishab Nithyanand, Sheharbano Khattak, Mobin Javed, Narseo Vallina-Rodriguez, Marjan Falahrastegar, Julia E. Powles, Emiliano De Cristofaro, Hamed Haddadi, and Steven J. Murdoch. arXiv:1605.05077v1 [cs.CR], May 2016.

This post also appears on the University of Cambridge Computer Laboratory Security Group blog, Light Blue Touchpaper.

On the hunt for Facebook’s army of fake likes

As social networks are increasingly relied upon to engage with people worldwide, it is crucial to understand and counter fraudulent activities. One of these is “like farming” – the process of artificially inflating the number of Facebook page likes. To counter it, researchers worldwide have designed detection algorithms to distinguish between genuine likes and artificial ones generated by farm-controlled accounts. However, it turns out that more sophisticated farms can often evade detection tools, including those deployed by Facebook.

What is Like Farming?

Facebook pages allow their owners to publicize products and events and in general to get in touch with customers and fans. They can also promote them via targeted ads – in fact, more than 40 million small businesses reportedly have active pages, and almost 2 million of them use Facebook’s advertising platform.

At the same time, as the number of likes attracted by a Facebook page is considered a measure of its popularity, an ecosystem of so-called “like farms” has emerged that inflate the number of page likes. Farms typically do so either to later sell these pages to scammers at an increased resale/marketing value or as a paid service to page owners. Costs for like farms’ services are quite volatile, but they typically range between $10 and $100 per 100 likes, also depending on whether one wants to target specific regions — e.g., likes from US users are usually more expensive.

Screenshot from http://www.getmesomelikes.co.uk/

How do farms operate?

There are a number of possible ways farms can operate, and ultimately this dramatically influences not only their cost but also how hard it is to detect them. One obvious way is to instruct fake accounts; however, opening a fake account is somewhat cumbersome, since Facebook now requires users to solve a CAPTCHA and/or enter a code received via SMS. Another strategy is to rely on compromised accounts, i.e., by controlling real accounts whose credentials have been illegally obtained from password leaks or through malware. For instance, fraudsters could obtain Facebook passwords through a malicious browser extension on the victim’s computer, by hijacking a Facebook app, via social engineering attacks, or by finding credentials leaked from other websites (and dumped on underground forums) that are also valid on Facebook.


Is sending shoppers ads by Bluetooth just a bit creepy?

Using Bluetooth wireless networking to send information to nearby smartphones, beacon technology could transform how retailers engage with their customers. But customers will notice how their information is used to personalise these unsolicited adverts, and companies that fail to respect their privacy may get burned.

UK retailer House of Fraser is to introduce beacon-equipped mannequins to its Aberdeen store, which will deliver details about the clothes and accessories the mannequin is wearing to the smartphones of customers within 50 metres. In London’s Regent Street, around 100 stores have installed Apple’s iBeacons, able to send adverts to smartphones to entice passers-by to come inside.

A sort of precursor to the “internet of things”, beacon technology has great potential to enhance consumer experience: providing access to relevant information more quickly, or offering rewards and discounts for loyal shoppers. Some retailers may rearrange their store based on analysing data from customers’ shopping habits. It has uses outside of marketing too, such as providing contactless payments, tourist information at museums, or gate information at airports.
