A well-executed exercise in snake oil evaluation

In the umpteenth chapter of UK governments battling encryption, Priti Patel launched the “Safety Tech Challenge” in September 2021. It was to give five companies £85K each to develop “innovative technologies to keep children safe when using end-to-end encrypted messaging services”. Tasked with evaluating the outcomes was the REPHRAIN project, the consortium given £7M to address online harms. I had been part of the UKRI 2020 panel awarding this grant, and believed then, as I do now, that it concerns a politically laden and technically difficult task handed to a group of eminently sensible scientists.1 While the call had strongly invited teams to promise the impossible in order to placate the political goals, this team (like some other consortia) wisely declined to do so and remained realistic.

The evaluation results have now come back, and the REPHRAIN team have done a very decent job given that they had to evaluate five different brands of snake oil with their hands tied behind their backs. In doing so, they have made a valuable contribution to the development of trustworthy AI in the important application area of online (child) safety technology.

The Safety Tech Challenge

The Safety Tech Challenge was always intellectually dishonest. The essence of end-to-end encryption (E2EE) is that nothing2 can be known about encrypted information by anyone other than the sender and receiver. Not whether the last bit is a 0, not whether the message is CSAM (child sexual abuse material).3 The final REPHRAIN report indeed states there is “no published research on computational tools that can prevent CSAM in E2EE”.

In terms of technology, there really is no such thing as “in the context of E2EE” either: the messages are agnostic as to whether they are about to be encrypted (on the sender side) or have just been decrypted (on the receiving side), and nothing meaningful can be done4 in between; any technologies that can be developed are agnostic as to when they get invoked.

Continue reading A well-executed exercise in snake oil evaluation

What is Synthetic Data? The Good, the Bad, and the Ugly

Sharing data can often enable compelling applications and analytics. However, more often than not, valuable datasets contain information of a sensitive nature, and thus sharing them can endanger the privacy of users and organizations.

A possible alternative gaining momentum in the research community is to share synthetic data instead. The idea is to release artificially generated datasets that resemble the actual data — more precisely, having similar statistical properties.
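
To make the idea concrete, here is a minimal sketch (in Python, using NumPy and an entirely made-up dataset) of one very simple way to produce synthetic data: fit a parametric model to the real records and sample new records from it. Real synthetic data generators are far more sophisticated, but the principle is the same.

```python
import numpy as np

# Hypothetical "real" dataset: age and income for 1,000 users.
rng = np.random.default_rng(0)
real = np.column_stack([
    rng.normal(40, 12, 1000),       # age
    rng.lognormal(10, 0.5, 1000),   # income
])

# Fit a very simple parametric model: the empirical mean and covariance
# (i.e. a multivariate Gaussian).
mean = real.mean(axis=0)
cov = np.cov(real, rowvar=False)

# Sample a synthetic dataset of the same size from the fitted model.
# The synthetic records are not real individuals, but simple statistics
# (means, variances, correlations) are approximately preserved.
synthetic = rng.multivariate_normal(mean, cov, size=len(real))

print("real means:     ", real.mean(axis=0).round(1))
print("synthetic means:", synthetic.mean(axis=0).round(1))
```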

So how do you generate synthetic data? What is it useful for? What are the benefits and the risks? What are the fundamental limitations and the open research questions that remain unanswered?

All right, let’s go!

How To Safely Release Data?

Before discussing synthetic data, let’s first consider the “alternatives.”

Anonymization: Theoretically, one could remove personally identifiable information before sharing the data. However, in practice, anonymization fails to provide realistic privacy guarantees because a malevolent actor often has auxiliary information that allows them to re-identify anonymized data. For example, when Netflix de-identified movie ratings (as part of a challenge seeking better recommendation systems), Arvind Narayanan and Vitaly Shmatikov de-anonymized a large chunk of the records by cross-referencing them with public information on IMDb.
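
To illustrate the kind of linkage attack involved, here is a toy sketch with hypothetical data (not Narayanan and Shmatikov’s actual method): joining the “anonymized” release to public auxiliary data on a few shared attributes can be enough to tie a pseudonym back to a real identity.

```python
import pandas as pd

# "Anonymized" release: names replaced by pseudonyms, but movie, rating
# and date are kept (toy, entirely hypothetical data).
released = pd.DataFrame({
    "pseudonym": ["u17", "u17", "u42"],
    "movie":     ["Brazil", "Vertigo", "Heat"],
    "rating":    [5, 4, 3],
    "date":      ["2005-03-01", "2005-03-04", "2005-02-11"],
})

# Public auxiliary data: reviews posted under real usernames.
auxiliary = pd.DataFrame({
    "imdb_user": ["alice", "alice"],
    "movie":     ["Brazil", "Vertigo"],
    "rating":    [5, 4],
    "date":      ["2005-03-01", "2005-03-04"],
})

# Linkage attack: join on the shared quasi-identifiers. A handful of
# matching records is enough to link the pseudonym to a real identity.
linked = released.merge(auxiliary, on=["movie", "rating", "date"])
print(linked[["pseudonym", "imdb_user"]].drop_duplicates())
```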

Continue reading What is Synthetic Data? The Good, the Bad, and the Ugly

“I am yet to meet a young person that has not experienced some form of abuse via tech”

Technology-facilitated abuse describes the misuse of digital systems such as smartphones or other Internet-connected devices to monitor, control and harm individuals. In recent years, increasing attention has been given to this phenomenon in school settings and the criminal justice system. Yet awareness in the healthcare sector is lacking. To address this gap, Dr Isabel Straw and Dr Leonie Tanczer from University College London (UCL) have been leading a new research project that examines technology-facilitated abuse in medical settings.

Technology-facilitated forms of abuse are on the rise, with perpetrators adapting digital technologies such as smartphones and drones, trackers such as AirTags, and spyware tools including parental control software, to cause harm. The impact of technology-facilitated abuse on patients may not always be immediately obvious to healthcare professionals. For instance, smart, Internet-connected devices have been shown to be misused in domestic abuse cases to inflict physical harm. Smart locks have been used to trap individuals inside their homes, smart thermostats have been used to inflict extremes of temperature on victims, and remotely controlled lighting and sound systems have been manipulated to cause psychological distress. COVID-19 catalyzed the proliferation of these technologies within our environment, with sales of smart devices up 30% on the previous year. Yet, while these tools are advertised for their proposed safety and convenience, they are also providing new avenues for violence, harassment, and abuse.

The impact of technology-facilitated abuse is especially notable on young people. In recent years, pediatric safeguarding guidelines have been amended in response to increasing rates of knife crime, gang violence and drug trafficking in the UK. However, technology-facilitated abuse has evolved at a parallel rate and has not received the same level of attention. The impact of technology-facilitated abuse on children and teenagers may manifest as emotional distress, anxiety, and suicidal ideation. Koubel reports the exacerbation of mental health risks born from websites that encourage self-harm, eating disorders, and suicide. Furthermore, technology-facilitated dating abuse and sextortion are increasing amongst adolescent populations. With 10% of children being affected by sexual solicitation online, the problem is widespread and under-investigated. As reported by Stonard et al. in “They’ll Always Find a Way to Get to You”, digital devices are playing an increasing role in relationship abuse amongst young people.

Vulnerable individuals frequently perceive medical settings as a place of safety. Healthcare professionals thus have a role in providing both medical and psychosocial care to ensure their wellbeing. At present, clinical and patient management protocols are outdated and do not address the emerging threats of technology-facilitated abuse. If clinicians are to provide effective care to patients affected by technological elements of abuse and violence, and to navigate high-risk scenarios, clinical safeguarding protocols need a radical update.

Continue reading “I am yet to meet a young person that has not experienced some form of abuse via tech”

Vulnerability in Linux containers – investigation and mitigation

Operating system access controls, which constrain which programs can open which files, have existed for almost as long as computers themselves. Access controls are still widely used and are more flexible and efficient than cryptographically protecting files. Despite the long history, there continues to be innovation in access control, particularly now in container systems like Docker and Kubernetes, and in similar technologies offered by cloud providers. Here, rather than running lots of software on a single computer, the service is split up into microservices running in containers. Each container is isolated from the others on the same computer, as if it had its own computer and operating system, and is prevented from reading files in other containers.

However, in reality, there’s only one operating system, and the container runtime’s role is to create the illusion that there is more than one. As part of its job, the runtime should also set up containers such that access control works inside each container, because not every program running inside a container should be able to access every file. Multiple containers can also be given access to the same directory, with access controls used to restrict what each container can do with the directory contents. If access controls don’t work properly, an attacker could read or modify files they should not be able to.

Unfortunately, there is such a vulnerability. The bad news is that it originates from an omission in the specification that underlies all the major container runtimes and so is present regardless of which container runtime you use (e.g. runc, crun, Kata Containers) and regardless of whether you use containers directly (e.g. through Docker or podman) or indirectly (e.g. through Kubernetes). The good news is that the vulnerability affects a feature of Linux access control permissions that is not widely used – negative group permissions. However, if your system does depend on this feature then the vulnerability could be serious. Read on for more details about the vulnerability, why it exists and what can be done to mitigate the problem.

Introduction to Linux permissions

In Linux there are user accounts, and each user is also a member of one or more groups. Each object (files, directories, devices, etc.) has an associated owner and an associated group. The object also has a set of permissions associated with three classes: owner, group, and other. These permissions tell the operating system whether a user should be able to read from the object (r), write to the object (w) and execute the object (x). If a user is the owner of an object, the owner-class permissions are used; if the user is a member of the object’s group, the group-class permissions are used; otherwise, the other-class permissions are used.

For example, a file containing a company’s finance database could be owned by the Chief Financial Officer (CFO) and have owner-class permissions “r+w”. It could have its group set to “auditors”, with group-class permissions of only “r” and other-class permissions set to nothing. Then the CFO could freely read and write to the database, all members of the auditors group could read it, and everyone else could not access the database at all.
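
As a rough illustration of that example (hypothetical paths, and assuming the files’ owner and group have already been set with chown), the permissions could be applied as follows; the second case shows the “negative group permissions” pattern mentioned above, where the group class is used to deny access that the other class would allow.

```python
import os
import stat

# Hypothetical finance database from the example above: owner (the CFO)
# gets read + write, group ("auditors") gets read only, others get
# nothing -- equivalent to `chmod 640`.
os.chmod("/srv/finance/ledger.db",
         stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP)

# "Negative" group permissions: the group class can also *deny* access.
# Here the file's group (say, "interns") gets no access at all, while
# everyone else can still read it -- equivalent to `chmod 604`.
os.chmod("/srv/finance/summary.txt",
         stat.S_IRUSR | stat.S_IWUSR | stat.S_IROTH)

# The kernel picks exactly one class per access: owner if the uids match,
# else group if the process belongs to the file's group, else other.
# It never falls through to a more permissive class, which is why the
# group in the second example is locked out.
print(oct(stat.S_IMODE(os.stat("/srv/finance/ledger.db").st_mode)))  # 0o640
```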

Continue reading Vulnerability in Linux containers – investigation and mitigation

The legal rule that computers are presumed to be operating correctly – unforeseen and unjust consequences

In this briefing note, we discuss the legal presumption that computers are operating correctly – a topic previously covered on Bentham’s Gaze, particularly in relation to the Post Office Horizon scandal, but one that is also relevant to other areas such as payment disputes. The briefing note is also available in PDF format at DOI 10.14324/000.rp.10151259, where it includes more detailed citations.

Overview

In England and Wales, courts consider computers, as a matter of law, to have been working correctly unless there is evidence to the contrary. Therefore, evidence produced by computers is treated as reliable unless other evidence suggests otherwise. This way of handling evidence is known as a ‘rebuttable presumption’. A court will treat a computer as if it is working perfectly unless someone can show why that is not the case.

This presumption poses a challenge to those who dispute evidence produced by a computer system. Frequently the challenge is insurmountable, particularly where a substantial institution operates the system.

The Post Office Horizon scandal clearly exposes the problem and the harm that may result. From 1999, the Post Office prosecuted hundreds of postmasters and Post Office employees for theft and fraud based on evidence produced by the Horizon computer system showing shortfalls in their branch accounts. In those prosecutions, the Post Office relied on the presumption that computers were operating correctly.

Hundreds of postmasters and others were convicted, sentenced to terms of imprisonment, fined, or had their property confiscated. This clearly demonstrated that the Law Commission’s assertion that ‘such a regime would work fairly’ was flawed.

In the December 2019 judgment in the group litigation Bates v The Post Office Ltd (No 6: Horizon Issues) Rev 1, Mr Justice Fraser concluded that it was possible that software errors in Horizon could have caused apparent shortfalls in branch accounts, rather than these being due to theft or fraud. Following this judgment, the Criminal Cases Review Commission referred an unprecedented number of convictions, based upon the supposed shortfalls in the Horizon accounts, to the Court of Appeal. Appeal courts have quashed more than 70 convictions at the time of writing. There will be many more appeals and many more convictions quashed in what is likely the largest miscarriage of justice in British history.

Were it not for the group litigation, the fundamental unreliability of the software in the Post Office’s Horizon computer system would not have been revealed, as previous challenges to Horizon’s correctness were unable to rebut the presumption of reliability for computer evidence. The financial risk of bringing legal action deterred other challenges. Similar issues apply in other situations where the reliability of computer evidence is questioned, such as in payment disputes.

The legal presumption, as applied in practice, has exposed widespread misunderstanding about the nature of computer failures – in particular, the fact that these are almost invariably failures of software. The presumption has been the cause of widespread injustice.

Continue reading The legal rule that computers are presumed to be operating correctly – unforeseen and unjust consequences

US proposes to protect bank customers from Authorised Push Payment fraud

This week, at the US House Financial Services Committee hearing, Representative Stephen F. Lynch announced a draft of the Protecting Consumers From Payment Scams Act. If enacted, this would expand the existing protection for US customers (Regulation E) who have funds transferred out of their account without their consent, to also cover cases where the customer is tricked into performing the fraudulent transfer themselves. This development is happening in parallel with efforts in the UK and elsewhere to reduce fraud and better protect victims. However, the draft act’s approach is notably different from the UK approach – it’s simpler, gives stronger protection to customers, and shifts liability to the bank receiving the fraudulent transfer. In this post, I’ll discuss these differences and what the implications might be.

The type of fraud the proposed law deals with, where criminals coerce victims into making payment under false pretences, is known as Authorised Push Payment (APP) fraud and is a problem worldwide. In the UK, APP fraud is now by far the most common type of payment fraud, with losses of £355 million in the first half of 2021, more than all types of card fraud put together (£261 million).

APP fraud falls outside existing consumer protection, so victims are commonly held liable for the losses. The effects can be life-changing, with people losing six-figure sums within minutes. It’s therefore welcome to see moves towards better consumer protection. The UK was one of the first to tackle this problem, with a voluntary code of practice being put in place following years of campaigning by consumer rights organisations, particularly Which?

Continue reading US proposes to protect bank customers from Authorised Push Payment fraud

Pre-loading HSTS for sibling domains through this one weird trick

The vast majority of websites now support encrypted connections over HTTPS. This prevents eavesdroppers from monitoring or tampering with people’s web activity and is great for privacy. However, HTTPS is optional, and all browsers still support plain unsecured HTTP for when a website doesn’t support encryption. HTTP is commonly the default, and even when it’s not, there’s often no warning when access to a site falls back to using HTTP.

The optional nature of HTTPS is its weakness and can be exploited through tools like sslstrip, which force browsers to fall back to HTTP, allowing the attacker to eavesdrop on or tamper with the connection. In response to this weakness, HTTP Strict Transport Security (HSTS) was created. HSTS allows a website to tell the browser that only HTTPS should be used in future. As long as someone visits an HSTS-enabled website once over a trustworthy Internet connection, their browser will refuse any attempt to fall back to HTTP. If that person then uses a malicious Internet connection, the worst that can happen is that access to that website will be blocked; tampering and eavesdropping are prevented.

Still, someone needs to visit the website once before an HSTS setting is recorded, leaving a window of opportunity for an attacker. The sooner a website can get its HSTS setting recorded, the better. One aspect of HSTS that helps is that a website can indicate not only that it should be HSTS enabled, but that all its subdomains should be too. For example, planet.wikimedia.org can say that the subdomain en.planet.wikimedia.org is HSTS enabled. However, planet.wikimedia.org can’t say that commons.wikimedia.org is HSTS enabled because they are sibling domains. As a result, someone would need to visit both commons.wikimedia.org and planet.wikimedia.org before both websites would be protected.
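
For reference, a site opts in to HSTS by sending the Strict-Transport-Security response header, and the includeSubDomains directive extends the policy to subdomains (but, as noted, never to sibling domains). A minimal sketch, assuming a Flask application, might look like this:

```python
from flask import Flask, Response

app = Flask(__name__)

@app.after_request
def add_hsts(response: Response) -> Response:
    # Ask the browser to use HTTPS only, for this host, for the next year,
    # and to apply the same rule to all subdomains. includeSubDomains
    # covers subdomains only -- it cannot protect a sibling domain.
    response.headers["Strict-Transport-Security"] = (
        "max-age=31536000; includeSubDomains"
    )
    return response

@app.route("/")
def index():
    return "Served over HTTPS"
```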

What if HSTS could be applied to sibling domains and not just subdomains? That would allow one domain to protect accesses to another. The HSTS specification explicitly excludes this feature, for a good reason: discovering whether two sibling domains are run by the same organisation is fraught with difficulty. However, it turns out there’s a way to “trick” browsers into pre-loading HSTS status for sibling domains.

Continue reading Pre-loading HSTS for sibling domains through this one weird trick

Apple letting the content-scanning genie out of the bottle

When Apple announced that they would be scanning iPhones for child sexual abuse material (CSAM), the push-back appears to have taken them by surprise. Since then, Apple has been engaging with experts and developing their proposals to mitigate risks that have been raised. In this post, I’ll discuss some of the issues with Apple’s CSAM detection system and what I’ve learned from their documentation and events I’ve participated in.

Technically Apple’s CSAM detection proposal is impressive, and I’m pleased to see Apple listening to the community to address issues raised. However, the system still creates risks that will be difficult to avoid. Governments are likely to ask to expand the system to types of content other than CSAM, regardless of what Apple would like to happen. When they do, there will be complex issues to deal with, both for Apple and the broader technology community. The proposals also risk causing people to self-censor, even when they are doing nothing wrong.

How Apple’s CSAM detection works

The iPhone or iPad scans images for known CSAM just before it uploads the image to Apple’s cloud data storage system – iCloud. Images that are not going to be uploaded don’t get scanned. The comparison between images and the database is made in such a way that minor changes to CSAM, like resizing and cropping, will trigger a match, but any image that wasn’t derived from a known item of CSAM should be very unlikely to match. The results of this matching process go into a clever cryptographic system designed to ensure that the user’s device doesn’t learn the contents of the CSAM database or which of their images (if any) match. If more than a threshold of about 30 images match, Apple will be able to verify if the matching images are CSAM and, if so, report to the authorities. If the number of matching images is less than the threshold, Apple learns nothing.
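
The following is a deliberately over-simplified, non-cryptographic sketch of just the matching-and-threshold logic described above. It is not Apple’s design, which uses a perceptual hash (NeuralHash) and cryptographic protocols so that neither the device nor Apple learns anything below the threshold; the hash function and database here are placeholders.

```python
THRESHOLD = 30  # roughly the figure Apple has described publicly

def perceptual_hash(image_bytes: bytes) -> int:
    # Placeholder: a real system uses a hash designed so that resizing or
    # cropping a known image still produces a match, while unrelated
    # images almost never do.
    return hash(image_bytes)

def count_matches(upload_queue: list[bytes], known_hashes: set[int]) -> int:
    # Count how many of the images queued for upload match the database.
    return sum(perceptual_hash(img) in known_hashes for img in upload_queue)

def should_escalate(upload_queue: list[bytes], known_hashes: set[int]) -> bool:
    # Below the threshold, nothing is revealed; at or above it, the
    # matching images would become visible for human review.
    return count_matches(upload_queue, known_hashes) >= THRESHOLD
```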

Risk of scope creep

Now that Apple has built their system, a risk is that it could be extended to search for content other than CSAM by expanding the database used for matching. While some security properties of their system are ensured through cryptography, the restriction to CSAM is only a result of Apple’s policy on the content of the matching database. Apple has clearly stated that it would resist any expansion of this policy, but governments may force Apple to make changes. For example, in the UK, this could be through a Technical Capability Notice (under the Investigatory Powers Act) or powers proposed in the Online Safety Bill.

If a government legally compelled them to expand the matching database, Apple might have to choose between complying and leaving the market. So far, Apple has refused to say which of these choices they would take.

Continue reading Apple letting the content-scanning genie out of the bottle

Exploring an Attack on Image Scaling Algorithms

In their 2019 publication ‘Seeing is Not Believing: Camouflage Attacks on Image Scaling Algorithms’, Xiao et al. demonstrated a fascinating and frightening exploit against several commonly used scaling algorithms. Through what Quiring et al. referred to as adversarial preprocessing, they created an attack image that closely resembles one image (the decoy) but portrays a completely different image (the payload) when scaled down. In their example (below), an image of sheep could scale down and suddenly show a wolf.

On the left, a group of sheep can be seen in a slightly stretched out photo (the decoy). When scaled down to the correct dimensions (right), the image shows a grey wolf (the payload). This is an example of an attack image.

These attack images can be used in a number of scenarios, particularly in data poisoning of deep learning datasets and covert dissemination of information. Deep learning models require large datasets for training. A series of carefully crafted and planted attack images placed into public datasets can poison these models, for example, reducing the accuracy of object classification. Essentially all models are trained with images scaled down to a fixed size (e.g. 229 × 229) to reduce the computational load, so these attack images are highly likely to work if their dimensions are correctly configured. As these attack images hide their malicious payload in plain sight, they also evade detection. Xiao et al. described how an attack image could be crafted for a specific device (e.g. an iPhone XS) so that the iPhone XS browser renders the malicious image instead of the decoy image. This technique could be used to propagate payloads, such as illegal advertisements, discreetly.
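
To see why such attacks are possible, note that a downscaling algorithm with a sparse sampling pattern (nearest-neighbour is the simplest case) only ever reads a small fraction of the source pixels. The toy sketch below illustrates that idea; it is a simplification rather than Xiao et al.’s or Quiring et al.’s optimisation-based method, and the exact pixels sampled depend on the scaling library’s rounding convention.

```python
import numpy as np

def sample_indices(src_len: int, dst_len: int) -> np.ndarray:
    # Source pixel indices that survive a nearest-neighbour downscale
    # from src_len to dst_len (one common rounding convention).
    return (np.arange(dst_len) * (src_len / dst_len)).astype(int)

def craft_attack_image(decoy: np.ndarray, payload: np.ndarray) -> np.ndarray:
    # decoy: large image (e.g. the sheep); payload: small image (e.g. the
    # wolf) whose shape equals the scaler's target size.
    attack = decoy.copy()
    rows = sample_indices(decoy.shape[0], payload.shape[0])
    cols = sample_indices(decoy.shape[1], payload.shape[1])
    # Overwrite only the pixels the scaler will read. The overwhelming
    # majority of the decoy is untouched, so at full resolution the image
    # still looks like the decoy, but downscaling it with the same
    # nearest-neighbour scaler reproduces the payload.
    attack[np.ix_(rows, cols)] = payload
    return attack
```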

The natural stealthiness of this attack is a dangerous factor, but on top of that, it is also relatively easy to replicate. Xiao et al. published their source code in a GitHub repository, which anyone can run to create their own attack images. Additionally, the maths behind the method is well described in the paper, which allowed my group to replicate the attack, without referencing the paper authors’ source code, for coursework assigned to us for UCL’s Computer Security II module. Our implementation of the attack is available in our GitHub repository. This coursework required us to select an attack detailed in a conference paper and replicate it. While working on the coursework, we discovered a relatively simple way to stop these attack images from working and even allow the original content to be viewed. This is shown in the series of images below.

Continue reading Exploring an Attack on Image Scaling Algorithms

What went wrong with Horizon: learning from the Post Office Trial

The Post Office trial has revealed what is likely the largest miscarriage of justice in UK legal history. Hundreds of individuals who operated Post Office branches (subpostmasters) were convicted on fraud and theft charges on the basis of missing funds identified by the Horizon accounting system. Thousands more subpostmasters were forced to pay the Post Office back for these shortfalls. But the Post Office trial concluded that Horizon was “not remotely robust”, that the supposed shortfalls might never have existed in the first place and that, where they did, they might not have been the fault of the subpostmaster.

This scandal resulted from insufficient information being disclosed in the process of prosecuting subpostmasters, poor oversight of the Post Office (both by its management and by the government) and a failure of the legal system to view evidence generated by Horizon with appropriate scepticism. These matters have been discussed elsewhere, but what’s been talked about less are the technical failures in Horizon and associated systems that might have caused the supposed shortfalls.

I spoke to the Computerphile YouTube channel about what we’ve learned about Horizon and its failures, based on the Post Office trial. What seems to be a simple problem – keeping track of how much money and stock is in a branch – is actually much harder than it appears. Considering the large number of transactions that Horizon performs (millions per day), inevitable hardware and communication failures, and the complex interactions between systems, it should have been obvious that errors would be a common occurrence.

In this video, I explained the basics of double-entry accounting, how this must be implemented on a transaction system (one that provides atomicity, consistency, isolation, and durability – ACID), and gave some examples of where Horizon has failed. For the video, I had to abbreviate and simplify some of the aspects discussed, so I wrote this blog post to point to the parts of the Post Office trial judgement that describe the situations in which Horizon was identified to have failed.
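
As a rough illustration of the atomicity point (a minimal sketch with a made-up schema, not how Horizon actually works), the two legs of a transfer between stock units should be wrapped in a single database transaction, so that a crash can never leave one leg applied without the other, or applied twice.

```python
import sqlite3

conn = sqlite3.connect("branch.db")
conn.execute("CREATE TABLE IF NOT EXISTS stock_units "
             "(name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT OR IGNORE INTO stock_units VALUES (?, ?)",
                 [("individual", 4000), ("main", 0)])
conn.commit()

def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    # Both legs of the double entry run inside one transaction: either
    # both are committed, or (after a crash or error) neither is. There
    # is no state in which the debit exists without the credit, or in
    # which either leg is applied twice.
    with conn:  # commits on success, rolls back on any exception
        conn.execute("UPDATE stock_units SET balance = balance - ? "
                     "WHERE name = ?", (amount, src))
        conn.execute("UPDATE stock_units SET balance = balance + ? "
                     "WHERE name = ?", (amount, dst))

transfer(conn, "individual", "main", 4000)
```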

Failure of atomicity resulting in a duplication of a transfer

At 7:06, I talked about atomicity requiring that all parts of a transaction must occur precisely once. The judgement (paragraph 346) gives an example of Horizon duplicating part of a transaction following a system crash.

Mr Godeseth was taken, very carefully, through a specific use of the transaction correction tool in 2010. In PEAK 0195561, a problem was reported to the SSC on 4 March 2010 where a SPM had tried, on 2 March 2010, to transfer out £4,000 (referred to in the PEAK as 4,000 pds, which means either pounds (plural) or pounds sterling) from an individual stock unit into the shared main stock unit when the system crashed. The SPM was then issued with 2 x £4,000 receipts. These two receipts had the same session number. The PEAK, as one would expect, records various matters in note form and also uses informal shorthand. However, the main thrust is that when the SPM did the cash declaration, although the main stock unit (into which the £4,000 was being transferred) “was fine”, the unit from which the cash was taken “was out by 4000 pounds (a loss of 4000 pds)”. This is very similar to what Mr Latif said had happened to him, although the transfer in July 2015 to which he referred was £2,000. The PEAK related to Horizon Online and was the admitted occasion when the Balancing Transaction tool had been used.

Continue reading What went wrong with Horizon: learning from the Post Office Trial