Apple letting the content-scanning genie out of the bottle

When Apple announced that they would be scanning iPhones for child sexual abuse material (CSAM), the push-back appears to have taken them by surprise. Since then, Apple has been engaging with experts and developing their proposals to mitigate risks that have been raised. In this post, I’ll discuss some of the issues with Apple’s CSAM detection system and what I’ve learned from their documentation and events I’ve participated in.

Technically Apple’s CSAM detection proposal is impressive, and I’m pleased to see Apple listening to the community to address issues raised. However, the system still creates risks that will be difficult to avoid. Governments are likely to ask to expand the system to types of content other than CSAM, regardless of what Apple would like to happen. When they do, there will be complex issues to deal with, both for Apple and the broader technology community. The proposals also risk causing people to self-censor, even when they are doing nothing wrong.

How Apple’s CSAM detection works

The iPhone or iPad scans images for known CSAM just before it uploads the image to Apple’s cloud data storage system – iCloud. Images that are not going to be uploaded don’t get scanned. The comparison between images and the database is made in such a way that minor changes to CSAM, like resizing and cropping, will trigger a match, but any image that wasn’t derived from a known item of CSAM should be very unlikely to match. The results of this matching process go into a clever cryptographic system designed to ensure that the user’s device doesn’t learn the contents of the CSAM database or which of their images (if any) match. If more than a threshold of about 30 images match, Apple will be able to verify if the matching images are CSAM and, if so, report to the authorities. If the number of matching images is less than the threshold, Apple learns nothing.

Risk of scope creep

Now that Apple has built their system, a risk is that it could be extended to search for content other than CSAM by expanding the database used for matching. While some security properties of their system are ensured through cryptography, the restriction to CSAM is only a result of Apple’s policy on the content of the matching database. Apple has clearly stated that it would resist any expansion of this policy, but governments may force Apple to make changes. For example, in the UK, this could be through a Technical Capability Notice (under the Investigatory Powers Act) or powers proposed in the Online Safety Bill.

If a government legally compelled them to expand the matching database, Apple may have to choose between complying or leaving the market. So far, Apple has refused to say which of these choices they would take.

Continue reading Apple letting the content-scanning genie out of the bottle

Exploring an Attack on Image Scaling Algorithms

In their 2019 publication ‘Seeing is Not Believing: Camouflage Attacks on Image Scaling Algorithms’, Xiao et al. demonstrated a fascinating and frightening exploit on a few commonly used and popular scaling algorithms. Through what Quiring et al. referred to as adversarial preprocessing, they created an attack image that closely resembles one image (decoy) but portrays a completely different image (payload) when scaled down. In their example (below), an image of sheep could scale down and suddenly show a wolf.

Two images are shown, the left shows the original attack image, which depicts a group of sheep. The right shows the scaled down attack image, which shows a grey wolf.
On the left, a group of sheep can be seen in a slightly stretched out photo (the decoy). When scaled down to the correct dimensions (right), the image shows a grey wolf (payload). This is an example of an attack image.

These attack images can be used in a number of scenarios, particularly in data poisoning of deep learning datasets and covert dissemination of information. Deep learning models require large datasets for training. A series of carefully crafted and planted attack images placed into public datasets can poison these models, for example, reducing the accuracy of object classification. Essentially all models are trained with images scaled down to a fixed size (e.g. 229 × 229) to reduce the computational load, so these attack images are highly likely to work if their dimensions are correctly configured. As these attack images hide their malicious payload in plain sight, they also evade detection. Xiao et al. described how an attack image could be crafted for a specific device (e.g. an iPhone XS) so that the iPhone XS browser renders the malicious image instead of the decoy image. This technique could be used to propagate payload, such as illegal advertisements, discreetly.

The natural stealthiness of this attack is a dangerous factor, but on top of that, it is also relatively easy to replicate. Xiao et al. published their own source code in a GitHub repository, with which anyone can run and create their own attack images. Additionally, the maths behind the method is also well described in the paper, allowing my group to replicate the attack for coursework assigned to us for UCL’s Computer Security II module, without referencing the paper authors’ source code. Our implementation of the attack is available at our GitHub repository. This coursework required us to select an attack detailed in a conference paper and replicate it. While working on the coursework, we discovered a relatively simple way to stop these attack images from working and even allow the original content to be viewed. This is shown in the series of images below.

Continue reading Exploring an Attack on Image Scaling Algorithms

What went wrong with Horizon: learning from the Post Office Trial

This Post Office trial has revealed what is likely the largest miscarriage of justice in UK legal history. Hundreds of individuals who operated Post Office branches (subpostmasters) were convicted on fraud and theft charges on the basis of missing funds identified by the Horizon accounting system. Thousands more subpostmasters were forced to pay the Post Office back for these shortfalls. But the Post Office trial concluded that Horizon was “not remotely robust”, and the supposed shortfalls might never have existed in the first place and, where they did, they might not have been due to the fault of the subpostmaster.

This scandal resulted from insufficient information being disclosed in the process of prosecuting subpostmasters, poor oversight of the Post Office (both by its management and by the government) and a failure of the legal system to view evidence generated by Horizon with appropriate scepticism. These matters have been discussed elsewhere, but what’s been talked about less are the technical failures in Horizon and associated systems that might have caused the supposed shortfalls.

I spoke to the Computerphile YouTube channel about what we’ve learned about Horizon and its failures, based on the Post Office trial. What seems to be a simple problem – keeping track of how much money and stock is in a branch – is actually much harder than it appears. Considering the large number of transactions that Horizon performs (millions per day), inevitable hardware and communication failures, and the complex interactions between systems, it should have been obvious that errors would be a common occurrence.

In this video, I explained the basics of double-entry accounting, how this must be implemented on a transaction system (that provides atomicity, consistency, isolation, and durability – ACID) and gave some examples of where Horizon has failed. For this video, I had to abbreviate and simplify some of the aspects discussed, so I wrote this blog post to refer to the Post Office trial judgement that talked about the situations in which Horizon has been identified to fail.

Failure of atomicity resulting in a duplication of a transfer

At 7:06, I talked about atomicity requiring that all parts of a transaction must occur precisely once. In the judgement (paragraph 346), an example of where Horizon duplicated part of a transaction following a system crash.

Mr Godeseth was taken, very carefully, through a specific use of the transaction correction tool in 2010. In PEAK 0195561, a problem was reported to the SSC on 4 March 2010 where a SPM had tried, on 2 March 2010, to transfer out £4,000 (referred to in the PEAK as 4,000 pds, which means either pounds (plural) or pounds sterling) from an individual stock unit into the shared main stock unit when the system crashed. The SPM was then issued with 2 x £4,000 receipts. These two receipts had the same session number. The PEAK, as one would expect, records various matters in note form and also uses informal shorthand. However, the main thrust is that when the SPM did the cash declaration, although the main stock unit (into which the £4,000 was being transferred) “was fine”, the unit from which the cash was taken “was out by 4000 pounds (a loss of 4000 pds)”. This is very similar to what Mr Latif said had happened to him, although the transfer in July 2015 to which he referred was £2,000. The PEAK related to Horizon Online and was the admitted occasion when the Balancing Transaction tool had been used.

Continue reading What went wrong with Horizon: learning from the Post Office Trial

Making sense of EMV card data – how to decode the TLV data format

At the Payment Village in DEFCON 28, I presented a talk about my research in payment system security. While my talks have in the past covered high-level issues or particular security vulnerabilities, for this presentation, I went into depth about the TLV (tag-length-value) data format that anyone researching payment security is going to have to deal with. This format is used for Chip and PIN cards, as specified by the EMV standard, and is present in related standards like contactless and mobile payments. The TLV format used in EMV is also closely related to the ASN.1 format used in HTTPS certificates. There are automated decoders for TLV (the one I wrote is available on EMVLab), but for the purposes of debugging, testing and handling corrupt or incomplete data, it’s sometimes necessary to get your hands dirty and understand the format yourself. In this talk, I show how this can be done.

Rather than the usual PowerPoint, I tried something different for this talk. The slides are an interactive RISE show based on a Juptyer notebook, demonstrating a Python library I wrote to show TLV data-structure decoding. Everything is in my talk’s GitHub repository, and you can experiment with the notebook and view the slides without installing any software through its Binder. I have an accompanying Sway notebook with the reference guides I relied upon for the talk. Do have a try with this material, and I’d welcome your comments on how well (or badly) this approach works.

The DEFCON Payment Village is running again this year in August. If you’ve got something you would like to share with the community, the call for papers is open until 15 July 2021.

Evidence Critical Systems: Designing for Dispute Resolution

On Friday, 39 subpostmasters had their criminal convictions overturned by the Court of Appeal. These individuals ran post office branches and were prosecuted for theft, fraud and false accounting based on evidence from Horizon, the Post Office computer system created by Fujitsu. Horizon’s evidence was asserted to be reliable by the Post Office, who mounted these prosecutions, and was accepted as proof by the courts for decades. It was only through a long and expensive court case that a true record of Horizon’s problems became publicly known, with the judge concluding that it was “not remotely reliable”, and so allowing these successful appeals against conviction.

The 39 quashed convictions are only the tip of the iceberg. More than 900 subpostmasters were prosecuted based on evidence from Horizon, and many more were forced to reimburse the Post Office for losses that might never have existed. It could be the largest miscarriage of justice the UK has ever seen, and at the centre is the Horizon computer system. The causes of this failure are complex, but one of the most critical is that neither the Post Office nor Fujitsu disclosed the information necessary to establish the reliability (or lack thereof) of Horizon to subpostmasters disputing its evidence. Their reasons for not doing so include that it would be expensive to collect the information, that the details of the system are confidential, and disclosing the information would harm their ability to conduct future prosecutions.

The judgment quashing the convictions had harsh words about this failure of disclosure, but this doesn’t get away from the fact that over 900 prosecutions took place before the problem was identified. There could easily have been more. Similar questions have been raised relating to payment disputes: when a customer claims to be the victim of fraud but the bank says it’s the customer’s fault, could a computer failure be the cause? Both the Post Office and banking industry rely on the legal presumption in England and Wales that computers operate correctly. The responsibility for showing otherwise is for the subpostmaster or banking customer.

Continue reading Evidence Critical Systems: Designing for Dispute Resolution

Aggregatable Distributed Key Generation

We present our work on designing an aggregatable distributed key generation algorithm, which will appear at Eurocrypt 2021.  This is joint work with Kobi Gurkan, Philipp Jovanovic, Mary Maller, Sarah Meiklejohn, Gilad Stern, and Alin Tomescu.

What is a Distributed Key Generation Algorithm?

Ever heard of Shamir’s secret sharing algorithm? It’s a classic. The overriding idea is that it is harder to corrupt many people than corrupting one person. Shamir’s secret sharing algorithm ensures that you can only learn a secret if multiple people cooperate. In cryptography, we often want to share a secret key so that we can distribute trust. The secret key might be used to decrypt a database, sign a transaction, or compute some pseudo-randomness.

In a secret sharing scheme, there is a trusted dealer who knows the whole secret, shares it out, and then goes offline. This begs the question: why bother to share the secret in the first place if you have a trusted dealer who knows the whole secret? Often the reason is that the secret sharing scheme is merely being used as an ingredient in a larger distributed key generation algorithm in which nobody knows the full secret. This isn’t always true; certainly, there are cases where a central authority might delegate tasks to workers with less authority. But in the case where there is no central authority, we need a more complete solution.

Continue reading Aggregatable Distributed Key Generation

Still treating users as the enemy: entrapment and the escalating nastiness of simulated phishing campaigns

Three years ago, we made the case against phishing your own employees through simulated phishing campaigns. They do little to improve security: click rates tend to be reduced (temporarily) but not to zero – and each remaining click can enable an attack. They also have a hidden cost in terms of productivity – employees have to spend time processing more emails that are not relevant to their work, and then spend more time pondering whether to act on emails. In a recent paper, Melanie Volkamer and colleagues provided a detailed listing of the pros and cons from the perspectives of security, human factors and law. One of the legal risks was finding yourself in court with one of the 600-pound digital enterprise gorillas for trademark infringement – Facebook objected to their trademark and domain being impersonated. They also likely don’t want their brand to be used in attacks because, contrary to what some vendors tell you, being tricked by your employer is not a pleasant experience. Negative emotions experienced with an event often transfer to anyone or anything associated with it – and negative emotions are not what you want associated with your brand if your business depends on keeping billions of users engaging with your services as often as possible.

Recent tactics employed by the providers of phishing campaigns can only be described as entrapment – to “demonstrate” the need for their services, they create messages that almost everyone will click on. Employees of the Chicago Tribune and GoDaddy, for instance, received emails promising bonuses. Employees had hope of extra pay raised and then cruelly dashed, and on top, were hectored for being careless about phishing. Some employees vented their rage publicly on Twitter, and the companies involved apologised. The negative publicity may eventually be forgotten, but the resentment of employees feeling not only tricked but humiliated and betrayed, will not fade any time soon. The increasing nastiness of entrapment has seen employees targeted with promises of COVID vaccinations from employers – who then find themselves being ridiculed for their gullibility instead of lauded for their willingness to help.

Continue reading Still treating users as the enemy: entrapment and the escalating nastiness of simulated phishing campaigns

Thoughts on the Future Implications of Microsoft’s Legal Approach towards the TrickBot Takedown

Just this week, Microsoft announced its takedown operation against the TrickBot botnet, in collaboration with other cybersecurity partners, such as FS-ISAC, ESET, and Symantec. This takedown followed Microsoft’s successful application for a court order this month, enabling them to enact technical disruption against the botnet. Such legal processes are typical and necessary precursors to such counter-operations.

However, what was of particular interest, in this case, was the legal precedent Microsoft (successfully) sought, which was based on breaches of copyright law. Specifically, they founded their claim on the alleged reuse (and misuse) of Microsoft’s copyrighted software – the Windows 8 SDK – by the TrickBot malware authors.

Now, it is clear that this takedown operation is not likely to cripple the entirety of the TrickBot operation. As numerous researchers have found (e.g., Stone-Gross et al., 2011; Edwards et al., 2015), a takedown operation often works well in the short-term, but the long-term effects are highly variable. More often than not, unless they are arrested, and their infrastructure is seized, botnet operators tend to respond to such counter-operations by redeploying their infrastructure to new servers and ISPs, moving their operations to other geographic regions or new targets, and/or adapting their malware to become more resistant to detection and analysis. In fact, these are just some of the behaviours we observed in a case-by-case longitudinal study on botnets targeted by law enforcement (one of which involved Dyre, a predecessor of the TrickBot malware). A pre-print of this study is soon to be released.

So, no, I’m not proposing to discuss the long-term efficacy of takedown operations such as this. That is for another blog post.

Rather, what I want to discuss (or, perhaps, more accurately, put forward as some initial thoughts) are the potential implications of Microsoft’s legal approach to obtaining the court order (which is incumbent for such operations) on future botnet takedowns, particularly in the area of malicious code reuse.

Continue reading Thoughts on the Future Implications of Microsoft’s Legal Approach towards the TrickBot Takedown

Winkle – Decentralised Checkpointing for Proof-of-Stake

Several blockchain projects are considering proof-of-stake mechanisms in place of proof-of-work, attracted by the lower energy costs. Some proof-of-stake protocols based on BFT systems such as HotStuff or Tendermint appear to provide faster and deterministic finality. In these protocols, a set of nodes known as validators, that are identified by their public key, operates the consensus protocol such that any user can verify it using only publicly available information by verifying the validators’ signatures. The set of validators changes periodically, with respect to a specific governance mechanism.

However, as any consensus protocol that is not based on resource consumption (such as proof-of-work, proof-of-space and so on) they are vulnerable to an attack known in the literature as Long-Range Attack. In a Long-Range Attack, an adversary obtains the secret keys of past validators (e.g., by bribing them at no cost since they do not use these keys any more) and is thus able to re-write the entire history of the blockchain with those. A user that has been offline for a long period of time could then be fooled by the adversarial chain.

The number of keys holding a given fraction of stake (logarithmic scale).

To solve this problem, we propose Winkle, a decentralised checkpointing mechanism operated by coin holders, whose keys are harder to compromise than validators’ as they are more numerous. By analogy, in Bitcoin, taking control of one-third of the total supply of money would require at least 889 keys, whereas only 4 mining pools control more than half of the hash power (see figure above).

Our Protocol

The idea of Winkle is that coin holders will checkpoint the honest chain, such that if an adversary creates an alternative chain, its chain will not be checkpointed (since the adversary does not control the keys of coin holders) and is thus easily differentiable from the honest chain.

Continue reading Winkle – Decentralised Checkpointing for Proof-of-Stake

The role of usability, power dynamics, and incentives in dispute resolutions around computer evidence

As evidence produced by a computer is often used in court cases, there are necessarily presumptions about the correct operation of the computer that produces it. At present, based on a 1997 paper by the Law Commission, it is assumed that a computer operated correctly unless there is explicit evidence to the contrary.

The recent Post Office trial (previously mentioned on Bentham’s Gaze) has made clear, if previous cases had not, that this assumption is flawed. After all, computers and the software they run are never perfect.

This blog post discusses a recent invited paper published in the Digital Evidence and Electronic Signature Law Review titled The Law Commission presumption concerning the dependability of computer evidence. The authors of the paper, collectively referred to as LLTT, are Peter Bernard Ladkin, Bev Littlewood, Harold Thimbleby and Martyn Thomas.

LLTT examine the basis for the presumption that a computer operated correctly unless there is explicit evidence to the contrary. They explain why the Law Commission’s belief in Colin Tapper’s statement in 1991 that “most computer error is either immediately detectable or results from error in the data entered into the machine” is flawed. Not only can computers be assumed to have bugs (including undiscovered bugs) but the occurrence of a bug may not be noticeable.

LLTT put forward three recommendations. First, a presumption that any particular computer system failure is not caused by software is not justified, even for software that has previously been shown to be very reliable. Second, evidence of previous computer failure undermines a presumption of current proper functioning. Third, the fact that a class of failures has not happened before is not a reason for assuming it cannot occur.

Continue reading The role of usability, power dynamics, and incentives in dispute resolutions around computer evidence