March 2023 Archives

In the umpteenth chapter of UK governments battling encryption, Priti Patel in September 2021 launched the “Safety Tech Challenge”. It was to give five companies £85K each to develop “innovative technologies to keep children safe when using end-to-end encrypted messaging services”. Tasked with evaluating the outcomes was the REPHRAIN project, the consortium given £7M to address online harms. I had been part of the UKRI 2020 panel awarding this grant, and believed then and now that it concerns a politically laden and technically difficult task, that was handed to a group of eminently sensible scientists.¹ While the call had strongly invited teams to promise the impossible in order to placate the political goals, this team (and some other consortia too) wisely declined to do so, and remained realistic.

The evaluation results have now come back, and the REPHRAIN team have done a very decent job given that they had to evaluate five different brands of snake oil with their hands tied behind their backs. In doing so, they have made a valuable contribution to the development of trustworthy AI in the important application area of online (child) safety technology.

The Safety Tech Challenge

The Safety Tech Challenge was always intellectually dishonest. The essence of end-to-end encryption (E2EE) is that nothing² can be known about encrypted information by anyone other than the sender and receiver. Not whether the last bit is a 0, not whether the message is CSAM (child sexual abuse material).³ The final REPHRAIN report indeed states there is “no published research on computational tools that can prevent CSAM in E2EE”.

In terms of technologies, there really also is no such thing as “in the context of E2EE”: the messages are agnostic as to whether they are about to be encrypted (on the sender side) or have just been decrypted (on the receiving side), and nothing meaningful can be done⁴ in between; any technologies that can be developed are agnostic of when they get invoked.

Continue reading A well-executed exercise in snake oil evaluation

How To Safely Release Data?

Before discussing synthetic data, let’s first consider the “alternatives.”

Anonymization: Theoretically, one could remove personally identifiable information before sharing it. However, in practice, anonymization fails to provide realistic privacy guarantees because a malevolent actor often has auxiliary information that allows them to re-identify anonymized data. For example, when Netflix de-identified movie rankings (as part of a challenge seeking better recommendation systems), Arvind Narayanan and Vitaly Shmatikov de-anonymized a large chunk by cross-referencing them with public information on IMDb.

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

Month: March 2023

A well-executed exercise in snake oil evaluation

The Safety Tech Challenge

What is Synthetic Data? The Good, the Bad, and the Ugly

How To Safely Release Data?