Improving the auditability of access to data requests

Data is increasingly collected and shared, with potential benefits for both individuals and society as a whole, but people cannot always be confident that their data will be shared and used appropriately. Decisions made with the help of sensitive data can greatly affect lives, so there is a need for ways to hold data processors accountable. This requires not only ways to audit these data processors, but also ways to verify that the reported results of an audit are accurate, while protecting the privacy of individuals whose data is involved.

We (Alexander Hicks, Vasilios Mavroudis, Mustafa Al-Basam, Sarah Meiklejohn and Steven Murdoch) present a system, VAMS, that allows individuals to check accesses to their sensitive personal data, enables auditors to detect violations of policy, and allows publicly verifiable and privacy-preserving statistics to be published. VAMS has been implemented twice, as a permissioned distributed ledger using Hyperledger Fabric and as a verifiable log-backed map using Trillian. The paper and the code are available.

Use cases and setting

Our work is motivated by two scenarios: controlling the access of law-enforcement personnel to communication records and controlling the access of healthcare professionals to medical data.

The UK Home Office states that 95% of serious and organized criminal cases make use of communications data. Annual reports published by the IOCCO (now under the IPCO name) provide some information about the request and use of communications data. There were over 750 000 requests for data in 2016, a portion of which were audited to provide the usage statistics and errors that can be found in the published report.

Not only is it important that requests are auditable, the requested data can also be used as evidence in legal proceedings. In this case, it is necessary to ensure the integrity of the data or to rely on representatives of data providers and expert witnesses, the latter being more expensive and requiring trust in third parties.

In the healthcare case, individuals usually consent for their GP or any medical professional they interact with to have access to relevant medical records, but may have concerns about the way their information is then used or shared.  The NHS regularly shares data with researchers or companies like DeepMind, sometimes in ways that may reduce the trust levels of individuals, despite the potential benefits to healthcare.

Continue reading Improving the auditability of access to data requests

Scanning the Internet for Liveness

Internet-wide scanning (or probing) has emerged as a key measurement technique to study a diverse set of the Internet’s properties, including address space utilization, host reachability, topology, service availability, vulnerabilities, and service discrimination. But despite its widespread use and critical importance for Internet measurement, we still lack a clear understanding of IP liveness—whether a target IP address responds to a probe packet. What type of probe packets should we send if we, for example, want to maximize the responding host population? What type of responses can we expect and which factors determine such responses? What degree of consistency can we expect when probing the same host with different probe packets?

In our recent paper Scanning the Internet for Liveness, we presented a systematic analysis of liveness and how it shows up in active scanning campaigns. We developed a taxonomy of liveness which we employed to develop a method to perform concurrent IPv4 scans using ICMP, five TCP-based, and two UDP-based protocols, capturing all responses to our probes. Our key findings are:

  • Responsive host populations are highly sensitive to the choice of probe. While ICMP discovers the highest number of raw IPs, our TCP and UDP measurements exclusively contribute a fifth to the total population of responsive hosts.
  • Collecting ICMP Error messages for TCP and UDP scans increases the responsive population by more than 13%, and provides new opportunities to interpret scan results.
  • At the transport layer, our concurrent measurements reveal that the majority of hosts exhibit inconsistent behaviour when probed on different ports, and that capturing negative responses significantly improves scanning completeness.
  • Our concurrent scans allow us to identify nearly 2M tarpits (IPs masquerading as fake hosts) that would bias measurements that do not take them into account.
  • Our study of cross-protocol liveness shows that responsiveness for some protocols is correlated, suggesting that the same seed set of responsive IP addresses can be potentially used to bootstrap multiple highly-correlated target populations to reduce scan traffic.

This work recently appeared in the April 2018 issue of ACM SIGCOMM Computer Communication Review (CCR), and was conducted in collaboration with Philipp Richter (MIT), Mobin Javed (LUMS Pakistan, ICSI Berkeley), Srikanth Sundaresan (Princeton University), Zakir Durumeric (Stanford University), Steven J. Murdoch (University College London), Richard Mortier (University of Cambridge) and Vern Paxson (UC Berkeley, ICSI Berkeley). Overall, this study yields practical insights and methodological improvements for the design and the execution of active Internet measurement studies. We released the code and data of this work as open source to allow for reproducibility of the results, and to enable further research.

Continue reading Scanning the Internet for Liveness

Tampering with OpenPGP digitally signed messages by exploiting multi-part messages

The EFAIL vulnerability in the OpenPGP and S/MIME secure email systems, publicly disclosed yesterday, allows an eavesdropper to obtain the contents of encrypted messages. There’s been a lot of finger-pointing as to which particular bit of software is to blame, but that’s mostly irrelevant to the people who need secure email. The end result is that users of encrypted email, who wanted formatting better than what a mechanical typewriter could offer, were likely at risk.

One of the methods to exploit EFAIL relied on the section of the email standard that allows messages to be in multiple parts (e.g. the body of the message and one or more attachments) – known as MIME (Multipurpose Internet Mail Extensions). The authors of the EFAIL paper used the interaction between MIME and the encryption standard (OpenPGP or S/MIME as appropriate) to cause the email client to leak the decrypted contents of a message.

However, not only can MIME be used to compromise the secrecy of messages, but it can also be used to tamper with digitally-signed messages in a way that would be difficult if not impossible for the average person to detect. I doubt I was the first person to discover this, and I reported it as a bug 5 years ago, but it still seems possible to exploit and I haven’t found a proper description, so this blog post summarises the issue.

The problem arises because it is possible to have a multi-part email where some parts are signed and some are not. Email clients could have adopted the fail-safe option of considering such a mixed message to be malformed and therefore treated as unsigned or as having an invalid signature. There’s also the fail-open option where the message is considered signed and both the signed and unsigned parts are displayed. The email clients I looked at (Enigmail with Mozilla Thunderbird, and GPGTools with Apple Mail) opt for a variant of the fail-open approach and thus allow emails to be tampered with while keeping their status as being digitally signed.

Continue reading Tampering with OpenPGP digitally signed messages by exploiting multi-part messages

“The pool’s run dry” – analyzing anonymity in Zcash

Zcash is a cryptocurrency whose main feature is a “shielded pool” that is designed to provide strong anonymity guarantees. Indeed, the cryptographic foundations of the shielded pool are based in highly-regarded academic research. The deployed Zcash protocol, however, allows for transactions outside of the shielded pool (which, from an anonymity perspective, are identical to Bitcoin transactions), and it can be easily observed from blockchain data that the majority of transactions do not use the pool. Nevertheless, users of the shielded pool should be able to treat it as their anonymity set when attempting to spend coins in an anonymous fashion.

In a recent paper, An Empirical Analysis of Anonymity in Zcash, we (George Kappos, Haaroon Yousaf, Mary Maller, and Sarah Meiklejohn) conducted an empirical analysis of Zcash to further our understanding of its shielded pool and broader ecosystem. Our main finding is that is possible in many cases to identify the activity of founders and miners using the shielded pool (who are required by the consensus rules to put all newly generated coins into it). The implication for anonymity is that this activity can be excluded from any attempt to track coins as they move through the pool, which acts to significantly shrink the effective anonymity set for regular users. We have disclosed all our findings to the developers of Zcash, who have written their own blog post about this research.  This work will be presented at the upcoming USENIX Security Symposium.

What is Zcash?

In Bitcoin, the sender(s) and receiver(s) in a transaction are publicly revealed on the blockchain. As with Bitcoin, Zcash has transparent addresses (t-addresses) but gives users the option to hide the details of their transactions using private addresses (z-addresses). Private transactions are conducted using the shielded pool and allow users to spend coins without revealing the amount and the sender or receiver. This is possible due to the use of zero-knowledge proofs.

Like Bitcoin, new coins are created in public “coingen” transactions within new blocks, which reward the miners of those blocks. In Zcash, a percentage of the newly minted coins are also sent to the founders (a predetermined list of Zcash addresses owned by the developers and embedded into the protocol).

Continue reading “The pool’s run dry” – analyzing anonymity in Zcash