New threat models in the face of British intelligence and the Five Eyes’ new end-to-end encryption interception strategy

As more and more services and messaging applications implement end-to-end encryption, law enforcement organisations and intelligence agencies have become increasingly concerned about the prospect of “going dark”. This is when law enforcement has the legal right to access a communication (e.g. through a warrant) but lacks the technical capability to do so, because the communication is end-to-end encrypted.

Earlier proposals from politicians took the approach of outright banning end-to-end encryption, which was met with fierce criticism from experts and the tech industry. The intelligence community has been slightly more nuanced, promoting protocols that allow for key escrow, where messages would also be encrypted under an additional key (e.g. one controlled by the government). Such protocols have been promoted by intelligence agencies as recently as 2016 and as early as the 1990s, but were also met with fierce criticism.

More recently, a new set of legislation in the UK, statements from the Five Eyes and proposals from intelligence officials have put forward a “different” way of defeating end-to-end encryption, one that is akin to key escrow but is enabled on a “per-warrant” basis rather than by default. Let’s look at how this may affect threat models in applications that use end-to-end encryption in the future.

Legislation

On the 31st of August 2018, the governments of the United States, the United Kingdom, Canada, Australia and New Zealand (collectively known as the “Five Eyes”) released a “Statement of Principles on Access to Evidence and Encryption”, where they outlined their position on encryption.

The statement says:

Privacy laws must prevent arbitrary or unlawful interference, but privacy is not absolute. It is an established principle that appropriate government authorities should be able to seek access to otherwise private information when a court or independent authority has authorized such access based on established legal standards.

The statement goes on to set out that technology companies have a mutual responsibility with government authorities to enable this process. At the end of the statement, it describes how technology companies should provide government authorities access to private information:

The Governments of the Five Eyes encourage information and communications technology service providers to voluntarily establish lawful access solutions to their products and services that they create or operate in our countries. Governments should not favor a particular technology; instead, providers may create customized solutions, tailored to their individual system architectures that are capable of meeting lawful access requirements. Such solutions can be a constructive approach to current challenges.

Should governments continue to encounter impediments to lawful access to information necessary to aid the protection of the citizens of our countries, we may pursue technological, enforcement, legislative or other measures to achieve lawful access solutions.

Their position effectively boils down to requiring technology companies to provide a technical means of fulfilling court warrants that require them to hand over the private data of certain individuals, while leaving the implementation up to each technology company.

Continue reading New threat models in the face of British intelligence and the Five Eyes’ new end-to-end encryption interception strategy

Coconut E-Petition Implementation

An interesting new multi-authority selective-disclosure credential scheme called Coconut has recently been released, which has the potential to enable applications that were not possible before. Selective-disclosure credential systems allow issuance of a credential (having one or more attributes) to a user, with the ability to unlinkably reveal or “show” said credential at a later instance, for purposes of authentication or authorization. The system also allows the user to “show” specific attributes of that credential, or a specific function of the attributes embedded in the credential; e.g. if a user is issued an identification credential with an attribute x representing their age, say x = 25, they can show that x > 21 holds without revealing x itself.

High-level overview of Coconut, from the Coconut paper
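
To make the flow concrete, here is a minimal sketch of the interface such a scheme exposes. The cryptography (blind issuance, zero-knowledge showing) is stubbed out, and all names are illustrative stand-ins rather than Coconut’s actual API; the sketch only shows the shape of the issue/show interaction.

```python
from dataclasses import dataclass

@dataclass
class Credential:
    attributes: dict   # e.g. {"age": 25}; held privately by the user
    signature: bytes   # issuer's signature over the (possibly blinded) attributes

def issue(attributes: dict) -> Credential:
    """Issuer certifies the attributes (signature stubbed out)."""
    return Credential(attributes, signature=b"<issuer signature>")

def show_predicate(cred: Credential, predicate) -> dict:
    """In the real scheme this outputs an unlinkable zero-knowledge proof
    that predicate(attributes) holds; here we only model the interface."""
    assert predicate(cred.attributes), "predicate does not hold"
    return {"proof": b"<zk proof that the predicate holds>"}

# The user proves x > 21 without revealing x = 25 itself.
cred = issue({"age": 25})
showing = show_predicate(cred, lambda attrs: attrs["age"] > 21)
```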

A particular use-case for this scheme is to create a privacy-preserving e-petition system. There are a number of anonymous electronic petition systems that are currently being developed but all lack important security properties: (i) unlinkability – the ability to break the link between users and specific petitions they signed, and (ii) multi-authority – the absence of a single trusted third party in the system. Through multi-authority selective disclosure credentials like Coconut, these systems can achieve unlinkability without relying on a single trusted third party. For example, if there are 100 eligible users with a valid credential, and there are a total of 75 signatures when the petition closes, it is not possible to know which 75 people of the total 100 actually signed the petition.
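
A rough sketch of how such a petition tally can work: each signer submits an unlinkable credential “show” together with a per-petition tag deterministically derived from their credential, so double-signing is detectable without identifying the signer. The verification call is stubbed and the names are ours, not the Coconut implementation’s.

```python
seen_tags = set()   # per-petition tags already used
tally = 0           # number of valid signatures

def verify_show(show) -> bool:
    return True     # placeholder for the cryptographic verification

def sign_petition(show, tag) -> bool:
    """Accept a signature iff the show verifies and the tag is fresh."""
    global tally
    if not verify_show(show) or tag in seen_tags:
        return False
    seen_tags.add(tag)   # the same credential cannot sign twice...
    tally += 1           # ...but the tag does not reveal who signed
    return True

sign_petition("<show>", tag="a1")   # counted
sign_petition("<show>", tag="a1")   # rejected: double-signing attempt
```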

Continue reading Coconut E-Petition Implementation

What We Disclose When We Choose Not To Disclose: Privacy Unraveling Around Explicit HIV Disclosure Fields

For many gay and bisexual men, mobile dating or “hook-up” apps are a regular and important part of their lives. Many of these apps now ask users for HIV status information to create a more open dialogue around sexual health, to reduce the spread of the virus, and to help fight HIV-related stigma. Yet, if a user wants to keep their HIV status private from other app users, this can be more challenging than one might first imagine. While most apps provide users with the choice to keep their status undisclosed with some form of “prefer not to say” option, our recent study, described in a paper being presented today at the ACM Conference on Computer-Supported Cooperative Work and Social Computing 2018, finds that privacy may “unravel” around users who choose this non-disclosure option, which could limit disclosure choice.

Privacy unraveling is a theory developed by Peppet in which he suggests people will self-disclose their personal information when it is easy to do so, low-cost, and personally beneficial. Privacy may then unravel around those who keep their information undisclosed, as they are assumed to be “hiding” undesirable information, and are stigmatised and penalised as a consequence.

In our study, we explored the online views of Grindr users and found concerns over assumptions developing around HIV non-disclosures. For users who believe themselves to be HIV negative, the personal benefits of disclosing are high and the social costs low. In contrast, for HIV positive users, the personal benefits of disclosing are low, whilst the costs are high due to the stigma that HIV still attracts. As a result, people may assume that those not disclosing possess the low-gain, high-cost status, and are therefore HIV positive.

We developed a series of conceptual designs that utilise Peppet’s proposed limits to privacy unraveling. One of these designs is intended to artificially increase the cost of disclosing an HIV negative status. We suggest time and money as two resources that could be used to artificially increase disclosure cost. For example, users reporting to be HIV negative could be asked to watch an educational awareness video on HIV prior to disclosing (time), or only those users with a premium subscription could be permitted to disclose their status (financial). An alternative (or complementary) approach is to reduce the high cost of disclosing an HIV positive status by designing in mechanisms that reduce the social stigma around the condition. For example, all users could be offered the option to sign up to “living stigma-free”, which could also appear on their profile to signal their pledge to others.

Another design approach is to create uncertainty over whether users are aware of their own status. We suggest that profiles disclosing an HIV negative status for more than six months be switched automatically to undisclosed unless the user reports a recent HIV test. This could act as a testing reminder, as well as increasing uncertainty over the reason for non-disclosures. We also suggest increasing uncertainty or ambiguity around HIV status disclosure fields by clustering undisclosed fields together. This may create uncertainty around the particular field the user is concerned about disclosing. Finally, design could be used to cultivate norms around non-disclosures. For example, HIV status disclosure could be limited to HIV positive users, with non-disclosures then assumed to indicate an HIV negative status rather than an HIV positive one.

In our paper, we discuss some of the potential benefits and pitfalls of implementing Peppet’s proposed limits in design, and suggest further work needed to better understand the impact privacy unraveling could have in online social environments like these. We explore ways our community could contribute to building systems that reduce its effect in order to promote disclosure choice around this type of sensitive information.


This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 675730.

On Location, Time, and Membership: Studying How Aggregate Location Data Can Harm Users’ Privacy

The increasing availability of location and mobility data enables a number of applications, e.g., enhanced navigation services and parking, context-based recommendations, or waiting time predictions at restaurants, which have great potential to improve the quality of life in modern cities. However, the large-scale collection of location data also raises privacy concerns, as mobility patterns may reveal sensitive attributes about users, e.g., home and work places, lifestyles, or even political or religious inclinations.

Service providers, i.e., companies with access to location data, often use aggregation as a privacy-preserving strategy when releasing these data to third parties for various analytics tasks. The idea is that, by grouping together users’ traces, the data no longer contains information enabling inferences about individuals such as the ones mentioned above, while it can still be used to obtain useful insights about crowds. For instance, Waze constructs aggregate traffic models to improve navigation within cities, while Uber provides aggregate data for urban planning purposes. Similarly, CityMapper’s Smart Ride application aims at identifying gaps in transportation networks based on traces and rankings collected from users’ mobile devices, while Telefonica monetizes aggregate location statistics through advertising as part of the Smart Steps project.
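
As a toy illustration of such a release, the snippet below (with made-up traces) collapses per-user location reports into per-region, per-hour counts, which is the only view a third party receives:

```python
from collections import Counter

# Made-up per-user reports: (user, region, hour of day).
traces = [
    ("alice", "soho",   9), ("alice", "camden", 18),
    ("bob",   "soho",   9), ("carol", "soho",   10),
]

# The aggregate release: how many reports fall in each (region, hour) cell.
aggregate = Counter((region, hour) for _, region, hour in traces)
print(aggregate)   # Counter({('soho', 9): 2, ('camden', 18): 1, ...})
```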

That’s great, right? Well, our paper, “Knock Knock, Who’s There? Membership Inference on Aggregate Location Data”, published at NDSS 2018, shows that aggregate location time-series can in fact be used to infer information about individual users. In particular, we demonstrate that aggregate locations are prone to a privacy attack, known as membership inference: a malicious entity aims at identifying whether a specific individual contributed her data to the aggregation. We demonstrate the feasibility of this type of privacy attack in a proof-of-concept setting designed for an academic evaluation, and in a real-world setting where we apply membership inference attacks in the context of the Aircloak challenge, the first bounty program for anonymized data re-identification.

Membership Inference Attacks on Aggregate Location Time-Series

Our NDSS’18 paper studies membership inference attacks on aggregate location time-series indicating the number of people transiting in a certain area at a given time. That is, we show that an adversary with some “prior knowledge” about users’ movements is able to train a machine learning classifier and use it to infer the presence of a specific individual’s data in the aggregates.

We experiment with different types of prior knowledge. On the one hand, we simulate powerful adversaries that know the real locations of a subset of users in a database during the aggregation period (e.g., telco providers, which have location information about their clients); on the other hand, we consider weaker ones that only know past statistics about user groups (i.e., reproducing a setting of continuous data release). Overall, we find that the adversarial prior knowledge significantly influences the effectiveness of the attack.
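
The structure of the attack, for the strong prior, can be sketched in a few lines: from the traces it knows, the adversary builds aggregates that do and do not include the target, trains a binary classifier to tell them apart, and applies it to the released aggregate. The toy version below uses random data and scikit-learn purely for illustration; it is not the paper’s experimental code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy prior knowledge: binary location time-series (locations x epochs,
# flattened to length 100) for 50 known users, plus the target's own trace.
users = [rng.integers(0, 2, size=100) for _ in range(50)]
target = rng.integers(0, 2, size=100)

# Build labelled aggregates: label 1 if the target's trace is included.
X, y = [], []
for _ in range(300):
    group = rng.choice(50, size=20, replace=False)
    agg = sum(users[u] for u in group)
    if rng.random() < 0.5:
        agg = agg + target
        y.append(1)
    else:
        y.append(0)
    X.append(agg)

# The adversary then runs the trained classifier on the released aggregate
# to decide whether the target contributed to it.
clf = LogisticRegression(max_iter=1000).fit(X, y)
```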

Continue reading On Location, Time, and Membership: Studying How Aggregate Location Data Can Harm Users’ Privacy

Improving the auditability of access to data requests

Data is increasingly collected and shared, with potential benefits for both individuals and society as a whole, but people cannot always be confident that their data will be shared and used appropriately. Decisions made with the help of sensitive data can greatly affect lives, so there is a need for ways to hold data processors accountable. This requires not only ways to audit these data processors, but also ways to verify that the reported results of an audit are accurate, while protecting the privacy of individuals whose data is involved.

We (Alexander Hicks, Vasilios Mavroudis, Mustafa Al-Bassam, Sarah Meiklejohn and Steven Murdoch) present a system, VAMS, that allows individuals to check accesses to their sensitive personal data, enables auditors to detect violations of policy, and allows publicly verifiable and privacy-preserving statistics to be published. VAMS has been implemented twice: as a permissioned distributed ledger using Hyperledger Fabric, and as a verifiable log-backed map using Trillian. The paper and the code are available.
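
To give a flavour of the verifiable log-backed approach, here is a from-scratch sketch of the underlying idea (not the VAMS or Trillian code): access-request records go into an append-only Merkle tree, and the published root commits the operator to the log, so that auditors can check consistency and individuals can verify that a given record is included.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Compute the Merkle root over hashed leaves (odd nodes promoted)."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        pairs = [level[i:i + 2] for i in range(0, len(level), 2)]
        level = [h(p[0] + p[1]) if len(p) == 2 else p[0] for p in pairs]
    return level[0]

# An append-only log of data-access requests (invented record format).
log = [b"requester=officer42 subject=alice authorisation=W1",
       b"requester=officer17 subject=bob   authorisation=W2"]
print(merkle_root(log).hex())   # published root commits to the log

# New requests are appended and a new root is published; auditors verify
# that successive roots are append-only consistent, while individuals
# check inclusion proofs for records that concern them.
log.append(b"requester=officer42 subject=alice authorisation=W3")
print(merkle_root(log).hex())
```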

Use cases and setting

Our work is motivated by two scenarios: controlling the access of law-enforcement personnel to communication records and controlling the access of healthcare professionals to medical data.

The UK Home Office states that 95% of serious and organized criminal cases make use of communications data. Annual reports published by the IOCCO (now under the IPCO name) provide some information about the request and use of communications data. There were over 750,000 requests for data in 2016, a portion of which were audited to provide the usage statistics and errors that can be found in the published report.

Not only is it important that requests are auditable; the requested data can also be used as evidence in legal proceedings. In that case, it is necessary to ensure the integrity of the data, or to rely on representatives of data providers and expert witnesses, the latter being more expensive and requiring trust in third parties.

In the healthcare case, individuals usually consent for their GP or any medical professional they interact with to have access to relevant medical records, but may have concerns about the way their information is then used or shared.  The NHS regularly shares data with researchers or companies like DeepMind, sometimes in ways that may reduce the trust levels of individuals, despite the potential benefits to healthcare.

Continue reading Improving the auditability of access to data requests

“The pool’s run dry” – analyzing anonymity in Zcash

Zcash is a cryptocurrency whose main feature is a “shielded pool” designed to provide strong anonymity guarantees. Indeed, the cryptographic foundations of the shielded pool are based on highly regarded academic research. The deployed Zcash protocol, however, allows for transactions outside of the shielded pool (which, from an anonymity perspective, are identical to Bitcoin transactions), and it can easily be observed from blockchain data that the majority of transactions do not use the pool. Nevertheless, users of the shielded pool should be able to treat it as their anonymity set when attempting to spend coins in an anonymous fashion.

In a recent paper, An Empirical Analysis of Anonymity in Zcash, we (George Kappos, Haaroon Yousaf, Mary Maller, and Sarah Meiklejohn) conducted an empirical analysis of Zcash to further our understanding of its shielded pool and broader ecosystem. Our main finding is that it is possible in many cases to identify the activity of founders and miners using the shielded pool (who are required by the consensus rules to put all newly generated coins into it). The implication for anonymity is that this activity can be excluded from any attempt to track coins as they move through the pool, which significantly shrinks the effective anonymity set for regular users. We have disclosed all our findings to the developers of Zcash, who have written their own blog post about this research. This work will be presented at the upcoming USENIX Security Symposium.

What is Zcash?

In Bitcoin, the sender(s) and receiver(s) in a transaction are publicly revealed on the blockchain. As with Bitcoin, Zcash has transparent addresses (t-addresses) but gives users the option to hide the details of their transactions using private addresses (z-addresses). Private transactions are conducted using the shielded pool and allow users to spend coins without revealing the amount and the sender or receiver. This is possible due to the use of zero-knowledge proofs.

As in Bitcoin, new coins are created in public “coingen” transactions within new blocks, which reward the miners of those blocks. In Zcash, a percentage of the newly minted coins is also sent to the founders (a predetermined list of Zcash addresses owned by the developers and embedded into the protocol).
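
To illustrate the kind of heuristic this enables (the transaction layout below is invented for the example, not the paper’s actual code): once the t-addresses paid by coingen transactions are known, any deposit into the shielded pool that spends from those addresses can be tagged as miner or founder activity and excluded from the anonymity set.

```python
# Invented minimal transaction layout, for illustration only.
blockchain = [
    {"coingen": True,  "inputs": [],           "outputs": ["t-miner1"], "to_pool": False},
    {"coingen": False, "inputs": ["t-miner1"], "outputs": [],           "to_pool": True},
    {"coingen": False, "inputs": ["t-user7"],  "outputs": [],           "to_pool": True},
]

coingen_addrs = set()    # t-addresses funded by coingen transactions
tagged = []              # pool deposits attributable to miners/founders

for tx in blockchain:
    if tx["coingen"]:
        coingen_addrs.update(tx["outputs"])
    elif tx["to_pool"] and set(tx["inputs"]) & coingen_addrs:
        tagged.append(tx)

# Here only the t-user7 deposit remains in the effective anonymity set.
print(len(tagged), "tagged deposit(s)")
```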

Continue reading “The pool’s run dry” – analyzing anonymity in Zcash

“Wow such genetics. So data. Very forever?” – An overview of the blockchain genomics trend

In 2014, Harvard professor and geneticist George Church said: “‘Preserving your genetic material indefinitely’ is an interesting claim. The record for storage of non-living DNA is now 700,000 years (as DNA bits, not electronic bits). So, ironically, the best way to preserve your electronic bitcoins/blockchains might be to convert them into DNA”. In early February 2018, Nebula Genomics, a blockchain-enabled genomic data sharing and analysis platform co-founded by George Church, was launched. And it is not alone in the market. The common factor between all these ventures is that they want to give power back to the user. Leveraging the fact that most companies currently offering direct-to-consumer genetic testing sell the data collected from their customers to pharmaceutical and biotech companies for research purposes, they want to be the next Uber or Airbnb, with some even claiming to be creating the Alibaba for life data using next-generation artificial intelligence and blockchain technologies.

Nebula Genomics

Nebula’s launch is motivated by the need to increase genomic data sharing for research purposes, as well as to reduce sequencing costs on the client side. The Nebula model aims to eliminate personal genomics companies as the middle-man between the customer and the pharmaceutical companies. This way, data owners can acquire their personal genomic data from Nebula sequencing facilities or other sources, join the Nebula network, and connect directly with the buyers.
Their main claims from their whitepaper can be summarized as follows:

  • Lower sequencing costs for customers, who can join the network and profit directly by connecting with data buyers if their genomes have already been sequenced, or participate in paid surveys, which can incentivize data buyers to subsidize their sequencing costs
  • Enhanced data protection: shared data is encrypted and securely analyzed using Intel Software Guard Extensions (SGX) and partially homomorphic encryption, such as the Paillier scheme (see the sketch after this list)
  • Efficient data acquisition, enabling data buyers to acquire large genomic datasets
  • Being big data ready, by allowing data owners to privately store their data, and introducing space-efficient data encoding formats that enable rapid transfers of genomic data summaries over the network
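
For the curious, the Paillier scheme mentioned above is additively homomorphic: anyone can add encrypted values, or scale them by a plaintext constant, without being able to decrypt them. Below is a quick generic demo using the open-source python-paillier library (`pip install phe`); it illustrates the primitive, not Nebula’s actual pipeline.

```python
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair()

# A data owner encrypts values before sharing them.
enc_a = public_key.encrypt(17)
enc_b = public_key.encrypt(25)

# An analyst can compute on the ciphertexts without the secret key.
enc_sum = enc_a + enc_b      # homomorphic addition
enc_scaled = enc_a * 3       # multiplication by a plaintext constant

# Only the key holder learns the results.
assert private_key.decrypt(enc_sum) == 42
assert private_key.decrypt(enc_scaled) == 51
```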

Zenome

This project aims to ensure that genomic data from as many people as possible will be openly available to stimulate new research and development in the genomics industry. The founders of the project believe that if we do not provide open access to genomic data and information exchange, we are at risk of ending up with thousands of isolated, privately stored collections of genomic data (from pharmaceutical companies, genomic corporations, and scientific centers), each of which will not contain sufficient data to enable breakthrough discoveries. Their claims are not as ambitious as Nebula’s, focusing more on the customer profiting from selling their own DNA data rather than on other sequencing companies. Their whitepaper even highlights that no valid solutions currently exist for the public use of genomic information while maintaining individual privacy, and that encryption is used where necessary. When buying ZNA tokens (the cryptocurrency associated with Zenome), one has to follow a Know-Your-Customer procedure and upload their ID/passport.

Gene Blockchain

The Gene Blockchain business plan states that it will use blockchain smart contracts to:

  • Create an immutable ledger for all industry related data via GeneChain
  • Offer payment for industry related services and supplies through GeneBTC
  • Establish advanced labs for human genome data analysis via GeneLab
  • Organize and unite a global platform for health, entertainment and social networking through GeneNetwork

Continue reading “Wow such genetics. So data. Very forever?” – An overview of the blockchain genomics trend

Coconut: Threshold Issuance Selective Disclosure Credentials with Applications to Distributed Ledgers

Selective disclosure credentials allow the issuance of a credential to a user, and the subsequent unlinkable revelation (or ‘showing’) of some of the attributes it encodes to a verifier, for the purposes of authentication, authorisation, or implementing electronic cash. While a number of schemes have been proposed, these have limitations, particularly when it comes to issuing fully functional selective disclosure credentials without sacrificing desirable distributed trust assumptions. Some entrust a single issuer with the credential signature key, allowing a malicious issuer to forge any credential or electronic coin. Other schemes lack the re-randomisation or blind issuance properties necessary to implement modern selective disclosure credentials. No existing scheme provides all of: threshold distributed issuance, private attributes, re-randomisation, and unlinkable multi-show selective disclosure.

We address these challenges in our new work Coconut – a novel scheme that supports distributed threshold issuance, public and private attributes, re-randomization, and multiple unlinkable selective attribute revelations. Coconut allows a subset of decentralised, mutually distrustful authorities to jointly issue credentials on public or private attributes. These credentials cannot be forged by users, or by any small subset of potentially corrupt authorities. Credentials can be re-randomised before selected attributes are shown to a verifier, protecting privacy even in the case where all authorities and verifiers collude.

Applications to Smart Contracts

The lack of full-featured selective disclosure credentials impacts platforms that support ‘smart contracts’, such as Ethereum, Hyperledger and Chainspace. They all share the limitation that verifiable smart contracts may only perform operations recorded on a public blockchain. Moreover, the security models of these systems generally assume that integrity should hold in the presence of a threshold number of dishonest or faulty nodes (Byzantine fault tolerance). It is desirable for similar assumptions to hold for multiple credential issuers (threshold aggregability). Issuing credentials through smart contracts would be very useful. A smart contract could conditionally issue user credentials depending on the state of the blockchain, or attest some claim about a user operating through the contract—such as their identity, attributes, or even the balance of their wallet.

Coconut is based on a threshold issuance signature scheme that allows partial claims to be aggregated into a single credential; it therefore allows collections of authorities in charge of maintaining a blockchain, or a side chain based on a federated peg, to jointly issue selective disclosure credentials.

System Overview

Coconut is a fully featured selective disclosure credential system, supporting threshold credential issuance of public and private attributes, re-randomisation of credentials to support multiple unlinkable revelations, and the ability to selectively disclose a subset of attributes. It is embedded into a smart contract library that can be called from other contracts to issue credentials. The Coconut architecture is illustrated below. Any Coconut user may send a Coconut request command to a set of Coconut signing authorities; this command specifies a set of public or encrypted private attributes to be certified into the credential (1). Then, each authority answers with an issue command delivering a partial credential (2). Any user can collect a threshold number of shares, aggregate them to form a consolidated credential, and re-randomise it (3). The use of the credential for authentication is however restricted to a user who knows the private attributes embedded in the credential—such as a private key. The user who owns the credential can then execute the show protocol to selectively disclose attributes or statements about them (4). The showing protocol is publicly verifiable, and may be publicly recorded.
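
The four steps map onto a client-side flow like the sketch below. The cryptography is stubbed out and every name is a stand-in, not the released Coconut library’s API; the sketch only traces steps (1)–(4).

```python
def prepare_request(public, private):             # (1) user -> authorities
    return {"public": public, "private": f"<encrypted {private}>"}

def issue(authority, request):                    # (2) each authority replies
    return f"<partial credential from {authority}>"

def aggregate_and_randomize(shares, threshold):   # (3) any t-subset suffices
    assert len(shares) >= threshold
    return "<consolidated, re-randomised credential>"

def show(credential, disclose):                   # (4) selective disclosure
    return {"disclosed": disclose, "proof": "<publicly verifiable zk proof>"}

authorities = ["auth1", "auth2", "auth3"]
t = 2
request = prepare_request(public=["expiry=2019"], private=["user key"])
shares = [issue(a, request) for a in authorities]
credential = aggregate_and_randomize(shares[:t], t)
proof = show(credential, disclose=["expiry=2019"])
```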


Implementation

We use Coconut to implement a generic smart contract library for Chainspace and one for Ethereum, performing public and private attribute issuance, aggregation, randomisation and selective disclosure. We evaluate their performance and cost within those platforms. In addition, we design three applications using the Coconut contract library: a coin tumbler providing payment anonymity, a privacy-preserving electronic petition, and a proxy distribution system for a censorship resistance system. We implement and evaluate the first two of these on the Chainspace platform, and provide a security and performance evaluation. We have released the Coconut white-paper, and the code is available as an open-source project on GitHub.

Performance

Coconut uses short and computationally efficient credentials, with efficient protocols for revealing selected attributes and for verification. Each partial credential, as well as the consolidated credential, is composed of exactly two group elements. The size of the credential remains constant, and attribute showing and verification are O(1) in terms of both cryptographic computations and communication of cryptographic material – irrespective of the number of attributes or authorities/issuers. Our evaluation of the Coconut primitives shows very promising results. Verification takes about 10ms, while signing an attribute is 15 times faster. The latency is about 600ms when the client aggregates partial credentials from 10 authorities distributed across the world.

Summary

Existing selective disclosure credential schemes do not provide the full set of desired properties needed to issue fully functional selective disclosure credentials without sacrificing desirable distributed trust assumptions. To fill this gap, we presented Coconut, which enables selective disclosure credentials – an important privacy enhancing technology – to be embedded into modern transparent computation platforms. The paper includes an overview of the Coconut system and the cryptographic primitives underlying it; an implementation and evaluation of Coconut as a smart contract library in Chainspace and Ethereum, a sharded and a permissionless blockchain respectively; and three diverse and important applications to anonymous payments, petitions and censorship resistance.


We have released the Coconut white-paper, and the code is available as an open-source project on GitHub.  We would be happy to receive your feedback, thoughts, and suggestions about Coconut via comments on this blog post.

The Coconut project is developed, and funded, in the context of the EU H2020 Decode project, the EPSRC Glass Houses project and the Alan Turing Institute.

A Critical Analysis of Genome Privacy Research

The relationship between genomics and privacy-enhancing technologies (PETs) has been an intense one for the better part of the last decade. Ever since Wang et al.’s paper, “Learning your identity and disease from research papers: Information leaks in genome wide association study”, received the PET Award in 2011, more and more research papers have appeared in leading conferences and journals. In fact, a new research community has steadily grown over the past few years, also thanks to several events, such as Dagstuhl Seminars, the iDash competition series, or the annual GenoPri workshop. As of December 2017, the community website genomeprivacy.org lists more than 200 scientific publications, and dozens of research groups and companies working on this topic.

Participants of the 2015 Dagstuhl Seminar on Genome Privacy

Progress vs Privacy

The rise of genome privacy research does not come as a surprise to many. On the one hand, genomics has made tremendous progress over the past few years. Sequencing costs have dropped from millions of dollars to less than a thousand, which means that it will soon be possible to easily digitize the full genetic makeup of an individual and run complex genetic tests via computer algorithms. Also, researchers have been able to link more and more genetic features to predisposition to diseases (e.g., Alzheimer’s or diabetes), or to cure patients with rare genetic disorders. Overall, this progress is bringing us closer to a new era of “Precision Medicine”, where diagnosis and treatment can be tailored to individuals based on their genome and thus become cheaper and more effective. Ambitious initiatives, including in the UK and the US, are already taking place with the goal of sequencing the genomes of millions of individuals in order to create bio-repositories and make them available for research purposes. At the same time, a private sector for direct-to-consumer genetic testing services is booming, with companies like 23andMe and AncestryDNA already having millions of customers.

Example of 23andme “Health Overview” test results. (Image from: https://www.singularityweblog.com/)

On the other hand, however, the very same progress also prompts serious ethical and privacy concerns. Genomic data contains highly sensitive information, such as predisposition to mental and physical diseases, as well as ethnic heritage. And it does not only contain information about the individual, but also about their relatives. Since many biological features are hereditary, access to genomic data of an individual essentially means access to that of close relatives as well. Moreover, genomic data is hard to anonymize: for instance, well-known results have demonstrated the feasibility of identifying people (down to their last name) who have participated in genetic research studies just by cross-referencing their genomic information with publicly available data.

Overall, there are privacy issues that are specific to genomic data, for instance its almost perpetual sensitivity. If someone gets hold of your genome 30 years from now, it might still be as sensitive as it is today, e.g., for your children. Even if there are no immediate risks from genomic data disclosure, things might change. New correlations between genetic features and phenotypical traits might be discovered, with potential effects on perceived suitability for certain jobs or on health insurance premiums. Or, in a nightmare scenario, racist and discriminatory ideologies might become more prominent and target certain groups of people based on their genetic ancestry.

Alt-right trolls are arguing over genetic tests they think “prove” their whiteness. (Image taken from Vice News)

Making Sense of PETs for Genome Privacy

Motivated by the need to reconcile privacy protection with progress in genomics, the research community has begun to experiment with the use of PETs for securely testing and studying the human genome. In our recent paper, Systematizing Genomic Privacy Research – A Critical Analysis, we take a step back. We set out to evaluate research results using PETs in the context of genomics, introducing and executing a methodology to systematize work in the field, ultimately aiming to elicit the challenges and obstacles that might hinder their real-life deployment.

Continue reading A Critical Analysis of Genome Privacy Research

Chainspace: A Sharded Smart Contracts Platform

Thanks to their resilience, integrity, and transparency properties, blockchains have gained much traction recently, with applications ranging from the banking and energy sectors to legal contracts and healthcare. Blockchains initially received attention as Bitcoin’s underlying technology. But for all its success as a popular cryptocurrency, Bitcoin suffers from scalability issues: with a current block size of 1MB and a 10 minute inter-block interval, its throughput is capped at about 7 transactions per second, and a client that creates a transaction has to wait for about 10 minutes to confirm that it has been added to the blockchain. This is several orders of magnitude slower than what mainstream payment processing companies like Visa currently offer: transactions are confirmed within a few seconds, with a high throughput of 2,000 transactions per second on average, peaking at up to 56,000 transactions per second. A reparametrization of Bitcoin can somewhat assuage these issues, increasing throughput to 27 transactions per second with 12 second latency. Smart contract platforms such as Ethereum inherit these scalability limitations. More significant improvements, however, call for a fundamental redesign of the blockchain paradigm.
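
The 7 transactions per second figure follows from simple arithmetic, if one assumes an average transaction size of roughly 250 bytes (a commonly cited ballpark; the true average varies):

```python
block_size = 1_000_000    # bytes (1MB)
avg_tx_size = 250         # bytes; assumed rough average
block_interval = 600      # seconds (10 minutes)

throughput = block_size / avg_tx_size / block_interval
print(f"{throughput:.1f} tx/s")   # ~6.7, i.e. "about 7 transactions per second"
```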

This week we published a pre-print of our new Chainspace system—a distributed ledger platform for high-integrity and transparent processing of transactions within a decentralized system. Chainspace uses smart contracts to offer extensibility, rather than catering to specific applications such as Bitcoin for currency or certificate transparency for certificate verification. Unlike Ethereum, Chainspace’s sharded architecture allows the ledger to scale linearly, since only the nodes concerned with a transaction have to process it. Our modest testbed of 60 cores achieves 350 transactions per second. In comparison, Bitcoin achieves a peak rate of less than 7 transactions per second across over 6k full nodes, and Ethereum currently processes 4 transactions per second (of a theoretical maximum of 25). Moreover, Chainspace is agnostic to the smart contract language and identity infrastructure, and supports privacy features through modern zero-knowledge techniques. We have released the Chainspace whitepaper, and the code is available as an open-source project on GitHub.

System Overview

The figure above illustrates the system design of Chainspace. Chainspace is composed of a network of infrastructure nodes that manage valid objects and ensure that only valid transactions on those objects are committed. Let’s look at the data model of Chainspace first. An object represents a unit of data in the Chainspace system (e.g., a bank account), and is in one of three states: active (can be used by a transaction), locked (is being processed by an existing transaction), or inactive (was used by a previous transaction). Objects also have a type that determines the unique identifier of the smart contract that defines them. Smart contract procedures can operate on active objects only, while inactive objects are retained just for the purposes of audit. Chainspace allows composition of smart contracts from different authors to provide ecosystem features. Each smart contract is associated with a checker that enables privacy-friendly processing of transactions on infrastructure nodes, since checkers do not take any secret local parameters. Checkers are pure functions (i.e., deterministic, with no side-effects) that return a boolean value.
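
The data model can be pictured in a few lines of code; the sketch below uses our own names and types for illustration, not Chainspace’s actual implementation.

```python
from dataclasses import dataclass
from enum import Enum

class State(Enum):
    ACTIVE = "active"       # can be used by a transaction
    LOCKED = "locked"       # being processed by an in-flight transaction
    INACTIVE = "inactive"   # consumed; retained only for audit

@dataclass
class Obj:
    value: int              # e.g. a bank account balance
    contract_id: str        # type: the smart contract that defines it
    state: State = State.ACTIVE

def bank_checker(inputs, outputs) -> bool:
    """A checker is a pure function – deterministic, no side effects, no
    secret inputs – that simply accepts or rejects a transaction."""
    return sum(o.value for o in inputs) == sum(o.value for o in outputs)

ins = [Obj(100, "bank"), Obj(20, "bank")]
outs = [Obj(70, "bank"), Obj(50, "bank")]
assert bank_checker(ins, outs)   # value is conserved, so the checker accepts
```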

Now, a valid transaction accepts active input objects along with other ancillary information, and generates output objects (e.g., transfers money to another bank account). To achieve high transaction throughput and low latency, Chainspace organizes nodes into shards that manage the state of objects, keep track of their validity, and record transactions as aborted or committed. We implemented this using Sharded Byzantine Atomic Commit (S-BAC)—a protocol that composes existing Byzantine Fault Tolerant (BFT) agreement and atomic commit primitives in a novel way. Here is how the protocol works:

  • Intra-shard agreement. Within each shard, all honest nodes ensure that they consistently agree on accepting or rejecting a transaction.
  • Inter-shard agreement. Across shards, nodes must ensure that transactions are committed if all shards are willing to commit the transaction, and rejected (or aborted) if any shard decides to abort it.

Consensus on committing (or aborting) transactions takes place in parallel across different shards. A nice property of S-BAC’s atomic commit protocol is that the entire shard—rather than a third party—acts as the coordinator. This is in contrast to other sharding-based systems with cryptocurrency applications, like OmniLedger or RSCoin, where an untrusted client acts as the coordinator and is incentivized to act honestly. Such incentives do not hold for a generalized platform like Chainspace, where objects may have shared ownership.
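
The commit rule itself is compact; in the sketch below, each shard’s internal BFT agreement is collapsed into a single vote (illustrative pseudologic, not the S-BAC implementation).

```python
def shard_vote(input_states) -> bool:
    """Stand-in for intra-shard BFT agreement: a shard votes to accept iff
    every input object it manages is active (and the checker passes)."""
    return all(state == "active" for state in input_states)

def s_bac(inputs_by_shard) -> str:
    # Shards vote in parallel; each locks its inputs while voting.
    votes = {s: shard_vote(states) for s, states in inputs_by_shard.items()}
    # Commit only if every concerned shard accepts; otherwise abort and
    # release the locks.
    return "commit" if all(votes.values()) else "abort"

print(s_bac({"shard1": ["active"], "shard2": ["active"]}))   # commit
print(s_bac({"shard1": ["active"], "shard2": ["locked"]}))   # abort
```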

Continue reading Chainspace: A Sharded Smart Contracts Platform