“Wow such genetics. So data. Very forever?” – An overview of the blockchain genomics trend

In 2014, Harvard professor and geneticist George Church said: “‘Preserving your genetic material indefinitely’ is an interesting claim. The record for storage of non-living DNA is now 700,000 years (as DNA bits, not electronic bits). So, ironically, the best way to preserve your electronic bitcoins/blockchains might be to convert them into DNA”. In early February 2018, Nebula Genomics, a blockchain-enabled genomic data sharing and analysis platform co-founded by George Church, was launched. And it is not alone on the market. The common factor among these ventures is the promise to give power back to the user: playing on the fact that most companies currently offering direct-to-consumer genetic testing sell the data collected from their customers to pharmaceutical and biotech companies for research purposes, they want to be the next Uber or Airbnb, with some even claiming to create the Alibaba for life data using next-generation artificial intelligence and blockchain technologies.

Nebula Genomics

Its launch is motivated by the need to increase genomic data sharing for research purposes, as well as to reduce sequencing costs on the client side. The Nebula model aims to eliminate personal genomics companies as the middleman between the customer and the pharmaceutical companies. This way, data owners can acquire their personal genomic data from Nebula sequencing facilities or other sources, join the Nebula network and connect directly with the buyers.
Their main claims from their whitepaper can be summarized as follows:

  • Lower sequencing costs for customers: those who have already had their genomes sequenced can profit directly by connecting with data buyers, while others can take part in paid surveys, which can incentivize data buyers to subsidize their sequencing costs
  • Enhanced data protection: shared data is encrypted and securely analyzed using Intel Software Guard Extensions (SGX) and partially homomorphic encryption, such as the Paillier scheme (a toy sketch of this property follows the list)
  • Efficient data acquisition, enabling data buyers to efficiently acquire large genomic datasets
  • Being big data ready, by allowing data owners to store their data privately, and by introducing space-efficient data encoding formats that enable rapid transfers of genomic data summaries over the network

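As a concrete illustration of the “partially homomorphic encryption” mentioned in the second bullet, here is a minimal toy sketch of the Paillier scheme’s additive property. The parameters are deliberately tiny and nothing here reflects Nebula’s actual implementation, which is not public; it only shows how encrypted values can be combined without decryption.

```python
# A toy sketch of the Paillier cryptosystem's additive homomorphism (requires
# Python 3.8+ for pow(x, -1, n)). Parameters are deliberately tiny and this is
# not Nebula's implementation; it only illustrates that ciphertexts can be
# combined without decrypting them.
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def keygen(p=293, q=433):                       # toy primes; real keys use ~2048-bit moduli
    n = p * q
    lam = lcm(p - 1, q - 1)
    g = n + 1                                   # standard choice that simplifies decryption
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)

def encrypt(pk, m, r):                          # r must be coprime to n
    n, g = pk
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(pk, sk, c):
    n, _ = pk
    lam, mu = sk
    return ((pow(c, lam, n * n) - 1) // n) * mu % n

pk, sk = keygen()
c1, c2 = encrypt(pk, 20, r=17), encrypt(pk, 22, r=31)
# Multiplying ciphertexts adds the underlying plaintexts: E(20) * E(22) decrypts to 42.
assert decrypt(pk, sk, (c1 * c2) % (pk[0] ** 2)) == 42
```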

Zenome

This project aims to ensure that genomic data from as many people as possible will be openly available to stimulate new research and development in the genomics industry. The founders of the project believe that if we do not provide open access to genomic data and information exchange, we risk ending up with thousands of isolated, privately stored collections of genomic data (from pharmaceutical companies, genomic corporations, and scientific centers), each of which will not contain sufficient data to enable breakthrough discoveries. Their claims are not as ambitious as Nebula’s, focusing more on the customer profiting from selling their own DNA data rather than on other sequencing companies. Their whitepaper even highlights that no valid solutions currently exist for the public use of genomic information while maintaining individual privacy, and that encryption is used when necessary. When buying ZNA tokens (the cryptocurrency associated with Zenome), one has to follow a Know-Your-Customer procedure and upload their ID/passport.

Gene Blockchain

The Gene blockchain business model states it will use blockchain smart contracts to:

  • Create an immutable ledger for all industry related data via GeneChain
  • Offer payment for industry related services and supplies through GeneBTC
  • Establish advanced labs for human genome data analysis via GeneLab
  • Organize and unite a global platform for health, entertainment, and social networking through GeneNetwork


Coconut: Threshold Issuance Selective Disclosure Credentials with Applications to Distributed Ledgers

Selective disclosure credentials allow the issuance of a credential to a user, and the subsequent unlinkable revelation (or ‘showing’) of some of the attributes it encodes to a verifier for the purposes of authentication, authorisation or to implement electronic cash. While a number of schemes have been proposed, these have limitations, particularly when it comes to issuing fully functional selective disclosure credentials without sacrificing desirable distributed trust assumptions. Some entrust a single issuer with the credential signature key, allowing a malicious issuer to forge any credential or electronic coin. Other schemes do not provide the necessary re-randomisation or blind issuing properties necessary to implement modern selective disclosure credentials. No existing scheme provides all of threshold distributed issuance, private attributes, re-randomisation, and unlinkable multi-show selective disclosure.

We address these challenges in our new work Coconut – a novel scheme that supports distributed threshold issuance, public and private attributes, re-randomization, and multiple unlinkable selective attribute revelations. Coconut allows a subset of decentralised, mutually distrustful authorities to jointly issue credentials on public or private attributes. These credentials cannot be forged by users, or by any small subset of potentially corrupt authorities. Credentials can be re-randomised before selected attributes are shown to a verifier, protecting privacy even in the case where all authorities and verifiers collude.

Applications to Smart Contracts

The lack of full-featured selective disclosure credentials impacts platforms that support ‘smart contracts’, such as Ethereum, Hyperledger and Chainspace. They all share the limitation that verifiable smart contracts may only perform operations recorded on a public blockchain. Moreover, the security models of these systems generally assume that integrity should hold in the presence of a threshold number of dishonest or faulty nodes (Byzantine fault tolerance). It is desirable for similar assumptions to hold for multiple credential issuers (threshold aggregability). Issuing credentials through smart contracts would be very useful. A smart contract could conditionally issue user credentials depending on the state of the blockchain, or attest some claim about a user operating through the contract—such as their identity, attributes, or even the balance of their wallet.

As Coconut is based on a threshold issuance signature scheme that allows partial claims to be aggregated into a single credential, it allows collections of authorities in charge of maintaining a blockchain, or a side chain based on a federated peg, to jointly issue selective disclosure credentials.

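To build intuition for threshold issuance, here is a toy numerical sketch of aggregating partial signatures via Lagrange interpolation “in the exponent”, using a tiny prime-order group. Coconut’s actual construction is a pairing-based, Pointcheval-Sanders style credential over blinded attributes; the group, key and message below are illustrative assumptions only.

```python
# Toy sketch: threshold aggregation via Lagrange interpolation "in the exponent"
# (requires Python 3.8+ for pow(x, -1, Q)). Coconut itself uses pairing-based,
# Pointcheval-Sanders style signatures over blinded attributes; the tiny group,
# key and message below are illustrative assumptions only.
import hashlib
import random

P, Q, G = 2039, 1019, 4          # toy safe-prime group: G generates the order-Q subgroup

def hash_to_group(msg):          # toy stand-in for hashing a message into the group
    return pow(G, 1 + int.from_bytes(hashlib.sha256(msg).digest(), "big") % (Q - 1), P)

def share(x, t, n):
    """Shamir-share the signing key x among n authorities with threshold t (mod Q)."""
    coeffs = [x] + [random.randrange(Q) for _ in range(t - 1)]
    return {i: sum(c * pow(i, k, Q) for k, c in enumerate(coeffs)) % Q
            for i in range(1, n + 1)}

def lagrange_at_zero(indices):
    """Lagrange coefficients at 0 (mod Q) for the participating authority indices."""
    out = {}
    for i in indices:
        num = den = 1
        for j in indices:
            if j != i:
                num, den = num * -j % Q, den * (i - j) % Q
        out[i] = num * pow(den, -1, Q) % Q
    return out

x = 777                                          # full signing key, never reconstructed
shares = share(x, t=3, n=5)
h = hash_to_group(b"credential over some attributes")

partials = {i: pow(h, shares[i], P) for i in (1, 3, 5)}   # partial credentials, any 3 of 5
lam = lagrange_at_zero(list(partials))
sigma = 1
for i, s in partials.items():
    sigma = sigma * pow(s, lam[i], P) % P                 # aggregate in the exponent

assert sigma == pow(h, x, P)     # equals what the full key would have produced
```
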
System Overview

Coconut is a fully featured selective disclosure credential system, supporting threshold credential issuance of public and private attributes, re-randomisation of credentials to support multiple unlinkable revelations, and the ability to selectively disclose a subset of attributes. It is embedded into a smart contract library, which can be called from other contracts to issue credentials. The Coconut architecture is illustrated below. Any Coconut user may send a Coconut request command to a set of Coconut signing authorities; this command specifies a set of public or encrypted private attributes to be certified into the credential (1). Then, each authority answers with an issue command delivering a partial credential (2). Any user can collect a threshold number of shares, aggregate them to form a consolidated credential, and re-randomise it (3). The use of the credential for authentication is, however, restricted to a user who knows the private attributes embedded in the credential, such as a private key. The user who owns the credential can then execute the show protocol to selectively disclose attributes or statements about them (4). The showing protocol is publicly verifiable, and may be publicly recorded.



We use Coconut to implement a generic smart contract library for Chainspace and one for Ethereum, performing public and private attribute issuance, aggregation, randomisation and selective disclosure. We evaluate their performance and cost within those platforms. In addition, we design three applications using the Coconut contract library: a coin tumbler providing payment anonymity, a privacy-preserving electronic petition, and a proxy distribution system for censorship resistance. We implement the first two on the Chainspace platform, and provide a security and performance evaluation.


Coconut uses short and computationally efficient credentials, with efficient protocols for revealing selected attributes and for verification. Each partial credential, as well as the consolidated credential, is composed of exactly two group elements. The size of the credential remains constant, and attribute showing and verification are O(1) in terms of both cryptographic computations and communication of cryptographic material, irrespective of the number of attributes or authorities/issuers. Our evaluation of the Coconut primitives shows very promising results: verification takes about 10 ms, while signing an attribute is 15 times faster. The latency is about 600 ms when the client aggregates partial credentials from 10 authorities distributed across the world.


Existing selective credential disclosure schemes do not provide the full set of desired properties needed to issue fully functional selective disclosure credentials without sacrificing desirable distributed trust assumptions. To fill this gap, we presented Coconut, which enables selective disclosure credentials – an important privacy enhancing technology – to be embedded into modern transparent computation platforms. The paper includes an overview of the Coconut system and the cryptographic primitives underlying Coconut; an implementation and evaluation of Coconut as a smart contract library in Chainspace and Ethereum, a sharded and a permissionless blockchain respectively; and three diverse and important applications: anonymous payments, petitions and censorship resistance.


We have released the Coconut white-paper, and the code is available as an open-source project on GitHub. We would be happy to receive your feedback, thoughts, and suggestions about Coconut via comments on this blog post.

The Coconut project is developed, and funded, in the context of the EU H2020 Decode project, the EPSRC Glass Houses project and the Alan Turing Institute.

Thinking about fake news – As a security incident?

In Tristan and David’s Philosophy, Politics and Economics of Security and Privacy class, Jono gave a little information about incident response.  As a result, we have been thinking about the recent furor over fake news. There are some big questions circling this topic, and we’re going to try to focus on a part we have some competence in: what an understanding of fake news as a security incident can contribute to the wider debate. Our goal here is mostly to highlight some lessons from security research that should be applicable, so we can help constrain the solution space. Ultimately, any solution will need to engage with wider civil society.

The lessons we will argue for in the following are:

  • Solutions need to support the elector’s primary task. Education to avoid cognitive biases is not a short- or medium-term solution.
  • Focus on aligning the incentives of the media companies and the voters. Reduce the return on investment for the adversary.
  • Any blocking should be strategically useful, and not merely reactionary.

First, we want a more specific term, as well as a less charged one. Fake news includes politically or financially motivated stories presented as factual reports on the world that are fictional in material ways, and usually are intended to stir strong feelings. This definition is hardly complete. Furthermore, similar to the term “post-truth” as discussed by Jasanoff and Simmet, the term “fake news” makes several value judgements we’d like to avoid. “Fake news” carries a strong suggestion that we, the speakers, know what is true and what isn’t, and it also indicates some condescension by the speaker towards anyone who believes an item of fake news. We want to avoid such insults. Instead, let’s say we want to focus on the following hypothetical security policy: democratic elections should be free from foreign interference.

Grounding out this policy definition hangs on the term “interference.” This is hard. Ultimately, the will of an elector in a free and fair election needs to be respected. This makes it particularly challenging to agree on constraints to what information an elector has access to. In practice, no elector is omniscient, so some constraints de facto exist. But weighing in on this issue is outside our competence. Let’s assume for now that public policy will provide an assessment of “interference” eventually. The UK recently announced that a “dedicated national security communications unit” would be charged with “combating disinformation by state actors and others.” In France, Emmanuel Macron plans legislation to fight interference from foreign sources during elections. Various social media platforms have likewise announced attempted fixes, which means they have some functional definition of what “interference” they’re seeking to remove. Unfortunately, “none of the tech giants claim to be ready” for the November 2018 elections in the US.

Interference in elections is a type of information warfare. An appropriate security policy needs to assess the threat environment and the capabilities of the adversaries. In particular, the Russian Federation has been assessed as a highly motivated and well-resourced actor in this space. We should note that Russia, in turn, assesses the intent and capability of the USA similarly. Tools and tactics within information warfare, particularly disinformation campaigns, help define “interference” within our security policy.

In this context, what can the security research community recommend? Well, the main targets of disinformation campaigns are ordinary citizens. They are targetable largely due to inherent cognitive biases in the way humans process and reason about information. In security terms, we could see these biases as vulnerabilities in the system. Classically, we have two options to secure the system: patch the vulnerability, or prevent the adversary from exploiting it by controlling or filtering the attack before it reaches the target.

Patching in this case would mean teaching people to avoid cognitive biases in their day-to-day reasoning. Psychology tells us this is hard. Intelligence analysts train for months or years for this. And research in usable security has affirmed time and time again that users are not the enemy. That is, the system must alleviate the burden on the user’s attention and not interfere with their primary task, or else the user will subvert or avoid the protections put in place. Any changes in user culture are slow. This leads us to lesson 1 on preventing disinformation campaigns for election interference: solutions need to support the elector’s primary task. Education to avoid cognitive biases is not a short-term or medium-term solution.

Controlling the attack vectors is more promising, although filtering them is not. A key aspect of any information security policy is aligning the economic incentives of the actors. Economics is a main reason why infosec is hard. It may not be easy to reorganize the incentives in the advertising and news distribution media space. However, as long as organizations profit from more clicks on an article no matter the content, there will be an incentive to drive viewers that is ultimately at cross-purposes with our security goal. Such misaligned incentives often swamp any technical security solutions. And any adversary with an economic incentive to attack usually will. Thus our second lesson: focus on aligning the incentives of the media companies and the voters; reduce the return on investment for the adversary. Exactly how to do these things will require future work.

There are huge issues about human rights and free speech when it comes to blocking access to information. However, the technical aspects of blacklisting are worth understanding before even attempting such human-rights debates. Blacklists of internet resources, such as domain names, IP addresses, or web pages, are useful, but they are not a final solution. Whether blacklists move at the speed of national legislatures or are updated every five minutes, their main impact is to cause the adversary to move around. Blacklists alone are not enough: we would need to look for suspiciously mobile resources (i.e. fast-flux), and eventually whitelist resources. Blacklists such as those implemented by Facebook in response to Congress are helpful, but we should carefully consider how they drive disinformation campaigns into a place where we are better able to counteract them, and be sure we don’t make such campaigns harder to find instead. Lesson 3 is therefore that any blocking should be strategically useful, and not merely reactionary.

We’d be happy for further comments on fake news, disinformation campaigns that interfere with elections, lessons we’ve missed, disagreements about the value of security research to this topic, and other comments you might have! This is a wide open topic, and we’re still sounding it all out.

A witch-hunt for trojans in our chips

A Hardware Trojan (HT) is a malicious modification of the circuitry of an integrated circuit.


A malicious chip can make a device malfunction in several ways. It has been rumored that a hardware trojan implanted in a Syrian air-defense radar caused it to stop operating during an airstrike, thus instantly minimizing the country’s situational awareness and threat response capabilities. In other settings, hardware trojans may leak encryption keys or other secrets, or even generate weak keys that can be easily recovered by the adversary.

This article introduces a new trojan-resilient architecture, discusses its motivation and outlines how it differs from existing solutions. The full paper (by Vasilios Mavroudis, Andrea Cerulli, Petr Svenda, Dan Cvrcek, Dusan Klinec and George Danezis) has been presented in several academic and industrial venues, including DEF CON 25 and the ACM Conference on Computer and Communications Security 2017.

The Challenge of Detecting HT

Judging from the abundance of governmental, industrial and academic projects concerned with the prevention and detection of hardware trojans, there is a consensus regarding the severity of the threat, and it is not taken lightly. DARPA has launched the “Integrity and Reliability of Integrated Circuits” program aiming to develop techniques for the detection of malicious circuitry. The Intelligence Advanced Research Projects Activity funded a project aiming to redesign the fabrication of integrated circuits, while various other initiatives are currently under way (e.g., the COST Action project on “Trustworthy Manufacturing and Utilization of Secure Devices” and the DoD Trusted Foundry program). In addition to these, numerous other industrial and academic projects propose new trojan detection techniques every year, only to be circumvented by follow-up work.

But do Hardware Trojans exist?

Ironically, until now there have been no cases where malicious circuitry was detected in military-grade or even commercial chips. With nothing more than rumors hinting at hardware trojans (in places other than academic lab benches), one cannot help but question their existence. In other words, is HT design and insertion too complex to be practical, or do our detection tools fail to detect the malicious circuitry embedded in the chips around us?

It could be that both are true: hardware trojans do not exist (yet) as malicious actors are focusing on other aspects of the hardware that are easier to compromise. In all cases where trojans were discovered, the erroneous behavior was traced to the chip’s firmware and not its circuitry. Interestingly, in the vast majority of those incidents the security flaws were attributed to honest fabrication mistakes (e.g., manufacturer failing to disable a testing interface).

Intentional vs. Unintentional Errors

It is safe to always assume that an IC will fail in the worst possible way, at the worst possible time (see the Syrian air-defense incident). This “crash n’ burn” approach is common in critical systems (e.g., airplanes, satellites, dams), where any divergence from normal operation will result in an irrecoverable failure of the whole system.

To mitigate the risk, critical-system designers employ redundancy techniques to eliminate single points of failure and thus make their setups resilient to faults. Common examples are the triple-redundant systems used in autopilots. Those systems employ three identical chips sourced from disjoint supply chains and replicate all the navigation computations across them. This allows the system both to tolerate a misbehaving chip and to detect its presence.

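For illustration, here is a minimal sketch of the majority-voting idea behind such triple-redundant setups; the “chips” are stand-in Python callables rather than a model of any real autopilot.

```python
# Minimal sketch of triple redundancy with majority voting: the agreed result
# masks a single misbehaving chip, and any disagreeing chip is flagged.
from collections import Counter

def triple_redundant(chip_a, chip_b, chip_c, inputs):
    results = [chip_a(inputs), chip_b(inputs), chip_c(inputs)]
    (majority, votes), = Counter(results).most_common(1)
    if votes < 2:
        raise RuntimeError("no majority: more than one chip disagrees")
    suspects = [i for i, r in enumerate(results) if r != majority]
    return majority, suspects                 # agreed result plus any outvoted chip(s)

# Example: the third "chip" is faulty (or malicious) and returns a wrong value.
value, suspects = triple_redundant(lambda x: x * 2, lambda x: x * 2,
                                   lambda x: x * 2 + 1, 21)
assert value == 42 and suspects == [2]
```
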
It is noteworthy that those systems do not consider the cause of a chip malfunction, and simply assume that chips fail in the worst possible way. It follows that a malicious chip is not significantly different from a defective one. After all, any adversary sophisticated enough to design and insert a hardware trojan is capable of making it indistinguishable from honest manufacturing errors. Similarly, from an operational perspective it makes little sense to distinguish between trojans in the circuitry and trojans in the firmware, as the risk they pose for the system is identical.

Distributing Trust for Resilience

Given that it is impossible to achieve 100% detection rates for hardware trojans and errors, it is important that our devices maintain their security properties even in their presence. Our work introduces a new high-level device architecture that is resilient to both. At its core, it uses a redundancy-based architecture and secret-sharing protocols to distribute all secrets and computations among multiple chips. Hence, unless all chips are compromised by the same adversary, the security of the system remains intact. A key point is that those chips should originate from disjoint supply chains, to minimize the risk of the same adversary compromising more than one chip. To evaluate its practicality in real-life applications, we built a Hardware Security Module (HSM) that performs standard cryptographic operations (e.g., key generation, decryption, signing) at a very high rate. HSMs are commonly used in operations where security is critical and an increased transaction throughput is needed (e.g., banking, certification authorities). A demonstration is shown in the video above, and further details are on our website. Finally, our work can be easily combined with all existing detection and prevention techniques to further decrease the likelihood of compromise.

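As a rough illustration of the “no single chip holds the secret” idea, the sketch below splits a key into additive shares, one per chip sourced from a different supply chain. The modulus is a placeholder, and the real design also distributes key generation and the cryptographic computations themselves rather than just the storage of shares.

```python
# Toy sketch of additive secret sharing across chips: every chip's share is
# needed to use the key, and any strict subset is uniformly random on its own.
import secrets

ORDER = 2 ** 256 - 189           # illustrative modulus (placeholder choice)

def split_key(key, n_chips):
    """Additively share `key` so that all n_chips shares are needed to recombine it."""
    shares = [secrets.randbelow(ORDER) for _ in range(n_chips - 1)]
    shares.append((key - sum(shares)) % ORDER)
    return shares                # one share per chip, each from a disjoint supply chain

def recombine(shares):
    return sum(shares) % ORDER

key = secrets.randbelow(ORDER)
shares = split_key(key, n_chips=3)
assert recombine(shares) == key
assert recombine(shares[:2]) != key   # a strict subset is (with overwhelming probability) useless
```
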
An investigation of online censorship in Cyprus

The island of Cyprus, situated in the east of the Mediterranean sea, has always been an important commercial and information exchange hub. Today, this is reflected in the large number of submarine cables that facilitate telecommunications with neighboring countries (Greece, Turkey, Egypt, Israel, Syria, and Lebanon) and with the rest of the world (reaching as far as India, South Korea, and Australia). Nevertheless, the Republic of Cyprus (RoC) is officially regarded as a freedom-of-expression safe haven, where “Internet is completely free of any specific regulation”. Unfortunately, Cypriot netizens claim that such statements couldn’t be further from the truth.

In recent years, Internet Service Providers (ISPs) in the RoC have implemented an Internet filtering infrastructure to comply with the laws and regulations imposed by the National Betting Authority (NBA). In an effort to understand the capabilities of this infrastructure, a multi-disciplinary group of volunteers from the hack66 Observatory in Nicosia has collected and analyzed connectivity measurements from end-user connections on a variety of websites and services. Their report was presented at the 7th International Conference on e-Democracy.

For their experiments, the hack66 Observatory team put together a test list comprising domains from the National Betting Authority blocklist, the CitizenLab lists for Greece and Turkey, and WordPress blogs banned in Turkey as reported in the Lumen Database. The analysis was based on over 45,000 measurements from four residential ISPs operating in the Republic of Cyprus, anonymously submitted using a custom OONI probe during the months of March to May 2017. In addition, the team collected data using open DNS resolvers in Cyprus. Early findings suggest that the most common blocking method is DNS hijacking. Furthermore, the measurements indicate that some of the ISPs have deployed middle-boxes – network components capable of performing censorship, traffic manipulation or surveillance.

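To give a flavour of how DNS hijacking shows up in such measurements, here is a minimal sketch that compares the answers an ISP resolver returns against those of a control resolver. It assumes the third-party dnspython package; the resolver addresses and test domain are hypothetical placeholders, and this is not the actual OONI test implementation.

```python
# Minimal sketch: flag domains whose A records differ between the ISP's resolver
# and a control resolver. Requires the third-party dnspython (>= 2.0) package;
# the resolver addresses and test domain below are placeholders.
import dns.exception
import dns.resolver

ISP_RESOLVER = "203.0.113.53"            # placeholder for the ISP's default resolver
CONTROL_RESOLVER = "8.8.8.8"             # a resolver outside the ISP's control
TEST_DOMAINS = ["blocked-example.test"]  # placeholder test-list entry

def lookup(domain, nameserver):
    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = [nameserver]
    try:
        return sorted(rr.address for rr in resolver.resolve(domain, "A"))
    except dns.exception.DNSException as exc:   # NXDOMAIN, timeout, refused, ...
        return [type(exc).__name__]

for domain in TEST_DOMAINS:
    isp, control = lookup(domain, ISP_RESOLVER), lookup(domain, CONTROL_RESOLVER)
    if isp != control:
        print(f"possible DNS manipulation for {domain}: ISP={isp} control={control}")
```
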
A closer inspection of the variations in censorship mechanism implementations among ISPs raised concerns with regard to transparency and privacy: some ISPs do not inform users why a blocked website is not accessible, while others redirect requests to a web server controlled by the NBA, which could in turn log user identifiers such as their IP address. Similarly, the hack66 Observatory team was able to identify a number of unreported Internet censorship cases, entries in the NBA blocklist that either are invalid or that require sophisticated blocking techniques, and collateral damage due to blocking of email delivery to the regulated domains.

Understanding the case of Internet freedom in Cyprus becomes more complicated when the geopolitical situation is taken into consideration. Apart from the Republic of Cyprus, the island of Cyprus is divided into three other segments: the self-declared Turkish Republic of Northern Cyprus; the United Nations-controlled Green Line buffer zone; and the Sovereign Base Areas of Akrotiri and Dhekelia that remain under British control for military purposes. Measurements from the Multimax ISP operating in the area occupied by Turkey indicate network interference practices similar to those of mainland Turkey. This could be interpreted as the existence of two distinct regimes in terms of information policy on the island of Cyprus. No volunteers submitted measurements from the UN buffer zone or the British Sovereign bases. However, it is known via the Snowden revelations that GCHQ is operating a wiretap base in Cyprus codenamed “SOUNDER”, jointly funded by the NSA.

The purpose of the hack66 Observatory is “to collect and analyze data, and routes of data through EMEA, […] in order to promote evidence based policy making”. The timing is just right, given the RoC government’s recent announcement of a new bill in the making to regulate media operations and stop fake news. With their report, the hack66 Observatory aims to provide policy makers with a valuable asset for understanding the limitations and implications of the existing censorship infrastructure, and to start a debate around Internet freedom on the entirety of the island of Cyprus.

Smart Contracts and Bribes

We propose smart contracts that allow a wealthy adversary to rent existing hashing power and attack Nakamoto-style consensus protocols. Our bribery smart contracts highlight:

  • The use of Ethereum’s uncle block reward to directly subsidise a bribery attack,
  • The first history-revision attack requiring no trust between the briber and the bribed miners, and
  • The first realisation of a Goldfinger attack, using a contract that rewards miners in one cryptocurrency (e.g. Ethereum) for reducing the utility of another cryptocurrency (e.g. Bitcoin).

This post provides an overview of the full paper (by Patrick McCorry, Alexander Hicks and Sarah Meiklejohn) which will be presented at the 5th Workshop on Bitcoin and Blockchain Research, held at this year’s Financial Cryptography and Data Security conference.

What is a bribery attack?

Fundamentally, a wealthy adversary (let’s call her Alice) wishes to manipulate the blockchain in some way: for example, by censoring transactions, revising the blockchain’s history, or trying to reduce the utility of another blockchain.

But purchasing hardware up front and competing with existing miners is discouragingly expensive, and may require a Boeing or two. Instead, it may be easier and more cost-effective for Alice to temporarily rent hashing power and obtain a majority of the network’s hash rate before performing the attack.


Practicing a science of security

Recently, at NSPW 2017, Tyler Moore, David Pym, and I presented our work on practicing a science of security. The main argument is that security work – both in academia and in industry – already looks a lot like other sciences. It’s also an introduction to modern philosophy of science for security, and a survey of the existing science-of-security discussion within computer science. The goal is to help us ask more useful questions about what we can do better in security research, rather than get distracted by asking whether security can be scientific.

Most people writing about a science of security conclude that security work is not a science, or at best rather hopefully conclude that it is not a science yet but could be. We identify five common reasons people present as to why security is not a science: (1) experiments are untenable; (2) reproducibility is impossible; (3) there are no laws of nature in security; (4) there is no single ontology of terms to discuss security; and (5) security is merely engineering.

Through our introduction to modern philosophy of science, we demonstrate that all five of these complaints are misguided. They rely on an old conception of what counts as science that was largely abandoned in the 1970s, when the features of biology came to be recognized as important and independent from the features of physics. One way to understand what the five complaints actually allege is that security is not physics. But that’s much less impactful than claiming it is not science.

More importantly, we have a positive message on how to overcome these challenges and practice a science of security. Instead of complaining about untenable experiments, we can discuss structured observations of the empirical world. Experiments are just one type of structured observation. We need to know what counts as a useful structure to help us interpret the results as evidence. We provide recommendations for use of randomized control trials as well as references for useful design of experiments that collect qualitative empirical data. Ethical constraints are also important; the Menlo Report provides a good discussion on addressing them when designing structured observations and interventions in security.

Complaints about reproducibility are really targeted at the challenge of interpreting results. Astrophysics and paleontology do not reproduce experiments either, but are clearly still sciences. There are different senses of “reproduce,” from repeat exactly to corroborate by similar observations in a different context. There are also notions of statistical reproducibility, such as using the right tests and having enough observations to justify a statistical claim. The complaint is unfair in essentially demanding all eight types of reproducibility at once, when realistically any individual study will only be able to probe a couple of types at best. Seen with this additional nuance, security has similar challenges in reproducibility and interpreting evidence as other sciences.

A law of nature is a very strange thing to ask for when we have constructed the devices we are studying. The word “law” has had a lot of sticking power within science. The word was perhaps used in the 1600s and 1700s to imply a divine designer, thereby making the Church more comfortable with the work of the early scientists. The intellectual function we really care about is that a so-called “law” lets us generalize from particular observations. Mechanistic explanations of phenomena provide a more useful and approachable goal for our generalizations. A mechanism “for a phenomenon consists of entities (or parts) whose activities and interactions are organized so as to be responsible for the phenomenon” (pg 2).

MITRE wrote the original statement that a single ontology was needed for a science of security. They also happen to have a big research group funded to create such an ontology. We synthesize a more realistic view from Galison, Mitchell, and Craver. Basically, diverse fields contribute to a science of security by collaboratively adding constraints on the available explanations for a phenomenon. We should expect our explanations of complex topics to reflect that complexity, and so complexity may be a mark of maturity, rather than (as is commonly taken) a mark that security has as yet failed to become a science by simplifying everything into one language.

Finally, we address the relationship between science and engineering. In short, people have tried to reduce science to engineering, and engineering to science; neither attempt is convincing. The line between the two is blurry, but it is useful. Engineers generate knowledge, and scientists generate knowledge. Scientists tend to want to explain why, whereas engineers tend to want to predict a change in the future based on something they make. Knowing why may help us make changes. Making changes may help us understand why. We draw on the work of Dear and Leonelli to bring out this nuanced, mutually supportive relationship between science and engineering.

Security already can accommodate all of these perspectives. There is nothing here that makes it seem any less scientific than life sciences. What we hope to gain from this reorientation is to refocus the question about cybersecurity research from ‘is this process scientific’ to ‘why is this scientific process producing unsatisfactory results’.

A Critical Analysis of Genome Privacy Research

The relationship between genomics and privacy-enhancing technologies (PETs) has been an intense one for the better part of the last decade. Ever since Wang et al.’s paper, “Learning your identity and disease from research papers: Information leaks in genome wide association study”, received the PET Award in 2011, more and more research papers have appeared in leading conferences and journals. In fact, a new research community has steadily grown over the past few years, also thanks to several events, such as Dagstuhl Seminars, the iDash competition series, or the annual GenoPri workshop. As of December 2017, the community website genomeprivacy.org lists more than 200 scientific publications, and dozens of research groups and companies working on this topic.

Participants of the 2015 Dagstuhl Seminar on Genome Privacy

Progress vs Privacy

The rise of genome privacy research does not come as a surprise to many. On the one hand, genomics has made tremendous progress over the past few years. Sequencing costs have dropped from millions of dollars to less than a thousand, which means that it will soon be possible to easily digitize the full genetic makeup of an individual and run complex genetic tests via computer algorithms. Also, researchers have been able to link more and more genetic features to predisposition to diseases (e.g., Alzheimer’s or diabetes), or to cure patients with rare genetic disorders. Overall, this progress is bringing us closer to a new era of “Precision Medicine”, where diagnosis and treatment can be tailored to individuals based on their genome and thus become cheaper and more effective. Ambitious initiatives, including in the UK and the US, are already taking place with the goal of sequencing the genomes of millions of individuals in order to create bio-repositories and make them available for research purposes. At the same time, a private sector for direct-to-consumer genetic testing services is booming, with companies like 23andMe and AncestryDNA already having millions of customers.

Example of 23andme “Health Overview” test results. (Image from: https://www.singularityweblog.com/)

On the other hand, however, the very same progress also prompts serious ethical and privacy concerns. Genomic data contains highly sensitive information, such as predisposition to mental and physical diseases, as well as ethnic heritage. And it does not only contain information about the individual, but also about their relatives. Since many biological features are hereditary, access to genomic data of an individual essentially means access to that of close relatives as well. Moreover, genomic data is hard to anonymize: for instance, well-known results have demonstrated the feasibility of identifying people (down to their last name) who have participated in genetic research studies just by cross-referencing their genomic information with publicly available data.

Overall, there are a couple of privacy issues that are specific to genomic data, for instance its almost perpetual sensitivity. If someone gets hold of your genome 30 years from now, it might still be as sensitive as it is today, e.g., for your children. Even if there may be no immediate risks from genomic data disclosure, things might change. New correlations between genetic features and phenotypical traits might be discovered, with potential effects on perceived suitability for certain jobs or on health insurance premiums. Or, in a nightmare scenario, racist and discriminatory ideologies might become more prominent and target certain groups of people based on their genetic ancestry.

Alt-right trolls are arguing over genetic tests they think “prove” their whiteness. (Image taken from Vice News)

Making Sense of PETs for Genome Privacy

Motivated by the need to reconcile privacy protection with progress in genomics, the research community has begun to experiment with the use of PETs for securely testing and studying the human genome. In our recent paper, Systematizing Genomic Privacy Research – A Critical Analysis, we take a step back. We set out to evaluate research results using PETs in the context of genomics, introducing and executing a methodology to systematize work in the field, ultimately aiming to elicit the challenges and obstacles that might hinder their real-life deployment.


Systematizing Consensus in the Age of Blockchains

We are at a crucial point in the evolution of blockchains, and the biggest hurdle in their widespread adoption is improving their performance and scalability. These properties are deeply related to the consensus protocol used—the core component of the blockchain allowing multiple nodes to agree on the data to be sealed in the chain. This week we published a pre-print of the first comprehensive systematization of blockchain consensus protocols. This blog post discusses the motivation for this study, the challenges in systematization, and a summary of the key contributions.

Consensus is an old well-studied problem in computer science. The distributed systems community has studied it for decades, and developed robust and practical protocols that can tolerate faulty and malicious nodes. However, these protocols were designed for small closed groups and cannot be directly applied to blockchains that require consensus in very large peer-to-peer open participation settings.

The Bitcoin Consensus Protocol

Bitcoin’s main innovation was to enable consensus among an open, decentralized group of nodes. This involves a leader election based on proof-of-work: all nodes attempt to find the solution to a hash puzzle and the node that wins adds the next block to the blockchain. A downside of its probabilistic leader election process, combined with performance variations in decentralized networks, is that Bitcoin offers only weak consistency: different nodes might end up having different views of the blockchain, leading to forks. Moreover, Bitcoin suffers from poor performance which cannot be fixed without a fundamental redesign, and its proof-of-work consumes a huge amount of energy.

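As a toy illustration of this hash puzzle, the sketch below searches for a nonce that pushes a block hash below a difficulty target. Real Bitcoin double-hashes an 80-byte block header against a far harder target; the block data and difficulty here are made up for illustration.

```python
# Toy proof-of-work: iterate a nonce until SHA-256(block_data || nonce) falls
# below a difficulty target. Real Bitcoin uses double SHA-256 over a structured
# 80-byte header and a much harder target.
import hashlib

def mine(block_data: bytes, difficulty_bits: int = 20):
    target = 2 ** (256 - difficulty_bits)      # smaller target means a harder puzzle
    nonce = 0
    while True:
        digest = hashlib.sha256(block_data + nonce.to_bytes(8, "little")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce, digest.hex()
        nonce += 1

nonce, block_hash = mine(b"prev_hash || merkle_root || timestamp")
print(f"found nonce {nonce} giving hash {block_hash}")
```
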
Improved Blockchain Consensus Protocols

Because of these issues, over the last few years a plethora of designs for new consensus protocols have been proposed. Some replace Bitcoin’s proof-of-work with more energy-efficient alternatives, while others modify Bitcoin’s original design for better performance. To achieve strong consistency and similar performance as mainstream payment processing systems like Visa and PayPal, another vein of work proposes to repurpose classical consensus protocols for use in blockchains. As a result of these various design proposals, the area has become too complex to see the big picture.

Systematization Challenges

To date there exists no systematic and comprehensive study of blockchain consensus protocols. Such a study is challenging for two reasons. First, a comprehensive survey of blockchains would be incomplete without a discussion of classical consensus protocols, but that literature is vast and complex, which makes it hard to tailor to blockchains. Second, conducting a survey of consensus protocols in blockchains has its own difficulties: though the field is young, it is both high-volume and fast-paced. The figure above shows the number of papers published on blockchains each year since Bitcoin’s inception in 2008 (sourced from CABRA). One might consider only accounting for work published in reputable venues, but this approach is not feasible in the case of blockchains because the bulk of the work is published in non-peer-reviewed venues and as white papers for industrial platforms.

Systematization of Blockchain Consensus Protocols

To fill this gap, this week we published a pre-print of the first comprehensive systematization of blockchain consensus protocols—mapping out their evolution from the classical distributed systems use case to their application to blockchains. After first discussing key themes in classical consensus protocols, we describe: (i) protocols based on proof-of-work, (ii) proof-of-X protocols that replace proof-of-work with more energy-efficient alternatives, and (iii) hybrid protocols that are compositions or variations of classical consensus protocols. We developed a framework to evaluate their performance, security and design properties, and used it to systematize key themes in different protocol categories. This work highlighted a number of open areas and challenges related to gaps between classical consensus protocols and blockchains, security vs performance tradeoffs, incentives, and privacy. We hope that this longitudinal perspective will inspire the design of new and faster consensus protocols that can cater to varying security and privacy requirements.

How 4chan and The_Donald Influence the Fake News Ecosystem

On July 2nd, 2017, Donald Trump, the President of the United States of America, tweeted a short video clip of him punching out a CNN logo. The video was modified from an appearance that Mr. Trump made at a Wrestlemania event, and it originally appeared on Reddit’s /r/The_Donald subreddit. Although The_Donald was infamous in some circles, the uproar it caused was many people’s first introduction to a community of users that have had a striking amount of influence on the world stage. While memes like the one birthed on The_Donald are worrying but mostly harmless, a shocking amount of disinformation (“fake news”) is also created by and spread from smaller, fringe Web communities that have a relatively outsized influence on the greater Web.

In a nutshell, the explosion of the Web has commoditized the creation of false information and enabled it to spread like wildfire at unprecedented scale. After a decade and a half of experience with social media platforms, bad actors have honed their techniques and become surprisingly adept at crafting messages that at best make it difficult to distinguish between fact and fiction, and at worst propagate dangerous falsehoods.

While recent discourse has tried to blur the lines between real and fake news, there are some fundamental differences. Take, for example, the simple fact that fake news has to be created in the first place. Real news, even opinion pieces, is based around reporting and interpretation of factual material. Not to dismiss the efforts of journalists, but the fact remains: they are not responsible for generating stories from whole cloth. This is not the case with the type of misinformation pushed by certain corners of the Web.

Consider recent events like the death of Heather Heyer during the Charlottesville protests earlier this year. While facts had to be discovered, they were facts, supported by evidence gathered by trained professionals (both law enforcement and journalists) over a period of time. This is real news. However, the facts did not line up with the far-right political ideology espoused on 4chan’s /pol/ board (or, if we want to play Devil’s Advocate, they made for good trolling material), and thus its users set about creating alternative narratives. They immediately began working towards a nearly singular goal, shocking to the uninitiated: to deflect, in any way possible, from the fact that a like mind had committed a heinous act of violence.

4chan: Crowdsourced Opposition Intelligence

With over a year of observing, measuring, and trying to understand the rise of the alt-right online, we saw a familiar pattern emerge: crowdsourced opposition intelligence.

/pol/ users mobilized in a perverse, yet fascinating, use of the Web. Dozens of often conflicting discussion threads put forth alternative theories of Ms. Heyer’s killing, supported by everything from pure conjecture, to dubious analysis of mobile phone video and pictures, to impressive investigations discovering personal details and relationships of victims and bystanders. Over time, pieces of the fabrication were agreed upon and tweaked until they resembled, in large part, a plausible, albeit eyebrow-raising, false reality ready for consumption by the general public. Further, as bits of the narrative are debunked, it continues to evolve, weeks after the actual facts have been established.

One month after Heather Heyer’s killing, users on /pol/ were still pushing fabricated alternative narratives of events.

The Web Centipede

There are many anecdotal examples of smaller communities on the Web bubbling up and influencing the rest of the Web, but the plural of anecdote is not data. The research community has studied information diffusion on specific social media platforms like Facebook and Twitter, and indeed each of these platforms is under fire from government investigations in the US, UK, and EU, but the Web is much bigger than just Facebook and Twitter. There are other forces at play, where false information is incubated and crafted for maximum impact before it reaches a mainstream audience. Thus, we set out to measure just how this influence flows, in a systematic and methodical manner, analyzing how URLs from 45 mainstream and 54 alternative news sources are shared across 8 months of Reddit, 4chan, and Twitter posts.

While we made many interesting findings, there are a few we’ll highlight here:

  1. Reddit and 4chan post mainstream news URLs at over twice the rate that Twitter does, and 4chan in particular posts alternative news URLs at twice the rate of Twitter and Reddit.
  2. We found that alternative news URLs spread much faster than mainstream URLs, perhaps an artifact of automated bots.
  3. While 4chan was usually the slowest to post a given URL, it was also the most successful at “reviving” old stories: if a URL was re-posted after a long period of time, it probably showed up on 4chan originally.
Graph representation of news ecosystem for mainstream news domains (left) and alternative news domains (right). We create two directed graphs, one for each type of news, where the nodes represent alternative or mainstream domains, as well as the three platforms, and the edges are the sequences that consider only the first-hop of the platforms. For example, if a breitbart.com URL appears first on Twitter and later on the six selected subreddits, we add an edge from breitbart.com to Twitter, and from Twitter to the six selected subreddits. We also add weights on these edges based on the number of such unique URLs. Edges are colored the same as their source node.
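
As a small illustration of the edge-construction rule described in the caption, the sketch below builds weighted edges from URL appearance sequences; the sample data is purely hypothetical and not drawn from the study.

```python
# Toy sketch of the caption's rule: for each URL, add an edge from its news
# domain to the platform where it appeared first, and from that platform to
# every platform it reached afterwards; weights count unique URLs.
from collections import Counter

# (news domain, platforms ordered by when the URL first appeared on each) -- made-up data
url_appearances = [
    ("breitbart.com", ["Twitter", "Reddit", "4chan"]),
    ("breitbart.com", ["4chan", "Twitter"]),
    ("nytimes.com", ["Twitter", "Reddit"]),
]

edges = Counter()
for domain, platforms in url_appearances:
    edges[(domain, platforms[0])] += 1            # domain -> platform of first appearance
    for later in platforms[1:]:
        edges[(platforms[0], later)] += 1         # first platform -> platforms reached later

for (src, dst), weight in edges.items():
    print(f"{src} -> {dst} (weight {weight})")
```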

Measuring Influence Through the Lens of Mainstream and Alternative News

While comparative analysis of news URL posting behavior provides insight into how Web communities connect together like a centipede through which information flows, it is not sufficiently powerful to quantify the specific levels of influence they have.

Continue reading How 4chan and The_Donald Influence the Fake News Ecosystem