The relationship between genomics and privacy-enhancing technologies (PETs) has been an intense one for the better part of the last decade. Ever since Wang et al.’s paper, “Learning your identity and disease from research papers: Information leaks in genome wide association study”, received the PET Award in 2011, more and more research papers have appeared in leading conferences and journals. In fact, a new research community has steadily grown over the past few years, also thanks to several events, such as Dagstuhl Seminars, the iDash competition series, or the annual GenoPri workshop. As of December 2017, the community website genomeprivacy.org lists more than 200 scientific publications, and dozens of research groups and companies working on this topic.
Progress vs Privacy
The rise of genome privacy research does not come as a surprise to many. On the one hand, genomics has made tremendous progress over the past few years. Sequencing costs have dropped from millions of dollars to less than a thousand, which means that it will soon be possible to easily digitize the full genetic makeup of an individual and run complex genetic tests via computer algorithms. Also, researchers have been able to link more and more genetic features to predisposition of diseases (e.g., Alzheimer’s or diabetes), or to cure patients with rare genetic disorders. Overall, this progress is bringing us closer to a new era of “Precision Medicine”, where diagnosis and treatment can be tailored to individuals based on their genome and thus become cheaper and more effective. Ambitious initiatives, including in UK and in US, are already taking place with the goal of sequencing the genomes of millions of individuals in order to create bio-repositories and make them available for research purposes. At the same time, a private sector for direct-to-consumer genetic testing services is booming, with companies like 23andMe and AncestryDNA already having millions of customers.
On the other hand, however, the very same progress also prompts serious ethical and privacy concerns. Genomic data contains highly sensitive information, such as predisposition to mental and physical diseases, as well as ethnic heritage. And it does not only contain information about the individual, but also about their relatives. Since many biological features are hereditary, access to genomic data of an individual essentially means access to that of close relatives as well. Moreover, genomic data is hard to anonymize: for instance, well-known results have demonstrated the feasibility of identifying people (down to their last name) who have participated in genetic research studies just by cross-referencing their genomic information with publicly available data.
Overall, there are a couple of privacy issues that are specific to genomic data, for instance its almost perpetual sensitivity. If someone gets ahold of your genome 30 years from now, that might be still as sensitive as today, e.g., for your children. Even if there may be no immediate risks from genomic data disclosure, things might change. New correlations between genetic features and phenotypical traits might be discovered, with potential effects on perceived suitability to certain jobs or on health insurance premiums. Or, in a nightmare scenario, racist and discriminatory ideologies might become more prominent and target certain groups of people based on their genetic ancestry.
Making Sense of PETs for Genome Privacy
Motivated by the need to reconcile privacy protection with progress in genomics, the research community has begun to experiment with the use of PETs for securely testing and studying the human genome. In our recent paper, Systematizing Genomic Privacy Research – A Critical Analysis, we take a step back. We set to evaluate research results using PETs in the context of genomics, introducing and executing a methodology to systematize work in the field, ultimately aiming to elicit the challenges and the obstacles that might hinder their real-life deployment.