A Hacked Database Prompts Debate about Genetic Privacy


Linking a human genome in an anonymous sequencing database to its real-world counterpart wasn’t supposed to be possible.

Yaniv Erlich, a geneticist at the Massachusetts Institute of Technology’s Whitehead Institute for Biomedical Research, apparently never got the memo. In the end, all that he and M.I.T. undergraduate student Melissa Gymrek needed to decipher the identities of 50 individuals whose DNA is available online in free-access databases was a computer and an Internet connection.

Erlich and Gymrek selected 32 male genomes from the 1000 Genomes Project, which has a publicly accessible database designed to help researchers find genes associated with different human diseases. Next, they used an algorithm to extract genetic markers from the DNA sequences. The algorithm is specially designed to home in on short tandem repeats on a man’s Y chromosome, known as Y-STRs. Because Y-STRs are passed patrilineally with little to no change from one generation to the next, they provide a way to link an anonymous genome to a particular family surname.
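
To make the idea concrete, here is a minimal sketch of how repeat counts at a few Y chromosome markers could be compared against surname records. It is not the researchers' algorithm, which the article does not describe in detail. The marker names, repeat motifs, sequences and surname records below are all made up for illustration, and the matching rule (fewest mismatched markers) is an assumption.

```python
import re

def count_tandem_repeats(sequence, motif):
    """Return the longest run of back-to-back copies of `motif` in `sequence`."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)

def str_profile(sequence_by_marker, motif_by_marker):
    """Build a {marker: repeat count} profile from per-marker sequences."""
    return {
        marker: count_tandem_repeats(seq, motif_by_marker[marker])
        for marker, seq in sequence_by_marker.items()
    }

def closest_surname(profile, surname_records):
    """Return (surname, mismatches) for the record that differs at the fewest markers."""
    def mismatches(record):
        return sum(profile[m] != record.get(m) for m in profile)
    best = min(surname_records, key=lambda name: mismatches(surname_records[name]))
    return best, mismatches(surname_records[best])

# Hypothetical markers, motifs and sequences (not real genomic data).
motifs = {"MARKER_A": "GATA", "MARKER_B": "TCTA"}
reads = {
    "MARKER_A": "CC" + "GATA" * 12 + "TT",
    "MARKER_B": "AG" + "TCTA" * 9 + "GG",
}

# Hypothetical surname records, standing in for a genealogy database.
records = {
    "Surname1": {"MARKER_A": 12, "MARKER_B": 9},
    "Surname2": {"MARKER_A": 14, "MARKER_B": 9},
}

profile = str_profile(reads, motifs)
match = closest_surname(profile, records)
print(profile, match)
```

Because the repeat counts change so rarely between father and son, a profile that matches a surname record at every marker points to a paternal line, which is what makes the link from an anonymous genome to a family name possible.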