‘Sanitizing’ functional genomics data may prevent privacy breaches | Spectrum | Autism Research News

January 16, 2021


The new data ‘sanitization’ technique obscures regions of a participant’s genome in a dataset to secure her privacy, and may encourage more people to participate in genetic studies, says lead investigator Mark Gerstein, professor of biomedical informatics at Yale University.

“If someone hacks into your email, you can get a new email address; or if someone hacks your credit card, you can get a new credit card,” Gerstein says. “If someone hacks your genome, you can’t get a new one.”

To determine which information, and how much of it, should remain private to prevent a linkage attack (in which someone cross-references separate datasets to re-identify an individual), Gerstein and his colleagues performed linkage attacks on existing genetic datasets. In one sample attack, they compared two publicly available databases and RNA sequencing results to successfully identify 421 individuals.

In another linkage attack, Gerstein’s team sequenced the RNA of two volunteers and shuffled these data into a larger dataset. They then obtained DNA samples from the volunteers’ used coffee cups and sequenced their genomes. Again, they could link the two individuals to their genomes with a high degree of certainty.
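The matching step behind a linkage attack can be illustrated with a toy sketch. This is not the team's actual method: the function names, the data layout (position mapped to genotype) and the sample values are all hypothetical, and real attacks work on far larger and noisier data.

```python
def match_score(observed, candidate):
    """Fraction of shared positions where the two genotype sets agree."""
    shared = [pos for pos in observed if pos in candidate]
    if not shared:
        return 0.0
    agree = sum(1 for pos in shared if observed[pos] == candidate[pos])
    return agree / len(shared)


def link(observed, genomes):
    """Return (person_id, score) for the genome that best matches `observed`."""
    scored = ((pid, match_score(observed, g)) for pid, g in genomes.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical toy data: genomic position -> genotype.
genomes = {
    "person_A": {101: "A/G", 202: "C/C", 303: "T/T"},
    "person_B": {101: "A/A", 202: "C/T", 303: "T/T"},
}
# Variants an attacker might infer from "anonymous" RNA sequencing reads.
leaked = {101: "A/G", 202: "C/C"}

best_id, score = link(leaked, genomes)
```

Even in this tiny example, two variants are enough to single out one person, which is why a handful of exposed variants can undo anonymization.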

Based on what they learned from the mock linkage attacks, Gerstein’s team developed a technique to mask some variants in a person’s genetic data while preserving where those variants are located in the genome. To do this, they replace the genetic variant of concern with the corresponding one from a reference genome. Which variants are removed depends on the genetic conditions or predispositions that a person’s genetic data reveals.
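As a rough illustration of the masking idea, the sketch below swaps selected variants for the reference allele while leaving every position in place. It is a simplified assumption of how such a step might look, not the team's implementation; `sanitize`, `sensitive_positions` and the sample alleles are hypothetical.

```python
def sanitize(variants, reference, sensitive_positions):
    """Return a copy of `variants` with sensitive positions set to the reference allele.

    Positions are preserved; only the allele at each sensitive position changes.
    """
    masked = {}
    for pos, allele in variants.items():
        if pos in sensitive_positions:
            masked[pos] = reference[pos]  # hide the participant's own allele
        else:
            masked[pos] = allele  # keep non-sensitive variants untouched
    return masked


# Hypothetical toy data: genomic position -> allele.
reference = {101: "A", 202: "C", 303: "T"}
participant = {101: "G", 202: "C", 303: "A"}

# Suppose the participant chooses to hide the variant at position 303.
sanitized = sanitize(participant, reference, sensitive_positions={303})
```

The more positions a participant adds to `sensitive_positions`, the closer the sanitized record gets to the reference genome and the less individual signal remains for researchers.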

Masking too many variants in this way can decrease the usefulness of the data. But Gerstein’s team struck a balance that lets researchers obtain gene-expression values while letting study participants dictate how much of their genetic information to keep hidden.