DAVIS, Calif. — Online services that allow users to upload their genetic information, learn about their genealogy, and identify long-lost family members have become increasingly popular in recent years, and for good reason. Who wouldn’t want to learn more about where they came from? But as more and more people share their genetic information with these public databases, they may be opening themselves up to a form of data theft they probably didn’t even know was possible.
These services may be vulnerable to a few different variations of “genetic hacking,” according to a new study conducted at the University of California, Davis. By uploading certain DNA sequences, the research team says, hackers may be able to collect the genomes of many people in a database or identify individuals who carry specific genetic variants, such as those linked to Alzheimer’s disease.
“People are giving up more information than they think they are,” comments professor Graham Coop in a release. Coop adds that a genome isn’t like a stolen credit card: you can’t just cancel it and order a new one.
To be clear, researchers say these potential vulnerabilities do not apply to for-profit DNA sequencing companies, in which users must submit a sample of their own DNA in order to be granted access to the service’s database. Public databases, though, allow anyone to upload any DNA sequences and search for other users with matching genes.
These public DNA databases operate by using software that compares the DNA sequences uploaded by users with sequences already stored in the database. Every person’s genome is inherited from their ancestors, both relatively recent ones and those from many generations ago. The bigger pieces of a genome usually come from more recent family members, and as generations go by, the matching sequences get cut down into smaller pieces. So, if a user were to find another DNA sequence in one of these databases with large chunks identical to their own, it would likely mean the two individuals share a recent ancestor.
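The matching idea described above can be illustrated with a toy sketch. It assumes genomes are plain strings of letters and that a "match" is simply a long enough run of identical positions; the function names and the `min_run` threshold are hypothetical, and real services use far more elaborate comparisons.

```python
def longest_shared_run(a, b):
    """Length of the longest stretch of consecutive positions where a and b agree."""
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best


def likely_recent_relatives(a, b, min_run=8):
    """Toy rule (an assumption): a long shared chunk suggests a recent common ancestor."""
    return longest_shared_run(a, b) >= min_run


# Made-up sequences for illustration only.
query = "AACGTTGACCGTAAGT"
relative = "TTCGTTGACCGTAACA"   # shares one long chunk with the query
stranger = "ACGATCGTACCATGCA"   # agrees with the query only at scattered positions

print(longest_shared_run(query, relative))        # 12
print(likely_recent_relatives(query, relative))   # True
print(likely_recent_relatives(query, stranger))   # False
```

The longer the shared chunk, the more recent the common ancestor is likely to be, which is why the threshold matters in this sketch.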
The research team identified three strategies malicious individuals could use to obtain much more information from a public DNA database than the identities of a few long-lost relatives. The three approaches are IBS (identical by state) tiling, IBS probing and IBD (identical by descent) baiting.
IBS Tiling: A hacker would upload numerous genomes easily found in research databases, and look to see which ones match up with other genomes within the public database. If enough matching tiles are found, a person’s genome could conceivably be pieced together.
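A toy simulation can show why tiling works. The sketch below is not the study's code: it assumes genomes are short strings, and `matching_segments` is a hypothetical stand-in for a database report that reveals which stretches of an upload match a stored genome.

```python
def matching_segments(upload, target, min_len):
    """Hypothetical match report: (start, end) spans where upload and target
    agree for at least min_len consecutive positions."""
    segments, start = [], None
    for i in range(len(target) + 1):
        same = i < len(target) and upload[i] == target[i]
        if same and start is None:
            start = i
        elif not same and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    return segments


def tile(uploads, target, min_len=4):
    """Piece together the target from the reported matching spans of each upload."""
    recovered = [None] * len(target)
    for upload in uploads:
        for start, end in matching_segments(upload, target, min_len):
            # The uploader knows their own sequence, so a reported match
            # reveals the target's letters over that span.
            recovered[start:end] = upload[start:end]
    return recovered


# Made-up data: one stored genome and three publicly available uploads.
target = "ACGTACGTTGCA"
uploads = ["ACGTAGGGGGGG", "TTTTTCGTTGAA", "GGGGGGGGTGCA"]

recovered = tile(uploads, target)
print("".join(c if c else "?" for c in recovered))
```

In this example the three uploads happen to cover the whole target between them; with fewer uploads, some positions would stay unknown and print as `?`.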
IBS Probing: This approach could be used to find people who carry a specific genetic variant. The study used a gene tied to Alzheimer’s as an example. In this approach, the attacker creates a fake genome whose DNA sequence is unlikely to match anyone’s, except for one small section that matches the gene the perpetrator is interested in. Any match this falsified genome produces in a public database would reveal a person carrying that specific gene.
IBD Baiting: This strategy tricks a class of algorithms used to identify relatives in some public databases. The study’s authors estimate that with as few as 100 uploaded DNA sequences, a hacker could get his or her hands on essentially all of the genetic information stored in an entire database. The research team even performed their own test of this method on the GEDmatch database; using only DNA sequences they had uploaded themselves, they were able to confirm that IBD baiting can be used to find specific genetic variants within public databases.
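The probing idea can be sketched the same way. This toy example assumes a database that reports a match whenever two sequences share a long enough run; the filler symbol `N`, the user names, and all function names are hypothetical and chosen only for illustration.

```python
def longest_shared_run(a, b):
    """Length of the longest stretch of consecutive positions where a and b agree."""
    best = run = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best


def build_probe(length, start, variant_segment, filler="N"):
    """Fake genome: filler that matches nobody, except for the segment of interest."""
    probe = [filler] * length
    probe[start:start + len(variant_segment)] = variant_segment
    return "".join(probe)


def find_carriers(database, probe, min_len):
    """Users the toy database would report as matching the probe."""
    return sorted(
        name for name, genome in database.items()
        if longest_shared_run(probe, genome) >= min_len
    )


# Made-up database; "TTTG" at positions 3-6 plays the role of the variant.
database = {
    "user_a": "ACGTTTGACA",
    "user_b": "ACGTCCGACA",
    "user_c": "GGATTTGCCA",
}
probe = build_probe(length=10, start=3, variant_segment="TTTG")

print(probe)                                       # NNNTTTGNNN
print(find_carriers(database, probe, min_len=4))   # ['user_a', 'user_c']
```

Because the rest of the probe cannot match anyone, every reported match comes from the segment of interest, so the list of matches is a list of carriers.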
All three of the strategies could conceivably be carried out by an individual with both computing and genetics knowledge, such as a graduate student.
“The good news is that it’s quite preventable,” comments postdoctoral researcher Michael “Doc” Edge.
Researchers lay out how direct-to-consumer genetics services can easily stop these attacks in their study, and say they’ve already shared their findings and suggestions with a number of leading services. However, they report receiving “varied” responses in return.
Anyone using these services should be aware of the potential risks involved and just how much information they may be making accessible, the study’s authors conclude.
The study is published in eLife.