Professor Hyungryul Baik Proposes a Way to Collaborate on Genomics Research While Protecting Privacy in Genomic Information

A joint research team led by Professor Hyungryul Baik of KAIST and Professor Buhm Han of Seoul National University has presented a method that allows collaboration on genomics research while protecting information on individual genotypes.

In recent years, genomics research has played an important role in, for example, discovering the causes of disease and mapping the mobility of human populations. A crucial requirement of genomics research is collecting and storing as many DNA samples as possible, which inevitably calls for collaboration among researchers and information sharing among multiple organizations.

However, sharing DNA data while protecting the privacy of the individuals behind it has been a challenge. The joint research team addressed this problem by developing an algorithm that allows researchers or institutions to share genomic information but discloses only the information needed to conduct a certain type of research, namely genomic analysis, while encrypting the rest.

The algorithm, which the team calls “Genomic GPS,” is based on multilateration, the technical principle employed in the Global Positioning System (GPS).

Multilateration is a localization technique for wireless sensor networks in which spatial coordinates of a node with an unknown position are inferred by measuring the distances from the node to several reference nodes at known positions.
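To make the idea concrete, the following is a minimal sketch of two-dimensional multilateration in Python. It is an illustration only, not code from the research team: the function name `multilaterate_2d`, the anchor coordinates, and the target point are all hypothetical. The sketch subtracts the first distance equation from the others to obtain a linear system, then solves it by least squares.

```python
import math


def multilaterate_2d(anchors, distances):
    """Estimate an unknown (x, y) from distances to three or more known anchors.

    Illustrative helper (hypothetical name), assuming noise-free or mildly
    noisy distances and anchors that are not all on one line.
    """
    if len(anchors) < 3 or len(anchors) != len(distances):
        raise ValueError("need at least three anchors and one distance per anchor")

    (x0, y0), d0 = anchors[0], distances[0]

    # Subtracting the first circle equation from each of the others removes
    # the quadratic terms and leaves linear rows: a1 * x + a2 * y = b.
    rows = []
    for (xi, yi), di in zip(anchors[1:], distances[1:]):
        a1 = 2.0 * (xi - x0)
        a2 = 2.0 * (yi - y0)
        b = d0**2 - di**2 + xi**2 - x0**2 + yi**2 - y0**2
        rows.append((a1, a2, b))

    # Least squares via the 2x2 normal equations, solved with Cramer's rule.
    s11 = sum(a1 * a1 for a1, _, _ in rows)
    s12 = sum(a1 * a2 for a1, a2, _ in rows)
    s22 = sum(a2 * a2 for _, a2, _ in rows)
    t1 = sum(a1 * b for a1, _, b in rows)
    t2 = sum(a2 * b for _, a2, b in rows)

    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; the position is not determined")

    x = (t1 * s22 - t2 * s12) / det
    y = (s11 * t2 - s12 * t1) / det
    return x, y


# Hypothetical reference nodes at known positions and a made-up true position.
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
true_position = (3.0, 4.0)
distances = [
    math.hypot(true_position[0] - ax, true_position[1] - ay) for ax, ay in anchors
]

estimate = multilaterate_2d(anchors, distances)
print(estimate)
```

With exact distances, the estimate should match the true position up to floating-point error. Real positioning systems work in three dimensions and must also cope with measurement noise and clock error, which this sketch leaves out.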

In a manner similar to multilateration, the researchers calculated genetic distances from their samples (analogous to a node with an unknown position) to reference samples in the public domain (analogous to reference nodes at known positions). Without revealing individual genome data, they were able to share the genetic distances with other research groups and perform genomic analyses, such as identifying sample overlaps and close relatives and investigating population genomics, thereby demonstrating a balance between privacy protection and data sharing in genomics research.
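The approach described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the team's implementation: genotypes are assumed to be coded as allele counts (0, 1, or 2) per variant, the Euclidean distance stands in for whatever genetic distance the study actually used, and all function names and data are hypothetical. Each group shares only its vector of distances to the public references, and matching vectors flag a probable sample overlap.

```python
import math


def genetic_distance(genotype_a, genotype_b):
    """Euclidean distance between two allele-count vectors (an assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(genotype_a, genotype_b)))


def distance_vector(sample, references):
    """Distances from one private sample to every public reference sample."""
    return [genetic_distance(sample, ref) for ref in references]


def likely_same_sample(vector_a, vector_b, tolerance=1e-9):
    """Flag a probable overlap when two shared distance vectors coincide."""
    return all(abs(a - b) <= tolerance for a, b in zip(vector_a, vector_b))


# Made-up public reference genotypes (rows are individuals, columns are variants).
references = [
    [0, 1, 2, 0, 1, 1],
    [2, 2, 0, 1, 0, 1],
    [1, 0, 1, 2, 2, 0],
]

# Made-up private genotypes held by two groups; these are never exchanged.
group_a_sample = [0, 1, 1, 2, 0, 1]
group_b_samples = [
    [0, 1, 1, 2, 0, 1],  # same individual as group A's sample
    [2, 0, 1, 1, 1, 0],  # a different individual
]

# Only the distance vectors are shared between groups.
shared_a = distance_vector(group_a_sample, references)
shared_b = [distance_vector(s, references) for s in group_b_samples]

overlaps = [likely_same_sample(shared_a, vec) for vec in shared_b]
print(overlaps)
```

In this toy data set, only the first of group B's samples is flagged as overlapping with group A's sample. With so few variants and references, unrelated individuals could coincide by chance. A realistic analysis would use far more of both, and a tolerance or statistical test in place of exact matching.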

They also supported the algorithm with a mathematical proof that it safely protects genome privacy and that the encrypted data deliver genomic information as intact as the original data.

Professor Baik, who led the mathematical proof, said, “Even if a hacker steals the encrypted information on genetic distances, the possibility of reconstructing the original information of individual genome by the hacker practically converges to zero (0) percent.”

This research, entitled “Genomic GPS: using genetic distance from individuals to public data for genomic analysis without disclosing personal genomes,” was published in Genome Biology on August 27, 2019.


Picture “a” shows how the GPS works. Pictures “b” and “c” illustrate genomic GPS and its application to sample overlap detection.