'PopDel' detects deletions in our genomes
The human genome contains roughly three billion letters and is distributed over 46 chromosomes. Yet the genetic variation from person to person is very small: the genome sequences of any two people differ from each other by about one in every 1,000 letters. Sometimes single letters are exchanged in the genome, while other times whole sections are moved around.
"Many of these differences go unnoticed, because they do not affect the structure of the proteins encoded in the genome and do not cause any diseases," says Birte Kehr, a junior group leader at the Berlin Institute of Health (BIH) who was recently appointed professor at the Regensburg Center for Interventional Immunology (RCI). The bioinformatician is working on so-called structural variants, in which larger segments of the genome are deleted, duplicated or even swapped between chromosomes.
The researchers have now published their findings in the journal Nature Communications.
Large structural changes have a big impact
"Large structural changes are much rarer than changes in single letters," Kehr explains, "but they often have a bigger impact and are also harder to detect." Finding enough of these rare changes to study them requires searching very large collections of genomes. That's where the collaboration with the Icelandic company deCODE Genetics came in: its database contains some 50,000 human genome sequences, and Kehr formerly worked there as a postdoc. "We always planned to search for deletions in the data, but we didn't have a program capable of reliably and quickly processing such huge amounts of data." When Kehr joined the BIH in Berlin, she gave this task to her first Ph.D. student, Sebastian Niehus.
Data is valuable only if it is used
The programs previously available for identifying structural variants such as deletions could only process data from a few individuals at a time. For large data sets, such as those in the deCODE genome database, the results then had to be combined, which was a cumbersome and error-prone process. "We first wanted to develop a statistical model that would enable us to evaluate information from all genomes simultaneously," Niehus reports. "To do this, the program had to be designed so that a computer could quickly sift through huge quantities of data. We also had to compress the files to 1 to 2 percent of their original size to be able to work with them at all."
Faster and more accurate than other software
Once a prototype was developed, the PopDel program had to prove itself against other programs in various scenarios. These included simulated sequence data from up to 1,000 "individuals"; sequence data from 49 parent-child trios, which allowed a thorough analysis of whether inheritance patterns were reconstructed correctly; sequence data from 150 individuals of different ethnicities, which made it possible to evaluate population structures; and finally the approximately 50,000 genomes from the Icelandic cooperation partner deCODE Genetics.
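The article does not say how the trio analysis was carried out, so the following is only a minimal sketch of the general idea behind checking inheritance patterns, not PopDel's own method. It assumes each person's genotype for a deletion is recorded as the number of chromosome copies carrying it (0, 1 or 2); the function names are hypothetical and chosen for illustration.

```python
from itertools import product

# Assumption: a genotype is the number of chromosome copies carrying the
# deletion (0 = none, 1 = one copy, 2 = both copies).
TRANSMITTABLE = {
    0: {0},      # parent without the deletion passes on an intact copy
    1: {0, 1},   # carrier of one copy passes on either version
    2: {1},      # parent with two copies always passes on the deletion
}


def is_mendelian_consistent(father, mother, child):
    """Return True if the child's genotype can be formed by taking
    one copy from the father and one copy from the mother."""
    return any(
        a + b == child
        for a, b in product(TRANSMITTABLE[father], TRANSMITTABLE[mother])
    )


def mendelian_error_rate(trios):
    """Fraction of (father, mother, child) genotype triples that
    cannot be explained by inheritance from the two parents."""
    trios = list(trios)
    if not trios:
        return 0.0
    errors = sum(not is_mendelian_consistent(f, m, c) for f, m, c in trios)
    return errors / len(trios)
```

A low error rate across many trios suggests that deletions are being called consistently within families, while a child carrying a deletion that neither parent has would be flagged as either a calling error or a rare new mutation.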
"We were able to show that PopDel produced good results in a quick, reliable and resource-efficient way, both with data from a single person and with data from the largest cohorts," Niehus reports with pride. Kehr adds: "PopDel only needed two days to analyze the genomes of 150 individuals where other programs had taken four weeks. And PopDel's results were better."
Rare gene variant discovered
The highlight for the researchers was the discovery of a rare, previously unknown gene variant in only one family out of the 50,000 Icelanders analyzed. "The gene for the LDL receptor showed a larger deletion, or gap, in these family members. This was coupled with very low levels of cholesterol in these individuals," explains Kehr. Her collaborators at deCODE Genetics have since been able to show that the change in the LDL receptor gene is indeed responsible for the low cholesterol levels in affected individuals. "One affected individual died at the age of 85, while six other affected individuals aged 35 to 65 are all very healthy with their low cholesterol levels," says Kehr. "The results are therefore also very interesting from a medical point of view, because we seem to have discovered a genetic variant that contributes to healthy lipid metabolism."
As a next step, the researchers want to develop the program further. They are continuing to work on it themselves, but have also deposited PopDel's source code on an open server so that anyone can view, use and improve it. "PopDel can so far only detect deleted DNA segments, but there are also genetic variants where segments have been duplicated, inverted or translocated. We would now like to find all of them with PopDel as well," says Niehus, looking to the future. And Kehr hopes that, in the long run, the findings will lead to the development of new therapies and treatment approaches, in keeping with BIH's motto of "Turning Research into Health."