It's been nearly two decades since a UC Santa Cruz research team announced that they had assembled and posted the first human genome sequence on the internet. Despite the passage of time, enormous gaps remain in our genomic reference map. These gaps span each human centromere.
New research, just published in the journal Nature Biotechnology by a UC Santa Cruz Genomics Institute-affiliated team from the Jack Baskin School of Engineering, attempts to close these gaps. The research uses nanopore long-read sequencing to generate the first complete and accurate linear map of a human Y chromosome centromere. This milestone in human genetics and genomics signals that scientists are finally entering a technological phase in which completing the human genome will be a reality.
Centromeres are sites in our genetic material that are dedicated to ensuring that our genome is correctly partitioned when cells divide. If the centromere is lost, or damaged, then life is out of balance, with too much or too little DNA in each daughter cell. This can be catastrophic, and is often seen in cancers.
Scientists still do not understand how the underlying DNA contributes to centromere function. Until now, the inability to generate maps through these regions has been a fundamental roadblock in studies aimed at understanding how missing sequences impact our health.
The DNA known to span human centromeres is full of tandem repeats. That is to say, exact copies of the same sequence are found, in a head-to-tail orientation, thousands of times. These exact copies often span millions of bases, the fundamental units of DNA.
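A toy sketch can show why head-to-tail repeats are so hard to map. It is not the method used in the study: the 8-base repeat unit, the array size, the single variant and the function names are all invented for illustration, real centromeric repeat units are far longer, and exact string matching stands in for real read alignment. The idea is only that a short fragment of a repeat array fits equally well in many places, while a long fragment that crosses a rare variant fits in just one.

```python
def build_array(unit, copies, variant_copy, variant_offset, variant_base):
    """Join `copies` head-to-tail copies of `unit`, then change one base in one copy."""
    seq = list(unit * copies)
    seq[variant_copy * len(unit) + variant_offset] = variant_base
    return "".join(seq)


def count_placements(reference, read):
    """Count the start positions where `read` matches `reference` exactly."""
    return sum(
        1
        for i in range(len(reference) - len(read) + 1)
        if reference.startswith(read, i)
    )


# Hypothetical toy data: 50 copies of an 8-base unit with one variant base.
unit = "ACGTTGCA"
array = build_array(unit, copies=50, variant_copy=20, variant_offset=3, variant_base="A")

short_read = array[0:16]     # two repeat units, does not reach the variant
long_read = array[100:200]   # long enough to cross the variant at position 163

print("short read placements:", count_placements(array, short_read))
print("long read placements:", count_placements(array, long_read))
```

In this toy case the short read can be placed at many positions along the array, so it carries no information about where it came from, whereas the long read has a single placement because it contains the one base that distinguishes its copy from the others.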
Dr. Karen Miga, corresponding author on the publication, explained that researchers have called these repeat-rich regions of code the "black holes of the genome," "puzzle pieces of a blue sky" and "a hall of mirrors."
Miga noted, "Prior to our work, no sequence technology, or collection of sequence technologies have been sufficient to ensure proper assembly through these regions."
One of the key aspects of this work is the authors' use of a method to generate sequences that are both long (hundreds of thousands of bases) and high-quality. Co-lead author Miten Jain, postdoctoral researcher with the nanopore group at UC Santa Cruz, which pioneered the technology, said both quantity and quality are critical to confidently assemble this previously unresolved, repeat-rich centromere region on the Y chromosome.
"Previously, no sequencing technology has been able to assemble centromeric regions because extremely high-quality, long reads are needed to confidently traverse low-copy sequence variants," Jain said. As a result, human centromeric regions remain absent from even the most complete chromosome assemblies.
These "black holes of the genome" contain information that is "critical for understanding the role of genome biology in health and in diseases such as cancer," explained co-lead author Hugh Olsen, a UC Santa Cruz scientist and lecturer working in the nanopore group. With NIH grant support, the research team continues to improve read lengths of nanopore sequencing technology. "In collaboration with other investigators, we are applying that knowledge to resolve uncharacterized regions of the genome," Olsen explained.
Dr. Miga said she hopes this study will mark the beginning of a new era in human genetics and genomics, in which gaps in the genome reference will no longer be tolerated. "We are on a trajectory for a complete genome. I, for one, look forward to a day where we are finally able to roll up our sleeves and study the function of these mysterious sequences."
Miten Jain et al, Linear assembly of a human centromere on the Y chromosome, Nature Biotechnology (2018). DOI: 10.1038/nbt.4109