May 4, 2020

Eleven human genomes in nine days

by University of California - Santa Cruz

It's only been three years since UC Santa Cruz researchers proved that long-read human genome assembly using the same nanopore technology developed on campus could be done at all. At the time, it was a monumental effort, requiring 150,000 hours of computing time and weeks of work.

About a year later, using the PromethION nanopore sequencer, a similar effort proved significantly faster, cheaper, and easier, clocking in at about a week. "We sequenced eleven human genomes in nine days, which was unprecedented at the time," said UC Santa Cruz Research Scientist Miten Jain.

Now, researchers at UC Santa Cruz have collaborated on an algorithm designed to accurately and precisely assemble individual, complete human genomes from long-read sequencing data in about six hours and for about $70.

The researchers said they hope their assembler will speed up genomics research and open new opportunities. These include pangenome research that represents the true scale of human diversity, a goal that fast, inexpensive assembly makes far more practical.

Until recently, genomic research relied exclusively on the reference genome from a single individual selected to represent an entire species. To reflect true human diversity, UC Santa Cruz has embarked on a pangenomic initiative to sequence 350 new, individual human genomes.

As part of this work, UC Santa Cruz Genomics Institute researchers developed a nanopore long-read sequencing protocol that consistently yields ~60X coverage (~200 gigabases) of a human genome at unprecedented read lengths (read N50 of ~42 kb) using three PromethION flow cells. Additionally, ~7X coverage of the genome comes from reads longer than 100 kb. The method is highly scalable, both in cost and in the number of genomes that can be processed simultaneously. "We are now improving this method for longer reads and higher throughput, which will further facilitate our goal of achieving complete, phased, reference-quality genomes."
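The coverage and N50 figures above come from simple arithmetic on read lengths. The sketch below illustrates that arithmetic, assuming a human genome size of roughly 3.1 Gb (an approximation not stated in the article). The function names are hypothetical and are not taken from the UC Santa Cruz pipeline.

```python
from typing import Iterable


def read_n50(read_lengths: Iterable[int]) -> int:
    """Return the read N50: the length L such that reads of length >= L
    together contain at least half of all sequenced bases."""
    lengths = sorted(read_lengths, reverse=True)
    total = sum(lengths)
    if total == 0:
        raise ValueError("no bases in input")
    running = 0
    for length in lengths:
        running += length
        # Stop at the first read where the cumulative sum reaches half the total.
        if 2 * running >= total:
            return length
    return lengths[-1]  # not reached when total > 0


def coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size


def coverage_from_long_reads(read_lengths: Iterable[int], min_length: int,
                             genome_size: float) -> float:
    """Depth contributed only by reads strictly longer than min_length."""
    return sum(n for n in read_lengths if n > min_length) / genome_size
```

For example, `coverage(200e9, 3.1e9)` gives about 64.5, consistent with the ~60X reported for ~200 gigabases of data. Note that N50 is a length-weighted statistic, not a median: it is pulled upward by the longest reads.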

This large inflow of data necessitated the development of highly efficient software tools, starting with an assembler. "Our new assembler was designed to be cheap and quick, with the goal to be on the cloud," said UC Santa Cruz's Benedict Paten. "It gives us the power to scale nanopore sequencing. Now, I'm confident that we'll be easily assembling hundreds of de novo genomes in the next couple of years."

An extensive team of researchers and developers that was led by Paolo Carnevali from the Chan Zuckerberg Initiative (CZI)—and included many at the Computational Genomics Lab at the UC Santa Cruz Genomics Institute—contributed to this solution.

"When I saw the Jain 2018 paper, I was impressed and realized that I could contribute to the computational side of this line of investigation," said Paolo Carnevali. "I had recently met Benedict Paten and decided I wanted to work with his team at UCSC."

The team were soon collaborating. Within months, they had developed and tested the special algorithmic sauce, which they called Shasta.

Shasta is an algorithm built around in-memory computation that, the authors say, can now complete a de novo human genome assembly (one built from scratch, without relying on a reference genome) in under six hours, for an average cost of $70 per sample.
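The Shasta paper describes working on a run-length encoded representation of reads, in which each run of a repeated base (a homopolymer) is collapsed to a single base plus a count. This makes the assembly less sensitive to homopolymer-length errors, which are common in nanopore reads. The sketch below shows only that encoding step as an illustration. It is not Shasta's code, and the function name is hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def run_length_encode(read: str) -> Tuple[str, List[int]]:
    """Collapse each homopolymer run into one base and record its length.

    Hypothetical helper for illustration only; not taken from Shasta.
    """
    bases: List[str] = []
    run_lengths: List[int] = []
    # groupby yields consecutive runs of identical characters.
    for base, run in groupby(read):
        bases.append(base)
        run_lengths.append(sum(1 for _ in run))
    return "".join(bases), run_lengths


print(run_length_encode("GGATTTTCC"))  # ('GATC', [2, 1, 4, 2])
```

Two reads that differ only in the length of a homopolymer, such as `GATTTC` and `GATTTTC`, share the same collapsed sequence `GATC`. Only their run-length counts differ, so an uncertain homopolymer length does not disrupt matching on the collapsed bases.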

In their paper, "Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes," published today in Nature Biotechnology, they show that Shasta not only matches or exceeds the accuracy of its contemporaries but also produces the fewest misassemblies.

Not satisfied with this milestone, the team saw an opportunity to improve the draft assembly at an affordable cost and turnaround time. "To improve the base-level quality of the assemblies, we used a sequence polisher based on a deep neural network as the final assembly step," explained lead author Kishwar Shafin. "This brought the total cost of the assembly process to less than $200 and the total time to 37 hours, reducing the computational overhead of generating long-read assemblies dramatically, by a factor of five."

The researchers assessed the precision and then validated the accuracy, and noted that they had achieved 99.9% accurate assembly using only nanopore data, a first for the human genome. Further, they generated chromosome-level scaffolds for these polished assemblies using Hi-C sequencing data.
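Base-level assembly accuracy is often expressed as a Phred-scaled quality value, QV = -10 log10(error rate). The minimal sketch below converts between the two scales. The function names are hypothetical, and the article itself reports accuracy only as a percentage.

```python
import math


def accuracy_to_qv(accuracy: float) -> float:
    """Convert fractional accuracy (0 < accuracy < 1) to a Phred quality value."""
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must be strictly between 0 and 1")
    error_rate = 1.0 - accuracy
    return -10.0 * math.log10(error_rate)


def qv_to_accuracy(qv: float) -> float:
    """Inverse conversion: Phred quality value back to fractional accuracy."""
    return 1.0 - 10.0 ** (-qv / 10.0)
```

On this scale, 99.9% accuracy corresponds to about QV 30, or roughly one error per 1,000 bases (about 1,000 errors per megabase). Each additional 10 QV points means ten times fewer errors.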

Research scientist and co-author Karen Miga, who is directing the Data Production Center at UCSC for the Human Pangenome Project, points out the significance of the team's achievements in improved accuracy. "Our aim is not only to expand the diversity of the reference genome but also to resolve the hundreds of gaps that persist across the genome," Miga explains. "Now that we can routinely include these uncharted regions, we have a truly complete assembly of a human genome, and we can begin to explore variations of unknown consequence."

More information: Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes, Nature Biotechnology (2020). DOI: 10.1038/s41587-020-0503-6, www.nature.com/articles/s41587-020-0503-6

Journal information: Nature Biotechnology

Provided by University of California - Santa Cruz

Citation: Eleven human genomes in nine days (2020, May 4) retrieved 20 June 2024 from https://phys.org/news/2020-05-eleven-human-genomes-days.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
