Automating the selection process for a genome assembler

A repository of genome assemblers is being developed to automate the process of selecting the best assembler for the task at hand.

There are many different genome assemblers being introduced and touted. On the nucleotid.es site (nucleotid.es/), the test results for various genome assemblers provide reproducible findings that genomics researchers can use to select the appropriate assembler for their needs.

After an organism's genetic code has been sequenced, researchers have to assemble the DNA fragments into a single sequence to be able to parse the information. However, selecting an assembler while considering factors such as the large number of short sequence reads generated, repeated sequences, and lack of a reference genome sequence against which to compare the draft assembly can be challenging.

At the U.S. Department of Energy Joint Genome Institute (DOE JGI), a DOE Office of Science user facility, bioinformatics systems analyst Michael Barton has been developing a repository of genome assemblers called nucleotid.es to help the DOE JGI team address these questions for sequencing projects in progress. Right now, he said, selecting a genome assembler is a manual process, so an automated pipeline would be very helpful. The repository at http://nucleotid.es/ is publicly available so that other bioinformaticists can benefit from the findings being generated.

"A lot of assemblers are being produced in the bioinformatics community, and instead of reading subjective papers with assemblers, you can test the assemblers for yourself," Barton said, "with the added benefit of having reproducible research so that anyone can produce the results."

Barton started with genome assemblers that are being used by the DOE JGI, and he tested them against an internal dataset of several microbial genomes. The findings are organized on the website by benchmarks such as NG50 (a contiguity statistic: the length of the shortest contig in the set of longest contigs that together cover at least half of the estimated genome size) so that bioinformaticists can see how each assembler fared on the criteria of interest to them.
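To make the NG50 benchmark concrete, here is a minimal sketch of how the statistic is commonly computed: sort the contigs from longest to shortest, add up their lengths, and report the length of the contig at which the running total first reaches half of the estimated genome size. The function name `ng50` and the example numbers are illustrative assumptions; this is not code from nucleotid.es.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly, or None if it cannot be reached.

    contig_lengths: iterable of contig lengths in base pairs.
    genome_size: estimated size of the genome in base pairs.
    """
    half_genome = genome_size / 2
    running_total = 0
    # Walk the contigs from longest to shortest.
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        # The first contig that pushes the total to half the genome is the NG50.
        if running_total >= half_genome:
            return length
    # The assembly covers less than half of the estimated genome.
    return None


if __name__ == "__main__":
    # Hypothetical contig lengths for a genome estimated at 400 bp.
    print(ng50([80, 70, 50, 40, 30, 20], 400))
```

A higher NG50 means the assembly is made up of longer contiguous pieces. Because it is measured against the estimated genome size rather than the total assembly length, it allows assemblers that produce assemblies of different total sizes to be compared on the same footing.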

Each of the assemblers on the nucleotid.es site is enclosed in a virtual box called a Docker container. The Docker containers make it easy to share and use the software. If a bioinformaticist finds a particular assembler useful, they can easily download it from the nucleotid.es site. Conversely, if other bioinformaticists want to see another assembler on the site, Barton said, they can send him the Docker container for posting.

So far, he said, the genome assemblers on nucleotid.es have been tested on microbial genomes that have come off Illumina sequencers. He plans to add assemblers such as meraculous, an assembler for plant genomes developed at the DOE JGI, as well as jigsaw and allpaths. Barton said he eventually hopes to have assemblers for other types of genome projects on nucleotid.es as well.

Provided by DOE/Joint Genome Institute
