November 1, 2016

HybPiper: A bioinformatic pipeline for processing target-enrichment data

by Botanical Society of America

With the rapid rise of next-generation sequencing technologies, disparate fields from cancer research to evolutionary biology have seen a drastic shift in the way DNA sequence data are obtained. It is now possible to sequence many genes across large numbers of species in a very short period of time, and the price tag keeps shrinking. However, the deluge of sequence data produced by these high-throughput sequencing techniques requires a substantial amount of computational work to process, a daunting task for many biologists. A recently developed bioinformatics pipeline allows researchers with limited computational skills to quickly and efficiently extract gene regions of interest from data obtained with the increasingly popular targeted sequence capture approach.

Targeted sequence capture is a technique used to focus sequencing efforts on specific regions of the genome. Because sequencing is restricted to the gene regions of interest, many more samples can be sequenced concurrently. A recent study led by scientists at the Chicago Botanic Garden, published in Applications in Plant Sciences, describes HybPiper, a pipeline for recovering gene regions from sequence data obtained using this technique.

"We set out to design a tool to reliably extract gene sequences from high-throughput sequencing projects to build phylogenetic trees," explains Dr. Matthew Johnson, lead author of the study. "Scientists using next-generation sequencing technologies get their data delivered in a big pile of DNA fragments. HybPiper decides which fragments belong to which gene, assembles the fragments into a gene region, and returns the full gene sequence, including introns, in a format that can be used for downstream analysis."

The pipeline brings together a number of Python scripts and free-standing programs to create a simple-to-use workflow for processing large amounts of sequence data. "We used a variety of tools at each phase, and tweaked the parameter settings until we were consistently recovering the right sequence. We also tried to be sensitive to different targeted sequencing designs—for example, not everyone will be able to design probes from a closely related genome. This flexibility is reflected in a large number of customizable parameters in HybPiper to better fit each individual project," explains Johnson.

One feature that is particularly useful, especially for researchers working with plants, is HybPiper's ability to detect duplicate genes. Because all flowering plants, for example, share at least one whole-genome duplication in their evolutionary history, detecting paralogous gene copies is an essential part of accurately estimating species relationships. This, however, can be an exceedingly difficult and time-consuming task, which is why duplicate-gene detection is built into the pipeline. Johnson explains, "Sorting DNA sequencing fragments can be tricky when what seems like one gene is really two closely related genes. HybPiper has tools that will allow users to avoid this issue and detect whether a gene has been duplicated in their study organism."
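
The article does not say how duplicate genes are detected. One simple heuristic, offered here purely as an illustration and not as HybPiper's method, is to flag any gene for which more than one assembled contig spans most of the target's length, since a single-copy gene would normally yield only one such contig. The function name, the input layout, and the 0.75 coverage fraction are assumptions chosen for this sketch.

```python
def flag_possible_paralogs(contig_lengths, target_lengths, min_fraction=0.75):
    """Return {gene: n} for genes with more than one near-full-length contig.

    contig_lengths: gene name -> list of assembled contig lengths (bp)
    target_lengths: gene name -> length of the target sequence (bp)
    min_fraction:   hypothetical cutoff for calling a contig "long"
    """
    flagged = {}
    for gene, lengths in contig_lengths.items():
        cutoff = min_fraction * target_lengths[gene]
        long_contigs = [n for n in lengths if n >= cutoff]
        if len(long_contigs) > 1:
            flagged[gene] = len(long_contigs)
    return flagged


if __name__ == "__main__":
    contigs = {"geneA": [950, 900, 120], "geneB": [1000], "geneC": [400, 380]}
    targets = {"geneA": 1000, "geneB": 1000, "geneC": 1000}
    print(flag_possible_paralogs(contigs, targets))
```

A flag from a heuristic like this is only a prompt for closer inspection, for example by building a gene tree of the competing copies, not proof of duplication.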

Dr. Johnson concludes, "Development of HybPiper is ongoing. We have set up a website (github.com/mossmatters/HybPiper) that helps users with installation issues and a comprehensive tutorial using an example dataset. We encourage users to provide feedback and suggest new features that will help them with their target enrichment analysis."

More information: Matthew G. Johnson et al., HybPiper: Extracting Coding Sequence and Introns for Phylogenetics from High-Throughput Sequencing Reads Using Target Enrichment, Applications in Plant Sciences (2016). DOI: 10.3732/apps.1600016

Journal information: Applications in Plant Sciences

Provided by Botanical Society of America

Citation: HybPiper: A bioinformatic pipeline for processing target-enrichment data (2016, November 1) retrieved 29 June 2024 from https://phys.org/news/2016-11-hybpiper-bioinformatic-pipeline-target-enrichment.html

