November 1, 2016

HybPiper: A bioinformatic pipeline for processing target-enrichment data

by Botanical Society of America

With the rapid rise of next-generation sequencing technologies, fields as disparate as cancer research and evolutionary biology have seen a drastic shift in the way DNA sequence data is obtained. It is now possible to sequence many genes across large numbers of species in a very short period of time, and the cost keeps falling. However, the deluge of sequence data produced by these high-throughput techniques takes substantial computational effort to process, a daunting task for many biologists. A recently developed bioinformatics pipeline allows researchers with limited computational skills to quickly and efficiently extract gene regions of interest from data obtained with the increasingly popular targeted sequence capture approach.

Targeted sequence capture is a technique used to focus sequencing efforts on specific regions of the genome. Because sequencing is restricted to the gene regions of interest rather than the whole genome, many more samples can be sequenced concurrently. A recent study led by scientists at the Chicago Botanic Garden and published in Applications in Plant Sciences describes HybPiper, a pipeline for recovering gene regions from sequence data obtained using this technique.

"We set out to design a tool to reliably extract gene sequences from high-throughput sequencing projects to build phylogenetic trees," explains Dr. Matthew Johnson, lead author of the study. "Scientists using next-generation sequencing technologies get their data delivered in a big pile of DNA fragments. HybPiper decides which fragments belong to which gene, assembles the fragments into a gene region, and returns the full gene sequence, including introns, in a format that can be used for downstream analysis."

The pipeline brings together a number of Python scripts and free-standing programs to create a simple-to-use workflow for processing large amounts of sequence data. "We used a variety of tools at each phase, and tweaked the parameter settings until we were consistently recovering the right sequence. We also tried to be sensitive to different targeted sequencing designs—for example, not everyone will be able to design probes from a closely related genome. This flexibility is reflected in a large number of customizable parameters in HybPiper to better fit each individual project," explains Johnson.

One feature that is particularly useful, especially for researchers working with plants, is HybPiper's ability to detect duplicate genes. Because all flowering plants, for example, have at least one whole-genome duplication in their shared evolutionary history, detecting paralogous gene copies is an essential part of accurately estimating species relationships. This, however, can be an exceedingly difficult and time-consuming task, so HybPiper builds duplicate detection into the pipeline. Johnson explains, "Sorting DNA sequencing fragments can be tricky when what seems like one gene is really two closely related genes. HybPiper has tools that will allow users to avoid this issue and detect whether a gene has been duplicated in their study organism."

Dr. Johnson concludes, "Development of HybPiper is ongoing. We have set up a website (github.com/mossmatters/HybPiper) that helps users with installation issues and a comprehensive tutorial using an example dataset. We encourage users to provide feedback and suggest new features that will help them with their target enrichment analysis."

More information: Matthew G. Johnson et al., HybPiper: Extracting Coding Sequence and Introns for Phylogenetics from High-Throughput Sequencing Reads Using Target Enrichment, Applications in Plant Sciences (2016). DOI: 10.3732/apps.1600016

Journal information: Applications in Plant Sciences

Provided by Botanical Society of America

Citation: HybPiper: A bioinformatic pipeline for processing target-enrichment data (2016, November 1) retrieved 26 April 2024 from https://phys.org/news/2016-11-hybpiper-bioinformatic-pipeline-target-enrichment.html

