Formal mathematics underpins new approach that standardizes analysis of genome information

Sep 11, 2013
Bioinformatics: Analysis of sequence data falls into line
Two new mathematically based algorithms for analyzing genomic data are not only more accurate and robust than their predecessors, but also allow integration of data from different sources. Credit: iStockphoto/Thinkstock

Researchers in Singapore have developed and tested mathematical tools, or algorithms, that are more accurate and robust than those currently used to analyze high-throughput genetic sequencing data. The algorithms can determine the location and activity of specific nucleic acid sequences across a broad range of high-throughput techniques that detect gene–protein interactions. The research group, led by Shyam Prabhakar of the A*STAR Genome Institute of Singapore, also showed that the algorithms could generate meaningful results from degraded tissue and from tissue composed of several different cell types.

The rapid expansion in the application of high-throughput sequencing was possible because of a parallel growth in bioinformatics techniques to analyze the huge amount of data that the technology generates. High-throughput sequencing began as a technology for rapidly sequencing whole genomes; now it can detect gene activity, DNA methylation, microRNA binding and interactions between genes and transcription factors. Each of these sequencing techniques spawned its own specialized analytical methods, many of them based on heuristics: practical strategies that work but may require optimization.

Prabhakar and his colleagues recognized that almost all sequencing analyses are concerned with solving two major classes of problems long studied in the field of signal processing: signal detection and signal strength estimation. Standard mathematical techniques already existed for solving such problems, so the researchers adapted these techniques to sequencing analyses. They reasoned that the formal mathematical basis underlying the techniques would allow them to be optimized or tuned. They also realized that the same approaches could be used across a broad range of applications, thus enabling data integration.
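
To make the idea of signal detection concrete, here is a minimal sketch of a matched filter, one of the standard signal-processing techniques for this kind of problem. It is not DFilter or EFilter, and the article does not describe their internals. The function names, the peak template, the bin counts and the threshold are all hypothetical values chosen for illustration.

```python
def matched_filter(counts, template):
    """Score each bin by correlating the counts with a peak-shaped template.

    The template is centered to zero mean and scaled to unit norm, so a flat
    background scores 0 regardless of its height. Bins too close to either
    edge to fit the full template keep a score of 0.
    """
    n = len(template)
    if n % 2 == 0:
        raise ValueError("template length must be odd")
    mean = sum(template) / n
    centered = [t - mean for t in template]
    norm = sum(c * c for c in centered) ** 0.5
    kernel = [c / norm for c in centered]

    half = n // 2
    scores = [0.0] * len(counts)
    for i in range(half, len(counts) - half):
        window = counts[i - half : i + half + 1]
        scores[i] = sum(k * w for k, w in zip(kernel, window))
    return scores


def detect_peaks(scores, threshold):
    """Return indices of local maxima whose score reaches the threshold."""
    peaks = []
    for i in range(1, len(scores) - 1):
        is_local_max = scores[i] > scores[i - 1] and scores[i] >= scores[i + 1]
        if is_local_max and scores[i] >= threshold:
            peaks.append(i)
    return peaks


# Hypothetical read counts per genomic bin: noisy background with one peak.
counts = [2, 3, 2, 3, 2, 4, 9, 15, 9, 4, 2, 3, 2, 3, 2]
# Hypothetical expected peak shape (assumed, not taken from the paper).
template = [1, 2, 4, 2, 1]

scores = matched_filter(counts, template)
print(detect_peaks(scores, threshold=5.0))  # prints [7]
```

In this toy setting, the location of the maximum answers the detection question, and the filter score at that location gives a rough measure of signal strength. Because the filter is an explicit mathematical object rather than a heuristic, its template and threshold can in principle be tuned against a stated criterion.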

The researchers developed two algorithms: DFilter, for detecting and locating the binding of proteins to the genome, and EFilter, for estimating gene activity through levels of messenger RNA, the genetic material used as a template for building proteins. The researchers benchmarked both algorithms against existing analytical methods across several sequencing technologies and found that DFilter and EFilter outperformed the more specialized algorithms. The new algorithms also facilitated the analysis and comparison of multiple and diverse data sets.

Prabhakar and co-workers also used their new algorithms to analyze data from complex, heterogeneous tissue in the embryonic mouse forebrain. They searched for functioning transcription factors and gained useful insights, despite the fact that individual could not be assigned to specific cell types.

"We intend to make DFilter and EFilter widely available," says Prabhakar, "perhaps via cloud genomics providers, if all goes according to plan."

More information: Kumar, V., et al. Uniform, optimal signal processing of mapped deep-sequencing data, Nature Biotechnology 31, 615–622 (2013). www.nature.com/nbt/journal/v31… 7/full/nbt.2596.html
