New analysis of big data sheds light on cell functions

October 26, 2016, University of California - San Diego
Escherichia coli. Credit: Rocky Mountain Laboratories, NIAID, NIH

Researchers have developed a new way of obtaining useful information from big data in biology to better understand, and predict, what goes on inside a cell. Using genome-scale models, the researchers integrated multiple data sets and discovered new biological patterns among different cellular processes. The research, led by bioengineers at the University of California San Diego, was published online Oct. 26 in Nature Communications.

Scientists increasingly rely on big data to make new quantitative discoveries in biology, in areas such as the genome, the microbiome, personalized medicine and disease modeling. With today's technology, scientists can generate data about a cell's or organism's complete set of genes, proteins, RNA profiles, metabolites and much more, collectively known as omic data. Using omic data, scientists can model complex biological interactions and gain a more holistic view of different cellular processes. But analyzing and making sense of these large data sets remains a challenge.

"When doing analysis, it is important to know how all these different data types are related. Now we have a way of connecting multiple different data types to generate fundamental answers to biological questions," said Bernhard Palsson, Galetti Professor of Bioengineering at the Jacobs School of Engineering at UC San Diego and senior author of the study.

"While all these data types are derived from the same cell, they represent processes occurring at very different scales. Our work is about getting multiple different data types synchronized so that we can understand the coordination of these processes and derive meaning from them," said Elizabeth Brunk, a postdoctoral researcher in Palsson's lab and a co-first author of the study.

This study is part of a larger effort to address a grand challenge posed by the National Institutes of Health called "Big Data to Knowledge"—translating large, complex biological data sets into information that can be understood based on fundamentals.

In this study, researchers collected multiple omic data types (RNA sequences, ribosome profiles, protein data, metabolic data) from E. coli grown in different growth environments. The team then integrated these different data types into next-generation genome-scale models of metabolism, which were developed in Palsson's lab.

They examined the relationships between omic data types and discovered new regularities: biological patterns that stay consistent even as the cell's environment changes. Among the regularities they found were that during protein translation, ribosomes consistently pause at particular sites along a messenger RNA transcript, and that these pause sites dictate the protein's three-dimensional structure.

Pause sites exist so that a protein has time to fold and form its overall shape, which is important for the protein to function correctly, Palsson explained. This knowledge is useful for studying cancer biology. If a tumor has a genetic mutation that eliminates a pause site, translation will yield a protein that's not folded correctly and malfunctions.

"Now we have a fundamental explanation for these pause sites that we didn't have before. It's as if we're witnessing an intricate dance with a certain rhythm to make sure that a protein is formed the right way," Palsson said.

The team also developed what's called a parameterized model that can be used to predict which genes are expressed when a cell experiences a change in environment.

"Thanks to the high-quality topological information provided in the genome-scale models developed by Dr. Palsson's lab, we can obtain a better understanding of the connection between genes, proteins and metabolites and place multi-omic data into the context of these biochemical networks," Brunk said.

More information: "Multi-omic data integration enables discovery of hidden biological regularities" DOI: 10.1038/NCOMMS13091
