July 19, 2016

New study uses computer learning to provide quality control for genetic databases

by Kathryne Metcalf, University of Illinois at Urbana-Champaign

DNA doesn't exist in a vacuum: even though every cell contains the entire genome of its host organism, cells know how to differentiate, to become part of an eye, or a bone, or a leaf. These differences are related to each cell's transcriptome, the array of messenger RNA (mRNA) molecules that reflects which parts of the genome are expressed and translated into proteins.

A new study published in The Plant Journal helps to shed light on the transcriptomic differences between tissues in Arabidopsis, an important model organism, by creating a standardized "atlas" that can automatically annotate samples to restore lost metadata such as tissue type. By combining data from over 7,000 samples and 200 labs, this work offers a way to leverage the increasing amounts of publicly available 'omics data while improving quality control, allowing for large-scale studies and data reuse.

"As more and more 'omics data are hosted in the public databases, it become increasingly difficult to leverage those data. One big obstacle is the lack of consistent metadata," says first author and Brookhaven National Laboratory research associate Fei He. "Our study shows that metadata might be detected based on the data itself, opening the door for automatic metadata re-annotation."

The study focuses on data from microarray analyses, an early high-throughput genetic analysis technique that remains in common use. Such data are often made publicly available through repositories such as the National Center for Biotechnology Information's Gene Expression Omnibus (GEO), which over time accumulates vast amounts of information from thousands of studies.

Though this abundance of data opens the door for large and inexpensive studies, integrating multiple data sets is often difficult. As University of Illinois bioengineer and Carl R. Woese Institute for Genomic Biology affiliate Sergei Maslov explains, "tissue type is a major metadata point for a sample. However, different researchers use different vocabularies to describe the same tissue, [... and] errors exist during the data submission process."

Because the sheer amount of data precludes manual correction or quality control, Maslov, He and collaborators were inspired to create an automated solution that could deduce metadata from the expression profiles themselves by identifying similarities between tissue types. Their findings suggest that expression profiles remain remarkably similar between samples of the same tissue type, even when taken from plants grown under very different conditions.

By identifying the most similar samples with tissue types already annotated, the researchers were able to teach their algorithm to identify other samples of the same type with a high degree of accuracy. The team generated over 10,000 entries of metadata, and was even able to correct some mistaken annotations in another lab's study, confirming the corrections with the original author. The end result is a massive "atlas" of well-annotated data that can be used for future studies.
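To make the idea concrete, here is a minimal sketch of annotating an unlabeled sample from its most similar annotated neighbors. It is an illustration only, not the study's actual algorithm: the use of Pearson correlation as the similarity measure, the majority vote among k neighbors, the function names, and the toy expression values are all assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def predict_tissue(profile, annotated, k=3):
    """Return the majority tissue label among the k most correlated annotated samples.

    `annotated` is a list of (expression_profile, tissue_label) pairs.
    """
    ranked = sorted(annotated, key=lambda item: pearson(profile, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy, made-up expression values over four hypothetical genes.
annotated_samples = [
    ([9.1, 2.0, 1.5, 7.8], "leaf"),
    ([8.7, 2.3, 1.1, 8.0], "leaf"),
    ([8.9, 1.8, 1.7, 7.5], "leaf"),
    ([1.2, 8.8, 7.9, 2.1], "root"),
    ([1.5, 9.1, 8.2, 1.9], "root"),
    ([1.0, 8.5, 8.0, 2.4], "root"),
]

unlabeled = [1.3, 9.0, 7.7, 2.0]
print(predict_tissue(unlabeled, annotated_samples))  # prints "root"
```

The same function could, in principle, be run on samples that already carry a label: a disagreement between the submitted tissue type and the predicted one would mark the entry as a candidate for manual review.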

"Our ultimate goal is to provide cloud-based computer infrastructure for the study of energy/agriculture related plants, such as poplar and maize," says Maslov. "If our strategies have been successfully applied on Arabidopsis, they can be applied on other species as well."

Meanwhile, adds He, their integrated Arabidopsis atlas is itself an important contribution to plant genetics. "It can be used for constructing coexpression networks, one of the popular methods to leverage transcriptome data for annotation of gene function. We hope it will become a gold standard dataset in many applications."
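For readers unfamiliar with coexpression networks, the sketch below shows one simple way such a network might be built: genes become nodes, and two genes are linked when their expression levels rise and fall together across samples. The correlation threshold of 0.9, the gene names, and the expression values are hypothetical choices for illustration and are not taken from the study.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant gene cannot be correlated with anything
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expression, threshold=0.9):
    """Link every pair of genes whose correlation across samples meets the threshold.

    `expression` maps a gene name to its expression values, one value per sample.
    Returns a list of (gene_a, gene_b, correlation) tuples.
    """
    edges = []
    for gene_a, gene_b in combinations(sorted(expression), 2):
        r = pearson(expression[gene_a], expression[gene_b])
        if r >= threshold:  # only positive coexpression is kept in this sketch
            edges.append((gene_a, gene_b, r))
    return edges


# Hypothetical genes measured across five samples (made-up values).
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.8],
    "geneC": [5.0, 3.0, 4.0, 1.0, 2.0],
}

for gene_a, gene_b, r in coexpression_edges(expression):
    print(f"{gene_a} -- {gene_b} (r = {r:.3f})")
```

In this toy data only geneA and geneB are linked. A gene of unknown function that lands next to well-characterized genes in such a network becomes a candidate for a related role, which is the kind of annotation use the quote describes.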

More information: Fei He et al, Large-scale atlas of microarray data reveals the distinct expression landscape of different tissues in Arabidopsis, The Plant Journal (2016). DOI: 10.1111/tpj.13175

Journal information: The Plant Journal

Provided by University of Illinois at Urbana-Champaign

Citation: New study uses computer learning to provide quality control for genetic databases (2016, July 19) retrieved 10 May 2024 from https://phys.org/news/2016-07-quality-genetic-databases.html

