Powerful machine-learning technique enables biologists to analyze enormous data sets

Researchers at A*STAR have compared six data-analysis processes and come up with a clear winner in terms of speed, quality of analysis and reliability. The top performer took large, complex biological data sets and spat out key relations between parameters (such as grouping blood and marrow cells according to cell type) in a fraction of the time of the other techniques.

Measurements on single cells alone can generate huge data sets that have anywhere from 20 to more than 20,000 parameters. The mind-boggling size and complexity of biological data sets make it extremely challenging for scientists to uncover meaningful relationships between parameters.

Mathematicians have developed statistical techniques that simplify complex data sets by grouping data according to shared characteristics. The best-known technique is principal component analysis (PCA), which was developed in the early twentieth century. More recently, more powerful techniques that harness machine learning have been developed.
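To make the idea of simplifying a data set concrete, here is a minimal sketch of PCA, the older technique mentioned above. It is not UMAP and not the researchers' own code. The function name `pca_project`, the synthetic "cell type" data and all parameter values are assumptions made purely for illustration; only standard NumPy calls are used.

```python
import numpy as np


def pca_project(data, n_components=2):
    """Project the rows of `data` (cells x parameters) onto the top principal components."""
    # Center each parameter (column) so the components describe variance around the mean.
    centered = data - data.mean(axis=0)
    # SVD of the centered matrix: the rows of `vt` are the principal axes,
    # ordered from most to least variance explained.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained_variance = singular_values**2 / (data.shape[0] - 1)
    embedding = centered @ vt[:n_components].T
    return embedding, explained_variance[:n_components]


# Synthetic stand-in for single-cell measurements (an assumption, not real data):
# two hypothetical cell types, 100 cells each, 20 measured parameters per cell.
rng = np.random.default_rng(0)
type_a = rng.normal(loc=0.0, scale=1.0, size=(100, 20))
type_b = rng.normal(loc=3.0, scale=1.0, size=(100, 20))
cells = np.vstack([type_a, type_b])

embedding, variance = pca_project(cells, n_components=2)
print(embedding.shape)  # one row per cell, one column per retained component
```

In this toy case the two hypothetical cell types separate along the first component, so 20 parameters are summarized by two coordinates per cell. Real single-cell data are rarely this tidy, which is part of the motivation for the newer techniques the article goes on to describe.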

Now, Evan Newell and Florent Ginhoux at the Singapore Immunology Network (SIgN) and their colleagues have used single-cell data to test six such machine-learning techniques and found that one stands out from the rest in speed, quality of analysis and reliability. The technique is called uniform manifold approximation and projection, or 'UMAP'.

"When Evan and Etienne Becht in his group at SIgN started to benchmark UMAP, we realized that it was much more powerful than anything we had used before," recalls Ginhoux.

An analysis that might take days using other methods can be done in a few hours using UMAP, which will allow scientists to investigate larger data sets. "With UMAP, we can analyze data for two or three million cells, whereas we generally avoid going beyond 100,000 cells with other methods," says Newell.

Of the techniques tested, UMAP also grouped similar cells in the most intuitive way, which made its results easier to interpret.

"I think it's really groundbreaking," says Ginhoux. "Researchers I meet at conferences are already starting to use it."

In an earlier study, the group demonstrated UMAP's power by using it to discover a new population of cells in blood. Newell notes that UMAP is highly versatile and can be applied to data generated in fields as diverse as astronomy and crystallography. "Basically, any data that can be expressed in matrices can be analyzed by UMAP," he says.

In addition to using UMAP to analyze data on a daily basis, the team plans to continue to work with informaticians to tailor UMAP to their needs.

More information: Etienne Becht et al. Dimensionality reduction for visualizing single-cell data using UMAP, Nature Biotechnology (2018). DOI: 10.1038/nbt.4314

Journal information: Nature Biotechnology

Provided by Agency for Science, Technology and Research (A*STAR), Singapore
