February 6, 2020

Analysis of human genomes in the cloud

by Mathias Jäger, European Molecular Biology Laboratory

Most bioinformatics software used for genomic analysis is experimental in nature and has a relatively high failure rate. In addition, cloud infrastructure itself, when run at scale, is prone to system crashes. These setbacks mean that large-scale biomedical data analysis can take a long time and incur high costs. To solve these problems, Sergei Yakneen, Jan Korbel, and colleagues at EMBL developed Butler, a system that identifies and fixes crashes efficiently.

Researchers performing analysis in the cloud need a number of technological skills, from configuring large clusters of machines and loading them with software, to handling networking and data security, to recovering efficiently from crashes. Butler helps researchers master these new domains by providing tools for each of these tasks.

Saving time by checking the system's pulse

Butler differs from other bioinformatics workflow systems because it constantly collects health metrics from all system components, such as the central processing unit (CPU), memory, and disk space. Its self-healing modules use these health metrics to detect when something has gone wrong, and can take automated action to restart failed services or machines.
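The idea of using health metrics to decide when to act can be sketched in a few lines of Python. This is an illustration only, not Butler's actual code: the metric names, the threshold values, and the `restart` callback are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical thresholds (percent used); these values are not taken from Butler.
THRESHOLDS: Dict[str, float] = {"cpu": 95.0, "memory": 90.0, "disk": 85.0}


@dataclass
class HealthSample:
    """One reading of health metrics for a single machine or service."""
    host: str
    metrics: Dict[str, float]  # metric name -> percent used


def breached(sample: HealthSample) -> List[str]:
    """Return the sorted names of metrics that exceed their threshold."""
    return sorted(
        name
        for name, value in sample.metrics.items()
        if name in THRESHOLDS and value > THRESHOLDS[name]
    )


def self_heal(samples: List[HealthSample],
              restart: Callable[[str], None]) -> List[str]:
    """Restart every host with at least one breached metric; return those hosts."""
    restarted = []
    for sample in samples:
        if breached(sample):
            restart(sample.host)  # hypothetical action supplied by the caller
            restarted.append(sample.host)
    return restarted
```

In a real deployment the samples would come from a monitoring service and `restart` would call the cloud provider's interface; here both are left to the caller so the decision logic stays visible.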

When this automated action does not work, a human operator is notified by email or Slack to solve the problem. Previously, a crew of trained people was necessary to check a similar system and detect failures. By automating this process, Butler dramatically reduces the time needed to execute large projects. "It is indeed very rewarding that these large-scale analyses can now take place in a few months instead of years," Korbel says.
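The escalation pattern described here, automated recovery first and a human only as a fallback, might look like the following minimal Python sketch. The function names, the retry limit, and the message format are hypothetical, and the notification callback stands in for whatever email or Slack integration a real system would use.

```python
from typing import Callable


def recover(
    service: str,
    try_restart: Callable[[str], bool],
    notify_operator: Callable[[str], None],
    max_attempts: int = 3,  # assumed retry limit, chosen for illustration
) -> bool:
    """Try automated restarts; escalate to a human operator if none succeeds."""
    for _ in range(max_attempts):
        if try_restart(service):
            return True  # automated action worked, no human needed
    notify_operator(
        f"{service}: automated recovery failed after {max_attempts} attempts"
    )
    return False
```

Passing the restart and notification steps in as callbacks keeps the sketch independent of any particular cloud or messaging service.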

Open source

Good solutions are already available for individual challenges associated with scientific computing in the cloud. So instead of reinventing the wheel, the team improved existing technologies. "We built Butler by integrating a large number of established open source projects," says Sergei Yakneen, the paper's first author, currently Chief Operating Officer at SOPHiA GENETICS. "This dramatically improves the ease and cost-effectiveness with which the software can be maintained, and regularly brings new features into the Butler ecosystem without the need for major development efforts."

Beyond system stability and maintainability, cloud-based genomics research also faces challenges of data privacy, which is regulated differently from country to country. Bigger projects will need to make simultaneous use of several cloud environments in different institutes and countries in order to meet the diverse data handling requirements of various jurisdictions. Butler addresses this challenge by being able to run on a wide variety of cloud computing platforms, including most major commercial and academic clouds. This gives researchers access to the widest variety of datasets while meeting stringent data protection requirements.
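To make the jurisdiction constraint concrete, the short Python sketch below selects the clouds on which a dataset may be processed. It is a simplified illustration rather than a description of how Butler is configured: the cloud names, the country codes, and the rule that a dataset may only be processed in an approved country are assumptions.

```python
from typing import Dict, List, Set


def eligible_clouds(allowed_countries: Set[str],
                    cloud_country: Dict[str, str]) -> List[str]:
    """Return, sorted by name, the clouds located in a country the dataset allows."""
    return sorted(
        cloud
        for cloud, country in cloud_country.items()
        if country in allowed_countries
    )


# Hypothetical clouds and the countries that host them.
CLOUDS = {"academic-cloud-a": "DE", "academic-cloud-b": "UK", "commercial-cloud-c": "US"}

# A hypothetical dataset that may only be processed in Germany or the UK.
print(eligible_clouds({"DE", "UK"}, CLOUDS))
```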

Butler in use

Butler's ability to facilitate such complex analyses was demonstrated in the context of the Pan-Cancer Analysis of Whole Genomes (PCAWG) study. Butler processed a 725 terabyte cancer genome dataset in a time-efficient and uniform manner, on 1500 CPU cores, 5.5 terabytes of RAM, and approximately one petabyte of storage. The European Bioinformatics Institute (EMBL-EBI) played a crucial role by providing access and support to their Embassy Cloud, which was used for testing Butler. The system has recently been used in other projects as well, for example in the European Open Science Cloud (EOSC) pilot project.
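A quick back-of-the-envelope calculation in Python puts these figures in per-core terms. It uses only the numbers quoted above and assumes decimal units (1 terabyte = 1000 gigabytes); the even split across cores is a simplification, not a statement about how the work was actually scheduled.

```python
DATASET_TB = 725.0   # size of the cancer genome dataset
CPU_CORES = 1500     # cores used for the analysis
RAM_TB = 5.5         # total memory

GB_PER_TB = 1000.0   # assumption: decimal units

# Average share of the dataset and of memory per core, if split evenly.
data_per_core_gb = DATASET_TB * GB_PER_TB / CPU_CORES
ram_per_core_gb = RAM_TB * GB_PER_TB / CPU_CORES

print(f"data per core: about {data_per_core_gb:.0f} GB")
print(f"RAM per core: about {ram_per_core_gb:.2f} GB")
```

On these assumptions each core handles roughly 483 gigabytes of data with about 3.67 gigabytes of memory.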

The Pan-Cancer project

The Pan-Cancer Analysis of Whole Genomes project is a collaboration involving more than 1300 scientists and clinicians from 37 countries. It involved analysis of more than 2600 genomes of 38 different tumor types, creating a huge resource of primary cancer genomes. This was the starting point for 16 working groups to study multiple aspects of cancer development, causation, progression, and classification.

More information: Sergei Yakneen et al. Butler enables rapid cloud-based analysis of thousands of human genomes, Nature Biotechnology, published on 05 February 2020. DOI: 10.1038/s41587-019-0360-3

Journal information: Nature Biotechnology

Provided by European Molecular Biology Laboratory

Citation: Analysis of human genomes in the cloud (2020, February 6) retrieved 1 September 2024 from https://phys.org/news/2020-02-analysis-human-genomes-cloud.html

