Galaxy DNA-analysis software is now available 'in the cloud'

Nov 08, 2011
Galaxy -- an open-source, web-based platform for data-intensive biomedical and genetic research -- is now available as a "cloud computing" resource. The new technology, developed by a team of researchers from Penn State University and Emory University, will help scientists and biomedical researchers to harness such tools as DNA-sequencing and analysis software, as well as storage capacity for large quantities of scientific data. More information is online at http://science.psu.edu/news-and-events/2011-news/Nekrutenko11-2011 Credit: National Institutes of Health

Galaxy -- an open-source, web-based platform for data-intensive biomedical and genetic research -- is now available as a "cloud computing" resource. A team of researchers -- including Anton Nekrutenko, an associate professor of biochemistry and molecular biology at Penn State University; Kateryna Makova, an associate professor of biology at Penn State; and James Taylor from Emory University -- developed the new technology, which will help scientists and biomedical researchers harness tools such as DNA-sequencing and analysis software, as well as storage capacity for large quantities of scientific data. Details of the development will be published as a letter in the journal Nature Biotechnology.

Earlier papers by Nekrutenko and co-authors describing the technology and its uses are published in the journals Genome Research and Genome Biology.

Nekrutenko said that he and his team first developed Galaxy (http://galaxyproject.org) in 2005 because "biology is in a state of shock. Biochemistry and biology labs generate mountains of data, and then scientists wonder, 'What do we do now? How do we analyze all these data?'" Galaxy, which was developed at Penn State and continues to use the University's servers for its computing power, solves many of the problems that researchers encounter by pulling together a variety of tools that allow for easy retrieval and analysis of large amounts of data, simplifying the process of genomic analysis. As described in one of the team's early papers in the journal Genome Research, Galaxy "combines the power of existing genome-annotation databases with a simple Web portal to enable users to search remote resources, combine data from independent queries, and visualize the results." Galaxy also lets other researchers review the steps that have been taken, for example, in the analysis of a string of DNA. "Galaxy offers scientific transparency -- the option of creating a public report of analyses. So, after a paper has been published, scientists in other labs can do studies in order to reproduce the results described," Nekrutenko said.

Now, Nekrutenko's team has taken Galaxy to the next level by developing an "in the cloud" option using, for example, the popular Amazon Web Services cloud. "A cloud is basically a network of powerful computers that can be accessed remotely without the need to worry about heating, cooling, and system administration. Such a system allows users, no matter where they are in the world, to shift the workload of software storage, data storage, and hardware infrastructure to this remote location of networked computers," Nekrutenko explained. "Rather than run Galaxy on one's own computer or use Penn State's servers to access Galaxy, now a researcher can harness the power of the cloud, which allows almost unlimited computing power."

As a case study, the authors report on recent research published in Genome Biology in which scientists, with the help of Ian Paul, a professor of pediatrics at Penn State's Hershey Medical Center, analyzed DNA from nine individuals across three families using Galaxy Cloud. Thanks to the enormous computing power of the platform, the researchers were able to identify four heteroplasmic sites -- variations in mitochondrial DNA, the part of the genome passed exclusively from mother to child.

"Galaxy Cloud offers many advantages other than the obvious ones, such as for large amounts of data and the ability for a scientist without much computer training to use DNA-analysis tools that might not otherwise be accessible," Nekrutenko said. "For example, researchers need not invest in expensive computer infrastructure to be able to perform data-intensive, sophisticated scientific analyses."

Yet another advantage of Galaxy Cloud is its data-storage capacity. Using the Amazon Web Services cloud, researchers have the option of storing vast amounts of data in a secure location. "There are emerging technologies that will produce 100 times more data than existing 'next-generation' DNA sequencing, which already has reached the point where even more storage becomes an issue, not to mention analysis," Nekrutenko said.
