Reproducing the computational environments of experiments

September 6, 2017, Max Delbrück Center for Molecular Medicine
A view from inside the MDC's data center, with racks of high performance computers. Credit: MDC

Experiments increasingly rely on high-performance computing software. Differences in software environments can cause problems when those experiments need to be reproduced—so scientists at the MDC in Berlin are seeking a solution.

Reproducing experiments and results is a cornerstone of science, but researchers acknowledge that actually achieving this can be tricky. Specific experimental setups are usually the result of a lab's painstaking work, and are increasingly expensive in today's era of high-throughput methods. The fact that complex, customized sets of software are frequently involved in the analysis and interpretation of data makes it even more difficult to achieve true reproducibility.

Guix, a free program for fully reproducing computational environments, might be part of the solution, says Ludovic Courtès of Inria, the French National Institute for computer science and applied mathematics, in Bordeaux. To implement it, he has collaborated with Ricardo Wurmus of the platform for bioinformatics and modeling at the MDC's Berlin Institute of Medical Systems Biology (BIMSB), scientists from the Utrecht University Medical Center and a growing group of international colleagues.

Capturing complete computational environments

Science authorities insist that researchers share source code and support reproducibility. "The ability to reproduce an experiment depends, among other things, on the ability to reproduce the [software environment]," Courtès says. "That poses particular difficulties in the many cases requiring high-performance computing (HPC) environments."

Guix is an outgrowth of GNU, a project launched more than 30 years ago at MIT in the U.S. It makes up for some deficits of earlier efforts and addresses several challenges. Users no longer depend on system administrators to manage software packages, so they can fully customize the environment to their needs. It also solves problems that arise when scientists draw on "container solutions," which Courtès compares to receiving a brand-new computer with everything pre-installed. "That works until you make a small modification in the experiment to test a new hypothesis, which often happens in the world of research."

The advantage of Guix is that it characterizes software environments in unambiguous terms, similar to a mathematical function: the same description always yields the same result. It completely describes an environment's software and all of its dependencies and thus can reproduce them bit-for-bit. This way, Guix facilitates both reproducibility and customizability.
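
The idea of describing an environment like a mathematical function can be illustrated with a short Python sketch. It is not how Guix itself is implemented (Guix is written in Guile Scheme and its real hashing scheme covers far more detail, such as source code and build scripts); the `Package` class, the `environment_id` function and the package names and versions are all hypothetical, chosen only to show the principle that an identifier derived from a complete description changes whenever any dependency changes.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    """A hypothetical package description: name, version, and direct inputs."""
    name: str
    version: str
    inputs: tuple = ()  # other Package objects this one is built from


def environment_id(pkg: Package) -> str:
    """Derive an identifier from the package and, recursively, all its inputs."""
    h = hashlib.sha256()
    h.update(f"{pkg.name}@{pkg.version}".encode())
    # Sort input identifiers so the result does not depend on listing order.
    for dep_id in sorted(environment_id(dep) for dep in pkg.inputs):
        h.update(dep_id.encode())
    return h.hexdigest()


# Illustrative package names and versions only (assumptions, not real data).
libc = Package("libc", "2.25")
python = Package("python", "3.5.3", (libc,))
numpy_a = Package("numpy", "1.12.0", (python, libc))

# The same description, written out again, gives the same identifier.
same = Package("numpy", "1.12.0", (Package("python", "3.5.3", (libc,)), libc))

# Changing one deep dependency changes the identifier of everything above it.
libc_new = Package("libc", "2.26")
numpy_b = Package(
    "numpy", "1.12.0", (Package("python", "3.5.3", (libc_new,)), libc_new)
)

print(environment_id(numpy_a) == environment_id(same))     # identical inputs
print(environment_id(numpy_a) == environment_id(numpy_b))  # different libc
```

In this toy model, two researchers who share the same description obtain the same identifier, while a small modification anywhere in the dependency graph produces a new, distinct environment instead of silently altering the old one.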

Adapting Guix to scientists' needs

Guix was not originally designed for the high-performance computing environments required by today's experiments. So scientists at the MDC, Inria and the partner institutes are building features that allow Guix to be used on computing clusters and to implement reproducible workflows. They are also adding packages that were developed at each site.

"Before Guix, the installation of scientific software was necessarily ad-hoc," Wurmus says. "Groups would build their own software, statically link it into existing systems, and hope that it would never have to change—because managing software environments was virtually impossible. Now not only can we manage a single environment per group in a reliable fashion, but we use Guix at all levels: of the group, user, workflow and so on."

The project is scheduled to last two years, at which time its initiators hope to have met the software reproducibility needs of their institutions. "The wider objective," Courtès says, "is to convince others who rely on high-performance computing that Guix represents a major advance toward a fundamental goal in science."

More information: guix-hpc.bordeaux.inria.fr/
