March 20, 2014

SDSC assists in whole-genome sequencing analysis under collaboration with Janssen

A recent whole-genome sequencing (WGS) analysis project supported by the San Diego Supercomputer Center (SDSC) at the University of California, San Diego has demonstrated how innovative uses of "flash" memory technology can rapidly process the large data sets that pervade human genomics research.

Janssen Research and Development, LLC (Janssen), in collaboration with SDSC and the Scripps Translational Science Institute (STSI), recently launched a project to conduct whole-genome sequencing of 438 patients with rheumatoid arthritis to better understand the disease and to explore genetic factors in patients' response to a biologic therapy discovered, developed, and currently marketed by Janssen in the United States.

The analysis began with 50 terabytes of "read" data generated by DNA sequencers from samples obtained from each of the study participants. These source data were fed into a 14-step processing "pipeline" built from open source software tools. The key steps were mapping each patient's DNA read sequences to a reference genome and then variant calling, which identifies the differences between the patient's genome and the reference.
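To make "mapping" and "variant calling" a little more concrete, here is a toy sketch in Python. It assumes the reads have already been mapped to gap-free positions on the reference and looks only for single-base substitutions, by majority vote over a pileup. The function name `call_snvs`, its thresholds, and the sample data are hypothetical; real pipelines such as the one described here rely on dedicated open source aligners and variant callers with statistical models of sequencing error, not this simple rule.

```python
from collections import Counter, defaultdict

def call_snvs(reference, aligned_reads, min_depth=3, min_fraction=0.8):
    """Toy single-nucleotide variant caller (hypothetical, for illustration).

    `aligned_reads` is a list of (start, sequence) pairs assumed to be
    already mapped to `reference` with no insertions or deletions.
    Returns (position, ref_base, alt_base, depth) for positions where a
    non-reference base makes up at least `min_fraction` of the reads.
    """
    # Build a pileup: for each reference position, count the bases reads place there.
    pileup = defaultdict(Counter)
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    variants = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        base, n = counts.most_common(1)[0]  # majority base at this position
        if depth >= min_depth and base != reference[pos] and n / depth >= min_fraction:
            variants.append((pos, reference[pos], base, depth))
    return variants


reference = "ACGTACGTAC"
reads = [(0, "ACGTTCG"), (2, "GTTCGT"), (3, "TTCGTA"), (4, "TCGTAC")]
print(call_snvs(reference, reads))  # [(4, 'A', 'T', 4)]
```

All four reads covering position 4 carry a T where the reference has an A, so that position is reported as a variant; every other covered position agrees with the reference.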

The read mapping and variant calling were done by Kristopher Standish, a UC San Diego graduate student working under Nicholas Schork, formerly with STSI and now with the J. Craig Venter Institute. SDSC provided high-performance computing and storage resources, as well as expertise to set up and optimize the computational pipeline.

"The need to conduct analysis of 438 full human genomes in a relatively short timeframe necessitated a thorough understanding not only of the computational workload, but of the memory, storage, and input/output requirements," said Wayne Pfeiffer, an SDSC Distinguished Scientist and the Center's lead researcher in the collaboration. "The emergence of 'big data' challenges such as those in human genomics has brought to the fore situations where computer analyses are more likely to be memory- and I/O (input/output)-bound than compute-bound, meaning that while the actual computer processors may have plenty of capacity, the ability to store and move around large amounts of data becomes the limiting factor in throughput."
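The distinction Pfeiffer draws can be illustrated with a back-of-the-envelope model: a step takes at least as long as its computation alone and at least as long as its data movement alone, and whichever is larger sets the pace. The sketch below assumes compute and I/O overlap perfectly (so the estimate is a lower bound), and every number in it is hypothetical rather than taken from the Janssen project.

```python
def step_estimate(core_hours, cores, bytes_moved, io_bytes_per_sec):
    """Rough lower bound on wall-clock hours for one pipeline step.

    Assumes compute and I/O overlap perfectly, so the slower of the two
    sets the pace. All inputs are hypothetical illustration values.
    """
    compute_hours = core_hours / cores                  # time if the CPUs were the only limit
    io_hours = bytes_moved / io_bytes_per_sec / 3600.0  # time if storage bandwidth were the only limit
    limit = "I/O-bound" if io_hours > compute_hours else "compute-bound"
    return max(compute_hours, io_hours), limit


# Same work (16 core-hours, 2 TB of data), two hypothetical sustained storage rates.
print(step_estimate(16, 16, 2e12, 100e6))  # about 5.6 hours, I/O-bound
print(step_estimate(16, 16, 2e12, 1e9))    # 1.0 hour, compute-bound
```

Adding cores shrinks only the compute term, so once a step is I/O-bound, faster storage is what shortens it.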

In the Janssen collaboration, one step, the "sort" step of the read-mapping stage, proved especially challenging: it required relatively few processor cores but rapid access to several terabytes of data, more than could be held in the supercomputer's high-performance main memory. The conventional approach of storing these data on hard disk drives during the sort step made the step severely I/O-bound, dramatically limiting throughput.

"The solution was to take advantage of Gordon's flash memory, which provides much higher speed than conventional disk drives for the random access I/O operations of the sort step," said Pfeiffer. "Several terabytes of flash were aggregated into what we call 'BigFlash' nodes, which significantly reduced the I/O bottleneck in this step and helped researchers meet the project's timelines."
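The article does not say which sorting tool the pipeline used, but the general technique for sorting more data than fits in memory is an external merge sort, sketched below in Python as an illustration only. Sorted "runs" are spilled to scratch files and then merged; during the merge, reads jump between many run files at once, which is one plausible reason such a step is seek-heavy on spinning disks and benefits from flash. The `scratch_dir` parameter is a hypothetical stand-in for a flash-backed scratch path, and records are assumed to be text lines without embedded newlines.

```python
import heapq
import os
import tempfile

def _read_run(path):
    """Stream one sorted run back from disk, stripping line terminators."""
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")

def external_sort(records, max_in_memory, scratch_dir=None):
    """Yield `records` (strings without newlines) in sorted order,
    keeping at most `max_in_memory` of them in RAM at once."""
    run_paths = []
    chunk = []

    def spill():
        # Sort the in-memory chunk and write it out as one sorted run.
        chunk.sort()
        fd, path = tempfile.mkstemp(dir=scratch_dir, suffix=".run")
        with os.fdopen(fd, "w") as f:
            for line in chunk:
                f.write(line + "\n")
        run_paths.append(path)
        chunk.clear()

    for rec in records:
        chunk.append(rec)
        if len(chunk) >= max_in_memory:
            spill()
    if chunk:
        spill()

    try:
        # k-way merge: heapq.merge repeatedly takes the smallest head among all runs.
        yield from heapq.merge(*(_read_run(p) for p in run_paths))
    finally:
        for p in run_paths:
            os.remove(p)
```

For example, `list(external_sort(["d", "a", "c", "b", "e"], max_in_memory=2))` writes three runs and returns `['a', 'b', 'c', 'd', 'e']`. Only one chunk is ever held in memory during the first phase, while the merge phase holds roughly one line per run.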

"The bulk of the analysis was completed in six weeks (including learning time on Gordon) using more than 300,000 core hours of computer time," said Glenn K. Lockwood, a user services consultant at SDSC. "That analysis would have taken more than four years of 24/7 compute time on an 8-core workstation."
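The figures in Lockwood's comparison can be checked with simple arithmetic. The sketch below divides the quoted core-hours by the workstation's 8 cores and by a 365.25-day year, and also spreads the same core-hours evenly over the six weeks. Because the article says "more than 300,000", both results are lower bounds, and the even-spread average is a simplification (the pipeline's usage varied, peaking at far more cores). The function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365.25

def workstation_years(core_hours, workstation_cores=8):
    """Years of nonstop running needed if all core-hours ran on one workstation."""
    return core_hours / workstation_cores / HOURS_PER_YEAR

def average_cores_in_use(core_hours, weeks):
    """Average number of busy cores if the core-hours were spread evenly over `weeks`."""
    return core_hours / (weeks * 7 * 24)


print(f"{workstation_years(300_000):.2f} years")        # 4.28 years
print(f"{average_cores_in_use(300_000, 6):.0f} cores")  # 298 cores
```

300,000 core-hours on 8 cores is 37,500 hours, or about 4.3 years of continuous running, consistent with "more than four years."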

The collaboration also demonstrated the need for large-scale, high-performance computing resources when analyzing hundreds of human genomes in constrained timeframes. With 340 teraflops of computing power, 64 terabytes of main memory, and 300 terabytes of flash memory, Gordon ranked among the 50 fastest supercomputers in the world when it debuted in late 2011, according to the Top500 list.

According to Lockwood, at the project's peak throughput, the WGS pipeline was using 350 terabytes of storage on SDSC's high-performance storage system and 5,000 processor cores, about 30 percent of the system's capacity.
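As a quick consistency check, the peak figures imply the size of the whole machine: if 5,000 cores were about 30 percent of capacity, the system had roughly 5,000 / 0.30 cores. Since "30 percent" is presumably rounded, the result is only approximate; the helper name below is hypothetical.

```python
def implied_total_cores(cores_used, fraction_of_system):
    """Total system size implied by a usage figure and the share it represents."""
    return cores_used / fraction_of_system


print(round(implied_total_cores(5_000, 0.30)))  # 16667
```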

"The Janssen collaboration validated our vision for the Gordon system," said Michael Norman, SDSC's director and principal investigator for the Gordon project. "We saw that emerging big data challenges such as human genomics would dictate new supercomputer architectures where memory and IOPS (I/O operations per second) would be more important than raw computing power, so we designed the system accordingly."

Provided by University of California - San Diego

Citation: SDSC assists in whole-genome sequencing analysis under collaboration with Janssen (2014, March 20) retrieved 29 June 2024 from https://phys.org/news/2014-03-sdsc-whole-genome-sequencing-analysis-collaboration.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
