Red Storm upgrade lifts Sandia supercomputer to 2nd in world, but 1st in scalability, say researchers
A $15 million upgrade to Sandia’s Red Storm computer has increased its peak speed from 41.5 to 124.4 teraflops, in a field where a single teraflop was a big deal only six years ago.
The machine, built by Cray Inc., is now rated second fastest in the world, with a Linpack speed of 101.4 teraflops. The widely recognized Linpack benchmark measures how fast a supercomputer solves a large, dense system of linear equations.
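The figures above can be put side by side with a little arithmetic. The sketch below uses only the three speeds quoted in this article; the variable and function names are illustrative, and the "share of peak" ratio is simply one common way to read a Linpack result against a machine's theoretical peak.

```python
# Speeds quoted in the article, in teraflops.
PEAK_BEFORE_TFLOPS = 41.5   # peak speed before the upgrade
PEAK_AFTER_TFLOPS = 124.4   # peak speed after the upgrade
LINPACK_TFLOPS = 101.4      # measured Linpack speed after the upgrade


def ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, rejecting non-positive denominators."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


# How much the upgrade multiplied peak speed.
upgrade_factor = ratio(PEAK_AFTER_TFLOPS, PEAK_BEFORE_TFLOPS)

# What fraction of peak speed the Linpack run achieved.
linpack_fraction = ratio(LINPACK_TFLOPS, PEAK_AFTER_TFLOPS)

print(f"Peak speedup from upgrade: {upgrade_factor:.2f}x")   # about 3.00x
print(f"Linpack as share of peak: {linpack_fraction:.1%}")   # about 81.5%
```

On these numbers the upgrade roughly tripled peak speed, and the Linpack run reached a little over four-fifths of the new peak.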
“While not number one in speed, in terms of scalability, Red Storm is the best in the world,” says Bill Camp, director of Sandia’s Computation, Computers, Information, and Math center.
Scalability refers to a supercomputer’s computational efficiency as the number of processors on a job is increased. “You want to use more processors to get large jobs done more quickly,” says Camp, “but if the computer doesn’t scale well you can lose much of that speedup.” Red Storm loses little efficiency on large numbers of processors.
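One standard way to quantify the scalability Camp describes is parallel efficiency: the speedup gained from using more processors, divided by the number of processors used. The sketch below is a minimal illustration of that textbook definition. The function names and the timings in the example are hypothetical and are not Red Storm measurements.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup: time on one processor divided by time on many processors."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel


def parallel_efficiency(t_serial: float, t_parallel: float, n_processors: int) -> float:
    """Efficiency: speedup divided by processor count (1.0 means perfect scaling)."""
    if n_processors < 1:
        raise ValueError("n_processors must be at least 1")
    return speedup(t_serial, t_parallel) / n_processors


# Hypothetical job: 1000 s on one processor, 1.25 s on 1000 processors.
example_speedup = speedup(1000.0, 1.25)
example_efficiency = parallel_efficiency(1000.0, 1.25, 1000)

print(f"Speedup: {example_speedup:.0f}x")         # 800x
print(f"Efficiency: {example_efficiency:.0%}")    # 80%
```

A machine that scales well keeps this efficiency close to 1.0 as the processor count grows; a machine that scales poorly sees it fall, which is the lost speedup Camp refers to.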
“The Cray XT3 supercomputers now dominating the highest end of computing worldwide are based upon Sandia’s Red Storm,” says Camp, who, together with Sandia colleague Jim Tomkins, led the design of the machine. “Scientists love it because they can do bigger science more quickly on it than any other computer in existence, except for molecular dynamics studies on BlueGene/L (Lawrence Livermore National Lab’s supercomputer). Otherwise, it’s the best thing since night baseball.”
“The machine’s also a computational workhorse. It gets the job done,” says Sandia researcher Steve Attaway, a winner of several national computing awards who runs large engineering simulations on the machine.
Red Storm was designed under the National Nuclear Security Administration’s Advanced Simulation & Computing program and is used for NNSA’s stockpile stewardship program, which helps ensure that the U.S. nuclear weapons stockpile is safe and reliable without the resumption of underground nuclear testing. This supercomputer also runs computer codes used for conducting materials science simulations critical to national security. Sandia is an NNSA laboratory.
The Red Storm design became the basis for the Cray XT3™ massively parallel processor (MPP) supercomputer that has been installed at a number of prestigious supercomputing centers around the world.
Purchasers of this design include Oak Ridge National Laboratory, which will create an even bigger supercomputer than Red Storm based on the same design, as well as Lawrence Berkeley Labs, Pittsburgh Supercomputer Center (which is the largest National Science Foundation site), the U.S. Army, the United Kingdom’s Atomic Weapons Establishment (AWE) program, the national computing centers in Finland, Switzerland and the U.K., and other U.S. and allied government sites.
Red Storm is Sandia’s largest high-performance computer, but it is thrifty in its use of power. It uses 2.2 megawatts, roughly half the power drawn by other supercomputers of its class. This means that comparatively less of Red Storm’s energy is converted to useless heat.
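The power figure can be combined with the peak speed quoted earlier to give a rough performance-per-watt number. The sketch below assumes that the 2.2-megawatt figure applies to the upgraded machine and pairs it with the 124.4-teraflop peak; the article does not state this pairing explicitly, so the result is an estimate. The names are illustrative.

```python
# Figures quoted in the article, converted to base units.
PEAK_FLOPS = 124.4e12   # 124.4 teraflops peak (post-upgrade)
POWER_WATTS = 2.2e6     # 2.2 megawatts (assumed to be the post-upgrade draw)


def flops_per_watt(flops: float, watts: float) -> float:
    """Return floating-point operations per second delivered per watt."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return flops / watts


# Express the result in megaflops per watt for readability.
mflops_per_watt = flops_per_watt(PEAK_FLOPS, POWER_WATTS) / 1e6

print(f"{mflops_per_watt:.1f} megaflops per watt at peak")  # about 56.5
```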
Red Storm also takes up a relatively small area — about 3,500 square feet.
Its Linpack test demonstrated high reliability, repeatedly running for nine hours on over 26,000 processor cores without a failure.
The machine took less than three years to create from concept to customer shipment. It was relatively inexpensive to develop and build — $77.5 million including engineering and design costs — and is used for large scientific and technical problems.
Sandia developed the architectural specifications of the machine and did much of the software development. “The hardware at Cray was built to meet our specifications,” says Sandia Senior Scientist Jim Tomkins.
The upgrade added a fifth row of cabinets and refitted the entire system with dual-core AMD Opteron™ processors, resulting in a supercomputer with over 26,000 processor cores. Dual-core technology fits two processor cores on a single die, doubling processing capacity with minimal impact on power consumption and temperature levels.
Why is Red Storm so efficient? In part, says Sandia researcher Robert Ballance, because its operating system is based on minimalist software — termed a lightweight kernel — which carries just enough functionality to load the job, put it on the network, and stop it. Any other software is job-specific; thus, each computer node (at which two chips are located) in effect lugs no useless software on its back.
The original technology was pioneered by Sandia on its ASCI Red machine, the world’s first terascale supercomputer, which was built by Intel Corporation.
Source: Sandia National Laboratories