The National Nuclear Security Administration's (NNSA) Lawrence Livermore National Laboratory has teamed with 10 computing industry leaders to accelerate the development of powerful next-generation Linux clusters in a project dubbed Hyperion.
Hyperion brings together Dell, Intel, Supermicro, QLogic, Cisco, Mellanox, DDN, Sun, LSI, and Red Hat to create a large-scale testbed for high-performance computing technologies. These technologies are critical both to NNSA's work to maintain the aging U.S. nuclear weapons stockpile without underground nuclear testing and to industry's ability to make petaFLOP/s (quadrillion floating-point operations per second) computing and storage more accessible for commerce, industry and research and development.
"Hyperion represents a new way of doing business. Collectively we are building a system none of us could have built individually," said Mark Seager, LLNL project leader. "The project will advance the state-of-the-art in a cost-effective manner, benefiting both end users, such as the national security labs, and the computing industry, which can expand the market with proven, easy-to-deploy large- and small-scale Linux clusters."
The goal of the project is to provide a development, testing and scaling environment for new cluster technologies and infrastructure critical to the mission requirements of NNSA's Advanced Simulation and Computing (ASC) program. This includes testing new hardware and software technologies and forming long-term relationships to ensure continuity in the development of new technologies for ever-larger systems.
Important technologies for scaling up computing clusters include the OpenFabrics Enterprise Distribution (OFED) InfiniBand™ open source software; the Lustre open source parallel file system; and the open source operating system software and cluster tools used by the Tri-Lab Capacity Clusters, which serve researchers at Lawrence Livermore, Los Alamos and Sandia national labs. In addition, Hyperion will help lay the foundation for future petascale ASC computing platforms by facilitating the development of processors, memory, networks, storage and visualization.
The first half of Hyperion is now online and being used by the collaboration. When completed in March 2009, the Hyperion cluster, located at Livermore, will have at least 1,152 nodes with 9,216 cores; a peak of about 100 teraFLOP/s; more than 9 TB of memory; an InfiniBand™ 4x DDR interconnect; and access to more than 47 GB/s of RAID disk bandwidth. The Hyperion testbed includes two Storage Area Networks (SANs): one based on "Data Center Ethernet" and the other based on InfiniBand™. Both SANs are currently deployed using a unique TorMesh topology. This system is the largest testbed of its kind in the world and will provide the Hyperion collaborators with an unmatched opportunity to develop and test hardware and software technologies at unprecedented scale.
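For readers who want a sense of scale, the quoted system totals can be turned into rough per-node and per-core ratios. The short Python sketch below does only that arithmetic. The inputs are the stated minimums or approximations from the announcement, the variable names are illustrative, and the use of binary units (1 TB = 1,024 GB) for memory is an assumption, so the results are estimates rather than published specifications.

```python
# Back-of-the-envelope ratios from the totals quoted in the announcement.
# Inputs are stated minimums or approximations, so outputs are rough estimates.

NODES = 1_152            # "at least 1,152 nodes"
CORES = 9_216            # "9,216 cores"
PEAK_FLOPS = 100e12      # "about 100 teraFLOP/s" peak
MEMORY_TB = 9            # "more than 9 TB" (lower bound)
DISK_BW_GB_S = 47        # "more than 47 GB/s" (lower bound)

GB_PER_TB = 1024         # assumption: binary units for memory

cores_per_node = CORES // NODES
peak_per_core_gflops = PEAK_FLOPS / CORES / 1e9
peak_per_node_gflops = PEAK_FLOPS / NODES / 1e9
memory_per_node_gb = MEMORY_TB * GB_PER_TB / NODES
disk_bw_per_node_mb_s = DISK_BW_GB_S * 1000 / NODES

print(f"cores per node:        {cores_per_node}")
print(f"peak per core:         {peak_per_core_gflops:.2f} GFLOP/s")
print(f"peak per node:         {peak_per_node_gflops:.1f} GFLOP/s")
print(f"memory per node:       {memory_per_node_gb:.1f} GB (at least)")
print(f"disk bandwidth / node: {disk_bw_per_node_mb_s:.1f} MB/s (at least)")
```

Under these assumptions the totals work out to 8 cores and roughly 8 GB of memory per node, with a peak of a little under 11 gigaFLOP/s per core.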
Hyperion helps fulfill U.S. Department of Energy/NNSA goals to provide state-of-the-art computing capabilities for national security; advance high-performance scientific computing for meeting energy, climate and other national challenges; enable scientific discovery in basic science; and enhance U.S. competitiveness in high-performance computing.
Source: DOE/Lawrence Livermore National Laboratory