Scaling Goes eXtreme: Researchers reach 34K CPUs

May 25, 2010
Results of the most recent tests, which reached 34,000 CPUs.

(PhysOrg.com) -- Researchers have demonstrated the scalability of high-level excited-state coupled-cluster approaches and parallel-in-time algorithms, reaching a staggering 34,000 central processing units (CPUs). The researchers, at Pacific Northwest National Laboratory (PNNL), are targeting software capable of describing the behavior of molecules in excited states, as well as simulating their dynamics.

To effectively attack major scientific problems in energy, the environment, health, and national security, researchers need state-of-the-art algorithms that run efficiently on emerging computer systems capable of performing more than one quadrillion operations per second (known as a petaflop). Advanced algorithms that perform well on these petascale, and soon-to-be exascale, computer architectures will make massive modeling and simulation calculations feasible.
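
To give a rough sense of the scales named above, the following sketch converts each prefix into operations per second and estimates the best-case run time of a workload. The workload size (`TOTAL_OPS`) and the helper name `ideal_runtime_seconds` are hypothetical, chosen only for illustration. Real codes do not sustain a machine's peak rate, so these figures are lower bounds on run time, not predictions.

```python
# Rates in floating-point operations per second for each scale.
RATES = {
    "terascale": 1e12,
    "petascale": 1e15,  # one quadrillion operations per second
    "exascale": 1e18,
}

# Hypothetical workload size; this number is not taken from the article.
TOTAL_OPS = 3.6e18


def ideal_runtime_seconds(total_ops: float, rate: float) -> float:
    """Best-case run time, assuming the machine sustains its peak rate."""
    return total_ops / rate


for name, rate in RATES.items():
    seconds = ideal_runtime_seconds(TOTAL_OPS, rate)
    print(f"{name}: {seconds:.1f} s ({seconds / 3600:.3f} h)")
```

Under these idealized assumptions, each step up in scale shortens the same calculation by a factor of 1,000, which is why algorithms that actually scale to such machines matter.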

PNNL researchers also are looking toward integrated multiscale approaches that can be used to model chemical processes in realistic settings and that can capitalize on having highly scalable implementations of electronic structure methodologies. This will allow high-level description of large molecular systems in realistic settings defined by finite temperatures and pressures. "We think this method will significantly enhance our understanding of fundamental processes in , systems mimicking photosynthesis, and optically active materials," says PNNL researcher Karol Kowalski.

Progress in this field will significantly raise the system-size limit that can be managed by and will set new standards for the accuracies attainable in molecular simulations. In particular, researchers will use new, highly scalable codes to describe energy conversion in light-harvesting (photosynthetic) molecular systems and to simulate the structure, dynamics, and reactions at the mineral (Fe2O3)/solution interface as a function of pressure and temperature.

Significant progress in the scalability of a class of methods known as non-iterative coupled-cluster (CC) methodologies for excited states was made possible by several factors: redefined local memory management, a new global addressing strategy for handling large data sets stored in Global Arrays, and more efficient ways of dealing with the large, complex expressions that define the CC equations. These improvements significantly increased the performance of the active-space version of non-iterative CC methods, which accounts for the triply excited configurations that are important for the accuracy of the results.

Researchers continue to develop next-generation algorithms for massively parallel computers and plan to address the following problems:

• Find an efficient solution to the local memory bottlenecks that arise when characterizing correlation effects at different orders, further optimize the communication pattern in coupled-cluster approaches for excited states, and characterize the performance of the resulting codes on computers with 50K-100K CPUs.

• Develop exascale parallel-in-time algorithms for use with terascale ab initio molecular dynamics and terascale molecular dynamics programs.

• Develop an efficient interface between ab initio theories and an adaptive multiscale simulation module.

"At the completion of this project, we expect to have a suite of massively parallel tools to perform excited-state calculations for molecular systems composed of hundreds of atoms and new algorithms to perform dynamic simulations for much longer propagation times," says Kowalski. "Another important outcome is closely related to the development of multiscale approaches capable of incorporating the changes in chemical structure of the surrounding environment."

More information: References:

-- K. Kowalski, S. Krishnamoorthy, O. Villa, J. Hammond, N. Govind, "Active-space completely-renormalized equation-of-motion coupled-cluster formalism: Excited-state studies of green fluorescent protein, free-base porphyrin, and oligoporphyrin dimer," The Journal of Chemical Physics, April 16, 2010, doi:10.1063/1.3385315

-- E.J. Bylaska et al., "Large Scale Plane Wave Density Functional Theory: Formalism, Parallelization, and Applications," in Computational Methods for Large Systems: Electronic Structure Approaches for Biotechnology and Nanotechnology, Ed. J.R. Reimers (John Wiley & Sons, Inc.) (in press).

-- K. Kowalski, M. Valiev, Journal of Chemical Physics 131, 234107 (2009).

User comments : 11

gold2_718
1.8 / 5 (5) May 25, 2010
34000 processors in one computer sounds like a lot but Thinking Machines was shipping computers with 65536 processors starting in 1986 (http://en.wikiped...chines). Sure, processors were slower then but it has been amusing to see companies reinventing the parallel-computing wheel over the last several years.
Megadeth312
3 / 5 (2) May 25, 2010
FYI,

The wiki article linked above says 64k bit processors, not 64k processors.

Totally different.
gold2_718
4 / 5 (1) May 25, 2010
OK, the first version had 64K 1-bit processors. Future versions had various other processor technologies. The point is that we were doing massively parallel computing quite a while ago, as were several other companies (BBN, MasPar, nCUBE, etc). Many in the computing world act like massive parallelism is a new thing.
malapropism
5 / 5 (1) May 26, 2010
I think the point of the article though (supported by the abstract of the original paper) is not that it's been possible to bring together 34K of processors in a single unit but to create new algorithms that efficiently make use of such technologies to do useful work that hasn't been feasible previously. In fact, in the abstract the mention of the number of processors is almost an aside at the end and the core thing the researchers comment about is the algorithm scalability.
Objectivist
5 / 5 (2) May 26, 2010
It's FLOPS (FLoating point Operations Per Second), it's not a plural form, it's an abbreviation. Look -- you even quasi elaborate it yourself and yet you still make the mistake.
one quadrillion operations per second (known as a petaflop[sic!])
dirk_bruere
not rated yet May 26, 2010
What does the "quad" in quadrillion refer to?
baudrunner
1 / 5 (2) May 26, 2010
FLOPS is so a plural form... "operations"?

and

"quad" in quadrillion? You don't know what that means? It's a thousand times a trillion. You should not even be on this page, sir.
gold2_718
not rated yet May 26, 2010
I know lingo can be confusing. In the beginning, we used FLOPS as stated by Objectivist. This usage predated the parallel-computing world (Cray was using it in the '70s but I haven't traced the complete history). However, once we broke through the giga-flop barrier, it became clumsy to use FLOPS (a giga-FLOPS machine doesn't roll off the tongue) so the form with no trailing S became common. I've been in the parallel computing world since 1984. That's our lingo and we're probably not going to change. Gigaflop, teraflop and petaflop computers are here to stay with no S required. Now, has anyone figured out how we are going to rate the performance of quantum computers?
Objectivist
5 / 5 (1) May 27, 2010
@baudrunner

My point was that FLOPS is not a plural form of FLOP, as there is no such thing as a FLOP (at least nothing relevant). Thus you don't skip the S in singular form. It never stops to amaze me how such simple statements can be misinterpreted.
PinkElephant
not rated yet May 29, 2010
A FLOP is to FLOPS, as MP is to MPG, or MP to MPH. IOW, nonsensical.
Quantum_Conundrum
not rated yet May 31, 2010
"Now, has anyone figured out how we are going to rate the performance of quantum computers?"


From what I have seen so far, a "rating system" for quantum computers may be irrelevant, as most proposed quantum circuits must be designed specifically for the algorithm or class of algorithms that is intended to be run on them.

I have yet to see anyone devise even a theoretical architecture or "machine code" for an "all purpose" quantum computer comparable to the "all purpose" PC.

But I do recognize the ridiculously huge benefits of "Spintronic" memory even in otherwise "electronic" computers, both in terms of power consumption and in terms of potential increase in maximum memory, and even processing speed.