20 petaflops: New supercomputer for Oak Ridge facility to regain speed lead over the Chinese

March 23, 2011 by Bob Yirka, Phys.org report

Image credit: ORNL presentation (see link below)
(PhysOrg.com) -- The Oak Ridge National Laboratory (ORNL) campus in Oak Ridge, Tennessee, will soon play host once again to the fastest computer in the world (barring any sudden new announcements by the Chinese). The computer, dubbed "Titan," has been commissioned by the U.S. Department of Energy and is expected to achieve 20,000 trillion calculations per second (20 petaflops).

It was only last October that China’s National University of Defense Technology team unveiled the Tianhe-1A, a machine capable of computing at 2.5 petaflops.

The Titan, built by Cray Inc., will become part of a collection of some of the fastest computers in the world at the ORNL facility, joining NOAA’s Gaea, the NSF’s Kraken and the DOE’s current workhorse, the Jaguar. New space will have to be found, though, as the current structure has no room. Plans are in the works for an entirely new facility to be built over the next year, which should fit well with the delivery schedule: the first stage of the Titan is expected by the end of this year, with the second stage slated for sometime next year.

The Titan architecture will rely on XT3, XT4 and XT5 processor boxes but will use a "Gemini" XE interconnect, and it will be configured in a 3D torus topology rather than as an array.
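As a rough illustration of what a 3D torus means for connectivity, the sketch below lists the six nearest neighbours of a node on a grid whose edges wrap around. The grid dimensions and the function name are hypothetical; they are not taken from the Titan or Gemini design.

```python
def torus_neighbors(node, dims):
    """Return the six nearest neighbours of `node` on a 3D torus.

    `node` is an (x, y, z) coordinate and `dims` the (X, Y, Z) grid size.
    Both are illustrative values, not Titan's real layout.
    """
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coord = list(node)
            # Modulo arithmetic makes each axis wrap around, which is what
            # distinguishes a torus from a plain 3D array (mesh).
            coord[axis] = (coord[axis] + step) % dims[axis]
            neighbors.append(tuple(coord))
    return neighbors


if __name__ == "__main__":
    # A corner node still has six neighbours because the edges wrap.
    for neighbor in torus_neighbors((0, 0, 0), (4, 4, 4)):
        print(neighbor)
```

In a plain array, a corner node would have only three neighbours; the wraparound links are what keep every node equally well connected.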

Supercomputers achieve their ability to process enormous amounts of data in very short amounts of time by, in essence, hooking together a lot of processing boxes and then using a device to connect them all. This is why the Gemini XE interconnect is so important; it’s actually one of only two new pieces of proprietary hardware that will be added to create the new machine. The other is the graphics processing unit (GPU) co-processor (likely provided by Nvidia) that will help to perform calculations more quickly. This is also why computer scientists are able to predict ahead of time just how fast a new computer will be: the more processor boxes you add, the faster the end result, so long as you have an interconnect that can handle them. The Titan will also use what is being described as “globally addressable memory,” which means data won’t have to slow down as it passes through I/O channels.

The Titan is expected to be used by the DOE to model complex energy systems and will cost the government somewhere in the neighborhood of $100 million.

More information: computing.ornl.gov/SC10/docume … Booth_Talk_Bland.pdf

5 / 5 (3) Mar 23, 2011
But in all seriousness.. While I tend to wonder what all this computing power is good for, I still appreciate the advancement of technology.

Running simulations is resource intensive... Ideally you'd like to consider a system with as few simplifications as possible, at reasonably high resolution, and with fine time steps. The degrees of freedom go through the roof and so does the solve time. If every engineer/scientist/researcher had this sort of power at their fingertips we'd really be cooking with gas. Even top-of-the-line desktops are still pretty useless in this arena, so there's a long way to go in the consumer market as well.
5 / 5 (2) Mar 23, 2011
What can it do? I don't know, but consider this:

When scientists model the sun using real physics, they take a 2D slice of the sun to run simulations on -- basically they pretend that a slice with NO WIDTH gives pretty good insight into stellar habits.

Why would they do this? Well, a 3D model is too computationally expensive -- the simulation would not end in our lifetime because it would be TOO complex.

These machines can simulate the inside of a nuclear reaction, star formation, folding of proteins, or drug interactions, or hold all known star coordinates and known motions and tell us where galaxies likely ACTUALLY are, instead of where they appear from our viewpoint looking millions of years into the past... If we wanted to travel to a galaxy, we would need to go to where it actually is, not where we see it based on the speed of light. These things are computationally expensive and supercomputers can do them.
3.8 / 5 (5) Mar 23, 2011
i am waiting for exaflop supercomputer, I will create simulation of my life on it

I was one of those who built an 8080 IMSI computer when the original article describing it was published in March 1975 in Popular Science (Pop Electronics was several years away then). At that time, processor speeds were in the few hundred thousand instructions per second. Floating-point math took about a hundred instructions per operation, so while the term kiloflops hadn't exactly been coined yet, that computer would have managed a few kiloflops. I have seen the increases go to megaflops, gigaflops, etc.
I have no doubt that when chips with thousands of processors on a single chip are released (currently in the works by Intel among others), we will be seeing petaflops give way to the next higher unit (I am too lazy to look it up), etc.
5 / 5 (1) Mar 23, 2011
"i am waiting for exaflop supercomputer, I will create simulation of my life on it"

I have an interest in exaflop computing myself (exaflop = 10^18 floating-point operations per second).

Currently, researchers designing the Square Kilometer Array (SKA) radio telescope are collaborating with IBM to build an exaflop-capable machine to handle the flood of data SKA will generate (due date ~2020): http://www.comput...pdated_/

Cray and several European partners are pursuing exaflop machines for other purposes (again with a due date ~2020): http://www.eetime...n-Europe

To be sure, Intel predicts zettaflop performance (10^21 floating-point operations per second) in supercomputers sometime around 2030: http://www.h-onli...779.html

5 / 5 (3) Mar 23, 2011
"folding of proteins"

My PS3 is among about 1 million others around the world doing protein folding (folding@home, with 2770 work units completed by my system so far), not to mention hundreds of thousands of PCs. Per their estimate, they get about 1 petaflop per 50,000 PS3s. So, with about 1 million PS3s running folding@home, we functionally already have a 20 petaflop distributed computer system. With classified research I can see the use for discrete systems, but there are literally millions of us computer enthusiasts who are more than willing to contribute our computer downtime for viable distributed computing projects.
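The arithmetic behind that 20 petaflop figure can be sketched in a couple of lines. Both inputs are the round numbers quoted in the comment, not measured values, and the function name is made up for illustration.

```python
PS3_PER_PETAFLOP = 50_000  # from the comment: ~1 petaflop per 50,000 PS3s
ACTIVE_PS3S = 1_000_000    # from the comment: about 1 million consoles


def aggregate_petaflops(consoles, consoles_per_petaflop):
    """Estimate total throughput, assuming every console contributes equally."""
    return consoles / consoles_per_petaflop


print(aggregate_petaflops(ACTIVE_PS3S, PS3_PER_PETAFLOP))  # prints 20.0
```

This treats the consoles as always on and perfectly parallel, so it is an upper bound rather than a sustained rate.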

From: h_t_t_p://boinc.berkeley.edu/

"Active: 320,153 volunteers, 514,614 computers.
24-hour average: 5,620.64 TeraFLOPS."

If anyone is interested there are well over 100 distributed (Boinc and non-Boinc) computing projects to participate in. All 4 cores of my home system stay busy with einstein@home.
5 / 5 (5) Mar 23, 2011
Here's a fun calculation: rough estimate of performance required to accurately simulate Earth's atmospheric circulation.

Surface of Earth: 5.1x10^14 m^2.
Height of atmosphere (incl. stratosphere): 5x10^4 m

Start with weather:
cell dimensions: 10x10x10 m
Total number of cells in grid: 2.55x10^16
time step: 10 seconds.
complexity: ~10^4 flop/cell/iteration.

To simulate 1 week worth of global weather: 1.5x10^25 flop. A zettaflop computer (50,000 times faster than the 20 petaflop one proposed above; Intel expects such machines by 2030) would take about 4.3 hours to complete such a simulation run -- assuming it operates at 100% load.

What about climate?
cell dimensions: 100x100x50 m
Total number of cells in grid: 5.1x10^13
time step: 1 minute
complexity: ~10^6 flop/cell/iteration

To simulate 100 years worth of global climate: 2.7x10^27 flop -- roughly 31 days on a zettaflop machine at 100% utilization.
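This style of estimate is easy to put into a few lines of code, which makes it simple to vary the inputs. The function names are hypothetical, and the surface area used here, about 5.1x10^14 m^2 (510 million km^2), is this sketch's own assumption; the cell sizes, time steps and per-cell costs are the rough guesses from the comment.

```python
SECONDS_PER_DAY = 86_400
EARTH_SURFACE_M2 = 5.1e14    # assumption: ~510 million km^2
ATMOSPHERE_HEIGHT_M = 5e4    # from the comment (incl. stratosphere)
ZETTAFLOP = 1e21             # floating-point operations per second


def simulation_flop(cell_m3, step_s, duration_s, flop_per_cell_step):
    """Total operations: number of cells x number of time steps x cost per cell-step."""
    cells = EARTH_SURFACE_M2 * ATMOSPHERE_HEIGHT_M / cell_m3
    steps = duration_s / step_s
    return cells * steps * flop_per_cell_step


def runtime_days(total_flop, machine_flops):
    """Wall-clock days at 100% utilization."""
    return total_flop / machine_flops / SECONDS_PER_DAY


# One week of weather: 10x10x10 m cells, 10 s steps, ~1e4 flop per cell-step.
weather = simulation_flop(10 * 10 * 10, 10, 7 * SECONDS_PER_DAY, 1e4)
# 100 years of climate: 100x100x50 m cells, 60 s steps, ~1e6 flop per cell-step.
climate = simulation_flop(100 * 100 * 50, 60, 100 * 365.25 * SECONDS_PER_DAY, 1e6)

for name, flop in (("weather", weather), ("climate", climate)):
    print(f"{name}: {flop:.2e} flop, {runtime_days(flop, ZETTAFLOP):.2f} days")
```

Because the cost scales with the cell count, halving the cell edge in all three dimensions multiplies the work by eight before any reduction in time step is considered.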

One can never have too much computing power...
5 / 5 (1) Mar 23, 2011
Here's another one: simulating a human brain in real time. What would be the required performance of a general-purpose computer?

10^11 neurons (on the order of)
10^4 synapses per neuron (average)
10^3 spikes per synapse, per second (maximum rate) -- so let's use timestep of 0.1 millisecond: 10^4 timesteps/second
10^2 flop/synapse/time step (to include simulation of conduction, integration, potentiation/depression of synapses, and growth/modification of axon/dendrite trees.)

Comes out to 10^21 flop per second. That's 1 zettaflop, folks.
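For anyone who wants to play with the assumptions, the multiplication can be sketched as below. All four factors are the order-of-magnitude guesses listed above, the variable names are made up for illustration, and the 20 petaflop figure is the one quoted in the article.

```python
NEURONS = 1e11                # order-of-magnitude estimate
SYNAPSES_PER_NEURON = 1e4     # average
TIMESTEPS_PER_SECOND = 1e4    # 0.1 ms time step
FLOP_PER_SYNAPSE_STEP = 1e2   # rough cost of updating one synapse

required_flops = (NEURONS * SYNAPSES_PER_NEURON
                  * TIMESTEPS_PER_SECOND * FLOP_PER_SYNAPSE_STEP)

TITAN_FLOPS = 20e15  # 20 petaflops, as quoted in the article

print(f"required: {required_flops:.0e} flop per second")
print(f"ratio to a 20 petaflop machine: {required_flops / TITAN_FLOPS:,.0f}x")
```

Changing any single factor by 10x moves the result by the same amount, so the estimate is only as good as its weakest guess.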

And let's note that this ignores (for simplicity) the roughly 100x greater quantity of glial cells than neurons in the brain, all of which may actually participate in neural computation to some extent (according to some recent studies...)
Don't underestimate smartphones. Nvidia has an aggressive roadmap for CPUs:
5 / 5 (44) Mar 24, 2011
Maybe we'll live to see Hal's ancestor yet.
