Skyscraper-style chip design boosts performance 1,000-fold

December 10, 2015 by Ramin Skibba
A multi-campus team led by Stanford engineers Subhasish Mitra and H.-S. Philip Wong has developed a revolutionary high-rise architecture for computing.

For decades, engineers have designed computer systems with processors and memory chips laid out like single-story structures in a suburb. Wires connect these chips like streets, carrying digital traffic between the processors that compute data and the memory chips that store it.

But suburban-style layouts create long commutes and regular traffic jams in electronic circuits, wasting time and energy.

That is why researchers from three other universities are working with Stanford engineers, including Associate Professor Subhasish Mitra and Professor H.-S. Philip Wong, to create a revolutionary new high-rise architecture for computing.

In Rebooting Computing, a special issue of the IEEE Computer journal, the team describes its new approach as Nano-Engineered Computing Systems Technology, or N3XT.

N3XT will break data bottlenecks by integrating processors and memory like floors in a skyscraper and by connecting these components with millions of "vias," which play the role of tiny electronic elevators. The N3XT high-rise approach will move more data, much faster, using far less energy, than would be possible using low-rise circuits.

"We have assembled a group of top thinkers and advanced technologies to create a platform that can meet the computing demands of the future," Mitra said.

Shifting electronics from a low-rise to a high-rise architecture will demand huge investments from industry – and the promise of big payoffs for making the switch.

"When you combine higher speed with lower energy use, N3XT systems outperform conventional approaches by a factor of a thousand," Wong said.

To enable these advances, the N3XT team uses new nano-materials that allow its designs to do what can't be done with silicon – build high-rise computer circuits.

"With N3XT the whole is indeed greater than the sum of its parts," said co-author and Stanford electrical engineering Professor Kunle Olukotun, who is helping optimize how software and hardware interact.

New transistor and memory materials

Engineers have previously tried to stack silicon chips but with limited success, said Mohamed M. Sabry Aly, a postdoctoral research fellow at Stanford and first author of the paper.

Fabricating a silicon chip requires temperatures close to 1,800 degrees Fahrenheit, making it extremely challenging to build a silicon chip atop another without damaging the first layer. The current approach to what are called 3-D, or stacked, chips is to construct two silicon chips separately, then stack them and connect them with a few thousand wires.

But conventional 3-D chips are still prone to traffic jams, and it takes a lot of energy to push data through the relatively few connecting wires.

The N3XT team is taking a radically different approach: building layers of processors and memory directly atop one another, connected by millions of electronic elevators that can move more data over shorter distances than traditional wire, using less energy. The N3XT approach merges computation and memory storage into a single electronic super-device.

The key is the use of non-silicon materials that can be fabricated at much lower temperatures than silicon, so that processors can be built on top of memory without the new layer damaging the layer below.

N3XT high-rise chips are based on carbon nanotube (CNT) transistors. Transistors are the fundamental units of a computer processor, the tiny on-off switches that create digital zeroes and ones. CNT transistors are faster and more energy-efficient than their silicon counterparts. Moreover, in the N3XT architecture, they can be fabricated and placed above and below other layers of memory.

Among the N3XT scholars working at this nexus of computation and memory are Christos Kozyrakis and Eric Pop of Stanford, Jeffrey Bokor and Jan Rabaey of the University of California, Berkeley, Igor Markov of the University of Michigan, and Franz Franchetti and Larry Pileggi of Carnegie Mellon University.

Team members also envision using data storage technologies that rely on materials other than silicon, which can be manufactured on top of CNTs, using low-temperature fabrication processes.

One such data storage technology is called resistive random-access memory, or RRAM. A high-resistance state impedes the flow of electrons, representing a zero, while a low-resistance state lets electrons flow, representing a one. Tiny jolts of electricity switch RRAM memory cells between these two digital states. N3XT team members are also experimenting with a variety of nano-scale magnetic materials to store digital ones and zeroes.
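The two-state behavior described above can be sketched as a toy model (purely illustrative; real RRAM cells switch via conductive-filament physics, and all thresholds and values below are hypothetical):

```python
# Toy model of a single RRAM cell: two resistance states encode one bit.
# Illustrative only -- real cells involve filament formation and rupture.

HIGH_RESISTANCE = 1_000_000  # ohms: impedes current, reads as 0
LOW_RESISTANCE = 1_000       # ohms: conducts freely, reads as 1

class RRAMCell:
    def __init__(self):
        self.resistance = HIGH_RESISTANCE  # start in the "0" state

    def write(self, bit):
        # A voltage pulse switches the cell between resistance states.
        self.resistance = LOW_RESISTANCE if bit else HIGH_RESISTANCE

    def read(self):
        # Apply a small read voltage and classify by measured current.
        read_voltage = 0.2  # volts
        current = read_voltage / self.resistance
        return 1 if current > 1e-5 else 0

cell = RRAMCell()
cell.write(1)
print(cell.read())  # -> 1
cell.write(0)
print(cell.read())  # -> 0
```

Because the stored bit is a physical resistance rather than a trapped charge, the state survives power-down, which is what makes RRAM non-volatile.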

Just as skyscrapers have ventilation systems, N3XT high-rise chip designs incorporate thermal cooling layers. This work, led by Stanford mechanical engineers Kenneth Goodson and Mehdi Asheghi, ensures that the heat rising from the stacked layers of electronics does not degrade overall system performance.

Proof of principle

Mitra and Wong have already demonstrated a working prototype of a high-rise chip. At the International Electron Devices Meeting in December 2014 they unveiled a four-layered chip made up of two layers of RRAM sandwiched between two layers of CNTs.

In their N3XT paper they ran simulations showing how their high-rise approach was a thousand times more efficient in carrying out many important and highly demanding industrial software applications.

Stanford computer scientist and N3XT co-author Chris Ré, who recently won a "genius grant" from the John D. and Catherine T. MacArthur Foundation, said he joined the N3XT collaboration to make sure that computing doesn't enter what some call a "dark data" era.

"There are huge volumes of data that sit within our reach and are relevant to some of society's most pressing problems from health care to climate change, but we lack the computational horsepower to bring this data to light and use it," Ré said. "As we all hope in the N3XT project, we may have to boost horsepower to solve some of these pressing challenges."


16 comments


antialias_physorg
5 / 5 (4) Dec 10, 2015
At the International Electron Devices Meeting in December 2014 they unveiled a four-layered chip made up of two layers of RRAM memory sandwiched between two layers of CNTs.

Ah, OK. From the headline I thought they were stacking thousands of layers (which would run into some serious heat management issues)
ayesdi_fdesay
1 / 5 (1) Dec 10, 2015
Does anyone know what a realistic performance increase would look like with this technology and in what time frame? And are we talking about a "general computing" performance increase (with code that consists of lots of computational dependencies and branches) or more specialized computations? (e.g. highly parallel vector math for graphics computations)
SkyLy
not rated yet Dec 10, 2015
So, any bright mind to put this paper under the light of pragmatism ?
antialias_physorg
5 / 5 (4) Dec 10, 2015
Does anyone know what a realistic performance increase would look like with this technology and in what time frame?

Since this is a lab prototype you can expect about 5 years before this (if ever) hits the market. There's a lot of unknowns that need to be sorted first:
Shifting electronics from a low-rise to a high-rise architecture will demand huge investments from industry – and the promise of big payoffs for making the switch.

Now remember: Industry doesn't want the best product but the best profit. As long as the current set of processors/memory are cash cows there is no incentive to switch over. For anyone making those huge investments there's also the chance that something else will come along that will boost 'regular' chips in the meantime. Companies are loathe to engage in risks.

If it means that a lot of code has to be rewritten (even if it would boost speed) then that's another issue that would slow adoption.
axemaster
5 / 5 (4) Dec 10, 2015
Ah, OK. From the headline I thought they were stacking thousands of layers (which would run into some serious heat management issues)

Yeah, that was the first thing that crossed my mind as well. Their use of carbon based technology really boosts my hopes though, since both carbon nanotubes and graphene have been shown to have thermal conductivities comparable to or better than copper. Ideally they could build heatsinking right into the bulk of the chip...

I can't wait to see 3D architectures become the new standard. Perhaps they'll be able to increase chip frequencies as well!

EDIT: By the by, does anyone know what happened to spintronics? That would be great to have also, since spintronics would generate far less heat than conventional electronics. The two technologies would fit nicely together.
antialias_physorg
5 / 5 (6) Dec 10, 2015
Ideally they could build heatsinking right into the bulk of the chip...

Still, heat dissipation is a function of surface, while stacking increases used volume. If you increase the size with the third power but the cooling surface only with the second power you quickly run into problems. The more you go to a volumetric design the larger part of your volume you need to take up for heatpipes running through it (effectively negating the advantage after a certain size). The real advantage here comes from the low power usage which means less total heat per volume. But that still puts severe upper limits on high stacks of this type.

does anyone know what happened to spintronics?

It's still on the current ITRS roadmap (section "Beyond CMOS")..estimated roughly for 2030.
http://electroiq....tion.pdf

(If you still have a hard drive with spinning discs in your computer you're using a spintronic effect (read heads))
Whydening Gyre
5 / 5 (6) Dec 10, 2015
At the International Electron Devices Meeting in December 2014 they unveiled a four-layered chip made up of two layers of RRAM memory sandwiched between two layers of CNTs.

Ah, OK. From the headline I thought they were stacking thousands of layers (which would run into some serious heat management issues)

Ya. 4 layers is a far cry from a "skyscraper"...:-)
LariAnn
1 / 5 (1) Dec 10, 2015
But this radical new design is similar to the CPU design used in the Terminators. Skynet can't be far behind!
Captain Stumpy
3.6 / 5 (5) Dec 10, 2015

Still, heat dissipation is a function of surface, while stacking incraeses used volume. If you increase the size with the third power but the cooling surface only with the second power you quickly run into problems. The more you go to a volumetric design the larger part of your volume you need to take up for heatpipes runninng through it (effectively negating the advantage after a certain size). The real advantage here comes from the low power usage which means less total heat per volume
@antialias_physorg
@axemaster
If they can build a stackable 3D chip that reduces the energy used but increases efficiency, why not include a cooling mechanism as well for larger stacks?
kinda like a car engine? or even build in vented "floors" that will allow for circulation of air - like a server farm sets up its layout?
(just curious and would love to hear feedback on that issue)
antialias_physorg
5 / 5 (5) Dec 10, 2015
The thing is you have to get the heat out to some kind of cooling surface.

Consider it like this: a cubic unit of circuitry produces a certain amount of heat per second (let's just call this amount of energy X). You have to get n*X away from the entire stack per second (not just the unit that's producing it...because there's units right/left/up/down and back/front wanting to get rid of their heat, too).
So you build a channel with some cooling medium in. But that channel only can move heat out via its exit surface (proportional to its cross section times flow speed).

At this point we see the problem: To double the heat removed you have to double either the cross section or the flow speed. Flow speed increase is limited (due to e.g. cavitation) and doubling the cross section means you're more than doubling the volume cut out by the channel (as it is a column that passes through the entire structure). Volume of cooling mechanism increases by power 3/2 faster than cooling ability.
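The surface-versus-volume argument in the comment above can be made concrete with a quick calculation (an illustrative sketch, assuming a cubic stack of side n units, uniform heat generation, and cooling only through the six outer faces):

```python
# Heat generated grows with volume (n^3); heat removable through the
# outer surface grows with area (6*n^2). The ratio -- heat each unit
# of surface must carry -- therefore rises linearly with stack size,
# which is the crux of the cooling problem for tall 3-D stacks.

def heat_flux_ratio(n):
    volume = n ** 3          # circuitry units, each producing X heat/sec
    surface = 6 * n ** 2     # outer faces available for cooling
    return volume / surface  # heat per unit of cooling surface, in X

for n in (1, 2, 4, 8, 16):
    print(n, heat_flux_ratio(n))  # ratio grows as n/6
```

Doubling the stack's linear size doubles the thermal load on every square unit of cooling surface, so past some height either the power per unit must drop or internal cooling channels must eat into the usable volume.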

my2cts
3 / 5 (2) Dec 10, 2015
1800 °F ≈ 982 °C.
But how much is a cubic nano inch ?
24volts
5 / 5 (1) Dec 10, 2015
Getting working reliable vias has been a bigger problem than heat so far, from what I understand reading other articles about stacking chips on top of each other.
It would certainly allow computer systems to become much smaller even than some of the tiny ones out now. The average home desktop would end up about the same size as a 3.5" hard drive and that's only to allow enough usb ports or whatever we will be using by then.
Benni
1 / 5 (2) Dec 11, 2015
@antialias_physorg
@axemaster
If they can build a stackable 3D chip that reduces the energy used but increases efficiency, why not include a cooling mechanism as well for larger stacks?
.....because you do not understand why nonvolatile RRAM (ReRAM) cell architecture is being used on the memory chips.

December 2014 they unveiled a four-layered chip made up of two layers of RRAM memory sandwiched between two layers of CNTs


One such data storage technology is called resistive random-access memory, or RRAM. Resistance slows down electrons, creating a zero, while conductivity allows electrons to flow, creating a one. Tiny jolts of electricity switch RRAM memory cells between these two digital states.


Crossbar owns almost all the patents on RRAM (ReRAM). The advantage of RRAM is the energy consumption is 1/10 that of NAND or DRAM, thereby reducing by 90% the necessity of the heatsinking real estate on the chip, plus RRAM can be fabbed at 8 nm vs 15 nm for the others.

Eikka
not rated yet Dec 15, 2015
NAND or DRAM


You don't use DRAM inside a processor, you use SRAM. All CPU internal caches, registers and memories that the CPU actually operates on are made of SRAM.

DRAM requires intricate timing to read and write because it's basically like juggling balls in the air for bits - it needs dedicated controllers and caches which all take up chip area and power. That also makes it slow: if you miss the read/write window you have to wait a whole refresh cycle.

NAND flash has similar problems, and it wears out, and it's not fully random access but operates in large chunks, all of which makes it practically impossible to use for the internal functions of a CPU.

RRAM is being considered because it's almost as fast as SRAM so it can be used internally to the CPU circuit rather than as external mass memory. It's directly accessible to the CPU at the rate at which it operates, so it can potentially widen or eliminate the bottleneck between CPU and RAM.
Eikka
not rated yet Dec 15, 2015
Simply stacking in DRAM or any type of memory chips in layers between CPU chips wouldn't offer any sort of advantage because they would still be architecturally external memory to the CPU itself.

You would have to access the memory through a system of caches and controllers and a memory bus - which might be shorter and therefore faster - but ultimately it's still slowing you down.

If instead the memory layers were directly accessible to the CPU as if it was all cache memory, the size of problems the CPU could efficiently work on would increase tremendously. See:

http://stackoverf...n-memory

Core i7 Xeon 5500 Series Data Source Latency (approximate)
local L1 CACHE (2.1 - 1.2 ns)
local L2 CACHE (5.3 - 3.0 ns)
local L3 CACHE, line unshared (21.4 - 12.0 ns)
local L3 CACHE, modified in another core (40.2 - 22.5 ns)

local DRAM ~60 ns
remote DRAM ~100 ns
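The latency ladder quoted above feeds directly into the standard average-memory-access-time arithmetic; here is a back-of-envelope sketch (latencies are rounded from the figures in the comment, and the hit rates are hypothetical, chosen only to show the effect):

```python
# Average memory access time (AMAT): each cache level is consulted only
# on a miss at the level above, so even a modest miss rate lets slow
# DRAM dominate. This is why putting fast memory (SRAM, or RRAM in the
# N3XT stacking scheme) architecturally close to the CPU matters.

L1_NS, L2_NS, L3_NS, DRAM_NS = 1.5, 4.0, 16.0, 60.0  # approx. latencies

def amat(l1_hit, l2_hit, l3_hit):
    miss1 = 1.0 - l1_hit
    miss2 = 1.0 - l2_hit
    miss3 = 1.0 - l3_hit
    return (L1_NS
            + miss1 * (L2_NS
                       + miss2 * (L3_NS
                                  + miss3 * DRAM_NS)))

print(round(amat(0.95, 0.80, 0.50), 2))  # modest last-level hit rate
print(round(amat(0.95, 0.80, 0.99), 2))  # near-perfect last-level cache
```

Shrinking the effective DRAM penalty (or replacing the trips to external DRAM entirely, as directly addressable on-die memory would) is where the big wins hide, since the L1/L2 terms are already tiny.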
Eikka
not rated yet Dec 15, 2015
With RRAM memory, the switching of a cell between states has been shown to work at below 1 ns (>1 GHz), so the access time can be on the same order as the speed of the CPU itself.

That is simply not possible with DRAM or flash. This is a question between SRAM and RRAM, which both use almost no power when you're not doing anything with them, both are "non-volatile", and both are fast random access.

The difference with RRAM is that it doesn't lose data when completely powered down, and a crossbar memory is simpler and denser in construction than SRAM which makes it possible to cram in more memory.
