Futuristic 48-Core Intel Chip Could Reshape How Computers are Built (w/ Video)

Dec 03, 2009
Single Chip Cloud Computer has 48 Intel cores and runs at as low as 25 watts

(PhysOrg.com) -- Researchers from Intel Labs demonstrated an experimental, 48-core Intel processor, or "single-chip cloud computer," that rethinks many of the approaches used in today's designs for laptops, PCs and servers.

This futuristic chip boasts about 10 to 20 times the processing engines of today's most popular Intel Core-branded processors.

The long-term research goal is to add incredible scaling features to future computers that spur entirely new software applications and human-machine interfaces. The company plans to engage industry and academia next year by sharing 100 or more of these experimental chips for hands-on research in developing new software applications and programming models.


While Intel will integrate key features in a new line of Core-branded chips early next year and introduce six- and eight-core processors later in 2010, this prototype contains 48 fully programmable Intel processing cores, the most ever on a single silicon chip. It also includes a high-speed on-chip network for sharing information, along with newly invented power management techniques that allow all 48 cores to run extremely energy efficiently at as little as 25 watts, or at 125 watts when running at maximum performance (about as much as today's Intel processors, or two standard household light bulbs).

Intel plans to gain a better understanding of how to schedule and coordinate the many cores of this experimental chip for its future mainstream chips. For example, future laptops with processing capability of this magnitude could have "vision" in the same way a human can see objects and motion as they happen, and with high accuracy.

Imagine, for example, someday interacting with a computer for a virtual dance lesson or on-line shopping that uses a future laptop's 3-D camera and display to show you a "mirror" of yourself wearing the clothes you are interested in. Twirl and turn and watch how the fabric drapes and how the color complements your skin tone.

This kind of interaction could eliminate the need for keyboards, remote controls or joysticks for gaming. Some researchers believe computers may even be able to read brain waves, so simply thinking about a command, such as dictating words, would make it happen without speaking.

Intel Labs has nicknamed this test chip a "single-chip cloud computer" because it resembles the organization of datacenters used to create a "cloud" of computing resources over the Internet, a notion of delivering such services as online banking, social networking and online stores to millions of users.

Cloud datacenters comprise tens to thousands of computers connected by a physically cabled network, distributing large tasks and massive datasets in parallel. Intel's new experimental research chip uses a similar approach, yet all the computers and networks are integrated on a single piece of Intel 45nm, high-k metal-gate silicon about the size of a postage stamp, dramatically reducing the number of physical computers needed to create a cloud datacenter.

"With a chip like this, you could imagine a cloud datacenter of the future which will be an order of magnitude more energy efficient than what exists today, saving significant resources on space and power costs," said Justin Rattner, head of Intel Labs and Intel's Chief Technology Officer. "Over time, I expect these advanced concepts to find their way into mainstream devices, just as advanced automotive technology such as electronic engine control, air bags and anti-lock braking eventually found their way into all cars."

Cores Allow Software to Intelligently Direct Data for Efficiency

The concept chip features a high-speed network between cores to efficiently share information and data. This technique gives a significant improvement in communication performance and energy efficiency over today's datacenter model, since data packets only have to move millimeters on chip instead of tens of meters to another computer system.

Application software can use this network to quickly pass information directly between cooperating cores in a matter of a few microseconds, reducing the need to access data in slower off-chip system memory. Applications can also dynamically manage exactly which cores are to be used for a given task at a given time, matching the performance and energy needs to the demands of each.
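The idea of handing a result directly to a cooperating core, rather than staging it in shared off-chip memory, can be sketched in ordinary Python. Threads stand in for cores and queues for the on-chip network; the article does not describe the chip's actual message-passing API, so this is only an analogy:

```python
# Sketch: two cooperating "cores" exchange a partial result directly over
# a channel instead of parking it in a shared "off-chip" store.
import threading
import queue

link = queue.Queue()      # channel between the two cooperating "cores"
result = queue.Queue()    # channel back to the caller

def producer():
    partial = sum(range(1000))   # some partial computation (499500)
    link.put(partial)            # send straight to the consumer core

def consumer():
    partial = link.get()         # receive directly from the producer
    result.put(partial * 2)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
answer = result.get()
print(answer)  # 999000
```

On the real chip the payoff is latency: the message moves millimeters on silicon instead of crossing a memory bus or a datacenter network.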

Related tasks can be executed on nearby cores, even passing results directly from one to the next as in an assembly line to maximize overall performance. In addition, this software control is extended with the ability to manage voltage and clock speed. Cores can be turned on and off or change their performance levels, continuously adapting to use the minimum energy needed at a given moment.
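The assembly-line pattern described above, with each core transforming an item and passing the result straight to the next, looks like this in a thread-and-queue sketch (none of the names below come from Intel's software):

```python
# Sketch: a two-stage pipeline, each stage standing in for a nearby core.
import threading
import queue

STOP = object()  # sentinel to shut each stage down in order

def stage(fn, inbox, outbox):
    # Each "core" takes an item from its upstream neighbor and hands
    # its result directly to the next core in the line.
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)
            return
        outbox.put(fn(item))

q0, q1, q2 = queue.Queue(), queue.Queue(), queue.Queue()
threads = [
    threading.Thread(target=stage, args=(lambda x: x + 1, q0, q1)),
    threading.Thread(target=stage, args=(lambda x: x * x, q1, q2)),
]
for t in threads:
    t.start()

for n in range(5):          # feed work into the front of the line
    q0.put(n)
q0.put(STOP)

results = []
while True:                 # collect finished items at the back
    item = q2.get()
    if item is STOP:
        break
    results.append(item)
for t in threads:
    t.join()
print(results)  # [1, 4, 9, 16, 25]
```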

Overcoming Software Challenges

Programming processors with multiple cores is a well-known challenge for the industry as computer and software makers move toward many cores on a single silicon chip. The prototype allows the popular, efficient parallel programming approaches used in cloud datacenter software to be applied on the chip. Researchers from Intel, HP and Yahoo's Open Cirrus collaboration have already begun porting cloud applications to this 48-core IA chip using Hadoop, a Java software framework supporting data-intensive distributed applications, as Rattner demonstrated today.
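Hadoop's programming model is MapReduce. The following is not Hadoop's Java API, just a toy single-process sketch of the map/shuffle/reduce shape that such frameworks distribute across many machines (or, on this chip, many cores):

```python
# Sketch of the MapReduce pattern: word counting across "documents".
from collections import defaultdict

def map_phase(doc):
    # Emit a (key, value) pair for every word.
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    # Group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Combine each key's values into a final result.
    return {key: sum(values) for key, values in groups.items()}

docs = ["the cloud", "the chip the cloud"]
pairs = [p for doc in docs for p in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'the': 3, 'cloud': 2, 'chip': 1}
```

The appeal for a many-core chip is that the map calls are independent, so each could run on its own core.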

Intel plans to build 100 or more experimental chips for use by dozens of industrial and academic research collaborators around the world with the goal of developing new software applications and programming models for future many-core processors.

"Microsoft is partnering with Intel to explore new hardware and software architectures supporting next-generation client plus cloud applications," said Dan Reed, Microsoft's corporate vice president of Extreme Computing. "Our early research with the single-chip cloud computer prototype has already identified many opportunities in intelligent resource management, system software design, programming models and tools, and future application scenarios."

This milestone represents the latest achievement from Intel's Tera-scale Computing Research Program, aimed at breaking barriers to scaling future chips to tens or hundreds of cores. It was co-created by Intel Labs at its Bangalore (India), Braunschweig (Germany) and Hillsboro, Ore. (U.S.) research centers. Details on the chip's architecture and circuits are scheduled to be published in a paper at the International Solid-State Circuits Conference in February.

Provided by Intel



User comments : 19


SincerelyTwo
4.3 / 5 (6) Dec 03, 2009
That CPU might do the trick. I was hoping that eventually programmers could develop software without actually needing to be concerned about threading. At most we could give some 'direction' to the CPU through meta-instructions in special situations to do explicit threading, but in general threading should be implicit and managed by the CPU, which would be able to determine how best to deal with certain types of information.

A mix of implicit/explicit threading through meta-instructions compiled in to the application, that would be such a fantastic approach and alleviate a lot of burden from programmers.

And when I talk about threading I mean both across all cores and on any single core, the CPU should decide when to use extra or share cores unless given explicit instructions. That way software will always run utilizing far more of the CPU's potential.
jselin
2 / 5 (2) Dec 03, 2009
What SincerelyTwo said ^^^

I'm amazed it's left to the programmers to make use of the technology the way it is now. Software developers have to decide that such additional programming work is value added. This results in pricier software if you want to take advantage of multithreading :(

If it doesn't take full advantage of the cores the way tailored programming might then there should be a way for the programmer to choose to override.
PinkElephant
5 / 5 (1) Dec 03, 2009
SincerelyTwo/jselin, what you propose is impossible. Whether a program can be efficiently threaded depends entirely on the nature of the program's workflow and algorithms. Some tasks are just inherently serial (i.e. each step vitally depends on the output of a preceding step), and cannot be parallelized at all; other tasks are "embarrassingly parallel", and can be easily farmed out to a set of worker threads. But many real-life tasks fall into the serial, rather than the parallel, category. Thus, as a rule, most programs will not gain performance on massively parallel systems. Still, such systems would typically enable you to simultaneously run many non-threaded programs with no loss of performance. Of course, some useful tasks are parallelizable: e.g. tile-based rendering, image processing, breadth-first searches (like in chess games, or similar AI). So many-core systems aren't totally useless; but such tasks must still be explicitly implemented with a mind toward parallelism.
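The serial-versus-parallel distinction drawn in the comment above can be illustrated with two small loops (a generic sketch, not tied to the SCC):

```python
from concurrent.futures import ThreadPoolExecutor

# Embarrassingly parallel: each element is independent, so the work can
# be farmed out to as many cores as are available.
def parallel_square(xs):
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda x: x * x, xs))

# Inherently serial: step i needs the output of step i-1, so extra cores
# cannot speed it up no matter how many there are.
def running_total(xs):
    total, out = 0, []
    for x in xs:
        total += x          # depends on the previous iteration's result
        out.append(total)
    return out

print(parallel_square([1, 2, 3, 4]))  # [1, 4, 9, 16]
print(running_total([1, 2, 3, 4]))    # [1, 3, 6, 10]
```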
Zenmaster
2.5 / 5 (2) Dec 04, 2009
"Application software can use this network to quickly pass information directly between cooperating cores in a matter of a few microseconds, reducing the need to access data in slower off-chip system memory."

Can't be right. Since when does even off-chip memory access take a few microseconds...
Feldagast
not rated yet Dec 04, 2009
Where is my talking computer?? Why read brain waves when our voices would do as much? I'm talking Star Trek and the Enterprise's main computer. Not the one from the '60s either.
Buyck
not rated yet Dec 04, 2009
These 48 cores are realized on a 45nm chip (not for consumers). What if in 2015 the 22nm or even the 15nm chips are for sale to every consumer?! We will likely see a significant increase in core counts on commercially available chips in the coming years. I think by 2020-2025 a thousand cores or maybe even more.
plasticpower
not rated yet Dec 04, 2009
"Can't be right. Since when does even off-chip memory access take a few microseconds..."

You can access registers and different levels of cache an order of magnitude faster than RAM (main memory) and thousands of times faster than non-volatile (dear god) memory such as hard drives. When you're passing information between machines, you're running it over layers and layers of devices: from the registers in the CPU through main memory via some sort of bus to a device that's sitting on the bus and is probably sharing said bus with other similar devices, which then has to send information over some kind of networked connection, and then the process is repeated in reverse on the receiving end.

Instead what Intel is doing is passing information between registers of multiple cores, the same thing that makes RISC machines so great, you aren't making all these accesses to main memory and beyond. This article doesn't really explain just how hard it would be to implement something like this.
Temple
not rated yet Dec 04, 2009
Apple has already released its first step toward making it easy and intuitive for developers to make use of multiple cores/processors.

Whatever your particular OS religion, it's worth checking out the open-source (open-ish) Grand Central Dispatch.

There's some info here (follow the link in the article for an exhaustive review): http://arstechnic...inux.ars
SincerelyTwo
not rated yet Dec 04, 2009
PinkElephant,

I realize that, which is why I dedicated a significant part of my statements to describing an implicit *and* explicit multi-threading approach. For example, it could be possible for multiple applications to be working on the 'current' core, in which case the CPU will decide to offload the work to another core without the programmer actually having to command the CPU to do such a thing.

I'm a programmer with regular experience in this; it's obvious to me that there are ways systems can be architected to automate threading based on various conditions not directly tied to any application itself, but to the size of the job and the current load on a given core.

It's more about the distribution of work than it is segmenting of any given algorithm. Not 'automated parallelism', automated distribution* of work.

Explicit threading for parallelism.

Implicit threading for redistribution of work across multiple cores for general processing.
Nik_2213
not rated yet Dec 04, 2009
http://en.wikiped...ansputer

IIRC, the Occam language written to work with the Transputer enabled automatic task distribution into parallel processes, pipelines etc. One benefit, IIRC, was that different cores ran asynchronously, so you could add more / faster hardware as easy as USB...
maxcypher
Dec 04, 2009
This comment has been removed by a moderator.
simulus
not rated yet Dec 04, 2009
Increasing number of cores becomes meaningless at some point (so don't expect billion-core chips for example). More of something being equivalent to better is a meaningless addiction.

Between one and one billion cores, what is the optimum and do we have to design software around it? How long would that take?
vantomic
5 / 5 (1) Dec 04, 2009
"This kind of interaction could eliminate the need of keyboards, remote controls or joysticks for gaming. Some researchers believe computers may even be able to read brain waves, so simply thinking about a command, such as dictating words, would happen without speaking."

I have a real doubt about how this could work. If I used this setup to type these two simple sentences it would come out as...

"I have a real doubt as to about no wait I have a real doubt as to how this could work man my grammar sucks where is my beer ok I don't think this could work what no i didn't take the damn garbage out what this stupid software ARAGGSDGHGH stop"

PinkElephant
not rated yet Dec 04, 2009
"... It's more about the distribution of work than it is segmenting of any given algorithm. Not 'automated parallelism', automated distribution* of work."

That's already being done by every single modern OS (beginning with the first flavors of UNIX). The word you're looking for is "scheduling". Of course, some OSes are better at scheduling and fair resource allocation than others. My experience with Windows, for example, leads me to conclude that Microsoft can't be bothered to implement a decent scheduler. Very frequently, the entire computer hangs up (even on dual-core systems!) when a single task somehow manages to "hog" the CPU. One such routine culprit is (of all things) Outlook...
tkjtkj
not rated yet Dec 05, 2009
25watts?? so, how come the video shows the test-bed to include a massive copper multi-pipe heatsink???
EvgenijM
not rated yet Dec 05, 2009
They are a little bit late; Nvidia Fermi is coming out soon. Besides, Nvidia GPGPU already works in software for popular OSes, unlike this architecture.
Velanarris
not rated yet Dec 06, 2009
Cores haven't needed a bus to talk to main memory since Nehalem came out.

This isn't about memory registers; it's about more processing power at fixed clock speeds. More cores means more computational ability, meaning less time even talking to memory. This is about streamlining the process by overpowering the application. It will be interesting to see if this is the correct methodology going forward.
psommerfeld
not rated yet Dec 07, 2009
"25watts?? so, how come the video shows the test-bed to include a massive copper multi-pipe heatsink???"

I'm wondering if they mean 25 - 125W PER CORE?! Not so power-efficient if this is the case! If these are full-blown cores then I would expect them to be at least 25W/core (or 1200 W per package!)
Velanarris
not rated yet Dec 07, 2009
25W over average die resistance figures is a lot of heat in a space that small.

PinkElephant
5 / 5 (1) Dec 08, 2009
I'm guessing they mean 125W for the entire 48-core chip. That would mean about 2.6 W per core when active (I'm also guessing the 25W figure is when the chip is idling). 2.6W is within the normal operating range for small chips such as those used in netbooks. Intel's own Atom line fits within such a power envelope, so perhaps those 48 cores are each derived from the Atom design. See here for more info about Atom:

http://en.wikiped...itecture
