New software-simulation system promises much more accurate evaluation of multicore-chip designs

March 9, 2012 by Larry Hardesty, Massachusetts Institute of Technology

For the last decade or so, computer chip manufacturers have been increasing the speed of their chips by giving them extra processing units, or “cores.” Most major manufacturers now offer chips with eight, 10 or even 12 cores.

But if chips are to continue improving at the rate we’ve grown accustomed to — doubling in power roughly every 18 months — they’ll soon require hundreds and even thousands of cores. Academic and industry researchers are full of ideas for improving the performance of multicore chips, but there’s always the possibility that an approach that seems to work well with 24 or 48 cores may introduce catastrophic problems when the count gets higher. No chip manufacturer will take a chance on an innovative chip design without overwhelming evidence that it works as advertised.

As a research tool, an MIT group that specializes in computer architecture has developed a software simulator, dubbed Hornet, that models the performance of multicore chips much more accurately than its predecessors do. At the Fifth International Symposium on Networks-on-Chip in 2011, the group took the best-paper prize for work in which they used the simulator to analyze a promising and much-studied multicore-computing technique, finding a fatal flaw that other simulations had missed. And in a forthcoming issue of IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, the researchers present a new version of the simulator that factors in power consumption as well as patterns of communication between cores, the processing times of individual tasks, and memory-access patterns.

The flow of data through a chip with hundreds of cores is monstrously complex, and previous software simulators have sacrificed some accuracy for the sake of efficiency. For more accurate simulations, researchers have typically used hardware models — programmable chips that can be reconfigured to mimic the behavior of multicore chips. According to Myong Hyon Cho, a PhD student in the Department of Electrical Engineering and Computer Science (EECS) and one of Hornet’s developers, Hornet is intended to complement, not compete with, these other two approaches. “We think that Hornet sits in the sweet spot between them,” Cho says.

The various tasks performed by a chip’s many components are synchronized by a master clock; during each “clock cycle,” each component performs one task. Hornet is significantly slower than its predecessors, but it can provide a “cycle-accurate” simulation of a chip with 1,000 cores. “‘Cycle-accurate’ means the results are precise to the level of a single cycle,” Cho explains. “For example, [Hornet has] the ability to say, ‘This task takes 1,223,392 cycles to finish.’”
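To make the idea of counting individual clock cycles concrete, here is a minimal sketch of a cycle-driven simulation loop. It is not Hornet's code or design. The ring topology, the one-packet-per-node-per-cycle rule, and the name `simulate_ring` are all assumptions made for illustration.

```python
from collections import deque


def simulate_ring(num_nodes, packets, max_cycles=10_000):
    """Toy cycle-driven model of a one-way ring of nodes (hypothetical, not Hornet).

    packets is a list of (source, destination) pairs with source != destination.
    Each clock cycle, every node forwards at most one queued packet one hop.
    Returns the cycle in which each packet arrived (None if it never did).
    """
    queues = [deque() for _ in range(num_nodes)]
    for pid, (src, dst) in enumerate(packets):
        queues[src].append((pid, dst))

    delivered = {}
    cycle = 0
    while len(delivered) < len(packets) and cycle < max_cycles:
        cycle += 1
        # Phase 1: every node picks the packet it sends during this cycle.
        moves = [(node, q.popleft()) for node, q in enumerate(queues) if q]
        # Phase 2: apply all moves together, so no packet travels two hops per cycle.
        for node, (pid, dst) in moves:
            nxt = (node + 1) % num_nodes
            if nxt == dst:
                delivered[pid] = cycle
            else:
                queues[nxt].append((pid, dst))
    return [delivered.get(pid) for pid in range(len(packets))]


# Two packets contend for node 0's outgoing link, so the second leaves a cycle later.
print(simulate_ring(4, [(0, 2), (0, 2), (1, 2)]))  # [2, 3, 1]
```

Because the loop advances one cycle at a time, it can report exact per-packet latencies, including the extra cycle caused by contention. A faster, less detailed model might instead estimate latency from hop count alone and miss that delay.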

Existing simulators are good at evaluating chips’ general performance, but they can miss problems that arise only in rare, pathological cases. Hornet is much more likely to ferret those out, as it did in the case of the research presented at the Networks-on-Chip symposium. There, Cho, his adviser and EECS professor Srini Devadas, and their colleagues analyzed a promising multicore-computing technique in which the chip passes computational tasks to the cores storing the pertinent data rather than passing data to the cores performing the pertinent tasks. Hornet identified the risk of a problem called deadlock, which other simulators had missed. (Deadlock is a situation in which some number of cores are waiting for resources, whether communications channels or memory locations, that are in use by other cores. No core will abandon the resource it has until it’s granted access to the one it needs, so clock cycles tick by endlessly without any of the cores doing anything.)
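The circular waiting that defines deadlock can be illustrated with a small sketch. This is a generic wait-for check, not the analysis the researchers ran. The function name `find_deadlock` and the simplification that each core waits on at most one other core are assumptions.

```python
def find_deadlock(waits_for):
    """Return a list of cores that wait on one another in a cycle, or None.

    waits_for maps each core to the core currently holding the resource it
    needs, or to None if the core is not waiting on anything (hypothetical model).
    """
    for start in waits_for:
        seen = []
        core = start
        # Follow the chain of "who am I waiting for" until it ends or repeats.
        while core is not None and core not in seen:
            seen.append(core)
            core = waits_for.get(core)
        if core is not None:
            # The chain came back to a core already visited: a circular wait.
            return seen[seen.index(core):]
    return None


# Cores 0, 1 and 2 each hold a resource the next one needs; core 3 is idle.
print(find_deadlock({0: 1, 1: 2, 2: 0, 3: None}))  # [0, 1, 2]
print(find_deadlock({0: 1, 1: None}))              # None
```

In the first example no core in the cycle can make progress, which matches the description above: every clock cycle passes with each core still waiting.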

In addition to identifying the risk of deadlock, the researchers also proposed a way to avoid it — and demonstrated that their proposal worked with another Hornet simulation. That illustrates Hornet’s advantage over hardware systems: the ease with which it can be reconfigured to test out alternative design proposals.

Building simulations that will run on hardware “is more tricky than just writing software,” says Edward Suh, an assistant professor of electrical and computer engineering at Cornell University, whose group used an early version of Hornet that just modeled communication between cores. “It’s hard to say whether it’s inherently more difficult to write, but at least right now, there’s less of an infrastructure, and students do not know those languages as well as they do regular programming language. So as of right now, it’s more work.” Hornet, Suh says, could have advantages in situations where “you want to test out several ideas quickly, with good accuracy.”

Suh points out, however, that because Hornet is slower than either hardware simulations or less-accurate software simulations, “you tend to simulate a short period of the application rather than trying to run the whole application.” But, he adds, “That’s definitely useful if you want to know if there are some abnormal behaviors.” And furthermore, “there are techniques people use, like statistical sampling, or things like that, to say, ‘these are representative portions of the application.’”

1 comment

1 / 5 (1) Mar 09, 2012
Things like gaming software and even design software are still limited by highly linear and procedural behaviors of inputs and outputs. Therefore, the clock speed of the processor is ultimately the most important statistic.

If you double or quadruple the number of cores, it will barely affect how smoothly a computer runs most consumer software: games, CAD, paint, other design tools, movies, browsers, or even compiling software (for small projects anyway).

In order to maximize the benefits of multiple cores, you'd need a way to program these things into non-linear algorithms, and fact is many of them do not have any easy way to be broken down into simpler components.

I suppose in games you could eventually have one core control each and every unit in a game, in a 1000 core processor, while other cores work on the a.i. at the "player" level, and others work on the game engine, mini-map, and chat threads, but even then you'd hit a limit of effectiveness determined by clock speed.
