Multicore may not be so scary: Linux will keep up with addition of more processing units

September 30, 2010 by Larry Hardesty, MIT News
Graphic: Christine Daniloff

Computer chips have stopped getting faster. To keep improving chips’ performance, manufacturers have turned to adding more "cores," or processing units, to each chip. In principle, a chip with two cores can run twice as fast as a chip with only one core, a chip with four cores four times as fast, and so on.

But breaking up computational tasks so that they run efficiently on multiple cores is a difficult task, and it only gets harder as the number of cores increases. So a number of ambitious research projects, including one at MIT, are reinventing computing, from chip architecture all the way up to the design of programming languages, to ensure that adding cores continues to translate to improved performance.

To managers of large office networks or Internet server farms, this is a daunting prospect. Is the computing landscape about to change completely? Will information-technology managers have to relearn their trade from scratch?

Probably not, say a group of MIT researchers. In a paper they’re presenting on Oct. 4 at the USENIX Symposium on Operating Systems Design and Implementation in Toronto, the researchers argue that, for at least the next few years, the Linux operating system should be able to keep pace with changes in chip design.

Linux is an open-source operating system, meaning that any programmer who chooses to may modify its code, adding new features or streamlining existing ones. By the same token, however, any public distribution of those modifications must be free of charge, which makes Linux popular among managers of large data centers. Programmers around the world have contributed thousands of hours of their time to the continuing improvement of Linux.

Clogged counter

To get a sense of how well Linux will run on the chips of the future, the MIT researchers built a system in which eight six-core chips simulated the performance of a 48-core chip. Then they tested a battery of applications that placed heavy demands on the operating system, activating the 48 cores one by one and observing the consequences.

At some point, the addition of extra cores began slowing the system down rather than speeding it up. But that performance drag had a surprisingly simple explanation. In a multicore system, multiple cores often perform calculations that involve the same chunk of data. As long as the data is still required by some core, it shouldn’t be deleted from memory. So when a core begins to work on the data, it ratchets up a counter stored at a central location, and when it finishes its task, it ratchets the counter down. The counter thus keeps a running tally of the total number of cores using the data. When the tally gets to zero, the operating system knows that it can erase the data, freeing up memory for other procedures.
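The counting scheme described above can be illustrated with a minimal Python sketch. This is not Linux kernel code: the class name `SharedRefCount`, its methods, and the `freed` flag are hypothetical, and a lock stands in for whatever mechanism keeps the central counter consistent when several cores update it at once.

```python
import threading


class SharedRefCount:
    """One central counter that every user of the data must update."""

    def __init__(self):
        self.count = 0
        self.freed = False
        # Every acquire and release goes through this single lock, so all
        # users of the data contend for the same location.
        self._lock = threading.Lock()

    def acquire(self):
        """A core starts working on the data: ratchet the counter up."""
        with self._lock:
            self.count += 1

    def release(self):
        """A core is finished: ratchet the counter down."""
        with self._lock:
            self.count -= 1
            if self.count == 0:
                # Stand-in for the operating system erasing the data.
                self.freed = True
```

In this sketch every call, from any thread, touches the same counter, which is the kind of shared hot spot the researchers observed as cores were added.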

As the number of cores increases, however, tasks that depend on the same data get split up into smaller and smaller chunks. The MIT researchers found that the separate cores were spending so much time ratcheting the counter up and down that they weren’t getting nearly enough work done. Slightly rewriting the Linux code so that each core kept a local count, which was only occasionally synchronized with those of the other cores, greatly improved the system’s overall performance.
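The idea of per-core counts that are only occasionally synchronized can be sketched roughly as follows. This is an illustration under stated assumptions, not the researchers' actual patch: the class `PerCoreRefCount`, the `threshold` parameter, and the `central_updates` tally are all hypothetical, and the code is a single-threaded model of the bookkeeping rather than real concurrent code.

```python
class PerCoreRefCount:
    """Per-core local counts, folded into a central count only occasionally."""

    def __init__(self, num_cores, threshold=4):
        self.central = 0                # shared count (expensive to touch)
        self.local = [0] * num_cores    # one private delta per core
        self.threshold = threshold      # hypothetical flush trigger
        self.central_updates = 0        # how often the shared count was touched

    def _flush(self, core):
        # Fold this core's private delta into the shared count.
        self.central += self.local[core]
        self.local[core] = 0
        self.central_updates += 1

    def _adjust(self, core, delta):
        # The common case touches only the core's own slot.
        self.local[core] += delta
        if abs(self.local[core]) >= self.threshold:
            self._flush(core)

    def acquire(self, core):
        self._adjust(core, +1)

    def release(self, core):
        self._adjust(core, -1)

    def reconcile(self):
        """Synchronize every core's delta and return the exact total."""
        for core in range(len(self.local)):
            if self.local[core] != 0:
                self._flush(core)
        return self.central
```

The trade-off in this sketch is that `central` is only approximate between synchronizations, so deciding whether the data can be erased requires calling `reconcile()` first. In exchange, most increments and decrements never touch the shared count at all.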

On the job

“That basically tells you how scalable things already are,” says Frans Kaashoek, one of three MIT computer-science professors who, along with four students, conducted the research. “The fact that that is the major scalability problem suggests that a lot of things already have been fixed. You could imagine much more important things to be problems, and they’re not. You’re down to simple reference counts.”

Nor, Kaashoek says, do Linux contributors need a trio of MIT professors looking over their shoulders. “Our claim is not that our fixes are the ones that are going to make Linux more scalable,” Kaashoek says. “The community is completely capable of solving these problems, and they will solve them. That’s our hypothesis. In fact, we don’t have to do the work. They’ll do it.”

Kaashoek does say, however, that while the problem with the reference counter was easy to repair, it was not easy to identify. “There’s a bunch of interesting research to be done on building better tools to help programmers pinpoint where the problem is,” he says. “We have written a lot of little tools to help us figure out what’s going on, but we’d like to make that process much more automated.”

"The big question in the community is, as the number of cores on a processor goes up, will we have to completely rethink how we build operating systems," says Remzi Arpaci-Dusseau, a professor of computer science at the University of Wisconsin. "This paper is one of the first to systematically address that question."

Someday, Arpaci-Dusseau says, if the number of cores on a chip gets "significantly beyond 48," new architectures and operating systems may become necessary. But "for the next five, eight years," he says, "I think this paper answers pretty definitively that we probably don't have to completely rethink things, which is great, because it really helps direct resources and research toward more relevant problems."

Arpaci-Dusseau points out, too, that the MIT researchers "showed that finding the problems is the hard part. What that hints at for the rest of the community is that building techniques — whether they're software techniques or hardware techniques or both — that help to identify these problems is going to be a rich new area as we go off into this multicore world."

This story is republished courtesy of MIT News, a popular site that covers news about MIT research, innovation and teaching.

Sep 30, 2010
The same slogan has been repeated for more than 20 years.
It is very simple.
Either problems are divisible into nearly independent tasks (e.g. block-based video encoding or decoding, projection of millions of triangles onto a view plane in GPUs, ...).
Or, when tasks depend heavily on each other, communication is the bottleneck. The only solution then is to make the communication between the tasks, or hierarchy of tasks, as fast as possible and of course to avoid transmitting redundant data. That's why on-chip short-distance optical transmission at 100 Gb or more will become important. The optimum would be to use quantum entanglement for zero delay communication.
Sep 30, 2010
"The optimum would be to use quantum entanglement for zero delay communication"

Quantum entanglement is not instantaneous. It's very, very, very fast, but there is still a time delay. Also, at this point we still don't have a means of actually transmitting information via quantum entanglement. I like to believe that one day we'll figure out a way to use QE for communication, but at this stage it's still impossible.
