Research shows that it may be time to let software, rather than hardware, manage high-speed on-chip memory banks

September 13, 2013 by Larry Hardesty, Massachusetts Institute of Technology

In today's computers, moving data to and from main memory consumes so much time and energy that microprocessors have their own small, high-speed memory banks, known as "caches," which store frequently used data. Traditionally, managing the caches has required fairly simple algorithms that can be hard-wired into the chips.

In the 21st century, however, in order to meet consumers' expectations for steadily increasing computing power, chipmakers have had to begin equipping their chips with more and more cores, or processing units. And as cores proliferate, cache management becomes much more difficult.

Daniel Sanchez, an assistant professor in MIT's Department of Electrical Engineering and Computer Science, believes that it's time to turn cache management over to software. This week, at the International Conference on Parallel Architectures and Compilation Techniques, Sanchez and his student Nathan Beckmann presented a new system, dubbed Jigsaw, that monitors the computations being performed by a multicore chip and manages cache memory accordingly.

In experiments simulating the execution of hundreds of applications on 16- and 64-core chips, Sanchez and Beckmann found that Jigsaw could speed up execution by an average of 18 percent—with more than twofold improvements in some cases—while actually reducing energy consumption by as much as 72 percent. And Sanchez believes that the performance improvements offered by Jigsaw should only increase as the number of cores does.

Location, location, location

In most multicore chips, each core has several small, private caches. But there's also what's known as a last-level cache, which is shared by all the cores. "That cache is on the order of 40 to 60 percent of the chip," Sanchez says. "It is a significant fraction of the area because it's so crucial to performance. If we didn't have that cache, some applications would be an order of magnitude slower."

Physically, the last-level cache is broken into separate memory banks and distributed across the chip; for any given core, accessing the nearest bank takes less time and consumes less energy than accessing those farther away. But because the last-level cache is shared by all the cores, most chips assign data to the banks randomly.

Jigsaw, by contrast, monitors which cores are accessing which data most frequently and, on the fly, calculates the most efficient assignment of data to cache banks. For instance, data being used exclusively by a single core is stored near that core, whereas data that all the cores are accessing with equal frequency is stored near the center of the chip, minimizing the average distance it has to travel.
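The placement idea can be illustrated with a small sketch. This is not Jigsaw's actual algorithm; the function name, the coordinate layout, and the use of straight-line distance as a proxy for on-chip access latency are all illustrative assumptions.

```python
import math

def best_bank(access_counts, core_coords, bank_coords):
    """Pick the cache bank minimizing total access-weighted distance.

    access_counts: {core_id: number of accesses to this data}
    core_coords:   {core_id: (x, y) position of the core on the chip}
    bank_coords:   {bank_id: (x, y) position of the bank on the chip}
    """
    def cost(bank):
        bx, by = bank_coords[bank]
        # Euclidean distance stands in for wire delay and energy.
        return sum(n * math.hypot(core_coords[c][0] - bx,
                                  core_coords[c][1] - by)
                   for c, n in access_counts.items())

    return min(bank_coords, key=cost)

# Four corner banks plus one central bank on a hypothetical 4-core chip.
banks = {0: (0, 0), 1: (3, 0), 2: (0, 3), 3: (3, 3), 4: (1.5, 1.5)}
cores = {0: (0, 0), 1: (3, 0), 2: (0, 3), 3: (3, 3)}

# Data used only by core 0 lands in the bank next to core 0.
print(best_bank({0: 100}, cores, banks))                # 0
# Data shared equally by all cores lands in the central bank.
print(best_bank({c: 25 for c in cores}, cores, banks))  # 4
```

The two calls at the end mirror the article's examples: exclusive data sits beside its core, while uniformly shared data gravitates toward the center of the chip.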

Jigsaw also varies the amount of cache space allocated to each type of data, depending on how it's accessed. Data that is reused frequently receives more space than data that is accessed infrequently or only once.

In principle, optimizing cache space allocations requires evaluating how the chip as a whole will perform given every possible allocation of cache space to all the computations being performed on all the cores. That calculation would be prohibitively time-consuming, but by ignoring some particularly convoluted scenarios that are extremely unlikely to arise in practice, Sanchez and Beckmann were able to develop an approximate optimization algorithm that runs efficiently even as the number of cores and the different types of data increase dramatically.
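The intuition behind avoiding that combinatorial search can be sketched with a simple greedy heuristic. This is only an illustration of utility-based allocation under assumed miss curves, not the approximate algorithm from the paper: capacity goes, one unit at a time, to whichever computation gains the most from it.

```python
def partition(miss_curves, total_units):
    """Greedily divide cache capacity among applications.

    miss_curves[i][s] = misses app i incurs with s units of cache.
    Each round, the next unit goes to the app whose miss count
    drops the most, instead of enumerating every possible split.
    """
    alloc = [0] * len(miss_curves)
    for _ in range(total_units):
        def gain(i):
            curve = miss_curves[i]
            if alloc[i] + 1 >= len(curve):
                return 0  # curve exhausted; extra cache gives no benefit
            return curve[alloc[i]] - curve[alloc[i] + 1]
        winner = max(range(len(miss_curves)), key=gain)
        alloc[winner] += 1
    return alloc

# Hypothetical miss curves: app 0 benefits steeply from more cache,
# while app 1 streams through data and barely benefits at all.
curves = [
    [100, 60, 30, 15, 10, 8, 7, 6, 5],      # cache-friendly
    [50, 48, 47, 46, 45, 45, 45, 45, 45],   # streaming-like
]
print(partition(curves, 8))  # [7, 1]
```

The cache-friendly application ends up with nearly all the capacity, matching the article's point that frequently reused data deserves more space than data touched only once.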

Quick study

Of course, since the optimization is based on Jigsaw's observations of the chip's activity, "it's the optimal thing to do assuming that the programs will behave in the next 20 milliseconds the way they did in the last 20 milliseconds," Sanchez says. "But there's very strong experimental evidence that programs typically have stable phases of hundreds of milliseconds, or even seconds."

Sanchez also points out that the new paper represents simply his group's "first cut" at turning cache management over to software. Going forward, they will be investigating, among other things, the co-design of hardware and software to improve efficiency even further and the possibility of allowing programmers themselves to classify data according to their memory-access patterns, so that Jigsaw doesn't have to rely entirely on observation to evaluate memory allocation.

"More and more of our computation is happening in data centers," says Jason Mars, an assistant professor of computer science at the University of Michigan. "In the data-center space, it's going to be very important to be able to have the microarchitecture partition and allocate resources on an application-by-application basis."

"When you have multiple applications that are running inside a single box," he explains, "there's a point of interference where jobs can hurt the performance of each other. With current commodity hardware, there are a limited number of mechanisms we have to manage how jobs hurt each other."

Mars cautions that a system like Jigsaw dispenses with a layer of abstraction between chip hardware and the software running on it. "Companies like Intel, once they expose the microarchitectural configurations through the software layer, they have to keep that interface over future generations of the processor," Mars says. "So if Intel wanted to do something audacious with the microarchitecture to make a big change, they'll have to keep that legacy support around, which can limit the design options they can explore."

"However," he adds, "the techniques in Jigsaw seem very practical, and I could see some variant of this hardware-software interface being adopted in future designs. It's a pretty compelling approach, actually."
