New bandwidth management techniques boost operating efficiency in multi-core chips

May 25, 2011

Researchers from North Carolina State University have developed two new techniques to help maximize the performance of multi-core computer chips by allowing them to retrieve data more efficiently, which boosts chip performance by 10 to 40 percent.

To do this, the new techniques allow multi-core chips to handle two things more efficiently: allocating bandwidth and "prefetching" data.

Multi-core chips are supposed to make our computers run faster. Each core on a chip is its own processor, or computer brain. However, there are things that can slow these cores down. For example, each core needs to retrieve data from memory that is not stored on its chip. There is a limited pathway – or bandwidth – these cores can use to retrieve that off-chip data. As chips have incorporated more and more cores, the bandwidth has become increasingly congested – slowing down system performance.

One of the ways to expedite core performance is called prefetching. Each chip has its own small memory component, called a cache. In prefetching, the cache predicts what data a core will need in the future and retrieves that data from off-chip memory before the core needs it. Ideally, this improves the core's performance. But, if the cache's prediction is inaccurate, it unnecessarily clogs the bandwidth while retrieving the wrong data. This actually slows the chip's overall performance.
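
The prefetching described here is done automatically by the cache hardware, but the same trade-off can be illustrated in software. Below is a minimal sketch, assuming a GCC or Clang compiler, that uses the __builtin_prefetch hint to request array elements a few loop iterations before they are needed; the lookahead distance of 16 is an arbitrary illustrative choice, and, just as with a bad hardware prediction, prefetching the wrong data simply wastes memory bandwidth.

    /* Software analogy to cache prefetching: ask for data a fixed distance
     * ahead of where the loop is currently reading, so it is already
     * on-chip by the time it is needed. Compile with, e.g., gcc -O2. */
    #include <stdio.h>
    #include <stddef.h>

    static double sum_with_prefetch(const double *a, size_t n)
    {
        const size_t ahead = 16;   /* lookahead distance; a tuning guess */
        double sum = 0.0;

        for (size_t i = 0; i < n; i++) {
            if (i + ahead < n)                           /* stay in bounds */
                __builtin_prefetch(&a[i + ahead], 0, 1); /* read, low temporal locality */
            sum += a[i];
        }
        return sum;
    }

    int main(void)
    {
        double data[1024];
        for (size_t i = 0; i < 1024; i++)
            data[i] = (double)i;
        printf("sum = %.1f\n", sum_with_prefetch(data, 1024));
        return 0;
    }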

"The first technique relies on criteria we developed to determine how much bandwidth should be allotted to each core on a chip," says Dr. Yan Solihin, associate professor of electrical and computer engineering at NC State and co-author of a paper describing the research. Some cores require more off-chip data than others. The researchers use easily-collected data from the hardware counters on each chip to determine which cores need more bandwidth. "By better distributing the bandwidth to the appropriate cores, the criteria are able to maximize system performance," Solihin says.

"The second technique relies on a set of criteria we developed for determining when prefetching will boost performance and should be utilized," Solihin says, "as well as when prefetching would slow things down and should be avoided." These criteria also use data from each chip's hardware counters. The prefetching criteria would allow manufacturers to make multi-core chips that operate more efficiently, because each of the individual cores would automatically turn prefetching on or off as needed.

Utilizing both sets of criteria, the researchers were able to boost multi-core performance by 40 percent, compared to multi-core chips that do not prefetch data, and by 10 percent over multi-core chips that always prefetch data.

More information: The paper, "Studying the Impact of Hardware Prefetching and Bandwidth Partitioning in Chip-Multiprocessors," will be presented June 9 at the International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS) in San Jose, Calif.

User comments: 14

spectator
1 / 5 (4) May 25, 2011
I'm sure software companies will find a way to squander this improvement by making their next generation operating system or application 10 times larger and less efficient.
that_guy
not rated yet May 25, 2011
This is actually a very simple and clever idea.

For example:

Let's say you have 4 cores on a chip and each core has 2 MB of L1 cache (I realize that some will have cache that is shared between pairs, etc.). Currently there is very little flexibility in high-level cache, but if you think about it, the first core is generally the one you will want to have the most cache, because it is usually the one that does most of the work.

So it's an algorithm that helps allocate high-level cache based on usage need, program/data type, and urgency, to help your computer run more efficiently and put some of that lazy fourth core's cache to work doing something useful.
that_guy
5 / 5 (1) May 25, 2011
I'm sure software companies will find a way to squander this improvement by making their next generation operating system or application 10 times larger and less efficient.

Overall, software and algorithms have actually become much more efficient and streamlined in the last 2 decades. Your criticism may apply to Vista, but I think you lack an understanding of how much your computer does, and how efficiently it does it. Ten years ago, a computer could barely run TV-quality video in real time.
spectator
1 / 5 (1) May 25, 2011
I'm sure software companies will find a way to squander this improvement by making their next generation operating system or application 10 times larger and less efficient.

Overall, software and algorithms have actually become much more efficient and streamlined in the last 2 decades. Your criticism may apply to Vista, but I think you lack an understanding of how much your computer does, and how efficiently it does it. Ten years ago, a computer could barely run TV-quality video in real time.


10 years ago, the average computer was something like 1.4 GHz, the average video card had 128 times fewer stream processors, and each stream processor was slower. The average RAM, except in the most extreme gaming machines, was around 128 MB to 256 MB. Paging files are actually one reason older computers were so slow, which has more to do with RAM limitations than processors.

The efficiency of algorithms actually has little to do with it.
Na_Reth
not rated yet May 25, 2011
I'm sure software companies will find a way to squander this improvement by making their next generation operating system or application 10 times larger and less efficient.

Overall, software and algorithms have actually become much more efficient and streamlined in the last 2 decades. Your criticism may apply to Vista, but I think you lack an understanding of how much your computer does, and how efficiently it does it. Ten years ago, a computer could barely run TV-quality video in real time.


Windows 7 is an improvement over Vista, but it is still a monster. By compiling my own programs I have a fully functional Linux desktop running in 100 MB of RAM, AND it's much more responsive than even Windows XP. I think a decent distro will do with at least 150 MB... no way Windows Vista or higher will do that. Also take into account bytecode, garbage-collected languages that are a terrible processor hog (Java, .NET, etc.)... most software has really gone down the shitter (thanks to Sun and MS).
spectator
1 / 5 (1) May 25, 2011
The computer I'm using right now performs approximately 10.32 billion clock cycles per second, counting only the 4-core CPU, and not counting the video card and sound card.

I know from experiment that it can run the entire game engine of Starcraft 2 and run 7 computer "players" simultaneously, performing up to 6000 to 6500 game actions per minute per "player" during the late-middle game. That comes to 45500 game actions per minute, in addition to running the game engine itself and taking my inputs.

Each game action often results in multiple nested loops for decision making, pathfinding, damage calculation, etc.

Additionally, game actions only count actual orders from the "A.I." to the army controlled by that "player" via the "game engine"; they do not count the A.I.'s logic operations involved in running its script, which are tens of thousands of times more numerous.

So yes, I'm aware of how powerful modern computers really are both mathematically and in human terms...
spectator
1 / 5 (1) May 25, 2011
And I still say that if you could simulate permanent human learning, not necessarily even a "neural net," but just some way of "teaching" the A.I. permanently through experience so it learned from its mistakes, even if it was only insect-level learning, then combine an ant's intelligence, even in the form of a basic self-learning "expert machine" script, with the computer's perfect memory and brute-force calculations, and no human could ever compete in a real-time strategy game, ever.

Note that above, when I said 6000 to 6500 game actions per minute, the computer could actually play faster than that, but to avoid excessive spam, some of the branching A.I. script threads it runs are designed so that they are only refreshed once every 2 seconds or so. This is to conserve processor time for the graphics engine, etc.

It could easily be made to refresh every frame, or even multiple times per frame.
that_guy
not rated yet May 25, 2011
Computer algorithms have outpaced Moore's law in efficiency gains. Please hold for actual evidence, as this is not a widely reported issue. (I'm looking for articles on general software efficiency improvements.)
that_guy
not rated yet May 25, 2011
So I failed at finding what I want, but there was an article on PhysOrg, for one, that described how computer algorithms have outpaced Moore's law in efficiency gains. Basically, if we had the sophistication of 1980s software and algorithms, we would not be able to watch TV (even SD TV) on our computers today, due to the heavy computational load.

However, since every search I run is sidetracked by articles on single algorithms, or this, that and the other, I can't find an actual article that describes the gains in general software efficiency. So, in fairness, I'll have to drop the argument for now.

However, I do understand the sometimes bloated state of some software, as, say, Microsoft tries to make its OS do tons of stuff you don't care about, slowing things down. It's not that the software is doing any one thing inefficiently - it's doing every individual thing very efficiently - but 90% of what it does is not useful for Joe Anybody.
that_guy
not rated yet May 25, 2011
It would be helpful if Microsoft made things streamlined and modular, so that you can add pieces as needed, rather than include every single thing that only 1% of the population benefits from. We can all moan about MS, I'm sure, but it is one case, and does not necessarily apply to all parts of the broader software environment.
spectator
3 / 5 (2) May 25, 2011
that_guy:

The C family and a few of its competitors solved the old algorithm problems through more rigid software development protocols and modern self-optimizing compilers which remove redundant code.

In the '70s and '80s, they had terrible "spaghetti code" programs which were filled with manually written "goto" statements and crap like that. There was little or no structure, and no standards for how software should be developed, organized, or maintained.

I would agree with what you said, though: the efficiency of the algorithms per se is not necessarily the culprit with Microsoft. They just have too many "permanently on" functions built into their software.
that_guy
not rated yet May 25, 2011
There are always algorithms for things that can be further optimized, for both new and old technologies (say, for example, self-driving cars, voice recognition, evolving video codecs, etc.), and even programming languages are still being improved. But I am in agreement that, structurally, programming can be done much more efficiently than in the past.

I think that there's still progress to be made in some areas, and there will always be new things that need optimization (multi-core utilization, for example), but yes, I'll agree that a lot of the mature software technologies that are commonly used are reaching points of diminishing returns.
Na_Reth
5 / 5 (2) May 25, 2011
The C family did indeed solve most of the problems in software development, C++ even more so; too bad it lacks libraries or a framework that is cross-platform, and the ones that exist are only decent and often pose problems on one platform or the other.
After that, MS came up with terrible solutions... Visual Basic classic, anyone? Java from Sun... etc. These were HORRIBLY SLOW languages, from experience.
Now most software is written in Java, .NET, PHP or some other higher-level language. C++ is still widely used for game engines that need performance, and C is still used for operating systems. Most Linux distros also use Python now; it works, but it's also slow... at least it's tons better than the code I have seen from MS. Then there is all this ActionScript (Flash) that is also horribly slow (I would prefer Java or even JavaScript over this one).
Software is much slower than you realise, and I would know, talking from 12 years of experience now.
daniel_ikslawok
not rated yet May 26, 2011
Is this acceleration software something I could just download on my laptop? (Excuse me for the stupid question, I have no idea about things like that. ;)