Conquering the chaos in modern, multiprocessor computers

Mar 10, 2010 by Hannah Hickey

(PhysOrg.com) -- Computers should not play dice. That, to paraphrase Einstein, is the feeling of a University of Washington computer scientist with a simple manifesto: If you enter the same computer command, you should get back the same result. Unfortunately, that is far from the case with many of today's machines. Beneath their smooth exteriors, modern computers behave in wildly unpredictable ways, said Luis Ceze, a UW assistant professor of computer science and engineering.

"With older, single-processor systems, computers behave exactly the same way as long as you give the same commands. Today's computers are non-deterministic. Even if you give the same set of commands, you might get a different result," Ceze said.

He and UW associate professors of computer science and engineering Mark Oskin and Dan Grossman and UW graduate students Owen Anderson, Tom Bergan, Joseph Devietti, Brandon Lucia and Nick Hunt have developed a way to get modern, multiple-processor computers to behave in predictable ways, by automatically parceling sets of commands and assigning them to specific places. Sets of commands get calculated simultaneously, so the well-behaved program still runs faster than it would on a single processor.
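The idea of forcing a multi-threaded program into a predictable order can be illustrated with a small sketch. This is not the UW group's technique, which the article does not describe in detail. It is a minimal illustration, assuming a simple round-robin rule in which each worker thread may touch shared state only when its turn comes up. The names `run_deterministic`, `num_workers` and `steps_per_worker` are hypothetical. Unlike the system described above, this sketch serializes the shared work entirely, so it shows only the repeatability, not the speed.

```python
import threading


def run_deterministic(num_workers=3, steps_per_worker=4):
    """Run workers on real threads, but let them update shared state
    only in a fixed round-robin turn order (an assumed, simplified rule)."""
    log = []    # shared state, appended to by every worker
    turn = 0    # id of the worker allowed to run next
    cond = threading.Condition()

    def worker(worker_id):
        nonlocal turn
        for step in range(steps_per_worker):
            with cond:
                # Block until the fixed order reaches this worker.
                cond.wait_for(lambda: turn == worker_id)
                log.append((worker_id, step))
                # Hand the turn to the next worker and wake the waiters.
                turn = (turn + 1) % num_workers
                cond.notify_all()

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    first = run_deterministic()
    second = run_deterministic()
    print(first == second)  # the order of updates is the same on every run
```

However the operating system happens to schedule the threads, the log comes out in the same order, because the turn counter, not timing, decides who goes next.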

Next week at the International Conference on Architectural Support for Programming Languages and Operating Systems in Pittsburgh, Bergan will present a software-based version of this system that could be used on existing machines. It builds on a more general approach the group published last year, which was recently chosen as a top paper for 2009 by the Institute of Electrical and Electronics Engineers' journal Micro.

In the old days one computer had one processor. But today's consumer standard is dual-core processors, and even quad-core machines are appearing on store shelves. Supercomputers and servers can house hundreds, even thousands, of processing units.

On the plus side, this design creates computers that run faster, cost less and use less power for the same performance delivered on a single processor. On the other hand, multiple processors are responsible for elusive errors that freeze Web browsers and crash programs.

It is not so different from the classic chaos problem in which a butterfly flaps its wings in one place and can cause a hurricane across the globe. Modern shared-memory computers have to shuffle tasks from one place to another. The speed at which the information travels can be affected by tiny changes, such as the distance between parts in the computer or even the temperature of the wires. Information can thus arrive in a different order and lead to unexpected errors, even for tasks that ran smoothly hundreds of times before.
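How arrival order alone can change a result can be shown with a small simulation. The sketch below is an illustration under stated assumptions, not code from the UW work. It models two workers, labeled "A" and "B", that each read a shared counter and then write back the value plus one. Rather than relying on real thread timing, it enumerates every legal ordering of those four steps. The names `run`, `valid_schedules` and `outcomes` are hypothetical.

```python
from itertools import permutations


def run(schedule):
    """Execute the workers' read/write steps in the given order
    and return the final value of the shared counter."""
    shared = {"counter": 0}
    local = {}  # each worker's private copy of the value it read

    def read(worker):
        local[worker] = shared["counter"]

    def write(worker):
        shared["counter"] = local[worker] + 1

    steps = {"read": read, "write": write}
    for worker, action in schedule:
        steps[action](worker)
    return shared["counter"]


def valid_schedules():
    """Yield every interleaving in which each worker reads before it writes."""
    ops = [("A", "read"), ("A", "write"), ("B", "read"), ("B", "write")]
    for order in permutations(ops):
        if (order.index(("A", "read")) < order.index(("A", "write"))
                and order.index(("B", "read")) < order.index(("B", "write"))):
            yield order


outcomes = {schedule: run(schedule) for schedule in valid_schedules()}
distinct_results = sorted(set(outcomes.values()))

if __name__ == "__main__":
    for schedule, result in outcomes.items():
        print([f"{w}:{a}" for w, a in schedule], "->", result)
```

Of the six legal orderings, only the two in which one worker finishes before the other starts leave the counter at 2. The other four lose an update and leave it at 1, even though every worker ran exactly the same commands.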

"With multi-core systems the trend is to have more bugs because it's harder to write code for them," Ceze said. "And these concurrency bugs are much harder to get a handle on."

One application of the UW system is to make errors reproducible, so that programs can be properly tested.

"We've developed a basic technique that could be used in a range of systems, from cell phones to data centers," Ceze said. "Ultimately, I want to make it really easy for people to design high-performing, low-energy and secure systems."

Last year Ceze, Oskin, and Peter Godman, a former director at Isilon Systems, founded a company to commercialize their technology. The company is initially named Petra, after the Greek word for rock, because it hopes to develop "rock-solid systems," Ceze said. The Seattle-based startup will soon release its first product, Jinx, which makes any errors that are going to crop up in a program happen quickly.

"We can compress the effect of thousands of people using a program into a few minutes during the software's development," Ceze said. "We want to allow people to write code for multi-core systems without going insane."

The company already has some big-name clients trying its product, Ceze said, though it is not yet disclosing their identities.

"If this erratic behavior irritates us, as software users, imagine how it is for banks or other mission-critical applications."

More information: http://www.ece.cmu.edu/CALCM/asplos10/doku.php

User comments : 15

XopherMV
3 / 5 (2) Mar 10, 2010
Ok, great. This article tells me nothing. How about at least a high-level description of the techniques?
JayK
1 / 5 (2) Mar 10, 2010
I agree, this article is fairly useless, doesn't begin to describe the methodologies or even configuration data that was used to generate the conclusion.

There are also a lot of accusations in this article that are unbacked by proof, such as "web browsers locking up on multiprocessor machines." Poorly designed code will lock up on any sort of architecture, and code that is poorly written can easily lock up due to race conditions that work on older systems.
jj2009
not rated yet Mar 10, 2010
a rock solid system. great, ill believe it when i see it! will it run microsoft windows?
eachus
not rated yet Mar 11, 2010
Lol! Two things going on here. One is that they have a product which, in effect rattles the cage to get events to happen in different (but still legal under the API) orders.

The other is a lesson which Software Engineers discovered a long time ago. You can't test in quality, the most you can do is remove some of the bugs. And then you run into the problem that fixing those bugs can introduce more bugs, until the project is eventually dropped without ever shipping a product.

For complex reasons, Ada was designed to fix these problems by making it easy to see that the program was correct, even if that made it harder to write. In 1983 I taught one of the earliest programming courses to existing programmers, and we got a rude shock. Around 30% of the existing, professional programmers in the course could not write a correct Ada program. (Not Hello World, but on the order of 50 lines of code.) What went wrong?
eachus
not rated yet Mar 11, 2010
We found out one of the problems when we took a scheduling algorithm (written in Fortran) from an operating system and re-implemented it in Ada to see if it was slower or faster. We found 13 errors in 110 SLOC (ignoring block comments) in the Fortran. Once we fixed them, and compared the generated code, the resulting machine code was almost identical.

For some reason management was more interested in the errors in the existing code that had been shipping for over a year than in how fast either version ran. Anyway, we learned that the problem with programming in Ada was that it expected you to get all these exceptional and edge cases right the first time.

Tools were developed for writing (hard real-time) tasking code in Ada that work and result in code that can be proven correct even when distributed across multiple homogeneous or heterogeneous processors.

The problem is that it takes at least a year or more to become productive. :-( The good news is no debugging.
komone
not rated yet Mar 11, 2010
Partly in response to Xopher... the issue is that imperative languages like c, java, ada, etc share memory and thus cause side-effects. Shared transactional memory (STM) is certainly one response to the problem, but actor based systems and lambda calculus offer a better approach to resolving these issues at the heart of imperative programming. Functional languages e.g. erlang, haskell et al. appear to be the most productive way forward when faced with the multicore situation.
abhishekbt
not rated yet Mar 11, 2010
@jj2009 - You must be kidding right?
The two items in your post are antonyms.
Jo01
not rated yet Mar 11, 2010
Logic is the ultimate (and only) tool to master programming on single, multi or ultra core hardware.

The complexity increases maybe, depending on the tools and paradigm used, but logic is always sufficient to resolve the problem.

When it isn't, it is safe to say it is a hardware problem, most of the time related to disk errors.
In 30 years time I've never encountered a CPU related hardware error or memory related hardware error that resulted in unexplainable code behavior.
(I know that memory stress tests on common not error corrected hardware reveals errors, but that's not what I am talking about. What I mean is that debugging an error manifested by using a program never led to faulty hardware, in my case.)

Also, avoiding (and resolving) memory leaks when programming in for example C is as hard as parallel programming. The combination, C and parallel, is of course the ultimate fun.

...
Jo01
not rated yet Mar 11, 2010
I think it's completely wrong to state that modern computers are chaotic. Unpredictable and chaotic aren't necessarily synonyms.

J
CSharpner
not rated yet Mar 11, 2010
"and even quad-core machines are appearing on store shelves"

What?? They've been on store shelves for YEARS. I got mine over 2 years ago at CompUSA, which has been closed for over a year now.

This article gives zero information on what this magic algorithm or process is. From what little information is provided, it sounds like it's just something that does load balancing of threads, which all modern OS's already do, so I'm still curious what this is. This article is essentially a really long title to a story that was never written.

And what's this "unpredictable" so-called problem they speak of? Things are fairly predictable in my software development... even in my multi-threaded development, except of course, for things that HAVE to be unpredictable like threads that pause until external events and the like.
JayK
2.3 / 5 (3) Mar 11, 2010
CSharpner: I think they're selling something and this is just a pitch full of hyperbole and gross exaggerations.
raron
not rated yet Mar 13, 2010
"Today's computers are non-deterministic."

L-O-L. I'd never thought I'd read THAT in a physics related website.

However, I must confess to thinking the same thoughts myself (especially concerning some popular OS out there), but that doesn't mean it is "non-deterministic". Only a very, very complex machine dependent on variables not easily controlled by mere users.

This is either a joke, or there is actually a very, very frustrated "computer scientist" at University of Washington. Which isn't all that unthinkable, really.
:-)
El_Nose
not rated yet Mar 16, 2010
This still does not get rid of coding errors -- in fact no program can ensure that a piece of code does not have a programming error in it - outside of checking syntax

@raron

the nondeterministic part comes from the Processor - today's processors use branch prediction and this adds the nondeterministic part -- there was a very good explanation of this as a lecture - that i cannot find the link to ;-( -- anyway because of branch prediction in the processor you cannot exactly guarantee that the same input will lead to exactly the same output IF -- BIG IF -- you are using multiple cores - basically you have one cores.

1) branch prediction on a single process with the same input will have exactly the same effect with the same input

2) but what if you depend on data from another process and the code in that process on another core or thread is momentarily wrong -- and then you have a context switch -- you have stored data that has been updated and you continue forward
El_Nose
not rated yet Mar 16, 2010
these are really programming issues that a careful programmer can overcome and the fact that multicores is really not the point -- anyone making a program with a lot of threads will run into the same issue even if on one core without the proper locking of information.
taka
1 / 5 (1) Mar 19, 2010
All nowadays programming is based on sequential execution of commands (including functional languages unfortunately). If there is parallelism of any kind involved then this sequence does not exist any more and the programmer is not capable of handling it as soon as it is anything other than trivial. If things become parallel then there are no ways to abstract or divide tasks any more. Proper parallel assembler must emerge before truly parallel high-level languages become possible and it should obviously be based on moving data around between commands-functions that just transform it (no memory to spoil, no next command to execute at the wrong time).