Major breakthrough improves software reliability and security

Anyone who uses multithreaded computer programs -- and that's all of us, since they power nearly all software, including Office, Windows, macOS, and the Google Chrome browser, as well as web services like Google Search, Microsoft Bing, and iCloud -- knows well the frustration of computer crashes, bugs, and other aggravating problems. Multithreaded programs are the most widely used way to harness the power of multicore processors, but they can be difficult for programmers to get right, and they often contain elusive bugs called races. Data races can cause very serious problems, like the software bug that set off the 2003 power blackout in the Northeast. Now a new system aims to combat this problem.
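To make the "race" concrete, here is a minimal sketch in Python (an illustrative example, not from the paper): two versions of a shared counter, one where concurrent read-modify-write steps can interleave and silently lose updates, and one where a lock serializes them.

```python
import threading

N_THREADS, N_INCR = 4, 25_000
counter = 0
lock = threading.Lock()

def unsafe_worker():
    global counter
    for _ in range(N_INCR):
        # Read-modify-write: two threads can interleave between the
        # read and the write and lose an update (a data race).
        counter += 1

def safe_worker():
    global counter
    for _ in range(N_INCR):
        with lock:          # the lock serializes the read-modify-write
            counter += 1

def run(worker):
    global counter
    counter = 0
    threads = [threading.Thread(target=worker) for _ in range(N_THREADS)]
    for t in threads: t.start()
    for t in threads: t.join()
    return counter

print(run(safe_worker))    # always 100000
print(run(unsafe_worker))  # may fall short of 100000 on some runs/interpreters
```

Whether the unsafe version actually loses updates on any given run depends on the scheduler and interpreter -- which is exactly the nondeterminism the article describes.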

Peregrine, a new software system developed by a team of researchers at Columbia Engineering School led by Junfeng Yang, assistant professor of computer science, will improve the reliability and security of multithreaded programs, benefiting virtually every computer user across the globe. Peregrine can be used by software vendors such as Microsoft and Apple, and by web service providers, to deliver reliable services to computer users. The research was published at the 23rd ACM Symposium on Operating Systems Principles, considered the most prestigious systems conference held each year, and presented by Yang's graduate student Heming Cui in Cascais, Portugal, on Oct. 26. The paper can be found online.

"Multithreaded programs are becoming more and more critical and pervasive," says Professor Yang."But these programs are nondeterministic, so running them is like tossing a coin or rolling dice -- sometimes we get correct results, and sometimes we get wrong results or the program crashes. Our main finding in developing Peregrine is that we can make threads deterministic in an efficient and stable way: Peregrine can compute a plan for allowing when and where a thread can "change lanes" and can then place barriers between the lanes, allowing threads to change lanes only at fixed locations, following a fixed order. This prevents the random collisions that can occur in a nondeterministic system.

"Once Peregrine computes a good plan without collisions for one group of threads," adds Yang, "it can reuse the plan on subsequent groups to avoid the cost of computing a new plan for each new group. This approach matches our natural tendency to follow familiar routes so we can avoid both potential hazards in unknown routes and efforts to find a new route."

Yang notes that, in contrast to many earlier systems that address only the resulting problems rather than the root cause, Peregrine tackles nondeterminism itself -- the property that makes a program unpredictable because a single input can produce multiple possible outcomes -- and thus simultaneously addresses all the problems that nondeterminism causes.

Peregrine also deals with data races, unlike most previous efforts, which do not provide such fine-grained control over the execution of a program. And it's fast -- many earlier systems can slow a program's execution by up to ten times. Peregrine is also a practical system that works with current hardware and programming languages: it requires no new hardware or new languages, which can take years to develop. And it reuses execution plans, whereas some previous work makes a different plan for each group of threads; as Yang points out, "The more plans one makes, the more likely some plans have errors and will lead to collisions."

"Today's software systems are large, complex, and plagued with errors, some of which have caused critical system failures and exploits," adds Yang. "My research is focused on creating effective tools to improve the reliability and security of real software systems. I'm excited about this area because it has the potential to make the cyberspace a better place and benefit every government, business, and individual who uses computers."


Citation: Major breakthrough improves software reliability and security (2011, November 2) retrieved 20 April 2019 from


User comments

Nov 02, 2011
Perhaps I'm missing some details here, but I don't quite see how race conditions could be eliminated in software that runs on multiple cores in real-world operating environments. Aside from having to share resources with other processes, and different cache states leading to different hit/miss ratios and memory access delays... Aside from the fact that different cores may be clocked differently due to voltage/power gating... Aside from the fact that much of modern code is event-driven and those external events (and/or hardware interrupts) can't be "scheduled" as they can't be foreseen... Aside from all that and possibly more, we are moving toward heterogeneous computing architectures where even the multiple cores within the same system could be fundamentally different from each other in both capabilities and performance.

I don't see how, under such circumstances and with shared global memory between threads, one could somehow magically eliminate race conditions (or deadlocks.)

Nov 02, 2011
It may well be a breakthrough, but this article is merely an ad for the company. Pass.

Nov 02, 2011
It's not JUST an ad; it contains a link to the full text of the article for those who are interested.

I'm still going through it at the moment but, so far, it looks like the abstract summarized it correctly:

It may not be a fundamental discovery in itself (after all, race conditions and related issues have been well known and understood since at least the early 1970s). What they appear to have developed, however, is a nifty way to handle existing, very sloppy systems and bring them up to the standards required of serious, mission- and safety-critical applications. They also do it in a completely automated and, most importantly, -- formally provable -- way. That's no mean feat at all.

Nov 02, 2011
I realize now that this is in no way related to Peregrine Systems or their products. At least I couldn't find a connection to HP. Odd for them to choose that name. Bet they end up changing it. I hear HP's attorneys calling.

Nov 02, 2011
Ah, missed the link... From the abstract:
It works by constraining a program to repeat the same thread interleavings, or schedules, when given same input.
So first of all, this appears to apply only to in-process threads. If you have multiple processes comprising a single application, this won't work.
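The record/replay idea the abstract describes can be sketched concretely. Below is a toy Python illustration (not Peregrine's mechanism, which works on compiled programs): a first run records the order in which threads acquire a lock, and a second run forces the same acquisition order, reproducing the interleaving exactly.

```python
import threading

class ReplayLock:
    """Toy schedule record/replay (illustrative, not Peregrine's algorithm):
    record mode logs the order in which threads acquire the lock;
    replay mode forces a later run to follow the same order."""
    def __init__(self, schedule=None):
        self.lock = threading.Lock()
        self.cond = threading.Condition()
        self.recorded = []
        self.schedule = schedule   # None => record mode
        self.pos = 0

    def acquire(self, tid):
        if self.schedule is not None:
            with self.cond:
                while self.schedule[self.pos] != tid:  # wait for our slot
                    self.cond.wait()
        self.lock.acquire()
        self.recorded.append(tid)

    def release(self):
        self.lock.release()
        if self.schedule is not None:
            with self.cond:
                self.pos += 1          # advance the schedule
                self.cond.notify_all()

def worker(rl, tid, log, reps):
    for _ in range(reps):
        rl.acquire(tid)
        log.append(tid)    # shared access, ordered by the lock
        rl.release()

def run(schedule=None):
    rl, log = ReplayLock(schedule), []
    ts = [threading.Thread(target=worker, args=(rl, i, log, 2)) for i in range(3)]
    for t in ts: t.start()
    for t in ts: t.join()
    return rl.recorded, log

sched1, log1 = run()                 # record whatever order happened
sched2, log2 = run(schedule=sched1)  # replay forces that same order
print(log2 == log1)  # True: the interleaving is reproduced exactly
```

This is also why the commenter's debugging point holds: a bug that surfaced under a recorded schedule can be reproduced on demand, run after run.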

More importantly, this seems useful mainly for debugging thread-related issues. Not so much for deploying somehow automatically bullet-proofed code. This could actually be detrimental when testing: if the code always behaves the same for a given set of inputs, then all tests may pass even though the code will fail after it's released, when it's given a different set of inputs. Having threaded code behave deterministically like this may reduce the chance of serendipitously finding flaws during development (particularly, test-driven development...)

But for debugging, this seems to be pretty useful.

Nov 02, 2011
In reading this article, I can only conclude that, at its core, this technique essentially inhibits multithreaded execution of code. Of course you can avoid race conditions if you order things such that they run sequentially. But doesn't that defeat the whole point of a multithreaded, multi-process software system?

This kind of makes me wonder, if all of the other "breakthroughs" touted here (in fields I know nothing about) are really as breakthroughy as their headlines imply?

Nov 03, 2011
It doesn't inhibit the parallelism, only the non-determinism that is a consequence of the parallelism.

Nov 03, 2011
For those who are really (and I mean -really- seriously) interested in the issue, I suggest you check something called "Ravenscar Profile".

Even the first, albeit a bit naive, Wiki entry can give you a first taste of what's involved. Just look at the list of things that must be prohibited (under the pragma Restrictions list) to make sure it can ---absolutely--- guarantee concurrent consistency. That's not easy.

And that profile applies to the NEW development. Applying it to some code that never even -heard- of proper software engineering is a nightmare verging very closely on impossible.

Good luck with that.

Nov 03, 2011
This will probably decrease multithreaded performance. Much hot-baked air for nothing. Plus these people probably never heard about watchdog timers or firing lazy ass developers.

Nov 03, 2011
This will probably decrease multithreaded performance. Much hot-baked air for nothing. Plus these people probably never heard about watchdog timers

If you read the article it claims to be much faster than conventional means of ensuring no race conditions occur.

Naturally such a system will impact performance somewhat. But for many critical applications that is a small price to pay if it means that you actually CAN go parallel where before you weren't allowed to (medical, aviation, ... )

And especially for software companies, the price of tracking down such bugs in large software frameworks is disproportionately large (not to mention it's a real hassle for the software engineer unlucky enough to actually have to do it). Any development that reduces that effort will be very welcome.

Nov 03, 2011
In many software projects you do not have access to the entire code.
E.g., in the software project we're currently working on, we employ at least 6 third-party frameworks -- not all of which are open source and therefore not fully accessible for debugging/cleaning up -- even if we did have the time and manpower to go through their code line by line.
And even if we fixed stuff there, the changes we make and submit might not make it into the next version of the third-party framework, and hence all our effort may be for naught.

I agree that programmers should take care to have a clean design. But the reality is that you do add-ons and modifications of older code bases, or tie-ins with new code, in projects so large that you just don't have full control over and knowledge of all the other stuff going on.

Nov 03, 2011
This will probably decrease multithreaded performance

Sorry, but it's a common misconception amongst people who have never worked in hard real-time software development that it is about performance. I.e., throw the latest and fastest processor at it and all's good.

Nothing could be further from the truth. The first, foremost, and absolute "must" is deterministic predictability. You must be able to guarantee that your piece of code responds to an async event in no more than X microseconds, or Y clock cycles (which are not always interchangeable!). In extreme cases, you may even be required to respond in EXACTLY X microseconds / Y cycles -- no more, no less.

This is the reason many safety-critical systems (yes, avionics are most definitely among them) still use environment-hardened versions of the old i386 CPU, 4-bit controllers wherever possible, and so on. They will probably keep on using them for the foreseeable future, too.

In such environments, any tool that can GUARANTEE both correctness and timeliness of

Nov 03, 2011
Cont. (ran out of space): ... of your software very definitely deserves all the accolades it gets.

P.S. Sorry for the numerous typos - posting this from a mobile device with no easy way to spellcheck. This is clearly NOT a safety-critical piece of software :-)
