Complex software systems -- heal thyself

Software underlies modern life, keeping everything from mobile phone networks functioning to planes in the air, but ensuring increasingly complex systems stay free of faults has become an epic task. What if software could heal itself?

Researchers from Israel and six EU countries have carried out pioneering work on self-healing software capable of automatically and autonomously detecting, identifying and fixing errors in the copious lines of code that make up complex software systems. The results of their research are already being used internally by several companies and could feed into commercial products in the near future.

“Software systems have grown increasingly large and complex as we come to rely on them to do more things. Just making a single mobile phone call may involve hundreds of systems operating behind the scenes and all of them need to work properly,” notes Onn Shehory, a researcher at IBM in Haifa, Israel.

We are talking about hundreds of systems containing hundreds of thousands or even millions of lines of code. And if just a tiny part of that code is wrong - due to design flaws or faults introduced while in use - performance will be degraded or the system may not function at all. Fixing software faults has, until now, meant calling on developers to sift through the code to identify the cause, locate it and repair it, a process that could be compared to searching for a needle in a digital haystack.

Tools developed by a team of researchers, coordinated by Shehory and funded by the European Union in the SHADOWS project, do the sifting, identifying and fixing automatically. The approach relies on a set of detection-localisation-healing-assurance loops that function in the background of complex software systems, without the need for human intervention.

The detection stage reveals or predicts the presence of problems, such as functional deviations, performance bottlenecks or concurrency problems. The localisation stage identifies the fault that caused the issue. The healing stage provides automatic or semi-automatic problem remediation. And, finally, the assurance stage examines the healing that has been done to ensure it solved the problem and no new problems were introduced.
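The four stages can be pictured as one pass of a control loop. The following Python sketch is purely illustrative and is not the SHADOWS implementation: the `Service` class, the queue-overflow fault and every function name are assumptions invented for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Service:
    """Toy monitored component (hypothetical, for illustration only)."""
    name: str
    queue_depth: int = 0
    max_queue_depth: int = 100


def detect(service: Service) -> bool:
    """Detection: report whether a symptom is present."""
    return service.queue_depth > service.max_queue_depth


def localise(service: Service) -> Optional[str]:
    """Localisation: name the fault behind the symptom, if known."""
    if service.queue_depth > service.max_queue_depth:
        return "queue_overflow"
    return None


def heal(service: Service, fault: str) -> None:
    """Healing: apply a remedy for the localised fault."""
    if fault == "queue_overflow":
        service.queue_depth = 0  # stand-in for draining or restarting a worker


def assure(service: Service) -> bool:
    """Assurance: confirm the symptom is gone after healing."""
    return not detect(service)


def run_loop_once(service: Service) -> str:
    """One detection-localisation-healing-assurance pass."""
    if not detect(service):
        return "healthy"
    fault = localise(service)
    if fault is None:
        return "unlocalised"  # hand over to a human
    heal(service, fault)
    return "healed" if assure(service) else "escalate"
```

In this toy version the assurance step only re-runs detection; a real assurance stage would also have to check that the fix introduced no new problems, as the article describes.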

A unified framework, based on open standards, such as Eclipse, provides a single methodology and architecture.

“Say you have several hundred thousand lines of code. We don’t analyse all of it but instead look at those areas - perhaps 10,000 lines - that have been identified as being at greater risk of faults. Monitoring it all would be too costly as the load on the system from the healing software would be greater than from the software that is being monitored,” Shehory explains.

When a fault is detected and its cause found, the tools can automatically apply a series of predefined solutions until it is resolved. In addition, the tools can be used to generate a model describing how a software system should function in a set of typical scenarios. These models can then be used to make comparisons with how it is functioning in reality.
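Both ideas in the paragraph above, trying predefined fixes in turn and comparing real behaviour against a model, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's code: the function names, the dictionary-based system state and the 20% tolerance are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# A remedy is a (name, action) pair; the action mutates the system state.
Remedy = Tuple[str, Callable[[dict], None]]


def apply_remedies(
    state: dict,
    is_resolved: Callable[[dict], bool],
    remedies: Iterable[Remedy],
) -> Optional[str]:
    """Apply predefined remedies in order until the fault is resolved.

    Returns the name of the remedy that resolved it, or None if none did.
    """
    for name, action in remedies:
        action(state)
        if is_resolved(state):
            return name
    return None


def deviations(
    expected: Dict[str, float],
    observed: Dict[str, float],
    tolerance: float = 0.2,
) -> List[str]:
    """List scenarios whose observed value is missing or differs from the
    modelled value by more than `tolerance` (as a fraction of the model)."""
    flagged = []
    for scenario, modelled in expected.items():
        actual = observed.get(scenario)
        if actual is None or abs(actual - modelled) > tolerance * abs(modelled):
            flagged.append(scenario)
    return flagged
```

For example, `deviations({"call_setup_ms": 100.0}, {"call_setup_ms": 150.0})` would flag `call_setup_ms`, because 150 is more than 20% away from the modelled 100.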

“This is particularly useful when comparing different versions of the same software,” Shehory says.

By using aspect-oriented development, the researchers designed their tools to function with legacy systems, ensuring that companies do not have to “reinvent the wheel” and redesign their existing software in order to incorporate self-healing features. This makes the SHADOWS tools cost-effective and relatively simple to implement.
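The article does not say which aspect-oriented tooling the project used. As a rough analogue only, a Python decorator shows the general idea of wrapping monitoring and recovery around existing code without editing it. The names `self_healing`, `legacy_lookup` and `safe_lookup` are hypothetical.

```python
import functools


def self_healing(fallback, on_fault=None):
    """Wrap a function so that failures are reported and a fallback is used.

    The wrapped function itself is left untouched, which is the point of
    the aspect-style approach: behaviour is added from the outside.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                if on_fault is not None:
                    on_fault(func.__name__, exc)  # report the fault
                return fallback(*args, **kwargs)  # predefined recovery
        return wrapper
    return decorator


def legacy_lookup(table, key):
    """Stand-in for existing code that we do not want to rewrite."""
    return table[key]


faults = []
safe_lookup = self_healing(
    fallback=lambda table, key: None,
    on_fault=lambda name, exc: faults.append((name, type(exc).__name__)),
)(legacy_lookup)
```

Calling `safe_lookup` with a missing key returns `None` and records the fault, while `legacy_lookup` itself stays exactly as it was.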

Contained risks

The team also worked on tools and drew up guidelines for developers creating new software, to encourage the development of software for complex systems with built-in self-healing capabilities.

“It will take time for this to be widely accepted by developers as they have to be able to trust tools that are going to act autonomously,” Shehory notes. “Yet, results already achieved by project partners using the SHADOWS technologies demonstrate that risks due to autonomy are well contained.”

Companies have already started applying the tools with success, with one telecommunications firm having used the SHADOWS approach to identify and correct a long-running fault in its call servers.

“There is a very real need for self-healing solutions among users of software… although I think the biggest initial demand is from software developers who want to reduce software testing times,” Shehory explains.

“They tell us that if our tools can reduce the time it takes to test for bugs and errors by just a few weeks it would be a major advantage.” Several of the project partners are continuing to work together, and a follow-up project is planned with that goal in mind.

Provided by ICT Results
Citation: Complex software systems -- heal thyself (2010, May 3) retrieved 23 August 2019 from

User comments

May 03, 2010
Doesn't this present an AI situation like that movie Eagle Eye? Do we really want to teach a PC to fix itself and grow on its own? I enjoy the movies about AI, but not being able to control its evolution scares me and makes me wonder what the future holds.

May 04, 2010
I am more concerned about how exactly it does this. Computers by definition do not have consciousness, and it is nearly impossible for them to understand what they are doing themselves, let alone correct themselves.

Would like to see an example of how this thing works. What kind of bugs can it track? Is there an example of a bug it has corrected?

May 04, 2010
I've been a programmer for something more than 40 years. The best I can expect is stuff that'll fail softly and tell me why. To expect it to fix itself is nonsense, IMHO.... If a program decides that a pointer is bad, it still has to figure out what the pointer should look like. I see that as impossible, other than a generic pointer to some kind of dump function....

There's also an assumption here that a program line will break on its own - age out, for example. If it does die due to cosmic rays or some such (not impossible), there are techniques involving alternate copies of the software (and doing things twice, etc.), but that's a whole different area. Otherwise, the code I write today is either good or bad....

If my code is zipping along and branches someplace "wrong", not much can be done about it except to halt it.... Maybe the OS or the software could hook something to post a nice note, but....

May 04, 2010
Can the software corrector debug the software corrector?

Also, before someone else says it first, it's the Singularity!!!!! (with extra exclamation points for tastiness).

May 04, 2010
I agree with SMMAssociates. If it knows what's wrong and can fix it, why write software at all? If the correction software is that smart, let IT write the software. Obviously, that's not the case, so the explanation in this article is wrong. I've been programming for 28 years (not as long as SMMAssociates :) and I know the description in the article is wrong. Probably filtered through a non-programmer. These practices aren't new. It's standard practice to have server software monitored by other server software, which then automatically restarts it if it fails (among other actions), sends out an e-mail or text to let someone know, or triggers other software to do something in response. What may be new is making it an open standard. And don't worry insectking and Rosario, there's no singularity and there's no AI going on here. Just simple automated monitoring and response.

May 04, 2010
It seems like a better debugger software at most. Maybe it could catch the simplest errors, like memory leaks and infinite loops. It also sounds like there is a huge overhead cost to running it.

May 04, 2010
Perhaps if the software was TESTED in-house rather than off-shored it wouldn't crash as often. I've 1st hand knowledge of this 'effect' of off-shored and out-sourced TESTING.

At least, this would put less load on the monitors as it is done in a TEST LAB with REAL DATA LOADS.

It is the Singular, most undervalued process of software design and engineering.

May 04, 2010
IMHO, we can probably build debuggers and analysis tools to find all of the likely failure points in code, and a good OS and monitor can spot stuff wandering off into places it shouldn't, but expecting something to really fix bad code is kinda blue sky. Soft failure and tell me about it....

The problem, of course, is that the code is huge, may be made from multiple libraries that interact in unknown ways, and probably will have a near-infinite number of paths that are possible. OTOH, I'm doing what amounts to e-mail on a machine whose existence I couldn't have imagined in the 60's - why not? Real-time really implies only a monitor anyway, and that needn't be a hog. Other analysis is in the programming phase, and if that has to run overnight (or for a week), so what?

May 04, 2010
Well, I guess I am the youngest programmer here with just 5yrs of exp! But I totally agree with you all.

There are actually many SW products available in the market today which do a good job at monitoring. And I am not talking about just being passive or external. There is stuff available which can monitor memory usage, and there are even code optimizers available which can DETECT and FLAG unreachable conditions and lengthy loops.

But, actual self correcting SW?
Now - Don't think so!
In a decade or two maybe - Plausible.
