Seeking out silent threats to simulation integrity

Sep 11, 2013
Researchers studying the impact of soft errors on large-scale computers found that, without intervention, these errors invalidate simulations in a significant fraction of cases, yet more than 95% of them can be detected and corrected.

Large-scale computing has become a necessity for solving the nation's most intractable problems. Due to their sheer number of cores, high-end computers increasingly exhibit intermittently incorrect behaviors—referred to as "soft errors"—placing the validity of simulation results at risk. A team of scientists at Pacific Northwest National Laboratory investigated the impact of soft errors on a full optimization algorithm. The team found that without intervention, soft errors would invalidate simulations in a significant fraction of all cases. They also found that 95% of the soft errors can be corrected.

The work is featured in the Journal of Chemical Theory and Computation.

To deliver a 100-fold performance increase over today's largest computers, planned systems will need to combine millions of cores. As the number of cores increases, so does the chance that some of them will intermittently produce unexpected results. These soft errors are a major impediment to realizing the potential of upcoming high-end systems, silently corrupting the results of a computation. Only by explicitly looking for such soft errors can they be detected and remedied.
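As a rough illustration of why scale matters, the sketch below computes the probability that at least one of N cores suffers a soft error during a run, assuming independent errors and a purely hypothetical per-core error rate; neither number comes from the study.

```python
# Illustrative only: how the chance of at least one soft error grows with core
# count, assuming independent errors and a made-up per-core error probability.

def run_failure_probability(n_cores: int, p_per_core: float) -> float:
    """P(at least one soft error) = 1 - (1 - p)^N under independence."""
    return 1.0 - (1.0 - p_per_core) ** n_cores

p = 1e-7  # hypothetical per-core probability of a soft error during one run

for n in (1_000, 100_000, 1_000_000, 10_000_000):
    print(f"{n:>10,} cores -> P(at least one error) = {run_failure_probability(n, p):.4f}")
```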

The study investigated optimization methods, which start from an initial guess and iteratively reduce the error until an accurate answer is reached. Because of this inherent characteristic, these methods should be relatively insensitive to uncontrolled perturbations. As a concrete example, the team explored the Hartree-Fock method of computational chemistry. Despite the convergent characteristics of optimization methods in general, and the Hartree-Fock method in particular, soft errors cause calculations to fail in a significant fraction of cases. Using knowledge about the data structures, bounds and constraints can be defined that allow large errors to be detected and corrected. In the majority of cases, the remaining residual errors are small enough that they are eliminated in the normal execution of the optimization.
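A minimal sketch of this idea follows; it is not the team's Hartree-Fock code. A toy convergent fixed-point iteration stands in for the optimization, a hypothetical soft error is injected partway through, and a check against assumed known bounds on the iterate detects the gross corruption and pulls the value back into range, after which the iteration's own convergence removes the small residual error.

```python
import math

# Toy fixed-point iteration x_{k+1} = 0.5 * (x_k + a / x_k), which converges to
# sqrt(a). It stands in for an iterative optimization such as a Hartree-Fock
# self-consistent-field loop, whose quantities also obey known bounds.

A = 2.0               # we are computing sqrt(2)
LOWER_BOUND = 1e-6    # assumed known bounds on the iterate (illustrative)
UPPER_BOUND = 10.0

def detect_and_correct(x: float) -> float:
    """Detect a value pushed outside its known bounds (e.g. by a bit flip)
    and pull it back into range. Small undetected perturbations are left
    alone; the convergent iteration removes them in later steps."""
    if not math.isfinite(x):
        return UPPER_BOUND                     # crude reset for NaN/inf
    return min(max(x, LOWER_BOUND), UPPER_BOUND)

x = 1.0  # initial guess
for k in range(40):
    x = 0.5 * (x + A / x)
    if k == 10:
        x *= 1.0e8     # hypothetical soft error: a bit flip corrupts the value
    x = detect_and_correct(x)

print(f"result = {x:.12f}, exact sqrt(2) = {math.sqrt(A):.12f}")
```

The design choice illustrated here is that only gross corruptions need explicit correction; perturbations small enough to stay within bounds are simply absorbed by the method's convergence.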

To meet growing computational requirements and solve large-scale problems, exascale machines are planned and expected to be delivered within the next decade. Error detection and correction will increasingly become a central consideration for any algorithm running on them. Generic and reusable approaches to addressing these issues will be formulated.


More information: van Dam, H., et al. "A case for soft error detection and correction in computational chemistry." Journal of Chemical Theory and Computation, Article ASAP, July 19, 2013. DOI: 10.1021/ct400489c
