How to program unreliable chips

Nov 04, 2013 by Larry Hardesty

As transistors get smaller, they also become less reliable. So far, computer-chip designers have been able to work around that problem, but in the future, it could mean that computers stop improving at the rate we've come to expect.

A third possibility, which some researchers have begun to float, is that we could simply let our computers make more mistakes. If, for instance, a few pixels in each frame of a high-definition video are improperly decoded, viewers probably won't notice—but relaxing the requirement of perfect decoding could yield gains in speed or energy efficiency.

In anticipation of the dawning age of unreliable chips, Martin Rinard's research group at MIT's Computer Science and Artificial Intelligence Laboratory has developed a new programming framework that enables software developers to specify when errors may be tolerable. The system then calculates the probability that the software will perform as it's intended.

"If the hardware really is going to stop working, this is a pretty big deal for computer science," says Rinard, a professor in the Department of Electrical Engineering and Computer Science. "Rather than making it a problem, we'd like to make it an opportunity. What we have here is a … system that lets you reason about the effect of this potential unreliability on your program."

Last week, two graduate students in Rinard's group, Michael Carbin and Sasa Misailovic, presented the new system at the Association for Computing Machinery's Object-Oriented Programming, Systems, Languages and Applications conference, where their paper, co-authored with Rinard, won a best-paper award.

On the dot

The researchers' system, which they've dubbed Rely, begins with a specification of the hardware on which a program is intended to run. That specification includes the expected failure rates of individual low-level instructions, such as the addition, multiplication, or comparison of two values. In its current version, Rely assumes that the hardware also has a failure-free mode of operation—one that might require slower execution or higher power consumption.
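The hardware specification that Rely starts from can be pictured as a table of per-instruction success probabilities. The sketch below is purely illustrative; the opcode names and numbers are assumptions, not Rely's actual input format.

```python
# Illustrative hardware reliability specification: the probability that each
# low-level instruction, run in approximate mode, produces the right result.
# These names and failure rates are assumed for the example, not real data.
HARDWARE_SPEC = {
    "add": 1.0 - 1e-6,  # approximate addition
    "mul": 1.0 - 2e-6,  # approximate multiplication
    "cmp": 1.0 - 1e-6,  # approximate comparison
}

# The hardware is also assumed to offer a failure-free mode, perhaps at the
# cost of slower execution or higher power consumption.
EXACT_MODE_RELIABILITY = 1.0
```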

A developer who thinks that a particular program instruction can tolerate a little error simply adds a period—a "dot," in programmers' parlance—to the appropriate line of code. So the instruction "total = total + new_value" becomes "total = total +. new_value." Where Rely encounters that telltale dot, it knows to evaluate the program's execution using the failure rates in the specification. Otherwise, it assumes that the instruction needs to be executed properly.

Compilers—applications that convert instructions written in high-level programming languages like C or Java into low-level instructions intelligible to computers—typically produce what's called an "intermediate representation," a generic low-level program description that can be straightforwardly mapped onto the instruction set specific to any given chip. Rely simply steps through the intermediate representation, folding the probability that each instruction will yield the right answer into an estimation of the overall variability of the program's output.
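For a straight-line sequence of instructions, that folding step amounts to multiplying together the success probabilities of the approximate (dotted) operations. The following is a minimal sketch of the idea under assumed failure rates, not Rely's actual analysis:

```python
# Minimal sketch of reliability folding over a straight-line intermediate
# representation. Dotted (approximate) instructions contribute their assumed
# success probability; undotted ones are treated as exact. Illustrative only.
FAILURE_RATE = {"add": 1e-6, "mul": 2e-6, "cmp": 1e-6}  # assumed numbers

def straight_line_reliability(instructions):
    """instructions: list of (opcode, is_approximate) pairs.
    Returns the probability that every instruction yields the right answer."""
    reliability = 1.0
    for opcode, is_approximate in instructions:
        if is_approximate:
            reliability *= 1.0 - FAILURE_RATE[opcode]
    return reliability

# Two approximate adds and one exact multiply:
r = straight_line_reliability([("add", True), ("mul", False), ("add", True)])
# r equals (1 - 1e-6) squared: only the dotted adds can fail.
```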

"One thing you can have in programs is different paths that are due to conditionals," Misailovic says. "When we statically analyze the program, we want to make sure that we cover all the bases. When you get the variability for a function, this will be the variability of the least-reliable path."
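The conservative rule Misailovic describes—cover every path, report the worst case—can be sketched in one line: a function's overall reliability is the minimum over the reliabilities of its feasible execution paths. Again, this is an illustration of the stated principle, not Rely's implementation.

```python
# Sketch of the worst-case rule for conditionals: each branch of an
# if-statement creates a distinct execution path, and the static analysis
# reports the reliability of the least-reliable one. Illustrative only.
def function_reliability(path_reliabilities):
    """path_reliabilities: reliability of each feasible execution path."""
    return min(path_reliabilities)
```

For example, a function with three paths of reliability 0.999, 0.97, and 0.99 would be reported as 0.97 reliable.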

"There's a fair amount of sophisticated reasoning that has to go into this because of these kind of factors," Rinard adds. "It's the difference between reasoning about any specific execution of the program where you've just got one single trace and all possible executions of the program."

Trial runs

The researchers tested their system on several benchmark programs standard in the field, using a range of theoretically predicted failure rates. "We went through the literature and found the numbers that people claimed for existing designs," Carbin says.

With the existing version of Rely, a programmer who finds that permitting a few errors yields an unacceptably low probability of success can go back and tinker with his or her code, removing dots here and there and adding them elsewhere. Re-evaluating the code, the researchers say, generally takes no more than a few seconds.

But in ongoing work, they're trying to develop a version of the system that allows the programmer to simply specify the accepted failure rate for whole blocks of code: say, pixels in a frame of video need to be decoded with 97 percent reliability. The system would then go through and automatically determine how the code should be modified to both meet those requirements and maximize either power savings or speed of execution.
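A back-of-envelope calculation shows what such a block-level budget implies: if each approximate operation succeeds with probability r, a chain of n of them succeeds with probability r^n, which bounds how many operations can be made approximate while still meeting a target like 97 percent. The numbers below are assumptions for illustration.

```python
import math

# If each approximate operation succeeds with probability r, then n such
# operations in sequence all succeed with probability r**n. This computes the
# largest n that still meets a block-level reliability target. Assumed numbers.
def max_approx_ops(target, r):
    """Largest n such that r**n >= target (0 < target < 1, 0 < r < 1)."""
    return math.floor(math.log(target) / math.log(r))

# With a per-operation failure rate of one in a million, roughly thirty
# thousand operations can be approximate while staying above 97 percent.
n = max_approx_ops(0.97, 1.0 - 1e-6)
```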

"This is a foundation result, if you will," says Dan Grossman, an associate professor of computer science and engineering at the University of Washington. "This explains how to connect the mathematics behind reliability to the languages that we would use to write code in an unreliable environment."

Grossman believes that for some applications, at least, it's likely that chipmakers will move to unreliable components in the near future. "The increased efficiency in the hardware is very, very tempting," Grossman says. "We need software work like this in order to make that hardware usable."


More information: Paper (PDF): "Verifying Quantitative Reliability for Programs That Execute on Unreliable Hardware"
