The surprising usefulness of sloppy arithmetic

January 4, 2011 By Larry Hardesty
Graphic: Christine Daniloff

Ask a computer to add 100 and 100, and its answer will be 200. But what if it sometimes answered 202, and sometimes 199, or any other number within about 1 percent of the correct answer?

Arithmetic circuits that returned such imprecise answers would be much smaller than those in today’s computers. They would consume less power, and many more of them could fit on a single chip, greatly increasing the number of calculations it could perform at once. The question is how useful those imprecise calculations would be.

If early results of a research project at MIT are any indication, the answer is: surprisingly useful. About a year ago, Joseph Bates, an adjunct professor of computer science at Carnegie Mellon University, was giving a presentation at MIT and found himself talking to Deb Roy, a researcher at MIT's Media Lab. Three years earlier, before the birth of his son, Roy had outfitted his home with 11 video cameras and 14 microphones, intending to flesh out what he calls the "surprisingly incomplete and biased observational data" about human speech acquisition. Data about a child's interactions with both its caregivers and its environment could help confirm or refute a number of competing theories in developmental psychology. But combing through more than 100,000 hours of video for, say, every instance in which either a child or its caregivers says "ball," together with all the child's interactions with actual balls, is a daunting task for human researchers and artificial-intelligence systems alike. Bates had designed a chip that could perform tens of thousands of simultaneous calculations using sloppy arithmetic and was looking for applications that lent themselves to it.

Roy and Bates knew that algorithms for processing visual data are often fairly error-prone: A system that identifies objects in static images, for instance, is considered good if it’s right about half the time. Increasing a video-processing algorithm’s margin of error ever so slightly, the researchers reasoned, probably wouldn’t compromise its performance too badly. And if the payoff was the ability to do thousands of computations in parallel, Roy and his colleagues might be able to perform analyses of video data that they hadn’t dreamt of before.

High tolerance

So in May 2010, with funding from the U.S. Office of Naval Research, Bates came to MIT as a visiting professor, working with Roy’s group to determine whether video algorithms could be retooled to tolerate sloppy arithmetic. George Shaw, a graduate student in Roy’s group, began by evaluating an algorithm, commonly used in object-recognition systems, that distinguishes foreground and background elements in frames of video.

To simulate the effects of a chip with imprecise arithmetic circuits, Shaw rewrote the algorithm so that the results of all its numerical calculations were either raised or lowered by a randomly generated factor of between 0 and 1 percent. Then he compared its performance to that of the standard implementation of the algorithm. “The difference between the low-precision and the standard arithmetic was trivial,” Shaw says. “It was about 14 pixels out of a million, averaged over many, many frames of video.” “No human could see any of that,” Bates adds.
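To get a feel for the kind of experiment Shaw ran, here is a minimal Python sketch. The article does not say which foreground/background algorithm was used, so this assumes a simple running-average background model as a stand-in; the synthetic frames, `alpha`, `threshold`, and the helper names (`make_sloppy`, `segment`) are all hypothetical. In the sloppy run, every arithmetic result passes through a helper that raises or lowers it by a random factor of between 0 and 1 percent, as the article describes, and the two runs' foreground masks are then compared pixel by pixel.

```python
import random

WIDTH, HEIGHT, N_FRAMES = 32, 32, 40

def exact(x: float) -> float:
    """Stand-in for a precise arithmetic unit: returns results unchanged."""
    return x

def make_sloppy(seed: int, max_err: float = 0.01):
    """Hypothetical model of an imprecise unit: each result is raised or
    lowered by a random factor drawn uniformly from [0, max_err]."""
    rng = random.Random(seed)

    def sloppy(x: float) -> float:
        e = rng.uniform(0.0, max_err)
        return x * (1.0 + e) if rng.random() < 0.5 else x * (1.0 - e)

    return sloppy

def make_frames(seed: int):
    """Synthetic video: a noisy grey background (about 100) plus, from the
    second frame on, a bright 4x4 square sliding right two pixels per frame."""
    rng = random.Random(seed)
    frames = []
    for t in range(N_FRAMES):
        frame = [100.0 + rng.uniform(-5.0, 5.0) for _ in range(WIDTH * HEIGHT)]
        if t > 0:
            x0 = (2 * t) % (WIDTH - 4)
            for y in range(14, 18):
                for x in range(x0, x0 + 4):
                    frame[y * WIDTH + x] = 200.0
        frames.append(frame)
    return frames

def segment(frames, op, alpha=0.02, threshold=40.0):
    """Running-average background subtraction. Every arithmetic result is
    routed through `op`, so passing a sloppy op simulates imprecise hardware."""
    bg = list(frames[0])
    masks = []
    for frame in frames[1:]:
        # Foreground wherever the pixel differs enough from the background model.
        masks.append([op(abs(op(f - b))) > threshold for b, f in zip(bg, frame)])
        # Blend the new frame into the background model.
        bg = [op(op((1.0 - alpha) * b) + op(alpha * f)) for b, f in zip(bg, frame)]
    return masks

frames = make_frames(seed=1)
exact_masks = segment(frames, exact)
sloppy_masks = segment(frames, make_sloppy(seed=2))

total = sum(len(m) for m in exact_masks)
diff = sum(a != b for em, sm in zip(exact_masks, sloppy_masks) for a, b in zip(em, sm))
print(f"{diff} of {total} mask pixels differ ({diff / total:.4%})")
```

This toy only shows the shape of the comparison; its numbers say nothing about the 14-in-a-million figure Shaw measured on real video. Trying other seeds or a larger `max_err` is a quick way to see where the two masks start to diverge.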

Of course, a really useful algorithm would have to do more than simply separate foregrounds and backgrounds in frames of video, and the researchers are exploring what tasks to tackle next. But Bates’ chip design looks to be particularly compatible with image and video processing. Although he hasn’t had the chip manufactured yet, Bates has used standard design software to verify that it will work as anticipated. Where current commercial computer chips often have four or even eight “cores,” or separate processing units, Bates’ chip has a thousand; since they don’t have to provide perfectly precise results, they’re much smaller than conventional cores.

But the chip has another notable idiosyncrasy. In most commercial chips, and even in many experimental chips with dozens of cores, any core can communicate with any other. But sending data across the breadth of a chip consumes much more time and energy than sending it locally. So in Bates’ chip, each core can communicate only with its immediate neighbors. That makes it much more efficient — a chip with 1,000 cores would really be 1,000 times faster than a conventional chip — but it also limits its use. Any computation that runs on the chip has to be easily divided into subtasks whose results have consequences mainly for small clusters of related subtasks — those running on the adjacent cores.
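The trade-off can be illustrated with a small simulation. It assumes, hypothetically, that the cores sit on a square 32 x 32 mesh (1,024 cores) and that each talks only to its north, south, east and west neighbors; the article does not describe the chip's topology in more detail. A purely local operation finishes in one round, but a global one, such as every core learning the largest value held anywhere on the chip, has to be relayed hop by hop.

```python
from typing import List

Grid = List[List[int]]

NEIGHBORS = ((-1, 0), (1, 0), (0, -1), (0, 1))

def neighbor_step(grid: Grid) -> Grid:
    """One synchronous round: each core replaces its value with the max of
    its own value and those of its immediate (N, S, E, W) neighbors."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            for dr, dc in NEIGHBORS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    out[r][c] = max(out[r][c], grid[nr][nc])
    return out

def rounds_until_global_max(grid: Grid) -> int:
    """Count rounds until every core has learned the chip-wide maximum."""
    target = max(max(row) for row in grid)
    rounds = 0
    while any(v != target for row in grid for v in row):
        grid = neighbor_step(grid)
        rounds += 1
    return rounds

grid = [[0] * 32 for _ in range(32)]
grid[0][0] = 1  # the value starts in the top-left core
print(rounds_until_global_max(grid))  # → 62, one round per hop to the far corner
```

A task that only needs each core's immediate surroundings costs a single round on such a mesh, while anything that must reach across the whole chip costs as many rounds as the grid is wide plus tall, which is why the chip suits problems that decompose into local subtasks.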

On the grid

Fortunately, video processing seems to fit the bill. Digital images are just big blocks of pixels, which can be split into smaller blocks of pixels, each of which is assigned its own core. If the task is to, say, determine whether the image changes from frame to frame, each core need report only on its own block. The core associated with the top left corner of the image doesn’t need to know what’s happening in the bottom right corner.
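A minimal Python sketch of that partitioning follows, run sequentially here for clarity. The 8 x 8 block size, the change threshold, and the function names are illustrative assumptions, not details of Bates' design; the point is that each call to `block_changed` reads only its own block, so each could in principle run on its own core.

```python
from typing import Dict, List, Tuple

Frame = List[List[int]]

def block_changed(prev: Frame, curr: Frame, top: int, left: int,
                  size: int, threshold: int) -> bool:
    """The work one core would do: scan only its own block for a pixel that
    changed by more than `threshold` since the previous frame."""
    bottom = min(top + size, len(curr))
    right = min(left + size, len(curr[0]))
    for r in range(top, bottom):
        for c in range(left, right):
            if abs(curr[r][c] - prev[r][c]) > threshold:
                return True
    return False

def changed_blocks(prev: Frame, curr: Frame, size: int = 8,
                   threshold: int = 10) -> Dict[Tuple[int, int], bool]:
    """Split the frame into size x size blocks and query each one independently."""
    height, width = len(curr), len(curr[0])
    return {
        (top // size, left // size): block_changed(prev, curr, top, left, size, threshold)
        for top in range(0, height, size)
        for left in range(0, width, size)
    }

prev = [[50] * 32 for _ in range(32)]
curr = [row[:] for row in prev]
curr[5][30] = 200  # a change near the top-right corner
changed = changed_blocks(prev, curr)
print([block for block, flag in changed.items() if flag])  # → [(0, 3)]
```

Only the block containing the changed pixel reports anything; the core responsible for block (0, 0) never touches the pixels in block (0, 3).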

Bates has identified a few other problems that his chip also handles well. One is a standard problem in computer science called “nearest-neighbor search,” in which you have a set of objects that can each be described by hundreds or thousands of criteria, and you want to find the one that best matches some sample. Another is analysis of protein folding, in which you need to calculate all the different ways in which the different parts of a long biological molecule could interact with each other.
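For nearest-neighbor search, a brute-force version shows why sloppy arithmetic is often tolerable: an error of about 1 percent in each distance only changes the answer when two candidates are nearly tied. This is a sketch under stated assumptions, not how Bates' chip necessarily performs the search; the 200 random objects, the 64 criteria per object (kept small for brevity), and the `make_sloppy` model of imprecise arithmetic are all hypothetical.

```python
import random
from typing import Callable, List, Sequence

Op = Callable[[float], float]

def make_sloppy(seed: int, max_err: float = 0.01) -> Op:
    """Hypothetical imprecise unit: raise or lower each result by a random
    factor of between 0 and max_err."""
    rng = random.Random(seed)

    def sloppy(x: float) -> float:
        e = rng.uniform(0.0, max_err)
        return x * (1.0 + e) if rng.random() < 0.5 else x * (1.0 - e)

    return sloppy

def squared_distance(a: Sequence[float], b: Sequence[float], op: Op) -> float:
    """Squared Euclidean distance with every subtract, multiply and add passed through op."""
    total = 0.0
    for x, y in zip(a, b):
        d = op(x - y)
        total = op(total + op(d * d))
    return total

def nearest_neighbor(database: List[Sequence[float]], query: Sequence[float], op: Op) -> int:
    """Brute-force search: index of the database vector closest to query."""
    best_index, best_dist = -1, float("inf")
    for i, vec in enumerate(database):
        dist = squared_distance(vec, query, op)
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index

rng = random.Random(0)
database = [[rng.random() for _ in range(64)] for _ in range(200)]
# The query is a slightly perturbed copy of entry 17.
query = [x + rng.uniform(-0.01, 0.01) for x in database[17]]

exact_hit = nearest_neighbor(database, query, lambda x: x)
sloppy_hit = nearest_neighbor(database, query, make_sloppy(seed=1))
print(exact_hit, sloppy_hit)
```

Because the query sits far closer to entry 17 than to any other object, both the exact and the sloppy search are expected to pick entry 17. Each distance computation is also independent of the others, so the work splits naturally across many cores.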

Bob Colwell, who was the chief architect on several of Intel's Pentium processors and has been a private consultant since 2000, thinks that the most promising application of Bates' chip could be in human-computer interactions. "There's a lot of places where the machine does a lot of work on your behalf just to get information in and out of the machine suitable for a human being," Colwell says. "If you put your hand on a mouse, and you move it a little bit, it really doesn't matter where exactly the mouse is, because you're in the loop. If you don't like where the cursor goes, you'll move it a little more. Real accuracy in the input is really not necessary." A system that can tolerate inaccuracy in the input, Colwell argues, can also tolerate (some) inaccuracy in its calculations. The graphics processors found in most modern computers are another example, Colwell says, since they work furiously hard to produce 3-D images that probably don't need to be rendered perfectly.

Bates stresses that his chip would work in conjunction with a standard processor, shouldering a few targeted but labor-intensive tasks, and Colwell says that, depending on how Bates' chip is constructed, there could be some difficulty in integrating it with existing technologies. But he doesn't see any of the technical problems as insurmountable. Still, "there's going to be a fair amount of people out in the world that as soon as you tell them I've got a facility in my new chip that gives sort-of wrong answers, that's what they're going to hear no matter how you describe it," he adds. "That's kind of a non-technical barrier, but it's real nonetheless."

This story is republished courtesy of MIT News (, a popular site that covers news about MIT research, innovation and teaching.

5 / 5 (6) Jan 04, 2011
Bistromathics is born!

It's kind of funny it took so long to get here. The human mind is anything but an exact computational device. As we move along in our reverse engineering of the brain I think we are going to find analogs to this multi-sloppy core design occurring iteratively down to the simplest levels in the brain.

5 / 5 (2) Jan 04, 2011
Sloppy Super Fast Fourier Transformation?
not rated yet Jan 04, 2011
I think we are going to find analogs to this multi-sloppy core design occurring iteratively down to the simplest levels in the brain.

Interesting thought. I think we'll find a wildly complex emergent system designed of many small, simple units.
not rated yet Jan 04, 2011
I never thought I'd ever see 1000 cores on a chip. I'm with Pyle - AI :D
1 / 5 (1) Jan 04, 2011
yeah this would be such a smart A.I. No fears of world conquest here.

12 + 12 = 20

11* 13 = 100

He so smart.
5 / 5 (4) Jan 04, 2011
I've developed a lot of software around visualizing music, and I can attest to the usefulness of sloppy math results. I can compactly pack multiple 1024-byte audio channels into a 32x32 texture and implement a DCT in the fragment shader. You wouldn't dare use the resulting data in any serious analysis, BUT the process is extremely fast and very practical for spectrum analysis that is good enough to detect noticeable patterns in varying frequency ranges. So at the end of the day I can reliably detect a specific effect in audio signals without precise results.

The applications go much further than that example, but it is a very clear example for those who cannot imagine why bad results can work effectively.
not rated yet Jan 04, 2011
You guys are funny. Did no one notice the direct correlation to simulating the human brain? Not AI: I'm talking about a cellular simulation of the interconnects. This might be just the thing we need to advance that field.
5 / 5 (4) Jan 04, 2011
El Nose,

Sorry if I missed something, but... Are you kidding? AI is pretty much a direct product of simulation of the human brain. I agree with rynex regarding the "wildly complex emergent system", but fail to see how this is incompatible with multi-sloppy cores.

Ultimately pattern recognition is somewhere near the root of our intelligence and this technology is going to let us synthesize this ability.

QC is right that a mind with poor error correction/detection is laughable, but that is where redundancy and significance come into play. Funny that point came from somebody with his record of correctness.

5 / 5 (1) Jan 04, 2011
And gradually the neural network is reinvented...
1 / 5 (1) Jan 05, 2011
It is ideally suited to simulating almost any physical system, from weather to nuclear explosions. All these systems are imperfect and have some noise inside, meaning that sloppy calculations do not make the results worse and may even make them better.
1 / 5 (1) Jan 05, 2011
A neural network is one possible type of computer. A PC is another, and an analog computer is a third. This sloppy one is the fourth type. But they all do identical calculations; they just do them differently, and some are more efficient at one thing and others at another.
not rated yet Jan 05, 2011
And now we realize why quantum mechanics at the lowest level is "fuzzy", the universe is a "matrix" simulation using sloppy math. Ahahaha!
not rated yet Jan 10, 2011
What if the next Xbox Kinect came with this and lost the hopeless lag, even if it didn't know exactly where my elbow is?
not rated yet Jan 21, 2011
Since they're so cheap and fast, why not assign three to each subproblem and use the consensus? It would probably increase the accuracy, assuming the inaccuracy approaches the correct solution.
