Fixing design bugs and wrong wire connections in computer chips after they've been fabricated in silicon is a tedious, trial-and-error process that often costs companies millions of dollars and months of time-to-market.
Engineering researchers at the University of Michigan say it doesn't have to be that way. They've developed a new technology to automate "post-silicon debugging."
"Today's silicon technology has reached such levels of small-scale fabrication and of sheer complexity that it is almost impossible to produce computer chips that work correctly under all scenarios," said Valeria Bertacco, assistant professor of electrical engineering and computer science and a co-investigator on the new technology. "Almost all manufacturers must produce several prototypes of a given design before they attain a working chip."
FogClear, as the new method is called, uses puzzle-solving search algorithms to diagnose problems early on and automatically adjust the blueprint for the chip. It reduces parts of the process from days to hours.
"Practically all complicated chips have bugs and finding all bugs is intractable," said Igor Markov, associate professor of computer science and electrical engineering and another of FogClear's developers. "It's a paradox. Today, manufacturers are producing chips that must work for almost all applications, from e-mail to chess, but they cannot be validated for every possible condition. It's physically impossible."
In the current system, a chip design is first validated in simulations. Then a draft is cast in silicon, and this first prototype undergoes additional verification with more realistic applications. If a bug is detected at this stage, an engineer must narrow down the cause of the problem and then craft a fix that does not disrupt the delicate balance of all other components of the system. This can take several days. Engineers then produce new prototypes incorporating all the fixes. This process repeats until they arrive at a prototype that is free of bugs. For modern chips, the process of making sure a chip is free of bugs takes as much time as production.
"Bugs found post-silicon are often very difficult to diagnose and repair because it is difficult to monitor and control the signals that are buried inside a silicon die, or chip. Up until now engineers have handled post-silicon debugging more as an art than a science," said Kai-Hui Chang, a recent doctoral graduate who will present a paper on FogClear at the upcoming International Conference on Computer-Aided Design.
FogClear automates this debugging process. The computer-aided design tool can catch subtle errors that several months of simulations would still miss. Some bugs might take days or weeks before causing any miscomputation, and they might only do so under very rare circumstances, such as operating at high temperature. The new application searches for and finds the simplest way to fix a bug, the one that has the least impact on the working parts of the chip. The solution usually requires reconnecting certain wires, and does not affect transistors.
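To give a feel for what "searching for the simplest fix" can mean, here is a toy sketch in Python. It is not FogClear's algorithm, which the article does not describe in detail. It assumes a tiny hypothetical circuit of two-input gates, a known-correct specification function, and a brute-force search that tries reconnecting one gate input at a time until the circuit matches the specification. All names (`simulate`, `passes`, `find_single_rewire`, the example netlist) are invented for illustration.

```python
from itertools import product

# Two-input gate types used by the toy netlist (an assumption for this sketch).
OPS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}

def simulate(netlist, inputs):
    """Evaluate gates in their listed (topological) order; return all signal values."""
    values = dict(inputs)
    for name, (op, a, b) in netlist.items():
        values[name] = OPS[op](values[a], values[b])
    return values

def passes(netlist, output, spec, input_names):
    """Check the circuit output against the spec for every input combination."""
    for bits in product((0, 1), repeat=len(input_names)):
        inputs = dict(zip(input_names, bits))
        if simulate(netlist, inputs)[output] != spec(**inputs):
            return False
    return True

def find_single_rewire(netlist, output, spec, input_names):
    """Try every one-wire reconnection; return (gate, pin, new_source) or None."""
    gate_names = list(netlist)
    for idx, gate in enumerate(gate_names):
        op, a, b = netlist[gate]
        # Only allow sources defined earlier, so the circuit stays acyclic.
        candidates = list(input_names) + gate_names[:idx]
        for pin, current in ((0, a), (1, b)):
            for new_src in candidates:
                if new_src == current:
                    continue
                pins = [a, b]
                pins[pin] = new_src
                trial = dict(netlist)  # gates (transistors) untouched; one wire moved
                trial[gate] = (op, pins[0], pins[1])
                if passes(trial, output, spec, input_names):
                    return gate, pin, new_src
    return None

# Intended behavior: out = (a AND b) OR c.
spec = lambda a, b, c: (a & b) | c

# Buggy prototype: the OR gate's second input is wired to "a" instead of "c".
buggy = {
    "n1": ("AND", "a", "b"),
    "out": ("OR", "n1", "a"),
}

fix = find_single_rewire(buggy, "out", spec, ["a", "b", "c"])
print(fix)  # ('out', 1, 'c')
```

The search changes only which signal feeds a gate input and never the gates themselves, mirroring the article's point that fixes usually involve reconnecting wires rather than altering transistors. A real chip has far too many signals for exhaustive checking like this, which is why smarter search algorithms are needed.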
Chang, who received his doctorate in electrical engineering and computer science from U-M in August, will present Nov. 6 at the International Conference on Computer-Aided Design in San Jose, California. The paper is titled "Automating Post-Silicon Debugging and Repair." Markov and Bertacco are co-authors with Chang.
Source: University of Michigan