Advanced mathematical algorithms are essential for processing electronic signals within computers and embedded processors. Scientists and engineers are constantly refining and redesigning their algorithms to obtain higher throughput of information on ever smaller devices that consume less power.

Now, Pramod Kumar Meher of the A*STAR Institute for Infocomm Research in Singapore and co-workers at Central South University in Changsha, China, have developed an efficient new method to implement an important step in signal processing, called the discrete cosine transform (DCT). Their method could lead to devices that occupy smaller areas, provide higher throughput of information, and consume less power than existing devices.

The DCT is commonly used to compress digital video and audio, for example in MPEG files. Like the better-known Fourier transform, the DCT expresses a series of data points as a weighted sum of cosine functions of different frequencies.
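To make the definition concrete, the following is a minimal sketch of a direct DCT computation. It assumes the unnormalized "type-II" variant, which is the form most often used in compression; the article does not say which variant the researchers target, and the function name `dct_ii` is chosen here purely for illustration.

```python
import math


def dct_ii(x):
    """Unnormalized DCT-II of a sequence x (an assumed variant).

    X[k] = sum over n of x[n] * cos(pi * (2n + 1) * k / (2N))
    """
    n_points = len(x)
    return [
        sum(
            x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_points))
            for n in range(n_points)
        )
        for k in range(n_points)
    ]


# A constant signal has all of its energy in the k = 0 coefficient.
coefficients = dct_ii([1.0, 1.0, 1.0, 1.0])
```

This direct form needs on the order of N² multiplications for N data points, which is the cost that faster algorithms and dedicated hardware aim to reduce.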

Several algorithms and software architectures already exist for computing so-called 'power-of-two-length DCTs'. However, those DCTs are not suitable for all applications. The prime-length DCT is an alternative to the power-of-two-length DCT that has the potential to be more efficient for implementation in hardware, Meher notes.

Meher and his co-workers have focused on computing DCTs of various lengths of practical interest using specialized digital circuits that occupy less area on a silicon chip and use less power, yet still run at adequate speed. They not only derived a more efficient algorithm for the DCT, but also devised a new architecture, based on the 'distributed arithmetic' approach, for implementing the algorithm in integrated circuit chips.
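The general idea behind distributed arithmetic can be sketched in a few lines. It replaces the multiplications in a sum of products with fixed coefficients by table look-ups, shifts and additions. The sketch below is a generic textbook illustration, not the researchers' architecture: it assumes integer (fixed-point) coefficients and unsigned inputs, and the names `build_lut` and `da_inner_product` are hypothetical.

```python
def build_lut(coeffs):
    """Precompute, for every bit pattern, the sum of the selected coefficients.

    Entry `addr` holds the sum of coeffs[i] for every bit i set in `addr`.
    In hardware this table would typically be stored in ROM.
    """
    count = len(coeffs)
    return [
        sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
        for addr in range(1 << count)
    ]


def da_inner_product(lut, xs, bits):
    """Compute sum(coeffs[i] * xs[i]) without multiplying by the inputs.

    Assumes each x in xs is an unsigned integer that fits in `bits` bits.
    """
    acc = 0
    for b in range(bits):
        # Gather bit b of every input into one table address.
        addr = 0
        for i, x in enumerate(xs):
            addr |= ((x >> b) & 1) << i
        # Weight the table entry by 2**b (a shift in hardware).
        acc += lut[addr] * (1 << b)
    return acc


lut = build_lut([3, -5, 7])
result = da_inner_product(lut, [2, 9, 4], bits=4)  # equals 3*2 - 5*9 + 7*4
```

The table grows as 2 to the power of the number of coefficients, which is why reducing the look-up table size matters for designs of this kind.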

Meher and co-workers made use of a theorem that relates the transform to the cyclic convolution of two finite-duration sequences. Using look-up tables, this convolution, and from it the prime-length DCT, could be computed quickly and accurately.
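For readers unfamiliar with the term, a cyclic convolution combines two sequences of equal length, with indices wrapping around at the end. The sketch below shows only the textbook definition; the index mapping that turns a prime-length DCT into such a convolution is specific to the paper and is not reproduced here. The function name is illustrative.

```python
def cyclic_convolution(x, h):
    """Cyclic convolution: y[k] = sum over m of x[m] * h[(k - m) mod N]."""
    n = len(x)
    if len(h) != n:
        raise ValueError("both sequences must have the same length")
    return [
        sum(x[m] * h[(k - m) % n] for m in range(n))
        for k in range(n)
    ]


# Convolving with a shifted unit impulse rotates the sequence by one place.
rotated = cyclic_convolution([1, 2, 3], [0, 1, 0])
```

Each output value is a sum of products between the input and a rotated copy of a fixed sequence, which is the kind of computation that look-up tables can serve well.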

The team also described a new, efficient algorithm for decomposing the DCT; in mathematics, this means rewriting the problem as a combination of simpler quantities. The researchers found that their algorithm not only reduced the required size of read-only memory (ROM) but also significantly reduced the overall computation time.

"We found that the proposed design involves significantly less area and it yields higher throughput with less power consumption than the corresponding existing designs," says Meher. "The structure we propose is highly regular, modular and therefore suitable for Very Large Scale Integration realization."

**More information:** Xie, J., Meher, P. K. & He, J. Hardware-efficient realization of prime-length DCT based on distributed arithmetic. *IEEE Transactions on Computers* preprint, 6 March 2012 (doi: 10.1109/TC.2012.64). http://www.computer.org/csdl/trans/tc/preprint/ttc2012990042-abs.html