IMEC reports breakthrough power efficiency and performance of coarse-grain processor

Nov 02, 2005

IMEC’s novel coarse-grain processor, ADRES, meets the requirements for design-time and run-time reconfigurability needed to support a broad range of embedded applications. ADRES beats the performance of state-of-the-art DSP processors while offering the same flexibility and the power efficiency of state-of-the-art ASIPs. The core can also be used in a multi-core context for high-performance applications.

In tomorrow’s environment, mobile terminals will have to deal efficiently with a multitude of communication modes (such as WLAN, 3G/4G, 802.16e and DVB-H) and content formats (such as MPEG-2, MPEG-4, AVC/H.264 and Scalable Video Coding). Implementing such complex functionality at low power, while staying flexible enough to handle legacy standards and future algorithmic developments, requires both algorithmic optimization towards the platform architecture and a new, highly efficient programmable core with performance and power specifications well beyond those of state-of-the-art cores.

IMEC’s ADRES (Architecture for Dynamically Reconfigurable Embedded System) core is a flexible architecture that consists of a tightly coupled VLIW (Very Long Instruction Word) processor and a coarse-grain reconfigurable array. The core is fully programmable from C thanks to the co-developed C-code compiler. The core has been developed within IMEC’s Multi-Mode MultiMedia (M4) strategic research program, which creates breakthrough solutions for flexible processing cores and multi-core processor system architectures.

Benchmark studies of an H.264/AVC decoder and a MIMO-SDM receiver implemented on the ADRES core show that the core beats the performance of state-of-the-art DSP processors while offering the same flexibility and the power efficiency of state-of-the-art ASIPs. For a 32-bit ADRES core with 8x8 functional units in a 90 nm CMOS technology, the following numbers apply:

- Power efficiency in the range of 50-60 MOPS/mW;
- Peak performance of around 25 GOPS (at 400 MHz);
- An area of 7 mm² including the core and the L1 cache (32 KB data, 128 KB instruction and 16 KB configuration memory).
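
As a rough sanity check, the peak-performance figure above is consistent with every functional unit of the 8x8 array completing one operation per cycle at 400 MHz. The sketch below is not from IMEC: the one-operation-per-cycle assumption and the function names `peak_gops` and `implied_power_mw` are illustrative, and the power value it derives simply divides the quoted peak throughput by the quoted efficiency range, which may not match the conditions under which those figures were measured.

```python
def peak_gops(rows, cols, clock_mhz, ops_per_fu_per_cycle=1):
    """Peak throughput in GOPS for a rows x cols array of functional units.

    Assumes (hypothetically) that every unit retires
    ops_per_fu_per_cycle operations on every clock cycle.
    """
    return rows * cols * ops_per_fu_per_cycle * clock_mhz / 1000.0


def implied_power_mw(gops, mops_per_mw):
    """Power in mW implied by a throughput (GOPS) and an efficiency (MOPS/mW)."""
    return gops * 1000.0 / mops_per_mw


if __name__ == "__main__":
    # 8x8 array at 400 MHz, as quoted in the article.
    print(f"peak: {peak_gops(8, 8, 400):.1f} GOPS")
    # Quoted efficiency range applied to the quoted ~25 GOPS peak.
    for efficiency in (50, 60):
        power = implied_power_mw(25, efficiency)
        print(f"{efficiency} MOPS/mW -> {power:.0f} mW at 25 GOPS")
```

Under these assumptions the array peaks at 25.6 GOPS, in line with the "around 25 GOPS" quoted, and sustaining that peak at 50-60 MOPS/mW would correspond to roughly 417-500 mW.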

Compiled C code on the ADRES core needs 10 times fewer cycles than compiled code on state-of-the-art DSP solutions. Even with C-code compilation, ADRES outperforms current state-of-the-art assembly-programmed approaches.

These characteristics make ADRES a processor of choice for multi-mode baseband SDR (software-defined radio) platforms. A single 8x8 ADRES processor core is sufficient to support two parallel high-performance communication streams of 802.11n, 802.16e or future HRHM (high-rate high-mobility) solutions. Legacy standards such as 2G or 3G can also be supported in parallel on the core. IMEC currently deploys the ADRES core in its flexible air interface within the M4 program, targeting an area of 15 mm² and a power consumption of 300 mW (when running 802.11n on the platform in active transmission mode) in a 90 nm CMOS technology. In standby mode, the platform consumes only 5 mW.

The ADRES core is also ideally suited for multi-format multimedia (3MF) platforms. Because the 3MF platform and the ADRES core architecture both scale in power, functionality and silicon area, it is possible to create scaled-down platforms optimized for less advanced standards or lower-performance versions, thereby saving silicon area. The ADRES core also helps keep power usage minimal when standards with lower performance requirements run on a high-end platform implementation. Within IMEC’s M4 program, the 3MF block targets an area of 75 mm² (of which 50 mm² is taken by the ADRES cores and the L2 cache) and a power consumption of 700 mW to support HDTV-level AVC and VGA-level SVC in a 90 nm CMOS technology.

The breakthrough specifications were achieved through the unique architecture of the core, parallelism and compiler optimization. ADRES combines the host VLIW processor and the configurable accelerator in a single architecture, which simplifies programming and removes the communication bottleneck between the two. By using a VLIW instead of a RISC host, the limited parallelism available in the parts of the code that cannot be mapped to the array can also be exploited, leading to an important overall performance increase. Higher parallelism has also been achieved by exploiting loop-level parallelism in both the architecture and the compiler. IMEC’s industrial partners aim to deploy the developed technologies in the post-2008 generations of mass-market mobile terminals. M4 aims to significantly improve the power/performance/cost-efficiency balance compared to the state-of-the-art architectures needed to support the functionality expected of beyond-third-generation (3G) mobile devices.
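
The argument for a VLIW host can be illustrated with an Amdahl-style estimate: once the loops are accelerated on the array, the code that stays on the host dominates the remaining run time, so even a modest speedup there matters. The sketch below is only an illustration. The function name `overall_speedup` and all input values (90% of baseline time spent in mappable loops, a 16x array speedup, a 2x VLIW speedup on the remaining code) are hypothetical and are not IMEC measurements.

```python
def overall_speedup(loop_fraction, array_speedup, host_speedup):
    """Amdahl-style speedup relative to an all-scalar baseline.

    loop_fraction: share of baseline run time spent in loops mapped to the array
    array_speedup: speedup of those loops on the reconfigurable array
    host_speedup:  speedup of the remaining code on the host (1.0 = scalar RISC)
    """
    if not 0.0 <= loop_fraction <= 1.0:
        raise ValueError("loop_fraction must be between 0 and 1")
    remaining = 1.0 - loop_fraction
    return 1.0 / (loop_fraction / array_speedup + remaining / host_speedup)


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show the trend.
    risc_host = overall_speedup(0.9, 16.0, 1.0)
    vliw_host = overall_speedup(0.9, 16.0, 2.0)
    print(f"array + RISC host: {risc_host:.2f}x")
    print(f"array + VLIW host: {vliw_host:.2f}x")
```

With these made-up inputs, doubling the speed of the non-loop code raises the overall speedup from about 6.4x to about 9.4x, even though that code accounts for only a tenth of the baseline run time.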

Source: IMEC
