CIM® with TI Multicore CPUs
Overview
Texas Instruments multicore CPUs are suitable for CIM® for a number of reasons. First and foremost, they process standard C code, without forcing Linux developers into a series of function calls, or "APIs" (as is the case with CUDA and OpenCL), and without forcing them to program at the assembly-language level using a cryptic, narrowly defined instruction set. This is not to say that API-based acceleration is bad; in many cases it can be quite effective and yield excellent performance. But for Linux development, it is easier and less resource-intensive to a) avoid rewriting existing, working code so that it uses a series of APIs, and b) not have to think in terms of "how to write code to fit my accelerator" when creating new code. Second, TI multicore CPUs offer high performance per mm² and per mW; that is, high performance per unit of chip area and per unit of power. In a word, they have high performance density. Examples visible in the market include OMAP devices found in smart phones and DaVinci devices found in digital cameras (see pictures at right).
TI Advantages
Thanks to TI's long experience in embedded products, TI multicore chips excel at both SIMD processing (like Nvidia) and general-purpose processing (like Intel). TI chips have for many years been found in small, embedded products such as mobile phones and digital cameras, which require very low power consumption and small package size while still maintaining high performance. The pictures at right show two examples of consumer products where compute-intensive performance is crucial despite package size and power consumption constraints. The smart phone runs compute-intensive voice and video codecs, and the digital camera does H.264 video compression. Here are some specific TI multicore CPU advantages:
- Extremely low power consumption per chip (3 to 5 W)
- Very small package size (24 x 24 mm)
- Large amount of on-chip memory
- Up to 16 multiply-and-accumulate (MAC) operations per clock cycle (as fast as 1 MAC per 50 picoseconds)
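The MAC figure in the list above can be checked with simple arithmetic. The sketch below assumes the 16 MACs per cycle are paired with a 1.2 GHz clock (the fastest clock rate listed on this page); that pairing is an assumption, as the text does not say which device the figure refers to. The function name `ps_per_mac` is hypothetical and used only for illustration.

```c
/* Average time per MAC, in picoseconds, for a device that completes
 * macs_per_cycle multiply-accumulates on every clock cycle.
 * Assumed inputs for the check: 1.2 GHz clock, 16 MACs per cycle. */
double ps_per_mac(double clock_hz, double macs_per_cycle)
{
    double macs_per_second = clock_hz * macs_per_cycle;
    return 1.0e12 / macs_per_second;   /* 1 s = 1e12 ps */
}
```

Under those assumptions, `ps_per_mac(1.2e9, 16.0)` gives roughly 52 ps, which is consistent with the "about 50 picoseconds" figure quoted above.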
Lab Measurements
Here is one data point comparing a TI multicore CPU (a 2008 chip) with a quad-core Penryn x86 server, measured on a convolution with filter and data lengths of about 37,000 and 196,000, respectively:
Penryn quad x86 Server w/ OpenMP 1 | C6472 w/ CIM® OpenMP |
---|---|
1 sec | 0.65 sec |
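To give a sense of the kind of code such a measurement exercises, here is a minimal sketch of a direct (time-domain) convolution written in standard C with a single OpenMP pragma. It is not the benchmark code behind the numbers above, which this page does not show; the function name `convolve` and its parameters are hypothetical, and the choice of the direct method is an assumption.

```c
#include <stddef.h>

/* Direct linear convolution: y[n] = sum over k of h[k] * x[n - k].
 * y must have room for x_len + h_len - 1 samples.
 * Illustrative sketch only; names and method are assumptions. */
void convolve(const float *x, size_t x_len,
              const float *h, size_t h_len,
              float *y)
{
    long xl = (long)x_len;
    long hl = (long)h_len;
    long y_len;
    long n;

    if (x_len == 0 || h_len == 0)
        return;                       /* nothing to compute */

    y_len = xl + hl - 1;

    /* Each output sample is independent, so the outer loop can be
     * split across cores. Without OpenMP enabled, the pragma is
     * ignored and the loop simply runs serially. */
    #pragma omp parallel for schedule(static)
    for (n = 0; n < y_len; n++) {
        /* Restrict k so that both h[k] and x[n - k] stay in range. */
        long k_min = (n >= xl) ? n - xl + 1 : 0;
        long k_max = (n < hl) ? n : hl - 1;
        float acc = 0.0f;
        long k;

        for (k = k_min; k <= k_max; k++)
            acc += h[k] * x[n - k];   /* one MAC per iteration */

        y[n] = acc;
    }
}
```

The loop body is ordinary C with no accelerator-specific API calls, and the only parallel construct is the pragma. If the direct method were used for the lengths quoted above, the inner loop would execute about 37,000 × 196,000 ≈ 7.25 × 10⁹ MACs in total.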
CPU Types Used with CIM®
Here are some of the TI CPU types currently supported by CIM® technology:
- C641x (single core, 1 GHz)
- C6472 (6-core, 750 MHz per core)
- C6678 (8-core, 1.2 GHz per core)