The TMS320C64x DSPs (including the TMS320DM642 device) are the highest-performance fixed-point DSP generation in the TMS320C6000 DSP platform. The TMS320DM642 (DM642) device is based on the second-generation high-performance, advanced VelociTI very-long-instruction-word (VLIW) architecture (VelociTI.2) developed by Texas Instruments (TI), making these DSPs an excellent choice for digital media applications. The C64x is a code-compatible member of the C6000 DSP platform.
With performance of up to 5760 million instructions per second (MIPS) at a clock rate of 720 MHz, the DM642 device offers cost-effective solutions to high-performance DSP programming challenges. The DM642 DSP possesses the operational flexibility of high-speed controllers and the numerical capability of array processors. The C64x DSP core processor has 64 general-purpose registers of 32-bit word length and eight highly independent functional units-two multipliers for a 32-bit result and six arithmetic logic units (ALUs)- with VelociTI.2 extensions. The VelociTI.2 extensions in the eight functional units include new instructions to accelerate the performance in video and imaging applications and extend the parallelism of the VelociTI architecture. The DM642 can produce four 16-bit multiply-accumulates (MACs) per cycle for a total of 2880 million MACs per second (MMACS), or eight 8-bit MACs per cycle for a total of 5760 MMACS. The DM642 DSP also has application-specific hardware logic, on-chip memory, and additional on-chip peripherals similar to the other C6000 DSP platform devices.
The DM642 uses a two-level cache-based architecture and has a powerful and diverse set of peripherals. The Level 1 program cache (L1P) is a 128-Kbit direct mapped cache and the Level 1 data cache (L1D) is a 128-Kbit 2-way set-associative cache. The Level 2 memory/cache (L2) consists of an 2-Mbit memory space that is shared between program and data space. L2 memory can be configured as mapped memory, cache, or combinations of the two. The peripheral set includes: three configurable video ports; a 10/100 Mb/s Ethernet MAC (EMAC); a management data input/output (MDIO) module; a VCXO interpolated control port (VIC); one multichannel buffered audio serial port (McASP0); an inter-integrated circuit (I2C) Bus module; two multichannel buffered serial ports (McBSPs); three 32-bit general-purpose timers; a user-configurable 16-bit or 32-bit host-port interface (HPI16/HPI32); a peripheral component interconnect (PCI); a 16-pin general-purpose input/output port (GP0) with programmable interrupt/event generation modes; and a 64-bit glueless external memory interface (EMIFA), which is capable of interfacing to synchronous and asynchronous memories and peripherals.
The DM642 device has three configurable video port peripherals (VP0, VP1, and VP2). These video port peripherals provide a glueless interface to common video decoder and encoder devices. The DM642 video port peripherals support multiple resolutions and video standards (e. g., CCIR601, ITU-BT.656, BT.1120, SMPTE 125M, 260M, 274M, and 296M).
These three video port peripherals are configurable and can support either video capture and/or video display modes. Each video port consists of two channels - A and B with a 5120-byte capture/display buffer that is splittable between the two channels.
For more details on the Video Port peripherals, see the TMS320C64x Video Port/VXCO Interpolated Control (VIC) Port Reference Guide (literature number SPRU629).
High-Performance Digital Media Processor (TMS320DM642)
2-, 1.67-, 1.39-ns Instruction Cycle Time
500-, 600-, 720-MHz Clock Rate
Eight 32-Bit Instructions/Cycle
4000, 4800, 5760 MIPS
Fully Software-Compatible With C64x™
VelociTI.2™ Extensions to VelociTI™ Advanced Very-Long-Instruction-Word VLIW) TMS320C64x™ DSP Core
Eight Highly Independent Functional Units With VelociTI.2™ Extensions:
Six ALUs (32-/40-Bit), Each Supports Single 32-Bit, Dual 16-Bit, or Quad 8-Bit Arithmetic per Clock Cycle
Two Multipliers Support Four 16 x 16-Bit Multiplies (32-Bit Results) per Clock Cycle or Eight 8 x 8-Bit Multiplies (16-Bit Results) per Clock Cycle
Load-Store Architecture With Non-Aligned Support
64 32-Bit General-Purpose Registers
Instruction Packing Reduces Code Size
All Instructions Conditional
Instruction Set Features
Byte-Addressable (8-/16-/32-/64-Bit Data)
8-Bit Overflow Protection
Bit-Field Extract, Set, Clear
Normalization, Saturation, Bit-Counting
VelociTI.2™ Increased Orthogonality
L1/L2 Memory Architecture
128K-Bit (16K-Byte) L1P Program Cache (Direct Mapped)
128K-Bit (16K-Byte) L1D Data Cache (2-Way Set-Associative)