High-Performance Processors AltiVec™ Technology

Overview The preferred programming environment is The Results Are In: the and C++ languages favored by Performance Benchmarks with AltiVec Technology embedded systems developers, and EEMBC develops meaningful performance ’s advanced Freescale is working with leading tool benchmarks for the hardware and software used AltiVec™ technology is designed to enable providers to develop simulators, in embedded systems. Through the combined exceptional general-purpose processing assemblers, linkers and compilers to help efforts of its members, EEMBC benchmarks power for high-performance PowerPC® provide full support for the AltiVec have become an industry standard for evaluating processors in embedded computing and technology. In addition, numerous communications applications. This software libraries are available to help the capabilities of embedded processors, leading-edge technology is engineered to jump-start software development using compilers, and Java™ language implementations support high-bandwidth data processing the AltiVec technology. according to objective, clearly defined, and algorithmic-intensive computations, all application-based criteria. AltiVec technology is engineered to bring in a single-chip solution. AltiVec exceptional power to applications such as Freescale Semiconductor’s* family of technology has proven itself to be a leader telecommunications switches, IP telephony high-performance PowerPC® processors are in enabling high performance, with gateways, speech processing systems, frequently and rigorously put through EEMBC continued “best-in-class” performance image and video processing systems, benchmark certification tests. The graph shows ratings from the Embedded Microprocessor virtual private network servers, high-resolution performance improvements when processors Benchmark Consortium (EEMBC®), an 3-D graphics and more. With its DSP-like are tested out-of-the-box versus when they are organization of semiconductor, compiler computing power, AltiVec technology also tested with AltiVec technology optimizations for and RTOS vendors. enables high-performance PowerPC telecommunications and networking AltiVec technology offers a programmable processors to address markets and applications. Telecommunications tests cover applications in which performance must solution that can be adapted to changing autocorrelation, bit allocation, convolutional be balanced with power consumption, standards and customer requirements for encoder, Fast Fourier Transform (FFT) and protecting technology investments. system cost and peripheral integration. Viterbi decoder. Networking tests cover packet flow, route lookup and Open Shortest Path First (OSPF) protocol.

EEMBC® PERFORMANCE BENCHMARKS

MPC7447A with AltiVec™ Technology 135.0 3x faster! MPC7447A without AltiVec™ Technology 46.7 EEMBC NetMark

MPC7447A with AltiVec™ Technology 500.6 12x faster! MPC7447A without AltiVec™ Technology 41.4 EEMBC TeleMark 0 100 200 300 400 500 600 What Is AltiVec simultaneous operations on all elements Other advantages provided by the AltiVec Technology? within an AltiVec vector register. This is often instructions include: referred to as SIMD (Single Instruction > Fully pipelined with single-cycle throughput Multiple Data) parallel processing. Depending • Simple ops: 1 cycle latency Freescale’s AltiVec technology is a powerful on data size, vectors are 4, 8 or 16 elements • Compound ops: 3–4 cycle latency tool for software developers who want to add long. There is virtually no performance penalty • No restriction on issue with efficiency and speed to their applications. for mingling integer, FPU and AltiVec scalar instructions Starting with the e600 PowerPC core (also technology operations. > Enhanced cache/memory interface known as the G4), a 128-bit vector execution AltiVec technology is an extension to the unit was added to the architecture. This PowerPC instruction set, adding 162 new • Software hints for data reuse probability engine operates concurrently with the “vector” instructions. A set of useful • Prefetch support (stride-N access) existing integer and floating-point units and operations such as sum-across (sums all the > Simplified load/store architecture enables highly parallel operations—up to elements in a vector) and multiply-sum • Simple byte, halfword, word and 16 operations in a single clock cycle. By (multiplies and sums elements in three quadword loads and stores leveraging AltiVec technology, developers vectors) have been added. In addition, data • Virtually no unaligned accesses—software can see dramatic acceleration in manipulation instructions have been managed via performance-driven, high-bandwidth augmented to include operations such as computing and communications applications. permute (fills a register with bytes from two AltiVec technology uses a separate register other registers), merge (merges two vectors file containing 32 entries—each of which is into one), and “splat” (duplicates data across 128 bits wide. Each value within an AltiVec elements in a vector). The AltiVec instructions register is a vector that is made up of have one, two or three source operands and elements. AltiVec instructions perform are non-destructive in nature.

ALTIVEC EXECUTION OF MULTIPLY-ACCUMULATE ALTIVEC VECTOR UNIT BLOCK DIAGRAM

vA Dispatch

vB

IU FPU Vector Unit GPRs FPRs Vector Register File Prod 32 bits 64 bits 128 bits Instruction Stream vC Cache/Memory

+ + + +

vD To identify the functions that are not meeting Freescale offers downloadable libraries Familiar, Supported the performance requirements of the application, of AltiVec technology-enabled functions. Programming software vendors such as Green Hills, To take advantage of AltiVec technology, an Environment Metrowerks and Wind River Systems offer application must be reprogrammed or at least profiling tools designed to enable developers recompiled; however, developers do not need to identify performance bottlenecks that could to rewrite the entire application. Application Because AltiVec technology is based on the be alleviated with AltiVec technology. notes and documentation are also provided to C programming model, developers may help developers start a new application using To help developers more quickly realize the leverage their experience with C, C++ or AltiVec technology. performance benefits of AltiVec technology, Obj C to easily take advantage of the benefits that the AltiVec technology has to offer.

Standard C programs should compile without SAMPLE-BASED PROCESSING modification on C compilers with the AltiVec C Programming Model enabled. New functions and data structures that use the SISD (Single Instruction, Single Data) SISD (Single Instruction, Single Data) AC-3–Audio Decode AC-3–Audio Decode AltiVec technology may be added and used just like other functions and data structures. do{ do{

Software developers can quickly see decode (channel 1) decode decode (channel 2) performance gains in existing applications by decode (channel 3) (channel 1, channel 2 decode (channel 4) channel 3, channel 4 using AltiVec technology. AltiVec technology decode (channel 5) channel 5, channel 6) generally works best for that 10 percent of decode (channel 6) the application that consumes 80 percent of } while (Amplifier is on; step time) } while (Amplifier is on; step time)

CPU time; these functions typically have Approximately 6x performance improvement heavy computational and data loads, two areas where AltiVec technology excels. Without the power of AltiVec technology, the code may have to call a routine six times to perform the same operation on multiple pieces of data. With AltiVec technology, the routine may be run only once on all six sections of data simultaneously.

ALTIVEC LIBRARIES

Application/Category Libraries Available (Please check the Web site for an updated list of available libraries.)

Telecommunications > Fixed-Point and Floating Point Fourier Transform (FFT) Megafunction > Complex, Real, and Real Delayed Least Mean Squared (LMS) Finite Impulse Response (FIR) Functions > Autocorrelation > Global System for Mobile Communications (GSM) Convolution Encoder > GSM/3G Viterbi Decoder > Error Correction Codes (Cyclic Redundancy Check 8,12,16,24) Networking > Open Shortest Path First (OSPF) > Transmission Control Protocol/Internet Protocol (TCP/IP) > Encryption Protocols (AES, DES, 3DES, MD5) Multimedia > 2D Discrete Cosine Transform (DCT) > 2D Inverse Discrete Cosine Transform (IDCT) > MPEG-2 (Moving Picture Experts Group) > MPEG-1 Audio Layer-3 (MP3) > JPEG (Joint Photographic Experts Group) > Quantization > Dequantization > Sum of Absolute Differences (SAD) Print/Imaging > Ghostscript® Library Elements > Color Conversion (RGB to YCbCr) > FS Dithering Link-level Libraries (LibC) > Link-level Support for Standard C Functions (memcpy, strcmp etc.) Mathematical Primatives > Log, Exp, Sin, Cos, Log, Sqrt Operating System Enablement > Linux (TCP/IP) > VxWorks® Elements AltiVec Technology Features > Vector integer and floating-point Performance > SIMD functionality for embedded arithmetic applications with massive data Hot Spots for • Data types processing needs AltiVec Technology • 8-, 16- and 32-bit signed and • 128-bit vector execution unit with unsigned integer data types 32-entry,128-bit register file With its high-performance and ease-of-use • 32-bit IEEE single-precision • Parallel processing with vector floating-point data type software environment, AltiVec technology offers permute unit and vector arithmetic a single-chip solution to many common logical unit • 8-, 16- and 32-bit Boolean data types (e.g., OxFFFF = 16-bit TRUE) embedded computing challenges. AltiVec • 162 additional instructions technology may be used to improve • Modulo and saturation integer • Advanced data types, such as packed performance in the following areas: arithmetic byte, halfword and word integers, and > High-bandwidth data communications packed IEEE® single-precision floats • 32-bit “IEEE-default” single-precision floating-point arithmetic > Packet data processing • Saturation arithmetic • IEEE-default exception handling > Image and video processing > Simplified architecture • IEEE-default “round-to-nearest” > Access concentrators/DSLAMs • Virtually no interrupts other than data storage interrupt on loads and stores • Fast non-IEEE mode (e.g., denorms • ADSL and digital data concentrators flushed to zero) • Allows hardware unaligned > Speech recognition access support > Control flow with highly flexible bit manipulation engine > Voice/sound processing • Virtually no penalty for running AltiVec and standard PowerPC • Compare creates field mask used by > Array numeric processing instructions simultaneously select function > Basestation processing • Streamlined architecture to facilitate • Compare RC bit enables setting > Real-time continuous speech I/O efficient implementation Condition Register • HMM, Viterbi acceleration, neural algorithms > Maintains PowerPC ISA’s RISC › Trivial accept/reject in 3-D graphics register-to-register programming model › Exception detection via > 3-D graphics > Supports parallel operation on byte, software polling • Games, entertainment halfword, word and 128-bit operands › Available library • High-precision CAD • Intra- and interelement arithmetic > Virtual reality instructions > Motion video • Intra- and interelement conditional instructions • MPEG-2, MPEG-4 • Powerful permute, shift and • H.234 rotate instructions > High-fidelity audio • 3-D audio, AC-3, MP3 > Machine intelligence

Learn More: For more information about Freescale products, please visit www.freescale.com.

*The Semiconductor Products Sector of , Inc. became Freescale Semiconductor, Inc. in 2004. Freescale™ and the Freescale logo are trademarks of Freescale Semiconductor, Inc. All other product or service names are the property of their respective owners. The PowerPC name is a trademark of IBM Corp. and used under license. Java and all other Java-based marks are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries. © Freescale Semiconductor, Inc. 2004

ALTIVECGLANCE REV 4 - ALTIVECFACT