AltiVec Introduction November 1998, Revision 6.0, No NDA Required

Mercury Computer Systems, Inc.

Craig Lund
Consultant, Local Knowledge
Principal Technologist, Mercury Computer Systems
3 Langley Road, Durham, NH 03824-3424
Tel: +1 603 868 2300  Fax: +1 603 868 2301
[email protected]

Page 1 Legal

This presentation was created by Local Knowledge for Mercury Computer Systems, leveraging materials owned by Mercury. Mercury hereby grants permission to Local Knowledge to freely create and distribute derivative works, provided Mercury receives prominent credit for its contribution. Use this information at your own risk. We took care preparing this material; however, we suspect mistakes still exist.

Mr. Barry Isenstein VP, Advanced Technology Mercury Computer Systems, Inc.

Page 2 Presentation Summary ¤ AltiVec introduction ¤ AltiVec programming ¤ Documentation & tools ¤ AltiVec example code ¤ G4 performance tuning

Page 3 AltiVec Summary ¤ Convergence of µP and DSP technology ¤ AltiVec summary ¤ Market position

Page 4 What will we do with all the transistors?

Logic transistors/cm² (The National Technology Roadmap for Semiconductors: Technology Needs, 1997 Edition, Semiconductor Industry Association): '97 3.7M, '99 6.2M, '01 10M, '03 18M

Page 5 DSP & µP Convergence

¤ DSP chips are getting easier to use as they incorporate more complete functionality (byte operations, etc.). ¤ Microprocessors are gaining special-purpose functional units that resemble traditional DSP.

(Figure: one µP beside multiple DSP blocks.)

¤ According to Apple benchmarks, the first AltiVec chip is many times faster than the fastest DSPs (Hammerhead, C67) on DSP tasks.

Page 6 PowerPC Before AltiVec

¤ Look at all the transistors dedicated to wide data paths and to wide memories. Many sit idle when programs manipulate shorter data types.

(Figure: an instruction unit feeding 32-bit scalar integer units and a 64-bit floating point unit, plus a load/store unit, with 128-bit paths to the caches and external memory.)

Page 7 AltiVec is SIMD

(Figure: SIMD operation. Corresponding elements of vA and vB pass through parallel op units to produce vD.)

¤ Basic concept: always utilize the full bandwidth of data paths both on and off chip. ¤ Downside: standard compilers cannot automatically generate SIMD instruction sequences.

Page 8 SIMD Instructions Everywhere

¤ HP’s MAX2, 1994—pack two shorts into 32-bit registers and pipeline.
¤ Sun VIS, 1994—remap FP registers for arrays of bytes and shorts. Instructions for MPEG compression.
¤ Intel MMX, 1996—remap FP registers for arrays of bytes, shorts, and words. Two-operand encoding.
¤ Digital’s motion-video instructions (MVI), 1997—five new instructions for MPEG motion estimation.
¤ AMD 3DNow!, 1998—Intel MMX plus packed singles.
¤ MIPS V—pack two single floats into double registers and pipeline. MIPS MDMX (MadMax)—remap FP registers for arrays of bytes and shorts. Unique 192-bit accumulator for multiply.
¤ Intel Streaming SIMD, 1999—eight new 128-bit FP registers with 70 new instructions & “streaming data”.

Page 9 AltiVec History

¤ In 1996 process technology was poised to dramatically shrink the PowerPC core, leaving room for a significant SIMD vector unit. Keith Diefendorff, of Motorola and later of Apple, drove the partnership to create a SIMD extension for PowerPC. As AltiVec’s chief architect, Keith helped define a general-purpose unit, not just the multimedia extensions other architectures have settled for. Keith is now editor of Microprocessor Report. ¤ IBM’s POWER2 chips have two scalar 64-bit FP units. In contrast, the PowerPC has one scalar FP unit and AltiVec (more efficient hardware, less standard software). Different markets, different decisions. ¤ VeComp is not part of this history. It was a project by a Motorola research group.

Page 10 Motorola’s AltiVec

¤ Just as important as the AltiVec SIMD vector unit are the four prefetch engines (DST) and new cache behaviors (LRU).

(Figure: the pre-AltiVec block diagram extended with a vector unit alongside the scalar integer and floating point units, DST prefetch engines on the load/store path, and LRU control over the caches and external memory.)

Page 11 Vector Unit

(Figure: the Vector Register File (VRF), VR0 through VR31, supplies three 128-bit source paths to each execution unit, the Vector ALU and the Vector Permute Unit, with a 128-bit result path back to a destination vector register. Load/store paths not shown.)

Page 12 AltiVec Data Types

(Figure: three alternative vector register data layouts for the 128-bit registers: sixteen 8-bit elements, eight 16-bit elements, or four 32-bit elements. The new C types combine "vector" with unsigned, signed, or bool char, short, and long, plus vector float and vector pixel.)

Page 13 AltiVec Operations

¤ 162 new PowerPC instructions—functionality similar to what is offered in the scalar units, just extrapolated into the SIMD domain. ¤ Includes: new instructions for field permutation and formatting. ¤ Includes: load/store instruction options for cache management. ¤ Includes: instructions that control four data prefetch engines. ¤ The AltiVec vector unit never generates an exception. Default floating point results are reasonable and match the Java standard.

Page 14 Java Mode

¤ AltiVec processors always expect floats in IEEE single precision floating point format. ¤ In Java mode AltiVec will perform gradual underflow (process denormalized results correctly). The cost in G4 is one extra cycle of latency for AltiVec floating point operations. Also, in some cases, a denormalized result will cause the processor to trap. With Java mode on, G4 complies with the Java subset of the IEEE floating point standard. Apple says to keep it on always. Mercury says to keep it off. Different markets, different advice. ¤ Also note that strict Java compliance requires that programmers not use the fused multiply-add instruction (Java requires the product and sum to round separately).

Page 15 Programming Options

¤ Compilers cannot automatically vectorize standard C without hints (usually supplied by programmers using #pragma statements). This limitation makes C less popular than FORTRAN within the vector community which, in turn, depresses the quality of the few vector C compilers. ¤ Thus Motorola has followed the lead of DSP suppliers by mapping AltiVec instructions directly into C extensions. The result fits somewhere between assembly language and high-level languages with respect to both performance and productivity. ¤ Two third parties are independently working toward optimized VSIP libraries for AltiVec.

Page 16 Productivity Niche

¤ AltiVec will beat a DSP chip in applications where programmer productivity and/or time-to-market is more of a concern than unit cost and hardware productivity. ¤ Compilers offered by DSP vendors can only tie together hand-coded routines. In contrast, little rationale exists for programming a PowerPC core in assembly language. ¤ PowerPC processors support fully functional operating systems such as VxWorks. ¤ An MMU helps debug code. ¤ Real floating point is always easier than DSP block floating point (but a precision issue can exist with singles). ¤ Note: the first AltiVec chip will offer software productivity, not hardware productivity. DSP chips are highly integrated and boast optimized interfaces to the outside world. In contrast, embedded “G4” users must furnish their own DSP-like hardware interfaces.

Page 17 G4 ¤ The first PowerPC with AltiVec has the Apple code name “G4”. ¤ 1.8 volt core consuming less than 8 watts (typical) at 400 MHz. ¤ Samples now. Production second calendar quarter.

Page 18 Motorola G4 Schedule

(Timeline, 1996-1999: design start, tape out, first silicon, samples, production.)

Page 19 Will floating point DSP survive?

¤ The market for floating-point DSP is small. Defense applications might even dominate. ¤ New SIMD extensions to all of the major microprocessor families will hit the DSP vendors first where they are weakest—the high-end, floating-point niche. DSP vendors may elect to flee.


Page 20 AltiVec Programming November 1998, Revision 6.0, No NDA Required


Page 21 Documentation & Tools ¤ Documentation ¤ Tools ¤ Libraries

Page 22 Documentation

¤ AltiVec Technology Programming Environments Manual, revision 0.2, May 1998, describes the new PowerPC instructions—available on Motorola’s web site at . ¤ AltiVec Technology Programming Interface Manual, revision 0.2, June 1998, describes Motorola’s extensions to the C language. Alternatively, use Apple’s AltiVec Support In MrC[pp], Revision 1.1, June 3, 1998, available on Apple’s web site. ¤ PowerPC xxx Microprocessor Implementation Definition, Book IV, version 1.3 — Confidential. This tells how AltiVec is implemented. It will be replaced by a public users’ manual when G4 is announced.

Page 23 Documentation (Assembler)

¤ PowerPC Compiler Writer’s Guide, dated 9/3/96, available on IBM’s web site at . Interesting to assembly programmers, not just compiler writers. ¤ PowerPC Embedded Application Binary Interface, available on the web at . ¤ PowerPC Microprocessor Family: The Programming Environments for 32-Bit Microprocessors, revision 1; Implementation Variances Relative to Rev. 1 of the Programming Environments Manual, dated 3/97; and the Programmer’s Reference Guide, revised 10/95—all available on Motorola’s web site at .

Page 24 Apple Tools

¤ Apple has posted on its web site a complete C/C++ and assembler development environment for AltiVec. See . ¤ Apple has also posted an extension to MacOS 8.1/8.5 which allows existing PowerPC microprocessors (except the 601) to execute AltiVec code. ¤ There is no question that a Macintosh is the best AltiVec development environment at the moment. ¤ Apple has additional tools and example code on a CD-ROM available only to Apple developers who have signed NDAs with Apple and who have access to Motorola G4 NDA information. ¤ Apple is planning to distribute an improved CD-ROM to all Apple developers soon.

Page 25 Motorola Tools

¤ Motorola has an AltiVec C compiler and assembler that will run within Apple’s MPW environment and within the Metrowerks IDE. However, there is no compelling reason to use Motorola’s tools instead of Apple’s. ¤ Motorola’s tools are also available for Sun, NT, and AIX environments. However, AltiVec emulators are not. Thus, until G4 ships, there is no compelling reason to use these variants. ¤ Motorola’s compilers are based upon source code licensed from Apogee. Motorola recently promised to turn its AltiVec enhancements over to Apogee for integration into Apogee’s commercial products. ¤ Motorola does not have a support team in place for any of its tools. Thus Motorola prefers customers use third-party development tools.

Page 26 Other Compilers

¤ Motorola has extended the popular Gnu C compiler to support AltiVec language extensions. The result is available to compiler suppliers upon request (as an example), but it is not supported by Motorola. ¤ Motorola has briefed most existing PowerPC compiler suppliers on AltiVec. At this time only Metrowerks has issued a public commitment. Motorola expects more public statements in conjunction with the G4 announcement.

Page 27 Simulator

¤ The G4 “simulator” is called Sim_G4. It is not really a simulator. Sim_G4 requires instruction traces in IBM’s TT6 format, created elsewhere. Sim_G4 is a performance prediction tool. ¤ The only reliable way to get an AltiVec TT6 trace is on a Macintosh. You will need Apple software not found on Apple’s web site (Motorola can provide it). Eventually Motorola will bundle another tool, called PowerSim, with Sim_G4 to enable TT6 generation on Suns and NT. ¤ Motorola is creating an emasculated version of Sim_G4 for distribution by Apple to its developers before the G4 announcement.

Page 28 Operating System Support

¤ AltiVec requires operating system support (to save and restore registers). However, adding such support is trivial. Do not expect existing PowerPC OS vendors to release any code before a G4 chip becomes available on somebody’s single board computer. Expect broad adoption very quickly thereafter. ¤ Support for Microsoft Windows CE is another story. Microsoft and Motorola have not yet committed to Windows CE for any high-end PowerPC. If that commitment happens, in the special G4 case, we must also wait for Apogee (Microsoft’s PowerPC C supplier) to announce and ship its AltiVec compilers. ¤ OS support for Java mode is complex and Motorola does not supply the necessary code. Thus do not expect broad support for Java mode from OS vendors.

Page 29 Libraries

¤ Motorola has several programmers working on optimized G4 libraries using the C compiler extensions. Motorola will soon post some of the resulting code on its web site. ¤ MCCI has been promised two Navy contracts to produce a “performance” subset of the new VSIP library standard. This library will become public property. ¤ MPI Software Technology has five programmers working on an optimized AltiVec VSIP library which they plan to resell. ¤ Motorola understands the importance of optimized libraries to potential customers attracted to AltiVec. Motorola has identified potential partners and is working to extract commitments.

Page 30 Library Issues

¤ No scalar math library (sin, pow, etc.) yet exists to complement the instruction set. Apple’s CD does have a 32-bit integer multiply, and Apple is working on more. Motorola’s manuals do show how to accomplish FP division and square root. ¤ The software support required for Java mode does not exist outside MacOS. ¤ Some existing DSP library API standards do not map optimally onto AltiVec. For example, AltiVec performance benefits if the real and imaginary parts of complex numbers live in separate vectors. Also, AltiVec performance benefits from API knobs that can invoke cache management instructions. Fortunately, some of the designers of the new VSIP standard had access to advanced information on AltiVec.

Page 31 Example Code

¤ The example code used in this presentation is available, complete with Makefiles and shell scripts. Apple’s MPW environment was used. Contact Motorola or Mercury. ¤ Elements of the Motorola library will become available as source code. Contact Motorola. ¤ The MCCI library will become public property. Contact MCCI. ¤ Apple has example code at .

Page 32 AltiVec Example Code ¤ Subroutine vadd many ways ¤ L1 cache management via stripmining & prefetch ¤ L2 cache management via transients ¤ Exercise: dot product

Page 33 Which vadd standard interface?

¤ We elected to use the interface from a library standard popular in the DSP world. It is based upon the library shipped with an early array processor from FPS. It is often called the FPS standard, or sometimes the SEG standard, after an organization which formally standardized it. Details at . ¤ Other popular library standards include BLAS (used in scientific markets), VSIP (a modern replacement for FPS), and The MathWorks (a popular commercial product).

Page 34 Subroutine vadd in C

#include "vadd.h"

void vadd (A, I, B, J, R, M, N)
    const float *A;           /* input vector */
    signed long int I;        /* stride of A */
    const float *B;           /* input vector */
    signed long int J;        /* stride of B */
    float *R;                 /* output vector */
    signed long int M;        /* stride of R */
    unsigned long int N;      /* number of elements */
{
    for (; N > 0; N--) {
        *R = *A + *B;
        A += I; B += J; R += M;
    }
}

Page 35 Subroutine vadd with AltiVec

/* A, B, R assumed (vector float) aligned            */
/* I, J, M assumed positive one                      */
/* N assumed a multiple of vec_step(vector float),   */
/* which is 4                                        */

#define vA ((const vector float *)A)
#define vB ((const vector float *)B)
#define vR ((vector float *)R)

for (N = N / vec_step(vector float); N > 0; N--) {
    *vR = vec_add (*vA, *vB);
    vA++; vB++; vR++;
}

Page 36 New Concepts on Prior Page

¤ AltiVec C extensions map into AltiVec instructions. For example, vec_add() maps into one of four AltiVec instructions (vaddubm, vadduhm, vadduwm, or vaddfp) depending upon the types of the arguments to vec_add(). ¤ AltiVec supports more than one kind of add operation. Thus we also have C extensions vec_addc() and vec_adds(). This is why Motorola did not use “+”. ¤ There are many new builtin C data types to match what AltiVec will support. This example uses “vector float” which is vector aligned and contains four singles. ¤ The builtin vec_step() maps into a constant. Its use enables source portability between AltiVec implementations with wider, or narrower, vector registers.

Page 37 Using the Documentation

¤ Use Motorola’s AltiVec Technology Programming Interface Manual to determine which AltiVec instruction each C extension will generate (given the types of the parameters that you use). The next slide shows a sample page from that document. ¤ Use Motorola’s AltiVec Technology Programming Environments Manual to learn more about any specific AltiVec instruction that the C compiler issues. The slide after next shows a sample page.

Page 38 4-1 . The arg2 and arg1

Chapter 4. AltiVec Operations

Table 4-1. vec_add(arg1, arg2) Table PRELIMINARY PRELIMINARY—SUBJECT TO CHANGE WITHOUT NOTICE Result arg1 arg2 Maps to vector signed char signed vector signed char vector signed char vector vaddubm vector unsigned char unsigned vector char unsigned vector char unsigned vector char unsigned vector char unsigned vector unsigned char vector bool char vector bool char vector vaddubm unsigned char vector vaddubm vaddubm arithmetic is modular for integer types. arithmetic is modular for integer 4.1.1 vec_add(arg1, arg2) 4.1.1 vec_add(arg1, Each element of the result is the sum of the corresponding elements of 40 40 Operators AltiVec 4.1 Generic and Specific name and defines the alphabetically by generic operation The set of tables is organized table describes a single generic operations. Each AltiVec permitted generic and specific AltiVec types for that generic set of argument a valid operation. Each line shows AltiVec instruction AltiVec specific types, and the argument operation, the result type for that set of vector unsigned char, vec_add(vector example, For generated for that set of arguments. unsigned char) maps to “vaddubm”. operator formed by adding AltiVec to use a specific In almost all cases, it is also permissible set of argument that line’s column with To to the name of the operation in the Maps “vec_” unsigned char) has the vector unsigned char, vec_vaddubm(vector example, types. For cases are A few unsigned char). vector char, unsigned as vec_add(vector same effect result to produce a different types has been chosen prohibited because that set of argument type. desperate The table is prohibited. permitted by this operation which is not explicitly Any The less to use operators in bizarre ways. if necessary, programmer can cast arguments, or modification of the programming an extension desperate programmer can request model! Chapter 4 Chapter Operations AltiVec MOTOROLA

Page 39 (Sample page: AltiVec Technology Programming Environments Manual, Chapter 6, “AltiVec Instructions”: the vaddfp, Vector Add Floating Point, entry. The four 32-bit floating-point values in vA are added to the four 32-bit floating-point values in vB; the four intermediate results are rounded to nearest and placed in vD. No other registers are altered. Includes the basic two-source-operand format diagram, Figure 6-11. Marked PRELIMINARY, subject to change without notice.)

Page 40 New Concepts

¤ AltiVec consumes floats four at a time. What if your N is not a multiple of four? (this is a floating point example, other AltiVec data types have different multiples associated with them). ¤ Working with individual AltiVec vector elements is easy, but one must pay attention to the AltiVec load/store alignment rules. ¤ There is no way to move data between a PowerPC’s register files except through memory.

Page 41 Allow any N

const unsigned long int extra = N % vec_step(vector float);

/* Put the ‘for’ loop from the previous slide here. */

/* Note: we load beyond the end of the array.   */
/* There is no chance of access violation.      */
/* Extra, random values harmless in !Java mode, */
/* may cause extra cycles in Java mode.         */
if (extra) {
    const vector float tempR = vec_add(*vA, *vB);
    unsigned long int i;
    for (i = 0; i < extra; i++)   /* copy only the valid elements */
        R[i] = ((const float *)&tempR)[i];
}

Page 42 New Concepts

¤ It is a very good idea to align AltiVec memory accesses. Doing so improves performance a tiny amount, and it requires fewer lines of code if you know that your data is always aligned. ¤ If you are stuck with an existing interface (like the DSP world’s FPS/SEG standard) which cannot guarantee vector alignment, the AltiVec permute unit will rescue you. ¤ If you don’t consider alignment within your program code, and you accidentally provide your program unaligned data, you will get wrong answers. You will not get any error message.

Page 43 AltiVec Load/Store Alignment

(Figure: memory shown as 16-byte rows from 0x00000000 through 0x000000B0, with examples of where loaded data lands in a vector register: a byte at 0x1E, a halfword at 0x2A, and a word at 0x54 each land at the register byte position matching their address offset within the quadword; a quadword access at 0xA0 fills the entire register.)

Page 44 Load a Single Element

vector register = vec_lde (byte index, typed address)

Example: if you provide the address of a float, and you provide an index of zero, you will grab that float from memory. Exactly where in the vector register your float lands depends upon alignment. All other floats in that register become “undefined”.

Page 45 New Concepts

¤ What do you do if you need to work with data that is not aligned AltiVec’s way? Use the permute unit to fix it. ¤ The permute unit is very general-purpose. For example, it can be used for a very fast small table lookup. ¤ Moving bytes around to fix up unaligned vectors is a special application. The bridge between the general unit and this special application is the lvsl and lvsr instructions. These instructions look at addresses and generate steering vectors for the permute unit.

Page 46 Vector Permute Unit

(Figure: vec_perm example. vA holds bytes 00 through 0F, vB holds bytes 10 through 1F. Each byte of the control vector vC, here 01 14 18 10 16 15 19 1A 1C 1C 1C 13 08 1D 1B 0E, selects one byte from the 32-byte concatenation of vA and vB into the corresponding byte of vD.)

Page 47 Unaligned Load

(Figure: an unaligned load. The desired vector straddles two 128-bit-aligned quadwords in memory. Load the most significant quad, MSQ (or reuse the previous iteration’s LSQ), load the least significant quad, LSQ, then permute the pair into a single register.)

Page 48 Allow any alignment for A

/* A any alignment; B, R still aligned               */
/* If A is aligned, an access violation can occur    */
/* I, J, M assumed positive one                      */
/* N a nonzero multiple of 4                         */

/* The vector we want straddles these two */
/* vectors in memory                      */
vector float msqA, lsqA;
/* big-endian permute vector */
const vector unsigned char permA = vec_lvsl(0, A);

msqA = *vA;   /* low-order bits masked; gets the MS quad */
for (N = N / vec_step(vector float); N > 0; N--) {
    A += vec_step(vector float);
    lsqA = *vA;
    *vR = vec_add(vec_perm(msqA, lsqA, permA), *vB);
    vB++; vR++;
    msqA = lsqA;
}

Page 49 New Concepts

¤ Unaligned stores use the same concept, but backwards. Instead of extracting data from two aligned locations, one must mask data and merge it into two aligned locations. ¤ The lvsl and lvsr instructions help build the necessary mask. The vsel instruction does the merge.

Page 50 Unaligned Stores

(Figure: an unaligned store. The result elements 1 2 3 4 are rotated by the permute unit into 4 1 2 3; the target’s aligned quadwords are loaded, 4 1 2 3 (or the result of the previous rotate is reused); a masked merge then writes the elements across the two 128-bit-aligned quadwords in memory.)

Page 51 Allow any alignment for R

/* R any alignment; A, B aligned  */
/* I, J, M assumed positive one   */
/* N a nonzero multiple of 4      */
vector float prevR, currR;

const vector unsigned char permR = vec_lvsr(0, R);  /* big-endian */

const vector unsigned char maskR =
    vec_perm(vec_splat_u8(0), vec_splat_u8(-1), permR);

prevR = *vR;   /* read first quad for merge */
for (N = N / vec_step(vector float); N > 0; N--) {
    currR = vec_add(*vA, *vB);
    /* rotate data right */
    currR = vec_perm(currR, currR, permR);
    *vR = vec_sel(prevR, currR, (vector bool long)maskR);
    prevR = currR;
    vA++; vB++;
    R += vec_step(vector float);
}
if ((unsigned int)R & 0xF) {   /* skip if R was aligned */
    *vR = vec_sel(currR, *vR, (vector bool long)maskR);
}

Page 52 New Concepts

¤ The “LRU” option on the AltiVec load and store instructions allows programmers to influence the behavior of both the L1 and L2 cache. ¤ LRU enables something called vector “chaining”. This is where programmers tag vectors that are used in one part of a program, and then not used again for a long time. It is good policy not to move such vectors into L2. It is a good idea to evict such vectors from L1 before any other L1 data.

Page 53 Chaining

/* R = A + B + C */
vadd ( A, 1, B, 1, R, 1, N, MMC);
vadd ( R, 1, C, 1, R, 1, N, MMM);

¤ Note the new flags to vadd. In the first instance, flag “MMC” tells vadd to load vector A using the LRU option (lvxl instead of lvx), vector B is also loaded using the LRU option, and vector R is stored normally (this keeps R in cache for reuse in the second vadd). M stands for “memory”, C stands for “cache”. ¤ In the second vadd, vector R is read normally, but stored using the LRU option. This store decision protects L2 from an unnecessary update.

Page 54 Vadd with LRU Flag Support

/* A, B, R assumed (vector float) aligned */
/* I, J, M assumed positive one           */
/* N assumed a multiple of 4              */

#define do_it( loadA, loadB, storeR )                           \
    for (N = N / vec_step(vector float); N > 0; N--) {          \
        storeR(vec_add(loadA(0, vA), loadB(0, vB)), 0, vR);     \
        vA++; vB++; vR++;                                       \
    }                                                           \
    break;

switch (Flags) {
    case CCC: /* 0b000 */ do_it(vec_ld,  vec_ld,  vec_st);
    case CCM: /* 0b001 */ do_it(vec_ld,  vec_ld,  vec_stl);
    case CMC: /* 0b010 */ do_it(vec_ld,  vec_ldl, vec_st);
    case CMM: /* 0b011 */ do_it(vec_ld,  vec_ldl, vec_stl);
    case MCC: /* 0b100 */ do_it(vec_ldl, vec_ld,  vec_st);
    case MCM: /* 0b101 */ do_it(vec_ldl, vec_ld,  vec_stl);
    case MMC: /* 0b110 */ do_it(vec_ldl, vec_ldl, vec_st);
    case MMM: /* 0b111 */ do_it(vec_ldl, vec_ldl, vec_stl);
}

Page 55 New Concepts

¤ If N is large, the benefits of chaining decrease. This occurs whenever the intermediate vectors are too large to fit into cache. ¤ Cache “stripmining” is the answer. Here we build and use intermediate vectors in pieces that will fit into cache. ¤ The size of these pieces depends upon the number of vectors, as well as the size and associativity of the caches involved. The largest piece size selected is usually the MMU page size (4096 bytes for a PowerPC). ¤ When stripmining it helps if your vectors all start with page alignment.

Page 56 G4’s L1 Data Cache

¤ 8-way Set Associative ¤ 32 byte line size (8 floats or 2 vectors) ¤ 32K (8 pages each with 1024 floats)

Page 57 L1 Cache Stripmining

/* R = A + B + C */
const unsigned long int page = 4096 / sizeof(float),
                        extra = N % page;
unsigned long int strip;
const float *pA = A, *pB = B, *pC = C;
float *pR = R;

for (strip = N / page; strip > 0; strip--) {
    vadd(pA, 1, pB, 1, pR, 1, page, MMC);
    vadd(pR, 1, pC, 1, pR, 1, page, MMM);
    pA += page; pB += page; pC += page; pR += page;
}

if (extra) {
    vadd(pA, 1, pB, 1, pR, 1, extra, MMC);
    vadd(pR, 1, pC, 1, pR, 1, extra, MMM);
}

Page 58 New Concepts

¤ Before AltiVec, when a PowerPC processor executed with data out of cache, the memory bus might sit idle. This is not a good use of resources. Ideally one wants to use these otherwise idle memory cycles to fetch data into cache that the processor will need later. ¤ The AltiVec dst family of instructions makes this happen. In DSP terms, the dst instructions allow “double buffering”, where a DMA engine fetches the “next” buffer while the processor uses the “current” buffer. ¤ Dst supports four prefetch channels.

Page 59 Data Stream Touch Instructions

(Figure: dst instruction operands. The control register rB encodes BlockSize in bits 3-7, BlockCount in bits 8-15, and a signed BlockStride in bits 16-31; bits 0-2 are zero. rA supplies the StartingAddress. The resulting memory stream consists of BlockCount blocks, each BlockSize vectors long, with consecutive block addresses BlockStride bytes apart.)

/* block size in vectors, number of blocks to prefetch, */
/* signed stride in bytes                               */
#define dstdata(size, count, stride)          \
    (  ((unsigned char)(size) & 0x1F) << 24   \
     | (unsigned char)(count) << 16           \
     | ((unsigned short)(signed short)(stride)))

Page 60 Prefetch

/* R = A + B + C */
const unsigned long int page = 4096 / sizeof(float),
                        extra = N % page,
                        dstrCmd = dstdata(1, page / vec_step(vector float),
                                          sizeof(vector float));
unsigned long int strip;
const float *pA = A, *pB = B, *pC = C;
float *pR = R;

for (strip = N / page; strip > 0; strip--) {
    vec_dstt(pC, dstrCmd, 0);
    vadd(pA, 1, pB, 1, pR, 1, page, MMC);
    vec_dstt(pA += page, dstrCmd, 0);
    vec_dstt(pB += page, dstrCmd, 1);
    vadd(pR, 1, pC, 1, pR, 1, page, MMM);
    vec_dssall();
    pC += page; pR += page;
}
if (extra) {
    vec_dstt(pC, dstdata(1, extra / vec_step(vector float),
                         sizeof(vector float)), 0);
    vadd(pA, 1, pB, 1, pR, 1, extra, MMC);
    vadd(pR, 1, pC, 1, pR, 1, extra, MMM);
    vec_dssall();
}

Page 61 dstst and dststt ¤ Named “for Store” ¤ Actually used only for read-modify-write ¤ With G4, avoids bus KILL broadcasts

Page 62 Transient Loads/Stores ¤ Data loaded with the transient option will never enter L2 (G4 implementation detail, not AltiVec). ¤ Use the transient option to protect data already in the L2 cache. ¤ Use transient on stores that you know will be quickly reused by an external DMA engine. ¤ All LRU instructions are also transient. ¤ The dstt and dststt prefetch instructions mark prefetched data as transient.

Page 63 G4 L2 Data Cache ¤ Two-way associative ¤ Combined instruction and data cache ¤ 512KB, 1MB, or 2 MB ¤ Castout victim cache for L1 ¤ Managed through selective use of transient load/stores ¤ Managed using careful placement of vectors within main memory (the Motorola C compiler’s -Z, or data skewing option exists for this reason)

Page 64 Exercise: Dot Product

#include "dotpr.h"

void dotpr (A, I, B, J, R, N)
    const float *A;           /* input vector */
    signed long int I;        /* stride of A */
    const float *B;           /* input vector */
    signed long int J;        /* stride of B */
    float *R;                 /* output element */
    unsigned long int N;      /* number of elements */
{
    float sum = 0;
    for (; N > 0; N--) {
        sum += *A * *B;
        A += I; B += J;
    }
    *R = sum;
}

Page 65 Dot Product Hints

¤ Use vec_madd() to fuse the multiply with the add. ¤ Use vec_sld() to help sum the four elements in your final vector (if you were writing an integer dot product you could alternatively use an AltiVec instruction that sums across). ¤ What if N is not a multiple of four? Hint: use vec_sro(). ¤ There is dot product example code on the CD-ROM. On G4 my inner loop uses 5 cycles to process two vec_madd()s.

Page 66 G4 Performance Tuning ¤ Reading instruction traces to find stalls ¤ Loop unrolling ¤ Hiding load latency ¤ Compiler Performance

Page 67 Subroutine vadd in Assembler

INCLUDE 'vaddAsm.h'

.vadd
    slwi   I,I,2        ; I * sizeof(*A)
    slwi   J,J,2        ; J * sizeof(*B)
    slwi   M,M,2        ; M * sizeof(*R)
    cmplwi N,0          ; compare N to zero
    beqlr               ; return if N = 0
    mtctr  N            ; N now in CTR
    subf   vA,I,vA      ; prime the loop
    subf   vB,J,vB
    subf   vR,M,vR
@loop1
    lfsux  f0,vA,I      ; load *A, A += I
    lfsux  f1,vB,J      ; load *B, B += J
    fadds  f0,f0,f1     ; *A + *B
    stfsux f0,vR,M      ; store *R, R += M
    bdnz+  @loop1       ; N -= 1, loop while nonzero
    blr+                ; return
END

Page 68 New Concepts

¤ There are an infinite number of ways to write any given program. Some ways execute faster than others. Often faster programs require more lines of code, or access memory locations beyond the end of an array. ¤ To deliver the ultimate performance, one must understand what processor “stalls” are and how to avoid them. ¤ Motorola has a tool called Sim_G4 which will show stalls. The input to Sim_G4 is a file of instruction traces in a format called TT6. ¤ You can get a trace from a Macintosh, or on any host once Motorola’s PowerSim tool becomes available. To examine TT6 files, IBM has a “TT6 profiler” tool and Apple has a “T-VIZ” TT6 profiler tool.

Page 69 Sim_G4 Instruction Trace

16 E E F D R R F F 20:( 12) lfsux F0,R3,R4 CB 16 - 17 F E F E D - R R 21:( 12) lfsux F1,R5,R6 MAX 17 - D 22:( 13) fadds F0,F0,F1 18 R E F F E D D - 23:( 13) stfsux F0,R7,R8 MAX 18 - F 24:( 14) bc+ 16,0,0xfff0 19 D R R F F D E F 25:( 15) lfsux F0,R3,R4 CB 19 - 20 E D - R R E E F 26:( 17) lfsux F1,R5,R6 MAX 20 - D 27:( 18) fadds F0,F0,F1 21 F E D D - E E F 28:( 18) stfsux F0,R7,R8 MAX 21 - F 29:( 19) bc+ 16,0,0xfff0 22 F F D E F F E F Instruction Number Clock cycle when fetched CB 22 23 F F E E F R E F CB 23 Sim_G4 clock Pipeline Dispatch Status Status

Shows one loop iteration, caches warmed: 8 cycles for 2 floats (4 cycles/float).

Page 70 Sim_G4 Pipeline Status

Scalar (not VEC) Fixed Point Instruction

[Diagram: three scalar fixed-point instructions flowing through four stages. Fetch; Dispatch (RBs & CB assigned); Execute/Complete (on completion, RBs & CB become available to reassign); Writeback.]

The fetch stage appears as a cycle number; it is not indicated in the pipeline status area.
'D' indicates a dispatched instruction, possibly executing or waiting on a resource conflict.
'E' indicates execution.
'F' indicates "finished," i.e. ready to complete but waiting on a branch or resource conflict.
'R' indicates writeback and instruction retirement.
'@' indicates a folded branch instruction.

Page 71 Scalar Floating Point Pipeline
1. fetch
2. dispatch (RBs and CB assigned)
3. execute
4. execute
5. execute (results go into RB here)
6. complete (if exceptions disabled)
7. complete (if exceptions enabled)
7. or 8. writeback

Dependent instructions can start (see 5) before the RB and CB resources are released (in 6 or 7).

Page 72 AltiVec Floating Point Pipeline
1. fetch
2. dispatch
3. execute
4. execute
5. execute
6. execute (complete if non-Java mode)
7. writeback (non-Java) or complete (Java)
8. writeback (Java mode)

Page 73 G4 Resources
¤ Completion buffer queue eight deep
¤ Six rename buffers per register file
¤ Three register files (GPR, FPR, VR)
¤ Six instruction buffers
¤ Two-entry “finished” store queue
¤ Four-entry completed store queue
¤ Cache described previously

Page 74 Sim_G4 Dispatch Status

CB    no completion buffer available
DLS   cannot dispatch two loads nor two stores in a single cycle
DS    dispatch serialization
FPR   need a rename register for a scalar floating point register
FPUB  scalar floating point unit busy
FXUB  scalar fixed point unit busy
GPR   need a rename register for a scalar integer register
IB    no instruction buffer available
LSUB  LSU busy
LSUF  LSU full
MAX   maximum dispatch obtained
MD    mispredict drain
SPEC  maximum speculation reached
SUB   system unit busy
TS    tail serialization
VAUB  AltiVec arithmetic unit busy
VAUF  AltiVec arithmetic unit not empty
VPUB  AltiVec permute unit busy
VPUF  AltiVec permute unit not empty
VR    need a rename register for an AltiVec register

Page 75 New Concepts

¤ We just saw how programmers can “see” processor stalls. We will now demonstrate techniques for removing them.
¤ The first step is to introduce a simple AltiVec vadd example that we can build upon.

Page 76 Simple AltiVec Assembly Code

; Assume A, B, R vector aligned
; Assume I, J, M one
; Assume N a nonzero multiple of 4
        INCLUDE 'vaddAsm.h'
off     SET    r10         ; No lvux instruction
.vadd   srwi   N,N,2       ; N = N/4
        mtctr  N           ; N now in CTR
        li     off,0       ; Point to first vector
@loop4  lvxl   v0,off,vA   ; Load *A
        lvxl   v1,off,vB   ; Load *B
        vaddfp v0,v0,v1    ; *A + *B
        stvx   v0,off,vR   ; Store *R
        addi   off,off,16  ; Point to next vector
        bdnz+  @loop4      ; N-=1, loop nonzero
        blr+               ; return
        END

Page 77 Cycles from Simple AltiVec Code
¤ Nine cycles gives eight floats (1.125 cycles/float)

Page 78 New Concepts

¤ “Unrolling” a loop amortizes the loop overhead over a longer period of computation. In this vadd example the loop overhead is the combination of the addi and bdnz+ instructions.
¤ Here is unrolling by two:
    start loop:
        add one vector
        add another vector
    loop back to top

Page 79 Loop Unrolling (by Two)

; Assume A, B, R vector aligned
; Assume I, J, M one
; Assume N a nonzero multiple of 8
        INCLUDE 'vaddAsm.h'
off     SET    r10         ; off incremented by two vectors
vA2     SET    r4          ; Points to A in second vector
vB2     SET    r6          ; Points to B in second vector
vR2     SET    r8          ; Points to R in second vector
.vadd   srwi   N,N,3       ; N = N/8
        mtctr  N           ; N now in CTR
        addi   vA2,vA,16   ; A2 is vector after A
        addi   vB2,vB,16   ; B2 is vector after B
        addi   vR2,vR,16   ; R2 is vector after R
        li     off,0       ; Point to first vector pair
@loop8  lvxl   v0,off,vA   ; Load *A
        lvxl   v1,off,vB   ; Load *B
        vaddfp v0,v0,v1    ; *A + *B
        stvx   v0,off,vR   ; Store *R
        lvxl   v2,off,vA2  ; Load *A2
        lvxl   v3,off,vB2  ; Load *B2
        vaddfp v2,v2,v3    ; *A2 + *B2
        stvx   v2,off,vR2  ; Store *R2
        addi   off,off,32  ; Point at next two vectors
        bdnz+  @loop8      ; N-=1, loop nonzero
        blr+               ; return
        END

Page 80 Cycles AltiVec Hand Unrolled Two
¤ Eight cycles gives eight floats (1.0 cycles/float)

Page 81 New Concepts

¤ You can save additional cycles by rearranging the code to remove dependency stalls (hide load latency).
¤ Here we interleave like this:
    loads for add#1
    start loop:
        loads for add#2
        add#1
        loads for add#3
        add#2
        loads for add#4
        etc…
¤ You could do the same thing for stores, but in this case there is no gain.

Page 82 Hide Load Latency

; This routine loads one vector element beyond the end of both vA & vB. ; An access violation can result.

; Assume A, B, R vector aligned
; Assume I, J, M one
; Assume N a nonzero multiple of 8
        INCLUDE 'vaddAsm.h'

off     SET    r10         ; off incremented by two vectors
vA2     SET    r4          ; Points to vA in second vector
vB2     SET    r6          ; Points to vB in second vector
vR2     SET    r8          ; Points to vR in second vector

.vadd   srwi   N,N,3       ; N = N/8
        mtctr  N           ; N now in CTR
        lvxl   v0,0,vA     ; Load *vA (to prime loop)
        addi   vA2,vA,16   ; vA2 is vector after vA
        lvxl   v1,0,vB     ; Load *vB
        addi   vB2,vB,16   ; vB2 is vector after vB
        addi   vR2,vR,16   ; vR2 is vector after vR
        addi   vA,vA,32    ; Get vA, vB ready for next loop iteration
        addi   vB,vB,32
        li     off,0       ; Point to first vector pair
@loop8  lvxl   v2,off,vA2  ; Load *A2
        lvxl   v3,off,vB2  ; Load *B2
        vaddfp v0,v0,v1    ; *vA + *vB
        stvx   v0,off,vR   ; Store *vR
        lvxl   v0,off,vA   ; Load *vA (for next iteration)
        lvxl   v1,off,vB   ; Load *vB (for next iteration)
        vaddfp v2,v2,v3    ; *A2 + *B2
        stvx   v2,off,vR2  ; Store *R2
        addi   off,off,32  ; Point at next two vectors
        bdnz+  @loop8      ; N-=1, loop nonzero
        blr+               ; return
        END

Page 83 Hide Load Latency
¤ Six cycles gives eight floats (0.75 cycles/float)

Page 84 New Concepts

¤ Often unrolling further hides additional cycles. That is not the case with vadd: 24 cycles gives 32 floats (0.75 cycles/float again).
¤ Unrolling by eight is about the most you can do for most loops before you run out of general-purpose registers.

Page 85 New Concepts

¤ You can unroll loops and overlap loads/stores using the C compiler too.
¤ Compiler performance varies, but you can get results equivalent to hand code.
¤ The example that follows is hand unrolled by eight in C. This much unrolling stresses the C compiler because almost every general-purpose register is required (including the registers holding the function arguments).
¤ Some C compilers can unroll automatically. Given the simple C code from “Subroutine vadd with AltiVec,” the Motorola compiler unrolled by eight automatically, but with poor resulting performance.

Page 86 Compiler Hand Unrolled Eight

/* A, B, R assumed (vector float) aligned */
/* I, J, M assumed positive one */
/* N assumed a multiple of 32 */
/* Will access one vector beyond A, B, and R */
/* Uses 16 vector registers (of 32) */
/* Uses 22 general purpose registers plus 7 for parameters */
unsigned int off=0;
const vector float *const vA2=vA +1, *const vB2=vB +1;
const vector float *const vA3=vA2+1, *const vB3=vB2+1;
const vector float *const vA4=vA3+1, *const vB4=vB3+1;
const vector float *const vA5=vA4+1, *const vB5=vB4+1;
const vector float *const vA6=vA5+1, *const vB6=vB5+1;
const vector float *const vA7=vA6+1, *const vB7=vB6+1;
const vector float *const vA8=vA7+1, *const vB8=vB7+1;
vector float *const vR2=vR +1;
vector float *const vR3=vR2+1;
vector float *const vR4=vR3+1;
vector float *const vR5=vR4+1;
vector float *const vR6=vR5+1;
vector float *const vR7=vR6+1;
vector float *const vR8=vR7+1;
vector float valA, valA2, valA3, valA4, valA5, valA6, valA7, valA8;
vector float valB, valB2, valB3, valB4, valB5, valB6, valB7, valB8;
valA = vec_ld(0,vA); valB = vec_ld(0,vB);
vA += 8; vB += 8;
for (N=N/(vec_step(vector float)*8); N>0; N--) {
    valA2 = vec_ld(off,vA2); valB2 = vec_ld(off,vB2);
    vec_st(vec_add(valA,  valB),  off,vR);
    valA3 = vec_ld(off,vA3); valB3 = vec_ld(off,vB3);
    vec_st(vec_add(valA2, valB2), off,vR2);
    valA4 = vec_ld(off,vA4); valB4 = vec_ld(off,vB4);
    vec_st(vec_add(valA3, valB3), off,vR3);
    valA5 = vec_ld(off,vA5); valB5 = vec_ld(off,vB5);
    vec_st(vec_add(valA4, valB4), off,vR4);
    valA6 = vec_ld(off,vA6); valB6 = vec_ld(off,vB6);
    vec_st(vec_add(valA5, valB5), off,vR5);
    valA7 = vec_ld(off,vA7); valB7 = vec_ld(off,vB7);
    vec_st(vec_add(valA6, valB6), off,vR6);
    valA8 = vec_ld(off,vA8); valB8 = vec_ld(off,vB8);
    vec_st(vec_add(valA7, valB7), off,vR7);
    valA = vec_ld(off,vA); valB = vec_ld(off,vB);
    vec_st(vec_add(valA8, valB8), off,vR8);
    off += vec_step(vector float)*sizeof(float)*8;
}

Page 87 C Compiler Performance

¤ Apple MrC (did not allocate enough registers): 31 cycles gives 32 floats (0.97)
¤ Apple MrC (auto unroll): 32 cycles gives 32 floats (1.00)
¤ Motorola mcc (instruction scheduling off): 24 cycles gives 32 floats (0.75) — same as hand code
¤ Motorola mcc (auto unrolled eight, scheduling off): 32 cycles gives 32 floats (1.00)
¤ Metrowerks (instruction scheduling off): 24 cycles gives 32 floats (0.75) — same as hand code
¤ Metrowerks (auto unrolled, scheduling on): 37 cycles gives 32 floats (1.16)

Page 88 C Hand Unrolled Two, Any N

/* A, B, R assumed (vector float) aligned */
/* I, J, M assumed positive one */
/* N anything */
/* Will access one vector beyond A, B, and R */
/* if N is a multiple of 4 */
unsigned long int off=0, extra=N%8;
const vector float *const vA2=vA+1, *const vB2=vB+1;
vector float *const vR2=vR+1, valA, valA2, valB, valB2;
valA = vec_ld(0,vA); valB = vec_ld(0,vB);
vA+=2; vB+=2;
for (N=N/(vec_step(vector float)*2); N>0; N--) {
    valA2 = vec_ld(off,vA2); valB2 = vec_ld(off,vB2);
    vec_st(vec_add(valA, valB), off,vR);
    valA = vec_ld(off,vA); valB = vec_ld(off,vB);
    vec_st(vec_add(valA2, valB2), off,vR2);
    off += vec_step(vector float) * sizeof(float) * 2;
}
if (extra>=vec_step(vector float)) {
    vec_st(vec_add(valA, valB), off, vR);
    valA = vec_ld(off,vA2); valB = vec_ld(off,vB2);
    extra-=vec_step(vector float);
    off+=vec_step(vector float) * sizeof(float);
}
if (extra) {
    unsigned long int i;
    const vector float valR = vec_add(valA, valB);
    for (i=0; i<extra; i++)   /* copy the last 1-3 floats one at a time */
        ((float *)vR)[off/sizeof(float) + i] = ((const float *)&valR)[i];
}

Craig Lund Local Knowledge 3 Langley Road Durham, NH 03824-3424 Tel 603 868 2300 Fax 603 868 2301

Local Knowledge Page 90