June 2012

Freescale, the Freescale logo, AltiVec, C-5, CodeTEST, CodeWarrior, ColdFire, C-Ware, the Energy Efficient Solutions logo, mobileGT, PowerQUICC, QorIQ, StarCore and Symphony are trademarks of Freescale Semiconductor, Inc., Reg. U.S. Pat. & Tm. Off. Airfast, BeeKit, BeeStack, ColdFire+, CoreNet, Flexis, Kinetis, MagniV, MXC, Platform in a Package, Processor Expert, QorIQ Qonverge, QUICC Engine, Ready Play, SafeAssure, the SafeAssure logo, SMARTMOS, TurboLink, VortiQa and Xtrinsic are trademarks of Freescale Semiconductor, Inc. All other product or service names are the property of their respective owners. © 2012 Freescale Semiconductor, Inc.

Next-generation QorIQ products built on e6500 Power Architecture® technology include an enhanced AltiVec vector processor.

This session describes new instructions that extend support for misaligned vectors and for handling head and tail vectors, and the long-awaited capability to move from general purpose to vector registers. Learn about the performance improvements resulting from enhancements to both AltiVec and the e6500 core.

This session complements AltiVec Vector Processor Introduction for Newcomers (Part 1), but those who are already familiar with SIMD and AltiVec processors can attend it as a stand-alone session.

• Overview of AltiVec enhancements in e6500 core
• More detail on each change
• Some limitations
• Unaligned loads and stores in-depth - memcpy

• AltiVec technology in the e6500 core is essentially the same as AltiVec technology from the 74xx processors, except for the following:
• Adds new instructions for computing absolute differences
  − vabsdub – absolute differences (byte)
  − vabsduh – absolute differences (halfword)
  − vabsduw – absolute differences (word)
• Adds new instructions for moving data from GPRs to VRs
  − mvidsplt <64> and mviwsplt move data from 2 GPRs into a vector register
• Adds new instructions for dealing with misaligned vectors more easily
  − lvtlx[l], lvtrx[l], stvflx[l], stvfrx[l] – load/store vector to/from left/right [LRU]
  − lvswx[l], stvswx[l] – load/store vector with left/right swap [LRU]
• Adds new instructions for dealing with elements of vectors
  − lvexbx, stvexbx – load/store vector element indexed byte
  − lvexhx, stvexhx – load/store vector element indexed halfword
  − lvexwx, stvexwx – load/store vector element indexed word
  − These allow loading/storing of arbitrary elements to arbitrary addresses

| Feature | Original AltiVec Definition | Power ISA AltiVec Definition | Description |
|---|---|---|---|
| Little-endian | Supported | Not Supported | Little-endian byte ordering is not supported on Power ISA AltiVec definition. |
| Data stream instructions | Supported | Not Supported | dss, dssall, dst, dstt, dstst, and dststt instructions are not supported on Power ISA AltiVec definition. |

| Feature | Original AltiVec Definition | Power ISA AltiVec Definition | Description |
|---|---|---|---|
| IVORs | Not Supported | Supported | IVORs added for AltiVec unavailable interrupt and AltiVec assist interrupt. |
| Move from GPR to VR | Not Supported | Supported | mvidsplt <64> and mviwsplt instructions move data from 2 GPRs into a vector register. |
| Absolute differences | Not Supported | Supported | Absolute difference instructions vabsdub, vabsduh, and vabsduw compute the unsigned absolute differences. These are useful for motion estimation in video processing. |

| Feature | Original AltiVec Definition | Power ISA AltiVec Definition | Description |
|---|---|---|---|
| Extended support for misaligned vectors | Not Supported | Supported | Load vector to left and right (lvtlx[l], lvtrx[l]), load vector with left-right swap (lvswx[l]), load vector for swap merge (lvsm). Store vector from left and right (stvflx[l], stvfrx[l]), store vector with left-right swap (stvswx[l]). |
| Extended support for handling head and tail of vectors | Not Supported | Supported | Load vector element indexed [byte, half-word, word] indexed (lvexbx, lvexhx, lvexwx) loads specified elements from an arbitrary address, zeroing the rest of the register. Store vector element indexed [byte, half-word, word] indexed (stvexbx, stvexhx, stvexwx) stores specified elements to an arbitrary address. |

| Feature | Original AltiVec Definition | Power ISA AltiVec Definition | Description |
|---|---|---|---|
| External PID instructions for loading and storing VRs | Not Supported | Supported | Load and store vector by external PID (lvepx[l], stvepx[l]) for moving data efficiently across address spaces. |

• dss (Data Stream Stop), dssall (stop ALL prefetch engines), dst (Data Stream Touch), dstt (Touch Transient), dstst (Data Stream Touch for STore), and dststt (Touch for Store Transient) instructions were present in the first definition of AltiVec technology for PowerPC processors. These instructions provided software-initiated streaming prefetch controls.
• In Power ISA these instructions are no longer defined, and streaming is performed by variants of the dcbt instruction or by hardware prefetchers. Cache stashing could be considered an alternative as well.
• For Freescale EIS, these instructions are treated as no-ops since they may be present in older code and do not change architectural state.

• IVORs added for AltiVec unavailable interrupt (IVOR32) and AltiVec assist interrupt (IVOR33).
• Original AltiVec technology on the e600 core included an AltiVec unavailable exception. IVORs are the equivalent exception mechanism in e500-based cores.
• The AltiVec unavailable interrupt occurs when an attempt is made to execute an AltiVec instruction and MSR[SPV] = 0.
  − This can be useful in reducing context switch overhead by not saving AltiVec registers unless a process actually uses AltiVec instructions.
• The AltiVec assist interrupt occurs when no higher priority exception exists and a denormalized floating-point number is an operand to an AltiVec floating-point instruction requiring software assist. An AltiVec assist exception is presented to the interrupt mechanism. The interrupt handler is required to emulate the interrupt-causing instruction to provide correct results with the denormalized input.
  − In original AltiVec Java mode, denormalized inputs were accepted and gradual underflow results were generated. In non-Java mode, denormalized inputs are (quietly) flushed to the correctly signed zero (±0) before being used in an instruction.
  − In general, AltiVec vector instructions generate very few exceptions.
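The non-Java-mode rule for denormalized inputs can be illustrated with a small scalar sketch in C. It is not AltiVec code and not the assist handler itself; it only models "flush a denormal input to the correctly signed zero" for one single-precision value. The function names are hypothetical, and the sketch assumes that float is an IEEE 754 binary32 value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret between a float and its 32-bit pattern (assumes IEEE 754 binary32). */
uint32_t float_bits(float x) { uint32_t b; memcpy(&b, &x, sizeof b); return b; }
float bits_float(uint32_t b) { float x; memcpy(&x, &b, sizeof x); return x; }

/* Model of the non-Java-mode input rule: a denormal operand becomes a zero
 * with the same sign; every other value passes through unchanged. */
float flush_denormal_input(float x)
{
    uint32_t bits = float_bits(x);
    uint32_t exponent = bits & 0x7F800000u;
    uint32_t fraction = bits & 0x007FFFFFu;
    if (exponent == 0 && fraction != 0)
        bits &= 0x80000000u;            /* keep only the sign bit */
    return bits_float(bits);
}

/* Self-check: smallest positive denormal, largest negative-signed denormal, and 1.0. */
void check_flush_denormal_input(void)
{
    assert(sizeof(float) == sizeof(uint32_t));
    assert(float_bits(flush_denormal_input(bits_float(0x00000001u))) == 0x00000000u);
    assert(float_bits(flush_denormal_input(bits_float(0x807FFFFFu))) == 0x80000000u);
    assert(float_bits(flush_denormal_input(1.0f)) == 0x3F800000u);
}
```

Calling check_flush_denormal_input() exercises the three cases; normal numbers, zeros, infinities and NaNs are left untouched by this model.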

• The lack of a direct move from GPRs to VRs was a source of frustration in original AltiVec technology. The explanation was that the interconnect between GPRs and VRs was not warranted when data could be moved quickly via store and load from L1 cache. Still, the capability was desired by many customers.
• These new instructions will make it simpler to program with AltiVec.
  − mvidsplt vD,rA,rB - Move to Vector from Integer Double Word and Splat <64>
    Place the contents of rA into the high-order 64 bits of vD and place the contents of rB into the low-order 64 bits of vD.
  − mviwsplt vD,rA,rB - Move to Vector from Integer Word and Splat
    Place the contents of the low-order 32 bits of rA concatenated with the low-order 32 bits of rB into both the low-order and high-order 64 bits of vD.
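As a rough illustration of the two descriptions above, the following C sketch models the resulting register contents as 16 bytes, with byte 0 as the high-order byte. The type vec128 and the functions mvidsplt_model and mviwsplt_model are hypothetical names for this sketch, not compiler intrinsics.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t b[16]; } vec128;   /* b[0] is the high-order byte */

/* Write a 64-bit value into 8 bytes, most significant byte first. */
static void store_be64(uint8_t *dst, uint64_t value)
{
    for (int i = 7; i >= 0; --i) {
        dst[i] = (uint8_t)(value & 0xFFu);
        value >>= 8;
    }
}

/* mvidsplt: rA fills the high-order 64 bits, rB the low-order 64 bits. */
vec128 mvidsplt_model(uint64_t ra, uint64_t rb)
{
    vec128 v;
    store_be64(&v.b[0], ra);
    store_be64(&v.b[8], rb);
    return v;
}

/* mviwsplt: low word of rA || low word of rB, replicated in both halves. */
vec128 mviwsplt_model(uint64_t ra, uint64_t rb)
{
    vec128 v;
    uint64_t pair = ((ra & 0xFFFFFFFFull) << 32) | (rb & 0xFFFFFFFFull);
    store_be64(&v.b[0], pair);
    store_be64(&v.b[8], pair);
    return v;
}

/* Self-check with arbitrary example values. */
void check_gpr_to_vr_models(void)
{
    vec128 d = mvidsplt_model(0x0011223344556677ull, 0x8899AABBCCDDEEFFull);
    assert(d.b[0] == 0x00 && d.b[7] == 0x77 && d.b[8] == 0x88 && d.b[15] == 0xFF);

    vec128 w = mviwsplt_model(0xAAAAAAAA11223344ull, 0xBBBBBBBB55667788ull);
    assert(w.b[0] == 0x11 && w.b[3] == 0x44 && w.b[4] == 0x55 && w.b[7] == 0x88);
    assert(w.b[8] == 0x11 && w.b[15] == 0x88);
}
```

In the second example the high-order words of both GPRs (0xAAAAAAAA and 0xBBBBBBBB) do not appear in the result, which is the point of the "low-order 32 bits" wording.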

• vabsdu{b|h|w} vD,vA,vB - Vector Absolute Difference Unsigned {Byte | Half Word | Word}
  − Each integer element in vB is subtracted from the corresponding integer element in vA. The elements of vA and vB are treated as unsigned integers. The absolute value of the result is placed into the corresponding element of vD.
• These are useful for motion estimation in video processing.
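A scalar C sketch of the byte variant may make the element-wise behavior concrete. The names vabsdub_model and sad16 are hypothetical; sad16 shows the kind of sum of absolute differences that a motion-estimation kernel accumulates, using a plain scalar reduction rather than any particular AltiVec summing instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of vabsdub: per-byte unsigned absolute difference. */
void vabsdub_model(uint8_t vd[16], const uint8_t va[16], const uint8_t vb[16])
{
    for (size_t i = 0; i < 16; ++i)
        vd[i] = (uint8_t)(va[i] > vb[i] ? va[i] - vb[i] : vb[i] - va[i]);
}

/* Sum of absolute differences over one 16-byte row of pixels. */
unsigned sad16(const uint8_t a[16], const uint8_t b[16])
{
    uint8_t d[16];
    unsigned sum = 0;
    vabsdub_model(d, a, b);
    for (size_t i = 0; i < 16; ++i)
        sum += d[i];
    return sum;
}

/* Self-check with made-up pixel values. */
void check_vabsdub_model(void)
{
    uint8_t a[16], b[16], d[16];
    for (size_t i = 0; i < 16; ++i) { a[i] = 10; b[i] = 13; }
    b[0] = 0;                 /* |10 - 0|  = 10 */
    a[1] = 0; b[1] = 255;     /* |0 - 255| = 255, no wraparound */
    vabsdub_model(d, a, b);
    assert(d[0] == 10 && d[1] == 255 && d[2] == 3);
    assert(sad16(a, b) == 307u);   /* 10 + 255 + 14 * 3 */
}
```

The unsigned treatment matters: a plain modular subtraction of 255 from 0 would wrap to 1, whereas the absolute difference is 255.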

AltiVec e6500 limitations
• Operates in big-endian only
• Does not have data streaming (dst type instructions)
  − They are executed as NOPs
• The AltiVec execution units are shared between threads (but the registers are private)
• AltiVec execution units will go drowsy (reducing static power) when not used
  − Come out of drowsy automatically when an AltiVec instruction is executed and the MSR AltiVec available bit is set.
  − All AltiVec state is retained when the unit is drowsy
• stvflx and stvfrx will take an alignment exception for cache-inhibited stores of >8 bytes.

• Reduces the effort to load and store unaligned (not quad-word aligned) data
• Reduces number of registers needed for permute and mask vectors
• Reduces the effort to deal with the head and tail of unaligned strings or vector arrays
• Improves performance through:
  − Fewer instructions
  − Less register pressure
  − Less context to save
• Makes programming AltiVec technology simpler

• Reference: H1109, Implementing and Using the Motorola AltiVec Libraries, Smart Network Developer’s Forum 2003
• Detailed explanation of memcpy using permutation
• Became libmotovec.a with libc functions:
  − memcpy, bcopy, memmove, memset, bzero, strcmp, strlen, memcmp, strcpy, __copy_tofrom_user_vec, __clear_user_vec, csum_partial_vec, csum_partial_copy_generic_vec.
• Motivated the Power AltiVec MEPL (Mentor Embedded Performance Library)

• Solutions Ready Road Show

TBD

Using AltiVec in Embedded

• We have optimized the EEMBC Telecommunications and Networking benchmarks and gotten a 3X to 4X speedup by recoding in AltiVec.
• We utilized AltiVec-enabled libc functions in the Dhrystone benchmark and got a 2-44% speedup (depending on how heavily optimized Dhrystone was already).
• Using the AltiVec-enabled library we are currently seeing up to 60% speedup in DINK networking applications. Up to 50% CPU offloading in running the Linux protocol stack.

Motorola General Business Information, H1109_ChuckCorley.ppt, Rev 0.2. MOTOROLA and the Stylized M Logo are registered in the U.S. Patent & Trademark Office. All other product or service names are the property of their respective owners. © Motorola, Inc. 2002 www.motorola.com/sndf

Options for 128 byte loop – Cold DCache

[Figure: memcpy (AltiVec) vs memcpy (gcc) on 7445@1G/133, all alignments, cold DCache. x-axis: Bytes Copied (0 to 1600); y-axis: MBytes/Sec (0 to 450). Curves: 128 byte loop, 128B w/dcba, 128B w/dst, gcc. Annotation: cache management w/ dst and dcba helps the cold DCache case.]

memcpy(0x1000, 0x500, 27); // Quad word aligned
S0: 0x500   a b c d e f g h i j k l m n o p q r s t u v w x y z 0 X X X X X
D0: 0x1000  a b c d e f g h i j k l m n o p q r s t u v w x y z 0 X X X X X

memcpy(0x1008, 0x504, 22); // Word aligned
S0: 0x504   X X X X a b c d e f g h i j k l m n o p q r s t u 0 X X X X X X
D0: 0x1008  X X X X X X X X a b c d e f g h i j k l m n o p q r s t u 0 X X

memcpy(0x1006, 0x506, 20); // Byte aligned
S0: 0x506   X X X X X X a b c d e f g h i j k l m n o p q r s 0 X X X X X X
D0: 0x1006  X X X X X X a b c d e f g h i j k l m n o p q r s 0 X X X X X X

memcpy(0x1006, 0x503, 20); // Not aligned
S0: 0x503   X X X a b c d e f g h i j k l m n o p q r s 0 X X X X X X X X X
D0: 0x1006  X X X X X X a b c d e f g h i j k l m n o p q r s 0 X X X X X X

memcpy (AltiVec) treats everything as Not aligned!
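The four cases above can be told apart from the low-order address bits alone. The sketch below is one possible reading of the slide's labels, inferred only from its four examples: "quad word aligned" is taken to mean both addresses are multiples of 16, "word aligned" both multiples of 4, and "byte aligned" the same offset within a quad word. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    QUAD_WORD_ALIGNED,   /* both addresses are multiples of 16 */
    WORD_ALIGNED,        /* both addresses are multiples of 4 */
    BYTE_ALIGNED,        /* same offset within a quad word (assumed meaning) */
    NOT_ALIGNED          /* different offsets within a quad word */
} copy_alignment;

copy_alignment classify_copy(uintptr_t dst, uintptr_t src)
{
    if (((dst | src) & 0xFu) == 0)
        return QUAD_WORD_ALIGNED;
    if (((dst | src) & 0x3u) == 0)
        return WORD_ALIGNED;
    if (((dst ^ src) & 0xFu) == 0)
        return BYTE_ALIGNED;
    return NOT_ALIGNED;
}

/* The four memcpy calls from the slide. */
void check_slide_examples(void)
{
    assert(classify_copy(0x1000, 0x500) == QUAD_WORD_ALIGNED);
    assert(classify_copy(0x1008, 0x504) == WORD_ALIGNED);
    assert(classify_copy(0x1006, 0x506) == BYTE_ALIGNED);
    assert(classify_copy(0x1006, 0x503) == NOT_ALIGNED);
}
```

A copy routine that always uses the "not aligned" path, as the slide says the AltiVec memcpy does, never needs this classification; the sketch only makes the four categories explicit.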

SOURCE:
Addr/EA   0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
0x1509    S0 S0 S0 S0 S0 S0 S0 S0 S0 09 0A 0B 0C 0D 0E 0F
0x1519    10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
0x1529    20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
0x1539    30 31 32 33 34 35 36 37 38 39 3A 3B 3C S3 S3 S3

Numbers communicate the source alignment; colors communicate the destination alignment.

DESTINATION:
Addr/EA   0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
0x1003    D0 D0 D0 09 0A 0B 0C 0D 0E 0F 10 11 12 13 14 15
0x1013    16 17 18 19 1A 1B 1C 1D 1E 1F 20 21 22 23 24 25
0x1023    26 27 28 29 2A 2B 2C 2D 2E 2F 30 31 32 33 34 35
0x1033    36 37 38 39 3A 3B 3C D3 D3 D3 D3 D3 D3 D3 D3 D3

Permute allows copying and reordering any of 32 bytes from two AltiVec registers to a destination AltiVec register under control of a fourth permute “vector”.

lvsr v9,r0,r6 // create permute vector for DST-SRC
…
loop:
lvx v2,r8,r7 // Get next vector
vperm v3,v1,v2,v9 // Align previous and new vectors
stvx v3,r3,r7 // Store 16 bytes at next destination
vor v1,v2,v2 // Move new vector to previous
addi r7,r7,16 // Increment vector count
bdnz loop

PERMUTE VECTOR:     06 07 08 09 0A 0B 0C 0D 0E 0F 10 11 12 13 14 15

Byte index:         00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
SOURCE 1 (0x1519):  10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
Byte index:         10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
SOURCE 2 (0x1529):  20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F

Offset:             0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
MEMORY at 0x1010:   16 17 18 19 1A 1B 1C 1D 1E 1F 20 21 22 23 24 25
DESTINATION 1 (0x1013)

lvswx performs a left/right swap by byte rotation of the vector as it is loaded into the vector register. The number of bytes that are rotated is the offset from the aligned memory address that is specified by EA.

lvsm v8,r0,r6 // create select vector for DST-SRC
…
loop:
lvswx v2,r8,r7 // Get and swap next vector left-right
vsel v3,v2,v1,v8 // Select from previous and new vectors
vor v1,v2,v2 // Move new vector to previous
stvx v3,r3,r7 // Store 16 bytes at next destination
addi r7,r7,16 // Increment vector count
bdnz loop

Byte index:          0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
SELECTOR (0x6): v8   FF FF FF FF FF FF FF FF FF FF 00 00 00 00 00 00
SRC 1 (0x1516): v1   16 17 18 19 1A 1B 1C 1D 1E 1F 10 11 12 13 14 15
SRC2 (0x1526): v2    26 27 28 29 2A 2B 2C 2D 2E 2F 20 21 22 23 24 25
SELECT: v3           16 17 18 19 1A 1B 1C 1D 1E 1F 20 21 22 23 24 25

Offset:              0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
MEMORY at 0x1010:    16 17 18 19 1A 1B 1C 1D 1E 1F 20 21 22 23 24 25
DESTINATION 1 (0x1013)

For vector load instructions that help with misaligned accesses, there are three addresses associated with each EA that may be used with these instructions.
1. The first address, EA, is the byte address specified from (rA|0) + (rB). This is called the pivot point.
2. The second address is the vector aligned address obtained by zeroing the low-order 4 bits of EA. This is called the vector start.
3. The third address is the address of the last byte in the 16-byte aligned vector (vector start + 15). This is called the vector end.
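The two inner loops shown earlier (lvx with vperm, and lvswx with vsel) can be compared with a scalar C model that reproduces one iteration of each with the slide values. This is only a sketch of the byte movement, not e6500 code: vec128 and all *_model functions are hypothetical names, mem[] stands in for the aligned memory region at 0x1500, and each model follows the behavior described on the slides (lvsr and vperm as in classic AltiVec, lvswx as a byte rotation, lvsm as an FF/00 selector).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t b[16]; } vec128;   /* b[0] is the leftmost byte */

/* lvx: load the aligned vector that contains ea (low 4 bits ignored). */
vec128 lvx_model(const uint8_t *mem, size_t ea)
{
    vec128 v;
    memcpy(v.b, mem + (ea & ~(size_t)0xF), 16);
    return v;
}

/* lvsr: permute control bytes (16 - sh) .. (31 - sh), with sh = ea & 0xF. */
vec128 lvsr_model(size_t ea)
{
    vec128 v;
    size_t sh = ea & 0xF;
    for (size_t i = 0; i < 16; ++i)
        v.b[i] = (uint8_t)(16 - sh + i);
    return v;
}

/* vperm: each control byte picks one of the 32 bytes of a || b. */
vec128 vperm_model(vec128 a, vec128 b, vec128 ctl)
{
    vec128 r;
    for (size_t i = 0; i < 16; ++i) {
        unsigned idx = ctl.b[i] & 0x1Fu;
        r.b[i] = idx < 16 ? a.b[idx] : b.b[idx - 16];
    }
    return r;
}

/* lvswx: the aligned vector containing ea, rotated left by ea & 0xF bytes. */
vec128 lvswx_model(const uint8_t *mem, size_t ea)
{
    vec128 v;
    size_t sh = ea & 0xF;
    const uint8_t *start = mem + (ea - sh);
    for (size_t i = 0; i < 16; ++i)
        v.b[i] = start[(i + sh) & 0xF];
    return v;
}

/* lvsm: (16 - sh) bytes of FF followed by sh bytes of 00. */
vec128 lvsm_model(size_t ea)
{
    vec128 v;
    size_t sh = ea & 0xF;
    for (size_t i = 0; i < 16; ++i)
        v.b[i] = (i < 16 - sh) ? 0xFF : 0x00;
    return v;
}

/* vsel vD,vA,vB,vC: bits of vB where vC is 1, bits of vA where vC is 0. */
vec128 vsel_model(vec128 a, vec128 b, vec128 c)
{
    vec128 r;
    for (size_t i = 0; i < 16; ++i)
        r.b[i] = (uint8_t)((a.b[i] & ~c.b[i]) | (b.b[i] & c.b[i]));
    return r;
}

/* One iteration of each loop; index 0x16 in mem[] models address 0x1516. */
void check_alignment_loops(void)
{
    uint8_t mem[0x40];
    for (size_t i = 0; i < sizeof mem; ++i)
        mem[i] = (uint8_t)i;

    /* Old way: two aligned loads merged by vperm, control from DST - SRC. */
    vec128 ctl = lvsr_model((size_t)0x1003 - (size_t)0x1509);
    vec128 old_way = vperm_model(lvx_model(mem, 0x10), lvx_model(mem, 0x20), ctl);

    /* New way: two rotated loads merged by vsel, selector for offset 6. */
    vec128 v1 = lvswx_model(mem, 0x16);
    vec128 v2 = lvswx_model(mem, 0x26);
    vec128 new_way = vsel_model(v2, v1, lvsm_model(0x16));

    for (size_t i = 0; i < 16; ++i) {
        assert(old_way.b[i] == (uint8_t)(0x16 + i));
        assert(new_way.b[i] == (uint8_t)(0x16 + i));
    }
}
```

Both paths produce bytes 16 through 25 (hex), the row stored at 0x1010 in the diagrams. The model does not show the register saving: the permute loop keeps a 16-byte control vector of indices, while the select loop only needs the FF/00 mask from lvsm.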

lvsm vD,rA,rB - Load Vector for Swap Merge.

lvsm sets vD as a control vector to select out the left and right elements of a vector from 2 iterations of a loop that uses lvswx to load left and right portions of a vector. The EA for lvsm corresponds to the address that is used to divide the left and right portions of the load operation. For me, the key to understanding is:
− lvsm will load (16-EA) “FF”s || EA “00”s, where EA here stands for the low-order 4 bits of the address (EA & 0xF)
• Old way: Use lvsl and lvsr to create permute vectors. Create zero and one constant vectors. Permute zeros and ones into a mask register.
• New way: e.g. lvsm with EA = 0x1009
  − vector start (EA[0:27]) = 0x1000
  − pivot point (EA & 0xF) = 0x9
  − vector end (EA[0:27] || 0xF) = 0x100F

Byte index:  0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
vD           FF FF FF FF FF FF FF 00 00 00 00 00 00 00 00 00
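The address arithmetic and the selector pattern for EA = 0x1009 can be checked with a few lines of C. This is a sketch with hypothetical names (ea_parts, split_ea, lvsm_mask) and it assumes a 32-bit effective address, matching the EA[0:27] notation above.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t vector_start;   /* EA with the low-order 4 bits zeroed */
    uint32_t pivot_offset;   /* EA & 0xF, offset of the pivot point */
    uint32_t vector_end;     /* vector start + 15 */
} ea_parts;

ea_parts split_ea(uint32_t ea)
{
    ea_parts p;
    p.vector_start = ea & ~(uint32_t)0xF;
    p.pivot_offset = ea & 0xFu;
    p.vector_end = p.vector_start | 0xFu;
    return p;
}

/* lvsm selector: (16 - offset) bytes of FF followed by offset bytes of 00. */
void lvsm_mask(uint8_t vd[16], uint32_t ea)
{
    uint32_t offset = ea & 0xFu;
    for (uint32_t i = 0; i < 16; ++i)
        vd[i] = (i < 16u - offset) ? 0xFF : 0x00;
}

/* The EA = 0x1009 example: 7 bytes of FF, then 9 bytes of 00. */
void check_lvsm_example(void)
{
    ea_parts p = split_ea(0x1009u);
    assert(p.vector_start == 0x1000u);
    assert(p.pivot_offset == 0x9u);
    assert(p.vector_end == 0x100Fu);

    uint8_t vd[16];
    lvsm_mask(vd, 0x1009u);
    for (uint32_t i = 0; i < 16; ++i)
        assert(vd[i] == (i < 7 ? 0xFF : 0x00));
}
```

Under the same "(16-EA) FFs || EA 00s" rule, an address that is already 16-byte aligned yields a selector of sixteen FF bytes in this model.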

Complex with permutation and store by element; much simpler with new lvtlx and lvtrx instructions. For me, the key to understanding is (EA below stands for the low-order 4 bits, EA & 0xF):
• lvtlx will load bytes EA:15 || EA “00”s, left justified
• lvtrx will load (16-EA) “00”s || bytes 0:(EA-1), right justified

lvtlx v1,r0,r4 // load SRC[9:15] concatenated with 9 “00”s
li r7,16
lvtrx v2,r4,r7 // load 7 “00”s concatenated with (SRC+0x10)[0:8]
lvsm v9,r0,r4 // create select vector for SRC; 7 “FF”s || 9 “00”s
… continued next slide

Offset:                0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
MEMORY at 0x1500:      S0 S0 S0 S0 S0 S0 S0 S0 S0 09 0A 0B 0C 0D 0E 0F
MEMORY at 0x1510:      10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F

Byte index:            0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
SOURCE 1 (0x1509): v1  09 0A 0B 0C 0D 0E 0F 00 00 00 00 00 00 00 00 00
SOURCE 2 (0x1519): v2  00 00 00 00 00 00 00 10 11 12 13 14 15 16 17 18
SELECTOR (0x9): v9     FF FF FF FF FF FF FF 00 00 00 00 00 00 00 00 00
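The head loads in this diagram can be reproduced with a scalar C model of the two "keys to understanding". The function names are hypothetical, and mem[] stands in for the aligned region at 0x1500, so index 0x09 models address 0x1509. The bytes marked S0 on the slide are never read into the result, so their values in mem[] do not matter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* lvtlx: bytes from ea to the vector end, left justified, zero filled. */
void lvtlx_model(uint8_t vd[16], const uint8_t *mem, size_t ea)
{
    size_t offset = ea & 0xF;
    const uint8_t *start = mem + (ea - offset);
    for (size_t i = 0; i < 16; ++i)
        vd[i] = (i < 16 - offset) ? start[offset + i] : 0;
}

/* lvtrx: bytes from the vector start up to (not including) ea,
 * right justified, zero filled on the left. */
void lvtrx_model(uint8_t vd[16], const uint8_t *mem, size_t ea)
{
    size_t offset = ea & 0xF;
    const uint8_t *start = mem + (ea - offset);
    for (size_t i = 0; i < 16; ++i)
        vd[i] = (i < 16 - offset) ? 0 : start[i - (16 - offset)];
}

/* Head of the copy: SRC = 0x1509, second load at SRC + 0x10 = 0x1519. */
void check_head_loads(void)
{
    uint8_t mem[0x20], v1[16], v2[16];
    for (size_t i = 0; i < sizeof mem; ++i)
        mem[i] = (uint8_t)i;

    lvtlx_model(v1, mem, 0x09);    /* 09 .. 0F, then nine 00 bytes */
    lvtrx_model(v2, mem, 0x19);    /* seven 00 bytes, then 10 .. 18 */
    assert(v1[0] == 0x09 && v1[6] == 0x0F && v1[7] == 0x00 && v1[15] == 0x00);
    assert(v2[0] == 0x00 && v2[6] == 0x00 && v2[7] == 0x10 && v2[15] == 0x18);

    /* Merging under the 7 x FF, 9 x 00 selector gives 16 contiguous bytes. */
    for (size_t i = 0; i < 16; ++i) {
        uint8_t merged = (i < 7) ? v1[i] : v2[i];
        assert(merged == (uint8_t)(0x09 + i));
    }
}
```

The merged value is the 16 source bytes starting at the unaligned address, which is what the vsel on the following slide produces in v3.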

• The stvflx instruction stores the leftmost bytes of the vector register into memory, starting at EA and continuing up to the vector end, without disturbing other memory contents. My key to understanding is (EA below stands for EA & 0xF):
  − stvflx will store from left bytes 0:(15-EA) at EA

vsel v3,v2,v1,v9 // select v2 where v9 is “00” and v1 where v9 is “FF”
stvflx v3,r0,r3 // store 0:12 at DST
… set up for inner loop

Byte index:            0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
SELECTOR (0x9): v9     FF FF FF FF FF FF FF 00 00 00 00 00 00 00 00 00
SELECT: v3             09 0A 0B 0C 0D 0E 0F 10 11 12 13 14 15 16 17 18

Offset:                0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
MEMORY at 0x1000:      D0 D0 D0 09 0A 0B 0C 0D 0E 0F 10 11 12 13 14 15
DESTINATION 0 (0x1003)
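The partial store at the head of the destination can be modeled in the same scalar style. The name stvflx_model is hypothetical and mem[] stands in for the aligned region at 0x1000; the sketch follows the "store from left bytes 0:(15-EA) at EA" rule stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* stvflx: store the leftmost (16 - offset) bytes of vs at ea, stopping at
 * the vector end. Bytes below ea are left untouched. */
void stvflx_model(uint8_t *mem, size_t ea, const uint8_t vs[16])
{
    size_t offset = ea & 0xF;
    for (size_t i = 0; i < 16 - offset; ++i)
        mem[ea + i] = vs[i];
}

/* Head of the copy: DST = 0x1003, v3 holds source bytes 09 .. 18. */
void check_head_store(void)
{
    uint8_t mem[16], v3[16];
    for (size_t i = 0; i < 16; ++i) {
        mem[i] = 0xD0;                 /* existing destination contents */
        v3[i] = (uint8_t)(0x09 + i);
    }
    stvflx_model(mem, 0x03, v3);       /* stores v3[0:12] at offsets 3 .. F */

    assert(mem[0] == 0xD0 && mem[1] == 0xD0 && mem[2] == 0xD0);
    assert(mem[3] == 0x09 && mem[15] == 0x15);
}
```

Only 13 of the 16 bytes in v3 are written; bytes 16 through 18 (hex) of the source are picked up again by the first iteration of the inner loop, as the destination diagram for 0x1010 shows.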

Complex with permutation and store by element; much simpler with new lvtlx and lvtrx instructions. For me, the key to understanding is (EA below stands for the low-order 4 bits, EA & 0xF):
• lvtlx will load bytes EA:15 || EA “00”s, left justified
• lvtrx will load (16-EA) “00”s || bytes 0:(EA-1), right justified

addi  r6,r4,-16   // SRC - 16
lvtlx v1,r6,r5    // load (SRC+BC-16)[13:15] concatenated with 13 “00”s
lvtrx v2,r4,r5    // load 3 “00”s concatenated with (SRC+BC)[0:12]
lvsm  v8,r4,r5    // create select vector for SRC
… continued next slide

                      0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
MEMORY at 0x1520      20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
MEMORY at 0x1530      30 31 32 33 34 35 36 37 38 39 3A 3B 3C S3 S3 S3

                      0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
SRC2 (0x152D): v1     2D 2E 2F 00 00 00 00 00 00 00 00 00 00 00 00 00
SRC3 (0x153D): v2     00 00 00 30 31 32 33 34 35 36 37 38 39 3A 3B 3C
SELECTOR (0xD): v8    FF FF FF 00 00 00 00 00 00 00 00 00 00 00 00 00
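The two load rules and the following vsel can be traced with a byte-level model. This is a sketch in Python under stated assumptions, not e6500 code: the `*_model` names and the `base` parameter are invented for illustration, the selector v8 is copied from the slide rather than computed (lvsm is not modeled), and 0xAA is an arbitrary stand-in for the bytes shown as S3.

```python
def lvtlx_model(mem, base, ea):
    """lvtlx as described above: bytes EA:15 of the block, then n zero bytes."""
    n = ea & 0xF
    start = (ea & ~0xF) - base          # start of the 16-byte block holding EA
    block = bytes(mem[start:start + 16])
    return block[n:] + bytes(n)


def lvtrx_model(mem, base, ea):
    """lvtrx as described above: (16-n) zero bytes, then bytes 0:(n-1) of the block."""
    n = ea & 0xF
    start = (ea & ~0xF) - base
    block = bytes(mem[start:start + 16])
    return bytes(16 - n) + block[:n]


def vsel_model(va, vb, vc):
    """vsel vD,vA,vB,vC: take vA where a vC bit is 0 and vB where it is 1."""
    return bytes((a & ~c & 0xFF) | (b & c) for a, b, c in zip(va, vb, vc))


# Slide example: source bytes 20..3C at 0x1520, followed by three filler bytes.
mem = bytes(range(0x20, 0x3D)) + bytes([0xAA] * 3)
v1 = lvtlx_model(mem, 0x1520, 0x152D)       # SRC+BC-16
v2 = lvtrx_model(mem, 0x1520, 0x153D)       # SRC+BC
v8 = bytes([0xFF] * 3) + bytes(13)          # selector as shown on the slide
v3 = vsel_model(v2, v1, v8)                 # vsel v3,v2,v1,v8
print(" ".join(f"{b:02X}" for b in v3))
```

In this model v1 and v2 never overlap in their non-zero bytes, so the select simply merges them into the last 16 source bytes, 2D through 3C.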

• The stvfrx instruction stores rightmost bytes into memory starting at vector start up to, but not including, EA. (This is really useful in preventing going beyond the end of “safe” memory.) My key to understanding is:
− stvfrx will store from right bytes (16-EA):15 @ EA[0:27]

vsel   v3,v2,v1,v9   // select v2 where v9 is “00” and v1 where v9 is “FF”
stvfrx v3,r3,r5      // store v3[9:15] at (DST+BC)[0:27]
… return

                                      0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
SELECTOR (0x7): v8                    FF FF FF 00 00 00 00 00 00 00 00 00 00 00 00 00
SELECT: v3                            2D 2E 2F 30 31 32 33 34 35 36 37 38 39 3A 3B 3C

DESTINATION 3 (DST+BC-1 = 0x1036)     0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
MEMORY at 0x1030                      36 37 38 39 3A 3B 3C D3 D3 D3 D3 D3 D3 D3 D3 D3
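The stvfrx rule can be modeled the same way. This is a sketch in Python, not e6500 code: `stvfrx_model`, `base`, and the `bytearray` memory are illustrative assumptions, and 0xD3 stands in for the untouched destination bytes shown as D3 on the slide.

```python
def stvfrx_model(mem, base, ea, v):
    """Model of stvfrx as described above: store bytes (16-n):15 of v at the
    start of EA's 16-byte block, where n is the low 4 bits of EA.
    Nothing at or beyond EA is written; n == 0 stores nothing."""
    n = ea & 0xF
    start = (ea & ~0xF) - base      # EA with its low 4 bits cleared
    mem[start:start + n] = v[16 - n:]


# Slide example: DST+BC = 0x1037, v3 = 2D 2E ... 3C.
mem = bytearray([0xD3] * 16)        # assumed prior contents of the block at 0x1030
v3 = bytes(range(0x2D, 0x3D))
stvfrx_model(mem, 0x1030, 0x1037, v3)
print(" ".join(f"{b:02X}" for b in mem))
```

Only seven bytes (v3[9:15]) land in memory, ending at 0x1036, so nothing past the last destination byte is modified.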

Portion      Instruction               Permuting  Swapping
Head         vector ops                1          1
             load/perm/lvsm            3          5
             store element or vector   3          1
             branch taken              5          0
             branch not taken          1          0
Inner loop   vor or vsel               1          2
             load/perm                 2          1
             store vector              1          1
             branch taken              1          1
             cycles (L1)               10         7
Tail         vector                    0          1
             load/perm/lvsm            2          3
             store element or vector   3          1
             branch taken              2          0
             branch not taken          4          1

lvex{b|h|w}x vD,rA,rB - Load Vector Element [Byte, Half-word, Word] Indexed. Loads specified elements from an arbitrary address, zeroing the rest of the register. Let E be the byte element of vector register vD indexed by rB[60–63]. The byte addressed by EA is loaded into byte E of vD.
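A byte-level model of the byte form may make the element selection concrete. This is a sketch in Python that follows only the wording above: `lvexbx_model` and `base` are illustrative names, EA is taken as rA + rB, the special case where rA is register 0 is ignored, and the half-word and word forms are not modeled. The memory contents and register values are made up for the example.

```python
def lvexbx_model(mem, base, ra, rb):
    """Byte form as described above: the byte at EA = rA + rB is placed in
    element E = rB[60:63] (the low 4 bits of rB); all other bytes are zero."""
    ea = ra + rb
    e = rb & 0xF
    vd = bytearray(16)              # rest of the register is zeroed
    vd[e] = mem[ea - base]
    return bytes(vd)


# Hypothetical data: bytes 40..4F at 0x2000, rA = 0x2003, rB = 5.
mem = bytes(range(0x40, 0x50))
vd = lvexbx_model(mem, 0x2000, 0x2003, 5)
print(" ".join(f"{b:02X}" for b in vd))
```

Here EA is 0x2008, which is not aligned to element 5, yet the byte 0x48 still lands in element 5 because the element index comes from rB alone.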

stvex{b|h|w}x vS,rA,rB – Store Vector Element [Byte, Half-word, Word] Indexed. Let E be the byte element of vector register vS indexed by rB[60–63]. Byte E of vS is stored into the byte addressed by EA.
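The store direction can be sketched the same way. As before, this Python model follows only the wording above: `stvexbx_model` and `base` are illustrative names, EA is taken as rA + rB, the case where rA is register 0 is ignored, and only the byte form is modeled. The register and memory values are made up.

```python
def stvexbx_model(mem, base, vs, ra, rb):
    """Byte form as described above: element E = rB[60:63] (the low 4 bits
    of rB) of vS is stored to the byte at EA = rA + rB."""
    ea = ra + rb
    e = rb & 0xF
    mem[ea - base] = vs[e]          # exactly one byte of memory changes


# Hypothetical data: vS = A0 A1 ... AF, rA = 0x3001, rB = 2.
mem = bytearray(16)                 # zero-filled block at 0x3000
vs = bytes(range(0xA0, 0xB0))
stvexbx_model(mem, 0x3000, vs, 0x3001, 2)
print(" ".join(f"{b:02X}" for b in mem))
```

Element 2 of vS (0xA2) is written to 0x3003 and the neighboring bytes are left as they were.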

• AltiVec technology is being “advanced” from the e600 core into the e6500 core, skipping the e500 (which had SPE), the e500mc, and the e5500.
• New instructions to move data from GPRs to VRs will reduce complexity and instruction count.
• New load and store instructions simplify misaligned accesses and reduce complexity and instruction count.


Session materials will be posted @ www.freescale.com/FTF Look for announcements in the FTF Group on LinkedIn or follow Freescale on Twitter
