Vector Game Math Processors.Pdf

Vector Game Math Processors
James Leiterman
Wordware Publishing, Inc.

Library of Congress Cataloging-in-Publication Data

Leiterman, James.
Vector game math processors / by James Leiterman.
p. cm.
Includes bibliographical references and index.
ISBN 1-55622-921-6
1. Vector processing (Computer science). 2. Computer games--Programming. 3. Supercomputers--Programming. 4. Computer science--Mathematics. 5. Algorithms. I. Title.
QA76.5 .L446 2002
004'.35--dc21 2002014988 CIP

© 2003, Wordware Publishing, Inc.
All Rights Reserved
2320 Los Rios Boulevard
Plano, Texas 75074

No part of this book may be reproduced in any form or by any means without permission in writing from Wordware Publishing, Inc.

Printed in the United States of America
ISBN 1-55622-921-6
10 9 8 7 6 5 4 3 2 1
0211

Product names mentioned are used for identification purposes only and may be trademarks of their respective companies.

All inquiries for volume purchases of this book should be addressed to Wordware Publishing, Inc., at the above address.
Telephone inquiries may be made by calling: (972) 423-0090

Contents

Preface

Chapter 1 Introduction
  Book Legend
  CD Files
  Pseudo Vec
  Graphics 101
  Algebraic Laws
  I-VU-Q
  Insight

Chapter 2 Coding Standards
  Constants
  Data Alignment
  Pancake Memory LIFO Queue
  Stack
  Assertions
  Memory Systems
  RamTest Memory Alignment Test
  Memory Header
  Allocate Memory (Malloc Wrapper)
  Release Memory (Free Wrapper)
  Allocate Memory
  Allocate (Cleared) Memory
  Free Memory — Pointer is Set to NULL
  Exercises

Chapter 3 Processor Differential Insight
  Floating-Point 101
  Floating-Point Comparison
  Processor Data Type Encoding
  X86 and IBM Personal Computer
  Registers
  Destination and Source Orientations
  Big and Little Endian
  MIPS Multimedia Instructions (MMI)
  PS2 VU Coprocessor Instruction Supposition
  Gekko Supposition
  Function Wrappers
  Integer Function Wrappers
  Single-Precision Function Quad Vector Wrappers
  Double-Precision Function Quad Vector Wrappers
  Single-Precision Function Vector Wrappers
  Double-Precision Function Vector Wrappers
  Exercises

Chapter 4 Vector Methodologies
  Target Processor
  Type of Data
  AoS
  SoA
  A Possible Solution?
  Packed and Parallel and Pickled
  Discrete or Parallel?
  Algorithmic Breakdown
  Array Summation
  Thinking Out of the Box (Hexagon)
  Vertical Interpolation with Rounding
  Exercises

Chapter 5 Vector Data Conversion
  (Un)aligned Memory Access
  Pseudo Vec (X86)
  Pseudo Vec (PowerPC)
  Pseudo Vec (AltiVec)
  Pseudo Vec (MIPS-MMI)
  Pseudo Vec (MIPS-VU0)
  Data Interlacing, Exchanging, Unpacking, and Merging
  Swizzle, Shuffle, and Splat
  Vector Splat Immediate Signed Byte (16x8-bit)
  Vector Splat Byte (16x8-bit)
  Vector Splat Immediate Signed Half-Word (8x16-bit)
  Vector Splat Half-Word (8x16-bit)
  Parallel Copy Half-Word (8x16-bit)
  Extract Word into Integer Register (4x16-bit) to (1x16)
  Insert Word from Integer Register (1x16) to (4x16-bit)
  Shuffle-Packed Words (4x16-bit)
  Shuffle-Packed Low Words (4x16-bit)
  Shuffle-Packed High Words (4x16-bit)
  Vector Splat Immediate Signed Word (8x16-bit)
  Vector Splat Word (8x16-bit)
  Shuffle-Packed Double Words (4x32-bit)
  Graphics Processor Unit (GPU) Swizzle
  Data Bit Expansion — RGB 5:5:5 to RGB32
  Vector Unpack Low Pixel16 (4x16-bit) to (4x32)
  Vector Unpack High Pixel16 (4x16-bit) to (4x32)
  Parallel Extend from 5 Bits
  Data Bit Expansion
  Vector Unpack Low-Signed Byte (8x8) to (8x16-bit)
  Vector Unpack High-Signed Byte (8x8) to (8x16-bit)
  Vector Unpack Low-Signed Half-Word (4x16) to (4x32-bit)
  Vector Unpack High-Signed Half-Word (4x16) to (4x32-bit)
  Data Bit Reduction — RGB32 to RGB 5:5:5
  Vector Pack 32-bit Pixel to 5:5:5
  Parallel Pack to 5 Bits
  Data Bit Reduction (with Saturation)
  Vector Pack Signed Half-Word Signed Saturate
  Vector Pack Signed Half-Word Unsigned Saturate
  Vector Pack Unsigned Half-Word Unsigned Saturate
  Vector Pack Unsigned Half-Word Unsigned Modulo
  Vector Pack Signed Word Signed Saturate
  Vector Pack Signed Word Unsigned Saturate
  Vector Pack Unsigned Word Unsigned Saturate
  Exercises

Chapter 6 Bit Mangling
  Boolean Logical AND
  Pseudo Vec
  Pseudo Vec (X86)
  Pseudo Vec (PowerPC)
  Pseudo Vec (MIPS)
  Boolean Logical OR
  Pseudo Vec
  Boolean Logical XOR (Exclusive OR)
  Pseudo Vec
  Toolbox Snippet — The Butterfly Switch
  I-VU-Q
  Boolean Logical ANDC
  Pseudo Vec
  Boolean Logical NOR (NOT OR)
  Pseudo Vec
  Pseudo Vec (X86)
  Pseudo Vec (PowerPC)
  Graphics 101 — Blit
  Copy Blit
  Transparent Blit
  Graphics 101 — Blit (MMX)
  Graphics Engine — Sprite Layered
  Graphics Engine — Sprite Overlay
  Exercises

Chapter 7 Bit Wrangling
  Parallel Shift (Logical) Left
  Pseudo Vec
  Pseudo Vec (X86)
  Pseudo Vec (PowerPC)
  Pseudo Vec (MMI)
  Parallel Shift (Logical) Right
  Pseudo Vec
  Parallel Shift (Arithmetic) Right
  Pseudo Vec
  Pseudo Vec (X86)
  Pseudo Vec (PowerPC)
  Pseudo Vec (MIPS)
  Rotate Left (or N-Right)
  Pseudo Vec
  Pseudo Vec (X86)
  Pseudo Vec (PowerPC)
  Pseudo Vec (MIPS)
  Secure Hash Algorithm (SHA-1)
  Exercises

Chapter 8 Vector Addition and Subtraction
  Vector Floating-Point Addition
  Vector Floating-Point Addition with Scalar
  Vector Floating-Point Subtraction
  vmp_VecNeg
  Vector Floating-Point Subtraction with Scalar
  Pseudo Vec
  Vector Floating-Point Reverse Subtraction
  Vector Addition and Subtraction (Single-Precision)
  Pseudo Vec
  Pseudo Vec (X86)
  Pseudo Vec (PowerPC)
  Pseudo Vec (MIPS)
  Vector Scalar Addition and Subtraction
  Single-Precision Quad Vector Float Scalar Addition
  Single-Precision Quad Vector Float Scalar Subtraction
  Vector Integer Addition
  Pseudo Vec
  Vector Integer Addition with Saturation
  Vector Integer Subtraction
  Vector Integer Subtraction with Saturation
  Vector Addition and Subtraction (Fixed Point)
  Pseudo Vec
  Pseudo Vec (X86)
  Pseudo Vec (PowerPC)
  Pseudo Vec (MIPS)
  Exercises
  Project

Chapter 9 Vector Multiplication and Division
  Floating-Point Multiplication
  NxSP-FP Multiplication
