2. Optimizing subroutines in assembly language
An optimization guide for x86 platforms
By Agner Fog
Copyright © 1996 - 2006. Last updated 2006-07-05.

Contents
1 Introduction
  1.1 Reasons for using assembly code
  1.2 Reasons for not using assembly code
  1.3 Microprocessors covered by this manual
  1.4 Operating systems covered by this manual
2 Before you start
  2.1 Things to decide before you start programming
  2.2 Make a test strategy
3 The basics of assembly coding
  3.1 Assembly language syntaxes
  3.2 Register set and basic instructions
  3.3 Addressing modes
  3.4 Instruction code format
  3.5 Instruction prefixes
4 ABI standards
  4.1 Register usage
  4.2 Data storage
  4.3 Function calling conventions
  4.4 Name mangling and name decoration
  4.5 Function examples
5 Using intrinsic functions in C++
  5.1 Using intrinsic functions for system code
  5.2 Using intrinsic functions for instructions not available in standard C++
  5.3 Using intrinsic functions for vector operations
  5.4 Availability of intrinsic functions
6 Using inline assembly in C++
  6.1 MASM style inline assembly
  6.2 Gnu style inline assembly
7 Using an assembler
  7.1 Static link libraries
  7.2 Dynamic link libraries
  7.3 Libraries in source code form
  7.4 Making classes in assembly
  7.5 Thread-safe functions
  7.6 Makefiles
8 Making function libraries compatible with multiple compilers and platforms
  8.1 Supporting multiple name mangling schemes
  8.2 Supporting multiple calling conventions in 32 bit mode
  8.3 Supporting multiple calling conventions in 64 bit mode
  8.4 Supporting different object file formats
  8.5 Supporting other high level languages
9 Optimizing for speed
  9.1 Identify the most critical parts of your code
  9.2 Out of order execution
  9.3 Instruction fetch, decoding and retirement
  9.4 Instruction latency and throughput
  9.5 Break dependence chains
  9.6 Jumps and calls
10 Optimizing for size
  10.1 Choosing shorter instructions
  10.2 Using shorter constants and addresses
  10.3 Reusing constants
  10.4 Constants in 64-bit mode
  10.5 Addresses and pointers in 64-bit mode
  10.6 Making instructions longer for the sake of alignment
11 Optimizing memory access
  11.1 How caching works
  11.2 Trace cache
  11.3 Alignment of data
  11.4 Alignment of code
  11.5 Organizing data for improved caching
  11.6 Organizing code for improved caching
  11.7 Cache control instructions
12 Loops
  12.1 Minimize loop overhead
  12.2 Induction variables
  12.3 Move loop-invariant code
  12.4 Find the bottlenecks
  12.5 Instruction fetch, decoding and retirement in a loop
  12.6 Distribute uops evenly between execution units
  12.7 An example of analysis for bottlenecks
  12.8 Loop unrolling
  12.9 Optimize caching
  12.10 Parallelization
  12.11 Analyzing dependences
  12.12 Loops on processors without out-of-order execution
  12.13 Macro loops
13 Vector programming
  13.1