Index

Numbers

32-bit mode, 8, 162
  Also see n32
  compatibility, 155
  data types, 160
  definition, 155
  libraries, 156
  n32 definition, 155
  overflow, 149
  porting to n32, 157
–32 option, 10, 23
4.3 BSD extensions, 27
64-bit mode, 8, 147-151
  data types, 160
  libraries, 156
–64 option, 10, 23

A

ABI
  options, 136
abi options, 8
ABI specification, 8
address aliases, 93-94
addresses, optimization, 141
address space, 77
alias analysis, 93-94
aliasing
  and pointer placement, 139
  memory, 124
  optimization, 139
analysis, dependence, 116-117
analyzer, parallel, 2, 3
–ansi option, 23
a.out files, 27
architecture
  instruction set, 136
  optimizing programs, 136
  options, 136
archive libraries, 55
archiver. See ar command
ar command, 49-52
  command syntax, 50
  options, 51
argument registers, 162
arguments store, 162
arrays
  2 gigabyte, 163
as assembler, 30
assembly language programs
  porting to n32, 158
assembly language programs, linking, 30

B

back substitution, 132

167 Index

bit masks, 151
BLOCK DATA, 73
blocking and permutation transformations, 111-114
  controlling, 107
block padding, 92-93
  restrictions, 92
branch elimination, 129
BSD 4.3 extensions, 27
–Bsymbolic, compiling, 72
build procedure
  n32, 158
byte counters
  and file size, 165
C++
  building DSOs, 67
  language definitions, 15
  ld options, 67
  precompiled headers, 15
C++ programs
  optimization, 139-143
cache
  conflicts and padding, 92
  misses, 102
cache optimization
  –LNO option, 108
cache parameters
  controlling with LNO, 105
CC compiler. See drivers
–cckr option, 24
char, 160
C language
  floating point, 117
  precompiled headers, 15
–clist option, 95
code
  arithmetic, 149
  assumptions, 147
  conversion, 128
  hints, 147
  overflow 32, 149
  portable, 150
  porting to 64-bit system, 159
  porting to n32-bit systems, 155
  shifts, 149
  signed ints, 143
  sizeof(int)==sizeof(long), 148
  sizeof(int)==sizeof(void*), 148
  sizeof(long)==4, 148
  sizeof(void*)==4, 149
  transformation, 128
  typedefs, 162
  view transformations, 95
  writing for 64-bit applications, 147-151
  zero-extension, 143
code generator, 126-136
  Also see optimizing programs
  and optimization levels, 127, 128, 134
  back substitution, 132
  branch elimination, 129
  cross-iteration optimization, 130-131
    read-read elimination, 130
    read-write elimination, 130
    sub-expression elimination, 131
    write-write elimination, 131
  feedback, 136
  frequency of execution, 136
  if conversion, 128
  if conversion and floating points, 129
  instruction-level parallelism, 129
  latency, 135
  loop unrolling, 131, 134
  memory exceptions, 129
  modify default, 134
  –O0 option, 127
  –O1 option, 127

code generator, (continued)
  –O2 option, 128-134
  –O3 option, 128-134
  prefetch, 135
  R10000 optimization, 129
  recurrence breaking, 132
  software pipelining, 133, 134
  steps at –O2 and –O3, 133
COFF, 11
common block padding, 92-93
  restrictions, 92
Common Object File Format, 11
COMMON symbols, 73
COMPILER_DEFAULTS_PATH environment variable, 8
compiler back end, 2
compiler.defaults file, 8
compiler drivers. See drivers, 2
compiler front end, 2
compiler options. See drivers
compiler system
  32-bit mode, 10
  64-bit mode, 10
  components, 2
  n32-bit mode, 10
  overview, 2
  predefined types, 161
compiling with –Bsymbolic, 72
constant format strings, 151
constants, 149
  negative values, 151
conventions, syntax, xviii
conversion of code, 128
–C option, 23
–c option, 3, 23
copt optimizer, 2
–cord option, 24
counters, internal byte, 165
cpp preprocessor, 2
C programs
  optimization, 139-143
cross-file inlining, 90
cross-iteration optimization, 130-131
  read-read elimination, 130
  read-write elimination, 130
  sub-expression elimination, 131
  write-write elimination, 131

D

–D__EXTENSIONS__ option, 27
–D_MIPS_FPSET, 161
–D_MIPS_ISA, 161
–D_MIPS_SIM, 161
–D_MIPS_SZINT, 161
–D_MIPS_SZLONG, 161
–D_MIPS_SZPTR, 161
data
  prefetching, 102
data alignment macros, 161
  optimizing, 138
  signed, 142
data types
  sizes, 160
debugging
  driver options, 36
  floating points, 122
defaults
  compilation modes, 8
  specification file, 8
dependence analysis, 109, 116-117
directives
  LNO, 109

disabling traps, 137
disassemble object file, 36
dis command, 36, 37
  command syntax, 37
  options, 37
dlclose(), 77
dlerror(), 76
dlopen(), 75
dlsym(), 76
–Dname option, 24
double, 160
drivers
  as assembler, 30
  bypassing, 2
  CC compiler, 2
  cc compiler, 2
  –c option, 3
  defaults, 8, 22
  f77/90 compiler, 2
  fec preprocessor, 2
  file name suffixes, 13
  input file suffixes, 13
  –KPIC, 12
  linking, 3
  –non_shared, 12
  omit linking, 3
  optimizing programs, 118
  options, 23, 36
  options to ld, 28
  –show option, 3
  stages of compilation, 3
DSOs, 1, 10, 11, 55-80
  building new DSOs, 65
  C++, 67
  converting libraries, 73
  creating DSOs, 65
  dlclose(), 77
  dlerror(), 76
  dlopen(), 75
  dlsym(), 76
  dynamic loading diagnostics, 76
  exporting symbols, 66
  guidelines, 58
  hiding symbols, 66
  libraries, shared, 58
  linking, 32
  loading dynamically, 75
  mmap() system call, 77
  munmap() system call, 77
  naming conventions, 65
  QuickStart, 62-64
  QuickStart registry file, 68
  registry files, 68-70
  search path, 71
  sgidladd(), 76
  shared libraries, 58
  starting quickly, 62
  unloading dynamically, 77
  versioning, 77
dump command. See elfdump
dwarfdump command, 36, 38
  options, 39
DWARF symbolic information, 38
dynamic linking, 2, 10, 75
Dynamic Shared Objects. See DSOs

E

elfdump command, 36, 40
  command syntax, 40
  options, 40
Elf object file, 38
ELF. See executable and linking format
elimination
  branches, 129
  read-read, 130
  read-write, 130

elimination, (continued)
  sub-expression, 131
  write-write, 131
–elspec option, 24
environment
  optimizing programs, 137
environment variable
  COMPILER_DEFAULTS_PATH, 8
environment variables
  32-bit compilation, 10
  64-bit compilation, 10
  n32-bit compilation, 10
–E option, 24
executable and linking format, 1, 10, 11
executable files, 11
exporting symbols, 66
expressions
  optimizing, 125
extension
  sign, 150
  zero, 150

F

f77/90 compiler, 2
fecc preprocessor, 2
fec preprocessor, 2
  bypassing, 2
feedback
  and code generator, 136
–feedback option, 24
fef77/90p analyzer, 2, 3
fef77/90 preprocessor, 2
file command, 36, 42
  command syntax, 42
  example, 43
  options, 42
file inlining, 85-94
files
  2 gigabyte size, 165
  compilation specification, 8
  executable, 11
  header, 13
  include, 13
  internal byte counters, 165
  listing properties, 36
  naming conventions, 13
  precompiled header, 15
  relocatable, 11
  size, 165
file type, determining, 42
fission
  controlling, 104
  LNO, 110
  loops, 100
–flist option, 95
float, 160
float.h include file, 160
floating points
  debugging, 122
  if conversion, 129
  optimization, 117-122
  optimizing, 125
  reassociation, 125
Force, 121
format
  object file, 1, 10
Fortran
  floating point, 117
  padding global arrays, 92
  program optimization, 127
Fortran programs
  optimization, 139-143
–fullwarn option, 24
functions
  implicitly declared, 149

fusion
  controlling, 104
  LNO, 110
  loop, 99

G

gather-scatter, 102
  controlling, 105
global arrays
  padding, 92
global offset table, 12
global offset table overflow, 25
global optimizer, 138-143
–G option, 24
–g option, 24, 36
GOT, 12
GOT overflow, 25
guidelines
  porting, 157

H

header files, 13-22
  multiple languages, 14
  portable, 150
  precompiled, 15
  specification, 14
–help option, 24
high-order bit, 149

I

–Idirname option, 25
IEEE
  floating points, 119
  optimization, 119
if conversion, 128
if-then-else statements
  optimization, 142
implicitly declared function, 149
include files, 13
  float.h, 160
  inttypes.h, 162
  limits.h, 160
  multiple languages, 14
  n32, 158
indirect
  calls, using, 139
–INLINE, 89-90
  all option, 89
  file option, 90
  must option, 90
  never option, 90
  none option, 89
inliner
  standalone, 90
inlining, 85-94
  benefits, 88
input file names, 13
instruction
  mips4 recip, 126
  mips4 rsqrt, 126
  prefetching, 102
instruction-level parallelism, 129
int, 148, 160, 162
integer
  overflow, 125
  scaling, 150
interleaving reduction, 132
internal byte counters
  and file size, 165

inttypes.h include file, 162
–IPA, 91-94
  addressing=ON option, 94
  alias=ON option, 93
  forcedepth option, 91
  maxdepth option, 91
  Olimit option, 91
  opt_alias=ON option, 94
  plimit option, 91
  space option, 91
ISA
  options, 136
isa options, 8
ISA specification, 8

K

–KPIC option, 12, 25

L

latency
  and code generator, 135
ld
  and assembly language programs, 30
  C++, 67
  command syntax, 28
  DSOs, 67
  dynamic linking, 2, 10
  example, 30
  libraries, default search path, 31
  libraries, specifying, 30
  link editor, 2
  multilanguage programs, 33
  options, 28, 67
  registry files, 68
  –shared option, 65
LD_BIND_NOW, 72
libdl, 75
libraries
  archive, 55
  global data, 60
  header files, 13
  libdl, 75
  locality, 60
  non-shared, converting to DSOs, 73
  paging, 60
  path, 9
  routines to exclude, 59
  routines to include, 59
  self-contained, 59
  shared, 1, 10
  shared, static, 12, 55
  specifying, 30
  static data, 59
  tuning, 60
lib.so functions
  optimization, 142
limits.h include file, 160
linking
  dynamic. See ld
  omit, 3
linking. See ld
LNO. See optimizing programs, –LNO option
loader
  runtime. See rld
loading
  symbols, 66
local variables
  optimization, 139
long, 160, 162
long double, 160
long long, 160
loop interchange, 97-98
loop-nest optimization. See optimizing programs, –LNO option

loops
  blocking, 98
  fission, 100
  fusion, 99
  interchanging, 97
  optimizing, 104
  parallel, 102
  unrolling, 98
loop unrolling
  code generator, 131

M

machine instructions, 36
macro preprocessors, 2
macros
  NARGSAVE, 162
  predefined, 161
  typedefs, 162
maximum integer type, 163
memory
  2 gigabyte arrays, 163
  referencing, 124, 138
memory allocation
  arrays, 163
memory exceptions
  if conversion, 129
–mips1 option, 25
–mips2 option, 25
–mips3 option, 25
–mips4 option, 25
mips4 recip instruction, 126
mips4 rsqrt instruction, 126
MIPS Instruction Set Architecture, 161
mips options, 8
mmap() system call, 77
mode
  32-bit, 8
  64-bit, 8
  n32-bit, 8
modeling
  controlling, 107
–multigot option, 25, 29
multilanguage programs
  and ld, 33
  header files, 14
munmap() system call, 77

N

n32, 158
  assembly language programs, 158
  build procedure, 158
  include files, 158
  libraries, 156, 158
  makefiles, 158
  porting environment, 158
  porting guidelines, 157
  runtime issues, 159
  source code changes, 158
n32-bit mode, 8
–n32 option, 10, 23
naming source files, 13
NARGSAVE macro, 162
negative values
  problems, 151
nm command, 36, 43-46
  character codes, 45
  command syntax, 43
  example, 46
  example of undefined symbol, 35
  options, 43
  undefined symbol, 35
–nocpp option, 25
–non_shared option, 25

–nostdinc option, 25

O

object file information
  disassemble, 36
  format, 1, 10
  listing file properties, 36
  listing section sizes, 36, 47
  symbol table information, 36, 43
  tools, 36
  using, 36
  using dwarfdump, 36
  using elfdump, 36, 40
–o filename option, 26
–Onum option, 26
operating system
  64 bit, 147-151
operations
  relational, 125
  unsigned relational, 125
optimization, 83-143
  addresses, 141
  Also see optimizing programs
  and register allocation, 143
  C++ programs, 139-143
  C programs, 139-143
  Fortran, 139-143
  function return values, 139
  global, 138-143
  if-then-else statements, 142
  libc.so functions, 142
  –O0 compiler option, 85
  –O1 compiler option, 85
  –O2 compiler option, 85
  –O3 compiler option, 85
  options, 85
  pointer placement, 139
  pointers, 140
  signed data types, 142
  STDARG, 141
  stdarg.h, 141
  subscripts, 140
  switch statements, 142
  tables, 142
  tips for improving, 138
  unions, 139
  value parameters, 139
  VARARG, 141
  varargs.h, 141
  variables, global vs. local, 139
optimizer, 2
  copt optimizer, 2
optimizing programs
  –32 option, 136
  –64 option, 136
  alias analysis, 93-94
  –align option, 138
  Also see code generator
  benefits, 84
  cache, 102
  code generator, 126-136
    overview, 126
  common block padding, 92-93
    restrictions, 92
  data alignment, 138
  debugging, 84
  dependence analysis, 116-117
  floating points, 117-122
  Fortran optimization, 127
  IEEE floating points, 119
  ignoring pragmas, 104
  –INLINE option, 89-90
  inlining benefits, 88
  interprocedural analysis, 85-94
  –IPA option, 91, 93
  –LNO option, 94-117
    blocking, 98-99

optimizing programs, (continued)
    blocking and permutation transformations, 107-108
    cache optimization, 108
    code transformation, 95
    controlling cache parameters, 105
    controlling dependence analysis, 109
    controlling fission and fusion, 104
    controlling gather-scatter, 105
    controlling illegal transformations, 108
    controlling prefetch, 108
    controlling transformations, 107
    directives, 109-117
    fission, 110
    fusion, 110
    gather-scatter, 102-103
    loop fission, 100-102
    loop fusion, 99-100
    loop interchange, 97-98
    optimization levels, 104
    outer loop unrolling, 98-99
    pragmas, 109-117
    prefetching, 102
    running LNO, 94
  –mips option, 136
  –n32 option, 136
  –OPT option, 118-126
    alias=any option, 124
    alias=name option, 124
    alias=restrict option, 125
    alias=typed option, 124
    alias=unnamed option, 124
    div_split, 125
    div_split option, 121
    fast_complex option, 121
    fast_exp option, 121
    fast_io option, 121
    fast_sqrt option, 121
    fold_reassociate, 125
    fold_reassociate option, 121
    fold_unsafe_relops, 125
    fold_unsigned_relops, 125
    IEEE_arithmetic option, 119
    IEEE option, 118
    recip, 126
    recip option, 122
    roundoff option, 118
    rsqrt, 126
    rsqrt option, 122
    space option, 123
  pragmas, ignore, 104
  prefetch pragmas, 114-116
  shared code, 137
  target architecture, 136
  target architecture options, 136-137
  target environment, 137
  –TARG option
    isa=mips option, 136
    madd option, 122, 136
  –TENV option, 137-138
    align_aggregates option, 138
    X option, 137
  transformation pragmas, 111-114
  transformations, 118
–OPT option, 26
  div_split option, 125
  fold_reassociate option, 125
  fold_unsafe_relops, 125
  fold_unsigned_relops option, 125
  recip option, 126
  rsqrt option, 126
overflow
  integer, 125
  integers, 125
overflow of code, 149
overflow of global offset table, 25

P

padding, blocks, 92-93
  restrictions, 92
page size, 60
paging
  alignment, 60
parallel analyzer, 2, 3
parallel loops, 102
parameters
  optimization, 139
pca analyzer, 2, 3
pc compiler. See drivers
–pch option, 26
PIC. See position-independent code
pixie
  and SpeedShop, 144
pointer, 148, 160, 162
pointer placement
  and aliasing, 139
  example, 139
pointers
  example, 140
  optimization, 140
  referencing memory, 124
–P option, 26
–p option, 26
porting code, 159
porting guidelines, 157
position-independent code, 2, 10, 12, 65
pragmas
  ignore, 104
  LNO, 109
precompiled header files, 15-21
  automatic, 16
  controlling, 20
  deletion, 20
  performance, 21
  requirements, 17
  reuse, 18
prefetch
  and code generator, 135
  controlling, 108
prefetching instructions, 102
prefetch pragmas, 114-116
preprocessing, 2
preprocessors
  macro, 2
printf command, 151
problems, 149
  constants, 149
  floating points, 122
  implicitly declared functions, 149
  negative values, 151
  porting code, 147
  printf, 151
  scanf, 151
  sizeof(int)==sizeof(long), 148
  sizeof(int)==sizeof(void*), 148
  sizeof(long)==4, 148
  solving, 150
  types, 147
processor specification, 8
proc options, 9
prof
  and SpeedShop, 144

Q

QuickStart DSOs. See DSOs, QuickStart

R

–r10000 option, 137

–r5000 option, 137
–r8000 option, 137
read-read elimination, 130
read-write elimination, 130
recip instruction, 126
recurrence breaking
  back substitution, 132
  code generator, 132
  reduction interleaving, 132
reduction interleaving, 132
registers
  allocation, 143
  argument, 162
  blocking, 98
  temp, 162
registry file. See DSOs
relational operations
  unsigned, 125
relational operators
  integer overflow, 125
relocatable files, 11
relocation bits, removing, 36
remove
  relocation bits, 36
  symbol table, 36
resolve text symbols, 72
return values, optimization, 139
rld, 56
  dynamic linking, 75
  libdl, 75
  search path, 71
roundoff
  floating points, 118
  optimization, 118
rsqrt instruction, 126
runtime issues
  n32, 159
runtime linker. See rld

S

scalar optimizer, copt, 2
scalar variables
  word size, 142
scanf function, 151
search path
  rld, 71
selecting
  compilation mode, 8
  instruction set, 8
  ISA, 8
  processor, 8
sgidladd(), 76
shared code
  optimizing, 137
shared libraries, static, 55
shared library, 1, 10
shared objects, dynamic, 55
short, 160
–show_defaults option, 9
–show option, 3, 26
sign bit set, 149
signed data type
  optimization, 142
signed ints
  64-bit code, 143
sign extension, 148, 150
size command, 36, 47, 47-48
  command syntax, 47
  example, 48
sizeof(int)==sizeof(long), 148
sizeof(int)==sizeof(void*), 148
sizeof(long)==4, 148
sizeof(void*)==4, 149
size of object file, 36

software pipelining
  and code generator, 133
–S option, 26
source code
  n32, 158
source file names, 13
specifying compilation mode, 8
SpeedShop, 144
  pixie command, 144
  prof command, 144
  ssrun command, 144
standalone inliner, 90
stdarg.h, 141
STDARG. See optimization
stdio.h header file, 14
storing arguments, 162
strings
  printf, 151
  scanf, 151
strip command, 36, 48
  command options, 49
  command syntax, 48
sub-expression elimination, 131
subscripts
  example, 140
  optimization, 140
suffixes
  input files, 13
switch statements
  optimization, 142
symbol resolution, 72
symbols
  exporting, 66
  loading, 66
symbol table
  data, 36
  dumping data, 43
  get listing, 46
  removing, 36
syntax, conventions, xviii

T

–TARG option, 26
temp registers, 162
–TENV option, 26
transformation
  of code, 128
transformation pragmas, 111-114
transformations
  controlling illegal, 108
  controlling with LNO, 107
  view code, 95
traps
  disable, 137
troubleshooting
  constants, 149
  implicitly declared functions, 149
  negative values, 151
  printf, 151
  scanf, 151
  sizeof(int)==sizeof(long), 148
  sizeof(int)==sizeof(void*), 148
  sizeof(long)==4, 148
  sizeof(void*)==4, 149
  solving problems, 150
truncation of code, 149
type, determining for files, 42
typedefs, 151, 162
types
  assumptions, 147
  change in size, 149
  char, 160
  constants, 149
  double, 160

types, (continued)
  float, 160
  int, 148, 160, 162
  largest integer type, 163
  long, 160, 162
  long double, 160
  long long, 160
  pointer, 148, 160, 162
  problems, 147
  scaling integer, 150
  short, 160
  sizes, 160
typographical conventions, xviii

U

–Uname option, 27
unions
  optimization, 139
unsigned relational operations, 125

V

VARARG. See optimization
varargs.h, 141
variables
  scalar, 142
virtual address space, 77

W

–woff option, 27
word-size scalar variables, 142
write-write elimination, 131

X

–xansi option, 27
XFS
  file size, 165
–xgot option. See –multigot option

Z

zero extension, 150
zero-extension code, 143
