0-9, and Symbols A
Index
Note: Online information is listed by print page number and a period followed by “e” with online page number (54.e1). Page references preceded by a single letter with hyphen refer to appendices. Page references followed by “f,” “t,” and “b” refer to figures, tables, and boxes, respectively.
0-9, and symbols ID (ASID) , 436 VAX floating-point formats , D-29 inadequate , 497.e5–497.e6 ALU control , 249–251 . See also 1-bit ALU , A-26–A-29 . See also shared , 507–508 Arithmetic logic unit (ALU) Arithmetic logic unit (ALU) single physical , 507 , 507–508 bits , 250–251 , 250 f adder , A-27 f virtual , 436 logic , C-6–C-7 CarryOut , A-28 Address translation mapping to gates , C-4–C-7 for most significant bit , A-33 f for ARM Cortex-A53 , 458 f truth tables , C-5f , C-5 f illustrated , A-29f defined , 418–419 ALU control block , 253 logical unit for AND/OR , A-27 f fast , 428–430 defined , C-4–C-6 performing AND, OR, and addition , for Intel Core i7 , 458 f generating ALU control bits , C-6 f A-31 , A-33 f TLB for , 428–430 ALUOp , 250 , C-6 b –C-7 b 64-bit ALU , A-29–A-31 .
See also Address-control lines , C-26f bits , 250 , 251 Arithmetic logic unit (ALU) Addresses control signal , 253 from 63 copies of 1-bit ALU , A-34 f base , 69 Amazon Web Services (AWS) , 415b with 64 1-bit ALUs , A-30f byte , 70 AMD Opteron X4 (Barcelona) , 533 , 534f defining in Verilog , A-36–A-37 defined , 68 AMD64 , 148 , 148 , 215 , 173.e5 illustrated , A-35f memory , 78 b Amdahl’s law , 391 , 493–494 ripple carry adder , A-29 virtual , 418–419 , 438 , 439 b corollary , 49 7090/7094 hardware , 248.e6 Addressing defined , 49 base , 118 f fallacy , 546 A in branches , 115–117 and (and) , 64f displacement , 118 AND gates , A-12–A-13 , C-7 Absolute references , 127 immediate , 118f AND operation , 90 , A-6 Abstractions PC-relative , 115–116 , 118f andi (and immediate) , 64f hardware/software interface , 22 register , 118 f Annual failure rate (AFR) , 408–409 principle , 22 RISC-V modes , 117–118 versus MTTF of disks , 408b –409 b to simplify design , 11 x86 modes , 151 Antidependence , 325 Accumulator architectures , 173.e1–173.e2 Addressing modes Antifuse , A-77 Acronyms , 9 desktop architectures , D-5–D-6 Apple computer , 54.e6 Active matrix , 18 Advanced Vector Extensions (AVX) , 216 , Apple iPad 2 A1395 , 20f add (add) , 64 f 217 logic board of , 20 f addi (add immediate) , 64f , 72 , 84 AGP , B-9–B-10 processor integrated circuit of , 21 f Addition , 172–175 .
See also Arithmetic Algol-60 , 173.e6 Application binary interface (ABI) , 22 binary , 172 b –173 b Aliasing , 434 Application programming interfaces floating-point , 196–199 , 204 Alignment restriction , 70 (APIs) operands , 173 , 173 All-pairs N-body algorithm , B-65 defined , B-4 significands , 195b –196 b Alpha architecture graphics , B-14 speed , 175 b bit count instructions , D-29 Architectural registers , 335–336 Address interleaving , 370–371 floating-point instructions , D-28–D-29 Arithmetic , 170 Address select logic , C-24 , C-25 f instructions , D-27–D-29 addition , 172–175 Address space , 418 , 421b no divide , D-28 addition and subtraction , 172–175 extending , 467b PAL code , D-28 division , 181–189 flat , 467 unaligned load-store , D-28 fallacies and pitfalls , 220–223 Arithmetic (Continued) microcode , C-30 Biased notation , 81 , 193 floating-point , 189–214 number acceptance , 126 Binary numbers , 82 historical perspective , 225 object file , 126 ASCII versus, 109 b multiplication , 175–181 Assembly language , 15f conversion to decimal numbers , 77 b parallelism and , 214–215 defined , 14 , 125 defined , 74 Streaming SIMD Extensions and floating-point , 205f Bisection bandwidth , 525 advanced vector extensions in illustrated , 15 f Bit maps x86 , 215–216 programs , 125 defined , 18 subtraction , 172–175 RISC-V , 64 f , 85 b –86 b goal , 18 subword parallelism , 214–215 translating into machine language , storing , 18 subword parallelism and matrix 85 b –86 b Bit-Interleaved Parity (RAID 3) , 481.e4 multiply , 216–220 Asserted signals , 240 , A-4 Bits Arithmetic instructions .
See also Associativity ALUOp , 250 , 251 Instructions in caches , 395 b –396 b defined , 14 desktop RISC , D-11f , D-11 f degree, increasing , 394–396 , 442 dirty , 428 b embedded RISC , D-13 f increasing , 399–400 guard , 212 logical , 241–242 set, tag size versus, 399 b –400 b patterns , 212b –213 b operands , 67–74 Atomic compare and swap , 123 b reference , 426b Arithmetic intensity , 531–532 Atomic exchange , 122 rounding , 212 Arithmetic logic unit (ALU) . See also Atomic fetch-and-increment , 123b sign , 75 ALU control ; Control units Atomic memory operation , B-21 state , C-8–C-10 1-bit , A-26–A-29 Attribute interpolation , B-43–B-44 sticky , 212 64-bit , A-29–A-31 auipc’s effect , 156 valid , 374–376 before forwarding , 297f Automobiles, computer application in , 4 Blocking assignment , A-24 branch datapath , 244–245 Average memory access time (AMAT) , Blocking factor , 404 hardware , 174 392 Block-Interleaved Parity (RAID 4) , 481. memory-reference instruction calculating , 392 b e4–481.e5 use , 235 Blocks for register values , 242 B combinational , A-4–A-5 R-format operations , 243 f defined , 365–366 signed-immediate input , 300 Bandwidth , 29–30 finding , 442–443 ARM Cortex-A53 , 234 , 332–340 bisection , 525 flexible placement , 392–396 address translation for , 458f external to DRAM , 388 least recently used (LRU) , 399 caches in , 459 f memory , 388 locating in cache , 397–399 data cache miss rates for , 460f network , 523–524 miss rate and , 381f memory hierarchies of , 457 Barrier synchronization , B-18 multiword, mapping addresses to , performance of , 460–462 defined , B-20 380 b –381 b specification , 333 f for thread communication , B-34 placement locations , 441 TLB hardware for , 458 f Base addressing , 69 , 118 placement strategies , 394 ARPAnet , 54.e9 Base registers , 69 replacement selection , 399 Arrays , 405 f Basic block , 95 b replacement strategies , 444 logic elements , A-18–A-20 Benchmarks , 528–538 spatial locality
exploitation , 381 multiple dimension , 210 defined , 46 state , A-4–A-5 pointers versus, 141–144 Linpack , 528 , 248.e2–248dir.e3 , valid data , 374–376 procedures for setting to zero , 141 f 248.e3 Bonding , 28 ASCII multiprocessor , 528–538 Boolean algebra , A-6–A-7 binary numbers versus, 109 b NAS parallel , 530 Bounds check shortcut , 96 character representation , 108 f parallel , 529 f Branch datapath defined , 108–109 PARSEC suite , 530 ALU , 244–245 symbols , 111 SPEC CPU , 46–48 operations , 244–245 Assemblers , 125–127 SPEC power , 48–49 Branch if Equal (beq) , A-32 defined , 14 SPECrate , 528 Branch if greater than or equal, unsigned function , 125–127 Stream , 538 b (bgeu) , 95–96 Branch if less than (blt) instruction , compiling assignment with registers , set-associative cache , 395 95–96 67 b –68 b steps , 383 Branch if less than, unsigned (bltu) , compiling while loops in , 94 b –95 b in write-through cache , 383 95–96 sort algorithms , 141 f Cache performance , 388–408 Branch instructions translation hierarchy , 124 f calculating , 390 b –391 b pipeline impact , 306f translation to RISC-V assembly hit time and , 391–392 Branch not taken language , 65 impact on processor performance , assumption , 305–306 variables , 104 b 390–391 defined , 244 C.mmp , 577.e3–577.e4 Cache-aware instructions , 470 Branch prediction C++ language , 173.e7 , 150.e26 Caches , 373–388 . See also Blocks buffers , 308 Cache blocking and matrix multiply , accessing , 376–382 as control hazard solution , 272 463–466 in ARM Cortex-A53 , 459 f defined , 271–272 Cache coherence , 452–456 associativity in , 395b –396 b dynamic , 272 , 308–312 coherence , 452 bits in , 380 b static , 322 consistency , 452 bits needed for , 380 Branch predictors enforcement schemes , 454 contents illustration , 377f accuracy , 310 implementation techniques , 482.
defined , 19–22 , 373–374 correlation , 310–311 e10–482.e11 direct-mapped , 374 , 375f , 380 , 392 information from , 310–311 migration , 454 empty , 376 tournament , 311–312 problem , 452 , 453 f , 456 b FSM for controlling , 447–452 Branch table , 97–98 protocol example , 482.e11–482.e15 fully associative , 393 Branch taken protocols , 454 GPU , B-38 cost reduction , 306–307 replication , 454 inconsistent , 383 defined , 244 snooping protocol , 454–456 index , 378 Branch target snoopy , 482.e16 in Intel Core i7 , 459 f addresses , 244 state diagram , 482.e15 f Intrinsity FastMATH example , buffers , 310 Cache coherency protocol , 482.e11–482. 385–387 Branches . See also Conditional e15 locating blocks in , 397–399 branches finite-state transition diagram , 482.e14 f locations , 375 f addressing in , 115–117 functioning , 482.e13 f multilevel , 388 , 400–403 compiler creation , 93–94 mechanism , 482.e13f nonblocking , 458 decision, moving up , 306–307 state diagram , 482.e15 f physically addressed , 434–435 delayed , 272 , 306–308 states , 482.e12 physically indexed , 434b –435 b ending , 95b write-back cache , 482.e14 f physically tagged , 434b –435 b execution in ID stage , 307 Cache controllers , 457 primary , 400 , 407–408 pipelined , 308b coherent cache implementation secondary , 400 , 407–408 target address , 306–307 techniques , 482.e10–482.e11 set-associative , 393 Branch-on-zero instruction , 258–259 implementing , 482.e1 simulating , 466 b Bubble Sort , 140 snoopy cache coherence , 482.e16 size