(OL3.11-7). Page References Preceded by a Single Letter with Hyphen Refer to Appendices
Total Page:16
File Type:pdf, Size:1020Kb
Index Note: Online information is listed by chapter and section number followed by page numbers (OL3.11-7). Page references preceded by a single letter with hyphen refer to appendices. 1-bit ALU, B-26–29. See also Arithmetic addiu (Add Imm.Unsigned), 119 AGP, C-9 logic unit (ALU) Address interleaving, 381 Algol-60, OL2.21-7 adder, B-27 Address select logic, D-24, D-25 Aliasing, 444 CarryOut, B-28 Address space, 428, 431 Alignment restriction, 69–70 for most signifi cant bit, B-33 extending, 479 All-pairs N-body algorithm, C-65 illustrated, B-29 fl at, 479 Alpha architecture logical unit for AND/OR, B-27 ID (ASID), 446 bit count instructions, E-29 performing AND, OR, and addition, inadequate, OL5.17-6 fl oating-point instructions, E-28 B-31, B-33 shared, 519–520 instructions, E-27–29 32-bit ALU, B-29–38. See also Arithmetic single physical, 517 no divide, E-28 logic unit (ALU) unmapped, 450 PAL code, E-28 defi ning in Verilog, B-35–38 virtual, 446 unaligned load-store, E-28 from 31 copies of 1-bit ALU, B-34 Address translation VAX fl oating-point formats, E-29 illustrated, B-36 for ARM cortex-A8, 471 ALU control, 259–261. See also ripple carry adder, B-29 defi ned, 429 Arithmetic logic unit (ALU) tailoring to MIPS, B-31–35 fast, 438–439 bits, 260 with 32 1-bit ALUs, B-30 for Intel core i7, 471 logic, D-6 32-bit immediate operands, 112–113 TLB for, 438–439 mapping to gates, D-4–7 7090/7094 hardware, OL3.11-7 Address-control lines, D-26 truth tables, D-5 Addresses ALU control block, 263 A 32-bit immediates, 113–116 defi ned, D-4 base, 69 generating ALU control bits, D-6 Absolute references, 126 byte, 69 ALUOp, 260, D-6 Abstractions defi ned, 68 bits, 260, 261 hardware/soft ware interface, 22 memory, 77 control signal, 263 principle, 22 virtual, 428–431, 450 Amazon Web Services (AWS), 425 to simplify design, 11 Addressing AMD Opteron X4 (Barcelona), 543, 544 Accumulator architectures, OL2.21-2 32-bit immediates, 113–116 AMD64, 151, 224, OL2.21-6 Acronyms, 9 base, 116 Amdahl’s law, 401, 503 Active matrix, 18 displacement, 116 corollary, 49 add (Add), 64 immediate, 116 defi ned, 49 add.d (FP Add Double), A-73 in jumps and branches, 113–116 fallacy, 556 add.s (FP Add Single), A-74 MIPS modes, 116–118 and (AND), 64 Add unsigned instruction, 180 PC-relative, 114, 116 AND gates, B-12, D-7 addi (Add Immediate), 64 pseudodirect, 116 AND operation, 88 Addition, 178–182. See also Arithmetic register, 116 AND operation, A-52, B-6 binary, 178–179 x86 modes, 152 andi (And Immediate), 64 fl oating-point, 203–206, 211, A-73–74 Addressing modes, A-45–47 Annual failure rate (AFR), 418 instructions, A-51 desktop architectures, E-6 versus.MTTF of disks, 419–420 operands, 179 addu (Add Unsigned), 64 Antidependence, 338 signifi cands, 203 Advanced Vector Extensions (AVX), 225, Antifuse, B-78 speed, 182 227 Apple computer, OL1.12-7 I-1 I-2 Index Apple iPad 2 A1395, 20 TLB hardware for, 471 programs, 123 logic board of, 20 ARM instructions, 145–147 translating into machine language, 84 processor integrated circuit of, 21 12-bit immediate fi eld, 148 when to use, A-7–9 Application binary interface (ABI), 22 addressing modes, 145 Asserted signals, 250, B-4 Application programming interfaces block loads and stores, 149 Associativity (APIs) brief history, OL2.21-5 in caches, 405 defi ned, C-4 calculations, 145–146 degree, increasing, 404, 455 graphics, C-14 compare and conditional branch, increasing, 409 Architectural registers, 347 147–148 set, tag size versus, 409 Arithmetic, 176–236 condition fi eld, 324 Atomic compare and swap, 123 addition, 178–182 data transfer, 146 Atomic exchange, 121 addition and subtraction, 178–182 features, 148–149 Atomic fetch-and-increment, 123 division, 189–195 formats, 148 Atomic memory operation, C-21 fallacies and pitfalls, 229–232 logical, 149 Attribute interpolation, C-43–44 fl oating-point, 196–222 MIPS similarities, 146 Automobiles, computer application in, 4 historical perspective, 236 register-register, 146 Average memory access time (AMAT), multiplication, 183–188 unique, E-36–37 402 parallelism and, 222–223 ARMv7, 62 calculating, 403 Streaming SIMD Extensions and ARMv8, 158–159 advanced vector extensions in x86, ARPANET, OL1.12-10 B 224–225 Arrays, 415 subtraction, 178–182 logic elements, B-18–19 Backpatching, A-13 subword parallelism, 222–223 multiple dimension, 218 Bandwidth, 30–31 subword parallelism and matrix pointers versus, 141–145 bisection, 532 multiply, 225–228 procedures for setting to zero, 142 external to DRAM, 398 Arithmetic instructions. See also ASCII memory, 380–381, 398 Instructions binary numbers versus, 107 network, 535 desktop RISC, E-11 character representation, 106 Barrier synchronization, C-18 embedded RISC, E-14 defi ned, 106 defi ned, C-20 logical, 251 symbols, 109 for thread communication, C-34 MIPS, A-51–57 Assembler directives, A-5 Base addressing, 69, 116 operands, 66–, 73 Assemblers, 124–126, A-10–17 Base registers, 69 Arithmetic intensity, 541 conditional code assembly, A-17 Basic block, 93 Arithmetic logic unit (ALU). See also defi ned, 14, A-4 Benchmarks, 538–540 ALU control; Control units function, 125, A-10 defi ned, 46 1-bit, B-26–29 macros, A-4, A-15–17 Linpack, 538, OL3.11-4 32-bit, B-29–38 microcode, D-30 multicores, 522–529 before forwarding, 309 number acceptance, 125 multiprocessor, 538–540 branch datapath, 254 object fi le, 125 NAS parallel, 540 hardware, 180 pseudoinstructions, A-17 parallel, 539 memory-reference instruction use, 245 relocation information, A-13, A-14 PARSEC suite, 540 for register values, 252 speed, A-13 SPEC CPU, 46–48 R-format operations, 253 symbol table, A-12 SPEC power, 48–49 signed-immediate input, 312 Assembly language, 15 SPECrate, 538–539 ARM Cortex-A8, 244, 345–346 defi ned, 14, 123 Stream, 548 address translation for, 471 drawbacks, A-9–10 beq (Branch On Equal), 64 caches in, 472 fl oating-point, 212 bge (Branch Greater Th an or Equal), 125 data cache miss rates for, 474 high-level languages versus, A-12 bgt (Branch Greater Th an), 125 memory hierarchies of, 471–475 illustrated, 15 Biased notation, 79, 200 performance of, 473–475 MIPS, 64, 84, A-45–80 Big-endian byte order, 70, A-43 specifi cation, 345 production of, A-8–9 Binary numbers, 81–82 Index I-3 ASCII versus, 107 operations, 254 C conversion to decimal numbers, 76 Branch delay slots defi ned, 73 defi ned, 322 C.mmp, OL6.15-4 Bisection bandwidth, 532 scheduling, 323 C language Bit maps Branch equal, 318 assignment, compiling into MIPS, defi ned, 18, 73 Branch instructions, A-59–63 65–66 goal, 18 jump instruction versus, 270 compiling, 145, OL2.15-2–2.15-3 storing, 18 list of, A-60–63 compiling assignment with registers, Bit-Interleaved Parity (RAID 3), OL5.11- pipeline impact, 317 67–68 5 Branch not taken compiling while loops in, 92 Bits assumption, 318 sort algorithms, 141 ALUOp, 260, 261 defi ned, 254 translation hierarchy, 124 defi ned, 14 Branch prediction translation to MIPS assembly language, dirty, 437 as control hazard solution, 284 65 guard, 220 buff ers, 321, 322 variables, 102 patterns, 220–221 defi ned, 283 C++ language, OL2.15-27, OL2.21-8 reference, 435 dynamic, 284, 321–323 Cache blocking and matrix multiply, rounding, 220 static, 335 475–476 sign, 75 Branch predictors Cache coherence, 466–470 state, D-8 accuracy, 322 coherence, 466 sticky, 220 correlation, 324 consistency, 466 valid, 383 information from, 324 enforcement schemes, 467–468 ble (Branch Less Th an or Equal), 125 tournament, 324 implementation techniques, Blocking assignment, B-24 Branch taken OL5.12-11–5.12-12 Blocking factor, 414 cost reduction, 318 migration, 467 Block-Interleaved Parity (RAID 4), defi ned, 254 problem, 466, 467, 470 OL5.11-5–5.11-6 Branch target protocol example, OL5.12-12–5.12-16 Blocks addresses, 254 protocols, 468 combinational, B-4 buff ers, 324 replication, 468 defi ned, 376 Branches. See also Conditional snooping protocol, 468–469 fi nding, 456 branches snoopy, OL5.12-17 fl exible placement, 402–404 addressing in, 113–116 state diagram, OL5.12-16 least recently used (LRU), 409 compiler creation, 91 Cache coherency protocol, OL5.12- loads/stores, 149 condition, 255 12–5.12-16 locating in cache, 407–408 decision, moving up, 318 fi nite-state transition diagram, OL5.12- miss rate and, 391 delayed, 96, 255, 284, 318–319, 322, 15 multiword, mapping addresses to, 390 324 functioning, OL5.12-14 placement locations, 455–456 ending, 93 mechanism, OL5.12-14 placement strategies, 404 execution in ID stage, 319 state diagram, OL5.12-16 replacement selection, 409 pipelined, 318 states, OL5.12-13 replacement strategies, 457 target address, 318 write-back cache, OL5.12-15 spatial locality exploitation, 391 unconditional, 91 Cache controllers, 470 state, B-4 Branch-on-equal instruction, 268 coherent cache implementation valid data, 386 Bubble Sort, 140 techniques, OL5.12-11–5.12-12 blt (Branch Less Th an), 125 Bubbles, 314 implementing, OL5.12-2 bne (Branch On Not Equal), 64 Bus-based coherent multiprocessors, snoopy cache coherence, OL5.12-17 Bonding, 28 OL6.15-7 SystemVerilog, OL5.12-2 Boolean algebra, B-6 Buses, B-19 Cache hits, 443 Bounds check shortcut, 95 Bytes Cache misses Branch datapath addressing, 70 block replacement on, 457 ALU, 254 order, 70, A-43 capacity, 459 I-4 Index Cache misses (Continued) virtual memory and TLB integration, defi ned, 33 compulsory, 459 440–441 memory-stall, 399 confl ict, 459 virtually addressed, 443 number of registers and, 67 defi ned, 392 virtually indexed, 443 worst-case delay and, 272 direct-mapped cache, 404 virtually tagged, 443 Clock cycles