CSC 252/452: Computer Organization

Today
• Arrays
  – One-dimensional
  – Multi-dimensional (nested)
  – Multi-level
• Structures
  – Allocation
  – Access
  – Alignment
• Unions
• Floating Point

Programming with SSE3
XMM Registers
◼ 16 total, each 16 bytes
◼ Each register can hold:
  ◼ 16 single-byte integers
  ◼ 8 16-bit integers
  ◼ 4 32-bit integers
  ◼ 4 single-precision floats
  ◼ 2 double-precision floats
  ◼ 1 single-precision float
  ◼ 1 double-precision float

Scalar & SIMD Operations
◼ Scalar Operations: Single Precision
    addss %xmm0,%xmm1
  [Figure: the low float of %xmm0 is added to the low float of %xmm1]
◼ SIMD Operations: Single Precision
    addps %xmm0,%xmm1
  [Figure: four float additions, one per 32-bit lane of %xmm0 and %xmm1]
◼ Scalar Operations: Double Precision
    addsd %xmm0,%xmm1
  [Figure: the low double of %xmm0 is added to the low double of %xmm1]

FP Basics
• Arguments passed in %xmm0, %xmm1, ...
• Result returned in %xmm0
• All XMM registers caller-saved

    float fadd(float x, float y)
    {
        return x + y;
    }

    # x in %xmm0, y in %xmm1
    addss %xmm1, %xmm0
    ret

    double dadd(double x, double y)
    {
        return x + y;
    }

    # x in %xmm0, y in %xmm1
    addsd %xmm1, %xmm0
    ret

FP Memory Referencing
• Integer (and pointer) arguments passed in regular registers
• FP values passed in XMM registers
• Different mov instructions to move between XMM registers, and between memory and XMM registers

    double dincr(double *p, double v)
    {
        double x = *p;
        *p = x + v;
        return x;
    }

    # p in %rdi, v in %xmm0
    movapd %xmm0, %xmm1      # Copy v
    movsd  (%rdi), %xmm0     # x = *p
    addsd  %xmm0, %xmm1      # t = x + v
    movsd  %xmm1, (%rdi)     # *p = t
    ret

Other Aspects of FP Code
• Lots of instructions
  – Different operations, different formats, ...
• Floating-point comparisons
  – Instructions ucomiss and ucomisd
  – Set condition codes CF, ZF, and PF
• Using constant values
  – Set XMM0 register to 0 with instruction xorpd %xmm0, %xmm0
  – Others loaded from memory

Breakout
Consider the following declaration of a two-dimensional array:

    int Array[n][n];

Assume n in %rdi; Array in %rsi; i in %rdx; j in %rcx.
Write the assembly code (x86-based) to read Array[i][j] into register %eax.

n × n Matrix Access
▪ Array Elements
  ▪ Address A + i * (C * K) + j * K
  ▪ C = n, K = 4
  ▪ Must perform integer multiplication

    /* Get element a[i][j] */
    int var_ele(size_t n, int a[n][n], size_t i, size_t j)
    {
        return a[i][j];
    }

    # n in %rdi, a in %rsi, i in %rdx, j in %rcx
    imulq %rdx, %rdi               # n*i
    leaq  (%rsi,%rdi,4), %rax      # a + 4*n*i
    movl  (%rax,%rcx,4), %eax      # a + 4*n*i + 4*j
    ret

Instruction Set Architecture
• Assembly Language View
  – Processor state
    • Registers, memory, …
  – Instructions
    • addl, movl, leal, …
    • How instructions are encoded as bytes

How do we go from a sequence of instructions to actual execution?
[Figure, labeled "CSC 252: Processor Architecture": layers of abstraction, top to bottom: Application Program; Compiler, OS; ISA; CPU Design; Circuit Design; Chip Layout]
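To make the lane picture on the XMM Registers and Scalar & SIMD Operations slides concrete, here is a minimal C sketch that models one 16-byte XMM register as a union and mimics what addss, addps, and addsd do to the destination operand. The type xmm_t and the model_* functions are hypothetical names used only for illustration; real code would use the instructions themselves (or compiler intrinsics).

```c
#include <assert.h>

/* Hypothetical software model of one 16-byte XMM register, viewed either
   as 4 single-precision lanes or as 2 double-precision lanes. */
typedef union {
    float  ps[4];   /* packed single: 4 x 32-bit */
    double pd[2];   /* packed double: 2 x 64-bit */
} xmm_t;

static_assert(sizeof(xmm_t) == 16, "an XMM register is 16 bytes");

/* Models AT&T "addss src, dst": only lane 0 is added;
   lanes 1-3 of dst keep their old values. */
void model_addss(const xmm_t *src, xmm_t *dst) {
    dst->ps[0] = dst->ps[0] + src->ps[0];
}

/* Models "addps src, dst": four independent single-precision additions. */
void model_addps(const xmm_t *src, xmm_t *dst) {
    for (int k = 0; k < 4; k++)
        dst->ps[k] = dst->ps[k] + src->ps[k];
}

/* Models "addsd src, dst": only the low double is added;
   the high double of dst keeps its old value. */
void model_addsd(const xmm_t *src, xmm_t *dst) {
    dst->pd[0] = dst->pd[0] + src->pd[0];
}
```

For example, with src lanes {1, 2, 3, 4} and dst lanes {10, 20, 30, 40}, model_addss leaves dst as {11, 20, 30, 40}, while model_addps on the original values would give {11, 22, 33, 44}.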
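The Other Aspects of FP Code slide notes that ucomiss and ucomisd set CF, ZF, and PF. The sketch below is a C model of the four possible flag outcomes of ucomisd as documented for x86-64. The struct and function names are hypothetical, and the operand order follows AT&T syntax, where ucomisd %xmm1, %xmm0 compares %xmm0 against %xmm1.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Flags written by the comparison (other flags are cleared by ucomisd). */
typedef struct { bool zf, pf, cf; } fpflags_t;

/* Hypothetical model of AT&T "ucomisd b, a": compares a against b. */
fpflags_t model_ucomisd(double a, double b) {
    fpflags_t f;
    if (isnan(a) || isnan(b)) {          /* unordered: at least one NaN */
        f.zf = true;  f.pf = true;  f.cf = true;
    } else if (a > b) {                  /* greater than */
        f.zf = false; f.pf = false; f.cf = false;
    } else if (a < b) {                  /* less than */
        f.zf = false; f.pf = false; f.cf = true;
    } else {                             /* equal */
        f.zf = true;  f.pf = false; f.cf = false;
    }
    return f;
}
```

Because "equal" (ZF = 1, PF = 0) and "unordered" (ZF = PF = CF = 1) both set ZF, compiled code for a == b on doubles generally has to test PF as well.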
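The address formula on the n × n Matrix Access slide can be checked directly in C. The sketch below keeps var_ele from the slide and adds a hypothetical var_ele_manual that performs the same A + i * (C * K) + j * K arithmetic by hand on a byte pointer, with C = n columns and K = sizeof(int) bytes (4 on x86-64, as the slide assumes).

```c
#include <assert.h>
#include <stddef.h>

/* var_ele from the slide: the compiler generates the address arithmetic. */
int var_ele(size_t n, int a[n][n], size_t i, size_t j) {
    return a[i][j];
}

/* Hypothetical helper: the same access with the address computed by hand.
   Row i starts i * (C * K) bytes past A; element j is j * K bytes further. */
int var_ele_manual(size_t n, int a[n][n], size_t i, size_t j) {
    assert(i < n && j < n);
    const char *A = (const char *)a;   /* byte address of a[0][0] */
    size_t C = n;                      /* number of columns */
    size_t K = sizeof(int);            /* bytes per element */
    return *(const int *)(A + i * (C * K) + j * K);
}
```

Both functions return the same element; the imulq in the slide's assembly corresponds to the i * C product, and the scale factor 4 in the leaq and movl addressing modes supplies the multiplications by K.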
Overview of Logic Design
• Fundamental Hardware Requirements
  – Communication
    • How to get values from one place to another
  – Computation – combinational logic
  – Storage – sequential logic
  – Clock to drive the next computation
• Bits are Our Friends
  – Everything expressed in terms of values 0 and 1
  – Communication
    • Low or high voltage on wire
  – Computation
    • Compute Boolean functions
  – Storage
    • Store bits of information

Digital Signals
[Figure: voltage vs. time on one wire, read as the bit sequence 0 1 0]
– Use voltage thresholds to extract discrete values from a continuous signal
– Simplest version: 1-bit signal
  • Either high range (1) or low range (0)
  • With guard range between them
– Not strongly affected by noise or low quality circuit elements
  • Can make circuits simple, small, and fast

Basic Building Block: Transistors
[Figures only; two slides with this title]

CMOS: Complementary MOS
• Use both n-type and p-type transistors

CMOS: NOR and NAND Gates
• NAND Gate (NOT + AND)
[Figure: CMOS NAND gate. By Reza Mirhosseini - originally uploaded to en.wikipedia (file log), Public Domain, https://commons.wikimedia.org/w/index.php?curid=12271062, https://upload.wikimedia.org/wikipedia/commons/thumb/e/e2/CMOS_NAND.svg/280px-CMOS_NAND.svg.png]

Computing with Logic Gates
[Figure: And gate, out = a && b; Or gate, out = a || b; Not gate, out = !a]
– Outputs are Boolean functions of inputs
– Respond continuously to changes in inputs
  • With some small delay
[Figure: voltage vs. time for inputs a and b and output a && b, marking the rising delay and the falling delay]

Combinational Circuits
[Figure: acyclic network of gates from primary inputs to primary outputs]
• Acyclic Network of Logic Gates
  – Continuously responds to changes on primary inputs
  – Primary outputs become (after some delay) Boolean functions of primary inputs

Arithmetic Logic Unit
[Figure: four ALU configurations selected by control signal 0–3, computing X + Y, X - Y, X & Y, and X ^ Y from inputs A and B, each with condition-code outputs OF, ZF, CF]
– Combinational logic
  • Continuously responding to inputs
– Control signal selects function computed
  • Corresponding to 4 arithmetic/logical operations in Y86
– Also computes values for condition codes

Sequential Logic: Memory and Control
• Sequential:
  – Output depends on the current input values and the previous sequence of input values.
  – Are Cyclic:
    • Output of a gate feeds its input at some future time.
  – Memory:
    • Remember results of previous operations
    • Use them as inputs.
  – Example of use:
    • Build registers and memory units.

Clocks
• Signal used to synchronize activity in a processor
• Every operation must be completed in the time between two clock pulses (or rising edges): the cycle time
• Maximum clock rate (frequency) determined by the slowest logic path in the circuit (the critical path)

Edge-Triggered Latch
[Figure: latch with data input D, clock C, trigger T, and outputs Q+ and Q–; timing diagram of Clock, T, D, and Q+]
– Only in latching mode for brief period
  • Rising clock edge
– Value latched depends on data as clock rises
– Output remains stable at all other times

Registers
Structure
[Figure: eight edge-triggered latches with inputs i7…i0, outputs o7…o0, and a shared Clock]
– Stores word of data
  • Different from program registers seen in assembly code
– Collection of edge-triggered latches
– Loads input on rising edge of clock

Register Operation
[Figure: before the rising clock edge, state = x, input = y, output = x; after it, state = y, output = y]
– Stores data bits
– For most of time acts as barrier between input and output
– As clock rises, loads input

State Machine Example
[Figure: Comb. Logic (ALU and MUX) feeding a clocked register; Load selects In or the accumulated sum]
– Accumulator circuit
– Load or accumulate
Timing (Load asserted in the x0 and x3 cycles):
  In:   x0   x1      x2         x3   x4      x5
  Out:  x0   x0+x1   x0+x1+x2   x3   x3+x4   x3+x4+x5

Random-Access Memory
[Figure: register file with read ports (srcA → valA, srcB → valB), a write port (dstW, valW), and Clock]
– Stores multiple words of memory
  • Address input specifies which word to read or write on each cycle
– Register file
  • Holds values of program registers
  • %eax, %esp, etc.
  • Register identifier serves as address
    – ID 8 implies no read or write performed
– Multiple Ports
  • Can read and/or write multiple words in one cycle
    – Each has separate address and data input/output

Register File Timing
• Reading
  – Like combinational logic
  – Output data generated based on input address
    • After some delay
• Writing
  – Like register
  – Update only as clock rises
[Figure: with srcA = 2, valA = x after a short delay; with dstW = 2 and valW = y, register 2 changes from x to y only at the rising clock edge]

Building Blocks
• Combinational Logic
  – Compute Boolean functions of inputs
  – Continuously respond to input changes
  – Operate on data and implement control
• Storage Elements
  – Store bits
  – Addressable memories
  – Non-addressable registers
  – Loaded only as clock rises
[Figure: ALU (fun, A, B), MUX, register file, and clocked register]
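The Digital Signals slide describes reading a continuous voltage as a 1-bit value using a low range, a high range, and a guard range between them. A minimal C sketch follows. The 800 mV and 2000 mV thresholds are illustrative assumptions only (real thresholds depend on the logic family and supply voltage), and voltages are given in integer millivolts.

```c
#include <assert.h>

/* Hypothetical thresholds in millivolts; real values depend on the logic family. */
#define V_LOW_MAX_MV  800    /* at or below: logic 0 */
#define V_HIGH_MIN_MV 2000   /* at or above: logic 1 */

static_assert(V_LOW_MAX_MV < V_HIGH_MIN_MV, "guard range must be non-empty");

typedef enum { SIG_INVALID = -1, SIG_LOW = 0, SIG_HIGH = 1 } signal_t;

/* Reads one wire as a 1-bit signal. */
signal_t read_bit(int millivolts) {
    if (millivolts <= V_LOW_MAX_MV)
        return SIG_LOW;       /* low range: 0 */
    if (millivolts >= V_HIGH_MIN_MV)
        return SIG_HIGH;      /* high range: 1 */
    return SIG_INVALID;       /* guard range: neither 0 nor 1 */
}
```

A small amount of noise (for example, a nominal 0 that drifts up to 650 mV) still reads as 0, which is the sense in which digital signals are "not strongly affected by noise".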
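The CMOS slides build a NAND gate (NOT + AND) from n-type and p-type transistors. The C sketch below models gates as functions on 0/1 ints and shows the standard observation that NOT, AND, and OR can all be built from NAND alone. The gate_* names are hypothetical, and the model captures only the Boolean function, not the rising and falling delays shown on the Computing with Logic Gates slide.

```c
#include <assert.h>

/* Bits are modeled as ints that must be 0 or 1. */
int gate_nand(int a, int b) {
    assert((a == 0 || a == 1) && (b == 0 || b == 1));
    return !(a && b);
}

/* NOT, AND, and OR built only from NAND gates. */
int gate_not(int a)        { return gate_nand(a, a); }                      /* !(a && a) == !a */
int gate_and(int a, int b) { return gate_not(gate_nand(a, b)); }            /* !!(a && b) */
int gate_or(int a, int b)  { return gate_nand(gate_not(a), gate_not(b)); }  /* !(!a && !b) */
```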
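The Arithmetic Logic Unit slide shows a control signal selecting one of four functions (X + Y, X - Y, X & Y, X ^ Y) while the ALU also produces OF, ZF, and CF. The sketch below is a C model of that behavior under assumptions the slide does not spell out: 32-bit words, CF as the unsigned carry for addition and the unsigned borrow for subtraction (x86-style), and CF and OF cleared by the two logical operations. The names cc_t and alu are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool of, zf, cf; } cc_t;

/* Hypothetical 32-bit ALU model. fun selects the operation:
   0: X + Y, 1: X - Y, 2: X & Y, 3: X ^ Y. Condition codes go to *cc. */
uint32_t alu(int fun, uint32_t x, uint32_t y, cc_t *cc) {
    uint32_t r;
    bool cf = false, of = false;
    switch (fun) {
    case 0:
        r  = x + y;                               /* wraps modulo 2^32 */
        cf = r < x;                               /* unsigned carry out */
        of = (~(x ^ y) & (x ^ r)) >> 31;          /* same-sign inputs, result sign differs */
        break;
    case 1:
        r  = x - y;
        cf = x < y;                               /* unsigned borrow (assumed x86-style) */
        of = ((x ^ y) & (x ^ r)) >> 31;           /* different-sign inputs, result sign differs from x */
        break;
    case 2:
        r = x & y;                                /* logical ops leave CF = OF = 0 */
        break;
    case 3:
        r = x ^ y;
        break;
    default:
        assert(!"invalid ALU function code");
        r = 0;
        break;
    }
    cc->zf = (r == 0);
    cc->cf = cf;
    cc->of = of;
    return r;
}
```

Because the function code is just an input, the same circuit serves all four operations; this is the sense in which the control signal "selects" the function computed.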
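The Clocks slide says the maximum clock rate is set by the slowest (critical) logic path. As a small worked sketch in C with hypothetical path delays in picoseconds: the cycle time must be at least the longest delay, so the maximum frequency is 1 / t_crit, and a 250 ps critical path gives 1 / (250 × 10^-12 s) = 4 GHz. Real designs also budget for register setup and clock-to-output time, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the maximum clock frequency in MHz for a circuit whose logic
   paths have the given delays in picoseconds. The critical path is the
   slowest one, and 1 / (t ps) = 10^6 / t MHz. */
long max_clock_mhz(const long delay_ps[], size_t npaths) {
    assert(npaths > 0);
    long critical = delay_ps[0];
    for (size_t k = 1; k < npaths; k++)
        if (delay_ps[k] > critical)
            critical = delay_ps[k];
    assert(critical > 0);
    return 1000000L / critical;   /* integer MHz, rounded down */
}
```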
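The State Machine Example combines a register (which loads only on a rising clock edge) with combinational logic (an ALU and a MUX) to build an accumulator. The C sketch below simulates it cycle by cycle. The names reg_t, clock_edge, accum_next, and run_accumulator are hypothetical, the register is assumed to start at 0, and each call to clock_edge stands in for one rising edge, so the register output stays stable between calls, as on the Register Operation slide.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical edge-triggered register: its output changes only at clock_edge(). */
typedef struct { long state; } reg_t;

void clock_edge(reg_t *r, long next) { r->state = next; }

/* Combinational logic: the MUX picks In when Load is asserted,
   otherwise the ALU's sum of In and the current register output. */
long accum_next(bool load, long in, long out) {
    return load ? in : out + in;
}

/* Runs n cycles; out[k] is the register output after the k-th rising edge. */
void run_accumulator(int n, const bool load[], const long in[], long out[]) {
    assert(n >= 0);
    reg_t r = { 0 };
    for (int k = 0; k < n; k++) {
        long next = accum_next(load[k], in[k], r.state);  /* settles before the edge */
        clock_edge(&r, next);                              /* latched on the rising edge */
        out[k] = r.state;
    }
}
```

With In = 1, 2, 3, 4, 5, 6 and Load asserted in cycles 0 and 3, the outputs are 1, 3, 6, 4, 9, 15, matching the x0, x0+x1, x0+x1+x2, x3, x3+x4, x3+x4+x5 pattern on the slide.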
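The Random-Access Memory and Register File Timing slides describe a register file whose read ports behave like combinational logic and whose write port behaves like a register. The sketch below models that split in C with eight 32-bit registers and ID 8 meaning "no register", as on the slide. The type and function names are hypothetical, and returning 0 for a read of ID 8 is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define NREGS    8   /* register IDs 0-7 */
#define REG_NONE 8   /* per the slide: ID 8 implies no read or write */

typedef struct { uint32_t r[NREGS]; } regfile_t;

/* Read port: like combinational logic, the output follows the address.
   Returning 0 for REG_NONE is an assumption of this sketch. */
uint32_t rf_read(const regfile_t *rf, unsigned src) {
    assert(src <= REG_NONE);
    return src == REG_NONE ? 0 : rf->r[src];
}

/* Write port: like a register, it updates only when called,
   which stands in for a rising clock edge. */
void rf_clock_write(regfile_t *rf, unsigned dstW, uint32_t valW) {
    assert(dstW <= REG_NONE);
    if (dstW != REG_NONE)
        rf->r[dstW] = valW;
}
```

Two read ports can be modeled as two independent rf_read calls in the same cycle (valA = rf_read(&rf, srcA); valB = rf_read(&rf, srcB)). A pending write with dstW = 2 is not visible to a read of register 2 until rf_clock_write runs, mirroring the x-to-y change at the rising edge on the timing slide.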
