Computer Architecture and System Programming Laboratory

TA Session 7: x87 FPU

The x87 Floating-Point Unit (FPU) provides high-performance floating-point processing capabilities:
• floating-point, integer, and packed BCD integer data types
• floating-point processing algorithms
• exception handling
• IEEE Standard 754

Reference: http://home.agh.edu.pl/~amrozek/x87.pdf

The x87 FPU is a separate execution environment consisting of 8 data registers and a set of special-purpose registers (the status word, control word, and tag word, among others). A value loaded from memory into an x87 FPU data register is automatically converted into double extended-precision floating-point format.

x87 FPU instructions treat the eight x87 FPU data registers as a register stack. The register number of the current top-of-stack register is stored in the TOP (stack TOP) field of the x87 FPU status word. Load operations decrement TOP by one and load a value into the new top-of-stack register; store operations store the value from the current TOP register in memory and then increment TOP by one.

The 16-bit x87 FPU status register indicates the current state of the x87 FPU. The 16-bit tag word indicates the contents of each of the 8 registers in the x87 FPU data-register stack (one 2-bit tag per register). Each tag in the tag word corresponds to a physical register; the TOP pointer is used to associate tags with registers relative to ST(0).

```nasm
var1: dt 5.6
var2: dt 2.4
var3: dt 3.8
var4: dt 10.3

fld  tword [var1]   ; st0 = 5.6, TOP=4
fmul tword [var2]   ; st0 = st0*2.4 = 13.44, TOP=4
fld  tword [var3]   ; st0 = 3.8, st1 = 13.44, TOP=3
fmul tword [var4]   ; st0 = st0*10.3 = 39.14, st1 = 13.44, TOP=3
fadd st1            ; st0 = st0+st1, st1 = 13.44, TOP=3
```

gdb command to see the stack data registers: `tui reg float`

The x87 FPU recognizes and operates on the following seven data types: single-precision floating point, double-precision floating point, double extended-precision floating point, signed word integer, signed doubleword integer, signed quadword integer, and packed BCD decimal integers.

Example: loading the integer 9 from RAM into the x87 data-register stack converts it to a double extended-precision floating-point number:

```nasm
mov qword [n], 9    ; integer 9 in memory
fild qword [n]      ; convert it and push it onto the register stack
```

9d = 1001b = 1001.0b = (−1)^0 ∙ 1.001b ∙ 2^11b

sign bit = 0, exponent = 11b (i.e., 3), significand = 1.001

(Figure in the slides: the sign, exponent, and significand fields of this number as stored in an 80-bit x87 data register.)

FPU INSTRUCTION SET

The x87 FPU instructions are known as ESC (escape) instructions. They share a common opcode format, in which the first byte of the opcode is one of the numbers D8H through DFH. A dedicated group of load-constant instructions (FLD1, FLDZ, FLDPI, FLDL2T, FLDL2E, FLDLG2, FLDLN2) pushes commonly used constants onto st0.

Basic Arithmetic Instructions

Operands in memory can be in single-precision floating-point, double-precision floating-point, word-integer, or doubleword-integer format. They are converted to double extended-precision floating-point format automatically. Reverse instructions (e.g., FSUBR, FDIVR) perform the operation with the source and destination operands swapped: FSUBR computes ST(0) = source − ST(0) instead of ST(0) = ST(0) − source. The pop versions of the instructions offer the option of popping the x87 FPU register stack after the arithmetic operation: they operate on values in the ST(i) and ST(0) registers, store the result in the ST(i) register, and pop the ST(0) register.

Control Instructions

The FINIT/FNINIT instructions initialize the x87 FPU and its internal registers to default values.
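As a minimal sketch (not from the original slides) of how the status word and TOP behave around FINIT, the following program pushes one value and returns the TOP field as its exit status. It assumes NASM, the x86-64 System V ABI, and uses the fact that TOP occupies bits 11–13 of the status word:

```nasm
; Minimal sketch: observe the TOP field of the x87 status word.
global main
section .text
main:
    enter 0, 0
    finit           ; tag word := FFFFH (all registers empty), TOP := 0
    fld1            ; push +1.0; the load decrements TOP from 0 to 7
    fnstsw ax       ; store the x87 status word into AX
    shr eax, 11
    and eax, 7      ; isolate TOP (bits 11..13 of the status word)
    fstp st0        ; pop the stack so no register is left in use
    leave
    ret             ; exit status = TOP
```

Running it with `./a.out; echo $?` should print 7, confirming that a load decrements TOP before writing the new top-of-stack register.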
Stack overflow and underflow exceptions

Stack overflow — an instruction attempts to load a value from memory into a non-empty x87 FPU register. A non-empty register is defined as a register containing a zero (tag value of 01), a valid value (tag value of 00), or a special value (tag value of 10).

Stack underflow — an instruction references an empty x87 FPU register as a source operand, including attempting to write the contents of an empty register to memory. An empty register has a tag value of 11.

Magic square

http://www.1728.org/magicsq1.htm

For an n × n magic square, each row, each column, and both diagonals sum to n ∙ (n² + 1) ÷ 2; for the 3 × 3 square this is 3 ∙ (3² + 1) ÷ 2 = 15.

1) '1' goes in the middle of the top row.
2) All subsequent numbers are placed one column to the right and one row up from the previous number.
3) Whenever the next number placement is above the top row, stay in that column and place the number in the bottom row.
4) Whenever the next number placement is outside of the rightmost column, stay in that row and place the number in the leftmost column.
5) When encountering a filled-in square, place the next number directly below the previous number.
6) When the next number position is outside both the top row and the rightmost column, place the number directly beneath the previous number.

The program below implements these rules for an n × n square; a worked 3 × 3 trace follows the listing.

```nasm
section .data
fs_usage:         db "Call with single, positive, odd number", 10, 0
fs_malloc_failed: db "A call to malloc() failed", 10, 0
fs_long:          db "%*ld", 0
fs_newline:       db 10, 0

section .bss
argv:  resq 1
n:     resq 1
n2:    resq 1       ; n2, a, and b are reserved but not used in the code shown
a:     resq 1
b:     resq 1
table: resq 1
width: resq 1

extern printf, atoi, calloc
global main

section .text
main:
    enter 0, 0
    finit                       ; initialize the x87 FPU and its internal registers to
                                ; default values; the tag word is set to FFFFH, which
                                ; marks all the x87 FPU data registers as empty
    mov qword [argv], rsi
    cmp rdi, 2                  ; argc
    jne .error
    mov rdi, qword [argv]
    mov rdi, qword [rdi + 8*1]  ; argv[1]
    call atoi
    cmp rax, 2
    jle .error
    test rax, 1                 ; tests whether the number is odd; the equivalent
                                ; would be 'and rax, 1', but that would change rax
    jz .error
    mov qword [n], rax
    mov rdi, rax
    mov rsi, 8
    call calloc                 ; allocate the array of row pointers
    cmp rax, 0
    je .malloc_failed
    mov qword [table], rax
    mov rdx, rax
    mov rax, 0
    mov rbx, qword [n]
.allocate_table:
    cmp rax, rbx                ; check if we reached the end of the table
    je .fill_table              ; if yes, finish allocation and start filling the table
    mov rdi, rbx
    mov rsi, 8                  ; gdb's disassembly shows this line as "mov esi, 8"
    push rax
    push rbx
    push rdx                    ; note: an odd number of pushes leaves rsp 16-byte
                                ; misaligned at the call below
    call calloc                 ; allocate a single row of the table
    pop rdx
    pop rbx
    pop rax
    mov qword [rdx], rax
    add rdx, 8
    add rax, 1
    jmp .allocate_table

.fill_table:
    mov rbx, 0                  ; a = 0 (row)
    mov r9, qword [n]           ; n
    mov rcx, r9
    shr rcx, 1                  ; b = n / 2 (column)
    mov r8, 1                   ; i
    mov rax, r9
    cdq                         ; cdq clears edx (and thus rdx) since eax is
                                ; non-negative here
    mul rax
    mov r10, rax                ; n^2
.fill_table_loop:
    cmp r8, r10                 ; i == n^2?
    jg .fill_table_done
    mov rdi, qword [table]      ; rdi = pointer to table
    mov rdi, qword [rdi + 8 * rbx] ; rdi = pointer to row[rbx] of the table
                                ; (row 0, then row 1, and then row 2)
    mov qword [rdi + 8 * rcx], r8
    inc r8                      ; r8 = 1, 2, 3, ...
    lea rax, [rbx + r9 - 1]     ; move one row up (mod n) ...
    cdq
    div r9
    mov rbx, rdx
    lea rax, [rcx + 1]          ; ... and one column to the right (mod n)
    cdq
    div r9
    mov rcx, rdx
    mov rdi, qword [table]
    mov rdi, qword [rdi + 8 * rbx]
    cmp qword [rdi + 8 * rcx], 0 ; is the target square still empty?
    je .fill_table_loop
    lea rax, [rbx + 2]          ; occupied: place the number directly below the
    cdq                         ; previous one, i.e. one row down and one column
    div r9                      ; left of the (already updated) position, mod n
    mov rbx, rdx
    lea rax, [rcx + r9 - 1]
    cdq
    div r9
    mov rcx, rdx
    jmp .fill_table_loop
```
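As a check on the index arithmetic, here is the order in which the loop above fills a 3 × 3 table, worked out from the code (rbx is the row, rcx the column):

```
1 at (0,1)                 start: row 0, column n/2
2 at (2,2), 3 at (1,0)     up-right moves, wrapping around both edges
4 at (2,0)                 target (0,1) occupied: directly below 3
5 at (1,1), 6 at (0,2)     up-right moves
7 at (1,2)                 wrapped target (2,0) occupied: directly below 6
8 at (0,0), 9 at (2,1)     up-right moves

resulting square:   8 1 6
                    3 5 7
                    4 9 2
```

Each row, column, and diagonal of the result sums to 3 ∙ (3² + 1) ÷ 2 = 15, as expected.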
After the table is filled, execution continues at .fill_table_done, where the print width is computed with the x87 FPU. Dividing by log2(10) converts the base-2 logarithm into base 10, since log10(x) = log2(x) / log2(10):

```nasm
.fill_table_done:
    fild qword [n]      ; FILD (load integer) converts an integer operand in memory
                        ; into double extended-precision floating-point format and
                        ; pushes the value onto the top of the register stack
    fld st0             ; FLD (load floating point) pushes a floating-point operand
                        ; onto the top of the x87 FPU data-register stack
                        ; (here it duplicates st0, so st0 = st1 = n)
    fmulp               ; multiply floating point and pop ST(0): st0 = n^2
    fxtract             ; extract exponent and significand: significand in ST(0),
                        ; exponent in ST(1) (in base 2)
    fld1                ; load +1.0 into ST(0)
    fxch                ; with no source operand, the contents of ST(0) and ST(1)
                        ; are exchanged
    fyl2x               ; FYL2X computes y * log2(x): replace ST(1) with
                        ; ST(1) * log2(ST(0)) and pop the register stack
    faddp               ; add ST(0) to ST(1), store the result in ST(1), and pop:
                        ; st0 = exponent + log2(significand) = log2(n^2)
    fldl2t              ; push log2(10) onto the FPU register stack
    fdivp               ; divide ST(1) by ST(0), store the result in ST(1), and pop:
                        ; st0 = log10(n^2) -- indeed we want log10(x), not log2(x)
    jmp .continue_voodoo
.voodoo: dq 1.5         ; constant: add 1.5 to st0 so that storing the result as an
                        ; integer at the label 'width' yields the rounded field width
.continue_voodoo:
    fld qword [.voodoo]
    faddp               ; add ST(0) to ST(1), store the result in ST(1), and pop:
                        ; st0 = log10(n^2) + 1.5
    fistp qword [width] ; store ST(0) in m64int and pop the register stack; indeed,
                        ; this rounds 'width' because it converts st0 to an integer

;;; PRINT THE MAGIC SQUARE
    mov rbx, 0                      ; row index
.outer_loop:
    cmp rbx, qword [n]
    je .end
    mov rcx, 0                      ; column index
.inner_loop:
    cmp rcx, qword [n]
    je .end_inner_loop
    mov rdi, fs_long                ; "%*ld": the field width is read from the
    mov rsi, qword [width]          ; argument list
    mov rdx, qword [table]
    mov rdx, qword [rdx + 8 * rbx]  ; pointer to row[rbx]
    mov rdx, qword [rdx + 8 * rcx]  ; table[rbx][rcx]
    mov rax, 0                      ; no vector registers used in the varargs call
    push rbx
    push rcx
    call printf
    pop rcx
    pop rbx
    inc rcx
    jmp .inner_loop
.end_inner_loop:
    mov rdi, fs_newline
    mov rax, 0
    push rbx
    call printf
    pop rbx
    inc rbx
    jmp .outer_loop

.error:
    mov rdi, fs_usage
    mov rax, 0
    call printf
    jmp .end
.malloc_failed:
    mov rdi, fs_malloc_failed
    mov rax, 0
    call printf
.end:
    leave
    ret
```
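A possible build-and-run sequence (assuming the listing is saved as magic.asm; the -no-pie flag is needed with newer gcc because the code takes absolute addresses such as fs_usage):

```
$ nasm -f elf64 -o magic.o magic.asm
$ gcc -no-pie -o magic magic.o
$ ./magic 3
 8 1 6
 3 5 7
 4 9 2
```

For n = 3 the computed field width is round(log10(9) + 1.5) = 2, so each entry is printed right-justified in two columns.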