Assembly Language Programming 64-Bit Environments

Zbigniew Jurkiewicz, Instytut Informatyki UW
October 17, 2017

Some recent history

Intel, together with HP, starts work on a 64-bit processor using VLIW technology. The Itanium processor is born, with the architecture labeled IA-64.
AMD develops its own 64-bit processor as a minimal (at least externally) extension of the 32-bit x86 architecture. The Opteron processor (Athlon 64) is born, with the architecture denoted x86-64.
While Itanium is more interesting and technologically advanced, it does not sell well (maybe because it is too expensive ;-) and is used only in larger servers, where it replaced older HP processors.
Intel clones the AMD architecture (starting with the Pentium 4 Xeon) under some strange names, such as EM64T or IA-32e.

x86-64 architecture

5 working modes, 3 of which are the old modes of 32-bit x86.
Compatibility Mode: executes programs compiled for 32 bits under a 64-bit operating system.
64-bit Mode: full 64-bit.
The Application Binary Interface (ABI) for Linux is defined by amd64.org.

Registers in x86-64

General registers
  64-bit: RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI, R8, R9, R10, R11, R12, R13, R14, R15
  32-bit: EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI and R8D, R9D, R10D, R11D, R12D, R13D, R14D, R15D
  16-bit: AX, CX, DX, BX, SP, BP, SI, DI and R8W, R9W, R10W, R11W, R12W, R13W, R14W, R15W
  8-bit: AL, CL, DL, BL, AH, CH, DH, BH, SPL, BPL, SIL, DIL, R8B, R9B, R10B, R11B, R12B, R13B, R14B, R15B
128-bit wide XMM registers (used in SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, SSE5 and Advanced Vector Extensions)
  In 64-bit mode there are 16 such registers: XMM0–XMM15

Operations on x86-64

Pointers (in registers) are always 64 bits long, but only 48 bits are used for virtual addresses (which gives a 256 TB address space). Physical addresses are at most 52 bits long (please do not ask why).
New addressing mode: relative to the instruction pointer (RIP-relative), a scheme already used in IA-32 for jumps etc.
  32-bit signed offset.
  Easier generation of Position Independent Code (PIC).
8-bit and 16-bit operations do not modify the upper part of the register.
32-bit operations clear the upper part of the register to zero (possibly so that 32-bit and 64-bit pointers can coexist in mixed modes); for example, mov rax,100 and mov eax,100 have the same effect.
The REX prefix is used for operations on full 64-bit operands.
Special opcodes for loading 64-bit immediate values (movabs).

Segmentation in x86-64

CS is used only for setting the protection level of the code; its base address is always 0 and there is no size control (no limits).
DS, ES, SS: ignored, all three treated like CS (base address 0, no limits).
FS, GS: used only for setting the base address of the segment (needed for MS Windows).
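
The rules for partial register writes, 64-bit immediates and RIP-relative addressing can be seen in a short program. This is only a sketch under assumptions not stated in the slides: NASM syntax, a Linux x86-64 target, and the exit system call (number 60) to terminate. The label value is a hypothetical name chosen for illustration.

```nasm
; Sketch only. Assumed build: nasm -f elf64 regs.asm && ld -o regs regs.o
        default rel                     ; plain [label] operands become RIP-relative

        section .data
value:  dq      0x1122334455667788      ; hypothetical 64-bit datum

        section .text
        global  _start

_start:
        mov     rax, 0x1122334455667788 ; 64-bit immediate (the movabs form)
        mov     al, 0xFF                ; 8-bit write:  RAX = 0x11223344556677FF
        mov     ax, 0xAAAA              ; 16-bit write: RAX = 0x112233445566AAAA
        mov     eax, 0xBBBBBBBB         ; 32-bit write: RAX = 0x00000000BBBBBBBB

        mov     rcx, [value]            ; RIP-relative load of the datum
        lea     rdx, [value]            ; RIP-relative address of the datum

        shr     rax, 32                 ; upper half was cleared, so RAX = 0
        mov     edi, eax                ; exit status 0
        mov     eax, 60                 ; exit (assumed Linux x86-64 call number)
        syscall
```

The immediate was chosen so that it does not fit in a sign-extended 32-bit field; for smaller constants an assembler is free to pick a shorter encoding.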

Conventions of register use (ABI)

System calls:
  Both SYSCALL and INT 0x80 work, but differently.
  For SYSCALL the system call number goes in EAX and the parameters in RDI, RSI, RDX, R10, R8, R9.
  Numbers of system calls (services) are in /usr/src/linux/include/asm-x86_64/unistd.h.
  The result is returned in RAX (SYSCALL itself overwrites RCX and R11).

Function calls (CALL):
  Arguments (integers and pointers) in RDI, RSI, RDX, RCX, R8, R9. If an argument is smaller than 64 bits, you can use the corresponding "partial" register.
  Floating-point arguments in XMM0, XMM1, XMM2, ..., XMM7.
  If more arguments are needed, they are passed on the stack and should be aligned (usually to 64 bits), but of course only the lower part is used (remember about little-endian).
  The function value is returned in RAX (integers and pointers) or in XMM0 (floating-point numbers).
  RSP is usually constant inside the function body; RBP need not be used for frames.
  In GCC the stack is aligned to 128 bits (16 bytes) at a function call, which is useful for saving (pushing) FPU and SSE registers. In assembly language this must be done by hand:
      and rsp,-16
  RBX, RBP, RSP, R12, R13, R14, R15 must be preserved by the called function.
  Beyond the current top of the stack (at addresses below RSP) there is a protected red zone: 128 bytes that the program may use.

Problems

On x86-64 in 64-bit mode segments do not work, only paging.
But paging does not discriminate between levels 0–2 (What for? There are segments for that).
Thus the operating system of a virtual machine must run either on level 3, where it is not protected against bad applications, or on level 0.
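
The SYSCALL and function-call conventions from the ABI slides can be put together in one small program. This is a sketch under assumptions not given in the slides: NASM syntax, a Linux x86-64 target, and the call numbers 1 (write) and 60 (exit). The function add3 and the labels msg and msglen are hypothetical names used only for illustration.

```nasm
; Sketch only. Assumed build: nasm -f elf64 abi.asm && ld -o abi abi.o
        default rel

        section .rodata
msg:    db      "hello", 10
msglen  equ     $ - msg

        section .text
        global  _start

; long add3(long a, long b, long c)     (hypothetical helper)
; Arguments arrive in RDI, RSI, RDX; the result is returned in RAX.
add3:
        lea     rax, [rdi + rsi]
        add     rax, rdx
        ret

_start:
        ; write(1, msg, msglen) through SYSCALL
        mov     eax, 1                  ; system call number (assumed: write)
        mov     edi, 1                  ; 1st parameter: file descriptor 1 (stdout)
        lea     rsi, [msg]              ; 2nd parameter: buffer address
        mov     edx, msglen             ; 3rd parameter: byte count
        syscall                         ; result in RAX; RCX and R11 are overwritten

        ; ordinary function call: integer arguments in RDI, RSI, RDX
        mov     edi, 1
        mov     esi, 2
        mov     edx, 3
        call    add3                    ; RAX = 6

        ; exit(status)
        mov     edi, eax                ; exit status = 6
        mov     eax, 60                 ; system call number (assumed: exit)
        syscall
```

Running the program and then echo $? in the shell should show the value returned by add3, since it is passed on as the exit status.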

Intel/HP IA-64 (Itanium)

128 integer registers (64 bits each)
128 floating-point registers (82 bits each: sign, 17-bit exponent, 64-bit significand)
  f0 always 0.0
  f1 always 1.0
64 predicate registers (1 bit each), p0 always 1
8 branch registers (64 bits each)
??? 128 application registers ???
Some registers of each kind are "rotating" (e.g. r32–r127). Kind of a stack.
Instruction bundle: three 41-bit instruction slots plus a 5-bit template (128 bits in total).

CPUID Instruction

All newer processors have the CPUID instruction, which helps to identify the processor we are running on. In Linux this information is also accessible with
    cat /proc/cpuinfo
But first we must determine whether CPUID is supported, by trying to flip the ID flag (bit 21 of EFLAGS). The following is 32-bit code:
    pushfd
    pop eax
    mov ecx,eax          ;keep the original EFLAGS
    xor eax,00200000h    ;flip bit 21
    push eax
    popfd
    pushfd
    pop eax
    xor eax,ecx          ;nonzero if bit 21 was flipped
    jz cpuid_not_supported
Some processors do not support the ID flag, but they do support the CPUID instruction. In that case we can temporarily hook the Invalid Opcode exception (int 6) and execute the CPUID instruction. If the exception is triggered, CPUID is not supported.
Now we can use CPUID to identify the processor. The instruction expects the EAX register to hold a function number ("level"). Information is returned in EAX, ECX, EDX and EBX.
Using the CPUID instruction:
    mov eax,function
    cpuid
If we put 0 in EAX we receive:
  Maximum available level in EAX.
  ASCII processor ID ("short name") in EBX:EDX:ECX, as follows:
    Intel     "GenuineIntel" (ebx='Genu', bl='G' (47h))
    AMD       "AuthenticAMD"
    Cyrix     "CyrixInstead"
    Rise      "RiseRiseRise"
    Centaur   "CentaurHauls"
    NexGen    "NexGenDriven"
    UMC       "UMC UMC UMC "
There are also other levels:
  level 1 returns flags for processor properties;
  level 2 returns cache and TLB descriptors.
Example code to determine MMX support:
    ;; First check maximum available level
    xor eax, eax         ;level 0
    cpuid
    cmp eax, 1
    jb no_way            ;level 1 is not available
    ;; Now check MMX support
    mov eax, 1           ;level 1
    cpuid
    test edx, 00800000h  ;bit 23 is set if MMX is supported
    jz mmx_not_supported
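
Level 0 can be tried out directly in 64-bit mode, where CPUID is always available and no ID-flag test is needed. The program below is a sketch under assumptions not given in the slides: NASM syntax, a Linux x86-64 target, and the call numbers 1 (write) and 60 (exit). The buffer name vendor is hypothetical.

```nasm
; Sketch only. Assumed build: nasm -f elf64 vendor.asm && ld -o vendor vendor.o
        default rel

        section .bss
vendor: resb    13                      ; 12 characters plus a newline

        section .text
        global  _start

_start:
        xor     eax, eax                ; level 0
        cpuid                           ; overwrites EAX, EBX, ECX and EDX
        mov     [vendor], ebx           ; characters 0..3
        mov     [vendor + 4], edx       ; characters 4..7
        mov     [vendor + 8], ecx       ; characters 8..11
        mov     byte [vendor + 12], 10  ; trailing newline

        mov     eax, 1                  ; write (assumed call number)
        mov     edi, 1                  ; stdout
        lea     rsi, [vendor]
        mov     edx, 13
        syscall

        mov     eax, 60                 ; exit (assumed call number)
        xor     edi, edi                ; status 0
        syscall
```

Note the register order EBX, EDX, ECX when storing the string. If the same sequence is placed in a function called from C, RBX has to be saved and restored, because CPUID overwrites it and the ABI requires the callee to preserve it.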
