Inline Assembly

Florian “Florob” Zeitz

2017-06-07

1 Why?
2 Assembler
3 Design Goals
4 Prior Art (D, MSVC, gcc, LLVM IR)
5 Rust (now)
6 Rust’s Future?

Why?

- low-level control
  - config registers
  - hardware not otherwise accessible
- performance
- predictable timing
  - embedded
  - cryptography (side-channel)
- convenience
  - no separate assembly file
  - no need for a build.rs
- Rust philosophy: safe with unsafe {} escape hatches where needed

Examples

x86 (AT&T)

    jmp foo
    mov $5, %eax
    foo:
    mov 5(%eax, %ebx, 2), %edx

x86 (Intel)

    jmp foo
    mov eax, 5
    foo:
    mov edx, [eax + ebx*2 + 5]

ARM

    ldmiaeq sp!, {r4-r7, pc}
    ldrb r0, [sp, #32]

PowerPC

    lwz r0, 8(r1)
    rlwimi r11, r1, 0, 30, 28
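The two x86 listings above write the same base + index*scale + displacement operand in two syntaxes. The following is a sketch only, not taken from the talk: it assumes the asm! macro as it was later stabilized in Rust 1.59, where Intel syntax is the default and options(att_syntax) selects AT&T. The function names scaled_intel and scaled_att are hypothetical, lea is used instead of mov so that no memory is read, and the fallbacks for other targets are an addition so the block compiles everywhere.

```rust
/// Computes base + index*2 + 5 using the Intel-syntax addressing mode.
#[cfg(target_arch = "x86_64")]
pub fn scaled_intel(base: u64, index: u64) -> u64 {
    let dst: u64;
    // SAFETY: lea only performs address arithmetic and touches no memory.
    unsafe {
        std::arch::asm!(
            "lea {dst}, [{base} + {index}*2 + 5]",
            dst = lateout(reg) dst,
            base = in(reg) base,
            index = in(reg) index,
            options(pure, nomem, nostack, preserves_flags),
        );
    }
    dst
}

/// The same computation, written with the AT&T-syntax addressing mode.
#[cfg(target_arch = "x86_64")]
pub fn scaled_att(base: u64, index: u64) -> u64 {
    let dst: u64;
    // SAFETY: lea only performs address arithmetic and touches no memory.
    unsafe {
        std::arch::asm!(
            "lea 5({base}, {index}, 2), {dst}",
            dst = lateout(reg) dst,
            base = in(reg) base,
            index = in(reg) index,
            options(att_syntax, pure, nomem, nostack, preserves_flags),
        );
    }
    dst
}

// Assumed fallbacks (not assembly) so the sketch builds on other targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn scaled_intel(base: u64, index: u64) -> u64 {
    base.wrapping_add(index.wrapping_mul(2)).wrapping_add(5)
}

#[cfg(not(target_arch = "x86_64"))]
pub fn scaled_att(base: u64, index: u64) -> u64 {
    base.wrapping_add(index.wrapping_mul(2)).wrapping_add(5)
}
```

Both functions should return the same value for the same arguments; only the operand order and the punctuation of the addressing mode differ.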

Properties

- textual representation of the instructions a CPU executes
- mnemonics
- operands
  - register
  - immediate
  - label
  - combinations thereof

Design Goals

- easy to use
- portable
  - between platforms (x86, AMD64, ARM, AVR, PowerPC, RISC-V, …)
  - between backends (LLVM, Cretonne, gcc, …)
- support all instructions? (with all operand types?)

D

    asm { mov RAX, RDX; }

- implemented as DSL inside asm {}
- standardized per CPU family
- can be marked pure and nothrow
- only x86/AMD64 are currently available
- Docs: https://dlang.org/spec/iasm.html

D’s x86/AMD64 syntax

- close to regular Intel syntax
- registers, always uppercase
- mnemonics, always lowercase
- local variables accessed as var[EBP] (just var when not in naked code)
- many D expressions allowed
- $ represents the PC, but can’t be used for PIC
- pseudo-ops for:
  - alignment
  - data definition
  - prefixes (lock, rep, …)

D Examples: Add 5 to variable

    int var = 0;
    asm {
        add var, 3 + 2;
    }

D Examples: Get L1 cache size

    int ebx, ecx;
    asm {
        mov EAX, 4;
        xor ECX, ECX;
        cpuid;
        mov ebx, EBX;
        mov ecx, ECX;
    }
    writefln("L1 Cache: %s",
        ((ebx >> 22) + 1) * (((ebx >> 12) & 0x3ff) + 1)
        * ((ebx & 0xfff) + 1) * (ecx + 1));
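The arithmetic at the end of this example decodes the EBX and ECX values that CPUID leaf 4 returns: ways of associativity, physical line partitions, line size, and number of sets, each stored minus one. As a minimal sketch of just that decoding step, with no assembly involved, it could be written as below. The function name cache_size_bytes and the sample register values are made up for illustration; unsigned types are used so that shifting EBX cannot sign-extend.

```rust
/// Decodes a cache size in bytes from the EBX and ECX values of CPUID leaf 4.
/// Every field is stored as "value minus one", hence the +1 on each term.
pub fn cache_size_bytes(ebx: u32, ecx: u32) -> u64 {
    let ways = u64::from(ebx >> 22) + 1; // bits 31..22
    let partitions = u64::from((ebx >> 12) & 0x3ff) + 1; // bits 21..12
    let line_size = u64::from(ebx & 0xfff) + 1; // bits 11..0
    let sets = u64::from(ecx) + 1;
    // The first three factors fit in 33 bits; saturate on the last one so an
    // (unrealistic) all-ones input cannot overflow.
    (ways * partitions * line_size).saturating_mul(sets)
}
```

For example, hypothetical register values describing an 8-way cache with one partition, 64-byte lines, and 64 sets (ebx = 0x01C0003F, ecx = 63) decode to 8 * 1 * 64 * 64 = 32768 bytes.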

MSVC

    __asm { mov eax, edx }

- DSL after __asm
- only available for x86 (Pentium 4 and AMD Athlon opcodes)
- uses MASM (Microsoft Macro Assembler) expressions and C/C++ elements
- allows jumps to C labels from ASM and vice versa
- allows calls to C functions
- C/C++ operators cannot be used
- Docs: https://docs.microsoft.com/en-us/cpp/assembler/inline/inline-assembler

MSVC, supported MASM

- not supported:
  - data definitions
  - macros
- supported:
  - alignment
  - comments

MSVC, supported C/C++ elements

- symbols (labels, function names, variable names)
  - MASM reserved words take precedence over symbols
- constants
- comments
- macros and preprocessor directives
- type names where a MASM type would be legal

MSVC Examples: Add 5 to variable

    int var = 0;
    __asm {
        add var, 5
    }

MSVC Examples: Get L1 cache size

    int ebx_v, ecx_v;
    __asm {
        mov eax, 4
        xor ecx, ecx
        cpuid
        mov ebx_v, ebx
        mov ecx_v, ecx
    }
    std::cout << "L1 Cache: "
        << (((ebx_v >> 22) + 1) * (((ebx_v >> 12) & 0x3ff) + 1)
            * ((ebx_v & 0xfff) + 1) * (ecx_v + 1))
        << '\n';

gcc

    asm [volatile] ("template" : outs : ins : clobber)

- special syntax within asm ()
- uses template strings
- template interpreted by the assembler (the compiler doesn’t have to know instructions)
- compiler fills template based on constraints
- Docs: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

gcc Template

- %0, or %[name] as placeholders
- supports modifiers, e.g. %w0 to print a HImode register name (%ax)
- multiple dialects as {mov %%eax, %%ebx | mov ebx, eax}
  - specify one or all available dialects
  - purely defined by order
- escapes: %%, %{, %}, %|

gcc Arguments

- specify outputs, inputs, and clobbers
- constraints determine how to reference arguments (register, address, immediate)
- all inputs have to be read before any output is written
- clobbers specify touched state that cannot be inferred from inputs and outputs
- compiler moves arguments into registers, saves clobbered registers, knows to reload changed memory

gcc Constraints

- constraint codes are usually a single letter
- architecture specific
- e.g. r for register, m for memory, i for immediate
- multiple alternative constraints are allowed, allowing the compiler to choose the most efficient

gcc Constraints

- output
  - start with = or + (input and output)
  - can be modified with & to allow write before all inputs are read (early-clobber)
- input
  - can be tied to outputs by specifying the operand number as constraint
- clobbers
  - used registers
  - modified flags
  - if memory was modified

gcc volatile

- asm statements can be declared volatile
- disables code motion and dead code elimination
- may be required when side-effects are not otherwise visible, e.g. asm ("wfi");

gcc Examples: Add 5 to variable

    int var = 0;
    asm ("add $5, %0" : "+r"(var));
    asm ("add $5, %0" : "=r"(var) : "0"(var));

gcc Examples: Get L1 cache size

    int ebx, ecx;
    asm (
        "mov $4, %%eax;"
        "xor %%ecx, %%ecx;"
        "cpuid;"
        "mov %%ebx, %0;"
        : "=r"(ebx), "=c"(ecx)
        :
        : "eax", "ebx", "edx"
    );
    printf("L1 Cache: %i\n", ((ebx >> 22) + 1)
        * (((ebx >> 12) & 0x3ff) + 1)
        * ((ebx & 0xfff) + 1)
        * (ecx + 1));

gcc Examples: Increment

    int in = 3, out;
    asm (
        "mov $1, %0\r\n"
        "add %1, %0\r\n"
        : "=r"(out)
        : "r"(in)
    );

Is this correct? No, in and out could be in the same register. The mov would then overwrite in before the add reads it; marking the output as early-clobber ("=&r") prevents this.
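For comparison, here is a sketch of the same increment using the asm! macro as it was later stabilized in Rust 1.59, which postdates this talk. There, a plain out(reg) operand never shares a register with an input, so the early-clobber problem above does not arise; lateout(reg) is the opt-in for register reuse. The function name increment and the fallback for other targets are assumptions added for illustration.

```rust
/// Returns input + 1, computed as "mov dst, 1; add dst, src".
#[cfg(target_arch = "x86_64")]
pub fn increment(input: u64) -> u64 {
    let result: u64;
    // SAFETY: the assembly only writes its own output register.
    unsafe {
        std::arch::asm!(
            "mov {dst}, 1",     // writes the output before the input is read
            "add {dst}, {src}",
            // out(reg) is kept apart from all inputs, i.e. early-clobber.
            dst = out(reg) result,
            src = in(reg) input,
            options(pure, nomem, nostack),
        );
    }
    result
}

// Assumed fallback (not assembly) so the sketch builds on other targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn increment(input: u64) -> u64 {
    input.wrapping_add(1)
}
```

Replacing out(reg) with lateout(reg) in this sketch would reintroduce the bug from the gcc version, because the allocator would then be free to pick the same register for dst and src.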

LLVM IR

    %out = call i32 asm "template", "constraints"(i32 %in)

- used to implement gcc-style inline assembly
- outputs as return values, inputs as call arguments
- can be qualified with sideeffect, alignstack, inteldialect
- Docs: http://llvm.org/docs/LangRef.html#inline-assembler-expressions

LLVM Template

- $0 as placeholder
- supports modifiers, e.g. ${0:w}
- escape is $$

LLVM Constraints

- constraint codes are a single letter, ^ followed by two letters, or {reg}
- architecture specific
- e.g. r for register, m for memory, i for immediate
- multiple alternative constraints are allowed, allowing the compiler to choose the most efficient
- indirect outputs/inputs via the * modifier
  - argument is an address that will be read/written from, typically =*m

LLVM Constraints

- output
  - start with = (no +)
  - can be modified with & to allow write before all inputs are read (early-clobber)
- input
  - can be tied to outputs by specifying the operand number as constraint
- clobbers
  - start with ~
  - used registers
  - modified flags
  - if memory was modified

LLVM Examples: Add 5 to variable

    %1 = alloca i32, align 4
    store i32 0, i32* %1, align 4
    %2 = load i32, i32* %1, align 4
    %3 = call i32 asm "add $$5, $0", "=r,0"(i32 %2)
    store i32 %3, i32* %1, align 4

LLVM Examples: Get L1 cache size

    %1 = alloca i32, align 4
    %2 = alloca i32, align 4
    %3 = call { i32, i32 } asm
        "mov $$4, %eax;xor %ecx, %ecx;cpuid;mov %ebx, $0;",
        "=r,={ecx},~{eax},~{ebx},~{edx}"()
    %4 = extractvalue { i32, i32 } %3, 0
    %5 = extractvalue { i32, i32 } %3, 1
    store i32 %4, i32* %1, align 4
    store i32 %5, i32* %2, align 4

Rust global_asm!()

    global_asm!(r"
    add5:
        mov %rdi, %rax
        add $5, %rax
        ret
    ");

    extern {
        fn add5(i: i64) -> i64;
    }

- RFC 1548
- insert assembly verbatim into the module
- not yet implemented

Rust asm!()

    unsafe {
        asm!("template"
            : outs
            : ins
            : clobbers
            : options
        );
    }

- straight binding to LLVM IR, but supports + (input and output)
- options are volatile, alignstack, and intel
- Docs: https://doc.rust-lang.org/unstable-book/asm.html

Rust Examples: Add 5 to variable

    let mut var = 0;
    unsafe {
        asm!("add $$5, $0" : "+r"(var));
    }

Rust Examples: Get L1 cache size

    let ebx: i32;
    let ecx: i32;
    unsafe {
        asm!(r"
            mov $$4, %eax;
            xor %ecx, %ecx;
            cpuid;
            mov %ebx, $0;"
            : "=r"(ebx), "={ecx}"(ecx) :: "eax", "ebx", "edx"
        );
    }
    println!("L1 Cache: {}", ((ebx >> 22) + 1)
        * (((ebx >> 12) & 0x3ff) + 1)
        * ((ebx & 0xfff) + 1) * (ecx + 1));

RFC 129

    asm!("assembly template",
        positional parameters,
        named parameters,
        clobbers and options
    );

- positional parameters: expr1, expr2_in -> expr2_out, "eax" = expr3_in -> expr3_out, …
- named parameters: name1 = expr_in_out, name2 = expr_in -> expr_out, …
- clobbers and options: "eax", "ebx", "memory", "volatile", "intel", …

RFC 129 Example

    fn addsub(a: int, b: int) -> (int, int) {
        let mut c = 0;
        let mut d = 0;
        unsafe {
            asm!("add {2:r}, {:=r}\n\t\
                  sub {2:r}, {:=r}",
                 a -> c, a -> d, b);
        }
        (c, d)
    }
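RFC 129 was a proposal and its syntax never shipped in this form. As a rough point of comparison, the same addsub could be sketched with the asm! macro that was eventually stabilized in Rust 1.59, where the "in -> out" arrows become inout operands. This is an illustration under that assumption, not the RFC's design; the variable names sum and diff and the fallback for other targets are made up here.

```rust
/// Returns (a + b, a - b), computed with one add and one sub instruction.
#[cfg(target_arch = "x86_64")]
pub fn addsub(a: i64, b: i64) -> (i64, i64) {
    // Both results start out as a copy of a and are updated in place.
    let mut sum = a;
    let mut diff = a;
    // SAFETY: the assembly only modifies its own operand registers.
    unsafe {
        std::arch::asm!(
            "add {sum}, {b}",
            "sub {diff}, {b}",
            sum = inout(reg) sum,
            diff = inout(reg) diff,
            b = in(reg) b,
            options(pure, nomem, nostack),
        );
    }
    (sum, diff)
}

// Assumed fallback (not assembly) so the sketch builds on other targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn addsub(a: i64, b: i64) -> (i64, i64) {
    (a.wrapping_add(b), a.wrapping_sub(b))
}
```

Named operands replace the positional {2:r} and {:=r} placeholders of the RFC, which keeps the template readable when the same input is used twice.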

Some Questions

- Should we go the DSL, or template route?
  - Does it make sense to support both?
- What to do about early-clobber?
- What placeholder should we use?
  - Should we copy gcc since it’s familiar?
- Are single character constraints sensible? Alternatives?
- Are sigil heavy constraints sensible? Alternatives?

My Thoughts

- Templates work better for portability
- We should use the same placeholder as everywhere else {}
- Use words over sigils for constraints
- Allow inputs and outputs in any order
- Either do early-clobber by default, or be explicit (early_out/late_out)

Example: Add 5 to variable

    let mut var = 0;
    unsafe {
        asm!("add $5, {}", inout(reg) var);
    }
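The syntax on this slide is a proposal. The asm! macro that was later stabilized in Rust 1.59 looks very similar, and a hedged, runnable variant is sketched below. The main visible difference is that the stabilized macro defaults to Intel syntax on x86, so the operands appear in the opposite order and the immediate has no $ prefix. The wrapper function add5 and the fallback for other targets are assumptions added for illustration.

```rust
/// Adds 5 to the argument in place using a single add instruction.
#[cfg(target_arch = "x86_64")]
pub fn add5(mut var: u64) -> u64 {
    // SAFETY: the assembly only modifies the one register it was given.
    unsafe {
        std::arch::asm!(
            "add {0}, 5", // Intel syntax: destination first
            inout(reg) var,
            options(pure, nomem, nostack),
        );
    }
    var
}

// Assumed fallback (not assembly) so the sketch builds on other targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn add5(var: u64) -> u64 {
    var.wrapping_add(5)
}
```

The inout(reg) operand plays the role of the "+r" constraint from the gcc and current Rust examples, spelled as a word instead of a sigil.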

Example: Get L1 cache size

    let ebx: i32;
    let ecx: i32;
    unsafe {
        asm!(r"
            mov $4, %eax;
            xor %ecx, %ecx;
            cpuid;
            mov %ebx, {};",
            out(reg) ebx, out(ecx) ecx, clobber(eax, ebx, edx)
        );
    }
    println!("L1 Cache: {}", ((ebx >> 22) + 1)
        * (((ebx >> 12) & 0x3ff) + 1)
        * ((ebx & 0xfff) + 1) * (ecx + 1));

Thank you for your attention.

Any questions?

https://babelmonkeys.de/~florob/talks/RC-2017-06-07-inline-assembly.pdf