AFTERWORD

Where to Go from Here?

After working your way through this book, you have mastered the basics of modern assembly programming. The next step depends on your needs, and this afterword offers some ideas.

Security analysts can use the acquired knowledge to study malware, viruses, and other means of breaking into computers or networks. Malware arrives in binary format. You can take this binary code, reverse engineer it, and try to figure out what it is doing. You would, of course, do that on an isolated lab system. Study reverse-engineering techniques and acquire the necessary tooling. You should also consider learning ARM assembly for analyzing code on smartphones.

As a higher-level language programmer, you may consider building your own library of high-speed functions to be linked with your code. Study how you can optimize code; the code in this book was written for illustration purposes, not for high performance. In the book, we referred to a couple of texts that can help you write optimized code.

If you want a thorough understanding of the processors, download the Intel manuals and study them. There is a lot of interesting information to digest, and knowing how the hardware and software work together will give you an edge in developing system software or diagnosing system crashes.

As a higher-level language programmer with a grasp of assembly, you are now better equipped to debug your code. Analyze your .obj and .lst files and reverse engineer your code to see what happens. See how your compiler converts your code into machine language. Perhaps other instructions would be more efficient?
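
One low-effort way to start reading compiler output is to take a very small function and ask the compiler for its assembly. The sketch below is only an illustration and is not code from the book: the function name sum_array and the file name sum.c are hypothetical, and it assumes GCC on an x86-64 system.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example function (not from the book): adds up n
   64-bit integers. It is small enough that the generated assembly
   can be read line by line. */
int64_t sum_array(const int64_t *values, size_t n)
{
    int64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += values[i];   /* the loop body to look for in the listing */
    }
    return total;
}
```

Assuming the file is saved as sum.c, `gcc -O2 -S -masm=intel sum.c` should write an Intel-syntax listing to sum.s. Comparing that listing with the one produced by `-O0` is a simple way to see which instructions the compiler chooses; the exact result depends on the compiler version and options.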

© Jo Van Hoey 2019
J. Van Hoey, Beginning x64 Assembly Programming, https://doi.org/10.1007/978-1-4842-5076-1

INDEX

A
add instructions, 82
addpd, 240, 244
addps, 240
addsd, 98
addss, 99
adouble.asm, 191
Advanced Vector Extension (AVX), 221, 307
Aggregation, 252
align the stack, 384
Alive program, 37–40
Alive program printing, 40–44
alive.asm, 35, 36
AND instruction, 49
arguments1.asm output, 385
arguments2.asm output, 386
Arithmetic bit operation, 140
ASCII, 206
Assembler functions, 187
Assembler preprocessor directives, 155, 178
Assembly instructions, 197
asum.asm, 190
AVX instruction, 353, 362
AVX matrix multiplication, 329–331
AVX matrix operations, 317–327
AVX program, 310–315
AVX_transpose function, 360
avx2, 308
avx512, 308

B
Base pointer, 108
betterloop.asm, 65
Binary numbers, 13–16
bitflags variable, 150–151
Bit operations, 138, 139, 144, 147, 150
Blend mask, 334, 335
blend_trace function, 368
Block Started by Symbol (bss), 9
Branch functions, 110
break or b command, 30
bt, 147
btr, 147
bts, 147

C
C functions, 113, 121, 185
C , 1
callee-saved register, 130
, nonvolatile, 131
Calling convention, 16-byte aligned, 125
Calling convention, volatile, 131
Calling conventions, 121, 129–130
Cayley-Hamilton theorem, 332
circle.asm, 115, 188
CLI debugger, 21
Clobbered registers, 197, 201
cmp, 62, 218
cmpsb, 209, 212, 213
coefficient, 323
Command line, 181–182
Command line, debugging, 183–184
Compare and scan strings, 209–214
Comparison, 252
Conditional assembly, 168, 178–179
Console I/O, 159–161
continue or c command, 31
Conversion calculators, 14
CountReg, 208
CPU, 16, 307–309
cpuid, 215–218
CreateFileA, 402
cvtss2sd, 240, 244, 313

D
Data Display Debugger (DDD), 51–53, 92
Datatypes, 8
Debugging, break program, 27
Debug With Arbitrary Record Format (DWARF), 5
dec, 82
DF flag, 207
, 207
divsd, 98, 234, 323
divss, 99

E
eflags, 62
ELF format, 71
Endianness, big-endian, 40
Endianness, little-endian, 40
Environment path variable, 371–373
Environment variables, 373
epilogue, 37
Equal any, 252
Equal each, 252
Equal range, 252
Executable and Linkable Format for 64-bit (elf64), 5
Expanded makefile, 117
Explicit length, 251, 270–275
Extended inline assembly, 198, 199
extern, 113
External function, 101, 115

F
File handling, 168, 179
File I/O, 167–168
Flag register, 18, 19
Floating-point arguments, 123
Floating-point numbers, 16, 97, 99, 390–392
FPU instructions, 97
function.asm output, 102
function2.asm output, 105
function4.asm, 113

G
GDB, 21, 24
GDB commands, 53, 54, 376
GDB, debugging, 22–27
gdbinit file, 24
gdb memory, 70
Gedit, 2, 4
General-purpose register, 16, 18
global, 115
GNU compiler collection (GCC), 3, 6
GUI debuggers, 21

H
haddpd, 325
Hello Windows world, 371–376
hello, world, better version, 32–33
hello, world program, 1
High cycles, 360
Higher-level language programmer, 405

I
icalc.asm, 77
idiv instructions, 85
IEEE-754, 16
imm8, 251
imm8 control byte, 253–256
Implicit length, 251, 267–270
imul instructions, 83, 84
inc instructions, 82
info registers, 28–30
Inline assembly, 195–197
Instruction Pointer Register (rip), 18–19
Integer arithmetic instructions, 82, 83
Integers, 15
Integrated development environment (IDE), 3, 57
Intel syntax flavor, 23, 24
IntRes1, 253
IntRes2, 253

J, K
jge, 63
jmp instructions, 101
jne, 212
jnz, 213
jump instructions and flags, 63
jump.asm, 60
jumploop.asm, 63, 64
jz, 213

L
ldmxcsr, 226, 234
Leaf functions, 110
lea instruction, 69, 207
Length of string, 259, 260
Leverrier algorithm, 333
Linking Options line, 119
little-endian, 345
lodsb, 212
Looping, 63
Looping vs. jumping, 66
loop instruction, 65, 66, 89
Low cycles, 360

M
Machine language, 11, 12
Macros, 154–156
makefile, 5, 232
mask, 303
MASM, 371
Match characters, 252
Match characters in range, 252
Matrix inversion, 317, 332
Matrix Math Extension (MMX), 221
matrix multiplication, 317, 329–332
Matrix print, 328
Matrix transpose, 317, 339–341, 343
Memory, 67–69
Memory alignment, 12
memory.asm, 67
Memory investigation, DDD, 54
Memory page, 262
MinGW-w64, 372
Minimalist GNU for Windows (MinGW), 372
minus_mask, 321
mov, 10, 54, 69
movaps, 244
movdqa, 248
movdqu, 279
move.asm, 51, 52
Moving strings, 203–208
movq, 44
movsb, 207, 208
movsd, 97, 99, 208
movss, 99, 240, 313
movsw, 208
movupd, 240
movups, 240
mul, 83
mulsd, 98
mulss, 99
Multiline macros, 155
MXCSR, 19, 234, 235
mxcsr bits, 226–233

N
NASM, 3, 5, 371
nasm -v, 371
neg instruction, 49
Netwide Assembler (NASM), 2, 71, 153
next or n command, 31
Non-floating-point arguments, 122, 383
nop instruction, 25
NOT, 47

O
Octal notation, 168
Octal number, 15
Optimization, 361
OR, 47, 48
Out-of-order execution, 360
Overflows, data, 162–165

P, Q
Packed data, 221–223
paddd, 248
pcmpestri, 251, 274
pcmpestrm, 251, 289
pcmpistri, 251, 269
pcmpistrm, 251, 295
Permutation, 346
Permutation mask, 335–337
pextrd, 249
pinsrd, 249
Polarity, 252
pop instruction, 87
Portable assembly language, 1
port 5 pressure, 361
Position-independent executables (PIEs), 6
PowerShell, 373
printb, 133, 138
printdpfp, 240
printf, 40–42, 83, 91, 101, 115
print_hex.c, 231
print_mxcsr.c, 231
print or p command, 31
printspfp, 240, 244
printString, 179
print_xmm, 234
prologue, 37
pshufd, 279, 348
pstrcmp, 269
pstrlen, 260
pstrln, 295
pstrscan_l function, 266
push, 87, 90
pxor, 261–266

R
radius, pi variables, 40, 43, 44
rax, 54, 69, 83
rbx counter, 64
rdtsc, 360
rdtscp, 360
rdx, 10
readelf, 71–73
reads function, 165
rect.asm, 188
Register constraints, 200
Registers, 16, 62
rep, 207
repe, 212
repne, 212, 214
reverse string, 87–90
reverse_xmm0 function, 295
rflags, 18, 62
rip register, 18, 29
rol, 140
ror, 140
Round down, 226
Round to nearest, 226
Round up, 226
Runtime masks, 288

S
sal, 82, 139
sar, 82, 139
Scalar data, 221–223
scasb, 209, 212, 214
Search, characters, 289–293
Search in string, 262–264, 266
Search, range of characters, 296, 298–300
Search, range of uppercase, 302
Search, substring, 301, 303–305
section .bss, 9
section .data, 7, 8
section .text, 9–12
Security analysts, 405
seq_trace function, 368
seq_transpose function, 360
serializing, 360
setc, 150
setnz, 326
Settings dialog, SASM, 58, 59
Shadow space, 374, 383
shift, 82
shl, 139
shr, 139
Shuffle broadcast, 283
Shuffle masks, 283, 288, 350
Shuffle reverse, 283
Shuffle rotate, 283
Shuffle version, matrix, 348–352
Shuffling, 277–283
Sign extension, 82, 140
Significand/mantissa, 96
SimpleASM (SASM), 21, 57, 92, 165, 372, 373
Simple function, 101–103
Single vs. double precision, 95
Single instruction, multiple data (SIMD), 19, 221
Single-line macros, 155
singular, 319
sqrtsd, 99
sqrtss, 99
sreverse.asm, 189
SSE, aligned data, 241, 242, 245
SSE packed integers, instruction, 247–250
SSE string manipulation, 251, 252, 256
SSE, unaligned data, 237–240
SSE unaligned example, 310–312, 314
STABS, 5
Stack alignment, 16 byte, 107–109, 384
stack.asm, 87, 390
Stack frames, 110
Stack layout, 125, 128
Stack pointer, 68
step or s command, 31
stmxcsr, 234
stosb, 206, 208
stosd, 208
stosw, 208
Streaming SIMD Extension (SSE), 215, 221
String compare, 252
Strings, explicit length, 270, 271, 273, 274
Strings, implicit length, 267–269
sub, 82
subsd, 98
subss, 99
Substring search, 252
syscall, 167, 376
System V AMD64 ABI, 182

T
test, 218
testfile.txt file, 179
test instruction, 218–220
time instruction, 66
timestamp, 360
trace, 322
Trace computation, 362–369
Transpose computation, 353–361
Truncate, 226
tui enable command, 31

U
Unaligned/aligned data, 223, 224
unpack version, 344–348

V
vaddpd, 312
vaddps, 311
variadic function, 393–395
vblendpd, 325
vbroadcastsd, 324
vdivsd, 324
vextractf128, 311
vfmadd213sd, 323
vfmadd231pd, 327
vfmadd231sd, 323
vhaddpd, 325
Visual Studio, 201, 371
vmovapd, 324
vmovupd, 312
vmovups, 311
vmulpd, 324
vpermpd, 325
vperm2f128, 347
vshufpd, 351
vtrace function, 333
vunpckhpd, 345
vunpcklpd, 345
vxorpd, 323
vzeroall, 321

W
Windows, 371
Windows API, 377
Windows API, Console Output, 377–380
WriteConsole, 380
WriteFile, 380, 402

X
x64 calling convention, 375
processors, 201
xmm registers, 19, 277, 392, 397
XOR instruction, 48

Y
ymm register, 19

Z
ZF flag, 213