RetDec: An Open-Source Machine-Code Decompiler

Jakub Kˇroustek Peter Matula Petr Zemek

Threat Labs

Botconf 2017 1 / 51 ♂ Peter Matula • main developer of the RetDec decompiler • senior developer @Avast (previously @AVG) • ♥ rock climbing and  • R [email protected]

> whoarewe

♂ Jakub Kˇroustek • founder of RetDec • Threat Labs lead @Avast (previously @AVG) • reverse engineer, malware hunter, security researcher • 7 @JakubKroustek • R [email protected]

Botconf 2017 2 / 51 > whoarewe

♂ Jakub Kˇroustek • founder of RetDec • Threat Labs lead @Avast (previously @AVG) • reverse engineer, malware hunter, security researcher • 7 @JakubKroustek • R [email protected] ♂ Peter Matula • main developer of the RetDec decompiler • senior developer @Avast (previously @AVG) • ♥ rock climbing and  • R [email protected]

Botconf 2017 2 / 51 Quiz Time

Botconf 2017 3 / 51 Quiz Time

Botconf 2017 4 / 51 Quiz Time

Botconf 2017 5 / 51 Quiz Time

Botconf 2017 6 / 51 Disassembling vs. Decompilation

Botconf 2017 7 / 51 Decompilation? What is it?

Botconf 2017 8 / 51 è Binary recompilation (yeah, like that’s ever gonna work) • porting • bug fixing • adding new features • original sources got lost • optimizations

Decompilation? What good is it?

ü Binary analysis • • malware analysis • vulnerability detection • verification • binary comparison • ...

Botconf 2017 9 / 51 Decompilation? What good is it?

ü Binary analysis • reverse engineering • malware analysis • vulnerability detection • verification • binary comparison • ... è Binary recompilation (yeah, like that’s ever gonna work) • porting • bug fixing • adding new features • original sources got lost • optimizations

Botconf 2017 9 / 51 • It is damn hard • compilation is not lossless • high-level constructions • data types • names • comments, macros, . . . • are optimizing • computer science goodies • undecidable problems • complex algorithms • exponential complexities • obfuscation, packing, anti-debugging

Ok, why aren’t we already using it?

• Multiple existing tools: Hex-Rays, Hopper, Snowman, etc.

Botconf 2017 10 / 51 • compilation is not lossless • high-level constructions • data types • names • comments, macros, . . . • compilers are optimizing • computer science goodies • undecidable problems • complex algorithms • exponential complexities • obfuscation, packing, anti-debugging

Ok, why aren’t we already using it?

• Multiple existing tools: Hex-Rays, Hopper, Snowman, etc. • It is damn hard

Botconf 2017 10 / 51 • compilers are optimizing • computer science goodies • undecidable problems • complex algorithms • exponential complexities • obfuscation, packing, anti-debugging

Ok, why aren’t we already using it?

• Multiple existing tools: Hex-Rays, Hopper, Snowman, etc. • It is damn hard • compilation is not lossless • high-level constructions • data types • names • comments, macros, . . .

Botconf 2017 10 / 51 • computer science goodies • undecidable problems • complex algorithms • exponential complexities • obfuscation, packing, anti-debugging

Ok, why aren’t we already using it?

• Multiple existing tools: Hex-Rays, Hopper, Snowman, etc. • It is damn hard • compilation is not lossless • high-level constructions • data types • names • comments, macros, . . . • compilers are optimizing

Botconf 2017 10 / 51 • obfuscation, packing, anti-debugging

Ok, why aren’t we already using it?

• Multiple existing tools: Hex-Rays, Hopper, Snowman, etc. • It is damn hard • compilation is not lossless • high-level constructions • data types • names • comments, macros, . . . • compilers are optimizing • computer science goodies • undecidable problems • complex algorithms • exponential complexities

Botconf 2017 10 / 51 Ok, why aren’t we already using it?

• Multiple existing tools: Hex-Rays, Hopper, Snowman, etc. • It is damn hard • compilation is not lossless • high-level constructions • data types • names • comments, macros, . . . • compilers are optimizing • computer science goodies • undecidable problems • complex algorithms • exponential complexities • obfuscation, packing, anti-debugging

Botconf 2017 10 / 51 • Many ABIs • Many OFFs (object-file formats) • ± ELF, q PE,  Mach-O, . . . • Many programming languages • Many compilers & optimizations • Statically linked code • ...

Generic decompilation? Even harder

• Many architectures • x86, ARM, MIPS, PowerPC, . . . • CISC vs. RISC • bit length, endianness, floating points • versions & extensions

Botconf 2017 11 / 51 • Many OFFs (object-file formats) • ± ELF, q PE,  Mach-O, . . . • Many programming languages • Many compilers & optimizations • Statically linked code • ...

Generic decompilation? Even harder

• Many architectures • x86, ARM, MIPS, PowerPC, . . . • CISC vs. RISC • bit length, endianness, floating points • versions & extensions • Many ABIs

Botconf 2017 11 / 51 • Many programming languages • Many compilers & optimizations • Statically linked code • ...

Generic decompilation? Even harder

• Many architectures • x86, ARM, MIPS, PowerPC, . . . • CISC vs. RISC • bit length, endianness, floating points • versions & extensions • Many ABIs • Many OFFs (object-file formats) • ± ELF, q PE,  Mach-O, . . .

Botconf 2017 11 / 51 • Many compilers & optimizations • Statically linked code • ...

Generic decompilation? Even harder

• Many architectures • x86, ARM, MIPS, PowerPC, . . . • CISC vs. RISC • bit length, endianness, floating points • versions & extensions • Many ABIs • Many OFFs (object-file formats) • ± ELF, q PE,  Mach-O, . . . • Many programming languages

Botconf 2017 11 / 51 • Statically linked code • ...

Generic decompilation? Even harder

• Many architectures • x86, ARM, MIPS, PowerPC, . . . • CISC vs. RISC • bit length, endianness, floating points • versions & extensions • Many ABIs • Many OFFs (object-file formats) • ± ELF, q PE,  Mach-O, . . . • Many programming languages • Many compilers & optimizations

Botconf 2017 11 / 51 • ...

Generic decompilation? Even harder

• Many architectures • x86, ARM, MIPS, PowerPC, . . . • CISC vs. RISC • bit length, endianness, floating points • versions & extensions • Many ABIs • Many OFFs (object-file formats) • ± ELF, q PE,  Mach-O, . . . • Many programming languages • Many compilers & optimizations • Statically linked code

Botconf 2017 11 / 51 Generic decompilation? Even harder

• Many architectures • x86, ARM, MIPS, PowerPC, . . . • CISC vs. RISC • bit length, endianness, floating points • versions & extensions • Many ABIs • Many OFFs (object-file formats) • ± ELF, q PE,  Mach-O, . . . • Many programming languages • Many compilers & optimizations • Statically linked code • ...

Botconf 2017 11 / 51 ƽ History • 2011–2013 (AVG + BUT FIT via TACRˇ TA01010667 grant) • 2013–2016 (AVG + BUT FIT students via diploma theses) • 2016–* (Avast + BUT FIT students)  People ȯ 3-4 core developers Ƅ ≈ 20 BSc/MSc/PhD students ǀ Lines of code ì 419,451 code 7 205,222 comments, etc. Ý 624,673 total

Retargetable Decompiler (RetDec)

◎ Goal • generic decompilation of binary code

Botconf 2017 12 / 51  People ȯ 3-4 core developers Ƅ ≈ 20 BSc/MSc/PhD students ǀ Lines of code ì 419,451 code 7 205,222 comments, etc. Ý 624,673 total

Retargetable Decompiler (RetDec)

◎ Goal • generic decompilation of binary code ƽ History • 2011–2013 (AVG + BUT FIT via TACRˇ TA01010667 grant) • 2013–2016 (AVG + BUT FIT students via diploma theses) • 2016–* (Avast + BUT FIT students)

Botconf 2017 12 / 51 ǀ Lines of code ì 419,451 code 7 205,222 comments, etc. Ý 624,673 total

Retargetable Decompiler (RetDec)

◎ Goal • generic decompilation of binary code ƽ History • 2011–2013 (AVG + BUT FIT via TACRˇ TA01010667 grant) • 2013–2016 (AVG + BUT FIT students via diploma theses) • 2016–* (Avast + BUT FIT students)  People ȯ 3-4 core developers Ƅ ≈ 20 BSc/MSc/PhD students

Botconf 2017 12 / 51 Retargetable Decompiler (RetDec)

◎ Goal • generic decompilation of binary code ƽ History • 2011–2013 (AVG + BUT FIT via TACRˇ TA01010667 grant) • 2013–2016 (AVG + BUT FIT students via diploma theses) • 2016–* (Avast + BUT FIT students)  People ȯ 3-4 core developers Ƅ ≈ 20 BSc/MSc/PhD students ǀ Lines of code ì 419,451 code 7 205,222 comments, etc. Ý 624,673 total

Botconf 2017 12 / 51 3 Does • /packer detection • statically linked code detection • OS loader simulation • recursive traversal disassembling • high-level constructions/types reconstruction • pattern detection • ... Ù Runs on (hopefully)

RetDec? What does it do

¿ Supports • architectures (32-bit): x86, ARM, PowerPC, MIPS • OFFs: ELF, PE, COFF, Mach-O, Intel HEX, AR, raw • compilers (we test with): gcc, Clang, MSVC

Botconf 2017 13 / 51 Ù Runs on (hopefully)

RetDec? What does it do

¿ Supports • architectures (32-bit): x86, ARM, PowerPC, MIPS • OFFs: ELF, PE, COFF, Mach-O, Intel HEX, AR, raw • compilers (we test with): gcc, Clang, MSVC 3 Does • compiler/packer detection • statically linked code detection • OS loader simulation • recursive traversal disassembling • high-level constructions/types reconstruction • pattern detection • ...

Botconf 2017 13 / 51 RetDec? What does it do

¿ Supports • architectures (32-bit): x86, ARM, PowerPC, MIPS • OFFs: ELF, PE, COFF, Mach-O, Intel HEX, AR, raw • compilers (we test with): gcc, Clang, MSVC 3 Does • compiler/packer detection • statically linked code detection • OS loader simulation • recursive traversal disassembling • high-level constructions/types reconstruction • pattern detection • ... Ù Runs on (hopefully)

Botconf 2017 13 / 51 á Repositories 11 core ¿ 6 support  8 third party Ò Contacts ɂ https://retdec.com/ ‡ https://github.com/avast-tl 7 https://twitter.com/retdec ø https://retdec.com/rss/ R [email protected]

Good news everyone!

 RetDec goes open-source under the MIT license • december 2017, shortly after the conference

Botconf 2017 14 / 51 Good news everyone!

 RetDec goes open-source under the MIT license • december 2017, shortly after the conference á Repositories C 11 core ¿ 6 support  8 third party Ò Contacts ɂ https://retdec.com/ ‡ https://github.com/avast-tl 7 https://twitter.com/retdec ø https://retdec.com/rss/ R [email protected]

Botconf 2017 14 / 51 How to get from EXE to C . . .

Botconf 2017 15 / 51 . . . by using cool technologies

Botconf 2017 16 / 51 We need to go deeper!

Botconf 2017 17 / 51 Preprocessing

Botconf 2017 18 / 51 Preprocessing

Botconf 2017 19 / 51 Preprocessing

Botconf 2017 19 / 51 Preprocessing

Botconf 2017 19 / 51 Preprocessing

Botconf 2017 19 / 51 Preprocessing

Botconf 2017 19 / 51 Preprocessing

Botconf 2017 19 / 51 Preprocessing

Botconf 2017 19 / 51 Preprocessing

Botconf 2017 19 / 51 Preprocessing

Botconf 2017 20 / 51 Preprocessing

Botconf 2017 20 / 51 ƶ PeLib • strengthened • new modules (rich header, delayed imports, security dir, . . . ) ƶ ELFIO • strengthened ƶ PDBparser • will hopefully be replaced by LLVM parsers ƶ Yaracpp • YARA C++ wrapper

Preprocessing repos

ƶ Fileformat • fileformat, loader, cpdetect, fileinfo, unpacker • ar-extractor, macho-extractor, . . .

Botconf 2017 21 / 51 ƶ ELFIO • strengthened ƶ PDBparser • will hopefully be replaced by LLVM parsers ƶ Yaracpp • YARA C++ wrapper

Preprocessing repos

ƶ Fileformat • fileformat, loader, cpdetect, fileinfo, unpacker • ar-extractor, macho-extractor, . . . ƶ PeLib • strengthened • new modules (rich header, delayed imports, security dir, . . . )

Botconf 2017 21 / 51 ƶ PDBparser • will hopefully be replaced by LLVM parsers ƶ Yaracpp • YARA C++ wrapper

Preprocessing repos

ƶ Fileformat • fileformat, loader, cpdetect, fileinfo, unpacker • ar-extractor, macho-extractor, . . . ƶ PeLib • strengthened • new modules (rich header, delayed imports, security dir, . . . ) ƶ ELFIO • strengthened

Botconf 2017 21 / 51 ƶ Yaracpp • YARA C++ wrapper

Preprocessing repos

ƶ Fileformat • fileformat, loader, cpdetect, fileinfo, unpacker • ar-extractor, macho-extractor, . . . ƶ PeLib • strengthened • new modules (rich header, delayed imports, security dir, . . . ) ƶ ELFIO • strengthened ƶ PDBparser • will hopefully be replaced by LLVM parsers

Botconf 2017 21 / 51 Preprocessing repos

ƶ Fileformat • fileformat, loader, cpdetect, fileinfo, unpacker • ar-extractor, macho-extractor, . . . ƶ PeLib • strengthened • new modules (rich header, delayed imports, security dir, . . . ) ƶ ELFIO • strengthened ƶ PDBparser • will hopefully be replaced by LLVM parsers ƶ Yaracpp • YARA C++ wrapper

Botconf 2017 21 / 51 Core

Botconf 2017 22 / 51 Core

Botconf 2017 22 / 51 Core

Botconf 2017 22 / 51 Core

Botconf 2017 22 / 51 • clang -o hello hello.c -O3 • 217 passes • -targetlibinfo -tti -tbaa -scoped-noalias -assumption-cache-tracker -profile-summary-info -forceattrs -inferattrs -ipsccp -globalopt -domtree -mem2reg -deadargelim -domtree -basicaa -aa -instcombine -simplifycfg -basiccg -globals-aa -prune-eh -inline -functionattrs -argpromotion

-domtree -sroa -basicaa -aa -memoryssa -early-cse-memssa -speculative-execution -domtree -basicaa

-aa -lazy-value-info -jump-threading ...

Core: LLVM

• dozens of analysis & transform & utility passes • dead global elimination, constant propagation, inlining, reassociation, loop optimization, memory promotion, dead store elimination, . . .

Botconf 2017 23 / 51 Core: LLVM

• dozens of analysis & transform & utility passes • dead global elimination, constant propagation, inlining, reassociation, loop optimization, memory promotion, dead store elimination, . . . • clang -o hello hello.c -O3 • 217 passes • -targetlibinfo -tti -tbaa -scoped-noalias -assumption-cache-tracker -profile-summary-info -forceattrs -inferattrs -ipsccp -globalopt -domtree -mem2reg -deadargelim -domtree -basicaa -aa -instcombine -simplifycfg -basiccg -globals-aa -prune-eh -inline -functionattrs -argpromotion

-domtree -sroa -basicaa -aa -memoryssa -early-cse-memssa -speculative-execution -domtree -basicaa

-aa -lazy-value-info -jump-threading ...

Botconf 2017 23 / 51 • SSA = Static Single Assignment • %y = add i32 %x, %arg • Load/Store architecture • %x = load i32, i32* @global • Functions, arguments, returns, data types • (Un)conditional branches, switches • Ŏ Universal IR for efficient compiler transformations and analyses

Core: LLVM IR

• LLVM IR = LLVM Intermediate Representation • kind of / three address code @global = global i32 define i32 @fnc(i32 %arg) { %x = load i32, i32* @global %y = add i32 %x, %arg store i32 %y, @global return i32 %y }

Botconf 2017 24 / 51 • Ŏ Universal IR for efficient compiler transformations and analyses

Core: LLVM IR

• LLVM IR = LLVM Intermediate Representation • kind of assembly language / three address code @global = global i32 define i32 @fnc(i32 %arg) { %x = load i32, i32* @global %y = add i32 %x, %arg store i32 %y, @global return i32 %y } • SSA = Static Single Assignment • %y = add i32 %x, %arg • Load/Store architecture • %x = load i32, i32* @global • Functions, arguments, returns, data types • (Un)conditional branches, switches

Botconf 2017 24 / 51 Core: LLVM IR

• LLVM IR = LLVM Intermediate Representation • kind of assembly language / three address code @global = global i32 define i32 @fnc(i32 %arg) { %x = load i32, i32* @global %y = add i32 %x, %arg store i32 %y, @global return i32 %y } • SSA = Static Single Assignment • %y = add i32 %x, %arg • Load/Store architecture • %x = load i32, i32* @global • Functions, arguments, returns, data types • (Un)conditional branches, switches • Ŏ Universal IR for efficient compiler transformations and analyses

Botconf 2017 24 / 51 Core: decoder

Botconf 2017 25 / 51 Core: decoder

Botconf 2017 25 / 51 Core: decoder

Botconf 2017 25 / 51 Core: decoder

Botconf 2017 25 / 51 Core: decoder

Botconf 2017 25 / 51 Core: decoder

Botconf 2017 25 / 51 Core: decoder

Botconf 2017 25 / 51 Core: decoder

Botconf 2017 25 / 51 • Architectures (core instruction sets): • ARM + Thumb extension – 32-bit • MIPS – 32/64-bit • PowerPC – 32/64-bit • x86 – 32/64-bit • Capstone: 64-bit ARM, SPARC, SYSZ, XCore, m68k, m680x, TMS320C64x • Decompilation & advanced insns • full semantics only for simple instructions • Implementation details, testing framework (Keystone Engine + LLVM emulator), keeping LLVM IR ↔ ASM mapping, . . .

Core: Capstone2LlvmIR

• Capstone insn → sequence of LLVM IR • Handcoded sequences • 32/64-bit x86 – 1 person ≈ 2-3 weeks

Botconf 2017 26 / 51 • Decompilation & advanced insns • full semantics only for simple instructions • Implementation details, testing framework (Keystone Engine + LLVM emulator), keeping LLVM IR ↔ ASM mapping, . . .

Core: Capstone2LlvmIR

• Capstone insn → sequence of LLVM IR • Handcoded sequences • 32/64-bit x86 – 1 person ≈ 2-3 weeks • Architectures (core instruction sets): • ARM + Thumb extension – 32-bit • MIPS – 32/64-bit • PowerPC – 32/64-bit • x86 – 32/64-bit • Capstone: 64-bit ARM, SPARC, SYSZ, XCore, m68k, m680x, TMS320C64x

Botconf 2017 26 / 51 • full semantics only for simple instructions • Implementation details, testing framework (Keystone Engine + LLVM emulator), keeping LLVM IR ↔ ASM mapping, . . .

Core: Capstone2LlvmIR

• Capstone insn → sequence of LLVM IR • Handcoded sequences • 32/64-bit x86 – 1 person ≈ 2-3 weeks • Architectures (core instruction sets): • ARM + Thumb extension – 32-bit • MIPS – 32/64-bit • PowerPC – 32/64-bit • x86 – 32/64-bit • Capstone: 64-bit ARM, SPARC, SYSZ, XCore, m68k, m680x, TMS320C64x • Decompilation & advanced insns

Botconf 2017 26 / 51 Would you rather . . .

• PMULHUW • Multiply Packed Unsigned Integers and Store High Result if (OperandSize == 64) { //PMULHUW instruction with 64-bit operands: Tmp0[0..31] = Dst[0..15] * Src[0..15]; Tmp1[0..31] = Dst[16..31] * Src[16..31]; Tmp2[0..31] = Dst[32..47] * Src[32..47]; Tmp3[0..31] = Dst[48..63] * Src[48..63]; Dst[0..15] = Tmp0[16..31]; Dst[16..31] = Tmp1[16..31]; __asm_PMULHUW(mm1, mm2); Dst[32..47] = Tmp2[16..31]; Dst[48..63] = Tmp3[16..31]; } else { //PMULHUW instruction with 128-bit operands: // Even longer ... }

Botconf 2017 25 / 51 • full semantics only for simple instructions • Implementation details, testing framework (Keystone Engine + LLVM emulator), keeping LLVM IR ↔ ASM mapping, . . .

Core: Capstone2LlvmIR

• Capstone insn → sequence of LLVM IR • Handcoded sequences • 32/64-bit x86 – 1 person ≈ 2-3 weeks • Architectures (core instruction sets): • ARM + Thumb extension – 32-bit • MIPS – 32/64-bit • PowerPC – 32/64-bit • x86 – 32/64-bit • Capstone: 64-bit ARM, SPARC, SYSZ, XCore, m68k, m680x, TMS320C64x • Decompilation & advanced insns

Botconf 2017 26 / 51 • Implementation details, testing framework (Keystone Engine + LLVM emulator), keeping LLVM IR ↔ ASM mapping, . . .

Core: Capstone2LlvmIR

• Capstone insn → sequence of LLVM IR • Handcoded sequences • 32/64-bit x86 – 1 person ≈ 2-3 weeks • Architectures (core instruction sets): • ARM + Thumb extension – 32-bit • MIPS – 32/64-bit • PowerPC – 32/64-bit • x86 – 32/64-bit • Capstone: 64-bit ARM, SPARC, SYSZ, XCore, m68k, m680x, TMS320C64x • Decompilation & advanced insns • full semantics only for simple instructions

Botconf 2017 26 / 51 Core: Capstone2LlvmIR

• Capstone insn → sequence of LLVM IR • Handcoded sequences • 32/64-bit x86 – 1 person ≈ 2-3 weeks • Architectures (core instruction sets): • ARM + Thumb extension – 32-bit • MIPS – 32/64-bit • PowerPC – 32/64-bit • x86 – 32/64-bit • Capstone: 64-bit ARM, SPARC, SYSZ, XCore, m68k, m680x, TMS320C64x • Decompilation & advanced insns • full semantics only for simple instructions • Implementation details, testing framework (Keystone Engine + LLVM emulator), keeping LLVM IR ↔ ASM mapping, . . .

Botconf 2017 26 / 51 Core: Capstone2LlvmIR

• Capstone insn → sequence of LLVM IR • Handcoded sequences • 32/64-bit x86 – 1 person ≈ 2-3 weeks • Architectures (core instruction sets): • ARM + Thumb extension – 32-bit • MIPS – 32/64-bit • PowerPC – 32/64-bit • x86 – 32/64-bit • Capstone: 64-bit ARM, SPARC, SYSZ, XCore, m68k, m680x, TMS320C64x • Decompilation & advanced insns • full semantics only for simple instructions • Implementation details, testing framework (Keystone Engine + LLVM emulator), keeping LLVM IR ↔ ASM mapping, . . .

Botconf 2017 26 / 51 Core: low-level passes

Botconf 2017 27 / 51 Core: low-level passes

Botconf 2017 27 / 51 Core: assembly generation

Botconf 2017 28 / 51 Core: high-level passes

Botconf 2017 29 / 51 ƶ Capstone2LlvmIR • Capstone instruction to LLVM IR translation ƶ Capstone-dumper • what does Capstone know about any instruction ƶ Fnc-patterns • statically linked function pattern creation and detection ƶ Yaramod • YARA to AST parsing & C++ API to build new YARA rulesets ƶ Ctypes • extraction and presentation of C function data types ƶ Demangler • gcc/Clang, Microsoft Visual C++, and Borland C++

Core repos

ƶ RetDec • bin2llvmir library • bin2llvmirtool

Botconf 2017 30 / 51 ƶ Capstone-dumper • what does Capstone know about any instruction ƶ Fnc-patterns • statically linked function pattern creation and detection ƶ Yaramod • YARA to AST parsing & C++ API to build new YARA rulesets ƶ Ctypes • extraction and presentation of C function data types ƶ Demangler • gcc/Clang, Microsoft Visual C++, and Borland C++

Core repos

ƶ RetDec • bin2llvmir library • bin2llvmirtool ƶ Capstone2LlvmIR • Capstone instruction to LLVM IR translation

Botconf 2017 30 / 51 ƶ Fnc-patterns • statically linked function pattern creation and detection ƶ Yaramod • YARA to AST parsing & C++ API to build new YARA rulesets ƶ Ctypes • extraction and presentation of C function data types ƶ Demangler • gcc/Clang, Microsoft Visual C++, and Borland C++

Core repos

ƶ RetDec • bin2llvmir library • bin2llvmirtool ƶ Capstone2LlvmIR • Capstone instruction to LLVM IR translation ƶ Capstone-dumper • what does Capstone know about any instruction

Botconf 2017 30 / 51 ƶ Yaramod • YARA to AST parsing & C++ API to build new YARA rulesets ƶ Ctypes • extraction and presentation of C function data types ƶ Demangler • gcc/Clang, Microsoft Visual C++, and Borland C++

Core repos

ƶ RetDec • bin2llvmir library • bin2llvmirtool ƶ Capstone2LlvmIR • Capstone instruction to LLVM IR translation ƶ Capstone-dumper • what does Capstone know about any instruction ƶ Fnc-patterns • statically linked function pattern creation and detection

Botconf 2017 30 / 51 ƶ Ctypes • extraction and presentation of C function data types ƶ Demangler • gcc/Clang, Microsoft Visual C++, and Borland C++

Core repos

ƶ RetDec • bin2llvmir library • bin2llvmirtool ƶ Capstone2LlvmIR • Capstone instruction to LLVM IR translation ƶ Capstone-dumper • what does Capstone know about any instruction ƶ Fnc-patterns • statically linked function pattern creation and detection ƶ Yaramod • YARA to AST parsing & C++ API to build new YARA rulesets

Botconf 2017 30 / 51 ƶ Demangler • gcc/Clang, Microsoft Visual C++, and Borland C++

Core repos

ƶ RetDec • bin2llvmir library • bin2llvmirtool ƶ Capstone2LlvmIR • Capstone instruction to LLVM IR translation ƶ Capstone-dumper • what does Capstone know about any instruction ƶ Fnc-patterns • statically linked function pattern creation and detection ƶ Yaramod • YARA to AST parsing & C++ API to build new YARA rulesets ƶ Ctypes • extraction and presentation of C function data types

Botconf 2017 30 / 51 Core repos

ƶ RetDec • bin2llvmir library • bin2llvmirtool ƶ Capstone2LlvmIR • Capstone instruction to LLVM IR translation ƶ Capstone-dumper • what does Capstone know about any instruction ƶ Fnc-patterns • statically linked function pattern creation and detection ƶ Yaramod • YARA to AST parsing & C++ API to build new YARA rulesets ƶ Ctypes • extraction and presentation of C function data types ƶ Demangler • gcc/Clang, Microsoft Visual C++, and Borland C++

Botconf 2017 30 / 51 Backend

Botconf 2017 31 / 51 Backend

Botconf 2017 31 / 51 Backend

Botconf 2017 31 / 51 Backend: BIR is an AST

• BIR = Backend IR • AST = Abstract syntax tree • while (x < 20){ x = x + (y * 2); }

Botconf 2017 32 / 51 Backend: code structuring

• LLVM IR: only (un)conditional branches & switches • identify high-level control-flow patterns • restructure BIR: if-else, for-loop, while-loop, switch, break, continue

Botconf 2017 33 / 51 Backend: code structuring

• LLVM IR: only (un)conditional branches & switches • identify high-level control-flow patterns • restructure BIR: if-else, for-loop, while-loop, switch, break, continue

Botconf 2017 33 / 51 Backend: code structuring

• LLVM IR: only (un)conditional branches & switches • identify high-level control-flow patterns • restructure BIR: if-else, for-loop, while-loop, switch, break, continue

Botconf 2017 33 / 51 Backend: code structuring

• LLVM IR: only (un)conditional branches & switches • identify high-level control-flow patterns • restructure BIR: if-else, for-loop, while-loop, switch, break, continue

Botconf 2017 33 / 51 Backend: code structuring

• LLVM IR: only (un)conditional branches & switches • identify high-level control-flow patterns • restructure BIR: if-else, for-loop, while-loop, switch, break, continue

Botconf 2017 33 / 51 Backend: code structuring

• LLVM IR: only (un)conditional branches & switches • identify high-level control-flow patterns • restructure BIR: if-else, for-loop, while-loop, switch, break, continue

Botconf 2017 33 / 51 Backend: code structuring

• LLVM IR: only (un)conditional branches & switches • identify high-level control-flow patterns • restructure BIR: if-else, for-loop, while-loop, switch, break, continue

Botconf 2017 33 / 51 • arithmetic expression simplification • a + -1 - -4 ⇒ a + 3 • negation optimization • if (!(a == b)) ⇒ if (a != b) • pointer arithmetic • *(a + 4) ⇒ a[4] • conversion of while (true){ ... if (cond) break; ... } • for (cond){ ... } • while (cond){ ... } • conversion of if/else-if/else chains to switch • ...

Backend: optimizations

• copy propagation • reducing the number of variables

Botconf 2017 34 / 51 • negation optimization • if (!(a == b)) ⇒ if (a != b) • pointer arithmetic • *(a + 4) ⇒ a[4] • conversion of while (true){ ... if (cond) break; ... } • for (cond){ ... } • while (cond){ ... } • conversion of if/else-if/else chains to switch • ...

Backend: optimizations

• copy propagation • reducing the number of variables • arithmetic expression simplification • a + -1 - -4 ⇒ a + 3

Botconf 2017 34 / 51 • pointer arithmetic • *(a + 4) ⇒ a[4] • conversion of while (true){ ... if (cond) break; ... } • for (cond){ ... } • while (cond){ ... } • conversion of if/else-if/else chains to switch • ...

Backend: optimizations

• copy propagation • reducing the number of variables • arithmetic expression simplification • a + -1 - -4 ⇒ a + 3 • negation optimization • if (!(a == b)) ⇒ if (a != b)

Botconf 2017 34 / 51 • conversion of while (true){ ... if (cond) break; ... } • for (cond){ ... } • while (cond){ ... } • conversion of if/else-if/else chains to switch • ...

Backend: optimizations

• copy propagation • reducing the number of variables • arithmetic expression simplification • a + -1 - -4 ⇒ a + 3 • negation optimization • if (!(a == b)) ⇒ if (a != b) • pointer arithmetic • *(a + 4) ⇒ a[4]

Botconf 2017 34 / 51 • conversion of if/else-if/else chains to switch • ...

Backend: optimizations

• copy propagation • reducing the number of variables • arithmetic expression simplification • a + -1 - -4 ⇒ a + 3 • negation optimization • if (!(a == b)) ⇒ if (a != b) • pointer arithmetic • *(a + 4) ⇒ a[4] • conversion of while (true){ ... if (cond) break; ... } • for (cond){ ... } • while (cond){ ... }

Botconf 2017 34 / 51 • ...

Backend: optimizations

• copy propagation • reducing the number of variables • arithmetic expression simplification • a + -1 - -4 ⇒ a + 3 • negation optimization • if (!(a == b)) ⇒ if (a != b) • pointer arithmetic • *(a + 4) ⇒ a[4] • conversion of while (true){ ... if (cond) break; ... } • for (cond){ ... } • while (cond){ ... } • conversion of if/else-if/else chains to switch

Botconf 2017 34 / 51 Backend: optimizations

• copy propagation • reducing the number of variables • arithmetic expression simplification • a + -1 - -4 ⇒ a + 3 • negation optimization • if (!(a == b)) ⇒ if (a != b) • pointer arithmetic • *(a + 4) ⇒ a[4] • conversion of while (true){ ... if (cond) break; ... } • for (cond){ ... } • while (cond){ ... } • conversion of if/else-if/else chains to switch • ...

Botconf 2017 34 / 51 • stdlib context literals • var_ffff7dc6 = socket(2, 3, 255) sock_id = socket(PF_INET, SOCK_RAW, IPPROTO_RAW) • flock(sock_id, 7) flock(sock_id, LOCK_SH | LOCK_EX | LOCK_NB) • output generation • C • CFG = Control-Flow Graph • Call Graph

Backend: code generation

• variable name assignment • induction variables: for (i = 0; i < 10; ++i) • function arguments: a1, a2, a3, ... • general context names: return result; • stdlib context names: int len = strlen();

Botconf 2017 35 / 51 sock_id = socket(PF_INET, SOCK_RAW, IPPROTO_RAW) • flock(sock_id, 7) flock(sock_id, LOCK_SH | LOCK_EX | LOCK_NB) • output generation • C • CFG = Control-Flow Graph • Call Graph

Backend: code generation

• variable name assignment • induction variables: for (i = 0; i < 10; ++i) • function arguments: a1, a2, a3, ... • general context names: return result; • stdlib context names: int len = strlen(); • stdlib context literals • var_ffff7dc6 = socket(2, 3, 255)

Botconf 2017 35 / 51 • flock(sock_id, 7) flock(sock_id, LOCK_SH | LOCK_EX | LOCK_NB) • output generation • C • CFG = Control-Flow Graph • Call Graph

Backend: code generation

• variable name assignment • induction variables: for (i = 0; i < 10; ++i) • function arguments: a1, a2, a3, ... • general context names: return result; • stdlib context names: int len = strlen(); • stdlib context literals • var_ffff7dc6 = socket(2, 3, 255) sock_id = socket(PF_INET, SOCK_RAW, IPPROTO_RAW)

Botconf 2017 35 / 51 • output generation • C • CFG = Control-Flow Graph • Call Graph

Backend: code generation

• variable name assignment • induction variables: for (i = 0; i < 10; ++i) • function arguments: a1, a2, a3, ... • general context names: return result; • stdlib context names: int len = strlen(); • stdlib context literals • var_ffff7dc6 = socket(2, 3, 255) sock_id = socket(PF_INET, SOCK_RAW, IPPROTO_RAW) • flock(sock_id, 7) flock(sock_id, LOCK_SH | LOCK_EX | LOCK_NB)

Botconf 2017 35 / 51 Backend: code generation

• variable name assignment • induction variables: for (i = 0; i < 10; ++i) • function arguments: a1, a2, a3, ... • general context names: return result; • stdlib context names: int len = strlen(); • stdlib context literals • var_ffff7dc6 = socket(2, 3, 255) sock_id = socket(PF_INET, SOCK_RAW, IPPROTO_RAW) • flock(sock_id, 7) flock(sock_id, LOCK_SH | LOCK_EX | LOCK_NB) • output generation • C • CFG = Control-Flow Graph • Call Graph

Botconf 2017 35 / 51 Backend repos

ƶ RetDec • llvmir2hll library • llvmir2hlltool

Botconf 2017 36 / 51 , REST API https://retdec.com/api/ © Build it yourself 3 CMake, ± gcc/Clang, q Visual Studio 2015 Update 2 I Perl, GNU Bison, Flex, GNU Tar, scp, GNU bash, UPX, dot ƶ Recursively clone the main RetDec repository mkdir build && cd build cmake .. make && make install Ù Run it yourself decompile.sh binary.exe I Get RetDec IDA plugin

How to use RetDec

ɂ Online decompilation service https://retdec.com/decompilation/

Botconf 2017 37 / 51 © Build it yourself 3 CMake, ± gcc/Clang, q Visual Studio 2015 Update 2 I Perl, GNU Bison, Flex, GNU Tar, scp, GNU bash, UPX, dot ƶ Recursively clone the main RetDec repository mkdir build && cd build cmake .. make && make install Ù Run it yourself decompile.sh binary.exe I Get RetDec IDA plugin

How to use RetDec

ɂ Online decompilation service https://retdec.com/decompilation/ , REST API https://retdec.com/api/

Botconf 2017 37 / 51 Ù Run it yourself decompile.sh binary.exe I Get RetDec IDA plugin

How to use RetDec

ɂ Online decompilation service https://retdec.com/decompilation/ , REST API https://retdec.com/api/ © Build it yourself 3 CMake, ± gcc/Clang, q Visual Studio 2015 Update 2 I Perl, GNU Bison, Flex, GNU Tar, scp, GNU bash, UPX, dot ƶ Recursively clone the main RetDec repository mkdir build && cd build cmake .. make && make install

Botconf 2017 37 / 51 I Get RetDec IDA plugin

How to use RetDec

ɂ Online decompilation service https://retdec.com/decompilation/ , REST API https://retdec.com/api/ © Build it yourself 3 CMake, ± gcc/Clang, q Visual Studio 2015 Update 2 I Perl, GNU Bison, Flex, GNU Tar, scp, GNU bash, UPX, dot ƶ Recursively clone the main RetDec repository mkdir build && cd build cmake .. make && make install Ù Run it yourself decompile.sh binary.exe

Botconf 2017 37 / 51 How to use RetDec

ɂ Online decompilation service https://retdec.com/decompilation/ , REST API https://retdec.com/api/ © Build it yourself 3 CMake, ± gcc/Clang, q Visual Studio 2015 Update 2 I Perl, GNU Bison, Flex, GNU Tar, scp, GNU bash, UPX, dot ƶ Recursively clone the main RetDec repository mkdir build && cd build cmake .. make && make install Ù Run it yourself decompile.sh binary.exe I Get RetDec IDA plugin

Botconf 2017 37 / 51 What is RetDec IDA plugin

Botconf 2017 38 / 51 How does RetDec IDA plugin work

◎ Goals ` look & feel native ȩ same object names as IDA è interactive

Botconf 2017 39 / 51 How does RetDec IDA plugin work

◎ Goals ` look & feel native ȩ same object names as IDA è interactive

Botconf 2017 39 / 51 How does RetDec IDA plugin work

◎ Goals ` look & feel native ȩ same object names as IDA è interactive

Botconf 2017 39 / 51 How does RetDec IDA plugin work

◎ Goals ` look & feel native ȩ same object names as IDA è interactive

Botconf 2017 39 / 51 How does RetDec IDA plugin work

◎ Goals ` look & feel native ȩ same object names as IDA è interactive

Botconf 2017 39 / 51 How does RetDec IDA plugin work

◎ Goals ` look & feel native ȩ same object names as IDA è interactive

Botconf 2017 39 / 51 How does RetDec IDA plugin work

◎ Goals ` look & feel native ȩ same object names as IDA è interactive

Botconf 2017 39 / 51 How does RetDec IDA plugin work

◎ Goals ` look & feel native ȩ same object names as IDA è interactive

Botconf 2017 39 / 51 RetDec IDA plugin is interactive

Botconf 2017 40 / 51  12,000 registered users Ù 423,000 decompilations ɂ 350,000 Web , 73,000 API ÿ 410 decompilations daily

How was RetDec used so far

 retdec.com launched on 2015-02-05

Botconf 2017 41 / 51 Ù 423,000 decompilations ɂ 350,000 Web , 73,000 API ÿ 410 decompilations daily

How was RetDec used so far

 retdec.com launched on 2015-02-05  12,000 registered users

Botconf 2017 41 / 51 How was RetDec used so far

 retdec.com launched on 2015-02-05  12,000 registered users Ù 423,000 decompilations ɂ 350,000 Web , 73,000 API ÿ 410 decompilations daily

Botconf 2017 41 / 51 Real example #1: Vawtrak (x86)

Botconf 2017 42 / 51 Real example #2: Vawtrak (x86)

Botconf 2017 43 / 51 Real example #3: CryproWall (x86)

Botconf 2017 44 / 51 Real example #4: Psyb0t (mips) RetDec

Botconf 2017 45 / 51 Real example #5: Psyb0t (mips) RetDec

main

ip2c system flock function_404810 function_404b1c fileno backup Daemonize RSeed getip

fetch sleep strcmp xDec fopen fread fclose fork exit gettimeofday srand

snprintf strncmp parse strncpy strlen

Botconf 2017 46 / 51 NO!

• IDA and Hex-Rays are great Ŏ output quality è interactive ç seamlessly integrated mature many plugins î official support

• IDA and Hex-Rays have flaws not free µ proprietary ® big monolithic GUI app

Should you throw away your Hex-Rays?

Botconf 2017 47 / 51 • IDA and Hex-Rays have flaws not free µ proprietary ® big monolithic GUI app

Should you throw away your Hex-Rays? NO!

• IDA and Hex-Rays are great Ŏ output quality è interactive ç seamlessly integrated mature many plugins î official support

Botconf 2017 47 / 51 Should you throw away your Hex-Rays? NO!

• IDA and Hex-Rays are great Ŏ output quality è interactive ç seamlessly integrated mature many plugins î official support

• IDA and Hex-Rays have flaws not free µ proprietary ® big monolithic GUI app

Botconf 2017 47 / 51 • Not so obvious reasons Ǹ LLVM is awesome  different basic designs: interactive GUI vs. pipeline Ȝ LLVM is OP (don’t worry, it won’t be nerfed)

RetDec is handy because . . .

• Obvious reasons it is free + MIPS architecture a MIT license r you can play with the sources

Botconf 2017 48 / 51  different basic designs: interactive GUI vs. pipeline Ȝ LLVM is OP (don’t worry, it won’t be nerfed)

RetDec is handy because . . .

• Obvious reasons it is free + MIPS architecture a MIT license r you can play with the sources • Not so obvious reasons Ǹ LLVM is awesome

Botconf 2017 48 / 51 Ȝ LLVM is OP (don’t worry, it won’t be nerfed)

RetDec is handy because . . .

• Obvious reasons it is free + MIPS architecture a MIT license r you can play with the sources • Not so obvious reasons Ǹ LLVM is awesome  different basic designs: interactive GUI vs. pipeline

Botconf 2017 48 / 51 RetDec is handy because . . .

• Obvious reasons it is free + MIPS architecture a MIT license r you can play with the sources • Not so obvious reasons Ǹ LLVM is awesome  different basic designs: interactive GUI vs. pipeline Ȝ LLVM is OP (don’t worry, it won’t be nerfed)

Botconf 2017 48 / 51 ƶ Fileformat – generic OFF parsing and analysis ƶ Capstone2LlvmIR – binary to LLVM translation ƶ Fnc-patterns – statically linked code detection in YARA (IDA F.L.I.R.T.) ƶ Yaramod – hack YARA rules in C++ ƶ Yaracpp – YARA C++ wrapper ƶ Ctypes – info on function types

RetDec is not only decompiler

ƶ RetDec – the decompiler ƶ RetDec IDA plugin – Hex-Rays impersonation

Botconf 2017 49 / 51 RetDec is not only decompiler

ƶ RetDec – the decompiler ƶ RetDec IDA plugin – Hex-Rays impersonation ƶ Fileformat – generic OFF parsing and analysis ƶ Capstone2LlvmIR – binary to LLVM translation ƶ Fnc-patterns – statically linked code detection in YARA (IDA F.L.I.R.T.) ƶ Yaramod – hack YARA rules in C++ ƶ Yaracpp – YARA C++ wrapper ƶ Ctypes – info on function types

Botconf 2017 49 / 51 • Throw a release party        • Solve some inevitable “hey guys, I’m unable to build your repo” • Write more technical documentation on how it all works • Present it somewhere (maybe LLVM dev meeting) • Continue improving the implementation • make it more portable: Bash ⇒ Python, . . . • 64-bit architectures supports • replace some libraries & modules • We will see. . .

What’s next

• Release the sources on ‡ Github shortly after the conference

Botconf 2017 50 / 51 • Solve some inevitable “hey guys, I’m unable to build your repo” • Write more technical documentation on how it all works • Present it somewhere (maybe LLVM dev meeting) • Continue improving the implementation • make it more portable: Bash ⇒ Python, . . . • 64-bit architectures supports • replace some libraries & modules • We will see. . .

What’s next

• Release the sources on ‡ Github shortly after the conference • Throw a release party       

Botconf 2017 50 / 51 • Write more technical documentation on how it all works • Present it somewhere (maybe LLVM dev meeting) • Continue improving the implementation • make it more portable: Bash ⇒ Python, . . . • 64-bit architectures supports • replace some libraries & modules • We will see. . .

What’s next

• Release the sources on ‡ Github shortly after the conference • Throw a release party        • Solve some inevitable “hey guys, I’m unable to build your repo”

Botconf 2017 50 / 51 • Present it somewhere (maybe LLVM dev meeting) • Continue improving the implementation • make it more portable: Bash ⇒ Python, . . . • 64-bit architectures supports • replace some libraries & modules • We will see. . .

What’s next

• Release the sources on ‡ Github shortly after the conference • Throw a release party        • Solve some inevitable “hey guys, I’m unable to build your repo” • Write more technical documentation on how it all works

Botconf 2017 50 / 51 • Continue improving the implementation • make it more portable: Bash ⇒ Python, . . . • 64-bit architectures supports • replace some libraries & modules • We will see. . .

What’s next

• Release the sources on ‡ Github shortly after the conference • Throw a release party        • Solve some inevitable “hey guys, I’m unable to build your repo” • Write more technical documentation on how it all works • Present it somewhere (maybe LLVM dev meeting)

Botconf 2017 50 / 51 • We will see. . .

What’s next

• Release the sources on ‡ Github shortly after the conference • Throw a release party        • Solve some inevitable “hey guys, I’m unable to build your repo” • Write more technical documentation on how it all works • Present it somewhere (maybe LLVM dev meeting) • Continue improving the implementation • make it more portable: Bash ⇒ Python, . . . • 64-bit architectures supports • replace some libraries & modules

Botconf 2017 50 / 51 What’s next

• Release the sources on ‡ Github shortly after the conference • Throw a release party        • Solve some inevitable “hey guys, I’m unable to build your repo” • Write more technical documentation on how it all works • Present it somewhere (maybe LLVM dev meeting) • Continue improving the implementation • make it more portable: Bash ⇒ Python, . . . • 64-bit architectures supports • replace some libraries & modules • We will see. . .

Botconf 2017 50 / 51 That’s all folks

Thanks!

Ò Contacts ɂ https://retdec.com/ ‡ https://github.com/avast-tl 7 https://twitter.com/retdec ø https://retdec.com/rss/ R [email protected]

Botconf 2017 51 / 51