RetDec: An Open-Source Machine-Code Decompiler
Jakub Kˇroustek Peter Matula Petr Zemek
Threat Labs
Botconf 2017 1 / 51 ♂ Peter Matula • main developer of the RetDec decompiler • senior developer @Avast (previously @AVG) • ♥ rock climbing and • R [email protected]
> whoarewe
♂ Jakub Kˇroustek • founder of RetDec • Threat Labs lead @Avast (previously @AVG) • reverse engineer, malware hunter, security researcher • 7 @JakubKroustek • R [email protected]
Botconf 2017 2 / 51 > whoarewe
♂ Jakub Kˇroustek • founder of RetDec • Threat Labs lead @Avast (previously @AVG) • reverse engineer, malware hunter, security researcher • 7 @JakubKroustek • R [email protected] ♂ Peter Matula • main developer of the RetDec decompiler • senior developer @Avast (previously @AVG) • ♥ rock climbing and • R [email protected]
Botconf 2017 2 / 51 Quiz Time
Botconf 2017 3 / 51 Quiz Time
Botconf 2017 4 / 51 Quiz Time
Botconf 2017 5 / 51 Quiz Time
Botconf 2017 6 / 51 Disassembling vs. Decompilation
Botconf 2017 7 / 51 Decompilation? What is it?
Botconf 2017 8 / 51 è Binary recompilation (yeah, like that’s ever gonna work) • porting • bug fixing • adding new features • original sources got lost • optimizations
Decompilation? What good is it?
ü Binary analysis • reverse engineering • malware analysis • vulnerability detection • verification • binary comparison • ...
Botconf 2017 9 / 51 Decompilation? What good is it?
ü Binary analysis • reverse engineering • malware analysis • vulnerability detection • verification • binary comparison • ... è Binary recompilation (yeah, like that’s ever gonna work) • porting • bug fixing • adding new features • original sources got lost • optimizations
Botconf 2017 9 / 51 • It is damn hard • compilation is not lossless • high-level constructions • data types • names • comments, macros, . . . • compilers are optimizing • computer science goodies • undecidable problems • complex algorithms • exponential complexities • obfuscation, packing, anti-debugging
Ok, why aren’t we already using it?
• Multiple existing tools: Hex-Rays, Hopper, Snowman, etc.
Botconf 2017 10 / 51 • compilation is not lossless • high-level constructions • data types • names • comments, macros, . . . • compilers are optimizing • computer science goodies • undecidable problems • complex algorithms • exponential complexities • obfuscation, packing, anti-debugging
Ok, why aren’t we already using it?
• Multiple existing tools: Hex-Rays, Hopper, Snowman, etc. • It is damn hard
Botconf 2017 10 / 51 • compilers are optimizing • computer science goodies • undecidable problems • complex algorithms • exponential complexities • obfuscation, packing, anti-debugging
Ok, why aren’t we already using it?
• Multiple existing tools: Hex-Rays, Hopper, Snowman, etc. • It is damn hard • compilation is not lossless • high-level constructions • data types • names • comments, macros, . . .
Botconf 2017 10 / 51 • computer science goodies • undecidable problems • complex algorithms • exponential complexities • obfuscation, packing, anti-debugging
Ok, why aren’t we already using it?
• Multiple existing tools: Hex-Rays, Hopper, Snowman, etc. • It is damn hard • compilation is not lossless • high-level constructions • data types • names • comments, macros, . . . • compilers are optimizing
Botconf 2017 10 / 51 • obfuscation, packing, anti-debugging
Ok, why aren’t we already using it?
• Multiple existing tools: Hex-Rays, Hopper, Snowman, etc. • It is damn hard • compilation is not lossless • high-level constructions • data types • names • comments, macros, . . . • compilers are optimizing • computer science goodies • undecidable problems • complex algorithms • exponential complexities
Botconf 2017 10 / 51 Ok, why aren’t we already using it?
• Multiple existing tools: Hex-Rays, Hopper, Snowman, etc. • It is damn hard • compilation is not lossless • high-level constructions • data types • names • comments, macros, . . . • compilers are optimizing • computer science goodies • undecidable problems • complex algorithms • exponential complexities • obfuscation, packing, anti-debugging
Botconf 2017 10 / 51 • Many ABIs • Many OFFs (object-file formats) • ± ELF, q PE, Mach-O, . . . • Many programming languages • Many compilers & optimizations • Statically linked code • ...
Generic decompilation? Even harder
• Many architectures • x86, ARM, MIPS, PowerPC, . . . • CISC vs. RISC • bit length, endianness, floating points • versions & extensions
Botconf 2017 11 / 51 • Many OFFs (object-file formats) • ± ELF, q PE, Mach-O, . . . • Many programming languages • Many compilers & optimizations • Statically linked code • ...
Generic decompilation? Even harder
• Many architectures • x86, ARM, MIPS, PowerPC, . . . • CISC vs. RISC • bit length, endianness, floating points • versions & extensions • Many ABIs
Botconf 2017 11 / 51 • Many programming languages • Many compilers & optimizations • Statically linked code • ...
Generic decompilation? Even harder
• Many architectures • x86, ARM, MIPS, PowerPC, . . . • CISC vs. RISC • bit length, endianness, floating points • versions & extensions • Many ABIs • Many OFFs (object-file formats) • ± ELF, q PE, Mach-O, . . .
Botconf 2017 11 / 51 • Many compilers & optimizations • Statically linked code • ...
Generic decompilation? Even harder
• Many architectures • x86, ARM, MIPS, PowerPC, . . . • CISC vs. RISC • bit length, endianness, floating points • versions & extensions • Many ABIs • Many OFFs (object-file formats) • ± ELF, q PE, Mach-O, . . . • Many programming languages
Botconf 2017 11 / 51 • Statically linked code • ...
Generic decompilation? Even harder
• Many architectures • x86, ARM, MIPS, PowerPC, . . . • CISC vs. RISC • bit length, endianness, floating points • versions & extensions • Many ABIs • Many OFFs (object-file formats) • ± ELF, q PE, Mach-O, . . . • Many programming languages • Many compilers & optimizations
Botconf 2017 11 / 51 • ...
Generic decompilation? Even harder
• Many architectures • x86, ARM, MIPS, PowerPC, . . . • CISC vs. RISC • bit length, endianness, floating points • versions & extensions • Many ABIs • Many OFFs (object-file formats) • ± ELF, q PE, Mach-O, . . . • Many programming languages • Many compilers & optimizations • Statically linked code
Botconf 2017 11 / 51 Generic decompilation? Even harder
• Many architectures • x86, ARM, MIPS, PowerPC, . . . • CISC vs. RISC • bit length, endianness, floating points • versions & extensions • Many ABIs • Many OFFs (object-file formats) • ± ELF, q PE, Mach-O, . . . • Many programming languages • Many compilers & optimizations • Statically linked code • ...
Botconf 2017 11 / 51 ƽ History • 2011–2013 (AVG + BUT FIT via TACRˇ TA01010667 grant) • 2013–2016 (AVG + BUT FIT students via diploma theses) • 2016–* (Avast + BUT FIT students) People ȯ 3-4 core developers Ƅ ≈ 20 BSc/MSc/PhD students ǀ Lines of code ì 419,451 code 7 205,222 comments, etc. Ý 624,673 total
Retargetable Decompiler (RetDec)
◎ Goal • generic decompilation of binary code
Botconf 2017 12 / 51 People ȯ 3-4 core developers Ƅ ≈ 20 BSc/MSc/PhD students ǀ Lines of code ì 419,451 code 7 205,222 comments, etc. Ý 624,673 total
Retargetable Decompiler (RetDec)
◎ Goal • generic decompilation of binary code ƽ History • 2011–2013 (AVG + BUT FIT via TACRˇ TA01010667 grant) • 2013–2016 (AVG + BUT FIT students via diploma theses) • 2016–* (Avast + BUT FIT students)
Botconf 2017 12 / 51 ǀ Lines of code ì 419,451 code 7 205,222 comments, etc. Ý 624,673 total
Retargetable Decompiler (RetDec)
◎ Goal • generic decompilation of binary code ƽ History • 2011–2013 (AVG + BUT FIT via TACRˇ TA01010667 grant) • 2013–2016 (AVG + BUT FIT students via diploma theses) • 2016–* (Avast + BUT FIT students) People ȯ 3-4 core developers Ƅ ≈ 20 BSc/MSc/PhD students
Botconf 2017 12 / 51 Retargetable Decompiler (RetDec)
◎ Goal • generic decompilation of binary code ƽ History • 2011–2013 (AVG + BUT FIT via TACRˇ TA01010667 grant) • 2013–2016 (AVG + BUT FIT students via diploma theses) • 2016–* (Avast + BUT FIT students) People ȯ 3-4 core developers Ƅ ≈ 20 BSc/MSc/PhD students ǀ Lines of code ì 419,451 code 7 205,222 comments, etc. Ý 624,673 total
Botconf 2017 12 / 51 3 Does • compiler/packer detection • statically linked code detection • OS loader simulation • recursive traversal disassembling • high-level constructions/types reconstruction • pattern detection • ... Ù Runs on (hopefully)
RetDec? What does it do
¿ Supports • architectures (32-bit): x86, ARM, PowerPC, MIPS • OFFs: ELF, PE, COFF, Mach-O, Intel HEX, AR, raw • compilers (we test with): gcc, Clang, MSVC
Botconf 2017 13 / 51 Ù Runs on (hopefully)
RetDec? What does it do
¿ Supports • architectures (32-bit): x86, ARM, PowerPC, MIPS • OFFs: ELF, PE, COFF, Mach-O, Intel HEX, AR, raw • compilers (we test with): gcc, Clang, MSVC 3 Does • compiler/packer detection • statically linked code detection • OS loader simulation • recursive traversal disassembling • high-level constructions/types reconstruction • pattern detection • ...
Botconf 2017 13 / 51 RetDec? What does it do
¿ Supports • architectures (32-bit): x86, ARM, PowerPC, MIPS • OFFs: ELF, PE, COFF, Mach-O, Intel HEX, AR, raw • compilers (we test with): gcc, Clang, MSVC 3 Does • compiler/packer detection • statically linked code detection • OS loader simulation • recursive traversal disassembling • high-level constructions/types reconstruction • pattern detection • ... Ù Runs on (hopefully)
Botconf 2017 13 / 51 á Repositories C 11 core ¿ 6 support 8 third party Ò Contacts ɂ https://retdec.com/ https://github.com/avast-tl 7 https://twitter.com/retdec ø https://retdec.com/rss/ R [email protected]
Good news everyone!
RetDec goes open-source under the MIT license • december 2017, shortly after the conference
Botconf 2017 14 / 51 Good news everyone!
RetDec goes open-source under the MIT license • december 2017, shortly after the conference á Repositories C 11 core ¿ 6 support 8 third party Ò Contacts ɂ https://retdec.com/ https://github.com/avast-tl 7 https://twitter.com/retdec ø https://retdec.com/rss/ R [email protected]
Botconf 2017 14 / 51 How to get from EXE to C . . .
Botconf 2017 15 / 51 . . . by using cool technologies
Botconf 2017 16 / 51 We need to go deeper!
Botconf 2017 17 / 51 Preprocessing
Botconf 2017 18 / 51 Preprocessing
Botconf 2017 19 / 51 Preprocessing
Botconf 2017 19 / 51 Preprocessing
Botconf 2017 19 / 51 Preprocessing
Botconf 2017 19 / 51 Preprocessing
Botconf 2017 19 / 51 Preprocessing
Botconf 2017 19 / 51 Preprocessing
Botconf 2017 19 / 51 Preprocessing
Botconf 2017 19 / 51 Preprocessing
Botconf 2017 20 / 51 Preprocessing
Botconf 2017 20 / 51 ƶ PeLib • strengthened • new modules (rich header, delayed imports, security dir, . . . ) ƶ ELFIO • strengthened ƶ PDBparser • will hopefully be replaced by LLVM parsers ƶ Yaracpp • YARA C++ wrapper
Preprocessing repos
ƶ Fileformat • fileformat, loader, cpdetect, fileinfo, unpacker • ar-extractor, macho-extractor, . . .
Botconf 2017 21 / 51 ƶ ELFIO • strengthened ƶ PDBparser • will hopefully be replaced by LLVM parsers ƶ Yaracpp • YARA C++ wrapper
Preprocessing repos
ƶ Fileformat • fileformat, loader, cpdetect, fileinfo, unpacker • ar-extractor, macho-extractor, . . . ƶ PeLib • strengthened • new modules (rich header, delayed imports, security dir, . . . )
Botconf 2017 21 / 51 ƶ PDBparser • will hopefully be replaced by LLVM parsers ƶ Yaracpp • YARA C++ wrapper
Preprocessing repos
ƶ Fileformat • fileformat, loader, cpdetect, fileinfo, unpacker • ar-extractor, macho-extractor, . . . ƶ PeLib • strengthened • new modules (rich header, delayed imports, security dir, . . . ) ƶ ELFIO • strengthened
Botconf 2017 21 / 51 ƶ Yaracpp • YARA C++ wrapper
Preprocessing repos
ƶ Fileformat • fileformat, loader, cpdetect, fileinfo, unpacker • ar-extractor, macho-extractor, . . . ƶ PeLib • strengthened • new modules (rich header, delayed imports, security dir, . . . ) ƶ ELFIO • strengthened ƶ PDBparser • will hopefully be replaced by LLVM parsers
Botconf 2017 21 / 51 Preprocessing repos
ƶ Fileformat • fileformat, loader, cpdetect, fileinfo, unpacker • ar-extractor, macho-extractor, . . . ƶ PeLib • strengthened • new modules (rich header, delayed imports, security dir, . . . ) ƶ ELFIO • strengthened ƶ PDBparser • will hopefully be replaced by LLVM parsers ƶ Yaracpp • YARA C++ wrapper
Botconf 2017 21 / 51 Core
Botconf 2017 22 / 51 Core
Botconf 2017 22 / 51 Core
Botconf 2017 22 / 51 Core
Botconf 2017 22 / 51 • clang -o hello hello.c -O3 • 217 passes • -targetlibinfo -tti -tbaa -scoped-noalias -assumption-cache-tracker -profile-summary-info -forceattrs -inferattrs -ipsccp -globalopt -domtree -mem2reg -deadargelim -domtree -basicaa -aa -instcombine -simplifycfg -basiccg -globals-aa -prune-eh -inline -functionattrs -argpromotion
-domtree -sroa -basicaa -aa -memoryssa -early-cse-memssa -speculative-execution -domtree -basicaa
-aa -lazy-value-info -jump-threading ...
Core: LLVM
• dozens of analysis & transform & utility passes • dead global elimination, constant propagation, inlining, reassociation, loop optimization, memory promotion, dead store elimination, . . .
Botconf 2017 23 / 51 Core: LLVM
• dozens of analysis & transform & utility passes • dead global elimination, constant propagation, inlining, reassociation, loop optimization, memory promotion, dead store elimination, . . . • clang -o hello hello.c -O3 • 217 passes • -targetlibinfo -tti -tbaa -scoped-noalias -assumption-cache-tracker -profile-summary-info -forceattrs -inferattrs -ipsccp -globalopt -domtree -mem2reg -deadargelim -domtree -basicaa -aa -instcombine -simplifycfg -basiccg -globals-aa -prune-eh -inline -functionattrs -argpromotion
-domtree -sroa -basicaa -aa -memoryssa -early-cse-memssa -speculative-execution -domtree -basicaa
-aa -lazy-value-info -jump-threading ...
Botconf 2017 23 / 51 • SSA = Static Single Assignment • %y = add i32 %x, %arg • Load/Store architecture • %x = load i32, i32* @global • Functions, arguments, returns, data types • (Un)conditional branches, switches • Ŏ Universal IR for efficient compiler transformations and analyses
Core: LLVM IR
• LLVM IR = LLVM Intermediate Representation • kind of assembly language / three address code @global = global i32 define i32 @fnc(i32 %arg) { %x = load i32, i32* @global %y = add i32 %x, %arg store i32 %y, @global return i32 %y }
Botconf 2017 24 / 51 • Ŏ Universal IR for efficient compiler transformations and analyses
Core: LLVM IR
• LLVM IR = LLVM Intermediate Representation • kind of assembly language / three address code @global = global i32 define i32 @fnc(i32 %arg) { %x = load i32, i32* @global %y = add i32 %x, %arg store i32 %y, @global return i32 %y } • SSA = Static Single Assignment • %y = add i32 %x, %arg • Load/Store architecture • %x = load i32, i32* @global • Functions, arguments, returns, data types • (Un)conditional branches, switches
Botconf 2017 24 / 51 Core: LLVM IR
• LLVM IR = LLVM Intermediate Representation • kind of assembly language / three address code @global = global i32 define i32 @fnc(i32 %arg) { %x = load i32, i32* @global %y = add i32 %x, %arg store i32 %y, @global return i32 %y } • SSA = Static Single Assignment • %y = add i32 %x, %arg • Load/Store architecture • %x = load i32, i32* @global • Functions, arguments, returns, data types • (Un)conditional branches, switches • Ŏ Universal IR for efficient compiler transformations and analyses
Botconf 2017 24 / 51 Core: decoder
Botconf 2017 25 / 51 Core: decoder
Botconf 2017 25 / 51 Core: decoder
Botconf 2017 25 / 51 Core: decoder
Botconf 2017 25 / 51 Core: decoder
Botconf 2017 25 / 51 Core: decoder
Botconf 2017 25 / 51 Core: decoder
Botconf 2017 25 / 51 Core: decoder
Botconf 2017 25 / 51 • Architectures (core instruction sets): • ARM + Thumb extension – 32-bit • MIPS – 32/64-bit • PowerPC – 32/64-bit • x86 – 32/64-bit • Capstone: 64-bit ARM, SPARC, SYSZ, XCore, m68k, m680x, TMS320C64x • Decompilation & advanced insns • full semantics only for simple instructions • Implementation details, testing framework (Keystone Engine + LLVM emulator), keeping LLVM IR ↔ ASM mapping, . . .
Core: Capstone2LlvmIR
• Capstone insn → sequence of LLVM IR • Handcoded sequences • 32/64-bit x86 – 1 person ≈ 2-3 weeks
Botconf 2017 26 / 51 • Decompilation & advanced insns • full semantics only for simple instructions • Implementation details, testing framework (Keystone Engine + LLVM emulator), keeping LLVM IR ↔ ASM mapping, . . .
Core: Capstone2LlvmIR
• Capstone insn → sequence of LLVM IR • Handcoded sequences • 32/64-bit x86 – 1 person ≈ 2-3 weeks • Architectures (core instruction sets): • ARM + Thumb extension – 32-bit • MIPS – 32/64-bit • PowerPC – 32/64-bit • x86 – 32/64-bit • Capstone: 64-bit ARM, SPARC, SYSZ, XCore, m68k, m680x, TMS320C64x
Botconf 2017 26 / 51 • full semantics only for simple instructions • Implementation details, testing framework (Keystone Engine + LLVM emulator), keeping LLVM IR ↔ ASM mapping, . . .
Core: Capstone2LlvmIR
• Capstone insn → sequence of LLVM IR • Handcoded sequences • 32/64-bit x86 – 1 person ≈ 2-3 weeks • Architectures (core instruction sets): • ARM + Thumb extension – 32-bit • MIPS – 32/64-bit • PowerPC – 32/64-bit • x86 – 32/64-bit • Capstone: 64-bit ARM, SPARC, SYSZ, XCore, m68k, m680x, TMS320C64x • Decompilation & advanced insns
Botconf 2017 26 / 51 Would you rather . . .
• PMULHUW • Multiply Packed Unsigned Integers and Store High Result if (OperandSize == 64) { //PMULHUW instruction with 64-bit operands: Tmp0[0..31] = Dst[0..15] * Src[0..15]; Tmp1[0..31] = Dst[16..31] * Src[16..31]; Tmp2[0..31] = Dst[32..47] * Src[32..47]; Tmp3[0..31] = Dst[48..63] * Src[48..63]; Dst[0..15] = Tmp0[16..31]; Dst[16..31] = Tmp1[16..31]; __asm_PMULHUW(mm1, mm2); Dst[32..47] = Tmp2[16..31]; Dst[48..63] = Tmp3[16..31]; } else { //PMULHUW instruction with 128-bit operands: // Even longer ... }
Botconf 2017 25 / 51 • full semantics only for simple instructions • Implementation details, testing framework (Keystone Engine + LLVM emulator), keeping LLVM IR ↔ ASM mapping, . . .
Core: Capstone2LlvmIR
• Capstone insn → sequence of LLVM IR • Handcoded sequences • 32/64-bit x86 – 1 person ≈ 2-3 weeks • Architectures (core instruction sets): • ARM + Thumb extension – 32-bit • MIPS – 32/64-bit • PowerPC – 32/64-bit • x86 – 32/64-bit • Capstone: 64-bit ARM, SPARC, SYSZ, XCore, m68k, m680x, TMS320C64x • Decompilation & advanced insns
Botconf 2017 26 / 51 • Implementation details, testing framework (Keystone Engine + LLVM emulator), keeping LLVM IR ↔ ASM mapping, . . .
Core: Capstone2LlvmIR
• Capstone insn → sequence of LLVM IR • Handcoded sequences • 32/64-bit x86 – 1 person ≈ 2-3 weeks • Architectures (core instruction sets): • ARM + Thumb extension – 32-bit • MIPS – 32/64-bit • PowerPC – 32/64-bit • x86 – 32/64-bit • Capstone: 64-bit ARM, SPARC, SYSZ, XCore, m68k, m680x, TMS320C64x • Decompilation & advanced insns • full semantics only for simple instructions
Botconf 2017 26 / 51 Core: Capstone2LlvmIR
• Capstone insn → sequence of LLVM IR • Handcoded sequences • 32/64-bit x86 – 1 person ≈ 2-3 weeks • Architectures (core instruction sets): • ARM + Thumb extension – 32-bit • MIPS – 32/64-bit • PowerPC – 32/64-bit • x86 – 32/64-bit • Capstone: 64-bit ARM, SPARC, SYSZ, XCore, m68k, m680x, TMS320C64x • Decompilation & advanced insns • full semantics only for simple instructions • Implementation details, testing framework (Keystone Engine + LLVM emulator), keeping LLVM IR ↔ ASM mapping, . . .
Botconf 2017 26 / 51 Core: Capstone2LlvmIR
• Capstone insn → sequence of LLVM IR • Handcoded sequences • 32/64-bit x86 – 1 person ≈ 2-3 weeks • Architectures (core instruction sets): • ARM + Thumb extension – 32-bit • MIPS – 32/64-bit • PowerPC – 32/64-bit • x86 – 32/64-bit • Capstone: 64-bit ARM, SPARC, SYSZ, XCore, m68k, m680x, TMS320C64x • Decompilation & advanced insns • full semantics only for simple instructions • Implementation details, testing framework (Keystone Engine + LLVM emulator), keeping LLVM IR ↔ ASM mapping, . . .
Botconf 2017 26 / 51 Core: low-level passes
Botconf 2017 27 / 51 Core: low-level passes
Botconf 2017 27 / 51 Core: assembly generation
Botconf 2017 28 / 51 Core: high-level passes
Botconf 2017 29 / 51 ƶ Capstone2LlvmIR • Capstone instruction to LLVM IR translation ƶ Capstone-dumper • what does Capstone know about any instruction ƶ Fnc-patterns • statically linked function pattern creation and detection ƶ Yaramod • YARA to AST parsing & C++ API to build new YARA rulesets ƶ Ctypes • extraction and presentation of C function data types ƶ Demangler • gcc/Clang, Microsoft Visual C++, and Borland C++
Core repos
ƶ RetDec • bin2llvmir library • bin2llvmirtool
Botconf 2017 30 / 51 ƶ Capstone-dumper • what does Capstone know about any instruction ƶ Fnc-patterns • statically linked function pattern creation and detection ƶ Yaramod • YARA to AST parsing & C++ API to build new YARA rulesets ƶ Ctypes • extraction and presentation of C function data types ƶ Demangler • gcc/Clang, Microsoft Visual C++, and Borland C++
Core repos
ƶ RetDec • bin2llvmir library • bin2llvmirtool ƶ Capstone2LlvmIR • Capstone instruction to LLVM IR translation
Botconf 2017 30 / 51 ƶ Fnc-patterns • statically linked function pattern creation and detection ƶ Yaramod • YARA to AST parsing & C++ API to build new YARA rulesets ƶ Ctypes • extraction and presentation of C function data types ƶ Demangler • gcc/Clang, Microsoft Visual C++, and Borland C++
Core repos
ƶ RetDec • bin2llvmir library • bin2llvmirtool ƶ Capstone2LlvmIR • Capstone instruction to LLVM IR translation ƶ Capstone-dumper • what does Capstone know about any instruction
Botconf 2017 30 / 51 ƶ Yaramod • YARA to AST parsing & C++ API to build new YARA rulesets ƶ Ctypes • extraction and presentation of C function data types ƶ Demangler • gcc/Clang, Microsoft Visual C++, and Borland C++
Core repos
ƶ RetDec • bin2llvmir library • bin2llvmirtool ƶ Capstone2LlvmIR • Capstone instruction to LLVM IR translation ƶ Capstone-dumper • what does Capstone know about any instruction ƶ Fnc-patterns • statically linked function pattern creation and detection
Botconf 2017 30 / 51 ƶ Ctypes • extraction and presentation of C function data types ƶ Demangler • gcc/Clang, Microsoft Visual C++, and Borland C++
Core repos
ƶ RetDec • bin2llvmir library • bin2llvmirtool ƶ Capstone2LlvmIR • Capstone instruction to LLVM IR translation ƶ Capstone-dumper • what does Capstone know about any instruction ƶ Fnc-patterns • statically linked function pattern creation and detection ƶ Yaramod • YARA to AST parsing & C++ API to build new YARA rulesets
Botconf 2017 30 / 51 ƶ Demangler • gcc/Clang, Microsoft Visual C++, and Borland C++
Core repos
ƶ RetDec • bin2llvmir library • bin2llvmirtool ƶ Capstone2LlvmIR • Capstone instruction to LLVM IR translation ƶ Capstone-dumper • what does Capstone know about any instruction ƶ Fnc-patterns • statically linked function pattern creation and detection ƶ Yaramod • YARA to AST parsing & C++ API to build new YARA rulesets ƶ Ctypes • extraction and presentation of C function data types
Botconf 2017 30 / 51 Core repos
ƶ RetDec • bin2llvmir library • bin2llvmirtool ƶ Capstone2LlvmIR • Capstone instruction to LLVM IR translation ƶ Capstone-dumper • what does Capstone know about any instruction ƶ Fnc-patterns • statically linked function pattern creation and detection ƶ Yaramod • YARA to AST parsing & C++ API to build new YARA rulesets ƶ Ctypes • extraction and presentation of C function data types ƶ Demangler • gcc/Clang, Microsoft Visual C++, and Borland C++
Botconf 2017 30 / 51 Backend
Botconf 2017 31 / 51 Backend
Botconf 2017 31 / 51 Backend
Botconf 2017 31 / 51 Backend: BIR is an AST
• BIR = Backend IR • AST = Abstract syntax tree • while (x < 20){ x = x + (y * 2); }
Botconf 2017 32 / 51 Backend: code structuring
• LLVM IR: only (un)conditional branches & switches • identify high-level control-flow patterns • restructure BIR: if-else, for-loop, while-loop, switch, break, continue
Botconf 2017 33 / 51 Backend: code structuring
• LLVM IR: only (un)conditional branches & switches • identify high-level control-flow patterns • restructure BIR: if-else, for-loop, while-loop, switch, break, continue
Botconf 2017 33 / 51 Backend: code structuring
• LLVM IR: only (un)conditional branches & switches • identify high-level control-flow patterns • restructure BIR: if-else, for-loop, while-loop, switch, break, continue
Botconf 2017 33 / 51 Backend: code structuring
• LLVM IR: only (un)conditional branches & switches • identify high-level control-flow patterns • restructure BIR: if-else, for-loop, while-loop, switch, break, continue
Botconf 2017 33 / 51 Backend: code structuring
• LLVM IR: only (un)conditional branches & switches • identify high-level control-flow patterns • restructure BIR: if-else, for-loop, while-loop, switch, break, continue
Botconf 2017 33 / 51 Backend: code structuring
• LLVM IR: only (un)conditional branches & switches • identify high-level control-flow patterns • restructure BIR: if-else, for-loop, while-loop, switch, break, continue
Botconf 2017 33 / 51 Backend: code structuring
• LLVM IR: only (un)conditional branches & switches • identify high-level control-flow patterns • restructure BIR: if-else, for-loop, while-loop, switch, break, continue
Botconf 2017 33 / 51 • arithmetic expression simplification • a + -1 - -4 ⇒ a + 3 • negation optimization • if (!(a == b)) ⇒ if (a != b) • pointer arithmetic • *(a + 4) ⇒ a[4] • conversion of while (true){ ... if (cond) break; ... } • for (cond){ ... } • while (cond){ ... } • conversion of if/else-if/else chains to switch • ...
Backend: optimizations
• copy propagation • reducing the number of variables
Botconf 2017 34 / 51 • negation optimization • if (!(a == b)) ⇒ if (a != b) • pointer arithmetic • *(a + 4) ⇒ a[4] • conversion of while (true){ ... if (cond) break; ... } • for (cond){ ... } • while (cond){ ... } • conversion of if/else-if/else chains to switch • ...
Backend: optimizations
• copy propagation • reducing the number of variables • arithmetic expression simplification • a + -1 - -4 ⇒ a + 3
Botconf 2017 34 / 51 • pointer arithmetic • *(a + 4) ⇒ a[4] • conversion of while (true){ ... if (cond) break; ... } • for (cond){ ... } • while (cond){ ... } • conversion of if/else-if/else chains to switch • ...
Backend: optimizations
• copy propagation • reducing the number of variables • arithmetic expression simplification • a + -1 - -4 ⇒ a + 3 • negation optimization • if (!(a == b)) ⇒ if (a != b)
Botconf 2017 34 / 51 • conversion of while (true){ ... if (cond) break; ... } • for (cond){ ... } • while (cond){ ... } • conversion of if/else-if/else chains to switch • ...
Backend: optimizations
• copy propagation • reducing the number of variables • arithmetic expression simplification • a + -1 - -4 ⇒ a + 3 • negation optimization • if (!(a == b)) ⇒ if (a != b) • pointer arithmetic • *(a + 4) ⇒ a[4]
Botconf 2017 34 / 51 • conversion of if/else-if/else chains to switch • ...
Backend: optimizations
• copy propagation • reducing the number of variables • arithmetic expression simplification • a + -1 - -4 ⇒ a + 3 • negation optimization • if (!(a == b)) ⇒ if (a != b) • pointer arithmetic • *(a + 4) ⇒ a[4] • conversion of while (true){ ... if (cond) break; ... } • for (cond){ ... } • while (cond){ ... }
Botconf 2017 34 / 51 • ...
Backend: optimizations
• copy propagation • reducing the number of variables • arithmetic expression simplification • a + -1 - -4 ⇒ a + 3 • negation optimization • if (!(a == b)) ⇒ if (a != b) • pointer arithmetic • *(a + 4) ⇒ a[4] • conversion of while (true){ ... if (cond) break; ... } • for (cond){ ... } • while (cond){ ... } • conversion of if/else-if/else chains to switch
Botconf 2017 34 / 51 Backend: optimizations
• copy propagation • reducing the number of variables • arithmetic expression simplification • a + -1 - -4 ⇒ a + 3 • negation optimization • if (!(a == b)) ⇒ if (a != b) • pointer arithmetic • *(a + 4) ⇒ a[4] • conversion of while (true){ ... if (cond) break; ... } • for (cond){ ... } • while (cond){ ... } • conversion of if/else-if/else chains to switch • ...
Botconf 2017 34 / 51 • stdlib context literals • var_ffff7dc6 = socket(2, 3, 255) sock_id = socket(PF_INET, SOCK_RAW, IPPROTO_RAW) • flock(sock_id, 7) flock(sock_id, LOCK_SH | LOCK_EX | LOCK_NB) • output generation • C • CFG = Control-Flow Graph • Call Graph
Backend: code generation
• variable name assignment • induction variables: for (i = 0; i < 10; ++i) • function arguments: a1, a2, a3, ... • general context names: return result; • stdlib context names: int len = strlen();
Botconf 2017 35 / 51 sock_id = socket(PF_INET, SOCK_RAW, IPPROTO_RAW) • flock(sock_id, 7) flock(sock_id, LOCK_SH | LOCK_EX | LOCK_NB) • output generation • C • CFG = Control-Flow Graph • Call Graph
Backend: code generation
• variable name assignment • induction variables: for (i = 0; i < 10; ++i) • function arguments: a1, a2, a3, ... • general context names: return result; • stdlib context names: int len = strlen(); • stdlib context literals • var_ffff7dc6 = socket(2, 3, 255)
Botconf 2017 35 / 51 • flock(sock_id, 7) flock(sock_id, LOCK_SH | LOCK_EX | LOCK_NB) • output generation • C • CFG = Control-Flow Graph • Call Graph
Backend: code generation
• variable name assignment • induction variables: for (i = 0; i < 10; ++i) • function arguments: a1, a2, a3, ... • general context names: return result; • stdlib context names: int len = strlen(); • stdlib context literals • var_ffff7dc6 = socket(2, 3, 255) sock_id = socket(PF_INET, SOCK_RAW, IPPROTO_RAW)
Botconf 2017 35 / 51 • output generation • C • CFG = Control-Flow Graph • Call Graph
Backend: code generation
• variable name assignment • induction variables: for (i = 0; i < 10; ++i) • function arguments: a1, a2, a3, ... • general context names: return result; • stdlib context names: int len = strlen(); • stdlib context literals • var_ffff7dc6 = socket(2, 3, 255) sock_id = socket(PF_INET, SOCK_RAW, IPPROTO_RAW) • flock(sock_id, 7) flock(sock_id, LOCK_SH | LOCK_EX | LOCK_NB)
Botconf 2017 35 / 51 Backend: code generation
• variable name assignment • induction variables: for (i = 0; i < 10; ++i) • function arguments: a1, a2, a3, ... • general context names: return result; • stdlib context names: int len = strlen(); • stdlib context literals • var_ffff7dc6 = socket(2, 3, 255) sock_id = socket(PF_INET, SOCK_RAW, IPPROTO_RAW) • flock(sock_id, 7) flock(sock_id, LOCK_SH | LOCK_EX | LOCK_NB) • output generation • C • CFG = Control-Flow Graph • Call Graph
Botconf 2017 35 / 51 Backend repos
ƶ RetDec • llvmir2hll library • llvmir2hlltool
Botconf 2017 36 / 51 , REST API https://retdec.com/api/ © Build it yourself 3 CMake, ± gcc/Clang, q Visual Studio 2015 Update 2 I Perl, GNU Bison, Flex, GNU Tar, scp, GNU bash, UPX, dot ƶ Recursively clone the main RetDec repository mkdir build && cd build cmake .. make && make install Ù Run it yourself decompile.sh binary.exe I Get RetDec IDA plugin
How to use RetDec
ɂ Online decompilation service https://retdec.com/decompilation/
Botconf 2017 37 / 51 © Build it yourself 3 CMake, ± gcc/Clang, q Visual Studio 2015 Update 2 I Perl, GNU Bison, Flex, GNU Tar, scp, GNU bash, UPX, dot ƶ Recursively clone the main RetDec repository mkdir build && cd build cmake .. make && make install Ù Run it yourself decompile.sh binary.exe I Get RetDec IDA plugin
How to use RetDec
ɂ Online decompilation service https://retdec.com/decompilation/ , REST API https://retdec.com/api/
Botconf 2017 37 / 51 Ù Run it yourself decompile.sh binary.exe I Get RetDec IDA plugin
How to use RetDec
ɂ Online decompilation service https://retdec.com/decompilation/ , REST API https://retdec.com/api/ © Build it yourself 3 CMake, ± gcc/Clang, q Visual Studio 2015 Update 2 I Perl, GNU Bison, Flex, GNU Tar, scp, GNU bash, UPX, dot ƶ Recursively clone the main RetDec repository mkdir build && cd build cmake .. make && make install
Botconf 2017 37 / 51 I Get RetDec IDA plugin
How to use RetDec
ɂ Online decompilation service https://retdec.com/decompilation/ , REST API https://retdec.com/api/ © Build it yourself 3 CMake, ± gcc/Clang, q Visual Studio 2015 Update 2 I Perl, GNU Bison, Flex, GNU Tar, scp, GNU bash, UPX, dot ƶ Recursively clone the main RetDec repository mkdir build && cd build cmake .. make && make install Ù Run it yourself decompile.sh binary.exe
Botconf 2017 37 / 51 How to use RetDec
ɂ Online decompilation service https://retdec.com/decompilation/ , REST API https://retdec.com/api/ © Build it yourself 3 CMake, ± gcc/Clang, q Visual Studio 2015 Update 2 I Perl, GNU Bison, Flex, GNU Tar, scp, GNU bash, UPX, dot ƶ Recursively clone the main RetDec repository mkdir build && cd build cmake .. make && make install Ù Run it yourself decompile.sh binary.exe I Get RetDec IDA plugin
Botconf 2017 37 / 51 What is RetDec IDA plugin
Botconf 2017 38 / 51 How does RetDec IDA plugin work
◎ Goals ` look & feel native ȩ same object names as IDA è interactive
Botconf 2017 39 / 51 How does RetDec IDA plugin work
◎ Goals ` look & feel native ȩ same object names as IDA è interactive
Botconf 2017 39 / 51 How does RetDec IDA plugin work
◎ Goals ` look & feel native ȩ same object names as IDA è interactive
Botconf 2017 39 / 51 How does RetDec IDA plugin work
◎ Goals ` look & feel native ȩ same object names as IDA è interactive
Botconf 2017 39 / 51 How does RetDec IDA plugin work
◎ Goals ` look & feel native ȩ same object names as IDA è interactive
Botconf 2017 39 / 51 How does RetDec IDA plugin work
◎ Goals ` look & feel native ȩ same object names as IDA è interactive
Botconf 2017 39 / 51 How does RetDec IDA plugin work
◎ Goals ` look & feel native ȩ same object names as IDA è interactive
Botconf 2017 39 / 51 How does RetDec IDA plugin work
◎ Goals ` look & feel native ȩ same object names as IDA è interactive
Botconf 2017 39 / 51 RetDec IDA plugin is interactive
Botconf 2017 40 / 51 12,000 registered users Ù 423,000 decompilations ɂ 350,000 Web , 73,000 API ÿ 410 decompilations daily
How was RetDec used so far
retdec.com launched on 2015-02-05
Botconf 2017 41 / 51 Ù 423,000 decompilations ɂ 350,000 Web , 73,000 API ÿ 410 decompilations daily
How was RetDec used so far
retdec.com launched on 2015-02-05 12,000 registered users
Botconf 2017 41 / 51 How was RetDec used so far
retdec.com launched on 2015-02-05 12,000 registered users Ù 423,000 decompilations ɂ 350,000 Web , 73,000 API ÿ 410 decompilations daily
Botconf 2017 41 / 51 Real example #1: Vawtrak (x86)
Botconf 2017 42 / 51 Real example #2: Vawtrak (x86)
Botconf 2017 43 / 51 Real example #3: CryproWall (x86)
Botconf 2017 44 / 51 Real example #4: Psyb0t (mips) RetDec
Botconf 2017 45 / 51 Real example #5: Psyb0t (mips) RetDec
main
ip2c system flock function_404810 function_404b1c fileno backup Daemonize RSeed getip
fetch sleep strcmp xDec fopen fread fclose fork exit gettimeofday srand
snprintf strncmp parse strncpy strlen
Botconf 2017 46 / 51 NO!
• IDA and Hex-Rays are great Ŏ output quality è interactive ç seamlessly integrated mature many plugins î official support
• IDA and Hex-Rays have flaws not free µ proprietary ® big monolithic GUI app
Should you throw away your Hex-Rays?
Botconf 2017 47 / 51 • IDA and Hex-Rays have flaws not free µ proprietary ® big monolithic GUI app
Should you throw away your Hex-Rays? NO!
• IDA and Hex-Rays are great Ŏ output quality è interactive ç seamlessly integrated mature many plugins î official support
Botconf 2017 47 / 51 Should you throw away your Hex-Rays? NO!
• IDA and Hex-Rays are great Ŏ output quality è interactive ç seamlessly integrated mature many plugins î official support
• IDA and Hex-Rays have flaws not free µ proprietary ® big monolithic GUI app
Botconf 2017 47 / 51 • Not so obvious reasons Ǹ LLVM is awesome different basic designs: interactive GUI vs. pipeline Ȝ LLVM is OP (don’t worry, it won’t be nerfed)
RetDec is handy because . . .
• Obvious reasons it is free + MIPS architecture a MIT license r you can play with the sources
Botconf 2017 48 / 51 different basic designs: interactive GUI vs. pipeline Ȝ LLVM is OP (don’t worry, it won’t be nerfed)
RetDec is handy because . . .
• Obvious reasons it is free + MIPS architecture a MIT license r you can play with the sources • Not so obvious reasons Ǹ LLVM is awesome
Botconf 2017 48 / 51 Ȝ LLVM is OP (don’t worry, it won’t be nerfed)
RetDec is handy because . . .
• Obvious reasons it is free + MIPS architecture a MIT license r you can play with the sources • Not so obvious reasons Ǹ LLVM is awesome different basic designs: interactive GUI vs. pipeline
Botconf 2017 48 / 51 RetDec is handy because . . .
• Obvious reasons it is free + MIPS architecture a MIT license r you can play with the sources • Not so obvious reasons Ǹ LLVM is awesome different basic designs: interactive GUI vs. pipeline Ȝ LLVM is OP (don’t worry, it won’t be nerfed)
Botconf 2017 48 / 51 ƶ Fileformat – generic OFF parsing and analysis ƶ Capstone2LlvmIR – binary to LLVM translation ƶ Fnc-patterns – statically linked code detection in YARA (IDA F.L.I.R.T.) ƶ Yaramod – hack YARA rules in C++ ƶ Yaracpp – YARA C++ wrapper ƶ Ctypes – info on function types
RetDec is not only decompiler
ƶ RetDec – the decompiler ƶ RetDec IDA plugin – Hex-Rays impersonation
Botconf 2017 49 / 51 RetDec is not only decompiler
ƶ RetDec – the decompiler ƶ RetDec IDA plugin – Hex-Rays impersonation ƶ Fileformat – generic OFF parsing and analysis ƶ Capstone2LlvmIR – binary to LLVM translation ƶ Fnc-patterns – statically linked code detection in YARA (IDA F.L.I.R.T.) ƶ Yaramod – hack YARA rules in C++ ƶ Yaracpp – YARA C++ wrapper ƶ Ctypes – info on function types
Botconf 2017 49 / 51 • Throw a release party • Solve some inevitable “hey guys, I’m unable to build your repo” • Write more technical documentation on how it all works • Present it somewhere (maybe LLVM dev meeting) • Continue improving the implementation • make it more portable: Bash ⇒ Python, . . . • 64-bit architectures supports • replace some libraries & modules • We will see. . .
What’s next
• Release the sources on Github shortly after the conference
Botconf 2017 50 / 51 • Solve some inevitable “hey guys, I’m unable to build your repo” • Write more technical documentation on how it all works • Present it somewhere (maybe LLVM dev meeting) • Continue improving the implementation • make it more portable: Bash ⇒ Python, . . . • 64-bit architectures supports • replace some libraries & modules • We will see. . .
What’s next
• Release the sources on Github shortly after the conference • Throw a release party
Botconf 2017 50 / 51 • Write more technical documentation on how it all works • Present it somewhere (maybe LLVM dev meeting) • Continue improving the implementation • make it more portable: Bash ⇒ Python, . . . • 64-bit architectures supports • replace some libraries & modules • We will see. . .
What’s next
• Release the sources on Github shortly after the conference • Throw a release party • Solve some inevitable “hey guys, I’m unable to build your repo”
Botconf 2017 50 / 51 • Present it somewhere (maybe LLVM dev meeting) • Continue improving the implementation • make it more portable: Bash ⇒ Python, . . . • 64-bit architectures supports • replace some libraries & modules • We will see. . .
What’s next
• Release the sources on Github shortly after the conference • Throw a release party • Solve some inevitable “hey guys, I’m unable to build your repo” • Write more technical documentation on how it all works
Botconf 2017 50 / 51 • Continue improving the implementation • make it more portable: Bash ⇒ Python, . . . • 64-bit architectures supports • replace some libraries & modules • We will see. . .
What’s next
• Release the sources on Github shortly after the conference • Throw a release party • Solve some inevitable “hey guys, I’m unable to build your repo” • Write more technical documentation on how it all works • Present it somewhere (maybe LLVM dev meeting)
Botconf 2017 50 / 51 • We will see. . .
What’s next
• Release the sources on Github shortly after the conference • Throw a release party • Solve some inevitable “hey guys, I’m unable to build your repo” • Write more technical documentation on how it all works • Present it somewhere (maybe LLVM dev meeting) • Continue improving the implementation • make it more portable: Bash ⇒ Python, . . . • 64-bit architectures supports • replace some libraries & modules
Botconf 2017 50 / 51 What’s next
• Release the sources on Github shortly after the conference • Throw a release party • Solve some inevitable “hey guys, I’m unable to build your repo” • Write more technical documentation on how it all works • Present it somewhere (maybe LLVM dev meeting) • Continue improving the implementation • make it more portable: Bash ⇒ Python, . . . • 64-bit architectures supports • replace some libraries & modules • We will see. . .
Botconf 2017 50 / 51 That’s all folks
Thanks!
Ò Contacts ɂ https://retdec.com/ https://github.com/avast-tl 7 https://twitter.com/retdec ø https://retdec.com/rss/ R [email protected]
Botconf 2017 51 / 51