Retdec: an Open-Source Machine-Code Decompiler
Total Page:16
File Type:pdf, Size:1020Kb
RetDec: An Open-Source Machine-Code Decompiler Jakub Kˇroustek Peter Matula Petr Zemek Threat Labs Botconf 2017 1 / 51 ♂ Peter Matula • main developer of the RetDec decompiler • senior developer @Avast (previously @AVG) • ♥ rock climbing and • R [email protected] > whoarewe ♂ Jakub Kˇroustek • founder of RetDec • Threat Labs lead @Avast (previously @AVG) • reverse engineer, malware hunter, security researcher • 7 @JakubKroustek • R [email protected] Botconf 2017 2 / 51 > whoarewe ♂ Jakub Kˇroustek • founder of RetDec • Threat Labs lead @Avast (previously @AVG) • reverse engineer, malware hunter, security researcher • 7 @JakubKroustek • R [email protected] ♂ Peter Matula • main developer of the RetDec decompiler • senior developer @Avast (previously @AVG) • ♥ rock climbing and • R [email protected] Botconf 2017 2 / 51 Quiz Time Botconf 2017 3 / 51 Quiz Time Botconf 2017 4 / 51 Quiz Time Botconf 2017 5 / 51 Quiz Time Botconf 2017 6 / 51 Disassembling vs. Decompilation Botconf 2017 7 / 51 Decompilation? What is it? Botconf 2017 8 / 51 è Binary recompilation (yeah, like that’s ever gonna work) • porting • bug fixing • adding new features • original sources got lost • optimizations Decompilation? What good is it? ü Binary analysis • reverse engineering • malware analysis • vulnerability detection • verification • binary comparison • ... Botconf 2017 9 / 51 Decompilation? What good is it? ü Binary analysis • reverse engineering • malware analysis • vulnerability detection • verification • binary comparison • ... è Binary recompilation (yeah, like that’s ever gonna work) • porting • bug fixing • adding new features • original sources got lost • optimizations Botconf 2017 9 / 51 • It is damn hard • compilation is not lossless • high-level constructions • data types • names • comments, macros, . • compilers are optimizing • computer science goodies • undecidable problems • complex algorithms • exponential complexities • obfuscation, packing, anti-debugging Ok, why aren’t we already using it? • Multiple existing tools: Hex-Rays, Hopper, Snowman, etc. Botconf 2017 10 / 51 • compilation is not lossless • high-level constructions • data types • names • comments, macros, . • compilers are optimizing • computer science goodies • undecidable problems • complex algorithms • exponential complexities • obfuscation, packing, anti-debugging Ok, why aren’t we already using it? • Multiple existing tools: Hex-Rays, Hopper, Snowman, etc. • It is damn hard Botconf 2017 10 / 51 • compilers are optimizing • computer science goodies • undecidable problems • complex algorithms • exponential complexities • obfuscation, packing, anti-debugging Ok, why aren’t we already using it? • Multiple existing tools: Hex-Rays, Hopper, Snowman, etc. • It is damn hard • compilation is not lossless • high-level constructions • data types • names • comments, macros, . Botconf 2017 10 / 51 • computer science goodies • undecidable problems • complex algorithms • exponential complexities • obfuscation, packing, anti-debugging Ok, why aren’t we already using it? • Multiple existing tools: Hex-Rays, Hopper, Snowman, etc. • It is damn hard • compilation is not lossless • high-level constructions • data types • names • comments, macros, . • compilers are optimizing Botconf 2017 10 / 51 • obfuscation, packing, anti-debugging Ok, why aren’t we already using it? • Multiple existing tools: Hex-Rays, Hopper, Snowman, etc. • It is damn hard • compilation is not lossless • high-level constructions • data types • names • comments, macros, . • compilers are optimizing • computer science goodies • undecidable problems • complex algorithms • exponential complexities Botconf 2017 10 / 51 Ok, why aren’t we already using it? • Multiple existing tools: Hex-Rays, Hopper, Snowman, etc. • It is damn hard • compilation is not lossless • high-level constructions • data types • names • comments, macros, . • compilers are optimizing • computer science goodies • undecidable problems • complex algorithms • exponential complexities • obfuscation, packing, anti-debugging Botconf 2017 10 / 51 • Many ABIs • Many OFFs (object-file formats) • ± ELF, q PE, Mach-O, . • Many programming languages • Many compilers & optimizations • Statically linked code • ... Generic decompilation? Even harder • Many architectures • x86, ARM, MIPS, PowerPC, . • CISC vs. RISC • bit length, endianness, floating points • versions & extensions Botconf 2017 11 / 51 • Many OFFs (object-file formats) • ± ELF, q PE, Mach-O, . • Many programming languages • Many compilers & optimizations • Statically linked code • ... Generic decompilation? Even harder • Many architectures • x86, ARM, MIPS, PowerPC, . • CISC vs. RISC • bit length, endianness, floating points • versions & extensions • Many ABIs Botconf 2017 11 / 51 • Many programming languages • Many compilers & optimizations • Statically linked code • ... Generic decompilation? Even harder • Many architectures • x86, ARM, MIPS, PowerPC, . • CISC vs. RISC • bit length, endianness, floating points • versions & extensions • Many ABIs • Many OFFs (object-file formats) • ± ELF, q PE, Mach-O, . Botconf 2017 11 / 51 • Many compilers & optimizations • Statically linked code • ... Generic decompilation? Even harder • Many architectures • x86, ARM, MIPS, PowerPC, . • CISC vs. RISC • bit length, endianness, floating points • versions & extensions • Many ABIs • Many OFFs (object-file formats) • ± ELF, q PE, Mach-O, . • Many programming languages Botconf 2017 11 / 51 • Statically linked code • ... Generic decompilation? Even harder • Many architectures • x86, ARM, MIPS, PowerPC, . • CISC vs. RISC • bit length, endianness, floating points • versions & extensions • Many ABIs • Many OFFs (object-file formats) • ± ELF, q PE, Mach-O, . • Many programming languages • Many compilers & optimizations Botconf 2017 11 / 51 • ... Generic decompilation? Even harder • Many architectures • x86, ARM, MIPS, PowerPC, . • CISC vs. RISC • bit length, endianness, floating points • versions & extensions • Many ABIs • Many OFFs (object-file formats) • ± ELF, q PE, Mach-O, . • Many programming languages • Many compilers & optimizations • Statically linked code Botconf 2017 11 / 51 Generic decompilation? Even harder • Many architectures • x86, ARM, MIPS, PowerPC, . • CISC vs. RISC • bit length, endianness, floating points • versions & extensions • Many ABIs • Many OFFs (object-file formats) • ± ELF, q PE, Mach-O, . • Many programming languages • Many compilers & optimizations • Statically linked code • ... Botconf 2017 11 / 51 ƽ History • 2011–2013 (AVG + BUT FIT via TACRˇ TA01010667 grant) • 2013–2016 (AVG + BUT FIT students via diploma theses) • 2016–* (Avast + BUT FIT students) People ȯ 3-4 core developers Ƅ ≈ 20 BSc/MSc/PhD students ǀ Lines of code ì 419,451 code 7 205,222 comments, etc. Ý 624,673 total Retargetable Decompiler (RetDec) ◎ Goal • generic decompilation of binary code Botconf 2017 12 / 51 People ȯ 3-4 core developers Ƅ ≈ 20 BSc/MSc/PhD students ǀ Lines of code ì 419,451 code 7 205,222 comments, etc. Ý 624,673 total Retargetable Decompiler (RetDec) ◎ Goal • generic decompilation of binary code ƽ History • 2011–2013 (AVG + BUT FIT via TACRˇ TA01010667 grant) • 2013–2016 (AVG + BUT FIT students via diploma theses) • 2016–* (Avast + BUT FIT students) Botconf 2017 12 / 51 ǀ Lines of code ì 419,451 code 7 205,222 comments, etc. Ý 624,673 total Retargetable Decompiler (RetDec) ◎ Goal • generic decompilation of binary code ƽ History • 2011–2013 (AVG + BUT FIT via TACRˇ TA01010667 grant) • 2013–2016 (AVG + BUT FIT students via diploma theses) • 2016–* (Avast + BUT FIT students) People ȯ 3-4 core developers Ƅ ≈ 20 BSc/MSc/PhD students Botconf 2017 12 / 51 Retargetable Decompiler (RetDec) ◎ Goal • generic decompilation of binary code ƽ History • 2011–2013 (AVG + BUT FIT via TACRˇ TA01010667 grant) • 2013–2016 (AVG + BUT FIT students via diploma theses) • 2016–* (Avast + BUT FIT students) People ȯ 3-4 core developers Ƅ ≈ 20 BSc/MSc/PhD students ǀ Lines of code ì 419,451 code 7 205,222 comments, etc. Ý 624,673 total Botconf 2017 12 / 51 3 Does • compiler/packer detection • statically linked code detection • OS loader simulation • recursive traversal disassembling • high-level constructions/types reconstruction • pattern detection • ... Ù Runs on (hopefully) RetDec? What does it do ¿ Supports • architectures (32-bit): x86, ARM, PowerPC, MIPS • OFFs: ELF, PE, COFF, Mach-O, Intel HEX, AR, raw • compilers (we test with): gcc, Clang, MSVC Botconf 2017 13 / 51 Ù Runs on (hopefully) RetDec? What does it do ¿ Supports • architectures (32-bit): x86, ARM, PowerPC, MIPS • OFFs: ELF, PE, COFF, Mach-O, Intel HEX, AR, raw • compilers (we test with): gcc, Clang, MSVC 3 Does • compiler/packer detection • statically linked code detection • OS loader simulation • recursive traversal disassembling • high-level constructions/types reconstruction • pattern detection • ... Botconf 2017 13 / 51 RetDec? What does it do ¿ Supports • architectures (32-bit): x86, ARM, PowerPC, MIPS • OFFs: ELF, PE, COFF, Mach-O, Intel HEX, AR, raw • compilers (we test with): gcc, Clang, MSVC 3 Does • compiler/packer detection • statically linked code detection • OS loader simulation • recursive traversal disassembling • high-level constructions/types reconstruction • pattern detection • ... Ù Runs on (hopefully) Botconf 2017 13 / 51 á Repositories C 11 core ¿ 6 support 8 third party Ò Contacts ɂ https://retdec.com/ https://github.com/avast-tl 7 https://twitter.com/retdec ø https://retdec.com/rss/ R [email protected] Good news everyone!