RetDec: An Open-Source Machine-Code Decompiler

Jakub Křoustek, Peter Matula, Petr Zemek
Threat Labs
Botconf 2017

> whoarewe

Jakub Křoustek
• founder of RetDec
• Threat Labs lead @Avast (previously @AVG)
• reverse engineer, malware hunter, security researcher
• Twitter: @JakubKroustek

Peter Matula
• main developer of the RetDec decompiler
• senior developer @Avast (previously @AVG)
• ♥ rock climbing and ...

Quiz Time [images]

Disassembling vs. Decompilation [image]

Decompilation? What is it? [image]

Decompilation? What good is it?
• Binary analysis
  • reverse engineering
  • malware analysis
  • vulnerability detection
  • verification
  • binary comparison
  • ...
• Binary recompilation (yeah, like that's ever gonna work)
  • porting
  • bug fixing
  • adding new features
  • original sources got lost
  • optimizations

OK, why aren't we already using it?
• Multiple existing tools: Hex-Rays, Hopper, Snowman, etc.
• It is damn hard
  • compilation is not lossless
    • high-level constructions
    • data types
    • names
    • comments, macros, ...
  • compilers are optimizing
  • computer science goodies
    • undecidable problems
    • complex algorithms
    • exponential complexities
  • obfuscation, packing, anti-debugging
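One concrete case of "compilers are optimizing": for an unsigned 32-bit division by a constant, optimizing compilers commonly avoid the slow div instruction and emit a multiply by a "magic" constant followed by a shift. For x / 10 the usual constant is 0xCCCCCCCD with a total right shift of 35, and a decompiler has to recognize that idiom to print x / 10 again. The Python sketch below only checks the arithmetic equivalence; the function names are hypothetical and this is not RetDec's idiom-recognition code.

```python
# Sketch: the multiply-and-shift idiom commonly emitted for an unsigned
# 32-bit division by 10, checked against a plain integer division.
MAGIC_DIV10 = 0xCCCCCCCD  # ceil(2**35 / 10)
SHIFT = 35                # take the high 32 bits of the product, then 3 more


def optimized_div10(x: int) -> int:
    """Roughly what optimized x86 code computes (mul, then shr edx, 3)."""
    assert 0 <= x < 2**32, "x must fit in an unsigned 32-bit register"
    return (x * MAGIC_DIV10) >> SHIFT


def decompiled_div10(x: int) -> int:
    """What a decompiler should ideally print instead: a plain x / 10."""
    return x // 10


if __name__ == "__main__":
    samples = [0, 1, 9, 10, 11, 99, 12345, 2**31, 2**32 - 1]
    for x in samples:
        assert optimized_div10(x) == decompiled_div10(x)
    print("idiom matches x / 10 on all samples")
```

Recognizing such patterns and folding them back is part of why compiler optimizations make decompilation harder: the optimized form is correct but no longer resembles the source.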
Generic decompilation? Even harder
• Many architectures
  • x86, ARM, MIPS, PowerPC, ...
  • CISC vs. RISC
  • bit length, endianness, floating point
  • versions & extensions
• Many ABIs
• Many OFFs (object-file formats)
  • ELF, PE, Mach-O, ...
• Many programming languages
• Many compilers & optimizations
• Statically linked code
• ...
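To make the OFF, bit-length and endianness points concrete, here is a minimal sketch of how a generic front end might identify an input file from its first bytes before choosing an architecture. It relies on well-known header fields: the ELF e_ident bytes (EI_CLASS at offset 4, EI_DATA at offset 5) and e_machine at offset 18; the DOS "MZ" header whose e_lfanew at 0x3C points to the "PE\0\0" signature and the COFF Machine field; the four Mach-O magic byte orders; and the "!<arch>\n" AR header. The names sniff and FormatInfo are hypothetical, only a few machine codes are mapped, and this is not RetDec's fileformat library.

```python
from dataclasses import dataclass
from typing import Optional

# A few well-known machine codes (not exhaustive).
ELF_MACHINES = {3: "x86", 8: "MIPS", 20: "PowerPC", 40: "ARM", 62: "x86-64"}
PE_MACHINES = {0x014C: "x86", 0x01C0: "ARM", 0x8664: "x86-64"}

# Mach-O magic 0xFEEDFACE (32-bit) / 0xFEEDFACF (64-bit) in both byte orders.
MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce": (32, "big"),
    b"\xce\xfa\xed\xfe": (32, "little"),
    b"\xfe\xed\xfa\xcf": (64, "big"),
    b"\xcf\xfa\xed\xfe": (64, "little"),
}


@dataclass
class FormatInfo:
    fmt: str
    bits: Optional[int] = None
    endian: Optional[str] = None
    arch: Optional[str] = None


def sniff(data: bytes) -> FormatInfo:
    """Guess object-file format, word size, byte order and architecture."""
    if data[:4] == b"\x7fELF" and len(data) >= 20:
        bits = {1: 32, 2: 64}.get(data[4])             # EI_CLASS
        endian = {1: "little", 2: "big"}.get(data[5])  # EI_DATA
        arch = None
        if endian is not None:
            # e_machine must be read with the file's own byte order.
            machine = int.from_bytes(data[18:20], endian)
            arch = ELF_MACHINES.get(machine, f"unknown ({machine})")
        return FormatInfo("ELF", bits, endian, arch)
    if data[:2] == b"MZ" and len(data) >= 0x40:
        pe_off = int.from_bytes(data[0x3C:0x40], "little")  # e_lfanew
        if data[pe_off:pe_off + 4] == b"PE\x00\x00" and len(data) >= pe_off + 6:
            machine = int.from_bytes(data[pe_off + 4:pe_off + 6], "little")
            arch = PE_MACHINES.get(machine, f"unknown ({machine:#x})")
            return FormatInfo("PE", None, "little", arch)
        return FormatInfo("MZ (DOS)")
    if data[:4] in MACHO_MAGICS:
        bits, endian = MACHO_MAGICS[data[:4]]
        return FormatInfo("Mach-O", bits, endian)
    if data[:8] == b"!<arch>\n":
        return FormatInfo("AR")
    return FormatInfo("unknown")
```

The same two header bytes that distinguish a 32-bit little-endian ARM ELF from a 32-bit big-endian PowerPC ELF decide how every later multi-byte field is read, which is why a retargetable tool has to settle them first.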
Retargetable Decompiler (RetDec)
Goal
• generic decompilation of binary code
History
• 2011–2013 (AVG + BUT FIT via TAČR TA01010667 grant)
• 2013–2016 (AVG + BUT FIT students via diploma theses)
• 2016–* (Avast + BUT FIT students)
People
• 3–4 core developers
• ≈ 20 BSc/MSc/PhD students
Lines of code
• 419,451 code
• 205,222 comments, etc.
• 624,673 total
RetDec? What does it do?
Supports
• architectures (32-bit): x86, ARM, PowerPC, MIPS
• OFFs: ELF, PE, COFF, Mach-O, Intel HEX, AR, raw
• compilers (we test with): gcc, Clang, MSVC
Does
• compiler/packer detection
• statically linked code detection
• OS loader simulation
• recursive traversal disassembling
• high-level constructions/types reconstruction
• pattern detection
• ...
Runs on (hopefully) ...
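Recursive traversal disassembling, listed above, decodes only the code that is reachable from known entry points by following control flow, whereas a linear sweep decodes every byte in order and can misread data embedded between functions as instructions. The sketch below uses a hypothetical toy instruction model (Insn with a kind, an encoded size and an optional direct target) instead of real machine code. It assumes every call returns and ignores indirect jumps, both of which a real disassembler has to handle with extra analysis; it is an illustration of the technique, not RetDec's implementation.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Insn:
    """Hypothetical toy instruction: kind, encoded size, optional direct target."""
    kind: str                     # "op", "jcc", "jmp", "call", "ret" or "data"
    size: int
    target: Optional[int] = None  # direct branch/call target, if any


# Toy image: the 4 bytes at 0x0C are data placed after an unconditional jump.
PROGRAM: Dict[int, Insn] = {
    0x00: Insn("op", 2),
    0x02: Insn("jcc", 2, 0x0A),
    0x04: Insn("call", 5, 0x10),
    0x09: Insn("ret", 1),
    0x0A: Insn("jmp", 2, 0x04),
    0x0C: Insn("data", 4),        # never reached by control flow
    0x10: Insn("op", 3),
    0x13: Insn("ret", 1),
}


def recursive_traversal(program: Dict[int, Insn], entry: int) -> Set[int]:
    """Decode only addresses reachable from `entry` by following control flow."""
    visited: Set[int] = set()
    worklist = deque([entry])
    while worklist:
        addr = worklist.popleft()
        # Walk straight-line code until an already-seen or unknown address.
        while addr not in visited and addr in program:
            visited.add(addr)
            insn = program[addr]
            if insn.target is not None:
                worklist.append(insn.target)  # queue branch/call target
            if insn.kind in ("jmp", "ret"):
                break                         # no fall-through successor
            addr += insn.size                 # op/jcc/call fall through
    return visited


def linear_sweep(program: Dict[int, Insn], start: int, end: int) -> List[int]:
    """For contrast: decode every address in order, data included."""
    out: List[int] = []
    addr = start
    while addr < end and addr in program:
        out.append(addr)
        addr += program[addr].size
    return out


if __name__ == "__main__":
    # ['0x0', '0x2', '0x4', '0x9', '0xa', '0x10', '0x13']
    print([hex(a) for a in sorted(recursive_traversal(PROGRAM, 0x00))])
    # ['0x0', '0x2', '0x4', '0x9', '0xa', '0xc', '0x10', '0x13']
    print([hex(a) for a in linear_sweep(PROGRAM, 0x00, 0x14)])
```

The traversal never touches 0x0C because the only path to it is the fall-through of an unconditional jmp, while the linear sweep decodes those data bytes as if they were code.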
Repositories
• 11 core
• 6 support
• 8 third party

Contacts
• https://retdec.com/
• https://github.com/avast-tl
• https://twitter.com/retdec
• https://retdec.com/rss/

Good news everyone!
