Université Batna 2 – Mostefa Ben Boulaïd Thèse Doctorat En
Total Page:16
File Type:pdf, Size:1020Kb
République Algérienne Démocratique et Populaire Ministère de l’Enseignement Supérieur et de la Recherche Scientifique Université Batna 2 – Mostefa Ben Boulaïd Faculté de Technologie Département de Génie Industriel Thèse Préparée au sein du laboratoire d’Automatique & Productique Présentée pour l’obtention du diplôme de : Doctorat en Sciences en Génie Industriel Option : Génie Industriel Sous le Thème : An Optimized Approach to Software Security via Malware Analysis Présentée par : OURLIS Lazhar Devant le jury composé de : M. ABDELHAMID Samir MCA Université de Batna 2 Président M. BELLALA Djamel MCA Université de Batna 2 Rapporteur Mme. BOUAME Souhila MCA Université de Batna 2 Examinateur M.DJEFFAL Abdelhamid MCA Université de Biskra Examinateur M.KAHLOUL Laid Prof Université de Biskra Examinateur M. BENMOHAMMED Mohamed Prof Université de Constantine 2 Examinateur Novembre 2020 To my parents and family Contents List of Figures. .I List of Tables. .II Program Listings. III Abbreviations. IV Publications. .V Acknowledgements. .VI Abstract. VII Chapter 1 Introduction ...................................................................................................................... 1 1.1 Motivation for the research ............................................................................................ 3 1.2 Thesis scope .................................................................................................................... 3 1.3 Thesis outline .................................................................................................................. 3 2 Malware Overview ........................................................................................................... 5 2.1 A Brief History of Malware ............................................................................................ 5 2.2 Malware definition ....................................................................................................... 16 2.3 Malware propagation vectors ...................................................................................... 17 2.4 Malware concealment strategies .................................................................................. 20 2.4.1 Encryption ............................................................................................................ 20 2.4.2 Stealth ................................................................................................................... 20 2.4.3 Packing ................................................................................................................. 21 2.4.4 Oligomorphism ..................................................................................................... 21 2.4.5 Polymorphism ...................................................................................................... 22 2.4.6 Metamorphism ..................................................................................................... 23 2.5 Malware obfuscation techniques .................................................................................. 23 2.6 Malware types .............................................................................................................. 25 2.7 Malware analysis ......................................................................................................... 30 2.7.1 Static analysis ....................................................................................................... 30 2.7.2 Dynamic analysis ............................................................................................... 300 2.7.3 Memory analysis (memory forensics) .................................................................. 31 2.8 Malware detection ........................................................................................................ 31 2.8.1 Signature-based detection .................................................................................... 32 2.8.2 Heuristic-based detection ..................................................................................... 36 3 Pattern-matching algorithms ........................................................................................ 37 3.1 Pattern-matching problem ........................................................................................... 37 3.2 Exact pattern-matching algorithms .............................................................................. 39 3.3 Brute force algorithm ................................................................................................... 39 3.4 The Boyer-Moore algorithm ......................................................................................... 40 3.4.1 The bad character heuristic .................................................................................. 41 3.4.2 The good suffix heuristic ...................................................................................... 43 3.5 The Aho-Corasick algorithm ........................................................................................ 47 3.5.1 Review of the AC algorithm ................................................................................ 49 3.5.2 Review of the AC algorithm using the next-move function ................................. 54 4 SIMD Implementation of the Aho-Corasick Algorithm using Intel® AVX2 ............ 57 4.1 Intel© SIMD extensions ................................................................................................ 59 4.1.1 MMX™ Technology ............................................................................................ 60 4.1.2 Streaming SIMD Extensions (SSE) ..................................................................... 61 4.1.3 Advanced Vector Extensions (AVX) ................................................................... 63 4.1.4 Test for AVX2 support ......................................................................................... 65 4.2 Vectorization: Data Parallelism .................................................................................. 68 4.3 Vectorization approaches (SIMD programming methods) .......................................... 69 4.4 Characteristics of SIMD operations ............................................................................ 71 4.5 SIMD Implementation of the Aho-Corasick Algorithm using Intel® AVX2 ................. 73 4.5.1 Vectorization of the Aho-Corasick Algorithm using Intel® AVX2 ..................... 73 4.5.2 Experimental results ............................................................................................. 76 5 Improving the Signature Scanner using an Optimized Aho-Corasick algorithm .... 80 5.1 Scanning for byte-stream signatures ............................................................................ 82 5.2 The signature scanner tool ........................................................................................... 85 5.3 Performance analysis ................................................................................................... 87 5.3.1 Performance according to the malware database ................................................. 88 5.3.2 Performance according to the malware signature size ......................................... 90 6 Conclusion ....................................................................................................................... 92 BIBLIOGRAPHY . 93 List of Figures Figure 2. 1: Aycock's classification of malware ...................................................................... 26 Figure 2. 2: Adleman’s classification of malware ................................................................... 26 Figure 2. 3: Microsoft malware classification ......................................................................... 27 Figure 2. 4: Malware detected by the MSRT by means of propagation ability ...................... 28 Figure 2. 5: Kaspersky Lab malware classification diagram................................................... 29 Figure 2. 6: Hexadecimal representation of a typical malware signature and the corresponding x86 code snippet .............................................................................................. 33 Figure 2. 7: Hexadecimal representation of a generic malware signature.............................. 34 Figure 2. 8: ClamAV signature for the Virut malware ........................................................... 34 Figure 3. 1: Application fields using pattern-matching ........................................................... 37 Figure 3. 2: The bad character shift heuristic .......................................................................... 41 Figure 3. 3: The good suffix .................................................................................................... 43 Figure 3. 4: The matching suffix occurs somewhere else in the pattern ................................. 44 Figure 3. 5: A partial of a good suffix occurs as a prefix of the pattern .................................. 44 Figure 3. 6: Pattern-matching machine for the set of keywords {these, this, the, set} ........... 50 Figure 3. 7: State transitions using the next-move function .................................................... 56 Figure 4. 1: Classification of parallel architectures ................................................................ 58 Figure 4. 2: Intel® SIMD extensions ....................................................................................... 59 Figure 4. 3: MMX data types..................................................................................................