Genetic Algorithm Application in Information Security Systems
VILNIUS GEDIMINAS TECHNICAL UNIVERSITY
Nikolaj GORANIN
GENETIC ALGORITHM APPLICATION IN INFORMATION SECURITY SYSTEMS
DOCTORAL DISSERTATION
TECHNOLOGICAL SCIENCES, INFORMATICS ENGINEERING (07T)
Vilnius 2010

The doctoral dissertation was prepared at Vilnius Gediminas Technical University in 2006–2010.
Scientific Supervisor: Prof Dr Habil Antanas ČENYS (Vilnius Gediminas Technical University, Technological Sciences, Informatics Engineering – 07T).
VGTU leidyklos TECHNIKA 1756-M mokslo literatūros knyga (scientific literature book 1756-M of VGTU Press TECHNIKA)
http://leidykla.vgtu.lt
ISBN 978-9955-28-588-5
© VGTU leidykla TECHNIKA, 2010
© Nikolaj Goranin, 2010

Title in Lithuanian: GENETINIŲ ALGORITMŲ TAIKYMAS INFORMACIJOS SAUGOS SISTEMOSE

Abstract
Protecting information infrastructure against malicious software, such as viruses, Internet worms and botnets, is a crucial task. It requires the development and implementation of both reactive measures (patches, removal tools, antivirus software updates) and proactive measures (countermeasure planning in advance). The success of reactive measures depends heavily on reaction time, and that of proactive measures on an understanding of malware evolution trends. Reaction time should be commensurate with the risk level posed by the malware. Currently, risk evaluation is based on expert knowledge, and no automatic evaluation systems exist. Existing malware models concentrate mainly on modelling the consequences of malware epidemics, i.e. forecasting the number of infected computers, simulating malware behaviour, or modelling economic aspects of propagation, and are based only on current malware propagation strategies.
This study presents an innovative genetic algorithm based model for malware risk level evaluation and malware evolution forecasting. The genetic algorithm approach was selected for its efficiency in solving tasks with a large solution space and for its ability to model evolution processes, which suits evolution forecasting for malware, often considered a form of artificial life. The model covers the representation of malware features in a format suitable for genetic algorithms; fitness functions for forecasting the evolution of propagation and survivability strategies of several malware types in friendly and hostile environments; the algorithm's operating conditions; and a genetic algorithm based method for generating the decision trees used for malware risk evaluation.
The correctness of the proposed fitness evaluation criteria is tested on historical data; malware evolution modelling and risk evaluation experiments are performed; and conclusions regarding malware evolution trends are provided. The study also reviews genetic algorithm applications in information security systems, provides a technical analysis of the modern malware types used for evolution modelling, and analyses existing malware models.
It is concluded that the proposed model is easily extensible to other malware types and parameters. It can be used successfully for evolution forecasting and risk evaluation of newly appearing malware samples.

Summary (Reziumė)
Protection against malicious code, such as computer viruses, Internet worms and malicious botnets, is becoming a vitally important task in ensuring the security and stability of information infrastructure. The protective measures applied can be divided into reactive ones, such as updates of software and antivirus databases and the development of removal tools, and proactive ones, that is, those planned before a threat appears.
The effectiveness of these groups of measures depends on reaction time in the first case and on an understanding of malicious code evolution trends in the second. Reaction time should be commensurate with the danger level of newly appearing malicious code, the assessment of which currently relies mainly on expert judgement. Current malicious code modelling tools are intended for forecasting epidemiological or economic consequences and for simulating malicious code behaviour, and they do not assess prospective malicious code strategies.
This dissertation proposes a genetic algorithm based model for malicious code risk evaluation and evolution forecasting. Genetic algorithms were chosen in view of the method's efficiency in solving problems in many fields, including information security, and its ability to model the evolution process, which can also be applied to modelling the evolution of malicious code, often treated as a variety of artificial life. The model description comprises a description of the malicious code representation, fitness functions for forecasting the evolution of different parameters, the algorithm's operating parameters, and a genetic algorithm based method for generating the decision trees used for risk evaluation.
The correctness of the proposed fitness functions is verified against historical data, evolution modelling and risk evaluation experiments are carried out, and conclusions are drawn about malicious code evolution trends.
The dissertation also analyses applications of genetic algorithms in information security systems, presents a technical analysis of several modern types of malicious code whose evolution is modelled, and examines existing malicious code modelling tools.
The work argues that the proposed model is easily extensible and can be applied to modelling malicious code evolution and to evaluating the risk of newly appearing samples.

Notations

Abbreviations
2D – two-dimensional (geometrics);
3D – three-dimensional (geometrics);
AES – advanced encryption standard;
AI – artificial intelligence;
AL – artificial life;
ANN – artificial neural networks;
C&C – command & control (center);
CPU – central processing unit;
CS – classifier systems;
CVE – common vulnerabilities and exposures;
DARPA – The Defense Advanced Research Projects Agency;
DDoS – distributed denial of service (attack);
DES – data encryption standard;
DNA – deoxyribonucleic acid;
DNS – domain name system;
DOS – denial of service;
EA – evolutionary algorithm;
EEG – electroencephalogram;
EER – equal error rate;
ENISA – The European Network and Information Security Agency;
ES – evolutionary strategies;
FDR – Fisher discriminant ratio;
FTP – file transfer protocol;
FVC – Fingerprint Verification Contest;
GA – genetic algorithm;
GP – genetic programming;
HDD – hard disk drive;
HIDS – host-based IDS;
HTTP – hypertext transfer protocol;
ICIGA – improved cryptography inspired by genetic algorithms;
ICMP – Internet control message protocol;
ID – intrusion detection;
IDC – International Data Corporation;
IDS – intrusion detection system;
IIS – Internet Information Services;
IP – Internet protocol;
IPS – intrusion prevention system;
IRC – Internet relay chat;
IT – information technology;
LDA – linear discriminant analysis;
LPR – line printer remote;
MMS – multimedia messaging service;
NIDS – network-based IDS;
NIST – National Institute of Standards and Technology;
NLFFSR – non-linear function feedback shift register;
OS – operating system;
P2P – peer-to-peer;
R2L – Remote2Local;
RCS – The Random Constant Spread;
RSA – Rivest, Shamir and Adleman algorithm;
SIR – susceptible-infected-recovered;
SIS – susceptible-infected-susceptible;
SMS – short message service;
SMTP – simple mail transfer protocol;
SNR – signal-to-noise ratio;
TC – termination condition;
TCP – transmission control protocol;
U2R – User2Root;
UDP – user datagram protocol;
UTC – coordinated universal time;
VEP – visual evoked potential.

Symbols
A – public key;
a(t) – proportion of vulnerable machines that have been compromised at the instant t;
A’ – private key;
aij – adjustable parameter (biometry);
ci – greyscale intensity;
CPU_LOADi – average load of the gene's method on the infected computer's CPU during time tj;
DDF – measured frequency of a digram;
DF – measured frequency of a character in the decoded ciphertext;
F – fitness function that evaluates the malware strategy's efficiency in terms of survivability;
fi – fitness of the i-th individual in the population of solutions;
FSC – fitness function that evaluates the malware strategy's efficiency under pressure of countermeasures;
k – activity level of the malware strategy being evaluated, in terms of either the number of actions performed (propagation forecasting) or the computer resource usage level (survivability forecasting);
K – value calculated by the K(S) function, which evaluates the fitness of strategy S in the sense of propagation speed efficiency;
m – relatively prime number (cryptography) or number of points considered (biometry);
Matched – relevance of the gene in historical data;
N – population size;
NCC – number of Command and Control servers needed for the botnet to remain stable in time interval T;
pi – probability assigned to the specific gene i, describing its effectiveness, or probability of the i-th individual in the population being selected;
ranking – difficulty of intrusion detection;
S – variable representing the malware strategy;