US008161548B1

(12) United States Patent (10) Patent No.: US 8,161,548 B1 Wan (45) Date of Patent: Apr. 17, 2012

(54) MALWARE DETECTION USING PATTERN CLASSIFICATION

(75) Inventor: Justin Wan, Nanjing (CN)

(73) Assignee: Trend Micro, Inc., Tokyo (JP)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by ... days.

(21) Appl. No.: 11/204,567

(22) Filed: Aug. 15, 2005

(51) Int. Cl. G06F 21/00 (2006.01)

(52) U.S. Cl. ...... 726/22; 726/23; 726/24; 726/25; 713/188

(58) Field of Classification Search ... See application file for complete search history.

Primary Examiner: Vivek Srivastava
Assistant Examiner: Thong Truong
(74) Attorney, Agent, or Firm: Beyer Law Group LLP

(57) ABSTRACT

A malware classifier uses features of suspect software to classify the software as malicious or not. The classifier uses a pattern classification algorithm to statistically analyze computer software. The classifier takes a feature representation of the software and maps it to the classification label with the use of a trained model. The feature representation of the input computer software includes the relevant features and the values of each feature. These features include the categories of: applicable software characteristics of a particular type of malware; dynamic link library (DLL) and function name strings typically occurring in the body of the malware; and other alphanumeric strings commonly found in malware. By providing these features and their values to the classifier, the classifier is better able to identify a particular type of malware.

22 Claims, 16 Drawing Sheets

[Front-page drawing: flow chart "Classification of Software": Load Feature Definition File (504), Load Function Definition, Obtain Suspect Software, Extract Features, Run Classification Algorithm, Output Classification Label]


U.S. Patent Apr. 17, 2012 Sheet 2 of 16 US 8,161,548 B1

File Header Example (210)

MS-DOS MZ Header
MS-DOS Stub Program
PE File Signature
PE File Header (Machine, NumberOfSections, ...)
PE File Optional Header (SizeOfCode, SizeOfImage, DataDirectory, ...)
Section Headers

Callouts: DataDirectory, VirtualAddress

FIG. 2
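The header layout of FIG. 2 is the source of several numeric software characteristics used as features. The following is a minimal sketch, assuming the standard PE/COFF layout (the DOS header stores the offset of the PE signature at byte 0x3C, and SizeOfImage sits at offset 56 of the optional header), of how the fields named in the figure might be read. The function name `read_pe_fields` is illustrative and does not come from the patent.

```python
import struct

def read_pe_fields(data: bytes) -> dict:
    """Read a few of the header fields named in FIG. 2 from raw file bytes."""
    if data[:2] != b"MZ":
        raise ValueError("missing MS-DOS MZ header")
    # e_lfanew: offset of the PE signature, stored at 0x3C in the DOS header.
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_off:pe_off + 4] != b"PE\0\0":
        raise ValueError("missing PE file signature")
    # The PE file header (20 bytes) follows the 4-byte signature.
    machine, n_sections = struct.unpack_from("<HH", data, pe_off + 4)
    # The optional header starts right after the file header.
    opt = pe_off + 4 + 20
    major_linker = data[opt + 2]
    (size_of_code,) = struct.unpack_from("<I", data, opt + 4)
    # Same offset for SizeOfImage in both PE32 and PE32+ optional headers.
    (size_of_image,) = struct.unpack_from("<I", data, opt + 56)
    return {
        "Machine": machine,
        "NumberOfSections": n_sections,
        "MajorLinkerVersion": major_linker,
        "SizeOfCode": size_of_code,
        "SizeOfImage": size_of_image,
    }
```

A caller would typically pass the first few hundred bytes of the suspect file; malformed or truncated files raise an error rather than yielding partial values.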

Function Name String (262)    Feature Value (264)

Count AutoStart Keys          Number of auto-start registry keys found in the body of the software.
Count Binding Keys            Number of file binding registry keys found in the body of the software.
Count EXE Files               Number of strings ending with ".exe" found in the body of the software.
Call Socket Connect           1 or 0. Whether the software calls Connect.
Call CreateFile               1 or 0. Whether the software calls CreateFile.
Call CopyFile                 1 or 0. Whether the software calls CopyFile.
Call DeleteFile               1 or 0. Whether the software calls DeleteFile.
Call GetWindowsDirectory      1 or 0. Whether the software calls GetWindowsDirectory.
Call MAPISendMail             1 or 0. Whether the software calls MAPISendMail.
Call Outlook                  1 or 0. Whether the software calls Outlook.
Call OutlookExpress           1 or 0. Whether the software calls OutlookExpress.
Call Word                     1 or 0. Whether the software calls Word.
Count HTML Tags               Number of HTML tags in the body of the software.
Count Kazza                   Number of strings with "Kazza" in it.
Count MSN                     Number of strings with "MSN Messenger" in it.
Count AOL                     Number of strings with "AOL" in it.
Count Crack                   Number of strings with "Crack" in it.

FIG. 3 Function Names as Features (260)
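The count and call features of FIG. 3 can be approximated by scanning the printable strings in the body of the software. The sketch below is one possible reading, not the patent's implementation: it assumes that "calls X" can be approximated by the API name appearing as a string in the file, and the helper names `printable_strings` and `string_features` are hypothetical.

```python
import re

# Subset of the FIG. 3 features; API presence is approximated by a string match.
API_NAMES = ("CreateFile", "CopyFile", "DeleteFile",
             "GetWindowsDirectory", "MAPISendMail")
KEYWORDS = {"Count Kazza": "Kazza", "Count MSN": "MSN Messenger",
            "Count AOL": "AOL", "Count Crack": "Crack"}

_PRINTABLE = re.compile(rb"[\x20-\x7e]{3,}")

def printable_strings(data: bytes) -> list:
    """Return runs of at least three printable ASCII characters."""
    return [m.decode("ascii") for m in _PRINTABLE.findall(data)]

def string_features(data: bytes) -> dict:
    """Compute a few FIG. 3 style features from the body of a file."""
    strings = printable_strings(data)
    feats = {"Count EXE Files":
             sum(s.lower().endswith(".exe") for s in strings)}
    for api in API_NAMES:
        # 1 if any string contains the API name (e.g. "CreateFileA"), else 0.
        feats["Call " + api] = int(any(api in s for s in strings))
    for name, keyword in KEYWORDS.items():
        feats[name] = sum(keyword in s for s in strings)
    return feats
```

A production extractor would more likely read the import table than match raw strings, since packed or obfuscated files may hide API names; the string match is kept here only because it is short and mirrors the wording of the figure.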

Worm Features Example (300)

Software characteristics (310):
  Major Linker Version
  Major Linker Version
  SizeOfImage                          217762
  SizeOfCode/SizeOfImage               0.13
  SizeOfInitializedCode/SizeOfImage    0.79
  ImportTableSize/SizeOfImage          0.00028
  ResourceSize/SizeOfImage             0.75
  Entry Point Location                 Third Section

DLL name strings (320):
  User32.dll, Ws2_32.dll, Comctl32.dll, Advapi32.dll, ntdll.dll

Other strings:
  TFTP Strings
  Game Names
  Software\Microsoft\Windows\CurrentVersion

FIG. 4 Worm Features Example
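Several of the FIG. 4 characteristics are expressed as a fraction of SizeOfImage, which makes files of different sizes comparable. A minimal sketch of that normalization follows; the function name `ratio_features` is hypothetical, and the SizeOfCode value used in the usage note is an assumed number chosen only so that the ratio rounds to the 0.13 shown in the figure.

```python
def ratio_features(sizes: dict) -> dict:
    """Divide each raw size by SizeOfImage to get size-independent features."""
    image = sizes["SizeOfImage"]
    if image <= 0:
        raise ValueError("SizeOfImage must be positive")
    return {
        name + "/SizeOfImage": value / image
        for name, value in sizes.items()
        if name != "SizeOfImage"
    }
```

For example, `ratio_features({"SizeOfImage": 217762, "SizeOfCode": 28309})` gives a SizeOfCode/SizeOfImage value that rounds to 0.13.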


Flow: Classification of Software

504 Load Feature Definition File
508 Load Function Definition
512 Obtain Suspect Software
516 Extract Features
520 Run Classification Algorithm
524 Output Classification Label
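The classification flow (steps 504 to 524) can be sketched as a single function. This is an illustration under stated assumptions, not the patented method: a linear decision function stands in for the pattern classification algorithm, the feature definitions are modeled as a mapping from feature name to an extraction function, and the names `classify`, `feature_defs` and the layout of `model` are all hypothetical.

```python
from typing import Callable

def classify(sample: bytes,
             feature_defs: dict[str, Callable[[bytes], float]],
             model: dict) -> str:
    """Map suspect software to a classification label using a trained model."""
    # Steps 504/508: feature_defs pairs each feature name with its function.
    names = sorted(feature_defs)
    # Step 516: extract one value per defined feature, in a fixed order.
    values = [float(feature_defs[name](sample)) for name in names]
    # Step 520: linear score (an assumed stand-in for the real algorithm).
    score = model["bias"] + sum(
        model["weights"][name] * value for name, value in zip(names, values)
    )
    # Step 524: output the classification label.
    return model["positive_label"] if score > 0 else model["negative_label"]
```

Keeping the feature definitions separate from the model mirrors the flow chart: the same extraction code can serve several trained models, one per type of malware.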


Flow: Produce Trained Model

604 Determine Classification Labels
608 Select Features and Add to Feature Definition File
612 Collect Training Samples
616 Select Parameters
620 Run Training Application
624 Output Trained Model
628 Output Measurement Results
632 Validate Results

FIG. 9
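The training flow of FIG. 9 takes labeled samples and parameters, runs a training application, and outputs a trained model plus measurement results. The sketch below substitutes a plain perceptron for the training application, purely as an assumption for illustration; the text shown here does not say which algorithm is used. The names `train_perceptron` and `accuracy` are hypothetical, and labels are encoded as +1 (malware) and -1 (benign).

```python
def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Steps 616-624: fit weights and bias from feature vectors and labels."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = bias + sum(w * v for w, v in zip(weights, x))
            if y * score <= 0:  # misclassified: move the boundary toward x
                weights = [w + lr * y * v for w, v in zip(weights, x)]
                bias += lr * y
    return weights, bias

def accuracy(weights, bias, samples, labels):
    """Step 628: fraction of samples whose predicted label matches."""
    correct = 0
    for x, y in zip(samples, labels):
        score = bias + sum(w * v for w, v in zip(weights, x))
        predicted = 1 if score > 0 else -1
        correct += int(predicted == y)
    return correct / len(samples)
```

Validation (step 632) would normally measure accuracy on samples held out from training, since accuracy on the training set alone tends to overstate how well the model generalizes.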

FIG. 10A (704)

FIG. 10B (708, 712)

[Reference numerals 716, 720, 724, 728. Legible entries:]

<feature name="match/MAPISendMail"/>
<feature name="match/WNetAddConnection2"/>

FIG. 10C
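The legible entries of the feature definition file are XML elements whose name attribute carries a prefix and a string, such as match/MAPISendMail. A minimal sketch of loading such a file follows. It assumes that the entries are wrapped in a feature-set element and that the prefix before the first slash names the kind of feature; both are inferred from the surviving fragments rather than confirmed by the text, and `load_feature_names` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def load_feature_names(xml_text: str) -> dict:
    """Group feature strings by the prefix before the first '/' in name."""
    root = ET.fromstring(xml_text)
    grouped = {}
    for element in root.iter("feature"):
        kind, _, value = element.get("name", "").partition("/")
        grouped.setdefault(kind, []).append(value)
    return grouped
```

Splitting only on the first slash keeps values that themselves contain separators, such as registry paths with backslashes, intact.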

[Reference numerals 732, 736, 740, 744, 748, 752, 756, 760, 764, 768. Legible entries:]

</feature-set>
<feature name="match/Westwood\Red Alert"/>

FIG. 10E


<feature name="api/CreateSolidBrush"/>
<feature name="api/GetDeviceCaps"/>
<feature name="api/FindResourceA"/>
<feature name="api/HeapSize"/>
<feature name="api/CreateCompatibleDC"/>
<feature name="api/GetEnvironmentVariableA"/>
<feature name="api/InvalidateRect"/>
<feature name="api/LoadResource"/>
<feature name="api/FillRect"/>
<feature name="api/SetLastError"/>
<feature name="api/IsBadReadPtr"/>
<feature name="api/GetParent"/>
<feature name="api/DialogBoxParamA"/>
<feature name="api/CreateMutexA"/>
<feature name="api/SystemParametersInfoA"/>
<feature name="api/SetUnhandledExceptionFilter"/>
<feature name="api/CompareStringA"/>
<feature name="api/PeekMessageA"/>
<feature name="api/InternetCloseHandle"/>
<feature name="api/IsBadWritePtr"/>
<feature name="api/RemoveDirectoryA"/>
<feature name="api/GetShortPathNameA"/>
<feature name="api/CreateFontA"/>

Dialer Feature Definition File Example

As previously mentioned, the present invention may be used to detect a wide variety of types of malware. Listed below is an example feature definition file for detecting dialer malware. One of skill in the art, upon a reading of the specification and the examples contained herein, would be able to use the invention to detect a variety of other types of malware.

... Tunisia, Thailand, Taiwan, Syria, Switzerland, Sweden, Spain, South Africa, Slovenia, Slovak Republic, Singapore, Serbia, Saudi Arabia, Russia, Romania, Qatar, Portugal, Poland, Philippines, Paraguay, ...

<feature name="match/CurrentVersion\Run"/>
<feature name="match/CurrentVersion\RunOnce"/>
<feature name="match/CurrentVersion\RunServices"/>
<feature name="match/CurrentVersion\RunServicesOnce"/>
<feature name="match/txtfile\shell\open\command"/>

... that includes second features relevant to the classifica ...

... benign; an exploit, a root kit, key logger software, a dialer or URL injection software.

3. A method as recited in claim 1 wherein said type of malware is a worm, spyware or a dialer.

4. A method as recited in claim 1 wherein said character ...

... outputting said classification label for said previously un ...