
(12) United States Patent
Niemela et al.
(10) Patent No.: US 8,844,039 B2
(45) Date of Patent: Sep. 23, 2014

(54) MALWARE IMAGE RECOGNITION
(75) Inventors: Jarno Niemela, Espoo (FI); Kimmo Kasslin, Espoo (FI)
(73) Assignee: F-Secure Corporation, Helsinki (FI)
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 677 days.
(21) Appl. No.: 12/803,613
(22) Filed: Jun. 30, 2010
(65) Prior Publication Data: US 2012/0002839 A1, Jan. 5, 2012
(51) Int. Cl.: H04L 29/06 (2006.01); G06K 9/00 (2006.01)
(52) U.S. Cl.: CPC G06K 9/00973 (2013.01); H04L 63/145 (2013.01); USPC 726/24; 713/188
(58) Field of Classification Search: USPC 726/24. See application file for complete search history.
(56) References Cited
U.S. PATENT DOCUMENTS
2003/0229810 A1* 12/2003 Bango ...... 713/201
2004/0123117 A1* 6/2004 Berger ...... 713/188
2004/0210769 A1* 10/2004 Radatti et al. ...... 713/201
2008/0127340 A1* 5/2008 Lee ...... 726/22
2009/0199296 A1* 8/2009 Xie et al. ...... 726/23
* cited by examiner
Primary Examiner: Yogesh Paliwal
(74) Attorney, Agent, or Firm: Harrington & Smith

(57) ABSTRACT
According to a first aspect of the present invention there is provided a method of detecting malware or other potentially unwanted programs. The method includes, at each of a plurality of client terminals, when it is determined that a program may be malware or a potentially unwanted program, generating image recognition data from displayed image data that includes image elements generated by the program, and sending the image recognition data to a central server. At the central server, the method includes storing the received image recognition data and using the stored image recognition data to detect the presence of malware or a potentially unwanted program at the client terminals.

18 Claims, 3 Drawing Sheets
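The abstract leaves the one-way function open; the detailed description names a SIFT or OCR algorithm as options. As a stand-in that needs only the Python standard library, the sketch below uses a difference hash (dHash), a swapped-in perceptual hash that is not SIFT: it reduces a captured grayscale screen region to 64 bits, so the screenshot cannot be rebuilt from the result, while visually similar captures still produce nearby bit strings. The pixel format (rows of 0-255 integers), the 8x8 grid and the Hamming-distance threshold are illustrative assumptions, not values from the patent.

```python
from typing import List

# Assumed input format: a grayscale capture as rows of equal length, values 0-255.
Image = List[List[int]]


def resize_nearest(img: Image, width: int, height: int) -> Image:
    """Downscale with nearest-neighbour sampling (no external imaging library)."""
    src_h, src_w = len(img), len(img[0])
    return [
        [img[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def dhash(img: Image, size: int = 8) -> int:
    """Difference hash: one bit per horizontal neighbour comparison.

    The result has size * size bits and discards almost all pixel data, so it
    acts as a lossy, one-way summary of the captured display data.
    """
    small = resize_nearest(img, size + 1, size)
    bits = 0
    for row in small:
        for x in range(size):
            bits = (bits << 1) | (1 if row[x] > row[x + 1] else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def is_match(a: int, b: int, max_distance: int = 10) -> bool:
    """Hypothetical match rule: hashes within max_distance bits count as a match."""
    return hamming(a, b) <= max_distance


if __name__ == "__main__":
    # A synthetic 90x80 capture whose brightness falls from left to right.
    capture = [[255 - x for x in range(90)] for _ in range(80)]
    brighter = [[min(255, p + 3) for p in row] for row in capture]
    print(hex(dhash(capture)))
    print(is_match(dhash(capture), dhash(brighter)))
```

A slightly brighter copy of the same capture still matches, which is the property the server-side comparison relies on; a SIFT- or OCR-based function would take the place of `dhash` in the same role.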

[Representative drawing: FIG. 2, flow diagram of the client terminal and anti-virus server exchange]

U.S. Patent Sep. 23, 2014 Sheet 1 of 3 US 8,844,039 B2

[FIG. 1: schematic of the system]
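In steps A3 to A5 of the detailed description, a client terminal 1 bundles several items for the anti-virus server 2: the image recognition data, an identifier of the client terminal, a hash of the program file and indexing information. A minimal sketch of that payload follows. It assumes SHA-256 as the hash function (the source only says "a hash function"), assumes the on-screen strings have already been extracted (for example by OCR), and uses hypothetical names such as `Submission` and `build_submission`.

```python
import hashlib
from dataclasses import dataclass
from typing import FrozenSet, Iterable


def program_identifier(program_file: bytes) -> str:
    """Step A4: identify the program by hashing its file (SHA-256 is an assumption)."""
    return hashlib.sha256(program_file).hexdigest()


def indexing_information(display_strings: Iterable[str]) -> FrozenSet[str]:
    """Step A3: key features of the display data; here, normalised on-screen strings."""
    return frozenset(s.strip().lower() for s in display_strings if s.strip())


@dataclass(frozen=True)
class Submission:
    """Step A5: what a client terminal sends to the anti-virus server (hypothetical shape)."""
    client_id: str
    program_hash: str
    image_data: int  # one-way image recognition data, e.g. a perceptual hash
    index: FrozenSet[str]


def build_submission(client_id: str, program_file: bytes,
                     image_data: int, display_strings: Iterable[str]) -> Submission:
    """Assemble the step A5 message from the outputs of steps A2 to A4."""
    return Submission(
        client_id=client_id,
        program_hash=program_identifier(program_file),
        image_data=image_data,
        index=indexing_information(display_strings),
    )


if __name__ == "__main__":
    sub = build_submission("terminal-1", b"fake program bytes", 0xABCD,
                           ["  WARNING: Your PC is infected! ", "Buy license", ""])
    print(sub.program_hash[:16], sorted(sub.index))
```

Only derived values leave the terminal in this sketch: a file hash, a lossy image summary and normalised strings, in line with the privacy concern the description raises about uploading raw screenshots.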

[Drawing Sheet 2 of 3: FIG. 2, flow diagram of the client terminal and anti-virus server exchange, steps A1 to A12]
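The server side of FIG. 2 (steps A6 to A12) can be sketched as an in-memory store. The sketch assumes that indexing information is a set of strings and that two submissions potentially match when their sets share at least one element; the client's comparison and reply (steps A8 and A9) are modelled as a callback. Checking each candidate entry in turn is a simplification, and all names are hypothetical; a real server would persist entries in database 11.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List, Set


@dataclass
class Entry:
    """One database entry: submissions believed to show the same image elements."""
    index: FrozenSet[str]
    image_data: List[int] = field(default_factory=list)
    program_hashes: Set[str] = field(default_factory=set)
    client_ids: Set[str] = field(default_factory=set)

    def add(self, client_id: str, program_hash: str, image_data: int) -> None:
        self.image_data.append(image_data)
        self.program_hashes.add(program_hash)
        self.client_ids.add(client_id)


class AntiVirusServer:
    """Hypothetical in-memory stand-in for database 11 on the anti-virus server 2."""

    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def receive(self, client_id: str, program_hash: str, image_data: int,
                index: FrozenSet[str],
                client_confirms: Callable[[List[int]], bool]) -> Entry:
        # A6: compare received indexing information with stored indexing
        # information (assumed rule: any shared key feature is a potential match).
        for entry in self.entries:
            if entry.index & index:
                # A7-A10: send the stored image recognition data to the client
                # and act on its answer.
                if client_confirms(list(entry.image_data)):
                    # A11: store in association with the matching entry.
                    entry.add(client_id, program_hash, image_data)
                    return entry
        # A12: no index match, or the client rejected every candidate.
        entry = Entry(index=frozenset(index))
        entry.add(client_id, program_hash, image_data)
        self.entries.append(entry)
        return entry
```

Usage: a second terminal that reports the same warning text and confirms the stored image joins the first terminal's entry, which is the server-side grouping the description relies on.

```python
server = AntiVirusServer()
idx = frozenset({"your pc is infected"})
first = server.receive("t1", "h1", 0xAAAA, idx, lambda imgs: False)
second = server.receive("t2", "h2", 0xAAAB, idx, lambda imgs: 0xAAAA in imgs)
print(first is second, sorted(first.program_hashes))
```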

[Drawing Sheet 3 of 3: FIG. 3, flow diagram of steps B1 to B4 (the server determines whether a program is malware, retrieves the identities of the associated client terminals, notifies them of the result, and the client terminals take appropriate action); FIG. 4, alternative flow diagram of steps C1 to C9]
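Steps B1 to B3 of FIG. 3, the server-side grouping and the hash-count heuristic from the detailed description can be sketched together. The sketch assumes each database entry is a plain dict holding the program hashes and client identifiers grouped under one item of image recognition data; the threshold of three distinct hashes is an invented illustrative value, since the source only speaks of "a large number of different hash values".

```python
from typing import Dict, List, Set

# Hypothetical entry layout: one group of programs whose displays matched.
Entry = Dict[str, Set[str]]  # keys: "program_hashes", "client_ids"


def clients_to_notify(entries: List[Entry], program_hash: str) -> Set[str]:
    """B2: identities of client terminals that reported the program's group."""
    clients: Set[str] = set()
    for entry in entries:
        if program_hash in entry["program_hashes"]:
            clients |= entry["client_ids"]
    return clients


def propagate_malware_verdict(entries: List[Entry], program_hash: str) -> Set[str]:
    """Server-side grouping: every program sharing an entry with a program
    classified as malware is treated as malware too."""
    flagged = {program_hash}
    for entry in entries:
        if program_hash in entry["program_hashes"]:
            flagged |= entry["program_hashes"]
    return flagged


def suspicious_by_hash_count(entries: List[Entry], min_hashes: int = 3) -> List[int]:
    """Heuristic: one item of image recognition data linked to many different
    program-file hashes suggests (possibly obfuscated) malware.
    min_hashes is an illustrative threshold, not a value from the source."""
    return [i for i, entry in enumerate(entries)
            if len(entry["program_hashes"]) >= min_hashes]


if __name__ == "__main__":
    db = [
        {"program_hashes": {"h1", "h2", "h3"}, "client_ids": {"t1", "t2"}},
        {"program_hashes": {"h9"}, "client_ids": {"t3"}},
    ]
    # B1: suppose analysis classifies program h2 as malware.
    print(sorted(propagate_malware_verdict(db, "h2")))  # ['h1', 'h2', 'h3']
    print(sorted(clients_to_notify(db, "h2")))          # ['t1', 't2']
    print(suspicious_by_hash_count(db))                 # [0]
```

The heuristic needs no verdict at all: an entry whose image is shared by many distinct program files is flagged on the grouping alone, which is why it can help with malware that obfuscates its binary code.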

MALWARE IMAGE RECOGNITION

TECHNICAL FIELD

The present invention relates to the detection of malware, or other potentially unwanted programs, using image recognition. In particular, the present invention relates to a method of using image recognition data as malware detection information.

BACKGROUND

Malware is short for malicious software and is used as a term to refer to any software designed to infiltrate or damage a computer system without the owner's informed consent. Malware can include computer viruses, worms, trojan horses, […] and […]. In order to prevent problems associated with malware infections, many end users make use of anti-virus software to detect and possibly remove malware. In addition, anti-virus software is often also used to detect other potentially unwanted programs (PUPs). A PUP is a program that may be unwanted even though the user may have consented to download it, often because it was downloaded in conjunction with a program that the user wants. PUPs can include spyware, […], scareware and scamware.

In order to detect a malware or PUP file, the anti-virus software must have some way of identifying it amongst all the other files present on a device. Typically, this requires that the anti-virus software has a database containing the "signatures" or "fingerprints" that are characteristic of individual malware or PUP files. When the supplier of the anti-virus software identifies new malware or a new PUP, the program is analysed and its signature is generated. The malware or PUP is then "known" and its signature can be distributed to end users as updates to their local anti-virus software databases.

Using approaches that rely solely on signature scanning to detect malware still leaves computers vulnerable to "unknown" or "zero day" malware programs/applications that have not yet been analysed for their signature. To address this issue, in addition to scanning for malware or PUP signatures, most anti-virus applications additionally employ heuristic analysis. This approach involves the application of general rules intended to distinguish the behaviour of any malware or PUP from that of clean/legitimate programs. For example, the behaviour of all programs/applications on a PC may be monitored and, if a program/application attempts to write data to an executable file, the anti-virus software can flag this as suspicious behaviour. Heuristics can be based on behaviours such as API calls, attempts to send data over the Internet, etc. However, due to the ever increasing and ever changing nature of malware, these heuristic detection methods are not sufficient to detect all unknown malware.

SUMMARY

It is an object of the present invention to obtain image recognition data associated with malware or other potentially unwanted programs, and to use this image recognition data to detect the presence of malware or other potentially unwanted programs. This is achieved by generating the image recognition data associated with any possible malware or other potentially unwanted programs at client terminals, and sending this image recognition data to a centralised server. Then, when it is determined whether or not a particular program is malware or a potentially unwanted program, this image recognition data can be used to detect the presence of the program at the client terminals.

According to a first aspect of the present invention there is provided a method of detecting malware or other potentially unwanted programs. The method comprises, at each of a plurality of client terminals, when it is determined that a program may be malware or a potentially unwanted program, generating image recognition data from displayed image data that includes image elements generated by the program, and sending the image recognition data to a central server. At the central server, the method comprises storing the received image recognition data and using the stored image recognition data to detect the presence of malware or a potentially unwanted program at the client terminals.

The step of using the stored image recognition data to detect the presence of malware or a potentially unwanted program at the client terminals may comprise, at the central server, upon a determination that a program is malware or a potentially unwanted program, notifying each of the client terminals from which image recognition data associated with the malware program has been received. If so, this step may further comprise, at the central server, if any image recognition data received from a client terminal is determined as potentially matching stored image recognition data associated with the program, sending the image recognition data associated with the program to the client terminal. Alternatively, the step of using the stored image recognition data to detect the presence of malware or a potentially unwanted program at the client terminals may comprise, at the central server, upon a determination that a program is malware or a potentially unwanted program, distributing the image recognition data associated with that program to the plurality of client terminals for use in detecting the program.

The step of generating image recognition data from displayed image data may comprise applying a one-way function to displayed image data that includes image elements generated by the program, such that the displayed image cannot easily be recovered from the image recognition data. The one-way function applied to the displayed image data may comprise any of a Scale-invariant Feature Transform (SIFT) algorithm and an Optical Character Recognition (OCR) algorithm.

The method may further comprise, at each of the client terminals, when it is determined that a program may be malware or a potentially unwanted program, generating an identifier for the program, and sending the program identifier to the central server for storage with the image recognition data. If so, then upon a determination that a program is malware or a potentially unwanted program, the central server may generate an identifier for the program and compare the generated identifier with the stored program identifiers to identify any associated image recognition data. The step of generating an identifier for a program may comprise generating a hash value of the program file.

The method may further comprise, at each of the client terminals, in addition to generating image recognition data, generating indexing information from the displayed image data, the indexing information being sent to the central server for storage with the image recognition data. The initial indexing information may comprise key features extracted from the displayed image data. The indexing information may be extracted from the image recognition data.

The method may then further comprise, at the central server, upon receipt of image recognition data including indexing information, comparing the received indexing information with previously stored indexing information to identify potentially matching image recognition data previously stored at the central server. If the central server does not identify potentially matching image recognition data, the central server may store the received image recognition data individually. Alternatively, if the central server identifies potentially matching image recognition data, the potentially matching image recognition data may be sent to the client terminal, and the client terminal may compare the potentially matching image recognition data to the displayed image data to determine if it is a match.

If the potentially matching image recognition data is a match, then the client terminal may notify the central server, and the central server may store the received image recognition data in association with the previously stored matching image recognition data. Alternatively, if the potentially matching image recognition data is not a match, then the client terminal may notify the central server, and the central server may store the received image recognition data individually.

According to a second aspect of the present invention there is provided a method of operating a server. The method comprises receiving image recognition data from each of a plurality of client terminals, the image recognition data having been generated from displayed image data that includes image elements generated by a program that the client terminal has determined as possibly being malware or a potentially unwanted program, storing the received image recognition data, and using the stored image recognition data to detect the presence of malware or a potentially unwanted program at the client terminals.

The step of using the stored image recognition data to detect the presence of malware or a potentially unwanted program at the client terminals may comprise, upon a determination that a program is malware or a potentially unwanted program, notifying each of the client terminals from which image recognition data associated with that program has been received. Alternatively, the step of using the stored image recognition data to detect the presence of malware or a potentially unwanted program at the client terminals may comprise, upon a determination that a program is malware or a potentially unwanted program, retrieving stored image recognition data associated with the program, and distributing the associated image recognition data to the plurality of client computers for use in detecting the program.

According to a third aspect of the present invention there is provided a computer program comprising computer program code means adapted to perform the following steps:
accept image recognition data received from each of a plurality of client terminals, the image recognition data having been generated from displayed image data that includes image elements generated by a program that the client terminal has determined as possibly being malware or a potentially unwanted program;
implement storage of the received image recognition data; and
use the stored image recognition data to detect the presence of malware or a potentially unwanted program at the client terminals.

The step of using the stored image recognition data to detect the presence of malware or a potentially unwanted program at the client terminals may comprise, upon a determination that a program is malware or a potentially unwanted program, notifying each of the client terminals from which image recognition data associated with that program has been received. Alternatively, the step of using the stored image recognition data to detect the presence of malware or a potentially unwanted program at the client terminals may comprise, upon a determination that a program is malware or a potentially unwanted program, retrieving stored image recognition data associated with the program, and distributing the associated image recognition data to the plurality of client computers for use in detecting the program.

According to a fourth aspect of the present invention there is provided a computer program according to the third aspect embodied on a computer readable medium.

According to a fifth aspect of the present invention there is provided a server for use in detecting malware or potentially unwanted programs at a plurality of client terminals. The server comprises a receiver for receiving image recognition data from the plurality of client terminals, the image recognition data having been generated from displayed image data that includes image elements generated by a program that the client terminal has determined as possibly being malware or a potentially unwanted program, a memory for storing the received image recognition data, and a processor for using the stored image recognition data to detect the presence of malware or a potentially unwanted program at the client terminals. The processor may be further configured to determine if a program is malware or a potentially unwanted program. The server may further comprise a transmitter for, if it is determined that a program is malware or a potentially unwanted program, notifying each of the client terminals from which image recognition data associated with the program has been received. Alternatively, the server may further comprise a transmitter for, if it is determined that a program is malware or a potentially unwanted program, distributing stored image recognition data associated with the program to the plurality of client computers for use in detecting the program.

According to a sixth aspect of the present invention there is provided a method of operating a client terminal. The method comprises, when it is determined that a program may be malware or a potentially unwanted program, generating image recognition data from displayed image data that includes image elements generated by the program, and sending the image recognition data to a central server. The method may further comprise receiving a notification from the central server that the program is malware or a potentially unwanted program. Alternatively, the method may further comprise receiving detection image recognition data from the central server, and using the detection image recognition data to detect the presence of malware or a potentially unwanted program.

The step of generating image recognition data from displayed image data may comprise applying a one-way function to displayed image data that includes image elements generated by the program, such that the displayed image cannot easily be recovered from the image recognition data.

According to a seventh aspect of the present invention there is provided a computer program comprising computer program code means adapted to perform the following steps:
determine that a program may be malware or a potentially unwanted program;
generate image recognition data from displayed image data that includes image elements generated by the program; and
send the image recognition data to a central server.

The steps may further comprise receiving a notification from the central server that the program is malware or a potentially unwanted program. Alternatively, the steps may further comprise receiving detection image recognition data from the central server, and using the detection image recognition data to detect the presence of malware or a potentially unwanted program. The step of generating image recognition data from displayed image data may comprise applying a one-way function to displayed image data that includes image elements generated by the program, such that the displayed image cannot easily be recovered from the image recognition data.

According to an eighth aspect of the present invention there is provided a computer program according to the seventh aspect embodied on a computer readable medium.

According to a ninth aspect of the present invention there is provided a client terminal. The client terminal may comprise a processor for determining if a program may be malware or a potentially unwanted program and, if so, for generating image recognition data from displayed image data that includes image elements generated by the program, and a transmitter for sending the image recognition data to a central server. The client terminal may further comprise a receiver for receiving a notification from the central server that the program is malware. Alternatively, the client terminal may further comprise a receiver for receiving detection image recognition data from the central server, and the processor may be further configured to use the detection image recognition data to detect the presence of malware or a potentially unwanted program. The processor may be further configured to generate image recognition data by applying a one-way function to displayed image data that includes image elements generated by the program, such that the displayed image cannot easily be recovered from the image recognition data.

According to a tenth aspect of the present invention there is provided a method of operating a client terminal. The method comprises receiving image recognition data associated with malware or a potentially unwanted program, using the received image recognition data to determine if a program executed on the client terminal generates image elements that match the image recognition data, and, if so, identifying the program as malware or a potentially unwanted program.

According to an eleventh aspect of the present invention there is provided a computer program comprising computer program code means adapted to perform the following steps:
accept image recognition data associated with malware or a potentially unwanted program;
use the received image recognition data to determine if a program executed on the client terminal generates image elements that match the image recognition data; and
if so, identify the program as malware or a potentially unwanted program.

According to a twelfth aspect of the present invention there is provided a computer program according to the eleventh aspect embodied on a computer readable medium.

According to a thirteenth aspect of the present invention there is provided a client terminal comprising a receiver for receiving image recognition data associated with malware or a potentially unwanted program, and a processor for determining if a program executed on the client terminal generates image elements that match the image recognition data and, if so, for identifying the program as malware or a potentially unwanted program.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates schematically a computer system according to an embodiment of the present invention;
FIG. 2 is a flow diagram illustrating a process of obtaining image recognition data for use in detecting malware or other potentially unwanted programs;
FIG. 3 is a flow diagram illustrating a process implemented when a program is subsequently determined as being either malware or legitimate; and
FIG. 4 is a flow diagram illustrating an alternative process implemented when a program is subsequently determined as being either malware or legitimate.

DETAILED DESCRIPTION

Whilst many forms of malware and other unwanted software are designed to hide any indication of their presence, some malware and PUP programs, such as adware, scamware or scareware programs, are designed to display elements on the graphical user interface (GUI) of a computer system. For example, adware programs usually silently install themselves on a computer device in order to display advertising material to the user. By way of further example, scareware or scamware, such as rogue anti-virus or anti-spyware applications, usually silently install themselves on a computer system before displaying some information to the user. In many cases, scareware programs display hoax messages and warnings that a computer device is infected with some form of malware, and offer to disinfect the device provided that the user purchases a license to the software.

It is therefore possible to detect the presence of some malware and PUP programs by using image recognition to determine when a display element associated with a particular malware or PUP program is displayed on the GUI of a computer system. In order for an anti-virus application to perform this detection using image recognition, a screenshot or screen capture of the display data generated by a malware or PUP program must be obtained and distributed by the supplier of the anti-virus application as a "fingerprint". However, it has been recognised here that this can be difficult to achieve, as many malware and PUP programs are designed to prevent themselves from executing in a virtual or emulated environment.

In order to at least partially overcome the problem described above, there will now be described methods and apparatus for obtaining image recognition data, and for using this image recognition data to detect the presence of malware or another potentially unwanted program, wherein the image recognition data is generated at client terminals from displayed image data and provided to an anti-virus supplier's centralised servers. For the sake of clarity, malware will be used to refer to both malware programs and PUPs.

In addition, it has also been recognised here that privacy issues can prevent, or at the very least make it undesirable for, an anti-virus application to capture screenshots directly from user computer systems for uploading to the anti-virus supplier's centralised servers. To overcome this additional problem, it is also proposed here to make use of a one-way/non-reversible function at the client terminals in order to generate the image recognition data from displayed image data, such that the displayed image cannot easily be recovered from the image recognition data.

FIG. 1 illustrates schematically a system according to an embodiment of the present invention, which comprises a plurality of client terminals 1 connected to a central anti-virus server 2 via a network 3 such as the Internet or a LAN. Each of the client terminals 1 can be implemented as a combination of computer hardware and software. A client terminal 1 comprises a memory 4, a processor 5 and a transceiver 6. The memory 4 stores the various programs/executable files that are implemented by the processor 5, and also provides a storage unit 7 for any required data. The programs/executable files stored in the memory 4, and implemented by the processor 5, include a malware detection unit 8 and an image recognition data generation unit 9. The malware detection unit 8 and image recognition data generation unit 9 can be sub-units of an anti-virus application 10. The transceiver 6 is used to communicate with the central anti-virus server 2 over the network 3. Typically, the client terminals 1 may be any of a desktop personal computer (PC), laptop, personal data assistant (PDA) or mobile phone, or any other suitable device.

The central anti-virus server 2 is typically operated by the provider of the anti-virus application 10 that is run on each of the client terminals 1, and the users of these terminals will usually be subscribers to an update service supplied by the central anti-virus server 2. Alternatively, the central anti-virus server 2 may be that of a network administrator or supervisor, each of the client terminals 1 being part of the network for which the supervisor is responsible. The central anti-virus server 2 comprises a database 11 for storing entries that include image recognition data and associated program identification data, as well as any other malware-related data, and a transceiver 12 for communicating with the client terminals 1 over the network 3. The central anti-virus server 2 can further comprise a memory 13 and a processor 14. The memory 13 can store programs/executable files that can be implemented by the processor 14. The programs/executable files stored in the memory 13, and implemented by the processor 14, can include a malware analysis unit 15.

FIG. 2 is a flow diagram illustrating the process of obtaining image recognition data for use in detecting malware or other potentially unwanted programs. The steps are performed as follows:

A1. The anti-virus application 10 on a user's client terminal 1 determines that a program present on the client terminal 1 is suspicious and may therefore be malware. By way of example, the anti-virus application 10 may identify a program as suspicious if: it determines that the program is new; it does not recognise the program as one that it has previously identified as clean/legitimate; the program generates suspicious image elements (e.g. infection warnings, or image elements already known to be associated with malware) on the display of the client terminal; and the structure of the program file is suspicious.

A2. The anti-virus application 10 will then take one or more screenshots or screen captures whilst the program is executing, in order to capture display data that includes any image elements generated on the display of the client terminal 1 by the program. These image elements can include dialog boxes, pop-up windows, message balloons, etc. The anti-virus application 10 then generates image recognition data from the captured display data. This image recognition data is generated using a one-way function such that it is impossible or impractical to reconstruct the original screenshot from the data, but such that it can still be used to identify any matching images. For example, the image recognition data could be generated by applying a Scale-invariant Feature Transform (SIFT) algorithm to the display data.

A3. The anti-virus application 10 also generates some initial indexing/identification information for the program. This initial indexing information takes the form of some key features, or key points, of the display data that includes the image elements generated by the program. For example, this may be the strings present in the display data. This indexing information could consist of particular components extracted from the image recognition data or could be generated using some separate algorithm, depending upon the algorithm used to generate the image recognition data.

A4. The anti-virus application 10 also generates an identifier for the program by applying a hash function to the program file.

A5. The image recognition data is then sent to the centralised anti-virus server 2, together with an identifier of the client terminal, the hash value of the program file and the indexing information.

A6. The centralised anti-virus server 2 then determines if the received indexing information matches any indexing information already stored in its database. If the received indexing information does not match any of the stored indexing information, then the process proceeds to step A12.

A7. If the received indexing information does match any of the stored indexing information, then the anti-virus server 2 retrieves the stored image recognition data associated with the matching indexing information and sends this to the client terminal.

A8. The client terminal 1 then determines if the display data includes any image elements generated by the program that match the image recognition data received from the central anti-virus server 2.

A9. The client terminal 1 notifies the anti-virus server 2 of the result.

A10. The anti-virus server 2 determines the result from the response received from the client terminal 1. If the client terminal 1 notifies the anti-virus server 2 that the display image generated by the program does not match the stored image recognition data, then the process proceeds to step A12; otherwise it proceeds to step A11.

A11. If the client terminal 1 notifies the anti-virus server 2 that the display image generated by the program does match the previously stored image recognition data, then the anti-virus server 2 stores the image recognition data, the hash value of the program file and the identifier of the client terminal, received from the client terminal 1 in step A5, in association with the already stored indexing information and image recognition data. Therefore, if a client terminal subsequently sends matching indexing information, the anti-virus server 2 will respond with both the previously stored image recognition data that was sent in step A7 and the newly stored image recognition data that has been stored in the same entry.

A12. If the received indexing information does not match any indexing information already stored by the anti-virus server 2, or if the client terminal 1 notifies the anti-virus server 2 that the display image generated by the program does not match the stored image recognition data, then the anti-virus server 2 stores the image recognition data, the hash value of the program file and the identifier of the client terminal, received from the client terminal 1 in step A5, together with the received indexing information as a new individual entry in the database. Therefore, if a client terminal subsequently sends matching indexing information, the anti-virus server 2 will respond with both the previously stored image recognition data that was sent in step A7 and the newly stored image recognition data from the separate database entries.

As described above, if the client terminal 1 indicates that the display data from which the received image recognition data has been generated matches any of the image recognition data received from the anti-virus server 2, then the anti-virus server 2 stores the received image recognition data and program identifier in association with the matching previously stored image recognition data. In doing so, the anti-virus server 2 performs a process of server-side grouping, in which programs that generate the same image elements are grouped into a single set for classification purposes. As such, if any one of the programs within the same set/group is classified as malware, then the anti-virus server 2 can be configured to identify all programs within the set as also being malware.

In addition to sending the image recognition data and program identifiers to the anti-virus server 2, the client terminals 1 could also collect and send details of any actions performed during the installation of the program, in order to obtain the registry paths, files, mutexes, registry keys etc. that may have been created by the program. This information could then be used when disinfecting the client terminals if it is determined that the program is malware. Alternatively, if it is determined that the program is malware, the anti-virus application present on the client terminals could perform a scan to search for any paths, registry keys etc. that contain strings extracted from the display data that included image elements generated by the program.

Furthermore, the combination of image recognition data and associated hash values could be useful as a form of heuristic analysis. If the anti-virus server 2 were to identify a single item of image recognition data as being associated with a large number of different hash values, then this would be an indication that the same image elements have been generated by different program files. Therefore, given that it would be unusual for multiple legitimate programs to generate the same display data, this would be an indication that the associated program files are likely to be malware. This method is particularly useful when attempting to detect malware programs that obfuscate their binary code, as even though the binary code may vary between each occurrence of the malware, each occurrence of the malware program will likely generate image elements that are substantively the same.

There are various one-way functions that could be used to perform the image recognition and comparison steps.

[…] database of the result of the determination. For example, if the program has been identified as malware, it notifies the client terminals of this.

B4. The notified client terminals 1 can then take any appropriate action. For example, if a program is identified as malware, then the anti-virus application at the client terminal can prompt the user to disinfect or quarantine the malware program.

In addition, or as an alternative, to the process outlined above, FIG. 4 is a flow diagram illustrating a possible process implemented when a client terminal 1 sends image recognition data and indexing information to the anti-virus server 2 after the anti-virus server 2 has determined whether or not the program is malware. The steps are performed as follows:

C1. The anti-virus server 2 determines whether or not a particular program relates to malware.

C2. Subsequently, the anti-virus application 10 on a user's client terminal 1 determines that a program present on the client terminal 1 is suspicious and may therefore be malware.

C3. The anti-virus application 10 will then take one or more screenshots or screen captures whilst the program is executing, in order to capture display data that includes any image elements generated on the display of the client terminal 1 by the program. The anti-virus application 10 then generates image recognition data from the captured display data.

C4. The anti-virus application 10 also generates some initial indexing/identification information for the program.

C5. The anti-virus application 10 also generates an identifier for the program by applying a hash function to the program file.

C6. The image recognition data is then sent to the centra
For 35 lised anti-virus server 2, together with an identifier of the example, a SIFT algorithm could be used to generate a “fea client terminal, the hash value of the program file and the ture description from the display data, the description defin indexing information. ing the display image using any interesting points. This C7. The anti-virus server 2 determines that the received description could then be used to determine if any other indexing information matches the indexing information display data contains images with the same interesting points. 40 stored for the program and retrieves the associated Alternatively, Optical Character Recognition (OCR) could be stored image recognition data. used to extract text/strings from display data that includes C8. The stored image recognition data is then sent to the image elements generated by a program of interest. client terminal 1, together with an indication as to This method provides that the supplier of an anti-virus whether or not the program that matches the image rec application can obtain malware image recognition data with 45 ognition data is malware or legitimate, as previously out the need to overcome the difficulty of executing the mal determined by the anti-virus server 2 in step C1. ware program in a virtual environment and withoutbreaching C9. The client terminal 1 then determines that at least some the user privacy. of the image recognition data received from the anti The anti-virus server 2 continues to store image recogni virus server 2 matches the image elements generated by tion data received from clients, as outlined above, until it can 50 the program, and therefore that the program is malware make a determination as to whether or not a program relates or legitimate, as indicated by the anti-virus server 2. to malware. FIG. 
3 is a flow diagram illustrating a possible It will be appreciated by the person of skill in the art that process implemented when a program is Subsequently deter various modifications may be made to the above described mined as being either malware or legitimate. The steps are embodiments without departing from the scope of the present performed as follows: 55 invention. For example, whilst the above-described embodi B1. The anti-virus server 2 determines whether or not a ments make use of a one-way function to generate the image particular program relates to malware (e.g. having per recognition data, this is not essential but is merely preferable formed a full analysis of the program file). in order to provide privacy for the user's of the client termi B2. The anti-virus server 2 retrieves the identities of all of nals. In addition, the above-described embodiments also the client terminals 1 that have previously provided 60 make use of indexing information in order to identify possible matching indexing information or both matching index matching image recognition data. Whilst the use of indexing ing information and image recognition data from the information does improve the performance of the invention, it database. The anti-virus server also retrieves the pro is not essential, as the central anti-virus server could equally gram information (e.g. program file hash etc) provided provide all of the relevant image recognition data to the client by each of the client terminals. 65 terminals. Furthermore, whilst in the above-described B3. The anti-virus server 2 then notifies each of the client embodiments the key features that comprise the indexing terminals whose details have been retrieved from the information are determined at the client terminal, these key US 8,844,039 B2 11 12 features could equally be determined by the anti-virus server the potentially matching image recognition data is sent to the from the received image recognition data. 
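The client-side steps A2 and A3 (repeated as C3 and C4) can be illustrated with a minimal Python sketch. The text gives SIFT as an example one-way function; to stay dependency-free, this sketch swaps in a much simpler technique, a 64-bit average hash ("aHash") of a grayscale screenshot. It discards far too much detail for the screenshot to be rebuilt, yet near-identical images still produce fingerprints that differ in few bits. The sketch assumes the screenshot is already available as a 2-D list of gray values from 0 to 255, and that the visible strings have already been extracted, for example by OCR. The names `average_hash`, `hamming` and `indexing_key` are hypothetical.

```python
import hashlib
from typing import Iterable, Sequence


def average_hash(gray: Sequence[Sequence[int]], size: int = 8) -> int:
    """Lossy one-way fingerprint of a grayscale screenshot (average hash).

    Stand-in for SIFT (assumption): the image is block-averaged down to
    size x size cells, and each cell contributes one bit, 1 if it is
    brighter than the mean of all cells.
    """
    height, width = len(gray), len(gray[0])
    if height < size or width < size:
        raise ValueError("screenshot is smaller than the hash grid")
    cells = []
    for r in range(size):
        for c in range(size):
            r0, r1 = r * height // size, (r + 1) * height // size
            c0, c1 = c * width // size, (c + 1) * width // size
            block = [gray[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; small values mean visually similar images."""
    return bin(a ^ b).count("1")


def indexing_key(strings: Iterable[str]) -> str:
    """Hypothetical indexing information: a digest of the visible strings.

    Strings are trimmed, lower-cased, de-duplicated and sorted so that the
    same dialog text yields the same key regardless of capture order.
    """
    normalized = sorted({s.strip().lower() for s in strings if s.strip()})
    return hashlib.sha256("\n".join(normalized).encode("utf-8")).hexdigest()


# Usage with a synthetic 16x16 horizontal-gradient "screenshot".
screenshot = [[x * 16 for x in range(16)] for _ in range(16)]
tweaked = [row[:] for row in screenshot]
tweaked[0][0] = 255  # change a single pixel
print(hex(average_hash(screenshot)))                          # 0xf0f0f0f0f0f0f0f
print(hamming(average_hash(screenshot), average_hash(tweaked)))  # 0
```

Unlike SIFT key points, an average hash is not robust to cropping or substantial layout changes; it is used here only to show the shape of a lossy but comparable fingerprint alongside a separate, cheaper indexing key.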
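The server-side bookkeeping in steps A11 and A12 can be sketched as follows, assuming an in-memory list in place of the anti-virus server's database. Entries are keyed by indexing information. The client's report of whether the returned image recognition data matched its own display is modelled by a hypothetical `matched_entry` argument: an entry index, or `None` for "no match". `RecognitionStore`, `Entry`, `lookup` and `record` are illustrative names, not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Entry:
    """One database entry: indexing information plus everything grouped under it."""
    indexing: str
    recognition: List[int] = field(default_factory=list)            # image recognition data
    reporters: List[Tuple[str, str]] = field(default_factory=list)  # (file hash, client id)


class RecognitionStore:
    """Hypothetical in-memory stand-in for the anti-virus server's database."""

    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def lookup(self, indexing: str) -> List[int]:
        """Image recognition data from every entry whose indexing information matches."""
        return [r for e in self.entries if e.indexing == indexing for r in e.recognition]

    def record(self, indexing: str, recognition: int, file_hash: str,
               client_id: str, matched_entry: Optional[int]) -> int:
        """Store one client report and return the index of the entry used."""
        if matched_entry is not None:
            # A11: the client confirmed a match, so group the report with that entry.
            entry = self.entries[matched_entry]
            index = matched_entry
        else:
            # A12: no matching indexing information, or the client reported
            # no match, so create a new individual entry.
            entry = Entry(indexing)
            self.entries.append(entry)
            index = len(self.entries) - 1
        entry.recognition.append(recognition)
        entry.reporters.append((file_hash, client_id))
        return index


store = RecognitionStore()
store.record("key-1", 0xAB, "hash-1", "client-1", None)  # first report: new entry
store.record("key-1", 0xAC, "hash-2", "client-2", 0)     # confirmed match: same entry
store.record("key-1", 0xFF, "hash-3", "client-3", None)  # same key, no match: new entry
print(store.lookup("key-1"))  # [171, 172, 255]
```

A later client sending `"key-1"` receives the data from both entries, mirroring the text's statement that the server responds with image recognition data from the separate database entries.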
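The hash-count heuristic described above can be expressed in a few lines. This sketch treats image recognition data as exactly comparable values (a real system would match them approximately). It also uses an arbitrary threshold in place of the text's "large number" of different hash values. `flag_shared_images` and `min_distinct_hashes` are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from collections.abc import Hashable
from typing import Dict, Iterable, List, Set, Tuple


def flag_shared_images(
    reports: Iterable[Tuple[Hashable, str]], min_distinct_hashes: int = 3
) -> List[Hashable]:
    """Return image recognition data reported with many different file hashes.

    reports holds (image recognition data, program file hash) pairs as sent by
    clients. The same display data produced by many distinct binaries suggests
    malware whose binary code varies between occurrences. The threshold is an
    assumption, not a value from the text.
    """
    hashes_per_image: Dict[Hashable, Set[str]] = defaultdict(set)
    for image_data, file_hash in reports:
        hashes_per_image[image_data].add(file_hash)
    return [image for image, hashes in hashes_per_image.items()
            if len(hashes) >= min_distinct_hashes]


reports = [
    ("fake-av-dialog", "h1"),
    ("fake-av-dialog", "h2"),
    ("installer-eula", "h9"),
    ("fake-av-dialog", "h3"),
    ("installer-eula", "h9"),  # the same binary reported twice counts once
]
print(flag_shared_images(reports))  # ['fake-av-dialog']
```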
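Steps B2 and B3, combined with the server-side grouping described earlier, might look like the following sketch. It assumes each group already records the file hashes and client identifiers reported for it. It implements only the malware case the text describes, where a malware verdict for one program is applied to every program in its group. `Group` and `mark_group_malicious` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Group:
    """Programs grouped server-side because they produced the same image elements."""
    file_hashes: Set[str] = field(default_factory=set)
    clients: Set[str] = field(default_factory=set)


def mark_group_malicious(groups: List[Group], file_hash: str,
                         verdicts: Dict[str, str]) -> Dict[str, Set[str]]:
    """B1-B3 sketch (hypothetical): propagate a malware verdict, build notifications.

    Every program sharing a group with file_hash is marked as malware in
    verdicts. The returned mapping tells each affected client which program
    hashes to act on, e.g. to disinfect or quarantine them in step B4.
    """
    notifications: Dict[str, Set[str]] = {}
    for group in groups:
        if file_hash not in group.file_hashes:
            continue
        for member in group.file_hashes:
            verdicts[member] = "malware"
        for client in group.clients:
            notifications.setdefault(client, set()).update(group.file_hashes)
    return notifications


groups = [Group({"h1", "h2"}, {"c1", "c2"}), Group({"h9"}, {"c3"})]
verdicts: Dict[str, str] = {}
notify = mark_group_malicious(groups, "h2", verdicts)
print(sorted(verdicts))  # ['h1', 'h2']
print(sorted(notify))    # ['c1', 'c2']
```

Client `c3`, whose program produced different image elements, is left untouched, which is the point of grouping by display data rather than by file hash alone.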
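On the client side, step C9 amounts to comparing locally generated image recognition data against the candidates returned in step C8 and adopting the server's verdict for the closest sufficiently similar one. This sketch assumes the image recognition data are integer perceptual hashes compared by Hamming distance (a stand-in for SIFT feature matching), with an arbitrary distance threshold. `adopt_server_verdict` and `max_distance` are hypothetical.

```python
from typing import List, Optional, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def adopt_server_verdict(local: int, candidates: List[Tuple[int, str]],
                         max_distance: int = 5) -> Optional[str]:
    """Return the verdict attached to the closest matching candidate, or None.

    candidates holds (image recognition data, verdict) pairs as received in
    step C8 (assumed format). None means no candidate matched the program's
    own display data.
    """
    best: Optional[Tuple[int, str]] = None
    for fingerprint, verdict in candidates:
        distance = hamming(local, fingerprint)
        if distance <= max_distance and (best is None or distance < best[0]):
            best = (distance, verdict)
    return None if best is None else best[1]


local_fp = 0b1111_0000
received = [(0b1111_0001, "malware"), (0b0000_1111, "legitimate")]
print(adopt_server_verdict(local_fp, received, max_distance=2))  # malware
```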
The invention claimed is:

1. A method of detecting malware or potentially unwanted programs, the method comprising:
at each of a plurality of client terminals, when it is determined that a program may be malware or a potentially unwanted program, generating image recognition data from displayed image data that includes image elements generated by the program, and sending the image recognition data to a central server,
at the central server, storing the received image recognition data, and using the stored image recognition data to detect a presence of a malware or potentially unwanted program at the plurality of client terminals; and
at each of the plurality client terminals, in addition to generating image recognition data, generating indexing information from the displayed image data, the indexing information being sent to the central server for storage with the image recognition data.

2. A method as claimed in claim 1, wherein the step of using the stored image recognition data to detect the presence of a malware or potentially unwanted program at the client terminals comprises:
at the central server, upon a determination that a program is malware or a potentially unwanted program, notifying each of the client terminals from which image recognition data associated with the program has been received.

3. A method as claimed in claim 1, wherein the step of using the stored image recognition data to detect the presence of a malware or potentially unwanted program at the client terminals comprises:
at the central server, upon a determination that a program is malware or a potentially unwanted program, distributing the image recognition data associated with that program to the plurality of client terminals for use in detecting the program.

4. A method as claimed in claim 1, wherein the step of generating image recognition from displayed image data comprises:
applying a one-way function to displayed image data that includes image elements generated by the program, such that the displayed image cannot easily be recovered from the image recognition data.

5. A method as claimed in claim 1, and further comprising:
at each of the client terminals, when it is determined that a program may be malware or a potentially unwanted program, generating an identifier for the program, and sending the program identifier to the central server for storage with the image recognition data.

6. A method as claimed in claim 5, wherein, upon a determination that a program is malware or a potentially unwanted program, the central server generates an identifier for the program, and compares the identifier with the stored program identifiers to identify any associated image recognition data.

7. A method as claimed in claim 1, and further comprising:
at the central server, upon receipt of image recognition data including indexing information, comparing the received index information with previously stored index information to identify potentially matching image recognition data previously stored at the central server.

8. A method as claimed in claim 7, wherein, if the central server does not identify potentially matching image recognition data, the central server stores the received image recognition data individually.

9. A method as claimed in claim 7, wherein, if the central server identifies potentially matching image recognition data, the potentially matching image recognition data is sent to the client terminal, and the client terminal compares the potentially matching image recognition data to the displayed image data to determine if it is a match.

10. A method as claimed in claim 9, wherein, if the potentially matching image recognition data is a match, the client terminal notifies the central server, and the central server stores the received image recognition data in association with the previously stored matching image recognition data.

11. A method as claimed in claim 9, wherein, if the potentially matching image recognition data is not a match, the client terminal notifies the central server, and the central server stores the received image recognition data individually.

12. A non-transitory computer storage medium having stored thereon a computer program comprising computer program code means adapted to perform the following steps:
accept image recognition data and indexing data received from each of a plurality of client terminals, the image recognition data and the indexing data having been generated from displayed image data that includes image elements generated by a program that a client terminal has determined as possibly being malware or a potentially unwanted program;
implement storage of the received image recognition data and the indexing data; and
use the stored image recognition data to detect a presence of a malware or potentially unwanted program at the plurality of client terminals.

13. A non-transitory computer storage medium having stored thereon a computer program as claimed in claim 12, wherein the step of using the stored image recognition data to detect the presence of a malware or potentially unwanted program at the client terminals comprises:
upon a determination that a program is malware or a potentially unwanted program, notifying each of the client terminals from which image recognition data associated with that program has been received.

14. A non-transitory computer storage medium having stored thereon a computer program as claimed in claim 12, wherein the step of using the stored image recognition data to detect the presence of a malware or potentially unwanted program at the client terminals comprises:
upon a determination that a program is malware or a potentially unwanted program, retrieving stored image recognition data associated with the program, and distributing the associated image recognition data to the plurality of client computers for use in detecting the program.

15. A non-transitory computer storage medium having stored thereon a computer program comprising computer program code means adapted to perform the following steps:
determine that a program may be malware or a potentially unwanted program;
generate image recognition data from displayed image data that includes image elements generated by the program;
generate indexing information from the displayed image data; and
send the image recognition data and the indexing information to a central server.

16. A non-transitory computer storage medium having stored thereon a computer program as claimed in claim 15, wherein the steps further comprise:
receiving a notification from the central server that the program is malware or a potentially unwanted program.

17. A non-transitory computer storage medium having stored thereon a computer program as claimed in claim 15, wherein the steps further comprise:
receiving detection image recognition data from the central server; and
using the detection image recognition data to detect the presence of a malware or potentially unwanted program.

18. A non-transitory computer storage medium having stored thereon a computer program as claimed in claim 15, wherein the step of generating image recognition from displayed image data comprises:
applying a one-way function to displayed image data that includes image elements generated by the program, such that the displayed image cannot easily be recovered from the image recognition data.