(12) Patent Application Publication (10) Pub. No.: US 2011/0258201 A1 LEVOW et al.
(19) United States
(12) Patent Application Publication
(10) Pub. No.: US 2011/0258201 A1
LEVOW et al.
(43) Pub. Date: Oct. 20, 2011
(54) MULTILEVEL INTENT ANALYSIS APPARATUS & METHOD FOR EMAIL FILTRATION
(75) Inventors: ZACHARY LEVOW, MOUNTAIN VIEW, CA (US); DEAN DRAKO, LOS ALTOS, CA (US); SHAWN ANDERSON, VANCOUVER, WA (US)
(73) Assignee: BARRACUDA INC., CAMPBELL, CA (US)
(21) Appl. No.: 13/175,812
(22) Filed: Jul. 1, 2011
Related U.S. Application Data
(62) Division of application No. 12/128,286, filed on May 28, 2008.
Publication Classification
(51) Int. Cl. G06F 7/30 (2006.01)
(52) U.S. Cl. 707/748; 707/755; 707/769; 707/E17.005; 707/E17.014
(57) ABSTRACT
A method for filtering email which contains links to uniform resource identifiers which disguise the content and identity of spam sites by multiple serial redirection.

FIG. 1 (Sheet 1 of 8): Block diagram of a spammer's redirection creation system producing redirecting URIs 151-15n; emails to recipients carry links to redirecting URIs, which chain to spam website 160.

FIG. 2 (Sheet 2 of 8): Flowchart. 210 Analyze an electronic document (email) for an embedded uniform resource identifier; 220 Follow the URI to the resource and obtain data from the final resource; 230 Analyze content for category indicators (spam), e.g. image recognition, status codes, patterns; 240 Extract website from URI; 250 Determine category of website; 260 Store website in database of categories.
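The FIG. 2 flow (steps 210-260) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the applicants' implementation: network fetches are replaced by a hypothetical in-memory table FAKE_WEB, and SPAM_PATTERNS, follow_to_final, categorize and learn_from_document are illustrative names, with a simple keyword test standing in for the richer indicators of step 230. What it shows is that the website of the embedded URI is stored with the category of the content that URI ultimately leads to.

```python
from urllib.parse import urlsplit

# Hypothetical offline stand-in for the network: each URI maps either to a
# redirect ("redirect", target_uri) or to final content ("page", text).
FAKE_WEB = {
    "http://nice.com/x": ("redirect", "http://hop.example/r"),
    "http://hop.example/r": ("redirect", "http://bad.com/offer"),
    "http://bad.com/offer": ("page", "Buy male enhancement pills now"),
    "http://unix.com/man": ("page", "Manual page for grep"),
}

# Hypothetical category indicators for step 230; FIG. 2 also mentions image
# recognition and status codes, which this sketch does not model.
SPAM_PATTERNS = ("enhancement", "pills")


def website_of(uri: str) -> str:
    """Step 240: reduce a URI to its website (lower-cased host name)."""
    return urlsplit(uri).hostname or ""


def follow_to_final(uri: str, web: dict, max_hops: int = 10):
    """Step 220: follow redirects to the final data, or None if unreachable."""
    for _ in range(max_hops):
        kind, value = web.get(uri, ("missing", None))
        if kind != "redirect":
            return value  # page text, or None for a missing resource
        uri = value
    return None  # chain too long, or a redirect loop


def categorize(text: str) -> str:
    """Steps 230 and 250: derive a category from the final content."""
    lowered = text.lower()
    return "spam" if any(p in lowered for p in SPAM_PATTERNS) else "unknown"


def learn_from_document(uris, web: dict, category_db: dict) -> None:
    """Steps 210-260: label each embedded URI's website by where it leads."""
    for uri in uris:
        final_text = follow_to_final(uri, web)
        if final_text is None:
            continue
        # Step 260: store the *embedded* website, so a redirector is
        # categorized by the content it ultimately delivers.
        category_db[website_of(uri)] = categorize(final_text)
```

For example, learning from a document that links to http://nice.com/x and http://unix.com/man leaves the database as {"nice.com": "spam", "unix.com": "unknown"}: the redirector is labeled spam because its chain ends at the spam content on bad.com.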
FIG. 3 (Sheet 3 of 8): Flowchart. 310 Analyze an electronic document to extract an embedded uniform resource identifier; 320 Extract website from URI; 330 Match website with a database of categorized websites; 340 On no match, fetch data at the URI location until final data is reached; continue for all further URIs in the document; take action; done.

FIG. 4 (Sheet 4 of 8): Database 410 (SEX.COM: PORN; BAD.COM: SPAM; UNIX.COM: TECH) referenced by conventional analyzer 420 to produce a result; NICE.COM redirects to BAD.COM.

FIG. 5 (Sheet 5 of 8): Block diagram: computer-readable storage medium, computer-readable storage medium reader, processor(s), input devices, communications, working memory, operating system, matching engine, other program code.

FIG. 6 (Sheet 6 of 8): Flowchart. Receive document 510; select link(s) 520; follow link 530; spam DB 540; on yes, set grade 560; transfer document.

FIG. 7 (Sheet 7 of 8): Flowchart of FIG. 6 with a follow DB consulted when selecting link(s) 520.

FIG. 8 (Sheet 8 of 8): Flowchart of FIG. 7 with a tidy DB and a tidy-website decision 532 consulted when following links 530.

MULTILEVEL INTENT ANALYSIS APPARATUS & METHOD FOR EMAIL FILTRATION

RELATED APPLICATION

[0001] This application is a division of Ser. No. 12/128,286, filed May 28, 2008, MULTILEVEL INTENT ANALYSIS METHOD FOR EMAIL FILTRATION, which was restricted, and claims the priority of its parent application, which is fully incorporated by reference. The parent application issued on as U.S. Pat. No.

BACKGROUND

[0002] Unsolicited bulk email messages, commonly called spam, are nearly free for the sender to send, and they are being sent in large and growing volumes. They are expensive to the receivers in wasted resources, fraud, and lost productivity.

[0003] Conventional methods provide for filtering spam either at the desktop or at a mail server. It is well known to those skilled in the art to examine subject lines and message content for certain keywords to determine that an email is likely to be spam. As an example, words for male sexual enhancement products are generally reliable indicators of one type of spam. This conventional process is called content filtering.

[0004] To counteract the effectiveness of content filters, spammers have sent specious messages that link to a website which delivers the message in text, images, or an audio-visual presentation. Of course, those skilled in the art can incorporate the uniform resource identifier (uri) of the spam website into content or similar filters.

[0005] To avoid email content filters, the embedded link may be changed quickly by automatically creating a large and dynamic number of new redirection websites whose purpose is to redirect the user to the spam website. These redirecting websites may be created and abandoned faster than conventional content filters can be updated to match them. It also becomes impractical to operate a content filter when automation makes the number of websites that need to be filtered large.

[0006] Sender Identity Obfuscation. Conventional source reputation techniques have been used to combat spammers by profiling the sender's history. This enables a spam filter to block spam efficiently by doing a simple database lookup on the source. Spammers have resorted to obfuscating their identities more systematically to avoid this. Sender identity obfuscation may result from spammers taking control of networks of computers infected with a form of malware to create a botnet. The spammer's control over other computers allows him to send email from diverse sources throughout the Internet. In doing so the spammer effectively hides his own identity from conventional source reputation checks that profile sender network addresses. Just as botnets have enabled spammers to send from many sender IP addresses, inexpensive domain registrations and free redirection sites have enabled spammers to create new domain identities quickly and inexpensively. By redirecting to spam websites through reputable

[0007] Referring now to FIG. 1, a block diagram illustrates a plurality of websites containing redirecting uniform resource identifiers (uri) 151-15n created by the spammer to disguise the location of the spam website 160. The emails 131-132 contain at least one link to a redirecting website, which makes a conventional content filter ineffective at blocking spam because the embedded uri is changed quickly in subsequently transmitted email. The spammer is able to rapidly create new redirecting uri's so that new emails contain links to websites not known to be related to spam. Thus the spam website uri 160 is effectively hidden from conventional content filters operating only on the email itself. To appreciate the limitations of conventional analyzers, consider the illustration in FIG. 4. A database 410 contains text strings which the conventional analyzer 420 references in searching documents. In an embodiment, these strings may be a website or a uniform resource identifier. A string not found in the database results in "No Match". However, in the example, NICE.COM turns out to link to a redirect document which in turn links to a known spam website, BAD.COM, which is not discoverable by conventional filtering. Redirection defeats conventional content filtering.

[0008] Thus it can be appreciated that what is needed is a method for determining that an email is actually spam although the embedded uniform resource identifier within the email only references a redirecting resource which is not within a database of spam websites.

[0009] The present invention accesses uri's by following redirection directives and comparing uri's with a database of spam uri's. The best mode of the invention adds a number of optimizations to reduce the number of uri's that must be followed. In general the method has the steps of analyzing and grading documents such as email.

[0010] The objective of the invention is to set a grade value for a document such as email, the grade value comprising one selected from the following: a numerical value, a letter, match, no-match, category, string, pass, and fail. The invention may itself operate on the document or simply mark it for another tool to operate on the document. These operations include but are not limited to causing the document to be blocked, deleted, diverted to a spam mailbox, marked with warning messages, sterilized, quarantined, or modified, depending on the grade value; else, passing it on to one of an addressee, a user agent, a mail server, a gateway, and another filter, or doing nothing to it.

[0011] The best mode embodiment, illustrated in FIG. 8, includes using a database to assist in the following steps: selecting links, following links, and matching links.

[0012] Rather than selecting and analyzing all of the links that may be embedded in a document, it is more efficient to maintain a "follow database" and only select links which are hardcoded, or which match a follow database, for example domains, websites, or strings that reference complimentary web hosts whereby anonymous users may freely publish content comprising at least one of scripts, hypertext documents, and redirection instructions.

[0013] Following links may also be optimized by consulting a database of what we define as tidied websites.