USOO8725739B2

(12) United States Patent (10) Patent No.: US 8,725,739 B2 Liang et al. (45) Date of Patent: May 13, 2014

(54) CATEGORY-BASED CONTENT 5,634,051 A * 5/1997 Thomson ...... 1.1 RECOMMENDATION 5,778,362 A 7/1998 Deerwester .... 707/5 5,794,050 A 8/1998 Dahlgren et al. 395/708 5,794, 178 A 8, 1998 Caid et al...... 704/9 (75) Inventors: Jisheng Liang, Bellevue, WA (US); 5,799,268 A 8/1998 Boguraev ...... 704/9 Krzysztof Koperski, , WA (US); 5,848,417 A 12/1998 Shoji et al...... TO7,102 Jennifer Cooper, Seattle, WA (US); Ted 5,857,179 A 1/1999 Vaithyanathan et al...... 707/2 Diamond, Seattle, WA (US) 5,884,302 A 3/1999 Ho ...... 707/3 (Continued) (73) Assignee: Evri, Inc., Seattle, WA (US) FOREIGN PATENT DOCUMENTS (*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 EP O 280 866 9, 1988 U.S.C. 154(b) by 255 days. EP O 597 630 T 2002 (Continued) (21) Appl. No.: 13/286,778 OTHER PUBLICATIONS (22) Filed: Nov. 1, 2011 Abraham, “FoxO Xquery by Forms.” Human Centric Computing (65) Prior Publication Data Languages and Environments, Proceedings 2003 IEEE Symposium, Oct. 28-31, 2003, Piscataway, New Jersey, pp. 289-290. US 2012/O109966 A1 May 3, 2012 (Continued) Related U.S. Application Data Primary Examiner — Anteneh Girma (60) Pyignal application No. 61/408,965, filed on Nov. (74) Attorney, Agent, or Firm Lowe Graham Jones PLLC (51) Int. Cl (57) ABSTRACT Goof i730 (2006.01) Techniques for category-based content recommendation are (52) U.S. Cl described. Some embodiments provide a content recommen USPG 707f740: 707/E17.014 dation system (“CRS) configured to recommend content 58) Field fo - - - - - ificati------s ------h s items (e.g., Web pages, images, videos) that are related to (58) OSSO Sea specified categories. In one embodiment, the CRS processes S O lication file f let h hist content items to determine entities referenced by the content ee appl1cauon Ille Ior complete searcn n1Story. items, and to determine categories related to the referenced (56) References Cited entities. The determined entities and/or categories may be part of a taxonomy that is stored by the CRS. Then, in U.S. PATENT DOCUMENTS response to a received request that indicates a category, the CRS determines and provides indications of one or more 4,839,853. A 6, 1989 Deerwester et al...... 364,900 content items that each have a corresponding category that 5,301,109 A 4, 1994 Landauer et al...... 364,419.19 5,317,507 A 5, 1994 Gallant ...... 364/419.13 matches the indicated category. In some embodiments, at 5,325,298 A 6, 1994 Gallant ...... 364/419.19 least Some of these techniques are employed to implement a 5,331,556 A 7/1994 Black, Jr. et al...... 364/419.08 category-based news service. 5,377,103 A 12/1994 Lamberti et al...... 364/419.08 5,619,709 A 4, 1997 Caid et al...... 395/794 29 Claims, 12 Drawing Sheets

US 8,725,739 B2 Page 2

(56) References Cited 2004/0167909 A1 8/2004 Wakefield et al...... 7O7/1OO 2004O16791.0 A1 8, 2004 Wakefield et al...... 707/100 U.S. PATENT DOCUMENTS 2004O167911 A1 8, 2004 Wakefield et al...... 707/100 2004/0221235 A1 11/2004 Marchisio et al...... 715,534 5,933,822 A 8, 1999 Braden-Harder et al...... 707/5 2004/0243388 A1 12/2004 Corman et al...... TO4f1 5,950, 189 A 9, 1999 Cohen et al...... 707/3 2005/0O27704 A1 2/2005 Hammond et al. 707/5 5,982,370 A 1 1/1999 Kamper ...... 345,356 2005/0076365 A1* 4/2005 Popov et al. ... 725/46 6,006,221 A 12/1999 Liddy et al...... 707/5 2005/0108001 A1 5/2005 Aarskog ...... 70410 6,006,225 A 12/1999 Bowman et al. 705/5 2005, 0108262 A1* 5, 2005 Fawcett et al. .. 7O7/1OO 6,026,388 A 2/2000 Liddy et al. . 707/1 2005, 0138018 A1 6/2005 Sakai et al...... 707 3 6,061,675 A 5, 2000 Wical ...... TO6/45 2005. O144064 A1 6/2005 Calabria et al. TO5/14 6,064,951 A 5/2000 Park et al...... TO4/8 2005. O149494 A1 7/2005 Lindh et al. .. 707 3 6,122,647 A 9, 2000 Horowitz et al. 707/513 2005/0177805 A1 8/2005 Lynch et al...... T15/968 6,185.550 B1 2/2001 Snow et al...... 707/1 2005/O197828 A1 9/2005 McConnell et al...... TO4/9 6, 192,360 B1 2, 2001 Dumais et al. 707/6 2005/0210000 A1 9, 2005 Michard ...... 707/3 6,202,064 B1 3/2001 Juliard ...... 707/5 2005/0216443 A1 9, 2005 Morton et al. .. 707/3 6,219,664 B1 4/2001 Watanabe ...... 707/3 2005/0267871 A1 12/2005 Marchisio et al. 707/3 6,246,977 B1 6/2001 Messerly et al. 704.9 2006, O149734 A1 7/2006 Egnor et al. . 707/7 6,311,152 B1 10/2001 Bai et al...... 704/9 2006/0279799 A1* 12/2006 Goldman ..... 358/403 6,363,373 B1 3, 2002 Steinkraus 707/3 2007, OO67285 A1 3/2007 Blume et al...... 707/5 6.405, 190 B1 6/2002 Conklin ...... 707/3 2007.0143300 A1 6/2007 Gulli et al. 6,411,962 B1 6/2002 Kupiec 2007/0156669 A1* 7/2007 Marchisio et al...... 7O7/4 6,460,029 B1 10/2002 Fries et al. 2007/02090 13 A1 9/2007 Ramsey et al...... 715,769 6,484.162 B1 1 1/2002 Edlund et al. 2008.0005651 A1 1/2008 Grefenstette et al. 71.5/5OO 6,510,406 B1 1/2003 Marchisio 2008/0010270 A1 1/2008 Gross ...... 707/5 6,571.236 B1 5/2003 Ruppelt ...... 707/3 2008/0059456 A1 3/2008 Chowdhury et al. 707/5 6,584,464 B1 6, 2003 Warthen 2008.OO82578 A1 4/2008 Hogue et al. . 707/104.1 6,601,026 B2 7/2003 Appelt et al. 2008/0097.975 A1 4/2008 Guay et al. ... TO7/4 6,728,707 B1 4/2004 Wakefield et al. 2008/0288456 Al 1 1/2008 Omoigui...... 707/3 6,732,097 B1 5, 2004 Wakefield et al. 2009/0076886 A1 3/2009 Dulitz et al...... 70.5/10 6,732,098 B1 5, 2004 Wakefield et al. 2009, O144609 A1 6/2009 Liang et al. 6,738,765 B1 5, 2004 Wakefield et al. 2009/0228439 A1 9/2009 Manolescu et al...... 707/3 6,741,988 B1 5, 2004 Wakefield et al. 2010, 0046842 A1 2/2010 Conwell ...... 382.218 6,745,161 B1 6/2004 Arnold et al...... 7O4/7 2010, 004.8242 A1 2/2010 Rhoads et al. 455,556.1 6,757,646 B2 6, 2004 Marchisio ... 2010, O2SO497 A1 9/2010 Redlich et al...... TO7/661 6,859,800 B1 2/2005 Roche et al. 2012/0254188 A1* 10/2012 Koperski et al...... 707/740 6,862,710 Bl 3/2005 Marchisio ... 6,910,003 B1 6/2005 Arnold et al...... TO4/4 FOREIGN PATENT DOCUMENTS 6,996,575 B2 2/2006 Cox et al...... 707/102 7,051,017 B2 5/2006 Marchisio ...... 707/3 WO WOOOf 14651 3, 2000 7,054,854 B1 5, 2006 Hattori et al. .. 707/3 WO WOOO,573O2 9, 2000 7,171,349 B1 1/2007 Wakefield et al. . 704.9 WO WOO1/2228O 3, 2001 7.283,951 B2 10/2007 Marchisio et al. . 704.9 WO WOO1/8O177 10, 2001 7,398,201 B2 * 7/2008 Marchisio et al. . 704.9 WO WO O2/27536 4/2002 7,403,938 B2 7/2008 Harrison et al. ... 707/3 WO WOO2,33583 4/2002 7,451,135 B2 * 1 1/2008 Goldman et al...... 1f1 WO WOO3,O17143 2, 2003 7,526,425 B2 4/2009 Marchisio et al. . 704.9 WO WO 2004/053645 6, 2004 7,672,833 B2 3/2010 Blume et al...... TO4f10 WO WO 2004/114163 12, 2004 7,788,084 B2 8, 2010 Brunet al. WO WO 2006/068872 6, 2006 8, 112,402 B2 2/2012 Cucerzan et al...... 707/705 8,122,016 B1 2/2012 Lamba et al. OTHER PUBLICATIONS 8,122,026 B1 2/2012 Laroco, Jr. et al. 707/737 2002fOOO7267 A1 1/2002 Batchilo et al. .... TO4/9 Cass, "A Fountain of Knowledge.” IEEE Spectrum Online, URL: 2002/0010574 A1 1/2002 Tsourikov et al 704.9 http://www.spectrum.ieee.org/WEBONLY/publicfeature/jan04/ 2002/0059161 A1 5/2002 Li ...... 707/1 0104.compl.html, download date Feb. 4, 2004, 8 pages. 2002fOO78041 A1 6, 2002 Wu 707/4 Feldman et al., “Text Mining at the Term Level.” Proc. of the 2" 2002fOO78045 A1 6, 2002 Dutta European Symposium on Principles of Data Mining and Knowledge 2002/009 1671 A1 7/2002 Prokoph ...... 297. Discover, Nantes, France, 1998. 28:8-9) A. 39: St.ustejovsky i. et al"al. 707/3 Ilyaset al., “A Conceptual Architecture for Semantic Search Engine.” 2002/01567.63 A1 10, 2002 Marchisio ...... JayapandianIEEE, INMIC, et 2004, al., “Automating pp. 605-610. the Design and Construction of 2003,000471.6 A1 1/2003 Haigh et al. 704.238 Query Forms.” Data Engineering, Proceedings of the 22" Interna 38. 9. . A. 23. R al al. 042%. tional Conference IEEE, Atlanta, Georgia, Apr. 3, 2006, pp. 125- 127. 2003.01.15191 A1 6, 2003 CO R et al. 7073 Kaiser, "Ginseng A Natural Language User Interface for Semantic 568. A 1556 ME". 7. Web Search,” University of Zurich, Sep. 16, 2004, URL http://www. 2004/0010508 A1* 1 2004 Fest etal 707,102 ifi.unizh.ch/archive/mastertheses/DA Arbeiten 2004/Kaiser 2004/0044669 A1* 3, 2004 Brown et al. 707/100 his pit pp. 1-84. istical f s 2004/0064447 A1 4/2004 Simske et al...... 707/5 Liang et al., “Extracting Statistical Data Frames from Text, 2004/0103090 A1 5/2004 Doglet al...... 707/3 SIGKDD Explorations, Jun. 2005. vol. 7, No. 1, pp. 67-75. 2004/O125877 A1 7/2004 Chang et al. ... 375,240.28 Littman et al., “Automatic Cross-Language Information Retrieval 2004O167870 A1 8/2004 Wakefield et al...... 707/1 using Latent Semantic Indexing.” In Grefenstette, G., editor, Cross 2004O167883 A1 8/2004 Wakefield et al...... 707/3 Language Information Retrieval. Kluwer, 1998. 2004O167884 A1 8/2004 Wakefield et al...... 707/3 Nagao et al., “Semantic Annotation and Transcoding: Making Web 2004O167885 A1 8, 2004 Wakefield et al. . 707/3 Content More Accessible.” IEEE Multimedia, IEEE Computer Soci 2004O167886 A1 8, 2004 Wakefield et al. . 707/3 ety, US. 8(2):69-81, Apr. 2001. 2004O167887 A1 8/2004 Wakefield et al...... 707/3 Nguyen et al., "Accessing Relational Databases from the WorldWide 2004/O167907 A1 8, 2004 Wakefield et al. . 707/100 Web.” SIGMOD Record. ACM USA, Jun. 1996, vol. 25, No. 2, pp. 2004O167908 A1 8/2004 Wakefield et al...... 707/100 529-540. US 8,725,739 B2 Page 3

(56) References Cited Google, “How to InterpretYour Search Results'. http://web.archive. org/web/20011116075703/http://www.google.com/intl/en/helpin OTHER PUBLICATIONS terpret.html, Mar. 27, 2001; 6 pages. Razvan Bunescu et al., “Using Encyclopedic Knowledge for Named Pohlmann et al., “The Effect of Syntactic Phrase Indexing on Entity Disambiguation', 2006, Google, pp. 9-16. Retrieval Performance for Dutch Texts.” Proceedings of RIAO, pp. Silviu Cucerzan, "Large-Scale Named Entity Disambiguation Based 176-187, Jun. 1997. Rasmussen, “WDB-A Web Interface to Sybase.” Astronomical Soci on Wikipedia Data'. Proceedings of the 2007 Joint Conference on ety of the Pacific Conference Series, Astron. Soc. Pacific USA, 1995, Empirical Methods in Natural Language Processing and Computa vol. 77, pp. 72-75. tional Natural Language Learning, pp. 708-716, Prague, Jun. 2007. Sneiders, "Automated Question Answering Using Question Tem Razvan Constantin Bunescu, "Learning for Information Extraction: plates That Cover the Conceptual Model of the Database.” Natural From Named Entity Recognition and Disambiguation to Relation Language Processing and Information Systems, 6' International Extraction' The Dissertation Committee for Aug. 2007. The Univer Conference on Applications of Natural Language to Information sity of Texas at Austin, pp. 1-150. Systems, Revised Papers (Lecture Notes in Computer Science vol. Joseph Hassell et al., “Ontology-Driven Automatic Entity Disam 2553), Springer-Verlag, Berlin, Germany, 2002, vol. 2553, pp. 235 biguation in Unstructured Text” Large Scale Distributed Information 239. Systems (LSDIS) Lab Computer Science Department, University of Ruiz-Casado et al., “From Wikipedia to Semantic Relationships: A Georgia, Athens, GA 30602-7404, ISWC 2006, LNCS 4273, pp. Semi-Automated Annotation Approach'. 2006, pp. 1-14. 44-57. Florian et al., “Named Entity Recognition through Classifier Com Levon Lloyd et al. “Disambiguation of References to Individuals” bination', 2003, IBM T.J. Watson Research Center, pp. 168-171. IBM Research Report, Oct. 28, 2005, pp. 1-9. Dekai Wu, A Stacked, Voted, Stacked Model for Named Entity Rec ognition, 2003, pp. 1-4. * cited by examiner U.S. Patent May 13, 2014 Sheet 1 of 12 US 8,725,739 B2

; s Category-Based Corter Recorrendation

sessssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss 8: repository of 88ities arci X}}cepts -i, 2

88: 888 & 88kei is x 8:::ities : 888 {x}:::::: i.e. i. 8 c{g} is

&ssig categories to 88c: {{{tes: 888 8 88 xxxi.;&

Determine corter items that are relevar: 108 to 8: ixicatex category &

sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss

8terix xxxiiar axio 888 gig - if categories &

Fig. 1 U.S. Patent May 13, 2014 Sheet 2 of 12 US 8,725,739 B2

~~~~.*

*··:

*******************************{&#;&#------~--~~~~$

s

X------×3. ------8 &

S *

...……*************…….ºº:::::::::::::::::::::::::::::::::::****************************** &! $3&###### U.S. Patent May 13, 2014 Sheet 3 of 12 US 8,725,739 B2

*****…&&&&&&&&#x<3<***********•*******:*****************3

U.S. Patent May 13, 2014 Sheet 4 of 12 US 8,725,739 B2

:& U.S. Patent May 13, 2014 Sheet 5 of 12 US 8,725,739 B2

U.S. Patent May 13, 2014 Sheet 6 of 12 US 8,725,739 B2

vir'61) U.S. Patent US 8,725,739 B2

gº‘fit-1 U.S. Patent May 13, 2014 Sheet 8 of 12 US 8,725,739 B2

3,3%,že U.S. Patent Sheet 9 of 12 US 8,725,739 B2

3×3×3×3×3##3

.* ºº~~~~~~~~~);~~~~~~~~~~~~~~~~~~~~~); U.S. Patent May 13, 2014 Sheet 10 of 12 US 8,725,739 B2

------

:: : : : : : : : : ------.

888

};&#####

x------U.S. Patent May 13, 2014 Sheet 11 of 12 US 8,725,739 B2

------

------imipie entities referenced by the content - 602 8 ite 8 -----

store the determined multiple entities and 80s the determined at east are category X 8

- s * -* -- -- * -* s w * ^ a. *::: ------^ 838W cotte -88x w ^ ^ 88s.a. - '^ x *- * - ^*

Fig. 6 w M. *::::::: U.S. Patent May 13, 2014 Sheet 12 of 12 US 8,725,739 B2

: s ------Y------

:X correspondingSeied: 8 xxxter category ites: at that 8as matches a the x...'-ex. - iricatex categxy

s sesssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss

fast: 3: 8:88): {3 & 8888 & | 706 {x}88: i.e.

Fig. 7 US 8,725,739 B2 1. 2 CATEGORY-BASED CONTENT FIG. 6 is an example flow diagram of a content indexer RECOMMENDATION process performed by an example embodiment. FIG. 7 is an example flow diagram of a content recom TECHNICAL FIELD mender process performed by an example embodiment. The present disclosure relates to methods, techniques, and DETAILED DESCRIPTION systems for category-based content recommendation and, more particularly, to methods, techniques, and systems for Embodiments described herein provide enhanced com recommending content items, such as news stories, that ref puter- and network-based methods and systems for recom erence entities in an indicated category. 10 mending content and more particularly, recommending con tent items in specified categories. Example embodiments provide a content recommendation system (“CRS) config BACKGROUND ured to recommend contentitems such as articles, documents, Videos, advertisements, product information, Software appli Various approaches for automated categorization (or clas 15 cations/modules, and the like. Recommending content items sification) of texts into predefined categories exist. One approach to this problem uses machine learning: a general may include determining content items that are in an indi inductive process automatically builds a classifier by learn cated category, based on whether entities referenced by those ing, from a set of pre-classified documents that are repre content items are in or otherwise related to the indicated sented as vectors of key terms, the characteristics of the category. In some embodiments, the content items are news categories. Various machine-learning techniques may be items that include timely information about recent or impor employed. In one approach, for each category, a set of human tant events. Example news items may be documents that labeled examples are collected as training data in order to include news stories or reports, press releases, video news build classifiers, such as Decision Tree classifiers, Naive reporting, radio news broadcasts, or the like. Bayes classifiers, Support Vector Machines, Neural Net 25 In some embodiments, the CRS indexes a corpus of content works, or the like. A separate classifier typically must be built items by determining semantic information about the content for each new category. Such approaches also may not scale items. Determining semantic information may include deter well when processing a large quantity of documents. For mining entities that are referenced by the content items. A example, to add a new category, a new classifier may need to content item may reference an entity by naming or describing be built. Then, every document may need to be run through 30 the entity. Entities may include people, places (e.g., loca the resulting classifier. tions), organizations (e.g., political parties, corporations, In addition, various approaches to providing computer groups), events, concepts, products, substances, and the like. generated news Web sites exist. One approach aggregates For example, the sentence "Sean Connery starred as James headlines from news sources worldwide, and groups similar Bond.” names the entities Sean Connery and James Bond. stories together. The stories are grouped into a handful of Table 1, below, includes a list of example entity types. Fewer broad, statically defined categories, such as Business, Sports, or more entity types may be available. Entertainment, and the like. In some approaches, the presen Determining semantic information may further include tation of news items may be customized. Such as by allowing determining categories that are associated with the deter users to specify keywords to filter news items. However, such mined entities. In some embodiments, entities may further be a keyword-based approach to customization may be limited 40 associated with (e.g., related to) one or more categories (also because it can be difficult or impossible to express higher called “facets'). Example facets include actor, politician, ath order concepts with simple keywords. For example, if a user lete, nation, drug, sport, automobile, and the like. The entities wishes to obtain articles about NBA players, the and/or categories may be arranged in a semantic network, term “NBA” may yield an over-inclusive result set, by includ Such as a taxonomic graph that includes relations between ing many articles that do not mention any basketball players. 45 various categories and/or entities. In one embodiment, each On the other hand, the terms "NBA basketball player may entity is associated with at least one category, via an is-a yield an under-inclusive result set, by not including articles relationship (e.g., Sean Connery is-a Actor), and each cat that do not include the specified keywords but that do mention egory may be associated with one or more other categories some NBA basketball player by name. (e.g., an Actor is-a Person). The CRS may exploit the rela 50 tionships or links in the semantic network to determine and store relevant categories associated with a particular content BRIEF DESCRIPTION OF THE DRAWINGS item. Table 2, below, includes a list of example categories/ facets. Fewer, greater, or different categories may be incor FIG. 1 is an example flow diagram of a category-based porated or utilized. content recommendation process performed by an example 55 In some embodiments, the CRS provides a search and embodiment. discovery facility that is configured to recommend content FIG. 2 illustrates an example diagram of an example items that include or otherwise are related to one or more embodiment of a content recommendation system. specified categories. In one embodiment, the CRS receives an FIGS. 3A-3C illustrate example screen displays provided indication of a category, such as via a Web-based search 60 interface. In response, the CRS determines (e.g., finds, by an example embodiment of a content recommendation selects, obtains, identifies) one or more content items that system. reference or are otherwise related to the specified category. FIGS. 4A-4D illustrate example data processed and/or uti The CRS may then rank or order the selected content items, lized by an example embodiment. Such that more relevant content items appear before less rel FIG. 5 is an example block diagram of an example com 65 evant content items. The CRS then provides indications of the puting system for implementing a content recommendation selected content items, such as by storing, transmitting, or system according to an example embodiment. forwarding the selected content items. US 8,725,739 B2 3 4 1. Overview of Category-Based Content Recommendation in given the category “NCAA basketball, the process may One Embodiment determine and return teams and players that are most popular FIG. 1 is an example flow diagram of a category-based in the news recently. Such additional information may then content recommendation process performed by an example also be provided in response to a received search query or embodiment. In particular, FIG. 1 illustrates a process that other request. may be implemented by and/or performed by an example At block 110, the process determines popular and/or content recommendation system. The process automatically emerging categories. In one embodiment, the CRS is config provides content items relevant to a specified topic category. ured to provide a feed, stream, or other dynamic collection, The process begins at block 102, where it builds a reposi Such as may be part of an automated news service, where tory of entities and concepts. In one embodiment, building the 10 news items are organized by popular and/or emerging catego repository may include automatically identifying entities by ries. Popular and/or emerging categories may be automati processing structured or semi-structured data, Such as may be cally determined based on popularity ranking and detection obtained Wikipedia, Techcrunch, or other public or private of emerging categories, by aggregating popularity measures data repositories, knowledge bases, news feeds, and the like. of entities and documents that belong to each category. The In other embodiments, unstructured text documents or other 15 Source of entity popularity measures may be page views, user content items (e.g., audio data) may be processed to identify clicks, recent (e.g., last day/week) mentions in content items, entities. As noted above, entities may be organized into taxo Wikipedia traffic, Twitter references, and the like. nomic hierarchies, based on taxonomic relations such as is-a, In addition, the CRS may also determine popular and part-of, member-of, and the like. In some embodiments, the emerging entities within a category. For example, for a cat entities are also associated with properties. Taxonomic paths egory such as NBA Basketball, the CRS may determine enti and/or properties may be extracted from structured and semi ties (e.g., players, teams, coaches) that have recently received structured sources (e.g., Wikipedia). An example taxonomic significant press coverage, such as NBA basketball teams hierarchy is illustrated with respect to FIG. 4D. (and their players) that are playing each other in a current At block 104, the process determines a ranked list of enti championship series or other event that has recently (e.g., ties for each contentitem in a corpus of contentitems. In some 25 within the last day, week, month) received additional press embodiments, the process uses entity tagging and disambigu coverage. ation to link references to entities that occur in the text of a The CRS may identify, organize, and/or present content content item to entries in the repository of entities and con items in other or additional ways. For example, within a given cepts generated at block 102. Then, for each content item, the category, the CRS may identify content items that reference process determines a ranked list of entities, ordered by their 30 or are related to particular events. For example, in a sports importance and relevance to the main Subject/topic of the category, the CRS may automatically identify, group, and/or content item. present content items about various sports-related events, At block 106, the process assigns categories to each con Such as injuries within particular sports (e.g., injuries suffered tent item in the corpus of content items, based on the ranked by football players), awards (e.g., player of the year), player list of entities determined at block 104. The categories may be 35 team engagements (e.g., contract renewals), or the like. Or, in or include any node or path in a semantic network and/or a an entertainment category, the CRS may automatically iden taxonomic graph, or any properties that may be shared by a tify content items about common gossip events, such as group of entities (e.g., Pac-10 conference teams, University romances, break-ups, movie openings, or the like. of Washington Huskies football players, left-handed baseball 2. Functional Elements of an Example Content Recommen pitchers, rookie football quarterbacks). The assigned catego 40 dation System ries may be based on groups of entities or entity types, FIG. 2 illustrates an example block diagram of an example grouped based on their taxonomic paths and/or any selected embodiment of a content recommendation system. In particu properties. Assigning categories to a contentitem may further lar, FIG. 2 illustrates a content recommendation system include storing the determined categories in an inverted index (“CRS) 200 that includes a content ingester 211, an entity or other type of data structure for efficient retrieval at a later 45 and relationship identifier 212, a category identifier 213, a time. In other embodiments, the process assigns categories to content recommender 214, an optional other content recom only some of the content items in the corpus, leaving content mender 215, and a data store 217. The data store 217 includes items without an assigned category. This may occur for vari a content index 217a, an entity store 217b, and a relationship ous reasons, such as because no category can be determined, index 217c. for performance reasons (e.g., only recent content items are 50 The contentingester 211 receives and indexes content from processed), or the like. various content sources 255, including sources such as Web At block 108, the process determines content items that are sites, Blogs, news feeds, video feeds, and the like. The content relevant to an indicated category. In one embodiment, the ingester 211 may also receive content from non-public or CRS provides a search engine facility that can answer queries semi-public sources, including Subscription-based informa requesting content items related to one or more specified 55 tion services, access-controlled social networks, and the like. categories. Thus, determining relevant content items may The content ingester 211 provides content information, include finding content items that match or are otherwise including data included within content items (e.g., text, related to at least one of the specified categories. The deter images, video) and meta-data about content items (e.g., mined content items may be ranked by factors such as Source author, title, date, Source), to the entity and relationship iden credibility, popularity of the topic, recency, or the like. The 60 tifier 212. The content information may be provided directly determined content items may then be provided (e.g., trans (as illustrated) and/or via Some intermediary, Such as the mitted, sent, forwarded, stored). Such as in response to a content index 217a. received search query or other request. The entity and relationship identifier 212 determines Additional related information may also or instead be semantic information about content items obtained from the determined that this time, including producing a Summariza 65 various content sources 255, and stores the determined infor tion of the specified category by producing a list of current mation in the data store 217. More specifically, the entity and and popular entities in the specified category. For example, relationship identifier 212 receives content information from US 8,725,739 B2 5 6 the content ingester 211 and identifies entities and relation egory, the CRS 200 may provide a list of content items that ships that are referenced therein. Various automatic and semi reference that facet. In addition, given an indication of a automatic techniques are contemplated for identifying enti content item, the CRS may provide a list of entities or cat ties within content items. In one embodiment, the identifier egories referenced by that content item. 212 uses natural language processing techniques, such as The entity store 217b is a repository of entities (e.g., parts of speech tagging and relationship searching, to identify people, organization, place names, products, events, things), sentence components such as Subjects, verbs, and objects, and concepts, and other semantic information. In at least some to identify and disambiguate entities. Example relationship embodiments, the entities in the entity store 217b are related searching technology, which uses natural language process Such that they form a semantic network, taxonomy, or graph. ing to determine relationships between Subjects and objects in 10 The entities in the entity store 217b are associated with cat ingested content, is described in detail in U.S. Pat. No. 7,526, egories/facets. The categories themselves are organized into 425, filed Dec. 13, 2004, and entitled “METHOD AND SYS one or more taxonomies based on taxonomic relations such as TEM FOR EXTENDING KEYWORD SEARCHING FOR is-a, part-of, member-of, and the like. In addition, entities are SYNTACTICALLY AND SEMANTICALLY ANNO associated with certain properties, such as name and aliases, TATED DATA' issued on Apr. 28, 2009, and example entity 15 a unique identifier, types and facets, descriptions, and the like. recognition and disambiguation technology is described in Entities may also have type/facet-specific properties. For detail in U.S. patent application Ser. No. 12/288,158, filed example, for a sports athlete, common properties may Oct. 15, 2008, and entitled “NLP-BASEDENTITY RECOG include: birthplace, birth date, sports teams, player positions, NITION AND DISAMBIGUATION, both of which are awards, and the like. Note that some of the properties are incorporated herein by reference in their entireties. Amongst relational, that is, the property value may itself be another other capabilities, the use of relationship searching, enables entity in the entity store 217b. For example, the team property the CRS 200 to establish second order (or greater order) for an athlete may be link to a sports team entity in the entity relationships between entities and to store Such information store 217b, and vice versa. Thus, the entities in the entity store in the data store 217. 217b are interconnected through the property links, creating a For example, given a sentence such as "Sean Connery 25 semantic network or graph. Certain taxonomic relations are starred in Goldfinger the identifier 212 may identify “Sean represented as Such property links (e.g., the “member-of Connery' as the sentence subject, “starred as the sentence relation for the players-team relation, and team-league rela verb (or action), and “Goldfinger as the sentence object, tion in the sports domain). In some embodiments, the entities, along with the various modifiers present in the sentence. their taxonomic paths and/or properties are extracted from These parts-of-speech components of each sentence, along 30 one or more structured and semi-structured sources (e.g., with their grammatical roles and other tags may be stored in Wikipedia). In other embodiments, the process of identifying the relationship index 217c, for example as an inverted index entities may be at least in part manual. For example, entities as described in U.S. Pat. No. 7,526,425. As part of the index may be provisionally identified by the content ingester 211, ing process, the CRS recognizes and disambiguates entities and then Submitted to humans for editing, finalization, review, that are present in the text. Indications of these disambiguated 35 and/or approval. entities are also stored with the sentences information, when The category identifier 213 determines category-related the sentence contains uniquely identifiable entities that the semantic information about content items obtained from the CRS already knows about. These entities are those that have various content sources 255, and stores the determined infor been added previously to the entity store 217b. In some cases, mation in the data store 217. More specifically, the category the indexed text contains subjects and objects that indicate 40 identifier 213 receives content information from the content entities that are not necessarily known or not yet disambigu ingester 211 and entity and relationship information from the ated entities. In this case the indexing of the sentence may entity and relationship identifier 212, and determines catego store as much information as it has in index 217c, but may not ries associated with the entities referenced by content items. refer to a unique identifier of an entity in the entity store 217b. The category identifier 213 thus associates categories with Over time, as the CRS encounters new entities, and in some 45 content items, and store Such associations in the data store cases with the aid of manual curation, new entities are added 217. Such as by annotating content items stored in the content to the entity store 217b. In the above example, “Sean Con index 217a. The category identifier 213 may perform other or nery' and “Goldfinger may be unique entities already known additional category-related functions. Such as identifying to the CRS and present in the entity store 217b. In this case, popular or trending categories, Summarizing categories by their identifiers will be stored along with the sentence infor 50 determining popular entities in the category, or the like. mation in the relationship index 217c. The identified verbs The content recommender 214 provides indications of con also define relationships between the identified entities. tent items in response to a request received from a user 202 or These defined relationships (e.g., stored as Subject-action a device operated by the user 202. In one embodiment, the object or “SAO' triplets, or otherwise) are then stored in the content recommender 214 provides an interface (e.g., a Web relationship index 217c. In the above example, a representa 55 based interface, an application program interface) that tion of the fact that the actor Sean Connery starred in the film receives requests/queries that specify one or more categories. Goldfinger would be added to the relationship index 217c. In In response, the content recommender 214 determines con Some embodiments, the process of identifying entities may be tent items that are related to at least one of the one or more at least in part manual. For example, entities may be provi categories, and provides (e.g., transmits, sends, forwards) sionally identified by the identifier 212, and then submitted to 60 indications of the determined content items. In another curators (or other humans) for editing, finalization, review, embodiment, the content recommender 214 operates in a and/or approval. “push’ model, where it provides a stream or feed of content The content index 217a associates content items with one items related to one or more categories. or more entities and categories, and vice versa, in order to The optional other content recommender 215 provides rec Support efficient searches such as searches for content items 65 ommendations of other types of content obtained from or having a particular entity or for categories associated with a provided by third-party services/sources. In some embodi particular content item. For example, given an entity or cat ments, the recommender 215 may query third-party services US 8,725,739 B2 7 8 to retrieve other media types (e.g., videos, podcasts, social respect to the ordering of the code flow, different code flows, media messages) that may not be included in the content etc. Thus, the scope of the techniques and/or functions index 217a. In one embodiment, the recommender 215 may, described are not limited by the particular order, selection, or given a specified category, automatically construct a query decomposition of steps described with reference to any par adapted for a third-party information/content service by tak ticular routine. ing the top entities (e.g., top three) from a list of current and 3. Example Screen Displays for Category-Based News Item popular entities for the specified category. Recommendation In addition, although the described techniques for content FIGS. 3A-3C illustrate example screen displays provided recommendation are illustrated primarily with respect to tex by an example embodiment of a content recommendation tual content, other types of content are contemplated. 10 In one embodiment, the CRS 200 may utilize at least some system. In particular, FIG. 3A illustrates a Web browser 300 of the described techniques to perform or facilitate the cat that displays a screen 302 (e.g., defined by a received Web egory-based recommendation of contentitems based on other page) that is being used by a user to interact with the content types of content, including advertisements, audio (e.g., recommendation system to view news items organized by music), video, images, and the like. In some embodiments, 15 category. The screen 302 includes a menu bar 304 and a news the CRS 200 is configured to ingest video streams (e.g., live items area 306. The menu bar 304 includes multiple controls, streaming of sports games) in a similar fashion. In particular, here labeled “US & World.” “Entertainment,” “Sports.” the CRS 200 may obtain text content from the stream via “Business.” “Technology, and “More.” The multiple controls either closed captions or speech recognition. Then, the CRS allow a user to specify a particular category for which he 200 analyzes the obtained text content as discussed above, wishes to view news items. In this example, the user has such that the CRS 200 can provide category-based recom specified a Politics category, and in response, the news items mendations for Such content items as well. area 306 is updated to present news items that are related to Furthermore, the described techniques are not limited to the specified category. the specific architecture shown in FIG. 2. For example, in The news items area 306 includes multiple news item snip Some embodiments, content ingestion and relationship iden 25 pets, such as snippets 308 and 310. Each snippet provides tification may be performed by another (possibly external or information about a news item, such as the title of the news remote) system or component, such as a stand-alone content item, some of text from the news item, the source of the news indexing, search, and discovery system. In other embodi item, or the like. For example, snippet 308 provides informa ments, the CRS 200 may not interact directly with users as tion about an AP Online story, including its title (“FCC opens shown, but rather provide user interface components (e.g., 30 up unused TV signals for broadband') and some of its text recommender widgets, plug-ins) that may be embedded or (“The Federal Communications Commission is opening up otherwise incorporated in third-party applications or systems, unused airwaves between television stations...”). Note that Such as Web sites, Smart phones, desktop systems, and the the news item presented via snippet 308 does not mention the like. term politics. Rather, the CRS has determined that various Although the techniques of category-based content recom 35 entities referenced in the news item (e.g., the FCC) are entities mendation and the CRS are generally applicable to any type that are in some manner related to the Politics category. Snip of content item, the phrase “content item is used generally to pet 310 provides information about a social media message, refer to or imply any type of information and/or data, regard in this case a post from the Twitter micro-blogging service. less of form or purpose. For example, a content item may be Again, the underlying social media message presented via in textual or binary format, or a content item may be a news 40 snippet 310 does not include the term politics. Instead, the item, a report, an image, an audio source, a video stream, a CRS has determined that the referenced entity Timothy Gei code module (e.g., an application, an executable), an online thner is related to the Politics category. activity (e.g., to purchase a good or service), or the like. FIG. 3B illustrates the specification of a content item cat Essentially, the concepts and techniques described are appli egory by a user. In particular, in FIG.3B, the user has selected cable to any category-based recommendation system. Also, 45 the control labeled “Sports” in the menu bar 304. In response, although certain terms are used primarily herein, other terms a category selector menu 310 has been presented. The menu could be used interchangeably to yield equivalent embodi 310 includes a category section 312 and a trend section 314. ments and examples. For example, the term "category' and The category section 312 indicates multiple categories that “facet' are used interchangeably. Other terms for category are related to the general Sports category, including MLS may include “class.” “property-based set,” or the like. In 50 (Major League Soccer), Soccer, MLB (Major League Base addition, terms may have alternate spellings which may or ball), Cycling, and the like. The trend section 314 indicates may not be explicitly mentioned, and all Such variations of multiple entities that have been identified by the CRS as terms are intended to be included. popular, emerging, and/or “trending. Trending entities or Example embodiments described herein provide applica categories include those that have been recently (e.g., in the tions, tools, data structures and other Support to implement a 55 last day, week, month) been receiving an increase in attention, content recommendation system to be used for recommend for example measured by number of page views, number of ing content items, such as news items, that belong to a par news stories, or the like. A trending entity/category may be ticular category. Other embodiments of the described tech identified by measuring activity (e.g., number of news sto niques may be used for other purposes, including for ries) with respect to historical averages for the trending entity/ category-based recommendation of technology reports (e.g., 60 category and/or with respect to other entities/categories. reviews of computer systems, programs, games, mobile FIG. 3C illustrates content items presented in response to devices). In the following description, numerous specific specification of a content item category by a user. In particu details are set forth, Such as data formats and code sequences, lar, in FIG. 3C, the user has selected the category labeled etc., in order to provide a thorough understanding of the “MLB' displayed in the category section 312 of FIG. 3B. In described techniques. The embodiments described also can 65 response, the category section 306 of the Web page 302 is be practiced without some of the specific details described updated to present news items that are about or in some herein, or with other specific details, such as changes with manner reference the category Major League Baseball. US 8,725,739 B2 10 Although the category-based content recommendation ranking may be based on one or more of the following factors: techniques of FIGS. 3A-3C have been described primarily number of mentions (e.g., references) of each entity in the with reference to Web-based technologies, the described text; positions of the mentions in the text(e.g., entities appear techniques are equally applicable in other contexts. For ing in document title may be weighted more; entities appear example, category-based content recommendation may be 5 ing earlier in the text would be weighted more than the ones performed in the mobile computing context, such as via a appearing later in the text; entities appearing in boilerplate newsreader application/module or other type of code module text may be weighted less); and penalties to certain types of that is configured to execute on a mobile device (e.g., a Smart entities (e.g., if the publisher of the document appears in the phone) in order to present news or other content items for consumption by a user. 10 text, it may be weighted less). 4. Entity Identification and Category Aggregation in an FIG. 4D a portion of an example taxonomic tree. In par Example Embodiment ticular, FIG. 4D illustrates a taxonomic graph 430. The illus FIGS. 4A-4D illustrate example data processed and/or uti trated taxonomic graph 430 is a tree that represents a hierar lized by an example embodiment. In particular, FIGS. 4A-4D chy of categories that each have Zero or more child categories illustrate various types of data used to support a running 15 connected via an arc or link representing a relation. The example of entity and category identification and aggregation hierarchy begins with a unique root category (here labeled performed by an example embodiment of a content recom “Evri) 432, which has child categories 434a-434c, respec mendation system. tively labeled Organization, Person, and Location. Category FIGS. 4A and 4B show a representation of two entities. In 434a has child category 436a, which in turn has child cat particular, FIGS. 4A and 4B illustrate XML-based represen- 20 egory 438a. Portions of the graph 430 that are not shown are tations of a basketball player entity named Martell Webster illustrated by ellipses, such as ellipses 450. FIG. 4D also and a basketball team entity named the Portland Trailblazers, illustrates entities 440a-440e, linked to their respective cat respectively. In FIG. 4A, Martell Webster is represented by egories (e.g., via an is-a relation). For example, entities 440a structure 400, which includes a facets section 402, a name (Trail Blazers) and 440b (Lakers) are Basketball Teams section 404, a properties section 406, and a type section 408. 25 (438a). Although the illustrated graph 430 is a tree in the The facets section 402 represents one or more facets/catego illustrated embodiment, in other embodiments other graph ries, each of which includes a facet name (e.g., Basketball structures may be utilized, including general directed or undi Player) and a taxonomic path, which is a path in a taxonomic rected graphs. tree or other type semantic graph. An example taxonomic Nodes in the graph 430 also have associated levels, where graph is described with respect to FIG. 4D, below. The name 30 the root is at the first level, categories at the level of category section 404 represents a name (e.g., “Martell Webster) for 434a are at the second level, categories at the level of category the illustrated entity. The properties section 406 represents 436a are at the third level, and so on. When levels are counted one or more properties of the entity, which are name-value starting from one, the level of a category is defined as one plus pairs that describe some aspect of the entity. In this example, the number of arcs/links that must be traversed to reach the the properties are birth date=Dec. 4, 1986 and 35 root category. sports league-NBA. The type section 408 indicates a “top A taxonomic path is a path between one category and level” category to which the entity belongs, in this case PER another in the graph 430. For example, the path connecting SON. categories 432, 434a, 436a, and 438a form a taxonomic path In FIG. 4B, the Portland Trailblazers team is represented that specifies the Basketball Team category as well as all of its by a structure 410, which includes a facets section 412, a 40 ancestor categories up to the root of the graph 430. The path name section 414, a properties section 416, and a type section connecting categories 432, 434a, 436a, and 438a, may also be 418. The sections 412–418 include, represent, or indicate data denoted textually as: Evri/Organization/Sports/Basket of types similar to those described with respect to sections ball Team. 402-408 of FIG. 4A. Here, the entity has a facet/category of In the processing of the example content item shown in Basketball Team (section 412), a name of “Portland Trail 45 FIG. 4C, a resulting ranked list of entities (in non-increasing Blazers” (section 414), properties of sports league-NBA and order), along with their respective taxonomic paths, may be: number championships=3 (section 416), and a type of 1. Evri/Organization/Sports/Basket ORGANIZATION (section 418). As discussed above, enti ball Team ties such as the ones described with respect to FIGS. 4A and 2. Los Angeles Lakers Evri/Organization/Sports/Basket 4B may be stored in an entity repository, Such as the entity 50 ball Team Store 217b of FIG. 2. 3. Martell Webster Evri/Person/Sports/Athlete/Basket FIG. 4C illustrates an example content item. In particular, ball Player FIG. 4C illustrates a news item 420, which is a news story 4. Staples Center Evri/Location/Sports/Sports Venue about a basketball game between the Portland Trailblazers In the illustrated example, the above ranking results due to and the Los Angeles Lakers. The news item 420 includes 55 the relative number of mentions of the entities in the news entity references 422a-422g. The CRS processes the text of item of FIG. 4C. In particular, the Portland Trail Blazers are the news item 420, recognizes the entity references 422a mentioned three times, the Los Angeles Lakers and Martell 422g, and determines (e.g., links) the references to corre Webster are each mentioned twice, and the Staples Center is sponding entities stored in the entity store. For example, mentioned once. references 422a, 422d, and 422fare linked to the Portland 60 In the next step, the CRS assigns categories to the content Trailblazers entity described with respect to FIG. 4B; refer item, based on the ranked list of entities. The CRS takes the ences 422b and 422g are linked to a Los Angeles Lakers top Kentities from the ranked list. In one implementation, entity; reference 422c is linked to the Martell Webster entity K=5. By taking the top Kentities, the CRS includes only the described with respect to FIG. 4A; and reference 422d is entities that are relatively important and relevant to the docu linked to a Staples Center entity. 65 ment. Then, the CRS derives categories by aggregating the Next, the CRS ranks the tagged entities by their importance common nodes in the taxonomic paths of the top entities, as and relevance to the main subject of the content item. The well as their properties, as described below. US 8,725,739 B2 11 12 First, the CRS counts the third-level nodes in the taxo news as well as scheduled events (e.g., Oscars Academy nomic paths, which correspond to the major topic domains Award, March madness college basketball). Another (e.g., Politics, Entertainment, Sports, Business, Technology, approach includes automated popularity ranking and detec Heath, Food, etc.). In FIG. 4D, these are categories at the level tion of emerging categories, by looking at popularity of indi of category 436a (e.g., Sports, Business, Health). The CRS 5 vidual entities and aggregating to the category level. The then ranks the topic domains by their frequency. Next, the Sources of entity popularity may include one or more of CRS takes the top two highest ranked topic domains and Frequency counts in an index of news articles, as well as a assigns them as category to the document. In the above corresponding timeline used in order to detect emerging example, all the entity taxonomic paths share a common entities that have rising popularity. top-level node Sports. Therefore, the CRS assigns the cat- 10 User clicks on entities. egory "Sports to the given document. Wikipedia page view counts (e.g., Wikipedia traffic statis Second, the CRS ranks the leaf nodes (facets) of the taxo tics on an hourly basis are publicly available from nomic path, based on the number of entities having the facet Sources such as http://stats.grok.se). and/or the rank of the entity in the ranked entity list. In the Number of mentions in Tweets or other social media mes above example, the ranked list of facets would be: (1) Bas- 15 sages, obtained through the Twitter API, for example. ketball Team (because the Trail Blazers and Lakers are both Popular/trending queries from search engines such as basketball teams), (2) Basketball Player (because Martell Google and Yahoo. Webster is a basketball player), and (3) Sports Venue (because 5. Category Indexing in an Example Embodiment the Staples Center is a sports venue). Those facets would be In one embodiment, for the underlying information assigned to the document as facet categories. 2O retrieval capability, the CRS uses a typical Vector Space Other or additional processing may also be performed. For Model (“VSM) based system. An example of such systems example, any levels or individual nodes/paths in the tax is Apache Lucene. The vector space model procedure can be onomy may be selected as a category, and the same entity divided in to three stages. The first stage includes document aggregation process can then be applied. For example, the indexing where content bearing terms are extracted from CRS may assign a category “entertainment people” by 25 content item text and indexed/stored. The second stage aggregating entities that share taxonomic path EVri/Person/ includes weighting of the indexed terms to enhance retrieval Entertainment. Similarly, by aggregating on path EVri/Per of content items relevant to a user. A common weighting son/Sports/Athlete, the CRS would create an "athletes' cat scheme for terms within a content item is based on the fre egory that covers players in all kinds of sports. quency of term occurrence. The last stage includes ranking Besides nodes/paths in the taxonomy, the CRS may also 30 the content items with respect to a query according to a Support configurable categories based on entity properties. similarity measure. This allows the CRS to create categories from any group of In one embodiment, the CRS creates a special term for each entities that share a common property. A configuration is a set category that is to be assigned to a content item in the index. of key-value pairs, where the key is a facet, and the value is a As discussed above, during the indexing of a contentitem, the list of properties to be checked. For example, a basketball 35 CRS identifies the top Kentities in the content item, collects player entity may have the following configuration: Facet the facets assigned to the top entities, and then annotate the (key)=Basketball Player and Properties (value)=Sports contentitem with those facets using special tokens (facet.bas League. ketball. team for the category/facet Basketball Team). The During processing of a document, the CRS processes the tokens are treated as single terms in the index. During search list of entities assigned to the content item. For each entity, the 40 time, given a facet category Basketball Team as input, the CRS takes its facets, and performs a lookup against its con CRS translates it into a query term “facet.basketball..team.” figuration. If there is a match, the CRS obtains the property Content items containing the query term can then be located list from the configuration. Then, the CRS retrieves the enti and returned. ty's corresponding entry from the entity store. The CRS then The CRS may retrieve and rank content items in the index processes the entity’s properties available in the entity store, 45 by determining how relevanta given contentitem is to a user's and determines if the entity has the property specified in the query. In one embodiment, the more times a query term configuration. If a match is found, the CRS adds the property appears in a content item relative to the number of times the value as a potential category. term appears in all the content items in the collection, the For example, leaf nodes in the example taxonomy of FIG. more relevant that content item is to the query. This is accom 4D are Basketball Player and Basketball Team. The CRS 50 plished by using TF-IDF (term frequency-inverse document can assign finer granularity categories (e.g., NBA basketball, frequency) weighting in the scoring of content items. As NCAA college basketball, FIBA international basketball), by noted above, the CRS creates artificial tokens for the catego using a property configuration. As shown above in FIGS. 4A ries extracted from the content item. These tokens are thus not and 4B, basketball players and teams have a property “Sports terms originally appearing in the content item text. However, League' in the Entity-Store. In our example, the entities are: 55 the CRS can simulate the TF weights on the category tokens, Portland Trail Blazers—Sports League-NBA by controlling how many times each token is added to the Los Angeles Lakers—Sports League NBA index. The frequency may be determined based on the impor Martell Webster Sports League-NBA tance measure of the categories that are computed when rank Again, by aggregating the specified properties from the ing the categories. Another option to use a binary document ranked list of entities, the CRS extracts the top categories 60 vector (e.g., true/false to indicate whether the document is (NBA basketball in this case), and assigns them to the content assigned a particular category). With this approach, the CRS item, as discussed further below. takes the top K categories from the document, and consider As described above, in one embodiment, certain categories them as equally important, such that each category token is are always detected by default, while others can be specified added to the index once. through a configuration. To decide what categories to popu- 65 In addition, Some embodiments can influence search late in the index, human curators may select the categories results by “boosting.” The CRS may apply document level manually based on their knowledge of popular topics in the boosting, by specifying a boost value for a content item as it US 8,725,739 B2 13 14 is added to the index. The boost value for each content item is As discussed above, some embodiments of the CRS may computed as a combination of the items source credibility also determine popular, trending, or recententities for a given (e.g., articles from more credible or popular sources are category. In one embodiment, in addition to retrieving news weighted more) and publishing date (e.g., fresher or more articles, the CRS also produces a summary list of entities/ recent articles are weighted more). 5 concepts that are most popular at a current time for the given The CRS may be configured to retrieve recently published category. The popular entities may be extracted and ranked content items from credible sources. In order to do so, the based on entity occurrences in the top articles returned for the CRS may assign to each content item source a credibility given category. In one embodiment, the CRS iterates through score (e.g., 1 to 5), which is translated into a credibility the top articles in a result set, takes the top entities from each weight. Source credibility can be assigned manually and/or 10 article (top entities for each article are stored in the index), computed based on Some properties of a source (e.g., search and aggregates the occurrence frequency of each entity. engine rank of a source web page, Internet traffic to the Source Popular entities may be further extracted and ranked based on web page, etc.). Credibility weight can then be multiplied by entity popularity scores in an entity store. From the entity recency weight to form a boost value for a content item. store, the CRS may retrieve a list of entities that belong to the Recency weight may be based on the difference between the 15 specified category, ranked by their scores. The score for each publishing date of a document and the date when ingestion entity may be computed based on the total number of recent started. articles (in the content store) that mention the entity and/or 6. Category-based Content Item Retrieval user visit traffic to the entity’s page (e.g., via the CRS and/or In one embodiment, for a given category, the CRS returns external sites such as Wikipedia). a list of relevant and recent content items that are associated 7. Third-party Content Retrieval with the category in the index. The relevance ranking is based As noted above, the CRS may be configured to return, in on factors such as: query matching, based on TF-IDF weight response to a search or other request, indications of content ing as described above; source credibility; and/or publication items from external (e.g., third-party) sources. Third-party date. Sources may include or provide content items of various In one embodiment, the CRS generates a news stream for a 25 media types (e.g., images, videos, audio, social media mes given category by providing popular, fresh, and or breaking sages), some of which may not be indexed by the CRS. The news items that are associated with the category. Popular or CRS may be configured to retrieve results via external APIs or breaking news stories tend to be covered by multiple sources, other retrieval facilities provided by the third-party sources. get updated more frequently, and have more follow-up sto In Some cases, a third-party source may not have or utilize the ries. Thus, the CRS may capture popular stories by determin 30 same category-based approach used by the CRS. In Such ing its coverage in terms of the number of similar articles. cases, the CRS may translate categories used in the CRS to an After retrieving a list of relevant news articles, the CRS approximation query that would retrieve from the third-party may collapse the similar articles that cover the same topic into source content items similar to those that could be retrieved a group. The similarity between two articles is computed via the CRS itself. In one embodiment, categories may be based on one or more of distance between document signa 35 translated into keywords used as part of a search query. How tures computed during index; overlap between article titles; ever, such an approach may not provide good result coverage. overlap between article summaries; source URLs; and/or This problem may be exacerbated for certain media types, publisher. If the similarity measure exceeds some threshold, Such as images, Videos, and Social media messages (e.g., then two articles are considered as similar. tweets), where the textual content is typically very short. For In one embodiment, the CRS detects similar documents by 40 example, using keyword query “NBA Basketball may not computing a signature (e.g., a hash key of fixed length) of a find videos or tweets that are about championship games document Summary. A document Summary is a Snippet of the between the Los Angeles Lakers and Boston Celtics, where document text that best represents the document Subject and the words "NBA and “basketball” are not mentioned in the best matches the search query. To determine a document text, even though Such content items are clearly related to Summary, the CRS may scan through the document text, and 45 NBA Basketball. select a Snippet of a pre-determined window size, by maxi In another approach, for a given category, the CRS gener mizing a combination of the following measures: the density ates a keyword query where the keywords are based on the of the documents top entities appearing in the Snippet and entities related to the given category. In particular, the CRS overlap of the words in the snippet with the words in the takes the top entities (e.g., most popular and recent) within the document title. 50 category, and uses the names and/oraliases of such entities to In one embodiment, the logic for determining whether construct keyword queries against the third-party service. articles A and B are similar is as follows: if the article URLs Thus, if the category is NBA Basketball, the CRS may con are identical, return true; else if the Hamming distance struct a query that includes names of multiple (e.g., five) between the two article signatures is lower than a pre-deter popular NBA Basketball entities, such as the Los Angeles mined threshold, return true; else if overlap between article 55 Lakers, Boston Celtics, LeBron James, and the like. Queries titles is higher than a threshold, and overlap between article may be enhanced using techniques similar to those mentioned Summaries is higher than a threshold, return true; else, return in U.S. Patent Application No. 61/256,851, filed Oct. 30, false. 2009, and entitled “IMPROVING KEYWORD-BASED By using the above logic, articles that cover the same story SEARCH ENGINE RESULTS USING ENHANCED are grouped together. In one embodiment, each group con 60 QUERYSTRATEGIES, incorporated herein by reference in tains a head article, and underneath is Zero or more similar its entirety. articles. Article groups may then be ranked by various factors, In addition, in some cases third-party sources may use including one or more of the number of articles in each group descriptors or other indicators (e.g., tags) that can be mapped, (e.g., the larger the group size, usually the more popular the translated, or otherwise related to categories utilized in the story is); publication dates of the articles, preferring recently 65 CRS. For example, a video sharing service may have “chan published articles; and/or source diversity of the articles, nels' or other higher-level grouping constructs that can be preferring stories covered by many different sources. mapped (manually or automatically) to CRS categories. As US 8,725,739 B2 15 16 another example, a social messaging service (e.g., Twitter) functions performed by one or more of these components may provide streams from credible sources that can be related may be performed externally to the content recommendation to one or more CRS categories. In general. Such mappings system 510. For example, a separate content indexing and may be compiled or generated through automated discovery search system may host the content ingester 511, entity and and/or manual curation. Once a mapping has been estab relationship identifier 512, and at least some of the data store lished, the CRS can generate a query for a third-party source 517. by translating an indicated category into a corresponding The UI manager 515 provides a view and a controller that descriptor used by the third-party source. facilitate user interaction with the content recommendation 8. Example Computing System and Processes system 510 and its various components. For example, the UI FIG. 5 is an example block diagram of an example com 10 manager 515 may provide interactive access to the content puting system for implementing a content recommendation recommendation system 510, such that users can search for system according to an example embodiment. In particular, contentitems related to specified categories. In some embodi FIG. 5 shows a computing system 500 that may be utilized to ments, access to the functionality of the UI manager 515 may implement a content recommendation system 510. be provided via a Web server, possibly executing as one of the Note that one or more general purpose or special purpose 15 other programs 530. In Such embodiments, a user operating a computing systems/devices may be used to implement the Web browser executing on one of the client devices 560 can content recommendation system 510. In addition, the com interact with the content recommendation system 510 via the puting system 500 may comprise one or more distinct com UI manager 515. For example, a user may manually submit a puting systems/devices and may span distributed locations. search for content items related to a specified category. Furthermore, each block shown may represent one or more The API 516 provides programmatic access to one or more Such blocks as appropriate to a specific embodiment or may functions of the content recommendation system 510. For be combined with other blocks. Also, the content recommen example, the API 516 may provide a programmatic interface dation system 510 may be implemented in software, hard to one or more functions of the content recommendation ware, firmware, or in Some combination to achieve the capa system 510 that may be invoked by one of the other programs bilities described herein. 25 530 or some other module. In this manner, the API 516 In the embodiment shown, computing system 500 com facilitates the development of third-party software, such as prises a computer memory (“memory') 501, a display 502, user interfaces, plug-ins, news feeds, adapters (e.g., for inte one or more Central Processing Units (“CPU”) 503, Input/ grating functions of the content recommendation system 510 Output devices 504 (e.g., keyboard, mouse, CRT or LCD into Web applications), and the like. display, and the like), other computer-readable media 505, 30 In addition, the API 516 may be in at least some embodi and network connections 506. The content recommendation ments invoked or otherwise accessed via remote entities, such system 510 is shown residing in memory 501. In other as code executing on one of the client devices 560 or as part of embodiments. Some portion of the contents, some or all of the one of the third-party applications 565, to access various components of the content recommendation system 510 may functions of the content recommendation system 510. For be stored on and/or transmitted over the other computer 35 example, an application on a mobile device may obtain rec readable media 505. The components of the content recom ommended content items for a specified category via the API mendation system 510 preferably execute on one or more 516. As another example, one of the content sources 555 may CPUs 503 and recommend content items, as described herein. push content information to the content recommendation sys Other code or programs 530 (e.g., an administrative interface, tem510 via the API516. The API 516 may also be configured a Web server, and the like) and potentially other data reposi 40 to provide recommendation widgets (e.g., code modules) that tories, such as data repository 520, also reside in the memory can be integrated into the third-party applications 565 and that 501, and preferably execute on one or more CPUs 503. Of are configured to interact with the content recommendation note, one or more of the components in FIG. 5 may not be system 510 to make at least some of the described function present in any specific implementation. For example, some ality available within the context of other applications. embodiments may not provide other computer readable 45 The data store 517 is used by the other modules of the media 505 or a display 502. content recommendation system 510 to store and/or commu The content recommendation system 510 interacts via the nicate information. In particular, modules 511-516 may use network 550 with content sources 555, third-party applica the data store 517 to record various types of information, tions 565, and client computing devices 560. The network including semantic information about content items, such as 550 may be any combination of media (e.g., twisted pair, 50 entities, categories, and relationships. Although the modules coaxial, fiber optic, radio frequency), hardware (e.g., routers, 511-516 are described as communicating primarily through Switches, repeaters, transceivers), and protocols (e.g., TCP/ the data store 517, other communication mechanisms are IP, UDP, Ethernet, Wi-Fi, WiMAX) that facilitate communi contemplated, including message passing, function calls, cation between remotely situated humans and/or devices. The pipes, sockets, shared memory, and the like. devices 560 include desktop computers, notebook computers, 55 In an example embodiment, components/modules of the mobile phones, Smart phones, tablet computers, personal content recommendation system 510 are implemented using digital assistants, and the like. standard programming techniques. For example, the content In a typical embodiment, the content recommendation sys recommendation system 510 may be implemented as a tem 510 includes a content ingester 511, an entity and rela “native” executable running on the CPU 503, along with one tionship identifier 512, a category identifier 513, a user inter 60 or more static or dynamic libraries. In other embodiments, the face manager 515, a content recommendation system content recommendation system 510 may be implemented as application program interface (API) 516, and a data store instructions processed by a virtual machine that executes as 517. The modules 511-5.14 respectively perform functions one of the other programs 530. In general, a range of pro such as those described with reference to modules 211-214 of gramming languages known in the art may be employed for FIG. 2. The content ingester 511, entity and relationship 65 implementing Such example embodiments, including repre identifier 512, user interface manager 515, and API 516 are sentative implementations of various programming language drawn in dashed lines to indicate that in other embodiments, paradigms, including but not limited to, object-oriented (e.g., US 8,725,739 B2 17 18 Java, C++, C#, Visual Basic.NET, Smalltalk, and the like), tures may be stored on tangible, non-transitory storage medi functional (e.g., ML, Lisp, Scheme, and the like), procedural ums. Some or all of the system components and data struc (e.g., C. Pascal, Ada, Modula, and the like), Scripting (e.g., tures may also be stored as data signals (e.g., by being Perl, Ruby, Python, JavaScript, VBScript, and the like), and encoded as part of a carrier wave or included as part of an declarative (e.g., SQL, Prolog, and the like). analog or digital propagated signal) on a variety of computer The embodiments described above may also use either readable transmission mediums, which are then transmitted, well-known or proprietary synchronous or asynchronous cli including across wireless-based and wired/cable-based medi ent-server computing techniques. Also, the various compo ums, and may take a variety of forms (e.g., as part of a single nents may be implemented using more monolithic program or multiplexed analog signal, or as multiple discrete digital ming techniques, for example, as an executable running on a 10 packets or frames). Such computer program products may single CPU computer system, or alternatively decomposed also take other forms in other embodiments. Accordingly, using a variety of structuring techniques known in the art, embodiments of this disclosure may be practiced with other including but not limited to, multiprogramming, multithread computer system configurations. ing, client-server, or peer-to-peer, running on one or more FIG. 6 is an example flow diagram of a content indexer computer systems each having one or more CPUs. Some 15 process performed by an example embodiment. In particular, embodiments may execute concurrently and asynchronously, FIG. 6 illustrates a process that may be implemented by, for and communicate using message passing techniques. Equiva example, one or more elements of the content recommenda lent synchronous embodiments are also supported. Also, tion system 200, such as the content ingester 211, entity and other functions could be implemented and/or performed by relationship identifier 212, and the category identifier 213, each component/module, and in different orders, and by dif described with reference to FIG. 2. The process indexes con ferent components/modules, yet still achieve the described tent items by determining entities and categories related to the functions. indexed content items. In addition, programming interfaces to the data stored as The illustrated process begins at block 602, where it pro part of the content recommendation system 510, such as in the cesses a content item to determine multiple entities refer data store 517, can be available by standard mechanisms such 25 enced by the content item. Determining the multiple entities as through C, C++, C#, and Java APIs; libraries for accessing may include identifying entities referenced by the content files, databases, or other data repositories; through scripting item, each of the determined entities being electronically languages such as XML; or through Web servers, FTP serv represented by the content recommendation system. Deter ers, or other types of servers providing access to stored data. mining the multiple entities may further include ranking the The data store 517 may be implemented as one or more 30 entities on factor Such as the number/quantity of mentions in database systems, file systems, or any other technique for the content item, the position of the mentions of the entity in storing such information, or any combination of the above, the content item, and/or penalties based on the type of the including implementations using distributed computing tech entity. niques. At block 604, the process determines at least one category Different configurations and locations of programs and 35 that is associated with one of the multiple entities. The deter data are contemplated for use with techniques of described mined at least one category may be part of a taxonomy stored herein. A variety of distributed computing techniques are by the content recommendation system and may be associ appropriate for implementing the components of the illus ated with one of the multiple corresponding entities refer trated embodiments in a distributed manner including but not enced by the content item, as determined at block 602. Deter limited to TCP/IP sockets, RPC, RMI, HTTP, Web Services 40 mining the at least one category may further include selecting (XML-RPC, JAX-RPC, SOAP and the like). Other variations a predetermined number of highest ranked entities from enti are possible. Also, other functionality could be provided by ties ranked at block 602, and then selecting and/or aggregat each component/module, or existing functionality could be ing categories associated with the selected entities. distributed amongst the components/modules in different At block 606, the process stores the determined multiple ways, yet still achieve the functions described herein. 45 entities and the determined at least one category. Storing the Furthermore, in some embodiments, some or all of the determined entities and categories may include annotating components of the content recommendation system 510 may the content item (e.g., with a token that represents and entity be implemented or provided in other manners, such as at least or category) in an index or other data structure that Supports partially in firmware and/or hardware, including, but not lim efficient retrieval of content items. ited to one or more application-specific integrated circuits 50 At block 608, the process determines whether there are (ASICs'), standard integrated circuits, controllers executing more content items. If so, the process proceeds to block 602, appropriate instructions, and including microcontrollers and/ else returns. or embedded controllers, field-programmable gate arrays FIG. 7 is an example flow diagram of a content recom (“FPGAs), complex programmable logic devices mender process performed by an example embodiment. In (“CPLDs), and the like. Some or all of the system compo 55 particular, FIG. 7 illustrates a process that may be imple nents and/or data structures may also be stored as contents mented by, for example, one or more elements of the content (e.g., as executable or other machine-readable Software recommendation system 200. Such as the content recom instructions or structured data) on a computer-readable mender 214, described with reference to FIG. 2. The process medium (e.g., as a hard disk; a memory; a computer network provides content items that are related to a specified category, or cellular wireless network or other data transmission 60 Such as by responding to a search query. medium; or a portable media article to be read by an appro The process begins at block 702, where it receives an priate drive or via an appropriate connection, Such as a DVD indication of a category. In one embodiment, receiving the or flash memory device) so as to enable or configure the indication of the category includes receiving a search query/ computer-readable medium and/or one or more associated request that specifies the category. computing systems or devices to execute or otherwise use or 65 At block 704, the process selects a content item that has a provide the contents to perform at least some of the described corresponding category that matches the indicated category. techniques. Some or all of the components and/or data struc The corresponding category may be associated with one or US 8,725,739 B2 19 20 more entities that are referenced by the selected content item Example EntityTypes and that are part of a taxonomy stored by the content recom The following Table defines several example entity types in mendation system. Selecting the content item may include ranking multiple content items that reference the category, an example embodiment. Other embodiments may incorpo based on various factors, such as term frequency, the number 5 rate different types. of times a category token was added to the content item, a credibility score of the content item, and/or a recency score of TABLE 1 the content item. Person At block 706, the process transmits an indication of the Organization selected content item. Transmitting the indication of the con- 10 Location tent item may include transmitting an identifier (e.g., a URL) Concept and/or information about or from the content item (e.g., Ele article title, Summary, text). Condition Some embodiments perform one or more operations/as- Organism pects in addition to, or instead of the ones described with Substance reference to the process of FIG. 7. For example, in one 15 embodiment, after block 706, the process may return to block Example Facets 702 to receive and process additional category indications. In p another embodiment, the process may also aggregate mul- The following Table defines several example facets in an tiple selected content items, such that related content items example embodiment. are grouped together. Other embodiments may incorporate different facets. TABLE 2 RSON actor Evri Person Entertainment Actor RSON animator Evri Person Entertainment Animator RSON cinematographer Evri/Person/Entertainment Cinematographer RSON comedian Evri Person Entertainment Comedian RSON fashion designer Evri/Person/Entertainment Fashion Designer RSON musician Evri Person Entertainment Musician RSON composer Evri/Person/Entertainment Musician/Composer RSON producer Evri/Person/Entertainment Producer RSON director Evri Person Entertainment Director RSON radio personality Evri/Person/Entertainment Radio Personality RSON television personality Evri/Person/Entertainment Television Personality RSON author Evri Person Entertainment Author RSON model Evri Person Entertainment Model RSON screenwriter Evri Person Entertainment Screenwriter RSON playwright Evri/Person/Entertainment Playwright RSON conductor Evri Person/Entertainment? Conductor RODUCT film Evri Product Entertainment Movie RODUCT television show Evri Product Entertainment Television Show RODUCT album Evri Product Entertainment Album RODUCT musical Evri Product EntertainmentMusical RODUCT book Evri Product Entertainment Book RODUCT newspaper Evri Product Publication ERSON politician Evri Person Politics Politician ERSON cabinet member Evri Person Politics, Cabinet Member ERSON government person Evri/Person/Politics/Government Person ERSON political party leader Evri/Person/Politics/Political Party Leader ERSON judge Evri/Person/Politics/Judge ERSON country leader Evri/Person/Politics/Politician/World Leader ERSON ioint chiefs of staff wriPerson Politics, Politician Joint Chiefs of Staff ERSON white house staff Evri Person Politics. White House Staff ERSON activist Evri Person Politics. Activist ERSON lobbyist Evri/Person/Politics/Lobbyist ERSON ambassador Evri Person Politics. Ambassador ERSON analyst Evri/Person/Analyst ERSON iournalist Evri Person Journalist PERSON blogger Evri/Person/Blogger ORGANIZATION band Evri Organization/Entertainment Band ORGANIZATION political party Evri Organization/Politics/Political Party ORGANIZATION advocacy group Evri Organization/Politics Advocacy Group EVENT film award ceremony Evri? Event Entertainment Film Award Ceremony EVENT music award ceremony Evri? Event Entertainment Music Award Ceremony EVENT television award ceremony Evri/Event Entertainment Television Award Ceremony EVENT court case Evri Event Politics Court Case ORGANIZATION television network Evri Organization/Entertainment Company/Television Network ORGANIZATION music production company Evri Organization/Entertainment Company Music Production Company ORGANIZATION film production company Evri Organization/Entertainment Company/Film Production Company LOCATION congressional district Evri/Location/Politics/Congressional District LOCATION military base Evri/Location/Politics/Military Base ORGANIZATION congressional committee Evri Organization/Politics Congressional Committee ORGANIZATION international organization US 8,725,739 B2 21 22 TABLE 2-continued Evri Organization Politics International Organization ORGANIZATION government agency Evri Organization/Politics/Government Agency ORGANIZATION armed force Evri Organization/Politics Armed Force ORGANIZATION terrorist organization Evri Organization/Politics/Terrorist Organization ORGANIZATION us court Evri? Organization/Politics/US Court ORGANIZATION cabinet department Evri Organization/Politics/Cabinet Department LOCATION continent Evri Location Continent LOCATION geographic region Evri Location Geographic Region LOCATION country Evri/Location/Country LOCATION province Evri Location Province LOCATION state Evri Location State LOCATION city Evri/Location/City LOCATION us city Evri/Location City LOCATION neighborhood Evri/Location/Neighborhood LOCATION building Evri/Location/Structure/Building LOCATION island Evri Location Island LOCATION mountain Evri Location Mountain LOCATION body of water Evri/Location/Body of Water ORGANIZATION media companyEvri Organization/Entertainment Company/Media Company ORGANIZATION haute couture house Evri Organization/Entertainment/Company/Haute Couture House ORGANIZATION publishing company Evri Organization/Entertainment Company Publishing Company ORGANIZATION entertainment company Evri Organization/Entertainment Company CONCEPT fictional character Evri Concept Entertainment Fictional Character PERSON military leader Evri/Person/Politics/Military Leader RSON military person Evri/Person/Politics/Military Person ENT military conflict Evri? Event Politics/Military Conflict RSON terrorist Evri Person Politics. Terrorist RSON criminal Evri Person? Criminal RSON explorer Evri/Person/Explorer RSON inventor Evri/Person Technology/Inventor RSON lawyer Evri/Person/Lawyer RSON artist Evri Person Artist RSON painter Evri Person Artist Painter RSON revolutionary Evri/Person/Revolutionary RSONspiritual leader Evri/Person/Spiritual Leader RSON philosopher Evri/Person/Philosopher RSON anthropologist Evri/Person/Anthropologist RSON architect Evri Person Architect RSON historian Evri Person. Historian RSON editor Evri Person Editor RSON astronaut Evri Person Astronaut RSON photographer Evri/Person/Photographer RSON scientist Evri/Person Technology/Scientist RSON economist Evri Person Economist RSON technology person Evri/Person Technology/Technology Person RSON business person Evri/Person/Business/Business Person RSON stock trader Evri Person Business. Business Person Stock Trader ERSON first lady Evri/Person/Politics/First Lady ORGANIZATION us state legislature Evri Organization/Politics/Legislative Body/State Legislature ORGANIZATION legislative body Evri Organization/Politics/Legislative Body ORGANIZATION executive body Evri? Organization/Politics/Executive Body PERSON team owner Evri/Person/Sports/Team Owner RSON sports announcer Evri/Person/Sports/Sports Announcer RSON sports executive Evri/Person/Sports/Sports Executive RSON olympic medalist Evri/Person/Sports/Olympic Medalist RSON athlete Evri/Person/Sports/Athlete RSON coach Evri/Person/Sports/Coach RSON sports official Evri/Person/Sports/Sports Official RSON motorcycle driver Evri/Person/Sports Athlete/Motorcycle Rider RSON race car driver Evri/Person/Sports/Athlete/Race car Driver RGANIZATION auto racing team Evri Organization/Sports/Auto Racing Team E RSON baseball player Evri/Person/Sports Athlete/Baseball Player RGANIZATION baseball team Evri Organization/Sports/Baseball Team ERSON basketball player Evri/Person/Sports Athlete/Basketball Player RGANIZATION basketball team Evri Organization/Sports/Basketball Team ERSON football player Evri/Person/Sports Athlete/Football Player RGANIZATION football team Evri Organization/Sports/Football Team ERSON hockey player Evri/Person/Sports Athlete? Hockey Player RGANIZATION hockey team Evri Organization/Sports/Hockey Team ERSON soccer player Evri/Person/Sports Athlete/Soccer Player RGANIZATION soccer team Evri Organization/Sports/Soccer Team RGANIZATION sports league Evri? Organization/Sports/Sports League ERSON cricketer Evri/Person/Sports/Athlete/Cricketer RGANIZATION cricket team Evri? Organization/SportsCricket Team ERSON cyclist Evri/Person/Sports/Athlete/Cyclis ORGANIZATION cycling team Evri Organization/Sports/Cycling Team US 8,725,739 B2 23 24 TABLE 2-continued ERSON volleyball player Evri/Person/Sports Athlete? Volleyball Player RGANIZATION volleyball team Evri Organization/Sports/Volleyball Team s ERSON rugby player Evri/Person/Sports/Athlete? Rugby Player RGANIZATION rugby team Evri Organization/Sports/Rugby Team RSON boxer Evri/Person/Sports/Athlete/Boxer RSON diver Evri/Person/Sports/Athlete/Diver RSON golfer Evri/Person/Sports/Athlete/Golfer RSON gymnast Evri/Person/Sports/Athlete/Gymnast RSON figure skater Evri/Person/Sports Athlete/Figure Skater RSON horse racing jockey Evri/Person/Sports Athlete? Horse Racing Jockey RSON lacrosse player Evri/Person/Sports Athlete/Lacrosse Player GANIZATION lacrosse team Evri? Organization/Sports/Lacrosse Team RSON rower Evri/Person/Sports/Athlete/Rower RSON Swimmer Evri/Person/Sports Athlete/Swimmer RSON tennis player Evri/Person/Sports Athlete,Tennis Player RSON track and field athlete Evri/Person/Sports Athlete/Track and Field Athlete RSON wrestler Evri/Person/Sports/Athlete/Wrestler ERSON triathlete Evri/Person/Sports/Athlete?Triathlete EVENT sports competition Evri? Event Sports/Sports Event Sporting Competition EVENT sports event Evri? Event Sports Sports Event EVENT olympic sport Evri? Event Sports/Olympic Sports EVENT election Evri Event Politics. Election LOCATION sports venue Evri/Location/Sports/Sports Venue ORGANIZATION sports division Evri? Organization/Sports/Sports Division ORGANIZATION sports event promotion company Evri Organization/Sports/Sports Event Promotion Company ORGANIZATION sports organization Evri Organization/Sports/Sports Organization ORGANIZATION company Evri? Organization/Business/Company ORGANIZATION news agency Evri Organization/Business/Company/News Agency PRODUCT cell phone Evri ProductiTechnology Cell Phone PRODUCT computer Evri ProductiTechnology/Computer PRODUCT software Evri/Product/Technology/Software PRODUCT video game Evri Product Technology/Software/Video Game PRODUCT video game console Evri ProductiTechnology/Video Game Console PRODUCT media player Evri Product?Technology/Media player RGANIZATION website Evri Organization/Technology/Website RGANIZATION technology company Evri? Organization/Technology/Company 8RODUCT magazine Evri Product? Publication O RGANIZATION financial services company Vri? Organization/Business Company Financial Services Company RGANIZATION radio network Evri Organization/Entertainment Company/Radio Network RGANIZATION futures exchange Evri? Organization/Business/Futures Exchange RGANIZATION stock exchange Evri? Organization/Business/Stock Exchange RGANIZATION government sponsored enterprise wri/Organization/Politics/Government Sponsored Enterprise RGANIZATION political organization Evri Organization/Politics/Political organization RGANIZATION labor union Evri? Organization/Politics/Labor Union RGANIZATION nonprofit corporation wri/Organization/Business/Company/Nonprofit Corporation RGANIZATION nonprofit organization Evri Organization/Nonprofit Organization RGANIZATION national laboratory Evri Organization/Politics/National Laboratory RGANIZATION unified combatant commands :Evri Organization/Politics/Unified Combatant Commands ORGANIZATION research institute Evri? Organization/Research Institute CONCEPT stock market index Evri Concept Business/Stock Market Index PERSON business executive Evri Person Business. Business Person Business Executive PERSON corporate director Evri/Person/Business/Business Person/Corporate Director PERSON banker Evri Person Business. Business Person Banker PERSON publisher Evri/Person/Business/Business Person/Publisher PERSON us politician Evri Person Politics, U.S. Politician PERSON nobel laureate Evri Person Nobel Laureate PERSON chemist Evri Person Chemist PERSON physicist Evri/Person/Physicist ORGANIZATION business organization Evri Organization/Business/Business Organization ORGANIZATION consumer organization Evri Organization/Business/Consumer Organization ORGANIZATION professional association Evri? Organization/Business/Professional Association PERSON investor Evri Person BusinessBusiness Person Investor PERSON financier Evri Person Business. Business Person Financier PERSON money manager Evri/Person/Business/Business Person/Money Manager ORGANIZATION aerospace company Evri Organization Business Company Aerospace Company ORGANIZATION advertising agency Evri Organization Business Company Advertising Company ORGANIZATION agriculture company Evri Organization Business Company Agriculture Company ORGANIZATION airline Evri Organization/Business/Company/Airline ORGANIZATION architecture firm Evri Organization/Business/Company/Architecture Firm ORGANIZATION automotive company Evri Organization Business Company Automotive Company ORGANIZATION chemical company Evri Organization/Business/Company/Chemical Company US 8,725,739 B2 25 26 TABLE 2-continued ORGANIZATION clothing company Evri? Organization/Business/Company/Clothing Company ORGANIZATION consulting company Evri Organization Business Company Consulting Company ORGANIZATION cosmetics company Evri Organization Business Company Cosmetics Company ORGANIZATION defense company Evri Organization/Business/Company/Defense Company ORGANIZATION distribution company Evri Organization Business Company Distribution Company ORGANIZATION gaming company Evri Organization Business Company Gaming Company ORGANIZATION electronics company Evri Organization Business Company Electronics Company ORGANIZATION energy company Evri Organization/Business/Company/Energy Company ORGANIZATION hospitality company Evri Organization Business Company Hospitality Company ORGANIZATION insurance company Evri? Organization/Business/Company/Insurance Company ORGANIZATION law firm Evri Organization/Business/Company/Law Firm ORGANIZATION manufacturing company Evri Organization Business Company Manufacturing Company ORGANIZATION mining company Evri? Organization/Business/Company/Mining Company ORGANIZATION pharmaceutical company Evri Organization Business Company Pharmaceutical Company ORGANIZATION railway company Evri Organization/Business/Company/Railway ORGANIZATION real estate company Evri Organization Business Company Real Estate Company ORGANIZATION retailer Evri Organization/Business/Company/Retailer ORGANIZATION shipping company Evri Organization/Business/Companyi Shipping Company ORGANIZATION software company Evri Organization.Technology Company Software Company ORGANIZATION steel company Evri? Organization/Business/Company Steel Company ORGANIZATION telecommunications company Evri Organization Business Company Telecommunications Company ORGANIZATION utilities company Evri? Organization/Business/Company/Utilities Company ORGANIZATION wholesaler Evri Organization/Business/Company/Wholesaler ORGANIZATION television production company Evri Organization/Entertainment Company Television Production Company ORGANIZATION food company Evri? Organization/Business/Company/Food Company ORGANIZATION beverage company Evri Organization Business Company Food Company Beverage Company ORGANIZATION restaurant Evri? Organization/Business/Company/Food Company/Restaurant ORGANIZATION winery Evri Organization Business Company Food Company Beverage Company EVENT film festival Evri Event Entertainment Film Festival ORGANIZATION film festival Evri? Event? Entertainment? Film Festival PRODUCT anime Evri Product Entertainment Anime PRODUCT aircraft Evri Product Aircraft PRODUCT military aircraft Evri Product Aircraft Military Aircraft PRODUCT vehicle Evri Product Vehicle PRODUCT ballet Evri Product Entertainment Ballet PRODUCT opera Evri/Product/Entertainment/Opera PRODUCT painting Evri? Product? Entertainment Painting PRODUCT song Evri Product Entertainment/Single EVENT technology conference Evri Event/Technology/Technology Conference CONCEPT legislation Evri Concept Politics/Legislation CONCEPT treaty Evri/Concept/Politics/Treaty ORGANIZATION trade association Evri Organization/Business/Trade Association ORGANIZATION technology organization Evri Organization.Technology Technology Organization ORGANIZATION educational institution Evri Organization/Educational Institution LOCATION museum Evri/Location/Structure/Building/Museum LOCATION religious building Evri? Location Structure/Building/Religious Building PERSON astronomer Evri Person Astronomer PERSON mathematician Evri Person Mathematician PERSON academic Evri Person Academic PERSON dancer Evri Person Entertainment Dancer PRODUCT play Evri/Product/Entertainment Play LOCATION botanical garden Evri/Location. Botanical Garden LOCATION hospital Evri/Location/Health/Hospital PERSON psychiatrist Evri/Person/Health/Psychiatrist PERSON physician Evri/Person/Health/Physician PERSON nurse Evri Person Health Nurse ORGANIZATION journalism organization Evri Organization/Journalism Organization ORGANIZATION healthcare company Evri Organization Business Company Healthcare Company ORGANIZATION religious organization Evri? Organization/Religious Organization PERSON biologist Evri/Person/Scientist/Biologist PERSON biochemist Evri Person Scientist Biochemist PERSON botanist Evri Person Scientist Botanist PERSON poet Evri/Person/Entertainment Authori Poet PERSON curler Evri/Person/Sports/Athlete/Curler PERSON biathlete Evri/Person/Sports/Athlete/Biathlete US 8,725,739 B2 27 28 TABLE 2-continued RSON alpine skier Evri/Person/Sports/Athlete? Alpine Skier RSON cross-country skier Evri/Person/Sports Athlete/Cross-country Skier RSON freestyle skier Evri/Person/Sports/Athlete/Freestyle Skier RSON luger Evri/Person/Sports Athlete/Luger RSON nordic combined skier Evri/Person/Sports/Athlete/Nordic Combined Skier RSON speed skater Evri/Person/Sports/Athlete/Speed Skater RSON skeleton racer Evri/Person/Sports Athlete? Skeleton Racer RSON ski jumper Evri/Person/Sports Athlete? Ski Jumper RSON Snowboarder Evri/Person/Sports/Athlete? Snowboarder RSON bobsledder Evri/Person/Sports/Athlete/Bobsledder RSON bodybuilder Evri/Person/Sports/Athlete/Bodybuilder RSON equestrian Evri/Person/Sports/Athlete Equestrian RSON fencer Evri/Person/Sports/Athlete/Fencer RSON hurler Evri/Person/Sports/Athlete/Hurler RSON martial artist Evri/Person/Sports/Athlete/Martial Artist RSON canoer Evri/Person/Sports/Athlete? Canoer LOCATION music venue Evri Location. EntertainmentMusic Venue LOCATION aquarium Evri/Location/Aquarium LOCATION cemetery Evri/Location? Cemetery LOCATION national park Evri/Location/National Park LOCATION volcano Evri Location Volcano LOCATION zoo Evri Location Zoo LOCATION structure Evri Location Structure LOCATION airport Evri/Location Structure/Airport LOCATION bridge Evri/Location Structure/Bridge LOCATION hotel Evri Location Structure, Hotel LOCATION palace Evri/Location Structure/Palace LOCATION monument Evri Location Structure, Monument LOCATION Street Evri Location Stree LOCATION amusement park Evri/Location/Amusement Park LOCATION unitary authority Evri/Location/Unitary Authority RODUCT drug brand Evri Product? Health Drug Brand RODUCT weapon Evri/Product/Weapon RODUCT missile system Evri ProductWeapon Missile System RODUCT firearm Evri/Product/Weapon/Firearm RODUCT artillery Evri/Product/Weapon/Artillery RODUCT anti-aircraft weapon Evri/Product Weapon/Anti-aircraft Weapon RODUCT anti-tank weapon Evri Product? Weapon/Anti-tank Weapon RODUCT biological weapon Evri Product? Weapon/Biological Weapon RODUCT chemical weapon Evri Product? Weapon/Chemical Weapon HEMICAL chemical weapon Evri Product? Weapon/Chemical Weapon UBSTANCE chemical weapon Evri ProductWeapon/Chemical Weapon RODUCT explosive Evri/Product/Weapon/Explosive RODUCT weapons launcher Evri ProductWeapon. Weapons Launcher ERSON chess player Evri/Person/Chess Player ERSON sculptor Evri/Person? Artist/Sculptor RODUCT game Evri/Product/Game RGANIZATION theater company Vri? Organization/Entertainment Company. Theater Company ERSON badminton player Evri/Person/Sports Athlete/Badminton Player RODUCT naval ship Evri? Product? Watercraft/Naval Ship RODUCT battleship Evri/Product/Watercraft/Naval Ship/Battleship RODUCT cruiser Evri/Product/Watercraft/Naval Ship/Cruiser RODUCT aircraft carrier Evri Product? Watercraft/Naval Ship/Aircraft Carrier RODUCT destroyer Evri/Product/Watercraft/Naval Ship/Destroyer RODUCT frigate Evri/Product/Watercraft/Naval Ship/Frigate RODUCT Submarine Evri Product? Watercraft Naval Ship/Submarine RODUCT cruise ship Evri Product? Watercraft Cruise Ship RODUCT yacht Evri/Product/Watercraft/Yacht RODUCT ocean liner Evri Product? Watercraft Ocean Liner LOCATION county Evri/Location County PRODUCT symphony Evri Product Entertainment Symphony ORGANIZATION television station Evri Organization/Entertainment/Company/Television Station ORGANIZATION radio station Evri? Organization/Entertainment Company/Radio Station CONCEPT constitutional amendment Evri? Concept/Politics/Constitutional Amendment PERSON australian rules footballer Evri/Person/Sports/Athlete? Australian Rules Footballer ORGANIZATION australian rules football team Evri Organization/Sports/Australian Rules Football Team ORGANIZATION criminal organization Evri Organization/Criminal Organization PERSON poker player Evri/Person/Poker Player PERSON bowler Evri/Person/Sports/Athlete/Bowler PERSONyacht racer Evri/Person/Sports Athlete/Yacht Racer PERSON water polo player Evri/Person/Sports/Athlete Water Polo Player PERSON field hockey player Evri/Person/Sports/Athlete? Field Hockey Player PERSON skateboarder Evri/Person/Sports/Athlete/Skateboarder PERSON polo player Evri/Person/Sports/Athlete/Polo Player PERSON gaelic footballer Evri/Person/Sports Athlete/Gaelic Footballer PRODUCT programming language Evri ProductiTechnology Programming Language PERSON engineer Evri/Person Technology/Engineer US 8,725,739 B2 29 30 TABLE 2-continued EVENT cybercrime Evri? Event/Technology/Cybercrime EVENT criminal act Evri Event Criminal Act ERSON critic Evri Person Critic RSON pool player Evri/Person/Pool Player RSON Snooker player Evri/Person/Snooker Player RSON competitive eater Evri/Person/Competitive Eater ODUCT data storage medium Evri ProductiTechnology/Data Storage Medium ODUCT data storage device Evri Product?Technology/Data Storage Device RSON mountain climber Evri Person Mountain Climber RSON aviator Evri Person Aviator RGANIZATION cooperative Evri? Organization Cooperative ONCEPT copyright license Evri Concept Copyright License VENT observance Evri Event Observance RSON outdoor sportsperson Evri/Person/Sports/Outdoor Sportsperson C RSON rodeo performer Evri/Person/Sports/Rodeo Performer RSON sports shooter Evri/Person/Sports Athlete/Sports Shooter ONCEPT award Evri Concept Award ONCEPT entertainment series Evri? Concept Entertainment/Entertainment Series RSON chef Evri Person Chef RSON cartoonist Evri Person Entertainment? Cartoonist RSON comics creator Evri Person Entertainment Comics Creator RSON nobility Evri/Person/Nobility RSON porn star Evri/Person/Porn Star RSON archaeologist Evri/Person/Scientist Archaeologist RSON paleontologist Evri/Person/Scientist Paleontologist C RSON victim of crime Evri Person Victim of Crime CATION region Evri/Location/Region RSON linguist Evri/Person/Linguist RSON librarian Evri Person Librarian RSON bridge player Evri/Person/Bridge Player RSON choreographer Evri/Person/Entertainment/Choreographer PRODUCT camera Evri Product Technology/Camera PRODUCT publication Evri/Product/Publication PRODUCT comic Evri Product Entertainment? Comic PRODUCTs hort story Evri? Product Entertainment Short Story O RGANIZATION irregular military organization Vri? Organization. Irregular Military Organization BST ANCE chemical element Evri Substance, Chemical Element BST ANCE alkaloid Evri Substance Organic Compound Alkaloid BST ANCE glycoside Evri Substance? Glycoside BST ANCE amino acid Evri Substance, Amino Acid BST ANCE protein Evri Substance/Protein BST ANCE enzyme Evri Substance/Enzyme BST ANCE hormone Evri Substance, Hormone BST ANCE hydrocarbon Evri Substance/Organic Compound Hydrocarbon BST ANCE inorganic compound Evri Substance, Inorganic Compound BST ANCE lipid Evri Substance/Organic Compound Lipid BST ANCE steroid Evri Substance/Organic Compound Lipid Steroid BST ANCE molecule Evri Substance/Molecule BST ANCE polymer Evri Substance/Molecule? Polymer BST ANCE terpene Evri Substance/Organic Compound Terpene BST ANCE toxin Evri Substance, Toxin BST ANCE ibiotic Evri Substance, Health Antibiotic BST ANCE ioxidant Evri Substance, Health Antioxidant BST ANCE i-inflammatory Evri Substance/Health/Anti-inflammatory BST ANCE iaisthmatic drug Evri Substance/Health Antiasthmatic drug BST ANCE iconvulsant Evri Substance, Health Anticonvulsant BST ANCE ihistamine Evri Substance, Health Antihistamine BST ANCE ihypertensive Evri Substance/Health/Antihypertensive BST ANCE iviral Evri Substance, Health Antivira BST ANCE inkiller Evri Substance Health Painkiller BST ANCE P 8. inkiller Evri Substance, Health Painkiller BST ANCE esthetic Evri Substance, Health Anesthetic BST ANCE ibody Evri Substance/Antibody BST ANCE chemotherapeutic drug Evri Substance/Health Chemotherapeutic BST ANCE anti-diabetic drug Evri Substance/Health Anti-diabetic BST ANCE antianginal drug Evri Substance Health Antianginal BST ANCE muscle relaxant Evri Substance, Health Muscle relaxant BST ANCE hypolipidemic drug Evri Substance/Health Hypolipidemic Drug BST ANCE psychoactive drug Evri Substance/Health/Psychoactive Drug BST ANCE vaccine Evri Substance, Health Vaccine BST ANCE gastrointestinal drug Evri Substance/Health Gastrointestinal Drug BST ANCE erectile dysfunction drug Evri Substance/Health/Erectile Dysfunction Drug BST ANCE organometallic compound rif Substance Organic Compound Organometallic Compound UBST ANCE phenol Evri Substance/Organic Compound Phenol US 8,725,739 B2 31 32 TABLE 2-continued BSTANCE ketone Evri Substance/Organic Compound Ketone BSTANCE amide Evri Substance/Organic Compound. Amide BSTANCE ester Evri Substance/Organic Compound/Ester BSTANCE ether Evri Substance/Organic Compound Ether BSTANCE heterocyclic compound i.Substance Organic Compound Heterocyclic Compound BSTANCE organic compound Evri Substance/Organic Compound BSTANCE carbohydrate Evri Substance/Organic Compound/Carbohydrate BSTANCE peptide Evri Substance/Organic Compound Peptide BSTANCE organohalide Evri Substance/Organic Compound/Organohalide BSTANCE organosulfur compound i.Substance Organic Compound Organosulfur Compound BSTANCE aromatic compound Evri Substance/Organic Compound. Aromatic Compound BSTANCE carboxylic acid Evri Substance/Organic Compound/Carboxylic Acid BSTANCE nucleic acid Evri Substance Nucleic Acid BSTANCE ion Evri Substance, Ion ORGANISM cyanobacterium Evri? Organism/Health/Cyanobacterium ORGANISM gram-positive bacterium Evri Organism/Health Gram-positive Bacterium ORGANISM gram-negative bacterium Evri Organism/Health Gram-negative Bacterium ORGANISM acid-fast bacterium Evri Organism. Health/Acid-fast Bacterium ORGANISM dna virus Evri? Organism. Health/DNA Virus ORGANISMrna virus Evri Organism/Health/RNA Virus CONDITION symptom Evri Condition/Health Symptom CONDITION injury Evri/Condition/Health/Injury CONDITION inflammation Evri? Condition Health Inflammation CONDITION disease Evri Condition. Health Disease CONDITION cancer Evri? Condition Health Diseasef Cancer ORGANISM medicinal plant Evri? Organism/Health/Medicinal Plant ORGANISM poisonous plant Evri Organism/Poisonous Plant ORGANISM herb Evri/Organism/Herb CONCEPT medical procedure Evri Concept Health Medical Procedure ORGANISM bacterium Evri Organism/Health. Bacterium ORGANISM virus Evri/Organism/Health/Virus ORGANISM horse Evri Organism/Horse PERSON fugitive Evri/Person/Fugitive ORGANIZATION military unit Evri? Organization/Politics/Military Unit ORGANIZATION law enforcement agency Evri Organization Politics Law Enforcement Agency LOCATION golf course Evri/Location/Golf Course PERSON law enforcement agent Evri/Person/Politics/Law Enforcement Agent PERSON magician Evri/Person/Entertainment Magician LOCATION educational institution Evri Organization/Educational Institution CONCEPT social program Evri Concept Politics/Social Program EVENT international conference Evri Event Politics International Conference

All of the above U.S. patents, U.S. patent application pub- 40 The invention claimed is: lications, U.S. patent applications, foreign patents, foreign 1. A computer-implemented method in a content recom patent applications and non-patent publications referred to in mendation system, the method comprising: this specification and/or listed in the Application Data Sheet, processing a corpus of content items to determine, for each including but not limited to U.S. Provisional Patent Applica of the content items, multiple corresponding entities 45 referenced by the content item, each of the determined tion No. 61/408,965, entitled “CATEGORY-BASED CON entities being electronically represented by the content TENT RECOMMENDATION filed Nov. 1, 2010, is incor recommendation system; porated herein by reference, in its entirety. determining, for each of at least Some of the content items, From the foregoing it will be appreciated that, although at least one corresponding category that is part of a specific embodiments have been described herein for pur- 50 taxonomy that is that is represented as a graph stored by poses of illustration, various modifications may be made the content recommendation system and that is associ without deviating from the spirit and scope of this disclosure. ated with one of the multiple corresponding entities For example, the methods, techniques, and systems for cat referenced by the content item, wherein determining the egory-based content recommendation are applicable to other at least one corresponding category includes aggregat architectures. For example, instead of recommending textual 55 ing common nodes in taxonomic paths that are associ ated with the determined entities and that are part of the content items, the techniques may be used to automatically graph, by: recommend other types of items, such as music or other audio determining a first taxonomic path associated with a first items, videos, applications (e.g., mobile applications), online one of the determined entities, the first path including activities, or the like. Also, the methods, techniques, and 60 multiple connected nodes that are in the graph and systems discussed herein are applicable to differing query that represent a hierarchy of categories for the first languages, protocols, communication media (optical, wire entity; less, cable, etc.) and devices (e.g., desktop computers, wire determining a second taxonomic path associated with a less handsets, electronic organizers, personal digital assis second one of the determined entities, the second path tants, tablet computers, portable email machines, game 65 including multiple connected nodes that are in the machines, pagers, navigation devices such as GPS receivers, graph and that represent a hierarchy of categories for etc.). the second entity; and US 8,725,739 B2 33 34 determining a common node between the first and sec into groups of content items, wherein similarity between two ond taxonomic paths, the common node representing content items is based on at least one of distance between the at least one corresponding category, such that the signatures of the two content items, amount of overlap firs and second entities are both in an is-a relationship between titles of the two content items, amount of overlap with the at least one corresponding category; and between Summaries of the two content items, amount of over storing, for each of the content items, the determined lap between URLs referencing the two content items, and multiple corresponding entities and the determined at publishers of the two content items. least one corresponding category. 16. The method of claim 10, further comprising: 2. The method of claim 1 wherein determining the at least determining content items that are related to the indicated one corresponding category includes traversing a path in the 10 taxonomy stored by the content recommendation system. category but that are not in the corpus of content items. 3. The method of claim 1 wherein determining the at least 17. The method of claim 16 wherein determining content one corresponding category includes traversing one or more items that are related to the indicated category but that are not relations in the taxonomy stored by the content recommen in the corpus of content items includes generating a keyword dation system, the relations including at least one of an is-a 15 query that includes names of popular entities in the indicated relation, a part-of relation, or a member-of relation. category. 4. The method of claim 1 wherein processing the corpus of 18. The method of claim 16 wherein determining content content items includes ranking, for one of the content items, items that are related to the indicated category but that are not the determined multiple corresponding entities. in the corpus of content items includes receiving from a 5. The method of claim 4 wherein ranking the determined third-party content service indications of at least one of a multiple corresponding entities includes ranking an entity Video, an image, an audio file, an instant message, or a mes based on at least one of the quantity of times that the entity is sage in a social network. referenced by the content item, a position/location of a refer 19. The method of claim 10 wherein providing indications ence of the entity, or a penalty assessed based on a type of the of the selected content items includes presenting information entity. 25 about the selected content items on a display Screen. 6. The method of claim 4 wherein determining the at least 20. The method of claim 10 wherein providing indications one corresponding category includes selecting a predeter of the selected content items includes transmitting informa mined number of highest ranked entities from the ranked tion about the selected content items to a remote client sys entities. tem. 7. The method of claim 1 wherein determining the at least 30 21. The method of claim 1, further comprising: one corresponding category includes ranking leaf node cat determining popular entities for an indicated category, the egories of taxonomic paths associated with the determined popular entities having recently received an increased multiple corresponding entities, the ranking based on the number of references by content items in the corpus quantity of entities having a particular category and/or the a and/or having more references by content items in the rank of an entity in a ranked list of entities. 35 corpus than other entities; and 8. The method of claim 1 wherein storing the determined transmitting indications of the determined popular entities. multiple corresponding entities and the determined at least 22. A non-transitory computer-readable medium having one corresponding category includes annotating a content contents that, when executed, enable a computing system to item entry in an index with tokens that reflect the determined recommend content, by performing a method comprising: at least one category. 40 processing a corpus of content items to determine, for each 9. The method of claim 8, further comprising: of the content items, multiple corresponding entities searching for content items in the index that include a referenced by the content item, each of the determined specified token that reflects the indicated category. entities being electronically represented by the content 10. The method of claim 1, further comprising: recommendation system; receiving an indication of a category: 45 determining, for each of at least Some of the content items, Selecting one or more of the content items that each have a at least one corresponding category that is part of a corresponding category that matches the indicated cat taxonomy that is represented as a graph stored by the egory; and content recommendation system and that is associated providing indications of the selected content items. with one of the multiple corresponding entities refer 11. The method of claim 10 wherein selecting the one or 50 enced by the content item, wherein determining the at more content items includes ranking content items based on least one corresponding category includes aggregating term frequency minus inverse document frequency. common nodes in taxonomic paths that are associated 12. The method of claim 10 wherein selecting the one or with the determined entities and that are part of the more content items includes ranking content items based the graph, by: quantity of times a category token was added to a content item 55 determining a first taxonomic path associated with a first index. one of the determined entities, the first path including 13. The method of claim 10 wherein selecting the one or multiple connected nodes that are in the graph and more content items includes ranking the one or more content that represent a hierarchy of categories for the first items based on a credibility score determined for each content entity; item. 60 determining a second taxonomic path associated with a 14. The method of claim 10 wherein selecting the one or second one of the determined entities, the second path more content items includes ranking the one or more content including multiple connected nodes that are in the items based on recency of each content item, such that more graph and that represent a hierarchy of categories for recent contentitems are ranked higher than less recent content the second entity; and items. 65 determining a common node between the first and Sec 15. The method of claim 10 wherein selecting the one or ond taxonomic paths, the common node representing more content items includes collapsing similar content items the at least one corresponding category, such that the US 8,725,739 B2 35 36 first and second entities are both in an is-a relationship ated with one of the multiple corresponding entities with the at least one corresponding category: referenced by the content item, wherein the at least providing category-based content recommendations, by: one corresponding category is determined by aggre receiving an indication of a category: gating common nodes in taxonomic paths that are Selecting a content item that is one of the at least some of 5 associated with the determined entities and that are the content items and that has a corresponding cat part of the graph, by: egory that matches the indicated category, the corre determining a first taxonomic path associated with a first sponding category being one of the determined cat one of the determined entities, the first path including egories; and multiple connected nodes that are in the graph and transmitting an indication of the selected content item. 10 23. The computer-readable medium of claim 22 wherein that represent a hierarchy of categories for the first Selecting the content item includes collapsing similar content entity; items into groups of content items, wherein similarity determining a second taxonomic path associated with a between two content items is based on at least one of distance second one of determined entities, the second path between signatures of the two content items, amount of over 15 including multiple connected nodes that are in the lap between titles of the two content items, amount of overlap graph and that represent a hierarchy of categories for between summaries of the two content items, amount of over the first entity; lap between URLs referencing the two content items, and/or determining a common node between the first and sec publishers of the two content items. ond taxonomic paths, the common node representing 24. The computer-readable medium of claim 22 wherein 20 the at least one corresponding category, such that the the computer-readable medium is a memory in the computing first and secondentities are both in an is-arelationship system. with the at least one corresponding category; and 25. The computer-readable medium of claim 22 wherein receive from a search query an indication of a category: the contents are instructions that, when executed, cause the Select a content item from the corpus of content items, the computing system to perform the method. 25 Selected content item having a corresponding category 26. A computing system configured to recommend content, that matches the indicated category, the corresponding comprising: category being one of the determined categories; and a memory; transmit an indication of the selected content item. a module stored on the memory that is configured, when 27. The computing system of claim 26 wherein the module executed, to: 30 includes software instructions for execution in the memory of process a corpus of content items to determine, for each the computing system. of the content items, multiple corresponding entities 28. The computing system of claim 26 wherein the module referenced by the content item, each of the determined is a content recommendation system. entities being electronically represented by the con 29. The computing system of claim 26 wherein the module tent recommendation system: 35 is a category-based news service configured to recommend determine, for each of at least some of the content items, news items to at least one of a personal digital assistant, a at least one corresponding category that is part of a Smartphone, a tablet computer, a laptop computer, and/or a taxonomy that is represented as a graph stored by the third-party application. content recommendation system and that is associ UNITED STATES PATENT AND TRADEMARK OFFICE CERTIFICATE OF CORRECTION

PATENT NO. : 8,725,739 B2 Page 1 of 1 APPLICATION NO. 13/286778 DATED : May 13, 2014 INVENTOR(S) : Jisheng Liang et al. It is certified that error appears in the above-identified patent and that said Letters Patent is hereby corrected as shown below:

Claims: In column 32, line 50, in claim 1, remove the Second occurrence of “that is. In column 35, line 2, in claim 22, insert --and-after the “:. In column 36, line 13, in claim 26, insert --the-- before “determined. In column 36, line 16, in claim 26, replace “first with --second--.

Signed and Sealed this Ninth Day of August, 2016 74-4-04- 2% 4 Michelle K. Lee Director of the United States Patent and Trademark Office