(12) United States Patent (10) Patent No.: US 6,785,688 B2 Abajian Et Al
Total Page:16
File Type:pdf, Size:1020Kb
USOO6785688B2 (12) United States Patent (10) Patent No.: US 6,785,688 B2 Abajian et al. (45) Date of Patent: Aug. 31, 2004 (54) INTERNET STREAMING MEDIA 5,920,854 A 7/1999 Kirsch et al. .................. 707/3 WORKFLOW ARCHITECTURE 5.941,944 A 8/1999 Messerly ... 709/203 5,953,718 A 9/1999 Wical ............................ 707/5 (75) Inventors: Aram Christian Abajian, Seattle, WA 2. A yo ity et al. ................... A. 2- - -a- ll . - - - - E. Roadstander 5.991,809 A 11/1999 Kriegsman ... 709/226 eClimOnd, s 6,112.203 A 8/2000 Bharat et al. .................. 707/5 Chao-Chueh Lee, Bellevue, WA (US); 6,138,113 A 10/2000 Dean et al. .................... 707/2 Austin David Dahl, Seattle, WA (US); 6,151,584. A 11/2000 Papierniak et al. ........... 705/10 John Anthony Derosa, Seattle, WA 6,175,830 B1 1/2001 Maynard ................ ... 707/5 (US); Charles A. Porter, Seattle, WA 6,377,995 B2 * 4/2002 Agraharam et al. ........ 709/231 (US); Eric Carl Rehm, Bainbridge 6,389,467 B1 * 5/2002 Eyal ........................... 709/223 Island, WA (US); Jennifer Lynn Kolar, 6,484,199 B2 11/2002 Eyal - - - - ... 709/223 Seattle, WA (US); Srinivasan 6,519,648 B1 2/2003 Eyal .......... ... 709/231 Sudanagunta Scattle WA (US) 6,598,051 B1 * 7/2003 Wiener et al. .............. 707/100 (73) Assignee: America Online, Inc., Dulles, VA (US) OTHER PUBLICATIONS - - - - Eric Rehm, Representing Internet Streaming Media Meta (*) Notice: Subject to any disclaimer, the term of this data. Using MPEG-7 Multimedia Description Schemes, Jul. patent is extended or adjusted under 35 2, 2000, pp. 1-14. U.S.C. 154(b) by 25 days. Network Working9. Group,p Dublin Core Metadata for Resource Discovery, Sep. 1998, pp. 1-10. (21) Appl. No.: 09/876,941 Taalee Semantic Engine Brochure. (22) Filed: Jun. 8, 2001 Eberman, et al., Compaq, Indexing Multimedia for the (65) Prior PublicationO O Data Internet, Cambridgeg Resarch Laboratory,ry Mar. 1999. US 2002/0099695 A1 Jul. 25, 2002 (List continued on next page.) Related U.S. ApplicationO O Data Primary Examiner-Charles Rones (60) Provisional application No. 60/252,273, filed on Nov. 21, (74) Attorney, Agent, or Firm-Perkins Coie LLP 2000. (57) ABSTRACT (51) Int. Cl." ................................................ G06F 17/30 A method and System for utilizing metadata to Search for media, Such as multimedia and Streaming media, includes (52) U.S. Cl. ........................ 707/102; 707/4; 707/104.1; Searching for the media, receiving results, extracting meta 709/223 data associated with the media, enhancing the extracted metadata, and grouping the Search results in accordance with (58) Field of Search ................................ 707/1, 3, 4, 6, attributes of the enhanced metadata. Enhancing and group 707/10, 100, 104.1; 709/223, 231, 245 ing include adding related metadata to the database of metadata, iteratively using metadata to Search for more (56) Ref CS Cited media related data, removing duplicate URLS, collapsing U.S. PATENT DOCUMENTS URLS that are variants of each other, and masking out 5,241,305 A 8/1993 Fascenda et all 340/825.44 Superfluous terms from URLs. The resultant metadata and 5,345,227 A 9/1994 Fascenda et al.- ......- - - - - - 340/825.22 media files are available to users and Search engines.9. 5,483,522 A 1/1996 Derby et al. .................. 370/54 5.917424. A 6/1999 Goldman et al. ...... 340/825.44 30 Claims, 6 Drawing Sheets FUTUREALIDITY CHECKS 300 MULTI EEEJ Will wR. \- TITLEPAGEKEYWORDS, PAGE RETRY ADATOR DESCRIPTION: EXTRACTION (ANNOTATIONAGENT1) DETERMINES WHETHERA GROUPER 76 66 rebelAGEN MULTIMEDARLISYORKING (ANNOTATION AGENT 3) WALIDITY (AKA 'CRACKER GROUPSTOGETHERVARIANTS OF CHANGES SPIDER INTERPRETWEEXTRACTION "Reg" THESAMEMULTIMEDIAURL 8DAAASERETREAL DETERMINES WHETHERA METADATAQUALITY IMPROVEMENT 70 6 MULTIMEMEDAURLISADUPLICATE (ANNOTATIONAGENT4) w - ENHANCES METADAAAGGREATES INTERNE METADATA 78 METADAAFROMOTHERSOURCES. WEBPAGES INTERNET MUTMEDARPAGERL MULTIMEDIA PAGETITLE, PAGEKEYWORDS, FULTEXTRELEVANCYRANKING PAGEDESCRIPTIONSTREAMTITLE AUTHOR, (ANNOTATIONAGENT) COPYRIGHT STREAMURL, KEYFRAME, SORTSSEARCHRESULTSBASEDON BITRATEDURATION, ETC. SEMANTICALLYDISTINCTDATAFIELDS METADATA MTMEDIAURLPAGEURL, PROMOTER PAGETITLE, PAGEKEYWORDS, (ANNOTATIONAGENT5) PAGEDESCRIPTIONSTREAMTTLE AUTHOR, 82 NORMALZESMETADATA, SELECTS AND PROCESSES COPYRIGHT STREAMURL, KEY FRAME MEADATANTOFORMFORSEARCH SYSTEM BTRATEDURATION, ETC. PROMOTETO SEARCHSYSTEM US 6,785,688 B2 Page 2 OTHER PUBLICATIONS Anne J. Gilliand-Swetland, Introduction to Metadata Set Kontothanassis, et al. Compaq, Design, Implementation, and tinging theline StStage, Jul. 5,), 2000,, pppp. 1-11. Analysis of a Multimedia Indexing and Delivery Server, Max Chittister, Oracle InterMedia Annotator User's Guide, Cambridge Research Laboratory Aug. 1999. Release 1.5, 1999, 2000 Orcle Corporation. John R. Smith & Shih-Fu Chang, Visually Searching the Web for Content, Jul.-Sep. 1997, pp. 12-20. * cited by examiner U.S. Patent Aug. 31, 2004 Sheet 1 of 6 US 6,785,688 B2 F.G. 1 U.S. Patent Aug. 31, 2004 Sheet 3 of 6 US 6,785,688 B2 SEED SPIDER 36 38 RETRIEVERESULTS 40 42 PARSEPAGE QUEUE FIG. 4 DEQUEUE AND DISTRIBUTE 46 MULTIMEDIA SPECIFIC INTERPRETIVE 48 METADATA EXTRACT U.S. Patent Aug. 31, 2004 Sheet 4 of 6 US 6,785,688 B2 60 EXTRACT METADATA 62 PARSE AND INDEX EXTRACTED METADATA INTO METADATA FIELDS COMPARE FIELDS WITH 64 KNOWN DATABASES 65 CORRECT/REPLACE/ADD FIELDS QUEUE FIG. 6 SEPARATE NOISY METADATA INTO KEYWORDS 84 PERFORM FULL-TEXT QUERY 86 CALCULATE SCORE 88 92 90 SCORE > DO NOT QUALFY QUALIFY DATABASE THRESHOLD DATABASE QUEUE FIG.7 U.S. Patent Aug. 31, 2004 Sheet 5 of 6 US 6,785,688 B2 106 SELECT AND SORT URLS 102 BINNING COMBINE URLS HAVING 108 COMMON SPECIFIED ATTRIBUTES INTO BINS TERATIVE 104 MASKING OUEUE FIG. 8 CREATEMASK 112 COMPARE MASK WITHURLIN BIN REMOVE CHARACTER(S) ANOTHER URLIN BIN COLLAPSE SIMLAR URLS IN BIN INTO SINGLE URL 122 ANOTHER BIN N QUEUE FIG. 9 U.S. Patent Aug. 31, 2004 Sheet 6 of 6 US 6,785,688 B2 REORGANIZE FIELDS IN URL 138 ANALYZE FIRST FELD TO 140 IDENTIFY ASSOCATED METADATA 148 142 ADD ASSOCATED ASSOCIATEDYNY METADATA TO METADATA ORIGINAL METADATA 146 ANALYZENEXT FIELD TO DENTIFY ASSOCATED METADATA SEMANTICALLY SORTAND 156 CATEGORIZE METADATA (WHO, WHAT WHEN, WHERE, ETC) TECHNICALLY WEIGHT EACH CATEGORY 158 (BITRATE, DURATION, FIDELITY, ETC) 160 FIG. 11 US 6,785,688 B2 1 2 INTERNET STREAMING MEDIA then typically activates one of the hyperlinks to See the WORKFLOW ARCHITECTURE information contained in the document. Search engines, however, have drawbacks. For example, CROSS REFERENCE TO RELATED many typical Search engines are oriented to discover textual APPLICATIONS information only. In particular, they are not well Suited for This application claims priority from U.S. provisional indexing information contained in Structured databases (e.g. application No. 60/252,273, filed on Nov. 21, 2000, which is relational databases), voice related information, audio herein incorporated by reference in its entirety. This appli related information, multimedia, and Streaming media, etc. cation is related to the following applications filed on Jun. 8, Also, mixing data from incompatible data Sources is difficult 2001: application Ser. No. 09/876,943, entitled “Interpretive for conventional Search engines. Stream Metadata Extraction”; application Ser. No. 09/876, Another disadvantage of conventional Search engines is 942, entitled “Metadata Quality Improvement,” application that irrelevant information is aggregated with relevant infor Ser. No. 09/876,925, entitled “Full Text Relevancy Rank mation. For example, it is not uncommon for a Search engine ing”. This application is also related to the following appli on the web to locate hundreds of thousands of documents in cations filed on Jun. 11, 2001: application Ser. No. 09/878, 15 response to a Single query. Many of those documents are 877, entitled “Grouping Multimedia And Streaming Media found because they coincidentally include the Same keyword Search Results”; application Ser. No. 09/878,866, entitled in the Search query. Sifting through Search results in the “Fuzzy Database Retrieval’; and application Ser. No. thousands, however, is a daunting task. For example, if a 09/878,876, entitled “Internet Crawl Seeding”. user were looking for a Song having the title "I Am The Walrus,” the search query would typically contain the word FIELD OF THE INVENTION “walrus.” The list of hits would include documents provid ing biological information on walruses, etc. Thus, the user The present invention relates to computer related infor would have to review an enormous number of these hits mation Search and retrieval, and Specifically to multimedia before finally (if ever) reaching a hit related to the desired and Streaming media Search tools. 25 Song title. Adding to a users frustration is the possibility that many of the Search results are duplicates and/or variants of BACKGROUND each other, leading to the same document (e.g. uniform An aspect of the Internet (also referred to as the World resource locator, URL). Further difficulty occurs in trying to WideWeb, or Web) that has contributed to its popularity is evaluate the relative merit or relevance of concurrently the plethora of multimedia and Streaming media files avail found documents. The Search for Specific content based on able to users. However, finding a specific multimedia or a few key words will almost always identify documents Streaming media file buried among the millions of files on whose individual relevance is highly