Rising to the Top: Evaluating the Use of the HTML META Tag to Improve Retrieval of World Wide Web Documents through Internet Search Engines

Thomas P. Turner and Lise Brackbill

… improve resource discovery.

The problem of finding materials on the World Wide Web has been discussed in library and information science journals, computer literature, and the popular media. Internet search engines have been developed to aid in finding materials; however, their performances vary considerably. Numerous researchers have evaluated these tools and have detailed their strengths and weaknesses. Melee's Indexing Coverage Analysis (MICA) report, issued weekly, details the number of pages indexed by various Internet search engines; in addition, the speed of the systems is evaluated (Melee 1998). Other authors have analyzed particular aspects of Internet search engines, such as their retrieval precision (Leighton and Srivastava 1997), their usability (Pollock and Hockley 1997), and their indexing methods (Srinivasan, Ruiz, and Lam 1996). Some researchers have offered advice to the authors of Hypertext Markup Language (HTML) documents about improving retrieval of their materials.

The current research was designed to determine how useful one method, the HTML META tag, is in improving accessibility via Internet search engines; here we focus on indexing rather than on search engine performance.

THOMAS P. TURNER (tpt2@cornell.edu) is Metadata Librarian, Albert R. Mann Library, Cornell University, Ithaca, New York. LISE BRACKBILL is Technology Trainer, Multnomah County Library, Portland, Oregon. Manuscript received February 4, 1998; accepted for publication April 17, 1998. LRTS 42(4).

METADATA AND THE HTML META TAG

Much has been written about the importance of metadata for understanding and using electronic resources. This literature sheds light on the types of issues that the
HTML META tag (see figure 1) is intended to address. Metadata is commonly defined as data about data. A more complete definition notes that metadata provides "a user (human or machine) with a means to discover that the resource exists and how it might be obtained or accessed. It can cover many aspects, such as subject content, creators, publishers, quality, structure, history, access rights and restrictions, relationship to other works or appropriate audience" (Efthimiadis and Carlyle 1997, 5). Metadata is important for what it enables; its strength is not description but the support it provides for resource discovery and data use (Lynch 1998). Metadata also prevents ambiguity about data (Lide 1995). Weibel (1995) describes metadata as the centerpiece of information gathering. He argues that new types of metadata need to be developed to facilitate document discovery and suggests the Dublin Core element description set as a solution for metadata problems.

    <HTML>
    <HEAD>
    <TITLE>Poultry, production and value</TITLE>
    <META NAME="keywords" CONTENT="USDA, Mann Library, poultry production and value, agriculture, livestock, dairy, poultry, agricultural economics, business, trade, commodities, statistics">
    <META NAME="Description" CONTENT="This full-text file presents the annual estimates of production and value for commercial broilers, eggs, turkeys raised, and chickens sold by states and U.S. This report is a supplement to Broiler hatchery, Chickens and eggs, and Turkey hatchery.">
    </HEAD>
    <BODY>
    <A HREF="http://www.mannlib.cornell.edu/gateway.html">Mann Library Home Page</A>:
    <A HREF="http://www.mannlib.cornell.edu/catalog/catalog.html">Gateway</A>
    <H1><img src="http://www.mannlib.cornell.edu/icons/world.gif" ALT="[World]"> Poultry, production and value</H1>
    <HR><P>
    <form action="http://www.mannlib.cornell.edu/cgi-bin/connect.cgi" method="post">
    <input type="submit" name="Connect" VALUE="Connect">
    <input NAME="theID" TYPE="hidden" VALUE="728">
    </form><br clear=all>
    <HR>
    <H3>Description</H3>
    This full-text file presents the annual estimates of production and value for commercial broilers, eggs, turkeys raised, and chickens sold by states and U.S. This report is a supplement to Broiler hatchery, Chickens and eggs, and Turkey hatchery.<p>
    Resource type: Full text<P>
    Update Frequency: Annually<P>
    Summary Holdings: 1995-<P>
    Publisher: Washington, DC : National Agricultural Statistics Service,<P>
    <H3>Access Notes</H3>
    No access restrictions apply.<p>
    <HR>
    <B>Crossroads</B>... from here you can:<p>
    <DL><DD>Go to other titles of similar subject:
    <DD><UL>
    <LI><A HREF="http://www.mannlib.cornell.edu/cgi-bin/subj.cgi?ag-econ">Agriculture - Agricultural Economics</A>
    <LI><A HREF="http://www.mannlib.cornell.edu/cgi-bin/subj.cgi?ag-live">Agriculture - Livestock, Dairy and Poultry</A>
    <LI><A HREF="http://www.mannlib.cornell.edu/cgi-bin/subj.cgi?bus-tra">Business and Economics - Trade and Commodities</A>
    </UL></DL><P>
    </BODY></HTML>

Figure 1. Example HTML Document with Embedded META Tags

HTML permits document authors to control not only how text, graphics, and multimedia materials are displayed, but also the information available about the document itself through the use of the META tag. Several authors have suggested that the HTML META tag can be used to enhance information retrieval, especially through Internet search engines. AltaVista Search Network (1997) documentation suggests that authors use the keywords and description attributes of the META tag to improve retrieval and control the description of the document that appears on a search results page. Bremser (1997) offers more detailed advice to Web authors about using different aspects of the META tag.

The META tag has also been seen as a way of providing additional types of metadata about documents. Miller (1995) discusses the potential use of the META tag to contain formatted information defined by the Dublin Core element set. The Dublin Core provides a means of creating basic metadata about a resource in a simple manner and is not formally connected to the HTML META tag. However, the META tag is the best section of the HTML specification in which this data can be placed (Weibel 1997). Many of these authors envision resources that are "self-declaring" because the items provide important information about themselves to human catalogers and automated indexers.

The HTML META tag resides within the header and can have the attributes CONTENT, HTTP-EQUIV, or NAME. It is intended to provide "a place to put meta-information that is not defined by the other HEAD elements. This allows an author to more richly describe the document content for indexing and cataloging purposes" (Graham 1995, 147). In this research, we are most concerned with two attributes: CONTENT and NAME. The NAME attribute requires that a CONTENT attribute also be present. Although the NAME attribute can take the values of author, document type, distribution, keywords, and description, among other values, most of the Internet search engines that currently support use of the META tag recognize only those NAME attributes defined as keywords or description. The keywords attribute provides important terms associated with a document, while the description attribute briefly details it and is often used as a summary on the results page generated by Internet search engine queries. This example of a META tag from the header of the USDA report "Agriculture and trade: Europe" illustrates the use of both the keywords and description attributes:

    <META NAME="Keywords" CONTENT="USDA, Mann Library, agriculture, Europe, agricultural economics, international agriculture, business, economics, trade, commodities, statistics">
    <META NAME="Description" CONTENT="Database contains macroeconomic data on Western Europe, budget and price data, and time-series data on supply and utilization of agricultural commodities for the EC-12 and the European Free Trade Association">

Several authors have voiced some concerns about the potential misuse and failure of the META tag. Kuhn (1996) notes that although the META tag can be used for certain information, there is not enough agreement about the types of information that can be implemented. He is especially concerned about information related to authors of documents, abstracts, and document content beyond keywords assigned by authors. One concern about the use of the META tag involves the various opinions about the nomenclature for the NAME attribute. Currently some search engines recognize NAME designated as keywords and description, but other options, such as history, access restrictions, and audience, are ignored. Without consensus about the nomenclature among the HTML standard developers, authors using HTML, and Internet indexing services, the META tag will never be widely implemented (Pfaffenberger 1995). Current court cases will also set precedents for the use of the META tag. Using names in a META tag that have nothing to do with the content of a site has been called into legal question by companies whose names appear in documents with which they have no connection (Kaplan 1997).

PROBLEM STATEMENT

… when appropriate searches are executed. Although the META tag is being put to use by many World Wide Web publishers, its effectiveness has not been evaluated. In this study, we examine the following questions related to the use of the HTML META tag:

1. Do pages that use the META tag have higher retrieval ranks than pages that do not?
2. Is one method of META tag authoring more effective than other methods?
3. Do pages that use both of the META tag attributes have better retrieval ranks than pages that use only one attribute?

To answer these questions, it is necessary to understand how search engines deal with the META tag.

METHOD

At the time of this research, Mann Library provided access to many USDA reports and data sets, as well as other networked electronic resources, through the Mann Library Gateway (http://www.library.cornell.edu/).
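To make the keywords and description attributes discussed earlier more concrete, the following Python sketch shows one way an indexing program might read META tags from a document's HEAD, keeping only the keywords and description NAME values and ignoring other names (such as audience), as most search engines of the time were reported to do. It is an illustration under stated assumptions, not the procedure used in this study or by any particular search engine. The names MetaTagExtractor and extract_meta are hypothetical; only the standard-library html.parser module is real. The sample header is an abbreviated form of the USDA example, with an audience tag added purely for illustration.

```python
from html.parser import HTMLParser


class MetaTagExtractor(HTMLParser):
    """Hypothetical sketch: collect META NAME/CONTENT pairs from a HEAD.

    Only the "keywords" and "description" names are kept, mirroring the
    behavior the article reports for most search engines of the time.
    """

    RECOGNIZED_NAMES = {"keywords", "description"}

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so <META NAME=...>
        # and <meta name=...> are treated identically.
        if tag == "head":
            self.in_head = True
        elif tag == "body":
            # Stop collecting even if </HEAD> was omitted.
            self.in_head = False
        elif tag == "meta" and self.in_head:
            attributes = dict(attrs)
            name = (attributes.get("name") or "").strip().lower()
            content = attributes.get("content")
            # NAME is only usable together with CONTENT; other names,
            # such as audience or history, are ignored here.
            if name in self.RECOGNIZED_NAMES and content is not None:
                self.meta[name] = content.strip()

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def extract_meta(html_text):
    """Return (keyword list, description string) for one HTML document."""
    parser = MetaTagExtractor()
    parser.feed(html_text)
    parser.close()
    raw_keywords = parser.meta.get("keywords", "")
    keywords = [term.strip() for term in raw_keywords.split(",") if term.strip()]
    return keywords, parser.meta.get("description", "")


# Usage with an abbreviated version of the USDA example header.
sample = """<HTML><HEAD><TITLE>Agriculture and trade: Europe</TITLE>
<META NAME="Keywords" CONTENT="USDA, Mann Library, agriculture, Europe">
<META NAME="Description" CONTENT="Database contains macroeconomic data on Western Europe">
<META NAME="audience" CONTENT="economists">
</HEAD><BODY></BODY></HTML>"""

keywords, description = extract_meta(sample)
print(keywords)     # ['USDA', 'Mann Library', 'agriculture', 'Europe']
print(description)  # Database contains macroeconomic data on Western Europe
```

Matching on lowercased names reflects the fact that the two example headers write "keywords" and "Keywords" interchangeably; the audience tag in the sample is silently dropped, which is the nomenclature problem the cited authors describe.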