Intelligent Internet Searching Agent Based on Hybrid Simulated Annealing

Total Pages: 16

File Type: pdf, Size: 1020 KB

Decision Support Systems 28 (2000) 269–277
www.elsevier.com/locate/dsw

Intelligent internet searching agent based on hybrid simulated annealing

Christopher C. Yang a,*, Jerome Yen a, Hsinchun Chen b

a Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong, People's Republic of China
b Department of Management Information Systems, University of Arizona, Tucson, AZ, USA

* Corresponding author. E-mail address: [email protected] (C.C. Yang).

Abstract

The World-Wide Web (WWW) based Internet services have become a major channel for information delivery. For the same reason, information overload has also become a serious problem to the users of such services. It has been estimated that the amount of information stored on the Internet doubles every 18 months. The speed of increase of homepages can be even faster; some people estimate that it doubles every 6 months. Therefore, a scalable approach to support Internet searching is critical to the success of Internet services and other current or future National Information Infrastructure (NII) applications. In this paper, we discuss a modified version of the simulated annealing algorithm to develop an intelligent personal spider (agent), which is based on automatic textual analysis of Internet documents and hybrid simulated annealing. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Information retrieval; Intelligent agent; Searching agent; Simulated annealing; World-wide Web

0167-9236/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved. PII: S0167-9236(99)00091-3

1. Introduction

Information searching over the cyberspace has become more and more important. It has been estimated that the amount of information stored on the Internet doubles every 18 months. However, the speed of increase of home pages can be even faster, doubling every 6 months or even less. In some areas, such as Hong Kong and Taiwan, the increasing speeds can be even faster. Therefore, searching for the needed homepages or information has become a challenge to the users of the Internet.

To develop searching engines or spiders which are "intelligent", or which reach high recall and high precision, has always been the dream of the researchers in this area. In order to qualify as an agent or intelligent agent, such a searching agent or spider must be able to make adjustments according to the progress of searching, or be personalized to adjust its behavior according to the users' preferences or behavior.

The major problem with the current searching engines or spiders is that only a few spiders have communication capabilities between the spiders and the users who dispatched them. Since there is no communication, it is difficult for the users to trace or to understand the progress of searching, and they have to tie themselves to the terminals. This paper reports a searching engine which uses the combination of CGI and Java to develop the user interface. It allows the users to keep track of the progress of searching. The users can also make changes to the searching parameters, such as the number of homepages to retrieve. Several algorithms have been used to develop spiders, for example, best-first searching and genetic algorithms. In this paper, we will discuss the spider that we developed with the hybrid simulated annealing algorithm. We have made a comparison with the performance of spiders developed with best-first search. The results will be reported in this paper.

Although network protocols and Internet applications, such as HTTP, Netscape and Mosaic, have significantly improved the efficiency and effectiveness of searching and retrieving online information, their usage is still accompanied by the problem that users cannot explore and find what they want in the cyberspace. While Internet services become popular to the users, difficulties with searching on the Internet are expected to get worse as the amount of on-line information increases, the number of Internet users increases (traffic increases), and more and more multimedia are used to develop the home pages. This is the problem of information overload or information explosion.

Development of searching engines has become easier. For example, it is possible to download executable spider programs or even source code. However, it is difficult to develop spiders which have satisfactory performance or unique features, such as learning and collaboration.

There are two major approaches to developing searching engines: either based on keywords and huge index tables, for example, Alta Vista and Yahoo, or based on hypertext linkages, for example, Microsoft Explorer and the Netscape browser. It is difficult for keyword search to reach high precision and high recall. Slow response, due to the limitations of indexing methodology and network traffic, and the inability of the users to use the appropriate terms to articulate their needs, always frustrate the users.

Our approach, based on automatic textual analysis of Internet documents such as HTML files, aims to address the Internet searching problem by creating an intelligent personal spider (agent) based on the hybrid simulated annealing algorithm. Best-first search has been developed and reported in our earlier publications [3,4]. In this paper, we propose an approach based on automatic textual analysis of Internet documents and a hybrid simulated annealing based searching engine.

In Section 2, a short literature review will be provided. Section 3 will discuss the architecture and algorithms for building our searching spider. Section 4 will report the experiments that we have conducted to compare its performance with the other searching spiders. This paper is concluded with some comments about our spider and other factors that will affect its performance.

2. Literature review: machine learning, intelligent agents

2.1. Machine learning

Research on intelligent Internet/Intranet searching relies significantly on machine learning. Neural networks, symbolic learning algorithms and genetic algorithms are three major approaches in machine learning.

Neural networks model computation in terms of complex topologies and statistics-based error correction algorithms, which fit well the conventional information retrieval models such as the vector space model and the probabilistic model. Doszkocs et al. [7] give an overview of connectionist models for information retrieval. Belew [1] developed a three-layer neural network of authors, index terms, and documents, using relevance feedback from users to change its representation over time. Kwok developed a similar three-layer network using a modified Hebbian learning rule. Lin et al. [11] adopted Kohonen's feature map to produce a two-dimensional grid representation for N-dimensional features.

Symbolic learning algorithms are based on production rule and decision tree knowledge representations. Fuhr et al. [8] adopted regression methods and ID3 for a feature-based automatic indexing technique. Chen and She adopted ID3 and the incremental ID5R algorithm for constructing decision trees of important keywords which represent users' queries.

Genetic algorithms are based on evolution and heredity. Gordon [9] presented a genetic algorithms based approach for document indexing. Chen and Kim [5] presented a GA-neural-network hybrid system for concept optimization.

Our hybrid simulated annealing approach is similar to the genetic algorithm in producing new generations. However, the selection is stochastic based instead of probability based.

2.2. Intelligent internet searching agent

There are two major approaches to Internet searching: (1) client-based searching agents, and (2) online database indexing and searching. There are also some systems that contain both approaches.

A client-based searching agent on the Internet is a program that operates autonomously to search for relevant information without direct human supervision. Several software programs have been developed. TueMosaic and the WebCrawler are two prominent examples. Both of them use best-first search techniques. DeBra and Post [6] reported tueMosaic [...] bandwidth bottleneck on the Internet severely constrained the usefulness of such an agent approach.

At the Second WWW Conference, Pinkerton [14] reported a more efficient spider (crawler). The WebCrawler extends tueMosaic's concept to initiate the search using its index and to follow links in an intelligent order. It first appeared in April of 1994 and was purchased by America Online in January of 1995. The WebCrawler extended the concept of the Fish Search Algorithm to initiate the search using an index, and to follow links in an intelligent order. However, the WebCrawler evaluates the relevance of a link based on the similarity of the anchor text to the user's query. The anchor texts are the words that describe a link to another document. These anchor texts are usually short and do not provide as much relevance information as the full document texts. Moreover, problems with the local search and communication bottleneck persist. A more efficient and global Internet search algorithm is needed to improve client-based searching agents.

The TkWWW robot was developed by Scott Spetka and was funded by the Air Force Rome Laboratory [15]. The TkWWW robots are dispatched from the TkWWW browser.
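The core idea of a simulated annealing spider can be illustrated with a toy walk over an in-memory link graph. This is a minimal sketch under stated assumptions, not the authors' algorithm (which is described in Section 3 of the paper): the `term_overlap` scorer, the dictionary-based graph encoding, and the geometric cooling schedule (`t0`, `cooling`) are all invented for illustration. The key annealing move is that a less relevant neighbour may still be visited with probability exp(delta / T), which lets the spider escape locally irrelevant regions of the web graph.

```python
import math
import random

def term_overlap(text, terms):
    """Toy relevance score: fraction of query terms present in the text."""
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms)

def sa_spider(start, pages, links, query, t0=1.0, cooling=0.9, steps=50, seed=0):
    """Simulated-annealing walk over an in-memory link graph.

    pages maps url -> document text; links maps url -> outgoing urls.
    Improving moves are always accepted; worsening moves are accepted
    with probability exp(delta / T), and T decays geometrically.
    """
    rng = random.Random(seed)
    terms = [t.lower() for t in query.split()]
    current = start
    cur_score = term_overlap(pages[current], terms)
    found = {current: cur_score}          # every accepted page, with its score
    temp = t0
    for _ in range(steps):
        neighbours = [u for u in links.get(current, []) if u in pages]
        if not neighbours:
            break
        cand = rng.choice(neighbours)
        cand_score = term_overlap(pages[cand], terms)
        delta = cand_score - cur_score
        # Accept improvements outright; accept worse moves stochastically.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, cur_score = cand, cand_score
            found[current] = cand_score
        temp *= cooling                   # geometric cooling schedule
    return sorted(found.items(), key=lambda kv: -kv[1])
```

On a small graph, the walk reliably surfaces the relevant page even when the start page scores zero, since zero-gain moves are always accepted and the walk keeps drifting until the temperature drops.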
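For comparison, the best-first spiders used as a baseline always expand the highest-scoring known page next. Again a hedged sketch over an in-memory graph rather than real HTTP fetching; the toy `score` function and data layout are assumptions, not the papers' implementation. The frontier is a max-priority queue built on Python's min-heap by negating scores.

```python
import heapq

def best_first_spider(start, pages, links, query, max_pages=10):
    """Best-first crawl: always expand the most promising page next.

    pages maps url -> document text; links maps url -> outgoing urls.
    Returns (url, score) pairs in the order pages were expanded.
    """
    terms = [t.lower() for t in query.split()]

    def score(url):
        # Toy relevance: fraction of query terms found in the page text.
        words = set(pages[url].lower().split())
        return sum(t in words for t in terms) / len(terms)

    frontier = [(-score(start), start)]   # negated score => max-heap behaviour
    visited = set()
    ranked = []
    while frontier and len(ranked) < max_pages:
        neg, url = heapq.heappop(frontier)
        if url in visited:
            continue
        visited.add(url)
        ranked.append((url, -neg))
        for nxt in links.get(url, []):
            if nxt in pages and nxt not in visited:
                heapq.heappush(frontier, (-score(nxt), nxt))
    return ranked
```

Unlike the annealing walk, this search is greedy and deterministic: once the frontier is dominated by pages in a locally relevant but globally poor region, it has no mechanism for escaping, which is the weakness the hybrid simulated annealing approach targets.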
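The WebCrawler-style link evaluation mentioned above, judging a link by the similarity of its anchor text to the user's query, is commonly computed as a cosine similarity between term-frequency vectors. The token-level scoring below is an illustrative assumption consistent with the vector space model the paper cites, not WebCrawler's actual code.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two token lists via term-frequency vectors."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_links(anchors, query):
    """anchors: list of (anchor_text, url) pairs harvested from a page.

    Returns the pairs sorted by anchor-text similarity to the query,
    most promising link first.
    """
    q = query.lower().split()
    return sorted(anchors, key=lambda a: -cosine(a[0].lower().split(), q))
```

Because anchor texts are only a few words long, these vectors are extremely sparse, which is exactly the paper's criticism: they carry far less relevance evidence than scoring the full document text.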