
Industry Trends

Searching for New Search Technologies

Ilan Greenberg and Lee Garber

Searching for Web sites is one of the most common tasks performed on the Web. It is also one of the most frustrating. In fact, the situation has become a notorious symbol of the Web’s growing size and lack of structure, as well as the inadequacy of Web search technologies.

However, a number of Web companies and research organizations are taking a variety of approaches to try to solve this problem.

Traditional search technology (see the sidebar “Traditional Web Search Technology”) is based on users typing in keywords for the information they want to receive. Search services then scan Web pages for those keywords. This approach consistently causes a number of well-known problems.

Users must try to come up with the correct keywords for their search. If the keywords are too general or have multiple meanings, users may receive too many results or too many irrelevant results to find the information they want. For example, a search for “history of rock” could yield results related to popular music, geology, or history classes at a university. Meanwhile, the wrong choice of keywords may lead to useless results or no results at all.

“In general, what’s happening is the Boolean search techniques from the 60s and 70s are running out of gas,” said Kevin Werbach, managing editor of Release 1.0, a newsletter on emerging communications and computing technologies.

There are considerable commercial incentives for developing better search technologies. Various search services, including AltaVista, Lycos, and Yahoo, are turning their Web pages into portals. Portals are Web home bases from which users can access a variety of services, including searches, e-commerce, stock prices, weather forecasts, chat rooms, and driving directions.

Companies want to attract more people to their portals because the more unique users they attract, the more money they can charge advertisers and partners on the sites. Figure 1 lists the five Web sites that attracted the most unique visitors as of May 1999.

Figure 1. According to Media Metrix, a company that provides Internet and digital-media measurement services, the Web-search sites with the greatest number of unique visitors as of May 1999 were Yahoo, Infoseek, Excite, Lycos, and AltaVista. (Bar chart; vertical axis: unique visitors, in millions. Source: Media Metrix)

Search services are a key to attracting users because they are an important reason people use portals in the first place. Moreover, about 71 percent of users utilize search services to find Web sites, according to Nielsen Media Research, a company that measures computer and Internet usage (as well as television audience levels).

Currently, though, said Werbach, “For consumers, most of the search engines are pretty comparable.” Users thus frequently choose a portal for reasons other than its search service.

However, a company might attract more users to its portal if it could offer an improved search technology. Researchers are thus looking at a variety of new technologies and techniques.

CHALLENGES

The sheer size of the Web is a challenge to improving search technology. There are more than 350 million Web pages, and AltaVista contains only about 140 million of them, one of the largest totals of any search service.

Meanwhile, the Web is constantly changing, with new URLs added and old pages discarded. NEC’s Research Institute estimated that in 1998, more than 5 percent of search results in one prominent search service were invalid or “dead” links.

NEW APPROACHES

Researchers are taking a variety of approaches to improving Web search technology. For example, some search services are making their Web indexes bigger, in an effort to make their results more comprehensive.

Human annotation

The human-annotation approach bases search results on the behavior of and the results obtained by previous Web searchers, rather than just on keywords. Proponents say the results of previous searches, as well as Webmasters’ decisions about which pages their sites should link to, better indicate which sites will satisfy new searches. They also say this technique reduces the ability of a Web site to use keywords to manipulate search services.

However, Release 1.0’s Werbach and other critics say this approach can limit a search service’s effectiveness by forcing it to reflect past usage and not leaving it open enough to meet the needs of new users.

Direct Hit. The Direct Hit search service uses a technology it calls the Popularity Engine. A proprietary algorithm tracks users through Web searches. Direct Hit cofounder Gary Culliss said the tracking is done anonymously and cannot match specific IP addresses to Web pages.

The Popularity Engine monitors which Web pages a user accesses, how much time the user spends at each site, and which hyperlinks the user clicks on. Direct Hit then uses this information to rate the relevance of individual Web sites to specific searches.

According to Culliss, this technique turns users into search editors.

Google. Google, founded by Stanford University doctoral students Sergey Brin and Larry Page, is something of a hybrid between the keyword and human-annotation approaches.

Google uses its own crawler, called Googlebot, to zip around the Web. But instead of looking for keywords, Googlebot searches for hyperlinks. In response to a search topic, Googlebot looks for Web pages that hyperlink to other pages that are deemed relevant to the topic, based on text-matching and other techniques. Google ranks such pages highly and is likely to return them in response to a search query.

Clever. IBM is developing search technology it calls Clever, which uses an algorithm it calls HITS (Hyperlink-Induced Topic Search). (See the related article, “Mining the Web’s Link Structure,” on page 60.) The technology starts with a standard keyword search to get a root set of results. It then looks for documents that link to and from the root results. Clever rates the Web pages in the root set and the linked pages on the basis of how many other sites link to them.

Pages that many Web site authors have chosen to link to are called authorities and are considered to be valuable sources of content. Web sites that link to many authorities are called hubs and are considered to be valuable reference tools.

Built for speed

Fast Search & Transfer (http://www.fast.no) is using several approaches in an effort to make its search service (http://www.alltheweb.com) faster.

Through the scalability in the architecture, the average response time for an advanced search is under a second, compared to an industry average of four to four-and-a-half seconds, said Ray Romagnolo, a vice president at Fast Search & Transfer.

The company credits its search service’s performance in part to fast indexing algorithms; large arrays of off-the-shelf servers, storage systems, and interconnects; and software that efficiently utilizes server capabilities.

Fast Search & Transfer says this scalable architecture will be able to effectively handle a growing number of search queries and search an increasingly large index of Web pages.

The search service’s index includes 80 million pages and is slated to grow to 200 million in the near future, which would make it one of the biggest in use.

Filtering query responses

Search services such as iAtlas and Northern Light use filtering technology. Filtering narrows the scope of queries to yield results that are more relevant. When submitting a keyword search, users can fill out electronic forms to specify that, for example, they want only information relating to certain industries or certain geographic locations.

“The roles and goals of the user have to be taken into account. This takes you down the road to context sensitivity, which is crucial,” said Curt Monash, CEO and cofounder of Elucidate Technologies, a software company working on a variety of products, including some that are search related. “That gives you a chance for accurate searches. The results can be a near-perfect search.”

However, if users filter their searches too broadly, they could screen out potentially useful results.

Natural language

Search services are beginning to work with natural-language queries, designed to make them easier to use. For example, with Ask Jeeves, instead of typing in one or more keywords, users who are considering selling their automobile could type, “How much is my used car worth?” The service would then refer them to a site that provides the market value of used cars.

Natural-language engines analyze a query’s grammatical structure for meaning and then use the analysis to conduct keyword searches.

“There hasn’t been great success in using natural language,” said John Lafferty, associate professor at Carnegie Mellon University’s Computer Science Department and Language Technologies Institute.

He said natural-language search technology’s ability to effectively parse queries is limited because the algorithms are still quite immature.

Elucidate’s Monash, on the other hand, said “I think natural-language patterns, à la Ask Jeeves, are clearly productive because many queries are [structured] fundamentally the same.”

Directories

Some search sites are considered directories (as seen in the sidebar “Traditional Web Search Technology”).

When Netscape Communications acquired the NewHoo Community Directory Project recently, it also acquired the Open Directory project (http://dmoz.org). The project uses volunteer experts in various subject areas to produce and maintain comprehensive directories of Web sites that contain information in their areas of expertise. The Open Directory has been licensed to a number of search services, including HotBot and Lycos.

Proponents claim the use of volunteers will help the project scale as the Internet grows. However, some question whether volunteers will do as effective a job as experts paid and trained by search services to work on directories.

Even the military wants better search technology. The US Defense Advanced Research Projects Agency’s (DARPA’s) Space and Naval Warfare Systems Center has invested $2 million in a classified search-technology project at Mississippi State University.

Meanwhile, Release 1.0’s Werbach said, commercial search services are likely to move into niche areas. Niche services would focus only on certain topics and/or index only certain sets of documents, he said.

For Web portals, the future of searching will likely entail incorporating not one but several of the approaches now spearheaded by the vanguard of new search services. “People think search is a monolithic activity,” said Werbach. “But it isn’t. The reality is that sometimes you will want to use different techniques for different results.” ❖

Ilan Greenberg is a freelance technology writer based in San Francisco. Contact him at [email protected].

Lee Garber is Computer’s editor. Contact him at [email protected].

Editor: Lee Garber, Computer, 10662 Los Vaqueros Circle, PO Box 3014, Los Alamitos, CA 90720-1314; [email protected]

Computer, August 1999

be ready to scrap their expensive infrastructure, particularly when they still have network capacity.

The transition may also be complicated because there’s more competition between 2G standards in North America than in Japan or Europe, Poticny said. Several North American groups, all working under the auspices of the Telecommunications Industry Association (TIA), have proposed competing standards. The CDMA Development Group and Qualcomm favor cdma2000/Wideband cdmaOne. The Universal Wireless Communications Consortium (UWCC) is working on UWC-136 to provide an upgrade path for TDMA- and GSM-based carriers.

However, progress is being made toward compatibility, noted GTE’s Levitan, who said the big issue is providing upgrade paths to protect the investment many operators have made in 2G technologies.

Levitan expects that North America will primarily migrate to W-CDMA and cdma2000, which will probably be the most widely used standards worldwide, but will also make some use of TD-CDMA.

OBSTACLES IN 3G’S PATH

It appears that worldwide implementation of IMT-2000 technology won’t be easy, fast, or inexpensive.

Although the ITU has worked hard to make proposed 3G standards less divergent, more work is needed.

Also, in some cases, new 3G technologies will require carriers to obtain new radio frequencies and undertake expensive upgrades to software and hardware, including transmitting and receiving equipment, noted Dataquest’s Bruederle.

On the other hand, increasingly cheaper chipsets and increased competition will help keep prices down for 3G consumer products.

Meanwhile, 3G faces spectrum limitations. The 1992 World Administrative Radio Conference, sponsored by the ITU’s Radiocommunications Sector, identified 230 MHz of spectrum at the 2 GHz frequency for terrestrial- and satellite-based 3G transmissions.

However, the ITU subsequently determined that the 230-MHz allocation would not provide enough capacity for the projected level of usage or provide enough frequencies in the same range that could be used throughout the world to permit seamless global roaming. The ITU has thus been recalculating spectrum needs and considering other ways to provide sufficient capacity.

If any killer app will drive 3G wireless technology, said Bruederle, it will be the Internet. That’s because 3G will enhance important existing and future Internet capabilities, including Web access, voice applications, and e-commerce.

Wireless expert Seybold doesn’t see such a bright future. He said that there is not a significant demand for 3G technology’s capabilities now and that technology vendors can no longer push carriers into adopting new technologies without sound business reasons.

Even in Japan, he said, carriers are rolling out 3G technology only because they’re running out of 2G voice capacity, not because consumers are demanding new capabilities.

Bruederle said the key to 3G’s future success will be wireless technology’s progression from a voice-centric to a data-centric medium. According to many industry analysts, the more that users want to use mobile devices to access data, the more they will need a faster wireless technology.

GTE’s Levitan said that 3G technology will be driven by market demand for the ability to use a single wireless device worldwide. However, he said, “It’ll be an evolution, not just a flip of the switch. As systems evolve, you’ll see a lot of the newer technology leveraging off previous generations.” ❖

David Clark is a freelance technology writer based in Torrance, California. Contact him at [email protected].

Traditional Web Search Technology

In 1990, researchers at McGill University in Montreal developed Archie, the first Internet search engine. Archie searches the files of Internet FTP servers. Two other early engines search Gopher servers: Veronica, developed in 1992 at the University of Nevada; and Jughead, developed in 1993 at the University of Utah.

SEARCH ENGINES AND DIRECTORIES

Current search services can be divided into search engines and directories.

Search engines

Search engines (such as AltaVista and HotBot) traditionally consist of three components: the crawler, the index, and the search software. Crawlers, also called spiders, are programs that automatically scan various Web sites and create indexes of URLs, keywords, links, and text. Crawlers also follow the links on a site to find other relevant pages. They return to sites periodically to look for changes. When a user submits a search query, the engine’s software goes through the index to find Web pages with keyword matches and ranks the pages in terms of relevance.

Directories

Instead of working with indexes, directories (such as LookSmart and Yahoo) work with descriptions of Web pages submitted by either Webmasters or editors who have reviewed the pages. Directories respond to queries by searching through these descriptions. Some search engines, such as MSN, take a hybrid approach by also using directories.

Because they don’t use crawlers, directories don’t automatically find changes in Web pages. But proponents say that human-generated descriptions can produce more relevant responses to some search queries.

However, said Kevin Werbach, managing editor of Release 1.0, a newsletter on emerging communications and computing technologies, it’s difficult for manually produced directories to keep up with the rapid growth of the Internet. He said, “It’s really important to automate some of the processes that they do.”

KEYWORD SEARCHES

Traditionally, a user enters a keyword (or keywords along with Boolean modifiers, such as “and,” “or,” “not”) into a search engine, which then scans indexed Web pages for the keywords. To determine in which order to display pages to the user, the engine uses an algorithm to rank sites that contain the keyword.

For example, the engine may count the number of times the keyword appears on a page. The engine also may look for keywords in metatags. A metatag is an HTML tag that provides information about a Web page. Unlike most HTML tags, metatags don’t affect a document’s appearance. Instead, they include such information as a Web page’s contents and some relevant keywords.

In the past, some users have subverted keyword-based techniques by stuffing their Web pages with keywords or loading their metatags with keywords that don’t relate to their site’s content. However, search services have taken steps to counteract this. For example, some don’t scan metatags any more, and some lower the relevance rankings of sites that use keywords unrelated to their content.
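The keyword ranking described in the sidebar “Traditional Web Search Technology” (match pages on the query keywords, then order them by how often the keywords appear) can be illustrated with a short Python sketch. This is only a toy model under stated assumptions: the page URLs and text are made up, the function names `tokenize` and `rank_pages` are hypothetical, and real search services of the period used far more elaborate and undisclosed ranking rules.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a page's text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def rank_pages(pages, keywords):
    """Return (url, score) pairs for pages that contain every keyword
    (a Boolean AND), ordered by total keyword occurrences, highest first."""
    terms = [k.lower() for k in keywords]
    results = []
    for url, text in pages.items():
        counts = Counter(tokenize(text))
        if all(counts[t] > 0 for t in terms):
            results.append((url, sum(counts[t] for t in terms)))
    # Sort by descending score; break ties alphabetically by URL.
    return sorted(results, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical pages, echoing the article's "history of rock" example.
pages = {
    "music.example/rock": "history of rock music: rock bands and rock albums",
    "geo.example/rock": "the geologic history of a rock formation",
    "cars.example/used": "used car prices",
}
ranked = rank_pages(pages, ["history", "rock"])
```

Both the music page and the geology page match the query, which is the ambiguity the article describes. Counting occurrences also shows why keyword stuffing worked: repeating a word raises a page's score under this scheme.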

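The hub and authority ratings mentioned in the discussion of IBM's Clever can be sketched with the textbook form of the HITS iteration. The article does not describe Clever's actual implementation, so everything here is an assumption made for illustration: the link graph is invented, the function name `hits` and the fixed iteration count are arbitrary choices, and the root-set keyword search that precedes the rating step is omitted.

```python
import math

def hits(links, iterations=20):
    """links maps each page to the list of pages it links to.
    Returns (hub, authority) score dictionaries."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # A page's hub score is the sum of the authority scores it links to.
        hub = {p: sum(auth[dst] for dst in links.get(p, [])) for p in pages}
        # Normalize both score sets so the values stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

# Hypothetical link graph: three link pages pointing at two content pages.
links = {
    "hub1": ["a", "b"],
    "hub2": ["a", "b"],
    "hub3": ["a"],
    "a": [],
    "b": [],
}
hub, auth = hits(links)
top_authority = max(auth, key=auth.get)
top_hubs = sorted(hub, key=hub.get, reverse=True)[:2]
```

In this toy graph, page `a` receives the most links and comes out as the strongest authority, while `hub1` and `hub2`, which link to both content pages, score higher as hubs than `hub3`.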