NEWS FEATURE NATURE|Vol 438|1 December 2005

Start your engines

Google has launched another challenge to commercial search services, this time aimed at scientists. But is the new engine running as smoothly as its fans hope? Jim Giles investigates.

As an undergraduate in India in the mid-1980s, Anurag Acharya had to write letters to scientists when he could not find the papers he wanted. It is a memory that makes the softly spoken computer engineer laugh. Now working at Google, Acharya is creating a search tool that aims to be the first choice for everyone from Indian students to Iranian professors. “I want to make it the one place to go to for scholarly information across all languages and disciplines,” he says. And that ambition, he freely admits, is “simple to state, but not to achieve”.

For a member of the public seeking a one-off scholarly article, Google Scholar is ideal. It is free to access, and as easy to use as the main Google search engine (see ‘Inside information’). But for academics with access to dedicated library resources, why make the switch? Most scientists rely on tried and trusted favourites, including subject-specific databases such as the US National Institutes of Health’s PubMed or the NASA Astrophysics Data System, to find papers.

Since its launch last November, Acharya’s Scholar engine has delighted and infuriated in equal measure. One librarian has even begun a blog following the search engine’s progress. Although there are no detailed studies, many librarians report that faculty members and students are beginning to use the search engine; some suspect that Scholar will replace more established, and more costly, search tools. Figures from academic publishers also suggest that use of Scholar is growing rapidly: it already directs more online traffic to Nature websites than any other multidisciplinary science search engine.

Thomas Mrsic-Flogel, a neuroscientist at the Max Planck Institute of Neurobiology in Martinsried, Germany, and a regular PubMed user, has started to use Scholar. He says he finds the engine useful when he is not quite sure what he is searching for. Search results include citation links to other articles, so he follows the links until he finds something interesting, a function that PubMed, which does not track citations, cannot provide. “I follow the citation trail and get to papers I hadn’t expected,” says Mrsic-Flogel. “I have found papers that way that I wouldn’t have found otherwise.”

This citation tracking puts Scholar in direct competition with the fee-based search engines marketed by traditional science publishers. Until Elsevier launched its search engine, Scopus, in 2004, Thomson Scientific’s Web of Science had a monopoly on citation tracking. Citation counts allow researchers, institutes and journals to follow the impact of individual articles through time, leading to metrics, such as journal impact factors, that are the bane and blessing of many academic careers.

But unlike Scopus and Web of Science, Scholar does more than just search the peer-reviewed literature. Some users like the fact that Scholar searches lots of non-traditional sources, including archives, conference proceedings and institutional repositories, often locating free versions of articles on author websites. This ‘grey literature’ is growing in importance but remains poorly defined. It is widely assumed that Google considers a source scholarly if it is cited by another scholarly resource, but as online publishing evolves, so may this definition. Advocates of greater access to the scientific literature hope that Scholar will encourage more researchers to deposit their articles in free online repositories.

But how well does Scholar actually work? Librarians who have run systematic searches across several engines say that Scholar performs well. A study published this year, which looked at more than 100 papers, concludes that Scholar finds similar numbers of citations to its commercial rivals¹. Yet such results need to be interpreted cautiously, say information scientists. Critics point out that the study did not examine the list of citations to see whether they contained duplicated or erroneous entries.

A closer look at Scholar search results suggests that duplication may well be occurring. One of Scholar’s harshest critics, Péter Jacsó, an information scientist at the University of Hawaii in Honolulu, has taken the engine on numerous test drives. He has documented the results in unflattering terms on a website run by Thomson Scientific. In one extreme case, Jacsó found that the first 100 results from a search for documents on ‘computers’ and ‘intractability’ returned 92 slightly different citations of a book entitled Computers and Intractability and only 8 other unique results.

Cite unseen

The source of this problem is the way in which Google adds records to its scholarly index. At Web of Science and Scopus, staff scan in the abstracts and references from print journals and use dedicated electronic feeds supplied by publishers. Scholar, by contrast, uses an automated process. Software robots crawl the web in search of documents that look like scientific papers, and then use algorithms to strip out relevant information such as author and publication date. The process is vastly cheaper and quicker, but it is not yet updated daily and there are no manual checks to delete duplicates or correct misclassified records.

Google has deals with several academic publishers that allow it to search the full text of many papers, whereas Web of Science and the others are largely restricted to searching abstracts. But Scholar’s index is restricted to online sources; Web of Science has archives that go back to 1900. And the automated process means Scholar’s citation tracking can return odd results. For example, Web of Science finds almost 14,000 citations for a 1988 Science paper on the polymerase chain reaction², identifying it as the most highly cited paper ever to appear in that journal. Scholar finds just under 3,000.

“I want to make Google Scholar the one place to go for scholarly information across all disciplines.”
Anurag Acharya

All this suggests that there may be little overlap between the citations in the grey literature found by Scholar, and those extracted from the primary literature, even when the citation counts match up. For now, librarians are unanimous in their advice: stick to Web of Science or Scopus if you need to do a thorough literature search or an accurate citation count. The engines have impressive coverage and well indexed records with fewer misclassified entries. Librarians also warn that Scholar is still an experimental, or beta, version. Google remains reluctant to reveal details of its search algorithm, or what it indexes, so using Scholar as a tool for checking citation counts remains a distant prospect, they say.

All three search engines will continue to evolve. Scopus and Web of Science plan to add additional resources to their databases, such as institutional repositories, together with new ways for searching those sources. Scopus, for example, is integrated with a chemical database, such that users can go from a literature search to see structural information on molecules of interest. But it is unlikely that these engines will ever mine the web as broadly as Scholar. Elsevier has a separate, free search engine, called Scirus, that searches science web resources, but it doesn’t track citations.

So where does this leave Acharya’s bold goal? Librarians say that Scholar’s current high usage rates are likely to reflect searches run by undergraduates, who typically require only a couple of key papers on any one subject, and researchers who want a quick snapshot of an unfamiliar field. Acharya says he intended Scholar to appeal to such users, but also wants to attract academics who need to keep up with the latest papers in their field. As Thomson and Elsevier continue to invest in new services, it will be interesting to see whether Scholar can keep up.

Two’s company

With just two full-time staff working with Acharya, it would seem that Scholar is a low priority for Google. But maybe they could draw on the expertise of outside computer programmers by letting them write software that taps into Scholar’s database. It is an approach Google has used before to good effect. If Google allows programmers to do the same with Scholar, it is likely that add-ons would be developed by librarians and academics.

So does Scholar plan to open itself to outsiders? Not right now, says Acharya. He remains cagey, but is not ruling it out. “We may reconsider this decision once the service is closer to how we envisage it.”

The Google team may also reconsider if enthusiasm for Scholar continues to grow. Librarians at Virginia Tech in Blacksburg have already created a free software extension, called LibX, for an Internet browser, which allows users to retrieve papers using Scholar with a simple mouse click on highlighted text. LibX will take you directly to your library’s resources, if the paper can be found there. And that is the sort of tool both Google and librarians can learn to love. ■

Jim Giles is a reporter for Nature in London.

1. Bauer, K. & Bakkalbasi, N. D-Lib Magazine doi:10.1045/september2005-bauer (2005).
2. Saiki, R. K. et al. Science 239, 487–491 (1988).

Inside information

Science search engines are fine for literature searches, but scientists inevitably need much broader information from the web. Searching using the main Google engine may take some coaxing, but a few tricks can help you to find the most relevant information faster, and to get a variety of views on a topic.

Google has advanced search options that will help you narrow your search, using more precise terms, or broaden it, using synonyms. Here we list some less well-known tips, using the drug Tamiflu as an example.

Site: Websites are often difficult to find your way around, so rather than wasting time endlessly clicking, just type ‘site:’ into your query followed by the website name. Searches can also be restricted to a domain name. For example, ‘site:gov’ will limit a search to US government sites, and ‘site:nih.gov’ to the National Institutes of Health. A search for Tamiflu at the World Health Organization, ‘Tamiflu site:who.int’, returns about 100 hits. A broader search, such as ‘tamiflu site:edu’, brings back more than 40,000 hits from US universities.

Filetype: A useful way to refine searches is to search for particular document types using the ‘filetype:’ query. A search for ‘Tamiflu filetype:ppt’ will return only PowerPoint presentations, which are usually conference talks. ‘Filetype:doc’ will often return project proposals or government texts; ‘filetype:pdf’ is more likely to return scientific information.

Define: This simple query will provide a definition of the words you enter after it, gathered from various online sources. The query ‘define:Tamiflu’ takes you to definitions in Wikipedia in several languages, for example.

Quotation marks: Ultimately, the web is about people, and if you are looking for contacts, or possible collaborators, there are some ways to Google scientists. The query, ‘“avian influenza” “workshop participants”’, will bring back a few hundred hits, often with contact details for world experts among the top results. Variations of this will do the same in any scientific field.

Declan Butler
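The search tips in ‘Inside information’ all amount to adding operators to a plain query, so they are easy to assemble programmatically. The sketch below is only an illustration: the function names build_query and search_url are hypothetical, and the endpoint https://www.google.com/search is an assumption that does not appear in the article.

```python
from urllib.parse import urlencode

# Assumption: the public Google search endpoint; the article does not give a URL.
SEARCH_URL = "https://www.google.com/search"


def build_query(terms="", site=None, filetype=None, phrases=()):
    """Combine plain terms, exact phrases and operators into one query string."""
    parts = []
    if terms:
        parts.append(terms)
    # Each exact phrase is wrapped in double quotation marks.
    parts.extend(f'"{phrase}"' for phrase in phrases)
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


def search_url(query):
    """URL-encode the query so that it can be pasted into a browser."""
    return f"{SEARCH_URL}?{urlencode({'q': query})}"


if __name__ == "__main__":
    examples = [
        build_query("Tamiflu", site="who.int"),
        build_query("Tamiflu", filetype="ppt"),
        "define:Tamiflu",
        build_query(phrases=["avian influenza", "workshop participants"]),
    ]
    for query in examples:
        print(query, "->", search_url(query))
```

Keeping the query builder separate from the URL encoding means the same strings can be typed straight into the search box, exactly as the box describes.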
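The duplication problem described under ‘Cite unseen’, in which one work appears as many slightly different citations, can be illustrated with a small sketch. This is not how Scholar, Web of Science or Scopus process records; it is a minimal example, assuming that the variants differ only in case, punctuation, accents and spacing. The function names and the sample records are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def normalise_title(title):
    """Reduce a title to a comparison key: ASCII, lower case, single spaces."""
    # Decompose accented characters and drop the accents, e.g. 'é' -> 'e'.
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace every run of characters that are not letters or digits with one space.
    cleaned = re.sub(r"[^a-z0-9]+", " ", ascii_only.lower())
    return cleaned.strip()


def group_duplicates(titles):
    """Map each normalised key to the list of raw titles that share it."""
    groups = defaultdict(list)
    for title in titles:
        groups[normalise_title(title)].append(title)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical records of the kind an automated crawler might extract.
    records = [
        "Computers and Intractability",
        "Computers and intractability.",
        "COMPUTERS AND INTRACTABILITY",
        "Computers  and Intractability:",
        "An unrelated paper",
    ]
    for key, variants in group_duplicates(records).items():
        print(f"{len(variants)} record(s) for '{key}'")
```

Exact matching on a normalised key catches only formatting variants. Records that differ in wording, such as a truncated title or a misread author name, would need approximate string matching and, most likely, the manual checks that the article says Scholar lacks.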

© 2005 Nature Publishing Group