LAW LIBRARY JOURNAL Vol. 104:3 [2012-29]

The Long Tail of Legal Information: Legal Reference Service in the Age of the Content Farm*

R. Lee Sims** and Roberta Munoz***

The authors discuss the implications for legal reference service of a new feature of the legal information universe: the content farm. This article describes the content farm, its workings, what makes it profitable, and the market and informational forces that drive content farm creation. It also discusses how reference interactions may be altered if a patron has consulted content farm information before coming to the reference desk.

A New Challenge for Law Librarians at the Reference Desk

¶1 Nonexpert legal research patrons come to the reference desk wanting answers to their questions, but they often don’t know what question to ask or how to ask it. The first encounter between patron and reference librarian frequently begins with an analysis of the question itself. It is important for the librarian to consider whether the patron has asked a question that can be answered, and whether the patron has asked the question using language that will lead him down the right path.

¶2 Now, a new type of legal information is becoming widely available, even ubiquitous, that makes this process even more challenging. Vast quantities of legal information are currently being generated by “content farms.”1 Content farm information, retrieved through search engines like Google, answers legal questions with a high degree of specificity, even when the questions posed are incomplete, poorly formed, or nonsensical. Patrons with an article from a content farm in hand may come to the reference desk assuming that someone else has posed the correct question, which they then have adopted as their own. They may further assume that their question has been answered in whole or in part, and that all they need is for the librarian to place the materials cited in the article in their hands and their work will be done.
They may well be surprised to come face-to-face with a librarian who is not so sure. The primary purpose of this article is to alert legal reference librarians to this new and fundamentally different type of legal information. We discuss and explain the content farm phenomenon as it affects legal information aimed at the nonexpert user.

* © R. Lee Sims and Roberta Munoz, 2012.
** Head of User Services, Rutgers-Newark Law Library, Newark, New Jersey.
*** Head Librarian, CMS: Cataloging and Metadata Services of New Jersey, Weehawken, New Jersey.
1. “Content farms” are defined infra ¶¶ 5–9.

¶3 Those who follow trends in information dissemination may recall a recent controversy concerning a change in Google’s search algorithm that was designed to slow down the proliferation of content farm information.2 This change makes assumptions about content farm information: specifically, that information generated for content farms is somehow less worthy than other legal information on the web.

¶4 In examining whether this assumption is correct, we begin by trying to define the term content farm, a term that is evolving even now. We then describe how legal information is generated and disseminated via content farms and explore why this type of information is different from other kinds of legal information, good and bad, that are available on the Internet. We analyze the economic theory upon which content farm information is being developed (the “long-tail theory”) and show and critique some examples of legal content generated by content farms that are currently available on the web. As part of that discussion, we pose the question of whether Google is right that content farm content should be removed from the web or at least from Google’s search results.
We end with a review of why familiarity with this material is important to reference librarians and why, in the context of a legal reference transaction, content farm material is fundamentally different from other sources of legal information our users may bring to us.

Content Farms Defined

¶5 What is a content farm? How does legal information produced by content farms differ from other sources of information used by our patrons? Information generation via content farms is a relatively new phenomenon, and there is as yet no settled definition. But the most popular definitions share four important characteristics: (1) content farm information is produced “on demand”: it is created in response to user-generated search data and uses the natural language terms that individuals plug into search engines; (2) content is written around keyword phrases and heavily “search engine optimized” for those phrases; (3) information created this way is generally considered to be of substandard quality; and (4) content is accompanied by advertisements and links to service providers’ web sites.

¶6 One definition of content farm information appears on the Google Blog. There, Matt Cutts, Google’s principal engineer, presented a definition of a content farm in an effort to explain why Google was changing its algorithm to deemphasize content farm information, saying that content farms “are sites with shallow or low-quality content.”3 Cutts equated content farm information with webspam, a term that probably does not require definition.4

2. Steve Lohr, Google Schools Its Algorithm, N.Y. TIMES, Mar. 6, 2011, at WK4 (“[I]ndustry analysts agree that the target seemed to be so-called content farms, often sites with listlike articles, filled with words that are frequently used as search terms.”).
3. Matt Cutts, Google Search and Search Engine Spam, GOOGLE OFFICIAL BLOG (Jan. 21, 2011, 12:00 P.M.), http://googleblog.blogspot.com/2011/01/google-search-and-search-engine-spam.html.
4. Google’s description of webspam can be found in Matt Cutts, Using Data to Fight Webspam, GOOGLE OFFICIAL BLOG (June 27, 2008, 7:51 P.M.), http://googleblog.blogspot.com/2008/06/using-data-to-fight-webspam.html (“[W]ebspam is junk you see in search results when websites try to cheat their way into higher positions in search results or otherwise violate search engine quality guidelines.”).

¶7 It is more accurate to say that the definition of a content farm, like the market for the content and the content itself, is continuously evolving. A more detailed definition is provided by Allen Graves.5 He makes reference to Cutts’s post on the Google Blog but includes additional factors gleaned from other sources (like the Wikipedia article on the topic6) for determining when a web site can be considered a content farm. His list of factors, including his own emphasis and parenthetical comments, is as follows:

• Multiple writers producing large amounts of content
• Authors are paid and may not be experts on what they are writing
• Content is written around currently popular/profitable long-tail keyword phrases and optimized heavily for those phrases
• Content is of low quality and/or shallow (subjective)
• Content is “spammy” (subjective)
• Content does not link to authority websites or accurate resources
• Content can be considered “intra-domain duplicate content” by the newly upgraded search engine document indexer
• Content is diminutive, without supporting information or resolution
• Website or section of website contains large and growing number of articles
• Pages are designed to drive traffic to other monetized web pages or lead forms
• Content is designed to drive traffic to other monetized web pages or lead forms
• Content is surrounded by multiple advertisements, lead generation forms, contextual adverts, affiliate links or any other monetization techniques7

¶8 In reality, it is almost impossible to make a blanket statement about the quality of information coming from content farms. Although commentators like Cutts and Graves deem content farm material to be shallow, trivial, low quality, and substandard, this is not always true. We analyze some examples later in this article and consider the quality of the information we found. In the end, the determination of whether this type of material is trivial or substandard is the responsibility of the information seeker. For reasons discussed in later sections, we believe that, regardless of what that determination may be, an objective understanding of the properties of this type of information will help reference librarians who encounter patrons who have decided to rely on content farm information.

¶9 Additionally, when Cutts and Graves assert that content farm articles do not contain references to authoritative sources, they are mistaken. The content farm articles that we critique below all include links to supporting authorities. Painting all content farms with the same brush is unfair. Content farms vary widely, not only in the type of information they provide, but in the validity and accuracy of that information and the authority upon which it is based.

5. Allen Graves, What Is a Content Farm? A Comprehensive Definition, WEBSITE-ARTICLES (Mar. 18, 2012, 10:42 A.M.), http://www.website-articles.net/articledetail.php?artid=1172&catid=424.
6. Content Farm, WIKIPEDIA, http://en.wikipedia.org/wiki/Content_farm (last modified Feb. 16, 2012).
7. Graves, supra note 5.

¶10 It is vital that reference librarians know that content farm information is generated for commercial purposes. The entities that generate content farm information produce information that is designed to keep the user on the page, reading the content or, at a minimum, following relevant links to more information, more articles, or commercial sites.

¶11 For the sake of simplicity and clarity, we limit our review to those legal content farms in which the information is generated “on demand,” is written by human beings, contains citation to authority, and has words and phrases designed for search engine optimization.