Publishing Primary Data on the World Wide Web: Opencontext.org and an Open Future for the Past

Eric C. Kansa

The Internet and Scholarly Communication would host and maintain these draft papers. University libraries are some of the most vocal advocates for OA re- More scholars are exploring forms of digital dissemina- search. Current publishing frameworks have seen dramati- tion, including (OA) systems where content is cally escalated costs, sometimes four times higher than the made available free of charge. These include peer-reviewed general rate of inflation (Create Change 2003). Increasing e-journals as well as traditional journals that have an on- costs have forced many libraries to cancel subscriptions line presence. Besides SHA’s Technical Briefs in Historical and thereby hurt access and scholarship (Association for Archaeology, the American Journal of Archaeology now offers College and Research Libraries 2003; Suber 2004). open access to downloadable articles from their printed issues. Similarly, Evolutionary Anthropology offers many Beyond Papers: Sharing Other Scholarly Media full-text articles free for download. More archaeologists are also taking advantage of easy Web publication to post These policy developments and others show that OA copies of their publications on personal websites. Roughly models are gaining in importance. Similar trends toward 15% of all scholars participate in such “self-archiving.” To greater are evident for other types of scholarly encourage this practice, Science Commons (2006) and the content, beyond peer-reviewed papers. Besides making Scholarly Publishing and Academic Resources Coalition distribution highly cost-effective, the Internet provides a (SPARC) recently launched the Scholar Copyright powerful means for sharing large collections of rich media Project, an initiative that will develop standard “Author and complex data. These types of content are important Addenda”—a suite of short amendments to attach to copy- components of both museum collections and excavation right agreements from publishers . These addenda make it of this complex, media rich content is an important goal easier for paper authors to retain and clarify their rights to for many seeking to reform and enhance scholarly commu- self-archive their papers electronically. Several studies now nication (Association of Research Libraries 2006:57–59). clearly document that self-archiving and OA publication As demonstrated in ecology and other sciences, reused enhances uptake and citation rates (Hajjem et al. 2005). primary data can be an important resource for advances in Researchers enhance their reputations and stature by open- understanding (Kansa 2005). ing up their scholarship. Data sharing does present a new set of technical, con- Mounting pressure for greater public access also comes ceptual, and incentive problems. Excavation results, spe- from many research stakeholders. Granting foundations cialist analyses, and museum collection databases are highly interested in maximizing the return on their investment variable and often complex (Kintigh 2006). Nevertheless, in basic research are often encouraging and sometimes there has been great progress on many of the technical even requiring some form of OA electronic dissemination. and conceptual problems involved in pooling and integrat- Interest in maximizing public access to publicly financed ing the complex and unstandardized data generated by research is catching on in Congress. A new bipartisan bill, researchers. The Etana Digital Library project (Etana-DL) the Federal Research Public Access Act, would require OA , led for drafts of papers that pass and result from by James Flanagan and digital library pioneer Edward Fox, federally funded research (U.S. Congress 2006). The bill has successfully demonstrated a data-mediation system that would create government-funded digital repositories that uses software to translate local data structures to a more

Technical Briefs In historical archaeology, 2007, 2: –11 Publishing Primary Data on the World Wide Web general data structure. This mediation enables Etana-DL to easy to integrate with a host of other Web services, includ- provide interoperability and integrated search, browse, and ing weblogs, e-journals, and commercial search engines. analysis tools for several Near Eastern excavation datasets Search engine discovery is becoming an increasingly sig- (Flanagan et al. 2004). Dean Snow and colleagues advocate nificant factor in determining the impact and uptake of developing advanced text-mining systems to extract com- research (Jensen 2005; Vaughan and Shaw 2005). parative data from archaeological reports, including “grey literature” documentation generated from CRM activities Types of Material in Open Context (Snow et al. 2006). Following the model of other scien- tific disciplines, the National Science Foundation recently Open Context is best suited for publishing large bodies awarded a group led by Keith Kintigh and colleagues a of complex archaeological documentation. All content is large “cyber-infrastructure” grant to stimulate data integra- linked together in an integrated and cohesive resource. The tion and sharing in archaeology. This project is now in its types of content in Open Context include: initial stages and aims to begin by developing ontologies for • Narratives: These include more loosely structured, tex- zooarchaeology. Ontologies are formally defined concep- tual types of content, including excavation notes, ob- tual systems and are often used to support the integration servations, and diaries. These narratives are integrated of multiple datasets within a discipline. and linked to other types of information, including database records and other media (images, videos, and An “Open Context” for Excavation Results and maps) (Figure 2). Related Collections • Analytic (Tabular) Data: Open Context enables publica- tion of database types of content. These include context Other working systems are now coming online, including databases, finds registries, museum registries and cata- two related systems, the University of Chicago OCHRE logs, and specialist analyses. All of these different types project and Open Context of data are automatically integrated together in one . Both systems share the cohesive database (Figure 3). same data architecture described by the Archaeological • Media: Open Context can link digital images, maps, Markup Language (ArchaeoML) and both have similar ca- drawings, GIS files, videos, and other types of media pabilities for integrating and pooling complex and media- with other forms of content. For example, a user will rich archaeological documentation (Schloen 2001; Kansa immediately know if an item in a finds registry was 2005). While OCHRE provides sophisticated data man- photographed or drawn because a thumbnail image will agement tools targeted for active research projects, Open appear with the record of that item (Figure 4). Context (Figure 1) is aimed at streamlined, Web-based access and retrieval of excavation and collections-related Using Open Context content. Funding for the development of Open Context came from the William and Flora Hewlett Foundation, as Open Context makes it easy to browse, search, and analyze part of their Education Program’s effort to make high-qual- data from different projects and collections. It can serve ity instructional resources freely available on the Web. as a reference resource to help researchers find relevant Open Context enables researchers to publish their pri- comparative materials. It can also support reanalysis and mary field data, notes, and media (images, maps, drawings, reinterpretation of excavation results. Finally, undergradu- videos) on the World Wide Web. It provides an easy to use, ate students can use Open Context as a primary source, so yet powerful, online database for exploring, searching, and they can develop important analytical skills by exploring analyzing multiple excavation results, survey datasets, and and synthesizing primary excavation results. Users have museum collections. These diverse datasets can be explored a variety of options to find materials in Open Context, by browsing through a map (Figure 1) or through different including simple, “Google-like” text searches (Figures 5 search options. Open Context is built with standard but and 6), and more sophisticated, advanced searches that use powerful Web technologies (MySQL and PHP), making it Boolean logic (Figure 7).

 Technical Briefs in historical archaeology Eric C. Kansa

Figure 1. A simple map enables users to locate content based on geographic location.

Access, Copyright, and Reuse of Content reuse and reapplication of copyrighted digital content, each item in Open Context is licensed with an open, Creative Open Context is an OA publication system. All content Commons license. These licenses give explicit permissions is freely available on the World Wide Web. Contributors for users to freely and legally use the material so long as who publish with Open Context retain copyright to their they properly attribute the original creator (Brown 2003) content. This means all contributors are free to publish (Figure 8). licenses include machine- their material with other venues (including journals, readable RDF metadata that is captured by commercial books, and other websites). Open Context is also unique search engines such as Yahoo and Google (Kansa et al. in that it provides a framework for sharing archaeological 2005). This metadata facilitates discovery of openly licensed research, free of burdensome copyright restrictions, while content, including Open Context resources. Such openness still protecting scholarly attribution. Copyright law pro- ensures that the Open Context content is of maximum hibits unauthorized reproduction or derivative uses of all value for reuse in both instructional and research applica- expressive works. These legal barriers work against some of tions. Finally, to facilitate scholarly applications, citation the innate advantages of digital content, namely the ease to information is automatically generated for each item in which digital information can be readapted in new works the database (Figure 8). Stable URLs to each item in Open and then globally shared. In order to encourage scholarly Context facilitate citation and later retrieval.

Technical Briefs in historical archaeology  Publishing Primary Data on the World Wide Web

Figure 2. An excavation diary linking to items from the Domuztepe Context Database.

Figure 3. A record from the Domuztepe Zooarchaeological Analysis Database.

 Technical Briefs in historical archaeology Eric C. Kansa

Figure 4. An image linked with its small finds registry record and context.

Figure 5. A simple search for “carnelian.”

Technical Briefs in historical archaeology  Publishing Primary Data on the World Wide Web

Figure 6. Results of the above “carnelian” search, showing items from multiple projects.

Figure 7. Example of an “Advanced Search” across multiple projects.

 Technical Briefs in historical archaeology Eric C. Kansa

Making Sense of Multiple Projects and (Bearman and Trant 2005; Trant 2006). Open Context is Collections developing several enhancements to this system, including better ways of recognizing professional credentials and The lack of many formal standards in archaeology makes scholarly authority, and options for users to apply profes- data integration a challenge. Open Context uses special sionally developed standard vocabularies such as the Getty software to import spreadsheets and databases created by Art and Architecture Thesaurus or future ontologies such individual researchers and collections managers. We are as those advocated by Kintigh’s team (Kintigh 2006). currently completing development of a a web-based ver- sion of this import software. The web version will enable “Pinging” the Past individual contributors to upload and document their own data tables and submit them for editorial review. Because Folksonomies, once they are properly adapted for profes- Open Context uses ArchaeoML’s very flexible architecture, sional applications, and other collaborative methods of data all the original recording systems and terminologies and re- analysis can become new areas where researchers can make tained. Open Context enables researchers to publish their important scholarly contributions. “Publishing” primary da- data without forcing them to conform to overly restrictive, prede- tasets does raise some interesting questions about value and termined standards. To help make sense of this widely varying credit. An excavation dataset is not like a peer-reviewed body of material, Open Context has a user “folksonomy” paper, currently the main currency of professional achieve- system. Open Context enables users to tag items either in- ment. To begin assessing the scholarly impact of digital dividually or collectively (users can assign a tag to items in datasets, Open Context records information on visits to a query result set). When query result sets are tagged, the each record in the system. Recent studies have shown a history of query composition is automatically linked to the significant correlation between download counts and more tagging event (Figure 9). Users can also further annotate commonly used measures of citation impact in scholarly and explain the rationale behind their tag assignments. Tags papers (Brody et al. 2006). Nevertheless, a database in can be used to save search selections for future reference itself is a poor guide to understanding excavation or survey and to share sets of items with colleagues. The authorship results. In order to be better understood and used, datasets of each tagging event is documented by Open Context. are best linked with papers and narratives that synthesize Users can filter out tags and tag authors they consider un- observations and interpretations in a more meaningful reliable. Currently, Open Context developers, a team led framework (Richards 2003). This will necessitate technical by two archaeologists with doctoral degrees, monitor and and editorial coordination between archaeological journals moderate tagging activity to insure quality and relevance. and online data publishers. As content and tagging activity expands, subject matter To facilitate coordination with narratives, Open experts will be needed to provide editorial review of the Context automatically generates reciprocal hyperlinks folksonomy system. Reviewers will receive email notifica- with other Web services that support the open “ping-back” tion of tagging activity and can remove inappropriate tags standard. If a person using a weblog or publishing in a ping- from public view. back-enabled e-journal references an item or a set of items Folksonomies are cost-effective and simple tools in Open Context, the Open Context system will be auto- that enable a community of users to add value to pooled matically informed about what items are being referenced. content by identifying and annotating items of interest. Open Context will then display links back to the weblog Users can “tag” items with common keywords and phrases post or e-journal article referencing the Open Context da- and thereby establish and share meaningful links among tabase. Once an editorial board is assembled, all links will items from different projects and collections, even if these be subjected to editorial review. This will ensure that Open projects use different recording systems. The folksonomy Context only registers references to trusted sources (such system can facilitate semantic data integration, and recent as an e-journal with a peer-review process). As e-journal experiments suggest such systems offer annotations of suf- systems gain popularity, such features will help ensure that ficient quality to meet some needs of museum professionals Open Context users will easily find scholarly uses and in-

Technical Briefs in historical archaeology  Publishing Primary Data on the World Wide Web

Figure 8. An automatically generated citation for an Open Context item.

Figure 9. A query history saved with a set of tagged items.

 Technical Briefs in historical archaeology Eric C. Kansa terpretations of Open Context content. Similarly, the abil- ect provides digital longevity support for Open Context ity to reference public databases such as Open Context can pilot projects, and additional archival partnerships will be enhance e-journal publications by making primary evidence needed to better secure the often-irreplaceable archaeo- more transparent and open for critical evaluation. logical content hosted by Open Context. To learn more about contributing to Open Context, please contact the Looking Forward author via email ([email protected]). With sufficient community contributions, feedback, Open Context has a variety of demonstration datasets now and support, Open Context and related OA systems will available for exploration and testing. These include field expedite and streamline reference searches and provide a archaeology contextual records and finds registers, geo- comparative format for efficiently interpreting and reana- archaeological samples, and a variety of zooarchaeological lyzing excavation results. Similarly, making primary data ef- analyses. The system is also adding museum and reference ficiently accessible and usable can support research agendas collection datasets. Some projects have rich image collec- that are not currently achievable. By pooling primary data tions and narrative material, and others are of primary resources in systems with powerful analytic tools, such interest for specialist comparative analyses. systems should enable broad regional syntheses that are The primary goal now is to build a critical mass of more comprehensive and more analytically rigorous than users, contributors, and content needed to sustain Open are currently feasible (Kansa 2005; Kintigh 2006). Context as a valued scholarly resource. Because Open Contributing to the development and use of systems Context supports dissemination of highly structured like Open Context illustrates one way the archaeological database content along with textual narratives, such as community can participate in the broader shifts toward historical documents, and other media, such as photos or OA. Across the board, OA now has a great deal of mo- drawings, it can be a useful communication tool for the mentum and powerful institutional support. As of mid- historical archaeology community. Databases of excava- September 2006, the Alliance for Taxpayer Access (2006) tion records, finds registries, and comparative collections reported that 53 college presidents and 25 university can be pooled together with textual syntheses stored in provosts have signed letters in support of OA legislation. the form of HTML or PDF documents. The flexible data This lobbying demonstrates significant institutional back- structure can also enable historical archaeologists to relate ing for the reform of scholarly communication. At the archaeological datasets with other types of structured in- same time, OA journals published by the Public Library of formation, such as historical census reports, tax records, Science (2006) achieved impact factors rivaling Nature and cemetery records, or even shipping manifests. Relating Science. Other published studies document how OA makes such varied forms of documentation may open doors for research easier to find and use and allows that research to innovative research agendas in historical archaeology. To have greater impact and significance (Harnad and Brody help demonstrate and refine the applicability of Open 2004; Hajjem et al. 2005). Data hording, access barri- Context to support this subdiscipline, its developers invite ers, and overly restrictive copyright controls increasingly contributions of excavation and survey data, media, and self-marginalize both research and researchers (Willinsky museum collections from historical archaeologists. 2006:21). These broader transitions in academic commu- In addition to contributing content, members of the nication, and more discipline specific initiatives such as the historical archaeological community are also welcome to development of Open Context, all illustrate how the future participate in other capacities. Open Context needs edito- of the archaeological past is looking increasingly open. rial assistance to help insure quality and oversee revision and error correction. Additional support may be provided ACKNOWLEDGMENTS through open-source software development partnerships. Finally, data sharing and collaboration among multiple The author would like to thank everyone who has par- repositories is an important digital data longevity strategy ticipated in making Open Context operational, especially (Reich and Rosenthal 2001). Currently, the OCHRE proj- David Schloen and the University of Chicago OCHRE

Technical Briefs in historical archaeology  Publishing Primary Data on the World Wide Web project for their continued support and partnership. Their tion to Encourage Interaction with Museum Collections. efforts have provided the essential conceptual groundwork D-Lib Magazine 11(9) 10 Dec 2006. be given for the collaboration of (Science Commons) and Heather Ford (iCommons) for facilitating Brody, Tim, Stevan Harnad, and Les Carr Open Context and other and open culture 2006 Earlier Web Usage Statistics as Predictors of Later initiatives. Sarah Whitcher Kansa deserves special thanks Citation Impact. Journal of the American Society for Informa- for her comments, ideas, and edits, as do the SHA review- tion Science and Technology 57(8):1060–1072. ers for their role in enhancing this text. Any omissions or errors in this work are solely the fault and responsibility Brown, Glenn Otis of the author. Finally, Open Context and many other open 2003 Out of the Way: How the Next Copyright education initiatives directly result from the enabling finan- Revolution Can Help the Next Scientific Revolution. cial support of the William and Flora Hewlett Foundation. PLoS Biology 1(1):e9 10 Dec 2006. Foundation help ensure that Open Context can serve as a free and open access resource for the community. Create Change 2003 Scholars under Siege: The Scholarly Commu- References nication Crisis. Overview of the Issues, Create Change (Association of Research Libraries, Association of College Alliance for Tax Payer Access and Research Libraries, and the Scholarly Publishing 2006 Fifty-three College Presidents Lead Newest Show and Academic Resources Coalition) ance for Tax Payer Access. 6 Sept 2006. Flanagan, James W., Edward A. Fox, Douglas R. Clark, Association of College and Research Libraries Weiguo Fan, Rao Shen Ravindranathan, and 2003 Journal Pricing. Scholarly Communication Tool- Marcos André Gonçalves kit. American Library Association, Chicago, IL 2004 Jordanian Archaeology and ETANA: Developing a 10 Dec 2006. Petra, Jordan. 10 Dec 2006. Association of Research Libraries 2006 To Stand the Test of Time: Long-Term Steward- Harnad, Stevan, and Tim Brody ship of Digital Data Sets in Science and Engineering. 2004 Comparing the Impact of Open Access (OA) vs. Report to the National Science Foundation from the Non-OA Articles in the Same Journals. D-Lib Magazine ARL Workshop on New Collaborative Relationships: The 10(6). 10 Dec 2006. September 26–27, 2006, Arlington, VA 6 Jan 2007. Hajjem, Chawki, Stevan Harnad, and Yves Gingras 2005 Ten-Year Cross-Disciplinary Comparison of the Bearman, David, and Jennifer Trant Growth of Open Access and How It Increases Research Ci- 2005 Social Terminology Enhancement through Ver- tation Impact. IEEE Data Engineering Bulletin 28(4):39–47. nacular Engagement: Exploring Collaborative Annota- 10 Dec 2006.

10 Technical Briefs in historical archaeology Eric C. Kansa

Jensen, Michael Snow, Dean R., Mark Gahegan, C. Lee Giles, 2005 Presses Have Little to Fear from Google. The Kenneth G. Hirth, George R. Milner, Prasenjit Mitra, Chronicle for Higher Education 51(44):B16. and James Z. Wang 2006 Cybertools and Archaeology. Science 311(5763): Kansa, Eric 958–959. 2005 A Community Approach to Data Integration: Authorship and Building Meaningful Links across Diverse Suber, Peter Archaeological Data Sets. Geosphere 1(2):97–109. 2004 University Actions against High Journal Prices. SPARC Open Access Newsletter 72. 2005 Protecting Traditional Knowledge and Expanding 10 Dec 2006. Access to Scientific Data. International Journal of Cultural Property 12(3):97–109. Trant, Jennifer 2006 Exploring the Potential for Social Tagging and Kintigh, Keith Folksonomy in Art Museums: Proof of Concept. New 2006 The Promise and Challenge of Archaeological Review of Hypermedia and Multimedia 12(1):83–105. Data Integration. American Antiquity 71(3):567–578. U.S. Congress Public Library of Science (PLoS) 2006 Federal Research Public Access Act of 2006, S 2695, 2006 About PLoS Biology. Journal Information. Public 109th Congress, 2nd. Session, Library of Congress, Library of Science. 10 Dec 2006. ry/z?c109:S.2695:> 10 Dec 2006.

Richards, Julian Vaughan, Liwen, and Debora Shaw 2003 Online Archives. Internet Archaeology 15, Council 2005 Web Citation Data for Impact Assessment: for British Archaeology, Department of Archaeology, A Comparison of Four Science Disciplines. Journal of University of York, England, UK 10 Dec 2006. 56(10):1075–1087.

Reich, Vicky, and David S. Rosenthal Willinsky, John 2001 LOCKSS: A Permanent Web Publishing and Ac- 2006 The Access Principle. MIT Press, Cambridge, MA. cess System. D-Lib Magazine 7(6). 10 Dec 2006. asp?ttype=2&tid=10611> 10 Dec 2006.

Science Commons Eric C. Kansa 2006 Scholar’s Copyright Project: Background. Cre- Open Context ative Commons and Science Commons San Francisco, CA 94127 10 Dec 2006.

Schloen, David 2001 Archaeological Data Models and Web Publication Using XML. Computers and the Humanities 35:123–152.

Technical Briefs in historical archaeology 11