Publishing Primary Data on the World Wide Web: Opencontext.org and an Open Future for the Past
Eric C. Kansa
The Internet and Scholarly Communication

More scholars are exploring forms of digital dissemination, including open access (OA) systems where content is made available free of charge. These include peer-reviewed e-journals as well as traditional journals that have an online presence. Besides SHA’s Technical Briefs in Historical Archaeology, the American Journal of Archaeology now offers open access to downloadable articles from their printed issues. Similarly, Evolutionary Anthropology offers many full-text articles free for download. More archaeologists are also taking advantage of easy Web publication to post copies of their publications on personal websites. Roughly 15% of all scholars participate in such “self-archiving.” To encourage this practice, Science Commons (2006) and the Scholarly Publishing and Academic Resources Coalition (SPARC) recently launched the Scholar Copyright Project, an initiative that will develop standard “Author Addenda,” a suite of short amendments to attach to copyright agreements from publishers

would host and maintain these draft papers. University libraries are some of the most vocal advocates for OA research. Current publishing frameworks have seen dramatically escalated costs, sometimes four times higher than the general rate of inflation (Create Change 2003). Increasing costs have forced many libraries to cancel subscriptions and thereby hurt access and scholarship (Association for College and Research Libraries 2003; Suber 2004).

Beyond Papers: Sharing Other Scholarly Media

These policy developments and others show that OA models are gaining in importance. Similar trends toward greater openness are evident for other types of scholarly content, beyond peer-reviewed papers. Besides making distribution highly cost-effective, the Internet provides a powerful means for sharing large collections of rich media and complex data. These types of content are important components of both museum collections and excavation
Technical Briefs in Historical Archaeology, 2007, 2: –11

general data structure. This mediation enables Etana-DL to provide interoperability and integrated search, browse, and analysis tools for several Near Eastern excavation datasets (Flanagan et al. 2004). Dean Snow and colleagues advocate developing advanced text-mining systems to extract comparative data from archaeological reports, including “grey literature” documentation generated from CRM activities (Snow et al. 2006). Following the model of other scientific disciplines, the National Science Foundation recently awarded a group led by Keith Kintigh and colleagues a large “cyber-infrastructure” grant to stimulate data integration and sharing in archaeology. This project is now in its initial stages and aims to begin by developing ontologies for zooarchaeology. Ontologies are formally defined conceptual systems and are often used to support the integration of multiple datasets within a discipline.

An “Open Context” for Excavation Results and Related Collections

Other working systems are now coming online, including two related systems, the University of Chicago OCHRE project

easy to integrate with a host of other Web services, including weblogs, e-journals, and commercial search engines. Search engine discovery is becoming an increasingly significant factor in determining the impact and uptake of research (Jensen 2005; Vaughan and Shaw 2005).

Types of Material in Open Context

Open Context is best suited for publishing large bodies of complex archaeological documentation. All content is linked together in an integrated and cohesive resource. The types of content in Open Context include:
• Narratives: These include more loosely structured, textual types of content, including excavation notes, observations, and diaries. These narratives are integrated and linked to other types of information, including database records and other media (images, videos, and maps) (Figure 2).
• Analytic (Tabular) Data: Open Context enables publication of database types of content. These include context databases, finds registries, museum registries and catalogs, and specialist analyses. All of these different types
Figure 1. A simple map enables users to locate content based on geographic location.
Access, Copyright, and Reuse of Content

Open Context is an OA publication system. All content is freely available on the World Wide Web. Contributors who publish with Open Context retain copyright to their content. This means all contributors are free to publish their material with other venues (including journals, books, and other websites). Open Context is also unique in that it provides a framework for sharing archaeological research, free of burdensome copyright restrictions, while still protecting scholarly attribution. Copyright law prohibits unauthorized reproduction or derivative uses of all expressive works. These legal barriers work against some of the innate advantages of digital content, namely the ease with which digital information can be readapted in new works and then globally shared. In order to encourage scholarly reuse and reapplication of copyrighted digital content, each item in Open Context is licensed with an open, Creative Commons license. These licenses give explicit permissions for users to freely and legally use the material so long as they properly attribute the original creator (Brown 2003) (Figure 8). Creative Commons licenses include machine-readable RDF metadata that is captured by commercial search engines such as Yahoo and Google (Kansa et al. 2005). This metadata facilitates discovery of openly licensed content, including Open Context resources. Such openness ensures that the Open Context content is of maximum value for reuse in both instructional and research applications. Finally, to facilitate scholarly applications, citation information is automatically generated for each item in the database (Figure 8). Stable URLs to each item in Open Context facilitate citation and later retrieval.
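The automatic citation and stable-URL features described above can be illustrated with a short Python sketch. It is not Open Context's actual implementation. The record fields, the `example.org` base URL, and the citation layout are all assumptions made for illustration. The sketch shows only the general idea: an item's permanent identifier yields a URL that never changes, and the attribution text is assembled from stored metadata together with the item's license.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Item:
    """Hypothetical minimal record; field names are illustrative only."""
    item_id: str
    label: str
    project: str
    creators: tuple[str, ...]
    year: int
    license_url: str


# Placeholder base URL (an assumption), not Open Context's real address scheme.
BASE_URL = "https://example.org/items/"


def stable_url(item: Item) -> str:
    """Build a URL that depends only on the item's permanent identifier."""
    return f"{BASE_URL}{item.item_id}"


def format_creators(creators: tuple[str, ...]) -> str:
    """Join creator names as 'A', 'A and B', or 'A, B, and C'."""
    if not creators:
        raise ValueError("at least one creator is required for attribution")
    if len(creators) == 1:
        return creators[0]
    if len(creators) == 2:
        return f"{creators[0]} and {creators[1]}"
    return ", ".join(creators[:-1]) + f", and {creators[-1]}"


def citation(item: Item, accessed: date) -> str:
    """Assemble an attribution string from creators, year, label, project, URL, and license."""
    return (
        f"{format_creators(item.creators)} ({item.year}) {item.label}. "
        f"In {item.project}. {stable_url(item)} "
        f"(accessed {accessed.isoformat()}). License: {item.license_url}"
    )


if __name__ == "__main__":
    # Invented example values, used only to exercise the functions.
    example = Item(
        item_id="abc123",
        label="Bead 42",
        project="Example Project",
        creators=("A. Researcher", "B. Colleague"),
        year=2006,
        license_url="https://creativecommons.org/licenses/by/2.5/",
    )
    print(citation(example, date(2007, 1, 15)))
```

Deriving the URL from the identifier alone, rather than from a project name or a search path, is what keeps a citation resolvable if an item is later relabeled or reorganized.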
Figure 2. An excavation diary linking to items from the Domuztepe Context Database.
Figure 3. A record from the Domuztepe Zooarchaeological Analysis Database.
Figure 4. An image linked with its small finds registry record and context.
Figure 5. A simple search for “carnelian.”
Figure 6. Results of the above “carnelian” search, showing items from multiple projects.
Figure 7. Example of an “Advanced Search” across multiple projects.
Making Sense of Multiple Projects and Collections

The lack of many formal standards in archaeology makes data integration a challenge. Open Context uses special software to import spreadsheets and databases created by individual researchers and collections managers. We are currently completing development of a web-based version of this import software. The web version will enable individual contributors to upload and document their own data tables and submit them for editorial review. Because Open Context uses ArchaeoML’s very flexible architecture, all the original recording systems and terminologies are retained. Open Context enables researchers to publish their data without forcing them to conform to overly restrictive, predetermined standards. To help make sense of this widely varying body of material, Open Context has a user “folksonomy” system. Open Context enables users to tag items either individually or collectively (users can assign a tag to items in a query result set). When query result sets are tagged, the history of query composition is automatically linked to the tagging event (Figure 9). Users can also further annotate and explain the rationale behind their tag assignments. Tags can be used to save search selections for future reference and to share sets of items with colleagues. The authorship of each tagging event is documented by Open Context. Users can filter out tags and tag authors they consider unreliable. Currently, Open Context developers, a team led by two archaeologists with doctoral degrees, monitor and moderate tagging activity to ensure quality and relevance. As content and tagging activity expand, subject matter experts will be needed to provide editorial review of the folksonomy system. Reviewers will receive email notification of tagging activity and can remove inappropriate tags from public view.

Folksonomies are cost-effective and simple tools that enable a community of users to add value to pooled content by identifying and annotating items of interest. Users can “tag” items with common keywords and phrases and thereby establish and share meaningful links among items from different projects and collections, even if these projects use different recording systems. The folksonomy system can facilitate semantic data integration, and recent experiments suggest such systems offer annotations of sufficient quality to meet some needs of museum professionals (Bearman and Trant 2005; Trant 2006). Open Context is developing several enhancements to this system, including better ways of recognizing professional credentials and scholarly authority, and options for users to apply professionally developed standard vocabularies such as the Getty Art and Architecture Thesaurus or future ontologies such as those advocated by Kintigh’s team (Kintigh 2006).

“Pinging” the Past

Folksonomies, once they are properly adapted for professional applications, and other collaborative methods of data analysis can become new areas where researchers can make important scholarly contributions. “Publishing” primary datasets does raise some interesting questions about value and credit. An excavation dataset is not like a peer-reviewed paper, currently the main currency of professional achievement. To begin assessing the scholarly impact of digital datasets, Open Context records information on visits to each record in the system. Recent studies have shown a significant correlation between download counts and more commonly used measures of citation impact in scholarly papers (Brody et al. 2006). Nevertheless, a database in itself is a poor guide to understanding excavation or survey results. In order to be better understood and used, datasets are best linked with papers and narratives that synthesize observations and interpretations in a more meaningful framework (Richards 2003). This will necessitate technical and editorial coordination between archaeological journals and online data publishers.

To facilitate coordination with narratives, Open Context automatically generates reciprocal hyperlinks with other Web services that support the open “ping-back” standard. If a person using a weblog or publishing in a ping-back-enabled e-journal references an item or a set of items in Open Context, the Open Context system will be automatically informed about what items are being referenced. Open Context will then display links back to the weblog post or e-journal article referencing the Open Context database. Once an editorial board is assembled, all links will be subjected to editorial review. This will ensure that Open Context only registers references to trusted sources (such as an e-journal with a peer-review process). As e-journal systems gain popularity, such features will help ensure that Open Context users will easily find scholarly uses and interpretations of Open Context content. Similarly, the ability to reference public databases such as Open Context can enhance e-journal publications by making primary evidence more transparent and open for critical evaluation.

Figure 8. An automatically generated citation for an Open Context item.
Figure 9. A query history saved with a set of tagged items.

Looking Forward

Open Context has a variety of demonstration datasets now available for exploration and testing. These include field archaeology contextual records and finds registers, geoarchaeological samples, and a variety of zooarchaeological analyses. The system is also adding museum and reference collection datasets. Some projects have rich image collections and narrative material, and others are of primary interest for specialist comparative analyses.

The primary goal now is to build a critical mass of users, contributors, and content needed to sustain Open Context as a valued scholarly resource. Because Open Context supports dissemination of highly structured database content along with textual narratives, such as historical documents, and other media, such as photos or drawings, it can be a useful communication tool for the historical archaeology community. Databases of excavation records, finds registries, and comparative collections can be pooled together with textual syntheses stored in the form of HTML or PDF documents. The flexible data structure can also enable historical archaeologists to relate archaeological datasets with other types of structured information, such as historical census reports, tax records, cemetery records, or even shipping manifests. Relating such varied forms of documentation may open doors for innovative research agendas in historical archaeology. To help demonstrate and refine the applicability of Open Context to support this subdiscipline, its developers invite contributions of excavation and survey data, media, and museum collections from historical archaeologists.

In addition to contributing content, members of the historical archaeological community are also welcome to participate in other capacities. Open Context needs editorial assistance to help ensure quality and oversee revision and error correction. Additional support may be provided through open-source software development partnerships. Finally, data sharing and collaboration among multiple repositories is an important digital data longevity strategy (Reich and Rosenthal 2001). Currently, the OCHRE project provides digital longevity support for Open Context pilot projects, and additional archival partnerships will be needed to better secure the often-irreplaceable archaeological content hosted by Open Context. To learn more about contributing to Open Context, please contact the author via email ([email protected]).

With sufficient community contributions, feedback, and support, Open Context and related OA systems will expedite and streamline reference searches and provide a comparative format for efficiently interpreting and reanalyzing excavation results. Similarly, making primary data efficiently accessible and usable can support research agendas that are not currently achievable. By pooling primary data resources in systems with powerful analytic tools, such systems should enable broad regional syntheses that are more comprehensive and more analytically rigorous than are currently feasible (Kansa 2005; Kintigh 2006).

Contributing to the development and use of systems like Open Context illustrates one way the archaeological community can participate in the broader shifts toward OA. Across the board, OA now has a great deal of momentum and powerful institutional support. As of mid-September 2006, the Alliance for Taxpayer Access (2006) reported that 53 college presidents and 25 university provosts had signed letters in support of OA legislation. This lobbying demonstrates significant institutional backing for the reform of scholarly communication. At the same time, OA journals published by the Public Library of Science (2006) achieved impact factors rivaling Nature and Science. Other published studies document how OA makes research easier to find and use and allows that research to have greater impact and significance (Harnad and Brody 2004; Hajjem et al. 2005). Data hoarding, access barriers, and overly restrictive copyright controls increasingly self-marginalize both research and researchers (Willinsky 2006:21). These broader transitions in academic communication, and more discipline-specific initiatives such as the development of Open Context, all illustrate how the future of the archaeological past is looking increasingly open.

ACKNOWLEDGMENTS

The author would like to thank everyone who has participated in making Open Context operational, especially David Schloen and the University of Chicago OCHRE
project for their continued support and partnership. Their efforts have provided the essential conceptual groundwork

tion to Encourage Interaction with Museum Collections. D-Lib Magazine 11(9)
Jensen, Michael
2005 Presses Have Little to Fear from Google. The Chronicle for Higher Education 51(44):B16.

Kansa, Eric
2005 A Community Approach to Data Integration: Authorship and Building Meaningful Links across Diverse Archaeological Data Sets. Geosphere 1(2):97–109.

Reich, Vicky, and David S. Rosenthal
2001 LOCKSS: A Permanent Web Publishing and Access System. D-Lib Magazine 7(6).

Richards, Julian
2003 Online Archives. Internet Archaeology 15, Council for British Archaeology, Department of Archaeology, University of York, England, UK

Schloen, David
2001 Archaeological Data Models and Web Publication Using XML. Computers and the Humanities 35:123–152.

Science Commons
2006 Scholar’s Copyright Project: Background. Creative Commons and Science Commons

Snow, Dean R., Mark Gahegan, C. Lee Giles, Kenneth G. Hirth, George R. Milner, Prasenjit Mitra, and James Z. Wang
2006 Cybertools and Archaeology. Science 311(5763):958–959.

Suber, Peter
2004 University Actions against High Journal Prices. SPARC Open Access Newsletter 72.

Vaughan, Liwen, and Debora Shaw
2005 Web Citation Data for Impact Assessment: A Comparison of Four Science Disciplines. Journal of

Willinsky, John
2006 The Access Principle. MIT Press, Cambridge, MA.

Eric C. Kansa
Open Context