Objectives of the Workshop May 20, 2010

To crowd-open source the electronic printing press of the 21st century with the goal of improving how science is disseminated and comprehended.

The current objectives of the workshop are:

1. To define the requirements of the various classes of user (reader, author, publisher, reviewer, editor, librarian etc.) that need to be met when developing research objects (ROs) that extend the features of a scientific research article beyond those typically embodied in a PDF today, to take full advantage of the power of the Internet. This includes, but is not limited to:
   a. Provide knowledge and data integration - currently the data associated with a research article are typically unavailable, held in a disparate database, or computationally unusable as a supplement to a PDF.
   b. Bring additional senses to the problem - enhance the meaning and/or comprehension of science through the use of rich media (video and podcasts), which are not included with a PDF or, if available, are divorced from (i.e., not integrated with) the work.
   c. Provide reproducibility - the PDF captures neither the workflow nor any of the discourse associated with the work.
   d. Provide interoperability with other ROs and with other forms of discourse, e.g., blogs.
   e. Provide interactivity - the ability to interact with the content in context-sensitive ways.
   f. Provide a living document - additional commentary, reviews, further discourse etc. can be added to the RO at any time and an audit trail maintained.
   g. Provide new bibliometrics - allow a more meaningful evaluation and reward system for the original work and those who critique it.
   h. Provide new tools for automated knowledge discovery across a broad corpus.
2. To begin to turn these user requirements into a set of specifications that embrace as far as possible emerging efforts that relate to ROs (see Related Activities).
3. To establish a process by which interested parties can contribute open source code that conforms to the specifications.
4. To have as a vision an end product of an extensible software system that could be adopted by any interested party wishing to maintain or in some way publish ROs. It is anticipated this product will include elements of a content management system, database, and journal management system.
5. To have one or more publishers commit to publishing ROs using the open source platform, to illustrate to the scientific community what can be achieved with ROs.

A product of the workshop will be a summary report, including requirements, and associated initial software efforts, all available from the W3C website. The main sessions of the workshop will be videotaped and made available online.

Statement of Need

Science, Technology and Medical (STM) publishing has hardly changed since the invention of the printing press. The Internet has been adopted as a powerful distribution medium, but for the most part the power of the medium has not been exploited to improve the comprehension of, and interest in, STM content. We (see later for a list of people who have expressed interest in attending and helped put this proposal together) propose to change that.

Recent Meeting and Motivation

On March 25, 2010 the Public Library of Science (PLoS) held a one-day meeting in San Francisco to help them define their next steps in publishing. PLoS, founded by Nobel Laureate and former head of the NIH, Harold Varmus (MSKCC), along with Pat Brown (Stanford) and Mike Eisen (LBL), has been a standard bearer in the open access movement within the biosciences. The PI of this proposal (Bourne) is the Editor in Chief of one of the PLoS journals and an open access advocate. Attendees at the meeting included publishers, Internet experts, advocates, librarians, information specialists, and many other professionals.
One of several outcomes of the meeting was a strong feeling that collectively we should work towards a new kind of research article that unleashes the full power of the Internet to better disseminate and improve the comprehension of science. For want of a better term we refer to this as a research object (RO). At the meeting, initial ideas were collected as to what that research object should provide. These ideas define the objectives of the workshop given at the start of this proposal. It was also determined that the next step would be to define a set of users and their requirements for ROs that map to these ideas. The PI committed to putting a workshop together to determine those user requirements and facilitate the next steps, and it is this workshop proposal that is presented here.

Subsequent to the meeting in San Francisco, a list of possible workshop attendees was assembled via email and an active dialog has ensued. The possible attendees, and others to be recruited to the effort, are clearly capable of making the RO concept a reality. It became clear that such an endeavor is significant and will require a strong collective effort in both the RO specification and the subsequent open source coding to make ROs a reality. It was also strongly felt that ownership of this endeavor should not belong to a single organization but should be a free and public resource. For this reason it was decided to organize the RO effort under the auspices of the W3C, as a W3C incubator group (http://www.w3.org/2005/Incubator/xg-guide).

Related Activities

Many of the proposed attendees are working on related efforts that can provide useful input, and perhaps code, to the development of ROs. These include but are not limited to:

• FoRC - Future of Research Communication, Anita de Waard et al.
• Prospect - Colin Batchelor (RSC, London). RSC editors annotate compounds, concepts and data within the articles and link these to additional electronic resources such as biological databases. http://www.rsc.org/Publishing/Journals/ProjectProspect/index.asp
• openJournals - Tarek Loubani (McGill)
• SWAN - Tim Clark (Harvard) http://hypothesis.alzforum.org/swan/
• Papers - Alex Griekspoor (Cambridge, UK) http://mekentosj.com/papers/
• SAGE Commons - Stephen Friend et al.

Each of these activities and the associated scientists will be invited to the workshop, to leverage on-going activities as much as possible and hopefully advance the field for everyone.

A few other activities and reference materials relevant to this application are:

• The Structured Digital Abstract, Seringhaus/Gerstein, 2008: This paper proposes to include a 'structured XML-readable summary of pertinent facts'.
• FEBS Letters SDA, 2008 - now: The journal FEBS Letters adds curator-created triples on protein-protein interactions to every appropriate paper.
• CWA Nanopublications, 2010: The Concept Web Alliance proposes to model scientific research as sets of triples; the first definition of the format has just been published.
• The Semantic Biochemical Journal, 2010: Using Utopia, an innovative PDF reader, this allows enrichment of the PDF with interactive figures and active data.
• Article of the Future, Cell, 2009: Tabbed and hyperlinked presentation of the article; Graphical Abstract and Highlights on the landing page.
• Adventures in Semantic Publishing, Oxford U, 2009: A hand-marked-up version of a paper in epidemiology with data enhancements and better browsing and reference linking.
• OpenCitations.org: Modelling literature citations as RDF (2010 onward). A public RDF triplestore of biomedical literature citations encoded as Open Linked Data, and characterized using CiTO, the Citation Typing Ontology.
• SciVee.TV - integrating open access content with rich media.
• BioLit - semantic markup of PubMed Central.

Meeting Organization

We propose to hold the workshop over three days. The agenda that the group has discussed is along the following lines. It will be finalized when we know the workshop can take place.
Agenda

Day 1 - Current developments, with emphasis on what could be contributed by way of software and standards to a collective code/application base
• The state of play - interactive PDFs, semantically enriched articles etc.
• Object and document standards in place or emerging
• Relevant content management systems and associated standards
• Relevant data and knowledge integration efforts
• Relevant use of rich media
• Relevant reward and review systems

Day 2 - Morning
• High level discussion of scope
• High level discussion of RO components

Day 2 - Afternoon
• Working groups around each RO component - discussion of high level requirements
• Reconvene and report back to whole group

Day 3
• Implementers within each working group define timelines and deliverables
• Begin basic code development
• Go home exhausted but charged

Workshop Logistics

We anticipate inviting 35 people, with others invited at their own expense. We hope to hold the workshop in the fall of 2010 or winter of 2010-11 at the University of California San Diego. The workshop will be held under the auspices of the San Diego Supercomputer Center (SDSC) and the California Institute for Telecommunications and Information Technology (CalIT2). These organizations, which are Organized Research Units (ORUs) at UCSD, have hosted many such workshops, and the infrastructure exists to make for a very successful event.

Beyond the Workshop

All materials and on-going dialog associated with the workshop will be available and open for addition and comment on the W3C website.

FoRC is holding a workshop in the fall of 2011, which will be a suitable venue to regroup and ascertain what progress has been made and what remains to be done.

Plan for Recruitment of Speakers

Each of the listed people has agreed to come and speak. We will work as a group to define who is best suited to represent the component tasks at hand. This form of interaction, trust and consensus building will define the tenor of the whole project.

Estimated Budget

We propose inviting 15 international scientists and 20 national scientists, along with a number of local participants and people sponsored by their respective for-profit organizations. A number of the listed attendees will be funded by their own organizations. A budget breakdown and justification is given in the accompanying documents. The budget was defined with the help of administrative staff who have organized many meetings of similar size and duration.

Other Support

Thus far the National Cancer Institute (NCI) has pledged $10,000, Microsoft has pledged on the order of $10,000, and the Mark Shuttleworth Foundation and the Doris Duke Charitable Foundation have expressed interest in helping.

Why NSF and the Office of Cyberinfrastructure?

Clearly, if this workshop were to lead to a successful outcome, it would have the potential to change the way science is disseminated and comprehended. The underlying cyberinfrastructure requirements to host, store, disseminate, preserve, communicate, visualize and compute upon these objects are enormous and will spawn a myriad of new cyberinfrastructure developments as part of a fundamental shift towards a richer eScience environment.

People who have Expressed Interest in Attending the Workshop

Mike Ackerman, Teresa Attwood, Virginia Barbour, Colin Batchelor, Richard K. Belew, Tanya Beradini, Geoffrey Bilder, Peter Binfield, Theodora Bloom, Katy Borner, Philip Bourne, Jean-Claude Bradley, Patrick Brown, Gully Burns, Todd Carpenter, Richard Cave, Tim Clark, Matthew Cockerill, Matt Day, Anita de Waard, Lee Dirks, Mark Doyle, Lynn Fink, Reiko Fitzsimonds, Marc Friedman, Pascale Gaudet, Paul Ginsparg, Carole Goble, Alexander Griekspoor, Timo Hannay, Eduard Hovy, Peter Jerram, Heather Joseph, Ove Kahler, Gail Kaiser, Robert Kelly, Kerry Kroffe, Julia Lane, Cliff Lynch, Kim Marriott, Barend Mons, Peter Murray-Rust, Catherine Nancarrow, David Patterson, Mark Patterson, Tracy Pelon, Steve Pettifer, Dan Pollock, Brian Schottlaender, Borya Shakhnovich, David Shotton, Elliot Siegel, James Taylor, Helen Turvey, Herbert Van de Sompel, Jan Velterop, Alex Wade, John Wilbanks, Sara Wood