PhD Dissertation International Doctorate School in Information and Communication Technologies

Universidad Católica DISI – University of Trento “Nuestra Señora de la Asunción”

Knowledge and Artifact Representation in the Scientific Lifecycle

Ronald Chenu-Abente Acosta

Advisor: Co-Advisor:

Prof. Fausto Giunchiglia Prof. Luca Cernuzzi

March 2012

node

node Abstract

node

This thesis introduces SKOs (Scientific Knowledge Object) a specification for capturing the knowledge and artifacts that are produced by the scientific research processes. Aiming to address the current existing limitations of scientific production this specification is focused on reducing the work overhead of scientific creation, being composable and reusable, allow continuous evolution and facilitate collaboration and discovery among researchers. To do so it introduces four layers that capture different aspects of the scientific knowledge: content, meaning, ordering and visualization.

Executive Summary

node

Scientific papers or Articles were introduced during the 17th century when the first academic journals appeared. Since then these papers, along with the scientific conferences or symposiums, have become the cornerstones of the scientific community and research. Currently, not very unlike those early times, when an author wants to publish a scientific paper he has to submit a physical and or digital copy of it to an academic journal, where it goes through a process of Peer reviewing to determine if its publication is suitable (a similar process occurs in the case of submitting papers to conferences and workshops). Furthermore, the most common (and sometimes effective) way of discussing ideas with colleagues is still through the age-old tradition of organizing scientific conferences or symposiums.

This model based on papers, persons and scientific events has remained mostly undisturbed up to now. This is the case even if the Internet and the web are slowly reducing the costs related to the dissemination process and providing new ways of contact and interaction. Several studies have been carried out to find the existing limitations on the creation, dissemination, evaluation and credit attribution of these scientific artifacts. Furthermore, new types of less formal artifacts like web pages, blogs, comments, bookmark sites among others have also been increasingly popular in scientific environments. While these emerging web-based new types of scientific artifacts are generally less well-regarded than the traditional ones, it is nonetheless, irrefutable that they also are increasingly being used to disseminate, discuss, structure and, ultimately, advance scientific knowledge.

In the light of all the known problems and limitations of the current fragmented approach, explored in several studies, it is natural to wonder what would make an ideal scientific artifact format? One that is able to cope with the demands from creators, reviewers, publishers and consumers of scientific content while also enabling new interactions between these groups. More specifically, some of the requirements for this 'ideal' artifact format would be the following:

 It is composable and complex : allowing the definition of new artifacts from simpler ones, aggregating several types of data (e.g texts, spreadsheets, images, videos, etc) to facilitate the production of new knowledge or simply different executions or presentations of already existing knowledge.  It allows continuous evolution : allows continuous creation and distribution of evolutions of the scientific artifact, while also supporting the current model of discrete releases at key points in time.  It facilitates collaborative work : the previous properties like evolvability and composability, along with Internet-enabled interaction, greatly facilitate the creation of documents as collaborative effort between different actors.  It introduces improved models for certification and credit attr