
CRISUNS ontology for theses and dissertations

Lidija Ivanović1, Bojana Dimić Surla2, Segedinac3, Dragan Ivanović3

1University of Novi Sad, Faculty of Education, Sombor
2University of Novi Sad, Faculty of Sciences, Novi Sad
3University of Novi Sad, Faculty of Technical Sciences, Novi Sad

Abstract - A research management system, CRIS UNS, has been developed for the needs of the University of Novi Sad. The digital library of theses and dissertations has been developed as a part of CRIS UNS. This paper proposes an ontology for the semantic description of metadata about theses and dissertations, based on the FOAF and Dublin Core Terms ontologies. The proposed ontology extends these ontologies with elements which exist in CRIS UNS and are not supported by them. The purpose of this research is the creation of a semantic web service which will make metadata about theses and dissertations (stored in the CRIS UNS database) publicly available to other software systems.

1. INTRODUCTION

By 2012, theses and dissertations in digital format, along with associated metadata, had been made available through various applications such as digital libraries, research management systems, institutional repositories, etc.

euroCRIS is a non-profit organization dedicated to the development of research management systems and their interoperability (http://www.eurocris.org/). The organization promotes CRIS (Current Research Information System), an information system of scientific research projects, researchers and research institutions. The standard data model used in CRIS is CERIF (Common European Research Information Format), which enables interoperability between scientific research information systems that contain information about people, projects, organizations, publications, patents, equipment, etc.

A research information system of the University of Novi Sad called CRIS UNS is under development. The system's development is based upon recommendations and actions of the organization euroCRIS, and the system is compatible with the CERIF data model. Furthermore, the system supports the entry, search and evaluation of scientific results.

The motivation for the research described in this paper is the creation of a semantic web service which will make metadata about theses and dissertations (stored in the CRIS UNS database) publicly available to other software systems. In this way the availability of theses and dissertations of research institutions that use CRIS UNS will be increased, and therefore the rating of those research institutions will be improved.

This paper proposes a representation of data about theses and dissertations using semantic web technologies. The CRISUNS ontology described in this paper uses concepts of the Dublin Core ontology (http://dublincore.org) and the Friend of a Friend ontology (http://xmlns.com/foaf/0.1/) for representing data about theses and dissertations.

2 RELATED WORK

The importance of public access to research results published in digital form for the further development of science is the subject of the papers [1, 2, 3, 4, 5, 6, 7, 8].

Metadata about theses and dissertations are stored in various systems such as NDLTD, DART-Europe, institutional repositories, research management systems, etc. Important sources of theses and dissertations are networks of digital theses and dissertations such as NDLTD (Networked Digital Library of Theses and Dissertations, http://www.ndltd.org/) and DART-Europe (http://www.dart-europe.eu). NDLTD is an international organization whose aim is the creation of a worldwide network of theses and dissertations in digital form. Currently, there are over one million theses and dissertations in digital form in the network. DART-Europe is a network of European theses and dissertations.

An institutional repository is a software system for storing the scientific research results (including theses and dissertations) of a research institution in digital form.

In recent years, the cooperation between various systems that contain theses and dissertations has been discussed [9, 10]. The paper [11] describes the NARCIS portal that integrates research management systems of the Netherlands and DARENET (Digital Academic Repositories in the Netherlands). Furthermore, the paper [12] describes the cooperation between research management systems and digital libraries of Pretoria University. The CRIS-IR group (http://www.eurocris.org/Index.php?page=CRIS-IR_workplan&t=1) aims to find an optimal solution for the interoperability of research management systems and institutional repositories, taking into account all relevant aspects. The main goal of the CRIS-OAR Interoperability project (http://www.knowledge-exchange.info/Default.aspx?ID=340) is to increase the interoperability between research management systems and Open Access Repositories by proposing a metadata exchange format.

2.1 CRIS

Research management systems are very important for the development of science [13]. The intent of those systems is collecting data about research institutions, researchers, research projects, equipment, published results and other data relevant to scientific research activity. In order to enable data exchange among research management systems and to enable researchers to find information in different systems, those systems should be built on common standards. CERIF (Common European Research Information Format) is one such standard; it proposes a data model that allows interoperability among research management systems [14]. By 2012 many research information systems had been developed, such as:
• IST World (http://www.ist-world.org/),
• HunCRIS (http://nkr.info.omikk.bme.hu/HunCRIS_eng.htm),
• CRIStin (http://www.cristin.no/),
• Pure (http://www.atira.dk/en/pure/),
• CRIS UNS (http://cris.uns.ac.rs/),
• etc.

CRIS UNS is a CERIF-compatible research management system that has been under development since 2008 at the University of Novi Sad. This system is a part of the project BISIS (http://www.bisis.uns.ac.rs/).

CRIS UNS is built on the CERIF-compatible data model based on the MARC 21 bibliographic format described in the paper [15]. The implementation of the subsystems which enable the input of metadata about published scientific research results is described in the papers [16, 17]. Furthermore, automatic extraction of metadata from scientific publications for the CRIS UNS system is the main subject of the paper [18]. CRIS UNS allows researchers to input their own data without any explicit knowledge of the CERIF data model and the MARC 21 format. Scientific research outputs stored in the system database are available via the Internet. The system meets the requirements:
• Prescribed by the Ministry of Education and Science of Serbia in the field of scientific results evaluation. Therefore, the system data model is extended with the necessary entities [19].
• Prescribed by the CERIF standard.

The data model and architecture of CRIS UNS allow easy integration of the system with library information systems, interoperability of the system with other European CERIF-compatible national systems, as well as interoperability with various systems that contain scientific content [20].

2.2 Putting bibliographic data onto the Semantic Web

The Semantic Web is an extension of the current Web in which the meaning of data is explicitly represented [21]. Instead of the current Web architecture, i.e. a distributed network of web pages, the Semantic Web tends to introduce a distributed network of the meaning of data [22]. To achieve this goal, explicit specifications of conceptualizations by which the meaning of data is formally represented, i.e. ontologies, are used [23]. The basic semantic web technologies are the Resource Description Framework (RDF) [24], RDF Schema (RDFS) and the Web Ontology Language (OWL). The main subject of this paper is the representation of data about theses and dissertations using semantic web technologies. The rest of this section gives a short review of ontologies for the representation of bibliographic data. The review includes Dublin Core, MarcOnt, ontologies in CRIS systems and the BIBO ontology.

Dublin Core

The Dublin Core metadata standard is a simple and effective element set intended to describe a wide range of digital resources [25]. The Dublin Core Metadata Initiative has defined a standard way of refining Dublin Core elements, and it encourages the use of controlled vocabularies in addition to Dublin Core elements. The original set of Dublin Core elements does not specify a method for representing bibliographical references. The method is specified by guidelines [26] which emerged through the activities of a DCMI workgroup [27].

In this paper we use the Dublin Core Terms ontology (http://purl.org/dc/terms/), in which all the properties are defined as object and datatype properties instead of annotation properties. The choice of ontology was mainly affected by the possibility of introducing new subproperties in the CRISUNS ontology.

MarcOnT

MarcOnT is an ontology aimed at the semantic representation of bibliographical descriptions. Most of the classic library systems rely on the MARC 21 standard, while new digital libraries tend to support semantically richer forms such as Dublin Core or BibTeX. Conversion among the forms is possible, but it can introduce loss of information [28].

The three standards reflect the trichotomy of library users: librarians tend to use MARC 21, researchers most often use BibTeX, while Dublin Core suits generic Internet users best [29].

The representation of bibliographic references is one aspect of bibliographic descriptions. However, since bibliographic descriptions are also used to represent publications which are not bibliographic references (e.g. books, maps, electronic sources), MarcOnT turns out to be too widely scoped for cases in which only bibliographical references are represented. Despite this, the methods for the conversion of bibliographic descriptions, as well as the techniques of collaborative library ontology development [30] proposed for MarcOnT, can be used when only bibliographic references are represented.

Ontologies in CRIS

Lopatenko in [31] identified the need for integration among CRIS systems. The integration should be:
• Easy to implement,
• Flexible enough to embrace the diversity of data meaning and structure in different organizations, sectors of science and states,
• Powerful enough to provide sophisticated information retrieval services.

In order to achieve such integration, Lopatenko [30] recommends the application of ontologies. Since the representation of data in CRIS systems is based upon the CERIF standard, the ontologies aimed at the integration of CRIS systems are modeled after CERIF. The first such ontology was proposed in [28].

Since CRIS systems allow users to obtain information concerning scientific research results [30], it is necessary to enable the representation of bibliographic references in CRIS systems. However, CERIF enables the representation of many other aspects of scientific research in addition to bibliographic references, such as information concerning scientific projects, scientific institutions, financial aspects of scientific research, grants, equipment, etc. Therefore, we opt to use an ontology specially designed for the representation of bibliographic references.

BIBO

The Bibliographic Ontology (BIBO) describes bibliographic units on the Semantic Web in RDF (http://bibliontology.com/). As stated in the specification, this ontology can be used as a citation ontology, a document classification ontology or simply for describing any document in RDF.

The elements of the BIBO ontology can be divided into three groups:
• the original BIBO elements,
• BIBO elements that are refinements of elements from other ontologies such as Dublin Core Terms and FOAF (Friend of a Friend), or any other available ontology such as Event (http://motools.sourceforge.net/event/event.html#) and Publishing Requirements for Industry Standard Metadata, PRISM (http://prismstandard.org/namespaces/1.2/basic/),
• elements directly adopted from other ontologies (Dublin Core, FOAF, Event) that are used in their original form within the BIBO ontology.

Some examples of the original BIBO elements are the classes bibo:Proceedings, bibo:Journal, and bibo:Document with its subclass bibo:Thesis. Examples of BIBO properties are the datatype properties bibo:isbn, bibo:issn and bibo:uri, which are the unique identifiers of publications, and bibo:identifier with its subproperties bibo:volume, bibo:issue, bibo:pageStart, bibo:pageEnd and bibo:pages, which locate an article in a journal.

BIBO elements that are refinements of elements from other ontologies are the classes bibo:Conference, bibo:Performance and bibo:Hearing, which are subclasses of the event:Event class, and the object properties bibo:director, bibo:editor and bibo:translator, which are subproperties of dct:contributor.

Examples of elements adopted from other ontologies that are used in the BIBO ontology are the classes foaf:Person and foaf:Organization, the object properties dct:language and dct:isPartOf (the latter specifies the relation between an article and its journal), and the datatype properties dct:date for the publication year and dct:title.

3 CRISUNS ONTOLOGY FOR THESES AND DISSERTATIONS

The paper [20] presents the data model created for the needs of the CRIS UNS system. That data model contains all metadata about theses and dissertations prescribed by the CERIF data model. The data model also enables the CRIS UNS system to become a member of the NDLTD network. Each member of that network has to provide a service for exporting metadata about theses and dissertations in the ETD-MS format according to the OAI-PMH protocol. The set of metadata about theses used in the CRIS UNS system is the union of the previously mentioned sets of metadata, extended with the following metadata:
• extended abstract,
• physical description [chapters / pages / references / tables / pictures / graphs / appendixes],
• UDC,
• scientific discipline,
• accepted by competent scientific institution on [date],
• defended on [date],
• holding data.

These metadata can be represented using the Dublin Core and Friend of a Friend (FOAF) ontologies extended with a set of classes and properties. Table 1 shows the mappings of CRIS UNS theses metadata to properties of the Dublin Core Terms ontology and the CRISUNS ontology. Metadata used in the CRIS UNS system are shown in the first column. The second column contains the properties to which the metadata are mapped, and the third column contains the super properties of those properties.

The domain of all properties shown in Table 1 is the crisuns:Thesis class. The foaf:Person class is the range of the properties which describe a person: crisuns:author, crisuns:advisor, crisuns:chair, crisuns:comitteeMember.
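The role of the super properties can be illustrated with a small sketch. It is not part of the CRIS UNS implementation: it assumes that a few of the Table 1 mappings have been copied by hand into a Python dictionary, and the names SUPER_PROPERTY and generalize are hypothetical. Under those assumptions it shows how a consumer that understands only Dublin Core Terms could still read CRISUNS statements, because every statement made with a subproperty also holds for its super property (the inference that rdfs:subPropertyOf licenses).

```python
# Illustrative sketch only: a hand-copied subset of the Table 1 mappings,
# not the actual CRISUNS ontology file.
SUPER_PROPERTY = {
    "crisuns:author": "dct:creator",
    "crisuns:advisor": "dct:contributor",
    "crisuns:chair": "dct:contributor",
    "crisuns:subtitle": "dct:title",
    "crisuns:keyWord": "dct:subject",
    "crisuns:defendedOn": "dct:date",
}


def generalize(statements):
    """Return the given (subject, property, value) statements together with
    the statements implied by the subproperty mappings above."""
    inferred = set(statements)
    for subject, prop, value in statements:
        parent = SUPER_PROPERTY.get(prop)
        if parent is not None:
            inferred.add((subject, parent, value))
    return inferred


# A few statements about the dissertation described in Listing 1.
thesis = {
    ("crisuns:ThesisBDS", "crisuns:author", "crisuns:BojanaDimicSurla"),
    ("crisuns:ThesisBDS", "crisuns:advisor", "crisuns:MilosRackovic"),
    ("crisuns:ThesisBDS", "dct:title", "Software system for MARC 21 cataloguing"),
}

# The view of the record seen by a client that knows only Dublin Core Terms.
dc_view = {s for s in generalize(thesis) if s[1].startswith("dct:")}
```

In this sketch the Dublin Core view contains a dct:creator and a dct:contributor statement in addition to the original dct:title, although neither was stated explicitly. A real deployment would more likely leave this step to an RDFS-aware reasoner or triple store.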

CRIS-UNS                                 property                           superproperty
author                                   crisuns:author                     dct:creator
advisor                                  crisuns:advisor                    dct:contributor
chair                                    crisuns:chair                      dct:contributor
committee member                         crisuns:committeeMember            dct:contributor
title                                    dct:title                          -
alternative title                        crisuns:alternativeTitle           dct:title
subtitle                                 crisuns:subtitle                   dct:title
keywords                                 crisuns:keyWord                    dct:subject
abstract                                 dct:abstract                       dct:description
extended abstract                        crisuns:extendedAbstract           dct:description
note                                     crisuns:note                       dct:description
language                                 dct:language                       -
ISBN                                     crisuns:isbn                       dct:identifier
physical description                     crisuns:physicalDescription        dct:description
UDC                                      crisuns:udc                        dct:description
publisher                                dct:publisher                      -
publication date                         crisuns:publicationDate            dct:date
record type                              crisuns:recordType                 dct:type
content format                           crisuns:contentFormat              dct:format
URI                                      crisuns:uri                        dct:identifier
access rights                            dct:accessRights                   dct:rights
thesis type                              crisuns:thesisType                 dct:type
name of author degree after defence      crisuns:thesesDegree               -
level of education                       crisuns:levelOfEducation           -
scientific field                         crisuns:scientificField            dct:subject
scientific discipline                    crisuns:scientificDiscipline       dct:subject
accepted by scientific institution on    crisuns:dateAcceptedByInstitution  dct:dateAccepted
institution                              crisuns:grantor                    -
defended on                              crisuns:defendedOn                 dct:date
holding data                             crisuns:holdingsData               -

Table 1. Data mappings

3.1 THE EXAMPLE OF INDIVIDUAL

This section provides an example (Listing 1) of the description of a PhD dissertation using the proposed CRISUNS ontology: Bojana Dimić Surla, Software system for MARC 21 cataloguing, Faculty of Sciences, Novi Sad, 2009. The dissertation author, advisor, chair and committee members in Listing 1 are represented as instances of the class foaf:Person. The dissertation is given as an individual named crisuns:ThesisBDS, which is an instance of the Thesis class. Metadata about the dissertation are represented as values of properties of this individual. For instance, the dissertation title is the value of the property dct:title, and the publisher is the value of the property dct:publisher. The dissertation was written in Serbian, which is shown as the URI of the Serbian language code in the MARC code list. The physical description of the dissertation is provided in the format prescribed by the University of Novi Sad: chapters/pages/literature/tables/pictures/graphs/appendix. The dissertation is linked with its author, advisor, chair and committee members by referencing the appropriate individuals.

crisuns:MilosRackovic a foaf:Person;
    foaf:familyName crisuns:Rackovic;
    foaf:firstName crisuns:Milos;
    foaf:title "Dr", "Full professor".
crisuns:Rackovic crisuns:hasValue "Racković";
    a rdfs:Literal.
crisuns:Milos crisuns:hasValue "Miloš";
    a rdfs:Literal.
crisuns:DraganMasulovic a foaf:Person;
    foaf:familyName crisuns:Masulovic;
    foaf:firstName crisuns:Dragan;
    foaf:title "Dr", "Full professor".
crisuns:Masulovic crisuns:hasValue "Mašulović";
    a rdfs:Literal.
crisuns:Dragan crisuns:hasValue "Dragan";
    a rdfs:Literal.
crisuns:ZoraKonjovic a foaf:Person;
    foaf:familyName crisuns:Konjovic;
    foaf:firstName crisuns:Zora;
    foaf:title "Dr", "Full professor".
crisuns:Konjovic crisuns:hasValue "Konjović";
    a rdfs:Literal.
crisuns:Zora crisuns:hasValue "Zora";
    a rdfs:Literal.
crisuns:SinisaNeskovic a foaf:Person;
    foaf:familyName crisuns:Neskovic;
    foaf:firstName crisuns:Sinisa;
    foaf:title "Dr", "Full professor".
crisuns:Neskovic crisuns:hasValue "Nešković";
    a rdfs:Literal.
crisuns:Sinisa crisuns:hasValue "Siniša";
    a rdfs:Literal.
crisuns:BojanaDimicSurla a foaf:Person;
    foaf:familyName crisuns:DimicSurla;
    foaf:firstName crisuns:Bojana;
    foaf:title "Mr".
crisuns:Bojana crisuns:hasValue "Bojana";
    a rdfs:Literal.
crisuns:DimicSurla crisuns:hasValue "Dimić Surla";
    a rdfs:Literal.
crisuns:OrganizationPMFNS dct:spatial crisuns:TrgDositejaObradovica6;
    a foaf:Organization;
    foaf:name crisuns:PMF.
crisuns:PMF crisuns:hasValue "Faculty of Sciences";
    a rdfs:Literal.
crisuns:ThesisBDS dct:abstract "Modelling and implementation of software system for MARC 21 cataloguing have been done...";
    dct:language ;
    dct:publisher "author's reprint";
    dct:title "Software system for MARC 21 cataloguing";
    crisuns:advisor crisuns:MilosRackovic;
    crisuns:author crisuns:BojanaDimicSurla;
    crisuns:chair crisuns:DraganMasulovic;
    crisuns:comitteeMember crisuns:ZoraKonjovic,
        crisuns:SinisaNeskovic;
    crisuns:grantor crisuns:OrganizationPMFNS;
    crisuns:holdingsData " Library of Department of Mathematics and Informatics, Trg Dositeja Obradovića 4";
    crisuns:keyWord "cataloguing", "Eclipse", "EMF", "MARC 21", "Xtext";
    crisuns:physicalDescription "8/240/111/0/107/0/0";
    crisuns:scientificDiscipline "information systems";
    crisuns:scientificField "informatics";
    crisuns:thesisDegree "PhD in informatics";
    crisuns:thesisType "PhD dissertation";
    crisuns:uri "http://diglib.uns.ac.rs/ndltd/docs/set4/ndltd608/Disertacija-BojanaDimicSurla.pdf";
    a crisuns:Thesis.

Listing 1. Individual for dissertation

4. CONCLUSIONS

A semantic data model is the starting point for the creation of a semantic web service that should enable the interoperability of various systems which contain metadata about theses and dissertations, such as research management systems, library information systems and institutional repositories.

This paper presents a case study in which the CRISUNS ontology is used for describing the metadata about theses and dissertations stored in the CRIS UNS system. The CRIS UNS system is based on a data model compatible with the CERIF standard and the bibliographic standard MARC 21. Compatibility with those international standards allows the proposed semantic web service to make publicly available metadata about theses and dissertations accessible to various systems based on those international standards. The integration capabilities of the CRIS UNS system are also improved by the fact that the CRISUNS ontology uses elements of common vocabularies, namely Dublin Core and Friend of a Friend. The elements that are relevant for representing theses and dissertations but not directly supported by the mentioned ontologies are defined as subelements of the elements of those ontologies.

The next tasks are the implementation of a software component which will export data from the CRIS UNS system to the proposed ontology, as well as the implementation of interoperability with other systems. To enable the export of all the data stored in the CRIS UNS system, the proposed ontology will have to be extended with elements that represent personal data about researchers, data about patents, products and other scientific research outputs, and data about scientific research projects. This extension is especially important for the integration of regional CRIS systems.

5. REFERENCES

[1] Lawrence, S. 2001. Free online availability substantially increases a paper's impact. Nature, 411, paper 521.
[2] Harnad, S. and Brody, T. 2004. Comparing the impact of Open Access (OA) vs. non-OA articles in the same journals. D-Lib Magazine. 10 (6), accessed August 22, 2011, available at: http://www.dlib.org/dlib/june04/harnad/06harnad.html. (accessed November 2, 2011)
[3] Antelman, K. 2004. Do open-access articles have a greater research impact? College & Research Libraries, 65 (5), 372-382.
[4] Anderson, K., Sack, J., Krauss, L. and O'Keefe, L. 2001. Publishing online-only peer-reviewed biomedical literature: Three years of citation, author perception, and usage experience. Journal of Electronic Publishing. 6 (3), accessed August 22, 2011, available at: http://quod.lib.umich.edu/cgi/t/text/text-idx?c=jep;view=text;rgn=main;idno=3336451.0006.303 (accessed November 2, 2011)
[5] Kurtz, M. J., Eichhorn, G., Accomazzi, A., Grant, C., Demleitner, M., Henneken, E., et al. 2005. The effect of use and access on citations. Information Processing and Management. 41 (6), 1395-1402
[6] Kurtz, M. J., Eichhorn, G., Accomazzi, A., Grant, C., Demleitner, M. and Murray, S. S. 2005. Worldwide use and impact of the NASA astrophysics data system digital library. Journal of the American Society for Information Science and Technology. 56 (1), 36-45
[7] Kurtz, M. J., Eichhorn, G., Accomazzi, A., Grant, C., Demleitner, M., Murray, S. S., et al. 2005. The bibliometric properties of article readership information. Journal of the American Society for Information Science and Technology. 56 (2), 111-128

[8] Eysenbach, G. 2005. Citation advantage of open access articles. PLoS Biology. 4 (5), 692-698
[9] Joint, N. 2008. Current research information systems, open access repositories and libraries. Library Review. 57 (8), 570-575
[10] Krause, J. 2002. Current Research Information as Part of Digital Libraries and the Heterogeneity Problem Integrated searches in the context of databases with different content analyses. 6th International Conference on Current Research Information Systems, University of Kassel, August 29 - 31, 2002.
[11] Dijk, E., Baars, C., Hogenaar, A. and van Meel, M. 2006. NARCIS: The Gateway to Dutch Scientific Information. Elpub 2006 Conference, Bansko, 14-16, 2006.
[12] Olivier, E. 2009. Open Scholarship & research reporting in tandem: creating more value. The African Digital Scholarship & Curation Conference, May 12-14, 2009, available at: http://www.ais.up.ac.za/digi/docs/olivier_paper.pdf. (accessed November 2, 2011)
[13] Zimmerman, E. 2002. CRIS-Cross: Current Research Information Systems at a Crossroads. Proceedings of the 6th International Conference on Current Research Information Systems, University of Kassel, August 29 - 31, 2002, 11-20
[14] Asserson, A., Jeffery, K. and Lopatenko, A. 2002. CERIF: Past, Present and Future: An Overview. 6th International Conference on Current Research Information Systems, University of Kassel, August 29 - 31, 2002
[15] Ivanović, D., Surla, D. and Konjović, Z. 2011. CERIF compatible data model based on MARC 21 format. The Electronic Library, 29 (1), 52-70. DOI=10.1108/02640471111111433
[16] Ivanović, D., Milosavljević, G., Milosavljević, B. and Surla, D. 2010. A CERIF-compatible research management system based on the MARC 21 format. Program: Electronic library and information systems. 44 (1), 229-251 DOI=10.1108/00330331011064249
[17] Milosavljević, G., Ivanović, D., Surla, D. and Milosavljević, B. 2011. Automated construction of the user interface for a CERIF-compliant research management system. The Electronic Library. 29 (5), 565-588. DOI=10.1108/02640471111177035
[18] Kovačević, A., Ivanovic, D., Milosavljevic, B., Konjovic, Z. and Surla, D. 2011. Automatic extraction of metadata from scientific publications for CRIS systems. Program: electronic library and information systems. 45 (4), 376-396. DOI=10.1108/00330331111182094
[19] Ivanović, D., Surla, D. and Racković, M. 2011. A CERIF data model extension for evaluation and quantitative expression of scientific research results. Scientometrics. 86 (1), 155-172. DOI=10.1007/s11192-010-0228-2
[20] Ivanovic, L., Ivanovic, D. and Surla, D. 2012. A data model of theses and dissertations compatible with CERIF, Dublin Core and EDT-MS. Online Information Review, 36 (4) (in print)
[21] Berners-Lee, T., Hendler, J., and Lassila, O. 2001. The Semantic Web. Scientific American. 284, 34-43.
[22] Allemang, D. and Hendler, J. 2008. Semantic Web for the Working Ontologist: Modeling in RDF, RDFS and OWL. , Nederland: Morgan Kaufmann Elsevier.
[23] Gruber, T. 1995. Toward Principles for the Design of Ontologies Used for Knowledge Sharing. International Journal of Human and Computer Studies. 43 (5/6), 907-928.
[24] Resource Description Framework (RDF), available at: http://www.w3.org/RDF/ (accessed January 27, 2012)
[25] Hillmann, D. 2005. Using Dublin Core. Dublin Core Metadata Initiative, available at http://dublincore.org/documents/usageguide/#whatis (accessed November 2, 2011)
[26] Apps, A. 2005. Guidelines for Encoding Bibliographic Citation Information in Dublin Core Metadata. Dublin Core Metadata Initiative, available at: http://dublincore.org/documents/dc-citation-guidelines/ (accessed November 2, 2011)
[27] Apps, A. 2006. DCMI Citation Working Group. Dublin Core Metadata Initiative, available at: http://dublincore.org/groups/citation/ (accessed November 2, 2011)
[28] Lopatenko, A., Serebrakov, A., and Filipova, A. 2001. Metadata usage in Digital Libraries for Research and Technology. Creating and support of application profiles for science. Digital Libraries: Advanced Methods and Technologies, Digital Collections.
[29] Kruk, S. R., Synak, M. and Zimmermann, K. 2005. MarcOnt - Integration Ontology for Bibliographic Description Formats. Proceedings of the International Dublin Core Conference.
[30] Lopatenko, L. 2001. Information retrieval in Current Research Information Systems. Workshop on Knowledge Markup and Semantic Annotation. Victoria.
[31] Dabrowski, M., Kruk, S. R., Pajak, S., Nowacki, M., Marmolowski, M., Piotrowski, P., et al. 2007. Collaborative Ontology Development with MarcOnt Portal. Semantic Technology Conference. San Jose.

ACKNOWLEDGMENTS

The work is supported by the Ministry of Education and Science of the Republic of Serbia, through project no. III47003: "Infrastructure for technology enhanced learning in Serbia".
