104 LRTS 56(2)

Notes on Operations

Integration of a Research Management System and an OAI-PMH Compatible ETDs Repository at the University of Novi Sad, Republic of Serbia

Lidija Ivanović, Dragan Ivanović, and Dušan Surla

This paper discusses the extension of the Current Research Information System (CRIS) at the University of Novi Sad, Republic of Serbia, to incorporate electronic theses and dissertations (ETDs). Data describing ETDs is entered using a web application that enables researchers to input their own data through a webpage without knowing the standards on which the system is based. The ETDs repository can exchange data with CRIS institutional repositories and Networked Digital Library of Theses and Dissertations members. In this way, the international visibility of theses and dissertations created at the University of Novi Sad is enhanced without duplicating data entry in various systems. This approach has been verified and tested on a dataset of theses and dissertations at the University of Novi Sad.

Lidija Ivanović ([email protected]. ac.rs) is a teaching assistant, Faculty of Education, University of Novi Sad, Sombor, Republic of Serbia; Dragan Ivanović ([email protected]) is an assistant professor, Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Republic of Serbia; and Dušan Surla (surla@uns.ac.rs) is a professor emeritus, Faculty of Sciences, University of Novi Sad, Novi Sad, Republic of Serbia.

Submitted October 14, 2011; tentatively accepted November 20, 2011, pending modest revision; revision submitted December 6, 2011, and accepted for publication.

This paper is part of the research project “Infrastructure for Technology Enhanced Learning in Serbia,” supported by the Ministry of Education and Science of the Republic of Serbia [Project No. 47003].

Public access to theses and dissertations via the Internet is important for the development of a knowledge-based society. A knowledge-based society relies on the knowledge of its citizens to drive entrepreneurship, innovation, and vitality of that society’s economy. A knowledge-based society possesses a community of scholars, researchers, research networks, engineers, technicians, and businesses engaged in research and the production of high-technology goods and provision of services. It forms a national innovation and production system, which is integrated into international networks of knowledge production. Its communication and information technological tools make vast amounts of human knowledge easily accessible. This paper describes a test bed project at the University of Novi Sad (UNS), Republic of Serbia, which aims to improve international access to UNS research. The approach described here can inform projects at other institutions.

One approach to achieving a knowledge-based society can be through depositing electronic dissertations and theses (ETDs) in a freely accessible digital repository. Assigning appropriate metadata to ETDs can improve discoverability by increasing their visibility. Furthermore, visibility of ETDs can be increased by putting the digital object or its descriptive metadata (or both) into systems containing theses and dissertations, such as digital libraries, research management systems, institutional repositories (IRs), the Networked Digital Library of Theses and Dissertations (NDLTD), DART-Europe E-thesis portal, Digital Repository Infrastructure for European Research (DRIVER), and others. These initiatives

and related terms are explored in detail later in this paper.

Current Research Information System (CRIS) at the University of Novi Sad (UNS), Republic of Serbia, is a Common European Research Information Format (CERIF)-compatible research management system that has been in development since 2008 at UNS.1 CERIF is “a comprehensive metadata standard and data exchange model that can be used for a very broad range of purposes involving the management and exchange of research data” developed by the European Organization for International Research Information (www.eurocris.org).2 This system has been extended at UNS with a module for storing ETDs.

The primary motivation for this expansion of CRIS UNS has been to increase the international visibility of theses and dissertations by UNS scholars. Increasing the visibility of ETDs can be achieved in the following ways:

• Exchanging data between the CRIS UNS system and other research management systems according to the CERIF standard.
• Exchanging data between the CRIS UNS system and IRs in Dublin Core (DC) format according to the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).
• Membership in the NDLTD network, i.e., exchanging data between CRIS UNS and other members of the NDLTD network in the Interoperability Metadata Standard for Electronic Theses and Dissertations (ETD-MS) format according to the OAI-PMH protocol.

Another motivation for building this module was the creation of a unique system with all relevant data for scientific research activities. The system described here is a research management system integrated with an IR. The system architecture enables easy integration with library information systems, which are based on the MARC 21 format and also can hold metadata about ETDs.3

The goal of the integrated system, developed at UNS in accordance with CERIF, DC, ETD-MS, and OAI-PMH, is to avoid or reduce duplicated inputs on the two platforms and increase metadata quality, reliability, and reusability.

Literature Review

Scientific research is an important component of knowledge. Much contemporary scientific research along with its associated metadata is available in digital format via various means, such as digital libraries, research management systems, IRs, and publishers’ platforms. Access to scientific research enhances further development of science.4 Maximizing the visibility of scientific research is essential for scientific advancement. Visibility can be enhanced by putting research into IRs that are OAI-PMH interoperable.5 The OAI-PMH protocol was primarily developed as a low-barrier method for interoperability between metadata repositories and provides an interoperability framework based on metadata harvesting by defining two classes of participants: data providers that expose metadata and service providers that harvest metadata. The IR’s metadata schema has a key role in increasing interoperability of the repository, i.e., maximizing visibility of theses and dissertations that are stored in the repository.6

A rich metadata schema enables establishing relations between various systems that contain scientific research. Collaboration between those systems has been discussed in recent years. According to Joint, sharing data between institution repositories and research management systems is necessary to avoid duplication of efforts.7 Joint recommends a single point of entry to research articles regardless of whether it is through an institution repository or a research management system. He notes that three institutions (Glasgow University, Southampton University, and Kingston University) have already implemented this approach.

Krause suggests the creation of a virtual library that aims to enable users to gain integrated access to all relevant information in their special scientific field, irrespective of the location of metadata and digital form of documents.8 A virtual library includes a single point for creation of queries that are sent to all systems that are part of the virtual library and integrates results retrieved from the systems.

“NARCIS: The Gateway to Dutch Scientific Information,” by Dijk and colleagues, describes the National Academic Research and Collaborations Information System (NARCIS) portal (www..nl), which provides access to all scientific research information in the .9 That system is an integration of the Netherlands research management system and the Digital Academic Repositories in the Netherlands (DARENET).10 Olivier describes collaboration between the research management system and the digital library at Pretoria University.11

The general objective of the CRIS-IR group is “to work out an optimal solution for the interoperability of Research Management Systems on the one hand and Institutional Repositories on the other, on a European scale, taking into account all relevant aspects.”12 The aim of the Current Research Information Systems and Open Access Repository (CRIS/OAR) interoperability project is to increase the interoperability between research management systems and open access repositories “by defining and proposing a metadata exchange format for publication information with an associate common vocabulary.”13

The aim of integrating systems that contain scientific research is to

maximize visibility of scientific research and avoid duplicating input of the same metadata in various systems. Many libraries worldwide store metadata in the MARC 21 format. Those libraries have electronic services that enable downloading metadata about bibliographic records related to scientific research; thus the interoperability of a repository of scientific research with those libraries is important for increasing visibility. Frequently, electronic services will enable metadata exchange in DC, but the fact that DC is not a strict standard can cause problems in metadata interoperability. However, the metadata defined by this format are a subset of metadata defined by MARC 21. When searching databases of scientific research, users can better express their information needs if the research is described in a richer set of metadata. A CERIF-compatible data model based on the MARC 21 format makes CRIS systems interoperable with library information systems.14 In this model some CERIF data are stored in the MARC 21 format data model. As noted earlier, CERIF defines a data model that enables interoperability between CRIS implementations; MARC 21 is a standard for storing data for library systems. That model includes all entities and attributes of the CERIF data model and preserves the existing references between the CERIF data model entities. Furthermore, that model enables input of multilingual data prescribed by the CERIF standard. The MARC 21 format is rich in metadata and enables more detailed description of entities in CRIS systems. A MARC 21 record can store all metadata prescribed by the DC and ETD-MS formats.15 An information system based on the CERIF-compatible data model can exchange data with other systems using XML documents (which have XML schemas prescribed by the CERIF standard) and can exchange data with LIS based on the MARC 21 formats and with IRs based on the DC or ETD-MS format.

Context: CRIS UNS

CRIS UNS is a CERIF-compatible research management system under development since 2008 at UNS. The first phase of CRIS UNS development was the implementation of a system for entering metadata about published scientific research including papers published in journals, papers from scientific conferences, monographs, and papers published in monographs.

CRIS UNS is built on the CERIF-compatible data model based on the MARC 21 format described in the previous section. CRIS UNS was implemented as a web application based on “best-of-breed” open-source components written in Java.16 The system has a three-tier architecture, which contains a client tier (the presentation logic, including simple control and user input validation), a middle tier (the business process logic and the data access), and a data tier (the data server that provides the business data).17 Any web browser supporting HTML 4 and JavaScript can be used for application access.

The server side of the system is executed within the Apache Tomcat (http://tomcat.apache.org) application server. Apache Tomcat is an open-source software implementation of the Java Servlet and JavaServer Pages (JSP) technologies. A servlet is “used to extend the capabilities of servers that host applications accessed via a request-response programming model. Although servlets can respond to any type of request, they are commonly used to extend the applications hosted by Web servers.”18 JSP technology “provides a simplified, fast way to create dynamic web content.”19

The presentation tier is developed using the JavaServer Faces (JSF) development environment (www.jcp.org/en/jsr/detail?id=252) and the RichFaces (www.jboss.org/richfaces) library. JSF technology simplifies building user interfaces for JavaServer applications, and RichFaces is a library of Ajax-enabled components for JSF. The Apache Lucene (http://lucene.apache.org) library is an open-source information retrieval library written in Java and is used for indexing and searching text contents. Text indexing and query processing include a Cyrillic to Latin transliteration algorithm. All index entries are stored as Latin text, thus enabling the use of both scripts in searching. The database contents, on the other hand, hold information as it was entered by the user, preserving the original script. Cyrillic to Latin transliteration is unambiguous. This means that every Cyrillic character has a corresponding character in the Latin (or Roman) alphabet, and that every word written using Cyrillic can be unambiguously transliterated to a word using Latin characters. The MySQL (www.mysql.com) database management system is used for data preservation.4 The system data model and architecture enable easy integration of the system with LIS and interoperability with other CERIF-compatible national CRIS systems.

Published results from the system are available to anonymous users via the Internet. Moreover, the system is in accordance with the CERIF standard and meets requirements prescribed by the Republic of Serbia Ministry of Science and Technological Development in the field of scientific research results evaluation. Therefore the system data model is extended with necessary entities.20 The system is implemented as a web application that enables authors to input metadata about their own research without knowledge of the CERIF standard and the MARC 21 format.

Research Method

The first step in this project was analysis of various systems that contain metadata about theses and dissertations. The following are international initiatives:

• NDLTD (www.ndltd.org) is an international organization that aims to create a worldwide network of ETDs. Each digital repository that is a network member has to enable metadata exchange in the ETD-MS format (developed by NDLTD) in accordance with OAI-PMH.21
• DART-Europe E-Thesis Portal (www.dart-europe.eu/contributors/how.php) aims to collect details of the open access research theses stored in Europe’s digital repositories (doctoral and master theses). It collects metadata in DC using OAI-PMH.
• DRIVER (www.driver-community.eu) is an international organization co-funded by the European Commission with the goal of creating a network of freely accessible digital repositories with content across all academic disciplines. Each digital repository that is a network member has to enable metadata exchange in DC in accordance with the OAI-PMH protocol.

In addition, many academic and research institutions and research communities may implement and manage the following approaches to collecting, preserving, accessing, and disseminating research:

• IRs are online systems that collect, preserve, and disseminate the intellectual output of an institution in digital form. IRs may use open-source software, such as DSpace (www.dspace.org) and Fedora (http://fedora-commons.org), or hosted, proprietary software, such as Digital Commons (http://digitalcommons.bepress.com) and SimpleDL (www.simpledl.com). Many IRs support the exchange of data in DC via OAI-PMH.
• A CRIS is a database or other information system for storing data on current research (e.g., data about institutions, researchers, research projects, equipment, published results, etc.). The European Union encourages the development of national research management systems in accordance with the CERIF standard.22 CERIF-compatible research management systems are called CRIS. Due to specific local or national requirements, CRIS systems are built on different modifications (or extensions) of the CERIF data model.22
• A library information system (LIS) is a software system for acquiring, cataloging, and circulating library holdings. LIS are built on various bibliographic standards; most are based on MARC 21 formats.

Across these systems, different standards and protocols (CERIF, OAI-PMH, DC, ETD-MS, and MARC) enable interoperability.

After analysis was completed, a comprehensive metadata set was defined to develop a repository that is compatible with all previously mentioned systems. Then the authors extended the CRIS UNS data model to store all metadata about ETDs as well as the ETDs as digital objects. Finally, the authors expanded CRIS UNS with a module for storing ETDs along with associated metadata. An object-oriented method was used for the module modeling. Object-oriented modeling creates models using object-oriented diagrams (class diagram, sequence diagram, etc.), which are the starting point for implementing a system using an object-oriented programming language. The modeling was carried out using the Sybase PowerDesigner tool that supports OMG’s Unified Modeling Language (UML) 2.0 (www.omg.org/spec/UML/2.0). The module model can be obtained by contacting the authors. The implementation was realized using “best-of-breed” open-source components written in Java. After the authors developed the module, it was verified and tested on ETDs by researchers in the Faculty of Sciences, UNS. After migration of the existing dataset containing ETDs along with associated metadata from the DIGLIB UNS system to the CRIS UNS system, UNS researchers verified the migrated data about their theses and dissertations and supplied additional data; the CRIS UNS metadata set is richer than the DIGLIB UNS metadata set. These steps are covered in detail in the following sections.

Data Model Definition

After analysis of various systems that contain metadata about theses and dissertations (NDLTD, DART-Europe E-thesis portal, DRIVER, IRs, CRISs, LIS, DIGLIB UNS), a comprehensive metadata set was defined to create a repository that is compatible with various ETDs systems. DIGLIB UNS is the IR at UNS and contains theses and dissertations from the university. This system allows input of metadata about theses and dissertations as required by the UNS rule book, which defines key words that all the university theses and dissertations must have. Table 1 presents the list of metadata elements selected for CRIS UNS and indicates their presence or absence in CERIF, DC, and ETD-MS. This metadata set unites metadata describing ETDs, drawing from all standards used in the DIGLIB UNS (diglib.uns.ac.rs).23 The set of metadata about ETDs adopted for the CRIS-UNS system unites the metadata sets prescribed by the CERIF, DC, and ETD-MS formats, extended by metadata that are used in DIGLIB UNS to meet the needs of the UNS.

Data Model Extension

As already stated, the CRIS UNS data model holds data about scientific

research in MARC 21 format. MARC 21 records are stored using an attribute of the MARC 21 Record entity that holds a string representing a MARC 21 record serialized according to the International Standards Organization (ISO) 2709 standard, which sets out the format for information exchange.24 Upon serializing the MARC 21 record in an ISO 2709 string, the record is stored in the database and its contents are indexed using the Apache Lucene information retrieval library. MARC 21 records can be classified using the entity MARC 21 Record_Class: master thesis, PhD dissertation, and so on. Also, that entity can be used for the definition of the scientific field and scientific discipline of the research, such as mathematics, computer sciences, biology, information systems, and artificial intelligence. Using that entity, records can be divided into sets, and the OAI-PMH “ListRecords” requirement, which mandates the ability to download only records that belong to a defined set, can be met.

Table 1. Metadata about Theses and Dissertations Adopted for the CRIS-UNS System

CRIS-UNS                                        CERIF   Dublin Core   ETD-MS
author                                            +          +          +
advisor                                           -          -          +
chair                                             -          -          +
committee member                                  -          -          +
title                                             +          +          +
alternative title                                 -          -          +
subtitle                                          +          -          -
keywords                                          +          +          +
abstract                                          +          +          +
extended abstract                                 -          -          -
note                                              +          -          +
language                                          -          +          +
ISBN                                              +          -          -
physical description                              +          -          -
UDC                                               -          -          -
publisher                                         +          +          +
publication date                                  +          +          +
record type                                       -          +          +
content format                                    -          +          +
URI                                               +          +          +
access rights                                     -          +          +
thesis type                                       +          -          -
name of author degree after defense               -          -          +
level of education                                -          -          +
scientific field                                  -          -          +
scientific discipline                             -          -          -
accepted by competent scientific institution on   -          -          -
institution                                       +          +          +
defended on                                       -          -          -
holding data                                      -          -          -

The earlier CRIS UNS data model was extended by adding four attributes to the MARC 21 Record entity. These added attributes are creator, dateOfCreation, modifier, and dateOfLastModification. Date of creation and date of the last modification are necessary to meet all requirements prescribed by the OAI-PMH protocol; the OAI-PMH ListRecords request must be able to download only records that are processed in a certain period.

Furthermore, the previous CRIS UNS data model is extended by adding the File_Storage entity that is intended to hold data related to the digital form of theses or dissertations. Each instance of the File_Storage entity is connected to an instance of the MARC 21 Record entity that holds bibliographic metadata about the thesis or dissertation. The uploader attribute holds the e-mail address of the user who uploaded the digital content. The attributes fileName, mime, and length store metadata describing the digital content that is stored in a folder of the file system of the CRIS UNS server. The folder is not directly accessible through the Internet, but digital contents can be downloaded using a Java servlet. In this way, access to digital content is controlled, i.e., the Java servlet controls who can download digital content.

Table 2 shows mappings of the adopted metadata about theses and dissertations shown in table 1 to the extended CRIS UNS data model. The first column holds names of metadata and the second column holds the location in the MARC 21 bibliographic record. The first three characters of a location present a field code; the next two characters present the first and the second indicator, respectively; and the last character presents a subfield code. The character “#” indicates that an indicator is not defined. The last column shows some notes

Table 2. Mappings of Metadata to Data Model

Metadata                                          MARC 21   Note
author                                            100 1# a  All data about authors/advisors/chair/committee members are stored in a MARC 21 authority record; relation of thesis or dissertation with the authority record is established using the subfield 0 of data field 100/700 of MARC 21 bibliographic record. The subfield e of data field 100/700 holds relationship type: author, mentor, thesis/dissertation defense board chair, thesis/dissertation defense board member.
advisor                                           700 1# a
chair                                             700 1# a
committee member                                  700 1# a
title                                             245 00 a
alternative title                                 246 0# a
subtitle                                          245 00 b  Translations of those metadata are stored in the field 880 as described in “CERIF Compatible Data Model Based on MARC 21 Format.”*
keywords                                          653 ## a
abstract                                          520 3# a
extended abstract                                 520 ## a
note                                              500 ## a
language                                          008       Language is stored using three letters from 35th to 37th character positions of the control field 008. Character positions start from 0.
ISBN                                              020 ## a
physical description                              300 ##    Physical description is stored using subfields of the data field 300.
UDC                                               080 ## a
publisher                                         260 ## b  The metadata holds a value author’s reprint or name of the appropriate institution.
publication date                                  260 ## c  Year of publication is additionally stored in character positions 7–10 of the control field 008.
record type                                       LDR       Record type is stored in 6th character position of the leader of MARC 21 record. Character positions start from 0.
content format                                    856 ## q  The metadata holds one of the following values: pdf, doc, docx, odt.
URL                                               856 ## u  The subfield holds the URL of a thesis or dissertation in digital form.
access rights                                     540 ## a
thesis type                                       655 #4 a  Also stored using the MARC 21 Record_Class entity of the CRIS UNS data model.
name of author degree after defense               502 ## a  Name of degree is prescribed at the institution where author defends his or her thesis or dissertation. For example: master of electrical engineering, doctor of technical sciences, etc.
level of education                                502 ## b  The element holds level of education: bachelor, master, doctoral, post-doctoral, etc.
scientific field                                  650 24 a  Also stored using the MARC 21 Record_Class entity of the CRIS UNS data model.
scientific discipline                             650 14 a
accepted by competent scientific institution on   502 ## g  The metadata are stored in the subfield g in the following format: 502 ## $gTheme of thesis or dissertation accepted on date.
institution                                       502 ## c  That subfield holds the name and address of the institution. All data about institutions are stored in a MARC 21 authority record; the relation of thesis or dissertation with the authority record is realized using entity MARC 21 Record_MARC 21 Record.
defended on                                       502 ## g  The metadata are stored in the subfield g in the following format: 502 ## $gThesis or dissertation defended on date.
holding data                                      852 ## a
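
The location notation used in table 2 can be illustrated with a short sketch. This is not code from CRIS UNS, which is implemented in Java; it is a minimal Python illustration, and the names MarcLocation, parse_location, and language_from_008 are hypothetical. It assumes only what the text and the table state: three characters for the field code, two for the indicators, one for the subfield code, “#” for an undefined indicator, and the language code at zero-based positions 35 to 37 of control field 008.

```python
from typing import NamedTuple, Optional


class MarcLocation(NamedTuple):
    """One entry of the "MARC 21" column of table 2 (hypothetical helper type)."""
    field: str
    ind1: Optional[str]      # None when the indicator is "#" (undefined) or absent
    ind2: Optional[str]
    subfield: Optional[str]  # None for whole-field locations such as "008" or "300 ##"


def parse_location(code: str) -> MarcLocation:
    """Split a location such as "245 00 a" into field, indicators, and subfield."""
    compact = code.replace(" ", "")
    if len(compact) not in (3, 5, 6):
        raise ValueError(f"unexpected location format: {code!r}")
    field = compact[:3]
    if len(compact) == 3:  # control field or leader, e.g. "008" or "LDR"
        return MarcLocation(field, None, None, None)
    ind1 = None if compact[3] == "#" else compact[3]
    ind2 = None if compact[4] == "#" else compact[4]
    subfield = compact[5] if len(compact) == 6 else None
    return MarcLocation(field, ind1, ind2, subfield)


def language_from_008(value: str) -> str:
    """Return the three-letter language code at zero-based positions 35-37."""
    return value[35:38]


if __name__ == "__main__":
    print(parse_location("100 1# a"))
    print(parse_location("300 ##"))
```

A parser of this kind would let one mapping table drive both data entry and export, instead of hard-coding each field; whether CRIS UNS works this way is not stated in the paper.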

* Dragan Ivanović, Dušan Surla, and Zora Konjović, “CERIF Compatible Data Model Based on MARC 21 Format,” Electronic Library 29, no. 1 (2011): 52–70.

about metadata and methods of their storing.

CRIS UNS Extension with ETDs

The next phase of the development of CRIS UNS was to extend it with a subsystem that enables uploading ETDs and inputting their metadata. The authors identified the basic information requirements of this subsystem as the following:

• Uploading ETDs. The system supports pdf, doc, docx, and odt file formats. Furthermore, the system has to back up files and provide long-term preservation of those files.
• Migrating existing data from DIGLIB UNS to the system.
• Entering all metadata about ETDs that the CERIF standard prescribes and all metadata that are necessary for exchange in accordance with the OAI-PMH protocol within NDLTD. The user interface has to be as simple as possible so that it can be used by users without

the knowledge of standards and protocols.
• Exchanging metadata about ETDs with other CRIS systems. In this way, researchers from European countries using national CRIS systems can find ETDs from the CRIS UNS system.
• Exchanging metadata about ETDs in accordance with the OAI-PMH protocol. In this way, theses and dissertations from CRIS UNS can be visible through various IRs as well as through web applications for searching the NDLTD Union Catalogue: SCIRUS ETD Search (www.ndltd.org/serviceproviders/scirus-etd-search), VTLS Visualizer (www.vtls.com/products/visualizer), etc.

The system architecture was extended with a file server component that manages storing and downloading files from the server’s file system. This component also is used to preserve digital contents of other scientific research, such as papers published in journals, monographs, and papers published in conference proceedings. This digital content is not freely accessible, and access to those digital materials is controlled through the Java servlet. The file server component will be integrated with an open-source solution for long-term file preservation such as Lots of Copies Keep Stuff Safe (LOCKSS) (www.lockss.org). The file server component also extracts textual content from uploaded files using the open-source Apache Tika library (http://tika.apache.org). After extraction, text goes through a Cyrillic to Latin transliteration algorithm and then is indexed using the Apache Lucene library. Query processing also includes a Cyrillic to Latin transliteration algorithm and thus enables the use of both scripts (Cyrillic and Latin) in searching.

Furthermore, the system user interface is extended with user forms for uploading ETDs and entering metadata about ETDs. All textual user interface elements are stored in external files that facilitate the translation of the user interface to other languages.

The first step is uploading the digital content, which uses a dialog that prompts the user to find the file to be added from his or her own computer. After uploading the digital content, the next step is input of the metadata listed in table 1. The form for input of metadata is shown in figure 1. Translations of multilingual metadata can be entered using this form and invoking (clicking on) the boxes to the right (e.g., Title translations, Subtitle translations, and so on).

Figure 1. Form for Input of Metadata

All data about authors, advisors, chair, and committee members are stored in a MARC 21 authority record. The relation of a thesis or dissertation with the authority record is established using the subfield “0” of the MARC 21 record field 100/700. The subfield “0” contains the control number of the authority record that stores data about a researcher (thesis author, mentor, and so on). Subfield “e” of the field 100/700 holds the relationship type between a thesis and researcher (relation is established by subfield “0”), e.g., author, mentor, thesis or dissertation defense board chair, or thesis or dissertation defense board member. This approach to establishing relationships allows various reports to be generated, such as

• theses and dissertations in which a researcher has been a mentor, thesis defense board chair, or thesis defense board member; and
• theses and dissertations in which researchers from some departments have been a mentor, thesis defense board chair, or thesis defense board member.
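
The two reports listed above can be sketched in a few lines. The example below is illustrative Python rather than the Java used in CRIS UNS, and everything in it is assumed for the sake of the example: each thesis is reduced to a title plus (control number, relationship type) pairs, standing in for subfields “0” and “e” of fields 100/700, and the control numbers, role labels, and function names are made up.

```python
# Each pair stands in for subfield "0" (authority control number) and
# subfield "e" (relationship type) of a 100/700 field. All values are invented.
THESES = [
    {"title": "Thesis A", "people": [("R1", "author"), ("R2", "mentor"), ("R3", "chair")]},
    {"title": "Thesis B", "people": [("R4", "author"), ("R3", "mentor"), ("R2", "member")]},
    {"title": "Thesis C", "people": [("R2", "author"), ("R5", "mentor")]},
]

# Relationship types counted by the reports (authorship is deliberately excluded).
BOARD_ROLES = frozenset({"mentor", "chair", "member"})


def theses_for_researcher(theses, control_number, roles=BOARD_ROLES):
    """Titles of theses in which one researcher holds any of the given roles."""
    return [
        thesis["title"]
        for thesis in theses
        if any(cn == control_number and rel in roles for cn, rel in thesis["people"])
    ]


def theses_for_department(theses, member_control_numbers, roles=BOARD_ROLES):
    """Titles of theses in which any member of a department holds one of the roles."""
    return [
        thesis["title"]
        for thesis in theses
        if any(cn in member_control_numbers and rel in roles for cn, rel in thesis["people"])
    ]


if __name__ == "__main__":
    print(theses_for_researcher(THESES, "R2"))           # Thesis A and Thesis B
    print(theses_for_department(THESES, {"R3", "R5"}))   # all three theses
```

Because the link is a control number rather than a name string, a researcher whose name is spelled differently across records would still be matched, which is presumably why the authority record is used.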

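The Cyrillic to Latin transliteration applied to both indexed text and queries, as described earlier in this section, might look like the following. This is a hedged Python sketch, not the algorithm used in CRIS UNS (which is written in Java and whose exact mapping the paper does not give). It assumes the standard Serbian Cyrillic to Latin correspondence and folds case; the names CYRILLIC_TO_LATIN and normalize are hypothetical.

```python
# Standard Serbian Cyrillic -> Latin correspondence (30 letters, lower case).
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}

_TABLE = str.maketrans(CYRILLIC_TO_LATIN)


def normalize(text: str) -> str:
    """Lower-case the text and transliterate Cyrillic letters to Latin.

    Applying the same function to indexed text and to queries means a query
    typed in either script matches content stored in either script.
    """
    return text.lower().translate(_TABLE)


if __name__ == "__main__":
    # Both spellings normalize to the same index term.
    print(normalize("Нови Сад"), normalize("NOVI SAD"))
```

Only the index terms are normalized in this scheme; the stored records keep the script the author typed, as the paper notes for the database contents.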
Because some metadata are multilingual, information retrieval measures (precision, recall, and F-measure) are improved, i.e., visibility of ETDs is increased. Furthermore, visibility of ETDs is improved by using fuzzy search that is enabled through the Apache Lucene library. Fuzzy search retrieves all theses and dissertations that meet a set of criteria that define similarity. For example, similarity criteria for two strings (a string from a query and a string from a thesis or dissertation title stored in the CRIS UNS database) are defined as follows:

• Each word in one string does not differ by more than two letters from a word in the other string.
• If one string contains more than five words, the previous criterion is satisfied for at least 80 percent of the words.
• It is case insensitive and Cyrillic-Latin script insensitive (i.e., lower case and upper case are equal, as well as Cyrillic and Latin scripts).

Data Verification

This application was verified and tested on data about theses and dissertations of researchers employed at the Faculty of Sciences, UNS. After migration of the existing dataset containing ETDs along with associated metadata from the DIGLIB UNS system to the CRIS UNS system, researchers from the University of Novi Sad verified the migrated data about their theses and dissertations and supplied additional data. The Faculty of Sciences employs more than 300 researchers, and approximately 900 master theses and 500 PhD dissertations had been written there through 2011. The test set included metadata about all 1,400 theses and dissertations. Hard copies of all 1,400 theses and dissertations can be found in the faculty library. At the time of this writing, 200 of them also can be found in digital form. Transforming the remaining 1,200 from hard copy to ETDs by scanning is in progress. Researchers did not complain about the migrated data or the user interface. Adding theses and dissertations from the additional fourteen UNS faculties is also in progress. After this process is finished, an additional effort to consolidate data will be necessary; this will include such activities as removing duplicated items and consolidating scientific fields and disciplines.

Conclusion

This paper describes the implementation of a digital repository of ETDs within the CRIS UNS system. Metadata about theses and dissertations are stored in the MARC 21 bibliographic format. The implementation is based on open-source components. The system architecture allows an easy transition to other bibliographic standards and easy integration with LIS based on the adopted bibliographic standard.

The system can exchange ETDs metadata with other CRIS systems, IRs, the NDLTD network members, and LIS. Interoperability with the previously stated systems maximizes visibility of ETDs from the repository without duplicate entry of ETDs metadata in various systems. Metadata are entered once but are stored in various systems across the Internet. High international visibility of theses and dissertations of researchers from the University of Novi Sad enhances the further development of science and raises public awareness of UNS research.

The system for inputting ETDs has been verified and tested on a dataset containing ETDs by researchers at the Faculty of Sciences, UNS. The addition of ETDs from the additional fourteen UNS faculties is in progress. After this process is finished, further effort to consolidate data will be necessary; this will include such activities as removing duplicated items and consolidating scientific fields and disciplines. After this step, web services for data exchange will be made available for public access. Finally, an audit will be performed to assess whether the visibility of scientific research from UNS has increased after this repository implementation.

References and Notes

1. EuroCRIS, CERIF 2008 - Final Release (1.2), www.eurocris.org/Index.php?page=CERIF2008&t=1 (accessed Nov. 16, 2011). CERIF 1.3 Release was available for preview until early December 2011, www.eurocris.org/Index.php?page=CERIF-1.3&t=1 (accessed Nov. 16, 2011).
2. CERIFy, What is the CERIFy Project? What is CERIF?, http://cerify.ukoln.ac.uk/node/196 (accessed Nov. 14, 2011).
3. Library of Congress, MARC Standards, MARC 21 Formats, www.loc.gov/marc/marcdocz.html (accessed Nov. 16, 2011).
4. Steve Lawrence, “Free Online Availability Substantially Increases a Paper’s Impact,” Nature 411 (May 2001): 477, www.nature.com/nature/debates/e-access/Articles/lawrence (accessed Nov. 14, 2011); Stevan Harnad and Tim Brody, “Comparing the Impact of Open Access (OA) vs. Non-OA Articles in the Same Journals,” D-Lib Magazine 10, no. 6 (2004), www.dlib.org/dlib/june04/harnad/06harnad.html (accessed Aug. 22, 2011); Kristin Antelman, “Do Open-Access Articles Have a Greater Research Impact?” College & Research Libraries 65, no. 5 (Sept. 2004): 372–82; Kent Anderson et al., “Publishing Online-Only Peer-Reviewed Biomedical Literature: Three Years of Citation, Author Perception, and Usage Experience,” Journal of Electronic Publishing 6, no. 3 (Mar. 2001), http://quod.lib.umich.edu/cgi/t/text/text-idx?c=jep;view=text;rgn=main;idno=3336451.0006.303 (accessed Aug. 22, 2011); Gunther Eysenbach, “Citation Advantage of Open Access Articles,” PLoS Biology 4, no. 5 (May 2006): 692–98, www.plosbiology.org/article/info:doi/10.1371/journal

.pbio.0040157 (accessed Aug. 22, 10. Astrid van Wesenbeeck, “Digital Aca- Library & Information Systems 45, 2011). demic Repositories in the Nether- no. 4 (2011): 376–96. 5. Mohammad Hanief Bhat, “Interop- lands: Built with the DARE Program 17. Ariel Ortiz Ramires, “Three-Tier erability of Open Access Reposito- (2003-2006)” (presentation, Valen- Architecture,” Linux Journal 75 (July 1, ries in Computer Science and IT— cia, Spain, June 20, 2006), http:// 2000), www.linuxjournal.com/article/ An Evaluation,” Library Hi Tech 28, cde.uv.es/documents/2007-VANWE 3508 (accessed Aug. 22, 2011). no. 1 (2010): 107–18. SENBEECK.pdf (accessed Nov. 14, 18. The H2EE Tutorial, What Is a 6. Eun G. Park and Marc Richard, 2011). Servlet? http://java.sun.com/j2ee/ “Metadata Assessment in E-Theses 11. Elsabé Olivier, “Open Scholarship tutorial/1_3-fcs/doc/Servlets2. and Dissertations of Canadian Insti- and Research Reporting in Tandem: html#75087 (accessed Nov. 19, 2011). tutional Repositories,” The Electron- Creating More Value” (presentation, 19. Oracle, JavaServer Pages Technology, ic Library 29, no. 3 (2011): 394–407; The African Digital Scholarship & www.oracle.com/technetwork/java/ Sevim McCutcheon et al., “Morph- Curation Conference, May 12–14, javaee/jsp/index.html (accessed Nov. ing Metadata: Maximizing Access to 2009, Pretoria, South Africa), www 19, 2011). Electronic Theses and Dissertations,” .ais.up.ac.za/digi/docs/olivier_paper 20. Dragan Ivanović, Dušan Surla and Library Hi Tech 26, no. 1 (2008): .pdf (accessed Aug. 22, 2011). Miloš Racković, “A CERIF Data 41–57. 12. EuroCRIS, Operation Work Plan for Model Extension for Evaluation and 7. Nicholas Joint, “Current Research the CRIS-IR Task Group, www.euro Quantitative Expression of Scientif- Information Systems, Open Access cris.org/Index.php?page=CRIS-IR_ ic Research Results,” Scientometrics Repositories and Libraries: ANTAE- workplan&t=1 (accessed Nov. 14, 86, no. 1 (2011): 155–72. US,” Library Review 57, no. 
8 (2008): 2011). 21. Networked Digital Library of The- 570–75. 13. KE: Knowledge Exchange, CRIS/ ses and Dissertations, ETD-MS: An 8. Jürgen Krause, “Current Research OAR Project, www.knowledge Interoperability Metadata Standard Information As Part of Digi- -exchange.info/Default.aspx?ID=340 for Electronic Theses and Disser- tal Libraries and the Heterogene- (accessed Nov. 15, 2011). tations, version 1.00, rev. 2, www ity Problem Integrated Searches in 14. Dragan Ivanović, Dušla Surla, and .ndltd.org/standards/metadata/etd-ms the Context of Databases with Dif- Zora Konjović, “CERIF Compati- -v1.00-rev2.html (accessed Nov. 16, ferent Content Analyses,” in Gain- ble Data Model Based on MARC 21 2011); , The ing Insight from Research Informa- Format,” Electronic Library 29, no. 1 Open Archives Initiative Protocol for tion: 6th International Conference on (2011): 52–70. Metadata Harvesting, Protocol ver- Current Research Information Sys- 15. Lidija Ivanović, Dragan Ivanović, sion 2.0 of 2002-06014, www.openar tems, ed. Wolfgang Adamczak and and Dušan Surla, “A Data Model chives.org/OAI/openarchivesprotocol Annemarie Nase, 21-31 (Kassel, of Theses and Dissertations Com- .html (accessed Nov. 16, 2011). Germany: Kassel University Press, patible with CERIF, Dublin Core, 22. EuroCRIS, CERIF 2008—Final 2002), www.uni-kassel.de/hrz/db4/ and ETD-MS,” Online Information Release 1.2. extern/dbupress/publik/abstract_en Review (forthcoming). 23. Dušan Surla et al., “Overview of .php?978-3-933146-84-7 (accessed 16. Dragan Ivanović et al., “A CERIF- Implementation of the Networked Nov. 14, 2011). Compatible Research Management Digital Library of Theses and Disser- 9. Elly Dijk et al., “NARCIS: The Gate- System Based on the MARC 21 For- tations,” Infoteka 5, no. 1–2 (2004): way to Dutch Scientific Informa- mat,” Program: Electronic Library & 75–86. tion,” in Digital Spectrum: Inte- Information Systems 44, no. 3 (2010): 24. 
International Standards Organization, grating Technology and Culture: 229–51; Gordana Milosavljević et International Standard: ISO 2709, Proceedings of the 10th Internation- al., “Automated Construction of the Information and Documentation— al Conference on Electronic Pub- User Interface for a CERIF-Compli- Format for Information Exchange = lishing held in Bansko, June 14–16, ant Research Management System,” Information et Documentation—For- 2006, ed. Bob Martens and Milena Electronic Library 29, no. 5 (2011): mat por l’échange d’information, 4th Dobreva, 49–57 (Sofia: FOI-COM- 565–88; Aleksandar Kovačević et al., ed. (Geneva, Switzerland: ISO Copy- MERCE, 2006), http://elpub.scix “Automatic Extraction of Metada- right Office, 2008). .net/data/works/att/233_elpub2006 ta from Scientific Publications for .content.pdf (accessed Nov. 15, 2011). CRIS Systems,” Program: Electronic