CERN-THESIS-2009-057 20/10/2006

Improving the Formatting Tools of CDS Invenio

Jérôme Caffaro

October 1, 2006

Supervisors:
Jean-Yves Le Meur, CERN IT-UDS-CDS
Dr. Pearl Pu Faltings, EPFL I&C-IIF-HCI

Abstract

CDS Invenio is the web-based integrated digital library system developed at CERN. It is a strategic tool that supports the archival and open dissemination of documents produced by CERN researchers. This paper reports on my Master’s thesis work on BibFormat, the CDS Invenio module that formats document meta-data. The goal of this project was to implement a completely new formatting module for CDS Invenio.

This report puts a strong emphasis on the user-centered design of the new BibFormat. The bibliographic formatting process and its requirements are discussed. The task analysis and its resulting interaction model are detailed. The document also shows the implemented user interface of BibFormat and gives the results of the user evaluation of this interface. Finally, the results of a small usability study of the formats included in CDS Invenio are discussed.

“Good tools obviate bad processes.”
Dr. Michael B. Johnson, Pixar Animation Studios, Apple WWDC, August 2006

Contents

1 Introduction
  1.1 Context
    1.1.1 CERN
    1.1.2 CDS Invenio
    1.1.3 BibFormat
  1.2 The Project
    1.2.1 Motivations
    1.2.2 Objectives
    1.2.3 Activities

2 Analysis
  2.1 Task Analysis
    2.1.1 Product Statement
    2.1.2 User Population Analysis
    2.1.3 Personas
    2.1.4 User Needs
    2.1.5 User Scenarios
    2.1.6 Task Tables and Task Map
    2.1.7 Usability Goals
  2.2 Comparative Analysis
    2.2.1 Previous Formatting Module
    2.2.2 Competitors
    2.2.3 Synthesis

3 Design
  3.1 Specifications
  3.2 Prototypes
  3.3 Final UI Design
  3.4 Formatting Engine Design

4 Evaluation
  4.1 User Evaluation of the BibFormat Admin User Interface
  4.2 Usability Study of the Formats
    4.2.1 Goals of the Study
    4.2.2 Experimental Conditions
    4.2.3 Results
  4.3 Further Improvements of BibFormat

5 Conclusions

Appendices

A UML Use Cases

B BibFormat Developers APIs

Bibliography

Chapter 1

Introduction

1.1 Context

1.1.1 CERN

CERN (European Organization for Nuclear Research) is the world’s largest particle physics center[1]. Located near Geneva on the Franco-Swiss border, it employs 3000 people to operate, maintain and develop the complex facilities that allow scientists to discover the building blocks of matter. Founded in 1954, the laboratory was one of Europe’s first joint ventures and now includes 20 Member States. Some 6500 visiting scientists, half of the world’s particle physicists, come to CERN for their research. They represent 500 universities and over 80 nationalities.

Among the important discoveries for which many CERN scientists have received prestigious awards, including Nobel prizes, is the Web. The Web was first conceived by Tim Berners-Lee in 1989 as a solution to the problem of information loss due to the high turnover of people at CERN. His original proposal[2] suggested using hypertext to link information systems across the network, such that anyone could have access to any important document produced at CERN. The Web has now extended worldwide and has become a primary component of IT. Unfortunately, some concepts proposed by Tim Berners-Lee have not made their way into the World Wide Web: there are, for example, no typed links (≈ Semantic Web) between information systems, and there is still not always a clear separation between the data and its visual representation.

1.1.2 CDS Invenio

CDS Invenio¹ is the integrated digital library system developed and used at CERN[3]. It is in some ways the implementation of Tim Berners-Lee’s original idea, applied to scientific documents.

As CERN researchers issue about 2000 publications per year, CDS Invenio is an essential tool to capture all of this information and turn it into shared knowledge. At CERN, CDS Invenio currently has a database of more than 800,000 bibliographic references, including 360,000 fulltext documents[4]. These documents are organized in more than 500 collections. In addition to the documents produced at CERN, CDS Invenio harvests about 100,000 documents from external sources.

CDS Invenio uses standards at its core. It is OAI-compliant[5] to support the open dissemination of these documents. CDS Invenio also uses MARC 21[6] (and its XML derivative MARCXML) to store and process bibliographic meta-data.

The server is accessible from a web-based interface, which offers both a fast search of documents and a browsable tree-like structure. Collections of documents can have customizable “portals” to support community building. Additionally, CDS Invenio provides collaborative tools such as baskets or alerts for new specific documents[7].

CDS Invenio is a free, open source application (under the terms of the GNU General Public License). It is developed at CERN by the CERN Document Server (CDS) section, in the User and Document Services (UDS) group. CDS Invenio is now being used by many scientific institutions worldwide, including EPFL[8]. The software complements other librarians’ tools such as Aleph 500[9], with which CDS Invenio can synchronize.

¹ Formerly known as CDSware.
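To make the role of MARCXML more concrete, the following minimal sketch reads two fields from a MARCXML record using only the Python standard library. The sample record is invented for illustration, and the helper `get_subfields` is a hypothetical name, not part of CDS Invenio. Only the MARCXML namespace and the MARC 21 conventions for tags 100 $a (main author) and 245 $a (title) come from the standards themselves.

```python
import xml.etree.ElementTree as ET

# Namespace of the MARCXML schema published by the Library of Congress.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Invented sample record, for illustration only.
SAMPLE_RECORD = """
<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">12345</controlfield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="a">Doe, J</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">An example title</subfield>
  </datafield>
</record>
"""


def get_subfields(record, tag, code):
    """Return the text of every subfield `code` found in datafields `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [node.text for node in record.findall(path, MARC_NS)]


record = ET.fromstring(SAMPLE_RECORD.strip())
title = get_subfields(record, "245", "a")    # title statement
authors = get_subfields(record, "100", "a")  # main author
```

Every value comes back as plain text, which already hints at the weak typing of the stored meta-data.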

1.1.3 BibFormat

BibFormat is one of the many modules that compose CDS Invenio. Its purpose is to format the records of the database by applying customizable templates. The BibFormat module is almost invisible to end-users of CDS Invenio, in the sense that users only see the formatted output produced by BibFormat, but do not directly interact with the module. Figure 1.1 shows where BibFormat sits when a user accesses a record.

[Figure 1.1: Simplified scenario of typical access to a record by end-user. Components shown: User, Web search interface, Search Engine, BibFormat, Records Meta-data.]

This workflow is only a conceptual view of the process, as other mechanisms such as caching of pre-formatted records are involved. This scenario repeats for each search request of the user, as BibFormat is also responsible for formatting each individual entry of the search results list.

Figure 1.2 shows the output produced by BibFormat for a search results list and for the detailed view of a record. Notice that BibFormat only produces the areas outlined in red, while the other components are laid out by CDS Invenio itself.
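The idea of applying customizable templates to records, once per entry of a result list and once for a detailed view, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields, the template strings and the functions `format_record` and `format_search_results` are hypothetical and do not reflect the actual BibFormat API or template syntax.

```python
from string import Template

# Hypothetical templates: a brief one for result lists and a detailed
# one for the page of a single record.
TEMPLATES = {
    "brief": Template("$title / $author"),
    "detailed": Template("Title: $title\nAuthor: $author\nAbstract: $abstract"),
}


def format_record(record, output_format):
    """Apply the template registered for `output_format` to one record.

    Missing fields are rendered as empty strings so that one template
    can serve records that do not share the same meta-data.
    """
    fields = {"title": "", "author": "", "abstract": ""}
    fields.update(record)
    return TEMPLATES[output_format].substitute(fields)


def format_search_results(records):
    """Format every record of a result list with the brief template."""
    return [format_record(record, "brief") for record in records]


# Invented records, for illustration only.
records = [
    {"title": "An example title", "author": "Doe, J", "abstract": "Some text."},
    {"title": "Another title", "author": "Roe, R"},
]
brief_list = format_search_results(records)
detailed_view = format_record(records[0], "detailed")
```

Keeping the templates in a table separate from the formatting function is what makes the output customizable without touching code.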

[Figure 1.2: Samples of output produced by BibFormat (areas outlined in red) and laid out by CDS Invenio. (a) Search results. (b) Detailed record output.]

The formatting challenge

Although end-users do not directly interact with BibFormat, administrators of CDS Invenio do have to configure it to produce the desired output. Typically not all records require the same formatting: there are cases where a field (or attribute) of a record is present in one collection but not in others, or worse, has a different meaning. For example, a technical report and a multimedia document do not share the same meta-data, and BibFormat should certainly produce different kinds of output for each of these document types. The customization of outputs is therefore necessary in BibFormat. Considering the high number of different collections at CERN, this makes the management of formats a particularly difficult task.

The problem becomes even more complex when the different variants of a format are considered: for each formatting style you might need a detailed version, a brief version for search results, or even a BibTeX version such that people can directly insert a reference to a document in their LaTeX file.

Another particularity of CDS Invenio that makes formatting difficult is its internal weakly-typed, semi-structured data. No constraints are put on the values entered by the users and everything is saved as text. This makes it very easy to harvest documents from different sources that use different conventions, but also difficult to provide uniform output values for meta-data. It is, for example, common to have different abbreviated names for the same journal or author. A module in CDS Invenio takes care of normalizing the meta-data of documents, but there are some cases where it cannot be applied or where it is more desirable to normalize at formatting time. One might want to change the way a date is displayed, or link a journal to its website even if the URL is not saved inside the meta-data. This is why BibFormat must be able to normalize and enrich document meta-data.
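Normalizing and enriching meta-data at formatting time, as described above, could look like the following minimal sketch. All names, lookup tables and the URL are hypothetical placeholders chosen for illustration; they are not taken from CDS Invenio. The sketch also assumes that dates are stored as ISO strings, which the weakly-typed storage does not guarantee, hence the fallback.

```python
import html
from datetime import datetime

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# Hypothetical lookup tables; a real installation would maintain its own.
JOURNAL_NAMES = {
    "phys rev d": "Phys. Rev. D",
    "phys. rev. d": "Phys. Rev. D",
    "physical review d": "Phys. Rev. D",
}
JOURNAL_URLS = {
    "Phys. Rev. D": "https://example.org/prd",  # placeholder URL
}


def normalize_journal(raw_name):
    """Map known variants of a journal name to one display name."""
    key = " ".join(raw_name.lower().split())
    return JOURNAL_NAMES.get(key, raw_name)


def format_date(raw_date):
    """Display an ISO date (YYYY-MM-DD) in day-month-year form.

    Values that do not parse are returned unchanged, since the stored
    meta-data is plain text with no guaranteed convention.
    """
    try:
        date = datetime.strptime(raw_date, "%Y-%m-%d")
    except ValueError:
        return raw_date
    return f"{date.day:02d} {MONTHS[date.month - 1]} {date.year}"


def journal_link(raw_name):
    """Return the journal name, as an HTML link when a URL is known."""
    name = normalize_journal(raw_name)
    url = JOURNAL_URLS.get(name)
    label = html.escape(name)
    return f'<a href="{html.escape(url)}">{label}</a>' if url else label
```

The link is built from a table kept next to the format rather than from the record itself, which is what enriching the meta-data means here: the stored record is left untouched.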

1.2 The Project

1.2.1 Motivations

Over the last few years the CDS Invenio code base has moved from the PHP programming language to Python. BibFormat was the last module still written in PHP. This put additional constraints on the installation requirements of CDS Invenio, which already has a long list of dependencies on other software. Integration with other Python modules was also limited by the difference in languages. Finally, the existing BibFormat had some serious usability issues that made editing and managing formats a long and difficult task. The goal of the project was to design and implement a new version of BibFormat.

1.2.2 Objectives

The initial objectives of the project were the following:

1. Implement a fully working Python version of BibFormat.
2. Reach higher usability goals than the previous BibFormat.
3. Design a formatting syntax that is easier to read and understand.
4. Reach at least equivalent expressiveness as with previous BibFormat.
5. Implement requested features.

Implement a fully working Python version of BibFormat

The first objective is almost trivial. It was decided that the developed product would not be a prototype, but a module ready for production. This implied a large amount of testing and integration work at the end of the project. It also meant that the design specifications might need to be scaled down to fit in the five-month timeframe of the project. Note that it was also decided that the new version would not be just a code translation of the old one, but a completely redesigned product.

Reach higher usability goals than previous BibFormat

Shortly after the beginning of the project, users of the previous BibFormat raised concerns about the usability of the module. Some serious usability issues in the old PHP BibFormat were a source of frustration. These were pointed out immediately when establishing the requirements of the product. This second objective was also motivated by my own interest in interaction design. This is why this report focuses on the user-centered analysis and design of the project.

Design a formatting syntax that is easier to read and understand

The third objective is similar to the second one, but more oriented towards power users: in addition to improving the user interaction with the module, we wanted to provide a clearer syntax for the template configuration files, even if those were not to be directly manipulated by novice and intermediate users. The previous BibFormat had quite a complex syntax that looked like a subset of PHP and was not much appreciated by its users. This switch to a new syntax also meant a loss of backward compatibility that needed to be addressed.

Reach at least equivalent expressiveness as with previous BibFormat

The last main objective of the project was to reach the same level of expressiveness with the new BibFormat as with the previous release. This means that every output generated by the old version can be generated by the new one. However, it does not mean that every feature of the previous BibFormat had to be implemented in the new one. In fact, many features were dropped in the transition to the new BibFormat because they complicated the module with concepts that could easily be simulated by others.

Implement requested features

While some features were dropped, other features requested by users of BibFormat were added. The most requested feature was support for the internationalization of formats. Although CDS Invenio’s user interface had already been translated into 16 languages, BibFormat could not adapt to a user’s language settings.
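What internationalization of a format means in practice can be illustrated with a small sketch: the labels printed by a format are looked up in the reader's language, with a fallback when no translation exists. The translation table and the functions `translate` and `format_abstract` are hypothetical and only serve as an example; they are not the mechanism implemented in BibFormat.

```python
# Hypothetical translation table for one label used in a format.
LABELS = {
    "abstract": {"en": "Abstract", "fr": "Résumé", "de": "Zusammenfassung"},
}

DEFAULT_LANGUAGE = "en"


def translate(label, language):
    """Return `label` in `language`, falling back to the default language.

    A label without any translation is returned unchanged.
    """
    translations = LABELS.get(label, {})
    return translations.get(language, translations.get(DEFAULT_LANGUAGE, label))


def format_abstract(record, language=DEFAULT_LANGUAGE):
    """Render the abstract line of a record in the reader's language."""
    return f"{translate('abstract', language)}: {record.get('abstract', '')}"


# Invented record, for illustration only.
record = {"abstract": "We determine the value of a constant."}
french_line = format_abstract(record, "fr")
fallback_line = format_abstract(record, "it")  # no Italian entry: uses English
```

Only the label is translated here; the meta-data itself is displayed in whatever language it was stored.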

What the project was not about

Finally, one point was explicitly not an objective of the work: writing new formats for CDS Invenio. This task alone would have required five more months. However, some improvements to the existing formats were studied, even though they were not initially planned.

Personal objectives

From a more personal point of view, this project was also an opportunity for me to apply principles and methodologies of human-computer interaction and software engineering to a real project.

1.2.3 Activities

This project covered 5 months of development at CERN and 1 month of report writing. Five major phases occurred in these 6 months. The table below summarizes these phases, their corresponding activities and approximate durations. This schedule differs a bit from the initial project timeframe, as some activities had to be dropped or added, and some reordering occurred. Also note that activities were not always carried out in the order shown here and sometimes overlapped.


Phases                   Activities                                 Duration
Analysis                 Getting to know CDS Invenio & Python       3 days
                         Interviews                                 3 days
                         User population analysis                   1 week
                         Competitive analysis                       2 days
                         Scenarios and tasks model                  1 week
Design                   Low-Fi prototypes                          1 week
                         Module Architecture Design                 1 week
                         Call for feature requests                  3 days
Implementation           Coding                                     2 months
                         Preliminary User Evaluation                2 days
Testing and Integration  Code debugging and unit tests development  1 month
                         User Evaluation                            4 days
                         Formats Usability Testing                  2 weeks
Report                   Writing                                    1 month

The analysis and design phases are discussed in more detail in chapters 2 and 3. The user evaluation of the interface and the formats usability study are presented in chapter 4.

Chapter 2

Analysis

2.1 Task Analysis

2.1.1 Product Statement

“BibFormat is a module of CDS Invenio that formats the records of the database for display in the reader’s web browser. The formatting is based on templates defined by administrators.”

2.1.2 User Population Analysis

The sources for the population analysis were the CDS members themselves: they mainly develop CDS Invenio for internal use and for similar institutions, so they have become familiar with their users over the years. Of course we focus here on the users who are concerned by the BibFormat module. Four main types of users are considered by the CDS Invenio development team. They are known as:

Users: They are the CERN users who browse the CDS website, looking for bibliographic references or full texts.

Internal customers: They are the groups (departments, laboratories, etc.) at CERN who have asked to use CDS Invenio for their publications. They can manage their set of documents, known as collections. Collections in CDS Invenio are organized as a tree structure (a collection belongs to a super-collection, and contains other subcollections) that users can browse. Depending on the collection type, internal customers can customize how their collection looks. A collection is then a kind of mini-website or portal, with regular readers and a kind of moderator or webmaster.

External customers: Institutes, companies or universities that have decided to use CDS Invenio as a bibliographic tool. They install and administer the software by themselves. These customers can also have their own internal customers who manage collections.

Collaborators: People external to CERN who contribute to the development of CDS Invenio. They are mainly developers, and are usually also external customers.
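The tree organization of collections mentioned above (each collection belongs to a super-collection and contains subcollections) can be modeled with a very small data structure. The class below is a hypothetical sketch written for this illustration and is not the data model used by CDS Invenio; the collection names are taken from the CERN Document Server navigation.

```python
class Collection:
    """A node of the collection tree (hypothetical model)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # super-collection, None for the root
        self.children = []        # direct subcollections

    def add(self, name):
        """Create a subcollection, attach it to this one and return it."""
        child = Collection(name, parent=self)
        self.children.append(child)
        return child

    def path(self):
        """Return the names from the root down to this collection."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))

    def descendants(self):
        """Yield all subcollections, depth first."""
        for child in self.children:
            yield child
            yield from child.descendants()


root = Collection("Home")
articles = root.add("Articles & Preprints")
preprints = articles.add("Preprints")
books = root.add("Books & Proceedings")

breadcrumb = " > ".join(preprints.path())
```

Walking up the `parent` links gives the kind of breadcrumb a reader sees while browsing, and walking down `children` gives the browsable tree.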


We now need to analyse the user population in more detail. We cannot keep the above categories as such: for example, the external customers category is not limited to the people who install the system, but also includes users who browse the library.

The way CDS Invenio is distributed (free, open source, without intensive marketing or advertising) and the fact that it has been developed for internal use (while trying to fulfill more generic needs than only internal ones) mean that CDS Invenio is predominantly used in universities. The trend in these universities is to replace or complement the old library system with a more modern one, in particular one that offers standardized metadata and interoperability with other libraries, and that can make their documents freely available on the web. As a result, the end-users of the system are mostly students and workers with a university education and a background in science (social science, physics, etc.). They consult the CDS Invenio database in order to document their work. We will hence refer to these users as readers (even though they might also publish papers, but this does not concern BibFormat).

Among these users we can find beginners (typically students) who might not be used to online libraries and only have very basic knowledge of computer tools, limited to web browsing and office software. The other type of end-users are scientists, who deal almost daily with library systems and computer productivity tools to write their reports, do their experiments, etc. They are hence familiar with bibliographic search. BibFormat should have a limited or indirect impact on these users: although it formats the results they will read, the choice of the formatting is left to the managers of the various collections, who should be the best experts to make such a decision, as we will see. Readers are however the biggest population of our analysis, hence the importance for BibFormat to provide adequate customization tools.
Now let’s discuss the internal customers category. They are the groups that have asked to set up a collection for their needs. Most of the people in this category are also the readers we have just seen: they just want to submit their papers and consult others’. However, there is one user in this category who is different from the others: the administrator of a collection. He chooses what the bibliographic references will look like in the collection he manages. For example, the administrator of a photo collection might decide not to display the date tag of the images in the list of search results.

We will see different kinds of administrators. The first one belongs to this category of internal customers, and can be called the domain expert administrator: this person is one of the readers who has been given the responsibility of managing the collection of his group. He is an expert in the domain he has to manage, but has little or no knowledge of HTML or programming languages. In the current version of CDS Invenio this user usually does not have the administration rights to modify the formatting of his collection. He usually asks the CDS Invenio development team to write a custom format for the collection he manages, and does not have to care about the management of the collection after it has been set up. However, in the future CDS Invenio might allow this user to edit and create his own formats.

The biggest internal customer at CERN, who has to deal with many collections, is totally different from those we have seen: it is the library staff, who directly manage most of the documents at CERN. The library staff have had special training in bibliographic systems. They know the bibliographic standard MARC 21. They are not particularly familiar with any programming language. We call them the librarian administrators. Currently, defining the formatting of a collection requires a good knowledge of programming. Although they have been shown how to write custom formats, they have never really been able to do it because of the complexity of the previous BibFormat. This is why the administrators that we have seen most often rely on a third one to write the formatting part. This third administrator is simply a programmer (developer administrator), usually someone from the CDS Invenio development team who offers support for this kind of task.

The external customers category needs refinement: it includes many categories that we have already discovered, like readers and librarian administrators. One new user belongs to the external customers: the system administrator. He is the person who installs and maintains a running copy of CDS Invenio for his institute. He is very much like the developer administrator, but he does not necessarily know Python and does not want to change the source code of the application to customize CDS Invenio. We could also decide to consider another external customer, who is in charge of deciding whether to use CDS Invenio or a competing product (this person might be different from the system administrator). The capabilities provided by BibFormat for customizing the formats of the references can have a considerable weight in the decision process. As there is no strong will to “sell” the product and the development is strongly oriented towards internal needs, we do not have to focus on this user. Moreover, the decision maker's choice of whether to use CDS Invenio should be based (as far as BibFormat is concerned) on the fulfillment of the needs of the users we have considered above.

The collaborators category has to some extent already been discussed through the previous points. They are made up of the same readers and administrators as at CERN.

Now let’s summarize the categories of users that we will consider. They are ordered according to their knowledge in the field where BibFormat applies:

• Readers

• Domain expert administrator

• Librarian administrator

• System administrator

• Developer administrator

As readers will not deal with the same part of the software as the others, we must consider them separately. Although we said that they were not direct users of BibFormat, we can still analyze this population, as it will tell us which default format we should provide in the CDS Invenio installation package.

Readers Here is the user profile of the readers of CDS Invenio. I have compiled the statistics of CERN (2004 [10]), EPFL (2005 [11]) and RERO (2005 [12]), a Swiss online library.


70% of the CDS Invenio server traffic comes from outside CERN. It is difficult to analyze in detail the user population accessing documents from outside CERN. The remaining 30% of readers, at CERN, are mostly scientists (research physicists, various scientific and engineering work). 90% of them are CERN employees aged from 25 to 65 years (average: 50 years). The remaining 10% are students aged from 18 to 25 years. 43% of them come from France; the rest come from various countries (mostly European). About 5% of the scientific staff are women. As scientists, they master scientific tools and report editing software, but do not necessarily have knowledge of computer science (system installation, maintenance and dedicated software development are managed by the IT department). The majority of them know at least two languages, including English, which is the official language at CERN.

EPFL has about 6500 students, from 18 to 25 years old. 25% of the students are women. The CDS Invenio installation at EPFL mostly targets researchers, doctoral students and students.

RERO is a network of 200 Swiss French libraries. It has 150’000 readers, including 35’000 students. The library manages a collection of more than 3’000’000 documents. Only 14% of the documents are in the field of exact sciences. The others concern history, literature, arts, etc. 37% of the documents are in French; others are in English (23%), German (21%), Italian (6%) and Spanish (2%). RERO targets university students, teachers and researchers.

Here follows an estimate of the reader population:

• Female students (18-25 y/o): 5%

• Male students (18-25 y/o): 15%

• Female researchers (> 25 y/o): 4%

• Male researchers (> 25 y/o): 76%

Administrators

Domain expert administrator The domain expert administrator is typically not experienced in programming languages. He might know a little bit of HTML. He should typically be a reader too, and belong to the category of researchers. His knowledge of bibliographic notation is limited (he only reads references in reports and writes them for his own reports). About 30 custom formats needed to be developed for this user at CERN. As CDS Invenio is installed in about 15 institutes worldwide and there are potentially 500 collections at CERN that could be managed by this user, this gives more than 100 users of that kind.

Librarian administrator The CERN librarians are mostly women, aged from 20 to 40. They are experienced in administration, bibliographic notation (MARC 21) and bibliographic tools (like ALEPH). There are about 5 librarians at CERN working on CDS Invenio. Given the installed base of CDS Invenio, we can give a rough estimate of about 40 librarians using CDS Invenio.


System administrator The system administrator installs and maintains CDS Invenio. As CDS Invenio is not easy to install, we can consider that he has knowledge of UNIX administration, command line tools and web server deployment. He does not have any experience with bibliographic notations and tools. He knows a bit about HTML and XML, but does not have to deal with these languages daily. We can estimate that there are about 15 system administrators.

Developer administrator The developer team working at CERN on CDS Invenio is the typical developer population. They are also the most important developer administrators for CDS Invenio. They must know Python as a programming language and understand MARCXML to contribute to the development of CDS Invenio. The population is typically under 40 years old. A large part of the CDS Invenio development team at CERN are students, working on the project for less than 1 year. Most of them know at least Java when joining the team. It has never been a problem for them to learn Python. BibFormat formats are currently maintained by a full-time member of the CDS Invenio team. There are about 10 developers for CDS Invenio at CERN, including 5 who offer support to internal customers. About 5 external contributors complete the number of developers.

Administrators synthesis While the numbers show that formatting should be adapted to non-computer-literate people, it would be strategically risky to develop BibFormat exclusively for them. Firstly, because the editing of formats is a task that requires resources, and the allocation of a budget for this task to librarians instead of developers is a decision that is beyond the scope of this project. We cannot develop software exclusively for a category of people if in the end they do not have the resources to use it. Secondly, the editing of formats is a task that can become quite difficult for complex formats. The managers of CDS Invenio were not ready to trade off expressiveness of formats for ease of writing. We can however afford to give non-computer-literate persons the tools to specify what the formatting should look like, and also anticipate future developments. Still, we must balance their needs with the needs of the CDS Invenio developers, who will remain for a while the only persons with the permissions and resources to edit formats.

• System administrator: 10%

• Developer: 40%

• Domain expert: 20%

• Librarian: 30%

2.1.3 Personas

Five personas representing the main categories of our user population were created. These personas were of great help in prioritizing the needs of our broad user population. A complete profile has been attached to each persona, to make the users for whom I was developing more concrete and to enrich feedback from my co-workers.


The first persona from our end-user population is Silvio Pedone, a CERN researcher:

Name: Silvio Achile Pedone
Age: 56 y/o
Job: Physicist
Nationality: Italian
Languages: Italian, English, French
Work hours: 8.30 am to 5 pm
Education: Doctor in physics
Location: St-Genis, France
Technology: Unix system, LaTeX, MATLAB. Web browsing and email.
Income: 8000 CHF/month
Disabilities: Wears glasses
Family: Widower. Has 2 sons.
Hobbies: Reading science magazines. Walking in the mountains and skiing.
Goals: Make science progress. Make his research recognized by the scientific community in particle physics.

Silvio Pedone has always been interested in science. He was encouraged very early by his parents to think for himself and study hard to reach his goals. He studied physics at the University of Padova, where he got his PhD in physics. Soon after the end of his studies he found a job as a researcher in nuclear physics at CERN. He moved to Geneva, where he met his wife Mireille. They had 2 children. Silvio has seen the evolution of technologies with great hopes for science.

Silvio works in the theoretical physics laboratory on string theory with 15 other people. He regularly publishes articles about his research in scientific magazines. He also publishes them on CDS Invenio so that his work is easily accessible on the web to anyone.


Stephanie represents a second reader from our user-population:

Name: Stephanie Jarlet
Age: 23 y/o
Job: Biology student
Nationality: Swiss
Languages: French, English and German
Work hours: Flexible. Usually 6 hours a day at university. Works 2 hours at home in the evening.
Education: Bachelor degree
Location: Lausanne, Switzerland
Technology: Windows PC and Microsoft Office. Web browsing, emails.
Income: None
Disabilities: She sometimes wears glasses to read.
Family: Has a younger brother, who lives at home with her parents, in Neuchatel. She currently lives with 2 friends in student lodgings.
Hobbies: Cinema and shopping
Goals: She would like to easily find results of existing studies to document the lab reports she has to hand in to her professors.

Stephanie is a 3rd year student at EPFL. She studies biology. She had first thought of becoming a mathematician and studied maths for one year. She preferred biology because it was less abstract. She really does not regret her choice.

Stephanie has practical labs at school. She has to hand in about one report per week, and must find documentation for each of them. The professors recommended some books to the students at the beginning of the semester, and the assistants are always there to help. She also uses Google to find paper references.


Alain represents the user who manages the formats as a full-time job in the old PHP BibFormat, but who will finally be able to spend more time on other tasks once the new BibFormat is released and the management of formats has become easier. He keeps in touch with his customers to establish the requirements of the formats. Alain represents the advanced user of BibFormat.

Name: Alain Boilat
Age: 42 y/o
Job: CDS Invenio Developer at CERN
Nationality: French
Languages: French, English
Work hours: From 8.30 am to 5 pm
Education: Master degree
Location: France
Technology: Unix, Windows, Java, C++, Python, XML, etc.
Income: 6000 CHF
Disabilities: None
Family: Married, 2 children
Hobbies: Football, and sports in general
Goals: Enjoy his work. Make CDS Invenio fulfill the needs of CERN users.

Alain Boilat has been working on CDS Invenio for more than 6 years. He has become one of the few to fully understand the different modules of CDS Invenio. He implements about 2 major new features per year on CDS Invenio. A large part of his work is to maintain the existing CDS Invenio installation at CERN, and provide all kinds of support to its users: correct bugs, assist new users and write new custom formats for the collections that are created. He regularly receives requests to write new formats. He wishes that this took less of his working time, either by letting the users do it, or by having more adequate tools that he could use to write and manage formats more easily.

Alain works on making CDS Invenio meet its customers' needs. From time to time he gets a request to modify or correct the formatting of a given collection. Most of the time these requests come from a librarian. He can do this very easily, as he knows almost by heart where the faulty format lies. It is only a matter of minutes for him to fix any problem. He knows that it would take the people at the library much more time.

To edit a format, Alain uses the web interface. This can become a pain, as it does involve programming, but without the usual tools that a developer needs, like syntax highlighting, a debugger or even keyboard shortcuts.

Alain lives near CERN, with his wife and his two girls. The younger of his girls is very proud of her father and wants to become a developer too.


The librarian user was very important to take into account: some CERN librarians have shown a great interest in being able to edit formats themselves, for small modifications. With the new BibFormat they should be able to do the regular maintenance of formats, and ask CDS members for assistance with complex projects. Taking the librarian user into account will also lead to a design compatible with persons who are only familiar with basic computer usage.

Name: Marisa Borboen
Age: 32 y/o
Job: Librarian at CERN
Nationality: French
Languages: French, English, Spanish
Work hours: From 8.30 am to 5 pm
Education: High school
Location: Ferney-Voltaire
Technology: Windows, Microsoft Office, Aleph, CDS Invenio. Web browsing.
Income: 4000 CHF
Disabilities: Wears glasses
Family: Single
Hobbies: Reading celebrity magazines and crime novels. She likes TV sitcoms.
Goals: Offer high quality service to the readers of the library. Hopes to marry a rich man.

Marisa has been working for the CERN library for 5 years. Previously she was a secretary in a bank, near Annecy. She was laid off due to budget cuts. She seized the opportunity to follow a one-year librarian training course. She feels very lucky to have found a job at CERN. She now lives in a flat in Ferney-Voltaire, 5 minutes from CERN by car. She likes its quiet surroundings, which allow her to peacefully read books outside on a public bench.

Marisa is quite excited to work for the CERN library. She feels that CDS Invenio is one of the indispensable tools for the users of the library, and therefore tries to keep the bibliographic references as clean and up to date as possible. As a librarian, she feels that the web could be a wonderful way to discover new writers and books, be they fiction or scientific books, but that it still lacks some tools for this purpose.

Marisa maintains a small blog, where she can share her passion for some books and TV shows. She has learned some HTML, but never really had to use it. However, she would also like to help with some parts of the CDS Invenio website, as long as she does not have to deal with technical problems.


Name: Patrick Teneau
Age: 21 y/o
Job: Computer Science student, internship at CERN
Nationality: French
Languages: French, English
Work hours: From 8.30 am to 5 pm
Education: Bachelor degree
Location: Annemasse, France
Technology: Unix, Windows, Java, C++, HTML, XML
Income: 1200 CHF
Disabilities: Wears glasses
Family: Single. Lives with his parents.
Hobbies: Computer science and sci-fi reading. Plays online role-playing games.
Goals: Finish his studies with a great CV.

Patrick has been studying computer science at IUT (Savoie University, France). He has to do an internship during his studies, and has had the opportunity to do it at CERN, working on CDS Invenio. Patrick did not know Python before starting his internship, but he quickly learned it. Patrick still lives with his parents, in Annemasse. He travels every day to CERN by car.

During his short internship Patrick has been asked to develop new formats for BibFormat. The first step was to read the BibFormat documentation. Understanding how to write formats was difficult, as it almost required understanding the underlying architecture and learning a new language specific to BibFormat. The web interface allowed him to create a new format. The writing had to be done in a standard web text field, which was awful to use. Patrick worked around the system by using an external HTML editor to write the skeleton, then copying and pasting the result into the BibFormat field. He could then benefit from all the nice tools of his editor, and preview the result in CDS Invenio. The only problem is that he had to make modifications after each copy-paste, to adapt the raw HTML to the BibFormat syntax.

The persona of Patrick was invalidated by members of the CDS team very early in the analysis phase, as he did not correspond to a role that exists there. Currently one full-time member of the CDS team is responsible for managing all the formats, and this task is not assigned to temporary members of the team. There are two main reasons why this job must be done by a full-time member: firstly, learning BibFormat and writing formats is hard, and secondly, the work must be supervised by someone who knows the requirements of the formats. Still, a big part of the work is spent on actually writing the code of the formats, while this time would be better spent on the other tasks of the job, like establishing the requirements of the formats or designing the formats. Nevertheless, introducing a user like Patrick, who represents the novice user that was critically missing in the design of the previous BibFormat, will help to reduce the time required to learn BibFormat and write formats. This might eventually lead to distributing this job to other resources, such as temporary students. This novice (but still computer-literate) user is also necessary to prevent the issue discussed in section 2.2.1, which makes BibFormat users almost perpetual novices.

Still, this persona has evolved a bit in the course of this analysis, to better reflect the current situation in the CDS team. He is now working closely with Marisa and Patrick to establish the requirements of the formats. He keeps track of the requirements in an Excel spreadsheet, to which he can refer when he has to modify a format. He also works on other modules of CDS Invenio from time to time.

Prioritization of personas The most important persona that will interact with BibFormat is Patrick. It is mainly for him that BibFormat is going to be designed. Our secondary user is Alain, as he only uses BibFormat from time to time. Our third user is Marisa, although she would ideally be our primary user: Marisa knows the requirements of the formats, but as explained in the user population analysis it is strategically risky to design primarily for Marisa. The last two personas do not directly interact with BibFormat. We do not design an interaction model for them, but we must keep in mind that our three main personas will develop formats for them.

2.1.4 User Needs

The high-level needs of the personas are the following:

Patrick’s needs:

• Create new formats for new collections (totally new or duplicate from existing one)

• Modify the look of existing formats (Colors, fonts, layout, etc.) according to librarians’ directives.

• Modify the information displayed by an existing format (which fields are displayed) according to librarians’ directives.

• Check the quality of his modifications on formats.

• Assign a format to a new collection.

Alain’s needs:

• Modify the look of a format (Colors, fonts, layout, etc.) according to librarians’ directives.

• Modify the information displayed by an existing format (which fields are displayed) according to librarians’ directives.

• Set the format for a given collection.

• Keep track of requirements for each format.

• Make sure that everything is working right with the formats in production.


• Define formats that require complex business logic that other users cannot or do not want to deal with.

• Define new kinds of outputs (brief output, portfolio output, Excel output, etc.)

Alain and Patrick’s needs are almost identical. They mostly differ in the fact that Alain takes care of all modules of CDS Invenio and only occasionally uses BibFormat, while Patrick uses it almost every day. They will also differ in the way they carry out their tasks.

Marisa’s needs:

• Modify the look of a format (Colors, fonts, layout, etc.)

• Modify the information displayed by an existing format (which fields are displayed) according to modification made to meta-data of library’s items.

• Modify references list such as list of journal’s names and websites, or collection names, which are not saved directly in records.

2.1.5 User Scenarios

We discuss here typical scenarios for the three main personas of our analysis.

Patrick’s scenarios Patrick is responsible for the formats used at CERN for CDS Invenio. When a new collection is added, Patrick has to create a new format, which usually is only a slightly modified version of an already existing format. He mostly needs to add or remove displayed fields, to match the meta-data of the new collection. He modifies the labels, and sometimes has to provide a default value for a field in case it can be empty for some records. Patrick follows the requirements of the collection owners (mainly librarians) regarding the fields that have to be displayed, their ordering and their labels, but is most of the time free to choose the appearance of the page (fonts, colours, general layout). He tries to keep a uniform look across collections, but gives priority to the wishes of collection owners.

For each new collection, Patrick should design different formats: one detailed HTML format, one brief format (for the search results), and other formats for a BibTeX output, Excel output, etc. However, most of the time only a new detailed HTML format is written, unless there are specific needs to adapt other output formats. Patrick likes to try different designs, and therefore wants to be able to quickly see the results of different attempts.

Given that the records of the databases are entered by various users who use different conventions, and are harvested from different sources, Patrick has to adapt formats to produce output that is as uniform as possible. He wants, for example, a journal name abbreviated in two different ways to always be displayed in the same way. To do so, Patrick maintains a list of mappings that the formatter can use to normalize output.
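The field-level adjustments described in this scenario (a label, and a default value when a field is empty for some records) can be illustrated with a minimal Python sketch. The record layout, the function name `render_field` and its parameters are assumptions made for illustration only; they are not the BibFormat API.

```python
from html import escape


def render_field(record, field, label, default=""):
    """Return an HTML line for one field, or the default when it is empty.

    `record` is assumed to be a plain dict of field name -> string value;
    this is a hypothetical layout, not the way CDS Invenio stores records.
    """
    value = (record.get(field) or "").strip()
    if not value:
        value = default
    if not value:
        # Nothing to show: omit the label as well.
        return ""
    return "<b>{}:</b> {}<br/>".format(escape(label), escape(value))


# Hypothetical record with one filled field and one empty field.
record = {"title": "Search for new particles", "publisher": ""}
print(render_field(record, "title", "Title"))
print(render_field(record, "publisher", "Publisher", default="CERN"))
print(render_field(record, "abstract", "Abstract"))
```

Returning an empty string when neither a value nor a default exists avoids printing an orphan label, which is one possible reading of the uniform look Patrick tries to keep across collections.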


Alain’s scenarios Alain is responsible for the complex and low-level tasks of the maintenance of the CERN document server. He has to ensure that the server is always available to users. He backs up the system, fixes bugs, etc. When he receives a request to add a new collection, he defines the requirements with the client regarding the meta-data that will be collected, the look of the collection and the submission process. Alain delegates the task of implementing the submission page and the formatting of the collection to his colleagues, while he prepares the database to support the new collection.

From time to time Alain receives a request for small modifications in the collections, which might require minor adaptations of a format. In that case Alain opens the format file and makes the modification himself. It also happens that Alain takes care of writing the complex business logic behind a format. In these cases he prefers to use all the power of his UNIX tools and scripting languages rather than a web interface.

Alain sometimes cleans up the existing templates. He tries to eliminate duplicate or very similar ones, and deletes those that are not in use. While formats are collection-dependent, users want to be able to select custom outputs (like BibTeX or detailed output) which apply to all records. Therefore Alain must be able to define new categories of output which will take care of selecting the right template for the right record given the selected output.
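The selection logic described at the end of this scenario, where a category of output picks the right template for each record, can be sketched as an ordered list of rules with a default. The rule structure, the field names and the template file names below are hypothetical and only illustrate the idea; the actual output format syntax of BibFormat may differ.

```python
def select_template(record, rules, default_template):
    """Return the template name of the first rule matching the record.

    Each rule is a (field, expected_value, template) tuple. This layout is
    an assumption made for the sketch, not the BibFormat output format.
    """
    for field, expected_value, template in rules:
        if record.get(field, "").lower() == expected_value.lower():
            return template
    return default_template


# Hypothetical rules for a "detailed HTML" category of output.
detailed_rules = [
    ("collection", "PHOTOS", "picture_detailed.tpl"),
    ("collection", "THESIS", "thesis_detailed.tpl"),
]

print(select_template({"collection": "Photos"}, detailed_rules, "default_detailed.tpl"))
print(select_template({"collection": "Preprint"}, detailed_rules, "default_detailed.tpl"))
```

Keeping the rules in an ordered list with an explicit default means that every record gets some template for the selected output, even when no collection-specific rule applies.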

Marisa’s scenarios Marisa spends about one third of her time handling loans at the users' desk, one third registering new books in the Aleph system and one third preparing the various CERN periodicals. When she inputs new records in the Aleph system, she knows that they will be available to users from the CDS Invenio website.

Marisa must create custom formats for the different publications at CERN. She needs the tools to create formats of low complexity, simply displaying some fields in tabular environments. She does this using Dreamweaver, the HTML editor distributed at CERN, for which training sessions are offered to CERN employees. Sometimes she notices that an existing format does not display the right field or that a label needs to be clarified. She opens the corresponding format and tries to modify it herself, before calling the CDS support to do it if the task is too difficult for her.

One of the big tasks of Marisa is to keep the lists of mappings up to date. These mappings define the normalized version of a field for different values. For example, if some records refer to Phys. Rev A or Phys. R. A, Marisa must let BibFormat know that they refer to the journal Physical Review A, by mapping the different abbreviations to the full journal name. She also does this for authors’ names and journals’ websites. She simply sets these mappings in knowledge bases.
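The knowledge base mentioned in this scenario can be pictured as a simple lookup table from the variants found in records to one normalized value. The following Python sketch assumes a plain dictionary and a hypothetical helper named `normalize`; the way BibFormat actually stores and queries knowledge bases is not described here.

```python
def normalize(value, knowledge_base):
    """Map a raw field value to its normalized form, if a mapping exists.

    The lookup ignores case and extra spaces; unknown values are returned
    unchanged so that no information is lost.
    """
    key = " ".join(value.split()).lower()
    return knowledge_base.get(key, value)


# Hypothetical knowledge base: abbreviation (lower case) -> full journal name.
journal_names = {
    "phys. rev a": "Physical Review A",
    "phys. r. a": "Physical Review A",
}

print(normalize("Phys. Rev A", journal_names))
print(normalize("Phys. R. A", journal_names))
print(normalize("Nature", journal_names))
```

With this kind of table, adding a new journal name only requires a new entry in the mapping and no change to any format, which matches the division of work between Marisa and the developers.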

Other scenarios You can find other more detailed and formalized scenarios for these users in appendix A.

2.1.6 Task Tables and Task Map

Patrick’s task table is shown in table 2.1, Alain’s task table in table 2.2 and Marisa’s task table in table 2.3. The task vs frequency of usage table is table 2.4.


| Task | Importance | Frequency | Details |
|---|---|---|---|
| Add format template | Medium | Low | Patrick creates a new format template, or a copy of an existing one, using the web interface |
| Retrieve a format template | Medium | Medium | Patrick finds a format template by its name and description in the list of templates |
| Edit look of format template | High | High | Edits fonts, colors, layout, ordering of elements using WYSIWYG editor |
| Edit displayed information of format template | High | Medium | Modifies displayed fields of the record according to requirements |
| Check format template validity | High | High | Patrick wants to be sure that he has made no error, after his modification |
| Preview format template | High | High | Patrick can see a preview of his work with real records |
| Delete format template | Medium | Medium | Patrick clicks on the delete button in the list of format templates |
| Modify format template name | Medium | Low | Patrick clicks on “modify attributes” in the format template editor |
| Add new category of output | Medium | Very low | Patrick adds a new output format (Excel, XML, etc.), using the web interface |
| Retrieve a category of output | Medium | Low | Patrick finds an output format by its name in the list of output formats |
| Delete category of output | Medium | Very low | Patrick goes to the list of output formats and clicks on the “delete” button |
| Assign a template to a collection | High | Low | Patrick adds a new rule for the format template in the corresponding output format |
| Remove the template of a collection | Low | Very low | Patrick removes the corresponding rule in the desired output format |
| Modify name of category of output | Medium | Medium | Patrick changes the name of the output format in the output format editor |

Table 2.1: Patrick’s task table


| Task | Importance | Frequency | Details |
|---|---|---|---|
| Add format template | Medium | Low | Alain creates a new format template file directly from his terminal |
| Retrieve a format template | Medium | Medium | Alain finds a format template by its filename in the list of templates |
| Edit look of format template | Medium | Low | Edits fonts, colors, layout, ordering of elements using his text editor |
| Edit displayed information of format template | High | Low | Modifies displayed fields of the record according to requirements, from his text editor |
| Check format template validity | High | High | Alain checks the correctness of all formats from his terminal, at any time |
| Preview format template | High | High | Alain can see a preview of his work with real records |
| Check format template dependencies | High | High | Alain tracks the dependencies of all formats from his terminal, at any time |
| Delete format template | Low | Low | Alain removes the file from his terminal |
| Modify format template name | Low | Low | Alain renames the format template file right from his terminal |
| Add new category of output | Medium | Low | Alain adds a new output format file right from his terminal |
| Retrieve a category of output | Medium | Low | Alain finds an output format by its filename in the list of output formats |
| Delete category of output | Medium | Low | Alain deletes an output format file right from his terminal |
| Assign a template to a collection | High | Medium | Alain adds a rule in the output format file using his text editor |
| Remove the template of a collection | Low | Low | Alain removes a rule in the output format file using his text editor |
| Add complex format | High | Medium | Alain writes a new script in Python, using his preferred tools |
| Check validity of output rules | High | High | Alain checks the correctness of all formats from his terminal, at any time |
| Check dependencies of output categories | High | High | Alain tracks the dependencies of all outputs from his terminal, at any time |

Table 2.2: Alain’s task table


| Task | Importance | Frequency | Details |
|---|---|---|---|
| Add format template | Low | Low | Marisa creates a new format template, or a copy of an existing one, using the web interface |
| Retrieve a format template | Medium | Medium | Marisa finds a format template by its name and description in the list of templates |
| Edit look of format template | High | High | Edits fonts, colors, layout, ordering of elements using WYSIWYG editor |
| Preview format template | High | High | Marisa can see a preview of her work with real records |
| Edit displayed information of format template | High | High | Modifies displayed fields of the record |
| Check format template validity | Medium | Medium | Marisa wants to be sure that she has made no error, after her modifications |
| Assign a template to a collection | Medium | Low | Marisa adds a new rule for the format template in the corresponding output format |
| Adds a new journal name (or equivalent) | High | High | Marisa adds a new entry in the journal (or equivalent) knowledge base |

Table 2.3: Marisa’s task table


| Task | Patrick | Alain | Marisa |
|---|---|---|---|
| Add format template | 5% | 1% | 5% |
| Retrieve a format template | 10% | 3% | 10% |
| Edit look of format template | 15% | 1% | 20% |
| Edit displayed information of format template | 10% | 1% | 10% |
| Preview format template | 15% | 7% | 10% |
| Check format template validity | 10% | 15% | 5% |
| Check format template dependencies | 5% | 1% | 5% |
| Delete format template | 3% | 1% | 0% |
| Modify format template name | 2% | 1% | 0% |
| Add complex format | 0% | 15% | 0% |
| Add new category of output | 2% | 3% | 0% |
| Retrieve a category of output | 5% | 5% | 0% |
| Delete category of output | 2% | 2% | 0% |
| Assign a template to a collection | 5% | 15% | 5% |
| Remove the template of a collection | 2% | 2% | 0% |
| Modify name of category of output | 2% | 2% | 0% |
| Check validity of output rules | 5% | 15% | 0% |
| Check dependencies of output categories | 2% | 10% | 0% |
| Adds a new journal name (or equivalent) | 0% | 0% | 30% |

Table 2.4: Task vs frequency of usage table

For Patrick, it is very important that:

1. he can modify templates and see the results immediately

2. he can play with the system to learn how it works (without reading manuals, but by looking at the provided sample format templates)

3. he can easily see all existing templates and retrieve a particular one

4. he can quickly move to more advanced administration tasks

For Alain, it is very important that:

1. he can administer BibFormat without the web interface, using his preferred UNIX tools instead

2. he can immediately check the status of all formats

3. he can easily see all existing formats and retrieve a particular one

4. he can assign a format to a set of records

5. he does not have to remember how BibFormat works each time he wants to use it (reduce need of syntactic knowledge of the system)

For Marisa, it is very important that:

1. everything is well documented in case she has a problem (Contextual Help)


2. she can use her HTML editor to edit formats

3. she can do minor modifications without having to contact Alain or Patrick

A summarizing task map is shown in figure 2.1.

[Diagram: task map grouping the tasks under Create Format, Edit Format, Validate Format, Delete Format, Assign Format, Normalize Data, Create Output Format, Edit Output Format, Validate Output Format and Delete Output Format.]

Figure 2.1: Task Map.

2.1.7 Usability Goals

1. Short learning phase (rely on concepts and tools already known and appreciated by users).

2. Let users try/play with the editing of formats, with their own tool or the online WYSIWYG editor.

3. Easily retrieve an existing format and edit it.

4. Provide preview of the formats.

5. Provide awareness of the status of formats (correctness, dependencies).

2.2 Comparative Analysis

In this section we study the different formatting solutions employed by direct competitors of CDS Invenio and products whose goal is to format bibliographic references, in order to get a glimpse of viable design solutions. Given the short amount of time allocated to the project, this analysis helped avoid spending time prototyping inefficient solutions.


We also describe the formatting solution that was available in CDS Invenio prior to this project. This section also gives an idea of the different requirements of the formatting process.

This analysis was done shortly after the task analysis had begun, in order to evaluate the different solutions against our own requirements. Both analyses nevertheless continued in parallel, given that this comparative analysis could also extend the task analysis with interesting elements.

2.2.1 Previous Formatting Module

The PHP version of BibFormat is a complex piece of software that has been included as the formatting module in CDS Invenio until now. It has a web configuration interface. An extract of the main page is shown in figure 2.2. Except for the complexity of this page, it is a typical entry page of a CDS Invenio module: the main page links to the administrator guide and to the different tools or concepts of the module. The main page has the following links:

Behaviours Define the rules that will decide which format must be applied to a given record.

Extraction Rules Define how the metadata tags from the database are mapped into internal BibFormat variable names.

Link Rules Define rules for automated creation of URI links from mapped internal variables.

File Formats Define file format types based on file extensions. This will be used when proposing various fulltext services.

User Defined Functions (UDFs) Define functions that can be reused when creating formats. This makes complex formatting possible without ever touching the BibFormat core code.

Formats Define the formatting of CDS Invenio records.

Knowledge Bases (KBs) Define one or more knowledge bases that transform various forms of input data values into a unique standard form on the output. Example: Specify that Phys Rev D and Physical Review D are both the same journal and that these names should be standardized to Phys Rev : D.

Execution Test Test formats with a sample data file.
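The Knowledge Base mechanism described in the list above amounts to a lookup from known variants of a value to one standard form. The following is only a minimal Python sketch of that idea; the names JOURNAL_KB and normalize are hypothetical and are not taken from the PHP BibFormat code.

```python
# Hypothetical knowledge base: each known variant maps to one standard form.
# The two entries are the journal example given in the text.
JOURNAL_KB = {
    "Phys Rev D": "Phys Rev : D",
    "Physical Review D": "Phys Rev : D",
}


def normalize(value, kb):
    """Return the standard form of `value`, or `value` itself if the
    knowledge base has no entry for it."""
    return kb.get(value.strip(), value)


print(normalize("Physical Review D", JOURNAL_KB))  # Phys Rev : D
print(normalize("Nature", JOURNAL_KB))             # Nature (no entry, unchanged)
```

Returning unknown values unchanged is a design choice of this sketch: a missing entry then degrades to displaying the raw value rather than hiding it.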

The most important concepts for the formatting process are Formats and Behaviours. The Formats link leads to a list of formats, as shown in figure 2.3. This page allows users to view, edit or delete formats. It also lets users insert a new format right from this page. Each format has a name, a description and a piece of code that defines the formatting. When clicking on the Modify link of a format in the list of formats, an editor for this format is loaded in another window, as shown in figure 2.4. The code of the format is written in EL (Evaluation Language), a custom language invented specifically for BibFormat. EL is a subset of PHP, with loop and conditional statements, but without variable assignment. It is interesting to note that EL allows users to:


1. call another format from inside a format, such that formats can be structured in small reusable components (but can also have complex dependencies, or even insoluble circular dependencies).

2. use custom functions, also written in EL, and saved in the User Defined Functions section.

[Screenshot: main page of the BibFormat administration interface, with a description of the module and links to Behaviours, Extraction Rules, Link Rules, File Formats, UDFs, Formats, KBs, Execution Test, Reformat Records and the Guide.]

Figure 2.2: Main BibFormat administration page (Old PHP BibFormat).

[Screenshot: list of formats (for example DEFAULT_HTML_BRIEF, DEFAULT_HTML_DETAILED, _DEFAULT_TITLE, _DEFAULT_AUTHORS, _FULL_ABSTRACT), each with a description and Code, Modify and Delete links, followed by an “Add new FORMAT” form with Format Name, Format documentation and EL Code fields.]

Figure 2.3: Extract of the list of formats administration page (Old PHP BibFormat). Bottom of the page offers controls to insert a new format.

Editing format 'DEFAULT_HTML_BRIEF'

Format Name: DEFAULT_HTML_BRIEF

Documentation: This is the default brief HTML format.

EL Code:
    "" format("_DEFAULT_TITLE") " "
    if(count($100.a)!="0" || count($700.a)!="0") { " / " format("_DEFAULT_AUTHORS") " " }
    forall($088.a) { " [" $088.a "] " }
    forall($037.a) { " [" $037.a "] " }
    forall($520.a) { "
    " format("_DEFAULT_ABSTRACT_FIRST_SENTENCE") "" }
    forall($8564.u) { "
    " format("_DEFAULT_URL") "" }

[Buttons: UPDATE, UPDATE&CLOSE]

Figure 2.4: Format editor (Old PHP BibFormat).


3. get the value defined for a key in a Knowledge Base.

4. create a link to some resource using the Link Rules.

This is how formats are managed. Now let us see how these formats are applied. To specify which format is applied to which record, users have to follow the Behaviours link of the main page, which leads to a list of behaviours (Figure 2.5). These behaviours can be managed in the same way as formats.

[Screenshot: list of behaviours (DEFAULT, HB, HC, HD, HP, HX), each with a type, a documentation string and a Details link, followed by a form to add a new output type.]

Figure 2.5: List of behaviours administration page, the rules that define which format is applied to which record (Old PHP BibFormat).

A behaviour works as a rule-based decision system which, given a condition on the record to format, executes a given portion of code. Typically a behaviour defines a list of rules with conditions on the value of one field of the record (for example, that field 980.a is equal to “PICTURE” or “PREPRINT”) and, depending on whether the condition evaluates to True or False, executes the corresponding code of the rule (most of the time the code simply calls a format to apply to the record). The condition and the associated code are written in the same EL language as formats. When clicking on the Details link of a behaviour, its editor opens in a new window. This editor lets users add and edit conditions and rules, as shown in figure 2.6.
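The rule-based decision described above can be sketched in a few lines of Python. This is only an illustration of the mental model, not the PHP implementation: the record representation, the helper field_equals and the function select_format are assumptions made for this example, while the two rules mirror those visible in the behaviour editor for output type HB (a condition on 980.a, then an always-true fallback).

```python
def field_equals(tag, value):
    """Build a condition that is true when `value` is among the values
    of field `tag` in the record (hypothetical helper)."""
    def condition(record):
        return value in record.get(tag, [])
    return condition


# A behaviour is an ordered list of (condition, format name) rules.
HB_RULES = [
    (field_equals("980.a", "PICTURE"), "PICTURE_HTML_BRIEF"),
    (lambda record: True, "DEFAULT_HTML_BRIEF"),  # fallback, always true
]


def select_format(record, rules):
    """Return the format name of the first rule whose condition holds,
    or None when no rule matches."""
    for condition, format_name in rules:
        if condition(record):
            return format_name
    return None


# Records are assumed to map a field tag to its list of values.
print(select_format({"980.a": ["PICTURE"]}, HB_RULES))   # PICTURE_HTML_BRIEF
print(select_format({"980.a": ["PREPRINT"]}, HB_RULES))  # DEFAULT_HTML_BRIEF
```

Evaluating the rules in order and stopping at the first match keeps the behaviour predictable; whether the PHP BibFormat stops at the first matching rule is an assumption of this sketch.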

[Screenshot: three overlapping windows of the behaviour editor for output type 'HB', showing its conditions (for example $980.a="PICTURE") and the EL code of the associated actions.]

Figure 2.6: Behaviour editor (Old PHP BibFormat).

Critique of the old PHP formatting solution

There are some serious issues in the solution provided until now. I think that the most important one is the long learning process required before being able to do anything with BibFormat:

• Overflow of concepts and information on the main page

• Impossible to learn step-by-step

• Impossible to “play” with BibFormat tools

• User must learn the EL language

• EL language not really suitable as a formatting language

The overflow of concepts on the main page might be daunting for a first-time user. The provided information labels are too long and make this page look like a manual (and nobody reads manuals). Moreover, users cannot hope to learn BibFormat step-by-step by clicking on each of the links of the page: these tools are mostly interdependent and require a global view of BibFormat. Users also cannot just try each of the tools, as they mostly provide no feedback on the result of the modifications. Another issue is that it is mandatory for users to learn the EL language, which has been designed for, and is used only in, BibFormat. For all of these reasons, BibFormat is totally inadequate for novice users.

The EL language also makes BibFormat inadequate for intermediate users: in most situations, formats are modified only occasionally, so it is very difficult for users to remember EL from one use to the next. They always have to go back to the documentation or read sample code before editing a format. Moreover, even though the EL language has been designed for the sole use of BibFormat, it fails to help users define formats easily. One of the biggest complaints about this language is that it requires quoting every piece of output text: for example, to output Hello World, users must write "Hello World". In that basic case, this looks pretty simple. In cases where " (double-quote character) is part of the output, every double-quote must be escaped with a \ (backslash character). As the double-quote is frequent in HTML, it is very difficult not to make an error. When there is an error the output is unpredictable, and users have to search manually for the missing or additional double-quote in their code.
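To make the quoting burden concrete, here is a small Python sketch that turns a piece of HTML into a quoted literal of the kind EL expects. It only approximates EL syntax: the function name to_el_literal is hypothetical, and the escape character is assumed to be a backslash, as the code visible in the behaviour editor suggests.

```python
def to_el_literal(text, escape="\\"):
    """Wrap `text` in double quotes and escape every embedded double quote.

    The default escape character (a backslash) is an assumption of this sketch.
    """
    return '"' + text.replace('"', escape + '"') + '"'


html = '<a href="http://example.org" class="link">'

# Plain text needs only the surrounding quotes...
print(to_el_literal("Hello World"))  # "Hello World"

# ...but a single HTML tag with two attributes already needs four escapes.
print(to_el_literal(html))  # "<a href=\"http://example.org\" class=\"link\">"
print(html.count('"'))      # 4
```

Forgetting any one of those escapes ends the literal early, which is the kind of error the paragraph above describes as hard to track down by hand.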


This problem occurs mainly because EL is more like a programming language than a formatting language, such that output text is not a first-class citizen of EL although it should be.

There are also other usability issues:

• Layout of the behaviours editor follows no traditional design pattern

• No tool to help users

• Frustrating experience in general

• Control and link labels are often misleading

• No feedback on the user’s actions

• No use of the CDS Invenio interface guidelines

One thing worth noting is the problem in the behaviours editor: although the mental model of assigning a format to a record through a rule-based system seemed totally natural to most users, the interaction model provided by the editor is flawed because of its layout and the fact that the rules are coded in EL. It would be possible to keep the same mental model and simply build an adequate interaction model for it.

There are usually many formats (more than one hundred on the production server at CERN), and as formats can depend on each other, it becomes difficult to track the dependencies when a format has to be modified. BibFormat provides almost no tool to help users manage these formats. In fact even the format and behaviour editors are simply input fields for the EL code.
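The kind of dependency tracking that is missing could look like the following Python sketch. It assumes that a format calls another one with format("NAME"), as in the EL code of DEFAULT_HTML_BRIEF, and that formats are available as a mapping from name to EL code; the function names and the sample data are hypothetical.

```python
import re

# Matches calls such as format("_DEFAULT_TITLE") in EL code.
FORMAT_CALL = re.compile(r'format\("([^"]+)"\)')


def dependencies(el_code):
    """Names of the formats called from a piece of EL code."""
    return set(FORMAT_CALL.findall(el_code))


def find_cycle(formats):
    """Return one circular chain of format names (first name repeated at
    the end), or None if there is none. `formats` maps name -> EL code."""
    graph = {name: dependencies(code) for name, code in formats.items()}
    visiting, done = [], set()

    def visit(name):
        if name in done:
            return None
        if name in visiting:  # reached a format that is still being explored
            return visiting[visiting.index(name):] + [name]
        visiting.append(name)
        for dep in sorted(graph.get(name, ())):
            cycle = visit(dep)
            if cycle:
                return cycle
        visiting.pop()
        done.add(name)
        return None

    for name in sorted(graph):
        cycle = visit(name)
        if cycle:
            return cycle
    return None


sample = {"A": 'format("B")', "B": 'format("C")', "C": 'format("A")'}
print(find_cycle(sample))  # ['A', 'B', 'C', 'A']
```

The same graph could also be inverted to list which formats use a given one, which is the question a user needs answered before modifying or deleting a format.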

Users of the previous BibFormat expressed a lot of frustration when using the web interface: some actions, like opening the editor of a format, would open a new window, but the window would stay behind all others, such that it was impossible to know that the action had been executed, or difficult to figure out which window to bring back to the front. Some users also lost all of their work when trying to save their modifications.

2.2.2 Competitors

There are many applications that can handle bibliographic references. I will review some of them: those suggested to me by my colleagues and those that I found useful for this project. The focus is on their ability to output different formats and on how much they allow customization of the output format. Some of the applications presented here are not direct competitors of CDS Invenio, but they are considered as long as they provide functionalities similar to BibFormat.

Direct Competitors

DSpace (MIT Libraries and Hewlett-Packard Labs)

From the authors’ website[13] “DSpace is a groundbreaking digital repository system that captures, stores, indexes, preserves, and redistributes an organization’s research data”.


This application offers pretty much the same general features as CDS Invenio. Formatting is defined using a configuration file. The system administrator specifies textually in the configuration file which metadata appears, and in which order. For example, the line dc.title, date.issued(date), identifier.uri(link), description.* will show the title, the issue date (rendered as a date), the identifier (rendered as a link) and the DC description metadata. Whenever one of those fields does not exist for a record, it is not displayed. Labels for each of these fields are defined in another global UI dictionary file and automatically added. A third configuration file is used to map the field name to the actual meta-data of the record.

The displayed fields can also be customized for individual collections. This is done by overriding the default line of the configuration file with a new similar line, which specifies the name of the display “style”. Then all collections using this style must be listed textually in another line of the configuration file.

This method of customizing the formatting does not allow changing the style and layout of the page. Layout modifications must be made by customizing the JSP (Java Server Pages) files that produce the HTML code of each page. The colours, fonts, etc. are mostly described in CSS (Cascading Style Sheets) files. The system supports different JSP files (for internationalization), but does not allow different styles (the same style applies across all collections). Once the style has been customized, DSpace must be rebuilt and reinstalled to take the style into account.
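The declarative display line described above (dc.title, date.issued(date), identifier.uri(link), description.*) can be illustrated with a short Python sketch. This is not DSpace code: the function names, the record representation and the default rendering name "text" are assumptions made for the example; only the display line itself comes from the text.

```python
import fnmatch
import re

# One item of the display line: a field name, optionally followed by a
# rendering hint in parentheses, e.g. "date.issued(date)".
ITEM = re.compile(r'^([\w.*]+)(?:\((\w+)\))?$')


def parse_display_line(line):
    """Split a display line into (field pattern, rendering) pairs.
    The rendering defaults to "text" (an assumption of this sketch)."""
    fields = []
    for item in line.split(","):
        match = ITEM.match(item.strip())
        if not match:
            raise ValueError("cannot parse %r" % item)
        fields.append((match.group(1), match.group(2) or "text"))
    return fields


def display(record, line):
    """Return (field, rendering, value) triples in display order.
    Fields missing from the record are simply skipped."""
    shown = []
    for pattern, rendering in parse_display_line(line):
        for key in sorted(record):
            if fnmatch.fnmatchcase(key, pattern):  # handles "description.*"
                shown.append((key, rendering, record[key]))
    return shown


LINE = "dc.title, date.issued(date), identifier.uri(link), description.*"
record = {
    "dc.title": "A thesis",
    "identifier.uri": "http://hdl.handle.net/1721.1/12760",
    "description.abstract": "Demonstration data",
}
for row in display(record, LINE):
    print(row)  # three rows; date.issued is absent from the record and skipped
```

The sketch shows why this approach is easy for an administrator (one line decides content and order) and also why it cannot express layout: nothing in the line says how the fields are arranged on the page.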

Figure 2.7 shows how a DSpace record is displayed to the user.

[Screenshot: DSpace at MIT item page for the thesis “A 12-bit 500 MHz GaAs MESFET digital-to-analog converter with p+ ohmic contact isolation”, listing title, authors, department, keywords, issue date, publisher, description, URI and collections.]

Figure 2.7: DSpace displaying a formatted record

EPrints (EPrints.org Community)

EPrints[14] is another competitor of CDS Invenio. There does not seem to be a way to customize bibliographic notices other than programmatically (and this is not documented). Search results formatting can also be customized programmatically.


[Screenshot: EPrints demonstration record “Observations on the Domestic Sheep”, showing the citation, full-text link, abstract, item type, subjects, ID code and deposit information.]

Figure 2.8: EPrints displaying a formatted record

Fedora (Cornell University Information Science and University of Virginia Library)

Fedora[15] is a very powerful content management system, whose scope is much larger than the simple dissemination of documents. One could see Fedora as a technology on which systems such as a digital library could be built. The very abstract “digital objects” Fedora manages support different views and complex business logic. The formatting in Fedora can be defined using XSLT, by accessing the internal XML structure of objects. Fedora hence relies on a standard well known to webmasters. Fedora developers do not need to create, maintain and document a custom language, and webmasters do not have to learn a new language to format the records.

Applications with similar goals

Pybliographer From the developer website[16]: “Pybliographer is a tool for managing bibliographic databases. It can be used for searching, editing, reformatting, etc”. It is designed to be used as a standalone application by a single user. It is typically used for storing temporary bibliographic references when writing reports. Figure 2.9 shows the main window of Pybliographer. Pybliographer provides six default output formats: HTML, LaTeX, Raw, Text, Textau and Textnum. There is no GUI for editing new formats. However, new formats can be added by dropping format files in a dedicated folder. The formats must be defined using an XML encoding. There is still no documentation on how to write new formats. The developers plan to replace the current formats with an XSL-based mechanism.

Biblioscape (CG Information) From the company website, “the product family is designed to help researchers collect and manage bibliographic data and notes, as well as generate citations and bibliographies for publication”. It is a complete suite of products whose functionality is very similar to that of CDS Invenio. The desktop version of the suite allows the output of formats to be edited, as seen in Figure 2.10. The interface relies on a list of elements (a sequence of predefined bibliographic fields) to specify a format. An edit button lets the user refine the format of the selected element. A preview of the format is displayed at the bottom of the window. The preview should update live, but an update button is provided in case it does not. While editing a format seems simple, the user manual of the application still provides a support email address to request custom formats. Biblioscape offers hundreds of predefined formats.

Figure 2.9: Pybliographer main window

Figure 2.10: Biblioscape, edition of formats.

BookEnds (Sonny Software) According to the company website[17], BookEnds “offers a powerful and flexible means of saving, retrieving, and formatting references for bibliographies or footnotes”. While its primary purpose is to be used as a desktop application, it can also be used as a web server. The application allows formats to be managed and edited. Writing a format using the integrated editor is much like writing a script. Bibliographic fields are represented using reserved characters. Some controls make it possible to customize particular fields, like the authors and editors. The manual of this editor contains more than 10 pages, and creating styles requires knowledge of the scripting language. BookEnds provides more than 150 default formats. New formats can be created by reusing the existing base.


(a) List of formats (b) Format editor

Figure 2.11: Bookends

2.2.3 Synthesis

In Table 2.5 we review the strengths and weaknesses of the applications we have seen. These applications provide either a scripting language to define formats or a GUI that does basic formatting. Given the user population of BibFormat, we should provide both. Format management tools are almost non-existent, and perhaps useless, as these formats have poor expressiveness. For most of these applications the documentation is scarce and lacks sample files.

Table 2.5: Synthesis of competitive analysis. The scale goes from 1 (weak) to 6 (excellent).

Criteria: ease of learning; formatting language ease of writing; formatting language ease of reading; formatting language expressiveness; management of existing formats; assigning formats to collections; GUI editor; documentation / help.

DSpace: scores 4, 6, 6, 1, 5, 2, 3 and one N/A. Remarks: need to learn keywords; short English keywords (twice); doesn't allow styles; defined in only one text file, no tool; time-consuming; exclusively text editor; basic, no manual.

Eprints: three "?", scores 1, 1 and three N/A. Remarks: no documentation.

Fedora: scores 5, 3, 4, 6, 4, 6, 4 and one N/A. Remarks: lots of tutorials, XSL standard; XSL is not especially easy (twice); object-oriented; defined in individual XSL files, no tool; exclusively text editor; almost no doc, but XSL doc exists.

Pybliographer: scores 2, 4, 5, 4, 4, 2 and two N/A. Remarks: little documentation; easier than XSL, but still XML (twice); no branches, loops, etc.; individual XML files, no tool; exclusively text editor; almost no doc.

Biblioscape: scores 4, 6, 6, 5 and four N/A. Remarks: no branches, loops, etc.; GUI to add, edit, remove formats.

BookEnds: scores 1, 2, 2, 5, 6, 2, 5 and one N/A. Remarks: lots of special characters (twice); GUI to add, edit, remove formats; GUI almost of no help.

PHP BibFormat: scores 2, 2, 2, 6, 2, 3, 1, 5. Remarks: difficult; lots of text to escape (twice); tools difficult to use, dependencies between formats; basic text input field.

Chapter 3

Design

3.1 Specifications

As a result of the analysis, the following specifications were formalized. The object table (Table 3.1) lists the objects and attributes with which users will interact. The names of the objects were carefully chosen with the users so that they convey their meaning.

| Object | Attribute | Description |
|---|---|---|
| Output Format | name | Name of the output format |
| | description | A description of what the format is for |
| | content-type | The MIME content-type of the output |
| | short identifier code | A unique 6 chars long code used to identify the output format |
| | assignment rules: condition | A condition on the value of a field of formatted record |
| | assignment rules: format template | The template used for a matching condition |
| | dependencies | Links to format templates used by this output format |
| Format Template | name | Name of the template |
| | description | A description of what the template does |
| | text, images, etc. | Static elements of the formatted HTML code output |
| | format elements | Dynamic element that changes contextually to record, language, etc. |
| | dependencies | Links to output format that call this template, and format elements used by this template |
| Format Element | name | Name of the element |
| | description | A description of what the template does |
| | parameters | A list of configurable options for the element (varies depending on element) |
| Knowledge Base | name | Name of the knowledge base |
| | description | A description of what the knowledge base is used for |
| | mappings: from | Key of the mapping |
| | mappings: to | Value of the mapping |
| | dependencies | Links to format element using this knowledge base |

Table 3.1: BibFormat object table

Output formats are categories of formatting styles. They can be selected by end-users in the web interface to change how records are displayed. An output format applies to any kind of record: for example, the “detailed HTML” output format can be chosen for “publications” as well as for “pictures” documents. For admin-users, an output format is simply a set of rules that redirect to a format template matching the type of record being formatted. The output format object more or less corresponds to the behavior of the old BibFormat module.

The format templates simply correspond to the formats of the old BibFormat module, and define how to format a record. They can be modified by all of our personas.

A new concept has been introduced: the format element. It corresponds to a dynamic brick that can be added to a format template. The value of a format element varies according to the record being formatted. For example, the authors format element prints the authors of the record, and the title element prints its title. The object map (Figure 3.1) shows the relationships between these objects.

[Diagram: BibFormat, linked one-to-many to the lists of Format Templates, Output Formats and Knowledge Bases; below them the Format Element, Static Text, Rule and Mapping objects; the diagram also shows Record, Marc field and Dependency. Legend: an unlabeled link means "uses".]

Figure 3.1: BibFormat object map
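
The way an output format redirects to a format template can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the class names, the `tag` and `value` attributes of a rule, the record layout (a dictionary of field values) and the `htmldt` code are hypothetical and are not taken from the BibFormat code. The default template is borrowed from the “Set default format template” action of the administration interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Rule:
    """Assignment rule: a condition on a record field plus the template to use."""
    tag: str       # field of the record tested by the condition (assumed layout)
    value: str     # value the field must contain for the rule to match
    template: str  # name of the format template used when the rule matches


@dataclass
class OutputFormat:
    code: str              # short identifier code (hypothetical value below)
    content_type: str      # MIME content-type of the output
    default_template: str  # used when no rule matches
    rules: List[Rule] = field(default_factory=list)

    def resolve_template(self, record: Dict[str, List[str]]) -> str:
        """Return the template of the first matching rule, else the default."""
        for rule in self.rules:
            if rule.value in record.get(rule.tag, []):
                return rule.template
        return self.default_template


detailed_html = OutputFormat(
    code="htmldt",
    content_type="text/html",
    default_template="Default HTML detailed",
    rules=[Rule("980", "PICTURE", "Picture HTML detailed")],
)

print(detailed_html.resolve_template({"980": ["PICTURE"]}))  # Picture HTML detailed
print(detailed_html.resolve_template({"980": ["ARTICLE"]}))  # Default HTML detailed
```

Rules are tried in order, which is why the interface structure offers a “Reorder rules” action: in this sketch the first matching rule wins.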


Figure 3.2 summarizes the structure of the BibFormat administration interface that lets users interact with these objects. It is a mix of action-based and task-based structures.

Main UI
  List of Output Formats
    Add Output Format: Create output format, open newly created format and redirect to "Edit attributes"
    Delete Output Format
    Validate Output Format: Show errors if any
    Sort Output Formats
    Open Output Format
      Edit attributes (Localized names, description, content-type)
      Set default format template
      Add new rule
      Delete rule
      Reorder rules
      Check dependencies
  List of Format Templates
    Add Format Template: Create format template, propose to make a copy of existing one, open newly created format and redirect to "Edit attributes"
    Delete Format Template
    Validate Format Template: Show errors if any
    Sort Format Templates
    Open Format Template
      Edit attributes (Name, description)
      Add text, images, etc.
      Add format elements (such as title, authors, etc.)
      Modify fonts, colors, layout of text, images, elements
      Preview
      Check dependencies
  List of Knowledge Bases
    Add Knowledge Base: Create output format, open newly created format and redirect to "Edit attributes"
    Delete Knowledge Base
    Validate Knowledge Base: Show errors if any
    Sort Knowledge Bases
    Open Knowledge Base
      Edit attributes (Name, description)
      Add new mapping (from → to)
      Edit mapping
      Delete mapping
      Sort mappings
      Check dependencies
  Format Elements Reference (doc)
    Description of the element
    Options of the element
    Preview element
    Check dependencies
  Administration Guide (doc)
    Quick Introduction
    Tutorial
    Guide

Figure 3.2: BibFormat administration interface structure

Once the specifications were done, an email was sent to all customers subscribed to the CDS Invenio mailing list, calling for feature requests and comments based on the textual description of the specifications. This was a way to validate our model, inform the users of an upcoming version of BibFormat and get them somewhat involved in the design process. Feedback was globally positive. Some concerns were raised regarding the possibility of using previous formats in the new BibFormat and the support for internationalization. Both of these concerns were addressed in the implementation.


3.2 Prototypes

Early paper prototypes were sketched as a means to get a feeling for the possible layouts. Some prototypes were designed even before the analysis was completed, so they did not fully comply with the specifications. Nevertheless, they provided an interesting basis for the design of the BibFormat user interface.

One of the prototypes shown here (Figure 3.3) is the interface for the management of format templates (called “Stylesheets” in the prototype). It included the possibility to create a template, delete selected templates and edit a template by clicking on its “edit” link. Format templates could be analyzed and checked via a button at the bottom of the page. A column in the list showed the languages into which a format template had been translated.

Figure 3.3: Prototype of the format templates management page

The interfaces in Figures 3.4(a) and 3.4(b) allow a format template to be assigned to a set of records. A condition needs to be specified on a field of the record. In a task-based system, this assignment would be done right after the template has been created. This case was studied in Figure 3.4(b). However, as seen in the task analysis, this task is more likely to be performed by a user dealing with the output formats, in a separate interface. Figure 3.4(a) shows the same assignment from this point of view.

Figure 3.5 shows a prototype of the WYSIWYG format template editor, as conceived earlier in the design phase. The idea was to mimic well-known word processors for the editing of format templates. A button would let a user insert format elements into the template. When clicked, an element would expand to let the user modify options, such as a prefix or suffix text that would be printed only if the format element was not empty for the formatted record. Later in the design phase the prototype was reconsidered in order to better fit in the project timeframe (Figure 3.8). The WYSIWYG editor was replaced by a code editor and a preview panel. It was also an occasion to give more importance to the format elements documentation, as this was essential to the editing of format templates.

(a) Assignment through output formats (b) Assignment through format templates

Figure 3.4: Prototype of the management of the assignment of format templates to records

Figure 3.5: Prototype of the format template editor

Based on the paper prototypes, an interactive HTML Low-Fi prototype was built. It made it possible to test the visual integration with the CDS Invenio administration pages and to check which controls could fit on a single screen. It was also used to get the interface approved by the CDS Invenio developers.

Figure 3.6 is a Low-Fi version of the paper prototype of Figure 3.3, and Figure 3.7 shows two different versions of the Low-Fi prototype for the editing of knowledge bases.


Manage Stylesheets

From here you can create, edit or delete stylesheets available for collections. More advanced users can edit the subformats.

| Name | Description | Languages | Action [?] |
|---|---|---|---|
| Standard detailed html | Output a standard detailed record with title, year, authors, abstracts, etc. | de, en, fr, it | Edit, Preview, Delete |
| Standard brief html | Output a standard brief record with title, year, authors | de, en, fr, it | Edit, Preview, Delete |
| Image detailed html | Output a standard detailed record with image, year, author, description | de, en, fr, it | Edit, Preview, Delete |
| Image brief html | Output a standard brief record with image and description | de, en, fr, it | Edit, Preview, Delete |
| Detailed BibTex | Output a detail record in BibTex format | de, en, fr, it | Edit, Preview, Delete |

Show dependencies of selected stylesheets | Add new stylesheet

Figure 3.6: Low-Fi prototype of the management of templates

Manage Knowledge Base BibTex

Here you can add new mappings to the BibTex base and change the base attributes.

| Map From | To | Action [?] |
|---|---|---|
| ARTICLE | article | Edit, Delete |
| BOOK | book | Edit, Delete |
| PICTURE | picture | Edit, Delete |
| POETRY | poetry | Edit, Delete |
| PREPRINT | preprint | Edit, Delete |

New Mapping [?]: Map From, Map To, Add New Mapping

Base Attributes [?]: Name: BibTex. Description: Mapping between the 980 field and BibTeX entry types.

(a) (b)

Figure 3.7: Low-Fi prototypes of the knowledge base editor
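
A knowledge base as shown in these prototypes is essentially a named set of "from → to" mappings. The following Python sketch illustrates that idea with the BibTex base of the figure. The class, its `translate` method and the fallback behavior for unmapped values are assumptions made for the example, not a description of the actual implementation.

```python
class KnowledgeBase:
    """A named set of 'from -> to' mappings (illustrative sketch only)."""

    def __init__(self, name, description="", mappings=None):
        self.name = name
        self.description = description
        self.mappings = dict(mappings or {})

    def translate(self, key, default=None):
        """Return the value mapped to `key`.

        Assumption of this sketch: an unmapped key is returned unchanged
        unless an explicit `default` is given.
        """
        if key in self.mappings:
            return self.mappings[key]
        return key if default is None else default


bibtex = KnowledgeBase(
    "BibTex",
    "Mapping between the 980 field and BibTeX entry types",
    {
        "ARTICLE": "article",
        "BOOK": "book",
        "PICTURE": "picture",
        "POETRY": "poetry",
        "PREPRINT": "preprint",
    },
)

print(bibtex.translate("ARTICLE"))  # article
```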

[Prototype screen: a "Code" panel with a Save button, an "Elements" list in which each element is shown with its description and which has a Search field, and a "Preview" panel with Language (English), Content-Type (text/html) and Search Pattern (recid:74) controls.]

Figure 3.8: Revised prototype of the format template editor

Although unrelated to UI prototyping, it is interesting to discuss the candidate formatting languages that were prototyped. Even if our personas Patrick and Marisa were not meant to see this language, it was in the interest of Alain to come up with a clear and elegant way to define formatting.


The prototypes below show how to output a bold title, followed by the author’s name linked to his website (for example), and on the next line the year and publication name of the current record.

Language 1

$format("title"), $link($AUTHOR_NAME)
$260C, $kb($999_t).

This language mixes HTML and custom elements starting with a $ sign. Each custom element is replaced by its value during the formatting process. Fields of the record can be accessed either by their MARC tags or by name. Being able to access fields by name instead of by MARC tag introduces an abstraction layer which has several benefits: it makes formats easier to understand, allows people unfamiliar with MARC to write formats, and dissociates the meaning of a field from its code (which allows users to redefine a MARC tag for other purposes without having to change all formats).
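
The abstraction layer described above can be sketched in a few lines of Python. This is a minimal illustration, not the prototyped language itself: the `FIELD_NAMES` table, the `YEAR` name, the `expand` function and the flat record dictionary are all assumptions made for the example, and functions such as `$format(...)` or `$kb(...)` are deliberately not handled.

```python
import re

# Hypothetical table mapping field names to MARC tags. Redefining a tag
# only requires changing this table, not the formats that use the names.
FIELD_NAMES = {"AUTHOR_NAME": "100_a", "YEAR": "260C"}

_REFERENCE = re.compile(r"\$(\w+)")


def expand(fmt, record, names=FIELD_NAMES):
    """Replace each $NAME or $TAG reference in `fmt` by the record value."""
    def value(match):
        ref = match.group(1)
        tag = names.get(ref, ref)  # a known name resolves to its tag
        return record.get(tag, "")
    return _REFERENCE.sub(value, fmt)


record = {"100_a": "Draut, Z.", "260C": "1999"}
print(expand("$AUTHOR_NAME, $260C", record))  # Draut, Z., 1999
```

Both styles of reference reach the same data, but only the named one survives a change of MARC tag: passing `names={"AUTHOR_NAME": "700_a"}` is enough to redirect every format that uses `$AUTHOR_NAME`.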

Language 2

$format("title"), $link($100_a)
$260C, $kb($999_t).

This language is similar to language 1, except that it only allows the user to refer to a field with its MARC tag.

Language 3

,
.

This language is similar to the PHP programming language: special tags enclose Python code.

Language 4

print ""; format("title") print ", %s
" % link($100_a) print "%s, %s." % ($260C, kb($999_t))

The fourth language is completely standard Python code.

None of these languages was satisfactory, as they either required learning a custom language or were not suitable for formatting. The software architect of CDS Invenio came up with a better idea than the prototyped languages: use HTML code exclusively. The examples shown above translate to:

Language 5

,
,

In this way we have a language that is valid HTML and that can be generated by any HTML editor. The dynamic elements (format elements) are tags that start with BFE and are replaced at runtime by values of the record. A format element can take parameters as input in order to modify its behavior. For example, BFE_AUTHOR takes print_link as a parameter, which determines whether a link to the author’s website is printed by the element. These format elements are defined in individual Python files provided by the programmers. See Section 3.4 for more information on the different files involved in the formatting process.
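
The replacement of BFE tags at runtime can be illustrated with a short Python sketch. Everything in it is an assumption made for the example: the element functions and their signatures, the record layout, the sample template and the regular expressions are hypothetical, and the real formatting engine is described in Section 3.4. The `prefix` and `suffix` handling follows the option discussed for the prototypes, where the surrounding text is printed only if the element is not empty.

```python
import re
from html import escape


# Hypothetical format elements: plain functions taking the record first.
def bfe_title(record):
    return escape(record.get("title", ""))


def bfe_author(record, print_link="no"):
    name = escape(record.get("author", ""))
    url = record.get("author_url")
    if name and url and print_link == "yes":
        return '<a href="%s">%s</a>' % (escape(url), name)
    return name


ELEMENTS = {"BFE_TITLE": bfe_title, "BFE_AUTHOR": bfe_author}

# Matches self-closing tags such as <BFE_AUTHOR print_link="yes" />
_TAG = re.compile(r'<(BFE_\w+)((?:\s+\w+\s*=\s*"[^"]*")*)\s*/>', re.IGNORECASE)
_ATTRIBUTE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def format_record(template, record):
    """Replace every known BFE tag in `template` by the element's output."""
    def render(match):
        element = ELEMENTS.get(match.group(1).upper())
        if element is None:
            return ""  # unknown element: print nothing in this sketch
        params = dict(_ATTRIBUTE.findall(match.group(2)))
        prefix = params.pop("prefix", "")
        suffix = params.pop("suffix", "")
        value = element(record, **params)
        # prefix and suffix are printed only when the element is not empty
        return prefix + value + suffix if value else ""
    return _TAG.sub(render, template)


template = '<b><BFE_TITLE /></b>, <BFE_AUTHOR print_link="yes" />'
record = {
    "title": "Observations on the Domestic Sheep",
    "author": "Draut, Z.",
    "author_url": "http://example.org/draut",
}
print(format_record(template, record))
```

Because the template remains ordinary markup, anything outside the BFE tags passes through untouched, which is what makes it possible to produce templates with a standard HTML editor.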

3.3 Final UI Design

The implemented web-based administration user interface of BibFormat is shown in the screenshots of this section. The navigation between these pages follows the structure specified in Section 3.1. You can walk through these interfaces (each labeled with a number in its top left corner) by following the orange links printed on the screenshots.

Format templates related interfaces The user interfaces for the lists of format templates, output formats and knowledge bases are all similar, and very close to the prototype of Section 3.2. Therefore only the format templates list is shown (Figure 3.10). The format templates and output formats lists show the status of each format right next to it, so that any problem with a format is found immediately. The format templates list has an additional button, “Check Format Templates Extensively”, which finds errors in the templates that could not be detected by a quick check. When an error is found, the status switches to a red “Not Ok” link, which leads to a description of the problem. Errors can happen, for example, when an output format refers to a non-existing format template, or when a format template uses a format element that fails to produce an output. Errors should mainly occur when configuration files are modified manually without using the web interface.
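
The kind of quick check behind the status column can be sketched as follows. This is an illustration under assumptions, not the implemented validation: the function names, the error messages and the way templates and elements are looked up are hypothetical. The sketch covers only static checks (missing templates and unknown elements); actually running each element to see whether it produces an output, as an extensive check would, is left out.

```python
import re

_ELEMENT_TAG = re.compile(r"<(BFE_\w+)", re.IGNORECASE)


def check_output_format(rule_templates, existing_templates):
    """Errors for rules that refer to a format template that does not exist."""
    return ["Format template '%s' does not exist" % name
            for name in rule_templates if name not in existing_templates]


def check_format_template(code, existing_elements):
    """Errors for BFE tags in `code` that match no known format element."""
    used = sorted({name.upper() for name in _ELEMENT_TAG.findall(code)})
    return ["Format element '%s' does not exist" % name
            for name in used if name not in existing_elements]


def status(errors):
    """Label shown next to a format in the list."""
    return "Not Ok" if errors else "OK"


errors = check_format_template("<b><BFE_TITLE /></b> <BFE_FOO />", {"BFE_TITLE"})
print(status(errors), errors)
```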

A menu (standard in CDS Invenio) that allows the user to quickly switch to other sections of the administration without having to go back to the main menu is placed at the top of the page.

[Screenshot of the BibFormat Admin page, with the entries “Manage Format Templates” (define how to format a record), “Manage Output Formats” (define which template is applied to which record for a given output), “Manage Knowledge Bases” (define mappings of values, for standardizing records or declaring often used values), “Format Elements Documentation” and “BibFormat Admin Guide”.]

Figure 3.9: BibFormat main menu

[Screenshot of the “Manage Format Templates” page: a table with the columns Name, Description, Status, Last Modification Date and Action, and the buttons “Check Format Templates Extensively” and “Add New Format Template”. Annotation: screens 2 & 3 are very similar to 1.]

Figure 3.10: Format templates list

The format template editor was finally simplified due to time constraints and technical requirements. The fully WYSIWYG editor was replaced by a code editor and a preview panel. Still, the underlying design allows the implementation to be upgraded to the initial specifications in a later release of the module. The editor also contains a list of format elements that can be included in the format template. This list is now a more important element of the interface than in the prototypes, as it is an essential part of the editing of formats. Users can search the list of format elements by keyword, move their mouse over an element to see its description, and click on an element to include it in their code. This allows users to concentrate on editing the template without having to deal with another window containing the documentation of elements. Finally, experienced users can close the format elements reference. The final version includes a toolbar to insert HTML tags into the code of the template and to get the documentation of the format element selected in the code. A menu (Figure 3.12) at the top of the editor allows the user to [...] and go back to the format templates list (go to Figure 3.10).

[Screenshot of the format template editor, with three labeled areas: code editor, preview panel and format elements reference.]

Figure 3.11: Format template editor

[Screenshot of the editor menu, with the entries “Close Editor”, “Template Editor”, “Modify Template Attributes” and “Check Dependencies”.]

Figure 3.12: Top menu of the format template editor

Prints field 65027a of the record.

Prints field 246__% of the record.

Prints a list of addresses linked to this report.

Save Changes HTML Affiliation display. Preview Content-type (MIME): text/html Language: English Search Pattern: Reload Preview Prints the list of authors of a record. A T L A N T I S An entry point to the BibFormat BFX engine, when used as an element. Formats the admin :: account :: messages :: baskets :: alerts :: groups :: approvals :: administration :: logout I N S T I T U T E O F record according to a template. For further details, please read the documentation. F I C T I V E S C I E N C E Search Submit Personalize Help Prints a full BibTeX notice. 'width' must be bigger than or equal to 30. This format Home > No Document Found element is an example of large element, which does all the formatting by itself.

No Document Found Prints a list of records citing this record.

Prints the collection identifier. Translate using given knowledge base.

Prints field 980__% of the record. Atlantis Institute of Fictive Science :: Search :: Submit :: Personalize :: Help This site is also available in the following languages: Powered by CDS Invenio v0.90.1.20060822 !"#$%&'() Català !esky Deutsch !""#$%&' English Español Français Maintained by [email protected] Italiano 日本語 Norsk/Bokmål Polski Português *+''(), Slovensky Prints contact information for the record. Last updated: $Date: 2006/09/13 18:32:46 $ Svenska -(&%./'0(% Get the record creation date.

Prints the imprint publication date as HTML.

Atlantis Institute of Fictive Science :: Search :: Submit :: Personalize :: Help This site is also available in the following languages: Powered by CDS Invenio v0.90.1.20060822 !"#$%&'() Català !esky Deutsch !""#$%&' English Español Français Italiano 日本語 Norsk/Bokmål Polski Português *+''(), Maintained by [email protected] Slovensky Svenska -(&%./'0(% Last updated: $Date: 2006/09/13 18:32:46 $ 3.3. FINAL UI DESIGN

• edit the format template code (go to figure 3.11 from other parts of the interface). • edit the attributes of the templates (go to figure 3.13). • see the dependencies of the template on other objects (go to figure 3.13). Similar menus are also available in the knowledge base editor and output format editor. The edition of the attributes is done in a very basic interface (Figure 3.13).

[Figure 3.13: Attributes of a format template (similar to output format and knowledge base attributes). When adding a new format template, a duplicate of an existing one can be created.]

It is not going to be used very often, except for the creation of the format template, but it is essential to be able to set these attributes in order to make the retrieval of format templates possible. When creating a format template, this interface is shown first, and gives the possibility to make a duplicate of an existing format template.

Output formats and knowledge bases have a similar interface. The differences are that the interface for format template attributes allows users to copy an existing template when creating a format template, while the other interfaces do not, and that output format attributes contain more fields for the name of the output format, one for each language supported in CDS Invenio, in order to display that name in the main search interface of CDS Invenio.

The dependencies page (Figure 3.14) shows the list of usages of the format template in output formats, and the list of format elements and associated MARC tags used in the format template.

Finally, a Dreamweaver floating panel was implemented as a proof of concept of the possibility to edit format templates with HTML editors (Figure 3.15). It allows users to insert format elements into the HTML code and provides access to the format elements documentation. Inserted elements are shown as blue bricks marked bfe.


[Figure 3.14: Dependencies of a format template (similar to output format and knowledge base dependencies). For the Default HTML brief template, the page lists the output formats that use it, the format elements it calls with their MARC tags (for example bfe_authors with 270__p, 700__a and 100__a), and all tags used.]

Figure 3.15: Integration of BibFormat within Dreamweaver

Output Formats related interfaces When users click on an output format in the list of output formats, the rules of the output format are displayed (Figure 3.16). They specify which templates must be used when this output format is selected to format a record. Depending on user-specified conditions on the values of the fields of the record, one of the templates will be chosen for the formatting. Users can reorder the rules (rules are evaluated from top to bottom), add rules and delete rules.

[Figure 3.16: Output format rules. The HB output format uses the template Picture HTML brief if field 980.a is equal to PICTURE, the template Periodical HTML Brief if field 980.a is equal to PERIODICAL, and a default template otherwise.]

[Figure 3.17: Knowledge base mappings]

Knowledge Bases related interfaces A knowledge base defines mappings of values that need to be quickly and easily editable. Figure 3.17 shows the editing of the mappings of a knowledge base.

When a knowledge base is opened, the text insertion cursor is placed right in the Map From field, so that users can directly start typing a value. The tabulator key lets users enter the To value of the mapping, and the Enter key adds the mapping to the list. The fields are then erased and the insertion cursor is placed back in the Map From field for a new addition. The mappings can be edited or deleted right from this page.

Format elements related interfaces Format elements are provided to users as the bricks to be used in their format templates, and are therefore not modifiable. Still, users need to know what the elements are for. A dynamically generated reference documentation is available not only right from the template editor, but also from a dedicated web page that displays more information and offers more features than the short reference of the template editor. A sample notice of an element is shown in figure 3.18. Users can test a format element with custom parameters, see where the element is used (similar to other dependencies pages) and validate the code of the element (check that the format element has no error).

[Figure 3.18: One format element notice of the format elements documentation]

Migration related interface As stated in the specifications, the new BibFormat is not compatible with the configuration files of the old BibFormat. This means that all formats have to be rewritten. However, a compatibility layer has been implemented so that, during the transition to the new formats, customers can still use their old formats. To help them adopt the new system, a migration kit assistant has been made available (Figure 3.19). It can migrate the behaviors, knowledge bases and formats to the new system (although it cannot translate formats into the new formatting language, it can help create the entries in the new BibFormat).

[Figure 3.19: Migration kit steps. Steps in suggested order, each with a status: 1. Migrate knowledge bases, 2. Migrate behaviours, 3. Migrate formats.]

3.4 Formatting Engine Design

Along with the user interface, a new formatting engine had to be designed. In order to support the different configuration levels (output formats, format templates and format elements) suitable for our user population, a layered system has been architected (Figure 3.20).

[Figure 3.20: BibFormat layered architecture. BibFormat contains three layers, from top to bottom: output formats, format templates and format elements.]

Output formats can be considered as the entry point of the BibFormat workflow. Whenever a record has to be formatted, BibFormat is at least given a record ID (to fetch record meta-data from the database) and an output format short identifier code. Output formats simply define which format template has to be used for formatting the given record. Output formats are configured either from the user interface presented in section 3.3, or by directly editing the code of the output format. A sample output format code is shown below:

    tag 980.a:
    PICTURE --- Picture_HTML_detailed.bft
    PERIODICAL --- Periodical_HTML_detailed.bft

    default: Default_HTML_detailed.bft

In this example, if tag 980.a is equal to PICTURE (ignoring case), then the Picture_HTML_detailed template will be used. Otherwise, if the same field is equal to PERIODICAL, the Periodical_HTML_detailed template will be applied. In all other cases, the default template will apply.

The syntax is the following: for each field of the record on which we need to put a condition, write

    tag <MARC tag>:

followed by one or more lines giving the value of the condition and the format template that must be applied if the condition is true, separated by three dashes:

    <value> --- <format template filename>.bft


Note that the value can be expressed as a regular expression. In the same way as for the GUI administration, the conditions are evaluated from top to bottom, until a condition is found to be true for the current record or the default template is reached. Many conditions can be put on one field, and many fields can be declared.
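The top-to-bottom evaluation of these rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the BibFormat implementation: the function names, the dictionary-based record and the choice of matching the whole field value with a case-insensitive regular expression are all hypothetical.

```python
import re

SAMPLE_OUTPUT_FORMAT = """\
tag 980.a:
PICTURE --- Picture_HTML_detailed.bft
PERIODICAL --- Periodical_HTML_detailed.bft

default: Default_HTML_detailed.bft
"""


def parse_output_format(text):
    """Return (rules, default); rules is a list of (tag, pattern, template)."""
    rules, default, current_tag = [], None, None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.lower().startswith("tag "):
            # "tag 980.a:" opens a group of conditions on field 980.a
            current_tag = line[4:].rstrip(":").strip()
        elif line.lower().startswith("default:"):
            default = line.split(":", 1)[1].strip()
        elif "---" in line and current_tag is not None:
            value, template = (part.strip() for part in line.split("---", 1))
            rules.append((current_tag, value, template))
    return rules, default


def choose_template(rules, default, record):
    """Evaluate the rules from top to bottom; the first match wins."""
    for tag, pattern, template in rules:
        for value in record.get(tag, []):
            # Assumption: the pattern must match the whole value, ignoring case.
            if re.fullmatch(pattern, value, re.IGNORECASE):
                return template
    return default


rules, default = parse_output_format(SAMPLE_OUTPUT_FORMAT)
# Hypothetical record: a dict mapping a tag to its list of values.
print(choose_template(rules, default, {"980.a": ["picture"]}))
```

Keeping the rules in an ordered list makes the reordering offered by the administration interface a plain list operation.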

Once a format template has been chosen, BibFormat has to process it. The format template contains the code that defines what the record will look like. It contains static and dynamic parts: static parts do not change according to the record that is being formatted and are output as such, while dynamic parts need to be interpreted by BibFormat to match the values of the current record. In most cases, the static and dynamic parts of the code are HTML, as the output needs to be displayed in a browser. However, static parts could be anything: the rule for static code is What You Write Is What You Get. Dynamic code, in contrast, is always written with an HTML syntax. For example, to output the title of the record, one needs to write <BFE_TITLE />. This tag will be replaced with the title of the record being formatted. Other such format elements are available. They can take special parameters as input: for example, one could write <BFE_TITLE prefix="Title: " /> in order to prefix the title with the label "Title: ". Different attributes can be configured depending on the format element.

Format templates contain another kind of dynamic element: localized text. When a text needs to be translated into various languages, a <lang> tag has to be used to enclose the different translations of the same text. Each translation is then enclosed in a tag named after its language code: <en>, <de>, <it>, etc. For example, the localized version of "title" in English, German and Italian would be written:

    <lang><en>title</en><de>Titel</de><it>titolo</it></lang>

The BibFormat engine will filter the localized version and only keep the language relevant to the formatting. The code below shows a sample format template:

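[Sample format template: the markup of this listing did not survive extraction. The remaining text shows format element tags with attributes, followed by a localized label, "Abstract" in English and "Résumé" in French.]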
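To illustrate the two kinds of dynamic parts described above, the following Python sketch filters the localized text and then replaces the element tags. It is not the engine's code: the template string, the regular expressions, the fallback to English and the dictionary of element functions are assumptions made for this example only.

```python
import re

# Hypothetical template, written only for this illustration.
TEMPLATE = (
    '<h1><BFE_TITLE prefix="Title: " /></h1>\n'
    '<p><lang><en>Abstract</en><fr>Résumé</fr></lang>: <BFE_ABSTRACT /></p>'
)

LANG_BLOCK = re.compile(r'<lang>(.*?)</lang>', re.DOTALL)
TRANSLATION = re.compile(r'<(\w+)>(.*?)</\1>', re.DOTALL)
ELEMENT = re.compile(r'<BFE_(\w+)((?:\s+\w+="[^"]*")*)\s*/>')
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


def filter_languages(template, language, fallback="en"):
    """Keep a single translation in each <lang> block."""
    def keep_one(match):
        translations = dict(TRANSLATION.findall(match.group(1)))
        # Assumption: fall back to English when the language is missing.
        return translations.get(language, translations.get(fallback, ""))
    return LANG_BLOCK.sub(keep_one, template)


def evaluate_elements(template, elements, record):
    """Replace each <BFE_NAME ... /> tag with the output of its element."""
    def call_element(match):
        name = match.group(1).lower()
        params = dict(ATTRIBUTE.findall(match.group(2)))
        prefix = params.pop("prefix", "")
        suffix = params.pop("suffix", "")
        value = elements[name](record, **params)
        # prefix and suffix are printed only if the element returns a value
        return prefix + value + suffix if value else ""
    return ELEMENT.sub(call_element, template)


# Hypothetical elements: plain functions reading a dict-based record.
ELEMENTS = {
    "title": lambda record: record.get("title", ""),
    "abstract": lambda record: record.get("abstract", ""),
}

record = {"title": "Teaching the Environment", "abstract": "A short abstract."}
print(evaluate_elements(filter_languages(TEMPLATE, "fr"), ELEMENTS, record))
```

Static parts are never touched by either substitution, which is what makes the What You Write Is What You Get rule hold for them.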

Format elements used in format templates are basically bindings to the database, with some post-processing capabilities. They are provided by programmers to users as black boxes that print values of a record. They cannot be modified through the user interface.

A format element is a small Python script. The script has to implement a function named format, which takes a mandatory parameter bfo. This parameter is an object that provides accessors to the context in which the formatting occurs, such as the record or the preferred language of the user. The function can also take additional parameters, which can be used to pass values to the format element from the format template. For example, a format element which outputs the list of authors of a record can take the limit parameter to limit the number of authors printed by the element. One would write <BFE_AUTHORS limit="5" /> in the format template to call this element with the number of authors limited to 5. The function must return a textual value, which will be printed in the format template where the format element is called.

A strong emphasis is put on the documentation of format elements, so that the format elements layer of the BibFormat architecture is easily accessible to users of the format template layer. A Javadoc syntax in the docstring of the format function is used to define the description of the element, describe the parameters, and refer to other related format elements. This allows documentation to be generated for the element, as shown in figure 3.18. A sample format element is shown below:

    def format(bfo, limit='10', separator=','):
        '''
        Prints the list of authors of the record.

        @param limit The maximum number of authors to print
        @param separator A separator between authors
        '''
        authors = []
        authors_1 = bfo.fields('100__a')
        authors_2 = bfo.fields('700__a')

        authors.extend(authors_1)
        authors.extend(authors_2)

        nb_authors = len(authors)

        if limit.isdigit() and nb_authors > int(limit):
            return separator.join(authors[:int(limit)])
        else:
            return separator.join(authors)

The name of the Python script file used for the element corresponds to the name of the format element.
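As a rough sketch of how such documentation could be generated, the docstring of an element can be split into a description and a parameter table. The helper names below are hypothetical, the mapping from a tag name to a file name assumes simple lower-casing (in line with the bfe_abstract.py script of figure 3.21), and references to related elements are not handled.

```python
import inspect
import re

PARAM_LINE = re.compile(r"@param\s+(\w+)\s+(.*)")


def element_filename(tag_name):
    """Map a template tag name such as BFE_AUTHORS to a script file name."""
    # Assumption: the file name is the lower-cased tag name plus ".py".
    return tag_name.lower() + ".py"


def describe_element(function):
    """Split a docstring into a description and a {parameter: text} dict."""
    description, params = [], {}
    for line in (inspect.getdoc(function) or "").splitlines():
        match = PARAM_LINE.match(line.strip())
        if match:
            params[match.group(1)] = match.group(2).strip()
        elif line.strip():
            description.append(line.strip())
    return " ".join(description), params


def format_authors(bfo, limit="10", separator=","):
    """
    Prints the list of authors of the record.

    @param limit The maximum number of authors to print
    @param separator A separator between authors
    """
    return ""  # body omitted: only the docstring matters here


description, params = describe_element(format_authors)
print(element_filename("BFE_AUTHORS"), "-", description)
for name, text in params.items():
    print("  %s - %s" % (name, text))
```

Because the documentation is read from the script itself, it cannot drift away from the element it describes.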

In summary, the workflow of the BibFormat engine is shown in figure 3.21. One exception to this workflow is the case when BibFormat is given an unknown output format or format template: in that case, BibFormat hands the formatting over to the old PHP formatting module. This allows customers of CDS Invenio to transition smoothly to the new BibFormat, as they can use all of their old formats with the new module, and progressively replace them with the new formats.

[Figure 3.21: BibFormat workflow. Steps shown: format the record with output format HD; retrieve the corresponding output format; evaluate the rules and format with the corresponding format template; filter languages; evaluate all format elements; return the formatted record.]
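The exception for unknown output formats amounts to a simple dispatch, sketched below. The function names and the two stand-in engines are hypothetical, and only the output format case is covered, not the unknown format template case.

```python
def format_record(record_id, output_format, known_formats, new_engine, legacy_engine):
    """Use the new engine when it knows the output format, else the old one."""
    if output_format.lower() in known_formats:
        return new_engine(record_id, output_format)
    return legacy_engine(record_id, output_format)


# Stand-ins for the two formatting modules (hypothetical).
def new_engine(record_id, output_format):
    return "new:%s:%d" % (output_format, record_id)


def legacy_engine(record_id, output_format):
    return "old:%s:%d" % (output_format, record_id)


# Output formats already rewritten for the new module (assumed set).
KNOWN_FORMATS = {"hb", "hd"}

print(format_record(1, "HB", KNOWN_FORMATS, new_engine, legacy_engine))
print(format_record(1, "XYZ", KNOWN_FORMATS, new_engine, legacy_engine))
```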

The concept model of the BibFormat engine is shown in figure 3.22.


[Figure 3.22: Concept model of BibFormat engine. Labels in the diagram: Marisa, Patrick, Alain, Silvio, Stephanie; CDS Invenio; bibformatadmin, bibformatadminlib, bibformat, bibformat_engine, bibformat_templates, bibformat_dblayer, search_engine, bibrecord, bibfmt, format; fmtKNOWLEDGEBASES, fmtKNOWLEDGEBASEMAPPINGS; output format, format template, format element.]


Chapter 4

Evaluation

4.1 User Evaluation of the BibFormat Admin User Interface

The evaluation of the new BibFormat user interface was done informally with some of the target users. The person in charge of the formats at CERN and three members of the CERN library were given a short introduction to the new BibFormat interface. Although this was not very helpful for discovering issues in the interaction model, it helped to find out that:

• The notion of dependencies between output formats, format templates and format elements was not always clear to users. It was primarily the name that users suggested revising.

• The “close editor” menu item in the format template editor was not clear to users: the editor does not open in a separate window, so it was not obvious that choosing this option would bring them back to the list of formats.

• The initial idea for editing templates with a custom HTML editor was that users would access the format templates through a locally mounted network volume. Librarians expressed the need to upload or download format templates via the web interface. An upload button could be added to the format template editor, as well as a download button in front of each format.

• Librarians were confused about the usage of knowledge bases. Although they already knew the concept and purpose of knowledge bases, and understood how to edit them, they wondered when these would be used in the formatting process. This interaction issue might come from the fact that knowledge bases are primarily used by format elements, but are shown in the interface as a separate entity. It might be judicious to display, right in the list of knowledge bases, a reference to the format elements that use each of them.

Another, more formal evaluation of BibFormat was conducted with someone totally unfamiliar with CDS Invenio. This person was only slightly familiar with HTML and did not know MARC 21 at all. He was first briefly introduced to CDS Invenio and told the goals of the evaluation. The participant was then given four tasks:

1. Modify the formatting of search results for pictures, such that the picture is shown on the left instead of the right.

2. Create a totally new formatting for a collection of articles that has just been added, and display the title, abstract and authors (limited to 10 authors).

3. Assign the formatting created in the previous step to the new collection.

4. Normalize the abbreviated journal name Phys Rev A to Phys. Rev. A.

For the first task the participant first tried to modify the formatting using the output formats. Very quickly he found out that it was not the right place for editing the formatting, and by elimination chose the format templates option. It seems that the name was not adequate. Labels such as “Modify formatting” instead of “Format templates”, and “Assign format to collection”, would have helped this user more. Of course these labels were written just under the options, but the user did not read them. The task was then completed almost successfully. Only one problem occurred when the user saved his work: after clicking the end button, the participant expected to be brought back to the main menu, as he had finished his task.

The second task was more straightforward than the first one, as the participant had become familiar with the interface. The participant ran into two minor problems. He tried to drag and drop a format element into the editor instead of clicking on it, but quickly figured out that clicking on the element added it to the code. He also did not immediately see that the Javascript toolbar could be expanded to offer more options for the insertion of HTML tags. This toolbar was important to him as he did not remember some HTML tags. Still, the concept of adding elements and configuring their attributes was totally natural for the user.

During the third task, the only problem that occurred was that the user wanted to replace an existing assignment rule with the rule for the new collection, instead of creating a new rule. This is because the task was not clear to him. When told that he should not modify the existing rule, he immediately created a new rule for the new collection. This concept was clear to him, except for the notion of MARC code.

The last task was completely successful. He did not ask any questions for this task.

4.2 Usability Study of the Formats

A small online usability study of the default formatting of CDS Invenio was conducted at the end of the project, in order to reveal some usability issues with the current formats and give some ideas on their possible evolution. This study targets users other than the direct administrator users of BibFormat: it targets the previously mentioned readers, who access CDS Invenio to find and read documents.


This study was not part of the initial project timeframe. Only a very short period of time was available to set this experiment up, so it does not follow a strict and scientifically rigorous procedure. However, it nicely complements this analysis of the formatting in CDS Invenio.

4.2.1 Goals of the Study

Users were asked to compare two kinds of formatting for the search results. The first one was the regular formatting of CDS Invenio (Figure 4.1(a)), modified to show two lines of the abstract instead of a single one. The second formatting (Figure 4.1(b)) was a modified version of the first one with the following hypothetical improvements:

• Sans-serif font: as the CSS stylesheet of CDS Invenio does not force the type of the font, and as most browsers are configured by default to use a serif font, most users were seeing the results displayed in a serif font that could be difficult to read. The CSS stylesheet has been modified to use a sans-serif font.

• Clickable title: when I used CDS Invenio for the first time, I personally did not find out how to see the details of a record. It seems that people who have been using popular web search engines tend to click on the title to see the full record.

• Highlighted keywords: words used in the search query are highlighted in the results.

• Contextual abstract: although the first lines of the abstract might be the most relevant lines to display for each record of the search results, an attempt has been made to extract the lines that best match the keywords of the search query.

[Figure 4.1 content: both panels show the same search result, “Physics at the front-end of a neutrino factory : a quantitative appraisal” / Mangano, M L et al [CERN-TH-2001-131] [hep-ph/0105155], with two lines of abstract, the link to the document and the “Detailed record - Similar records” links. Panel (a): original search results. Panel (b): new search results, annotated with “Highlighted search keywords”, “Title links to detailed record”, “Sans-serif font” and “Abstract contextual to search keywords”.]

Figure 4.1: Search results formatting

The contextual abstract (which is identical to the non-contextual one in figure 4.1(b)) is implemented using a very basic technique: each sentence is given a weight corresponding to the number of keywords it contains. Then each consecutive group of n lines in the abstract (n being the number of lines to display for the abstract, 2 in our case) is given as its weight the sum of the weights of its lines. The group with the highest weight is returned.
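The weighting technique described above can be sketched in a few lines of Python. This is only an illustration written for this report, not the code used in the experiment: the function name `contextual_abstract`, the naive sentence splitting on punctuation and the case-insensitive whole-word matching are all assumptions.

```python
import re


def contextual_abstract(abstract, keywords, n=2):
    """Return the n consecutive sentences of 'abstract' matching most keywords."""
    # Naive sentence split after '.', '!' or '?' (an assumption of this sketch).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", abstract.strip()) if s]
    if len(sentences) <= n:
        return " ".join(sentences)

    lowered_keywords = [k.lower() for k in keywords]

    # Weight of a sentence = number of query keywords it contains.
    weights = []
    for sentence in sentences:
        words = set(re.findall(r"\w+", sentence.lower()))
        weights.append(sum(1 for k in lowered_keywords if k in words))

    # Weight of a group = sum of the weights of its n consecutive sentences.
    # max() keeps the first best group, so ties favor the start of the abstract.
    best_start = max(range(len(sentences) - n + 1),
                     key=lambda i: sum(weights[i:i + n]))
    return " ".join(sentences[best_start:best_start + n])


text = ("We study muons. The detector is large. Neutrino beams are intense. "
        "A neutrino factory is proposed. Costs are discussed.")
print(contextual_abstract(text, ["neutrino", "factory"]))
```

When no sentence contains a keyword, all groups weigh zero and the first n sentences are returned, so the result falls back to the regular, non-contextual abstract.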

The goal of the study was to check that users preferred the second kind of formatting and that it allowed them to search more efficiently.

4.2.2 Experimental Conditions

The experiment took place on a website only accessible from within CERN. Users connecting to the website were introduced to the experiment, asked for some personal information and given the task to find

a) either a document that matches their current research

b) or a document that discusses the possibility of violation of traditional physics theory due to superconductors

For this first part they were given access to the regular CDS Invenio search engine, with search results formatted as in figure 4.1(a). Once they had found a matching document, users were asked for their impression of the user interface. Users then had to perform a second task using a modified version of CDS Invenio that implemented the formatting shown in figure 4.1(b). They had to find

a) either a document that best matches the kind of research they were doing when they graduated

b) or a course on history of particles

As for the first interface, they were asked for their impression of this second interface. Then they were asked to compare both interfaces. None of the fields were mandatory and users could skip any part of the experiment with the “Skip” button located at the top of all pages.

The record set was made of about one thousand records imported from the CERN document server, taken from the most recent additions. This means that most of the documents were physics-related, although many IT-related documents were also available. Users were asked to find documents related to their research, but could also look for one of the suggested documents (these documents were present in the record set). Users could not just copy and paste the description of the documents to find them.

The time to complete each task was recorded, as well as the preferences of users, their comments, their knowledge of search engines and the documents they finally found. The experiment also recorded whether users had tried to click on the title of search results in the first interface, even though it was not marked as a link. Users were asked closed and open questions, with a majority of closed questions. They were also asked to grade some assertions, such as their computer knowledge, on a scale of 1 to 4 stars.


The experiment was advertised during a presentation of BibFormat at a User and Document Services group meeting. Some flyers were also put in front of the CERN library. It was however not possible to send an email to all CERN users.

Given the time allocated for the implementation of the system and the low number of participants expected, the order of the tasks was the same for all users: they were always shown the old formatting before the new formatting, and always had to find the same documents. This made it impossible to check that the results were independent of these variables.

As part of my EPFL Master thesis done at CERN, I am conducting a small online usability study of CDS Invenio (formerly CDSWare), the CERN document server.

It takes about 10 minutes to complete the experiment. To participate, go to the following web page:

http://pcdh23.cern.ch/ui.py (only accessible from CERN)

Thank you for your help!

For more information: [email protected]

Figure 4.2: Flyers advertising the experiment

4.2.3 Results

As expected, for the reasons mentioned above, very few people managed to participate in the study during the few days it was available. About a dozen people connected to the experiment, but only 8 of them completely filled in the questionnaires. The results discussed in this section should therefore be taken with caution, and not treated as scientifically accurate.

Participants in the experiment were mostly IT workers and librarians. On average they estimated their computer knowledge as good (about 3 stars out of 4 for IT staff and 2 stars for others). They had all used web search engines, and regularly use CDS Invenio.

During the first part of the experiment, users rated the relevance of the abstract with the first interface at 2.4 stars on average, and only 28% found that they took too much time to complete their task (5 minutes 28 seconds on average). There were no special comments, except from one user who could not find a document related to her research, and who did not know enough physics to search for the suggested document.


Only one user clicked on the (hidden) title of a record to display the details.

The relevance of the abstract with the second interface was rated 2.5 stars on average, and 25% estimated that it took too much time to find the document (3 minutes and 42 seconds on average). Comments suggested decreasing the importance given to the title (which is bold, highlighted and underlined in the new format) and making the highlighting optional. Two thirds of the users clicked on the title link to display the detailed view of records.

When asked to choose their preferred interface, only 1 person chose the first one (most likely because of the highlighting, which he found annoying). The others were in favor of the second kind of formatting. When asked about the font used, half of the users preferred the second one, while the others said they had no preference for one font over the other. The same held for the abstract, which only half of the users found more relevant in the second case.

Although the numbers seem to show that users were more efficient at searching with the second kind of formatting than with the first one, it could be due to the fact that:

• the second document was easier to find than the first one

• users had become familiar with the set of documents in the first interface, such that they could more easily find documents in the second one

Also when asked, participants did not feel that they were more efficient in the second interface.

The highlighting also seemed controversial. As suggested, it could be turned off by default, and enabled by the user when needed. A more discreet style could also be studied, such as bold text for keyword highlighting. This technique could not be applied to highlight keywords in the title, as the title is already in bold in order to make it distinct from the list of authors.

The linked titles seemed to be very much appreciated by users. The only problem is that it makes the title look much bigger, as it is now underlined and colored in blue. One solution would be to attach a style that would make it look like the first formatting. It could still be clickable by users, and even switch to the regular link appearance when the mouse would roll over the title.

It is difficult to state whether the contextual abstract was more relevant than the first one: users graded the relevance almost equally after each task, but when asked to directly compare the abstracts, users preferred the second one. This might come from a misunderstanding of the question, which made them evaluate the look of the abstract instead of its content.

4.3 Further Improvements of BibFormat

The small number of possible iterations for the analysis, design and implementation phases has left a lot of room for improving the formatting module.


Some refinements are desirable but not critical, and would mostly provide guidance to novice users. For example, the WYSIWYG format template editor that was studied could be implemented to help beginners experiment with the editing of formats, but it would not reach the expressiveness of professional HTML editors such as Dreamweaver. Advanced users would also certainly prefer to edit the code directly.

Nevertheless a more task-based structure might still be considered. An implementation of the prototype shown in figure 3.4(b) (Section 3.2) for the allocation of formats to records would tend toward such a solution. While this could simply mean providing an alternative view of output formats, it could also lead to abandoning output formats. A more complete analysis should be undertaken to see if this is a viable solution, especially regarding the loss of possibilities compared to the current implementation.

Of course, the results of the user evaluation should be taken into account to refine BibFormat. The most important tasks will be to:

• Add a way to download/upload formats through the web interface

• Rename some confusing labels

• Implement a resizable/customizable set of panels for the format template editor

The new possibilities offered by BibFormat should finally help make better formats, given that it is now possible to focus on the design instead of spending time on the implementation. This is where the most important improvements can be made, as formats have a direct impact on what end users will see. For example, a new detailed format that lets users show or hide the list of authors has been implemented as a proof of concept of the modifications that could be made. This particular feature was extremely important, as CERN publications might be co-authored by more than 2000 researchers. The new BibFormat also features context-aware formatting: the language, the keywords of the search query, etc. can be taken into account to format records. Now that BibFormat is much faster than before, these advanced formatting techniques become possible and should be further investigated.


Chapter 5

Conclusions

In this report we have focused on the user-centered design of the new BibFormat. We have detailed the task analysis in section 2.1, and made a comparative analysis of related products in section 2.2. The resulting specifications and corresponding prototypes have been described in chapter 3, along with a presentation of the implemented UI design and module architecture. The results of the evaluation of the administration interface of the product have been given in section 4.1. The results of a small usability study of the formats have been discussed in section 4.2. Finally, improvements to the new module have been suggested in section 4.3.

This project resulted in a completely new implementation of the BibFormat module, which now ships as part of a new CDS Invenio release. The module has gone through a testing phase that has made it robust and fast. It can run alongside the previous version of BibFormat to let customers transition smoothly to the new module. All formats that were previously included by default in CDS Invenio have also been translated to the new version, such that most customers can easily adopt it.

Usability testing has shown that BibFormat is now much more accessible to novice users. The structure of the software has been designed to be simpler and to use fewer concepts. No documentation needs to be read to use the system, and users can experiment with it and try to make their own formats out of the box. Novice users can use any other HTML editor to edit format templates. Intermediate users can go further by editing the sources of format templates, and modifying output formats. Advanced users can add new format elements, and modify all the configuration files with their preferred text editor.

Even though no reading is necessary to use BibFormat, extensive documentation has been written. It includes a tutorial, a short explanation of how BibFormat works, and a manual that details every aspect of the software (which is also used for contextual help).

Although the new BibFormat features fewer concepts than the old one, it provides at least equivalent expressiveness. The separation between the presentation layer (format template) and the business logic layer (format element) has made editing the formatting both easier and more powerful, given that the layout is made using standard HTML, and that format elements can use the full power of Python and its libraries to format records. Everything that relied on concepts of the old BibFormat dropped in the new release can be done using format elements.

This revision of BibFormat has also been an occasion to implement features that our customers have been requesting for a long time. These include the possibility to internationalize formats, and custom output content-types, such as Excel output.

A good trade-off between analysis, design and implementation has been found such that all objectives could be fulfilled. Moreover, some guidelines have been given for future improvements that should be easy to make, now that the formatting engine has been implemented and integrated into CDS Invenio. These improvements include small adjustments to the user interface to satisfy users’ needs, but also more significant changes, such as a revised way to assign format templates to records, in order to better support novice users.

Still, the new BibFormat has been welcomed by the people concerned and has generated a lot of interest, especially among librarians, whose support is a requirement for software like CDS Invenio.

Acknowledgements

I would first like to thank all the people who made this Master project at CERN possible: Dr. Pearl Pu Faltings for supervising my thesis, Jean-Yves Le Meur for proposing open projects to EPFL students, and Marisa Marciano Wynn, coordinator of the internship program, for her support during the application process.

I would also like to express my gratitude to all CDS members, who quickly integrated me into the team, and gave me technical support in various areas. I am especially indebted to Tibor Šimko, who spent hours discussing the technical aspects of BibFormat, and more generally for helping me understand the architecture of CDS Invenio. I would furthermore like to thank Belinda Chan Kwok Cheong for showing me how she was working with the PHP BibFormat, telling me her needs for the new BibFormat, and reviewing the prototypes and UI designs.

Finally I want to thank all the people — CDS developers, librarians, family and friends — who reviewed this document, tested BibFormat and gave any kind of support during this internship.


Appendix A

UML Use Cases

The scope of the following scenarios is the BibFormat administration system, at user goal-level.

Format templates related scenarios

Use Case: Create a new format template
Primary Actor: Patrick
Intention in Context: The user wants to create a new format template, either from scratch or copied from another one.
Main success scenario:
1. Patrick goes to the BibFormat web interface, and opens the list of format templates
2. He adds a new format to the list, and enters the attributes of the format (name, description)
3. He adds static elements to the format (some labels, tables, etc.)
4. He adds dynamic elements (title, authors, abstract, etc.) to the format from a list of elements
5. He lays out the elements on the page (align, reorder, etc.)
6. He modifies the styles of the elements (color, size, etc.)
7. He previews his format with different records
8. He makes additional modifications to match his needs
9. He saves the modifications made to the format once he is satisfied
Extensions:
2a. Patrick creates a duplicate of an existing format.

Here we assume that modifications of the layout and style of the elements are done through direct manipulation, as is done in text editors. Marisa has a similar scenario for the creation of formats, but she uses a tool that she already knows. Alain also uses his own tools, but as he is an advanced user, he prefers to write the code of a format directly instead of using a WYSIWYG editor.

Use Case: Create a new format template
Primary Actor: Marisa
Intention in Context: The user wants to create a new format template, either from scratch or copied from another one.
Main success scenario:
1. Marisa opens Dreamweaver, her preferred HTML editor
2. She creates a new file and saves it in the format templates directory of CDS Invenio
3. She adds dynamic elements (title, authors, abstract, etc.) and static elements to the format
4. She lays out the elements on the page (align, reorder, etc.)
5. She modifies the styles of the elements (color, size, etc.)
6. She previews her format by going to a special URL in her internet browser
7. She makes additional modifications to match her needs
8. She saves the modifications made to the format once she is satisfied
Extensions:
2a. Marisa creates a duplicate of an existing format.

Alain’s scenarios

Use Case: Create a new format template
Primary Actor: Alain
Intention in Context: The user wants to create a new format template, either from scratch or copied from another one.
Main success scenario:
1. Alain opens emacs, his preferred text editor, and creates a new file
2. He writes the code of the template in the created file
   (a) He uses HTML to design his template
   (b) He can refer to the documentation of dynamic elements to know how to add fields of the record such as title, abstract, etc.
3. He saves the file in the format templates directory of the CDS Invenio installation
Extensions:
1a. Alternatively Alain duplicates the file of an existing format, and opens it in a text editor.

The editing of format templates is similar to their creation for all users. We show here the scenario of Patrick. Other users just open the file of the template to modify in their preferred editor.

Use Case: Edit an existing format template
Primary Actor: Patrick
Intention in Context: The user wants to edit an existing format template.
Main success scenario:
1. Patrick goes to the BibFormat web interface, and opens the list of format templates
2. He finds the template he wants to modify and clicks on the edit button
3. Patrick can make the modifications he wants

Use Case: Preview a format
Primary Actor: Alain
Intention in Context: The user wants to preview the output produced by a format template with real records
Main success scenario:
1. Alain opens his web browser
2. He goes to a special URL in which the name of the format template to preview is specified
3. The web page shows the preview of the format template.

Use Case: Check the validity/correctness of formats
Primary Actor: Alain
Intention in Context: The user wants to check that formats have no errors and do not call undefined elements
Main success scenario:
1. Alain opens his terminal and types “bibformat -validate” to get the status regarding the correctness of formats.
2. The terminal outputs “Ok” or a list of issues and the line numbers in the format files where the problems occur.
Extensions:
1a. Alternatively Alain goes to the list of format templates in the BibFormat administration page and checks whether each of the formats is marked as “ok”.

Use Case: Check the dependencies of a format
Primary Actor: Alain
Intention in Context: The user wants to check which MARC fields a format template uses, and in which output format it is being used.
Main success scenario:
1. Alain opens his terminal and types “bibformat -dep format_name” to get the dependencies of the format.
2. The terminal outputs a list of the MARC fields used by this format and the cases in which this format is used.
Extensions:
1a. Alternatively Alain goes to the list of format templates in the BibFormat administration page, clicks on a format and chooses the “Check dependencies” option

The modification of the name and description of templates are straightforward scenarios not detailed here. The deletion of a template is as simple as clicking a delete button and confirming in the case of Patrick, and removing files from the format templates directory in the case of other users.


Output format related scenarios

Marisa should not have to modify output formats. Nevertheless the scenarios of Patrick should also apply to Marisa.

Use Case: Create a new output format
Primary Actor: Patrick
Intention in Context: The user wants to create a new output format, and set the rules that will define which template is used when this output format is called for a record
Main success scenario:
1. Patrick goes to the BibFormat web interface, and opens the list of output formats
2. He adds a new output format to the list, enters the attributes of the format (name, description, content-type, short code identifier) and confirms
3. He can set the default template that will be called as a last option
4. He can then add rules and edit them as he wants
5. He saves the modifications made to the output format once he is satisfied

Use Case: Create a new output format
Primary Actor: Alain
Intention in Context: The user wants to create a new output format, and set the rules that will define which template is used when this output format is called for a record. He uses his own editor.
Main success scenario:
1. Alain opens his preferred text editor and creates a new file
2. He saves the file in the output format directory of CDS Invenio
3. He writes the code of the output format
4. He saves the modifications made to the output format once he is satisfied

Use Case: Edit an output format
Primary Actor: Patrick
Intention in Context: The user wants to edit an existing output format, and set the rules that will define under which conditions a format template is used for that output format
Main success scenario:
1. Patrick goes to the BibFormat web interface, and opens the list of output formats
2. He opens the output format he wants to modify
3. He can add rules to the set of rules, and reorder them
4. For each rule he defines which template is used, and which condition on which MARC field of the record must hold to use the template
5. He can set the default template used when no rule applies to a record
6. He saves the modifications made to the output format once he is satisfied

Alain can also open an output format in his preferred editor and modify it. Other scenarios not detailed here allow the user to check the validity of an output format, check its dependencies (which templates it uses), delete it or modify its attributes.
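The way the rules of an output format select a template can be sketched as follows. This is a simplified illustration and not the code of the BibFormat engine: the function `choose_template`, the representation of a record as a dictionary of MARC tags, the exact-match comparison, the first-match-wins order and the name of the default template are assumptions. The rule itself is modeled on the one shown in the workflow figure of section 3.4.

```python
def choose_template(record, rules, default_template):
    """Return the template of the first rule whose condition holds for the record.

    'record' maps a MARC tag to its value; 'rules' is an ordered list of
    (tag, expected value, template name) tuples. Both are hypothetical
    representations chosen for this sketch.
    """
    for tag, expected_value, template in rules:
        # A rule applies when the given field holds exactly the expected value.
        if record.get(tag, "").strip() == expected_value:
            return template
    # No rule applies: fall back to the default template.
    return default_template


# Rule modeled on the workflow figure: pictures get a dedicated template.
RULES = [("980.a", "PICTURE", "Picture_HTML_detailed.bft")]
DEFAULT = "Default_HTML_detailed.bft"  # hypothetical file name

print(choose_template({"980.a": "PICTURE"}, RULES, DEFAULT))
print(choose_template({"980.a": "ARTICLE"}, RULES, DEFAULT))
```

Evaluating the rules in order is what makes reordering them meaningful in the scenario above: a more specific rule placed first takes precedence over a more general one.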

Appendix B

BibFormat Developers APIs

The API of bibformat.py consists of these functions:

    def format_record(recID, of, ln=cdslang, verbose=0, search_pattern=None,
                      xml_record=None, uid=None, on_the_fly=False):
        """
        Formats a record given its ID (or its XML representation)
        and an output format.

        Returns a formatted version of the record in the specified
        language, with pattern context, and specified output format.
        The function will define by itself which format template must
        be applied.

        Parameters that allow contextual formatting (like 'search_pattern'
        and 'uid') are useful only when doing on-the-fly formatting, or
        when caching with care (e.g. caching all formatted versions of a
        record for each possible 'ln').

        The arguments are as follows:

        recID - the ID of the record to format. If ID does not exist
                the function returns empty string or an error string,
                depending on level of verbosity. If 'xml_record'
                parameter is specified, 'recID' is ignored

        of - an output format code. If 'of' does not exist as code in
             output format, the function returns empty string or an
             error string, depending on level of verbosity. 'of' is
             case insensitive.

        ln - the language to use to format the record. If 'ln' is an
             unknown language, or translation does not exist, default
             cdslang language will be applied whenever possible.
             Allows contextual formatting.

        verbose - the level of verbosity in case of errors/warnings
                  0 - Silent mode
                  5 - Prints only errors
                  9 - Prints errors and warnings

        search_pattern - the pattern used as search query when asked to
                         format this record (User request in web
                         interface). Allows contextual formatting.

        xml_record - an XML string representation of the record to
                     format. If it is specified, recID parameter is
                     ignored. The XML must be parsable by BibRecord.

        uid - User ID of the user who will view the formatted record.
              Useful to grant access to special functions on a page
              depending on user's privilege. Allows contextual
              formatting. Typically 'uid' is retrieved with
              webuser.getUid(req).

        on_the_fly - if False, try to return an already preformatted
                     version of the record in the database.
        """

Example:

    >>> from invenio.bibformat import format_record
    >>> format_record(5, "hb", "fr")

def format_records(recIDs, of, ln=cdslang, verbose=0, search_pattern=None, xml_records=None, uid=None, record_prefix=None, record_separator=None, record_suffix=None, prologue="", epilogue="", req=None, on_the_fly=False):
"""
Returns a list of formatted records given by a list of record IDs or a list of records as XML. Adds a prefix before each record, a suffix after each record, plus a separator between records.

Also adds an optional prologue and epilogue to the complete formatted list.

You can either specify a list of record IDs to format, or a list of xml records, but not both (if both are specified recIDs is ignored).

’record_separator’ is a function that returns a string as separator between records. The function must take an integer as its only parameter, which is the index in recIDs (or xml_records) of the record that has just been formatted. For example, separator(i) must return the separator between recIDs[i] and recIDs[i+1]. Alternatively, the separator can be a single string, which will be used to separate all formatted records. The same applies to ’record_prefix’ and ’record_suffix’.

’req’ is an optional parameter to which the result of the function is written live (record after record) if it is given. Note that you must set the content-type of ’req’ yourself and send the HTTP header before calling this function, as it will not do so.

This function takes the same parameters as ’format_record’ except for:

recIDs − a list of record IDs to format

xml_records − a list of XML string representations of the records to format. If this list is specified, ’recIDs’ is ignored.

record_prefix − a string, or a function that takes the index of the record in ’recIDs’ or ’xml_records’ and returns a string. Printed before each formatted record.

record_separator − either a string or a function that returns a string to separate formatted records. The function takes the index of the record in ’recIDs’ or ’xml_records’ that is being formatted.

record_suffix − a string, or a function that takes the index of the record in ’recIDs’ or ’xml_records’ and returns a string. Printed after each formatted record.

req − an optional request object on which formatted records can be printed (for "live" output)

prologue − a string printed before all formatted records

epilogue − a string printed after all formatted records

on_the_fly − if False, try to return an already preformatted version of the records in the database
"""

def get_output_format_content_type(of):
"""
Returns the content type (e.g. ’text/html’ or ’application/ms-excel’) of the given output format.

The function takes this mandatory parameter:

of − the code of the output format for which we want to get the content type
"""

def record_get_xml(recID, format='xm', decompress=zlib.decompress):
"""
Returns an XML string of the record given by recID.

The function builds the XML directly from the database, without using the standard formatting process.

’format’ defines the flavour of XML:


− ’xm’ for standard XML
− ’marcxml’ for MARCXML
− ’oai_dc’ for OAI Dublin Core
− ’xd’ for XML Dublin Core

If record does not exist, returns empty string.

The function takes the following parameters:

recID − the id of the record to retrieve

format − the XML flavor in which we want to get the record

decompress − a function used to decompress the record from the database
"""

The API of the BibFormat Object (’bfo’), given as a parameter to the format function of format elements, consists of the following functions. This API is to be used only inside format elements.

def control_field(self, tag):
"""
Returns the value of the control field given by tag in the record.

If the value does not exist, returns an empty string. The returned value is always a string.

The argument is:

tag − the MARC code of a field

"""

def field(self, tag):
"""
Returns the value of the field corresponding to tag in the current record.

If the value does not exist, returns an empty string. The returned value is always a string.

The argument is:

tag − the MARC code of a field

"""

def fields(self, tag):
"""
Returns the list of values corresponding to "tag".

If tag has an undefined subcode (such as 999C5), the function returns a list of dictionaries, whose keys are the subcodes and whose values are the values of tag.subcode.

If the tag has a subcode, simply returns the list of values corresponding to tag. The returned value is always a list.

The argument is:

tag − the MARC code of a field

"""
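The two return shapes described for fields() can be made concrete with a toy sketch. It is not the BibFormat implementation: the in-memory record layout (a dictionary mapping a field to a list of subcode dictionaries), the "700.a" tag notation split on the dot, and the reference values under 999C5 are all assumptions made for illustration.

```python
def fields(record, tag):
    """Toy version of bfo.fields() over an in-memory record (illustration only)."""
    # Split e.g. "700.a" into the field part "700" and the subcode "a".
    field_part, _, subcode = tag.partition(".")
    instances = record.get(field_part, [])
    if subcode:
        # Tag with a subcode: a flat list of values.
        return [inst[subcode] for inst in instances if subcode in inst]
    # Tag without a subcode: one dictionary per field instance.
    return [dict(inst) for inst in instances]


# Hypothetical record layout; the 999C5 content is made up for the example.
record = {
    "700": [{"a": "Alekhin, S I"}, {"a": "Anselmino, M"}],
    "999C5": [{"m": "made-up reference text", "y": "2006"}],
}

authors = fields(record, "700.a")     # list of strings
references = fields(record, "999C5")  # list of dictionaries
missing = fields(record, "245.a")     # empty list, never None
```

In both cases the caller gets a list back, so a format element can iterate over the result without first checking whether the field exists.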

def kb(self, kb, string, default=""):
"""
Returns the value of the "string" in the knowledge base "kb".

If kb does not exist or string does not exist in kb, returns the ’default’ string, or an empty string if not specified.

The arguments are as follows :

kb − the knowledge base name in which we want to find the mapping. If it does not exist the function returns the original ’string’ parameter value. The name is case insensitive (Uses the SQL ’LIKE’ syntax to retrieve value).

string − the value for which we want to find a translation. If it does not exist the function returns the ’default’ string. The string is case insensitive (Uses the SQL ’LIKE’ syntax to retrieve value).

default − a default value returned if ’string’ not found in ’kb’.

"""
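The lookup rules of kb() can be sketched with plain dictionaries. This is only an illustration under stated assumptions: kb_lookup and the knowledge_bases dictionary are hypothetical, case insensitivity is modelled as a lower-case comparison rather than the SQL ’LIKE’ matching used by the real function (so wildcards are ignored), and an unknown knowledge base is treated as returning ’default’, following the summary sentence of the docstring.

```python
def kb_lookup(knowledge_bases, kb, string, default=""):
    """Toy knowledge base lookup with case-insensitive names and keys."""
    for name, mappings in knowledge_bases.items():
        if name.lower() != kb.lower():
            continue
        for key, value in mappings.items():
            if key.lower() == string.lower():
                return value
        # Knowledge base found, but no mapping for 'string'.
        return default
    # Assumption: an unknown knowledge base also yields 'default'.
    return default


# Hypothetical content, mirroring the DBCOLLID2COLL example of this appendix.
knowledge_bases = {"DBCOLLID2COLL": {"ARTICLE": "Published Article"}}

found = kb_lookup(knowledge_bases, "dbcollid2coll", "article")
fallback = kb_lookup(knowledge_bases, "DBCOLLID2COLL", "not in kb", "My Value")
unknown_kb = kb_lookup(knowledge_bases, "NOSUCHKB", "ARTICLE")
```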

def get_record(self):
"""
Returns the record encapsulated in bfo as a BibRecord structure. You can get full access to the record through bibrecord.py functions.
"""

Example (from inside BibFormat element):

>> bfo.field("520.a")
>> 'We present a quantitative appraisal of the physics potential for neutrino experiments.'
>>
>> bfo.control_field("001")
>> '12'
>>
>> bfo.fields("700.a")
>> ['Alekhin, S I', 'Anselmino, M', 'Ball, R D', 'Boglione, M']
>>
>> bfo.kb("DBCOLLID2COLL", "ARTICLE")
>> 'Published Article'
>>
>> bfo.kb("DBCOLLID2COLL", "not in kb", "My Value")


>> 'My Value'

Moreover, you can access the language requested for the formatting, the search pattern used by the user in the web interface, and the user ID by reading the corresponding attribute of ’bfo’:

bfo.ln
"""
Returns the language that was asked to be used for the formatting. Always returns a string.
"""

bfo.search_pattern
"""
Returns the search pattern specified by the user when the record had to be formatted. Always returns a string.
"""

bfo.uid
"""
Returns the user ID of the user who shall view the formatted record.
"""

Example (from inside BibFormat element):

>> bfo.ln
>> 'en'
>>
>> bfo.search_pattern
>> 'mangano and neutrino and factory'
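To show how these pieces fit together, here is a minimal sketch of a format element that lists the authors of a record with a label in the requested language. FakeBFO is a hypothetical stand-in for the BibFormat Object so that the example runs on its own, format_element stands in for the format function of an element, and the label translations are illustrative only; none of these names come from CDS Invenio.

```python
class FakeBFO:
    """Hypothetical stand-in for the BibFormat Object (illustration only)."""

    def __init__(self, values, ln="en", search_pattern="", uid=None):
        self._values = values          # maps a tag such as "700.a" to a list of values
        self.ln = ln
        self.search_pattern = search_pattern
        self.uid = uid

    def field(self, tag):
        found = self._values.get(tag, [])
        return found[0] if found else ""

    def fields(self, tag):
        return list(self._values.get(tag, []))


def format_element(bfo):
    """Return the authors of the record, labelled in the language of bfo.ln."""
    authors = bfo.fields("700.a")
    if not authors:
        # Nothing to print: an empty string leaves the output untouched.
        return ""
    labels = {"en": "Authors", "fr": "Auteurs"}
    label = labels.get(bfo.ln, labels["en"])
    return "%s: %s" % (label, "; ".join(authors))


french = FakeBFO({"700.a": ["Alekhin, S I", "Anselmino, M"]}, ln="fr")
output = format_element(french)
empty_output = format_element(FakeBFO({}))
```

The element reads everything it needs from ’bfo’, so the same function serves any record, language or user that the formatting engine passes in.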
