EPrints Training: Repository Configuration Exercises

Exercise 1: Getting Started

The PCs provided for this session have been pre-installed with an EPrints demo repository.

1.1 Logging In

To log into the machine, enter the username eprints, press return, and when asked enter the password you have been given. You will then be logged into the desktop. If you're not familiar with this desktop environment, don't panic! Things work more or less the same as in Windows.

1.2 Entering Commands

A terminal allows you to type commands directly. This is how you will issue commands to EPrints. Click the right mouse button on the desktop and select Open Terminal from the popup menu. Enter the following command:

    eprints> cd /opt/eprints2

(eprints> means “type this as user eprints”)

1.3 Starting a Text Editor

Each of the exercises in this session requires you to edit one or more EPrints configuration files. To start a text editor application, click on the Applications button in the top left-hand corner of the screen, select Accessories, then Text Editor. To edit the file:

    /opt/eprints2/archives/myid/cfg/template-en.xml

click the Open button on the toolbar, then double-click on Filesystem in the left-hand pane of the Open File... dialog box. In the right-hand pane, double-click opt, then eprints2 and so on until you reach template-en.xml. Double-click template-en.xml to open it for editing.

1.4 Starting a Web Browser

After editing the EPrints configuration files, you will use a Web browser to check the results. To start a Web browser application, click on the Applications button, select Internet, then Firefox Web Browser. In the address bar, enter the hostname of the PC you are using (look on the front of the PC case for the machine name) to load the front page of the demo repository:

    http://machinename.ecs.soton.ac.uk/

1/15 http://www.eprints.org/services/training/ EPrints Training: Repository Configuration Exercises

Exercise 2: Branding

Almost all institutions modify the default “look” of their EPrints repository. Some simply add their logo and colour scheme; others make far more radical changes. For examples, visit:

    http://www.eprints.org/software/examples/#branding

In this exercise, you will modify the look of the demo repository. Every page displayed by EPrints is wrapped in an XHTML template. This can be found at:

    /opt/eprints2/archives/myid/cfg/template-en.xml

2.1 Change the Web Site Template

Edit:

    /opt/eprints2/archives/myid/cfg/template-en.xml

Find the line containing:

    &archivename;

This is the name of the archive as it appears in the top left of every page. Replace it with an img element that displays your institution's logo (or use the EPrints logo: http://www.eprints.org/style/eprintslogo.gif). Note that this file is XHTML, so make sure that the img tag is closed correctly! Save your changes.

2.2 Reload the configuration

Whenever you make changes to the configuration, you need to tell EPrints to reload it:

    eprints> bin/force_config_reload myid

Tell EPrints to re-generate its static pages (home page, help page...):

    eprints> bin/generate_static myid

2.3 Check it worked

• Reload the home page of the demo repository in your browser.

• Check your logo is displayed.

2.4 Change the Stylesheet

Edit:

    /opt/eprints2/archives/myid/cfg/static/general/eprints.css

Change the background value for .header and .footer to suit your institution’s colour scheme (or use “EPrints Blue”: #ccccff). Save your changes. Update the static pages:

    eprints> bin/generate_static myid

2.5 Check it worked

• Reload the home page of the demo repository in your browser.

• Check your new colour scheme is displayed.

2.6 Notes

• Look in the template-en.xml file for the elements that tell EPrints where to insert page content.

• To apply your changes to the views and abstract pages, you would need to run generate_views and generate_abstracts.

• Explore how generate_static works in more detail: http://www.eprints.org/documentation/tech/php/generate_static.

• force_config_reload is a quick way of reloading the EPrints configuration, and is useful when tweaking the configuration files. However, for performance reasons you should always restart Apache after tweaking a live archive. Read more: http://www.eprints.org/documentation/tech/php/force_config_reload.php

Exercise 3: Configuring Submission Workflow

When depositing documents in EPrints, users work through several simple forms. However, some institutions prefer to provide users with one large submission form. In this exercise you will learn how EPrints splits up the submission process, and configure the demo repository to provide one large submission form.

3.1 Change the workflow configuration

Edit:

    /opt/eprints2/archives/myid/cfg/metadata-types.xml

Find the definition of the article type. Notice that the list of metadata fields is broken up by separator tags; these tell EPrints how to split the submission process into forms. Remove all the separator tags from the article type. Save your changes.
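The effect of the tags that split the field list can be pictured with a small sketch. It is written in Python purely as an illustration and is not EPrints code: the PAGE_BREAK marker and the field names are assumptions made up for the example, standing in for whatever appears in your metadata-types.xml.

```python
# Illustrative sketch only: PAGE_BREAK is a hypothetical stand-in for the
# tags that break up the field list, and the field names are examples.
PAGE_BREAK = object()


def split_into_forms(entries):
    """Group field names into forms, starting a new form at each marker."""
    forms = [[]]
    for entry in entries:
        if entry is PAGE_BREAK:
            # Start a new form, but never emit an empty one.
            if forms[-1]:
                forms.append([])
        else:
            forms[-1].append(entry)
    if not forms[-1]:
        forms.pop()
    return forms


with_breaks = ["creators", "title", PAGE_BREAK,
               "abstract", "keywords", PAGE_BREAK, "date"]
without_breaks = [e for e in with_breaks if e is not PAGE_BREAK]

print(split_into_forms(with_breaks))     # several small forms
print(split_into_forms(without_breaks))  # one large form
```

With the markers removed, every field ends up in the same group, which is the single-form behaviour this exercise aims for.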

3.2 Reload the configuration

    eprints> bin/force_config_reload myid

3.3 Create a User Account

To view the submission page, you first need to create yourself a user account:

    eprints> bin/create_user myid USERNAME EMAIL admin PASSWORD

Choose your own values for USERNAME, EMAIL and PASSWORD.

3.4 Check it worked

• Load the home page of the demo repository in your browser.

• Go to the User Area.

• Log in using the account you just created.

• You will be asked to enter your name. This is because by default a user record is not valid unless it has a name. Click Update Record.

• Select Begin new Item.

• Select Article and click Next.

• Check that all the metadata fields are shown on a single form.

Exercise 4: Customising the Subject Hierarchy

By default, EPrints is configured with a subject hierarchy based on the Library of Congress subject tree. Institutions often modify this hierarchy (e.g. by adding more detail to areas that they specialise in) or even change it for a completely different classification scheme. In this exercise you will replace the EPrints subject tree with an alternative classification scheme.

4.1 Change the default subject tree

Edit:

    /opt/eprints2/archives/myid/cfg/subjects

Examine the format of this file:

    subjects:Library of Congress Subject Areas:ROOT:0
    ...
    Q:Q Science:subjects:0
    Q1:Q Science (General):Q:1
    QA:QA Mathematics:Q:1
    QA75:QA75 Electronic computers. Computer science:QA:1
    QA76:QA76 Computer software:QA:1
    QB:QB Astronomy:Q:1
    QC:QC Physics:Q:1

You can see that each line of the file is in the form:

    subjectid:name:parents:depositable

• subjectid is a unique identifier for the subject.

• name is the name of the subject, in the default language of the archive.

• the parents field defines how the subject fits into the hierarchy. In this case, EPrints is able to determine that "Computer Software" (QA76) comes under "Mathematics" (QA) which itself comes under "Science" (Q) and so on. Therefore a search for eprints matching the "Mathematics" subject would also return items associated with the "Computer Science" (QA75) and "Computer Software" (QA76) subjects. The special keyword ROOT indicates the top level of the hierarchy.

• depositable specifies whether or not users can associate their deposits with the subject. It wouldn't be very helpful to let users associate their eprints with the top level subject "Library of Congress Subject Areas", so in this case depositable is set to 0.
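The line format and the way a search on one subject also covers the subjects beneath it can be sketched in a few lines of Python. This is only an illustration, not the import_subjects implementation. It assumes that multiple parents are separated by commas and that subject names contain no colons; the function names are hypothetical.

```python
# Illustrative sketch only, not the import_subjects implementation.
# Assumptions: multiple parents are comma-separated, names contain no colons.
from collections import namedtuple

Subject = namedtuple("Subject", "subjectid name parents depositable")

SAMPLE = """\
subjects:Library of Congress Subject Areas:ROOT:0
Q:Q Science:subjects:0
QA:QA Mathematics:Q:1
QA75:QA75 Electronic computers. Computer science:QA:1
QA76:QA76 Computer software:QA:1
QC:QC Physics:Q:1
"""


def parse_subjects(text):
    """Parse subjectid:name:parents:depositable lines into a dict keyed by id."""
    subjects = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        subjectid, name, parents, depositable = line.split(":")
        subjects[subjectid] = Subject(
            subjectid, name, parents.split(","), depositable == "1")
    return subjects


def descendants(subjects, subjectid):
    """Return the ids of every subject that sits below subjectid."""
    found = set()
    frontier = [subjectid]
    while frontier:
        current = frontier.pop()
        for subject in subjects.values():
            if current in subject.parents and subject.subjectid not in found:
                found.add(subject.subjectid)
                frontier.append(subject.subjectid)
    return found


subjects = parse_subjects(SAMPLE)
# Subjects that a search on "Mathematics" (QA) would also cover:
print(sorted(descendants(subjects, "QA")))
```

Following the parents field upwards from QA76 reaches QA, then Q, then the root, which is the chain described in the bullet on parents.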

Remove the EPrints subject tree altogether and replace it with your own favourite classification scheme. Here's the first line of an ACM Computing Classification scheme (http://www.acm.org/class/1998/) to get you started:

    subjects:ACM Computing Classification System (1998):ROOT:0

4.2 Import the new subject tree

    eprints> bin/import_subjects myid

4.3 Check it worked

• Load the home page of the demo repository in your browser.

• Go to the User Area.

• Select Begin new Item.

• Select Book and click Next.

• Continue through the deposit process until you reach the Subjects page.

• Check your new subject tree is shown.

4.4 Notes

• The ability to specify multiple parents for a subject means that you can create a rich “lattice” rather than force items into a tree structure.

• Note that you did not need to run force_config_reload after making changes to the subjects file. This is because the subjects file is only used by the import_subjects script and is not part of the main archive configuration.

• Editing the subjects file is a good way to define an initial classification scheme. To make small tweaks later, you can use the subject editing tool. Point your browser at: http://flag.ecs.soton.ac.uk:8080/perl/users/staff/edit_subject

• This exercise dealt with creating subject trees in a single language. In order to specify subject trees in multiple languages, you need to use a special XML format. See: http://www.eprints.org/documentation/tech/php/import_subjects.php

Exercise 5: Bespoke Deposits 1

EPrints comes configured with a default set of metadata fields (title, creators, date, publication, keywords, abstract...). Institutions often add extra metadata fields to suit the individual requirements of their repositories. In this exercise you will add a new metadata field to the demo repository.

5.1 Add a new metadata field

The field you are going to add will allow users to enter the project code of the project that produced the item. Edit:

    /opt/eprints2/archives/myid/cfg/ArchiveMetadataFieldsConfig.pm

Find the entry for the referencetext field and add a comma after it, then add:

    { name => "proj_code", type => "text" }

5.2 Add human-readable name and description for new field

Now you need to add a human-readable name and description of the proj_code field. Edit:

    /opt/eprints2/archives/myid/cfg/phrases-en.xml

Add the following new phrases:

• Name: Project Code

• Description: Enter the code number of the project that produced this item.

The order of the phrases in the file does not matter, but placing these with the other eprint_field* phrases would be sensible. The identifier proj_code is used internally in the database and in the EPrints configuration files, but it is never seen by users. The phrases are the “public face” of your repository. Although this makes things more complicated in some ways, it gives several advantages:

• Phrases are Unicode so can contain any characters, not just Latin characters.

• The human­readable name or description of a field can easily be changed without having any impact on the underlying data.

• Multiple phrase files can be used for a single repository, allowing it to support multiple languages.
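The separation between internal identifiers and phrases can be pictured with a small Python sketch. It is not how EPrints stores or loads phrases. The phrase ids, the German text and the fall-back to the default language are all assumptions made for illustration.

```python
# Illustrative sketch only. The phrase ids, the German wording and the
# fall-back behaviour are assumptions, not taken from EPrints.
PHRASES = {
    "en": {
        "eprint_fieldname_proj_code": "Project Code",
        "eprint_fieldhelp_proj_code":
            "Enter the code number of the project that produced this item.",
    },
    "de": {
        "eprint_fieldname_proj_code": "Projektcode",
    },
}


def phrase(phrase_id, lang, default_lang="en"):
    """Look up a phrase in lang, falling back to the default language."""
    for candidate in (lang, default_lang):
        text = PHRASES.get(candidate, {}).get(phrase_id)
        if text is not None:
            return text
    return "[" + phrase_id + " not defined]"


# The identifier proj_code never changes; only the displayed text does.
print(phrase("eprint_fieldname_proj_code", "en"))
print(phrase("eprint_fieldname_proj_code", "de"))
```

Renaming the field for users then only means editing the phrase text; the identifier used in the database stays the same.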

5.3 Add the field to certain types of EPrints

Edit:

    /opt/eprints2/archives/myid/cfg/metadata-types.xml

You are going to make the proj_code field optional for items of type conference_item but required for those of type monograph. Find the conference_item type and add a field entry for proj_code anywhere within it (remember from Exercise 3 that the tags which break up the field list determine where in the submission process fields will appear). Now find the monograph eprint type and add a proj_code field entry that is marked as required.

5.4 Erase and rebuild the database

The erase_archive command deletes all the database tables, all the documents in the repository, and the repository Web pages. However, it does not touch anything in the repository's configuration directory. Stop the indexer running before deleting the database tables, in case it tries to write to them:

    eprints> bin/indexer stop
    eprints> bin/erase_archive myid --force

The MySQL root password is blank (just hit return). Of course, on a properly secure system you would have set it to something!

    eprints> bin/create_tables myid
    eprints> bin/indexer start
    eprints> bin/import_subjects myid --force
    eprints> bin/generate_static myid
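Because the order of these commands matters, it can help to see them as one sequence. The following Python sketch is a hypothetical helper, not something shipped with EPrints. It simply lists the commands from this section in order and, by default, prints them without running anything. It assumes it is started from /opt/eprints2 as user eprints.

```python
# Illustrative sketch only: a hypothetical helper that walks through the
# rebuild commands from section 5.4. Assumes the current directory is
# /opt/eprints2 and the user is eprints.
import subprocess


def rebuild_commands(archive_id):
    """Return the commands from section 5.4, in the order they must run."""
    return [
        ["bin/indexer", "stop"],
        ["bin/erase_archive", archive_id, "--force"],
        ["bin/create_tables", archive_id],
        ["bin/indexer", "start"],
        ["bin/import_subjects", archive_id, "--force"],
        ["bin/generate_static", archive_id],
    ]


def rebuild(archive_id, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in rebuild_commands(archive_id):
        print(" ".join(command))
        if not dry_run:
            # check=True stops the sequence as soon as one step fails.
            subprocess.run(command, check=True)


rebuild("myid")  # dry run: prints the commands without running them
```

Calling rebuild("myid", dry_run=False) would actually erase the archive, so keep the dry run unless you really mean it.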

5.5 Reload the configuration

    eprints> bin/force_config_reload myid

5.6 Create a User Account

Create yourself a user account. erase_archive also destroyed any existing user data!

    eprints> bin/create_user myid USERNAME EMAIL admin PASSWORD

5.7 Check it worked

• Load the home page of the demo repository in your browser.

• Go to the User Area.

• Log in using the account you just created.

• You will be asked to enter your name. This is because by default a user record is not valid unless it has a name. Click Update Record.

• Select Begin new Item.

• Select Conference or Workshop Item and click Next.

• Check that the new project code field is shown.

• Check that an error message is shown if you don’t fill in the project code field for the Monograph type.
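The difference between an optional and a required field can be pictured as a simple validation step. This Python sketch is only an illustration of the behaviour you are checking for. The data structure and function name are hypothetical and do not reflect how EPrints validates deposits internally.

```python
# Illustrative sketch only: the table below is hypothetical and merely
# mirrors the optional/required settings made in section 5.3.
REQUIRED = {
    "conference_item": {"proj_code": False},  # optional
    "monograph": {"proj_code": True},         # required
}


def missing_required(eprint_type, values):
    """Return the required fields of eprint_type that have no value."""
    fields = REQUIRED.get(eprint_type, {})
    return [name for name, required in fields.items()
            if required and not values.get(name)]


print(missing_required("conference_item", {}))             # nothing missing
print(missing_required("monograph", {}))                   # proj_code missing
print(missing_required("monograph", {"proj_code": "X1"}))  # nothing missing
```

An empty project code is accepted for a conference item but reported for a monograph, which matches the two checks in the list above.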

5.8 Notes

• Adding or removing fields means that the database tables will need to change. In this exercise you did this the easy way: erase the database and create a new one. Obviously you can't do this if you are already using your repository and have valuable data in the database. In that case you would modify the configuration files as described above, then modify the database directly by issuing SQL commands. There are instructions on how to do this on the EPrints Wiki: http://wiki.eprints.org/w/EPrints2/AddingLiveFields

Exercise 6: Bespoke Deposits 2

The default deposit types in EPrints are: Article, Book, Book Section, Conference Item, Monograph, Patent, Thesis, and Other. Some institutions have extended the default list of deposit types to include multimedia or data items. For examples of data repositories, visit:

    http://www.eprints.org/software/examples/#data

In this exercise you will add a new deposit type to the demo repository.

6.1 Add a new deposit type

You will be adding a new Photograph deposit type to the demo repository. Edit:

    /opt/eprints2/archives/myid/cfg/metadata-types.xml

Add a new photograph type, with the fields title, creators and date_issue, alongside the existing eprint types. In order to upload a photograph, we also need to extend the list of document types to include photo (jpg) files. Add a new jpg type alongside the existing document types.

6.2 Add human-readable name and description for new types

Now you need to add the human-readable name and description of the photograph and jpg types. Note that we don’t need to add any phrases for the title, creators or date_issue fields since these are defined by default. Edit:

    /opt/eprints2/archives/myid/cfg/phrases-en.xml

Add the following new phrases:

• Name of the photograph type: Photograph

• Description of the photograph type: A scanned or digital photograph

• Name of the jpg type: Jpeg

6.3 Add citation style for new types

EPrints defines citation styles for each type. These citation styles tell EPrints how to display a short summary of the record, for example in search results. Edit:

    /opt/eprints2/archives/myid/cfg/citations-en.xml

Add the following citation style for photographs:

    @creators@ @title;magicstop@ (@date_effective;res=year@)

You can see that this citation style contains several special instructions:

• @creators@ tells EPrints to insert the value of the creators field.

• @title;magicstop@ tells EPrints to insert the value of the title field, followed by a full stop unless the title already ends in a full stop.

• @date_effective;res=year@ tells EPrints to insert the “year” part of the date_effective field.

• A link tag in the citation tells EPrints where to insert a link to the item.

To create a citation style for jpg files, just copy the document_other citation:

    @format@ (@formatdesc@) - @security@

In this case, the conditional tags around each field tell EPrints to insert the value of the named field only if its value is actually defined.
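The @...@ instructions in the photograph citation can be mimicked in a few lines of Python. This is a rough sketch of the behaviour described in the bullets, not the EPrints citation renderer. The record values are made up, the function name is hypothetical, and the sketch assumes dates are stored as YYYY-MM-DD.

```python
# Illustrative sketch only, not EPrints code: it mimics the @field@,
# ;magicstop and ;res=year instructions on a plain-text pattern.
import re

PATTERN = "@creators@ @title;magicstop@ (@date_effective;res=year@)"


def render(pattern, record):
    """Replace each @field;option@ instruction with the field's value."""
    def substitute(match):
        field, _, option = match.group(1).partition(";")
        value = str(record.get(field, ""))
        if option == "magicstop" and not value.endswith("."):
            value += "."       # add a full stop unless one is already there
        elif option == "res=year":
            value = value[:4]  # assumes dates are stored as YYYY-MM-DD
        return value
    return re.sub(r"@([^@]+)@", substitute, pattern)


# Made-up example record.
record = {
    "creators": "Smith, J.",
    "title": "Harbour at dawn",
    "date_effective": "2005-06-14",
}
print(render(PATTERN, record))
```

A title that already ends in a full stop is left alone, so the rendered citation never shows two full stops in a row.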

6.4 Add JPG to required formats

By default, EPrints requires that users submit a file of type html, pdf, ps, or ascii. This requirement isn’t helpful to photographers! Tell EPrints that a jpg file also satisfies this requirement. Edit:

    /opt/eprints2/archives/myid/cfg/ArchiveConfig.pm

Find the lines:

    $c->{required_formats} = [ "html", "pdf", "ps", "ascii" ];

And add the jpg format to the list.
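One way to read this setting is that a deposit passes the check as soon as one of its uploaded files is in a listed format. The Python sketch below illustrates that reading only. It is a hypothetical check, not the validation code EPrints runs, and the "at least one file" interpretation is an assumption.

```python
# Illustrative sketch only: a hypothetical check, not EPrints validation
# code. It assumes one file in a listed format is enough to pass.
DEFAULT_FORMATS = ["html", "pdf", "ps", "ascii"]
FORMATS_WITH_JPG = DEFAULT_FORMATS + ["jpg"]


def has_required_format(document_formats, required):
    """True if at least one uploaded document is in a required format."""
    return any(fmt in required for fmt in document_formats)


photo_deposit = ["jpg"]
print(has_required_format(photo_deposit, DEFAULT_FORMATS))   # rejected
print(has_required_format(photo_deposit, FORMATS_WITH_JPG))  # accepted
```

Under this reading, a deposit consisting only of a photograph is refused until jpg is added to the list.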

6.5 Reload the configuration

    eprints> bin/force_config_reload myid

Note that we don’t have to erase and rebuild the database as we did in the previous exercise because we have not defined any new fields for the photo type.

6.6 Check it worked

• Load the home page of the demo repository in your browser.

• Go to the User Area.

• Select Begin new Item.

• Select Photograph and click Next.

• Check that the fields you defined for photograph (title, creators, date_issue) are shown.

• Check that you can upload a JPG file.

6.7 Notes

• The list of deposit types can also be restricted, for example to create theses-only repositories. For examples, visit: http://www.eprints.org/software/examples/#theses

Exercise 7: Adding an Organisational Structure

Exercise 4 added an alternative classification scheme to EPrints, which involved editing the subjects file. In fact, the name of the subjects file is a slight misnomer: it can actually be used to specify any tree-like classification. A popular customisation is for institutions to add a separate “subject tree” describing their organisational structure. This means that deposits can be associated with the departments or research groups from which they originated.

In this exercise you will add a second “subject” tree describing the organisational structure of your institution.

7.1 Add your organisational structure

Edit:

    /opt/eprints2/archives/myid/cfg/subjects

Add another "subject" tree to the subjects file which describes the organisational structure of your university. Here's the first line of the University of Southampton's organisational structure (http://www.soton.ac.uk/about/academicschools/index.html) to get you started:

    uos:University of Southampton:ROOT:0

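7.2 Add a new metadata field

The field you are going to add will allow users to select the departments that were involved in producing the item. Edit:

    /opt/eprints2/archives/myid/cfg/ArchiveMetadataFieldsConfig.pm

Find the entry for the referencetext field and add a comma after it, then add:

    { name => "depts", type=>"subject", top=>"org", multiple => 1 },

The top value names the subjectid at the root of the tree that users choose from, so it must match the root line you added in 7.1: either use org as the subjectid of your root, or change top to uos if you kept the Southampton example.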

7.3 Add human-readable name and description for new field

Add a human-readable name and description of the depts field. Edit:

    /opt/eprints2/archives/myid/cfg/phrases-en.xml

Add the following new phrases:

• Name: Department(s)

• Description: Select the department(s) involved in producing this item.

7.4 Add the field to certain types of EPrints

Make the depts field optional for records of type book. Edit:

    /opt/eprints2/archives/myid/cfg/metadata-types.xml

Find the book eprint type and add a field entry for depts after the editor field.

7.5 Erase and rebuild the database

    eprints> bin/indexer stop
    eprints> bin/erase_archive myid --force
    eprints> bin/create_tables myid
    eprints> bin/indexer start
    eprints> bin/import_subjects myid --force
    eprints> bin/generate_static myid

7.6 Reload the configuration

    eprints> bin/force_config_reload myid

7.7 Create a User Account

    eprints> bin/create_user myid USERNAME EMAIL admin PASSWORD

7.8 Check it worked

• Load the home page of the demo repository in your browser.

• Go to the User Area.

• Log in using the account you just created.

• You will be asked to enter your name. This is because by default a user record is not valid unless it has a name. Click Update Record.

• Select Begin new Item.

• Select Book and click Next.

• Check that the new depts field is shown.

Exercise 8: Configuring Searching and Browsing

The following tasks do not have step-by-step instructions. If you run into problems or have any questions, please ask!

8.1 Add a new “browse by” view

The default EPrints configuration allows users to browse a repository by year or by subject. Add a “browse by project code” option to the demo repository, using the field you created in Exercise 5. Hints:

• Views are configured in ArchiveConfig.pm under browse_views.

• The id of the view determines the phrase used for its title.

• After reloading the configuration, run generate_views to update the views (add a few submissions first so that your views actually contain some data!)

8.2 Add a new search form

The default EPrints configuration has two search forms: simple and advanced. Add a new search form to the demo repository which allows users to search on title and/or project code. Hints:

• Search forms are configured in ArchiveConfig.pm.

• Copy the simple search definition and modify it.

• You could also add a new phrase as introductory text for your search form.

• Don’t forget to run force_config_reload.

• The id you give your search form ($c->{search}->{YOURID}) determines its URL: http://machinename.ecs.soton.ac.uk/perl/search/YOURID
