Coding for Efficiencies in Cataloging and Metadata: Practical Applications of XML, XSLT, XQuery, and PyMarc for Library Data

Software Recommendations and Installation Instructions

June 25, 8:30 a.m. – 4:30 p.m.

An ALCTS Preconference of the ALA Annual Conference, San Francisco, CA

XML Technologies for Catalogers and Metadata Librarians, Parts I & II: Links and Installation Tips

The preconference morning sessions will include hands-on exercises and demonstrations in creating and editing XML cataloging and metadata records and will introduce the use of XSLT, the XML eXtensible Stylesheet Language for Transformations. The recommended software application so that you can follow along and complete the hands-on exercises during these sessions is SyncRO Soft's oXygen XML Editor. Below we provide links for downloading this application and getting a temporary 30-day trial license, and tips for installing oXygen on your laptop. oXygen is a powerful tool. We will only be exploring a small fraction of its menus. Do not be put off by the extent of its menus and options. (Alternatives to oXygen are discussed on the next page, but keep in mind, the presenters will be working primarily in oXygen.)

1. oXygen is a commercial, general-purpose XML editing and development tool. It can be used to create and edit XML and HTML, as well as to create and test XML schemas and XSLT stylesheets. It has aggressive academic pricing and a 30-day free trial license.
- Product home page: http://www.oxygenxml.com/
- Download from: http://www.oxygenxml.com/download_oxygenxml_editor.html. Please read the minimum requirements before installing. Both Windows & Mac operating systems are supported. Both options require Java (included as part of the oXygen download in case you do not already have a current version of Java installed).
- Windows 32-bit & 64-bit downloads are self-extracting install executables (.exe files). After download, run (double-click) the file and follow the install wizard instructions.
- Mac download is a zip file or a tar-gzip file. Once downloaded, installation is accomplished by unzipping or extracting in the usual way.
- In order to use oXygen (all platforms) you must paste in a valid license key. Keys are emailed to you. The first time you start oXygen, you will be prompted to paste in the key you were emailed. Simply copy and paste.
- To register for a 30-day trial license key, go to: http://www.oxygenxml.com/register.html. You can also request a trial license key or update your key by selecting 'Register…' from the 'Help' menu.
- For academic pricing: http://www.oxygenxml.com/buy_new_licenses_academic.html

Other Software

We will also be reviewing ways to create XML files directly from MARC records, to view XML files in your Web browser, and to edit simple XML files using generic plain text editors. So that you can follow along on these topics, you may also want to verify your Web browser's ability to display raw XML. Please consider downloading and installing a plain text editor and Terry Reese's MarcEdit tool. Finally, for completeness, we also include below links for installing two alternatives to oXygen, Altova's XML Spy and Stylus Studio XML Editor, but installation of these products is not required unless for some reason oXygen is not an option for you.

2. MarcEdit is a utility software application used by many catalogers and metadata librarians who routinely work with MARC records. It was developed by Terry Reese, and the current version is 6 (update released on 26 May 2015). General download information is available from: http://marcedit.reeset.net/downloads. The simple freeware license is here: http://marcedit.reeset.net/marcedit-end-user-license-agreement.

Download for Windows 32-bit Operating System: http://marcedit.reeset.net/software/MarcEdit_Setup.msi - Simply save this file to your computer, run it, agree to the license and tell the installer where to put the application on your hard drive.

Download for Windows 64-bit Operating System: http://marcedit.reeset.net/software/MarcEdit_Setup64.msi - Simply save this file to your computer, run it, agree to the license and tell the installer where to put the application.

Download for Mac OS X, version 10.6 and later:
- The Mono Framework, version 3.4 or later, must be installed first! Get and run the self-installing package (.pkg file) for Mac OS X from: http://www.mono-project.com/download
- Then download MarcEdit: http://marcedit.reeset.net/software/MarcEdit_app.zip and install by unzipping. Run MarcEdit by clicking on the MarcEdit app.
- Detailed instructions: http://marcedit.reeset.net/marcedit-mac-installation-instructions

3. Web Browsers: Most modern Web browsers will display raw XML, and many will also display styled XML as transformed by an XSLT stylesheet. To verify that your Web browser will display simple XML, go to the following two links. Your Web browser window should look something like the screen shots shown on the next page.
- Raw XML: http://quest.library.illinois.edu/ala2015/ALCTS-XML/XMLExample/dc.xml
- Styled XML: http://quest.library.illinois.edu/ala2015/ALCTS-XML/XMLExample/dcStyle.xml

Display of a raw XML file in Chrome (Windows 7)

Display of an XML file after styling by XSLT in Chrome (Windows 7)

4. Plain-Text Editors: Any plain text editor can be used to create and edit XML files, but avoid word processors (like MS Word), which introduce special formatting characters into files when saved. Also, some plain text editors do a better job of displaying XML to make editing easier. A few good plain text editors to consider are listed below. These are free and easy to install. These tools do not have the testing and debugging features available in oXygen, but if you have at least one of these and a Web browser you'll be able to follow along for the morning sessions.

• Adobe Brackets for Mac & Windows (Adobe Systems, Inc.)
- Website: http://brackets.io/
- Optimized for editing HTML, CSS, JavaScript, etc., but color coding works well for creating well-formed XML and XSLT as well.

• Notepad++ for Windows (Don Ho)
- Website: http://notepad-plus-plus.org/
- As above, but with a few more options regarding encoding, coding language, display, etc.

• GNU Emacs, multiple operating systems (Free Software Foundation)
- Website: https://www.gnu.org/software/emacs/
- Works on many different platforms for a wide range of markup and coding languages, but Emacs began in the 1970s; many users today find the interface and advanced features of Emacs challenging to use and non-intuitive. For the novice, Emacs is only appropriate for simple, straightforward XML editing.
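Because plain text editors lack oXygen's validation features, a quick well-formedness check can catch an unclosed tag before you open the file in a browser. The following is only a minimal sketch using the Python 3 standard library; the function name and the two sample records are made up for illustration and are not part of the workshop files.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses as XML, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)


# Hypothetical sample records, not taken from the workshop files.
good = "<record><title>Moby Dick</title></record>"
bad = "<record><title>Moby Dick</record>"  # <title> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

This checks well-formedness only; it does not validate a record against a schema such as Dublin Core or MODS.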

5. XML Spy: (not free) XML Spy XML Editor (Altova) is a good full-feature XML editing tool that some prefer to oXygen. If your institution currently licenses XML Spy, you may want to consider this tool instead of oXygen.
- Website: http://www.altova.com/xmlspy.html
- Full range of functionality, comparable to oXygen, with modest, mostly easy-to-recognize differences in look and feel.
- Educational discounts available (http://www.altova.com/edu-partnership.html)
- 30-day free trial available (http://www.altova.com/download-trial.html)
- Installation similar to oXygen, but follow the instructions on the Website.

6. Stylus Studio: (not free) Stylus Studio XML Editor comes in 3 versions at various price points, including an inexpensive Home Edition, but the Home Edition does not support XQuery and a number of other advanced features.
- Website: http://www.stylusstudio.com/ (also: http://www.stylusstudio.com/buy/ and http://www.stylusstudio.com/buy/compare.html)
- Professional and Enterprise versions are comparable to oXygen and XML Spy; the Home Edition has fewer features, but most of what will be needed for the workshop.
- Academic pricing available (http://www.stylusstudio.com/buy/academic_pricing.html)
- 15-day free trial available (http://www.stylusstudio.com/xml_download.html)

If you have any questions, contact Tim Cole – [email protected].

Using XQuery as a Library Metadata Tool: Setup Instructions to Prepare for Hands-on Exercises

Half of this session will be spent on hands-on exercises. To experience working with XQuery and get the full benefit from this session, there are a couple of things to prepare on your laptop before arriving. Please download and install BaseX, an easy-to-use, open-source XQuery processor. Also, download a dataset of freely available MODS documents.

Download Dataset of MODS documents

On your hard drive, create a folder in “DOCUMENTS” or “MY DOCUMENTS” and name it “mods-files.”

Go to University of Illinois Library Catalog Datasets: http://catalogdata.library.illinois.edu/

Under Available Datasets => UIUC Library created catalog records => MODS with WorldCat, VIAF and LCSH links (http://catalogdata.library.illinois.edu/mods-open/)

Download the last zip file on the list, 7.zip, to the mods-files folder.

Unzip the 7.zip file. The UIU_1.xml file contains the MODS records we will use for the XQuery exercises.
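If you prefer to script the unzip step, the sketch below does the same thing with the Python 3 standard library. It is an illustration only: the function name is hypothetical, and the path in the commented usage line assumes the mods-files folder sits inside a Documents folder in your home directory.

```python
import zipfile
from pathlib import Path


def unzip_dataset(zip_path, target_dir):
    """Extract every member of zip_path into target_dir; return the XML file names found."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return sorted(p.name for p in target.rglob("*.xml"))


# Example call (assumed folder location; adjust to where you saved 7.zip):
# folder = Path.home() / "Documents" / "mods-files"
# print(unzip_dataset(folder / "7.zip", folder))
```

The returned list lets you confirm that the expected XML file is present before moving on to the BaseX steps.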

Download and Install BaseX (XQuery processor)

The XQuery processor for the exercises is BaseX. For both Windows/Mac environments, check to make sure Java 7 is installed on your computer, since Java 7 is required for the current version of BaseX.
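One way to check for Java is to run java -version in a terminal. The sketch below wraps that command using the Python 3 standard library (Python 3.7 or later); the function name is hypothetical, and you still need to read the returned banner yourself to confirm the version. Java 7 normally reports itself as version 1.7.

```python
import shutil
import subprocess


def java_version_banner(executable="java"):
    """Return the text printed by `java -version`, or None if the command is not on the PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    # `java -version` writes its banner to stderr rather than stdout.
    result = subprocess.run([path, "-version"], capture_output=True, text=True)
    return (result.stderr or result.stdout).strip()


banner = java_version_banner()
print(banner if banner is not None else "Java was not found on the PATH")
```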

Go to http://basex.org/ and Choose Download BaseX 8.2

Windows installation:

• Choose Windows Installer BaseX 8.2.exe to download and install BaseX
• After downloading, double-click the BaseX82.exe file and follow the installation instructions.
• Double-click the BaseX icon to open the BaseX GUI

Mac installation:

• Choose the Core Package BaseX 8.2.jar to download BaseX
• After downloading, double-click on the jar file in the Downloads folder
• Go into Preferences/Security to confirm that you want to open this file for the BaseX GUI to open
• While not required, to keep the BaseX installation, move the BaseX 8.2.jar file to the Applications folder

Another option for Mac computers is: Choose Other Distributions => Mac OSX ()

For Mac users, I’d recommend installing BaseX using Homebrew.

First, install Homebrew (http://brew.sh/) using the command line.

Then install BaseX (http://brewformulas.org/Basex) using the command line.

You will then need to start the application from the command line with the command: basexgui

Setting up database of MODS records in BaseX

The next step is to use the MODS dataset to set up a database for the session exercises.

In BaseX:

Choose Database => New

On the General tab:

Input file or directory: Browse to the location of the XML document that contains the MODS records, UIU_1.xml

Name of database: Name the database “mods-files” (the name must match the one used in the collection() call in the main module)

Then choose the Parsing tab and uncheck “Chop whitespaces,” since we want to retain the whitespace in the MODS records. For example, when a title begins with the article “The,” we want to retain the space after “The.”

Click OK to complete the process. The database is now ready for writing and running queries.
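To see why the whitespace option matters, the sketch below imitates chopping with Python's strip() on a made-up MODS title. It is a simulation in the Python 3 standard library, not BaseX itself, and the sample fragment is hypothetical rather than taken from the workshop dataset.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"

# Hypothetical MODS fragment: the leading article is stored with its trailing space.
sample = (
    '<mods:titleInfo xmlns:mods="http://www.loc.gov/mods/v3">'
    "<mods:nonSort>The </mods:nonSort>"
    "<mods:title>whale</mods:title>"
    "</mods:titleInfo>"
)

title_info = ET.fromstring(sample)
non_sort = title_info.findtext("{%s}nonSort" % MODS_NS)
title = title_info.findtext("{%s}title" % MODS_NS)

kept = non_sort + title             # whitespace retained
chopped = non_sort.strip() + title  # what chopping leading/trailing whitespace would leave

print(repr(kept))
print(repr(chopped))
```

With the space retained, the two elements join into a readable title; once the space is chopped, the article runs into the next word.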

XQuery Main Module

Copy and paste this main module into the file workspace of the BaseX Editor.

xquery version "3.0";

(: Add comment here to describe query :)

declare namespace mods = "http://www.loc.gov/mods/v3";

(: Source XML document contains 17,650 MODS records :)
let $docs := collection("mods-files")
return $docs

Click on the green arrow above the workspace to run the query. View the MODS documents in the Result view.
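If you would like to sanity-check the dataset outside BaseX, a rough Python analogue of the query is sketched below. It uses the same MODS namespace as the main module, but everything else is an assumption: the function names are hypothetical, the two-record sample is made up, and the modsCollection wrapper element is assumed rather than confirmed for the workshop file. The code uses only the Python 3 standard library.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"

# Hypothetical two-record sample; the real file is far larger.
sample = (
    '<modsCollection xmlns="http://www.loc.gov/mods/v3">'
    "<mods><titleInfo><title>First record</title></titleInfo></mods>"
    "<mods><titleInfo><title>Second record</title></titleInfo></mods>"
    "</modsCollection>"
)


def count_mods_records(xml_text):
    """Count the mods:mods elements anywhere in the document."""
    root = ET.fromstring(xml_text)
    return sum(1 for _ in root.iter("{%s}mods" % MODS_NS))


def first_titles(xml_text):
    """Return the first titleInfo/title text of each mods:mods record (None if absent)."""
    root = ET.fromstring(xml_text)
    path = "{%s}titleInfo/{%s}title" % (MODS_NS, MODS_NS)
    return [record.findtext(path) for record in root.iter("{%s}mods" % MODS_NS)]


print(count_mods_records(sample))
print(first_titles(sample))
```

To run the same functions on a downloaded file, read its contents into a string first, or adapt them to use ET.parse with the file path.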

Save this query by selecting Editor => Save As => exercise1. The file will be saved with file extension .xq as an XQuery file.

If you would like to simplify the BaseX workspace, go to View and check off: Editor, Result, Buttons, and Status Bar. We only need the Editor and Result views for this session.

This completes the preparation for the hands-on part of the XQuery session.

Reading ahead (for the ambitious!)

While not required, if you’d like to take a look at XQuery before the session, I recommend reading the first two chapters of Priscilla Walmsley’s book, XQuery: Search Across a Variety of XML Data. Sebastopol, CA: O'Reilly. 2007.

For help, or if you have any questions, contact Christine Schwartz – [email protected]

Using Python & PyMARC for MARC Record Analysis and Manipulation: Prerequisites for Hands-On Exercises

Given the limited time allotment for the session, general Python programming will not be covered. The focus will be on an introduction to PyMARC (which is a Python library), and the core PyMARC functions used for analyzing and modifying MARC records.

The last 30-45 minutes of this session will be spent on hands-on exercises. To participate in the exercises, you will need to have both Python and the PyMARC library installed, and you will need to download the demo code and sample file of MARC records found on GitHub at: https://github.com/hfrank71/presentations/tree/master/pymarc_20150625

The demo code and examples are based on Python version 2.7. While the code may work fine with Python 3, it has not been tested.

There are many online tutorials for getting Python and PyMARC installed. Here are some pointers:

Mac setup:

Mac OS X Yosemite should already include Python 2.7.

If you do need to install Python on your Mac: Download Python for Mac: https://www.python.org/downloads/mac-osx/

Installing Python on Mac: http://docs.python-guide.org/en/latest/starting/install/osx/

Windows setup:

Download Python for Windows: https://www.python.org/downloads/windows/

Installing Python on Windows: http://docs.python-guide.org/en/latest/starting/install/win/ and http://www.howtogeek.com/197947/how-to-install-python-on-windows/

Once Python is installed, you need to install the PyMARC library: https://github.com/edsu/pymarc#installation
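After installing, you can confirm that Python can find the library before the session. The sketch below is one possible check, not part of the demo code; the function names are hypothetical. It is written for Python 3 (importlib.util.find_spec does not exist in Python 2.7); under Python 2.7, wrapping "import pymarc" in a try/except ImportError block does the same job.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the named top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


def environment_report():
    """Summarize the running Python version and whether pymarc can be found."""
    return {
        "python_version": "%d.%d" % sys.version_info[:2],
        "pymarc_installed": module_available("pymarc"),
    }


print(environment_report())
```

If pymarc_installed is reported as False, repeat the PyMARC installation step with the same Python installation you plan to use during the exercises.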

For help, or if you have any questions, contact Heidi Frank – [email protected]