Transforming MARCXML Records Using XSLT

Violeta Ilik Head, Digital Systems & Collection Services Digital Innovations Librarian Galter Health Sciences Library Northwestern University Clinical and Translational Sciences Institute Chicago, IL March 4, 2015

Teun Hocks Violeta Ilik vivoweb.org

Hosted by ALCTS, Association for Library Collections and Technical Services + ALCTS - Library Resources & Technical Services

n Ilik, V., Storlien, J., and Olivarez, J. (2014). “ Makeover: Transforming MARC Records Using XSLT.” Library Resources and Technical Services 58 (3): 187-208.

“The knowledge gained from learning information technology can be used to experiment with methods of transforming one metadata schema into another using various software solutions”

Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + Why Catalogers?

n Survey conducted by Ma in 2007 revealed that the metadata qualifications and responsibilities required knowledge of MARC, crosswalks, XML, OAI …

n Analysis of cataloging position description performed by Park, Lu, and Marion reveal that advances in technology have created a new realm of desired skills, qualifications and responsibilities for catalogers.

n Terry Reese’s MarcEdit is a best friend to everyone in technical services

n Tosaka stresses the importance of metadata transformation to enable reuse

Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + Recommendations:

n Cataloging departments need to be proactive in creation and maintenance of non-MARC metadata AND in the development of means for sharing the metadata

n Catalogers need to participate in development and use of descriptive standards

n Catalogers need to participate in development of technologies; creation of semantic web compliant data

Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + Outline:

n MARC data – challenges

n XML family of standards n XML & XSLT

n Metadata Map

n Metadata transformation (live demo)

n Resources

n Conclusion

Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + MARC data – challenges Common issues - metadata and crosswalks

The four categories of metadata problems according to Dushay and Hillman:

n missing data

n incorrect or erroneous data

n confusing or inconsistent data

n insufficient data

Naomi Dushay and Diane I. Hillman, “Analyzing Metadata for Effective Use and Re-use,” in Proceedings of the 2003 International Conference on Dublin Core and Metadata Applications: Supporting Communities of Discourse and Practice - Metadata Research & Applications (n.p.: Dublin Core Metadata Initative, 2003)

Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + MARC data – challenges The Challenges:

n ambivalent matches

n hybrid bibliographic records

n data mapping to multiple fields or combining into single fields during migration

n orphaned data parsed into incongruous fields

n mixed standards in original data

n MARC data loss during the migration

n flat structure versus hierarchical structures

(Mary S. Woodley et al., “Crosswalks, Metadata Harvesting, Federated Searching, Metasearching: Using Metadata to Connect Users and Information,” in Introduction to Metadata, online edition, version 3.0 (Los Angeles: J. Paul Getty Trust, 2008) Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + MARC data – challenges The Challenges (continued):

n reconciling metadata organization systems

n choice of unanalogous processes during metadata standards creation

n imprecise definitions or alternate naming choices that inhibit element to element mapping

n information being lost or combined during mapping

n unharmonious hierarchical structures

(Margaret St. Pierre and William P. LaPlant Jr., “Issues in Crosswalking Content Metadata Standards,” (white paper, National Information Standards Organization (NISO)

Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + The XML family of standard

n Schema languages n XSLT n Xquery n Xpath n XSL-FO n SVG n Namespaces n Schematron n Xproc n Regular expressions n Xforms - An XML-oriented replacement for the forms used on the Web on commercial sites and elsewhere.

David Birnbaum, 2012. “What is XML and why should humanists care? An even gentler introduction to XML”

Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + XML family of standards Introduction to XML

n Ordered hierarchy of content objects (OHCO); three views of XML (containers, tree, serialization)

n Well-formedness: root element, matching tags, no overlapping elements, quoted attribute values, proper name characters, no reserved characters (ampersand, angle brackets)

n Document analysis (hierarchy; chunks and in-line elements)

n Elements and attributes

n Descriptive vs presentational markup

n Validity (vs well-formedness)

Birnbaum, D. & Ilik, V. “The Role of XSLT in Digital Libraries, Editions, and Cultural Exhibits” TPDL 2013, Valletta, Malta - Sept. 22nd 2013

Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + XML family of standards Ordered hierarchy

n XML is a hierarchical tree

n Tags are not toggle switches that turn properties on and off.

n When you’re writing XML, insert the whole element, with both the start and end tags

David Birnbaum, 2012. “What is XML and why should humanists care? An even gentler introduction to XML”

Hosted by ALCTS, Association for Library Collections and Technical Services + XML family of standards Rules of well-formedness:

- every start tag must have a closing tag Or

- Tags must nest cleanly Birnbaum, David

- Attribute values must appear within quotation marks

- Tags are case sensitive and they must match

Or

- Single root element

- The left angle bracket and ampersand are special characters

Birnbaum, D. & Ilik, V. “The Role of XSLT in Digital Libraries, Editions, and Cultural Exhibits” TPDL 2013, Valletta, Malta - Sept. 22nd 2013 Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + XML family of standards Document analysis (hierarchy; chunks and in-line elements)

n The life cycle of a project starts by determining the structural hierarchy and the

Birnbaum, D. & Ilik, V. “The Role of XSLT in Digital Libraries, Editions, and Cultural Exhibits” TPDL 2013, Valletta, Malta - Sept. 22nd 2013 Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + XML family of standards Elements & attributes

n An element consists of a start tag, an end tag, and content, which is whatever occurs between the tags. n XML elements may have four types of content: n Element content n Text content n Mixed content n Empty element

n Attributes – provide supplementary information about an element

Birnbaum, D. & Ilik, V. “The Role of XSLT in Digital Libraries, Editions, and Cultural Exhibits” TPDL 2013, Valletta, Malta - Sept. 22nd 2013 Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + XML family of standards Descriptive vs presentational markup

n Descriptive markup is usually mapped onto procedural markup. It is designed to support an open class of applications like information retrieval

n Presentational markup is designed for reading as it clarifies the presentation of a document

Birnbaum, D. & Ilik, V. “The Role of XSLT in Digital Libraries, Editions, and Cultural Exhibits” TPDL 2013, Valletta, Malta - Sept. 22nd 2013 Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + XML family of standards Validity (vs well-formedness)

n This is a technical term that means that the document uses only certain elements, and that it uses them only in certain contexts

Birnbaum, D. & Ilik, V. “The Role of XSLT in Digital Libraries, Editions, and Cultural Exhibits” TPDL 2013, Valletta, Malta - Sept. 22nd 2013 Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + XML family of standards Overview: XSLT

n ”XSLT (eXtensible Stylesheet Language Transformations) is one way to transform your document, manipulate the tree, and output the results as XML, HTML, SVG, or plain text.”

n An XSLT stylesheet is an XML document and must be valid against the XSLT schema.

n The root element is - elements inside the root are primarily elements - template elements typically have a @match attribute

n XSLT is a declarative programming language

Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + XML family of standards Namespaces

n Input namespace

n Output namespace

Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + XML family of standards Controlling the output with

n

n

Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + XML family of standards Standards @ the Library of Congress

The Library of Congress (LC) developed MARCXML architecture and MARCXML toolkit to standardize the exchange of MARC structured data in XML

Standards

Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + XML family of standards Introduction to the use of eXtensible Stylesheet Language Transformation (XSLT) for repurposing, editing and reformatting metadata

Review:

n Mapping compares and analyzes two or more metadata schemas, while crosswalks are the product of the mapping process

n each XSLT stylesheet describes how a set of XML documents (the source documents) should be converted to other documents (the result documents)

Difference between MARC and XML: XML uses beginning tags <> and ending tags Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + XML family of standards Two opposing views on crosswalks:

n crosswalks are a stopgap measure

n crosswalks represent an attempt to identify interoperable elements among standards

metadata can and should be reused, and libraries must ensure interoperability of their metadata for this very reason.

Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + Metadata Map 15 base Dublin Core elements:

n title; creator; subject; description; publisher; contributor; date; type; format; identifier; source; language; relation; coverage; rights

Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + Metadata Map

Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + Metadata map [cont.]

Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + How to get the MARCXML:

n Directly from OCLC

n via MarcEdit workflow (Export from OCLC the usual way >> convert to MARCXML via MarcEdit)

n Export from your ILS

n …

Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + Under Options window select “Record Characteristics” and under Bibliographic Records section and “Record Standard” select MARCXML, under “Character Set” select UTF-8 Unicode:

Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + Next steps:

n Export the selected bibliographic record. It will be stored in the new export destination file you selected

n Open the file with OxygenXML Editor (or any other similar editor) and you are ready to manipulate the XML file with a XSLT.

Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + Example MARCXML 00000ngm a2200000Ka 4500 ocn271676443 081114s2008 xx 059 vleng d v f c z a h z u The future of humanities scholarship in a digital world [videorecording]. 2008. 1 videocassette (59 min.) : sd., col. ; 1/5 in. Title from container. Laura Mandell. Recorded on August 1, 2008, in 105 Dartmouth Hall, Dartmouth College. Viewing copies available on DVD. DVCam. College films and programs. mim Mandell, Laura. Dartmouth College. Media Production Group. Dartmouth College. Research Computing. Violeta Ilik vivoweb.org

Services Technical Association for Collections and Library ALCTS, Hosted by + Live Demo …

Using OxygenXML Editor

Violeta Ilik vivoweb.org

Hosted by ALCTS, Association for Library Collections and Technical Services + Resources Cataloging Resources and tools:

n OCLC Bibliographic formats and standards (general MARC reference

n US Library of Congress MARC 21 format for bibliographic data (general MARC reference)

n MARCXML home (reference and tools)

n MarcEdit (freeware Windows-only application to edit MARC records, including individual and batch transformation

n Dublin Core Metadata Initiative (home)

n Metadata Object Description Schema (MODS) home (reference and tools)

n …

Violeta Ilik vivoweb.org

Hosted by ALCTS, Association for Library Collections and Technical Services + Resources XSLT and related references:

Online:

n FunctX XSLT function library

n Dave Pawson’s XSLT FAQ page

n w3schools tutorials and references

Books:

n Michael Kay’s XSLT 2.0 and XPath 2.0 programmer’s reference (reference)

n Jeni Tennison’s Beginning XSLT 2.0: From novice to professional (tutorial)

Violeta Ilik vivoweb.org

Hosted by ALCTS, Association for Library Collections and Technical Services + Resources Files on Git:

n Transforming-MARCXML-with-XSLT

https://github.com/vioil/Transforming-MARCXML-with-XSLT

Violeta Ilik vivoweb.org

Hosted by ALCTS, Association for Library Collections and Technical Services + Conclusion

n Catalogers have the potential to undertake metadata projects by active participation in metadata transformation

Teun Hocks

Violeta Ilik vivoweb.org

Hosted by ALCTS, Association for Library Collections and Technical Services + My teachers and mentors

n Dr. David Birnbaum, Chair, Slavic Languages & Literature Department, University of Pittsburg

n Dr. Laura Mandell, Director, Initiative for Digital Humanities, Media, and Culture; Professor, Department of English, Texas A&M University

n Matthew Gibson, Director of Digital Initiatives; Editor, Encyclopedia Virginia; Virginia Foundation for the Humanities

n Cristine Ruotuolo, Digital Services Manager for Humanities and Social Sciences and Bibliographer for English Language and Literature at the University of Virginia Library

Violeta Ilik vivoweb.org

Hosted by ALCTS, Association for Library Collections and Technical Services Thank you Violeta Ilik vivoweb.org

Teun Hocks

Hosted by ALCTS, Association for Library Collections and Technical Services