Transforming MARCXML Records Using XSLT
Total Page:16
File Type:pdf, Size:1020Kb
Transforming MARCXML Records Using XSLT Violeta Ilik Head, Digital Systems & Collection Services Digital Innovations Librarian Galter Health Sciences Library Northwestern University Clinical and Translational Sciences Institute Chicago, IL March 4, 2015 Teun Hocks Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + ALCTS - Library Resources & Technical Services n Ilik, V., Storlien, J., and Olivarez, J. (2014). “Metadata Makeover: Transforming MARC Records Using XSLT.” Library Resources and Technical Services 58 (3): 187-208. “The knowledge gained from learning information technology can be used to experiment with methods of transforming one metadata schema into another using various software solutions” Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + Why Catalogers? n Survey conducted by Ma in 2007 revealed that the metadata qualifications and responsibilities required knowledge of MARC, crosswalks, XML, OAI … n Analysis of cataloging position description performed by Park, Lu, and Marion reveal that advances in technology have created a new realm of desired skills, qualifications and responsibilities for catalogers. n Terry Reese’s MarcEdit is a best friend to everyone in technical services n Tosaka stresses the importance of metadata transformation to enable reuse Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + Recommendations: n Cataloging departments need to be proactive in creation and maintenance of non-MARC metadata AND in the development of means for sharing the metadata n Catalogers need to participate in development and use of descriptive standards n Catalogers need to participate in development of semantic web technologies; creation of semantic web compliant data Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + Outline: n MARC data – challenges n XML family of standards n XML & XSLT n Metadata Map n Metadata transformation (live demo) n Resources n Conclusion Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + MARC data – challenges Common issues - metadata and crosswalks The four categories of metadata problems according to Dushay and Hillman: n missing data n incorrect or erroneous data n confusing or inconsistent data n insufficient data Naomi Dushay and Diane I. Hillman, “Analyzing Metadata for Effective Use and Re-use,” in Proceedings of the 2003 International Conference on Dublin Core and Metadata Applications: Supporting Communities of Discourse and Practice - Metadata Research & Applications (n.p.: Dublin Core Metadata Initative, 2003) Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + MARC data – challenges The Challenges: n ambivalent matches n hybrid bibliographic records n data mapping to multiple fields or combining into single fields during migration n orphaned data parsed into incongruous fields n mixed standards in original data n MARC data loss during the migration n flat structure versus hierarchical structures (Mary S. Woodley et al., “Crosswalks, Metadata Harvesting, Federated Searching, Metasearching: Using Metadata to Connect Users and Information,” in Introduction to Metadata, online edition, version 3.0 (Los Angeles: J. Paul Getty Trust, 2008) Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + MARC data – challenges The Challenges (continued): n reconciling metadata organization systems n choice of unanalogous processes during metadata standards creation n imprecise definitions or alternate naming choices that inhibit element to element mapping n information being lost or combined during mapping n unharmonious hierarchical structures (Margaret St. Pierre and William P. LaPlant Jr., “Issues in Crosswalking Content Metadata Standards,” (white paper, National Information Standards Organization (NISO) Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + The XML family of standard n Schema languages n XSLT n Xquery n Xpath n XSL-FO n SVG n Namespaces n Schematron n Xproc n Regular expressions n Xforms - An XML-oriented replacement for the forms used on the Web on commercial sites and elsewhere. David Birnbaum, 2012. “What is XML and why should humanists care? An even gentler introduction to XML” Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + XML family of standards Introduction to XML n Ordered hierarchy of content objects (OHCO); three views of XML (containers, tree, serialization) n Well-formedness: root element, matching tags, no overlapping elements, quoted attribute values, proper name characters, no reserved characters (ampersand, angle brackets) n Document analysis (hierarchy; chunks and in-line elements) n Elements and attributes n Descriptive vs presentational markup n Validity (vs well-formedness) Birnbaum, D. & Ilik, V. “The Role of XSLT in Digital Libraries, Editions, and Cultural Exhibits” TPDL 2013, Valletta, Malta - Sept. 22nd 2013 Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + XML family of standards Ordered hierarchy n XML is a hierarchical tree n Tags are not toggle switches that turn properties on and off. n When you’re writing XML, insert the whole element, with both the start and end tags David Birnbaum, 2012. “What is XML and why should humanists care? An even gentler introduction to XML” Hosted by ALCTS, Association for Library Collections and Technical Services + XML family of standards Rules of well-formedness: - every start tag must have a closing tag <tag> </tag> Or <tag/> - Tags must nest cleanly <creator><name>Birnbaum, David</ name></creator> - Attribute values must appear within quotation marks <page n=“12”/> - Tags are case sensitive and they must match <creator></creator> Or <CREATOR></CREATOR> - Single root element - The left angle bracket and ampersand are special characters Birnbaum, D. & Ilik, V. “The Role of XSLT in Digital Libraries, Editions, and Cultural Exhibits” TPDL 2013, Valletta, Malta - Sept. 22nd 2013 Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + XML family of standards Document analysis (hierarchy; chunks and in-line elements) n The life cycle of a project starts by determining the structural hierarchy and the semantics Birnbaum, D. & Ilik, V. “The Role of XSLT in Digital Libraries, Editions, and Cultural Exhibits” TPDL 2013, Valletta, Malta - Sept. 22nd 2013 Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + XML family of standards Elements & attributes n An element consists of a start tag, an end tag, and content, which is whatever occurs between the tags. n XML elements may have four types of content: n Element content n Text content n Mixed content n Empty element n Attributes – provide supplementary information about an element Birnbaum, D. & Ilik, V. “The Role of XSLT in Digital Libraries, Editions, and Cultural Exhibits” TPDL 2013, Valletta, Malta - Sept. 22nd 2013 Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + XML family of standards Descriptive vs presentational markup n Descriptive markup is usually mapped onto procedural markup. It is designed to support an open class of applications like information retrieval n Presentational markup is designed for reading as it clarifies the presentation of a document Birnbaum, D. & Ilik, V. “The Role of XSLT in Digital Libraries, Editions, and Cultural Exhibits” TPDL 2013, Valletta, Malta - Sept. 22nd 2013 Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + XML family of standards Validity (vs well-formedness) n This is a technical term that means that the document uses only certain elements, and that it uses them only in certain contexts Birnbaum, D. & Ilik, V. “The Role of XSLT in Digital Libraries, Editions, and Cultural Exhibits” TPDL 2013, Valletta, Malta - Sept. 22nd 2013 Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + XML family of standards Overview: XSLT n ”XSLT (eXtensible Stylesheet Language Transformations) is one way to transform your document, manipulate the tree, and output the results as XML, HTML, SVG, or plain text.” n An XSLT stylesheet is an XML document and must be valid against the XSLT schema. n The root element is <xsl:stylesheet> - elements inside the root are primarily <xsl:template> elements - template elements typically have a @match attribute n XSLT is a declarative programming language Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + XML family of standards Namespaces n Input namespace n Output namespace <xsl:stylesheet xmlns:marc="http://www.loc.gov/MARC21/slim" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0" exclude-result-prefixes="marc"> <xsl:import href="MARC21slimUtils.xsl"/> <xsl:output method="xml" indent="yes"/> Violeta Ilik vivoweb.org Hosted by ALCTS, Association for Library Collections and Technical Services + XML family of standards Controlling the output with <xsl:output> n <xsl:output method="xml"