A Linked-Data Jazz Project for Stanford University Libraries
Linked-Data Jazz Project Report, December 2013

Contents: Goal; Background; Objectives; Audio surrogate example; Status; End notes; Images (1. Disc photos; 2. COM index); Experiment: RDI label + issue-number links to re-releases

For access to internal document links and external URLs, please read this document on a computer, tablet, etc.

Goal
Use linked data to rehabilitate and extend the reach of the explicit and implicit discographic relationships in the metadata that the Rigler and Deutsch Record Index (RDI) project built from more than 600,000 commercial 78 rpm sound recordings during the early 1980s.[1]

Background
The RDI project dealt with the historical sound collections at the Library of Congress and the New York Public Library, plus those at Stanford, Syracuse, and Yale universities. The work began by taking two high-resolution photographs of each disc side: one with lighting designed to capture the printed label, and a second aimed at highlighting information embossed into the disc surface (e.g., matrix numbers). Data-entry staff then created the metadata by keying input from enlarged images of the disc labels, projected onto screens from microfilm copies of the photos. A separate metadata record was created for each side of a disc, each side being equivalent to a separate recording session or take. It was common industry practice, especially with jazz and popular music, to release 78 rpm discs with different combinations of recording sessions/takes.
For example, compare Columbia issue #35660 with OKeh issue #8503. Both discs include the following session:
Louis Armstrong, Potato head blues (matrix number 80855-C), recorded in Chicago, May 10, 1927
2nd side of Columbia #35660: Louis Armstrong, Alligator crawl (matrix number 80854-B), recorded in Chicago, May 10, 1927
2nd side of OKeh #8503: Louis Armstrong, Put ’em down blues (matrix number 81302-B), recorded in Chicago, September 2, 1927

The RDI metadata was sorted into a number of sequences (e.g., composer+title, performer+title, archive+label+issue-number) and published as COM (computer output microfilm) indexes. Each entry in these indexes included the identification number of the microfilm copy of the photos from which the metadata was created. In response to a late change in the project plan, the vendor also produced computer tapes with the metadata formatted as brief MARC records, which were loaded into RLIN as a separately searchable database and later absorbed into WorldCat.

Objectives
With support from the Stanford University Libraries (SUL), the Linked-Data Jazz Project (L-DJazzP) seeks to:
1. Resurrect the original RDI metadata.
2. Restore the link between each metadata record and its respective pair of photos.
3. Pursue funding to convert the disc photographs into a permanent digital data store.
4. Filter the full set of RDI records to identify those representing jazz performances.
5. Reconcile the jazz-only subset of RDI metadata with RDI records from WorldCat.
6. Link RDI matrix and label+issue keys with re-releases in WorldCat.
7. Reconcile RDI matrix and label+issue keys with other 78 rpm disc metadata.

With respect to objective no. 2, the keys that link RDI metadata to its pair of source images will serve as an internet surrogate for visual access to archival discs in support of discographic research.
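The Columbia/OKeh example above is the kind of implicit relationship the project hopes to make explicit. A minimal Python sketch, assuming each RDI disc-side record can be reduced to a label, an issue number, a matrix number, and a title (the `DiscSide` type and its field names are hypothetical, not the actual RDI record layout), groups sides by matrix number so that any session/take issued on more than one disc surfaces automatically:

```python
from collections import defaultdict
from typing import NamedTuple


class DiscSide(NamedTuple):
    """One side of a 78 rpm disc (hypothetical, simplified record layout)."""
    label: str
    issue: str
    matrix: str
    title: str


def issues_by_matrix(sides):
    """Map each matrix number to the sorted (label, issue) pairs that carry it."""
    index = defaultdict(set)
    for side in sides:
        index[side.matrix].add((side.label, side.issue))
    return {matrix: sorted(issues) for matrix, issues in index.items()}


def shared_sessions(sides):
    """Keep only matrix numbers that appear on more than one issue."""
    return {m: issues for m, issues in issues_by_matrix(sides).items() if len(issues) > 1}


# The four disc sides from the Columbia/OKeh example in the text.
SIDES = [
    DiscSide("Columbia", "35660", "80855-C", "Potato head blues"),
    DiscSide("Columbia", "35660", "80854-B", "Alligator crawl"),
    DiscSide("OKeh", "8503", "80855-C", "Potato head blues"),
    DiscSide("OKeh", "8503", "81302-B", "Put 'em down blues"),
]

if __name__ == "__main__":
    for matrix, issues in shared_sessions(SIDES).items():
        print(matrix, issues)  # 80855-C [('Columbia', '35660'), ('OKeh', '8503')]
```

Keying on the full matrix number rather than the title keeps different takes, or different tunes sharing a title, apart; in practice the matrix values would first need the normalization problem described under Status item 4.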
In addition, the combination of metadata indexes linked to images of label information and embossed data will provide sound archives around the world with a tool to assist in evaluating incoming materials for potential acquisition into their holdings.

Regarding objective no. 6, tying discrete RDI matrix and label+issue data to the re-release of a specific performance in LP, CD, and online formats will serve as an internet audio surrogate in support of casual, amateur, and professional/archival discographic pursuits.

The following table illustrates the L-DJazzP's potential for fostering access to audio surrogates. It is a hand-built sampling of the links that could be built between RDI source metadata and latter-day LP, CD, and digital re-releases found in library holdings around the world. The connections shown here, for one recording session released on two 78 rpm disc sides, are derived from discographic information found in Rust[2] and Lord.[3] A more extensive demonstration of such links is provided as a supplement to this report.

Audio surrogate example

Louis Armstrong, Potato head blues
Tom Lord, The jazz discography: session ID A5612; Chicago, May 10, 1927; matrix #80855-C
Louis Armstrong And His Hot Seven: Louis Armstrong (cnt, vcl), John Thomas (tb), Johnny Dodds (cl), Lil Armstrong (p, vcl), Johnny St. Cyr (bj), Pete Briggs (tu), Baby Dodds (d)

Label names and issue numbers (includes 78 rpm discs plus LP, CD, and digital re-issues):
Portrait of the artist as a young man ... Columbia/Legacy, 1994
Portrait of the artist as a young man ... Columbia, n.d.
Louis Armstrong : portrait ... v.2 ... Smithsonian, 1994
ARMSTRONG, Louis : Heebie Jeebies ... Naxos Digital Srvcs, 2004
Heebie-jeebies : original recordings ... HRH International, 2001
ARMSTRONG, Louis : Heebie Jeebies ... Naxos Digital Srvcs, 2004
Hot jazz from New Orleans ... Musidisc, 1988
Hot five & hot seven, 1925-1928 ... Giants of Jazz, 1996
Portrait of the artist as a young man ... Joker, 1989
Hot five & hot seven, 1925-1928 ... S.I.A.E., 1990
Hot Five & Hot Seven, 1925-1928 ... Joker, 1988
Louis Armstrong Hot Five & Hot Seven ... SAAR, 1990
Louis Armstrong Hot Five & Hot Seven ... Mcps, 1990

Status
Many issues are unresolved at this point. Some are quite obvious (e.g., which extant and/or future schemas and vocabularies can provide a web-wide representation of the types of traditional and new linkages outlined in this report). Some will need considerable work (e.g., what are the licensing implications of using published discographic data as the basis of links between original releases and subsequent re-releases of jazz performances). Other issues will certainly surface as work on this project goes forward.

1. Original RDI metadata: Recovery of the RDI metadata began with a set of 31 mainframe data tapes stored in SUL's Archive of Recorded Sound. Of these tapes, 7 proved to contain ca. 4 million lines of RDI metadata. Unfortunately, analysis has shown that a substantial number of records are missing for the Library of Congress and the New York Public Library, with a modest gap in the coverage of the Syracuse collections. Follow-on work is underway to determine the cost to scan and OCR the label+issue+matrix sort of the COM index, in order to recover a full set of RDI metadata from 5 reels of microfilm. All of the index entries include tags for each separate metadata field.

2. Keys to photos: Both of the aforementioned sources of metadata (the data tapes and the COM index) include the keys that link each RDI record to the pair of disc images from which the RDI metadata was keyed. That link is a combination of the sound archive code, the film roll number, and the photo frame number.

3. Digital copies of disc photos: SUL is working to determine the potential cost of converting the disc photos into digital images.

4. Filters for jazz recordings: As demonstrated above in the Louis Armstrong example of links to audio surrogates, plans include using two discographic sources to filter the full RDI metadata set in order to identify recordings of jazz performances. The first-level filter will use a just-completed detailed parse of Brian Rust's Jazz and ragtime records.[2] The parse provides source data for building a MySQL database from Rust's data, an environment that lends itself to building a variety of multiply formatted indexes for issue-number and matrix-number values. These values consist of arbitrary combinations of alphanumeric characters spiced with spaces, dashes, periods, slashes, and other punctuation, so having access to an array of varied indexes will be essential to efficient and effective reconciliation of valid links across disparate pools of metadata, e.g., data found in different copies of RDI metadata, in entries from various jazz discographies, and in copies of local and WorldCat cataloging records. The second-level filter will use a gently paced background process that provides an unobtrusive pseudo-batch search interface to Tom Lord's online Jazz discography.[3] Here, the effort to identify jazz performances by matching individual 78 rpm disc sides to the recording sessions from which they originated will also include gathering citations for Lord's unique session IDs, as well as collecting data that documents re-releases of specific performances in latter-day LP, CD, and digital-service formats.

5. Links from RDI matrices and label+issue info to re-releases: Here the focus will be on creating flexible processes that can ingest and process discographic data identifying re-issues, as illustrated in the Louis Armstrong Potato head blues example above. Plans include work to identify re-releases in Stanford's catalog, in WorldCat, etc., and to explore the creation of links to online music services with jazz collections like those provided by Alexander Street Press, Naxos Music Services, and Classic Jazz Online. These processes will require careful attention to verifying that recordings pointed to by "music number" indexes of label+issue keys do in fact contain the specific sound recording in question.[4]

6. Reconciliation of RDI metadata and extended links with extant cataloging: This part of the work will be a two-way process. Linking RDI metadata with more complete cataloging records will expand access points to the RDI recordings.
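Status item 4 notes that issue and matrix numbers mix alphanumerics with spaces, dashes, periods, slashes, and other punctuation, so no single index form will catch every valid match. The project plans to build these indexes in MySQL; the Python sketch below only illustrates the idea of "multiply formatted indexes". Everything in it is an assumption for illustration: the three key forms (`exact`, `compact`, `base`), the separator set, the rule that a trailing letter after a digit marks the take (as in the "-C" of 80855-C), and the record IDs in the usage lines. Lookups try the strictest form first and report which form matched, so looser matches can be flagged for the kind of verification called for in item 5.

```python
import re
from collections import defaultdict

# Illustrative rules only; real label-specific conventions will differ.
SEPARATORS = re.compile(r"[\s\-./,;:_]+")
# Assumption: a single trailing letter that follows a digit marks the take.
TAKE_SUFFIX = re.compile(r"^(.*\d)[A-Z]$")


def index_keys(raw):
    """Return progressively looser normalized forms of an issue or matrix number."""
    exact = raw.strip().upper()
    compact = SEPARATORS.sub("", exact)  # all separator punctuation removed
    match = TAKE_SUFFIX.match(compact)
    base = match.group(1) if match else compact  # take letter removed, if present
    return {"exact": exact, "compact": compact, "base": base}


def build_index(records):
    """Index (record_id, raw_number) pairs under every normalized form."""
    index = {form: defaultdict(set) for form in ("exact", "compact", "base")}
    for record_id, raw in records:
        for form, key in index_keys(raw).items():
            index[form][key].add(record_id)
    return index


def lookup(index, raw):
    """Try the strictest form first; return (form matched, sorted record IDs)."""
    for form, key in index_keys(raw).items():
        hits = index[form].get(key)  # .get() does not create empty entries
        if hits:
            return form, sorted(hits)
    return None, []


if __name__ == "__main__":
    # Hypothetical record IDs from two sources keyed in different styles.
    idx = build_index([("rdi:1", "80855-C"), ("disco:1", "80855 C"), ("disco:2", "80854-B")])
    print(lookup(idx, "80855-C"))  # ('exact', ['rdi:1'])
    print(lookup(idx, "80855.C"))  # ('compact', ['disco:1', 'rdi:1'])
    print(lookup(idx, "80855-A"))  # ('base', ['disco:1', 'rdi:1'])
```

Returning the matched form alongside the hits keeps the fallback visible: an `exact` or `compact` hit points at the same take, while a `base` hit (under the take-letter assumption) only suggests the same matrix with a possibly different take and would warrant a manual check.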