Question And Requirement For Biomedical Oral History Interviews

TEI/XML Tagging Proposal for OHRC Interviews

Jan 2007

Note: this document is for the pilot project.

Required Metadata for Interviews:

Interview Title: Interview of Interviewee’s last name, first name
AltTitle: Repeatable title field for alternative titles.
Comments: The current interview titles are not consistent. Some use the interviewee’s name as the title; some have an independent title that does not contain the interviewee’s name; some use the series title plus the interviewee’s name.
Date:
    Interview Date: The year when the interview was conducted.
    Coverage Date
    Birth Date
    Death Date
Name: Interviewer, Interviewee, Introducer, Direct (commonly used name)
Distributor: The agent who published the interview (UCLA Library’s Center for Oral History Research).
Copyright: The agent who holds the copyright (UC Regents or interviewee).
Subjects: Subject from LCSH, Collections of the Center, Series title and others.
Language: English, Spanish, Italian and others
Interview Length (Format): The total length of all audio files.
Audio Equipment: Equipment used to generate the audio files.
Video Equipment: Equipment used to generate the video files.

Audio File Information (Size, Location, Format (tape or WAV files))
Source Information (BIB_ID, InternetLink; no need to save the physical format here since we have it in the library catalog database.)

Related Documents (will be included in the XML file; need to talk with Jane about this list):
1. Introduction
2. Summary of Content
3. Biographical Information
4. Interview History
5. Access to the Interview
6. Terms and Condition of Use
7. Bibliographic Information
8. Related Material
9. Acknowledgement
10. Table of Contents
11. Index (keep the original page number during OCR)

Other Documents: Picture of the Interviewee (could have more than one picture)

Validated Interview Sample XML file for review: O:\Working Space\TEI tagging\TeiTest.XML
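Before a file like the sample is passed around for review, it can at least be checked for well-formedness. The following is a minimal sketch using Python's standard library. It does not validate against a DTD (the standard-library parser is non-validating), the function name `is_well_formed` is a hypothetical helper, and the two fragments are made up for illustration rather than taken from TeiTest.XML.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML.

    Well-formedness only: this does NOT validate against the TEI DTD.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Hypothetical fragments for illustration.
good = '<TEI.2 id="hb4f59n6vq"><teiHeader/><text><body/></text></TEI.2>'
bad = '<note place="inline">mismatched closing tag</notes>'

print(is_well_formed(good))
print(is_well_formed(bad))
```

A full validation step against the project DTD would need a validating parser, which is outside what this sketch covers.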

Site Structure: Series/interviews/sessions
Each series will have its own metadata: description, date and so on.

I. TEI tags and UCLA Core 3.1 Metadata Mapping

Metadata from UCLA DLCS database (source: \\lis35\dl2$\UCLA Core releases\UCLA Core version 3.1 withAlt_title ):

DescMD_T

Field            Description                                              Mapping to TEI
DC_TITLE
DC_DESCRIPTION                                                            div1 interviewdesc
DC_PUBLISHER                                                              <publicationStmt> <pubPlace> <publisher>
DC_CONTRIBUTOR   Introduction writer                                      Do not consider
DC_DATE          year-month-day, ex. 1999-12-31, or just year             <publicationStmt> <date>
DC_TYPE          Text                                                     Do not consider
DC_FORMAT        3 v. (xxvii, 1415 leaves) : portraits, index.            <sourceDesc> <extent>
DivID            DublinCore Identifier - Primary key, ex. DPF0101         TEI File ID: <TEI.2 id="hb4f59n6vq">
DC_SOURCE                                                                 Do not consider
DC_LANGUAGE                                                               <profileDesc> <langUsage> <language>
DC_RELATION                                                               Do not consider
DC_COVERAGE                                                               Do not consider
DC_RIGHTS                                                                 Div Copyright Law
DESCMD_ID        The Descriptive Metadata ID for this project, ex. DPF99  Do not consider

DCCreators_T

Field            Description                                              Mapping to TEI
CreatorId
Name                                                                      <teiHeader> <fileDesc> <titleStmt> <author> <name>
Type             Interviewee, interviewer, Introduction                   <profileDesc> <particDesc> <person role=" ">

Subjects_T

Field            TEI mapping
DC_SUBJECT       <textClass> <item>

StructMD_T

Structure of the collections and series, related files information (ID/Title)

XML Sample from UCLA DLCS database:

<title>Water for Los Angeles: Robert V. Phillips
Water for Los Angeles
Phillips, Robert V. [interviewee]
Basiago, Andrew D. [interviewer]
UCLA Library’s Center for Oral History Research
1987
text
xi, 146 leaves : portrait, index.
287
English
1917-
UCLA Library

II. Mapping to DLCS

Questions:
1. For related documents, should we add the titles to the controlled terminology?
2. For related documents, the content will be saved into the XML files directly instead of in the DESC_VALUE field (3000-character limit).
3. This project will require more qualifiers later, e.g. altTitle (firstline, firstlineofChorus, uniform); it may have an original title and so on.
4. One big audio file, or keep the current files?
5. TEI XML files will be the final XML documents.

PROJECTS: Insert the project name and related folder information. http://digidev.library.ucla.edu:8080/dlcs/PopulateProjectSetup.do (project set up)

DL_Objects: Insert the interview into the current list. http://digidev.library.ucla.edu:8080/dlcs/PopulateObjectModel.do (object model)
Series: Level 1
Interview: Level 2
Core Descriptive Terms: http://digidev.library.ucla.edu:8080/dlcs/PopulateDescriptiveTerms.do

Core_desc_terms       Options                          Label
Title                 Required                         Title
AltTitle              Repeatable                       Alternative Title
Subject               Controlled Values, Repeatable
Date                  Required                         Interview Year
Publisher             Required
Contributor
Subject               Repeatable                       Subject Heading from LC or controlled values generated by the staff
Name                  Repeatable                       Qualifier: Interviewee, Interviewer
Description
Language
Rights                                                 Copyright Owned by
Audio File Size
Audio File Format                                      Controlled values: WAV, Tape
Audio File Location
BIB_ID
Website

Control Values (map them to descriptive terms): Question about the interface: should we separate the CV list by project?

FILE_GROUPS: for all related files (text, image, audio, and video files)
PROJECTID_FK: Oral History
FILE_GROUP_TITLE: Interview related text files / image files / audio files / video files
Description: Description

For Related Documents: The content could be written directly into the big XML file. Whenever the user needs to modify the content, the text will be read from the XML file and saved back to it.

For transcript content: yes, there will be a separate page in the admin interface, and the transcript can be read directly from its XML file.

Desc_Values

Field                 Description
DESC_VALUEID_PK       Descriptive Metadata Value Identifier
ProjectID_FK          Foreign key of Project ID
DIVID_FK              Item identifier, DIVID, from the PROJECT_ITEMS table
Desc_termID_FK        Foreign key of the description term
DESC_VALUE
DESC_CVID_FK          Foreign key from the control values
DESC_QUALIFIERID_FK   Foreign key from the desc_qualifiers

For our metadata, we can save the value in DESC_VALUE, which is limited to 3000 characters. The content of the related documents is very likely to exceed 3000 characters, so we could save it directly into our final XML file and use DIVID_FK to point to the file. We need to save the document name as the description term.
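The storage rule described above can be sketched as a small decision function. This is only an illustration under stated assumptions: the function name `plan_storage`, the returned dictionary layout, and the idea of deciding purely by length are hypothetical, not the DLCS implementation. Only the 3000-character limit, the field names DESC_VALUE and DIVID_FK, and the sample identifier DPF0101 come from this document.

```python
DESC_VALUE_LIMIT = 3000  # character limit of the DESC_VALUE field, as stated above


def plan_storage(desc_term: str, content: str, div_id: str) -> dict:
    """Decide where a piece of descriptive content should live.

    Content that fits goes into DESC_VALUE. Longer content is written into
    the final XML file, and the row keeps only DIVID_FK as a pointer to it.
    The dictionary layout is a hypothetical stand-in for a Desc_Values row.
    """
    fits = len(content) <= DESC_VALUE_LIMIT
    return {
        "desc_term": desc_term,          # document name saved as the description term
        "DIVID_FK": div_id,              # points to the item (and its XML file)
        "DESC_VALUE": content if fits else None,
        "stored_in_xml": not fits,
    }


short = plan_storage("Language", "English", "DPF0101")
long_doc = plan_storage("Interview History", "x" * 3500, "DPF0101")
print(short["stored_in_xml"], long_doc["stored_in_xml"])
```

If related documents are always written to the XML file regardless of length, the length test would simply be dropped for those terms.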

For the transcript, we already have an XML file. The code needs to read this XML file and write its content into the final file. Whenever there is an update, we need to generate the final XML file again.
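Regenerating the final file from the transcript could look roughly like the sketch below. It assumes, hypothetically, that the existing transcript file has a single wrapper element (called `<transcript>` here) whose children are the speech elements; the function name `build_final_tei` is also an assumption. The `<TEI.2>`, `<teiHeader>`, `<text>`, `<front>`, `<body>` and `<back>` elements follow the structure described in this proposal, and the header is left empty for brevity.

```python
import xml.etree.ElementTree as ET


def build_final_tei(transcript_xml: str, tei_id: str) -> str:
    """Copy the children of the transcript's root element into <body> of a new TEI.2 document."""
    transcript_root = ET.fromstring(transcript_xml)

    tei = ET.Element("TEI.2", {"id": tei_id})
    ET.SubElement(tei, "teiHeader")      # header content omitted in this sketch
    text = ET.SubElement(tei, "text")
    ET.SubElement(text, "front")         # related documents would go here
    body = ET.SubElement(text, "body")   # transcript goes here
    ET.SubElement(text, "back")          # index would go here

    for child in list(transcript_root):
        body.append(child)
    return ET.tostring(tei, encoding="unicode")


# Hypothetical transcript content for illustration.
transcript = (
    "<transcript>"
    "<sp><speaker>Interviewer</speaker><p>Question text.</p></sp>"
    "<sp><speaker>Interviewee</speaker><p>Answer text.</p></sp>"
    "</transcript>"
)
final_xml = build_final_tei(transcript, "hb4f59n6vq")
print(final_xml)
```

Because the final file is rebuilt from scratch on every call, running the function again after a transcript update gives the regenerated document.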

III. Recommended TEI tags:

<author>
<publicationStmt>
    <pubPlace>
    <publisher>
    <address>
<seriesStmt> (to which series it belongs)
    <title>
    <author>
    <idno>
<sourceDesc>
    <bibl>
        <author>
        <title>
        <extent>
        <pubPlace>
        <publisher>
        <date>
        <idno>
<encodingDesc>
    <projectDesc>
    <editorialDecl>
<profileDesc>
    <langUsage>
        <language>
    <particDesc>
        <person role=" ">
    <textClass> (lcsh)
</teiHeader>
<text>
    <front>
        <titlePage>
        <div1 id=" " type="biography">
            <head>
            <list>
    <sp>
    <speaker>
    <time>
    <body>
    <end> (index)
</text>

TEI Text Tags:

All TEI text will be included in the <text> tag. There are three sections within <text>: front, body and back. All related documents will be tagged in front, the transcript will be in body, and the index will be in back. The transcript could be divided by tape number or by section.

For encoding the body: http://www.tei-c.org/Lite/U5-body.html
For front and back tags: http://www.tei-c.org/Lite/U5-fronbac.html

    <text>
        <front> [ front matter ... ] </front>
        <body> [ body of text ... ] </body>
        <back> [ back matter ... ] </back>
    </text>

Front:
    titlePage: Title, publisher, image location
    div1: Biography and other related documents
Body: Transcript
Back: Index

Tagging for Notes:

1. Note examples from the interview of Jarrico

There are two notes in Jarrico’s interview, on pages 2 and 124. Both are content added by the interviewee. Here is the example from page 124.

In transcript text:
    *[Well, there's an anecdote about that. I've told you about my being in the navy at the tail end of the war, stationed at Treasure Island and entertaining the entertainers.]
Footnote:
    * Mr. Jarrico added the following bracketed section during his review of the transcript.

CDL gives recommendations for how to tag inline notes, footnotes and endnotes. Inline notes are notes that appear as a marked section in the body of the text. Footnotes are notes that appear at the foot of the page. Endnotes are notes that appear at the end of a chapter, part, or volume.

The above text could be tagged as follows, following the recommendation. The two notes have to be placed together in the transcript text, since we will keep only the page number in a tag inside the text and will not separate the content by page as the printed transcript did.

    <note place="inline"> Well, there's an anecdote about that. I've told you about my being in the navy at the tail end of the war, stationed at Treasure Island and entertaining the entertainers. </note>
    <note place="foot"> Mr. Jarrico added the following bracketed section during his review of the transcript. </note>

The benefit of using the note tag as CDL recommends is that it will be easy to include all kinds of notes in our text. The drawback is that the web interface won’t be able to mark out the added content, since there will be no such tag as <add> or <delete>.

2. Examples from Marcia

Marcia provided three note scenarios from her interviews: interviewee-added content, editor annotation, and restricted material in the text. All of them could be handled with inline note tagging.

1) Interviewee-added content

    Interviewer: Alfred Fessard [at the Institut Marey]?
    Dr. Ulf Lindblom: Well, no, that was before his time. This was when the tradition went back to [Jean Martin] Charcot [(1825-1893), French neurologist who had a great influence on neurology and psychology through his students, including Freud] and [Joseph Jules] Dejerine [(1849-1917), Swiss-born neurologist, later head of the Salpetriere], and the other --
    Interviewer: Claude Bernard [(1813-1878), French experimental physiologist]?

Apply the note tag:

    Dr. Ulf Lindblom: Well, no, that was before his time. This was when the tradition went back to <note place="inline">Jean Martin</note> Charcot <note place="inline">(1825-1893), French neurologist who had a great influence on neurology and psychology through his students, including Freud</note> and <note place="inline">Joseph Jules</note> Dejerine <note place="inline">(1849-1917), Swiss-born neurologist, later head of the Salpetriere</note>, and the other --
    Interviewer: Claude Bernard <note place="inline">(1813-1878), French experimental physiologist</note>?

2) Editor annotation or clarification

    Interviewer: Alfred Fessard [footnote link a]?
    Dr. Ulf Lindblom: Well, no, that was before his time. This was when the tradition went back to Charcot [footnote link b] and [Joseph Jules] Dejerine [footnote link c], and the other --
    Interviewer: Claude Bernard [footnote link d]?

    a..
    b. Jean Martin Charcot (1825-1893), French neurologist who had a great influence on neurology and psychology through his students, including Freud.
    c. Joseph Jules Dejerine (1849-1917), Swiss-born neurologist, later head of the Salpetriere.
    d. Claude Bernard (1813-1878), French experimental physiologist.

Apply the note tag:

    Interviewer: Alfred Fessard <note place="inline"> Alfred Fessard, director of the Institut Marey in Paris </note>?

Second example: here is a comment from the interviewee (Dr. Lindblom is discussing his student days):

    Ulf Lindblom: Yes, yes. Yes, I enjoyed it, and the student life in Uppsala at the time was organized with unions, student unions for each part of Sweden, and mine was Smålands. And it was called a club or a “nation”, and I belonged to “Smålands Nation”, which was our landscape, and we had our dinners there. You had a small, tiny student’s room, but you could go there <note place="inline"> Added by oral author: to the student union. You had the daily newspapers, you had the library, and you had festivals there, traditional ones at certain periods of the year. </note>

3) Restricted material

If an interviewee indicates that he wants part of the material not to be made public for 20 years, or until after his death, or whatever, then this material will not be part of the digital transcript when initially posted. There will be a note indicating that restricted material has been deleted from the published transcript.

    Ulf Lindblom: And I have been to every Pain Congress since. [Omitted here is material which has been restricted at the request of the oral author and will be made public at a future date designated by him.]

Then the interview just continues from the point where the restricted material ends.

Apply the note tag:

    Ulf Lindblom: And I have been to every Pain Congress since. <note place="inline"> Omitted here is material which has been restricted at the request of the oral author and will be made public at a future date designated by him. </note>

CDL TEI Best Practice Guidelines for Encoding Oral Histories
http://www.cdlib.org/inside/diglib/stwg/oh/OH_BPG.html

1. XML file: ARK.xml
   ARK: Archival Resource Key http://www.cdlib.org/inside/diglib/ark/
2. Image file: ARK_localname.gif/jpg, pdf
3. Need to generate our own DTD, even if it follows the CDL standard.

System requirement: Rename the XML and image files according to the above requirements.

Problems Encountered During Parsing:

1. NCName: XML 1.0 "non-colonized" name
   Namespace: http://www.w3.org/TR/REC-xml-names/
   http://www.w3.org/TR/REC-xml-names/#NT-NCName
   The NCName gives the namespace prefix, used to associate element and attribute names with the namespace name in the attribute value in the scope of the element to which the declaration is attached. In such declarations, the namespace name may not be empty.

Concerns and Questions:

1. Confidential information when released to other agents.
2. Controlled vocabulary for the div1 type attribute: biography, legal, ...
3. Are we going to keep all added and deleted content for our future interviews? Do we want to tell the user who added and who deleted the content? If content is deleted, is it still necessary to keep it in the transcript? Will we block the audio file if the interviewee deletes part of the content?

Memo from meetings:

1. XML files need to be validated.
2. Need a place to hold the call number and so on.
3. Subject headings will be further discussed with Dabbie.
4. Series will be treated as a single metadata record: description and so on (the original proposal will be saved as the description).
5. Collections will be saved as subject headings.
6. Metadata for audio files (tape: how long and where it is; WAV: length).
7. XML markup: need to keep the page number (paginate page).
8. Interfaces browsed: Berkeley, docsouth, oyez.org, Michigan-writers.org/featured, hpol.org
9. Note fields could be joined together.
10. Name standard: all capitals, first name, last name in transcript text.
11. who attribute: first initial + full last name, all in capitals.
12. Interviewee edit policy: less deleting.
13. Editor’s policy: less modification.

Memo from discussing the TEI header part with Lisa and Stephen:

1. Name in the DLCS database
   a. There will be only one authorized name in core_desc_control_values (Core_desc_CV, Core_desc_CVID_PK); the name should be entered there before any data is entered for an Oral History Interview.
   b. In DESC_Control_values, the project ID (ProjectID_FK) and Core_desc_CVID_pk will be connected together. In Desc_Values, either DESC_Value or DESC_CVID_FK will be saved. DESC_CVID_FK will be used to connect with the Core_desc_controlled_value.
   c. All names for one person are controlled (http://unitproj1.library.ucla.edu/dlib/metadata/definitions.cfm), whether they are authorized or unauthorized.
   d. There will be a qualifier, DIRECT FORM, for standard names (first name, last name); this name will be linked with the authorized name. There will be an authorized name for each interviewee, and possibly also a direct name. Will discuss this with Teresa.
2. The ARK name is used as the file’s name.
3. Authority source for “oral histories”
   AAT, Art and Architecture Thesaurus (http://www.getty.edu/research/conducting_research/vocabularies/aat/index.html): Use for those works that record interviews conducted to preserve the recollections of persons whose experience or memories are representative or are of special historical or social significance.
   LC (http://authorities.loc.gov/cgi-bin/Pwebrecon.cgi?GetScopeNotes=1&SEQ=20070228201108&PID=15026): Here are entered works on the technique of recording the oral recollections of persons concerning their knowledge of historical events as well as collections of such recollections. Individual oral histories are entered under the appropriate subject, e.g. United States--Civilization--1918-1945.

For next week’s question: Separate database/project for the Genetic Oral History Project, or a different subject heading and publisher? Yes from the start.

Memo of recommended qualifiers to UCLA Core Metadata

1. altIdentifier: OPAC, CallNo
   Bib_id is used for making the links to the UCLA catalog database. It is unique for each record and is different from the call number. Example: http://catalog.library.ucla.edu/cgi-bin/Pwebrecon.cgi?db=local&BBID=462203
2. altTitle: parallel
   TEI gives the following qualifiers for title type: main, subordinate, parallel and abbreviated (http://www.tei-c.org/P4X/ref-TITLE.html).
3. description: TEISource, TEIScript, TEIRecording
   TEISource Description: describes the source of this interview file.
   TEIScript Description: describes how the original script of this interview is tagged.
   TEIRecording Description: describes the original source of the video and audio files.
4. Date: Interview (use the creation date for the interview date)
5. Name: Direct_Form
   Some interviewees use a name different from the authorized name. For example, for Thomas Kilgore the authorized name is Kilgore, Thomas, 1913-1998, and he always goes by Thomas Kilgore, Jr.
6. Format: extent
   We need this field to be repeatable so we can enter the extent for both the page count and the total audio file length. A qualifier is needed to distinguish the extent of audio from that of video. Keep the page number and interview length in the description part.

Mapping to the DLCS database

<?filetitle Interview of ?> [[Title]]
<TEI.2 >
<teiHeader type="UCLA DLP-TEI:OHRC"> [[DLCS defined type]]
<title> [[Title]]
<title type=""> [[AltTitle: parallel ]]
[[name: Interviewee]] [[Interviewee Name in authority and direct form]]
TEI markup done by UCLA Digital Library Program [[fixed text]] (For XML file)

Los Angeles, Calif. Fixed text Fixed text : UCLA DLP Fixed text, xml file generated date ark:/cdlarknumber [[ DLCS Ark number ]] [[rights]]

[[series title]] </seriesStmt> <sourceDesc>  This Transcript is generated directly from digital recording of the interview. <bibl> <title> [[Title]] [[AltTitle]] [[name: interviewee, direct form]] Interview conducted by [[name: interviewer]]

[[Title]] [[AltTitle]] [[name: interviewee, direct form]] [[format:extend]] [[publisher: placeOfOrigin]] [[publisher: publisherName]] [[date: publication]] [[altIdentifier:local]]

[[description: Script]]

[[description: Recording]]

Text encoded as part of UCLA digital Library Collection.

This XML file is generated following the CDL TEI Best Practice Guidelines (http://www.cdlib.org/inside/diglib/stwg/oh/) and the TEI Consortium P4 Guidelines (http://www.tei-c.org/P4X/).

For detailed requirements of the header part, please go to http://www.tei-c.org/P4X/HD.html. For detailed requirements of the text part, please go to http://www.tei-c.org/P4X/DS.html. For details of each tag used in this document, please go to http://www.tei-c.org/P4X/REFTAG.html.

Library of Congress Subject Headings Art and Architecture Thesaurus local


The entire document is in

English

[[name: interviewee, direct form]] [[name: interviewer]]

oral histories.

Header for Session 1 January 23, 2002 Index
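The [[...]] placeholders in the mapping above could be filled from DLCS values roughly as follows. This is a minimal sketch, not the project's code: the function `fill_template`, the convention of looking keys up in lower case, and the short sample template string are assumptions for illustration. The sample values come from the Water for Los Angeles record quoted earlier in this document.

```python
import re
from xml.sax.saxutils import escape

# Matches [[Title]], [[name: interviewer]], [[ DLCS Ark number ]], and so on.
PLACEHOLDER = re.compile(r"\[\[\s*([^\[\]]+?)\s*\]\]")


def fill_template(template: str, values: dict) -> str:
    """Replace [[key]] placeholders with XML-escaped values.

    Keys are matched case-insensitively with whitespace collapsed.
    Placeholders without a value are left untouched so gaps stay visible.
    """
    def substitute(match):
        key = " ".join(match.group(1).split()).lower()
        if key in values:
            return escape(values[key])
        return match.group(0)

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical template fragment using placeholders from the mapping.
template = "<title>[[Title]]</title> Interview conducted by [[name: interviewer]] [[rights]]"
values = {
    "title": "Water for Los Angeles",
    "name: interviewer": "Basiago, Andrew D.",
}
print(fill_template(template, values))
```

Leaving unknown placeholders in the output, rather than replacing them with empty strings, makes it easy to spot fields that still need a value in the DLCS database.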

Resources:
1. The TEI Consortium: Guidelines for Electronic Text Encoding and Interchange, edited by C.M. Sperberg-McQueen and Lou Burnard, Oxford, Providence, Charlottesville, Bergen, March 2002 (http://www.tei-c.org/P4X/)
2. CDL TEI Encoding Guidelines for Oral Histories (http://www.cdlib.org/inside/diglib/stwg/oh/)
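As an illustration of the inline note tagging described under Tagging for Notes, bracketed passages in a transcript line could be converted mechanically. The sketch below is an assumption-laden example, not an agreed procedure: the function name `tag_inline_notes` is hypothetical, it treats every non-nested [bracketed] span as an inline note, and it does not distinguish interviewee additions from editor annotations or footnote links.

```python
import re
from xml.sax.saxutils import escape

# A bracketed span with no nested brackets inside it.
BRACKETED = re.compile(r"\[([^\[\]]*)\]")


def tag_inline_notes(text: str) -> str:
    """Wrap each [bracketed] span in <note place="inline">, escaping XML characters."""
    parts = []
    last = 0
    for match in BRACKETED.finditer(text):
        parts.append(escape(text[last:match.start()]))
        note_text = escape(match.group(1).strip())
        parts.append('<note place="inline">' + note_text + "</note>")
        last = match.end()
    parts.append(escape(text[last:]))
    return "".join(parts)


line = "Claude Bernard [(1813-1878), French experimental physiologist]?"
print(tag_inline_notes(line))
```

In practice a reviewer would still need to check the result, since brackets used for footnote links or restricted-material notices may call for different handling.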
