Study of MARC21 Elements for Codification and Transliteration

CHAPTER 3: Study of MARC21 Elements for Codification and Transliteration

> Introduction
> History of MARC21
> MARC Format
> MARC: Record Structure
> Study of MARC21 for Data Encoding
> Record Translation
> Methodology for Data Codification
> Analysis of MARC21 Fields
> Conclusion

3.1 INTRODUCTION

The holdings of a library are displayed in the form of a catalogue. There are many forms of catalogue:

> Book catalogue
> Sheaf catalogue
> Card catalogue
> Computer catalogue (Machine Readable Catalogue)

So far the card catalogue has been the most widely used form, but a gradual shift towards the machine readable catalogue is evident. The Machine Readable Catalogue (MARC) was developed to print catalogue cards and to facilitate the exchange of bibliographic data. A study conducted by the Library of Congress (LC) in 1963 recommended the automation of cataloguing, searching, indexing and document retrieval (1), and LC initiated MARC as a pilot project in 1965. MARC provides for the identification and arrangement of bibliographic data for computer processing and for further distribution of the catalogue (2).

The MARC format is a standardized approach to describing bibliographic data so that computers can process the data irrespective of its language and script. To cut across the language barrier MARC uses numeric tags, but English still has an impact on MARC, for example in the use of sub-field identifiers (3).

3.2 HISTORY OF MARC21

The pilot project, known as MARC-I, began in 1965 with the main aim of creating machine readable cataloguing data and distributing it to other libraries, with the Library of Congress (LC) as the distributing point. MARC-I dealt only with books. The development of MARC-II started in 1968; it was planned to cover all types of materials, including books and monographs. During 1970-1973 documentation was issued for other materials: film records were issued in 1972 and records for serials, maps and French books in 1973. By 1975 records were also issued for German, Spanish and Portuguese material (4).

In 1999 USMARC and CAN/MARC were harmonized and named MARC21 (5). The MARC21 bibliographic format, as well as all official MARC21 documentation, is maintained by the Library of Congress and the National Library of Canada (6), and the Library of Congress maintains the MARC21 website. More recently UKMARC is also being merged with MARC21, and the British Library is shifting from UKMARC to MARC21.

The Library of Congress and the National Library of Canada serve as the maintenance agency for the MARC21 formats for bibliographic, authority, holdings, classification, and community information data. Proposals for changes to the formats may originate from any MARC21 user and can be raised at open meetings or via email and the listserv (7).

3.3 MARC FORMAT

A MARC record involves three elements: the record structure, the content designation, and the data content of the record (8):

> Structure: The structure of MARC records is an implementation of the Information Interchange Format (ANSI Z39.2) and the Format for Information Exchange (ISO 2709).
> Content designators: By definition, "the codes and conventions established to identify explicitly and characterize further the data elements within a record and to support the manipulation of those data". Anything that establishes the kind of data is a content designator. There are three kinds of content designators: tags, indicators, and subfield codes.
> Content: This is the actual data stored in the data fields. Most of the data elements are defined by standards outside the formats, for example the Anglo-American Cataloguing Rules, the Library of Congress Subject Headings, and the National Library of Medicine Classification.
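The three kinds of content designators named above can be illustrated with a minimal Python sketch of one variable data field. The class name `DataField`, its attributes, and the sample title and author are illustrative assumptions and not part of MARC21 or of any library; only tag 245 (title statement) and its subfield codes $a and $c are taken from the MARC21 bibliographic format.

```python
from dataclasses import dataclass, field


@dataclass
class DataField:
    """Hypothetical model of one MARC variable data field."""
    tag: str                      # three-character tag, e.g. "245"
    indicators: str = "  "        # two indicator positions (blank = undefined)
    subfields: list = field(default_factory=list)  # ordered (code, data) pairs

    def display(self) -> str:
        """Render the field as 'tag indicators $code data ...'."""
        shown = self.indicators.replace(" ", "#")  # show blank indicators as '#'
        parts = " ".join(f"${code} {data}" for code, data in self.subfields)
        return f"{self.tag} {shown} {parts}"


# Sample values are invented for illustration only.
title = DataField("245", "10", [("a", "Sample title /"), ("c", "Sample Author.")])
print(title.display())
```

The tag says what kind of data the field holds, the indicators qualify the field as a whole, and each subfield code labels one piece of the content.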
The current research follows the hypothesis that much of the content can be codified or transliterated, which facilitates cross-lingual information retrieval in Indian languages.

In MARC21, formats are defined for five types of data: bibliographic, holdings, authority, classification, and community information. The current study is concerned only with bibliographic data, and only the elements that are essential for bibliographic description are taken into account.

3.3.1 MARC: Record Structure

A typical MARC record consists of three main sections (8): the leader, the directory, and the variable fields.

> The leader consists of data elements that contain coded values and are identified by relative character position. It is called the record label in CCF and UNIMARC. Data elements in this section define parameters for processing the record. The leader is fixed in length (24 characters) and occurs at the beginning of each MARC record.
> The directory contains the tag, starting location, and length of each field within the record. The length of a directory entry is defined in the entry map elements in Leader/20-23. In the MARC21 format a directory entry is 12 characters long, while in CCF it is 14 characters long, where the 13th and 14th characters are the Segment Identifier and the Occurrence Identifier. The directory ends with a field terminator character.
> The data content of a record is divided into variable fields. The MARC21 format distinguishes two types of variable fields: variable control fields and variable data fields.
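The leader and directory layout described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a full MARC parser: the function names (`build_record`, `parse_leader`, `parse_directory`, `read_field`) and the two sample fields are hypothetical, and the sketch assumes the usual MARC21 entry map of 4 digits for field length and 5 digits for starting position.

```python
FIELD_TERMINATOR = b"\x1e"   # ends the directory and every variable field
RECORD_TERMINATOR = b"\x1d"  # ends the record


def build_record(fields):
    """Assemble a tiny record from (tag, bytes) pairs, for the demo only."""
    directory, data = b"", b""
    for tag, body in fields:
        body += FIELD_TERMINATOR
        # 12-character entry: tag (3) + field length (4) + start position (5)
        directory += f"{tag}{len(body):04d}{len(data):05d}".encode("ascii")
        data += body
    directory += FIELD_TERMINATOR
    base = 24 + len(directory)          # base address of data
    length = base + len(data) + 1       # +1 for the record terminator
    leader = f"{length:05d}nam a22{base:05d} a 4500".encode("ascii")
    return leader + directory + data + RECORD_TERMINATOR


def parse_leader(record):
    """Read selected leader elements by relative character position."""
    leader = record[:24].decode("ascii")
    return {
        "record_length": int(leader[0:5]),
        "record_status": leader[5],
        "type_of_record": leader[6],
        "bibliographic_level": leader[7],
        "indicator_count": int(leader[10]),
        "subfield_code_count": int(leader[11]),
        "base_address": int(leader[12:17]),
        "entry_map": leader[20:24],
    }


def parse_directory(record):
    """Return (tag, field_length, start_position) for each directory entry."""
    base = parse_leader(record)["base_address"]
    directory = record[24:base - 1].decode("ascii")  # drop field terminator
    return [
        (directory[i:i + 3], int(directory[i + 3:i + 7]), int(directory[i + 7:i + 12]))
        for i in range(0, len(directory), 12)
    ]


def read_field(record, length, start):
    """Slice one variable field out of the data area, without its terminator."""
    base = parse_leader(record)["base_address"]
    return record[base + start:base + start + length - 1]


# Invented sample data: a control number and a title field with subfield $a.
record = build_record([("001", b"demo0001"), ("245", b"10\x1faSample title")])
for tag, length, start in parse_directory(record):
    print(tag, read_field(record, length, start))
```

The directory lets a program jump straight to any field: the starting position in each entry is relative to the base address of data given in the leader.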
Record structure:
  LABEL                                          24 characters
  DIRECTORY                                      variable length
  DATA FIELDS                                    variable length
  RECORD SEPARATOR                               1 character

Label structure:
  RECORD LENGTH                                  5 characters
  RECORD STATUS                                  1 character
  TYPE OF RECORD                                 1 character
  BIBLIOGRAPHIC LEVEL                            1 character
  TYPE OF CONTROL                                1 character
  CHARACTER CODING SCHEME                        1 character
  INDICATOR COUNT                                1 character
  SUBFIELD CODE COUNT                            1 character
  BASE ADDRESS OF DATA                           5 characters
  ENCODING LEVEL                                 1 character
  DESCRIPTIVE CATALOGUING FORM                   1 character
  LINKED RECORD REQUIREMENT                      1 character
  LENGTH OF THE LENGTH OF FIELD                  1 character
  LENGTH OF THE STARTING CHARACTER POSITION      1 character
  LENGTH OF THE IMPLEMENTATION DEFINED           1 character
  UNDEFINED                                      1 character

Directory structure:
  ENTRY (repeated once per field)                12 characters
  FIELD SEPARATOR                                1 character

Directory entry:
  TAG                                            3 characters
  LENGTH OF DATAFIELD                            4 characters
  STARTING CHARACTER POSITION                    5 characters

Datafield structure:
  INDICATORS                                     2 characters
  SUBFIELD IDENTIFIER                            2 characters
  SUBFIELD                                       variable
  SUBFIELD IDENTIFIER                            2 characters
  SUBFIELD                                       variable
  FIELD SEPARATOR                                1 character

Fig. 3.1: Diagrammatic Representation of MARC Record

3.4 STUDY OF MARC21 FOR DATA ENCODING

The MARC21 manual lists content designators in ten blocks, numbered 0 to 9. Block 0 is for control data and codified data. Control data are essentially numbers assigned by various agencies or by libraries for unique identification, for example the ISBN, the ISSN, the Library of Congress Class Number and so on. Codified data represent the country of publication, the date of publication and so on.

Block   MARC21 Area
0       Control Data and Coded Data
1       Main Entry Fields
2       Title, Edition, Imprint etc.
3       Physical Description, etc. Fields
4       Series Statement Fields
5       Note Fields: Part 1 and Part 2
6       Subject Access Fields
7       Name, etc. Added Entries or Series; Linking Entry Fields
8       Series Added Entry Fields, Holding and Location
9       Reserved for Local Implementation

Table 3.1: MARC21 Blocks

3.4.1 Multilingual MARC

One of the objectives of the present work is to display records in different Indian language scripts. There are two aspects of multilingual display: transliteration and translation. Some of the data fields require transliteration from one script to another. For example, the name of a person stays the same in every language; only the script changes.

Much of the data in a catalogue requires translation. There are three ways of achieving translation:

> Data codification
> Multilingual thesaurus
> Machine translation

If data is codified, it is easy to translate it from one language to another using a table lookup in which each row contains the equivalent words in different languages. That is why the current project identifies the fields that can be codified and allots a numeric value to each standardized term, so that the data can finally be translated into any other language with the help of a machine readable multilingual lookup table.

Similarly, multilingual thesauri can be developed to handle subject headings. In other words, given a lookup table built from a multilingual thesaurus, any keyword can be replaced by the corresponding keyword in another language. In the present work this aspect receives less emphasis because multilingual thesauri of subject keywords are not available for Indian languages. However, the approach taken for replacing codes by translated words can easily be adopted for subject key terms.

Several subfields require machine translation, but machine translation (MT) is still a hard task with current Natural Language Processing (NLP) technology.
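The table lookup idea behind data codification described in this section can be sketched in Python as follows. Everything in the sketch is an assumption made for illustration: the numeric codes, the table name `LOOKUP_TABLE`, the function names, and the choice of English ("en"), Hindi ("hi") and Kannada ("kn") rows are hypothetical and do not reproduce the code list developed in this study.

```python
# Hypothetical multilingual lookup table: one row per numeric code,
# one column per language. Codes and terms are invented for the demo.
LOOKUP_TABLE = {
    1: {"en": "author", "hi": "लेखक", "kn": "ಲೇಖಕ"},
    2: {"en": "editor", "hi": "संपादक", "kn": "ಸಂಪಾದಕ"},
    3: {"en": "translator", "hi": "अनुवादक", "kn": "ಅನುವಾದಕ"},
}


def code_for(term, lang="en"):
    """Codify a standardized term: find the numeric code of its row."""
    for code, row in LOOKUP_TABLE.items():
        if row.get(lang) == term:
            return code
    raise KeyError(f"no code for {term!r} in language {lang!r}")


def term_for(code, lang):
    """Decode a numeric code into the requested language.

    Falls back to English when the row has no entry for that language.
    """
    row = LOOKUP_TABLE[code]
    return row.get(lang, row["en"])


def translate(term, source, target):
    """Translate by going through the language-neutral numeric code."""
    return term_for(code_for(term, source), target)


print(translate("editor", "en", "hi"))
print(translate("लेखक", "hi", "kn"))
```

Because the record stores only the numeric code, supporting a further language means adding one column to the table rather than touching the records.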
In this regard, a study was made to identify the fields of the MARC21 bibliographic standard that should be transliterated and those that should be translated using codification. Broadly, fields/subfields can be classified into the following classes:

> Translated field/subfield
  o Codified field/subfield
  o Data translation
> Transliterated field/subfield
> Non-processed

3.4.1.1 Translated field/subfield

A number of fields need to be translated for multilingual access. Since machine translation (MT) is difficult to achieve, an attempt has been made to identify the fields that can be codified and those that must be translated.

3.4.1.1.1 Codified field/subfield

The fields that use standard data elements for description are codified. For example, the Relator Term (100 - MAIN ENTRY PERSONAL NAME, subfield $e) expresses the role or relation of the person to the document. It is not difficult to arrive at a list of such relations. Similarly, there are many instances in MARC where no code exists yet but one can be devised.
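The classification of fields/subfields above can be sketched as a dispatch table in Python. This is a hedged illustration only: the assignments for 100 $a (transliterated) and 100 $e (codified) follow the examples given in the text, while the assignments for 500 $a and 020 $a, the numeric relator codes, and all function and variable names are assumptions. The transliteration and translation steps are placeholders that only mark the text, because real implementations are outside the scope of this sketch.

```python
from enum import Enum


class Treatment(Enum):
    CODIFIED = "codified"
    TRANSLATED = "data translation"
    TRANSLITERATED = "transliterated"
    NON_PROCESSED = "non-processed"


# (tag, subfield code) -> treatment.
FIELD_CLASSES = {
    ("100", "a"): Treatment.TRANSLITERATED,  # personal name (from the text)
    ("100", "e"): Treatment.CODIFIED,        # relator term (from the text)
    ("500", "a"): Treatment.TRANSLATED,      # general note (assumption)
    ("020", "a"): Treatment.NON_PROCESSED,   # ISBN (assumption)
}

# Hypothetical relator codes and their terms per language.
RELATOR_CODES = {"author": 1, "editor": 2}
RELATOR_TERMS = {
    1: {"en": "author", "hi": "लेखक"},
    2: {"en": "editor", "hi": "संपादक"},
}


def placeholder_transliterate(text, lang):
    """Stand-in for a real transliteration routine."""
    return f"[{text} in {lang} script]"


def placeholder_translate(text, lang):
    """Stand-in for a real translation routine."""
    return f"[{text} translated to {lang}]"


def render(tag, code, value, lang):
    """Prepare one subfield value for display in the target language."""
    treatment = FIELD_CLASSES.get((tag, code), Treatment.NON_PROCESSED)
    if treatment is Treatment.CODIFIED:
        # Only relator terms are codified in this small demo.
        return RELATOR_TERMS[RELATOR_CODES[value]][lang]
    if treatment is Treatment.TRANSLITERATED:
        return placeholder_transliterate(value, lang)
    if treatment is Treatment.TRANSLATED:
        return placeholder_translate(value, lang)
    return value  # non-processed data is shown unchanged


print(render("100", "e", "editor", "hi"))
print(render("100", "a", "Sample Name", "hi"))
```

Subfields that are not listed in the table fall through to the non-processed class, so unclassified data is displayed as stored.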
