Unit 7 Organisation and Formats
Structure
7.0 Objectives
7.1 Introduction
7.2 Content Organisation
7.3 Data Types and Formats
  7.3.1 Text Formats
  7.3.2 Images
  7.3.3 Audio/Speech Formats
  7.3.4 Animations
  7.3.5 Video Formats
  7.3.6 Bibliographic Formats
7.4 Multimedia Authoring
7.5 Multimedia Collections in Libraries and Information Services
7.6 Summary
7.7 Answers to Self Check Exercises
7.8 Keywords
7.9 References and Further Reading

7.0 OBJECTIVES

This unit gives an overview of content organisation. Content in modern databases is not just text; it may also include multimedia elements such as audio, video, images and animations. The various data formats that make up the content of databases are discussed in detail in this unit, and illustrations are given to demonstrate different multimedia collections. The unit covers the issues involved in content types and their organisation.

After reading this unit you will be able to:
understand the concept of content organisation;
describe formats of different data types available in electronic form;
discuss the components of multimedia;
explain the importance of multimedia in the present day; and
enumerate different types of formats used in multimedia components and their characteristics.

7.1 INTRODUCTION

Databases are created for specific purposes. Accordingly, the information content varies from one database to another. For example, databases of bibliographic information and abstracts contain details of books, articles, and conference and seminar papers. Moreover, particular services and applications may be developed for specific databases. In the example above, typical applications for bibliographic databases are search interfaces, display interfaces (ranked and sorted), download utilities, etc. It follows that each database and its content have to be carefully organised according to its purpose.
If the purpose is to generate information services, then details such as author, title, publisher, place and subject keywords would be furnished. The content of such databases is also structured and organised according to the needs of the end-users.

7.2 CONTENT ORGANISATION

Database software is distinct from other software mainly in its organising capabilities. The full text of documents is what users seek most in both electronic and traditional collections, but providing access to particular information within the full text is very difficult. Most search paradigms work by retrieving the most frequent occurrences of terms. In a block of text, however, the most important term may not be the most frequent; it may not be used at all, because it is obvious from the context. For reasons such as these, content organisation methods need to be adopted.

One method is to augment the database with an authority list. According to Otwell, an authority list is simply a list of preferred words that the database uses to index documents. It can improve results by focusing on terms and concepts that are known to be relevant and ignoring other terms. A controlled vocabulary list accommodates user errors (both when inputting new data and when searching), since it includes known misspellings, acronyms and alternate terms.

Organisation can also be at an intellectual level, using ontologies or, in more common terms, subject-related categorisation. This kind of organisation achieves a semantic mapping according to subject concepts and is quite complicated to implement. One common means of achieving it is to deploy a subject-based thesaurus that maps the hierarchical relationships of concepts in subject domains. The hierarchical mapping is then used for context-based retrieval of the concepts. In simpler applications, individual database compilers plan the information model to suit their purpose.
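The authority-list idea described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the vocabulary entries, the sample documents and the function names (`normalise`, `build_index`, `search`) are hypothetical and do not come from Otwell or from any particular database package. It shows only the general principle: variant forms are mapped to one preferred term, and terms outside the list are ignored.

```python
# Hypothetical authority list: each known variant (misspelling, acronym,
# alternate term) maps to the preferred term used for indexing.
AUTHORITY_LIST = {
    "catalogue": "cataloguing",
    "cataloging": "cataloguing",
    "catalouging": "cataloguing",  # known misspelling
    "opac": "online public access catalogue",
    "online catalogue": "online public access catalogue",
}
PREFERRED_TERMS = set(AUTHORITY_LIST.values())


def normalise(term):
    """Return the preferred form of a term, or None if it is not in the list."""
    key = " ".join(term.lower().split())  # ignore case and extra spaces
    if key in PREFERRED_TERMS:
        return key
    return AUTHORITY_LIST.get(key)


def build_index(documents):
    """Map each preferred term to the sorted ids of documents that carry it."""
    index = {}
    for doc_id, keywords in documents.items():
        for keyword in keywords:
            preferred = normalise(keyword)
            if preferred is not None:  # terms outside the list are ignored
                index.setdefault(preferred, set()).add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Normalise the query with the same list, then look it up in the index."""
    preferred = normalise(query)
    if preferred is None:
        return []
    return index.get(preferred, [])


# Illustrative records: document id -> keywords as typed by the indexer.
DOCUMENTS = {
    "d1": ["Cataloging", "MARC 21"],
    "d2": ["OPAC"],
    "d3": ["catalouging", "opac"],
}
INDEX = build_index(DOCUMENTS)
```

With this sample data, `search(INDEX, "Cataloging")` returns `['d1', 'd3']`, because the American spelling and the misspelling both resolve to the preferred term "cataloguing". A query for "MARC 21" returns an empty list, because that term is not in this illustrative vocabulary. Applying the same normalisation at indexing time and at search time is what lets the list absorb errors on both sides.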
Content organisation is the task of organising the content of a database according to prescribed standards or categories. For example, one of the databases familiar in libraries is the bibliographic database. The common database elements and structure for this will be:

Author
Title
Publisher
Edition
ISBN
etc.

Further, the design also includes planning for other features of bibliographic data, such as whether a field is repeatable and what type of data will be entered in the field. Hence content organisation is a scientific task that involves planning and designing the information model according to the requirements and purpose for which the database is compiled. In libraries, standards such as MARC 21 are used for bibliographic data organisation.

Content may also be organised in a classified manner, as most digital library collections are. The typical organisation in library collections would be by type of material, such as e-books, journals, and reference works like dictionaries and encyclopaedias. Another categorisation is by type of sub-collection. For instance, the Librarians’ Digital Library (LDL) is divided into collections such as publications, PowerPoint presentations and a picture gallery under the main community of ‘Library and Information Science’, as shown in Fig. 7.1.

Fig. 7.1: Librarian’s Digital Library (DRTC)

Self Check Exercise
1) What is Content Organisation?
Note: i) Write your answer in the space given below.
      ii) Check your answer with the answers given at the end of this Unit.
7.3 DATA TYPES AND FORMATS

Databases are organised collections of meaningful data or information. The collection may be purely textual or numeric, or it may be a multimedia database. Multimedia refers to data in more than one medium, and multimedia databases make it possible to build databases of resources that use more than one medium. The essential components of multimedia databases are:

Text
Sound
Image
2-D drawings
3-D objects
Video clippings
Animations

7.3.1 Text Formats

Text is the most common and most used format of e-resources. Although e-resources may incorporate all multimedia formats, most content is textual, in e-resources as well as in traditional resources. Textual matter can be keyed in using either editors or sophisticated word processing packages. Editors use what is called ‘plain text’, which can be read by any text application, whereas applications like word processors output formatted documents that can be read only by that particular application.

Plain text

Plain text files usually have the extension ‘.txt’. They are also called ASCII text files and can be viewed with an editor (such as Edit or Notepad) or with a word processor (such as MS Word or WordPerfect). The characteristic feature of plain ‘.txt’ files is that they do not contain any kind of formatting (such as bold, italics, font colour or images). Many simple programs are written as text files. E-mails are also sent as plain ASCII text, although many mailing programs now handle formatted mails.

Formatted Text Documents

In contrast to simple text files, other textual data is in the form of formatted documents.
These are created using particular application software for word processing, such as MS Word or WordPerfect. The main difference between plain text and formatted text is that word-processed documents contain, in addition to the keyed-in text, special characters that ‘format the text’ in order to achieve the desired layout in the output. These are binary files. Some of the formatted text formats are discussed below:

.doc, .wpd files

‘.doc’ stands for ‘document’ files, a very common format for formatted text files on PCs. These files may be created, viewed and edited using programs such as MS Word, WordPerfect and so on. Several formatting features are possible, such as bold, italics, justification, bullets and numbering.

.pdf files

‘pdf’ stands for ‘Portable Document Format’. This file format was developed by Adobe Systems to make it possible to transfer formatted documents over the net so that they have the feel of a printed document and look the same on any system. The biggest advantage of .pdf files is that they allow web pages to be printed page by page, as though they were document files. Viewing this file type requires Adobe Acrobat Reader, free software that can be downloaded from the Net.

.ps files

‘PostScript’ files are also an ASCII file type and are technically plain text. However, they are unreadable unless an on-screen viewer such as ‘Ghostscript’ is used. They can also be read by a PostScript (PS) printer.

Hypertext

Hypertext is structured text. It is created using HyperText Markup Language (HTML), the ‘language’ in which web pages are written. The code of a web page is written in plain text and is saved with the extension ‘.htm’ or ‘.html’. The browser (such as Netscape Navigator or Internet Explorer) identifies the file as a web page. It reads