UNIT 7 ORGANISATION AND FORMATS

Structure

7.0 Objectives
7.1 Introduction
7.2 Content Organisation
7.3 Data Types and Formats
    7.3.1 Text Formats
    7.3.2 Images
    7.3.3 Audio/Speech Formats
    7.3.4 Animations
    7.3.5 Video Formats
    7.3.6 Bibliographic Formats
7.4 Multimedia Authoring
7.5 Multimedia Collections in Libraries and Information Services
7.6 Summary
7.7 Answers to Self Check Exercises
7.8 Keywords
7.9 References and Further Reading

7.0 OBJECTIVES

This unit gives an overview of content organisation. Content in modern databases is not just text but may also include other multimedia elements such as audio, video, images and animations. The various data formats that form the content of databases are discussed in detail in this unit, and illustrations are given to demonstrate different multimedia collections. The lesson covers the issues in content types and their organisation. After reading this unit you will be able to:

• understand the concept of content organisation;
• describe formats of different data types available in electronic form;
• discuss the components of multimedia;
• explain the importance of multimedia in the present day; and
• enumerate different types of formats used in multimedia components and their characteristics.

7.1 INTRODUCTION

Databases are created for specific purposes. Accordingly, the information content varies from one database to another. For example, databases of bibliographic information and abstracts contain details of books, articles, and conference and seminar papers. Moreover, particular services and applications may be developed for specific databases. In the example above, typical applications for bibliographic databases are search interfaces, display interfaces (ranked and sorted), download utilities, etc. This leads to the conclusion that each database and its content have to be carefully organised according to its purpose. If the purpose is to generate information services, then we would furnish details like author, title, publisher, place, subject keywords, etc. The content of such databases is also structured and organised according to the needs of the end-users.

7.2 CONTENT ORGANISATION

Database software is distinct from other software mainly in its organising capabilities. The full text of documents is the most sought-after content in electronic and traditional collections, but providing access to particular information within the full text is very difficult. Most search paradigms work on the logic of retrieving the most frequent occurrences of terms. In a block of text, however, the most important term may not be the most frequent; it may not be used at all because it is obvious from the context. For reasons such as these, content organisation methods need to be adopted.

One method is to augment the database with an authority list. According to Otwell, an authority list is simply a list of preferred words that the database uses to index documents. It can improve results by focusing on terms and concepts that are known to be relevant and ignoring other terms. A controlled vocabulary list accommodates user errors (both when inputting new data and when searching) since it includes known misspellings, acronyms and alternate terms.

Organisation can also be at an intellectual level using ontologies or, in more common terms, subject-related categorisation. This kind of organisation achieves a semantic mapping according to the subject concepts and is quite complicated to implement. One common means of achieving it is to deploy a subject-based thesaurus that maps the hierarchical relationships of concepts in subject domains. The hierarchical mapping is then used for context-based retrieval of the concepts.

In simpler applications, individual database compilers plan the information model as suited to their purpose. Content Organisation is the task of organising content according to different prescribed standards or categorisation of the database content. For example, for library purposes one of the familiar databases is the bibliographic database. The common database elements and structure for this will be:

Author
Title
Publisher
Edition
ISBN, etc.
Further, the design also includes planning for other features of bibliographic data, such as whether a field is repeatable, what type of data will be entered in the field, etc. Hence content organisation is a scientific task involving planning and designing the information model as per the requirements and purpose for which the database is compiled. In libraries, standards such as MARC 21 are used for bibliographic data organisation.

Content may also be organised in a classified manner, as most digital library collections are. The typical organisation in library collections would be by type of material, such as e-books, journals, and reference works like dictionaries and encyclopaedias. Another categorisation is by type of sub-collection. For instance, the Librarians’ Digital Library (LDL) is divided into collections such as publications, PowerPoint presentations and picture gallery under the main community of ‘Library and Information Science’, as shown in Fig. 7.1.
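The planning described above (which fields exist, which are repeatable, and which terms are preferred for indexing) can be sketched in a few lines of Python. This is only an illustrative sketch, not MARC 21 and not the method of any particular library system: the class name `BibRecord`, the helper `preferred_term` and every entry in `AUTHORITY_LIST` are hypothetical and chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical authority list: known variants, acronyms and alternate
# terms mapped to one preferred indexing term (illustrative entries only).
AUTHORITY_LIST = {
    "lis": "library and information science",
    "library science": "library and information science",
    "e-book": "e-books",
    "ebooks": "e-books",
}


def preferred_term(term: str) -> str:
    """Return the preferred form of a term, or the cleaned term itself."""
    cleaned = " ".join(term.lower().split())  # lower-case, collapse spaces
    return AUTHORITY_LIST.get(cleaned, cleaned)


@dataclass
class BibRecord:
    """A simple information model for one bibliographic record."""
    title: str                                          # not repeatable
    authors: List[str] = field(default_factory=list)    # repeatable
    publisher: str = ""
    edition: str = ""
    isbn: str = ""
    subjects: List[str] = field(default_factory=list)   # repeatable

    def add_subject(self, term: str) -> None:
        """Index the record under the preferred term, skipping duplicates."""
        term = preferred_term(term)
        if term not in self.subjects:
            self.subjects.append(term)


record = BibRecord(title="Content Organisation",
                   authors=["A. Author", "B. Author"])
for keyword in ["LIS", "Library  Science", "eBooks"]:
    record.add_subject(keyword)

print(record.subjects)
# ['library and information science', 'e-books']
```

Three differently typed keywords collapse into two preferred terms, which is the effect an authority list is meant to have at both data entry and search time.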

Fig. 7.1: Librarians’ Digital Library (DRTC)

Self Check Exercise

1) What is Content Organisation?

Note: i) Write your answer in the space given below.
      ii) Check your answer with the answers given at the end of this Unit.

7.3 DATA TYPES AND FORMATS

Databases are organised collections of meaningful data/information. The collection may be purely textual or numeric, or it may be a multimedia database. Multimedia refers to more than one medium of data, and multimedia databases make it possible to build databases of resources having more than one medium. The essential components of multimedia databases are:

• Text
• Sound
• Image
• 2-D drawings
• 3-D objects
• Video clippings
• Animations

7.3.1 Text Formats

Text is the most used and most common format of e-resources. Although e-resources may incorporate all multimedia formats, most content, in e-resources as well as in traditional resources, is textual. Textual matter can be keyed in using either editors or sophisticated word processing packages. Editors use what is called ‘plain text’, which can be read by any text application, whereas applications like word processors output formatted documents that are generally read only by that particular application.

Plain text

Plain text files usually have the extension ‘.txt’. They are also called ASCII text files and can be viewed with an editor (such as Edit or Notepad) or with a word processor (such as MS Word or WordPerfect). The characteristic feature of plain ‘.txt’ files is that they do not contain any kind of formatting (such as bold, italics, font colour, images, etc.). Many simple programs are written as text files. E-mails are also sent as plain ASCII text, although many mailing programs now handle formatted mails.

Formatted Text Documents

In contrast to simple text files, other textual data is in the form of formatted documents. These are created using particular word processing software such as MS Word or WordPerfect. The main difference between plain text and formatted text is that word-processed documents contain, in addition to the keyed-in text, special characters that ‘format the text’ in order to achieve the desired layout in the output. These are binary files. Some of the formatted text formats are discussed below:

.doc, .wpd files

These are very common formats for formatted text files on PCs. ‘.doc’ stands for ‘document’ and is the MS Word format; ‘.wpd’ is the WordPerfect document format. These files may be created, viewed and edited using word processors such as MS Word, WordPerfect and so on. Several formatting features such as bold, italics, justification, bullets and numbering are possible.

.pdf files

‘pdf’ stands for ‘Portable Document Format’. It was developed by Adobe Systems to make it possible to transfer formatted documents over the net so that they have the feel of a ‘printed document’ and look the same on any system. The biggest advantage of .pdf files is that they allow web pages to be printed page by page, as though they were document files. Viewing this file type requires Adobe Acrobat Reader, free software that can be downloaded from the Net.

.ps files

‘PostScript’ files are an ASCII file type and so are technically plain text. However, a .ps file is unreadable unless an on-screen viewer like ‘Ghostscript’ is used. It can also be printed directly on a PostScript (PS) printer.

Hypertext

Hypertext is structured text. It is created using HyperText Markup Language (HTML), the ‘language’ in which web pages are written. The code of a web page is written in plain text and saved with the extension ‘.htm’ or ‘.html’. The browser (such as Netscape Navigator or Internet Explorer) identifies the file as a web page, reads the code and displays it on the screen as we see it, with images, colours and hyperlinks along with the layout and specified appearance of the pages.

Self Check Exercise

2) Discuss the different types of text formats.

Note: i) Write your answer in the space given below.
      ii) Check your answer with the answers given at the end of this Unit.

7.3.2 Images

Photographs, sketches, maps and the like constitute images, which may also be incorporated into multimedia products. Images can be created with image processing packages; extensions such as .BMP, .GIF and .TIF identify image files in electronic form. Images can also be acquired by scanning an original and incorporating either the whole scan or a selected part of it. Perhaps the easiest way is to use images already available on CDs, if they suit the purpose at hand. The only problem that might arise is which image format will be acceptable at the time of integration. Most image processors allow inter-conversion between formats such as .BMP, .GIF and .TIF. A number of image editing packages are available, such as Adobe Photoshop, Image Assistant and the most common ones, Paint and Paintbrush. Common features of image processors include enlarging or shrinking images (zooming in and out), adding dithering effects, colour modifications and other editing options.

Image formats

The following sections discuss some of the most common graphic file formats and their features.

.bmp files

Bitmap files or .bmp files are the standard Windows raster format. The format lays emphasis on quick display and hence stores images in uncompressed form, so the files occupy more disk space.

.gif files

One of the most popular graphic file formats on the Internet, the Graphics Interchange Format (.gif) was developed by CompuServe with the main purpose of archiving information. The .gif images are usually scanned stand-alone pictures that are not ‘drawn’ using an application program. GIF is a highly compressible format and is very useful when a large number of images are to be incorporated. It is a standard web format: most browsers have a .gif viewer. The small size of .gif images allows quick transmission over the net, and the format also supports animations.
.jpeg/.jpg files

JPEG stands for ‘Joint Photographic Experts Group’, the body that designed this format for high compression. It is one of the most popular image formats on the web and a web standard (second only to .gif). Because of its good compression capabilities, .jpg images are small and can be transferred quickly over the Internet. The format is flexible: it allows the user to trade image size against picture quality. Picture quality remains accurate at lower levels of compression.

.tif files

TIFF stands for Tagged Image File Format. This format was designed to overcome the problem of application dependence and was originally intended to become the standard format. It is generally used when graphic files need to be moved between different computer types (for example, PC to Mac and vice versa). The format allows for high resolution and is highly flexible: there are several ways in which a .tif image can be saved. It is supported by most scanning and image editing software and works well for both on-screen display and printing of photographs.

2D sketches

2D sketches are mainly line drawings of the kind used in cartoons and caricatures. 2D drawings have a particular appeal because of their simplicity of statement. They can be created with the freehand draw options available in most image processors. Animator Pro is a popular package for creating and animating 2D images.

Image features

The following features of images should be considered while planning multimedia databases that are image intensive:

• Colour
• Resolution
• File Size
• Compression
• Conversion

Colour

Colour is a very important feature of graphic images, and different file formats pay a great deal of attention to the detail of colour handling. The purpose of a project, website or database may define the extent of colour clarity and quality that is required. For example, a website that primarily handles photographs requires at least 16-bit or near true colour.
However, for most standard applications 8-bit colour, that is 256 colours, is sufficient. The following comparison explains the relation between bits and colours:

1 bit per pixel refers to an image with 2 colours
4 bits per pixel refers to an image with up to 16 colours
8 bits per pixel refers to an image with up to 256 colours

Resolution

The resolution of an image decides its clarity and sharpness. The trade-off is that the higher the resolution chosen, the bigger the file size. Image resolution refers to the spacing of pixels in an image and is measured in pixels per inch (ppi). The higher the resolution, the more pixels there are in the image. The resolution setting maps an image’s pixel dimensions to its physical size. Image size refers to the physical dimensions of an image. Because the number of pixels in an image is fixed, increasing the physical size of an image decreases its resolution and decreasing its size increases its resolution. For instance, an image with pixel dimensions of 600 x 600 and a resolution of 300 ppi has a physical size of 2 inches by 2 inches.

File Size

Images form large files and need large amounts of disk space. Hence it is important to choose the right format while building an image-intensive collection. File size depends upon the image format chosen, the physical image size and the resolution. For example, the traditional bitmap format consumes much more space than the Graphics Interchange Format (GIF).

Compression Ratio

Bandwidth is a main issue in the transmission of data on networks. Since images are big, many compression techniques are used to reduce their size when they are transmitted online. A compression ratio is simply the size of the original data divided by the size of the compressed data. That is, a technique that compresses a 1 megabyte image to 100 kilobytes achieves a compression ratio of about 10. The bitmap format has the least compression capability as compared to the jpg format.
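The arithmetic behind colour depth, resolution, file size and compression ratio can be checked with a short Python sketch. The function names are illustrative only. The raw-size function is an assumption added for this example: it counts pixel data alone and ignores file headers and any row padding a real format adds. The compression example assumes decimal units (1 megabyte = 1,000,000 bytes).

```python
def colour_count(bits_per_pixel: int) -> int:
    """Number of distinct colours available at a given bit depth."""
    return 2 ** bits_per_pixel


def physical_size_inches(width_px: int, height_px: int, ppi: int):
    """Physical size (width, height) in inches at a given resolution."""
    return (width_px / ppi, height_px / ppi)


def raw_size_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Uncompressed pixel data only; headers and padding are ignored."""
    return (width_px * height_px * bits_per_pixel + 7) // 8


def compression_ratio(original_size: int, compressed_size: int) -> float:
    """Size of the original data divided by size of the compressed data."""
    return original_size / compressed_size


for bits in (1, 4, 8):
    print(bits, "bits per pixel ->", colour_count(bits), "colours")
# 1 bits per pixel -> 2 colours
# 4 bits per pixel -> 16 colours
# 8 bits per pixel -> 256 colours

print(physical_size_inches(600, 600, 300))    # (2.0, 2.0)
print(raw_size_bytes(600, 600, 8))            # 360000
print(compression_ratio(1_000_000, 100_000))  # 10.0
```

Halving the resolution of the same 600 x 600 image to 150 ppi doubles each printed dimension to 4 inches, which is the size/resolution trade-off described above.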
Conversion

Image formats should be interoperable with respect to popular applications. Popular image processing packages include software for converting from one format to another. Hence it is advisable to choose popular formats supported by most packages and to migrate to a newer format when the present one is no longer supported.

Fig. 7.2: Conversion using Microsoft Imaging

Self Check Exercise

3) What are important features to be considered while including images in Databases?

Note: i) Write your answer in the space given below.
      ii) Check your answer with the answers given at the end of this Unit.

7.3.3 Audio/Speech Formats

Sound files or audio files are gaining popularity on the web. Today, most of the latest sound tracks are available on the Internet as sound files. There are even a few albums that are available on the web alone. Another popular application is online live news broadcasting. The following section discusses some of the common audio file formats:

.au files

This format is commonly found on the web. PC users need to load an application such as Waveform Hold and Modify to play these files. Macintosh computers need different sound applications to play this file type.

.mid files

This extension is used by files following the Musical Instrument Digital Interface (MIDI) standard. These files are used mostly for audio control in the multimedia industry. The MIDI file specification allows lengths to be specified as a variable number of bytes.

.aiff files

Audio Interchange File Format (AIFF) was developed by Apple. Although it was originally made for Macs, it can now be used on other platforms too. It is a very good audio file format for use on the Internet. It can also be used in multimedia authoring on both Macs and Windows.

.mp3 files

Today this is the most popular audio format. MP3 stands for MPEG Audio Layer 3. MP3 allows very high levels of compression, so a single CD can contain hours of music. An MP3 player is required to play this file type; players are available on both Mac and Windows machines.

.voc files

Creative Labs’ Sound Blaster uses .voc files. They are designed for storing digitised voice data, hence the name, but they can handle any digitised sound in any of a variety of formats. VOC files have a two-part structure: the header block, which defines the contents of the file, and the data block, which actually contains the audio information.

.wav files

The Wave file is a commonly used file format on Windows machines. It can be used on the Internet and is good for multimedia authoring. It is flexible and handles both compressed and uncompressed storage formats.
There are applications that allow a wave file to be edited through quite simple manoeuvres.

Self Check Exercise

4) Discuss the formats for audio.

Note: i) Write your answer in the space given below.
      ii) Check your answer with the answers given at the end of this Unit.

7.3.4 Animations

Animations are very appealing and popular. They are created using applications such as Animator Pro. Here 2D and 3D objects are first created, and then audio and movement are added to animate them. Typical examples are the cartoons and simulations created for computer games and modern movies. Common file formats are the FLIC formats (.fli and .flc).

7.3.5 Video Formats

Video files have become very popular, with films being available and viewed on VCDs and DVDs. It is important to be aware of the video file formats as these are the bulkiest file types.

.avi files

The Audio Video Interleave (AVI) file format was developed by Microsoft. An AVI player and drivers are required to play this format; they are readily available on both Mac and Windows machines. AVI support comes with Windows, including a built-in media player, so no drivers need to be obtained separately there. With the player, AVI plays full-motion video with audio in a small window at about 15 frames per second. AVI is a popular standard, and many videos have been produced in the format because it does not require additional drivers. The quality of AVI files with good drivers and good hardware can be quite impressive.

.mov/.movie files

Movie files are the common format used for QuickTime movies, the native video platform on the Mac.

.mpg/.mpeg files

This standard Internet format uses the MPEG compression scheme. It can be used on Macs by converting the files into QuickTime movies using applications such as ‘Sparkle’.

.qt files

QuickTime files. The latest version is used on Macs today.

7.3.6 Bibliographic Formats

Bibliographic data is produced by most libraries to facilitate access to their collections by users. Various bibliographic data formats have been advocated for bibliographic control. Bibliographic formats basically deal with the representation techniques and content of bibliographic data. In most library databases the content is organised using one of the standard formats such as MARC 21, UKMARC or the Common Communication Format. Each standard lists the content designators, data elements and rules for extraction of the data from the information resources. Bibliographic data commonly includes titles, names, subjects, notes, publication data and information about the physical description of an item.

MARC 21 has emerged as one of the most used formats for the representation of bibliographic data. The MARC 21 Format for Bibliographic Data is designed to be a carrier for bibliographic information about printed and manuscript textual materials, computer files, maps, music, serials, visual materials and mixed materials.

However, in the age of networked information, as with the Internet today, library databases are also required to be interoperable and compliant with world standards for bibliographic data interchange. Bibliographic data also has to be communicated using web technologies, and hence it becomes imperative that it be encoded in the language of the web. Extensible Markup Language (XML) is used as a carrier of bibliographic data, just as it is for most data communication on the Net.

7.4 MULTIMEDIA AUTHORING

Integrating the different components into a sequenced presentation is referred to as “Authoring”. Just as an author writes a book or an article, multimedia presentations are put together by an authoring routine. Icon authoring provides meaningful icons for most components, which may be included at different stages of the presentation. Authorware Professional, HSC Interactive and Director are a few popular authoring packages.

The most striking feature of a multimedia presentation, compared with other presentations, is user interaction. When a user is allowed to interact with the product, the document comes alive and non-sequential search of information becomes possible. This feature of interaction places multimedia above all other forms of presentation.

Authoring is the stage of designing a good presentation. Various features are available for sequencing and displaying the components. Graphical icons are provided to denote movie, audio and text. Decision-making and program icons are also provided to structure the program. Each component can be played concurrently, perpetually or sequentially. OLE (Object Linking and Embedding) is generally provided in most authoring applications; with it, once an audio or video piece has been saved in digitised format, it can be included in the database directly. Multimedia-enabled DBMS packages allow the different components to be directly included in the database, as shown in Fig. 7.3.

Fig. 7.3: How to Add Multimedia Files in Databases
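The idea of placing a digitised audio, video or image component directly in a database can also be sketched in code. The example below is not the package shown in Fig. 7.3; it is a minimal illustration that assumes Python's built-in `sqlite3` module, an in-memory database, and a hypothetical `media` table whose name and columns are invented for this sketch. The sample bytes are a placeholder, not a playable audio file.

```python
import sqlite3

# In-memory database used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE media (
           id         INTEGER PRIMARY KEY,
           title      TEXT NOT NULL,
           media_type TEXT NOT NULL,   -- e.g. 'audio/wav', 'image/gif'
           content    BLOB NOT NULL    -- the digitised component itself
       )"""
)


def add_media(conn, title: str, media_type: str, data: bytes) -> int:
    """Store one multimedia component and return its record id."""
    cur = conn.execute(
        "INSERT INTO media (title, media_type, content) VALUES (?, ?, ?)",
        (title, media_type, data),
    )
    conn.commit()
    return cur.lastrowid


def get_media(conn, media_id: int) -> bytes:
    """Fetch the stored bytes of one component."""
    row = conn.execute(
        "SELECT content FROM media WHERE id = ?", (media_id,)
    ).fetchone()
    return row[0]


# Placeholder bytes standing in for a real digitised audio file.
sample = b"RIFF\x00\x00\x00\x00WAVE"
media_id = add_media(conn, "Welcome message", "audio/wav", sample)
print(get_media(conn, media_id) == sample)  # True
```

In practice the bytes would be read from a file on disk; large collections often store only the file path in the database and keep the media files themselves on the file system.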

7.5 MULTIMEDIA COLLECTIONS IN LIBRARIES AND INFORMATION SERVICES

Multimedia is a technology that has had an impact on many users in teaching and learning, and likewise in libraries. Libraries are among the first and major users of multimedia applications. The varied types of content, the purpose of each collection and its organisation are discussed and illustrated below. The multimedia collections in libraries consist of resources like:

1) Encyclopaedia
Example: Britannica CD

2) Dictionaries
Example: Illustrated Oxford English Dictionary

3) Reference Manuals and tutorials
Example: XML Online Tutorial

4) Online Journals
Example: D-Lib Magazine (www.Dlib.org)

5) E-books
Example: http://www.ebooksnbytes.com/ebooks.shtml (Fig. 7.4)

Fig. 7.4: E-books

6) Maps and Travel guides

Example: Mapquest (Fig. 7.5) is a very well organised database of maps and geographical and physical information of countries, cities and towns and gives expert travel guidance.

Fig. 7.5: Mapquest

7) Online databases
Example: Science Direct

Fig. 7.6: Science Direct

In addition to the above types of multimedia library collection, other library applications are:

• Multimedia orientation kiosks for library users: Kiosks are public access computers usually set up with a single purpose, such as providing specific information or location-specific directories. Multimedia kiosks are increasingly used in libraries across the world to provide access to library resource information, multimedia applications and online services available only within the library and information centre. The kiosks employ touch-screen monitors and navigational “buttons” through which text, photos, audio and video are accessed. They provide access to resources such as the online catalogue, library floor plans and the library website. In addition, the kiosks provide virtual assistance to users outside the physical boundaries of traditional service areas.

• Computer Aided learning/instruction packages: The philosophy behind computer aided instruction packages is that a combination of media types is used to illuminate the subject matter. An instructor can express an idea more clearly, and the student can understand it better, when text appears along with a picture or video clip than when the text or photograph stands alone.

• Annotated Video lessons: Here video lessons are provided along with annotations. The annotations enumerate the process or procedure explained in the video. Adding annotations to a video greatly enhances its usefulness.

7.6 SUMMARY

Multimedia has made information presentation very attractive to end users. The ability to use text with other multimedia components such as images and audio adds a greater dimension to the resources. Special applications are developed using multimedia, such as systems that read text aloud for the benefit of vision-impaired users. The possibilities are many, especially for units that generate information services, such as library and information centres.

7.7 ANSWERS TO SELF CHECK EXERCISES

1) Content Organisation is the task of organising content according to different prescribed standards or categorisation of the database content. For example, for library purposes one of the familiar databases is the bibliographic database. The common database elements and structure for this will be elements such as: Author, Title, Publisher, Edition, ISBN, etc. Further, the design also includes planning for other features of bibliographic data, such as whether a field is repeatable, what type of data will be entered in the field, etc. Hence content organisation is a scientific task involving planning and designing the information model as per the requirements and purpose for which the database is compiled. In libraries, standards such as MARC 21 are used for bibliographic data organisation. Content may also be organised in a classified manner, as most digital library collections are. The typical organisation in library collections would be by type of material, such as e-books, journals, and reference works like dictionaries and encyclopaedias. Another categorisation is by type of sub-collection.

2) Textual data is perhaps the largest type of data. Text data can be mainly of two types:
   1) Plain text
   2) Formatted text: the most familiar formats are .doc, .wpd, .ps and .pdf
   However, with the emergence of the Internet, Hypertext, which is structured text written in the Hypertext Markup Language (HTML) for web documents, is also a popular text format.

3) Important features to be considered while including images in databases are as follows:
   • Colour
   • Resolution
   • File sizes
   • Compression
   • Conversion and compatibility

4) The popular audio formats are the following:
   .wav: wave format
   .mp3: MPEG layer 3
   .au: audio
   .aiff: Audio Interchange File Format
   .mid: MIDI format
   .voc: voice data

7.8 KEYWORDS

Animation : Moving graphic images.
ASCII : American Standard Code for Information Interchange.
AVI file format : Audio Video Interleave file format.
Bandwidth : A measurement of the amount of data that can be transmitted over a network at any given time.
Binary Files : Data files in which the information is stored in the binary code of 1’s and 0’s that makes up the basic language of computers.
BMP : Bitmap format.
Dithering : A technique used in computing to improve the appearance of an image by adding more colours or shades of grey to an existing image.
GIF : Graphics Interchange Format.
HTML : Hypertext Markup Language, the language in which web pages are written.
Hyperlink : A link that allows users to navigate between pieces of data that are interlinked through referencing in HTML.
Hypermedia : Integration of two or more of text, audio, video and animations into a single resource using interactive links.
JPG/JPEG : Image format by the Joint Photographic Experts Group.
MIDI : Musical Instrument Digital Interface.
Multimedia : Integration of two or more of text, audio, video and animations into a single resource.
Pixel : One of the smallest units or picture elements that make up an image on a computer or television screen.
TIF : Tagged Image File Format.
.wav : Wave audio format.

7.9 REFERENCES AND FURTHER READING

Download file formats and extension. http://www.learnthenet.com/english/html/34filext.htm
Frater, Harald and Paulissen, Dirk. (1994). Multimedia Mania. Grand Rapids: Abacus.
How to make an image database. http://courseware.utoronto.ca/web-ct/help/img_db/img_how.html (browsed on 17/05/04)
Image Compression. http://www.netnam.vn/unescocourse/computervision/101.htm
Perry, Paul. (1994). Multimedia Developers Guide. Indianapolis: Sams Publishing.
Mediachance. http://www.mediachance.com/
Multimedia Tutorial. http://hotwired.lycos.com/webmonkey/98/17/index0a_page3.html?tw=multimedia
Otwell, A. Comparison of Content Organising Methods. http://www.heyotwell.com/work/ia/contentorgmethods.doc (retrieved 20/01/05)
Rimmer, Steve. (1994). Converting and using Graphic files. Indianapolis: Sams Publishing.
Review of Internet file formats. http://dio.cdlr.strath.ac.uk/file_formats.html
Rosch, Winn L. (1995). Multimedia Bible. Indianapolis: Sams Publishing.
Wodaski, Ron. (1994). Multimedia Madness. Indianapolis: Sams Publishing.