Unit 2 File Concepts and File Structure
UNIT 2 FILE CONCEPTS AND FILE STRUCTURE

Structure
2.0 Objectives
2.1 Introduction
2.2 File Organisation
    2.2.1 Sequential File Organisation
    2.2.2 Indexed Sequential File
2.3 Random Access File Organisation
2.4 Multi-key File Organisation
    2.4.1 Multilist File Organisation
    2.4.2 Inverted File Organisation
2.5 Summary
2.6 Answers to Self Check Exercises
2.7 Keywords
2.8 References and Further Reading

2.0 OBJECTIVES

In the preceding Unit of this block you learnt about database concepts, the various types of databases and database models. You have seen that files are the main ingredient of databases. In this Unit you will learn the concepts of files in the computer environment and how such files are organised.

After completing this Unit, you will be able to:
    define file organisation and discuss its different types;
    understand various file organisation techniques; and
    discuss the various types of indexes used in file organisation.

2.1 INTRODUCTION

Generally speaking, a file consists of a collection of records. A key concern in file management is the way in which the records are organised inside the file, since this heavily affects system performance in finding and accessing records. Here, by “organisation” we refer to the logical arrangement of the records in a file (their ordering or, more generally, the presence of “closeness” relations between them based on their content), and not to the physical layout of the file as stored on a storage medium.

However, the method of accessing records in a file depends on the physical medium on which the file is stored. Magnetic tape is sequential by its very nature. To read a record you must start at the beginning of the tape and read each record one after another until you get to the one you want, just as when you listen to a song recorded on an audio tape. With disks, of course, random access of records is possible.
It is the same as the difference between an audio cassette and an audio compact disc. With an audio tape, you have to start at the beginning and run the tape forward until you get to the song you want to hear. With a compact disc you can play the songs in random order or go directly to the track you want to hear.

In this Unit, we will discuss the ways data are represented in files on external storage devices so that the required functions (e.g., retrieval, update) may be carried out efficiently. The organisation method most suitable for a particular application depends on such factors as the kind of external storage available, the types of queries allowed, the number of keys, the mode of retrieval and the mode of update.

2.2 FILE ORGANISATION

The technique used to represent and store the records in a file is known as file organisation. A file is a named collection of related data or facts. Fields are the columns containing one type of information, and a file is the group of all the records. Therefore, a file contains records; records contain fields; fields contain data items; and data items contain characters (letters, digits, special characters, etc.). In a single-byte character encoding such as ASCII, each character occupies one byte of storage.

[Figure: Characters, Fields, Records, File]
Fig. 2.1: Components of a File

In the context of a traditional library, the author catalogue is a file. Each individual author catalogue card is a record. Each area on a card, such as author or title, is a field. Thus, a file may consist of one or more records, a record may consist of one or more fields, and so on. A database is a collection of files that together implement a logical data model. File organisation therefore refers to the method used to organise data for storage, retrieval and processing.

The two types of files in a physical database structure are data files and index files. Data files store the facts that comprise the database.
Index files (or directories) support access to the data files but usually do not themselves store facts other than key values. A database’s logical structure helps in determining which facts should be accessed and how these facts relate to one another.

Consider a simple bibliographical database. It may consist of a file of records containing bibliographical details about books. Each record about a book may consist of several fields (Author, Title, Imprint, etc.). For fast access to the records we may create another file, an index file or inverted file, in which each record holds an index term (name of author, subject descriptor, etc.) and an index number. It is similar to a back-of-the-book index.

The organisation of a file determines the sequence of the file’s records, which is the physical ordering of records in storage. It also determines the set of operations necessary to find particular records.

Record Access Method

The method of organising the records in a file is referred to as its structure or organisation. The method by which we search the file in order to retrieve data is called the access method. Since the type of structure determines the possible means of access and vice versa, these two elements, structure/organisation and access method, are intertwined.

Table 1: Record access for different file organisations

File Organisation    Sup
Sequential           Seq
Direct               Seq
Indexed              Seq

For a particular file, the most appropriate organisation is determined on the basis of the operational characteristics of the storage medium used and the nature of the operations to be performed on the data. The most important characteristic of a storage device is whether it allows direct access to particular record occurrences or only sequential access. Magnetic disks are examples of direct access storage devices (abbreviated DASDs). Magnetic tapes are examples of sequential storage devices.
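The difference between sequential and direct access can be illustrated with a minimal Python sketch. It is not taken from this Unit: the fixed-length record layout (a 4-byte key followed by a 20-byte title), the function names and the sample titles are all assumptions made for illustration, and an in-memory buffer stands in for the storage device.

```python
import io
import struct

# Assumed fixed-length layout: 4-byte record key + 20-byte title (24 bytes).
RECORD = struct.Struct("<I20s")

def make_file(rows):
    """Write (key, title) rows one after another into an in-memory file."""
    buf = io.BytesIO()
    for key, title in rows:
        buf.write(RECORD.pack(key, title.encode("ascii")))
    buf.seek(0)
    return buf

def sequential_read(f, wanted_key):
    """Tape-style access: read from the start until the key is found.

    Returns (title or None, number of records read).
    """
    f.seek(0)
    reads = 0
    while chunk := f.read(RECORD.size):
        reads += 1
        key, title = RECORD.unpack(chunk)
        if key == wanted_key:
            return title.rstrip(b"\0").decode("ascii"), reads
    return None, reads

def direct_read(f, position):
    """Disk-style access: jump straight to the record at a known position."""
    f.seek(position * RECORD.size)
    key, title = RECORD.unpack(f.read(RECORD.size))
    return title.rstrip(b"\0").decode("ascii"), 1

# Placeholder records, invented for this example.
f = make_file([(101, "Title A"), (102, "Title B"), (103, "Title C")])
seq_result = sequential_read(f, 103)   # reads all three records
direct_result = direct_read(f, 2)      # reads only the third record
print(seq_result, direct_result)
```

Both calls return the same record, but the sequential search has to read every record in front of it, whereas the direct read touches only the record it needs. Direct access works here only because every record has the same length, so its position in the file can be computed.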
The four basic file organisation techniques that we will discuss here are the following:
1) Sequential
2) Indexed sequential
3) Random Access
4) Multi-key

2.2.1 Sequential File Organisation

Sequential file organisation is the simplest file organisation technique. In a sequentially organised file, records are written in sequence in one long list. The records are arranged in the same sequence in which they were originally written into the file. That is, the records are stored one after another; e.g., the record with sequence number 11 is located just after the 10th record.

[Figure: Beginning of file, Record 1, Record 2, ..., Record n-1, Record n, End of file]
Fig. 2.2: Structure of Sequential File

The file is read from the beginning, in the sequence in which the records are arranged. Thus, in a simple sequential file, the only way to retrieve data is to start at the beginning of the file and read one record after another, in sequence, until you reach the record you are searching for. The search is sequential, record by record. This can be time consuming, especially for large files: in searching large databases, the sequential method takes relatively more time to identify and retrieve particular records than other file organisations.

A sequential file could be stored on a sequential storage device such as a magnetic tape. Sequential files are, however, suitable only for storing archive, backup and transport copies of databases.

Updating a sequential file usually requires the creation of a new file. To maintain file sequence, records are copied from the original file to the new file up to the point where an amendment is required. The changes are then made and written to the new file. Following this, the remaining records in the original file are copied to the new file.

The basic advantages offered by a sequential file are the ease of access to the next record, the simplicity of organisation and the absence of auxiliary data structures.
However, replies to simple queries are time consuming for large files. In a sequential file, in addition to slow access for simple queries, there are problems in the insertion and deletion of records. The drawback of a sequential file is that once it is created, records can be added only at the end of the file. It is not possible to insert records in the middle of the file without rewriting the file, nor is it possible to modify an existing record without rewriting the file. To delete a record you must first locate it.

2.2.2 Indexed Sequential File

The indexed sequential file is designed to overcome the limitations of the sequential file. In an indexed sequential file, the file is sequenced on a particular field, and an index for the file is built on that same field. The index provides a mechanism for faster search. Through indexing, a set of objects is associated with a set of orderable quantities.

The indexed sequential file organisation allows both sequential and random processing. A sequential file (one sorted on its primary key) that is indexed is called an indexed sequential file. The index provides random access to records, while the sequential nature of the file provides easy access to subsequent records as well as sequential processing. An additional feature of this file organisation is the overflow area, which provides additional space for record addition without necessitating the creation of a new file.

Before starting the discussion of indexed sequential file structure, let us discuss the types of indexes that are possible.
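Before turning to index types, the three ingredients just described (a main area sorted on the key, an index on that key, and an overflow area) can be previewed with a minimal Python sketch. This is only one possible arrangement, written under stated assumptions: the class name, the block size of 2, the index that holds the first key of each block, and the unsorted overflow list are all illustrative choices, not the structure this Unit goes on to describe.

```python
import bisect

class IndexedSequentialFile:
    """Toy model: sorted main area, an index on the key, an overflow area."""

    def __init__(self, records, block_size=2):
        # Main area: (key, value) pairs sorted on the key field.
        self.main = sorted(records)
        self.block_size = block_size
        # Index: first key of every block; block number = position in list.
        self.index_keys = [
            self.main[i][0] for i in range(0, len(self.main), block_size)
        ]
        # Overflow area: later additions, kept outside the sorted main area.
        self.overflow = []

    def add(self, key, value):
        """Add a record without rewriting the main area."""
        self.overflow.append((key, value))

    def find(self, key):
        """Random access: index, then one block of the main area, then overflow."""
        block = bisect.bisect_right(self.index_keys, key) - 1
        if block >= 0:
            start = block * self.block_size
            for k, v in self.main[start:start + self.block_size]:
                if k == key:
                    return v
        for k, v in self.overflow:
            if k == key:
                return v
        return None

    def scan(self):
        """Sequential processing: every record in key order."""
        return sorted(self.main + self.overflow)

# Placeholder records, invented for this example.
isf = IndexedSequentialFile([(30, "C"), (10, "A"), (40, "D"), (20, "B")])
isf.add(25, "E")                       # goes to the overflow area
found_main = isf.find(40)              # located through the index
found_overflow = isf.find(25)          # located in the overflow area
keys_in_order = [k for k, _ in isf.scan()]
print(found_main, found_overflow, keys_in_order)
```

In this sketch a lookup examines one block of the main area and then the overflow area, instead of reading the file from the beginning. A new record is placed in the overflow area, so the sorted main area does not have to be rewritten.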