
BCA204T: DATA BASE MANAGEMENT SYSTEMS Unit - I Introduction: Database and Database Users, Characteristics of the Database Approach, Different people behind DBMS, Implications of the Database Approach, Advantages of using a DBMS, When not to use a DBMS. Database System Concepts and Architecture: Data Models, Schemas, and Instances, DBMS Architecture and Data Independence, Database Languages and Interfaces, The Database System Environment, Classification of DBMSs.


Unit-I Introduction

Introduction: A database is a collection of related data. A database management system (DBMS) is software designed to assist in the maintenance and utilization of large-scale collections of data. The DBMS came into existence in the early 1960s, when Charles Bachman designed the Integrated Data Store (IDS), regarded as the first general-purpose DBMS. Later in the 1960s, IBM brought out IMS (Information Management System). In 1970, Edgar Codd at IBM proposed the relational model, on which the RDBMS is based. In the 1980s came SQL (Structured Query Language), and through the 1980s and 1990s there were further advances in DBMSs, e.g., DB2 and ORACLE.

Data: Data is raw fact or figures or entity.

Information: The processed data is called information.

Database: A database is a collection of related data.

For example, a university database might contain information about the following:
• Entities such as students, faculty, and courses.
• Relationships between entities, such as students' enrollment in courses and faculty teaching courses.

Database Management System:

A Database Management System (DBMS) is a collection of programs that enables users to create, maintain, and manipulate a database. The DBMS is hence a general-purpose software system that facilitates the processes of defining, constructing, and manipulating databases for various applications.
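As a hedged sketch of those three processes in SQL, in the style of the examples later in this unit (the student table and its columns are invented for illustration):

-- Defining the database: specify the structure (schema) of the data
create table student (regno number(10), name varchar(50), course varchar(20));
-- Constructing the database: store the actual data
insert into student values (1001, 'Nisarga', 'BCA');
-- Manipulating the database: query and update the stored data
select name from student where course = 'BCA';
update student set course = 'BCA-II' where regno = 1001;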


Characteristics of DBMS
• To incorporate the requirements of the organization, the system should be designed for easy maintenance.
• Information systems should allow interactive access to data to obtain new information without writing fresh programs.
• The system should be designed to co-relate different data to meet new requirements.
• An independent central repository, which gives the information and meaning of the available data, is required.
• An integrated database will help in understanding the inter-relationships between data stored in different applications.
• The stored data should be made available for access by different users simultaneously.
• An automatic recovery feature has to be provided to overcome the problems caused by processing-system failures.


Different people behind DBMS
These apply to "large" databases, not "personal" databases that are defined, constructed, and used by a single person via, say, Microsoft Access.

There are two categories of people behind a DBMS:
a) Those who actually use and control the database content, and those who design, develop and maintain database applications (called "Actors on the Scene").
b) Those who design and develop the DBMS software and related tools, and the computer systems operators (called "Workers Behind the Scene").

a) Actors on the Scene
1. Database Administrator (DBA): The DBA is a person who is responsible for authorizing access to the database, coordinating and monitoring its use, and acquiring software and hardware resources as needed.

2. Database Designers: They are responsible for identifying the data to be stored and for choosing an appropriate way to organize it. They also define views for different categories of users. The final design must be able to support the requirements of all the user sub-groups.

3. End Users: These are persons who access the database for querying, updating, and report generation. They are the main reason for the database's existence!
o Casual end users: use the database occasionally, needing different information each time; use a query language to specify their requests; typically middle- or high-level managers.
o Naive/Parametric end users: typically the biggest group of users; frequently query/update the database using standard canned transactions that have been carefully programmed and tested in advance. Examples:

. bank tellers check account balances and post withdrawals/deposits.
. reservation clerks for airlines, hotels, etc., check availability of seats/rooms and make reservations.
o Sophisticated end users: engineers, scientists, and business analysts who implement their own applications to meet their complex needs.
o Stand-alone users: use "personal" databases, possibly employing a special-purpose (e.g., financial) software package; they mostly maintain personal databases using ready-to-use packaged applications. An example is a tax program user who creates his or her own internal database; another example is maintaining an address book.

4. System Analysts, Application Programmers, Software Engineers:
o System Analysts: determine the needs of end users, especially naive and parametric users, and develop specifications for canned transactions that meet these needs.
o Application Programmers: implement, test, document, and maintain the programs that satisfy these specifications.

b) Workers Behind the Scene
1) DBMS system designers/implementers: provide the DBMS software that is at the foundation of all this!
2) Tool developers: design and implement software tools facilitating database system design, performance monitoring, creation of graphical user interfaces, prototyping, etc.
3) Operators and maintenance personnel: responsible for the day-to-day operation of the system.


Advantages of a DBMS
Using a DBMS to manage data has many advantages:
Data independence: Application programs should be as independent as possible from details of data representation and storage. The DBMS can provide an abstract view of the data to insulate application code from such details.
Efficient data access: A DBMS utilizes a variety of sophisticated techniques to store and retrieve data efficiently. This feature is especially important if the data is stored on external storage devices.
Data integrity and security: If data is always accessed through the DBMS, the DBMS can enforce integrity constraints on the data. For example, before inserting salary information for an employee, the DBMS can check that the department budget is not exceeded. Also, the DBMS can enforce access controls that govern what data is visible to different classes of users.
Data administration: When several users share the data, centralizing the administration of data can offer significant improvements. Experienced professionals who understand the nature of the data being managed, and how different groups of users use it, can be responsible for organizing the data representation to minimize redundancy and for fine-tuning the storage of the data to make retrieval efficient.
Concurrent access and crash recovery: A DBMS schedules concurrent accesses to the data in such a manner that users can think of the data as being accessed by only one user at a time. Further, the DBMS protects users from the effects of system failures.
Reduced application development time: Clearly, the DBMS supports many important functions that are common to many applications accessing data stored in the DBMS. This, in conjunction with the high-level interface to the data, facilitates quick development of applications. Such applications are also likely to be more robust than applications developed from scratch because many important tasks are handled by the DBMS instead of being implemented by the application.
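As a hedged sketch of the integrity and security points (all table, column, and role names here are invented; the department-budget rule from the text would need a trigger or assertion, so a simpler referential constraint plus a view-based access control are shown instead):

-- Integrity: an employee's department must exist, and salary cannot be negative
create table department (deptno number(4) primary key, budget number(10));
create table employee (
  empno number(6) primary key,
  deptno number(4) references department(deptno),
  salary number(8) check (salary >= 0)
);
-- Security: expose only non-sensitive columns to one class of users
create view emp_public as select empno, deptno from employee;
grant select on emp_public to clerk_role;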


Functions of DBMS
• Data Definition: The DBMS provides functions to define the structure of the data in the application. These include defining and modifying the record structure, the type and size of fields, and the various constraints to be satisfied by the data in each field.
• Data Manipulation: Once the data structure is defined, data needs to be inserted, modified or deleted. The functions which perform these operations are also part of the DBMS. These functions can handle planned and unplanned data manipulation needs. Planned queries are those which form part of the application; unplanned queries are ad-hoc queries which are performed on a need basis.
• Data Security & Integrity: The DBMS contains modules which handle the security and integrity of data in the application.
• Data Recovery and Concurrency: Recovery of the data after a system failure and concurrent access of records by multiple users are also handled by the DBMS.
• Data Dictionary Maintenance: Maintaining the data dictionary, which contains the data definitions of the application, is also one of the functions of a DBMS.
• Performance: Optimizing the performance of queries is one of the important functions of a DBMS.

When not to use a DBMS
a) Main inhibitors (costs) of using a DBMS:
i) High initial investment and possible need for additional hardware.
ii) Overhead for providing generality, security, concurrency control, recovery, and integrity functions.
b) When a DBMS may be unnecessary:
i) If the database and applications are simple, well defined, and not expected to change.
ii) If there are stringent real-time requirements that may not be met because of DBMS overhead.
iii) If access to data by multiple users is not required.
c) When no DBMS may suffice:
i) If the database system is not able to handle the complexity of the data because of modeling limitations.
ii) If the database users need special operations not supported by the DBMS.

Role of the Database Administrator
Typically there are three types of users for a DBMS:
1. The End User, who uses the application. Ultimately, this is the person who actually puts the data in the system to use in the business. This user need not know anything about the organization of data at the physical level.
2. The Application Programmer, who develops the application programs. He/she has more knowledge about the data and its structure, and can manipulate the data using his/her programs. He/she also need not have access to, or knowledge of, the complete data in the system.
3. The Database Administrator (DBA), who is like the super-user of the system.

The role of the DBA is very important and is defined by the following functions.
• Defining the schema: The DBA defines the schema, which contains the structure of the data in the application. The DBA determines what data needs to be present in the system and how this data has to be represented and organized.
• Liaising with users: The DBA needs to interact continuously with the users to understand the data in the system and its use.
• Defining security & integrity checks: The DBA finds out about the access restrictions to be defined and defines security checks accordingly. Data integrity checks are also defined by the DBA.
• Defining backup/recovery procedures: The DBA also defines procedures for backup and recovery. Defining the backup procedure includes specifying what data is to be backed up, the periodicity of taking backups, and also the medium and storage place for the backup data.
• Monitoring performance: The DBA has to continuously monitor the performance of the queries and take measures to optimize all the queries in the application.

Database Manager

The database manager is a program module which provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system:
– The database manager translates DML statements into low-level file system commands for storing, retrieving, and updating data in the database.
– Integrity enforcement: the database manager enforces integrity by checking consistency constraints, such as the rule that the bank balance of a customer must be maintained at a minimum of Rs. 1000, etc.
– Security enforcement: unauthorized users are prohibited from viewing the information stored in the database.
– Backup and recovery: backup and recovery of the database are necessary to ensure that the database remains consistent despite failures.
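A minimal sketch of the minimum-balance rule above as a declarative constraint (the Rs. 1000 figure comes from the text; the table and column names are illustrative):

create table customer_account (
  accountnumber number(10) primary key,
  balance number(8) check (balance >= 1000)  -- balance must stay at or above Rs. 1000
);
-- The DBMS then rejects any insert or update that violates the constraint, e.g.:
-- insert into customer_account values (501, 400);  -- fails the CHECK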

Database Users

Database users are the people who need information from the database to carry out their business responsibilities. Database users can be broadly classified into two categories: application programmers and end users.


Sophisticated End Users
Sophisticated end users interact with the system without writing programs. They form requests by writing queries in a database query language. These are submitted to the query processor. Analysts who submit queries to explore data in the database fall into this category.
Specialized End Users
Specialized end users write specialized database applications that do not fit into the traditional data-processing framework. Such applications include knowledge-base and expert systems, environment modeling systems, etc.
Naive End Users
Naive end users interact with the system by using permanent application programs. Example: a query made by a student for the number of books borrowed, in a library database.
System Analysts
System analysts determine the requirements of end users and develop specifications for canned transactions that meet these requirements.
Canned Transactions
Ready-made programs through which naive end users interact with the database are called canned transactions.


Database System Concepts and Architecture

Data Models, Schemas, and Instances
One fundamental characteristic of the database approach is that it provides some level of data abstraction by hiding details of data storage that are irrelevant to database users. A data model is a collection of concepts that can be used to describe the conceptual/logical structure of a database. By the structure of a database we mean the data types, relationships, and constraints that hold for the data.

According to C.J. Date (one of the leading database experts), a data model is an abstract, self-contained, logical definition of the objects, operators, and so forth, that together constitute the abstract machine with which users interact. The objects allow us to model the structure of data; the operators allow us to model its behavior.

Types of Data Models
1. High level - conceptual data models
2. Low level - physical data models
3. Relational or representational data models
4. Object-oriented data models
5. Object-relational data models

1. High level - conceptual data model: The user-level data model is the high-level or conceptual model. It provides concepts that are close to the way many users perceive data.
2. Low level - physical data model: provides concepts that describe the details of how data is stored in the computer. A low-level data model is meant only for computer specialists, not for end users.


3. Representational data model: lies between the high-level and low-level data models. It provides concepts that may be understood by end users but that are not too far removed from the way data is organized within the computer.

The most common data models are:
1. Relational Model

The relational model uses a collection of tables to represent both data and the relationships among those data. Each table has multiple columns, and each column has a unique name. (Figure: a relational database comprising two tables.)

Advantages
1. The main advantage of this model is its ability to represent data in a simplified format.
2. The process of manipulating records is simplified with the use of certain key attributes used to retrieve data.
3. Representation of different types of relationships is possible with this model.


2. Network Model
The data in the network model are represented by collections of records, and relationships among the data are represented by links, which can be viewed as pointers.

The records in the database are organized as collections of arbitrary graphs.

Advantages:
1. Representation of relationships between entities is implemented using pointers, which allows the representation of arbitrary relationships.
2. Unlike the hierarchical model, it is not restricted to tree structures, so such relationships are easy to represent.
3. Data manipulation can be done easily with this model.

3. Hierarchical Model

A hierarchical data model is a data model in which the data is organized into a tree-like structure. The structure allows repeating information using parent/child relationships: each parent can have many children, but each child has only one parent. All attributes of a specific record are listed under an entity type.


Advantages:
1. The representation of records is done using an ordered tree, which is a natural method of implementing one-to-many relationships.
2. Proper ordering of the tree results in easier and faster retrieval of records.
3. It allows the use of virtual records. This results in a stable database, especially when modifications of the database are made.

Data Instances and Schemas

Database Instances
Databases change over time as information is inserted and deleted. The collection of information stored in the database at a particular moment is called an instance of the database.
Database Schema
The overall design of the database is called the database schema. A schema is a collection of named objects. Schemas provide a logical classification of objects in the database. A schema can contain tables, views, triggers, functions, packages, and other objects.
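As a small illustration, using the account table that appears later in this unit: the schema is the declared design, while the instance is whatever rows happen to be stored at a given moment.

-- Schema: the overall design, declared once
create table account (accountnumber number(10), balance number(8));
-- Instance today: two rows
insert into account values (101, 5000);
insert into account values (102, 7500);
-- Instance tomorrow: different rows, same schema
delete from account where accountnumber = 101;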

DBMS Architecture
A commonly used view of the database approach is the three-level architecture suggested by ANSI/SPARC (American National Standards Institute/Standards Planning and Requirements Committee). ANSI/SPARC proposed an architectural framework for databases.


The three levels of the architecture are three different views of the data:
• External Schema - individual user view
• Conceptual Schema - logical or community user view
• Physical Schema - internal or storage view
The three-level database architecture allows a clear separation of the information meaning (conceptual view) from the external data representation and from the physical data structure layout. A database system that is able to separate the three different views of data is likely to be flexible and adaptable.

The External Schema is the view that the individual user of the database has. This view is often a restricted view of the database and the same database may provide a number of different views for different classes of users.
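A hedged sketch of an external schema defined as a view (the student table and its columns are invented): each class of users can be given its own restricted view of the same database.

-- Conceptual schema: the full stored relation
create table student (regno number(10), name varchar(50), marks number(3), feesdue number(8));
-- External schema for teaching staff: academic data only, fee details hidden
create view student_academic as select regno, name, marks from student;
-- External schema for the accounts office: fee data only
create view student_accounts as select regno, name, feesdue from student;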

The Conceptual schema (sometimes called the logical schema) describes the stored data in terms of the data model of the DBMS. In a relational DBMS, the conceptual schema describes all relations that are stored in the database. It hides physical storage details, concentrating upon describing entities, data types, relationships, user operations, and constraints.


The physical schema specifies additional storage details. Essentially, the physical schema summarizes how the relations described in the conceptual schema are actually stored on secondary storage devices such as disks and tapes. It tells us what data is stored in the database and how.

Data Independence
Data independence can be defined as the capacity to change the schema at one level without changing the schema at the next higher level. It also means that the internal structure of the database should be unaffected by changes to the physical aspects of storage. Because of data independence, the database administrator can change the database storage structures without affecting the users' views.


The two types of data independence are:
1. Physical data independence
2. Logical data independence

1. Physical data independence is the capacity to change the internal (physical) schema without changing the conceptual (logical) schema.

2. Logical data independence is the capacity to change the conceptual schema without having to change the external schemas.
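A hedged sketch of logical data independence (names invented): the conceptual schema changes, but an external view defined on it, and the applications that use that view, need not change.

-- External schema used by applications
create view student_names as select regno, name from student;
-- Conceptual schema change: a new column is added to the stored table
alter table student add (email varchar(60));
-- student_names, and every application using it, continues to work unchanged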

Database languages and interfaces A database system provides a data definition language to specify the database schema and a data manipulation language to express database queries and updates. In practice, the data definition and data manipulation languages are not two separate languages; instead they simply form parts of a single database language, such as the widely used SQL language.


Data Definition Language Data Definition Language (DDL) statements are used to define the database structure or schema. Some examples:

o CREATE - creates objects in the database

o ALTER - alters the structure of the database

o DROP - deletes objects from the database

o TRUNCATE - removes all records from a table; the space allocated for the records is also removed

For instance, the following statement in the SQL language is used to create the account table:

create table account (accountnumber number(10), balance number(8));
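Continuing with the same account table, hedged one-line examples of the other DDL statements listed above:

alter table account add (branchname varchar(30));  -- ALTER: change the table's structure
truncate table account;  -- TRUNCATE: remove all records but keep the table definition
drop table account;      -- DROP: delete the table object itself from the database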

The storage definition language (SDL) is used to specify the internal schema. The mappings between the two schemas may be specified in either one of these languages.

The view definition language (VDL) is used to specify user views and their mappings to the conceptual schema, but in most DBMSs the DDL is used to define both conceptual and external schemas.

In addition, the DDL compiler updates a special set of tables called the data dictionary or data directory. A data dictionary contains metadata, that is, data about data. The schema of a table is an example of metadata. A database system consults the data dictionary before reading or modifying actual data.


Data Manipulation Language
Data manipulation is:
• The retrieval of information stored in the database
• The insertion of new information into the database
• The deletion of information from the database
• The modification of information stored in the database

Data Manipulation Language (DML) statements are used for managing data within schema objects. Some examples:

o SELECT - retrieves data from a database

o INSERT - inserts data into a table

o UPDATE - updates existing data within a table

o DELETE - deletes records from a table

A data-manipulation language (DML) is a language that enables users to access or manipulate data as organized by the appropriate data model.

There are basically two types:
• Procedural DMLs require a user to specify what data are needed and how to get those data. The DML component of the PL/SQL language is procedural.
• Nonprocedural DMLs require a user to specify what data are needed without specifying how to get those data. The DML component of the SQL language is nonprocedural.
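A hedged side-by-side sketch using the customer table from the example below (the PL/SQL block is one common procedural form; DBMS_OUTPUT is used just to display the result):

-- Nonprocedural (SQL): state WHAT is wanted
select customername from customer where customerid = 192;

-- Procedural (PL/SQL): spell out HOW to fetch and process the rows
declare
  cursor c_cust is select customername from customer where customerid = 192;
begin
  for rec in c_cust loop
    dbms_output.put_line(rec.customername);  -- handle each fetched row explicitly
  end loop;
end;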

A query is a statement requesting the retrieval of information. The portion of a DML that involves information retrieval is called a query language.

The following query in the SQL language finds the name of the customer whose customer-id is 192:


select customername from customer where customerid = 192;

The query specifies that those rows from the table customer where the customerid is 192 must be retrieved.

DBMS Interfaces
User-friendly interfaces provided by a DBMS may include the following.
Menu-Based Interfaces for Web Clients or Browsing. These interfaces present the user with lists of options, called menus, that lead the user through the formulation of a request. Menus do away with the need to memorize the specific commands and syntax of a query language; rather, the query is composed step by step by picking options from a menu that is displayed by the system. Pull-down menus are a very popular technique in Web-based user interfaces. They are also often used in browsing interfaces, which allow a user to look through the contents of a database in an exploratory and unstructured manner.
Forms-Based Interfaces. A forms-based interface displays a form to each user. Users can fill out all of the form entries to insert new data, or they fill out only certain entries, in which case the DBMS will retrieve matching data for the remaining entries. Forms are usually designed and programmed for naive users as interfaces to canned transactions.
Graphical User Interfaces. A graphical user interface (GUI) typically displays a schema to the user in diagrammatic form. The user can then specify a query by manipulating the diagram. In many cases, GUIs utilize both menus and forms. Most GUIs use a pointing device, such as a mouse, to pick certain parts of the displayed schema diagram.
Natural Language Interfaces. These interfaces accept requests written in English or some other language and attempt to "understand" them. A natural language interface usually has its own "schema," which is similar to the database conceptual schema, as well as a dictionary of important words. The natural language interface refers to the words in its schema, as well as to the set of standard words in its dictionary, to interpret the request.
Interfaces for Parametric Users. Parametric users, such as bank tellers, often have a small set of operations that they must perform repeatedly. Systems analysts and programmers design and implement a special interface for such users. Usually, a small set of abbreviated commands is included, with the goal of minimizing the number of keystrokes required for each request.
Interfaces for the DBA. Most database systems contain privileged commands that can be used only by the DBA's staff. These include commands for creating accounts, setting system parameters, granting account authorization, changing a schema, and reorganizing the storage structures of a database.


The Database System Environment
A DBMS is a complex software system. The database and the DBMS catalog are usually stored on disk. Access to the disk is controlled primarily by the operating system (OS), which schedules disk reads/writes. Many DBMSs have their own buffer management module to schedule disk reads/writes, because this has a considerable effect on performance; reducing disk reads/writes improves performance considerably. A higher-level stored data manager module of the DBMS controls access to DBMS information that is stored on disk, whether it is part of the database or the catalog. The top part of the figure shows interfaces for the DBA staff, casual users who work with interactive interfaces to formulate queries, application programmers who create programs using some host programming language, and parametric users who do data entry work by supplying parameters to predefined transactions. The DBA staff works on defining the database and tuning it by making changes to its definition using the DDL and other privileged commands.

The DDL compiler processes schema definitions, specified in the DDL, and stores descriptions of the schemas (meta-data) in the DBMS catalog. The catalog includes information such as the names and sizes of files, names and data types of data items, storage details of each file, mapping information among schemas, and constraints.

Casual users and persons with occasional need for information from the database interact using some form of interface, which we call the interactive query interface. These queries are parsed and validated for correctness of the query syntax and the names of files by a query compiler that compiles them into an internal form. This internal query is subjected to query optimization; the query optimizer is concerned with the rearrangement and possible reordering of


operations, elimination of redundancies, and use of correct algorithms and indexes during execution.
The precompiler extracts DML commands from an application program written in a host programming language. These commands are sent to the DML compiler for compilation into object code for database access. The rest of the program is sent to the host language compiler. The object codes for the DML commands and the rest of the program are linked, forming a canned transaction whose executable code includes calls to the runtime database processor.

It is now common to have the client program that accesses the DBMS running on a separate computer from the computer on which the database resides. The former is called the client computer running a DBMS client software and the latter is called the database server. In some cases, the client accesses a middle computer, called the application server, which in turn accesses the database server.


Database System Utilities
In addition to possessing the software modules just described, most DBMSs have database utilities that help the DBA manage the database system. Common utilities have the following types of functions:
• Loading. A loading utility is used to load existing data files, such as text files or sequential files, into the database. Usually, the current (source) format of the data file and the desired (target) database file structure are specified to the utility, which then automatically reformats the data and stores it in the database.
• Backup. A backup utility creates a backup copy of the database, usually by dumping the entire database onto tape or another mass storage medium. The backup copy can be used to restore the database in case of disk failure.
• Database storage reorganization. This utility can be used to reorganize a set of database files into different file organizations, and create new access paths to improve performance.
• Performance monitoring. Such a utility monitors database usage and provides statistics to the DBA. The DBA uses the statistics in making decisions such as whether or not to reorganize files or whether to add or drop indexes to improve performance.
Other utilities may be available for sorting files, handling data compression, monitoring access by users, interfacing with the network, and performing other functions.


Centralized and Client/Server Architectures for DBMSs

Centralized DBMSs Architecture Architectures for DBMSs have followed trends similar to those for general computer system architectures. Earlier architectures used mainframe computers to provide the main processing for all system functions, including user application programs and user interface programs, as well as all the DBMS functionality. The reason was that most users accessed such systems via computer terminals that did not have processing power and only provided display capabilities. Therefore, all processing was performed remotely on the computer system, and only display information and controls were sent from the computer to the display terminals, which were connected to the central computer via various types of communications networks. As prices of hardware declined, most users replaced their terminals with PCs and workstations. At first, database systems used these computers similarly to how they had used display terminals, so that the DBMS itself was still a centralized DBMS in which all the DBMS functionality, application program execution, and user interface processing were carried out on one machine. The Figure illustrates the physical components in a centralized architecture. Gradually, DBMS systems started to exploit the available processing power at the user side, which led to client/server DBMS architectures.


Client/Server Architectures for DBMSs Database architecture essentially describes the location of all the pieces of information that make up the database application. Database architecture is logically divided into two types.

a) Logical two-tier Client/Server architecture
b) Logical three-tier Client/Server architecture

Two-tier Client / Server Architecture

In the two-tier Client/Server architecture, the user interface programs and application programs run on the client side. An interface called ODBC (Open Database Connectivity) provides an API that allows client-side programs to call the DBMS; most DBMS vendors provide ODBC drivers, and a client program may connect to several DBMSs. In this architecture some variation on the client is also possible; for example, in some DBMSs more functionality is transferred to the client, including the data dictionary, optimization, etc. In such an arrangement, the server is called a data server.


Three-tier Client / Server Architecture

The three-tier Client/Server database architecture is commonly used for web applications. An intermediate layer, called the application server or web server, stores the web connectivity software and the business logic (constraints) part of the application, which it uses to access the right amount of data from the database server. This layer acts as a medium for sending partially processed data between the database server and the client.

Differences between centralized and distributed databases

Centralized | Distributed
The database is maintained at one site | The database is maintained at a number of different sites
If the centralized system fails, the entire system is halted | If one site fails, the system continues to work with the other sites
Less reliable | More reliable


Classification of Database Management Systems
Several criteria are normally used to classify DBMSs.
The first is the data model on which the DBMS is based. The main data model used in many current commercial DBMSs is the relational data model. The object data model has been implemented in some commercial systems but has not had widespread use. Many legacy applications still run on database systems based on the hierarchical and network data models.
The second criterion used to classify DBMSs is the number of users supported by the system. Single-user systems support only one user at a time and are mostly used with PCs. Multiuser systems, which include the majority of DBMSs, support multiple concurrent users.
The third criterion is the number of sites over which the database is distributed. A DBMS is centralized if the data is stored at a single computer site. A centralized DBMS can support multiple users, but the DBMS and the database reside totally at a single computer site. A distributed DBMS (DDBMS) can have the actual database and DBMS software distributed over many sites, connected by a computer network. Homogeneous DDBMSs use the same DBMS software at all the sites, whereas heterogeneous DDBMSs can use different DBMS software at each site.
The fourth criterion is cost. It is difficult to propose a classification of DBMSs based on cost. Today we have open source (free) DBMS products like MySQL and PostgreSQL that are supported by third-party vendors with additional services. The main RDBMS products are available as free 30-day examination copies as well as personal versions.
We can also classify a DBMS on the basis of the types of access path options for storing files. One well-known family of DBMSs is based on inverted file structures.
Finally, a DBMS can be general purpose or special purpose. When performance is a primary consideration, a special-purpose DBMS can be designed and built for a specific application; such a system cannot be used for other applications without


major changes. Many airline reservations and telephone directory systems developed in the past are special-purpose DBMSs. These fall into the category of online transaction processing (OLTP) systems, which must support a large number of concurrent transactions without imposing excessive delays.


BCA204T : DATA BASE MANAGEMENT SYSTEMS Unit - II Data Modelling Using the Entity-Relationship Model: High level conceptual Data Models for Database Design with an example, Entity types, Entity sets, Attributes, and Keys, ER Model Concepts, Notation for ER Diagrams, Proper naming of Schema Constructs, Relationship types of degree higher than two. Record Storage and Primary File Organization: Secondary Storage Devices, Buffering of Blocks, Placing file Records on Disk, Operations on Files, Files of unordered Records (Heap files), Files of Ordered Records (Sorted files), Hashing Techniques, and Other Primary file Organizations.


Unit-II Data Modeling Using E-R Model

E-R Model

A logical representation of the data for an organization or for a business area is called an E-R model. It is also called the Entity-Relationship model.
Entity-Relationship Diagram
A graphical representation of an entity-relationship model is called an entity-relationship diagram, E-R diagram, or just ERD.

ER Model: Basic Concepts
The entity-relationship model defines the conceptual view of a database. It works around real-world entities and the associations among them. At the view level, the ER model is considered a good option for designing databases.
Entity
An entity is an object that exists and is distinguishable from other objects. An entity can be a person, a place, an object, an event, or a concept about which an organization wishes to maintain data. For example, in a school database, students, teachers, classes, and courses offered can be considered entities. All entities have some attributes or properties that give them their identity.
An entity set is a collection of similar types of entities. An entity set may contain entities whose attributes share similar values. For example, a Students set may contain all the students of a school; likewise a Teachers set may contain all the teachers of a school, from all faculties. Entity sets need not be disjoint.


Attributes
An attribute is a property that describes an entity. All attributes have values. For example, a student entity may have name, class, and age as attributes. There exists a domain or range of values that can be assigned to each attribute. For example, a student's name cannot be a numeric value; it has to be alphabetic. A student's age cannot be negative, etc.
Types of attributes:

Simple attribute: Simple attributes are atomic values, which cannot be divided further. For example, student's phone-number is an atomic value of 10 digits.

Composite attribute: Composite attributes are made of more than one simple attribute. For example, a student's name may have Firstname and Lastname.

Derived attribute: Derived attributes are attributes which do not exist physically in the database, but whose values are derived from other attributes present in the database. For example, Age can be derived from DOB.

Stored attribute: An attribute whose value cannot be derived from the values of other attributes is called a stored attribute. For example, DOB.

Single-valued attribute: Single-valued attributes contain a single value. For example: SocialSecurityNumber.

Multi-valued attribute: Multi-valued attributes may contain more than one value. For example, a person can have more than one phone number, email-id, etc.
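A hedged sketch of how a multi-valued attribute is typically stored relationally (table and column names invented): the phone numbers move into a separate table keyed by the owning entity, one row per value.

create table student (regno number(10) primary key, name varchar(50));
-- One row per phone number; a student may have several
create table student_phone (
  regno number(10) references student(regno),
  phone varchar(15),
  primary key (regno, phone)
);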


Entity-set and Keys
A key is an attribute or collection of attributes that uniquely identifies an entity within an entity set. For example, the RegNo of a student makes her/him identifiable among students.

Super Key: Set of attributes (one or more) that collectively identifies an entity in an entity set.

Candidate Key: A minimal super key is called a candidate key; that is, a candidate key is a super key for which no proper subset is also a super key. An entity set may have more than one candidate key.

Primary Key: This is one of the candidate keys, chosen by the database designer to uniquely identify the entity set.

Relationship
The association among entities is called a relationship. For example, an employee entity has the relationship works_at with a department. Another example is a student who enrolls in some course. Here, works_at and Enrolls are called relationships.

Keys
• Superkey: an attribute or set of attributes that uniquely identifies an entity.
• Composite key: a key requiring more than one attribute.
• Candidate key: a superkey such that no proper subset of its attributes is also a superkey (a minimal superkey, with no unnecessary attributes).
• Primary key: the candidate key chosen to be used for identifying entities and accessing records. Unless otherwise noted, "key" means "primary key".
• Alternate key: a candidate key not used as the primary key.
• Secondary key: an attribute or set of attributes commonly used for accessing records, but not necessarily unique.


• Foreign key: a term used in relational databases (but not in the E-R model) for an attribute that is the primary key of another table and is used to establish a relationship with that table, in which it also appears as an attribute. A foreign key value therefore occurs both in its own table and in the referenced table.
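A hedged sketch tying these terms to SQL (names invented): a primary key, an alternate (candidate) key declared UNIQUE, and a foreign key.

create table department (
  deptno number(4) primary key,   -- primary key
  deptname varchar(30) unique     -- another candidate key, here an alternate key
);
create table student (
  regno number(10) primary key,   -- primary key
  ssn varchar(11) unique,         -- alternate key: a candidate key not chosen as primary
  deptno number(4) references department(deptno)  -- foreign key into department
);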

Mapping Cardinalities: Cardinality defines the number of entities in one entity set which can be associated with entities of another set via a relationship set.

One-to-one: one entity from entity set A can be associated with at most one entity of entity set B and vice versa.

One-to-many: one entity from entity set A can be associated with more than one entity of entity set B, but an entity from entity set B can be associated with at most one entity of entity set A.


Many-to-one: more than one entity from entity set A can be associated with at most one entity of entity set B, but one entity from entity set B can be associated with more than one entity from entity set A.

Many-to-many: one entity from A can be associated with more than one entity from B and vice versa.
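A hedged sketch of how a many-to-many relationship, such as students enrolling in courses, is usually realized relationally (names invented; student and course are assumed to have the keys shown): a separate relationship table whose key combines the keys of both entity sets.

create table course (courseno number(6) primary key, title varchar(40));
-- Enrolls: the M:N relationship set between student and course
create table enrolls (
  regno number(10) references student(regno),
  courseno number(6) references course(courseno),
  primary key (regno, courseno)  -- each (student, course) pair recorded once
);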


Notations/Symbols used in E-R diagrams

Entity: an entity can be any object, place, person, or class.

In an E-R diagram, an entity is represented using a rectangle. For example, Student is an entity.

Strong entities exist independently of other entity types. They always possess one or more attributes (a primary key) that uniquely distinguish each occurrence of the entity. For example, Student is a strong entity.


Weak entities depend on another entity. A weak entity doesn't have a key attribute of its own. A double rectangle represents a weak entity.

Relationship
A relationship describes an association between entities. A relationship is represented using a diamond.

Attributes: An attribute describes a property or characteristic of an entity. An attribute is represented using an ellipse. For example, regno, name, and course can be attributes of the student entity.

Key Attribute
A key attribute represents the main characteristic of an entity. It is used to represent the primary key. An ellipse with the attribute name underlined represents a key attribute.

Composite Attribute
An attribute that can be subdivided into component attributes is known as a composite attribute.

Multivalued Attribute
Multivalued attributes are those that are capable of taking on more than one value. A multivalued attribute is represented by a double ellipse.


Derived Attribute
Derived attributes are attributes whose values can be calculated from related attribute values. They are represented by a dotted ellipse.
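A hedged illustration of a derived attribute: Age is not stored but computed from the stored DOB at query time (Oracle-style date arithmetic, assuming a student table with a dob column):

-- DOB is a stored attribute; Age is derived from it when queried
select name, trunc(months_between(sysdate, dob) / 12) as age
from student;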

Example of E-R diagram

Cardinality of Relationships

Cardinality is the number of entity instances to which another entity set can map under the relationship. This does not reflect a requirement that an entity has to participate in a relationship; participation is another concept.
One-to-one: X-Y is 1:1 when each entity in X is associated with at most one entity in Y, and each entity in Y is associated with at most one entity in X.
One-to-many: X-Y is 1:M when each entity in X can be associated with many entities in Y, but each entity in Y is associated with at most one entity in X.
Many-to-many: X-Y is M:M if each entity in X can be associated with many entities in Y, and each entity in Y is associated with many entities in X ("many" means one or more, and sometimes zero).


ER Diagram for Company Database


Relationship types of degree higher than two: Superclass and Subclass
A superclass is an entity type that has one or more distinct subgroups with unique attributes. For example, the entity type PERSON in the figure below is a superclass that includes faculty, staff, and students as its subgroups. The superclass features only those attributes that are common to all of its subgroups. For example, attributes of PERSON such as SSN, Name, Address, and Email are shared by all its subgroups regardless of an individual's position as student, faculty, or staff within the university.
The subgroups with unique attributes are defined as subclasses. The PERSON superclass thus has three subclasses: STUDENT, STAFF, and FACULTY. The subclass entity type STUDENT has the attributes of its superclass along with its own attributes, such as Major, GPA, and Class, that uniquely identify the subclass. The figure below depicts a superclass and its subclasses.

Generalization and Specialization Process


Going up in this structure is called generalization, where entities are clubbed together to represent a more generalized view. For example, a particular student named Nisarga can be generalized along with all the students; the entity shall be Student, and further, a student is a Person. The reverse is called specialization, where a person is a student, and that student is Nisarga.

Generalization is the process of defining general entity types from a set of specialized entity types by identifying their common characteristics. In other words, this process minimizes the differences between entities by identifying a general entity type that features the common attributes of specialized entities. Generalization is a bottom-up approach as it starts with the specialized entity types (subclasses) and forms a generalized entity type (superclass).

For example, suppose we are given the specialized entity types FACULTY,


STAFF, and STUDENT, and we want to represent these entity types separately in the E-R model as depicted above. However, if we examine them closely, we can observe that a number of attributes are common to all the entity types, while others are specific to a particular entity.

For example, FACULTY, STAFF, and STUDENT all share the attributes Name, SSN, BirthDate, Address, and Email. On the other hand, attributes such as GPA, Class, and MajorDept are specific to STUDENT; OfficePhone is specific to FACULTY; and Designation is specific to STAFF. The common attributes suggest that each of these three entity types is a form of a more general entity type. This general entity type is simply a PERSON superclass entity with the common attributes of the three subclasses (see the figure below). Thus, in the generalization process, we group specialized entity types to form one general entity type and identify the common attributes of the specialized entities as attributes of the general entity type. The general entity type is a superclass of the specialized entity types, or subclasses. Generalization is a bottom-up approach.

Specialization


Specialization is the process of defining one or more subclasses of a superclass by identifying its distinguishing characteristics. It starts with the general entity (superclass) and forms specialized entity types (subclasses) based on specialized attributes or relationships specific to a subclass.

For example, in the figure above, LIBRARY ITEM is an entity type with several attributes such as IdentificationNo, RecordingDate, Frequency, and Edition. After careful review of these items, it should become clear that some items, such as books, do not have values for attributes such as Frequency, RecordingDate, and CourseNo, while video CDs do not have an Author or an Edition.

In addition, all items have common attributes such as IdentificationNo, Location, and Subject. Someone creating a library database, then, could use the specialization process to identify superclass and subclass relationships. In this case, the original entity LIBRARY ITEM forms a superclass entity type made up of the attributes shared by all items, while specialized items with distinguishing attributes, such as BOOK, JOURNALS, and VIDEOCD, form subclasses, as shown in the figure below. Specialization is thus a top-down approach.


Definition
Generalization is the process of defining a general entity type from a set of specialized entity types by identifying their common characteristics. Specialization is a process of defining one or more subclasses of a superclass by identifying their distinguishing characteristics.


Record Storage and Primary File Organization

The collection of data that makes up a computerized database must be stored physically on some computer storage medium. The DBMS software can then retrieve, update, and process this data as needed. Computer storage media form a storage hierarchy that includes two main categories:

Primary storage. This category includes storage media that can be operated on directly by the computer’s central processing unit (CPU), such as the computer’s main memory and smaller but faster cache memories. Primary storage usually provides fast access to data but is of limited storage capacity. Although main memory capacities have been growing rapidly in recent years, they are still more expensive and have less storage capacity than secondary and tertiary storage devices.

Secondary and tertiary storage. This category includes magnetic disks, optical disks (CD-ROMs, DVDs, and other similar storage media), and tapes. Hard-disk drives are classified as secondary storage, whereas removable media such as optical disks and tapes are considered tertiary storage. These devices usually have a larger capacity, cost less, and provide slower access to data than do primary storage devices. Data in secondary or tertiary storage cannot be processed directly by the CPU; first it must be copied into primary storage and then processed by the CPU.


Memory Hierarchies and Storage Devices
In a modern computer system, data resides and is transported throughout a hierarchy of storage media. The highest-speed memory is the most expensive and is therefore available with the least capacity. The lowest-speed memory is offline tape storage, which offers essentially unlimited storage capacity.

At the primary storage level, the memory hierarchy includes, at the most expensive end, cache memory, which is a static RAM (Random Access Memory). Cache memory is typically used by the CPU to speed up execution of program instructions, using techniques such as prefetching and pipelining.


The next level of primary storage is DRAM (Dynamic RAM), which provides the main work area for the CPU for keeping program instructions and data. It is popularly called main memory. The advantage of DRAM is its low cost, which continues to decrease; the drawback is its volatility and its lower speed compared with static RAM. At the secondary and tertiary storage level, the hierarchy includes magnetic disks, as well as mass storage in the form of CD-ROM (Compact Disk Read-Only Memory) and DVD (Digital Video Disk or Digital Versatile Disk) devices, and finally tapes at the least expensive end of the hierarchy. Storage capacity is measured in kilobytes (KB, or 1000 bytes), megabytes (MB, or 1 million bytes), gigabytes (GB, or 1 billion bytes), and even terabytes (1000 GB). The word petabyte (1000 terabytes, or 10^15 bytes) is now becoming relevant in the context of very large repositories of data in physics, astronomy, earth sciences, and other scientific applications.
Programs reside and execute in DRAM. Generally, large permanent databases reside on secondary storage (magnetic disks), and portions of the database are read into and written from buffers in main memory as needed. Nowadays, personal computers and workstations have large main memories of hundreds of megabytes of RAM, so it is becoming possible to load a large part of the database into main memory. Between DRAM and magnetic disk storage, another form of memory, flash memory, is becoming common, particularly because it is nonvolatile. Flash memories are high-density, high-performance memories using EEPROM (Electrically Erasable Programmable Read-Only Memory) technology.

The advantage of flash memory is the fast access speed; the disadvantage is that an entire block must be erased and written over simultaneously. Flash memory cards are appearing as the data storage medium in appliances with capacities ranging from a few megabytes to a few gigabytes.


These are appearing in cameras, MP3 players, cell phones, PDAs, and so on. USB (Universal Serial Bus) flash drives have become the most portable medium for carrying data between personal computers; they have a flash memory storage device integrated with a USB interface. CD-ROM (Compact Disk Read-Only Memory) disks store data optically and are read by a laser. CD-ROMs contain prerecorded data that cannot be overwritten. WORM (Write-Once-Read-Many) disks are a form of optical storage used for archiving data; they allow data to be written once and read any number of times without the possibility of erasing. They hold about half a gigabyte of data per disk and last much longer than magnetic disks. The DVD is another standard for optical disks, allowing 4.5 to 15 GB of storage per disk. Most personal computer disk drives now read CD-ROM and DVD disks. Typically, drives are CD-R (Compact Disk Recordable) drives that can create CD-ROMs and audio CDs (Compact Disks), as well as record on DVDs. Finally, magnetic tapes are used for archiving and backup storage of data. Tape jukeboxes, which contain a bank of tapes that are catalogued and can be automatically loaded onto tape drives, are becoming popular as tertiary storage to hold terabytes of data. For example, NASA's EOS (Earth Observation Satellite) system stores archived databases on such storage. Many large organizations already find it normal to have terabyte-sized databases. The term very large database can no longer be precisely defined because disk storage capacities are on the rise and costs are declining. Very soon the term may be reserved for databases containing tens of terabytes.


Secondary Storage Devices

Magnetic disks are used for storing large amounts of data. The most basic unit of data on the disk is a single bit of information. Bits are grouped into bytes (or characters). Byte sizes are typically 4 to 8 bits, depending on the computer and the device. We assume that one character is stored in a single byte, and we use the terms byte and character interchangeably. The capacity of a disk is the number of bytes it can store. Hard disks for personal computers typically hold from several hundred MB up to tens of GB, and large disk packs used with servers and mainframes have capacities of hundreds of GB. Disk capacities continue to grow as technology improves.


Whatever their capacity, all disks are made of magnetic material shaped as a thin circular disk, as shown in Figure (a), and protected by a plastic or acrylic cover. A disk is single-sided if it stores information on one of its surfaces only and double-sided if both surfaces are used. To increase storage capacity, disks are assembled into a disk pack, as shown in Figure (b), which may include many disks and therefore many surfaces. Information is stored on a disk surface in concentric circles of small width, each having a distinct diameter. Each circle is called a track. In disk packs, tracks with the same diameter on the various surfaces are called a cylinder because of the shape they would form if connected in space. The concept of a cylinder is important because data stored on one cylinder can be retrieved much faster than if it were distributed among different cylinders. The number of tracks on a disk ranges from a few hundred to a few thousand, and the capacity of each track typically ranges from tens of Kbytes to 150 Kbytes. Because a track usually contains a large amount of information, it is divided into smaller blocks or sectors. The division of a track into sectors is hard-coded on the disk surface and cannot be changed. The division of a track into equal-sized disk blocks (or pages) is set by the operating system during disk formatting (or initialization). Block size is fixed during initialization and cannot be changed dynamically. Typical disk block sizes range from 512 to 8192 bytes. A disk with hard-coded sectors often has the sectors subdivided into blocks during initialization. Blocks are separated by fixed-size interblock gaps, which include specially coded control information written during disk initialization. This information is used to determine which block on the track follows each interblock gap.


A disk is a random-access addressable device. Transfer of data between main memory and disk takes place in units of disk blocks. The hardware address of a block (a combination of a cylinder number, a track number, that is, the surface number within the cylinder on which the track is located, and a block number within the track) is supplied to the disk I/O (input/output) hardware.

The address of a buffer—a contiguous reserved area in main storage that holds one disk block—is also provided. For a read command, the disk block is copied into the buffer; whereas for a write command, the contents of the buffer are copied into the disk block. Sometimes several contiguous blocks, called a cluster, may be transferred as a unit. In this case, the buffer size is adjusted to match the number of bytes in the cluster.

The actual hardware mechanism that reads or writes a block is the disk read/write head, which is part of a system called a disk drive. A disk or disk pack is mounted in the disk drive, which includes a motor that rotates the disks. A read/write head includes an electronic component attached to a mechanical arm. Disk packs with multiple surfaces are controlled by several read/write heads, one for each surface, as shown in Figure (b). All arms are connected to an actuator attached to another electrical motor, which moves the read/write heads in unison and positions them precisely over the cylinder of tracks specified in a block address. Disk drives for hard disks rotate the disk pack continuously at a constant speed (typically ranging between 5,400 and 15,000 rpm (revolutions per minute)). Once the read/write head is positioned on the right track and the block specified in the block address moves under the read/write head, the electronic component of the read/write head is activated to transfer the data. Some disk units have fixed read/write heads, with as many heads as there are tracks. These are called fixed-head disks, whereas disk units with an actuator are called movable-head disks.


For fixed-head disks, a track or cylinder is selected by electronically switching to the appropriate read/write head rather than by actual mechanical movement; consequently, it is much faster. However, the cost of the additional read/write heads is quite high, so fixed-head disks are not commonly used.

A disk controller, typically embedded in the disk drive, controls the disk drive and interfaces it to the computer system. The controller accepts high-level I/O commands and takes appropriate action to position the arm and cause the read/write action to take place. To transfer a disk block, given its address, the disk controller must first mechanically position the read/write head on the correct track. The time required to do this is called the seek time. Typical seek times are 5 to 10 msec on desktops and 3 to 8 msec on servers. There is another delay, called the rotational delay or latency, while the beginning of the desired block rotates into position under the read/write head; it depends on the rpm of the disk. Some additional time is needed to transfer the data; this is called the block transfer time. Hence, the total time needed to locate and transfer an arbitrary block, given its address, is the sum of the seek time, rotational delay, and block transfer time.
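To make the arithmetic concrete, the small Python sketch below adds up the three components for one random block access. The seek time, rotation speed, block size, and transfer rate used here are assumed illustrative figures, not measurements of any particular drive.

# Estimate the average time to locate and transfer one disk block.
# All parameter values below are illustrative assumptions.

def average_access_time_ms(seek_ms, rpm, block_bytes, transfer_mb_per_s):
    # Average rotational delay is half of one full revolution.
    rotational_delay_ms = 0.5 * (60_000 / rpm)
    # Block transfer time = block size / sustained transfer rate.
    transfer_ms = (block_bytes / (transfer_mb_per_s * 1_000_000)) * 1000
    return seek_ms + rotational_delay_ms + transfer_ms

# Example: 8 ms seek, 7200 rpm disk, 4 KB blocks, 100 MB/s transfer rate.
print(round(average_access_time_ms(8, 7200, 4096, 100), 3), "ms")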

Buffering of Blocks

When several blocks need to be transferred from disk to main memory and all the block addresses are known, several buffers can be reserved in main memory to speed up the transfer. While one buffer is being read or written, the CPU can process data in the other buffer because an independent disk I/O processor (controller) exists that, once started, can proceed to transfer a data block between memory and disk independent of and in parallel to CPU processing.


Figure (a) illustrates how two processes can proceed in parallel. Processes A and B are running concurrently in an interleaved fashion, whereas processes C and D are running concurrently in a parallel fashion. When a single CPU controls multiple processes, parallel execution is not possible. However, the processes can still run concurrently in an interleaved way. Buffering is most useful when processes can run concurrently in a parallel fashion, either because a separate disk I/O processor is available or because multiple CPU processors exist.

Figure (b) illustrates how reading and processing can proceed in parallel when the time required to process a disk block in memory is less than the time required to read the next block and fill a buffer. The CPU can start processing a block once its transfer to main memory is completed; at the same time, the disk I/O processor can be reading and transferring the next block into a different buffer. This technique is called double buffering and can also be used to read a continuous stream of blocks from disk to memory. Double buffering permits continuous reading or writing of data on consecutive disk blocks, which eliminates the seek time and rotational delay for all but the first block transfer. Moreover, data is kept ready for processing, thus reducing the waiting time in the programs.
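The following Python sketch imitates double buffering with two buffers and a separate reader thread standing in for the disk I/O processor. The read_block() function and its delay are assumed stand-ins for a real disk transfer, used only to show the overlap of reading and processing.

# A minimal double-buffering sketch: one thread "reads" blocks into two
# alternating buffer slots while the main thread processes the previous block.
import threading, queue, time

def read_block(block_no):
    time.sleep(0.01)                 # pretend this is the disk I/O transfer
    return f"data-of-block-{block_no}"

def reader(num_blocks, buffers):
    for b in range(num_blocks):
        buffers.put(read_block(b))   # fill the next free buffer
    buffers.put(None)                # signal end of stream

buffers = queue.Queue(maxsize=2)     # two buffer slots -> double buffering
threading.Thread(target=reader, args=(5, buffers)).start()

while (block := buffers.get()) is not None:
    print("processing", block)       # overlaps with the next block read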

Placing File Records on Disk

Data is usually stored in the form of records. Each record consists of a collection of related data values or items, where each value is formed of one or more bytes and corresponds to a particular field of the record. Records usually describe entities and their attributes. For example, an EMPLOYEE record represents an employee entity, and each field value in the record specifies some attribute of that employee, such as EmployeeID, Name, DOB, Department, or Salary.


A collection of field names and their corresponding data types constitutes a record type or record format definition. A data type, associated with each field, specifies the types of values the field can take. The data types of fields usually include numeric (integer, long integer, or floating point), strings of characters (fixed-length or varying), Boolean (having 0 and 1 or TRUE and FALSE values only), and sometimes specially coded date and time data types.
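As an illustration of a record format definition, the Python sketch below packs and unpacks one fixed-length record with the struct module. The field names and widths (a 4-byte integer, 20- and 10-byte strings, and a float) are assumed for the example, not taken from any particular system.

# Sketch of a fixed-length record format. struct pads the byte strings
# with null bytes up to the declared field widths.
import struct

RECORD_FORMAT = "<i20s10sf"           # EmployeeID, Name, Department, Salary
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)

def pack_record(emp_id, name, dept, salary):
    return struct.pack(RECORD_FORMAT, emp_id, name.encode(), dept.encode(), salary)

def unpack_record(raw):
    emp_id, name, dept, salary = struct.unpack(RECORD_FORMAT, raw)
    return emp_id, name.decode().rstrip("\x00"), dept.decode().rstrip("\x00"), salary

rec = pack_record(101, "Varsha", "Sales", 45000.0)
print(RECORD_SIZE, unpack_record(rec))   # every record occupies 38 bytes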

Files, Fixed-Length Records, and Variable-Length Records

A file is a sequence of records. In many cases, all records in a file are of the same record type. If every record in the file has exactly the same size (in bytes), the file is said to be made up of fixed-length records. If different records in the file have different sizes, the file is said to be made up of variable-length records.

Allocating File Blocks on Disk

There are several standard techniques for allocating the blocks of a file on disk.

In contiguous allocation, the file blocks are allocated to consecutive disk blocks. This makes reading the whole file very fast using double buffering, but it makes expanding the file difficult.

In linked allocation, each file block contains a pointer to the next file block. This makes it easy to expand the file but makes it slow to read the whole file. A combination of the two allocates clusters of consecutive disk blocks, and the clusters are linked. Clusters are sometimes called file segments or extents.

In indexed allocation, one or more index blocks contain pointers to the actual file blocks. It is also common to use combinations of these techniques. Of the three schemes, linked allocation is the easiest to sketch, as shown below.
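In the toy Python model below the disk is a dictionary of block numbers; each "block" stores its data plus the number of the next block of the file, mirroring the pointer chain of linked allocation. The block numbers and contents are assumed sample values.

# Linked allocation: each file block carries a pointer to the next one.
disk = {
    7:  ("file part 1", 3),    # (data, pointer to next block)
    3:  ("file part 2", 12),
    12: ("file part 3", None), # last block of the file
}

def read_file(first_block):
    """Follow the chain of pointers; one disk access per block."""
    block_no = first_block
    while block_no is not None:
        data, block_no = disk[block_no]
        yield data

print(list(read_file(7)))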


File Headers

A file header or file descriptor contains information about a file that is needed by the system programs that access the file records. The header includes information to determine the disk addresses of the file blocks, as well as record format descriptions, which may include field lengths and the order of fields within a record for fixed-length unspanned records, and field type codes, separator characters, and record type codes for variable-length records.

Operations on Files

Operations on files are usually grouped into retrieval operations and update operations. Retrieval operations do not change any data in the file; they only locate certain records so that their field values can be examined and processed. Update operations change the file by insertion or deletion of records or by modification of field values. In either case, we may have to select one or more records for retrieval, deletion, or modification based on a selection condition (or filtering condition), which specifies criteria that the desired record or records must satisfy.

Actual operations for locating and accessing file records vary from system to system. DBMS software programs access records by using commands such as the following.

Open. Prepares the file for reading or writing, allocates appropriate buffers (typically at least two) to hold file blocks from disk, retrieves the file header, and sets the file pointer to the beginning of the file.

Reset. Sets the file pointer of an open file to the beginning of the file.

Find (or Locate). Searches for the first record that satisfies a search condition and transfers the block containing that record into a main memory buffer (if it is not already there). The file pointer points to the record in the buffer, and it becomes the current record.


Read (or Get). Copies the current record from the buffer to a program variable in the user program. This command may also advance the current record pointer to the next record in the file, which may necessitate reading the next file block from disk.

FindNext. Searches for the next record in the file that satisfies the search condition and transfers the block containing that record into a main memory buffer.

Delete. Deletes the current record and (eventually) updates the file on disk to reflect the deletion.

Modify. Modifies some field values for the current record and (eventually) updates the file on disk to reflect the modification.

Insert. Inserts a new record in the file by locating the block where the record is to be inserted, transferring that block into a main memory buffer (if it is not already there), writing the record into the buffer, and (eventually) writing the buffer to disk to reflect the insertion.

Close. Completes the file access by releasing the buffers and performing any other needed cleanup operations.

The preceding operations (except for Open and Close) are called record-at-a-time operations because each operation applies to a single record. It is possible to streamline the operations Find, FindNext, and Read into a single operation, Scan, described as follows.

Scan. If the file has just been opened or reset, Scan returns the first record; otherwise it returns the next record. If a condition is specified with the operation, the returned record is the first or next record satisfying the condition.

Other set-at-a-time operations include the following.

FindAll. Locates all the records in the file that satisfy a search condition.

Find (or Locate) n. Searches for the first record that satisfies a search condition and then continues to locate the next n - 1 records satisfying the same condition. Transfers the blocks containing the n records to the main memory buffer (if not already there).

FindOrdered. Retrieves all the records in the file in some specified order.

Reorganize. Starts the reorganization process. As we shall see, some file organizations require periodic reorganization. An example is to reorder the file records by sorting them on a specified field.

Files of Unordered Records (Heap Files)

In this simplest and most basic type of organization, records are placed in the file in the order in which they are inserted, so new records are inserted at the end of the file. Such an organization is called a heap or pile file.

Inserting a new record is very efficient. The last disk block of the file is copied into a buffer, the new record is added, and the block is then rewritten back to disk. The address of the last file block is kept in the file header. Searching for a record using any search condition involves a linear search through the file block by block—an expensive procedure. If only one record satisfies the search condition, then, on the average, a program will read into memory and search half the file blocks before it finds the record. For a file of b blocks, this requires searching (b/2) blocks, on average. If no records or several records satisfy the search condition, the program must read and search all b blocks in the file. To delete a record, a program must first find its block, copy the block into a buffer, delete the record from the buffer, and finally rewrite the block back to the disk. This leaves unused space in the disk block. Deleting a large number of records in this way results in wasted storage space. Another technique used for record deletion is to have an extra byte or bit, called a deletion marker, stored with each record. A record is deleted by setting the deletion marker to a certain value.
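Here is a short Python sketch of the block-by-block linear search over a heap file; the two sample blocks and their records are assumed data. Note how a successful search can stop early, while a miss always reads all b blocks.

# Linear search over a heap file, block by block.
blocks = [
    [{"regno": 101, "name": "Varsha"}, {"regno": 104, "name": "Roopa"}],
    [{"regno": 102, "name": "Mohith"}, {"regno": 103, "name": "Ravi"}],
]

def heap_search(blocks, key, value):
    blocks_read = 0
    for block in blocks:                 # read the file block by block
        blocks_read += 1
        for record in block:
            if record[key] == value:
                return record, blocks_read
    return None, blocks_read             # all b blocks read on a miss

print(heap_search(blocks, "regno", 103))  # found after reading 2 blocks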


Files of Ordered Records (Sorted Files)

We can physically order the records of a file on disk based on the values of one of their fields, called the ordering field. This leads to an ordered or sequential file. If the ordering field is also a key field of the file (a field guaranteed to have a unique value in each record), then the field is called the ordering key for the file.

A binary search for disk files can be done on the blocks rather than on the records. Suppose that the file has b blocks numbered 1, 2, ..., b; the records are ordered by ascending value of their ordering key field; and we are searching for a record whose ordering key field value is K. Assuming that disk addresses of the file blocks are available in the file header, the binary search can be described by the algorithm shown below.

Binary Search on an Ordering Key of a Disk File

L ← 1; U ← b; (* b is the number of file blocks *)
while (U ≥ L) do
begin
    i ← (L + U) div 2;
    read block i of the file into the buffer;
    if K < (ordering key field value of the first record in block i)
        then U ← i - 1
    else if K > (ordering key field value of the last record in block i)
        then L ← i + 1
    else if the record with ordering key field value = K is in the buffer
        then goto found
    else goto notfound;
end;
goto notfound;
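For readers who prefer running code, here is a rough Python equivalent of the algorithm, with each block modeled as a sorted list of keys and each list access standing in for one disk block read. The sample blocks are assumed data.

def binary_search_blocks(blocks, key):
    low, high = 0, len(blocks) - 1
    while low <= high:
        mid = (low + high) // 2
        block = blocks[mid]              # one disk block read
        if key < block[0]:
            high = mid - 1               # key lies in an earlier block
        elif key > block[-1]:
            low = mid + 1                # key lies in a later block
        else:
            return mid if key in block else None
    return None

blocks = [[2, 5, 9], [11, 14, 20], [23, 31, 42]]
print(binary_search_blocks(blocks, 14))  # -> 1 (the block number)
print(binary_search_blocks(blocks, 15))  # -> None (not found)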

A binary search usually accesses log2(b) blocks, whether the record is found or not; this is an improvement over a linear search, where, on the average, (b/2) blocks are accessed when the record is found and b blocks are accessed when the record is not found. Inserting and deleting records are expensive operations for an ordered file because the records must remain physically ordered. To insert a record, we must find its correct position in the file, based on its ordering field value, and then make space in the file to insert the record in that position. For a large file this can be very time consuming because, on the average, half the records of the file must be moved to make space for the new record. This means that half the file blocks must be read and rewritten after records are moved among them. For record deletion, the problem is less severe if deletion markers and periodic reorganization are used.

Modifying a field value of a record depends on two factors: the search condition to locate the record and the field to be modified. If the search condition involves the ordering key field, we can locate the record using a binary search; otherwise we must do a linear search. A nonordering field can be modified by changing the record and rewriting it in the same physical location on disk, assuming fixed-length records. Modifying the ordering field means that the record can change its position in the file; this requires deletion of the old record followed by insertion of the modified record.

Hashing Techniques

Hashing is a method which provides very fast access to records under certain search conditions. This organization is usually called a hash file. The search condition must be an equality condition on a single field, called the hash field. In most cases, the hash field is also a key field of the file, in which case it is called the hash key. The idea behind hashing is to provide a function h, called a hash function or randomizing function, which is applied to the hash field value of a record and yields the address of the disk block in which the record is stored. A search for the record within the block can be carried out in a main memory buffer. For most records, we need only a single-block access to retrieve that record.


Hashing is also used as an internal search structure within a program whenever a group of records is accessed exclusively by using the value of one field. Hashing can be used for internal files; it can be modified to store external files on disk; and it can also be extended to dynamically growing files.

Internal Hashing

For internal files, hashing is typically implemented as a hash table through the use of an array of records. Let the array index range from 0 to M-1, as shown in Figure (a); then we have M slots whose addresses correspond to the array indexes. We choose a hash function that transforms the hash field value into an integer between 0 and M-1. One common hash function is h(K) = K mod M, which returns the remainder of an integer hash field value K after division by M; this value is then used for the record address.
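A minimal Python illustration of internal hashing with h(K) = K mod M follows; the slot count and the sample keys are assumed values, chosen so that no collision occurs.

M = 7                              # number of slots (array size), assumed
table = [None] * M

def h(key):
    return key % M                 # the mod hash function

for key in (101, 215, 100):        # sample integer hash field values
    table[h(key)] = ("record for", key)

print(h(101), table[h(101)])       # 101 mod 7 = 3, so slot 3 holds the record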


Noninteger hash field values can be transformed into integers before the mod function is applied. For character strings, the numeric (ASCII) codes associated with characters can be used in the transformation—for example, by multiplying those code values. For a hash field whose data type is a string of 20 characters, Algorithm (a) can be used to calculate the hash address.

Algorithm Two simple hashing algorithms: (a) Applying the mod hash function to a character string K. (b) Collision resolution by open addressing.

(a) temp ← 1;
    for i ← 1 to 20 do temp ← temp * code(K[i]) mod M;
    hash_address ← temp mod M;

(b) i ← hash_address(K); a ← i;
    if location i is occupied
    then begin
        i ← (i + 1) mod M;
        while (i ≠ a) and location i is occupied do i ← (i + 1) mod M;
        if (i = a) then all positions are full
        else new_hash_address ← i;
    end;

Other hashing functions can be used. One technique, called folding, involves applying an arithmetic function such as addition, or a logical function such as exclusive-or, to different portions of the hash field value to calculate the hash address. A collision occurs when the hash field value of a record that is being inserted hashes to an address that already contains a different record. In this situation, we must insert the new record in some other position. The process of finding another position is called collision resolution.


There are numerous methods for collision resolution, including the following:

Open addressing. Once a position specified by the hash address is found to be occupied, the program checks the subsequent positions in order until an unused (empty) position is found. Algorithm (b) above may be used for this purpose.

Chaining. For this method, various overflow locations are kept, usually by extending the array with a number of overflow positions. Additionally, a pointer field is added to each record location. A collision is resolved by placing the new record in an unused overflow location and setting the pointer of the occupied hash address location to the address of that overflow location. A linked list of overflow records for each hash address is thus maintained, as shown in Figure (b).

Multiple hashing. The program applies a second hash function if the first results in a collision. If another collision results, the program uses open addressing or applies a third hash function and then uses open addressing if necessary.
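Of these methods, chaining is simple to sketch in Python: each main slot holds a list acting as its chain of overflow records. The keys and records below are assumed sample values.

# Collision resolution by chaining: slots that collide share a chain.
M = 5
table = [[] for _ in range(M)]

def insert(key, record):
    table[key % M].append((key, record))   # colliding records share a chain

def lookup(key):
    for k, record in table[key % M]:       # scan only one chain
        if k == key:
            return record
    return None

insert(12, "rec-A")
insert(7, "rec-B")                          # 12 and 7 both hash to slot 2
print(lookup(7), lookup(12))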

External Hashing for Disk Files

Hashing for disk files is called external hashing. The target address space in external hashing is made up of buckets. A bucket is either one disk block or a cluster of contiguous blocks. The hashing function maps the indexing field's value into a relative bucket number. A table maintained in the file header converts the bucket number into the corresponding disk block address, as shown in the figure below.


The hashing scheme is called static hashing if a fixed number of buckets is allocated. A major drawback of static hashing is that the number of buckets must be chosen large enough to handle the expected file size; consequently, it is difficult to expand or shrink the file dynamically.

Dynamic Hashing

Two dynamic hashing techniques are extendible hashing and linear hashing.

Extendible hashing

Basic idea:
 Maintain a directory of bucket addresses instead of hashing to buckets directly (indirection).
 The directory can grow, but its size is always a power of 2.
 At any time, the directory has a depth d, and a directory of depth d has 2^d bucket pointers. However, not every directory entry (bucket pointer) has to point to a unique bucket; more than one directory entry can point to the same bucket. Each bucket has a local depth d' that indicates how many of the d bits of the hash value are actually used to indicate membership in the bucket.
 The depth d of the directory is based on the number of bits we use from each hash value. The hash function produces an output integer that can be treated as a sequence of k bits. We use the first d bits of the hash value produced to look up an entry in the directory; the directory entry points us to the block that contains records whose keys hash to that value.

Example (the directory and bucket figures are not reproduced here): assume each hashed key is a sequence of four binary digits, and start by storing the values 0001, 1001, and 1100. Then:

 Insert 0100.
 Insert 1111: the directory grows one level.
 Insert 1101: the directory grows one level again.
 Delete 1111: blocks are merged and the directory shrinks.
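The directory lookup at the heart of this example can be sketched in a few lines of Python: take the first d bits of the hash value as the directory index, and let several entries share a bucket when that bucket's local depth is smaller than d. The bucket names and bit strings are assumed for illustration.

def directory_index(hash_bits, d):
    """Use the first d bits of a 4-bit hash value as the directory index."""
    return int(hash_bits[:d], 2)

directory = {0b0: "bucket-A", 0b1: "bucket-B"}    # directory depth d = 1
print(directory[directory_index("0100", 1)])       # first bit 0 -> bucket-A

# After the directory doubles, depth grows to d = 2; two entries can
# still share one bucket whose local depth d' is only 1.
directory = {0b00: "bucket-A", 0b01: "bucket-A",   # shared bucket (d' = 1)
             0b10: "bucket-B", 0b11: "bucket-C"}   # split buckets (d' = 2)
print(directory[directory_index("1101", 2)])       # bits 11 -> bucket-C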

Linear hashing

 Linear hashing allows a hash file to expand and shrink dynamically without the need for a directory; thus, the directory issues of extendible hashing do not arise.
 A linear hash table starts with 2^d buckets, where d is the number of bits used from the hash value to determine bucket membership. The size of the table grows gradually rather than doubling. Growth of the hash table can be triggered either: 1) every time there is a bucket overflow, or 2) when the load factor of the hash table reaches a given point. We will examine the second growth method. Since overflows may not always trigger growth, note that each bucket may use overflow blocks.
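The bucket-selection rule of linear hashing can be sketched as a small Python function: buckets that have already been split are addressed with one extra bit of the hash value. The parameter values used below (d = 2, one bucket split so far) are assumed for illustration.

def bucket_for(key, d, next_to_split):
    b = key % (2 ** d)             # try the lower-level hash first
    if b < next_to_split:          # this bucket has already been split,
        b = key % (2 ** (d + 1))   # so use one more bit of the hash
    return b

# 4 initial buckets (d = 2); bucket 0 has been split into buckets 0 and 4.
for key in (8, 5, 14):
    print(key, "->", bucket_for(key, d=2, next_to_split=1))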


Parallelizing Disk Access Using RAID Technology

RAID (originally redundant array of inexpensive disks; now commonly redundant array of independent disks) is a data storage technology that combines multiple disk drive components into a single logical unit for the purposes of data redundancy or performance improvement.

Disk striping is the process of dividing a body of data into blocks and spreading the data blocks across several partitions on several hard disks.

Striping can be done at the byte level or in blocks. Byte-level striping means that the file is broken into "byte-sized pieces": the first byte of the file is sent to the first drive, the second byte to the second drive, and so on. Sometimes byte-level striping is done with a 512-byte sector as the unit instead.

Block-level striping means that each file is split into blocks of a certain size and those are distributed to the various drives. The size of the blocks used is also called the stripe size (or block size, or several other names), and can be selected from a variety of choices when the array is set up.
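Block-level striping amounts to simple modular arithmetic, as the Python sketch below shows: logical block i lands on disk i mod n at stripe i div n. The disk count is an assumed example value.

def stripe_location(logical_block, num_disks):
    # (disk holding the block, position of the stripe on that disk)
    return logical_block % num_disks, logical_block // num_disks

for block in range(6):                        # 6 file blocks over 3 disks
    disk, stripe = stripe_location(block, 3)
    print(f"block {block} -> disk {disk}, stripe {stripe}")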


RAID 0

RAID 0 performs what is called "block striping" across multiple drives. The data is fragmented, or broken up, into blocks and striped among the drives. This level increases the data transfer rate and storage capacity, since the controller can access the hard disk drives simultaneously. However, this level has no redundancy: if a single drive fails, the entire array becomes inaccessible. The more drives in the array, the higher the data transfer rate, but also the higher the risk of a failure. RAID level 0 requires a minimum of 2 drives to implement.

Advantages

I/O performance is greatly improved by spreading the I/O load across many channels and drives

Best performance is achieved when data is striped across multiple controllers with only one drive per controller

No parity calculation overhead is involved

Very simple design

Easy to implement


Disadvantages

Not a "True" RAID because it is NOT fault-tolerant

The failure of just one drive will result in all data in an array being lost

Should never be used in mission critical environments

Recommended Applications

Video Production and Editing

Image Editing

Pre-Press Applications

Any application requiring high bandwidth

RAID 1

RAID 1 is used to create a complete mirrored set of disks. Usually employing two disks, RAID 1 writes data as a normal hard drive would but makes an exact copy of the primary drive on the second drive in the array. This mirrored copy is constantly updated as data on the primary drive changes. By keeping this mirrored backup, the array decreases the chance of failure from 5% over three years to 0.25%. Should a drive fail, the failed drive should be replaced as soon as possible to keep the mirror updated. RAID level 1 requires a minimum of 2 drives to implement.

Advantages

Twice the Read transaction rate of single disks, same Write transaction rate as single disks

100% redundancy of data means no rebuild is necessary in case of a disk failure, just a copy to the replacement disk

Transfer rate per block is equal to that of a single disk

Under certain circumstances, RAID 1 can sustain multiple simultaneous drive failures

Simplest RAID storage subsystem design

Disadvantages

Highest disk overhead of all RAID types (100%) - inefficient

Typically the RAID function is done by system software, loading the CPU/Server and possibly degrading throughput at high activity levels. Hardware implementation is strongly recommended

May not support hot swap of failed disk when implemented in "software"

Recommended Applications

Accounting

Payroll

Financial

Any application requiring very high availability


RAID 5

RAID 5 is the most stable of the more advanced RAID levels and offers redundancy, speed, and the ability to rebuild a failed drive. RAID 5 uses the same block-level striping as other RAID levels but adds another layer of data protection by creating a "parity block". These blocks are stored alongside the other blocks in the array in a staggered pattern and are used to check that the data has been written correctly to the drive.

When an error or failure occurs, the parity blocks are used, together with the information stored on the other member disks in the array, to rebuild the data correctly. This can be done on the fly, without interruptions to applications or other programs running on the computer. The computer will notify the user of the failed drive but will continue to operate normally; this state of operation is known as Interim Data Recovery Mode. Performance may suffer while the disk is being rebuilt, but operation should continue. The failed drive should be replaced as soon as possible. RAID level 5 requires a minimum of 3 drives to implement.
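The parity idea can be demonstrated in a few lines of Python: the parity block is the bytewise XOR of the data blocks in a stripe, so XORing the surviving blocks with the parity block reproduces a lost block. The two-byte sample blocks are assumed values.

def xor_blocks(blocks):
    """Bytewise XOR of equal-sized blocks."""
    result = bytes(len(blocks[0]))           # all-zero block of the same size
    for blk in blocks:
        result = bytes(a ^ b for a, b in zip(result, blk))
    return result

d1, d2, d3 = b"\x01\x02", b"\x04\x08", b"\x10\x20"
parity = xor_blocks([d1, d2, d3])            # stored on the parity position
rebuilt = xor_blocks([d1, d3, parity])       # the drive holding d2 failed
assert rebuilt == d2                         # the lost block is recovered
print(rebuilt)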


Advantages

Highest Read data transaction rate

Medium Write data transaction rate

Low ratio of Error Correction Code (Parity) disks to data disks means high efficiency

Good aggregate transfer rate

Disadvantages

Disk failure has a medium impact on throughput

Most complex controller design

Difficult to rebuild in the event of a disk failure (as compared to RAID 1)

Individual block data transfer rate same as single disk

Recommended Applications

File and Application servers

Database servers

Web, E-mail, and News servers

Intranet servers

Most versatile RAID level


BCA204T: DATA BASE MANAGEMENT SYSTEMS

Unit - III Functional Dependencies and Normalization for Relational Database: Informal Design Guidelines for Relational Schemas, Functional Dependencies, Normal Forms Based on Primary Keys, General Definitions of Second and Third Normal Forms, Boyce-Codd Normal Form. Relational Data Model and Relational Algebra: Relational Model Concepts, Relational Model Constraints and Relational Database Schema, Defining Relations, Update Operations on Relations, Basic Relational Algebra Operations, Additional Relational Operations, Examples of Queries in the Relational Algebra, Relational Database Design Using ER-to-Relational Mapping.


Unit-III Functional Dependencies and Normalization for Relational Database

Functional Dependency

A functional dependency (FD) is a constraint between two sets of attributes in a relation. A functional dependency says that if two tuples have the same values for attributes A1, A2, ..., An, then those two tuples must also have the same values for attributes B1, B2, ..., Bn. A functional dependency is represented by an arrow sign (→), that is, X → Y, where X functionally determines Y. The left-hand side attributes determine the values of the attributes on the right-hand side.

Armstrong's Axioms

William W. Armstrong established a set of rules which can be used to infer the functional dependencies in a relational database:

Reflexivity rule: A → B holds if B is a subset of A.

Augmentation rule: If A → B holds, then AC → BC also holds.

Transitivity rule: If A → B and B → C hold, then A → C is implied.
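These axioms give rise to the attribute closure test, which can be sketched in Python: starting from a set of attributes, repeatedly add the right-hand side of any FD whose left-hand side is already covered. The FDs used below are assumed examples.

def closure(attrs, fds):
    """Closure of an attribute set under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs            # lhs is covered, so add rhs
                changed = True
    return result

fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(closure({"A"}, fds))   # {'A', 'B', 'C'}: A -> C follows by transitivity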


Codd's 12 Rules

Dr. Edgar F. Codd was a computer scientist who invented the relational model for database management, on which relational databases are based. Codd came up with twelve rules of his own which, according to him, a database must obey in order to be a true relational database. Codd's rules define what qualities a DBMS requires in order to become a Relational Database Management System (RDBMS). To date, hardly any commercial product follows all 12 of Codd's rules; even Oracle follows only about eight and a half (8.5) of the 12. The 12 rules are as follows.

Rule 1: Information rule. All information is to be represented as stored data in cells of tables. Everything in a database must be stored in table format, whether it is user data or metadata.

Rule 2: Guaranteed access rule. Each unique piece of data (atomic value) should be accessible by: table name + primary key (row) + attribute (column). No other means, such as pointers, can be used to access data.

Rule 3: Systematic treatment of NULL values. NULL may have several meanings: data is missing, data is not applicable, or there is no value. It should be handled consistently.

Rule 4: Active online catalog. The structure description of the whole database must be stored in an online catalog, i.e., a data dictionary, which can be accessed by authorized users. Users can use the same query language to access the catalog that they use to access the database itself.


Rule 5: Comprehensive data sub-language rule. A database must support a language with a linear syntax that is capable of data definition, data manipulation, and transaction management operations. The database can be accessed only by means of this language, either directly or through an application.

Rule 6: View updating rule. All views that are theoretically updatable should be updatable by the system.

Rule 7: High-level insert, update, and delete rule. Insert, delete, and update operations must be supported at each level of relations. They must not be limited to a single row; the system must also support union, intersection, and minus operations to yield sets of data records.

Rule 8: Physical data independence. The application should not be concerned with how the data is physically stored. Any change in the physical structure must not have an impact on the application.

Rule 9: Logical data independence. If the logical structure (table structures) of the database changes, the user's view of the data should not change. Say a table is split into two tables; a new view should give the result of joining the two tables. This rule is the most difficult to satisfy.

Rule 10: Integrity independence. Integrity constraints must be defined separately from the application programs. It must be possible to change constraints without affecting the applications.


Rule 11: Distribution independence. A database should work properly regardless of whether its data is distributed across a network. This rule lays the foundation of distributed databases.

Rule 12: Non-subversion rule. If low-level access to a system is allowed, it should not be able to subvert or bypass the integrity rules to change data.

Normalization

Normalization is the process of organizing the attributes and tables of a relational database to minimize data redundancy.

or

Normalization is the process of reorganizing data in a database so that it meets two basic requirements: a) there is no redundancy of data (all data is stored in only one place); and b) data dependencies are logical (all related data items are stored together). Normalization involves refactoring a table into smaller (and less redundant) tables without losing information, and defining foreign keys in the old table referencing the primary keys of the new ones. The objective is to isolate data so that additions, deletions, and modifications of an attribute can be made in just one table and then propagated through the rest of the database using the defined foreign keys.


Need for Normalization
 Improves database design.
 Ensures minimum redundancy of data.
 Saves storage space and ensures the consistency of data.
 Gives a more flexible database structure.
 Removes anomalies from database activities.

FIRST NORMAL FORM (1NF)

First normal form: a table is in first normal form if it contains no repeating columns. Consider the table below, which shows several employees working on several projects; in this company the same employee can work on different projects, and at a different hourly rate. We now convert this table into 1NF.


STEPS: Transform a table of unnormalised data into first normal form (1NF). The process is as follows:

Identify repeating attributes.

Remove these repeating attributes to a new table, together with a copy of the key from the UNF table. After removing the duplicate data, the repeating attributes are easily identified.

In the previous table the Employee No, Employee Name, Department No, Department Name, and Hourly Rate attributes are repeating. These repeating attributes have been moved to a new table, together with a copy of the original key (i.e., Project Code).

A key of Project Code and Employee No has been defined for this new table. This combination is unique for each row in the table.


SECOND NORMAL FORM (2NF)

Second normal form: a table is in second normal form if it is in first normal form and contains only columns that are dependent on the whole (primary) key.

STEPS: Transform 1NF data into second normal form (2NF). Remove any non-key attributes (partial dependencies) that depend on only part of the table key to a new table. Ignore tables with a simple key or with no non-key attributes.

The first table went straight to 2NF as it has a simple key (Project Code).

Employee name, Department No and Department Name are dependent upon Employee No only. Therefore, they were moved to a new table with Employee No being the key.

However, Hourly Rate is dependent upon both Project Code and Employee No as an employee may have a different hourly rate depending upon which project they are working on. Therefore it remained in the original table.


THIRD NORMAL FORM (3NF)

Third normal form: a table is in third normal form if it is in second normal form and all the non-key columns are dependent only on the primary key. If the value of a non-key column is dependent on the value of another non-key column, we have a situation known as a transitive dependency. This can be resolved by removing the columns that depend on non-key items to another table.

STEPS: Transform data in second normal form (2NF) into third normal form (3NF). Remove to a new table any non-key attributes that are more dependent on other non-key attributes than on the table key.


The project team table went straight from 2NF to 3NF as it only has one non-key attribute.

Department Name is more dependent upon Department No than Employee No and therefore was moved to a new table. Department No is the key in this new table and a foreign key in the Employee table.

Boyce-Codd Normal Form (BCNF)

A table is in Boyce-Codd normal form (BCNF) if and only if it is in 3NF and every determinant is a candidate key. Anomalies can occur in relations in 3NF if there is a composite key in which part of that key has a determinant which is not itself a candidate key. This can be expressed as R(A, B, C), C → A, where:


o The relation R contains attributes A, B and C.

o A and B form a candidate key.

o C is the determinant for A (A is functionally dependent on C).

o C is not part of any key.

Anomalies can also occur where a relation contains several candidate keys where:

o The keys contain more than one attribute (they are composite keys).

o An attribute is common to more than one key.

Example to understand BCNF:-

Consider the following non-BCNF table:

The candidate keys of the table are: {Person, Shop Type} and {Person, Nearest Shop}. The table does not adhere to BCNF because of the dependency


Nearest Shop → Shop Type, in which the determining attribute (Nearest Shop) is neither a candidate key nor a superset of a candidate key.

After Normalization.

Candidate keys are {Person, Shop} and {Shop}, respectively.


Relational Data Model and Relational Algebra

Relational Model Concepts

The relational model of data is based on the concept of a relation. A relation is a mathematical concept based on the idea of sets. The strength of the relational approach to data management comes from the formal foundation provided by the theory of relations. The model was first proposed by Dr. E. F. Codd of IBM in 1970 in the paper "A Relational Model of Data for Large Shared Data Banks," Communications of the ACM, June 1970.

Relation: a table which has rows and columns in the data model, where rows represent records and columns represent the attributes.

Tuples: a single row of a table, which contains a single record for that relation, is called a tuple.

Attributes: columns in a table are called attributes of the relation.

______www.gkmvkalyan.blogspot.in BCA204T: DATA BASE MANAGEMENT SYSTEMS Page 14 of 23

Cardinality of a relation: the number of tuples in a relation determines its cardinality. The relation shown in the example has a cardinality of 4.

Degree of a relation: each column in the tuple is called an attribute. The number of attributes in a relation determines its degree. The relation shown has a degree of 5.

Domain: a domain definition specifies the kind of data represented by the attribute. More particularly, a domain is the set of all possible values that an attribute may validly contain. Domains are often confused with data types, but they are different: data type is a physical concept while domain is a logical one. "Number" is a data type and "Age" is a domain. To give another example, "StreetName" and "Surname" might both be represented as text fields, but they are obviously different kinds of text fields; they belong to different domains.

Properties of a Relation

A relation with N columns and M rows (tuples) is said to be of degree N and cardinality M. The Student_Table shown here is a relation of degree three and cardinality five.

______www.gkmvkalyan.blogspot.in BCA204T: DATA BASE MANAGEMENT SYSTEMS Page 15 of 23

The characteristic properties of a relation are as follows:

All entries in a given column are of the same kind or type.

Attributes are unordered: the order of columns in a relation is immaterial. The display of a relation in tabular form is free to arrange columns in any order.

No duplicate tuples: a relation cannot contain two or more tuples which have the same values for all the attributes; in any relation, every row is unique.

There is only one value for each attribute of a tuple (attribute values are atomic). The table shown below is not allowed in the relational model, despite the clear intended representation, because the student has two values for Place (e.g., Nisarga has one in Pune and one in Chennai). In such situations, the multiple values must be split into multiple tuples to form a valid relation.

______www.gkmvkalyan.blogspot.in BCA204T: DATA BASE MANAGEMENT SYSTEMS Page 16 of 23

Tuples are unordered. The order of rows in a relation is immaterial. One is free to display a relation in any convenient way.

Integrity Constraints over Relations

An integrity constraint (IC) is a condition that is specified on a database schema and restricts the data that can be stored in an instance of the database. If a database instance satisfies all the integrity constraints specified on the database schema, it is a legal instance. A DBMS enforces integrity constraints, in that it permits only legal instances to be stored in the database. Integrity constraints are specified and enforced at different times: 1. When the DBA or end user defines a database schema, he or she specifies the ICs that must hold on any instance of this database. 2. When a database application is run, the DBMS checks for violations and disallows changes to the data that violate the specified ICs.

Keys of a Relation

A key is a set of one or more columns whose combined values are unique among all occurrences in a given table. A key is the relational means of specifying uniqueness. Some different types of keys are:

Primary key: an attribute or a set of attributes of a relation which possesses the properties of uniqueness and irreducibility (no subset should be unique). For example, Register Number in the Student table is a primary key, Passenger Number in the Passenger table is a primary key, Passport Number in the Booking table is a primary key, and the combination of Passenger Number and Passport Number in the Reservation table is a primary key, i.e., a composite primary key.


Foreign key: an attribute of a table which refers to the primary key of some other table. A foreign key permits only those values which appear in the primary key of the table to which it refers, or NULL (an unknown value). For example, Register Number of the Result table refers to Register Number of the Student table, which is the primary key of the Student table, so we can say that Register Number of the Result table is a foreign key.


Relational Algebra

 Basic operations:
 Projection (π): selects a subset of columns from a relation.
 Selection (σ): selects a subset of rows from a relation.
 Cross-product (×): allows us to combine two relations.
 Set-difference (−): tuples in relation 1, but not in relation 2.
 Union (∪): tuples in relation 1 or in relation 2.
 Rename (ρ): uses a new name for tables or fields.
 Additional operations:
 Intersection (∩), Join (⋈).

PROJECT (π)

The PROJECT operation is used to select a subset of the attributes of a relation by specifying the names of the required attributes.

Consider the Student_table:

A) For example, to get the names from Student_Table: πName(Student_Table)


B) For example, to get the regno and name from Student_Table: πRegno,Name(Student_Table)

SELECT(σ)

The SELECT operation is used to choose a subset of the tuples from a relation that satisfy a selection condition. The SELECT operation can be considered a filter that keeps only those tuples that satisfy a qualifying condition.

A) For example, to list the students with regno > 102 from Student_Table: σRegno>102(Student_table)

B) For example, to list all the students who belong to the BCA course: σCourse="BCA"(Student_table)
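The two operations can be mimicked in Python over a list of dictionaries, which may help in reading the σ and π expressions above. The sample rows are assumed and loosely follow the Student_Table used in the examples.

# A toy relational-algebra sketch: relations as lists of dicts.
student_table = [
    {"Regno": 101, "Name": "Varsha", "Course": "BCA"},
    {"Regno": 102, "Name": "Mohith", "Course": "BBM"},
    {"Regno": 103, "Name": "Ravi",   "Course": "BCom"},
]

def select(relation, predicate):             # sigma: filter rows
    return [t for t in relation if predicate(t)]

def project(relation, *attrs):               # pi: keep named columns
    return [{a: t[a] for a in attrs} for t in relation]

print(select(student_table, lambda t: t["Regno"] > 102))
print(project(student_table, "Regno", "Name"))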


Union, Intersection, Set-Difference

 All of these operations take two input relations, which must be union-compatible:
 Same number of fields.
 'Corresponding' fields have the same type.

Consider:

UNION Operator
List of customers who are either borrowers or depositors at the bank:

πCust-name (Borrower) U πCust-name (Depositor)

INTERSECTION Operator
Customers who are both borrowers and depositors:

πCust-name (Borrower) ∩ πCust-name (Depositor)

Set Difference

Customers who are borrowers but not depositors

πCust-name (Borrower) - πCust-name (Depositor)


Cartesian-Product or Cross-Product (S1 × R1)
 Each row of S1 is paired with each row of R1.
 The result schema has one field per field of S1 and R1, with field names 'inherited' if possible.
 Consider the borrower and loan tables as follows:

JOIN

A join is a combination of a Cartesian product followed by a selection. A join operation pairs two tuples from different relations if and only if the given join condition is satisfied. The following sections briefly describe the join types:


Natural Join ( ⋈ )

A natural join can be performed only if at least one common attribute exists between the two relations. Those attributes must have the same name and domain. The natural join acts on those matching attributes where the values of the attributes in both relations are the same.

Theta (θ) join

A theta join combines tuples from different relations provided they satisfy the theta condition.

Notation: R1 ⋈θ R2

R1 and R2 are relations with attributes (A1, A2, ..., An) and (B1, B2, ..., Bn) such that no attribute name is shared, that is, R1 ∩ R2 = Φ. Here θ is a condition formed from a set of comparisons C.


A theta join can use all kinds of comparison operators (=, <, >, ≤, ≥, ≠).

Student_Detail = STUDENT ⋈Student.Std = Subject.Class SUBJECT

Equi-Join

When a theta join uses only the equality comparison operator, it is said to be an equi-join. The example above corresponds to an equi-join.
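A theta join is easy to sketch in Python as a Cartesian product filtered by the θ condition; with an equality predicate, as below, it behaves like the equi-join of the STUDENT ⋈ SUBJECT example. The sample tuples are assumed.

# Theta join = Cartesian product + selection on the join condition.
student = [{"Name": "Asha", "Std": "10"}, {"Name": "Ravi", "Std": "9"}]
subject = [{"Class": "10", "Subject": "Math"},
           {"Class": "9",  "Subject": "Science"}]

def theta_join(r1, r2, condition):
    return [{**t1, **t2} for t1 in r1 for t2 in r2 if condition(t1, t2)]

# Equi-join: STUDENT joined with SUBJECT on Student.Std = Subject.Class
print(theta_join(student, subject, lambda t1, t2: t1["Std"] == t2["Class"]))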


BCA204T : DATA BASE MANAGEMENT SYSTEMS

Unit – IV Relational Database Language: Data Definition in SQL, Queries in SQL, Insert, Delete and Update Statements in SQL, Views in SQL, Specifying General Constraints as Assertions, Specifying Indexes, Embedded SQL. PL/SQL: Introduction.


Unit-IV Relational Database Language

SQL

• SQL stands for Structured Query Language
• SQL lets you access and manipulate databases
• SQL is an ANSI (American National Standards Institute) standard
SQL is a declarative (non-procedural) language. SQL is (usually) not case-sensitive, but we'll write SQL keywords in upper case for emphasis. Some database systems require a semicolon at the end of each SQL statement.

A table is a database object that holds user data. Each column of the table has a specified data type bound to it. Oracle ensures that only data which matches the data type of the column will be stored within the column.


SQL DML and DDL

SQL can be divided into two parts: the Data Definition Language (DDL) and the Data Manipulation Language (DML).

Data Definition Language (DDL)

It is a set of SQL commands used to create, modify, and delete database structures, but not data. It is also used to define indexes (keys), specify links between tables, and impose constraints between tables. DDL commands are auto-COMMIT. The most important DDL statements in SQL are:
• CREATE TABLE - creates a new table
• ALTER TABLE - modifies a table
• TRUNCATE TABLE - deletes all records from a table
• DROP TABLE - deletes a table

Data Manipulation Language (DML)

It is the area of SQL that allows changing data within the database. The query and update commands form the DML part of SQL:
• INSERT - inserts new data into a database
• SELECT - extracts data from a database
• UPDATE - updates data in a database
• DELETE - deletes data from a database

Data Control Language (DCL)

It is the component of SQL that controls access to data and to the database. Occasionally DCL statements are grouped with DML statements.

• COMMIT - saves work done.
• SAVEPOINT - identifies a point in a transaction to which you can later roll back.
• ROLLBACK - restores the database to its state as of the last COMMIT.
• GRANT - gives users access privileges to the database.
• REVOKE - withdraws access privileges given with the GRANT command.


Basic Data Types

Data Type - Description

CHAR(size): Used to store character string values of fixed length. The size in brackets determines the number of characters the cell can hold. The maximum number of characters (i.e., the size) this data type can hold is 255. The data held is right-padded with spaces to whatever length is specified.

VARCHAR(size) / VARCHAR2(size): Used to store variable-length alphanumeric data. It is a more flexible form of the CHAR data type. VARCHAR can hold 1 to 255 characters. VARCHAR is usually a wiser choice than CHAR, due to its variable-length format, but keep in mind that CHAR is much faster than VARCHAR, sometimes up to 50%.

DATE: Used to represent date and time. The standard format is DD-MMM-YY. DATE stores time in the 24-hour format. By default, the time in a date field is 12:00:00 am.

NUMBER(P,S): Used to store numbers (fixed or floating point). Numbers of virtually any magnitude may be stored, up to 38 digits of precision. The precision (P) determines the maximum length of the data, whereas the scale (S) determines the number of places to the right of the decimal. Example: NUMBER(5,2) is a number that has 3 digits before the decimal and 2 digits after the decimal.

LONG: Used to store variable-length character strings containing up to 2 GB. LONG data can be used to store arrays of binary data in ASCII format. Only one LONG value can be defined per table.

RAW / LONG RAW: Used to store binary data, such as digitized pictures or images. RAW can have a maximum length of 255 bytes. LONG RAW can contain up to 2 GB.


The CREATE TABLE Command:

The CREATE TABLE command defines each column of the table uniquely. Each column has a minimum of three attributes: a name, a data type, and a size (i.e., column width). Each table column definition is a single clause in the CREATE TABLE syntax, and each is separated from the next by a comma. Finally, the SQL statement is terminated with a semicolon.

Rules for Creating Tables
 A name can have a maximum of up to 30 characters.
 Alphabets from A-Z, a-z and numbers from 0-9 are allowed.
 A name should begin with an alphabet.
 The use of special characters like _ (underscore) is allowed.
 SQL reserved words are not allowed, for example: CREATE, SELECT, ALTER.

Syntax:
CREATE TABLE <tablename>
(<columnname1> <datatype>(<size>),
<columnname2> <datatype>(<size>), ...);

Example: CREATE TABLE gktab (Regno NUMBER(3), Name VARCHAR(20), Gender CHAR, Dob DATE, Course CHAR(5));


Inserting Data into Tables

Once a table is created, the most natural thing to do is load this table with data to be manipulated later.

When inserting a single row of data into the table, the insert operation:

• Creates a new (empty) row in the database table.
• Loads the values passed (by the SQL insert) into the columns specified.

Syntax: INSERT INTO <tablename> (<columnname1>, <columnname2>, ...)

VALUES (<value1>, <value2>, ...);

Example: INSERT INTO gktab(regno,name,gender,dob,course) VALUES(101,'Varsh G Kalyan','F','20-Sep-1985','BCA');

Or you can use the below method to insert the data into table.

INSERT INTO gktab VALUES(102,'Mohith G Kalyan','M','20-Aug-1980','BBM');
INSERT INTO gktab VALUES(106,'Nisarga','F','15-Jul-1983','BCom');
INSERT INTO gktab VALUES(105,'Eenchara','F','04-Dec-1985','BCA');
INSERT INTO gktab VALUES(103,'Ravi K','M','29-Mar-1989','BCom');
INSERT INTO gktab VALUES(104,'Roopa','F','17-Jan-1984','BBM');

Whenever you work with data of types CHAR, VARCHAR/VARCHAR2 or DATE, the values must be enclosed in single quotes (').


Viewing Data in the Tables

Once data has been inserted into a table, the next most logical operation would be to view what has been inserted. The SELECT SQL verb is used to achieve this. The SELECT command is used to retrieve rows selected from one or more tables.

All Rows and All Columns

SELECT * FROM <tablename>;

SELECT * FROM gktab;

It shows all rows and all columns of the table.

Filtering Table Data

While viewing data from a table it is rare that all the data from the table will be required each time. Hence, SQL provides a method of filtering table data that is not required. The ways of filtering table data are:

• Selected columns and all rows
• Selected rows and all columns
• Selected columns and selected rows

Selected Columns and All Rows

The retrieval of specific columns from a table can be done as shown below.

Syntax

SELECT <columnname1>, <columnname2> FROM <tablename>;

Example

Show only Regno, Name and Course from gktab.

SELECT Regno, Name, Course FROM gktab;

Selected Rows and All Columns

The WHERE clause is used to extract only those records that fulfill a specified criterion. When a WHERE clause is added to the SQL query, the Oracle engine compares each record in the table with condition specified in the WHERE clause. The Oracle engine displays only those records that satisfy the specified condition.

Syntax

SELECT * FROM <tablename> WHERE <condition>;

Here, <condition> is always quantified as <columnname> <operator> <value>.

When specifying a condition in the WHERE clause all standard operators such as logical, arithmetic and so on, can be used.


Example-1:

Display all the students from BCA.

SELECT * FROM gktab WHERE Course='BCA';

Example-2:

Display the student whose regno is 102.

SELECT * FROM gktab WHERE Regno=102;

Selected Columns and Selected Rows

To view a specific set of rows and columns from a table

When a WHERE clause is added to the SQL query, the Oracle engine compares each record in the table with condition specified in the WHERE clause. The Oracle engine displays only those records that satisfy the specified condition.

Syntax

SELECT <columnname1>, <columnname2> FROM <tablename>

WHERE <condition>;


Example-1:

List the student’s Regno, Name for the Course BCA.

SELECT Regno, Name FROM gktab WHERE Course='BCA';

Example-2:

List the student’s Regno, Name, Gender for the Course BBM.

SELECT Regno, Name, Gender FROM gktab WHERE Course='BBM';

Eliminating Duplicate Rows when using a SELECT statement

A table could hold duplicate rows. In such a case, to view only unique rows the DISTINCT clause can be used.

The DISTINCT clause allows removing duplicates from the result set. The DISTINCT clause can only be used with SELECT statements.

The DISTINCT clause scans through the values of the column/s specified and displays only unique values from amongst them.

Syntax
SELECT DISTINCT <columnname1>, <columnname2> FROM <tablename>;


Example:

Show different courses from gktab

SELECT DISTINCT Course FROM gktab;

Sorting Data in a Table

Oracle allows data from a table to be viewed in a sorted order. The rows retrieved from the table will be sorted in either ascending or descending order depending on the condition specified in the SELECT sentence.

Syntax

SELECT * FROM

ORDER BY <columnname1>, <columnname2> [ASC | DESC];

The ORDER BY clause sorts the result set based on the column specified. The ORDER BY clause can only be used in SELECT statements. The Oracle engine sorts in ascending order by default


Example-1: Show details of students according to Regno. SELECT * FROM gktab ORDER BY Regno;

Example-2:

Show the details of students names in descending order.

SELECT * FROM gktab ORDER BY Name DESC;


DELETE Operations

The DELETE command deletes the rows from the table that satisfy the condition provided by its WHERE clause, and returns the number of records deleted.

The verb DELETE in SQL is used to remove either

• Specific row(s) from a table, OR
• All the rows from a table

Removal of Specific Row(s)

Syntax:

DELETE FROM tablename WHERE Condition;

Example:

DELETE FROM gktab WHERE Regno=103;

1 row deleted

SELECT * FROM gktab;

The row with Regno 103 has been deleted from the table.

Removal of All Rows

Syntax

DELETE FROM tablename;

Example


DELETE FROM gktab;

6 rows deleted

SELECT * FROM gktab; no rows selected

Since DELETE is a DML command, you can use ROLLBACK to undo the above operations as long as no COMMIT has been issued.

UPDATING THE CONTENTS OF A TABLE

The UPDATE Command is used to change or modify data values in a table.

The UPDATE verb in SQL is used to update either:

• All the rows from a table, OR
• A selected set of rows from a table.

Updating all rows

The UPDATE statement updates columns in the existing table's rows with new values. The SET clause indicates which column data should be modified and the new values the columns should hold. The WHERE clause, if given, specifies which rows should be updated; otherwise, all table rows are updated.

Syntax: UPDATE tablename SET columnname1=expression1, columnname2=expression2;

Example: update the gktab table by changing the course to BCA for every row.
UPDATE gktab SET course='BCA';
6 rows updated

SELECT * FROM gktab;


In the above table, the course is changed to BCA for all the rows in the table.

Updating Records Conditionally

If you want to update a specific set of rows in table, then WHERE clause is used.

Syntax:

UPDATE tablename

SET Columnname1=Expression1, Columnname2=Expression2

WHERE Condition;

Example:

Update gktab by changing the course from BCA to BBM for Regno 102.
UPDATE gktab SET Course='BBM' WHERE Regno=102;
1 row updated
SELECT * FROM gktab;

MODIFYING THE STRUCTURE OF TABLES

The structure of a table can be modified by using the ALTER TABLE command.


ALTER TABLE allows changing the structure of an existing table. With ALTER TABLE it is possible to add or delete columns, create or destroy indexes, change the data type of existing columns, or rename columns or the table itself. ALTER TABLE works by making a temporary copy of the original table. The alteration is performed on the copy, then the original table is deleted and the new one is renamed. While ALTER TABLE is executing, the original table is still readable by other Oracle users.

Restrictions on ALTER TABLE
The following tasks cannot be performed using the ALTER TABLE clause:
• Change the name of the table.
• Change the name of a column.
• Decrease the size of a column if table data exists.

The ALTER TABLE command can perform:
• Adding new columns.
• Dropping a column from a table.
• Modifying existing columns.

Adding New Columns
Syntax: ALTER TABLE tablename ADD(NewColumnname1 Datatype(size), NewColumnname2 Datatype(size), ...);

Example: Add a new field Phno to gktab. ALTER TABLE gktab ADD(Phno NUMBER(10));

The table is altered with new column Phno


Select * from gktab;

You can also use DESC gktab, to see the new column added to table.

Dropping A Column from a Table.

Syntax: ALTER TABLE tablename DROP COLUMN Columnname;

Example: Drop the column Phno from gktab. ALTER TABLE gktab DROP COLUMN Phno;

The table is altered, the column Phno is removed from the table.

Select * from gktab;

You can also use DESC gktab, to see the column removed from the table.


Modifying Existing Columns
Syntax: ALTER TABLE tablename MODIFY(Columnname Newdatatype(Newsize));
Example: ALTER TABLE gktab MODIFY(Name VARCHAR(25));
The table is altered; the Name column can now hold up to 25 characters.

DESC gktab;

RENAMING TABLES
Oracle allows renaming of tables. The rename operation is done atomically, which means that no other thread can access the table while the rename process is running.
Syntax: RENAME tablename TO newtablename;
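Example (using hypothetical table names):

RENAME oldtab TO newtab;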

TRUNCATING TABLES
The TRUNCATE command deletes all the rows in the table permanently.
Syntax: TRUNCATE TABLE tablename;
The number of deleted rows is not returned. A truncate operation drops and re-creates the table, which is much faster than deleting rows one by one.
Example: TRUNCATE TABLE gktab;
Table truncated, i.e., all the rows are deleted permanently.


DESTROYING TABLES
Sometimes tables within a particular database become obsolete and need to be discarded. In such situations, the DROP TABLE statement with the table name can destroy a specific table.
Syntax: DROP TABLE tablename;
Example: DROP TABLE gktab;
If a table is dropped, all the records held within it and the structure of the table are lost and cannot be recovered.

COMMIT and ROLLBACK
Commit
The COMMIT command is used to permanently save any transaction into the database.
SQL> COMMIT;
Rollback
ROLLBACK is used to undo the changes made by any command, but only before a COMMIT is done. We cannot roll back data that has already been committed, whether by the COMMIT keyword or by DDL commands, because DDL commands are auto-commit.
SQL> ROLLBACK;


Difference between DELETE and DROP. The DELETE command is used to remove rows from a table. After performing a DELETE operation you need to COMMIT or ROLLBACK the transaction to make the change permanent or to undo it.

The DROP command removes a table from the database. All the table's rows, indexes and privileges will also be removed. The operation cannot be rolled back.

Difference between DELETE and TRUNCATE. The DELETE command is used to remove rows from a table. After performing a DELETE operation you need to COMMIT or ROLLBACK the transaction to make the change permanent or to undo it.

TRUNCATE removes all rows from a table. The operation cannot be rolled back.

Difference between CHAR and VARCHAR.

CHAR

1. Used to store fixed-length data.
2. The maximum number of characters the data type can hold is 255.
3. It is up to 50% faster than VARCHAR.
4. Uses static memory allocation.

VARCHAR

1. Used to store variable-length data.
2. The maximum number of characters the data type can hold is 4000.
3. It is slower than CHAR.
4. Uses dynamic memory allocation.


DATA CONSTRAINTS

Oracle permits data constraints to be attached to table columns via SQL syntax that checks data for integrity prior to storage. Once data constraints are part of a table column construct, the engine checks the data being entered into a table column against the data constraints. If the data passes this check, it is stored in the table column; else the data is rejected. Even if a single column of the record being entered into the table fails a constraint, the entire record is rejected and not stored in the table.

Both the CREATE TABLE and ALTER TABLE SQL verbs can be used to write SQL sentences that attach constraints to a table column. A constraint is a rule that restricts the values for one or more columns in a table. The Oracle server uses constraints to prevent invalid data entry into tables; without constraints, invalid data could be stored. Constraints are an important part of the table.


Primary Key Constraint

A primary key can consist of one or more columns on a table. Primary key constraints define a column or series of columns that uniquely identify a given row in a table. Defining a primary key on a table is optional and you can only define a single primary key on a table. A primary key constraint can consist of one or many columns (up to 32). When multiple columns are used as a primary key, they are called a composite key. Any column that is defined as a primary key column is automatically set with a NOT NULL status. The Primary key constraint can be applied at column level and table level.

Foreign Key Constraint A foreign key constraint is used to enforce a relationship between two tables. A foreign key is a column (or a group of columns) whose values are derived from the Primary key or unique key of some other table. The table in which the foreign key is defined is called a Foreign table or Detail table. The table that defines primary key or unique key and is referenced by the foreign key is called Primary table or Master table. The master table can be referenced in the foreign key definition by using the clause REFERENCES Tablename.ColumnName when defining the foreign key, column attributes, in the detail table. The foreign key constraint can be applied at column level and table level.

Unique Key Constraint Unique key will not allow duplicate values. A table can have more than one Unique key. A unique constraint defines a column, or series of columns, that must be unique in value. The UNIQUE constraint can be applied at column level and table level.


CHECK Constraint
Business rule validation can be applied to a table column by using the CHECK constraint. CHECK constraints must be specified as a logical expression that evaluates to either TRUE or FALSE.

The CHECK constraint ensures that all values in a column satisfy certain conditions. Once defined, the database will only insert a new row or update an existing row if the new value satisfies the CHECK constraint. The CHECK constraint is used to ensure data quality. A CHECK constraint takes substantially longer to execute compared to NOT NULL, PRIMARY KEY, FOREIGN KEY or UNIQUE. The CHECK constraint can be applied at column level and table level.

NOT NULL Constraint
The NOT NULL column constraint ensures that a table column cannot be left empty.

When a column is defined as not null, then that column becomes a mandatory column. The NOT NULL constraint can only be applied at column level.

Example on Constraints

Consider the Table shown below
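A minimal sketch of how these constraints can be attached at column level, using hypothetical gkstud and gkmarks tables:

CREATE TABLE gkstud (
  Regno NUMBER(3) PRIMARY KEY,                            -- primary key constraint
  Name VARCHAR2(20) NOT NULL,                             -- NOT NULL constraint
  Email VARCHAR2(30) UNIQUE,                              -- unique key constraint
  Course CHAR(5) CHECK (Course IN ('BCA','BBM','BCom'))   -- CHECK constraint
);

CREATE TABLE gkmarks (
  Regno NUMBER(3) REFERENCES gkstud(Regno),               -- foreign key referencing the master table
  Total NUMBER(3)
);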


Arithmetic Operators

Oracle allows arithmetic operators to be used while viewing records from a table or while performing data manipulation operations such as insert, update and delete. These are:

+ Addition
- Subtraction
* Multiplication
/ Division
() Enclosed operations

Consider the below employee table(gkemp)
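The gkemp table is assumed here to have columns such as ename, dept and salary; with that assumption, the operators can be used directly in queries:

SELECT ename, salary * 12 annual_salary FROM gkemp;   -- multiplication, with a column alias
SELECT ename, (salary + 500) * 12 FROM gkemp;         -- parentheses control the evaluation order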


Special Note

The DUAL table is a special one-row, one-column table present by default in Oracle and other database installations. Dual is a dummy table.
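For example, DUAL can be used to evaluate an expression or function that does not come from any real table:

SELECT 5 * 3 FROM DUAL;     -- 15
SELECT SYSDATE FROM DUAL;   -- current system date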

Logical Operators

Logical operators that can be used in SQL sentence are:

• AND
• OR
• NOT

OR: For the row to be selected, at least one of the conditions must be true.

AND: For a row to be selected, all the specified conditions must be true.

NOT: For a row to be selected, the specified condition must be false.


Consider the below employee table(gkemp)

For example: to find the names of employees who work in either the Commerce or the Arts department, the query would be as shown below.
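A sketch of the query, assuming gkemp has ename and dept columns:

SELECT ename FROM gkemp WHERE dept = 'Commerce' OR dept = 'Arts';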

For example: to find the names of employees whose salary is between 10000 and 20000, the query would be as shown below.
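A sketch under the same assumptions, with a salary column:

SELECT ename FROM gkemp WHERE salary >= 10000 AND salary <= 20000;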


For example: to find the names of employees who do not belong to the Computer Science department, the query would be as shown below.
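A sketch under the same assumptions:

SELECT ename FROM gkemp WHERE NOT dept = 'Computer Science';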

Range Searching (BETWEEN)

In order to select the data that is within a range of values, the BETWEEN operator is used. The BETWEEN operator allows the selection of rows that contain values within a specified lower and upper limit. The range coded after the word BETWEEN is inclusive. The lower value must be coded first. The two values in between the range must be linked with the keyword AND. The BETWEEN operator can be used with both character and numeric data types. However, the data types cannot be mixed.

For example: to find the names of employees whose salary is between 10000 and 20000, the query would be as shown below.
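A sketch, again assuming ename and salary columns in gkemp:

SELECT ename FROM gkemp WHERE salary BETWEEN 10000 AND 20000;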


Pattern Matching (LIKE, IN, NOT IN)

LIKE

The LIKE predicate allows comparison of one string value with another string value which is not identical. This is achieved by using wildcard characters. The two wildcard characters available are:

For character data types:

% allows to match any string of any length (including zero length).

_ allows to match on a single character.
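For example, assuming an ename column in gkemp:

SELECT * FROM gkemp WHERE ename LIKE 'R%';    -- names starting with R
SELECT * FROM gkemp WHERE ename LIKE '_a%';   -- names whose second letter is a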


IN

The IN operator is used when you want to compare a column with more than one value. It is similar to an OR condition.

For example: to find the names of companies located in the cities Bangalore, Mumbai or Gurgaon, the query would be as shown below.
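A sketch, assuming a gkcompany table with name and city columns:

SELECT name FROM gkcompany WHERE city IN ('Bangalore', 'Mumbai', 'Gurgaon');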

NOT IN

The NOT IN operator is the opposite of IN.

For example: to find the names of companies located in cities other than Bangalore, Mumbai and Gurgaon, the query would be as shown below.
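Under the same assumption:

SELECT name FROM gkcompany WHERE city NOT IN ('Bangalore', 'Mumbai', 'Gurgaon');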


Column Aliases (Renaming Columns) in Oracle: Sometimes you want to change the column headings in the output. For this you can use column aliases in Oracle. It is good practice to use column aliases, since they make queries and their output easier to read.

To add a column alias to a query, give the alias name separated by a space after the column name:
SELECT DOB DateofBirth FROM gkstudent;
In the above query, DateofBirth is the column alias.


ORACLE FUNCTIONS

Oracle functions serve the purpose of manipulating data items and returning a result. Functions are also capable of accepting user-supplied variables or constants and operating on them; such variables or constants are called arguments. Any number of arguments (or no arguments at all) can be passed to a function in the following format:

Function_Name(argument1, argument2, ...)

Oracle functions can be clubbed together depending upon whether they operate on a single row or a group of rows retrieved from a table. Accordingly, functions can be classified as follows:

Group Functions(Aggregate Functions)

Functions that act on a set of values are called group functions.

Scalar Functions(Single Row Functions)

Functions that act on only one value at a time are called scalar functions.

String Functions: for string data type

Numeric functions: for Number data type

Conversion function: for conversion of one data type to another.

Date conversions: for date data type.


a) SQL Aggregate / Group Functions

Group functions return results based on groups of rows, rather than on single rows. SQL aggregate functions return a single value, calculated from the values in a column.

Useful aggregate functions:

a) COUNT() - Returns the number of rows

b) AVG() - Returns the average value

c) MAX() - Returns the largest value

d) MIN() - Returns the smallest value

e) SUM() - Returns the total sum

Consider the below employee table(gkemp)

a) COUNT() The COUNT() function counts the number of values present in a column, excluding NULL values.


b) AVG()

The AVG() function returns the average value of a column specified.

c) MAX()

The MAX() function returns the highest value of a particular column.

d) MIN()

The MIN() function returns the smallest value of a particular column.

e) SUM()

The SUM() function returns the sum of column values.
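Sample queries against gkemp, assuming empno and salary columns:

SELECT COUNT(empno) FROM gkemp;   -- number of employees
SELECT AVG(salary) FROM gkemp;    -- average salary
SELECT MAX(salary) FROM gkemp;    -- highest salary
SELECT MIN(salary) FROM gkemp;    -- lowest salary
SELECT SUM(salary) FROM gkemp;    -- total salary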


b) SQL String Functions

SQL string functions are used primarily for string manipulation. The important string functions are detailed below:

|| - Used for concatenation.
INITCAP - Returns the string with the first letter of each word in upper case.
LENGTH - Returns the length of a string.
LOWER - Returns the string with all letters forced to lowercase.
UPPER - Returns the string with all letters forced to uppercase.
LPAD - Returns the string left-padded to length n with the sequence of characters specified.
RPAD - Returns the string right-padded to length n with the sequence of characters specified.
LTRIM - Removes characters from the left of the string, up to the first character not in the given set.
RTRIM - Removes final characters after the last character not in the given set.
SUBSTR - Returns a portion of the string, beginning at character m and going up to character n; if n is omitted, it returns up to the last character in the string. The first position is 1.
INSTR - Returns the location of a substring in a string.
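A few illustrative calls, run against the DUAL table described earlier:

SELECT 'gkmv' || 'kalyan' FROM DUAL;           -- gkmvkalyan
SELECT INITCAP('database systems') FROM DUAL;  -- Database Systems
SELECT LENGTH('oracle') FROM DUAL;             -- 6
SELECT LPAD('sql', 6, '*') FROM DUAL;          -- ***sql
SELECT SUBSTR('database', 5, 4) FROM DUAL;     -- base
SELECT INSTR('database', 'tab') FROM DUAL;     -- 3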


Date Conversion Functions

SYSDATE - Returns the current system date.
ADD_MONTHS(d, n) - Returns the date after adding the number of months specified in the function.
LAST_DAY(d) - Returns the last day of the month that contains the specified date.
MONTHS_BETWEEN(d1, d2) - Returns the number of months between d1 and d2.
NEXT_DAY(date, char) - Returns the date of the first weekday named by char that is after the date named by date; char must be a day of the week.
ROUND(date [, format]) - Returns a date rounded to a specific unit of measure. If the second parameter is omitted, ROUND rounds the date to the nearest day.
Conversion Functions
TO_DATE(char [, format]) - Converts a character string to a date.
TO_CHAR(date [, format]) - Converts a date to a character string.
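Illustrative calls, again using DUAL:

SELECT SYSDATE FROM DUAL;
SELECT ADD_MONTHS(SYSDATE, 3) FROM DUAL;
SELECT LAST_DAY(SYSDATE) FROM DUAL;
SELECT MONTHS_BETWEEN(TO_DATE('01-01-2015','DD-MM-YYYY'), TO_DATE('01-07-2014','DD-MM-YYYY')) FROM DUAL;   -- 6
SELECT TO_CHAR(SYSDATE, 'DD-MON-YYYY HH24:MI') FROM DUAL;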


SET OPERATORS and JOINS

Set Operators

Consider the below tables for set operators examples


Set operators combine the results of two queries into a single one. The different set operators are: UNION, UNION ALL, INTERSECT and MINUS.

Union Clause
UNION is used to combine the results of two or more SELECT statements; it eliminates duplicate rows from its result set. In the case of UNION, the number of columns must be the same in each query, and the corresponding datatypes must match.


Union Example
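A sketch, assuming two hypothetical tables gkdept1 and gkdept2, each holding a single ename column:

SELECT ename FROM gkdept1 UNION SELECT ename FROM gkdept2;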

Union ALL Clause
Same as UNION, but it keeps the duplicate rows.
Union All Example
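Using the same assumed tables:

SELECT ename FROM gkdept1 UNION ALL SELECT ename FROM gkdept2;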

Intersect Clause
INTERSECT is used to combine two SELECT statements, but it returns only the records that are common to both SELECT statements. As with UNION, the number of columns and the datatypes must be the same in each query.


Intersect Example
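Using the same assumed tables:

SELECT ename FROM gkdept1 INTERSECT SELECT ename FROM gkdept2;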

Minus Clause
MINUS combines the results of two SELECT statements and returns only those rows that belong to the first result set but not to the second.

Minus Example
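Using the same assumed tables:

SELECT ename FROM gkdept1 MINUS SELECT ename FROM gkdept2;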


JOINS The SQL Joins clause is used to combine records from two or more tables in a database. A JOIN is a means for combining fields from two tables by using values common to each. Here, it is noticeable that the join is performed in the WHERE clause. Several operators can be used to join tables, such as =, <, >, <>, <=, >=, !=, BETWEEN, LIKE, and NOT; they can all be used to join tables. However, the most common operator is the equal symbol.

SQL Join Types: There are different types of joins available in SQL:
• INNER
• OUTER (LEFT, RIGHT, FULL)
• CROSS
Consider the below tables for join operation examples.


INNER Join
Inner joins are also known as equi joins. They are the most common joins used in SQL. They are known as equi joins because they use the equal sign (=) as the comparison operator. The INNER join returns all rows from both tables where there is a match. Consider the above tables (gkproduct and gkorder). For example: if you want to display the product information for each order, the query will be as given below.
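A sketch, assuming gkproduct has columns (prodid, prodname) and gkorder has columns (orderid, prodid, qty):

SELECT o.orderid, p.prodname, o.qty
FROM gkproduct p, gkorder o
WHERE p.prodid = o.prodid;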


OUTER Join
The OUTER join returns all rows from both tables that satisfy the join condition, along with the rows that do not satisfy the join condition from one of the tables. The SQL outer join operator in Oracle is (+), and it is used on one side of the join condition only. For example: if you want to display all product data along with order item data, with NULL values displayed for the order items where a product has no order item, the outer join query would be as shown below.

NOTE: If the (+) operator is placed on the right side of the join condition, the query is equivalent to a left outer join; if it is placed on the left side, it is equivalent to a right outer join.
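Under the same assumed columns, an Oracle-style left outer join that keeps every product would look like:

SELECT p.prodname, o.orderid
FROM gkproduct p, gkorder o
WHERE p.prodid = o.prodid (+);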


OUTER JOIN: An outer join retrieves either the matched rows from one table plus all rows in the other table, or all rows in both tables (whether or not there is a match).

There are three kinds of Outer Join:

• LEFT OUTER JOIN or LEFT JOIN

This join returns all the rows from the left table in conjunction with the matching rows from the right table. If there is no matching row in the right table, it returns NULL values.

• RIGHT OUTER JOIN or RIGHT JOIN

This join returns all the rows from the right table in conjunction with the matching rows from the left table. If there is no matching row in the left table, it returns NULL values.

• FULL OUTER JOIN or FULL JOIN

This join combines the left outer join and the right outer join. It returns a row from either table when the condition is met, and returns NULL values when there is no match.

In other words, an OUTER JOIN lists the matching entries plus the unmatched rows from one of the tables (RIGHT or LEFT) or from both of the tables (FULL).
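Under the same assumed gkproduct and gkorder columns, the ANSI-style forms are:

SELECT p.prodname, o.orderid FROM gkproduct p LEFT JOIN gkorder o ON p.prodid = o.prodid;
SELECT p.prodname, o.orderid FROM gkproduct p RIGHT JOIN gkorder o ON p.prodid = o.prodid;
SELECT p.prodname, o.orderid FROM gkproduct p FULL JOIN gkorder o ON p.prodid = o.prodid;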


CROSS Join
It is the Cartesian product of the two tables involved. It returns a table consisting of records that combine each row from the first table with each row of the second table. The result of a CROSS JOIN rarely makes sense, and it is rarely needed in practice.
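For example:

SELECT * FROM gkproduct CROSS JOIN gkorder;
-- equivalent older form: SELECT * FROM gkproduct, gkorder;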

INNER JOIN: returns rows when there is a match in both tables.
LEFT JOIN: returns all rows from the left table, even if there are no matches in the right table.
RIGHT JOIN: returns all rows from the right table, even if there are no matches in the left table.
FULL JOIN: returns rows when there is a match in one of the tables.
SELF JOIN: is used to join a table to itself as if the table were two tables, temporarily renaming at least one table in the SQL statement.
CARTESIAN JOIN: returns the Cartesian product of the sets of records from the two or more joined tables.

BCA204T: DATA BASE MANAGEMENT SYSTEMS

Unit - V Transaction Processing Concepts: Introduction, Transaction and System Concepts, Desirable properties of transaction, Schedules and Recoverability, Serializability of Schedules, Transaction Support in SQL, Locking Techniques for Concurrency Control, Concurrency Control based on time stamp ordering.


Unit-V

Transaction Processing Concepts

A transaction is a logical unit of database processing that includes one or more access operations (read - retrieval; write - insert or update; delete). Alternatively, a transaction can be defined as a group of tasks, where a single task is the minimum processing unit of work, which cannot be divided further. An example of a transaction is a transfer between the bank accounts of two users, say A and B. When a bank employee transfers an amount of Rs. 500 from A's account to B's account, a number of tasks are executed behind the screen. This very simple and small transaction includes several steps: decrease A's balance by 500.

Open_Account(A)
Old_Balance = A.balance
New_Balance = Old_Balance - 500
A.balance = New_Balance
Close_Account(A)
In simple words, the transaction involves many tasks, such as opening the account of A, reading the old balance, decreasing 500 from it, saving the new balance to the account of A, and finally closing it. To add the amount of 500 to B's account, the same sort of tasks need to be done:

Open_Account(B)
Old_Balance = B.balance
New_Balance = Old_Balance + 500
B.balance = New_Balance
Close_Account(B)
A simple transaction of moving an amount of 500 from A to B thus involves many low-level tasks.
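In SQL, the same transfer can be sketched as a single transaction, assuming a hypothetical account table with accno and balance columns:

UPDATE account SET balance = balance - 500 WHERE accno = 'A';
UPDATE account SET balance = balance + 500 WHERE accno = 'B';
COMMIT;   -- make both updates permanent
-- if anything goes wrong before the COMMIT, issue ROLLBACK; instead, and neither update survives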


ACID Properties of a Transaction
The execution of a transaction Ti must satisfy some properties that are referred to as 'ACID'. The acronym ACID stands for Atomicity, Consistency, Isolation and Durability, which are explained below:

Atomicity
Although a transaction involves several low-level operations, this property states that a transaction must be treated as an atomic unit: either all of its operations are executed or none. There must be no state in the database where the transaction is left partially completed. The database state is well defined either before the execution of the transaction or after the execution/abortion/failure of the transaction.

Consistency
This property states that after the transaction is finished, its database must remain in a consistent state. There must not be any possibility that some data is incorrectly affected by the execution of a transaction. If the database was in a consistent state before the execution of the transaction, it must remain in a consistent state after the execution of the transaction.

Isolation
In a database system where more than one transaction is executed simultaneously and in parallel, the property of isolation states that all the transactions are carried out and executed as if each were the only transaction in the system. No transaction will affect the existence of any other transaction.


Durability
This property states that all updates made on the database will persist even if the system fails and restarts. If a transaction writes or updates some data in the database and commits, that data will always be there in the database. If the transaction commits but the data is not yet written to disk when the system fails, that data will be updated once the system comes back up.

States of Transactions:

A transaction in a database can be in one of the following states:

Active: In this state the transaction is being executed. This is the initial state of every transaction.

Partially Committed: When a transaction executes its final operation, it is said to be in this state. After the execution of all operations, the database system performs some checks, e.g. checking the consistency of the database state after applying the output of the transaction to the database.


Failed: If any check made by the database recovery system fails, the transaction is said to be in the failed state, from where it can no longer proceed further.

Aborted: If any check fails and the transaction has reached the failed state, the recovery manager rolls back all its write operations on the database to restore the database to the state it was in prior to the start of the execution of the transaction. Transactions in this state are called aborted. The database recovery module can select one of two operations after a transaction aborts:

. Re-start the transaction

. Kill the transaction

Committed: If a transaction executes all its operations successfully, it is said to be committed. All its effects are now permanently installed in the database system.

Why Recovery Is Needed Whenever a transaction is submitted to a DBMS for execution, the system is responsible for making sure that either (1) All the operations in the transaction are completed successfully and their effect is recorded permanently in the database, or (2) The transaction has no effect whatsoever on the database or on any other transactions. The DBMS must not permit some operations of a transaction T to be applied to the database while other operations of T are not. This may happen if a transaction fails after executing some of its operations but before executing all of them.


Types of Failures: Failures are generally classified as transaction, system, and media failures. There are several possible reasons for a transaction to fail in the middle of execution:
A computer failure (system crash): A hardware, software, or network error occurs in the computer system during transaction execution. Hardware crashes are usually media failures, for example main memory failure.
A transaction or system error: Some operation in the transaction may cause it to fail, such as integer overflow or division by zero. Transaction failure may also occur because of erroneous parameter values or because of a logical programming error. In addition, the user may interrupt the transaction during its execution.
Local errors or exception conditions detected by the transaction: During transaction execution, certain conditions may occur that necessitate cancellation of the transaction. For example, data for the transaction may not be found. Notice that an exception condition, such as insufficient account balance in a banking database, may cause a transaction, such as a fund withdrawal, to be canceled. This exception should be programmed in the transaction itself, and hence would not be considered a failure.
Concurrency control enforcement: The concurrency control method may decide to abort the transaction, to be restarted later, because it violates serializability or because several transactions are in a state of deadlock.
Disk failure: Some disk blocks may lose their data because of a read or write malfunction or because of a disk read/write head crash. This may happen during a read or a write operation of the transaction.
Physical problems and catastrophes: This refers to an endless list of problems (usually sudden damage) that includes power or air-conditioning failure, fire, theft, sabotage, overwriting disks or tapes by mistake, and mounting of a wrong tape by the operator.


LOCKING TECHNIQUES FOR CONCURRENCY CONTROL A lock is a variable associated with a data item that describes the status of the item with respect to possible operations that can be applied to it. Generally, there is one lock for each data item in the database. Locks are used as a means of synchronizing the access by concurrent transactions to the database items.

Types of Locks and System Lock Tables
Several types of locks are used in concurrency control.
Binary Locks: A binary lock can have two states or values: locked and unlocked (or 1 and 0, for simplicity). A distinct lock is associated with each database item X. If the value of the lock on X is 1, item X cannot be accessed by a database operation that requests the item. If the value of the lock on X is 0, the item can be accessed when requested. We refer to the current value (or state) of the lock associated with item X as LOCK(X). Two operations, lock_item and unlock_item, are used with binary locking. A transaction requests access to an item X by first issuing a lock_item(X) operation. If LOCK(X) = 1, the transaction is forced to wait. If LOCK(X) = 0, it is set to 1 (the transaction locks the item) and the transaction is allowed to access item X. When the transaction is through using the item, it issues an unlock_item(X) operation, which sets LOCK(X) to 0 (unlocks the item) so that X may be accessed by other transactions. Hence, a binary lock enforces mutual exclusion on the data item. If the simple binary locking scheme described here is used, every transaction must obey the following rules:
1. A transaction T must issue the operation lock_item(X) before any read_item(X) or write_item(X) operations are performed in T.
2. A transaction T must issue the operation unlock_item(X) after all read_item(X) and write_item(X) operations are completed in T.
3. A transaction T will not issue a lock_item(X) operation if it already holds the lock on item X.

4. A transaction T will not issue an unlock_item(X) operation unless it already holds the lock on item X.
Shared/Exclusive (or Read/Write) Locks: The preceding binary locking scheme is too restrictive for database items, because at most one transaction can hold a lock on a given item. We should allow several transactions to access the same item X if they all access X for reading purposes only. However, if a transaction is to write an item X, it must have exclusive access to X. For this purpose, a different type of lock called a multiple-mode lock is used. In this scheme, called shared/exclusive or read/write locks, there are three locking operations: read_lock(X), write_lock(X), and unlock(X). A lock associated with an item X, LOCK(X), now has three possible states: "read-locked", "write-locked", or "unlocked". A read-locked item is also called share-locked, because other transactions are allowed to read the item, whereas a write-locked item is called exclusive-locked, because a single transaction exclusively holds the lock on the item. When we use the shared/exclusive locking scheme, the system must enforce the following rules:
1. A transaction T must issue the operation read_lock(X) or write_lock(X) before any read_item(X) operation is performed in T.
2. A transaction T must issue the operation write_lock(X) before any write_item(X) operation is performed in T.
3. A transaction T must issue the operation unlock(X) after all read_item(X) and write_item(X) operations are completed in T.
4. A transaction T will not issue a read_lock(X) operation if it already holds a read (shared) lock or a write (exclusive) lock on item X.
5. A transaction T will not issue a write_lock(X) operation if it already holds a read (shared) lock or write (exclusive) lock on item X. This rule may be relaxed.
6. A transaction T will not issue an unlock(X) operation unless it already holds a read (shared) lock or a write (exclusive) lock on item X.
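In Oracle SQL these ideas surface as explicit locking statements; a sketch, assuming the hypothetical account table used earlier:

LOCK TABLE account IN EXCLUSIVE MODE;                       -- table-level write lock
SELECT balance FROM account WHERE accno = 'A' FOR UPDATE;   -- row-level write lock on the selected row
COMMIT;                                                     -- locks are released at the end of the transaction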


The Two-Phase Locking (2PL) Protocol
The two-phase locking protocol is used to ensure serializability in a database. The protocol works in two phases:

Growing Phase
In this phase the transaction acquires read or write locks on data items as needed; no lock is released.

Shrinking Phase
This phase is just the reverse of the growing phase. In this phase the transaction releases its read and write locks but does not acquire any new lock.


CONCURRENCY CONTROL BASED ON TIMESTAMP ORDERING
A timestamp is a unique identifier created by the DBMS to identify a transaction. Typically, timestamp values are assigned in the order in which the transactions are submitted to the system, so a timestamp can be thought of as the transaction start time. We will refer to the timestamp of transaction T as TS(T). Concurrency control techniques based on timestamp ordering do not use locks; hence, deadlocks cannot occur. The idea of this scheme is to order the transactions based on their timestamps. A schedule in which the transactions participate is then serializable, and the equivalent serial schedule has the transactions in order of their timestamp values. This is called timestamp ordering (TO). The timestamp algorithm must ensure that, for each item accessed by conflicting operations in the schedule, the order in which the item is accessed does not violate the serializability order. To do this, the algorithm associates with each database item X two timestamp (TS) values:
1. read_TS(X): The read timestamp of item X; this is the largest timestamp among all the timestamps of transactions that have successfully read item X, that is, read_TS(X) = TS(T), where T is the youngest transaction that has read X successfully.
2. write_TS(X): The write timestamp of item X; this is the largest of all the timestamps of transactions that have successfully written item X, that is, write_TS(X) = TS(T), where T is the youngest transaction that has written X successfully.
Whenever some transaction T tries to issue a read_item(X) or a write_item(X) operation, the basic TO algorithm compares the timestamp of T with read_TS(X) and write_TS(X) to ensure that the timestamp order of transaction execution is not violated. If this order is violated, then transaction T is aborted and resubmitted to the system as a new transaction with a new timestamp. If T is aborted and rolled back, any transaction T1 that may have used a value written by T must also be rolled back. Similarly, any transaction T2 that may have used a value written by T1 must also be rolled back, and so on. This effect is known as cascading rollback and is one of the problems associated with basic TO, since the schedules produced are not guaranteed to be recoverable.

We first describe the basic TO algorithm here. The concurrency control algorithm must check whether conflicting operations violate the timestamp ordering in the following two cases:
1. Transaction T issues a write_item(X) operation:
a) If read_TS(X) > TS(T) or if write_TS(X) > TS(T), then abort and roll back T and reject the operation. This should be done because some younger transaction with a timestamp greater than TS(T), and hence after T in the timestamp ordering, has already read or written the value of item X before T had a chance to write X, thus violating the timestamp ordering.
b) If the condition in part (a) does not occur, then execute the write_item(X) operation of T and set write_TS(X) to TS(T).
2. Transaction T issues a read_item(X) operation:
a) If write_TS(X) > TS(T), then abort and roll back T and reject the operation. This should be done because some younger transaction with a timestamp greater than TS(T), and hence after T in the timestamp ordering, has already written the value of item X before T had a chance to read X.
b) If write_TS(X) ≤ TS(T), then execute the read_item(X) operation of T and set read_TS(X) to the larger of TS(T) and the current read_TS(X).


OPTIMISTIC CONCURRENCY CONTROL TECHNIQUES
In optimistic concurrency control techniques, also known as validation or certification techniques, no checking is done while the transaction is executing. Several proposed concurrency control methods use the validation technique. In this scheme, updates in the transaction are not applied directly to the database items until the transaction reaches its end. During transaction execution, all updates are applied to local copies of the data items that are kept for the transaction. At the end of transaction execution, a validation phase checks whether any of the transaction's updates violate serializability. Certain information needed by the validation phase must be kept by the system. If serializability is not violated, the transaction is committed and the database is updated from the local copies; otherwise, the transaction is aborted and then restarted later.

There are three phases for this concurrency control protocol:
1. Read phase: A transaction can read values of committed data items from the database. However, updates are applied only to local copies (versions) of the data items kept in the transaction workspace.
2. Validation phase: Checking is performed to ensure that serializability will not be violated if the transaction updates are applied to the database.
3. Write phase: If the validation phase is successful, the transaction updates are applied to the database; otherwise, the updates are discarded and the transaction is restarted.
