TABLE OF CONTENTS

Unit 1: Introduction to Database System
1.1 Introduction
1.2 Objectives
1.3 Traditional file oriented approach
1.4 Motivation for database approach
1.5 Database basics
1.6 Three views of data
1.7 The three level architecture of DBMS
1.7.1 External level or subschema
1.7.2 Conceptual level or conceptual schema
1.7.3 Internal level or physical schema
1.7.4 Mapping between different levels
1.8 Database management system facilities
1.8.1 Data definition language
1.8.2 Data manipulation language
1.9 Elements of a database management system
1.9.1 DML precompiler
1.9.2 DDL compiler
1.9.3 File manager
1.9.4 Database manager
1.9.5 Query processor
1.9.6 Database administrator
1.9.7 Data dictionary
1.10 Advantages and disadvantages of DBMS
1.11 Self test
1.12 Summary

Unit 2: Database Models
2.1 Introduction
2.2 Objectives
2.3 File management system
2.4 Entity relationship (ER) diagram
2.4.1 Introduction of ERD
2.4.2 Entity relationship diagram
2.4.3 Generalization and aggregation
2.4.3.1 Aggregation
2.5 The hierarchical model
2.6 The network model
2.7 The relational model
2.8 Advantages and disadvantages of relational approach
2.9 An example of a relational model
2.10 Self test

2.11 Summary

Unit 3: File Organisation for DBMS
3.1 Introduction
3.2 Objectives
3.3 File organization
3.4 Sequential file organisation
3.4.1 Index sequential file organization
3.4.1.1 Types of indexes
3.5 B-trees
3.5.1 Advantages of B-tree indexes
3.6 Direct file organization
3.7 Need for the multiple access path
3.8 Self test
3.9 Summary

Unit 4: Representing Data Elements
4.1 Data elements and fields
4.2 Representing relational database elements
4.3 Records
4.4 Representing block and record addresses
4.5 Client server systems
4.6 Logical and structured addresses
4.7 Record modifications
4.8 Index structures
4.9 Indexes on sequential files
4.10 Secondary indexes
4.11 B-trees
4.12 Hash tables
4.13 Self test

Unit 5: Relational Model
5.1 Introduction
5.2 Objectives
5.3 Concepts of a relational model
5.4 Formal definition of a relation
5.5 The Codd commandments
5.6 Summary

Unit 6: Normalization
6.1 Functional dependency
6.2 Normalization
6.2.1 First normal form
6.2.2 Second normal form
6.2.3 Third normal form
6.2.4 Boyce-Codd normal form
6.2.5 Multivalued dependency
6.2.6 Fifth normal form
6.3 Self test
6.4 Summary

Unit 7: Structured Query Language
7.1 Introduction of SQL
7.2 DDL statements
7.3 DML statements
7.4 View definitions
7.5 Constraints and triggers
7.6 Keys and foreign keys
7.7 Constraints on attributes and tuples
7.8 Modification of constraints
7.9 Cursors
7.10 Dynamic SQL

Unit 8: Relational Algebra
8.1 Basics of relational algebra
8.2 Set operations on relations
8.3 Extended operators of relational algebra
8.4 Constraints on relations
8.5 Self test
8.6 Summary

Unit 9: Management Considerations
9.1 Introduction
9.2 Objectives
9.3 Organisational resistance to DBMS tools
9.4 Conversion from an old system to a new system
9.5 Evaluation of a DBMS
9.6 Administration of a database management system
9.7 Self test
9.8 Summary

Unit 10: Concurrency Control
10.1 Serial and serializable schedules
10.2 Conflict serializability
10.3 Enforcing serializability by locks
10.4 Locking systems with several modes
10.5 Architecture for a locking scheduler
10.6 Managing hierarchies of database elements
10.7 Concurrency control by timestamps
10.8 Concurrency control by validation
10.9 Summary

Unit 11: Transaction Management
11.1 Introduction of transaction management
11.2 Serializability and recoverability
11.3 View serializability
11.4 Resolving deadlocks
11.5 Distributed
11.6 Distributed commit

11.7 Distributed locking
11.8 Summary


UNIT 1

INTRODUCTION TO DATABASE SYSTEM


1.1 INTRODUCTION

Database management is an important aspect of data processing. It involves several data models, which have evolved into different DBMS packages. Using these packages effectively in data processing applications demands a certain knowledge of the discipline and its procedures.

We need to understand the relevance and scope of databases in the data processing area. We do this by first understanding the properties and characteristics of data and the nature of data organization.

A data structure can be defined as a specification of data. Different data structures such as arrays, stacks, queues, trees and graphs are used to implement data organization in main memory. Several strategies are used to support the organization of data in secondary memory.

A database is a collection of related information stored in such a manner that it is available to many users for different purposes. The content of a database is obtained by combining data from all the different sources in an organization, so that data are available to all users and redundant data can be eliminated or at least minimized. The DBMS helps create an environment in which end users have better access to more and better managed data than they did before the DBMS became the data management standard.

A database can hold business inventory and accounting information in its files to prepare summaries, estimates, and other reports. There can be a database which stores newspaper articles, magazines, books, and comics. There is already a well-defined market for specific information for highly selected groups of users on almost all subjects. The database management system is the major software component of a database system. Some commercially available DBMS are INGRES, ORACLE, and Sybase. A database management system, therefore, is a combination of hardware and software that can be used to set up and monitor a database, and can manage the updating and retrieval of the data that has been stored in it. Most database management systems have the following facilities/capabilities:

• Creation of a file, addition of data, deletion of data, modification of data; creation, addition and deletion of entire files.
• Retrieving data collectively or selectively.
• The data stored can be sorted or indexed at the user's discretion and direction.
• Various reports can be produced from the system. These may be either standardized reports or reports generated specifically according to a user's definition.
• Mathematical functions can be performed, and the data stored in the database can be manipulated with these functions to perform the desired calculations.
• Maintaining data integrity and database use.
• Creating an environment for data warehousing and data mining.

The DBMS interprets and processes users' requests to retrieve information from a database. The following figure shows that a DBMS serves as an interface between users and the physical database. Requests come in several forms: they may be keyed directly from a terminal, or coded as high-level language programs to be submitted for interactive or batch processing. In most cases, a query request will have to penetrate several layers of software in the DBMS and operating system before the physical database can be accessed.

1.2 OBJECTIVES

After going through this unit, you should be able to:
• Appreciate the limitations of the traditional approach to application system development;
• Give reasons why the database approach is now being increasingly adopted;
• Discuss different views of data;
• List the components of a database management system;
• Enumerate the features/capabilities of a database management system; and
• List several advantages and disadvantages of DBMS.

1.3 TRADITIONAL FILE ORIENTED APPROACH

The traditional file-oriented approach to information processing has, for each application, a separate master file and its own set of personal files. An organization also needs a flow of information across these applications, and this requires sharing of data, which is significantly lacking in the traditional approach. One major limitation of such a file-based approach is that the programs become dependent on the files and the files become dependent upon the programs.

Disadvantages
• Data Redundancy: The same piece of information may be stored in two or more files. For example, the particulars of an individual who may be a customer or client may be stored in two or more files. Some of this information may change, such as the address or the payments made. It is therefore quite possible that while the address in the master file for one application has been updated, the address in the master file for another application has not. It may not even be easy to find out in how many files repeating items such as the name occur.
• Program/Data Dependency: In the traditional approach, if a data field is to be added to a master file, all programs that access the master file have to be changed to allow for the new field that has been added to the master record.
• Lack of Flexibility: In view of the strong coupling between the program and the data, most information retrieval possibilities are limited to well-anticipated and predetermined requests for data. The system is normally capable of producing only the scheduled records and queries which it has been programmed to create.

1.4 MOTIVATION FOR DATABASE APPROACH

Having pointed out some difficulties that arise in a straightforward file-oriented approach to information system development, we should note that the database approach is not always the right answer. The work in the organization may not require significant sharing of data or complex access; in other words, the data and the way it is used in the functioning of the organization may not be appropriate to database processing. Apart from needing a more powerful hardware platform, the software for database management systems is also quite expensive. This means that a significant extra cost has to be incurred by an organization if it wants to adopt this approach.

The advantage gained by the possibility of sharing data with others also carries with it the risk of unauthorized access to data. This may range from violation of office procedures, to violation of privacy rights of information, to downright theft. Organizations, therefore, have to be ready to cope with additional managerial problems.

A database management processing system is complex, and it could lead to a more inefficient system than the equivalent file-based one.

The use of the database and its possibility of being shared will, therefore, affect many departments within the organization. If the integrity of the data is not maintained, it is possible that one relevant piece of data could have been used by many programs in different applications by different users without their being aware of it. The impact of this may therefore be very widespread. Since data can be input from a variety of sources, control over the quality of data becomes very difficult to implement.

However, for most large organizations, the difficulties in moving over to a database approach are still worth getting over in view of the advantages that are gained, namely, avoidance of data duplication, sharing of data by different programs, greater flexibility and data independence.

1.5 DATABASE BASICS

Since the DBMS of an organization will in some sense reflect the nature of activities in the organization, some familiarity with the basic concepts, principles and terms used in the field is important.

• Data-items: The term data item is the word for what has traditionally been called the field in data processing, and it is the smallest unit of data that has meaning to its users. The phrase data element or elementary item is also sometimes used. Although the data item may be treated as a molecule of the database, data items are grouped together to form aggregates described by various names. For example, the term data record is used to refer to a group of data items, and a program usually reads or writes whole records. The data items could occasionally be further broken down into what may be called an atomic level for processing purposes.

• Entities and Attributes: The real world consists of objects that may be tangible, such as an employee, a component in an inventory or a space, or intangible, such as an event, a job description, an identification number, or an abstract construct. All such items about which relevant information is stored in the database are called entities. The qualities of the entity that we store as information are called the attributes. An attribute may be expressed as a number or as text. It may even be a scanned picture, a sound sequence, or a moving picture, as is now possible in some visual and multimedia databases.
Data processing normally concerns itself with a collection of similar entities and records information about the same attributes of each of them. In the traditional approach, a programmer usually maintains a record about each entity, and a data item in each record relates to each attribute. Similar records are grouped into files, and such a two-dimensional array is sometimes referred to as a flat file.

• Logical and Physical Data: One of the key features of the database approach is to bring about a distinction between the logical and the physical structures of the data. The term logical structure refers to the way the programmers see the data, and the physical structure refers to the way data are actually recorded on the storage medium. Even in the early days of records stored on tape, the length of the inter-record gap required that many logical records be grouped into one physical record to save storage space on tape. It was the software which separated them when they were used in an application program, and combined them again before writing back to tape. In today's systems the complexities are even greater, and, as will be seen when distributed databases are discussed, some records may physically be located at significantly remote places.

• Schema and Subschema: Having seen that the database approach focuses on the logical organization and decouples it from the physical representation of data, it is useful to have a term to describe the logical database description. A schema is a logical database description and is drawn as a chart of the types of data that are used. It gives the names of the entities and attributes, and specifies the relationships between them. It is a framework into which the values of the data items can be fitted. Like an information display system such as that giving arrival and departure times at airports and railway stations, the schema will remain the same though the values displayed in the system will change from time to time. The relationships specified between the different entities occurring in the schema may be one-to-one, one-to-many, many-to-many, or conditional.
The term schema is used to mean an overall chart of all the data item types and record types stored in a database. The term subschema refers to the same kind of view, but only for the data item types and record types which a particular user uses in a particular application. Therefore, many different subschemas can be derived from one schema.

• Data Dictionary: It holds detailed information about the different structures and data types: the details of the logical structure and how it is mapped onto the other structures, details of relationships between data items, details of all users' privileges and access rights, and details of resource performance.
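The data dictionary described in the last item above can be pictured as a small table of "data about data". The following is a minimal sketch only, not how any particular DBMS stores its dictionary: the `DictEntry` type, the `dict_lookup` function and the `emp_id` item are hypothetical names introduced for illustration, while `name` and `address` follow the Emp record used later in this unit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical kinds of data item recorded in the dictionary. */
typedef enum { ITEM_TEXT, ITEM_NUMBER } ItemType;

/* One dictionary entry: metadata describing a single data item. */
typedef struct {
    const char *record;   /* record type the item belongs to */
    const char *name;     /* name of the data item */
    ItemType    type;     /* kind of value stored */
    size_t      length;   /* storage length in bytes */
} DictEntry;

/* A tiny in-memory dictionary for an illustrative Emp record type.
   emp_id is an assumed extra field, not taken from the text. */
static const DictEntry dictionary[] = {
    { "Emp", "emp_id",  ITEM_NUMBER,   4 },
    { "Emp", "name",    ITEM_TEXT,    30 },
    { "Emp", "address", ITEM_TEXT,   100 },
};

static const size_t dictionary_size =
    sizeof dictionary / sizeof dictionary[0];

/* Return the entry describing record.name, or NULL if it is not defined. */
const DictEntry *dict_lookup(const char *record, const char *name)
{
    assert(record != NULL && name != NULL);
    for (size_t i = 0; i < dictionary_size; i++) {
        if (strcmp(dictionary[i].record, record) == 0 &&
            strcmp(dictionary[i].name, name) == 0) {
            return &dictionary[i];
        }
    }
    return NULL;
}
```

A component that needs to know how a data item is stored would call, for example, `dict_lookup("Emp", "address")` and read the type and length from the returned entry instead of hard-coding them.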
1.6 THREE VIEWS OF DATA

A DBMS is a collection of interrelated files and a set of programs that allow several users to access and modify these files. A major purpose of a database system is to provide users with an abstract view of the data. However, in order for the system to be usable, data must be retrieved efficiently.

The concern for efficiency leads to the design of complex data structures for the representation of data in the database. The database may be viewed at three levels of abstraction: the external (logical) view, the conceptual view, and the internal (physical) view.

• External view: This is the highest level of abstraction, as seen by a user. This level of abstraction describes only a part of the entire database.
• Conceptual view: This is the next level of abstraction, which is the sum total of the views of all the users of the database management system. At this level we consider what data are actually stored in the database. The conceptual level contains information about the entire database in terms of a small number of relatively simple structures.
• Internal level: The lowest level of abstraction, at which one describes how the data are physically stored.

The interrelationship of these three levels of abstraction is illustrated in figure 2.


Fig: The three views of data

To illustrate the distinction among different views of data, it can be compared with the concept of data types in programming languages. Most high-level programming languages such as C, VC++, etc. support the notion of a record or structure type. For example, in the C language we declare a structure (record) as follows:

    struct Emp {
        char name[30];
        char address[100];
    };

This defines a new record type called Emp with two fields. Each field has a name and a data type associated with it.

In an insurance organization, we may have several such record types, including among others:
• Customer, with fields name and salary
• Premium, with fields premium paid, amount due and due date
• Insurance agent, with fields name and salary plus commission

At the internal level, a customer, premium account, or employee (insurance agent) record can be described as a sequence of consecutive bytes. At the conceptual level each such record is described by a type definition, as illustrated above; the interrelations among these record types are also defined, together with the rights or privileges assigned to individual customers or end users. Finally, at the external level, we define several views of the database. For example, for preparing the insurance checks of customers, only the customer details are required; one does not need to access information about customer accounts. Similarly, tellers can access only account information. They cannot access information about the premium paid or amount received.

1.7 THE THREE LEVEL ARCHITECTURE OF DBMS

A database management system that provides these three levels of data is said to follow a three-level architecture, as shown in the figure. These three levels are the external level, the conceptual level, and the internal level.
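The three levels can be sketched in C, continuing the record analogy used above. This is a minimal illustration under stated assumptions: the field names of `struct Customer`, the `CustomerContactView` type and both functions are hypothetical and are not taken from any real DBMS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Conceptual level: the full record type, as described by the schema.
   The field names are assumptions made for this sketch. */
struct Customer {
    char   name[30];
    char   address[100];
    double premium_paid;
    double amount_due;
};

/* External level: a hypothetical view for users who need only contact
   details and must not see premium information. */
struct CustomerContactView {
    char name[30];
    char address[100];
};

/* External-to-conceptual mapping: derive the view from the full record. */
struct CustomerContactView contact_view(const struct Customer *c)
{
    struct CustomerContactView v;
    assert(c != NULL);
    memcpy(v.name, c->name, sizeof v.name);
    memcpy(v.address, c->address, sizeof v.address);
    return v;
}

/* Internal level: in storage the record is a run of consecutive bytes,
   whose count here is decided by the compiler's layout rules. */
size_t customer_storage_size(void)
{
    return sizeof(struct Customer);
}
```

A program written against `CustomerContactView` is unaffected if a new field is added to `struct Customer`; only the mapping function has to be revisited. That is the kind of insulation the external level is meant to provide.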

Fig: The three level architecture for a DBMS

A schema describes the view at each of these levels. A schema, as mentioned earlier, is an outline or a plan that describes the records and relationships existing in the view. The schema also describes the way in which entities at one level of abstraction can be mapped to the next level. The overall design of the database is called the database schema. A database schema includes such information as:
• Characteristics of data items such as entities and attributes
• Format for storage representation
• Integrity parameters such as physical authorization and backup policies
• Logical structure and relationships among those data items

The concept of a database schema corresponds to the programming language notion of a type definition. A variable of a given type has a particular value at a given instant in time. The concept of the value of a variable in programming languages corresponds to the concept of an instance of a database schema.

Since each view is defined by a schema, there exist several schemas in the database, and these schemas are partitioned following the three levels of data abstraction or views. At the lowest level we have the physical schema; at the intermediate level we have the conceptual schema; and at the highest level we have a subschema. In general, a database system supports one physical schema, one conceptual schema, and several subschemas.

1.7.1 External Level or Subschema

The external level is the highest level of database abstraction, where only those portions of the database of concern to a user or application program are included. Any number of user views may exist for a given global or conceptual view.

Each external view is described by means of a schema called an external schema or subschema. The external schema consists of the definition of the logical records and the relationships in the external view. The external schema also contains the method of deriving the objects in the external view from the objects in the conceptual view. The objects include entities, attributes, and relationships.

1.7.2 Conceptual Level or Conceptual Schema

One conceptual view represents the entire database. The conceptual schema defines this conceptual view. It describes all the records and relationships included in the conceptual view and, therefore, in the database. There is only one conceptual schema per database. This schema also contains the method of deriving the objects in the conceptual view from the objects in the internal view.

The description of data at this level is in a format independent of its physical representation. It also includes features that specify the checks to retain data consistency and integrity.

1.7.3 Internal Level or Physical Schema

The internal level indicates how the data will be stored and describes the data structures and access methods to be used by the database. The internal schema expresses the internal view; it contains the definition of the stored record, the method of representing the data fields, and the access aids used.

1.7.4 Mapping between different Levels

Two mappings are required in a database system with three different views. A mapping between the external and conceptual levels gives the correspondence among the records and the relationships of the external and conceptual levels.

a) EXTERNAL to CONCEPTUAL: Determines how the conceptual record is viewed by the user.
b) INTERNAL to CONCEPTUAL: Enables correspondence between the conceptual and internal levels. It represents how the conceptual record is represented in storage.

An internal record is a record at the internal level, not necessarily a stored record on a physical storage device. The internal record of figure 3 may be split up into two or more physical records. The physical database is the data that is stored on secondary storage devices. It is made up of records with certain data structures, organized in files. Consequently, there is an additional mapping from the internal record to one or more stored records on secondary storage devices.

1.8 DATABASE MANAGEMENT SYSTEM FACILITIES

The following facilities are supported by the DBMS; the first two are the main ones and are described below:
• The data definition facility or data definition language (DDL).
• The data manipulation facility or data manipulation language (DML).
• The data query facility or data query language (DQL).
• The data control facility or data control language (DCL).
• The transaction control facility or transaction control language (TCL).

1.8.1 Data Definition Language

Data Definition Language is a set of SQL commands used to create, modify and delete database structures (not data). These commands would not normally be used by a general user, who should be accessing the database via an application. They are normally used by the DBA (to a limited extent), a database designer or an application developer. In systems such as Oracle, these statements take effect immediately; they are not susceptible to ROLLBACK commands. You should also note that if you have executed several DML updates, then issuing any DDL command will COMMIT all the updates, as every DDL command implicitly issues a COMMIT command to the database. Anybody using DDL must have the CREATE object privilege and a tablespace area in which to create objects.

1.8.2 Data Manipulation Language

DML is a language that enables users to access or manipulate data as organized by the appropriate data model. Data manipulation involves retrieval of data from the database, insertion of new data into the database, and deletion or modification of existing data. The first of these data manipulation operations is called a query. A query is a statement in the DML that requests the retrieval of data from the database. The DML provides commands to select and retrieve data from the database. Commands are also provided to insert, update, and delete records.

There are basically two types of DML:
• Procedural: requires a user to specify what data is needed and how to get it.
• Nonprocedural: requires a user to specify what data is needed without specifying how to get it.

1.9 ELEMENTS OF A DATABASE MANAGEMENT SYSTEM

The major components of a DBMS are explained below:

1.9.1 DML Precompiler

It converts DML statements embedded in an application program to normal procedure calls in the host language. The precompiler must interact with the query processor in order to generate the appropriate code.

1.9.2 DDL Compiler

The DDL compiler converts the data definition statements into a set of tables. These tables contain information concerning the database and are in a form that can be used by other components of the DBMS.

1.9.3 File Manager

The file manager manages the allocation of space on disk storage and the data structures used to represent information stored on disk. The file manager can be implemented using an interface to the existing file subsystem provided by the operating system of the host computer, or it can include a file subsystem written especially for the DBMS.

1.9.4 Database Manager

Databases typically require a large amount of storage space. Corporate databases are usually measured in terms of gigabytes of data. Since the main memory of computers cannot store this information, it is stored on disks. Data is moved between disk storage and main memory as needed. Since the movement of data to and from disk is slow relative to the speed of the central processing unit of computers, it is imperative that the database system structure data so as to minimize the need to move data between disk and main memory. A database manager is a program module which provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system. It is responsible for interfacing with the file system.

One of the functions of the database manager is to convert users' queries, coming directly via the query processor or indirectly via an application program, from the user's logical view to the physical file system. In addition, the database manager also performs the tasks of enforcing constraints to maintain the consistency and integrity of the data as well as its security. Synchronizing the simultaneous operations performed by concurrent users is under the control of the database manager. It also performs backup and recovery operations.

1.9.5 Query Processor

The database user retrieves data by formulating a query in the data manipulation language provided with the database. The query processor is used to interpret the online user's query and convert it into an efficient series of operations in a form capable of being sent to the database manager for execution. The query processor uses the data dictionary to find the structure of the relevant portion of the database and uses this information in modifying the query and preparing an optimal plan to access the database.

1.9.6 Database Administrator

Data administration is a high-level function that is responsible for the overall management of data resources in an organization, including maintaining corporate-wide data definitions and standards. Database administration, by contrast, is a technical function that is responsible for physical database design and for dealing with technical issues such as security enforcement, database performance, backup, and recovery. The person having such control over the system is called the database administrator (DBA). The DBA administers the three levels of the database and, in consultation with the overall user community, sets up the definition of the global view or conceptual level of the database. The DBA further specifies the external views of the various users and applications and is responsible for the definition and implementation of the internal level, including the storage structure and access methods to be used for the optimum performance of the DBMS. Changes to any of the three levels necessitated by changes or growth in the organization and/or emerging technology are under the control of the DBA.

Mappings between the internal and the conceptual levels, as well as between the conceptual and external levels, are also defined by the DBA. Ensuring that appropriate measures are in place to maintain the integrity of the database, and that the database is not accessible to unauthorized users, is another responsibility. The DBA is responsible for granting permission to the users of the database and stores the profile of each user in the database. This profile describes the permissible activities of a user on that portion of the database accessible to the user via one or more user views. The user profile can be used by the database system to verify that a particular user can perform a given operation on the database.
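The user profile check described above can be sketched as a lookup of permitted operations per user view. This is only an illustration under assumptions: the `UserProfile` and `ViewGrant` types, the `OP_*` flags and the `is_permitted` function are hypothetical names, and a real DBMS keeps such profiles in its own catalog rather than in program structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Operations a user may be permitted to perform, as bit flags. */
enum {
    OP_READ   = 1 << 0,
    OP_INSERT = 1 << 1,
    OP_UPDATE = 1 << 2,
    OP_DELETE = 1 << 3
};

/* Permitted operations on one user view (subschema). */
typedef struct {
    const char *view;      /* name of the user view */
    unsigned    allowed;   /* OR of the permitted OP_* flags */
} ViewGrant;

/* Hypothetical user profile as stored by the DBA. */
typedef struct {
    const char      *user;
    const ViewGrant *grants;
    size_t           grant_count;
} UserProfile;

/* Return true only if every operation in op is granted on the view. */
bool is_permitted(const UserProfile *p, const char *view, unsigned op)
{
    assert(p != NULL && view != NULL);
    for (size_t i = 0; i < p->grant_count; i++) {
        if (strcmp(p->grants[i].view, view) == 0) {
            return (p->grants[i].allowed & op) == op;
        }
    }
    return false;   /* views absent from the profile are not accessible */
}
```

Denying access to any view that is not listed keeps the default on the safe side: a user gains an operation only when the DBA has granted it explicitly.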

The DBA is also responsible for defining procedures to recover the database from failures due to human, natural, or hardware causes with minimal loss of data. This recovery procedure should enable the organization to continue to function, and the intact portion of the database should continue to be available.

The functions of the DBA can be summarized as follows:

Schema definition: The creation of the original database schema. This is accomplished by writing a set of definitions, which are translated by the DDL compiler to a set of tables that are permanently stored in the data dictionary.

Storage structure and access method definition: The creation of appropriate storage structures and access methods. This is accomplished by writing a set of definitions, which are translated by the data storage and definition language compiler.

Schema and physical organization modification: Either the modification of the database schema or of the description of the physical storage organization. These changes, although relatively rare, are accomplished by writing a set of definitions which is used by either the DDL compiler or the data storage and definition language compiler to generate modifications to the appropriate internal system tables (for example, the data dictionary).

1.9.7 Data Dictionary

Keeping track of all the variable names that are used, and the purpose for which they were used, becomes more and more difficult. Of course it is possible for a programmer who has coined the variable names to bear them in mind, but should the same author come back to his program after a significant time, or should another programmer have to modify the program, it would be extremely difficult to give a reliable account of the purpose for which the data files were used.

The problem becomes even more difficult when the number of data types that an organization has in its database increases. It is also now recognized that the data of an organization is a valuable corporate resource, and therefore some kind of inventory and catalogue of it must be maintained so as to assist in both the utilization and management of the resource.

It is for this purpose that a data dictionary or dictionary/directory is emerging as a major tool. A dictionary provides definitions of things. A directory tells you where to find them. A data dictionary/directory contains information (or data) about the data. A comprehensive data dictionary would provide the definitions of data items, how they fit into the data structure, and how they relate to other entities in the database.

The DBA uses the data dictionary in every phase of a database life cycle, starting from the embryonic data-gathering phase to the design, implementation, and maintenance phases. Documentation provided by a data dictionary is as valuable to end users and managers as it is essential to the programmers. Users can plan their applications with the database only if they know exactly what is stored in it. For example, the description of a data item in a data dictionary may include its origin and other text description in plain English, in addition to its data format. Thus users and managers will be able to see exactly what is available in the database. You could consider a data dictionary to be a road map which guides users to access information within a large database.

Fig: DBMS Structure

A data dictionary is implemented as a database so that users can query its content by either interactive or batch processing. Whether or not the cost of acquiring a data dictionary system is justifiable depends on the size and complexity of the information system. The cost effectiveness of a data dictionary increases as the complexity of an information system increases. A data dictionary can be a great asset not only to the DBA for database design, implementation, and maintenance, but also to managers or end users in their project planning. Figure 4 shows these components and the connections among them.

1.10 ADVANTAGES AND DISADVANTAGES OF DBMS

One of the main advantages of using a database system is that the organization can exert, via the DBA, centralized management and control over the data. The database administrator is the focus of the centralized control. Any application requiring a change in the structure of a data record requires an arrangement with the DBA, who makes the necessary modifications. Such modifications do not affect other applications or users of the record in question. Therefore, these changes meet another requirement of the DBMS: data independence. The following are the important advantages of DBMS:

Advantages

Reduction of Redundancies: Centralized control of data by the DBA avoids unnecessary duplication of data and effectively reduces the total amount of data storage required. It also eliminates the extra processing necessary to trace the required data in a large mass of data. Another advantage of avoiding duplication is the elimination of the inconsistencies that tend to be present in redundant data files. Any redundancies that exist in the DBMS are controlled, and the system ensures that these multiple copies are consistent.

Sharing Data: A database allows the sharing of data under its control by any number of application programs or users.

Data Integrity: Centralized control can also ensure that adequate checks are incorporated in the DBMS to provide data integrity. Data integrity means that the data contained in the database is both accurate and consistent. Therefore, data values being entered for storage could be checked to ensure that they fall within a specified range and are of the correct format. For example, the value for the age of an employee may be restricted to the range 16 to 75. Another integrity check that should be incorporated in the database is to ensure that if there is a reference to a certain object, that object must exist. In the case of an automatic teller machine, for example, a user is not allowed to transfer funds from a nonexistent savings account to a checking account.
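Both integrity checks mentioned above, the range check on age and the existence check before a transfer, can be sketched in a few lines of C. This is a minimal illustration, not how a DBMS enforces constraints internally; the `Account` type and the function names are hypothetical, and the sufficient-funds test is an extra assumption added to keep the example sensible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MIN_EMPLOYEE_AGE 16
#define MAX_EMPLOYEE_AGE 75

/* Range check: accept an age only inside the permitted interval. */
bool age_is_valid(int age)
{
    return age >= MIN_EMPLOYEE_AGE && age <= MAX_EMPLOYEE_AGE;
}

/* Hypothetical account record used for the existence check. */
typedef struct {
    int    id;
    double balance;
} Account;

/* Return the account with the given id, or NULL if it does not exist. */
Account *find_account(Account *accounts, size_t count, int id)
{
    assert(accounts != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (accounts[i].id == id) {
            return &accounts[i];
        }
    }
    return NULL;
}

/* Existence check: both accounts must exist before any funds move.
   Returns false and changes nothing if a check fails. */
bool transfer(Account *accounts, size_t count,
              int from_id, int to_id, double amount)
{
    Account *from = find_account(accounts, count, from_id);
    Account *to   = find_account(accounts, count, to_id);

    if (from == NULL || to == NULL) {
        return false;               /* reference to a nonexistent object */
    }
    if (amount <= 0.0 || from->balance < amount) {
        return false;               /* assumed extra rule: valid amount */
    }
    from->balance -= amount;
    to->balance   += amount;
    return true;
}
```

Because all checks run before either balance is touched, a rejected transfer leaves the stored data exactly as it was.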

Data Security: Data is of vital importance to an organization and may be confidential. Unauthorized persons must not access such confidential data. The DBA, who has the ultimate responsibility for the data in the DBMS, can ensure that proper access procedures are followed, including proper authentication schemes for access to the DBMS and additional checks before permitting access to sensitive data. Different levels of security could be implemented for various types of data and operations.

Conflict Resolution: The DBA chooses the best file structure and access method to get optimal performance for the response-critical applications, while permitting less critical applications to continue to use the database, albeit with a relatively slower response.

Data Independence: Data independence is usually considered from two points of view: physical data independence and logical data independence. Physical data independence allows changes in the physical storage devices or organization of the files to be made without requiring changes in the conceptual view or any of the external views, and hence in the application programs using the database. Logical data independence allows changes in the conceptual view to be made without requiring changes in the external views or the application programs.

Disadvantages

A significant disadvantage of a DBMS is cost. In addition to the cost of purchasing or developing the software, the hardware has to be upgraded to allow for the extensive programs and the workspaces required for their execution and storage. The processing overhead introduced by the DBMS to implement security, integrity, and sharing of the data causes a degradation of the response and throughput times. An additional cost is that of migration from a traditionally separate application environment to an integrated one.

While centralization reduces duplication, the lack of duplication requires that the database be adequately backed up so that in the case of failure the data can be recovered. Backup and recovery operations are fairly complex in a DBMS environment, and this is exacerbated in a concurrent multi-user database system. Furthermore, a database system requires a certain amount of controlled redundancy and duplication to enable access to related data items.

Centralization also means that the data is accessible from a single source, namely the database. This increases the potential severity of security breaches and of disruption to the operation of the organization because of downtimes and failures. The replacement of a monolithic centralized database by a federation of independent and cooperating distributed databases resolves some of the problems resulting from failures and downtimes.

1.11 Self Test

1) Define the term database.
2) Explain the levels of a database with the help of a suitable example.
3) Explain the components of a database system.
4) Explain the elements of a DBMS.

1.12 Summary

• A database is a collection of related information stored in such a manner that it is available to many users for different purposes. The content of a database is obtained by combining data from all the different sources in an organization, so that data are available to all users and redundant data can be eliminated or at least minimized. The DBMS helps create an environment in which end users have better access to more and better managed data than they did before the DBMS became the data management standard.
• The traditional file-oriented approach to information processing has, for each application, a separate master file and its own set of personal files. An organization also needs a flow of information across these applications, and this requires sharing of data, which is significantly lacking in the traditional approach.
• A database management processing system is complex, and it could lead to a more inefficient system than the equivalent file-based one. The use of the database and the possibility of its being shared will, therefore, affect many departments within the organization. If the integrity of the data is not maintained, it is possible that one relevant piece of data could have been used by many programs in different applications by different users without their being aware of it.
The impact of this may therefore be very widespread. Since data can be input from a variety of sources, control over the quality of data becomes very difficult to implement.
• Data-items: The term data item is the word for what has traditionally been called the field in data processing, and is the smallest unit of data that has meaning to its users.
• Entities and Attributes: An item in the real world may be a tangible object such as an employee, a component in an inventory, or a space, or it may be intangible, such as an event, a job description, an identification number, or an abstract construct. All such items about which relevant information is stored in the database are called entities.
• Logical and Physical Data: One of the key features of the database approach is to bring about a distinction between the logical and the physical structures of the data.
• A schema is a logical database description and is drawn as a chart of the types of data that are used. It gives the names of the entities and attributes, and specifies the relationships between them. It is a framework into which the values of the data items can be fitted. Like an information display system, such as one giving arrival and departure times at airports and railway stations, the schema will remain the same though the values displayed in the system will change from time to time.


UNIT 2

DATABASE MODELS

2.1 Introduction
2.2 Objectives
2.3 File management system
2.4 Entity-relationship (E-R) diagram
2.4.1 Introduction of ERD
2.4.2 Entity-relationship diagram
2.4.3 Generalization and aggregation
2.4.3.1 Aggregation
2.5 The hierarchical model
2.6 The network model
2.7 The relational model
2.8 Advantages and disadvantages of relational approach
2.9 An example of a relational model
2.10 Self test
2.11 Summary

2.1 INTRODUCTION

Data modeling is the analysis of data objects that are used in a business or other context and the identification of the relationships among these data objects. Data modeling is a first step in doing object-oriented programming. As a result of data modeling, you can then define the classes that provide the templates for program objects.

A simple approach to creating a data model that allows you to visualize the model is to draw a square (or any other symbol) to represent each individual data item that you know about (for example, a product or a product price) and then to express relationships between each of these data items with words such as "is part of" or "is used by" or "uses" and so forth. From such a total description, you can create a set of classes and subclasses that define all the general relationships. These then become the templates for objects that, when executed as a program, handle the variables of new transactions and other activities in a way that effectively represents the real world.

2.2 OBJECTIVES

After going through this unit, you should be able to:
• Identify the structures in the different models of DBMS.
• Convert any given database situation to a hierarchical or relational model.
• Discuss the entity-relationship model.
• State the essential features of the relational model; and
• Discuss implementation issues of all the three models.

2.3 FILE MANAGEMENT SYSTEM

In the early days of data processing, all files were flat files. A flat file is one where each record contains the same types of data items. One or more of these data items are designated as the key and are used for sequencing the file and for locating and grouping records by sorting and indexing. All these types of structures can be classed as either trees or clause structures. However, it may be borne in mind that all these complicated file structures can be broken down into groups of flat files with redundant data items.

An FMS consists of a number of application programs. Because the productivity enhancement in using an FMS compared to a conventional high-level language is about 10 to 1, programmers use it. But the ease of use of an FMS also encourages end users with no previous programming experience to perform queries with a special FMS language. One of the better known in this regard is RPG (Report Program Generator), which was very popular for generating routine business reports. In order to use RPG, the user would define the input fields required by filling out an input specification form. Similarly, output formats can be specified by filling out an output specification form. The possibility of giving a certain structure to the output and the availability of default options made the package relatively easy to learn and use. Some well-known examples of such FMS packages are Mark IV, Datatrieve, Easytrieve, and Powerhouse. The structure of an FMS is diagrammatically illustrated below:

Fig: File management system

The FMS relies upon the basic access methods of the host operating system for data management. But it may have its own special language to be used in performing retrievals. This language is in some ways more powerful than the standard high-level programming languages in the way it defines data and develops applications. Therefore, the file management system may be considered to be a level higher than mere high-level languages.

Some of the advantages of an FMS compared to a standard high-level language are therefore:

• Less software development cost: Even for experienced programmers, it takes months or years to develop a good software system in a high-level language.
• Support of an efficient query facility: Online queries for multiple-key retrievals are tedious to program.

One should, of course, bear in mind the limitations of an FMS: it cannot handle complicated mathematical operations and array manipulations. In order to remedy the situation, some FMS packages provide an interface to call other programs written in a high-level language or an assembly language.

Another limitation of an FMS is that, for data management and access, it is restricted to basic access methods. The physical and logical links required between different files to be able to cope with complex multiple-key queries on multiple files are not possible. Even though an FMS is a simple, powerful tool, it cannot replace the high-level language, nor can it perform complex information retrieval like a DBMS. It is in this context that reliance on a good database management system becomes essential.

2.4 ENTITY-RELATIONSHIP (E-R) DIAGRAM

2.4.1 Introduction of ERD

The entity-relationship model is a tool for analyzing the semantic features of an application that are independent of events. Entity-relationship modeling helps reduce data redundancy. This approach includes a graphical notation which depicts entity classes as rectangles, relationships as diamonds, and attributes as circles or ovals. For complex situations, a partial entity-relationship diagram may be used to present a summary of the entities and relationships without including the details of the attributes.

The entity-relationship diagram provides a convenient method for visualizing the interrelationships among entities in a given application. This tool has proven useful in making the transition from an informal application description to a formal database schema. The entity-relationship model is used for describing the conceptual scheme of an enterprise without attention to the efficiency of the physical database design. The entity-relationship diagrams are later turned into a conceptual schema in one of the other models in which the database is actually implemented.

Following are short definitions of some of the basic terms that are used for describing important entity-relationship concepts:
• Entity: An entity is a thing that exists and is distinguishable: an object, something in the environment.
o Entity Instance: An instance is a particular occurrence of an entity. For example, each person is an instance of an entity, each car is an instance of an entity, etc.
o Entity Set: A group of similar entities is called an entity set, entity class, or entity type. An entity class has common properties.
• Attributes: Attributes describe properties of entities and relationships.
o Simple (Scalars): the smallest semantic unit of data, atomic (no internal structure), singular, e.g. city
o Composite: a group of attributes, e.g. address (street, city, state, zip)

o Multivalued (list): multiple values, e.g. degrees, courses, skills (not allowed in first normal form)
o Domain: the conceptual definition of attributes
▪ a named set of scalar values all of the same type
▪ a pool of possible values
• Relationships: A relationship is a connection between entity classes. For example, a relationship between PERSONS and AUTOMOBILES could be an "OWNS" relationship. That is to say, automobiles are owned by people.
o The degree of a relationship indicates the number of entities involved.
o The cardinality of a relationship indicates the number of instances in entity class E1 that can or must be associated with instances in entity class E2.
▪ One-One Relationship: For each entity in one class there is at most one associated entity in the other class. For example, for each husband there is at most one current legal wife (in this country at least). A wife has at most one current legal husband.

A One-to-One Association

Student ID Number    Student Name
524779870            Tom Jones
543874309            Suzy Smart
553673965            Joe Blow
...                  ...

STUDENT ID NUMBER (one-to-one) STUDENT

▪ Many-One Relationships: One entity in class E2 is associated with zero or more entities in class E1, but each entity in E1 is associated with at most one entity in E2. For example, a woman may have many children but a child has only one birth mother.

In a one-to-many association, each member of domain B is assigned to a unique element of domain A, but each element of domain A may be assigned to several elements from domain B. For instance, each of the students may have a single major (accounting, finance, etc.), though each major may contain several students. This is a "one-to-many" association between majors and students, or a "many-to-one" association between students and majors. A one-to-many association differs from a many-to-one association only in the order of the involved domains. Figure C9 shows a many-to-one association, which may be written as:

Student ID Number    Major
524779870            Accounting
543874309            Accounting
553673965            Finance
...                  ...
MAJORS (one-to-many) STUDENTS

Finally, many-to-many associations are those in which neither participant is assigned to a unique partner. The relationship between students and classes is an example. Each student may take many different classes. Similarly, each class may be taken by many students. There is no limitation on either participant. This association is written:

STUDENTS (many-to-many) CLASSES

▪ Many-Many Relationships: There are no restrictions on how many entities in either class are associated with a single entity in the other. An example of a many-to-many relationship would be students taking classes. Each student takes many classes. Each class has many students.
▪ Can be mandatory, optional or fixed.
o Isa Hierarchies: A special type of relationship that allows attribute inheritance. For example, to say that a truck is an automobile and an automobile has a make, model and serial number implies that a truck also has a make, model and serial number.
• Keys: The key uniquely differentiates one entity from all others in the entity class. A key is an identifier.
o Primary Key: Identifier used to uniquely identify one particular instance of an entity.
▪ Can be one or more attributes (consider substituting a single concatenated key attribute for a multiple-attribute key).
▪ Must be unique within the domain (not just the current data set).
▪ Value should not change over time.
▪ Must always have a value.
▪ Created when no obvious attribute exists. Each instance is assigned a value.
o Candidate Key: When multiple possible identifiers exist, each is a candidate key.
o Concatenated Key: A key made up of parts which, when combined, become a unique identifier. Multiple-attribute keys are concatenated keys.
o Borrowed Key Attributes: If an isa relationship exists, the key to the more general class is also a key to the subclass of items. For example, if serial number is a key for automobiles, it would also be a key for trucks.
o Foreign Keys: Foreign keys reference a related table through the primary key of that related table.
▪ Referential Integrity Constraint: For every value of a foreign key there is a primary key with that value in the referenced table, e.g. if account number is to be used in a table for journal entries, then that account number must exist in the chart of accounts table.
• Events: Events change entities or relationships.
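As a small illustration of the cardinality definitions above, the following Python sketch classifies a binary relationship by inspecting its instance pairs. The function name classify_cardinality and the class codes in the last data set are hypothetical; the student numbers, names, and majors are taken from the two association tables shown earlier. Note that the function only describes the instances it is given, whereas a cardinality in an E-R model is a rule that every possible instance must obey.

```python
from collections import defaultdict


def classify_cardinality(pairs):
    """Classify a binary relationship from its (left, right) instance pairs."""
    rights_of = defaultdict(set)  # left entity  -> associated right entities
    lefts_of = defaultdict(set)   # right entity -> associated left entities
    for left, right in pairs:
        rights_of[left].add(right)
        lefts_of[right].add(left)
    left_unique = all(len(r) == 1 for r in rights_of.values())
    right_unique = all(len(l) == 1 for l in lefts_of.values())
    if left_unique and right_unique:
        return "one-to-one"
    if left_unique:
        return "many-to-one"   # many lefts may share one right
    if right_unique:
        return "one-to-many"   # one left may have many rights
    return "many-to-many"


student_names = [
    ("524779870", "Tom Jones"),
    ("543874309", "Suzy Smart"),
    ("553673965", "Joe Blow"),
]
student_majors = [
    ("524779870", "Accounting"),
    ("543874309", "Accounting"),
    ("553673965", "Finance"),
]
# Hypothetical enrollments; the class codes are invented for this example.
enrollments = [
    ("524779870", "CS101"),
    ("524779870", "MA201"),
    ("543874309", "CS101"),
]

print(classify_cardinality(student_names))
print(classify_cardinality(student_majors))
print(classify_cardinality(enrollments))
```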
2.4.2 Entity-Relationship Diagram

Symbols used in entity-relationship diagrams include:
• Rectangles represent ENTITY CLASSES.
• Circles represent ATTRIBUTES.
• Diamonds represent RELATIONSHIPS.
• Arcs: Arcs connect entities to relationships. Arcs are also used to connect attributes to entities. Some styles of entity-relationship diagrams use arrows and double arrows to indicate the one and the many in relationships. Some use forks, etc.
• Underline: Key attributes of entities are underlined.

Fragment of an entity relationship diagram.

Entity Relationship Diagram for a simple Accounts Receivable database

2.4.3 Generalization and aggregation

There are two main abstraction mechanisms used to model information: generalization and aggregation. Generalization is the abstracting process of viewing a set of objects as a single general class by concentrating on the general characteristics of the constituent sets while suppressing or ignoring their differences. It is the union of a number of lower-level entity types for the purpose of producing a higher-level entity type. For instance, student is a generalization of graduate or undergraduate, full-time or part-time students. Similarly, employee is a generalization of the classes of objects cook, waiter, cashier, etc. Generalization is an IS-A relationship; therefore, a manager IS AN employee, a cook IS AN employee, a waiter IS AN employee, and so forth.

Specialization is the abstracting process of introducing new characteristics to an existing class of objects to create one or more new classes of objects. This involves taking a higher-level entity and, using additional characteristics, generating lower-level entities. The lower-level entities also inherit the characteristics of the higher-level entity. In applying the characteristic size to car, we can create a full-size, mid-size, compact, or subcompact car. Specialization may be seen as the reverse process of generalization: additional specific properties are introduced at a lower level in a hierarchy of objects. Both processes are illustrated in the following Figure 9, wherein the lower levels of the hierarchy are disjoint.

The entity set EMPLOYEE is a generalization of the entity sets FULL_TIME_EMPLOYEE and PART_TIME_EMPLOYEE. The former is a generalization of the entity sets FACULTY and STAFF; the latter, that of the entity sets TEACHING and CASUAL. FACULTY and STAFF inherit the attribute Salary of the entity set FULL_TIME_EMPLOYEE, and the latter, in turn, inherits the attributes of EMPLOYEE. FULL_TIME_EMPLOYEE is a specialization of the entity set EMPLOYEE and is differentiated by the additional attribute Salary. Similarly, PART_TIME_EMPLOYEE is a specialization differentiated by the presence of the attribute Type.

Fig: Generalization and Specialization
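Since data modeling was introduced in this unit as a first step toward defining classes, the attribute inheritance described above can be sketched with Python dataclasses. This is an illustration only: the class and field names are adapted from the entity sets in the text (with Type renamed employment_type), the Degree and Interest attributes of FACULTY follow the FACULTY relation given later in this section, and the sample values are invented.

```python
from dataclasses import dataclass, fields


@dataclass
class Employee:
    """Higher-level (generalized) entity type."""
    empl_no: int
    name: str


@dataclass
class FullTimeEmployee(Employee):
    """Specialization of Employee, differentiated by Salary."""
    salary: float


@dataclass
class PartTimeEmployee(Employee):
    """Specialization of Employee, differentiated by Type."""
    employment_type: str


@dataclass
class Faculty(FullTimeEmployee):
    """Lowest-level entity type; inherits from both levels above it."""
    degree: str
    interest: str


# Hypothetical sample values.
faculty = Faculty(empl_no=1, name="A. Rao", salary=50000.0,
                  degree="PhD", interest="Databases")
faculty_attributes = [f.name for f in fields(Faculty)]
print(faculty_attributes)
print(isinstance(faculty, Employee))
```

The list of fields for Faculty contains the inherited attributes first, followed by its own, which mirrors the statement that lower-level entities inherit the characteristics of the higher-level entity.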

In designing a database to model a segment of the real world, the data modeling scheme must be able to represent generalization. It allows the model to represent generic entities, to treat a class of objects, and to specify relationships in which the generic objects participate.

Generalization forms a hierarchy of entities and can be represented by a hierarchy of tables, which can also be shown through the following relations for convenience:

EMPLOYEE (Empl_no, Name, Date_of_birth)
FULL_TIME (Empl_no, Salary)
PART_TIME (Empl_no, Type)
FACULTY (Empl_no, Degree, Interest)
STAFF (Empl_no, Hour_rate)
TEACHING (Empl_no, Stipend)

Here the primary key of each entity corresponds to entries in different tables and directs one to the appropriate row of the related tables.

Another method of representing a generalization hierarchy is to have the lowest-level entities inherit the attributes of the entities of higher levels. The top- and intermediate-level entities are not included, as only those of the lowest level are represented in tabular form. For instance, the attributes of the entity set FACULTY would be (Empl_no, Name, Date_of_Hire, Salary, Degree, Interest). A separate table would be required for each lowest-level entity in the hierarchy. The number of different tables required to represent these entities would be equal to the number of entities at the lowest level of the generalization hierarchy.

2.4.3.1 Aggregation

Aggregation is the process of compiling information on an object, thereby abstracting a higher-level object. In this manner, aggregating the characteristics name, address, and Social Security number derives the entity person. Another form of aggregation is abstracting a relationship between objects and viewing the relationship as an object. As such, the ENROLLMENT relationship between the entities student and course could be viewed as the entity REGISTRATION. Examples of aggregation are shown in the figure.

Consider the relationship COMPUTING of the figure. Here we have a relationship among the entities STUDENT, COURSE, and COMPUTING SYSTEM.


Fig: Examples of aggregation

Fig: A relationship among three entities

A student registered in a given course uses one of several computing systems to complete assignments and projects. The relationship between the entities STUDENT and COURSE could be the aggregated entity REGISTRATION, as discussed above. In this case, we could view the ternary relationship of the figure as one between REGISTRATION and the entity COMPUTING SYSTEM. Another method of aggregating is to consider a relationship consisting of the entity COMPUTING SYSTEM being assigned to COURSES. This relationship can be aggregated as a new entity and a relationship established between it and STUDENT. Note that the difference between a relationship involving an aggregation and one with the three entities lies in the number of relationships. In the former case we have two relationships; in the latter only one exists. The approach to be taken depends on what we want to express. We would use the ternary relationship related to a COMPUTING SYSTEM.

2.5 THE HIERARCHICAL MODEL

A DBMS belonging to the hierarchical data model uses tree structures to represent relationships among records. Tree structures occur naturally in many data organizations because some entities have an intrinsic hierarchical order. For example, an institute has a number of programs to offer. Each program has a number of courses. Each course has a number of students registered in it. As the following figure depicts, the four entity types Institute, Program, Course and Student make up the four levels of the hierarchical structure. Figure 12 shows an example of a database occurrence for an institute. A database is a collection of database occurrences.

Fig: A simple Hierarchy

A hierarchical database therefore consists of a collection of records, which are connected with each other through links. Each record is a collection of fields (attributes), each of which contains one data value. A link is an association between precisely two records.

A tree structure diagram serves the same purpose as an entity-relationship diagram; namely, it specifies the overall logical structure of the database.

The following figure shows a typical database occurrence of a hierarchical structure (tree structure).

Fig: Database occurrence of a hierarchical structure

The hierarchical data model has the following features:
• Each hierarchical tree can have only one root record type, and this record type does not have a parent record type.
• The root can have any number of child record types, each of which can itself be the root of a hierarchical subtree.
• Each child record type can have only one parent record type; thus an M:N relationship cannot be directly expressed between two record types.
• Data in a parent record applies to all its child records.
• A child record occurrence must have a parent record occurrence; deleting a parent record occurrence requires deleting all of its child record occurrences.
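The features listed above can be sketched as a small tree of record occurrences in Python. The Record class and the helper functions are hypothetical names introduced for this illustration, and the institute, program, course, and student values are invented sample data following the Institute, Program, Course, Student hierarchy described in this section.

```python
class Record:
    """A record occurrence with exactly one parent (None only for the root)."""

    def __init__(self, record_type, value, parent=None):
        self.record_type = record_type
        self.value = value
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def count_subtree(record):
    """Count a record occurrence together with all of its descendants."""
    return 1 + sum(count_subtree(child) for child in record.children)


def delete(record):
    """Delete a record occurrence; its whole subtree goes with it."""
    removed = count_subtree(record)
    if record.parent is not None:
        record.parent.children.remove(record)
        record.parent = None
    return removed


# Hypothetical database occurrence.
institute = Record("Institute", "Institute A")
program = Record("Program", "Program P1", institute)
course1 = Record("Course", "DBMS", program)
course2 = Record("Course", "Networks", program)
Record("Student", "Tom Jones", course1)
Record("Student", "Suzy Smart", course1)
Record("Student", "Joe Blow", course2)

total_before = count_subtree(institute)
removed = delete(course1)  # removes the course and its two student records
total_after = count_subtree(institute)
print(total_before, removed, total_after)
```

Because each record holds a single parent reference, a student registered in two courses would have to be stored twice in this structure, which is the practical consequence of the M:N restriction noted in the list.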

2.6 THE NETWORK MODEL

The Database Task Group of the Conference on Data Systems Languages (DBTG/CODASYL) formalized the network data model in the late 1960s. Their first report, which has been revised a number of times, contained detailed specifications for the network data model (a model conforming to these specifications is also known as the DBTG data model). The specifications contained in the report and its subsequent revisions have been subjected to much debate and criticism. Many of the current database applications have been built on commercial DBMS systems using the DBTG model.

DBTG Set: The DBTG model uses two different data structures to represent the database entities and the relationships between the entities, namely the record type and the set type. A record type is used to represent an entity type. It is made up of a number of data items that represent the attributes of the entity.

A set is used to represent a directed relationship between two record types, the so-called owner record type and the member record type. The set type, like the record type, is named, and specifies that there is a one-to-many relationship (1:M) between the owner and member record types. The set type can have more than one record type as its member, but only one record type is allowed to be the owner in a given set type. A database could have one or more occurrences of each of its record and set types. An occurrence of a set type consists of an occurrence of the owner record type and any number of occurrences of each of its member record types. A record occurrence cannot be a member of two distinct occurrences of the same set type.

Bachman introduced a graphical means called a data structure diagram to denote the logical relationship implied by the set. Here a labeled rectangle represents the corresponding entity or record type. An arrow that connects two labeled rectangles represents a set type. The arrow direction is from the owner record type to the member record type. The figure shows two record types (DEPARTMENT and EMPLOYEE) and the set type DEPITEMP, with DEPARTMENT as the owner record type and EMPLOYEE as the member record type.
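A set type with its owner and member occurrences can be sketched in Python as follows. This is a rough illustration and not the DBTG data definition language: the SetType class, the set name DEPT_EMP, and the department and employee keys are all hypothetical. The sketch enforces the rule that a member occurrence belongs to at most one occurrence of a given set type.

```python
class SetType:
    """A named set type linking one owner record type to one member record type."""

    def __init__(self, name, owner_type, member_type):
        self.name = name
        self.owner_type = owner_type
        self.member_type = member_type
        self.occurrences = {}  # owner key  -> list of member keys
        self.owner_of = {}     # member key -> owner key

    def connect(self, owner_key, member_key):
        """Add a member occurrence to the set occurrence owned by owner_key."""
        if member_key in self.owner_of:
            raise ValueError(
                f"{member_key} already belongs to an occurrence of {self.name}")
        self.occurrences.setdefault(owner_key, []).append(member_key)
        self.owner_of[member_key] = owner_key

    def members(self, owner_key):
        """Return the member occurrences owned by owner_key."""
        return list(self.occurrences.get(owner_key, []))


# Hypothetical set type and occurrences.
dept_emp = SetType("DEPT_EMP", "DEPARTMENT", "EMPLOYEE")
dept_emp.connect("Sales", "E1")
dept_emp.connect("Sales", "E2")
dept_emp.connect("Accounts", "E3")

try:
    dept_emp.connect("Accounts", "E1")  # E1 is already owned by Sales
    rejected = False
except ValueError:
    rejected = True

print(dept_emp.members("Sales"), rejected)
```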

Fig: A DBTG set

The data structure diagrams have been extended to include field names in the record type rectangle, and the arrow is used to clearly identify the data fields involved in the set association. A one-to-many (1:M) relationship is shown by a set arrow that starts from the owner field in the owner record type. The arrow points to the member field within the member record type. The fields that support the relationship are clearly identified.

A logical record type with the same name represents each entity type in an E-R diagram. Data fields of the record represent the attributes of the entity. We use the term logical record to indicate that the actual implementation may be quite different.

The conversion of the E-R diagram into a network database consists of converting each 1:M binary relationship into a set (a 1:1 binary relationship being a special case of a 1:M relationship). If there is a 1:M binary relationship R1 from entity type E1 to entity type E2, then the binary relationship is represented by a set. An instance of this set would be S1, with an instance of the record type corresponding to entity E1 as the owner and one or more instances of the record type corresponding to entity E2 as the members. If a relationship has attributes, then unless the attributes can be assigned to the member record type, they have to be maintained in a separate logical record type created for this purpose. The introduction of this additional record type requires that the original set be converted into two symmetrical sets, with the record corresponding to the attributes of the relationship as the member in both sets and the records corresponding to the entities as the owners.

Fig: Conversion of an M:N relationship into two 1:M DBTG sets

2.7 THE RELATIONAL MODEL

Using a technique called normalization, the entanglement observed in the tree and network structures can be replaced by a relatively neater structure. Codd's principles relate to the logical description of the data, and it is important to bear in mind that this is quite independent of the way in which the data is stored. It is only some years back that these concepts emerged from the research, development, test and trial stages and came to be seen as commercial products. The attractiveness of the relational approach arises from the simplicity of the data organization and the availability of reasonably simple to very powerful query languages. The essence of the relational database approach is that all the data is expressed in terms of tables and nothing but tables. Therefore, all entities and attributes have to be expressed in rows and columns.

The differences that arise in the relational approach are in setting up relationships between different tables. This actually makes use of certain mathematical operations on the relations, such as projection, union, joins, etc. These operations from relational algebra and relational calculus are discussed in more detail in the second block of this course. Similarly, in order to achieve the organization of the data in terms of tables in a satisfactory manner, a technique called normalization is used.
A unit in the second block of this course describes in detail the process of normalization and its various stages, including the first, second, and third normal forms. At the moment it is sufficient to say that normalization is a technique which helps in determining the most appropriate grouping of data items into records, segments or tuples. This is necessary because in the relational model the data items are arranged in tables, which indicate the structure, relationship, and integrity in the following manner:
1. In any given column of a table, all items are of the same kind.
2. Each item is a simple number or a character string.
3. All rows of a table are distinct.
4. Ordering of rows within a table is immaterial.
5. The columns of a table are assigned distinct names, and the ordering of these columns is immaterial.
6. If a table has N columns, it is said to be of degree N (sometimes also referred to as its arity). The number of rows in a table is its cardinality.

From a few base tables it is possible, by setting up relations, to create views which provide the necessary information to the different users of the same database.
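The table properties listed above can be sketched as a small Python class that checks them on construction. The Relation class is a hypothetical helper written for this illustration, and the sample rows are invented. The sketch follows the usual convention in which degree counts columns and cardinality counts rows.

```python
class Relation:
    """A table whose rows are distinct, unordered tuples of simple values."""

    def __init__(self, columns, rows):
        if len(set(columns)) != len(columns):
            raise ValueError("column names must be distinct")
        self.columns = tuple(columns)
        self.rows = set()  # a set keeps rows distinct and unordered
        for row in rows:
            if len(row) != len(self.columns):
                raise ValueError("each row needs one value per column")
            if not all(isinstance(value, (int, float, str)) for value in row):
                raise ValueError("each item must be a simple number or string")
            self.rows.add(tuple(row))

    @property
    def degree(self):
        """Number of columns."""
        return len(self.columns)

    @property
    def cardinality(self):
        """Number of distinct rows."""
        return len(self.rows)


# Hypothetical sample rows; the third row repeats the first and is absorbed.
product = Relation(
    ["product_no", "description", "price", "quantity_on_hand"],
    [
        (1, "Bolt", 0.5, 100),
        (2, "Nut", 0.25, 250),
        (1, "Bolt", 0.5, 100),
    ],
)
print(product.degree, product.cardinality)
```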

2.8 Advantages and Disadvantages of Relational Approach

Advantages of Relational approach

The popularity of the relational database approach is due not only to the availability of a large variety of products but also to certain inherent advantages:
1. Ease of use: The representation of information as tables consisting of rows and columns is quite natural, and therefore even first-time users find it attractive.
2. Flexibility: Different tables from which information has to be linked and extracted can be easily manipulated by operators such as project and join to give information in the form in which it is desired.
3. Precision: The usage of relational algebra and relational calculus in the manipulation of the relations between the tables ensures that there is no ambiguity, which may otherwise arise in establishing the linkages in a complicated network-type database.
4. Security: Security control and authorization can also be implemented more easily by moving sensitive attributes in a given table into a separate relation with its own authorization controls. If authorization requirements permit, a particular attribute could be joined back with others to enable full information retrieval.
5. Data Independence: Data independence is achieved more easily with the normalized structure used in a relational database than in the more complicated tree or network structure.
6. Data Manipulation Language: Responding to ad hoc queries by means of a language based on relational algebra and relational calculus is easy in the relational database approach. For data organized in other structures, the query language either becomes complex or is extremely limited in its capabilities.
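The project and join operators mentioned under Flexibility can be sketched in a few lines of Python, with each table held as a list of dictionaries. The function names and the sample rows are assumptions made for this illustration; the column names are adapted from the VENDOR and SUPPLIES relations used in Section 2.9. A real RDBMS would evaluate these operators far more efficiently than the nested loop shown here.

```python
def project(rows, attributes):
    """Keep only the named attributes and drop duplicate rows."""
    result = []
    for row in rows:
        reduced = {name: row[name] for name in attributes}
        if reduced not in result:
            result.append(reduced)
    return result


def natural_join(left, right):
    """Combine rows that agree on every attribute name the two tables share."""
    result = []
    for left_row in left:
        for right_row in right:
            shared = set(left_row) & set(right_row)
            if all(left_row[name] == right_row[name] for name in shared):
                result.append({**left_row, **right_row})
    return result


# Hypothetical sample rows.
vendors = [
    {"vendor_no": 1, "vendor_name": "Acme", "vendor_city": "Boston"},
    {"vendor_no": 2, "vendor_name": "Zenith", "vendor_city": "Chicago"},
]
supplies = [
    {"vendor_no": 1, "product_no": 10, "vendor_price": 4.5},
    {"vendor_no": 1, "product_no": 11, "vendor_price": 7.0},
    {"vendor_no": 2, "product_no": 10, "vendor_price": 4.25},
]

joined = natural_join(supplies, vendors)
cities = project(joined, ["vendor_city"])
print(len(joined), cities)
```

The same two operators also illustrate the Security point: a sensitive column can be kept in a separate table and joined back only for users who are authorized to see it.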

Disadvantages of Relational Approach

A major constraint, and therefore disadvantage, in the use of a relational database system is machine performance. If the number of tables between which relationships are to be established is large and the tables themselves are voluminous, the performance in responding to queries is definitely degraded. It must be appreciated that the simplicity of the relational database approach arises in the logical view. With an interactive system, for example, an operation like join would depend upon the physical storage also. It is, therefore, common in relational databases to tune the database, and in such a case the physical data layout would be chosen so as to give good performance in the most frequently run operations. As a natural result, the less frequently run operations would tend to become even slower.

While the relational database approach is logically attractive and commercially feasible, if the data is, for example, naturally organized in a hierarchical manner and stored as such, the hierarchical approach may give better results. It is helpful to have a summary view of the differences between the relational and the non-relational approach in the following section.

2.9 An Example of a Relational Model

Let us see the important features of an RDBMS through some examples, as shown in Figure 24. A relation has the following properties:
1. A table is perceived as a two-dimensional structure composed of rows and columns.
2. Each column has a distinct name (attribute name), and the order of columns is immaterial.
3. Each table row (tuple) represents a single data entity within the entity set.
4. Each row/column intersection represents a single data value.
5. Each row carries information describing one entity occurrence.

Fig: Example of a relational data model

As shown in the figure, a tuple is the collection of values that compose one row of a relation. A tuple is equivalent to a record instance. An n-tuple is a tuple composed of n attribute values, where n is called the degree of the relation. Each row of PRODUCT is an example of a 4-tuple. The number of tuples in a relation is its cardinality.

A domain is the set of possible values for an attribute. For example, the domain for QUANTITY-ON-HAND in the PRODUCT relation is all integers greater than or equal to zero. The domain for CITY in the VENDOR relation is a set of alphabetic character strings restricted to the names of U.S. cities.

PRODUCT (PRODUCT#, DESCRIPTION, PRICE, QUANTITY-ON-HAND)
VENDOR (VENDOR#, VENDOR-NAME, VENDOR-CITY)
SUPPLIES (VENDOR#, PRODUCT#, VENDOR-PRICE)

In this form, the attribute (or attributes in combination) for which no more than one tuple may have the same (combined) value is called the primary key. (The primary key attributes are underlined for clarity.) The relational data model requires that a primary key of a tuple (or any component attribute, if a combined key) may not contain a null value. Although several different attributes (called candidate keys) might serve as the primary key, only one (or one combination) is chosen. The other keys are then called alternate keys.

The SUPPLIES relation in the figure requires two attributes in combination in order to identify each tuple uniquely. A composite or concatenated key is a key that consists of two or more attributes appended together. Concatenated keys appear frequently in a relational database, since intersection data, like VENDOR-PRICE, may be uniquely identified by a combination of the primary keys of the related entities. Each component of a concatenated key can be used to identify tuples in another relation. In fact, values for all component keys of a concatenated key must be present, although non-key attribute values may be missing. Further, the relational model has been enhanced to indicate that a tuple (e.g., for PRODUCT) logically should exist with its key value (e.g., PRODUCT#) if that value appears in a SUPPLIES tuple; this deals with existence dependencies.

2.10 SELF TEST

1) What do you mean by the term data model?
2) Create an ERD of hostel management.
3) Explain the difference between a 1:1 and a 1:M relationship.

2.11 SUMMARY

• Data modeling is the analysis of data objects that are used in a business or other context and the identification of the relationships among these data objects. Data modeling is a first step in doing object-oriented programming.
• In a flat file, one or more of the data items are designated as the key and are used for sequencing the file and for locating and grouping records by sorting and indexing.
• Advantages: Less software development cost Even by experienced programmers it takes months or years in developing a good software system in high level language; Support of efficientqueryfacilityonlinequeriesformultiplekeyretrievalsaretedioustoprogram. • The entityrelationship model is a tool for analyzing the semantic features of an application thatareindependentofevents.Entityrelationshipmodelinghelpsreducedataredundancy.


UNIT 3

FILE ORGANISATION FOR DBMS

3.1 Introduction
3.2 Objectives
3.3 File organization
3.4 Sequential file organisation
3.4.1 Index-sequential file organization
3.4.1.1 Types of indexes
3.5 B-trees
3.5.1 Advantages of B-tree indexes
3.6 Direct file organization
3.7 Need for the multiple access path
3.8 Self test
3.9 Summary

3.1 INTRODUCTION

Just as arrays, lists, trees and other data structures are used to implement data organization in main memory, a number of strategies are used to support the organization of data in secondary memory.

Fig: File organization techniques

This unit looks at different strategies for organizing data in secondary memory. We are concerned with obtaining data representations for files on external storage devices so that required functions (e.g. retrieval, update) may be carried out efficiently. The particular organization most suitable for any application will depend upon such factors as the kind of external storage available, types of queries allowed, number of keys, mode of retrieval and mode of update.

3.2 OBJECTIVES

After going through this unit you will be able to:
• Define what a file organization is
• List file organization techniques
• Discuss implementation techniques of indexing through tree structure
• Discuss implementation of direct file organization
• Discuss implementation of multikey file organization
• Discuss trade-offs and comparison among file organization techniques

3.3 FILE ORGANISATION

The technique used to represent and store the records on a file is called the file organization. The four fundamental file organization techniques that we will discuss are the following:
1. Sequential
2. Relative
3. Indexed sequential
4. Multikey

There are two basic ways that the file organization techniques differ. First, the organization determines the file's record sequencing, which is the physical ordering of the records in storage. Second, the file organization determines the set of operations necessary to find particular records. Individual records are typically identified by particular values in their search-key fields. This data field may or may not have duplicate values in the file; the field can be a group or elementary item. Some file organization techniques provide rapid accessibility on a variety of search keys; other techniques support direct access only on the value of a single one.

3.4 SEQUENTIAL FILE ORGANISATION

The most basic way to organize the collection of records that form a file is to use sequential organization. In a sequentially organized file, records are written consecutively when the file is created and must be accessed consecutively when the file is later used for input (fig).


Fig: Structure of sequential file

In a sequential file, records are maintained in the logical sequence of their primary key values. The processing of a sequential file is conceptually simple but inefficient for random access. However, if access to the file is strictly sequential, a sequential file is suitable. A sequential file could be stored on a sequential storage device such as a magnetic tape.

Search for a given record in a sequential file requires, on average, access to half the records in the file. Consider a system where the file is stored on a direct access device such as a disk. Suppose the key value is separated from the rest of the record and a pointer is used to indicate the location of the record. In such a system, the device may scan over the key values at rotation speeds and only read in the desired record. A binary or logarithmic search technique may also be used to search for a record. In this method, the cylinder on which the required record is stored is located by a series of decreasing head movements. The search, having been localized to a cylinder, may require the reading of half the tracks, on average, in the case where keys are embedded in the physical records, or require only a scan over the tracks in the case where keys are also stored separately.

Updating usually requires the creation of a new file. To maintain file sequence, records are copied to the point where amendment is required. The changes are then made and copied into the new file. Following this, the remaining records in the original file are copied to the new file. This method of updating a sequential file creates an automatic backup copy. It permits updates of the type U1 through U4.

Addition can be handled in a manner similar to updating. Adding a record necessitates the shifting of all records from the appropriate point to the end of file to create space for the new record. Inversely, deletion of a record requires a compression of the file space, achieved by the shifting of records. Changes to an existing record may also require shifting if the record size expands or shrinks.
The basic advantages offered by a sequential file are the ease of access to the next record, the simplicity of organization and the absence of auxiliary data structures. However, replies to simple queries are time consuming for large files. Updates, as seen above, usually require the creation of a new file. A single update is an expensive proposition if a new file must be created. To reduce the cost per update, all such requests are batched, sorted in the order of the sequential file, and then used to update the sequential file in a single pass. Such a file, containing the updates to be made to a sequential file, is sometimes referred to as a transaction file.

In the batched mode of updating, a transaction file of update records is made and then sorted in the sequence of the sequential file. The update process requires the examination of each individual record in the original sequential file (the old master file). Records requiring no changes are copied directly to a new file (the new master file); records requiring one or more changes are written into the new master file only after all necessary changes have been made. Insertions of new records are made in the proper sequence. They are written into the new master file at the appropriate place. Records to be deleted are not copied to the new master file. A big advantage of this method of update is the creation of an automatic backup copy. The new master file can always be recreated by processing the old master file and the transaction file.
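The batched update described above can be sketched as a single merge pass. This is a minimal illustration, not a definitive implementation: the function name apply_batch, the record layout (key, data) and the operation names "insert", "update" and "delete" are assumptions made for the example, and an "update" for a key that is absent from the old master is simply treated as an insertion.

```python
def apply_batch(old_master, transactions):
    """Merge a transaction file into a sorted master file in one pass.

    old_master:   list of (key, data) tuples sorted by key.
    transactions: list of (key, op, data) tuples, where op is "insert",
                  "update" or "delete" (hypothetical operation names).
    Returns the new master file. The old master is not modified, so it
    remains available as the backup copy.
    """
    # Sort the batch into master-file order. The sort is stable, so
    # several transactions on the same key keep their arrival order.
    batch = sorted(transactions, key=lambda t: t[0])
    new_master = []
    m = 0  # next unread record of the old master file
    t = 0  # next unread record of the transaction file
    while m < len(old_master) or t < len(batch):
        # Take the smallest key that has not been processed yet.
        if t == len(batch) or (m < len(old_master)
                               and old_master[m][0] <= batch[t][0]):
            key, data = old_master[m]
            present = True
            m += 1
        else:
            # The key occurs only in the transaction file.
            key, data, present = batch[t][0], None, False
        # Apply every transaction for this key before writing it out.
        while t < len(batch) and batch[t][0] == key:
            _, op, value = batch[t]
            if op == "delete":
                present = False
            else:  # "insert" or "update"
                data, present = value, True
            t += 1
        if present:
            new_master.append((key, data))
    return new_master
```

Records with no matching transaction are copied unchanged, deleted records are skipped, and both input files are read strictly in sequence, which is why the approach suits tape as well as disk.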

Fig: A file with empty spaces for record insertions

A possible method of reducing the creation of a new file at each update run is to create the original file with "holes" (space left for the addition of new records, as shown in the last figure). As such, if a block could hold K records, then at initial creation it is made to contain only L*K records, where 0 < L < 1 is known as the loading factor. Additional space may also be earmarked for records that may "overflow" their blocks, e.g., if the record ri logically belongs to block Bi but the physical block Bi does not contain the requisite free space. This additional free space is known as the overflow area. A similar technique is employed in index-sequential files.
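The loading factor and overflow area can be illustrated with a short sketch. The names build_blocks and insert_record are hypothetical, records are reduced to their key values, and the rule used to decide which block a record logically belongs to (the last block whose first key is not greater than the new key) is an assumption made for the example.

```python
def build_blocks(sorted_keys, k, load_factor):
    """Create blocks of capacity k that initially hold only about
    load_factor * k records each, leaving holes for later insertions."""
    per_block = max(1, int(load_factor * k))
    return [sorted_keys[i:i + per_block]
            for i in range(0, len(sorted_keys), per_block)]


def insert_record(blocks, overflow, k, key):
    """Place key in its logical block if a hole is left there,
    otherwise in the overflow area. Returns where the key went."""
    target = 0
    for number, block in enumerate(blocks):
        if block and block[0] <= key:
            target = number  # last block whose first key is <= key
    block = blocks[target]
    if len(block) < k:
        block.append(key)
        block.sort()  # keep the block in key sequence
        return "block"
    overflow.append(key)
    return "overflow"
```

With K = 4 and L = 0.5 each block starts half full, so two insertions per block can be absorbed before anything spills into the overflow area.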

3.4.1 INDEX-SEQUENTIAL FILE ORGANISATION

The retrieval of a record from a sequential file, on average, requires access to half the records in the file, making such inquiries not only inefficient but very time consuming for large files. To improve the query response time of a sequential file, a type of indexing technique can be added.

An index is a set of <key, address> pairs. Indexing associates a set of objects with a set of orderable quantities, which are usually smaller in number or whose properties provide a mechanism for faster search. The purpose of indexing is to expedite the search process. Indexes created from a sequential (or sorted) set of primary keys are referred to as index sequential. Although the indices and the data blocks are held together physically, we distinguish between them logically. We shall use the term index file to describe the indexes and data file to refer to the data records. The index is usually small enough to be read into the processor memory.

3.4.1.1 Types of Indexes

The idea behind an index access structure is similar to that behind the indexes used commonly in textbooks. Along with each term, a list of page numbers where the term appears is given. We can search the index to find a list of addresses (page numbers in this case) and use these addresses to locate the term in the textbook by searching the specified pages. The alternative, if no other guidance is given, is to sift slowly through the whole textbook word by word to find the term we are interested in, which corresponds to doing a linear search on a file. Of course, most books do have additional information, such as chapter and section titles, which can help us find a term without having to search through the whole book. However, the index is the only exact indication of where each term occurs in the book.

An index is usually defined on a single field of a file, called an indexing field. There are several types of indexes. A primary index is an index specified on the ordering key field of an ordered file of records. Recall that an ordering key field is used to physically order the file records on disk, and every record has a unique value for that field. If the ordering field is not a key field (that is, several records in the file can have the same value for the ordering field), another type of index, called a clustering index, can be used. Notice that a file can have at most one physical ordering field, so it can have at most one primary index or one clustering index, but not both. A third type of index, called a secondary index, can be specified on any nonordering field of a file. A file can have several secondary indexes in addition to its primary access method. In the next three subsections we discuss these three types of indexes.

• Primary Indexes

A primary index is an ordered file whose records are of fixed length with two fields. The first field is of the same data type as the ordering key field of the data file, and the second field is a pointer to a disk block (a block address). The ordering key field is called the primary key of the data file. There is one index entry (or index record) in the index file for each block in the data file. Each index entry has the value of the primary key field for the first record in a block and a pointer to that block as its two field values. We will refer to the two field values of index entry i as <K(i), P(i)>.

To create a primary index on the ordered file shown in figure 4, we use the NAME field as the primary key, because that is the ordering key field of the file (assuming that each value of NAME is unique).

Fig: Some blocks of an ordered (sequential) file of EMPLOYEE records with NAME as the ordering field

Each entry in the index will have a NAME value and a pointer. The first three index entries would be:
<K(1) = (Aaron, Ed), P(1) = address of block 1>
<K(2) = (Adams, John), P(2) = address of block 2>
<K(3) = (Alexander, Ed), P(3) = address of block 3>


Fig: Illustrating internal hashing data structures. (a) Array of M positions for use in hashing. (b) Collision resolution by chaining of records.

The figure captioned "Primary index on the ordering key field" illustrates this primary index. The total number of entries in the index will be the same as the number of disk blocks in the ordered data file. The first record in each block of the data file is called the anchor record of the block, or simply the block anchor (a scheme similar to the one described here can be used with the last record in each block, rather than the first, as the block anchor). A primary index is an example of what is called a nondense index because it includes an entry for each disk block of the data file rather than for every record in the data file. A dense index, on the other hand, contains an entry for every record in the file.

The index file for a primary index needs substantially fewer blocks than the data file for two reasons. First, there are fewer index entries than there are records in the data file because an entry exists for each whole block of the data file rather than for each record. Second, each index entry is typically smaller in size than a data record because it has only two fields, so more index entries than data records will fit in one block. A binary search on the index file will hence require fewer block accesses than a binary search on the data file.
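A nondense primary index and the lookup it supports can be sketched as follows. This is an illustration under stated assumptions: blocks are modelled as Python lists of (key, data) records, a block number stands in for the block address P(i), the function names are hypothetical, and the sample names in the usage lines other than the three block anchors given in the text are invented.

```python
from bisect import bisect_right


def build_primary_index(blocks):
    """One <K(i), P(i)> entry per block: the key of the block anchor
    (the first record of the block) and the block number."""
    return [(block[0][0], number) for number, block in enumerate(blocks)]


def lookup(index, blocks, key):
    """Binary-search the index, then search the single block found."""
    anchors = [k for k, _ in index]
    i = bisect_right(anchors, key) - 1  # last entry with K(i) <= key
    if i < 0:
        return None  # key sorts before the first block anchor
    for k, data in blocks[index[i][1]]:  # one data-block access
        if k == key:
            return data
    return None


# Hypothetical EMPLOYEE blocks ordered on NAME.
blocks = [
    [("Aaron, Ed", 1), ("Abbott, Diane", 2)],
    [("Adams, John", 3), ("Akers, Jan", 4)],
    [("Alexander, Ed", 5), ("Allen, Sam", 6)],
]
index = build_primary_index(blocks)
```

The index holds three entries for six records, which is the source of the space saving described above; only the block selected by the binary search is read from the data file.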


Fig: Primary index on the ordering key field

A record whose primary key value is K will be in the block whose address is P(i), where K(i) ≤ K < K(i+1). The ith block in the data file contains all such records because of the physical ordering of the file records on the primary key field. Hence, we do a binary search on the index file to find the appropriate index entry i, then retrieve the data file block whose address is P(i). Notice that the above formula would not be correct if the data file was ordered on a nonkey field that allows multiple records to have the same ordering field value. In that case the same index value as that in the block anchor could be repeated in the last records of the previous block. Example 1 illustrates the saving in block accesses when using an index to search for a record.

Example 1: Suppose we have an ordered file with r = 30,000 records stored on a disk with block size B = 1024 bytes. File records are of fixed size and unspanned, with record length R = 100 bytes. The blocking factor for the file would be bfr = ⌊B/R⌋ = ⌊1024/100⌋ = 10 records per block. The number of blocks needed for the file is b = ⌈r/bfr⌉ = ⌈30,000/10⌉ = 3000 blocks. A binary search on the data file would need approximately ⌈log2 b⌉ = ⌈log2 3000⌉ = 12 block accesses.

Now suppose the ordering key field of the file is V = 9 bytes long, a block pointer is P = 6 bytes long, and we construct a primary index for the file. The size of each index entry is Ri = (9 + 6) = 15 bytes, so the blocking factor for the index is bfri = ⌊B/Ri⌋ = ⌊1024/15⌋ = 68 entries per block.

The total number of index entries ri is equal to the number of blocks in the data file, which is 3000. The number of blocks needed for the index is hence bi = ⌈ri/bfri⌉ = ⌈3000/68⌉ = 45 blocks. To perform a binary search on the index file would need ⌈log2 bi⌉ = ⌈log2 45⌉ = 6 block accesses. To search for a record using the index, we need one additional block access to the data file for a total of 6 + 1 = 7 block accesses, an improvement over binary search on the data file, which required 12 block accesses.

A major problem with a primary index, as with any ordered file, is insertion and deletion of records. With a primary index, the problem is compounded because if we attempt to insert a record in its correct position in the data file, we not only have to move records to make space for the new record but also have to change some index entries, because moving records will change the anchor records of some blocks. To reduce this problem, we can use an unordered overflow file. Another possibility is to use a linked list of overflow records for each block in the data file. We can keep the records within each block and its overflow linked list sorted to improve retrieval time. Record deletion can be handled using deletion markers.

• Clustering Indexes

If records of a file are physically ordered on a nonkey field that does not have a distinct value for each record, that field is called the clustering field of the file. We can create a different type of index, called a clustering index, to speed up retrieval of records that have the same value for the clustering field.


Fig: A clustering index on the DEPTNUMBER ordering field of an EMPLOYEE file

A clustering index is also an ordered file with two fields; the first field is of the same type as the clustering field of the data file and the second field is a block pointer. There is one entry in the clustering index for each distinct value of the clustering field, containing that value and a pointer to the first block in the data file that has a record with that value for its clustering field.


Fig: Clustering index with separate blocks for each group of records with the same value for the clustering field

Record insertion and record deletion still cause considerable problems because the data records are physically ordered. To alleviate the problem of insertion, it is common to reserve a whole block for each value of the clustering field; all records with that value are placed in the block. If more than one block is needed to store the records for a particular value, additional blocks are allocated and linked together. This makes insertion and deletion relatively straightforward. Figure 8 shows this scheme.

A clustering index is another example of a nondense index because it has an entry for every distinct value of the indexing field rather than for every record in the file.
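The basic clustering index described above (one entry per distinct value, pointing to the first block that contains it) can be sketched as below. This is only an illustration: records are modelled as dictionaries, block numbers stand in for block pointers, the function names are hypothetical, and the index is scanned linearly for brevity where a real index would be binary-searched.

```python
def build_clustering_index(blocks, field):
    """One entry per distinct value of the clustering field: the value
    and the number of the first block holding a record with it."""
    index = []
    for number, block in enumerate(blocks):
        for record in block:
            value = record[field]
            # The file is ordered on the field, so a value that differs
            # from the last indexed one is being seen for the first time.
            if not index or index[-1][0] != value:
                index.append((value, number))
    return index


def fetch(index, blocks, field, value):
    """Return every record whose clustering field equals value."""
    start = next((b for v, b in index if v == value), None)
    if start is None:
        return []
    found = []
    for block in blocks[start:]:
        for record in block:
            if record[field] == value:
                found.append(record)
            elif record[field] > value:
                return found  # passed the cluster; stop reading blocks
    return found
```

Because the records with one value are stored contiguously, retrieval starts at the indexed block and stops as soon as a larger value is met.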

• Secondary Indexes

A secondary index also is an ordered file with two fields, and, as in the other indexes, the second field is a pointer to a disk block. The first field is of the same data type as some nonordering field of the data file. The field on which the secondary index is constructed is called an indexing field of the file, whether its values are distinct for every record or not. There can be many secondary indexes, and hence indexing fields, for the same file.

We again refer to the two field values of index entry i as <K(i), P(i)>. The entries are ordered by value of K(i), so we can use binary search on the index. Because the records of the data file are not physically ordered by values of the secondary key field, we cannot use block anchors. That is why an index entry is created for each record in the data file rather than for each block as in the case of a primary index. Figure 9 illustrates a secondary index on a key attribute of a data file. Notice that in figure 9 the pointers P(i) in the index entries are block pointers, not record pointers. Once the appropriate block is transferred to main memory, a search for the desired record within the block can be carried out.

Fig: A dense secondary index on a nonordering key field of a file

A secondary index will usually need substantially more storage space than a primary index because of its larger number of entries. However, the improvement in search time for an arbitrary record is much greater for a secondary index than it is for a primary index, because we would have to do a linear search on the data file if the secondary index did not exist. For a primary index, we could still use binary search on the main file even if the index did not exist because the primary key field physically orders the records. Example 2 illustrates the improvement in number of blocks accessed when using a secondary index to search for a record.

Example 2: Consider the file of Example 1 with r = 30,000 fixed-length records of size R = 100 bytes stored on a disk with block size B = 1024 bytes. The file has b = 3000 blocks as calculated in Example 1. To do a linear search on the file, we would require b/2 = 3000/2 = 1500 block accesses on the average.

Suppose we construct a secondary index on a nonordering key field of the file that is V = 9 bytes long. As in Example 1, a block pointer is P = 6 bytes long, so each index entry is Ri = (9 + 6) = 15 bytes, and the blocking factor for the index is bfri = ⌊B/Ri⌋ = ⌊1024/15⌋ = 68 entries per block. In a dense secondary index such as this, the total number of index entries ri is equal to the number of records in the data file, which is 30,000. The number of blocks needed for the index is hence bi = ⌈ri/bfri⌉ = ⌈30,000/68⌉ = 442 blocks. Compare this to the 45 blocks needed by the nondense primary index in Example 1.

A binary search on this secondary index needs ⌈log2 bi⌉ = ⌈log2 442⌉ = 9 block accesses. To search for a record using the index, we need an additional block access to the data file for a total of 9 + 1 = 10 block accesses, a vast improvement over the 1500 block accesses needed on the average for a linear search.
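The arithmetic of Examples 1 and 2 can be reproduced with a few lines of code, which also makes it easy to try other block or record sizes. The function name index_cost and the dictionary keys are hypothetical; the parameter names follow the symbols used in the examples.

```python
from math import ceil, log2


def index_cost(r, B, R, V, P, dense):
    """Block counts and access costs for a single-level index.

    r: number of records      B: block size in bytes
    R: record length          V: key field length
    P: block pointer length   dense: one entry per record if True,
                                     one entry per block if False
    """
    bfr = B // R               # blocking factor of the data file
    b = ceil(r / bfr)          # blocks in the data file
    Ri = V + P                 # size of one index entry
    bfri = B // Ri             # blocking factor of the index
    ri = r if dense else b     # number of index entries
    bi = ceil(ri / bfri)       # blocks in the index
    return {
        "data_blocks": b,
        "index_blocks": bi,
        "binary_search_on_data": ceil(log2(b)),
        "linear_search_on_data": b // 2,
        "search_via_index": ceil(log2(bi)) + 1,  # +1 for the data block
    }


primary = index_cost(30000, 1024, 100, 9, 6, dense=False)   # Example 1
secondary = index_cost(30000, 1024, 100, 9, 6, dense=True)  # Example 2
```

The two calls differ only in the dense flag, which captures the one structural difference between the examples: the primary index has one entry per block and the secondary index one entry per record.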

Fig: A secondary index on a nonkey field implemented using one level of indirection so that index entries are fixed length and have unique field values

3.5 B-TREES

The basic B-tree structure was discovered by R. Bayer and E. McCreight (1970) of Boeing Scientific Research Labs and has grown to become one of the most popular techniques for organizing an index structure. Many variations on the basic B-tree have been developed; we cover the basic structure first and then introduce some of the variations.

The B-tree is known as the balanced sort tree, which is useful for external sorting. There are strong uses of B-trees in a database system, as pointed out by D. Comer (1979): "While no single scheme can be optimum for all applications, the technique of organizing a file and its index called the B-tree is, de facto, the standard organization for indexes in a database system."

The file is a collection of records. The index refers to a unique key associated with each record. One application of B-trees is found in IBM's Virtual Storage Access Method (VSAM) file organization. Many data manipulation tasks require data storage only in main memory. For applications with a large database running on a system with limited main memory, the data must be stored as records on secondary memory (disks) and be accessed in pieces. The size of a record can be quite large, as shown below:

/* d_block is a disk block address type, assumed to be defined elsewhere */
struct DATA {
    int ssn;
    char name[80];
    char address[80];
    char schoold[76];
    struct DATA *left;    /* main memory addresses */
    struct DATA *right;   /* main memory addresses */
    d_block d_left;       /* disk block address */
    d_block d_right;      /* disk block address */
};

3.5.1 Advantages of B-tree indexes

Because there is no overflow problem inherent with this type of organization, it is good for dynamic tables, those that suffer a great deal of insert/update/delete activity. Because it is to a large extent self-maintaining, it is good in supporting 24-hour operation. As data is retrieved via the index, it is always presented in order. 'Get next' queries are efficient because of the inherent ordering of rows within the index blocks.

B-tree indexes are good for very large tables because they will need minimal reorganization. There is predictable access time for any retrieval (of the same number of rows, of course) because the B-tree structure keeps itself balanced, so that there is always the same number of index levels for every retrieval. Bear in mind, of course, that the number of index levels does increase both with the number of records and the length of the key value.

Because the rows are in order, this type of index can service range-type enquiries, of the type below, efficiently.

SELECT ... WHERE COL BETWEEN X AND Y

Disadvantages of B-tree indexes:

For static tables, there are better organizations that require fewer I/Os. ISAM indexes are preferable to B-tree in this type of environment.

B-tree is not really appropriate for very small tables because index lookup becomes a significant part of the overall access time.

The index can use considerable disk space, especially in products which allow different users to create separate indexes on the same table/column combinations.

Because the indexes themselves are subject to modification when rows are updated, deleted or inserted, they are also subject to locking, which can inhibit concurrency.

3.6 DIRECT FILE ORGANISATION

In the index-sequential file organization considered in the previous sections, the mapping from the search-key value to the storage location is via index entries. In direct file organization, the key value is mapped directly to the storage location.

Fig: Mapping from a key value to an address value

The usual method of direct mapping is by performing some arithmetic manipulation of the key value. This process is called hashing. Let us consider a hash function h that maps the key value k to the value h(k). The value h(k) is used as an address, and for our application we require that this value be in some range. If our address area for the records lies between S1 and S2, the requirement for the hash function h(k) is that for all values of k it should generate values between S1 and S2. It is obvious that a hash function that maps many different key values to a single address, or one that does not map the key values uniformly, is a bad hash function. A collision is said to occur when two distinct key values are mapped to the same storage location. Collisions are handled in a number of ways.

Another problem that we have to resolve is to decide what address is represented by h(k). Let the addresses generated by the hash function be the addresses of buckets in which the <key, address> pair values of records are stored. The figure shows the buckets containing the <key, address> pairs that allow a reorganization of the actual data file and actual record addresses without affecting the hash function. A limited number of collisions could be handled automatically by the use of a bucket of sufficient capacity. Obviously the space required for the buckets will be, in general, much smaller than the actual data file. Consequently, its reorganization will not be that expensive. Once the bucket address is generated from the key by the hash function, a search in the bucket is also required to locate the address of the required record. However, since the bucket size is small, this overhead is small.

The use of the bucket reduces the problem associated with collisions. In spite of this, a bucket may become full, and the resulting overflow could be handled by providing overflow buckets and using a pointer from the normal bucket to an entry in the overflow bucket. All such overflow entries are linked. Multiple overflows from the same bucket result in a long list and slow down the retrieval of these records. In an alternate scheme, the address generated by the hash function is a bucket address and the bucket is used to store the records directly instead of using a pointer to the block containing the record.

Fig: Bucket and block organization for hashing

Let s represent the value:

s = upper bucket address value - lower bucket address value + 1

Here, s gives the number of buckets. Assume that we have some mechanism to convert key values to numeric ones. Then a simple hashing function is:

h(k) = k mod s

where k is the numeric representation of the key and h(k) produces a bucket address. A moment's thought tells us that this method would perform well in some cases and not in others.

It has been shown, however, that the choice of a prime number for s is usually satisfactory. A combination of multiplicative and divisive methods can be used to advantage in many practical situations.
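The division method h(k) = k mod s, together with buckets of <key, address> pairs and linked overflow entries, can be sketched as follows. The class HashFile and its method names are hypothetical, buckets are numbered from 0 (add the lower bucket address to obtain a real address), and each bucket's overflow chain is modelled as a simple list.

```python
class HashFile:
    """Buckets of <key, address> pairs addressed by h(k) = k mod s."""

    def __init__(self, s, bucket_capacity):
        self.s = s                              # number of buckets
        self.capacity = bucket_capacity         # pairs per normal bucket
        self.buckets = [[] for _ in range(s)]
        self.overflow = [[] for _ in range(s)]  # one chain per bucket

    def h(self, k):
        return k % self.s

    def insert(self, k, address):
        b = self.h(k)
        if len(self.buckets[b]) < self.capacity:
            self.buckets[b].append((k, address))
        else:
            # Bucket full: link the pair into the overflow chain.
            self.overflow[b].append((k, address))

    def find(self, k):
        b = self.h(k)
        # Search the small bucket first, then its overflow chain.
        for key, address in self.buckets[b] + self.overflow[b]:
            if key == k:
                return address
        return None


f = HashFile(s=7, bucket_capacity=2)  # 7 is prime, as recommended above
for key in (10, 17, 24, 8):
    f.insert(key, "record-%d" % key)
```

Keys 10, 17 and 24 all hash to bucket 3, so the third one lands in the overflow chain; this is the situation in which long chains slow retrieval down.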

There are innumerable ways of converting a key to a numeric value. Most keys are numeric; others may be either alphabetic or alphanumeric. In the latter two cases, we can use the bit representation of the alphabet to generate the numeric equivalent key. A number of simple hashing methods are given below. Many hashing functions can be devised from these and other ways.

1. Use the low-order part of the key. For keys that are consecutive integers with few gaps, this method can be used to map the keys to the available range.
2. End folding. For long keys, we identify start, middle, and end regions, such that the sum of the lengths of the start and end regions equals the length of the middle region. The start and end digits are concatenated and the concatenated string of digits is added to the middle region digits.

3.7 NEED FOR THE MULTIPLE ACCESS PATH

Many interactive information systems require the support of multikey files. Consider a banking system in which there are several types of users: tellers, loan officers, branch managers, bank officers, account holders, and so forth. All have the need to access the same data, say records of the format shown in the figure. Various types of users need to access these records in different ways. A teller might identify an account record by its ID value. A loan officer might need to access all account records with a given value for OVERDRAWLIMIT, or all account records for a given value of SOCNO. A branch manager might access records by the BRANCH and TYPE group code. A bank officer might want periodic reports of all accounts data, sorted by ID.

Fig: Example record format

Support by Replicating Data: One approach to being able to support all these types of access is to have several different files, each organized to serve one type of request. For this banking example, there might be one indexed sequential account file with key ID (to serve tellers, bank officers, and account holders), one sequential account file with records ordered by OVERDRAWLIMIT (to serve loan officers), one account file with relative organization and user key SOCNO (to serve loan officers), one sequential account file with records ordered by GROUPCODE (to serve branch managers), and one relative account file with user key NAME, SOCNO, and TYPE code (to serve account holders). We have just identified five files, all containing the same data records! The five files differ only in their organizations, and thus in the access paths they provide.

3.8 SELF TEST

1) Define the term file organisation.
2) Explain the different types of file organization.
3) Explain the difference between an index sequence file and a binary sequence file.

3.9 SUMMARY

• The file organization most suitable for an application depends on the external storage available, types of queries allowed, number of keys, mode of retrieval and mode of update.
• The technique used to represent and store the records on a file is called the file organization.
• In a sequentially organized file, records are written consecutively when the file is created and must be accessed consecutively when the file is later used for input.
• The retrieval of a record from a sequential file, on average, requires access to half the records in the file, making such inquiries not only inefficient but very time consuming for large files. To improve the query response time of a sequential file, a type of indexing technique can be added.
• A primary index is an ordered file whose records are of fixed length with two fields. The first field is of the same data type as the ordering key field of the data file, and the second field is a pointer to a disk block (a block address).

UNIT 4

REPRESENTING DATA ELEMENTS

4.1 Data Elements and Fields
4.2 Representing Relational Database Elements
4.3 Records
4.4 Representing Block and Record Addresses
4.5 Client-Server Systems
4.6 Logical and Structured Addresses
4.7 Record Modifications
4.8 Index Structures
4.9 Indexes on Sequential Files
4.10 Secondary Indexes
4.11 B-Trees
4.12 Hash Tables
4.13 Self Test

4.1 Data Elements and Fields

In order to begin constructing the basic model, the modeler must analyze the information gathered during the requirements analysis for the purpose of:
• Classifying data objects as either entities or attributes
• Identifying and defining relationships between entities
• Naming and defining identified entities, attributes, and relationships
• Documenting this information in the data document

To accomplish these goals the modeler must analyze narratives from users, notes from meetings, policy and procedure documents, and, if lucky, design documents from the current information system. Although it is easy to define the basic constructs of the ER model, it is not an easy task to distinguish their roles in building the data model. What makes an object an entity or attribute? For example, given the statement "employees work on projects", should employees be classified as an entity or attribute? Very often, the correct answer depends upon the requirements of the database. In some cases, employee would be an entity; in some it would be an attribute.

While the definitions of the constructs in the ER model are simple, the model does not address the fundamental issue of how to identify them. Some commonly given guidelines are:
• entities contain descriptive information
• attributes either identify or describe entities
• relationships are associations between entities

These guidelines are discussed in more detail below.
• Entities
• Attributes
• Validating Attributes
• Derived Attributes and Code Values
• Relationships
• Naming Data Objects
• Object Definition
• Recording Information in Design Document

Entities

There are various definitions of an entity:
"Any distinguishable person, place, thing, event, or concept, about which information is kept"
"A thing which can be distinctly identified"
"Any distinguishable object that is to be represented in a database"
"...anything about which we store information (e.g. supplier, machine tool, employee, utility pole, airline seat, etc.). For each entity type, certain attributes are stored".

These definitions contain common themes about entities:

Entities should not be used to distinguish between time periods. For example, the entities 1st Quarter Profits, 2nd Quarter Profits, etc. should be collapsed into a single entity called Profits. An attribute specifying the time period would be used to categorize by time.

Not everything the users want to collect information about will be an entity. A complex concept may require more than one entity to represent it. Other "things" users think important may not be entities.

Attributes

Attributes are data objects that either identify or describe entities. Attributes that identify entities are called key attributes. Attributes that describe an entity are called nonkey attributes. Key attributes will be discussed in detail in a later section.

The process for identifying attributes is similar to that for entities, except now you want to look for and extract those names that appear to be descriptive noun phrases.

Validating Attributes Attribute values should be atomic ,thatis,presentasinglefact. Havingdisaggregated data allows simpler programming, greater reusability of data, and easier implementation of changes. Normalization also depends upon the "single fact" rule being followed. Common types of violations include: Simple aggregation a common example is Person Name which concatenates first name, middle initial, and last name. Another is Address which concatenates, street address, city, and zip code. Whendealingwithsuchattributes,youneedtofindoutiftherearegoodreasonsfordecomposing them.Forexample,dotheenduserswanttousetheperson'sfirstnameinaformletter?Dothey wanttosortbyzipcode? Complex codes these are attributes whose values are codes composed of concatenated pieces of information.Anexampleisthecodeattachedtoautomobilesandtrucks.Thecoderepresentsover10 54 different pieces of information about the vehicle. Unless part of an industry standard, these codes havenomeaningtotheenduser.Theyareverydifficulttoprocessandupdate. Derived Attributes and Code Values Twoareaswheredatamodelingexpertsdisagreeiswhetherderivedattributesandattributeswhose values are codes should be permitted in the data model. Derived attributes are those created by a formula or by a summary operation on other attributes. 
Arguments against including derived data arebasedonthepremisethatderiveddatashouldnotbestoredinadatabaseandthereforeshould notbeincludedinthedatamodel.Theargumentsinfavorare: Deriveddataisoftenimportanttobothmanagersandusersandthereforeshouldbeincludedinthe data model it is just as important, perhaps more so, to document derived attributes just as you wouldotherattributesincludingderivedattributesinthedatamodeldoesnotimplyhowtheywillbe implemented.Acodedvalueusesoneormorelettersornumberstorepresentafact.Forexample, thevalueGendermightusetheletters"M"and"F"asvaluesratherthan"Male"and"Female".Those who are against this practice cite that codes have no intuitive meaning to the endusers and add complexitytoprocessingdata.Thoseinfavorarguethatmanyorganizationshavealonghistoryof using coded attributes, that codes save space, and improve flexibility in that values can be easily addedormodifiedbymeansoflookuptables. Relationships Relationships are associations between entities. Typically, a relationship is indicated by a verb connectingtwoormoreentities.Forexample:employees are assigned toprojects.Asrelationships areidentifiedtheyshouldbeclassifiedintermsofcardinality,optionality,direction,anddependence. Asaresultofdefiningtherelationships,somerelationshipsmaybedroppedandnewrelationships added.Cardinalityquantifiestherelationshipsbetweenentitiesbymeasuringhowmanyinstancesof one entity are related to a single instance of another. To determine the cardinality, assume the existence of an instance of one of the entities. Then determinehowmanyspecificinstancesof the secondentitycouldberelatedtothefirst.Repeatthisanalysisreversingtheentities.Forexample: employees may be assigned tonomorethanthreeprojectsatatime;everyprojecthasatleasttwo employeesassignedtoit. If a relationship can have a cardinality of zero, it is an optional relationship. If it must have a cardinality of at least one, the relationship is mandatory. 
Optional relationships are typically indicated by the conditional tense. For example: an employee may be assigned to a project. Mandatory relationships, on the other hand, are indicated by words such as "must have". For example: a student must register for at least three courses each semester.

In the case of the specific relationship forms (1:1 and 1:M), there is always a parent entity and a child entity. In one-to-many relationships, the parent is always the entity with the cardinality of one. In one-to-one relationships, the choice of the parent entity must be made in the context of the business being modeled. If a decision cannot be made, the choice is arbitrary.

Naming Data Objects

The names should have the following properties:

• unique
• have meaning to the end user
• contain the minimum number of words needed to uniquely and accurately describe the object

Some authors advise against using abbreviations or acronyms because they might lead to confusion about what they mean. Others believe using abbreviations or acronyms is acceptable provided that they are universally used and understood within the organization.

You should also take care to identify and resolve synonyms for entities and attributes. This can happen in large projects where different departments use different terms for the same thing.

Recording Information in Design Document

The design document records detailed information about each object used in the model. As you name, define, and describe objects, this information should be placed in this document. If you are not using an automated design tool, the document can be done on paper or with a word processor. There is no standard for the organization of this document, but it should include information about names, definitions, and, for attributes, domains.

Two documents used in the IDEF1X method of modeling are useful for keeping track of objects. These are the ENTITY-ENTITY matrix and the ENTITY-ATTRIBUTE matrix.

The ENTITY-ENTITY matrix is a two-dimensional array for indicating relationships between entities. The names of all identified entities are listed along both axes. As relationships are first identified, an "X" is placed at the point where the row of one entity intersects the column of another, to indicate a possible relationship between the entities involved. As the relationship is further classified, the "X" is replaced with the notation indicating cardinality.

The ENTITY-ATTRIBUTE matrix is used to indicate the assignment of attributes to entities. It is similar in form to the ENTITY-ENTITY matrix except that attribute names are listed on the rows.

4.2 Representing Relational Database Elements

The relational model was formally introduced by Dr. E. F. Codd in 1970 and has evolved since then, through a series of writings. The model provides a simple, yet rigorously defined, concept of how users perceive data. The relational model represents data in the form of two-dimensional tables. Each table represents some real-world person, place, thing, or event about which information is collected. A relational database is a collection of two-dimensional tables. The organization of data into relational tables is known as the logical view of the database. That is, it is the form in which a relational database presents data to the user and the programmer. The way the database software physically stores the data on a computer disk system is called the internal view. The internal view differs from product to product and does not concern us here.

A basic understanding of the relational model is necessary to effectively use relational database software such as Oracle, Microsoft SQL Server, or even personal database systems such as Access or Fox, which are based on the relational model. This document is an informal introduction to relational concepts, especially as they relate to relational database design issues. It is not a complete description of relational theory.

4.3 Records

Data is usually stored in the form of records. Each record consists of a collection of related data values or items, where each value is formed of one or more bytes and corresponds to a particular field of the record. Records usually describe entities and their attributes. For example, an EMPLOYEE record represents an employee entity, and each field value in the record specifies some attribute of that employee, such as NAME, BIRTHDATE, SALARY, or SUPERVISOR. A collection of field names and their corresponding data types constitutes a record type or record format definition. A data type, associated with each field, specifies the type of values a field can take.
The data type of a field is usually one of the standard data types used in programming. These include numeric (integer, long integer, or floating point), string of characters (fixed-length or varying), Boolean (having 0 and 1 or TRUE and FALSE values only), and sometimes specially coded date and time data types. The number of bytes required for each data type is fixed for a given computer system. An integer may require 4 bytes, a long integer 8 bytes, a real number 4 bytes, a Boolean 1 byte, a date 10 bytes (assuming a format of YYYY-MM-DD), and a fixed-length string of k characters k bytes. Variable-length strings may require as many bytes as there are characters in each field value. For example, an EMPLOYEE record type may be defined, using the C programming language notation, as the following structure:

    struct employee {
        char name[30];
        char ssn[9];
        int  salary;
        int  jobcode;
        char department[20];
    };

In recent database applications, the need may arise for storing data items that consist of large unstructured objects, which represent images, digitized video or audio streams, or free text. These are referred to as BLOBs (Binary Large Objects). A BLOB data item is typically stored separately from its record in a pool of disk blocks, and a pointer to the BLOB is included in the record.
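As an illustration of the fixed-length record idea, the sketch below copies the fields of the EMPLOYEE structure back to back into a byte buffer, so that every record occupies the same number of bytes and record number n starts at a predictable offset. The function names (employee_pack, employee_unpack, record_offset) and the packed layout are assumptions made for this example, not something the text prescribes; a real DBMS would also deal with byte order and alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Field widths follow the EMPLOYEE example in the text. */
struct employee {
    char name[30];
    char ssn[9];
    int  salary;
    int  jobcode;
    char department[20];
};

/* Record length when the fields are stored back to back, with no padding. */
#define EMPLOYEE_PACKED_SIZE (30 + 9 + sizeof(int) + sizeof(int) + 20)

/* Copy one record into a packed byte buffer. Returns the bytes written. */
size_t employee_pack(const struct employee *e, unsigned char *buf)
{
    size_t off = 0;
    assert(e != NULL && buf != NULL);
    memcpy(buf + off, e->name, sizeof e->name);             off += sizeof e->name;
    memcpy(buf + off, e->ssn, sizeof e->ssn);               off += sizeof e->ssn;
    memcpy(buf + off, &e->salary, sizeof e->salary);        off += sizeof e->salary;
    memcpy(buf + off, &e->jobcode, sizeof e->jobcode);      off += sizeof e->jobcode;
    memcpy(buf + off, e->department, sizeof e->department); off += sizeof e->department;
    return off;
}

/* Rebuild a record from a buffer written by employee_pack. */
void employee_unpack(const unsigned char *buf, struct employee *e)
{
    size_t off = 0;
    assert(buf != NULL && e != NULL);
    memcpy(e->name, buf + off, sizeof e->name);             off += sizeof e->name;
    memcpy(e->ssn, buf + off, sizeof e->ssn);               off += sizeof e->ssn;
    memcpy(&e->salary, buf + off, sizeof e->salary);        off += sizeof e->salary;
    memcpy(&e->jobcode, buf + off, sizeof e->jobcode);      off += sizeof e->jobcode;
    memcpy(e->department, buf + off, sizeof e->department);
}

/* Byte offset of record number n in a file of packed records. */
size_t record_offset(size_t n)
{
    return n * EMPLOYEE_PACKED_SIZE;
}
```

Note that sizeof(struct employee) may be larger than EMPLOYEE_PACKED_SIZE, because the compiler is free to insert padding between fields; this is one reason a stored record format is usually defined separately from the in-memory structure.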

4.4 Representing Block and Record Addresses

Fields (attributes) need to be represented by fixed- or variable-length sequences of bytes. Fields are put together in fixed- or variable-length collections called "records" (tuples, or objects). A collection of records that forms a relation or the extent of a class is stored as a collection of blocks, called a file.

A tuple is stored in a record. The size is the sum of the sizes of the fields in the record. The schema is stored together with the records (or a pointer to it). It contains:

• Types of fields
• The fields within the record
• Size of the record
• Timestamps (last read, last updated)

The schema is used to access specific fields in the record. Records are stored in blocks. Typically a block contains only one kind of record.

4.5 Client-Server Systems

The principle is that the user has a client program, which requests information (or data) from the server program. The server searches the data and sends it back to the client. [1] Putting it another way, we can say that the user is the client: the user runs a client program to start a client process, which sends a message to a server (itself a server program) to perform a task or service. As a matter of fact, a client-server system is a special case of a co-operative computer system. All such systems are characterised by the use of multiple processes that work together to form the system solution. (There are two types of co-operative systems: client-server systems and peer-to-peer systems.)

A client-server system consists of three major components: a server with a relational database, a client with a user interface, and a network hardware connection in between. Client-server is an open system with a number of advantages such as interoperability, scalability, adaptability, affordability, data integrity, accessibility, performance and security.

WHAT DO THE CLIENT PROGRAMS DO?

They usually deal with:

• Managing the application's user interface
• Confirming (validating) the data given by the user
• Sending out the requests to server programs
• Managing local resources, like the monitor, keyboard and peripherals

The client-based process is the application that the user interacts with. It contains solution-specific logic and provides the interface between the user and the rest of the application system. In this sense the graphical user interface (GUI) is one characteristic of a client system. It uses tools such as:

• Administration Tool: for specifying the relevant server information, creation of users, roles and privileges, definition of file formats, document type definitions (DTD) and document status information.
• Template Editor: for creating and modifying templates of documents.
• Document Editor: for editing instances of documents and for accessing component information.
• Document Browser: for retrieval of documents from a Document Server.
• Access Tool: provides the basic methods for information access.

WHAT DOES THE SERVER DO?

Its purpose is to fulfil the client's requests. What servers do in general is:

• Receive requests
• Execute database retrievals and updates
• Manage data integrity
• Send the results back to the client
• Act as a software engine that manages shared resources like databases, printers, communication links, etc.

When the server does these, it can use simple or complex logic. It can supply the information using only its own resources, or it can employ some other machine on the network in a master-slave arrangement. [2]

In other words, there can be:

• Single client, single server
• Multiple clients, single server
• Single client, multiple servers
• Multiple clients, multiple servers [6]

In a client-server system, we can talk about specialization of some components for particular tasks. The specialization can provide very fast computation servers, high-throughput database servers, resilient transaction servers, or servers for some other purpose; what is important is that they are optimised for that task. It is not optimal to try to perform all of the tasks together. It is this specialization, and therefore optimization, that gives power to client-server systems. [10]

These different server types for different tasks can be collected into two categories:

1. Simpler ones:

• Disk server
• File server: the client can only request files, and the files are sent as they are; this needs large bandwidth and slows down the network.

2. Advanced ones:

• Database server
• Transaction server
• Application server

What Makes a Design a Client/Server Design?

There are many answers about what differentiates client/server architecture from some other design. There is no single correct answer, but generally, an accepted definition describes a client application as the user interface to an intelligent database engine, the server. Well-designed client applications do not hard-code details of how or where data is physically stored, fetched, and managed, nor do they perform low-level data manipulation. Instead, they communicate their data needs at a more abstract level, the server performs the bulk of the processing, and the result set isn't raw data but rather an intelligent answer.

Generally, a client/server application has these characteristics:

Centralized Intelligent Servers

That is, the server in a client/server system is more than just a file repository where multiple users share file sectors over a network. Client/server servers are intelligent, because they carry out commands in the form of Structured Query Language (SQL) questions and return answers in the form of result sets.

Network Connectivity

Generally, a server is connected to its clients by way of a network. This could be a LAN, WAN, or the Internet. However, most correctly designed client/server designs do not consume significant resources on a wire. In general, short SQL queries are sent from a client and a server returns brief sets of specifically focused data.

Operationally Challenged Workstations

A client computer in a client/server system need not run a particularly complex application. It must simply be able to display results from a query and capture criteria from a user for subsequent requests. While a client computer is often more capable than a dumb terminal, it need not house a particularly powerful processor, unless it must support other, more hardware-hungry applications.

Applications Leverage Server Intelligence

Client/server applications are designed with server intelligence in mind. They assume that the server will perform the heavy lifting when it comes to physical data manipulation. Then, all that is required of the client application is to ask an intelligent question, expecting an intelligent answer from the server. The client application's job is to display the results intelligently.

Shared Data, Procedures, Support, Hardware

A client/server system depends on the behavior of the clients (all of them) to properly share the server's, and the clients', resources. This includes data pages, stored procedures, and RAM as well as network capacity.

System Resource Use Minimized

All client/server systems are designed to minimize the load on the overall system. This permits an application to be scaled more easily, or at all.

Maximize Design Tradeoffs

A well-designed client/server application should be designed to minimize overall systems load, or designed for maximum throughput, or designed to avoid really bad response times; a good design may be able to balance all of these.

4.6 Logical and Structured Addresses

At the logical level, a structured document is made up of a number of different parts. Some parts are optional, others are compulsory. Many of these document structures have a required order: they cannot be inserted at arbitrary points in the document.

For example, a document must have a title and it must be the first element in the document. The programming example is very good because it is very simple. At the logical level, sections are used to break up the document into parts and sub-parts that help the reader to follow the structure of the document and to navigate their way through it.

Sections are made up of a section title, followed by one or more text blocks and then, optionally, one or more sub-sections. Sections are allowed in either a Chapter document (within chapters) or a Simple document, and in Appendices.

Sections can contain other sections, i.e. they may be nested. Only 4 levels of section nesting are recommended. Note that once you start entering nested (sub)sections, you cannot enter any text blocks after the sub-sections, i.e. all of the section's text blocks must come before any sub-sections.

Sections are automatically numbered. A level 1 section has one number, a level 2 section is numbered N.n, a level 3 section N.n.n and so on. If the section is contained within a chapter or appendix, the section number is prefixed with the chapter number or appendix number.

4.7 Record Modifications

The domain record modification/transfer process is the complete responsibility of the domain owner. If you need assistance modifying your domain record, please contact your domain registrar for technical support. ColossalHost.com will not provide excessive support resources assisting customers in the domain record modification process. It is important to understand that ColossalHost.com has no more power to make modifications to a Subscriber's domain record than a complete stranger would.
The domain name owner is completely responsible for the information (including the name servers) that is contained within the domain record.

4.8 Index Structures

Index structures in object-oriented database management systems should support selections not only with respect to physical object attributes, but also with respect to derived attributes. A simple example arises if we assume the object types Company, Division, and Employee, with the relationships has-division from Company to Division, and employs from Division to Employee. In this case the index structure should support queries for companies specifying the number of employees of the company.

4.9 Indexes on Sequential Files

An index on a sequential file is a data structure for locating records with a given search key efficiently. It also facilitates a full scan of a relation.
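One common way to locate a record in a sorted file is to keep a small index with one entry per block and binary-search it. The sketch below assumes integer keys and an index that stores, for each block, the key of its first record; the names index_entry and index_find_block are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One index entry: the first (anchor) key stored in a block, and the block's number. */
struct index_entry {
    int  anchor_key;
    long block_no;
};

/* Binary-search an index that is sorted by anchor_key.
   Returns the number of the only block that could hold `key`,
   or -1 if `key` is smaller than every anchor key. */
long index_find_block(const struct index_entry *idx, size_t n, int key)
{
    size_t lo = 0, hi = n;              /* the search window is [lo, hi) */
    assert(idx != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (idx[mid].anchor_key <= key)
            lo = mid + 1;               /* the answer is mid or to its right */
        else
            hi = mid;                   /* the answer is to the left of mid */
    }
    /* lo now equals the number of entries whose anchor_key is <= key */
    return lo == 0 ? -1 : idx[lo - 1].block_no;
}
```

After the block is found, the record itself is located by reading that one block and scanning it; a full scan of the relation simply reads the blocks in order.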

4.10 Secondary Indexes

Secondary indexes provide access to subsets of records. Both databases and tables provide automatic secondary indexes. All secondary indexes are held in a separate database file (.sr6). This is created when the first index is created and deleted if the last index is deleted. Each secondary index is physically very similar to a standard database. It contains index blocks and data blocks. The sizes of these blocks are calculated in a similar way to the block size calculations for standard database blocks, to ensure reasonably efficient processing given the size of the secondary index key and the maximum number of records of that type. Each index potentially has different block sizes. Each record in the data block in a secondary index has the secondary key as the key and contains the standard database key as the data. Thus the size of these data blocks is affected by the size of both keys.

All these index files (i.e., primary, secondary, and clustering indexes) are ordered files, and have two fields of fixed length. One field contains data (its value is the same as a field from the data file) and the other field is a pointer to the data file, but they differ as follows.

In primary indexes, the data item of the index has the value of the primary key of the first record (the anchor record) of the block to which the pointer points.

In secondary indexes, the data item of the index has the value of a secondary key, and the pointer points to a block in which a record with that secondary key is stored.

In clustering indexes, the data item of the index has the value of a non-key field, and the pointer points to the block in which the first record with that non-key field value is stored.

In primary index files, for every block in a data file, one record exists in the primary index file. The number of records in the index file is equal to the number of blocks in the data file. Hence, primary indexes are non-dense.

In secondary index files, for every record in the data file, one record exists in the secondary index file. The number of records in the secondary file is equal to the number of records in the data file. Hence, secondary indexes are dense.

In clustering index files, for every distinct value of the clustering field, one record exists in the clustering index file. The number of records in the clustering index is equal to the number of distinct values of the clustering field in the data file. Hence, clustering indexes are non-dense.

The Customers table holds information on customers, such as their customer number, name and address. Run the Database Desktop program from the Start Menu or select Tools > Database Desktop in Delphi. Open the Customers table copied in the previous step. By default the data in the table is displayed as read-only. Familiarise yourself with the data. Change to edit mode (Table > Edit Data) and add a new record. View the structure of the table (Table > Info Structure). Select Table > Restructure to restructure the table. Add a secondary index to the table by selecting Secondary Indexes from the Table properties combo box. The secondary index is composed of LastName and FirstName, in that order. Call the index CustomersNameIndex. The index will be used to access the Customers table on customer name.
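The dense versus non-dense distinction drawn earlier in this section can be made concrete with a little arithmetic. The sketch below counts index entries under the rules stated there (one entry per block for a primary index, one per record for a secondary index on a key field). The function names and the figures used in the note that follows are assumptions for illustration only.

```c
#include <assert.h>

/* Number of blocks needed to hold `records` records at `per_block` records per block. */
long blocks_needed(long records, long per_block)
{
    assert(records >= 0 && per_block > 0);
    return (records + per_block - 1) / per_block;   /* ceiling division */
}

/* Entries in a primary (non-dense) index: one per data block. */
long primary_index_entries(long records, long per_block)
{
    return blocks_needed(records, per_block);
}

/* Entries in a secondary (dense) index on a key field: one per data record. */
long secondary_index_entries(long records)
{
    assert(records >= 0);
    return records;
}
```

For a hypothetical file of 30,000 records stored 10 to a block, primary_index_entries(30000, 10) gives 3,000 entries, while secondary_index_entries(30000) gives 30,000. A clustering index would have one entry per distinct value of the clustering field, which depends on the data rather than on the block size.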
4.11 B-trees

A B-tree is a specialized multiway tree designed especially for use on disk. In a B-tree each node may contain a large number of keys. The number of subtrees of each node, then, may also be large. A B-tree is designed to branch out in this large number of directions and to contain a lot of keys in each node so that the height of the tree is relatively small. This means that only a small number of nodes must be read from disk to retrieve an item. The goal is to get fast access to the data, and with disk drives this means reading a very small number of records. Note that a large node size (with lots of keys in the node) also fits with the fact that with a disk drive one can usually read a fair amount of data at once.

A multiway tree of order m is an ordered tree where each node has at most m children. For each node, if k is the actual number of children in the node, then k - 1 is the number of keys in the node. If the keys and subtrees are arranged in the fashion of a search tree, then this is called a multiway search tree of order m. For example, the following is a multiway search tree of order 4. Note that the first row in each node shows the keys, while the second row shows the pointers to the child nodes. Of course, in any useful application there would be a record of data associated with each key, so that the first row in each node might be an array of records where each record contains a key and its associated data. Another approach would be to have the first row of each node contain an array of records where each record contains a key and a record number for the associated data record, which is found in another file. This last method is often used when the data records are large. The example software will use the first method.

What does it mean to say that the keys and subtrees are "arranged in the fashion of a search tree"? Suppose that we define our nodes as follows:

    typedef struct
    {
        int Count;          // number of keys stored in the current node
        ItemType Key[3];    // array to hold the 3 keys
        long Branch[4];     // array of fake pointers (record numbers)
    } NodeType;

Then a multiway search tree of order 4 has to fulfill the following conditions related to the ordering of the keys:

• The keys in each node are in ascending order.
• At every given node (call it Node) the following is true:
  o The subtree starting at record Node.Branch[0] has only keys that are less than Node.Key[0].
  o The subtree starting at record Node.Branch[1] has only keys that are greater than Node.Key[0] and at the same time less than Node.Key[1].
  o The subtree starting at record Node.Branch[2] has only keys that are greater than Node.Key[1] and at the same time less than Node.Key[2].
  o The subtree starting at record Node.Branch[3] has only keys that are greater than Node.Key[2].
• Note that if fewer than the full number of keys are in the Node, these 4 conditions are truncated so that they speak of the appropriate number of keys and branches.

This generalizes in the obvious way to multiway search trees with other orders.

A B-tree of order m is a multiway search tree of order m such that:

• All leaves are on the bottom level.
• All internal nodes (except the root node) have at least ceil(m / 2) (nonempty) children.
• The root node can have as few as 2 children if it is an internal node, and can obviously have no children if the root node is a leaf (that is, the whole tree consists only of the root node).
• Each leaf node must contain at least ceil(m / 2) - 1 keys.

Note that ceil(x) is the so-called ceiling function. Its value is the smallest integer that is greater than or equal to x. Thus ceil(3) = 3, ceil(3.35) = 4, ceil(1.98) = 2, ceil(5.01) = 6, ceil(7) = 7, etc.

A B-tree is a fairly well-balanced tree by virtue of the fact that all leaf nodes must be at the bottom. Condition (2) tries to keep the tree fairly bushy by insisting that each node have at least half the maximum number of children. This causes the tree to "fan out" so that the path from root to leaf is very short even in a tree that contains a lot of data.

Example B-Tree

The following is an example of a B-tree of order 5. This means that (other than the root node) all internal nodes have at least ceil(5 / 2) = ceil(2.5) = 3 children (and hence at least 2 keys). Of course, the maximum number of children that a node can have is 5 (so that 4 is the maximum number of keys). According to condition 4, each leaf node must contain at least 2 keys. In practice B-trees usually have orders a lot bigger than 5.
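The occupancy limits worked out above for order 5 follow directly from the B-tree conditions and can be computed for any order. The helper names below are hypothetical; the only trick is that ceil(m / 2) for a positive integer m can be written with integer arithmetic as (m + 1) / 2.

```c
#include <assert.h>

/* ceil(m / 2) for a positive integer m, using integer arithmetic only. */
int half_ceil(int m)
{
    assert(m > 0);
    return (m + 1) / 2;
}

/* Minimum number of children of an internal node other than the root. */
int btree_min_children(int m)
{
    return half_ceil(m);
}

/* Minimum number of keys in a leaf, or in an internal non-root node. */
int btree_min_keys(int m)
{
    return half_ceil(m) - 1;
}

/* Maximum number of keys in any node: one fewer than the maximum number of children. */
int btree_max_keys(int m)
{
    assert(m > 1);
    return m - 1;
}
```

For order 5 these give 3 children, 2 keys and 4 keys respectively, matching the figures in the example.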

4.12 Hash Tables

Linked lists are handy ways of tying data structures together, but navigating linked lists can be inefficient. If you were searching for a particular element, you might easily have to look at the whole list before you find the one that you need. Linux uses another technique, hashing, to get around this restriction. A hash table is an array or vector of pointers. An array, or vector, is simply a set of things coming one after another in memory. A bookshelf could be said to be an array of books. Arrays are accessed by an index; the index is an offset into the array. Taking the bookshelf analogy a little further, you could describe each book by its position on the shelf; you might ask for the 5th book.

A hash table is an array of pointers to data structures, and its index is derived from information in those data structures. If you had data structures describing the population of a village then you could use a person's age as an index. To find a particular person's data you could use their age as an index into the population hash table and then follow the pointer to the data structure containing the person's details. Unfortunately many people in the village are likely to have the same age, and so the hash table pointer becomes a pointer to a chain or list of data structures, each describing people of the same age. However, searching these shorter chains is still faster than searching all of the data structures.

As a hash table speeds up access to commonly used data structures, Linux often uses hash tables to implement caches. Caches hold handy information that needs to be accessed quickly and are usually a subset of the full set of information available. Data structures are put into a cache and kept there because the kernel often accesses them. There is a drawback to caches in that they are more complex to use and maintain than simple linked lists or hash tables. If the data structure can be found in the cache (this is known as a cache hit), then all well and good. If it cannot, then all of the relevant data structures must be searched and, if the data structure exists at all, it must be added into the cache.

4.13 Self Test

1. Hash Tables (G&T Section 8.3, page 341)

   R-8.4 Draw the 11-item hash table that results from using the hash function h(i) = (2i + 5) mod 11 to hash the keys 12, 44, 13, 88, 23, 94, 11, 39, 20, 16 and 5, assuming collisions are handled by chaining.

   R-8.5 What is the result of the previous exercise, assuming collisions are handled by linear probing?

   R-8.7 What is the result of Exercise R-8.4 assuming collisions are handled by double hashing using a secondary hash function h'(k) = 7 - (k mod 7)?

2. "B-Trees are the bread and butter of the database world: Virtually every database is implemented internally as some variant of B-Trees." Comment on this statement.

3. What are the differences among primary, secondary, and clustering indexes? How do these differences affect the ways in which these indexes are implemented? Which of these indexes are dense, and which are not?

4. How does multilevel indexing improve the efficiency of searching an index file?

5. Do detailed procedures exist for at least the following operations:

   • Record creation?
   • Record modification?
   • Record duplication?
   • Record destruction?
   • Consistent quality control?
   • Problem resolution?

6. What is a client-server system? What do the client and the server do? Describe the client/server architecture, using the World Wide Web as an example. What is client/server computing?
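For checking an answer to exercise R-8.4, the chaining scheme described in Section 4.12 can be sketched as follows. The hash function is the one given in the exercise; everything else (the names chain_node, chain_insert, chain_contains and chain_length, and the choice to insert at the head of each chain) is an assumption made for this illustration, and the order of items within a chain in a drawn answer may legitimately differ.

```c
#include <assert.h>
#include <stdlib.h>

#define TABLE_SIZE 11

/* One link in a bucket's chain. */
struct chain_node {
    int key;
    struct chain_node *next;
};

/* Hash function from exercise R-8.4; assumes non-negative keys. */
int hash_key(int key)
{
    assert(key >= 0);
    return (2 * key + 5) % TABLE_SIZE;
}

/* Insert `key` at the head of its bucket's chain.
   Returns 0 on success, -1 if memory could not be allocated. */
int chain_insert(struct chain_node **table, int key)
{
    struct chain_node *n = malloc(sizeof *n);
    int b = hash_key(key);
    if (n == NULL)
        return -1;
    n->key = key;
    n->next = table[b];
    table[b] = n;
    return 0;
}

/* Return 1 if `key` is stored in the table, 0 otherwise. */
int chain_contains(struct chain_node **table, int key)
{
    const struct chain_node *n;
    for (n = table[hash_key(key)]; n != NULL; n = n->next)
        if (n->key == key)
            return 1;
    return 0;
}

/* Number of keys stored in bucket `b`. */
int chain_length(struct chain_node **table, int b)
{
    const struct chain_node *n;
    int len = 0;
    assert(b >= 0 && b < TABLE_SIZE);
    for (n = table[b]; n != NULL; n = n->next)
        len++;
    return len;
}
```

A lookup only walks the chain of one bucket, which is the point made in the village example: the chains are much shorter than the full list of data structures. The sketch never frees the nodes; a longer-lived program would need a matching clean-up routine.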


UNIT 5

RELATIONAL MODEL

5.1 Introduction
5.2 Objectives
5.3 Concepts of a relational model
5.4 Formal definition of a relation
5.5 The Codd commandments
5.6 Summary

5.1 INTRODUCTION

One of the main advantages of the relational model is that it is conceptually simple and, more importantly, based on the mathematical theory of relations. It also frees the users from details of storage structure and access methods.

The relational model, like all other models, consists of three basic components:

• A set of domains and a set of relations
• Operations on relations
• Integrity rules

5.2 OBJECTIVES

After completing this unit, you will be able to:

• Define the concepts of the relational model
• Discuss the basic operations of the relational algebra
• State the integrity rules

5.3 CONCEPTS OF A RELATIONAL MODEL

The relational model was formally introduced by Dr. E. F. Codd in 1970 and has evolved since then, through a series of writings. The model provides a simple, yet rigorously defined, concept of how users perceive data. The relational model represents data in the form of two-dimensional tables. Each table represents some real-world person, place, thing, or event about which information is collected. A relational database is a collection of two-dimensional tables. The organization of data into relational tables is known as the logical view of the database. That is, it is the form in which a relational database presents data to the user and the programmer. The way the database software physically stores the data on a computer disk system is called the internal view. The internal view differs from product to product and does not concern us here.

A basic understanding of the relational model is necessary to effectively use relational database software such as Oracle, Microsoft SQL Server, or even personal database systems such as Access or Fox, which are based on the relational model.

E. F. Codd of IBM propounded the relational model in 1970. The basic concept in the relational model is that of a relation.

A relation can be viewed as a table, which has the following properties:

Property 1: It is column homogeneous. In other words, in any given column of a table, all items are of the same kind.

Property 2: Each item is a simple number or a character string. That is, a table must be in 1NF (First Normal Form), which will be introduced in Unit 6.

Property 3: All rows of a table are distinct.

Property 4: The ordering of rows within a table is immaterial.

Property 5: The columns of a table are assigned distinct names and the ordering of these columns is immaterial.

Example of a valid relation:

    S#    P#    SCITY
    10    1     BANGALORE
    10    2     BANGALORE
    11    1     BANGALORE
    11    2     BANGALORE

5.4 FORMAL DEFINITION OF A RELATION

Relationships are equally as important to an object-oriented model as to a relational model. Following are some issues related to relationships that you should keep in mind when designing a schema.

Although there are no specific recommendations as to how to handle many of the relationship issues in object-oriented modeling, it is usually possible to come up with a reasonable, workable solution if you design with these issues in mind.

In the relational model, a database is a collection of relational tables. A relational table is a flat file composed of a set of named columns and an arbitrary number of unnamed rows. The columns of the tables contain information about the table. The rows of the table represent occurrences of the "thing" represented by the table. A data value is stored in the intersection of a row and column. Each named column has a domain, which is the set of values that may appear in that column. Fig. shows the relational tables for a simple bibliographic database that stores information about book titles, authors, and publishers.

There are alternate names used to describe relational tables. Some manuals use the terms tables, fields, and records to describe relational tables, columns, and rows, respectively. The formal literature tends to use the mathematical terms relations, attributes, and tuples. Figure 2 summarizes these naming conventions.

    In This Document    Formal Terms    Many Database Manuals
    Relational Table    Relation        Table
    Column              Attribute       Field
    Row                 Tuple           Record

Relational tables can be expressed concisely by eliminating the sample data and showing just the table name and the column names. For example:

    AUTHOR (au_id, au_lname, au_fname, address, city, state, zip)
    TITLE (title_id, title, type, price, pub_id)
    PUBLISHER (pub_id, pub_name, city)
    AUTHOR_TITLE (au_id, title_id)

Properties of Relational Tables

Relational tables have six properties:

• Values Are Atomic: This property implies that columns in a relational table are not repeating groups or arrays. Such tables are referred to as being in the "first normal form" (1NF). The atomic value property of relational tables is important because it is one of the cornerstones of the relational model. The key benefit of the one-value property is that it simplifies data manipulation logic.
• Column Values Are of the Same Kind: In relational terms this means that all values in a column come from the same domain. A domain is a set of values which a column may have. For example, a Monthly_Salary column contains only specific monthly salaries. It never contains other information such as comments, status flags, or even weekly salary.
• Each Row is Unique: This property ensures that no two rows in a relational table are identical; there is at least one column, or set of columns, the values of which uniquely identify each row in the table. Such columns are called primary keys and are discussed in more detail in Relationships and Keys.
• The Sequence of Columns is Insignificant: This property states that the ordering of the columns in the relational table has no meaning. Columns can be retrieved in any order and in various sequences. The benefit of this property is that it enables many users to share the same table without concern for how the table is organized. It also permits the physical structure of the database to change without affecting the relational tables.
• The Sequence of Rows is Insignificant: This property is analogous to the one above but applies to rows instead of columns. The main benefit is that the rows of a relational table can be retrieved in different orders and sequences. Adding information to a relational table is simplified and does not affect existing queries.
• Each Column Has a Unique Name: Because the sequence of columns is insignificant, columns must be referenced by name and not by position. In general, a column name need not be unique within an entire database but only within the table to which it belongs.

5.5 THE CODD COMMANDMENTS

In the most basic of definitions a DBMS can be regarded as relational only if it obeys the following three rules:

• All information must be held in tables.
• Retrieval of the data must be possible using the following types of operations: SELECT, JOIN and PROJECT.
• All relationships between data must be represented explicitly in that data itself.

To define the requirement more rigorously, E. F. Codd formulated the 12 rules as follows.

Rule 1: The information rule

All information is explicitly and logically represented in exactly one way, by data values in tables.

In simple terms this means that if an item of data doesn't reside somewhere in a table in the database then it doesn't exist, and this should be extended to the point where even such information as table, view and column names, to mention just a few, should be contained somewhere in table form. This necessitates the provision of such facilities that allow relatively easy additions to the RDBMS of programming and CASE tools, for example. This rule serves on its own to invalidate the claims of several databases to be relational, simply because of their lack of ability to store dictionary items (or indeed metadata) in an integrated, relational form. Commonly such products implement their dictionary information systems in some native file structure, and thus set themselves up for failing at the first hurdle.

Rule 2: The rule of guaranteed access

Every item of data must be logically addressable by resorting to a combination of table name, primary key value and column name.

Whilst it is possible to retrieve individual items of data in many different ways, especially in a relational/SQL environment, it must be true that any item can be retrieved by supplying the table name, the primary key value of the row holding the item and the column name in which it is to be found. If you think back to the table-like storage structure, this rule is saying that at the intersection of a column and a row you will necessarily find one value of a data item (or null).

Rule 3: The systematic treatment of null values

It may surprise you to see this subject on the list of properties, but it is fundamental to the DBMS that null values are supported in the representation of missing and inapplicable information. This support for null values must be consistent throughout the DBMS, and independent of data type (a null value in a CHAR field must mean the same as null in an INTEGER field, for example).
Ithasoftenbeenthecaseinotherproducttypesthatacharactertorepresentmissingorinapplicable data has been allocated from the domain of characters pertinenttoa particularitem.Wemayfor exampledefinefourpermissiblevaluesforacolumnSEXas: MMale FFemale XNodataavailable YNotapplicable Such a solution requires careful design, and must decrease production at the very least. This situation is particularly undesirable when very highlevel languages such as SQL are used to manipulatesuchdata,andifsuchasolutionisusedfornumericcolumnsallsortsofproblemscan ariseduringaggregatefunctionssuchasSUMandAVERAGEetc.
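The problem with sentinel values in numeric columns can be seen in a small experiment. The sketch below uses Python's built-in sqlite3 module; the table names marks_null and marks_sentinel and the sentinel value -1 are assumptions made only for this illustration. A true null is skipped by the aggregate functions, whereas the sentinel is treated as ordinary data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables: one records a missing mark as NULL,
# the other records it with the made-up sentinel value -1.
conn.execute("CREATE TABLE marks_null (mark INTEGER)")
conn.execute("CREATE TABLE marks_sentinel (mark INTEGER)")
conn.executemany("INSERT INTO marks_null VALUES (?)", [(10,), (20,), (None,)])
conn.executemany("INSERT INTO marks_sentinel VALUES (?)", [(10,), (20,), (-1,)])


def sum_and_avg(table):
    """Return (SUM, AVG) of the mark column of one of the two tables above."""
    # The table name comes from our own fixed strings, never from user input.
    return conn.execute(f"SELECT SUM(mark), AVG(mark) FROM {table}").fetchone()


null_sum, null_avg = sum_and_avg("marks_null")              # NULL is ignored
sentinel_sum, sentinel_avg = sum_and_avg("marks_sentinel")  # -1 counts as data

print("with NULL:    ", null_sum, null_avg)
print("with sentinel:", sentinel_sum, sentinel_avg)
```

With the sentinel, every query that aggregates the column has to remember to exclude -1 explicitly, which is exactly the extra design burden described above.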

Rule 4: The database description rule

A description of the database is held and maintained using the same logical structures used to define the data, thus allowing users with appropriate authority to query such information in the same ways and using the same languages as they would any other data in the database.
Put into easy terms, Rule 4 means that there must be a data dictionary within the RDBMS that is constructed of tables and/or views that can be examined using SQL. This rule states therefore that a dictionary is mandatory, and if taken in conjunction with Rule 1, there can be no doubt that the dictionary must also consist of combinations of tables and views.

Rule 5: The comprehensive sub-language rule
There must be at least one language whose statements can be expressed as character strings conforming to some well-defined syntax, that is comprehensive in supporting the following:
- Data definition
- View definition
- Data manipulation
- Integrity constraints
- Authorization
- Transaction boundaries
Again in real terms, this means that the RDBMS must be completely manageable through its own dialect of SQL, although some products still support SQL-like languages (Ingres support of Quel, for example). This rule also sets out to scope the functionality of SQL: you will detect an implicit requirement to support access control, integrity constraints and transaction management facilities, for example.

Rule 6: The view-updating rule
All views that can be updated in theory can also be updated by the system.
This is quite a difficult rule to interpret, and so a word of explanation is required. Whilst it is possible to create views in all sorts of illogical ways, and with all sorts of aggregates and virtual columns, it is obviously not possible to update through some of them. As a very simple example, if you define a virtual column in a view as A*B where A and B are columns in a base table, then how can you perform an update on that virtual column directly? The database cannot possibly break down any number supplied into its two component parts without more information being supplied. To delve a little deeper, we should consider that views can be defined in terms of both tables and other views. Particular vendors restrict the complexity of their own implementations, in some cases quite drastically.
Even in logical terms it is often incredibly difficult to tell whether a view is theoretically updateable, let alone delve into the practicalities of actually doing so. In fact there exists another set of rules that, when applied to a view, can be used to determine its level of logical complexity, and it is only realistic to apply Rule 6 to those views that are defined as simple by such criteria.

Rule 7: The insert, update and delete rule
An RDBMS must do more than just be able to retrieve relational data sets. It has to be capable of inserting, updating and deleting data as a relational set. Many RDBMSs that fail the grade fall back to a single-record-at-a-time procedural technique when it comes time to manipulate data.

Rule 8: The physical independence rule
User access to the database, via monitors or application programs, must remain logically consistent whenever changes are made to the storage representation of, or access methods to, the data.
Therefore, by way of an example, if an index is built or destroyed by a DBA on a table, any user should retrieve the same data from that table, albeit a little more slowly. It is this rule that demands the clear distinction between the logical and physical layers of the database. Applications must be limited to interfacing with the logical layer to enable the enforcement of this rule, and it is this rule that sorts out the men from the boys in the relational marketplace. Looking at other architectures already discussed, one can imagine the consequences of changing the physical structure of a network or hierarchical system.
However, there are plenty of traps waiting even in the relational world. Consider the application designer who depends on the presence of a B-tree type index to ensure retrieval of data is in a predefined order, only to find that the DBA dynamically drops the index. The removal of such an index might be catastrophic. I point out these issues because although they are serious factors, I am not convinced that they constitute the breaking of this rule; it is for the individual to make up his own mind.

Rule 9: The logical data independence rule
Application programs must be independent of changes made to the base tables.

Fig: TAB1 split into two fragments

This rule allows many types of database design change to be made dynamically, without users being aware of them. To illustrate the meaning of the rule, the figures show two types of activity, described in more detail later, that should be possible if this rule is enforced.


Fig: Two fragments combined into one table

Firstly, it should be possible to split a table vertically into more than one fragment, as long as such splitting preserves all the original data (is non-loss) and maintains the primary key in each and every fragment. This means in simple terms that a single table should be divisible into two or more other tables.
Secondly, it should be possible to combine base tables into one by way of a non-loss join. Note that if such changes are made, then views will be required so that users and applications are unaffected by them.

Rule 10: Integrity rules
The relational model includes two general integrity rules. These integrity rules implicitly or explicitly define the set of consistent database states, or changes of state, or both. Other integrity constraints can be specified, for example, in terms of dependencies during database design. In this section we define the integrity rules formulated by Codd.

Integrity Rule 1
Integrity rule 1 is concerned with primary key values. Before we formally state the rule, let us look at the effect of null values in prime attributes. A null value for an attribute is a value that is either not known at the time or does not apply to a given instance of the object. It may also be possible that a particular tuple does not have a value for an attribute; this fact could be represented by a null value.
If any attribute of a primary key (prime attribute) were permitted to have null values, then because the attributes in the key must be non-redundant, the key cannot be used for unique identification of tuples. This contradicts the requirements for a primary key. Consider the relation P in the figure below. The attribute Id is the primary key for P.

P:
Id    Name
101   Jones
103   Smith
104   Lalonde
107   Evan
110   Drew
112   Smith
(a)

P:
Id    Name
101   Jones
@     Smith
104   Lalonde
107   Evan
110   Drew
@     Lalonde
@     Smith
(b)

Fig: (a) Relation without null values and (b) relation with null values (@ denotes a null)

Integrity rule 1 specifies that instances of the entities are distinguishable and thus no prime attribute (component of a primary key) value may be null. This rule is also referred to as the entity rule. We could state this rule formally as:

Definition: Integrity Rule 1 (Entity Integrity):
If the attribute A of relation R is a prime attribute of R, then A cannot accept null values.

Integrity Rule 2 (Referential Integrity):
Integrity rule 2 is concerned with foreign keys, i.e., with attributes of a relation having domains that are those of the primary key of another relation.
A relation (R) may contain references to another relation (S). Relations R and S need not be distinct. Suppose the reference in R is via a set of attributes that forms a primary key of the relation S. This set of attributes in R is a foreign key. A valid relationship between a tuple in R and one in S requires that the values of the attributes in the foreign key of R correspond to the primary key of a tuple in S. This ensures that the reference from the tuple of the relation R is made unambiguously to an existing tuple in the S relation. The referencing attribute(s) in the R relation can have null value(s); in this case, it is not referencing any tuple in the S relation. However, if the value is not null, it must exist as the primary key value of a tuple of the S relation. If the referencing attribute in R has a value that is non-existent in S, R is attempting to refer to a non-existent tuple and hence a non-existent instance of the corresponding entity. This cannot be allowed.

Rule 11: Distribution independence rule
An RDBMS must have distribution independence.
This is one of the more attractive aspects of RDBMSs. Database systems built on the relational framework are well suited to today's client/server database design.
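Returning to the two integrity rules of Rule 10, the following is a minimal sketch of how they could be checked on small in-memory relations. It is not how any particular RDBMS enforces them. Nulls are modelled as None, the relation P comes from figure (b), and the ORDERS relation with its OrderNo and Id attributes is a hypothetical referencing relation invented for the illustration.

```python
def entity_integrity_violations(rows, key_attrs):
    """Return the rows whose primary key has a null (None) component."""
    return [r for r in rows if any(r[a] is None for a in key_attrs)]


def dangling_references(referencing, fk_attrs, referenced, pk_attrs):
    """Return referencing rows whose non-null foreign key matches no primary key."""
    existing = {tuple(s[a] for a in pk_attrs) for s in referenced}
    dangling = []
    for r in referencing:
        fk = tuple(r[a] for a in fk_attrs)
        if any(v is None for v in fk):
            continue  # a null foreign key references nothing, which is allowed
        if fk not in existing:
            dangling.append(r)
    return dangling


# Relation P from figure (b); None stands for the null marker @.
p_with_nulls = [
    {"Id": 101, "Name": "Jones"}, {"Id": None, "Name": "Smith"},
    {"Id": 104, "Name": "Lalonde"}, {"Id": 107, "Name": "Evan"},
    {"Id": 110, "Name": "Drew"}, {"Id": None, "Name": "Lalonde"},
    {"Id": None, "Name": "Smith"},
]
bad_keys = entity_integrity_violations(p_with_nulls, ["Id"])

# Relation P from figure (a), referenced by the hypothetical ORDERS relation.
p = [{"Id": i} for i in (101, 103, 104, 107, 110, 112)]
orders = [
    {"OrderNo": 1, "Id": 101},   # valid reference
    {"OrderNo": 2, "Id": None},  # null reference: allowed
    {"OrderNo": 3, "Id": 999},   # refers to a non-existent tuple of P
]
bad_refs = dangling_references(orders, ["Id"], p, ["Id"])

print(len(bad_keys), "tuples break entity integrity")
print("dangling references:", bad_refs)
```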
Rule 12: No subversion rule
If an RDBMS supports a lower-level language that permits, for example, row-at-a-time processing, then this language must not be able to bypass any integrity rules or constraints of the relational language. Thus, not only must an RDBMS be governed by relational rules, but also those rules must be its primary laws.

The beauty of the relational database is that the concepts that define it are few, easy to understand and explicit. The 12 rules explained can be used as the basic relational design criteria, and as such are clear indications of the purity of the relational concept.

5.6 SELF TEST
1) Define a relational database system. Where do we use an RDBMS?
2) Explain the different types of RDBMS.
3) Explain Codd's rules with the help of suitable examples.

5.7 SUMMARY
• One of the main advantages of the relational model is that it is conceptually simple and, more importantly, based on the mathematical theory of relations. It also frees the users from details of storage structure and access methods.
• The model provides a simple, yet rigorously defined, concept of how users perceive data. The relational model represents data in the form of two-dimensional tables. Each table represents some real-world person, place, thing, or event about which information is collected. A relational database is a collection of two-dimensional tables.
• Relationships are as important to an object-oriented model as to a relational model. Following are some issues related to relationships that you should keep in mind when designing a schema.

UNIT 6

NORMALIZATION

6.1 Functional dependency
6.2 Normalization
6.2.1 First normal form
6.2.2 Second normal form
6.2.3 Third normal form
6.2.4 Boyce-Codd normal form
6.2.5 Multi-valued dependency
6.2.6 Fifth normal form
6.3 Self test
6.4 Summary

6.1 Functional Dependency

Consider a relation R that has two attributes A and B. The attribute B of the relation is functionally dependent on the attribute A if and only if for each value of A no more than one value of B is associated. In other words, the value of attribute A uniquely determines the value of B, and if there were several tuples that had the same value of A then all these tuples would have an identical value of attribute B. That is, if t1 and t2 are two tuples in the relation R and t1(A) = t2(A), then we must have t1(B) = t2(B).

A and B need not be single attributes. They could be any subsets of the attributes of a relation R (possibly single attributes). We may then write

R.A -> R.B

if B is functionally dependent on A (or A functionally determines B). Note that functional dependency does not imply a one-to-one relationship between A and B, although a one-to-one relationship may exist between A and B.

A simple example of the above functional dependency is when A is a primary key of an entity (e.g. student number) and B is some single-valued property or attribute of the entity (e.g. date of birth). A -> B then must always hold. (Why?)

Functional dependencies also arise in relationships. Let C be the primary key of an entity and D be the primary key of another entity. Let the two entities have a relationship. If the relationship is one-to-one, we must have C -> D and D -> C. If the relationship is many-to-one, we would have C -> D but not D -> C. For many-to-many relationships, no functional dependencies hold. For example, if C is student number and D is subject number, there is no functional dependency between them. If, however, we were storing marks and grades in the database as well, we would have

(student_number, subject_number) -> marks

and we might have

marks -> grades

The second functional dependency above assumes that the grades are dependent only on the marks. This may sometimes not be true, since the instructor may decide to take other considerations into account in assigning grades, for example the class average mark.

For example, in the student database that we have discussed earlier, we have the following functional dependencies:

sno -> sname
sno -> address
cno -> cname
cno -> instructor
instructor -> office

These functional dependencies imply that there can be only one name for each sno, only one address for each student and only one subject name for each cno. It is of course possible that several students may have the same name and several students may live at the same address. If we consider cno -> instructor, the dependency implies that no subject can have more than one instructor (perhaps this is not a very realistic assumption). Functional dependencies therefore place constraints on what information the database may store. In the above example, one may be wondering if the following FDs hold:

sname -> sno
cname -> cno

Certainly there is nothing in the instance of the example database presented above that contradicts the above functional dependencies. However, whether the above FDs hold or not would depend on whether the university or college whose database we are considering allows duplicate student names and subject names. If it was the enterprise policy to have unique subject names, then cname -> cno holds. If duplicate student names are possible, and one would think there always is the possibility of two students having exactly the same name, then sname -> sno does not hold.

Functional dependencies arise from the nature of the real world that the database models. Often A and B are facts about an entity, where A might be some identifier for the entity and B some characteristic. Functional dependencies cannot be automatically determined by studying one or more instances of a database. They can be determined only by a careful study of the real world and a clear understanding of what each attribute means.
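The last point can be made concrete with a short sketch. The function below tests whether an FD is satisfied by one particular instance; the sample student rows are invented for the illustration. An instance can refute an FD, but it can never prove that the FD holds in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for each lhs value.
        if seen.setdefault(left, right) != right:
            return False
    return True


# Hypothetical instance in which all student names happen to be distinct.
students = [
    {"sno": 1, "sname": "Jones"},
    {"sno": 2, "sname": "Smith"},
    {"sno": 3, "sname": "Lalonde"},
]
holds_before = fd_holds(students, ["sname"], ["sno"])

# A second Smith enrols: sname -> sno is refuted, sno -> sname is not.
students.append({"sno": 4, "sname": "Smith"})
holds_after = fd_holds(students, ["sname"], ["sno"])
sno_determines_sname = fd_holds(students, ["sno"], ["sname"])

print(holds_before, holds_after, sno_determines_sname)
```

The first result only says that the instance at that moment did not contradict sname -> sno; the later insertion shows why such an observation cannot replace knowledge of the enterprise rules.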
We have noted above that the definition of functional dependency does not require that A and B be single attributes. In fact, A and B may be collections of attributes. For example

(sno, cno) -> (mark, date)

When dealing with a collection of attributes, the concept of full functional dependence is an important one. Let A and B be distinct collections of attributes from a relation R and let R.A -> R.B. B is then fully functionally dependent on A if B is not functionally dependent on any proper subset of A. The above example of students and subjects would show full functional dependence if mark and date are not functionally dependent on either student number (sno) or subject number (cno) alone. This implies that we are assuming that a student may have more than one subject and a subject would be taken

by many different students. Furthermore, it has been assumed that there is at most one enrolment of each student in the same subject.

The above example illustrates full functional dependence. However, the following dependence

(sno, cno) -> instructor

is not a full functional dependence because cno -> instructor holds.

As noted earlier, the concept of functional dependency is related to the concept of candidate key of a relation, since a candidate key of a relation is an identifier which uniquely identifies a tuple and therefore determines the values of all other attributes in the relation. Therefore, if any subset X of the attributes of a relation R satisfies the property that all remaining attributes of the relation are functionally dependent on it (that is, on X), then X is a candidate key as long as no attribute can be removed from X and still satisfy the property of functional dependence. In the example above, the attributes (sno, cno) form a candidate key (and the only one) since they functionally determine all the remaining attributes.

Functional dependence is an important concept and a large body of formal theory has been developed about it. We discuss the concept of closure that helps us derive all functional dependencies that are implied by a given set of dependencies. Once a complete set of functional dependencies has been obtained, we will study how these may be used to build normalised relations.
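As a preview of the closure idea, here is a minimal sketch that computes the closure of a set of attributes under a set of FDs and uses it to test for a candidate key. It assumes, purely for illustration, that all the FDs listed in this section belong to one relation over the attributes sno, sname, address, cno, cname, instructor, office, mark and date.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_candidate_key(attrs, all_attrs, fds):
    """A candidate key determines everything and has no removable attribute."""
    attrs = set(attrs)
    if closure(attrs, fds) != set(all_attrs):
        return False
    return all(closure(attrs - {a}, fds) != set(all_attrs) for a in attrs)


# FDs taken from the text, written as (left side, right side) pairs.
FDS = [
    (("sno",), ("sname",)),
    (("sno",), ("address",)),
    (("cno",), ("cname",)),
    (("cno",), ("instructor",)),
    (("instructor",), ("office",)),
    (("sno", "cno"), ("mark", "date")),
]
ALL = {"sno", "sname", "address", "cno", "cname",
       "instructor", "office", "mark", "date"}

cno_closure = closure({"cno"}, FDS)
key_ok = is_candidate_key({"sno", "cno"}, ALL, FDS)
superkey_only = is_candidate_key({"sno", "cno", "mark"}, ALL, FDS)

print(sorted(cno_closure))
print(key_ok, superkey_only)
```

Note that the closure of cno alone already contains instructor, which is why (sno, cno) -> instructor is not a full functional dependence.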

Rules About Functional Dependencies
• Let F be a set of FDs specified on R.
• We must be able to reason about the FDs in F:
  o The schema designer usually explicitly states only the FDs which are obvious.
  o Without knowing exactly what all the tuples are, we must be able to deduce other/all FDs that hold on R.
  o This is essential when we discuss the design of “good” relational schemas.

Design of Relational Database Schemas
Problems such as redundancy that occur when we try to cram too much into a single relation are called anomalies. The principal kinds of anomalies that we encounter are:
• Redundancy. Information may be repeated unnecessarily in several tuples.
• Update Anomalies. We may change information in one tuple but leave the same information unchanged in another.
• Deletion Anomalies. If a set of values becomes empty, we may lose other information as a side effect.

6.2 Normalization

When designing a database, a data model is usually translated into a relational schema. The important question is whether there is a design methodology or whether the process is arbitrary. The simple answer is that a methodology does exist. There are certain properties that a good database design must possess, as dictated by Codd’s rules. There are many different ways of designing a good database. One such methodology is the method involving ‘Normalization’. Normalization theory is built around the concept of normal forms. Normalization reduces redundancy. Redundancy is unnecessary repetition of data. It can cause problems with storage and retrieval of data. During the process of normalization,

dependencies can be identified which can cause problems during deletion and update. Normalization theory is based on the fundamental notion of dependency. Normalization helps in simplifying the structure of schemas and tables.

To illustrate the normal forms, we will take an example of a database with the following logical design:

Relation S {S#, SUPPLIERNAME, SUPPLYSTATUS, SUPPLYCITY}, Primary Key {S#}
Relation P {P#, PARTNAME, PARTCOLOR, PARTWEIGHT, SUPPLYCITY}, Primary Key {P#}
Relation SP {S#, SUPPLYCITY, P#, PARTQTY}, Primary Key {S#, P#}
    Foreign Key {S#} References S
    Foreign Key {P#} References P

SP
S#   SUPPLYCITY   P#   PARTQTY
S1   Bombay       P1   3000
S1   Bombay       P2   2000
S1   Bombay       P3   4000
S1   Bombay       P4   2000
S1   Bombay       P5   1000
S1   Bombay       P6   1000
S2   Mumbai       P1   3000
S2   Mumbai       P2   4000
S3   Mumbai       P2   2000
S4   Madras       P2   2000
S4   Madras       P4   3000
S4   Madras       P5   4000

Let us examine the table above to find any design discrepancy. A quick glance reveals that some of the data are being repeated. That is data redundancy, which is of course undesirable. The fact that a particular supplier is located in a city has been repeated many times. This redundancy causes many other related problems. For instance, after an update a supplier may be displayed to be from Madras in one entry while from Mumbai in another. This further gives rise to many other problems.

Therefore, for the above reasons, the tables need to be refined. This process of refinement of a given schema into another schema or a set of schemas possessing the qualities of a good database is known as Normalization.

Database experts have defined a series of normal forms, each conforming to some specified design quality condition(s). We shall restrict ourselves to the first five normal forms for reasons of simplicity. Each next level of normal form adds another condition. It is interesting to note that the process of normalization is reversible. The following diagram depicts the relation between various normal forms.

[Figure: relation between the normal forms 1NF, 2NF, 3NF, 4NF and 5NF]

The diagram implies that a relation in 5th normal form is also in 4th normal form, which is itself in 3rd normal form, and so on. These normal forms are not the only ones. There may be 6th, 7th and nth normal forms, but these are not our concern at this stage.

Before we embark on normalization, however, there are a few more concepts that should be understood.

Decomposition. Decomposition is the process of splitting a relation into two or more relations. This is nothing but the projection process. Decompositions may or may not lose information. As you will learn shortly, the normalization process involves breaking a given relation into two or more relations, and these decompositions should be reversible as well, so that no information is lost in the process. Thus, we will be more interested in the decompositions that incur no loss of information than in the ones in which information is lost.

Lossless decomposition: A decomposition which results in relations without losing any information is known as a lossless decomposition or non-loss decomposition. A decomposition that results in loss of information is known as a lossy decomposition.

Consider the relation S {S#, SUPPLYSTATUS, SUPPLYCITY} with some instances of the entries as shown below.

S
S#   SUPPLYSTATUS   SUPPLYCITY
S3   100            Madras
S5   100            Mumbai

Let us decompose this table into two as shown below:

(1)
SX
S#   SUPPLYSTATUS
S3   100
S5   100

SY
S#   SUPPLYCITY
S3   Madras
S5   Mumbai

(2)
SX
S#   SUPPLYSTATUS
S3   100
S5   100

SY
SUPPLYSTATUS   SUPPLYCITY
100            Madras
100            Mumbai

Let us examine these decompositions. In decomposition (1) no information is lost. We can still say that S3’s status is 100 and its location is Madras, and also that supplier S5 has 100 as its status and Mumbai as its location. This decomposition is therefore lossless.

In decomposition (2), however, we can still say that the status of both S3 and S5 is 100, but the location of the suppliers cannot be determined from these two tables. The information regarding the location of the suppliers has been lost in this case. This is a lossy decomposition. Certainly, lossless decomposition is more desirable because otherwise the decomposition will be irreversible.

The decomposition process is in fact projection, where some attributes are selected from a table. A natural question arises here as to why the first decomposition is lossless while the second one is lossy. How should a given relation be decomposed so that the resulting projections are non-lossy? The answer to these questions lies in functional dependencies and may be given by the following theorem.
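The difference between the two decompositions can be checked mechanically by projecting S and joining the projections back together. The sketch below is a plain-Python illustration in which relations are lists of dictionaries; the helper names project, natural_join and same_relation are introduced here and are not part of any library.

```python
def project(rows, attrs):
    """Projection: keep only attrs and drop duplicate tuples."""
    result = []
    for r in rows:
        p = {a: r[a] for a in attrs}
        if p not in result:
            result.append(p)
    return result


def natural_join(left, right):
    """Natural join on the attributes the two relations have in common."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    result = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in common):
                merged = {**l, **r}
                if merged not in result:
                    result.append(merged)
    return result


def same_relation(a, b):
    """Compare two relations as sets of tuples, ignoring row order."""
    def canon(rows):
        return {frozenset(r.items()) for r in rows}
    return canon(a) == canon(b)


S = [
    {"S#": "S3", "SUPPLYSTATUS": 100, "SUPPLYCITY": "Madras"},
    {"S#": "S5", "SUPPLYSTATUS": 100, "SUPPLYCITY": "Mumbai"},
]

# Decomposition (1): both projections keep S#.
join1 = natural_join(project(S, ["S#", "SUPPLYSTATUS"]),
                     project(S, ["S#", "SUPPLYCITY"]))

# Decomposition (2): the projections share only SUPPLYSTATUS.
join2 = natural_join(project(S, ["S#", "SUPPLYSTATUS"]),
                     project(S, ["SUPPLYSTATUS", "SUPPLYCITY"]))

print(same_relation(join1, S), same_relation(join2, S), len(join2))
```

In decomposition (2) the join produces extra tuples that pair each supplier with both cities, so the original relation can no longer be recovered. Losing information shows up as gaining spurious tuples.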

Heath’s theorem: Let R{A, B, C} be a relation, where A, B and C are sets of attributes. If R satisfies the FD A → B, then R is equal to the join of its projections on {A, B} and {A, C}.

Let us apply this theorem to the decompositions described above. We observe that relation S satisfies the irreducible set of FDs

S# → SUPPLYSTATUS
S# → SUPPLYCITY

Now taking A as S#, B as SUPPLYSTATUS, and C as SUPPLYCITY, this theorem confirms that relation S can be non-loss decomposed into its projections on {S#, SUPPLYSTATUS} and {S#, SUPPLYCITY}. Note, however, that the theorem does not say why the projections on {S#, SUPPLYSTATUS} and {SUPPLYSTATUS, SUPPLYCITY} should be lossy. Yet we can see that one of the FDs is lost in this decomposition: while the FD S# → SUPPLYSTATUS is still represented by the projection on {S#, SUPPLYSTATUS}, the FD S# → SUPPLYCITY has been lost.

An alternative criterion for lossless decomposition is as follows. Let R be a relation schema, and let F be a set of functional dependencies on R. Let R1 and R2 form a decomposition of R. This decomposition is a lossless-join decomposition of R if at least one of the following functional dependencies is in F+:

R1 ∩ R2 → R1
R1 ∩ R2 → R2

In decomposition (2) above, R1 ∩ R2 = {SUPPLYSTATUS}, which determines neither {S#, SUPPLYSTATUS} nor {SUPPLYSTATUS, SUPPLYCITY}, so the criterion is not met.

Functional Dependency Diagrams: This is a handy tool for representing the functional dependencies existing in a relation.

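[Figure: functional dependency diagrams for relations S, SP and P, with arrows from the keys S#, {S#, P#} and P# to the attributes SUPPLIERNAME, SUPPLYSTATUS, SUPPLYCITY; PARTQTY; and PARTNAME, PARTCOLOR, PARTWEIGHT, SUPPLYCITY respectively]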

The diagram is very useful for its eloquence and in visualizing the FDs in a relation. Later in the Unit you will learn how to use this diagram for normalization purposes.

6.2.1 First Normal Form

A relation is in 1st normal form (1NF) if and only if, in every legal value of that relation, every tuple contains exactly one value for each attribute.

Although simplest, 1NF relations have a number of discrepancies, and therefore 1NF is not the most desirable form of a relation.

Let us take a relation (modified to illustrate the point in discussion) as

REL1 {S#, SUPPLYSTATUS, SUPPLYCITY, P#, PARTQTY}  Primary Key {S#, P#}
FD {SUPPLYCITY → SUPPLYSTATUS}

Note that SUPPLYSTATUS is functionally dependent on SUPPLYCITY, meaning that a supplier’s status is determined by the location of that supplier (e.g. all suppliers from Madras must have a status of 200). The primary key of the relation REL1 is {S#, P#}. The FD diagram is shown below:

[Figure: FD diagram for REL1 over S#, P#, PARTQTY, SUPPLYCITY and SUPPLYSTATUS]

For a good design the diagram should have arrows out of candidate keys only. The additional arrows cause trouble.

Let us discuss some of the problems with this 1NF relation. For the purpose of illustration, let us insert some sample tuples into this relation.

REL1
S#   SUPPLYSTATUS   SUPPLYCITY   P#   PARTQTY
S1   200            Madras       P1   3000
S1   200            Madras       P2   2000
S1   200            Madras       P3   4000
S1   200            Madras       P4   2000
S1   200            Madras       P5   1000
S1   200            Madras       P6   1000
S2   100            Mumbai       P1   3000
S2   100            Mumbai       P2   4000
S3   100            Mumbai       P2   2000
S4   200            Madras       P2   2000
S4   200            Madras       P4   3000
S4   200            Madras       P5   4000

The redundancies in the above relation cause many problems, usually known as update anomalies, that is, problems in INSERT, DELETE and UPDATE operations. Let us see these problems due to the supplier-city redundancy corresponding to the FD S# → SUPPLYCITY.

INSERT: In this relation, unless a supplier supplies at least one part, we cannot insert the information regarding that supplier. Thus, a supplier located in Kolkata is missing from the relation because he has not supplied any part so far.

DELETE: Let us see what problem we may face during deletion of a tuple. If we delete the tuple of a supplier (if there is a single entry for that supplier), we not only delete the fact that the supplier supplied a particular part but also the fact that the supplier is located in a particular city. In our case, if we delete the entries corresponding to S# = S2, we lose the information that the supplier is located at Mumbai. This is definitely undesirable. The problem here is that too much information is attached to each tuple, so deletion forces the loss of too much information.

UPDATE: If we modify the city of supplier S1 from Madras to Mumbai, we have to make sure that all the entries corresponding to S# = S1 are updated, otherwise inconsistency will be introduced. As a result some entries will suggest that the supplier is located at Madras while others will contradict this fact.

6.2.2 Second Normal Form

A relation is in 2NF if and only if it is in 1NF and every non-key attribute is fully functionally dependent on the primary key. Here it has been assumed that there is only one candidate key, which is of course the primary key.

A relation in 1NF can always be decomposed into an equivalent set of 2NF relations. The reduction process consists of replacing the 1NF relation by suitable projections.

We have seen the problems arising from the low level of normalization (1NF) of the relation. The remedy is to break the relation into two simpler relations.

REL2 {S#, SUPPLYSTATUS, SUPPLYCITY} and
REL3 {S#, P#, PARTQTY}

The FD diagrams and sample relations are shown below.

[Figure: FD diagrams for REL2 (S#, SUPPLYCITY, SUPPLYSTATUS) and REL3 (S#, P#, PARTQTY)]

REL2
S#   SUPPLYSTATUS   SUPPLYCITY
S1   200            Madras
S2   100            Mumbai
S3   100            Mumbai
S4   200            Madras
S5   300            Kolkata

REL3
S#   P#   PARTQTY
S1   P1   3000
S1   P2   2000
S1   P3   4000
S1   P4   2000
S1   P5   1000
S1   P6   1000
S2   P1   3000
S2   P2   4000
S3   P2   2000
S4   P2   2000
S4   P4   3000
S4   P5   4000

REL2 and REL3 are in 2NF with primary keys {S#} and {S#, P#} respectively. This is because each of the non-key attributes of REL2, {SUPPLYSTATUS, SUPPLYCITY}, is fully functionally dependent on the primary key, that is S#. By a similar argument, REL3 is also in 2NF. Evidently, these two relations have overcome all the update anomalies stated earlier. Now it is possible to insert the facts regarding supplier S5 even when he has not supplied any part, which was earlier not possible. This solves the insert problem. Similarly, the delete and update problems are also overcome.
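The 2NF condition can be tested by looking for non-key attributes that are determined by a proper subset of the primary key. The sketch below does this with the attribute-closure idea from section 6.1. The FD list for REL1 is assembled from the FDs mentioned in the text (S# → SUPPLYCITY, SUPPLYCITY → SUPPLYSTATUS, and the key determining PARTQTY), and the function names are introduced here for illustration only.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, attributes, fds):
    """Map each proper subset of key to the non-key attributes it determines."""
    non_key = set(attributes) - set(key)
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            dependent = closure(subset, fds) & non_key
            if dependent:
                found[subset] = dependent
    return found


REL1_FDS = [
    (("S#",), ("SUPPLYCITY",)),
    (("SUPPLYCITY",), ("SUPPLYSTATUS",)),
    (("S#", "P#"), ("PARTQTY",)),
]
rel1_partial = partial_dependencies(
    ("S#", "P#"),
    ("S#", "SUPPLYSTATUS", "SUPPLYCITY", "P#", "PARTQTY"),
    REL1_FDS,
)

REL3_FDS = [(("S#", "P#"), ("PARTQTY",))]
rel3_partial = partial_dependencies(
    ("S#", "P#"), ("S#", "P#", "PARTQTY"), REL3_FDS)

print("REL1:", rel1_partial)
print("REL3:", rel3_partial)
```

REL1 reports that S# alone determines SUPPLYCITY and SUPPLYSTATUS, which is the partial dependency removed by the split; REL3 reports none. REL2 has a single-attribute key, so it has no proper subsets to test.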

These relations in 2NF are still not free from all the anomalies. REL3 is free from most of the problems we are going to discuss here; however, REL2 still carries some problems. The reason is that the dependency of SUPPLYSTATUS on S#, though functional, is transitive via SUPPLYCITY. Thus we see that there are two dependencies, S# → SUPPLYCITY and SUPPLYCITY → SUPPLYSTATUS. This implies S# → SUPPLYSTATUS. This relation has a transitive dependency. We will see that this transitive dependency gives rise to another set of anomalies.

INSERT: We are unable to insert the fact that a particular city has a particular status until we have some supplier actually located in that city.

DELETE: If we delete the sole REL2 tuple for a particular city, we delete the information that that city has that particular status.

UPDATE: The status for a given city still has redundancy. This causes the usual redundancy problem related to updates.

6.2.3 Third Normal Form

A relation is in 3NF if and only if it is in 2NF and every non-key attribute is non-transitively dependent on the primary key.

To convert the 2NF relation into 3NF, once again, REL2 is split into two simpler relations, REL4 and REL5, as shown below.

REL4 {S#, SUPPLYCITY} and
REL5 {SUPPLYCITY, SUPPLYSTATUS}

The FD diagrams and sample relations are shown below.

[Figure: FD diagrams for REL4 (S# → SUPPLYCITY) and REL5 (SUPPLYCITY → SUPPLYSTATUS)]

REL4
S#   SUPPLYCITY
S1   Madras
S2   Mumbai
S3   Mumbai
S4   Madras
S5   Kolkata

REL5
SUPPLYCITY   SUPPLYSTATUS
Madras       200
Mumbai       100
Kolkata      300

Evidently, the above relations REL4 and REL5 are in 3NF, because there are no transitive dependencies. Every 2NF relation can be reduced to 3NF by decomposing it further and removing any transitive dependency.

Dependency Preservation
The reduction process may suggest a variety of ways in which a relation may be losslessly decomposed. Thus REL2, in which there was a transitive dependency, was split into two 3NF projections, i.e.

REL4 {S#, SUPPLYCITY} and
REL5 {SUPPLYCITY, SUPPLYSTATUS}

Let us call this decomposition decomposition 1. An alternative decomposition may be:

REL4 {S#, SUPPLYCITY} and
REL5 {S#, SUPPLYSTATUS}

which we will call decomposition 2.

Both decomposition 1 and decomposition 2 are 3NF and lossless. However, decomposition 2 is less satisfactory than decomposition 1. For example, it is still not possible to insert the information that a particular city has a particular status unless some supplier is located in the city.

In decomposition 1, the two projections are independent of each other, but the same is not true in the second decomposition. Here independence means that updates can be made to either relation without regard to the other, provided the update is legal within that relation. Also, independent decompositions preserve the dependencies of the database, and no dependency is lost in the decomposition process.

The concept of independent projections provides a basis for choosing a particular decomposition when there is more than one choice.

6.2.4 Boyce-Codd Normal Form

The previous normal forms assumed that there was just one candidate key in the relation and that this key was also the primary key. Another class of problems arises when this is not the case. Very often there will be more than one candidate key in a practical database design situation. To be precise, 1NF, 2NF and 3NF did not deal adequately with the case of a relation that
- had two or more candidate keys, such that
- the candidate keys were composite, and
- they overlapped (i.e. had at least one attribute in common).

A relation is in BCNF (Boyce-Codd Normal Form) if and only if every nontrivial, left-irreducible FD has a candidate key as its determinant.

Or

A relation is in BCNF if and only if all the determinants are candidate keys.

In other words, the only arrows in the FD diagram are arrows out of candidate keys. It has already been explained that there will always be arrows out of candidate keys; the BCNF definition says there are no others, meaning there are no arrows that can be eliminated by the normalization procedure.

These two definitions are apparently different from each other. The difference between the two BCNF definitions is that we tacitly assume in the latter, informal case that determinants are "not too big" and that all FDs are nontrivial.

It should be noted that the BCNF definition is conceptually simpler than the old 3NF definition, in that it makes no explicit reference to first and second normal forms as such, nor to the concept of transitive dependence. Furthermore, although BCNF is strictly stronger than 3NF, it is still the case that any given relation can be non-loss decomposed into an equivalent collection of BCNF relations.
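The BCNF definition translates directly into a test: every nontrivial FD of the relation must have a determinant whose closure is the whole relation. The sketch below applies that test to REL2, REL4 and REL5 from the earlier sections. It checks only the FDs that are listed for each relation, so the FD lists (taken from the text) are an input assumption rather than something the code derives.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the nontrivial FDs whose determinant is not a superkey."""
    all_attrs = set(attributes)
    violations = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial FD, always allowed
        if closure(lhs, fds) != all_attrs:
            violations.append((lhs, rhs))
    return violations


rel2_bad = bcnf_violations(
    ("S#", "SUPPLYSTATUS", "SUPPLYCITY"),
    [(("S#",), ("SUPPLYCITY",)), (("SUPPLYCITY",), ("SUPPLYSTATUS",))],
)
rel4_bad = bcnf_violations(
    ("S#", "SUPPLYCITY"), [(("S#",), ("SUPPLYCITY",))])
rel5_bad = bcnf_violations(
    ("SUPPLYCITY", "SUPPLYSTATUS"), [(("SUPPLYCITY",), ("SUPPLYSTATUS",))])

print("REL2:", rel2_bad)
print("REL4:", rel4_bad)
print("REL5:", rel5_bad)
```

For REL2 the determinant SUPPLYCITY is reported because it does not determine S#, whereas REL4 and REL5 have no offending determinants.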

Thus, relations REL1 and REL2, which were not in 3NF, are not in BCNF either; and relations REL3, REL4, and REL5, which were in 3NF, are also in BCNF. Relation REL1 contains three determinants, namely {S#}, {SUPPLYCITY}, and {S#, P#}; of these, only {S#, P#} is a candidate key, so REL1 is not in BCNF. Similarly, REL2 is not in BCNF either, because the determinant {SUPPLYCITY} is not a candidate key. Relations REL3, REL4, and REL5, on the other hand, are each in BCNF, because in each case the sole candidate key is the only determinant in the respective relation.

We now consider an example involving two disjoint, i.e. non-overlapping, candidate keys. Suppose that in the usual suppliers relation REL1 {S#, SUPPLIERNAME, SUPPLYSTATUS, SUPPLYCITY}, {S#} and {SUPPLIERNAME} are both candidate keys (i.e., for all time, it is the case that every supplier has a unique supplier number and also a unique supplier name). Assume, however, that attributes SUPPLYSTATUS and SUPPLYCITY are mutually independent, i.e., the FD SUPPLYCITY → SUPPLYSTATUS no longer holds. Then the FD diagram is as shown below.

[Figure: FD diagram over S#, SUPPLIERNAME, SUPPLYSTATUS and SUPPLYCITY, with arrows out of the candidate keys S# and SUPPLIERNAME only]

Relation REL1 is in BCNF. Although the FD diagram does look "more complex" than a 3NF diagram, it is nevertheless still the case that the only determinants are candidate keys; i.e., the only arrows are arrows out of candidate keys. So the message of this example is just that having more than one candidate key is not necessarily bad.

For illustration we will assume that in our relations supplier names are unique. Consider REL6.

REL6 {S#, SUPPLIERNAME, P#, PARTQTY}

Since it contains two determinants, S# and SUPPLIERNAME, that are not candidate keys for the relation, this relation is not in BCNF. A sample snapshot of this relation is shown below:

REL6
S#   SUPPLIERNAME   P#   PARTQTY
S1   Pooran         P1   3000
S1   Anupam         P2   2000
S1   Vishal         P3   4000
S1   Vinod          P4   2000

As is evident from the figure above, relation REL6 involves the same kind of redundancies as did relations REL1 and REL2, and hence is subject to the same kind of update anomalies. For example, changing the name of a supplier from Vinod to Rahul leads, once again, either to search problems or to possibly inconsistent results. Yet REL6 is in 3NF by the old definition, because that definition did not require an attribute to be irreducibly dependent on each candidate key if it was itself a component of some candidate key of the relation, and so the fact that SUPPLIERNAME is not irreducibly dependent on {S#, P#} was ignored.

The solution to the REL6 problems is, of course, to break the relation down into two projections; in this case the projections are:

REL7 {S#, SUPPLIERNAME} and

88 REL8{S#,P#,PARTQTY} Or REL7{S#,SUPPLIERNAME}and REL8{SUPPLIERNAME,P#,PARTQTY} BothoftheseprojectionsareinBCNF.Theoriginaldesign,consistingofthesinglerelationREL1,is clearly bad; the problems with it are intuitively obvious, and it is unlikely that any competent databasedesignerwouldeverseriouslyproposeit,evenifheorshehadnoexposuretotheideasof BCNFetc.atall. ComparisonofBCNFand3NF We have seen two normal forms for relationaldatabase schemas: 3NF and BCNF. There is an advantageto3NFinthatweknowthatitisalwayspossibletoobtaina3NFdesignwithoutsacrificing alosslessjoinordependencypreservation.Nevertheless,thereisadisadvantageto3NF.Ifwedonot eliminate all transitive dependencies, we may have to use null values to represent some of the possible meaningful relationship among data items, and there is the problem of repetition of information.Theotherdifficultyistherepetitionofinformation. If we are forced to choose between BCNF and dependency preservation with 3NF, it is generally preferableto optfor 3NF.Ifwecannottestfordependency preservationefficiently,weeitherpaya highpenaltyinsystemperformanceorrisktheintegrityofthedatainourdatabase.Neitherofthese alternatives is attractive. With such alternatives, the limited amount of redundancy imposed by transitive dependencies allowed under 3NF is the lesser evil. Thus, we normally choose to retain dependencypreservationandtosacrificeBCNF. 6.2.5 Multi-valued dependency

Multivalued dependency may be formally defined as follows. Let R be a relation, and let A, B, and C be subsets of the attributes of R. Then we say that B is multidependent on A, in symbols

A →→ B

(read "A multidetermines B," or simply "A double arrow B") if and only if, in every possible legal value of R, the set of B values matching a given (A value, C value) pair depends only on the A value and is independent of the C value.

To elucidate the meaning of the above statement, let us take one example relation REL8 as shown below:

REL8
COURSE         TEACHERS          BOOKS
Computer       Dr. Wadhwa        Graphics
               Prof. Mittal      UNIX
Mathematics    Prof. Saxena      Relational Algebra
               Prof. Karmeshu    Discrete Maths

Assume that for a given course there can exist any number of corresponding teachers and any number of corresponding books. Moreover, let us also assume that teachers and books are quite independent of one another; that is, no matter who actually teaches any particular course, the same books are used. Finally, also assume that a given teacher or a given book can be associated with any number of courses.

Let us try to eliminate the relation-valued attributes. One way to do this is simply to replace relation REL8 by a relation REL9 with three scalar attributes COURSE, TEACHER, and BOOK as indicated below.

REL9
COURSE         TEACHER           BOOK
Computer       Dr. Wadhwa        Graphics
Computer       Dr. Wadhwa        UNIX
Computer       Prof. Mittal      Graphics
Computer       Prof. Mittal      UNIX
Mathematics    Prof. Saxena      Relational Algebra
Mathematics    Prof. Saxena      Discrete Maths
Mathematics    Prof. Karmeshu    Relational Algebra
Mathematics    Prof. Karmeshu    Discrete Maths

As you can see from the relation, each tuple of REL8 gives rise to m*n tuples in REL9, where m and n are the cardinalities of the TEACHERS and BOOKS relations in that REL8 tuple. Note that the resulting relation REL9 is "all key".

The meaning of relation REL9 is basically as follows: a tuple {COURSE:c, TEACHER:t, BOOK:x} appears in REL9 if and only if course c can be taught by teacher t and uses book x as a reference.
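The step from the nested relation to the flat one can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout and the function name flatten are hypothetical, and the course, teacher, and book names are copied from the example.

```python
from itertools import product

# Nested relation REL8: each course carries a set of teachers and a set
# of books. The dictionary layout is an assumption made for this sketch.
REL8 = {
    "Computer": ({"Dr. Wadhwa", "Prof. Mittal"}, {"Graphics", "UNIX"}),
    "Mathematics": ({"Prof. Saxena", "Prof. Karmeshu"},
                    {"Relational Algebra", "Discrete Maths"}),
}


def flatten(nested):
    """Return the set of (course, teacher, book) tuples, i.e. REL9."""
    return {
        (course, teacher, book)
        for course, (teachers, books) in nested.items()
        for teacher, book in product(teachers, books)
    }


REL9 = flatten(REL8)
for row in sorted(REL9):
    print(row)
```

Each course contributes m*n tuples, so two courses with two teachers and two books each produce eight tuples in total.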
Observe that, for a given course, all possible combinations of teacher and book appear: that is, REL9 satisfies the (relation) constraint

if tuples (c, t1, x1), (c, t2, x2) both appear
then tuples (c, t1, x2), (c, t2, x1) both appear also

Now, it should be apparent that relation REL9 involves a good deal of redundancy, leading as usual to certain update anomalies. For example, to add the information that the Computer course can be taught by a new teacher, it is necessary to insert two new tuples, one for each of the two books. Can we avoid such problems? Well, it is easy to see that:

1. The problems in question are caused by the fact that teachers and books are completely independent of one another;
2. Matters would be much improved if REL9 were decomposed into its two projections, call them REL10 and REL11, on {COURSE, TEACHER} and {COURSE, BOOK}, respectively.

To add the information that the Computer course can be taught by a new teacher, all we have to do now is insert a single tuple into relation REL10. Thus, it does seem reasonable to suggest that there should be a way of "further normalizing" a relation like REL9.

It is obvious that the design of REL9 is bad and the decomposition into REL10 and REL11 is better. The trouble is, however, that these facts are not formally obvious. Note in particular that REL9 satisfies no functional dependencies at all (apart from trivial ones such as COURSE → COURSE); in fact, REL9 is in BCNF, since as already noted it is all key, and any "all key" relation must necessarily be in BCNF. (Note that the two projections REL10 and REL11 are also all key and hence in BCNF.) The ideas of the previous normalization are therefore of no help with the problem at hand.

The existence of "problem" BCNF relations like REL9 was recognized very early on, and the way to deal with them was also soon understood, at least intuitively. However, it was not until 1977 that these intuitive ideas were put on a sound theoretical footing by Fagin's introduction of the notion of multivalued dependencies (MVDs). Multivalued dependencies are a generalization of functional dependencies, in the sense that every FD is an MVD, but the converse is not true (i.e., there exist MVDs that are not FDs). In the case of relation REL9 there are two MVDs that hold:

COURSE →→ TEACHER
COURSE →→ BOOK

Note the double arrows; the MVD A →→ B is read as "B is multidependent on A" or, equivalently, "A multidetermines B." Let us concentrate on the first MVD, COURSE →→ TEACHER. Intuitively, what this MVD means is that, although a course does not have a single corresponding teacher (i.e., the functional dependence COURSE → TEACHER does not hold), nevertheless each course does have a well-defined set of corresponding teachers. By "well-defined" here we mean, more precisely, that for a given course c and a given book x, the set of teachers t matching the pair (c, x) in REL9 depends on the value c alone; it makes no difference which particular value of x we choose. The second MVD, COURSE →→ BOOK, is interpreted analogously.

It is easy to show that, given the relation R{A, B, C}, the MVD A →→ B holds if and only if the MVD A →→ C also holds. MVDs always go together in pairs in this way. For this reason it is common to represent them both in one statement, thus:

COURSE →→ TEACHER | BOOK

Now, we stated above that multivalued dependencies are a generalization of functional dependencies, in the sense that every FD is an MVD. More precisely, an FD is an MVD in which the set of dependent (right-hand side) values matching a given determinant (left-hand side) value is always a singleton set. Thus, if A → B, then certainly A →→ B.
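The definition of an MVD can be checked mechanically on a small relation. The sketch below is a hypothetical helper, not part of any library: satisfies_mvd groups the B values by (A value, C value) pair and tests whether, for a fixed A value, every C value sees the same set of B values. The relation is assumed to be a list of dictionaries keyed by attribute name.

```python
from collections import defaultdict


def satisfies_mvd(rows, a, b):
    """Return True if the MVD a ->> b holds in rows (a list of dicts).

    a and b are tuples of attribute names; C is every other attribute.
    """
    if not rows:
        return True
    c = tuple(name for name in rows[0] if name not in a and name not in b)
    # Collect the set of B values seen with each (A value, C value) pair.
    b_values = defaultdict(set)
    for row in rows:
        a_val = tuple(row[n] for n in a)
        c_val = tuple(row[n] for n in c)
        b_values[(a_val, c_val)].add(tuple(row[n] for n in b))
    # For a fixed A value, every C value must see the same set of B values.
    first_seen = {}
    for (a_val, _c_val), b_set in b_values.items():
        if first_seen.setdefault(a_val, b_set) != b_set:
            return False
    return True


# REL9 built as every teacher/book combination per course (names from the text).
REL9 = [
    {"COURSE": c, "TEACHER": t, "BOOK": x}
    for c, teachers, books in [
        ("Computer", ["Dr. Wadhwa", "Prof. Mittal"], ["Graphics", "UNIX"]),
        ("Mathematics", ["Prof. Saxena", "Prof. Karmeshu"],
         ["Relational Algebra", "Discrete Maths"]),
    ]
    for t in teachers
    for x in books
]

holds = satisfies_mvd(REL9, ("COURSE",), ("TEACHER",))
# Dropping a single tuple breaks the "all combinations" property.
broken = satisfies_mvd(REL9[1:], ("COURSE",), ("TEACHER",))
print(holds, broken)
```

Removing one tuple is enough to violate the MVD, which is the formal counterpart of having to insert one tuple per book when a new teacher is added.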
Returning to our original REL9 problem, we can now see that the trouble with relations such as REL9 is that they involve MVDs that are not also FDs. (In case it is not obvious, we point out that it is precisely the existence of those MVDs that leads to the necessity of, for example, inserting two tuples to add another Computer teacher. Those two tuples are needed in order to maintain the integrity constraint that is represented by the MVD.) The two projections REL10 and REL11 do not involve any such MVDs, which is why they represent an improvement over the original design. We would therefore like to replace REL9 by those two projections, and an important theorem proved by Fagin allows us to make exactly that replacement:

Theorem (Fagin): Let R{A, B, C} be a relation, where A, B, and C are sets of attributes. Then R is equal to the join of its projections on {A, B} and {A, C} if and only if R satisfies the MVDs A →→ B | C.

At this stage we are equipped to define fourth normal form:

Fourth normal form: Relation R is in 4NF if and only if, whenever there exist subsets A and B of the attributes of R such that the nontrivial MVD A →→ B is satisfied, then all attributes of R are also functionally dependent on A. (An MVD A →→ B is trivial if either A is a superset of B or the union of A and B is the entire heading.)

In other words, the only nontrivial dependencies (FDs or MVDs) in R are of the form Y → X (i.e., a functional dependency from a superkey Y to some other attribute X). Equivalently: R is in 4NF if it is in BCNF and all MVDs in R are in fact "FDs out of keys." It follows that 4NF implies BCNF.

Relation REL9 is not in 4NF, since it involves an MVD that is not an FD at all, let alone an FD "out of a key." The two projections REL10 and REL11 are both in 4NF, however. Thus 4NF is an improvement over BCNF, in that it eliminates another form of undesirable dependency. What is more, 4NF is always achievable; that is, any relation can be nonloss decomposed into an equivalent collection of 4NF relations.

You may recall that a relation R{A, B, C} satisfying the FDs A → B and B → C is better decomposed into its projections on {A, B} and {B, C} rather than into those on {A, B} and {A, C}. The same holds true if we replace the FDs by the MVDs A →→ B and B →→ C.

6.2.6 Fifth Normal Form

It seems from our discussion so far that the sole operation necessary or available in the further normalization process is the replacement of a relation in a nonloss way by exactly two of its projections. This assumption has successfully carried us as far as 4NF. It comes perhaps as a surprise, therefore, to discover that there exist relations that cannot be nonloss decomposed into two projections but can be nonloss decomposed into three (or more). To use an unpleasant but convenient term, we will describe such a relation as "n-decomposable" (for some n > 2), meaning that the relation in question can be nonloss decomposed into n projections but not into m for any m < n. A relation that can be nonloss decomposed into two projections we will call "2-decomposable". The phenomenon of n-decomposability for n > 2 was first noted by Aho, Beeri, and Ullman. The particular case n = 3 was also studied by Nicolas.

Consider relation REL12 from the suppliers-parts-projects database, ignoring attribute QTY for simplicity for the moment. A sample snapshot of the same is shown below. It may be pointed out that relation REL12 is all key and involves no nontrivial FDs or MVDs at all, and is therefore in 4NF. The snapshot of the relations also shows:

a. The three binary projections REL13, REL14, and REL15 corresponding to the REL12 relation value displayed;
b. The effect of joining the REL13 and REL14 projections (over P#);
c. The effect of joining that result and the REL15 projection (over J# and S#).

REL12
S#    P#    J#
S1    P1    J2
S1    P2    J1
S2    P1    J1
S1    P1    J1

REL13           REL14           REL15
S#    P#        P#    J#        J#    S#
S1    P1        P1    J2        J2    S1
S1    P2        P2    J1        J1    S1
S2    P1        P1    J1        J1    S2

Join Dependency: Let R be a relation, and let A, B, ..., Z be subsets of the attributes of R. Then we say that R satisfies the Join Dependency (JD)

*{A, B, ..., Z}

(read "star A, B, ..., Z") if and only if every possible legal value of R is equal to the join of its projections on A, B, ..., Z.

For example, if we agree to use SP to mean the subset {S#, P#} of the set of attributes of REL12, and similarly for PJ and JS, then relation REL12 satisfies the JD *{SP, PJ, JS}.
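Steps (a) to (c) above can be reproduced with plain Python sets. This is a minimal sketch; the variable names SP, PJ, and JS stand for the projections REL13, REL14, and REL15, and the tuple layout (S#, P#, J#) is an assumption of the example.

```python
# REL12 as a set of (S#, P#, J#) tuples, copied from the snapshot.
REL12 = {
    ("S1", "P1", "J2"),
    ("S1", "P2", "J1"),
    ("S2", "P1", "J1"),
    ("S1", "P1", "J1"),
}

# (a) The three binary projections.
SP = {(s, p) for s, p, j in REL12}   # REL13
PJ = {(p, j) for s, p, j in REL12}   # REL14
JS = {(j, s) for s, p, j in REL12}   # REL15

# (b) Natural join of SP and PJ over P#.
sp_pj = {(s, p, j) for s, p in SP for p2, j in PJ if p == p2}

# Tuples produced by the two-way join that were never in REL12.
spurious = sp_pj - REL12

# (c) Join that result with JS over J# and S#.
result = {(s, p, j) for s, p, j in sp_pj if (j, s) in JS}

print(sorted(spurious))
print(result == REL12)
```

The two-way join introduces a tuple that is not in REL12; only after the third projection is joined in does the original relation reappear, which is what the JD *{SP, PJ, JS} asserts.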
We have seen, then, that relation REL12, with its JD *{REL13, REL14, REL15}, can be 3-decomposed. The question is, should it be? And the answer is "Probably yes." Relation REL12 (with its JD) suffers from a number of problems over update operations, problems that are removed when it is 3-decomposed.

Fagin's theorem, to the effect that R{A, B, C} can be nonloss decomposed into its projections on {A, B} and {A, C} if and only if the MVDs A →→ B and A →→ C hold in R, can now be restated as follows:

R{A, B, C} satisfies the JD *{AB, AC} if and only if it satisfies the MVDs A →→ B | C.

Since this theorem can be taken as a definition of multivalued dependency, it follows that an MVD is just a special case of a JD, or (equivalently) that JDs are a generalization of MVDs. Thus, to put it formally, we have

A →→ B | C ≡ *{AB, AC}

Note that join dependencies are the most general form of dependency possible (using, of course, the term "dependency" in a very special sense). That is, there does not exist a still higher form of dependency such that JDs are merely a special case of that higher form, so long as we restrict our attention to dependencies that deal with a relation being decomposed via projection and recomposed via join.

Coming back to the running example, we can see that the problem with relation REL12 is that it involves a JD that is not an MVD, and hence not an FD either. We have also seen that it is possible, and probably desirable, to decompose such a relation into smaller components, namely, into the projections specified by the join dependency. That decomposition process can be repeated until all resulting relations are in fifth normal form, which we now define:

Fifth normal form: A relation R is in 5NF, also called projection-join normal form (PJ/NF), if and only if every nontrivial join dependency that holds for R is implied by the candidate keys of R.

Let us understand what it means for a JD to be "implied by candidate keys."

Relation REL12 is not in 5NF; it satisfies a certain join dependency, namely Constraint 3D, that is certainly not implied by its sole candidate key (that key being the combination of all of its attributes). Stated differently, relation REL12 is not in 5NF, because (a) it can be 3-decomposed and (b) that 3-decomposability is not implied by the fact that the combination {S#, P#, J#} is a candidate key. By contrast, after 3-decomposition, the three projections SP, PJ, and JS are each in 5NF, since they do not involve any (nontrivial) JDs at all.

Now let us understand, through an example, what it means for a JD to be implied by candidate keys. Suppose that the familiar suppliers relation REL1 has two candidate keys, {S#} and {SUPPLIERNAME}. Then that relation satisfies several join dependencies; for example, it satisfies the JD

*{{S#, SUPPLIERNAME, SUPPLYSTATUS}, {S#, SUPPLYCITY}}

That is, relation REL1 is equal to the join of its projections on {S#, SUPPLIERNAME, SUPPLYSTATUS} and {S#, SUPPLYCITY}, and hence can be nonloss decomposed into those projections. (This fact does not mean that it should be so decomposed, of course, only that it could be.) This JD is implied by the fact that {S#} is a candidate key (in fact it is implied by Heath's theorem). Likewise, relation REL1 also satisfies the JD

*{{S#, SUPPLIERNAME}, {S#, SUPPLYSTATUS}, {SUPPLIERNAME, SUPPLYCITY}}

This JD is implied by the fact that {S#} and {SUPPLIERNAME} are both candidate keys.

To conclude, we note that it follows from the definition that 5NF is the ultimate normal form with respect to projection and join (which accounts for its alternative name, projection-join normal form). That is, a relation in 5NF is guaranteed to be free of anomalies that can be eliminated by taking projections.
For if a relation is in 5NF, the only join dependencies are those that are implied by candidate keys, and so the only valid decompositions are ones that are based on those candidate keys. (Each projection in such a decomposition will consist of one or more of those candidate keys, plus zero or more additional attributes.) For example, the suppliers relation REL1 is in 5NF. It can be further decomposed in several nonloss ways, as we saw earlier, but every projection in any such decomposition will still include one of the original candidate keys, and hence there does not seem to be any particular advantage in that further reduction.

6.3 Self Test

Exercise 1 True-False Questions:

1. Normalisation is the process of splitting a relation into two or more relations. (T)
2. A functional dependency is a relationship between or among attributes. (T)
3. The known or given attribute is called the determinant in a functional dependency. (T)
4. The relationship in a functional dependency is one-to-one (1:1). (F)
5. A key is a group of one or more attributes that uniquely identifies a row. (T)
6. The selection of the attributes to use for the key is determined by the database programmers. (F)
7. A deletion anomaly occurs when deleting one entity results in deleting facts about another entity. (T)
8. An insertion anomaly occurs when we cannot insert some data into the database without inserting another entity first. (T)
9. Modification anomalies do not occur in tables that meet the definition of a relation. (F)
10. A table of data that meets the minimum definition of a relation is automatically in first normal form. (T)
11. A relation is in first normal form if all of its non-key attributes are dependent on part of the key. (F)
12. A relation is in second normal form if all of its non-key attributes are dependent on all of the key. (T)
13. A relation can be in third normal form without being in second normal form. (F)
14. Fifth normal form is the highest normal form. (F)
15. A relation in domain/key normal form is guaranteed to have no modification anomalies. (T)

Exercise 2 Multiple choice questions

1. A relation is analogous to a: (a)
   a) file
   b) field
   c) record
   d) row
   e) column

2. In a functional dependency, the determinant: (d)
   a) will be paired with one value of the dependent attribute
   b) may be paired with one or more values of the dependent attribute
   c) may consist of more than one attribute
   d) a and c
   e) b and c

6.4 Summary

• A functional dependency describes the relationship between attributes in a relation. For example, if A and B are attributes of a relation R, B is functionally dependent on A (denoted A → B) if each value of A is associated with exactly one value of B. (A and B may each consist of one or more attributes.)
• The concept of database normalization is not unique to any particular Relational Database Management System. It can be applied to any of several implementations of relational databases including Microsoft Access, dBase, Oracle, etc.
• Normalization may have the effect of duplicating data within the database and often results in the creation of additional tables. (While normalization tends to increase the duplication of data, it does not introduce redundancy, which is unnecessary duplication.) Normalization is typically a refinement process after the initial exercise of identifying the data objects that should be in the database, identifying their relationships, and defining the tables required and the columns within each table.

UNIT 7

STRUCTURED QUERY LANGUAGE

7.1 Introduction of SQL
7.2 DDL statements
7.3 DML statements
7.4 View definitions
7.5 Constraints and triggers
7.6 Keys and foreign keys
7.7 Constraints on attributes and tuples
7.8 Modification of constraints
7.9 Cursors
7.10 Dynamic SQL

7.1 Introduction Of SQL

At the heart of every DBMS is a language that is similar to a programming language, but different in that it is designed specifically for communicating with a database. One powerful language is SQL. IBM developed SQL in the late 1970s and early 1980s as a way to standardize the query language across the many mainframe and microcomputer platforms that the company produced.

SQL differs significantly from programming languages. Most programming languages are still procedural. A procedural language consists of commands that tell the computer what to do instruction by instruction, step by step. SQL is not a programming language itself; it is a data access language. SQL may be embedded in traditional procedural programming languages (like COBOL). An SQL statement is not really a command to the computer. Rather, it is a description of some of the data contained in a database. SQL is nonprocedural because it does not give step-by-step commands to the computer or database. SQL describes data, and instructs the database to do something with the data. For example:

SELECT [Name], [Company_Name]
FROM Contacts
WHERE ((City = "Kansas City") AND ([Name] = "R.."))

7.2 DDL Statements

Data Definition Language is a set of SQL commands used to create, modify and delete database structures (not data). These commands wouldn't normally be used by a general user, who should be accessing the database via an application. They are normally used by the DBA (to a limited extent), a database designer or an application developer. These statements take effect immediately; they are not susceptible to ROLLBACK commands. You should also note that if you have executed several DML updates, then issuing any DDL command will COMMIT all the updates, as every DDL command implicitly issues a COMMIT command to the database. Anybody using DDL must have the CREATE object privilege and a tablespace area in which to create objects.

In an Oracle database, objects can be created at any time, whether users are online or not. The tablespace need not be specified, as Oracle will pick up the user defaults (defined by the DBA) or the system defaults. Tables will expand automatically to fill disk partitions (provided this has been set up in advance by the DBA). Table structures may be modified online, although this can have dire effects on an application, so be careful.

Creating our two example tables

CREATE TABLE BOOK (
    ISBN NUMBER(10),
    TITLE VARCHAR2(200),
    AUTHOR VARCHAR2(50),
    COST NUMBER(8,2),
    LENT_DATE DATE,
    RETURNED_DATE DATE,
    TIMES_LENT NUMBER(6),
    SECTION_ID NUMBER(3)
)

CREATE TABLE SECTION (
    SECTION_ID NUMBER(3),
    SECTION_NAME CHAR(30),
    BOOK_COUNT NUMBER(6)
)

The two commands above create our two sample tables and demonstrate the basic table creation command. The CREATE keyword is followed by the type of object that we want created (TABLE, VIEW, INDEX etc.), and that is followed by the name we want the object to be known by. Between the outer brackets lie the parameters for the creation, in this case the names, data types and sizes of each field.

A NUMBER is a numeric field. The size is the precision, that is, the maximum number of significant decimal digits the field can hold (so 10 allows a very large number), not a byte count. A size split with a comma gives the precision followed by the scale, the number of digits following the decimal point (in this case a currency field with two decimal places).

A VARCHAR2 is a variable length string field of 0 to n characters, where n is the specified size. Oracle only takes up the space required to hold any value in the field; it doesn't allocate the entire storage space unless required to by a maximum sized field value (Max size 2000).

A CHAR is a fixed length string field (Max size 255).

A DATE is an internal date/time field (normally 7 bytes long).

A LONG or LONG RAW field (not shown) is used to hold large binary objects (Word documents, AVI files etc.). No size is specified for these field types (Max size 2Gb).
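The two CREATE TABLE statements can be tried out without an Oracle installation. The sketch below is an illustration only and makes one loud assumption: it uses SQLite through Python's sqlite3 module as a stand-in for Oracle. SQLite accepts the Oracle type names but treats them as hints and does not enforce the declared sizes, so it shows the shape of the statements rather than Oracle's storage behaviour.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for Oracle here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BOOK (
    ISBN          NUMBER(10),
    TITLE         VARCHAR2(200),
    AUTHOR        VARCHAR2(50),
    COST          NUMBER(8,2),
    LENT_DATE     DATE,
    RETURNED_DATE DATE,
    TIMES_LENT    NUMBER(6),
    SECTION_ID    NUMBER(3)
);
CREATE TABLE SECTION (
    SECTION_ID   NUMBER(3),
    SECTION_NAME CHAR(30),
    BOOK_COUNT   NUMBER(6)
);
""")

# PRAGMA table_info is SQLite's rough equivalent of Oracle's DESCRIBE;
# the second field of each row is the column name.
book_columns = [row[1] for row in conn.execute("PRAGMA table_info(BOOK)")]
section_columns = [row[1] for row in conn.execute("PRAGMA table_info(SECTION)")]
print(book_columns)
print(section_columns)
```

The column order reported here is the order in which the columns were declared, which matters later when values are inserted without a column list.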

Creating our two example tables with constraints

Constraints are used to enforce table rules and prevent data dependent deletion (enforce database integrity). You may also use them to enforce business rules (with some imagination).

Our two example tables do have some rules which need enforcing. Specifically, both tables need to have a prime key (so that the database doesn't allow replication of data), and the Section ID needs to be linked to each book to identify which library section it belongs to (the foreign key). We also want to specify which columns must be filled in and possibly some default values for other columns. Constraints can be at the column or table level.

Constraint         Description
NULL / NOT NULL    NOT NULL specifies that a column must have some value. NULL (default) allows NULL values in the column.
DEFAULT            Specifies some default value if no value is entered by the user.
UNIQUE             Specifies that column(s) must have unique values.
PRIMARY KEY        Specifies that column(s) are the table prime key and must have unique values. An index is automatically generated for the column.
FOREIGN KEY        Specifies that column(s) are a table foreign key and must match values of the referenced key in the parent table. Unlike a primary key, a foreign key does not get an automatically generated index in Oracle. Foreign keys allow deletion cascades and table / business rule validation.
CHECK              Applies a condition to an input column value.
DISABLE            You may suffix DISABLE to any other constraint to make Oracle ignore the constraint. The constraint will still be available to applications / tools and you can enable the constraint later if required.

CREATE TABLE SECTION (
    SECTION_ID NUMBER(3) CONSTRAINT S_ID CHECK (SECTION_ID > 0),
    SECTION_NAME CHAR(30) CONSTRAINT S_NAME NOT NULL,
    BOOK_COUNT NUMBER(6),
    CONSTRAINT SECT_PRIME PRIMARY KEY (SECTION_ID))

CREATE TABLE BOOK (
    ISBN NUMBER(10) CONSTRAINT B_ISBN CHECK (ISBN BETWEEN 1 AND 2000),
    TITLE VARCHAR2(200) CONSTRAINT B_TITLE NOT NULL,
    AUTHOR VARCHAR2(50) CONSTRAINT B_AUTH NOT NULL,
    COST NUMBER(8,2) DEFAULT 0.00 DISABLE,
    LENT_DATE DATE,
    RETURNED_DATE DATE,
    TIMES_LENT NUMBER(6),
    SECTION_ID NUMBER(3),
    CONSTRAINT BOOK_PRIME PRIMARY KEY (ISBN),
    CONSTRAINT BOOK_SECT FOREIGN KEY (SECTION_ID) REFERENCES SECTION(SECTION_ID))

We have now created our tables with constraints. Column level constraints go directly after the column definition to which they refer; table level constraints go after the last column definition. Table level constraints are normally used (and must be used) for compound (multi-column) foreign and prime key definitions. The example table level constraints could have been placed as column definitions if that was your preference (there would have been no difference to their function).

The CONSTRAINT keyword is followed by a unique constraint name and then the constraint definition. The constraint name is used to manipulate the constraint once the table has been created. You may omit the CONSTRAINT keyword and constraint name if you wish, but you will then have no easy way of enabling / disabling the constraint without deleting the table and rebuilding it. Oracle does give default names to constraints that are not explicitly named; you can check these by selecting from the USER_CONSTRAINTS data dictionary view. Note that the CHECK constraint implements any clause that would be valid in a SELECT WHERE clause (enclosed in brackets). Any value inbound to this column would be validated before the table is updated and accepted / rejected via the CHECK clause.

Note that the order in which the tables are created has changed. This is because we now reference the SECTION table from the BOOK table. The SECTION table must exist before we create the BOOK table, else we will receive an error when we try to create the BOOK table. The foreign key constraint cross-references the field SECTION_ID in the BOOK table to the field (and primary key) SECTION_ID in the SECTION table (REFERENCES keyword).

If we wish, we can introduce cascading validation and some constraint violation logging to our tables.
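To see the CHECK, NOT NULL, primary key, and foreign key constraints reject bad rows, the sketch below runs a cut-down version of the two tables. It rests on assumptions that should be kept in mind: SQLite (via Python's sqlite3) stands in for Oracle, the DEFAULT ... DISABLE column is left out because SQLite has no DISABLE clause, and try_insert is a hypothetical helper written for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: SQLite stands in for Oracle. SQLite only enforces foreign
# keys when this pragma is switched on for the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE SECTION (
    SECTION_ID   NUMBER(3) CONSTRAINT S_ID CHECK (SECTION_ID > 0),
    SECTION_NAME CHAR(30)  CONSTRAINT S_NAME NOT NULL,
    BOOK_COUNT   NUMBER(6),
    CONSTRAINT SECT_PRIME PRIMARY KEY (SECTION_ID)
);
CREATE TABLE BOOK (
    ISBN       NUMBER(10)    CONSTRAINT B_ISBN CHECK (ISBN BETWEEN 1 AND 2000),
    TITLE      VARCHAR2(200) CONSTRAINT B_TITLE NOT NULL,
    AUTHOR     VARCHAR2(50)  CONSTRAINT B_AUTH NOT NULL,
    SECTION_ID NUMBER(3),
    CONSTRAINT BOOK_PRIME PRIMARY KEY (ISBN),
    CONSTRAINT BOOK_SECT FOREIGN KEY (SECTION_ID)
        REFERENCES SECTION (SECTION_ID)
);
""")


def try_insert(sql):
    """Run one INSERT and report whether a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    try_insert("INSERT INTO SECTION VALUES (9, 'Reference', 0)"),      # valid
    try_insert("INSERT INTO SECTION VALUES (0, 'Bad id', 0)"),         # CHECK fails
    try_insert("INSERT INTO SECTION VALUES (5, NULL, 0)"),             # NOT NULL fails
    try_insert("INSERT INTO BOOK VALUES (21, 'HELP', 'B.Baker', 9)"),  # valid
    try_insert("INSERT INTO BOOK VALUES (21, 'Again', 'X', 9)"),       # duplicate key
    try_insert("INSERT INTO BOOK VALUES (22, 'Orphan', 'Y', 77)"),     # no such section
]
print(results)
```

Only the two well-formed rows are stored; every other insert violates exactly one of the constraints described above.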

CREATE TABLE AUDIT (
    ROWID ROWID,
    OWNER VARCHAR2,
    TABLE_NAME VARCHAR2,
    CONSTRAINT VARCHAR2)

CREATE TABLE SECTION (
    SECTION_ID NUMBER(3) CONSTRAINT S_ID CHECK (SECTION_ID > 0),
    SECTION_NAME CHAR(30) CONSTRAINT S_NAME NOT NULL,
    BOOK_COUNT NUMBER(6),
    CONSTRAINT SECT_PRIME PRIMARY KEY (SECTION_ID), EXCEPTIONS INTO AUDIT)

CREATE TABLE BOOK (
    ISBN NUMBER(10) CONSTRAINT B_ISBN CHECK (ISBN BETWEEN 1 AND 2000),
    TITLE VARCHAR2(200) CONSTRAINT B_TITLE NOT NULL,
    AUTHOR VARCHAR2(50) CONSTRAINT B_AUTH NOT NULL,
    COST NUMBER(8,2) DEFAULT 0.00 DISABLE,
    LENT_DATE DATE,
    RETURNED_DATE DATE,
    TIMES_LENT NUMBER(6),
    SECTION_ID NUMBER(3),
    CONSTRAINT BOOK_PRIME PRIMARY KEY (ISBN),
    CONSTRAINT BOOK_SECT FOREIGN KEY (SECTION_ID) REFERENCES SECTION(SECTION_ID) ON DELETE CASCADE)

Oracle (and any other decent RDBMS) would not allow us to delete a section which had books assigned to it, as this breaks integrity rules. If we wanted to get rid of all the book records assigned to a particular section when that section was deleted, we could implement a DELETE CASCADE. The delete cascade operates across a foreign key link and removes all child records associated with a parent record (we would probably want to reassign the books rather than delete them in the real world).

To log constraint violations I have created a new table (AUDIT) and stated that all exceptions on the SECTION table should be logged in this table. You can then view the contents of this table with standard SELECT statements. The AUDIT table must have the shown structure but can be called anything.

It is possible to record a description or comment against a newly created or existing table or individual column by using the COMMENT command. The COMMENT command writes your table / column description into the data dictionary. You can query column comments by selecting against the dictionary views ALL_COL_COMMENTS and USER_COL_COMMENTS. You can query table comments by selecting against the dictionary views ALL_TAB_COMMENTS and USER_TAB_COMMENTS. Comments can be up to 255 characters long.
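The effect of a delete cascade can be observed on a couple of rows. As before, this is a sketch under assumptions rather than Oracle itself: SQLite (via Python's sqlite3) is used as a stand-in, the tables are trimmed to the columns that matter, and the sample rows are taken from the library example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: SQLite stands in for Oracle; foreign keys must be enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE SECTION (
    SECTION_ID   NUMBER(3) PRIMARY KEY,
    SECTION_NAME CHAR(30) NOT NULL
);
CREATE TABLE BOOK (
    ISBN       NUMBER(10) PRIMARY KEY,
    TITLE      VARCHAR2(200) NOT NULL,
    SECTION_ID NUMBER(3) REFERENCES SECTION (SECTION_ID) ON DELETE CASCADE
);
INSERT INTO SECTION VALUES (9, 'Reference');
INSERT INTO SECTION VALUES (10, 'Fiction');
INSERT INTO BOOK VALUES (21, 'HELP', 9);
INSERT INTO BOOK VALUES (87, 'Killer Bees', 9);
INSERT INTO BOOK VALUES (90, 'Up the creek', 10);
""")

# Deleting the parent row removes every child row that references it.
conn.execute("DELETE FROM SECTION WHERE SECTION_ID = 9")
remaining = [row[0] for row in conn.execute("SELECT ISBN FROM BOOK ORDER BY ISBN")]
print(remaining)
```

Without the ON DELETE CASCADE clause the same DELETE would be refused, because books 21 and 87 would be left pointing at a section that no longer exists.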

Altering tables and constraints

Modification of database object structure is executed with the ALTER statement.

You can modify a constraint as follows:
Add new constraint to column or table.
Remove constraint.
Enable / disable constraint.
You cannot change a constraint definition.

You can modify a table as follows:
Add new columns.
Modify existing columns.
You cannot delete an existing column.

An example of adding a column to a table is given below:

ALTER TABLE JD11.BOOK ADD (REVIEW VARCHAR2(200))

This statement adds a new column (REVIEW) to our book table, to enable library members to browse the database and read short reviews of the books.

If we want to add a constraint to our new column we can use the following ALTER statement:

ALTER TABLE JD11.BOOK MODIFY (REVIEW NOT NULL)

Note that we can't specify a constraint name with the above statement. If we wanted to further modify a constraint (other than enable / disable) we would have to drop the constraint and then reapply it, specifying any changes.

Assuming that we decide that 200 bytes is insufficient for our review field, we might then want to increase its size. The statement below demonstrates this:

ALTER TABLE JD11.BOOK MODIFY (REVIEW VARCHAR2(400))

We could not decrease the size of the column if the REVIEW column contained any data.
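Adding a column to a table that already holds data can be illustrated as follows. This sketch assumes SQLite (via Python's sqlite3) as a stand-in for Oracle, and the syntax differs slightly: SQLite writes ADD COLUMN without brackets and has no MODIFY clause, so only the column addition is shown.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; the table is trimmed to two columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BOOK (ISBN NUMBER(10) PRIMARY KEY, TITLE VARCHAR2(200))")
conn.execute("INSERT INTO BOOK VALUES (21, 'HELP')")

# SQLite spelling of: ALTER TABLE BOOK ADD (REVIEW VARCHAR2(200))
conn.execute("ALTER TABLE BOOK ADD COLUMN REVIEW VARCHAR2(200)")

columns = [row[1] for row in conn.execute("PRAGMA table_info(BOOK)")]
review = conn.execute("SELECT REVIEW FROM BOOK WHERE ISBN = 21").fetchone()[0]
print(columns, review)
```

Rows that existed before the ALTER hold NULL in the new column, which is worth remembering before a NOT NULL constraint is applied to it.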

ALTER TABLE JD11.BOOK DISABLE CONSTRAINT B_AUTH
ALTER TABLE JD11.BOOK ENABLE CONSTRAINT B_AUTH

The above statements demonstrate disabling and enabling a constraint. Note that if, between disabling a constraint and re-enabling it, data was entered into the table that included NULL values in the AUTHOR column, then you wouldn't be able to re-enable the constraint. This is because the existing data would break the constraint integrity. You could update the column to replace NULL values with some default and then re-enable the constraint.

Dropping (deleting) tables and constraints

To drop a constraint from a table we use the ALTER statement with a DROP clause. Some examples follow:

ALTER TABLE JD11.BOOK DROP CONSTRAINT B_AUTH

The above statement will remove the not null constraint (defined at table creation) from the AUTHOR column. The value following the CONSTRAINT keyword is the name of the constraint.

ALTER TABLE JD11.BOOK DROP PRIMARY KEY

The above statement drops the primary key constraint on the BOOK table.

ALTER TABLE JD11.SECTION DROP PRIMARY KEY CASCADE

The above statement drops the primary key on the SECTION table. The CASCADE option drops the foreign key constraint on the BOOK table at the same time.

Use the DROP command to delete database structures like tables. Dropping a table removes the structure, data, privileges, views and synonyms associated with the table (you cannot roll back the DROP, so be careful). You can specify a CASCADE option to ensure that constraints referring to the dropped table within other tables (foreign keys) are also removed by the DROP.

DROP TABLE SECTION

The above statement drops the table SECTION but leaves the foreign key reference within the BOOK table.

DROP TABLE SECTION CASCADE CONSTRAINTS

7.3 DML Statements

Data manipulation language is the area of SQL that allows you to change data within the database. It consists of only three command statement groups: INSERT, UPDATE and DELETE.

Inserting new rows into a table

We insert new rows into a table with the INSERT INTO command. A simple example is given below.

INSERT INTO JD11.SECTION VALUES (SECIDNUM.NEXTVAL, 'Computing', 0)

The INSERT INTO command is followed by the name of the table (and owning schema if required). This in turn is followed by the VALUES keyword, which denotes the start of the value list. The value list comprises all the values to insert into the specified columns. We have not specified the columns we want to insert into in this example, so we must provide a value for each and every column in the correct order. The correct order of values can be determined by doing a SELECT * or DESCRIBE against the required table; the order in which the columns are displayed is the order of the values that you specify in the value list.

If we want to specify columns individually (when not filling all values in a new row) we can do this with a column list specified before the VALUES keyword. Our example is reworked below. Note that we can specify the columns in any order; our values are now in the order that we specified for the column list.

INSERT INTO JD11.SECTION (SECTION_NAME, SECTION_ID) VALUES ('Computing', SECIDNUM.NEXTVAL)

In the above example we haven't specified the BOOK_COUNT column, so we don't provide a value for it. This column will be set to NULL, which is acceptable since we don't have any constraint on the column that would prevent our new row from being inserted.

The SQL required to generate the data in the two test tables is given below.

INSERT INTO JD11.SECTION (SECTION_NAME, SECTION_ID) VALUES ('Fiction', 10);
INSERT INTO JD11.SECTION (SECTION_NAME, SECTION_ID) VALUES ('Romance', 5);
INSERT INTO JD11.SECTION (SECTION_NAME, SECTION_ID) VALUES ('Science Fiction', 6);
INSERT INTO JD11.SECTION (SECTION_NAME, SECTION_ID) VALUES ('Science', 7);
INSERT INTO JD11.SECTION (SECTION_NAME, SECTION_ID) VALUES ('Reference', 9);
INSERT INTO JD11.SECTION (SECTION_NAME, SECTION_ID) VALUES ('Law', 11);

INSERT INTO JD11.BOOK (ISBN, TITLE, AUTHOR, COST, LENT_DATE, RETURNED_DATE, TIMES_LENT, SECTION_ID)
VALUES (21, 'HELP', 'B.Baker', 20.90, '20-AUG-97', NULL, 10, 9);
INSERT INTO JD11.BOOK (ISBN, TITLE, AUTHOR, COST, LENT_DATE, RETURNED_DATE, TIMES_LENT, SECTION_ID)
VALUES (87, 'Killer Bees', 'E.F.Hammond', 29.90, NULL, NULL, NULL, 9);
INSERT INTO JD11.BOOK (ISBN, TITLE, AUTHOR, COST, LENT_DATE, RETURNED_DATE, TIMES_LENT, SECTION_ID)
VALUES (90, 'Up the creek', 'K.Klydsy', 15.95, '15-JAN-97', '21-JAN-97', 1, 10);
INSERT INTO JD11.BOOK (ISBN, TITLE, AUTHOR, COST, LENT_DATE, RETURNED_DATE, TIMES_LENT, SECTION_ID)
VALUES (22, 'Seven seas', 'J.J.Jacobs', 16.00, '21-DEC-97', NULL, 19, 5);
INSERT INTO JD11.BOOK (ISBN, TITLE, AUTHOR, COST, LENT_DATE, RETURNED_DATE, TIMES_LENT, SECTION_ID)
VALUES (91, 'Dirty steam trains', 'J.SP.Smith', 8.25, '14-JAN-98', NULL, 98, 9);
INSERT INTO JD11.BOOK (ISBN, TITLE, AUTHOR, COST, LENT_DATE, RETURNED_DATE, TIMES_LENT, SECTION_ID)
VALUES (101, 'The story of trent', 'T.Wilbury', 17.89, '10-JAN-98', '16-JAN-98', 12, 6);
INSERT INTO JD11.BOOK (ISBN, TITLE, AUTHOR, COST, LENT_DATE, RETURNED_DATE, TIMES_LENT, SECTION_ID)
VALUES (8, 'Over the past again', 'K.Jenkins', 19.87, NULL, NULL, NULL, 10);
INSERT INTO JD11.BOOK (ISBN, TITLE, AUTHOR, COST, LENT_DATE, RETURNED_DATE, TIMES_LENT, SECTION_ID)
VALUES (79, 'Courses for horses', 'H.Harriot', 10.34, '17-JAN-98', NULL, 12, 9);
INSERT INTO JD11.BOOK (ISBN, TITLE, AUTHOR, COST, LENT_DATE, RETURNED_DATE, TIMES_LENT, SECTION_ID)
VALUES (989, 'Leaning on a tree', 'M.Kilner', 19.41, '12-NOV-97', '22-NOV-97', 56, 11);
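The behaviour of an INSERT with a column list can be checked on the SECTION data. The sketch below assumes SQLite (via Python's sqlite3) as a stand-in for Oracle, drops the JD11 schema prefix, and passes the values as bound parameters, which is a choice made for this example rather than something the statements above do.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; the JD11 schema prefix is omitted.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE SECTION (
    SECTION_ID   NUMBER(3) PRIMARY KEY,
    SECTION_NAME CHAR(30) NOT NULL,
    BOOK_COUNT   NUMBER(6)
)""")

# Section data from the text, in the order of the column list below.
sections = [
    ("Fiction", 10), ("Romance", 5), ("Science Fiction", 6),
    ("Science", 7), ("Reference", 9), ("Law", 11),
]
with conn:  # one transaction for all six rows
    conn.executemany(
        "INSERT INTO SECTION (SECTION_NAME, SECTION_ID) VALUES (?, ?)",
        sections,
    )

rows = conn.execute(
    "SELECT SECTION_ID, SECTION_NAME, BOOK_COUNT FROM SECTION ORDER BY SECTION_ID"
).fetchall()
for row in rows:
    print(row)
```

The values follow the order of the column list, not the order of the table definition, and BOOK_COUNT comes back as NULL (None in Python) for every row because it was never given a value.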

Changing row values with UPDATE

The UPDATE command allows you to change the values of rows in a table. You can include a WHERE clause, in the same fashion as in the SELECT statement, to indicate which row(s) you want values changed in. In much the same way as with the INSERT statement, you specify the columns you want to update and the new values for those columns. The combination of WHERE clause (row selection) and column specification (column selection) allows you to pinpoint exactly the value(s) you want changed. Unlike the INSERT command, the UPDATE command can change multiple rows, so take care that you are updating only the values you want changed (see the transactions discussion for methods of limiting damage from accidental updates).

An example is given below. It updates a single row in our BOOK table:

    UPDATE JD11.BOOK
    SET TITLE = 'Leaning on a wall',
        AUTHOR = 'J.Killner',
        TIMES_LENT = 0,
        LENT_DATE = NULL,
        RETURNED_DATE = NULL
    WHERE ISBN = 989

We specify the table to be updated after the UPDATE keyword. Following the SET keyword we specify a comma-delimited list of column names and new values; each column to be updated must be specified here (note that you can set columns to NULL by using the NULL keyword instead of a new value). The WHERE clause follows the last column/new value specification and is constructed in the same way as for the SELECT statement. Use the WHERE clause to pinpoint which rows are to be updated. If you do not specify a WHERE clause on an UPDATE command, all rows will be updated (this may or may not be the desired result).
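As a rough illustration of how the WHERE clause narrows an UPDATE, and of how a transaction can limit the damage from an accidental one, the sketch below uses Python's built-in sqlite3 module as a stand-in for the Oracle schema in these notes. The cut-down book table and its three sample rows are assumptions made only for this example.

```python
import sqlite3

# In-memory SQLite database standing in for the JD11 schema (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (isbn INTEGER PRIMARY KEY, title TEXT, times_lent INTEGER)")
conn.executemany(
    "INSERT INTO book VALUES (?, ?, ?)",
    [(989, "Leaning on a tree", 56), (21, "HELP", 10), (87, "Killer Bees", None)],
)
conn.commit()

# Targeted update: the WHERE clause restricts the change to one row.
cur = conn.execute(
    "UPDATE book SET title = ?, times_lent = 0 WHERE isbn = ?",
    ("Leaning on a wall", 989),
)
targeted_rows = cur.rowcount  # number of rows the statement changed
conn.commit()

# Accidental update: no WHERE clause, so every row is changed.
cur = conn.execute("UPDATE book SET times_lent = 0")
accidental_rows = cur.rowcount
conn.rollback()  # undo the uncommitted change

times_lent_after = [
    row[0] for row in conn.execute("SELECT times_lent FROM book ORDER BY isbn")
]
```

Checking the affected-row count before committing is one simple guard: a count larger than expected is a hint that the WHERE clause is missing or too broad.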

Deleting rows with DELETE

The DELETE command allows you to remove rows from a table. You can include a WHERE clause, in the same fashion as in the SELECT statement, to indicate which row(s) you want deleted. In nearly all cases you should specify a WHERE clause: running a DELETE without a WHERE clause deletes ALL rows from the table. Unlike the INSERT command, the DELETE command can change multiple rows, so take great care that you are deleting only the rows you want removed (see the transactions discussion for methods of limiting damage from accidental deletions).

An example is given below. It deletes a single row in our BOOK table:

    DELETE FROM JD11.BOOK WHERE ISBN = 989

The DELETE FROM command is followed by the name of the table from which a row will be deleted, followed by a WHERE clause specifying the column/condition values for the deletion.

    DELETE FROM JD11.BOOK WHERE ISBN <> 989

This delete removes all records from the BOOK table except the one specified. Remember that if you omit the WHERE clause, all rows will be deleted.

7.4 View Definitions

DBMaker provides several convenient methods of customizing and speeding up access to your data. Views and synonyms are supported to allow user-defined views and names for database objects. Indexes provide a much faster method of retrieving data from a table when you use a column with an index in a query.

Managing Views

DBMaker provides the ability to define a virtual table, called a view, which is based on existing tables and is stored in the database as a definition and a user-defined view name. The view definition is stored persistently in the database, but the actual data that you will see in the view is not physically stored anywhere. Rather, the data is stored in the base tables from which the view's rows are derived. A view is defined by a query which references one or more tables (or other views).

Views are a very helpful mechanism for using a database. For example, you can define complex queries once and use them repeatedly without having to reinvent them over and over. Furthermore, views can be used to enhance the security of your database by restricting access to a predetermined set of rows and/or columns of a table.

Because the rows of a view are derived by querying tables, it cannot be determined which rows of the base tables an update should change. Due to this limitation, views can only be queried. Users cannot update, insert into, or delete from views.
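The following sketch illustrates the two properties just described: a view stores only its definition, and it can only be queried. It uses Python's sqlite3 module rather than DBMaker, and the employees table, its columns, and the sample row are assumptions for illustration. SQLite happens to reject writes to a plain view too, although other systems may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (first_name TEXT, last_name TEXT, phone TEXT, salary REAL)"
)
conn.execute("INSERT INTO employees VALUES ('Ann', 'Lee', '555-0100', 5000.0)")

# The view stores only its defining query, not a copy of the rows.
conn.execute(
    "CREATE VIEW emp_view (first_name, last_name, telephone) AS "
    "SELECT first_name, last_name, phone FROM employees"
)
before = conn.execute("SELECT * FROM emp_view").fetchall()

# A change to the base table is visible through the view immediately.
conn.execute("UPDATE employees SET phone = '555-0199' WHERE last_name = 'Lee'")
after = conn.execute("SELECT * FROM emp_view").fetchall()

# Writing through a plain view is rejected by SQLite as well.
try:
    conn.execute("INSERT INTO emp_view VALUES ('Bob', 'Ray', '555-0111')")
    view_is_read_only = False
except sqlite3.Error:
    view_is_read_only = True
```

Note that the salary column never appears in the view, which is the column-level restriction mentioned above as a security benefit.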

Creating Views

Each view is defined by a name together with a query that references tables or other views. You can specify a list of column names for the view different from those in the original table when creating a view. If you do not specify any new column names, the view will use the column names from the underlying tables.

For example, if you want users to see only three columns of the table Employees, you can create a view with the SQL command shown below. Users can then view only the FirstName, LastName and Telephone columns of the table Employees through the view empView.

    dmSQL> create view empView (FirstName, LastName, Telephone) as
           select FirstName, LastName, Phone from Employees;

The query that defines a view cannot contain the ORDER BY clause or UNION operator.

Dropping Views

You can drop a view when it is no longer required. When you drop a view, only the definition stored in the system catalog is removed. There is no effect on the base tables that the view was derived from. To drop a view, execute the following command:

    dmSQL> DROP VIEW empView;

Managing Synonyms

A synonym is an alias, or alternate name, for any table or view. Since a synonym is simply an alias, it requires no storage other than its definition in the system catalog.

Synonyms are useful for simplifying a fully qualified table or view name. DBMaker normally identifies tables and views with fully qualified names that are composites of the owner and object names. By using a synonym, anyone can access a table or view through the corresponding synonym without having to use the fully qualified name. Because a synonym has no owner name, all synonyms in the database must be unique so DBMaker can identify them.

Creating Synonyms

You can create a synonym with the following SQL command:

    dmSQL> create synonym Employees for Employees;

If the owner of the table Employees is the SYSADM, this command creates an alias named Employees for the table SYSADM.Employees. All database users can directly reference the table SYSADM.Employees through the synonym Employees.

Dropping Synonyms

You can drop a synonym that is no longer required. When you drop a synonym, only its definition is removed from the system catalog. The following SQL command drops the synonym Employees:

    dmSQL> drop synonym Employees;

Managing Indexes

An index provides support for fast random access to a row. You can build indexes on a table to speed up searching. For example, when you execute the query SELECT NAME FROM EMPLOYEES WHERE NUMBER = 10005, it is possible to retrieve the data in a much shorter time if there is an index created on the NUMBER column.

An index can be composed of more than one column, up to a maximum of 16 columns. Although a table can have up to 252 columns, only the first 127 columns can be used in an index.
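To see how an index changes the way the example query is answered, the sketch below asks SQLite (via Python's sqlite3 module, used here in place of DBMaker) for its query plan before and after creating an index. The index name idx_number and the generated sample rows are assumptions; the wording of the plan text differs between SQLite versions, so the code only looks for stable fragments of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (number INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(10000 + i, f"emp{i}") for i in range(100)],
)

query = "SELECT name FROM employees WHERE number = 10005"


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_without_index = plan(query)  # a scan of the whole table
conn.execute("CREATE INDEX idx_number ON employees (number)")
plan_with_index = plan(query)     # a search that uses idx_number

result = conn.execute(query).fetchall()
```

The query result is the same in both cases; only the access path changes, which is why an index can be added or dropped without touching application queries.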

An index can be unique or non-unique. In a unique index, no more than one row can have the same key value, with the exception that any number of rows may have NULL values. If you create a unique index on a non-empty table, DBMaker will check whether all existing keys are distinct. If there are duplicate keys, DBMaker will return an error message. After creating a unique index on a table, whenever you insert a row in this table DBMaker will verify that no existing row already has the same key as the new row.

When creating an index, you can specify the sort order of each index column as ascending or descending. For example, suppose there are five keys in a table with the values 1, 3, 9, 2, and 6. In ascending order the sequence of keys in the index is 1, 2, 3, 6, and 9, and in descending order the sequence of keys in the index is 9, 6, 3, 2, and 1.

When you execute a query, the index order will occasionally affect the order of the data output. For example, if you have a table named friend_table with NAME and AGE columns, the output will appear as below when you execute the query SELECT NAME, AGE FROM FRIEND_TABLE WHERE AGE > 20 using a descending index on the AGE column.

    name      age
    Jeff      49
    Kevin     40
    Jerry     38
    Hughes    30
    Cathy     22

As for tables, when you create an index you can specify the fillfactor for it. The fillfactor denotes how dense the keys will be in the index pages. The legal fillfactor values are in the range from 1% to 100%, and the default is 100%. If you often update data after creating the index, you can set a loose fillfactor in the index, for example 60%. If you never update the data in this table, you can leave the fillfactor at the default value of 100%.

Creating Indexes

To create an index on a table, you must specify the index name and index columns. You can specify the sort order of each column as ascending (ASC) or descending (DESC). The default sort order is ascending.

For example, the following SQL command creates an index idx1 on the column Number of table Employees in descending order.

    dmSQL> create index idx1 on Employees (Number desc);

If you want to create a unique index, you have to specify it explicitly. Otherwise DBMaker implicitly creates non-unique indexes. The following example shows you how to create a unique index idx1 on the column Number of the table Employees:

    dmSQL> create unique index idx1 on Employees (Number);

The next example shows you how to create an index with a specified fillfactor:

    dmSQL> create index idx2 on Employees (Number, LastName DESC) fillfactor 60;

Dropping Indexes

You can drop indexes using the DROP INDEX statement. In general, you might need to drop an index if it becomes fragmented, which reduces its efficiency. Rebuilding the index will create a denser, unfragmented index.

If the index is a primary key and is referred to by other tables, it cannot be dropped.

The following SQL command drops the index idx1 from the table Employees.

    dmSQL> drop index idx1 from Employees;

7.5 Constraints and Triggers

Constraints are declarations of conditions about the database that must remain true. These include attribute-based, tuple-based, key, and referential integrity constraints. The system checks for the violation of the constraints on actions that may cause a violation, and aborts the action accordingly. Information on SQL constraints can be found in the textbook. The Oracle implementation of constraints differs from the SQL standard, as documented in Oracle 9i SQL versus Standard SQL.

Triggers are a special PL/SQL construct similar to procedures. However, a procedure is executed explicitly from another block via a procedure call, while a trigger is executed implicitly whenever the triggering event happens. The triggering event is either an INSERT, DELETE, or UPDATE command. The timing can be either BEFORE or AFTER. The trigger can be either row-level or statement-level, where the former fires once for each row affected by the triggering statement and the latter fires once for the whole statement.

Deferring Constraint Checking

Sometimes it is necessary to defer the checking of certain constraints, most commonly in the "chicken-and-egg" problem. Suppose we want to say:

    CREATE TABLE chicken (cID INT PRIMARY KEY,
                          eID INT REFERENCES egg(eID));
    CREATE TABLE egg (eID INT PRIMARY KEY,
                      cID INT REFERENCES chicken(cID));

But if we simply type the above statements into Oracle, we'll get an error. The reason is that the CREATE TABLE statement for chicken refers to table egg, which hasn't been created yet! Creating egg first won't help either, because egg refers to chicken.

To work around this problem, we need SQL schema modification commands. First, create chicken and egg without foreign key declarations:

    CREATE TABLE chicken (cID INT PRIMARY KEY,
                          eID INT);
    CREATE TABLE egg (eID INT PRIMARY KEY,
                      cID INT);

Then, we add foreign key constraints:

    ALTER TABLE chicken ADD CONSTRAINT chickenREFegg
        FOREIGN KEY (eID) REFERENCES egg(eID)
        INITIALLY DEFERRED DEFERRABLE;
    ALTER TABLE egg ADD CONSTRAINT eggREFchicken
        FOREIGN KEY (cID) REFERENCES chicken(cID)
        INITIALLY DEFERRED DEFERRABLE;

INITIALLY DEFERRED DEFERRABLE tells Oracle to do deferred constraint checking. For example, to insert (1, 2) into chicken and (2, 1) into egg, we use:

    INSERT INTO chicken VALUES (1, 2);
    INSERT INTO egg VALUES (2, 1);
    COMMIT;

Because we've declared the foreign key constraints as "deferred", they are only checked at the commit point. (Without deferred constraint checking, we cannot insert anything into chicken and egg, because the first INSERT would always be a constraint violation.)

Finally, to get rid of the tables, we have to drop the constraints first, because Oracle won't allow us to drop a table that's referenced by another table.

    ALTER TABLE egg DROP CONSTRAINT eggREFchicken;
    ALTER TABLE chicken DROP CONSTRAINT chickenREFegg;
    DROP TABLE egg;
    DROP TABLE chicken;

Basic Trigger Syntax

Below is the syntax for creating a trigger in Oracle (which differs slightly from standard SQL syntax):

    CREATE [OR REPLACE] TRIGGER <trigger_name>
        {BEFORE|AFTER} {INSERT|DELETE|UPDATE} ON <table_name>
        [REFERENCING [NEW AS <new_row_name>] [OLD AS <old_row_name>]]
        [FOR EACH ROW [WHEN (<trigger_condition>)]]
        <trigger_body>

Some important points to note:

• You can create only BEFORE and AFTER triggers for tables. (INSTEAD OF triggers are only available for views; typically they are used to implement view updates.)
• You may specify up to three triggering events using the keyword OR. Furthermore, UPDATE can be optionally followed by the keyword OF and a list of attribute(s) in <table_name>. If present, the OF clause defines the event to be only an update of the attribute(s) listed after OF. Here are some examples:
      ... INSERT ON R ...
      ... INSERT OR DELETE OR UPDATE ON R ...
      ... UPDATE OF A, B OR INSERT ON R ...
• If the FOR EACH ROW option is specified, the trigger is row-level; otherwise, the trigger is statement-level.
• Only for row-level triggers:
  o The special variables NEW and OLD are available to refer to new and old tuples respectively. Note: In the trigger body, NEW and OLD must be preceded by a colon (":"), but in the WHEN clause, they do not have a preceding colon! See the example below.
  o The REFERENCING clause can be used to assign aliases to the variables NEW and OLD.
  o A trigger restriction can be specified in the WHEN clause, enclosed by parentheses. The trigger restriction is a SQL condition that must be satisfied in order for Oracle to fire the trigger. This condition cannot contain subqueries. Without the WHEN clause, the trigger is fired for each row.
• <trigger_body> is a PL/SQL block, rather than a sequence of SQL statements. Oracle has placed certain restrictions on what you can do in <trigger_body>, in order to avoid situations where one trigger performs an action that triggers a second trigger, which then triggers a third, and so on, which could potentially create an infinite loop. The restrictions on <trigger_body> include:
  o You cannot modify the same relation whose modification is the event triggering the trigger.
  o You cannot modify a relation connected to the triggering relation by another constraint such as a foreign-key constraint.

Trigger Example

We illustrate Oracle's syntax for creating a trigger through an example based on the following two tables:

    CREATE TABLE T4 (a INTEGER, b CHAR(10));
    CREATE TABLE T5 (c CHAR(10), d INTEGER);

We create a trigger that may insert a tuple into T5 when a tuple is inserted into T4. Specifically, the trigger checks whether the new tuple has a first component 10 or less, and if so inserts the reverse tuple into T5:

    CREATE TRIGGER trig1
        AFTER INSERT ON T4
        REFERENCING NEW AS newRow
        FOR EACH ROW
        WHEN (newRow.a <= 10)
        BEGIN
            INSERT INTO T5 VALUES (:newRow.b, :newRow.a);
        END trig1;
    .
    run;

Notice that we end the CREATE TRIGGER statement with a dot and run, as for all PL/SQL statements in general. Running the CREATE TRIGGER statement only creates the trigger; it does not execute the trigger. Only a triggering event, such as an insertion into T4 in this example, causes the trigger to execute.
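For readers without an Oracle installation, the same trigger can be approximated in SQLite through Python's sqlite3 module. This is a sketch in a different SQL dialect, not the Oracle syntax above: SQLite has no REFERENCING clause and writes NEW without a colon. The inserted sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE T4 (a INTEGER, b CHAR(10));
    CREATE TABLE T5 (c CHAR(10), d INTEGER);

    -- SQLite dialect: NEW is used without a colon and without REFERENCING.
    CREATE TRIGGER trig1
    AFTER INSERT ON T4
    FOR EACH ROW
    WHEN NEW.a <= 10
    BEGIN
        INSERT INTO T5 VALUES (NEW.b, NEW.a);
    END;
    """
)

conn.execute("INSERT INTO T4 VALUES (5, 'small')")   # fires the trigger
conn.execute("INSERT INTO T4 VALUES (50, 'large')")  # WHEN condition is false

t4_count = conn.execute("SELECT COUNT(*) FROM T4").fetchone()[0]
t5_rows = conn.execute("SELECT c, d FROM T5").fetchall()
```

Both rows reach T4, but only the row whose first component is 10 or less produces a reversed tuple in T5.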

Displaying Trigger Definition Errors

As for PL/SQL procedures, if you get a message

    Warning: Trigger created with compilation errors.

you can see the error messages by typing

    show errors trigger <trigger_name>;

Alternatively, you can type SHO ERR (short for SHOW ERRORS) to see the most recent compilation error. Note that the reported line numbers where the errors occur are not accurate.

Viewing Defined Triggers

To view a list of all defined triggers, use:

    select trigger_name from user_triggers;

For more details on a particular trigger:

    select trigger_type, triggering_event, table_name, referencing_names, trigger_body
    from user_triggers
    where trigger_name = '<trigger_name>';

Dropping Triggers

To drop a trigger:

    drop trigger <trigger_name>;

Disabling Triggers

To disable or enable a trigger:

    alter trigger <trigger_name> {disable|enable};

Aborting Triggers with Error

Triggers can often be used to enforce constraints. The WHEN clause or body of the trigger can check for the violation of certain conditions and signal an error accordingly using the Oracle built-in function RAISE_APPLICATION_ERROR. The action that activated the trigger (insert, update, or delete) would be aborted. For example, the following trigger enforces the constraint Person.age >= 0:

    create table Person (age int);

    CREATE TRIGGER PersonCheckAge
    AFTER INSERT OR UPDATE OF age ON Person
    FOR EACH ROW
    BEGIN
        IF (:new.age < 0) THEN
            RAISE_APPLICATION_ERROR(-20000, 'no negative age allowed');
        END IF;
    END;
    .
    RUN;

If we attempted to execute the insertion:

    insert into Person values (-3);

we would get the error message:

    ERROR at line 1:
    ORA-20000: no negative age allowed
    ORA-06512: at "MYNAME.PERSONCHECKAGE", line 3
    ORA-04088: error during execution of trigger 'MYNAME.PERSONCHECKAGE'

and nothing would be inserted. In general, the effects of both the trigger and the triggering statement are rolled back.
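The same idea can be tried out in SQLite, where the RAISE(ABORT, ...) function plays a role comparable to RAISE_APPLICATION_ERROR. The sketch below uses Python's sqlite3 module and is an analogue, not Oracle code: SQLite does not accept several events in one trigger, so only the INSERT case is covered here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Person (age INT);

    -- RAISE(ABORT, ...) stands in for Oracle's RAISE_APPLICATION_ERROR.
    CREATE TRIGGER PersonCheckAge
    AFTER INSERT ON Person
    FOR EACH ROW
    WHEN NEW.age < 0
    BEGIN
        SELECT RAISE(ABORT, 'no negative age allowed');
    END;
    """
)

conn.execute("INSERT INTO Person VALUES (3)")  # accepted

try:
    conn.execute("INSERT INTO Person VALUES (-3)")  # aborted by the trigger
    error_message = None
except sqlite3.DatabaseError as exc:
    error_message = str(exc)

ages = conn.execute("SELECT age FROM Person").fetchall()
```

As in the Oracle case, the offending statement is undone, so only the valid row remains in the table.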

7.6 Keys and Foreign Keys

The word "key" is much used and abused in the context of relational database design. In pre-relational databases (hierarchical, networked) and file systems (ISAM, VSAM, et al.) "key" often referred to the specific structure and components of a linked list, chain of pointers, or other physical locator outside of the data. It is thus natural, but unfortunate, that today people often associate "key" with an RDBMS "index". We will explain what a key is and how it differs from an index.

According to Codd, Date, and all other experts, a key has only one meaning in relational theory: it is a set of one or more columns whose combined values are unique among all occurrences in a given table. A key is the relational means of specifying uniqueness.

Why Keys Are Important

Keys are crucial to a table structure for the following reasons:

They ensure that each record in a table is precisely identified. As you already know, a table represents a singular collection of similar objects or events. (For example, a CLASSES table represents a collection of classes, not just a single class.) The complete set of records within the table constitutes the collection, and each record represents a unique instance of the table's subject within that collection. You must have some means of accurately identifying each instance, and a key is the device that allows you to do so.

They help establish and enforce various types of integrity. Keys are a major component of table-level integrity and relationship-level integrity. For instance, they enable you to ensure that a table has unique records and that the fields you use to establish a relationship between a pair of tables always contain matching values.
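The uniqueness requirement can be made concrete with a small check. The sketch below, in Python, tests whether a set of columns has unique, non-null combined values in a given set of rows. The function name is_key is hypothetical, and the rows are a subset of the account relation from Unit 8. Such a test can only refute a key for the data at hand; whether a column set is a key of the table in general is a design decision.

```python
def is_key(rows, columns):
    """Return True if `columns` have unique, non-null combined values in `rows`."""
    seen = set()
    for row in rows:
        value = tuple(row[c] for c in columns)
        if any(v is None for v in value):
            return False  # a key component may not be null
        if value in seen:
            return False  # two rows share the same combined value
        seen.add(value)
    return True


# A subset of the account relation from Unit 8.
account = [
    {"branch_name": "Downtown", "account_number": "A101", "balance": 500},
    {"branch_name": "Mianus", "account_number": "A215", "balance": 700},
    {"branch_name": "Brighton", "account_number": "A201", "balance": 900},
    {"branch_name": "Redwood", "account_number": "A222", "balance": 700},
    {"branch_name": "Brighton", "account_number": "A217", "balance": 750},
]

number_is_key = is_key(account, ["account_number"])
branch_is_key = is_key(account, ["branch_name"])
balance_is_key = is_key(account, ["balance"])
```

Here account_number passes the check, while branch_name fails because Brighton appears twice and balance fails because 700 appears twice.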

Candidate Key

As stated above, a candidate key is any set of one or more columns whose combined values are unique among all occurrences (i.e., tuples or rows). Since a null value is not guaranteed to be unique, no component of a candidate key is allowed to be null.

There can be any number of candidate keys in a table. Relational pundits are not in agreement on whether zero candidate keys is acceptable, since that would contradict the (debatable) requirement that there must be a primary key.

Primary Key

The primary key of any table is any candidate key of that table which the database designer arbitrarily designates as "primary". The primary key may be selected for convenience, comprehension, performance, or any other reasons. It is entirely proper (albeit often inconvenient) to change the selection of primary key to another candidate key.

Alternate Key

The alternate keys of any table are simply those candidate keys which are not currently selected as the primary key. According to {Date95} (page 115), "... exactly one of those candidate keys [is] chosen as the primary key [and] the remainder, if any, are then called alternate keys." In other words, the alternate keys are all the candidate keys minus the primary key.

7.7 Constraints on Attributes and Tuples

Not Null Constraints

    presC# INT REFERENCES MovieExec(cert#) NOT NULL

Attribute-Based CHECK Constraints

    presC# INT REFERENCES MovieExec(cert#)
        CHECK (presC# >= 100000)

    gender CHAR(1) CHECK (gender IN ('F', 'M')),

    presC# INT CHECK (presC# IN (SELECT cert# FROM MovieExec))

Tuple-Based CHECK Constraints

    CREATE TABLE MovieStar (
        name CHAR(30) PRIMARY KEY,
        address VARCHAR(255),
        gender CHAR(1),
        birthdate DATE,
        CHECK (gender = 'F' OR name NOT LIKE 'Ms.%')
    );

7.8 Modification of Constraints

Constraints can be considered as part of the corresponding ER models. Constraint definitions are stored in metadata tables and separated from stored procedures. (In fact, SQL Server stores the Transact-SQL creation script in the syscomments table for each view, rule, default, trigger, CHECK constraint, DEFAULT constraint, and stored procedure.) For instance, the CHECK column constraint on column f1 will be stored in the syscomments.text field as a SQL statement: ([f1] > 1). The implementation of constraints can be modified independently from the implementation of stored procedures and, with a proper design, modification of constraints does not affect the implementation of stored procedures (or related Transact-SQL scripts).

Moreover, our ER model and corresponding constraints can be mapped to any other RDBMS that supports a similar metadata format (which is, basically, true for most of the database

7.9 Cursors

A cursor is a bit image on the screen that indicates either the movement of a pointing device or the place where text will next appear. Xlib enables clients to associate a cursor with each window they create. After making the association between cursor and window, the cursor is visible whenever it is in the window. If the cursor indicates movement of a pointing device, the movement of the cursor in the window automatically reflects the movement of the device.

Xlib and VMS DECwindows provide fonts of predefined cursors. Clients that want to create their own cursors can either define a font of shapes and masks or create cursors using pixmaps.

This section describes the following:

• Creating cursors using the Xlib cursor font, a font of shapes and masks, and pixmaps
• Associating cursors with windows
• Managing cursors
• Freeing memory allocated to cursors when clients no longer need them

Create CURSOR

Xlib enables clients to use predefined cursors or to create their own cursors. To create a predefined Xlib cursor, use the CREATE FONT CURSOR routine. Xlib cursors are predefined in DECW$INCLUDE:CURSORFONT.H. See the X and Motif Quick Reference Guide for a list of the constants that refer to the predefined Xlib cursors.

The following example creates a sailboat cursor, one of the predefined Xlib cursors, and associates the cursor with a window:

    Cursor fontcursor;
    .
    .
    .
    fontcursor = XCreateFontCursor(dpy, XC_sailboat);
    XDefineCursor(dpy, win, fontcursor);

The DEFINE CURSOR routine makes the sailboat cursor automatically visible when the pointer is in window win.

To create client-defined cursors, either create a font of cursor shapes or define cursors using pixmaps. In each case, the cursor consists of the following components:

• Shape: Defines the cursor as it appears without modification in a window
• Mask: Acts as a clip mask to define how the cursor actually appears in a window
• Background color: Specifies RGB values used for the cursor background
• Foreground color: Specifies RGB values used for the cursor foreground
• Hotspot: Defines the position on the cursor that reflects movements of the pointing device

7.10 Dynamic SQL

Dynamic SQL is an enhanced form of Structured Query Language (SQL) that, unlike standard (or static) SQL, facilitates the automatic generation and execution of program statements. This can be helpful when it is necessary to write code that can adjust to varying databases, conditions, or servers. It also makes it easier to automate tasks that are repeated many times. Dynamic SQL statements are stored as strings of characters that are entered when the program runs. They can be entered by the programmer or generated by the program itself, but unlike static SQL statements, they are not embedded in the source program. Also in contrast to static SQL statements, dynamic SQL statements can change from one execution to the next.

Let's go back and review the reasons we use stored procedures and what happens when we use dynamic SQL. As a starting point we will use this procedure:

    CREATE PROCEDURE general_select @tblname nvarchar(127),
                                    @key     key_type AS -- key_type is char(3)
    EXEC('SELECT col1, col2, col3
          FROM ' + @tblname + '
          WHERE keycol = ''' + @key + '''')

The same effect could be achieved by constructing the SELECT statement in client code and sending it directly to SQL Server.

1. Permissions

If you cannot give users direct access to the tables, you cannot use dynamic SQL; it is as simple as that. In some environments, you may assume that users can be given SELECT access. But unless you know for a fact that permissions are not an issue, don't use dynamic SQL for INSERT, UPDATE and DELETE statements. I should hasten to add that this applies to permanent tables. If you are only accessing temp tables, there are never any permission issues.

2. Caching Query Plans

As we have seen, SQL Server caches the query plans for both bare SQL statements and stored procedures, but is somewhat more accurate in reusing query plans for stored procedures. In SQL Server 6.5 you could clearly say that dynamic SQL was slower, because there was a recompile each time. In later versions of SQL Server, the waters are muddier.

3. Minimizing Network Traffic

In the two previous sections we have seen that dynamic SQL in a stored procedure is not any better than bare SQL statements from the client. With network traffic it is a different matter. There is never any network cost for dynamic SQL in a stored procedure. If we look at our example procedure general_select, neither is there much to gain: the bare SQL code takes about as many bytes as the procedure call.

But say that you have a complex query which joins six tables with complex conditions, and one of the tables is one of sales0101, sales0102 etc., depending on which period the user wants data about. This is a bad table design, which we will return to, but assume for the moment that you are stuck with it. If you solve this with a stored procedure with dynamic SQL, you only need to pass the period as a parameter and don't have to pass the query each time. If the query is only passed once an hour, the gain is negligible. But if the query is passed every fifth second and the network is so-so, you are likely to notice a difference.

4. Using Output Parameters

If you write a stored procedure only to gain the benefit of an output parameter, you do not in any way affect this by using dynamic SQL. Then again, you can get OUTPUT parameters without writing your own stored procedures, since you can call sp_executesql directly from the client.

5. Encapsulating Logic

There is not much to add to what we said in our first round on stored procedures. I would like to point out, however, that once you have decided to use stored procedures, you should keep all secrets about SQL in stored procedures, so passing table names as in general_select is not a good idea. (The exception here being sysadmin utilities.)

6. Keeping Track of What Is Used

Dynamic SQL is contradictory to this aim. Any use of dynamic SQL will hide a reference, so that it will not show up in sysdepends. Neither will the reference reveal itself when you build the database without the referenced object. Still, if you refrain from passing table or column names as parameters, you at least only have to search the SQL code to find out whether a table is used. Thus, if you use dynamic SQL, confine table and column names to the procedure code.
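The advice to confine table and column names to the code can be sketched from the client side as well. The Python example below (using sqlite3 rather than SQL Server) builds the statement at run time but accepts a table name only if it appears in a fixed list, and passes the key value as a bound parameter instead of concatenating it. The whitelist ALLOWED_TABLES and the sample rows are assumptions; the table and column names follow the general_select example.

```python
import sqlite3

# Hypothetical whitelist: table names live in the code, not in user input.
ALLOWED_TABLES = {"sales0101", "sales0102"}


def general_select(conn, tblname, key):
    """Build the SELECT at run time for one of the known tables."""
    if tblname not in ALLOWED_TABLES:
        raise ValueError(f"unknown table: {tblname!r}")
    # Identifiers cannot be bound as parameters, so the checked name is
    # concatenated; the key value is passed as a bound parameter instead.
    sql = f"SELECT col1, col2, col3 FROM {tblname} WHERE keycol = ?"
    return conn.execute(sql, (key,)).fetchall()


conn = sqlite3.connect(":memory:")
for name in sorted(ALLOWED_TABLES):
    conn.execute(f"CREATE TABLE {name} (keycol CHAR(3), col1, col2, col3)")
conn.execute("INSERT INTO sales0101 VALUES ('abc', 1, 2, 3)")
conn.execute("INSERT INTO sales0102 VALUES ('abc', 4, 5, 6)")

january = general_select(conn, "sales0101", "abc")
february = general_select(conn, "sales0102", "abc")
```

Because every table name that can reach the database is listed in one place, a search of the code is enough to find out whether a table is used, which is the point made under "Keeping Track of What Is Used".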


UNIT 8

RELATIONAL ALGEBRA

8.1 Basics of Relational Algebra
8.2 Set Operations on Relations
8.3 Extended Operators of Relational Algebra
8.4 Constraints on Relations
8.5 Self Test
8.6 Summary

8.1 Basics of Relational Algebra

Relational algebra is a procedural query language, which consists of a set of operations that take one or two relations as input and produce a new relation as their result. The fundamental operations that will be discussed in this tutorial are: select, project, union, and set difference. Besides the fundamental operations, one additional operation will be discussed: set-intersection.

Each operation will be applied to tables of a sample database. Each table is otherwise known as a relation and each row within the table is referred to as a tuple. The sample database consists of tables that one might see in a bank. The sample database consists of the following 6 relations:

The account relation

    branch-name    account-number    balance
    Downtown       A101              500
    Mianus         A215              700
    Perryridge     A102              400
    Round Hill     A305              350
    Brighton       A201              900
    Redwood        A222              700
    Brighton       A217              750

8.2 Set Operations on Relations

a) Select

The select operation is a unary operation, which means it operates on one relation. Its function is to select tuples that satisfy a given predicate. To denote selection, the lowercase Greek letter sigma (σ) is used. The predicate appears as a subscript to σ. The argument relation is given in parentheses following the σ.

For example, to select those tuples of the loan relation where the branch is "Perryridge," we write:

    σ_{branch-name = "Perryridge"}(loan)

The results of the query are the following:

    branch-name    loan-number    amount
    Perryridge     L15            1500
    Perryridge     L16            1300

Comparisons such as =, ≠, <, ≤, >, and ≥ can also be used in the selection predicate. For example, a query to find all tuples in which the amount lent is more than $1200 would be written:

    σ_{amount > 1200}(loan)

b) Project

The project operation is a unary operation that returns its argument relation with certain attributes left out. Since a relation is a set, any duplicate rows are eliminated. Projection is denoted by the Greek letter pi (π). The attributes that are to appear in the result are listed as a subscript to π. The argument relation follows in parentheses. For example, the query to list all loan numbers and the amount of the loan is written as:

    π_{loan-number, amount}(loan)

The result of the query is the following:

    loan-number    amount
    L17            1000
    L23            2000
    L15            1500
    L14            1500
    L93            500
    L11            900
    L16            1300

A more complicated example, a query to find those customers who live in Harrison, is written as:

    π_{customer-name}(σ_{customer-city = "Harrison"}(customer))

c) Union

The union operation yields the results that appear in either or both of two relations. It is a binary operation denoted by the symbol ∪.

An example query would be to find the names of all bank customers who have either an account or a loan or both. To find this result we will need the information in the depositor relation and in the borrower relation. To find the names of all customers with a loan in the bank we would write:

    π_{customer-name}(borrower)

and to find the names of all customers with an account in the bank, we would write:

    π_{customer-name}(depositor)

Then, by using the union operation on these two queries, we have the query we need to obtain the wanted results. The final query is written as:

    π_{customer-name}(borrower) ∪ π_{customer-name}(depositor)

The result of the query is the following:

    customer-name
    Johnson
    Smith
    Hayes
    Turner
    Jones
    Lindsay
    Jackson
    Curry
    Williams
    Adams
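The three operations introduced so far can be tried out with ordinary Python sets, which, like relations, hold no duplicates. The sketch below is one possible encoding, not a standard library: the helper names select and project and the tuple layout are assumptions, while the rows are those of the account relation shown in Section 8.1.

```python
# Each relation is a set of tuples, so duplicates vanish automatically.
ACCOUNT_ATTRS = ("branch_name", "account_number", "balance")
account = {
    ("Downtown", "A101", 500),
    ("Mianus", "A215", 700),
    ("Perryridge", "A102", 400),
    ("Round Hill", "A305", 350),
    ("Brighton", "A201", 900),
    ("Redwood", "A222", 700),
    ("Brighton", "A217", 750),
}


def select(relation, attrs, predicate):
    """sigma: keep the tuples for which predicate(row_as_dict) is true."""
    return {t for t in relation if predicate(dict(zip(attrs, t)))}


def project(relation, attrs, wanted):
    """pi: keep only the wanted attributes; the set removes duplicates."""
    idx = [attrs.index(a) for a in wanted]
    return {tuple(t[i] for i in idx) for t in relation}


high = select(account, ACCOUNT_ATTRS, lambda r: r["balance"] > 700)
low = select(account, ACCOUNT_ATTRS, lambda r: r["balance"] < 450)
branches = project(account, ACCOUNT_ATTRS, ["branch_name"])

# Union needs compatible operands: both sides have the single attribute
# branch_name, so the set union is well defined.
high_or_low = (
    project(high, ACCOUNT_ATTRS, ["branch_name"])
    | project(low, ACCOUNT_ATTRS, ["branch_name"])
)
```

The projection onto branch_name returns six tuples from seven rows, because the two Brighton accounts collapse into one tuple, just as the text says duplicate rows are eliminated.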

d) Set Difference

The set-difference operation, denoted by −, finds tuples that are in one relation but are not in another. The expression r − s results in a relation containing those tuples in r but not in s. For example, the query to find all customers of the bank who have an account but not a loan is written as:

    π_{customer-name}(depositor) − π_{customer-name}(borrower)

The result of the query is the following:

    customer-name
    Johnson
    Turner
    Lindsay

For a set difference to be valid, it must be taken between compatible relations, just as in the union operation.

8.3 Extended Operators of Relational Algebra

The set-intersection operation is denoted by the symbol ∩. It is not a fundamental operation; however, it is a more convenient way to write r − (r − s).

An example query using this operation, to find all customers who have both a loan and an account, can be written as:

    π_{customer-name}(borrower) ∩ π_{customer-name}(depositor)

The results of the query are the following:

    customer-name
    Hayes
    Jones
    Smith

It has been shown that the set of relational algebra operations {σ, π, ∪, −, ×} is a complete set; that is, any of the other relational algebra operations can be expressed as a sequence of operations from this set. For example, the INTERSECTION operation can be expressed by using UNION and DIFFERENCE as follows:

    R ∩ S ≡ (R ∪ S) − ((R − S) ∪ (S − R))

Although, strictly speaking, INTERSECTION is not required, it is inconvenient to specify this complex expression every time we wish to specify an intersection. As another example, a JOIN operation can be specified as a CARTESIAN PRODUCT followed by a SELECT operation, as we discussed:

    R ⋈_<condition> S ≡ σ_<condition>(R × S)

Similarly, a NATURAL JOIN can be specified as a CARTESIAN PRODUCT preceded by RENAME and followed by SELECT and PROJECT operations. Hence, the various JOIN operations are also not strictly necessary for the expressive power of the relational algebra; however, they are very important because they are convenient to use and are very commonly applied in database applications.

assignment: the operation denoted by ←, which is used to assign expressions to a temporary relation variable.
Cartesian product: the operation denoted by a cross (×), which allows for combination of information from any two relations.
division: the operation denoted by ÷ and used in queries wanting to find results including the phrase "for all".
natural join: the operation for queries that involve a Cartesian product followed by a selection on the result of the Cartesian product.
project: the operation denoted by the Greek letter pi (π), which is used to return an argument with certain attributes left out.
rename: the operation denoted by the Greek letter rho (ρ), which allows the results of a relational algebra expression to be assigned a name, which can later be used to refer to them.
select: the operation denoted by the Greek letter sigma (σ), which enables a selection of tuples that satisfy a given predicate.
set difference: the operation denoted by −, which allows for finding tuples that are in one relation but are not in another.
set-intersection: the operation denoted by ∩, which results in the tuples that are in both relations the operation is applied to.
union: an operation on relations that yields the relation of all tuples that appear in either or both of two relations. Denoted by the symbol ∪.

8.4 Constraints on Relations

Representation of Relations Wecanregardarelationintwoways:asasetofvaluesandasasetofmapsfromattributesfrom values.

Let R(A1, ..., An) be the schema of a relation R, and let D1 × ... × Dn be the domain of the relation. Then R is a subset of D1 × ... × Dn, and each tuple of the relation contains a set of values, one drawn from each of the domains Di, each of which contains a unique null element, denoted ⊥.

We can also regard each element of the relation as a map from the attributes of R to values in the corresponding domains, so that if t is an element of an instance r of R, we can write t(Ai) and get the expected result.

Integrity Constraints on Relations

We define a (candidate) key on a relation schema R to be a subset K of the attributes of R such that, for any instance r of R and for all tuples t1, t2 in r, t1 and t2 agree on every attribute in K only if t1 = t2. A primary key is a candidate key in which none of the attribute values may be null. We designate one candidate key to be the primary key of R, and we speak of the projection of a tuple t onto the primary key.

We require every relation to satisfy its primary key and all its candidate keys. Let C be the set of all candidate keys of R and let PK be the primary key of R: we require every instance r of R to satisfy PK and every key in C.

Operations on Relations

Taking the view of a relation as a set, we can apply the normal set operations to relations over the same domain. If the domains differ, this is not possible. We have, of course, the normal algebraic structure to the operations: the null relation over a domain is well defined, and the null tuple is the sole element of the null relation.

We also have three relational operators we wish to consider: select, join and project.

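First we define, for each relation R, the domain of conditional expressions on relations, which map an element of an instance to true or false.

Select

Now we define the selection of R under a conditional expression c to be the set of tuples t in R for which c(t) is true.

The selection is a relation over the same domain as R and is a subset of R. We notice that we can use the same primary key for both R and the selection, and that the selection must satisfy this key, since if there exist two distinct tuples in the selection that agree on the key, then R does not satisfy the key either.

Join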

The join operation is defined by

where we have

What is the schema for this? The key? Does it satisfy it?

Project

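We define the projection of R onto a subset A of its attributes by a postcondition on its tuples, and by requiring that if D is the domain of R then the domain of the projection is D restricted to the attributes in A.

If we view R as a set of maps, we can view the projection operator as restricting each of these maps to a smaller domain.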
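To make the "set of maps" view concrete, here is a minimal Python sketch in which each tuple is a dictionary from attribute names to values. The function names (`select`, `project`, `natural_join`), the choice of a natural join, and the sample rows are illustrative assumptions; they are not definitions taken from the text.

```python
# A relation is modelled as a list of dicts (attribute name -> value).
# Names and sample data below are hypothetical, for illustration only.

def dedupe(rows):
    """Remove duplicate tuples, since a relation is a set."""
    seen, out = set(), []
    for row in rows:
        frozen = tuple(sorted(row.items()))
        if frozen not in seen:
            seen.add(frozen)
            out.append(row)
    return out

def select(relation, condition):
    """Keep the tuples for which the conditional expression is true."""
    return [row for row in relation if condition(row)]

def project(relation, attributes):
    """Restrict every map (tuple) to a smaller set of attributes."""
    return dedupe([{a: row[a] for a in attributes} for row in relation])

def natural_join(r, s):
    """Pair tuples of r and s that agree on all shared attributes."""
    out = []
    for t in r:
        for u in s:
            shared = set(t) & set(u)
            if all(t[a] == u[a] for a in shared):
                out.append({**t, **u})
    return dedupe(out)

patients = [
    {"pnum": 101, "pname": "Ann", "age": 8},
    {"pnum": 102, "pname": "Bob", "age": 45},
]
visits = [
    {"pnum": 101, "dnum": 801},
    {"pnum": 102, "dnum": 801},
    {"pnum": 102, "dnum": 802},
]

young = select(patients, lambda t: t["age"] <= 10)
names = project(patients, ["pname"])
joined = natural_join(patients, visits)
```

Note that `project` has to remove duplicates explicitly: restricting the maps can make two previously distinct tuples identical.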

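The schema of the projection of R onto A is A. If the primary key of R is contained in A, then it forms a key for the projection. Otherwise, if there are no nulls in any tuple of the projection of R, we can use A as a primary key. Otherwise we cannot in general identify a primary key.

Insertion

For each relation R we define an insertion operation: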

The insertion operation satisfies the invariant, since it will refuse to break it.

Update

For each relation R we define an update operation.

Update also preserves the invariant.

Deletion

We define the deletion of a tuple by:
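As an informal illustration only, the following Python sketch shows one way insertion, update and deletion can preserve the primary-key invariant by refusing any change that would break it. The class `Relation`, the exception `KeyViolation` and the sample rows are hypothetical names introduced here; they are not the formal definitions of the text.

```python
class KeyViolation(Exception):
    """Raised when an operation would break the primary-key invariant."""

class Relation:
    def __init__(self, key):
        self.key = tuple(key)   # names of the primary-key attributes
        self.rows = {}          # key value -> tuple (as a dict)

    def _key_of(self, row):
        values = tuple(row[a] for a in self.key)
        if any(v is None for v in values):
            raise KeyViolation("primary key attributes may not be null")
        return values

    def insert(self, row):
        k = self._key_of(row)
        if k in self.rows:
            raise KeyViolation(f"duplicate key {k}")
        self.rows[k] = dict(row)

    def update(self, key_value, changes):
        new_row = {**self.rows[key_value], **changes}
        k = self._key_of(new_row)
        if k != key_value and k in self.rows:
            raise KeyViolation(f"duplicate key {k}")
        del self.rows[key_value]
        self.rows[k] = new_row

    def delete(self, key_value):
        # Removing a tuple can never create a duplicate key.
        self.rows.pop(key_value, None)

doctors = Relation(key=["dnum"])
doctors.insert({"dnum": 801, "dname": "Lee", "rank": "surgeon"})
doctors.insert({"dnum": 802, "dname": "Kim", "rank": "oculist"})
doctors.update((802,), {"rank": "surgeon"})
```

Deletion needs no check at all, which is why only insertion and update carry the guard.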

8.5 Self Test

1. Simple selection and projection
i. Who are the patients 10 years old or younger?
ii. Who are the surgeons?
iii. What are the phone numbers of doctors?
iv. What are the phone numbers of surgeons?

2. Set Operations
i. Restate the expression σ age≤10 ∨ age≥60 (patients) using set operations.
ii. Restate the expression σ rank≠surgeon ∧ rank≠oculist (doctors) using set operations without ≠ and ∧.
iii. Find all the patients who saw doctor 801 but not 802 (i.e. dnum=801, dnum≠802).

3. Cartesian Product and Join
i. Form peer groups for patients, where a peer group is a pair of patients whose age difference is less than 10 years (you can use the rename operator ρA(R)).
ii. Who are the surgeons who visited the patient 101 (i.e. pnum=101)?
iii. Who has seen a surgeon in the past two years?
iv. Are there any non-surgeon doctors who performed a surgery (a doctor performed a surgery if the visit record shows diagnosis="operation" for him)?

4. Division
i. Who has seen all the surgeons in the past two months?
ii. Find all patients except for the youngest ones.

The questions above refer to the following relations:
patients(pnum, pname, age)
doctors(dnum, dname, rank)
visits(pnum, dnum, date, diagnosis)

8.6 Summary

• Relational algebra is a procedural query language, which consists of a set of operations that take one or two relations as input and produce a new relation as their result. The fundamental operations discussed in this unit include select, project, union, and set difference.
• Operations on relations: selection, projection, join and the set-theoretic operations.
• The set intersection operation is denoted by the symbol ∩. It is not a fundamental operation; however, it is a more convenient way to write r − (r − s).
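The claim that intersection is not fundamental can be checked on a small example using Python's built-in sets. The depositor names are chosen to agree with the query results shown in this unit; the two borrower names that have no account (Adams, Curry) are assumptions added only for illustration.

```python
# Customer names as plain sets; borrower contents are partly assumed.
depositor = {"Hayes", "Johnson", "Jones", "Lindsay", "Smith", "Turner"}
borrower = {"Adams", "Curry", "Hayes", "Jones", "Smith"}

# Set difference: customers with an account but no loan.
account_only = depositor - borrower

# Intersection written with difference only: r - (r - s).
both_via_difference = depositor - (depositor - borrower)

# Intersection written with union and difference:
# (R u S) - ((R - S) u (S - R)).
both_via_union = (depositor | borrower) - (
    (depositor - borrower) | (borrower - depositor)
)

print(sorted(account_only))
print(sorted(both_via_difference))
```

Both derived expressions give the same set as the built-in `depositor & borrower`, which is why ∩ is treated as a convenience rather than a fundamental operation.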


UNIT 9

MANAGEMENT CONSIDERATIONS

9.1 Introduction
9.2 Objectives
9.3 Organisational Resistance to DBMS Tools
9.4 Conversion from an Old System to a New System
9.5 Evaluation of a DBMS
9.6 Administration of a Database Management System
9.7 Self Test
9.8 Summary

9.1 INTRODUCTION

This unit covers issues of organizational resistance, the methodology for conversion from an old system to a new system, the importance of adopting a decentralized distributed approach, and the evaluation and administration of such systems.

9.2 OBJECTIVES

After going through this unit, you should be able to:
• Identify the factors causing resistance to the induction of new DBMS tools;
• Determine the path that must be chosen in converting from an old existing system to a new system;
• List the various factors that are important in evaluating a DBMS;
• Formulate a simple evaluation methodology for DBMS selection and acquisition;
• Enumerate the functions of the database administrator; and
• List the checkpoints and principles which must be adhered to in order that information quality is assured.

9.3 ORGANISATIONAL RESISTANCE TO DBMS TOOLS

In practice, new information systems are rarely adopted without friction, and organizations react to them by offering resistance. This is a part of an inherent opposition to change. Some aspects of change related to information systems arouse great passions. This arises because of some of the following factors:
• Political observation: Officers and managers at different levels of an organization have long-standing political equations and relationships which have helped their upward movement within the organization, and they may feel threatened by a new intervention into their styles of working.
• Information transparency: In the absence of an efficient computer-based information system, many functionaries in an organization have access to information which they control and pass on, giving it the color that suits them. The availability of information through computer-based systems to almost all who have an interest in it makes this authority disappear. It is therefore naturally resented.
• Fear of future potential: The very fact that computers can store information in a very compact manner, and that it can be collated and analyzed very speedily, gives rise to apprehensions of future adverse use of this information against an individual. Mistakes in decision making can now be highlighted and analyzed in detail after long spells of time. This would not have been possible in manual file-based systems or any system where the data does not flow so readily.

Interdepartmental rivalry, fear of personal inadequacy, incomprehension of the new regime, the loss of one's own power and the greater freedom given to others, and differences in work styles all add up to produce resistance to the induction of new information processing tools. Apart from these general considerations, there are specific reasons to resist installation of a new DBMS.

There are several points of resistance to new DBMS tools:
• Resistance to acquiring a new tool
• Resistance to choosing to use a new tool
• Resistance to learning how to use a new tool
• Resistance to using a new tool.
The selection and acquisition of a DBMS and related tools is one of the most important computer-related decisions made in an organization. It is also one of the most difficult. There are many systems from which to choose and it is very difficult to obtain the necessary information to make a good decision. Vendors always have great things to say, convincing arguments for their systems, and often many satisfied customers. Published literature and software listing services are too cursory to provide sufficient information on which to base a decision. The mere difficulty in gathering information and making the selection is one point of resistance to acquiring new DBMS tools.

The initial cost may also be a barrier to acquisition. However, the subsequent investment in training people, developing applications, and entering and maintaining data will be many times more. Selection of an inadequate system can greatly increase these subsequent costs to the point where the initial acquisition cost becomes irrelevant.

In spite of the apparent resistance to acquisition, projections for the database industry forecast a multibillion-dollar industry in the 1990s. Even though an organization may acquire a DBMS, there are still several additional points of resistance to overcome.

Simply having a DBMS does not mean that it will be used. Several factors may contribute to the lack of use of new DBMS tools:
• Lack of familiarity with the tools and what they can do.
• System developers used to writing COBOL (or other language) programs prefer to build systems using the tools they already know.
• The pressure to get new application development projects completed dictates using established tools and techniques.
• Systems development personnel have not been thoroughly trained in the use of the new DBMS tools.
• DP management is afraid of runaway demand on the computing facilities if they allow users to directly access the data on the host computer using an easy-to-use, high-level retrieval facility.
• Organizational policies which do not demand appropriate justification for the tools chosen (or not chosen) for each system development project.

Having pointed out the directions from which hostility can arise to the introduction of a new DBMS, it may be useful to have a summary of a few pointers which would possibly lead to greater success in such an endeavor.

Reasons for success | Reasons for failure
Appreciation that information is a valuable corporate resource and that its management must be given special importance. | Perception by the barons in the organization that the MIS design is a menace; conflicting interests prevent success.
Focusing on the most beneficial usage of the database, which relates to the bottom line. | Overselling the MIS to top management, and choosing applications for their challenge to the programmers.
An incremental approach to building applications, with each new step being reasonably small and relatively easy to implement. | A grand design for creation of an impressive system that can be a panacea for all information problems.
Corporate-wide planning by a high-level, empowered, competent data administrator. | Fragmented plans by non-communicating and not ultimately responsible groups.
Conversion planning which permits the old systems to coexist with the new. | A conversion which plunges into the new system and attempts to rewrite too many old programs.
Awareness, education and involvement of all persons at a level appropriate to their functions. | Apathy by most people to implementation of the new system.
Good understanding of the technical issues and tight technical control by the database administrators. | Inadequate computing power, incorrect assessment of throughput and response time, and failure to monitor usage and performance.
Recognition of the importance of a data dictionary and standards for naming, update control and version synchronization. | Casual approach to data standards and documentation.
Simplicity. | Confused thinking.
A proper mix of centralized guidance and decentralized implementation. | Indifference of the central system and proliferation of incompatible systems.
Proven, fault-free software. | The latest software wonder.

9.4 CONVERSION FROM AN OLD SYSTEM TO A NEW SYSTEM

Management is also concerned with long-term corporate strategy. The database selected has to be consistent with the commitments of that corporate strategy. If the organization does not have a corporate database, then one has to be developed before conversion takes place. Selecting a database has to proceed from the top down: data flow diagrams, representing the organization's business functions, processes and activities, should be drawn up first, followed by entity-relationship charts detailing the relationships between different business information, and then finally by data modeling.

• Corporate Strategic Decisions: The database approach to information systems is a long-term investment. It requires a large-scale commitment of an organization's resources in compatible hardware and software, skilled personnel and management support. Accompanying costs are the education and training of the personnel, conversion, and documentation. It is essential for an organization to fully appreciate, if not understand, the problems of converting from an existing file-based system to a database system, and to accept the implications of its operation before the conversion.
• Hardware Requirements and Processing Time: The database approach delegates to the database management system some of the functions that were previously performed by the application programmer. As a result of this delegation, a computer with a large internal memory and greater processing power is needed. Powerful computer systems were once a luxury enjoyed only by those database users who could afford them, but fortunately this trend is now changing. Recent developments in hardware technology have made it possible to acquire powerful yet affordable systems.

For some applications, the need for high-volume transaction processing may force a company to engineer one or even several systems designed to satisfy this need. This sacrifices a certain flexibility for the system to respond to ad hoc requests. It is also argued that because of the easier access to data in the database, the frequency of access will become higher. Such overuse of computing resources will cause slips in performance, resulting in an increased demand for computing capacity. It is sometimes difficult to determine if the increased access to the database is really necessary.

The database approach offers a number of important and practical advantages to an organization. Reducing data redundancy improves consistency of the data as well as saving storage space. Sharing data often enables new applications to be developed without having to create new data files. Less redundancy and greater sharing also result in less confusion between organizational units and less time spent by people resolving inconsistencies in reports. Centralized control over data standards, security restrictions, and so on, facilitates the evolution of information systems and organizations in response to changing business needs and strategies. Nowadays, users with little or no previous programming experience can, with the aid of powerful user-friendly query languages, manipulate data to satisfy ad hoc queries. Data independence helps ease program development and maintenance, raising programmer productivity. All the benefits of the database approach contribute to reduced costs of application development and improved quality of managerial decisions.

In general, converting to a database involves the following:
• Inventory current systems: data volume, user satisfaction, present condition, and the cost to maintain or redevelop.
• Determine conversion priority from strategic information system plans, building-block systems, and critical needs to replace systems.
• Obtain commitment from senior/top management.
• Appoint qualified database administration staff.
• Educate management information systems staff.
• Select suitable and appropriate software.
• Install the data dictionary first.
• Involve and educate users.
• Redesign and implement new data structures.
• Write software utility tools to convert the files or database in the old system to the new database.
• Modify all application programs to make use of the new data structures.
• Design a simple database first for pilot testing.
• Implement all software.
• Update policies and procedures.
• Install the new database in production.

In the recent trend of database development, a common front end to the various database management systems is often constructed in such a way that the original systems and the programs on them are not modified, but their databases can be mapped to each other through a single uniform language.

Another approach is to unify various database structures by applying the database standards laid down by the International Organization for Standardization (ISO) for data definition and data manipulation. Public acceptance of these standard database structures will ensure a more rapid development of additional conversion tools, such as automatic functions for loading and unloading databases into standard forms for model-to-model database mapping.

If an organization, after weighing all the relevant factors, decides to make an investment in a good database management system, it has to develop a proper plan for doing so. Many of the steps required are more or less along the lines of those required when an organization first moves towards the use of a computer-based information system. One would immediately note the similarity to the steps referred to in the course on "System Analysis and Design". In the interest of brevity, therefore, reference is made only to those factors which are of greater consequence for the problem at hand. It may, however, be useful to bear in mind that a detailed implementation plan would be more or less along the lines of the creation of a computer information system for the first time.

9.5 EVALUATION OF A DBMS

Evaluation is not simply a matter of comparing or describing one system against another, independently of the organization. Surveys, sometimes available through publications, do describe and compare the features of available systems, but the value of a system to an organization depends upon its own problem environment. An organization must therefore look at its own needs when evaluating the available systems.

It is worthwhile paying some attention to who should do this. In a small organization it is possible that a single individual would be able to do the job, but larger organizations need to formally establish an evaluation team. Even this team's composition would change somewhat as the evaluation process moves on. In the initial stage, a major role would be played by users and management, focusing on the organizational needs. Computer and information technology professionals then evaluate the technical gaps of several candidate systems, and finally financial and accounting personnel examine the cost estimates, payment alternatives, tax consequences, personnel requirements, and contract negotiations.

The reasons which inspire the organization to acquire a DBMS should be clearly documented and used to determine the priorities and to help in making trade-offs between conflicting objectives and in the selection of various features that the candidate DBMS may have, depending upon the end user requirements. The evaluation team should also be aware of technical and administrative issues. The technical criteria could be the following:
a. SQL implementation
b. Transaction management
c. Programming interface
d. Database server environment
e. Data storage features
f. Database administration
g. Connectivity
h. DBMS integrity

Similarly, there could be administrative criteria such as:
1. Required hardware platform
2. Documentation
3. Vendor's financial stability
4. Vendor support
5. Initial cost
6. Recurring cost

Each of these, especially the technical criteria, could be further broken into sub-criteria. For example, the data storage features can be further sub-classified into:
a. Lost database segments
b. Clustered indexes
c. Clustered tables

Once this level of detailing is done, the list of features becomes quite large and may even run into hundreds. If a dozen products are to be evaluated, we are talking of a fairly large matrix.

At this point, it is important for the evaluation team, and especially its technical members, to segregate these features into those which are mandatory and those which are not. Mandatory features are those which, if not present in a candidate system, mean the system need not be considered further. For example, "Does the DBMS provide facilities for both programming and non-programming users?" can be considered as one among several mandatory conditions. Mandatory requirements may also flow from a desire to preserve the previous investment in information systems made by an organization. The presence of all mandatory features means that the system is a candidate for the rating procedure.

Having done the first stage of creating a feature list, one of the simplest approaches is to develop a table in which, for each candidate system, each feature and its related information are listed in tabular form against the desired feature.
Such forms can be used to compare the various systems, and although this cannot be enough to conclude an evaluation, it is a useful method for at least broadly ranking and shortlisting the systems. A quantitative flavour can be given to the above approach by awarding points for features which are of a simple Yes/No type. If all the features are not equally important to the organization, then simply summing up the points awarded for each of the features of a system is not quite appropriate. In such a case a rating factor (weight) can be assigned to each feature to reflect the relative level of importance of that feature to the organization. Of course, such rating or scoring should be done only after the first condition, that the mandatory requirements have been met by the proposed system, is satisfied. Sometimes the mandatory characteristics may be expressed in the negative, as something which the system must not have.

The choice of rating factors is a contentious issue and must be decided looking only to the needs of the organization and without reference to the characteristics of any specific candidate system. One of the approaches used for arriving at a suitable set of rating factors is the Delphi method. In brief, the Delphi approach requires key people, who may be expected to be knowledgeable, to make suggestions as to what would be the appropriate rating factor. These are collected and compiled, averages are taken, and deviations from the averages are pointed out. This data is then recirculated to the same set of people, who may want to change their opinions where their own views varied largely from the average. The process can then be repeated, and it has been found that in as few as 3 to 4 iterations a good consensus emerges.
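The two steps described above, screening on mandatory features and then summing weighted Yes/No points, can be sketched in Python as follows. The feature names, weights, candidate systems and the helper names `score_systems` and `delphi_round` are all hypothetical; the second function shows only a single round of averaging and flagging deviations, not a full Delphi procedure.

```python
from statistics import mean

def score_systems(candidates, mandatory, weights):
    """Return {system: weighted score} for systems passing the mandatory screen.

    candidates: {system: {feature: True/False}}
    mandatory:  set of features a system must have to be rated at all
    weights:    {feature: rating factor}, assumed here to sum to one
    """
    scores = {}
    for name, features in candidates.items():
        if not all(features.get(f, False) for f in mandatory):
            continue  # fails a mandatory requirement: not rated
        scores[name] = sum(w for f, w in weights.items() if features.get(f, False))
    return scores

def delphi_round(estimates, tolerance):
    """Average the suggested rating factors and flag large deviations."""
    average = mean(estimates.values())
    outliers = {who: v for who, v in estimates.items() if abs(v - average) > tolerance}
    return average, outliers

# Hypothetical feature matrix and weights.
weights = {"sql": 0.5, "backup": 0.3, "report_writer": 0.2}
candidates = {
    "System A": {"sql": True, "backup": True, "report_writer": False},
    "System B": {"sql": True, "backup": False, "report_writer": True},
    "System C": {"sql": False, "backup": True, "report_writer": True},
}
scores = score_systems(candidates, mandatory={"sql"}, weights=weights)

# Hypothetical first-round suggestions for the weight of one feature.
average, outliers = delphi_round({"user": 0.5, "dba": 0.4, "finance": 0.9}, tolerance=0.25)
```

In this sketch System C is never scored because it lacks the mandatory feature, and only the estimate that lies far from the average would be sent back for reconsideration.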

One of the weaknesses of the methodologies discussed so far is that they focus on the systems but not on the cost-benefit aspects. A good evaluation methodology should be able to suggest the most cost-effective solution to the problem. For example, if a system is twice as good as another system but costs only 40% more, then it ought to be the preferred solution.

In order to carry out a cost-benefit analysis, one has to use a rating function with each feature to normalize the marks. Rather than characterizing a feature as Yes/No, with a mark of 1 or 0 corresponding to its presence or absence, a mark can be given on a scale which is appropriate to the feature. This can arise in issues such as the number of terminals that are supported or the amount of main memory required. Rating functions can be of several types, of which four are illustrated in Figure 1.

Linear: In a linear rating function the rating increases in proportion to higher marks, starting from 0.

Broken linear: There are situations where a minimum threshold is essential and, similarly, there is a saturation value above which no additional value is given. Typically, for concurrent access, fewer than 3 would be of no value and more than 9 is of no additional value.

Binary: This is of course a Yes/No type, where a system either has or does not have the feature, or some minimum value for the feature.

Inverse: There are some attributes where a higher mark actually implies a lower rating. For example, in assessing the time to process a standard query, the mark may simply be the time, scaled in an appropriate manner. Therefore, a shorter time actually has a higher rating.

For each feature, the rating function uses an appropriate and convenient scale of measurement for determining a system's feature mark. The rating function transforms a system's feature mark into a normalized rating indicating its value relative to a nominal mark for that feature. The nominal mark for each feature has a nominal rating of one.

The use of rating functions is more sophisticated and costly to apply than the simplified methodologies.
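The four rating functions can be sketched as small Python functions, each normalized so that the nominal mark receives a rating of one. The exact formulas, the parameter names and the example numbers are assumptions made for illustration; the text specifies only the general shape of each function.

```python
def linear(mark, nominal):
    """Rating grows in proportion to the mark; the nominal mark rates 1."""
    return mark / nominal

def broken_linear(mark, threshold, saturation, nominal):
    """Zero below the threshold, linear in between, flat above saturation.

    Assumes threshold < nominal <= saturation so the nominal mark rates 1.
    """
    def raw(m):
        clipped = min(max(m, threshold), saturation)
        return (clipped - threshold) / (saturation - threshold)
    return raw(mark) / raw(nominal)

def binary(mark, minimum):
    """Yes/No: the system either reaches the minimum value or it does not."""
    return 1.0 if mark >= minimum else 0.0

def inverse(mark, nominal):
    """A higher mark (e.g. query time) means a lower rating; mark must be > 0."""
    return nominal / mark

# Hypothetical marks: concurrent users with threshold 3, saturation 9, nominal 6.
users_rating = broken_linear(12, threshold=3, saturation=9, nominal=6)
# Hypothetical query time of 2 seconds against a nominal 4 seconds.
time_rating = inverse(2, nominal=4)
```

With these assumed parameters, a system supporting 12 concurrent users rates no higher than one supporting 9, and a query time half the nominal time rates twice the nominal rating.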
The greater objectivity and precision obtained must be weighed against the overall benefits of DBMS acquisition and use. Some features will have no appropriate objective scale on which to mark the feature. The analyst could use a five-point scale with a linear rating function as follows:

Feature evaluation    Rating point
Excellent (A)         5
Good (B)              4
Average (C)           3
Fair (D)              2
Poor (E)              1

Variations can expand or contract the rating scale, use a nonlinear rating function, or expand the points in the feature evaluation scale to achieve greater resolution. In extreme cases, the analyst could simply use subjective judgment to arrive at a rating directly, remembering that a feature rating of one applies to a nominal or average system.

Having converted all the marks to ratings, the system score is the product of the rating and the weight summed across all features, just as before. The overall score for a nominal system would be one (since all weights sum to one and all nominal ratings are one). This is important for determining cost effectiveness, the ratio between the value of a system and its cost. The organization first determines the value of a system which earns a nominal mark for all features. This is called the nominal value. Then the actual value of a given system is the product of the overall system score and the nominal value. The cost effectiveness of a system is the actual value divided by the cost of the system. System cost is the present value cost of acquisition, operation and maintenance over the estimated life of the system.
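The arithmetic of the preceding paragraphs can be sketched as follows. The weights, ratings, money amounts and discount rate are invented for illustration, and the present-value formula (annual costs discounted at a fixed rate) is one common convention assumed here, since the text does not give a formula for it.

```python
def system_score(weights, ratings):
    """Sum of weight x rating over all features; weights must sum to one."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    return sum(weights[f] * ratings[f] for f in weights)

def present_value_cost(acquisition, annual_cost, years, discount_rate):
    """Acquisition cost plus discounted yearly operation and maintenance."""
    return acquisition + sum(
        annual_cost / (1 + discount_rate) ** t for t in range(1, years + 1)
    )

def cost_effectiveness(score, nominal_value, cost):
    """Actual value (score x nominal value) divided by system cost."""
    return score * nominal_value / cost

weights = {"sql": 0.5, "storage": 0.3, "support": 0.2}   # hypothetical
nominal = {"sql": 1.0, "storage": 1.0, "support": 1.0}   # a nominal system
better = {"sql": 2.0, "storage": 2.0, "support": 2.0}    # "twice as good"

nominal_value = 100_000   # assumed value of a nominal system
base_cost = 50_000        # assumed present-value cost of the nominal system

ce_nominal = cost_effectiveness(system_score(weights, nominal), nominal_value, base_cost)
ce_better = cost_effectiveness(system_score(weights, better), nominal_value, 1.4 * base_cost)
```

Under these assumed numbers the system that is twice as good but costs 40% more has the higher cost-effectiveness ratio (2/1.4 times that of the nominal system), which matches the example given earlier in this section.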

Figure 1: Sample Feature Rating Functions

With a cost-effectiveness measure for several candidate systems, the organization would tentatively select the system with the highest cost-effectiveness ratio.

Of course there may be intangible factors, other than the technical and administrative criteria referred to earlier, which may influence the final selection, based upon political judgments of the management or some other considerations. It would of course be possible to build even these factors, to the extent that they can be made explicit, into the evaluation process.

9.6 ADMINISTRATION OF A DATABASE MANAGEMENT SYSTEM

Acquiring a DBMS is not sufficient for successful data management. The role of database administrator provides the human focus of responsibility to make it all happen. One person or several persons may fill the DBA role.

Whenever people share the use of a common resource such as data, the potential for conflict exists. The database administrator role is fundamentally a people-oriented function: to mediate the conflicts and seek compromise for the global good of the organization.

Within an organization, database administration generally begins as a support function within the systems development unit. Sometimes it is in a technical support unit associated with operations. Eventually, it should be separate from both development and operations, residing in a collection of support functions reporting directly to the director of information systems. Such a position has some stature and some independence, and can work directly with users to capture their data requirements. Database administration works with development, operations, and users to coordinate the response to data needs. The database administrator is the key link in establishing and maintaining management and user confidence in the database and in the system facilities which make it available and control its integrity.

While the 'doing' of database system design and development can be decentralized to several development projects in the Data Processing Department or the user organizations, planning and control of database development should be centralized. In this way an organization can provide more consistent and coherent information to successively higher levels of management.

The functions associated with the role of database administration include:
• Definition, creation, revision, and retirement of data formally collected and stored within a shared corporate database.
• Making the database available to the using environment through tools such as a DBMS and related query languages and report writers.
• Informing and advising users on the data resources currently available, the proper interpretation of the data, and the use of the available tools. This includes educational materials, training sessions, participation on projects, and special assistance.
• Maintaining database integrity, including existence control (backup and recovery), definition control, quality control, update control, concurrency control, and access control.
• Monitoring and improving operations and performance, and maintaining an audit trail of database activities.

The data dictionary is one of the more important tools for the database administrator. It is used to maintain information relating to the various resources used in information systems (hence it is sometimes called an information resource dictionary): data, input transactions, output reports, programs, application systems, and users. It can:
• Assist the process of system analysis and design.
• Provide a more complete definition of the data stored in the database (than is maintained by the DBMS).
• Enable an organization to assess the impact of a suggested change within the information system or the database.
• Help in establishing and maintaining standards, for example, of data names.
• Facilitate human communication through more complete and accurate documentation.

Several data dictionary software packages are commercially available.

The DBA should also have tools to monitor the performance of the database system, to indicate the need for reorganization or revision of the database.

9.7 SELF TEST

1) Determine the path that must be chosen in converting from an old existing system to a new system.
2) List the various factors that are important in evaluating a DBMS.
3) Explain the role of the DBA with the help of an example.

9.8 SUMMARY

• Some aspects of change related to information systems arouse great passions, for reasons such as political observation and information transparency.
• Organizational policies which do not demand appropriate justification for the tools chosen (or not chosen) for each system development project contribute to the lack of use of new DBMS tools.
• The initial cost may also be a barrier to acquisition. However, the subsequent investment in training people, developing applications, and entering and maintaining data will be many times more. Selection of an inadequate system can greatly increase these subsequent costs to the point where the initial acquisition cost becomes irrelevant.
• Broken linear rating function: There are situations where a minimum threshold is essential and, similarly, there is a saturation value above which no additional value is given. Typically, for concurrent access, fewer than 3 would be of no value and more than 9 is of no additional value.


UNIT 10

CONCURRENCY CONTROL

10.1 Serial and Serializable Schedules
10.2 Conflict-Serializability
10.3 Enforcing Serializability by Locks
10.4 Locking Systems With Several Lock Modes
10.5 Architecture for a Locking Scheduler
10.6 Managing Hierarchies of Database Elements
10.7 Concurrency Control by Timestamps
10.8 Concurrency Control by Validation
10.9 Summary

10.1 Serial and Serializable Schedules

When two or more transactions are running concurrently, the steps of the transactions would normally be interleaved. The interleaved execution of transactions is decided by the database scheduler, which receives a stream of user requests that arise from the active transactions. A particular sequencing (usually interleaved) of the actions of a set of transactions is called a schedule.

A serial schedule is a schedule in which all the operations of one transaction are completed before another transaction can begin (that is, there is no interleaving). Serial execution means no overlap of transaction operations.

If T1 and T2 transactions are executed serially:

RT1(X) WT1(X) RT1(Y) WT1(Y) RT2(X) WT2(X)

or

RT2(X) WT2(X) RT1(X) WT1(X) RT1(Y) WT1(Y)

The database is left in a consistent state.

The basic idea:
• Each transaction leaves the database in a consistent state if run individually.
• If the transactions are run one after the other, then the database will be consistent.

For the first schedule:

Database        T1                          T2
x=100, y=50
                read(x)    --- x=100 ---
                x := x*5   --- x=500 ---
x=500, y=50     write(x)
                read(y)    --- y=50 ---
                y := y-5   --- y=45 ---
x=500, y=45     write(y)
                                            read(x)    --- x=500 ---
                                            x := x+8   --- x=508 ---
x=508, y=45                                 write(x)

For the second schedule:

Database        T1                          T2
x=100, y=50
                                            read(x)    --- x=100 ---
                                            x := x+8   --- x=108 ---
x=108, y=50                                 write(x)
                read(x)    --- x=108 ---
                x := x*5   --- x=540 ---
x=540, y=50     write(x)
                read(y)    --- y=50 ---
                y := y-5   --- y=45 ---
x=540, y=45     write(y)

Serializable Schedules

Let T be a set of n transactions T1, T2, ..., Tn. If the n transactions are executed serially (call this execution S), we assume they terminate properly and leave the database in a consistent state. A concurrent execution of the n transactions in T (call this execution C) is called serializable if the execution is computationally equivalent to a serial execution. There may be more than one such serial execution. That is, the concurrent execution C always produces exactly the same effect on the database as some serial execution S does. (Note that S is some serial execution of T, not necessarily the order T1, T2, ..., Tn.) A serial schedule is always correct, since we assume transactions do not depend on each other and, furthermore, we assume that each transaction when run in isolation transforms a consistent database into a new consistent state; therefore a set of transactions executed one at a time (i.e. serially) must also be correct.
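The two serial schedules traced above can be replayed with a short Python sketch. The transactions are written as functions on a dictionary standing in for the database; the interleaved run at the end is a hypothetical example added here to show a result that matches neither serial order.

```python
def t1(db):
    """T1: x := x*5, then y := y-5."""
    x = db["x"]          # read(x)
    db["x"] = x * 5      # write(x)
    y = db["y"]          # read(y)
    db["y"] = y - 5      # write(y)

def t2(db):
    """T2: x := x+8."""
    x = db["x"]          # read(x)
    db["x"] = x + 8      # write(x)

def run_serial(transactions, initial):
    db = dict(initial)
    for txn in transactions:
        txn(db)
    return db

def lost_update_interleaving(initial):
    """Hypothetical interleaving: both transactions read x before either writes."""
    db = dict(initial)
    x1 = db["x"]         # T1 read(x)
    x2 = db["x"]         # T2 read(x)
    db["x"] = x1 * 5     # T1 write(x)
    db["x"] = x2 + 8     # T2 write(x) overwrites T1's update
    y1 = db["y"]         # T1 read(y)
    db["y"] = y1 - 5     # T1 write(y)
    return db

initial = {"x": 100, "y": 50}
first = run_serial([t1, t2], initial)    # {'x': 508, 'y': 45}
second = run_serial([t2, t1], initial)   # {'x': 540, 'y': 45}
interleaved = lost_update_interleaving(initial)
```

The two serial orders end with different values of x (508 and 540), and both are accepted as correct. The interleaved run ends with x=108, which matches neither serial result, so that execution is not serializable.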

Example 1. Given the following schedule, draw a serialization (or precedence) graph and find if the schedule is serializable.

Solution: There is a simple technique for testing a given schedule S for serializability. The testing is based on constructing a directed graph in which each of the transactions is represented by one node, and an edge from Ti to Tj exists if any of the following conflicting operations appear in the schedule:

• Ti executes WRITE(X) before Tj executes READ(X), or
• Ti executes READ(X) before Tj executes WRITE(X), or
• Ti executes WRITE(X) before Tj executes WRITE(X).

If the graph has a cycle, the schedule is not serializable.

10.2 Conflict-Serializability

• Two operations conflict if:
  o They are issued by different transactions,
  o They operate on the same data item, and
  o At least one of them is a write operation
• Two executions are conflict-equivalent if all conflicting operations have the same order in both executions
• An execution is conflict-serializable if it is conflict-equivalent to a serial history

Conflict graph

An execution is conflict-serializable if and only if its conflict graph is acyclic.
• Nodes: transactions
• Directed edges: conflicts between operations

Example schedule: W1(a) R2(a) R3(b) W2(c) R3(c) W3(b) R1(b)

[Figure: conflict graph of the example schedule over the transactions T1, T2 and T3]
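The conflict-graph test can be sketched in Python as follows. This is a minimal illustration, not a production scheduler: the tuple encoding `(action, transaction, item)` and the function names `conflict_graph` and `has_cycle` are assumptions made for this example. It applies the three conflict conditions to the example schedule above and then looks for a cycle with a depth-first search.

```python
from collections import defaultdict


def conflict_graph(schedule):
    """Build edges Ti -> Tj for every pair of conflicting operations.

    schedule is a list of (action, txn, item) with action 'r' or 'w'.
    """
    edges = defaultdict(set)
    for i, (a1, t1, x1) in enumerate(schedule):
        for a2, t2, x2 in schedule[i + 1:]:
            # Different transactions, same item, at least one write.
            if t1 != t2 and x1 == x2 and "w" in (a1, a2):
                edges[t1].add(t2)
    return edges


def has_cycle(edges):
    """Depth-first search; a grey node reached again means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)

    def visit(node):
        colour[node] = GREY
        for nxt in edges.get(node, ()):
            if colour[nxt] == GREY:
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in list(edges))


# W1(a) R2(a) R3(b) W2(c) R3(c) W3(b) R1(b)
example = [("w", 1, "a"), ("r", 2, "a"), ("r", 3, "b"), ("w", 2, "c"),
           ("r", 3, "c"), ("w", 3, "b"), ("r", 1, "b")]
graph = conflict_graph(example)
print(dict(graph))       # {1: {2}, 2: {3}, 3: {1}}
print(has_cycle(graph))  # True
```

The edges T1 -> T2 -> T3 -> T1 form a cycle, so under this test the example schedule is not conflict-serializable.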

Serializability (examples)
• H1: w1(x,1), w2(x,2), w3(x,3), w2(y,1), r1(y)
• H1 is view-serializable, since it is view-equivalent to H2 below:
  o H2: w2(x,2), w2(y,1), w1(x,1), r1(y), w3(x,3)
• However, H1 is not conflict-serializable, since its conflict graph contains a cycle: w1(x,1) occurs before w2(x,2), but w2(y,1) occurs before r1(y)
• No serial schedule that is conflict-equivalent to H1 exists

Execution Order vs. Serialization Order
• Consider the schedule H3 below: H3: w1(x,1), r2(x), C2, w3(y,1), C3, r1(y), C1
• H3 is conflict-equivalent to a serial execution in which T3 is followed by T1, followed by T2
• This is despite the fact that T2 was executed completely and committed before T3 even started

Recoverability of a Schedule
• A transaction T1 reads from transaction T2 if T1 reads a value of a data item that was written into the database by T2
• A schedule H is recoverable iff no transaction in H is committed before every transaction it read from is committed
• The schedule below is serializable, but not recoverable: H4: r1(x), w1(x), r2(x), w2(y), C2, C1

Cascadelessness of a Schedule
• A schedule H is cascadeless (avoids cascading aborts) iff no transaction in H reads a value that was written by an uncommitted transaction
• The schedule below is recoverable, but not cascadeless: H4: r1(x), w1(x), r2(x), C1, w2(y), C2

Strictness of a Schedule
• A schedule H is strict if it is cascadeless and no transaction in H writes a value that was previously written by an uncommitted transaction
• The schedule below is cascadeless, but not strict: H5: r1(x), w1(x), r2(y), w2(x), C1, C2
• Strictness permits recovery from before-image logs
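The definitions of recoverable, cascadeless and strict schedules can be checked mechanically. The sketch below is an illustration under stated assumptions: schedules are lists of `(action, txn, item)` tuples with actions `'r'`, `'w'` and `'c'` (commit), aborts are not modelled, and the function name `classify` is invented for this example.

```python
def classify(schedule):
    """Return which of the three properties a schedule satisfies.

    Assumes every transaction either commits or is still running;
    aborts are deliberately left out of this sketch.
    """
    last_writer = {}   # item -> transaction that wrote it most recently
    committed = set()
    commit_pos = {}    # transaction -> position of its commit
    reads_from = set() # (reader, writer) pairs
    cascadeless = strict = True

    for pos, (action, txn, item) in enumerate(schedule):
        if action == "c":
            committed.add(txn)
            commit_pos[txn] = pos
            continue
        writer = last_writer.get(item)
        other = writer is not None and writer != txn
        dirty = other and writer not in committed
        if action == "r" and other:
            reads_from.add((txn, writer))
        if action == "r" and dirty:
            cascadeless = False  # read of uncommitted data
        if dirty:
            strict = False       # read or overwrite of uncommitted data
        if action == "w":
            last_writer[item] = txn

    # A committed reader must commit after the transaction it read from.
    recoverable = all(
        reader not in commit_pos
        or (writer in commit_pos and commit_pos[writer] < commit_pos[reader])
        for reader, writer in reads_from
    )
    return {"recoverable": recoverable, "cascadeless": cascadeless,
            "strict": strict}


C = None  # commits carry no data item
not_recoverable = [("r", 1, "x"), ("w", 1, "x"), ("r", 2, "x"),
                   ("w", 2, "y"), ("c", 2, C), ("c", 1, C)]
not_cascadeless = [("r", 1, "x"), ("w", 1, "x"), ("r", 2, "x"),
                   ("c", 1, C), ("w", 2, "y"), ("c", 2, C)]
h5 = [("r", 1, "x"), ("w", 1, "x"), ("r", 2, "y"),
      ("w", 2, "x"), ("c", 1, C), ("c", 2, C)]

print(classify(not_recoverable)["recoverable"])  # False
print(classify(not_cascadeless))  # recoverable, but not cascadeless
print(classify(h5))               # cascadeless, but not strict
```

The three example schedules are the ones given in the text above; the checker reproduces the stated classification for each.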

Strong Recoverability of a Schedule
• A schedule H is strongly recoverable iff, for every pair of transactions in H, their commitment order is the same as the order of their conflicting operations
• The schedule below is strict, but not strongly recoverable: H6: r1(x), w2(x), C2, C1


Rigorousness of a Schedule
• A schedule H is rigorous if it is strict and no transaction in H writes a data item until all transactions that previously read this item either commit or abort
• The schedule below is strongly recoverable, but not rigorous: H7: r1(x), w2(x), C1, C2
• A rigorous schedule is serializable and has all the properties defined above

10.3 Enforcing Serializability by Locks

Database servers support transactions: sequences of actions that are either all processed or none at all, i.e. atomic. To allow multiple concurrent transactions access to the same data, most database servers use a two-phase locking protocol. Each transaction locks sections of the data that it reads or updates to prevent others from seeing its uncommitted changes. Only when the transaction is committed or rolled back can the locks be released. This was one of the earliest methods of concurrency control, and is used by most database systems.

Transactions should be isolated from other transactions. The SQL standard's default isolation level is serialisable. This means that a transaction should appear to run alone and should not see changes made by others while it is running. Database servers that use two-phase locking typically have to reduce their default isolation level to read committed, because running a transaction as serialisable would mean they would need to lock entire tables to ensure the data remained consistent, and such table locking would block all other users on the server. So transaction isolation is often traded for concurrency. But losing transaction isolation has implications for the integrity of your data. For example, if we start a transaction to read the amounts in a ledger table without isolation, any totals calculated would include amounts updated, inserted or deleted by other users during our reading of the rows, giving an unstable result.

Database research in the early 1980s discovered a better way of allowing concurrent access to data. Storing multiple versions of rows would allow transactions to see a stable snapshot of the data. It had the advantage of allowing isolated transactions without the drawback of locks. While one transaction was reading a row, another could be updating the row by creating a new version. This solution was thought to be impractical at the time: storage space was expensive, memory was small, and storing multiple copies of the data seemed unthinkable.

Of course, Moore's Law has meant that disk space is now inexpensive and memory sizes have dramatically increased. This, together with improvements in processor power, has meant that today we can easily store multiple versions and gain the benefits of high concurrency and transaction isolation without locking.

Unfortunately the locking protocols of popular database systems, many of which were designed well over a decade ago, form the core of those systems, and replacing them seems to have been impossible, despite recent research again finding that storing multiple versions is better than a single version with locks.

10.4 Locking Systems With Several Lock Modes

Several object-oriented databases, which were developed more recently, have incorporated optimistic concurrency control (OCC) within their designs to gain the performance advantages inherent in this approach. Though optimistic methods were originally developed for transaction management, the concept is equally applicable to more general problems of sharing resources and data. The methods have been incorporated into several recently developed operating systems, and many of the newer hardware architectures provide instructions to support and simplify the implementation of these methods.

Optimistic Concurrency Control does not involve any locking of rows as such, and therefore cannot involve any deadlocks. Instead it works by dividing the transaction into phases.

• The Build-up phase commences at the start of the transaction. When a transaction is started, a consistent view of the database is frozen, based on the state after the last committed transaction. This means that the application will see this consistent view of the database during the entire transaction. This is accomplished by the use of an internal Transaction Cache, which contains information about all ongoing transactions in the system. The application "sees" the database through the Transaction Cache. During the Build-up phase the system also builds up a Read Set documenting the accesses to the database, and a Write Set of changes to be made, but does not apply any of these changes to the database. The Build-up phase ends when the application calls the COMMIT command.

• The Commit phase involves using the Read Set and the Transaction Cache to detect access conflicts with other transactions. A conflict occurs when another transaction alters data in a way that would alter the contents of the Read Set for the transaction that is checked. Other transactions that were committed during the checked transaction's Build-up phase or during this check phase can cause a conflict. If a transaction conflict is detected, the checked transaction is aborted. No rollback is necessary, as no changes have been made to the database. An error code is returned to the application, which can then take appropriate action. Often this will be to retry the transaction without the user being aware of the conflict.

• If no conflicts are detected, the operations in the Write Set for the transaction are moved to another structure, called the Commit Set, that is to be secured on disk. All operations for one transaction are stored on the same page in the Commit Set (if the transaction is not very large). Before the operations in the Commit Set are secured on permanent storage, the system checks whether there are any other committed transactions that can be stored on the same page in the Commit Set. After this, all transactions stored on the Commit Set page are written to disk (to the transaction databank TRANSDB) in one single I/O operation. This behavior is called a Group Commit, which means that several transactions are secured simultaneously. When the Commit Set has been secured on disk (in one I/O operation), the application is informed of the success of the COMMIT command and can resume its operations.

• During the Apply phase the changes are applied to the database, i.e. the databanks and the shadows are updated. The background threads in the Database Server carry out this phase. Even though the changes are applied in the background, the transaction changes are visible to all applications through the Transaction Cache. Once this phase is finished the transaction is fully complete. If there is any kind of hardware failure that means that SQL is unable to complete this phase, it is automatically restarted as soon as the cause of the failure is corrected.

Most other DBMSs offer pessimistic concurrency control. This type of concurrency control protects a user's reads and updates by acquiring locks on rows (or possibly database pages, depending on the implementation). This leads to applications becoming "contention bound", with performance limited by other transactions. These locks may force other users to wait if they try to access the locked items. The user that "owns" the locks will usually complete their work, committing the transaction and thereby freeing the locks so that the waiting users can compete to acquire them.

Optimistic Concurrency Control (OCC) offers a number of distinct advantages, including:

• Complicated locking overhead is completely eliminated. Scalability is affected in locking systems, as many simultaneous users cause locking graph traversal costs to escalate.

• Deadlocks cannot occur, so the performance overheads of deadlock detection are avoided, as well as the need for possible system administrator intervention to resolve them.

• Programming is simplified, as transaction aborts only occur at the Commit command, whereas deadlocks can occur at any point during a transaction. Also, it is not necessary for the programmer to take any action to avoid the potentially catastrophic effects of deadlocks, such as carrying out database accesses in a particular order. This is particularly important as potential deadlock situations are rarely detected in testing, and are only discovered when systems go live.

• Data cannot be left inaccessible to other users as a result of a user taking a break or being excessively slow in responding to prompts. Locking systems leave locks set in these circumstances, denying other users access to the data.

• Data cannot be left inaccessible as a result of client processes failing or losing their connections to the server.

• Delays caused by locking systems being overly cautious are avoided. This can arise as a result of larger than necessary lock granularity, but there are also several other circumstances in which locking causes unnecessary delays even when using fine-granularity locking.

• The problems associated with the use of ad hoc tools are removed.

• Through the Group Commit concept, which is applied in SQL, the number of I/Os needed to secure committed transactions to the disk is reduced to a minimum. The actual updates to the database are performed in the background, allowing the originating application to continue.

• The ROLLBACK statement is supported but, because nothing is written to the actual database during the transaction Build-up phase, this involves only a reinitialization of structures used by the transaction control system.

• Another significant transaction feature in SQL is the concept of Read-Only transactions, which can be used for transactions that only perform read operations on the database. When performing a Read-Only transaction, the application will always see a consistent view of the database. Since consistency is guaranteed during a Read-Only transaction, no transaction check is needed and the internal structures used to perform transaction checks (i.e. the Read Set) are not needed; for this reason no Read Set is established for a Read-Only transaction. This has significant positive effects on performance for these transactions. It also means that a Read-Only transaction always succeeds, unaffected by changes performed by other transactions. A Read-Only transaction also never disturbs any other transactions going on in the system. For example, a complicated long-running query can execute in parallel with OLTP transactions.

10.5 Architecture for a Locking Scheduler

Architecture Features
• Memory usage
• Shared memory file system
• Page replacement problems
  o Page eviction
  o Simplistic NRU replacement
  o Clock algorithm can evict accessed pages
  o Suboptimal reaction to variable load or load spikes after inactivity
• Improvements:
  o Finer-grained SMP locking
  o Unification of buffer and page caches
  o Support for larger memory configurations
  o SYSV shared memory code replaced
  o Page aging reintroduced
  o Active and inactive page lists
  o Optimized page flushing
  o Controlled background page aging
  o Aggressive read-ahead

SMP locking optimizations
• Use of the global "kernel_lock" was minimized.
• More subsystem-based spinlocks are used.
• More spinlocks are embedded in data structures.
• Semaphores are used to serialize address space access.
• More of a spinlock hierarchy was established.
• Spinlock granularity tradeoffs.

Kernel multi-threading improvements
• Multiple threads can access address space data structures simultaneously.
• The single mem->msem semaphore was replaced with a multiple reader/single writer semaphore.
• A reader lock is now acquired for reading per-address-space data structures.
• An exclusive write lock is acquired when altering per-address-space data structures.

32-bit UIDs and GIDs
• The increase from 16-bit to 32-bit UIDs allows up to 4.2 billion users.
• The increase from 16-bit to 32-bit GIDs allows up to 4.2 billion groups.

64-bit virtual address space
• The architectural limit of the virtual address space was expanded to a full 64 bits.
• IA64 currently implements 51 bits (16 disjoint 47-bit regions).
• Alpha currently implements 43 bits (2 disjoint 42-bit regions).
• S/390 currently implements 42 bits.
• Future Alpha is expanded to 48 bits (2 disjoint 47-bit regions).

Unified file system cache
• A single page cache was unified from the previous page cache read/write functionality.
• Reduces memory consumption by eliminating double-buffered copies of file system data.
• Eliminates the overhead of searching two levels of data cache.

10.6 Managing Hierarchies of Database Elements

These storage types are sometimes called the storage hierarchy. It consists of the archival database, the physical database, the archival log, and the current log.

Physical database: this is the online copy of the database that is stored in nonvolatile storage and used by all active transactions.

Current database: the current version of the database is made up of the physical database plus the modifications implied by the buffers in volatile storage.

[Figure: database users and application programs, with program code and buffers in volatile storage (data buffers and log buffers); the physical database on nonvolatile storage; the current log and checkpoint on stable storage; the archive copy of the database and the archive log on stable storage]

Archival database in stable storage: this is the copy of the database at a given time, kept in stable storage. It contains the entire database in a quiescent mode and could have been made by a simple dump routine that dumps the physical database onto stable storage. All transactions that have been executed on the database since the time of archiving have to be redone in a global recovery. The archival database is a copy of the database in a quiescent state, and only the transactions committed since the time of archiving are applied to this database.

Current log: the log information required for recovery from a system failure involving loss of volatile information.

Archival log: used for failures involving loss of nonvolatile information.

The online or current database is made up of all the records that are accessible to the DBMS during its operation. The current database consists of the data stored in nonvolatile storage plus the data in volatile storage that has not yet been propagated to the nonvolatile storage.

10.7 Concurrency Control by Timestamps

One of the important properties of transactions is that their effect on shared data is serially equivalent. This means that any data that is touched by a set of transactions must be in such a state that the results could have been obtained if all the transactions executed serially (one after another) in some order (it does not matter which). What is invalid is for the data to be in some form that cannot be the result of serial execution (e.g. two transactions modifying data concurrently). One easy way of achieving this guarantee is to ensure that only one transaction executes at a time. We can accomplish this by using mutual exclusion and having a "transaction" resource that each transaction must have access to. However, this is usually overkill and does not allow us to take advantage of the concurrency that we may get in distributed systems (for instance, it is obviously overkill if two transactions don't even access the same data). What we would like to do is allow multiple transactions to execute simultaneously but keep them out of each other's way and ensure serializability. This is called concurrency control.

Locking

We can use exclusive locks on a resource to serialize execution of transactions that share resources. A transaction locks an object that it is about to use. If another transaction requests the same object and it is locked, the transaction must wait until the object is unlocked. To implement this in a distributed system, we rely on a lock manager: a server that issues locks on resources. This is exactly the same as a centralized mutual exclusion server: a client can request a lock and then send a message releasing a lock on a resource (by resource in this context, we mean some specific block of data that may be read or written). One thing to watch out for is that we still need to preserve serial execution: if two transactions are accessing the same set of objects, the results must be the same as if the transactions executed in some order (transaction A cannot modify some data while transaction B modifies some other data and then transaction A accesses that modified data; this is concurrent modification). To ensure serial ordering on resource access, we impose a restriction that states that a transaction is not allowed to get any new locks after it has released a lock. This is known as two-phase locking. The first phase of the transaction is a growing phase, in which it acquires the locks it needs. The second phase is the shrinking phase, where locks are released.

Strict two-phase locking

A problem with two-phase locking is that if a transaction aborts, some other transaction may have already used data from an object that the aborted transaction modified and then unlocked. If this happens, any such transactions will also have to be aborted. This situation is known as cascading aborts. To avoid this, we can strengthen our locking by requiring that a transaction hold all its locks to the very end, until it commits or aborts, rather than releasing a lock when the object is no longer needed. This is known as strict two-phase locking.
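The two-phase rule and its strict variant can be illustrated with a small bookkeeping class. This is a sketch only: the class name `TwoPhaseTransaction` and its methods are hypothetical, and it tracks the protocol for a single transaction without modelling contention between transactions or a real lock manager.

```python
class TwoPhaseTransaction:
    """Tracks the growing and shrinking phases of one transaction."""

    def __init__(self, name, strict=False):
        self.name = name
        self.strict = strict
        self.held = set()
        self.shrinking = False  # becomes True after the first release

    def lock(self, item):
        # Two-phase rule: no new locks once any lock has been released.
        if self.shrinking:
            raise RuntimeError(f"{self.name}: cannot lock after unlocking")
        self.held.add(item)

    def unlock(self, item):
        # Strict 2PL: individual locks are never released early.
        if self.strict:
            raise RuntimeError(f"{self.name}: locks are held until the end")
        self.held.discard(item)
        self.shrinking = True

    def commit(self):
        """Release every remaining lock at the end of the transaction."""
        released = sorted(self.held)
        self.held.clear()
        self.shrinking = True
        return released


t = TwoPhaseTransaction("T1")
t.lock("a")
t.lock("b")      # growing phase
t.unlock("a")    # shrinking phase begins
try:
    t.lock("c")  # violates two-phase locking
except RuntimeError as err:
    print(err)   # T1: cannot lock after unlocking

s = TwoPhaseTransaction("T2", strict=True)
s.lock("a")
s.lock("b")
print(s.commit())  # ['a', 'b']
```

With `strict=True` the only way to give up locks is `commit()`, which is what prevents other transactions from using data written by a transaction that might still abort.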

Locking granularity

A typical system will have many objects, and typically a transaction will access only a small amount of data at any given time (and it will frequently be the case that a transaction will not clash with other transactions). The granularity of locking affects the amount of concurrency we can achieve. If we can have a smaller granularity (lock smaller objects or pieces of objects) then we can generally achieve higher concurrency. For example, suppose that all of a bank's customers are locked for any transaction that needs to modify a single customer datum: concurrency is severely limited because any other transactions that need to access any customer data will be blocked. If, however, we use a customer record as the granularity of locking, transactions that access different customer records will be capable of running concurrently.

Multiple readers/single writer

There is no harm in having multiple transactions read from the same object as long as it has not been modified by any of the transactions. This way we can increase concurrency by having multiple transactions run concurrently if they are only reading from an object. However, only one transaction should be allowed to write to an object. Once a transaction has modified an object, no other transactions should be allowed to read or write the modified object. To support this, we now use two locks: read locks and write locks. Read locks are also known as shared locks (since they can be shared by multiple transactions). If a transaction needs to read an object, it will request a read lock from the lock manager. If a transaction needs to modify an object, it will request a write lock from the lock manager. If the lock manager cannot grant a lock, then the transaction will wait until it can get the lock (after the transaction with the lock has committed or aborted). To summarize lock granting:

If a transaction has:   | another transaction may obtain:
no locks                | read lock or write lock
read lock               | read lock (wait for write lock)
write lock              | wait for read or write locks

Increasing concurrency: two-version locking

Two-version locking is an optimistic concurrency control scheme that allows one transaction to write tentative versions of objects while other transactions read from committed versions of the same objects. Read operations only wait if another transaction is currently committing the same object. This scheme allows more concurrency than read-write locks, but writing transactions risk waiting (or rejection) when they attempt to commit. Transactions cannot commit their write operations immediately if other uncommitted transactions have read the same objects. Transactions that request to commit in this situation have to wait until the reading transactions have completed.
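The read/write lock-granting table earlier in this section can be sketched as a tiny lock manager. This is a minimal single-threaded illustration under stated assumptions: the class `LockManager` and its methods are invented for the example, and a request that would have to wait simply returns `False` instead of blocking.

```python
class LockManager:
    """Grants shared (read) and exclusive (write) locks per item."""

    def __init__(self):
        self.readers = {}  # item -> set of transactions holding read locks
        self.writer = {}   # item -> transaction holding the write lock

    def request(self, txn, item, mode):
        """Return True if the lock is granted, False if txn must wait."""
        writer = self.writer.get(item)
        if writer is not None and writer != txn:
            return False                      # write lock held by another
        if mode == "read":
            self.readers.setdefault(item, set()).add(txn)
            return True
        other_readers = self.readers.get(item, set()) - {txn}
        if other_readers:
            return False                      # wait for readers to finish
        self.writer[item] = txn
        return True

    def release_all(self, txn):
        """Drop every lock held by txn (called at commit or abort)."""
        for holders in self.readers.values():
            holders.discard(txn)
        for item in [i for i, w in self.writer.items() if w == txn]:
            del self.writer[item]


lm = LockManager()
print(lm.request("T1", "A", "read"))   # True
print(lm.request("T2", "A", "read"))   # True: read locks are shared
print(lm.request("T2", "A", "write"))  # False: T1 still reads A
lm.release_all("T1")
print(lm.request("T2", "A", "write"))  # True
print(lm.request("T3", "A", "read"))   # False: T2 holds the write lock
```

A real lock manager would queue the waiting requests and wake them on release; returning `False` keeps the compatibility rules visible without any threading.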
Two-version locking

The two-version locking scheme requires three types of locks: read, write, and commit locks. Before an object is read, a transaction must obtain a read lock. Before an object is written, the transaction must obtain a write lock (same as with two-phase locking). Neither of these locks will be granted if there is a commit lock on the object. When the transaction is ready to commit:
• all of the transaction's write locks are changed to commit locks
• if any objects used by the transaction have outstanding read locks, the transaction must wait until the transactions that set these locks have completed and the locks are released.

If we compare the performance difference between two-version locking and strict two-phase locking (read/write locks):
• read operations in two-version locking are delayed only while transactions are being committed rather than during the entire execution of transactions (usually the commit protocol takes far less time than the time to perform the transaction)
• but read operations of one transaction can cause a delay in the committing of other transactions.

Problems with locking

Locks are not without drawbacks. Locks have an overhead associated with them: a lock manager is needed to keep track of locks, and there is overhead in requesting them. Even read-only operations must still request locks. The use of locks can result in deadlock, so we need to have software in place to detect or avoid deadlock. Locks can also decrease the potential concurrency in a system by having a transaction hold locks for the duration of the transaction (until a commit or abort).

Optimistic concurrency control

Kung and Robinson (1981) proposed an alternative technique for achieving concurrency control, called optimistic concurrency control. This is based on the observation that, in most applications, the chance of two transactions accessing the same object is low. We will allow transactions to proceed as if there were no possibility of conflict with other transactions: a transaction does not have to obtain or check for locks. This is the working phase. Each transaction has a tentative version (private workspace) of the objects it updates: a copy of the most recently committed version. Write operations record new values as tentative values. Before a transaction can commit, a validation is performed on all the data items to see whether the data conflicts with operations of other transactions. This is the validation phase. If the validation fails, then the transaction will have to be aborted and restarted later. If the transaction succeeds, then the changes in the tentative version are made permanent. This is the update phase. Optimistic control is deadlock free and allows for maximum parallelism (at the expense of possibly restarting transactions).

Timestamp ordering

Reed presented another approach to concurrency control in 1983.
This is called timestamp ordering. Each transaction is assigned a unique timestamp when it begins (this can come from a physical or logical clock). Each object in the system has a read and a write timestamp associated with it (two timestamps per object). The read timestamp is the timestamp of the last committed transaction that read the object. The write timestamp is the timestamp of the last committed transaction that modified the object (note that the timestamps are obtained from the transaction timestamp, i.e. the start of that transaction).

The rule of timestamp ordering is:
• If a transaction wants to write an object, it compares its own timestamp with the object's read and write timestamps. If the object's timestamps are older, then the ordering is good.
• If a transaction wants to read an object, it compares its own timestamp with the object's write timestamp. If the object's write timestamp is older than the current transaction, then the ordering is good.
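The two timestamp-ordering rules can be sketched as follows. This is a simplified illustration, not a complete protocol: the names `TimestampedObject` and `Aborted` are invented here, and the object's timestamps are updated at access time rather than at commit, which is a simplifying assumption.

```python
class Aborted(Exception):
    """Raised when an access would violate timestamp ordering."""


class TimestampedObject:
    def __init__(self, value):
        self.value = value
        self.read_ts = 0   # timestamp of the youngest reader so far
        self.write_ts = 0  # timestamp of the youngest writer so far

    def read(self, ts):
        # A read is in order only if the last write is older than the reader.
        if ts < self.write_ts:
            raise Aborted(f"read by {ts} arrives after write by {self.write_ts}")
        self.read_ts = max(self.read_ts, ts)
        return self.value

    def write(self, ts, value):
        # A write is in order only if every earlier read and write is older.
        if ts < self.read_ts or ts < self.write_ts:
            raise Aborted(f"write by {ts} arrives too late")
        self.write_ts = ts
        self.value = value


obj = TimestampedObject(100)
print(obj.read(ts=2))        # 100
try:
    obj.write(ts=1, value=5) # older than the reader with timestamp 2
except Aborted as err:
    print("aborted:", err)
obj.write(ts=3, value=7)     # in order
try:
    obj.read(ts=2)           # older than the writer with timestamp 3
except Aborted as err:
    print("aborted:", err)
```

In a full system the aborted transaction would be restarted, typically with a fresh timestamp.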

If a transaction attempts to access an object and does not detect proper ordering, the transaction is aborted and restarted (improper ordering means that a newer transaction came in and modified data before the older one could access the data, or read data that the older one wants to modify).

10.8 Concurrency Control by Validation

Validation or certification techniques. A transaction proceeds without waiting and all updates are applied to local copies. At the end, a validation phase checks whether any updates violate serializability. If certified, the transaction is committed and its updates are made permanent. If not certified, the transaction is aborted and restarted later.

Three phases:
• read phase
• validation phase
• write phase

Validation Test

Each T is associated with three timestamps (TS's):
• start(T): T started
• val(T): T finished its read phase and started its validation
• finish(T): T finished its write phase

Validation test for Ti: for each Tj that is committed or in its validation phase, at least one of the following holds:
1. finish(Tj) < start(Ti)
2. writeset(Tj) ∩ readset(Ti) = ∅ and finish(Tj) < val(Ti)
3. writeset(Tj) ∩ readset(Ti) = ∅, writeset(Tj) ∩ writeset(Ti) = ∅, and val(Tj) < val(Ti)
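The validation test can be written down directly from the three conditions. The sketch below is illustrative only: the `Txn` record, its field names, and the use of infinity for a transaction that has not finished its write phase are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    start: float
    val: float
    finish: float  # float("inf") while the write phase is unfinished
    read_set: frozenset = frozenset()
    write_set: frozenset = frozenset()


def validates_against(ti, tj):
    """True if at least one of the three conditions holds for (Ti, Tj)."""
    no_rw = not (tj.write_set & ti.read_set)
    no_ww = not (tj.write_set & ti.write_set)
    if tj.finish < ti.start:           # Tj finished before Ti began
        return True
    if no_rw and tj.finish < ti.val:   # Tj wrote nothing that Ti read
        return True
    if no_rw and no_ww and tj.val < ti.val:
        return True
    return False


def validate(ti, others):
    """Ti is certified only if it validates against every such Tj."""
    return all(validates_against(ti, tj) for tj in others)


ti = Txn(start=10, val=20, finish=float("inf"),
         read_set=frozenset({"x"}), write_set=frozenset({"y"}))
earlier = Txn(start=1, val=5, finish=8, write_set=frozenset({"x"}))
overlap = Txn(start=5, val=12, finish=15, write_set=frozenset({"x"}))

print(validate(ti, [earlier]))           # True: condition 1
print(validate(ti, [earlier, overlap]))  # False: overlap wrote x, Ti read x
```

If `validate` returns `False`, Ti is aborted and restarted later, as described above.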

10.9 Summary

• When two or more transactions are running concurrently, the steps of the transactions would normally be interleaved. The interleaved execution of transactions is decided by the database scheduler, which receives a stream of user requests that arise from the active transactions. A particular sequencing (usually interleaved) of the actions of a set of transactions is called a schedule. A serial schedule is a schedule in which all the operations of one transaction are completed before another transaction can begin (that is, there is no interleaving).
• Two operations conflict if:
  o they are issued by different transactions,
  o they operate on the same data item, and
  o at least one of them is a write operation
• Two executions are conflict-equivalent if all conflicting operations have the same order in both executions.
• To allow multiple concurrent transactions access to the same data, most database servers use a two-phase locking protocol. Each transaction locks sections of the data that it reads or updates to prevent others from seeing its uncommitted changes. Only when the transaction is committed or rolled back can the locks be released. This was one of the earliest methods of concurrency control, and is used by most database systems.
• Several object-oriented databases, which were developed more recently, have incorporated OCC within their designs to gain the performance advantages inherent in this approach.
• Memory usage, shared memory

UNIT 11

TRANSACTION MANAGEMENT

11.1 Introduction of Transaction Management
11.2 Serializability and Recoverability
11.3 View Serializability
11.4 Resolving Deadlocks
11.5 Distributed Databases
11.6 Distributed Commit
11.7 Distributed Locking
11.8 Summary

11.1 Introduction of Transaction Management

The synchronization primitives we have seen so far are not as high-level as we might want them to be, since they require programmers to explicitly synchronize, avoid deadlocks, and abort if necessary. Moreover, high-level constructs such as monitors and path expressions do not give users of shared objects flexibility in defining the unit of atomicity. We will study here a high-level technique, called concurrency control, which automatically ensures that concurrently interacting users do not execute inconsistent commands on shared objects. A variety of concurrency models defining different notions of consistency have been proposed. These models have been developed in the context of database management systems, operating systems, CAD tools, collaborative software engineering, and collaboration systems. We will focus here on the classical database models and the relatively newer operating system models.

Transaction processing is a type of computer processing in which the computer responds immediately to user requests. Each request is considered to be a transaction. Automatic teller machines for banks are an example of transaction processing.

The opposite of transaction processing is batch processing, in which a batch of requests is stored and then executed all at one time. Transaction processing requires interaction with a user, whereas batch processing can take place without a user being present.

The RDBMS must be able to support a centralized warehouse containing detail data, provide direct access for all users, and enable heavy-duty, ad hoc analysis. Yet, for many companies just starting a warehouse project, it seems a natural choice to simply use the corporate standard database that has already proven itself for mission-critical work. This approach was especially common in the early days of data warehousing, when most people expected a warehouse to do little more than provide canned reports.
But decision-support requirements have evolved far beyond canned reports and known queries. Today's data warehouses must give organizations the in-depth and accurate information they need to personalize customer interactions at all touch points and convert browsers to buyers. An RDBMS designed for transaction processing can't keep up with the demands placed on data warehouses: support for high concurrency, mixed workload, detail data, fast query response, fast data load, ad hoc queries, and high-volume data mining.

The notion of concurrency control is closely tied to the notion of a "transaction". A transaction defines a set of "indivisible" steps, that is, commands with the Atomicity, Consistency, Isolation, and Durability (ACID) properties:

Atomicity: Either all or none of the steps of the transaction occur, so that the invariants of the shared objects are maintained. A transaction is typically aborted by the system in response to failures, but it may also be aborted by a user to "undo" the actions. In either case, the user is informed about the success or failure of the transaction.

Consistency: A transaction takes a shared object from one legal state to another, that is, it maintains the invariant of the shared object.

Isolation: Events within a transaction are hidden from other concurrently executing transactions. Techniques for achieving isolation are called synchronization schemes. They determine how these transactions are scheduled, that is, what the relationships are between the times of the different steps of these transactions. Isolation is required to ensure that concurrent transactions do not cause an illegal state in the shared object and to prevent cascaded rollbacks when a transaction aborts.

Durability: Once the system tells the user that a transaction has completed successfully, it ensures that values written by the database system persist until they are explicitly overwritten by other transactions.

Consider the schedules S1, S2, S3, S4 and S5 given below. Draw the precedence graph for each schedule and state whether each schedule is (conflict) serializable or not. If a schedule is serializable, write down the equivalent serial schedule(s).

S1: read1(X), read3(X), write1(X), read2(X), write3(X).
S2: read1(X), read3(X), write3(X), write1(X), read2(X).
S3: read3(X), read2(X), write3(X), read1(X), write1(X).
S4: read3(X), read2(X), read1(X), write3(X), write1(X).
S5: read2(X), write3(X), read2(Y), write4(Y), write3(Z), read1(Z), write4(X), read1(X), write2(Y), read1(Y).

11.2 Serializability and Recoverability

Serializability is the classical concurrency scheme. It ensures that a schedule for executing concurrent transactions is equivalent to one that executes the transactions serially in some order. It assumes that all accesses to the database are done using read and write operations. A schedule is called "correct" if we can find a serial schedule that is "equivalent" to it. Given a set of transactions T1...Tn, two schedules S1 and S2 of these transactions are equivalent if the following conditions are satisfied:

Read-Write Synchronization: If a transaction reads a value written by another transaction in one schedule, then it also does so in the other schedule.

Write-Write Synchronization: If a transaction overwrites the value of another transaction in one schedule, it also does so in the other schedule.

Recoverability for changes to the other control file record sections is provided by maintaining all the information in duplicate. Two physical blocks represent each logical block. One contains the current information, and the other contains either an old copy of the information, or a pending version that is yet to be committed. To keep track of which physical copy of each logical block contains the current information, Oracle maintains a block version bitmap with the database information entry in the first record section of the control file.

Recovery is an algorithmic process and should be kept as simple as possible, since complex algorithms are likely to introduce errors. Therefore, an encoding scheme should be designed around a set of principles intended to make recovery possible with simple algorithms. For processes such as tag removal, simple mappings are more straightforward and less error-prone than, say, algorithms which require rearrangement of the sequence of elements, or which are context-dependent, etc. Therefore, in order to provide a coherent and explicit set of recovery principles, various recovery algorithms and related encoding principles need to be worked out, taking into account such things as:
• The role and nature of mappings (tags to typography, normalized characters, spellings, etc., with the original...);
• The encoding of rendition characters and rendition text;
• Definitions and separability of the source and annotation (such as linguistic annotation, notes, etc.);
• Linkage of different views or versions of a text.

11.3 View Serializability

Serializability

In this paper, we assume serializability is the underlying consistency criterion.
Serializability requires that concurrent execution of transactions have the same effect as a serial schedule. In a serial schedule, transactions are executed one at a time. Given that the database is initially consistent and each transaction is a unit of consistency, serial execution of all transactions will give each transaction a consistent view of the database and will leave the database in a consistent state. Since serializable execution is computationally equivalent to serial execution, serializable execution is also deemed correct.

View Serializability

A subclass of serializability, called View Serializability, is identified based on the observation that two schedules are equivalent if they have the same effects. The effects of a schedule are the values it produces, which are functions of the values it reads. Two schedules H1 and H2 are defined to be view equivalent if they satisfy the three conditions given in the definition of view equivalence below.

Database Concurrency Control
• Multiple users.
• Concurrent accesses.
• Problems could arise if there is no control.
• Example: Transaction 1: withdraw $500 from account A. Transaction 2: deposit $800 to account A.

T1: read(A); A = A - 500; write(A)
T2: read(A); A = A + 800; write(A)

Initially A = 1000. The serial order T1 T2 gives A = 1300, and the serial order T2 T1 also gives A = 1300.

First interleaving:
T1: read(A)
T1: A = A - 500
T2: read(A)
T2: A = A + 800
T2: write(A)
T1: write(A)
Result: A = 500, inconsistent.

Second interleaving:
T2: read(A)
T1: read(A)
T1: A = A - 500
T1: write(A)
T2: A = A + 800
T2: write(A)
Result: A = 1800, inconsistent.

Transactions and Schedules
• A transaction is a sequence of operations.
• A transaction is atomic.
• T = {T1, ..., Tn} is a set of transactions.
• A schedule of T is a sequence of the operations in T1, ..., Tn such that for each 1 ≤ i ≤ n:
1. each operation in Ti appears exactly once, and
2. operations in Ti appear in the same order as in Ti.
• A schedule of T is serial if for all i and j with i ≠ j, either
1. all the operations in Ti appear before all the operations in Tj, or
2. all the operations in Tj appear before all the operations in Ti.
• Assumption: Each Ti is correct when executed individually.
• Any serial schedule is valid.
• Objective: Accept schedules "equivalent" to a serial schedule (serializable schedules).
• What do we mean by "equivalent"?
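The two serial orders and the two interleavings in the example above can be replayed with a short Python sketch. The helper names (`run`, `whole`) and the three-step split into read, compute and write are illustrative assumptions, not part of the original notes; the amounts are the ones used in the example.

```python
# Each transaction reads A into a private copy, changes the copy, then writes it back.
DELTAS = {"T1": -500, "T2": +800}  # T1 withdraws 500, T2 deposits 800


def run(schedule, initial=1000):
    """Execute a list of (transaction, step) pairs and return the final value of A."""
    database = {"A": initial}
    private = {}  # per-transaction local copy of A
    for txn, step in schedule:
        if step == "read":
            private[txn] = database["A"]
        elif step == "compute":
            private[txn] += DELTAS[txn]
        elif step == "write":
            database["A"] = private[txn]
    return database["A"]


def whole(txn):
    """All three steps of one transaction, in program order."""
    return [(txn, "read"), (txn, "compute"), (txn, "write")]


SERIAL_T1_T2 = whole("T1") + whole("T2")
SERIAL_T2_T1 = whole("T2") + whole("T1")

# First interleaving from the text: T1's late write wipes out the deposit.
LOST_DEPOSIT = [("T1", "read"), ("T1", "compute"), ("T2", "read"),
                ("T2", "compute"), ("T2", "write"), ("T1", "write")]

# Second interleaving from the text: T2's late write wipes out the withdrawal.
LOST_WITHDRAWAL = [("T2", "read"), ("T1", "read"), ("T1", "compute"),
                   ("T1", "write"), ("T2", "compute"), ("T2", "write")]

if __name__ == "__main__":
    for name, schedule in [("T1 T2", SERIAL_T1_T2), ("T2 T1", SERIAL_T2_T1),
                           ("interleaving 1", LOST_DEPOSIT),
                           ("interleaving 2", LOST_WITHDRAWAL)]:
        print(name, "->", run(schedule))
```

Both serial orders end with A = 1300, while each interleaving loses one of the two updates, which is why only schedules equivalent to a serial one are accepted.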

View Serializability:
• Equivalent: same effects.
• The effects of a history are the values produced by the Write operations of unaborted transactions.
• We don't know anything about the computation of each transaction.
• Assume that if each transaction's Reads read the same values in two histories, then all Writes write the same values in both histories.
• If for each data item x, the final Write on x is the same in both histories, then the final value of all data items will be the same in both histories.
• Two histories H and H' are view equivalent if
1. They are over the same set of transactions and have the same operations;
2. For any unaborted Ti, Tj and for any x, if Ti reads x from Tj in H then Ti reads x from Tj in H'; and
3. For each x, if wi[x] is the final write of x in H then it is also the final write of x in H'.
• Assume that there is a transaction (Tb) which initializes the values for all the data objects.
• A schedule is view serializable if it is view equivalent to a serial schedule.
• r3[x] w4[x] w3[x] w6[x]: T3 reads from Tb. The final write for x is w6[x]. View equivalent to T3 T4 T6.
• r3[x] w4[x] r7[x] w3[x] w7[x]: T3 reads from Tb. T7 reads from T4. The final write for x is w7[x]. View equivalent to T3 T4 T7.
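The definition of view equivalence can be checked mechanically for small histories. The following Python sketch is a brute-force illustration, not an efficient test: it tries every serial order and compares the reads-from relation and the final writes. The function names (`parse`, `view`, `view_serializable`) are hypothetical, and transaction number 0 is assumed to stand for the initializing transaction Tb.

```python
from itertools import permutations


def parse(history):
    """Turn 'r3[x] w4[x]' into [('r', 3, 'x'), ('w', 4, 'x')]."""
    ops = []
    for token in history.split():
        kind = token[0]
        txn, item = token[1:].split("[")
        ops.append((kind, int(txn), item.rstrip("]")))
    return ops


def view(ops):
    """Return (reads-from map, final-writer map) of a history.

    Transaction 0 stands for Tb, which is assumed to write every item first.
    """
    last_writer = {}
    reads_from = {}
    read_count = {}
    for kind, txn, item in ops:
        if kind == "r":
            k = read_count.get((txn, item), 0)  # k-th read of item by txn
            read_count[(txn, item)] = k + 1
            reads_from[(txn, item, k)] = last_writer.get(item, 0)
        else:
            last_writer[item] = txn
    return reads_from, last_writer


def serial(ops, order):
    """Serial history that runs the transactions one at a time in the given order."""
    return [op for txn in order for op in ops if op[1] == txn]


def view_serializable(history):
    """Return a serial order that is view equivalent to the history, or None."""
    ops = parse(history)
    txns = sorted({txn for _, txn, _ in ops})
    target = view(ops)
    for order in permutations(txns):
        if view(serial(ops, order)) == target:
            return order
    return None


if __name__ == "__main__":
    for h in ["r3[x] w4[x] w3[x] w6[x]", "r3[x] w4[x] r7[x] w3[x] w7[x]",
              "r3[x] w4[x] w3[x]", "w1[x] r2[x] w2[x] r1[x]"]:
        print(h, "->", view_serializable(h))
```

Trying all permutations is exponential in the number of transactions, so this is only suitable for textbook-sized histories such as the ones in this section.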

• r3[x] w4[x] w3[x]: T3 reads from Tb. The final write for x is w3[x]. Not serializable.
• w1[x] r2[x] w2[x] r1[x]: T2 reads from T1. T1 reads from T2. The final write for x is w2[x]. Not serializable.
• Test for view serializability.
• Tb issues writes for all data objects (first transaction).
• Tf reads the values for all data objects (last transaction).
• Construction of the labeled precedence graph:
1. Add an edge Ti -0-> Tj if transaction Tj reads from Ti.
2. For each data item Q such that Tj reads from Ti, Tk executes write(Q) and Tk ≠ Tb (i ≠ j ≠ k), do the following:
(a) If Ti = Tb and Tj ≠ Tf, then insert the edge Tj -0-> Tk.

11.4 Resolving Deadlocks

Use the Monitor Display to understand and resolve deadlocks. This section demonstrates how this is done. The steps assume that the deadlock is easy to recreate with the tested application running within OptimizeIt Thread Debugger. If this is not the case, use the Monitor Usage Analyzer instead.

To resolve a deadlock:
1. Recreate the deadlock.
2. Switch to Monitor Display.
3. Identify which thread is not making progress. Usually, the thread is yellow because it is blocking on a monitor. Call that thread the blocking thread.
4. Select the Connection button to identify where the blocking thread is not making progress. Double-click on the method to display the source code for the blocking method, as well as methods calling the blocking method. This provides some context for where the deadlock occurs.
5. Identify which thread owns the unavailable monitor. Call this the locking thread.
6. Identify why the locking thread does not release the monitor. This can happen in the following cases:
• The locking thread is itself trying to acquire a monitor owned directly or indirectly by the blocking thread. In this case, a bug exists since the locking and the blocking threads enter monitors in a different order. Changing the code to always enter monitors in the same order will resolve this deadlock.
• The locking thread is not releasing the monitor because it remains busy executing the code. In this case, the locking thread is green because it uses some CPU. This type of bug is not a real deadlock.
It is an extreme contention issue caused by the locking thread holding the monitor for too long, sometimes called thread starvation.
• The locking thread is waiting for an I/O operation. In this case the locking thread is purple. It is dangerous for a thread to perform an I/O operation while holding a monitor, unless the only purpose of that monitor is to protect the objects used to perform the I/O.

A blocking I/O operation may never occur, causing the program to hang. Often these situations can be resolved by releasing the monitor before performing the I/O.
• The locking thread is waiting for another monitor. In this case, the locking thread is red. It is equally dangerous to wait for a monitor while holding another monitor. The monitor may never be notified, causing a deadlock. Often this situation can be resolved by releasing the monitor that the blocking thread wants to acquire before waiting on the monitor.

11.5 Distributed Databases

To support data collection needs of large networks, a distributed multi-tier architecture is necessary. The advantage of this approach is that massive scalability can be implemented in an incremental fashion, easing the operations and financial burden. WebNMS Server has been designed to scale from a single-server solution to a distributed architecture supporting very large scale needs. The architecture supports one or more relational databases (RDBMS).

A single back-end server collects the data and stores it in a local or remote database. The system readily supports tens of thousands of managed objects. For example, on a 400 MHz Pentium Windows NT system, the polling engine can collect over 4000 collected variables/minute, including storage in the MySQL database on Windows.

The bottleneck for data collection is usually the database inserts, which limits the number of entries per second that can be inserted into the database. As we discuss below, with a distributed database, considerably higher performance is possible through distributing the storage of data into multiple databases.

Based on tests with different modes, one central database on commodity hardware can handle up to 100 collected variables/second; with distributed databases, this can be scaled much higher.
The achievable rate depends on the number of databases and the number and type of servers used. With distributed databases, there is often a need to aggregate data in a single central store for reporting and other purposes. Multiple approaches are feasible here:
• Roll up data periodically to the central database from the different distributed databases.
• Use native database distribution for centralized views, e.g. Oracle SQL*Net. This is vendor dependent, but can provide easy consolidation of data from multiple databases.
• Aggregate data using JDBC only when creating a report. This would require the report writer to take care of collecting the data from the different databases for the report.

The need arises:
• To reduce the burden on the Poll Engine in WebNMS Server.
• To facilitate faster data collection.

The solution is Distributed Polling. You can adopt this technique when you are able to distinguish the network elements geographically. You can form a group of network elements and decide to have one Distributed Poller for them.

This section describes the Distributed Polling architecture available with WebNMS Server. It discusses the design, and the choices available in implementing the distributed solution. It provides guidelines on setting up the components of the distributed system.

The architecture is very simple.
• You have WebNMS Server running on one machine and Distributed Pollers running on other machines, one on each.
• Each Poller is identified by a name and has an associated database (labelled as Secondary RDBMS in the diagram).
• You create PolledData and specify the Poller name if you want to perform data collection for that PolledData via the distributed poller. In case you want the WebNMS Polling Engine to collect data, you don't specify any Poller name. By default, PolledData will not be associated with any of the Pollers.
• Once you associate the PolledData with the Poller and start the Poller, data collection is done by the Poller and collected data is stored in the Poller database (Secondary RDBMS).

Major features of a DDB are:
o Data stored at a number of sites, each site logically a single processor
o Sites are interconnected by a network rather than a multiprocessor configuration
o DDB is logically a single database (although each site is a database site)
o DDBMS has full functionality of a DBMS

To the user, the distributed database system should appear exactly like a non-distributed database system.

Advantages of distributed database systems are:
o Local autonomy (in enterprises that are distributed already)
o Improved performance (since data is stored close to where needed and a query may be split over several sites and executed in parallel)
o Improved reliability/availability (should one site go down)
o Economics
o Expandability
o Shareability

Disadvantages of distributed database systems are:
o Complexity (greater potential for bugs in software)
o Cost (software development can be much more complex and therefore costly. Also, exchange of messages and additional computations involve increased overheads)
o Distribution of control (no single database administrator controls the DDB)
o Security (since the system is distributed the chances of security lapses are greater)
o Difficult to change (since all sites have control of their own sites)
o Lack of experience (enough experience is not available in developing distributed systems)

Replication improves availability since the system would continue to be fully functional even if a site goes down. Replication also allows increased parallelism since several sites could be operating on the
same relations at the same time. Replication does result in increased overheads on update.

Fragmentation may be horizontal, vertical or hybrid (or mixed). Horizontal fragmentation splits a relation by assigning each tuple of the relation to a fragment of the relation. Often horizontal fragmentation is based on predicates defined on that relation.

Vertical fragmentation splits the relation by decomposing a relation into several subsets of the attributes. Relation R produces fragments each of which contains a subset of attributes of R as well as the primary key of R. The aim of vertical fragmentation is to put together those attributes that are accessed together.

Mixed fragmentation uses both vertical and horizontal fragmentation.

To obtain a sensible fragmentation design, it is necessary to know some information about the database as well as about applications. It is useful to know the predicates used in the application queries, at least the 'important' ones. The aim is to have applications using only one fragment.

Fragmentation must provide completeness (all information in a relation must be available in the fragments), reconstruction (the original relation should be able to be reconstructed from the fragments) and disjointedness (no information should be stored twice unless absolutely essential, for example, the key needs to be duplicated in vertical fragmentation).

Transparency involves the user not having to know how a relation is stored in the DDB; it is the system capability to hide the details of data distribution from the user.

Autonomy is the degree to which a designer or administrator of one site may be independent of the remainder of the distributed system.

It is clearly undesirable for the users to have to know which fragment of the relation they require to process the query that they are posing. Similarly the users should not need to know which copy of a replicated relation or fragment they need to use. It should be up to the system to figure out which fragment or fragments of a relation a query requires and which copy of a fragment the system will use to process the query. This is called replication and fragmentation transparency.
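The fragmentation properties described above (completeness, reconstruction and disjointedness) can be illustrated with a small Python sketch. The EMPLOYEE relation, its attribute names and the helper functions are made up for this illustration; a real DDBMS would express the same idea with selections, projections, unions and joins.

```python
# Hypothetical relation; "id" plays the role of the primary key.
EMPLOYEE = [
    {"id": 1, "name": "Asha", "city": "Delhi", "salary": 52000},
    {"id": 2, "name": "Ravi", "city": "Mumbai", "salary": 48000},
    {"id": 3, "name": "Meena", "city": "Delhi", "salary": 61000},
]


def horizontal(relation, predicate):
    """Horizontal fragment: the tuples that satisfy a predicate."""
    return [row for row in relation if predicate(row)]


def vertical(relation, key, attributes):
    """Vertical fragment: a subset of attributes plus the primary key."""
    return [{name: row[name] for name in (key, *attributes)} for row in relation]


def union(*fragments, key="id"):
    """Reconstruct a relation from horizontal fragments."""
    return sorted((row for frag in fragments for row in frag),
                  key=lambda row: row[key])


def join(left, right, key="id"):
    """Reconstruct a relation from two vertical fragments by joining on the key."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left]


# Horizontal fragmentation by a predicate on city.
delhi = horizontal(EMPLOYEE, lambda row: row["city"] == "Delhi")
others = horizontal(EMPLOYEE, lambda row: row["city"] != "Delhi")

# Vertical fragmentation: only the key is duplicated.
contact = vertical(EMPLOYEE, "id", ["name", "city"])
payroll = vertical(EMPLOYEE, "id", ["salary"])

if __name__ == "__main__":
    print(union(delhi, others) == EMPLOYEE)   # completeness and reconstruction
    print(join(contact, payroll) == EMPLOYEE)  # reconstruction via the key
```

The two horizontal fragments share no tuple, and the two vertical fragments share only the key, which matches the disjointedness requirement with its stated exception.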
A user should also not need to know where the data is located and should be able to refer to a relation by name which could then be translated by the system into a full name that includes the location of the relation. This is location transparency.

Global query optimization is complex because of

• cost models
• fragmentation and replication
• large solution space from which to choose

Computing cost itself can be complex since the cost is a weighted combination of the I/O, CPU and communications costs. Often one of two cost models is used: one may wish to minimize the total cost (time) or the response time. Fragmentation and replication add another complexity to finding an optimum query plan.
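The difference between the two cost models can be shown with a toy calculation in Python. Everything numeric here is an assumption made for illustration: the weights, the per-site figures, and the simplification that sites work fully in parallel so that response time is the cost of the slowest site.

```python
# Hypothetical weights for the weighted combination of I/O, CPU and communication.
WEIGHTS = {"io": 1, "cpu": 1, "comm": 10}


def site_cost(site):
    """Weighted cost of the work done at one site."""
    return sum(WEIGHTS[kind] * site[kind] for kind in WEIGHTS)


def total_cost(plan):
    """Total cost model: add up the work of every site."""
    return sum(site_cost(site) for site in plan)


def response_time(plan):
    """Response time model, assuming all sites run in parallel."""
    return max(site_cost(site) for site in plan)


# Plan A: run the whole query at one site, no messages needed.
CENTRAL = [{"io": 100, "cpu": 50, "comm": 0}]

# Plan B: split the work over two sites, paying for some communication.
PARALLEL = [{"io": 50, "cpu": 25, "comm": 4},
            {"io": 50, "cpu": 25, "comm": 4}]

if __name__ == "__main__":
    for name, plan in [("central", CENTRAL), ("parallel", PARALLEL)]:
        print(name, total_cost(plan), response_time(plan))
```

With these made-up numbers the central plan has the lower total cost while the parallel plan has the lower response time, so the choice of cost model changes which plan an optimizer would prefer.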

Date's 12 Rules for Distributed Databases

A distributed RDBMS should, in all other respects, behave like a non-distributed RDBMS. This is sometimes called Rule 0.

Distributed Database Characteristics

According to Oracle, these are the database characteristics and how Oracle7 technology meets each point:
1. Local autonomy: The data is owned and managed locally. Local operations remain purely local. One site (node) in the distributed system does not depend on another site to function successfully.
2. No reliance on a central site: All sites are treated as equals. Each site has its own data dictionary.
3. Continuous operation: Incorporating a new site has no effect on existing applications and does not disrupt service.
4. Location independence: Users can retrieve and update data independent of the site.
5. Partitioning [fragmentation] independence: Users can store parts of a table at different locations. Both horizontal and vertical partitioning of data is possible.
6. Replication independence: Stored copies of data can be located at multiple sites. Snapshots, a type of database object, can provide both read-only and updatable copies of tables. Symmetric replication using triggers makes readable and writable replication possible.
7. Distributed query processing: Users can query a database residing on another node. The query is executed at the node where the data is located.
8. Distributed transaction management: A transaction can update, insert, or delete data from multiple databases. The two-phase commit mechanism in Oracle ensures the integrity of distributed transactions. Row-level locking ensures a high level of data concurrency.
9. Hardware independence: Oracle7 runs on all major hardware platforms.
10. Operating system independence: A specific operating system is not required. Oracle7 runs under a variety of operating systems.
11. Network independence: Oracle's SQL*Net supports most popular networking software. Network independence allows communication across homogeneous and heterogeneous networks. Oracle's MultiProtocol Interchange enables applications to communicate with databases across multiple network protocols.
12. DBMS independence: DBMS independence is the ability to integrate different databases. Oracle's Open Gateway technology supports ODBC-enabled connections to non-Oracle databases.

11.6 Distributed Commit

To create a new user, test, and a corresponding default schema you must be connected as the ADMIN user and then use:

    CREATE USER test;
    CREATE SCHEMA test AUTHORIZATION test;    (sets the default schema)
    COMMIT;

and then connect to the new user/schema using:

    CONNECT TO '' USER 'test'

Notice that the COMMIT was needed before the CONNECT because reconnecting would otherwise roll back any uncommitted changes.

In this example the sequence of events is as follows:

The coordinator at Client A registers automatically with the Transaction Manager database at Server B, using TM_DATABASE=TMB.

The application requester at Client A issues a DUOW (distributed unit of work) request to Servers C and E. For example, the following REXX script illustrates this:

    /**/
    'set DB2OPTIONS=+c' /* in order to turn off autocommit */
    'db2 set client connect 2 syncpoint twophase'
    'db2 connect to DBC user USERC using PASSWRDC'
    'db2 create table twopc (title varchar(50), artno smallint not null)'
    'db2 insert into twopc (title,artno) values("testCCC",99)'
    'db2 connect to DBE user USERE using PASSWRDE'
    'db2 create table twopc (title varchar(50), artno smallint not null)'
    'db2 insert into twopc (title,artno) values("testEEE",99)'
    'commit'
    exit (0);

When the commit is issued, the coordinator at the application requester sends prepare requests to the SPM for the updates requested at servers C and E.

The SPM is running on Server D, as part of DB2 Connect, and it sends the prepare requests to servers C and E. Servers C and E in turn acknowledge the prepare requests.

The SPM sends back an acknowledgement to the coordinator at the application requester.

The coordinator at the application requester sends a request to the transaction manager at Server B for the servers that have acknowledged, and the transaction manager decides whether to commit or roll back. The transaction manager logs the commit decision, and the updates are guaranteed from this point. The coordinator issues commit requests, which are processed by the SPM, and forwarded to servers C and E, as were the prepare requests. Servers C and E commit and report success to the SPM. The SPM then returns the commit result to the coordinator, which updates the TMB with the commit results.
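The prepare and commit exchange described above follows the general two-phase commit pattern, which can be sketched in Python as follows. This is a simplified in-memory illustration, not DB2 or SPM code: the `Participant` class, its state names and the `two_phase_commit` function are hypothetical, and failures, timeouts and recovery are left out.

```python
class Participant:
    """A database server taking part in a distributed transaction (hypothetical)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        """Phase 1: vote yes (prepared) or no (aborted)."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"


def two_phase_commit(participants, log):
    """Run both phases and return the decision, 'commit' or 'rollback'."""
    # Phase 1: every participant is asked to prepare and returns its vote.
    votes = [p.prepare() for p in participants]

    # The decision is logged before phase 2; from here on it is binding.
    decision = "commit" if all(votes) else "rollback"
    log.append(decision)

    # Phase 2: the same decision is sent to every participant.
    for p in participants:
        if decision == "commit":
            p.commit()
        else:
            p.rollback()
    return decision


if __name__ == "__main__":
    log = []
    servers = [Participant("C"), Participant("E")]
    print(two_phase_commit(servers, log), [p.state for p in servers])

    servers = [Participant("C"), Participant("E", can_commit=False)]
    print(two_phase_commit(servers, log), [p.state for p in servers])
```

Logging the decision before the second phase mirrors the point in the text that the updates are guaranteed once the transaction manager has logged the commit decision.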

Two-phase Commit RDBMS Scenario

11.7 Distributed Locking

The intent of this white paper is to convey information regarding database locks as they apply to transactions in general and the more specific case of how they are implemented by the Progress server. We’ll begin with a general overview discussing why locks are needed and how they affect transactions. Transactions and locking are outlined in the SQL standard, so no introduction would be complete without discussing the guidelines set forth there. Once we have a grasp on the general concepts of locking we’ll dive into lock modes, such as table and record locks, and their effect on different types of database operations. Next, the subject of timing will be introduced: when locks are obtained and when they are released. From here we’ll get into lock contention and deadlocks, which arise when multiple operations or transactions all attempt to get locks on the same resource at the same time. To conclude our discussion on locking we’ll take a look at how we can see locks in our application so we know which transactions obtain which types of locks. Finally, this white paper describes differences in locking behavior between previous and current versions of Progress and differences in locking behavior when both 4GL and SQL-92 clients are accessing the same resources.

Locks

The answer to why we lock is simple; if we didn’t there would be no consistency. Consistency provides us with successive, reliable, and uniform results without which applications such as banking and reservation systems, manufacturing, chemical, and industrial data collection and processing could not exist.
Imagine a banking application where two clerks attempt to update an account balance at the same time: one credits the account and the other debits the account. While one clerk reads the account balance of $200 to credit the account $100, the other clerk has already completed the debit of $100 and updated the account balance to $100. When the first clerk finishes the credit of $100 to the balance of $200 and updates the balance to $300 it will be as if the debit never happened. Great for the customer; however the bank wouldn’t be in business for long.

What objects are we locking?

What database objects get locked is not as simple to answer as why they’re locked. From a user perspective, objects such as the information schema, user tables, and user records are locked while being accessed to maintain consistency. There are other lower-level objects that require locks that are handled by the RDBMS; however, they are not visible to the user. For the purposes of this discussion we will focus on the objects that the user has visibility of and control over.

Transactions

Now that we know why and what we lock, let’s talk a bit about when we lock. A transaction is a unit of work; there is a well-defined beginning and end to each unit of work. At the beginning of each transaction certain locks are obtained and at the end of each transaction they are released. During any given transaction, the RDBMS, on behalf of the user, can escalate, de-escalate, and even release locks as required. We’ll talk about this in more detail later when we discuss lock modes. The aforementioned is all true in the case of a normal, successful transaction; however, in the case of an abnormally terminated transaction things are handled a bit differently. When a transaction fails, for any reason, the action performed by the transaction needs to be backed out and the change undone. To accomplish this most RDBMS use what are known as “save points.” A save point marks the last known good point prior to the abnormal termination; typically this is the beginning of the transaction. It’s the RDBMS’s job to undo the changes back to the previous save point as well as ensuring the proper locks are held until the transaction is completely undone. So, as you can see, transactions that are in the process of being undone (rolled back) are still transactions nonetheless and still need locks to maintain data consistency. Locking certain objects for the duration of a transaction ensures database consistency and isolation from other concurrent transactions, preventing the banking situation we described previously. Transactions are the basis for the ACID properties:
• ATOMICITY guarantees that all operations within a transaction are performed or none of them are performed.
• CONSISTENCY is the concept that allows an application to define consistency points and validate the correctness of data transformations from one state to the next.
• ISOLATION guarantees that concurrent transactions have no effect on each other.
• DURABILITY guarantees that all transaction updates are preserved.

11.8 Summary

Transaction processing is a type of computer processing in which the computer responds immediately to user requests. Each request is considered to be a transaction. Automatic teller machines for banks are an example of transaction processing. The opposite of transaction processing is batch processing, in which a batch of requests is stored and then executed all at one time. Transaction processing requires interaction with a user, whereas batch processing can take place without a user being present.

Serializability is the classical concurrency scheme. It ensures that a schedule for executing concurrent transactions is equivalent to one that executes the transactions serially in some order. It assumes that all accesses to the database are done using read and write operations. A schedule is called "correct" if we can find a serial schedule that is "equivalent" to it. Given a set of transactions T1...Tn, two schedules S1 and S2 of these transactions are equivalent if the following conditions are satisfied:

Read-Write Synchronization: If a transaction reads a value written by another transaction in one schedule, then it also does so in the other schedule.

Write-Write Synchronization: If a transaction overwrites the value of another transaction in one schedule, it also does so in the other schedule.
