Information Technology Module for Managers

Database Design Lesson No. 2

In this lesson you will learn about the history of database files and about Relational Database Management Systems (RDBMSs). You will also be briefly introduced to client/server computing.

History of Files: Since the beginning of computing, three types of file model have been used:

- Sequential Model
- Navigational Model
- Relational Model

We are now seeing the emergence of a fourth type, the Object Model (along with relational/object hybrids). Along with this evolution, responsibility has shifted from the application to the database. For example, in the sequential model the application was responsible both for locating a record and for enforcing business rules.

1.1 Sequential Model

Sequential files consist of fixed-length records with fixed-length fields. The old 80-column punched cards are an example. A later variation used a delimiter character to mark where a field or record ends, instead of relying on the end of a record being at a fixed position.

Visual Basic supports both forms of sequential file, using the Open and Close statements together with a variety of other file statements.
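As a rough illustration (in Python rather than Visual Basic), here is how the two forms of sequential record differ when read back: a fixed-length record is sliced at known column positions, while a delimited record is split on a separator character. The field layout, widths and sample values are hypothetical.

```python
# Hypothetical fixed-length layout (start, end column positions):
# account code 0-5, account name 6-25, balance 26-35.
FIELDS = [("code", 0, 6), ("name", 6, 26), ("balance", 26, 36)]


def parse_fixed(line: str) -> dict:
    """Read a fixed-length record by slicing at known column positions."""
    return {name: line[start:end].strip() for name, start, end in FIELDS}


def parse_delimited(line: str, sep: str = ",") -> dict:
    """Read a delimited record by splitting on the separator character."""
    code, name, balance = line.rstrip("\n").split(sep)
    return {"code": code, "name": name, "balance": balance}


# Build a 36-character fixed-length record by padding each field to its width.
fixed = "A1001".ljust(6) + "Petty Cash".ljust(20) + "1500.00".rjust(10)
delimited = "A1001,Petty Cash,1500.00"

print(parse_fixed(fixed))          # {'code': 'A1001', 'name': 'Petty Cash', 'balance': '1500.00'}
print(parse_delimited(delimited))  # same dictionary
```

The delimited form saves space when values vary in length, but a record's position in the file can then no longer be calculated from its record number.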

The advantage of the sequential model is its simplicity. Its disadvantage is that a search must always start at the first record and read each record in turn. Later improvements made random (direct) access possible.
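The contrast between sequential search and random access can be sketched in Python with an in-memory file of fixed-length records. The 16-byte record length and the record contents are hypothetical. A sequential search reads records one by one from the start, while a direct read computes the byte offset as record number × record length and jumps straight there.

```python
import io

RECORD_LEN = 16  # hypothetical fixed record length in bytes


def make_file(n: int) -> io.BytesIO:
    """Build an in-memory file of n fixed-length records, padded with spaces."""
    data = b"".join(f"REC{i:07d}".encode().ljust(RECORD_LEN) for i in range(n))
    return io.BytesIO(data)


def sequential_search(f, target: bytes):
    """Read records one after another from the first until target is found."""
    f.seek(0)
    reads = 0
    while True:
        rec = f.read(RECORD_LEN)
        if not rec:
            return None, reads  # reached end of file without a match
        reads += 1
        if rec.rstrip() == target:
            return rec.rstrip(), reads


def direct_read(f, record_no: int) -> bytes:
    """Random access: the record starts at record_no * RECORD_LEN."""
    f.seek(record_no * RECORD_LEN)
    return f.read(RECORD_LEN).rstrip()


f = make_file(10_000)
found, reads = sequential_search(f, b"REC0009999")
print(found, reads)          # b'REC0009999' 10000
print(direct_read(f, 9999))  # b'REC0009999'
```

Finding the last record takes 10,000 reads sequentially but a single seek and read with direct access; this works only because every record has the same length.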

1.2 Navigational Model

Also called the Hierarchical Model, this was a major improvement. The details of navigation are hidden from the developer, so the database does the work of locating a record. Common examples were dBase and Btrieve files. Visual Basic supports the navigational model in two ways. The first is to open a file in random-access mode and manually maintain some sort of hashing algorithm to navigate from record to record. This method is often implemented in a way similar to what C programmers know as a linked list: a series of pointers into a structure or file that allows navigation by, say, Account Code. The second way is to connect to navigational databases through the Microsoft Jet database engine. Although the interface it provides is SQL, the underlying file is still accessed navigationally. Microsoft refers to this as the Indexed Sequential Access Method (ISAM).
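The linked-list approach described above can be sketched as follows. The sketch is in Python rather than Visual Basic, and the record layout, the in-memory file and the sample accounts are all hypothetical. It shows only the pointer chain, not a hashing scheme. Records are stored in the order they arrive, and each one carries the record number of the next record in Account Code order, so the program navigates by following pointers rather than by physical position.

```python
import io
import struct

# Hypothetical record layout: 6-byte account code, 20-byte name,
# 4-byte record number of the next record in account-code order (-1 = end).
REC = struct.Struct("<6s20si")


def write_records(accounts):
    """Store records in arrival order, linking them in account-code order."""
    f = io.BytesIO()
    order = sorted(range(len(accounts)), key=lambda i: accounts[i][0])
    next_ptr = {order[k]: order[k + 1] for k in range(len(order) - 1)}
    for i, (code, name) in enumerate(accounts):
        f.write(REC.pack(code.encode(), name.encode(), next_ptr.get(i, -1)))
    head = order[0] if order else -1  # record number of the first account
    return f, head


def walk(f, head):
    """Follow the 'next' pointers from the head record, like a linked list."""
    codes = []
    rec_no = head
    while rec_no != -1:
        f.seek(rec_no * REC.size)  # random access to the record
        code, _name, rec_no = REC.unpack(f.read(REC.size))
        codes.append(code.rstrip(b"\x00").decode())
    return codes


accounts = [("A300", "Stationery"), ("A100", "Petty Cash"), ("A200", "Travel")]
f, head = write_records(accounts)
print(walk(f, head))  # ['A100', 'A200', 'A300']
```

The records never move on disk. Only the pointers encode the order, so inserting a new account means writing it at the end of the file and updating one pointer.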

1.3 Relational Model

This model organizes data into related tables. A table is a two-dimensional grid of columns and rows. Each table has a primary key, which identifies each record uniquely. Tables are related through a common field. For example, an Account Names table and a Petty Cash Payments table can be related by a common field, Account Number. In the Petty Cash Payments table, Account Number is a foreign key: the key through which the relationship is maintained. Through the foreign key, the database enforces the business rule “No payment may exist in the Payments table without a valid Account Number”. This enforcement is called referential integrity. The database may also enforce other rules, such as a limit on the rupee amount of a petty cash payment. These enforced rules are called constraints.
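Both kinds of rule can be tried out with Python's built-in sqlite3 module. The table and column names follow the example above but are otherwise hypothetical, as is the 5,000-rupee limit. SQLite checks foreign keys only after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE AccountNames (
    AccountNumber INTEGER PRIMARY KEY,
    AccountName   TEXT NOT NULL
);
CREATE TABLE PettyCashPayments (
    PaymentNo     INTEGER PRIMARY KEY,
    -- foreign key: referential integrity against AccountNames
    AccountNumber INTEGER NOT NULL REFERENCES AccountNames(AccountNumber),
    -- hypothetical business rule: a payment is positive and at most 5,000 rupees
    Amount        REAL NOT NULL CHECK (Amount > 0 AND Amount <= 5000)
);
""")
conn.execute("INSERT INTO AccountNames VALUES (101, 'Stationery')")
conn.commit()


def try_insert(payment_no, account_no, amount):
    """Return 'ok', or the database's error message if a constraint rejects the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO PettyCashPayments VALUES (?, ?, ?)",
                         (payment_no, account_no, amount))
        return "ok"
    except sqlite3.IntegrityError as e:
        return str(e)


ok = try_insert(1, 101, 250.0)          # valid account, amount within limit
bad_account = try_insert(2, 999, 100.0)  # no such account: foreign key rejects it
too_large = try_insert(3, 101, 9000.0)   # over the limit: CHECK constraint rejects it
print(ok, bad_account, too_large, sep="\n")
```

The application never has to check these rules itself; the database refuses the bad rows, which is the shift of responsibility from application to database described earlier.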

Many companies provide Relational Database Management Systems (RDBMSs), among them Oracle, Informix, Sybase, IBM and Microsoft. All RDBMS engines use SQL as the language for accessing and manipulating data.

Client/Server Applications

In a client/server configuration, both the database engine and the data needed by the application reside on the network server. Instead of a copy of the database engine running on each workstation, the engine runs on only one machine: the server. The workstation simply sends a high-level request, in the form of an SQL statement, to the server, which retrieves the specific data and sends it back to the client. Client/server applications will be covered separately later.

Relational Databases and Relational Database Management Systems

Any database that organizes data in related tables and can enforce those relationships through referential integrity can be called a relational database. A Microsoft Access database (.MDB) falls into this category. Let us see how such a database works when it is stored on a server. Suppose you want to see the stock items that belong to a particular vehicle, and that the table called Master holds 10,000 items, only 3 of which belong to the specified vehicle. The SQL statement issued by the application would be something like this:

SELECT * FROM Master WHERE VehicleNo = '11/5904'

Because there is no engine on the server to process this request, the records are sent from the server to the querying application, which then reads through them itself to find the three items that belong to the specified vehicle. In the worst case (for example, when VehicleNo is not indexed), all 10,000 records cross the network. This inefficiency occurs because an Access database is just a file: the engine that processes the query runs on each workstation, not on the server.

Compare this with a true database engine, a Relational Database Management System (RDBMS) such as Oracle or Microsoft SQL Server. In the example above, the SQL statement is sent to the RDBMS, which processes the request on the server and sends back to the application only the three records it needs. Network traffic is greatly reduced.
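The difference in traffic can be simulated with Python's built-in sqlite3 module. SQLite is an embedded engine, so nothing actually crosses a network here. The sketch simply counts the rows handed back to the application in each approach. The Master table layout and its data are hypothetical, and the query uses a `?` placeholder rather than a literal value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Master (ItemNo INTEGER PRIMARY KEY, VehicleNo TEXT)")
# Hypothetical data: 10,000 stock items, exactly 3 of them for vehicle 11/5904.
rows = [(i, "11/5904" if i < 3 else f"V{i}") for i in range(10_000)]
conn.executemany("INSERT INTO Master VALUES (?, ?)", rows)


def file_server_style(vehicle_no):
    """Pull every row back to the application, then filter in application code."""
    transferred = conn.execute("SELECT ItemNo, VehicleNo FROM Master").fetchall()
    matches = [r for r in transferred if r[1] == vehicle_no]
    return matches, len(transferred)


def server_side_style(vehicle_no):
    """Let the engine apply the WHERE clause, so only matching rows come back."""
    transferred = conn.execute(
        "SELECT ItemNo, VehicleNo FROM Master WHERE VehicleNo = ?", (vehicle_no,)
    ).fetchall()
    return transferred, len(transferred)


m1, t1 = file_server_style("11/5904")
m2, t2 = server_side_style("11/5904")
print(len(m1), t1)  # 3 10000
print(len(m2), t2)  # 3 3
```

Both approaches find the same three items. The difference is where the filtering happens, and therefore how many rows have to travel to the application.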

However, you should not hastily buy the most powerful database engine on the market, even if the money is available. The most powerful engine may also be the most difficult to administer, or simply more than your organization actually needs. Considerations to take into account include:

- Network operating system
- Typical database usage
- Number of users and volume of data
- Database administration
- Database cost
- Vendor stability and reputation

The right choice therefore depends on the organization's needs and environment.

Microsoft Access

Although Microsoft Access is not a true RDBMS, it is a popular database product and offers many sophisticated features. Implementing a true back-end database system requires a higher level of expertise than a Microsoft Access implementation, so choosing an MDB is not necessarily wrong. If the amount of data is small, a file-server solution may be the better choice. Network traffic will, however, become a problem as the system grows; when it does, you can change over. Microsoft offers tools such as the Upsizing Wizard to migrate from an MDB file to a SQL Server database.

Database Normalization Lesson No. 2 (contd)

There is considerable overlap between entity-relationship (ER) modeling and the database normalization process, so you can design a database using either method. Each method has advantages and disadvantages. ER modeling is good for identifying database integrity constraints, but it does little to ensure that attributes are assigned to the proper entities. Normalization addresses exactly this: it verifies attribute assignment by applying standardized rules.

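Normalization is carried out by comparing the design against a series of normal forms. For most practical purposes, a database is considered fully normalized when it is in third normal form. Each normal form seeks to eliminate data redundancy and the anomalies it causes. An anomaly is an undesirable situation in which a natural operation cannot be performed cleanly. For example, if customer details are stored only in order records, you cannot record a new customer until that customer places an order.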
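To make the idea of an anomaly concrete, here is a small Python sketch using hypothetical order rows in which the customer's address is repeated on every order. Changing only one copy produces an update anomaly. Deleting a customer's only order also deletes the only record of that customer, which is a deletion anomaly.

```python
# Hypothetical unnormalized rows: the customer's address is repeated on every order.
orders = [
    {"order_no": 1, "customer_no": "C1", "address": "12 Galle Road"},
    {"order_no": 2, "customer_no": "C1", "address": "12 Galle Road"},
    {"order_no": 3, "customer_no": "C2", "address": "5 Temple Street"},
]

# Update anomaly: customer C1 moves, but only one of the repeated copies is changed.
orders[0]["address"] = "40 Kandy Road"
c1_addresses = sorted({o["address"] for o in orders if o["customer_no"] == "C1"})
print(c1_addresses)  # ['12 Galle Road', '40 Kandy Road']

# Deletion anomaly: deleting C2's only order also deletes everything known about C2.
orders = [o for o in orders if o["order_no"] != 3]
known_customers = {o["customer_no"] for o in orders}
print("C2" in known_customers)  # False
```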

First Normal Form: The rule is that there can be no repeating groups of data. For example, an order record must not contain a set of columns such as Item1, Item2 and Item3, or a single field holding a list of items. Such a repeating group violates the first normal form. To solve this, move the repeating group to a new table, with one row per item.
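Moving a repeating group into its own table can be sketched in Python. The order records and the Item1 to Item3 column names below are hypothetical. Each non-empty item column becomes one row in a separate order-items table, keyed by order number and line number.

```python
# Hypothetical order records with a repeating group (item1..item3): violates 1NF.
unnormalized = [
    {"order_no": 101, "item1": "Bolt", "item2": "Nut", "item3": None},
    {"order_no": 102, "item1": "Washer", "item2": None, "item3": None},
]


def to_first_normal_form(rows, group_cols):
    """Split out the repeating group: one row per (order, item) in a new table."""
    orders = [{"order_no": r["order_no"]} for r in rows]
    order_items = [
        {"order_no": r["order_no"], "line_no": n, "item": r[col]}
        for r in rows
        for n, col in enumerate(group_cols, start=1)
        if r[col] is not None  # empty slots in the group simply disappear
    ]
    return orders, order_items


orders, order_items = to_first_normal_form(unnormalized, ["item1", "item2", "item3"])
for row in order_items:
    print(row)
```

After the split, an order can have any number of items, and no empty Item3 column needs to be stored or searched.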

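Second Normal Form: The rule is that there can be no partial-key dependencies. A partial-key dependency exists when a column depends on only part of a composite primary key. For example, in an order-details table keyed by Order Number and Item Code, the item description depends on Item Code alone, so it belongs in a separate Items table.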
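A partial-key dependency can be spotted mechanically. The Python sketch below uses a hypothetical order-details table whose primary key is (order_no, item_code). It reports any non-key column whose value is always the same for the same value of just one part of the key. Sample data can only rule a dependency out; a real dependency comes from the business rules. With too few rows, a column can appear dependent purely by coincidence.

```python
from itertools import combinations


def determines(rows, lhs, rhs):
    """True if, in these rows, equal values of the lhs columns always give the same rhs value."""
    seen = {}
    for r in rows:
        key = tuple(r[c] for c in lhs)
        if seen.setdefault(key, r[rhs]) != r[rhs]:
            return False
    return True


def partial_dependencies(rows, key, columns):
    """Find non-key columns that are determined by a proper subset of a composite key."""
    found = []
    for col in columns:
        for size in range(1, len(key)):
            for part in combinations(key, size):
                if determines(rows, part, col):
                    found.append((col, part))
    return found


# Hypothetical order-details rows; primary key is (order_no, item_code).
order_details = [
    {"order_no": 101, "item_code": "B1", "qty": 10, "item_desc": "Bolt"},
    {"order_no": 101, "item_code": "N1", "qty": 20, "item_desc": "Nut"},
    {"order_no": 102, "item_code": "B1", "qty": 5, "item_desc": "Bolt"},
]

print(partial_dependencies(order_details, ("order_no", "item_code"), ["qty", "item_desc"]))
# [('item_desc', ('item_code',))]
```

The quantity needs the whole key, but the description depends on item_code alone, so in second normal form it would move to an Items table.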

Third Normal Form: The rule is that there can be no non-key dependencies. In other words, every non-key column must depend on the primary key and not on another non-key column. In our example, the customer name and address depend on Customer Number, not on Order Number, so they belong in a separate Customers table.
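The third-normal-form split for the customer example can be sketched in Python with hypothetical rows. Customer name and address depend on customer_no rather than on order_no, so they move to a Customers table keyed by customer_no, and each order keeps only the customer number. The sketch assumes the flat data is already consistent; if two rows disagreed about a customer, the later row would silently win.

```python
# Hypothetical flat order rows: customer name and address depend on customer_no.
orders_flat = [
    {"order_no": 1, "customer_no": "C1", "cust_name": "Silva Traders", "cust_address": "12 Galle Road"},
    {"order_no": 2, "customer_no": "C1", "cust_name": "Silva Traders", "cust_address": "12 Galle Road"},
    {"order_no": 3, "customer_no": "C2", "cust_name": "Perera & Co", "cust_address": "5 Temple Street"},
]


def to_third_normal_form(rows):
    """Move columns that depend on customer_no (not on order_no) into a Customers table."""
    orders = [{"order_no": r["order_no"], "customer_no": r["customer_no"]} for r in rows]
    customers = {}
    for r in rows:  # assumes each customer's details are the same on every row
        customers[r["customer_no"]] = {"name": r["cust_name"], "address": r["cust_address"]}
    return orders, customers


orders, customers = to_third_normal_form(orders_flat)

# The customer moves: one update in one place, seen by every order.
customers["C1"]["address"] = "40 Kandy Road"
report = [(o["order_no"], customers[o["customer_no"]]["address"]) for o in orders]
print(report)  # [(1, '40 Kandy Road'), (2, '40 Kandy Road'), (3, '5 Temple Street')]
```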

Normally you can stop at third normal form and consider the database fully normalized. There are further normal forms, notably the fourth and fifth, but they deal with rarer situations such as multi-valued dependencies and join dependencies, and are not needed at this stage. Conversely, a database is sometimes deliberately denormalized to overcome problems such as too many tables to join or performance bottlenecks. A critical examination of the design is therefore necessary after normalization.

Concluding Notes: Database design is partly science, partly art and partly experience. Much design work is simply common sense; even so, it is better to go through the steps of the design process to confirm your design.

If you find that you frequently need to join four or five tables, especially in an online environment, consider denormalizing your database.
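One common form of denormalization is to store the result of a frequently used multi-table join as its own table. Here is a minimal sketch with Python's sqlite3 module, assuming a hypothetical four-table order schema. The normalized query repeats a four-table join on every read, while the denormalized OrderSummary table returns the same rows from a single table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers  (CustomerNo TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE Orders     (OrderNo INTEGER PRIMARY KEY, CustomerNo TEXT REFERENCES Customers);
CREATE TABLE Items      (ItemCode TEXT PRIMARY KEY, Description TEXT, UnitPrice REAL);
CREATE TABLE OrderLines (OrderNo INTEGER REFERENCES Orders, ItemCode TEXT REFERENCES Items,
                         Qty INTEGER, PRIMARY KEY (OrderNo, ItemCode));
INSERT INTO Customers VALUES ('C1', 'Silva Traders');
INSERT INTO Orders VALUES (101, 'C1');
INSERT INTO Items VALUES ('B1', 'Bolt', 2.5), ('N1', 'Nut', 1.0);
INSERT INTO OrderLines VALUES (101, 'B1', 10), (101, 'N1', 20);
""")

# Normalized: every online lookup repeats this four-table join.
JOIN_SQL = """
SELECT o.OrderNo AS OrderNo, c.Name AS CustomerName, i.Description AS Description,
       l.Qty AS Qty, l.Qty * i.UnitPrice AS LineValue
FROM Orders o
JOIN Customers c  ON c.CustomerNo = o.CustomerNo
JOIN OrderLines l ON l.OrderNo = o.OrderNo
JOIN Items i      ON i.ItemCode = l.ItemCode
"""
normalized = conn.execute(JOIN_SQL + " ORDER BY 1, 3").fetchall()

# Denormalized: store the joined result once; reads then touch a single table.
conn.execute("CREATE TABLE OrderSummary AS " + JOIN_SQL)
denormalized = conn.execute("SELECT * FROM OrderSummary ORDER BY 1, 3").fetchall()

print(normalized == denormalized)  # True
```

The cost is that OrderSummary is a redundant copy. It has to be rebuilt or updated (for example by triggers) whenever the source tables change, which is exactly the kind of redundancy normalization removes. This is why the trade-off calls for a critical examination rather than a default choice.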

S Sivaloganathan, Management and IT Consultant