Of Modern Databases, Especially the Relational One

overview in general

November 2012

by

Sigrún Gunnarsdóttir

Database Administrator

ÍSOR - Iceland GeoSurvey

Table of contents

What is a database? - Spreadsheets or ....?

Types of databases

Storing and securing a database and accessing information using one

Database tables and how to manage them

Database design

Database normalization

What is DBMS?

The choice of DBMS - Oracle, DB2, Informix, PostgresPlus, MySQL, MS-SQL Server ...???

Going over to a relational database

Role of DBA

Some tools for input and output of data

References

2012_Of_modern_databases_and_the_relational_one.docx Sigrún Gunnarsdóttir Updated: 16.th November 2012 Page 2 of 15

What is a database? - Spreadsheets or ....?

A quick web search (for example “what is a database” or “define:database”) turns up plenty of definitions, from the older and simpler ones:
• An organized body of related information
• A database defines a structure for storing information
to more recent definitions that include computer systems:
• A database is a structured collection of records or data that is stored in a computer system (in the system files of a DBMS, a Database Management System, or just in raw Unix text files)

Modern databases are designed to offer an organized mechanism for storing, managing and retrieving information. They do so through the use of tables. If you’re already familiar with spreadsheets like Microsoft Excel, you’re probably accustomed to storing data in tabular form. Just like Excel tables, database tables consist of columns and rows.

If a database is so much like a spreadsheet, why can’t I just use a spreadsheet? Databases are actually much more powerful than spreadsheets. Here are just a few of the actions that you can perform on a database that would be difficult if not impossible to perform on a spreadsheet:
• Retrieve all records that match certain criteria
• Update records in bulk
• Cross-reference records in different tables
• Perform complex aggregate calculations
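The four actions listed above can be sketched in a few lines. This is only an illustration: it assumes Python's bundled sqlite3 module as a stand-in for a DBMS, and the customer and purchase tables and their contents are hypothetical.

```python
import sqlite3

# Hypothetical example data; SQLite (bundled with Python) stands in for a DBMS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE purchase (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Anna', 'IS'), (2, 'Bjorn', 'IS'), (3, 'Carla', 'NO');
    INSERT INTO purchase VALUES (1, 1, 10.0), (2, 1, 15.0), (3, 3, 40.0);
""")

# 1. Retrieve all records that match certain criteria.
icelandic = conn.execute(
    "SELECT name FROM customer WHERE country = ? ORDER BY name", ("IS",)
).fetchall()

# 2. Update records in bulk: one statement changes every matching row.
updated = conn.execute(
    "UPDATE customer SET country = 'Iceland' WHERE country = 'IS'"
).rowcount

# 3. Cross-reference records in different tables (join), and
# 4. perform an aggregate calculation (total amount per customer).
totals = conn.execute("""
    SELECT c.name, SUM(p.amount)
    FROM customer AS c JOIN purchase AS p ON p.customer_id = c.id
    GROUP BY c.name ORDER BY c.name
""").fetchall()

print(icelandic)
print(updated)
print(totals)
```

Each step is a single declarative statement; in a spreadsheet the same work would typically need manual filtering, copy and paste, or lookup formulas.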

"Database" used to be an extremely technical term used only by large institutions and companies. With the rise of computer systems and information technology throughout our culture, however, it has become something of a household term. For a database to be truly functional, it must:
• store large amounts of records well
• be easy to access
• make it fairly easy to enter new information and changes

Besides these features, all databases that are created should be built with high data integrity and the ability to recover data if hardware fails.

Types of databases

There are several common types of databases, and each type has its own data model (how the data is structured). They include:
• Flat Model
• Hierarchical Model
• Relational Model
• Network Model

The Flat Model and the Relational Model are discussed further in the section Going over to a relational database.

The Hierarchical Model organizes data into a tree-like structure in which data tables are represented using parent/child relationships: each parent can have many children, but each child has just one parent.

The Network Model is a more flexible way of representing data tables and is not restricted to being a hierarchy or lattice.

The Relational Model is the most popular type of database and an extremely powerful tool, not only for storing information but for accessing it as well. Relational databases are organized as tables. A table can have many records and each record can have many fields. A table is sometimes called a relation. Every record (group of fields) in a relational database table has its own primary key: one unique field, or a few fields combined, that makes it easy to identify the record. Relational databases use a program interface called SQL, or Structured Query Language. SQL is currently used on practically all relational databases.
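As a rough illustration of primary keys, the sketch below declares one table with a single-field key and one with a key made of two fields combined, then shows that duplicate keys are rejected. It assumes Python's bundled sqlite3 module as a stand-in DBMS; the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Primary key consisting of one unique field.
    CREATE TABLE location (location_id INTEGER PRIMARY KEY, name TEXT);
    -- Primary key consisting of two fields combined.
    CREATE TABLE measurement (
        location_id INTEGER,
        depth       INTEGER,
        value       REAL,
        PRIMARY KEY (location_id, depth)
    );
    INSERT INTO location VALUES (102, 'Hillside lake 1');
    INSERT INTO measurement VALUES (102, 50, 0.8), (102, 100, 5.4);
""")


def try_insert(sql):
    """Return True if the row was accepted, False if it violated a key."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_single = try_insert("INSERT INTO location VALUES (102, 'Another name')")
duplicate_combined = try_insert("INSERT INTO measurement VALUES (102, 50, 9.9)")
new_combination = try_insert("INSERT INTO measurement VALUES (102, 200, 14.1)")
print(duplicate_single, duplicate_combined, new_combination)
```

The third insert succeeds because only the combination of location_id and depth has to be unique, not each field on its own.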

Storing and securing a database and accessing information using one

While storing data is a great feature of databases, for many database users the most important feature is quick and simple retrieval of information. Queries are requests to pull specific types of information. No matter what kind of information you store in your database, queries can be created using SQL to help answer both simple and important questions.

Databases can be very small (less than 1 MB) or extremely large and complicated (terabytes, as in many government databases). All databases are usually stored on hard disks or other types of storage devices and are accessed via a computer. Large databases may require separate servers and locations, while many small databases fit easily as files on your computer's hard drive.

Obviously, many databases store confidential and important information that should not be easily accessible to just anyone. Many databases therefore require passwords and other security features in order to access the information. While some databases can be accessed over a network such as the internet, other databases are closed systems and can only be accessed on site.

Database tables and how to manage them

• Tables are the basic unit of storage in a database. Tables are usually defined by a name, a set of columns, and other optional parameters. For each column, a column name, a data type and a width, precision, or scale must be specified.
• Just like Excel tables, database tables consist of columns and rows. Each column contains a different type of attribute and each row corresponds to a single record.
• Using SQL statements, data can be added to and removed from the table by rows. Each row represents one data record. Data can also be modified through SQL by removing columns or changing a column value for a record.
• Various software tools can be used to interact with the database in order to view, insert or update data, for example SQLPLUS, MS Access, dbVisualizer, Navicat and many, many more. More about that later on.
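The points above can be tried out with a short sketch. It assumes Python's bundled sqlite3 module in place of a full DBMS, and the well table with its columns is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A table is defined by a name and a set of columns, each with a name and a data type.
conn.execute("""
    CREATE TABLE well (
        well_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        depth_m REAL
    )
""")
# PRAGMA table_info is SQLite-specific; the second field of each row is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(well)")]

# Add rows: each row is one data record.
conn.executemany(
    "INSERT INTO well (well_id, name, depth_m) VALUES (?, ?, ?)",
    [(1, "W-1", 1200.0), (2, "W-2", 850.5), (3, "W-3", None)],
)
# Change one column value for one record.
conn.execute("UPDATE well SET depth_m = ? WHERE well_id = ?", (430.0, 3))
# Remove one row.
conn.execute("DELETE FROM well WHERE well_id = ?", (2,))

remaining = conn.execute(
    "SELECT well_id, name, depth_m FROM well ORDER BY well_id"
).fetchall()
print(columns)
print(remaining)
```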

Database design

Before data is entered into database tables, there are some stages of design. A good design of the database structure at the beginning can prevent complexity at later stages. The main design of a database can take place even before a database management system has been chosen.

The main building blocks of a database are so-called entities, attributes and relationships between the entities. An entity may be defined as a thing which is capable of existing independently and which can be uniquely identified. In other words, an entity is anything that might deserve its own table in the database concerned. An entity must have a minimal set of uniquely identifying attributes, which is called the entity's primary key. The relationships between entities can be optional or compulsory.

When designing a database you should start by identifying its entities. This is mostly a matter of talking to the people who will use the database and figuring out what data the system will be working with. An internet shop, as an example, sells products which are ordered by customers. This one sentence already mentions three obvious entities of a database for this shop: product, order and customer. Deciding what is to be an entity, an attribute or a relationship in a database isn't always straightforward, though. Sometimes in complex data modelling we can come up with many entities that won't translate directly to tables. A lot of information regarding the design of databases can be found on the internet.

In software engineering, entity-relationship modeling is a method of database modeling used to produce a data model of a system, often a relational database, and its requirements in a top-down fashion. Diagrams created by this process are called entity-relationship diagrams, ER diagrams, or ERDs.
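One possible way to turn the three shop entities into tables is sketched here, assuming Python's bundled sqlite3 module as the DBMS. The column names and sample rows are hypothetical; the NOT NULL foreign keys model compulsory relationships (an order must belong to an existing customer and product).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- "order" is a reserved word in SQL, hence the table name customer_order.
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        product_id  INTEGER NOT NULL REFERENCES product(product_id),
        quantity    INTEGER NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Anna');
    INSERT INTO product  VALUES (10, 'Thermometer');
    INSERT INTO customer_order VALUES (100, 1, 10, 2);
""")

# An order that points at a customer who does not exist is rejected.
try:
    conn.execute("INSERT INTO customer_order VALUES (101, 99, 10, 1)")
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False

# The relationships let one query combine all three entities.
rows = conn.execute("""
    SELECT c.name, p.title, o.quantity
    FROM customer_order AS o
    JOIN customer AS c ON c.customer_id = o.customer_id
    JOIN product  AS p ON p.product_id  = o.product_id
""").fetchall()
print(orphan_accepted, rows)
```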

Database normalization

In designing relational databases, normalization is used: a data-structuring model which ensures data consistency and avoids data duplication, which means making sure not to repeat data within or between tables. A relational database design should in fact force the database users to follow rules for data entry. Values derived from data kept in a relational database should not themselves be stored in table records, because there is no reason to do so. Derived values like sums, averages, etc. can easily be computed from the stored values whenever they are needed, using the built-in functions of the RDBMS, like sum, avg, max, min, sin, tan, round, power, and more. Repeating data between tables and keeping unnecessary data in a relational database is against the design rules, takes up extra space on hard disks and drives, and, last but not least, multiplies the possibilities of human error when handling the data.
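A minimal sketch of computing derived values on demand instead of storing them, assuming Python's bundled sqlite3 module as the RDBMS; the measurement table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE measurement (location_id INTEGER, depth INTEGER, value REAL);
    INSERT INTO measurement VALUES
        (102, 50, 0.8), (102, 100, 5.4), (102, 200, 14.1), (102, 300, 22.8);
""")

# Only the raw values are stored; count, sum, average, minimum and maximum
# are computed by built-in functions at query time.
count, total, average, lowest, highest = conn.execute("""
    SELECT COUNT(*), ROUND(SUM(value), 1), ROUND(AVG(value), 3),
           MIN(value), MAX(value)
    FROM measurement
    WHERE location_id = 102
""").fetchone()
print(count, total, average, lowest, highest)
```

Because nothing derived is stored, the results stay correct automatically whenever a measurement is added, changed or deleted.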

The website http://databases.about.com/od/specificproducts/a/normalization.htm says: Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.

The website http://support.microsoft.com/kb/283878 says this about normalization rules: There are a few rules for database normalization. Each rule is called a "normal form." If the first rule is observed, the database is said to be in "first normal form." If the first three rules are observed, the database is considered to be in "third normal form." Although other levels of normalization are possible, third normal form is considered the highest level necessary for most applications. As with many formal rules and specifications, real world scenarios do not always allow for perfect compliance. In general, normalization requires additional tables and some customers find this cumbersome. If you decide to violate one of the first three rules of normalization, make sure that your application anticipates any problems that could occur, such as redundant data and inconsistent dependencies.

First Normal Form
• Eliminate repeating groups in individual tables.
• Create a separate table for each set of related data.
• Identify each set of related data with a primary key.

Second Normal Form
• Create separate tables for sets of values that apply to multiple records.
• Relate these tables with a foreign key.
Records should not depend on anything other than a table's primary key (a compound key, if necessary).

Third Normal Form
• Eliminate fields that do not depend on the key.
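To make the first normal form rules a little more concrete, here is a small sketch in plain Python. The unnormalized records and their field names (temp_1, temp_2, temp_3) are hypothetical; the point is only that a repeating group is moved into its own table and tied back with a key.

```python
# Hypothetical unnormalized records: the temperature readings form a repeating group.
flat_rows = [
    {"location_id": 102, "name": "Hillside lake 1",
     "temp_1": 0.8, "temp_2": 5.4, "temp_3": 14.1},
    {"location_id": 103, "name": "Hillside lake 2",
     "temp_1": 1.1, "temp_2": None, "temp_3": None},
]

# One table per set of related data, each identified by a key:
# locations are keyed by location_id ...
locations = [(row["location_id"], row["name"]) for row in flat_rows]

# ... and readings are keyed by (location_id, reading number),
# where location_id acts as a foreign key back to the locations table.
readings = [
    (row["location_id"], n, row[f"temp_{n}"])
    for row in flat_rows
    for n in (1, 2, 3)
    if row[f"temp_{n}"] is not None
]

print(locations)
print(readings)
```

After the split, a location can have any number of readings without adding columns, and no empty placeholder fields are needed.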

What is DBMS?

In order to have a highly efficient database system, you need a program that manages the queries and the information stored on the system. This is usually referred to as a DBMS, or Database Management System.

DEFINITION - A database management system (DBMS), sometimes just called a database manager, is a program that lets one or more computer users create and access data in a database. A DBMS can be thought of as a file manager that manages data in databases rather than files in file systems.

As one of the oldest components associated with computers, the DBMS is a computer software program designed as the means of managing all databases that are currently installed on a system hard drive or network. Different types of database management systems exist, and the DBMS is marketed in many forms. Some of the more popular examples of DBMS solutions include Microsoft Access, FileMaker, DB2 and Oracle.

The most typical DBMS is a relational database management system (RDBMS). Relational databases are managed through an RDBMS, and almost all database systems in use today are RDBMSs, like Oracle, DB2, Microsoft SQL Server, MySQL, Sybase, etc.

All these products provide for the creation of a series of rights or privileges that can be associated with a specific user. This means that it is possible to designate one or more database administrators who control each function, and to give other users various levels of administration rights. This flexibility means that overseeing a system with a DBMS can be centrally controlled or allocated to several different people.

The DBMS manages user requests (and requests from other programs) so that users and other programs do not have to understand where the data is physically located and, in a multi-user system, who else may also be accessing the data. A standard user and program interface is the Structured Query Language, SQL. In handling user requests, the DBMS ensures both the integrity and the security of the data. Integrity means making sure that data continues to be accessible and is consistently organized as intended. Security means making sure that only those with access privileges can access the data.

The choice of DBMS - Oracle, DB2, Informix, PostgresPlus, MySQL, MS-SQL Server ...???

How should an institution or company choose a DBMS? The Explosion in DBMS Choice: Database options in a cost-conscious world is a Monash Information Services Bulletin by Curt A. Monash, PhD. It is well worth reading this bulletin from August 2008, in which Monash writes about:
• The end of a database era?
• Differentiation among relational DBMS
• Sorting out the details
• How to move forward
and more ...

Monash says choices have exploded: No matter what your OLTP application design or (almost) transaction volume, you can probably get the job done with any of Oracle, DB2, or Informix. Postgres Plus isn't far behind. If you're only using standard alphanumeric datatypes, the list expands further, to include products from Microsoft, Sybase, Progress, and perhaps MySQL as well.

Oracle versus DB2

A lot of information can be found on the Internet by “googling” this versus that. Here, for example, is information worth reading when choosing between these “two best”:

Technical Comparison of Oracle Database 10g vs. IBM DB2 v8.2: Focus on High Availability http://www.oracle.com/technology/deploy/availability/pdf/CWP_HA_Oracle10gR2_DB28.2.pdf

And here is a comparison of Oracle 9i versus DB2 v8.1: Oracle 9i Database vs DB2 v8.1 http://www.mssqlcity.com/Articles/Compare/oracle_vs_db2.htm

Going over to a relational database

The difference between an old-fashioned "flat file" database and a relational database is well described at http://wiki.answers.com/Q/Difference_between_Flat_File_Database_and_Relational_Database:

A "flat file" database allows the user to specify data attributes (columns, datatypes, etc) for one table at a time, storing those attributes independently of an application. dBase III and Paradox were good examples of this kind of database in the CP/M and MS-DOS environments, and the original FileMaker was a good Mac O/S example. A relational database takes this "flat file" approach several logical steps further, allowing the user to specify information about multiple tables and the relationships between those tables, and often allowing much more declarative control over what rules the data in those tables must obey (constraints).

Credited to: http://expertanswercenter.techtarget.com/eac/knowledgebaseAnswer/0,295199,sid63_gci976564,00.html

Before going over to a relational database, people may have been using "flat files" like text files or Excel files for storing and handling data. Even though databases made of text files or Excel files can be very useful, they have limitations. In a "flat file" database, one flat file may hold the main information about locations, like names, coordinates, description, etc., like:

+-----------------+--------+---------+----+---------------------------------------+
| Name            | X      | Y       | Z  | Description                           |
+-----------------+--------+---------+----+---------------------------------------+
| Hillside lake 1 | 595300 | 1322600 | 12 | Nearby the cross roads.               |
| Hillside lake 2 | 594700 | 1321000 | 45 | Situated nearby the little house.     |
| South park hill | 598900 | 1348000 | 6  | The southernmost place in the valley. |
+-----------------+--------+---------+----+---------------------------------------+

Some other flat file or files in this "flat file" database may hold measurement values, like temperature or pressure measurements at the various locations, and those files also contain some information about the locations concerned, in one or more columns. Thus some of the location information, the name and possibly more, is repeated in every record in all files holding measurement values for that particular location, like:

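+-----------------+--------+-------+-------+
| LocationName    | TaskID | Value | Depth |
+-----------------+--------+-------+-------+
| Hillside lake 1 | 117    | 0.8   | 50    |
| Hillside lake 1 | 117    | 5.4   | 100   |
| Hillside lake 1 | 117    | 14.1  | 200   |
| Hillside lake 1 | 117    | 22.8  | 300   |
| Hillside lake 1 | ...    | ...   | ...   |
+-----------------+--------+-------+-------+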

If, in a flat file system like this, the main location information (the place, its name, etc.) is kept both in a dedicated table for locations and partly in columns of other tables, as in the table example above, it has to be maintained (inserted, updated or deleted) in every table where it is stored. If "Hillside lake 1" is to be renamed, an update will be necessary on the name column (Name or LocationName) in both of these example tables, and surely in many more tables, possibly in many hundreds or thousands of records.
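The renaming problem can be demonstrated with a small sketch in plain Python using the standard csv module. The two flat files are held as strings here, the helper rename_in_file and the new name "Hillside lake North" are hypothetical, and a real system would have many more files to visit.

```python
import csv
import io

# Two hypothetical flat files, both repeating the location name as plain text.
locations_csv = (
    "Name,X,Y,Z\n"
    "Hillside lake 1,595300,1322600,12\n"
    "Hillside lake 2,594700,1321000,45\n"
)
measurements_csv = (
    "LocationName,TaskID,Value,Depth\n"
    "Hillside lake 1,117,0.8,50\n"
    "Hillside lake 1,117,5.4,100\n"
    "Hillside lake 1,117,14.1,200\n"
    "Hillside lake 1,117,22.8,300\n"
)


def rename_in_file(text, column, old, new):
    """Rewrite one flat file; return the new text and the number of rows touched."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    touched = 0
    for row in rows:
        if row[column] == old:
            row[column] = new
            touched += 1
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue(), touched


# The rename has to be repeated for every file that stores the name.
locations_csv, n1 = rename_in_file(
    locations_csv, "Name", "Hillside lake 1", "Hillside lake North")
measurements_csv, n2 = rename_in_file(
    measurements_csv, "LocationName", "Hillside lake 1", "Hillside lake North")
rows_touched = n1 + n2
print(rows_touched)
```

Forgetting a single file, or a single row within one, leaves the data inconsistent, which is exactly the risk described above.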

The flat file or Excel file structure is not scalable and is not the best choice of data storage, since it neither allows linking information from one file to another nor lets many users access the same file at the same time. Relational databases, by contrast, can be accessed by many users at a time and are scalable, which means they can grow larger or smaller as needed.

Using flat text files or Excel files may have been useful, pleasant and fairly easy, but people must not let those files dictate their new relational database structure, and should not design their new database tables just so that the content of older files can be pasted straight into them. Using a powerful database tool as if it were a kind of super Excel does not exploit its possibilities.

A relational database can hold the location information in one table while holding the various measurement values for the locations in separate tables, all of which can be linked together by constraints on key columns (often called ID columns) to look at a bigger picture. The previous table examples would look something like this in a relational database:

+------------+-----------------+--------+---------+----+---------------------------------------+
| LocationID | Name            | X      | Y       | Z  | Description                           |
+------------+-----------------+--------+---------+----+---------------------------------------+
| 102        | Hillside lake 1 | 595300 | 1322600 | 12 | Nearby the cross roads.               |
| 103        | Hillside lake 2 | 594700 | 1321000 | 45 | Situated nearby the little house.     |
| 104        | South park hill | 598900 | 1348000 | 6  | The southernmost place in the valley. |
+------------+-----------------+--------+---------+----+---------------------------------------+

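+------------+--------+-------+-------+
| LocationID | TaskID | Value | Depth |
+------------+--------+-------+-------+
| 102        | 117    | 0.8   | 50    |
| 102        | 117    | 5.4   | 100   |
| 102        | 117    | 14.1  | 200   |
| 102        | 117    | 22.8  | 300   |
+------------+--------+-------+-------+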

If the location "Hillside lake 1" is to be renamed here in a relational database, an update will only be necessary on the Name column in just one record in the location table, and there is no need to update the tables keeping measurements etc. for that location. The LocationID in the latter table or tables is just the same as before and can be used to point to the newly updated name in the location table.

As mentioned previously, a relational database is essentially a group of tables which are made up of columns and rows. The tables have constraints, and relationships are defined between the tables. Relational databases are queried using SQL (Structured Query Language), and result sets can be retrieved from one or more tables at the same time by a single query or by combinations of queries. When multiple tables are accessed, the tables are joined together, usually by a criterion defined on the columns that relate the tables.

Relational databases have been around for over three decades. There are good reasons for their dominance: they have been offering the best mixture of simplicity and flexibility, performance and compatibility in managing data. The advanced data structuring capability of the relational database allows database builders and programmers to create complex relationships between the various data.
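The single-row rename and the join described above can be sketched as follows, assuming Python's bundled sqlite3 module as the relational database. The lower-case table and column names and the new name "Hillside lake North" are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE location (
        location_id INTEGER PRIMARY KEY,
        name TEXT, x INTEGER, y INTEGER, z INTEGER
    );
    CREATE TABLE measurement (
        location_id INTEGER REFERENCES location(location_id),
        task_id INTEGER, value REAL, depth INTEGER
    );
    INSERT INTO location VALUES
        (102, 'Hillside lake 1', 595300, 1322600, 12),
        (103, 'Hillside lake 2', 594700, 1321000, 45);
    INSERT INTO measurement VALUES
        (102, 117, 0.8, 50), (102, 117, 5.4, 100),
        (102, 117, 14.1, 200), (102, 117, 22.8, 300);
""")

# Renaming the location touches exactly one record in one table.
changed = conn.execute(
    "UPDATE location SET name = ? WHERE location_id = ?",
    ("Hillside lake North", 102),
).rowcount

# The measurements still carry location_id 102; a join picks up the new name.
rows = conn.execute("""
    SELECT l.name, m.depth, m.value
    FROM measurement AS m
    JOIN location AS l ON l.location_id = m.location_id
    ORDER BY m.depth
""").fetchall()
print(changed)
print(rows)
```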

Role of DBA

The role of a DBA (database administrator) varies between companies and institutions according to their function, size, etc. The DBA role naturally divides into three major activities:
• ongoing maintenance of production databases (operations DBA)
• planning, design, and development of new database applications, or major changes to existing applications (development DBA, or architect)
• management of an organisation's data and metadata (data administrator)
One person may perform all three roles, but each is profoundly different.

The role of DBA at ÍSOR as an example

At ÍSOR, the system managers and the Oracle agency in Iceland set up the database machine and the system, as well as the Oracle instances needed. The system managers take the main backups of both the database machine and the system.

The database administrator (DBA) designs entity diagrams (illustrations of the logical structure of databases), creates the schemas, users and roles, grants privileges to roles and users, and creates database tables, keys, indexes, triggers and so on, working together with the users in charge of each schema (borehole schema, chemical schema, etc.). The DBA loads data into tables, both on a large scale and in smaller quantities, takes schema backups, and moves data between machines, databases, schemas and tables when needed. The DBA routinely does some "housekeeping" jobs in the database, like cleaning up unused space by reorganising tables and indexes, and makes checklists of daily, weekly and monthly maintenance tasks to prevent or detect problems and resolve them. The DBA also writes SQL queries for the users and assists them in doing so, creates database tables for them and helps them with their inputs and outputs.

Some tools for input and output of data

There are a number of ways to interact with a database, and numerous methods and tools can be used for that purpose. To illustrate, just a few are mentioned here:
• dbVisualizer
• Navicat
• MS-Access, MS-Query, MS-Excel ...
• MS SQL Server Management Studio
• SQLPLUS ...
• SQL - Structured Query Language, a standard user and program interface

MS Access and MS Excel can, besides being used to interact with databases, also be used to hold smaller databases themselves. They can be used to view table columns and data in tabular forms and to search in the forms by entering criteria, to add or delete records in a convenient way, to view how tables are referenced, to get some built-in graphical data forms, to make queries to interact with the database through a so-called SQL command window, etc. When such tools are used to connect to and query databases, a special interface (a “bridge”) between the two is needed: ODBC, Open Database Connectivity.

dbVisualizer

This tool from MINQ Software can be used on all major operating systems and can access a wide range of databases, including Oracle, DB2 for LUW, Mimer SQL, Microsoft SQL Server, Sybase ASE, Informix and open source alternatives such as MySQL, PostgreSQL and JavaDB/Derby.

It has a built-in graphical tool which makes it very easy to produce diagrams from selected output. Output can easily be exported to files of type CSV, HTML, SQL, XML or XLS, and it is also easy to import data into tables from files.

SQLPLUS

SQL*Plus is a command line SQL and PL/SQL language interface and reporting tool that ships with the Oracle Database Client and Server software. It can be used interactively or driven from scripts. SQL*Plus is frequently used by DBAs and developers to interact with the Oracle database. If you are familiar with other databases, sqlplus is equivalent to:
• "sql" in Ingres
• "isql" in Sybase and SQL Server
• "db2" in IBM DB2
• "psql" in PostgreSQL
• "mysql" in MySQL

SQL - A standardized Structured Query Language

Structured Query Language (SQL) is a specialized language for updating, deleting, and requesting information from databases. SQL is an ANSI and ISO standard, and is the de facto standard database query language for relational database systems. SQL was designed for storing, manipulating and retrieving data stored in relational databases. All relational database management systems, like MySQL, MS Access, Oracle, Sybase, Informix, postgres and SQL Server, use SQL as their standard database language. A variety of established database products support SQL, including products from Oracle and Microsoft SQL Server. It is widely used in both industry and academia, often for enormous, complex databases.

In a distributed database system, a program often referred to as the database's "back end" runs constantly on a server, interpreting data files on the server as a standard relational database. Programs on client computers allow users to manipulate that data, using tables, columns, rows, and fields. To do this, client programs send SQL statements to the server. The server then processes these statements and returns replies to the client program.

For example, to retrieve from a table called Customers all records (designated by the asterisk) with a value of Smith for the column Last_Name, a client program would prepare and send this SQL statement to the server back end:

SELECT * FROM Customers WHERE Last_Name='Smith';

The server back end may then reply with data such as this:

+---------+-----------+------------+
| Cust_No | Last_Name | First_Name |
+---------+-----------+------------+
| 1001    | Smith     | John       |
| 2039    | Smith     | David      |
| 2098    | Smith     | Matthew    |
+---------+-----------+------------+
3 rows in set (0.05 sec)
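The request-and-reply exchange described above can be imitated in a few lines. This is only a sketch: it assumes Python's bundled sqlite3 module, which runs inside the client process rather than as a separate server back end, and the extra customer 'Jones' is a hypothetical row added so that the filter has something to exclude.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (
        Cust_No    INTEGER PRIMARY KEY,
        Last_Name  TEXT,
        First_Name TEXT
    );
    INSERT INTO Customers VALUES
        (1001, 'Smith', 'John'),
        (1500, 'Jones', 'Mary'),
        (2039, 'Smith', 'David'),
        (2098, 'Smith', 'Matthew');
""")

# The client prepares the statement; the ? placeholder carries the value 'Smith'.
rows = conn.execute(
    "SELECT * FROM Customers WHERE Last_Name = ? ORDER BY Cust_No", ("Smith",)
).fetchall()

# The reply arrives as a list of rows, one tuple per record.
for row in rows:
    print(row)
```

Passing the value through a placeholder rather than pasting it into the SQL text is a common precaution against malformed or malicious input.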

SQL was designed for database users, and people do not have to be computer scientists or trained programmers to use it. Starting to learn and use SQL is remarkably easy: you just need to read a little and then go ahead and start practising by making queries to select data from your database. Lots of free learning material can be found on the Internet, see e.g. http://www.gradiance.com/STwelcome.html. The w3schools are also worth mentioning; their SQL tutorial can teach people how to use SQL to access and manipulate data in MySQL, SQL Server, Access, Oracle, Sybase, DB2, and other database systems. The w3schools website http://www.w3schools.com/sql/ says this about what SQL is and what it can do:

What is SQL?
• SQL stands for Structured Query Language
• SQL lets you access and manipulate databases
• SQL is an ANSI (American National Standards Institute) standard

What can SQL do?
• SQL can execute queries against a database
• SQL can retrieve data from a database
• SQL can insert records in a database
• SQL can update records in a database
• SQL can delete records from a database
• SQL can create new databases
• SQL can create new tables in a database
• SQL can create stored procedures in a database
• SQL can create views in a database
• SQL can set permissions on tables, procedures, and views

Most of the actions users need to perform on a database are done with SQL statements, and the statements most used by the users are:
• SELECT - extracts data from a database
• UPDATE - updates data in a database
• DELETE - deletes data from a database
• INSERT INTO - inserts new data into a database

When the SELECT statement is used to select data from a database the result is stored in a result table, often called the result-set.

The SELECT statement can be quite simple, like "SELECT * FROM Area", which returns all the columns and all the rows from the table Area, and it can also have some optional clauses:
• WHERE specifies which rows to retrieve
• GROUP BY groups rows sharing a property so that an aggregate function can be applied to each group
• HAVING selects among the groups defined by the GROUP BY clause
• ORDER BY specifies an order in which to return the rows
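A short sketch of the simple SELECT and of the four optional clauses working together, assuming Python's bundled sqlite3 module as the database. The columns of the Area table (area_id, region, size_km2) and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Area (area_id INTEGER PRIMARY KEY, region TEXT, size_km2 REAL);
    INSERT INTO Area VALUES
        (1, 'North', 10.0), (2, 'North', 30.0),
        (3, 'South', 5.0),  (4, 'South', 2.0),
        (5, 'East', 50.0),  (6, 'East', 0.5);
""")

# The simple form: every column of every row.
all_rows = conn.execute("SELECT * FROM Area").fetchall()

# The optional clauses combined in one statement.
summary = conn.execute("""
    SELECT region, COUNT(*) AS n, SUM(size_km2) AS total
    FROM Area
    WHERE size_km2 >= 1.0          -- WHERE: which rows to retrieve
    GROUP BY region                -- GROUP BY: one group per region
    HAVING SUM(size_km2) > 10.0    -- HAVING: which groups to keep
    ORDER BY total DESC            -- ORDER BY: order of the returned rows
""").fetchall()

print(len(all_rows))
print(summary)
```

WHERE filters individual rows before grouping, while HAVING filters whole groups after the aggregate has been computed, which is why the small East area is dropped by WHERE and the South group is dropped by HAVING.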

References

• http://databases.about.com/od/specificproducts/a/whatisadatabase.htm
• http://www.tech-faq.com/database.shtml
• http://searchsqlserver.techtarget.com/sDefinition/0,,sid87_gci213669,00.html
• http://www.wisegeek.com/what-is-dbms.htm
• http://www.monash.com/uploads/explosion-database-choice.pdf
• http://www.oracle.com/technology/deploy/availability/pdf/CWP_HA_Oracle10gR2_DB28.2.pdf
• http://www.mssqlcity.com/Articles/Compare/oracle_vs_db2.htm
• http://www.computerweekly.com/Articles/2000/03/13/178996/white-paper-the-role-of-the-database-administrator.htm
• http://oreilly.com/catalog/9781565929487/index.html#top
• http://www.oracle.com/technology/products/database/sql_developer/index.html
• http://www.smartdraw.com/tutorials/software/erd/tutorial_01.htm
• http://www.minq.se/products/dbvis/
• http://support.microsoft.com/kb/110093
• http://www.webopedia.com/TERM/O/ODBC.html
• http://www.w3schools.com/sql/
• http://www.gradiance.com/STwelcome.html
• http://en.wikipedia.org/wiki/Entity-relationship_model
• http://www.codewalkers.com/c/a/Database-Code/Relationships-Entities-and-Database-Design/
• http://wiki.answers.com/Q/Difference_between_Flat_File_Database_and_Relational_Database
• http://expertanswercenter.techtarget.com/eac/knowledgebaseAnswer/0,295199,sid63_gci976564,00.html
• http://databases.about.com/od/specificproducts/a/normalization.htm
• http://support.microsoft.com/kb/283878
