Web Application Engineering Data Modeling
Matthew Dailey
Information and Communication Technologies
Asian Institute of Technology

Matthew Dailey (ICT-AIT) Web Eng 1 / 54

Readings

Readings for these lecture notes:
- Greenspun, SQL For Web Nerds.
- Fowler, Patterns of Enterprise Application Architecture, Addison-Wesley, 2003.
- Ruby, Copeland, and Thomas, Agile Web Development with Rails, 6th edition, 2020.

These notes contain material © Greenspun, 2006; Fowler, 2003; Ruby, Copeland, and Thomas, 2020.

Outline

1 Introduction
2 SQL basics
3 Useful PostgreSQL features
4 Database normalization
5 Object-relational mapping
6 NoSQL (Mongo)

Introduction

To this day, the RDBMS is the king of data storage.

NoSQL databases have important use cases (very large datasets, semi-structured or unstructured data, document-oriented processing), but these aren't relevant for small and medium-sized applications.

We will thus learn (or review for some of you) how to use the RDBMS as an effective means of persistence for our Web applications. Later, we will take a look at NoSQL databases such as MongoDB.

For all practical purposes, a "relational database is a big spreadsheet that several people can update simultaneously." (Greenspun).

SQL basics: Tables

Each table in the database is a spreadsheet with fixed columns, each having a name and a data type. The rows are unordered.

Example:

    create table mailing_list (
        email varchar(100) not null primary key,
        name  varchar(100)
    );

The primary key constraint means this column must be unique, and in PostgreSQL causes an index to be created on the column.

Indices allow efficient search of one or more columns in a table.
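The uniqueness guarantee of the primary key can be tried out without a database server. The following is only a minimal sketch: it assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the function name and sample addresses are hypothetical. It creates the mailing_list table from the slide and checks that a second row with the same email is rejected.

```python
import sqlite3


def demo_primary_key():
    """Create mailing_list and show that a duplicate email is rejected.

    SQLite (in memory) stands in for PostgreSQL here; that is an assumption
    made for portability, not part of the original notes.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table mailing_list ("
        " email varchar(100) not null primary key,"
        " name varchar(100))"
    )
    # First row: accepted.
    conn.execute(
        "insert into mailing_list (name, email) values (?, ?)",
        ("Alice Example", "alice@example.com"),  # hypothetical sample data
    )
    # Second row reuses the same email, so the primary key rejects it.
    try:
        conn.execute(
            "insert into mailing_list (name, email) values (?, ?)",
            ("Alice Again", "alice@example.com"),
        )
        duplicate_rejected = False
    except sqlite3.IntegrityError:
        duplicate_rejected = True
    row_count = conn.execute("select count(*) from mailing_list").fetchone()[0]
    conn.close()
    return duplicate_rejected, row_count
```

Calling demo_primary_key() reports whether the duplicate was rejected and how many rows remain in the table. PostgreSQL reports the same kind of violation with its own error message.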
SQL basics: Populating and modifying tables

We use SQL's insert command to add data to a table:

    insert into mailing_list ( name, email )
    values ('Philip Greenspun','[email protected]');

We can add and drop columns:

    alter table mailing_list add phone_number varchar(20);
    alter table mailing_list drop phone_number;

(Adding a column declared not null to a table that already contains rows fails unless a default value is supplied.)

For queries we use select:

    select * from mailing_list;

SQL basics: Many-to-one relationships

Most folks have more than one phone number. Should we put a list in the phone number column? It might work, but our data would not be in "normal" form (more on normalization later).

For many-to-one relationships we normally use a separate table:

    create table phone_numbers (
        email        varchar(100) references mailing_list,
        phone_type   char(1) check ( phone_type in ( 'W', 'H', 'M', 'F' )),
        phone_number varchar(20)
    );

The keyword references creates a consistency constraint between the two tables. Try adding a phone number for an email address that is not in the mailing_list table; the insert should be rejected. Then insert some valid data into the table.

SQL basics: Joins

A join combines information from more than one table:

    select * from mailing_list, phone_numbers;

But this is not what we want: we get the cross product of the rows in the two tables. We have to be more selective:

    select * from mailing_list, phone_numbers
    where mailing_list.email = phone_numbers.email;

Other useful commands: delete from mailing_list and update mailing_list.

SQL basics: Data types

We saw a few of SQL's data types already.
Here is a more complete but still partial list, for PostgreSQL:
- Fixed-length strings (char(len))
- Variable-length strings (varchar(len))
- Variable-length strings, no limit on length (text)
- Variable-length binary data (bytea)
- Dates and times (date, time, timestamp)
- Numbers (integer, numeric, real, double precision, serial, others)
- Other more complex, less-used types

SQL basics: Constraints

Values can also be constrained:
- not null
- unique
- primary key
- check
- references

That's all you need for some simple data modeling!

SQL basics: Keys: natural or surrogate?

- A key is an attribute or group of attributes that uniquely identifies a row of a table.
- Composite keys are made up of more than one attribute.
- Natural keys are attributes in the real world: citizen ID number, etc.
- Surrogate keys are artificial keys introduced into the data model that have no relationship to the real-world entities being modeled.

Many analysts prefer natural keys because surrogate keys are artificial and unrelated to the business logic. But natural keys may be coupled to the business logic and might therefore change when requirements change.

Most Web application frameworks are easiest to work with when you allow them to define their own surrogate key for every table.

Useful PostgreSQL features: User-defined functions

PostgreSQL provides the PL/pgSQL language for specification of user-defined functions.
As a simple example consider f(x) = 2x:

    create or replace function doubleint( x integer ) returns integer as $$
    declare
        y integer;
    begin
        y := 2 * x;
        return y;
    end;
    $$ language plpgsql;

PL/pgSQL is installed by default in PostgreSQL 9.0 and later. On older versions, before creating a first PL/pgSQL function in your database, you must use the shell command createlang plpgsql apache (use your database's name instead of apache).

Now queries like select doubleint( 10 ); should work.

Useful PostgreSQL features: Triggers

PL/pgSQL functions returning trigger can be set to execute automatically when a table is changed.

Example: automatically create a change log entry every time a student changes projects:

    create table project_changes (
        studentid        integer references students,
        oldproj          integer references projects,
        newproj          integer references projects,
        update_timestamp timestamp
    );

    create or replace function proj_log() returns trigger as $PROC$
    begin
        -- Log only when the same student moves to a different project.
        if ( NEW.studentid = OLD.studentid and
             NEW.projectid <> OLD.projectid ) then
            insert into project_changes
                ( studentid, oldproj, newproj, update_timestamp )
            values
                ( NEW.studentid, OLD.projectid, NEW.projectid, current_timestamp );
        end if;
        return NEW;
    end;
    $PROC$ language plpgsql;

    -- "if exists" avoids an error the first time the script is run.
    drop trigger if exists proj_log_post on students;

    -- OLD has no row to refer to on insert, so fire on update only.
    create trigger proj_log_post after update on students
        for each row execute procedure proj_log();

Database normalization: Introduction

A normalized database only stores atomic data in a non-redundant form.

The concept of normal form for relational databases was proposed by E.F. Codd in 1970.

Normalizing a database means ensuring that all data in every table is atomic and depends only on the primary key for that table.

Normalization means all dependencies are explicit in the data model.
This makes it easier to maintain the database in a consistent state.

There are many levels of normalization. The most important are first, second, and third normal form.

Database normalization: First normal form

Criteria for first normal form:
- All columns in every table are atomic (nondecomposable).
- Every row of every table has a unique primary key.

Example: conference program committee website:
- Papers are submitted by potential authors.
- Papers are reviewed by committee members (who can also be authors).
- The program chair makes acceptance and rejection decisions based on the reviews.
- Papers have an author list, a title, a list of keywords, a link to the PDF submission, a set of reviews, and a decision.
- Reviews have a single author, a paper being reviewed, comments, and ratings from 1 to 5 for technical quality, originality, and presentation.

1NF procedure:
1. Consider each relation and break non-atomic attributes into separate tables.
2. Add the relationships between the tables.
3. Determine the primary keys.

For atomicity, we need separate tables for (at least):
- papers
- people
- keywords
- reviews

Relationships:
- Papers to authors: many to many. Requires a new table, papers_authors, relating the two.
- Papers to keywords: many to many. Requires a new table, papers_keywords, relating the two.
- Papers to reviews: one to many. Requires a foreign key reference in reviews.
- People to reviews: one to many. Requires a foreign key reference in reviews.

Keys:
- papers: no natural key. Introduce surrogate paper_id.
- people: no natural key. Introduce surrogate person_id.
- keywords: the keyword itself must be unique, so it is a natural key.
- reviews: the paper, reviewer pair is unique. It is a natural (composite) key.
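The tables, relationships, and keys listed above can be sketched as a schema. This is one possible reading, not the course's reference design. It assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL, and every column name other than paper_id and person_id (title, pdf_url, decision, reviewer_id, and the rating columns) is a hypothetical choice made for illustration.

```python
import sqlite3

# Hypothetical 1NF schema for the program committee example.
SCHEMA = """
create table papers (
    paper_id integer primary key,            -- surrogate key
    title    text not null,
    pdf_url  text,
    decision text
);
create table people (
    person_id integer primary key,           -- surrogate key
    name      text not null
);
create table keywords (
    keyword text primary key                 -- natural key
);
create table papers_authors (                -- many to many: papers, people
    paper_id  integer not null references papers,
    person_id integer not null references people,
    primary key (paper_id, person_id)
);
create table papers_keywords (               -- many to many: papers, keywords
    paper_id integer not null references papers,
    keyword  text not null references keywords,
    primary key (paper_id, keyword)
);
create table reviews (
    paper_id          integer not null references papers,
    reviewer_id       integer not null references people,
    comments          text,
    technical_quality integer check (technical_quality between 1 and 5),
    originality       integer check (originality between 1 and 5),
    presentation      integer check (presentation between 1 and 5),
    primary key (paper_id, reviewer_id)      -- natural composite key
);
"""


def run_demo():
    """Load sample rows, then try three inserts the constraints should reject."""
    conn = sqlite3.connect(":memory:")
    conn.execute("pragma foreign_keys = on")  # SQLite checks references only if enabled
    conn.executescript(SCHEMA)
    conn.execute("insert into papers (paper_id, title) values (1, 'A Hypothetical Paper')")
    conn.executemany(
        "insert into people (person_id, name) values (?, ?)",
        [(1, "Author A"), (2, "Reviewer B"), (3, "Reviewer C")],
    )
    conn.execute("insert into papers_authors values (1, 1)")
    conn.execute("insert into reviews (paper_id, reviewer_id, originality) values (1, 2, 4)")

    attempts = [
        ("duplicate review",
         "insert into reviews (paper_id, reviewer_id, originality) values (1, 2, 3)"),
        ("rating out of range",
         "insert into reviews (paper_id, reviewer_id, originality) values (1, 3, 6)"),
        ("unknown paper",
         "insert into reviews (paper_id, reviewer_id) values (99, 2)"),
    ]
    rejected = []
    for label, sql in attempts:
        try:
            conn.execute(sql)
        except sqlite3.IntegrityError:
            rejected.append(label)
    review_count = conn.execute("select count(*) from reviews").fetchone()[0]
    conn.close()
    return rejected, review_count
```

run_demo() returns the labels of the rejected inserts together with the number of reviews actually stored. The composite key, the check constraints, and the references each catch one of the three bad rows.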
With a unique key for all tables, and only atomic data, our database is in first normal form.

Database normalization: Second normal form

Criteria for second normal form:
- The database is in 1NF.
- There should be no columns dependent on only part of a composite key.

Example: suppose we had a column reviewer_home_page in the reviews table. This would be atomic but redundant: it depends only on the reviewer, which is just one part of the composite (paper, reviewer) key. It should be moved to the people table.

Database normalization: Third normal form

Criteria for third normal form:
- The database is in 2NF.
- There should be no columns dependent on non-key columns.

Example: suppose for each review, we have a field originality (an integer between 1 and 5) and originality_desc ("Groundbreaking", "Novel", "Somewhat new", "Minor variation of existing work", and "Complete ripoff") describing what the rating means.
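One way the second and third normal form criteria might be applied to these two examples is sketched below. It is an illustration under assumptions, not the notes' own solution: Python's built-in sqlite3 module stands in for PostgreSQL, the table name originality_ratings and the column home_page are hypothetical, and the mapping of the five descriptions to the ratings 5 down to 1 is assumed from the order in which they are listed.

```python
import sqlite3


def demo_3nf():
    """Keep each fact in one place and recover it with joins when needed."""
    conn = sqlite3.connect(":memory:")
    conn.execute("pragma foreign_keys = on")
    conn.executescript("""
        create table people (
            person_id integer primary key,
            name      text not null,
            home_page text                  -- 2NF: depends on the person only
        );
        create table originality_ratings (  -- 3NF: description depends on the rating
            originality      integer primary key check (originality between 1 and 5),
            originality_desc text not null
        );
        create table reviews (
            paper_id    integer not null,
            reviewer_id integer not null references people,
            originality integer references originality_ratings,
            primary key (paper_id, reviewer_id)
        );
    """)
    conn.executemany(
        "insert into originality_ratings values (?, ?)",
        [   # assumed mapping: first description listed = highest rating
            (5, "Groundbreaking"),
            (4, "Novel"),
            (3, "Somewhat new"),
            (2, "Minor variation of existing work"),
            (1, "Complete ripoff"),
        ],
    )
    conn.execute(
        "insert into people values (2, 'Reviewer B', 'https://example.org/~b')"
    )
    # Two reviews by the same reviewer: the home page is stored only once.
    conn.executemany(
        "insert into reviews (paper_id, reviewer_id, originality) values (?, ?, ?)",
        [(1, 2, 5), (2, 2, 3)],
    )
    rows = conn.execute(
        "select r.paper_id, p.home_page, o.originality_desc"
        " from reviews r"
        " join people p on p.person_id = r.reviewer_id"
        " join originality_ratings o on o.originality = r.originality"
        " order by r.paper_id"
    ).fetchall()
    conn.close()
    return rows
```

The reviews table now holds only columns that depend on the whole (paper, reviewer) key. The home page and the rating description are each stored once and joined in when a query needs them.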