Database design

Database design is the process of producing a detailed data model of a database. This logical data model contains all the needed logical and physical design choices and physical storage parameters needed to generate a design in a Data Definition Language, which can then be used to create a database. A fully attributed data model contains detailed attributes for each entity.

The term database design can be used to describe many different parts of the design of an overall database system. Principally, and most correctly, it can be thought of as the logical design of the base data structures used to store the data. In the relational model these are the tables and views. In an object database the entities and relationships map directly to object classes and named relationships. However, the term database design could also be used to apply to the overall process of designing, not just the base data structures, but also the forms and queries used as part of the overall database application within the database management system (DBMS).[1]

The process of doing database design generally consists of a number of steps which will be carried out by the database designer. Usually, the designer must:

- Determine the relationships between the different data elements.
- Superimpose a logical structure upon the data on the basis of these relationships.[2]

ER Diagram (Entity-relationship model)

A sample Entity-relationship diagram

Database designs also include ER (entity-relationship model) diagrams. An ER diagram is a diagram that helps to design a database in an efficient way. Attributes in ER diagrams are usually modeled as an oval with the name of the attribute, linked to the entity or relationship that contains the attribute.

Within the relational model the final step can generally be broken down into two further steps: determining the grouping of information within the system (generally determining what the basic objects are about which information is being stored), and then determining the relationships between these groups of information, or objects. This step is not necessary with an Object database.[2]

The Design Process

The design process consists of the following steps[3]:

1. Determine the purpose of your database - This helps prepare you for the remaining steps.
2. Find and organize the information required - Gather all of the types of information you might want to record in the database, such as product name and order number.
3. Divide the information into tables - Divide your information items into major entities or subjects, such as Products or Orders. Each subject then becomes a table.
4. Turn information items into columns - Decide what information you want to store in each table. Each item becomes a field, and is displayed as a column in the table. For example, an Employees table might include fields such as Last Name and Hire Date.
5. Specify primary keys - Choose each table's primary key. The primary key is a column that is used to uniquely identify each row. An example might be Product ID or Order ID.
6. Set up the table relationships - Look at each table and decide how the data in one table is related to the data in other tables. Add fields to tables or create new tables to clarify the relationships, as necessary (see the sketch after this list).
7. Refine your design - Analyze your design for errors. Create the tables and add a few records of sample data. See if you can get the results you want from your tables. Make adjustments to the design, as needed.
8. Apply the normalization rules - Apply the data normalization rules to see if your tables are structured correctly. Make adjustments to the tables.
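
As a hedged illustration of steps 3 through 6, the sketch below renders the Products/Orders example as SQL data definition statements. The table and column names are assumptions made for this example, not part of the source text.

    CREATE TABLE products (
        product_id   INTEGER PRIMARY KEY,    -- step 5: a primary key uniquely identifies each row
        product_name VARCHAR(100) NOT NULL   -- step 4: each information item becomes a column
    );

    CREATE TABLE orders (
        order_id   INTEGER PRIMARY KEY,      -- step 3: each subject becomes a table
        order_date DATE NOT NULL
    );

    -- Step 6: a new table clarifies the many-to-many relationship
    -- between orders and products.
    CREATE TABLE order_items (
        order_id   INTEGER REFERENCES orders (order_id),
        product_id INTEGER REFERENCES products (product_id),
        quantity   INTEGER NOT NULL,
        PRIMARY KEY (order_id, product_id)
    );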

Determining data to be stored

In a majority of cases, the person designing a database is a person with expertise in the area of database design, rather than expertise in the domain from which the data to be stored is drawn, e.g. financial information, biological information, etc. Therefore the data to be stored in the database must be determined in cooperation with a person who does have expertise in that domain, and who is aware of what data must be stored within the system.

This process is one which is generally considered part of requirements analysis, and it requires skill on the part of the database designer to elicit the needed information from those with the domain knowledge. This is because those with the necessary domain knowledge frequently cannot express clearly what their system requirements for the database are, as they are unaccustomed to thinking in terms of the discrete data elements which must be stored. The data to be stored can be determined by a requirement specification.[4]

Normalization

Main article: Database normalization

In the field of relational database design, normalization is a systematic way of ensuring that a database structure is suitable for general-purpose querying and free of certain undesirable characteristics (insertion, update, and deletion anomalies) that could lead to a loss of data integrity.
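
A minimal sketch of the idea, with assumed table and column names: in the first table the customer's address is repeated with every order, so correcting it in one row but not another loses integrity (an update anomaly); the normalized pair stores each fact exactly once.

    -- Non-normalized: customer data is repeated with every order.
    CREATE TABLE orders_flat (
        order_id         INTEGER PRIMARY KEY,
        customer_name    VARCHAR(100),
        customer_address VARCHAR(200)   -- repeated per order: update anomaly
    );

    -- Normalized: the address is stored once per customer.
    CREATE TABLE customers (
        customer_id      INTEGER PRIMARY KEY,
        customer_name    VARCHAR(100),
        customer_address VARCHAR(200)
    );

    CREATE TABLE customer_orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers (customer_id)
    );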

A standard piece of database design guidance is that the designer should create a fully normalized design; selective denormalization can subsequently be performed, but only for performance reasons. However, some modeling disciplines, such as the dimensional modeling approach to design, explicitly recommend non-normalized designs, i.e. designs that in large part do not adhere to 3NF.

Types of database design

Conceptual schema

Main article: Conceptual schema

Once a database designer is aware of the data which is to be stored within the database, they must then determine where dependency is within the data. Sometimes when data is changed you can be changing other data that is not visible. For example, in a list of names and addresses, assuming a situation where multiple people can have the same address, but one person cannot have more than one address, the address is dependent upon the name: given a name, the associated address is uniquely determined. The reverse does not hold, since several names can share one address; one attribute can change while the other does not.
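
The dependency in this example can be stated as a constraint. A minimal sketch with assumed names: making the name the primary key enforces "one address per person", while nothing stops two rows from sharing the same address.

    CREATE TABLE person_address (
        person_name VARCHAR(100) PRIMARY KEY,  -- determinant: one row per person
        address     VARCHAR(200) NOT NULL      -- functionally dependent on the name
    );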

(NOTE: A common misconception is that the relational model is so called because of the stating of relationships between data elements therein. This is not true. The relational model is so named because it is based upon the mathematical structures known as relations.)

Logically structuring data

Once the relationships and dependencies amongst the various pieces of information have been determined, it is possible to arrange the data into a logical structure which can then be mapped into the storage objects supported by the database management system. In the case of relational databases the storage objects are tables which store data in rows and columns.

Each table may represent an implementation of either a logical object or a relationship joining one or more instances of one or more logical objects. Relationships between tables may then be stored as links connecting child tables with parents. Since complex logical relationships are themselves tables they will probably have links to more than one parent.

In an Object database the storage objects correspond directly to the objects used by the object-oriented programming language used to write the applications that will manage and access the data. The relationships may be defined as attributes of the object classes involved or as methods that operate on the object classes.

Physical database design

The physical design of the database specifies the physical configuration of the database on the storage media. This includes detailed specification of data elements, data types, indexing options and other parameters residing in the DBMS data dictionary. It is the detailed design of a system, including its modules, and the hardware and software specifications of the system.


Database




A database is a system intended to organize, store, and retrieve large amounts of data easily. It consists of an organized collection of data for one or more uses, typically in digital form. One way of classifying databases involves the type of their contents, for example: bibliographic, document-text, statistical. Digital databases are managed using database management systems, which store database contents, allowing data creation and maintenance, and search and other access.

Architecture

Database architecture consists of three levels: external, conceptual and internal. Clearly separating the three levels was a major feature of the relational model that dominates 21st-century databases.[1]

The external level defines how users understand the organization of the data. A single database can have any number of views at the external level. The internal level defines how the data is physically stored and processed by the computing system. Internal architecture is concerned with cost, performance, scalability and other operational matters. The conceptual level is a level of indirection between internal and external. It provides a common view of the database that is uncomplicated by details of how the data is stored or managed, and that can unify the various external views into a coherent whole.[1]

Database management systems

Main article: Database management system

A database management system (DBMS) consists of software that operates databases, providing storage, access, security, backup and other facilities. Database management systems can be categorized according to the database model that they support, such as relational or XML; the type(s) of computer they support, such as a server cluster or a mobile phone; the query language(s) that access the database, such as SQL or XQuery; or performance trade-offs, such as maximum scale or maximum speed. Some DBMS cover more than one entry in these categories, e.g., supporting multiple query languages. Examples of commonly used DBMS are MySQL, PostgreSQL, Microsoft Access, SQL Server, FileMaker, Oracle, Sybase, dBASE, Clipper, FoxPro, etc. Almost every database software comes with an Open Database Connectivity (ODBC) driver that allows the database to integrate with other databases.

Components of DBMS

Most DBMS as of 2009 implement a relational model.[2] Other systems, such as object DBMSs, offer specific features for more specialized requirements. Their components are similar, but not identical.

RDBMS components

- Sublanguages: Relational DBMS (RDBMS) include Data Definition Language (DDL) for defining the structure of the database, Data Control Language (DCL) for defining security/access controls, and Data Manipulation Language (DML) for querying and updating data (one statement of each appears in the sketch after this list).
- Interface drivers: These drivers are code libraries that provide methods to prepare statements, execute statements, fetch results, etc. Examples include ODBC, JDBC, MySQL/PHP, FireBird/Python.
- SQL engine: This component interprets and executes the DDL, DCL, and DML statements. It includes three major components (compiler, optimizer, and executor).
- Transaction engine: Ensures that multiple SQL statements either succeed or fail as a group, according to application dictates.
- Relational engine: Relational objects such as Table, Index, and Referential integrity constraints are implemented in this component.
- Storage engine: This component stores and retrieves data from secondary storage, as well as managing transaction commit and rollback, backup and recovery, etc.
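
A hedged sketch showing one statement from each sublanguage; the employees table and its columns are assumed for illustration.

    CREATE TABLE employees (                    -- DDL: define structure
        emp_id    INTEGER PRIMARY KEY,
        last_name VARCHAR(100) NOT NULL,
        hire_date DATE
    );

    GRANT SELECT ON employees TO PUBLIC;        -- DCL: define access controls

    INSERT INTO employees (emp_id, last_name, hire_date)   -- DML: update data
    VALUES (1, 'Smith', DATE '2010-06-01');

    SELECT last_name FROM employees             -- DML: query data
    WHERE hire_date >= DATE '2010-01-01';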

ODBMS components

Object DBMS (ODBMS) have transaction and storage components that are analogous to those in an RDBMS. Some DBMS handle DDL, DML and update tasks differently: instead of using sublanguages, they provide APIs for these purposes. They typically include a sublanguage and accompanying engine for processing queries with interpretive statements analogous to, but not the same as, SQL. Example object query languages are OQL, LINQ, JDOQL and JPAQL, among others. The query engine returns collections of objects instead of relational rows.

Types


Operational database

These databases store detailed data about the operations of an organization. They are typically organized by subject matter and process relatively high volumes of updates using transactions. Essentially every major organization on earth uses such databases. Examples include customer databases that record contact, credit, and demographic information about a business's customers; personnel databases that hold information such as salary, benefits, and skills data about employees; enterprise resource planning systems that record details about product components and parts inventory; and financial databases that keep track of the organization's money, accounting and financial dealings.

Data warehouse

Data warehouses archive data from operational databases and often from external sources such as market research firms. Often operational data undergoes transformation on its way into the warehouse, getting summarized, anonymized, reclassified, etc. The warehouse becomes the central source of data for use by managers and other end-users who may not have access to operational data. For example, sales data might be aggregated to weekly totals and converted from internal product codes to use UPC codes so that it can be compared with ACNielsen data. Some basic and essential components of data warehousing include retrieving and analyzing data, and transforming, loading and managing data so as to make it available for further use. Operations in a data warehouse are typically concerned with bulk data manipulation, and as such, it is unusual and inefficient to target individual rows for update, insert or delete. Bulk native loaders for input data and bulk SQL passes for aggregation are the norm.
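
The weekly-totals aggregation mentioned above might look like the following sketch; the sales table, its columns, and the PostgreSQL date_trunc function are assumptions made for illustration.

    SELECT product_upc,
           date_trunc('week', sale_date) AS sale_week,
           SUM(amount)                   AS weekly_total
    FROM   sales
    GROUP  BY product_upc, date_trunc('week', sale_date);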

Analytical database

Analysts may do their work directly against a data warehouse or create a separate analytic database for Online Analytical Processing. For example, a company might extract sales records for analyzing the effectiveness of advertising and other sales promotions at an aggregate level.

Distributed database

These are databases of local work-groups and departments at regional offices, branch offices, manufacturing plants and other work sites. These databases can include segments of both common operational and common user databases, as well as data generated and used only at a user's own site.

End-user database

These databases consist of data developed by individual end-users. Examples include collections of documents, spreadsheets, word processing files and downloaded files, as well as personal databases such as one for managing a baseball card collection.

External database

These databases contain data collected for use across multiple organizations, either freely or via subscription. The Internet Movie Database is one example.

Hypermedia databases

The World Wide Web can be thought of as a database, albeit one spread across millions of independent computing systems. Web browsers "process" this data one page at a time, while web crawlers and other software provide the equivalent of database indexes to support search and other activities.

Models

Main article: Database model

Post-relational database models

Products offering a more general data model than the relational model are sometimes classified as post-relational.[3] Alternate terms include "hybrid database", "Object-enhanced RDBMS" and others. The data model in such products incorporates relations but is not constrained by E.F. Codd's Information Principle, which requires that all information in the database must be cast explicitly in terms of values in relations and in no other way.[4]

Some of these extensions to the relational model integrate concepts from technologies that pre-date the relational model. For example, they allow representation of a directed graph with trees on the nodes.

Some post-relational products extend relational systems with non-relational features. Others arrived in much the same place by adding relational features to pre-relational systems. Paradoxically, this allows products that are historically pre-relational, such as PICK and MUMPS, to make a plausible claim to be post-relational.

Object database models

Main article: Object database

In recent years, the object-oriented paradigm has been applied in areas such as engineering and spatial databases, telecommunications and various scientific domains. The conglomeration of object-oriented programming and database technology led to this new kind of database. These databases attempt to bring the database world and the application-programming world closer together, in particular by ensuring that the database uses the same type system as the application program. This aims to avoid the overhead (sometimes referred to as the impedance mismatch) of converting information between its representation in the database (for example as rows in tables) and its representation in the application program (typically as objects). At the same time, object databases attempt to introduce key ideas of object programming, such as encapsulation and polymorphism, into the world of databases.

A variety of ways have been tried for storing objects in a database. Some products have approached the problem from the application-programming side, by making the objects manipulated by the program persistent. This also typically requires the addition of some kind of query language, since conventional programming languages do not provide language-level functionality for finding objects based on their information content. Others have attacked the problem from the database end, by defining an object-oriented data model for the database, and defining a database programming language that allows full programming capabilities as well as traditional query facilities.

Storage structures

Main article: Database storage structures

Databases may store relational tables/indexes in memory or on hard disk in one of many forms:

- ordered/unordered flat files
- ISAM
- heaps
- hash buckets
- logically-blocked files
- B+ trees

The most commonly used are B+ trees and ISAM.

Object databases use a range of storage mechanisms. Some use virtual memory-mapped files to make the native language (C++, Java, etc.) objects persistent. This can be highly efficient but it can make multi-language access more difficult. Others disassemble objects into fixed- and varying-length components that are then clustered in fixed-size blocks on disk and reassembled into the appropriate format on either the client or server address space. Another popular technique involves storing the objects in tuples (much like a relational database), which the database server then reassembles into objects for the client.

Other techniques include clustering by category (such as grouping data by month, or location), storing pre-computed query results, known as materialized views, and partitioning data by range (e.g., a date range) or by hash.

Memory management and storage topology can be important design choices for database designers as well. Just as normalization is used to reduce storage requirements and improve database designs, conversely denormalization is often used to reduce join complexity and reduce query execution time.[5]

Indexing

Main article: Index (database)

Indexing is a technique for improving database performance. The many types of index share the common property that they eliminate the need to examine every entry when running a query. In large databases, this can reduce query time/cost by orders of magnitude. The simplest form of index is a sorted list of values that can be searched using a binary search with an adjacent reference to the location of the entry, analogous to the index in the back of a book. The same data can have multiple indexes (an employee database could be indexed by both last name and hire date).
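
Continuing the employee example, a hedged sketch of two independent indexes on the same data (table and index names assumed):

    CREATE INDEX idx_employees_last_name ON employees (last_name);
    CREATE INDEX idx_employees_hire_date ON employees (hire_date);

    -- Either predicate can now be satisfied without scanning every row:
    SELECT * FROM employees WHERE last_name = 'Smith';
    SELECT * FROM employees WHERE hire_date >= DATE '2011-01-01';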

Indexes affect performance, but not results. Database designers can add or remove indexes without changing application logic, reducing maintenance costs as the database grows and database usage evolves.

Given a particular query, the DBMS' query optimizer is responsible for devising the most efficient strategy for finding matching data. The optimizer decides which index or indexes to use, how to combine data from different parts of the database, how to provide data in the order requested, etc. Indexes can speed up data access, but they consume space in the database, and must be updated each time the data is altered. Indexes therefore can speed data access but slow data maintenance. These two properties determine whether a given index is worth the cost.

Transactions

Main article: Database transaction


Like every software system, a DBMS operates in a faulty computing environment and is prone to failures of many kinds. A failure can corrupt the database unless special measures are taken to prevent this. A DBMS achieves certain levels of fault tolerance by encapsulating units of work (executed programs) performed upon the database in database transactions.

The ACID rules

Main article: ACID

Most DBMS provide some form of support for transactions, which allow multiple data items to be updated in a consistent fashion, such that updates that are part of a transaction succeed or fail in unison. The so-called ACID rules, summarized here, characterize this behavior:

- Atomicity: Either all the data changes in a transaction must happen, or none of them. The transaction must be completed, or else it must be undone (rolled back).
- Consistency: Every transaction must preserve the declared consistency rules for the database.
- Isolation: Two concurrent transactions cannot interfere with one another. Intermediate results within one transaction must remain invisible to other transactions. The most extreme form of isolation is serializability, meaning that transactions that take place concurrently could instead be performed in some series, without affecting the ultimate result.
- Durability: Completed transactions cannot be aborted later or their results discarded. They must persist through (for instance) DBMS restarts.

In practice, many DBMSs allow the selective relaxation of these rules to balance perfect behavior with optimum performance.
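
A minimal transaction sketch, assuming an accounts table and PostgreSQL-style syntax: the two updates succeed or fail in unison.

    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
    COMMIT;   -- atomic and durable; ROLLBACK here would undo both updates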

Concurrency control and locking

Main article: Concurrency control

Concurrency control is essential for the correctness of transactions executed concurrently in a DBMS, which is the common execution mode for performance reasons. The main concern and goal of concurrency control is isolation.

Isolation

Isolation refers to the ability of one transaction to see the results of other transactions. Greater isolation typically reduces performance and/or concurrency, leading DBMSs to provide administrative options to reduce isolation. For example, in a database that analyzes trends rather than looking at low-level detail, increased performance might justify allowing readers to see uncommitted changes ("dirty reads").

A common way to achieve isolation is by locking. When a transaction modifies a resource, the DBMS stops other transactions from also modifying it, typically by locking it. Locks also provide one method of ensuring that data does not change while a transaction is reading it or even that it doesn't change until a transaction that once read it has completed.
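
One common explicit-lock idiom is sketched below (SELECT ... FOR UPDATE as in PostgreSQL or MySQL; the accounts table is assumed): the selected row stays locked against concurrent writers until the transaction ends.

    BEGIN;
    SELECT balance FROM accounts WHERE account_id = 1 FOR UPDATE;  -- lock the row
    UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
    COMMIT;   -- releases the lock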

Lock types

Locks can be shared[6] or exclusive, and can lock out readers and/or writers. Locks can be created implicitly by the DBMS when a transaction performs an operation, or explicitly at the transaction's request.

Shared locks allow multiple transactions to lock the same resource. The lock persists until all such transactions complete. Exclusive locks are held by a single transaction and prevent other transactions from locking the same resource.

Read locks are usually shared, and prevent other transactions from modifying the resource. Write locks are exclusive, and prevent other transactions from modifying the resource. On some systems, write locks also prevent other transactions from reading the resource.

The DBMS implicitly locks data when it is updated, and may also do so when it is read. Transactions explicitly lock data to ensure that they can complete without complications. Explicit locks may be useful for some administrative tasks.[7][8]

Locking can significantly affect database performance, especially with large and complex transactions in highly concurrent environments.

Lock granularity

Locks can be coarse, covering an entire database; fine-grained, covering a single data item; or intermediate, covering a collection of data such as all the rows in an RDBMS table.

Deadlocks

Deadlocks occur when two transactions each require data that the other has already locked exclusively. Deadlock detection is performed by the DBMS, which then aborts one of the transactions and allows the other to complete.
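
A hedged sketch of the classic pattern, reusing the assumed accounts table: two sessions update the same two rows in opposite orders, so each ends up waiting for the other's exclusive lock.

    -- Session A:                          -- Session B:
    -- BEGIN;                              -- BEGIN;
    -- UPDATE accounts SET ...             -- UPDATE accounts SET ...
    --   WHERE account_id = 1;             --   WHERE account_id = 2;
    -- UPDATE accounts SET ...             -- UPDATE accounts SET ...
    --   WHERE account_id = 2; (waits)     --   WHERE account_id = 1; (waits)
    -- Each transaction now holds the lock the other needs; the DBMS
    -- detects the cycle and aborts one so the other can complete.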

Replication

Main article: Database replication

Database replication involves maintaining multiple copies of a database on different computers, to allow more users to access it, or to allow a secondary site to take over immediately if the primary site stops working. Some DBMS piggyback replication on top of their transaction logging facility, applying the primary's log to the secondary in near real-time. Database clustering is a related concept for handling larger databases and user communities by employing a cluster of multiple computers to host a single database that can use replication as part of its approach.[9][10]

Security

Main article: Database security

Database security denotes the system, processes, and procedures that protect a database from unauthorized activity.

DBMSs usually enforce security through access control, auditing, and encryption:

- Access control manages who can connect to the database via authentication and what they can do via authorization.
- Auditing records information about database activity: who, what, when, and possibly where.
- Encryption protects data at the lowest possible level by storing and possibly transmitting data in an unreadable form. The DBMS encrypts data when it is added to the database and decrypts it when returning query results. This process can occur on the client side of a network connection to prevent unauthorized access at the point of use.

Confidentiality

Law and regulation governs the release of information from some databases, protecting medical history, driving records, telephone logs, etc.

In the United Kingdom, database privacy regulation falls under the Office of the Information Commissioner. Organizations based in the United Kingdom and holding personal data in digital format such as databases must register with the Office.[11]

Entity-relationship model


A sample Entity-relationship diagram using Chen's notation

In software engineering, an entity-relationship model (ERM) is an abstract and conceptual representation of data. Entity-relationship modeling is a database modeling method, used to produce a type of conceptual schema or semantic data model of a system, often a relational database, and its requirements in a top-down fashion. Diagrams created by this process are called entity-relationship diagrams, ER diagrams, or ERDs.

The definitive reference for entity-relationship modeling is Peter Chen's 1976 paper.[1] However, variants of the idea existed previously,[2] and have been devised subsequently.

Overview

The first stage of information system design uses these models during the requirements analysis to describe information needs or the type of information that is to be stored in a database. The technique can be used to describe any ontology (i.e. an overview and classifications of used terms and their relationships) for a certain area of interest. In the case of the design of an information system that is based on a database, the conceptual data model is, at a later stage (usually called logical design), mapped to a logical data model, such as the relational model; this in turn is mapped to a physical model during physical design. Note that sometimes, both of these phases are referred to as "physical design".

The building blocks: entities, relationships, and attributes

Two related entities

An entity with an attribute

A relationship with an attribute

Primary key

An entity may be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identified. An entity is an abstraction from the complexities of some domain. When we speak of an entity we normally speak of some aspect of the real world which can be distinguished from other aspects of the real world.[3]

An entity may be a physical object such as a house or a car, an event such as a house sale or a car service, or a concept such as a customer transaction or order. Although the term entity is the one most commonly used, following Chen we should really distinguish between an entity and an entity-type. An entity-type is a category. An entity, strictly speaking, is an instance of a given entity-type. There are usually many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most people tend to use the term entity as a synonym for this term.

Entities can be thought of as nouns. Examples: a computer, an employee, a song, a mathematical theorem.

A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs, linking two or more nouns. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship between an artist and a song, a proved relationship between a mathematician and a theorem.

The model's linguistic aspect described above is utilized in the declarative database query language ERROL, which mimics natural language constructs.

Entities and relationships can both have attributes. Examples: an employee entity might have a Social Security Number (SSN) attribute; the proved relationship may have a date attribute. Every entity (unless it is a weak entity) must have a minimal set of uniquely identifying attributes, which is called the entity's primary key.

Entity-relationship diagrams don't show single entities or single instances of relations. Rather, they show entity sets and relationship sets. Example: a particular song is an entity. The collection of all songs in a database is an entity set. The eaten relationship between a child and her lunch is a single relationship. The set of all such child-lunch relationships in a database is a relationship set. In other words, a relationship set corresponds to a relation in mathematics, while a relationship corresponds to a member of the relation.
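
In a relational implementation, each entity set and each relationship set typically becomes a table, as in this hedged sketch of the article's artist/song example (column names assumed):

    CREATE TABLE song   (song_id   INTEGER PRIMARY KEY, title VARCHAR(200));
    CREATE TABLE artist (artist_id INTEGER PRIMARY KEY, name  VARCHAR(100));

    -- The "performs" relationship set: each row is one relationship,
    -- i.e. one member of the mathematical relation.
    CREATE TABLE performs (
        artist_id INTEGER REFERENCES artist (artist_id),
        song_id   INTEGER REFERENCES song (song_id),
        PRIMARY KEY (artist_id, song_id)
    );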

Certain cardinality constraints on relationship sets may be indicated as well.

Diagramming conventions

Entity sets are drawn as rectangles, relationship sets as diamonds. If an entity set participates in a relationship set, they are connected with a line.

Attributes are drawn as ovals and are connected with a line to exactly one entity or relationship set.

Cardinality constraints are expressed as follows:

- a double line indicates a participation constraint, totality or surjectivity: all entities in the entity set must participate in at least one relationship in the relationship set;
- an arrow from entity set to relationship set indicates a key constraint, i.e. injectivity: each entity of the entity set can participate in at most one relationship in the relationship set (see the sketch after this list);
- a thick line indicates both, i.e. bijectivity: each entity in the entity set is involved in exactly one relationship;
- an underlined name of an attribute indicates that it is a key: two different entities or relationships with this attribute always have different values for this attribute.
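
As a hedged sketch of how such constraints can surface in a relational implementation (all names assumed): placing the foreign key in the entity's own table enforces the key constraint (at most one department per employee), and declaring it NOT NULL adds the participation constraint (at least one).

    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    VARCHAR(100)
    );

    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        -- key constraint: at most one department per employee;
        -- NOT NULL: total participation, i.e. at least one.
        dept_id INTEGER NOT NULL REFERENCES department (dept_id)
    );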

Attributes are often omitted as they can clutter up a diagram; other diagram techniques often list entity attributes within the rectangles drawn for entity sets.

Two related entities shown using Crow's Foot notation

Chen's notation for entity-relationship modeling uses rectangles to represent entities, and diamonds to represent relationships appropriate for first-class objects: they can have attributes and relationships of their own.

Related diagramming convention techniques:

- Bachman notation
- EXPRESS
- IDEF1X[4]
- Martin notation
- (min, max)-notation of Jean-Raymond Abrial in 1974
- UML class diagrams

Crow's Foot Notation

Crow's Foot notation is used in Barker's Notation, SSADM and Information Engineering. Crow's Foot diagrams represent entities as boxes, and relationships as lines between the boxes. The ends of these lines are shaped to represent the cardinality of the relationship.

Usage of Chen notation is more prevalent in the United States, while Crow's Foot notation is used primarily in the UK. Crow's Foot notation was used in the 1980s by the consultancy practice CACI. Many of the consultants at CACI (including Barker) subsequently moved to Oracle UK, where they developed the early versions of Oracle's CASE tools, introducing the notation to a wider audience. Crow's Foot notation is used by these tools: ARIS, System Architect, Visio, PowerDesigner, Toad Data Modeler, DeZign for Databases, Devgems Data Modeler, OmniGraffle, and MySQL Workbench. CA's ICASE tool, CA Gen (also known as the Information Engineering Facility), also uses this notation.

ER diagramming tools

There are many ER diagramming tools. Some free software ER diagramming tools that can interpret and generate ER models and SQL, and perform database analysis, are MySQL Workbench (formerly DBDesigner) and Open ModelSphere (open-source). A freeware ER tool that can generate database and application layer code (web services) is the RISE Editor. The open-source Erviz takes a simple textual description of the ERD and then uses Graphviz to automatically produce a nice, if slightly spaced-out, layout.

Some of the proprietary ER diagramming tools are ARIS, Avolution, Aqua Data Studio, dbForge Studio for MySQL, DeZign for Databases, ER/Studio, Devgems Data Modeler, ERwin, MEGA International, ModelRight, OmniGraffle, Oracle Designer, PowerDesigner, Rational Rose, Sparx Enterprise Architect, SQLyog, System Architect, Toad Data Modeler, SQL Maestro, Microsoft Visio, Visible Analyst, and Visual Paradigm.

Some free software diagram tools just draw the shapes without having any knowledge of what they mean, nor do they generate SQL. These include Gliffy[5], Kivio and Dia. DIA diagrams, however, can be translated with tedia2sql.


Data model


Overview of data modeling context: A data model provides the details of information to be stored, and is of primary use when the final product is the generation of computer software code for an application or the preparation of a functional specification to aid a computer software make-or-buy decision. The figure is an example of the interaction between process and data models.[1]

A data model in software engineering is an abstract model that documents and organizes the business data for communication between team members, and is used as a plan for developing applications, specifically how data is stored and accessed.

According to Hoberman (2009), "A data model is a wayfinding tool for both business and IT professionals, which uses a set of symbols and text to precisely explain a subset of real information to improve communication within the organization and thereby lead to a more flexible and stable application environment."[2]

A data model explicitly determines the structure of data or structured data. Typical applications of data models include database models, design of information systems, and enabling exchange of data. Usually data models are specified in a data modeling language.[3]

Communication and precision are the two key benefits that make a data model important to applications that use and exchange data. A data model is the medium through which project team members from different backgrounds and with different levels of experience can communicate with one another. Precision means that the terms and rules on a data model can be interpreted only one way and are not ambiguous.[2]

A data model can sometimes be referred to as a data structure, especially in the context of programming languages. Data models are often complemented by function models, especially in the context of enterprise models.

Overview

Managing large quantities of structured and unstructured data is a primary function of information systems. Data models describe structured data for storage in systems such as relational databases. They typically do not describe unstructured data, such as word processing documents, email messages, pictures, digital audio, and video.

The role of data models

How data models deliver benefit.[4]

The main aim of data models is to support the development of information systems by providing the definition and format of data. According to West and Fowler (1999) "if this is done consistently across systems then compatibility of data can be achieved. If the same data structures are used to store and access data then different applications can share data. The results of this are indicated above. However, systems and interfaces often cost more than they should, to build, operate, and maintain. They may also constrain the business rather than support it. A major cause is that the quality of the data models implemented in systems and interfaces is poor".[4]

y "Business rules, specific to how things are done in a particular place, are often fixed in the structure of a data model. This means that small changes in the way business is conducted lead to large changes in computer systems and interfaces".[4] y "Entity types are often not identified, or incorrectly identified. This can lead to replication of data, data structure, and functionality, together with the attendant costs of that duplication in development and maintenance".[4] y "Data models for different systems are arbitrarily different. The result of this is that complex interfaces are required between systems that share data. These interfaces can account for between 25-70% of the cost of current systems".[4] y "Data cannot be shared electronically with customers and suppliers, because the structure and meaning of data has not been standardised. For example, engineering design data and drawings for process plant are still sometimes exchanged on paper".[4] The reason for these problems is a lack of standards that will ensure that data models will both meet business needs and be consistent.[4]

Three perspectives

The ANSI/SPARC three level architecture. This shows that a data model can be an external model (or view), a conceptual model, or a physical model. This is not the only way to look at data models, but it is a useful way, particularly when comparing models.[4]

A data model instance may be one of three kinds according to ANSI in 1975:[5]

- Conceptual schema: describes the semantics of a domain, being the scope of the model. For example, it may be a model of the interest area of an organization or industry. This consists of entity classes, representing kinds of things of significance in the domain, and relationship assertions about associations between pairs of entity classes. A conceptual schema specifies the kinds of facts or propositions that can be expressed using the model. In that sense, it defines the allowed expressions in an artificial 'language' with a scope that is limited by the scope of the model. The use of conceptual schema has evolved to become a powerful communication tool with business users. Often called a subject area model (SAM) or high-level data model (HDM), this model is used to communicate core data concepts, rules, and definitions to a business user as part of an overall application development or enterprise initiative. The number of objects should be very small and focused on key concepts. Try to limit this model to one page, although for extremely large organizations or complex projects, the model might span two or more pages.[6]
- Logical schema: describes the semantics, as represented by a particular data manipulation technology. This consists of descriptions of tables and columns, object-oriented classes, and XML tags, among other things.
- Physical schema: describes the physical means by which data are stored. This is concerned with partitions, CPUs, tablespaces, and the like (a sketch contrasting the levels follows this list).
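
A hedged sketch contrasting the three levels for one small domain; every name here is an assumption made for illustration. The conceptual schema might simply assert "an Organization employs People"; the logical schema renders that as tables and columns; the physical schema pins down storage.

    -- Logical schema: tables and columns for the conceptual assertion
    -- "an Organization employs People".
    CREATE TABLE organization (
        org_id INTEGER PRIMARY KEY,
        name   VARCHAR(100)
    );

    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name      VARCHAR(100),
        org_id    INTEGER REFERENCES organization (org_id)
    );

    -- Physical schema: storage placement (PostgreSQL syntax; the
    -- tablespace name is assumed).
    ALTER TABLE person SET TABLESPACE fast_disk;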

The significance of this approach, according to ANSI, is that it allows the three perspectives to be relatively independent of each other. Storage technology can change without affecting either the logical or the conceptual model. The table/column structure can change without (necessarily) affecting the conceptual model. In each case, of course, the structures must remain consistent with the other model. The table/column structure may be different from a direct translation of the entity classes and attributes, but it must ultimately carry out the objectives of the conceptual entity class structure. Early phases of many software development projects emphasize the design of a conceptual data model. Such a design can be detailed into a logical data model. In later stages, this model may be translated into physical data model. However, it is also possible to implement a conceptual model directly.

History

One of the earliest pioneering works in modelling information systems was done by Young and Kent (1958),[7][8] who argued for "a precise and abstract way of specifying the informational and time characteristics of a data processing problem". They wanted to create "a notation that should enable the analyst to organize the problem around any piece of hardware". Their work was a first effort to create an abstract specification and invariant basis for designing different alternative implementations using different hardware components. A next step in IS modelling was taken by CODASYL, an IT industry consortium formed in 1959, who essentially aimed at the same thing as Young and Kent: the development of "a proper structure for machine independent problem definition language, at the system level of data processing". This led to the development of a specific IS information algebra.[8]

In the 1960s data modeling gained more significance with the initiation of the management information system (MIS) concept. According to Leondes (2002), "during that time, the information system provided the data and information for management purposes. The first generation database system, called Integrated Data Store (IDS), was designed by Charles Bachman at General Electric. Two famous database models, the network data model and the hierarchical data model, were proposed during this period of time".[9] Towards the end of the 1960s Edgar F. Codd worked out his theories of data arrangement, and proposed the relational model for database management based on first-order predicate logic.[10]

In the 1970s entity relationship modeling emerged as a new type of conceptual data modeling, originally proposed in 1976 by Peter Chen. Entity relationship models were being used in the first stage of information system design during the requirements analysis to describe information needs or the type of information that is to be stored in a database. This technique can describe any ontology, i.e., an overview and classification of concepts and their relationships, for a certain area of interest.

In the 1970s G.M. Nijssen developed the "Natural Language Information Analysis Method" (NIAM), and in the 1980s, in cooperation with Terry Halpin, developed this into Object-Role Modeling (ORM).

Further in the 1980s, according to Jan L. Harrington (2000), "the development of the object-oriented paradigm brought about a fundamental change in the way we look at data and the procedures that operate on data. Traditionally, data and procedures have been stored separately: the data and their relationship in a database, the procedures in an application program. Object orientation, however, combined an entity's procedure with its data."[11]

Types of data models

Database model

A database model is a theory or specification describing how a database is structured and used. Several such models have been suggested. Common models include:

Diagrams: Relational model; Hierarchical model; Network model; Flat model.

- Flat model: This may not strictly qualify as a data model. The flat (or table) model consists of a single, two-dimensional array of data elements, where all members of a given column are assumed to be similar values, and all members of a row are assumed to be related to one another.
- Hierarchical model: In this model data is organized into a tree-like structure, implying a single upward link in each record to describe the nesting, and a sort field to keep the records in a particular order in each same-level list.
- Network model: This model organizes data using two fundamental constructs, called records and sets. Records contain fields, and sets define one-to-many relationships between records: one owner, many members.
- Relational model: a database model based on first-order predicate logic. Its core idea is to describe a database as a collection of predicates over a finite set of predicate variables, describing constraints on the possible values and combinations of values.

Diagrams: Concept-oriented model; Star schema.

- Object-relational model: Similar to a relational database model, but objects, classes and inheritance are directly supported in database schemas and in the query language.
- Star schema: the simplest style of data warehouse schema. The star schema consists of a few "fact tables" (possibly only one, justifying the name) referencing any number of "dimension tables". The star schema is considered an important special case of the snowflake schema (a minimal sketch follows).
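
A minimal star-schema sketch, with assumed names: a single fact table referencing two dimension tables.

    CREATE TABLE dim_date (
        date_id       INTEGER PRIMARY KEY,
        calendar_date DATE
    );

    CREATE TABLE dim_product (
        product_id   INTEGER PRIMARY KEY,
        product_name VARCHAR(100)
    );

    -- The fact table sits at the center of the "star".
    CREATE TABLE fact_sales (
        date_id    INTEGER REFERENCES dim_date (date_id),
        product_id INTEGER REFERENCES dim_product (product_id),
        units_sold INTEGER,
        revenue    NUMERIC(12, 2)
    );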

Data structure diagram

Example of a data structure diagram.

A data structure diagram (DSD) is a diagram and data model used to describe conceptual data models by providing graphical notations which document entities and their relationships, and the constraints that bind them. The basic graphic elements of DSDs are boxes, representing entities, and arrows, representing relationships. Data structure diagrams are most useful for documenting complex data entities.

Data structure diagrams are an extension of the entity-relationship model (ER model). In DSDs, attributes are specified inside the entity boxes rather than outside of them, while relationships are drawn as boxes composed of attributes which specify the constraints that bind entities together. The E-R model, while robust, doesn't provide a way to specify the constraints between relationships, and becomes visually cumbersome when representing entities with several attributes. DSDs differ from the ER model in that the ER model focuses on the relationships between different entities, whereas DSDs focus on the relationships of the elements within an entity and enable users to fully see the links and relationships between each entity.

There are several styles for representing data structure diagrams, with the notable difference in the manner of defining cardinality. The choices are between arrow heads, inverted arrow heads (crow's feet), or numerical representation of the cardinality.

Example of IDEF1X entity-relationship diagrams used to model IDEF1X itself.[12]

Entity-relationship model

An entity-relationship model (ERM) is an abstract conceptual data model (or semantic data model) used in software engineering to represent structured data. There are several notations used for ERMs.

Geographic data model

A data model in Geographic information systems is a mathematical construct for representing geographic objects or surfaces as data. For example,

- the vector data model represents geography as collections of points, lines, and polygons;
- the raster data model represents geography as cell matrices that store numeric values;
- the Triangulated Irregular Network (TIN) data model represents geography as sets of contiguous, non-overlapping triangles.[13]

Figure captions: groups relate to the process of making a map; NGMDB databases linked together; NGMDB data model applications; representing 3D map information.[14]

Generic data model

Generic data models are generalizations of conventional data models. They define standardised general relation types, together with the kinds of things that may be related by such a relation type. Generic data models are developed as an approach to solve some shortcomings of conventional data models. For example, different modelers usually produce different conventional data models of the same domain. This can lead to difficulty in bringing the models of different people together and is an obstacle for data exchange and data integration. Invariably, however, this difference is attributable to different levels of abstraction in the models and differences in the kinds of facts that can be instantiated (the semantic expression capabilities of the models). The modelers need to communicate and agree on certain elements which are to be rendered more concretely, in order to make the differences less significant.

Semantic data model

Semantic data models.[12]

A semantic data model in software engineering is a technique to define the meaning of data within the context of its interrelationships with other data. A semantic data model is an abstraction which defines how the stored symbols relate to the real world.[12] A semantic data model is sometimes called a conceptual data model.

The logical data structure of a database management system (DBMS), whether hierarchical, network, or relational, cannot totally satisfy the requirements for a conceptual definition of data because it is limited in scope and biased toward the implementation strategy employed by the DBMS. Therefore, the need to define data from a conceptual view has led to the development of semantic data modeling techniques, that is, techniques to define the meaning of data within the context of its interrelationships with other data. As illustrated in the figure, the real world, in terms of resources, ideas, events, etc., is symbolically defined within physical data stores. A semantic data model is an abstraction which defines how the stored symbols relate to the real world. Thus, the model must be a true representation of the real world.[12]

Data model topics

Data architecture

Data architecture is the design of data for use in defining the target state and the subsequent planning needed to hit the target state. It is usually one of several architecture domains that form the pillars of an enterprise architecture or solution architecture. A data architecture describes the data structures used by a business and/or its applications. There are descriptions of data in storage and data in motion; descriptions of data stores, data groups and data items; and mappings of those data artifacts to data qualities, applications, locations, etc.

Essential to realizing the target state, data architecture describes how data is processed, stored, and utilized in a given system. It provides criteria for data processing operations that make it possible to design data flows and also control the flow of data in the system.

Data modeling

Main article: Data modeling

The data modeling process.

Data modeling in software engineering is the process of creating a data model by applying formal data model descriptions using data modeling techniques. Data modeling is a technique for defining business requirements for a database. It is sometimes called database modeling because a data model is eventually implemented in a database.[15]

The figure illustrates the way data models are developed and used today. A conceptual data model is developed based on the data requirements for the application that is being developed, perhaps in the context of an activity model. The data model will normally consist of entity types, attributes, relationships, integrity rules, and the definitions of those objects. This is then used as the start point for interface or database design.[4]

Data properties

Some important properties of data for which requirements need to be met are:

- definition-related properties[4]
  - relevance: the usefulness of the data in the context of your business.
  - clarity: the availability of a clear and shared definition for the data.
  - consistency: the compatibility of the same type of data from different sources.

Some important properties of data.[4]

- content-related properties
  - timeliness: the availability of data at the time required and how up to date that data is.
  - accuracy: how close to the truth the data is.
- properties related to both definition and content
  - completeness: how much of the required data is available.
  - accessibility: where, how, and to whom the data is available or not available (e.g. security).
  - cost: the cost incurred in obtaining the data, and making it available for use.

Data organization

Another kind of data model describes how to organize data using a database management system or other data management technology. It describes, for example, relational tables and columns or object-oriented classes and attributes. Such a data model is sometimes referred to as the physical data model, but in the original ANSI three schema architecture, it is called "logical". In that architecture, the physical model describes the storage media (cylinders, tracks, and tablespaces). Ideally, this model is derived from the more conceptual data model described above. It may differ, however, to account for constraints like processing capacity and usage patterns.

While data analysis is a common term for data modeling, the activity actually has more in common with the ideas and methods of synthesis (inferring general concepts from particular instances) than it does with analysis (identifying component concepts from more general ones). {Presumably we call ourselves systems analysts because no one can say systems synthesists.} Data modeling strives to bring the data structures of interest together into a cohesive, inseparable, whole by eliminating unnecessary data redundancies and by relating data structures with relationships.

A different approach is through the use of adaptive systems such as artificial neural networks that can autonomously create implicit models of data.

Data structure

A binary tree, a simple type of branching linked data structure.

A data structure is a way of storing data in a computer so that it can be used efficiently. It is an organization of mathematical and logical concepts of data. Often a carefully chosen data structure will allow the most efficient algorithm to be used. The choice of the data structure often begins from the choice of an abstract data type.

A data model describes the structure of the data within a given domain and, by implication, the underlying structure of that domain itself. This means that a data model in fact specifies a dedicated grammar for a dedicated artificial language for that domain. A data model represents classes of entities (kinds of things) about which a company wishes to hold information, the attributes of that information, and relationships among those entities and (often implicit) relationships among those attributes. The model describes the organization of the data to some extent irrespective of how data might be represented in a computer system.

The entities represented by a data model can be tangible entities, but models that include such concrete entity classes tend to change over time. Robust data models often identify abstractions of such entities. For example, a data model might include an entity class called "Person", representing all the people who interact with an organization. Such an abstract entity class is typically more appropriate than ones called "Vendor" or "Employee", which identify specific roles played by those people.

Common data structures include the linked list, the hash table, the array, and the stack; a small sketch combining two of these follows.
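For instance, a stack can itself be built from linked nodes. The sketch below is illustrative only.

class StackNode:
    """One cell of a singly linked list used as a stack."""
    def __init__(self, value, below):
        self.value = value
        self.below = below

class Stack:
    """Last-in, first-out stack built from linked nodes."""
    def __init__(self):
        self.top = None

    def push(self, value):
        self.top = StackNode(value, self.top)

    def pop(self):
        if self.top is None:
            raise IndexError("pop from empty stack")
        value = self.top.value
        self.top = self.top.below
        return value

s = Stack()
s.push("a"); s.push("b")
assert s.pop() == "b" and s.pop() == "a"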

Data model theory

The term data model can have two meanings:[16]

1. A data model theory, i.e. a formal description of how data may be structured and accessed.
2. A data model instance, i.e. the application of a data model theory to create a practical data model instance for some particular application.

A data model theory has three main components:[16]

- The structural part: a collection of data structures which are used to create databases representing the entities or objects modeled by the database.
- The integrity part: a collection of rules governing the constraints placed on these data structures to ensure structural integrity.
- The manipulation part: a collection of operators which can be applied to the data structures, to update and query the data contained in the database.

For example, in the relational model, the structural part is based on a modified concept of the mathematical relation; the integrity part is expressed in first-order logic and the manipulation part is expressed using the relational algebra, tuple calculus and domain calculus.
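The toy Python sketch below (an illustration, not a library) mirrors those three parts: a relation as a set of tuples (structural), a key constraint (integrity), and selection and projection operators (manipulation).

# Structural part: a relation as a set of tuples over named attributes.
employees = {
    ("e1", "Alice", "Sales"),
    ("e2", "Bob", "Engineering"),
}
attributes = ("id", "name", "dept")

# Integrity part: a constraint, here uniqueness of the "id" attribute.
ids = [row[0] for row in employees]
assert len(ids) == len(set(ids)), "key constraint violated"

# Manipulation part: relational-algebra-style operators.
def select(relation, predicate):
    """Selection: keep only the tuples satisfying the predicate."""
    return {row for row in relation if predicate(row)}

def project(relation, indexes):
    """Projection: keep only the listed attribute positions."""
    return {tuple(row[i] for i in indexes) for row in relation}

sales = select(employees, lambda row: row[2] == "Sales")
names = project(employees, [1])  # {("Alice",), ("Bob",)}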

A data model instance is created by applying a data model theory. This is typically done to solve some business enterprise requirement. Business requirements are normally captured by a semantic logical data model, which is transformed into a physical data model instance from which a physical database is generated. For example, a data modeler may use a data modeling tool to create an entity-relationship model of the corporate data repository of some business enterprise. This model is transformed into a relational model, which in turn generates a relational database.
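A hypothetical sketch of that pipeline, with invented entity and column names: a tiny entity description is translated into DDL, from which a physical SQLite database is generated.

import sqlite3

# Invented entity descriptions standing in for an entity-relationship model.
entities = {
    "Customer": {"customer_id": "INTEGER PRIMARY KEY", "name": "TEXT"},
    "Account":  {"account_id": "INTEGER PRIMARY KEY",
                 "customer_id": "INTEGER REFERENCES Customer(customer_id)"},
}

conn = sqlite3.connect(":memory:")
for entity, columns in entities.items():
    cols = ", ".join(f"{name} {sqltype}" for name, sqltype in columns.items())
    conn.execute(f"CREATE TABLE {entity} ({cols})")  # the generated DDL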

Patterns

Patterns[17] are common data modeling structures that occur in many data models.

Related models

Data flow diagram

Data Flow Diagram example.[18]

A data flow diagram (DFD) is a graphical representation of the "flow" of data through an information system. It differs from the flowchart as it shows the data flow instead of the control flow of the program. A data flow diagram can also be used for the visualization of data processing (structured design). Data flow diagrams were invented by Larry Constantine, the original developer of structured design,[19] based on Martin and Estrin's "data flow graph" model of computation.

It is common practice to draw a context-level data flow diagram first, which shows the interaction between the system and outside entities. The DFD is designed to show how a system is divided into smaller portions and to highlight the flow of data between those parts. This context-level data flow diagram is then "exploded" to show more detail of the system being modeled.

Information model

Example of an EXPRESS-G information model.

An information model is not a type of data model, but more or less an alternative model. Within the field of software engineering, both a data model and an information model can be abstract, formal representations of entity types that include their properties, relationships, and the operations that can be performed on them. The entity types in the model may be kinds of real-world objects, such as devices in a network, or they may themselves be abstract, such as the entities used in a billing system. Typically, they are used to model a constrained domain that can be described by a closed set of entity types, properties, relationships, and operations.

According to Lee (1999),[20] an information model is a representation of concepts, relationships, constraints, rules, and operations to specify data semantics for a chosen domain of discourse. It can provide a sharable, stable, and organized structure of information requirements for the domain context.[20] More generally, the term information model is used for models of individual things, such as facilities, buildings, process plants, etc. In those cases the concept is specialised to Facility Information Model, Building Information Model, Plant Information Model, etc. Such an information model is an integration of a model of the facility with the data and documents about the facility. An information model provides formalism to the description of a problem domain without constraining how that description is mapped to an actual implementation in software. There may be many mappings of the information model. Such mappings are called data models, irrespective of whether they are object models (e.g. using UML), entity-relationship models or XML schemas.

Document Object Model, a standard object model for representing HTML or XML.

Object model

An object model in computer science is a collection of objects or classes through which a program can examine and manipulate some specific parts of its world; in other words, the object-oriented interface to some service or system. Such an interface is said to be the object model of the represented service or system. For example, the Document Object Model (DOM) is a collection of objects that represent a page in a web browser, used by script programs to examine and dynamically change the page. There is a Microsoft Excel object model[21] for controlling Microsoft Excel from another program, and the ASCOM Telescope Driver[22] is an object model for controlling an astronomical telescope.
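As a small, concrete illustration, Python's built-in xml.dom.minidom exposes a DOM through which a program can examine and change a document:

from xml.dom.minidom import parseString

# Examine and manipulate a document through its object model (the DOM).
doc = parseString("<page><title>Old title</title></page>")
title = doc.getElementsByTagName("title")[0]
print(title.firstChild.data)            # examine: "Old title"
title.firstChild.data = "New title"     # manipulate the document
print(doc.documentElement.toxml())      # <page><title>New title</title></page>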

In computing the term object model has a distinct second meaning: the general properties of objects in a specific computer programming language, technology, notation or methodology that uses them. For example, the Java object model, the COM object model, or the object model of OMT. Such object models are usually defined using concepts such as class, message, inheritance, polymorphism, and encapsulation. There is an extensive literature on formalized object models as a subset of the formal semantics of programming languages.
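A brief Python sketch of those concepts (the class names are invented): Circle and Square inherit from Shape, encapsulate their own state, and answer the same "area" message polymorphically.

class Shape:
    """Base class: defines the messages every shape answers."""
    def area(self):
        raise NotImplementedError

class Circle(Shape):           # inheritance
    def __init__(self, radius):
        self._radius = radius  # encapsulation: state kept internal

    def area(self):            # polymorphism: same message, specific behavior
        return 3.141592653589793 * self._radius ** 2

class Square(Shape):
    def __init__(self, side):
        self._side = side

    def area(self):
        return self._side ** 2

# The same message ("area") sent to different objects yields
# class-specific behavior.
for shape in (Circle(1.0), Square(2.0)):
    print(type(shape).__name__, shape.area())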

Object-Role Model

Example of the application of Object-Role Modeling in a "Schema for Geologic Surface", Stephen M. Richard (1999).[23]

Object-Role Modeling (ORM) is a method for conceptual modeling, and can be used as a tool for information and rules analysis.[24]

Object-Role Modeling is a fact-oriented method for performing systems analysis at the conceptual level. The quality of a database application depends critically on its design. To help ensure correctness, clarity, adaptability and productivity, information systems are best specified first at the conceptual level, using concepts and language that people can readily understand.

The conceptual design may include data, process and behavioral perspectives, and the actual DBMS used to implement the design might be based on one of many logical data models (relational, hierarchic, network, object-oriented etc.).[25]
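As a loose, hypothetical illustration of the fact-oriented style, elementary facts can be held as small named relationships and verbalized into sentences people can readily check:

# Elementary facts as named relationships (all names are invented).
works_for = {("Alice", "Acme"), ("Bob", "Acme")}   # Person works for Company
was_hired_on = {("Alice", "2021-03-01")}           # Person was hired on Date

def verbalize():
    """Verbalize facts in language people can readily understand."""
    for person, company in sorted(works_for):
        print(f"{person} works for {company}.")
    for person, date in sorted(was_hired_on):
        print(f"{person} was hired on {date}.")

verbalize()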

Unified Modeling Language models

The Unified Modeling Language (UML) is a standardized general-purpose modeling language in the field of software engineering. It is a graphical language for visualizing, specifying, constructing, and documenting the artifacts of a software-intensive system. The Unified Modeling Language offers a standard way to write a system's blueprints, including:[26]

- Conceptual things such as business processes and system functions
- Concrete things such as programming language statements, database schemas, and reusable software components

UML offers a mix of functional models, data models, and database models.

Weak entity



In a relational database, a weak entity is an entity that cannot be uniquely identified by its attributes alone; therefore, it must use a foreign key in conjunction with its attributes to create a primary key. The foreign key is typically a primary key of an entity it is related to.

In entity-relationship diagrams, a weak entity set is indicated by a bold rectangle (the entity) connected by a bold arrow to a bold diamond (the relationship). This type of relationship is called an identifying relationship; in IDEF1X notation it is represented by an oval entity rather than a square entity for base tables. An identifying relationship is one where the primary key of the parent entity is propagated to the child weak entity as part of that entity's primary key.

In general (though not necessarily) a weak entity does not have any items in its primary key other than its inherited primary key and a sequence number. There are two types of weak entities: associative entities and subtype entities. The latter represents a crucial type of normalization, where the subtype entities inherit the attributes of the super-type entity based on the value of a discriminator.

In IDEF1X, a government standard for capturing requirements, possible sub-type relationships are:

- Complete subtype relationship, when all categories are known.
- Incomplete subtype relationship, when all categories may not be known.

A classic example of a weak entity without a sub-type relationship would be the "header/detail" records in many real-world situations such as claims, orders, and invoices, where the header captures information common across all forms and the detail captures information specific to individual items.

The standard example of a complete subtype relationship is the party entity. Given the discriminator PARTY TYPE (which could be individual, partnership, C Corporation, Sub Chapter S Association, Association, Governmental Unit, Quasi-governmental agency), the two subtype entities are PERSON, which contains individual-specific information such as first and last name and date of birth, and ORGANIZATION, which would contain such attributes as the legal name, and organizational hierarchies such as cost centers. When sub-type relationships are rendered in a database, the super-type becomes what is referred to as a base table. The sub-types are considered derived tables, which correspond to weak entities. Referential integrity is enforced via cascading updates and deletes.
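A hedged sketch of such a rendering in SQLite (column names invented): Party becomes the base table, the sub-types borrow its key, and referential integrity is enforced with cascading deletes.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE Party (
        party_id   INTEGER PRIMARY KEY,
        party_type TEXT NOT NULL      -- the discriminator
    )
""")
conn.execute("""
    CREATE TABLE Person (
        party_id   INTEGER PRIMARY KEY
                   REFERENCES Party(party_id) ON DELETE CASCADE,
        first_name TEXT,
        last_name  TEXT
    )
""")
conn.execute("""
    CREATE TABLE Organization (
        party_id   INTEGER PRIMARY KEY
                   REFERENCES Party(party_id) ON DELETE CASCADE,
        legal_name TEXT
    )
""")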

Example

Consider a database that records customer orders, where an order is for one or more of the items that the enterprise sells. The database would contain a table identifying customers by a customer number (primary key); another identifying the products that can be sold by a product number (primary key); and it would contain a pair of tables describing orders.

One of the tables could be called Orders and it would have an order number (primary key) to identify this order uniquely, and would contain a customer number (foreign key) to identify who the products are being sold to, plus other information such as the date and time when the order was placed, how it will be paid for, where it is to be shipped to, and so on.

The other table could be called OrderItem; it would be identified by a compound key consisting of the order number (foreign key) and an item line number, plus the product number (foreign key) that was ordered, the quantity, the price, any discount, any special options, and so on. There may be zero, one or many OrderItem entries corresponding to an Order entry, but no OrderItem entry can exist unless the corresponding Order entry exists. (The zero OrderItem case normally only applies transiently, when the order is first entered and before the first ordered item has been recorded.)
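In SQLite-style DDL, the schema just described might look as follows (a sketch; column names are invented): the compound primary key is what makes OrderItem a weak entity.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE Orders (
        order_number INTEGER PRIMARY KEY,
        customer_id  INTEGER NOT NULL,
        order_date   TEXT
    )
""")
conn.execute("""
    CREATE TABLE OrderItem (
        order_number INTEGER NOT NULL
                     REFERENCES Orders(order_number) ON DELETE CASCADE,
        line_number  INTEGER NOT NULL,
        product_id   INTEGER NOT NULL,
        quantity     INTEGER,
        price        REAL,
        PRIMARY KEY (order_number, line_number)  -- no identity of its own
    )
""")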

The OrderItem table stores weak entities precisely because an OrderItem has no meaning independent of the Order. Some might argue that an OrderItem does have some meaning on its own; it records that at some time not identified by the record, somebody not identified by the record ordered a certain quantity of a certain product. This information might be of some use on its own, but it is of limited use. For example, as soon as you want to find seasonal or geographical trends in the sales of the item, you need information from the related Order record.


Data hierarchy



Data hierarchy refers to the systematic organization of data, often in a hierarchical form. Data organization involves fields, records, files, and so on. A data field holds a single fact. Consider a date field, e.g. "September 19, 2004". This can be treated as a single date field (e.g. birthdate), or as three fields: month, day of month, and year.

A record is a collection of related fields. An Employee record may contain name fields, address fields, a birthdate field, and so on.

A file is a collection of related records. If there are 100 employees, then each employee would have a record (e.g. called Employee Personal Details record) and the collection of 100 such records would constitute a file (in this case, called Employee Personal Details file).

Files are integrated into a database. This is done using a Database Management System. If there are other facets of employee data that we wish to capture, then other files such as Employee Training History file and Employee Work History file could be created as well.

The above is a view of data seen by a computer user.

The above structure can be seen in the hierarchical model, which is one way to organize data in a database.

In terms of data storage, data fields are made of bytes and these in turn are made up of bits.
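A short Python sketch of the hierarchy end to end, from a field down to bytes and bits and up to records and a file:

# A field is stored as bytes, which are in turn made up of bits;
# fields group into records, and records group into a file.
field = "September 19, 2004"            # one field holding a single fact
encoded = field.encode("utf-8")         # the field as bytes
bits = "".join(f"{byte:08b}" for byte in encoded)  # ...and as bits

record = {"name": "Alice", "birthdate": field}     # related fields -> record
employee_file = [record]                           # related records -> file
print(len(encoded), "bytes,", len(bits), "bits")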