Color Coding ER-Diagrams, One Approach to Modeling Semi-Structured Data. Entity-Relationship Diagrams Used for Aggregate Oriented Databases
International Journal of Innovative Studies in Sciences and Engineering Technology (IJISSET)
ISSN 2455-4863 (Online), www.ijisset.org, Volume: 6 Issue: 6 | 2020
© 2020, IJISSET

Anders Lassen1,2, Carsten Ejstrup1
1 Datalogisk Institut, University of Copenhagen (DIKU), Copenhagen, DK-2200N, Denmark
2 Institute for People and Technology (IMT), Roskilde University, Roskilde, Denmark

Abstract: Aggregate-oriented databases [26] support collections of documents with unstructured data. The unstructured markup means that anything can be stored anywhere. Notations for conceptual modeling with entity-relationship diagrams have been extended with a notation for aggregation, which satisfies the nesting of structures in aggregate-oriented databases [11]. Conceptual modeling with ER-diagrams [5,27] has the benefit that a generic diagram of the whole database is sufficient for moving forward to a design activity, while conceptual modeling of aggregate-oriented databases has established a notation for modeling [2]. However, that model is specific and not generic: solutions that prefer a different nesting need a separate aggregate data model. Our approach to conceptual modeling does not violate the existing relational constraints and provides an extra conceptual layer where document structure can be modeled by color coding. We will show that modeling decisions can readily be made and materialized in document views within this notation.

In relational database theory, the study of entity sets is based on the understanding of functional dependencies and normalization. This theory rests on the fundamental property that every attribute is scalar, not composite. To reach first normal form of a relation, all attribute values must contain atomic values only [Date 1990, p. 526-527][8]. This rules out the composite and multivalued attributes found in JSON or XML. By functional decomposition such representations would be split into a number of relations to reach progressively higher normal forms. The outcome of ER diagramming for relational databases is in many cases a database schema in third normal form or BCNF. But for semi-structured databases, not even the first normal form is reached, underlining the necessity for a notation.

Keywords: ER-diagram · conceptual modeling · aggregate-oriented databases · aggregate-oriented data model · entity-relationship modeling

Introduction

Conceptual modeling is used as a ‘tool of the trade’ when analyzing for structure in relational databases. An early account of databases is given by Date in [8]. The most common use of conceptual modeling for relational databases is the entity-relationship diagram (ER-diagram) of Chen 1976 [5,6], depicting entities and their relationships. Chen represents entities as boxes and relationships as diamonds. This is slightly ambiguous, as the theory revolves around entity sets and relationship sets. Thus, ER-diagrams could equally well be interpreted as entity-set-relationship-set diagrams. The reason for naming entities in the singular in an entity-relationship diagram is that an entity is but one tuple in an entity set. The box and diamond notation can be seen in Figure 1. In chapter 3 of [5], in the section covering the network model, the diamond notation is left out of the data structure diagram while it is preserved for the ER-diagram. In [15,25] boxes are shown as entity sets and diamonds are shown as relationship sets.

The extended entity relationship diagram (EER-diagram) [25,7,11] extends the ER notation with object-oriented notations known from UML: aggregation/specialization, including function methods, and multiple inheritance. The higher-order entity relationship model diagram (HERM) attempts a diagrammatic form that is not bound to the relational model [28]. The circle with a cross is used in [28] in modeling. Relational databases are well described in [15,25,27], which introduce relational algebra, decomposition using functional dependencies, normalization, SQL and conceptual modeling, among others.

In modeling, the concept of functional dependencies and normalization guides reasoning about the selection of attributes in relations. The conceptual modeling is manifested in an ER-diagram that may take any form according to the modeling tool at hand (crowfoot notation [9], box-diamond notation [15,25,21], UML class diagram style notation [10,21], and data-diagram style [1]). Chen [5] works with four levels of abstraction: modeling entities and their relationships, which exist in your mind, is level one; organizing the information structure by normalization or functional decomposition is level two; modeling access-path-independent data structures, e.g. modeling tables and their constraints, is level three; and modeling access-path-dependent data structures is level four. The data-diagram style is used in [9] for modeling level 3 diagrams with the intended database schema objects. The best guideline for designers is to explain their use of notation in their reporting. The fundamental constructs in an ER-diagram are the identification of entity sets (entities) and the relationships between entities, drawn as one-one, one-many or many-many relationships according to the notation used. These are the fundamental properties. In addition, entity sets can further be modeled as an is-a hierarchy, in which case entity sets of similar type are modeled. Weak entity sets model close ownership by a parent entity set, suggesting cascading delete properties between the two.

The conceptual model is then used by the designer as a starting point for creating a set of tables and constraints in SQL data definition language (DDL): primary keys, foreign keys, suggested indexes, and domain constraints. For a one-many relationship, a foreign key is added as an attribute on the entity set at the many side; a new mapping table is created for each many-many relationship. Even cascading policies can be considered. In particular the weak entity set is interesting when modeling for semi-structured databases [15]. Semi-structured databases are exemplified by the XML, JSON and BSON formats found in document stores (MongoDB, Couchbase, Cassandra). The underlying structure of a semi-structured database is an undirected graph, but constraints modeled as foreign key relationships can be defined (XMLSchema for XML). In this case the mapping to an ER-diagram is readily identified. To outline the problem, the use of diagramming to reason about databases is well established, and the notations for ER-, EER- and HERM-diagrams are so rich that most modeling features are included.

Consider modeling for a semi-structured database. In this case lists and composite data structures must be represented as well. The model is then no longer adequate to describe the data model in terms of overall normalization, though it may be adequate locally. In this case a level-of-detail approach could preserve the high-level ER-diagram, observing the rules of normalization, and at a lower level, in a subdiagram, model the semi-structured representations. The XML structures that are not readily extracted from the ER-diagram can then readily be extracted from the subdiagrams.

It is counterintuitive to use a conceptual modeling tool to reach a certain database model and then break it down by modeling for semi-structured documents. No structure is upheld in a semi-structured database and yet, as a document database grows, some structure is advised. An ER-diagram upholds the theory of relational data, and thus is a modeling tool for the chosen relationships between data in the domain.

Interviews and observations recorded on tape can be transcribed and the transcription analyzed with a thematic analysis. In this process color coding represents different themes [4,16,19,18].

In this paper we aim to present one color coding scheme as a prototype for a more general method of using conceptual relational modeling as the data analysis tool for both relational databases and semi-structured databases. This is accomplished by mapping decisions about semi-structured layout to color codes.

Supporting Literature for Conceptual Modeling Languages

In Ng (2010) [23] a formal definition of the entity relationship model is given. Notably, this work gives attention to value sets. The relational theory of normalization says that to reach first normal form every attribute must be single valued. But semi-structured databases do not obey relational theory, and multivalued value sets (lists) and composite fields (mappings into cartesian products of value sets) are highly relevant to preserve when modeling for semi-structured databases.

[23] states that a relationship relation is regular if only attribute values of the involved entities are involved in the primary key of the relationship relation. This means that the relationship relation has local scope to the involved entity sets. A relationship relation is weak if any part of the primary key of the relationship relation is identified by another relationship. This is phrased like:

... Since a relationship is identified by the involved entities, the primary key of a relationship can be represented by the primary keys of the involved entities. We also have two forms of relationship relations. If all the entities in the relationship are identified by their own attribute’s values, we shall call it a regular relationship relation. If some entities in the relationship are identified by other relationships, we shall call it a weak relationship relation. ... [23, p224]

This modeling distinction limits the modeling of relationships to weak entity sets by not automatically inferring that the relationship to a weak entity must be a weak relationship. An entity set can likewise be defined as regular if the primary key is identified by the entity’s own attributes alone, or as weak if the primary key must include a relationship.

...If relationships are used for identifying the entities, we shall call it a weak entity relation.
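
The distinction between a regular and a weak entity relation, together with the cascading delete that a weak entity set suggests, can be illustrated with a small sketch. It is not the authors' notation or method. It assumes a hypothetical schema (purchase_order as the regular entity relation, order_line as the weak one) and uses SQLite through Python's standard sqlite3 module purely for illustration. The weak entity's primary key includes the owner's key, and the same rows are then nested as a list inside one document, which is the kind of multivalued structure that first normal form forbids.

```python
import json
import sqlite3

# Hypothetical schema, chosen only for illustration:
# purchase_order is a regular entity relation (identified by its own attribute),
# order_line is a weak entity relation (its key includes the owning order).
DDL = """
CREATE TABLE purchase_order (
    order_id INTEGER PRIMARY KEY,
    customer TEXT NOT NULL
);
CREATE TABLE order_line (
    order_id INTEGER NOT NULL
        REFERENCES purchase_order(order_id) ON DELETE CASCADE,
    line_no  INTEGER NOT NULL,
    product  TEXT NOT NULL,
    quantity INTEGER NOT NULL,
    PRIMARY KEY (order_id, line_no)
);
"""


def build_db():
    """Create an in-memory database with one order and two order lines."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys (and cascades) when this is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(DDL)
    conn.execute("INSERT INTO purchase_order VALUES (1, 'Ada')")
    conn.executemany(
        "INSERT INTO order_line VALUES (?, ?, ?, ?)",
        [(1, 1, "pen", 2), (1, 2, "ink", 1)],
    )
    conn.commit()
    return conn


def order_as_document(conn, order_id):
    """Nest the weak entity rows inside their owner, as a document store would."""
    owner = conn.execute(
        "SELECT order_id, customer FROM purchase_order WHERE order_id = ?",
        (order_id,),
    ).fetchone()
    if owner is None:
        return None
    lines = conn.execute(
        "SELECT line_no, product, quantity FROM order_line "
        "WHERE order_id = ? ORDER BY line_no",
        (order_id,),
    ).fetchall()
    return {
        "order_id": owner[0],
        "customer": owner[1],
        # A multivalued, composite attribute: not in first normal form.
        "lines": [
            {"line_no": n, "product": p, "quantity": q} for n, p, q in lines
        ],
    }


def delete_order(conn, order_id):
    """Delete the owner; the cascade removes its weak entities."""
    conn.execute("DELETE FROM purchase_order WHERE order_id = ?", (order_id,))
    conn.commit()


def count_lines(conn):
    """Return the number of rows left in the weak entity relation."""
    return conn.execute("SELECT COUNT(*) FROM order_line").fetchone()[0]


if __name__ == "__main__":
    db = build_db()
    print(json.dumps(order_as_document(db, 1), indent=2))
    delete_order(db, 1)
    print("order lines after deleting the owner:", count_lines(db))
```

In this sketch the relational tables stay normalized, while the nested document is only a view derived from them. Which entity sets to nest in that view is the kind of decision that the color coding described in this paper is meant to record.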