Explaining Query Answer Completeness and Correctness with Minimal Pattern Covers Fatma-Zohra Hannou, Bernd Amann, Mohamed-Amine Baazizi


To cite this version: Fatma-Zohra Hannou, Bernd Amann, Mohamed-Amine Baazizi. Explaining Query Answer Completeness and Correctness with Minimal Pattern Covers. 2019. hal-01982575

HAL Id: hal-01982575
https://hal.archives-ouvertes.fr/hal-01982575
Preprint submitted on 15 Jan 2019

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Fatma-Zohra Hannou, Bernd Amann, Mohamed-Amine Baazizi
Sorbonne Université, CNRS, LIP6, Paris, France
[email protected] · [email protected] · [email protected]

ABSTRACT

Information incompleteness is a major data quality issue which is amplified by the increasing amount of data collected from unreliable sources. Assessing the completeness of data is crucial for determining the quality of the data itself, but also for verifying the validity of query answers over incomplete data. While there exists an important amount of work on modeling data completeness, deriving this completeness information has not received much attention. In this work, we tackle the issue of efficiently describing and inferring knowledge about data completeness w.r.t. a complete reference data set and study the use of a pattern algebra for summarizing the completeness and validity of query answers. We describe an implementation and experiments with a real-world dataset to validate the effectiveness and the efficiency of our approach.

PVLDB Reference Format:
Fatma-Zohra Hannou, Bernd Amann and Mohamed-Amine Baazizi. Explaining Query Answer Completeness and Correctness with Minimal Pattern Covers. PVLDB, 12(xxx): xxxx-yyyy, 2019.
DOI: https://doi.org/10.14778/xxxxxxx.xxxxxxx

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/. For any use beyond those covered by this license, obtain permission by emailing [email protected]. Copyright is held by the owner/author(s). Publication rights licensed to the VLDB Endowment.
Proceedings of the VLDB Endowment, Vol. 12, No. xxx
ISSN 2150-8097.
DOI: https://doi.org/10.14778/xxxxxxx.xxxxxxx

1. INTRODUCTION

Information incompleteness is a major data quality issue that is exacerbated by the growing number of applications collecting data from distributed, open, and unreliable environments. Sensor networks and data integration are significant examples in which data incompleteness naturally arises due to hardware or software failures, data incompatibility, missing data access authorizations, etc. In all these situations, querying and analyzing data can lead to deriving partial or incorrect answers. Extensive effort has been devoted to representing and querying incomplete databases [11, 8, 3, 7, 13, 16]. The common characteristic of these approaches is the use of some intensional or extensional information about completeness for deciding whether a query returns complete answers and, in some cases, for annotating the query answers with completeness meta-data. Despite these efforts, reasoning about data completeness remains tricky due to the complexity of exhaustively representing and deriving information about available and missing data in large datasets.

In many situations, datasets and query results explicitly or implicitly depend on some complete reference (or master) datasets that describe their expected full extent. For instance, sensor databases are usually constructed within a spatio-temporal reference delimiting the coverage of the captured data. According to [9], 80% of enterprises maintain master data alongside their analytical databases (customer information, products). In other data-centric applications, a reference is defined by domain experts during database design and updated when necessary. Finally, it may also sometimes be useful to use an existing table or query result as a reference for deriving a comprehensive representation of available and missing information in some specific context.

Representing information completeness. To understand the importance of using reference datasets for assessing data completeness, consider the database in Table 2, which depicts an example of a sensor database. The table Energy reports daily energy measurements for locations specified by floor (fl) and room (ro). This database is endowed with a spatial reference MAP describing all locations in some building and a calendar CAL indicating the expected temporal coverage (Table 1). Observe that both reference data sets are not necessarily validated master data but might have been built by an expert for a specific analysis task.

Table 1: Reference tables

  MAP | fl  ro        CAL | we       da
      | f1  r1            | w1   ×   Mon
      | f1  r2            | w2       Tue
      | f2  r1

For various reasons, the current database misses some values, which are pinpointed in grayscale in the Energy table (rows m0, m1 and m2 below, with the missing measure marked "?").

Table 2: Data table

  Energy | fl  ro  we  da   kWh
  t0     | f1  r1  w1  Mon  10
  t1     | f1  r1  w1  Tue  12
  t2     | f1  r1  w2  Mon  10
  m0     | f1  r1  w2  Tue   ?
  t3     | f1  r2  w1  Mon   8
  t4     | f1  r2  w1  Tue  10
  m1     | f1  r2  w2  Mon   ?
  m2     | f1  r2  w2  Tue   ?
  t5     | f2  r1  w1  Mon  12
  t6     | f2  r1  w1  Tue   7
  t7     | f2  r1  w2  Mon   8
  t8     | f2  r1  w2  Tue   8

Assume that an analyst wants to gain full knowledge about the segments of the data that are available or missing. To facilitate her understanding of the data, the analyst would like a summarized version of the completeness information and may opt for a pattern-based representation like the one presented in Table 3. This table shows two tables PE and P̄E capturing the available and the missing information of table Energy respectively. More exactly, table PE contains pattern tuples that capture all partitions which are complete w.r.t. the reference. For instance, pattern tuple p0 indicates that all measurements pertaining to week w1 are available, whatever the values of floor, room or day are. Pattern p3 in table P̄E reports that no measure can be found for room r2 and week w2. This representation is compact as it only reports on the largest possible partitions that are complete (resp. missing) in the data. It is also covering as it reports on every possible maximal complete (resp. missing) partition of the data. Without this representation, the analyst might have to issue several queries without any guarantee of deriving the exhaustive completeness information that the pattern representation offers in a rather natural manner.

Table 3: Completeness pattern tables of Energy

  PE | fl  ro  we  da        P̄E | fl  ro  we  da
  p0 | ∗   ∗   w1  ∗         p3 | ∗   r2  w2  ∗
  p1 | f2  ∗   ∗   ∗         p4 | f1  r1  ∗   Tue
  p2 | f1  r1  ∗   Mon

Analyzing information completeness. Pattern tables can become very large due to the randomness of missing information and may not be easily analyzed by hand. Querying pattern tables with SQL turns out to be a convenient means for extracting and reasoning about data completeness. In our example, if an analyst wants to identify the floors where all measures are available, she could issue the following query Q0 on PE and notice that, since p1 is the only pattern satisfying the predicate of Q0, the only floor satisfying her criteria is f2.

  Q0: select fl from PE
      where ro = '∗' and we = '∗' and da = '∗';

Querying pattern tables has another interesting application when it comes to extracting the completeness information of query answers obtained from incomplete input tables directly from the patterns of the input data. To illustrate this idea, consider Q1, which retrieves all measures referring to week w2:

  Q1: select ∗ from Energy where we = 'w2';

By evaluating Q1 over PE and P̄E respectively, we obtain table Q1(PE) and table Q1(P̄E): Q1 produces a complete answer for floor f2, while the first tuple in Q1(P̄E) indicates that no answer is returned for room r2 during week w2:

  Q1(PE) | fl  ro  we  da        Q1(P̄E) | fl  ro  we  da
         | f2  ∗   ∗   ∗                 | ∗   r2  w2  ∗
         | f1  r1  ∗   Mon               | f1  r1  w2  Tue

The pattern completeness model can also play a crucial role in validating the correctness of aggregation query answers. When such queries are applied to incomplete data, the values resulting from aggregating incomplete partitions are simply incorrect, and there should be a means of notifying the user of this fact. To illustrate the role of the pattern model in detecting potential problems with aggregation queries, consider Q2, which sums the energy consumption over all day values.

  Q2: select fl, ro, we, sum(kWh) as kWh
      from Energy
      group by fl, ro, we;

This query returns both valid and non-valid answers, produced by complete and incomplete partitions respectively. Incomplete partitions producing incorrect results can easily be identified by patterns for which the value of the attribute day is a constant instead of ∗. These "incompleteness" patterns are separated from the correct partitions in Q2(PE) and Q2(P̄E):

  Q2(PE) | fl  ro  we        Q2(P̄E) | fl  ro  we
         | ∗   ∗   w1                | ∗   r2  w2
         | f2  ∗   ∗                 | f1  r1  ∗
         | f1  r1  ∗

Annotating query results. Completeness pattern tables can also be used for rewriting aggregation queries to automatically annotate the produced results. For example, Table 4 shows the annotated result of query Q2, where the completeness information is directly extracted from PE and P̄E.

Table 4: Annotated Query Answer

  Result Q2 | fl  ro  we  kWh  annot
            | f1  r1  w1  22   ok
            | f1  r1  w2  10   incorr
            | f1  r2  w1  18   ok
            | f2  r1  w1  19   ok
            | f2  r1  w2  16   ok
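Pattern tables like PE can, in principle, be derived mechanically from the reference and the available data. The sketch below is a naive brute-force illustration, not the paper's algorithm (which computes minimal pattern covers efficiently): it enumerates candidate patterns over the active domains of the Energy example, keeps the "complete" ones (those covering only available reference cells), and greedily assembles a cover of the available cells. The greedy choice may return a different, though equally covering, pattern set than Table 3.

```python
from itertools import product

# Naive sketch (assumed semantics, not the paper's algorithm): derive a
# complete-pattern cover for the Energy example from its reference tables.
STAR = '∗'

# Reference = MAP x CAL (Table 1); Energy misses three cells (m0, m1, m2).
MAP = [('f1', 'r1'), ('f1', 'r2'), ('f2', 'r1')]
CAL = [(w, d) for w in ('w1', 'w2') for d in ('Mon', 'Tue')]
REF = [loc + cal for loc in MAP for cal in CAL]          # 12 expected cells
MISSING = {('f1', 'r1', 'w2', 'Tue'),
           ('f1', 'r2', 'w2', 'Mon'),
           ('f1', 'r2', 'w2', 'Tue')}
AVAILABLE = [c for c in REF if c not in MISSING]

def covers(pattern, cell):
    """A pattern covers a cell iff each entry is '∗' or the cell's constant."""
    return all(p in (STAR, v) for p, v in zip(pattern, cell))

def complete(pattern):
    """Complete = covers at least one reference cell and no missing cell."""
    covered = [c for c in REF if covers(pattern, c)]
    return bool(covered) and not any(c in MISSING for c in covered)

# Candidates: every combination of reference constants and '∗' per attribute.
domains = [sorted({c[i] for c in REF}) + [STAR] for i in range(4)]
candidates = [p for p in product(*domains) if complete(p)]

# Greedy cover of the available cells by complete patterns.
cover, uncovered = [], set(AVAILABLE)
while uncovered:
    best = max(candidates, key=lambda p: sum(covers(p, c) for c in uncovered))
    cover.append(best)
    uncovered -= {c for c in uncovered if covers(best, c)}

print(cover)  # a small set of complete patterns covering all available cells
```

The cover it prints is far smaller than the nine available tuples it summarizes, which is the compactness argument made above; the paper's minimal pattern covers make this derivation tractable on large datasets.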
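Queries like Q0 and Q1 can be mimicked on pattern tables with plain wildcard matching. The sketch below assumes a simplified semantics (a pattern survives an equality selection iff its entry for the selected attribute is '∗' or equals the constant); the paper's pattern algebra is richer and may, for instance, also instantiate wildcards, so this only illustrates the idea.

```python
# Hedged sketch: querying the completeness pattern tables of Table 3 under a
# simplified, assumed wildcard semantics (not the paper's full pattern algebra).
WILDCARD = '∗'

# Pattern tables of Table 3, schema (fl, ro, we, da); PBAR stands for P̄E.
PE   = [('∗', '∗', 'w1', '∗'), ('f2', '∗', '∗', '∗'), ('f1', 'r1', '∗', 'Mon')]
PBAR = [('∗', 'r2', 'w2', '∗'), ('f1', 'r1', '∗', 'Tue')]

FL, RO, WE, DA = range(4)

def select_patterns(patterns, attr, value):
    """Patterns compatible with the equality selection attr = value."""
    return [p for p in patterns if p[attr] in (WILDCARD, value)]

# Q0: floors for which every room, week and day is complete.
q0_floors = [p[FL] for p in PE if p[RO] == p[WE] == p[DA] == WILDCARD]

# Q1 (we = 'w2') evaluated over PE and P̄E.
q1_complete = select_patterns(PE, WE, 'w2')
q1_missing  = select_patterns(PBAR, WE, 'w2')

print(q0_floors)     # the only fully covered floor
print(q1_complete)   # partitions of Q1's answer known to be complete
print(q1_missing)    # partitions for which Q1 returns no answer
```

Under this semantics, Q0 returns only f2 (via p1), and Q1's evaluation keeps p1 and p2 on the complete side and p3 and p4 on the missing side, consistent with the discussion above.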
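The annotated aggregation result of Table 4 can be reproduced end to end in SQL. The paper derives the annot column from the pattern tables; as a stand-in, this hedged sketch stores the missing measures as NULL and flags a group as incorrect when some expected measure in it is NULL, which yields the same five rows for the Energy example.

```python
import sqlite3

# Hedged sketch: reproduce the annotated answer of Q2 (Table 4) by NULL
# counting instead of the paper's pattern-based rewriting.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Energy (fl TEXT, ro TEXT, we TEXT, da TEXT, kWh INT)")
rows = [
    ('f1', 'r1', 'w1', 'Mon', 10), ('f1', 'r1', 'w1', 'Tue', 12),
    ('f1', 'r1', 'w2', 'Mon', 10), ('f1', 'r1', 'w2', 'Tue', None),  # m0
    ('f1', 'r2', 'w1', 'Mon',  8), ('f1', 'r2', 'w1', 'Tue', 10),
    ('f1', 'r2', 'w2', 'Mon', None), ('f1', 'r2', 'w2', 'Tue', None),  # m1, m2
    ('f2', 'r1', 'w1', 'Mon', 12), ('f2', 'r1', 'w1', 'Tue',  7),
    ('f2', 'r1', 'w2', 'Mon',  8), ('f2', 'r1', 'w2', 'Tue',  8),
]
con.executemany("INSERT INTO Energy VALUES (?, ?, ?, ?, ?)", rows)

# SUM and COUNT(kWh) ignore NULLs, COUNT(*) does not: a partition is 'ok'
# exactly when no expected measure is missing.
q2_annotated = """
SELECT fl, ro, we, SUM(kWh) AS kWh,
       CASE WHEN COUNT(kWh) = COUNT(*) THEN 'ok' ELSE 'incorr' END AS annot
FROM Energy
GROUP BY fl, ro, we
HAVING COUNT(kWh) > 0          -- drop the fully missing partition (f1, r2, w2)
ORDER BY fl, ro, we
"""
for row in con.execute(q2_annotated):
    print(row)
```

Running the query prints the five annotated partitions of Table 4, with (f1, r1, w2) flagged as incorrect because its Tuesday measure is missing.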