A Metadata-Driven Approach to Relational Database Management
MASARYK UNIVERSITY
FACULTY OF INFORMATICS

A Metadata-Driven Approach to Relational Database Management

DISSERTATION PROPOSAL

Mgr. Vojtěch Přehnal

Supervisor: doc. RNDr. Ivan Kopeček, CSc.

Brno, January 2012

Acknowledgements

I would like to thank my supervisor Ivan Kopeček for his helpful discussions and advice, as well as the members of LSD (Laboratory of Searching and Dialogue) for their support and creative working environment.

Contents

1 Introduction
2 State of the Art
2.1 Automated SQL code generation
2.2 Automated Rich User Interface Generation
2.3 Serialization Schema Redefinition
2.4 Relational Metadata Models
3 Dissertation Thesis Intent
4 Achieved Results
4.1 Relational Schema Model (RSM)
4.2 Relational Schema Protocol (RSP)
5 References
6 Summary of Study Results

1 Introduction

Relational databases, originally introduced by E. F. Codd in [4], have emerged as a predominant way of storing data in various industries, such as finance, banking and accounting, manufacturing and logistics, human resources management, medical care, public administration and many others [6]. The most widely used language for relational database management is SQL (Structured Query Language). It is standardized by ISO (International Organization for Standardization) and is supported by all major vendors of relational database management systems, such as Microsoft, Oracle, IBM or Sybase. SQL comprises DML (Data Manipulation Language) statements (e.g. SELECT, INSERT, UPDATE, DELETE) for data manipulation, DDL (Data Definition Language) statements (e.g. CREATE, ALTER, RENAME, DROP) for schema alteration, and others [1].

SQL statements contain relational schema elements (such as names of columns or tables) in their syntax; hence, in order to execute SQL statements on a database, the schema of the database has to be known.

In simple scenarios, when the relational schema does not change, the application logic is tailored precisely to the particular schema: the appropriate SQL queries are stored in the database, hard-wired in the application tier, or computed on the fly from a fixed hierarchy of classes generated by an ORM (Object-Relational Mapping) tool. The interacting applications perform custom business logic on the particular schema and, eventually, have a custom-designed user interface. As the relational schema evolves, the SQL queries have to be redefined and the interacting applications have to be reimplemented and recompiled.
This represents a serious issue for evolving data models, because every change in the relational schema requires additional work by programmers. In other words, the user cannot perform any action that results in redefining the schema, which becomes a limitation in many cases. When users need to change the schema, they cannot do it themselves. Instead, they have to contact the vendor of the application and wait for their requirements to be fulfilled. This takes time, which may cause considerable financial loss for the company and may discourage the user from raising new requirements, although these could bring additional gains. In some cases it is necessary to stop the database for a while (during the schema adjustments), which may cause additional losses or may be completely impossible. Furthermore, the user has to pay for something they could easily do themselves with a few mouse clicks. The user is dependent on the supplier of the application, and if the contract is terminated, any kind of maintenance effectively becomes impossible.

Last but not least, most of the workload in developing data-driven applications is spent on simple, fully automatable tasks, such as altering the database schema, altering source code, particularly the data model (classes and objects), manually or automatically with ORM tools, implementing business logic for CRUD (Create, Read, Update, Delete) operations, creating the user interface, or redefining the data serialization schema for transport over the network [12].

In order to access an evolving relational schema in real time, without the need to rewrite or recompile the source code, the application is required to retrieve the relational schema from the database at run time and build SQL queries on the fly.
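The requirement above can be illustrated with a minimal Python sketch. It is not part of the proposed solution; it only shows what retrieving the schema at run time and building a query on the fly looks like. It assumes SQLite, whose `PRAGMA table_info` is used here as a stand-in for the INFORMATION SCHEMA views of other engines. The helper names `quote_ident`, `get_columns` and `build_select`, as well as the `customer` table, are hypothetical.

```python
import sqlite3


def quote_ident(name):
    """Quote an identifier so it can be embedded safely in SQL text."""
    return '"' + name.replace('"', '""') + '"'


def get_columns(conn, table):
    """Read the current column names of `table` from the database at run time."""
    # Assumption: SQLite. Other engines would query INFORMATION_SCHEMA.COLUMNS.
    rows = conn.execute(f"PRAGMA table_info({quote_ident(table)})").fetchall()
    return [row[1] for row in rows]  # row[1] holds the column name


def build_select(conn, table):
    """Build a SELECT statement from the schema as it exists right now."""
    columns = get_columns(conn, table)
    if not columns:
        raise ValueError(f"unknown table: {table}")
    column_list = ", ".join(quote_ident(c) for c in columns)
    return f"SELECT {column_list} FROM {quote_ident(table)}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
query_before = build_select(conn, "customer")

# The schema evolves; no application code is rewritten or recompiled.
conn.execute("ALTER TABLE customer ADD COLUMN email TEXT")
query_after = build_select(conn, "customer")

print(query_before)
print(query_after)
```

The second query picks up the new column without any change to the application code, which is the behaviour the text asks for.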
However, this introduces several challenges:

QUERYING INFORMATION SCHEMA: The only way of retrieving metadata about the relational schema out of the box (using only the resources of the database engine) is to query a set of system views called INFORMATION SCHEMA [7]. These views have several limitations, though: they provide very limited metadata about the relational schema, they are not easily extensible, they are incompatible across different database engines, and their performance may degrade considerably as the number of tables grows.

DESIGN-TIME OBJECT-RELATIONAL MAPPING EXCLUSION: Relations (tables) cannot be mapped to the data model of the application (internal classes and objects) at design time (including automated mapping using ORM tools). Instead, the data has to be retrieved dynamically and mapped to in-memory objects at run time using specialized algorithms based on the current relational metadata.

SERIALIZATION SCHEMA REDEFINITION: In order to exchange relational data over a shared environment (e.g. the internet), the format (i.e. the serialization schema) of the data has to be defined and shared by the interacting applications [8]. The serialization schema is typically mapped to the data model (classes and objects) of the interacting applications at design time; hence, as the relational schema evolves, the serialization schema has to be redefined and the interacting applications have to be recompiled. This conflicts with the primary objective: avoiding recompilation of the application as the relational schema evolves.

AUTO-GENERATED USER INTERFACE: The user interface has to be inferred from the relational schema and generated on the fly instead of being designed and customized by the software vendor. This places high demands on metadata models, which must supply sufficient information for generating a rich user interface from the provided metadata.
The current relational metadata models, such as INFORMATION SCHEMA, CWM or OIM, do not provide sufficient information; hence they have to be extended [5]. This introduces further problems with synchronization between relational metadata and the relational schema (more in the next chapter).

The aim of this work is to propose a novel, metadata-driven approach to relational database management that enables schema alteration in real time, i.e. without recompiling the application. The key idea of the new approach is automated mapping of relational metadata to the relational schema: instead of altering the relational schema using SQL statements and retrieving relational metadata from specialized database views, this work proposes the opposite: the relational schema is altered automatically by modifying relational metadata stored in regular tables.

In this work, a new software tier, the Relational Schema Tier (RST), is proposed. This tier enables automated relational schema management using relational metadata exchange. It provides algorithms for automated mapping of relational metadata to the relational schema.

For relational data and metadata exchange, a new communication protocol, the Relational Schema Protocol (RSP), is specified. The purpose of this protocol is to replace schema-dependent SQL statements with schema-independent remote procedure calls. It defines remote operations (procedures) for relational schema exploration, data and metadata exchange, and efficient computation of aggregate functions, as well as a novel serialization schema for generic relational data and metadata.

For relational metadata representation and storage, a new metadata model, the Relational Schema Model (RSM), is defined. This model comprises relational metadata from standard relational metadata models, with a revised structure for more effective processing, as well as additional metadata for relational schema localization, data visualization and validation.
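The key idea stated above, altering the schema by modifying metadata held in regular tables, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the RST algorithm or the RSM structure defined in this work: the metadata table `meta_column`, its three columns, the function `sync_schema` and the restriction to additive changes on SQLite are all hypothetical simplifications.

```python
import sqlite3

# Assumption: only these SQLite storage classes are accepted as column types.
ALLOWED_TYPES = {"INTEGER", "REAL", "TEXT", "BLOB"}


def quote_ident(name):
    """Quote an identifier so it can be embedded safely in SQL text."""
    return '"' + name.replace('"', '""') + '"'


def existing_columns(conn, table):
    """Return the column names the physical table currently has (empty if absent)."""
    rows = conn.execute(f"PRAGMA table_info({quote_ident(table)})").fetchall()
    return [row[1] for row in rows]


def sync_schema(conn):
    """Apply metadata rows to the physical schema (additions only in this sketch)."""
    executed = []
    rows = conn.execute(
        "SELECT table_name, column_name, data_type FROM meta_column ORDER BY rowid"
    ).fetchall()
    for table, column, data_type in rows:
        if data_type not in ALLOWED_TYPES:
            raise ValueError(f"unsupported type: {data_type}")
        present = existing_columns(conn, table)
        if not present:
            # The table does not exist yet: create it with its first column.
            ddl = f"CREATE TABLE {quote_ident(table)} ({quote_ident(column)} {data_type})"
        elif column not in present:
            ddl = f"ALTER TABLE {quote_ident(table)} ADD COLUMN {quote_ident(column)} {data_type}"
        else:
            continue  # metadata and schema already agree for this column
        conn.execute(ddl)
        executed.append(ddl)
    return executed


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE meta_column (table_name TEXT, column_name TEXT, data_type TEXT)"
)
conn.executemany(
    "INSERT INTO meta_column VALUES (?, ?, ?)",
    [("customer", "id", "INTEGER"), ("customer", "name", "TEXT")],
)
first_run = sync_schema(conn)

# The user "alters the schema" by inserting an ordinary metadata row.
conn.execute("INSERT INTO meta_column VALUES ('customer', 'email', 'TEXT')")
second_run = sync_schema(conn)
final_columns = existing_columns(conn, "customer")
```

In this sketch the client never writes DDL itself; it edits rows in `meta_column`, and the generated statements are derived from the difference between the metadata and the physical schema. Dropping or renaming columns, constraints and the synchronization problems mentioned above are deliberately left out.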
[Figure: architecture diagram. Tiers: Data Access Tier, Application Logic Tier, Presentation Tier. Labels: Application, SQL, RSP.]