Semantics-Preserving RDB2RDF Data Transformation Using Hierarchical Direct Mapping
applied sciences
Article
Hee-Gook Jun 1 and Dong-Hyuk Im 2,*
1 Openub, Seoul 06097, Korea; [email protected]
2 School of Information Convergence, Kwangwoon University, Seoul 01890, Korea
* Correspondence: [email protected]
Received: 22 September 2020; Accepted: 8 October 2020; Published: 12 October 2020
Appl. Sci. 2020, 10, 7070; doi:10.3390/app10207070 www.mdpi.com/journal/applsci

Abstract: Direct mapping is an automatic transformation method used to generate resource description framework (RDF) data from relational data. In the field of direct mapping, semantics preservation is critical to ensure that the mapping method outputs RDF data without information loss or incorrect semantic data generation. However, existing direct-mapping methods have problems that prevent semantics preservation in specific cases. For this reason, a mapping method is developed to perform a semantics-preserving transformation of relational databases (RDB) into RDF data without semantic information loss and to reduce the volume of incorrect RDF data. This research reviews cases that do not generate semantics-preserving results and arranges the corresponding problems into categories. This paper defines lemmas that represent the features of RDF data transformation to resolve those problems. Based on the lemmas, this work develops a hierarchical direct-mapping method that strictly abides by the definition of semantics preservation, prevents semantic information loss, and reduces the volume of incorrect RDF data generated. Experiments demonstrate the capability of the proposed method to perform semantics-preserving RDB2RDF data transformation, generating semantically accurate results. Future studies should involve the development of synchronization methods to achieve RDF data consistency when the original RDB data are modified.

Keywords: hierarchical direct mapping; relational database; semantic web; web ontology language

1. Introduction

The transformation of relational databases (RDB) into resource description framework (RDF) data is a key information extraction method used to publish Semantic Web data [1–3]. In 1998, Tim Berners-Lee proposed the concept of mapping RDBs to the Semantic Web [4]. Since then, several approaches have been proposed to improve the mapping of RDBs to semantic data. Additionally, the World Wide Web Consortium (W3C) has organized working groups to help standardize the technologies used for transforming RDBs into RDF data.

Direct mapping is a representative mapping method recommended by the W3C to support the automatic mapping of relational data to semantic data [5]. Figure 1 illustrates an example of direct mapping, which defines mapping rules for transforming both relational schema and instance data into RDF data. In the field of direct mapping, researchers have long studied effective automated processes that focus on semantics preservation. Semantics preservation in the direct-mapping process means that the relational integrity constraints are reflected in the mapping result [6,7]. Because integrity constraints define the semantics of a database, mapping with integrity constraints generates more semantically accurate results.

Figure 1. A simplified example of direct mapping. RDB represents relational database, RDF represents resource description framework.
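To make the idea of direct mapping concrete, the following is a minimal sketch of how one relational row might be turned into RDF-style triples. It is only loosely modeled on the IRI shapes of the W3C direct mapping and is not the method proposed in this paper. The function name direct_map_row, the base IRI http://example.org/, and the sample Lecture row are assumptions introduced for illustration; percent-encoding, typed literals, and foreign keys are deliberately left out.

```python
# Hypothetical, simplified direct mapping of a single row (illustration only).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def direct_map_row(base, table, pk, row):
    """Map one relational row to a list of (subject, predicate, object) triples."""
    # The row becomes a subject IRI built from the table name and primary-key value.
    subject = f"{base}{table}/{pk}={row[pk]}"
    # The table itself becomes the class of the row.
    triples = [(subject, RDF_TYPE, f"{base}{table}")]
    for column, value in row.items():
        if value is None:
            continue  # NULL values produce no triple in this sketch
        # Each column becomes a predicate IRI; the value is kept as a plain string.
        triples.append((subject, f"{base}{table}#{column}", str(value)))
    return triples


# Assumed sample data: a Lecture row with a NULL room column.
triples = direct_map_row(
    "http://example.org/", "Lecture", "id",
    {"id": 1, "name": "Databases", "room": None},
)
for triple in triples:
    print(triple)
```

Note that this sketch maps instance data only. Integrity constraints such as NotNull or Unique are not represented at all, which is the gap that semantics-preserving mappings aim to close.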
Although the transformation of relational data using integrity constraints has been long studied [8–16], several problems still occur during direct-mapping processes under specific conditions (see Figure 2). If an attribute has two or more integrity constraints (e.g., NotNull and Unique in Figure 2a), the mapping result will output a single RDF graph structure that combines all of the integrity constraints (the subgraph rooted by name in Figure 2b). The mapping result preserves the semantics in view of the previous methods because relational data is transformed into RDF data. However, to date, no explicit transforming meta-information or well-designed hierarchical structure has been provided to trace the input relational data into the mapping result. RDF NotNull and Unique triples should have meta-information that reveals their NotNull and Unique constraints, respectively, regarding the attribute name in table Lecture. Without this information, a new subgraph can be extracted from the merged RDF graph, which can then be misinterpreted, generating unintended constraints not found in the original data (see the primary key in Figure 2c). Thus, mapping methods based on weak definitions of semantics preservation can cause information loss and incorrect data generation.
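The misinterpretation described above can be sketched in a few lines. This is a toy model, not the mapping algorithm of this paper or of any cited method: constraints are plain (table, column, kind) tuples rather than RDF triples, and the functions merge_constraints and naive_inverse, along with the rule that reads NotNull plus Unique as a primary key, are assumptions chosen to mirror the situation in Figure 2.

```python
# Toy model of constraint merging and a naive inverse mapping (illustration only).

def merge_constraints(constraints):
    """Group constraint kinds per column without recording that each was declared separately."""
    merged = {}
    for table, column, kind in constraints:
        merged.setdefault((table, column), set()).add(kind)
    return merged


def naive_inverse(merged):
    """Hypothetical inverse mapping: reads NotNull + Unique on one column as a primary key."""
    recovered = []
    for (table, column), kinds in sorted(merged.items()):
        if {"NotNull", "Unique"} <= kinds:
            # Without meta-information, the two constraints are indistinguishable
            # from a primary key, so an unintended constraint is generated.
            recovered.append((table, column, "PrimaryKey"))
        else:
            recovered.extend((table, column, kind) for kind in sorted(kinds))
    return recovered


# Assumed input: attribute name of table Lecture carries two separate constraints.
original = [("Lecture", "name", "NotNull"), ("Lecture", "name", "Unique")]
recovered = naive_inverse(merge_constraints(original))
preserved = sorted(recovered) == sorted(original)
print(recovered, preserved)
```

In this toy setting the recovered constraints differ from the original ones, so the round trip does not reproduce the input. Keeping per-constraint meta-information in the mapping result is one way to avoid that ambiguity.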
Therefore, to ensure the accuracy of the mapping process in all cases, a stringent definition is needed to quantify semantics preservation.

Figure 2. An example of problems encountered during a direct-mapping process.

This paper proposes a hierarchical direct-mapping algorithm that prevents the problem illustrated in Figure 2 and preserves the semantics based on strict logical rules. Mapping problems occur when a mapping method simply focuses on data-type transformations. To prevent this, the proposed method uses a hierarchical semantics vocabulary and advanced mapping rules to map without semantic information loss. This paper also defines an evaluation metric using the inverse-mapping phase. That is, a mapping method is said to preserve the semantics if the result of inverse-mapping is semantically identical to the original input data (Figure 3). Evaluation results confirm the effectiveness and accuracy of semantics preservation of the