applied sciences

Article Semantics-Preserving RDB2RDF Data Transformation Using Hierarchical Direct Mapping

Hee-Gook Jun 1 and Dong-Hyuk Im 2,*

1 Openub, Seoul 06097, Korea; [email protected]
2 School of Information Convergence, Kwangwoon University, Seoul 01890, Korea
* Correspondence: [email protected]

 Received: 22 September 2020; Accepted: 8 October 2020; Published: 12 October 2020 

Abstract: Direct mapping is an automatic transformation method used to generate resource description framework (RDF) data from relational data. In the field of direct mapping, semantics preservation is critical to ensure that the mapping method outputs RDF data without information loss or incorrect semantic data generation. However, existing direct-mapping methods have problems that prevent semantics preservation in specific cases. For this reason, a mapping method is developed to perform a semantics-preserving transformation of relational database (RDB) data into RDF data without semantic information loss and to reduce the volume of incorrect RDF data. This research reviews cases that do not generate semantics-preserving results and arranges the corresponding problems into categories. This paper defines lemmas that represent the features of RDF data transformation to resolve those problems. Based on the lemmas, this work develops a hierarchical direct-mapping method that strictly abides by the definition of semantics preservation, prevents semantic information loss, and reduces the volume of incorrect RDF data generated. Experiments demonstrate the capability of the proposed method to perform semantics-preserving RDB2RDF data transformation, generating semantically accurate results. This work impacts future studies, which should involve the development of synchronization methods to achieve RDF data consistency when original RDB data are modified.

Keywords: hierarchical direct mapping; relational ; ; web ontological language

1. Introduction

The transformation of relational databases (RDB) into resource-description-framework (RDF) data is a key information extraction method used to publish Semantic Web data [1–3]. In 1998, Tim Berners-Lee proposed the concept of mapping RDBs to the Semantic Web [4]. Since then, several approaches have been proposed to improve the mapping of RDBs to semantic data. Additionally, the World Wide Web Consortium (W3C) has organized working groups to help standardize the technologies used for transforming RDBs into RDF data. Direct mapping is a representative mapping method recommended by the W3C to support the automatic mapping of relational data to semantic data [5]. Figure 1 illustrates an example of direct mapping, which defines mapping rules for transforming both relational schema and instance data into RDF data. In the field of direct mapping, researchers have long studied effective automated processes that focus on semantics preservation. Semantics preservation in the direct-mapping process means that the relational integrity constraints are reflected in the mapping result [6,7]. Because integrity constraints define the semantics of a database, mapping with integrity constraints generates more semantically accurate results.

Appl. Sci. 2020, 10, 7070; doi:10.3390/app10207070 www.mdpi.com/journal/applsci

Figure 1. A simplified example of direct mapping. RDB represents relational database; RDF represents resource-description-framework.

Although the transformation of relational data using integrity constraints has been long studied [8–16], several problems still occur during the direct-mapping processes under specific conditions (see Figure 2). If an attribute has two or more integrity constraints (e.g., NotNull and Unique in Figure 2a), the mapping result will output a single RDF graph structure that combines all of the integrity constraints (the subgraph rooted by name in Figure 2b). The mapping result preserves the semantics in view of the previous methods because relational data is transformed into RDF data. However, to date, no explicit transforming meta-information or well-designed hierarchical structure has been provided to trace the input relational data into the mapping result. RDF NotNull and Unique triples should have meta-information that reveals their NotNull and Unique constraints respectively, regarding the attribute name in table Lecture. Without this information, a new subgraph can be extracted from the merged RDF graph, which can then be misinterpreted, generating unintended constraints not found in the original data (see the primary key in Figure 2c). Thus, mapping methods based on weak definitions of semantics preservation can cause information loss and incorrect data generation. Therefore, to ensure the accuracy of the mapping process in all cases, a stringent definition is needed to quantify semantics preservation.

Figure 2. An example of problems encountered during a direct-mapping process.


This paper proposes a hierarchical direct-mapping algorithm that prevents the problem illustrated in Figure 2 and preserves the semantics based on strict logical rules. Mapping problems occur when a mapping method simply focuses on data-type transformations. To prevent this, the proposed method uses a hierarchical semantics vocabulary and advanced mapping rules to map without semantic information loss. This paper also defines an evaluation metric using the inverse-mapping phase. That is, a mapping method is said to preserve the semantics if the result of inverse-mapping is semantically identical to the original input data (Figure 3). Evaluation results confirm the effectiveness and accuracy of semantics preservation of the proposed mapping methods.

Figure 3. Overview of our direct-mapping method.

This paper makes the following contributions:

• Lemmas are defined to ensure semantics preservation and to demonstrate the soundness and completeness of direct mapping.
• Hierarchical mapping rules are defined based on lemmas to perform semantics-preserving RDB-to-RDF (RDB2RDF) transformations and to prevent the loss of semantics and incorrect semantic data generation problems.
• The scope of semantics preservation is extended, such that the inverse transformation of the output semantic data should be identical to the original input data.

The remainder of this paper is structured as follows. In the following section, the RDB2RDF mapping methods are briefly discussed. Section 3 presents preliminaries and problem descriptions of direct mapping. Section 4 describes the proposed mapping rules using logical definitions and the implementation of those rules in detail. Section 5 summarizes the results of our experiments. Finally, the conclusions and discussion of prospective future research are provided in Section 6.

2. Related Work

RDB2RDF is a mapping method that transforms relational data into semantic data represented by the RDF. The RDF data model [17] is a language that describes semantic information on the Semantic Web. The basic unit of RDF data is based on a graph structure (i.e., the triple: subject, property, and object) [18–20]. Compared with the relational data model, the RDF is a flexible and interoperable model for publishing information on the web. However, because ~70% of websites are backed by RDBs [1], existing relational data must be made available through the RDB2RDF methodology to advance the Semantic Web.

RDB2RDF mapping approaches include mapping creation, mapping representation and accessibility, mapping implementation, query implementation, application domain, and data integration [21]. Mapping creation has been widely studied to improve the generation of mappings between relational data and semantic data, and it can be performed either automatically or manually [22] (see Table 1 for a list of previous mapping works). Domain semantics-driven mapping is a manual mapping method [23,24]. The W3C RDB2RDF Working Group has recommended the R2RML mapping language [25,26] for customizing mappings. Mapping tools, such as D2RQ [27,28], Virtuoso [29,30], and Ultrawrap [31,32], have also been provided to support manual mapping. On the other hand, direct mapping is an automatic mapping method that was published by the W3C RDB2RDF Working Group in 2012 [5]. It uses RDB instances and schemas as inputs and automatically generates RDF semantic data. In the field of RDF data creation, some methods used to transform various types of data (e.g., heterogeneous data [33], object-oriented data [34], and the Web of Data [35]) to RDF data have been devised. The current paper, however, mainly focuses on direct mapping to manage large-scale data on the web.

Table 1. Classification of previously reported mapping methods.

| Type | Method | Authors |
|---|---|---|
| Manual RDB2RDF (domain semantics-driven mapping) | R2RML | Hert et al. (2011); RDB2RDF Working Group (2012) |
| | D2RQ | Bizer et al. (2004, 2009) |
| | Virtuoso | Blakeley (2007); Erling and Mikhailov (2009) |
| | Ultrawrap | Sequeda et al. (2009, 2013) |
| Automatic RDB2RDF (direct mapping) | Augmented mapping using OWL | Sequeda et al. (2012); Buccella et al. (2004); Astrova et al. (2007); Li et al. (2005); Lim et al. (2015); Tirmizi et al. (2008); Jun et al. (2020) |
| | DB2OWL | Cullot et al. (2007) |
| | RDBToOnto | Cerbah (2008) |
| | OWLED | Shen (2006) |

Further research has been conducted to obtain semantic data from relational data without information loss [6,8–16]. RDF schema (RDFS) and the Web Ontology Language (OWL) are used to obtain a more accurate mapping for RDB2RDF transformation. The concept of RDF data can be modeled by RDFS or OWL in a manner similar to that used for defining the relational schema using SQL data definition language (DDL). Moreover, because OWL contains a more expressive semantic vocabulary, the mapping methods can better express the semantics of relational integrity constraints. Sequeda et al. [7] proposed an augmented direct-mapping method that generates semantic data from the integrity constraints of the SQL DDL schema. Because integrity constraints define the semantics of the RDBs, the quality of augmented direct mapping depends on the transformation of the integrity constraints of the RDBs. DB2OWL [10] and RDBToOnto [11] also provided augmented direct mapping tools. However, they were restricted to supporting only referential integrity constraints. Lim et al. [13] used OWL to process more rules, and Jun et al. [14] proposed a semantics-preserving optimization for mapping multi-column key constraints. However, these methods still do not support the transformation of all integrity constraints of the relational SQL syntax. Moreover, this paper observes that the problem of incorrect semantic data generation can occur in specific cases, as described in the next section.

3. Preliminaries

3.1. Direct Mapping

Developed in 2010 and recommended in 2012 by the W3C RDB2RDF Working Group, direct mapping is an automatic map-creation method that transforms relational input data, including schema data, into RDF graph data (i.e., a direct graph). Direct mapping can be viewed as a function of transforming relational data with integrity constraints to semantic data. Figure 4 provides an example of relational input data for the direct-mapping process. The table, "Product," contains an attribute, pId, as its primary key, an attribute name, and an attribute production as a foreign key that references a table called "Production." Production contains an attribute, pCd, as its primary key and an attribute name. Figure 5 presents the result of the direct-mapping process regarding the input data shown in Figure 4. The output graph comprises a set of RDF triples. Suppose that the base URI of the output data is <http://idb.snu.ac.kr/example/>. The primary key attributes with the base URI are used to generate the subject resource. Two Product resources and two Production resources are generated. Predicates are generated from the attribute names of relational tables, and objects are generated from the attribute values.
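The mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the row values are hypothetical (the contents of Figure 4 are not reproduced here), and the IRI patterns (`Table/pk=value` for subjects, `Table#column` for predicates, `Table#ref-column` for foreign keys) loosely follow the W3C direct-mapping style rather than anything confirmed by the text. Only the table names, attribute names, and base URI come from the description above.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

BASE = "http://idb.snu.ac.kr/example/"  # base URI quoted in the text

# Hypothetical schema summary: primary key and foreign keys of each table.
SCHEMA: Dict[str, dict] = {
    "Product": {"pk": "pId", "fks": {"production": "Production"}},
    "Production": {"pk": "pCd", "fks": {}},
}

# Hypothetical rows (the actual values of Figure 4 are not reproduced here).
ROWS: Dict[str, List[dict]] = {
    "Product": [
        {"pId": "p1", "name": "Pen", "production": "c1"},
        {"pId": "p2", "name": "Ink", "production": "c2"},
    ],
    "Production": [
        {"pCd": "c1", "name": "Plant A"},
        {"pCd": "c2", "name": "Plant B"},
    ],
}


def row_iri(table: str, key_value: str) -> str:
    """Build the subject resource from the base URI, table, and primary key."""
    pk = SCHEMA[table]["pk"]
    return f"<{BASE}{table}/{pk}={key_value}>"


def direct_map(rows: Dict[str, List[dict]]) -> List[Triple]:
    """Turn every row into one type triple plus one triple per attribute."""
    triples: List[Triple] = []
    for table, table_rows in rows.items():
        pk = SCHEMA[table]["pk"]
        fks = SCHEMA[table]["fks"]
        for row in table_rows:
            subject = row_iri(table, row[pk])
            triples.append((subject, "rdf:type", f"<{BASE}{table}>"))
            for column, value in row.items():
                if column in fks:
                    # A foreign key value points at the referenced row's resource.
                    predicate = f"<{BASE}{table}#ref-{column}>"
                    obj = row_iri(fks[column], value)
                else:
                    # Any other attribute value becomes a plain literal.
                    predicate = f"<{BASE}{table}#{column}>"
                    obj = f'"{value}"'
                triples.append((subject, predicate, obj))
    return triples


TRIPLES = direct_map(ROWS)
for triple in TRIPLES:
    print(" ".join(triple), ".")
```

With these assumed rows, the sketch yields four subject resources (two Product, two Production), matching the count stated above. Note that no integrity-constraint information survives in this plain output, which is the gap the later sections address.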

Figure 4. An example input relational data of the direct mapping.

Figure 5. An example result of the direct mapping.

3.2. Semantics Preservation of Direct Mapping

Further research has been conducted on the improvement of direct mapping to reduce information loss and ensure semantics preservation. Semantics preservation is an important feature of direct mapping, as the quality of direct mapping depends heavily on it. Sequeda et al. [7] provided a theoretical definition of semantics preservation. In addition to that definition, this research provides a stricter definition to quantify semantics preservation and evaluate the accuracy of the mapping methods. This paper defines the semantics preservation of mapping methods as follows:

Semantics preservation: Suppose $X$ is a set of relational data, $F$ is an RDB to RDF mapping function, and $G$ is an RDF to RDB inverse-mapping function of $F$. If $|X| = |G(F(X))|$ and $|G(F(X)) - X| = 0$, then $F$ is an ideal function that satisfies semantics-preserving mappings (Figure 6).
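The two conditions of this definition translate directly into a set-based check. The sketch below is a toy illustration, not the paper's evaluation code: relational data are modeled as hypothetical `(table, attribute, constraint)` facts, and the functions `to_rdf`, `from_rdf`, and `from_rdf_naive` are invented stand-ins for $F$ and $G$. The naive inverse mimics the misreading described for Figure 2, where NotNull and Unique on the same attribute are taken to imply a primary key.

```python
from typing import Callable, Set, Tuple

Fact = Tuple[str, str, str]    # (table, attribute, constraint), hypothetical model
Triple = Tuple[str, str, str]  # (subject, predicate, object)


def preserves_semantics(
    X: Set[Fact],
    F: Callable[[Set[Fact]], Set[Triple]],
    G: Callable[[Set[Triple]], Set[Fact]],
) -> bool:
    """Check |X| = |G(F(X))| and |G(F(X)) - X| = 0."""
    round_trip = G(F(X))
    return len(X) == len(round_trip) and len(round_trip - X) == 0


def to_rdf(facts: Set[Fact]) -> Set[Triple]:
    """Toy forward mapping F: one triple per constraint fact."""
    return {(f"{table}.{attr}", "hasConstraint", c) for table, attr, c in facts}


def from_rdf(triples: Set[Triple]) -> Set[Fact]:
    """Toy inverse mapping G that reads back exactly what was written."""
    facts: Set[Fact] = set()
    for subject, _predicate, constraint in triples:
        table, attr = subject.split(".", 1)
        facts.add((table, attr, constraint))
    return facts


def from_rdf_naive(triples: Set[Triple]) -> Set[Fact]:
    """Toy inverse that also infers PrimaryKey from NotNull plus Unique."""
    facts = from_rdf(triples)
    for table, attr, _ in list(facts):
        constraints = {c for t, a, c in facts if (t, a) == (table, attr)}
        if {"NotNull", "Unique"} <= constraints:
            facts.add((table, attr, "PrimaryKey"))
    return facts


X = {("Lecture", "name", "NotNull"), ("Lecture", "name", "Unique")}
print(preserves_semantics(X, to_rdf, from_rdf))        # exact round trip
print(preserves_semantics(X, to_rdf, from_rdf_naive))  # extra constraint appears
```

The second call fails both conditions: the round trip contains three facts instead of two, and one of them is not in $X$.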

Figure 6. Graphical view of semantics preservation of direct mapping.

3.3. Limitation of Direct Mapping for Semantics Preservation

This section defines the three challenging problems encountered during the direct-mapping process for semantic preservation. This paper observes loss of semantics (Problems 1 and 2) and incorrect semantic data generation (Problem 3), which may occur in specific cases. The problems and specific conditions in which the problems occur were found during studies of existing direct-mapping methods [8–16]. To overcome these drawbacks, the problems are organized into three categories.

Problem 1 illustrates the loss of information when relational tables are transformed into semantic data using an OWL class. Because the OWL class is associated with objects generated in a single semantic model structure, every semantic resource can be inferred from the OWL class. Therefore, additional methods are needed to indicate that the output data are particularly generated from the transformed relational table.

• Problem 1: Suppose $y_a = \mathrm{Class}(x_a)$ is an RDB2RDF mapping rule for a relational table, where $x_a \in R$, $R$ is a set of relational tables, $y_a \in C$, $C$ is a set of OWL classes, and $x_b = \mathrm{Class\_Inverse}(y_b)$ is an RDF2RDB inverse-mapping rule of an OWL class, where $x_b \in X$, $X$ is a set of results generated by $\mathrm{Class\_Inverse}(y_b)$, $y_b \in C$, and $C$ is a set of OWL classes. However, $x_b$ is not the same as $x_a$, because $R \subset X$.

Problem 2 illustrates another type of information loss that occurs when semantic resources reference other resources. Each referencing object, including relational attributes and binary relations, can be transformed into the same type (i.e., OWL object property). Thus, a method of distinguishing each referencing object is discussed and provided in Section 4.

• Problem 2: Suppose $y_a = \mathrm{ObjProp}(x_a)$ is an RDB2RDF mapping rule for a relational table, where $x_a = \{x_{a1}, x_{a2}\}$, $x_{a1} \in B$, $B$ is a set of binary relations, $x_{a2} \in F$, $F$ is a set of foreign keys, $y_a \in O$, $O$ is a set of OWL object properties, and $x_b = \mathrm{ObjProp\_Inverse}(y_b)$ is an RDF2RDB inverse-mapping rule of an OWL object property, where $x_b \in X$, $X$ is a set of results generated by $\mathrm{ObjProp\_Inverse}(y_b)$, $y_b \in O$, and $O$ is a set of OWL object properties. However, $\mathrm{ObjProp\_Inverse}(\cdot)$ does not work as intended, because it has not been given any information to determine whether $y_b$ is generated from $x_{a1}$ or $x_{a2}$.

Problem 3 illustrates incorrect semantic data generation when integrity constraints are transformed without considering that every subgraph having a specific identical root node can be merged into a single graph.

• Problem 3: Assume that the mapping rules for integrity constraints of relational data are described in Figure 7. Here, the predicates on the right-hand side are used to verify the integrity constraints. DefaultCondition(p, v) is a function that assigns v as a default value of predicate p. CheckCondition(p, c) is a function that assigns c as a check condition of predicate p, and the other predicates on the left-hand side are defined in Appendix A.
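Problem 2 can be made concrete with a small sketch. It is a hedged illustration only: the `obj_prop` and `obj_prop_inverse_candidates` functions, the prefixed names, and the `teaches` binary relation are hypothetical, and the three triples emitted per property are an assumed shape rather than the rules used in the paper. The point is that a foreign key and a binary relation both end up as an `owl:ObjectProperty` with the same triple pattern, so an inverse rule has nothing to tell them apart.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical relational objects: one foreign key and one binary relation.
FOREIGN_KEY: Dict[str, str] = {
    "kind": "foreign_key", "name": "production",
    "domain": "Product", "range": "Production",
}
BINARY_RELATION: Dict[str, str] = {
    "kind": "binary_relation", "name": "teaches",
    "domain": "Lecturer", "range": "Lecture",
}


def obj_prop(source: Dict[str, str]) -> List[Triple]:
    """ObjProp-style rule: either kind of source becomes an object property."""
    prop = f":{source['name']}"
    return [
        (prop, "rdf:type", "owl:ObjectProperty"),
        (prop, "rdfs:domain", f":{source['domain']}"),
        (prop, "rdfs:range", f":{source['range']}"),
    ]


def obj_prop_inverse_candidates(triples: List[Triple]) -> List[str]:
    """Without extra meta-information, both source kinds remain possible."""
    is_object_property = any(
        p == "rdf:type" and o == "owl:ObjectProperty" for _, p, o in triples
    )
    return ["binary_relation", "foreign_key"] if is_object_property else []


def shape(triples: List[Triple]) -> List[str]:
    """Predicates used by a property description, ignoring concrete names."""
    return [p for _, p, _ in triples]


fk_triples = obj_prop(FOREIGN_KEY)
br_triples = obj_prop(BINARY_RELATION)
print(shape(fk_triples) == shape(br_triples))   # same triple pattern
print(obj_prop_inverse_candidates(fk_triples))  # two candidates, no way to choose
```

The `kind` field of each source is dropped by `obj_prop`, which is exactly the information an inverse mapping would need.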

Figure 7. Simple mapping rules of integrity constraints.

The subset relationships can be inferred, as shown in Figure 8. However, semantic data generated by the above rules can be misinterpreted. For example, assume that a relational attribute, x, has integrity constraints, "primary key" and "check," F is a mapping function that contains the rules of Figure 7, and G is an inverse mapping function of F. Then, the integrity constraints of G(F(x)) are "primary key," "check," "foreign key," and "unique," because $\mathrm{FK}(p) \subseteq \mathrm{Check}(p)$, and $\mathrm{Unique}(p) \subseteq \mathrm{PK}(p) \cup \mathrm{Check}(p)$. Therefore, developing a method to avoid incorrect semantic data generation is a challenge associated with the semantics-preserving RDB2RDF transformation (the detailed example is provided in Appendix B).

Figure 8. Subset relationships by mapping rules of integrity constraints.

4. Hierarchical Mapping Rules

This section provides the hierarchical rules for learning general relational schemas and integrity constraints. Each rule is based on lemmas that are valid within the semantics domain. Then, this section explains how the problems described in Section 3.3 can be prevented by using proposed rules via lemmas. This work uses predicate logic to define rules and add graphical examples for better understanding. Then, the hierarchically structured semantic vocabularies are provided in order to generate sound and precise semantic data. The relationships among the lemmas, rules, and problems are described in Figure 9 to clarify the concept of the rules.

Figure 9. Relationships among the lemmas, rules, and problems.


4.1. Rules for General Relational Schemas

This section defines the rules using Lemmas 1 and 2 to generate accurate RDF data from relational data without information loss (proofs are provided in Appendices C and D). Lemma 1 describes the feature of the OWL class during the mapping process.

• Lemma 1: Suppose R is a relational table set, A is an attribute set, K is an integrity constraint set, I is a relational instance set, X is a set where X ⊂ (R ∪ A ∪ K ∪ I), and F is a direct mapping function. Then, every y ∈ F(X) can be retrieved by inference from owl:Class.

Thus, to avoid Problem 1 described in Section 3.3, Rule 1 for mapping relational tables based on Lemma 1 is defined as follows.

• Rule 1: Rel(r) ∧ ¬BinRel(r, a₁, …, aₘ, s, b₁, …, bₙ, t) → Relation(r), where the predicates used on the left-hand side are defined in Appendix A, and Relation(r) is a predicate that verifies that r is a relational table and not a binary relation.

By Rule 1, relational tables are transformed into semantic resources using Relation (typed OWL class), which is a semantic vocabulary to notate relational tables. For example, if a relational table, "Student," is transformed by a naive rule, "Rel(Student) → Class(Student)," then the transformed output Student loses the explicit information indicating that it is a relational table. This loss happens because all semantic resources are typed only by the OWL class (Figure 10a). On the other hand, Student will not lose the information that it is a relational table if Rule 1 is implemented. Relation is defined as a type of Student using Rule 1, which provides explicit information that the RDF data is transformed from a relational table (Figure 10b).

Figure 10. A comparative example of rules for mapping relational tables (Rule 1).

Lemma 2 illustrates the feature of the OWL object property used to express the semantics of relationships between semantic resources.

• Lemma 2: For any X in relational data, if x ∈ X references another y ∈ X, then x can be transformed into a semantic resource, which has a type of owl:ObjectProperty.

Based on Lemma 2, if a direct-mapping method does not manage the feature of an object property accurately, then Problem 2 described in Section 3.3 can occur during the mapping process. Thus, Rules 2–5 based on Lemma 2 for mapping the semantics of relationships are defined. Rule 2 is composed of five sub-rules to specify the attributes within the hierarchical structure:

• Rule 2:
  Prop(a, r, _) ∧ FP(a) → Attr(a, r)
  ObjProp(a, r, s) → FKeyAttr(a, r, s)
  DataProp(a, r, type(a)) → NonFKeyAttr(a, r)
  ∀a∀r FKeyAttr(a, r, s) ⊆ ∀a∀r Attr(a, r)
  ∀a∀r NonFKeyAttr(a, r) ⊆ ∀a∀r Attr(a, r),

where the predicates on the left-hand side are defined in Appendix A, and the predicates on the right-hand side represent the transforms of relational attribute a. With these predicates, Rule 2 has a distinct advantage over previous approaches that simply used the OWL object property and the datatype property to map relational attributes. Because the OWL properties are provided to describe any resource with referencing semantics (not just relational attributes), using only the OWL properties does not guarantee that the output data was originally attribute data. As a result, previous approaches cannot avoid semantic information loss when mapping attributes. However, Rule 2 adopts hierarchically structured semantic vocabularies on attributes (Figure 11). The vocabularies describe various types of attributes, and each input attribute can be transformed into RDF data with detailed information.
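The hierarchy in Rule 2 can be sketched as follows. The sketch assumes single-column foreign keys, plain-tuple triples, and a hypothetical `hdm:` namespace; the `Attribute` class, the column names, and the helper functions are illustrative and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Attribute:
    name: str
    table: str
    references: Optional[str] = None  # referenced table if this is a foreign key


# Hypothetical vocabulary hierarchy: both specific terms specialize hdm:Attr.
PARENT = {"hdm:FKeyAttr": "hdm:Attr", "hdm:NonFKeyAttr": "hdm:Attr"}


def rule2_term(attr):
    """Pick the most specific vocabulary term for an attribute."""
    return "hdm:FKeyAttr" if attr.references is not None else "hdm:NonFKeyAttr"


def rule2_triples(attr):
    """Emit triples that type the attribute and record its domain (and range)."""
    subject = f"{attr.table}.{attr.name}"
    triples = {
        (subject, "rdf:type", rule2_term(attr)),
        (subject, "rdfs:domain", attr.table),
    }
    if attr.references is not None:
        triples.add((subject, "rdfs:range", attr.references))
    return triples


def is_attribute(triples, subject):
    """True if `subject` is typed by hdm:Attr or by a term directly below it."""
    return any(
        s == subject and p == "rdf:type"
        and (o == "hdm:Attr" or PARENT.get(o) == "hdm:Attr")
        for s, p, o in triples
    )
```

Because FKeyAttr and NonFKeyAttr both sit under Attr, a consumer of the output can still tell that a resource was originally a relational attribute, which a bare owl:ObjectProperty or owl:DatatypeProperty typing would not reveal.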

Figure 11. Set of attributes as a hierarchically structured semantic vocabulary (Rule 2).

Rule 3 is defined for mapping binary relations as follows:

• Rule 3: BinRel(r, a₁, …, aₘ, s, b₁, …, bₙ, t) ∧ ¬BinRel(s, _, _, _, _) ∧ ¬BinRel(t, _, _, _, _) → BinaryRelation(r, s, t),

where the predicates on the left-hand side are defined in Appendix A, and BinaryRelation(r, s, t) is a predicate that verifies whether a binary relation, r, can be transformed into the semantic resource BinaryRelation (typed OWL object property), which is a semantic vocabulary that notates binary relations.

Although both Rules 2 and 3 use owl:ObjectProperty during the mapping process, the mapping results of the two rules can be readily distinguished. The semantics of type owl:ObjectProperty are encapsulated by the mapping resource FKeyAttr in Rule 2 (Figure 11) and BinaryRelation in Rule 3 (Figure 12). Therefore, a mapping result having FKeyAttr implies that it was originally an attribute of relational data. Likewise, from a result having BinaryRelation, we can infer that it was a binary relation before the mapping process.

Figure 12. Semantic vocabulary of a binary relation (Rule 3).

Rules 4 and 5 indicate the relationships between relational tables:

• Rule 4: Rel(s) ∧ Rel(t) ∧ PK(a, s) ∧ FK(a, s, _, t) ∧ ObjProp(r, s, t) ∧ FP(r) → IdentifyingRelationship(r, s, t)
• Rule 5: Rel(s) ∧ Rel(t) ∧ PK(a, s) ∧ ¬FK(a, s, _, t) ∧ ObjProp(r, s, t) ∧ FP(r) → NonIdentifyingRelationship(r, s, t),

where the predicates on the left-hand side are defined in Appendix A, IdentifyingRelationship(r, s, t) is a predicate that verifies identifying relationships, and NonIdentifyingRelationship(r, s, t) is a predicate that verifies nonidentifying relationships.

Figure 13 shows an example of mapping an identifying relationship. Because the primary key of Professor contains a foreign key referencing Person, the relation Professor is dependent on the relation Person. In such a case, relationships between Professor and Person can be mapped using IdentifyingRelationship( ), as defined by Rule 4.


Figure 13. An example of mapping an identifying relationship (Rule 4).

Figure 14 is an example of mapping a nonidentifying relationship. Because the foreign key of Student referencing Major is not an attribute of the primary key of Student, the relations Student and Major are independent. In such a case, relationships between Student and Major can be mapped using NonIdentifyingRelationship( ) by Rule 5.
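The distinction drawn by Rules 4 and 5 reduces to one test: whether the foreign-key columns belong to the primary key of the referencing table. A minimal sketch follows, assuming keys are given as lists of column names; the function name and the column names in the usage note are hypothetical, and the subset check for composite keys is an assumption that goes slightly beyond the single-attribute form of the rules.

```python
def classify_relationship(primary_key, foreign_key):
    """Return the vocabulary term for a foreign key of table s referencing t.

    primary_key: columns of the primary key of the referencing table s.
    foreign_key: columns of the foreign key from s to t.
    """
    if not foreign_key:
        raise ValueError("a relationship needs at least one foreign-key column")
    if set(foreign_key) <= set(primary_key):
        return "IdentifyingRelationship"      # Rule 4: FK columns are PK columns
    return "NonIdentifyingRelationship"       # Rule 5: FK lies outside the PK
```

For the Professor example, `classify_relationship(["person_id"], ["person_id"])` gives the identifying case; for the Student example, `classify_relationship(["student_id"], ["major_id"])` gives the nonidentifying case.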

Figure 14. An example of mapping a nonidentifying relationship (Rule 5).

4.2. Rules for Relational Integrity Constraints

This section provides additional rules for transforming relational integrity constraints to prevent incorrect RDF data generation problems. Lemma 3 illustrates the feature of RDF data, using a linked graph structure. This feature acts as a major factor that prevents the generation of incorrect RDF data. Thus, this section defines Rules 6–11 for mapping integrity constraints based on Lemma 3 (proofs are provided in Appendix E).

• Lemma 3: Suppose G is an RDF graph, G₁ and G₂ are components of G, there is no edge between G₁ and G₂, G₁ is rooted at x ∈ R, and G₂ is rooted at y ∈ R, where R is a set of semantic resources. Then,

1. If x and y have the same uniform resource identifier (URI) [36], then x is identical to y. Thus, G₁ and G₂ can be merged into one graph.
2. If x and y have different URIs, x has a property, p₁, and y has a property, p₂, that has the same URI as p₁, then G₁ and G₂ cannot be merged into one graph, and p₁ can be distinguished from p₂ using x and y.

• Rule 6: NonFKeyAttr(a, r) ∧ subClassOf(r, _b) ∧ Card(a, _b, 1) → NotNull(a, r)
• Rule 7: NonFKeyAttr(a, r) ∧ IFP(a) ∧ subClassOf(r, _b) ∧ MaxCard(a, _b, 1) ∧ (∃!v) a(r, v) → Unique(a, r)
• Rule 8: NonFKeyAttr(a, r) ∧ IFP(a) ∧ subClassOf(r, _b) ∧ Card(a, _b, 1) ∧ (∃!v) a(r, v) → PK(a, r)
• Rule 9: FKeyAttr(a, r, s) ∧ subClassOf(r, _b) ∧ MinCard(a, _b, 1) → FK(a, r, s)
• Rule 10: NonFKeyAttr(a, r) ∧ subClassOf(r, _b) ∧ MaxCard(a, _b, 1) ∧ DefVal(a, _b, v) → Default(a, r)
• Rule 11: NonFKeyAttr(a, r) ∧ subClassOf(r, _b) ∧ MaxCard(a, _b, 1) ∧ CheckCond(a, _b, v) → Check(a, r),

where a is an attribute of relational table r, _b is a blank node [37], v in Rules 7 and 8 is an attribute value, v in Rule 10 is a default attribute value, and v in Rule 11 is a check condition. The predicates on the left-hand side are defined in Appendix A, and the predicates on the right-hand side preserve the integrity constraints: not null, unique, primary key, foreign key, default, and check.

Rule 6 describes the NotNull constraint. It also defines a predicate, Card(a, _b, 1), that restricts the cardinality of an attribute to be exactly one (Figure 15a). Rule 7 specifies a unique constraint and defines a predicate with a unique existential quantifier, (∃!v) a(r, v), such that there is only one attribute value, v, contained in the domain of a(r, v) (Figure 15b).

Figure 15. Examples of mapping integrity constraints (Rule 6,7).

Rule 8 specifies the primary key defined by Card(a, _b, 1) and (∃!v) a(r, v) to assign an attribute, a, with a primary key (Figure 16a). To define a foreign-key constraint, Rule 9 specifies a lower bound of the cardinality using MinCard(a, _b, 1), because relational tables can reference more than one other table. Rule 9 also uses FKeyAttr(a, r, s) to describe the semantics that the type of attribute a is an OWL object property with domain r and range s (Figure 16b).

Figure 16. Examples of mapping integrity constraints (Rule 8,9).

Rule 10 specifies the default constraint, and it uses a function DefVal(a, _b, v) that returns a default value, v, if a value of the attribute, a, is omitted (Figure 17a). In Rule 11, a function CheckCond(a, _b, v) is used to restrict the value range for the check constraint. For example, CheckCond(quantity, _b, ‘quantity > 0’) means that a value of the attribute quantity must be greater than zero (Figure 17b).


Figure 17. Examples of mapping integrity constraints (Rule 10,11).
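Rules 6–11 can be read as a decision procedure over the restrictions attached to an attribute. The sketch below mirrors the rule antecedents literally. It is an illustration only: the `Restriction` record and its field names are hypothetical, and the subClassOf(r, _b) link to the blank node is assumed to be present and is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Restriction:
    is_fkey_attr: bool = False             # FKeyAttr(a, r, s) vs. NonFKeyAttr(a, r)
    inverse_functional: bool = False       # IFP(a)
    unique_value: bool = False             # (∃!v) a(r, v)
    cardinality: Optional[int] = None      # Card(a, _b, n)
    min_cardinality: Optional[int] = None  # MinCard(a, _b, n)
    max_cardinality: Optional[int] = None  # MaxCard(a, _b, n)
    default_value: Optional[str] = None    # DefVal(a, _b, v)
    check_condition: Optional[str] = None  # CheckCond(a, _b, v)


def infer_constraints(x):
    """Return the integrity constraints whose rule antecedents hold for x."""
    found = set()
    if x.is_fkey_attr:
        if x.min_cardinality == 1:
            found.add("FK")                # Rule 9
        return found
    if x.cardinality == 1:
        found.add("NotNull")               # Rule 6
        if x.inverse_functional and x.unique_value:
            found.add("PK")                # Rule 8
    if x.max_cardinality == 1:
        if x.inverse_functional and x.unique_value:
            found.add("Unique")            # Rule 7
        if x.default_value is not None:
            found.add("Default")           # Rule 10
        if x.check_condition is not None:
            found.add("Check")             # Rule 11
    return found
```

Note that an attribute satisfying the antecedent of Rule 8 also satisfies that of Rule 6, so the sketch reports both PK and NotNull for it. For the example in the text, `infer_constraints(Restriction(max_cardinality=1, check_condition="quantity > 0"))` reports the check constraint.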

4.3. Soundness and Completeness of the Rules

The lemmas describe the features of semantic resources during the mapping process. Lemma 1 states that every semantic resource can be inferred from the OWL class. Lemma 2 states that every semantic resource referencing other resources can be typed by the OWL object property. Lemma 3 states that every subgraph that has the same semantic resource as a root node can be merged into a single graph. On the other hand, the problems described in Section 3.3 are specific cases of violation of semantics preservation. Problem 1 illustrates the loss of information when relational tables are transformed without considering Lemma 1. Problem 2 illustrates another case of loss of information when attributes, binary relations, or other referencing objects are transformed without considering Lemma 2. Problem 3 illustrates incorrect RDF data generation when integrity constraints are transformed without considering Lemma 3. Therefore, mapping rules are defined based on the lemmas to perform semantics-preserving transformation of RDBs to RDF data and to avoid the loss of semantics or incorrect RDF data generation. Lemma 4 demonstrates the soundness and completeness of the provided RDB2RDF data transformation methods (see Appendix F for the proof).

• Lemma 4: Consider that X is a set of relational data, F is an RDB2RDF mapping function, and G is an RDF2RDB inverse-mapping function of F. If the mapping rules are defined based on Lemmas 1, 2, and 3, then,

1. Soundness: the mapping rules are sound if the rules generate only semantics in RDB data (X ⊇ G(F(X))).
2. Completeness: the mapping rules are complete if the rules generate all semantics in RDB data (X ⊆ G(F(X))).

5. Experimental Results

5.1. Environments

Experiments were conducted using five real datasets and one synthetic dataset on a cluster of 12 nodes using a 3.1-GHz quad-core processor, 4-GB memory, and a 2-TB hard disk. Each real dataset contains relational schema information with integrity constraints: Ensembl-compara (www.ensembl.org/info/docs/api/compara), Ensembl (www.ensembl.org), PHPmyadmin (https://www.phpmyadmin.net), and MusicBrainz (https://musicbrainz.org). The DBT2 (http://osdldbt.sourceforge.net) benchmark was used for the synthetic dataset. This work generated warehouse data using DBT2 and restructured the schema by adding integrity constraints to evaluate the semantics preservation of the mapping methods.

5.2. Analysis

Figure 18a presents the results for the number of triples transformed from the relational data. To perform a comparative analysis of the cost efficiency of the mapping rules, this work employed OWL ontology-based augmented direct mapping [13], which provides implementation details of the mapping algorithm and demonstrates the improvement over other previous methods. The horizontal axis represents the relational data size of the input to the mapping methods, and the vertical axis represents the number of semantic triples as output data. As shown in Figure 18a, our approach generates fewer triples compared with the previous method. Figure 18b shows the average number of triples that result from each transformation method. The horizontal axis represents each relational dataset, and the vertical axis represents the average number of triples generated from the transformation of a single relational element. Assuming that two output results are identical in terms of semantics, the method that generates a smaller-sized result is better with regard to both space and computation. On one hand, when the input data are transformed into RDF data, the mapping rule uses the hierarchical RDF data model. On the other hand, previous methods generated output data without a predefined RDF data model, and repetitive RDF data were generated using primitive semantic model languages when the input data had several referential relationships or constraints. Thus, the results show that the proposed approach generates more compact RDF data that express the same information with fewer resources.

(a) The number of triples over relational data. (b) The average number of triples for one input data.

Figure 18. Comparative results of the mapping methods.

Figure 19 shows the failure rate of the mapping methods in each database. The horizontal axis represents each relational dataset, and the vertical axis represents the failure rate during the transformation of relational data into RDF data. The mapping failures of the previous approach result in the incorrect RDF data generation problem discussed in Section 3. These failures occurred because the previous methods lacked support for handling the integrity constraints. Our approach followed Lemma 4 and guaranteed that the hierarchical mapping rules generated fewer false mapping results. With our method, false results could occur when the input data are defined using practical SQL statements that are not included in standard SQL.

Figure 19. RDB2RDF failure rate of the mapping methods.


Figures 20 and 21 illustrate the soundness and completeness of mapping rules when the size of the relational input data varies from 20 to 100. This experiment duplicated the experiments of Astrova et al. [8], Buccella et al. [9], Li et al. [12], Lim et al. [13], Shen et al. [15], and Tirmizi et al. [16].

The results show that the proposed method adheres to the definition of the semantics-preserving direct-mapping rule. The mapping rules generate only semantics in RDB data and generate most of the semantics of the RDF data.

Figure 20. Soundness of RDB2RDF data transformation methods.

Figure 21. Completeness of RDB2RDF data transformation methods.

6. Conclusions

This paper addresses the problem that existing direct-mapping methods do not fully support the mapping of integrity constraints. The problem is observed in specific cases in which semantic information loss or incorrect RDF data generation occurs. In this paper, an improved definition of semantics preservation is provided to solve these problems and augment the RDB2RDF mapping methods.

Three lemmas are defined to describe the features of semantic resources during the mapping process. Lemma 1 states that each semantic resource can be inferred from an OWL class (i.e., semantic resource). Lemma 2 states that each semantic resource referencing other resources can be typed by an OWL object property (i.e., referential relationship). Lemma 3 states that subgraphs having the same semantic resource as a root node can be merged into a single graph (i.e., union of semantic resources). A hierarchically structured semantic vocabulary is also defined for use in direct-mapping rules.

Rule sets are defined based on the lemmas to transform relational tables and attributes. The mapping rules comprise general-mapping and constraint-mapping rules. The general-mapping rules are used for mapping relations, attributes, and other general relational objects and are defined to avoid semantic information loss during the transformation of general relational objects. The constraint-mapping rules are used for mapping the integrity constraints and are defined to reduce the volume of incorrect RDF data generated.

Finally, the semantics-preserving direct-mapping method was implemented, and a comparative experimental study was performed with both synthetic and real datasets. The experiments demonstrated that the proposed mapping method performs semantics-preserving RDB2RDF transformation and generates semantically accurate results. In the future, we will study synchronization methods to achieve RDF data consistency when the original relational data are modified [38]. We will also build a cost-benefit model that reduces the number of repetitive processes.

Author Contributions: H.-G.J. conceived the problem, implemented the algorithm, and performed the experiments; D.-H.I. supervised the whole research work and revised the algorithm and the theorems; H.-G.J. and D.-H.I. wrote the paper. All authors have read and agreed to the published version of the manuscript.

Funding: This research was supported by the Ministry of Science and ICT (MSIT), Korea, under the Information Technology Research Center (ITRC) support program (IITP-2020-2018-08-01417) supervised by the Institute for Information & communications Technology Promotion (IITP).

Conflicts of Interest: The authors declare no conflict of interest.

Appendix A. Definitions of Predicates Used in Mapping Rules

The predicates shown in Table A1 are used to verify the OWL ontology and are used by the mapping rules.

Table A1. List of predicates used in mapping rules.

Predicates | Conditions of Predicates to Return True
Class(r) | r is an OWL class
Prop(p, d, r) | p is an RDF property with domain d and range r
ObjProp(p, d, r) | p is an OWL object property (ObjProp) with domain d and range r
DataProp(p, d, t) | p is an OWL datatype property (DataProp) with domain d and datatype t
FP(p) | p is an OWL functional property (FP)
IFP(p) | p is an OWL inverse functional property (IFP)
Card(p, v) | cardinality of property p is v
MinCard(p, v) | minimum cardinality of property p is v
MaxCard(p, v) | maximum cardinality of property p is v
type(x, t) | datatype of x is t
subClassOf(x, y) | x is a subclass of y
Rel(r) | r is a relation
BinRel(r, a1, ..., am, s, b1, ..., bn, t) | r is a binary relation between relation s with primary key columns a1, ..., am and t with primary key columns b1, ..., bn
Attr(a, r) | a is an attribute of relation r
FKeyAttr(a, r) | a is a foreign key
NonFKeyAttr(a, r) | a is not a foreign key

The properties in Table A1 (words in bold) are OWL properties. For better understanding, we provide the relationships among triples, OWL properties, and their semantics in Table A2.

Table A2. Examples of semantic triple data using the OWL notation.

(Subject, Property, Object) | Types of Property | Semantics
(Professor, teaches, Student) | Object property | Relation between resources
(Person, age, "23") | Data type property | Subject has a data type value
(Product, produced, ProductLine) | Functional property (FP) | N:1 relationship
(Shakespeare, writes, Book) | Inverse functional property (IFP) | 1:N relationship
(Husband, marriage, Wife) | FP and IFP | 1:1 relationship
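The relationship kinds in Table A2 can be illustrated on instance data. The following Python sketch is not part of the proposed method; it assumes triples are plain (subject, property, object) tuples, and the helper names and sample identifiers are hypothetical. It only checks whether the data are consistent with a functional or inverse functional reading, whereas an OWL reasoner would treat FP and IFP as schema-level axioms.

```python
from collections import defaultdict

def is_functional(triples, prop):
    """True if every subject has at most one distinct object for prop."""
    objects_by_subject = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            objects_by_subject[s].add(o)
    return all(len(objs) <= 1 for objs in objects_by_subject.values())

def is_inverse_functional(triples, prop):
    """True if every object has at most one distinct subject for prop."""
    subjects_by_object = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            subjects_by_object[o].add(s)
    return all(len(subs) <= 1 for subs in subjects_by_object.values())

def relationship_kind(triples, prop):
    """Classify prop using the FP/IFP semantics listed in Table A2."""
    fp = is_functional(triples, prop)
    ifp = is_inverse_functional(triples, prop)
    if fp and ifp:
        return "1:1"
    if fp:
        return "N:1"
    if ifp:
        return "1:N"
    return "N:M"

# Hypothetical sample data modeled on the rows of Table A2.
triples = [
    ("product1", "produced", "lineA"),
    ("product2", "produced", "lineA"),
    ("shakespeare", "writes", "hamlet"),
    ("shakespeare", "writes", "macbeth"),
    ("husband1", "marriage", "wife1"),
]

for prop in ("produced", "writes", "marriage"):
    print(prop, relationship_kind(triples, prop))
```

In this sample, "produced" is functional only, "writes" is inverse functional only, and "marriage" is both, which matches the N:1, 1:N, and 1:1 rows of the table.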

Appendix B. An Example of Problem 3

Figure A1 shows an example of Problem 3. Assume that a relational attribute x has integrity constraints “unique” and “not null”, F is a mapping function that contains the rules in Figure 7, and G is an inverse-mapping function of F. Then the integrity constraints of G(F(x)) are “unique”, “not null”, and “primary key” because FK(p) ⊆ Unique(p), NotNull(p) ⊆ Unique(p), and PK(p) ⊆ NotNull(p) ∪ Unique(p).

Figure A1. An example of Problem 3.

Appendix C. Proof of Lemma 1

If x ∈ R, then x can be transformed directly into owl:Class. If x ∈ A, then x can be transformed into either owl:ObjectProperty or owl:DatatypeProperty, which are types of rdfs:Class. If x ∈ A ∪ K, then x can be transformed into owl:FunctionalProperty or owl:InverseFunctionalProperty, which are types of rdfs:Class. If x ∈ K, then x can be transformed into owl:onProperty, owl:minCardinality, owl:maxCardinality, or owl:cardinality, all of which are types of rdf:Property, and the type of rdf:Property is rdfs:Class. As rdfs:Class is a subclass of owl:Class, semantic resources used in transformation are directly or indirectly assigned to owl:Class type. Therefore, transformed RDF data F(x) can be retrieved by inference using owl:Class.

Appendix D. Proof of Lemma 2

Let a be an attribute and b a relational table. If a is a foreign key of table x referencing a primary key attribute in table y, then a can be transformed into owl:ObjectProperty with domain x and range y. If b is a binary relation over table t and u, then b can be transformed into owl:ObjectProperty with domain t and range u. Even if a bigger relationship exists, the relation can be transformed using owl:ObjectProperty. Therefore, owl:ObjectProperty can describe any referencing relationship among relational data sources.
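As an illustration of the assignments used in the proofs of Lemmas 1 and 2, the following Python sketch maps a toy schema to schema-level triples: each relation becomes an owl:Class, each foreign key attribute becomes an owl:ObjectProperty with a domain and range, and every other attribute becomes an owl:DatatypeProperty. The dictionary layout, the `table.attr` naming, and the Student/Professor schema are assumptions made for this example and are not the rule set of the proposed method.

```python
def map_schema(tables):
    """Return schema-level triples for a toy relational schema.

    tables maps a table name to {"attrs": [...], "fks": {attr: referenced_table}}.
    This layout is hypothetical and chosen only for illustration.
    """
    triples = []
    for table, spec in tables.items():
        # A relation is typed as an OWL class (case x in R of Lemma 1).
        triples.append((table, "rdf:type", "owl:Class"))
        for attr in spec["attrs"]:
            prop = f"{table}.{attr}"  # assumed naming convention
            target = spec.get("fks", {}).get(attr)
            if target is not None:
                # Foreign key: object property with domain and range (Lemma 2).
                triples.append((prop, "rdf:type", "owl:ObjectProperty"))
                triples.append((prop, "rdfs:domain", table))
                triples.append((prop, "rdfs:range", target))
            else:
                # Non-foreign-key attribute: datatype property.
                triples.append((prop, "rdf:type", "owl:DatatypeProperty"))
                triples.append((prop, "rdfs:domain", table))
    return triples

# Hypothetical schema: Student.advisor references Professor.
schema = {
    "Professor": {"attrs": ["id", "name"], "fks": {}},
    "Student": {"attrs": ["id", "name", "advisor"], "fks": {"advisor": "Professor"}},
}

schema_triples = map_schema(schema)
for triple in schema_triples:
    print(triple)
```

Integrity constraints such as unique or not null would add further triples (for example cardinality restrictions); they are left out here to keep the sketch short.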


Appendix E. Proof of Lemma 3

First, we prove (1). Assume that G1 and G2 cannot be merged into one graph, G1 has only one triple t(x, p1, o1), G2 has only one triple t(y, p2, o2) that is identical to t(x, p2, o2), and q(s, p, o) is a query function to find triples. If q(x, ?p, ?o) is the input of the query function, then the result is { t(x, p1, o1), t(x, p2, o2) }. This contradicts our assumption that G1 and G2 cannot be merged into one graph. Therefore, if x and y have the same URI, then G1 and G2 can be merged into one graph.

Second, we prove (2). Assume that G1 and G2 can be merged into one graph, p is a property that has the same URI as p1 and p2, p = p1 = p2, G1 has only one triple t(x, p, o1), G2 has only one triple t(y, p, o2), and q(s, p, o) is a query function to find triples. If q(y, p, o2) is the input of the query function, then there is no matching result. If q(x, p, o1) is the input for the query function, then the result is also empty. This contradicts our assumption that G1 and G2 can be merged into one graph. Therefore, if x and y have different URIs, then G1 and G2 cannot be merged into one graph.
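The merging behavior described in Lemma 3 can be sketched with plain Python sets. In this hedged example, a graph is a pair of a root URI and a set of triples, `query` plays the role of q(s, p, o) with None standing in for the variables ?p and ?o, and `merge_by_root` unions graphs that share a root URI. All names and URIs are hypothetical and chosen for illustration only.

```python
def query(graph, s=None, p=None, o=None):
    """Return the triples in graph matching the pattern; None is a wildcard."""
    return {
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }

def merge_by_root(graphs):
    """Union graphs that share a root URI.

    graphs is a list of (root_uri, set_of_triples) pairs; the result maps
    each distinct root URI to the union of its triples.
    """
    merged = {}
    for root, triples in graphs:
        merged.setdefault(root, set()).update(triples)
    return merged

# G1 and G2 share the root ex:x, so they merge; G3 has a different root.
g1 = ("ex:x", {("ex:x", "ex:p1", "ex:o1")})
g2 = ("ex:x", {("ex:x", "ex:p2", "ex:o2")})
g3 = ("ex:y", {("ex:y", "ex:p1", "ex:o3")})

merged = merge_by_root([g1, g2, g3])

# q(x, ?p, ?o) over the merged graph returns both triples rooted at ex:x.
print(query(merged["ex:x"], s="ex:x"))
# Graphs with different root URIs stay separate.
print(sorted(merged))
```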

Appendix F. Proof of Lemma 4

First, we prove the completeness of the mapping rules by induction. Suppose S is an RDB schema set, S’ is the semantic graph data that represents every schema in S, F is a mapping function and G is an inverse function of F, and X is input data, where X ⊂ S.

(1) Base: Assume X = { }, then F(X) = { } and G(F(X)) = { }, thus X ⊆ G(F(X)).

(2) Inductive hypothesis: Given a set X = {x1, x2, ..., xk} and |X| = k. Assume F(X) is a mapping function for all X ⊂ S, which transforms X to RDF data that F(X) ⊂ S’.

(3) Inductive step: Given a set of RDB data {x1, x2, ..., xk+1} and |X| = k+1. Consider the set X’ without x, where x is any element of X, and |X’| = k. We have X’ and x that are excluded from X. Now, we can apply the inductive hypothesis to X’, mapping them all to be transformed into RDF data F(X’) ⊂ Y and G(F(X’)) ⊂ S. We can also apply the inductive hypothesis to x, which is identical to X when k is one. Therefore, by the mapping rules, X ⊆ G(F(X)).

Second, we prove the soundness of the mapping rules. Assume S is an RDB schema set, S’ is the semantic graph data that represents every schema in S, F is a mapping function that generates a directed graph with a root node containing a unique URI value, and G is an inverse-mapping function of F. If input data X is a disjoint set, |X| = k, X = {x1, x2, ..., xk}, then F(X) = yk is a graph data and F(X) is a disjoint set by Lemma 3.
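The subset conditions in this proof can be checked on small examples. The following Python sketch compares the constraints of an input X with the constraints recovered by the round trip G(F(X)). Completeness is taken as X ⊆ G(F(X)), as written in the proof; reading soundness as G(F(X)) ⊆ X is an assumption made for this illustration. The constraint sets are those of the example in Appendix B, and the function names are hypothetical.

```python
def is_complete(original, round_trip):
    """Completeness: nothing in the input is lost, i.e. X is a subset of G(F(X))."""
    return original <= round_trip

def is_sound(original, round_trip):
    """Soundness (assumed reading): nothing is invented, i.e. G(F(X)) is a subset of X."""
    return round_trip <= original

def is_semantics_preserving(original, round_trip):
    """Both directions hold, so the round trip returns exactly the input."""
    return is_complete(original, round_trip) and is_sound(original, round_trip)

# Appendix B: attribute x is declared unique and not null ...
original = {"unique", "not null"}
# ... but the round trip through the earlier rules also yields "primary key".
recovered = {"unique", "not null", "primary key"}

print(is_complete(original, recovered))
print(is_sound(original, recovered))
print(is_semantics_preserving(original, recovered))
```

Under these assumptions the Appendix B case is complete but not sound, because the recovered "primary key" constraint was never declared on x.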

References

1. He, B.; Patel, M.; Zhang, Z.; Chang, K.C.C. Accessing the deep web. Commun. ACM 2007, 50, 94–101.
2. De Laborda, C.P.; Conrad, S. Database to Semantic Web mapping using RDF query languages. In International Conference on Conceptual Modeling; Springer: Berlin/Heidelberg, Germany, 2006; pp. 241–254.
3. Spanos, D.E.; Stavrou, P.; Mitrou, N. Bringing relational databases into the semantic web: A survey. Semant. Web 2012, 3, 169–209.
4. Relational Databases on the Semantic Web. Available online: http://www.w3.org/DesignIssues/RDB-RDF.html (accessed on 28 August 2019).
5. Arenas, M.; Bertails, A.; Prud’hommeaux, E.; Sequeda, J. A direct mapping of relational data to RDF. W3C Recomm. 2012, 27, 1–11.
6. Sequeda, J.F.; Arenas, M.; Miranker, D.P. A completely automatic direct mapping of relational databases to RDF and OWL. In Proceedings of the 10th International Semantic Web Conference (ISWC2011), Bonn, Germany, 23–27 October 2011.
7. Sequeda, J.F.; Arenas, M.; Miranker, D.P. On directly mapping relational databases to RDF and OWL. In Proceedings of the 21st International Conference on World Wide Web, Lyon, France, 16–20 April 2012; pp. 649–658.
8. Astrova, I.; Korda, N.; Kalja, A. Rule-based transformation of SQL relational databases to OWL ontologies. In Proceedings of the 2nd International Conference on & Semantics Research, Corfu, Greece, 2–11 October 2007; pp. 415–424.
9. Buccella, A.; Penabad, M.R.; Rodriguez, F.J.; Farina, A.; Cechich, A. From relational databases to OWL ontologies. In Proceedings of the 6th National Russian Research Conference, 29 September–1 October 2004.
10. Cerbah, F. Learning highly structured semantic repositories from relational databases. In European Semantic Web Conference; Springer: Berlin/Heidelberg, Germany, 2008; pp. 777–781.
11. Cullot, N.; Ghawi, R.; Yétongnon, K. DB2OWL: A tool for automatic database-to-ontology mapping. SEBD 2007, 7, 491–494.
12. Li, M.; Du, X.Y.; Wang, S. Learning ontology from relational database. In Proceedings of the 2005 International Conference on Machine Learning and Cybernetics, Guangzhou, China, 18–21 August 2005; Volume 6, pp. 3410–3415.
13. Lim, K.B.; Jun, H.G.; Kim, H.J. Semantics preserving MapReduce process for RDB to RDF transformation. Int. J. Metadata Semant. Ontol. 2015, 10, 229–239.
14. Jun, H.G.; Im, D.H.; Kim, H.J. Semantics preserving optimisation of mapping multi-column key constraints for RDB to RDF transformation. J. Inf. Sci. 2020.
15. Shen, G.; Huang, Z.; Zhu, X.; Zhao, X. Research on the rules of mapping from relational model to OWL. OWLED 2006, 216. Available online: http://ceur-ws.org/Vol-216/ (accessed on 11 October 2020).
16. Tirmizi, S.H.; Sequeda, J.; Miranker, D. Translating SQL applications to the semantic web. In International Conference on Database and Expert Systems Applications; Springer: Berlin/Heidelberg, Germany, 2008; pp. 450–464.
17. Lassila, O.; Swick, R.R. Resource Description Framework (RDF) Model and Syntax Specification. 1998. Available online: https://www.w3.org/TR/1998/WD-rdf-syntax-19980720/ (accessed on 11 October 2020).
18. Lee, T.B.; Connolly, D. Delta: An Ontology for the Distribution of Differences between RDF Graphs. Available online: https://www.w3.org/DesignIssues/lncs04/Diff.pdf (accessed on 11 October 2020).
19. Dau, F. RDF as graph-based, diagrammatic logic. In International Symposium on Methodologies for Intelligent Systems; Springer: Berlin/Heidelberg, Germany, 2006; pp. 332–337.
20. Hayes, J. A Graph Model for RDF. Darmstadt University of Technology/University of Chile. 2004. Available online: http://users.dcc.uchile.cl/cgutierr/papers/rdfgraphmodel.pdf (accessed on 11 October 2020).
21. Sahoo, S.S.; Halb, W.; Hellmann, S.; Idehen, K.; Thibodeau, T., Jr.; Auer, S.; Sequeda, J.; Ezzat, A. A survey of current approaches for mapping of relational databases to RDF. W3C RDB2RDF Incubator Group Rep. 2009, 1, 113–130.
22. Michel, F.; Montagnat, J.; Faron-Zucker, C. A Survey of RDB to RDF Translation Approaches and Tools; [Research Report] I3S, 2014, hal-00903568v2. Available online: https://hal.archives-ouvertes.fr/hal-00903568/file/Rapport_Rech_I3S_v2_-_Michel_et_al_2013__A_survey_of_RDB_to_RDF_translation_approaches_and_tools.pdf (accessed on 11 October 2020).
23. Byrne, K. Having triplets-holding cultural data as RDF. In Proceedings of the ECDL 2008 Workshop on Information Access to Cultural Heritage, Aarhus, Denmark, 18 September 2008.
24. Green, J.; Dolbear, C.; Hart, G.; Goodwin, J.; Engelbrecht, P. Creating a system using spatial data. In Proceedings of the 2007 International Conference on Posters and Demonstration Session at the 7th International Semantic Web Conference (ISWC2008), Karlsruhe, Germany, 26–30 October 2008; Volume 401, pp. 70–71.
25. Hert, M.; Reif, G.; Gall, H.C. A comparison of RDB-to-RDF mapping languages. In Proceedings of the 7th International Conference on Semantic Systems, Graz, Austria, 7–9 September 2011; pp. 25–32.
26. R2RML: RDB to RDF Mapping Language. Available online: https://www.w3.org/TR/r2rml/ (accessed on 9 October 2019).
27. Bizer, C.; Seaborne, A. D2RQ-treating non-RDF databases as virtual RDF graphs. In Proceedings of the 3rd International Semantic Web Conference (ISWC2004), Hiroshima, Japan, 7–11 November 2004.
28. The D2RQ Platform v0.7-Treating Non-RDF Relational Databases as Virtual RDF Graphs. User Manual and Language Specification. Available online: http://wifo5-03.informatik.uni-mannheim.de/bizer/d2rq/spec/20090810/ (accessed on 30 January 2018).
29. Blakeley, C. Mapping Relational Data to RDF with Virtuoso’s RDF Views. OpenLink Software. 2007. Available online: http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VOSSQLRDF (accessed on 11 October 2020).
30. Erling, O.; Mikhailov, I. RDF Support in the Virtuoso DBMS. In Networked Knowledge-Networked Media; Springer: Berlin/Heidelberg, Germany, 2009; pp. 7–24.
31. Sequeda, J.F.; Depena, R.; Miranker, D.P. Ultrawrap: Using SQL Views for RDB2RDF; ISWC 2009 Posters & Demonstrations Track: Seattle, WA, USA, 2009.
32. Sequeda, J.F.; Miranker, D.P. Ultrawrap: SPARQL execution on relational data. J. Web Semant. 2013, 22, 19–39.
33. Malik, K.R.; Ahmad, T.; Farhan, M.; Aslam, M.; Jabbar, S.; Khalid, S.; Kim, M. Big-data: Transformation from heterogeneous data to semantically-enriched simplified data. Multimed. Tools Appl. 2016, 75, 12727–12747.
34. Sharma, K.; Marjit, U.; Biswas, U. RDF link generation by exploring related links on the Web of data. Int. J. Inf. Technol. Comput. Sci. 2018, 10, 62–68.
35. Tong, Q. Mapping object-oriented database models into RDF (S). IEEE Access 2018, 6, 47125–47130.
36. URIs, URLs, and URNs: Clarifications and Recommendations 1.0. Available online: https://www.w3.org/TR/uri-clarification/ (accessed on 9 October 2019).
37. Mallea, A.; Arenas, M.; Hogan, A.; Polleres, A. On blank nodes. In International Semantic Web Conference; Springer: Berlin/Heidelberg, Germany, 2011; pp. 421–437.
38. Im, D.H.; Lee, S.W.; Kim, H.J. A version management framework for RDF triple stores. Int. J. Softw. Eng. Knowl. Eng. 2012, 22, 85–106.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).