
Knowledge Representation and Reasoning: Integrating Symbolic and Neural Approaches: Papers from the 2015 AAAI Spring Symposium

Distributional-Relational Models: Scalable Representation of Semantics for Databases

André Freitas1,2, Siegfried Handschuh1, Edward Curry2
1Faculty of Computer Science and Mathematics, University of Passau
2Insight Centre for Data Analytics, National University of Ireland, Galway

Abstract

The crisp/brittle semantic model behind databases limits the scale in which data consumers can query, explore, integrate and process structured data. Approaches aiming to provide more comprehensive semantic models for databases which are purely logic-based (e.g. as in Semantic Web databases) have major scalability limitations in the acquisition of structured semantic and commonsense data. This work describes a complementary semantic model for databases which has semantic approximation at its center. This model uses distributional semantic models (DSMs) to extend structured data semantics. DSMs support the automatic construction of semantic and commonsense models from large-scale unstructured text and provide a simple model to analyze similarities in the structured data. The combination of distributional and structured data semantics provides a simple and promising solution to address the challenges associated with the interaction and processing of structured data.

Copyright © 2015, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Introduction

Data consumers querying, exploring, integrating or analyzing data today need to go through the process of mapping their own conceptualization to the identifiers of database elements. The requirement of a perfect symbolic and syntactic matching in the database interaction process forces the user to perform (during query construction time) a time-consuming information need-database symbol alignment process. With the growth of the symbolic space associated with contemporary databases, the process of manual alignment to the database symbolic space becomes infeasible and restrictive.

Automatic semantic approximation between the data consumer's information needs and database elements is a central operation for data querying, exploration, integration and data analysis. However, effective semantic approximation is heavily dependent on the construction of comprehensive semantic/commonsense knowledge bases. While different semantic approaches based on logical frameworks have been proposed, such as Semantic Web databases, these approaches are limited in addressing the trade-off between providing an expressive semantic representation and the ability to acquire comprehensive knowledge bases under that representation model. Logical frameworks are highly sensitive to problems from the consistency and from the performance perspectives, which emerge in large-scale knowledge bases.

This work proposes the use of distributional semantic models (DSMs) to address these limitations, where the simplification of the semantic representation in DSMs facilitates the construction of large-scale and comprehensive semantic/commonsense knowledge bases, which can be used to support effective semantic approximations for databases. Distributional semantics provides a complementary perspective to the formal perspective of database semantics, which supports semantic approximation as a first-class database operation.

A distributional semantics approach implies extending the formal database semantics with a distributional semantic layer. In the hybrid model, the crisp semantics of query terms and database elements are extended and grounded over a distributional semantic model (Figure 1). The distributional layer can be used to abstract the database user from the specific conceptualization of the data.

Distributional Semantics

Distributional semantics is built upon the assumption that the context surrounding a given word in a text provides important information about its meaning (Harris 1954), (Turney and Pantel 2010). Distributional semantics focuses on the construction of a semantic representation of a word based on the statistical distribution of word co-occurrence in unstructured data. The availability of high-volume and comprehensive Web corpora brought distributional semantic models forward as a promising approach to build and represent meaning at scale.

One of the major strengths of distributional models is the acquisitional point of view, where a semantic model can be automatically built from large volumes of unstructured text. In Distributional Semantic Models (DSMs) the meaning of a word is represented by a weighted vector, which can be automatically built from contextual co-occurrence information in unstructured data. The distributional hypothesis (Harris 1954) assumes that the local context in which a term occurs can serve as discriminative semantic features which represent the meaning of the term. While this simplification makes distributional semantics a coarse-grained semantic model, not suitable for all tasks, the scale in which semantic knowledge and associations can be captured makes DSMs effective models for calculating semantic approximations, a fact which is supported by empirical evidence (Gabrilovich and Markovitch 2007).

DSMs are represented as a vector space model, where each dimension represents a context pattern for the linguistic or data context in which the target term occurs. A context can be defined using documents, data tuples, co-occurrence window sizes (number of neighboring words) or syntactic features. The distributional interpretation of a target term is defined by a weighted vector of the contexts in which the term occurs, defining a geometric interpretation under a distributional vector space. The weights associated with the vectors are defined using an associated weighting scheme W, which calibrates the relevance of more generic or more discriminative contexts. The semantic relatedness measure s_rel between two words is calculated using different similarity/distance measures, such as the cosine similarity, the Euclidean distance or mutual information, among others. As the dimensionality of the distributional space grows, dimensionality reduction approaches can be applied.

Figure 1: Depiction of distributional relations, contexts and different representation views for distributional semantics.
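As a concrete illustration of the construction just described, the sketch below builds word vectors from window-based co-occurrence counts over a two-sentence toy corpus and compares them with the cosine measure. The corpus, the window size and the function names (`build_dsm`, `cosine`) are illustrative assumptions, not artifacts of this work.

```python
from collections import Counter
import math

def build_dsm(corpus, window=2):
    """Toy DSM: represent each term by the counts of the terms
    co-occurring with it inside a +/- `window` token context."""
    vectors = {}
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i, term in enumerate(tokens):
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            vectors.setdefault(term, Counter()).update(context)
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = (math.sqrt(sum(x * x for x in u.values())) *
            math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

corpus = [
    "the daughter of the president visited the school",
    "the child of the president visited the school",
]
dsm = build_dsm(corpus)
# 'daughter' and 'child' occur in identical contexts in this toy
# corpus, so their distributional vectors coincide (cosine = 1.0).
print(cosine(dsm["daughter"], dsm["child"]))
```

Replacing the raw counts with a weighting scheme W (e.g. PPMI or TF/IDF) and applying dimensionality reduction leads towards the large-scale DSMs discussed above.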
Distributional-Relational Models (DRMs)

The semantics of a database element e (e.g. constants, predicates) is represented by the set of natural language descriptors associated with it. This typically does not include concept associations outside the scope of the specific task that the database was designed to address, limiting its use for semantic approximation to concepts outside the designed database representation. Semantic approximation operations are fundamental to support schema-agnosticism (Freitas, Silva, and Curry 2014), i.e. the ability to interact with a database without a precise understanding of the conceptual model behind it.

In this work, the formal semantics of a database symbol is extended with a distributional semantics description, which captures the large-scale symbolic associations within a large reference corpus. The distributional semantics representation captures large-scale semantic, commonsense and domain-specific knowledge, using it in the semantic approximation process between a third-party information need and the database (Figure 3). The hybrid distributional-structured model is called a Distributional-Relational Model (DRM). A DRM embeds the structure defined by relational models in a distributional vector space, where every entity and relationship has an associated vector representation. The distributional associational information embedded in the distributional vector space is used to semantically complement the knowledge expressed in the structured data model. The distributional information is then used to support semantic approximations, while preserving the semantics of the structured data.

Definition (Distributional-Relational Model (DRM)): A distributional-relational model is a tuple (DSM, DB, RC, F, H) such that:
• DSM is the associated distributional semantic model.
• DB is a structured dataset with elements E and tuples T.
• RC is the reference corpora, which can be unstructured, structured or both. The reference corpora can be internal (based on the co-occurrence of elements within the DB) or external (a separate reference corpus).
• F is a map which translates the elements e_i ∈ E into vectors ē_i in the distributional vector space VS^DSM, using the natural language descriptor of e_i and the data model category of e_i.
• H is the set of semantic thresholds for the distributional semantic relatedness s_rel above which two terms are considered semantically equivalent.

In this work we assume a simplified data model with a signature Σ = (P, C) formed by a pair of finite sets of symbols used to represent binary and unary predicates p ∈ P between constants c ∈ C. The semantics of the DB is defined by the vectors in the distributional space used to represent its elements. The set of all distributional contexts Context = {χ_1, ..., χ_t} is extracted from a reference corpus, and each context χ_i ∈ Context is mapped to an identifier which represents the co-occurrence pattern in the corpus. Each identifier χ_i defines a set which tracks the contexts where a term t_e mapping to the database element e occurred. This set is used to construct the basis Context_base = {χ̄_1, ..., χ̄_t} of vectors that spans the distributional vector space VS^dist (Figure 1). Once the DSM is built, the elements of the signature Σ of the DB are translated into vectors in VS^dist.

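A minimal sketch of how signature elements might be translated into VS^dist follows. Averaging the vectors of an element's descriptor tokens is a simplifying assumption standing in for the map F and the weighting scheme W; the function name `element_vector` and the toy vectors are hypothetical.

```python
def element_vector(descriptor_tokens, dsm, dims):
    """Map a database element into the distributional space by
    averaging the vectors of its natural language descriptor tokens
    (a simplifying stand-in for the map F and weighting scheme W)."""
    vecs = [dsm[t] for t in descriptor_tokens if t in dsm]
    if not vecs:
        # Element has no descriptor coverage in the reference corpus.
        return [0.0] * dims
    return [sum(col) / len(vecs) for col in zip(*vecs)]

# Hypothetical 3-dimensional DSM vectors for descriptor words.
dsm = {"child": [0.9, 0.1, 0.0], "of": [0.3, 0.3, 0.3]}

# Translate the predicate symbol 'childOf' via its descriptor tokens.
v = element_vector(["child", "of"], dsm, 3)
# v ≈ [0.6, 0.2, 0.15]
```

In a full DRM, the same translation is applied to every predicate and constant of the signature Σ, so that the structured data inherits the associational knowledge of the reference corpus.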
The vector representation of the elements of E in VS^dist is defined by:

$$VS^{dist} = \{\, \vec{e} \;:\; \vec{e} = \sum_{i=1}^{t} w^{e}_{i}\, \vec{\chi}_{i}, \ \text{for each } e \in E \,\} \qquad (1)$$

where the weights $w^{e}_{i}$ are defined by a co-occurrence weighting scheme.

The last step refers to the translation of DB tuples into VS^dist elements. As each relation and entity symbol has a vector representation, we can define the vector representation of an atom r in the concept vector space by the following definition.

Definition (Distributional Representation of a Tuple): Let $\vec{p}$, $\vec{c}_1$ and $\vec{c}_2$ be the vector representations, respectively, of the binary predicate p and its associated constants c_1 and c_2. A tuple vector representation (denoted by $\vec{r}$) is defined by: $\vec{r} = (\vec{p} - \vec{c}_1)$ if $p(c_1)$; $\vec{r} = (\vec{p} - \vec{c}_1,\ \vec{c}_2 - \vec{p})$ if $p(c_1, c_2)$.

Figure 2: Distributional semantics approximation after the semantic pivot selection. (Example: for the query daughter(Bill Clinton, ?x0), the pivot Bill Clinton restricts the candidates to its predicates; s_rel(daughter, childOf) = 0.03259 is the only score above η = 0.02, while sonOf (0.01091), occupation (0.00356), religion (0.00120) and almaMater (0.0) fall below it, yielding the approximation daughter ~ childOf.)

Semantic Approximation & Context

The computation of the distributional semantic relatedness is a semantic approximation process in which the distributional semantic knowledge and the relatedness measure serve as surrogates for the rules, axioms and inference of a deductive logic approach. The assumption is that the knowledge that would be expressed as rules and axioms in a logical commonsense KB is partially embedded in an unstructured way in the reference corpora, and that an initial query-database alignment provides the contextual and scoping mechanism in which the distributional knowledge can be applied as a semantic/commonsense approximation mechanism.

The distributional alignment (d-alignment) t_q ~_DSM t_DB between a query term t_q and a database term t_DB is defined when s_rel(t_q, t_DB) ≥ η. A d-alignment is not equivalent to t_q ≡ t_DB in absolute terms, i.e. for all possible inference contexts. However, we argue that given an initial query-database alignment context (a semantic pivot), the d-alignment can be locally equivalent to t_q ≡ t_DB.

Contextual Distributional Equivalence Hypothesis: Let t_q be a query term and t_DB be a database term. Let κ(t_q) and κ(t_DB) be the contextual information associated with the query and database terms respectively (i.e. previous alignments). If t_q is d-aligned with t_DB under the context (κ(t_q), κ(t_DB)) then t_q ≡ t_DB, i.e. the terms can be assumed to be semantically equivalent.

For a semantic approximation process, the context sets should be selected in a way which minimizes the probability of a semantic mismatch, using a heuristic function which prioritizes the easiest alignment to address, i.e., the query term with lower semantic entropy (less subject to ambiguity, vagueness and synonymy), which provides a higher probability of a correct semantic matching (Freitas and Curry 2014). This first alignment is called a semantic pivot and typically maps to a named entity (rigid designator) in the query (which most likely maps to a constant in the database); it can be determined using linguistic heuristics (e.g. part-of-speech tags, corpus-based specificity/entropy measures such as IDF (Freitas and Curry 2014)) (Figure 2). The selection of a semantic pivot provides a drastic reduction of the symbolic matching space (and of the associated semantic entropy) for the semantic approximation of the following alignments, since they are restricted by the syntactic constraints of the semantic pivot.

D-DBMS Architecture

From an architectural perspective, a database management system (DBMS) can be enriched with a separate distributional semantics approximation layer. A high-level architecture diagram for the distributional DBMS (D-DBMS) is depicted in Figure 3. The distributional semantic model component builds the semantic model from the reference corpora, which can coincide with the target dataset. Different distributional models have different approximation properties, allowing broader or narrower semantic approximations, depending on the configuration of their parameters. Additionally, different reference corpora can be used according to the domain of discourse covered in the database. The semantic relatedness measure can be computed in two scenarios: (i) dynamically, where the semantic approximation operator ζ calculates the semantic relatedness between a query term and database elements; (ii) using a distributional index, where the distributional vectors of the database elements are represented in an inverted index (Freitas and Curry 2014). The semantic approximation layer can be implemented as an external layer, distinct from the database component, not requiring direct adaptation of the database internals.

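The d-alignment operator described above (relatedness above a threshold η, computed dynamically or against a distributional index) can be sketched as follows. The cosine choice for s_rel, the `d_align` name and the toy vectors loosely mirroring the Figure 2 scenario are illustrative assumptions.

```python
import math

def srel(u, v):
    """Cosine-based semantic relatedness (one possible choice of s_rel)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (math.sqrt(sum(a * a for a in u)) *
            math.sqrt(sum(b * b for b in v)))
    return dot / norm if norm else 0.0

def d_align(query_vec, candidates, eta):
    """Return the DB terms d-aligned with the query term: those whose
    relatedness to the query vector reaches the threshold eta,
    ranked from most to least related."""
    scored = [(term, srel(query_vec, vec)) for term, vec in candidates.items()]
    return sorted((s for s in scored if s[1] >= eta), key=lambda s: -s[1])

# Hypothetical vectors: after a semantic pivot (e.g. 'Bill Clinton')
# restricts the candidates, the query term 'daughter' should align
# with 'childOf' and reject an unrelated predicate.
daughter = [0.8, 0.5, 0.1]
candidates = {
    "childOf":   [0.7, 0.6, 0.0],
    "almaMater": [0.0, 0.1, 0.9],
}
print(d_align(daughter, candidates, eta=0.5))
```

In the D-DBMS setting, `candidates` would be retrieved either dynamically from the database elements or from the inverted distributional index.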
Application Patterns

The convergence between distributional semantics and databases is a recent development which has been used in different application scenarios:

Schema-agnostic Queries: In this category the RC is unstructured and distinct from the DB. (Freitas and Curry 2014) define a DRM (τ-Space) for supporting schema-agnostic queries over large-schema databases, where users are not aware of the representation of the data. The DSM used is Explicit Semantic Analysis (ESA). The approach was evaluated using natural language queries (QALD 2011 test collection) over DBpedia, a large-schema RDF graph dataset containing 45,767 properties, 9,434,677 instances and over 200,000 classes, achieving avg. recall = 0.81, mean avg. precision = 0.62 and mean reciprocal rank = 0.49, with interactive-level query execution time (most queries are answered in less than 2 s).

Selective Commonsense Reasoning over Incomplete KBs: (Freitas et al. 2014) uses a DRM to support selective reasoning over incomplete commonsense KBs. Distributional semantics is used to select the facts which are semantically relevant under a specific (abductive-style) reasoning context, allowing the scoping of the reasoning context and also coping with incomplete knowledge of the commonsense KBs.

Approximative Logic Programming over Incomplete KBs: (da Silva and Freitas 2014) used a DRM to support approximate reasoning in logic programs, defining predicate substitutions to support the application of schema-agnostic queries and rules. The approach supports runtime predicate integration within logic programs.

Knowledge Discovery: In this category, the structured DB is used as the distributional reference corpus (RC = DB). Implicit and explicit semantic associations are used to derive new meaning and discover new knowledge. The use of structured data as a distributional corpus is a pattern used for knowledge discovery applications, where knowledge emerging from similarity patterns in the data can be used to retrieve similar entities and expose implicit associations. In this context, the ability to represent the KB entities' attributes in a vector space and the use of vector similarity measures as a way to retrieve and compare similar entities can define universal mechanisms for knowledge discovery and semantic approximation. (Novacek, Handschuh, and Decker 2011) describe an approach for using web data as a bottom-up phenomenon, capturing meaning that is not associated with explicit semantic descriptions; they apply it to entity consolidation in the life sciences domain. (Speer, Havasi, and Lieberman 2008) proposed AnalogySpace, a DRM over a commonsense KB using Latent Semantic Indexing, targeting the creation of the analogical closure of a semantic network using dimensionality reduction. AnalogySpace was used to reduce the sparseness of the KB, generalizing its knowledge and allowing users to explore implicit associations. (Cohen, Schvaneveldt, and Rindflesch 2009) introduced PSI, a predication-based semantic indexing for biomedical data. PSI was used for similarity-based retrieval and detection of implicit associations.

Figure 3: A distributional semantics layer to complement structured data semantics.

Conclusion

Preliminary results for Distributional-Relational Models (DRMs) have been encouraging, showing the effectiveness of distributional semantics as a semantic model complementary to structured data semantics. Distributional semantics has been used to support semantic approximations for schema-agnostic queries, to cope with knowledge base incompleteness in reasoning, and for knowledge discovery (entity consolidation, link discovery) in structured data.

Acknowledgments: This publication was supported in part by Science Foundation Ireland (SFI) (Grant Number SFI/12/RC/2289) and by the Irish Research Council.

References

Cohen, T.; Schvaneveldt, R. W.; and Rindflesch, T. C. 2009. Predication-based semantic indexing: Permutations as a means to encode predications in semantic space. In AMIA Annu Symp Proc., 114–118.

da Silva, J. C. P., and Freitas, A. 2014. Towards an approximative ontology-agnostic approach for logic programs. In Proc. Intl. Symposium on Foundations of Information and Knowledge Systems.

Freitas, A., and Curry, E. 2014. Natural language queries over heterogeneous linked data graphs: A distributional-compositional semantics approach. In Proc. Intl. Conference on Intelligent User Interfaces.

Freitas, A.; Silva, J. C. P. D.; Curry, E.; and Buitelaar, P. 2014. A distributional semantics approach for selective reasoning on commonsense graph knowledge bases. In Proc. Intl. Conference on Applications of Natural Language to Information Systems.

Freitas, A.; Silva, J. C. P. D.; and Curry, E. 2014. On the semantic mapping of schema-agnostic queries: A preliminary study. In Proc. of NLIWoD at ISWC.

Gabrilovich, E., and Markovitch, S. 2007. Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In Proc. of the International Joint Conference on Artificial Intelligence, 1606–1611.

Harris, Z. 1954. Distributional structure. Word 10(2-3):146–162.

Novacek, V.; Handschuh, S.; and Decker, S. 2011. Getting the meaning right: A complementary distributional layer for the web semantics. In Proc. of the Intl. Semantic Web Conference, 504–519.

Speer, R.; Havasi, C.; and Lieberman, H. 2008. AnalogySpace: Reducing the dimensionality of common sense knowledge. In Proc. of the Intl. Conf. on Artificial Intelligence, 548–553.

Turney, P. D., and Pantel, P. 2010. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research 37:141–188.
