Open Relation Modeling: Learning to Define Relations between Entities

Jie Huang, University of Illinois at Urbana-Champaign, USA
Kevin Chen-Chuan Chang, University of Illinois at Urbana-Champaign, USA
Jinjun Xiong, University at Buffalo, USA
Wen-mei Hwu, University of Illinois at Urbana-Champaign, USA

ABSTRACT

Relations between entities can be represented by different instances, e.g., a sentence containing both entities or a fact in a Knowledge Graph (KG). However, these instances may not well capture the general relations between entities, may be difficult for humans to understand, and may not even be found due to the incompleteness of the knowledge source.

In this paper, we introduce the Open Relation Modeling task: given two entities, generate a coherent sentence describing the relation between them. To solve this task, we propose to teach machines to generate definition-like relation descriptions by letting them learn from definitions of entities. Specifically, we fine-tune Pre-trained Language Models (PLMs) to produce definitions conditioned on extracted entity pairs. To help PLMs reason between entities and provide additional relational knowledge to PLMs for open relation modeling, we incorporate reasoning paths in KGs and include a reasoning path selection mechanism. We show that PLMs can select interpretable and informative reasoning paths by confidence estimation, and the selected path can guide PLMs to generate better relation descriptions. Experimental results show that our model can generate concise but informative relation descriptions that capture the representative characteristics of entities and relations.¹

KEYWORDS

open relation modeling, relation description, text generation

1 INTRODUCTION

People are always interested in relations between entities. To learn about a new concept, people want to know how this concept relates to the ones they are familiar with; when given two related entities of interest, people ask how exactly they are related.

However, although existing systems identify related entities, they do not provide features for exploring relations between entities. For instance, in Figure 1, the top is the ScienceDirect Topics feature of Elsevier, which lists several related terms without any annotation; the bottom is the "see also" feature of Wikipedia, where the annotation of deep learning is not specific to the context of natural language processing. Users cannot tell how deep learning and natural language processing are related by reading the annotation, even though deep learning has recently been used heavily for natural language processing.

Besides, even when relations are represented, they may not be interpretable to humans. There are different ways to represent relations between entities. For example, if two entities co-occur in a sentence, they are possibly related and the relation can be implied by the sentence. From a structured perspective, a relation can be represented as a fact or a multi-hop reasoning path between two entities in a Knowledge Graph (KG). However, for humans without much prior knowledge about the entities, it is still difficult to understand the relations by reading them. For example, from the sentence "we study data mining and database" or the fact "(data mining, facet of, database)", humans can guess that data mining and database are related fields, but they cannot know exactly how they are related. Besides, due to the limited size of the corpus or the incompleteness of the KG, for many related entities, we are not able to extract a sentence or a fact containing both entities.

Based on the above observations, a system for exploring relations between entities needs to meet the following requirements: 1) interpretability: the system can provide interpretable relation descriptions, with which humans can easily understand relations between entities; 2) openness: the system can deal with a wide range of related entities, including those that neither co-occur in a corpus nor are connected in a knowledge graph, and types of relations are not required to be explicitly pre-specified.

To build a system meeting the above requirements, we introduce a novel task, Open Relation Modeling, i.e., generating coherent sentences describing general relations between entities, where types of relations do not need to be pre-specified. Different from open relation extraction, which aims to extract relations between entities from an open-domain corpus [4], open relation modeling aims to generate a concise but informative sentence, capturing the representative characteristics of the given entities and their relation. From the perspective of interpretability, compared to open relation extraction, whose outputs are phrases with low interpretability, e.g., (data mining methods, to be integrate within, the framework of traditional database systems) by Ollie [28], open relation modeling improves the interpretability of entity relations. For example, for data mining and database, we want to generate a relation description like "data mining is a process of extracting and discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems".

Figure 1: Examples of two current services: Elsevier's ScienceDirect Topics (top) and Wikipedia's "see also" (bottom), both of which lack open relation modeling.

¹ Code and data are available at https://github.com/jeffhj/open-relation-modeling.

Such a relation description is informative and easy to understand since it contains important and precise information about the entities and their relation.

To solve the task, we propose to teach machines to learn from definitions of entities. Definitions of entities are highly summarized sentences that capture the most representative characteristics of entities, where the general relations between the defined entity and other entities in the definitions are well captured. Therefore, we suggest finding the general relation between two entities by defining one entity in terms of the other entity. To achieve this, we first collect definitions of entities and extract entity pairs from the definitions. Then we teach machines to generate definition-like relation descriptions by training a language generation model to produce definitions of entities conditioned on extracted entity pairs.

To generate informative relation descriptions, machines need knowledge about entities and relations. Therefore, we apply Pre-trained Language Models (PLMs) [6, 16, 24, 25], which have recently been shown to contain rich relational knowledge of entities [19, 23, 27, 35]. PLMs are usually first pre-trained on large unlabeled corpora via self-supervision with auxiliary tasks such as masked language modeling and next-sentence prediction [8]. They are typically deep neural networks with multiple (typically 12 or 24) Transformer layers [31]. After pre-training, rich world knowledge is stored in the huge number of parameters of PLMs, e.g., knowing Dante was born in Florence [23]. During fine-tuning, PLMs can be adapted to specific tasks and incorporate task-specific knowledge from the training data. Thus, PLMs have a great ability to model relations between entities and have the potential to describe relations informatively.

To utilize knowledge to describe relations between entities, machines also need to reason between entities. However, recent studies have shown that, although PLMs contain rich world knowledge, their reasoning ability is weak [10, 26, 38]. To alleviate this issue, we incorporate reasoning paths in KGs, which can help PLMs do multi-hop reasoning and provide additional relational knowledge to PLMs. Since not all reasoning paths are equally helpful, we design a reasoning path selection mechanism based on the confidence estimation of PLMs to select interpretable and informative reasoning paths, which are then incorporated by PLMs for open relation modeling.

We conduct both quantitative and qualitative experiments to validate the effectiveness of the proposed methods. Experimental results show that, after learning from definitions of entities, PLMs have a great ability to describe relations between entities concisely and informatively. By incorporating reasoning paths and including the reasoning path selection mechanism, machines can often generate relation descriptions that well capture relations between entities, with only minor errors that do not affect the understanding of the relations. We also show that PLMs can select interpretable and informative reasoning paths by confidence estimation, which can in turn guide and help PLMs to generate more reasonable and precise relation descriptions with the desired characteristics. Additionally, we conduct an error analysis for the proposed methods and suggest several directions for future work.

The main contributions of our work are summarized as follows:
• We introduce and study the novel open relation modeling task: generating coherent sentences describing general relations between entities.
• To solve this task, we propose a novel idea of teaching machines to generate definition-like relation descriptions by letting them learn from producing definitions conditioned on extracted entity pairs.
• We apply PLMs and design reasoning path-enriched PLMs for open relation modeling.
• We show that PLMs can select interpretable and informative reasoning paths by confidence estimation, and the selected reasoning path can guide PLMs to generate better relation descriptions.
• We release a large benchmark dataset and conduct extensive experiments for the proposed open relation modeling task.
2 OPEN RELATION MODELING

2.1 Problem Statement

The problem of Open Relation Modeling can be described as follows: given two entities x and y, corresponding to head and tail, the task is to generate a coherent sentence s that describes the general relation between x and y, where types of relations do not need to be pre-specified. More specifically, the expected output is a concise but informative sentence that captures the representative characteristics of the entities and their relation (see the data mining and database examples in Section 1).

2.2 Open Relation Modeling: Learning from Definitions

We formulate open relation modeling as a conditional sentence generation task, i.e., generating sentences capturing general relations between entities conditioned on entity pairs. Formally, we apply the standard sequence-to-sequence formulation: given an entity pair (x, y), the probability of the output relation description s = [w_1, ..., w_m] is calculated as

\[ P(s \mid x, y) = \prod_{i=1}^{m} P(w_i \mid w_0, w_1, \ldots, w_{i-1}, x, y), \]

where w_0 is a special start token.

To generate a sentence capturing the general relation between x and y, machines need to know the semantic meanings of x and y, reason between them, and learn to describe their relation in a concise but informative form. Definitions of entities, which are highly summarized (i.e., concise but informative) sentences, capture the most representative characteristics of entities. To define an entity, other entities may be included, and the relations between the defined entity and these other entities are well captured.

Therefore, we propose to teach machines to describe general relations between entities by letting them learn from definitions of entities. The key idea is to find the general relation between two entities by defining one entity in terms of the other. To achieve this, we first collect definitions of entities and extract entity pairs from these definitions to form entities-definition pairs (more details are in Section 3.1). After that, we teach machines to generate relation descriptions with the desired characteristics by training a language generation model to produce definitions of entities conditioned on extracted entity pairs.

With the key idea in mind, the next step is to design the language generation model. Recently, Bevilacqua et al. [5] show that, by fine-tuning with context-gloss pairs, pre-trained language generation models can generate the glosses/definitions for definiendums that are not seen in the training data. Besides, recent studies [19, 23, 35] demonstrate that pre-trained language models contain rich relational knowledge, and such relational knowledge is essential to describing relations between entities.

Therefore, we apply pre-trained language models for open relation modeling. Particularly, we employ BART [16], a recent transformer-based encoder-decoder model. BART can be adapted to a specific conditional generation task by fine-tuning on task-specific input-output pairs. In our framework, we train BART to produce the definitions of entities with extracted entity pairs as input. Specifically, we encode the entity pair (x, y) as "x; y", e.g., "Haste; Germany", and fine-tune the model to generate the corresponding sentence s, e.g., "Haste is a municipality in the district of Schaumburg, in Lower Saxony, Germany". By fine-tuning on the training data, the model can learn knowledge about entities and learn to connect two entities in a coherent sentence based on its "knowledge". When given a new entity pair, the model can generate a definition-like relation description that possesses the desired characteristics. We refer to this model as RelationBART-Vanilla.
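To make the vanilla setup concrete, the following is a minimal sketch of the entity-pair-to-definition fine-tuning step described above. The paper builds RelationBART with fairseq (see Section 3.2); the Hugging Face transformers API, the bart-base checkpoint, the optimizer settings, and the toy training pair used here are illustrative assumptions rather than the authors' exact implementation.

```python
# Illustrative sketch only: fine-tune a BART model to map "head; tail" inputs to
# definition-like sentences. The real system is built with fairseq; Hugging Face
# transformers is assumed here purely for readability.
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Toy entities-definition pair; real pairs come from Wikipedia first sentences (Section 3.1).
pairs = [
    ("Haste; Germany",
     "Haste is a municipality in the district of Schaumburg, in Lower Saxony, Germany."),
]

model.train()
for source, target in pairs:
    inputs = tokenizer(source, return_tensors="pt")
    labels = tokenizer(target, return_tensors="pt").input_ids
    loss = model(**inputs, labels=labels).loss  # standard seq2seq cross-entropy
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# After fine-tuning, a new entity pair is encoded the same way and decoded.
model.eval()
out = model.generate(**tokenizer("data mining; database", return_tensors="pt"),
                     num_beams=4, max_length=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```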
2.3 Reasoning Path-Enriched Relation Modeling

While PLMs can generate coherent relation descriptions after fine-tuning on the entities-definition pairs, their ability is still limited. Recent studies [10, 26, 38] show that it is difficult for PLMs to reason based on their knowledge. Besides, although PLMs contain rich relational knowledge implicitly, they cannot recover all the relational knowledge in a knowledge base.

Knowledge graphs, in contrast, contain rich relational knowledge explicitly. Relations between entities can be represented by reasoning paths extracted from KGs directly. A good reasoning path can guide PLMs to do multi-hop reasoning and provide additional relational knowledge to PLMs for open relation modeling.

Therefore, we want to inject relational knowledge of KGs into PLMs and incorporate reasoning paths to help PLMs reason between entities. We achieve this with a simple encoding scheme, without changing the architecture of PLMs and without re-pre-training. Given a knowledge graph G, for an entity pair (x, y), if there exists a reasoning path p(x, y) = {x, r_1, e_1, r_2, ..., r_k, y} in G, we encode (x, y) as "x; r_1: e_1; r_2: ...; r_k: y"; if not, we encode (x, y) as "x; unknown: y". With fine-tuning on the path-sentence pairs, the model can learn to utilize the relational knowledge in a reasoning path to reason between two entities and generate a coherent sentence describing the relation between them.

However, there may exist multiple reasoning paths between two entities in a KG, and not all reasoning paths are equally helpful. Among the reasoning paths between two entities, the shortest one usually indicates the most direct relation. For example, if two entities have a direct relation in a KG, the shortest reasoning path should be a 1-hop path p(x, y) = {x, r_1, y}. This path can represent a reasonable relation between the two entities because it is the reason why the KG includes such a fact. Based on this observation, given an entity pair (x, y), the selected reasoning path is formally

\[ \hat{p}(x, y) = \arg\min_{p(x, y) \in \mathcal{P}(x, y)} len(p(x, y)), \]

where \(\mathcal{P}(x, y)\) is the set of reasoning paths connecting x and y extracted from the KG and len(·) is the length of the reasoning path. We refer to the model trained with the shortest reasoning paths² as RelationBART-SP. To keep the presented model simple and easy to verify, we leave more complex mechanisms for sampling reasoning paths as future work [7, 15, 36]. In the next section, we will show that PLMs can select interpretable and informative reasoning paths automatically based on confidence estimation.

² If there exist multiple shortest paths for an entity pair, we randomly choose one.
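As a small illustration of the encoding scheme and the shortest-path choice above, the sketch below serializes the shortest reasoning path between two entities; networkx, the toy Wikidata-style facts, and the helper name encode_pair are assumptions for illustration, not part of the released code.

```python
# Minimal sketch: pick the shortest reasoning path between head and tail in a toy KG
# and serialize it as "x; r1: e1; ...; rk: y", falling back to "x; unknown: y".
import networkx as nx

KG = nx.DiGraph()
KG.add_edge("Haste", "Schaumburg",
            relation="located in the administrative territorial entity")
KG.add_edge("Schaumburg", "Germany", relation="country")
KG.add_edge("Haste", "Germany", relation="country")

def encode_pair(graph, head, tail):
    """Serialize the shortest reasoning path from head to tail as model input."""
    try:
        nodes = nx.shortest_path(graph, head, tail)
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        return f"{head}; unknown: {tail}"
    parts = [head]
    for u, v in zip(nodes, nodes[1:]):
        parts.append(f"{graph[u][v]['relation']}: {v}")
    return "; ".join(parts)

print(encode_pair(KG, "Haste", "Germany"))          # Haste; country: Germany
print(encode_pair(KG, "evaluation", "algorithm"))   # evaluation; unknown: algorithm
```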
2.4 Open Relation Modeling with Reasoning Path Selection

While shortest reasoning paths can represent the most direct relations between entities, from the perspective of human/machine understanding, these paths may not be the most interpretable and informative. For instance, given the entity pair (Haste, Germany) with sentence description s = "Haste is a municipality in the district of Schaumburg, in Lower Saxony, Germany", the shortest reasoning path in the Wikidata KG is p_1 = {Haste, country, Germany}. This reasoning path is not interpretable since we only know Haste is in Germany, but we have no idea whether Haste is a municipality or a district of Germany. However, from the reasoning path p_2 = {Haste, located in the administrative territorial entity, Schaumburg, country, Germany}, we can know Haste is a smaller administrative region than Schaumburg, possibly a municipality. Besides, compared to p_1, p_2 is more informative. With p_1, to generate s, machines need to "guess" the district of Haste. However, with p_2, machines can predict that the district of Haste is Schaumburg with high confidence.

A more interpretable and informative reasoning path can guide and help machines to generate a more reasonable and precise relation description with the desired characteristics. This is because machines can more easily reason between entities with the path and incorporate more important information from the path. Therefore, instead of using the shortest paths, we design a mechanism to select the most interpretable and informative reasoning paths automatically. We achieve this via the confidence estimation of PLMs, which is motivated by related work on machine translation and speech recognition that assesses the quality of a prediction [21, 29, 30]. Given an entity pair (x, y), with a reasoning path p(x, y), a trained model M, and the corresponding prediction M(p(x, y)), the confidence of the prediction can be evaluated by the posterior probability P(M(p(x, y)) | p(x, y)). We select the reasoning path associated with the highest confidence score:

\[ \hat{p}(x, y) = \arg\max_{p(x, y) \in \mathcal{P}(x, y)} P(M(p(x, y)) \mid p(x, y)). \]
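The selection step can be sketched as follows, reusing a fine-tuned seq2seq model as in the earlier sketch. The use of the Hugging Face generate API and its length-normalized beam score as the confidence estimate is an assumption for illustration; it approximates, rather than exactly reproduces, the posterior probability above.

```python
# Sketch of reasoning path selection by confidence estimation: decode each candidate
# path and keep the one whose generated description has the highest (length-normalized)
# log-probability under the model.
import torch

def describe_with_confidence(model, tokenizer, encoded_path):
    inputs = tokenizer(encoded_path, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs, num_beams=4, max_length=64,
                             return_dict_in_generate=True, output_scores=True)
    description = tokenizer.decode(out.sequences[0], skip_special_tokens=True)
    confidence = out.sequences_scores[0].item()  # beam score ~ log P(M(p(x,y)) | p(x,y))
    return description, confidence

def select_path(model, tokenizer, candidate_paths):
    """Return (path, description, confidence) for the most confident candidate."""
    scored = [(p, *describe_with_confidence(model, tokenizer, p)) for p in candidate_paths]
    return max(scored, key=lambda item: item[2])

# e.g., candidates for (Haste, Germany):
# select_path(model, tokenizer, [
#     "Haste; country: Germany",
#     "Haste; located in the administrative territorial entity: Schaumburg; country: Germany",
# ])
```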
Reasoning path selection by confidence estimation of PLMs is intuitive since 1) if a reasoning path is more interpretable, which means the path is easier to convert to a precise relation description, PLMs can "reason" between entities based on their knowledge with less effort; 2) if a reasoning path is more informative, which means the reasoning path provides useful relational knowledge, PLMs can incorporate such information into the prediction without guessing the necessary information. In both cases, the confidence of the prediction will be higher.

With the reasoning path selection mechanism, given an entity pair (x, y), the generated relation description is M(p̂(x, y)), where p̂(x, y) is the reasoning path associated with the highest confidence score. The selected reasoning path can also serve as a support of the prediction and help users to understand the relation in a structured view. To get the trained model M, we can directly apply RelationBART-SP introduced in Section 2.3. We name RelationBART-SP with reasoning path selection as RelationBART-SP + PS. To make the training more robust and let PLMs learn more features from valid reasoning paths, for each entity pair, we can sample more than one reasoning path, e.g., the shortest n reasoning paths with hops ≤ k, to train the model. We refer to this model as RelationBART-MP + PS.
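For the multiple-path (MP) variant, sampling the shortest n paths with at most k hops can be sketched as below; again networkx is assumed purely for illustration. Each sampled node sequence can then be serialized with the same encoding scheme as in Section 2.3 and paired with the reference sentence for training.

```python
# Sketch: collect up to n shortest reasoning paths with at most k hops for an entity pair.
import networkx as nx

def sample_paths(graph, head, tail, n=5, k=3):
    """Return up to n node sequences from head to tail, each with at most k hops."""
    paths = []
    try:
        for nodes in nx.shortest_simple_paths(graph, head, tail):
            if len(nodes) - 1 > k:   # paths are yielded from shortest to longest
                break
            paths.append(nodes)
            if len(paths) == n:
                break
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        pass
    return paths
```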

3 EXPERIMENTS

3.1 Dataset Construction and Analysis

We use Wikipedia and Wikidata [34] to build a benchmark dataset for open relation modeling.

The first sentences of Wikipedia pages are definition-like sentences connecting different entities. For instance, the first sentence of the page Deep Learning is s = "[Deep learning] (also known as deep structured learning) is part of a broader family of [machine learning] methods based on [artificial neural networks] with [representation learning]." The head entity of this sentence is deep learning, and there are three tail entities: machine learning, artificial neural networks, and representation learning, which are linked to other pages and can be easily extracted with simple text preprocessing. Combining the head entity and the three tail entities, we can construct three entity pairs, whose expected relation descriptions are all s. We use the 2021-03-20 dump of English Wikipedia.³ For each page, we extract the plain text with WikiExtractor⁴ and further extract the first sentence. We randomly split entity pairs to build train/dev/test sets, where the head entities do not overlap across sets.

To provide reasoning paths for open relation modeling, we sample part of Wikidata to build a knowledge graph. Specifically, we keep facts whose head and tail entities both appear in Wikipedia. The extracted KG contains 5,033,531 entities and 23,747,210 fact triples. The relation between two entities is considered k-hop if the shortest reasoning path between them is k-hop. Table 1 summarizes the statistics of the data.

Table 1: The statistics of the data.

             train       dev      test
number   5,434,158    27,431    55,226

            1-hop    2-hop    3-hop    > 3-hop
ratio (%)   35.14    17.80     7.33      39.73

³ https://dumps.wikimedia.org/enwiki/20210320
⁴ https://github.com/attardi/wikiextractor
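As an illustration of the entity-pair extraction step just described, the sketch below turns a single link-annotated first sentence into training pairs; the bracket markup and the helper build_pairs are simplified assumptions, and real preprocessing runs over the full dump extracted with WikiExtractor.

```python
# Toy sketch of turning a definition-like first sentence with marked links into
# entities-definition training pairs; the "[...]" markup stands in for hyperlinks.
import re

def build_pairs(head_entity, first_sentence_with_links):
    """Return (input, target) pairs: one per linked tail entity in the sentence."""
    tails = re.findall(r"\[([^\]]+)\]", first_sentence_with_links)
    definition = re.sub(r"\[([^\]]+)\]", r"\1", first_sentence_with_links)  # strip link markup
    return [(f"{head_entity}; {tail}", definition) for tail in tails if tail != head_entity]

sentence = ("[Deep learning] (also known as deep structured learning) is part of a broader "
            "family of [machine learning] methods based on [artificial neural networks] "
            "with [representation learning].")
for source, target in build_pairs("Deep learning", sentence):
    print(source)
# Deep learning; machine learning
# Deep learning; artificial neural networks
# Deep learning; representation learning
```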

Table 3: Results of open relation modeling.

                                 BLEU   ROUGE   METEOR   BERTScore
DefBART                         20.67   41.82    18.84       81.56
RelationBART-Vanilla (w/o PT)   26.01   50.84    23.65       85.37
RelationBART-Vanilla            26.81   51.48    24.14       85.73
RelationBART-SP                 27.78   52.59    24.79       86.20
RelationBART-SP + PS            28.83   53.48    25.42       86.40
RelationBART-MP + PS            29.51   53.74    25.64       86.51
RelationBART-Vanilla (Large)    27.93   52.10    24.72       86.03
RelationBART-SP (Large)         29.21   53.01    25.37       86.43
RelationBART-MP (Large) + PS    29.72   54.10    25.89       86.70

Table 4: Results of open relation modeling with 1% and 10% training data.

                                 BLEU   ROUGE   METEOR   BERTScore
1% training data
  RelationBART-Vanilla (w/o PT)  17.30   44.12    19.02       81.56
  RelationBART-Vanilla           20.99   47.11    21.23       84.04
10% training data
  RelationBART-Vanilla (w/o PT)  22.88   48.50    22.07       84.31
  RelationBART-Vanilla           24.31   49.89    22.99       85.16

3.2 Experimental Setup

Baselines. Because our task of open relation modeling is new, there is no existing baseline for model comparison. We design the following baselines/variants for evaluation:
• DefBART: Since the expected output is a definition-like sentence, the model proposed in [5] can be applied directly, i.e., generating the definition of the head entity with the head entity as input. The difference is that we do not provide any context for the head entity. This lets us observe the performance gain of relation modeling over definition modeling in terms of generating definitions and see the difference between the two tasks.
• RelationBART-Vanilla: The vanilla version of our model introduced in Section 2.2.
• RelationBART-Vanilla (w/o PT): A variant of RelationBART-Vanilla in which the BART model is not pre-trained. By comparing with this baseline, we can verify the contribution of pre-training to open relation modeling.
• RelationBART-SP: The shortest-path version of our model introduced in Section 2.3.
• RelationBART-SP + PS: The shortest-path version of our model, combined with the reasoning path selection mechanism (Section 2.4).
• RelationBART-MP + PS: The multiple-path version of our model, combined with the reasoning path selection mechanism (Section 2.4).

Metrics. Following existing work on text generation, we apply several widely-used metrics to automatically evaluate the performance of open relation modeling, including BLEU [22], ROUGE-L [18], METEOR [3], and BERTScore [37]. Among them, BLEU and ROUGE-L are based on simple string matches, and METEOR also incorporates word stems, synonyms, and paraphrases for matching. These three metrics mainly measure surface similarities. BERTScore is based on the similarities of contextual token embeddings. We also conduct a human evaluation by asking human annotators to assign graded values (1-4) to sampled predictions according to Table 2.

Table 2: Annotation guidelines excerpt.

Rating  Criterion
4       The relation is well captured, and important information about entities is included and correctly predicted.
3       The prediction contains minor error(s) that do not affect the understanding of the relation.
2       The prediction contains major error(s) that affect the understanding of the relation, while the relation can still be inferred to some extent.
1       The prediction contains major error(s) that will mislead the understanding of the relation.

Implementation Details. We employ the fairseq library⁵ to build the RelationBART models and adopt the key hyperparameters suggested in [16]. We manually set the learning rate to 5 × 10⁻⁵ and the batch size to 1,024 tokens based on some preliminary experiments and the memory size of our GPUs. We set the maximum reasoning length to 3 since the number of reasoning paths with hops > 3 is very large and the quality of these paths is generally low. For RelationBART-MP and reasoning path selection, we sample at most 5 reasoning paths with hops ≤ 3. Unless otherwise noted, we apply the BART-base model, and denote "Large" when using the BART-large model. All the models were trained on NVIDIA Quadro RTX 5000 GPUs, and the training converged within 50 epochs. The training times of RelationBART-Vanilla, RelationBART-MP, and RelationBART-MP (Large) for one epoch with 3 GPUs are 80 minutes, 4 hours, and 7 hours, respectively.

⁵ https://github.com/pytorch/fairseq/tree/master/examples/bart
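The automatic metrics can be computed with standard packages, e.g., as in the hedged sketch below; nltk, rouge-score, and bert-score are assumed here, and the exact evaluation scripts and configurations behind the reported numbers are not specified in this sketch (METEOR is computed analogously and omitted for brevity).

```python
# Hedged sketch of the automatic evaluation for one prediction-reference pair.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer
from bert_score import score as bert_score

reference = "Data mining is a process of extracting and discovering patterns in large data sets."
prediction = "Data mining is the process of extracting information from a data set."

bleu = sentence_bleu([reference.split()], prediction.split(),
                     smoothing_function=SmoothingFunction().method1)
rouge_l = rouge_scorer.RougeScorer(["rougeL"]).score(reference, prediction)["rougeL"].fmeasure
_, _, f1 = bert_score([prediction], [reference], lang="en")

print(f"BLEU: {bleu:.3f}  ROUGE-L: {rouge_l:.3f}  BERTScore-F1: {f1.item():.3f}")
```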
3.3 Open Relation Modeling

Table 3 presents the experimental results of open relation modeling with the automatic metrics. We observe that RelationBART-Vanilla achieves much better performance than DefBART, which demonstrates the necessity of the tail entity for generating definition-like sentences that imply relations between entities. Besides, RelationBART-Vanilla outperforms the version without pre-training, which indicates that knowledge stored in PLMs after pre-training is helpful for open relation modeling. However, the improvement is not very significant, which may be because the size of our training data is large; thus the model can learn rich knowledge about entities from definitions of entities even without pre-training. To verify this, we also train the model with smaller amounts of data. From Table 4, we observe that when the training data becomes smaller, the performance of the version without pre-training decreases much faster than that of the one with pre-training.

Compared to RelationBART-Vanilla, the models with reasoning paths all achieve better performance, which demonstrates that reasoning paths can help PLMs reason between entities and provide additional relational knowledge to PLMs for open relation modeling. Besides, the models with the reasoning path selection mechanism outperform the ones without it, which indicates that PLMs can select more interpretable and informative reasoning paths based on confidence estimation, and the selected reasoning paths can guide PLMs to generate more reasonable and precise relation descriptions.

We also divide the testing examples into two groups: reasonable, where the entities can be reasoned within 3 hops in the Wikidata knowledge graph, and hard-to-reason, where the entities cannot be reasoned within 3 hops. From the results shown in Table 5, we observe that, for the reasonable pairs, the performance improvement is significant, while for the hard-to-reason pairs, there is not much difference in model performance. This is because, for hard-to-reason pairs, PLMs cannot incorporate additional relational knowledge from KGs with only the encoding "x; unknown: y", which also shows that the training of the model is stable and the variance of the results is low. Besides, all the models perform much better on reasonable pairs, which indicates that if two entities can be reasoned in existing KGs within fewer hops, it is easier to generate their relation descriptions with PLMs, no matter whether a reasoning path is incorporated or not.

Table 5: Results of open relation modeling for reasonable and hard-to-reason pairs.

                                 BLEU   ROUGE   METEOR   BERTScore
hard-to-reason (> 3-hop)
  RelationBART-Vanilla          22.99   47.25    22.21       84.39
  RelationBART-SP               23.07   47.36    22.32       84.42
  RelationBART-MP + PS          22.63   46.91    21.99       84.24
  RelationBART-Vanilla (Large)  24.24   47.97    22.88       84.76
  RelationBART-SP (Large)       24.50   47.81    22.90       84.70
  RelationBART-MP (Large) + PS  22.92   47.45    22.34       84.55
reasonable (≤ 3-hop)
  RelationBART-Vanilla          29.61   54.25    25.56       86.61
  RelationBART-SP               31.24   56.00    26.62       87.35
  RelationBART-MP + PS          34.52   58.21    28.36       87.99
  RelationBART-Vanilla (Large)  30.64   54.81    26.08       86.86
  RelationBART-SP (Large)       32.66   56.42    27.20       87.56
  RelationBART-MP (Large) + PS  34.69   58.45    28.54       88.11

Table 6: Qualitative evaluation results of open relation modeling.

                               Rating (1-4)
RelationBART-Vanilla (Large)           2.67
RelationBART-SP (Large)                2.82
RelationBART-MP (Large) + PS           3.01

Qualitative Evaluation. We also perform a qualitative evaluation by asking three annotators to assign graded values to relation descriptions generated by our models according to Table 2. We randomly sample 100 reasonable entity pairs from the test set for evaluation. The average pairwise Cohen's kappa is 0.67, which indicates substantial agreement (0.61-0.8) according to [14].

Table 6 presents the qualitative evaluation results of open relation modeling. We observe that the performance is satisfactory. Our best model, RelationBART-MP (Large) + PS, achieves a rating of about 3, which means the model can often generate a relation description that well captures the relation, with only minor errors that do not affect the understanding of the relation. In addition, the qualitative evaluation results are consistent with the quantitative evaluation results in Table 3 and Table 5, which validates the function of the reasoning paths and the reasoning path selection mechanism.
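For reference, the average pairwise Cohen's kappa reported above can be computed as in the following minimal sketch; the annotator labels are made-up placeholders rather than the actual annotations.

```python
# Minimal sketch: average pairwise Cohen's kappa over three annotators (placeholder labels).
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

annotations = {
    "annotator_1": [4, 3, 3, 2, 4, 1, 3, 4],
    "annotator_2": [4, 3, 2, 2, 4, 1, 3, 3],
    "annotator_3": [3, 3, 3, 2, 4, 2, 3, 4],
}

kappas = [cohen_kappa_score(annotations[a], annotations[b])
          for a, b in combinations(annotations, 2)]
print(f"average pairwise kappa: {sum(kappas) / len(kappas):.2f}")
```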

3.4 Reasoning Path Selection

Results in Tables 3, 5, and 6 indicate that machines can select better reasoning paths for open relation modeling by confidence estimation. In this section, we aim to test the quality of the selected reasoning paths from a human understanding perspective.

We randomly select 300 entity pairs from the test set and ensure all the pairs are associated with at least two reasoning paths with hops ≤ 3. For each entity pair, we randomly select 2 reasoning paths and manually label which one is more interpretable and informative, i.e., from which path humans can more easily understand the relation between the two entities. We skip pairs for which it is difficult to judge which path is better. Among the 300 pairs, 106 pairs were skipped.

Table 7 reports the results of reasoning path selection with different methods. The Random Walk baseline selects the reasoning path by the probability of generating the path starting from the head entity, as suggested by [15]. The Shortest Path baseline selects the path with the shorter length (for the 52 cases where the two paths have the same length, we randomly choose one). We should notice that there is a gap between machine understanding and human understanding: machines may interpret a reasoning path differently from humans; therefore, a reasoning path that is more interpretable and informative for humans may be the opposite for machines.

We can see that the performance of RelationBART-MP (Large) is quite impressive: machines make the same choices as humans in more than 80% of the cases. In addition, the results in Table 7 are consistent with the results in Table 3, which indicates that a better reasoning path selection mechanism can promote machines to generate better relation descriptions.

Table 7: Results of reasoning path selection.

                           Accuracy (%)
Random Walk                       64.43
Shortest Path                     61.34
RelationBART-SP (Large)           72.68
RelationBART-MP (Large)           80.93

3.5 Generation Examples and Error Analysis

Table 8 shows some generation examples from the RelationBART-MP (Large) model. The first row of each group is the reference definition in Wikipedia. The first group contains reasoning paths connecting Romeries and France; we can see that the second reasoning path is the most interpretable and informative, its confidence score is the highest, and the generated relation description is the same as the ground truth.

The second group contains four entity pairs whose head entity is data mining, and the reasoning path presented in each row is the one associated with the highest confidence score. By reading these generated relation descriptions, humans can better understand data mining and its relationship with other terms.

The third group contains five hard-to-reason entity pairs whose head entity is evaluation. We can see that the reference definition of evaluation is quite abstract and cannot capture the relation between evaluation and a specific field, while the generated relation descriptions are more precise and easier to understand. By reading these sentences, humans can understand what evaluation means in different fields and how it relates to them.

Table 8: Sample of relation descriptions generated by RelationBART-MP (Large).

Input: Romeries [reference]
  Reasoning Path (Encoding): -
  Output: Romeries is a commune in the Nord department in northern France.
  Score: -
Input: (Romeries, France)
  Reasoning Path: Romeries; country: France [NOT selected]
  Output: Romeries is a commune in the Somme department in Hauts-de-France in northern France.
  Score: -0.29
  Reasoning Path: Romeries; located in the administrative territorial entity: Nord; country: France [selected]
  Output: Romeries is a commune in the Nord department in northern France.
  Score: -0.13
  Reasoning Path: Romeries; shares border with: Solesmes; country: France [NOT selected]
  Output: Romeries is a commune in the Marne department in north-eastern France.
  Score: -0.31

Input: data mining [reference]
  Reasoning Path (Encoding): -
  Output: Data mining is a process of extracting and discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.
  Score: -
Input: (data mining, artificial intelligence)
  Reasoning Path: data mining; facet of: machine learning; subclass of: artificial intelligence
  Output: Data mining is a subfield of machine learning and artificial intelligence concerned with the collection, processing, and analysis of large amounts of data.
  Score: -0.95
Input: (data mining, database)
  Reasoning Path: data mining; facet of: database
  Output: Data mining is the process of extracting information from a data set, such as a database.
  Score: -0.94
Input: (data mining, statistics)
  Reasoning Path: data mining; facet of: machine learning; partially coincident with: statistics
  Output: Data mining is the process of collecting, processing, and analyzing large amounts of data in the form of statistics and machine learning.
  Score: -1.04
Input: (data mining, data)
  Reasoning Path: data mining; facet of: machine learning; partially coincident with: statistics; studies: data
  Output: Data mining is the process of extracting useful information from data.
  Score: -0.81

Input: evaluation [reference]
  Reasoning Path (Encoding): -
  Output: Evaluation is a systematic determination of a subject's merit, worth and significance, using criteria governed by a set of standards.
  Score: -
Input: (evaluation, machine learning)
  Reasoning Path: evaluation; unknown: machine learning
  Output: In computer science, evaluation is the process of evaluating a machine learning algorithm to determine whether the algorithm is performing well.
  Score: -1.30
Input: (evaluation, algorithm)
  Reasoning Path: evaluation; unknown: algorithm
  Output: In computer science, evaluation is the process of evaluating an algorithm to determine whether it is correct.
  Score: -1.13
Input: (evaluation, robotics)
  Reasoning Path: evaluation; unknown: robotics
  Output: In robotics, evaluation is the process of determining whether or not a particular component of a system is working properly.
  Score: -1.54
Input: (evaluation, software engineering)
  Reasoning Path: evaluation; unknown: software engineering
  Output: In computer science and software engineering, evaluation is the process of determining whether a particular feature or feature should be added to a product or service.
  Score: -1.26
Input: (evaluation, computer security)
  Reasoning Path: evaluation; unknown: computer security
  Output: In computer security, evaluation is the process of determining the security of a computer system.
  Score: -1.09
Error Analysis. To further understand the quality of the outputs produced by our model and identify the remaining challenges of open relation modeling, we investigate the error cases found by examining the generated relation descriptions. We found that most errors can be characterized as hallucinations, i.e., producing irrelevant or contradictory facts. This type of error arises mainly because the knowledge coming from pre-training, fine-tuning, and reasoning paths is not sufficient for generating an informative relation description for the input entity pair.

Taking the entity pair (Romeries, France) in Table 8 as an example, if the model takes the shortest reasoning path, i.e., Romeries; country: France, as input, it generates a relation description that wrongly predicts the department of Romeries. This is because knowledge about the department is missing from the reasoning path, and such detailed knowledge is also difficult to obtain from the trained model.
In addition, PLMs can select interpretable Hallucination is a common issue and challenging problem in text and informative reasoning paths by confidence estimation, and the generation. From the results in Table 6 and the generation examples, selected reasoning paths can in turn guide PLMs to generate better we can observe hallucination is reduced by incorporating reasoning relation descriptions. Experimental results show that our methods paths and the reasoning path selection mechanism. How to further can generate reasonable and precise relation descriptions with the alleviate it for open relation modeling will be our further work desired characteristics. direction. We believe this work will open a door for modeling relations between entities. We also need to mention that this work is just the beginning of this worthwhile direction. As for future work, we 4 RELATED WORK plan to improve our model and apply our methods to downstream applications, e.g., a system for users to explore relations between Previously, Voskarides et al. [33] study the problem of extracting entities. We can also use the generated relation descriptions to sentences that describe relations between entities with direct rela- help some related tasks, such as relation extraction [2], knowledge tions in a knowledge graph. They model this task as a learning to graph construction and completion [12]. The trained models can be rank problem and design a supervised learning model with manu- further fine-tuned for open relation modeling on specific domains. ally annotated sentences. As follow-up work, Huang et al. [11] solve this task with training data built by leveraging clickthrough data from Web search, and Voskarides et al. [32] generate the description REFERENCES of a relationship instance in a knowledge graph by filling created [1] Oshin Agarwal, Heming Ge, Siamak Shakeri, and Rami Al-Rfou. 2021. Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language sentence templates with appropriate entities. The ability of these Model Pre-training. In Proceedings of the 2021 Conference of the North Ameri- models is limited since they heavily rely on the features of entities can Chapter of the Association for Computational Linguistics: Human Language and relations; thus these models can only handle entities with sev- Technologies. 3554–3565. [2] Nguyen Bach and Sameer Badaskar. 2007. A review of relation extraction. Liter- eral pre-specified types (only 10 in[32]) of explicit relations in KGs ature review for Language and Statistics II 2 (2007), 1–15. (e.g., isMemberOfMusicGroup), while our proposed methods can [3] Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for deal with a large number of types of relations, including implicit MT evaluation with improved correlation with human judgments. In Proceedings of the acl workshop on intrinsic and extrinsic evaluation measures for machine ones (e.g., evaluation and algorithm), i.e., in an “open” setting. translation and/or summarization. 65–72. Recently, Lin et al. [17] propose a constrained text generation [4] Michele Banko, Michael J Cafarella, Stephen Soderland, Matthew Broadhead, and Oren Etzioni. 2007. Open Information Extraction from the Web. In IJCAI. task, named CommonGen, that aims to generate coherent sentences [5] Michele Bevilacqua, Marco Maru, and Roberto Navigli. 2020. Generationary describing everyday scenarios containing the given common con- or:“How We Went beyond Word Sense Inventories and Learned to Gloss”. In cepts. 
5 CONCLUSION

In this paper, we introduce and study the novel open relation modeling problem: generating coherent sentences to describe general relations between entities, where the relations between entities can be multi-hop and may not even be reachable in an existing KG. We achieve this by fine-tuning PLMs to generate definition-like sentences conditioned on extracted entity pairs. We show that reasoning paths extracted from KGs can help PLMs reason between entities and provide additional relational knowledge to PLMs for open relation modeling. In addition, PLMs can select interpretable and informative reasoning paths by confidence estimation, and the selected reasoning paths can in turn guide PLMs to generate better relation descriptions. Experimental results show that our methods can generate reasonable and precise relation descriptions with the desired characteristics.

We believe this work will open a door for modeling relations between entities, and we note that it is just the beginning of this worthwhile direction. As future work, we plan to improve our model and apply our methods to downstream applications, e.g., a system for users to explore relations between entities. The generated relation descriptions can also help related tasks such as relation extraction [2] and knowledge graph construction and completion [12]. The trained models can be further fine-tuned for open relation modeling on specific domains.

REFERENCES

[1] Oshin Agarwal, Heming Ge, Siamak Shakeri, and Rami Al-Rfou. 2021. Knowledge Graph Based Synthetic Corpus Generation for Knowledge-Enhanced Language Model Pre-training. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 3554–3565.
[2] Nguyen Bach and Sameer Badaskar. 2007. A review of relation extraction. Literature review for Language and Statistics II 2 (2007), 1–15.
[3] Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. 65–72.
[4] Michele Banko, Michael J Cafarella, Stephen Soderland, Matthew Broadhead, and Oren Etzioni. 2007. Open Information Extraction from the Web. In IJCAI.
[5] Michele Bevilacqua, Marco Maru, and Roberto Navigli. 2020. Generationary or: "How We Went beyond Word Sense Inventories and Learned to Gloss". In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 7207–7221.
[6] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners. In Advances in Neural Information Processing Systems, Vol. 33. Curran Associates, Inc., 1877–1901.
[7] Wenhu Chen, Wenhan Xiong, Xifeng Yan, and William Wang. 2018. Variational knowledge graph reasoning. arXiv preprint arXiv:1803.06581 (2018).
[8] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 4171–4186.
[9] Pierre Dognin, Igor Melnyk, Inkit Padhi, Cicero dos Santos, and Payel Das. 2020. DualTKB: A Dual Learning Bridge between Text and Knowledge Base. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 8605–8616.
[10] Maxwell Forbes, Ari Holtzman, and Yejin Choi. 2019. Do Neural Language Representations Learn Physical Commonsense?. In CogSci.
[11] Jizhou Huang, Wei Zhang, Shiqi Zhao, Shiqiang Ding, and Haifeng Wang. 2017. Learning to explain entity relationships by pairwise ranking with convolutional neural networks. In Proceedings of the 26th International Joint Conference on Artificial Intelligence. 4018–4025.
[12] Shaoxiong Ji, Shirui Pan, Erik Cambria, Pekka Marttinen, and S Yu Philip. 2021. A Survey on Knowledge Graphs: Representation, Acquisition, and Applications. IEEE Transactions on Neural Networks and Learning Systems (2021).
[13] Karen Kukich. 1983. Design of a knowledge-based report generator. In 21st Annual Meeting of the Association for Computational Linguistics. 145–150.
[14] J Richard Landis and Gary G Koch. 1977. The measurement of observer agreement for categorical data. Biometrics (1977), 159–174.
[15] Ni Lao, Tom Mitchell, and William Cohen. 2011. Random walk inference and learning in a large scale knowledge base. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing. 529–539.
[16] Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2020. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 7871–7880.
[17] Bill Yuchen Lin, Wangchunshu Zhou, Ming Shen, Pei Zhou, Chandra Bhagavatula, Yejin Choi, and Xiang Ren. 2020. CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings. 1823–1840.
[18] Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out. 74–81.
[19] Xiao Liu, Yanan Zheng, Zhengxiao Du, Ming Ding, Yujie Qian, Zhilin Yang, and Jie Tang. 2021. GPT Understands, Too. arXiv preprint arXiv:2103.10385 (2021).
[20] Ye Liu, Yao Wan, Lifang He, Hao Peng, and Philip S Yu. 2020. KG-BART: Knowledge Graph-Augmented BART for Generative Commonsense Reasoning. arXiv preprint arXiv:2009.12677 (2020).
[21] Jan Niehues and Ngoc-Quan Pham. 2019. Modeling Confidence in Sequence-to-Sequence Models. In Proceedings of the 12th International Conference on Natural Language Generation. 575–583.
[22] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. 311–318.
[23] Fabio Petroni, Tim Rocktäschel, Sebastian Riedel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, and Alexander Miller. 2019. Language Models as Knowledge Bases?. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2463–2473.
[24] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI blog 1, 8 (2019), 9.
[25] Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2020. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research 21 (2020), 1–67.
[26] Kyle Richardson and Ashish Sabharwal. 2020. What does my QA model know? Devising controlled probes using expert knowledge. Transactions of the Association for Computational Linguistics 8 (2020), 572–588.
[27] Adam Roberts, Colin Raffel, and Noam Shazeer. 2020. How Much Knowledge Can You Pack into the Parameters of a Language Model?. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 5418–5426.
[28] Michael Schmitz, Stephen Soderland, Robert Bart, Oren Etzioni, et al. 2012. Open language learning for information extraction. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. 523–534.
[29] Manhung Siu and Herbert Gish. 1999. Evaluation of word confidence for speech recognition systems. Computer Speech & Language 13, 4 (1999), 299–319.
[30] Nicola Ueffing and Hermann Ney. 2007. Word-level confidence estimation for machine translation. Computational Linguistics 33, 1 (2007), 9–40.
[31] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems. 5998–6008.
[32] Nikos Voskarides, Edgar Meij, and Maarten de Rijke. 2017. Generating descriptions of entity relationships. In European Conference on Information Retrieval. Springer, 317–330.
[33] Nikos Voskarides, Edgar Meij, Manos Tsagkias, Maarten de Rijke, and Wouter Weerkamp. 2015. Learning to explain entity relationships in knowledge graphs. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 564–574.
[34] Denny Vrandečić and Markus Krötzsch. 2014. Wikidata: a free collaborative knowledgebase. Commun. ACM 57, 10 (2014), 78–85.
[35] Chenguang Wang, Xiao Liu, and Dawn Song. 2020. Language Models are Open Knowledge Graphs. arXiv preprint arXiv:2010.11967 (2020).
[36] Wenhan Xiong, Thien Hoang, and William Yang Wang. 2017. DeepPath: A reinforcement learning method for knowledge graph reasoning. arXiv preprint arXiv:1707.06690 (2017).
[37] Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q Weinberger, and Yoav Artzi. 2019. BERTScore: Evaluating Text Generation with BERT. In International Conference on Learning Representations.
[38] Xuhui Zhou, Yue Zhang, Leyang Cui, and Dandan Huang. 2020. Evaluating commonsense in pre-trained language models. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 9733–9740.
