
JEL: Applying End-to-End Neural Entity Linking in JPMorgan Chase

Wanying Ding, Vinay K. Chaudhri, Naren Chittar, Krishna Konakanchi
JPMorgan Chase & Co.
{wanying.ding, vinay.chaudhri, naren.chittar}@jpmchase.com, [email protected]

Abstract

Knowledge Graphs have emerged as a compelling abstraction for capturing key relationships among the entities of interest to enterprises and for integrating data from heterogeneous sources. JPMorgan Chase (JPMC) is leading this trend by leveraging knowledge graphs across the organization for multiple mission-critical applications such as risk assessment, fraud detection, investment advice, etc. A core problem in leveraging a knowledge graph is to link mentions (e.g., company names) that are encountered in textual sources to entities in the knowledge graph. Although several techniques exist for entity linking, they are tuned for entities that exist in Wikipedia, and fail to generalize for the entities that are of interest to an enterprise. In this paper, we propose a novel end-to-end neural entity linking model (JEL) that uses minimal context information and a margin loss to generate entity embeddings, and a Wide & Deep Learning model to match character and semantic information respectively. We show that JEL achieves state-of-the-art performance in linking mentions of company names in financial news with entities in our knowledge graph. We report on our efforts to deploy this model in a company-wide system to generate alerts in response to financial news. The methodology used for JEL is directly applicable and usable by other enterprises who need entity linking solutions for data that are unique to their respective situations.

Introduction

Knowledge Graphs are being used for a wide range of applications from space, journalism, and biomedicine to entertainment, network security, and pharmaceuticals. Within JPMorgan Chase (JPMC), we are leveraging knowledge graphs for financial applications such as risk management, supply chain analysis, strategy implementation, fraud detection, investment advice, etc. While leveraging a knowledge graph, Entity Linking (EL) is a central task for semantic text understanding and information extraction. As defined in many studies (Zhang et al. 2010; Eshel et al. 2017; Kolitsas, Ganea, and Hofmann 2018), in an EL task we link a potentially ambiguous Mention (such as a company name) with its corresponding Entity in a knowledge graph. EL can facilitate several knowledge graph applications. For example, the mentions of company names in the news are inherently ambiguous, and by relating such mentions with an internal knowledge graph, we can generate valuable alerts for financial analysts. In Figure 1, we show a concrete example in which the name "Lumier" has been mentioned in two different news items. The two "Lumier"s are different companies in the real world, and their positive financial activities should be brought to the attention of different stakeholders. With a successful EL engine, these two mentions of "Lumier" can be distinguished and linked to their corresponding entities in a knowledge graph.

Figure 1: Example for Entity Linking

Prior work on EL has been driven by a number of standard datasets, such as CoNLL-YAGO (Suchanek, Kasneci, and Weikum 2007), TAC KBP [1], DBpedia [2], and ACE [3]. These datasets are based on Wikipedia, and are therefore naturally coherent, well-structured, and rich in context (Eshel et al. 2017). We face the following problems when we use these methods for entity linking against our internal knowledge graph:

1) Wikipedia does not cover all the entities of financial interest. For example, as of this writing, the startup "Lumier" mentioned in Figure 1 is not present in Wikipedia, but it is of high financial interest as it has raised critical investment from famous investors.

2) Lack of context information. Many pre-trained models achieve great performance by leveraging rich context data from Wikipedia (Ganea and Hofmann 2017). For JPMC internal data, we do not have information comparable to Wikipedia to support re-training or fine-tuning of existing models.

[1] https://www.ldc.upenn.edu/collaborations/current-projects/tac-kbp
[2] https://wiki.dbpedia.org/develop/datasets
[3] https://catalog.ldc.upenn.edu/LDC2006T06

To address the problems identified above, we built a novel entity linking system, JEL, to link mentions of company names in text to entities in our own knowledge graph.
Our model makes the following advancements on the current state-of-the-art:

1) We do not rely on Wikipedia to generate entity embeddings. With minimal context information, we compute entity embeddings by training with a margin loss function.

2) We deploy Wide & Deep Learning (Cheng et al. 2016) to match character and semantic information respectively.

3) Unlike other deep learning models (Martins, Marinho, and Martins 2019; Kolitsas, Ganea, and Hofmann 2018; Ganea and Hofmann 2017), JEL applies a simple linear layer to learn character patterns, making the model more efficient in both the training phase and the inference phase.

Problem Definition and Related Work

Problem Definition

We assume a knowledge graph (KG) has a set of entities E. We further assume that W is the vocabulary of words in the input documents. An input document D is given as a sequence of words: D = {w_1, w_2, ..., w_d} where w_k ∈ W, 1 ≤ k ≤ d. The output of an EL model is a list of T mention-entity pairs {(m_i, e_i)}_{i ∈ {1,T}}, where each mention is a word subsequence of D, m_i = w_l, ..., w_r, 1 ≤ l ≤ r ≤ d, and each entity e_i ∈ E. The entity linking process involves the following two steps (Ceccarelli et al. 2013).

1) Recognition. Recognize a list of mentions m_i as a set of all contiguous sequential words occurring in D that might mention some entity e_i ∈ E. We adopted spaCy [4] for mention recognition.

2) Linking. Given a mention m_i and the set of candidate entities C(m_i) from the KG, such that |C(m_i)| > 1, choose the correct entity e_i ∈ C(m_i) to which the mention should be linked. We focus on solving the linking problem in this paper.

[4] https://spacy.io/

Popular Methods

Entity Linking is a classical NLP problem for which the following techniques have been used: String Matching, Context Similarity, Machine Learning Classification, Learning to Rank, and Deep Learning. In the following paragraphs, we briefly discuss each of them.

– String Matching Methods. String matching measures the similarity between the mention string and the entity name string. We experimented with different string matching methods for name matching, including Jaccard, Levenshtein, Ratcliff-Obershelp, Jaro-Winkler, and N-Gram Cosine Similarity, and found that n-gram cosine similarity achieves the best performance on our internal data (a minimal sketch of this measure follows this list). However, pure string-matching methods break down when two different entities share similar or identical names (as shown in Figure 1), which motivates the need for better matching techniques.

– Context Similarity Methods. Context similarity methods compare the respective context words of mentions and entities. The context words for a mention are the words surrounding it in the document. The context words for an entity are the words describing it in the KG. Similarity functions, such as Cosine Similarity or Jaccard Similarity, are widely used to compare the two sets of context words (Cucerzan 2007; Mihalcea and Csomai 2007), and then to decide whether a mention and an entity should be linked.

– Machine Learning Classification. Many studies adopt machine learning techniques for the EL task. Binary classifiers, such as Naive Bayes (Varma et al. 2009), C4.5 (Milne and Witten 2008), Binary Logistic classifiers (Han, Sun, and Zhao 2011), and Support Vector Machines (SVM) (Zhang et al. 2010), can be trained on mention-entity pairs to decide whether they should be linked.

– Learn to Rank Methods. As a classification method can label more than one mention-entity pair as positive, many systems use a ranking model (Zheng et al. 2010) to select the most likely match. Learning to Rank (LTR) is a class of techniques that applies supervised machine learning to solve ranking problems.

– Deep Learning Methods. Deep learning has achieved success on numerous tasks including EL (Sun et al. 2015; Huang, Heck, and Ji 2015; Francis-Landau, Durrett, and Klein 2016). One specific model (Kolitsas, Ganea, and Hofmann 2018) uses two levels of Bi-LSTM to embed characters into words, and words into mentions, and calculates the similarity between a mention vector and a pre-trained entity vector (Ganea and Hofmann 2017) to decide whether they match.
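For concreteness, the following is a minimal Python sketch of n-gram cosine similarity over character n-grams; the helper names are ours, and the tf-idf token weighting used in our experiments is omitted for brevity.

# Illustrative sketch of n-gram cosine similarity for name matching
# (assumed helper, not the production implementation).
from collections import Counter
import math

def char_ngrams(s: str, n: int = 3) -> Counter:
    """Split a lowercased, whitespace-stripped name into character n-grams."""
    s = "".join(s.lower().split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def ngram_cosine(a: str, b: str, n: int = 3) -> float:
    """Cosine similarity between two names in n-gram count space."""
    va, vb = char_ngrams(a, n), char_ngrams(b, n)
    dot = sum(va[g] * vb[g] for g in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

print(ngram_cosine("Salarius Pharm LLC", "Salarius Pharmaceuticals"))  # high
print(ngram_cosine("Lumier", "ParallelM"))                             # low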
Proposed Framework

Entity Embedding

Most public entity embedding models (He et al. 2013; Yamada et al. 2016; Ganea and Hofmann 2017) are designed for Wikipedia pages and require rich entity description information. In our case, each entity has a short description that is insufficient to support a solid statistical estimation of entity embeddings (Mikolov, Yih, and Zweig 2013). To address this limitation, we use a Triplet Loss model to generate our own entity embeddings from pre-trained word embedding models with limited context information.

Entity Embedding Model. To prepare training data for this model, we select 10 words that can be used as positive examples and 10 words that can be used as negative examples for each entity. To select the positive examples, we score each entity's description words with tf-idf, and select the words with the 10 highest scores. To select the negative examples, we randomly select from words that do not appear in this entity's description. Thus, for each entity, we can construct 10 <entity, positive-word, negative-word> triplets to feed into the triplet loss function formulated as Equation 1 below.

Loss = \sum_{i=1}^{N} \left[ \| f_i^a - f_i^p \|_2^2 - \| f_i^a - f_i^n \|_2^2 + \alpha \right]_+    (1)

where f_i^a is the vector of an anchor that we learn, f_i^p is the vector from a positive sample, f_i^n is the vector from a negative sample, and \alpha is the margin hyper-parameter to be manually defined. We train the entity embedding vectors (f^a), and use off-the-shelf vectors (f^p and f^n) from the fastText language model. In our experiments, \alpha = 2.0 led to the best performance.

Figure 3: Model for Entity Embedding
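A minimal PyTorch sketch of the objective in Equation 1 is shown below; the vocabulary size, dimensionality, and the random stand-ins for the fastText word vectors are illustrative, not our production configuration.

# Minimal PyTorch sketch of the triplet objective in Equation 1.
# f^a are learned entity vectors; f^p and f^n would come from
# pre-trained fastText word vectors. Sizes are illustrative.
import torch
import torch.nn as nn

NUM_ENTITIES, DIM, ALPHA = 100_000, 300, 2.0   # alpha = 2.0 worked best

entity_emb = nn.Embedding(NUM_ENTITIES, DIM)   # anchors f^a, trainable

def triplet_loss(entity_ids, pos_vecs, neg_vecs, alpha=ALPHA):
    """Equation 1: squared-L2 triplet loss with margin alpha."""
    anchor = entity_emb(entity_ids)                   # (batch, DIM)
    d_pos = (anchor - pos_vecs).pow(2).sum(dim=1)     # ||f^a - f^p||^2
    d_neg = (anchor - neg_vecs).pow(2).sum(dim=1)     # ||f^a - f^n||^2
    return torch.clamp(d_pos - d_neg + alpha, min=0).sum()

# Example step with random stand-ins for fastText vectors:
ids = torch.randint(0, NUM_ENTITIES, (32,))
loss = triplet_loss(ids, torch.randn(32, DIM), torch.randn(32, DIM))
loss.backward()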
Entity Embedding Validation. To validate the entity embeddings, we choose five seed companies from different industries — "Google DeepMind", "Hulu", "Magellan Health", "PayPal Holdings", "Skybus Airlines". We next select their ten nearest neighbors [5] (as shown in Figure 2). We calculate a t-Distributed Stochastic Neighbor Embedding (t-SNE) to project the embeddings into a 2-dimensional space. As clearly shown in Figure 2, the five seed companies from different industries are clearly separated in space. For "Google DeepMind", we find that all its neighbors are, as expected, Artificial Intelligence and Machine Learning companies. This visualization gives us a sanity check for our entity embeddings.

[5] SambaNova Systems, Lumier, ParallelM, CTRL-labs, Coherent AI, etc.

Figure 2: Visualization and Validation of Entity Embeddings with t-SNE

Entity Linking

Two factors affect an EL model's performance: characters and semantics.

– Characters: "Lumier" will be easily distinguished from "ParallelM" because they have completely different character patterns. These patterns can be easily captured by a wide and shallow linear model.

– Semantics: In Figure 1, "Lumier (Software)" can be distinguished from "Lumier (LED)" because they have different semantic meanings behind the same name. These semantic differences can be captured by a deep learning model.

To combine the two important factors listed above, we develop a Wide & Deep Learning model (Cheng et al. 2016) for our EL task (shown in Figure 4).

Wide Character Learning. Unlike many other approaches (Kolitsas, Ganea, and Hofmann 2018; Lample et al. 2016) that apply character embeddings to incorporate lexical information, we apply a wide but shallow linear layer for the following two reasons. First, embedding aims to capture an item's semantic meaning, but characters naturally carry no such semantics: "A" in "Amazon" does not have any relationship with "A" in "Apple". Second, as an embedding layer involves more parameters to optimize, it is much slower in training and inference than a simple linear layer.

Feature Engineering. Many mentions of an entity exhibit a complex morphological structure that is hard to account for by simple word-to-word or character-to-character matching. Subwords can improve matching accuracy dramatically. Given a string, we undertake the following processing to maximize the morphological information we can get from subwords (a sketch follows this list):

1) Clean the string: convert it to lower case, remove punctuation, standardize suffixes, etc. For example, "PayPal Holdings, Inc." will change to "paypalhlds".

2) Pad the start and end of the string; "paypalhlds" will be converted to "*paypalhlds*".

3) Apply multiple levels of n-gram (n ∈ [2, 5]) segmentation; "*paypalhlds*" will be {*p, ay, ..., lhlds, hlds*}.

4) Append the original words, *paypal* and *hlds*, to the token list.
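The following sketch illustrates the four processing steps above; the suffix-standardization map is a small assumed subset of the rules actually applied.

# Sketch of the subword featurization steps above. The cleaning
# rules and SUFFIX_MAP are illustrative, not the full production list.
import re
import string

SUFFIX_MAP = {"holdings": "hlds", "inc": "", "llc": ""}   # assumed subset

def clean_tokens(name: str) -> list:
    """Step 1: lower-case, strip punctuation, standardize suffixes."""
    text = re.sub(f"[{re.escape(string.punctuation)}]", " ", name.lower())
    tokens = [SUFFIX_MAP.get(t, t) for t in text.split()]
    return [t for t in tokens if t]

def subword_tokens(name: str, n_min: int = 2, n_max: int = 5) -> list:
    words = clean_tokens(name)                 # e.g. ["paypal", "hlds"]
    padded = "*" + "".join(words) + "*"        # step 2: "*paypalhlds*"
    grams = [padded[i:i + n]                   # step 3: n-grams, n in [2, 5]
             for n in range(n_min, n_max + 1)
             for i in range(len(padded) - n + 1)]
    grams += ["*" + w + "*" for w in words]    # step 4: "*paypal*", "*hlds*"
    return grams

print(subword_tokens("PayPal Holdings, Inc.")[:5])   # ['*p', 'pa', 'ay', ...]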

Wide Character Learning. We applied a Linear Siamese Network (Bromley et al. 1994) for wide character learning. We implemented two identical linear layers with shared weights (as shown in the left part of Figure 4). With this architecture, similar inputs, T_m and T_e, will generate similar outputs, Y_m and Y_e. We apply the Euclidean distance to estimate the similarity of the outputs:

D_{syx} = d(Y_m, Y_e) = \sqrt{\sum_{i=1}^{n} (Y_{m_i} - Y_{e_i})^2}    (2)

Deep Semantic Embedding. We embed the mentions into vectors so that we can mathematically measure similarities between them and entity embeddings. We use an LSTM to embed mentions from their context. A similar idea is adopted elsewhere (Kolitsas, Ganea, and Hofmann 2018), but instead of using a Bi-LSTM over the whole context, we apply two shorter LSTMs to embed the mention from two directions (as shown in Figure 4), making the embedding more targeted with fewer parameters involved.

Given a mention m_t, we treat its left n words {w_{t-n}, ..., w_{t-1}, w_t} (mention words included) as its left context, and its right n words {w_t, w_{t+1}, ..., w_{t+n}} as its right context (mention words included) [6].

[6] We adopt the same pre-trained word embeddings as for entity embedding.

h_t^l = \overrightarrow{LSTM}(w_{t-1}, w_t)
h_t^r = \overleftarrow{LSTM}(w_{t+1}, w_t)    (3)

In addition to the LSTM, we apply an attention layer to distinguish the influence of individual words. We multiply the last layer's outputs from the LSTM, {x_i, ..., x_j}, with attention weights to get a context representation g:

\alpha_k = \langle w_\alpha, x_k \rangle
a_k = \frac{\exp(\alpha_k)}{\sum_{s=i}^{j} \exp(\alpha_s)}    (4)
g = \sum_{k=i}^{j} a_k x_k
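A sketch of this two-direction encoder with the attention pooling of Equation 4 could look as follows; the dimensions and random inputs are illustrative.

# Sketch of attention pooling (Equation 4) over LSTM outputs.
import torch
import torch.nn as nn

class AttentivePool(nn.Module):
    """Scores each LSTM output x_k with <w_alpha, x_k>, softmax-normalizes,
    and returns the weighted sum g (Equation 4)."""
    def __init__(self, dim: int):
        super().__init__()
        self.w_alpha = nn.Parameter(torch.randn(dim))

    def forward(self, x):                   # x: (batch, seq_len, dim)
        scores = x @ self.w_alpha           # alpha_k = <w_alpha, x_k>
        a = torch.softmax(scores, dim=1)    # attention weights a_k
        return (a.unsqueeze(-1) * x).sum(dim=1)   # g = sum_k a_k x_k

dim = 300                                   # illustrative size
left_lstm = nn.LSTM(dim, dim, batch_first=True)    # one LSTM per direction
right_lstm = nn.LSTM(dim, dim, batch_first=True)
pool = AttentivePool(dim)

left_ctx = torch.randn(4, 5, dim)           # n = 5 left words incl. mention
right_ctx = torch.randn(4, 5, dim)
g_l = pool(left_lstm(left_ctx)[0])          # (batch, dim)
g_r = pool(right_lstm(right_ctx)[0])
mention_vec = torch.cat([g_l, g_r], dim=1)  # g_m = [g_l; g_r]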

Thus, we form a mention's vector by concatenating its left and right context representations, g_l and g_r:

g_m = [g_l; g_r]
V_m = FC(g_m)    (5)

where FC is a fully connected feed-forward neural network. Once we have the mention embedding V_m and a pretrained entity embedding vector V_e, we can calculate the similarity between these two vectors based on Euclidean distance:

D_{smc} = d(V_m, V_e) = \sqrt{\sum_{i=1}^{n} (V_{m_i} - V_{e_i})^2}    (6)

Contrastive Loss Function. We combine both D_{syx} and D_{smc} as our target to train the model. The final distance is defined as:

D_W = \lambda_{syx} D_{syx} + \lambda_{smc} D_{smc}    (7)

Then, we apply a contrastive loss function to formulate our objective:

L = \frac{1}{2} Y D_W^2 + \frac{1}{2} (1 - Y) \max(0, m - D_W)^2    (8)

where Y is the ground truth value; a value of 1 indicates that mention m and entity e are matched, 0 otherwise.

Figure 4: JEL Model Framework
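A compact sketch of Equations 7 and 8 follows; the weights lambda_syx, lambda_smc and the margin m are illustrative placeholders, not our tuned values.

# Sketch of Equations 7-8: combine the wide (character) and deep
# (semantic) distances and apply the contrastive loss.
import torch

def contrastive_loss(d_syx, d_smc, y, lam_syx=0.5, lam_smc=0.5, m=2.0):
    """y = 1 when the mention-entity pair matches, 0 otherwise."""
    d_w = lam_syx * d_syx + lam_smc * d_smc                    # Equation 7
    loss = 0.5 * y * d_w.pow(2) + \
           0.5 * (1 - y) * torch.clamp(m - d_w, min=0).pow(2)  # Equation 8
    return loss.mean()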

Method            True Positive  True Negative  False Positive  False Negative  Precision  Recall  F1-Score  Accuracy
JEL               0.5            0.4991         0.0009          0               0.9982     1       0.9991    0.9991
SVM-Rank          0.4826         0.4826         0.0174          0.0174          0.9652     0.9652  0.9652    0.9652
ENEL              0.3875         0.4922         0.0078          0.1125          0.9803     0.775   0.8656    0.8797
Tri-Gram Cosine   0.4423         0.4303         0.0666          0.0577          0.8691     0.8846  0.8768    0.8726
LR                0.4942         0.3712         0.1288          0.0058          0.7933     0.9884  0.8801    0.8654
SVM               0.4958         0.3589         0.1411          0.0042          0.7785     0.9916  0.8722    0.8547
Bi-Gram Cosine    0.4496         0.3794         0.1206          0.0504          0.7885     0.8992  0.8402    0.829
Jaccard Context   0.2444         0.3643         0.1357          0.2556          0.6430     0.4888  0.5554    0.6087
Cosine Context    0.4912         0.0116         0.4885          0.0088          0.5014     0.9824  0.6639    0.5028

Table 1: Performance Comparison via Precision and Recall

Experiment and Analysis

Data Preparation

We first applied spaCy over financial news to detect all the named entity mentions. spaCy features neural models for named entity recognition (NER). By considering text capitalization and context information, spaCy claims an accuracy above 85% for NER. Satisfied with spaCy's performance, we used it on financial news to recognize all the critical mentions that are tagged with "ORG". The data preparation process was as follows (a sketch of the labeling heuristic appears at the end of this subsection):

1) We extracted mentions from the financial news with spaCy.

2) We applied bi-gram cosine similarity between the extracted mentions and company names in our internal knowledge graph.

3) If the similarity score between a mention string and an entity name was smaller than 0.5, we treated that as a strong signal that the two are not linked, and marked the pair as 0.

4) If the similarity score between a mention string and an entity name was equal to 1.0, we manually checked the list to avoid instances where two different entities share the same name (infrequent), and marked the pair as 1.

5) If a mention and an entity name had cosine similarity larger than 0.75 but smaller than 1.0, we labeled manually:

(a) Some cases are easy to tell, such as "Luminet" vs "Luminex"; we labeled those instances as 0 directly.

(b) Some cases can be decided according to their description/context. We printed the mention's context information and the entity's description respectively, and made the decision based on those texts, such as "Apple" vs "Apple Corps.".

(c) Some other cases needed help from public information found through internet search, such as "Apollo Management" vs "Apollo Global Management".

6) If a mention and an entity name had cosine similarity between 0.5 and 0.75, we discarded the pair. These cases are too numerous for manual labeling, and too complicated for machine labeling.

7) The negative examples from step 3 make the dataset very imbalanced, containing many more negative pairs. We counted the number of examples obtained in steps 4 and 5, and randomly sampled a comparable number from the examples gathered in step 3.

In total, we labeled 586,975 ground truth mention-entity pairs, with 293,949 positive mention-entity pairs and 293,026 negative pairs. We split the data into 80% for training, 10% for validation, and 10% for testing.
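The thresholding in steps 3-6 can be summarized by the following sketch; weak_label is a hypothetical helper, and the pairs routed to "manual" were labeled by hand as described above.

# Sketch of the threshold-based labeling heuristic above; `sim` is the
# bi-gram cosine similarity between a mention and a candidate name.
def weak_label(sim: float):
    """Return 0/1, 'manual' for human review, or None to discard."""
    if sim < 0.5:
        return 0          # step 3: strong signal the pair is not linked
    if sim == 1.0:
        return 1          # step 4: exact name match, spot-checked manually
    if sim > 0.75:
        return "manual"   # step 5: route to manual labeling
    return None           # step 6: the 0.5-0.75 band is discarded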

Baselines

1) String Matching. We chose Bi-Gram and Tri-Gram Cosine Similarity as two of the baselines. Before similarity calculation, all tokens were weighted with tf-idf scores. We set 0.8 as the threshold.

2) Context Similarity. We used Jaccard and Cosine similarity to measure the similarities between mention context and entity descriptions. A potentially matched mention-entity pair should share at least one context word.

3) Classification. We chose Logistic Regression (LR) and SVM for experiments. We adopted the feature engineering method defined in (Zheng et al. 2010), but only kept the following features that we can generate from our data:

– StrSimSurface: edit distance between mention strings and entity names.

– ExactEqualSurface: number of overlapping lemmatized words in mention strings and entity names.

– TFSimContext: TF-IDF similarity between the mention's context and the entity's description.

– WordNumMatch: the number of overlapping lemmatized words between the mention's context and the entity's description.

4) Learn to Rank. We used SVM-Rank as the representative of Learn to Rank, with the same features as defined above.

5) Deep Learning. We implemented a state-of-the-art deep learning algorithm (Kolitsas, Ganea, and Hofmann 2018) (ENEL) as one of our baselines. In the original study, the authors utilized a Wikipedia-derived entity embedding, but we utilized our own entity embedding for learning.

Comparison on Accuracy

We first compare the methods with Precision and Recall. For an easier comparison, we scaled each of True Positive, True Negative, False Positive, and False Negative into [0, 0.5] as follows:

– True Positive = Count(Predict=1 & Truth=1) / (2 × Count(Truth=1))

– True Negative = Count(Predict=0 & Truth=0) / (2 × Count(Truth=0))

– False Positive = Count(Predict=1 & Truth=0) / (2 × Count(Truth=0))

– False Negative = Count(Predict=0 & Truth=1) / (2 × Count(Truth=1))

The results are shown in Table 1, in which:

– Precision = True Positive / (True Positive + False Positive)

– Recall = True Positive / (True Positive + False Negative)

– F1-Score = 2 × Precision × Recall / (Precision + Recall)

– Accuracy = True Positive + True Negative

From Table 1, we find that context-based methods perform poorly, as expected. Descriptions in our knowledge graph have very different wording styles from financial news, so simply comparing context words results in low accuracy. SVM-Rank surprisingly outperforms ENEL. The reason is that ENEL does not model character features properly. In SVM-Rank, we carefully designed character features (e.g., edit distance and tf-idf similarity), but ENEL just embeds 36 single-character embeddings. This result also indicates that without good character learning, even deep learning cannot solve the linking problem well.

JEL performs the best. We mainly discuss the reasons that JEL outperforms ENEL. First, JEL involves more character features. ENEL just embeds 36 characters (26 letters + 10 digits), but JEL computes 151,622 character features (as shown in Table 2). This configuration gives JEL a better ability to capture character patterns. For example, JEL could successfully link "Salarius Pharm LLC" to "Salarius Pharmaceuticals", but ENEL missed this link. Second, ENEL jointly embeds all characters and words from the context and the mention itself into a mention vector, and minimizes the distance between this mention vector and a pre-trained entity embedding vector. However, the entity embeddings themselves are generated without character information (Ganea and Hofmann 2017). Character embeddings in ENEL, especially character embeddings from context words, add noise to the semantic embeddings and impact final performance. In addition, Table 2 gives a brief overview of the efficiency comparison between JEL and ENEL.
Although JEL and ENEL have a similar number of parameters, JEL trains faster than ENEL. JEL utilizes linear layers to learn character patterns, which are easier to learn than an embedding layer in ENEL.

           #parameters   #character processing features   time
JEL        10,915,800    151,622                          ~1 min/batch
ENEL       11,021,400    36                               ~20 min/batch

Table 2: Comparison between JEL and ENEL

Comparison on Precision

Accuracy can only check a method's ability to distinguish positive samples from negative samples. In a real EL task, we care more about a method's ability to find the correct entity for a given mention. In this section, we use "Precision at top K" (P@K) to compare the methods (shown in Figure 5 with K ∈ {1, 5, 10}; a sketch of the metric appears at the end of this subsection). LR, SVM, and SVM-Rank get 0s at all P@Ks. Both LR and SVM are classifiers, which are good at distinguishing positive pairs from negative pairs. However, they may label multiple mention-entity pairs as positive, and rank false positive pairs ahead.

Figure 5: Performance Comparison via P@K (Precision at top K)

The SVM-Rank failure is surprising. We believe the reason lies in the training data. We only have 1 and 0 as labels, but no specific ranking order — which is hard to obtain for a 0-1 problem — to train the model solidly. Without a ranking ground truth, SVM-Rank fails in our task. For the other four methods, we rank entities based on distance or similarity. JEL still achieves the best performance.
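For reference, P@K over a ranked candidate list can be computed as in the following sketch; the data structures are illustrative.

# Sketch of Precision at top K (P@K) for ranked entity candidates.
def precision_at_k(ranked_entities, gold_entity, k: int) -> float:
    """1.0 if the gold entity appears in the top-k candidates, else 0.0."""
    return float(gold_entity in ranked_entities[:k])

def mean_p_at_k(predictions, k: int) -> float:
    """predictions: list of (ranked_entities, gold_entity) per mention."""
    return sum(precision_at_k(r, g, k) for r, g in predictions) / len(predictions)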
mance will be tested on indicators including: Deployment and Business Impact – Scope: Ability in linking all available JPMC clients JPMC has built a large-scale knowledge graph for internal – Timeliness: Ability in providing near real-time alerts use, that integrates data from third party providers with its – Accuracy: Ability in sending accurate alerts internal data. The system contains several million entities (e.g. suppliers, investors, etc.) and several million links (e.g. – Flexibility: Ability in integration of various components supply chain, investment, etc.) among those entities 7. Entity for end-to-end delivery of solutions. linking is one of the core problems that needs to be solved Our first step is to collect user feedbacks for parameter when ingesting unstructured data. Currently, we are work- fine-tuning and algorithm re-configuration. Once the initial ing closely with business units to process incoming news deployment is successful, we will release JEL as a stan- articles to extract news about companies, and connect them dalone and reusable component with an API to provide a to the corresponding entities in the knowledge graph. Such firm-wide service that can be used in various applications. news analytics will support users across JPMC in discover- ing relevant and curated news that matter to their business. Conclusion Here we present a concrete example of the use of such news analytics. To protect the customer cofidentiality, the Knowledge Graphs are becoming a mission critical technol- actual name of the company has been changed, but the il- ogy across many industries. Within JPMorgan Chase, we lustrated computation is the same. “Acma Retail Inc” filed are using a knowledge graph as a company-wide resource for bankruptcy due to the pandemic, and a lot of JPMC for tasks such as risk analysis, supply chain analysis, etc. A clients could feel stress as they are suppliers to Acma. Such core problem in utilizing this knowledge is EL. Existing EL stress can pass deep down into its supply chain and trigger models did not generalize to our internal company data, and financial difficulties for other clients. JPMC may face dif- therefore, we developed a novel model, JEL, that leverages ferent levels of risks from suppliers with different orders in margin loss function and deep and wide learning. Through Acma’s supply chain. With “Acma” mentioned in financial an extensive experimentation, we have shown the superiority news linked with “Acma Global Retail Inc” in our knowl- of this method. We are currently in the process of deploying edge graph (distinguished from “Acma Furniture, LLC”, this model on a financial news platform within our company. Even though our testing was done on our internal data, we 7For proprietary reasons, we cannnot reveal the exact number believe, that our approach can be adapted by other compa- of nodes and links nies for entity linking tasks on their internal data. References Mikolov, T.; Yih, W.-t.; and Zweig, G. 2013. Linguistic reg- ularities in continuous space word representations. In Pro- Bromley, J.; Guyon, I.; LeCun, Y.; Sackinger,¨ E.; and Shah, ceedings of the 2013 conference of the north american chap- R. 1994. Signature verification using a” siamese” time delay ter of the association for computational linguistics: Human neural network. In Advances in neural information process- language technologies, 746–751. ing systems, 737–744. Milne, D.; and Witten, I. H. 2008. 
Goldberg, H. G.; Kirkland, J. D.; Lee, D.; Shyr, P.; and Thakker, D. 2003. The NASD Securities Observation, New Analysis and Regulation System (SONAR). In IAAI, 11–18.

Han, X.; Sun, L.; and Zhao, J. 2011. Collective entity linking in web text: a graph-based method. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, 765–774.

He, Z.; Liu, S.; Li, M.; Zhou, M.; Zhang, L.; and Wang, H. 2013. Learning entity representation for entity disambiguation. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 30–34.

Huang, H.; Heck, L.; and Ji, H. 2015. Leveraging deep neural networks and knowledge graphs for entity disambiguation. arXiv preprint arXiv:1504.07678.

Kolitsas, N.; Ganea, O.-E.; and Hofmann, T. 2018. End-to-end neural entity linking. arXiv preprint arXiv:1808.07699.
Lample, G.; Ballesteros, M.; Subramanian, S.; Kawakami, K.; and Dyer, C. 2016. Neural architectures for named entity recognition. arXiv preprint arXiv:1603.01360.

Martins, P. H.; Marinho, Z.; and Martins, A. F. 2019. Joint learning of named entity recognition and entity linking. arXiv preprint arXiv:1907.08243.

Mihalcea, R.; and Csomai, A. 2007. Wikify! Linking documents to encyclopedic knowledge. In Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, 233–242.

Mikolov, T.; Yih, W.-t.; and Zweig, G. 2013. Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 746–751.

Milne, D.; and Witten, I. H. 2008. Learning to link with Wikipedia. In Proceedings of the 17th ACM Conference on Information and Knowledge Management, 509–518.

Suchanek, F. M.; Kasneci, G.; and Weikum, G. 2007. Yago: A Core of Semantic Knowledge. In 16th International Conference on the World Wide Web, 697–706.

Sun, Y.; Lin, L.; Tang, D.; Yang, N.; Ji, Z.; and Wang, X. 2015. Modeling mention, context and entity with neural networks for entity disambiguation. In Twenty-Fourth International Joint Conference on Artificial Intelligence.

Varma, V.; Pingali, P.; Katragadda, R.; Krishna, S.; Ganesh, S.; Sarvabhotla, K.; Garapati, H.; Gopisetty, H.; Reddy, V. B.; Reddy, K.; et al. 2009. IIIT Hyderabad at TAC 2009. In TAC.

Yamada, I.; Shindo, H.; Takeda, H.; and Takefuji, Y. 2016. Joint learning of the embedding of words and entities for named entity disambiguation. arXiv preprint arXiv:1601.01343.

Zhang, W.; Su, J.; Tan, C. L.; and Wang, W. T. 2010. Entity linking leveraging: automatically generated annotation. In Proceedings of the 23rd International Conference on Computational Linguistics, 1290–1298. Association for Computational Linguistics.

Zheng, Z.; Li, F.; Huang, M.; and Zhu, X. 2010. Learning to link entities with knowledge base. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 483–491. Association for Computational Linguistics.