
AliCG: Fine-grained and Evolvable Conceptual Graph Construction for Semantic Search at Alibaba

Ningyu Zhang1,2∗, Qianghuai Jia4∗, Shumin Deng1,2∗, Xiang Chen1,2, Hongbin Ye1,2, Hui Chen3, Huaixiao Tou3, Gang Huang5, Zhao Wang1, Nengwei Hua3, Huajun Chen1,2★
1Zhejiang University, China & AZFT Joint Lab for Knowledge Engine, China; 2Hangzhou Innovation Center, Zhejiang University, China; 3Alibaba Group, China; 4AntGroup, China; 5Zhejiang Lab, China
{zhangningyu,231sm,xiang_chen,yehongbin,huanggang,zhao_wang,huajunsir}@zju.edu.cn, {qianghuai.jqh,weidu.ch,huaixiao.thx,nengwei.huanw}@alibaba-inc.com
∗ Equal contribution and shared co-first authorship. ★ Corresponding author.

arXiv:2106.01686v1 [cs.AI] 3 Jun 2021

ABSTRACT
Conceptual graphs, which are a particular type of Knowledge Graphs, play an essential role in semantic search. Prior conceptual graph construction approaches typically extract high-frequency, coarse-grained, and time-invariant concepts from formal texts. In real applications, however, it is necessary to extract less-frequent, fine-grained, and time-varying conceptual knowledge and build the taxonomy in an evolving manner. In this paper, we introduce an approach to implementing and deploying the conceptual graph at Alibaba. Specifically, we propose a framework called AliCG which is capable of a) extracting fine-grained concepts by a novel bootstrapping with alignment consensus approach, b) mining long-tail concepts with a novel low-resource phrase mining approach, and c) updating the graph dynamically via a concept distribution estimation method based on implicit and explicit user behaviors. We have deployed the framework at Alibaba UC Browser. Extensive offline evaluation as well as online A/B testing demonstrate the efficacy of our approach.

CCS CONCEPTS
• Information systems → Query representation; Information extraction.

KEYWORDS
Concept Mining; Taxonomy Construction; Knowledge Graph

ACM Reference Format:
Ningyu Zhang, Qianghuai Jia, Shumin Deng, Xiang Chen, Hongbin Ye, Hui Chen, Huaixiao Tou, Gang Huang, Zhao Wang, Nengwei Hua, Huajun Chen. 2021. AliCG: Fine-grained and Evolvable Conceptual Graph Construction for Semantic Search at Alibaba. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’21), August 14–18, 2021, Virtual Event, Singapore. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3447548.3467057

KDD ’21, August 14–18, 2021, Virtual Event, Singapore. © 2021 Association for Computing Machinery. ACM ISBN 978-1-4503-8332-5/21/08.

1 INTRODUCTION
Knowledge is important for text-related applications such as semantic search. Knowledge Graphs (KGs) organize facts in a structured graph as triples in the form of <subject, predicate, object>, abridged as (s, p, o), where s and o denote entities and p builds relations between entities. The Conceptual Graph, which is a special type of KG, builds semantic connections between concepts and has proven to be valuable in short text understanding [21], Word Sense Disambiguation [20], enhanced entity linking [5], semantic query rewriting [22], etc. Essentially, conceptualization helps humans generalize previously gained knowledge and experience to new settings, which may reveal paths to high-level cognitive System 2 [2] in a conscious manner. In real-life applications, the conceptual graph provides valuable knowledge to support many applications [25], such as semantic search. Web search engines (e.g., Google and Bing) leverage a taxonomy to better understand user queries and improve search quality. Moreover, many online retailers (e.g., Alibaba and Amazon) organize products into categories of different granularities so that customers can easily search and navigate this category taxonomy to find the items they want to purchase.

In this paper, we introduce the Alibaba Conceptual Graph (AliCG), a large-scale conceptual graph of more than 5,000,000 fine-grained concepts, still in fast growth, automatically extracted from noisy search logs. As shown in Figure 1, AliCG comprises four levels: level-1 consists of concepts expressing the domain that instances belong to; level-2 consists of concepts referring to the type or subclass of instances; level-3 consists of concepts that are the fine-grained conceptualization of instances, expressing implicit user intentions; the instance layer includes all instances, such as entities and non-entity phrases. AliCG is currently deployed at Alibaba to support a variety of business scenarios, including the product Alibaba UC Browser. AliCG has been applied to dozens of applications in Alibaba UC Browser, including intent classification, named entity recognition, query rewriting, and so on, and it receives more than 300 million requests per day.

Building AliCG is not a trivial task. Previous studies such as YAGO [18] and DBPedia [1] have investigated the extraction of knowledge from formal texts (e.g., Wikipedia). Probase [23] proposes an approach for extracting concepts from semi-structured Web documents. However, these approaches could not be adapted to our applications because several challenges remain unresolved.

Fine-grained Concept Acquisition. Conventional approaches are devoted to extracting coarse-grained concepts such as categories or types. However, in Alibaba's scenario, fine-grained concepts are necessary to increase the recall of answer results.

Figure 1: Data hierarchy of Alibaba Conceptual Graph (AliCG) for semantic search.

For example, "grill (烤架)" is a "tool (工具)" and "scarf (围巾)" is a "clothes (服饰)". However, it would be more helpful if we could infer that a user searching for these items may be more interested in "barbecue tool (烧烤工具)" or "keep warm clothes (保暖服饰)" rather than another "tool (工具)" like "wrench (扳手)"; these concepts are rare in existing conceptual graphs.

Long-tail Concept Mining. Conventional approaches [23] generally extract concepts based on Hearst patterns, e.g., "especially" and "such as." However, these approaches cannot extract long-tail concepts from extremely short or noisy queries, which are common in search engines. For instance, it is non-trivial to extract the concept "rare mental disorder (罕见精神障碍)" of the instance "body integrity identity disorder (身体完整性认同障碍)" from the search log, as only 35 instances mentioned "rare mental disorder". It is rather difficult to extract such concepts from short text with pattern matching (the pattern is too general [6–8, 26, 27], and there is little context information as well as few co-occurrence samples), as Figure 2 shows. Besides, there exist lots of scattered concepts in user search queries, such as "traditional activities of Tibetan New Year (藏历新年习俗)". Recent approaches usually regard such concept extraction as a sequence labeling task [13], which relies on a tremendous amount of training data for each concept. Nearly 79% of the concepts in the search logs are long-tail. Therefore, it is crucial to be able to extract concepts with limited numbers of instances.

Taxonomy Evolution. Numerous instances and concepts in user search queries are related to recent trending and evolving events. Conventional approaches are not able to update the taxonomies over time. For instance, a user may search "Nezha (哪吒)" or "New animations in February (二月新番)" in search engines. The implied meaning of such concepts changes over time because, apparently, "Nezha" has different meanings at different times (e.g., Chinese animation film, mythological character, the hero in Honor of Kings), and there are various new animations in February in different years. Thus, it is imperative to incorporate the temporal evolution into the taxonomies. However, as we extract instances and concepts from text, which inevitably brings about duplicate edges, it is necessary to align those nodes with the same meaning in the conceptual graph. Besides, as there are many many-to-many nodes in the conceptual graph, it is prohibitive to update such complex graphs over time. In other words, it is difficult to estimate the confidence distribution of the concepts given instances.

To address the challenges mentioned above, we propose the following contributions in the design of AliCG:

First, we propose a novel bootstrapping with the alignment consensus approach to tackle the first challenge of extracting fine-grained concepts from noisy search logs. Specifically, we utilize a small number of predefined string patterns to extract concepts, which are then used to expand the pool of such patterns. Further, the newly mined concepts are verified with query-title alignment; that is, an essential concept in a query should repeat several times in the document titles frequently clicked by the user. Second, we introduce a novel conceptualized phrase mining and self-training with an ensemble consensus approach to extract long-tail concepts. On the one hand, we extend an off-the-shelf phrase mining algorithm with conceptualized features to mine concepts unsupervisedly. On the other hand, we propose a novel low-resource sequence tagging framework, namely self-training with an ensemble consensus, to extract those scattered concepts. Finally, we propose a novel concept distribution estimation method based on implicit and explicit user behaviors to tackle the taxonomy evolution challenge. We employ concept alignment and take advantage of users' searching and clicking behaviors to estimate the implicit and explicit concept distributions to construct a four-layered concept–instance taxonomy in an evolving manner.

To deploy AliCG in real-life applications, we further introduce three methods for utilizing the conceptual knowledge: text rewriting, concept embedding, and conceptualized pretraining. We conduct extensive offline evaluations, including concept mining and applications such as intent classification and named entity recognition. Experimental results show that our approach can extract more fine-grained concepts in both normal and long-tail settings compared with baselines. Moreover, the performance of intent classification and named entity recognition is significantly improved when integrated with the conceptual graph.


Figure 2: Framework of Alibaba Conceptual Graph Construction.

We also perform online A/B testing on more than 400 million actual users of the Alibaba UC Browser mobile application (https://www.ucweb.com/). The experimental results show that the relevance score increases by 12%, the click-through rate (CTR) increases by 5.1%, and page view (PV) increases by 5.5%. The highlights of our work are the following:

• We introduce AliCG, a large-scale conceptual graph of more than 5,000,000 fine-grained concepts automatically extracted from the noisy search logs (demo available at http://openconcepts.openkg.cn/).
• We propose three novel approaches to address the issues of fine-grained concept acquisition, long-tail concept mining, and taxonomy evolution. Our framework is able to extract and dynamically update the concept taxonomy in both normal and long-tail settings.
• We introduce three methods to deploy our approach in Alibaba UC Browser, and both offline evaluation and online A/B testing demonstrate the efficacy of our approach.
• We also release an open dataset of AliCG with 490,000 instances for research purposes (https://github.com/alibaba-research/ConceptGraph).

2 METHODOLOGY
Our approach aims at constructing the conceptual graph from both Web documents and the noisy query logs. We denote a user query by q = w_1^q w_2^q ... w_|q|^q and the set of all queries by Q. In addition, we denote a document by d = w_1^d w_2^d ... w_|d|^d. Given a user query q and the top-ranked clicked documents D^q = {d_1, d_2, ..., d_|D^q|}, we aim to extract an instance/concept phrase c = w_1^c w_2^c ... w_|c|^c. As illustrated in Figure 2, the construction of the conceptual graph generally consists of three modules, as follows:

Fine-grained Concept Acquisition aims at extracting common fine-grained concepts from the noisy search logs with bootstrapping with the alignment consensus. After that, candidate concepts and instances are acquired, and we link those instances with corresponding concepts via probabilistic inference and concept matching following the approach proposed in [13].

Long-tail Concept Mining aims at extracting long-tail concepts from the noisy search logs with conceptualized phrase mining and self-training with an ensemble consensus. In addition, we also leverage concepts from the fine-grained concept acquisition module as distantly supervised samples for concept mining (see details in Section 2.2).

Taxonomy Evolution aims at building the taxonomy in an evolving way with a concept distribution estimation method based on implicit and explicit user behaviors. Note that taxonomy evolution is conducted with the concepts mined from the fine-grained concept acquisition and long-tail concept mining modules.

2.1 Fine-grained Concept Acquisition
Previous studies [23] show that iterative (bootstrapping) approaches can extract coarse isA facts with the highest confidence starting with a set of seed patterns; thus, it is intuitive to construct elegant patterns to obtain fine-grained concepts. Specifically, we define a small set of patterns to extract concept phrases from queries with high confidence. For example, "Top 10 XXX (十大XXX)" is a pattern that can be used to extract seed concepts. Based on this pattern, we can extract concepts such as "Top 10 mobile games (十大手游)." However, there still exist lots of noisy texts, which are challenging for vanilla bootstrapping approaches. To this end, a novel bootstrapping with the alignment consensus approach is proposed to deal with noisy texts. The intuition behind this is that we must control the pattern generalization and concept consistency given query-title pairs.

Bootstrapping with Alignment Consensus. First, an extracted pattern should not be too general, for the sake of extracting fine-grained concepts. Therefore, given a new pattern p found in a certain round, let n_s be the number of concepts in the existing seed concept set that can be extracted by p from the query set Q, and let n_e be the number of new concepts that can be extracted by p from Q. We keep pattern p via the function Filter(p): 1) α < n_s / n_e < β, and 2) n_s > δ, where α, β, and δ are predefined thresholds to control the fineness of extracted concepts.

Second, we filter the mined concepts via query-title alignment to improve the quality of fine-grained concepts. Even though bootstrapping helps discover new patterns and concepts from the query set Q in an iterative manner, such a pattern-based method has limited extraction ability and may introduce considerable noise. Based on [13], we further propose extracting concepts from a query and its top clicked link titles in the search log, as the essential concept in a query should repeat several times in the document titles frequently clicked by the user. The overall algorithm is shown below:

Algorithm 1 Bootstrapping with the alignment consensus
Require: query and high-clicked documents set (q, D) ∈ T
Require: predefined templates set p ∈ P, concepts C_head = Φ
1: for iter iterations do
2:   for each p ∈ P do
3:     for each (q, D) ∈ T do
4:       if q matches p then
5:         c_p = ExtractByTemplate(q)
6:         c_a = ExtractByAlignment(q, D)
7:         if len(c_p) <= len(c_a) then
8:           C_head ← C_head + c_p
9:   p_candidate = GenPattern(C_head, q)
10:  p_new = Filter(p_candidate)
11:  P ← P + p_new
return C_head

Note that we have mined typical fine-grained instance and concept candidates, but we still need to link those instances with corresponding concepts. Firstly, we utilize a concept discriminator to determine whether each candidate is a concept or an instance. We represent each candidate by a variety of features, such as whether this candidate has ever appeared as a query, how many times it has been searched, etc. We then train a classifier with Gradient Boosting Decision Tree and link instances with those classified concepts by probabilistic inference and concept matching following [13]. (We also mined instances from the search logs based on keyword extraction to increase the coverage of instances.)

2.2 Long-tail Concept Mining
Although iterative pattern matching can extract lots of high-frequency concepts, it is still non-trivial to extract those with only a few instances. As Figure 3 shows, it is challenging to extract long-tail concepts for two main reasons: poor pattern generalization and few co-occurrence samples. Thus, it is difficult to match such low-shot concepts and link them with corresponding instances. For the long-tail problem, we first propose conceptualized phrase mining, which utilizes an external domain knowledge graph to generate weakly supervised samples for unsupervised learning of those long-tail concepts. We then propose a self-training-based low-resource tagging algorithm for supervised learning of long-tail concepts, which can further extract scattered concepts.

Figure 3: Long-tail Concept Mining.

Conceptualized Phrase Mining. It is difficult to utilize rule-based approaches to mine long-tail instances and concepts. We propose the use of phrase mining methods to extract such instances and concepts. The motivation is that there are numerous non-entity phrases in search queries, and such phrases play a vital role in understanding the query. Specifically, we first filter stop words and then employ an off-the-shelf phrase mining tool, AutoPhrase (https://github.com/shangjingbo1226/AutoPhrase), to perform phrase mining on the corpora unsupervisedly. However, the original phrase mining approach experiences several issues on the real data set. First, the results of the shallow syntactic analysis of part-of-speech (POS) are not applicable to all areas, such as the medical domain. Second, there is considerable wrong segmentation of words for Chinese. To address those issues, we use the existing domain knowledge graph to generate weakly supervised samples to re-train the POS tagger and use the results of the POS tagging as well as Chinese BERT-base embeddings as segmentation features for AutoPhrase. As AutoPhrase relies on distantly supervised data, we leverage the data of the existing domain knowledge graph along with domain rules to generate positive and negative data. After running AutoPhrase, we also propose a new empirical score function with a length constraint to generate high-quality phrases:

f(p) = (1/n) Σ_{i=1}^{n} p_score + log(p_len)    (1)

where p_score and p_len refer to the score and length of the separated phrase p, respectively. Based on the mined phrases, we train a concept classification model based on fastText (https://fasttext.cc/) for specific concepts (200 concepts in the vertical domain). Based on the classification model, we link concept instances with specific concept labels (e.g., body integrity identity disorder isA rare mental disorder). We generate positive training samples for the concept classifier from instances existing in the conceptual graph and create negative samples with domain rules (e.g., adding negative characters).

Self-Training with an Ensemble Consensus. The unsupervised approaches above are still limited in generalization, so we further perform supervised learning. However, there is a lack of sufficient training samples. Thus, we propose a novel low-resource tagging algorithm, namely self-training with an ensemble consensus. We leverage a few instances/concepts generated from the unsupervised approaches described in Sections 2.1 and 2.2 as seeds (there may be only a few samples for some concepts in unique domains such as the medical domain). Then, we train a CRF model (https://taku910.github.io/crfpp/) with the seeds. We utilize the CRF model to generate a large amount of weakly supervised pseudo-samples from a subset of unlabelled data. Then, we train a BERT tagging model with perturbations based on those pseudo-samples.
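The consensus step of this self-training scheme (a pseudo-label is trusted only when several taggers agree) can be sketched as follows. This is a minimal sketch with hypothetical model interfaces: the real system combines a CRF, a BERT tagger, and a dictionary matcher, but here each model is simply any callable mapping a token list to a tag list.

```python
def ensemble_consensus(sentences, models):
    """Keep a pseudo-labelled sentence only if all models agree on its tags.

    sentences: list of token lists
    models: list of callables, each mapping a token list to a tag list
            (e.g. a CRF, a BERT tagger, and a dictionary matcher)
    Returns (tokens, tags) pairs on which every model emits the same sequence.
    """
    confident = []
    for tokens in sentences:
        predictions = [model(tokens) for model in models]
        first = predictions[0]
        # Consensus: every model must produce exactly the same tag sequence.
        if all(pred == first for pred in predictions[1:]):
            confident.append((tokens, first))
    return confident
```

Only the agreed-upon sentences are added to the pseudo-labelled pool for the next training round; disagreements are simply dropped rather than resolved.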

Thereafter, we generate pseudo-samples once more. Different from the first stage, we combine the CRF, the BERT tagging model, and a domain dictionary model (maximum forward match) with an ensemble consensus to generate more confident training samples. In other words, when the predictions of the three different models agree with each other, such a pseudo-label is likely more confident. Next, we train a BERT tagging model once more. After several iterations, we can finally obtain a high-performance sequence tagging model with only a few initial training seeds.

Algorithm 2 Self-training with an ensemble consensus
Require: only a few initial training samples L, unlabeled data U, domain dictionary D, randomly initialized parameters of the BERT tagging model θ, perturbation δ
1: Train the CRF with L; build the dictionary tagging model with D
2: Randomly sample a subset U^b from U and generate pseudo-samples U_l with the CRF
3: for iter iterations do
4:   for minibatch index b = 1 : batchsize do
5:     Sample a subset B' with batch size b from U_l
6:     θ ← θ + δ
7:     θ ← θ − α ∇_θ (1/|B'|) Σ_{(x,y)∈B'} CE(f_θ(x), y)
8:   Randomly sample a subset U^b from U
9:   Generate pseudo-samples U_ensemble by voting of the dictionary tagging model, the CRF model, and the BERT tagging model
10:  U_l ← U_l + U_ensemble
return BERT sequence tagging model θ

The above approaches for concept mining are complementary to each other. Our experience shows that bootstrapping with alignment consensus can extract head concepts with high accuracy. We also observe that for the long-tail concepts, which occupy a large proportion of the search log, conceptualized phrase mining is advantageous for extracting low-frequency terms, while sequence tagging can better extract concepts from short text when they have a clear boundary with the surrounding non-concept words.

2.3 Taxonomy Evolution
Existing taxonomies are mostly constructed by human experts or in a crowdsourcing manner. As Web content and human knowledge are constantly growing over time, people need to update the existing taxonomy and also include newly emerging concepts, so it is intuitive to incorporate the temporal evolution into the taxonomies. The key to handling the problem of taxonomy evolution is to properly model the probability of assigning a concept to a parent concept in the taxonomy tree, i.e., estimating the concept distribution, which can evolve.

Figure 4: Example of taxonomy evolution.

Concept distribution estimation based on implicit and explicit user behavior. Different from previous approaches such as Probase [23], we leverage user behaviors to estimate the confidence score of concepts given instances. As user searches and clicks are important behaviors that reveal user interests in search engines, it is intuitive to estimate the confidence score with statistics of implicit user behaviors, such as searching, and explicit user behaviors, such as clicking on documents. As Figure 4 shows, we first align parts of the concepts with a predefined synonym dictionary, as parts of the concepts and instances come from different sources, which results in redundancy. Then, we calculate the confidence scores via the heat of entities. We estimate the heat of entities via user searches per day, and the details are included in the appendix. Specifically, given instances with concepts c ∈ C, we run named entity recognition for each c (each concept only retains the single most important entity, e.g., the one with the largest TF-IDF score) and obtain a heat score. Then we calculate the implicit concept distribution based on user search behaviors as follows:

s_implicit = c_heat / Σ_{c∈C} c_heat    (2)

where c_heat denotes the heat of the entity in concept c and C is the concept set that c belongs to. Then, we leverage user clicking behaviors to obtain concept distributions explicitly. Specifically, we first collect one month's instance (query)–clicked-tag pairs and then aggregate the queries with the same clicked concept tag (the search logs contain the history of user-clicked concept tags). In other words, we estimate the number of queries for each concept tag in the last month. For example, if users clicked "Chinese animation film (中国动画电影)" more when searching "Nezha (哪吒)", then the instance "Nezha (哪吒)" will be more confident with the concept "Chinese animation film (中国动画电影)". Thus, we can estimate the concept confidence distribution based on user clicking behavior. Finally, we combine the two different granularities of confidence scores as follows:

s = Σ_i^N (s_implicit + s_explicit)               if s ∉ M
s = Σ_i^N 1.5 (log s_implicit + log s_explicit)   if s ∈ M    (3)

where s_implicit is the confidence score based on user searching behavior, s_explicit is the confidence score based on user clicking behavior, and M is a specific subset (e.g., the medical and education domains) of the concept set. The specific subset is required here because some domains have special user behaviors, for example, no clicking or lots of misleading clicking. We also remove concepts such as stop words and common words to prevent unnecessary noise. Meanwhile, we re-weight domain words and specific types of concepts that are unambiguous by s = 19.5 log(c_heat) · s, e.g., "Pauli incompatible" isA "chemical term" and "Andy Lau" isA "person." After that, we can build the instance–concept taxonomy in an evolving way. Then we leverage expert rules to define the taxonomy between level1 and level2. We further determine the relation between level2 and level3 via probabilistic inference.
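The implicit-score normalization and the combined scoring rule above can be sketched as follows. This is a minimal per-concept sketch under our own simplifying assumptions: the function names are ours, and we treat the implicit and explicit scores of a single candidate concept as already aggregated, dropping the outer sum over instances.

```python
import math

def implicit_score(entity_heat, all_heats):
    """Normalize one concept's entity heat over the heats of the whole
    concept set for an instance (the implicit, search-based score)."""
    return entity_heat / sum(all_heats)

def combined_score(s_implicit, s_explicit, in_special_domain):
    """Combine implicit (search) and explicit (click) scores for one concept.

    In ordinary domains the two scores are summed directly; in special
    domains (e.g., medical), where click statistics are unreliable, they
    are dampened with logarithms and weighted by 1.5."""
    if in_special_domain:
        return 1.5 * (math.log(s_implicit) + math.log(s_explicit))
    return s_implicit + s_explicit
```

The log-dampened branch keeps a few misleading clicks in a special domain from dominating the combined confidence, which mirrors the motivation given for the subset M.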
KDD '21, August 14–18, 2021, Virtual Event, Singapore

We further determine the relation between level-2 and level-3 concepts via probabilistic inference. In search engines, it is intuitive to estimate the confidence score with statistics of implicit user behaviors such as searching and explicit user behaviors such as clicking concept tags8. Specifically, suppose there is a concept c and a higher-level concept p, with n_c instances related to concept c, of which n_c^p instances also belong to concept p. Then, we estimate p(p|c) by p(p|c) = n_c^p / n_c. We identify the isA relation between c and p if p(p|c) > δ_t.

3 DEPLOYMENT

We propose three methods for deploying the conceptual graph at Alibaba, namely text rewriting, concept embedding, and conceptualized pretraining.

Text Rewriting. For each text instance s, we extract the concept c conveyed in the text and rewrite the text by concatenating the concept with s; the rewritten text format is s c. We use text rewriting for information retrieval. Note that text rewriting is easy to apply to other classification or sequence labeling tasks.

Concept Embedding. Following [4], we utilize a two-tower neural network with concept attention and self-attention for learning concept embeddings. We then concatenate the concept embedding with the text embedding for sub-tasks.

Conceptualized Pretraining. As pretraining is quite powerful, the conceptual graph can be utilized during the pretraining stage to inject knowledge explicitly. Specifically, we utilize instance and concept masking strategies with an auxiliary concept prediction loss to integrate conceptual knowledge.

4 EVALUATION

In this section, we first introduce a new dataset for the problem of concept mining from search logs and compare the proposed approach with various baseline methods. Then, we evaluate the accuracy of the taxonomy evolution. We also evaluate different methods to apply conceptual graphs in applications, including intent classification and named entity recognition. Finally, we perform large-scale online A/B testing to show that our approach significantly improves the performance of semantic search.

4.1 AliCG Construction Evaluation

4.1.1 Evaluation of Fine-grained Concept Acquisition.
We evaluate our approach on two datasets. The first is the user-centered concept mining (UCCM9) dataset [13], which is sampled from the queries and query logs of Tencent QQ Browser. The second is the Alibaba Conceptual Graph (ACG) dataset10, a large-scale dataset we created containing 490,000 instances with fine-grained concepts, sampled from the search logs of Alibaba UC Browser from November 11, 2018, to July 1, 2019.

We compare our approach with the following baseline methods and a variant of our approach: TextRank [17] is a classical graph-based ranking model for keyword extraction; THUCKE [14] considers keyphrase extraction as a problem of translation and learns translation probabilities between the words in the input text and the words in keyphrases; AutoPhrase [19] is a quality phrase mining algorithm that extracts quality phrases based on a knowledge base and POS-guided segmentation; ConcepT [13] is a state-of-the-art concept mining approach that utilizes bootstrapping, query-title alignment, and sequence tagging to extract concepts from search queries; w/o BA is our approach without Bootstrapping with Alignment consensus.

Table 1: Evaluation results of fine-grained concept acquisition.

                       UCCM                      ACG
Method         Exact Match  F1 Score    Exact Match  F1 Score
TextRank       0.1941       0.7356      0.0025       0.1839
THUCKE         0.1909       0.7107      0.0325       0.2839
AutoPhrase     0.0725       0.4839      0.0325       0.2839
ConcepT        0.8121       0.9541      0.7725       0.7839
AliCG          0.9122       0.9812      0.9215       0.9898
  w/o BA       0.8785       0.9635      0.8852       0.9543

From Table 1, we observe that our method achieves the best Exact Match (EM) and F1 scores. This is because bootstrapping with alignment consensus helps us build a collection of high-quality instances and concepts in an unsupervised manner. The TextRank, THUCKE, and AutoPhrase methods do not provide satisfactory performance because they are more suitable for extracting keywords or phrases from long documents or corpora. Our method also exhibits better performance than ConcepT, mainly because it can extract more accurate concepts with consistency in query-title pairs.

Table 2: Concept mining examples.

Instance: People with rare mental disorder such as aboulomania and body integrity identity disorder appear generally normal but when faced with ...

Model        Instance-Concept
AutoPhrase   No results
ConcepT      (aboulomania isA rare mental disorder)
AliCG        (aboulomania isA rare mental disorder)
             (body integrity identity disorder isA rare mental disorder)

4.1.2 Evaluation of Long-tail Concept Mining.
As there are no public datasets available for long-tail concept mining, we leverage our ACG dataset and randomly choose 100 long-tail concepts with their corresponding texts to build the dataset ACG (long-tail).

Table 3: Evaluation results of long-tail concept mining on the ACG (long-tail) dataset.

Method         Exact Match  F1 Score
TextRank       0.0941       0.1356
THUCKE         0.1209       0.2107
AutoPhrase     0.1725       0.2839
ConcepT        0.3121       0.2541
AliCG          0.6122       0.8812
  w/o CP       0.5522       0.7612
  w/o SE       0.5319       0.7322

8 The search logs contain the history of user-clicked concept tags.
9 https://github.com/BangLiu/ConcepT
10 https://github.com/alibaba-research/ConceptGraph

We evaluate our long-tail concept mining methods on that dataset.
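Tables 1 and 3 report Exact Match (EM) and F1 between predicted and gold concepts. The paper does not spell out the exact formulas; a common sketch of these metrics (an assumption here, mirroring standard span-extraction evaluation with token-overlap F1) is:

```python
from collections import Counter

def exact_match(pred: str, gold: str) -> float:
    """1.0 if the predicted concept string equals the gold concept."""
    return float(pred.strip() == gold.strip())

def token_f1(pred: str, gold: str) -> float:
    """Token-overlap F1 between predicted and gold concept strings."""
    pred_toks, gold_toks = pred.split(), gold.split()
    common = Counter(pred_toks) & Counter(gold_toks)  # per-token min counts
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

# Averaged over a hypothetical evaluation set:
preds = ["rare mental disorder", "tourism city"]
golds = ["rare mental disorder", "provincial tourism city"]
em = sum(exact_match(p, g) for p, g in zip(preds, golds)) / len(golds)
f1 = sum(token_f1(p, g) for p, g in zip(preds, golds)) / len(golds)
```

For Chinese concepts, the tokens would typically be characters rather than whitespace-split words.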

[Figure 5 plots per-day confidence scores over time in four panels: (a) Wuhan, (b) Shandong carrier, (c) Nezha, (d) NCP. The curves track concepts such as "coronavirus city," "tourism city," "provincial capital," "first-generation Chinese aircraft carrier," "aircraft carrier," "Chinese animation film," "mythological character," "hero in Honor of Kings," "novel coronavirus pneumonia," "network protocol," and "cable system."]
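The per-day curves summarized above come from the confidence estimation of Section 2: for a concept c and a higher-level concept p, p(p|c) = n_c^p / n_c, and the isA relation is kept when the score exceeds δ_t (0.3 in the deployed system, per the appendix). A toy sketch with hypothetical counts:

```python
def isa_confidence(n_c_p: int, n_c: int) -> float:
    """p(p|c): fraction of the n_c instances related to concept c that
    also belong to the higher-level concept p."""
    return n_c_p / n_c if n_c else 0.0

DELTA_T = 0.3  # threshold delta_t used by the deployed system

# Hypothetical daily counts for the pair ("Wuhan", "coronavirus city")
score = isa_confidence(n_c_p=620, n_c=800)
keep_isa = score > DELTA_T  # the isA edge is retained for this day
```

Recomputing the ratio from each day's behavior counts is what lets the score drift as user interest shifts.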

Figure 5: Concept evolution visualization.

We compare with the same baselines as in fine-grained concept mining, as well as two variants of our approach: w/o CP extracts concepts without Conceptualized Phrase mining; w/o SE extracts concepts without Self-training with Ensemble consensus.

From Table 3, we observe that our method achieves the best EM and F1 scores. Firstly, the conceptualized phrase mining method extracts numerous long-tail instances and concepts. Secondly, self-training sequence tagging with an ensemble consensus can identify scattered concepts from the search logs, that is, concepts whose boundaries in the text are not clear. The TextRank, THUCKE, and AutoPhrase methods fail to perform well on ACG (long-tail), which indicates the difficulty of long-tail concept extraction. Our method exhibits a significant improvement over ConcepT, mainly because it can extract long-tail concepts and achieves excellent sequence labeling performance with only a small number of samples. The comparison with the variants proves the effectiveness of combining different strategies in our system. We also show examples of concept mining in Table 2, which illustrate that our approach is more capable of extracting fine-grained concepts.

4.1.3 Evaluation of Taxonomy Evolution.
We randomly extract 1,000 instances from the taxonomy. The experiments focus on evaluating the relations between concept–instance pairs because these relations are critical for semantic search. We check whether the isA relation between each concept and its instance is correct: we ask three human judges to evaluate the relations and record the number of correct and incorrect instances for each concept. Table 4 shows the results. The average number of instances per concept is 5.44, and the largest concept contains 599 instances. Note that the size of the taxonomy keeps increasing as it is updated with daily user query logs. The accuracy of the isA relation between concept–instance pairs is 98.59%. We also notice that removing implicit or explicit user behaviors (w/o implicit, w/o explicit) causes a significant loss in performance, which indicates that user behavior plays an important role in concept distribution estimation.

Table 4: Evaluation results of taxonomy evolution.

Metrics / Statistics             Value
Mean #Instances per Concept      5.44
Max #Instances per Concept       599
isA Relationship Accuracy        98.59
  w/o implicit                   95.23
  w/o explicit                   93.78
  w/o both                       89.12

We visualize concept trends over time to evaluate the dynamics of AliCG. From Figure 5 we observe that: 1) Our approach can obtain a dynamic and fine-grained concept distribution over time. Figure 5(a) shows that the concept "Wuhan (武汉)" is more confident with "coronavirus city (冠状病毒城市)" after January 22, while it is confident with "tourism city (旅游城市)" before January 22. Note that there was a severe coronavirus outbreak in Wuhan, and much relevant news was published on the web after January 22. 2) Our approach can also obtain false confidence scores due to noise in user behaviors. Figure 5(b) indicates that "Shandong carrier (山东号)" has a low confidence score in the days before December 17. This is caused by fake or clickbait news that misleads user behaviors, resulting in a wrong confidence score for "first-generation Chinese aircraft carrier (中国第一代航母)." However, our approach can self-repair and obtain the correct confidence score after that day. 3) Our approach can reveal the concept distribution trend over a long period of time. From Figure 5(c), we observe that the confidence score of "Nezha (哪吒)" grows together with "Chinese animation film (中国动漫电影)," but starts to decrease slightly after July, which reveals the attenuation of user interest over time. 4) Our approach can extract emerging concepts. We observe from Figure 5(d) that our approach extracts the emerging concept "novel coronavirus pneumonia (新冠肺炎)" on February 8; before that, the concept "NCP" is mostly confident with "network protocol (网络协议)" or "cable system (电话系统)."

4.2 AliCG Deployment Evaluation

AliCG is currently deployed at Alibaba to support a variety of business scenarios, including Alibaba UC Browser. The current system can extract approximately 20,000 concepts every day and serves more than 300 million daily active users. AliCG has been applied to dozens of applications, such as intent classification, named entity recognition, information retrieval, and entity recommendation. We introduce the evaluation of these scenarios to demonstrate the efficacy of AliCG.

4.2.1 Intent Classification and Named Entity Recognition.
We apply AliCG to these two tasks to evaluate the different deployment methods.
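The first of these deployment methods, text rewriting (Section 3), simply appends the extracted concept c to the input s before it reaches the downstream model. A minimal sketch, where the dictionary lookup is a hypothetical stand-in for AliCG's concept extraction:

```python
def rewrite(text: str, concept_lookup) -> str:
    """Text rewriting from Section 3: append extracted concepts to the input."""
    concepts = concept_lookup(text)
    return text if not concepts else text + " " + " ".join(concepts)

# Hypothetical toy graph in place of the full conceptual graph
toy_graph = {"aboulomania": "rare mental disorder"}

def lookup(text):
    return [c for inst, c in toy_graph.items() if inst in text]

print(rewrite("aboulomania symptoms", lookup))
# → aboulomania symptoms rare mental disorder
```

The rewritten string can then be fed unchanged to a classifier or retrieval model, which is why the paper notes this method transfers easily to other classification and sequence labeling tasks.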

We evaluate in both a specific domain (the medical domain11) and the general domain (search logs). +rewriting is the method with text rewriting; +concept emb is the method with concept embedding; +conceptualized is the method with conceptualized pretraining. From Figure 6, we observe: 1) All knowledge-enhanced approaches achieve better performance than the baseline BERT, which demonstrates that conceptual knowledge is advantageous for sub-tasks. 2) The performance gains are larger in the medical domain than in the general domain. We speculate that a specific domain has a different data distribution, which indirectly amplifies the impact of knowledge injection. 3) With only 30% of the training data, the knowledge-enhanced model achieves performance comparable to the baselines trained with 100% of the data. This result reveals that knowledge-enhanced approaches are beneficial in low-resource settings, which is quite useful as specific domains are usually data-hungry. 4) In the medical domain, the method with concept embedding achieves the best performance in intent classification, while the method with conceptualized pretraining is 2% higher than the method with concept embedding in named entity recognition. We argue that this is because named entity recognition is a sequence tagging task, and conceptualized pretraining injects prior knowledge via masking strategies, which boosts the performance. Intent classification, in contrast, is a classification task in which distinct features affect the final prediction; in addition, the attention mechanism may contribute to the performance.

Figure 6: Evaluation results of applications. [Four panels report F1 scores of BERT, +rewriting, +concept emb, and +conceptualized with 30% and 100% of the training data: (a) Intent Classification (Medical), (b) Named entity recognition (Medical), (c) Intent Classification (General), (d) Named entity recognition (General).]

4.2.2 Online Evaluation of Information Retrieval.
We evaluate how concept mining can help improve information retrieval through text rewriting. We create an evaluation dataset containing 200 queries from Alibaba UC Browser. For each original query, we collect the top 10 search results returned by the Baidu12 and Shenma13 search engines, which are the first and second largest search engines in China, respectively. We then replace the query with k different instances, collect the top search results for each rewritten query from Baidu and Shenma, and merge and retain 10 of them as the search results after query rewriting. We ask three human judges to assess the relevance of the results, record the majority vote ("relevant" or "irrelevant"), and calculate the percentage of relevant results for the original and rewritten queries. As shown in Table 5, after rewriting with our strategy, the percentage of relevant results in the top 10 increases from 72.5% to 84.1% in Baidu and from 71.1% to 83.1% in Shenma. The reason is that conceptual knowledge helps understand the intent of user queries and provides more relevant and clear keywords for the search engines. As a result, information retrieval better matches users' intent.

Table 5: Online evaluation results.

Information Retrieval                 Entity Recommendation
Search Engine   Relevant Precision    Metrics   Percentage Lift
Baidu           72.5                  CTR       5.1%
Baidu+AliCG     84.1                  PV        5.5%
Shenma          71.1                  UD        0.92%
Shenma+AliCG    83.1                  IE        7.01%

4.2.3 Online A/B Testing of Entity Recommendation.
We perform large-scale online A/B testing to show how conceptual knowledge helps improve the performance of entity recommendation (see the appendix for details) in real-world applications [28]. We divide users into buckets for the online A/B test, observe and record the activities of each bucket for seven days, and select two buckets with highly similar activities. We utilize the same text rewriting and conceptualized pretraining strategies as in Section 3 for entity recommendation. The recommendations are obtained without the conceptual graph for one bucket and with the conceptual graph for the other. Page view (PV) and click-through rate (CTR) are the two most critical metrics in real-world applications because they show how much content users view and how much time they spend in an application. We also report Impression Efficiency (IE) and User Duration (UD). As shown in Table 5, we observe statistically significant increases in CTR (5.1%), PV (5.5%), UD (0.92%), and IE (7.01%). These observations prove that the conceptual graph considerably benefits the understanding of queries in entity recommendation and helps match users with entities they are potentially interested in.

11 https://github.com/alibaba-research/ChineseBLUE
12 https://www.baidu.com/
13 https://m.sm.cn/

5 RELATED WORK

Fine-grained Concept Acquisition. Conventional concept mining methods are closely related to noun phrase segmentation and named entity recognition [24]. They either use heuristic methods to extract typed entities or treat the problem as sequence labeling and use large-scale labeled training data to train an LSTM-CRF. Another area of focus is term and keyword extraction, which extracts noun phrases based on statistical appearance and co-occurrence signals [9] or text features [15]. The latest methods for concept mining rely on phrase quality: [19] adaptively identifies concepts based on their quality. More recently, [13] proposed an approach to discover user-centered concepts via bootstrapping, query-title alignment, and sequence labeling. Different from their approach, our method can extract long-tail concepts and also has the ability to evolve the taxonomy.

Long-tail Concept Mining. Conventional approaches usually fail to extract long-tail concepts. Some existing approaches leverage unsupervised phrase mining to extract long-tail concepts [19]. However, those approaches can neither leverage context nor handle scattered concepts. Besides, some studies [13] treat concept mining as sequence tagging; nevertheless, their performance on long-tail concepts degrades significantly. Several low-resource sequence tagging approaches that utilize transfer learning [3] or meta-learning [10] have been proposed. However, there are few Chinese resources available, and meta-learning approaches still suffer from low accuracy, which is not satisfactory for real applications.

Taxonomy Evolution. Automatic taxonomy construction is a long-standing task in the literature. Most existing approaches focus on building the entire taxonomy by first extracting hypernym–hyponym pairs and then organizing all hypernymy relations into a tree or DAG structure. In many real-world applications, some existing taxonomies may have already been laboriously curated by experts [12] or via crowdsourcing [16], and are deployed in online systems. Instead of constructing the entire taxonomy from scratch, these applications demand the ability to expand an existing taxonomy dynamically. There exist some studies on expanding WordNet with named entities from Wikipedia or domain-specific concepts from different corpora [11]. One major limitation of these approaches is that they cannot update the taxonomy dynamically, which is necessary for semantic search. Probase [23] proposes two probability scores, namely plausibility and typicality, for the conceptual graph; however, it is computationally expensive and time-consuming to update such a big conceptual graph.

6 CONCLUSION

We describe the implementation and deployment of AliCG at Alibaba, which is designed to improve the performance of semantic search. The system extracts fine-grained concepts from the search logs, and the extracted concepts can be updated based on user behaviors. We have deployed the conceptual graph at Alibaba UC Browser with more than 300 million daily active users. Extensive experimental results show that the system can accurately extract concepts and boost the performance of intent classification and named entity recognition, and online A/B testing results further demonstrate the efficacy of our approach.

7 ACKNOWLEDGEMENTS

This work is supported by the National Natural Science Foundation of China (Nos. 91846204, 52007173 and U19B2042), the National Key R&D Program of China (No. SQ2018YFC000004), the Zhejiang Provincial Natural Science Foundation of China (No. LQ20E070002), and Zhejiang Lab's Talent Fund for Young Professionals (No. 2020KB0AA01).

REFERENCES

[1] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. DBpedia: A nucleus for a web of open data. In The Semantic Web. Springer, 722–735.
[2] Yoshua Bengio. 2017. The consciousness prior. arXiv preprint arXiv:1709.08568 (2017).
[3] Yixin Cao, Zikun Hu, Tat-Seng Chua, Zhiyuan Liu, and Heng Ji. 2019. Low-Resource Name Tagging Learned with Weakly Labeled Data. arXiv preprint arXiv:1908.09659 (2019).
[4] Jindong Chen, Yizhou Hu, Jingping Liu, Yanghua Xiao, and Haiyun Jiang. 2019. Deep short text classification with knowledge powered attention. In AAAI.
[5] Lihan Chen, Jiaqing Liang, Chenhao Xie, and Yanghua Xiao. 2018. Short text entity linking with fine-grained topics. In CIKM. 457–466.
[6] Xiang Chen, Xin Xie, Ningyu Zhang, Jiahuan Yan, Shumin Deng, Chuanqi Tan, Fei Huang, Luo Si, and Huajun Chen. 2021. AdaPrompt: Adaptive Prompt-based Finetuning for Relation Extraction. CoRR abs/2104.07650 (2021). https://arxiv.org/abs/2104.07650
[7] Shumin Deng, Ningyu Zhang, Jiaojian Kang, Yichi Zhang, Wei Zhang, and Huajun Chen. 2020. Meta-Learning with Dynamic-Memory-Based Prototypical Network for Few-Shot Event Detection. In WSDM.
[8] Shumin Deng, Ningyu Zhang, Luoqiu Li, Hui Chen, Huaixiao Tou, Mosha Chen, Fei Huang, and Huajun Chen. 2021. OntoED: Low-resource Event Detection with Ontology Embedding. In ACL. Association for Computational Linguistics.
[9] Katerina Frantzi, Sophia Ananiadou, and Hideki Mima. 2000. Automatic recognition of multi-word terms: the C-value/NC-value method. International Journal on Digital Libraries 3, 2 (2000), 115–130.
[10] Yutai Hou, Zhihan Zhou, Yijia Liu, Ning Wang, Wanxiang Che, Han Liu, and Ting Liu. 2019. Few-Shot Sequence Labeling with Label Dependency Transfer. arXiv preprint arXiv:1906.08711 (2019).
[11] David Jurgens and Mohammad Taher Pilehvar. 2015. Reserating the awesometastic: An automatic extension of the WordNet taxonomy for novel terms. In NAACL. 1459–1465.
[12] Carolyn E Lipscomb. 2000. Medical subject headings (MeSH). Bulletin of the Medical Library Association 88, 3 (2000), 265.
[13] Bang Liu, Weidong Guo, Di Niu, Chaoyue Wang, Shunnan Xu, Jinghong Lin, Kunfeng Lai, and Yu Xu. 2019. A User-Centered Concept Mining System for Query and Document Understanding at Tencent. In KDD. 1831–1841.
[14] Zhiyuan Liu, Xinxiong Chen, Yabin Zheng, and Maosong Sun. 2011. Automatic keyphrase extraction by bridging vocabulary gap. In Proceedings of the Fifteenth Conference on Computational Natural Language Learning. Association for Computational Linguistics, 135–144.
[15] Olena Medelyan and Ian H Witten. 2006. Thesaurus based automatic keyphrase indexing. In Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries. 296–297.
[16] Rui Meng, Yongxin Tong, Lei Chen, and Caleb Chen Cao. 2015. CrowdTC: Crowdsourced taxonomy construction. In 2015 IEEE International Conference on Data Mining. IEEE, 913–918.
[17] Rada Mihalcea and Paul Tarau. 2004. TextRank: Bringing order into text. In EMNLP. 404–411.
[18] Thomas Rebele, Fabian Suchanek, Johannes Hoffart, Joanna Biega, Erdal Kuzey, and Gerhard Weikum. 2016. YAGO: A multilingual knowledge base from Wikipedia, WordNet, and GeoNames. In ISWC. Springer.
[19] Jingbo Shang, Jialu Liu, Meng Jiang, Xiang Ren, Clare R Voss, and Jiawei Han. 2018. Automated phrase mining from massive text corpora. IEEE Transactions on Knowledge and Data Engineering 30, 10 (2018), 1825–1837.
[20] Rui Suzuki, Kanako Komiya, Masayuki Asahara, Minoru Sasaki, and Hiroyuki Shinnou. 2018. All-words word sense disambiguation using concept embeddings. In LREC.
[21] Fang Wang, Zhongyuan Wang, Zhoujun Li, and Ji-Rong Wen. 2014. Concept-based short text classification and ranking. In CIKM. 1069–1078.
[22] Zhongyuan Wang, Kejun Zhao, Haixun Wang, Xiaofeng Meng, and Ji-Rong Wen. 2015. Query understanding through knowledge-based conceptualization. In IJCAI.
[23] Wentao Wu, Hongsong Li, Haixun Wang, and Kenny Q Zhu. 2012. Probase: A probabilistic taxonomy for text understanding. In SIGMOD.
[24] Hongbin Ye, Ningyu Zhang, Shumin Deng, Mosha Chen, Chuanqi Tan, Fei Huang, and Huajun Chen. 2021. Contrastive triple extraction with generative transformer. In AAAI.
[25] Ningyu Zhang, Shumin Deng, Xu Cheng, Xi Chen, Yichi Zhang, Wei Zhang, and Huajun Chen. 2021. Drop Redundant, Shrink Irrelevant: Selective Knowledge Injection for Language Pretraining. In IJCAI.
[26] Ningyu Zhang, Shumin Deng, Zhanlin Sun, Jiaoyan Chen, Wei Zhang, and Huajun Chen. 2020. Relation Adversarial Network for Low Resource Knowledge Graph Completion. In WWW.
[27] Ningyu Zhang, Shumin Deng, Zhanlin Sun, Guanying Wang, Xi Chen, et al. 2019. Long-tail Relation Extraction via Knowledge Graph Embeddings and Graph Convolution Networks. In NAACL.
[28] Shengyu Zhang, Dong Yao, Zhou Zhao, Tat-Seng Chua, and Fei Wu. 2021. CauseRec: Counterfactual User Sequence Synthesis for Sequential Recommendation. In SIGIR.

A INFORMATION FOR REPRODUCIBILITY

A.1 System Implementation and Deployment

We implement and deploy the AliCG system in Alibaba UC Browser. The fine-grained concept acquisition, long-tail concept mining, and taxonomy evolution modules are implemented in Python 3.6 and run as offline components. The concept embedding and conceptualized pretraining modules (inference part) are implemented in C++ and run as an online service. We utilize the high-performance memory storage Tair (a Redis-like storage system) at Alibaba for online data storage. In our system, each component works as a service and is deployed on Alibaba MaxCompute14, a high-performance big data processing framework that provides fast and fully managed petabyte-scale data warehouse solutions and allows massive amounts of data to be analyzed and processed economically and efficiently. The online service runs on 200 dockers, each configured with six 2.5 GHz Intel Xeon Gold 6133 CPU cores and 64 GB of memory. Offline fine-grained concept acquisition, long-tail concept mining, and taxonomy evolution run on 60 dockers with the same configuration. We set δ_t = 0.3, α = 0.6, β = 0.8, and δ = 2 in our system.

14 https://www.alibabacloud.com/

Algorithm 3 Offline instance/concept mining
Require: query and high-clicked documents set (q, D) ∈ T
1: if succeed then
2:   Perform bootstrapping with alignment consensus on the general domain
3:   Perform conceptualized phrase tagging on specific domains
4:   Perform self-training with an ensemble consensus
5:   if coverage then
6:     Perform taxonomy evolution
7:     Perform refinement
8:   else
9:     Break

Algorithm 4 Offline concept embedding
Require: query and high-clicked documents set (q, D) ∈ T
1: Tag T with all candidate concepts by maximum forward matching
2: Train the concept embedding model

Algorithm 5 Offline conceptualized pretraining
Require: query and high-clicked documents set (q, D) ∈ T, conceptual graph CG stored in key-value memory storage
1: Tag T with all candidate concepts by maximum forward matching from CG
2: Duplicate and shuffle the corpus T ten times and generate whole-concept-masking training samples with a masking rate of 15%
3: Initialize all parameters with BERT-wwm and perform further pretraining with T
4: Finetune sub-tasks with the further pretrained model

Algorithm 6 Online concept embedding inference
Require: query set Q, conceptual graph CG stored in key-value memory storage, concept embedding model M in tf-serving cluster
1: for each query q ∈ Q do
2:   Tag q with all candidate concepts by maximum forward matching from CG
3:   Call tf-serving model M and get results

Algorithm 7 Online concept pretraining inference
Require: query set Q, conceptualized model M in tf-serving cluster
1: for each query q ∈ Q do
2:   Call tf-serving model M and get results

Figure 7: Procedure of fine-grained concept acquisition. [Components: Query & Document, Preprocess, Instance & Concept Extraction, Concept Alignment, Pattern Discovery, Seed Patterns.]

Algorithms 3, 4, and 5 show the offline running processes of each component in AliCG, and Algorithms 6 and 7 show the online running processes of concept embedding and conceptualized pretraining. We also detail the procedures of fine-grained concept acquisition and long-tail concept mining in Figure 7 and Figure 8, respectively.

A.2 Online Service Details

We utilize a high-performance C++ server framework for online query embedding inference, which can handle more than 10,000 QPS. Offline concept mining from the search logs runs every day; it extracts approximately 20,000 concepts from 30 million search logs, and approximately 5,000 of the extracted concepts are new. The offline taxonomy evolution also runs daily. The offline processing usually takes about two hours, and online concept embedding processes up to 10,000 queries per second; it can perform concept embedding for approximately 800 million queries per day. In addition to the regular offline update pipeline, we develop an emergency makeup pipeline, which has two functions. First, we mine concepts from hot articles (articles with high clicks and page views) in real time and put those hot concepts into the emergency makeup pipeline. Second, we utilize this pipeline to fix a few bad cases.
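Algorithms 4–6 all begin by tagging text with candidate concepts via maximum forward matching against the conceptual graph. The paper does not give the matcher itself; a character-level greedy longest-match sketch with a hypothetical vocabulary (the real system matches against CG entries stored in Tair) is:

```python
def max_forward_match(text: str, vocab: set, max_len: int = 8) -> list:
    """Greedy longest-match tagging: at each position, take the longest
    vocabulary entry that matches; otherwise advance one character."""
    tags, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in vocab:
                tags.append(text[i:j])
                i = j
                break
        else:
            i += 1  # no concept starts here; skip one character
    return tags

vocab = {"仙侠电视剧", "电视剧"}
print(max_forward_match("最新仙侠电视剧", vocab))
# → ['仙侠电视剧']
```

A trie or Aho-Corasick automaton would replace the inner loop in a production setting, but the greedy longest-first scan above is the essence of maximum forward matching.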

Figure 8: Procedure of long-tail concept mining. [Components: Query & Document, Bootstrapping with Alignment Consensus, Dictionary, Conceptualized Phrase Mining (BERT, POS, embeddings), Sequence Labeling (CRF, BERT+adv), and Self-Training with Ensemble Consensus (pseudo-samples, ensemble with sample consensus), each producing instances and concepts.]

Figure 9: Sample data in Alibaba Conceptual Graph.

Table 6: Part of the query–concept samples.

Query                                                                    Concept
What are the Hangzhou special local products (杭州的特产有哪些)              Hangzhou special local products (杭州特产)
Pancake cooking methods list (煎饼的做法大全)                               pancake cooking methods (煎饼的做法)
Which cars are cheap and fuel-efficient? (有什么便宜省油的车)                cheap and fuel-efficient cars (便宜省油的车)
Hangzhou famous snacks (杭州有名的小吃)                                     Hangzhou snacks (杭州小吃)
What are the symptoms of coronavirus pneumonia (冠状病毒肺炎有什么症状)      symptoms of coronavirus pneumonia (冠状病毒肺炎症状)
Latest fairy TV series (最新仙侠电视剧)                                     fairy TV series (仙侠电视剧)

A.3 Concept Reverse Index

As some applications need to know the instances of a given concept (for example, it is necessary to retrieve all the instances of the concept "patriotic songs recommended by the Communist Youth League (共青团推荐爱国歌曲)" when a user searches for such a concept), we build a reverse concept–instance index based on HA3, a high-performance index builder at Alibaba.

A.4 Parameter Settings and Training Process

Here, we introduce the features we use for the different components in our system and describe how we train each component. For concept mining, we randomly sample 15,000 search logs from Alibaba UC Browser. We extract concepts from these search logs using the approaches introduced in Sec. 2.1, and the results are manually checked by Alibaba product managers. The resulting dataset is used to train the classifier in conceptualized-phrase-mining-based concept mining and the sequence tagger in our model. We utilize CRF++ v0.58 to train our model. 80% of the dataset is used as the training set, 10% as the development set, and the remaining 10% as the test set.

A.5 Publication of Datasets

We have published our datasets for research purposes; they can be accessed on GitHub15. The major differences of our dataset compared with Tencent's UCCM dataset are that 1) our dataset is bigger (50x); 2) our dataset has concept probabilities; 3) our dataset contains long-tail concepts. Specifically, we have published the following open-source data:

• The ACG dataset. It is used to evaluate the performance of our approach for concept mining and contains 490,000 instances, as shown in Figure 9. It is the largest conceptual graph dataset in Chinese.
• The query tagging dataset. It is used to evaluate the query tagging accuracy of AliCG and contains 10,000 queries with concept tags.
• The seed concept patterns for bootstrapping-based concept mining. It contains the seed string patterns that we utilize for bootstrapping-based concept mining from queries.
• The predefined level-1 and level-2 concept list. It contains more than 200 predefined concepts for taxonomy construction.

15 https://github.com/alibaba-research/ConceptGraph

A.6 Details of Entity Heat Estimation

We introduce the details of the entity heat estimation that we leveraged in confidence estimation in Sec. 2.3. The entity heat estimation algorithm calculates the popularity of an entity based on search behavior. We first parse user search logs and then retrieve all candidate entities from the knowledge graph via matching. We link the identified entities through various distances between entity mentions and entities. Finally, we sum the user search values and normalize the heat values. We update the heat scores of the entities in the knowledge graph every day.

A.7 Details of Entity Recommendation

We utilize text rewriting and conceptualized pretraining for entity recommendation. For a simple query with entities, we utilize heterogeneous graph embedding to retrieve related entities. For complex queries with few entities, we propose a deep collaborative matching model to obtain related entities. We then rank those entities by various strategies, including type filtering, learning to rank, and click-through rate estimation.

A.8 Examples of Queries and Extracted Concepts

Table 6 lists a few examples of user queries along with the concepts extracted by AliCG. The concepts are appropriate for summarizing the core user intention in the queries.
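The concept reverse index of A.3 maps each concept to its instances so that a concept query can return them directly. HA3 itself is Alibaba-internal, so this dict-based sketch only mirrors the idea:

```python
from collections import defaultdict

def build_reverse_index(pairs):
    """Invert (instance, concept) pairs into a concept -> instances map."""
    index = defaultdict(list)
    for instance, concept in pairs:
        index[concept].append(instance)
    return index

idx = build_reverse_index([
    ("aboulomania", "rare mental disorder"),
    ("body integrity identity disorder", "rare mental disorder"),
])
print(idx["rare mental disorder"])
# → ['aboulomania', 'body integrity identity disorder']
```

A production index would additionally shard this map and support incremental updates as the daily mining pipeline adds new instances.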