arXiv:2009.13827v1 [cs.CL] 29 Sep 2020
SynSetExpan: An Iterative Framework for Joint Entity Set Expansion and Synonym Discovery

Jiaming Shen1*, Wenda Qiu1*, Jingbo Shang2, Michelle Vanni3, Xiang Ren4, Jiawei Han1
1University of Illinois Urbana-Champaign, IL, USA
2University of California San Diego, CA, USA
3U.S. Army Research Laboratory, MD, USA
4University of Southern California, CA, USA
1{js2, qiuwenda, hanj}@illinois.edu 2jshang@ucsd.edu 3michelle.t.vanni.civ@mail.mil 4xiangren@usc.edu

Abstract

Entity set expansion and synonym discovery are two critical NLP tasks. Previous studies accomplish them separately, without exploring their interdependences. In this work, we hypothesize that these two tasks are tightly coupled because two synonymous entities tend to have similar likelihoods of belonging to various semantic classes. This motivates us to design SynSetExpan, a novel framework that enables two tasks to mutually enhance each other. SynSetExpan uses a synonym discovery model to include popular entities’ infrequent synonyms into the set, which boosts the set expansion recall. Meanwhile, the set expansion model, being able to determine whether an entity belongs to a semantic class, can generate pseudo training data to fine-tune the synonym discovery model towards better accuracy.

Figure 1: An illustrative example of joint entity set expansion and synonym discovery. [Figure omitted. Inputs to the SynSetExpan framework: a text corpus, a user-provided vocabulary, and seed synsets of a semantic class. Output: discovered synsets. Synsets shown include {Illinois, IL, Land of Lincoln}, {Texas, TX, Lone Star State}, {California, CA, Golden State}, {Wisconsin, WI, America’s Dairyland}, {Washington, WA, Evergreen State}, and {Connecticut, CT, …}.]

… et al., 2017), taxonomy construction (Shen et al., 2018a), and online education (Yu et al., 2019a).

Previous studies regard ESE and ESD as two independent tasks. Many ESE methods (Mamou et al., 2018b; Yan et al., 2019; Huang et al., 2020; Zhang et al., 2020; Zhu et al., 2020) are developed to iteratively select and add the most confident en-