Unsupervised Template Mining for Semantic Category Understanding

Total Page:16

File Type:pdf, Size:1020Kb

Unsupervised Template Mining for Semantic Category Understanding Unsupervised Template Mining for Semantic Category Understanding 1,2 3 3 1 3 Lei Shi ∗, Shuming Shi , Chin-Yew Lin , Yi-Dong Shen , Yong Rui 1State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences 2University of Chinese Academy of Sciences 3Microsoft Research shilei,ydshen @ios.ac.cn shumings,cyl,yongrui{ } @microsoft.com { } Abstract For example, both “CEO of General Motors” and “CEO of Yahoo” have structure “CEO of [com- We propose an unsupervised approach to pany]”. We call such a structure a category tem- constructing templates from a large collec- plate. Taking a large collection of open-domain tion of semantic category names, and use categories as input, we construct a list of category the templates as the semantic representa- templates and build a mapping from categories to tion of categories. The main challenge is templates. Figure 1 shows some example semantic that many terms have multiple meanings, categories and their corresponding templates. resulting in a lot of wrong templates. Sta- Templates can be treated as additional features tistical data and semantic knowledge are of semantic categories. The new features can be extracted from a web corpus to improve exploited to improve some upper-layer applica- template generation. A nonlinear scoring tions like web search and question answering. In function is proposed and demonstrated to addition, by linking categories to templates, it is be effective. Experiments show that our possible (for a computer program) to infer the se- approach achieves significantly better re- mantic meaning of the categories. For example in sults than baseline methods. As an imme- Figure 1, from the two templates linking to cat- diate application, we apply the extracted egory “symptom of insulin deficiency”, it is rea- templates to the cleaning of a category col- sonable to interpret the category as: “a symptom lection and see promising results (preci- of a medical condition called insulin deficiency sion improved from 81% to 89%). which is about the deficiency of one type of hor- mone called insulin.” In this way, our knowledge 1 Introduction about a category can go beyond a simple string A semantic category is a collection of items shar- and its member entities. An immediate application ing common semantic properties. For example, of templates is removing invalid category names all cities in Germany form a semantic category from a noisy category collection. Promising re- named “city in Germany” or “German city”. In sults are observed for this application in our ex- Wikipedia, the category names of an entity are periments. manually edited and displayed at the end of the An intuitive approach to this task (i.e., extract- page for the entity. There have been quite a lot of ing templates from a collection of category names) approaches (Hearst, 1992; Pantel and Ravichan- dran, 2004; Van Durme and Pasca, 2008; Zhang et national holiday of Brazil (instances: Carnival, Christmas…) national holiday of [country] al., 2011) in the literature to automatically extract- national holiday of South Africa (instances: Heritage Day, Christmas…) ing category names and instances (also called is-a school in Denver school in [place] or hypernymy relations) from the web. school in Houston school in [city] football player [sport] player Most existing work simply treats a category basketball player name as a text string containing one or multiple symptom of insulin deficiency (instances: nocturia, weight loss…) symptom of [medical condition] words, without caring about its internal structure. symptom of cortisol deficiency symptom of [hormone] deficiency (instances: low blood sugar…) In this paper, we explore the semantic structure Semantic Categories Category templates of category names (or simply called “categories”). Figure 1: Examples of semantic categories and ∗This work was performed when the first author was vis- iting Microsoft Research Asia. their corresponding templates. 799 Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 799–809, October 25-29, 2014, Doha, Qatar. c 2014 Association for Computational Linguistics contains two stages: category labeling, and tem- 1) To the best of our knowledge, this is the first plate scoring. work of template generation specifically for cate- Category labeling: Divide a category name gories in unsupervised manner. into multiple segments and replace some key seg- 2) We extract semantic knowledge and statisti- ments with its hypernyms. As an example, as- cal information from a web corpus for improving sume “CEO of Delphinus” is divided to three seg- template generation. Significant performance im- ments “CEO + of + Delphinus”; and the last seg- provement is obtained in our experiments. ment (Delphinus) has hypernyms “constellation”, 3) We study the characteristics of the scoring “company”, etc. By replacing this segment with function from the viewpoint of probabilistic evi- its hypernyms, we get candidate templates “CEO dence combination and demonstrate that nonlinear of [constellation]” (a wrong template), “CEO of functions are more effective in this task. [company]”, and the like. 4) We employ the output templates to clean Template scoring: Compute the score of each our category collection mined from the web, and candidate template by aggregating the information get apparent quality improvement (precision im- obtained in the first phase. proved from 81% to 89%). A major challenge here is that many segments After discussing related work in Section 2, we (like “Delphinus” in the above example) have mul- define the problem and describe one baseline ap- tiple meanings. As a result, wrong hypernyms proach in Section 3. Then we introduce our ap- may be adopted to generate incorrect candidate proach in Section 4. Experimental results are re- templates (like “CEO of [constellation]”). In this ported and analyzed in Section 5. We conclude the paper, we focus on improving the template scor- paper in Section 6. ing stage, with the goal of assigning lower scores to bad templates and larger scores to high-quality 2 Related work ones. Several kinds of work are related to ours. There have been some research efforts (Third, Hypernymy relation extraction: Hypernymy 2012; Fernandez-Breis et al., 2010; Quesada- relation extraction is an important task in text min- Martınez et al., 2012) on exploring the structure of ing. There have been a lot of efforts (Hearst, 1992; category names by building patterns. However, we Pantel and Ravichandran, 2004; Van Durme and automatically assign semantic types to the pattern Pasca, 2008; Zhang et al., 2011) in the literature to variables (or called arguments) while they do not. extract hypernymy (or is-a) relations from the web. For example, our template has the form of “city Our target here is not hypernymy extraction, but in [country]” while their patterns are like “city in discovering the semantic structure of hypernyms [X]”. More details are given in the related work (or category names). section. Category name exploration: Category name A similar task is query understanding, including patterns are explored and built in some ex- query tagging and query template mining. Query isting research work. Third (2012) pro- tagging (Li et al., 2009; Reisinger and Pasca, posed to find axiom patterns among category 2011) corresponds to the category labeling stage names on an existing ontology. For ex- described above. It is different from template gen- ample, infer axiom pattern “SubClassOf(AB, eration because the results are for one query only, B)” from “SubClassOf(junior school school)” without merging the information of all queries to and “SubClassOf(domestic mammal mammal)”. generate the final templates. Category template Fernandez-Breis et al. (2010) and Quesada- construction are slightly different from query tem- Martınez et al. (2012) proposed to find lexical pat- plate construction. First, some useful features terns in category names to define axioms (in med- such as query click-through is not available in cat- ical domain). One example pattern mentioned in egory template construction. Second, categories their papers is “[X] binding”. They need man- should be valid natural language phrases, while ual intervention to determine what X means. The queries need not. For example, “city Germany” is main difference between the above work and ours a query but not a valid category name. We discuss is that we automatically assign semantic types to in more details in the related work section. the pattern variables (or called arguments) while Our major contributions are as follows. they do not. 800 Template mining for IE: Some research work Li et al. (2013) proposed an clustering algorithm in information extraction (IE) involves patterns. to group existing query templates by search intents Yangarber (2003) and Stevenson and Greenwood of users. (2005) proposed to learn patterns which were in Compared to the open-domain unsupervised the form of [subject, verb, object]. The category methods for query template construction, our ap- names and learned templates in our work are not proach improves on two aspects. First, we propose in this form. Another difference between our work to incorporate multiple types of semantic knowl- and their work is that, their methods need a super- edge (e.g., term peer similarity and term clusters) vised name classifer to generate the candidate pat- to improve template generation. Second, we pro- terns while our approach is unsupervised. Cham- pose a nonlinear
Recommended publications
  • Günstige Freizeitmöglichkeiten PDF, 329 KB
    Günstige Freizeitgestaltungsmöglichkeiten Bäder Name Anschrift/Homepage Telefon Öffnungszeiten Eintritt Erreichbarkeit Alters- Erfahrungen gruppe Südstadtbad Allersberger Str. 120, 0911/ Mo, Mi, Do: 10 - 22 Uhr ab € 1,70 mit U-Bahnlinie U1, Nürnberg 23114164 Di: 10 - 21 Uhr Nbg.-Pass Haltestelle Maffeiplatz Fr: 10 - 23 Uhr ab € 2,30 ohne Straßenbahnlinien: https://nuernbergbad.nuernberg.de/suedstadtbad/ Sa: 8 bis 23 Uhr Nbg.-Pass 6, Haltestelle Schweiggerstraße So: 8 - 22 Uhr 8 und 9, Haltestelle Wodanstraße Nordostbad Elbinger Straße 85, 0911/ Mo - Mi, Fr, Sa: 7 - 22 Uhr ab € 1,70 mit Haltestelle: Nordostbahnhof Nürnberg 515025 So: 8 - 22 Uhr Nbg.-Pass U-Bahn: U2 Do: geschlossen ab € 2,30 ohne Bus: Linien 21, 45, 46 https://nuernbergbad.nuernberg.de/nordostbad Nbg.-Pass Regionalbahn: Linie R21 Königsbad Käsröthe 4, 09191/ Mo - So: 9:30 - 21 Uhr bei 100% GdB S-Bahnlinie S1, Haltestelle: Forchheim 91301 Forchheim 34156-60 und "B": Forchheim Bahnhof komplett frei von dort mit Buslinie 206 www.koenigsbad.forchheim.de bis Burker-/Stillstraße, dann ca. 5 Minuten Fußweg Langwasserbad Breslauer Str. 251, 0911/ Mo - Fr: 13 - 22 Uhr ab € 1,70 mit U-Bahnlinie U1, Haltestelle Nürnberg 23110900 Sa, So: 8 - 22 Uhr Nbg.-Pass Langwasser Mitte Eingang Gleiwitzer Str. ab € 2,30 ohne Bus: Linie 56/57, Haltestelle Nbg.-Pass Langwasser Bad https://nuernbergbad.nuernberg.de/langwasserbad/ Bibert Bad Neptunstr. 8 0911/ Mo, Di, Do, Fr, Sa: 8 - 22 Uhrab 3,50 € S-Bahn: S4, Haltestelle Zirndorf 90513 Zirndorf 6099140 Mi: 10 - 22 Uhr (ermäßigt) Unterasbach Bahnhof, dann So: 8 - 21 Uhr Bus: Linie 154 www.bibertbad.de Haltestelle: Zirndorf Hirtenackerstr.
    [Show full text]
  • U-Bahn Nürnberg 19
    Planungs- und Baureferat U-Bahn Nürnberg 19 15. Oktober 2020 1 www.schmitt-aufzuege.com Das Titelbild dieser Broschüre, der neunzehn- ten über die Nürnberger U-Bahn, zeigt den neuen U-Bahnhof Großreuth bei Schweinau mit seinen Impressum Wolkenmotiven. Herausgeber: Stadt Nürnberg / Planungs- und Baureferat Redaktion: U-Bahnbauamt / Matthias Lankes – Presse- und Informationsamt / Markus Jäkel Text Seite 35 – 38: VAG / Elisabeth Seitzinger Inhalt Nicht namentlich gekennzeichnete Texte: U-Bahnbauamt / Matthias Lankes Gestaltung: Stadtgrafik / Ralf Weglehner Anzeigenverwaltung: Presse- und Informationsamt / Akquisition Grußwort des Oberbürgermeisters 2 Druck: Hofmann Druck Nürnberg GmbH & Co KG, Emmericher Straße 10, 90411 Nürnberg Erscheinung: Oktober 2020 Der Südwesten Nürnbergs am U-Bahnnetz 4 Auflage: 10 000 Stück Wohnungsbau und U-Bahn im Einklang 8 Fotonachweis U-Bahnbauamt: Im Südwesten beginnt eine neue Zeit 10 Helmut Leier (S.13) Michael Meyer (S. 14, S. 21 oben und Mitte, S. 22) Michael Hirscheider (S. 18 unten, S. 21 unten, S. 23 unten) Mit der U3 nach Großreuth bei Schweinau 11 Martin Löwe (S. 23 oben, S. 25) Jochen Kohler (S. 28, 29) Bastian Weidinger (S. 44) „Meilensteine“ beim Bauabschnitt 2.1 28 Presse- und Informationsamt: Christine Dierenbach (S.2) Stadtarchiv Nürnberg / Bildstelle (S. 46) Himmlische Ankunft 30 Bernhard Wohlrab (S. 43, 45) Orthophoto / Kartengrundlage (S. 39, 42): Leistungsstark und nachhaltig 34 Quelle: Stadt Nürnberg Geobasisdaten © Bayerische Vermes- sungsverwaltung VAG: Zuverlässigkeit und Sicherheit als Maßstab 35 Claus Felix (S. 18 oben, 24, 26/27, 35, 36, 37, 38) Berschneider + Berschneider: Daten und Zahlen 40 ZEITLOSE WERTE Jonathan Danko Kielkowski (S. 15, 16/17, 19, 30, 31, 32, 33) Pläne und Grafiken Und weiter geht´s im Südwesten 42 DIE ÜBERZEUGUNG EINES FAMILIENUNTERNEHMENS U-Bahnbauamt (S.
    [Show full text]
  • U-Bahn Heft 17
    Baureferat U-Bahn Nürnberg 17 10.12.11 1 Inhalt Impressum Herausgeber: Stadt Nürnberg / Baureferat Redaktion: Presse- und Informationsamt / Thomas Meiler Entwurf: U-Bahnbauamt / Jochen Kohler Das Titelbild dieser Text Seite 37-40: VAG Broschüre, der Nicht namentlich gekennzeichnete Texte: siebzehnten über die U-Bahnbauamt / Jochen Kohler Nürnberger U-Bahn, zeigt eine Grafik mit Grafische Gestaltung: Stadtgrafik / Ralf Weglehner, den Köpfen der Herbert Kulzer (Titelgrafik) Namensgeber der Anzeigenverwaltung: neuen U-Bahnhöfe. Presse und Informationsamt / Akquisition Kartengrundlage und Bearbeitung: Amt für Geoinformation und Bodenordnung Druck: Fahner Druck GmbH, Nürnberg Erscheinung: Dezember 2011 Grußwort des Oberbürgermeisters 2 Auflage: 15 000 Stück U-Bahn im Nürnberger Norden 4 Fotonachweis U-Bahnbauamt: Die Nordstadt gewinnt an Wohnqualität 8 Jochen Kohler (S. 20 unten, S. 21 unten) Günther Perzl (S. 20 oben, S. 21 oben, S. 53), Martin Löwe (S. 12 oben, 13 oben, 42 unten rechts) Mit der U3 bis zum Friedrich-Ebert-Platz 9 Michael Meyer (S. 11, 12 unten) Michael Hirscheider (S. 15, 16, 42 unten links) Bilder-Collage 23 Roland Mahler (S. 18, 45, 46) Stadtarchiv Nürnberg: Bildstelle (S. 51) Harmonisches Miteinander – der Bahnhof Kaulbachplatz 24 Presse- und Informationsamt: Christine Dierenbach (S. 2), Ralf Schedlbauer (S. 74) Übersichtsplan 30 Orthophoto: Wiedergabe mit Genehmigung der Aerowest Pulsierendes Leben – der Bahnhof Friedrich-Ebert-Platz 32 GmbH (S. 30-31) VAG: Automatische U-Bahn in Nürnberg 37 Wieder die Nummer 1! Claus Felix / Peter Roggenthin (S. 37-40) Bernd Ogan (S. 23) Daten und Zahlen 42 Haid+Partner Architekten (S. 6, 24-29) stm Architekten (S. 4, 7, 32-36) Und weiter geht´s im Norden und Süden 44 Pläne und Grafiken: U-Bahnbauamt (S.
    [Show full text]
  • 16 CTZN Stadtplannbg En Low.Pdf
    Erlanger . tr N-NORD s G r ü Erlanger 3 n Regnitz d FÜ-RONHOF l Sebalder Str. a - c Buch ker h eac e Se r Almoshof FLUGHAFEN Ronhof Erlanger 73 Imperial Castle Museum Toy Germanisches Nationalmuseum of Human Rights and the Way Documentation Centre Museum DB Railway New Museum Dolphin Lagoon in the Zoo Underground passageways Beautiful Fountain at main market New Museum V a S Str. Reichswald c t r h Bierweg . e r 2 S Museums Obligation to the Past Attractions g t e r Knoblauchsland . w - The museums of Nuremberg cover a fascinating range of interests, from The Nazi Party Rally Grounds, laid out by Hitler’s National Socialists, is a e Schnepfenreuth ZIEGELSTEIN - 1 8 FÜ-POPPENREUTH e e er Nuremberg Zoo St. Lawrence Church Str. n s Bi the important and famous collection of contemporary art and design in permanent reminder to Nuremberg of the city’s obligation to analyze and h s c a en S g the New Museum to specialty collections barely known and just waiting to discuss the darkest chapter of German history. The Documentation Center Tel. +49 911 54546 Tel. +49 911 2446 9930 ll Str. 4 e r ap st uthe Pop Ziegelstein K r. nre p be discovered. and the Memorium Nuremberg Trials explain on the original historical sites Oct – March daily 9 am – 5 pm. Built ca. 1250–1477. Tabernacle by Pegnitz ppe e STADTHALLE Po Poppenreuthn r the causal factors and consequences of the National Socialist reign of terror König- eu Apr – Sept daily 8 am – 7.30 pm.
    [Show full text]
  • S-Bahn, U-Bahn Und Regionalzug Im MVV Suburban Train, Underground and Regional Train in the MVV Partner Im Münchner Verkehrs- Und Tarifverbund
    S-Bahn, U-Bahn und Regionalzug im MVV Suburban train, underground and regional train in the MVV Partner im Münchner Verkehrs- und Tarifverbund Ingolstadt, Treuchtlingen, Nürnberg n e 900 g Freising n i Marzling Langenbach Moosburg l t Petershausen 900 930 . 931 h c u Pulling . g e 930 931 a r r T P , , f h Lohhof Eching Neufahrn Flughafen Besucherpark t o r Vierkirchen-Esterhofen Flughafen München H ö , w Munich Airport g r u Altomünster e a Unterschleißheim b n n o Hallbergmoos r D Kleinberghofen ü , N Röhrmoos m U6 Garching-Forschungszentrum , l g r U Erdweg , Oberschleißheim u g Garching b r s u Arnbach n b e s Ismaning g g Garching-Hochbrück e u Hebertshausen Erding R A Markt Indersdorf , U2 Feldmoching u Fröttmaning a s 980 s Niederroth a 980 P Althegnenberg Hasen- Dülfer- Harthof Am Hart Altenerding , Frankfurter Ring Kieferngarten t bergl straße Unterföhring u Schwabhausen h Haspelmoor Fasanerie s Freimann d Aufhausen n Bachern Milbertshofen a Mammendorf L U3 Olympia- Studentenstadt Einkaufs- Dachau Stadt Moosach Moosacher Ober- Olympia- Petuel- Johanneskirchen St. Koloman St.-Martins-Platz zentrum wiesenfeld zentrum ring Bonner Platz Alte Heide Malching Dachau U1 U7 U8 Scheidplatz Nordfriedhof Ottenhofen Georg-Brauchle-Ring Maisach Karlsfeld Dietlindenstraße Englschalking 940 U4 Arabellapark h Westfriedhof Hohenzollernplatz c Münchner Freiheit Markt Schwaben a Gernlinden b m Allach Richard-Strauss-Str. 940 i Gern Poing S Giselastraße , f Josephsplatz r Esting Böhmerwaldplatz o Untermenzing Rotkreuzplatz Daglfing Grub d l U8 Universität
    [Show full text]
  • Drug Tariff Part IX April 2019
    04/2019 Part IX APPLIANCES (APPROVED LIST OF APPLIANCES - NOTES) Part IX - Appliances (Approved list of Appliances - Notes) APPROVED LIST OF APPLIANCES See Part II, Clause 8A (page 9) The price listed in respect of an appliance specified in the following list is the basic price (see Part II, Clause 8) on which payment will be calculated pursuant to Part I, Clause 5B in respect of the dispensing of appliances. NOTES 1. Definition - The appliances that may be supplied against orders on Forms FP10 are listed below and must conform with the specifications shown. See Part 1, Clause 2 (page 4). These specifications include published official standards ie BP, BPC or relevant British, European or International Standards or the Drug Tariff Technical Specification. It should be emphasized that any appliance must conform with the entry in this part of the Tariff as well as the official standard or technical specification quoted therein. Other dressings and appliances which may be necessary will normally be provided through the Hospital Services. 2. Sealed Packets - These are those which are "tamper-evident" - sealed with an easily detachable device that prevents removal of the contents without the seal being broken. Additionally in the case of sterile products: once a sealed package has been opened it should not be possible to re-seal it easily. Where an appliance, other than a bandage, required by the Tariff to be supplied in a sealed packet is ordered of a quantity or weight not listed in the Tariff, the quantity ordered should be made up as nearly as possible with the smallest numbers of sealed packets available for the purpose.
    [Show full text]