MMFEAT: a Toolkit for Extracting Multi-Modal Features


Douwe Kiela
Computer Laboratory, University of Cambridge
[email protected]

Abstract

Research at the intersection of language and other modalities, most notably vision, is becoming increasingly important in natural language processing. We introduce a toolkit that can be used to obtain feature representations for visual and auditory information. MMFEAT is an easy-to-use Python toolkit, which has been developed with the purpose of making non-linguistic modalities more accessible to natural language processing researchers.

1 Introduction

Distributional models are built on the assumption that the meaning of a word is represented as a distribution over others (Turney and Pantel, 2010; Clark, 2015), which implies that they suffer from the grounding problem (Harnad, 1990). That is, they do not account for the fact that human semantic knowledge is grounded in the perceptual system (Louwerse, 2008). There has been a lot of interest within the Natural Language Processing community in making use of extra-linguistic perceptual information, much of it in a subfield called multi-modal semantics. Such multi-modal models outperform language-only models on a range of tasks, including modelling semantic similarity and relatedness (Bruni et al., 2014; Silberer and Lapata, 2014), improving lexical entailment (Kiela et al., 2015b), predicting compositionality (Roller and Schulte im Walde, 2013), bilingual lexicon induction (Bergsma and Van Durme, 2011) and metaphor identification (Shutova et al., 2016). Although most of this work has relied on vision for the perceptual input, recent approaches have also used auditory (Lopopolo and van Miltenburg, 2015; Kiela and Clark, 2015) and even olfactory (Kiela et al., 2015a) information.

In this demonstration paper, we describe MMFEAT, a Python toolkit that makes it easy to obtain images and sound files and extract visual or auditory features from them. The toolkit includes two standalone command-line tools that do not require any knowledge of the Python programming language: one that can be used for automatically obtaining files from a variety of sources, including Google, Bing and FreeSound (miner.py); and one that can be used for extracting different types of features from directories of data files (extract.py). In addition, the package comes with code for manipulating multi-modal spaces and several demos to illustrate the wide range of applications. The toolkit is open source under the BSD license and available at https://github.com/douwekiela/mmfeat.

2 Background

2.1 Bag of multi-modal words

Although it is possible to ground distributional semantics in perception using e.g. co-occurrence patterns of image tags (Baroni and Lenci, 2008) or surrogates of human semantic knowledge such as feature norms (Andrews et al., 2009), the de facto method for grounding representations in perception has relied on processing raw image data (Baroni, 2016). The traditional method for obtaining visual representations (Feng and Lapata, 2010; Leong and Mihalcea, 2011; Bruni et al., 2011) has been to apply the bag-of-visual-words (BoVW) approach (Sivic and Zisserman, 2003). The method can be described as follows:

1. obtain relevant images for a word or set of words;
2. for each image, get local feature descriptors;
3. cluster feature descriptors with k-means to find the centroids, a.k.a. the "visual words";
4. quantize the local descriptors by comparing them to the cluster centroids; and
5. combine relevant image representations into an overall visual representation for a word.

The local feature descriptors in step (2) tend to be variants of the dense scale-invariant feature transform (SIFT) algorithm (Lowe, 2004), where an image is laid out as a dense grid and feature descriptors are computed for each keypoint.

A similar method has recently been applied to the auditory modality (Lopopolo and van Miltenburg, 2015; Kiela and Clark, 2015), using sound files from FreeSound (Font et al., 2013). Bag-of-audio-words (BoAW) uses mel-frequency cepstral coefficients (MFCCs) (O'Shaughnessy, 1987) for the local descriptors, although other local frame representations may also be used. In MFCC, frequency bands are spaced along the mel scale (Stevens et al., 1937), which has the advantage that it approximates human auditory perception more closely than e.g. linearly-spaced frequency bands.
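Steps (3)-(5) are the same for both modalities once local descriptors have been extracted. The following minimal sketch (illustrative only, not MMFEAT's own code) shows them in plain Python, with scikit-learn's KMeans standing in for the clustering step; the function names and the choice of averaging in step (5) are assumptions:

    import numpy as np
    from sklearn.cluster import KMeans

    def build_codebook(descriptor_sets, k=100):
        # Step 3: cluster all local descriptors from all files;
        # the k centroids are the "visual words" (or "audio words").
        return KMeans(n_clusters=k, n_init=10).fit(np.vstack(descriptor_sets))

    def quantize(descriptors, codebook):
        # Step 4: assign each descriptor to its nearest centroid and
        # build a normalized histogram of "word" counts for one file.
        assignments = codebook.predict(descriptors)
        counts = np.bincount(assignments, minlength=codebook.n_clusters)
        return counts / max(counts.sum(), 1)

    def word_representation(file_descriptor_sets, codebook):
        # Step 5: combine (here: average) the histograms of all files
        # retrieved for a word into a single representation.
        return np.mean([quantize(d, codebook) for d in file_descriptor_sets], axis=0)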
2.2 Convolutional neural networks

In computer vision, the BoVW method has been superseded by deep convolutional neural networks (CNNs) (LeCun et al., 1998; Krizhevsky et al., 2012). Kiela and Bottou (2014) showed that such networks learn high-quality representations that can successfully be transferred to natural language processing tasks. Their method works as follows:

1. obtain relevant images for a word or set of words;
2. for each image, do a forward pass through a CNN trained on an image recognition task and extract the pre-softmax layer;
3. combine relevant image representations into an overall visual representation for a word.

They used the pre-softmax layer (referred to as FC7) from a CNN trained by Oquab et al. (2014), which was an adaptation of the well-known CNN by Krizhevsky et al. (2012) that played a key role in the deep learning revolution in computer vision (Razavian et al., 2014; LeCun et al., 2015). Such CNN-derived representations perform much better than BoVW features and have since been used in a variety of NLP applications (Kiela et al., 2015c; Lazaridou et al., 2015; Shutova et al., 2016; Bulat et al., 2016).
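MMFEAT itself relies on Caffe bindings for this step (see Section 3). Purely as an illustration of the recipe above, this sketch extracts a 4096-dimensional pre-softmax (FC7-style) vector using torchvision's pre-trained AlexNet as a stand-in; the specific model and preprocessing here are assumptions, not what the toolkit ships with:

    import torch
    from PIL import Image
    from torchvision import models

    weights = models.AlexNet_Weights.DEFAULT
    model = models.alexnet(weights=weights).eval()
    preprocess = weights.transforms()  # resize, crop, normalize

    def fc7_features(image_path):
        # Forward pass up to, but not including, the final
        # classification layer: conv features -> flatten ->
        # all classifier layers except the last Linear.
        x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
        with torch.no_grad():
            feats = model.features(x)
            feats = model.avgpool(feats).flatten(1)
            return model.classifier[:-1](feats).squeeze(0)  # 4096-dim

    # A word's visual representation is then e.g. the mean of the
    # fc7_features vectors of its retrieved images.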
2.3 Related work

The process for obtaining perceptual representations thus involves three distinct steps: obtaining files relevant to words or phrases, obtaining representations for the files, and aggregating these into visual or auditory representations. To our knowledge, this is the first toolkit that spans this entire process. There are libraries that cover some of these steps. Notably, VSEM (Bruni et al., 2013) is a Matlab library for visual semantics representation that implements BoVW and useful functionality for manipulating visual representations. DISSECT (Dinu et al., 2013) is a toolkit for distributional compositional semantics that makes it easy to work with (textual) distributional spaces. Lopopolo and van Miltenburg (2015) have also released their code for obtaining BoAW representations (https://github.com/evanmiltenburg/soundmodels-iwcs).

3 MMFeat Overview

The MMFeat toolkit is written in Python. There are two command-line tools (described below) for obtaining files and extracting representations that do not require any knowledge of Python. The Python interface maintains a modular structure and contains the following modules:

• mmfeat.miner
• mmfeat.bow
• mmfeat.cnn
• mmfeat.space

Source files (images or sounds) can be obtained with the miner module, although this is not a requirement: it is straightforward to build an index of a data directory that matches words or phrases with relevant files. The miner module automatically generates this index, a Python dictionary mapping labels to lists of filenames, which is stored as a Python pickle file index.pkl in the data directory.
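For instance, a compatible index for an existing data directory can be built by hand. The sketch below assumes one subdirectory per label; the directory layout and the use of full paths are assumptions, only the pickled label-to-filenames dictionary format is described above:

    import os
    import pickle

    def build_index(data_dir):
        # index.pkl: a dictionary mapping each label to a list of files.
        index = {}
        for label in sorted(os.listdir(data_dir)):
            subdir = os.path.join(data_dir, label)
            if os.path.isdir(subdir):
                index[label] = [os.path.join(subdir, fname)
                                for fname in sorted(os.listdir(subdir))]
        with open(os.path.join(data_dir, 'index.pkl'), 'wb') as fp:
            pickle.dump(index, fp)
        return index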
The index is used by the bow and cnn modules, which together form the core of the package for obtaining perceptual representations. The space package allows for the manipulation and combination of multi-modal spaces.

miner: Three data sources are currently supported: Google Images (GoogleMiner, https://images.google.com), Bing Images (BingMiner, https://www.bing.com/images) and FreeSound (FreeSoundMiner, https://www.freesound.org). All three of them require API keys, which can be obtained online and are stored in the miner.yaml settings file in the root folder.

bow: The bag-of-words methods are contained in this module. BoVW and BoAW are accessible through the mmfeat.bow.vw and mmfeat.bow.aw modules respectively, through the BoVW and BoAW classes. These classes obtain feature descriptors and perform clustering and quantization through a standard set of methods. BoVW uses dense SIFT for its local feature descriptors; BoAW uses MFCC. The modules also contain an interface for loading local feature descriptors from Matlab, allowing for simple integration with e.g. VLFeat. The centroids obtained by the clustering (sometimes also called the "codebook") are stored in the data directory for re-use at a later stage.

cnn: The CNN module uses Python bindings to the Caffe deep learning framework (Jia et al., 2014). It supports the pre-trained reference adaptation of AlexNet (Krizhevsky et al., 2012), GoogLeNet (Szegedy et al., 2015) and VGGNet (Simonyan and Zisserman, 2015). The interface is identical to the bow interface.

space: An additional module is provided for making it easy to manipulate perceptual representations.
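The space module's API is not shown in this excerpt, but the kind of manipulation it supports can be illustrated independently of MMFEAT. The sketch below fuses a linguistic and a perceptual vector by weighted concatenation of L2-normalized parts, a common scheme in multi-modal semantics (e.g. Bruni et al., 2014), and compares the results by cosine similarity; the vectors and the alpha weight are placeholders:

    import numpy as np

    def unit(v):
        return v / np.linalg.norm(v)

    def fuse(ling_vec, percept_vec, alpha=0.5):
        # Weighted concatenation of the two normalized modalities.
        return np.concatenate([alpha * unit(ling_vec),
                               (1.0 - alpha) * unit(percept_vec)])

    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    # Placeholder vectors, e.g. a 300-d text embedding and a 4096-d FC7 vector.
    dog = fuse(np.random.rand(300), np.random.rand(4096))
    cat = fuse(np.random.rand(300), np.random.rand(4096))
    print(cosine(dog, cat))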
4 Tools

MMFeat comes with two easy-to-use command-line tools for those unfamiliar with the Python programming language.

4.1 Mining: miner.py

The miner.py tool takes three arguments: the data source (bing, google or freesound), a query file that contains a line-by-line list of queries, and a data directory to store the mined image or sound files in. Its usage is as follows:

    miner.py {bing,google,freesound} \
        query_file data_dir [-n int]

The -n option can be used to specify the number of images to download per query. The following examples show how to use the tool to get 10 images from Bing and 100 sound files from FreeSound for the queries "dog" and "cat":

    $ echo -e "dog\ncat" > queries.txt
    $ python miner.py -n 10 bing \
        queries.txt ./img_data_dir
    $ python miner.py -n 100 freesound \
        queries.txt ./sound_data_dir

4.2 Feature extraction: extract.py