Distributional Semantics
Distributional semantics is a research area that develops and studies theories and methods for quantifying and categorizing semantic similarities between linguistic items based on their distributional properties in large samples of language data. The basic idea of distributional semantics can be summed up in the so-called distributional hypothesis: linguistic items with similar distributions have similar meanings.

1 Distributional hypothesis

The distributional hypothesis in linguistics is derived from the semantic theory of language usage, i.e. words that are used and occur in the same contexts tend to purport similar meanings.[1] The underlying idea that “a word is characterized by the company it keeps” was popularized by Firth.[2] The distributional hypothesis is the basis for statistical semantics. Although the distributional hypothesis originated in linguistics,[3] it is now receiving attention in cognitive science, especially regarding the context of word use.[4] In recent years, the distributional hypothesis has provided the basis for the theory of similarity-based generalization in language learning: the idea that children can figure out how to use words they have rarely encountered before by generalizing about their use from distributions of similar words.[5][6]

The distributional hypothesis suggests that the more semantically similar two words are, the more distributionally similar they will be in turn, and thus the more they will tend to occur in similar linguistic contexts. Whether or not this suggestion holds has significant implications both for the data-sparsity problem in computational modeling and for the question of how children are able to learn language so rapidly given relatively impoverished input (also known as the problem of the poverty of the stimulus).

2 Distributional semantic modeling

Distributional semantics favors the use of linear algebra as a computational tool and representational framework. The basic approach is to collect distributional information in high-dimensional vectors and to define distributional/semantic similarity in terms of vector similarity.[7] Different kinds of similarities can be extracted depending on which type of distributional information is used to collect the vectors: topical similarities can be extracted by populating the vectors with information on which text regions the linguistic items occur in; paradigmatic similarities can be extracted by populating the vectors with information on which other linguistic items the items co-occur with. Note that the latter type of vectors can also be used to extract syntagmatic similarities by looking at the individual vector components.

The basic idea of a correlation between distributional and semantic similarity can be operationalized in many different ways. There is a rich variety of computational models implementing distributional semantics, including latent semantic analysis (LSA),[8] Hyperspace Analogue to Language (HAL), syntax- or dependency-based models,[9] random indexing, semantic folding[10] and various variants of the topic model.

Distributional semantic models differ primarily with respect to the following parameters:

• Context type (text regions vs. linguistic items)
• Context window (size, extension, etc.)
• Frequency weighting (e.g. entropy, pointwise mutual information, etc.)
• Dimension reduction (e.g. random indexing, singular value decomposition, etc.)
• Similarity measure (e.g. cosine similarity, Minkowski distance, etc.)

Distributional semantic models that use linguistic items as context have also been referred to as word space models.[11][12]

3 Compositional distributional semantic models

Compositional distributional semantic models are an extension of distributional semantic models that characterize the semantics of entire phrases or sentences. This is achieved by composing the distributional representations of the words that the sentences contain. Different approaches to composition have been explored, and are under discussion at established workshops such as SemEval.[13] Simpler non-compositional models fail to capture the semantics of larger linguistic units, as they ignore grammatical structure and logical words, which are crucial for understanding them.

4 Applications

Distributional semantic models have been successfully applied to the following tasks:

• finding semantic similarity between words and multi-word expressions;
• word clustering based on semantic similarity;
• automatic creation of thesauri and bilingual dictionaries;
• lexical ambiguity resolution;
• expanding search requests using synonyms and associations;
• defining the topic of a document;
• document clustering for information retrieval;
• data mining and named-entity recognition;
• creating semantic maps of different subject domains;
• paraphrasing;
• sentiment analysis;
• modeling selectional preferences of words.

5 Software

• S-Space
• SemanticVectors
• Gensim
• DISCO Builder

6 See also

• Co-occurrence
• Statistical semantics
• J. R. Firth
• Zellig Harris
• Scott Deerwester
• Susan Dumais
• George Furnas
• Thomas Landauer
• Richard Harshman
• Word embedding
• Phraseme
• Word2vec
• Gensim

7 References

[1] Harris 1954
[2] Firth 1957
[3] Sahlgren 2008
[4] McDonald & Ramscar 2001
[5] Gleitman 2002
[6] Yarlett 2008
[7] Rieger 1991
[8] Deerwester et al. 1990
[9] Padó & Lapata 2007
[10] De Sousa Webber, Francisco (2015). “Semantic Folding Theory And its Application in Semantic Fingerprinting”. arXiv:1511.08855.
[11] Schütze 1993
[12] Sahlgren 2006
[13] “SemEval-2014, Task 1”.

7.1 Sources

• Harris, Z. (1954). “Distributional structure”. Word. 10 (23): 146–162.
• Firth, J.R. (1957). “A synopsis of linguistic theory 1930–1955”. Studies in Linguistic Analysis. Oxford: Philological Society: 1–32. Reprinted in F.R. Palmer, ed. (1968). Selected Papers of J.R. Firth 1952–1959. London: Longman.
• Sahlgren, Magnus (2008). “The Distributional Hypothesis” (PDF). Rivista di Linguistica. 20 (1): 33–53.
• McDonald, S.; Ramscar, M. (2001). “Testing the distributional hypothesis: The influence of context on judgements of semantic similarity”. Proceedings of the 23rd Annual Conference of the Cognitive Science Society. pp. 611–616. CiteSeerX 10.1.1.104.7535.
• Gleitman, Lila R. (2002). “Verbs of a feather flock together II: The child’s discovery of words and their meanings”. The Legacy of Zellig Harris: Language and information into the 21st century: Philosophy of science, syntax and semantics. Current Issues in Linguistic Theory. John Benjamins Publishing Company. 1: 209–229. doi:10.1075/cilt.228.17gle.
• Yarlett, D. (2008). Language Learning Through Similarity-Based Generalization (PDF) (PhD thesis). Stanford University.
• Rieger, Burghard B. (1991). On Distributed Representations in Word Semantics (PDF) (Report). ICSI Berkeley 12-1991. CiteSeerX 10.1.1.37.7976.
• Deerwester, Scott; Dumais, Susan T.; Furnas, George W.; Landauer, Thomas K.; Harshman, Richard (1990). “Indexing by Latent Semantic Analysis” (PDF). Journal of the American Society for Information Science. 41 (6): 391–407. doi:10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9.
• Padó, Sebastian; Lapata, Mirella (2007). “Dependency-based construction of semantic space models”. Computational Linguistics. 33 (2): 161–199. doi:10.1162/coli.2007.33.2.161.
• Schütze, Hinrich (1993). “Word Space”. Advances in Neural Information Processing Systems 5. pp. 895–902. CiteSeerX 10.1.1.41.8856.
• Sahlgren, Magnus (2006). The Word-Space Model (PDF) (PhD thesis). Stockholm University.
• Thomas Landauer; Susan T. Dumais. “A Solution to Plato’s Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge”. Retrieved 2007-07-02.
• Kevin Lund; Curt Burgess; Ruth Ann Atchley (1995). Semantic and associative priming in a high-dimensional semantic space. Cognitive Science Proceedings. pp. 660–665.
• Kevin Lund; Curt Burgess (1996). “Producing high-dimensional semantic spaces from lexical co-occurrence”. Behavior Research Methods, Instruments, and Computers. 28 (2): 203–208. doi:10.3758/bf03204766.

8 External links

• Zellig S. Harris
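The vector-space approach described in the modeling section — a context window, a frequency weighting such as pointwise mutual information, and a similarity measure such as cosine similarity — can be sketched in a few lines of Python. This is a minimal illustration, not any of the systems cited above; the toy corpus, the window size of 2, and all names are assumptions made for the example (dimension reduction, e.g. SVD, is omitted for brevity):

```python
# Minimal distributional semantic model: co-occurrence counts within a
# symmetric window, PPMI weighting, cosine similarity. Illustrative only.
import math
from collections import Counter, defaultdict

corpus = [                      # toy corpus (assumption for illustration)
    "the cat chased the mouse",
    "the dog chased the cat",
    "the mouse ate the cheese",
    "the dog ate the bone",
]
WINDOW = 2                      # context window: 2 words on each side

# 1. Collect co-occurrence counts (linguistic items as context).
cooc = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()
    for i, w in enumerate(tokens):
        for j in range(max(0, i - WINDOW), min(len(tokens), i + WINDOW + 1)):
            if j != i:
                cooc[w][tokens[j]] += 1

# 2. Frequency weighting: positive pointwise mutual information (PPMI),
#    which discounts frequent function words like "the".
total = sum(sum(c.values()) for c in cooc.values())
w_count = {w: sum(c.values()) for w, c in cooc.items()}

def ppmi(w, c):
    p_wc = cooc[w][c] / total
    if p_wc == 0.0:
        return 0.0
    return max(0.0, math.log(p_wc / ((w_count[w] / total) * (w_count[c] / total))))

vocab = sorted(cooc)
vectors = {w: [ppmi(w, c) for c in vocab] for w in vocab}

# 3. Similarity measure: cosine similarity between word vectors.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# "cat" and "mouse" share contexts (both get chased / follow "the"),
# so their similarity exceeds that of "cat" and "cheese".
print(cosine(vectors["cat"], vectors["mouse"]))
print(cosine(vectors["cat"], vectors["cheese"]))
```

Swapping the ingredients — dependency relations instead of a linear window for the context type, entropy weighting instead of PPMI, Minkowski distance instead of cosine — yields the different model families enumerated in the parameter list above.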