
A Semantic Network Approach to Measuring Relatedness

Brian Harrington
Oxford University Computing Laboratory
[email protected]

Abstract

Humans are very good at judging the strength of relationships between two terms, a task which, if it can be automated, would be useful in a range of applications. Systems attempting to solve this problem automatically have traditionally either used relative positioning in lexical resources such as WordNet, or distributional relationships in large corpora. This paper proposes a new approach, whereby relationships are derived from natural language text by using existing nlp tools, then integrated into a large scale semantic network. Spreading activation is then used on this network in order to judge the strengths of all relationships connecting the terms. In comparisons with human measurements, this approach was able to obtain results on par with the best purpose built systems, using only a relatively small corpus extracted from the web. This is particularly impressive, as the network creation system is a general tool for information collection and integration, and is not specifically designed for tasks of this type.

Coling 2010: Poster Volume, pages 356–364, Beijing, August 2010

1 Introduction

The ability to determine semantic relatedness between terms is useful for a variety of nlp applications, including word sense disambiguation, information extraction and retrieval, and text summarisation (Budanitsky and Hirst, 2006). However, there is an important distinction to be made between semantic relatedness and semantic similarity. As Resnik (1999) notes, “Semantic similarity represents a special case of semantic relatedness: for example, cars and gasoline would seem to be more closely related than, say, cars and bicycles, but the latter pair are certainly more similar”. Budanitsky and Hirst (2006) further note that “Computational applications typically require relatedness rather than just similarity; for example, money and river are cues to the in-context meaning of bank that are just as good as trust company”.

Systems for automatically determining the degree of semantic relatedness between two terms have traditionally either used a measurement based on the distance between the terms within WordNet (Banerjee and Pedersen, 2003; Hughes and Ramage, 2007), or used co-occurrence statistics from a large corpus (Mohammad and Hirst, 2006; Padó and Lapata, 2007). Recent systems have, however, shown improved results using extremely large corpora (Agirre et al., 2009), and existing large-scale resources such as Wikipedia (Strube and Ponzetto, 2006).

In this paper, we propose a new approach to determining semantic relatedness, in which a semantic network is automatically created from a relatively small corpus using existing nlp tools and a network creation system called ASKNet (Harrington and Clark, 2007), and then spreading activation is used to determine the strength of the connections within that network. This process is more analogous to the way the task is performed by humans. Information is collected from fragments and assimilated into a large semantic knowledge structure which is not purposely built for a single task, but is constructed as a general resource containing a wide variety of information. Relationships represented within this structure can then be used to determine the total strength of the relations between any two terms.

2 Existing Approaches

2.1 Resource Based Methods

A popular method for automatically judging semantic distance between terms is through WordNet (Fellbaum, 1998), using the lengths of paths between words in the network as a measure of distance. While WordNet-based approaches have obtained promising results for measuring semantic similarity (Jiang and Conrath, 1997; Banerjee and Pedersen, 2003), the results for the more general notion of semantic relatedness have been less promising (Hughes and Ramage, 2007).

One disadvantage of using WordNet for evaluating semantic relatedness is its hierarchical taxonomic structure. This results in terms such as car and bicycle being close in the network, but terms such as car and gasoline being far apart. Another difficulty arises from the non-scalability of WordNet. While the quality of the network is high, the manual nature of its construction means that arbitrary word pairs may not occur in the network. Hence in this paper we pursue an approach in which the resource for measuring semantic relatedness is created automatically, based on naturally occurring text.

A similar project, not using WordNet, is WikiRelate (Strube and Ponzetto, 2006), which uses the existing link structure of Wikipedia as its base network, and uses similar path-based measurements to those found in WordNet approaches to compute semantic relatedness. This project has seen improved results over most WordNet-based approaches, largely due to the nature of Wikipedia, where articles tend to link to other articles which are related, rather than just ones which are similar.

2.2 Distributional Methods

An alternative method for judging semantic distance is using word co-occurrence statistics derived from a very large corpus (McDonald and Brew, 2004; Padó and Lapata, 2007) or from the web using search engine results (Turney, 2001).

In a recent paper, Agirre et al. (2009) parsed 4 billion documents (1.6 Terawords) crawled from the web, and then used a search function to extract syntactic relations and context windows surrounding key words. These were then used as features for a vector space model, in a similar manner to work done in (Padó and Lapata, 2007), using the British National Corpus (bnc). This system has produced excellent results, indicating that the quality of the results for these types of approaches is related to the size and coverage of their corpus. This does however present problems moving forward, as 1.6 Terawords is obviously an extremely large corpus, and it is likely that there would be a diminishing return on investment for increasingly large corpora. In the same paper, another method was shown which used the PageRank algorithm, run over a network formed from WordNet and the WordNet gloss tags, in order to produce equally impressive results.

3 A Semantic Network Approach

The resource we use is a semantic network, automatically created by the large scale network creation program, ASKNet. The relations between nodes in the network are based on the relations returned by a parser and semantic analyser, which are typically the arguments of predicates found in the text. Hence terms in the network are related by the chain of syntactic/semantic relations which connect the terms in documents, making the network ideal for measuring the general notion of semantic relatedness.

Distinct occurrences of terms and entities are combined into a single node using a novel form of spreading activation (Collins and Loftus, 1975). This combining of distinct mentions produces a cohesive connected network, allowing terms and entities to be related across sentences and even larger units such as documents.
Once the network is built, spreading activation is used to determine semantic relatedness between terms. For example, to determine how related car and gasoline are, activation is given to one of the nodes, say car, and the network is “fired” to allow the activation to spread to the rest of the network. The amount of activation received by gasoline is then a measure of the strength of the semantic relation between the two terms.

We use three datasets derived from human judgements of semantic relatedness to test our technique. Since the datasets contain general terms which may not appear in an existing corpus, we create our own corpus by harvesting text from the web via Google. This approach has the advantage of requiring little human intervention and being extensible to new datasets. Our results using the semantic network derived from the web-based corpus are comparable to the best performing existing methods tested on the same datasets.

As an example of the usefulness of information integration, consider the monk-asylum example, taken from the rg dataset (described in Section 5.1). It is possible that even a large corpus could contain sentences linking monk with church, and linking church with asylum, but no direct links between monk and asylum. However, with an integrated semantic network, activation can travel across multiple links, and through multiple paths, and will show a relationship, albeit probably not a very strong one, between monk and asylum, which corresponds nicely with our intuition.

Figure 1, which gives an example network built from duc documents describing the Elian Gonzalez custody battle, gives an indication of the kind of network that ASKNet builds. This figure does not give the full network, which is too large to show in a single figure, but shows the “core” of the network, where the core is determined using the technique described in (Harrington and Clark, 2009). The black boxes represent named entities mentioned in the text, which may have been mentioned a number of times across documents, and possibly using different names (e.g. Fidel Castro vs. President Castro). The diamonds are named directed edges, which represent relationships between entities.

A manual evaluation using human judges has been performed to measure the accuracy of ASKNet networks. On a collection of duc documents, the “cores” of the resulting networks were judged to be 80% accurate on average, where accuracy was measured for the merged entity nodes in the networks and the relations between those entities (Harrington and Clark, 2009).

4 Creating the Semantic Networks

ASKNet creates the semantic networks using existing nlp tools to extract syntactic and semantic information from text. This information is then combined using a modified version of the update algorithm used by Harrington and Clark (2007) to create an integrated large-scale network. By mapping together concepts and objects that relate to the same real-world entities, the system is able to transform the output of various nlp tools into a single network, producing semantic resources which are more than the sum of their parts. Combining information from multiple sources results in a representation which would not have been possible to obtain from analysing the original sources separately.

The nlp tools used by ASKNet are the C&C parser (Clark and Curran, 2007) and the semantic analysis program Boxer (Bos et al., 2004), which operates on the ccg derivations output by the parser to produce a first-order representation. The named entity recognizer of Curran and Clark (2003) is also used to recognize the standard set of muc entities, including person, location and organisation.

The motivation for fully automatic creation is that very large networks, containing millions of edges, can be created in a matter of hours. Automatically creating networks does result in lower precision than manual creation, but this is offset by the scalability and speed of creation. The experiments described in this paper are a good test of the automatic creation methodology.
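As a rough illustration only (ASKNet's actual representation, with named edges, mapping scores, and merged entity nodes, is far richer), the construction step above can be sketched as building a weighted adjacency map from predicate-argument triples; the triples below are hypothetical stand-ins for parser/Boxer output:

```python
from collections import defaultdict

def build_network(triples):
    """Build a weighted adjacency map from (head, relation, dependent)
    triples, such as a parser/semantic analyser might emit.
    Repeated co-mentions strengthen the edge between two terms."""
    graph = defaultdict(lambda: defaultdict(float))
    for head, rel, dep in triples:
        graph[head][dep] += 1.0  # edge weight grows with each co-mention
        graph[dep][head] += 1.0  # keep the graph symmetric for firing
    return graph

# Hypothetical triples, as if extracted from separate sentences;
# "car" and "gasoline" become linked through the shared node "burn".
triples = [
    ("car", "arg_of", "burn"),
    ("gasoline", "arg_of", "burn"),
    ("car", "arg_of", "drive"),
]
net = build_network(triples)
print(sorted(net["burn"]))  # ['car', 'gasoline']
```

Merging distinct mentions of the same real-world entity into one node, the key step that makes the network "more than the sum of its parts", is omitted here; it is the job of the update algorithm described next.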

Figure 1: Example network core derived from duc documents describing a news story from 2000.

The update algorithm

In order to merge information from multiple sources into a single cohesive resource, ASKNet uses a spreading activation based update algorithm (Harrington and Clark, 2007). As the system encounters new information, it attempts to map named entities in the new sentences it encounters to those already in its network. Each new sentence is turned into an update fragment: a fragment of the network representing the information contained in the sentence. Initial mapping scores, based on lexical similarity and named entity type, are made between the update fragment's named entities and those in the main network. The job of the update algorithm is to improve upon those initial scorings using the semantic information contained within the network.

The update algorithm iterates over each named entity node in the update fragment. This base node is provided with activation, which is allowed to spread throughout the fragment. All named entities which receive activation in this process then transfer their activation to their target named entity nodes (nodes in the main network with which they have a current mapping score greater than zero). The amount transferred is based on the strength of the mapping score. The activation is then allowed to circulate through the main network until it reaches a stable state. At this point, the base node's mappings are updated based on which of its target nodes received activation. The more activation a target node receives, the more its mapping score with the base node will increase.

The intuition behind the update algorithm is that we can use the relatedness of nodes in the update fragment to determine appropriate mappings in the main network. So if our base node has the label “Crosby”, and is related to named entity nodes referring to “Canada”, “Vancouver” and “2010”, those nodes will pass their activation onto their main network targets, and hopefully onto the node representing the ice hockey player Sidney Crosby. We would then increase the mapping score between our base node and this target, while at the same time decreasing the mapping score between our base node and the singer Bing Crosby, who (hopefully) would have received little or no activation. The update algorithm is also self-reinforcing, as in successive stages the improved scores will focus the activation further. In our example, in successive iterations, more of the activation coming to the “Crosby” node will be sent to the appropriate target node, and therefore there will be less spurious activation in the network to create noise.

For the purposes of these experiments, we extended the update algorithm to map together general object nodes, rather than focusing solely on named entities. This was necessary due to the nature of the task. Simply merging named entities would not be sufficient, as many of the words in the datasets would not likely be associated strongly with any particular named entities. Extending the algorithm in this way resulted in a much higher frequency of mapping, and a much more connected final network. Because of this, we found that several of the parameters had to be changed from those used in Harrington and Clark (2009). Our initial activation input was set to double that used in Harrington and Clark's experiments (100 instead of 50), to compensate for the activation lost over the higher number of links. We also found that the number of iterations required to reach a stable state had increased to more than 4 times the previous number. This was to be expected due to the increased number of links passing activation. We also had to remove the named entity type calculation from the initial mapping score, thus leaving the initial scoring to be simply the ratio of labels in the two nodes which overlapped. These changes were all done after manual observation of test networks built from searches not relating to any dataset, and were not changed once the experiments had begun.

4.1 Measuring semantic relatedness

Once a large-scale network has been constructed from a corpus of documents, spreading activation can be used to efficiently obtain a distance score between any two nodes in the network, which will represent the semantic relatedness of the pair. Each node in the network has a current amount of activation and a threshold (similar to classical ideas from the neural network literature). If a node's activation exceeds its threshold, it will fire, sending activation to all of its neighbours, which may cause them to fire, and so on. The amount of activation sent between nodes decreases with distance, so that the effect of the original firing is localized. The localized nature of the algorithm is important because it means that semantic relatedness scores can be calculated efficiently even for pairs of nodes in very large networks.

To obtain a score between nodes x and y, first a set amount of activation is placed in node x; then the network is fired until it stabilises, and the total amount of activation received by node y is stored as act(x,y). This process is repeated starting with node y to obtain act(y,x). The sum of these two values, which we call dist(x,y), is used as the measure of semantic relatedness between x and y.¹

dist(x,y) is a measure of the total strength of connection between nodes x and y, relative to the other nodes in their region. This takes into account not just direct paths, but also indirect paths, if the links along those paths are of sufficient strength. Since the networks potentially contain a wide variety of relations between terms, the calculation of dist(x,y) has access to a wide variety of information linking the two terms. If we consider the (car, gasoline) example mentioned earlier, the intuition behind our approach is that these two terms are likely to be closely related in a semantic network built from text, either fairly directly because they appear in the same sentence or document, or indirectly because they are related to the same entities.

¹The average could be used also, but this has no effect on the ranking statistics used in the later experiments.

5 Experiments

The purpose of the experiments was to develop an entirely automated approach for replicating human judgements of semantic relatedness of words. We used three existing datasets of human judgements: the Hodgson, Rubenstein & Goodenough (rg) and Wordsimilarity-353 (ws-353) datasets. For each dataset we created a corpus using results returned by Google when queried for each word independently (described in Section 5.2). We then built a semantic network from that corpus and used the spreading activation technique described in the previous section to measure semantic relatedness between the word pairs in the dataset.

The parser and semantic analysis tool used to create the networks were developed on newspaper data (a ccg version of the Penn Treebank (Steedman and Hockenmaier, 2007; Clark and Curran, 2007)), but our impression from informally inspecting the parser output was that the accuracy on the web data was reasonable. The experimental results show that the resulting networks were of high enough quality to closely replicate human judgements.

5.1 The datasets

Many studies have shown a marked priming effect for semantically related words. In his single-word lexical priming study, Hodgson (1991) showed that the presentation of a prime word such as election directly facilitates processing of a target word such as vote. Hodgson showed an increase in both response speed and accuracy when the prime and target are semantically related. 143 word pairs were tested across 6 different lexical relations: antonymy (e.g., enemy, friend); conceptual association (e.g., bed, sleep); category coordination (e.g., train, truck); phrasal association (e.g., private, property); superordination/subordination (e.g., travel, drive); and synonymy (e.g., value, worth). It was shown that equivalent priming effects (i.e., reduced processing time) were present across all relation types, thus indicating that priming was a result of the terms' semantic relatedness, not merely their similarity or other simpler relation type.

The Hodgson dataset consists of the 143 word pairs divided by lexical category. There were no scores given, as all pairs were shown to have relatively similar priming effects. No examples of unrelated pairs are given in the dataset. We therefore used the unrelated pairs created by McDonald and Brew (2004). The task in this experiment was to obtain scores for all pairs, and to do an ANOVA test to determine if there is a significant difference between the scores for related and unrelated pairs.

The ws-353 dataset (Finkelstein et al., 2002) contains human rankings of the semantic distance between pairs of terms. Although the name may imply that the scores are based on similarity, human judges were asked to score 353 pairs of words for their relatedness on a scale of 1 to 10, and so the dataset is ideal for our purposes. For example, the pair (money, bank) is in the dataset and receives a high relatedness score of 8.50, even though the terms are not lexically similar. The dataset contains regular nouns and named entities, as well as at least one term which does not appear in WordNet (Maradona). In this experiment, we calculated scores for all word pairs, and then used rank correlation to compare the similarity of our generated scores to those obtained from human judgements.

The rg dataset (Rubenstein and Goodenough, 1965) is very similar to the ws-353, though with only 65 word pairs, except that the human judges were asked to judge the pairs based on synonymy, rather than overall relatedness. Thus, for example, the pair (monk, asylum) receives a significantly lower score than the pair (crane, implement).

5.2 Data collection, preparation and processing

In order to create a corpus from which to build the semantic networks, we first extracted each individual word from the pairings, resulting in a list of 440 words for the ws-353 collection, 48 words for the rg (some words were used in multiple pairings), and 282 words for the Hodgson collection. For each of the words in this list, we then performed a query using Google, and downloaded the first 5 page results for that query. The choice of 5 as the number of documents to download for each word was based on a combination of informal intuition about the precision and recall of search engines, as well as the practical issue of obtaining a corpus that could be processed in reasonable space and time.

Each of the downloaded web pages was then cleaned by a set of Perl scripts which removed all HTML markup. Three rules were added to the retrieval process to deal with problems encountered in formatting of web-pages:

1. Pages from which no text could be retrieved were ignored and replaced with the next result.

2. HTML lists preceded by a colon were recombined into sentences.

3. For Wikipedia disambiguation pages (pages which consist of a list of links to articles relating to the various possible senses of a word), all of the listed links were followed and the resulting pages added to the corpus.

Each of these heuristics was performed automatically and without human intervention. Statistics for the resulting corpora are given in Table 1.

corpus    sentences    words
Hodgson   814,779      3,745,870
rg        150,165      573,148
ws-353    1,042,128    5,027,947

Table 1: Summary statistics for the corpora generated for the experiments.

The largest of the networks, created for the ws-353 dataset, took slightly over 24 hours to complete, including time for parsing and semantic analysis.

6 Results

6.1 Hodgson priming dataset

After processing the Hodgson corpus to build a semantic network with approximately 500,000 nodes and 1,300,000 edges, the appropriate node pairs were fired to obtain the distance measure as previously described. Those measurements were then recorded as measurements of semantic relatedness between two terms. If a term was used as a label in two or more nodes, all nodes were tried, and the highest scoring pairs were used.

As the Hodgson dataset did not provide examples of unrelated pairs against which we could compare, unrelated pairs were generated as described in McDonald and Brew (2004). This is not an ideal method, as several pairs that were identified as unrelated did have some relatively obvious relationship (e.g. tree – house, poker – heart). However we chose to retain the methodology for consistency with previous literature, as it was also used in Padó and Lapata (2007).

Scores were obtained from the network for the word pairs, and for each target an average score was calculated for all primes in its category. Example scores are given in Table 2. Two-way analysis of variance (ANOVA) was carried out on the network scores, with the relatedness status of the pair being the independent variable. A reliable effect was observed for the network scores, with the primes for related words being significantly larger than those for unrelated words. The results are given in Table 3. The use of ANOVA shows that there is a difference in the scores of the related and unrelated word pairs that cannot be accounted for by random variance.
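The clean-up step above was done with Perl scripts; a rough Python equivalent of just the markup-removal stage (not the authors' code, and omitting the three retrieval rules) can be built on the standard library's HTML parser:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text from a page, skipping script/style content,
    roughly mirroring the HTML-stripping step described above."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a script/style element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.parts.append(data.strip())

def strip_html(page):
    parser = TextExtractor()
    parser.feed(page)
    return " ".join(parser.parts)

page = ("<html><head><style>p{color:red}</style></head>"
        "<body><p>Cars burn gasoline.</p></body></html>")
print(strip_html(page))  # Cars burn gasoline.
```

A production pipeline would also need encoding detection and boilerplate removal, which this sketch ignores.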

However, in order to compare the strength of the experimental effects between two experiments, additional statistics must be used. Eta-squared (η2) is a measure of the strength of an experimental effect. A high η2 indicates that the independent variable accounts for more of the variability, and thus indicates a stronger experimental effect. In our experiments, we found an η2 of 0.411, which means that approximately 41% of the overall variance can be explained by the relatedness scores.

Word pair            Related    Network Score
empty - full         Yes        10.13
coffee - mug         Yes        5.86
horse - goat         Yes        0.96
dog - leash          Yes        4.70
friend - antonym     No         0.53
vote - conceptual    No         1.37
property - phrasal   No         2.47
drive - super/sub    No         1.86

Table 2: Example scores obtained from the network for related and unrelated word pairs from the Hodgson dataset.

For comparison, we provide the ANOVA results for experiments by McDonald and Brew (2004) and Padó and Lapata (2007) on the same dataset. Both of these experiments obtained scores using vector based models populated with data from the bnc.

We also include the results obtained from performing the same ANOVA tests on Pointwise Mutual Information (pmi) scores collected over our corpus. These results were intended to provide a baseline when using the web-based corpus. To calculate the pmi scores for this experiment, we computed scores for the number of times the two words appeared in the same paragraph or document, and the total number of occurrences of words in the corpus. The pmi scores were calculated by simply dividing the number of times the words co-occurred within a paragraph by the product of the number of occurrences of each word within a document.

                   F         MSE      p           η2
McDonald & Brew    71.73     0.004    < 0.001
Padó & Lapata      182.46    0.93     < 0.01      0.332
pmi                42.53     3.79     < 0.001     0.263
Network            50.71     3.28     < 0.0001    0.411

Table 3: ANOVA results of scores generated from the Hodgson dataset compared to those reported for existing systems. (F = F-test statistic, MSE = mean squared error, p = p-value, η2 = effect size)

6.2 ws-353 and rg datasets

The methodology used to obtain scores for the ws-353 and rg collections was identical to that used for the Hodgson data, except that scores were only obtained for those pairs listed in the dataset. Because both collections provided direct scores, there was no need to retrieve network scores for unrelated pairings.

For consistency with previous literature, the scores obtained by the semantic network were compared with those from the collections using Spearman's rank correlation. The correlation results are given in Table 4. For comparison, we have included the results of the same correlation on scores from three top scoring systems using the approaches described above. We also include the scores obtained by using a simple pmi calculation as in the previous experiment.

                 ws-353    rg
WikiRelate!      0.48      0.86
Hughes-Ramage    0.55      0.84
Agirre et al.    0.66      0.89
pmi              0.41      0.80
Network          0.62      0.86

Table 4: Rank correlation scores for the semantic network and pmi-based approaches, calculated on the ws-353 and rg collections, shown against scores for existing systems.

The scores obtained by our system were not an improvement on those obtained by existing systems. However, our scores were on par with the best performing systems, which were purpose built for this application, and at least in the case of the system by Agirre et al. used a corpus several orders of magnitude larger.
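The pmi baseline used above (paragraph co-occurrence count divided by the product of the words' occurrence counts) can be sketched as follows; the whitespace tokenisation and the toy corpus are stand-ins for the real pipeline:

```python
from collections import Counter
from itertools import combinations

def pmi_scores(paragraphs):
    """Return a scoring function implementing the simple pmi-style
    measure described above: co-occurrences within a paragraph,
    normalised by the product of total occurrence counts."""
    word_counts = Counter()
    pair_counts = Counter()
    for para in paragraphs:
        words = para.lower().split()
        word_counts.update(words)
        # count each unordered pair at most once per paragraph
        for w1, w2 in combinations(sorted(set(words)), 2):
            pair_counts[(w1, w2)] += 1

    def score(a, b):
        pair = tuple(sorted((a, b)))
        denom = word_counts[a] * word_counts[b]
        return pair_counts[pair] / denom if denom else 0.0

    return score

corpus = ["the car needs gasoline", "the car is fast", "gasoline prices rose"]
score = pmi_scores(corpus)
print(score("car", "gasoline"))  # 0.25: one co-occurrence, 2 x 2 occurrences
```

Note that this is not the standard log-ratio form of pointwise mutual information; it follows the simpler ratio described in the text.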

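The rank-correlation evaluation in Table 4 uses Spearman's rho; a minimal implementation of the difference-of-ranks formula (valid only when there are no ties; a library routine such as scipy.stats.spearmanr would normally be used) looks like this, with hypothetical score lists:

```python
def rankdata(values):
    """Assign ranks 1..n (ties broken arbitrarily in this sketch)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = float(rank)
    return ranks

def spearman(xs, ys):
    """Spearman's rho via rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(xs)
    rx, ry = rankdata(xs), rankdata(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

human  = [8.5, 7.2, 3.1, 1.0]   # hypothetical human relatedness scores
system = [10.1, 5.9, 4.7, 0.5]  # hypothetical network scores
print(spearman(human, system))  # 1.0: the two rankings are identical
```

Because only ranks matter, the network's raw activation totals need not be on the same scale as the human judgements, which is why rank correlation is the natural evaluation here.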
7 Conclusion

In this paper we have shown that a semantic network approach to determining semantic relatedness of terms can achieve performance on par with the best purpose built systems. This is interesting for two reasons. Firstly, the approach we have taken in this paper is much more analogous to the way humans perform similar tasks. Secondly, the system used was not purpose built for this application. It is instead a general tool for information collection and integration, and this result indicates that it will likely be useful for a wide variety of language processing applications.

References

Agirre, Eneko, Enrique Alfonseca, Keith Hall, Jana Kravalova, Marius Pasca, and Aitor Soroa. 2009. A study on similarity and relatedness using distributional and WordNet-based approaches. In Proceedings of NAACL-HLT.

Banerjee, Satanjeev and T. Pedersen. 2003. Extended gloss overlaps as a measure of semantic relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, Acapulco, Mexico.

Bos, Johan, Stephen Clark, Mark Steedman, James R. Curran, and Julia Hockenmaier. 2004. Wide-coverage semantic representations from a CCG parser. In Proceedings of the 20th International Conference on Computational Linguistics (COLING-04), pages 1240–1246, Geneva, Switzerland.

Budanitsky, Alexander and Graeme Hirst. 2006. Evaluating wordnet-based measures of semantic distance. Computational Linguistics, 32:13–47, March.

Clark, Stephen and James R. Curran. 2007. Wide-coverage efficient statistical parsing with CCG and log-linear models. Computational Linguistics, 33(4):493–552.

Collins, Allan M. and Elizabeth F. Loftus. 1975. A spreading-activation theory of semantic processing. Psychological Review, 82(6):407–428.

Curran, James R. and Stephen Clark. 2003. Language independent NER using a maximum entropy tagger. In Proceedings of the Seventh Conference on Natural Language Learning, pages 164–167, Edmonton, Canada.

Fellbaum, Christiane, editor. 1998. WordNet: An Electronic Lexical Database. MIT Press, Cambridge, Mass, USA.

Finkelstein, Lev, Evgeniy Gabrilovich, Yossi Matias, Ehud Rivlin, Zach Solan, Gadi Wolfman, and Eytan Ruppin. 2002. Placing search in context: The concept revisited. In ACM Transactions on Information Systems, volume 20(1), pages 116–131.

Harrington, Brian and Stephen Clark. 2007. ASKNet: Automated semantic knowledge network. In Proceedings of the 22nd National Conference on Artificial Intelligence (AAAI'07), pages 889–894, Vancouver, Canada.

Harrington, Brian and Stephen Clark. 2009. ASKNet: Creating and evaluating large scale integrated semantic networks. International Journal of Semantic Computing, 2(3):343–364.

Hodgson, James. 1991. Information constraints on pre-lexical priming. Language and Cognitive Processes, 6:169–205.

Hughes, Thad and Daniel Ramage. 2007. Lexical semantic relatedness with random graph walks. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 581–589, Prague, Czech Republic.

Jiang, J. J. and D. W. Conrath. 1997. Semantic similarity based on corpus statistics and lexical taxonomy. In International Conference on Research on Computational Linguistics, Taipei, Taiwan, September.

McDonald, Scott and Chris Brew. 2004. A distributional model of semantic context effects in lexical processing. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, pages 17–24, Barcelona, Spain.

Mohammad, Saif and Graeme Hirst. 2006. Distributional measures of concept-distance: A task-oriented evaluation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2006), Sydney, Australia.

Padó, Sebastian and Mirella Lapata. 2007. Dependency-based construction of semantic space models. Computational Linguistics, 33(2):161–199.

Resnik, Philip. 1999. Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language. Journal of Artificial Intelligence Research, 11:95–130.

Rubenstein, H. and J. B. Goodenough. 1965. Contextual correlates of synonymy. Communications of the ACM, 8:627–633.

Steedman, Mark and Julia Hockenmaier. 2007. CCGbank: A corpus of CCG derivations and dependency structures extracted from the Penn Treebank. Computational Linguistics, 33:355–396.

Strube, Michael and Simone Paolo Ponzetto. 2006. WikiRelate! Computing semantic relatedness using Wikipedia. In Proceedings of the 21st National Conference on Artificial Intelligence, pages 1419–1424. AAAI Press.

Turney, Peter D. 2001. Mining the web for synonyms: PMI-IR versus LSA on TOEFL. In Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001).
