
A Generalized Model for Text Retrieval Based on Semantic Relatedness

George Tsatsaronis and Vicky Panagiotopoulou
Department of Informatics, Athens University of Economics and Business, 76 Patision Str., Athens, Greece
[email protected], [email protected]

Abstract

Generalized Vector Space Models (GVSM) extend the standard Vector Space Model (VSM) by embedding additional types of information, besides terms, in the representation of documents. An interesting type of information that can be used in such models is semantic information from word thesauri like WordNet. Previous attempts to construct GVSM reported contradicting results. The most challenging problem is to incorporate the semantic information in a theoretically sound and rigorous manner and to modify the standard interpretation of the VSM. In this paper we present a new GVSM model that exploits WordNet's semantic information. The model is based on a new measure of semantic relatedness between terms. Experimental study conducted in three TREC collections reveals that semantic information can boost text retrieval performance with the use of the proposed GVSM.

1 Introduction

The use of semantic information in text retrieval or text classification has been controversial. For example, in Mavroeidis et al. (2005) it was shown that a GVSM using WordNet (Fellbaum, 1998) senses and their hypernyms improves text classification performance, especially for small training sets. In contrast, Sanderson (1994) reported that even 90% accurate WSD cannot guarantee retrieval improvement, though his experimental methodology was based only on randomly generated pseudowords of varying sizes. Similarly, Voorhees (1993) reported a drop in retrieval performance when the retrieval model was based on WSD information. On the contrary, the construction of a sense-based retrieval model by Stokoe et al. (2003) improved performance, while several years before, Krovetz and Croft (1992) had already pointed out that resolving word senses can improve searches requiring high levels of recall.

In this work, we argue that the incorporation of semantic information into a GVSM retrieval model can improve performance by considering the semantic relatedness between the query and document terms. The proposed model extends the traditional VSM with term-to-term relatedness measured with the use of WordNet. The success of the method lies in three important factors, which also constitute the points of our contribution: 1) a new measure for computing semantic relatedness between terms which takes into account relation weights and senses' depth; 2) a new GVSM retrieval model, which incorporates the aforementioned semantic relatedness measure; 3) exploitation of all the semantic information a thesaurus can offer, including semantic relations crossing parts of speech (POS). Experimental evaluation in three TREC collections shows that the proposed model can improve in certain cases the performance of the standard TF-IDF VSM. The rest of the paper is organized as follows: Section 2 presents preliminary concepts regarding the VSM and GVSM. Section 3 presents the term semantic relatedness measure and the proposed GVSM. Section 4 analyzes the experimental results, and Section 5 concludes and gives pointers to future work.


2 Background

2.1 Vector Space Model

The VSM has been a standard model for representing documents in information retrieval for almost three decades (Salton and McGill, 1983; Baeza-Yates and Ribeiro-Neto, 1999). Let D be a document collection and Q the set of queries representing users' information needs. Let also t_i symbolize term i used to index the documents in the collection, with i = 1,..,n. The VSM assumes that for each term t_i there exists a vector \vec{t_i} in the vector space that represents it. It then considers the set of all term vectors {\vec{t_i}} to be the generating set of the vector space, thus the space basis. If each d_k (for k = 1,..,p) denotes a document of the collection, then there exists a linear combination of the term vectors {\vec{t_i}} which represents each d_k in the vector space. Similarly, any query q can be modelled as a vector \vec{q} that is a linear combination of the term vectors.

In the standard VSM, the term vectors are considered pairwise orthogonal, meaning that they are linearly independent. But this assumption is unrealistic, since it enforces lack of relatedness between any pair of terms, whereas the terms in a language often relate to each other. Provided that the orthogonality assumption holds, the similarity between a document vector \vec{d_k} and a query vector \vec{q} in the VSM can be expressed by the cosine measure given in equation 1:

\cos(\vec{d_k}, \vec{q}) = \frac{\sum_{j=1}^{n} a_{kj} q_j}{\sqrt{\sum_{i=1}^{n} a_{ki}^2} \sqrt{\sum_{j=1}^{n} q_j^2}}    (1)

where a_{kj}, q_j are real numbers standing for the weights of term j in the document d_k and the query q, respectively. A standard baseline retrieval strategy is to rank the documents according to their cosine similarity to the query.

2.2 Generalized Vector Space Model

Wong et al. (1987) presented an analysis of the problems that the pairwise orthogonality assumption of the VSM creates. They were the first to address these problems by expanding the VSM. They introduced term-to-term correlations, which deprecated the pairwise orthogonality assumption, but they kept the assumption that the term vectors are linearly independent¹, creating the first GVSM model. More specifically, they considered a new space, where each term vector \vec{t_i} was expressed as a linear combination of 2^n vectors \vec{m_r}, r = 1..2^n. The similarity measure between a document and a query then became as shown in equation 2, where \vec{t_i} and \vec{t_j} are now term vectors in a 2^n-dimensional vector space, \vec{d_k}, \vec{q} are the document and the query vectors, respectively, as before, a'_{ki}, q'_j are the new weights, and n' the new space dimensions:

\cos(\vec{d_k}, \vec{q}) = \frac{\sum_{j=1}^{n'} \sum_{i=1}^{n'} a'_{ki} q'_j \, \vec{t_i} \cdot \vec{t_j}}{\sqrt{\sum_{i=1}^{n'} a'^{2}_{ki}} \sqrt{\sum_{j=1}^{n'} q'^{2}_{j}}}    (2)

From equation 2 it follows that the term vectors \vec{t_i} and \vec{t_j} need not be known, as long as the correlations between terms t_i and t_j are known. If one assumes pairwise orthogonality, the similarity measure is reduced to that of equation 1.

¹It is known from Linear Algebra that if every pair of vectors in a set of vectors is orthogonal, then this set of vectors is linearly independent, but not the inverse.
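To make the role of the term correlations concrete, here is a minimal sketch of the similarity of equation 2, assuming the inner products of the term vectors have already been collected in a matrix R; the matrix, the toy weights, and all names are assumptions of this example, not part of the original model:

```python
import numpy as np

def gvsm_cosine(d, q, R):
    """Equation 2: d and q hold the term weights, and R[i, j] holds the
    correlation (inner product) of the term vectors t_i and t_j. With R
    equal to the identity matrix (pairwise orthogonality), this reduces
    to the plain cosine of equation 1."""
    num = d @ R @ q                        # sum_i sum_j a_ki * q_j * (t_i . t_j)
    den = np.sqrt(d @ d) * np.sqrt(q @ q)  # normalization of equation 2
    return float(num / den) if den > 0 else 0.0

# Toy example with three index terms; terms 0 and 1 are highly correlated.
R = np.array([[1.0, 0.8, 0.0],
              [0.8, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
d = np.array([0.0, 0.5, 0.2])  # document term weights (e.g., TF-IDF)
q = np.array([0.7, 0.0, 0.0])  # query shares no exact term with d
print(gvsm_cosine(d, q, np.eye(3)))  # 0.0 under the standard VSM
print(gvsm_cosine(d, q, R))          # > 0 once term correlations are used
```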

2.3 Semantic Information and GVSM

Since the introduction of the first GVSM model, there have been at least two basic directions for embedding term-to-term relatedness, other than exact keyword matching, into a retrieval model: (a) compute semantic correlations between terms, or (b) compute frequency co-occurrence statistics from large corpora. In this paper we focus on the first direction. In the past, the effect of WSD information on text retrieval was studied (Krovetz and Croft, 1992; Sanderson, 1994), with the results revealing that, under circumstances, sense information may improve IR. More specifically, Krovetz and Croft (1992) performed a series of three experiments in two document collections, CACM and TIMES. The results of their experiments showed that word senses provide a clear distinction between relevant and nonrelevant documents, rejecting the null hypothesis that the meaning of a word is not related to judgments of relevance. Also, they reached the conclusion that the words worth disambiguating are either the words with a uniform distribution of senses, or the words that in the query have a different sense from the most popular one. Sanderson (1994) studied the influence of disambiguation in IR with the use of pseudowords and concluded that sense ambiguity is problematic for IR only in the cases of retrieving from short queries. Furthermore, his findings regarding the WSD used were that such a WSD system would help IR if it could perform with very high accuracy, although his experiments were conducted in the Reuters collection, where standard queries with corresponding relevant documents (qrels) are not provided.

Since then, several recent approaches have incorporated semantic information in the VSM. Mavroeidis et al. (2005) created a GVSM kernel based on the use of noun senses and their hypernyms from WordNet. They experimentally showed that this can improve text categorization. Stokoe et al. (2003) reported an improvement in retrieval performance using a fully sense-based system. Our approach differs from the aforementioned ones in that it expands the VSM model using the semantic information of a word thesaurus to interpret the orthogonality of terms and to measure semantic relatedness, instead of directly replacing terms with senses, or adding senses to the model.

3 A GVSM Model Based on Semantic Relatedness of Terms

Synonymy (many words per sense) and polysemy (many senses per word) are two fundamental problems in text retrieval. Synonymy is related to recall, while polysemy to precision. One standard method to tackle synonymy is the expansion of the query terms with their synonyms. This increases recall, but it can reduce precision dramatically. Both polysemy and synonymy can be captured in the GVSM model in the computation of the inner product between \vec{t_i} and \vec{t_j} in equation 2, as will be explained below.

3.1 Semantic Relatedness

In our model, we measure semantic relatedness using WordNet. The measure considers the path length, captured by semantic compactness (SCM), and the path depth, captured by semantic path elaboration (SPE), which are defined in the following. The two measures are combined to form the semantic relatedness (SR) between two terms. SR, presented in definition 3, is the basic module of the proposed GVSM model. The adopted method of building semantic networks and measuring semantic relatedness from a word thesaurus is explained in the next subsection.

Definition 1 Given a word thesaurus O, a weighting scheme for the edges that assigns a weight e ∈ (0, 1) to each edge, a pair of senses S = (s_1, s_2), and a path of length l connecting the two senses, the semantic compactness of S (SCM(S,O)) is defined as \prod_{i=1}^{l} e_i, where e_1, e_2, ..., e_l are the path's edges. If s_1 = s_2, SCM(S,O) = 1. If there is no path between s_1 and s_2, SCM(S,O) = 0.

Note that compactness considers the path length and takes values in [0, 1]. Higher compactness between senses declares higher semantic relatedness, and larger weights are assigned to stronger edge types. The intuition behind the weighting of the edges is the fact that some edges provide stronger semantic connections than others. In the next subsection we propose a candidate method of computing the weights. The compactness of two senses s_1 and s_2 can take different values for all the different paths that connect the two senses. All these paths are examined, as explained later, and the path with the maximum weight is eventually selected (definition 3). Another parameter that affects term relatedness is the depth of the sense nodes comprising the path. A standard means of measuring depth in a word thesaurus is the hypernym/hyponym hierarchical relation for the noun and adjective POS and hypernym/troponym for the verb POS. A path with shallow sense nodes is more general compared to a path with deep nodes. This parameter of semantic relatedness between terms is captured by the measure of semantic path elaboration, introduced in the following definition.

Definition 2 Given a word thesaurus O and a pair of senses S = (s_1, s_2), where s_1, s_2 ∈ O and s_1 ≠ s_2, and a path between the two senses of length l, the semantic path elaboration of the path (SPE(S,O)) is defined as \prod_{i=1}^{l} \frac{2 d_i d_{i+1}}{d_i + d_{i+1}} \cdot \frac{1}{d_{max}}, where d_i is the depth of sense s_i according to O, and d_{max} the maximum depth of O. If s_1 = s_2 and d = d_1 = d_2, then SPE(S,O) = \frac{d}{d_{max}}. If there is no path from s_1 to s_2, SPE(S,O) = 0.

Essentially, SPE is the harmonic mean of the two depths normalized to the maximum thesaurus depth. The harmonic mean offers a lower upper bound than the average of depths, and we think it is a more realistic estimation of the path's depth. SCM and SPE capture the two most important parameters of measuring semantic relatedness between terms (Budanitsky and Hirst, 2006), namely path length and sense depth in the used thesaurus. We combine these two measures naturally towards defining the semantic relatedness between two terms.

Definition 3 Given a word thesaurus O, a pair of terms T = (t_1, t_2), and all pairs of senses S = (s_{1i}, s_{2j}), where s_{1i}, s_{2j} are senses of t_1, t_2 respectively, the semantic relatedness of T (SR(T,S,O)) is defined as max{SCM(S,O) · SPE(S,O)}. SR between two terms t_i, t_j, where t_i ≡ t_j ≡ t and t ∉ O, is defined as 1. If t_i ∈ O but t_j ∉ O, or t_i ∉ O but t_j ∈ O, SR is defined as 0.
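As an illustration of definitions 1-3, the following sketch computes the contribution of a single path under a literal reading of the formulas; the toy weights, depths, and every name in the code are assumptions of this example, not values taken from WordNet:

```python
from math import prod

D_MAX = 14  # maximum depth of WordNet 2.0, as used in the paper's example

def scm(edge_weights):
    """Definition 1: the product of the edge weights along the path."""
    return prod(edge_weights)

def spe(depths, d_max=D_MAX):
    """Definition 2: product, over consecutive senses on the path, of the
    harmonic mean of their depths, each normalized by the maximum depth."""
    return prod(2 * a * b / ((a + b) * d_max)
                for a, b in zip(depths, depths[1:]))

def path_relatedness(edge_weights, depths):
    """SCM * SPE for one path; definition 3 takes the maximum of this
    quantity over all paths and over all sense pairs of the two terms."""
    return scm(edge_weights) * spe(depths)

# A toy path of length 3 (four senses): edge weights e1..e3, depths d1..d4.
print(path_relatedness([0.5, 0.5, 0.5], [3, 4, 4, 4]))
```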

[Figure 1: Computation of semantic relatedness. Four panels show word nodes t_i, t_j, their sense nodes (S.i.1, S.i.2, ..., S.j.1, ...) and semantic links (synonym, hypernym, hyponym, meronym, holonym, antonym, domain): the initial phase and network expansion examples 1, 2 and 3.]

3.2 Semantic Networks from Word Thesauri

In order to construct a semantic network for a pair of terms t_1 and t_2 and a combination of their respective senses, i.e., s_1 and s_2, we adopted the network construction method that we introduced in (Tsatsaronis et al., 2007). This method was preferred over other related methods, like the one introduced in (Mihalcea et al., 2004), since it embeds all the available semantic information existing in WordNet, even edges that cross POS, thus offering a richer semantic representation. According to the adopted semantic network construction model, each semantic edge type is given a different weight. The intuition behind the weighting of edge types is that certain types provide stronger semantic connections than others. The frequency of occurrence of the different edge types in WordNet 2.0 is used to define the edge types' weights (e.g., 0.57 for hypernym/hyponym edges, 0.14 for nominalization edges, etc.).

Figure 1 shows the construction of a semantic network for two terms t_i and t_j. Let the highlighted senses S.i.2 and S.j.1 be a pair of senses of t_i and t_j respectively. All the semantic links of the highlighted senses, as found in WordNet, are added, as shown in example 1 of figure 1. The process is repeated recursively until at least one path between S.i.2 and S.j.1 is found. It might be the case that there is no path from S.i.2 to S.j.1. In that case, SR((t_i,t_j), (S.i.2,S.j.1), O) = 0. Suppose that a path is that of example 2, where e_1, e_2, e_3 are the respective edge weights, d_1 is the depth of S.i.2, d_2 the depth of S.i.2.1, d_3 the depth of S.i.2.2, d_4 the depth of S.j.1, and d_max the maximum thesaurus depth. For reasons of simplicity, let e_1 = e_2 = e_3 = 0.5, and d_1 = 3. Naturally, d_2 = 4, and let d_3 = d_4 = d_2 = 4. Finally, let d_max = 14, which is the case for WordNet 2.0. Then, SR((t_i,t_j), (S.i.2,S.j.1), O) = 0.5^3 · 0.4615 · 0.5^2 = 0.01442. Example 3 of figure 1 illustrates another possibility, where S.i.7 and S.j.5 is another examined pair of senses for t_i and t_j respectively. In this case, the two senses coincide, and SR((t_i,t_j), (S.i.7,S.j.5), O) = 1 · \frac{d}{14}, where d is the depth of the sense. When two senses coincide, SCM = 1, as mentioned in definition 1, so a secondary criterion must be levied to distinguish the relatedness of senses that match. This criterion in SR is SPE, which assumes that a sense is more specific as we traverse the WordNet graph downwards. In the specified example, SCM = 1, but SPE = \frac{d}{14}. This will give a final value to SR that is less than 1. This constitutes an intrinsic property of SR, which is expressed by SPE. The rationale behind the computation of SPE stems from the fact that word senses in WordNet are organized into synonym sets, named synsets. Moreover, synsets belong to hierarchies (i.e., noun hierarchies developed by the hypernym/hyponym relations). Thus, in case two words map into the same synset (i.e., their senses belong to the same synset), the computation of their semantic relatedness must additionally take into account the depth of that synset in WordNet.

3.3 Computing Maximum Semantic Relatedness

In the expansion of the VSM model we need to weigh the inner product between any two term vectors with their semantic relatedness. It is obvious that, given a word thesaurus, there can be more than one semantic path linking two senses. In these cases, we decide to use the path that maximizes the semantic relatedness (the product of SCM and SPE). This computation can be done according to the following algorithm, which is a modification of Dijkstra's algorithm for finding the shortest path between two nodes in a weighted directed graph. The proof of the algorithm's correctness follows with theorem 1.

Theorem 1 Given a word thesaurus O, a weighting function w : E → (0, 1), where a higher value declares a stronger edge, and a pair of senses S(s_s, s_f) declaring source (s_s) and destination (s_f) vertices, SCM(S,O) · SPE(S,O) is maximized for the path returned by Algorithm 1, by using the weighting scheme e_{ij} = w_{ij} · \frac{2 d_i d_j}{d_{max} (d_i + d_j)}, where e_{ij} is the new weight of the edge connecting senses s_i and s_j, and w_{ij} its initial weight.

Algorithm 1 MaxSR(G,u,v,w)
Require: A directed weighted graph G, two nodes u, v, and a weighting scheme w : E → (0..1).
Ensure: The path from u to v with the maximum product of the edges' weights.

Initialize-Single-Source(G,u)
1: for all vertices v ∈ V[G] do
2:   d[v] = −∞
3:   π[v] = NULL
4: end for
5: d[u] = 1

Relax(u,v,w)
6: if d[v] < d[u] · w(u, v) then
7:   d[v] = d[u] · w(u, v)
8:   π[v] = u
9: end if

The main body of MaxSR is that of Dijkstra's algorithm: starting with an empty set S of visited vertices, it repeatedly extracts the vertex of V − S with the maximum d value, inserts it into S, and relaxes all of its outgoing edges, until the destination is reached.

Proof The proof follows the correctness proof of Dijkstra's algorithm (Cormen et al., 1990). Let δ(s_s, s_v) denote the maximum product of edge weights over all paths from s_s to a vertex s_v. For the purpose of contradiction, let s_f be the first vertex inserted into S for which d[s_f] ≠ δ(s_s, s_f), and let s_y be the first vertex in V − S along a maximum-product path from s_s to s_f, so that d[s_y] = δ(s_s, s_y). Because s_y occurs before s_f on the path from s_s to s_f, and all edge weights are in (0, 1), we have δ(s_s, s_y) ≥ δ(s_s, s_f), and thus d[s_y] = δ(s_s, s_y) ≥ δ(s_s, s_f) ≥ d[s_f]. But both s_y and s_f were in V − S when s_f was chosen, so we have d[s_f] ≥ d[s_y]. Thus, d[s_y] = δ(s_s, s_y) = δ(s_s, s_f) = d[s_f]. Consequently, d[s_f] = δ(s_s, s_f), which contradicts our choice of s_f. We conclude that at the time each vertex s_f is inserted into S, d[s_f] = δ(s_s, s_f). Finally, since every edge carries the weight e_{ij} = w_{ij} · \frac{2 d_i d_j}{d_{max}(d_i + d_j)}, the maximum product returned for the destination equals \prod w_{ij} · \prod \frac{2 d_i d_j}{d_{max}(d_i + d_j)} = SCM(S,O) · SPE(S,O) over the maximizing path.
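For illustration, a compact executable version of this maximum-product search follows; the graph encoding, the lazy heap-based formulation, and all names are choices of this sketch, not of the paper, and it assumes the combined edge weights e_ij of theorem 1 are already computed:

```python
import heapq

def max_sr_path(graph, source, target):
    """Max-product variant of Dijkstra. graph[u] is a list of (v, e_uv)
    pairs with combined weights e_uv in (0, 1), as in theorem 1. Returns
    the maximum product of edge weights over all source-target paths,
    or 0.0 if the target is unreachable."""
    best = {source: 1.0}
    # Max-heap via negated products; since every weight is below 1, a
    # vertex's product is optimal the first time it is popped.
    heap = [(-1.0, source)]
    visited = set()
    while heap:
        neg_p, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == target:
            return -neg_p
        for v, e_uv in graph.get(u, []):
            p = -neg_p * e_uv  # relax step: is d[u] * w(u, v) > d[v]?
            if p > best.get(v, 0.0):
                best[v] = p
                heapq.heappush(heap, (-p, v))
    return 0.0

# Toy sense graph with precomputed combined weights e_ij:
g = {"s1": [("a", 0.12), ("b", 0.14)], "a": [("s2", 0.14)], "b": [("s2", 0.05)]}
print(max_sr_path(g, "s1", "s2"))  # 0.0168, via the path s1 -> a -> s2
```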

Since SR((t_i,t_j),O) = SR((t_j,t_i),O), the new space is of \frac{n(n-1)}{2} dimensions. In this space, each dimension stands for a distinct pair of terms. Given a document vector \vec{d_k} in the VSM TF-IDF space, we define the value in the (i, j) dimension of the new document vector space as d_k(t_i,t_j) = (TF-IDF(t_i,d_k) + TF-IDF(t_j,d_k)) · \vec{t_i} \cdot \vec{t_j}. We add the TF-IDF values because any product-based value results in zero unless both terms are present in the document. The dimensions q(t_i,t_j) of the query are computed similarly. A GVSM model aims at being able to retrieve documents that do not necessarily contain exact matches of the query terms, and this is its great advantage. This new space leads to a new GVSM model, which is a natural extension of the standard VSM. The cosine similarity between a document d_k and a query q now becomes:

\cos(\vec{d_k}, \vec{q}) = \frac{\sum_{i=1}^{n} \sum_{j=i}^{n} d_k(t_i,t_j) \cdot q(t_i,t_j)}{\sqrt{\sum_{i=1}^{n} \sum_{j=i}^{n} d_k(t_i,t_j)^2} \cdot \sqrt{\sum_{i=1}^{n} \sum_{j=i}^{n} q(t_i,t_j)^2}}    (4)

where n is the dimension of the VSM TF-IDF space.
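To make the new representation concrete, here is a self-contained sketch of equation 4; the sr matrix stands in for the precomputed SR values and, like every name here, is an assumption of this example:

```python
import numpy as np

def pair_vector(tfidf, sr):
    """Map an n-dimensional TF-IDF vector to the pairwise dimensions
    (i <= j, as in the sums of equation 4):
    value(i, j) = (tfidf[i] + tfidf[j]) * SR(t_i, t_j)."""
    n = len(tfidf)
    return np.array([(tfidf[i] + tfidf[j]) * sr[i][j]
                     for i in range(n) for j in range(i, n)])

def gvsm_similarity(doc_tfidf, query_tfidf, sr):
    """Cosine of equation 4 between the pairwise representations."""
    d, q = pair_vector(doc_tfidf, sr), pair_vector(query_tfidf, sr)
    den = np.linalg.norm(d) * np.linalg.norm(q)
    return float(d @ q / den) if den > 0 else 0.0

# Toy case: the query uses term 0; the document contains only the
# semantically related term 1, so the plain VSM cosine would be zero.
sr = np.array([[1.0, 0.6, 0.0],
               [0.6, 1.0, 0.0],
               [0.0, 0.0, 1.0]])  # stand-in for precomputed SR values
doc = np.array([0.0, 0.8, 0.3])
query = np.array([0.9, 0.0, 0.0])
print(gvsm_similarity(doc, query, sr))  # > 0 despite no exact term overlap
```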
4 Experimental Evaluation

The experimental evaluation in this work is twofold. First, we test the performance of the semantic relatedness measure (SR) for a pair of words on three benchmark data sets, namely the Rubenstein and Goodenough 65 word pairs (Rubenstein and Goodenough, 1965) (R&G), the Miller and Charles 30 word pairs (Miller and Charles, 1991) (M&C), and the 353 similarity data set (Finkelstein et al., 2002). Second, we evaluate the performance of the proposed GVSM in three TREC collections (TREC 1, 4 and 6).

4.1 Evaluation of the Semantic Relatedness Measure

For the evaluation of the proposed semantic relatedness measure between two terms, we experimented on three widely used data sets in which human subjects have provided scores of relatedness for each pair. A kind of "gold standard" ranking of related word pairs (i.e., from the most related words to the most irrelevant) has thus been created, against which computer programs can test their ability at measuring semantic relatedness between words. We compared our measure against ten known measures of semantic relatedness: (HS) Hirst and St-Onge (1998), (JC) Jiang and Conrath (1997), (LC) Leacock et al. (1998), (L) Lin (1998), (R) Resnik (1995), (JS) Jarmasz and Szpakowicz (2003), (GM) Gabrilovich and Markovitch (2007), (F) Finkelstein et al. (2002), (HR), and (SP) Strube and Ponzetto (2006). In Table 1 the results of SR and the ten compared measures are shown. The reported numbers are the Spearman correlation of the measures' rankings with the gold standard (human judgements).

The correlations for the three data sets show that SR performs better than any other measure of semantic relatedness, besides the case of (HR) in the M&C data set. It surpasses HR, though, in the R&G and the 353-C data sets. The latter contains the word pairs of the M&C data set. To visualize the performance of our measure in a more comprehensible manner, Figure 2 presents, for all pairs in the R&G data set, in increasing order of relatedness values based on human judgements, the respective values that SR produces for these pairs. A closer look at Figure 2 reveals that the values produced by SR (right chart) follow a pattern similar to that of the human ratings (left chart). Note that the x-axis in both charts begins from the least related pair of terms, according to humans, and goes up to the most related pair of terms. The y-axis in the left chart is the respective humans' rating for each pair of terms; the right chart shows SR for each pair. The reader can consult Budanitsky and Hirst (2006) to confirm that none of the other measures of semantic relatedness we compare to follows the pattern of the human ratings as closely as our measure does (low y values for small x values and high y values for high x). The same pattern applies in the M&C and 353-C data sets.

4.2 Evaluation of the GVSM

For the evaluation of the proposed GVSM model, we have experimented with three TREC collections³, namely TREC 1 (TIPSTER disks 1 and 2), TREC 4 (TIPSTER disks 2 and 3) and TREC 6 (TIPSTER disks 4 and 5). We selected those TREC collections in order to cover as many different thematic subjects as possible. For example, TREC 1 contains documents from the Wall Street Journal, Associated Press, the Federal Register, and abstracts of the U.S. Department of Energy. TREC 6 differs from TREC 1, since it has documents from the Financial Times, the Los Angeles Times and the Foreign Broadcast Information Service.

³http://trec.nist.gov/

         HS     JC     LC     L      R      JS     GM     F      HR     SP     SR
R&G      0.745  0.709  0.785  0.77   0.748  0.842  0.816  N/A    0.817  0.56   0.861
M&C      0.653  0.805  0.748  0.767  0.737  0.832  0.723  N/A    0.904  0.49   0.855
353-C    N/A    N/A    0.34   N/A    0.35   0.55   0.75   0.56   0.552  0.48   0.61

Table 1: Correlations of semantic relatedness measures with human judgements.

[Figure 2, two panels over the R&G pairs in increasing order of human ranking: "Human ratings against human rankings" (y: human rating, x: pair number) and "Semantic relatedness against human rankings" (y: semantic relatedness, x: pair number).]

Figure 2: Correlation between human ratings and SR in the R&G data set.

For each TREC, we executed the standard baseline TF-IDF VSM model for the first 20 topics of each collection. Limited resources prohibited us from executing experiments on the top 1000 documents. To minimize the execution time, we have indexed all the pairwise semantic relatedness values according to the SR measure in a database, whose size reached 300GB. Thus, the execution of SR itself is really fast, as all pairwise SR values between WordNet synsets are indexed. For TREC 1 we used topics 51−70, for TREC 4 topics 201−220, and for TREC 6 topics 301−320. From the results of the VSM model, we kept the top-50 retrieved documents. In order to evaluate whether the proposed GVSM can aid the VSM performance, we executed the GVSM on the same retrieved documents. The interpolated precision-recall values at the 11 standard recall points for these executions are shown in figure 3 (left graphs), for both VSM and GVSM. In the right graphs of figure 3, the differences in interpolated precision for the same recall levels are depicted. For reasons of simplicity, we have excluded from the right graphs the recall values above which both systems had zero precision. Thus, for TREC 1 the y-axis depicts the difference in the interpolated precision values (%) of the GVSM from the VSM for the first 4 recall points. For TRECs 4 and 6 we have done the same for the first 9 and 8 recall points respectively.

As shown in figure 3, the proposed GVSM may improve the performance of the TF-IDF VSM by up to 1.93% in TREC 4, 0.99% in TREC 6 and 0.42% in TREC 1. This small boost in performance shows that the proposed GVSM model is promising. There are many aspects of the GVSM, though, that we think require further investigation, like the fact that we have not conducted WSD so as to map each document and query term occurrence to its correct sense, or the fact that the weighting scheme of the edges used in SR derives from the distribution of each edge type in WordNet, while there might be other, more sophisticated ways to compute edge weights. We believe that if these, but also more aspects discussed in the next section, are tackled, the proposed GVSM may improve retrieval performance further.
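For reference, interpolated precision at the 11 standard recall points is the maximum precision observed at any recall level at or above each point; a small sketch of this standard computation (the function name and the toy run are illustrative, not from the paper):

```python
def interpolated_precision(points, recall_levels=None):
    """points: list of (recall, precision) pairs from a ranked run.
    Returns interpolated precision at the 11 standard recall points:
    P_interp(r) = max{ precision at recall >= r }."""
    if recall_levels is None:
        recall_levels = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
    return [max((p for r, p in points if r >= level), default=0.0)
            for level in recall_levels]

# Toy ranked-run measurements (recall, precision):
run = [(0.1, 0.75), (0.2, 0.60), (0.4, 0.45), (0.6, 0.30), (0.8, 0.15)]
print(interpolated_precision(run))
```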

[Figure 3, six panels: for each of TREC 1, TREC 4 and TREC 6, precision-recall curves of the TF-IDF VSM and the GVSM (y: precision values (%), x: recall values (%)) on the left, and the GVSM precision difference (%) from the VSM at the same recall levels on the right.]

Figure 3: Differences (%) from the baseline in interpolated precision.

5 Future Work

From the experimental evaluation we infer that SR performs very well, and in fact better than all the tested related measures. With regard to the GVSM model, the experimental evaluation in three TREC collections has shown that the model is promising and may boost retrieval performance further if several details are investigated and enhancements are made. Primarily, the computation of the maximum semantic relatedness between two terms includes the selection of the semantic path between two senses that maximizes SR. This can be partially unrealistic, since we are not sure whether these senses are the correct senses of the terms. To tackle this issue, WSD techniques may be used. In addition, the role of phrase detection is yet to be explored and added into the model. Since we are using a large knowledge base (WordNet), we can add a simple method to look up term occurrences in a specified window and check whether they form a phrase. This would also decrease the ambiguity of the respective text fragment, since in WordNet a phrase is usually monosemous.

Moreover, there are additional aspects that deserve further research. In previously proposed GVSM, like the ones proposed by Voorhees (1993) or by Mavroeidis et al. (2005), it is suggested that semantic information can create an individual space, leading to a dual representation of each document: a vector with the document's terms and another with semantic information. Rationally, the proposed GVSM could act complementarily to the standard VSM representation. Thus, the similarity between a query and a document may be computed by weighting the similarity in the term space and in the sense space. Finally, we should also examine the perspective of applying the proposed measure of semantic relatedness in a query expansion technique, similarly to the work of Fang (2008).

6 Conclusions

In this paper we presented a new measure of semantic relatedness and expanded the standard VSM to embed the semantic relatedness between pairs of terms into a new GVSM model. The semantic relatedness measure takes into account all of the semantic links offered by WordNet. It considers WordNet as a graph, weighs edges depending on their type and depth, and computes the maximum relatedness between any two nodes connected via one or more paths. The comparison to well-known measures gives promising results. The application of our measure in the suggested GVSM demonstrates slightly improved performance in information retrieval tasks. In our next plans, we will study the influence of WSD performance on the proposed model. Furthermore, a comparative analysis between the proposed GVSM and other semantic-network-based models will also shed light on the conditions under which embedding semantic information improves text retrieval.

References

R. Baeza-Yates and B. Ribeiro-Neto. 1999. Modern Information Retrieval. Addison Wesley.

S. Brody, R. Navigli, and M. Lapata. 2006. Ensemble methods for unsupervised WSD. In Proc. of COLING/ACL 2006, pages 97–104.

A. Budanitsky and G. Hirst. 2006. Evaluating WordNet-based measures of lexical semantic relatedness. Computational Linguistics, 32(1):13–47.

T.H. Cormen, C.E. Leiserson, and R.L. Rivest. 1990. Introduction to Algorithms. The MIT Press.

H. Fang. 2008. A re-examination of query expansion using lexical resources. In Proc. of ACL 2008, pages 139–147.

C. Fellbaum. 1998. WordNet – An Electronic Lexical Database. MIT Press.

L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan, G. Wolfman, and E. Ruppin. 2002. Placing search in context: The concept revisited. ACM TOIS, 20(1):116–131.

E. Gabrilovich and S. Markovitch. 2007. Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In Proc. of the 20th IJCAI, pages 1606–1611, Hyderabad, India.

G. Hirst and D. St-Onge. 1998. Lexical chains as representations of context for the detection and correction of malapropisms. In WordNet: An Electronic Lexical Database, chapter 13, pages 305–332, Cambridge. The MIT Press.

M. Jarmasz and S. Szpakowicz. 2003. Roget's thesaurus and semantic similarity. In Proc. of the Conference on Recent Advances in Natural Language Processing, pages 212–219.

J.J. Jiang and D.W. Conrath. 1997. Semantic similarity based on corpus statistics and lexical taxonomy. In Proc. of ROCLING X, pages 19–33.

R. Krovetz and W.B. Croft. 1992. Lexical ambiguity and information retrieval. ACM Transactions on Information Systems, 10(2):115–141.

C. Leacock, G. Miller, and M. Chodorow. 1998. Using corpus statistics and WordNet relations for sense identification. Computational Linguistics, 24(1):147–165, March.

D. Lin. 1998. An information-theoretic definition of similarity. In Proc. of the 15th International Conference on Machine Learning, pages 296–304.

D. Mavroeidis, G. Tsatsaronis, M. Vazirgiannis, M. Theobald, and G. Weikum. 2005. Word sense disambiguation for exploiting hierarchical thesauri in text classification. In Proc. of the 9th PKDD, pages 181–192.

D. McCarthy, R. Koeling, J. Weeds, and J. Carroll. 2004. Finding predominant word senses in untagged text. In Proc. of the 42nd ACL, pages 280–287, Spain.

R. Mihalcea and A. Csomai. 2005. SenseLearner: Word sense disambiguation for all words in unrestricted text. In Proc. of the 43rd ACL, pages 53–56.

R. Mihalcea, P. Tarau, and E. Figa. 2004. PageRank on semantic networks with application to word sense disambiguation. In Proc. of the 20th COLING.

G.A. Miller and W.G. Charles. 1991. Contextual correlates of semantic similarity. Language and Cognitive Processes, 6(1):1–28.

A. Montoyo, A. Suarez, G. Rigau, and M. Palomar. 2005. Combining knowledge- and corpus-based word-sense-disambiguation methods. Journal of Artificial Intelligence Research, 23:299–330, March.

P. Resnik. 1995. Using information content to evaluate semantic similarity. In Proc. of the 14th IJCAI, pages 448–453, Canada.

H. Rubenstein and J.B. Goodenough. 1965. Contextual correlates of synonymy. Communications of the ACM, 8(10):627–633.

G. Salton and M.J. McGill. 1983. Introduction to Modern Information Retrieval. McGraw-Hill.

M. Sanderson. 1994. Word sense disambiguation and information retrieval. In Proc. of the 17th SIGIR, pages 142–151, Ireland. ACM.

R. Sinha and R. Mihalcea. 2007. Unsupervised graph-based word sense disambiguation using measures of word semantic similarity. In Proc. of the IEEE International Conference on Semantic Computing.

C. Stokoe, M.P. Oakes, and J. Tait. 2003. Word sense disambiguation in information retrieval revisited. In Proc. of the 26th SIGIR, pages 159–166.

M. Strube and S.P. Ponzetto. 2006. WikiRelate! Computing semantic relatedness using Wikipedia. In Proc. of the 21st AAAI.

G. Tsatsaronis, M. Vazirgiannis, and I. Androutsopoulos. 2007. Word sense disambiguation with spreading activation networks generated from thesauri. In Proc. of the 20th IJCAI, pages 1725–1730.

E. Voorhees. 1993. Using WordNet to disambiguate word sense for text retrieval. In Proc. of the 16th SIGIR, pages 171–180. ACM.

S.K.M. Wong, W. Ziarko, V.V. Raghavan, and P.C.N. Wong. 1987. On modeling of information retrieval concepts in vector spaces. ACM Transactions on Database Systems, 12(2):299–321.
