Procesamiento del Lenguaje Natural, Revista nº 47, septiembre de 2011, pp. 97-105. Recibido 23-04-2011; aceptado 24-05-2011.

Using Semantic Graphs and Word Sense Disambiguation Techniques to Improve Text Summarization

Uso de Grafos Semánticos y de Técnicas de Desambiguación en la Generación Automática de Resúmenes

Laura Plaza
Universidad Complutense de Madrid
Prof. José García Santesmases, s/n
28040 Madrid
[email protected]

Alberto Díaz
Universidad Complutense de Madrid
Prof. José García Santesmases, s/n
28040 Madrid
[email protected]

Resumen: En este trabajo se presenta un método para la generación automática de resúmenes basado en grafos semánticos. El sistema utiliza conceptos y relaciones de WordNet para construir un grafo que representa el documento, así como un algoritmo de clustering basado en la conectividad para descubrir los distintos temas tratados en él. La selección de oraciones para el resumen se realiza en función de la presencia en las oraciones de los conceptos más representativos del documento. Los experimentos realizados demuestran que el enfoque propuesto obtiene resultados significativamente mejores que otros sistemas evaluados bajo las mismas condiciones experimentales. Asimismo, el sistema puede ser fácilmente adaptado para trabajar con documentos de diferentes dominios, sin más que modificar la base de conocimiento y el método para identificar conceptos en el texto. Finalmente, este trabajo también estudia el efecto de la ambigüedad léxica en la generación de resúmenes.

Palabras clave: Generación automática de resúmenes, grafos semánticos, desambiguación léxica y semántica, agrupamiento de conceptos

Abstract: This paper presents a semantic graph-based method for extractive summarization. The summarizer uses WordNet concepts and relations to produce a semantic graph that represents the document, and a degree-based clustering algorithm is used to discover different themes or topics within the text. The selection of sentences for the summary is based on the presence in them of the most representative concepts for each topic. The method has proven to be an efficient approach to the identification of salient concepts and topics in free text. In a test on the DUC data for single document summarization, our system achieves significantly better results than previous approaches based on terms and mere syntactic information.
Besides, the system can be easily ported to other domains, as it only requires modifying the knowledge base and the method for concept annotation. In addition, we address the problem of word ambiguity in semantic approaches to automatic summarization.

Keywords: Automatic summarization, semantic graphs, word sense disambiguation, concept clustering

1. Introduction

The problem of summarizing textual documents has been extensively studied during the past half century. Common approaches include training different machine learning models; computing some simple heuristic rules (such as sentence position or cue words); or counting the frequency of the words in the document to identify central terms. However, these approaches think of words as independent entities that do not interact with other words in their context (the sentence, or even the whole document), which is not the way a human thinks when writing a summary.

Recently, graph-based methods have attracted the attention of the NLP community. These methods have been applied to a wide range of tasks, such as word sense disambiguation (Agirre and Soroa, 2009) or question answering (Celikyilmaz, Thint, and Huang, 2009). Regarding summarization, graph-based methods have typically

ISSN 1135-5948 © 2011 Sociedad Española Para el Procesamiento del Lenguaje Natural

tried to find salient sentences in the text according to their similarity to other sentences, computing this similarity as the cosine distance between their term vectors (Erkan and Radev, 2004). However, few approaches have dealt with the text at the semantic level, and rarely explore more complex representations based on concepts and semantic relations.

In this paper, we examine the use and strength of concept graphs to identify the central topics covered in a text, as a previous step to rank the sentences for the summary. To this aim, we construct a graph where each sentence is represented by the concepts in WordNet that are found in it, and where the different concepts are interconnected to each other by a number of semantic relations. We identify salient concepts in this graph, based on the detection of hub or core vertices. These concepts constitute the centroids of the clusters that delimitate the different topics in the document. The ranking is based on the presence in the sentences of the most representative concepts for each topic.

Our graph-based method has been evaluated on the Document Understanding Conferences 2002 data1. We show that our method performs significantly better than previously published approaches. This work also deals with the problem of word ambiguity, which inevitably arises when trying to map the text to WordNet concepts, and shows that applying a word sense disambiguation algorithm benefits text summarization.

1 DUC Conferences: http://duc.nist.gov/

2. Related Work

Text summarization is the process of automatically creating a compacted version of a given text. Content reduction can be addressed by selection and/or by generalization of what is important in the source (Sparck-Jones, 1999). This definition suggests that two generic groups of summarization methods exist: those which generate extracts and those which generate abstracts. In this paper, we focus on extractive methods; that is, those which select sentences from the original document to produce the summary.

Traditional summarization systems typically rank the sentences using simple heuristic features such as the sentence position and the presence of certain cue words or terms that are also found in the headings of the document (Edmundson, 1969; Brandow, Mitze, and Rau, 1995). These attributes are usually weighted and combined using a linear function that assesses a single score for each sentence in the document. Most advanced techniques concern the use of graph-based methods to rank textual units for extraction. This work mainly investigates previous work related to these techniques, because the method proposed here clearly falls under this category. Graph-based methods usually represent the documents as graphs, where the nodes correspond to text units (such as words, phrases, sentences or even paragraphs), and the edges represent cohesion relationships between these units, or even similarity measures between them (e.g. the Euclidean distance). Once the graph for the document is created, the salient nodes are located in the graph and used to extract the corresponding units for the summary.

LexRank (Erkan and Radev, 2004) is a well-known example of a centroid-based method for multi-document summarization. It assumes a fully connected and undirected graph, with sentences as nodes and similarity between them as edges. It represents the sentences in each document by their TF-IDF vectors and computes the sentence connectivity using the cosine similarity. A very similar method is proposed by Mihalcea and Tarau (2004) to perform mono-document summarization. As in LexRank, the nodes represent sentences and the edges represent the similarity between them, measured as a function of their content overlap. Most recently, Litvak and Last (2008) proposed an approach that uses a graph-based syntactic representation for keyword extraction, which can be used as a first step in summarization. However, most of these systems ignore the latent semantic associations that exist between the words, both intra- and inter-sentence (e.g. synonymy, hypernymy or co-occurrence relations).

Consider the paragraph shown in Figure 1. Approaches based on term frequencies and mere syntactic representations do not succeed in determining that the terms hurricane and cyclone are synonyms, and that both of them are very close in meaning to the noun phrase tropical storm. They do not detect that Puerto Rico, Virgin Islands and Dominican Republic are hyponyms of the broader concept country, and that wind, rain and high


sea are types of atmospheric conditions usually produced by hurricanes.

Figure 1: A snippet of a news item that illustrates the need to identify semantic relations between terms

This problem can be partially solved by dealing with concepts instead of terms, and semantic relations instead of lexical or syntactical ones. To this end, some recent works have adapted existing methods to deal with concepts. Reeve, Han, and Brooks (2007) adapt the method of lexical chaining to use biomedical concepts. Zhao, Wu, and Huang (2009) use WordNet concepts and synonyms to represent and expand query words in their graph-based summarizer. Lloret et al. (2008) propose a term frequency based approach combined with textual entailment relations between text snippets, while Steinberger et al. (2007) present a term frequency approach fed with anaphoric information.

All these works have demonstrated that even purely lexical approaches can benefit from different sources of semantic information. Nonetheless, semantic approaches have several shortcomings, mainly due to deficiencies in the knowledge database and problems of word ambiguity. By performing word sense disambiguation (WSD), it is expected that the quality of the summaries will improve. However, to the authors' knowledge, no previous study has investigated the influence of word ambiguity in automatic summarization.

3. Summarization Method

The method presented in this paper consists of 4 main steps: (1) concept identification and sentence representation, (2) document representation, (3) concept clustering and sub-theme recognition, and (4) sentence selection. Each step is discussed in detail in the following subsections.

3.1. Concept Identification and Sentence Representation

Before starting with the summarization process, a preliminary step is undertaken in order to prepare the document for the subsequent steps. Irrelevant sections in the document (such as authors, source or publication date) are removed. Generic and high frequency terms are also removed, using a stop list and the inverse document frequency (Sparck-Jones, 1972). The headline/title and body sections in the document are separated. Finally, the text in the body section is split into sentences and the terms are tagged with their part of speech.

Next, each sentence is translated to the appropriate concepts in WordNet, using the WordNet::SenseRelate (WNSR) package2 (Patwardhan, Banerjee, and Pedersen, 2005). WordNet::SenseRelate uses different measures of semantic similarity and relatedness to perform WSD and assigns a sense or meaning (as found in WordNet) to each word in a text. In particular, in this work the Lesk WSD method (Lesk, 1986) is used, which computes the semantic relatedness of word senses using gloss overlaps. Table 1 shows the result of applying WNSR to an example sentence. The term defense clearly illustrates the need for a disambiguation algorithm. The noun defense presents 11 different senses in WordNet and, to be precise, the first sense refers to the role of certain players in some sports, while the ninth sense refers to an organization responsible for protecting a country. It is obvious that, without a WSD algorithm, the wrong sense would be considered.

2 http://www.d.umn.edu/~tpederse/senserelate.html

Term       WN Sense   Term       WN Sense
hurricane  1          populate   2
Gilbert    2          south      1
sweep      1          coast      1
Dom. Rep   1          prepare    4
Sunday     1          high       2
civil      1          wind       1
defense    9          heavy      1
alert      1          rain       1
heavily    2          sea        1

Table 1: WordNet senses found in the sentence Hurricane Gilbert swept toward the Dominican Republic Sunday and the Civil Defense alerted its heavily populated south coast to prepare for high winds, heavy rains and high seas

After that, the WordNet concepts derived from nouns are extended with their hypernyms, and the hierarchies of all the concepts in the same sentence are merged to build a sentence graph. Our experimental results have shown that the use of verbs in this graph decreases the quality of the summaries, while adjectives and adverbs are not included because they do not present the hypernymy relation in WordNet. Finally, the N upper levels of these is-a hierarchies are removed, since they represent concepts with an excessively broad meaning. This N value has been empirically set to 3.
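The gloss-overlap idea behind the Lesk method used in the concept identification step can be roughly sketched as follows. The two-entry sense inventory, its glosses and the stopword list are invented stand-ins for the 11 WordNet senses of defense; the real system relies on WordNet::SenseRelate:

```python
# Simplified Lesk word sense disambiguation: pick the sense whose gloss
# shares the most content words with the sentence. The sense inventory
# below is hypothetical; WordNet glosses are used in the actual system.
STOPWORDS = {"the", "a", "an", "of", "and", "its", "it", "to", "that",
             "who", "from"}

SENSES = {
    "defense.n.01": "the role of players who try to prevent the opposing "
                    "team from scoring",
    "defense.n.09": "a civil organization that protects and alerts the "
                    "population of a country",
}

def content_words(text):
    # Lowercase, tokenize on whitespace and drop stopwords.
    return {w for w in text.lower().split() if w not in STOPWORDS}

def simplified_lesk(sentence, senses):
    ctx = content_words(sentence)
    # Rank senses by the size of the gloss/context overlap.
    return max(senses, key=lambda s: len(ctx & content_words(senses[s])))

sentence = ("Hurricane Gilbert swept toward the Dominican Republic Sunday "
            "and the Civil Defense alerted its heavily populated south coast")
print(simplified_lesk(sentence, SENSES))  # defense.n.09
```

Here the word civil in the context overlaps only with the gloss of the ninth (organization) sense, so the sports sense is discarded, mirroring the defense example in Table 1.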
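The hypernym expansion and trimming just described can be sketched as follows. The is-a chains are hand-written stand-ins for the WordNet hierarchies, and dropping the 3 upper levels follows the N = 3 setting above:

```python
# Sketch of the sentence-graph construction: each noun concept is extended
# with its hypernym chain, the N most generic levels are dropped, and the
# remaining chains are merged into one graph. The chains below are invented;
# the real system reads them from WordNet.
HYPERNYM_CHAINS = {
    # leaf .......................................... root (most generic)
    "hurricane": ["cyclone", "windstorm", "storm", "atmospheric_phenomenon",
                  "physical_phenomenon", "natural_phenomenon", "phenomenon"],
    "rain":      ["precipitation", "weather", "atmospheric_phenomenon",
                  "physical_phenomenon", "natural_phenomenon", "phenomenon"],
}

def sentence_graph(chains, n_upper=3):
    """Merge is-a chains into one edge set, dropping the n_upper top levels."""
    edges = set()
    for leaf, chain in chains.items():
        path = [leaf] + chain
        trimmed = path[:len(path) - n_upper]   # remove overly broad concepts
        edges.update(zip(trimmed, trimmed[1:]))  # consecutive pairs = is-a edges
    return edges

g = sentence_graph(HYPERNYM_CHAINS)
print(sorted(g))
```

Note how the two chains meet at atmospheric_phenomenon, so the per-word hierarchies merge into a single connected sentence graph, while the three most generic levels (physical_phenomenon, natural_phenomenon, phenomenon) never enter the graph.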

3.2. Document Representation

Next, all the sentence graphs are merged into a single document graph that represents the whole document. This graph can be extended with more specific semantic relations in order to obtain a more complete representation of the document. We have conducted several experiments using a semantic similarity relation apart from the is-a relation previously mentioned. To this end, we compute the similarity between every pair of leaf concepts in the graph, using the WordNet Similarity package3 (Banerjee and Pedersen, 2002). This package implements a variety of semantic similarity and relatedness measures based on the information found in WordNet. In particular, we have used the Lesk measure. To expand the document graph with these additional relations, a new edge is added between two leaf nodes if the similarity between the underlying concepts exceeds a similarity threshold.

Finally, each edge is assigned a weight in [0, 1]. This weight is calculated as the ratio between the relative positions in their corresponding hierarchies of the concepts linked by the edge (that is, the more specific the connected concepts are, the more weight is assigned to the edge).

Figure 2 shows an example of an extended document graph for a fictitious document that consists solely of the sentence presented in Table 1. Continuous lines represent is-a relations, while dashed lines represent semantic similarity relations. The edges of a portion of this graph have been labeled with their weights. Concepts ignored as too general are shown in a lighter color.

Figure 2: Example of a simplified document graph

3 http://wn-similarity.sourceforge.net/

3.3. Concept Clustering and Sub-theme Identification

The following step consists in clustering the WordNet concepts in the document graph, using a degree-based clustering algorithm similar to that proposed in (Yoo, Hu, and Song, 2007). The aim is to construct sets of concepts that are closely related in meaning, under the assumption that each set represents a different sub-theme in the document and that the most central concepts in the cluster (the centroids) give the necessary and sufficient information related to its sub-theme.

We hypothesize that the document graph is an instance of a scale-free network (Barabási and Albert, 1999). A scale-free network is a complex network that (among other characteristics) presents a particular type of nodes which are highly connected to other nodes in the network, while the remaining nodes are quite unconnected. These highest degree nodes are often called hubs.

Following (Yoo, Hu, and Song, 2007), we introduce the salience of a vertex v_i as the sum of the weights of the edges connected to v_i (equation 1):

salience(v_i) = Σ_{∀ e_j | ∃ v_k ∧ e_j ∈ connect(v_i, v_k)} weight(e_j)    (1)

The vertices with highest salience are named hub vertices, and they represent the central nodes in the graph. The clustering algorithm starts by sorting the vertices by their salience and selecting the first n vertices in the ranking (that is, the so-called hub vertices). Next, the hub vertices are iteratively grouped, forming hub vertex sets. A hub vertex set (HVS) is a set of vertices strongly connected to one another. These will constitute the centroids of the clusters. To construct these HVSs, the clustering algorithm first searches, iteratively and for each hub vertex, the hub vertex most connected to it, and merges them into a single HVS. In a second stage, the algorithm checks, for every pair of HVSs, whether their internal connectivity is lower than the connectivity between them. If so, both HVSs are merged. This decision is encouraged by the assumption that the clustering should show maximum intra-cluster connectivity but minimum inter-cluster connectivity.

Finally, the remaining vertices (those not included in the HVSs) are assigned to the cluster to which they are more connected, as shown in equation 2. This is again an iterative process that adjusts the HVSs and the vertices assigned to them.

conn(v, HVS_i) = Σ_{∀ e_j | ∃ w ∈ HVS_i ∧ e_j ∈ connect(v, w)} weight(e_j)    (2)

3.4. Sentence Selection

Once the concept clusters have been created, we compute the similarity between all sentences in the document and each of these clusters. The similarity between a sentence graph and a cluster is calculated using a non-democratic vote mechanism, so that each vertex v_k of a sentence S_j gives to each cluster C_i a different number of votes w_{k,j}, depending on whether v_k belongs or not to the HVS of that cluster. The similarity is computed as the sum of the votes given by all vertices in the sentence to each cluster, as expressed in equation 3. Next, each sentence is assigned to the cluster to which this similarity is greater.

semantic_similarity(C_i, S_j) = Σ_{∀ v_k | v_k ∈ S_j} w_{k,j}    (3)

where
w_{k,j} = 0  if v_k ∉ C_i
w_{k,j} = γ  if v_k ∈ HVS(C_i)
w_{k,j} = δ  if v_k ∈ C_i ∧ v_k ∉ HVS(C_i)

Finally, the most significant sentences are selected for the summary, based on the similarity between them and the clusters as defined in equation 3. Three different heuristics for sentence selection have been investigated.

Heuristic 1: Under the hypothesis that the cluster with more concepts represents the main theme in the document, and hence the only one that contributes to the summary, the N sentences with greater similarity to this cluster are selected.

Heuristic 2: All clusters contribute to the summary proportionally to their sizes. Therefore, for each cluster, the top n_i sentences are selected, where n_i is proportional to its size. So, this heuristic will generate summaries covering not only the information related to the main topic, but also other satellite information.

Heuristic 3: Halfway between the two heuristics above, this one computes a single score for each sentence as the sum of its similarity to each cluster adjusted to the cluster sizes (equation 4). Then, the N sentences with higher scores are selected.

sem_sim(S_j) = Σ_{C_i} similarity(C_i, S_j) / |C_i|    (4)

Note that the N value varies with the desired compression rate.

Two additional features, apart from the semantic-graph similarity (Sem_Graphs), have been extracted and tested when computing the score of the sentences: sentence location (Loc) and similarity with the title section (Tit). Despite their simplicity, these features are commonly used in the most recent works on extractive summarization (Bossard, Généreux, and Poibeau, 2008; Bawakid and Oussalah, 2008). The final selection of sentences is based on the weighted sum of these features, as stated in equation 5.

Score(S_j) = λ × Sem_Graphs(S_j) + θ × Loc(S_j) + χ × Tit(S_j)    (5)
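The quantities in equations 1, 3 and 5 can be sketched end-to-end on a toy concept graph. The edge weights, the vote values γ and δ, and the percentage of hubs below are invented for illustration; only the λ, θ, χ setting (0.9, 0.1, 0.0) corresponds to the best configuration reported later in Section 4.3:

```python
# Toy sketch of the scoring pipeline: vertex salience (eq. 1), hub-vertex
# selection, sentence-to-cluster votes (eq. 3) and the feature combination
# (eq. 5). The concept graph and parameter values are hypothetical.
from collections import defaultdict

edges = {  # (concept_a, concept_b): weight in [0, 1]
    ("hurricane", "cyclone"): 0.9,
    ("hurricane", "storm"): 0.7,
    ("storm", "rain"): 0.6,
    ("rain", "sea"): 0.3,
}

def salience(edges):
    # Equation 1: sum of the weights of the edges incident to each vertex.
    s = defaultdict(float)
    for (a, b), w in edges.items():
        s[a] += w
        s[b] += w
    return dict(s)

def hub_vertices(edges, percentage=0.4):
    # The top `percentage` of vertices by salience become hub vertices.
    s = salience(edges)
    n = max(1, int(len(s) * percentage))
    return set(sorted(s, key=s.get, reverse=True)[:n])

GAMMA, DELTA = 1.0, 0.5  # hypothetical votes for HVS / ordinary members

def semantic_similarity(sentence_concepts, cluster, hvs):
    # Equation 3: gamma votes for hub-vertex-set members of the cluster,
    # delta for ordinary cluster members, 0 otherwise.
    return sum(GAMMA if c in hvs else DELTA if c in cluster else 0.0
               for c in sentence_concepts)

def score(sem_graphs, loc, tit, lam=0.9, theta=0.1, chi=0.0):
    # Equation 5: weighted combination of the three sentence features.
    return lam * sem_graphs + theta * loc + chi * tit

hvs = hub_vertices(edges)
cluster = {"hurricane", "cyclone", "storm", "rain", "sea"}
sim = semantic_similarity(["hurricane", "rain", "coast"], cluster, hvs)
print(sorted(hvs), sim, score(sim, loc=1.0, tit=0.0))
```

In this toy run, hurricane and storm accumulate the highest salience and act as the hub vertex set, so a sentence mentioning hurricane earns a full γ vote while rain, an ordinary cluster member, earns only δ.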


4. Evaluation Framework

4.1. Evaluation Metrics: ROUGE

We follow the ROUGE metrics and the guidelines observed in the 2004 and 2005 Document Understanding Conferences (Litkowski, 2004). ROUGE (Lin, 2004) compares a summary generated by an automated system (called peer) with one or more ideal summaries (called models), usually created by humans, and computes a set of different measures to automatically determine the content quality of the summary. In this work, the ROUGE-1, ROUGE-2, ROUGE-L and ROUGE-S4 recall scores are used to evaluate the summarizer. In short, ROUGE-N evaluates n-gram occurrence, where N stands for the length of the n-gram. ROUGE-L computes the union of the longest common subsequences (LCS) between the candidate and the model summary sentences. Finally, ROUGE-S4 evaluates skip bigrams, that is, pairs of words having intervening word gaps no larger than four words.

4.2. Evaluation Collection

We adopt the evaluation corpus of DUC 2002, which is the most recent one for single document summarization. This collection is composed of 567 news articles in English. Each document comes with one or more abstractive model summaries manually created by humans. Model summaries are approximately 100 words long. Since the news items have been selected from different sections of different newspapers, the topics covered in the collection are diverse.

4.3. Algorithm Parametrization

Before the final evaluation, a preliminary experimentation has been performed to determine the best configuration for the summarization algorithm. To this end, we use a set of 10 documents from the DUC corpus. The model summaries for these documents were manually created by selecting the 30 % most salient sentences in them. So, the model summaries for the parametrization are extractive summaries. The parameters to be estimated include:

- The percentage of vertices considered as hub vertices in the clustering method (see Section 3.3).
- The set of semantic relations used to build the graph (see Section 3.2).
- If the semantic similarity relation is finally used, the similarity threshold to be considered (see Section 3.2).
- The combination of summarization features used to select the sentences and their weights (see Section 3.4).

As a result, we get the optimal parametrization for each of the three heuristics for sentence selection implemented in our system, as shown in Table 2 (Plaza, Diaz, and Gervas, 2010).

Parameter               H.1     H.2     H.3
Percentage of hubs      2 %     20 %    5 %
Set of relations        hypernymy + sem. sim.
Similarity threshold    0.01    0.05    0.01
Summarization criteria  Sem_Graphs + Location

Table 2: Summary of the evaluation accomplished to determine the optimal parametrization for the algorithm

It may be seen that the best configuration implies using both relations (is-a and semantic similarity), but the percentage of hub vertices and the similarity threshold depend on the heuristic. Heuristics 1 and 3 prefer a relatively small number of hub vertices (2 % and 5 %, respectively), while heuristic 2 prefers a higher number of hub vertices (20 %). This is due to the nature of the summaries generated by the second heuristic. It is worth remembering that the aim of Heuristic 2 is to generate summaries covering all topics presented in the source document, regardless of their relative relevance within the document. Thus, it is not sufficient to consider only the concepts dealing with the main document topic as hub vertices, but also those dealing with other secondary information. The similarity threshold is also higher for this second heuristic than for the remaining ones. On the other hand, the use of the positional criterion, together with our semantic graph-based approach, improves the results obtained by all heuristics and achieves better ROUGE scores than any other combination of sentence selection criteria. This result was expected, since the information in news items is usually presented according to the inverted pyramid form, so that the most important information is placed first. In particular, the best results are achieved when the parameters λ, θ and χ in equation 5 are set to 0.9, 0.1 and 0.0 respectively.


The choice of the parameters also influences the structural characteristics of the document graph, as well as the result of the clustering algorithm. Table 3 shows how the number and size of the clusters are affected by the percentage of hub vertices and the similarity threshold. It can be observed that raising the number of hub vertices increases the number of clusters, but decreases their average size. On the contrary, increasing the connectivity of the graph (i.e. reducing the similarity threshold) decreases the number of clusters, but its effect on the cluster size is unclear.

Sim. Thres.  Hubs   Clusters (HVS)  Larger cluster  Smaller cluster
0.001        2 %    1,33            254,89          9,94
0.001        10 %   6,56            135,78          7,39
0.001        20 %   11,75           77,56           3,75
0.01         2 %    1,79            288,21          13,32
0.01         10 %   6,58            136,11          8,74
0.01         20 %   13,16           76,37           3,58
0.5          2 %    2,37            191,84          16,68
0.5          10 %   7,63            91,37           5,95
0.5          20 %   14,63           54,52           2,37

Table 3: Average number and size of the clusters built from the document graph, according to the similarity threshold and the percentage of hub vertices used

5. Results and Discussion

Table 4 shows the ROUGE scores for the summaries created by the three versions of our system (H.1, H.2, H.3); the LexRank4 lexical graph-based summarizer (Erkan and Radev, 2004); a lexical summarizer improved with anaphoric information (LeLSA+AR) (Steinberger et al., 2007); a term frequency summarizer improved with textual entailment (TextEnt) (Lloret et al., 2008); and the 5 systems which participated in DUC-2002 and achieved the best results (in terms of the ROUGE metric). In short, system 19 uses topic representation templates to extract salient information; systems 21, 27 and 28 employ machine learning techniques to determine the best set of attributes for extraction (word frequency, sentence position...); and system 29 uses lexical chains. We also list a lead baseline (the first 100 words of a document). All summaries were truncated to 100 words, as traditionally done in DUC. The highest result for each metric is shown in bold.

4 We use the implementation of LexRank as provided in the MEAD summarization platform (http://www.summarization.com/mead/). The parameters are set to their default values.

System     R-1     R-2     R-S4    R-L
H.3        0,4648  0,2196  0,1928  0,4277
H.2        0,4651  0,2193  0,1927  0,4276
H.1        0,4641  0,2191  0,1919  0,4268
LexRank    0,4558  0,2115  0,1846  0,4173
TextEnt    0,4518  0,1942  —       0,4104
LeLSA+AR   0,4228  0,2074  0,1661  0,3928
DUC 28     0,4278  0,2177  0,1732  0,3865
DUC 21     0,4149  0,2104  0,1655  0,3754
Lead       0,4113  0,2108  0,1660  0,3754
DUC 19     0,4082  0,2088  0,1638  0,3735
DUC 27     0,4052  0,2022  0,1600  0,3691
DUC 29     0,3993  0,2006  0,1576  0,3617

Table 4: ROUGE scores for the different versions of our algorithm, and comparison with related work. The systems are sorted by ROUGE-L score in descending order

A Wilcoxon signed ranks test has shown that, at the 95 % confidence level, the performance of our three heuristics is significantly better than that of LexRank, all the DUC systems and both baselines (in at least 2 out of the 4 ROUGE scores). But no significant differences exist between the three heuristics. Regarding the anaphoric and textual entailment approaches, as we only know their average ROUGE scores, we could not apply the test for these systems. However, the three versions of our summarizer outperform the LeLSA+AR system in all ROUGE scores, and the TextEnt system in ROUGE-1, ROUGE-2 and ROUGE-L scores (the ROUGE-S4 score is not available).

A further experiment has been conducted to examine the effect of WSD on the results reported by our method. To this aim, we repeated these experiments without performing WSD, but simply assigning to each word its first sense in WordNet. The results are presented in Table 5, and indicate that the use of word sense disambiguation improves the quality of the automatic summaries. The WSD algorithm identifies the concepts that are being referred to in the documents more accurately, which leads to the creation of a graph that better reflects the content of the document. However, this improvement is less than expected. The reason seems to be that the first WordNet sense criterion is a quite pertinent one, since the senses of the words in WordNet are ranked according to their frequency.
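The unigram matching behind the ROUGE-1 recall score used throughout this evaluation can be sketched as follows; this is a minimal illustration of the metric with clipped counts, not the official ROUGE package:

```python
# Minimal ROUGE-1 recall: the fraction of reference (model) unigrams that
# also appear in the system (peer) summary, with counts clipped to the
# peer's counts. The two example summaries are invented.
from collections import Counter

def rouge_1_recall(peer: str, model: str) -> float:
    peer_counts = Counter(peer.lower().split())
    model_counts = Counter(model.lower().split())
    matched = sum(min(count, peer_counts[word])
                  for word, count in model_counts.items())
    return matched / sum(model_counts.values())

model = "the hurricane hit the coast"
peer = "a hurricane struck the southern coast"
print(rouge_1_recall(peer, model))  # 0.6 (3 of the 5 model unigrams matched)
```

The same skeleton extends to ROUGE-N by counting n-grams instead of single words, which is why the scores in Tables 4 and 5 are recall-oriented: they measure how much of the model summary is covered by the peer.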
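The Wilcoxon signed ranks comparison used above can be sketched on hypothetical per-document scores. This toy version computes only the W statistic (the smaller of the positive and negative rank sums) and ignores tie handling and the significance lookup; the per-document ROUGE values are invented:

```python
# Toy Wilcoxon signed-rank statistic for comparing two summarizers on
# per-document scores. Hypothetical data; no tie correction, no p-value.
def wilcoxon_w(xs, ys):
    diffs = [x - y for x, y in zip(xs, ys) if x != y]
    ranked = sorted(diffs, key=abs)  # rank absolute differences from 1..n
    pos = sum(r for r, d in enumerate(ranked, start=1) if d > 0)
    neg = sum(r for r, d in enumerate(ranked, start=1) if d < 0)
    return min(pos, neg)

h3 = [0.47, 0.45, 0.49, 0.44, 0.46, 0.48]   # hypothetical per-doc R-1, system A
lex = [0.42, 0.44, 0.43, 0.46, 0.39, 0.45]  # hypothetical per-doc R-1, system B
print(wilcoxon_w(h3, lex))  # 2
```

A small W (here, almost all rank mass on the positive side) is what licenses a claim like "system A is significantly better than system B" once W falls below the critical value for the chosen confidence level.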


Besides, the Lesk algorithm introduces some noise, and it is biased toward the first sense, so that the percentage of WordNet concepts in the DUC corpus that Lesk labels with the first sense is above 61 %. Therefore, the difference between the disambiguations performed by the two criteria is not marked.

System         R-1     R-2     R-S4    R-L
H.3-WSD        0,4648  0,2196  0,1928  0,4277
H.2-WSD        0,4651  0,2193  0,1927  0,4276
H.1-WSD        0,4641  0,2191  0,1919  0,4268
H.3-1st sense  0,4608  0,2103  0,1838  0,4251
H.2-1st sense  0,4594  0,2073  0,1810  0,4224
H.1-1st sense  0,4584  0,2057  0,1794  0,4216

Table 5: ROUGE scores achieved by our system: first, using Lesk to solve word ambiguity; and second, selecting the 1st sense in WordNet for every word

It is striking that the differences between the three heuristics (both with and without WSD) are not significant. In order to understand the reason, we examined the intermediate results of our algorithm. We found that, in this particular experimentation, the clustering method usually produces one big cluster along with a variable number of small clusters. As news items have little redundancy in their content, most concepts in them are closely related to the main topic, and so they fall into the same cluster. As a consequence, the three heuristics extract most of their sentences from this large cluster, and therefore the summaries are quite similar. However, the best results are reported by heuristic 3. We have checked that this heuristic selects most of the sentences from the most populated cluster, but it also includes some sentences from others when the sentences give a high score to these clusters. Thus, in addition to the information related to the central topic, this heuristic also includes other dependent or "satellite" information that might be relevant to the user. On the contrary, heuristic 1 fails to present this information, while heuristic 2 includes more secondary information but misses some core information.

6. Conclusion and Future Work

In this paper, an efficient approach to extractive text summarization has been presented, which represents the document as a semantic graph, using WordNet concepts and relations. The method succeeds in identifying the salient concepts in the text and the central topics covered in it; thus the selection of sentences is close to that made by humans. An extensive evaluation has been accomplished, which has confirmed that the use of concepts rather than terms, along with the semantic relations that exist between them, can be very useful in automatic summarization. As a result, the method proposed compares positively with previous approaches based on terms. Our results also outperform those obtained by a lexical graph-based approach and by other systems using different types of syntactic information.

We found that applying WSD improves the performance of our summarizer but, as already mentioned, the disambiguation algorithm introduces some noise in the concept recognition, which in turn affects the sub-theme identification step. As future work, we plan to evaluate our system with other disambiguation algorithms.

Finally, an important contribution is the possibility of applying the method to documents from different domains with minor changes, as it only requires modifying the knowledge base and the method for automatically identifying the concepts within the text.

References

Agirre, E. and A. Soroa. 2009. Personalizing PageRank for Word Sense Disambiguation. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, pages 33–41.

Banerjee, S. and T. Pedersen. 2002. An Adapted Lesk Algorithm for Word Sense Disambiguation using WordNet. In Proceedings of the 3rd International Conference on Intelligent Text Processing and Computational Linguistics, pages 136–145.

Barabási, A. L. and R. Albert. 1999. Emergence of Scaling in Random Networks. Science, 286:509–512.

Bawakid, A. and M. Oussalah. 2008. A Semantic Summarization System: University of Birmingham at TAC 2008. In Proceedings of the First Text Analysis Conference.

Bossard, A., M. Généreux, and T. Poibeau. 2008. Description of the LIPN Systems at TAC 2008: Summarizing Information and Opinions. In Proceedings of the First Text Analysis Conference.

Brandow, R., K. Mitze, and L. F. Rau. 1995. Automatic Condensation of Electronic Publications by Sentence Selection. Information Processing and Management, 5(31):675–685.

Celikyilmaz, A., M. Thint, and Z. Huang. 2009. A Graph-based Semi-Supervised Learning for Question-Answering. In Proceedings of the 47th Annual Meeting of the ACL, pages 719–727.

Edmundson, H. P. 1969. New Methods in Automatic Extracting. Journal of the Association for Computing Machinery, 2(16):264–285.

Erkan, G. and D. R. Radev. 2004. LexRank: Graph-based Lexical Centrality as Salience in Text Summarization. Journal of Artificial Intelligence Research, 22:457–479.

Lesk, M. 1986. Automatic Sense Disambiguation Using Machine Readable Dictionaries: How to Tell a Pine Cone from an Ice Cream Cone. In Proceedings of the Special Interest Group on Design of Communication, pages 24–26.

Lin, C-Y. 2004. Rouge: A Package for Automatic Evaluation of Summaries. In Proceedings of the Association for Computational Linguistics, Workshop: Text Summarization Branches Out, pages 74–81.

Litvak, M. and M. Last. 2008. Graph-based Keyword Extraction for Single-document Summarization. In Proceedings of the International Conference on Computational Linguistics, Workshop on Multi-source Multilingual Information Extraction and Summarization.

Lloret, E., O. Ferrández, R. Muñoz, and M. Palomar. 2008. A Text Summarization Approach under the Influence of Textual Entailment. In Proceedings of the 5th International Workshop on Natural Language Processing and Cognitive Science, in conjunction with the 10th International Conference on Enterprise Information Systems, pages 22–31.

Mihalcea, R. and P. Tarau. 2004. TextRank: Bringing Order into Texts. In Proceedings of the Conference on Empirical Methods on Natural Language Processing, pages 404–411.

Patwardhan, S., S. Banerjee, and T. Pedersen. 2005. SenseRelate::TargetWord: A Generalized Framework for Word Sense Disambiguation. In Proceedings of the Association for Computational Linguistics, pages 73–76.

Plaza, L., A. Diaz, and P. Gervas. 2010. Automatic Summarization of News Using WordNet Concept Graphs. IADIS International Journal on Computer Science and Information Systems, V:45–57.

Reeve, L. H., H. Han, and A. D. Brooks. 2007. The Use of Domain-specific Concepts in Biomedical Text Summarization. Information Processing and Management, 43:1765–1776.

Sparck-Jones, K. 1972. A Statistical Interpretation of Term Specificity and its Application in Retrieval. Journal of Documentation, 28(1):11–20.

Sparck-Jones, K. 1999. Automatic Summarising: Factors and Directions. The MIT Press.

Steinberger, J., M. Poesio, M. A. Kabadjov, and K. Jezek. 2007. Two Uses of Anaphora Resolution in Summarization. Information Processing and Management, 43(6):1663–1680.

Yoo, I., X. Hu, and I-Y. Song. 2007. A Coherent Graph-based Semantic Clustering and Summarization Approach for Biomedical Literature and a New Summarization Evaluation Method. BMC Bioinformatics, 8(9).

Zhao, L., L. Wu, and X. Huang. 2009. Using Query Expansion in Graph-based Approach for Query-focused Multi-document Summarization. Information Processing and Management, 45:35–41.
