An Approach to Text Summarization

Sankar K
AU-KBC Research Centre
MIT Campus, Anna University
Chennai-44.
[email protected]

Sobha L
AU-KBC Research Centre
MIT Campus, Anna University
Chennai-44.
[email protected]

Abstract

We propose an efficient text summarization technique that involves two basic operations. The first operation involves finding coherent chunks in the document and the second operation involves ranking the text in the individual coherent chunks and picking the sentences that rank above a given threshold. The coherent chunks are formed by exploiting the lexical relationship between adjacent sentences in the document. Occurrence of words through repetition or relatedness by sense relation plays a major role in forming a cohesive tie. The proposed text ranking approach is based on a graph theoretic ranking model applied to the text summarization task.

1 Introduction

Automated summarization is an important area in NLP research. A variety of automated summarization schemes have been proposed recently. NeATS (Lin and Hovy, 2002) is a sentence position, term frequency, topic signature and term clustering based approach, and MEAD (Radev et al., 2004) is a centroid based approach. Iterative graph based ranking algorithms, such as Kleinberg's HITS algorithm (Kleinberg, 1999) and Google's PageRank (Brin and Page, 1998), have been traditionally and successfully used in web-link analysis, social networks and more recently in applications (Mihalcea and Tarau, 2004), (Mihalcea et al., 2004), (Erkan and Radev, 2004) and (Mihalcea, 2004). These iterative approaches have a high time complexity and are practically slow in dynamic summarization. Proposals have also been made for coherence based automated summarization systems (Silber and McCoy, 2000).

We propose a novel text summarization technique that involves two basic operations, namely finding coherent chunks in the document and ranking the text in the individual coherent chunks formed. For finding coherent chunks in the document, we propose a set of rules that identifies the connection between adjacent sentences in the document. The connected sentences that are picked based on the rules form coherent chunks in the document. For text ranking, we propose an automatic and unsupervised graph based ranking algorithm that gives improved results when compared to other ranking algorithms. The formation of coherent chunks greatly improves the amount of information of the text picked for subsequent ranking and hence the quality of text summarization.

The proposed text ranking technique employs a hybrid approach involving two phases: the first phase employs word frequency statistics, and the second phase involves a word position and string pattern based weighting algorithm to find the weight of each sentence. A fast running time is achieved by using a compression hash on each sentence.

This paper is organized as follows: section 2 discusses lexical cohesion, section 3 discusses the text ranking algorithm and section 4 describes the summarization performed by combining lexical cohesion and text ranking.

2 Lexical Cohesion

Coherence in linguistics makes the text semantically meaningful. It is achieved through semantic features such as the use of deictics (a deictic is an expression that indicates direction, e.g. that, this), anaphora (a referent which requires an antecedent before it, e.g. he, she, it), cataphora (a referent which requires an antecedent after it), lexical relations and repeating proper noun elements (Morris and Hirst, 1991). Robert de Beaugrande and Wolfgang U. Dressler define coherence as a "continuity of senses" and "the mutual access and relevance within a configuration of concepts and relations" (Beaugrande and Dressler, 1981). Thus a text gives meaning as a result of the union of meanings or senses in the text.

The coherence cues present in a sentence are directly visible when we go through the flow of the document. Our approach aims to achieve this objective with linguistic and heuristic information. The identification of the semantic neighborhood, that is, the occurrence of words through repetition or relatedness by sense relations, namely synonyms, hyponyms and hypernyms, plays a major role in forming a cohesive tie (Miller et al., 1990).

2.1 Rules for finding Coherent chunks

When going through a document, the relationship among adjacent sentences is determined by the continuity that exists between them. We define the following set of rules to find coherent chunks in the document.

Rule 1

The presence of connectives (such as accordingly, again, also, besides) in the present sentence indicates the connectedness of the present sentence with the previous sentence. When such connectives are found, the adjacent sentences form coherent chunks.

Rule 2

A 3rd person pronominal in a given sentence refers to an antecedent in the previous sentence, in such a way that the given sentence gives the complete meaning with respect to the previous sentence. When such adjacent sentences are found, they form coherent chunks.

Rule 3

The reappearance of NERs in adjacent sentences is an indication of connectedness. When such adjacent sentences are found, they form coherent chunks.

Rule 4

An ontology relationship between words across sentences can be used to find semantically related words across adjacent sentences that appear in the document. The appearance of related words is an indication of coherence and hence forms coherent chunks.

All the above rules are applied incrementally to achieve the complete set of coherent chunks.

2.1.1 Connecting Word

The ACE Corpus was used for studying the coherence patterns between adjacent sentences of the document. From our analysis, we picked out a set of keywords such that the appearance of these keywords at the beginning of a sentence provides a strong lexical tie with the previous sentence. The appearance of the keywords "accordingly, again, also, besides, hence, henceforth, however, incidentally, meanwhile, moreover, namely, nevertheless, otherwise, that is, then, therefore, thus, and, but, or, yet, so, once, so that, than, that, till, whenever, whereas and wherever" at the beginning of the present sentence was found to make it highly coherent with the previous sentence. Linguistically, a sentence cannot start with the above words without any related introduction in the previous sentence.

Furthermore, the appearance of the keywords "consequently, finally, furthermore" at the beginning or middle of the present sentence was found to make it highly cohesive with the previous sentence.

Example 1

1. a The train was late.
1. b However, I managed to reach the wedding on time.

In Example 1, the connecting word however binds with the situation of the train being late.

Example 2

1. a The cab driver was late.
1. b The bike tyre was punctured.
1. c The train was late.
1. d Finally, I managed to arrive at the wedding on time by calling a cab.

Example 3

1. a The cab driver was late.
1. b The bike tyre was punctured.
1. c The train was late.
1. d I could not wait any more; I finally managed to reach the wedding on time by calling a cab.

In Example 2, the connecting word finally binds with the situation of him being delayed. Similarly, in Example 3, the connecting word finally, though it comes in the middle of the sentence, still binds with the situation of him being delayed.

2.1.2 Pronominals

In this approach we have a set of pronominals which establishes coherence in the text. From our analysis, it was observed that if the pronominals "he, she, it, they, her, his, hers, its, their, theirs" appear in the present sentence, the antecedent may be in the same or the previous sentence. It was also found that if the pronominal is not possessive (i.e. the antecedent appears in the previous sentence or previous clause), then the present sentence and the previous sentence are connected. However, if the pronominal is possessive then it behaves like reflexives such as "himself" and "herself", which have the subject as their antecedent; hence the possibility of connecting it with the previous sentence is very unlikely. Though pronominal resolution cannot be done at a window size of 2 alone, we still look at a window size of 2 alone to pick guaranteed connected sentences.

Example 4

1. a Ravi is a good boy.
1. b He always speaks the truth.

In Example 4, the pronominal he in the second sentence refers to the antecedent Ravi in the first sentence.

Example 5

1. a He is the one who got the first prize.

In Example 5, the pronominal he is possessive and it does not need an antecedent to convey the meaning.

2.1.3 NERs Reappearance

Two adjacent sentences are said to be coherent when both sentences contain one or more reappearing nouns.

Example 6

1. a Ravi is a good boy.
1. b Ravi scored good marks in exams.

Example 7

1. a The car race starts at noon.
1. b Any car is allowed to participate.

Example 6 and Example 7 demonstrate the coherence between the two sentences through reappearing nouns.

2.1.4 Thesaurus Relationships

WordNet covers most of the sense relationships. To find the semantic neighborhood between adjacent sentences, most of the lexical relationships such as synonyms, hyponyms, hypernyms, meronyms, holonyms and gradation can be used (Fellbaum, 1998). Hence, semantically related terms are captured through this process.

Example 8

1. a The bicycle has two wheels.
1. b The wheels provide speed and stability.

In Example 8, bicycle and wheels are related in that bicycle is the holonym of wheels.

2.2 Coherence Finding Algorithm

The algorithm is carried out in four phases. Initially, each of the four cohesion rules is individually applied over the given document to give coherent chunks. Next, the coherent chunks obtained in each phase are merged together to give the global coherent chunks in the document.
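The rule set above amounts to a pairwise test over adjacent sentences followed by grouping maximal runs of connected sentences. The sketch below is one possible reading of it in Python; the connective and pronominal lists are subsets of the keywords quoted in sections 2.1.1 and 2.1.2, while the tokenizer, the stopword list standing in for the NER test, and the NLTK WordNet neighborhood check are simplifying assumptions made here for illustration and are not part of the reported system.

```python
# Sketch of the coherence chunker: adjacent sentences are linked when one of
# Rules 1-4 fires, and maximal runs of linked sentences become coherent chunks.
# The tokenizer, stopword list and WordNet test are simplifications assumed
# here; they stand in for the NER and ontology components of the paper.
# Requires the WordNet data: nltk.download("wordnet")
import re
from nltk.corpus import wordnet as wn

CONNECTIVES = {"accordingly", "again", "also", "besides", "hence", "henceforth",
               "however", "incidentally", "meanwhile", "moreover", "namely",
               "nevertheless", "otherwise", "then", "therefore", "thus",
               "consequently", "finally", "furthermore"}
PRONOMINALS = {"he", "she", "it", "they", "her", "his", "hers", "its",
               "their", "theirs"}
STOPWORDS = {"the", "a", "an", "is", "are", "was", "were", "in", "on", "at",
             "of", "to", "and", "any"}

def tokens(sentence):
    return re.findall(r"[A-Za-z]+", sentence)

def content_words(toks):
    return {t.lower() for t in toks if t.lower() not in STOPWORDS}

def wordnet_related(w1, w2):
    """Rule 4 approximation: words whose synsets coincide or are one
    hypernym/hyponym/holonym link apart are treated as related."""
    s1, s2 = set(wn.synsets(w1)), set(wn.synsets(w2))
    if s1 & s2:
        return True
    for syn in s1:
        neighbours = set(syn.hypernyms()) | set(syn.hyponyms()) \
                   | set(syn.member_holonyms()) | set(syn.part_holonyms())
        if neighbours & s2:
            return True
    return False

def coherent(prev, curr):
    prev_toks, curr_toks = tokens(prev), tokens(curr)
    if not prev_toks or not curr_toks:
        return False
    # Rule 1: a connective at the start of the current sentence.
    if curr_toks[0].lower() in CONNECTIVES:
        return True
    # Rule 2: a 3rd person pronominal in the current sentence (the paper
    # additionally excludes possessive uses; that check is omitted here).
    if any(t.lower() in PRONOMINALS for t in curr_toks):
        return True
    # Rule 3: reappearing nouns, approximated by repeated content words.
    if content_words(prev_toks) & content_words(curr_toks):
        return True
    # Rule 4: WordNet-related content words across the two sentences.
    return any(wordnet_related(a, b)
               for a in content_words(prev_toks)
               for b in content_words(curr_toks))

def coherent_chunks(sentences):
    """Group the document into runs of pairwise-coherent adjacent sentences."""
    if not sentences:
        return []
    chunks, current = [], [sentences[0]]
    for prev, curr in zip(sentences, sentences[1:]):
        if coherent(prev, curr):
            current.append(curr)
        else:
            chunks.append(current)
            current = [curr]
    chunks.append(current)
    return chunks
```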

Figure 1: Flow of Coherence chunker (Input Text, then Connecting Word, Possessive Pronoun, Noun Reappearance and Thesaurus Relationships checks, yielding Coherent Chunks)

Figure 1 shows the flow and rule positions in the coherence chunk identification module.

2.3 Evaluation

One way to evaluate the coherence finding algorithm is to compare against human judgments made by readers, to evaluate against text pre-marked by authors, or to see the improved result in the computational task. In this paper we follow the computational method to see the improved result.

3 Text Ranking

The proposed graph based text ranking algorithm consists of three steps: (1) word frequency analysis; (2) a word positional and string pattern based weight calculation algorithm; (3) ranking the sentences by normalizing the results of steps (1) and (2). The algorithm is carried out in two phases. The weight metric obtained at the end of each phase is averaged to obtain the final weight metric. Sentences are sorted in non-ascending order of weight.

3.1 Graph

Let G (V, E) be a weighted undirected complete graph, where V is the set of vertices and E is the set of weighted edges.

Figure 2: A complete undirected graph (vertices S1 ... S6, every pair connected)

In figure 2, the vertices in graph G represent the set of all sentences in the given document. Each sentence in G is related to every other sentence through the set of weighted edges in the complete graph.

3.2 Phase 1

Let the set of all sentences in the document be S = {s_i | 1 ≤ i ≤ n}, where n is the number of sentences in S. The sentence weight (SW) of each sentence is calculated as the average affinity weight of the words in it. For a sentence s_i = {w_j | 1 ≤ j ≤ m_i}, where m_i is the number of words in sentence s_i (1 ≤ i ≤ n), the affinity weight AW of a word w_j is calculated as follows:

AW(w_j) = \frac{\sum_{\forall w_k \in S} \mathrm{IsEqual}(w_j, w_k)}{WC(S)}        (1)

where S is the set of all sentences in the given document, w_k is a word in S, WC(S) is the total number of words in S, and the function IsEqual(x, y) returns an integer count of 1 if x and y are equal; otherwise an integer count of 0 is returned.
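Read this way, AW(w) is simply the relative frequency of w over all words in the document, and the sentence weight SW defined next in equation (2) is its average over the words of a sentence. A minimal sketch follows, assuming whitespace tokenisation since the paper does not specify a tokenizer.

```python
from collections import Counter

def phase1_weights(sentences):
    """Phase 1 sketch: AW(w) = count of w in the document / WC(S) (equation 1);
    SW(s_i) = average AW over the words of s_i (equation 2)."""
    tokenised = [s.split() for s in sentences]   # assumed whitespace tokenizer
    counts = Counter(w for words in tokenised for w in words)
    total_words = sum(counts.values())           # WC(S)
    aw = {w: c / total_words for w, c in counts.items()}
    sw = [sum(aw[w] for w in words) / len(words) if words else 0.0
          for words in tokenised]
    return aw, sw
```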

Next, we find the sentence weight SW(s_i) of each sentence s_i (1 ≤ i ≤ n) as follows:

SW(s_i) = \frac{1}{m_i} \sum_{\forall w_j \in s_i} AW(w_j)        (2)

At the end of phase 1, the graph vertices hold the sentence weight, as illustrated in figure 4.

[1] "The whole show is dreadful," she cried, coming out of the menagerie of M. Martin.
[2] She had just been looking at that daring speculator "working with his hyena," to speak in the style of the program.
[3] "By what means," she continued, "can he have tamed these animals to such a point as to be certain of their affection for."
[4] "What seems to you a problem," said I, interrupting, "is really quite natural."
[5] "Oh!" she cried, letting an incredulous smile wander over her lips.
[6] "You think that beasts are wholly without passions?" Quite the reverse; we can communicate to them all the vices arising in our own state of civilization.

Figure 3: Sample text taken for the ranking process.

Figure 4: Sample graph of sentence weight calculation in phase 1.

3.3 Compression hash

A fast compression hash function over a word w is given as follows:

H(w) = (c_1 a^{k-1} + c_2 a^{k-2} + c_3 a^{k-3} + \dots + c_k a^{0}) \bmod p        (3)

where w = {c_1, c_2, c_3, ..., c_k} is the ordered set of ASCII equivalents of the alphabets in w and k is the total number of alphabets in w. The choice of a = 2 permits the exponentiations and term wise multiplications in equation 3 to be binary shift operations on a microprocessor, thereby speeding up the hash computation over the text. Any lexicographically ordered bijective map from character to integer may be used to generate the set w. The recommendation to use ASCII equivalents is solely for implementation convenience. Set p = 26 (for English) to cover the sample space of the set of alphabets under consideration.

Compute H(w) for each word in sentence s_i to obtain the hashed set

H(s_i) = \{H(w_1), H(w_2), \dots, H(w_{m_i})\}        (4)

Next, invert each element in the set H(s_i) back to its ASCII equivalent to obtain the set

\hat{H}(s_i) = \{\hat{H}(c_1), \hat{H}(c_2), \dots, \hat{H}(c_{m_i})\}        (5)

Then, concatenate the elements in the set Ĥ(s_i) to obtain the string ŝ_i, where ŝ_i is the compressed representation of sentence s_i. The hash operations are carried out to reduce the computational complexity in phase 2, by compressing the sentences while retaining their structural properties, specifically word frequency, word position and sentence patterns.

3.4 Levenshtein Distance

The Levenshtein distance (LD) between two strings string1 and string2 is a metric that is used to find the number of operations required to convert string1 to string2 or vice versa, where the set of possible operations on a character is insertion, deletion, or substitution. The LD algorithm is illustrated by the following example:

LD (ROLL, ROLE) is 1
LD (SATURDAY, SUNDAY) is 3
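A sketch of the compression step (equations 3 to 5) and of the Levenshtein distance follows. It is an illustrative reading rather than the authors' implementation: words are lower-cased, a = 2 is used so the polynomial of equation 3 reduces to left shifts, p = 26 as stated above, and each hash value is mapped back to a letter so that a sentence compresses to one character per word.

```python
def word_hash(word, a=2, p=26):
    """Equation (3): H(w) = (c1*a^(k-1) + ... + ck*a^0) mod p, evaluated by
    Horner's rule; with a = 2 each step is a binary left shift."""
    h = 0
    for ch in word.lower():
        h = ((h << 1) + ord(ch)) % p
    return h

def compress_sentence(sentence):
    """Equations (4)-(5): hash every word, map each hash value back to a
    character and concatenate into the compressed string s_hat."""
    return "".join(chr(ord("a") + word_hash(w)) for w in sentence.split())

def levenshtein(s1, s2):
    """Levenshtein distance (section 3.4): minimum number of insertions,
    deletions and substitutions needed to turn s1 into s2."""
    if len(s1) < len(s2):
        s1, s2 = s2, s1
    previous = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, 1):
        current = [i]
        for j, c2 in enumerate(s2, 1):
            current.append(min(previous[j] + 1,               # deletion
                               current[j - 1] + 1,            # insertion
                               previous[j - 1] + (c1 != c2))) # substitution
        previous = current
    return previous[-1]
```

With this sketch, levenshtein("ROLL", "ROLE") returns 1 and levenshtein("SATURDAY", "SUNDAY") returns 3, matching the examples above.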

3.5 Levenshtein Similarity Weight

Consider two strings, string1 and string2, where ls_1 is the length of string1 and ls_2 is the length of string2. Compute MaxLen = maximum(ls_1, ls_2). Then the LSW between string1 and string2 is the difference between MaxLen and LD, divided by MaxLen. Clearly, LSW lies in the interval 0 to 1. In case of a perfect match between two words, the LSW is 1, and in case of a total mismatch, the LSW is 0. In all other cases, 0 < LSW < 1. The LSW metric is illustrated by the following example.

LSW (ABC, ABC) = 1
LSW (ABC, XYZ) = 0
LSW (ABCD, EFD) = 0.25

Hence, to find the Levenshtein similarity weight, first find the Levenshtein distance LD, using which LSW is calculated by the equation

LSW(\hat{s}_i, \hat{s}_j) = \frac{MaxLen(\hat{s}_i, \hat{s}_j) - LD(\hat{s}_i, \hat{s}_j)}{MaxLen(\hat{s}_i, \hat{s}_j)}        (6)

where ŝ_i and ŝ_j are the concatenated string outputs of equation 5.

3.6 Phase 2

Let S = {s_i | 1 ≤ i ≤ n} be the set of all sentences in the given document, where n is the number of sentences in S. Further, s_i = {w_j | 1 ≤ j ≤ m}, where m is the number of words in sentence s_i.

Figure 5: Sample graph for sentence weight calculation in phase 2.

For all s_i ∈ S, find Ĥ(s_i) = {Ĥ(c_1), Ĥ(c_2), ..., Ĥ(c_m_i)} using equations 3 and 4. Then, concatenate the elements in the set Ĥ(s_i) to obtain the string ŝ_i, where ŝ_i is the compressed representation of sentence s_i. Each string ŝ_i, 1 ≤ i ≤ n, is represented as a vertex of the complete graph as in figure 5, and Ŝ = {ŝ_i | 1 ≤ i ≤ n}. For the graph in figure 5, find the Levenshtein similarity weight LSW between every pair of vertices using equation 6. Find the vertex weight (VW) for each string ŝ_l, 1 ≤ l ≤ n, by

VW(\hat{s}_l) = \frac{1}{n} \sum_{\forall \hat{s}_i \neq \hat{s}_l \in \hat{S}} LSW(\hat{s}_l, \hat{s}_i)        (7)

4 Text Ranking

The rank of sentence s_i, 1 ≤ i ≤ n, is computed as

Rank(s_i) = \frac{SW(s_i) + VW(\hat{s}_i)}{2}, \quad 1 \le i \le n        (8)

where SW(s_i) is calculated by equation 2 of phase 1 and VW(ŝ_i) is found using equation 7 of phase 2. Arrange the sentences s_i, 1 ≤ i ≤ n, in non-increasing order of their ranks.

SW(s_i) in phase 1 holds the sentence affinity in terms of word frequency and is used to determine the significance of the sentence in the overall ranking scheme. VW(ŝ_i) in phase 2 helps in the overall ranking by determining the largest common subsequences and other smaller subsequences and then assigning weights to them using LSW. Further, since named entities are represented as strings, repeated occurrences are weighed efficiently by LSW, thereby giving them a relevant ranking position.
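Equations (6) to (8) combine into a short phase-2 routine. The sketch below assumes the hypothetical compress_sentence and levenshtein helpers from the sketch in sections 3.3 and 3.4, and takes the phase-1 sentence weights SW as given.

```python
def lsw(s1, s2):
    """Equation (6): Levenshtein similarity weight of two compressed strings."""
    max_len = max(len(s1), len(s2))
    if max_len == 0:
        return 1.0
    return (max_len - levenshtein(s1, s2)) / max_len

def rank_sentences(sentences, sw):
    """Equations (7)-(8): vertex weight of each compressed sentence over the
    complete graph, averaged with its phase-1 sentence weight."""
    n = len(sentences)
    compressed = [compress_sentence(s) for s in sentences]
    vw = [sum(lsw(compressed[l], compressed[i]) for i in range(n) if i != l) / n
          for l in range(n)]
    ranks = [(sw[i] + vw[i]) / 2 for i in range(n)]
    order = sorted(range(n), key=lambda i: ranks[i], reverse=True)  # non-increasing rank
    return ranks, order
```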

5 Summarization

Summarization is done by applying text ranking over the global coherent chunks in the document. The sentences whose weight is above the threshold are picked and rearranged in the order in which they appeared in the original document.
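Given the pieces sketched above, the summarization step is a thin wrapper: rank each global coherent chunk, keep the sentences whose rank exceeds the chosen threshold, and restore the original document order. The threshold value is not specified in the paper and is left here as a parameter; chunks are assumed to be lists of sentence indices, and rank_sentences is the hypothetical helper from the previous sketch.

```python
def summarize(sentences, chunks, sw, threshold):
    """Section 5 sketch: rank sentences chunk by chunk, keep those scoring
    above the threshold and re-emit them in original document order."""
    keep = set()
    for chunk in chunks:                      # chunk: list of sentence indices
        chunk_sentences = [sentences[i] for i in chunk]
        chunk_sw = [sw[i] for i in chunk]
        ranks, _ = rank_sentences(chunk_sentences, chunk_sw)
        keep.update(i for i, r in zip(chunk, ranks) if r > threshold)
    return [sentences[i] for i in sorted(keep)]
```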

6 Evaluation

The ROUGE evaluation toolkit is employed to evaluate the proposed algorithm. ROUGE, an automated summarization evaluation package based on n-gram statistics, is found to be highly correlated with human evaluations (Lin and Hovy, 2003a).

The evaluations are reported in the ROUGE-1 metric, which seeks unigram matches between the generated and the reference summaries. The ROUGE-1 metric is found to have high correlation with human judgments at a 95% confidence level and is hence used for evaluation. (Mihalcea and Tarau, 2004), a graph based ranking model, reports a ROUGE score of 0.4904, and (Mihalcea, 2004), graph-based ranking algorithms for sentence extraction applied to text summarization, reports a ROUGE score of 0.5023.

Table 1 shows the ROUGE score over 567 news articles provided during the Document Understanding Evaluations 2002 (DUC, 2002) using the proposed algorithm without the inclusion of the coherence chunker module.

          Score
ROUGE-1   0.5103
ROUGE-L   0.4863

Table 1: ROUGE score for the news article summarization task without the coherence chunker, calculated across 567 articles.

Table 2 shows the ROUGE score over 567 news articles provided during the Document Understanding Evaluations 2002 (DUC, 2002) using the proposed algorithm after the inclusion of the coherence chunker module.

          Score
ROUGE-1   0.5312
ROUGE-L   0.4978

Table 2: ROUGE score for the news article summarization task with the coherence chunker, calculated across 567 articles.

Comparatively, Table 2, which gives the ROUGE score for the summary including the coherence chunker module, shows the better result.

7 Related Work

Text extraction is considered to be the important and foremost process in summarization. Intuitively, a hash based approach to a graph based ranking algorithm for text ranking works well on the task of extractive summarization. A notable study on the usefulness and limitations of automatic sentence extraction is reported in (Lin and Hovy, 2003b), which emphasizes the need for efficient algorithms for sentence ranking and summarization.

8 Conclusions

In this paper, we propose a coherence chunker module and a hash based approach to a graph based ranking algorithm for text ranking. In specific, we propose a novel approach for graph based text ranking, with improved results compared to existing ranking algorithms. The architecture of the algorithm helps the ranking process to be done in a time efficient way. This approach succeeds in grabbing the coherent sentences based on linguistic and heuristic rules, whereas other supervised ranking systems do this by training on summary collections. This makes the proposed algorithm highly portable to other domains and languages.

References

ACE Corpus. NIST 2008 Automatic Content Extraction Evaluation (ACE08). http://www.itl.nist.gov/iad/mig/tests/ace/2008/

Brin and L. Page. 1998. The anatomy of a large scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1-7).

Erkan and D. Radev. 2004. LexPageRank: Prestige in multi-document text summarization. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain, July.

Fellbaum, C., ed. 1998. WordNet: An electronic lexical database. MIT Press, Cambridge.

Kleinberg. 1999. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604-632.

Lin and E.H. Hovy. 2002. From Single to Multi-document Summarization: A Prototype System and its Evaluation. In Proceedings of ACL-2002.

Lin and E.H. Hovy. 2003a. Automatic evaluation of summaries using n-gram co-occurrence statistics. In Proceedings of the Human Language Technology Conference (HLT-NAACL 2003), Edmonton, Canada, May.

Lin and E.H. Hovy. 2003b. The potential and limitations of sentence extraction for summarization. In Proceedings of the HLT/NAACL Workshop on Text Summarization, Edmonton, Canada, May.

Mihalcea. 2004. Graph-based ranking algorithms for sentence extraction, applied to text summarization. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL 2004), companion volume, Barcelona, Spain.

Mihalcea and P. Tarau. 2004. TextRank: bringing order into texts. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2004), Barcelona, Spain.

Mihalcea, P. Tarau, and E. Figa. 2004. PageRank on semantic networks, with application to word sense disambiguation. In Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004), Geneva, Switzerland.

Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., and Miller, K. J. 1990. Introduction to WordNet: An on-line lexical database. Journal of Lexicography.

Morris, J. and Hirst, G. 1991. Lexical cohesion computed by thesaural relations as an indicator of the structure of text. Computational Linguistics.

Radev, H. Y. Jing, M. Stys and D. Tam. 2004. Centroid-based summarization of multiple documents. Information Processing and Management, 40:919-938.

Robert de Beaugrande and Wolfgang Dressler. 1981. Introduction to Text Linguistics. Longman.

Silber, H. G. and McCoy, K. F. 2000. Efficient text summarization using lexical chains. In Proceedings of Intelligent User Interfaces.
