From Social Bookmarking to Social Summarization: An Experiment in Community-Based Summary Generation∗
Oisin Boydell
Adaptive Information Cluster
School of Computer Science and Informatics
University College Dublin
Belfield, Dublin 4
[email protected]

Barry Smyth
Adaptive Information Cluster
School of Computer Science and Informatics
University College Dublin
Belfield, Dublin 4
[email protected]
ABSTRACT
We describe a novel document summarization technique that uses informational cues, such as social bookmarks or search queries, as the basis for summary construction by leveraging the snippet-generation capabilities of standard search engines. A comprehensive evaluation demonstrates how the social summarization technique can generate summaries that are of significantly higher quality than those produced by a number of leading alternatives.

ACM Classification: H.4 [Information Systems Applications]: Miscellaneous; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval

General terms: Design, Algorithms, Human Factors

Keywords: summarization, social bookmarks, click-through data, community, Web search

INTRODUCTION
The ability to effectively summarize a document — to accurately and concisely capture its key information — is an important area of research that is dominated by a wide range of techniques which employ language models of varying degrees of sophistication. These summarization techniques attempt to automatically capture the salient content from a document in order to present it to a human reader in a more condensed form, but one of the problems of these traditional 'one size fits all' approaches is that there is limited emphasis on the needs and preferences of the end users. While the resulting summaries may perform well in general terms — effectively extracting the core content of the document in question — they may not appeal to the needs and preferences of individual users or a community of users.

∗ This material is based on works supported by Science Foundation Ireland under Grant No. 03/IN.3/I361.

The Social Web
In this paper we suggest a novel approach to summarization that is inspired by the recent emergence of so-called Social Web services, in which communities of users are playing an increasingly important role when it comes to producing, enriching, organising, and facilitating access to Web content. The rapid growth of Web logs (blogs) is just one example of the dynamic new world of user-generated content. We also find communities of users eager to contribute to existing content by submitting their own reviews and opinions. Sites like Amazon^1 and TripAdvisor^2 have learned to embrace this as a valuable source of social content, by allowing users to submit their reviews and opinions on consumer products (books, DVDs, etc. in the case of Amazon) or travel services (vacations, hotels etc. in the case of TripAdvisor). Indeed, as we come to appreciate the willingness of users to participate in these types of services, a number of innovators have recognised the power of social interactions to drive entirely new types of social media.

The news aggregator, Digg^3, is a case in point: by allowing users to submit and rate news stories found on the Web, Digg plays the role of a community-based news aggregator and in just 18 months has attracted a reader-base that is fast approaching that of the New York Times^4. Services like Flickr^5 and Del.icio.us^6 harness a very different form of social content by encouraging users to label or tag content to facilitate the sharing of, and access to, said content. For instance, Flickr allows users to upload, tag, and share their photo libraries, whereas Del.icio.us allows users to manage and share their Web bookmarks. Importantly, these tagging services allow various users to express their views of content, by submitting the tags that they deem to be appropriate, leading to the development of an alternative content taxonomy (or folksonomy [1]) to serve as an alternative content index for facilitating search.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
IUI'07, January 28–31, 2007, Honolulu, Hawaii, USA.
Copyright 2007 ACM 1-59593-481-2/07/0001 ...$5.00.

1 http://www.amazon.com
2 http://www.tripadvisor.com
3 http://www.digg.com
4 http://www.alexaholic.com/digg.com+nyt.com
5 http://www.flickr.com/
6 http://del.icio.us/
Towards Social Summarization
The point of all this is to emphasise how today's content consumers are no longer playing a passive role when it comes to accessing online content. Instead, there are more and more services that allow users to play more active roles, by providing feedback (opinions, annotations, ratings etc.) that can then be used to enrich or enhance the consumed content. In this work we argue that a combination of user feedback and Web search engines can be harnessed to drive an effective and efficient form of document summarization. The key to our idea is the use of search engines as a means of generating short, query-sensitive document summaries. Specifically, when we submit a query q to a search engine, each result r is accompanied by a so-called snippet text, a brief extract containing fragments of sentences from the document that are related to the target query.

These snippets provide the raw material for document summaries: to generate a summary of document d we can look to the set of queries q1, ..., qn that have been submitted to a search engine and that resulted in d being selected. The associated snippets, s1, ..., sn, can then be combined to provide a document summary. The point is that the queries serve to identify key points of interest for users who have been interested in d, and provide a skeleton around which a summary can be constructed.

Thus, by mining search logs we can leverage implicit user feedback (query-result selections) to construct social summaries. In this paper, however, we focus on an alternative form of feedback by demonstrating how the bookmarks, and their associated tags, in a social bookmarking service like Del.icio.us can be similarly harnessed. Consider a user u that has bookmarked a page p using some bookmark tags b. We can treat b as a query for the bookmarked page, and retrieve a snippet for p by submitting b to a search engine and locating the snippet associated with the result that corresponds to p.

Paper Summary
In this paper we describe the above technique in detail, focusing on how to construct a summary of a page by ranking the snippet fragments according to their relative importance. Furthermore, we demonstrate how the resulting summaries out-perform those produced by a number of leading benchmark summarization systems under a variety of experimental conditions. Finally, we consider a number of additional benefits of this social summarization technique, including the potential to produce community-focused and query-sensitive summaries, and the role of these summaries as a form of enhanced, personalized result-snippets for search engines.

RELATED WORK
Automatic document summarization is a well established research area with a comprehensive programme of research providing a broad range of different approaches and techniques. While much work on summarization tends to focus on specialist types of document collections, such as news stories [14], technical papers [15] or medical documents [2], the rapid growth of the Web has motivated researchers to look at the summarization of more diverse Web content. In the following sections we will outline the main approaches to document summarization and discuss their relationship to the social summarization technique proposed in this paper.

Extraction vs. Abstraction
To begin with, there are two broad approaches to summarization: extraction or intrinsic summarization versus abstraction or extrinsic summarization. Extraction techniques attempt to summarize a document by identifying and extracting those parts of the document that are deemed to be the most important or salient, and the final summary is thus a collection of sentences or sentence fragments from the original document. In contrast, abstraction approaches do not preserve the original document content and instead prefer to paraphrase the source content to provide a more concise representation. In general, abstraction techniques have the potential to produce more condensed summaries than extraction techniques, but rely heavily on sophisticated natural language processing and generation. Extraction techniques, on the other hand, generally rely on shallow natural language techniques, usually relying on statistical term-counting as the basis for sentence selection and ordering.

One of the earliest automatic summarization techniques was described by [12] in 1958. Very briefly, a statistical approach was proposed in which the frequency of word occurrences in a document was used as an indicator of word significance. These significance indicators were then combined with positional information to obtain a sentence-level significance value so that a final summary could be produced from the top-ranking sentences. This approach was refined in [6] with the inclusion of a range of additional document features for estimating the significance of each sentence. For example, structural elements such as title, subtitles and headings, as well as language features such as cue words, were all combined to guide a more sophisticated model of sentence scoring.

Today, most commercial summarizers are still based on these basic approaches, with considerable research effort devoted to refinements such as the automatic learning of feature weights for sentence selection [9] and shallow language-based approaches such as analysing the discourse structure of text to aid sentence selection [13].

Two extraction-based summarizers that are especially relevant to this work are the Open Text Summarizer (OTS) [17] and the MEAD summarizer [16], which we use as benchmark comparison systems in our evaluation. Both combine shallow NLP techniques with more conventional statistical word-frequency methods to produce document abstracts from high-scoring sentences. For example, OTS incorporates NLP techniques via an English-language lexicon with synonyms and cue terms, as well as rules for stemming and parsing. These are used in combination with a statistical word-frequency based method for sentence scoring. Similarly, MEAD harnesses statistical NLP data by using a database of English words and their corresponding inverse document frequency scores calculated from a large document corpus. Once again this information is combined with word occurrence and positional information to extract high-scoring sentences.

Abstract-oriented summarization requires a more knowledge-rich approach, and the research can be divided into two main areas. Very briefly, the first relies heavily on syntactic parse trees for producing a structural condensate. For example, the work of [4] uses a model of topic progression derived from lexical chains. The second approach also uses natural language processing, but the final source text representation is conceptual rather than syntactic. This semantic conceptual space is manipulated to eliminate redundant information, merge graphs and establish connectivity patterns to produce a conceptual condensate of the original text (see for example the work of [8]). In general, generating abstract summaries is a very challenging task, requiring a combination of natural language understanding and generation, and is beyond the scope of this work.

Web Page Summarization
With the advent of the World Wide Web, the need for document summarization has become more mainstream, and has brought new challenges to automatic summarization. As mentioned above, in the past many summarization techniques were carefully optimized for particular types of documents (news articles, scientific papers etc.). Such optimizations are often not feasible or appropriate in the more content-diverse world of the Web. That said, Web content introduces additional features which may assist and guide the summarization process. For instance, Web pages include information features beyond their core content compared to a generic document, such as the structural information implicit in HTML mark-up. Moreover, Web pages do not exist in isolation, since the hyper-linked structure of the Web means that each document can be located within a network of inward and outward links. This connectivity information can also be used to guide summarization. The InCommonSense system [3] mines a Web page's context by extracting segments of text surrounding in-links to the page, followed by a filter process that chooses the most accurate segment to return as an extrinsic summary of the page. This contextual idea is elaborated on by [5], who look at combining this type of in-linking text with the original page content to produce a more sophisticated summary.

Particularly relevant to this paper is recent work on harnessing search engine click-through data to guide Web page summarization. For example, the work of [19] explains how two traditional approaches can be adapted to incorporate click-through information during extract-based summarization. Specifically, [19] proposes adapting Luhn's aforementioned sentence-selection algorithm by using both the local contents of a Web page and the query click-through data (queries submitted that have led to the selection of the target document) to modify the basic word selection metric, so that the frequency of a word in the document is combined with the frequency of the word in the queries to produce a hybrid word significance weight. A related approach is also used to adapt a Latent Semantic Analysis approach to summarization, originally described in [7]. Again, the weight of each query word is increased according to its frequency within the query collection. Both cases report significant improvements over the summary quality of non-click-through based methods.

The work described in this paper is concerned with producing extract-based summaries of Web pages, and like the work described in [19] we too believe that interaction or usage data (such as search engine click-through data) can be used to good effect to generate high quality summaries of Web pages. The novelty of our approach, however, stems from the use of query-focused snippets in addition to the raw query terms. In addition, we focus on how this perspective can be adapted to utilize other types of interaction data, such as the terms used to annotate bookmarks in social bookmarking services such as Del.icio.us.

SOCIAL SUMMARIZATION
Social summarization is a method of producing intrinsic Web page summaries by using query-result selection pairs as a way to identify fragments of page content that may be combined to produce an effective page summary. The summaries are social in the sense that they are derived from the interactions of communities of users as they search. That being said, this technique is not limited to the mining of search logs, and in this section we will describe how our social summarization technique can harness the interaction information captured by social bookmarking services such as Del.icio.us.

The Core Idea
As mentioned in the introduction, the core of our social summarization technique involves three basic ideas:

1. A page p can be associated with a set of queries, Q(p) = q1, ..., qn, such that each qi has led to the selection of p among a given search engine's result-list;

2. For a given query, qi, the search engine (SE) will produce a query-sensitive snippet, S_SE(p, qi), which contains a number of sentence fragments;

3. The social summary for p, SS_SE(p), can be constructed from the combination of fragments associated with Q(p), such that SS_SE(p) = f(∪_{qi ∈ Q(p)} S_SE(p, qi)), with the fragments rank-ordered according to their importance.^7

In the following sections we will describe these stages in more detail, focusing on how individual snippet fragments are scored and ranked during the summarization process. But first, we will begin by describing how this methodology can also be applied to the interaction data that makes up social bookmarking services such as Del.icio.us.

From Bookmarks to Queries
Services like Del.icio.us allow users to maintain online collections of their favourite bookmarks, with each bookmarked page p associated with a set of terms, t1, ..., tu, which make up the bookmark's tag. Thus a given page, p, which has been tagged by a set of users u1, ..., uv, will be associated with a set of bookmark tags, b1, ..., bv; each tag, bi, referring to the set of terms submitted by user ui when tagging p.

The set of p's tags, b1, ..., bv, obviously refer to the different ways that the users who bookmarked p view its content; typically each tag contains 2-3 salient terms. As such, bookmark tags look very much like search queries — that is, Q(p) = {b1, ..., bv} — and thus can be used to extract the necessary snippet texts for p during social summarization as described below.

7 In the following we will drop the SE superscript for convenience, and refer to SS(p) and S(p, qi) without loss of generality.
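To make the mapping from bookmarks to queries concrete, the idea can be sketched in a few lines of Python. Here `search_top_k` and `snippet_for` are hypothetical stand-ins for a real search engine API and its snippet generator (the paper later uses Lucene's snippet facility); their names and signatures are assumptions for illustration only.

```python
# Sketch: treat each Del.icio.us-style tag set for a page p as a search
# query, and harvest p's query-sensitive snippet whenever p is retrieved
# in the top-k results for that query (Equation 1 in the next section).

def queries_for_page(bookmarks):
    """Q(p): one query per bookmark tag set, e.g. ('java', 'platform')."""
    return [tuple(tags) for tags in bookmarks]

def harvest_snippets(page_url, bookmarks, search_top_k, snippet_for, k=10):
    """Collect snippets for every tag-query that retrieves p in the top k.

    search_top_k(query, k) -> list of result URLs (hypothetical API)
    snippet_for(url, query) -> query-sensitive snippet text (hypothetical API)
    """
    snippets = []
    for query in queries_for_page(bookmarks):
        results = search_top_k(query, k)   # top-k result URLs for this query
        if page_url in results:            # p may not be retrieved at all
            snippets.append(snippet_for(page_url, query))
    return snippets
```

Note that, as the paper observes, tags containing terms absent from the page may retrieve nothing useful, so some tag-queries simply contribute no snippet.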
Generating a Social Summary
Given a bookmarked page, p, a set of bookmark tags for this page, b1, ..., bv, and a standard Web search engine, SE, a social summary for p, SS(p), is generated in four basic steps: 1) extract the snippet texts, S(bi, p), to produce a set of sentence fragments; 2) normalise the sentence fragments to cope with fragment overlap and subsumption; 3) score each sentence fragment according to its frequency of occurrence across the snippets; and finally, 4) rank-order the normalised fragments to produce the final summary.

Snippet Extraction
Harvesting the snippet texts that form the raw material for the social summaries is relatively straightforward. In principle, each bi for p is submitted as a search engine query, p is located within the search engine's result-list, and its snippet text is recorded; see Equation 1. There are a couple of options to consider here. First, there is no guarantee that p will appear near the top of the result-list for bi. In fact there is no guarantee that p will even be retrieved for bi by the search engine, as it is not unusual for users to tag their bookmarks using terms that are not present in the bookmarked page. In practice, we limit the search for p to the top k results returned by the search engine for each bi; SE_k(bi) refers to these top k results. This limits the cost of snippet extraction to be O(n, k), but does mean that certain bookmark tags may not lead to snippets.

    S_k(p, b1, ..., bn) = ∪_{∀bi : p ∈ SE_k(bi)} S(p, bi)    (1)

Of course an alternative snippet extraction approach is feasible if one has direct access to a given search engine's snippet generator, in which case the snippet for page p given a query can be directly obtained. In fact, in the evaluation described below we draw on the query-sensitive snippet extraction library from the Apache Foundation's Lucene project^8 to do this.

Normalizing Sentence Fragments
More formally, each snippet is composed of a set of m sentence fragments (Equation 2) that have been extracted from the text of the target page by the search engine. In general, for a given set of queries we can expect to generate a large collection of sentence fragments. Some of these fragments may be identical to each other, some fragments might subsume other fragments, and many fragments will overlap.

    S(p, bi) = ∪_{j=1...m} S(p, bi, j)    (2)

Our final summaries will be produced directly from these sentence fragments, and significant overlaps will have an impact on summary quality; there is little to be gained from including two fragments in a summary that are all but identical, for example. While we could process the summaries to eliminate this type of redundancy at summary formation time, we choose instead to eliminate redundancy by producing a normalized set of fragments prior to summary formation. More formally, matching sentence fragments are identified according to Equations 3 and 4. If there is a significant overlap between fragments (in practice, t = 0.8 works well) then the shorter fragment is said to be dominated by the longer fragment; see Equation 5. To normalize the fragments across the snippets for page p we replace all dominated fragments with copies of their maximally dominating partners (Equation 6). In what follows we use S'(p, bj, y) to refer to the normalized version of S(p, bj, y).

    overlap(S(p, bi, x), S(p, bj, y)) = |S(p, bi, x) ∩ S(p, bj, y)| / max(|S(p, bi, x)|, |S(p, bj, y)|)    (3)

    match?(S(p, bi, x), S(p, bj, y)) = 1 if overlap(S(p, bi, x), S(p, bj, y)) ≥ t; 0 otherwise    (4)

    dominates?(S(p, bi, x), S(p, bj, y)) = 1 if match?(S(p, bi, x), S(p, bj, y)) and |S(p, bi, x)| > |S(p, bj, y)|; 0 otherwise    (5)

    S'(p, bj, y) = S(p, bi, x) if dominates?(S(p, bi, x), S(p, bj, y)); S(p, bj, y) otherwise    (6)

Scoring Fragments
For a page p we now have a set of snippets (generated from queries over p), each made up of a normalized set of sentence fragments. Intuitively, it seems reasonable to assume that fragments which occur more frequently are likely to be more important; after all, they are associated with page segments that are linked to the common ways (queries or bookmark tags) that users refer to p. In this way the scoring model favours aspects of pages that many users are interested in, and these aspects will be more prominent in the resulting social summaries. Accordingly, we can compute the score of some fragment, f, as the number of times that f occurs in the snippets generated for p; see Equation 7.

    score(f) = Σ_{i=1...v} occurs?(f, S'(p, bi))    (7)

    occurs?(f, S'(p, bi)) = 1 if f ∈ {S'(p, bi, 1), ..., S'(p, bi, m)}; 0 otherwise    (8)

8 http://lucene.apache.org/
Fragment Ranking & Summary Generation
Producing a final summary from the scored, normalised snippet fragments is now straightforward. First, compute the union of all of the normalised fragments (see Equation 9). Second, rank-order these fragments in descending order of their frequency scores, as shown in Equation 10.

    Frags(p, b1, ..., bv) = ∪_{∀i,j} S'(p, bi, j)    (9)

    SS(p) = {fi : 1 ≤ i ≤ |Frags(p, b1, ..., bv)| ∧ ∀i ≤ |Frags(p, b1, ..., bv)|, score(fi) ≥ score(fi+1)}    (10)

An Example Social Summary
Figure 1 shows an overview of social summary generation for a portion of the Wikipedia page "Java Platform", using the queries java platform and java virtual machine. The snippet produced by a Web search engine for each query q1 and q2 is composed of text fragments extracted from the source document which are related to the query terms. For example, in Figure 1(a) and (b) we see that the snippet for the query q2 = "java platform" is composed of three fragments from the source text:

• f1 = "The Java platform is the name for a bundle of related programs, or platform, from Sun Microsystems"
• f2 = "Java Platform (formerly Java 2 Platform[1])"
• f3 = "the current version of the Java Platform is alternatively specified as version 1.5 or version 5.0 or version 5"

The union of all such fragments from the two snippets generated for q1 and q2, namely S_SE(p, q1) and S_SE(p, q2), provides the core content for the social summary, and the fragments are scored and rank-ordered according to the method described above to produce the final summary, SS_SE(p) (Figure 1(c)).

EVALUATION
So far we have described an approach to document summarization that harnesses external information cues, such as search queries or bookmark tags, to guide a summarization process. This social summarization technique constructs a summary by selecting fragments of sentences that recur within the snippets that are generated from these information cues, thereby tailoring the final summary according to the ways in which it is used in practice. In this section we compare the quality of the summaries generated by our social summarization technique to comparable summaries generated by two leading benchmark systems, OTS [17] and MEAD [16].

Setup
Test Data
To begin with we will explain the test data, including: a collection of documents to summarize; a set of gold-standard summaries to compare against the automatic summaries; and a set of information cues to use as queries over the document collection.

As mentioned previously, we use data from the Del.icio.us social bookmarking service for this study. To begin with we downloaded a sample of 3781 bookmarked pages and their most recent bookmark tags, up to a maximum of 50 per page; these tags are the information cues that we will use to generate the snippets used by our summarizer (SS). Furthermore, 1386 of these pages contained description text embedded within the HTML meta-content description tag. This facility optionally allows a page author to provide a brief summary description of the page in question and, for the purposes of this experiment, plays the role of a gold-standard (human-generated) summary. These 1386 pages, their meta descriptions, and their recent tags form the test data used in this study.

Methodology
For each page we generate 3 different summary types from the visible page content; note that this means no meta data or HTML content or structural information is made available to any of the summarizers. Each SS summary is generated according to the approach described in this paper. Note that for the purpose of this experiment, rather than relying on an existing commercial search engine to produce snippets, we used the Lucene snippet generator from the Apache Foundation, configuring it to produce snippets of similar length and number of fragments to the main Web search engines such as Google, Yahoo and MSN Search. For each SS summary we also generate a comparable OTS and MEAD summary of the same length for the purpose of comparison. Moreover, we generate SS summaries under a number of different conditions, varying parameters such as the number of queries used to generate the SS summary and the target length of the summary, as discussed below.

Evaluation Metrics
When comparing each automatically generated summary to the corresponding gold-standard we used the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) package [10]. ROUGE measures summarization quality by counting overlapping n-grams, word sequences, and word pairs between the candidate and gold-standard summaries, a common approach that has been shown to correlate very well with human evaluations [11]. According to [11], the ROUGE-1 (unigram co-occurrence) metric is highly effective for single-document summarization tasks and for the evaluation of short summaries, and so we used this in our experiments along with ROUGE-L, which is based on the longest common subsequence between candidate and gold-standard summaries.

Experiment 1 - A Comparison of Summary Quality
To begin with we will look at overall summary quality by comparing the summaries produced for each page by each of the 3 techniques (SS, OTS, MEAD), with SS using the full complement of tags/queries retrieved for the page in question; this means that all fragments occur in the final SS summary. Note, the average length of the SS summaries was 24% of the original. In each case we compare the resulting summaries to the gold-standard using the ROUGE measures.

The results are presented in Figure 2 and show a clear benefit to SS across all evaluation metrics. For example, we see that SS achieves a relative improvement in its precision, recall, and F-measure scores over OTS of between 31% and 39%; for MEAD the relative advantage enjoyed by SS is between 24% and 29%. In all cases the differences between
Figure 1: An overview of social summarization. (a) A page p is associated with a set of queries used to access p (or tags used to bookmark p), in this case q1 and q2. (b) For each query, the search engine will produce a query-focused snippet, S_SE(p, q1) and S_SE(p, q2), composed of sentence fragments from the page content that are related to the queries. (c) Fragments are scored and rank-ordered to produce the final social summary, SS_SE(p).
Figure 2: Overall summary quality in terms of precision (P), recall (R), and F-measure (F), under ROUGE-1 and ROUGE-L, for SS, OTS and MEAD summarizers.

Figure 3: Recall scores for SS, OTS, and MEAD summarizers generating summaries of different lengths.
SS and the benchmark summarizers are significant at the 95% confidence level, with the appropriate error bars shown in the figure.

Experiment 2 - Summary Length vs. Quality
One of the main motivations behind our work is the desire to produce competent, compact summaries of Web pages for use in applications such as search result summarization or converting content for small-screen mobile devices. As such we are interested in generating highly compact summaries. In this experiment we consider the quality of summaries of different lengths by eliminating low-scoring fragments from the final social summary. The experiment above generated summaries with an average length of 24% of the source document, and in this experiment we look at summaries that are 10%, 20%, 30%, 40% and 50% of the original document length.

One point to note here is that we cannot always generate a social summary above a certain length for a given document, because we only focus on a fixed set of queries during summarization and these available queries may not lead to snippets that provide broad coverage of the document. Thus when reporting quality results below we highlight how many documents were summarized for each size. Otherwise the experiment proceeded in the usual way: for each document we generated SS, OTS and MEAD summaries of size k% and evaluated the result, relative to the gold-standard summaries, using the various ROUGE metrics.

The results are presented in Figure 3. For clarity we only show the recall results in this experiment, because we are especially interested in demonstrating how, as each technique creates longer summaries, these summaries cover more and more of the gold-standard concepts and content. Thus, the graph presents the ROUGE-1 and ROUGE-L recall scores for each of the 3 summarizers when generating summaries that are 10% to 50% of the original document size, in increments of 10%. Note that the numbers in brackets along the x-axis denote the number of documents tested for each summary size.

As expected, as summary size increases we see a gradual improvement in the recall score for each technique. Once again the results point to a clear advantage for the SS approach, which produces summaries of significantly higher recall across all sizes. Moreover, the relative advantage of SS is largest for more compact summaries. For example, when generating summaries that are 10% of the original document, the SS approach produces summaries that are 43-45% better than those produced by OTS and 35%-37% better than those produced by MEAD, in terms of recall. Of course, as expected, we do see the quality of the OTS and MEAD summaries improving as summary size increases, but it is worth noting that both OTS and MEAD require significantly larger summaries to achieve the SS recall at the 10% level; in both cases summary size must be about 30% before OTS or MEAD summaries achieve the recall scores achieved by the 10% SS summaries.

Although we have just presented the recall data in this experiment, similar trends are found for the corresponding precision and F-measure scores. In each case we find that the SS method continues to significantly out-perform OTS and MEAD across all summary sizes.

Experiment 3 - Search Activity vs. Quality
So far we have demonstrated how the SS technique can generate summaries of superior quality to those of OTS and MEAD by using information cues, such as social bookmark tags (or indeed search queries), as a basis for fragment identification and selection. In this section we consider the influence of these information cues on summary quality. In particular, we consider the relationship between the number of available cues (bookmark tags, in this case) and summary quality. To investigate this we use different-size subsets of queries as the basis for SS summary generation, focusing on query sets of size 1-10, 11-20, 21-30, 31-40, and 41-50 queries. For each page we generated SS summaries using different numbers of queries selected at random from those available, producing nearly 25,000 different summaries in total.

To begin with it is worth speaking briefly about the relationship between average SS summary size and the number of unique queries/bookmarks available for use during summarization. As the number of available queries increases there is a tendency towards larger social summaries; in general more queries means greater coverage of document content. Bear-
we propose combining the snippets and text fragments that are associated with queries that are similar to qT. Moreover, when scoring a particular fragment we can give more weight to fragments that are associated with queries that are more similar to qT.

    Sim(qT, qC) = |qT ∩ qC| / |qT ∪ qC|    (11)
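Equation 11 is a Jaccard coefficient over query terms. A minimal sketch, assuming queries are whitespace-delimited term strings (with qT the target query and qC a candidate query, as the surrounding text suggests):

```python
# Sketch of Eq. 11: Jaccard similarity between a target query qT and a
# candidate query qC, computed over their term sets.

def query_similarity(q_target, q_candidate):
    """|qT ∩ qC| / |qT ∪ qC| over the queries' terms."""
    t = set(q_target.lower().split())
    c = set(q_candidate.lower().split())
    return len(t & c) / len(t | c) if (t | c) else 0.0
```

Under this measure, for instance, the queries "java platform" and "java virtual machine" share one of four distinct terms, giving a similarity of 0.25, which could then weight the fragments contributed by the candidate query's snippet.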