Summarization of Documents That Include Graphics

Robert P. Futrelle

Biological KnowledgeLaboratory, Collegeof ComputerScience 161 CullinaneHall, NortheasternUniversity, Boston,MA 02115 [email protected]//www.ccs.neu.edu/home/futrelle/

Abstract the fi.ct that graphics plays a crucial role in so many Whendocuments include graphics such as diagrams,photos, documents. It is tempting to assumethat one can index and anddata plots, the graphicsmay also require summarization. summarize documents with graphics by focusing on This paper discussesessential differences in informational captions or running text commentaries. Unfortunately, content and rhetorical structure betweentext and graphics, manyimportant figures occur with little associated text - as well as their interplay. Thethree approachesto graphics the figures "speak for themselves". summarizationdiscussed are: Selection, in whicha subset This paper lays out someof the problems and prospects of figures is chosen; Merging,in whichinformation in for this new area of research on the graphical multiple figures is mergedinto one; and Distillation, in summarization of graphics. It first describes the basic whicha single diagramis reducedto a simpler form. These characteristics of graphics, comparingit to text. A critical procedureshave to considerthe contentand relations of the graphical elements within figures, the relations amonga component of the approach is metadata, a formal collectionof figures, andthe figure captionsand discussions description of graphics content that can be used by an of figure content in the running text. Weargue that for automated summarization system. This is in part what summarizationto be successful, metadata, a manipulable diagram parsing produces (Futrelle and Nikolakis 1995). representation of the content of figures, needs t, k then gives examples of manually constructed summaries generatedor includedinitially. Often,the textual refert , ~ing the strategies of Selection, Merging,and Distillation. to figures are not veryinformative, so it will be necessa ’ltilding ,m these, it then discusses the design of automated generate metadataby diagramparsing, as we have dont ~,.~ r",iqucs - workin progress. to developintelligent authorh~gsystems that will allowthe authorto easily includemetadata. This paperintroduces this newarea of research with manualsummarization examples Basic Concepts and Terminology and follows themwith a discussion of automatedtechniques Graphics and language are fundamentally distinct, because under development. For example, here is howtwo data language builds logical structures from words, which are graphs might be merged: arbitrary signs. In contrast, graphics is analogical, using spatial structures as the information-carrying elements (Sloman 1995). But from another point of view, graphics and language are similar, since both arrange familiar and understood elements in various ways to state novel information. Documents can be characterized by their genre (journal article, newspaperarticle, book, etc.) and their domain(operating systems, microbiology, etc.). Text time time ~ time is esse~ttially propositional, presenting statements in tatural language that can be mappedinto formalisms such -~ the predicate calculus. Latour has argued cogently Introduction ,at,~:r 1990)that graphics has been critical for the growth of science and technology because authors can "show" a The goal of automating the summarizationof text has been reader somethingremotely, without the author or the thing pursued extensively, e.g., the work referenced ill shownbeing present. (ACL/EACL-Summarization 1997; Mani and Maybury Graphics and language are different modalities, because 1998). But little has been done to produce summariesthat they require distinct interpretation functions, whereasboth select, merge,or distill the figures in documentsto produce mayrendered in the same medium,e.g., on the printed page a smaller or simpler set of figures and captions, in spite of (Stenning and Inder 1995). Graphics can be divided into veridical entities, such as photographs and drawings of Copyright©1997. American Association for ArtificialIntelligence real-world structures, and abstract diagrams such as flow (www.aaai.org).All rightsreserved. charts or data graphs. In graphics, there are also conv~’ntional classes of symbols, such as arrows, tick it|arks, and error bars, that are learned elements of the

92 visual lexicon. another. For graphics, text within figures as well as It is useful to distinguish the term, graphics, from more captions augment the purely visual material. HTMLimage specific ones such as: figures which are instances of mapsallow regions within figures to be defined that link to graphics in documents; diagrams which are line drawings; locations in the sameor different documents. and images which are raster-based. Diagrams are built up Graphicsmetadata is normallyof little direct interest to a from content-free items such as lines, curves, and closed humanreader, because it would contain knowledge that regions such as polygons, called vector data. The text would be self-evident when viewing the images. Complex associated with figures can be in captions or in the runni,,g and for,nally structured metadata such as parse trees or text, or includedwithin the figures. FOPCrepresentations would not be of interest to a reader Metadatais normally propositional material th;,~ ,ives eithzr. additionalt informationabout diagramstructure and c~ .mete. but is not alwaysmade available for viewingby the r.ad~ r. hdelligent Authoring Systems (IAS) Summarizationof text or graphics can be indicative or informative (Paice 1990). The former presents material Using these systems we have proposed (Futrelle and that indicates the subject domain,while the latter contains Fridman 1995), an author could easily generate metadata, information that is a subset of the original document.The and most importantly, could generate structured graphics three types of graphics summarizationwe will describe are for which the metadata is developedas an integral and non- selection, merging, and distillation. In selection, a subset obtrusive part of the drawing process. (That earlier work of the figures in a documentis presented unchangedin the focuses primarily on an IAS for text, but the principles summary. Merging combines information from more than apply to graphics.) Manydrawing and writing tools exist, one figure into a single figure, whereas distillation, though most are not particularly intelligent and few operates on a single figure to produce a simpler one. Both produce useful metadata. of the latter approaches involve generation of distinct new Probably the simplest approach to metadata would be for figures. All three methodsmay require parallel operations the author to provide a plain-text description of figure on the text, especially to generate captions. content, in a form that would go beyond the normal figure captiol~. It would be essentially the same as any verbal (tex,u~l) account of a scene, such as in a telephone Metadata con cersation, a novel, or in a written news report lacking f~gures, l’he plain-text descriptions could then be used by Wedefine metadata as all information beyond the surface con ventional text-based summarizationsystems. form of the document that a reader normally sees. An alternative method would be to use a form-based Metadata is information in a form that can be reasoned interface in an IAS, possibly keyed to the domain,allowing about by automated procedures. Metadata for text can the author’s input to be turned into a morestructured form consist of word frequencies, collocational statistics, such as knowledge frames. The system could allow the thesaural relations, sentence parses of varying author to indicate links betweenform elements and objects sophistication, and discourse structure such as resolved or regions in the diagram being drawn. anaphors. The description changes somewhat when we To generate more structured graphics metadata, we first include diagrams, because it seems appropriate to consider consider normal drawing applications. In these, predefined text and graphics each serving as metadata for the other, elements are available (lines, ovals, polygons, Bezier e.g., a figure caption. But true graphic metadataconsists of curves, etc.). In addition, constraints can be imposedsuch structural descriptions of figure content, which might be as positioning to a grid, or aligning or spacing horizontally plain-text or a syntactic parse of the diagram(Futrelle ar or vertically. Transformationssuch as translation, scaling Nikolakis 1995) or a formal semantic structure. A simpk and ro,ation are available. exampleof graphics metadata is the numerical data behil. The key to building an IAS for graphics that builds the a spreadsheet data plot. -eded metadata is to mapvarious predefined graphical In manualanalysis, the characterization of a figure arises ctements onto semantically meaningful concepts. This is from the visual recognition and analysis done by the what is customarily done in CADsystems, e.g., a VLSI reader. In an electronic document, automated image design system in which "lines" are interpreted by the understanding techniques could be applied to GIF and system as electrical conductors and the designer selects JPEG figures to develop metadata. This is a difficult various items defined by their function and their form, e.g., problem, especially if the techniques are to be applicable a NANDgate, and places and "connects" them in a across a variety of domains. If a figure is represented in drawing. vector form or is easily convertible to vector form, then The above approach to diagram authoring can be there are techniques that we have developed to parse such described as using a "semantic construction kit". The final figures to generate metadata(Futrelle and Nikolakis 1995). structures, beyond the purely visual form, constitute Authors generate metadata as a matter of course. The graphical metadatathat could be used for indexing, search, superstructure of a document, represented in formal tern and automated graphics summarization. such as SGMLmarkup, is metadata. Links to ot’ r documents such as citations or HTMLhyperlinks .,

93 Summarization by Selection launch with no distinguishing features. The selection procedure should choose the right-most figure in this case. In this example, we consider three figures, grouped as a The automation of this selection is not easy, because of the single entity, Fig. 1, and formulatecriteria for the selection complexity of the caption. For example, the terms "thin of one of them as the summary(often called "extraction" in groundedwire" and "copper filament" are co-referential. It the text summaryliterature). The figures originally is even more difficult to deduce from the caption that the appearedtogether as color halftones in Scientific Amel:’c.,.,t ~cketis involvedin the third figure, but it is the proximate (Diels et al. 1977)(pg. 53) but are representedhere ~:s ’in,~ cause of the "copper filament" causing "a conductive drawings. channel of ionized air." This requires reasoning beyond syntax to world knowledge, which forces the analysis to ~:ocus on a specialized (sub-)domain. (To appreciate difi’iculties of such an analysis, the reader should imagine making the figure selection based entirely on the text, without looking at the figures themselves.) Another important point about the caption is the distribution of text devoted to the three figures. There is not a one-to-one correspondence between text spans in the caption and the three component figures, but an approximate division can be identified. The first caption sentence serves as the title (7 words). The segment, "The small .... groundedwire" (16 words) correspondsto the left Figure 1. Original caption: "ROCKETStrigger lightning in figure, the overlapping segment, "a spool ... in flight variousfield experiments.The small, specially constructed missile(left) carries at its base, a spoolof thin, grot,~d-~d ¢center)" (1 l words) corresponds to the center figure, and wire that unwindsin flight (center). The first stroke ,he longest span, "The first stroke .... path (right)." (49 triggered in this wayfollows this copperfilamcnL .,,~d x~?ords)":orresponds to the right figure. Thoughthis sort of creates a conductivechannel of ionizedair; later strok,. ~f n?!ysis is attractive, there is a potential circularity to the the sameflash event (whichcan occur repeatedlywiil,.l~ ¯ gum~.~t,since we are relating the text to figure content, fraction of a second)travel along increasingly tortuous which maybe knowonly through the text! routes as the wind deformsthe conductivepath (right)." The salience measures are automatedtechniques, but the Theline drawingsare for discussionpurposes, schematizing determination of the figure content in waysthat can relate the originalhalftones. to the salient terms is the challenge. It is not even The only mention of rockets in the running text of the appropriate to assume that there will always be text from article is: "... researchers ... can trigger lightning using which the most salient content is determined, which would small rockets that trail a thin, grounded wire." This then control figure selection. Indeed, a figure mayitself statement parallels the first sentence in the caption. If we deliver important information by itself without much were to generate a summaryrelated to rockets using the supportingtext. running text and caption by any of a numberof text-based methods, the dominant content would be rockets oigger Metadata for selection. lightning, appearing in both places; it is the most salient (Boguraev and Kennedy 1997). Other approaches for It is worth discussing how the selection decision above discovering the most significant terms use the tf*idf could profit from the appropriate graphics metadata. Plain- text meiadatafor the figure might take the following form: measure (Salton 1989), or assume Bernoulli distributions i of terms, which is useful for small samples (Dunning 1993). All of these methodscompare the local frequencies "Figure, rocket (left): The rocket is sitting stationary of terms within an article with the frequencies in a broader the ground on a sidewalk. Attached to the rocket base is corpus - simply choosing the highest frequency open-class a spool of copper wire. Some of the wire is shown terms is not adequate. In analyzing captions in this way,it unwrappedfrom the spool." maybe useful to consider saliency measuresacross a set of occurrence contexts: within figures, in captions, in "Figure, rocket (center): Shows the horizon and referring text, in the article, in a containing technical portion of the sky that contains the rocket and the corpus, and in a larger corpus containing additional non- trailing wire. The rocket is ascending in flight. The technical documents. It should be possible to exploit such rocket is trailing the copperwire (not visible). Thelower a hierarchical sequence of contexts to generate a end of the wire is attached to the groundso the wire can hierarchical set of descriptive text items. conduct electricity betweenthe sky and the ground." Only the right-most figure illustrates rockets trigger lightning. The first two showaspects of the rockets with "Figure, rocket (right): Showsthe horizon and a portion no explicit inclusion of lightning. The left figure showsthe of the sky that contains the rocket and the trailing wire wire and spool, and the center shows a "generic" rocket and many lightening strokes. (Rocket and wire not visible). The first lightning stroke triggered by the wire

94 follows the wire to the ground, creating a conductive L-C-R channel of ionized air. Thus, the first lightening path follows the smooth path of the wire. Later lightning strokes of the same flash event (which can occur repeatedly within a fraction of a second) travel alo,,~ ENABLE increasingly tortuous routes as the wind deforlns ’,.,- conductive path." L-C Lightningpaths (R) Note that the right-most figure metadata is similar to material in the original caption, since that was quite detailed. The metadata contains definite noun phrases but no pronouns, to facilitate natural language analysis. A BACKGROUND text-based summarization system using the plain-text would presumablyconclude that the third item is the most salient. A more structured approach to metadata for the figures Rocket(L) Rocketin flight (C) could be used for indexing and summarization. This could be generated by an IAS, a form-basedinterface, or parsing Figure 2. Rhetorical Structure Theory(RST) tree (Mann the plain-text metadata. An example is shown here in a and Thompson1987) for the three figures in Fig. 1. The hybrid frame-like structure: nuclei are underlined, with satellites occupyingthe other branchof eachrelation. Thenucleus that is attachedto the Figure ID: rocket, left root is the right-mostfigure (R), whichis retained as the Foreground: summaryby selection. The left-most figure (L) is the Item: rocket buckgroundfor the center one (C), and the left mergednode State : stationary (L-C)enables the final, right-most,figure. Location: on ground Rhetorical structures such as the one in Fig. 2 are typically Background: generated by hand, because automated generation depends Items : walkway, building, trees on rather deep understanding of content. They are nonetheless useful for discussion purposes. Figure ID: rocket, center In the rhetorical structure of figures, as with text, it is Foreground: important to keep in mind that there is an informational Item: rocket componentthat is focused on the content of the figures, as State: flying well as an intentional component that focuses on the Location: in air author’s goal of emphasizing certain points in the Background: discourse. These are not entirely separate concepts and one Items: sky, clouds does not necessarily dominate the other (Moore and Implicit: Pollack 1992; Nicholas 1996). trailing(rocket, wire)

Figure ID: rocket, right Summarization by Merging Foreground : Item-set : lightning Mergingthe information from various figures into a single Background: one could be used for the rocket example, but it is Items: sky, clouds particularly appropriate for plots of data, especially for sets Implicit: of sinfilar or contrasting data. It is quite commonto plot equal (path (wire), more than one set of data in a single data graph to effect path ( first (lightning) this type of comparison. This merging within a single figure is already in an efficient form and difficult to reduce The "Impi ic i t" slot holds informationabout the figures further. So we will concern ourselves with data in multiple that is not necessarily visually obvious. figures, e.g., the data in the four subfigures shownin Fig. 3, taken from (Kim, Watrud, and Matin 1995). Rhetorical Structure Theory (RST) Basedon the figure content, it is possible to argue for the rhetorical structure shownin Fig. 2.

95 m 1000 1 12000 A

-10000 800

-8000 !

.1 600 ~ ’ _6000

" r 1_400 .~ _4000 -4

200 .01 .01 I I I I I I I I I I I I I I 2OOO 0 2 4 6 8 10 12 14 16 0 2 4 6 8 10 12 14 16 Time(hrs) Time(hrs)

5000 1 12000 -D m LllOOO ) -4000 10000 9000 i .1 8000

./ 7000 i-3°°° 600O

01 5000 .01 I I I I I I I I 2000 0 2 4 6 8 10 12 14 16 0 2 4 6 8 10 12 14 6 Time(hrs) Time(hrs)

Fig. 3. Originalcaption: "[3-Galactosidasesynthesis during growthand stationary phase in the mini-Tn5::/acZ1fusion ,mutants of P. putida MK1.(A to D) MK201,MKI04, MK107,and MK114,respectively. Cultures were grownon M9medium plus 0.1%glucose; other methodsare described i,~ the text. Opensymbols, growth; solid symbols, b- galactosidaseactivity." This wasFigure 2 in the referenced document.It is faithfully redrawnfrom the original. In Fig. 3, there are four open circle datasets and four

96 solid circle datasets. These,rather obviously, fall into thrte data redundancy in the four subfigures. To maximize behavior classes. All the open circle datasets behave i,l informationin a research article, authors try to include only essentially the same way; they form the first class. Ti~c f;gures that present non-redundant information. By their solid circle datasets in B, C, and D are similar to one ve~3nature, such sets of non-redundantdata are difficult to another; they form a second class. The solid circle da:asef merge. In such cases, selection and distillation techniques in A is the outlier; it behavesquite differently and formsa ~, ould have to be used, techniques that makedecisions as third, singleton class. Given that there are only three to which information is the most important or germaneto classes of behavior, it is not difficult to construct a rather the document,discarding what is more peripheral. succinct summary,using only one plot, as shownin Fig. 4. 1.0 ’~ 1 Summarizationby Distillation = In this example, we consider a single figure, 5, describing the series of operations from Java source code through to .75 its execution on hardware (Hamilton 1996). The goal is generate a simpler figure that omits and/or combines the objects in the original (Futrelle and Chambers1998). The strategy is one that can be rather universally applied to flow charts: the omission of intermediate steps. Such a simple t~chnique is inadequate for text summarization, =. becam_ the most salient sentences are sometimes in a single paragraph, or maybe the first sentence of each of a ° sequence of paragraphs. In addition, we take into account .25 the diagramsuperstructure, its columnarorganization. ! Bytecode I loader .01 0.0 0 2 4 6 8 10 12 14 16 I:=I Byte~cOde I 1 Time (hrs) verifier I ICOffJ’ ~er Figure 4. A mergedgraphic, a data plot, showingthe three distinct classes of behavior evident in Fig. 8. The generationof this plot wouldalso require the cogenewion Iinterpreter . ~i of a captionwhich, at the very least, mustidentify the t, c I Java data sets, i.e., that the solid circles arose frombac,t.i .’ byteeode mutant strains MKI04, MKI07, and MKII4, and the I crosses from strain MK201.The open circle data applies to all four strains. Sincethe numericalvalues of the right-haad data scale differ in each of the four original data graphs, only normalizedvalues are used in the mergedplot. Each dataset group could be averaged, but here only example Hardware datasets selected fromsingle plots are used, whichshows the data variancebetter than an averagewould.

The automation of the generation of Fig. 4 is more Figure 5. Original caption: "Executing a Java applet" straightforward than for the other examples, since This was Figure 1 in the referenced document. It is statistical analysis of the datasets could reveal the faithfully redrawnfrom the original. appropriate equivalence classes. Oncethis was done, the generation of a revised caption wouldnot be too difficult; it Fig. 5 showsthe full original figure from the document, could be structured as an addendumto the original caption. with 10 stages and 10 arrows. The figure is a directed Hovyhas emphasized the production of summaries using acyclic graph (DAG)organized as three vertical columns, analysis followed by generation of summarizatioti te.q wi: the right-most column containing a split and merge. (Hovy and Lin 1997). All of our examples require There is no other figure in the original article that generation of altered captions. res.,:mb’.-s Fig. 5, so we musteither omit it or summarizeit Muchof the data published in data-intensive fields such in sc,Jc way. Weshow three successively simpler manual as biology is not as clear-cut as in this example. Tt : summaries below, which attempt to distill the most mergingof data here is facilitated by the high degree of important structures fromthe original.

97 the text, it may be appropriate to also include the Network I BytecodeI or tile system stage. Viewing the figure in the large, this Java _1,oa,:,er I st:lic is intermediate between the processing the source, on IsourceI l!.,e l~.ft, and execution, on the right. There are two textual references to the figure in the original document. The first is the caption, only four 4, wo~’ds, but containing critical content. The only explicit Java reference to the figure in the running text is, "In this case, bytecode Bytecode Just-in-time interpreter compiler program integrity is guaranteed by the runtime, shown in Figure 5, which verifies that the bytecode corresponds to valid Java code and does not contain any viruses." This I discussion is a bit confusing, since the verification is L Hardware shown being done immediately after loading and the I runtime is shown as a separate later stage. But the text does emphasize the verifier which we have omitted based on an analysis of the figure itself. This could argue for the Figure 6. Executing a Java applet. Three internal inclusion of the verifier stage in even the simplest nodes from Fig. 5 have been deleted. summary, as shown in Fig. 8. Fig. 6, shows a moderate reduction of the figure to a summary with 7 stages and 7 arrows. The intermediate Figure 8. Executing a compiler step is omitted, as well as the Bytecode veril,~. Java applet. Verifier and Bytecode interpreter steps. The distillation removes th,: Iairce node reinserted intermediate stage in the left-most column, retains the only because of its mention item in the middle column and deals with the right-most in running text. column by omitting the commonintermediate verifier stage r and the intermediate runtime stage on the left split section headed by the interpreter stage. It could be argued that va:ode every stage between the loader and the hardware could be I omitted in the right-most column. One argument against this, is that it would give strong weight to the non-specific V item, "Hardware". BytecodeJ Iverifier J Java Figure" 7. Executing a Ja~ . source applet. Additional /\ internal nodesand fin ,1 Java Just-in-time I "Hardware" n J0e V rut:time compiler "vaI deleted. IbytecodeJ It is important to note that some of the items in the /o\ original Fig. 5 are not mentioned anywhere in the running text of the article. The most notable example is the lack of Java any discussion of the Just-in-time compiler. runtime compiler It is not difficult to imagine a content representation for J Just-in-time Fig. 4 that could be used by a distillation algorithm. The figu,e is in the form of a DAG,so the representation of the figure can be exactly that, a directed acyclic graph, with Fig. 7 is a further reduction to 4 stages and 3 afro’, s text contained in each node, Fig. 9. In addition, the from the original 10 and 10. What is retained here is the . olumnar structure of the original figure is indicated by initial stage from Fig. 6, the Java source, and the non- rectangles grouping the nodes. trivial end stages of the process, the interpreter ar.d J1T compiler (Hardware stage omitted). The decision include the end-stage of the left-most column, the bytecode, is a difficult one, but "bytecode" is prominent in the running text, whereas the loader is a generic stage that receives little discussion. Depending on its prominence in

98 the last example. We will further assume the most common mixture of text and graphics which is the following:

¯ The graphics contain only a modest amountof text. ¯ The caption discusses only limited aspects of the figure, perhapsnot including its significance. X= nodesdeleted to createFig. 4 ¯ The running text discusses manyaspects of the topic X=additional nodes deleted to createFig. 5 illustrated in the figure but not all of it in direct reference to the items displayedin the figure. ¯ = noderetained in Fig.6 because ofits mentionin the running text Using these assumptions, the following computational strategy could be used to distill a diagram. Steps 1-4 focus on text salience, 5 on graphics structure and salience, and Figure9. Nodesdeleted fromFig. 5 to create figures f td 6- 10 on the distillation process: 7. Retentionof nodefor Fig. 8. The distillations are shownschematically in Fig. 9. The I. Computethe statistical linkages between the running text, caption, and text within the diagram. algorithmconsists of first deleting interior nodesof triples 2. Find cue constructs in text that point to the figure to and final nodes of pairs, excepting the final leaf node. The pinpoint relevant text, e.g., "is shown","the figure second set of deletions removes the node which is the interior of the triple at the highest level of organization of demonstrates....", "Fig. 15 shows....", etc. the DAG(Network or file system) and deletes the low- 3. Locate definite noun phrases that appear to reference information (Hardware) node on the right, retaining the the figure content. 4. Using 1-3, determine the most salient text-related nodes that distinguish the two branches. Clearly, we are dealing with a partitioned DAG.The partitioning allows a portions of the figure. hierarchical approach to be used. A similar argument can 5. Using genre- and domain-specific graphics routines, be made for the generation of Fig. 8, except that the tabulate all topological and layout structures of the diagram, e.g., looking at initial and final elements, verifier node is retained rather than deleted, becauseof the reference to it in the running text. This is indicated by the topology, alignments, graphically and textually filled circle in Fig. 9. To summarize,the uncrossed nt,~2, e nphasized elements, etc. in Fig. 9 are all that are retained in the simplest sumn,. , 6. Pick the graphical elements to be retained or Fig. 7, while the filled nodeis addedto those to cleate .... emphasizedin the distilled version, using measures five nodes in Fig. 8. The form shownin Fig. 9 em?hasize~ from 1-5. the topology of the graph, its topovisual structure (Harel 7. If elements are to be omitted, use topological or layout containment to "seal the gaps". This can be 1995), since node labels are not included and the metric layout of the original figure is not relevant. aided by coalescing low-scoring elements into single An alternate view is to write the partitioned DAGas a units. series of indented structures analogous to sections and 8. If elements are to be emphasized, use properties subsections of a text. Then the algorithm could be thought such as size, color, and central location to emphasize of as retaining the initial "sentence" of each paragraph and them. deleting interior or final sentences. This puts the approach 9. Restructure text to matchthe distilled graphics using text summarization procedures focused on the text more on a par with text summaryalgorithms. An RSTtree could be built for this examplealso. most closely related to the summarydiagram. 10. Generate the final summarygraphics and text.

Approaches to Automation Thoughthe steps above are not concise directions on how to bdi!l summarization algorithms, they are adequate to Wewill assume for this discussion that the metadal ~,uide the implementation development that we are figures already exists. This assumption must be tempered, , ~tently pursuing. i.e., it isn’t appropriateto assumethat the metadatacontains all high-level data about the graphics and text and all their interrelations. That would essentially assumethe result, Discussion allowing us in principle to read off the summaryfrom the metadata. Instead, we’ll assumethat only the metadata for The major aspects of the rhetorical role of graphics are that the diagram structure itself is available. A numberof the graphics can present complex information succinctly and operations described below can be viewed as building the that the reader can retain and use information in the visual additional metadata needed to construct a summary. The modality, rather than merely having "read" or "heard" a goal assumedwill be the distillation of a single figure, as in discourse. As in text, there need be no explicit signal of a rhetorical relation involving graphics. For example, a

99 magazine article on environmental legislation could be the text that discusses it. Such an interactive mapping headedby a color photo of a dead stand of trees. Its role as between corresponding symbolic and graphical structures an exampleof the result of air pollution / acid rain would is used in our Diagram Understanding System Inspector be clear without any specific mention in the text. Graphics (DUSI) (Futrelle and Nikolakis 1995). HTMLimage is often exemplary and/or extensional, making its point maps are similar (Schengili-Roberts 1996). The inverse, with an example or with contrasting cases. Many mappingfrom a click to a graphic region, is available in abstractions easily expressed in language are essentially some Webpages in which, for example, a click on a city impossible to represent in a static graphic, e.g., a mapwill execute a CGIscript (Gundavaram1996) that will "tendency" to do something, or a "hope" that something cause the mapto be recentered or zoomedaround the click will happen, or that a certain methodis "reliable". But on point. A shift from a detailed street mapto a broader area the whole, graphics can participate in virtually all the text city or region mapin which only major arteries are shown rhetorical relations discussed in the classic paper (Mann is a form of summarization by distillation. The opposite and Thompson1987). shift from a broad view to a local view is a form of Grammar-based approaches to diagram parsing can smm,l~rization by selection, actually with expandedrather generate useful metadata for diagrams (Futrelle , -.J than reduceddetail. Nikolakis 1995; Golin 1991; Helm, Marriott, and Oder~kl, Othel lines of research that can contribute to graphics 1991; Wittenburg, Weitzman, and Talley 1991). TI.,. summarization are the work on diagram generation (North results of the syntactic analysis can be used in the 1997), and the co-generation of text/graphics explanations generation of semantic representations (Futrelle and (Andr6 and Rist 1993; Feiner and McKeown1990; Marks Nikolakis 1995) that can in turn be used in summarization. and Reiter 1990). There is a great deal of relevant research Statistical analysis of data can result in an extremeform on text summarization, e.g., (ACL/EACL-Summarization of distillation to produce what is essentially a summary, 1997; Mani and Maybury 1998), which we have profited e.g., the computation of the meanand standard deviation from, but the focus on graphics in this paper and for a dataset, or the fitting of a straight line or simplecurve limitations of space have prevented any detailed discussion to a dataset. Whena point statistic such as a mean is of it here. This Symposiumis replete with up-to-date computed,the result does not even require a data plot. papers on the topic, which in turn refer to the large and The nature of graphics allows summarization in ways burgeoningliterature on text summarization.For additional that are quite impossible with text. Oneexample is the use results by our group on graphics summarization, see of "thumbnails" which are figures that are markedly (Futrelle and Chambers1998). reduced in size from the original. Doingthe samefor texi would render it unreadable, but graphics thumbnail tc often quite useful. Theyare used frequently on the V ~, ,d- Conclusion Wide Web to reduce download times and space requirements. Documentsummarization including graphics is a worthy The interactive nature of graphics can be used in goal and a necessary componentof any system that claims summarization. For example, thumbnails used on the Web to handle document content in full. Though there are overlaps between summarization concepts and techniques can normally be expandedto larger and more detailed form by selecting them. Another tool is color or other forms of for graphics and text, graphics has a distinct analogical highlighting. They allow a system to display rather character. It requires its ownconcepts and methodology complexstructures, but then to selectively colorize or bold that go beyondtext but which are integrated with text. We certain elements of the viewer’s choice, or put other have illustrated howmanual construction of summarization information in a muted background form. Because of the involving graphics can be done. The challenge is to gestalt nature of much graphics, the essentially automate them. Automation will depend in part on instantaneous recognition that can occur, these techniques developing better content representation techniques for can be quite useful. Whenthis is done with text it places a graphics in future electronic document systems. This greater burden on the viewer, who would have to devote metadata will be built by a combination of author- time to reading each new piece of text, e.g., whe,a contributed information and direct analysis of figure following hyperlinks in a browser. These intera conten, and of the content of graphics-related text. techniques allow a higher information density than wt,..J Aekno,vledgments. Thanks to Dragomir Radev normally be feasible for hardcopy, since the density i~ (Columbia), Linda Mizushima (USC/ISIS library), selectively loweredwhen certain subsets of information are Mark Maybury(Mitre).for furnishing me material related highlighted or others deleted from view. to summarization. Even more sophisticated interactions can easily be imagined, e.g., selecting a portion of text that discusses a subset of the information in a figure could lead to the References highlighting of only that information, or hiding the ACL/EACL-Summarization. 1997. Proceedings of non-referenced information. Similarly, selecting a portion ACL/EACL’97Intelligent Scalable Text Summarization of a figure could lead to the display of only that portion of Workshop.Madrid: Assoc. Computational Linguistics.

100 Andr6, Elisabeth, and ThomasRist. 1993. The Design t,f Mani, Inderjeet, and Mark Maybury, eds. 1998. Advances Illustrated Documentsas a Planning Task. In Intelligent in Scalable Text Summarization:(to appear). Multimedia htterfaces, edited by M. T. Maybury. Menlo Mann, William C., and Sandra A. Thompson. 1987. Park, CA: AAAIPress/The MIT Press. Rhetorical structure theory: A theory of text organization: Boguraev, Branimir, and Christopher Kennedy. 1997. USC/InformationSciences Institute. Salience-based Content Characterisation of Text Marks, Joseph, and Ehud Reiter. 1990. Avoiding Documents. Workshop on Intelligent Scalable Text Unwanted Conversational Implicatures in Text and Summarization,at Madrid, Spain. Graphics. AAAI90. Diels, Jean-Claude, Ralph Bernstein, Karl E. Stahlkopf, Moore, Johanna D., and Martha E. Pollack. 1992. A and Xin MiaoZhao. 1977. Lightning Control with Lasers. Problem for RST: The Need for Multi-Level Discourse Scientific American277 (August):50-55. Analysis. ComputationalLinguistics 18 (4):537-544. Dunning,Ted. 1993. Accurate Methodsfor the Statistics of Nicht, las, Nick. 1996. Parametersfor Rhetorical Structure Surprise and Coincidence. Computational Linguistics 1’! The~ ry Ontology. Melbourne: (1):61-74. http://www.arts.u nimelb.edu.au/Dept/LALX/postgrad/rst_o Feiner, Steven K., and Kathleen R. McKeown.19.;,,. ntolo~y.i~tml. Coordinating Text and Graphics in Explan,’,tiol, North, Stephen, ed. 1997. Graph Drawing (Symposium on Generation. AAAI90. Graph Drawing, GD ’96). Vol. 1190, Lect. Notes Futrelle, R. P., and Tyler Chambers.1998. Summarization ComputerSci. Berlin: Springer. of Diagrams in Documents (submitted). In Advances in Paice, Chris D. 1990. Constructing literature abstracts by Scalable Text Summarization, edited by I. Mani and M. computer: Techniques and prospects. Information Maybury:(to be published). Processing & Management26 (1): 171-186. Futrelle, Robert P., and Natalya Fridman. 1995. Principles Salton, G. 1989. Automatic Text Processing: The and Tools for Authoring Knowledge-Rich Documents. Transformation, Analysis, and Retrieval of Information by DEXA95 (Database and Expert System Applications) Computer. Reading, MA:Addison-Wesley Pub. Co. Workshopon Digital Libraries, at London, UK. Schengili-Roberts, Keith. 1996. The Advanced Html Futrelle, R. P., and N. Nikolakis. 1995. Efficient Anal)s’. ( ompanion: AcademicPress. of Complex Diagrams using Constraint-Based Pats:, ICDAR-95 (Intl. Conf. on Document Analysi~ ,~ Slomao Aaron. 1995. Musings on the Roles of Logical and Recognition), at Montreal, Canada. Non-I )gical Representations of Intelligence. In Diagrummatic Reasoning. Cognitive and Computational Golin, E.J. 1991. Parsing Visual Languageswith Plc:~tte ; erspect;ves, edited by B. Chandrasekaran, J. Glasgowand Layout Grammars. Journal of Visual Languages at, I N. H. Narayanan. Menlo Park, CA, Cambridge, MA: Computing 2:371-393. AAAIPress, MITPress. Gundavaram, Shishir. 1996. CGI Programming on the Stenning, Keith, and Robert Inder. 1995. Applying World Wide Web (Nutshell Handbook): O’Reilly & Semantic Concepts to Analyzing Media and Modalities. In Associates. Diagrammatic Reasoning. Cognitive and Computational Hamilton, MarcA. 1996. Java and the Shift to Net-Centric Perspectives, edited by B. Chandrasekaran, J. Glasgowand Computing. Computer 29 (August):31-39. N. H. Narayanan. Menlo Park, CA, Cambridge, MA: Harel, David. 1995. On Visual Formalisms. In AAAIPress, MITPress. Diagrammatic Reasoning. Cognitive and Computational Wittenburg, K, L. Weitzman, and J. Talley. 1991. Perspectives, edited by B. Chandrasekaran, J. Glasgowand Unification-based grammars and tabular parsing for N. H. Narayanan. Menlo Park, CA, Cambridge, MA.: graphical languages. Journal of Visual Languages and AAAIPress, MITPress. Computing2 (4):347-370. Helm, R., K. Marriott, and M. Odersky. 1991. Building Visual LanguageParsers. CHI ’91 - Proc. Conf. on Human Factors in ComputingSystems, at NewOrleans, Louisiana. Hovy, Eduard, and Chin Yew Lin. 1997. Automated Text Summarization in SUMMARIST.Workshop on Intelligent Scalable Text Summarization,at Madrid, Spain. Kim, Youngjun, Lidia S. Watrud, and A. Matin. 1995. A Carbon Starvation Survival Gene of Pseudomonasputida is Regulatedby ~54. J. Bacteriology 177 (7): 1850-1859. Latour, Bruno. 1990. Drawing things together. In Representation in Scientific Practice, edited by M. Lynch and S. Woolgar. Cambridge, MA:MIT Press.

i01