Information Cartography

COMMUNICATIONS OF THE ACM | CACM.ACM.ORG | 11/2015 | VOL. 58 | NO. 11


Association for Computing Machinery contributed articles

DOI:10.1145/2735624

A metro map can tell a story, as well as provide good directions.

BY DAFNA SHAHAF, CARLOS GUESTRIN, ERIC HORVITZ, AND JURE LESKOVEC

key insights
• Though human attention and comprehension can be overwhelmed by the data deluge, automatic methods can extract structured knowledge and provide maps of complex information landscapes to help people understand ideas, connections, and storylines.
• Properties of good maps are difficult to formalize; important characteristics include coherence of storylines, coverage of diverse and important topics, and relationships among pieces of information.
• These principles can be used to synthesize meaningful narratives from large datasets across multiple domains, including news stories, research papers, legal cases, and works of literature.

“RAISE YOUR HAND if you don’t quite understand this whole financial crisis,” said David Leonhardt’s New York Times article, March 2008. The credit crisis had been going on for seven months and extensively and continuously covered by every major media outlet in the world. Despite that coverage, many readers felt they did not understand what it was about.

Paradoxically, pervasive media coverage may have contributed to the public’s lack of understanding, a phenomenon known as information overload. Recent technology advances allow us to produce data at bewildering rates, while the surge of the Web has brought down the barriers of distribution. Yet despite this accelerating data deluge, knowledge and attention remain precious and scarce commodities.

Writers, researchers, and analysts spend countless hours gathering information and synthesizing meaningful narratives, examining and inferring relationships among pieces of information. Subtleties and relationships in an evolving story are easy to lose in an echo chamber created by the modification and reuse of content, as fueled by incentives to attract indexers, eyeballs, and clicks on advertisements.

The problem of automatically extracting structured knowledge from large datasets is increasingly prevalent. Several methods have sought to summarize and visualize narratives.2,28,29 However, most work only for simple stories that are linear in nature. In contrast, complex stories exhibit a nonlinear structure; stories spaghetti into branches, side stories, dead ends, and intertwining narratives. To explore them, users need a map to guide them through unfamiliar territory.

We previously introduced a methodology for creating structured summaries of information we call “metro maps.” The name is metaphoric; just as cartographic maps have been relied on for centuries to help us understand our surroundings, metro maps help us understand the information landscape. In this article, we explore methods we have developed for automatically creating metro maps of information.25–27

Metro maps consist of a set of lines with intersections or overlaps. Most important, they explicitly show the relationships among different pieces of information in a way that captures a story’s evolution. Each metro stop is a cluster of articles, and lines follow coherent narrative threads. Different lines focus on different aspects of the story; for example, the map in Figure 1 was automatically generated for the query “Crimea.”

62 COMMUNICATIONS OF THE ACM | NOVEMBER 2015 | VOL. 58 | NO. 11

METRO MAP CREATED BY ALBERTO ANTONIAZZI


Figure 1. Sample output: metro map of the 2014 Crimean crisis.

Legend on the left of each line lists the important words for the line; the lines correspond to the Russian, Ukrainian, and Western points of view. Each metro stop is a cluster of articles; the callout bubbles are manual annotations of the content. The timeline is at the bottom of the map.

[Figure 1 map omitted. Timeline: Mar 8, Mar 17, Apr 4, Apr 30.]
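As a reading aid for Figure 1, the structure it depicts can be written down as a small data model: a map is a set of lines, a line is an ordered sequence of stops, and a stop is a cluster of articles. The sketch below is illustrative only. The class names (`Article`, `Stop`, `Line`, `MetroMap`) and the `intersections` helper are hypothetical and are not taken from the authors' system.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Article:
    title: str
    published: date


@dataclass(frozen=True)
class Stop:
    """A metro stop: a cluster (subset) of articles."""
    articles: frozenset


@dataclass
class Line:
    """A metro line: legend words plus an ordered sequence of stops."""
    legend: list
    stops: list = field(default_factory=list)


@dataclass
class MetroMap:
    lines: list

    def intersections(self):
        """Return the stops that appear on at least two different lines."""
        first_line = {}  # stop -> index of the first line it was seen on
        shared = set()
        for i, line in enumerate(self.lines):
            for stop in line.stops:
                if stop in first_line and first_line[stop] != i:
                    shared.add(stop)
                first_line.setdefault(stop, i)
        return shared
```

Because `Stop` is frozen and holds a `frozenset`, the same stop object can be placed on several lines and compared by value, which is what makes intersections easy to detect in this representation.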

The map outlines the 2014 Crimean crisis, with the three lines corresponding to the Russian, Ukrainian, and Western points of view. The legend to the left of each line shows the important words for the line. The timeline appears at the bottom of the figure. The Russian (green) line starts in March, with the Crimea parliament voting to join Russia and Vladimir Putin recognizing Crimean independence. The Ukrainian (orange) line starts with Ukraine’s former prime minister urging the West to stop Russian aggression. The Ukrainian line then joins the Western (blue) line to discuss the West’s attempts to support Ukraine. Finally, the Russian and Ukrainian lines intersect when pro-Russia groups took over police stations in Ukraine.

Our representation is motivated by the strong empirical evidence that map representations help users gain and retain knowledge; for example, mind maps and knowledge maps have been shown to increase memory recall in students,11,23 as well as motivation and concentration.15 We have also found map visualizations enable users to digest information. We also integrated capabilities for supporting user interaction into the methodology, letting users guide formulation of the maps.

We demonstrate that metro maps can help people understand information in many areas, including news stories, research areas, legal cases, even works of literature. Metro maps can help them cope with information overload, framing a direction for research on automated extraction of information, as well as on new representations for summarizing and presenting complex sets of interrelated concepts.

Finding a Good Map

We start by formalizing the characteristics of good maps and formulating their construction as an optimization problem. We then provide efficient, scalable methods with theoretical guarantees for constructing maps. Our description of the characteristics is intentionally abstract. Later, we demonstrate how to adapt these abstract notions to various domains.

An objective function. Before we can come up with an algorithm for computing good maps, we must craft an objective function, which is especially important for maps, where the objective is not clear, a priori. In the following sections, we motivate and formalize several (sometimes conflicting) criteria. In the next section, we present a principled approach to constructing maps that optimizes trade-offs among these criteria.

First, recall our goal. Given a set of documents, we seek to compute a metro map that summarizes and organizes the documents. A metro map consists of a set of metro lines, each an ordered sequence of stops, where a stop is a subset of articles. Each line follows a coherent narrative thread, and different lines focus on different aspects of the story. Intersections across lines reveal the ways different storylines interact; for example, we computed the map in Figure 1 over news articles containing the word “Crimea” from March to April 2014. Each stop is a cluster of articles. The map includes three storylines, following the Russian, Ukrainian, and Western points of view.

Coherence. A first requirement is that each metro line tells a coherent story; following the articles along a line should give the user a clear understanding of the evolution of a story.

Consider a chain of clusters, where a cluster is a set of documents. For the sake of the presentation, we focus on singletons, with each cluster a single document. In order to define coherence, a natural first step is to measure similarity between each two consecutive articles along the chain. As a single bad transition can destroy the coherence of an entire chain, we measure the strength of the chain by the strength of its weakest link.

However, this simple approach can produce poor chains. Consider, for example, chains A and B. Both have the same endpoints, yet Chain A is significantly less coherent. Note the transitions of Chain A are all reasonable when examined out of context; the first two articles are about debt default, the second and third are about Republicans, and so on. Despite these local connections, the overall effect is incoherent.

Now take a closer look at the two chains. Figure 2 shows word appearance along both chains; for example, the word “Greece” appears throughout Chain B. It is easy to spot the associative flow of Chain A. Words appear for short stretches; some words appear, then disappear and reappear. Contrast this with Chain B, where stretches are longer and transitions are smoother. This observation motivates our definition of coherence.

Figure 2. Word patterns in Chain A (left) and B (right); bars correspond to word appearance in the articles listed above.

Chain A (Incoherent)
• Europe weighs possibility of debt default in Greece
• Why Republicans don’t fear a debt default
• Italy; The Pope is leaning toward Republican ideas
• Italian-American groups protest “Sopranos”
• Greek workers protest austerity plan
Words: Greece, Europe, Republican, Italy, Protest

Chain B (Coherent)
• Europe weighs possibility of debt default in Greece
• Europe commits to action on Greek debt
• European Union moves toward a bailout of Greece
• Greece set to release austerity plan
• Greek workers protest austerity plan
Words: Greece, Europe, Debt, Austerity, Protest

We transform the problem into a linear programming optimization problem, where the goal is to choose a small set of words and score the chain based solely on these words. To ensure the strength of each transition, the score of a chain (given a set of active words) is the score of the weakest link; see Shahaf and Guestrin24 for details.

The score of a single link might depend on the domain. In Shahaf et al.26 we showed how to compute a score, given article content alone. In Shahaf et al.,25 we showed how to take advantage of links among articles.

Coverage. Coherence is crucial for good maps, but is it sufficient? Pursuing an answer, we found maximally coherent lines for the query “Bill Clinton.” The results were discouraging. While the lines were indeed coherent, they were not important. Many lines revolved around narrow topics (such as Clinton visiting Belfast). Moreover, as there was no notion of diversity, multiple lines included redundant information. This example suggests selecting the most coherent lines does not guarantee a good map. Instead, the key challenge is balancing coherence and coverage; in addition to being coherent, lines must also cover diverse topics important to the user.

We define a set of elements the map can cover. The elements can depend on the domain; in the case of news articles, we select words (such as “Obama” and “China”),26 so a high-coverage map discusses many important words. In the case of a scientific corpus, we select papers25 so a high-coverage map touches a large chunk of the corpus.

We calculate a coverage function, measuring how well each document covers each element. We extend it to a set function, measuring how well a set of documents covers each element. In order to encourage diversity, this function is submodular; if the map covers an element well already, adding another document covering that element well thus provides little extra coverage. Lack of extra coverage encourages us to pick documents that cover new topics instead.

We next introduce weights for each element, indicating the element’s importance. The weights bias the map toward covering important elements while also offering a natural mechanism for personalization. In Shahaf et al.,26 we discussed learning weights from user feedback, resulting in a personalized notion of coverage.

Connectivity. Finally, a map is more than a set of lines, with information in its structure as well. Our final property is thus connectivity. A map should convey the underlying structure of the story and how different aspects of the story interact.

Intuitively, different stories have different structures. Some stories are almost linear, while others are much more complex. In order to capture the structure of a story, we compute the minimum number of lines that cover all metro stops. This objective prefers long storylines whenever possible; linear stories become linear maps, while complex stories maintain their interweaving threads.

Tying it all together. We now formulate the problem of finding a good metro map, given a set of documents. We need to consider trade-offs among the properties discussed earlier: “cluster quality,” “line coherence,” “map structure,” and “coverage under budget”; for example, maximizing coverage leads to a disconnected map, as there is no reason to reuse a cluster for more than one line. Maximizing coherence often results in repetitive, narrow-scope chains. It is thus better to treat coherence as a constraint; a chain is either coherent enough to be included in the map, or it is not. Coverage and structure, on the other hand, should both be optimized. We define the map objective like this:

Problem 1 (Metro maps: Informal)
A map must satisfy
    High coverage (o1)
    High structure quality (o2)
Subject to
    Minimal level of line coherence (c1)
    Minimal cluster quality (c2)
    Maximal map size (c3)

See Shahaf et al.27 for a formal statement of the algorithm and optimization.

Algorithm

We now briefly review the main ideas behind the algorithm, which starts by computing a set of documents from a query. We then segment the articles into time windows and compute good clusters for each window (constraint c2 in Problem 1) using a community-detection algorithm on word co-occurrence graphs.27 These clusters serve as metro stops.

Once we have clusters, we can proceed to computing coherent lines (constraint c1).
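To make the line-coherence constraint (c1) concrete, here is a minimal sketch of the weakest-link idea from the Coherence discussion: a chain is scored by its weakest transition, and that score is compared against a threshold. It is a deliberate simplification under stated assumptions. Each article is reduced to a set of words, a transition is scored by Jaccard overlap restricted to a given set of active words, and the names `link_score`, `chain_coherence`, `is_coherent`, and `min_coherence` are hypothetical. The authors choose the active words with a linear program (see Shahaf and Guestrin24), which this sketch does not attempt; the toy word sets are invented and only loosely modeled on Figure 2.

```python
def link_score(a, b, active):
    """Jaccard overlap of two word sets, restricted to the active words."""
    a, b = a & active, b & active
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def chain_coherence(chain, active):
    """Score a chain by its weakest link (minimum over consecutive pairs)."""
    if len(chain) < 2:
        return 1.0
    return min(link_score(x, y, active) for x, y in zip(chain, chain[1:]))


def is_coherent(chain, active, min_coherence):
    """Constraint-style check: a chain is either coherent enough or it is not."""
    return chain_coherence(chain, active) >= min_coherence


# Toy data (invented): one word set per article in each chain.
active_words = {"greece", "europe", "debt", "austerity", "protest"}
chain_a = [{"greece", "debt"}, {"republican", "debt"}, {"republican", "italy"},
           {"italy", "protest"}, {"greece", "protest"}]
chain_b = [{"greece", "debt", "europe"}, {"greece", "debt", "europe"},
           {"greece", "europe"}, {"greece", "austerity"},
           {"greece", "austerity", "protest"}]
```

With these inputs, Chain A scores 0 because one of its transitions shares no active word, while Chain B keeps a nonzero score on every transition. Taking the minimum rather than the average is what lets a single bad transition disqualify a chain.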

Figure 3. A metro map for the query “Boston” in May 2013.

Two lines discuss the aftermath of the Boston Marathon bombings, with one line focusing on the suspect, the other on community events; the other two lines are about Boston major league sports—hockey and baseball.

[Figure 3 map omitted. Timeline: May 08, May 17, May 31.]
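The stops in a map such as Figure 3 sit on a timeline because the algorithm first segments articles into time windows and builds a word co-occurrence graph for each window. A simplified sketch of that preprocessing step follows. It is not the authors' implementation: the fixed window length, the input format of (date, word set) pairs, and the function names `window_index` and `cooccurrence_graphs` are assumptions, and the community-detection step that turns each graph into clusters is left out.

```python
from collections import Counter, defaultdict
from datetime import date
from itertools import combinations


def window_index(day, start, window_days):
    """Index of the time window that contains `day`."""
    return (day - start).days // window_days


def cooccurrence_graphs(articles, start, window_days=7):
    """Map window index -> Counter of word-pair co-occurrence counts.

    `articles` is an iterable of (date, set_of_words) pairs. Each Counter
    is an edge-weighted graph: keys are sorted word pairs, values are the
    number of articles in that window containing both words.
    """
    graphs = defaultdict(Counter)
    for day, words in articles:
        w = window_index(day, start, window_days)
        for pair in combinations(sorted(words), 2):
            graphs[w][pair] += 1
    return graphs


# Toy input (invented) spanning two one-week windows.
sample = [
    (date(2013, 5, 8), {"boston", "marathon"}),
    (date(2013, 5, 9), {"boston", "marathon", "fbi"}),
    (date(2013, 5, 17), {"boston", "bruins"}),
]
graphs = cooccurrence_graphs(sample, start=date(2013, 5, 8))
```

Storing each graph as a `Counter` over word pairs keeps its size tied to the vocabulary, not to the number of articles, which matches the article's remark that graph size depends on the vocabulary.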

Figure 4. Overview of the algorithm. We compute clusters, encode coherent lines in a graph, and use the graph to compute the structure of the story. We then pick K lines from the structure that maximize coverage.

[Figure 4 pipeline: Compute Clusters → Coherence Graph → Compute Structures → K Lines to Max Coverage]
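The last stage in Figure 4, picking K lines that maximize coverage, is the kind of problem usually approached with a greedy algorithm for submodular maximization. The sketch below assumes a specific coverage function, the weighted probabilistic cover 1 - prod(1 - cover_d(e)) over the selected documents d, which is one standard submodular choice. The article does not spell out the exact form here, so that function and the names `set_coverage` and `greedy_lines` should be read as assumptions. The sketch also ignores the structure constraint and treats each candidate line simply as a list of documents.

```python
def set_coverage(docs, cover, weights):
    """Weighted coverage of a document set.

    cover[d][e] in [0, 1] says how well document d covers element e.
    An element that is already well covered gains little from more documents.
    """
    total = 0.0
    for element, weight in weights.items():
        miss = 1.0  # probability-style "not covered" factor
        for d in docs:
            miss *= 1.0 - cover.get(d, {}).get(element, 0.0)
        total += weight * (1.0 - miss)
    return total


def greedy_lines(candidates, cover, weights, k):
    """Greedily pick up to k lines with the largest marginal coverage gain."""
    chosen, covered_docs = [], set()
    remaining = dict(candidates)  # line name -> list of documents
    for _ in range(k):
        base = set_coverage(covered_docs, cover, weights)
        best, best_gain = None, 0.0
        for name, docs in remaining.items():
            gain = set_coverage(covered_docs | set(docs), cover, weights) - base
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:  # no remaining line adds coverage
            break
        chosen.append(best)
        covered_docs |= set(remaining.pop(best))
    return chosen


# Toy data (invented): two documents cover "obama", one covers "china".
cover = {"d1": {"obama": 0.9}, "d2": {"obama": 0.8}, "d3": {"china": 0.7}}
weights = {"obama": 1.0, "china": 1.0}
candidates = {"L1": ["d1"], "L2": ["d2"], "L3": ["d3"]}
picked = greedy_lines(candidates, cover, weights, k=2)
```

On this toy input the second pick is the line about the new topic, since a second document on an already covered element adds little. The lazy evaluations mentioned in the article would avoid recomputing every gain in each round; this sketch recomputes them for clarity.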

Ideally, we could enumerate all possible candidate lines, but that is often infeasible. We instead propose a divide-and-conquer approach, constructing long lines from shorter ones. It allows us to compactly encode many candidate lines as a graph; the nodes of the graph correspond to short coherent lines, and edges indicate lines that can be concatenated and remain coherent. Paths in the graph thus correspond to coherent lines.

After encoding all coherent lines, we identify the underlying structure of the story, optimizing a connectivity objective (o2) that prefers longer storylines whenever possible. The objective, while difficult to optimize exactly, is submodular and can be efficiently approximated within guarantees.

A story may be very complex, but the user’s attention span is limited. To keep maps manageable, our final step is to restrict their size. We pick up to K lines that obey the structure and maximize coverage (c3 and o1). We rely on submodular optimization again to optimize the map within theoretical guarantees. Example outputs of the algorithm are in Figure 1 (Crimea) and Figure 3 (Boston).

Complexity and running time. Given a query set of documents D, we first run a linear-time algorithm, compiling D to a sequence of word co-occurrence graphs. More important, the size of the graphs does not depend on D but on the size of our vocabulary W. Our dependency on the size of D is linear, and our algorithm scales well (see Figure 4); see Shahaf et al.26 and Shahaf et al.27 for theoretical guarantees.

Our main bottleneck is the coverage step, which is polynomial of a high degree in |W|. A parallel implementation and lazy evaluations achieve the same approximation guarantees while often leading to dramatic speedups.

In practice, our system often takes less than a minute for query sets including tens of thousands of documents. Note while our system could in principle support even larger query sets, the use case we have in mind for maps rarely necessitates it. We speculate that very broad queries (“U.S.”) would be less common than narrower ones (“health care reform”).

Parameters. Tuning is required for several parameters to obtain a good metro map. In particular, each constraint in Problem 1 has an associated parameter that is manually adjusted on training queries. Another important parameter is m, the user’s “history window,” or number of previous articles in the line the user can remember. Higher m results in more coherent chains but is more computationally expensive. In practice, we choose the highest m we can afford computationally.

Applications

In the previous section, we discussed metro maps in the news domain, but maps are easily applied to other domains. The main principles (coherence, coverage, and connectivity) are the same, but one can use domain knowledge to improve the objective. In the following sections, we discuss four applications: news, science, legal documents, and books.

News. News media play a pivotal role informing the public of social, cultural, and political issues. Understanding news enables the public to make key life decisions (such as choosing a place to live or a political orientation). The consequences of acting without understanding the big picture can be adverse. However, with the increasing amount of content published every day, readers can easily miss the big picture amidst a flood of data.

Approach. We used our algorithms to compute maps about news events, assembling multiple news datasets covering hundreds of thousands of posts from Internet news sources; for a demonstration of the system see http://metromaps.stanford.edu/.

Evaluation. Quantitatively evaluating metro maps is difficult. There is no established gold standard for doing it, and ground truth is difficult to define. Since the goal of the maps is to help people navigate information, we conducted a user study to better understand the value of the methodology.

The study took place in 2011 at Carnegie Mellon University in Pittsburgh, PA, aiming to test whether the maps we generate are useful for humans seeking information about complex topics. To demonstrate a deep understanding of a topic, we asked users to explain it to others. We recruited 15 undergraduate students, asking them to write two paragraphs, one summarizing the Haiti earthquake and one summarizing the Greek debt crisis.


We randomly assigned them a metro map or the Google News result page. We computed the maps from a corpus of more than 18,000 articles from the New York Times. We employed crowdworkers on Mechanical Turk (http://www.mturk.com/) to evaluate the paragraphs. In each round, we gave the crowdworkers two paragraphs (map user vs. Google News user) and asked them to assess which one provided a more complete and coherent picture of the story. After removing spam, we had 294 evaluations for Greece and 290 for Haiti; 72% of the Greece comparisons preferred map paragraphs, but only 59% of the Haiti comparisons preferred map paragraphs. After examining the Haiti paragraphs, we found most map users focused solely on the major storyline (on distributing aid). Based on the results of the study, we speculate maps are more useful for stories without a single dominant storyline.

Maps are designed to display connections between multiple pieces of information.

Science. As the number of scientific publications soars, even the most enthusiastic reader can have trouble staying on top of the evolving literature. We are motivated by the idea of creating valuable literature exploration tools that can help people entering a new field (such as new graduate students or experts reaching beyond their traditional discipline’s borders).

Approach. We extended our techniques to the scientific domain, aiming to test whether maps can help researchers understand the state of the art of a field. We modified the objective slightly (see Shahaf et al.25), taking advantage of the citation graph. Our dataset included more than 35,000 papers from ACM conferences and journals.

Figure 5 outlines part of a map we computed for the query “reinforcement learning,” depicting multiple lines of research, including Markov decision processes, robotics and control, bounds and analysis, exploration-exploitation trade-offs, and multiagent cooperation. Note the lines in the figure do not intersect. Intersection in the scientific domain is difficult; a theory line and an application line can be highly related, yet no single document belongs to both. We thus modified our objective to allow for a softer kind of connectivity, where lines can interact through citations; for example, the map shows how the Markov Decision Process line affects the multi-agent and robotics lines and how the exploration-exploitation line interacts with the analysis line. These relationships are gray dashed paths, with relevant citation text nearby.

Evaluation. To test our maps, we recruited 30 graduate students from Carnegie Mellon University, asking them to conduct a quick literature survey in reinforcement learning, an area they had not studied. In particular, we asked them to update a survey paper from 1996 by identifying up to five research directions that should be included in the updated survey and listing a few relevant papers for each direction. We recorded their browsing histories and took a snapshot of their progress every minute. We limited their time to 40 minutes to simulate a quick first pass on the papers. All participants used Google Scholar,a a search engine that indexes scholarly literature. In addition, we gave 15 of them a metro map. Allowing them to query Google Scholar’s entire set of publications makes the task both more realistic and more difficult for maps.

a http://scholar.google.com

An expert graded the output of all participants. We wanted them to find good papers, as well as identify important research areas. We thus measured precision (fraction of retrieved papers that are relevant) and subtopic recall (fraction of relevant research areas retrieved). Map users outperformed Google-only users in every parameter, with an average score of 84.5%, discovering on average 1.62 seminal papers. Google users achieved a score of 74.2%, finding 1.2 seminal papers on average. The map users’ average recall score was 73.1%, compared to Google’s 46.4%.

Further analysis of the snapshots we took throughout the study provides anecdotal evidence of the map’s utility. Google users visited more pages and listed more papers on average. However, when looking at the average ratio, only one of 4.5 pages visited by Google users was added to their list, while map users added one of 3.8 pages. That is, the map users appeared to be more focused; they may have visited fewer pages but found them satisfactory. In addition, several map users started by composing a short list of research directions, then progressively added papers to each direction throughout the session. Google users, in contrast, did not exhibit such “big picture” behavior.
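The two measures used in the science study can be written out directly. The sketch below follows the definitions given in the text (precision as the fraction of retrieved papers that are relevant, subtopic recall as the fraction of relevant research areas retrieved). The function names and the toy identifiers are hypothetical, and how the expert judged relevance is not modeled.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved papers that are relevant."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0
    return len(retrieved & set(relevant)) / len(retrieved)


def subtopic_recall(retrieved_areas, relevant_areas):
    """Fraction of relevant research areas that were retrieved."""
    relevant_areas = set(relevant_areas)
    if not relevant_areas:
        return 0.0
    return len(set(retrieved_areas) & relevant_areas) / len(relevant_areas)


# Toy example (invented identifiers): a participant lists four papers,
# three of which the grader marks relevant, touching two of four areas.
p = precision(["p1", "p2", "p3", "p4"], ["p1", "p2", "p3", "p9"])
r = subtopic_recall(["bandits", "robotics"],
                    ["bandits", "robotics", "mdp", "multiagent"])
```

Reporting both numbers matters because a participant can score high precision by listing a few safe papers from a single area; subtopic recall penalizes that narrowness.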

Legal documents. Law is built on the evolution of ideas with links to key precedents. Legal scholars and lawyers routinely do research on legal corpora, dealing with an avalanche of information. Despite the related informational challenges, legal documents and review processes remain largely untouched by technology. We sought to explore the value of metro maps to help lawyers argue a case, envisioning a system that would help them find related cases, understand how the law has evolved (and why it changed), and prepare a case strategy accordingly.

Approach. Our data consisted of U.S. Supreme Court decisions as supplied by Ravel Law.b Unlike news articles and scientific papers, Supreme Court decisions can be lengthy, reaching hundreds of pages. The simple text-processing methods of the earlier sections could not separate the wheat from the chaff for legal scholars.

b http://www.ravellaw.com

Addressing this challenge, we focused on anchor text, or text surrounding citations. Identifying highly cited paragraphs allowed us to focus on the important parts of each case; for example, Women’s Community Health v. Cohen cites Roe v. Wade, saying, “[T]he Supreme Court held that the constitutional ‘right of privacy ... is broad enough to encompass a woman’s decision whether or not to terminate her pregnancy’” (Id. 410 U.S. at 153 93 S.Ct. at 727). We used the anchor text to compute our input set of documents and applied our map algorithm.

Evaluation. As a reality check, we computed a map for the query “commerce clause,” which appears in Article I, Section 8 of the U.S. Constitution, saying the Congress has power “To regulate Commerce with foreign Nations, and among the several States, and with the Indian Tribes.” It is an important clause that has been thoroughly discussed in multiple courts.

Figure 6 (left) shows the map we computed and showed to lawyers from Ravel Law, asking for their interpretation. They browsed through it and manually labeled each line. As a reality check, we then computed the words that made each line coherent for our algorithm. Figure 6 (right) shows the comparison. The coherent words correspond well to the lawyers’ annotation; for example, the purple line deals with the question of whether Congress may abrogate the Eleventh Amendment immunity of the states. This line was labeled “eleventh amendment, state sovereignty” by the lawyers, and its coherent words were “immunity,” “sovereignty,” “amendment,” and “eleventh.”

We further asked the lawyers to explain each line. As an example, consider again the purple line in Figure 6. The line starts with Ford Motor v. Dept. of Treasury. The Court held the Eleventh Amendment denies to the federal courts authority to entertain a suit by private parties against a state without the state’s consent.

In the next stop (a case known as Parden v. Terminal Railway), the Court discussed whether a state owning a railroad could successfully plead sovereign immunity in a federal court suit by its employees.

Figure 5. Part of the map computed for the query “reinforcement learning.”

The map outlines multiple lines of research (see legend at the bottom right). Interactions between the lines are dashed gray lines, with relevant citation text appearing nearby.

[Figure 5 map omitted. Stops are individual papers (title, authors, venue) placed on a timeline from 1998 to 2008; the legend lists the keywords for each line of research.]


Figure 6. Detail of a map over legal documents for the query “Commerce Clause.”

Lines focus on the U.S. Congress’s power to prohibit commerce, the Eleventh Amendment, regulating wholesale energy sales, and more. Right: Lawyers’ annotation of each line, compared to words chosen by our coherence algorithm. Manual annotations correspond to coherent words.

[Figure 6 map: each stop is a court case. Legible stops include Schechter Poultry Corp. v. US (1935), US v. Darby (1941), Wickard v. Filburn (1942), Maryland v. Wirtz (1968), Perez v. US (1971), Fry v. US (1975), US v. Lopez (1995), Ford Co. v. Dept. of Treasury (1945), Parden v. Terminal R. Co. (1964), Quern v. Jordan (1979), Welch v. Texas Highways, and Missouri v. Kansas Gas Co. (1924).]

Lawyers’ Label | Coherent Words
Power to prohibit commerce | interstate, commerce, affect, regulate
Congress’s power to regulate | congress, interest, regulate, channel
11th Amendment, state sovereignty | immunity, sovereignty, amendment, eleventh
“Merely” vs. “substantially” affects | affects, substantial, regulate
Regulating wholesale energy sale | wholesale, electricity, resale, steam, utilities

if the Court reverses itself, Congress can provide authority for suits in state courts to implement federal statutory rights, thus doing away with common law sovereign immunity of the states. Next, in Quern v. Jordan, the Court held Congress had not intended to include states within the term “person” for the purpose of subjecting them to suit. Finally, in the last stop (a case known as Welch v. Texas Highways), the Court overruled an aspect of Parden regarding states participating in federal spending programs.

Likewise, the lawyers could explain all other lines, expressing their confidence in the benefit of maps to the legal community.

Books. Narratives are important in many areas, including literary criticism, political science, and linguistics. Despite this, we know little about their structure. We thus decided to apply metro maps to elucidate the structure of complex books. Our first test case was The Lord of the Rings, an epic fantasy novel of more than 480,000 words, internally divided into six books. It includes a long list of characters that can be difficult to follow for even the most dedicated reader.

Approach. Since our maps operate on a collection of documents, we partitioned the book into three-page segments, treating each segment as a document. Applying the maps algorithm to this collection of segments, we encountered an interesting problem with our notion of coherence, originally developed in the context of news. Journalists often remind their readers of previous events, and thus coherence could be inferred through repetition. In contrast, book authors do not recap events that happened several pages ago but rather rely on the memory of engaged readers. We therefore relied on other hints for coherence. Noting that a character’s point of view often forms a coherent narrative, we decided to focus on named entities. We identified the characters present on each page and looked for storylines with co-occurring characters.

Evaluation. Figure 7 shows a segment from The Lord of the Rings map that reveals important structural information: The story begins in the Shire (leftmost cluster) where Gandalf meets the hobbits. In the pages associated with this cluster, Gandalf advises Frodo to take the Ring away from the Shire. In the second cluster, Frodo leaves, accompanied by Sam, Merry, and Pippin. They take Strider, later revealed to be Aragorn, as guide and protector. In the next cluster, Aragorn leads the hobbits to Rivendell where the Council of Elrond meets. The Council decides the Ring must be destroyed, and a “Fellowship of the Ring” is formed, including Sam, Merry, Pippin, Aragorn, Gandalf, Gimli, Legolas, and Boromir.

The council cluster splits into three lines, as in Figure 7, corresponding to the fellowship splitting up when orcs kidnap Merry and Pippin (green line). Aragorn, Gimli, and Legolas pursue the orcs (purple line), while Frodo and Sam continue on their own, capturing Gollum (blue line), and head toward Mordor, the region controlled by Sauron and his servant, Saruman (yellow line).

This example, while preliminary, demonstrates the potential of maps for sorting through complex plotlines.

Using Maps
We have discussed the process of creating maps and now shift to the user, exploring possible uses of maps. We rely on the traditional information-retrieval framework, characterizing a user by an information need. Users formulate their information needs and submit queries to a system. If not satisfied with the results, they may interact with the system through reformulated queries until they are satisfied. In this section, we discuss information needs and interaction scenarios.

Information needs. Maps are not intended to replace search engines; many search-engine queries are extremely focused, and corresponding information needs are often satisfied with a simple phrase. In contrast, maps are designed to display connections between multiple pieces of information. In terms of Broder’s taxonomy,7 maps are mostly useful for informational queries, and of little use for navigational and transactional queries.

We would like to characterize the information needs of map users. Informational queries are driven by a user’s need to learn something. In order to characterize the different types of learning, we rely on Bloom’s taxonomy,3 which identifies six cognitive categories characterizing the processes of learning, from recalling facts to making judgments.

We distinguish between two categories of information needs for map users. “Learn” corresponds to the lower three levels of Bloom’s taxonomy; the user’s goal is to acquire knowledge. A user in the learn category might be interested in surveying a concrete topic, which may be new to the user or a familiar one the user wishes to monitor. Alternatively, the user may wish to explore and navigate around a starting point. Navigation is a promising application for maps, with many news sites today including a “related articles” function. Maps can augment it, allowing users to see the article in a broader context. Note the learn category does not include fact lookup or question answering. As described in Shahaf et al.,26 maps are less useful for this type of query.

The “investigate” category corresponds to the higher levels of Bloom’s taxonomy, where users aim to produce outcomes. In it, users aim to transform existing data into useful patterns, seeking gaps in current knowledge. They analyze and synthesize different pieces

Figure 7. Detail from The Lord of the Rings map revealing the structure of the story; one can see how hobbits and Gandalf start their journey, gather together in Elrond’s council, and then split up, with callout bubbles as manual annotations.

[Figure 7 map labels: callouts “At the Shire” and “Council of Elrond”; stops list characters including Gandalf, Frodo, Sam, Merry, Pippin, Bilbo, Elrond, Aragorn, Legolas, Gimli, Gollum, Saruman, and Sauron.]
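The character-based approach behind this map (segment the book, find the characters in each segment, and look for storylines of co-occurring characters) can be illustrated with a minimal Python sketch. It is not the implementation used to build the map. It assumes segments of a fixed word count as a stand-in for three-page segments and a fixed list of names as a stand-in for named-entity recognition; all function names and the `min_shared` threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def segment(words, size):
    """Split a list of words into consecutive segments of `size` words.

    Assumption: a fixed word count stands in for a three-page segment.
    """
    return [words[i:i + size] for i in range(0, len(words), size)]


def characters_in(segment_words, names):
    """Return the set of known character names that appear in a segment.

    Assumption: exact matching against a name list replaces real
    named-entity recognition.
    """
    return {word for word in segment_words if word in names}


def cooccurrence_counts(segments, names):
    """Count, for each pair of characters, how many segments mention both."""
    counts = Counter()
    for seg in segments:
        present = sorted(characters_in(seg, names))
        counts.update(combinations(present, 2))
    return counts


def shared_character_links(segments, names, min_shared=2):
    """Link consecutive segments that share at least `min_shared` characters.

    Each link is (index, next_index, sorted shared characters). Chains of
    such links are a crude proxy for a storyline following a group.
    """
    sets = [characters_in(seg, names) for seg in segments]
    links = []
    for i in range(len(sets) - 1):
        shared = sets[i] & sets[i + 1]
        if len(shared) >= min_shared:
            links.append((i, i + 1, sorted(shared)))
    return links


if __name__ == "__main__":
    names = {"Gandalf", "Frodo", "Sam", "Aragorn", "Gollum"}
    toy_segments = [
        ["Gandalf", "meets", "Frodo"],
        ["Frodo", "leaves", "with", "Sam"],
        ["Frodo", "Sam", "follow", "Aragorn"],
        ["Frodo", "Sam", "capture", "Gollum"],
    ]
    print(cooccurrence_counts(toy_segments, names).most_common(1))
    print(shared_character_links(toy_segments, names))
```

Linking only consecutive segments keeps the sketch short; a fuller treatment would also need to connect segments that are far apart in the text, since authors do not recap earlier events.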


of information, looking for plausible generalizations that could result in new insights. In particular, such users might be interested in contrasting and comparing multiple maps.

Interaction. Interaction is crucial to the success of metro maps. Users often know precisely what they want to find, but it is not easy for them to distill their ideas into a few keywords. Maps should thus allow interaction. Many models of interaction can be naturally integrated with metro maps. We rely on user feedback to learn preferences and adjust our maps accordingly. In the following sections, we discuss two interaction mechanisms we have implemented: “zooming” and “word feedback.”

Zooming. Some users are interested in a quick, high-level overview of a topic, while others wish to delve into the details. We thus want our maps to be zoomable. As maps are richly expressive, there are multiple ways to interpret zoom interactions. We have implemented three interpretations: zooming could change the time resolution; it could change the cluster resolution, making clusters split and merge; or it could let users focus on a particular metro line.

Word feedback. When a user interacts with a map, the map should ideally support feedback of the form “Tell me more about the E.U.’s reaction to the crisis” or “I am not interested in the Red Sox.” Labeling entire maps, or even single documents, is not rich enough to support this interaction model. Such labels also give no way to indicate that a user likes something not on the map.

We propose “feature-based feedback” instead to provide a natural way to support the queries discussed earlier; the user could increase the importance of the word “E.U.” and decrease the importance of “baseball” to achieve the desired effect. We use a discriminative semi-supervised learning method that incorporates such training affinities between features and classes.9 Using it, we define a personalized, session-sensitive notion of coverage that accounts for user feedback.

Related Work
To the best of our knowledge, automatic construction of metro maps is novel. Nevertheless, extensive work has been done on myriad related directions, from topic detection and tracking to summarization and temporal text mining.

Our work differs from previous work in several important ways. First, our system has structured output, so not only does it pick nuggets of information, it explicitly shows connections among them. Prior work was limited largely to list-output models. In the summarization task,2,20,22 the goal is often to summarize a corpus of texts by extracting a list of sentences. Other methods18,30,31 discover new events but do not attempt to string them together.

Numerous efforts at information retrieval go beyond lists to provide richer views, including different notions of storylines.1,2,28,29 Graph representations are common across a variety of related problems,10,14,17,19 from topic evolution to news analysis. However, in all these methods there is no notion of graph paths as coherent storylines. Rather, graph edges might be selected because they pass a threshold or belong to a spanning tree.

Still other methods5,6 consider coherence at the path level, in the sense that they aggregate a similarity score across all chain documents. However, they do not consider the order of the documents or the strength of the weakest link and may assign high coherence to chains despite bad transitions. Guaranteeing strong transitions across chains facilitates knowledge acquisition and comprehension.

Multiple tools exist for summarizing and visualizing literature; see Borner4 for a compendium. Unlike our clusters of documents, many of these systems12,16 use a single concept as a unit of analysis. This granularity is too fine to be useful to a non-expert. Other tools with granularity similar to ours often focus on visualizing citations or co-citations.8,13 Again, edges between documents are based on local computation, and there is no notion of coherent lines of research.

Finally, visual metaphors similar to metro maps have been used before to display abstract knowledge; for example, Nesbitt’s map shows interconnecting ideas running through his Ph.D. thesis.21 However, these maps were all constructed manually, whereas ours are generated automatically.

Conclusion
We have outlined our studies to date on methods that extract information and construct summarizing metro maps.


Given a query, our algorithms generate concise structured sets of storylines that maximize coverage of salient pieces of information. Most important, metro maps explicitly show the relationships between lines. We have applied metro maps to help people understand news stories, research areas, legal cases, and works of literature. We conducted promising pilot user studies over real-world datasets in several domains. The results suggest metro maps help users acquire knowledge more efficiently.

Our work also has several limitations that would be interesting to address in the future; for example, our notion of coherence assumes word repetition, so our system cannot handle extremely short documents (such as Twitter posts). Moreover, our shallow features make coherence metrics prefer chains of articles from the same source. We also plan to make our system more robust to noise; while the method is not very sensitive to the removal of a few articles from the dataset, near-duplicates affect coverage weights, biasing the algorithm toward covering them. Sensitivity to small changes in the query date range might be addressed by automatically finding an optimal segmentation of the timeline.

We also plan to experiment with richer forms of input, output, and interaction mechanisms and integrate higher-level semantic features. Much of the current work was devoted to crafting an objective function; in the future, we wish to learn or revise an objective function directly from user feedback. Another interesting direction is the point-of-view mechanism, letting users see a topic through the eyes of another person (such as a Democrat asking for the Republican point of view).

This line of work can lead to tools that help people navigate and understand ideas, trends, connections, and storylines amidst an information explosion.

Acknowledgments
We would like to thank Rok Sosic, Andrej Krevl, Dima Brezhnev, Caroline Suen, Jeff Jacobs, Heidi Wang, Thomas von der Ohe, Tom Camenzind, Rohan Puttagunta, and Raiyan Khan. This research has been supported in part by NSF IIS-1016909, CNS-1010921, IIS-1149837, DARPA SMISC, DARPA GRAPHS, ARL AHPCRC, Okawa Foundation, Docomo, Boeing, Volkswagen, Intel, the Brown Institute for Media Innovation, and the Alfred P. Sloan Fellowship.

References
1. Ahmed, A., Ho, Q., Eisenstein, J., Xing, E., Smola, A.J., and Teo, C.H. Unified analysis of streaming news. In Proceedings of the 20th International Conference on the World Wide Web (Hyderabad, India, Mar. 28–Apr. 1). ACM Press, New York, 2011.
2. Allan, J., Gupta, R., and Khandelwal, V. Temporal summaries of new topics. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (New Orleans, LA, Sept. 9–13). ACM Press, New York, 2001.
3. Bloom, B.S., Engelhart, M.D., Furst, E.J., and Hill, W.H., Eds. Taxonomy of Educational Objectives, Handbook 1: Cognitive Domain. Longman, White Plains, NY, 1956.
4. Borner, K. Atlas of Science: Visualizing What We Know. MIT Press, Cambridge, MA, 2010.
5. Boyack, K.W. and Klavans, R. Creation of a highly detailed, dynamic, global model and map of science. Journal of the Association for Information Science and Technology 65, 4 (Apr. 2014), 670–685.
6. Braam, R.R., Moed, H.F., and Van Raan, A.F. Mapping of science by combined co-citation and word analysis, I: Structural aspects. Journal of the Association for Information Science and Technology 42, 4 (May 1991), 233–251.
7. Broder, A. A taxonomy of Web search. ACM SIGIR Forum 36, 2 (Sept. 2002), 3–10.
8. Chen, C. Citespace II: Detecting and visualizing emerging trends and transient patterns in scientific literature. Journal of the Association for Information Science and Technology 57, 3 (Feb. 2006), 359–377.
9. Druck, G., Mann, G., and McCallum, A. Learning from labeled features using generalized expectation criteria. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (Singapore, July 20–24). ACM Press, New York, 2008.
10. Faloutsos, C., McCurley, K.S., and Tomkins, A. Fast discovery of connection subgraphs. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Seattle, WA, Aug. 22–25). ACM Press, New York, 2004.
11. Farrand, P., Hussain, F., and Hennessy, E. The efficacy of the ‘mind map’ study technique. Medical Education 36, 5 (May 2002), 426–431.
12. Fox, E.A., Neves, F.D., Yu, X., Shen, R., Kim, S., and Fan, W. Exploring the computing literature with visualization and stepping stones and pathways. Commun. ACM 49, 4 (Apr. 2006), 52–58.
13. Garfield, E. and Pudovkin, A.I. The histcite system for mapping and bibliometric analysis of the output of searches using the ISI Web of Knowledge. In Proceedings of the 67th Annual Meeting of the American Society for Information Science and Technology (Providence, RI, Nov. 12–17). Association for Information Science and Technology, Silver Spring, MD, 2004.
14. Gillenwater, J., Kulesza, A., and Taskar, B. Discovering diverse and salient threads in document collections. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (Jeju Island, Korea, July 12–14). Association for Computational Linguistics, Stroudsburg, PA, 2012, 710–720.
15. Hall, R.H. and O’Donnell, A. Cognitive and affective outcomes of learning from knowledge maps. Contemporary Educational Psychology 21, 1 (Jan. 1996), 94–101.
16. Hossain, M.S., Gresock, J., Edmonds, Y., Helm, R., Potts, M., and Ramakrishnan, N. Connecting the dots between PubMed abstracts. PloS One 7, 1 (Jan. 2012), e29509.
17. Jo, Y., Hopcroft, J.E., and Lagoze, C. The web of topics: discovering the topology of topic evolution in a corpus. In Proceedings of the 20th International Conference on the World Wide Web (Hyderabad, India, Mar. 28–Apr. 1). ACM Press, New York, 2011, 257–266.
18. Kleinberg, J. Bursty and hierarchical structure in streams. Data Mining and Knowledge Discovery 7, 4 (Oct. 2003), 373–397.
19. Nallapati, R., Feng, A., Peng, F., and Allan, J. Event threading within news topics. In Proceedings of the 13th ACM International Conference on Information and Knowledge Management (Washington, D.C., Nov. 8–13). ACM Press, New York, 2004, 446–453.
20. Nenkova, A. and McKeown, K. A survey of text summarization techniques. In Mining Text Data, C.C. Aggarwal and C. Zhai, Eds. Springer, 2012, 43–76.
21. Nesbitt, K. Getting to more abstract places using the metro map metaphor. In Proceedings of the Eighth International Conference on Information Visualisation (London, U.K., July 14–16). IEEE, 2004, 488–493.
22. Radev, D., Otterbacher, J., Winkel, A., and Blair-Goldensohn, S. Newsinessence: Summarizing online news topics. Commun. ACM 48, 10 (Oct. 2005), 95–98.
23. Rewey, K.L., Dansereau, D.F., and Peel, J.L. Knowledge maps and information processing strategies. Contemporary Educational Psychology 16, 3 (July 1991), 203–214.
24. Shahaf, D. and Guestrin, C. Connecting the dots between news articles. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Washington, D.C., July 25–28). ACM Press, New York, 2010.
25. Shahaf, D., Guestrin, C., and Horvitz, E. Metro maps of science. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Beijing, China, Aug. 12–16). ACM Press, New York, 2012.
26. Shahaf, D., Guestrin, C., and Horvitz, E. Trains of thought: Generating information maps. In Proceedings of the 21st International Conference on the World Wide Web (Lyon, France, Apr. 16–20). ACM Press, New York, 2012, 899–908.
27. Shahaf, D., Yang, J., Suen, C., Jacobs, J., Wang, H., and Leskovec, J. Information cartography: Creating zoomable, large-scale maps of information. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Chicago, IL, Aug. 11–14). ACM Press, New York, 2013, 1097–1105.
28. Swan, R. and Jensen, D. TimeMines: Constructing timelines with statistical models of word usage. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Boston, MA, Aug. 20–23). ACM Press, New York, 2000.
29. Yan, R., Wan, X., Otterbacher, J., Kong, L., Li, X., and Zhang, Y. Evolutionary timeline summarization: A balanced optimization framework via iterative substitution. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval (Beijing, China, July 24–28). ACM Press, New York, 2011, 745–754.
30. Yang, Y., Ault, T., Pierce, T., and Lattimer, C. Improving text categorization methods for event tracking. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (Athens, Greece, July 24–28). ACM Press, New York, 2000, 65–72.
31. Yang, Y., Carbonell, J., Brown, R., Pierce, T., Archibald, B., and Liu, X. Learning approaches for detecting and tracking news events. IEEE Intelligent Systems 14, 4 (July/Aug. 1999), 32–43.

Dafna Shahaf ([email protected]) is a postdoctoral fellow in the Computer Science Department at Stanford University, Stanford, CA.

Carlos Guestrin ([email protected]) is an associate professor in the Department of Computer Science & Engineering at the University of Washington, Seattle, WA.

Eric Horvitz (horvitz@microsoft.com) is a Distinguished Scientist and Director at Microsoft Research, Redmond, WA.

Jure Leskovec ([email protected]) is an assistant professor in the Department of Computer Science at Stanford University, Stanford, CA.

© 2015 ACM 0001-0782/15/11 $15.00

Watch the authors discuss their work in this exclusive Communications video. http://cacm.acm.org/videos/information-cartography
