
EUROVIS 2019/ J. Johansson, F. Sadlo, and G. E. Marai Short Paper

WordStream: Interactive Visualization for Topic Evolution

T. Dang, H.N. Nguyen, and V. Pham

Texas Tech University, Lubbock, USA

[Figure 1 image: terms drawn inside four stacked stream layers (person, location, organization, miscellaneous); the horizontal axis runs monthly from Jan 2012 to Jun 2013.]

Figure 1: WordStream visualization for the Huffington Post data, from January 2012 to June 2013. Terms are color-coded by category.

Abstract

This paper introduces WordStream, an interactive visual tool for demonstrating topic evolution. Our approach combines two popular techniques: word clouds, which give an engaging visualization of text via font sizes and colors, and stacked graphs, a common method for visualizing topic evolution. In particular, WordStream emphasizes essential terms chronologically and spatially. To show the usefulness of WordStream, we demonstrate its applications on various data sets, including the Huffington Post and IEEE VIS publications.

1. Introduction

The visualization of topic evolution has a long history. In 1931, HistoMap [Spa31] was created by John B. Sparks, showcasing the power of civilizations along four thousand years of world history. In more recent efforts, ThemeRiver [HHN00] and, later, StreamGraph [BW08] expand the idea of the stacked graph to convey the evolution of topics [DGWC10]. A limited screen display and a large number of layers leave only a small area for each term; therefore, the task of fitting terms into topic streams becomes more challenging. Word clouds [VWF09] are designed for optimizing space usage [Fei10]. However, temporal information has not been considered properly in this type of text presentation.

The combination of word cloud and stacked graph models has been recently studied [SWL∗14]. However, there is still room for improvement and optimization, especially when the topic streams fluctuate highly. In this paper, we introduce a hybrid visualization to fill this gap. Our contributions in this paper are:

• We propose a synthesized visualization approach that places a word cloud within each stream layer.
• We implement an interactive text visualization prototype, named WordStream, that represents the evolution of topics and conveys both spatial and temporal information.
• We evaluate the usefulness of WordStream on various data sets (e.g., political blogs and other application domains).

© 2019 The Author(s) Eurographics Proceedings © 2019 The Eurographics Association.

DOI: 10.2312/evs.20191178 https://www.eg.org https://diglib.eg.org

T. Dang et al. / WordStream: Interactive Visualization for Topic Evolution

2. Related Work

2.1. Word cloud models

Wordle [VWF09] uses a randomized greedy algorithm to place words; it is greedy in that it prioritizes the more frequent words. Wordle is also aesthetically and visually appealing [Fei10]. In the past few years, there have been many efforts to optimize the Wordle layout. ManiWordle [KLKS10] gives users more flexible, interactive control over how the word layout is formed. Rolled-out Wordles [SSS∗12] use Linear Sorting (RWordle-L) and Concentric Sorting (RWordle-C) to place the words more compactly while preserving orthogonal ordering and topology. WordlePlus [JLS15] extends the idea of ManiWordle with natural interaction for pen- and touch-enabled tablets, such as resizing, adding, and deleting elements, while controlling the overall Wordle layout. A recent work, EdWordle [WCB∗18], allows editing while preserving the word cloud layout. These works mostly focus on extending or optimizing the Wordle layout and discard the time element.

There are attempts to integrate temporal constraints into Wordle. Parallel Tag Clouds [CVW09] use parallel coordinates to represent the time constraint. At each time step, terms are placed in alphabetical order or in order of importance, based on term frequency. This technique also has a feature that is implemented in WordStream: displaying a stream when a term is selected. However, Parallel Tag Clouds are not space-efficient and cannot show the topic or overall evolution across time.

2.2. Time Series Visualizations

A stacked graph [Har00] has a baseline representing the time constraint, and each layer serves as a topic of interest. The layers are stacked on one another starting from the baseline, and the change in the width of each layer represents its evolution over time. ThemeRiver [HHN00] is an optimization of the stacked graph that provides a symmetric layout by setting the baseline at the center of the overall graph, which helps to smooth the transition across time. StreamGraph [BW08], an evolution of ThemeRiver, focuses on minimizing the wiggle per layer, which is the sum of squares of the slopes at each time point, to avoid the legibility issues of the earlier stacked graph and ThemeRiver versions and to improve the aesthetics of the overall graph. All these versions of the stacked graph share a common issue: layer width becomes limited as the number of layers increases [CSYP18]. With limited space in the stream layer, the process of locating terms of the topic within the layer [Wat05] may encounter difficulties. Therefore, a stream layer typically contains only a few terms or none. This makes it difficult to view the evolution at a finer granularity [WSK∗13]. WordStream places each term as close to its time step as possible by sharing space between adjacent time steps.

2.3. Combined models

TIARA [LZP∗12] combines the word cloud and the stacked graph to give a visual representation of abstract and complex text summarization. The TIARA visualization includes keyword clouds embedded in the layers of a stacked graph, whose layers depict the different topics. However, TIARA only places a few word clouds in the stream boxes where space is sufficient; maximizing the space usage within the stream layer was not a priority of this technique. TextFlow [CLT∗11] demonstrates the evolution of and relationships among topics and their critical events: birth, death, split, and merge. A word cloud at a particular timestamp can be displayed only on request. WordStream embeds terms directly into the stream layer. By sharing space between consecutive word clouds and color-encoding terms in the same layer, we present the global evolution of topic streams using their text elements.

3. Design Decisions for WordStream Visualization

This section presents the goals for designing WordStream and the decisions made during its implementation to meet these goals. The aim of WordStream is to communicate the global patterns of the text corpus across time. The design goals are largely drawn from the related work reviewed in Section 2.

• G1. Display the evolution of stream topics [CLT∗11].
• G2. Emphasize important terms in their corresponding topic layers at their corresponding time steps [PD18].
• G3. Maximize the space usage and place as many terms within each topic stream layer as possible [FFB18, WSK∗13].

The following decisions were made during the implementation of WordStream:

• D1: Apply StreamGraph to represent the topic evolution (G1). The time direction is from left to right [DW13, DAW13], while the height of each stream layer at a time step is relative to the total term frequencies [LZT09].
• D2: Utilize a word cloud algorithm to display terms from each topic within its corresponding stream layer. This helps to satisfy G2 [VWF09, Fei10].
• D3: Place each term at the horizontal position that corresponds to its timestamp [CVW09]. However, this constraint is loosened to "as close as possible to its timestamp", so some terms may be placed outside of their time boxes to meet G3.
• D4: Arrange texts with regard to their stream flows. Each word may have its own orientation (not always horizontal) to form the stream flows (G1).

Figure 2 shows an overview of our approach. Topics are color-coded by using displaCy Named Entity Visualizer [Mon17].

Figure 2: A standard word cloud and a stream graph are synthesized into a single snapshot (on the right).
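Decision D1 can be illustrated with a small sketch. It is not the authors' implementation: the function name `stack_layers` and the input layout are hypothetical, and for simplicity it assumes a ThemeRiver-style baseline that is symmetric around zero rather than StreamGraph's wiggle-minimizing baseline. It only shows how the height of each layer at a time step follows the total term frequency of that topic.

```python
def stack_layers(freqs):
    """Return, per layer, a list of (y0, y1) bands, one per time step.

    freqs[k][t] is the total term frequency of topic layer k at time step t.
    Assumption: the baseline is symmetric around y = 0 (ThemeRiver style),
    not the wiggle-minimizing baseline used by StreamGraph.
    """
    n_steps = len(freqs[0])
    bands = [[] for _ in freqs]
    for t in range(n_steps):
        total = sum(layer[t] for layer in freqs)
        y = -total / 2.0  # start half of the total height below the center
        for k, layer in enumerate(freqs):
            # Band height equals the layer's total term frequency at step t.
            bands[k].append((y, y + layer[t]))
            y += layer[t]
    return bands
```

Each returned band is the vertical extent available to the terms of one topic at one time step, which is the space that the word cloud of D2 has to fill.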


On the control panel, users can customize the WordStream layout by adjusting visual settings such as the font scale, the maximum number of words in each box, and the dimensions of the overall layout. Regarding term orientation, we provide the following two options.

• Flow: The orientation of each word follows the orientation of its stream at the time step the word belongs to.
• Angle variance: The rotation angle of each word varies within a fixed range relative to the middle line of the stream flow.

4. Computing WordStream Visualization

Figure 3: The main components of our WordStream visualization: Preprocessing data, building boxes, placing terms, and drawing.

Preprocessing data: The input text documents are preprocessed into entities and further classified into different categories [Mon17]. In many cases, the term frequency might not convey user interests [DPF16]. For example, the term “Obama”, repeated numerous times in political blogs and news, might not draw a lot of attention or interest [SMR08]. To focus on the more significant terms, we use the sudden attention measure, which refers to a sharp increase in frequency [DN18]. Let F_1, F_2, ..., F_n be the frequency of an entity at n different time points. The sudden attention series (S_1, S_2, ..., S_n) is computed by S_t = (F_t + 1) / (F_{t-1} + 1).

Building boxes: Following D1 and D3, our approach places terms inside their corresponding stream and close to their time steps. Invisible boxes are created for this purpose, as depicted in Figure 4. WordStream scans along a spiral pattern centered at the box to find the first available space to place each term.

[Figure 4 image; labels: Box, Scanning position, StreamGraph layer.]

Figure 4: Building boxes and placing terms into stream layers.

Placing terms: A mask-based, pixel-perfect collision detection algorithm is used to detect collisions. The mask-based algorithm uses a board to represent the stream layer with all the terms that have been placed on it at the checking time. This board is used to check for collisions against a new term. In the pixel-perfect approach, the terms and the board are represented as pixels. To reduce memory usage, only the red component (instead of all red, green, blue, and alpha components) is used to represent a pixel; this data is stored in a variable called the sprite of the board or of the term. The sprite value of a pixel i is computed from the pixel data using the following formula: sprite[i] = pixels[i << 2]. The collision detection checks all pixel positions on the sprite of the term against the corresponding positions on the board. Similarly, placing the new term onto the current board adds the values from the sprite of the term to the corresponding positions of the sprite of the board. As depicted in Figure 4, the spiral starts at the center of each box as calculated for its corresponding time step, and its maximum deviation from the center (dx, dy) is smaller than the diagonal of the box.

Drawing: The filtered terms are placed sequentially to form the stream layers. Note that the layers can be ordered vertically by a quantitative attribute, such as increasing security level or user rating. In addition, interactive features are supported to highlight the evolution of individual terms (see Figure 7).

5. Evaluation

5.1. Quantitative Evaluation

For each test data set, the combinations of the two options (Flow and Angle variance) are evaluated on the Compactness metric, as depicted in Figure 5. Compactness is defined as the area of all displayed words divided by the area of the stream [BKP14], indicating the level of coverage over the stream layer. Compactness ranges from 0 to 1; a higher value means the words come closer to forming the full shape of the stream layers.

[Figure 5 image: two bar charts, titled TF-IDF Ratio and Compactness, each with a vertical axis from 0 to 1 and the test data sets on the horizontal axis (WikiNews, Huffington, CrooksAndLiars, EmptyWheel, Esquire, FactCheck, VIS_paper, IMDB, PopCha, Protein Discovery); legend: Flow, Angle variance.]

Figure 5: Comparisons of the Compactness measure for 10 test data sets (from left to right).

As seen in Figure 5, surprisingly, the best result comes from disabling both options. One example of this arrangement can be seen in the top panel of Figure 6. This can be explained by conflicts in placing terms: when orientation is allowed to vary, sprites conflict not only with one another but also with the direction constraints. In contrast, enabling both options produces the lowest Compactness scores on most test data sets.

WordStream is implemented using D3.js [BOH11]. The demo video, online prototype, and more examples can be found at the GitHub page https://idatavisualizationlab.github.io/WordStream/.
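The term placement described in Section 4 can be sketched in a few lines. This is a minimal illustration, not the authors' D3.js code: the function names, the flat row-major board layout, the Archimedean spiral shape, and the parameters `growth` and `dt` are all assumptions made for the example. Only the sprite formula sprite[i] = pixels[i << 2], the mask-based check, the additive stamping, and the bounded spiral scan come from the text.

```python
import math


def to_sprite(pixels):
    """Reduce flat RGBA pixel data to one value per pixel (the red channel)."""
    return [pixels[i << 2] for i in range(len(pixels) // 4)]


def collides(board, board_w, sprite, sprite_w, x, y):
    """True if a set sprite pixel hits a set board pixel or leaves the board."""
    board_h = len(board) // board_w
    for i, value in enumerate(sprite):
        if not value:
            continue
        bx, by = x + i % sprite_w, y + i // sprite_w
        if not (0 <= bx < board_w and 0 <= by < board_h):
            return True  # assumption: leaving the board counts as a collision
        if board[by * board_w + bx]:
            return True
    return False


def stamp(board, board_w, sprite, sprite_w, x, y):
    """Add the sprite values onto the board at offset (x, y)."""
    for i, value in enumerate(sprite):
        if value:
            board[(y + i // sprite_w) * board_w + x + i % sprite_w] += value


def place(board, board_w, sprite, sprite_w, cx, cy, max_dev, growth=0.5, dt=0.25):
    """Scan outward from (cx, cy) along a spiral; stamp at the first free spot.

    The spiral r = growth * t is a hypothetical choice; the scan stops once the
    radius exceeds max_dev, and None is returned if no free position was found.
    """
    t = 0.0
    while growth * t <= max_dev:
        r = growth * t
        x = cx + round(r * math.cos(t))
        y = cy + round(r * math.sin(t))
        if not collides(board, board_w, sprite, sprite_w, x, y):
            stamp(board, board_w, sprite, sprite_w, x, y)
            return (x, y)
        t += dt
    return None


# Usage: a 6x4 board and a 2x1 term sprite built from RGBA data.
board = [0] * (6 * 4)
word = to_sprite([255, 0, 0, 255] * 2)
first = place(board, 6, word, 2, cx=2, cy=1, max_dev=5.0)
second = place(board, 6, word, 2, cx=2, cy=1, max_dev=5.0)
```

Because the board accumulates every stamped sprite, one lookup per pixel is enough to test a new term against all previously placed terms, which is what makes the mask-based check cheap.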

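The two measures used above, sudden attention (Section 4) and Compactness (Section 5.1), are simple ratios and can be written down directly. In this sketch the function names are hypothetical, and the frequency before the first time point (F_0) is taken to be 0, which the text does not specify.

```python
def sudden_attention(freqs):
    """S_t = (F_t + 1) / (F_{t-1} + 1), with F_0 taken as 0 (an assumption)."""
    scores = []
    prev = 0
    for f in freqs:
        scores.append((f + 1) / (prev + 1))
        prev = f
    return scores


def compactness(word_areas, stream_area):
    """Area of all displayed words divided by the area of the stream."""
    return sum(word_areas) / stream_area
```

A term that jumps from 0 to 3 mentions scores 4.0 at that time step, while a term that stays at 3 scores 1.0, so constantly frequent terms are not favored over terms with a sharp rise.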

5.2. Exploring IEEE VIS author contribution

The IEEE Visualization Publications data set [IHK∗17] contains 2,867 entries with attributes such as Conference (InfoVis, VAST, and SciVis), Year (from 1990 to 2016), Paper Title, Link, and Author names. This data set is particularly useful for finding appropriate reviewers for paper or proposal submissions and for exploring the contributions of researchers over time. Figure 6 presents popular IEEE VIS authors over a period of 10 years. From top to bottom, we show the layouts generated by combining the two options, Flow and Angle variance. As depicted, the top panel is the most compact layout, as it fits more author names than the others. This confirms our observation in the previous section. Moreover, the top panel also achieves the best readability, as all terms are horizontal.

[Figure 6 image: four panels, from top to bottom: No Flow, No Angle Variance; Only Flow; Only Angle Variance; Flow and Angle Variance. Layers: Vis, VAST, InfoVis, SciVis; the horizontal axis runs from 2007 to 2016.]

Figure 6: Popular IEEE VIS authors over 10 years from 2007 to 2016: author names are colored by their first publication venue.

5.3. Informal User Study

We conducted informal user studies to gather qualitative responses about WordStream from two experts, one researcher in political science and one professor in data science. The study began with a brief description of WordStream to familiarize them with the usage of the analytic tool. We also adapted the implementation of TimeArcs [DPF16] for the same political blogs as a reference. Then the experts were free to explore the visual interfaces for a specific task: what are the top political events in the past ten years? Both of them agreed that, in comparison to TimeArcs, WordStream is useful for conveying the global trend and can be applied to visualize emerging topics in various domains.

For the overall presentation, both users commented that they could quickly understand the idea of the layout. The data science professor stated that he had known word clouds before, and that WordStream “allows you to do the longitudinal analysis easily”. He chose a term and scrolled through the entire timeline to see the fluctuation of its occurrences, as depicted in Figure 7. Hence, the visualization is helpful in exploratory analysis. However, he commented that the layout might become cluttered when the number of layers increases to about ten. The political expert, on the other hand, at first found it “intimidating,” but shortly after the description he found the interface easy to use. Furthermore, he found brushing and linking efficient for highlighting the temporal patterns of terms, along with supporting content analysis.

Figure 7: Term selection in our WordStream layout: boston (location) and google (organization).

Besides the positive feedback, the experts also mentioned a limitation of WordStream: related words are not shown in clusters, unlike in TimeArcs. One of them suggested that drawing the relationships among terms explicitly would be more useful than relying on the proximity of terms, as the current visualization does.

6. Conclusion

This paper presents a hybrid text visualization technique. WordStream aims to communicate the global trends of the underlying topic evolution while preserving the presentation-oriented criteria of the visualization solution. We demonstrate its applications on various data sets, showing that WordStream can quickly highlight important terms and can assist users in exploring term evolution at a finer granularity. Future work will focus on algorithms to cluster the terms within and across topic streams. Also, more interactive features should be supported to optimize the WordStream layout.
Keefe̦ ller Kwan­Liu MaTimo RopinskiColin Ware i Jason Dykes Petra Isenberg Hanspeter Pfister Jef oun ah KangJohn T. Stasko Harald Piringer Shixia Liu Kwan­Liu Ma Han­Wei Shen uan . CarpendaleY Martin Wattenberg Huamin QuJason Dykes Ronell Sicat Xiaoru Y . Stasko Heidi Lam Inan̤ Birol Marc Streit Xianfeng GuHarald ObermaierMarkus Hadwiger topic evolution while preserving the presentation-oriented criteria Kai Xu ilanova Ji Soo Y Frank van Ham Melanie ToryJo W John T Sharon Lin ood Anna V Thomas Ertl Eduard Groller M. Sheelagh TGeorge G. Robertson of the visualization solution. We demonstrate the applications on 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 various data sets, showing that WordStream could quickly highlight important terms and could assist users in exploring term evolution Figure 6: Popular IEEE VIS authors over 10 years from 2007 to at a finer granularity. Future work will focus on algorithms to clus- 2016: author names are colored by their first publication venue. ter the terms within and across topic streams. Also, more interactive features should be supported to optimize WordStream layout.
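As a rough illustration of what clustering terms could involve, the sketch below groups terms that co-occur in the same documents. It is not the authors' algorithm, which is left to future work; the function names, the `min_count` threshold, and the sample terms "marathon" and "apple" are assumptions made for this example only (boston and google are the terms shown in Figure 7).

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count how many documents each unordered pair of terms shares."""
    counts = Counter()
    for terms in documents:
        # Deduplicate and sort so every pair has one canonical key.
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return counts


def cluster_terms(documents, min_count=2):
    """Group terms linked by at least `min_count` shared documents.

    Hypothetical helper: connected components over the co-occurrence
    graph, computed with a small union-find structure.
    """
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    # Register every term so that unlinked terms form singleton clusters.
    for terms in documents:
        for term in terms:
            find(term)

    # Merge the clusters of every sufficiently frequent pair.
    for (a, b), n in cooccurrence_counts(documents).items():
        if n >= min_count:
            parent[find(a)] = find(b)

    clusters = {}
    for term in parent:
        clusters.setdefault(find(term), set()).add(term)
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    docs = [
        ["boston", "marathon"],
        ["boston", "marathon", "google"],
        ["google", "apple"],
        ["google", "apple"],
    ]
    print(cluster_terms(docs))
```

The same pair counts could also drive explicit links between terms, as one expert suggested, instead of relying on proximity alone. A real implementation would likely compute the counts per time step and per topic stream.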

© 2019 The Author(s)
Eurographics Proceedings © 2019 The Eurographics Association.

T. Dang et al. / WordStream: Interactive Visualization for Topic Evolution

References

[BKP14] BARTH L., KOBOUROV S. G., PUPYREV S.: Experimental comparison of semantic word clouds. In Experimental Algorithms (Cham, 2014), Gudmundsson J., Katajainen J., (Eds.), Springer International Publishing, pp. 247–258. 3

[BOH11] BOSTOCK M., OGIEVETSKY V., HEER J.: D³ data-driven documents. IEEE Transactions on Visualization & Computer Graphics 17, 12 (2011), 2301–2309. 3

[BW08] BYRON L., WATTENBERG M.: Stacked Graphs - Geometry & Aesthetics. IEEE Transactions on Visualization and Computer Graphics 14, 6 (2008), 1245–1252. 1, 2

[CLT∗11] CUI W., LIU S., TAN L., SHI C., SONG Y., GAO Z., QU H., TONG X.: TextFlow: Towards better understanding of evolving topics in text. IEEE Trans. Vis. Comput. Graph. 17, 12 (2011), 2412–2421. 2

[CSYP18] CUENCA E., SALLABERRY A., YING WANG F., PONCELET P.: MultiStream: A Multiresolution Streamgraph Approach to Explore Hierarchical Time Series. IEEE Transactions on Visualization and Computer Graphics (2018). 2

[CVW09] COLLINS C., VIÉGAS F. B., WATTENBERG M.: Parallel tag clouds to explore and analyze faceted text corpora. Proceedings of the VAST 09 - IEEE Symposium on Visual Analytics Science and Technology (2009), 91–98. 2

[DAW13] DANG T. N., ANAND A., WILKINSON L.: TimeSeer: Scagnostics for high-dimensional time series. IEEE Trans. Vis. Comput. Graph. 19, 3 (March 2013), 470–483. 2

[DGWC10] DÖRK M., GRUEN D., WILLIAMSON C., CARPENDALE S.: A Visual backchannel for large-scale events. IEEE Transactions on Visualization and Computer Graphics 16, 6 (2010), 1129–1138. 1

[DN18] DANG T., NGUYEN V. T.: ComModeler: Topic Modeling Using Community Detection. In EuroVis Workshop on Visual Analytics (EuroVA) (2018), Tominski C., von Landesberger T., (Eds.), The Eurographics Association. 3

[DPF16] DANG T. N., PENDAR N., FORBES A. G.: Timearcs: Visualizing fluctuations in dynamic networks. In Computer Graphics Forum (2016), vol. 35, Wiley Online Library, pp. 61–69. 3, 4

[DW13] DANG T., WILKINSON L.: TimeExplorer: Similarity search time series by their signatures. In Proc. International Symp. on Visual Computing (2013), pp. 280–289. 2

[Fei10] FEINBERG J.: Wordle. Beautiful Visualization: Looking at data through the eyes of experts. (2010), 37–58. 1, 2

[FFB18] FELIX C., FRANCONERI S., BERTINI E.: Taking word clouds apart: An empirical investigation of the design space for keyword summaries. IEEE transactions on visualization and computer graphics 24, 1 (2018), 657–666. 2

[Har00] HARRIS R. L.: Information Graphics: A Comprehensive Illustrated Reference. Oxford University Press, 2000. 2

[HHN00] HAVRE S., HETZLER B., NOWELL L.: ThemeRiver: visualizing theme changes over time. Proceedings of IEEE Symposium on Information Visualization 2000. INFOVIS 2000 2000 (2000), 115–123. 1, 2

[IHK∗17] ISENBERG P., HEIMERL F., KOCH S., ISENBERG T., XU P., STOLPER C., SEDLMAIR M., CHEN J., MÖLLER T., STASKO J.: vispubdata.org: A metadata collection about IEEE visualization (VIS) publications. IEEE Transactions on Visualization and Computer Graphics 23, 9 (Sept. 2017), 2199–2206. 4

[JLS15] JO J., LEE B., SEO J.: WordlePlus: Expanding Wordle’s Use through Natural Interaction and Animation. IEEE Computer Graphics and Applications 35, 6 (2015), 20–28. 2

[KLKS10] KOH K., LEE B., KIM B., SEO J.: ManiWordle: Providing flexible control over Wordle. IEEE Transactions on Visualization and Computer Graphics 16, 6 (2010), 1190–1197. 2

[LZP∗12] LIU S., ZHOU M. X., PAN S., SONG Y., QIAN W., CAI W., LIAN X.: TIARA: interactive, topic-based visual text summarization and analysis. ACM TIST 3, 2 (2012), 25:1–25:28. 2

[LZT09] LOHMANN S., ZIEGLER J., TETZLAFF L.: Comparison of tag cloud layouts: Task-related performance and visual exploration. In Proceedings of the 12th IFIP TC 13 International Conference on Human-Computer Interaction: Part I (Berlin, Heidelberg, 2009), INTERACT ’09, Springer-Verlag, pp. 392–404. 2

[Mon17] MONTANI I.: An open-source named entity visualiser for the modern web, 2017. https://explosion.ai/blog/displacy-ent-named-entity-visualizer [Accessed date: April 10, 2019]. 2, 3

[PD18] PHAM V., DANG T.: Cvexplorer: Multidimensional visualization for common vulnerabilities and exposures. In 2018 IEEE International Conference on Big Data (Big Data) (Dec 2018), pp. 1296–1301. 2

[SMR08] SCHÜTZE H., MANNING C. D., RAGHAVAN P.: Introduction to information retrieval, vol. 39. Cambridge University Press, 2008. 3

[Spa31] SPARK J.: Time of world history: A histomap of peoples and nations for 4,000 years, 1931. http://www.worldhistorycharts.com/the-histomap-by-john-sparks/ [Accessed date: Feb 20, 2019]. 1

[SSS∗12] STROBELT H., SPICKER M., STOFFEL A., KEIM D., DEUSSEN O.: Rolled-out Wordles: A Heuristic Method for Overlap Removal of 2D Data Representatives. Computer Graphics Forum 31, 3pt3 (2012), 1135–1144. 2

[SWL∗14] SUN G., WU Y., LIU S., PENG T., ZHU J. J. H., LIANG R.: Evoriver: Visual analysis of topic coopetition on social media. IEEE Transactions on Visualization and Computer Graphics 20, 12 (Dec 2014), 1753–1762. 1

[VWF09] VIEGAS F. B., WATTENBERG M., FEINBERG J.: Participatory visualization with Wordle. IEEE transactions on visualization and computer graphics 15, 6 (2009). 1, 2

[Wat05] WATTENBERG M.: Baby names, visualization, and social data analysis. In Proc. IEEE Symp. on Information Visualization (2005), pp. 1–7. 2

[WCB∗18] WANG Y., CHU X., BAO C., ZHU L., DEUSSEN O., CHEN B., SEDLMAIR M.: EdWordle: Consistency-Preserving Word Cloud Editing. IEEE Transactions on Visualization and Computer Graphics 24, 1 (2018), 647–656. 2

[WSK∗13] WALDNER M., SCHRAMMEL J., KLEIN M., KRISTJÁNSDÓTTIR K., UNGER D., TSCHELIGI M.: Facetclouds: Exploring tag clouds for multi-dimensional data. In Proceedings of Graphics Interface 2013 (Toronto, Ont., Canada, Canada, 2013), GI ’13, Canadian Information Processing Society, pp. 17–24. 2
