Visualizing Information and Data

Visualizing Data: An Interview with Anselm Spoerri

Visualizing data starts with putting the data into a form you can visualize, then understanding who your audience is and what they want to know.

By Jocelyn McNamara, MLIS

If a picture is worth a thousand words, how many data points is an infographic worth? Typically not many, says Anselm Spoerri. An infographic, a term often (wrongly) used as a catch-all for everything from simple bar charts to complex interactive visualizations, is best used to convey a few data points that represent key patterns or trends. Visualizations, on the other hand, provide context for the data and often allow the audience to manipulate the data to reveal more complex relationships among the factors that affect the data.

Spoerri, a faculty member at the School of Communication and Information at Rutgers University, has conducted research in the field of information visualization for the last 20 years and teaches students how best to visualize data. Earlier this year, he was honored with a Professional and Scholarly Excellence (PROSE) Award for DataVis Material Properties, a tool he helped design for McGraw-Hill Education.

Information Outlook interviewed Spoerri about the techniques and tools for visualizing data and information and the biggest challenges facing information professionals who work with data.

Jocelyn McNamara is deputy director at LAC Federal in Rockville, Maryland, and a member of the Information Outlook Advisory Council. She can be reached at [email protected].

The terms graphics, infographics, and visualization are used frequently and often interchangeably in regard to data presentation. Can you define these words and explain their differences and similarities?

The way I like to think about it is, infographics usually are static. They give you a high-level view of the salient patterns in the data that have been identified by an analyst, who then, either by using some tools or hiring a graphic designer, comes up with a visual representation of those key results. So, from the perspective of data visualization as a whole, infographics are what you use when you are communicating with a very general audience or you just want to communicate highlights in a quick way.

But the moment you want to understand more about the data, then you come into the realm of what’s considered interactive data visualization, meaning you can actually drill into the data and filter it and look at the subsets and understand the data yourself. The questions to ask about interactive data visualization are, one, who is doing it, and two, for whom is it being done?

An infographic is designed for end users who don’t need to have a lot of content understanding or domain understanding, so they can very quickly grasp key things that are going on with the data. But you can’t communicate

INFORMATION OUTLOOK V21 N05 September/October 2017

more complex or subtle relationships with data using infographics.

If you’re an analyst and you have a data set, you’d like to have interactive data visualization tools so you can better understand the data you’re trying to analyze. What’s starting to happen, and you’re seeing this in newspapers like The New York Times and The Washington Post, is that readers are being given access to interactive data visualization tools, so the readers themselves can explore the data and not just have to take the word of journalists that these are the key insights to be gained from the data.

It’s critical that the right kinds of displays are used to make the underlying patterns of the data visible. Often, you need to first figure out what those patterns are, so you need to go through an interactive or iterative loop to figure out what’s going on in the data and what’s the best way to see it and understand it and then present it. The distinction, in a way, is between the presentation of data and understanding and analyzing data. Sometimes the same tools are being used for both purposes.

If you want to go one level deeper, you get into what’s called data analytics. This is where the data sets are so huge that, yes, you can visualize them, but to be able to really understand them you need machine learning or artificial intelligence methods to find the interesting subsets. Then you can use visualization to further examine these data sets or fine-tune the algorithms that are crunching through them.

So the idea is that you don’t need to be a subject matter expert to be able to drill down and understand what’s going on, correct?

It depends. For example, you can view an infographic as an access ramp into data to get the key ideas. The reader gets an understanding, then moves on. He doesn’t linger with the data, maybe doesn’t even have a desire to, because often data is messy.

Think of it almost like being in a big city and wanting to show someone the major sights. You position your camera and take a snapshot, and you say, here is an interesting sight. Infographics are like that: you take a snapshot of an interesting pattern, and then you move on and explore the data and find another interesting connection or relationship, and you take another snapshot. And those snapshots are what you put into your presentation.

Using infographics is really a question of who your audience is and how much they care to know. Infographics are great for giving you the highlights, and depending on the sophistication of the designer, sometimes it can be done in a very memorable way so that even a lay person can understand quickly that the data are about cars or animals or the climate. There are pictorial representations that immediately tell people the context the data are in.

The challenge with what I would call “cold” or “sober” data visualization is that you have a series of displays that you look at and you don’t even know what they’re about. You have to go look at the axes and read the labels, and it’s all very abstract.

The other question in my mind is one of motivation: how much do I care about the data, and how much value is in the data? For example, if you look at The New York Times, they did quite a lot of visualizations around the 2016 elections. Why? Because they believe their readers care about that kind of data; they want to know more about it, and they may even want to drill into it. So it really comes down to the value of the data set and what people are hoping to do with the information they gain from the data set.

What’s also happening with infographics is what I call “animated presentations.” These are like mini-videos that tell you, in an animated way, a story. They’re highly scripted, like a movie. And somebody has to figure out the highlights to present.

So the interpretation has already been done for the viewer?

Yes. And the questions there are, how skillful is the analyst who interpreted the data, and how skillful is the graphic designer who created the presentation?

I can readily imagine that data could be interpreted and communicated one way,


but someone might view it and draw a different and unintended conclusion.

If it’s a rich data set, there are multiple ways you can slice it and filter it. For example, go back to the 2016 election. People are still trying to understand what happened, how we got the result we got, and who voted for whom. The question is, what types of data variables do you need to take into account? Income? Religion? Education? What variables do we have, and how do we bring them into the data space so that a viewer can start to see relationships and correlations?

An infographic simplifies, either because a person has decided that what’s being presented is good enough, or there’s a very established way to think about it. The key is to make the information as accessible and consumable and understandable as possible. For me, if an infographic is done well, it’s like a well-designed documentary or movie. It’s also there for quick consumption. That’s what it’s designed for.

If you look at what’s happening right now, infographics are becoming very popular. The news media are using infographics to communicate with their readers; infographics are also being used in education to make certain facts available, to make numbers not appear as cold or abstract, to make them more relatable. Infographics are more about how data can be consumed. When we talk about data visualization, we are moving into a higher-dimensional realm of interaction, inquiry, examination, asking questions, and developing hypotheses.

Going back to your election example, people often talk about data as something that’s very objective and that we can refer to in a highly scientific way. Oftentimes, we forget that the variables we come up with at the front end of the inquiry are being determined by people. I think the human bias is overlooked a lot when we decide how to construct and interpret data sets.

Sometimes you have to work with what you have: this is what we measured, or this is what was easy to measure, so now we’ll work with it even though it’s not really the best way to think about this phenomenon. That’s one issue. Another issue is, how important is this data? Is it worth it for me to develop new ways of measuring things so I can start collecting this data?

And there are always errors in data, because data is noisy and messy. So when I teach students about data visualization, I tell them the hardest part is getting the data in a form that you can visualize. That’s a lot of work. And then you have to ask, how valid is your data, and what is the provenance of your data?

Another thing that’s happening, and I think this is where big data is coming in, is that people are starting to collect so much data that, at the end of the day, the noise in it doesn’t matter as much. If you were to look at one data variable on its own, you might say there’s too much noise in it, there’s too much measurement error, or there’s too much bias. But if you add another data variable, and then another and another, soon you have millions and millions of data points. And with the statistical methods and machine learning techniques available today, you can squeeze out some decent patterns that, if you had much less data, wouldn’t be as good.

Yes, the data is noisy, and yes, it can be biased. But there comes a point where, at certain volumes of it, and combined with other data sets (this is another thing that’s happening; people are combining multiple databases), when you start comparing them and relating them, you can sort of get the noise out, so to speak.

Looking at big data sets is something we haven’t done before. And that raises all sorts of black swan scenarios, where we’re trying to look for patterns that are emerging, but we don’t even know what to look for because we haven’t analyzed data at this level before.

You mentioned the black swan phenomenon, which is something that has a very low probability of happening, but when it happens, it has a catastrophic impact. True, there is a possibility that we don’t know yet how to use all of this data and make sense of it. But we have this computer infrastructure now that has gotten so cheap that we can really go for it. Before, it was way too expensive to work with these large data sets, so you had to be more “clever” to extract value.

Now, let’s bring it back to infographics. I’m in this sea of data and I’m using visualization to navigate the data and get some sense of it. In a way, data visualization is a communication tool. The question is, with whom am I communicating? Am I communicating with myself, meaning I’m looking at the data and I want to understand it so I can make a decision? If so, I’m using visualization as an analyst of the data.

Now I need to tell others about the data. If I have a team of specialists, we work together to build up know-how and maybe we even customize some visualization tools so we can better understand the data. Then we may need to communicate the data to others who are not experts, who are not as involved with the nitty-gritty of the data. And the question there is, what’s the attention span of the audience?

If it’s higher management and they just want the highlights, I might say we should create an infographic for them: here are the key facts, and here are some charts that are easy to read and don’t require special knowledge to understand. Or maybe we need to have a feedback loop and an interactive tool, so that when we’re talking about the data we can be in the moment and change the filter settings and the view of the data so we can start exploring the data and communicate to others how rich and complex it is.

It’s all a question of who’s doing it, why they’re doing it, what they’re hoping to get out of it, and how much time they’re willing to invest in it.

So, what are the main techniques for visually displaying data, and what visualization tools do you think would be


most useful for librarians and information professionals to learn?

If you think in terms of visualization, there are standard display types: bar charts, line charts, pie charts, bubble charts, scatter plots, and [...]. These are the workhorses of data visualization, and you’ll also see them in infographics. People are familiar with them; they’ve been in use for some time. And if you think in terms of how designers visually encode information, they do it in a way that makes it quite easy for human visual systems to read it, because they produce visual patterns that the human visual system is good at detecting.

Now the question is, when do you use a bar chart, when do you use a line chart, and so on. Most often, librarians will be using these types of tools. Some of them you can generate in Excel to create static visualizations, or you can use a tool like Tableau that has a free public version. Or maybe you get a license because you have data that’s coming from different databases that needs to be merged, and then you create visualizations and presentations for your management or for the public.

Google has free tools you can use; there’s a whole slew of tools available out there. The question then becomes, do you also want to do some statistical analysis? There are packages that are optimized for that. So it really becomes a question of how rich your data is: the larger the data set, the more sophisticated the tool you need to understand the data.

For me, the main driver is this: At the core is the data. Does the data matter? Depending on how rich the data is and how much it matters, are there meaningful patterns in the data? What are those patterns? You pick the right display to make those patterns visible. Then, if you want to design for a short attention span or for people who just want the key facts, you move to an infographic; if you want to open it up so there can be more exploration, you use an interactive data visualization tool.

With an interactive tool, the question is, do you give people access to all of the data, or do you create a curated data set that contains the key facts

Figure 1: The 100 Most Controversial Pages on Wikipedia (by language)


and patterns, but you don’t put all of it in there because it’s not necessary or because it’s proprietary or there’s something you don’t want everyone to see? And in order to get to that, you need to have some data specialists who, depending on the size of the data, have to dig around and analyze it until they get it to a place where it’s presentable.

There’s a notion out there that if I have a data visualization tool and I connect it to the data set, I will magically understand the data. Not so. You need to have certain skills and understandings so that you know how best to make visible what’s in the data. And sometimes there’s nothing in the data: there’s nothing strange or earth-shaking, or you already knew it or had a sense about it, and there are no major insights lurking in there.

Is there a risk that data visualization can be overused, that it will lose its impact?

This brings me back to infographics. If you ask me my opinion about infographics, I’m sort of skeptical. I understand the value of a well-designed infographic; if it’s done well, it can be extremely powerful. But the key is that there has to be something in it that’s of meaning, of value.

I think sometimes visualization, because we’re moving into a more “visual” world, is used as an attention-getter. If you look at some infographics, they use very saturated colors and large fonts in a way to draw the eye. It’s like putting a little sugar out there just to get somebody to pay attention. So it loses its power, but maybe for a certain consumption pattern, it’s perfect. I get a little taste, a little flavor, a little visual stimulus, and I can quickly make sense of it. No harm is done, but no deep insights are generated.

Can visualization be overused? In certain contexts, it is overused; the visual event that’s being created is not warranted. But you also have that danger with videos, when people are creating animations. Where there is data that has value in it, visualization will play its role, either to help communicate the value that’s there or to help people find the value that’s there.

Where do you see data visualization going, and what may be some of its long-term applications?

I said before that visualization is a communication tool. It’s either a communication tool for me to understand the data as a data analyst, or a tool to communicate what I found in the data to others. The question here is the sophistication. Will this become part of a general tool set that’s available for free (meaning I will buy a computer or a smartphone and it will have this capability built in), or will I have to buy some sort of specialized software?

The trend for the last few years has been that more and more has to go through the browser. Many people, even heavy-duty users, would like a way to use their browser so they can access the tool and view the data. That puts pressure on what can be done in the browser, whereas if you build specialized applications that are optimized for, say, rendering data points very quickly, that would be easier, but it would require either a download or a special license, and that will make it harder for people to become part of the conversation. So the trend is toward making it more ubiquitous in terms of what you can visualize and how you can visualize it, and there the browser plays a key role as a gateway to view the data.

I think the other trend is that visualization is used in different contexts to serve different purposes. For example, there are certain visualizations that I would call entertainment. They’re designed to entertain, to move, to elicit an emotion. Yes, there’s an overall theme, but it’s not about being able to reliably infer certain things. It’s more to get a feeling about the data.

Then there are tools that are more analytical, so that I can ask questions. If I do certain interactions, I get the same display, whereas if there’s a more artful visualization, it changes each time how it chooses to display the data.

Another trend is the diversity of display types that are becoming available, and that really depends on the specific need and also on the value proposition in the data. It’s the same as in other contexts: if there’s something that has special value, we will build specialized tools to extract that value.

Do you have a favorite resource to recommend for those of us who want to learn more?

There’s Alberto Cairo, who has written some great books that are not just for specialists. He comes from a journalism background, but he’s also a data visualization researcher.

To me, what’s interesting is what’s happening with the newspapers. If you look at The New York Times and The Washington Post and The Guardian, they’re using visualizations in their storytelling. And I don’t mean infographics; I mean interactive data visualization for a lay audience to better get a sense of the data and understand and explore it.

I think the other thing to think about is how it’s being consumed. Some of the visualizations require a large screen; otherwise, it is difficult to make sense of them. If it’s being consumed on a smartphone, that immediately limits what you can do and how much richness you can communicate. So I would be guided by the attention span of your audience. That guides what you can do and what you should be doing in terms of what you present.

RESOURCES

Scatterplot matrix: http://mbostock.github.io/d3/talk/20111116/iris-splom.html
Motion chart: https://bost.ocks.org/mike/nations/
Treemap of folder hierarchy of one million computer files: http://www.cs.umd.edu/hcil/millionvis/million-treemap.gif at http://www.cs.umd.edu/hcil/millionvis/
