VOLUME 20, NO 1, JUNE 2009

A joint newsletter of the Statistical Computing & Sections of the American Statistical Association

A Word from our 2009 Section Chairs

JOSÉ PINHEIRO ANTONY UNWIN COMPUTING GRAPHICS

Greetings. I wanted to use my first column as chair to "A picture is worth a thousand words" share some thoughts on what I see as a bright future for Well, sometimes they are and sometimes they aren't. Statistical Computing, using as motivation my own The best visualisations can be very convincing, bad ones professional area, pharmaceutical drug development. can just be confusing and misleading, they are The hope is to reassure some of our younger imaginative but not always statistically aware. There are colleagues of the increasing relevance of Statistical many more visualisations around nowadays, both in Computing in a large number of critical industries and the classical media like newspapers and television, and in areas of application, and the many employment and lots of places on the web. research opportunities that come, and are sure to continue Graphics, Continues on Page 2 … to come, with it.

As many of you have certainly heard, pharmaceutical Word from the Chairs 1 companies are going through tough times, with ever Editorial Notes 3 increasing costs to develop drugs, decreasing number of Gapminder: Liberating the x-axis from the 4 approvals, and dwindling pipelines of promising new burden of time compounds. Recently announced mergers in big pharma The Fan : displaying relative quantities 8 are a direct consequence of this pipeline problem . Taking it to higher dimensions 11 MMDS 08 12 Technology and Commerce Corner 18

Computing, Continues on Page 2 … A Six-Dimensional Scatterplot 19 Conference News 20

PAGE 1 STATISTICAL COMPUTING AND GRAPHICS NEWSLETTER VOLUME 20, No 1, JUNE 2009

Computing, Continues from Page 1 … Graphics, Continues from Page 1… Different strategies have been put in place to try to Sites like Manyeyes, Swivel, Flowingdata, Vizlab, improve things, one of the most promising and effective Data360 ask users to contribute data and visualisations being the use of trial simulation, coupled with statistical for discussion and comparison. This can sometimes and pharmacological modeling. produce some excellent results, occasionally some awful Statistical computing is at very core of this in silico ones (please send me your favorites in the email development approach, which relies on trial simulation to equivalent of a discreet brown paper envelope, anonymity evaluate the operational characteristics (e.g., power, type guaranteed). The very fact that discussion is encouraged I error rate, cost, duration) of complex, innovative study is valuable and it is fascinating to see confirmed what we designs. These designs frequently involve sophisticated always knew: there are few principle of statistical tools, such as Bayesian futility rules to decide that are universally accepted and some people's taste has on study discontinuation for clear lack of efficacy, which to be seen to be believed. themselves are computationally intensive. The bottom "The glass is half empty. The glass is half full." line is that clever, innovative statistical computing Pessimists bewail the flood of poor graphics methods and thinking are a pre-requisite for widespread published on the web, optimists are delighted that use of this promising technology, and research and graphics are getting more attention. Personally I am on employment opportunities for statisticians are being the side of the optimists. The more attention graphics gets created in increasing numbers, as a consequence. Similar the better. If at the same time standards are raised as stories of the increasing importance of Statistical people become aware of what is possible, that would be Computing as a critical tool to achieve greater efficiency splendid. If new ideas are generated through more people are common in many other industries. getting involved, that has to be good news too. If we have Switching to the upcoming JSM in Washington, to put up with graphics we would rather had never seen D.C.: our program-chair, Robert McCulloch, has crafted a the light of day, then that is a small price to pay for wonderful program with 19 great sessions, six of which graphics becoming more used and better known. I would invited. Featured topics include, among others, much prefer that graphics are a heated topic of popular “Advanced Monte Carlo Simulation” and “Case Studies discussion than a refined debate amongst experts in a in Complex Bayesian Computation,” closely related to backroom. the previous discussion on the increasing role of What is on at the JSM? Statistical Computing. Be sure to take a look at the online program: Of course, it's important for graphics experts to put http://www.amstat.org/meetings/jsm/2009/onlineprogram/ across their point of view and contribute to education in good graphical practice. The section has been active in sponsoring courses at the JSM and this year Martin Theus On a final note, I would like to express, on behalf of and Simon Urbanek will be giving a course on Interactive the section, our gratitude for the great work done by J.R. Statistical Graphics, based on their book which came out Lockwood during his three-year tenure as Statistical last Autumn. One of the things I like about the book, Computing Section Awards Chair, which is now coming which I imagine will be an important part of the course, is to and end. J.R.’s leadership and hard work have ensured the inclusion of graphical case studies. They don't just the success of the competitions sponsored by the section, discuss one graphic in isolation, they include a whole which fulfill its key mission of fostering and recognizing group of graphics to explore and gain information from new talent in Statistical Computing. Many thanks, J.R.! their data. I would like to see more of that on the web. At - José Pinheiro the moment you get only single graphics and no one display can do justice to a complex situation. Graphics, Continues on Page 3 …

PAGE 2 STATISTICAL COMPUTING AND GRAPHICS NEWSLETTER VOLUME 20, No 1, JUNE 2009

Graphics, Continued fron Page 3 … Six-Dimensional Scatterplot” to visualize six variables in Our Program Chair, Steven MacEachern, has a single scatterplot. organised a fine set of sessions and there should be plenty Nicholas Lewin-Koh and Andreas Krause (co-editors) to interest everyone. Rather than listing all the highlights here, I suggest you look at the online program and choose as filter all events sponsored by the section. Unfortunately that does not include one event which I hope is a fixture We are always looking for contributions! in your JSM plans, the annual Graphics and Computing Sections mixer which will take place on Monday 3rd August. Please get in touch with the editors directly if you think you have something interesting to contribute: a scientific Surprising as it may seem, plans are already afoot for article, a conference report, a great graph, a piece of next year's JSM in Vancouver. Please let our Program that you wrote or came across, or anything else Chair-elect, Heike Hofmann, have your ideas and that you think might be of interest to our readers. suggestions.

And should that quotation I began with really read "A full graphical analysis involves drawing a thousand pictures"? - Antony Unwin Editorial Change This issue of our newsletter sees an editorial change: Nicholas Lewin-Koh is taking over the position of the Editorial Note newsletter co-editor for the computing section from Michael O’Connell. What’s Inside this Issue Michael has been a co-editor for a year and we would Nicholas Lewin-Koh and Andreas Krause like to thank him for his efforts in 2008. This issue of our newsletter reflects the changes in Nicholas is a statistician our organization: we start out with addresses of the two working in the nonclinical new section chairs, José Pinheiro for the Statistical statistics department of Computing Section and Antony Unwin for the Graphics Genentech/ Roche in the Section. All current officers are listed on the last page of Bay area. In his day to day this volume. work he collaborates with We are glad to have several interesting articles. The many groups in research, scientific articles include “Gapminder: Liberating the x- process development, toxi- axis from the burden of time” by Hans Rosling and Claes cology, bioinformatics and Johansson, introducing a highly interesting piece of even manufacturing. software, readily available for use (watch the video!) and His interests are in “The Fan Plot: A technique for displaying relative smoothing, high dimensional quantities and differences” by Jim Lemon and Anupam , and dimension Tyagi, introducing a graphical technique to display reduction techniques. Nicholas did his graduate work at relative quantities. Iowa State University in statistics and ecology, and his This volume features a conference report on “MMDS undergraduate work at San Francisco State University in 2008: Algorithmic and Statistical Challenges in Modern ecology. Prior to starting school, Nicholas had a career in Large-Scale Data Analysis are the Focus” by Michael W. theater, and performed internationally. Mahoney, Lek-Heng Lim, and Gunnar E. Carlsson. And finally, we have included two single page contributions Welcome Nicholas! about graphics: “Taking it to higher dimensions”, showing some of the problems with 3D effects, and “A

PAGE 3 STATISTICAL COMPUTING AND GRAPHICS NEWSLETTER VOLUME 20, No 1, JUNE 2009

Key indicators are displayed in graphic formats that Scientific Article reach wider audiences, but interactive animations have not yet been applied for statistics at full scale. The young today grow up with amazing interactive animations in Gapminder: Liberat- computer games but still learn to use statistics in the form of static bar and colored . This will change! ing the x-axis from the Gapminder’s experience Gapminder is a non-profit foundation founded in burden of time 2005 with a goal of “…increase use and understanding of statistics and other information about social, economic Hans Rosling, Director, Gapminder Foundation and and environmental development at local, national and Professor of Public Health, Karolinska Institute, global levels.” We make statistical information available Stockholm through our graphical tool and produce videos, Flash Claes Johansson, Deputy Director, Gapminder animations and other products using this software. All our Foundation content is freely available at http://www.gapminder.org . The mission of Gapminder Foundation was established to Gapminder’s main experience is that animation has a increase the use and understanding of statistics. We surprising power to make wide user groups understand promote a fact-based world view in public debate by time-trends and disparities. Equally important is that we displaying accessible public data in animated graphics have shown that innovative computer-generated that can be understood by groups far beyond the normal animations for statistics can emerge from the user end consumers of statistics. rather than from the producer end of the data delivery The world’s taxpayers probably spend more than line. User-end innovation can be stimulated if the concept US$ 10 billion per year on collection and compilation of of producer dissemination is replaced by free access to statistics on all aspects of societies. Censuses, surveys, data. collection from administrative records and other forms of The main innovation from Gapminder is so far “the primary data collection constitute the lion’s share of this moving bubble ” in the form of the Trendalyzer cost. Most of the resulting public statistics are freely software that was acquired by in 2007. Google accessible at national level but mainly in formats that has made a 2008 version freely available as Google only experts can manage and understand. Motion Chart. Figure 1. Trendalyzer: The moving bubble chart (Google Motion Chart).

PAGE 4 STATISTICAL COMPUTING AND GRAPHICS NEWSLETTER VOLUME 20, No 1, JUNE 2009

The basic idea behind this animation chart is that time For users wishing to use similar visualizations to needed to be liberated from the x-axis to which it had Trendalyzer for their own data, provides a been tied since more than a century. After this liberation, similar free ‘gadget’ named Google Motion Chart. It uses time was shown as motion. This dramatically improved the Google Visualization Application Programming the understanding of time trends in a wide audience as Interface (API) and therefore can use data from a range of watching time series statistics became like watching a sources, including Google Spreadsheets but also from ball game. XML or JSON through HTTP. Following the liberation of time the empty x-axis Further details on how to use the software with could be populated by an indicator for which time series arbitrary data is available at are displayed together with a y-axis indicator, and the http://code.google.com/apis/visualization/documentation/ moving bubble displays a 3 rd indicator as bubble size and gallery/motionchart.html . Here data are not limited to the a 4 th as color of the bubble. The name of the bubble unit display of countries or even the use of time as the variable is displayed by “mouse over” or permanently by clicking of animation. For example, the software might even on a bubble. The moving bubble graph represents an animate using the dosage of a drug as the variable of abstraction of the reality it portrays, whereas the display animation. The only requirement of the dataset included of the time dimension as motion is intuitive. But it is to have at least x and y continuous data. Datasets surprised us how all aspects of the bubble chart display typically also include the optional variables for size and became more understandable as soon as the bubbles color. For color the software can both use categorical and started to move. continuous data. When the Singaporean bubble catches up with the The interactive display of time series in moving Swedish bubble all viewers understand that displayed bubbles does not display any form of analytical statistical aspects of life in Singapore have became similar to the calculations. Moving bubbles offers two complementary corresponding aspects in Sweden due to the swift uses: to explore the data for sanity check and for possible progress of Singapore during the last decades. In fact, this correlations to be later analyzed in conventional statistical particular example works well as a quick introduction to software and to communicate findings of conventional the basic functionality of the software. You may view it analysis and monitoring of the progress of societies. yourself at http://www.gapminder.org/graphs/35 . We have found Trendalyzer to be particularly useful Broadly speaking, the software is an interactive to explain statistical trends by narrating a story in lectures visualization and exploration tool primarily meant for or a screen capture video of which there are several: time series for geographical units. The main mode of http://www.gapminder.org/videos/ . Trendalyzer is the chart (Fig 1). It allows for the simultaneously display of four time series: x axis, y axis, One particularly successful video and a good as well as size and color of the bubbles. The software introduction is from the talk of one of the authors, requires continuous data for x, y, size, but allows both, Professor Rosling, given at the TED conference in 2006: continuous or categorical data for color of the bubbles. http://www.gapminder.org/graphs/35 The Trendalyzer reads data from Google’s online One common problem with interactive software used office suite . More specifically, the data are for communication is how to reference the particular stored in Google Spreadsheets which reside online on graphic being shown on your screen. Trendalyzer solves Google’s server. Through a particular convention the this by communicating through the URL. In other words, software reads and displays data from these spreadsheets. whenever something is changed in the graph, the URL Technically, the software is written in ActionScript 2, automatically updated to reflect this change, whether it is the penultimate version of the programming language a change of variable selection, display time, or country used in . The web service which loads the selection. The URL becomes a permanent link that can data and presents it to the software is written in Java. then be copied and shared in email, bibliographic While the source code was acquired by Google, references or through other means. Gapminder has a perpetual license to any binary versions What are the strengths? released by Google, and also has the right to sub-license We have many times shown time series to the subject the software to whomever it chooses. matter expert who compiled the dataset only to hear surprise at some feature of the data which previously

PAGE 5 STATISTICAL COMPUTING AND GRAPHICS NEWSLETTER VOLUME 20, No 1, JUNE 2009 wasn’t noted and understood to the producer before it was These are very powerful features because they allow a animated in front of the eyes. wide group of users to interact deeply with the data Another powerful feature of Trendalyzer is the direct without having to manipulate the underlying numbers in a interactivity in the GUI. Each bubble in the graph will, spreadsheet. A typical question, when displaying when hovered over, display the name of the entity shown international country data, is “how is my country doing?”. as well as the values underlying the x and y positions as Through the interactive features these kinds of questions well as the size. If clicked, the entity will be selected, can easily be answered. which decreases the opacity of all non-selected bubbles (Figure 2). Moreover, once a bubble is selected it will be displayed with a ‘trail’ when the graph is animated (Figure 3).

Figure 2. Selected entities cause the other bubbles to decrease opacity.

PAGE 6 STATISTICAL COMPUTING AND GRAPHICS NEWSLETTER VOLUME 20, No 1, JUNE 2009

In the future Gapminder hopes to both expand the However, in the future, we hope to expand the group availability of cross-country datasets and to do more to of users who are capable to use the software without our explain these datasets using videos and other tools. We involvement, in keeping with our mission of broadening have found that working with subject matter experts who and deepening understanding of the world by unveiling bring their expertise combined with our knowledge of the the beauty of statistics. software and how to use it to explain phenomena has been quite effective.

Figure 3. Selection of a bubble causes a trail of the data in time when the graph is animated.

PAGE 7 STATISTICAL COMPUTING AND GRAPHICS NEWSLETTER VOLUME 20, No 1, JUNE 2009

areas of the world, as estimated by the IUCN, will be Scientific Article used. The success of the illustration may be judged by The Fan Plot: A technique for having a previously uninformed member of the intended audience correctly explain those relationships upon displaying relative quantities viewing the illustration. A final restriction is that the and differences sensory modality be visual. The successful comparison of quantities depends to Jim Lemon some extent upon how they are portrayed. Relative Research Fellow comparisons are most easily made if the representations National Drug and Alcohol Research Centre of quantities are adjacent, have the same origin and University of New South Wales numeric scale. This suggests a bar plot. Figure 1 shows Sydney NSW 2052, Australia the example data as a bar plot, using the "barplot" [email protected] function of the R statistical language.

Anupam Tyagi Figure 1. of threatened species by Consultant Economist geographical area. anuptyagi@.com

ABSTRACT Displaying both relative quantities and the differences between those quantities in an illustration remains a challenge. The fan plot, a variation on the pie chart, is presented as an alternative way of illustrating relative magnitude and differences. Using real data, the bar plot, pie chart and fan plot are compared and a test of the ability of the fan plot to convey the required information is suggested.

Summary illustrations of quantitative data continue to challenge the researcher who presents to an audience that does not share the researcher's expertise. The concepts may not be especially complex, but the difficulty of conveying them is well recognized (Tufte, 2001). Ideally, an illustration should explain the concept with little or no supplementary explanation by the presenter. The observation that many illustrations do not achieve this suggests that there is room for development Because the numbers have been placed in decreasing in this area. order, it is easy to see that Asia is home to the most Beginning with the principle that no single illustrative endangered species, while Europe is home to the fewest. technique can suit all data sets, the first task is to define It is also easy to see that Asia is home to considerably the characteristics of the data to be presented. In what more endangered species than Africa, and that Oceania follows, it will be assumed that a small number, say a has a smaller excess compared to Europe. maximum of ten, quantities are to be displayed. Each Could the hypothetical member of the audience quantity will have a label that links it to its source. correctly state the relative proportion of Oceania to Africa The principal aim is to display the relative or the relation of the differences between the Americas magnitudes and differences between the quantities. As an and Europe and Oceania? example, the numbers of threatened species in various Clearly the farther apart any two rectangles are, the more difficult it will be to make such a comparison. The

PAGE 8 STATISTICAL COMPUTING AND GRAPHICS NEWSLETTER VOLUME 20, No 1, JUNE 2009 same objection holds for any comparison of length. If the outer arc of each sector is visible. An illustration bars were to be made very narrow in order to facilitate somewhat reminiscent of a folding fan is produced, as in such comparisons, the labels would be commensurately Figure 3. crowded and it would be more difficult to identify which Differences between adjacent sectors are apparent as bar represented which area. If the plot area began just thinner sectors, allowing the relative differences between below the smallest quantity to be displayed, the adjacent non-adjacent sectors to be visually compared. To estimate differences would be magnified, but the proportional the difference between the number of endangered species comparisons would be difficult or impossible. in Oceania and Europe with the same difference between The ever popular pie chart (Playfair, 1801), useful for the Americas is now at least possible, and the success of displaying large differences between very small numbers the audience in estimating that ratio is a proposed test of of quantities, appears less successful than the barplot in the effectiveness of the fan plot. this instance. Comparisons of adjacent and opposite sectors are equally difficult. Does South America have Figure 3. Fan plot of threatened species by area. more threatened species than North and Central? If the actual numbers were appended to the sector labels, the audience could do the comparisons with simple arithmetic, but a would then be satisfactory and the illustration would become a mere decoration.

Figure 2. Pie chart of threatened species by geographical area.

The inspiration for the fan plot, as for the pie chart, derives from the latter’s reference to food. Many items of food apart from pies are roughly circular and their division may be of considerable importance to the consumers. The popularity of the pie chart may in part derive from this analogy, in that most people are practiced in the estimation of equality or difference of circular sectors. Anyone who doubts this is encouraged to slice a pie into fairly equal sectors and then ask a hungry child to take the largest. While the average person can make a fairly accurate determination of the relative size of a circular sector, estimating the differences between sectors is a much

more difficult task. However, if those differences can be Yet the attraction of the pie chart is that the audience converted into sectors, they should be easier to compare. typically understands it, or appears to. The technique to The superimposition of the sectors means that the visible be described is a modification of the pie chart intended to radial slice of each sector but the smallest represents the address the shortcomings while preserving the difference between it and the adjacent sector. intelligibility. In the original implementation of the fan plot, the The sectors of the pie chart are first rotated so that sum of the sectors was restricted to a complete circle. they are concentric and overlapping, with a common This is not strictly necessary, as the overlapping sectors radial alignment. In decreasing order of extent, can be stretched until the largest is a complete circle each sector but the largest is reduced in radius so that the

PAGE 9 STATISTICAL COMPUTING AND GRAPHICS NEWSLETTER VOLUME 20, No 1, JUNE 2009 while still allowing the complete extent of every sector to REFERENCES be visible. Lemon, J. (2006) plotrix: A package in the red light This manipulation expands the differences as well, district of R. R News, 6(4),8-12. perhaps making those differences more easily compared. Playfair, W. (1801) Commercial and Political Atlas Figure 3 illustrates the data with the largest sector and Statistical Breviary. Cambridge: Cambridge expanded to a semicircle. University Press. The fan plot has been programmed in the R statistical R Development Core Team (2003) R: a language and language (R Core Team, 2006) and is available in the environment for statistical computing. (ISBN 3-900051- plotrix package (Lemon, 2003). The options of the fan 00-3), Vienna: R Foundation for Statistical Computing. plot function allow circumferential labels to be displayed, so that the sectors can be identified without the inclusion Tufte, E.R. (2001) The visual display of quantitative of a legend. Circumferential tick marks can also be information. Cheshire: Graphics Press. included if this improves the intelligibility of the plot. The fan plot is intended primarily for use in SOFTWARE INSTALLATION illustrations where both relative quantities and the From within R, select from the menu “Packages” the item differences between those quantities are to be conveyed “Install Packages”. Select from the list a location close to to the reader or audience. It maintains the simplicity and you, select from the next menu the item plotrix, click OK. comprehensibility of the pie chart, but deals more effectively with relative quantities and differences. The If you prefer the R command line, enter aim is to produce understandable and memorable images utils:::menuInstallPkgs() and then proceed as described that require minimal explanation. above. After successful installation of the library, enter > library(plotrix) Enter > ?fan.plot or ?floating.pie to see the corresponding help pages.

PAGE 10 STATISTICAL COMPUTING AND GRAPHICS NEWSLETTER VOLUME 20, No 1, JUNE 2009

Figure 2. Reading off the values in a 3d bar chart. Taking it to higher dimensions Recently I encountered the following situation: I was asked to generate a three-dimensional graph with shadings and colors with a spreadsheet package. So I took a data set and produced the graph I was asked for. This type of graph made me aware of a problem that is not obvious at first glance. Take a look at the graph below. The same problem occurs in Figure 3, and the Figure 1. Three-dimensional bar chart. problem is even magnified: the reference lines on the left, closest to the bars, are not horizontal any more. Can you 40 read off the values correctly? (They are still 10, 20, 30, 35 and 40).

30

25 Figure 3. Three-dimensional bar chart (2).

20

15

10

5 40 0 ABCD 30

The bars represent four simple values, 10, 20, 30, and 20 S4 40. The bars denote four values that are labeled A, B, C, S3 10 and D underneath the corresponding bars. S2 S1 My visual check if I did the graph correctly consisted 0 of looking at the graph and checking if the values 1 displayed summed up to 100, the sum of the values. These graphs of four simple values might look If you take a look at the top of the bars and the “nice”, but their interpretation is not straightforward and horizontal lines, you will notice that the horizontal line this seems not a good way to produce a graph that gets that is labeled “40” on the left-hand side is clearly above interpreted by co-workers or clients. the rightmost bar that represents a value of 40. Similarly, the lines denoting the values of 10, 20, and 30 are clearly Hadley Wickham pointed out to me that the problem above the bars that should have these values. was already recognized by K.W. Haemer in his 1947 publication in The American Statistician, drawing the Why is this? Because we look at the data in three illustrations with pencil and paper. More than sixty years dimensions! The value must be read off by projecting the later, the problem still persists and is possibly even more bar into the two-dimensional plane that contains the widespread due to the use of popular software packages. horizontal lines. In other words, the bars must be projected onto the rear wall that holds the reference lines. Andreas Krause The most important element in this chart, as it turns out, is the gap between the bars and the rear wall. That Reference: gap determines the projection onto the rear wall as Haemer, K.W. (1947): The perils of perspective. The displayed in Figure 2. American Statistician, Vol. 1, No. 3, p. 19.

PAGE 11 STATISTICAL COMPUTING AND GRAPHICS NEWSLETTER VOLUME 20, No 1, JUNE 2009

As with the original 2006 meeting, the MMDS MMDS 2008: Algorithmic and 2008 program generated intense interdisciplinary Statistical Challenges in Mod- interest: with 43 talks and 18 poster presentations from a wide spectrum of researchers in modern ern Large-Scale Data Analysis large-scale data analysis, including both senior re- searchers well-established as leaders in their respec- are the Focus tive fields as well as junior researchers promising to become leaders in this new interdisciplinary field, Michael W. Mahoney the program drew nearly 300 participants. Research Scientist Department of Mathematics Diverse Approaches to Modern Data Problems Stanford University. Graph and matrix problems were common topics [email protected] for discussion, largely since they arise naturally in Lek-Heng Lim almost every aspect of data mining, machine learn- Charles Morrey assistant professor ing, and pattern recognition. For example, a common Department of Mathematics way to model a large social or information network University of California, Berkeley is with an interaction graph model, G = (V,E), in [email protected] which nodes in the vertex set V represent “entities” Gunnar E. Carlsson and the edges (whether directed, undirected, weighted or unweighted) in the edge set E represent Professor “interactions” between pairs of entities. Department of Mathematics Stanford University Alternatively, these and other data sets can be [email protected] modeled as matrices, since an m× n real-valued ma- trix A provides a natural structure for encoding in- The 2008 Workshop on Algorithms for Modern formation about m objects, each of which is de- Massive Data Sets (MMDS 2008), sponsored by the scribed by n features. Due to their large size, their NSF, DARPA, LinkedIn, and Yahoo!, was held last extreme sparsity, and their complex and often adver- year at Stanford University, June 25–28, 2008. The sarial noise properties, data graphs and data matrices goals of MMDS 2008 were (1) to explore novel arising in modern informatics applications present techniques for modeling and analyzing massive, considerable challenges and opportunities for inter- high-dimensional, and nonlinearly-structured scien- disciplinary research. tific and internet data sets; and (2) to bring together These algorithmic, statistical, and mathematical computer scientists, statisticians, mathematicians, challenges were the focus of MMDS 2008. It is and data analysis practitioners to promote cross- worth emphasizing the very different perspectives fertilization of ideas. that have historically been brought to such problems. MMDS 2008 originally grew out of discussions For example, a common view of the data in a data- about our vision for the next-generation of algo- base, in particular historically among computer sci- rithmic, mathematical, and statistical analysis meth- entists interested in data mining and knowledge dis- ods for complex large-scale data sets. These discus- covery, has been that the data are an accounting or a sions occurred in the wake of MMDS 2006, which record of everything that happened in a particular was originally motivated by the complementary per- setting. spectives brought by the numerical linear algebra For example, the database might consist of all the and theoretical computer science communities to customer transactions over the course of a month, or matrix algorithms in modern informatics applica- it might consist of all the friendship links among tions [1]. members of a social networking site. From this per-

PAGE 12 STATISTICAL COMPUTING AND GRAPHICS NEWSLETTER VOLUME 20, No 1, JUNE 2009 spective, the goal is to tabulate and process the data graphs and networks yield to approximation algo- at hand to find interesting patterns, rules, and asso- rithms when assumptions are made about the net- ciations. An example of an association rule is the work participants; much recent work in machine proverbial “People who buy beer between 5 p.m. and learning draws on ideas from both areas; and in 7 p.m. also buy diapers at the same time.” boosting, a statistical technique that fits an additive model by minimizing an objective function with a The performance or quality of such a rule is method such as gradient descent, the computation judged by the fraction of the database that satisfies parameter, i.e., the number of iterations, also serves the rule exactly, which then boils down to the prob- as a regularization parameter. lem of finding frequent item sets. This is a compu- tationally hard problem, and much algorithmic work Given the diversity of possible perspectives, has been devoted to its exact or approximate solution MMDS 2008 was organized loosely around six hour- under different models of data access. long tutorials that introduced participants to the ma- jor themes of the workshop. A very different view of the data, more common among statisticians, is one of a particular random Large-Scale Informatics: Problems, Methods, and instantiation of an underlying process describing un- Models observed patterns in the world. In this case, the goal On the first day of the workshop, participants is to extract information about the world from the heard tutorials by Christos Faloutsos of Carnegie noisy or uncertain data that is observed. To achieve Mellon University and Edward Chang of Google Re- this, one might posit a model: search, in which they presented an overview of tools and applications in modern large-scale data analysis. data ∼ F θ and mean(data) = g( θ), Faloutsos began his tutorial on “Graph mining: where F is a distribution that describes the ran- θ laws, generators and tools” by motivating the prob- dom variability of the data around the deterministic lem of data analysis on graphs. He described a wide model g( θ) of the data. Then, using this model, one range of applications in which graphs arise naturally, would proceed to analyze the data to make infer- and he reminded the audience that large graphs that ences about the underlying processes and predictions arise in modern informatics applications have struc- about future observations. From this perspective, tural properties that are very different from tradi- modeling the noise component or variability well is tional Erdõs-Rényi random graphs. For example, due as important as modeling the mean structure well, in to subtle correlations, statistics such as degree distri- large part since understanding the former is neces- butions and eigenvalue distributions exhibit heavy- sary for understanding the quality of predictions tailed behavior. made. Although these structural properties have been With this approach, one can even make predic- studied extensively in recent years and have been tions about events that have yet to be observed. For used to develop numerous well-publicized models, example, one can assign a probability to the event Faloutsos also described empirically-observed prop- that a given user at a given web site will click on a erties that are not well-reproduced by existing mod- given advertisement presented at a given time of the els. As an example, most models predict that over day, even if this particular event does not exist in the time the graph should become sparser and the diame- database. ter should grow as O(logN) or perhaps O(log logN), The two perspectives need not be incompatible. where N is the number of nodes at the current time For example, statistical and probabilistic ideas are step, but empirically it is often observed that the central to much of the recent work on developing networks densify over time and that their diameter improved approximation algorithms for matrix prob- shrinks. lems; otherwise intractable optimization problems on

PAGE 13 STATISTICAL COMPUTING AND GRAPHICS NEWSLETTER VOLUME 20, No 1, JUNE 2009

To explain these phenomena, Faloutsos de- substantially from the scalability issues that arise in scribed a model based on Kronecker products and traditional applications of interest in scientific com- also a model in which edges are added via an itera- puting. tive “forest fire” burning mechanism. With appropri- A recurrent theme of Chang was that an algo- ate choice of parameters, both models can be made rithm that is expensive in floating point cost but to reproduce a much wider range of static and dy- readily parallelizable is often a better choice than namic properties than can previous generative mod- one that is less expensive but non-parallelizable. els. As an example, although SVMs are widely-used, Building on this modeling foundation, Faloutsos largely due to their empirical success and attractive spent much of his talk describing several graph min- theoretical foundations, they suffer from well-known ing applications of recent and ongoing interest: scalability problems in both memory use and com- methods to find nodes that are central to a group of putational time. To address these problems, Chang individuals; applications of the Singular Value De- described a Parallel SVM algorithm. This algorithm composition and recently-developed tensor methods reduces memory requirements by performing a row- to identifying anomalous patterns in time-evolving based Incomplete Cholesky Factorization (ICF) and graphs; modeling information cascades in the blo- by loading only essential data to each of the parallel gosphere as virus propagation; and novel methods machines; and it reduces computation time by intel- for fraud detection. Edward Chang described other ligently reordering computational steps and by per- developments in web-scale data analysis in his tuto- forming them on parallel machines. Chang noted that rial on “Mining large-scale social networks: chal- the traditional column-based ICF is better for the lenges and scalable solutions.” single machine setting, but it cannot be parallelized After reviewing emerging applications—such as as well across many machines. social network analysis and personalized information Algorithmic Approaches to Networked Data retrieval—that have arisen as we make the transition from Web 1.0 (links between pages and documents) Milena Mihail of the Georgia Institute of Tech- to Web 2.0 (links between documents, people, and nology described algorithmic perspectives on devel- social platforms), Chang covered four applications in oping better models for data in her tutorial “Models detail: spectral clustering for network analysis, fre- and algorithms for complex networks.” She noted quent itemset mining, combinatorial collaborative that in recent years a rich theory of power law ran- filtering, and parallel Support Vector Machines dom graphs, i.e., graphs that are random conditioned (SVMs) for personalized search. on a specified input power law degree distribution, has been developed. With the increasingly wide In all these cases, he emphasized that the main range of large-scale social and information networks performance requirements were “scalability, scal- that are available, however, generative models that ability, scalability.” Modern informatics applications are structurally or syntactically more flexible are in- like web search afford easy parallelization—e.g., the creasingly necessary. overall index can be partitioned such that even a sin- gle query can use multiple processors. Mihail described two such extensions: one in which semantics on nodes is modeled by a feature Moreover, the peak performance of a machine is vector and edges are added between nodes based on less important than the price-performance ratio. In their semantic proximity; and one in which the phe- this environment, scalability up to petabyte-sized nomenon of associativity/ disassociativity is modeled data often means working in a software framework by fixing the probability that nodes of a given degree like MapReduce or Hadoop that supports data-inten- di tend to link to nodes of degree dj . sive distributed computations running on large clus- ters of hundreds, thousands, or even hundreds of By introducing a small extension in the parame- thousands of commodity computers. This differs ters of a generative model, of course, one can ob-

PAGE 14 STATISTICAL COMPUTING AND GRAPHICS NEWSLETTER VOLUME 20, No 1, JUNE 2009 serve a large increase in the observed properties of Whereas in certain applications, such as in phys- generated graphs. This observation raises interesting ics, the studied phenomena support clean ex- statistical questions about model overfitting, and it planatory theories which define exactly the metric to argues for more refined and systematic methods of use to measure the distance between pairs of data model parameterization. This observation also leads points, in most MMDS applications this is not the to new algorithmic questions that were the topic of case. For instance, the Euclidean distance between Mihail’s talk. DNA expression profiles in high-throughput mi- croarray experiments may or may not capture a An algorithmic question of interest in the basic meaningful notion of distance between genes. power law random graph model is the following: given as input an N-vector specifying a degree se- Similarly, although a natural geodesic distance is quence, determine whether there exists a graph with associated with any graph, the sparsity and noise that degree sequence, and, if so, efficiently generate properties of social and information networks means one (perhaps approximately uniformly randomly that this is not a particularly robust notion of distance from the ensemble of such graphs). in practice. Such realizability problems have a long history Part of the problem is thus to define useful met- in graph theory and theoretical computer science. rics—in particular since applications such as clus- Since their solutions are intimately related to the tering, classification, and regression often depend theory of graph matchings, many generalizations of sensitively on the choice of metric—and two design the basic problem can be addressed in a strict theo- goals have recently emerged. retical framework. First, don’t trust large distances—since distances For example, motivated by associa- are often constructed from a similarity measure, tive/disassociative networks, Mihail described recent small distances reliably represent similarity but large progress on the Joint-Degree Matrix Realization distances make little sense. Problem: given a partition of the node set into Second, trust small distances only a bit—after classes of vertices of the same degree, a vector speci- all, similarity measurements are still very noisy. fying the degree of each class, and a matrix specify- These ideas have formed the basis for much of the ing the number of edges between any two classes, work on Laplacian-based non-linear dimensionality determine whether there exists such a graph, and if reduction, i.e., manifold-based, methods that are cur- so construct one. rently popular in harmonic analysis and machine She also described extensions of this basic prob- learning. More generally, they suggest the design of lem to connected graphs, to finding minimum cost analysis tools that are robust to stretching and realizations, and to finding a random graph satisfy- shrinking of the underlying metric, particularly in ing those basic constraints. applications such as visualization in which qualita- tive properties, such as how the data are organized The Geometric Perspective: Qualitative Analysis on a large scale, are of interest. of Data A very different perspective was provided by Much of Carlsson’s tutorial was occupied by de- scribing these analysis tools and their application to Gunnar Carlsson of Stanford University, who gave natural image statistics and . Ho- an overview of geometric and topological ap- mology is the crudest measure of topological proper- proaches to data analysis in his tutorial “Topology ties, capturing information such as the number of and data.” The motivation underlying these ap- connected components, whether the data contain proaches is to provide insight into the data by im- holes of various dimensions, etc. posing a geometry on it. Importantly, although the computation of homol- ogy is not feasible for general topological spaces, in

PAGE 15 STATISTICAL COMPUTING AND GRAPHICS NEWSLETTER VOLUME 20, No 1, JUNE 2009 many cases the space can be modeled in terms of loss function exhibit high variance. To make the es- simplicial complexes, in which case the computation timates more regular, one typically considers a con- of homology boils down to the linear algebraic com- strained or penalized optimization problem putation of the Smith normal form of certain data- dependent matrices. Carlsson also described persis- tent homology, an extension of the basic idea in where is the empirical loss and P (·) is a pen- which parameters such as the number of nearest alty term. The choice of an appropriate value for the neighbors, error parameters, etc., can be varied. regularization parameter λ is a classic model selec- A “bar code signature” can then be associated tion problem, for which cross validation can be used. with the data set. Long segments in the bar code in- The choice for the penalty depends on what is known dicate the presence of a homology class which per- or assumed about the problem at hand. sists over a long range of parameters values. This A common choice is can often be interpreted as corresponding to large- This interpolates between the subset selection prob- scale geometric features in the data, while shorter lem (γ=0) and ridge regression (γ=0) and includes segments can be interpreted as noise. the well-studied lasso (γ=0). For γ ≤ 1, sparse solu- Statistical and Machine Learning Perspectives tions (which are of interest due to parsimony and in- terpretability) are obtained, and for γ ≥ 1, the penalty Statistical and machine learning perspectives on is convex. MMDS were the subject of a pair of tutorials by Jerome Friedman of Stanford University and Mi- Although one could choose an optimal ( λ, γ) by chael Jordan of the University of California at cross validation, this can be prohibitively expensive, Berkeley. even when the loss and penalty are convex, due to the need to perform computations at a large number Given a set of measured values of attributes of an of discretized pairs. In this case, path seeking meth- object, x = (x 1, x 2, …, x n), the basic predictive or ods have been studied. machine learning problem is to predict or estimate the unknown value of another attribute y. The quan- Consider the path of optimal solutions tity y is the “output” or “response” variable, and {x 1, which is a one-dimensional curve x2, … , x n} are the “input” or “predictor” variables. in the parameter space . If the loss function is quadratic and the penalty function is piecewise lin- In regression problems, y is a real number, while ear, e.g., with the lasso, then the path of optimal so- in classification problems, y is a member of a dis- lutions is piecewise linear, and homotopy methods crete set of unorderable categorical values (such as can be used to generate the full path in time that is class labels). In either case, this can be viewed as a not much more than that needed to fit a single model function estimation problem—the prediction takes at a single parameter value. the form of a function that maps a point x in the space of all joint values of the predictor vari- Friedman described a generalized path seeking algorithm, which solves this problem for a much ables to a point in the space of response variables, wider range of loss and penalty functions (including and the goal is to produce an F(·) that minimizes a some non-convex functions) very efficiently. loss criterion. Jordan, in his tutorial “Kernel-based contrast In his tutorial, “Fast sparse regression and classi- functions for sufficient dimension reduction,” con- fication,” Friedman began with the common assump- sidered the dimensionality reduction problem in a tion of a linear model, in which F(x) = is supervised learning setting. modeled as a linear combination of the n basis func- tions. Unless the number of observations is much Methods such as Principal Components Analysis, larger than n, however, empirical estimates of the Johnson-Lindenstrauss techniques, and recently- developed Laplacian-based non-linear methods are

PAGE 16 STATISTICAL COMPUTING AND GRAPHICS NEWSLETTER VOLUME 20, No 1, JUNE 2009 often used, but their applicability is limited since, Conclusions and Future Directions e.g., the axes of maximal discrimination between In addition to other talks on the theory of data al- two the classes may not align well with the axes of gorithms, machine learning and kernel methods, di- maximum variance. mensionality reduction and graph partitioning meth- Instead, one might hope that there exists a low- ods, and co-clustering and other matrix factorization dimensional subspace S of the input space X which methods, participants heard about a wide variety of can be found efficiently and which retains the statis- data applications, including movie and product rec- tical relationship between X and the response space ommendations; predictive indexing for fast web Y. search; pathway analysis in biomolecular folding; Conventional approaches to this problem of Suf- functional MRI, high-resolution terrain analysis, and ficient Dimensionality Reduction (SDR) make strong galaxy classification; and other applications in com- putational geometry, , computer modeling assumptions about the distribution of the vision, and manifold learning. covariate X and/ or the response Y. Jordan consid- ered a semiparametric formulation, where the condi- (We even heard about using approximation algo- tional distribution p(Y | X) is treated nonparam- rithms in a novel manner to probe the community etrically and the goal is estimate the parameter S. structure of large social and information networks to He showed that this problem could be formulated test the claim that such data are even consistent with in terms of conditional independence and that it the manifold hypothesis - they clearly are not.) could be evaluated in terms of operators on Repro- In all these cases, scalability was a central issue - ducing Kernel Hilbert Spaces (RKHSs). Recall that motivating discussion of external memory algo- claims about the independence between two random rithms, novel computational paradigms like MapRe- variables can be reduced to claims about correlations duce, and communication-efficient linear algebra between them by considering transformations of the algorithms. random variables: Interested readers are invited to visit the confer- X1 and X 2 are independent if and only if ence website, http://mmds.stanford.edu , where the presentations from all speakers can be found. The feedback we received made it clear that for a suitably rich function space Η. If Η is L 2 MMDS has struck a strong interdisciplinary chord. and thus contains the Fourier basis, this reduces to a For example, nearly every statistician commented on well-known fact about characteristic functions. More the desire for more statisticians at the next MMDS; interesting from a computational perspective - recall nearly every scientific computing researcher told us that by the “reproducing” property, function evalua- they wanted more data-intensive scientific compu- tion in a RKHS reduces to an inner product - this tation at the next MMDS; nearly every practitioner also holds for suitably rich RKHSs. This use of from an application domain wanted more applica- RKHS ideas to solve this SDR problem cannot be tions at the next MMDS; and nearly every theoretical viewed as a kernelization of an underlying linear al- computer scientist said they wanted more of the gorithm, as is typically the case when such ideas are same. used (e.g., with SVMs) to provide basis expansions There is a lot of interest in MMDS as a develop- for regression and classification. ing interdisciplinary research area at the interface Instead, this is an example of how RKHS ideas between computer science, statistics, applied provide algorithmically efficient machinery to opti- mathematics, and scientific and internet data appli- mize a much wider range of statistical functionals of cations. Keep an eye out for future MMDSs! interest.

PAGE 17 STATISTICAL COMPUTING AND GRAPHICS NEWSLETTER VOLUME 20, No 1, JUNE 2009

Acknowledgments The authors are grateful to the numerous indi- Technology and viduals (in particular, Mayita Romero, Victor Olmo, and David Gleich) who provided assistance prior to Commerce Corner and during MMDS 2008; to Diane Lambert for pro- YouTube viding an interesting perspective on these problems that we incorporated here; and to each of the speak- The widely known archive for movies contains a large ers, poster presenters, and other participants, without number of videos on many topics related to statistics, whom MMDS 2008 would not have been such a computing, and graphics. These include: success. Peter Donnelly’s talk at the TED conference on “How References juries are fooled by statistics”: http://www.youtube.com/watch?v=kLmzxmRcUTo [1] G.H. Golub, M.W. Mahoney P. Drineas, and L.- H. Lim, “Bridging the gap between numerical linear Did you know? Predicting Future Statistics: algebra, theoretical computer science, and data ap- http://www.youtube.com/watch?v=j7FP1kgtD8U plications,” SIAM News, 39, no. 8, (2006) Video Archive of the Statistical Computing and Statis- tical Graphics Sections The ASA sections on Statistical Computing and Statisti- cal Graphics maintain a web page with great videos about graphics and visualization: http://stat-graphics.org/movies/ Among many other great movies, you can watch J.B. Kruskals demonstration on multidimensional scaling from 1962 and John W. Tukey showing Prim-9, an interactive data analysis system back in 1973 (!). Gallery of Data Visualization maintains a web page that carries the subtitle “The Best and Worst of Statistical Graphics”. http://www.math.yorku.ca/SCS/Gallery/ Professional Networking on LinkedIn LinkedIn is one of many professional networking sites, maybe similar to social networking sites like facebook. These networks offer easy ways of setting up discussion groups, and very likely you will find many of your friends and colleagues here already. Some of these groups on LinkedIn are on Visual Ana- lytics, The R Project for Statistical Computing, Data Min- ing, Statistics and Visualization, cloud computing, graph- ics professionals, and more. http://www.linkedin.com/ Andreas Krause

PAGE 18 STATISTICAL COMPUTING AND GRAPHICS NEWSLETTER VOLUME 20, No 1, JUNE 2009

The patient started taking 400 mg of the drug and A Six-Dimensional continued on this dose for about 150 days. A pause with no drug for about 10 days follows before the patient started taking the medication again. After about 300 days, Scatterplot another so-called drug holiday takes place after which the Andreas Krause dose is increased to 600 mg. The dotted line indicates that no data is available about the end of this dosing regimen. The graph below shows how a lot of information can be condensed into a graph that is easily understood. Blood samples were taken and the concentration of a In this particular example, we were able to display marker in the blood was measured at various time points. The time points of the blood sampling are indicated by the data in a form that made the construction of a vertical dashes within the polygon on top. The normal statistical model obsolete since the clients were able range is indicated by black horizontal lines. to derive the results from the display of the raw data. The graph shows data from a single patient. The The patient’s well-being was measured: at each visit: patient is identified by the number across the graph (the it was assessed whether the disease was progressing, panel strip). He or she was observed for some 600 days, stable, or improving. The disease status is associated with while receiving a treatment. The doses are indicated by an intuitive color scheme: red (warning color) indicates the pink line at the bottom. progressive disease, yellow indicates stable disease, and green (positive association) indicates an improving patient status. Blue indicates that no assessment is available. The graph can do without a legend because the color scheme seems intuitive. One could argue that in total this scatterplot has six dimensions: the patient number, the time axis, the Figure 1. Concentration of a biological marker in blood concentration of the marker, the normal range, the blood over time, normal ranges (black lines), the patient’s disease status, and the drug dosing his- disease status (line colors), and dose of drug re- tory. ceived.

PAGE 19 STATISTICAL COMPUTING AND GRAPHICS NEWSLETTER VOLUME 20, No 1, JUNE 2009

The 65 th Deming Conference Conference News Atlantic City, NJ, Dec 7-11, 2009 Several meetings and conferences are coming up that http://www.demingconference.com/ might be of interest to our members. Among them are: The conference consists of tutorials only, either half a day or two days long. Generally, the tutorials are based on books that have appeared recently. The focus of this year Statistical Computing 2009 is on drug development and related topics. Reisensburg Castle, Guenzburg, Germany,

June 28-July 1, 2009 If you would like your conference to be listed here, http://www.statistical-computing.de/ please drop us a note. The 41 st workshop comprises a variety of presentations with a focus on computing and bioinformatics. The old castle provides a perfect environment for extended discussions.

Innovative Approaches to Turn Statistics into Puzzlers Knowledge Suitland, MD, July 15-16, 2009 You might be familiar with NPR’s radio show Car Talk, featuring Tom and Ray Magliozzi. http://www.oecd.org/document/33/0,3343,en_40033426_ 40033828_42054241_1_1_1_1,00.html But did you know that these two MIT graduates put out a new puzzler every week (except when the puzzler is Conference of the OECD, The World Bank, and other on vacation)? governmental organizations on measuring the progress of societies to foster the development of sets of key The puzzlers are often highly scientific in nature, as economic, social, and environmental indicators. can be guessed already by the fact that the staff comprises a statistician (Marge Innovera), a chairman of the Math

Dept. (Horatio Algebra), a chief estimator (Edward James UseR! 2009 Almost) and a public opinion pollster (Paul Murky of Rennes, France, July 8-10, 2009 Murky Research). http://www2.agrocampus-ouest.fr/math/useR-2009/ The puzzlers are available on the web, as text and as The R User Conference, covering a wide variety of topics sound clips. Check out the Car Talk web page for some around the R system, statistics and computing. entertaining and challenging puzzlers: http://www.cartalk.com/content/puzzler/ American Conference on Pharmacometrics (ACoP) Mystic, CT, October 4-7, 2009 The page contains the answers too, except for this week’s puzzler of course since you can win a prize for http://www.go-acop.org/ sending in the correct answer. The conference is the key conference for pharmaco- The full car talk staff list is available at metrics, also known as modeling and simulation in drug development. One of this year’s highlights is a http://www.cartalk.com/content/about/credits/credits.html presentation by .

PAGE 20

Statistical Computing Statistical Graphics Section Officers 2009 Section Officers 2009 T Jose C. Pinheiro, Chair Antony Unwin, Chair [email protected] [email protected] (862) 778-8879 +49-821-598-2218 Deborah A. Nolan, Past-Chair Daniel J. Rope, Past-Chair [email protected] [email protected] The Statistical Computing & Statistical 643-7097 (703) 740-2462 Graphics Newsletter is a publication of Luke Tierney, Chair-Elect Simon Urbanek, Chair-Elect the Statistical Computing and Statisti- [email protected] [email protected] cal Graphics Sections of the ASA. All (319) 335-3386 (973) 360-7056 communications regarding the publica- tion should be addressed to: Elizabeth Slate, Secretary/ Frederick J. Wicklin, Secretary/ Treasurer Treasurer Nicholas Lewin-Koh [email protected] [email protected] Editor Statistical Computing Section (843) 876-1133 (614) 688-3634 [email protected] Peter Craigmile, COS Rep 08-10 Montserrat Fuentes, Andreas Krause COMP/COS Representative [email protected] Editor Statistical Graphics Section [email protected] (614) 688-3634 [email protected] (919) 515-1921 Linda W. Pickle, COMP/COS All communications regarding ASA Robert E. McCulloch, Program Representative 2007-09 membership and the Statistical Com- Chair [email protected] puting and Statistical Graphics Sec- [email protected] (301) 402-9344 tion, including change of address, (512) 471-9672 Steven MacEachern, Program should be sent to American Statistical Chair Thomas Lumley, Program Association, 1429 Duke Street, Alex- [email protected] Chair-Elect andria, VA 22314-3402 USA (614) 292-5843 [email protected] (703)684-1221, fax (703)684-2036 (206) 543-1044 Heike Hofmann, Program [email protected] Barbara A Bailey, Publications Chair-Elect Officer [email protected] [email protected] (515) 294-8948 (619) 594-4170 Donna F. Stroup, Jane Lea Harvill, Computing Council of Sections Section Representative [email protected] [email protected] (404) 218-0841 (254) 710-1517 Monica D. Clark, Donna F. Stroup (see right) ASA Staff Liaison [email protected] Monica D. Clark (see right) (703) 684-1221

Nicholas Lewin-Koh, Andreas Krause, Newsletter Editor Newsletter Editor [email protected] [email protected]

PAGE 21