Commentaries are informative essays dealing with viewpoints of statistical practice, statistical education, and other topics considered to be of general interest to the broad readership of The American Statistician. Commentaries are similar in spirit to Letters to the Editor, but they involve longer discussions of background, issues, and perspectives. All commentaries will be refereed for their merit and compatibility with these criteria.

How to Display Data Badly HOWARD WAINER*

Methods for displaying data badly have been developing for many years, and a wide variety of interesting and inventive schemes have emerged. Presented here is a synthesis yielding the 12 most powerful techniques that seem to underlie many of the realizations found in practice. These 12 (the dirty dozen) are identified and illustrated.

KEY WORDS: Graphics; Data display; Data density; Data-ink ratio.

*Howard Wainer is Senior Research Scientist, Educational Testing Service, Princeton, NJ 08541. This is the text of an invited address to the American Statistical Association. It was supported in part by the Program Statistics Research Project of the Educational Testing Service. The author would like to express his gratitude to the numerous friends and colleagues who read or heard this article and offered valuable suggestions for its improvement. Especially helpful were David Andrews, Paul Holland, Bruce Kaplan, James O. Ramsay, , the participants in the Stanford Workshop on Advanced Graphical Presentation, two anonymous referees, the long-suffering associate editor, and Gary Koch.

© The American Statistician, May 1984, Vol. 38, No. 2

This content downloaded from 160.39.32.189 on Fri, 23 Jan 2015 07:21:37 AM. All use subject to JSTOR Terms and Conditions.

1. INTRODUCTION

The display of data is a topic of substantial contemporary interest and one that has occupied the thoughts of many scholars for almost 200 years. During this time there have been a number of attempts to codify standards of good practice (e.g., ASME Standards 1915; Cox 1978; Ehrenberg 1977) as well as a number of books that have illustrated them (i.e., Bertin 1973, 1977, 1981; Schmid 1954; Schmid and Schmid 1979; Tufte 1983). The last decade or so has seen a tremendous increase in the development of new display techniques and tools that have been reviewed recently (Macdonald-Ross 1977; Fienberg 1979; Cox 1978; Wainer and Thissen 1981). We wish to concentrate on methods of data display that leave the viewers as uninformed as they were before seeing the display or, worse, those that induce confusion. Although such techniques are broadly practiced, to my knowledge they have not as yet been gathered into a single source or carefully categorized. This article is the beginning of such a compendium.

The aim of good data graphics is to display data accurately and clearly. Let us use this definition as a starting point for categorizing methods of bad data display. The definition has three parts. These are (a) showing data, (b) showing data accurately, and (c) showing data clearly. Thus, if we wish to display data badly, we have three avenues to follow. Let us examine them in sequence, parse them into some of their component parts, and see if we can identify means for measuring the success of each strategy.

Figure 1. An example of a low density graph (from SI3 [ddi = .3]). Chart title: "Change in Science Achievement of 9-, 13-, and 17-Year-Olds, by Type of Exercise: 1969-1977"; three panels (9-, 13-, and 17-year-olds) show the change in percent correct for biological science and physical science.

Figure 4. Hiding the data in the scale (from SI3). Chart title: "Public and Private Elementary Schools, Selected Years 1929-1970"; horizontal axis: school year, 1929-30 to 1969-70; vertical axis: thousands of schools.

2. SHOWING DATA

Obviously, if the aim of a good display is to convey information, the less information carried in the display,
the worse it is. Tufte (1983) has devised a scheme for measuring the amount of information in displays, called the data density index (ddi), which is "the number of numbers plotted per square inch." This easily calculated index is often surprisingly informative. In popular and technical media we have found a range from .1 to 362. This provides us with the first rule of bad data display.

Rule 1: Show as Few Data as Possible (Minimize the Data Density)

What does a data graphic with a ddi of .3 look like? Shown in Figure 1 is a graphic from the book Social Indicators III (SI3), originally done in four colors (original size 7" by 9") that contains 18 numbers (18/63 = .3). The median data graph in SI3 has a data density of .6 numbers/in²; this one is not an unusual choice. Shown in Figure 2 is a plot from the article by Friedman and Rafsky (1981) with a ddi of .5 (it shows 4 numbers in 8 in²). This is unusual for JASA, where the median data graph has a ddi of 27. In defense of the producers of this plot, the point of the graph is to show that a method of analysis suggested by a critic of their paper was not fruitful. I suspect that prose would have worked pretty well also.

Although arguments can be made that high data density does not imply that a graphic will be good, nor one with low density bad, it does reflect on the efficiency of the transmission of information. Obviously, if we hold clarity and accuracy constant, more information is better than less. One of the great assets of graphical techniques is that they can convey large amounts of information in a small space.

We note that when a graph contains little or no information the plot can look quite empty (Figure 2) and thus raise suspicions in the viewer that there is nothing to be communicated. A way to avoid these suspicions is to fill up the plot with nondata figurations, what Tufte has termed "chartjunk." Figure 3 shows a plot of the labor productivity of Japan relative to that of the United States. It contains one number for each of three years. Obviously, a graph of such sparse information would have a lot of blank space, so filling the space hides the paucity of information from the reader.

A convenient measure of the extent to which this practice is in use is Tufte's "data-ink ratio." This measure is the ratio of the amount of ink used in graphing the data to the total amount of ink in the graph. The closer to zero this ratio gets, the worse the graph. The notion of the data-ink ratio brings us to the second principle of bad data display.

Rule 2: Hide What Data You Do Show (Minimize the Data-Ink Ratio)

One can hide data in a variety of ways. One method that occurs with some regularity is hiding the data in the grid. The grid is useful for plotting the points, but only rarely afterwards. Thus to display data badly, use a fine grid and plot the points dimly (see Tufte 1983, pp. 94-95 for one repeated version of this).

A second way to hide the data is in the scale. This corresponds to blowing up the scale (i.e., looking at the data from far away) so that any variation in the data is obscured by the magnitude of the scale. One can justify this practice by appealing to "honesty requires that we start the scale at zero," or other sorts of sophistry.

In Figure 4 is a plot (from SI3) that effectively hides the growth of private schools in the scale. A redrawing of the number of private schools on a different scale conveys the growth that took place during the mid-1950's (Figure 5). The relationship between this rise and Brown vs. Topeka School Board becomes an immediate question.

To conclude this section, we have seen that we can display data badly either by not including them (Rule 1) or by hiding them (Rule 2). We can measure the extent to which we are successful in excluding the data through the data density; we can sometimes convince viewers that we have included the data through the incorporation of chartjunk. Hiding the data can be done either by using an overabundance of chartjunk or by cleverly choosing the scale so that the data disappear. A measure of the success we have achieved in hiding the data is through the data-ink ratio.

Figure 2. A low density graph (from Friedman and Rafsky 1981 [ddi = .5]).

Figure 3. A low density graph (© 1978, The Washington Post) with chartjunk to fill in the space (ddi = .2). The chart compares U.S. and Japanese labor productivity.

Figure 5. Expanding the scale and showing the data in Figure 4 (from SI3). Chart title: "The Number of Private Elementary Schools From 1930-1970". Values printed in the figure: 1930, 9.275; 1940, 10.000; 1950, 10.375; 1960, 13.574; 1970, 14.372.

3. SHOWING DATA ACCURATELY

The essence of a graphic display is that a set of numbers having both magnitudes and an order are represented by an appropriate visual metaphor: the magnitude and order of the metaphorical representation match the numbers. We can display data badly by ignoring or distorting this concept.

Rule 3: Ignore the Visual Metaphor Altogether

If the data are ordered and if the visual metaphor has a natural order, a bad display will surely emerge if you shuffle the relationship. In Figure 6 note that the bar labeled 14.1 is longer than the bar labeled 18. Another method is to change the meaning of the metaphor in the middle of the plot. In Figure 7 the dark shading represents imports on one side and exports on the other. This is but one of the problems of this graph; more serious still is the change of scale. There is also a difference in the time scale, but that is minor. A common theme in Playfair's (1786) work was the difference between imports and exports. In Figure 8, a 200-year-old graph tells the story clearly. Two such plots would have illustrated the story surrounding this graph quite clearly.

Figure 6. Ignoring the visual metaphor (© 1978, The New York Times). Chart title: "A New Set of Projections for the U.S. Supply of Energy".

Figure 7. Reversing the metaphor in mid-graph while changing scales on both axes (© June 14, 1981, The New York Times). Two panels show U.S. trade with China and with Taiwan, in millions of U.S. dollars. Source: Department of Commerce.

Rule 4: Only Order Matters

One frequent trick is to use length as the visual metaphor when area is what is perceived. This was used quite effectively by The Washington Post in Figure 9. Note that this graph also has a low data density (.1), and its data-ink ratio is close to zero. We can also calculate Tufte's (1983) measure of perceptual distortion (PD) for this graph. The PD in this instance is the perceived
change in the value of the dollar from Eisenhower to Carter divided by the actual change. I read and measure thus:

Actual: (1.00 - .44)/.44 = 1.27
Measured: (22.00 - 2.06)/2.06 = 9.68
PD = 9.68/1.27 = 7.62

This distortion of over 700% is substantial but by no means a record.

A less distorted view of these data is provided in Figure 10. In addition, the spacing suggested by the presidential faces is made explicit on the time scale.

Figure 8. A plot on the same topic done well two centuries earlier (from Playfair 1786).

Figure 9. An example of how to goose up the effect by squaring the eyeball (© 1978, The Washington Post). The chart labels the value of the dollar by president, from 1958 (Eisenhower, $1.00) to 1978 (Carter, 44¢). Source: Labor Department.

Figure 10. The data in Figure 9 as an unadorned line chart (from Wainer 1980). Horizontal axis: year, 1958 to 1978.

Rule 5: Graph Data Out of Context

Often we can modify the perception of the graph (particularly for time series data) by choosing carefully the interval displayed. A precipitous drop can disappear if we choose a starting date just after the drop. Similarly, we can turn slight meanders into sharp changes by focusing on a single meander and expanding the scale. Often the choice of scale is arbitrary but can have profound effects on the perception of the display. Figure 11 shows a famous example in which President Reagan gives an out-of-context view of the effects of his tax cut. The Times' alternative provides the context for a deeper understanding. Simultaneously omitting the context as well as any quantitative scale is the key to the practice of Ordinal Graphics (see also Rule 4). Automatic rules do not always work, and wisdom is always required.

In Section 3 we discussed three rules for the accurate display of data. One can compromise accuracy by ignoring visual metaphors (Rule 3), by only paying attention to the order of the numbers and not their magnitude (Rule 4), or by showing data out of context (Rule 5). We advocated the use of Tufte's measure of perceptual distortion as a way of measuring the extent to which the accuracy of the data has been compromised by the display. One can think of modifications that would allow it to be applied in other situations, but we leave such expansion to other accounts.

4. SHOWING DATA CLEARLY

In this section we discuss methods for badly displaying data that do not seem as serious as those described previously; that is, the data are displayed, and they might even be accurate in their portrayal. Yet subtle (and not so subtle) techniques can be used to effectively obscure the most meaningful or interesting aspects of the data. It is more difficult to provide objective measures of presentational clarity, but we rely on the reader to judge from the examples presented.

Rule 6: Change Scales in Mid-Axis

This is a powerful technique that can make large differences look small and make exponential changes look linear.

In Figure 12 is a graph that supports the associated story about the skyrocketing circulation of The New York Post compared to the plummeting Daily News circulation. The reason given is that New Yorkers "trust" the Post. It takes a careful look to note the 700,000 jump that the scale makes between the two lines.

In Figure 13 is a plot of physicians' incomes over time. It appears to be linear, with a slight tapering off in recent years. A careful look at the scale shows that it starts out plotting every eight years and ends up plotting yearly. A more regular scale (in Figure 14) tells quite a different story.

Figure 11. The White House showing neither scale nor context (© 1981, The New York Times, reprinted with permission). The figure reproduces material from The New York Times, Sunday, August 2, 1981. Labels legible in it include "Your Taxes: Average Family, $20,000 Income", "Their Bill", "Our Bill", "Payments under the Ways and Means Committee plan", and "Payments under the President's proposal"; the years run from 1982 to 1986.

Figure 12. Changing scale in mid-axis to make large differences small. Circulation figures printed in the chart include 1,829,000, 1,636,000, 1,555,000, and 1,491,000.

Figure 13. Changing scale in mid-axis to make exponential growth linear (© The Washington Post). Chart title: "Incomes of Doctors Vs. Other Professionals (Median Net Incomes)". Source: Council on Wage and Price Stability. Horizontal axis labels: 1939, 1947, 1951, 1955, 1963, 1965, 1967, 1970, 1972, 1973, 1974, 1975, 1976.

Rule 7: Emphasize the Trivial (Ignore the Important)

Sometimes the data that are to be displayed have one important aspect and others that are trivial. The graph can be made worse by emphasizing the trivial part. In Figure 15 we have a page from SI3 that compares the income levels of men and women by educational levels. It reveals the not surprising result that better educated individuals are paid better than more poorly educated ones and that changes across time expressed in constant dollars are reasonably constant. The comparison of greatest interest and current concern, comparing salaries between sexes within education level, must be made clumsily by vertically transposing from one graph to another. It seems clear that Rule 7 must have been operating here, for it would have been easy to place the graphs side by side and allow the comparison of interest to be made more directly. Looking at the problem from a strictly data-analytic point of view, we note that there are two large main effects (education and sex) and a small time effect. This would have implied a plot that showed the large effects clearly and placed the smallish time trend into the background (Figure 16).

Figure 14. Data from Figure 13 redone with linear scale (from Wainer 1980). Chart title: "Incomes of Doctors vs. Other Professionals"; horizontal axis: year, 1939 to 1974; an annotation marks when Medicare started.

Figure 15. Emphasizing the trivial: Hiding the main effect of sex differences in income through the vertical placement of plots (from SI3). Chart title: "Median Income of Year-Round, Full-Time Workers 25 to 34 Years Old, by Sex and Educational Attainment: 1968-1977"; vertical axis: constant 1977 dollars.

Figure 16. Chart title: "Median Income of Year-Round Full Time Workers 25-34 Years Old by Sex and Educational Attainment: 1968-1977 (in Constant 1977 Dollars)"; series: males, females; legend: maximum, median (over time).

Rule 8: Jiggle the Baseline

Making comparisons is always aided when the quantities being compared start from a common base. Thus we can always make the graph worse by starting from different bases. Such schemes as the hanging or suspended rootogram and the residual plot are meant to facilitate comparisons. In Figure 17 is a plot of U.S. imports of red meat taken from the Handbook of Agricultural Charts published by the U.S. Department of Agriculture. Shading beneath each line is a convention that indicates summation, telling us that the amount of each kind of meat is added to the amounts below it. Because of the dominance of and the fluctuations in importation of beef and veal, it is hard to see what the changes are in the other kinds of meat: Is the importation of pork increasing? Decreasing? Staying constant? The only purpose for stacking is to indicate graphically the total summation. This is easily done through the addition of another line for TOTAL. Note that a TOTAL will always be clear and will never intersect the other lines on the plot. A version of these data is shown in Figure 18 with the separate amounts of each meat, as well as a summation line, shown clearly. Note how easily one can see the structure of import of each kind of meat now that the standard of comparison is a straight line (the time axis) and no longer the import amount of those meats with greater volume.

Figure 17. Jiggling the baseline makes comparisons more difficult (from Handbook of Agricultural Charts). Chart title: "U.S. Imports of Red Meats"; vertical axis: bil. lb.; horizontal axis: 1960 to 1978; stacked series include lamb, mutton and goat meat, and beef and veal.

Figure 18. An alternative version of Figure 17 with a straight line used as the basis of comparison. Chart title: "U.S. Imports of Red Meats"; vertical axis: bil. lb.; horizontal axis: 1960 to 1978. Source: Handbook of Agricultural Charts, U.S. Department of Agriculture, 1976, p. 93. Chart source: original.

Rule 9: Austria First!

Ordering graphs and tables alphabetically can obscure structure in the data that would have been obvious had the display been ordered by some aspect of the data. One can defend oneself against criticisms by pointing out that alphabetizing "aids in finding entries of interest." Of course, with lists of modest length such aids are unnecessary; with longer lists the indexing schemes common in 19th century statistical atlases provide easy lookup capability.

Figure 19 is another graph from SI3 showing life expectancies, divided by sex, in 10 industrialized nations. The order of presentation is alphabetical (with the USSR positioned as Russia). The message we get is that there is little variation and that women live longer than men. Redone as a stem-and-leaf diagram (Figure 20 is simply a reordering of the data with spacing proportional to the numerical differences), the magnitude of the sex difference leaps out at us. We also note that the USSR is an outlier for men.

Figure 19. Austria First! Obscuring the data structure by alphabetizing the plot (from SI3). Chart title: "Life Expectancy at Birth, by Sex, Selected Countries, Most Recent Available Year: 1970-1975"; horizontal axis: years of life expectancy; bars for males and females. Countries, in the order shown: Austria, Canada, Finland, France, Germany (Fed. Rep.), Japan, U.S.S.R., Sweden, United Kingdom, United States.

Rule 10: Label (a) Illegibly, (b) Incompletely, (c) Incorrectly, and (d) Ambiguously

There are many instances of labels that either do not
tell the whole story, tell the wrong story, tell two or more stories, or are so small that one cannot figure out what story they are telling. One of my favorite examples of small labels is from The New York Times (August 1978), in which the article complains that fare cuts lower commission payments to travel agents. The graph (Figure 21) supports this view until one notices the tiny label indicating that the small bar showing the decline is for just the first half of 1978. This omits such heavy travel periods as Labor Day, Thanksgiving, Christmas, and so on, so that merely doubling the first-half data is probably not enough. Nevertheless, when this bar is doubled (Figure 22), we see that the agents are doing very well indeed compared to earlier years.

Figure 20. Ordering and spacing the data from Figure 19 as a stem-and-leaf diagram provides insights previously difficult to extract (from SI3). Chart title: "Life Expectancy at Birth, by Sex, Most Recent Available Year"; columns: women, years, men.

Figure 21. Mixing a changed metaphor with a tiny label reverses the meaning of the data (© 1978, The New York Times). The chart shows commission payments to travel agents by airline.

Figure 22. Figure 21 redrawn with 1978 data placed on a comparable basis (from Wainer 1980). Chart title: "Commission Payments to Travel Agents"; series: United, TWA, Eastern, Delta; horizontal axis: year, 1976 to 1978 (estimated).

Rule 11: More Is Murkier: (a) More Decimal Places and (b) More Dimensions

We often see tables in which the number of decimal places presented is far beyond the number that can be perceived by a reader. They are also commonly presented to show more accuracy than is justified. A display can be made clearer by presenting less. In Table 1 is a section of a table from Dhariyal and Dudewicz's (1981) JASA paper. The table entries are presented to five decimal places! In Table 2 is a heavily rounded version that shows what the authors intended clearly. It also shows that the various columns might have a substantial redundancy in them (the maximum expected gain with b/c = 10 is about 1/10th that of b/c = 100 and 1/100th that of b/c = 1,000). If they do, the entire table could have been reduced substantially.

Table 1. Optimal Selection From a Finite Sequence With Sampling Cost

      b/c = 10.0             b/c = 100.0            b/c = 1,000.0
 N    r*  (GN(r*) - a)/c     r*  (GN(r*) - a)/c     r*  (GN(r*) - a)/c
 3    2   .20000             2   2.22500            2   22.47499
 4    2   .26333             2   2.88833            2   29.13832
 5    2   .32333             3   3.54167            3   35.79166
 6    3   .38267             3   4.23767            3   42.78764
 7    3   .44600             3   4.90100            3   49.45097
 8    3   .50743             4   5.57650            4   56.33005
 9    3   .56743             4   6.26025            4   63.20129
10    4   .62948             4   6.92358            4   69.86462

NOTE: g(Xs + r - 1) = bR(Xs + r - 1) + a, if S = s, and g(Xs + r - 1) = 0, otherwise. Source: Dhariyal and Dudewicz (1981).

Table 2. Optimal Selection From a Finite Sequence With Sampling Cost (revised)

      b/c = 10     b/c = 100    b/c = 1,000
 N    r*   G       r*   G       r*   G
 3    2    .2      2    2.2     2    22
 4    2    .3      2    2.9     2    29
 5    2    .3      3    3.5     3    36
 6    3    .4      3    4.2     3    43
 7    3    .4      3    4.9     3    49
 8    3    .5      4    5.6     4    56
 9    3    .6      4    6.3     4    63
10    4    .6      4    6.9     4    70

NOTE: g(Xs + r - 1) = bR(Xs + r - 1) + a, if S = s, and g(Xs + r - 1) = 0, otherwise.

Just as increasing the number of decimal places can make a table harder to understand, so can increasing the number of dimensions make a graph more confusing. We have already seen how extra dimensions can cause ambiguity (Is it length or area or volume?). In addition, human perception of areas is inconsistent. Just what is confusing and what is not is sometimes only a conjecture, yet a hint that a particular configuration will be confusing is obtained if the display confused the grapher. Shown in Figure 23 is a plot of per share earnings and dividends over a six-year period. We note (with some amusement) that 1975 is the side of a bar: the third dimension of this bar (rectangular parallelopiped?) chart has confused the artist! I suspect that 1975 is really what is labeled 1976, and the unlabeled bar at the end is probably 1977. A simple line chart with this interpretation is shown in Figure 24.

Figure 23. An extra dimension confuses even the grapher (© 1979, The Washington Post). Chart title: "Earnings Per Share and Dividends (Dollars)".

In Section 4 we illustrate six more rules for displaying data badly. These rules fall broadly under the heading of how to obscure the data. The techniques mentioned were to change the scale in mid-axis, emphasize the trivial, jiggle the baseline, order the chart by a characteristic unrelated to the data, label poorly, and include more dimensions or decimal places than are justified or needed. These methods will work separately or in combination with others to produce graphs and tables of little use. Their common effect will usually be to leave the reader uninformed about the points of interest in the data, although sometimes they will misinform us; the physicians' income plot in Figure 13 is a prime example of misinformation.

Finally, the availability of color usually means that there are additional parameters that can be misused. The U.S. Census' two-variable color map is a wonderful example of how using color in a graph can seduce us into thinking that we are communicating more than we are (see Fienberg 1979; Wainer and Francolini 1980; Wainer 1981). This leads us to the last rule.

Rule 12: If It Has Been Done Well in the Past, Think of Another Way to Do It

The two-variable color map was done rather well by Mayr (1874), 100 years before the U.S. Census version. He used bars of varying width and frequency to accomplish gracefully what the U.S. Census used varying saturations to do clumsily.

A particularly enlightening experience is to look carefully through the six books of graphs that William Playfair published during the period 1786-1822. One discovers clear, accurate, and data-laden graphs containing many ideas that are useful and too rarely applied today. In the course of preparing this article, I spent many hours looking at a variety of attempts to display
data. Some of the horrors that I have presented were the fruits of that search. In addition, jewels sometimes emerged. I saved the best for last, and will conclude with one of those jewels, my nominee for the title of "World's Champion Graph." It was produced by Minard in 1861 and portrays the devastating losses suffered by the French army during the course of Napoleon's ill-fated Russian campaign of 1812. This graph (originally in color) appears in Figure 25 and is reproduced from Tufte's book (1983, p. 40). His narrative follows.

Beginning at the left on the Polish-Russian border near the Nieman River, the thick band shows the size of the army (422,000 men) as it invaded Russia in June 1812. The width of the band indicates the size of the army at each place on the map. In September, the army reached Moscow, which was by then sacked and deserted, with 100,000 men. The path of Napoleon's retreat from Moscow is depicted by the darker, lower band, which is linked to a temperature scale and dates at the bottom of the chart. It was a bitterly cold winter, and many froze on the march out of Russia. As the graphic shows, the crossing of the Berezina River was a disaster, and the army finally struggled back to Poland with only 10,000 men remaining. Also shown are the movements of auxiliary troops, as they sought to protect the rear and flank of the advancing army. Minard's graphic tells a rich, coherent story with its multivariate data, far more enlightening than just a single number bouncing along over time. Six variables are plotted: the size of the army, its location on a two-dimensional surface, direction of the army's movement, and temperature on various dates during the retreat from Moscow. It may well be the best statistical graphic ever drawn.

Figure 24. Data from Figure 23 redrawn simply (from Wainer 1980). [Line plot; horizontal axis: Year, 1972 to 1976; vertical axis scaled from 1.00 to 2.00.]

Figure 25. Minard's (1861) graph of the French Army's ill-fated foray into Russia: a candidate for the title of "World's Champion Graph" (see Tufte 1983 for a superb reproduction of this in its original color, p. 176). [Title on the graphic: CARTE FIGURATIVE des pertes successives en hommes de l'Armée Française dans la campagne de Russie 1812-1813. Dressée par M. Minard, Inspecteur Général des Ponts et Chaussées en retraite. (Figurative map of the successive losses in men of the French Army in the Russian campaign of 1812-1813. Drawn by M. Minard, Inspector General of Bridges and Roads, retired.)]

5. SUMMING UP

Although the tone of this presentation tended to be light and pointed in the wrong direction, the aim is serious. There are many paths that one can follow that will cause deteriorating quality of our data displays; the 12 rules that we described were only the beginning. Nevertheless, they point clearly toward an outlook that provides many hints for good display. The measures of display described are interlocking. The data density cannot be high if the graph is cluttered with chartjunk; the data-ink ratio grows with the amount of data displayed; perceptual distortion manifests itself most frequently when additional dimensions or worthless metaphors are included. Thus, the rules for good display are quite simple. Examine the data carefully enough to know what they have to say, and then let them say it with a minimum of adornment. Do this while following reasonable regularity practices in the depiction of scale, and label clearly and fully. Last, and perhaps most important, spend some time looking at the work of the masters of the craft. An hour spent with Playfair or Minard will not only benefit your graphical expertise but will also be enjoyable. Tukey (1977) offers 236 graphs and little chartjunk. The work of Francis Walker (1894) concerning statistical maps is clear and concise, and it is truly a mystery that their current counterparts do not make better use of the schema developed a century and more ago.

[Received September 1982. Revised September 1983.]

REFERENCES

BERTIN, J. (1973), Semiologie Graphique (2nd ed.), The Hague: Mouton-Gautier.

BERTIN, J. (1977), La Graphique et le Traitement Graphique de l'Information, France: Flammarion.

BERTIN, J. (1981), Graphics and the Graphical Analysis of Data, translation, W. Berg, tech. ed., H. Wainer, Berlin: DeGruyter.

COX, D.R. (1978), "Some Remarks on the Role in Statistics of Graphical Methods," Applied Statistics, 27, 4-9.

DHARIYAL, I.D., and DUDEWICZ, E.J. (1981), "Optimal Selection From a Finite Sequence With Sampling Cost," Journal of the American Statistical Association, 76, 952-959.

EHRENBERG, A.S.C. (1977), "Rudiments of Numeracy," Journal of the Royal Statistical Society, Ser. A, 140, 277-297.

FIENBERG, S.E. (1979), Graphical Methods in Statistics, The American Statistician, 33, 165-178.

FRIEDMAN, J.H., and RAFSKY, L.C. (1981), "Graphics for the Multivariate Two-Sample Problem," Journal of the American Statistical Association, 76, 277-287.

JOINT COMMITTEE ON STANDARDS FOR GRAPHIC PRESENTATION, PRELIMINARY REPORT (1915), Journal of the American Statistical Association, 14, 790-797.

MACDONALD-ROSS, M. (1977), "How Numbers Are Shown: A Review of Research on the Presentation of Quantitative Data in Texts," Audiovisual Communications Review, 25, 359-409.

MAYR, G. VON (1874), "Gutachen Uber die Anwendung der Graphischen und Geographischen," Method in der Statistik, Munich.

MINARD, C.J. (1845-1869), Tableaus Graphiques et Cartes Figuratives de M. Minard, Bibliotheque de l'Ecole Nationale des Ponts et Chaussees, Paris.

PLAYFAIR, W. (1786), The Commercial and Political Atlas, London: Corry.

SCHMID, C.F. (1954), Handbook of Graphic Presentation, New York: Ronald Press.

SCHMID, C.F., and SCHMID, S.E. (1979), Handbook of Graphic Presentation (2nd ed.), New York: John Wiley.

TUFTE, E.R. (1977), "Improving Data Display," University of Chicago, Dept. of .

TUFTE, E.R. (1983), The Visual Display of Quantitative Information, Cheshire, Conn.: Graphics Press.

TUKEY, J.W. (1977), Exploratory Data Analysis, Reading, Mass: Addison-Wesley.

WAINER, H. (1980), "Making Newspaper Graphs Fit to Print," in Processing of Visible Language, Vol. 2, eds. H. Bouma, P.A. Kolers, and M.E. Wrolsted, New York: Plenum, 125-142.

WAINER, H., "Reply" to Meyer and Abt (1981), The American Statistician, 57.

WAINER, H. (1983), "How Are We Doing? A Review of Social Indicators III," Journal of the American Statistical Association, 78, 492-496.

WAINER, H., and FRANCOLINI, C. (1980), "An Empirical Inquiry Concerning Human Understanding of Two-Variable Color Maps," The American Statistician, 34, 81-93.

WAINER, H., and THISSEN, D. (1981), "Graphical Data Analysis," Annual Review of Psychology, 32, 191-241.

WALKER, F.A. (1894), Statistical Atlas of the United States Based on the Results of the Ninth Census, Washington, D.C.: U.S. Bureau of the Census.
