<<

Principles of Statistical

Principles of Statistical Graphics 1 What’s wrong with this?

Principles of Statistical Graphics 2 What’s wrong with this?

How would you graph the same ?

Principles of Statistical Graphics 2 Cleveland’s paradigm

A graph encodes quantitative and categorical using symbols, geometry and color. Graphical perception is the visual decoding of the encoded information. The graph may be beautiful but a failure: the visual decoding has to work. To make graphs work we need to understand graphical perception: what the eye sees and can decode well. .

Principles of Statistical Graphics 3 Cleveland’s paradigm

A graph encodes quantitative and categorical information using symbols, geometry and color. Graphical perception is the visual decoding of the encoded information. The graph may be beautiful but a failure: the visual decoding has to work. To make graphs work we need to understand graphical perception: what the eye sees and can decode well. visual perception.

Principles of Statistical Graphics 3 Cleveland’s paradigm

A graph encodes quantitative and categorical information using symbols, geometry and color. Graphical perception is the visual decoding of the encoded information. The graph may be beautiful but a failure: the visual decoding has to work. To make graphs work we need to understand graphical perception: what the eye sees and can decode well. visual perception.

Principles of Statistical Graphics 3 Cleveland’s paradigm

A specification and ordering of elementary graphical-perception tasks.

Ten properties of graphs: Angle Position along a Area common scale Colour hue Position along Colour saturation identical, Density (amount of non-aligned scales black) Slope Length (distance) Volume

Principles of Statistical Graphics 4 Cleveland’s paradigm

A specification and ordering of elementary graphical-perception tasks.

Ten properties of graphs: Angle Position along a Area common scale Colour hue Position along Colour saturation identical, Density (amount of non-aligned scales black) Slope Length (distance) Volume

Principles of Statistical Graphics 4 Cleveland’s paradigm

A specification and ordering of elementary graphical-perception tasks.

Ten properties of graphs: Angle Position along a Area common scale Colour hue Position along Colour saturation identical, Density (amount of non-aligned scales black) Slope Length (distance) Volume

Principles of Statistical Graphics 4 Cleveland’s paradigm

A specification and ordering of elementary graphical-perception tasks.

Ten properties of graphs: Angle Position along a Area common scale Colour hue Position along Colour saturation identical, Density (amount of non-aligned scales black) Slope Length (distance) Volume

Principles of Statistical Graphics 4 Cleveland’s paradigm

A specification and ordering of elementary graphical-perception tasks.

Ten properties of graphs: Angle Position along a Area common scale Colour hue Position along Colour saturation identical, Density (amount of non-aligned scales black) Slope Length (distance) Volume

Principles of Statistical Graphics 4 Cleveland’s paradigm

A specification and ordering of elementary graphical-perception tasks.

Ten properties of graphs: Angle Position along a Area common scale Colour hue Position along Colour saturation identical, Density (amount of non-aligned scales black) Slope Length (distance) Volume

Principles of Statistical Graphics 4 Cleveland’s paradigm

A specification and ordering of elementary graphical-perception tasks.

Ten properties of graphs: Angle Position along a Area common scale Colour hue Position along Colour saturation identical, Density (amount of non-aligned scales black) Slope Length (distance) Volume

Principles of Statistical Graphics 4 Cleveland’s paradigm

A specification and ordering of elementary graphical-perception tasks.

Ten properties of graphs: Angle Position along a Area common scale Colour hue Position along Colour saturation identical, Density (amount of non-aligned scales black) Slope Length (distance) Volume

Principles of Statistical Graphics 4 Cleveland’s paradigm

A specification and ordering of elementary graphical-perception tasks.

Ten properties of graphs: Angle Position along a Area common scale Colour hue Position along Colour saturation identical, Density (amount of non-aligned scales black) Slope Length (distance) Volume

Principles of Statistical Graphics 4 Cleveland’s paradigm

A specification and ordering of elementary graphical-perception tasks.

Ten properties of graphs: Angle Position along a Area common scale Colour hue Position along Colour saturation identical, Density (amount of non-aligned scales black) Slope Length (distance) Volume

Principles of Statistical Graphics 4 Cleveland’s paradigm

A specification and ordering of elementary graphical-perception tasks.

Ten properties of graphs: Angle Position along a Area common scale Colour hue Position along Colour saturation identical, Density (amount of non-aligned scales black) Slope Length (distance) Volume

Principles of Statistical Graphics 4 Cleveland’s paradigm

Cleveland’s order of accuracy:

1 Position along a common scale

2 Position along identical, non-aligned scales

3 Length

4 Angle and slope

5 Area

6 Volume

7 Colour hue, colour saturation, density

Encode data on a graph so that the visual decoding involves tasks as high as possible in the ordering.

Principles of Statistical Graphics 5 Cleveland’s paradigm

Cleveland’s order of accuracy:

1 Position along a common scale

2 Position along identical, non-aligned scales

3 Length

4 Angle and slope

5 Area

6 Volume

7 Colour hue, colour saturation, density

Encode data on a graph so that the visual decoding involves tasks as high as possible in the ordering.

Principles of Statistical Graphics 5 Cleveland’s paradigm

Cleveland’s order of accuracy:

1 Position along a common scale

2 Position along identical, non-aligned scales

3 Length

4 Angle and slope

5 Area

6 Volume

7 Colour hue, colour saturation, density

Encode data on a graph so that the visual decoding involves tasks as high as possible in the ordering.

Principles of Statistical Graphics 5 Cleveland’s paradigm

Cleveland’s order of accuracy:

1 Position along a common scale

2 Position along identical, non-aligned scales

3 Length

4 Angle and slope

5 Area

6 Volume

7 Colour hue, colour saturation, density

Encode data on a graph so that the visual decoding involves tasks as high as possible in the ordering.

Principles of Statistical Graphics 5 Cleveland’s paradigm

Cleveland’s order of accuracy:

1 Position along a common scale

2 Position along identical, non-aligned scales

3 Length

4 Angle and slope

5 Area

6 Volume

7 Colour hue, colour saturation, density

Encode data on a graph so that the visual decoding involves tasks as high as possible in the ordering.

Principles of Statistical Graphics 5 Cleveland’s paradigm

Cleveland’s order of accuracy:

1 Position along a common scale

2 Position along identical, non-aligned scales

3 Length

4 Angle and slope

5 Area

6 Volume

7 Colour hue, colour saturation, density

Encode data on a graph so that the visual decoding involves tasks as high as possible in the ordering.

Principles of Statistical Graphics 5 Cleveland’s paradigm

Cleveland’s order of accuracy:

1 Position along a common scale

2 Position along identical, non-aligned scales

3 Length

4 Angle and slope

5 Area

6 Volume

7 Colour hue, colour saturation, density

Encode data on a graph so that the visual decoding involves tasks as high as possible in the ordering.

Principles of Statistical Graphics 5 Cleveland’s paradigm

Cleveland’s order of accuracy:

1 Position along a common scale

2 Position along identical, non-aligned scales

3 Length

4 Angle and slope

5 Area

6 Volume

7 Colour hue, colour saturation, density

Encode data on a graph so that the visual decoding involves tasks as high as possible in the ordering.

Principles of Statistical Graphics 5 Cleveland’s paradigm

Cleveland’s order of accuracy:

1 Position along a common scale

2 Position along identical, non-aligned scales

3 Length

4 Angle and slope

5 Area

6 Volume

7 Colour hue, colour saturation, density

Encode data on a graph so that the visual decoding involves tasks as high as possible in the ordering.

Principles of Statistical Graphics 5 Cleveland’s paradigm

Corollaries

1 Pie perform poorly because they rely on comparing angles rather than distances. 2 should be plotted as lines with time on the horizontal axis. 3 Avoid representing data using volumes. 4 If a key point is represented by a changing slope, consider plotting the rate of change itself rather than the original data. 5 Think of simplifications which enhance the detection of the basic properties of the data. 6 Think of how the distance between related representations of data affects their interpretation.

Principles of Statistical Graphics 6 Cleveland’s paradigm

Corollaries

1 Pie charts perform poorly because they rely on comparing angles rather than distances. 2 Time series should be plotted as lines with time on the horizontal axis. 3 Avoid representing data using volumes. 4 If a key point is represented by a changing slope, consider plotting the rate of change itself rather than the original data. 5 Think of simplifications which enhance the detection of the basic properties of the data. 6 Think of how the distance between related representations of data affects their interpretation.

Principles of Statistical Graphics 6 Cleveland’s paradigm

Corollaries

1 Pie charts perform poorly because they rely on comparing angles rather than distances. 2 Time series should be plotted as lines with time on the horizontal axis. 3 Avoid representing data using volumes. 4 If a key point is represented by a changing slope, consider plotting the rate of change itself rather than the original data. 5 Think of simplifications which enhance the detection of the basic properties of the data. 6 Think of how the distance between related representations of data affects their interpretation.

Principles of Statistical Graphics 6 Cleveland’s paradigm

Corollaries

1 Pie charts perform poorly because they rely on comparing angles rather than distances. 2 Time series should be plotted as lines with time on the horizontal axis. 3 Avoid representing data using volumes. 4 If a key point is represented by a changing slope, consider plotting the rate of change itself rather than the original data. 5 Think of simplifications which enhance the detection of the basic properties of the data. 6 Think of how the distance between related representations of data affects their interpretation.

Principles of Statistical Graphics 6 Cleveland’s paradigm

Corollaries

1 Pie charts perform poorly because they rely on comparing angles rather than distances. 2 Time series should be plotted as lines with time on the horizontal axis. 3 Avoid representing data using volumes. 4 If a key point is represented by a changing slope, consider plotting the rate of change itself rather than the original data. 5 Think of simplifications which enhance the detection of the basic properties of the data. 6 Think of how the distance between related representations of data affects their interpretation.

Principles of Statistical Graphics 6 Cleveland’s paradigm

Corollaries

1 Pie charts perform poorly because they rely on comparing angles rather than distances. 2 Time series should be plotted as lines with time on the horizontal axis. 3 Avoid representing data using volumes. 4 If a key point is represented by a changing slope, consider plotting the rate of change itself rather than the original data. 5 Think of simplifications which enhance the detection of the basic properties of the data. 6 Think of how the distance between related representations of data affects their interpretation.

Principles of Statistical Graphics 6 The Quality Magazine 7 #4 (1998), 64-68.

Good Graphs for Better Business Data — — — — — — — — — — — — — — — — W. S. Cleveland & N.I. Fisher

Analysis and Are we on track with our target for market share? Is Interpretation our installation process capable of meeting the industry benchmark? What things are causing most of the unhappiness with our staff? Information

These are typical of the management problems that require timely and accurate information. They are also Encoding problems for which effective use of graphs can make a

big difference. 200 150 North 100 West 50 East ? 0 1st 2nd 3rd 4th The good news is that graphical capabilities are now Qtr Qtr Qtr Qtr readily available in statistical, spreadsheet, and word- processing packages. The bad news is that much of this graphical capability produces graphs that can hide Decoding or seriously distort crucial information contained in the data. Decoded Information For a long time, how to make a good graph was largely a matter of opinion. However, the last 20 years have seen the development of a set of principles for sound graphical construction, based on solid scientific research and experimentation. Good entry points to Business these principles are provided in the References. In this decision article, we look at a couple of common examples of applying these principles. Figure 1. The graphical process involves extraction of information from data, a decision about which It is helpful to think about the whole process of patterns are to be displayed, and then selection of a graphing, shown schematically in Figure 1. The type of graph that will reveal this pattern to the user, crucial question is: How does the choice of Casegraph studywithout distortio 1n. affect the information as perceived by the recipient of the graph?

To see how graphs can conceal, or reveal, information, consider the humble pie . Figure 2 shows data on the contributions to enterprise profits of a particular product in various regions R1, R2, … around Australia (labels modified from original, but retaining the same ordering, which was alphabetical). What information is this supposed to be purveying? Certainly the caption doesn’t enlighten us. If we wanted the actual percentage share for each region, we should simply use a : tables are intended to provide precise numerical data, whereas the purpose of graphs is to reveal pattern.

For a more elaborate example, we turn to another popular graphical display, the divided for trend data. Figure 3 shows data on market share of whitegoods sales for an eighteen month period, based Figure 2. , showing the relative contributions on monthly industry surveys. What can we glean from to the profits of an enterprise from various Divisions this? Total sales aren’t changing. Manufacturer 1 has around Australia. the biggest market share. Is there nothing else? Principles of Statistical Graphics 7

Version 1.0, 28th May, 1998 1 The Quality Magazine 7 #4 (1998), 64-68.

The answer to this is provided by one of the fundamental tenets of graphing. Detection of pattern with this type of data is best done when each measurement is plotted as a distance from a common baseline. The baseline in Figure 4 is the left vertical axis, and we’re simply comparing horizontal lengths that are vertically aligned.

On the other hand, in the pie chart, we’re trying to compare angles, and very small angles at that. This sort of comparison is known to be imprecise. Similar problems occur when the data are graphed in other colourful or picturesque ways, such as when their sizes are represented by 3-dimensional solid volumes.

However, we haven’t finished with this data set yet. At least one more step should be taken: re-order the data so that they from largest to smallest. The final result is shown in Figure 5.

Figure 3. Divided bar chart, showing monthly sales of different brands of whitegoods over an 18-month period.

Returning to Figure 2, no obvious patterns emerge. So what’s happened in the graphical process? Perhaps the Analysis and Interpretation step wasn’t carried out. So, let’s try a different plot that shows things Case studymore simply, 1 a dotplot: see Figure 4.

Figure 5. The display from Figure 4 has been modified, so that the data plot from largest to smallest. A further pattern emerges: different States tend to contribute differently to enterprise profits.

A (potentially) more important pattern has now emerged: different States are not contributing equally to group profits. This information may be vital in helping management to identify a major improvement opportunity. We create one more graph to bring this out: see Figure 6.

Figure 4. A dotplot of the data used in Figure 2, What can we now say in defence of Figure 2 in the showing the relative contributions to enterprise profits light of Figures 5 and 6? Really, only that it was from its various Divisions around Australia. The produced from a spreadsheet at the press of a button. discrete nature of the data is immediately evident. But judged by a standard of how well it conveys Principles of Statistical Graphics information,7 it has failed. This is typical of pie charts. A simple pattern emerges immediately: there are only five different levels of contribution; some Now let’s re-visit the market share data plotted in rounding of the raw data has been performed. Why Figure 3. What aspects of this graph might be wasn’t this evident in the pie chart? hindering or preventing us from seeing important pattern?

Version 1.0, 28th May, 1998 2 The Quality Magazine 7 #4 (1998), 64-68.

The answer to this is provided by one of the fundamental tenets of graphing. Detection of pattern with this type of data is best done when each measurement is plotted as a distance from a common baseline. The baseline in Figure 4 is the left vertical axis, and we’re simply comparing horizontal lengths that are vertically aligned.

On the other hand, in the pie chart, we’re trying to compare angles, and very small angles at that. This sort of comparison is known to be imprecise. Similar problems occur when the data are graphed in other colourful or picturesque ways, such as when their sizes are represented by 3-dimensional solid volumes.

However, we haven’t finished with this data set yet. At least one more step should be taken: re-order the data so that they plot from largest to smallest. The Case studyfinal result is sh 1own in Figure 5.

Figure 3. Divided bar chart, showing monthly sales of different brands of whitegoods over an 18-month period.

Returning to Figure 2, no obvious patterns emerge. So what’s happened in the graphical process? Perhaps the Analysis and Interpretation step wasn’t carried out. So, let’s try a different plot that shows things more simply, a dotplot: see Figure 4.

Figure 5. The display from Figure 4 has been modified, so that the data plot from largest to smallest. A further pattern emerges: different States tend to contribute differently to enterprise profits. Principles of Statistical Graphics 7 A (potentially) more important pattern has now emerged: different States are not contributing equally to group profits. This information may be vital in helping management to identify a major improvement opportunity. We create one more graph to bring this out: see Figure 6.

Figure 4. A dotplot of the data used in Figure 2, What can we now say in defence of Figure 2 in the showing the relative contributions to enterprise profits light of Figures 5 and 6? Really, only that it was from its various Divisions around Australia. The produced from a spreadsheet at the press of a button. discrete nature of the data is immediately evident. But judged by a standard of how well it conveys information, it has failed. This is typical of pie charts. A simple pattern emerges immediately: there are only five different levels of contribution; some Now let’s re-visit the market share data plotted in rounding of the raw data has been performed. Why Figure 3. What aspects of this graph might be wasn’t this evident in the pie chart? hindering or preventing us from seeing important pattern?

Version 1.0, 28th May, 1998 2 The Quality Magazine 7 #4 (1998), 64-68.

Case study 1

Figure 6. The contributions to group profit by different regions are plotted by State. The clear differences between States are evident.

Principles of Statistical Graphics 7 We can get some idea of what’s happening with the goods sold by Manufacturer 1: probably not much. We Figure 8. The monthly sales data from Figure 7 have see this pattern because we’re effectively comparing been replotted so that the dominant curve is displayed aligned (vertical) lengths: the 18 values of monthly separately, with a false origin, and the other curves sales for this Manufacturer are all measured up from a that are measured on a much smaller scale can then be common baseline (the horizontal axis). plotted using a better aspect ratio. This reveals more information about individual and comparative trends However, this isn’t the case for any of the other in the curves. variables. For example, the individual values for the second manufacturer are measured upwards from We have made some progress. Manufacturer 1 is where the corresponding values for the first more easily studied, and there is little evidence of manufacturer stop: the lengths are not aligned. So the anything other than random fluctuations around an first step is to re-plot the data in order that, for each average monthly sales volume of about 25,000 units. variable, we are comparing aligned lengths. See However, possible patterns in the other curves are Figure 7. difficult to detect because the scale of the axes is adjusted to allow all the data to be plotted on the same graph. The next step is to display the dominant curve separately, and to use a better aspect ratio when plotting the other variables. See Figure 8.

Now some very interesting patterns emerge. It is evident that sales for Manufacturer 2 have been in decline for most of this period. Not much is happening with Manufacturer 4, who is averaging about 2500 units. However, there is something happening with Manufacturers 3 and 5. From months 6 to 18, sales for Manufacturer 3 rose significantly and then declined. This appears to have been at the expense of Manufacturer 5, whose sales declined significantly and then recovered. If this comparison is really of interest, it should be studied separately. The difference is plotted in Figure 9.

Figure 7. The monthly sales data from Figure 3 have been replotted so that sales patterns for each manufacturer can be seen without distortion. However, the curve for the dominant manufacturer is compressing patterns in the other curves.

Version 1.0, 28th May, 1998 3 The Quality Magazine 7 #4 (1998), 64-68. Case study 2 The answer to this is provided by one of the fundamental tenets of graphing. Detection of pattern with this type of data is best done when each measurement is plotted as a distance from a common baseline. The baseline in Figure 4 is the left vertical axis, and we’re simply comparing horizontal lengths that are vertically aligned.

On the other hand, in the pie chart, we’re trying to compare angles, and very small angles at that. This sort of comparison is known to be imprecise. Similar problems occur when the data are graphed in other colourful or picturesque ways, such as when their sizes are represented by 3-dimensional solid volumes.

However, we haven’t finished with this data set yet. At least one more step should be taken: re-order the data so that they plot from largest to smallest. The final result is shown in Figure 5.

Figure 3. Divided bar chart, showing monthly sales of different brands of whitegoods over an 18-month period.

Principles of Statistical Graphics 8 Returning to Figure 2, no obvious patterns emerge. So what’s happened in the graphical process? Perhaps the Analysis and Interpretation step wasn’t carried out. So, let’s try a different plot that shows things more simply, a dotplot: see Figure 4.

Figure 5. The display from Figure 4 has been modified, so that the data plot from largest to smallest. A further pattern emerges: different States tend to contribute differently to enterprise profits.

A (potentially) more important pattern has now emerged: different States are not contributing equally to group profits. This information may be vital in helping management to identify a major improvement opportunity. We create one more graph to bring this out: see Figure 6.

Figure 4. A dotplot of the data used in Figure 2, What can we now say in defence of Figure 2 in the showing the relative contributions to enterprise profits light of Figures 5 and 6? Really, only that it was from its various Divisions around Australia. The produced from a spreadsheet at the press of a button. discrete nature of the data is immediately evident. But judged by a standard of how well it conveys information, it has failed. This is typical of pie charts. A simple pattern emerges immediately: there are only five different levels of contribution; some Now let’s re-visit the market share data plotted in rounding of the raw data has been performed. Why Figure 3. What aspects of this graph might be wasn’t this evident in the pie chart? hindering or preventing us from seeing important pattern?

Version 1.0, 28th May, 1998 2 The Quality Magazine 7 #4 (1998), 64-68.

Figure 6. The contributions to group profit by different regions are plotted by State. The clear differences between States are evident.

We can get some idea of what’s happening with the goods sold by Manufacturer 1: probably not much. We Figure 8. The monthly sales data from Figure 7 have see this pattern because we’re effectively comparing been replotted so that the dominant curve is displayed aligned (vertical) lengths: the 18 values of monthly separately, with a false origin, and the other curves sales for this Manufacturer are all measured up from a that are measured on a much smaller scale can then be common baseline (the horizontal axis). plotted using a better aspect ratio. This reveals more information about individual and comparative trends However, this isn’t the case for any of the other in the curves. variables. For example, the individual values for the second manufacturer are measured upwards from We have made some progress. Manufacturer 1 is where the corresponding values for the first more easily studied, and there is little evidence of manufacturer stop: the lengths are not aligned. So the anything other than random fluctuations around an first step is to re-plot the data in order that, for each average monthly sales volume of about 25,000 units. variable, we are comparing aligned lengths. See However, possible patterns in the other curves are Case studyFigure 7. 2 difficult to detect because the scale of the axes is adjusted to allow all the data to be plotted on the same graph. The next step is to display the dominant curve separately, and to use a better aspect ratio when plotting the other variables. See Figure 8.

Now some very interesting patterns emerge. It is evident that sales for Manufacturer 2 have been in decline for most of this period. Not much is happening with Manufacturer 4, who is averaging about 2500 units. However, there is something happening with Manufacturers 3 and 5. From months 6 to 18, sales for Manufacturer 3 rose significantly and then declined. This appears to have been at the expense of Manufacturer 5, whose sales declined significantly and then recovered. If this comparison is really of interest, it should be studied separately. The difference is plotted in Figure 9.

Figure 7. The monthly sales data from Figure 3 have been replotted so that sales patterns for each manufacturer can be seen without distortion. However, the curve for the dominant manufacturer is compressing patterns in the other curves. Principles of Statistical Graphics 8

Version 1.0, 28th May, 1998 3 The Quality Magazine 7 #4 (1998), 64-68.

Case study 2

Figure 6. The contributions to group profit by different regions are plotted by State. The clear differences between States are evident.

We can get some idea of what’s happening with the goods sold by Manufacturer 1: probably not much. We Figure 8. The monthly sales data from Figure 7 have see this pattern because we’re effectively comparing been replotted so that the dominant curve is displayed aligned (vertical) lengths: the 18 values of monthly separately, with a false origin, and the other curves sales for this Manufacturer are all measured up from a that are measured on a much smaller scale can then be common baseline (the horizontal axis). plotted using a better aspect ratio. This reveals more information about individual and comparative trends However, this isn’t the case for any of the other in the curves. variables. For example, the individual valuesPrinciples for the of Statistical Graphics 8 second manufacturer are measured upwards from We have made some progress. Manufacturer 1 is where the corresponding values for the first more easily studied, and there is little evidence of manufacturer stop: the lengths are not aligned. So the anything other than random fluctuations around an first step is to re-plot the data in order that, for each average monthly sales volume of about 25,000 units. variable, we are comparing aligned lengths. See However, possible patterns in the other curves are Figure 7. difficult to detect because the scale of the axes is adjusted to allow all the data to be plotted on the same graph. The next step is to display the dominant curve separately, and to use a better aspect ratio when plotting the other variables. See Figure 8.

Now some very interesting patterns emerge. It is evident that sales for Manufacturer 2 have been in decline for most of this period. Not much is happening with Manufacturer 4, who is averaging about 2500 units. However, there is something happening with Manufacturers 3 and 5. From months 6 to 18, sales for Manufacturer 3 rose significantly and then declined. This appears to have been at the expense of Manufacturer 5, whose sales declined significantly and then recovered. If this comparison is really of interest, it should be studied separately. The difference is plotted in Figure 9.

Figure 7. The monthly sales data from Figure 3 have been replotted so that sales patterns for each manufacturer can be seen without distortion. However, the curve for the dominant manufacturer is compressing patterns in the other curves.

Version 1.0, 28th May, 1998 3 The Quality Magazine 7 #4 (1998), 64-68. Case study 2 • two very commonly-used displays, pie charts and divided bar charts, typically do a poor job of revealing pattern

Can you afford not to be using graphs in the way information is reported to you, or the way you are reporting it? How else can vital patterns be revealed and presented, and so provide effective input to decision-making at all levels of your enterprise?

References

Cleveland, William S. (1993), Visualizing Data. Hobart Press, Summit, New Jersey.

Cleveland, William S. (1994), The Elements of Graphing Data. Second edition. Hobart Press, Summit, New Jersey.

Figure 9. This graph shows the difference between Tufte, Edward R. (1983). The Visual Display of sales of Manufacturers 5 and 3. Over the period Quantitative Information. Graphics Press, Cheshire, January – July 1997 there was a marked increase in Connecticut. sales in favour of Manufacture 5; after July, this

advantage declined steadily to the end of the year.

Principles of Statistical Graphics Bill Cleveland8 is a member of the & Data One plausible explanation would be a short but Mining Research Department of Bell Labs, Murray intense marketing campaign conducted during this Hill, NJ, USA. period; there may be others. The main point is that

appropriate graphs have elicited very interesting Nick Fisher is a member of the Organisational patterns in the data that may well be worthy of further Performance Measurement and Data Mining research exploration. The divided bar chart has done poorly; groups in CSIRO. this is typically the case.

There is much to be learnt about constructing good graphs, to do with effective use of colour, choice of aspect ratio, displaying large volumes of data, use of false origin, and so on. These issues are discussed at length in the references.

To summarise, what basic messages can we draw from these simple examples? There are several:

• the process of effective graph construction begins with simple analyses to see what sorts of patterns, that is, information, are present.

• for many graphs, pattern detection is far more acute when the data are measured from a common baseline, so that we are comparing aligned lengths

• re-arranging the values in a dot plot so that they are in decreasing order provides greatly enhanced pattern recognition

• graph construction is an iterative process

• sometimes more than one graph is needed to show all the interesting patterns in the best way

Version 1.0, 28th May, 1998 4 Twenty rules for good graphics

1 Use vector graphics such as eps or pdf.

2 Use readable fonts.

3 Avoid cluttered legends.

4 If you must use a legend, move it inside the plot.

5 No dark shaded backgrounds.

6 Avoid dark, dominating grid lines.

7 Keep the axis limits sensible.

8 Do not forget to specify units.

9 Tick intervals should be at nice round numbers.

10 Axes should be properly labelled.

Principles of Statistical Graphics 9 Twenty rules for good graphics

1 Use vector graphics such as eps or pdf.

2 Use readable fonts.

3 Avoid cluttered legends.

4 If you must use a legend, move it inside the plot.

5 No dark shaded backgrounds.

6 Avoid dark, dominating grid lines.

7 Keep the axis limits sensible.

8 Do not forget to specify units.

9 Tick intervals should be at nice round numbers.

10 Axes should be properly labelled.

Principles of Statistical Graphics 9 Twenty rules for good graphics

1 Use vector graphics such as eps or pdf.

2 Use readable fonts.

3 Avoid cluttered legends.

4 If you must use a legend, move it inside the plot.

5 No dark shaded backgrounds.

6 Avoid dark, dominating grid lines.

7 Keep the axis limits sensible.

8 Do not forget to specify units.

9 Tick intervals should be at nice round numbers.

10 Axes should be properly labelled.

Principles of Statistical Graphics 9 Twenty rules for good graphics

1 Use vector graphics such as eps or pdf.

2 Use readable fonts.

3 Avoid cluttered legends.

4 If you must use a legend, move it inside the plot.

5 No dark shaded backgrounds.

6 Avoid dark, dominating grid lines.

7 Keep the axis limits sensible.

8 Do not forget to specify units.

9 Tick intervals should be at nice round numbers.

10 Axes should be properly labelled.

Principles of Statistical Graphics 9 Twenty rules for good graphics

1 Use vector graphics such as eps or pdf.

2 Use readable fonts.

3 Avoid cluttered legends.

4 If you must use a legend, move it inside the plot.

5 No dark shaded backgrounds.

6 Avoid dark, dominating grid lines.

7 Keep the axis limits sensible.

8 Do not forget to specify units.

9 Tick intervals should be at nice round numbers.

10 Axes should be properly labelled.

Principles of Statistical Graphics 9 Twenty rules for good graphics

1 Use vector graphics such as eps or pdf.

2 Use readable fonts.

3 Avoid cluttered legends.

4 If you must use a legend, move it inside the plot.

5 No dark shaded backgrounds.

6 Avoid dark, dominating grid lines.

7 Keep the axis limits sensible.

8 Do not forget to specify units.

9 Tick intervals should be at nice round numbers.

10 Axes should be properly labelled.

Principles of Statistical Graphics 9 Twenty rules for good graphics

1 Use vector graphics such as eps or pdf.

2 Use readable fonts.

3 Avoid cluttered legends.

4 If you must use a legend, move it inside the plot.

5 No dark shaded backgrounds.

6 Avoid dark, dominating grid lines.

7 Keep the axis limits sensible.

8 Do not forget to specify units.

9 Tick intervals should be at nice round numbers.

10 Axes should be properly labelled.

Principles of Statistical Graphics 9 Twenty rules for good graphics

1 Use vector graphics such as eps or pdf.

2 Use readable fonts.

3 Avoid cluttered legends.

4 If you must use a legend, move it inside the plot.

5 No dark shaded backgrounds.

6 Avoid dark, dominating grid lines.

7 Keep the axis limits sensible.

8 Do not forget to specify units.

9 Tick intervals should be at nice round numbers.

10 Axes should be properly labelled.

Principles of Statistical Graphics 9 Twenty rules for good graphics

1 Use vector graphics such as eps or pdf.

2 Use readable fonts.

3 Avoid cluttered legends.

4 If you must use a legend, move it inside the plot.

5 No dark shaded backgrounds.

6 Avoid dark, dominating grid lines.

7 Keep the axis limits sensible.

8 Do not forget to specify units.

9 Tick intervals should be at nice round numbers.

10 Axes should be properly labelled.

Principles of Statistical Graphics 9 Twenty rules for good graphics

1 Use vector graphics such as eps or pdf.

2 Use readable fonts.

3 Avoid cluttered legends.

4 If you must use a legend, move it inside the plot.

5 No dark shaded backgrounds.

6 Avoid dark, dominating grid lines.

7 Keep the axis limits sensible.

8 Do not forget to specify units.

9 Tick intervals should be at nice round numbers.

10 Axes should be properly labelled.

Principles of Statistical Graphics 9 Twenty rules

11 Use linewidths big enough to read. 12 Avoid overlapping text on plotting characters or lines. 13 Follow Tufte’s principles by removing chart junk and keeping a high data-ink ratio. 14 Plots should be self-explanatory. 15 Use a sensible aspect ratio. 16 Use points not lines if element order is not relevant. 17 Use lines if element order is relevant (e.g., time). 18 Use common scales for comparison (or a single plot). 19 Avoid pie-charts. Especially 3d pie-charts. Especially 3d pie-charts with exploding wedges. 20 Never use fill patterns such as cross-hatching.

Principles of Statistical Graphics 10 Twenty rules

11 Use linewidths big enough to read. 12 Avoid overlapping text on plotting characters or lines. 13 Follow Tufte’s principles by removing chart junk and keeping a high data-ink ratio. 14 Plots should be self-explanatory. 15 Use a sensible aspect ratio. 16 Use points not lines if element order is not relevant. 17 Use lines if element order is relevant (e.g., time). 18 Use common scales for comparison (or a single plot). 19 Avoid pie-charts. Especially 3d pie-charts. Especially 3d pie-charts with exploding wedges. 20 Never use fill patterns such as cross-hatching.

Principles of Statistical Graphics 10 Twenty rules

11 Use linewidths big enough to read. 12 Avoid overlapping text on plotting characters or lines. 13 Follow Tufte’s principles by removing chart junk and keeping a high data-ink ratio. 14 Plots should be self-explanatory. 15 Use a sensible aspect ratio. 16 Use points not lines if element order is not relevant. 17 Use lines if element order is relevant (e.g., time). 18 Use common scales for comparison (or a single plot). 19 Avoid pie-charts. Especially 3d pie-charts. Especially 3d pie-charts with exploding wedges. 20 Never use fill patterns such as cross-hatching.

Principles of Statistical Graphics 10 Twenty rules

11 Use linewidths big enough to read. 12 Avoid overlapping text on plotting characters or lines. 13 Follow Tufte’s principles by removing chart junk and keeping a high data-ink ratio. 14 Plots should be self-explanatory. 15 Use a sensible aspect ratio. 16 Use points not lines if element order is not relevant. 17 Use lines if element order is relevant (e.g., time). 18 Use common scales for comparison (or a single plot). 19 Avoid pie-charts. Especially 3d pie-charts. Especially 3d pie-charts with exploding wedges. 20 Never use fill patterns such as cross-hatching.

Principles of Statistical Graphics 10 Twenty rules

11 Use linewidths big enough to read. 12 Avoid overlapping text on plotting characters or lines. 13 Follow Tufte’s principles by removing chart junk and keeping a high data-ink ratio. 14 Plots should be self-explanatory. 15 Use a sensible aspect ratio. 16 Use points not lines if element order is not relevant. 17 Use lines if element order is relevant (e.g., time). 18 Use common scales for comparison (or a single plot). 19 Avoid pie-charts. Especially 3d pie-charts. Especially 3d pie-charts with exploding wedges. 20 Never use fill patterns such as cross-hatching.

Principles of Statistical Graphics 10 Twenty rules

11 Use linewidths big enough to read. 12 Avoid overlapping text on plotting characters or lines. 13 Follow Tufte’s principles by removing chart junk and keeping a high data-ink ratio. 14 Plots should be self-explanatory. 15 Use a sensible aspect ratio. 16 Use points not lines if element order is not relevant. 17 Use lines if element order is relevant (e.g., time). 18 Use common scales for comparison (or a single plot). 19 Avoid pie-charts. Especially 3d pie-charts. Especially 3d pie-charts with exploding wedges. 20 Never use fill patterns such as cross-hatching.

Principles of Statistical Graphics 10 Twenty rules

11 Use linewidths big enough to read. 12 Avoid overlapping text on plotting characters or lines. 13 Follow Tufte’s principles by removing chart junk and keeping a high data-ink ratio. 14 Plots should be self-explanatory. 15 Use a sensible aspect ratio. 16 Use points not lines if element order is not relevant. 17 Use lines if element order is relevant (e.g., time). 18 Use common scales for comparison (or a single plot). 19 Avoid pie-charts. Especially 3d pie-charts. Especially 3d pie-charts with exploding wedges. 20 Never use fill patterns such as cross-hatching.

Principles of Statistical Graphics 10 Twenty rules

11 Use linewidths big enough to read. 12 Avoid overlapping text on plotting characters or lines. 13 Follow Tufte’s principles by removing chart junk and keeping a high data-ink ratio. 14 Plots should be self-explanatory. 15 Use a sensible aspect ratio. 16 Use points not lines if element order is not relevant. 17 Use lines if element order is relevant (e.g., time). 18 Use common scales for comparison (or a single plot). 19 Avoid pie-charts. Especially 3d pie-charts. Especially 3d pie-charts with exploding wedges. 20 Never use fill patterns such as cross-hatching.

Principles of Statistical Graphics 10 Twenty rules

11 Use linewidths big enough to read. 12 Avoid overlapping text on plotting characters or lines. 13 Follow Tufte’s principles by removing chart junk and keeping a high data-ink ratio. 14 Plots should be self-explanatory. 15 Use a sensible aspect ratio. 16 Use points not lines if element order is not relevant. 17 Use lines if element order is relevant (e.g., time). 18 Use common scales for comparison (or a single plot). 19 Avoid pie-charts. Especially 3d pie-charts. Especially 3d pie-charts with exploding wedges. 20 Never use fill patterns such as cross-hatching.

Principles of Statistical Graphics 10 Twenty rules

11 Use linewidths big enough to read. 12 Avoid overlapping text on plotting characters or lines. 13 Follow Tufte’s principles by removing chart junk and keeping a high data-ink ratio. 14 Plots should be self-explanatory. 15 Use a sensible aspect ratio. 16 Use points not lines if element order is not relevant. 17 Use lines if element order is relevant (e.g., time). 18 Use common scales for comparison (or a single plot). 19 Avoid pie-charts. Especially 3d pie-charts. Especially 3d pie-charts with exploding wedges. 20 Never use fill patterns such as cross-hatching.

Principles of Statistical Graphics 10 References

Cleveland, W.S. (1985) The elements of graphing data, Wadsworth. Cleveland, W.S. (1993) Visualizing data, Hobart Press Tufte, E.R. (1983) The visual display of quantitative information, Graphics Press. Tufte, E.R. (1990) Envisioning information, Graphics Press.

Principles of Statistical Graphics 11 References

Cleveland, W.S. (1985) The elements of graphing data, Wadsworth. Cleveland, W.S. (1993) Visualizing data, Hobart Press Tufte, E.R. (1983) The visual display of quantitative information, Graphics Press. Tufte, E.R. (1990) Envisioning information, Graphics Press.

Principles of Statistical Graphics 11 References

Cleveland, W.S. (1985) The elements of graphing data, Wadsworth. Cleveland, W.S. (1993) Visualizing data, Hobart Press Tufte, E.R. (1983) The visual display of quantitative information, Graphics Press. Tufte, E.R. (1990) Envisioning information, Graphics Press.

Principles of Statistical Graphics 11 References

Cleveland, W.S. (1985) The elements of graphing data, Wadsworth. Cleveland, W.S. (1993) Visualizing data, Hobart Press Tufte, E.R. (1983) The visual display of quantitative information, Graphics Press. Tufte, E.R. (1990) Envisioning information, Graphics Press.

Principles of Statistical Graphics 11