Paper SP09

Box plots and Radar Plots for Pharmaceutical Studies: Some of the Better Ways to get a Clear Picture of Study Results

Mary Cowmeadow, BioStat Reports, Plymouth, Michigan

Abstract

Box plots, bar graphs and histograms are among the most useful plots in the pharmaceutical industry. It is possible to plot virtually anything in a clinical trial dataset with one or the other. If the data are discrete, use a bar graph or histogram; if they are continuous, use a box plot. The reason these simple plots are so appropriate is that treatment group is almost always a discrete entity. A change variable, such as the difference in labs from baseline to final, plotted by treatment group is the perfect setup for box plots. For smaller studies, other types of plots such as radar plots can be useful as well.

For many years good box plots were not available to the general pharmaceutical user in SAS but, with the advent of version 8, really good plots can now be found in SAS/STAT as PROC BOXPLOT. In this paper you will explore box plot styles, uses and options, and how the block variable concept solves the axis problem. Radar plots will be discussed and compared to spaghetti plots.

Introduction

Several years ago a hallway discussion led to a long hunt for good box plots. “What would be the ideal plot for labs?” was the question. Some people suggested following each patient over time, but for large studies this was impractical. The best suggestion was to do box plots of the change in each parameter from baseline to final value.

It then seemed that, between box plots and bar graphs / histograms, one could plot all the data collected in normal clinical trials. Bar graphs or histograms would work for categorical data; box plots are ideal for continuous data crossed with treatment group. Since box plots are nonparametric and give you medians and quartiles, the normality assumption is not necessary.

Several years later we actually did plot virtually everything in a small study with either box plots or bar graphs. This not only helped with study interpretation, it also helped with a more mundane task: checking tables. After that we added the Kaplan-Meier plots to make the list more complete. More recently we have realized that radar plots can be really nice for smaller studies. They are useful for the same type of data as the box plot and spaghetti plot, so they too will be described. We hope you will find them an exciting new addition to our repertoire of pharmaceutical plots.

Early box plots

As a result of that hallway discussion, the study standardization group incorporated box plots into the then new Parke-Davis standard system called CIDS. This was a nice beginning, but they were not immediately popular. This was in part because few people really understood box plots. They needed an “advocate”, especially one with a handy explanation of what they mean. Since there are several ways to draw box plots, even statisticians needed to know “what they mean”.

The next place box plots were found was in version 6.12 with SAS Insight. These were attractive plots with drill-down capability, but they had to be generated interactively each time one wanted to use one. They were nice, but in the final crunch of a study there was too little time to do them. At that time we were using Excel to output data, and Excel does have some nice graphing capabilities, but it did not have box plots. The language R has good box plots and might have worked well, but it was too little known to the programmers.

Then came SAS version 8.2. Box plots were found first in SAS Solutions and SAS Analyst. SAS Analyst rather neatly produced reusable code, something SAS Insight had not done. One could later surround it with ODS statements and produce beautiful fonts in larger, more readable type. ODS graph output could then go into Word or PDF files. For the first time box plots began to look really good. PROC BOXPLOT options seemed endless, better even than the options in SAS/GRAPH.

PROC BOXPLOT and styles

The first thing you learn, however, is that there are several styles of box plots, and one should be careful with the differences. The style “SCHEMATICID” is especially appealing because one can identify outliers with circles and print the patient number right next to each circle. This works really well for small studies with few treatment groups and short patient numbers. The people in study work groups love it. But when any of those conditions is not met, these plots are less appealing. Patient numbers print on top of one another and you cannot read them. Long patient numbers are sometimes truncated at the edge of a page. So the next thing worth trying is SCHEMATICIDFAR, which prints patient numbers only on the farthest outliers. This can help, but two patient numbers can still print on top of one another. So, for larger studies, you may want to abandon the attempt to print the patient number on the graph, and use plain SCHEMATIC or SKELETAL instead. See Figures 1-4.

Figure 1: Example of block variables and footnotes with a SCHEMATICID boxstyle.

Figure 2: Example of the option SCHEMATICID. Clinicians really like to see the patient number printed next to the outlier, but it sometimes leads to overprinting, as seen here.

Figure 3: The option SKELETAL avoids these problems, but also gives less information.

Figure 4. Color is a nice option. Here we use the colors SAS provided in their examples, because it is difficult to find a compatible set of colors from scratch. But these colors are not as clear to many eyes as the black and white versions, and do not print very well.
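As a minimal sketch of how the box styles discussed above are requested, the code below assumes a dataset named lab, already sorted by treatment, with hypothetical variables labvar (the change from baseline), Treatment and patient. These names are illustrative only. The ID statement names the variable whose value is printed next to each outlier when an ID style is used.

```sas
/* Hypothetical names: lab, labvar, Treatment, patient.        */
/* Assumes lab is already sorted by Treatment.                  */

/* Small study: label every outlier with the patient number.   */
proc boxplot data=lab;
   plot labvar*Treatment / boxstyle=schematicid;
   id patient;
run;

/* Larger study: whiskers run to the extremes, no labels.       */
proc boxplot data=lab;
   plot labvar*Treatment / boxstyle=skeletal;
run;
```

Replacing schematicid with schematicidfar in the first step would label only the farthest outliers.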

Long Treatment Variable Names

If you have long treatment names and many treatments, SAS may not print all of them. You can, however, add a split character to the axis statement, which will be enough in most circumstances.

Some SAS procedures sort the data for you. PROC BOXPLOT does not: you need to sort the data by the treatment (group) variable before you invoke it.
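The sorting step might look like the sketch below. The dataset and variable names (lab, lab_sorted, labvar, Treatment) are assumptions chosen for illustration.

```sas
/* Hypothetical names: lab, lab_sorted, labvar, Treatment.      */
/* Sort by the group variable so each treatment forms one box.  */
proc sort data=lab out=lab_sorted;
   by Treatment;
run;

proc boxplot data=lab_sorted;
   plot labvar*Treatment;
run;
```

Writing the sorted data to a new dataset with OUT= leaves the original order of lab untouched, which can be convenient when other steps depend on it.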

Block Variables

One of the nicer things you can do in PROC BOXPLOT and in other plotting procedures is to use block variables. Block variables are similar to doing a “by”, but much nicer because they solve the whole problem of having axes change from plot to plot for each value of the “by”. For example, if you are plotting a lab parameter by treatment and gender and you use gender as a “by”, the two plots for gender are likely to come out with different axes. This may or may not be noticed by the user. It is much better to ensure that those two plots, one for male and one for female, have the same axis. You can, of course, add an axis statement after the fact, but a very nice way to do the same thing is with the block method. In this case males and females can often be put on the same plot, and this definitely enhances interpretation. To see an example, look at Figures 1-3.

To find out how to use block variables in the online documentation, look up BLOCKPOS in SAS OnlineDoc, version 8. While it is mentioned under PROC BOXPLOT, it is not well described and there is no example in v8. The code is quite simple, though not that obvious. For example, it is the parentheses in the example below that turn birth weight into a block variable, but you might not guess this.

    proc boxplot data=lab;
       plot labvar*Treatment(birthweight) / blockpos=3;
    run;

Figure 1 above illustrates using a block variable, birth weight, within treatment group. Figure 2 illustrates using a block variable for one factor of a factorial study, where treatment factor B is “b1”, “b35” and “b72”:

    plot labvar*FactorB(FactorA) / blockpos=2;

You might wonder where the box went in the first treatment in Figures 2 and 3. It turned out that there were a lot of cases where there was zero change in that group, so the box shrank to 1 line. The box may also disappear if you forget to sort the data, resulting in only one patient per treatment.

BLOCKPOS tells SAS where to put the block variable legend: at the top of the plot, or at the bottom, either just above the horizontal axis or below it.

Reference lines

A zero reference line is especially important for change variables. It helps one spot such things as negative or positive trends across dose. One can also include several reference lines. These are particularly useful if you are forced to use a real “by” for one parameter: the movement of the reference lines from plot to plot helps remind the reader of the change in axes.
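A zero reference line can be requested with the VREF= option of the PLOT statement, as in this minimal sketch. The dataset and variable names (lab, labvar, Treatment) are assumed for illustration, and lab is assumed to be sorted by Treatment.

```sas
/* Hypothetical names: lab, labvar, Treatment.                  */
/* labvar is taken to be a change from baseline, so a           */
/* horizontal line at 0 separates increases from decreases.     */
proc boxplot data=lab;
   plot labvar*Treatment / boxstyle=schematic vref=0;
run;
```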

Interpretation and footnotes

So how does one interpret the various styles of box plots? SAS online documentation has an excellent explanation, and a very nice illustration of SCHEMATIC box plots. Condensing all that information to a footnote, however, is a somewhat daunting task.

The most difficult part to explain is the whisker length for SCHEMATIC plots. If you define the box as “1st-3rd (Q1-Q3) quartiles”, you can then use Q to explain the whisker length: each whisker extends to the “most extreme value within 1.5(Q3-Q1) of the box”. The rest of the footnote is now easy: the median is the line inside the box, the mean is the plus, and the circles are outliers. For SKELETAL plots, the whiskers extend to the maximum and minimum values.
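One way such a footnote might be attached to a SCHEMATIC plot is sketched below. The wording of the footnotes is only a suggestion, and the dataset and variable names (lab, labvar, Treatment) are assumptions.

```sas
/* Hypothetical names: lab, labvar, Treatment.                  */
/* j=l left-justifies each footnote line under the plot.        */
footnote1 j=l "Box: 1st to 3rd quartiles (Q1-Q3). Line in box: median. Plus: mean.";
footnote2 j=l "Whiskers: most extreme values within 1.5(Q3-Q1) of the box. Symbols beyond: outliers.";

proc boxplot data=lab;
   plot labvar*Treatment / boxstyle=schematic;
run;

/* Clear the footnotes so they do not carry over to later plots. */
footnote;
```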

Radar Plots

For smaller studies radar plots can be very useful, especially for our clinicians who want to see every point plotted. Spaghetti plots have traditionally been used (see Figure 6), but are difficult to read. You can think of a radar plot as something like a spaghetti plot, but more organized. Radar plots are also called “spider plots” and “cobweb plots”. They are available in Excel, and data can be sorted in SAS and written to Excel with a DDE statement, or by simply exporting the file from SAS.

The radar plot below is based on fake data sorted by the baseline value. Each spoke or radial line represents a patient, and each radial line contains 2 values for the patient, the baseline and final value. There are 15 patients, and thus 15 “spokes”. In this example, the spiral line of the sorted baseline value is contrasted with the more rounded line of the final values showing that the final values are less extreme. If you look at the data and then look at the plot, you should be able to understand intuitively what it means.

Figure 5. A simple radar plot with data.

Patient #   Baseline   Final
    1         0.5       2.3
    2         0.8       2.2
    3         0.7       2.1
    4         0.6       1.2
    5         1.2       1.7
    6         1.6       1.2
    7         1.8       2.5
    8         1.9       2.5
    9         2.2       2.9
   10         2.4       3.2
   11         3.3       2.9
   12         3.5       3.2
   13         3.6       2.9
   14         3.9       2.8
   15         4         2.9
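Getting data like that of Figure 5 from SAS into Excel might be sketched as follows. The values are the Figure 5 data read as patient number, baseline and final. The dataset name radar, the variable names and the output file name radar.csv are hypothetical. PROC EXPORT with DBMS=CSV is used here rather than DDE simply because it writes a file Excel can open directly.

```sas
/* Hypothetical names: radar, patient, baseline, final, radar.csv */
data radar;
   input patient baseline final;
   datalines;
1 0.5 2.3
2 0.8 2.2
3 0.7 2.1
4 0.6 1.2
5 1.2 1.7
6 1.6 1.2
7 1.8 2.5
8 1.9 2.5
9 2.2 2.9
10 2.4 3.2
11 3.3 2.9
12 3.5 3.2
13 3.6 2.9
14 3.9 2.8
15 4 2.9
;
run;

/* Sort by baseline so the baseline series traces a spiral. */
proc sort data=radar;
   by baseline;
run;

/* Write a CSV file that Excel can open and chart as a radar plot. */
proc export data=radar outfile="radar.csv" dbms=csv replace;
run;
```

In Excel, the baseline and final columns would then be charted as two series of a radar chart, with one spoke per row.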

Figure 6. A Spaghetti Plot

Contrast the above with the more traditional spaghetti plot. The slope of each line is the first thing to hit your eye, but the plot is so chaotic, with as few as 16 patients, that it is difficult to trace any one subject.

[Spaghetti plot: Parameter Value (0 to 30) against Visit (Baseline, Final), one line per patient, Pt 1 to Pt 16.]

In a radar plot it is not the slope that is important, but the distance between the 2 dots on the same spoke. If the 2 dots are far apart, the change is large.

The next two radar plots, Figures 7a and 7b, represent exciting results where values are dramatically lowered. One plot is for placebo, the other for the comparison drug. This type of data has actually been seen.

Figure 7a.

Placebo

[Radar plot: spokes 1 to 16, radial scale 0 to 30, series Baseline and Final.]

Figure 7b.

Drug A 80 mg.

[Radar plot: spokes 1 to 18, radial scale 0 to 30, series Baseline and Final.]

Conclusion

To the many people who are visually oriented rather than “number” oriented, plots are the fastest way to absorb and parse large amounts of information from a study. In addition, they are a good way to double check the data in the tables, and vice versa. Our goal, as SAS programmers, is to present information about experimental drugs to the study designers so that the best and most fully informed decision can be made about the future of that drug. Box plots and radar plots are among the most useful ways to do this, and deserve prominent places in our plotting repertoire.

References

SAS Institute Inc. (1999). SAS OnlineDoc, Version 8. Cary, NC: SAS Institute Inc.

Trademarks

SAS is a registered trademark of SAS Institute Inc. in the USA and other countries. The symbol ® indicates USA registration. Microsoft is a trademark of the Microsoft Corporation.

Contact Information

Your comments and questions are encouraged. Please feel free to contact the author at:

Mary Cowmeadow
BioStat Reports
9225 Joy Road
Plymouth, MI 48170
Email: [email protected]