Box Plots and Radar Plots for Pharmaceutical Studies: Some of the Better Ways to Get a Clear Picture of Study Results
Paper SP09

Mary Cowmeadow, BioStat Reports, Plymouth, Michigan

Abstract

Box plots, bar graphs and histograms are among the most useful plots in the pharmaceutical industry. It is possible to plot virtually anything in a clinical trial dataset with one or another of them. If the data are discrete, use a bar graph or histogram; if they are continuous, use a box plot. These simple plots are so appropriate because treatment group is almost always a discrete entity. A change variable, such as the difference in labs from baseline to final, plotted by treatment group is the perfect setup for box plots. For smaller studies, other types of plots such as radar plots can be useful as well. For many years good box plots were not available to the general pharmaceutical user in SAS, but with the advent of version 8, really good plots can now be found in SAS/STAT as PROC BOXPLOT. In this paper you will explore box plot styles, uses and options, and how the block variable concept solves the axis problem. Radar plots will be discussed and compared to spaghetti plots.

Introduction

Several years ago a hallway discussion led to a long hunt for good box plots. The question was “What would be the ideal plot for labs?” Some people suggested following each patient over time, but for large studies this was impracticable. The best suggestion was to do box plots of the change in each parameter from baseline to final value. It then seemed that, between box plots and bar graphs or histograms, one could plot all the data collected in normal clinical trials. Bar graphs or histograms would work for categorical data; box plots are ideal for continuous data crossed with treatment group. Because box plots are nonparametric and show medians and quartiles, no normality assumption is needed.
Several years later we actually did plot virtually everything in a small study with either box plots or bar graphs. This not only helped with study interpretation, it also helped with a more mundane task: checking tables. After that we added Kaplan-Meier plots to make the list more complete. More recently we have realized that radar plots can be really nice for smaller studies. They are useful for the same type of data as the box plot and spaghetti plot, so they too will be described. We hope you will find them an exciting new addition to our repertoire of pharmaceutical plots.

Early box plots

As a result of that hallway discussion, the study standardization group incorporated box plots into the then new Parke-Davis standard system called CIDS. This was a nice beginning, but the plots were not immediately popular, in part because few people really understood box plots. They needed an “advocate”, especially one with a handy explanation of what they mean. Since there are several ways to do box plots, even statisticians needed to know “what they mean”.

The next place we found box plots was SAS Insight in version 6.12. These were attractive plots with drill-down capability, but they had to be generated interactively each time one wanted to use one. They were nice, but in the final crunch of a study there was too little time to do them. At that time we were using Excel to output data. Excel does have some nice graphing capabilities, but it did not have box plots. The language R does have good box plots and might have worked well, but it was too little known to the programmers.

Then came SAS version 8.2. Box plots were found first in SAS solutions and SAS analyst. SAS analyst rather neatly produced reusable code, something SAS Insight had not done. One could later surround that code with ODS statements and produce beautiful fonts in larger, more readable type. ODS graph output could then go into Word or PDF files.
For the first time box plots began to look really good. PROC BOXPLOT options seemed endless, better even than the options in SAS/GRAPH.

PROC BOXPLOT and styles

The first thing you learn, however, is that there are several styles of box plots, and one should be careful with the differences. The style SCHEMATICID is especially appealing because one can identify outliers with circles and print the patient number right next to each circle. This works really well for small studies with few treatment groups and short patient numbers. The people in study work groups love it. But when any of those conditions is not met, these plots are less appealing. Patient numbers print on top of one another and cannot be read. Long patient numbers are sometimes truncated at the edge of a page. So the next thing worth trying is SCHEMATICIDFAR, which prints patient numbers only on the farthest outliers. This can help, but two patient numbers can still print on top of one another. So, for larger studies, you may want to abandon the attempt to print the patient number on the graph and use plain SCHEMATIC or SKELETAL instead. See Figures 1-4.

Figure 1: Example of block variables and footnotes with a SCHEMATICID boxstyle.

Figure 2: Example of the option SCHEMATICID. Clinicians really like to see the patient number printed next to the outlier, but it sometimes leads to overprinting, as seen here.

Figure 3: The option SKELETAL avoids these problems, but also gives less information.

Figure 4: Color is a nice option. Here we use the colors SAS provided in their examples, because it is difficult to find a compatible set of colors from scratch. But these colors are not as clear to many eyes as the black and white versions, and do not print very well.

Long Treatment Variable Names

If you have long treatment names and many treatments, SAS may not print all of them. You can, however, add a split character to the axis statement so that long names break across lines, which will be enough in most circumstances.
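The split-character idea above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the data set LAB, the variables LABVAR and TREATMENT, and the choice of '*' as the split character are all hypothetical, and the data are assumed to be sorted by TREATMENT already. It relies on the SPLIT= option of the AXIS statement and on HAXIS= accepting an AXIS definition, so check both against the documentation for your SAS release.

```sas
/* Hypothetical data set LAB:                                          */
/*   LABVAR    = change from baseline (numeric)                        */
/*   TREATMENT = character group variable whose values contain '*'     */
/*               where a line break is wanted, e.g. 'Drug A*10 mg BID' */
/* Assumes LAB is already sorted by TREATMENT.                         */

/* Break axis values into several lines at each '*' */
axis1 split='*' label=('Treatment group');

proc boxplot data=lab;
   plot labvar*treatment / boxstyle=schematic
                           haxis=axis1;   /* apply AXIS1 to the group axis */
run;
```

Putting the break character into the treatment values (or into a format applied to them) keeps the labels readable without shortening the treatment names themselves.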
Some SAS procedures sort the data for you. PROC BOXPLOT does not: you need to sort the data by the group variable before you invoke it.

Block Variables

One of the nicer things you can do in PROC BOXPLOT and in other plotting procedures is to use block variables. Block variables are similar to doing a “by”, but much nicer because they solve the whole problem of having axes change from plot to plot for each value of the “by”. For example, if you are plotting a lab parameter by treatment and gender and you use gender as a “by”, the two plots for gender are likely to come out with different axes. This may or may not be noticed by the user. It is much better to ensure that those two plots, one for male and one for female, have the same axis. You can, of course, add an axis statement after the fact, but a very nice way to do the same thing is with the block method. In this case males and females can often be put on the same plot, and this definitely enhances interpretation. To see an example, look at Figures 1-3.

To find how to use block variables in online documentation, look up BLOCKPOS in SAS OnlineDoc, version 8. While it is mentioned under PROC BOXPLOT, it is not well described and there is no example in v8. The code is quite simple, but not that obvious. For example, it is the parentheses in the example below that turn birth weight into a block variable, but you might not guess this.

    proc boxplot data=lab;
       plot labvar*Treatment(birthweight) / blockpos=3;
    run;

Figure 1 above illustrates using a block variable, birth weight, within treatment group. Figure 2 illustrates using a block variable for one factor of a factorial study, where treatment factor B is “b1”, “b35” and “b72”:

    plot labvar*FactorB(FactorA) / blockpos=2;

You might wonder where the box went in the first treatment in Figures 2 and 3. It turned out that there were a lot of cases with zero change in that group, so the box shrank to one line.
The box may also disappear if you forget to sort the data, which can result in only one patient per treatment. BLOCKPOS tells SAS where to put the block variable: at the top of the graph, at the bottom, or below it.

Reference lines

A zero reference line is especially important for change variables. It helps one spot such things as negative or positive trends across dose. One can also include several reference lines. These are particularly useful if you are forced to use a real “by” for one parameter: the movement of the reference lines from plot to plot helps remind the reader of the change in axes.

Interpretation and footnotes:

So how does one interpret the various styles of box plots? SAS online documentation has an excellent explanation, and a very nice diagram of SCHEMATIC box plots. Condensing all that information into a footnote, however, is a somewhat daunting task. The most difficult part to explain is the whisker length for SCHEMATIC plots. If you define the box as “1st-3rd (Q1-Q3) quartiles”, you can then use Q to explain whisker length for SCHEMATIC plots. The whisker length is then the “largest value < 1.5(Q3-Q1)”, that is, each whisker extends to the most extreme observation lying within 1.5(Q3-Q1) of the edge of the box. The rest of the footnote is now easy: the median is the line inside the box, the mean is the plus, and the circles are outliers.
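A zero reference line and a condensed footnote of the kind described above might be set up as in the sketch below. The data set LAB and the variables LABVAR and TREATMENT are hypothetical, the data are assumed to be sorted by TREATMENT, and the footnote wording is only one possible condensation. The symbols named in the footnote (plus for the mean, circles for outliers) follow the description in this paper and depend on the symbol options in effect, so adjust the text to match your own plots.

```sas
/* Hypothetical data set LAB, sorted by TREATMENT;           */
/* LABVAR holds the change from baseline to final value.     */

/* Left-justified footnotes explaining the SCHEMATIC style   */
footnote1 j=l 'Box: 1st-3rd quartiles (Q1-Q3). Line in box: median. Plus: mean.';
footnote2 j=l 'Whiskers: most extreme values within 1.5(Q3-Q1) of the box. Circles: outliers.';

proc boxplot data=lab;
   plot labvar*treatment / boxstyle=schematic
                           vref=0     /* reference line at zero change */
                           lvref=2;   /* dashed line type for the reference line */
run;

/* Clear the footnotes so they do not carry over to later output */
footnote;
```

Several reference lines can be requested by listing more than one value in VREF=, which is how the moving reference lines mentioned above would be produced when a real “by” cannot be avoided.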