Summary Statistics in Rows


Paper PH02
Summary Statistics in Rows
John Henry King, Ouachita Clinical Data Services, Inc., Mount Ida, AR

ABSTRACT
Many if not most summary tables created from clinical trials data use basically the same format: they display a mixture of discrete (categorical) and/or continuous variables in a style that could be referred to as “summary statistics in rows”. Each variable, whether discrete or continuous, forms a group of rows in the table, and the descriptive labels for these variables form the first (left-most) column of the table. Continuous-variable rows are labeled with descriptive statistic names, e.g. N, Mean, Standard Deviation, Median, Minimum and Maximum. It is common for two or more of these statistics to be displayed on the same row, e.g. “Mean (SD)” and “Min, Max”. Discrete-variable rows are formed from the formatted values of the levels of each discrete variable, e.g. the levels of Gender (Male, Female). Usually the treatment variable from the clinical trial forms the column groupings to the right of the row descriptions. The table’s cells contain either count and column percent for discrete variables or the values of summary statistics for continuous variables. This table style is not directly obtainable in SAS®, and many pharmaceutical companies and individuals have devoted great effort to programming elaborate macros to produce it. This paper presents a simple yet dynamic program that uses SAS procedures, the standard SAS metadata objects FORMAT and LABEL, and a few macro variables to produce tables in the style of summary statistics in rows. Summary statistics are displayed with a decimal precision derived from each continuous variable’s format. Each variable’s LABEL is used to describe the variable in the table. The formatted values of the discrete variables are used to label the rows for each discrete variable. Complete rows and/or columns of zero counts are accommodated using SAS procedure options.
This is all accomplished without a macro, using SAS procedures and data steps, yet the program is dynamic with regard to the variables summarized and the treatment details.

INTRODUCTION
The most common clinical trials table looks more or less like the example shown in Figure 1. Fonts and titling may differ, but the table should look very familiar to anyone working with clinical trials data. This table style is used for both safety and efficacy data summaries. It may contain variables of all one type, categorical or continuous, or a mixture. The summary statistics displayed here are easily obtainable with the SAS procedures SUMMARY and FREQ; it is the formatting that makes this table require extra attention, specifically the table cells that contain values from two statistics: N (%), Mean (SD) and Min, Max. Also, the row labels usually consist of data from two variables displayed in one column, which requires some additional processing. The goal of this paper is to describe a program that produces tables of this type. The program is dynamic but does not use a complicated macro to achieve its results; without one, the program can be readily distributed to clients with no need to support, explain or otherwise document a macro. The program uses SAS procedures and simple data steps, whose syntax and other details are fully documented in the SAS user documentation. The program uses a simple user interface of %LET statements.
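The macro variables themselves appear later in the paper; purely as an illustration of the style of such an interface, a set of %LET statements might look like the following. The names and values below are hypothetical, not the paper's actual interface.

```sas
* Hypothetical %LET user interface -- names and values are illustrative only;
%let data    = ADSL;               * subject-level input data set;
%let trtvar  = trt;                * treatment variable defining the columns;
%let varlist = sex race age ageg;  * analysis variables, in display order;
```

An interface of plain %LET statements keeps the program distributable: a client edits a handful of assignments at the top of the program rather than learning a macro's parameter list.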
ABC Pharmaceutical                                                Page 1 of 1
Protocol: ABC2010-103, A Phase III Study

                                Table 14-2.1
              Summary of Demographic Characteristics at Baseline

                                          Treatment
                              Placebo        Active         Total
                              (N=13)         (N=6)          (N=19)

Gender, n(%)
   Female                     7 (53.8)       2 (33.3)       9 (47.4)
   Male                       6 (46.2)       4 (66.7)      10 (52.6)
Ethnic Origin, n(%)
   White                      7 (53.8)       4 (66.7)      11 (57.9)
   Black                      6 (46.2)       1 (16.7)       7 (36.8)
   Hispanic                   0              1 (16.7)       1 (5.3)
   Other                      0              0              0
Age (years)
   N                          13             6              19
   Mean (SD)                  13.2 (1.5)     13.7 (1.6)     13.3 (1.5)
   Median                     13.0           13.5           13.0
   Min, Max                   11, 15         12, 16         11, 16
Age group, n(%)
   10 and Under               0              0              0
   Pre-teen                   5 (38.5)       2 (33.3)       7 (36.8)
   Teen                       8 (61.5)       4 (66.7)      12 (63.2)

C:\Documents and Settings\johking\My Documents\My SAS Files\9.1\TT05.sas    03SEP11:09:45:55

Figure 1 Example of a Summary Statistics in Rows table.

The program uses PROC REPORT as the “print engine” to display a specially structured data set that is printed with very generic PROC REPORT syntax. Therefore most of the program is concerned with building this data set. The PROC CONTENTS output shown in Figure 2 describes the variables created by the program. Variables 1-6 will always be created; variables 7 and higher are the treatment column variables. The number of COL_n variables depends on the number of levels of the treatment variable; in the example data there are three levels of treatment.

#  Variable  Type  Len  Label
1  page      Num     8  Break variable used in PROC REPORT BREAK statement
2  vorder    Num     8  Order variable used to order the analysis variables, defined in PROC REPORT as ORDER NOPRINT
3  vname     Char   32  Analysis variable name
4  vlabel    Char  256  Analysis variable label, defined in PROC REPORT as ORDER NOPRINT, displayed with LINE statement in COMPUTE block
5  roworder  Num     8  Order variable for row labels, defined in PROC REPORT as ORDER NOPRINT
6  rowlabel  Char  256  Detail row label, derived from categories or statistic labels
7  col_1     Char   32  Placebo~(N=13)
8  col_2     Char   32  Active~(N=6)
9  col_3     Char   32  Total~(N=19)

Figure 2 Variable information from PROC CONTENTS.

An example of the data contained in this data set is displayed in Figure 3; variables COL_2 and COL_3 have been omitted here.

page  vorder  vname   vlabel               roworder  rowlabel      col_1
   1       3  Age     Age (years)                 1  N             13
   1       3  Age     Age (years)                 2  Mean (SD)     13.2 (1.5)
   1       3  Age     Age (years)                 3  Median        13.0
   1       3  Age     Age (years)                 4  Min, Max      11, 15
   2       6  Height  Height (inches)             1  N             13
   2       6  Height  Height (inches)             2  Mean (SD)     61.87 (4.81)
   2       6  Height  Height (inches)             3  Median        62.50
   2       6  Height  Height (inches)             4  Min, Max      51.3, 69.0
   2       7  Weight  Weight (lbs.)               1  N             13
   2       7  Weight  Weight (lbs.)               2  Mean (SD)     94.31 (17.68)
   2       7  Weight  Weight (lbs.)               3  Median        98.00
   2       7  Weight  Weight (lbs.)               4  Min, Max      50.5, 112.5
   2       5  bmi     BMI (kg/m**2)               1  N             13
   2       5  bmi     BMI (kg/m**2)               2  Mean (SD)     17.181 (1.917)
   2       5  bmi     BMI (kg/m**2)               3  Median        17.772
   2       5  bmi     BMI (kg/m**2)               4  Min, Max      13.49, 20.25
   1       4  ageg    Age group, n(%)             1  10 and Under  0
   1       4  ageg    Age group, n(%)             2  Pre-teen      5 (38.5)
   1       4  ageg    Age group, n(%)             3  Teen          8 (61.5)
   1       2  race    Ethnic Origin, n(%)         1  White         7 (53.8)
   1       2  race    Ethnic Origin, n(%)         2  Black         6 (46.2)
   1       2  race    Ethnic Origin, n(%)         3  Hispanic      0
   1       2  race    Ethnic Origin, n(%)         4  Other         0
   1       1  Sex     Gender, n(%)                1  Female        7 (53.8)
   1       1  Sex     Gender, n(%)                2  Male          6 (46.2)

Figure 3 Example of the data set produced by the program.

DETAILS

SUBJECT LEVEL DATA
The program accepts a subject-level data set similar to the ADaM data set ADSL. This data set has one observation per subject, a treatment variable used to define the summary table columns, and the analysis variables. The statements below create a data set that will demonstrate the features of the program.
proc format;
   value trt  1='Placebo' 2='Active' 3='Total';
   value $sex 'F'='Female' 'M'='Male';
   value race 1='White' 2='Black' 3='Hispanic' 4='Other';
   value age  low-10='10 and Under' 11-12='Pre-teen' 13-high='Teen';
   run;

These formats are used to label the output. Each categorical variable should be formatted if it is a coded variable (e.g. TRT, RACE and SEX) or if there are levels of the variable that do not appear in the data. The grouping format AGE will be used to create and summarize age groups. We can use SASHELP.CLASS to model ADSL; treatment (TRT) and a few other variables are created to make the output more interesting. Notice that each variable is given a LABEL and a FORMAT; these metadata objects will be used to annotate the output.

data ADSL;
   set sashelp.class;
   trt  = rantbl(12345,.5);
   race = rantbl(12345,.5,.4);
   ageg = age;
   bmi  = (weight*703) / height**2;
   attrib age    format=F3.0  label='Age (years)';
   attrib height format=F7.1  label='Height (inches)';
   attrib weight format=F7.1  label='Weight (lbs.)';
   attrib sex    format=$sex. label='Gender';
   attrib race   format=race. label='Ethnic Origin';
   attrib ageg   format=age.  label='Age group';
   attrib bmi    format=F7.2  label='BMI (kg/m**2)';
   attrib trt    format=trt.  label='Treatment';
   run;

The data created by this data step are shown in Figure 4, with the formatting removed.
Obs  Name     Sex  Age  Height  Weight  trt  race  ageg      bmi
  1  Alfred    M    14    69.0   112.5    1     2    14  16.6115
  2  Alice     F    13    56.5    84.0    2     1    13  18.4986
  3  Barbara   F    13    65.3    98.0    1     2    13  16.1568
  4  Carol     F    14    62.8   102.5    1     2    14  18.2709
  5  Henry     M    14    63.5   102.5    2     2    14  17.8703
  6  James     M    12    57.3    83.0    1     1    12  17.7715
  7  Jane      F    12    59.8    84.5    1     1    12  16.6115
  8  Janet     F    15    62.5   112.5    1     2    15  20.2464
  9  Jeffrey   M    13    62.5    84.0    1     1    13  15.1173
 10  John      M    12    59.0    99.5    1     1    12  20.0944
 11  Joyce     F    11    51.3    50.5    1     2    11  13.4900
 12  Judy      F    14    64.3    90.0    1     2    14  15.3030
 13  Louise    F    12    56.3    77.0    2     1    12  17.0777
 14  Mary      F    15    66.5   112.0    1     1    15  17.8045
 15  Philip    M    16    72.0   150.0    2     1    16  20.3414
 16  Robert    M    12    64.8   128.0    2     1    12  21.4297
 17  Ronald    M    15    67.0   133.0    2     3    15  20.8285
 18  Thomas    M    11    57.5    85.0    1     1    11  18.0733
 19  William   M    15    66.5   112.0    1     1    15  17.8045

Figure 4 Data ADSL, example input data.
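Given a data set with the structure described in Figure 2, the generic PROC REPORT step that prints it can be sketched roughly as below. This is a sketch consistent with the Figure 2 variable definitions, not the paper's verbatim code; the input data set name FINAL and the COMPUTE-block details are assumptions.

```sas
* Sketch of the generic PROC REPORT print-engine step. The data set name
  FINAL and the compute-block details are assumptions, not taken verbatim
  from the paper;
proc report data=final nowd split='~' missing;
   column page vorder vlabel roworder rowlabel col_1 col_2 col_3;
   define page     / order noprint;   * used by the BREAK statement;
   define vorder   / order noprint;   * orders the analysis variables;
   define vlabel   / order noprint;   * variable label, written by LINE;
   define roworder / order noprint;   * orders the detail rows;
   define rowlabel / display ' ';     * detail row labels;
   define col_1    / display 'Placebo~(N=13)';
   define col_2    / display 'Active~(N=6)';
   define col_3    / display 'Total~(N=19)';
   compute before vlabel;             * one label line per analysis variable;
      line @1 vlabel $256.;
   endcomp;
   break after page / page;           * start a new page for each PAGE value;
   run;
```

Because every table-specific decision is carried in the data (row order, labels, formatted cell values) and in the COL_n column headers, this step stays the same no matter which variables are summarized; the SPLIT= character '~' turns the headers stored in the variable labels into two-line column headings.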