Analyzing Complex Binary Data Using SAS (by a Non-statistician)

Jaswant Singh
Veterinary Biomedical Sciences

"Most researchers use statistics the way a drunkard uses a lamp-post: more for support than illumination."

- Winifred Castle

Stat 101

• Dependent variable (outcome)
• Independent variables (predictors)
• Covariates (confounders)
• Variable types:
  – Categorical (qualitative): nominal, dichotomous, ordinal/count
  – Continuous (quantitative)
• Fixed versus random factors

First things first…

• What is the primary question that I am going to answer?
• Are there any secondary questions?
• Understand your model:
  – What is my dependent variable?
  – What is/are my independent variables?
  – What is the type of data?
• Are there any confounders (covariates)?

Simplest Scenario

• Response variable: binary
• Independent variable: categorical

e.g. My Dean would like to know: does the Maclean's prestige rating of an institution matter for admission into a graduate program at UofS?

Let's generate a Frequency Table

Number of students, by prestige of institution

              Prestige
              High    Low
Rejected      125     148
Admitted       87      40

• 2x2 contingency table
• Chi-square test

Chi-Square test

    Data Grad;
      Input Prestige$ Admission$ number;   /* Prestige: 1 = High, 2 = Low */
      Cards;
    1 Rejected 125
    1 Admitted 87
    2 Rejected 148
    2 Admitted 40
    ;
    Proc freq;
      Weight number;                       /* each row is a cell count, not a single student */
      Tables Prestige*admission / chisq exact nocol norow;
    Run;
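The Pearson chi-square for this 2x2 table can also be checked by hand, which helps when reading the PROC FREQ output. The following is a minimal sketch in Python (used here only as a cross-check outside SAS); it applies the standard shortcut formula for a 2x2 table without continuity correction, and the variable names are illustrative.

```python
import math

# Cell counts from the frequency table (rows: prestige, columns: outcome).
a, b = 125, 87    # High prestige: rejected, admitted
c, d = 148, 40    # Low prestige:  rejected, admitted

n = a + b + c + d

# Shortcut formula for the Pearson chi-square of a 2x2 table
# (no continuity correction).
chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# With 1 degree of freedom the upper-tail probability has a closed form.
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"chi-square = {chi_square:.2f}, df = 1, p = {p_value:.2g}")
```

The statistic comes out at about 17.96 on 1 degree of freedom, which should correspond to the uncorrected "Chi-Square" row of the PROC FREQ output rather than the continuity-adjusted one.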

Chi-Square: P-value=0.001

Dean got interested but…

• Now wants us to test the Institutional Prestige Rating on a 1 to 4 scale (best to worst)

Number of students, by prestige rank

              Prestige Rank (Highest to Lowest)
              1      2      3      4
Rejected      28     97     93     55
Admitted      33     54     28     12
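For a table with more than two columns, the chi-square statistic compares each observed count with the count expected if admission were unrelated to rank (row total times column total, divided by the grand total). A minimal sketch of that calculation, again in Python as a cross-check outside SAS, with illustrative names; the closed-form tail probability used here is valid only for 3 degrees of freedom.

```python
import math

# Counts from the table: columns are prestige ranks 1 to 4.
observed = [
    [28, 97, 93, 55],   # rejected
    [33, 54, 28, 12],   # admitted
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (observed - expected)^2 / expected over all cells.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (count - expected) ** 2 / expected

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail probability of a chi-square variate; this formula holds for df = 3 only.
p_value = (math.erfc(math.sqrt(chi_square / 2))
           + math.sqrt(2 * chi_square / math.pi) * math.exp(-chi_square / 2))

print(f"chi-square = {chi_square:.2f}, df = {df}, p = {p_value:.2g}")
```

The degrees of freedom come out as (2 - 1) x (4 - 1) = 3, and the statistic is about 25.24.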

Chi-square test: Two-tailed P-value = 0.001, Degrees of freedom = 3

Simple Situation

• An Associate Vice-President (Research) is interested in knowing what other factors affect admission into graduate school

• Variables of interest (independent variables):
  – GRE score: continuous
  – Percent marks: continuous
  – Prestige of the undergraduate institution: rank (1 to 4)

• Outcome or response variable:
  – Admission to graduate school: Yes / No (binary)

Example and data from UCLA Academic Technology Services: http://www.ats.ucla.edu/stat/sas/dae/logit.htm

Data

GRE   Mark   Prestige   Adm
660   82     3          1
800   90     1          1
640   70     4          1
520   63     4          0
760   65     2          1
560   65     1          1
400   67     2          0
540   75     3          1
700   88     2          0
800   90     4          0
440   71     1          0
760   90     1          1
700   67     2          0
700   90     1          1
480   76     3          0
780   87     4          0
… …. . .

SAS Code

    proc means;
      var gre mark;
    run;

    proc freq;
      tables rank admission admission*rank;
    run;

Proc Logistic

    proc logistic descending;
      class rank / param=ref;    /* reference coding; the last level (rank 4) is the default reference */
      model admission = gre mark rank;

      /* pairwise comparisons between ranks, reported on the parameter (log-odds) scale */
      contrast 'Rank 1 vs 2' rank 1 -1 0 / estimate=parm;
      contrast 'Rank 2 vs 3' rank 0 1 -1 / estimate=parm;

      /* predicted probability of admission for rank 2, mark fixed at 74.78, GRE from 200 to 800 */
      contrast 'GRE200' intercept 1 gre 200 mark 74.78 rank 0 1 0 / estimate=prob;
      contrast 'GRE300' intercept 1 gre 300 mark 74.78 rank 0 1 0 / estimate=prob;
      contrast 'GRE400' intercept 1 gre 400 mark 74.78 rank 0 1 0 / estimate=prob;
      contrast 'GRE500' intercept 1 gre 500 mark 74.78 rank 0 1 0 / estimate=prob;
      contrast 'GRE600' intercept 1 gre 600 mark 74.78 rank 0 1 0 / estimate=prob;
      contrast 'GRE700' intercept 1 gre 700 mark 74.78 rank 0 1 0 / estimate=prob;
      contrast 'GRE800' intercept 1 gre 800 mark 74.78 rank 0 1 0 / estimate=prob;
    Run;
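It may help to see what a contrast with estimate=prob works out: the contrast coefficients are multiplied by the fitted parameters to give a log-odds value, which is then passed through the inverse logit to give a probability. The sketch below shows only that arithmetic, in Python as a cross-check outside SAS. The coefficient values are HYPOTHETICAL placeholders, not estimates from the admission data, and the function name is illustrative.

```python
import math

# HYPOTHETICAL coefficients, for illustration only (not fitted to the admission data).
# The three rank entries stand for ranks 1, 2 and 3 versus the reference rank 4.
coef = {
    "intercept": -5.0,
    "gre": 0.002,
    "mark": 0.04,
    "rank": [1.5, 0.9, 0.2],
}

def predicted_probability(gre, mark, rank_dummies):
    """Inverse logit of the linear predictor for one covariate pattern."""
    eta = (coef["intercept"]
           + coef["gre"] * gre
           + coef["mark"] * mark
           + sum(b * x for b, x in zip(coef["rank"], rank_dummies)))
    return 1.0 / (1.0 + math.exp(-eta))

# Same covariate patterns as the GRE200 ... GRE800 contrasts:
# rank 2 is coded as (0, 1, 0) and the mark is held at 74.78.
gre_values = list(range(200, 900, 100))
probs = [predicted_probability(g, 74.78, [0, 1, 0]) for g in gre_values]

for g, p in zip(gre_values, probs):
    print(f"GRE {g}: predicted probability = {p:.3f}")
```

With the real estimates from the PROC LOGISTIC output substituted for the placeholders, the same arithmetic should reproduce the probabilities reported for the seven GRE contrasts.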

How about the Crossed-Categorical Factors?

A researcher (me!) is interested in examining factors leading to a successful pregnancy outcome:
– Blood progesterone levels during the previous cycle (luteal- vs. subluteal-P4)
– Time between luteolysis and exogenous LH (long vs. short)
– Can subluteal progesterone compensate for short treatment time? (P4*LH interaction)
– Does parity matter? (first-time moms vs. others)
– Data were gathered over 2 years (replicates 1 and 2)

Approaches

• LOGISTIC
• GENMOD
• GLIMMIX
• GLM / PROC MIXED

Glimmix – Fixed Factors

Data

ID    Replicate   Progest   Proest   Type   Foll_Dia   Preg
32    1           High      Long     A      14         0
46    1           High      Long     A      12         1
134   1           High      Long     A      11         1
171   1           High      Long     B      11         0
178   2           High      Long     B      12         1
12    2           High      Long     A      16         1
34    2           High      Long     A      15         1
36    2           High      Long     A      15         0
82    2           High      Long     B      15         1
1     1           High      Short    B      9          0
17    1           High      Short    A      9          0
21    1           High      Short    A      10         0
53    1           High      Short    A      12         0
……………..

SAS Code

    PROC glimmix method=quad;
      CLASS Progest Proest Type Replicate;
      MODEL Preg (event="1") = Progest Proest Type Replicate
                               Progest*Proest Progest*Type Proest*Type
                               / dist=bin link=logit;
      LSMEANS Progest*Proest / diff lines ilink or adjust=tukey;
    run;

Glimmix – Mixed Factors

Data

ID    Replicate   Progest   Proest   Type   Foll_Dia   Preg
32    1           High      Long     A      14         0
46    1           High      Long     A      12         1
134   1           High      Long     A      11         1
171   1           High      Long     B      11         0
178   2           High      Long     B      12         1
12    2           High      Long     A      16         1
34    2           High      Long     A      15         1
36    2           High      Long     A      15         0
82    2           High      Long     B      15         1
1     1           High      Short    B      9          0
17    1           High      Short    A      9          0
21    1           High      Short    A      10         0
53    1           High      Short    A      12         0
……………..

SAS Code

    PROC glimmix method=quad;
      CLASS Progest Proest Type Replicate;
      MODEL Preg (event="1") = Progest Proest Type
                               Progest*Proest Progest*Type Proest*Type
                               / dist=bin link=logit;
      Random intercept / subject=Replicate;
      LSMEANS Progest*Proest / diff lines ilink or adjust=tukey;
    run;

Conclusions

• Use the KISS principle
• We can analyze a dichotomous response variable by:
  – Chi-square
  – Logistic regression
  – GenMod / Glimmix