Biostatistics 213, Fall 2013 Applied Multiple Regression for Clinical Research

Instructor: E John Orav
Email: [email protected]

Teaching Assistants:
Christina McIntosh Email: [email protected]
Fei Li Email: [email protected]

Lecture: Wednesday 8:30-10:20
Discussion: Monday 8:30-10:20

Group Office Hours: (Webex) Tuesday 9:00-10:00 and Friday 12:00-1:00, as needed
Individual Office Hours: (Webex) As requested

Contact:
- Use the ChatRoll on the class website for communication during Monday and Wednesday lectures
- Use Webex for group and individual office hours
- Use the homework, notes and exam tabs on the website to get class notes and homeworks
- Use the homework and exam drop-boxes on the website to turn in assignments and exams

Course Goals: This course will introduce students involved with clinical research to the practical application of regression analysis. Linear regression and logistic regression are the primary models covered in the course, along with general concepts in model selection, goodness-of-fit, and testing procedures. Each lecture will be accompanied by a data analysis using JMP, and an on-line discussion of the results. The course will introduce but will not attempt to develop the underlying likelihood theory, and will use as little calculus as possible. Upon completion of this course, you should be able to carry out your own multiple linear and logistic regression analyses. The specific topics that we will cover are listed on the attached syllabus.

You will not be able to derive the distributions of the test statistics, nor describe in any detail the algorithms that lead to model estimates. This will not hinder you in carrying out data analyses, but it will limit your ability to extrapolate to other models. We will also not cover survival analysis or correlated data. If you wish to develop these skills, you should consider more advanced courses in biostatistics in the future.
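For readers who want a concrete picture of what "carrying out a multiple linear regression" involves, the following is a minimal sketch in Python. It is not part of the course materials, which use JMP and SAS. The function name, the variable names, and the tiny noise-free data set are all hypothetical and made up for illustration. The sketch fits an intercept and two predictors by solving the least-squares normal equations directly.

```python
def fit_linear_regression(rows, outcomes):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    rows: list of predictor tuples, one per subject (hypothetical data).
    outcomes: list of outcome values, one per subject.
    Returns [intercept, slope_1, slope_2, ...].
    """
    # Prepend a 1 to each row so the first coefficient is the intercept.
    design = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(design[0])

    # Build the augmented matrix [X'X | X'y].
    aug = []
    for i in range(p):
        row = [sum(d[i] * d[j] for d in design) for j in range(p)]
        row.append(sum(d[i] * y for d, y in zip(design, outcomes)))
        aug.append(row)

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]

    return [aug[i][p] / aug[i][i] for i in range(p)]


# Made-up, noise-free data: outcome = 100 + 0.5 * age - 8 * treated.
# Columns are (age in years, treated as a 0/1 dummy variable).
rows = [(40, 0), (50, 1), (60, 0), (70, 1), (55, 0), (45, 1)]
sbp = [100 + 0.5 * age - 8 * treated for age, treated in rows]

coefficients = fit_linear_regression(rows, sbp)
print([round(b, 4) for b in coefficients])
```

Because the illustrative outcome was generated without noise, the fitted coefficients recover the generating intercept and slopes up to rounding. With real clinical data the estimates would come with standard errors and tests, which statistical packages report and this sketch does not compute.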

Format:

1. Every Wednesday will start with a general lecture from 8:30 to 10:20. Use ChatRoll to convey your questions to the TA and I will answer them in real time, just as if you were present in class. (Infrequently, the Wednesday lecture may include a demonstration of the pertinent SAS code, as well as interpretation of the SAS output.) A homework assignment for the week will ask you to exercise those same techniques on a different data set. The homework will outline the necessary JMP and SAS commands, and the website will include an additional JMP handout with more general details about the approach.

2. I will be available by Webex to answer questions about the Wednesday lecture or about JMP on Fridays from 12:00-1:00. If you would like to join this group discussion, please let me know by 5PM on Thursday and I will let everyone know who is participating. If no one responds by 5PM, there will be no Friday session.

3. The following Monday, from 8:30 to 10:20, we will discuss the results of your data analyses in class. Again, use ChatRoll on the website to relay your questions through the TA.

4. I will be available by Webex to answer questions about the homework or about JMP on Tuesdays from 9:00-10:00. If you would like to join this group discussion, please let me know by 5PM on Monday and I will let everyone know who is participating. If no one responds by 5PM, there will be no Tuesday session.

5. If you have a question you would like to discuss individually, or if the timing of the group sessions is not possible for you, then please email me and suggest some times that we could communicate through Webex individually.

Grading: There will be two data sets that we will go through as a group, plus separate data sets for the midterm and final. Every Monday you must hand in a summary (no computing print-outs!) of the results of your data analysis. It will be recorded but not graded. Homeworks can be turned in late without penalty. Homeworks can be done collaboratively. Homeworks should be submitted through the “drop-box” on the class website.

The midterm is a project in which you must run a linear regression analysis of a data set that I will provide, but that has not been previously discussed in class. Midterms must be your work and no one else’s. Your midterm will be submitted to the drop-box and I will grade and comment on each one. After all midterms are submitted, there will be a short, on-line exam based entirely on the midterm. There will be nothing new to study, and no new analyses to carry out. The on-line exam will be monitored to ensure that the work is your own.

The final is a project in which you can choose your own data set and analyze it. You can work alone or in small groups of 2 or 3. If you have no data set available, then one will be provided. The final must reflect your own work and, if relevant, the work of your group, and no one else’s. Your grade will be based primarily on the midterm and final, and less so on the weekly assignments and your participation during the Monday sessions.

Textbooks: The class is based entirely on the notes that are available on the class website. If you would like additional or alternative reading materials, there are two textbooks that I would recommend. Only the first of these, “Applied Regression Analysis...”, is highly recommended; the other is recommended for those whose research interests take them toward binary endpoints. We will spend half of the course on linear regression; many of the ideas we learn there (e.g., model selection schemes, dummy variables, residuals) will carry over to logistic models. Hence, the regression textbook is the primary text. However, the other text is very readable and could serve as a nice reference in the future.

The two textbooks are:

- Applied Regression Analysis and Other Multivariable Methods, by Kleinbaum, Kupper, Muller, and Nizam

- Applied Logistic Regression, by Hosmer and Lemeshow

Class Materials: Notes for the class, as well as homework assignments, computing outlines, exams, and videos of the lectures are available on the class website. You should print out the notes in advance of the lectures, or have them available on your computer for viewing during the lecture.

Syllabus: “Wednesday” Lectures

September
   4  Scatter Plots, Correlation and Simple Regression (Ch. 5, 6)
  11  Theory: Simple Regression and ANOVA (Ch. 7)
  18  Theory and Models: Multiple Regression (Ch. 8, 9)
  25  Binary and Categorical Predictors (Ch. 12)

October
   2  Collinearity and Confounding (Ch. 11, 14)
   9  Model Selection Schemes (Ch. 16)
  16  Interaction Terms (Ch. 11, 12, 13) (Midterms Handed Out)
  21  Interaction Terms
  23  Residual Analysis (Ch. 14)
  28  Transformations of the Predictors (Ch. 15)
  30  Transformations of the Outcome Variable (Ch. 14) (Midterms Handed In)

November
   4  Median Regression
   6  Theory and Models: Logistic Regression (Ch. 22)
  13  Estimation & Interpretation: Logistic Regression (Ch. 22)
  20  Model Selection and Diagnostics for Logistic Regression
  25  Midterm Review

December
   2  Prediction Rules
   4  Ordinal Logistic Regression (Ch. 23) (Finals Handed Out)
   9  Propensity Scores (Notes to be handed out)
  11  Missing Data (Notes pp 142-155; article)
  13  Friday – Potential Catch-Up Lecture
  16  Presentations of Final Projects
  18  Presentations of Final Projects (Finals Handed In by 12/20)

Note: If you would prefer to get verbal feedback from your colleagues, you can defer the date of the final and present orally in January at a date to be arranged. This is totally optional and can be done through teleconference. You do not need to travel to Boston.