Many Hands Make Light Work: Crowdsourced Ratings of Medical Student OSCE Performance

by

Mark Grichanik

A dissertation submitted in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
Department of Psychology
College of Arts and Sciences
University of South Florida

Major Professor: Michael Coovert, Ph.D.
Walter Borman, Ph.D.
Michael Brannick, Ph.D.
Chad Dubé, Ph.D.
Matthew Lineberry, Ph.D.
Joseph Vandello, Ph.D.

Date of Approval: March 31, 2017

Keywords: OSCE, crowdsourcing, patient perspective, clinical skills, medical education

Copyright © 2017, Mark Grichanik

DEDICATION

This dissertation is dedicated to my family; it is their sacrifice and support that allow me to live my American Dream. To my mother, Stella Grichanik, whose dedication to her students stimulated my interest in education. To my father, Alex Grichanik, who taught me that if you're gonna do it, do it right. To my brother, Edward Grichanik, who inspires in me the courage to take the road less traveled. To my enlightened wife, Hanna Grichanik, who has gifted me the balance I didn't know I needed to savor work and life.

ACKNOWLEDGMENTS

Thank you to Michael Coovert, my major advisor, for his many years of guidance and support. Thank you for always offering me your wisdom, an open mind, and a stable platform from which I could pursue my passions.

Thank you to my committee members, Walter Borman, Michael Brannick, Chad Dubé, Matt Lineberry, and Joseph Vandello, for your service. Your expertise and constructive feedback helped shape this dissertation into a meaningful work.

Thank you to the people and organizations who paved the way for me to pursue a career in medical education: Michael Brannick, Wendy Bedwell, Jan Cannon-Bowers, the Center for Advanced Medical Learning and Simulation, the Morsani College of Medicine, and the Naval Air Warfare Center – Training Systems Division.

Thank you to Matt Lineberry, a dear friend, colleague, and mentor. I always need something to look forward to and someone to look up to; you satisfy the latter in abundance. You're always courteous in saying that you'll be chasing my coattails; for now, I'm chasing yours.

Thank you to my USF classmates for many fond memories and for their unwavering support; I look forward to the privilege of a long career alongside these inspiring colleagues.

Thank you to Adam Ducey, my close friend and partner through grad school; one of our projects in Selection helped germinate the idea for this dissertation.
Thank you to Onursal Onen, a truly selfless friend and roommate, who helped create a home with me in Florida and who supported me through the most trying of times.

Thank you to Zaur Lachuev, a friend that keeps on giving, for always boosting my motivation to finish my "dessert station."

Thank you to Rush Medical College for their generosity in sponsoring this research. Thank you to Keith Boyd for taking a chance and giving a budding IO psychologist an opportunity to make an impact. Thank you to the students and faculty of RMC, who inspire me to work in the service of the patients and families they heal and who make me feel righteous in my labor. Thank you to my RMC colleagues, for nurturing a creative workplace that allows me to innovate in pursuit of a more meaningful education for our future physicians.

Thank you to the RMC Simulation Team, Jamie Cvengros, Ellenkate Finley, Rebecca Kravitz, and the RMC standardized patients and videographers for your tremendous effort in producing the videos for this project. Thank you to Ellenkate Finley and Ryan Horn for your incisive reviews of this manuscript.

TABLE OF CONTENTS

List of Tables iv
List of Figures vi
Abstract vii
Chapter One: Introduction 1
  Reliable, "Expert" Raters Wanted: Defining Rater Expertise and the Boundaries of Rater Reliability 5
  Beyond "Blame-the-Rater": Improving the Quality of Clinical Performance Assessment Ratings 11
  Agreeing to Disagree: Rater Divergence as Information 18
  Lay Raters: Using Non-Experts to Accomplish Expert Work 22
  Crowdsourcing: An Open Call to Novices 25
  Sorry, the Doctor is Busy: Crowdsourced Ratings in Medicine 29
  Ratee Reactions to Alternative Rating Practices 34
Chapter Two: Present Study 38
  Potential Advantages of Crowdsourced OSCE Ratings 38
  Lay Rater Accessibility to Various Rating Tasks and Formats 41
  Student Reactions to Crowdsourced OSCE Ratings 43
  Research Questions 44
Chapter Three: Study 1 Method 46
  Participants 46
    Standardized patient raters 46
    Faculty raters (ICS task) 47
    Faculty raters (PE task) 47
    Crowd raters 48
  Measures 49
  Video Stimuli 51
  Collection of Ratings 53
    Faculty and SP ratings 53
    Crowd ratings 54
Chapter Four: Study 1 Results 57
  Data Preparation 57
  Crowd Demographics 58
  Timing and Cost 59
  Rating Distribution and Mean Performance Levels 62
  Accuracy 63
  Reliability 68
  Rater Reactions Questionnaire (RRX) 72
    Likert-type items 72
    Open-ended items 78
Chapter Five: Study 2 Method 124
  Participants 124
  Measures 124
  Feedback Packages 125
  Procedure 126
Chapter Six: Study 2 Results 129
  Data Preparation 129
  Rating and Feedback Quality Questionnaire (RFQQ) 129
    Likert-type items 129
    Forced choice package preference 132
  Student Reactions Questionnaire (SRX) 133
    Likert-type items 133
    Open-ended items 135
Chapter Seven: Discussion 141
  Who is "the Crowd"? 142
  How Long Did It Take and How Much Did It Cost to Collect Crowd Ratings? 142
  Were Crowd Ratings Accurate and Reliable? 145
  How Did Crowd Raters Feel About the Rating Tasks? 149
  Did Students Prefer Feedback from Crowd Raters? 152
  How Did Students React to Crowd Ratings? 153
  Limitations and Future Directions 158
  Implications and Conclusion 162
References 165
Appendices 176
  Appendix 1: IRB Approval Letters 177
  Appendix 2: Interpersonal and Communication Skills Questionnaire (ICSF) 183
  Appendix 3: Physical Exam Skills Checklist (PESC) 187
  Appendix 4: Interpersonal and Communication Skills Task Crowd Rater Reactions Questionnaire (RRX-ICSF) 190
  Appendix 5: Physical Exam Task Crowd Rater Reactions Questionnaire (RRX-PESC) 192
  Appendix 6: Demographics Questions by Participant Type 194
  Appendix 7: Sample Video Frames for ICS and PE Videos 198
  Appendix 8: Sample SP Case Door Chart 199
  Appendix 9: Survey Gizmo Layout for the Physical Exam Skills Task for Physician Faculty Raters 200
  Appendix 10: Survey Gizmo Layout for the Interpersonal and Communication Skills Task for Behavioral Science Faculty Raters 202
  Appendix 11: Survey Gizmo Layout for the Interpersonal and Communication Skills and Physical Exam Skills Tasks for Standardized Patient Raters 204
  Appendix 12: Survey Gizmo Layout for the Physical Exam Skills Task for Crowd Raters 206
  Appendix 13: Survey Gizmo Layout for the Interpersonal and Communication Skills Task for Crowd Raters 212
  Appendix 14: Sample Amazon Mechanical Turk HIT Posting 216
  Appendix 15: RRX-ICSF Content Analysis Comments by Theme 217
  Appendix 16: RRX-PE Content Analysis Comments by Theme 226
  Appendix 17: Rating and Feedback Quality Questionnaire for the ICS Task (RFQQ-ICS) 232
  Appendix 18: Rating and Feedback Quality Questionnaire for the PE Task (RFQQ-PE) 235
  Appendix 19: Student Reactions Questionnaire (SRX) 236
  Appendix 20: Sample Interpersonal and Communication Skills Feedback Package Presentation 241
  Appendix 21: Sample Physical Exam Skills Feedback Package Presentation 246
  Appendix 22: Survey Gizmo Layout for the Student Reactions Study 248
  Appendix 23: SRX Content Analysis. Advantages Comments by Theme 255
  Appendix 24: SRX Content Analysis. Disadvantages Comments by Theme 259

LIST OF TABLES

Table 1: Video and task properties 81
Table 2: Rater schematic 82
Table 3: Comparison of Crowd rater and U.S. demographics 83
Table 4: Task timing (video x task x rater type) 84
Table 5: Timing to complete rating packages (video x task x rater type) 85
Table 6: Response distribution for ICSF checklist items (item x rater type x video) 86
Table 7: Response distribution for ICSF global items (item x rater type x video) 87
Table 8: Descriptive statistics for ICSF (item x rater type x video) 91
Table 9: Response distribution for PESC (item x rater type x video) 94
Table 10: Raw agreement for ICSF (item x rater type) 96
Table 11: Average deviation for ICSF items (item x rater type) 97
Table 12: Average deviation for ICSF items (rater type x skill level) 98
Table 13: Raw agreement for PESC items (item x rater type) 99
Table 14: Variance components from ICSF generalizability studies (item x rater type) 100
Table 15: Phi coefficients (single rater, max raters, cost equivalent) from decision studies for ICSF items (item x rater type) 101
Table 16: Phi coefficients (minimum