Machine-Learning-Based Multiple Abnormality Prediction with Large-Scale Chest Computed Tomography Volumes Rachel Lea Draelosa,b, David Dovc, Maciej A. Mazurowskic,d,e, Joseph Y. Loc,d,f, Ricardo Henaoc,e, Geoffrey D. Rubind, Lawrence Carina,c,g aComputer Science Department, Duke University, LSRC Building D101, 308 Research Drive, Duke Box 90129, Durham, North Carolina 27708-0129, United States of America bSchool of Medicine, Duke University, DUMC 3710, Durham, North Carolina 27710, United States of America cElectrical and Computer Engineering Department, Edmund T. Pratt Jr. School of Engineering, Duke University, Box 90291, Durham, North Carolina 27708, United States of America dRadiology Department, Duke University, Box 3808 DUMC, Durham, North Carolina 27710, United States of America eBiostatistics and Bioinformatics Department, Duke University, DUMC 2424 Erwin Road, Suite 1102 Hock Plaza, Box 2721 Durham, North Carolina 27710, United States of America fBiomedical Engineering Department, Edmund T. Pratt Jr. School of Engineering, Duke University, Room 1427, Fitzpatrick Center (FCIEMAS), 101 Science Drive, Campus Box 90281, Durham, North Carolina 27708-0281, United States of America gStatistical Science Department, Duke University, Box 90251, Durham, North Carolina 27708-0251, United States of America Address for correspondence: Rachel Lea Draelos Email:
[email protected] Department of Computer Science 308 Research Drive Durham, North Carolina 27708-0129 United States Abstract Machine learning models for radiology benefit from large-scale data sets with high quality labels for abnormalities. We curated and analyzed a chest computed tomography (CT) data set of 36,316 volumes from 19,993 unique patients. This is the largest multiply-annotated volumetric medical imaging data set reported.