Clinical Chemistry 63:12 1847–1855 (2017)
Informatics and Statistics

Very Deep Convolutional Neural Networks for Morphologic Classification of Erythrocytes

Thomas J.S. Durant,1† Eben M. Olson,1† Wade L. Schulz,1 and Richard Torres1*

BACKGROUND: Morphologic profiling of the erythrocyte population is a widely used and clinically valuable diagnostic modality, but one that relies on a slow manual process associated with significant labor cost and limited reproducibility. Automated profiling of erythrocytes from digital images by capable machine learning approaches would augment the throughput and value of morphologic analysis. To this end, we sought to evaluate the performance of leading implementation strategies for convolutional neural networks (CNNs) when applied to classification of erythrocytes based on morphology.

METHODS: Erythrocytes were manually classified into 1 of 10 classes using a custom-developed Web application. Using recent literature to guide architectural considerations for neural network design, we implemented a “very deep” CNN, consisting of >150 layers, with dense shortcut connections.

RESULTS: The final database comprised 3737 labeled cells. Ensemble model predictions on unseen data demonstrated a harmonic mean of recall and precision metrics of 92.70% and 89.39%, respectively. Of the 748 cells in the test set, 23 misclassification errors were made, with a correct classification frequency of 90.60%, represented as a harmonic mean across the 10 morphologic classes.

CONCLUSIONS: These findings indicate that erythrocyte morphology profiles could be measured with a high degree of accuracy with “very deep” CNNs. Further, these data support future efforts to expand classes and optimize practical performance in a clinical environment as a prelude to full implementation as a clinical tool.

© 2017 American Association for Clinical Chemistry

1 Department of Laboratory Medicine, Yale University School of Medicine, New Haven, CT.
* Address correspondence to this author at: Department of Laboratory Medicine, 55 Park St., PS345D, New Haven, CT 06511. Fax 203-937-4746; e-mail [email protected].
† T.J.S. Durant and E.M. Olson contributed equally to this work.
Received May 18, 2017; accepted August 11, 2017.
Previously published online at DOI: 10.1373/clinchem.2017.276345
Downloaded from https://academic.oup.com/clinchem/article/63/12/1847/5612763 by guest on 29 September 2021

Microscopic examination of peripheral blood is standard practice in clinical medicine and serves an important role in the diagnosis of both hematologic and nonhematologic disease. It is uniquely capable of discerning clinically relevant morphologic features of hematopoietic cells, including abnormal leukocytes in lymphoma, leukemia, and dysplasia; intracellular parasites such as malaria and Anaplasma; and platelet changes characteristic of specific causes of thrombocytopenia and myeloproliferative disease (1, 2). Similarly, aberrant erythrocytic forms can be associated with renal and liver disease, hemoglobinopathies, toxins, and dysplasia, and are critically important for the detection of hemolytic disorders such as thrombotic thrombocytopenic purpura (2, 3). Automated hematology analyzers typically rely on a combination of laser-light scatter, fluorescence, and impedance, as well as other flow-based physical or cytochemical properties that are insensitive to these morphologic changes, making visual microscopy frequently necessary (4).

Unfortunately, gold-standard morphologic profiling of blood cells relies heavily on manual smear processing techniques and visual inspection with limitations from quality-control and economic scalability. Blood smear preparation and interpretation are thought to be negatively affected by observer bias, slide distribution errors, statistical sampling error, and recording errors, and also involve labor-intensive processes that require highly trained individuals, rendering them time and cost-prohibitive (5–8). Although the process remains in universal use, it is typically performed under the stewardship of institutional guidelines, to limit requests in the interest of conserving resources (9). As such, there has been considerable interest in improving sensitivity and specificity of automated analyzers for morphologic abnormalities, and in developing automated classification of images of peripheral blood smears (10).

Historically, efforts to automate morphologic classification have used statistical models that rely on input that is derived in a way similar to analysis by morphologists (11). Known as feature engineering, such attempts quantify predetermined morphologic features from digital images to serve as input to prediction algorithms. Although these models have demonstrated the potential to discern between basic morphologic classes, they generally distinguish only a small number of categories and show limited accuracy, particularly for red blood cells (11–13).

Implementation of feature engineering has also been studied as input for modern machine learning (ML)2 algorithms, such as artificial neural networks (ANNs), which have been an effective combination for leukocyte classification (14). A commercial system that relies on ANNs for leukocyte classification, CellaVision (CellaVision AB), was first Food and Drug Administration-approved for automated image analysis in 2001 and is widely available as an add-on for several automated hematology analyzer systems (15). However, the Food and Drug Administration-approved implementation is limited to human-assisted classification, as it requires confirmation of leukocyte classifications by a skilled operator. Such reliance on human confirmation can limit the potential benefits of ML technology in clinical medicine, and future applications may benefit from modern and more capable algorithms to pursue unassisted classification.

CellaVision also offers a non-Food and Drug Administration-approved image analysis solution for erythrocytes that is similarly based on ANNs, using 80 predetermined object features to classify cells from 17 morphologic classes, directed into 4 qualitative categories. However, published reports for CellaVision red blood cell classification demonstrate limited specificity and variable accuracy without reclassification by human operators (16, 17). Also, the intended use is described as qualitative rather than a quantitative metric, without reliable single-cell resolution, an aspect that limits its practical and clinical utility (16). Hence, there is considerable need for more robust analytic approaches for erythrocyte classification, capable of accurate quantitative analysis.

In recent years, important advancements in image classification performance have been made through the implementation of learned feature representations, with convolutional neural networks (CNNs) as a prominent example (18–21). Determination of which ML method to use for a given problem is a decision that may be guided by results from annual benchmark challenges in

CNNs rely on input from patterns that mimic human visual recognition fields, known as filters, which are mathematically convolved with the image of interest. In contrast to traditional ANNs taking input across an entire image simultaneously, convolutional filters operate on limited neighborhoods to produce “feature maps” of local patterns, which are combined by subsequent layers into more abstract features (20). This strategy of local hierarchical feature extraction allows models to be less sensitive to perturbations in image orientation, size, or rotation and reduces the number of examples needed to train a model, leading to model generalizability and better performance (18). Although CNNs are currently a predominant ML technique for complex image recognition tasks, their performance as applied to the morphologic classification of erythrocytes has not been described.

New data have demonstrated that the improved performance of CNNs can be positively influenced by increasing their “depth,” or number of layers. Early implementations of multilayer neural networks consisted of 5 to 10 layers of learned features (24, 25). Modifications in training techniques, connection patterns, and improved computational power have gradually allowed increasing network depth with the potential for improved performance. However, as the depth of CNNs increases, information used to train the network (i.e., adjustment of feature weights) can diminish, causing gradient-based methods to remain within local error minima and fail to converge (26, 27). To address this issue, recent methods have evolved in which “shortcut” connections are added to aid information flow between shallow and deep layers, allowing gradients to persist throughout the network and yielding improvements in performance on a variety of benchmark tasks (28).

Given these recent developments in ML, we hypothesized that CNNs could be used to develop high-performance classification models and serve to quantify erythrocyte morphologic profiles with high single-cell resolution accuracy. Automated classification of erythrocytes from digital images by capable ML approaches would augment the throughput, precision, and clinical value of erythrocyte morphologic analysis. With this in mind, we sought to determine the “off-the-shelf”