The Performance of Digital Microscopy for Primary Diagnosis in Human Pathology: a Systematic Review
Virchows Archiv (2019) 474:269–287. https://doi.org/10.1007/s00428-018-02519-z

REVIEW AND PERSPECTIVES

Anna Luíza Damaceno Araújo1 · Lady Paola Aristizábal Arboleda1 · Natalia Rangel Palmier1 · Jéssica Montenegro Fonsêca1 · Mariana de Pauli Paglioni1 · Wagner Gomes-Silva1,2,3 · Ana Carolina Prado Ribeiro1,2,4 · Thaís Bianca Brandão2 · Luciana Estevam Simonato4 · Paul M. Speight5 · Felipe Paiva Fonseca1,6 · Marcio Ajudarte Lopes1 · Oslei Paes de Almeida1 · Pablo Agustin Vargas1 · Cristhian Camilo Madrid Troconis7 · Alan Roger Santos-Silva1

Received: 9 August 2018 / Revised: 25 December 2018 / Accepted: 28 December 2018 / Published online: 26 January 2019
© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Abstract

Validation studies of whole slide imaging (WSI) systems produce evidence regarding digital microscopy (DM). This systematic review aimed to provide information about the performance of WSI devices by evaluating the intraobserver agreement reported in previously published studies, as the best evidence to elucidate whether DM is reliable for primary diagnostic purposes. In addition, this review delineates the reasons for the occurrence of discordant diagnoses. Scopus, MEDLINE/PubMed, and Embase were searched electronically. A total of 13 articles were included. The total sample of 2145 cases comprised a majority of 695 (32.4%) cases from dermatopathology, followed by 200 (9.3%) cases from gastrointestinal pathology. Intraobserver agreement showed excellent concordance, with values ranging from 87% to 98.3% (κ coefficient range 0.8–0.98). Ten studies (77%) reported a total of 128 disagreements; the remaining three studies (23%) did not report the exact number and nature of disagreements. Borderline/challenging cases were the most frequently reported reason for disagreement (53.8%).

Six authors reported limitations of the equipment and/or limited image resolution as reasons for discordant diagnoses. Within these articles, the reported pitfalls were as follows: difficulties in the identification of eosinophilic granular bodies in brain biopsies; eosinophils and nucleated red blood cells; and mitotic figures, nuclear details, and chromatin patterns in neuropathology specimens. A lack of image clarity was reported to be associated with difficulties in the identification of microorganisms (e.g., Candida albicans, Helicobacter pylori, and Giardia lamblia). However, the authors stated that the intraobserver variances do not derive from technical limitations of WSI. A lack of clinical information was reported by four authors as a source of disagreements. Two studies (15.4%) reported poor quality of the biopsies, specifically small size of the biopsy material or inadequate routine laboratory processes, as reasons for disagreements. One author (7.7%) indicated the lack of immunohistochemistry and special stains as a source of discordance. Furthermore, nine studies (69.2%) did not consider the performance of the digital method (limitations of the equipment, insufficient magnification/limited image resolution) as a reason for disagreements. To summarize the pitfalls of digital pathology practice and better address the root causes of diagnostic discordance, we suggest a Categorization for Digital Pathology Discrepancies to be used in further validation studies. Among 99 discordances, only 37 (37.3%) had the preferred diagnosis rendered by means of WSI. The risk of bias and applicability concerns were judged with QUADAS-2. Two studies (15.4%) presented an unclear risk of bias in the sample selection domain and two (15.4%) presented a high risk of bias in the index test domain. Regarding applicability, all included studies were classified as of low concern in all domains. The included studies were optimally designed to validate WSI for general clinical use, providing evidence with confidence. In general, this systematic review showed a high concordance between diagnoses achieved using WSI and conventional light microscopy (CLM), summarizes difficulties related to specific findings in certain areas of pathology, including dermatopathology, pediatric pathology, neuropathology, and gastrointestinal pathology, and demonstrates that WSI can be used to render primary diagnoses in several subspecialties of human pathology.

Keywords: Whole slide imaging · Intraobserver agreement · Systematic review

Electronic supplementary material: The online version of this article (https://doi.org/10.1007/s00428-018-02519-z) contains supplementary material, which is available to authorized users.

* Correspondence: Alan Roger Santos-Silva, [email protected]

1 Oral Diagnosis Department, Semiology and Oral Pathology Areas, Piracicaba Dental School, University of Campinas (UNICAMP), Av. Limeira, 901, Bairro Areião, Piracicaba, São Paulo 13414-903, Brazil
2 Dental Oncology Service, Instituto do Câncer do Estado de São Paulo, Faculdade de Medicina da Universidade de São Paulo (FMUSP), São Paulo, Brazil
3 Medical School of Nove de Julho University, São Paulo, Brazil
4 Faculdade de Odontologia, Fernandópolis, Universidade Brasil, São Paulo, Brazil
5 Unit of Oral and Maxillofacial Pathology, School of Clinical Dentistry, University of Sheffield, Sheffield, UK
6 Clinic, Pathology and Odontological Surgery Department, Minas Gerais Dental School, Federal University of Minas Gerais (UFMG), Belo Horizonte, Minas Gerais, Brazil
7 Dentistry Program, Health Sciences Faculty, Corporación Universitaria Rafael Nuñez (CURN), Cartagena de Índias, Colombia

Introduction
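The review quantifies intraobserver agreement both as percent concordance (87–98.3%) and as Cohen's κ (0.8–0.98), which discounts the agreement expected by chance between an observer's glass-slide and WSI reads of the same cases. As a minimal illustrative sketch (the diagnoses below are invented, not data from the review), κ can be computed as:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two paired ratings of the same cases."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    # Observed proportion of agreeing pairs
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from the marginal category frequencies
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical example: one pathologist's glass-slide vs WSI diagnoses
glass = ["benign", "malignant", "benign", "benign", "malignant", "benign"]
wsi   = ["benign", "malignant", "benign", "malignant", "malignant", "benign"]
print(round(cohens_kappa(glass, wsi), 3))  # → 0.667
```

Note that percent concordance alone can look high for a cohort dominated by one diagnosis; κ corrects for that, which is why validation studies typically report both.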
Validation studies regarding the feasibility of whole slide imaging (WSI) systems have been conducted by pathology laboratories in a wide range of subspecialties to produce solid evidence and support the use of this technology for several applications, including primary diagnosis. The guideline statement of the College of American Pathologists Pathology and Laboratory Quality Center (CAP-PLQC) for WSI system validation summarizes recommendations, suggestions, and expert consensus opinion about the methodology of validation studies in an effort to standardize the process. This guideline encompasses the need to include a sample set of at least 60 cases for one application and to establish diagnostic concordance between digital and glass slides for the same observer (intraobserver variability), with a minimum washout period of 2 weeks between views [1]. Surprisingly, the recommendations do not suggest a consecutive or random selection of cases or a need to blind evaluators, but they do highlight that the viewing order can be random or non-random.

Validation studies are cross-sectional studies by definition, and their designs have many methodological variations, which should be considered when evidence is assembled [2]. All these variations can lead to skewed estimates of test accuracy. The most important variation concerns how the sample was selected, included, and analyzed [3]. Some aspects regarding configuration, the purpose of the test, and the risks that prevent the test from serving its purposes may have been considered in validation studies, since performance may be influenced by analysis bias; reproducibility; washout period; response time; and the size, scope, and suitability of certain types of specimens. Beyond that, learning-curve and performance problems may be related to the method or to the pathologists [2]. Apparently, the order of analyses (digital or conventional) does not affect the interpretation in this context [3].

The most common biases in diagnostic studies are verification bias/detection bias/work-up bias (when the reference standard is not applied in … blinded). The methodological characteristics should be individually evaluated by domain, which represents the way that the study was conducted [4].

The most common problems identified in the design of previously published validation studies are case selection (the selected samples cover a narrow range of subspecialty specimens or known malignant diagnoses) and comparison of the study results with a "gold standard"/consensus diagnosis/expert diagnosis instead of establishing concordance by assessing the intraobserver agreement [5].

The FDA recently approved a WSI system for primary diagnosis purposes [6] and, even though this statement highlighted some assurance about the safety and feasibility of the digital system, only one device was tested and approved. Regardless of this achievement, individual validation studies conducted by each laboratory and customized for each service and WSI system used are still necessary, and will provide the best evidence to attest to the feasibility of digital pathology, especially if based on the CAP-PLQC guidelines.

Given the absence of a broader collective agreement on the use of WSI in a human pathology context, it is necessary to assemble evidence regarding the performance of digital microscopy in order to establish whether this technology can be used to provide a primary diagnosis. Therefore, this systematic review tested the diagnostic performance of WSI in human pathology. In addition, this review provides access to the main reasons for the occurrence of disagreements.

Materials and methods

The present systematic review was conducted following the guidelines of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) [7] and was registered in the PROSPERO database under protocol CRD42018085593. The review question defined was: "Is dig-