
Healthcare Solutions White Paper

Nuance AI Marketplace and mPower Radiology analytics: Empowering radiologists and accelerating AI adoption

Using clinical analytics for AI model validation, evaluation, and performance monitoring.

By Karen Holzberger, Senior Vice President and General Manager, Diagnostics, Nuance Communications
Woojin Kim, MD, Musculoskeletal Radiologist and Imaging Informaticist

In a few short years, perceptions of AI in radiology have shifted from seeing the technology as a threat to the profession to viewing it as an exciting innovation with the potential for faster and more accurate diagnosis, reduction of repetitive and mundane tasks, and faster scanning with improved image quality. Commercial AI developers are now offering applications for a range of medical imaging findings and modalities, and AI marketplaces or “app stores” simplify access to these AI models.

Multiple industry initiatives are also focused on the regulatory, medico-legal, ethical, data science, and other challenges of developing and using AI in clinical practice. They include activities conducted under the auspices of the American College of Radiology (ACR) and the Radiological Society of North America (RSNA), at radiology teaching institutions, and by hospital systems, healthcare IT vendors, and AI developers. Many efforts are focused on the quantity, quality, and use of data for AI model training, validation, and surveillance over time.

Radiologists as the “essential data scientists of medicine”

At a grassroots level, radiologists are fulfilling the role urged by the ACR in 2016 as the “essential data scientists of medicine,” actively guiding the development and adoption of AI in ways similar to how they shaped the implementation of PACS, RIS, and speech recognition in the past. One important area where they are taking the lead is in applying advanced clinical analytics to create their own datasets for AI model training and validation.

Clinical analytics are typically used to mine radiology report data for performance measurement, follow-up tracking, and other quality improvement programs. But analytics tools can also quickly define and extract data to generate and validate AI models. In addition, analytics reports can be used to assess clinical needs and the associated opportunities for AI use cases.

The use of internal data helps ensure that the results of AI models accurately represent the demographics of a hospital system’s patient population and the imaging systems and protocols in use. Internal data access and analytics also enable quicker AI model generation and evaluation, as well as performance comparisons across AI models. At the same time, patient data remains secure within the hospital’s systems.
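To make this analytics-driven workflow more concrete, the sketch below shows one way a practice might flag exported report text for a candidate local dataset using simple keyword rules. It is a minimal illustration only: the file layout, keywords, and function names are assumptions made for this example and do not represent mPower’s actual query interface.

```python
# Minimal sketch: flag radiology reports that mention intracranial hemorrhage
# so they can be reviewed and used as a local validation dataset.
# Assumes reports have been exported as a CSV with "accession" and "report_text"
# columns; the keyword and negation handling is deliberately simplistic.
import csv
import re

POSITIVE_TERMS = re.compile(
    r"\b(intracranial hemorrhage|subdural hematoma|subarachnoid hemorrhage)\b",
    re.IGNORECASE,
)
NEGATION_TERMS = re.compile(
    r"\bno (evidence of |acute )?(intracranial )?hemorrhage\b", re.IGNORECASE
)

def label_report(report_text: str) -> int:
    """Return 1 if the report likely describes a hemorrhage, else 0."""
    if NEGATION_TERMS.search(report_text):
        return 0
    return 1 if POSITIVE_TERMS.search(report_text) else 0

def build_candidate_dataset(path: str) -> list:
    """Read exported reports and attach a provisional label for radiologist review."""
    candidates = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            candidates.append({
                "accession": row["accession"],
                "provisional_label": label_report(row["report_text"]),
            })
    return candidates

if __name__ == "__main__":
    dataset = build_candidate_dataset("exported_reports.csv")
    positives = sum(r["provisional_label"] for r in dataset)
    print(f"{positives} of {len(dataset)} reports flagged for review")
```

In practice, the provisional labels would be confirmed by a radiologist before being treated as ground truth; keyword rules of this kind are only a starting point for curating a dataset that reflects the local patient population.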
A surge of activity

It was only in late 2016 that Geoffrey Hinton, “The Godfather of Deep Learning,” declared that deep learning algorithms would outperform human radiologists within a decade, and that “people should stop training radiologists now.” The same year, AI pioneer Andrew Ng told The Economist that it would be easier to replace radiologists with machines than it would be to replace their executive assistants. Nonetheless, radiologists not only continued but expanded research into potential uses of the technology. In April 2018, the editor of the journal Radiology noted “a remarkable transformation” underway in the investigation of AI to diagnose and detect disease. He noted that the journal had no publications about AI in 2015, compared with three in 2016 and 17 in 2017. In 2019, the RSNA launched a new journal, Radiology: Artificial Intelligence, dedicated to AI applications in medical imaging.

The publication boom across the profession has itself prompted studies. One, in the December 2019 issue of the American Journal of Roentgenology, examined global trends in radiology AI research published in scientific and medical journals and proceedings. “Our bibliometric analysis yielded 8813 radiology AI publications worldwide from 2000 to 2018,” wrote authors West et al., who called the growth in AI research “exponential.”

Much of the attention has focused on the development and performance of AI models for a variety of image characterization use cases. For example, AI models already on the Nuance AI Marketplace include:

• Triage algorithms from Aidoc for the detection and worklist prioritization of intracranial hemorrhage, pulmonary embolism, and spinal fractures.
• FDA-cleared algorithms from MaxQ for the detection of intracranial hemorrhage.
• An FDA-cleared application for the automatic detection of lung nodules from Riverain Technologies, which is working with Nuance to pilot its ClearRead CT algorithm.
• VIDA’s LungPrint Discovery algorithm, an automated AI-powered analysis of an inspiratory chest CT scan that identifies lung density abnormalities. VIDA is also collaborating with Nuance on a pilot.
• Four FDA-cleared AI algorithms from Zebra Medical Vision: a cardiology offering that calculates coronary calcium scores from gated CT scans, plus three worklist triage offerings covering intracranial hemorrhage (automatic detection of brain bleeds on standard, non-contrast brain CTs), pneumothorax detection using chest CR, DR, and/or DX images, and pleural effusion.

Understanding and responding to the data science challenges

Most AI models for medical imaging use machine learning algorithms. First introduced in 1959 by AI pioneer Arthur Samuel, the term machine learning describes computers that learn automatically from exposure to data; AI models learn by being trained with more data over time, with the goal of improved performance. Developers and users validate performance by comparing model output to “ground truth,” or known results in the real world. For example, a finding detected by a model can be compared to data in the medical record or radiology report. A radiologist’s own observations can also serve as a reference standard to supplement ground truth data for evaluating model output.

However, monitoring of AI model performance must continue beyond initial development and training because of the nature of machine learning algorithms. AI developers and users need to monitor continuously for performance decay and adjust the algorithm, the data, or both as needed to maintain optimal performance.

Radiologists should be aware of three important data science concerns when developing or applying AI models in clinical use:

– Brittleness: AI models can be brittle, or likely to produce more inaccurate results when exposed to new or different datasets. The term “overfitting” sometimes applies, where an algorithm essentially memorizes or over-learns the statistical characteristics or parameters of its training data and performs less well when exposed to new data. Developers address this with larger training datasets and multiple rounds of testing until accuracy warrants more generalized use. In radiology, brittle models perform well with data from a single hospital environment but stumble when used with different data from other facilities. In a study published in November 2018 in PLOS Medicine, researchers at the Icahn School of Medicine reported that AI model performance in diagnosing pneumonia on chest X-rays was significantly lower when using data from two other institutions.

– Concept drift: Concept drift describes changes to the properties of the “target variable,” or the result that the model is expected to produce. While the input data remain consistent, the condition or concept being predicted changes. AI models that predict human behavior or biological processes are particularly prone to concept drift. For example, a radiology practice using an AI model that predicts patient no-show rates for MRI exams may find that it becomes less accurate over time. That could be caused by seasonality, where no-show rates are higher during the winter holiday season than during the fall. Adding a seasonality variable to the model may improve its performance and slow the rate of model decay. Users need to monitor the quality of model output, adjust model inputs to account for the drift, re-label the training data, and retrain or “refresh” the model.

– Data drift: In data drift, the input data can change unexpectedly as new sources are used or as changes are made to the systems generating the data. In radiology, that can be caused by data from external sources, changes in imaging devices, new imaging protocols, and other factors. Again, model output quality should be monitored, data labeled with new classes, and the model retrained.

There are different approaches to detecting and remedying concept drift and data drift, and AI developers have various methods for anticipating these phenomena. Radiologists, in any case, need to be aware of both concerns when evaluating and using AI models in clinical practice. In fact, radiologists are instrumental in the ongoing surveillance of AI model performance and in helping AI developers diagnose and correct problems.
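As a concrete illustration of what such surveillance can look like, the sketch below tracks two simple signals: a rolling sensitivity computed against radiologist-confirmed ground truth (to catch performance decay or concept drift) and a statistical comparison of a numeric input feature against its training-time distribution (to catch data drift). The feature, thresholds, and data layout are assumptions made for this example; production monitoring would rely on the AI developer’s and institution’s own tooling.

```python
# Minimal monitoring sketch: watch an AI model for performance decay and data drift.
# Assumes each scored case records the model's prediction, the radiologist-confirmed
# label, and one numeric input feature (here, hypothetically, patient age).
import numpy as np
from scipy.stats import ks_2samp

def rolling_sensitivity(y_true: np.ndarray, y_pred: np.ndarray, window: int = 200) -> float:
    """Sensitivity (recall) over the most recent `window` cases."""
    y_true, y_pred = y_true[-window:], y_pred[-window:]
    positives = y_true == 1
    if positives.sum() == 0:
        return float("nan")  # no positive cases in the window
    return float((y_pred[positives] == 1).mean())

def data_drift_detected(train_feature: np.ndarray, recent_feature: np.ndarray,
                        alpha: float = 0.01) -> bool:
    """Flag drift when the recent feature distribution differs from training (two-sample KS test)."""
    _, p_value = ks_2samp(train_feature, recent_feature)
    return p_value < alpha

# Synthetic values standing in for a real monitoring feed.
rng = np.random.default_rng(0)
ground_truth = rng.integers(0, 2, size=500)
predictions = np.where(rng.random(500) < 0.9, ground_truth, 1 - ground_truth)  # ~90% agreement
train_ages = rng.normal(55, 15, size=2000)
recent_ages = rng.normal(62, 15, size=300)   # shifted population, e.g., a new referral source

sens = rolling_sensitivity(ground_truth, predictions)
drift = data_drift_detected(train_ages, recent_ages)
print(f"Rolling sensitivity: {sens:.2f}; data drift flagged: {drift}")
```

If either signal crosses an agreed threshold, the practice would review recent cases with the AI developer and decide whether retraining, input adjustments, or protocol changes are needed, mirroring the remedies described above.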
Radiologists are addressing the data science challenges and the need for AI model evaluation and performance surveillance within their own institutions, and at scale in collaborative efforts with other organizations. In one notable pilot, radiologists from seven leading healthcare organizations are partnering