Synthetic Data in Machine Learning for Medicine and Healthcare


The proliferation of synthetic data in artificial intelligence for medicine and healthcare raises concerns about the vulnerabilities of the software and the challenges of current policy.

Richard J. Chen, Ming Y. Lu, Tiffany Y. Chen, Drew F. K. Williamson and Faisal Mahmood

As artificial intelligence (AI) for applications in medicine and healthcare undergoes increased regulatory analysis and clinical adoption, the data used to train the algorithms are undergoing increasing scrutiny. Scrutiny of the training data is central to understanding algorithmic biases and pitfalls. These can arise from datasets with sample-selection biases — for example, from a hospital that admits patients with certain socioeconomic backgrounds, or medical images acquired with one particular type of equipment or camera model. Algorithms trained with biases in sample selection typically fail when deployed in settings sufficiently different from those in which the training data were acquired1. Biases can also arise owing to class imbalances — as is typical of data associated with rare diseases — which degrade the performance of trained AI models for diagnosis and prognosis. And AI-driven diagnostic assistance tools relying on historical data would not typically detect new phenotypes, such as those of patients with stroke or cancer presenting with symptoms of coronavirus disease 2019 (COVID-19)2. Because the utility of AI algorithms for healthcare applications hinges on the exhaustive curation of medical data with ground-truth labels, the algorithms are only as effective and as robust as the data they are supplied with.

Therefore, large datasets that are diverse and representative (of the heterogeneity of phenotypes in the gender, ethnicity and geography of the individuals or patients, and in the healthcare systems, workflows and equipment used) are necessary to develop and refine best practices in evidence-based medicine involving AI3. To overcome the paucity of annotated medical data in real-world settings, synthetic data are being increasingly used. Synthetic data can be created from perturbations using accurate forward models (that is, models that simulate outcomes given specific inputs), physical simulations or AI-driven generative models. As with the development of computer vision algorithms for self-driving cars to emulate scenarios such as road accidents and harsh driving environments for which collecting data can be challenging4, in medicine and healthcare accurate synthetic data can be used to increase diversity in datasets and to increase the robustness and adaptability of AI models. However, synthetic data can also be used maliciously, as exemplified by fake impersonation videos (also known as deepfakes), which can propagate misinformation and fool facial recognition software5.
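Of the three creation routes mentioned above, the perturbation-based one is the simplest to picture: apply label-preserving transformations to real scans to create plausible variants. The sketch below is a minimal, hypothetical illustration in NumPy; none of this code is from the Comment, and the function and variable names are invented for illustration:

```python
import numpy as np

def perturb(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Create one synthetic variant of an image via label-preserving perturbations."""
    out = image.astype(np.float32)
    if rng.random() < 0.5:                          # random horizontal flip
        out = out[:, ::-1]
    out = np.rot90(out, k=int(rng.integers(0, 4)))  # random 90-degree rotation
    out = out + rng.normal(0.0, 0.01, size=out.shape)  # mild simulated acquisition noise
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
real_scan = rng.random((128, 128))                  # stand-in for a normalized grayscale image
synthetic_variants = [perturb(real_scan, rng) for _ in range(8)]
```

A realistic pipeline would replace these generic perturbations with physics-based forward models of the imaging process; the point here is only the shape of the workflow: one real sample in, many plausible synthetic samples out.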
The United States Food and Drug Administration (FDA) has put forward an approval pathway for AI-based software as a medical device (AI-SaMD). An increasing number of AI algorithms are being approved, with uses ranging from the detection of atrial fibrillation to the clinical grading of pathology slides6,7. By incorporating synthetic data emulating the phenotypes of underrepresented conditions and individuals, AI algorithms can make better medical decisions in a wider range of real-world environments. In fact, the use of synthetic data has attracted mainstream attention as a potential path forward for greater reproducibility in research and for implementing differential privacy for protected health information (PHI)8,9. With the increasing digitization of health data and the market size of AI for healthcare expected to reach US$45 billion by 2026 (ref. 8), the role of synthetic data in the health information economy needs to be precisely delineated in order to develop fault-tolerant and patient-facing health systems10. How do synthetic data fit within existing regulatory frameworks for modifying AI algorithms in healthcare? To what ends can synthetic data be used to protect or exploit patient privacy, and to improve medical decision-making? In this Comment, we examine the proliferation of synthetic data in medical AI and discuss the associated vulnerabilities and necessary policy challenges.

Fidelity tests

Beyond improved image classification and natural language processing, one promise of AI involves learning algorithms, also known as deep generative models, that can emulate how data are generated in the real world11,12. Generative adversarial networks (GANs) are a type of generative model that learn probability distributions of how high-dimensional data are likely to be distributed. GANs consist of two neural networks — a generator and a discriminator — that compete in a minimax game (that is, a game of minimizing the maximum possible loss) to fool each other. For instance, in a GAN being trained to produce paintings in the style of Claude Monet, the generator would be a neural network that aims to produce a Monet counterfeit that fools a critic discriminator network attempting to distinguish real Monet paintings from counterfeits. As the game progresses, the generator learns from the poor counterfeits caught by the discriminator, progressively creating more realistic counterfeits.
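In the standard formulation from the GAN literature (the Comment itself states the game only in words), the two networks optimize the value function

$$\min_G \max_D \; V(D,G) = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big],$$

where the discriminator $D$ pushes $D(x)$ toward 1 on real samples and $D(G(z))$ toward 0 on counterfeits, while the generator $G$ pulls $D(G(z))$ back toward 1. At the equilibrium of this game, the distribution of $G(z)$ matches $p_{\text{data}}$, which is what makes samples drawn from the generator usable as synthetic data.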
GANs have shown promise in a variety of applications, ranging from synthesizing paintings of modern landscapes in the style of Claude Monet to generating realistic images of skin lesions13 (Fig. 1, top), pathology slides14, colon mucosa15 and chest X-rays16–18 (Fig. 1, top), and in a range of imaging modalities19–22. Advancements in computer graphics and in information theory have driven progress in generative modelling from the generation of 28 × 28 (pixels per side) black-and-white images of handwritten digits to simulating life-like human faces with 1,024 × 1,024 high-fidelity images23. To show the capabilities of the AI-driven generation of synthetic medical data, we produced images of three histological subtypes of renal cell carcinoma (chromophobe, clear cell and papillary carcinoma) by training a GAN with 10,000 real images of each subtype, and then compared the performance of the model with another model trained using both real and synthetic data (Fig. 1, middle and bottom). The synthetic images generated by the GAN finely mimic the characteristic thin-walled 'chicken wire' vasculature of the clear-cell carcinoma subtype and the unique features of the other two subtypes. The GAN also improves the accuracy of classification. By closely mimicking real-world observational data, synthetic […]

Fig. 1 | Synthetic medical data in action. Top: synthetic and real images of skin lesions and of frontal chest X-rays. Middle: synthetic and real histology images of three subtypes of renal cell carcinoma (chromophobe, clear cell and papillary; scale bars, 40 μm). Bottom: areas under the receiver operating characteristic curve (AUC) for the classification performance on an independent dataset of the histology images by a deep-learning model trained with 10,000 real images of each subtype and by the same model trained with the real-image dataset augmented by 10,000 synthetic images of each subtype. Methodology and videos are available as Supplementary Information.

Challenges in adoption

The generation of synthetic data has garnered significant attention in medicine and healthcare13,14,17,32–34 because it can improve existing AI algorithms through data augmentation. For instance, among renal cell carcinomas, the chromophobe subtype is rare and accounts for merely 5% of all renal cell carcinoma cases35. By providing synthetic histology images of renal cell carcinoma as additional training input to a convolutional neural network, the detection accuracy of the subtype can be improved (Fig. 1, bottom).

However, the wider roles of synthetic data in AI systems in healthcare remain unclear. Unlike traditional medical devices, the function of AI-SaMDs may need to be adaptive to data streams that evolve over time, as is the case for health data from smartphone sensors36,37. Researchers may be tempted to use synthetic data as a stopgap for the fine-tuning of algorithms; however, policymakers may find it troubling that there are not always clinical-quality measures and evaluation metrics for synthetic data. In a proposed FDA regulatory framework for software modifications in adaptive AI-SaMDs, guidance for updating algorithms would mandate reference standards and quality assurance of any new data sources6. However, when generating synthetic data for rare or new disease conditions, there may not even be sufficient samples to establish clinical reference standards. As with other data-driven deep-learning algorithms, generative models are constrained by the size and quality of the training dataset used to model the data distribution, and models trained with biased datasets would still be biased toward overrepresented conditions. How can we assess whether synthetic data are emulating the correct phenotype and are free from artefacts that would bias the deployed AI-SaMDs? Current quantitative […]

[…] diversity in adolescents with de novo mutations makes the weights of the trained GAN model publicly available, the GAN could be used by a third party to synthesize […]
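The real-versus-augmented comparison summarized in Fig. 1 (bottom) reduces to a simple experimental template: train the same classifier once on real images only and once on real plus synthetic images, then score both on a held-out set of real images. The scikit-learn sketch below is a hypothetical stand-in (random arrays in place of histology images, and a linear model in place of a convolutional network), not the authors' pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Stand-ins for flattened images (binary label: rare subtype vs other).
X_real,  y_real  = rng.random((1000, 64)), rng.integers(0, 2, 1000)
X_synth, y_synth = rng.random((1000, 64)), rng.integers(0, 2, 1000)  # GAN output
X_test,  y_test  = rng.random((500, 64)),  rng.integers(0, 2, 500)   # real, held out

def held_out_auc(X, y):
    """Fit a classifier on (X, y) and report its AUC on the held-out real set."""
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    return roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

print("real only:       ", held_out_auc(X_real, y_real))
print("real + synthetic:", held_out_auc(np.vstack([X_real, X_synth]),
                                        np.concatenate([y_real, y_synth])))
```

The one design choice that matters, and that the figure follows, is that synthetic images appear only on the training side; the AUC is always measured on independent real data.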
Recommended publications
  • Human Detection of Machine Manipulated Media (arXiv:1907.05276)
    Human detection of machine manipulated media Matthew Groh1, Ziv Epstein1, Nick Obradovich1,2, Manuel Cebrian∗1,2, and Iyad Rahwan∗1,2 1Media Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA 2Center for Humans & Machines, Max Planck Institute for Human Development, Berlin, Germany Abstract Recent advances in neural networks for content generation enable artificial intelligence (AI) models to generate high-quality media manipulations. Here we report on a randomized experiment designed to study the effect of exposure to media manipulations on over 15,000 individuals’ ability to discern machine-manipulated media. We engineer a neural network to plausibly and automatically remove objects from images, and we deploy this neural network online with a randomized experiment where participants can guess which image out of a pair of images has been manipulated. The system provides participants feedback on the accuracy of each guess. In the experiment, we randomize the order in which images are presented, allowing causal identification of the learning curve surrounding participants’ ability to detect fake content. We find sizable and robust evidence that individuals learn to detect fake content through exposure to manipulated media when provided iterative feedback on their detection attempts. Over a succession of only ten images, participants increase their rating accuracy by over ten percentage points. Our study provides initial evidence that human ability to detect fake, machine-generated content may increase alongside the prevalence of such media online. Introduction The recent emergence of artificial intelligence (AI) powered media manipulations has widespread societal implications for journalism and democracy [1], national security [2], and art [3]. AI models have the potential to scale misinformation to unprecedented levels by creating various forms of synthetic media [4].
  • Editorial Algorithms
    March 26, 2021 Editorial Algorithms The next AI frontier in News and Information National Press Club Journalism Institute Housekeeping - You will receive slides after the presentation - Interactive polls throughout the session - Use the Q&A tool on Zoom to ask questions - If you still have questions, reach out to Francesco directly [email protected] or connect on twitter @fpmarconi What will you learn during today's session 1. How AI is being used and will be used by journalists and communicators, and what that means for how we work 2. The ethics of AI, particularly related to diversity, equity and representation 3. The role of AI in the spread of misinformation and 'fake news' 4. How journalists and communications professionals can avoid pitfalls while taking advantage of the possibilities AI provides to develop new ways of telling stories and connecting with audiences Poll time! What's your overall feeling about AI and news? A. It's the future of news and communications! B. I'm cautiously optimistic C. I'm cautiously pessimistic D. It's bad news for journalism Journalism was designed to solve information scarcity Newspapers were established to provide increasingly literate audiences with information. Journalism was originally designed to solve the problem that information was not widely available. The news industry went through a process of standardization so it could reliably source and produce information more efficiently. This led to the development of very structured ways of writing such as the "inverted pyramid" method, to organizing workflows around beats and to planning coverage based on news cycles. A new reality: information abundance The flood of information that humanity is now exposed to presents a new challenge that cannot be solved exclusively with the traditional methods of journalism.
  • AI for Health: Examples from Industry, Government, and Academia
    guidehouse.com. AI for Health: Examples from industry, government, and academia. Table of Contents: Abstract; Introduction to artificial intelligence; Introduction to health data; Molecules: AI for understanding biology and developing therapies (Fundamental research; Drug development); Patients: AI for predicting disease and improving care (Diagnostics and outcome risks; Personalized medicine); Groups: AI for optimizing clinical trials and safeguarding public health (Clinical trials; Public health); Conclusion and outlook. Abstract: This paper is directed toward a health-informed reader who is curious about the developments and potential of artificial intelligence (AI) in the health space, but could equally be read by AI practitioners curious about how their knowledge and methods are being used to advance human health. We present a brief, equation-free introduction to AI and its major subfields in order to provide a framework for understanding the technical context of the examples that follow. We discuss the various data sources available for questions of health and life sciences, as well as the unique challenges inherent to these fields. We then consider recent (past five years) applications of AI that have already had tangible, measurable impact to the advancement of biomedical knowledge and the development of new and improved treatments. These examples are organized by scale, ranging from the molecule (fundamental research and drug development) to the patient (diagnostics, risk-scoring, and personalized medicine) to the group (clinical trials and public health). Finally, we conclude with a brief summary and our outlook for the future of AI for health.
  • Pre-Training of Deep Bidirectional Protein Sequence Representations with Structural Information
    Pre-training of Deep Bidirectional Protein Sequence Representations with Structural Information. SEONWOO MIN1,2, SEUNGHYUN PARK3, SIWON KIM1, HYUN-SOO CHOI4, BYUNGHAN LEE5 (Member, IEEE), and SUNGROH YOON1,6 (Senior Member, IEEE). 1Department of Electrical and Computer Engineering, Seoul National University, Seoul 08826, South Korea; 2LG AI Research, Seoul 07796, South Korea; 3Clova AI Research, NAVER Corp., Seongnam 13561, South Korea; 4Department of Computer Science and Engineering, Kangwon National University, Chuncheon 24341, South Korea; 5Department of Electronic and IT Media Engineering, Seoul National University of Science and Technology, Seoul 01811, South Korea; 6Interdisciplinary Program in Artificial Intelligence, ASRI, INMC, and Institute of Engineering Research, Seoul National University, Seoul 08826, South Korea. Corresponding author: Byunghan Lee ([email protected]) or Sungroh Yoon ([email protected]). This research was supported by the National Research Foundation (NRF) of Korea grants funded by the Ministry of Science and ICT (2018R1A2B3001628 (S.Y.), 2014M3C9A3063541 (S.Y.), 2019R1G1A1003253 (B.L.)), the Ministry of Agriculture, Food and Rural Affairs (918013-4 (S.Y.)), and the Brain Korea 21 Plus Project in 2021 (S.Y.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. ABSTRACT Bridging the exponentially growing gap between the numbers of unlabeled and labeled protein sequences, several studies adopted semi-supervised learning for protein sequence modeling. In these studies, models were pre-trained with a substantial amount of unlabeled data, and the representations were transferred to various downstream tasks.
  • Biological Structure and Function Emerge from Scaling Unsupervised Learning to 250 Million Protein Sequences
    bioRxiv preprint doi: https://doi.org/10.1101/622803; this version posted May 29, 2019, under a CC-BY-NC-ND 4.0 International license. Alexander Rives, Siddharth Goyal, Joshua Meier, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, Rob Fergus. ABSTRACT In the field of artificial intelligence, a combination of scale in data and model capacity enabled by unsupervised learning has led to major advances in representation learning and statistical generation. In biology, the anticipated growth of sequencing promises unprecedented data on natural sequence diversity. Learning the natural distribution of evolutionary protein sequence variation is a logical step toward predictive and generative modeling for biology. To this end we use unsupervised learning to train a deep contextual language model on 86 billion amino acids across 250 million sequences spanning evolutionary diversity. The resulting model maps raw sequences to representations of biological properties without labels or prior domain knowledge. The learned representation space organizes sequences at multiple levels of biological granularity from the biochemical to proteomic levels. Unsupervised learning recovers information about protein structure: secondary structure and residue-residue contacts can be identified by linear projections from the learned representations. Training language models on full sequence diversity rather than individual protein families increases recoverable information about secondary structure.
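The claim that secondary structure "can be identified by linear projections from the learned representations" is what is usually called a linear probe: freeze the per-residue embeddings and fit a purely linear classifier on top. A schematic sketch follows, with random arrays standing in for the model's embeddings; nothing below uses the actual ESM code, and the dimensions are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
embeddings = rng.random((5000, 768))   # stand-in: one frozen embedding per residue
labels = rng.integers(0, 3, 5000)      # stand-in: helix / strand / coil classes

# A linear model only: if it scores well, the information is linearly decodable
# from the representations, which is the sense of "linear projections" above.
probe = LogisticRegression(max_iter=1000).fit(embeddings[:4000], labels[:4000])
print("probe accuracy:", probe.score(embeddings[4000:], labels[4000:]))
```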
  • Deepfaked Online Content Is Highly Effective in Manipulating People's Attitudes and Intentions
    Deepfaked online content is highly effective in manipulating people's attitudes and intentions. Sean Hughes, Ohad Fried, Melissa Ferguson, Ciaran Hughes, Rian Hughes, Xinwei Yao, & Ian Hussey. In recent times, disinformation has spread rapidly through social media and news sites, biasing our (moral) judgements of other people and groups. "Deepfakes", a new type of AI-generated media, represent a powerful new tool for spreading disinformation online. Although Deepfaked images, videos, and audio may appear genuine, they are actually hyper-realistic fabrications that enable one to digitally control what another person says or does. Given the recent emergence of this technology, we set out to examine the psychological impact of Deepfaked online content on viewers. Across seven preregistered studies (N = 2558) we exposed participants to either genuine or Deepfaked content, and then measured its impact on their explicit (self-reported) and implicit (unintentional) attitudes as well as behavioral intentions. Results indicated that Deepfaked videos and audio have a strong psychological impact on the viewer, and are just as effective in biasing their attitudes and intentions as genuine content. Many people are unaware that Deepfaking is possible; find it difficult to detect when they are being exposed to it; and most importantly, neither awareness nor detection serves to protect people from its influence. All preregistrations, data and code available at osf.io/f6ajb. The proliferation of social media, dating apps, news and gossip sites has brought with it the ability to learn about a person's moral character without ever having to interact with them in real life. […] copy of a person that can be manipulated into doing or saying anything (2). Deepfaking has quickly become a tool of […]
  • Machine Learning for Speech Recognition by Alice Coucke, Head of Machine Learning Research
    Machine Learning for Speech Recognition, by Alice Coucke, Head of Machine Learning Research. @alicecoucke [email protected] Outline: 1. Recent advances in machine learning 2. From physics to machine learning (AI in general; speech & NLP; startup) 3. Working at Snips (now Sonos). A lot of applications in the real world, quite a difference from theoretical work; one of the rare scientific fields where theory and practice are so intertwined. Recent Advances in Applied Machine Learning. Reinforcement learning: learning goal-oriented behavior within simulated environments. Go: AlphaGo (DeepMind, 2016); Starcraft II: AlphaStar (DeepMind); Dota 2: OpenAI Five (OpenAI). Play-driven learning for robots: sim-to-real dexterity learning (Google Brain); Project BLUE (UC Berkeley). Machine Learning for Life Sciences: deep learning applied to biology and medicine. Protein folding and structure prediction: AlphaFold (DeepMind); cardiac arrhythmia prediction from ECGs (Stanford); eye disease diagnosis (NHS, UCL, DeepMind); reconstructing speech from neural activity (UCSF); limb control restoration (Battelle, Ohio State Univ). Computer vision: high-level understanding of digital images or videos. GANs for image generation (Heriot-Watt Univ, DeepMind); 'common sense' understanding of actions in videos (TwentyBN, DeepMind, MIT, IBM…); GANs for artificial video dubbing (Synthesia); GAN for full-body synthesis (DataGrid). From physics to machine learning and back: a surge of interest from the physics community
  • Artificial Intelligence and Cybersecurity
    CEPS Task Force Report: Artificial Intelligence and Cybersecurity: Technology, Governance and Policy Challenges. Final Report of a CEPS Task Force. Rapporteurs: Lorenzo Pupillo, Stefano Fantin, Afonso Ferreira, Carolina Polito. Centre for European Policy Studies (CEPS), Brussels, May 2021. The Centre for European Policy Studies (CEPS) is an independent policy research institute based in Brussels. Its mission is to produce sound analytical research leading to constructive solutions to the challenges facing Europe today. Lorenzo Pupillo is CEPS Associate Senior Research Fellow and Head of the Cybersecurity@CEPS Initiative. Stefano Fantin is Legal Researcher at the Center for IT and IP Law, KU Leuven. Afonso Ferreira is Director of Research at CNRS. Carolina Polito is CEPS Research Assistant at the GRID unit, Cybersecurity@CEPS Initiative. ISBN 978-94-6138-785-1. © Copyright 2021, CEPS. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means – electronic, mechanical, photocopying, recording or otherwise – without the prior permission of the Centre for European Policy Studies. CEPS, Place du Congrès 1, B-1000 Brussels. Tel: 32 (0) 2 229.39.11; e-mail: [email protected]; internet: www.ceps.eu
  • The Nooscope Manifested: AI As Instrument of Knowledge Extractivism
    AI & SOCIETY, https://doi.org/10.1007/s00146-020-01097-6. ORIGINAL ARTICLE. The Nooscope manifested: AI as instrument of knowledge extractivism. Matteo Pasquinelli, Vladan Joler. Received: 27 March 2020 / Accepted: 14 October 2020. © The Author(s) 2020. Abstract: Some enlightenment regarding the project to mechanise reason. The assembly line of machine learning: data, algorithm, model. The training dataset: the social origins of machine intelligence. The history of AI as the automation of perception. The learning algorithm: compressing the world into a statistical model. All models are wrong, but some are useful. World to vector: the society of classification and prediction bots. Faults of a statistical instrument: the undetection of the new. Adversarial intelligence vs. statistical intelligence: labour in the age of AI. Keywords: Nooscope, Political economy, Mechanised knowledge, Information compression, Ethical machine learning. 1 Some enlightenment regarding the project to mechanise reason. The Nooscope is a cartography of the limits of artificial intelligence, intended as a provocation to both computer science and the humanities. Any map is a partial perspective, a way to provoke debate. Similarly, this map is a manifesto — of AI dissidents. The purpose of the Nooscope map is to secularize AI from the ideological status of 'intelligent machine' to one of knowledge instruments. Rather than evoking legends of alien cognition, it is more reasonable to consider machine learning as an instrument of knowledge magnification that helps to perceive features, patterns, and correlations through vast spaces of data beyond human reach. In the history of science and technology, this is no news; it has already been pursued
  • Overview of Current State of Research on the Application of Artificial Intelligence Techniques for COVID-19
    Overview of current state of research on the application of artificial intelligence techniques for COVID-19. Vijay Kumar1, Dilbag Singh2, Manjit Kaur2 and Robertas Damaševičius3,4. 1 Computer Science and Engineering Department, National Institute of Technology, Hamirpur, Himachal Pradesh, India; 2 School of Engineering and Applied Sciences, Bennett University, Greater Noida, India; 3 Faculty of Applied Mathematics, Silesian University of Technology, Gliwice, Poland; 4 Department of Applied Informatics, Vytautas Magnus University, Kaunas, Lithuania. ABSTRACT Background: To date, only a limited number of resources are available to predict and diagnose COVID-19 disease. The design of novel drug–drug interactions for COVID-19 patients is an open area of research, and the development of COVID-19 rapid testing kits remains a challenging task. Methodology: This review focuses on two prime challenges of the COVID-19 pandemic: the development of COVID-19 classification tools and of drug-discovery models for infected patients, with the help of artificial intelligence (AI) techniques such as machine learning and deep learning models. Results: In this paper, various AI-based techniques are studied and evaluated by means of applying them to the prediction and diagnosis of COVID-19 disease. This study provides recommendations for future research and facilitates knowledge collection and formation on the application of AI techniques for dealing with the COVID-19 epidemic and its consequences. Conclusions: AI techniques can be an effective tool for tackling the epidemic caused by COVID-19. They may be utilized in four main fields: prediction, diagnosis, drug design, and the analysis of social implications for COVID-19 infected patients.
  • TorsionNet: A Reinforcement Learning Approach to Sequential Conformer Search
    TorsionNet: A Reinforcement Learning Approach to Sequential Conformer Search. Tarun Gogineni1, Ziping Xu2, Exequiel Punzalan3, Runxuan Jiang1, Joshua Kammeraad2,3, Ambuj Tewari2, Paul Zimmerman3. 1Department of EECS, University of Michigan; 2Department of Statistics, University of Michigan; 3Department of Chemistry, University of Michigan. {tgog,zipingxu,epunzal,runxuanj,joshkamm,tewaria,paulzim}@umich.edu. Abstract: Molecular geometry prediction of flexible molecules, or conformer search, is a long-standing challenge in computational chemistry. This task is of great importance for predicting structure-activity relationships for a wide variety of substances ranging from biomolecules to ubiquitous materials. Substantial computational resources are invested in Monte Carlo and Molecular Dynamics methods to generate diverse and representative conformer sets for medium to large molecules, which are yet intractable to chemoinformatic conformer search methods. We present TorsionNet, an efficient sequential conformer search technique based on reinforcement learning under the rigid rotor approximation. The model is trained via curriculum learning, whose theoretical benefit is explored in detail, to maximize a novel metric grounded in thermodynamics called the Gibbs Score. Our experimental results show that TorsionNet outperforms the highest-scoring chemoinformatics method by 4x on large branched alkanes, and by several orders of magnitude on the previously unexplored biopolymer lignin, with applications in renewable energy. TorsionNet also outperforms the far more exhaustive but computationally intensive Self-Guided Molecular Dynamics sampling method. 1 Introduction. Accurate prediction of likely 3D geometries of flexible molecules is a long-standing goal of computational chemistry, with broad implications for drug design, biopolymer research, and QSAR analysis. However, this is a very difficult problem due to the exponential growth of possible stable physical structures, or conformers, as a function of the size of a molecule.
  • Accurate Protein Structure Prediction by Embeddings and Deep Learning Representations (arXiv:1911.05531)
    Accurate Protein Structure Prediction by Embeddings and Deep Learning Representations. Iddo Drori1,2, Darshan Thaker1, Arjun Srivatsa1, Daniel Jeong1, Yueqi Wang1, Linyong Nan1, Fan Wu1, Dimitri Leggas1, Jinhao Lei1, Weiyi Lu1, Weilong Fu1, Yuan Gao1, Sashank Karri1, Anand Kannan1, Antonio Moretti1, Mohammed AlQuraishi3, Chen Keasar4, and Itsik Pe'er1. 1 Columbia University, Department of Computer Science, New York, NY 10027; 2 Cornell University, School of Operations Research and Information Engineering, Ithaca, NY 14853; 3 Harvard University, Department of Systems Biology, Harvard Medical School, Boston, MA 02115; 4 Ben-Gurion University, Department of Computer Science, Israel, 8410501. Abstract. Proteins are the major building blocks of life, and actuators of almost all chemical and biophysical events in living organisms. Their native structures in turn enable their biological functions, which have a fundamental role in drug design. This motivates predicting the structure of a protein from its sequence of amino acids, a fundamental problem in computational biology. In this work, we demonstrate state-of-the-art protein structure prediction (PSP) results using embeddings and deep learning models for prediction of backbone atom distance matrices and torsion angles. We recover 3D coordinates of backbone atoms and reconstruct the full-atom protein by optimization. We create a new gold standard dataset of proteins which is comprehensive and easy to use. Our dataset consists of amino acid sequences, Q8 secondary structures, position specific scoring matrices, multiple sequence alignment co-evolutionary features, backbone atom distance matrices, torsion angles, and 3D coordinates. We evaluate the quality of our structure prediction by RMSD on the latest Critical Assessment of Techniques for Protein Structure Prediction (CASP) test data and demonstrate competitive results with the winning teams and AlphaFold in CASP13, and supersede the results of the winning teams in CASP12.