
Deep Learning with Microfluidics for Biotechnology

Jason Riordon, Dušan Sovilj, Scott Sanner¹, David Sinton² & Edmond W. K. Young³*

Department of Mechanical and Industrial Engineering, University of Toronto, 5 King’s College Road, Toronto, ON, Canada, M5S 3G8

*Correspondence: [email protected] (E.W.K. Young)

¹http://d3m.mie.utoronto.ca/ ²http://www.sintonlab.com/ ³http://ibmt.mie.utoronto.ca/

Keywords: Deep Learning, Machine Learning, Microfluidics, Lab-on-a-Chip

Abstract: Advances in high-throughput and multiplexed microfluidics have rewarded biotechnology researchers with vast amounts of data. However, our ability to analyze complex data effectively has lagged. Over the last few years, deep artificial neural networks leveraging modern graphics processing units have enabled the rapid analysis of structured input data - sequences, images, videos - to predict complex outputs with unprecedented accuracy. While there have been early successes in flow cytometry, for example, the extensive potential of pairing microfluidics (to acquire data) and deep learning (to analyze data) to tackle biotechnology challenges remains largely untapped. Here we provide a roadmap to integrating deep learning and microfluidics in biotechnology labs that matches computational architectures to problem types, and provide an outlook on emerging opportunities.


The Challenge of Processing Data from Microfluidics

Over the past two decades, microfluidics (see Glossary) has shown great promise in enhancing biotechnology applications by leveraging small sample volumes, short reaction times, parallelization and manipulation at sample-relevant scales [1]. Major milestones in the field are indicated in Figure 1. By demonstrating an ability to efficiently perform a wide range of functions within biotechnology laboratories, including DNA and RNA sequencing [2,3], single-cell omics [4,5], antimicrobial resistance screening [6] and drug discovery [7], microfluidic technologies have revolutionized the way we approach experimental biology and biomedical research. Microfluidic technologies have also demonstrated an ability to capture, align, and manipulate single cells for cell sorting and flow-based cytometry [8–10], mass [11] and volume [12] sensing, phenotyping [13], cell fusion [14], cell capture (e.g., circulating tumor cells [15,16]), and cell movement (e.g., sperm motility [17–20]). Microfluidic high-throughput screening using droplets benefits from rapid reaction times, high sensitivity and low reagent cost [21]. Microfluidics has also enabled the capture and monitoring of zebrafish embryos [22,23], the high-throughput and rapid imaging of the nematode worm Caenorhabditis elegans [24], and the ordering and orientation of Drosophila embryos [25].

While microfluidic applications in biotechnology vary widely, the real product in all cases is data. For example, a typical time-lapse experiment can readily generate >100 GB of data (e.g., 100 cells × 4 images/cell (1 brightfield + 3 fluorescence bands) × 6 time points/hr × 48 hrs × 1 MB/image). However, the generation of data using high-throughput, highly parallelized microfluidic systems has far outpaced researchers’ abilities to effectively process this data – an analysis bottleneck.
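As a back-of-envelope check of this estimate, the multiplication can be scripted directly (a minimal Python sketch; the counts and per-image size are simply the example values quoted above):

# Dataset-size estimate for the time-lapse example above (illustrative values).
cells = 100
images_per_cell = 4           # 1 brightfield + 3 fluorescence bands
timepoints_per_hour = 6
hours = 48
mb_per_image = 1

total_mb = cells * images_per_cell * timepoints_per_hour * hours * mb_per_image
print(total_mb / 1024, "GB")  # ~112.5 GB, i.e., >100 GB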


Machine learning (Box 1) is a class of artificial intelligence-based methods that enable algorithms to learn without direct programming. While traditional machine learning has long offered data processing capabilities, the advent of deep learning methods represents a step increase in the ability to handle structured data such as images or sequences. Deep learning architectures can now exploit structured data applicable to a variety of research fields [26,27], and leverage inherent data structures. Deep learning has achieved impressive recent gains in analyzing a variety of data types: images [28–31], natural language translation [32–34], speech data [35,36], text documents [37,38] and computational biology [39,40]. These major gains have been powered by increased computational power from GPUs, open-source frameworks (e.g., TensorFlow), and distributed computing.

Traditional machine learning has already been paired with microfluidics for biotechnology applications, for example in disease detection within liquid biopsies, as comprehensively reviewed by Ko and colleagues [41]. This marriage between traditional machine learning and microfluidics has yielded (i) improvements in analysis methods, including single-cell lipid screening [42], cancer screening [43,44] and cell counting [45], and (ii) advances in microfluidic design and flow modeling, including predicting water-in-oil emulsion sizes [46]. Applications that combine deep learning with microfluidics, however, are only beginning to emerge, with label-free cell classification as a prime example. Cells have been identified using deep learning architectures that either process pre-extracted features [47] or use raw images as inputs and fully leverage a deep network’s ability to extract relevant features for improved prediction [48–50]. Deep learning methods have also been used to produce well-defined flows by tuning channel geometry [51]. While these deep learning demonstrations indicate potential, we see far broader biotechnology applicability in the near future.


In this perspective we outline the many opportunities presented by the marriage of deep learning (to analyze data) and microfluidics (to generate data) for biotechnology applications. We provide a roadmap to integrating deep learning strategies and microfluidics applications, carefully pairing problem types to artificial neural network architectures (Figure 2, Key Figure). We begin with advances in processing simple unstructured data, for which there is much precedent in biotechnology, and transition to complex sequential and image data, for which there are abundant opportunities. We provide practical implementation guidelines for biotechnology researchers and research leaders new to deep learning, including a summary of how neural networks function and tips on getting started (Box 2). We end with an outlook on emerging opportunities that will have a profound impact on biotechnology research in the near and medium term.

Deep Learning Architectures for Microfluidic Challenges

Deep neural architectures have been developed and applied to a wide range of challenges, including single-molecule science [52], computational biology [40], and biomedicine [53,54]. Here we highlight strategies that have successfully been applied within the microfluidics community, and where such approaches could be applied in the near future. The most relevant architecture categories (input-to-output data types) are listed alongside existing microfluidic applications in Figure 2. We progress from the simple unstructured-to-unstructured case to the more complex image-to-image combination, and highlight microfluidic achievements and opportunities.

Unstructured-to-Unstructured Neural Networks – e.g., for Classifying Cells Based on Manually-Extracted Cell Traits

Unstructured-to-unstructured neural networks represent the simplest neural network case – a type of architecture that typically falls into the traditional machine learning category, but where deep learning can often be beneficial. In a typical biotechnology application, a neural network designed to handle unstructured inputs could be used to classify flowing cells within a microfluidic channel, where the input x would be a vector of cell traits (e.g., circularity, perimeter and major axis length) and the output y would be a class (e.g., white blood cell or colon cancer cell), as in the label-free cell segmentation and classification example in Figure 3A by Chen and colleagues (simplified – only three traits are shown for clarity). “Label-free” here refers to physical labels such as fluorophores, rather than feature labels as used in machine learning terminology. The authors used time-stretch quantitative phase microscopy [47] to create a rich hyperdimensional feature space of cell traits, trained their network using supervised learning, and achieved an accuracy and consistency in cell classification that surpassed traditional machine learning approaches, including logistic regression, naive Bayes and support vector machines.
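For readers seeking a concrete starting point, the sketch below shows what such an unstructured-to-unstructured classifier might look like in Keras (TensorFlow). The feature matrix X and label vector y are hypothetical placeholders standing in for traits extracted upstream, not data from the cited study:

# A minimal sketch of an unstructured-to-unstructured classifier in Keras.
import numpy as np
from tensorflow import keras

X = np.random.rand(1000, 3)              # circularity, perimeter, major axis length
y = np.random.randint(0, 2, size=1000)   # 0 = one cell class, 1 = the other

model = keras.Sequential([
    keras.Input(shape=(3,)),
    keras.layers.Dense(32, activation="relu"),    # hidden layers learn nonlinear
    keras.layers.Dense(32, activation="relu"),    # combinations of the traits
    keras.layers.Dense(1, activation="sigmoid"),  # probability of the positive class
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=20, batch_size=32, validation_split=0.2)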

Whereas the unstructured-to-unstructured example above utilized a deep network, most microfluidic work using simple unstructured-to-unstructured networks has not necessitated deep learning. Various imaging modalities (e.g., light microscopy [55], digital holographic microscopy [44]) have been used to image individual cells, and various cell features (e.g., cell size, perimeter, eccentricity, image intensities [56]) have been used as inputs in supervised learning algorithms to achieve, for example, label-free cell classification [44]. Traditional machine learning algorithms have also been used to classify exosomes and identify disease [57]. An excellent example of unstructured input (i.e., cell features extracted from images) to unstructured output is from the Lu group at Georgia Tech, who used microfluidic capture arrays to position and orient C. elegans for imaging synaptic puncta patterns within the organism [58]. The authors discovered highly subtle but important phenotypic differences between worms that revealed genetic differences that were previously hidden. For additional biotechnology examples and applications of machine learning, we refer the reader to an excellent review by Vasilevich and colleagues [59].

Sequence-to-Unstructured Neural Networks – e.g., for Microfluidic Soft Sensor Characterization

Sequential data is prevalent in microfluidics – from measuring an electrical signal to characterizing a droplet generator, microfluidic measurements are often produced as part of a time series. When data has a sequential structure, there are alternative deep network architectures that can better reflect and exploit the sequences. These deep neural networks are generally referred to as recurrent neural networks (RNNs) and are shown in Figure 3B-C. The first case of interest is sequence-to-unstructured, whereby an input sequence is assigned a single output value (Figure 3B). In this architecture, recurrent weights connect the hidden layer to itself and permit training through a gradient descent technique known as backpropagation-through-time [60]. The example shown in Figure 3B is reproduced (with simplification) from Han and colleagues [61], where an RNN was used to characterize a microfluidic soft sensor. An analog voltage was measured over time at the ends of a microchannel filled with a liquid metal, and the signal was monitored as a stimulus was applied at various locations along the channel. The RNN was trained not only to identify the pressure applied, but also to discern the location of the stimulus along the channel.
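A minimal sequence-to-unstructured sketch in Keras, loosely patterned on this soft-sensor example, might look as follows; the sequence length, the single voltage channel, and the two regression targets (pressure and location) are illustrative assumptions rather than the architecture of ref. [61]:

# A minimal sketch of a sequence-to-unstructured RNN: a time series in, one prediction out.
import numpy as np
from tensorflow import keras

X = np.random.rand(500, 200, 1)   # 500 recordings, 200 time steps, 1 voltage channel
y = np.random.rand(500, 2)        # targets: [applied pressure, stimulus location]

model = keras.Sequential([
    keras.Input(shape=(200, 1)),
    keras.layers.LSTM(64),        # recurrent layer; returns only the final hidden state
    keras.layers.Dense(2),        # one regression head for both targets
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=10, batch_size=32)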

Sequence-to-Sequence Neural Networks – e.g., for DNA Base Calling


Another case of interest is sequence-to-sequence (Figure 3C). Here, rather than classify an entire sequence of events as corresponding to a given class, the output itself is a sequence – for example in DNA base calling applications, where current events are identified in order [62]. Boža and colleagues demonstrated an open-source DNA base caller with an RNN structure to segment current change events from MinION nanopore data into DNA base pairs [62]. Such an architecture also has great potential in applications where the growth of a cell (via either volume [12] or mass [63]) is monitored over extended periods, and any individual size measurement would likely benefit from taking previous measurements into consideration. Given the strong correlation between pulses (i.e., a cell that is growing), using an RNN would likely substantially increase measurement accuracy. In tagging problems such as this example, each element of the input sequence is annotated, i.e., every pulse amplitude corresponds to the passage of a cell, or each current event corresponds to a certain base pair. Architectures have also been developed for cases where the output sequence is not the same length as the input sequence, for example in language translation (where an output sentence need not have the same number of words as an input text). To tackle this challenge, a more powerful architecture is employed [64], namely the encoder-decoder model. In this approach, there are two separate RNNs for two distinct phases. The first network, the encoder, is tasked with reading the input sequence and producing an encoded state that is a concise lower-dimensional representation of the sequence. The decoder RNN is then modified to have outputs passed further down the sequence. This encoder-decoder architecture is now the backbone of most sequence-to-sequence learning models in language translation and speech-to-text transcription, and has begun to impact biomedical research, as evidenced by recent work on predicting DNA-binding proteins from primary sequences [65]. Since training RNNs is computationally expensive, a significant effort has gone into developing more efficient structures and adapting them to the temporal domain. Recently, several works have explored using CNN-based sequence-to-sequence learning [66] and reported comparable or improved results over RNN-based models with a substantial decrease in training times.
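To make the tagging variant concrete, the following Keras sketch assigns one class per element of an input sequence, in the spirit of base calling; the shapes and the four-class (A/C/G/T) output are illustrative assumptions, not the DeepNano architecture:

# A minimal sketch of sequence-to-sequence tagging: one label per input element.
import numpy as np
from tensorflow import keras

X = np.random.rand(100, 300, 1)               # 100 reads, 300 current events each
y = np.random.randint(0, 4, size=(100, 300))  # a base label for every event

model = keras.Sequential([
    keras.Input(shape=(300, 1)),
    keras.layers.Bidirectional(keras.layers.LSTM(64, return_sequences=True)),
    keras.layers.TimeDistributed(keras.layers.Dense(4, activation="softmax")),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(X, y, epochs=5, batch_size=16)

Because return_sequences=True emits a hidden state at every time step, input and output lengths match here; handling unequal lengths is precisely where the encoder-decoder model described above becomes necessary.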

As identified in a recent review by Angerer and colleagues, single-cell transcriptomics has thus far relied on simpler models, but stands to benefit significantly from deep learning approaches [67]. Deng and colleagues recently demonstrated scScope, a deep learning architecture capable of identifying cell-type composition from single-cell RNA sequence gene-expression profiles, leveraging a recurrent network structure [68]. Angermueller and colleagues developed DeepCpG, a deep learning architecture capable of predicting single-cell methylation states, here using a bidirectional gated recurrent network, a variant of the RNN [40]. Not all networks dealing with sequence data rely on RNNs: DeepVariant, a network that first translates genomic data into images, was developed by researchers at Verily to analyse genetic variation within genomes using CNNs [69]. For further reading on how deep learning stands to transform genetics and biological networks, we refer the reader to a thoughtful review by Camacho and colleagues [70]. We also refer the reader to an excellent review on the multi-omics of single cells by Bock and colleagues [71].

Image-to-Unstructured Neural Networks – e.g., for Classifying Cells Based on Images Directly

Deep networks have also been developed to handle spatially-distributed data, or images – useful, for example, in classifying cells directly, without requiring prior manual extraction of traits. Image analysis is key to most microfluidic experiments – multiplexing, rapid throughput and the planar geometry of many microfluidic networks lead to the production of vast quantities of images. Image-to-unstructured neural networks offer the promise of handling such data directly, without requiring time-consuming pre-extraction of relevant features. Further, by embedding feature extraction layers within a deep network, the selection process is no longer subject to human biases. Thus, it is not only a deep network’s ability to accelerate classification tasks (by incorporating the feature-extraction step), but also its ability to predict what features are relevant, that makes deep learning approaches so powerful.

Convolutional neural networks (CNNs) are the backbone of all image-based deep learning (Figure 4). Their architecture is such that nodes are restricted to connect to a portion of the input image. Convolutional blocks (or filters, shown as light green layers in Figure 4) operate on all parts of an image by sliding a small window region along the image, outputting the weighted sum of pixel values for that filter within the region, and applying a nonlinear transformation. These convolutional layers can be combined with pooling (subsampling) layers, which extract the most dominant values in the feature maps and reduce their resolution (dark green layers, Figure 4). The process is repeated several times until a specific filter map resolution is achieved. In summary, the initial stages of the pipeline are designed to tackle spatial data, and convolutions act as special feature detectors (such as edges or lines) [72], thereby teaching the network to associate proximate pixels in space. The output of these layers is then a low-dimensional embedded representation of an image that constitutes a far better representation of the image content than other feature-extraction methods [72,73]. For image-to-unstructured applications specifically, for example in classification tasks, a fully connected stack of layers is used at the end of the network (Figure 4A). Such an architecture has been applied in several label-free microfluidic cell cytometry applications [48–50]. In the flow cytometry example by Heo and colleagues (Figure 4A), a CNN is used to classify a binary population of lymphocytes and red blood cells at high throughput [48]. In the second example, Stoecklein and colleagues show how precise flow profiles can be generated using clever pillar distributions along the channel. Interestingly, the “what flow shape will result from a given geometry?” problem is easily solved by a computational flow model, whereas deep learning can be used to solve the much more difficult inverse problem: “what geometry is required to produce a desired flow shape?” [51]. Recently, an image-to-unstructured architecture has also been used to quantify bacterial growth within microfluidic culture systems [74]. Kim and colleagues used a CNN to calculate the concentration of Pseudomonas aeruginosa within on-chip microfluidic cultures using only culture images as inputs.
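A minimal image-to-unstructured CNN sketch in Keras is shown below: convolution and pooling layers embed the image, and a fully connected stack produces the class prediction. Image dimensions and the binary output (e.g., lymphocyte versus red blood cell) are illustrative assumptions:

# A minimal sketch of an image-to-unstructured CNN classifier.
import numpy as np
from tensorflow import keras

X = np.random.rand(1000, 64, 64, 1)      # 1,000 single-cell grayscale images
y = np.random.randint(0, 2, size=1000)   # binary class labels

model = keras.Sequential([
    keras.Input(shape=(64, 64, 1)),
    keras.layers.Conv2D(16, 3, activation="relu"),  # convolutional filters (light green)
    keras.layers.MaxPooling2D(),                    # pooling/subsampling (dark green)
    keras.layers.Conv2D(32, 3, activation="relu"),
    keras.layers.MaxPooling2D(),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation="relu"),      # fully connected stack
    keras.layers.Dense(1, activation="sigmoid"),    # class prediction
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32)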

Image-to-Image Neural Networks – e.g., for Cell Segmentation

Image-to-image applications include predicting the next frame in a video, predicting the depth of an image, or, most notably, segmenting images – where cell contours, for example, can be learned and applied to produce fully segmented images. In the nerve cell segmentation application in Figure 4B by Zaimi and colleagues, nerve cell images are segmented into regions marking axon (blue), myelin (red) and background (black) [75] using a CNN tailored for this purpose. In a semantic image segmentation problem, the goal is to map each pixel within the input image to one of many classes present in the corpus (e.g., membrane, nucleus, and cytoplasm) [76]. Here, many classifications are required (one classification per pixel, rather than one classification per image). Two broad approaches have been proposed using fully convolutional networks. The first approach is to use an encoder-decoder, whereby a reverse series of operations (upsampling and deconvolutions) is performed to reconstruct a segmented image from its embedding in the initial convolutional and pooling layers [73,77,78] (Figure 4B). The dashed arrows indicate a popular “U-Net” modification, where a connected network can be applied at different scales and subsequently combined [79,80]. In the microscopy field, the U-Net has been successfully applied to segment and count cells [77], and was employed by Zaimi and colleagues in Figure 4B (simplified here) [75]. Alternatively, a spatial pyramidal pooling approach can be used [81–83]. Spatial pyramid pooling focuses on capturing context at several scales from the image-level features. That is, given the low-level representations, several filters at different resolutions are applied in order to capture objects at multiple scales. These filtered representations are joined together and passed to the final convolutional layer. Spatial pyramidal methods produce a representation that explicitly tackles different scales with specific filters placed in the pyramidal component. Although both encoder-decoder and pyramidal approaches are applicable, a pyramidal architecture separates the tasks of image downscaling and convolution, which often leads to better segmentation at multiple image scales.
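The encoder-decoder idea can be sketched compactly in Keras: convolutions and pooling build a low-resolution embedding, and upsampling layers reconstruct a per-pixel classification. This minimal sketch omits the skip connections of a true U-Net, and all sizes and the three classes (e.g., axon, myelin, background) are illustrative assumptions:

# A minimal sketch of an image-to-image encoder-decoder for semantic segmentation.
from tensorflow import keras

inputs = keras.Input(shape=(128, 128, 1))
x = keras.layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
x = keras.layers.MaxPooling2D()(x)                  # encoder: downsample to an embedding
x = keras.layers.Conv2D(32, 3, padding="same", activation="relu")(x)
x = keras.layers.MaxPooling2D()(x)
x = keras.layers.Conv2D(32, 3, padding="same", activation="relu")(x)
x = keras.layers.UpSampling2D()(x)                  # decoder: upsample back to full size
x = keras.layers.Conv2D(16, 3, padding="same", activation="relu")(x)
x = keras.layers.UpSampling2D()(x)
outputs = keras.layers.Conv2D(3, 1, activation="softmax")(x)  # one class per pixel

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(images, masks), where masks hold an integer class label per pixel

A U-Net adds connections from each encoder scale to the matching decoder scale, which is what allows fine spatial detail to survive the downsampling.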

Hybrid Approaches: Video

The above examples represent common microfluidic problem types and their associated well-developed deep learning architectures. This list should not be construed as comprehensive, but rather as an overview of the most common deep learning approaches for biotechnology applications. Notably, combinations of the above architectures can also be used in conjunction, for example to analyze videos (i.e., image sequences). One recent example is the identification of hematopoietic lineage by Buggenthin and colleagues, where a combination of CNN and RNN architectures was used to predict single-cell lineage [84]. In this case, a CNN was applied to brightfield images of stem cells to extract local image features, and these feature vectors were then fed into an RNN to take temporal information into account (i.e., analyze the next frame in a video by considering the previous frame). In this way, 5,922 cells were imaged over 400 sequential frames, resulting in ~2.5 million image patches. The authors demonstrated that their algorithm could predict cell type after differentiation, up to three generations before conventional molecular markers were observed. An alternative approach to processing video involves compression into static images in a manner that preserves key features. For example, Yu and colleagues recently used such a video compression approach to analyze phenotypic antimicrobial susceptibility [85]. Videos of bacteria within microfluidic channels were compressed into two sets of static images, each capturing either cell morphology or motion trace. These compressed images were then fed into a CNN, and bacterial cells inhibited by an antibiotic were successfully differentiated from uninhibited cells.
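A minimal hybrid sketch in Keras is shown below: a small CNN is applied to every frame via a TimeDistributed wrapper, and an LSTM integrates the per-frame embeddings over time. The shapes and the binary output are illustrative assumptions, not the architecture of ref. [84]:

# A minimal sketch of a hybrid CNN + RNN architecture for video (image sequences).
from tensorflow import keras

frame_encoder = keras.Sequential([                   # applied to every frame
    keras.layers.Conv2D(16, 3, activation="relu"),
    keras.layers.MaxPooling2D(),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation="relu"),       # per-frame feature vector
])

model = keras.Sequential([
    keras.Input(shape=(30, 64, 64, 1)),              # 30-frame grayscale image sequence
    keras.layers.TimeDistributed(frame_encoder),     # CNN features, one per frame
    keras.layers.LSTM(32),                           # temporal integration across frames
    keras.layers.Dense(1, activation="sigmoid"),     # e.g., predicted lineage class
])
model.compile(optimizer="adam", loss="binary_crossentropy")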

Emerging Opportunities

So far, we have mapped the most promising pairings of deep learning architectures and microfluidics for biotechnology applications. Below we provide an outlook on intriguing opportunities that stem from the marriage of deep learning and microfluidics.

Organ-on-a-Chip and AI-Autonomous Living Systems

Organ-on-a-chip (OOC) systems are engineered devices that rely on the integration of advanced materials, microfluidics, living cells, and biomaterials to create human tissue constructs that accurately mimic the structure, function, and microenvironment of real human tissue. OOC systems have already been developed for many organs of the body, and have demonstrated immense potential as test platforms for drug development and towards personalized medicine [86,87]. Based on the current pace of progress, we anticipate that OOC systems will mimic real tissues with increasing accuracy and complexity, and any deep learning application that can be proposed for real tissues can conceivably be proposed for OOC systems. First, as OOC systems continue to evolve, microfluidic tissue-level structures will be cultured, maintained, and monitored on-chip, and could be studied over large sample regions of the OOC tissue to detect spatial heterogeneity in a manner similar to modern histopathology analysis [88]. Thus, we anticipate the generation of enormous datasets of images and videos of living cells, tissues, and organs residing on-chip within their respective in vitro microenvironments and undergoing changes in behaviour and function, and we believe this data will provide a rich dataset for deep learning architectures. Indeed, the application of deep learning to histopathology has recently been shown using CNNs to organize and classify histomorphologic information [89], and can thus conceivably be extended to OOC-derived tissue constructs, given that the most common data output structure to date for OOCs is fluorescence microscopy images.

Second, considering the emerging vision of the person-on-a-chip (or body-on-a-chip [90]) for clinical applications and personalized medicine [91], there is a potential role for deep networks and artificial intelligence in control of the entire multi-organ system. Such multi-organ microphysiological systems have recently been reported [90], demonstrating control of flow partitioning to the various organs to mimic human physiology. One can imagine deep networks monitoring individual organs, metering communication, providing real-time control of multiple organ-on-chip systems, and working towards a fully “trained” and AI-guided multi-organ microphysiological system that essentially regulates itself. Ironically, a learning AI-driven multi-organ system may in fact be more fundamentally natural, or self-guided, than human-controlled organ microsystems – a rather intriguing proposition. Thus, we envision a clear and immediate role for deep learning in accelerating data analysis from OOC systems, and an intriguing medium- to long-term role in the integrated control and self-regulation of multiple organs or tissues making up a larger living whole.

Deep Learning-Powered Experiment Design and Control

While the majority of near-term deep learning applications will focus on a post-experiment data analysis role, there is growing potential for deep learning in designing microfluidic systems and controlling systems during experiments. The company Zymergen, for example, controls thousands of parallel microwell-based microbial culture experiments in which the fluidic decisions (e.g., what to inject and when to inject it) are made autonomously via AI. In that case, the machine learning analytics and fluidic experimental system work together autonomously toward the goal of genetically engineering microbes to produce useful chemicals [92]. Neural networks have also been applied to predict the outcome of organic reactions [93]. In addition to cultivation optimization, we see near-term opportunity in applying deep learning to fields with deeply complex and convoluted factors. For example, an increasing awareness of climate change and pollution is motivating a growing emphasis on elucidating the role of environmental stressors on microbiota [94,95]. Quantifying the response of complex systems to multiple stressors – a biotechnology grand challenge – will require high numbers of parallel experiments and roles for deep learning in both experiment planning and post-experiment analysis. Early work in this area by Nguyen and colleagues showed how an aerogel-based gas platform could be used to evaluate the role of various stressors (temperature, light, food supply, various pollutants) on microalgal growth [94]. Lambert and colleagues developed a microfluidics-based in situ chemotaxis assay to study bacterial communities, enabling the study of complex chemical interactions at an organism-relevant microfluidic scale [95]. Sifting through the resulting data is currently a bottleneck, and despite the large numbers of tests possible in microsystems, the number of variables is still too high for brute force. These recent approaches point to the opportunity for small-scale environmental toxicology testing, as well as the need for deep learning guidance both in experiment planning (increasing efficiency) and post-experiment analysis.

Globally Distributed Microfluidics with Cloud-Based Deep Learning

Inexpensive microfluidic tests such as paper-based assays [96] can provide rapid results in high numbers and at high frequency worldwide. Particularly when paired with the imaging and data transmission capabilities of now-ubiquitous smartphones [97], paper-based tests can uniquely provide real-time, globally distributed analytical data ideally suited to deep learning algorithms. We see this combination being particularly powerful in the medium term in microfluidic point-of-care diagnostics [98] and food safety [99]. In diagnostics, data generated from low-cost virus-detecting paper microfluidic devices by millions of globally distributed users could be paired with deep learning algorithms to track, predict and ultimately contain outbreaks. In addition to the detection and prediction of a rapidly evolving outbreak, there may be an additional role for microfluidics in a targeted distributed response. For example, the local production of diagnostics, therapeutics and vaccines is possible via hydration of freeze-dried cell-free transcription machinery [100]. This vision leverages deep learning-powered analysis and prediction to turn distributed local microfluidics-based detection into a coordinated and effective local response. Analogously, in food safety, microfluidic systems could be applied to test and monitor food quality and safety throughout the food production chain, feeding data-hungry deep learning strategies to contain and ultimately prevent contamination. Particularly in supply chains, microfluidics (data acquisition) and deep learning (analysis) will likely be further combined with cloud-based distributed ledger systems known as blockchain. In both diagnostics and food safety, cloud-based deep learning algorithms are an ideal partner to low-cost distributed testing.


Concluding Remarks

Deep learning and microfluidics represent an ideal marriage of experimental and analytical throughput. The pairing will only strengthen as technologies in both fields advance. In essence, the massive amount of data recovered from highly parallelized microfluidic systems represents the ideal biotechnology application for today’s modern deep learning algorithms. It is also likely that the integration of these approaches within the biotechnology research workflow will synergistically accelerate research in powerful ways (see Outstanding Questions). For example, the training of high-performance generic image classifiers (e.g., human, chair, car) has enabled retraining for medical image classification tasks with substantially reduced data requirements [101]. Similar architecture reuse may accelerate progress and reduce data requirements for deep learning in biotechnology. While the adoption of any new technology within a lab presents challenges and costs, the opportunity for a union of microfluidics and deep learning is clear, and for many biotechnology applications the barriers to entry are now relatively low. We hope this roadmap demystifies deep learning, highlights its tremendous potential, and encourages rapid implementation to the benefit of biotechnology research.

Acknowledgements

The authors gratefully acknowledge support from the Natural Sciences and Engineering Research Council of Canada, the Discovery Grants program, an E.W.R. Steacie Memorial Fellowship and the Canada Research Chairs program. The authors also thankfully acknowledge support from the Canadian Institutes of Health Research Collaborative Health Research Projects program.


Box 1: Deep Learning Basics

What is a neural network?

A neural network is a type of machine learning architecture where a structured nonlinear function y = f(x) is learned to map input x into output y. The essence of neural networks is in the multiple layers of simple algebraic operations used to compute the function f [26,102]. Each node within a hidden layer is computed as a weighted linear combination of all nodes within the previous layer, followed by a nonlinear transform (e.g., a sigmoidal function or a rectified linear unit). Next, the output of this layer is computed as a weighted linear combination of all nodes within the hidden layer in a similar fashion. Each layer’s outputs are fed into the next layer, until a series of network outputs is generated. The power and versatility of such a network comes from simply combining many of these simple operations together. In a process called supervised learning, a labeled dataset with paired inputs x and outputs y can be used to train a neural network by optimizing the weights. At each iteration, the predicted outputs y (e.g., predicted cell classes) are compared to known values (e.g., known cell classes), and the error is calculated (e.g., squared error or cross-entropy). The error is then back-propagated through the network, and new weights are assigned using a gradient descent algorithm to minimize the error [60,103]. After multiple iterations, the neural network is trained and capable of making predictions on new test datasets.
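The training loop described above can be written out explicitly. The following NumPy sketch trains a one-hidden-layer network by gradient descent on toy data; it is a minimal illustration of forward passes, error calculation, and backpropagation, not production code:

# A minimal sketch of supervised training: forward pass, error, backpropagation.
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((200, 3))                                  # 200 samples, 3 features
y = (X.sum(axis=1) > 1.5).astype(float).reshape(-1, 1)    # toy binary labels

W1, b1 = rng.normal(size=(3, 8)) * 0.5, np.zeros(8)       # input -> hidden weights
W2, b2 = rng.normal(size=(8, 1)) * 0.5, np.zeros(1)       # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for step in range(2000):
    h = sigmoid(X @ W1 + b1)        # hidden layer: weighted sum + nonlinearity
    p = sigmoid(h @ W2 + b2)        # network output (predicted class probability)
    d2 = (p - y) * p * (1 - p)      # squared-error gradient through the output sigmoid
    d1 = (d2 @ W2.T) * h * (1 - h)  # backpropagate the error into the hidden layer
    W2 -= lr * h.T @ d2 / len(X); b2 -= lr * d2.mean(axis=0)   # gradient descent
    W1 -= lr * X.T @ d1 / len(X); b1 -= lr * d1.mean(axis=0)   # weight updates

print("training accuracy:", ((p > 0.5) == y).mean())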

Deep Learning vs. Traditional Machine Learning

While traditional machine learning methods, including neural networks, have been active areas of research for decades, only in the past 10 years have deep neural networks (i.e., usually considered to be neural networks with three or more hidden layers) started to significantly outperform other methods. While deep neural networks were traditionally very hard and time-consuming to train, the advent of large-scale data and storage, fast GPU-based computation, and advances in training methods have combined to make possible deep learning performance breakthroughs on a variety of tasks, including image recognition, speech recognition, and language translation. In recent years, deep convolutional neural networks as deep as 152 layers have offered state-of-the-art performance on image recognition tasks [104]. Critically, unlike many previous machine learning methods, deep learning methods naturally and efficiently exploit the sequential and spatial structure of data, which is the key innovation of deep learning.

Box 2: Getting Started with Deep Learning

When to go Deep?

Certainly, many biotechnology applications could benefit from traditional machine learning methods, since many common problems require simple classification or regression tasks with simple unstructured inputs (e.g., cell parameters, rather than sequences or images). This simplified approach, however, fails to leverage the most compelling attribute of deep learning: the ability to learn complex features, or correlations of features, not known a priori. A deep learning algorithm can discover features relevant for prediction that describe the image far better than a limited set of established features, or features that a human observer deems important. Deep learning can thus represent a significant improvement in efficacy, particularly if there are complex nonlinear relationships, and is essential in the case of sequential or image inputs.

Is Big Data Required?

How much data is required varies based on how many classes are being trained, how different these classes are, and whether tricks can be used to augment the data. Extensive training can in some cases be avoided by using a pre-trained network (termed transfer learning) [101]. Recently, Gopakumar and colleagues showed how a CNN pre-trained on the ImageNet database [105] (>10⁶ labelled everyday images such as dogs, trees, and cars) could be re-trained to predict cell class at reasonable accuracy with as few as ~30 cell images [50]. Another popular approach is to enlarge a small dataset by augmenting the original set with modified images (e.g., flipping, rotating, defocusing, and noise addition). In short, even a small amount of data is sufficient to get started with deep learning implementation.
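A minimal transfer-learning sketch in Keras illustrates both ideas: an ImageNet-pretrained base is frozen, a small classification head is retrained, and simple augmentation layers enlarge the effective dataset. The choice of MobileNetV2 and all sizes are illustrative assumptions, not the setup of ref. [50]:

# A minimal sketch of transfer learning with data augmentation.
from tensorflow import keras

base = keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), include_top=False, weights="imagenet")
base.trainable = False                                   # freeze pretrained features

model = keras.Sequential([
    keras.Input(shape=(96, 96, 3)),
    keras.layers.Rescaling(1.0 / 127.5, offset=-1),      # scale pixels to [-1, 1]
    keras.layers.RandomFlip("horizontal_and_vertical"),  # augmentation: flips...
    keras.layers.RandomRotation(0.1),                    # ...and small rotations
    base,                                                # pretrained ImageNet features
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(1, activation="sigmoid"),         # new head for the cell classes
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(cell_images, labels, epochs=20)  # feasible with tens of images per class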

How do I Get Started?

Computer engineering is producing hardware solutions well-adapted to deep learning’s computing requirements, yielding graphics processing units (GPUs) optimized for parallel data processing. Training networks still generally requires days for datasets with ~10⁸ samples, for example, while more typical biotechnology applications with hundreds or thousands of samples can be processed in under an hour on a single commercial GPU. Biotechnology labs that widely embrace deep learning may additionally consider hardware clusters or external computing services. A student with basic programming skills can quickly get up to speed via a deep learning introductory course or an equivalent online course (e.g., Andrew Ng’s course: www.coursera.org/learn/machine-learning). In summary, the barriers to entry into deep learning are quickly fading, and we encourage biotechnology researchers to leverage the potent combination of microfluidics and deep learning.


References

1 Sackmann, E.K. et al. (2014) The present and future role of microfluidics in biomedical research. Nature 507, 181–189
2 Lan, F. et al. (2017) Single-cell sequencing at ultra-high-throughput with microfluidic droplet barcoding. Nat. Biotechnol. 35, 640–646
3 Zilionis, R. et al. (2016) Single-cell barcoding and sequencing using droplet microfluidics. Nat. Protoc. 12, 44–73
4 Su, Y. et al. (2017) Single cell proteomics in biomedicine: High-dimensional data acquisition, visualization, and analysis. Proteomics 17, 1600267
5 Caen, O. et al. (2017) Microfluidics as a Strategic Player to Decipher Single-Cell Omics? Trends Biotechnol. 35, 713–727
6 Liu, Z. et al. (2017) Microfluidics for Combating Antimicrobial Resistance. Trends Biotechnol. 35, 1129–1139
7 Dittrich, P.S. and Manz, A. (2006) Lab-on-a-chip: microfluidics in drug discovery. Nat. Rev. Drug Discov. 5, 210–218
8 Wolff, A. et al. (2003) Integrating advanced functionality in a microfabricated high-throughput fluorescent-activated cell sorter. Lab. Chip 3, 22–27
9 Gossett, D.R. et al. (2012) Hydrodynamic stretching of single cells for large population mechanical phenotyping. Proc. Natl. Acad. Sci. U. S. A. 109, 7630–7635
10 Mazutis, L. et al. (2013) Single-cell analysis and sorting using droplet-based microfluidics. Nat. Protoc. 8, 870–891
11 Cermak, N. et al. (2016) High-throughput measurement of single-cell growth rates using serial microfluidic mass sensor arrays. Nat. Biotechnol. 34, 1052–1059
12 Riordon, J. et al. (2014) Quantifying the volume of single cells continuously using a microfluidic pressure-driven trap with media exchange. Biomicrofluidics 8, 011101
13 Wang, B.L. et al. (2014) Microfluidic high-throughput culturing of single cells for selection based on extracellular metabolite production or consumption. Nat. Biotechnol. 32, 473–478
14 Skelley, A. et al. (2009) Microfluidic control of cell pairing and fusion. Nat. Methods 6, 147–152
15 Nagrath, S. et al. (2007) Isolation of rare circulating tumour cells in cancer patients by microchip technology. Nature 450, 1235–1239
16 Sarioglu, A.F. et al. (2015) A microfluidic device for label-free, physical capture of circulating tumor cell clusters. Nat. Methods 12, 685–691
17 Cho, B.S. et al. (2003) Passively driven integrated microfluidic system for separation of motile sperm. Anal. Chem. 75, 1671–1675
18 Nosrati, R. et al. (2014) Rapid selection of sperm with high DNA integrity. Lab. Chip 14, 1142–1150
19 Knowlton, S.M. et al. (2015) Microfluidics for sperm research. Trends Biotechnol. 33, 221–229
20 Nosrati, R. et al. (2017) Microfluidics for sperm analysis and selection. Nat. Rev. Urol. advance online publication
21 Sesen, M. et al. (2017) Droplet control technologies for microfluidic high throughput screening (μHTS). Lab. Chip 17, 2372–2394


22 Wielhouwer, E.M. et al. (2011) Zebrafish embryo development in a microfluidic flow-through system. Lab. Chip 11, 1815–1824
23 Choudhury, D. et al. (2012) Fish and Chips: a microfluidic perfusion platform for monitoring zebrafish development. Lab. Chip 12, 892–900
24 Chung, K. et al. (2008) Automated on-chip rapid microscopy, phenotyping and sorting of C. elegans. Nat. Methods 5, 637–643
25 Chung, K. et al. (2011) A microfluidic array for large-scale ordering and orientation of embryos. Nat. Methods 8, 171–176
26 LeCun, Y. et al. (2015) Deep learning. Nature 521, 436–444
27 Bengio, Y. et al. (2013) Representation Learning: A Review and New Perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35, 1798–1828
28 Krizhevsky, A. et al. (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105
29 Simonyan, K. and Zisserman, A. (2014) Very deep convolutional networks for large-scale image recognition. ArXiv Prepr. ArXiv14091556
30 Szegedy, C. et al. (2016) Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826
31 Redmon, J. et al. (2016) You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788
32 Luong, M.-T. et al. (2015) Effective approaches to attention-based neural machine translation. ArXiv Prepr. ArXiv150804025
33 Sutskever, I. et al. (2014) Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pp. 3104–3112
34 Cho, K. et al. (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. ArXiv Prepr. ArXiv14061078
35 Bahdanau, D. et al. (2016) End-to-end attention-based large vocabulary speech recognition. In Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on, pp. 4945–4949
36 Graves, A. et al. (2013) Speech recognition with deep recurrent neural networks. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pp. 6645–6649
37 Joulin, A. et al. (2016) Bag of tricks for efficient text classification. ArXiv Prepr. ArXiv160701759
38 Jaderberg, M. et al. (2016) Reading Text in the Wild with Convolutional Neural Networks. Int. J. Comput. Vis. 116, 1–20
39 Zhou, J. and Troyanskaya, O.G. (2015) Predicting effects of noncoding variants with deep learning–based sequence model. Nat. Methods 12, 931–934
40 Angermueller, C. et al. (2016) Deep learning for computational biology. Mol. Syst. Biol. 12, 878
41 Ko, J. et al. (2018) Machine learning to detect signatures of disease in liquid biopsies – a user’s guide. Lab. Chip 18, 395–405
42 Guo, B. et al. (2017) High-throughput, label-free, single-cell, microalgal lipid screening by machine-learning-equipped optofluidic time-stretch quantitative phase microscopy. Cytometry A 91, 494–502


43 Ko, J. et al. (2017) Combining Machine Learning and Nanofluidic Technology To Diagnose Pancreatic Cancer Using Exosomes. ACS Nano 11, 11182–11193
44 Singh, D.K. et al. (2017) Label-free, high-throughput holographic screening and enumeration of tumor cells in blood. Lab. Chip 17, 2920–2932
45 Huang, X. et al. (2016) Machine Learning Based Single-Frame Super-Resolution Processing for Lensless Blood Cell Counting. Sensors 16,
46 Mahdi, Y. and Daoud, K. (2017) Microdroplet size prediction in microfluidic systems via artificial neural network modeling for water-in-oil emulsion formulation. J. Dispers. Sci. Technol. 38, 1501–1508
47 Chen, C.L. et al. (2016) Deep Learning in Label-free Cell Classification. Sci. Rep. 6, 21471
48 Heo, Y.J. et al. (2017) Real-time Image Processing for Microscopy-based Label-free Imaging Flow Cytometry in a Microfluidic Chip. Sci. Rep. 7,
49 Van Valen, D.A. et al. (2016) Deep Learning Automates the Quantitative Analysis of Individual Cells in Live-Cell Imaging Experiments. PLOS Comput. Biol. 12, e1005177
50 Gopakumar, G. et al. (2017) Cytopathological image analysis using deep-learning networks in microfluidic microscopy. J. Opt. Soc. Am. A 34, 111
51 Stoecklein, D. et al. (2017) Deep Learning for Flow Sculpting: Insights into Efficient Learning using Scientific Simulation Data. Sci. Rep. 7, 46368
52 Albrecht, T. et al. (2017) Deep learning for single-molecule science. Nanotechnology 28, 423001
53 Ching, T. et al. (2017) Opportunities And Obstacles For Deep Learning In Biology And Medicine. bioRxiv
54 Mamoshina, P. et al. (2016) Applications of Deep Learning in Biomedicine. Mol. Pharm. 13, 1445–1454
55 Das, D.K. et al. (2013) Machine learning approach for automated screening of malaria parasite using light microscopic images. Micron 45, 97–106
56 Mirsky, S.K. et al. (2017) Automated Analysis of Individual Sperm Cells Using Stain-Free Interferometric Phase Microscopy and Machine Learning. Cytometry A 91A, 893–900
57 Ko, J. et al. (2017) Combining Machine Learning and Nanofluidic Technology To Diagnose Pancreatic Cancer Using Exosomes. ACS Nano 11, 11182–11193
58 San-Miguel, A. et al. (2016) Deep phenotyping unveils hidden traits and genetic relations in subtle mutants. Nat. Commun. 7,
59 Vasilevich, A.S. et al. (2017) How Not To Drown in Data: A Guide for Biomaterial Engineers. Trends Biotechnol. 35, 743–755
60 Werbos, P.J. (1990) Backpropagation through time: what it does and how to do it. Proc. IEEE 78, 1550–1560
61 Han, S. et al. (2018) Use of Deep Learning for Characterization of Microfluidic Soft Sensors. IEEE Robot. Autom. Lett.
62 Boža, V. et al. (2017) DeepNano: Deep recurrent neural networks for base calling in MinION nanopore reads. PLoS One 12, e0178751
63 Godin, M. et al. (2010) Using buoyant mass to measure the growth of single cells. Nat. Methods 7, 387–390
64 Jozefowicz, R. et al. (2016) Exploring the limits of language modeling. ArXiv Prepr. ArXiv160202410

65 Qu, Y.-H. et al. (2017) On the prediction of DNA-binding proteins only from primary sequences: A deep learning approach. PLOS ONE 12, e0188129
66 Vaswani, A. et al. (2017) Attention is all you need. In Advances in Neural Information Processing Systems, pp. 6000–6010
67 Angerer, P. et al. (2017) Single cells make big data: New challenges and opportunities in transcriptomics. Curr. Opin. Syst. Biol. 4, 85–91
68 Deng, Y. et al. (2018) Massive single-cell RNA-seq analysis and imputation via deep learning. bioRxiv DOI: 10.1101/315556
69 Webb, S. (2018) Deep learning for biology. Nature 554, 555–557
70 Camacho, D.M. et al. (2018) Next-Generation Machine Learning for Biological Networks. Cell 173, 1581–1592
71 Bock, C. et al. (2016) Multi-Omics of Single Cells: Strategies and Applications. Trends Biotechnol. 34, 605–608
72 LeCun, Y. and Bengio, Y. (1995) Convolutional networks for images, speech, and time series. Handb. Brain Theory Neural Netw. 3361, 1995
73 Zeiler, M.D. et al. (2011) Adaptive deconvolutional networks for mid and high level feature learning. In Computer Vision (ICCV), 2011 IEEE International Conference on, pp. 2018–2025
74 Kim, K. et al. (2018) Visual Estimation of Bacterial Growth Level in Microfluidic Culture Systems. Sensors 18,
75 Zaimi, A. et al. (2018) AxonDeepSeg: automatic axon and myelin segmentation from microscopy data using convolutional neural networks. Sci. Rep. 8, 3816
76 Long, J. et al. (2015) Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440
77 Ronneberger, O. et al. (2015) U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234–241
78 Badrinarayanan, V. et al. (2017) SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39, 2481–2495
79 Farabet, C. (2013) Towards real-time image understanding with convolutional networks. PhD Thesis, Université Paris-Est
80 Lin, G. et al. (2016) Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3194–3203
81 Zhao, H. et al. (2017) Pyramid Scene Parsing Network. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6230–6239
82 Liu, W. et al. (2015) ParseNet: Looking wider to see better. ArXiv Prepr. ArXiv150604579
83 Chen, L.-C. et al. (2016) DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. ArXiv Prepr. ArXiv160600915
84 Buggenthin, F. et al. (2017) Prospective identification of hematopoietic lineage choice by deep learning. Nat. Methods 14, 403–406


85 Yu, H. et al. (2018) Phenotypic Antimicrobial Susceptibility Testing with Deep Learning Video Microscopy. Anal. Chem. 90, 6314–6322
86 Bhatia, S.N. and Ingber, D.E. (2014) Microfluidic organs-on-chips. Nat. Biotechnol. 32, 760–772
87 Ahadian, S. et al. (2018) Organ-On-A-Chip Platforms: A Convergence of Advanced Materials, Cells, and Microscale Technologies. Adv. Healthc. Mater. 7, 1700506
88 Lu, C. et al. (2017) An oral cavity squamous cell carcinoma quantitative histomorphometric-based image classifier of nuclear morphology can risk stratify patients for disease-specific survival. Mod. Pathol. 30, 1655–1665
89 Faust, K. et al. (2018) Visualizing histopathologic deep learning classification and anomaly detection using nonlinear feature space dimensionality reduction. BMC Bioinformatics 19,
90 Edington, C.D. et al. (2018) Interconnected Microphysiological Systems for Quantitative Biology and Pharmacology Studies. Sci. Rep. 8, 4530
91 Zhang, B. et al. (2016) Biodegradable scaffold with built-in vasculature for organ-on-a-chip engineering and direct surgical anastomosis. Nat. Mater. 15, 669–678
92 Check Hayden, E. (2015) Synthetic biology lures Silicon Valley investors. Nat. News 527, 19
93 Wei, J.N. et al. (2016) Neural Networks for the Prediction of Organic Chemistry Reactions. ACS Cent. Sci. 2, 725–732
94 Nguyen, B. et al. (2018) A Platform for High-Throughput Assessments of Environmental Multistressors. Adv. Sci.
95 Lambert, B.S. et al. (2017) A microfluidics-based in situ chemotaxis assay to study the behaviour of aquatic microbial communities. Nat. Microbiol. 2, 1344–1349
96 Gong, M.M. and Sinton, D. (2017) Turning the Page: Advancing Paper-Based Microfluidics for Broad Diagnostic Application. Chem. Rev. 117, 8447–8480
97 Erickson, D. et al. (2014) Smartphone technology can be transformative to the deployment of lab-on-chip diagnostics. Lab. Chip 14, 3159–3164
98 Pandey, C.M. et al. (2017) Microfluidics Based Point-of-Care Diagnostics. Biotechnol. J. 13, 1700047
99 Weng, X. and Neethirajan, S. (2017) Ensuring food safety: Quality monitoring using microfluidics. Trends Food Sci. Technol. 65, 10–22
100 Pardee, K. et al. (2016) Rapid, Low-Cost Detection of Zika Virus Using Programmable Biomolecular Components. Cell 165, 1255–1266
101 Tajbakhsh, N. et al. (2016) Convolutional Neural Networks for Medical Image Analysis: Full Training or Fine Tuning? IEEE Trans. Med. Imaging 35, 1299–1312
102 Krogh, A. (2008) What are artificial neural networks? Nat. Biotechnol. 26, 195–197
103 Rumelhart, D.E. et al. (1986) Learning representations by back-propagating errors. Nature 323, 533–536
104 He, K. et al. (2016) Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, pp. 770–778
105 Deng, J. et al. (2009) ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255
106 Kilby, J.S. (23-Jun-1964) Miniaturized electronic circuits. US Patent US3138743 A


107 Aspray, W. (1997) The Intel 4004 microprocessor: What constituted invention? IEEE Ann. Hist. Comput. 19, 4–15
108 Terry, S.C. et al. (1979) A gas chromatographic air analyzer fabricated on a silicon wafer. IEEE Trans. Electron Devices 26, 1880–1886
109 Duffy, D.C. et al. (1998) Rapid Prototyping of Microfluidic Systems in Poly(dimethylsiloxane). Anal. Chem. 70, 4974–4984
110 Rosenblatt, F. (1958) The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev. 65, 386
111 Jordan, M.I. (1997) Serial order: A parallel distributed processing approach. In Advances in Psychology 121, pp. 471–495, Elsevier
112 Abadi, M. et al. (2016) TensorFlow: Large-scale machine learning on heterogeneous distributed systems. ArXiv Prepr. ArXiv160304467
113 Rampasek, L. and Goldenberg, A. (2016) TensorFlow: Biology’s Gateway to Deep Learning? Cell Syst. 2, 12–14


Figure Legends

Text Box 1 Figure. Neural network basics. (A) Neural network weights adjusted via back-propagation. (B) Neural network with three hidden layers.


Figure 1. Deep learning with microfluidics. (A) A brief history of deep learning and microfluidics. Microfluidics emerged from MEMS technologies in the 1990s. (Top): 1958 – first integrated circuit by Kilby [106]; 1971 – first commercially available microprocessor (Intel) [107]; 1979 – first lab on a chip [108]; 1998 – demonstration of rapid prototyping with PDMS [109]. Concurrently, artificial intelligence and machine learning algorithms have been progressing over a similar time period. (Bottom): 1957 – first perceptron [110]; 1974 – introduction of backpropagation within neural networks by Werbos [60], popularized in 1986 by Rumelhart, Hinton & Williams [103]; 1986 – introduction of the recurrent neural network (RNN) by Jordan [111]; 2012 – demonstration of a foundational convolutional neural network (CNN), AlexNet, developed by Krizhevsky, Hinton and Sutskever [28]. RNNs and CNNs were at first limited by data and computational power, but in the last few years have gained massive popularity by leveraging fast GPUs, frameworks such as TensorFlow [112,113], and distributed computing.


Key Figure: Figure 2. Mapping microfluidic applications to deep learning architectures. Example applications are paired with deep learning architectures, from the simplest unstructured-to-unstructured case to the more complex image-to-image case. Each example is described in detail in the corresponding sections.


Figure 3. Deep learning architectures: unstructured data and structured sequences as inputs. (A) Unstructured-to-unstructured application and neural network architecture. Flow cytometry images (left) are reproduced from “Deep Learning in Label-free Cell Classification” by Chen and colleagues [47], licensed under CC BY 4.0; images were modified for clarity. Grey circles represent nodes, and arrows depict connections between nodes (light grey dashed arrows) or between layers (solid black arrows). Layers are color-coded, with the input layer in blue, hidden layers in green and output layers in red. (B) Sequence-to-unstructured application and recurrent neural network (RNN) architecture. Microfluidic soft sensor characterization example (left); images modified for clarity from ref. [61]. A single recurrent neural network layer is shown with progressive shading to show progression through time – nodes are not only fed new current values as inputs, but also their previous values. (C) Sequence-to-sequence application and recurrent neural network architecture. DNA base calling example image from “DeepNano: Deep recurrent neural networks for base calling in MinION nanopore reads” by Boža and colleagues [62], licensed under CC BY 4.0; image modified and inset schematic added for clarity. The deep learning architecture schematics (right) are simplified from the cited references to denote the networks’ principles of operation, and are not exact representations.


Figure 4. Deep learning architectures: images as inputs. (A) Image-to-unstructured application and convolutional neural network architecture. Flow cytometry cell classification application and images (top left) are reproduced from “Real-time Image Processing for Microscopy-based Label-free Imaging Flow Cytometry in a Microfluidic Chip” by Heo and colleagues [48], licensed under CC BY 4.0; images modified for clarity. Convolutional layers are light green, pooling layers are dark green, and different filters at the same scale (i.e., channels) are shown as vertical planar slices. Flow sculpting images are reproduced from “Deep Learning for Flow Sculpting: Insights into Efficient Learning using Scientific Simulation Data” by Stoecklein and colleagues [51], licensed under CC BY 4.0; images were modified for clarity. (B) Image-to-image cell segmentation application and CNN architecture. An SEM image of a rat spinal cord is segmented into one of three classes: axon (blue), myelin (red) and background (black), using a modified U-Net architecture, here simplified for clarity [75]. Images reproduced from “AxonDeepSeg: automatic axon and myelin segmentation from microscopy data using convolutional neural networks” by Zaimi and colleagues [75], licensed under CC BY 4.0; images were modified for clarity. The deep learning architecture schematics (right) are simplified from the above references to denote the networks’ principles of operation, and are not exact representations.
