Pattern Recognition Software and Techniques for Biological Image Analysis

Education Pattern Recognition Software and Techniques for Biological Image Analysis Lior Shamir, John D. Delaney, Nikita Orlov, D. Mark Eckley, Ilya G. Goldberg* Laboratory of Genetics, National Institute on Aging/National Institutes of Health, Baltimore, Maryland, United States of America Abstract other imaging devices. Automated image rules. An example of unsupervised learn- acquisition systems integrated with labo- ing is clustering, where a dataset can be The increasing prevalence of auto- ratory automation have produced image divided into several groups based on pre- mated image acquisition systems is datasets that are too large for manual existing definitions of what constitutes a enabling new types of microscopy processing. This trend led to a new type of cluster, or the number of clusters expected. experiments that generate large biological experiment, in which the image A review of machine learning in the image datasets. However, there is a analysis must be performed by machines. context of bioinformatics (as opposed to perceived lack of robust image analysis systems required to process Clearly, this approach is different than the imaging) can be found in [3]. these diverse datasets. Most auto- bulk of the microscopy performed for the Traditionally, analysis of digital micros- mated image analysis systems are past ,400 years. However, while the copy images requires identifying regions of tailored for specific types of micros- availability of automated microscopy, lab- interest (ROIs) or ‘‘objects’’ within the copy, contrast methods, probes, and oratory automation, computing resources, images. Once a region is isolated from the even cell types. This imposes signif- and digital imaging and storage devices background, the resolution and dynamic icant constraints on experimental has been increasing consistently, in some range afforded by digital microscopy design, limiting their application to cases the bottleneck for high-throughput allows many types of measurements and the narrow set of imaging methods imaging experiments is the efficacy of statistics to be collected about the object in for which they were designed. One computer vision, image analysis, and question, such as intensity, shape, size, and of the approaches to address these pattern recognition methods [1]. Comput- position, as well as the number of objects limitations is pattern recognition, er-based image analysis provides an ob- and their distribution [4]. This region which was originally developed for jective method of scoring visual content selection can be done manually by draw- remote sensing, and is increasingly independently of subjective manual inter- ing boxes or free-hand regions using an being applied to the biology domain.Thisapproachreliesontrain- pretation, while potentially being more interactive tool [5], or automatically using ing a computer to recognize pat- sensitive, more consistent, and more computer algorithms known as segmenta- ternsinimagesratherthan accurate [2]. These advantages are not tion algorithms [4,6]. While most image developing algorithms or tuning limited to massive image datasets, as they analysis continues to rely on region parameters for specific image pro- allow microscopy to be used as a routine identification, PR can also be used to cessing tasks. The generality of this assay system even on a small scale. process whole images or images tiled on a approach promises to enable data An effective computational approach to grid without a prior region identification mining in extensive image reposito- objectively analyze image datasets is step [7]. ries, and provide objective and pattern recognition (PR, see Box 1). PR Traditional image processing is there- quantitative imaging assays for rou- is a machine-learning approach where the fore predicated on the question, ‘‘can my tine use. Here, we provide a brief machine finds relevant patterns that dis- objects of interest be identified?’’. In overview of the technologies behind tinguish groups of objects after being contrast, PR is predicated on the question, pattern recognition and its use in trained on examples (i.e., supervised ‘‘can these groups of images be distin- computer vision for biological and machine learning). In contrast, the other guished?’’. In this context, the input to a biomedical imaging. We list available software tools that can be used by approach to machine learning and artifi- PR algorithm may be an entire image, a biologists and suggest practical ex- cial intelligence is unsupervised learning, sub-image region identified with segmen- perimental considerations to make where the machine finds new patterns tation algorithms, or simply image samples the best use of pattern recognition without relying on prior training exam- in the form of rectangular tiles. Thus, in techniques for imaging assays. ples, usually by using a set of pre-defined contrast to selecting and tuning a segmen- Citation: Shamir L, Delaney JD, Orlov N, Eckley DM, Goldberg IG (2010) Pattern Recognition Software and Techniques for Biological Image Analysis. PLoS Comput Biol 6(11): e1000974. doi:10.1371/journal.pcbi.1000974 Introduction Editor: Fran Lewitter, Whitehead Institute, United States of America Computer-aided analysis of microscopy Published November 24, 2010 images has been attracting considerable This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration attention in the past few years, particularly which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, in the context of high-content screening transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. (HCS). The link between images and Funding: This research was supported entirely by the Intramural Research Program of the NIH, National physiology is well established, and it is Institute on Aging (Z01:AG000685-02). The funders had no role in the decision to publish or preparation of the common knowledge that a significant manuscript. portion of what we know about biology Competing Interests: The authors have declared that no competing interests exist. relies on different types of microscopy and * E-mail: [email protected] PLoS Computational Biology | www.ploscompbiol.org 1 November 2010 | Volume 6 | Issue 11 | e1000974 Box 1. Pattern Recognition Overview of Bioimage PR Systems N PR is the task of automatically detecting patterns in datasets and using them to Although there are many examples of characterize new data. PR is a form of machine learning, which itself is a field within artificial intelligence. Machine learning can be divided into two major PR systems, the process can be summa- groups. In supervised learning, or PR, a computer system is trained using a set of rized in several steps (Figure 1). pre-defined classes, and then used to classify unknown objects based on the As with traditional image processing patterns detected in training. In unsupervised learning there are no classes approaches based on object identification defined a priori, and the computer system subdivides or clusters the data, alone, PR can also benefit from various usually by using a set of general rules. An example of supervised learning is techniques to subdivide images into ROIs. automatic detection of protein localization, in which the computer system is The three principal reasons for doing so trained using images of probes for known sub-cellular compartments [10]. An are to 1) reduce the number of pixels the example of unsupervised learning is clustering an expression profiling PR algorithm needs to consider all at once microarray experiment into groups of genes with similar expression patterns. to improve response time or increase N Other approaches to PR include semi-supervised learning, which uses pre- statistical power, 2) bias the PR algorithm defined classes to find new similarity relationships and define new groups, and to process objects of interest rather than reinforcement learning, in which decisions are improved iteratively based on a background, and 3) center or align objects feedback mechanism and specified reward criteria. In this educational article we that have inherent orientation. ROI de- focus on the application of supervised learning to automated analysis of tection algorithms and tools are described microscopy image datasets. more thoroughly in the Finding Regions of Interest section. The second step is the extraction of tation algorithm, PR requires training a (and solely) on the availability and image content descriptors (image features), computer to distinguish groups of images. distinguishability of control images. which are values that describe the image These groups correspond to experimental Thus, a PR approach to an imaging content numerically. These values can controls, and the set of images within a experiment is very closely tied to the reflect various texture parameters of the group encompasses the variation within biological experiment itself rather than image, the statistical distribution of pixel each control. Given these groups of intermediate measurements from image intensities, edges, colors, etc. While the images, the machine can learn on its processing, or familiarity with the algo- dimensionality of the raw pixels can own what aspects of the images represent rithms necessary to produce these mea- typically reach ,1,000,000 (assuming a natural experimental variation and are surements. PR can be used in tandem microscopy image of 1000|1000 pixels), therefore irrelevant,

Load more