Automatic pixel classification and segmentation of biofilm forming bacteria from fluorescence microscopy images.

Koen Hendriks STUDENT NUMBER: 2018158

THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DATA SCIENCE & SOCIETY DEPARTMENT OF COGNITIVE SCIENCE & ARTIFICIAL INTELLIGENCE SCHOOL OF HUMANITIES AND DIGITAL SCIENCES TILBURG UNIVERSITY

Thesis committee:

Dr. Sharon Ong
Dr. Marie Postma

Tilburg University School of Humanities and Digital Sciences Department of Cognitive Science & Artificial Intelligence Tilburg, The Netherlands June 2019

Preface

I would like to thank Dr. Sharon Ong for her excellent support during this project and Dr. Marie Postma for her valuable feedback and comments on the first version. Furthermore, I would like to thank my parents for always being supportive of my decisions, for providing me with the resources that allowed me to pursue this master's degree, and for giving me a loving home on which I can always fall back.

Table of contents
1. Introduction
2. Related Work
 2.1 Brief history of biofilms
 2.2 Analyzing time-lapse fluorescence microscopy images
  2.2.1 Bacteria detection
  2.2.2 Bacteria Tracking
3. Experimental Setup
 3.1 Data
  3.1.1 Creating ground truth
  3.1.2 Feature engineering
  3.1.3 Balancing classes
 3.2 Method / Models
  3.2.1 Software and Libraries
  3.2.2 Models
  3.2.3 Hyperparameter Tuning
  3.2.4 Evaluation
  3.2.5 Baseline
4. Results
5. Discussion
6. Conclusion

Automatic pixel classification and segmentation of biofilm forming bacteria from fluorescence microscopy images.

Koen Hendriks

Antimicrobial resistance is considered a threat to global health. One cause of this phenomenon is biofilm formation, of which knowledge is limited. Experiments on biofilm-forming bacteria can generate a significant amount of time-series fluorescence microscopy images. At present, however, analyzing these images requires extensive manual labor, and results are often subject to human error and bias. In addition, current research on image analysis of bacteria often does not examine in detail the important first step in bioimage analysis: pixel classification. This research therefore investigated feature importance and three different classifiers in terms of their predictive power for background, border, and interior pixels in a dataset engineered from 55 raw fluorescence microscopy images of Pseudomonas aeruginosa. Results reveal that a small subset of features is useful and that the Random Forest classifier outperformed both the k-NN and Support Vector Machine classifiers in terms of F1-score. Accurately classifying these pixels is an essential prerequisite for labeling single bacteria. When comparing labeled images of commonly used segmentation techniques to this method, we found it to be less prone to segmentation error. Any segmentation errors translate into errors further down the bioimage analysis pipeline, making this experiment an important first step toward an algorithm which automatically segments and tracks bacteria.

1. Introduction The World Health Organization (WHO) lists antimicrobial resistance (AMR) as one of the main threats to global health in 2019. As it hinders both prevention and effective treatment of an expanding variety of common infections caused by microbes, AMR is a relevant threat to all layers of modern-day civilization (World Health Organization, 2019). One phenomenon previously identified as causing bacteria to be increasingly resistant to antimicrobial drugs is the formation of biofilms (Hall-Stoodley & Stoodley, 2009). Knowledge of the early stages of bacterial biofilm formation, in which single free-swimming bacteria attach to a surface and continue to form antimicrobial-resistant microcolonies, is relatively limited (Conrad et al., 2011; Gibiansky et al., 2013; Tolker-Nielsen, 2015). Studying this phenomenon almost always involves the analysis of time-lapse fluorescence microscopy images (Tolker-Nielsen, 2015). Technological advances have resulted in increased amounts of data being available (M. Wang, 2010). However, manual analysis is laborious and time-consuming, and automation is often not generalizable (Haubold et al., 2016a; Q. Wang, Niemi, Tan, You, & West, 2010). Therefore, the aim of this project is to assist in the development of a tool which automatically segments individual bacteria in time-lapse microscopy images of the early stages of biofilm development.

Potentially the most important characteristic of biofilms is their resistance to antimicrobial drugs. There are, however, several applications and consequences of biofilms which are also worth discussing. Firstly, biofilms forming on indwelling medical devices cause device-associated infections. For example, biofilms have been found on central venous and urinary catheters, prosthetic heart valves, and artificial hip prostheses. Secondly, several diseases have been reported as caused by microorganisms forming biofilms. These include native valve endocarditis, otitis media, chronic bacterial prostatitis, cystic fibrosis, and periodontitis. Thirdly, they have also been linked to food and water contamination when forming in the environments where these products are processed (Petersen, 2009). Finally, even though biofilms are often considered as negatively influencing human life, there are also cases where they can be linked to more positive outcomes. Examples include wastewater treatment, remediation of contaminated soil and groundwater, and microbial leaching (extracting precious metals from ores with biofilms instead of chemicals) (Cunningham, Lennox, & Ross, 2009).

Considering the importance of studying biofilms, the limited knowledge of the early stages of their formation, and the challenges in analyzing fluorescence microscopy images, this research focuses on answering the following research questions:

RQ1: Which set of features is most important for classifying pixels in fluorescence microscopy images of Pseudomonas aeruginosa?

RQ2: What is the best classifier for segmenting Pseudomonas aeruginosa, and how accurate is it?

2. Related Work This section provides an overview of previous work related to the analysis of images of the early stages of biofilm development.

2.1 Brief history of biofilms Dutch businessman and scientist Antonie van Leeuwenhoek, sometimes referred to as the father of microbiology, is believed to have first discovered biofilms when he observed the bacteria in his own dental calculus in the 17th century (Costerton, Geesey, & Cheng, 1978; Slavkiny, 1997; Costerton, Stewart, & Greenberg, 1999). It was not until 1969, when Jones et al. developed a technique for staining and examining biofilms by electron microscopy, that a polysaccharide matrix surrounding the individual cells was found (Jones et al., 1969; Rodgers, Zhan, & Dolan, 2004). Today, biofilms are described as a colony of microorganisms encased in a primarily extracellular polymeric substance (EPS) matrix and irreversibly attached to a surface (Costerton et al., 1999; Hall-Stoodley & Stoodley, 2009; Pamp & Tolker-Nielsen, 2007; Rodgers et al., 2004). Biofilms have previously been found on many different surfaces including teeth, medical implants, and streambeds (Rodgers et al., 2004; Vidyasagar & Nagarathnamma, 2018).

2.2 Analyzing time-lapse fluorescence microscopy images Time-lapse microscopy is a widely used technique for analyzing Pseudomonas aeruginosa at the single-bacterium level (Chang, Yokota, Abe, Tang, & Tasi, 2017; Lee et al., 2018; M. Wang, 2010). By recording observations at a set interval (e.g. 15 minutes), it allows for studying processes over a long period of time. Time-lapse microscopy is also considered the only technique allowing the accurate observation of dividing cells, as it enables researchers to construct lineage trees (Etzrodt, Endele, & Schroeder, 2014; Ulman et al., 2018). Manual analysis of the images produced with this technique is challenging, as it is time-consuming and often hard to replicate. These problems have created a demand for (semi-)automated techniques (Ulman et al., 2018; M. Wang, 2010).

Classifying motility mechanisms of bacteria in time-lapse images requires tracking individual bacteria in each consecutive image and extracting their trajectories. Based on the analysis of these tracks, cell behavior in the early stages of biofilm development can be quantified and predicted (Chang et al., 2017; Conrad et al., 2011; Cristina & Queimadelas, 2012). However, tracking relies on successfully identifying these microbes as separate objects, which in turn relies on accurate pixel classification (Haubold et al., 2016a; Kan, 2017). Figure 2.1 provides a simplified overview of the time-lapse image analysis pipeline. This process will be discussed in more detail in the following sections.

Figure 2.1 Flow-chart displaying the time-lapse image analysis process.

2.2.1 Bacteria detection Accurate detection and segmentation of bacteria as objects in raw microscopy images is an essential prerequisite for tracking and, thus, quantifying bacterial behavior. Some researchers even state that 90% accuracy, although acceptable in some research settings, most likely results in useless tracking results (Kan, 2017). Bacteria detection and segmentation starts by grouping pixels in raw images into bacteria objects, providing each object with a unique label (M. Wang, 2010; Q. Wang et al., 2010). Van Valen et al. explain that this usually relies on a problem-specific mixture of methods from the domain, including filtering, morphological operations, thresholding, and watershed transformations. The results of this process are then fed into the bacteria tracking workflow, which tracks labelled objects through space and time. Figure 2.2 depicts this traditional bacteria segmentation pipeline with examples of the different stages.

Figure 2.2 Bacteria segmentation pipeline with examples. Top: (a) raw image, (b) pixel-classified image, and (c) segmented image with bacteria objects. Bottom: traditional bacteria detection flow-chart.

At the root of the image segmentation domain is pixel classification, which encompasses labeling each individual pixel in order to generate a binary image containing the main features of bacteria (Lee et al., 2018). Previous research revealed there are several ways of performing pixel classification and labeling of connected components.
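The step from a binary (pixel-classified) image to labeled bacteria objects can be sketched with connected-component labeling; the scipy call below is an illustrative stand-in for the VIGRA tool the thesis actually uses.

```python
import numpy as np
from scipy import ndimage as ndi

# A tiny binary image with two separate foreground blobs.
binary = np.zeros((10, 10), dtype=int)
binary[1:4, 1:4] = 1     # first "bacterium"
binary[6:9, 6:9] = 1     # second one, not touching the first

# Each group of connected foreground pixels receives a unique integer id.
labeled, n_objects = ndi.label(binary)
```

With the default 4-connectivity, the two blobs receive labels 1 and 2 while the background stays 0.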

A frequently used technique is the watershed transform (Vallotton et al., 2009). The watershed transform is a technique applied to grayscale images with the aim of segmenting them. Instead of treating pixel values as color intensity, it considers them a measure of height. For example, darker regions in the image are considered 'lower' compared to lighter, 'higher' regions. By flooding the lower-situated 'basins' and marking where the water of different basins meets, the watershed transform is able to determine boundaries between different objects in images (Barnes, Lehman, & Mulla, 2014; Salman, 2006). However, a frequent problem with this technique is leakage, which results in undersegmentation, where multiple blobs are segmented as one blob. Segmentation problems will be discussed in more detail in later sections.
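The flooding idea above can be sketched on a synthetic image of two touching "bacteria"; the scikit-image `watershed` function and the manually placed seed markers are illustrative assumptions, not the thesis's own pipeline.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.segmentation import watershed

# Two overlapping bright squares standing in for touching bacteria.
img = np.zeros((80, 80))
img[20:40, 20:40] = 1.0
img[30:50, 30:50] = 1.0

# The distance transform acts as the "height" map described above.
distance = ndi.distance_transform_edt(img)

# One manually chosen seed ("basin") per object.
markers = np.zeros(img.shape, dtype=int)
markers[29, 29] = 1
markers[41, 41] = 2

# Flooding the inverted height map splits the touching blobs.
labels = watershed(-distance, markers, mask=img.astype(bool))
```

The boundary between labels 1 and 2 falls where the two flooded basins meet; with poorly chosen markers the leakage/undersegmentation problem described above appears.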

Kan and others describe a technique where pixels are classified as belonging to either the 'background' or the 'cell' (foreground) class. The question remains which features to use to classify the pixels into these categories. One method is the single-feature approach, thresholding, which only uses pixel intensity. Because it does not utilize information from neighboring pixels, it is not considered very effective. In their extensive survey, Sezgin and Sankur quantitatively evaluated a wide range of image thresholding techniques which also take neighboring pixel intensities into account, the two-feature approach. The accuracy of these techniques, measured as misclassification error, ranged from 0.256 to 0.753 in a dataset which consisted, amongst others, of six light microscope images (Sezgin & Sankur, 2004).
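A minimal sketch of the single-feature approach, assuming scikit-image's Otsu implementation and a synthetic bimodal image (bright "cells" on a dark background):

```python
import numpy as np
from skimage.filters import threshold_otsu

# Synthetic image: dark background (~30) with one bright "cell" patch (~200).
rng = np.random.default_rng(0)
background = rng.normal(30, 5, size=(64, 64))
cells = rng.normal(200, 10, size=(64, 64))
mask_true = np.zeros((64, 64), dtype=bool)
mask_true[20:40, 20:40] = True
img = np.where(mask_true, cells, background)

t = threshold_otsu(img)   # single global threshold from the histogram
binary = img > t          # foreground = 'cell', everything else background
```

Because only the intensity of each pixel is considered, touching cells or uneven illumination quickly break this approach, which is the weakness the text points out.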

Haubold et al. (2016b) make use of the freely available software Ilastik, in which they annotate several pixels as belonging to either the cell or the background. With these annotations, they compute pixel features based on edge and texture filters. A random forest classifier then predicts the class of unannotated pixels based on the information provided by the annotated pixels. Their work was applied to microscopy images of eukaryotic nuclei, which are of different shape and size compared to bacteria. The filters include (1) Gaussian Smoothing, (2) Laplacian of Gaussian, (3) Gaussian Gradient Magnitude, (4) Difference of Gaussian, (5) Structure Tensor Eigenvalues, and (6) Hessian of Gaussian Eigenvalues (Sommer, Straehle, Kothe, & Hamprecht, 2011). These filters utilize a process called convolution which, in its most basic form, encompasses combining two arrays of different sizes but the same number of dimensions. The smaller array is generally referred to as the kernel, which in turn comes in different sizes and value patterns resulting from different distributions. Convolution is the process through which the kernel moves over the original image pixels and calculates new pixel values based on the original pixel and its neighbors, resulting in a 'convolved' version of the original image (Fisher, Perkins, Walker, & Wolfart, 2000). Finding the right set of features capable of describing a whole image is not considered a trivial task (Kan, 2017).
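The convolution process described above can be illustrated with a 3×3 mean kernel sliding over a tiny image; the scipy call is a generic stand-in, not the thesis's VIGRA implementation.

```python
import numpy as np
from scipy.ndimage import convolve

# One bright pixel in an otherwise black 5x5 image.
img = np.zeros((5, 5))
img[2, 2] = 9.0

# 3x3 averaging kernel: each output pixel becomes the mean of its
# neighborhood (a weighted sum of the original pixel and its neighbors).
kernel = np.ones((3, 3)) / 9.0

smoothed = convolve(img, kernel, mode='constant', cval=0.0)
```

The single bright pixel is spread over its 3×3 neighborhood (each of those nine pixels becomes 1.0) while the total intensity is preserved, which is exactly the "new pixel value from the original pixel and its neighbors" behavior described above.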

A potential problem with the two-label method used by Haubold et al. is that not all objects (bacteria, in this case) in the images are separated from each other. This could result in both under- and oversegmentation, i.e. two bacteria being classified as one object or as more than two respectively, which negatively influences results in the tracking phase (Haubold et al., 2016a; Q. Wang et al., 2010). Oversegmentation results in false detections, whilst undersegmentation is likely to result in appearing or disappearing tracks during the tracking stage (Schiegg, Hanslovsky, Kausler, Hufnagel, & Hamprecht, 2013). Others have tried to combat this problem by splitting the foreground class into bacteria border and interior classes and trying a variety of different algorithms, reducing the risk of segmentation errors when labeling connected components as bacterial objects (Maayan et al., 2016; Q. Wang et al., 2010). Figure 2.3 displays the two- and three-label methods and the segmentation resulting from them. It reveals that the two-label technique resulted in an undersegmentation error where a cluster of five bacteria was classified as a single object.

Figure 2.3 (a) two-label binary image. (b) three-label binary image, where bacteria interior pixels are separated from border pixels. The green box shows the undersegmentation issue.

Two main challenges are currently present in this field when it comes to segmenting images of biofilm-forming bacteria, namely curation time and solution sharing (Maayan et al., 2016). Curation time is linked to the annotation of the acquired fluorescence microscopy images. In order to accurately classify the pixels in the raw image, the various algorithms require examples from which to classify the unseen pixels. Manual annotation can be a laborious and time-consuming process; Maayan et al. reportedly spent around 40 hours correctly segmenting biofilm-forming bacteria. Finally, solution sharing, in other words generalizability, is often impossible due to the use of unique, highly tuned mixtures of the previously mentioned techniques (Maayan et al., 2016).

2.2.2 Bacteria Tracking Chang et al. describe the two general approaches for tracking objects in time-lapse images. The first approach segments and tracks objects at the same time and uses the results of the previous frame t−1 as input for analyzing the following frame t. Although this method is efficient, it struggles with new cells entering the field of view. The second approach, on the other hand, starts by detecting objects in all the available frames prior to tracking them through the sequence of frames (Chang et al., 2017). Examples of algorithms implementing this approach are neighboring-graph tracking (Li et al., 2017), minimum-cost flow tracking (Padfield, Rittscher, & Roysam, 2011), and multiple-hypothesis tracking (Chenouard, Bloch, & Olivo-Marin, 2013). Although these algorithms can cope with new cells entering the field of view, they are not optimal for dealing with temporary disappearance and splitting or merging objects (Chang et al., 2017; Schiegg et al., 2013).

In order to track the Pseudomonas aeruginosa bacteria, which exhibit the behaviors described above, different algorithms are required. Scherf et al. describe a method able to generate an 'objective and reproducible quantification of structural cell properties as they evolve in space and time'. A potential problem with this technique comes from the fact that it was developed on mouse embryonic stem cells (Scherf et al., 2012), indicating it might not be suitable for tracking individual bacteria. Furthermore, it also fails to address the, almost always, existing issue of under- and oversegmentation in the object detection stage. One technique which does explicitly combat this segmentation problem is described by Schiegg et al. Their conservation tracking algorithm allows tracking of divisible objects whilst accounting for merged and oversegmented objects. It functions by processing all time steps at the same time and inferring globally as opposed to time-step-wise. The advantage is that this allows checking for consistency in the global environment, and thus allows correcting or accounting for segmentation errors (Schiegg et al., 2013).

3. Experimental Setup This section provides a description of the dataset and the pre-processing procedure used to generate features, followed by the experimental procedure of training and tuning classifiers and evaluating their performance on unseen data.

3.1 Data The original dataset used in developing the classifier of background, border, and interior pixels consisted of 55 time-lapse fluorescence microscopy images of the bacterium Pseudomonas aeruginosa (figure 3.1). These images come in the tagged image file (TIF) format and were acquired using an Olympus Confocal Laser Scanning Microscope in combination with a flow cell device at 50x magnification with 1-minute intervals (Ong, 2019). When viewed as a NumPy array, each raw image has shape (1024, 1024, 1), containing a total of 1,048,576 gray-scale pixel values ranging from 0 to 255, with 0 being black and 255 being white. In order to determine the most accurate technique for pixel classification and subsequent object labeling, pre-processing of the images was required. This consisted of creating a ground truth, feature engineering, and balancing classes.
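The array layout described above can be sketched as follows; a synthetic uint8 array stands in for one loaded TIF frame (the actual loading call is omitted here), but the shape and value range match the text.

```python
import numpy as np

# Stand-in for one loaded raw frame: 8-bit gray-scale values in [0, 255].
frame = np.random.default_rng(1).integers(0, 256, size=(1024, 1024),
                                          dtype=np.uint8)
frame = frame[..., np.newaxis]   # viewed as a (1024, 1024, 1) array
```

This is where the 1,048,576 pixels per image quoted above come from: 1024 × 1024 × 1.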

Figure 3.1 Pseudomonas aeruginosa at different time intervals

3.1.1 Creating ground truth First, a ground truth was developed by annotating the images through the Ilastik pixel classification workflow. This resulted in a (55, 1024, 1024, 1) NumPy array containing the labels for each pixel in each raw image: '1' corresponding to a background pixel, '2' to a border pixel, and '3' to an interior pixel. The array was exported from Ilastik and saved for later use in the dataset creation process.

3.1.2 Feature engineering The second pre-processing step consisted of engineering features from the raw gray-scale pixel values and combining these with the labels acquired in the previous step to form a data frame to which different types of classifiers can be applied. Because Sommer et al. (2011) advocate the use of a broad combination of color and intensity, edge, and texture filters at different values of sigma of the Gaussian, the decision was made to engineer a dataset containing the Gaussian Smoothing, Laplacian of Gaussian, Gaussian Gradient Magnitude, Difference of Gaussian, Structure Tensor Eigenvalues, and Hessian of Gaussian Eigenvalues filters applied at σ = 0.3, 0.7, 1.0, 1.6, 3.5, 5.0, and 10.0 levels of smoothing. Each of these features is described in more detail below, together with examples of their application to the raw data.

Gaussian Smoothing, also referred to as Gaussian blur, is an image filter which essentially blurs the image in order to remove/reduce noise and detail. It uses a bell-shaped (Gaussian or Normal distribution) kernel to compute a blurred image and is often used prior to edge detection algorithms. The user specifies a level of σ which determines the degree of blur in the final convoluted image (The University of Auckland, 2010). Figure 3.2 provides two examples of Gaussian blur applied to an image of P. aeruginosa. (a) shows the raw version, (b) a Gaussian blur applied with σ = 1, and (c) a Gaussian blur applied with σ = 5. At σ = 5 it is clearly visible the original shapes of the microbes have been blurred.

Figure 3.2 Examples of a Gaussian blur filter applied to a raw image of pseudomonas aeruginosa at different levels of σ. (a) raw image, (b) filtered with σ = 1, and (c) filtered with σ = 5.
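The two blur levels in the figure can be reproduced in miniature; `scipy.ndimage.gaussian_filter` is assumed as the implementation (the thesis itself uses VIGRA), and the bright square is a synthetic stand-in for a microbe.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

img = np.zeros((64, 64))
img[28:36, 28:36] = 255.0        # a small bright "microbe"

blur_1 = gaussian_filter(img, sigma=1)   # mild smoothing, as in panel (b)
blur_5 = gaussian_filter(img, sigma=5)   # heavy smoothing, as in panel (c)
```

At σ = 5 the peak intensity drops sharply and the object's shape is smeared out, matching the observation that the original microbe shapes are blurred away.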

The Laplacian of Gaussian (LoG) filter is prone to noise and is therefore often applied after the raw image has been smoothed. It is frequently used to detect blobs by tracking the second derivative of an image, which causes uniform areas to become zero. When edges are present between two regions, the LoG produces strong responses at the edge while remaining zero away from it (Bedros, n.d.; Weeks, 1996). An advantage of this filter is its low computational cost, whilst a disadvantage is that it does not take the direction of the edges into account (Nixon & Aguado, 2012). Figure 3.3 provides two examples of a Laplacian of Gaussian filter applied to an image of P. aeruginosa. (a) shows the raw version, (b) a Laplacian of Gaussian applied after blurring the image with σ = 1, and (c) one applied with a σ = 5 blur. (c) provides a clear example of the algorithm producing values of zero away from the edges, as the insides of the microbes are turned black.

Figure 3.3 Examples of a Laplacian of Gaussian filter applied to a raw image of pseudomonas aeruginosa at different levels of σ. (a) raw image, (b) filtered with σ = 1, and (c) filtered with σ = 5.

Gaussian Gradient Magnitude filters, closely related to Sobel operators, are used to find thin continuous edges in images and are calculated by convolving the original image with an integer-valued kernel, making this method computationally cheap (Fisher et al., 2000). Figure 3.4 provides two examples of a Gaussian Gradient Magnitude filter applied to an image of P. aeruginosa. (a) shows the raw version, (b) a Gradient Magnitude applied to a Gaussian image blurred at σ = 1, and (c) one applied to an image blurred at σ = 5. (b) and (c) show the filter's capability of highlighting the thin edges of the microbes' borders.

Figure 3.4 Examples of a Gaussian Gradient Magnitude filter applied to a raw image of pseudomonas aeruginosa at different levels of σ. (a) raw image, (b) filtered with σ = 1, and (c) filtered with σ = 5.
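A sketch of the edge-highlighting behavior, using `scipy.ndimage.gaussian_gradient_magnitude` as an assumed stand-in for the thesis's VIGRA filter:

```python
import numpy as np
from scipy.ndimage import gaussian_gradient_magnitude

img = np.zeros((64, 64))
img[24:40, 24:40] = 255.0        # synthetic "microbe"

grad_1 = gaussian_gradient_magnitude(img, sigma=1)

edge_val = grad_1[24, 32]        # on the blob's border
interior_val = grad_1[32, 32]    # deep inside the uniform interior
```

The response is non-negative by construction (it is a magnitude), large on the thin border, and essentially zero in uniform regions, matching panels (b) and (c).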

Difference of Gaussian (DoG) is another algorithm commonly used as a filter in image processing. It removes high-frequency spatial detail, which often contains random noise, from the images and highlights any edges present. The DoG is calculated by computing two Gaussian-blurred versions of the raw image at different levels of sigma and subtracting one from the other. A disadvantage of the DoG filter is that it reduces the overall contrast in the image (Spring, Russ, Parry-Hill, Fellers, & Davidson, 2016). Figure 3.5 provides two examples of a Difference of Gaussian filter applied to an image of P. aeruginosa. (a) shows the raw version, (b) a DoG applied by blurring the image with σ = 1 and subtracting an image blurred at σ = 0.66, and (c) one applied with a σ = 5 blur, subtracting an image blurred at σ = 3.3. (b) and (c) reveal that this feature is helpful to distinguish between a microbe's internal and border pixels and shows where two microbes are attached to each other.

Figure 3.5 Examples of a Difference of Gaussian filter applied to a raw image of pseudomonas aeruginosa at different levels of σ. (a) raw image, (b) filtered with σ = 1, and (c) filtered with σ = 5.
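The subtract-two-blurs construction can be sketched directly; the 0.66·σ inner scale mirrors the values quoted in the text, and `scipy.ndimage.gaussian_filter` is an assumed stand-in for VIGRA.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

img = np.zeros((64, 64))
img[24:40, 24:40] = 255.0        # synthetic "microbe"

# Two blurred copies at different scales, subtracted from each other.
sigma = 1.0
dog = gaussian_filter(img, sigma) - gaussian_filter(img, 0.66 * sigma)

bg_val = abs(dog[5, 5])          # flat background: the two blurs agree
edge_peak = np.abs(dog).max()    # the strongest response sits at the edges
```

Where the image is flat, both blurred copies are identical and the difference cancels to zero; only near edges do the two scales disagree, which is what makes the DoG an edge highlighter.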

Structure Tensor Eigenvalues filters are computed by applying eigendecomposition to the structure tensor of the raw image, resulting in a more accurate evaluation of local gradient characteristics (Jähne, 1993). The filter is used for both corner and contour detection and can also be applied to data in dimensions higher than 2D. Figure 3.6 provides two examples of a Structure Tensor Eigenvalues filter applied to an image of P. aeruginosa. (a) shows the raw version, (b) the filter applied after blurring the image with σ = 1, and (c) one applied with a σ = 5 blur. Whilst (b) reveals that this feature might be used to find microbe contours at lower levels of sigma, (c) indicates that at larger values of sigma this filter might be less useful.

Figure 3.6 Examples of a Structure Tensor Eigenvalues filter applied to a raw image of pseudomonas aeruginosa at different levels of σ. (a) raw image, (b) filtered with σ = 1, and (c) filtered with σ = 5.
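A hedged sketch of the structure tensor and its eigenvalues built from first-order Gaussian derivatives; scipy stands in for VIGRA, and the smoothing scale `rho` is an illustrative choice.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

img = np.zeros((64, 64))
img[24:40, 24:40] = 255.0        # synthetic "microbe"

# First-order Gaussian derivatives along each axis.
ix = gaussian_filter(img, 1.0, order=(0, 1))
iy = gaussian_filter(img, 1.0, order=(1, 0))

# Structure tensor components, averaged at an integration scale rho.
rho = 2.0
jxx = gaussian_filter(ix * ix, rho)
jxy = gaussian_filter(ix * iy, rho)
jyy = gaussian_filter(iy * iy, rho)

# Closed-form eigenvalues of the symmetric 2x2 tensor at every pixel.
tmp = np.sqrt((jxx - jyy) ** 2 / 4 + jxy ** 2)
eig_large = (jxx + jyy) / 2 + tmp    # large along contours and corners
eig_small = (jxx + jyy) / 2 - tmp    # large only at corners
```

Because each per-pixel tensor is positive semi-definite and smoothing only averages them, both eigenvalue maps stay non-negative; the large eigenvalue lights up along the contours, as panel (b) suggests.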

The final filter proposed by Sommer et al. is the Hessian of Gaussian Eigenvalues filter (Sommer et al., 2011). It is especially suitable for detecting spherical objects (center detection), highlighting their edges white and their interiors black (Fazlollahi, Meriaudeau, Villemagne, Rowe, & Desmond, 2013). Figure 3.7 provides two examples of a Hessian of Gaussian filter applied to an image of P. aeruginosa. (a) shows the raw version, (b) the filter applied after blurring the image with σ = 1, and (c) one applied with a σ = 5 blur. (b) and (c) indicate this might be a useful feature for segmenting multiple bacteria clustered together as separate entities, combatting the problem which will be described in the following paragraph.

Figure 3.7 Examples of a Hessian of Gaussian filter applied to a raw image of pseudomonas aeruginosa at different levels of σ. (a) raw image, (b) filtered with σ = 1, and (c) filtered with σ = 5.
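The Hessian eigenvalues can be sketched from second-order Gaussian derivatives; scipy stands in for VIGRA, and the 2×2 eigenvalues are computed in closed form.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

img = np.zeros((64, 64))
img[28:37, 28:37] = 255.0        # a bright blob centered at (32, 32)

s = 2.0
# Second-order Gaussian derivatives form the Hessian at every pixel.
hxx = gaussian_filter(img, s, order=(0, 2))
hxy = gaussian_filter(img, s, order=(1, 1))
hyy = gaussian_filter(img, s, order=(2, 0))

# Closed-form eigenvalues of the symmetric 2x2 Hessian.
tmp = np.sqrt((hxx - hyy) ** 2 / 4 + hxy ** 2)
eig_large = (hxx + hyy) / 2 + tmp
eig_small = (hxx + hyy) / 2 - tmp
```

At the center of a bright blob both eigenvalues are negative (the intensity surface curves downward in every direction), which is what turns blob interiors dark and makes the filter useful for separating clustered round objects.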

Utilizing the VIGRA Computer Vision Library created by Köthe, a script was written which automatically loads (a subset of) the raw images in Python, applies the previously described filters, stacks the corresponding labels for each pixel, and saves the output NumPy array to a specified location. The user specifies the paths to both the pixel label array and the data folder and indicates how many pictures are to be transformed into a dataset. When all raw images are to be processed, the script loops over the entire data folder and, for each image, applies the filters and stacks the labels, creating a 1024 × 1024 × 50 NumPy array which is then flattened to 2D. Concatenating the results for all 55 images yields an array of shape 57,671,680 × 50, which is saved at a location of choice. The final step in creating the data frame consisted of loading the previously saved array in Pandas, naming the columns based on the filter and sigma level used, and transforming the floating-point labels to strings. To give an idea of the resulting data frame, Figure 3.8 provides an overview of the first five observations, their values for the last four features, and their labels. Furthermore, a table containing descriptive statistics of all 49 features is included in Appendix A.
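The loop described above can be sketched in miniature. The thesis uses VIGRA's filters; here a few scipy equivalents stand in, and two small synthetic "frames" with all-ones label maps replace the folder of raw TIFs, so only the stack-then-flatten mechanics are shown.

```python
import numpy as np
from scipy.ndimage import (gaussian_filter, gaussian_laplace,
                           gaussian_gradient_magnitude)

sigmas = [0.3, 0.7, 1.0]                 # subset of the sigma levels used
frames = [np.random.default_rng(i).random((32, 32)) for i in range(2)]
labels = [np.ones((32, 32)) for _ in frames]   # stand-in ground truth

rows = []
for frame, lab in zip(frames, labels):
    # Apply each filter at each sigma level...
    feats = [gaussian_filter(frame, s) for s in sigmas]
    feats += [gaussian_laplace(frame, s) for s in sigmas]
    feats += [gaussian_gradient_magnitude(frame, s) for s in sigmas]
    # ...stack the label map as the last channel...
    stack = np.dstack(feats + [lab])             # (H, W, n_features + 1)
    # ...and flatten to one row per pixel.
    rows.append(stack.reshape(-1, stack.shape[-1]))

dataset = np.vstack(rows)                # (total_pixels, n_features + 1)
```

With 3 filters × 3 sigmas this yields 9 feature columns plus 1 label column; the full pipeline (6 filter types, 7 sigmas, 55 frames of 1024 × 1024) produces the 57,671,680 × 50 array described in the text.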

Figure 3.8 Overview of the structure of the data frame

3.1.3 Balancing classes After inspecting the distribution of the labels in the dataset, it was discovered that the numbers of background, border, and interior labels are unbalanced. Although this was not a complete surprise after visually inspecting the raw images (more background compared to bacteria), it is still an issue which needs to be addressed in order to accurately assess the pixel classification models. As shown in Figure 3.9, a data frame created by pre-processing a random selection of five raw images contains 5,053,195 background, 116,612 border, and 73,073 interior pixels. In other words, roughly 96.4% of the labels belong to the background class, whilst only 2.2% belong to the border class and the remaining 1.4% to the interior class.

Figure 3.9 Distribution of class labels

To balance the classes and at the same time create a more manageable dataset in terms of computational time and hardware requirements, the decision was made to apply under-sampling. Under-sampling reduces the number of majority-class members in the dataset and is less prone to overfitting, but at the risk of useful information being lost (Minh, 2018). However, considering the size of the available data, larger balanced datasets could be created in order to mitigate the potential negative influence of lost information. First, the data frame was randomly shuffled. Afterwards, a user-specified number of observations (1,000 in this case) from each class was randomly sampled using the built-in sample function. This number can easily be increased by adapting n of the interior class in the preprocessing code should better hardware be available. Figure 3.10 shows the balanced classes after applying this process.

Figure 3.10 Distribution of balanced classes
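The under-sampling step can be sketched with pandas: the same number of observations is drawn at random from each class. The counts below are synthetic stand-ins (scaled down from the Figure 3.9 proportions), not the thesis's actual data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the unbalanced pixel data frame.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature": rng.random(5000),
    "label": ["background"] * 4600 + ["border"] * 250 + ["interior"] * 150,
})

# Randomly draw the same number of observations from every class.
n_per_class = 100
balanced = df.groupby("label").sample(n=n_per_class, random_state=0)
```

The majority (background) class is shrunk to the size of the smallest class's sample, which is exactly the trade-off the text describes: balance at the cost of discarded background pixels.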

3.2 Method / Models After pre-processing the data, two experiments were performed, aimed at answering which features are most predictive of the outcome pixels and which technique is best for segmenting Pseudomonas aeruginosa. In order to prepare the data for training and tuning the algorithms, the training data was loaded using the pandas read_csv function (McKinney, 2011). This data was then split into an X and a y variable containing the features and the outcome variable respectively. Using the Scikit-Learn train_test_split function (Pedregosa et al., 2011), the X and y data was split into a training and a test set, resulting in four new datasets, namely X_train, X_test, y_train, and y_test, composed of the features for the training set, the features for the test set, the outcomes of the training set, and the outcomes of the test set respectively. The test set contained 33% of the original observations (roughly 333). The final step before training and tuning the different algorithms was scaling the features using Scikit-Learn's StandardScaler function (Pedregosa et al., 2011), which standardizes features by removing the mean and scaling to unit variance. After data preparation, the four different classifiers were trained, tuned, and evaluated on the data which resulted from the pre-processing step.
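The split-and-scale step can be sketched as follows; the feature matrix is synthetic, but the 33% test fraction and the fit-on-training-data-only scaling match the procedure described above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the balanced pixel data frame.
rng = np.random.default_rng(0)
X = rng.random((300, 5))
y = rng.integers(1, 4, size=300)     # labels 1/2/3 as in the ground truth

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42, stratify=y)

# Fit the scaler on the training set only, then transform both sets,
# so no information from the test set leaks into training.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)
```

After scaling, every training feature has zero mean and unit variance, which matters for distance-based models such as k-NN and for the SVM evaluated later.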

This section is aimed at providing a detailed explanation of the analysis methodology. It starts with a more formal explanation of the software, libraries, and models used during this project before providing a detailed explanation of the tuning and evaluation procedure of these models.

3.2.1 Software and Libraries At the core of the project are Anaconda Navigator and Jupyter Notebook. Anaconda was used to create separate environments containing the different packages/libraries which were necessary during the different stages of the project. The code was then written in Jupyter notebooks, as these allow for clear structure and documentation (Pérez & Granger, 2007).

Other software used were Fiji and Ilastik. Fiji is image-processing software (Morais, Koller, & Raffaelli, 2009) and was used to open and view the TIF-format microscopy images. Ilastik also played an important role during this project. It is an interactive learning and segmentation toolkit focused on interactive image classification, segmentation, and analysis (Sommer, Strähle, Köthe, & Hamprecht, 2011). Ilastik was first used to create a ground truth for the raw data, which allowed training the previously mentioned classifiers. After the classifiers were trained, they were applied to all 55 raw images in order to classify them. These images were then fed back into Ilastik to track microbes through the different time steps and export this data as a CSV file. This allowed further quantification of microbe behavior during the initial stages of biofilm development.

Next to the different software distributions, this project also made extensive use of a wide selection of Python libraries, most importantly NumPy, pandas, VIGRA, Matplotlib, Seaborn, and scikit-learn. NumPy was used to load raw data into Python, compose data frames, and save data for later use (Oliphant, 2007). These frames were made more interpretable and indexable by naming feature columns using pandas (McKinney, 2011). VIGRA was then used for engineering features from the raw fluorescence microscopy images, which were required to create a classifier that could accurately classify background, border, and interior pixels. Furthermore, VIGRA's connected component analysis tool was used to provide each segmented bacterium with a unique label, required for further analysis down the pipeline (Kothe, 1999). Matplotlib and Seaborn were used to visually inspect data and create plots (Hunter, 2007; Waskom, Botvinnik, & O'Kane, 2017). Scikit-learn provided the functions required to pre-process data, perform LASSO, k-NN, Random Forest, and SVM classifications, select models, and evaluate the selected models (Pedregosa et al., 2011).

3.2.2 Models

To address the first research question, related to which set of features is most predictive of the outcome pixel, a LASSO classification was performed. This model builds on logistic regression, with the addition of an L1 regularization penalty. It is useful for determining feature importance as it favors simple, sparse models. Furthermore, it can deal with multicollinearity, which might be a factor in this research because the same filters at different levels of sigma serve as features. When tuning this model, the λ parameter determines the amount of shrinkage applied to the model's coefficients (James, Witten, Hastie, & Tibshirani, 2013).
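A minimal sketch of how an L1-penalized logistic regression exposes predictive features; the data is synthetic, and the value of C (Scikit-learn parameterizes the penalty as C, the inverse of the regularization strength λ) is illustrative:

```python
# L1-penalized ("LASSO") logistic regression as a feature selector.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 10))
# Only features 0 and 3 actually drive the class label.
y = (X[:, 0] + 2 * X[:, 3] > 0).astype(int)

# Smaller C = stronger penalty = more coefficients shrunk exactly to zero.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)

kept = np.flatnonzero(clf.coef_[0])  # indices of surviving features
```

Features whose coefficients survive the penalty are the candidates for a reduced feature set, mirroring the role LASSO plays in this thesis.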

The second research question concerned determining the best classifier to segment Pseudomonas aeruginosa from raw fluorescence microscopy images. Three types of classifiers were evaluated to answer this question: k-Nearest Neighbors (k-NN), Random Forests, and Support Vector Machines (SVM). For each, the following paragraphs provide a short introduction to the model, followed by a description of the hyperparameters which were tuned to optimize its results as a pixel classifier.

The first model trained to classify Pseudomonas aeruginosa image pixels was the k-NN algorithm. Because no actual training occurs, some consider k-NN one of the simpler algorithms in machine learning. The training data is stored in memory and the algorithm predicts the class of an unseen data point as the majority class of its k closest data points, or 'neighbors' (Muller & Guido, 2017). When evaluating multiple algorithms, 1-NN is often used as a benchmark, or baseline, due to its adequate performance on most classification tasks (Jain, Duin, & Mao, 2000). The most important hyperparameters of the k-NN classifier are the value of k and the distance metric used to calculate the proximity of data points. Most commonly, the Euclidean distance is used. Other metrics include the Minkowski distance (of which the Manhattan distance is a special case), cosine similarity, and the Chi-square distance (Hu, Huang, Ke, & Tsai, 2016).
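As a brief illustration of the classifier described above (the dataset and hyperparameter values here are illustrative, not those of the thesis):

```python
# k-NN on a toy dataset; n_neighbors and metric are the two
# hyperparameters discussed in the text.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=0)

# "euclidean" is the common default; "manhattan" selects the
# Minkowski distance with p = 1.
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean").fit(X_tr, y_tr)
acc = knn.score(X_te, y_te)
```

Since k-NN stores the training data rather than fitting parameters, `fit` is essentially instantaneous; the cost is paid at prediction time.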

A more complex type of algorithm is the Random Forest classifier. It addresses a disadvantage of single decision trees, which are prone to overfitting (Breiman, 2001). By fitting multiple marginally different trees and averaging their results, the Random Forest is an ensemble method which aims to reduce overfitting whilst keeping the predictive power observed in single trees. The term Random Forest comes from the randomized process of selecting data or features when building each separate tree (Muller & Guido, 2017). Several hyperparameters can be tuned when implementing a Random Forest, including n_estimators, max_features, max_depth, and max_leaf_nodes. n_estimators is the number of trees in the forest, max_features is the number of features considered when determining the most appropriate split, max_depth limits the depth of each tree, and, finally, max_leaf_nodes limits the number of leaf nodes per tree (Muller & Guido, 2017).
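The hyperparameters named above map directly to constructor arguments in scikit-learn; a short sketch on illustrative data:

```python
# Random Forest with the hyperparameters discussed in the text.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=12, random_state=0)

rf = RandomForestClassifier(
    n_estimators=100,     # number of trees in the forest
    max_features="sqrt",  # features considered at each split
    max_depth=None,       # grow trees until leaves are pure
    max_leaf_nodes=None,  # no cap on the number of leaves
    random_state=0,
).fit(X, y)

# Each tree was fit on a bootstrap sample with a random feature subset.
n_trees = len(rf.estimators_)
```

Averaging over many marginally different trees is what damps the overfitting of any single tree while retaining its expressive power.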

Finally, an SVM algorithm was used. It maximizes the margins between the decision boundary and the data points, using the kernel trick (Boser, Vapnik, Guyon, & Laboratories, n.d.). SVMs project non-linear input vectors into a high-dimensional space where a linear decision boundary can separate the classes (Cortes & Vapnik, 1995). Unseen data is predicted by calculating the distance to the support vectors as measured by the Gaussian kernel, which takes the exponent of the negative squared Euclidean distance scaled by a gamma parameter (James et al., 2013; Muller & Guido, 2017). When implementing the SVM, two hyperparameters can be tuned: gamma and C. Gamma determines the width of the Gaussian kernel: large values of gamma mean a smaller kernel width, implying points are treated as being far away, and vice versa. C is a regularization parameter: large values of C result in less restricted models where individual data points can have significant influence, bending the decision boundary in order to classify them correctly (James, Witten, Hastie, & Tibshirani, 2013; Muller & Guido, 2017).
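The effect of gamma can be seen directly by comparing a narrow and a wide Gaussian kernel on the same illustrative data; a very large gamma typically lets the model memorize the training set:

```python
# Gamma controls the Gaussian (RBF) kernel width; C the regularization.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=6, random_state=0)

narrow = SVC(kernel="rbf", gamma=10.0, C=1.0).fit(X, y)  # large gamma
wide = SVC(kernel="rbf", gamma=0.01, C=1.0).fit(X, y)    # small gamma

# The narrow-kernel model fits the training data much more tightly,
# which on unseen data would show up as overfitting.
narrow_train_acc = narrow.score(X, y)
wide_train_acc = wide.score(X, y)
```

This is why gamma and C are tuned jointly: both trade training-set fit against generalization.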

3.2.3 Hyperparameter Tuning

First, the optimal hyperparameter settings for the four algorithms had to be determined. To reduce the computational resources required to evaluate the grid of parameters, the Scikit-learn function RandomizedSearchCV (Pedregosa et al., 2011) was used to narrow the range of values to evaluate (Koehrsen, 2018). This function conducts a cross-validated randomized hyperparameter search based on a range of values, a number of folds k, and a number of searches specified by the user, and outputs the top three models with their corresponding parameters. Table 3.1 provides an overview of the ranges used for each model's parameters. As recommended by Koehrsen (2018), the number of searches was 100 and k was five for each model.

Table 3.1
RandomizedSearchCV parameter ranges.

Model   Parameter          Range
LASSO   λ                  0 - 1
k-NN    n_neighbors        1 - 100
SVM     kernel             [linear, rbf, poly, sigmoid]
        C                  0.05 - 4
        gamma              0.01 - 1
RF      max_depth          [3, None]
        max_features       1 - 25
        min_samples_split  2 - 20
        bootstrap          [True, False]
        criterion          [gini, entropy]

Note. n_neighbors, max_features, and min_samples_split are integer ranges. For λ the step size was 0.00001, whilst for C and gamma this was 0.05.
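The randomized search stage might be sketched as follows; the distributions loosely mirror the Random Forest rows of Table 3.1, but the data, the number of trees, and the number of iterations are reduced illustrative stand-ins (the thesis used 100 searches with five folds):

```python
# Cross-validated randomized hyperparameter search for a Random Forest.
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

param_dist = {
    "max_features": randint(1, 10),       # sampled uniformly as integers
    "min_samples_split": randint(2, 20),
    "bootstrap": [True, False],
    "criterion": ["gini", "entropy"],
}

search = RandomizedSearchCV(
    RandomForestClassifier(n_estimators=25, random_state=0),
    param_distributions=param_dist,
    n_iter=10,   # number of random parameter draws (thesis: 100)
    cv=5,        # 5-fold cross-validation, as in the text
    random_state=0,
).fit(X, y)

best = search.best_params_
```

Each of the `n_iter` draws is evaluated with 5-fold cross-validation, so the total number of fits is `n_iter * cv`.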

Based on the results of this randomized search, a new grid was proposed for evaluation with Scikit-learn's GridSearchCV function, which conducts an exhaustive search over all user-specified parameters (Pedregosa et al., 2011). Table 3.2 provides an overview of the different parameters used for this exhaustive search.

Table 3.2
GridSearchCV parameter values.

Model   Parameter          Values
LASSO   λ                  [1e-05, 3e-05, 5e-05, 8e-05]
k-NN    n_neighbors
SVM     kernel             [linear, rbf, poly, sigmoid]
        C                  [0.01, 0.05, 0.1, 0.5, 1, 2, 3, 5]
        gamma              [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.5]
RF      max_depth          [None]
        max_features       [11, 12, 13, 14, 15, 16, 17]
        min_samples_split  [3, 5, 7, 8, 10]
        bootstrap          [False]
        criterion          [gini, entropy]

Note. The values were selected based on the outcome of the RandomizedSearchCV. If only one value is specified, all of the top three models reported this parameter.
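The exhaustive stage can be sketched the same way; the grid below is a reduced, illustrative version of the SVM rows of Table 3.2:

```python
# Exhaustive grid search over a small grid narrowed down by the
# randomized stage; data and grid values are illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=6, random_state=0)

grid = {
    "kernel": ["rbf", "poly"],
    "C": [0.1, 1, 5],
    "gamma": [0.05, 0.1, 0.5],
}

# Every combination in the grid is fit and scored with 5-fold CV:
# here 2 * 3 * 3 = 18 candidates, 90 fits in total.
search = GridSearchCV(SVC(), param_grid=grid, cv=5).fit(X, y)
best = search.best_params_
```

Because the grid search is exhaustive, its cost grows multiplicatively with each added parameter, which is exactly why the randomized stage is used first to prune the ranges.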

3.2.4 Evaluation

Classification algorithms are usually evaluated on several aspects, including calculation time and predictive power (Srivastava, 2018). In evaluating the predictive power of the algorithms described above, we used the precision, recall, and F1 metrics, all derived from the confusion matrix (Powers, 2007). Precision is the ratio of correctly predicted positive observations to all positively predicted observations:

Precision = TP / (TP + FP)

Recall, also known as sensitivity, is the fraction of actual positive observations in a class that are correctly predicted:

Recall = TP / (TP + FN)

Finally, the F1 score is the harmonic mean of precision and recall:

F1 score = 2 * (Precision * Recall) / (Recall + Precision)
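A worked example of the three metrics on a tiny prediction vector, using the scikit-learn implementations the thesis relies on:

```python
# Precision, recall, and F1 on a hand-checkable example.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]  # TP = 3, FP = 1, FN = 1, TN = 3

p = precision_score(y_true, y_pred)   # 3 / (3 + 1) = 0.75
r = recall_score(y_true, y_pred)      # 3 / (3 + 1) = 0.75
f1 = f1_score(y_true, y_pred)         # 2 * (0.75 * 0.75) / 1.5 = 0.75
```

For the three-class pixel problem (background, border, interior), scikit-learn computes these per class and they can then be averaged, which matches the per-class columns reported in the results tables.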

Determining the most appropriate technique for segmenting bacteria from raw images consisted of evaluating the top-performing model parameters in terms of precision, recall, and F1 score on previously unseen data. This was done by generating a second dataset using the procedure described in Section 3.1. For each model, both the default Scikit-learn model and the tuned model were trained on two-thirds of this new dataset, evaluated on the remaining third, and compared based on precision, recall, and F1 score. In addition, the top-performing model was applied to a raw image of Pseudomonas aeruginosa, and segmentation errors were manually counted and compared to the baseline segmentations using Otsu's and mean thresholds.

3.2.5 Baseline

In order to determine whether the model has indeed learned something about classifying pixels in fluorescence microscopy images of Pseudomonas aeruginosa, three baselines were created. These consist of object-labelled images of a raw image and of the mean and Otsu thresholding techniques mentioned in Kan's review (Kan, 2017). Figure 3.11 displays the object labels when applied to a raw version of an image. Careful inspection reveals a lot of noise which was not visible when visually inspecting the raw image. In addition, manually evaluating the segmentation reveals four undersegmentation errors in this section.

Figure 3.11 Example of object labels (b) when applied to the raw image (a).

The next baseline which was created was the mean threshold. As the name implies, it binarizes the image based on the mean of the grayscale values of the raw image (Pedregosa et al., 2011). Figure 3.12a shows an example of the mean threshold applied to a raw image. When carefully inspecting figure 3.12b, we see that it outperforms labeling a raw image in terms of dealing with noise but is not able to filter out all the noise. In terms of undersegmentation errors it performs equally well (four errors).

Figure 3.12 Example of object labels (b) after applying a mean threshold (a) to a raw image.

Otsu's threshold was another method which was evaluated. It splits the pixels into two groups with different ranges of values and sets the threshold at the value for which the two groups are least similar to each other, i.e., where the between-class variance is maximized (Morse, 2000). Figure 3.13a displays the results of applying Otsu's method to a raw image and the corresponding object labels in 3.13b. This method removes all the noise from the image and is less prone to undersegmentation errors. However, whilst there is only one undersegmentation error, it is important to note this method is prone to oversegmenting the image, as seen from the five oversegmentation errors in section b of Figure 3.13.

Figure 3.13 Example of object labels (b) after applying Otsu's threshold (a) to a raw image.
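The two thresholding baselines can be sketched with NumPy alone; the image below is a synthetic stand-in with one bright patch on a dark background, and the Otsu implementation is a minimal one written for illustration. Consistent with the observations above, the mean threshold keeps some background noise while Otsu's threshold isolates the bright patch cleanly:

```python
# Mean threshold and a minimal Otsu threshold on a synthetic image.
import numpy as np

rng = np.random.default_rng(0)
img = rng.normal(40, 5, size=(64, 64))               # dark background
img[20:30, 20:40] = rng.normal(180, 5, size=(10, 20))  # bright "bacterium"

# Mean threshold: binarize at the mean grayscale value.
mean_mask = img > img.mean()

def otsu_threshold(image, bins=256):
    """Pick the threshold maximizing between-class variance."""
    hist, edges = np.histogram(image, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)              # cumulative background weight
    w1 = w0[-1] - w0                  # foreground weight
    m0 = np.cumsum(hist * centers)    # cumulative weighted sum
    m1 = m0[-1] - m0
    valid = (w0 > 0) & (w1 > 0)
    between = np.zeros_like(centers)
    between[valid] = w0[valid] * w1[valid] * (
        m0[valid] / w0[valid] - m1[valid] / w1[valid]) ** 2
    return centers[np.argmax(between)]

otsu_mask = img > otsu_threshold(img)
```

On this toy image the mean falls just above the background level, so a tail of background pixels survives the mean threshold, whereas Otsu's threshold lands in the gap between the two intensity modes.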

4. Results

This section starts by presenting the results of applying the different baseline thresholds to the data, followed by an evaluation of the classifiers using all the available features. Afterwards, the feature coefficients resulting from the LASSO regression are presented in order to determine which features are most predictive of the outcome pixels. It continues with each classifier's results when trained using the eight features with the largest coefficients. Finally, the classifier with the best score is applied to classify the pixels of the unseen raw images, which are subsequently labeled.

Table 4.1 shows the evaluation metrics of different thresholding techniques applied to the dataset. Looking at the F1-score, the binarized (raw) threshold outperforms the other methods in the BG class. The mean threshold scores the highest in the bacteria class.

Table 4.1
Best performing thresholds for classifying background (BG) and bacteria pixels in fluorescence microscopy images of Pseudomonas aeruginosa.

            Precision       Recall          F1 score
Threshold   BG    Bacteria  BG    Bacteria  BG    Bacteria
Raw         0.70  0.99      0.98  0.79      0.82  0.82
Mean        0.67  0.99      0.99  0.76      0.80  0.86
Otsu        0.45  1.00      1.00  0.38      0.62  0.55

Note. Applied to the same dataset used for determining feature importance and classifier performance.

Table 4.2 provides an overview of the evaluation results of each classification model with its optimized parameters. With 1.00 on the background class and 0.97 on both the border and interior classes, the Random Forest model outperforms or matches the other models in terms of F1-score for all three classes. Looking at precision individually, the SVM outperforms the other models in predicting the background class and matches the Random Forest for the border class, while the k-NN outperforms the other models for the interior class. In terms of recall, the SVM and Random Forest perform equally well on the background class, the k-NN scores highest on the border class, and the SVM and Random Forest again achieve the highest scores on the interior class.

Table 4.2
Best performing models for classifying background (BG), border (B), and interior (I) pixels in fluorescence microscopy images of Pseudomonas aeruginosa.

        Precision         Recall            F1 score
Model   BG    B     I     BG    B     I     BG    B     I
k-NN    0.99  0.93  0.98  0.99  0.98  0.94  0.99  0.96  0.96
SVM     1.00  0.96  0.96  1.00  0.96  0.96  1.00  0.96  0.96
RF      0.99  0.96  0.97  1.00  0.97  0.96  1.00  0.97  0.97

Note. k-Nearest Neighbors (k = 14), Support Vector Machine (C = 0.05, kernel = 'poly', gamma = 0.3), Random Forest (bootstrap = False, criterion = 'gini', max_depth = None, max_features = 4, min_samples_split = 17).

Table 4.3 provides an overview of feature coefficients which were not penalized to 0 by the LASSO regression. From the table we see that the Gaussian Smoothing σ = 3.5 has the largest positive coefficient with 0.083, followed by Gaussian Gradient Magnitude σ = 1.6 with 0.041. Looking at the largest negative coefficients, the second channel of Hessian of Gaussian Eigenvalues σ = 1.6 has a value of -0.055, followed by the second channel of Structure Tensor Eigenvalues σ = 1.0 with a value of -0.02. Furthermore, we see that the second channel of the Hessian of Gaussian eigenvalues and Gaussian Gradient Magnitude occur three and two times at different levels of σ respectively.

Table 4.3
Feature importance for classifying background, border, and interior pixels in fluorescence microscopy images of Pseudomonas aeruginosa according to the LASSO classifier.

Feature                            σ     Coefficient
Gaussian Smoothing                 3.5    0.082898960
Gaussian Gradient Magnitude        1.6    0.041397010
Gaussian Gradient Magnitude        3.5    0.022350758
Hessian of Gaussian Eigenvalues    1.6    0.005934474
Hessian of Gaussian Eigenvalues *  0.7   -0.007669529
Hessian of Gaussian Eigenvalues *  3.5   -0.008701015
Structure Tensor Eigenvalues       1.0   -0.020438848
Hessian of Gaussian Eigenvalues *  1.6   -0.054961003

Note. The λ value used for the LASSO classification was 0.0005, determined through GridSearchCV. * denotes the second channel of the feature.

Table 4.4 provides an overview of the evaluation results of each classification model with their optimized parameters trained on the dataset with the eight features resulting from the LASSO classification. From the table it is possible to see that, with 1.00 on the background class and 0.97 on both the border and interior class, the Random Forest and k-NN model outperform the SVM in terms of F1-score for all the three classes. Next, precision and recall will be evaluated individually. Looking at precision, it was found that all classifiers perform equally well in predicting the background class and interior class. The Random Forest classifier, however, outperforms the other models when predicting the border class. When evaluating the models’ results in terms of recall, the Random Forest outperforms the other models for the background and interior class, whilst the k-NN scores highest on the border class.

Table 4.4
Best performing models for classifying background (BG), border (B), and interior (I) pixels in fluorescence microscopy images of Pseudomonas aeruginosa with a reduced feature set.

        Precision         Recall            F1 score
Model   BG    B     I     BG    B     I     BG    B     I
k-NN    1.00  0.96  0.97  0.99  0.98  0.96  1.00  0.97  0.97
SVM     1.00  0.95  0.97  0.99  0.97  0.95  1.00  0.96  0.96
RF      1.00  0.97  0.97  1.00  0.97  0.97  1.00  0.97  0.97

Note. k-Nearest Neighbors (k = 5), Support Vector Machine (C = 2.6, kernel = 'poly', gamma = 0.11), Random Forest (bootstrap = True, criterion = 'entropy', max_depth = None, max_features = 5, min_samples_split = 5).

After determining the best performing classifier for pixel classification of fluorescence microscopy images of Pseudomonas aeruginosa, it was applied to the original 55 raw images. Connected component labeling was then applied to the pixel-classified images in order to generate the labelled images. The results for all 55 images can be found in Appendix B. Figure 4.1a shows a section of a pixel-classified image and 4.1b shows the connected component labeling resulting from that image. The classifier detects noise and classifies it as background pixels accordingly. Furthermore, comparing these results to the raw image reveals one undersegmentation error in the top left corner of this specific section of the image.

Figure 4.1 Top performing pixel classifier applied to a raw image (a) and the resulting connected component labeling (b).
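The labeling step can be illustrated with scipy.ndimage.label as a stand-in for VIGRA's connected component analysis used in the thesis; the mask below is a toy two-blob classified image:

```python
# Connected component labeling: each foreground blob gets a unique label.
import numpy as np
from scipy import ndimage

# 0 = background, 1 = bacterium pixels (two separate blobs).
mask = np.zeros((10, 10), dtype=int)
mask[1:3, 1:4] = 1
mask[6:9, 5:8] = 1

labels, n = ndimage.label(mask)  # labels: int array, n: number of blobs
```

After this step, every bacterium carries a unique integer identifier, which is what downstream tracking through the time-lapse frames operates on.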

5. Discussion

This section discusses the results in relation to the research questions. The overall goal of this research was to assist in the development of a tool which automatically segments individual bacteria in time-lapse microscopy images of the early stages of biofilm development. Therefore, we wanted to find out which set of features is most important for pixel classifying fluorescence microscopy images of Pseudomonas aeruginosa and, subsequently, which classifier performs best and how accurately. In order to do so, several experiments were conducted.

First, a LASSO was performed to determine which features were most important for pixel classifying the raw images. All but eight features were shrunk to zero by our model. These included Gaussian Smoothing σ = 3.5, Gaussian Gradient Magnitude σ = 1.6, Gaussian Gradient Magnitude σ = 3.5, Hessian of Gaussian Eigenvalues σ = 1.6, Hessian of Gaussian Eigenvalues * σ = 0.7, Hessian of Gaussian Eigenvalues * σ = 3.5, Structure Tensor Eigenvalues σ = 1.0, and Hessian of Gaussian Eigenvalues * σ = 1.6. The coefficients of these features, in combination with the fact that the second channel of the Hessian of Gaussian Eigenvalues occurs three times in this list, might indicate that these are the important features when developing a tool for segmenting bacteria from fluorescence microscopy images. Future research might look at further refining the optimal levels of σ, as we experimented with 0.3, 0.7, 1.0, 1.6, 3.5, 5.0, and 10, and optimal values could potentially fall in between.

Next, a baseline was created using thresholding techniques commonly used in image segmentation, followed by evaluating the classifiers' performance on the datasets containing the full and reduced sets of features. The Random Forest classifier outperformed or matched the other classifiers in both cases, albeit by a very small margin, indicating it is potentially the best classifier for performing pixel classification on fluorescence microscopy images of Pseudomonas aeruginosa. Visually comparing the object-labelled images of the baselines and our method revealed that our method is also much less prone to segmentation errors than other techniques commonly described in the literature. Future research might look at tracking individual bacteria through time with these labelled images to see how well tracking algorithms are able to operate based on these results.

Finally, it is important to consider the potential limitations of this study. This work was applied to one type of cell, pseudomonas aeruginosa, and captured using one type of microscope, the Olympus Confocal Laser Scanning Microscope. Furthermore, the images were all taken at the same level of magnification and parts of the images were out of focus, all of which could negatively influence generalization capabilities outside of the original dataset. It would be interesting to apply the technique described in this research to images of other biological cells and images taken from different types of microscopes such as phase contrast microscopes to see how well they generalize outside our original dataset, especially because generalization is a known problem in this field of research.

6. Conclusion

This research focused on determining the best technique to segment bacteria in fluorescence microscopy images of the early stages of biofilm formation; more specifically, the important feature set and algorithms for pixel classifying the images and subsequently object labeling them. It is an interesting problem because technological advances produce more and more data which is laborious and time-consuming to analyze manually. In order to answer the research questions, a dataset was carefully engineered from 55 raw fluorescence microscopy images and different models were applied to it. This included creating a ground truth using the Ilastik software, engineering features using the VIGRA library, balancing and subsetting the data, performing a LASSO to determine feature importance, tuning and evaluating the performance of three commonly used classifiers, and, finally, applying the top performing classifier to the original dataset of 55 fluorescence microscopy images and labeling the objects. The most important features were a subset of the original 49, consisting of Gaussian Smoothing σ = 3.5, Gaussian Gradient Magnitude σ = 1.6, Gaussian Gradient Magnitude σ = 3.5, Hessian of Gaussian Eigenvalues σ = 1.6, Hessian of Gaussian Eigenvalues * σ = 0.7, Hessian of Gaussian Eigenvalues * σ = 3.5, Structure Tensor Eigenvalues σ = 1.0, and Hessian of Gaussian Eigenvalues * σ = 1.6. Training different classifiers on this subset revealed that the Random Forest classifier was the most suitable for classifying pixels in raw fluorescence microscopy images of Pseudomonas aeruginosa. Furthermore, labeling the objects in the images resulting from this pixel classification revealed that this method is less prone to segmentation errors than other commonly used techniques in image segmentation.

References

Barnes, R., Lehman, C., & Mulla, D. (2014). Priority-flood: An optimal depression-filling and watershed-labeling algorithm for digital elevation models. Computers and Geosciences, 62(2), 117–127. https://doi.org/10.1016/j.cageo.2013.04.024

Bedros, S. J. (n.d.). Edge Detection.

Boser, E., Vapnik, N., Guyon, I. M., & Laboratories, T. B. (n.d.). A Training Algorithm for Optimal Margin Classifiers, 144–152.

Breiman, L. (2001). Random Forests, 1–33.

Chang, Y. H., Yokota, H., Abe, K., Tang, C. T., & Tasi, M. D. (2017). Automated Detection and Tracking of Cell Clusters in Time-Lapse Fluorescence Microscopy Images. Journal of Medical and Biological Engineering, 37(1), 18–25. https://doi.org/10.1007/s40846- 016-0216-y

Chenouard, N., Bloch, I., & Olivo-Marin, J. C. (2013). Multiple hypothesis tracking for cluttered biological image sequences. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(11), 2736–2750. https://doi.org/10.1109/TPAMI.2013.97

Conrad, J. C., Gibiansky, M. L., Jin, F., Gordon, V. D., Motto, D. A., Mathewson, M. A., … Wong, G. C. L. (2011). Flagella and pili-mediated near-surface single-cell motility mechanisms in P. aeruginosa. Biophysical Journal, 100(7), 1608–1616. https://doi.org/10.1016/j.bpj.2011.02.020

Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297. https://doi.org/10.1023/A:1022627411411

Costerton, J. W., Geesey, G. G., & Cheng, K. J. (1978). How bacteria stick. Scientific American, 238(1), 86–95. https://doi.org/10.1038/scientificamerican0178-86

Costerton, J. W., Stewart, P. S., & Greenberg, E. P. (1999). Bacterial biofilms: A common cause of persistent infections. Science, 284, 1318–1322.

Cristina, C., & Queimadelas, A. (2012). Automated segmentation, tracking and evaluation of bacteria in microscopy images.

Cunningham, A., Lennox, J., & Ross, R. (n.d.). Biofilms: The Hypertextbook.

Etzrodt, M., Endele, M., & Schroeder, T. (2014). Quantitative single-cell approaches to stem cell research. Cell Stem Cell. https://doi.org/10.1016/j.stem.2014.10.015

Fazlollahi, A., Meriaudeau, F., Villemagne, V., Rowe, C., & Desmond, P. (2013). Automatic detection of small spherical lesions using multiscale approach in 3D medical images. IEEE ICIP.

Fisher, R., Perkins, S., Walker, A., & Wolfart, E. (2000). Hypermedia Image Processing Reference. Retrieved from https://homepages.inf.ed.ac.uk/rbf/HIPR2/convolve.htm

Gibiansky, M. L., Luijten, E., Beckerman, B., Zhao, K., Jin, F., Wong, G. C. L., … Harrison, J. J. (2013). Psl trails guide exploration and microcolony formation in Pseudomonas aeruginosa biofilms. Nature, 497(7449), 388–391. https://doi.org/10.1038/nature12155

Hall-Stoodley, L., & Stoodley, P. (2009). Evolving concepts in biofilm infections. Cellular Microbiology, 11(7), 1034–1043. https://doi.org/10.1111/j.1462-5822.2009.01323.x

Haubold, C., Schiegg, M., Kreshuk, A., Berg, S., Koethe, U., & Hamprecht, F. A. (2016a). Segmenting and tracking multiple dividing targets using ilastik. Advances in Anatomy Embryology and Cell Biology, 219, 199–229. https://doi.org/10.1007/978-3-319-28549- 8_8

Hu, L. Y., Huang, M. W., Ke, S. W., & Tsai, C. F. (2016). The distance function effect on k- nearest neighbor classification for medical datasets. SpringerPlus, 5(1). https://doi.org/10.1186/s40064-016-2941-7

Hunter, J. (2007). Matplotlib: A 2D Graphics Environment. Computing in Science & Engineering, 9(3), 90–95.

Jähne, B. (1993). Spatio-temporal image processing: Theory and scientific applications.

Jain, A., Duin, R., & Mao, J. (2000). Statistical Pattern Recognition: a Review. IEEE Transactions on pattern analysis and machine intelligence.

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning. https://doi.org/10.1007/978-1-4614-7138-7

Jones, H. C., Roth, I. L., & Sanders, W. M. (1969). Electron microscopic study of a slime layer. Journal of Bacteriology, 99(1), 316–325.

Kan, A. (2017). Machine learning applications in cell image analysis. Immunology and Cell Biology, 95(6), 525–530. https://doi.org/10.1038/icb.2017.16

Koehrsen, W. (2018). Hyperparameter tuning the Random Forest in Python. Retrieved from https://towardsdatascience.com/hyperparameter-tuning-the-random-forest-in-python-using-scikit-learn-28d2aa77dd74

Kothe, U. (1999). Reusable Software in Computer Vision. Handbook on Computer Vision and Applications, 3.

Lee, C. K., de Anda, J., Baker, A. E., Bennett, R. R., Luo, Y., Lee, E. Y., … Wong, G. C. L. (2018). Multigenerational memory and adaptive adhesion in early bacterial biofilm communities. Proceedings of the National Academy of Sciences. https://doi.org/10.1073/pnas.1720071115

Li, J., Wu, Y., Zhao, J., Guan, L., Ye, C., & Yang, T. (2017). Pedestrian detection with dilated convolution, region proposal network and boosted decision trees. Proceedings of the International Joint Conference on Neural Networks, 2017-May, 4052–4057. https://doi.org/10.1109/IJCNN.2017.7966367

Maayan, I., Lane, K. M., Kudo, T., Covert, M. W., Macklin, D. N., Quach, N. T., … Van Valen, D. A. (2016). Deep Learning Automates the Quantitative Analysis of Individual Cells in Live-Cell Imaging Experiments. PLOS Computational Biology, 12(11), e1005177. https://doi.org/10.1371/journal.pcbi.1005177

McKinney, W. (2011). pandas: a Foundational Python Library for Data Analysis and Statistics. PyHPC 2011 : Python for High Performance and Scientific Computing, (January 2011), 1–9.

Minh, H. (2018). How to Handle Imbalanced Data in Classification Problems. Retrieved from https://medium.com/james-blogs/handling-imbalanced-data-in-classification-problems- 7de598c1059f

Morais, N. A. de, Koller, S. H., & Raffaelli, M. (2009). Trajetórias de vida de crianças e adolescentes em situação de vulnerabilidade social: entre o risco e a protecção. Nature Methods, 9(7), 241. https://doi.org/10.1038/nmeth.2019.Fiji

Morse, B. S. (2000). Lecture 4 : Thresholding, 1998–2000.

Muller, A. C., & Guido, S. (2017). Introduction to Machine Learning with Python: A guide for data scientists. O’Reilly Media. https://doi.org/10.1017/CBO9781107415324.004

Nixon, M. S., & Aguado, A. S. (2012). Basic image processing operations. Feature Extraction & Image Processing for Computer Vision, 83–136. https://doi.org/10.1016/b978-0-12- 396549-3.00003-3

Oliphant, T. (2007). Python for Scientific Computing. Computing in Science & Engineering, 9(10).

World Health Organization. (2019). Ten threats to global health in 2019. Retrieved from https://www.who.int/emergencies/ten-threats-to-global-health-in-2019

Padfield, D., Rittscher, J., & Roysam, B. (2011). Coupled minimum-cost flow cell tracking for high-throughput quantitative analysis. Medical Image Analysis, 15(4), 650–668. https://doi.org/10.1016/j.media.2010.07.006

Pamp, S. J., & Tolker-Nielsen, T. (2007). Multiple roles of biosurfactants in structural biofilm development by Pseudomonas aeruginosa. Journal of Bacteriology. https://doi.org/10.1128/JB.01515-06

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., … Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.

Pérez, F., & Granger, B. E. (2007). IPython: A System for Interactive Scientific Computing, Computing. Science Engineering, 9(3), 21–29.

Petersen, F. N. R. (2009). Raman Spectroscopy as a Tool for Investigating Lipid – Protein Interactions. Spectroscopy, 24(10), 1–8.

Powers, D. M. W. (2007). Evaluation: From Precision, Recall, and F-Factor to ROC, Informedness, Markedness & Correlation, (December).

Rodgers, M., Zhan, X. M., & Dolan, B. (2004). Mixing characteristics and whey wastewater treatment of a novel moving anaerobic biofilm reactor. Journal of Environmental Science and Health - Part A Toxic/Hazardous Substances and Environmental Engineering. https://doi.org/10.1081/ESE-120039383

Salman, N. (2006). Image Segmentation Based on Watershed and Edge Detection Techniques. The International Arab Journal of Information Technology, 3(2), 104–110.

Scherf, N., Herberg, M., Thierbach, K., Zerjatke, T., Kalkan, T., Humphreys, P., … Roeder, I. (2012). Imaging, quantification and visualization of spatio-temporal patterning in mESC colonies under different culture conditions. Bioinformatics, 28(18), 556–561. https://doi.org/10.1093/bioinformatics/bts404

Schiegg, M., Hanslovsky, P., Kausler, B. X., Hufnagel, L., & Hamprecht, F. A. (2013). Conservation tracking. Proceedings of the IEEE International Conference on Computer Vision, 2928–2935. https://doi.org/10.1109/ICCV.2013.364

Sezgin, M., & Sankur, B. (2004). Survey over image thresholding techniques and quantitative performance evaluation. Journal of Electronic Imaging, 13(1), 220. https://doi.org/10.1117/1.1631316

Sommer, C., Straehle, C., Köthe, U., & Hamprecht, F. A. (2011). Ilastik: Interactive learning and segmentation toolkit. In Proceedings - International Symposium on Biomedical Imaging. https://doi.org/10.1109/ISBI.2011.5872394

Spring, K., Russ, J., Parry-Hill, M., Fellers, T., & Davidson, M. (2016). Difference of Gaussians Edge Enhancement. Retrieved from https://micro.magnet.fsu.edu/primer/java/digitalimaging/processing/diffgaussians/

Srivastava, T. (2018). Introduction to k-Nearest Neighbors: Simplified (with implementation in Python).

The University of Auckland. (2010). Gaussian filtering [Lecture notes], 18–32.

Tolker-Nielsen, T. (2015). Biofilm Development. Microbiology Spectrum. https://doi.org/10.1128/microbiolspec.mb-0001-2014

Ulman, V., Maška, M., Magnusson, K. E. G., Ronneberger, O., Haubold, C., Harder, N., … Blau, H. M. (2017). An objective comparison of cell-tracking algorithms. Nature Methods, 14(12), 1141–1152. https://doi.org/10.1038/nmeth.4473

Vallotton, P., Sun, C., Wang, D., Ranganathan, P., Turnbull, L., & Whitchurch, C. (2009). Segmentation and tracking of individual Pseudomonas aeruginosa bacteria in biofilms. In Proceedings of Image and Vision Computing New Zealand (IVCNZ), 221–225.

Vidyasagar, K., & Nagarathnamma, T. (2018). Study of catheter associated urinary tract infection and biofilm production by Escherichia coli. Indian Journal of Microbiology Research. https://doi.org/10.18231/2394-5478.2018.0081

Wang, M. (2010). Feature extraction, selection and classifier design in automated time-lapse fluorescence microscope image analysis, 1378–1388.

Wang, Q., Niemi, J., Tan, C. M., You, L., & West, M. (2010). Image segmentation and dynamic lineage analysis in single-cell fluorescence microscopy. Cytometry Part A. https://doi.org/10.1002/cyto.a.20812

Waskom, M., Botvinnik, O., & O'Kane, D. (2017). Seaborn [Computer software].

Weeks, A. R. (1996). Fundamentals of Electronic Image Processing.

Appendix A: Descriptive statistics of pixel features.

Table A.1 Descriptive statistics of pixel values of Gaussian Blur, Laplacian of Gaussian, Gaussian Gradient Magnitude, Difference of Gaussian, Structure Tensor Eigenvalues, and Hessian of Gaussian Eigenvalues at different levels of σ.

Values are reported as Mean (SD) per filter and σ.

Filter | σ = 0.7 | σ = 1.0 | σ = 1.6 | σ = 3.5 | σ = 5 | σ = 10
Gaussian Smoothing | 507.47 (734.29) | 501.66 (695.92) | 486.82 (613.99) | 412.43 (403.19) | 345.97 (311.19) | 201.67 (174.82)
Laplacian of Gaussian | -25.46 (377.85) | -21.08 (194.28) | -18.06 (116.75) | -13.10 (30.01) | -8.40 (12.06) | -1.85 (1.84)
Gaussian Gradient Magnitude | 223.03 (275.68) | 197.65 (232.40) | 159.60 (168.74) | 72.87 (64.37) | 39.91 (34.51) | 9.68 (8.45)
Difference of Gaussian | -4.63 (98.31) | -6.52 (74.44) | -13.67 (102.38) | -50.90 (156.29) | -75.78 (140.30) | -84.49 (88.54)
Structure Tensor Eigenvalues | 1.13e05 (1.74e05) | 83664.80 (115595.48) | 46671.99 (55470.18) | 6821.18 (7055.76) | 1914.06 (1915.07) | 121.81 (131.88)
Structure Tensor Eigenvalues* | 8.24e03 (1.6e04) | 5032.97 (11776.44) | 3576.22 (7858.63) | 1134.91 (1501.70) | 453.63 (504.58) | 47.26 (52.21)
Hessian of Gaussian Eigenvalues | 97.22 (211.64) | 52.65 (102.52) | 30.50 (56.83) | 4.40 (13.36) | 0.29 (4.83) | -0.31 (0.77)
Hessian of Gaussian Eigenvalues* | -122.68 (249.86) | -73.73 (134.53) | -48.57 (82.70) | -17.51 (21.78) | -8.69 (9.06) | -1.53 (1.29)

Note. The Structure Tensor Eigenvalues and Hessian of Gaussian Eigenvalues rows marked with an asterisk (*) are the second-channel responses, i.e. the second eigenvalue obtained when the respective filter is applied to a 2-d image.
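The filter bank in Table A.1 can be sketched in Python with scipy.ndimage. This is a minimal illustration, not the exact implementation used in the thesis (which relied on ilastik's filter library): the Difference-of-Gaussian scale ratio of 1.6σ and the eigenvalue ordering (larger eigenvalue first, second channel marked * in the table) are assumptions chosen here for illustration.

```python
import numpy as np
from scipy import ndimage as ndi

def pixel_features(img, sigmas=(0.7, 1.0, 1.6, 3.5, 5.0, 10.0)):
    """Compute the six filter responses of Table A.1 for each scale sigma."""
    img = img.astype(float)
    feats = {}
    for s in sigmas:
        gauss = ndi.gaussian_filter(img, s)                  # Gaussian Smoothing
        log = ndi.gaussian_laplace(img, s)                   # Laplacian of Gaussian
        ggm = ndi.gaussian_gradient_magnitude(img, s)        # Gaussian Gradient Magnitude
        # Difference of Gaussian; the 1.6*s ratio is an assumption.
        dog = gauss - ndi.gaussian_filter(img, 1.6 * s)

        # Structure tensor: outer products of first derivatives, smoothed,
        # then closed-form eigenvalues of the symmetric 2x2 matrix.
        gy = ndi.gaussian_filter(img, s, order=(1, 0))
        gx = ndi.gaussian_filter(img, s, order=(0, 1))
        axx = ndi.gaussian_filter(gx * gx, s)
        axy = ndi.gaussian_filter(gx * gy, s)
        ayy = ndi.gaussian_filter(gy * gy, s)
        disc = np.sqrt(((axx - ayy) / 2) ** 2 + axy ** 2)
        st1, st2 = (axx + ayy) / 2 + disc, (axx + ayy) / 2 - disc

        # Hessian of Gaussian: second derivatives, same eigenvalue formula.
        hyy = ndi.gaussian_filter(img, s, order=(2, 0))
        hxy = ndi.gaussian_filter(img, s, order=(1, 1))
        hxx = ndi.gaussian_filter(img, s, order=(0, 2))
        disc = np.sqrt(((hxx - hyy) / 2) ** 2 + hxy ** 2)
        hg1, hg2 = (hxx + hyy) / 2 + disc, (hxx + hyy) / 2 - disc

        feats[s] = {"gaussian": gauss, "log": log, "ggm": ggm, "dog": dog,
                    "st_eig": (st1, st2), "hog_eig": (hg1, hg2)}
    return feats
```

Stacking these responses over all six σ values yields the per-pixel feature vector that the classifiers in Section 3 operate on.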

Appendix B: Classified images

In this appendix we provide the pixel classification and object segmentation of each of the 55 raw fluorescence microscopy images in our dataset, in chronological order from timestep t = 1 to t = 55.