DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS STOCKHOLM, SWEDEN 2018

Machine learning for blob detection in high-resolution 3D microscopy images

MARTIN TER HAAK

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

EIT Digital Data Science
Date: June 6, 2018
Supervisor: Vladimir Vlassov
Examiner: Anne Håkansson
Electrical Engineering and Computer Science (EECS)


Abstract

The aim of blob detection is to find regions in a digital image that differ from their surroundings with respect to properties like intensity or shape. Bio-image analysis is a common application where blobs can denote regions of interest that have been stained with a fluorescent dye. In image-based in situ sequencing for ribonucleic acid (RNA), for example, the blobs are local intensity maxima (i.e. bright spots) corresponding to the locations of specific RNA nucleobases in cells. Traditional methods of blob detection rely on simple image processing steps that must be guided by the user. The problem is that the user must seek the optimal parameters for each step, which are often specific to that image and cannot be generalised to other images. Moreover, some of the existing tools are not suitable for the scale of the microscopy images, which are often in very high resolution and 3D. Machine learning (ML) is a collection of techniques that give computers the ability to "learn" from data. To eliminate the dependence on user parameters, the idea is to apply ML to learn the definition of a blob from labelled images. The research question is therefore how ML can be effectively used to perform blob detection. A blob detector is proposed that first extracts a set of relevant and non-redundant image features, then classifies pixels as blobs and finally uses a clustering algorithm to split up connected blobs. The detector works out-of-core, meaning it can process images that do not fit in memory, by dividing the images into chunks. Results prove the feasibility of this blob detector and show that it can compete with other popular software for blob detection. But unlike other tools, the proposed blob detector does not require parameter tuning, making it easier to use and more reliable.

Keywords
Biomedical Image Analysis; Blob Detection; Machine Learning; 3D; Computer Vision; Image Processing

Abstract (Swedish)

The aim of blob detection is to find regions in a digital image that differ from their surroundings with respect to properties such as intensity or shape. Biological image analysis is a common application in which blobs can denote regions of interest that have been stained with a fluorescent dye. In image-based in situ sequencing for ribonucleic acid (RNA), the blobs are local intensity maxima (i.e. bright spots) corresponding to the locations of specific RNA nucleobases in cells. Traditional methods for blob detection rely on simple image processing steps that must be guided by the user. The problem is that the user must find optimal parameters for each step, which are often specific to that particular image and cannot be generalised to other images. Moreover, some of the existing tools are not suited to the size of the microscopy images, which are often in very high resolution and 3D. Machine learning (ML) is a collection of techniques that give computers the ability to "learn" from data. To eliminate the dependence on user parameters, the idea is to apply ML to learn the definition of a blob from labelled images. The research question is therefore how ML can effectively be used to perform blob detection. A blob detection algorithm is proposed that first extracts a set of relevant and non-redundant image features, then classifies pixels as blobs and finally uses a clustering algorithm to split up connected blobs. The detection algorithm works out-of-core, which means that it can process images that do not fit in memory by dividing the images into smaller parts. The results show that the detection algorithm is feasible and that it can compete with other popular software for blob detection. But in contrast to other tools, the proposed detection algorithm does not require tuning of its parameters, which makes it easier to use and more reliable.

Keywords
Biomedical Image Analysis; Blob Detection; Machine Learning; 3D; Computer Vision; Image Processing

Acknowledgements

First, I would like to express my gratitude towards my examiner Assoc. Prof. Anne Håkansson at the KTH Royal Institute of Technology for guiding me from the first project proposal all the way to the final deliverable. She was always open to answering the most troublesome questions or providing critical feedback. Thanks to her meticulous remarks I was able to reshape and tweak my work in order to achieve the high quality it has now.

I would also like to thank my supervisor Jacob Kowalewski at Single Technologies, under whom I performed this research. Not only would he provide me with the required resources at any moment, but he would also not hesitate to free up time for discussion. That I was able to finish the project well within the set time is largely due to his dependable commitment. Moreover, his ideas and suggestions have strongly contributed to the approach applied in this project.

Furthermore, I would like to thank Single Technologies for providing me with a very interesting thesis subject and a pleasant working space. I want to thank my co-workers for the nice chats and the friendly ambience around the office.

Finally, I would like to thank my university supervisor Assoc. Prof. Vladimir Vlassov, who provided me with some highly needed hints so that I could proceed with my research.

Martin ter Haak
Stockholm, May 2018

Contents

0.1 Acronyms and abbreviations

1 Introduction
1.1 Background
1.2 Problem
1.3 Purpose
1.4 Goals
1.4.1 Benefits, ethics and sustainability
1.5 Research methodology
1.6 Delimitations
1.7 Outline

2 An introduction to in situ RNA sequencing

3 Blob detection
3.1 Automatic scale selection
3.2 Algorithms
3.2.1 Template matching
3.2.2 Thresholding
3.2.3 Local extrema
3.2.4 Differential extrema
3.2.5 Machine learning
3.2.6 Super-pixel classification

4 Machine learning
4.1 Classification
4.1.1 Naive Bayes
4.1.2 Logistic regression
4.1.3 K-Nearest Neighbour
4.1.4 Decision Tree


4.1.5 Random Forest
4.1.6 AdaBoost
4.1.7 Support Vector Machines
4.1.8 Neural network
4.1.9 Validation
4.2 Clustering
4.2.1 K-means
4.2.2 Agglomerative clustering
4.2.3 MeanShift
4.2.4 Spectral clustering
4.2.5 Other clustering algorithms
4.2.6 Validation
4.3 Dimensionality reduction
4.3.1 Principal Component Analysis (PCA)

5 Related work
5.1 Blob detection
5.2 Machine learning for biomedical image analysis

6 Methodology
6.1 Blob detection process
6.1.1 Feature extraction
6.1.2 Feature compression
6.1.3 Pixel classification
6.1.4 Pixel clustering
6.1.5 Blob extraction
6.1.6 Blob filtration
6.1.7 Chunking
6.2 Experiments
6.2.1 A: Feature extraction
6.2.2 B: Feature compression
6.2.3 C: Pixel classification
6.2.4 D: Pixel clustering
6.2.5 E: Run on whole image
6.2.6 F: Comparison with state-of-the-art
6.2.7 Summary
6.3 Data collection
6.3.1 Characteristics
6.3.2 Labelling

6.4 Experimental design
6.4.1 Test system
6.4.2 Software
6.4.3 Data analysis
6.4.4 Overall reliability and validity

7 Analysis
7.1 Results from A: Feature extraction
7.2 Results from B: Feature compression and C: Pixel classification
7.3 Results from D: Pixel clustering
7.4 Results from E: Run on whole image
7.5 Results from F: Comparison with state-of-the-art

8 Conclusions
8.1 Discussion
8.2 Future work

Bibliography

A Experiment F software configurations
A.1 Crops
A.2 MFB detector
A.3 Fiji
A.4 CellProfiler
A.5 Ilastik

0.1 Acronyms and abbreviations

Terms related to biology

RNA Ribonucleic acid
FISH Fluorescence in situ hybridization
HCS High content screening
DNA Deoxyribonucleic acid
FISSEQ Fluorescent in situ sequencing
mRNA Messenger RNA
HCA High content analysis

Terms related to image processing

2D Two-dimensional
3D Three-dimensional
LoG Laplacian of Gaussian
GGM Gaussian gradient magnitude
DoH Determinant of Hessian
DoG Difference of Gaussians

Terms related to machine learning

ML Machine learning
NN Neural network
PCA Principal component analysis
SVD Singular value decomposition
MI Mutual information
SVM Support vector machine
RF Random forest
DT Decision tree
LR Logistic regression
KNN k-nearest neighbour
NB Naive Bayes
ReLU Rectified linear unit
RBF Radial basis function

Chapter 1

Introduction

This thesis investigates how machine learning can be applied to blob detection. What is meant by machine learning and blob detection will be described later, in their respective chapters. This chapter provides an introduction to the research.

1.1 Background

On the interface of computer science and biology lies an interdisciplinary field called bio-informatics. This field focuses on applying techniques from computer science to better understand biological data. One of its areas, biomedical image analysis, aims to analyse images that have been captured for the purpose of analysing medical data. Microscopy imaging is an important tool in the biomedical field for applications like the study of the anatomy of cells and tissues (histology) [1], urine analysis [2] and cancer diagnosis [3]. Fluorescent chemicals are often added to mark interesting features in the images, such as with fluorescence in situ hybridization (FISH). FISH is the binding of fluorescent dyes to specific ribonucleic acid (RNA) sequences in tissue cells [4]. By capturing microscopy images under certain lighting conditions, these sequences light up as groups of local intensity maxima, also called blobs (see Figure 1.1 for an example). The location and the order of RNA sequences that are detected can be used for gene expression


profiling. This profiling allows researchers to determine the types and structure of single cells [5]. As microscopes are becoming faster and supporting higher resolutions, the scale of the produced images makes it unfeasible for researchers to do manual analysis. Even more, it has been demonstrated that machine learning methods can outperform human vision at recognising patterns in microscopy images [6]. Therefore several bio-informatics software packages [7–10] have been developed that facilitate the analysis or even make it fully automatic in so-called high-content screening (HCS) [11]. Furthermore, confocal microscopes are increasingly being used to create 3D images of tissue. These microscopes, which were to a large extent originally developed at KTH [12], can capture images at different depths.

Machine learning, as a field of computer science, aims to "train" programs to perform specific tasks by supplying them with data. Learning from data is useful when the task is hard to formalise, as is often the case in object detection. For example, explaining to a computer how it can find cells in an image of animal tissue is hard. One way to do this is by providing the computer with a large dataset of cell images. With this data, machine learning algorithms can be applied to deduce a visual definition of a cell. Using this new definition the computer can spot instances of cells in any image. The same reasoning can be applied to detecting blobs in biomedical images as well. By supplying a program with a set of examples of blobs, it can learn to detect blobs in images analogously to how it can detect cells.

1.2 Problem

In this thesis the aim is to do blob detection on high-resolution 3D microscopy images. This is a difficult task, firstly because it is often not possible to check the veracity of the found blobs. Experts can usually only assess the results by looking at them visually or by checking whether they match prior knowledge. Secondly, the scale of the images poses a challenge for both the blob detection algorithms and for verifying the results.

Popular methods for biomedical image analysis rely on a number of simple image processing steps for which the user has to set the right parameters, such as in FIJI [13] and CellProfiler [10].

Figure 1.1: Microscope image of human tissue cells where RNA sequences have been stained with a specific fluorescent dye. The blobs, visible as bright spots, are spatially clustered within cells. A single cell and its most clear blobs have been labelled as an example.

The main drawback of this approach is that some assumptions have to be made in order to tune parameters for the algorithms. Because these parameters are optimised only for the current image set, they cannot always be generalised to other image sets. Or simply, what can be a blob in one image may not be a blob in another image. Secondly, to deal with noise, popular methods usually apply a number of pre-processing steps, incurring extra time and additional parameters. Moreover, FIJI and CellProfiler were not created with high-content screening in mind, since they can only process images that fully fit in memory, which is not always the case. Also, for a tool that is so popularly used, CellProfiler is quite slow and some of its functions only work for 2D images.

To tackle the issue of user-set parameters, machine learning can be applied to train a model that can find blobs without user interaction. In addition, the models can be taught to ignore noise, thereby skipping the pre-processing steps. The algorithms have to deal with the 3D aspect and ideally use that information in their analysis.

Furthermore, the algorithms have to operate out-of-core, meaning that they can process images that do not fit in memory. Lastly, efficiency is a major concern because of both the high resolution of today's microscopy images and the extra computations that machine learning algorithms usually require. Therefore a requirement is that the analysis of an image does not take longer than the time needed to capture that image.

The research question is: How can machine learning techniques effectively be applied to blob detection in high-resolution 3D microscopy images? Note that here 'effective' combines both the notions of high quality and low running time, since solutions that only excel in one aspect but lack in the other are useless.

1.3 Purpose

The purpose of this thesis is to apply and test different machine learning techniques for blob detection in high-resolution 3D microscopy images. As a proof-of-concept, images produced for in situ RNA sequencing are analysed, since these images usually satisfy these characteristics. Because multiple steps are needed to distinguish blobs, machine learning can be applied at different stages in different forms. Therefore, suitable machine learning techniques are tested in each step. The result is an analysis that compares the tested machine learning techniques and concludes which are best suited for solving the problem.

1.4 Goals

The aim of this project is to aid the development of autonomous bio-image analysis tools such that they require minimal user interaction. As user-guided image processing is replaced by computer vision, the hope is that these tools become both faster and more accurate. While humans are limited by their cognitive capabilities, machines can continuously be enhanced by iterative upgrades. Faster hardware, smarter algorithms and better data will all help to progress the performance of such analytical tools.

Even though blob detection is only one task of current bio-image analysis tools, insights originating from this research can be applied to other common tasks as well, such as edge or cell detection. Machine learning models can be taught to recognise cell membranes, cytoplasms or nuclei in a fashion similar to blob detection. Different training data and alternative features have to be used, but the algorithms will be analogous.

1.4.1 Benefits, ethics and sustainability

With the ongoing research on cell tissue such as the brain and organs, the ability to do large-scale gene expression profiling of single cells has great advantages. The identity and function of every cell can be determined, which allows researchers to accurately map the structure of complex tissues. Having an automated analysis pipeline can be a significant benefit to effective research in this field. Researchers do not wish to continuously adjust the settings by trial-and-error to find those parameters that give the best results. Therefore an approach is needed that picks the optimal settings for them so they can focus on their research.

Letting computers take over human tasks in image analysis can lead to great gains in terms of performance. Computers will surely be much faster and work longer, but their accuracy will not necessarily be comparable to that of humans. Human experts can directly profit from their prior knowledge, whereas computer programs have to be specifically tailored for this. This means that the precision of such computer programs depends on the experience of both the original domain expert and the software engineer. Human mistakes can lead to errors in the software, but while a human will usually notice when something has gone wrong, a computer does not care as long as the exception is not caught. When machine learning is employed, this problem becomes even more significant, because then the accuracy of the software hinges on the quality of the training data. As biomedical images are frequently used in the research, diagnosis or treatment of human health, it is important to think about who should take responsibility when image analysis tools produce incorrect results. Ethics play an important role in deciding whether the producer of the software should be held accountable, or the user of the software. It is easy to shove the blame

towards the original creator, but there is also the responsibility of the operating researchers and doctors. This is a difficult predicament, but in my opinion the liability should be investigated on a case-by-case basis. When an incident has occurred, a thorough inspection of the involved events should be performed. The inspection should determine whether the cause was a doctor's mistake, a software error or a hardware fault. Based on this information, a verdict needs to be made on who should be held accountable.

Regarding the possible medical applications of an automated image analysis tool, it is not hard to imagine the benefits it brings for the sustainability of health. As we humans are being surpassed by computer vision in our image analysis ability, we can focus on the tasks in which we are still superior, such as interpreting the results and drawing conclusions. The consequence is that we become more efficient at treating health. There is clearly a strong relationship with the third Sustainable Development Goal (SDG), "Good Health and Well-being", set down by the United Nations on September 25th, 2015 [14]. The project is not related to environmental sustainability.

1.5 Research methodology

Research can be classified as either quantitative, meaning that a phenomenon is proved by experiments or tested with large data sets (quantity), or as qualitative, wherein a phenomenon is studied through probing the terrain or environment (quality) [15]. Since in this thesis the goal is to find the algorithms that perform best on a certain input, quantitative results will be collected. The performance is measured by predetermined metrics; therefore numbers dictate the conclusions.

The philosophical assumption followed is post-positivism. Even though the reality is objectively given through reproducible results, as in positivism [15], different observers can have divergent opinions on what is the 'optimal' algorithm for the problem, which distinguishes post-positivism from plain positivism [15]. Additionally, it may in practice also depend on which characteristics of the algorithm are deemed most important. For example, a low-quality but fast solution can be preferred to a high-quality but slow solution in some cases. Realism, which is the other potential philosophical assumption in this case [15], is not

applicable because it assumes that matters do not depend on the person who is thinking about them. However, it has just been argued that the interpreter possibly assesses the results subjectively.

The research method used is applied research, because the practical problem of blob detection needs to be solved, which is the main characteristic of applied research [15]. Multiple approaches are tested to find the best approach with the application of RNA sequencing in mind. Possible competing research methods are fundamental research, also called basic research since it drives new innovations, principles and theories, and descriptive research, which focuses on more statistical research and describing the characteristics of a situation as opposed to describing the causes and effects. However, since the goal of the thesis is to improve the performance of known solutions, it should not be characterised as basic or descriptive research, but rather as applied research.

A deductive approach is adopted because a generalisation is concluded that answers the research question, based on large amounts of quantitative data [15]. An abductive approach could also be chosen, but this approach assumes that the data is incomplete [15]. Since more data can be generated if desired, this is not the case in this project.

1.6 Delimitations

The main product of this thesis is the results and conclusions of the analysis, as opposed to the developed software. Since the developed software is not meant to be used in production as-is, it does not have to be highly optimised or robust to bad user input. Nevertheless, its quality must be sufficient for the test results to be credible.

In addition, the focus will be on evaluating existing techniques instead of coming up with custom algorithms and methods, unless necessary. Available tried-and-tested implementations will be deployed to limit the amount of coding and debugging needed. This means that only those algorithms will be tested for which there are trustworthy implementations, such as those found in popular software libraries.

1.7 Outline

The first three chapters introduce the background information that is needed to understand the context and the experiments. Chapter 2: An introduction to in situ RNA sequencing provides a broad description of an example application of blob detection. Chapter 3: Blob detection describes the current state of the art in algorithmic blob detection with biomedical image analysis in mind. Chapter 4: Machine learning introduces the basic theory of the machine learning concepts and algorithms that are applicable in this thesis. It is followed by Chapter 5: Related work, which discusses the papers and corresponding research that are relevant to this thesis. Next, Chapter 6: Methodology lays out the strategy for answering the research question through six experiments. Chapter 7: Analysis contains the results of the experiments while arguing for their reliability. The thesis ends with Chapter 8: Conclusions, which answers the research question, discusses the implications and suggests some open questions that are left.

Chapter 2

An introduction to in situ RNA sequencing

In order to do phenotypic profiling¹ of single cells, traditionally one would look at the appearance of the cells using morphological methods² [16]. In image-based cell profiling³, hundreds of morphological features (such as the shape, structure and texture) are measured from a population of cells treated with either chemical or biological perturbagens [16]. A perturbagen is an agent (small molecule, genetic reagent, etc.) that can be used to produce gene expression changes in cell lines [17]. If one would then like to quantify the effects of a treatment, he or she can measure the changes in those morphological features compared to the untreated cells in the control group.

However, instead of looking at the results of gene expression, such as the shape and structure of cells, one could also look more directly at which RNA⁴ sequences are being synthesised by transcription. In transcription, messenger RNA (mRNA⁵) is synthesised as a complementary copy of a DNA segment by an enzyme called RNA polymerase⁶ [18]. These RNA sequences are used to transport the genetic information

¹ use the set of observable characteristics to create a profile
² methods that are based on form and structure
³ gaining information on a cell
⁴ ribonucleic acid, a molecule essential in various biological roles in coding, decoding, regulation, and expression of genes
⁵ RNA molecules that convey genetic information from DNA to the ribosomes
⁶ the enzyme that is responsible for copying a DNA sequence into an RNA sequence


from the DNA in the nucleus to the ribosomes⁷, where they specify the amino acid⁸ sequence for the creation of proteins. Protein products like enzymes control the processes in the cell by facilitating the chemical reactions [19]. By knowing which enzymes are being produced, one can tell the type and functions of single cells.

Developments in high-resolution microscopy together with fluorescence in situ hybridization (FISH) allow gene expression profiling for resolving molecular states of many different cell types [20] without losing spatial information. The FISH procedure starts by binding specific fluorescent chemicals to specific nucleobases in RNA strings [4]. These chemicals are chosen such that they absorb light and emit it at a longer wavelength [21]. When capturing an image at that specific wavelength, the locations of the fluorescent chemicals are revealed, and thus the locations of the tagged nucleobases. The nucleobases will show up as local intensity maxima in the images, usually called blobs. By capturing multiple photos with different fluorescent agents, fragments of nucleobases (sometimes called barcodes) can be distinguished that can encode the full RNA string [20]. One popular method of fluorescent in situ RNA sequencing is FISSEQ [5].

Automated microscopy systems with the ability to make large amounts of high-resolution images each hour allow the transcriptomic⁹ profiling of thousands of cells [22]. Even more, confocal microscopes can be used to capture photos of the cells at different depths of the tissue, resulting in 3D images [12]. One of the main challenges from a bio-informatics point of view is accurately finding the blobs corresponding to different nucleobases and using them to do RNA sequencing.

⁷ a complex molecule that acts as a factory for protein synthesis in cells
⁸ the building blocks of proteins
⁹ based on information relayed through transcription

Chapter 3

Blob detection

Blob detection falls within the field of visual feature detection. This field, which is part of computer vision, focuses on finding image primitives such as corners, edges, curves and other points of interest in digital images [23]. Blob detection is aimed at finding regions in an image that are different from their surroundings with respect to properties like brightness, colour and shape (see Figure 3.1a for more properties). These regions are called blobs (see Figure 3.1b for an example). As there are multiple definitions of blobs depending on the application, there are also many different algorithms for finding them.

A different but more exact definition, used by Tony Lindeberg, who is an influential researcher on multi-scale feature detection, is that a blob is a region with at least one local extremum [24], such as a bright spot in a dark image or a dark spot in a light image. Even though most classical definitions consider blobs in 2D, the definition can be extended to 3D as well. In this thesis, blobs are defined as small (< 50 pixels) round 3D spots in an image that are brighter than their background (i.e. local intensity maxima). Refer back to Figure 1.1 for an example.

3.1 Automatic scale selection

A majority of blob detection methods are based on automatic scale selection, as inspired by Lindeberg [27]. Before detection, the image is converted to a scale-space representation by applying a convolutional smoothing kernel over the image.


Figure 3.1: (a) Examples of blob properties. From [25]. (b) Blob detection in a field of sunflowers. From [26].

g(x, y) = \frac{1}{2\pi\sigma^2} \exp\left\{-\frac{x^2 + y^2}{2\sigma^2}\right\}

Equation 3.2: The two-dimensional Gaussian function, where x is the distance from the origin along the horizontal axis, y the distance along the vertical axis, and σ the standard deviation of the Gaussian distribution.

In most cases this kernel is the Gaussian filter, which computes a weighted average of each pixel's surroundings based on the Gaussian distribution (see Equation 3.2 for the 2D filter), leading to a blurred image. The main purpose of scale-space representation is to understand the image structure at multiple levels of resolution simultaneously [27]. The scale can be set by changing the parameter σ. A larger scale σ increases the amount of smoothing, which means more Gaussian noise is ignored and larger objects can be detected [28]. By running the blob detection algorithms on the same image at different scales, blobs of different sizes can be detected. Figure 3.3 shows how variously sized blobs can be found with different scale levels of Gaussian smoothing.

Figure 3.3: Smoothed and thresholded images of an old telephone for scale levels s² = 0, 2, 16, 32, 128, 1024 (from top-left to bottom-right). From [24].
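To make the scale-space idea concrete, the following is a minimal Python sketch (assuming NumPy, SciPy and scikit-image are available; the σ values are illustrative only) that builds a small Gaussian scale-space for an example image.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage import data

image = data.coins().astype(float)  # example 2D image

# One smoothed copy of the image per scale sigma. Larger sigmas suppress
# noise and small structures, exposing progressively larger blobs.
sigmas = [1.0, 2.0, 4.0, 8.0]
scale_space = np.stack([gaussian_filter(image, sigma) for sigma in sigmas])
print(scale_space.shape)  # (4, 303, 384): one smoothed image per scale
```

A detector can then be run on every slice of this stack, and the slice in which a blob responds most strongly indicates its size.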

3.2 Algorithms

For every combination of blob definition and application, different blob detection algorithms can be optimal. In the domain of this thesis, a few algorithms stand out that are either popularly used or are potential candidates. These are template matching, thresholding, local extrema algorithms, differential algorithms, algorithms using machine learning, and over-segmentation.

3.2.1 Template matching

Since blobs can be regarded as simple objects in an image, template matching can be applied to find them. This algorithm requires an image of the expected appearance of the object, called a template (Figure 3.4a). The template is moved over the search image (Figure 3.4b) with a stride of 1, and objects are detected where the template matches part of the image [28]. At each position, the sum of absolute differences (SAD) or sum of squared differences (SSD) is stored in a correlation matrix (Figure 3.4c). The highest values (local maxima) in the correlation matrix correspond to a high probability that an object is located there. A threshold can then be used to extract the most significant objects and their locations. To find objects of different shapes and sizes, multiple templates can be designed beforehand.

Figure 3.4: Template matching for finding a coin in an image of a set of coins: (a) template, (b) search image, (c) correlation image. From [30].

Template matching is easy to implement and very fast [29]. However, its main drawback is that it has a hard time finding objects that do not precisely match the template. Since the blobs in our case can be of slightly different sizes and are sometimes clumped together with other blobs, this method will not be very effective.

3.2.2 Thresholding

When blobs are defined as either bright or dark spots in an image (Figure 3.5a), one can simply threshold the pixels to attain a binary image with regions corresponding to blobs (Figure 3.5b). Many thresholding techniques exist that exploit different information such as shape, clustering, entropy and object attributes. Sezgin and Sankur performed a survey and comparison of 40 selected thresholding methods from various categories [31]. Common processing steps that follow are filling up holes within the blobs that are a result of noise, and splitting up multiple connected blobs by using a watershed algorithm (Figure 3.5c). The next step consists of locating the blobs by looking for connected components: groups of neighbouring blob pixels. The blobs that do not adhere to certain criteria, such as size and shape, can also be filtered out. Finally the centroids of the blobs are calculated and returned as the locations of the blobs (Figure 3.5d).

Exactly this approach is used by the popular bio-image analysis tool CellProfiler [10]. This interactive tool lets users create a custom pipeline that takes an image as input and outputs results according to the chosen steps.

Figure 3.5: Common steps in a thresholding algorithm: (a) input image, (b) binary image by thresholding, (c) binary image after watershed, (d) final clustering and count. Created using Fiji [13].

These steps are simple image processing steps such as background removal, smoothing, enhancements and object detection. It works well when the user has time to tweak the parameters for each image or when images are similar. If not, however, batch processing of a large number of images can become quite time-consuming.
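A minimal sketch of this thresholding pipeline with scikit-image and SciPy is given below (Otsu's method and the size limits are illustrative choices, not the tools' defaults); the watershed splitting step is shown separately in the next sketch.

```python
from scipy.ndimage import binary_fill_holes
from skimage import data
from skimage.filters import threshold_otsu
from skimage.measure import label, regionprops

image = data.coins()

binary = image > threshold_otsu(image)  # bright regions become foreground
binary = binary_fill_holes(binary)      # fill holes caused by noise

labels = label(binary)                  # connected components
centroids = [r.centroid for r in regionprops(labels)
             if 100 < r.area < 5000]    # filter blobs on size criteria
print(len(centroids), "blob centroids kept")
```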

Watershed

Watershed works by treating an image as a topographic map and letting "water" flow from the peaks of the image downwards. In Figure 3.6, the peaks are marked as red circles. The boundary where the water from two markers meets indicates where the blobs should be split.

Figure 3.6: Starting markers for watershed. First the shortest distance to the edge of the blobs is computed for each pixel. The darker the pixel, the further it is from the edge. The local minima that then appear are used as markers (visualised as red circles). When multiple markers are close together, all but one are purged. Created using scikit-image [32].
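The marker-based splitting described here can be sketched as follows, continuing from the `binary` mask of the previous sketch; the min_distance value is an illustrative assumption.

```python
import numpy as np
from scipy import ndimage
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

# Distance from every foreground pixel to the nearest blob edge.
distance = ndimage.distance_transform_edt(binary)

# Pixels furthest from the edge become markers; min_distance purges
# markers that lie too close together, as described in Figure 3.6.
coords = peak_local_max(distance, min_distance=15, labels=binary)
markers = np.zeros(distance.shape, dtype=int)
markers[tuple(coords.T)] = np.arange(1, len(coords) + 1)

# Let "water" flow downhill from the markers; touching blobs are split
# where the floods from two markers meet.
split_labels = watershed(-distance, markers, mask=binary)
```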

3.2.3 Local extrema

One can also simply look at the local maxima or minima in intensity to find the bright or dark blobs in the image. At run-time, for every 3x3 region (other sizes are possible) the location of the pixel with the maximum or minimum intensity is recorded, usually only when it is above a certain threshold, to ignore noise. These pixels are assumed to be the centres of blobs. A filtering step often follows to remove the extrema that are not centres of blobs. Sometimes a segmentation algorithm like watershed (see section 3.2.2) is used to find which other pixels belong to the blobs. Although this method is simple, problems occur when there are large blobs with multiple local extrema. In this case the algorithm will output multiple smaller blobs instead of one large blob.
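A minimal sketch of local-extrema detection for a 3D volume, assuming SciPy (the random volume and the threshold merely stand in for a real image and its noise level):

```python
import numpy as np
from scipy.ndimage import maximum_filter

rng = np.random.default_rng(0)
volume = rng.random((32, 64, 64))  # placeholder 3D image
threshold = 0.995                  # intensity threshold to ignore noise

# A voxel is a candidate blob centre when it equals the maximum of its
# own 3x3x3 neighbourhood and exceeds the intensity threshold.
is_max = (volume == maximum_filter(volume, size=3)) & (volume > threshold)
centres = np.argwhere(is_max)
print(len(centres), "candidate blob centres")
```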

3.2.4 Differential extrema

Differential methods can be used instead when local extrema are not sufficient to distinguish blobs due to noise. These methods are based on the derivatives of the intensity function with respect to the coordinates and will therefore pinpoint regions where the intensity changes faster than in the rest of the image. Blobs can be mathematically represented by a pair consisting of a saddle point and one extremum point,

making it look like a peak in the frequency domain [33] (see Figure 3.7).

The Laplacian of the Gaussian (LoG) is a popular differential method used for blob detection [34]. First it convolves the input image with a Gaussian kernel at a certain scale t = σ² to give a scale-space representation L(x, y; t), where x and y are the pixel coordinates. Next, it applies the Laplacian operator (3.1), which results in a strong positive response for dark blobs of a specific size [34]. To capture blobs of different sizes, the Gaussian kernel is usually applied at different scales simultaneously with the scale-normalised Laplacian operator (3.2). Figure 3.8 shows the result of applying the LoG with different scales to the same image. Since the Laplacian is expensive to compute, the Difference of Gaussians (DoG) is commonly used instead. This operator can be seen as an approximation of the Laplacian but is faster to compute. Similarly to the LoG method, blobs can be detected in different scale-spaces. It is computed as the difference between two images smoothed with Gaussian kernels of different scales (3.3).

\nabla^2 L = L_{xx} + L_{yy} \qquad (3.1)

\nabla^2_{\mathrm{norm}} L = t\,(L_{xx} + L_{yy}) \qquad (3.2)

\nabla^2_{\mathrm{norm}} L \approx \frac{t}{\Delta t}\,\big(L(x, y;\, t + \Delta t) - L(x, y;\, t)\big) \qquad (3.3)

The scale-normalised Determinant of the Hessian (DoH) is another popular differential method. It uses the Monge-Ampère operator (3.4), where HL denotes the Hessian matrix of the scale-space representation L. In a detailed analysis, Lindeberg found that the Hessian operator has better scale selection properties under linear image transformations than the Laplacian operator [34].

\det \mathcal{H}_{\mathrm{norm}} L = t^2\,(L_{xx} L_{yy} - L_{xy}^2) \qquad (3.4)

Figure 3.7: Intensity function over the x-axis of a sunflower image; the same idea applies in 2D and 3D as well. (a) Sunflower with a line straight through the y-centre. Adapted from [33]. (b) Intensity of the pixels on the red line in (a). The local minimum that is used as the blob centre is indicated, together with the saddle points. Created with Matplotlib [35].

Figure 3.8: Laplacian of Gaussian applied to the same image at different scales (original image, σ = 1.0, σ = 3.5, σ = 10.0). Created using Ilastik [36].

3.2.5 Machine learning

The problem with the previous algorithms is that they require the user to tune parameters in order to find the desired blobs. Also, what may be good parameters for one image may not be satisfactory for another. So what if we could teach the program what a good blob is by giving it examples, and then let it find the other blobs according to the learned definition? This is exactly how supervised machine learning can be applied to blob detection. In advance, different features are calculated for each pixel that describe the intensity, edges and texture. By preceding the feature extraction with Gaussian smoothing using multiple scales, features are generated for multiple scale-spaces (as explained in section 3.1). Next, the user interactively selects some pixels belonging to a blob and some that do not belong to a blob. With this information a supervised machine learning algorithm like a random forest (see subsection 4.1.5) or a support vector machine (SVM) (subsection 4.1.7) is trained, which can predict the class of the remaining pixels.

The connected components of the blob pixels are then tagged as candidate blobs in the next step. If these candidate blobs are as desired, then their centroids can be returned as the blob positions. But if there are stricter criteria, then machine learning can be applied again to distinguish the true blobs from the false blobs. First a set of features is calculated for each candidate blob, such as shape, size or intensity histogram. Then a few blobs must be selected by the user as being true and a few others as being false. A machine learning algorithm can then use this information to identify only the correct blobs.

Since an arbitrary number of features can be included, this method of finding blobs can be very accurate. The people behind Ilastik thought so as well, because their software does exactly this [36].
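The pixel-classification stage can be sketched as below with SciPy and scikit-learn. The synthetic image and ground-truth mask merely stand in for a real image and the interactively labelled pixels; the feature set (smoothed intensity, Gaussian gradient magnitude and LoG at three scales) is a typical but illustrative choice.

```python
import numpy as np
from scipy import ndimage
from sklearn.ensemble import RandomForestClassifier

def pixel_features(image, sigmas=(1.0, 2.0, 4.0)):
    """Per-pixel features at several scales, as described in section 3.1."""
    feats = [f for s in sigmas
             for f in (ndimage.gaussian_filter(image, s),
                       ndimage.gaussian_gradient_magnitude(image, s),
                       ndimage.gaussian_laplace(image, s))]
    return np.stack(feats, axis=-1).reshape(-1, len(feats))

# Synthetic image: two bright Gaussian blobs on a noisy background.
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:64, 0:64]
image = rng.normal(0.0, 0.1, (64, 64))
truth = np.zeros((64, 64), dtype=int)  # stand-in for user-labelled pixels
for cy, cx in [(20, 20), (45, 40)]:
    image += np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / 8.0)
    truth[(yy - cy) ** 2 + (xx - cx) ** 2 < 8] = 1

X, y = pixel_features(image), truth.ravel()
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
blob_mask = clf.predict(X).reshape(image.shape)  # per-pixel blob/non-blob
```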

3.2.6 Super-pixel classification

Super-pixel classification is another approach to blob detection. It starts by creating a segmentation of the pixels into regions called super-pixels. This is essentially a clustering step that tries to group together neighbouring pixels that are similar with respect to specific properties.

Figure 3.9: Products of two over-segmentation algorithms, (a) Felzenszwalb and (b) Quickshift, on an image of the astronaut Eileen Collins. From [41].

Algorithms producing such so-called over-segmentations are, among others, Felzenszwalb's algorithm [37] and Quickshift [38] (see Figure 3.9). The next step is classifying these super-pixels as being a blob or not. A popular approach for this is extracting SIFT (scale-invariant feature transform) descriptors [39], mapping these to clusters and creating a bag-of-visual-words histogram for the clusters appearing in the super-pixel, as in [40]. The histogram is then classified as blob or non-blob using a supervised machine learning algorithm such as the SVM (see 4.1.7). This requires off-line training of the classifier with labelled super-pixels prior to run-time.
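A simplified sketch of this approach with scikit-image and scikit-learn follows. Where the cited work uses SIFT bag-of-visual-words histograms, this sketch describes each super-pixel by plain intensity statistics, and a hypothetical thresholding rule stands in for the off-line blob/non-blob labels.

```python
import numpy as np
from skimage import data
from skimage.color import rgb2gray
from skimage.segmentation import felzenszwalb
from sklearn.svm import SVC

image = rgb2gray(data.astronaut())
segments = felzenszwalb(image, scale=100, sigma=0.5, min_size=50)

def superpixel_features(image, segments):
    """One feature row (mean, std, max intensity) per super-pixel."""
    return np.array([[image[segments == i].mean(),
                      image[segments == i].std(),
                      image[segments == i].max()]
                     for i in np.unique(segments)])

X = superpixel_features(image, segments)
y = (X[:, 2] > 0.8).astype(int)  # hypothetical stand-in for real labels

clf = SVC(kernel="rbf").fit(X, y)  # off-line training
predictions = clf.predict(X)       # blob / non-blob per super-pixel
```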

Chapter 4

Machine learning

In this chapter the machine learning techniques that can be applied to the project's problem are treated. As the focus of this thesis is to evaluate their performance, the techniques are only briefly discussed. These descriptions are not meant to be exhaustive, so the reader is advised to consult more elaborate sources if he or she requires a more thorough explanation.

Machine learning is a field of computer science that gives computer systems the ability to "learn" (i.e. progressively improve performance on a specific task) with data, without being explicitly programmed [42]. Data is usually structured as a multi-dimensional array of values. Each row corresponds to one instance (e.g. a customer) that we can call a datapoint. The columns are called features and describe characteristics of that instance (e.g. name, birth year, address, phone number, etc.). Typical tasks of machine learning are classification, regression, clustering, anomaly detection and structured prediction. A distinction that is commonly made between machine learning algorithms is whether they are supervised or unsupervised.

Supervised learning is the task of learning a function that maps an input to an output based on example input-output pairs [43]. The output value is commonly called the label. After training, the learned function can be used to predict the label for new inputs. In classification the algorithm needs to decide to which discrete class a datapoint belongs. A classic example is classifying e-mails as either spam or non-spam. Regression, on the other hand, aims to predict a continuous target value


for some datapoint. Let's say you want to approximate the price of a house from input information such as the floor area, location, build year and the number of bedrooms. Then you could look at other houses and build a model that describes the relationship between the house information and the price. With enough data this model is then usable for predicting other house prices.

Unsupervised learning algorithms are not provided with the labels during training. This means that they have to find patterns on their own. One of the most common types of unsupervised learning is clustering. Here an algorithm is applied that groups together datapoints that are similar with respect to some properties [44]. For a set of music tracks, for example, it can be investigated whether they can be partitioned into categories by considering their metadata, like the year, artist, genre and length. As another unsupervised learning type, dimensionality reduction aims to describe the original data using fewer dimensions [45]. The main advantages of this are gaining a better conceptual understanding of the data, decreasing the required storage space and improving the running time of subsequent algorithms.

In the literature, different terms are used for the same concepts. Therefore note that datapoints, instances, observations and example inputs all mean the same thing, namely the individual data units. The attributes of the data units are sometimes called properties, features or dimensions. The attribute that needs to be predicted can be called the label, decision class, output class, response variable or target output.

4.1 Classification

4.1.1 Naive Bayes

As a baseline classifier, which is an algorithm to which other algorithms are compared, the naive Bayesian classifier is often used [46]. This classifier uses the famous Bayesian theorem to make predictions (see Equation 4.1). It works by determining the probability of a datapoint belonging to a certain class by considering prior knowledge of conditions related to the datapoint. For example, if one would like to estimate the probability of a person Bert of age 57 having cancer,

P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}

Equation 4.1: Bayes' theorem, where A and B are events and P(B) ≠ 0.

P(Bert has cancer | Bert is 57 years old), the Bayesian formula can be used with P(A) as the probability of someone having cancer and P(B) as the probability of someone being 57 years old.

To do a binary classification of Bert having cancer or not, this conditional probability is calculated and compared to the threshold of 50%. If the probability is more than 50%, then Bert is classified as having cancer. Since the probabilities are usually assumed to be normally distributed, probabilities can be estimated for conditions that have not been seen before.

This method can be extended to consider multiple conditions (i.e. features) by calculating the product of the conditional probabilities for the given conditions. Unfortunately, the main drawback of this method is that it assumes that within one class all features are statistically independent, hence the name "naive" [46]. On the bright side, research has shown that this is not a very significant problem in practice, especially for highly dimensional data [47]. Furthermore, this algorithm has very convenient properties that make it worth trying in many cases. Namely, it offers a range of important services such as learning from very large datasets, incremental learning, anomaly detection, row pruning and feature pruning, all in near linear time [46]. In addition, it requires a minimal memory footprint and is fast to train.
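A minimal scikit-learn example of the Gaussian naive Bayes classifier described above; the two-class synthetic data is purely illustrative.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 3)),   # class 0 around the origin
               rng.normal(3, 1, (100, 3))])  # class 1 shifted away
y = np.array([0] * 100 + [1] * 100)

clf = GaussianNB().fit(X, y)
print(clf.predict_proba(X[:1]))  # per-class conditional probabilities
```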

4.1.2 Logistic regression

The probability of a datapoint belonging to a certain class can be estimated in other ways as well. A common method is logistic regression, which uses regression to fit a line h(x) through the data. By inserting the h(x) value for a datapoint x into a sigmoid function (see Figure 4.2), a number between 0 and 1 is returned. This number indicates the probability that the datapoint belongs to the positive class (in the case of binary logistic regression). Multinomial logistic regression may be used in cases where the dependent variable has more than two outcome categories.

f(x) = \frac{1}{1 + e^{-x}}

Figure 4.2: Sigmoid function.

4.1.3 K-Nearest Neighbour

A very simple classification algorithm that has seen popular usage in research is k-Nearest Neighbour (or kNN). Its ease of understanding and implementation, together with its general applicability, is the reason that it was included in the top 10 algorithms in data mining [48]. Instead of building a model from the training data, as most other learning algorithms do, it actually uses the training data directly for classification. For this reason it is called a non-parametric classifier. For every datapoint it finds the k nearest datapoints in the training dataset. The classes of those nearest neighbours dictate the class of the input datapoint through a majority vote. The distance function used depends on the application, but common types are the Euclidean and cosine distance [49]. kNN is notorious for being sensitive to noise such as outliers. Too small values of k can lead to noisy datapoints receiving a strong influence in classifying new datapoints [48]. Because n comparisons need to be made for each input datapoint, performance is a big issue for large datasets as well. For this reason a number of improvements have been proposed, such as 'condensing' [50] or 'editing' [51] the training dataset so that it becomes smaller but approximately retains its accuracy.

Figure 4.3: An example of a decision tree for deciding whether to go for a trip when considering the weather. From [52].

4.1.4 Decision Tree

Decision trees are models that map observations about an item to conclusions about its target value using a series of decisions based on the observation's attributes [52]. The decision tree model has the shape of a directed acyclic graph in the form of a tree, where each internal node represents a decision and each leaf node represents the predicted class for a given observation (see Figure 4.3). Every time a new observation has to be classified, it starts with a comparison at the root of the tree. There one of its attributes is compared to a certain value, and based on this decision it continues down one of the node's branches. At every node such a comparison is made until the observation arrives at a leaf node and a final classification is made.

Inducing decision trees from training data is called decision tree learning. The goal is to generate a general model that can be used to classify new observations [52]. There are different algorithms for generating such a model, but they all rely on the main idea that at each node the decision has to be made that best splits the data with respect to the target class. The quality of the split is measured by the information gain or information gain ratio that the decision produces. The information is often defined as the weighted average of the Shannon entropy (4.1) or the Gini impurity (4.2) over the new branches, where P(xi) is

defined as the probability of a possible value from \{x_1, \ldots, x_n\}.

H(X) = -\sum_i P(x_i) \log_2 P(x_i) \qquad (4.1)

\mathrm{Gini}(X) = 1 - \sum_i P(x_i)^2 \qquad (4.2)
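The two split-quality measures can be computed directly from the class labels arriving at a candidate node, as in this small sketch of equations (4.1) and (4.2):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (4.1) of a vector of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    """Gini impurity (4.2) of a vector of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

node_labels = np.array([0, 0, 0, 1, 1])        # labels reaching one node
print(entropy(node_labels), gini(node_labels))  # ~0.971 and 0.48
```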

4.1.5 Random Forest

A common extension of decision trees are ensemble methods like random forests. These are sets of multiple induced decision trees that combine their outputs into a single classification to improve the overall accuracy [52]. Decision trees are known for being very sensitive to irregularities in the training data, which makes them susceptible to over-fitting [52].

A random forest is created by building multiple decision trees, each with a different random sample of features from the training data. This ensemble method is also sometimes called the "random subspace method" or "feature bagging". The motivation for this method is that it prevents classifiers from focusing on only a single (or a few) features that are strong predictors of the response variable. Because the classifiers have to look for more general features, they are less likely to over-fit. Ho performed an analysis of how random subspace projection leads to accuracy gains [53]. Random forests have been successfully applied to pixel classification in the bio-image analysis software Ilastik. The accompanying paper adds that "The ability of the random forest to capture highly non-linear decision boundaries in feature space is a major prerequisite for the application to general sets of use cases." [36].

4.1.6 AdaBoost

AdaBoost is another ensemble method that has shown good results in practice. Similarly to bagging, it combines the predictions of multiple arbitrary classifiers. It was invented by Y. Freund and R. E. Schapire in 1996 [54]. Where bagging takes all the predictors into account equally, boosting differs by actually taking a weighted sum of the predictions

as the final output. The "Ada" in the name stands for adaptive, because the algorithm is able to tweak subsequent weak learners such that they focus on instances that are harder to classify. By combining weak learners that are only slightly better than random guessing, the final model is provably going to converge to a strong learner.

4.1.7 Support Vector Machines

Support vector machines (SVM) represent a powerful technique for classification, regression and outlier detection [55]. Like decision trees and random forests, they are non-probabilistic. For a binary classification, an SVM seeks out an optimal hyperplane separating the two classes involved such that the distance between the closest representatives of the two classes is maximised. During training, SVM algorithms build an SVM model that splits the training data into two classes with the least error. Next, new datapoints are classified based on which side of the hyperplane they fall. The hyperplane can be linear, as in regular linear SVMs, but the decision boundary sometimes has other shapes such as curves. In these cases a kernel function can be used to map the data into a different feature space [55]. In this new feature space it should be easier to find a linear hyperplane that divides the transformed data. Popular kernels are the Gaussian radial basis function (RBF) kernel, the exponential kernel and the polynomial kernel.

4.1.8 Neural network

The previously described machine learning algorithms are heavily used in industry and work well on a wide variety of important problems. However, for some problems central to artificial intelligence (AI), such as speech recognition and object detection, they have not achieved the required performance. Therefore a new field of machine learning called deep learning has emerged, motivated in part by the failure of traditional algorithms to generalise well on such AI tasks [56]. A significant challenge for more complex data is the curse of dimensionality, which makes machine learning exceedingly more difficult when the number of dimensions is high [56]. In order to cope with this problem, traditional machine learning algorithms need prior beliefs to guide them about what kind of function to learn. However, these priors hurt the algorithm's ability to generalise over more complex functions.

f(x) = max(0, x)

Figure 4.4: Rectified Linear Unit (ReLU) function.

Deep learning relies heavily on the concept of artificial neural networks, which are networks of nodes inspired loosely by the neural networks of which animal brains are composed. These networks consist of connected layers, where each layer is made up of nodes. The outputs of these nodes are linear functions of the node's input connections, followed by a non-linear activation function such as the sigmoid (Figure 4.2) or the Rectified Linear Unit (Figure 4.4). What makes these neural networks 'deep' are their hidden layers, which are situated between the input and output layers. These hidden layers enable the network to learn the very complex non-linear functions that are required in more complicated tasks. Usually the nodes of each layer are connected to all the nodes in the neighbouring layers; this is called densely connected.

The most common type of artificial neural network is the feed-forward network, which aims to approximate some function in order to predict the output for any arbitrary input. This can be useful for tasks such as classification and regression, but also for more complex tasks such as data compression or image segmentation. These networks are called 'feed-forward' because the data 'flows' from the input layer to the output layer (see Figure 4.5). There are no feedback connections in this type of network, in contrast to recurrent neural networks, for example.

The method for training a feed-forward neural network (and most other artificial neural networks) is called backpropagation. Backpropagation is used to calculate the gradient of the loss function with respect to the weights, working from the final layer back to the first hidden layer. This gradient is then needed in gradient descent to update the weights of each layer.

Figure 4.5: Example of a feed-forward neural network. Adapted from [59].

There are also more advanced optimisation algorithms such as Adadelta [57] and the Adam optimiser [58].
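A minimal feed-forward network example using scikit-learn's MLPClassifier, which trains densely connected ReLU layers with backpropagation (by default using the Adam optimiser mentioned above); the non-linear synthetic target is illustrative only.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] * X[:, 1] > 0).astype(int)  # a non-linear, XOR-like target

# Two hidden layers of 16 ReLU nodes each, trained via backpropagation.
clf = MLPClassifier(hidden_layer_sizes=(16, 16), activation="relu",
                    max_iter=2000, random_state=0).fit(X, y)
print(clf.score(X, y))  # training accuracy
```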

4.1.9 Validation

F1-score

To measure the performance of a binary classification algorithm, the f1-score is often used. It is defined as the harmonic mean (4.3) of the precision (4.4) and recall (4.5). Its range runs from 0.0 to 1.0.

f1 = 2 · (precision · recall) / (precision + recall)    (4.3)

precision = |true positive| / (|true positive| + |false positive|)    (4.4)

recall = |true positive| / (|true positive| + |false negative|)    (4.5)
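As a small sanity check, these measures can be computed directly from the confusion counts or with scikit-learn (the library used later in this project); the labels below are made up.

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Made-up ground truth and predictions for a binary classifier.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

p = precision_score(y_true, y_pred)  # |TP| / (|TP| + |FP|)
r = recall_score(y_true, y_pred)     # |TP| / (|TP| + |FN|)
f1 = 2 * p * r / (p + r)             # harmonic mean, equation (4.3)

assert abs(f1 - f1_score(y_true, y_pred)) < 1e-12
print(p, r, f1)                      # 0.75 0.75 0.75
```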

4.2 Clustering

4.2.1 K-means

K-means clustering is one of the simplest and most popular clustering algorithms [60]. It starts by selecting k random points from the data as centroids (though smarter initialisation methods exist). In the next step it assigns each of the remaining points to the closest centroid. At the end of this iteration the points have been partitioned into k disjoint clusters. Next, for each cluster a new centroid is calculated as the mean of all the attribute values of the points in the cluster. In the next iteration the points are assigned to the new centroids. The algorithm continues iterating until either the centroids stop moving between iterations or another stop criterion is reached. The space requirements for K-means are modest because only the data points and centroids are stored [60]. K-means is also quite fast because its running time is linear with respect to the dataset size. This makes it a powerful multi-purpose clustering algorithm and a good starting point for more advanced clustering algorithms.
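A minimal sketch of this loop using scikit-learn's implementation (the implementation used later in the experiments); the data is synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2D points forming three groups (illustration only).
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(c, 0.5, (30, 2)) for c in (0, 4, 8)])

# Start from random centroids and iterate assign/update until convergence.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(km.cluster_centers_)  # final centroid of each cluster
print(km.labels_[:10])      # cluster index assigned to each point
```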

4.2.2 Agglomerative clustering

Agglomerative clustering is an example of a hierarchical clustering method that first derives a hierarchical tree from the data and then infers the main clusters [60]. Agglomerative clustering is also sometimes called bottom-up because it starts by putting each point in a separate cluster, and then builds up new larger clusters from the smaller clusters until all points are connected. At every step it determines which two clusters are closest together and then merges them. There are different linkage criteria for deciding the distance between clusters, such as: minimal distance between closest members (single linkage), minimal distance between furthest members (complete linkage), distance between centroids (centroid linkage) and minimal sum of squared differences between clusters ('ward' linkage). The biggest drawback of this algorithm is the running time, since it needs to compare every pair of clusters in each step, therefore requiring O(n³) computations [60]. There are however faster implementations that run in O(n² log n). Another challenge is the non-triviality of inferring the flat clusters from the hierarchical tree, since the criteria can be subjective.

Examples of such criteria are a fixed number of clusters or a maximum distance between clusters.
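For instance, SciPy's hierarchical clustering routines build the tree and then cut it with a maximum-distance criterion; this is one plausible implementation (the text does not name a library for centroid linkage), shown on toy data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy 2D points in two well-separated groups (illustration only).
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(10, 1, (20, 2))])

# Build the tree bottom-up; 'centroid' merges the two clusters whose
# centroids are closest (alternatives: 'single', 'complete', 'ward').
Z = linkage(X, method='centroid')

# Infer flat clusters by cutting the tree at a maximum inter-cluster distance.
labels = fcluster(Z, t=5, criterion='distance')
print(np.unique(labels))
```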

4.2.3 MeanShift

MeanShift was proposed in 2002 as a non-parametric clustering algorithm for feature spaces [61]. The algorithm relies on centroids that it continuously updates to be the mean of the points within a given region. It aims to discover 'blobs' in a smooth density of samples [62]. This property makes the algorithm an attractive candidate for clustering pixels, since blobs have a consistent density and are often quite dense. The algorithm is however not highly scalable, because it requires multiple nearest-neighbour searches during its execution [62]. The only parameter it requires is the bandwidth, which dictates the size of the region to search through. The bandwidth can be set beforehand or be estimated.
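A short sketch with scikit-learn, which also provides the bandwidth estimation mentioned above; the data is synthetic.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

# Two dense synthetic groups of points (illustration only).
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 0.3, (40, 2)), rng.normal(3, 0.3, (40, 2))])

# The bandwidth can be set beforehand or estimated from the data.
bw = estimate_bandwidth(X, quantile=0.2)

ms = MeanShift(bandwidth=bw).fit(X)
print(ms.cluster_centers_)  # one centroid per discovered 'blob'
```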

4.2.4 Spectral clustering

Spectral clustering algorithms use the top eigenvectors of a matrix derived from the distances between points (also called the affinity matrix) [63]. For this family of algorithms a common approach goes as follows. First the affinity matrix is calculated for all the points. Then an eigendecomposition is performed on the normalised Laplacian of this matrix. Next the k eigenvectors belonging to the k highest eigenvalues are selected. These vectors are concatenated into an n × k matrix. Finally the points in this lower-dimensional space are assigned to k clusters using a simple clustering algorithm such as k-means. k is usually determined beforehand as the expected number of clusters, but other approaches exist that guess k from the eigendecomposition. Spectral clustering is a popular algorithm because it is simple to implement, can be solved efficiently by standard linear algebra software and very often outperforms traditional clustering algorithms such as the k-means algorithm [64].
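In scikit-learn this whole pipeline (affinity matrix, Laplacian eigendecomposition, k-means on the embedding) is wrapped in one estimator; a minimal sketch on synthetic data:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 0.4, (30, 2)), rng.normal(4, 0.4, (30, 2))])

# Internally: build an RBF affinity matrix, eigendecompose its normalised
# Laplacian and run k-means on the top-k eigenvector embedding.
sc = SpectralClustering(n_clusters=2, affinity='rbf', random_state=0)
print(sc.fit_predict(X))
```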

4.2.5 Other clustering algorithms

Of course the list above does not cover all algorithms that have been invented for the purpose of clustering; covering them all is impossible due to the overwhelming amount of literature on the subject. Other popular algorithms that have been considered but deemed unsuitable are: affinity propagation [65], since it does not scale well with n; DBSCAN [66], because all the blobs have the same density, so the algorithm would not be able to distinguish them; and clustering using the Gaussian mixture model, since it has too many parameters and is not scalable [67].

4.2.6 Validation

Silhouette Score

Since the ground truth of the cluster to which a point belongs is often either subjective or unknown, it is difficult to evaluate the quality of a clustering algorithm. The Silhouette Coefficient was therefore proposed by P. J. Rousseeuw in 1987 [68], because it can be calculated solely from the clustering results. It is composed of two scores [69]:

a. The mean distance between a datapoint and all other points in the same cluster

b. The mean distance between a datapoint and all other points in the next nearest cluster

The Silhouette Coefficient s for a single datapoint is then defined as:

s = (b − a) / max(a, b)

To determine the Silhouette Score for a dataset, the mean of the Silhouette Coefficients for all datapoints is calculated (or of a random sample if there are too many). The score is bounded by −1 for incorrect clustering and +1 for highly dense clustering. When the score is around zero, the clusters are likely overlapping [69]. If the clusters are dense and well separated, then the score is higher, which corresponds to the conventional definition of a cluster.

Furthermore, a large silhouette score corresponds to a high roundedness of the clusters, which is fortunately a desired property of blobs.
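A small sketch with scikit-learn; note the sample_size argument, which scores a random sample when there are too many points. The data and cluster labels are synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
labels = KMeans(n_clusters=2, random_state=0).fit_predict(X)

# Mean Silhouette Coefficient over all points, and over a random sample.
print(silhouette_score(X, labels))
print(silhouette_score(X, labels, sample_size=50, random_state=0))
```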

4.3 Dimensionality reduction

4.3.1 Principal Component Analysis (PCA)

Principal Component Analysis is likely the most popular multivariate statistical technique [70]. It is widely used as a method for dimensionality reduction. It can be thought of as fitting a k-dimensional ellipsoid to the data such that each axis corresponds to a principal component. The axes are chosen so that they explain the highest amount of variance and are orthogonal to each other. The first principal component is the axis that explains the largest amount of variance. The second principal component lies orthogonally to the first and explains the second highest amount of variance, and so on. Shorter axes do not provide as much information and can therefore be removed. The aim of PCA is to find these components, which are linear combinations of the original variables, from the data X. Singular Value Decomposition (SVD) is used to calculate X = P∆Q^T [70]. The matrix P∆ denotes the factor scores, or in other words the importances of the dimensions. Matrix Q holds the coefficients of the linear combinations used to compute the factor scores. By multiplying the original matrix X with Q we can project the data onto a lower-dimensional space. The result is a compressed version of the original matrix which can be used to speed up subsequent steps. Because of this characteristic PCA is often used for image compression, where it has proved itself to be effective [71].
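A minimal NumPy sketch of this decomposition, following the notation above; the data matrix is a random stand-in.

```python
import numpy as np

# Toy data matrix X: 100 samples, 5 variables (illustration only).
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5))
Xc = X - X.mean(axis=0)      # PCA operates on centred data

# SVD of the centred data: Xc = P Delta Q^T.
P, delta, Qt = np.linalg.svd(Xc, full_matrices=False)
factor_scores = P * delta    # P Delta: the factor scores

# Project onto the first k principal components (drop the short axes).
k = 2
X_reduced = Xc @ Qt[:k].T
print(X_reduced.shape)       # (100, 2)
```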

Chapter 5

Related work

The related work is divided into two subjects: blob detection and machine learning for biomedical image analysis. Each subject is treated separately.

5.1 Blob detection

A survey on the usage of blob detection algorithms for biomedical image analysis was conducted in 2016 [72]. The authors gathered and examined 30 relevant papers in which classical blob detection algorithms are utilised, in other words algorithms that do not use any machine learning or artificial intelligence. They found that a majority (20 of 30) of the papers used either the Laplacian of Gaussian (LoG), the Difference of Gaussians (DoG) or the Determinant of Hessian (DoH) method (see Figure 5.1). The authors did not explain why they think these methods are the most popular. Blob detection is not only useful for analysing biomedical images, but also for images in other fields. When fruits are interpreted as blobs, for example, machine vision techniques can be used to count them in trees [40, 73]. Tracking piglets in videos [74] and traffic sign detection for autonomous driving [75] are alternative applications of blob detection. Other types of images may be analysed as well, like infra-red images [76] and ultrasound images for the purpose of detecting breast abnormalities [77].


Figure 5.1: Frequency of blob detection methods used in 30 biomedical image analysis papers. From [72].

Ultimately, blob detection can be used for any problem where regions need to be detected that are visually distinct from their surroundings.

5.2 Machine learning for biomedical image analysis

With the development of more powerful computing systems, the use of artificial intelligence is becoming ubiquitous. Bio-informatics experts discovered the advantages of computer-assisted image analysis as early as the 2000s, and a great deal of literature has been written about it already. Before the popularity of machine learning for computer vision, simpler image processing techniques were used, such as segmentation, thresholding and watershed (see 3.2.2), as in CellProfiler [10]. CellProfiler allows the user to define a number of processing modules in sequence for performing analysis on cell images. Another popular software application for image processing is FIJI [13], which is a "batteries-included" version of the powerful ImageJ 1.x [78] image processing tool. Even though these tools have many features, each step in the analysis requires parameters to be determined by the user. This can be difficult because the results depend highly on these parameters, and it is difficult to verify that the chosen parameters are correct.

Also, neither tool can do out-of-core image processing without resorting to custom plugins or scripts. This missing feature makes them unsuitable for analysing images that do not fit in memory. Moreover, 3D is not yet fully supported in CellProfiler, with some crucial functions missing. Gene expression profiling using image processing methods is described in [5, 20, 79]. Transcriptomics, the set of techniques used to study an organism's transcriptome (the sum of all its RNA transcripts), is often used for gene expression profiling. Image analysis methods have been successfully applied to transcriptomics [22, 80]. The authors of the papers [81, 82] have provided an extensive explanation of the common steps in an automated bio-image analysis pipeline. Especially in high-throughput experiments, image analysis is used heavily to quantify phenotypes of interest to biologists [16]. Papers such as [16, 83, 84] treat the common case of phenotypic cell profiling specifically. Since mere image processing methods were not always sufficient, machine learning techniques have become more present in biomedical image analysis, often in the context of high-content screening (HCS). By using techniques from image processing, computer vision and machine learning, large amounts of bio-image data can be analysed, which is also frequently called high content analysis (HCA) [82]. Machine learning has also been applied to cell segmentation [85, 86] and nucleus detection [87]. In [88] the authors use a type of neural network, called a convolutional neural network, to detect nuclei. For cell segmentation SVMs are utilised in [89], while deep learning algorithms are compared in [90]. In addition, free tools that apply machine learning for image-based cell analysis have been developed, such as CellCognition [8], CellClassifier [7], Advanced Cell Classifier (ACC) [91] and cellXpress [9]. CellCognition is a computational framework for quantitative analysis of high-throughput fluorescence microscopy. It has functions for, among other things, image segmentation, object detection, feature extraction and statistical classification. The sole purpose of CellClassifier is automatic classification of single-cell phenotypes using supervised machine learning. It requires images that have been prepared with CellProfiler. CellCognition and CellClassifier rely on an SVM for phenotypic classification. Advanced Cell Classifier is an improvement over CellClassifier that is more user-friendly, allows for more advanced machine learning with 16 different classifiers and was made for high-content screens.

cellXpress is another fully featured and highly optimised software platform for cellular phenotype profiling. The platform is designed for fast and high-throughput analysis of cellular phenotypes based on microscopy images. Notably, the bio-image analysis tool Ilastik [36] was a major inspiration for this project because its use of active learning, which lets users label a few instances iteratively on-line, has shown to be very effective. It uses a random forest for pixel classification, without any indication that other algorithms were tried. Therefore, in this project other machine learning candidates will be evaluated as well. The described tools are primarily meant for cell detection and profiling, which is slightly different from blob detection. Besides that, they all work semi-automatically by requiring the user to label a few instances beforehand, while the aim of this project is to do blob detection fully automatically.

Chapter 6

Methodology

Machine learning can be applied to blob detection by first classifying the pixels as blobs/non-blobs, followed by a clustering of these pixels into blobs. This approach is also used in this research, as extensively described in section 6.1. Based on this approach, a number of experiments can be devised that help answer the research question. These are discussed in section 6.2. Section 6.3 describes the characteristics of the data, how the data was collected and how it has been labelled. The last section, 6.4, discusses the details of the experiments and how overall reliability and validity will be assured.

6.1 Blob detection process

In this project the blob detection process consists of six subsequent steps. The input is a 3D image and the output is a list of blob coordinates. The first step extracts a set of features for each pixel in the image. These features signify intensity, edges and texture. An optional intermediary step is compressing the features with PCA. Next, a trained classification model is used to classify the pixels into two classes: blob and non-blob. The resulting binary image is passed into a clustering step that attempts to declump touching blobs. In the next step the locations and characteristics of all blobs are extracted. Finally, the blobs are filtered based on their characteristics and returned as output. An overview of the process is shown in Figure 6.1. Each step will now be discussed more thoroughly.


Figure 6.1: The blob detection process.

6.1.1 Feature extraction

Besides the single pixel intensities, filters can be applied to the input image to obtain additional features for each pixel. The filters are partly inspired by Ilastik [36]. The intensities of neighbouring pixels are represented by the raw image smoothed with a Gaussian filter. The Laplacian of Gaussian, Difference of Gaussians, Determinant of Hessian (see 3.2.4) and Gaussian of gradient magnitude are used to detect edges. The texture of regions is distinguished by the eigenvalues of the structure tensor (see 3.2.5) and the eigenvalues of the Hessian of Gaussian [36]. The scale of the filter σ for each feature can be specified. The features can be calculated in 2D by calculating them for each z-plane separately, but some also in 3D by applying a Gaussian filter in the z dimension.
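A sketch of how such features could be computed with scipy.ndimage (one plausible implementation; the thesis does not spell out its filter code). The 1.6 ratio between the two DoG scales is an assumption, and the image is a random stand-in for a chunk.

```python
import numpy as np
from scipy.ndimage import (gaussian_filter, gaussian_laplace,
                           gaussian_gradient_magnitude)

# Random stand-in for a 3D image chunk, axes ordered (z, y, x).
img = np.random.rand(3, 64, 64).astype(np.float32)
sigma = 1.6

# 2D features: filter each z-plane separately by using sigma 0 along z.
gaus_2d = gaussian_filter(img, sigma=(0, sigma, sigma))
log_2d = gaussian_laplace(img, sigma=(0, sigma, sigma))

# 3D features: also apply the Gaussian in the z dimension.
gaus_3d = gaussian_filter(img, sigma=sigma)
ggm_3d = gaussian_gradient_magnitude(img, sigma=sigma)

# Difference of Gaussians: subtract two smoothed versions of the image.
dog_3d = gaussian_filter(img, sigma) - gaussian_filter(img, 1.6 * sigma)
```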

6.1.2 Feature compression

The idea behind this step is that the number of extracted features may be so high that training and running a classification algorithm takes very long. Also, some features may not be very relevant after all. A dimensionality reduction algorithm such as PCA can reduce the number of features without sacrificing too much accuracy. This step takes in the pixel features, transforms them using a fitted PCA model and outputs the resulting pixel features in a lower dimensionality.

6.1.3 Pixel classification

A trained classifier model can now be used to classify pixels by their features. The output of the classification is a predicted label for each pixel saying whether it is likely part of a blob or not. The labels for all pixels are then put together again such that we get a binary image with a background of 0's and regions of 1's that denote blobs.

6.1.4 Pixel clustering

Since blobs can be clumped together, the aim of this step is to split them up. The method commonly used in image-based cell analysis software is a watershed algorithm (see 3.2.2). The starting markers are the local maxima in the blobs and the "water" flows until the edges of the blobs. Even though the results of watershed are usually acceptable, there are other algorithms that can be useful for declumping as well. When the x and y coordinates are treated as features of each blob pixel, the pixels can be grouped into clusters by a clustering algorithm with the inverse of the Euclidean distance as similarity measure. The pixels that are close together will form clusters which correspond to individual blobs. In order to reduce the running time, the clustering is performed for each connected component separately, so that only the pixels in a component are considered each time. The result of this step is a segmented image with labelled regions.

The background is labelled with 0's, while each cluster is labelled with a unique id from {1, 2, ...}.
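A sketch of the per-component clustering, assuming K-means as the clustering algorithm and a made-up binary image; in the real pipeline the number of clusters would come from the local maxima (see Table 6.3).

```python
import numpy as np
from scipy import ndimage
from sklearn.cluster import KMeans

# Made-up binary output of the pixel classification step (1 = blob pixel).
binary = np.zeros((64, 64), dtype=np.uint8)
binary[10:20, 10:30] = 1  # pretend this region holds two touching blobs

# Label connected components so each one is clustered separately.
components, n = ndimage.label(binary)

segmented = np.zeros_like(components)
next_id = 1
for i in range(1, n + 1):
    coords = np.argwhere(components == i)  # pixel coordinates as features
    k = 2  # in practice: the number of local maxima in this component
    labels = KMeans(n_clusters=k, random_state=0).fit_predict(coords)
    for j in range(k):
        pts = coords[labels == j]
        segmented[pts[:, 0], pts[:, 1]] = next_id  # unique id per cluster
        next_id += 1
```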

6.1.5 Blob extraction

In this step the segmented image is processed and all the clusters (i.e. blobs) with their characteristics are extracted. For each blob the centroid is calculated as the mean of the x, y and z coordinates of its pixels. The radius is the maximum distance over all the pixels to the centroid. The output of this step is a list of blobs with their respective characteristics.
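These two characteristics reduce to a few lines of NumPy; the coordinates below are made up.

```python
import numpy as np

def blob_properties(coords):
    """Centroid and radius of one blob, given its pixel coordinates (z, y, x)."""
    centroid = coords.mean(axis=0)  # mean of the coordinates
    radius = np.linalg.norm(coords - centroid, axis=1).max()  # furthest pixel
    return centroid, radius

# Made-up pixel coordinates of a single small blob.
coords = np.array([[1, 10, 10], [1, 10, 11], [1, 11, 10], [2, 11, 11]])
print(blob_properties(coords))
```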

6.1.6 Blob filtration

With some prior knowledge about the size of blobs, this step filters out the blobs that are either too small or too large. This is needed because noise may be mistaken for blobs in the previous steps. Finally, this step outputs the filtered blobs from the original input image.

6.1.7 Chunking

Since an image may not fit in memory completely, it has to be processed out-of-core. The solution in this thesis is applying the analysis to separate parts of the image by chunking. A chunk is a rectangular cuboid whose depth is the same as the image depth but whose width and height are usually much smaller (usually 500 × 500 pixels). In addition to the image data within its boundaries, every chunk also contains the data of a 10 pixel thick border around the chunk, called the overlap. This is needed for the convolutional filters, which otherwise have to guess the values outside of the chunk boundaries. Also, some blobs may lie across chunk boundaries and would otherwise be split in two. The value of 10 pixels was chosen because blobs are very unlikely to be bigger than 20 pixels in diameter, which means that blobs whose centroid lies within a chunk are fully encompassed by the overlap. All the steps are performed on each chunk in sequence. In the end the blobs of each chunk are collected and returned as the final list of blobs.
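A sketch of a chunk iterator under these assumptions (full-depth cuboids, a 10 pixel overlap clipped at the image border); in practice the image would be opened as a memory map rather than loaded whole.

```python
import numpy as np

def iter_chunks(image, size=500, overlap=10):
    """Yield full-depth chunks with a border of `overlap` pixels."""
    depth, height, width = image.shape
    for y in range(0, height, size):
        for x in range(0, width, size):
            # Extend the window by the overlap, clipped at the image border.
            y0, y1 = max(0, y - overlap), min(height, y + size + overlap)
            x0, x1 = max(0, x - overlap), min(width, x + size + overlap)
            yield (y, x), image[:, y0:y1, x0:x1]

# Small stand-in image; a real one would be a np.memmap onto the TIFF data.
image = np.zeros((3, 1200, 1700), dtype=np.uint8)
for origin, chunk in iter_chunks(image):
    pass  # run feature extraction, classification and clustering per chunk
```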

6.2 Experiments

The research question of this thesis is: How can machine learning techniques effectively be applied to blob detection in high-resolution 3D microscopy images? The research comprises six experiments. The first four experiments evaluate how machine learning techniques can be applied to the first four steps of the blob detection process: feature extraction, feature compression, pixel classification and pixel clustering. There are no experiments related to the last two steps of blob detection, blob extraction and blob filtering, because there is not enough data to justify using a machine learning approach over a simple heuristic method. The fifth experiment assesses the feasibility of the blob detection process, using the optimal machine learning techniques found, by running it over a whole image. In the sixth experiment the blob detector of this thesis is compared to the state-of-the-art tools that are commonly used for blob detection in biomedical images.

6.2.1 A: Feature extraction

Even though an unlimited number of features can be extracted from an image, only a limited set may be useful for a specific application. The features that may be beneficial for differentiating blob pixels from non-blob pixels are shown in Table 6.1. The single pixel value and the Gaussian filter are the most basic features and respond to light/dark regions in general. The motivation for choosing the Laplacian of Gaussian, Difference of Gaussians and Determinant of Hessian features is their popularity in blob detection (see Figure 5.1), which stems from their high response to local extrema (see 3.2.4). The Gaussian of gradient magnitude is suitable for detecting edges in images. After applying a Gaussian filter it employs a gradient magnitude filter that reveals the gradients of the pixel intensities. The eigenvalues of the structure tensor and the eigenvalues of the Hessian of Gaussian are both also used in Ilastik [36] to reveal texture.

Feature                              Code          2D/3D    Scales σ
Value                                value         N/A      N/A
Gaussian filter                      gaus          2D, 3D   0.7, 1.0, 1.6, 2.5, 4.0
Laplacian of Gaussian                log           2D, 3D   0.7, 1.0, 1.6, 2.5, 4.0
Gaussian of gradient magnitude       ggm           2D, 3D   0.7, 1.0, 1.6, 2.5, 4.0
Difference of Gaussians              dog           2D, 3D   0.7, 1.0, 1.6, 2.5, 4.0
Determinant of Hessian               doh           2D       0.7, 1.0, 1.6, 2.5, 4.0
Eigenvalues of structure tensor      stex, stey    2D       0.7, 1.0, 1.6, 2.5, 4.0
Eigenvalues of Hessian of Gaussian   hogex, hogey  2D       0.7, 1.0, 1.6, 2.5, 4.0

Table 6.1: All the pixel features that are tested. The value filter represents the single pixel intensities. The Determinant of Hessian, eigenvalues of structure tensor and eigenvalues of Hessian of Gaussian are not implemented in 3D. Because the eigenvalues consist of both an x and a y component, these features consist of two attributes.

A1: Feature selection

To optimise feature extraction it is necessary to find a subset of features from Table 6.1 that is both relevant for classifying a pixel as blob/non-blob and non-redundant. The feature selection process proposed by José Bins and Bruce A. Draper does exactly this [92]. They suggest a three-step approach for selecting a small number of important features from the huge set of features that is often available in the computer vision domain. The feature selection method in this thesis is largely inspired by their work. The first step is filtering out the irrelevant features by their Relief score [93] with respect to their predictive power of the label. Even though Relief has been shown to detect relevant features well, in practice it is very time-consuming to compute for large datasets. Therefore in this thesis the choice was made to use mutual information (MI) instead, which has seen use in feature selection as well [94]. This metric, also sometimes called information gain, is utilised to calculate the gain in information (defined as the decrease in entropy) when instances are split by some condition. This makes it a suitable metric because the features are evaluated for significance to classification, which essentially is splitting instances by their features. Moreover, the benefit of mutual information is that it does not make assumptions about the data, unlike other methods such as the Chi-square test, which assumes categorical variables [95], and the Pearson correlation coefficient, which only considers linear correlations [96].

The filtering step ends by removing the features whose mutual information score is not at least some minimum value. Additionally, because some features take longer to calculate than others, the duration of the calculation is also taken into account. Since mutual information does not detect redundancy, it is possible that some of the most relevant features are very similar to each other. Therefore the second step aims to eliminate redundancy by only keeping the most relevant feature of each group of similar features. Similar features are found by applying k-Means. This use of k-Means is unusual: most of the time instances are clustered by their features, but here features are clustered by their values for each instance. In the paper the authors suggest a third step that uses the Sequential Floating Forward Selection (SFFS) algorithm [97] to create an optimal subset of features, but this is not needed in our case because the number of features is already sufficiently low after the second step. The field of feature selection in machine learning is very broad and many algorithms have been proposed that aim to find the set of optimal features. However, in order to limit the scope of this thesis, other methods of feature selection are not considered. The reason for choosing the above method is that it has been shown to be more effective than standard feature selection algorithms for large datasets with many irrelevant and redundant features [92]. It seeks the most relevant features with the lowest amount of redundancy, which are the desired properties for a feature set.
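A sketch of the two steps with scikit-learn, on made-up data; the MI threshold of 0.010 is the value used later in A1 (see 7.1), and note the transpose that clusters features instead of instances.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.cluster import KMeans
from sklearn.preprocessing import scale

# Made-up stand-ins: X is pixels x features, y is the blob/non-blob label.
rng = np.random.RandomState(0)
X = rng.normal(size=(1000, 20))
y = (X[:, 0] + 0.1 * rng.normal(size=1000) > 0).astype(int)

# Step 1: relevance - drop features with low mutual information.
mi = mutual_info_classif(X, y, random_state=0)
relevant = np.where(mi >= 0.010)[0]

# Step 2: redundancy - cluster the *features* (note the transpose) and
# keep only the most relevant feature of each cluster.
k = 5
km = KMeans(n_clusters=k, random_state=0).fit(scale(X[:, relevant]).T)
selected = [relevant[np.argmax(np.where(km.labels_ == c, mi[relevant], -1))]
            for c in range(k)]
print(sorted(set(selected)))
```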

A2: 2D versus 3D comparison

In addition, it is interesting to see whether features are more relevant when calculated in 3D rather than 2D. This can be done by comparing their mutual information with respect to the label using a two-sided t-test¹ for the null hypothesis that the mean of the mutual information for the 2D feature is equal to the mean of the mutual information for the 3D feature: µ(MI(f2D)) = µ(MI(f3D)).

1 From scipy.stats.ttest_rel

For the features where the hypothesis is rejected, it is determined whether the mutual information is higher for the 2D feature or for the 3D feature.
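A sketch of this test with SciPy (the footnote names scipy.stats.ttest_rel); the MI values are made-up stand-ins for the 100 paired samples, and α = 0.001 is the significance level fixed in 6.4.4.

```python
import numpy as np
from scipy.stats import ttest_rel

# Made-up mutual information of the 2D and 3D versions of one feature,
# measured on the same 100 random samples (paired observations).
rng = np.random.RandomState(0)
mi_2d = 0.020 + 0.002 * rng.normal(size=100)
mi_3d = 0.022 + 0.002 * rng.normal(size=100)

# Two-sided paired t-test of H0: mu(MI(f_2D)) = mu(MI(f_3D)).
t, p = ttest_rel(mi_2d, mi_3d)
if p < 0.001:  # significance level used in this thesis
    better = '3D' if mi_3d.mean() > mi_2d.mean() else '2D'
    print('reject H0: %s has more predictive power (p=%.3g)' % (better, p))
else:
    print('no significant difference (p=%.3g)' % p)
```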

Reliability and validity

To reduce the effect of chance, 100 random samples of about 26,400 pixels each, with their corresponding labels, are selected from the training image. In total about 1% of the pixels in the blob regions are sampled. In each sample about 144 (0.55%) of the pixels are labelled as blob, the rest as non-blob. Then for each sample the desired features are calculated, followed by a mutual information calculation with respect to the label.

6.2.2 B: Feature compression

For the sake of reducing classification time, it is necessary to investigate how much the pixel features can be compressed with PCA whilst keeping enough information for accurate classification. This is measured by the performance of the pixel classification after the features have been transformed using a fitted PCA model. The features are compressed to {1, 2, ..., m} components, where m is the number of extracted features.

6.2.3 C: Pixel classification

For pixel classification there are a number of suitable algorithms, ranging from simple to more complex. Simpler algorithms have fewer parameters because they make more assumptions about the data. This can make them more useful than more complex algorithms when little data is available. Algorithms with more hyper-parameters require more data and effort to train, but can become more accurate because they can detect less obvious patterns in the data. Now for the candidates: naive Bayes can be used as a baseline classifier to compare the other algorithms with. Logistic regression is a known good performer on binary classification problems. k-Nearest neighbour can work well on problems that are not too complex.

It is simple to understand and implement, but its major drawback is the long running time for large data sets. Decision trees can be effective too, because they can potentially emulate any decision boundary. However, since they are prone to over-fitting, a random forest of multiple bagged decision trees may be a better alternative. AdaBoost is another ensemble method that, like a random forest, can mitigate the shortcomings of a decision tree by focusing on the harder-to-classify instances. As a classifier that can achieve a high accuracy even with little data, the support vector machine is a popular additional contestant. Finally, a simple feed-forward neural network is chosen as the last candidate because, due to its high number of parameters, it can potentially achieve a very high accuracy when given enough data. The metric used for measuring classification quality is the f1-score (see 4.1.9). The accuracy metric is not as useful because the blob/non-blob pixels are highly imbalanced, with at most 1% of the pixels being blob pixels. The speed is measured as the prediction time for classifying a set of pixels. The training time of the classifiers is not taken into consideration because all training happens off-line.

C1: Hyper-parameter tuning

First, the optimal hyper-parameters are sought for each classifier, such that it achieves the highest f1-score on the best selected features. Table 6.2 shows the classifiers that will be tested and their hyper-parameter search space. The same optimised decision tree is used as base estimator in Random Forest and in AdaBoost. Naive Bayes does not have any hyper-parameters that can be optimised. For logistic regression, support vector machine, decision tree and k-nearest neighbour, grid search [56] is used to find the optimal combination of hyper-parameters. For the neural network a random search [56] is performed over 100 random combinations to approach the best hyper-parameters, followed by a grid search to attain the local maxima. With the exception of the neural network classifier, which is implemented with Keras [98] using a TensorFlow [99] back-end, the classifiers use the implementations in scikit-learn [100]. A sketch of such a grid search for the decision tree is shown below Table 6.2.

Table 6.2: Classifiers tested for pixel classification and their search space of hyper-parameters. The remaining hyper-parameters get values according to the defaults in scikit-learn 0.19.1, or Keras 2.1.5 for the neural network.

Classifier               Search space of hyper-parameters
Naive Bayes              None
Logistic regression      penalty¹ ∈ {'l1', 'l2'}, C² ∈ {0.5, 1.0, 1.5, 2.0, 2.5}
k-Nearest neighbour      n_neighbors³ ∈ {1, 3, 5, 10}, weights⁴ ∈ {'uniform', 'distance'}, p⁵ ∈ {1, 2}
Decision tree            criterion⁶ ∈ {'gini', 'entropy'}, splitter⁷ ∈ {'best', 'random'}, max_depth⁸ ∈ {3, 4, ..., 12}, max_features⁹ ∈ {1, 2, ..., 10}
Random forest            None - uses 50 optimised decision trees
AdaBoost                 None - uses 50 optimised decision trees
Support vector machine   C² ∈ {0.5, 1.0, 1.5, 2.0, 2.5}, kernel¹⁰ ∈ {'linear', 'poly', 'rbf', 'sigmoid'}, gamma¹¹ ∈ {0.1, 0.4, 0.7, 1.0, 1.3}
Neural network           n_neurons1¹² ∈ {1, 5, 10, 15, 20, 25, 30}, n_neurons2¹³ ∈ {1, 5, 10, 15, 20, 25, 30}, dropout¹⁴ ∈ {0.0, 0.1, 0.2, 0.3, 0.5}, lr¹⁵ ∈ {0.0001, 0.0005, 0.001, 0.005, 0.01}, decay¹⁶ ∈ {0.000001, 0.00001, 0.0001, 0.001, 0.01, 0.1}

1 'l1' or 'l2' regularisation
2 regularisation term - smaller means stronger regularisation
3 number of neighbours to be considered when classifying a new point
4 'uniform' means that all neighbours are weighted equally, 'distance' means that the weight for each neighbour is the inverse of its distance to the point
5 p=1 is the Manhattan distance and p=2 is the Euclidean distance [49]
6 criterion for determining the best split - 'gini' is the gini impurity, 'entropy' is the information gain
7 strategy for choosing the best split at each node - 'best' chooses the best split and 'random' chooses a random split
8 maximum depth of the tree
9 number of features to consider when choosing the best split
10 kernel used in the algorithm - 'poly' is a polynomial kernel of degree 3 and 'rbf' stands for radial basis function
11 coefficient used in the 'rbf', 'poly' and 'sigmoid' kernels
12 number of neurons in the first hidden layer
13 number of neurons in the second hidden layer
14 fraction of input units to drop
15 learning rate
16 learning rate decay over each update
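As referenced above, a sketch of the grid search over the decision tree's search space; GridSearchCV is scikit-learn's exhaustive search, and the toy data stands in for the sampled pixel features.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Search space for the decision tree, as listed in Table 6.2.
param_grid = {
    'criterion': ['gini', 'entropy'],
    'splitter': ['best', 'random'],
    'max_depth': list(range(3, 13)),
    'max_features': list(range(1, 11)),
}

# Toy stand-in for the sampled pixel features and imbalanced labels.
rng = np.random.RandomState(0)
X = rng.normal(size=(500, 10))
y = (X[:, 0] > 1.5).astype(int)

# Exhaustive search over the grid, scored by f1 with cross-validation.
search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      param_grid, scoring='f1', cv=10)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```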

C2: Pixel classifier comparison on different PCA compressions

To test the optimised classifiers, they are first trained on the selected best features, which are compressed with PCA to different numbers k = m, m−1, m−2, ..., 1 of components. Then they are used to predict the labels of a different sample of pixels. Next, the classifiers are compared to each other in terms of f1-score and prediction time.

Design neural network

Compared to other classification algorithms, a neural network allows for many more hyper-parameters. In order to limit the options, the following parameters are therefore fixed. The input layer has a number of neurons equal to the number of best features found in A1. The number of hidden layers is 2, with the reasoning that more layers are better for learning complex functions, but the problem is not complex enough to justify a deeper neural network. The output layer has only a single neuron because the neural network is expected to give a binary output. The layers are all dense, meaning that between every pair of layers all the neurons are connected. An exception is the optional drop-out between the first and second hidden layer. Here some neurons may not be connected to the next layer, based on the dropout parameter. The benefit of drop-out is that it prevents overfitting: since the neural network cannot focus on only a few features, it needs to find general patterns. The activation functions for the neurons in the hidden layers are ReLUs (see Figure 4.4), which are easy to train because of their linear behaviour, making them in general an excellent choice [56]. The activation function in the output layer is a sigmoid (see Figure 4.2), as is common for binary classification problems with gradient-based learning [56]. In terms of optimisation algorithm there is no consensus on which is best [101]. However, since the Adam optimiser [58] has shown to be robust and is used frequently in literature [56], it is our choice as well. With the exception of the learning rate and learning rate decay, all the hyper-parameters of the Adam optimiser have the default values given in the original paper. The number of training epochs is 150 and the batch size is 10000.
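This design translates to a short Keras model. The sketch below uses the hyper-parameter values later found in C1 (Table 7.3) and the Keras 2.1.5 API listed in Table 6.9; the training call is only indicated, since it needs the sampled pixel data.

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import Adam

n_features = 10  # number of best features found in A1

model = Sequential([
    Dense(26, activation='relu', input_dim=n_features),  # hidden layer 1
    Dropout(0.1),                     # optional drop-out between the layers
    Dense(10, activation='relu'),     # hidden layer 2
    Dense(1, activation='sigmoid'),   # single neuron for the binary output
])

# Adam with the tuned learning rate and decay; other settings as in [58].
model.compile(optimizer=Adam(lr=0.08, decay=0.0001),
              loss='binary_crossentropy', metrics=['accuracy'])

# model.fit(X_train, y_train, epochs=150, batch_size=10000)
```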

Reliability and validity

For both the hyper-parameter optimisation and the comparison of the classifiers, cross-validation is used to test how well a classification algorithm predicts the labels of an unseen dataset. First, 1% of the pixels in the blob regions of the training image are randomly sampled, together with their labels. For these pixels the best features, as determined by A1, are calculated. Stratified 10-fold cross-validation is then applied to train the classifiers on 9 folds and calculate their f1-score for predicting the labels in the remaining fold. Each of the 10 folds is the test fold exactly once. Stratification makes sure that the ratio of blob pixels is equal in each fold. This is important because the blob ratio must be similar to what is expected in a whole image. The mean of the f1-scores over all 10 folds is taken as the overall f1-score for each classifier.
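A sketch of this protocol with scikit-learn; the decision tree and the small positive rate are stand-ins for the real classifier and pixel sample.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for the sampled pixel features and blob/non-blob labels.
rng = np.random.RandomState(0)
X = rng.normal(size=(2000, 10))
y = (X[:, 0] > 2.0).astype(int)  # a few percent positives, like the blob ratio

# Stratified 10-fold CV keeps the blob ratio equal in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0),
                         X, y, scoring='f1', cv=cv)
print(scores.mean())  # overall f1-score for this classifier
```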

6.2.4 D: Pixel clustering

For the pixel clustering step, the goal is to find the most suitable algorithm on the basis of its clustering quality and running time. The metric used for measuring clustering quality is the silhouette score (see 4.2.6). In order to compare it with the other clustering algorithms, the watershed algorithm is treated as a clustering algorithm in this experiment. Not every clustering algorithm is suitable for pixel clustering, because some have different definitions of clusters or do not scale well, for instance. The candidates have therefore been chosen for the following reasons. K-means is a popular clustering algorithm with good reason: it is simple and fast. Besides that, it looks for round clusters of similar size, which is beneficial in the case of blobs. Agglomerative clustering algorithms make few assumptions about the data, making them a suitable general-purpose candidate. The MeanShift algorithm was created for finding high-density clusters, which is exactly what blobs are. Spectral clustering is known to be slower than the others but can find clusters of high quality. The clustering algorithms and their hyper-parameters are shown in Table 6.3. Their implementations in scikit-learn [100] are used.

Table 6.3: Algorithms and their parameters used for pixel clustering. The values of the remaining parameters are the default values in scikit-learn 0.19.1.

Clustering algorithm       Parameters
Watershed                  None
K-means                    n_clusters = number of local maxima¹
Agglomerative (centroid)   t² = 5
Agglomerative (ward)       n_clusters = number of local maxima
MeanShift                  bandwidth³ = 4
Spectral                   n_clusters = number of local maxima

1 starting centroids are initialised as the locations of the local maxima
2 maximum distance between clusters - two clusters are merged when the distance between their centroids is smaller than t
3 coefficient used in the RBF kernel

Reliability and validity

To reduce the effect of chance, the blob region of the test image is split in chunks of 500 × 500 × depth pixels and 1% of the chunks are randomly sampled. All the pixels in each chunk are first classified using the best classifier as determined by C2 and the best hyper-parameters as determined by C1. The pixels in each chunk are then clustered using every clustering algorithm in Table 6.3. For each clustering result the silhouette score and other blob statistics are calculated. The clustering algorithms are finally evaluated by the mean of the silhouette scores over the sampled chunks.

6.2.5 E: Run on whole image

The purpose of this experiment is to determine whether the overall machine learning approach taken in this thesis is suitable for detecting blobs in whole images in practice. The set of optimal features from A1, the optimal PCA compression from B, the optimal classifier from C and the optimal clustering algorithm from D are used in the process. The blob detector with these properties is run over one complete image. Both the blob detection results and the running time are analysed.

6.2.6 F: Comparison with state-of-the-art

As a final step in the validation of the blob detector proposed in this thesis, it is compared to the current state-of-the-art tools for blob detection. It is not fair to compare the blobs found by the different tools. Firstly, because all the tools require different parameters, it cannot be guaranteed that the used parameters are optimal. Secondly, there is no ground truth available on the number, locations and sizes of the blobs to check the blob results against. Therefore the tools are only compared based on their running time. The blob detection approach in this thesis, called the MFB detector, is compared to FIJI [13], CellProfiler [10] and Ilastik [36]. For the configuration of each tool, the reader should consult Appendix A. Each tool is run over 10 random crops of size 500 × 500 × 16 pixels of the test image. The mean of those times is then used for comparing the performance of the four tools.

6.2.7 Summary

For a summary of the solutions that will be tested in the experiments, one can consult Table 6.4.

6.3 Data collection

6.3.1 Characteristics

The data consists of high-resolution 3D microscopy images made with a confocal microscope. The images are saved in TIFF files where the depth layers are saved as a z-series. The description of the TIFF files contains metadata in the OME-TIFF XML format [102]. The characteristics of the images are summarised in Table 6.5. Experiments A: Feature extraction, B: Feature compression and C: Pixel classification require images with labelled pixels. For these steps the same image is used for both training and evaluation, with the characteristics in Table 6.6. For experiment D: Pixel clustering there is no ground-truth data of the clusters.

A - Image features: Raw pixel values; Gaussian filter; Laplacian of Gaussian; Gaussian of gradient magnitude; Difference of Gaussians; Determinant of Hessian; Eigenvalues of structure tensor; Eigenvalues of Hessian of Gaussian

B - Feature compression: PCA transform to m to 1 components

C - Classification algorithms: Naive Bayes; Logistic regression; k-Nearest neighbour; Decision tree; Random forest; AdaBoost; Support vector machine; Neural network

D - Clustering algorithms: Watershed; K-means; Agglomerative (centroid); Agglomerative (ward); MeanShift; Spectral

E - Run on whole image: N/A

F - Blob detection tools: MFB Detector; FIJI; CellProfiler; Ilastik

Table 6.4: Summary of solutions that will be tested in each experiment.

Characteristic     Values
Width/height       Any size, but typically around 30000 × 30000
Number of layers   Any number ≥ 1, but typically 3 − 12
Pixel size x/y     Any value, but typically 0.27 µm
Pixel size z       At least 0.7 µm, but typically 1 µm
Storage size       Typically 0.5 − 10 GB
Data format        Uncompressed OME-TIFF file with description in the OpenMicroscopy OME-XML format [102]
Color depth        8 bits, but usually less than 256 unique values

Table 6.5: General characteristics of the biomedical images considered in this research.

Characteristic     Values
Width/height       2310 × 115000
Number of layers   3
Pixel size x/y     0.217 µm, 0.219 µm
Pixel size z       1.0 µm
Storage size       762 MB
Data format        OME-TIFF-XML
Color depth        8 bits, at most 75 unique values
Blob coverage      0.528% blob pixels

Table 6.6: Characteristics of the training image. This image has been labelled with the procedure described in 6.3.2.

Characteristic     Values
Width/height       24097 × 14445
Number of layers   16
Pixel size x/y     0.273 µm, 0.272 µm
Pixel size z       1.0 µm
Storage size       5.19 GB
Data format        OME-TIFF-XML
Color depth        8 bits, at most 210 unique values
Capture time       18548 s (5:09 hours)

Table 6.7: Characteristics of the test image. This image is unlabelled.

Since we now have more freedom to pick any image, a different image is used than in the previous experiments. The characteristics of this larger image can be found in Table 6.7. The same image is used for experiments E: Run on whole image and F: Comparison with state-of-the-art.

6.3.2 Labelling

As supervised machine learning algorithms, the pixel classifiers must be trained with image data where the pixels have been labelled as blob or non-blob. Two methods have been considered for generating these labels: humans can label the pixels, or a program labels them with prior knowledge that is not present during classification.

Human labelling is tedious and error-prone, while machines are much faster but can make errors too without noticing. To reduce the drawbacks of these methods it has been decided to combine them. There are two versions of the training image: one with blobs and one in which the fluorescent dyes have been stripped such that there are no blobs visible. A computer program can subtract the image without blobs from the image with blobs to reduce the background. The background is not completely removed because the images are slightly different due to noise and misalignment. In the resulting image the blobs are more distinct from their surroundings than in the original image. See also Figure 6.2. Then, to accentuate the local maxima in the images, the Difference of Gaussians (DoG) of the image is calculated. A scale of σ = 1.5 has shown to give the best results by trial-and-error. Next a sample of the pixels (0.1%) from the DoG image is randomly chosen and used to fit a K-means clustering algorithm with 2 clusters. Standardisation (see Equation 6.1) is used to improve the speed and accuracy of the convergence. Since pixels that belong to a blob receive a much higher value after a DoG transformation, the clustering algorithm will group the blob pixels in one cluster and the non-blob pixels in another cluster. K-means is chosen because it is general-purpose and reasonably efficient for a clustering algorithm. Its drawback, however, is that it can get stuck in a local minimum because of an unfortunate choice of initial centroids. Therefore the algorithm is re-initiated 50 times with different centroid seeds. The centroids of the best of the 50 runs in terms of inertia, which is defined as the sum of squared distances of the samples to their closest cluster centre [103], are used to k-means cluster the remaining pixels. The results of the machine-labelled image are visually checked by a human to make sure that the blobs are correctly labelled. The labelling method utilises two pieces of prior knowledge that are not available during run-time of the classifier. Firstly, the classifier does not have access to the image where the blobs have been stripped, because it is too costly to strip the fluorescent dyes and then recapture each image. Secondly, the results will not be checked by a human in practice.

(a) Blobs visible (b) Blobs stripped (c) Difference

Figure 6.2: Image c is produced as: c = a − 2 · gaussian_filter(b, σ = 1). The values were chosen empirically.

Specification      Value
Operating system   Windows 10 Home (64-bit)
Processor          Intel(R) Core(TM) i7-3630QM CPU @ 2.40 GHz
Memory size        8.00 GB

Table 6.8: Specifications of the test system.

X′ = (X − µ) / σ    (6.1)
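A sketch of the labelling procedure under the values stated above (background subtraction as in Figure 6.2, DoG at σ = 1.5, a 0.1% sample, 50 K-means initialisations); the images are random stand-ins, and the 1.6 ratio between the two DoG scales is an assumption.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.cluster import KMeans

# Random stand-ins: a = image with blobs, b = image with the dyes stripped.
rng = np.random.RandomState(0)
a = rng.rand(3, 256, 256).astype(np.float32)
b = rng.rand(3, 256, 256).astype(np.float32)

# Background reduction as in Figure 6.2: c = a - 2 * gaussian_filter(b, 1).
c = a - 2 * gaussian_filter(b, sigma=1)

# Difference of Gaussians at sigma = 1.5 to accentuate the local maxima.
dog = gaussian_filter(c, 1.5) - gaussian_filter(c, 1.5 * 1.6)

# Standardise (equation 6.1) a 0.1% pixel sample, then fit 2-cluster
# K-means re-initiated 50 times to avoid bad local minima.
sample = dog.ravel()[rng.choice(dog.size, dog.size // 1000, replace=False)]
mu, sd = sample.mean(), sample.std()
km = KMeans(n_clusters=2, n_init=50, random_state=0)
km.fit(((sample - mu) / sd).reshape(-1, 1))

# Label every pixel with the best centroids found (blob vs non-blob).
labels = km.predict(((dog.ravel() - mu) / sd).reshape(-1, 1)).reshape(dog.shape)
```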

6.4 Experimental design

6.4.1 Test system

All the experiments are performed on the same computer with the specifications in Table 6.8.

Package name              Version   Purpose
javabridge [104]          1.0.15    Dependency of python-bioformats
Keras [98]                2.1.5     User-friendly API for neural networks
Matplotlib [35]           2.2.2     Visualisation of analysis results
NumPy [105]               1.14.2    nD arrays, memory maps and helper functions
NetworkX [106]            2.1       Graph colouring
pandas [107]              0.22.0    Data manipulation and analysis
python-bioformats [108]   1.3.2     Reading OME-TIFF image files
PyQt5 [109]               5.10.1    Matplotlib backend
scikit-image [110]        0.12.3    Image processing
scikit-learn [100]        0.19.1    Machine learning
SciPy [111]               1.0.0     Statistics and other helper functions
TensorFlow [99]           1.7.0     Neural network framework
tifffile [112]            0.14.0    Reading OME-TIFF image files
PyYAML [113]              3.12      Reading and writing YAML files

Table 6.9: Python software packages used in this project.

6.4.2 Software

All the software is written in Python 3.5 from the Anaconda² 4.2.0 distribution. The Python packages used are listed in Table 6.9.

6.4.3 Data analysis

During the experiments the relevant data is stored in pandas DataFrames, which are 2D tabular data structures. After the experiments have finished, the data is analysed using pandas and statistical functions from SciPy. The results are either displayed as text or plotted with Matplotlib.

6.4.4 Overall reliability and validity

To guarantee the statistical significance of the experiments, the significance level is fixed to α = 0.001 for each statistical test. For the purpose of repeatability, the random seed is set to 0 before the creation of every sample.

2 See Anaconda: https://anaconda.org/

The validity of the optimised blob detector is evaluated by running it over a whole image. To check that its performance is reasonable, it is compared with the performance of state-of-the-art tools.

Chapter 7

Analysis

In this chapter the results are listed for each experiment. Furthermore, interesting observations from the results are described in the text.

7.1 Results from A: Feature extraction

This section lists the results collected from the tests on feature extraction. One should refer back to Table 6.1 to find the features that correspond to the feature abbreviations gaus, log, ggm, etc.

A1: Selected best features

In the first step of the feature selection process we filter out those features that are either irrelevant or take too long to calculate. Figure 7.1 shows, per feature and scale, the mutual information and the computation time. The figure shows that especially the Hessian of Gaussian eigenvalues do not provide much information at large scales, while being at the same time expensive to compute. Based on the information from the figure we filter out the features whose mutual information is smaller than 0.010 and whose computation time is larger than 0.4 s. In the second step the features must be clustered such that redundant features can be removed. The results of the clustering are visible in Figure 7.2.


Figure 7.1: Mutual information of the features with respect to the blob/non-blob label for different scales on the left axis. On the right axis, the average time to calculate the feature for a 500 × 500 × 3 pixel image chunk.

Figure 7.2: Results of clustering the features with K-means and k=10. The colors denote the cluster assignment. t-SNE [114] was used to convert the points to a 2D space and spread them out nicely.

It is interesting to see that some clusters contain only one feature (e.g. value and 2d_log_4.0) while some clusters are much larger. Now for each cluster only the feature with the highest mutual information score is kept. Table 7.1 shows the final selection of the 10 best features as determined by their mutual information. The value of 10 was chosen because it resulted in the highest silhouette score (see 4.2.6).

A2: Comparison 2D vs 3D

What Table 7.2 shows is that for the features with a small scale it does not matter whether they are calculated in two or three dimensions. For features with a larger scale there is a difference between 2D and 3D, but no consistent pattern is visible. That the third dimension provides more information on the blob/non-blob label than 2D for the Difference of Gaussians (dog) with σ = 1.6 was to be expected, since the 3D version of the DoG was used to predict the label during the generation of the training data.

Name           Mutual info   Time (s)
3d_dog_1.6     0.030         0.085
3d_gaus_0.7    0.026         0.039
2d_stex_1.0    0.024         0.167
3d_log_1.6     0.022         0.074
2d_log_1.6     0.022         0.060
value          0.022         0.012
3d_log_2.5     0.020         0.075
2d_log_2.5     0.018         0.062
2d_log_4.0     0.013         0.073
2d_hogey_0.7   0.011         0.209

Table 7.1: The 10 selected best features with their mutual information, as calculated using the two-step feature selection process discussed in 6.2.1. The shown time is the average time to calculate the feature for a 500 × 500 × 3 pixel image chunk.

7.2 Results from B: Feature compression and C: Pixel classification

During the tests the support vector machine classifier proved too hard to train, because its training time scales more than quadratically with the number of samples. Therefore the SVM was trained on a smaller sample of 0.01% of the pixels. But since a large sample is crucial due to the significant class imbalance, the SVM performed badly.

Scale σ   0.7         1.0          1.6          2.5          4.0
gaus      = [0.725]   = [0.617]    2D [0.000]   2D [0.000]   ×
log       = [0.938]   = [0.704]    3D [0.000]   3D [0.000]   3D [0.000]
ggm       = [0.795]   = [0.596]    = [0.696]    2D [0.000]   2D [0.000]
dog       = [0.574]   3D [0.000]   3D [0.000]   2D [0.000]   2D [0.000]

Table 7.2: Results of the t-test comparing the means of the mutual information with respect to the label between the 2D and 3D versions of the features. '×' means that one of the samples was not normally distributed, making the t-test invalid. '=' means that the hypothesis can be accepted and there is likely no difference in predictive power on the label between 2D and 3D. '2D' or '3D' means that the hypothesis must be rejected and the respective dimension has more predictive power than the other. The number between brackets is the p-value.

C1: Hyper-parameter optimisation

In Table 7.3 one can observe the values of the optimised hyper-parameters for the classifiers used in pixel classification.

C2: Comparison of classifiers with respect to PCA compression

Each optimised pixel classifier was run over the same pixel sample, compressed with PCA to a varying number of components. Figure 7.3 shows that for most classification algorithms more components (i.e. features) lead to a higher f1-score, as is expected. The exceptions are however the decision tree, random forest and naive Bayes. The disappointing performance of naive Bayes does not come as a surprise, since it is known that its independence assumption can be too limiting for complex problems. What is surprising is that the decision tree and random forest actually profit from stronger (i.e. fewer components) PCA compression. This gives the perception that they are under-fitted when trained on PCA-compressed features. Also AdaBoost, which uses a decision tree as base estimator, loses out compared to k-nearest neighbour, neural network and logistic regression. Without PCA-compression all the classifiers (with the exception of naive Bayes and SVM) perform similarly, with an f1-score around 0.89, the neural network being the best performer.

Classifier               Best hyper-parameters                        f1-score
Naive Bayes              N/A                                          0.159
Logistic regression      penalty='l1', C=2.5                          0.831
k-Nearest neighbour      n_neighbours=10, weights='distance', p=1     0.803
Decision tree            criterion='entropy', max_depth=4,            0.894
                         max_features=10
Random forest            Same as decision tree                        0.848
AdaBoost                 Same as decision tree                        0.897
Support vector machine   C=7.5, kernel='rbf', gamma=1.3               0.812
Neural network           n_neurons1=26, n_neurons2=10,                0.903
                         dropout=0.1, lr=0.08, decay=0.0001

Table 7.3: The optimal hyper-parameters found for the pixel classification algorithms and their f1-scores. The values of the missing hyper-parameters are the default values in scikit-learn 0.19.1, or in Keras 2.1.5 for the neural network.

The interesting observation that can be made from Figure 7.4 is that some classification algorithms (k-nearest neighbour, support vector machine, naive Bayes) profit from PCA-compression in terms of prediction time, others are slowed down (random forest, neural network, logistic regression, decision tree), and for AdaBoost there does not seem to be a difference. This means that the decision whether to apply PCA-compression should depend on the chosen classifier. The same figure also shows that k-nearest neighbour scales very badly with the number of components. This can be explained by the added complexity of calculating the distance between points in a higher dimension. Its huge prediction time makes it unsuitable for our problem. In Figure 7.5 one can evaluate the classifiers on both their f1-score and prediction time. The most accurate and fastest classifiers are found in the top left corner. Those are the neural network and decision tree on uncompressed data, and logistic regression on both compressed and uncompressed data. A neural network is the most accurate classifier but is slower to train and run than a decision tree or logistic regression.

Figure 7.3: f1-score of the classification algorithms for varying numbers of PCA-compression components. The dashed line indicates the f1-score of the classifiers on non-compressed data. The SVM classifier is not included due to its lacklustre performance.

Figure 7.4: Prediction time of the classification algorithms for varying numbers of PCA-compression components. The dashed line indicates the prediction time of the classifiers on non-compressed data. Note the logarithmic scale of the time axis.

It is surprising that a decision tree classifier performs better than a random forest classifier. The output of a random forest is basically the mode of the outputs of multiple bagged decision trees. An explanation could be that there is only a single feature, or a few features, on which the decision tree relies for its classification. Since these essential features are ignored in some of the bagged decision trees, the overall accuracy of the random forest is less than that of a single decision tree. The finding that a random forest performs worse than a single decision tree for pixel classification of blobs is bad news for Ilastik, which uses a random forest.

7.3 Results from D: Pixel clustering

The clustering algorithms were run on a subset of the chunks of the test image after the pixels had been classified. Since the decision tree without PCA-compression had shown to be both accurate and fast, it was used for the pixel classification. Figure 7.6 shows how well the clustering algorithms perform in terms of silhouette score and running time. Interestingly, agglomerative clustering and k-means perform similarly: both are fast and achieve a reasonable silhouette score. MeanShift and spectral clustering are definitely too slow for our purpose. Moreover, Table 7.4 shows that MeanShift creates fewer but larger clusters, and its high standard deviation shows that it is highly unreliable in terms of clustering time. Watershed is not suitable because of its low silhouette score, which is likely the result of it creating irregularly sized clusters. Figure 7.7 provides additional evidence for this claim by showing the output of the different pixel clustering algorithms on the same connected components. Indeed, in that example there is more variation in the size of the clusters produced by watershed. Also, subjectively speaking, agglomerative (centroid) has created the best shaped clusters in the figure. Based on these results, agglomerative clustering with inter-centroid distance as pairing metric seems to be the most appropriate clustering algorithm.
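Because scikit-learn's AgglomerativeClustering does not offer centroid linkage, a sketch of the centroid-based declumping can use SciPy's hierarchical clustering instead. The coordinates and the cut distance below are illustrative values, not the tuned ones.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from sklearn.metrics import silhouette_score

    # Toy connected component: (z, y, x) coordinates of two touching blobs.
    rng = np.random.RandomState(0)
    coords = np.vstack([rng.normal((0, 0, 0), 1.0, (200, 3)),
                        rng.normal((0, 5, 5), 1.0, (200, 3))])

    # Agglomerative clustering with inter-centroid distance as pairing metric.
    Z = linkage(coords, method='centroid')
    labels = fcluster(Z, t=4.0, criterion='distance')  # cut distance is illustrative

    if len(np.unique(labels)) > 1:
        print(len(np.unique(labels)), silhouette_score(coords, labels))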

Figure 7.5: Comparison of f1-score and prediction time for the classification algorithms. The number between the square brackets indicates the number of components for PCA-compression. Missing square brackets mean no compression. The support vector machine is not included due to its lacklustre performance, and k-nearest neighbour is not included since its prediction time is much longer than the rest.

Figure 7.6: Comparison of silhouette score and running time of the clustering algorithms. The time is calculated as the mean clustering time for a 500 × 500 × 16 pixels chunk.

(a) Watershed (b) K-means

(c) Agglomerative (centroid) (d) Agglomerative (ward)

(e) MeanShift (f) Spectral

Figure 7.7: The same connected components segmented with watershed and the clustering algorithms.

Algorithm                 Time (s)            # Blobs  Blob size             Silhouette
Watershed                 1.416 (+/- 1.266)   15306    96.603 (+/- 97.092)   0.284 (+/- 0.171)
K-means                   0.131 (+/- 0.085)   15686    94.232 (+/- 78.200)   0.372 (+/- 0.188)
Agglomerative (centroid)  0.066 (+/- 0.083)   16437    89.953 (+/- 57.253)   0.381 (+/- 0.189)
Agglomerative (ward)      0.110 (+/- 0.108)   15616    94.580 (+/- 75.068)   0.375 (+/- 0.190)
MeanShift                 6.378 (+/- 41.416)  9530     154.930 (+/- 91.375)  0.424 (+/- 0.222)
Spectral                  4.249 (+/- 9.835)   13081    87.701 (+/- 86.807)   0.350 (+/- 0.198)

Table 7.4: Statistics of the pixel clustering algorithms. Time is the mean clustering time per 500×500×16 pixels chunk. # Blobs is the total number of found blobs. Blob size is the mean blob size in pixels. Silhouette is the mean silhouette score over the clustered chunks. The numbers between parentheses are standard deviations.

7.4 Results from E: Run on whole image

The optimal blob detector found uses the 10 features in Figure 7.1, no PCA-compression, a decision tree classifier and agglomerative clustering with inter-centroid distance as metric. Running this complete blob detection process on the whole test image took 10,696 seconds in total, which is 2:58 hours. This duration is good considering that capturing the image took 5:09 hours. The share of each step can be found in Figure 7.8. It is unsurprising that the feature extraction step takes the majority of the running time, because 10 filters have to be applied to the image. The feature extraction could be made faster by no longer calculating the heavier filters. In this case the 2d_stex_1.0 and 2d_hogey_0.7 features have been selected, but since they take more time to calculate than the other filters, they could be dropped in order to improve speed. Furthermore, a total of 1,556,913 blobs were found, with an average radius of 2.587 pixels and a density of 317.05 · 10⁹ blobs per mm³. Figure 7.10 shows an example of the steps in the optimised blob detector.
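For reference, the out-of-core chunking that makes whole-image runs feasible can be sketched as below; iter_chunks is a hypothetical helper, not the exact implementation.

    import itertools
    import numpy as np

    def iter_chunks(shape, chunk=(16, 500, 500)):
        """Yield slice tuples that tile a 3D image of the given shape."""
        starts = [range(0, s, c) for s, c in zip(shape, chunk)]
        for z0, y0, x0 in itertools.product(*starts):
            yield (slice(z0, min(z0 + chunk[0], shape[0])),
                   slice(y0, min(y0 + chunk[1], shape[1])),
                   slice(x0, min(x0 + chunk[2], shape[2])))

    # A memory-mapped image is read chunk by chunk, never loaded whole.
    image = np.lib.format.open_memmap('image.npy', mode='w+', dtype=np.uint16,
                                      shape=(16, 1000, 1000))
    assert sum(image[s].size for s in iter_chunks(image.shape)) == image.size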

Figure 7.8: Share of steps in total running time of the blob detection process.

7.5 Results from F: Comparison with state-of-the-art

Figure 7.9 shows how the running time of this thesis' MFB detector compares with the speed of blob detection performed with FIJI, CellProfiler and Ilastik. Even though the MFB detector uses a very similar approach to Ilastik, it is twice as slow. The likely reason is that Ilastik has been better optimised: Ilastik uses the C++ library VIGRA [115] and makes use of multiple cores, whereas the MFB detector is implemented in Python and works on a single core only. Moreover, for a tool that solely makes use of simple image processing algorithms, CellProfiler is very slow. CellProfiler performs approximately the same steps as the FIJI macro, but its performance is far worse.
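The per-chunk timings reported here can be gathered with a simple harness like the following; the detector callable and the random chunks are placeholders.

    import time
    import numpy as np

    def mean_chunk_time(detector, chunks):
        """Mean and standard deviation of wall-clock seconds per chunk."""
        times = []
        for chunk in chunks:
            t0 = time.perf_counter()
            detector(chunk)
            times.append(time.perf_counter() - t0)
        return float(np.mean(times)), float(np.std(times))

    chunks = [np.random.rand(16, 500, 500) for _ in range(3)]
    print(mean_chunk_time(lambda c: (c > 0.99).sum(), chunks))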

Figure 7.9: Mean running time of blob detection on a 500 × 500 × 16 pixels chunk with different tools.

(a) Input image (b) Extracted features (examples)

(c) Classified pixels (d) Clustered pixels (colours are for visual separation only)

(e) Extracted blob centroids

Figure 7.10: Input, output and intermediary images of the blob detection steps on a chunk from the test image.

Chapter 8

Conclusions

In this project the goal is to move away from simple user-guided image processing towards fully automated computer vision for the task of blob detection. With the existing tools for biomedical image analysis, the user has to tune the parameters of each step of the blob detection process. Not only is this tedious, it also has to be repeated for every image set, which makes automatic operation of such pipelines impossible. Besides this, some of the current software was not designed for the scale of modern high-resolution 3D microscopy images, which can reach tens of gigabytes in size. To avoid the manual selection of blob detection parameters, machine learning techniques can be used to learn the definition of blobs from a large amount of image data by itself. Also, the scaling problem can be solved by out-of-core processing methods. Since performance is a major concern in this project, the research question was formulated as: How can machine learning techniques effectively be applied to blob detection in high-resolution 3D microscopy images?

A blob detection pipeline, inspired by Ilastik [36], was designed that consists of 6 sequential steps. For steps 1 to 4, machine learning techniques were tested, while for steps 5 and 6 a heuristic approach is used instead. For the first step of feature extraction, a set of 10 image features was selected that is characterised by high relevancy and low redundancy. These features are used in the next step of pixel classification to decide for each pixel of the input image whether it is part of a blob or not. For this step 8 popular classifiers were first optimised for the


problem, and then evaluated by their f1-score and prediction time. The results show that a decision tree and logistic regression are the most suitable because of their high accuracy combined with a low running time. A neural network can achieve a slightly higher f1-score but is slower to classify with. Experiments on feature compression show that PCA-compression is not worth it for the top classifiers, because the accuracy suffers more than the prediction time gains. The fourth step of pixel clustering aims to split up touching blobs. Conventional approaches use a watershed algorithm, but the novelty of this thesis is to apply clustering algorithms as well. Of the 6 clustering algorithms that were deemed suitable, agglomerative clustering and k-means show the most potential. Not only are they simple and fast, they also create a high-quality clustering as measured by silhouette score.

In the next experiment the optimised blob detection process was applied to a typical image captured for the purpose of in situ RNA sequencing. The aim was to ascertain that the pipeline would work in practice. The running time was just over 3 hours, which satisfies the requirement that it must be less than the time needed to capture the image (5 hours). Furthermore, the experiment shows that feature extraction is by far (75%) the lengthiest step. This suggests that feature selection is likely more significant than the choice of classification or clustering algorithm. Moreover, the results of experiment A show large discrepancies between the relevancy of the features and the time needed to calculate them. This means that by selecting a less computation-heavy set of features, the running time can be greatly decreased.

The final experiment compares the running time of this thesis' blob detector, called the MFB detector, with FIJI, CellProfiler and Ilastik. The running time of the MFB detector was similar to FIJI but slower than Ilastik. The probable reason is that Ilastik is more highly optimised.

All in all, the results show that machine learning can indeed be very effective for blob detection in high-resolution 3D microscopy images. The proposed blob detector, using 10 optimal features, a decision tree pixel classifier and an agglomerative clustering algorithm, approaches the state-of-the-art in terms of speed. Since the MFB detector is trained on labelled data, it is presumed to be more accurate than the other blob detectors that rely on user-set parameters. This suspicion is however hard to verify because there is no ground truth available for the blobs. Table 8.1 provides a comparison of the four blob detectors.

                 MFB detector         FIJI              CellProfiler      Ilastik
Thresholding     Supervised ML        User parameters   User parameters   Supervised ML
                 (labels from data)                                       (labels from user)
Declumping       xy-clustering        Watershed         Watershed         None or watershed
Out-of-core      Yes                  No                No                Yes
3D               Yes                  Yes               Yes, but limited  Yes
Time (s)         16                   14                90                8

Table 8.1: Comparison of blob detectors. The time denotes the mean duration to process a 500 × 500 × 16 pixels chunk.

8.1 Discussion

The implication of the demonstrated good performance is that the MFB detector can eventually be integrated in high-content screening pipelines for the analysis of biomedical images. Because the MFB detector does not rely on user parameters, it saves medical experts time and cognitive effort. Instead of needing to tune the blob detection, they can focus on other stages of the analysis, like diagnosis and interpretation. Furthermore, since the detector was trained on labelled data of blobs, it is supposedly more accurate than approaches that rely on user parameters. As a potential drawback, the labelled data has to be correct, because the performance of the blob detection depends strongly on it.

There always comes a time when one should be critical of one's own work. Starting with what is good about the research: the found blob detector combines a machine-learned thresholding algorithm with the novel use of a clustering algorithm for blob declumping. Along the way an optimal set of features was selected, and it was found that there can be slight differences between features calculated in 2D and in 3D. Moreover, PCA-compression was found to be unhelpful for the problem. An automatic method of creating labelled image data was devised by using the difference between two images with and without blobs.

The research question has been thoroughly investigated. For every step in the blob detection process those features or algorithms have been

tested that are either used in related work or have a similar function. An additional requirement for the algorithms was that they must be easy to implement using available software libraries. By focusing on this low-hanging fruit it is possible that some other good candidates were missed. It is also possible that the tested candidates have not been perfectly optimised. Since a support vector machine scales more than quadratically with the number of samples, it was arduous to train. The solution was therefore to train on a smaller sample, but this resulted in a low f1-score compared to the other classifiers, even though the SVM is generally known to be a good performer. Moreover, only a very simple version of a feed-forward neural network was evaluated. Perhaps other hyper-parameters such as the activation function, learning algorithm and network structure can improve its performance.

For feature selection, the (confirmed) suspicion was that the choice of features is highly important for the overall performance of blob detection. Therefore an approach was consciously chosen that is specifically designed for computer vision. In a structured fashion it selects a subset of features that exhibits both high relevancy and low redundancy. Despite this, there may be better ways to pick the features. For example, wrapper methods can be applied to recursively find an optimal subset of features using a performance evaluation at every iteration [94] (see the sketch at the end of this section). Or embedded methods can be tested that integrate directly with the learning algorithm [94]. Feature selection being a whole field of its own, a great number of methods have been published already, and with open problems such as scalability, stability and model selection [94], more papers are added every year.

The training data came from a single image only, because there was no access to more. This may hurt the generality of the learned blob definition. Ideally, data from multiple images is used to train the pixel classifiers. There was also no ground truth data on the blobs to check the clustering results against. Moreover, the unsupervised silhouette score metric was used to measure clustering quality, but this may be suboptimal.

Speaking of metrics, those chosen may not provide a good assessment of the image features and algorithms. For feature selection many other metrics exist, to name a few: Fisher score, ReliefF, chi-square and F-score [94]. Also for classification and clustering there are alternative metrics.

For classification we have the Receiver Operating Characteristic Area Under Curve (ROC AUC), log loss and the fbeta-score [116]. For measuring clustering quality there is also the Calinski-Harabasz score [117]. With so much choice it is difficult to determine the optimal metric to use.

In the final experiment F, the blob detector from this thesis was compared to other popular bio-image analysis tools. The measured performance of the different tools may be inaccurate because the chosen parameters could be less than optimal; the trial-and-error nature of those tools made it very costly to try every combination of parameters. Therefore the experiment should be regarded as an approximate comparison of the running times of state-of-the-art tools.
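As a concrete example of the wrapper methods mentioned above, recursive feature elimination is readily available in scikit-learn; the data and the estimator below are placeholders, not the feature set of this thesis.

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import RFE
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=5000, n_features=30, n_informative=8,
                               random_state=0)
    # Recursively drop the weakest feature until 10 remain, refitting each round.
    selector = RFE(DecisionTreeClassifier(max_depth=4, random_state=0),
                   n_features_to_select=10, step=1).fit(X, y)
    print(selector.support_)   # boolean mask of the kept features
    print(selector.ranking_)   # 1 = selected; higher = eliminated earlier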

8.2 Future work

As mentioned in the delimitations, the blob detector implementation in this project must be viewed as a leap in the right direction and not as a final software product. It works reasonably well, but numerous improvements to speed, accuracy and usability remain possible.

Image processing, which is now performed by the Python-based scikit-image, could be accelerated by C-based libraries like OpenCV [118] or VIGRA [115]. Within the Python environment, general speed improvements can be achieved by code optimisations such as Numba's [119] just-in-time compilation. For feature extraction, instead of letting the CPU apply convolutional filters to the input image, GPUs, being far more efficient at such tasks, can be used. Parallelisation over multiple cores or multiple CPUs may decrease the running time even more. Since the images are currently processed in chunks anyway, it should not be too hard to distribute them over multiple processors. A higher division of work leads to more overhead, though. So perhaps it would be interesting to compare this to the more efficient method of storing the complete image in a single memory array and processing it on a single system.
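A minimal sketch of Numba's just-in-time compilation applied to a pixel loop is shown below; count_above is an illustrative kernel, not part of the MFB detector.

    import numpy as np
    from numba import njit

    @njit(cache=True)
    def count_above(img, threshold):
        """Count voxels above a threshold with an explicit loop that Numba compiles."""
        flat = img.ravel()
        n = 0
        for i in range(flat.size):
            if flat[i] > threshold:
                n += 1
        return n

    img = np.random.rand(16, 500, 500).astype(np.float32)
    print(count_above(img, 0.99))  # first call compiles; later calls run at native speed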

In this thesis only one machine learning approach to blob detection has been tested, but there are others that may be successful as well. Over-segmentation, upon which subsection 3.2.6 briefly touched, may be used to first split the image into super-pixels and then classify each group of pixels as being a blob or not using a machine-learned classifier. In this project there was no access to data enabling this, but perhaps future research can evaluate the over-segmentation approach as well. For instance, white blood cell over-segmentation has been done with support vector machines in [120]. In addition, a convolutional neural network may be used for segmentation of bio-images, as in [121, 122]. The advantage is that feature extraction is not necessary, because the neural network is able to detect visual patterns of neighbouring pixels using convolutions.

For images where only a small fraction of the total area contains blobs, there may be ways to avoid searching through empty regions. An idea could be to first predict for every region whether it contains blobs, perhaps using machine learning, before running the full blob detector on the region.

When observing the classification results, it was sometimes noted that noisy artefacts in the images, caused by imperfections of the tissue samples, were classified as blobs. Since this lowers the overall precision, one could look into ways of avoiding these false positives.

As an extension of the blob detection, machine learning may be applied to the sequencing of the RNA as well. With enough data of different sequences, a model can be trained to predict the RNA sequences based on the found blobs in multiple input images. In [123] the authors use graph optimisation to find the most probable RNA sequence. How this could instead be achieved with machine learning is left as an open question.

Finally, since the tool was not intended for production as-is, usability has been left out of scope. But for the software tool to be convenient in practice, several additions are required that aid the user. A graphical user interface could be included that shows the available options, provides feedback and displays the end results. Extra checks on user input may be added to improve robustness. Options for different input and output formats could be a good addition as well.

Bibliography

[1] D. Smyth, J. Bowman, and E. Meyerowitz, “Early flower development in Arabidopsis,” Plant Cell, vol. 2, no. 8, pp. 755–767, 1990.
[2] Z. Zaman, G. Fogazzi, G. Garigali, M. Croci, G. Bayer, and T. Kránicz, “Urine sediment analysis: Analytical and diagnostic performance of sediMAX® - A new automated microscopy image-based urine sediment analyser,” Clinica Chimica Acta, vol. 411, no. 3-4, pp. 147–154, 2010. doi: 10.1016/j.cca.2009.10.018.
[3] H. Krivinkova, J. Pontén, and T. Blöndal, “The diagnosis of cancer from body fluids: A comparison of cytology, DNA measurement, tissue culture, scanning and transmission microscopy,” Acta Pathologica Microbiologica Scandinavica Section A Pathology, vol. 84 A, no. 6, pp. 455–467, 1976. doi: 10.1111/j.1699-0463.1976.tb00143.x.
[4] E. Volpi and J. Bridger, “FISH glossary: An overview of the fluorescence in situ hybridization technique,” BioTechniques, vol. 45, no. 4, pp. 385–409, 2008. doi: 10.2144/000112811.
[5] J. Lee, E. Daugharthy, J. Scheiman, et al., “Fluorescent in situ sequencing (FISSEQ) of RNA for gene expression profiling in intact cells and tissues,” Nature Protocols, vol. 10, no. 3, pp. 442–458, 2015. doi: 10.1038/nprot.2014.191.
[6] T. W. Nattkemper, T. Twellmann, H. Ritter, and W. Schubert, “Human vs machine: Evaluation of fluorescence micrographs,” Computers in Biology and Medicine, vol. 33, no. 1, pp. 31–43, Jan. 2003, issn: 0010-4825.


[7] P. Rämö, R. Sacher, B. Snijder, B. Begemann, and L. Pelkmans, “CellClassifier: Supervised learning of cellular phenotypes,” Bioinformatics, vol. 25, no. 22, pp. 3028–3030, 2009. doi: 10.1093/bioinformatics/btp524.
[8] M. Held, M. Schmitz, B. Fischer, et al., “CellCognition: Time-resolved phenotype annotation in high-throughput live cell imaging,” Nature Methods, vol. 7, no. 9, pp. 747–754, 2010. doi: 10.1038/nmeth.1486.
[9] D. Laksameethanasan, R. Tan, G.-L. Toh, and L.-H. Loo, “CellXpress: A fast and user-friendly software platform for profiling cellular phenotypes,” BMC Bioinformatics, vol. 14, no. SUPPL16, 2013. doi: 10.1186/1471-2105-14-S16-S4.
[10] A. Carpenter, T. Jones, M. Lamprecht, et al., “CellProfiler: Image analysis software for identifying and quantifying cell phenotypes,” Genome Biology, vol. 7, no. 10, 2006. doi: 10.1186/gb-2006-7-10-r100.
[11] F. Zanella, J. B. Lorens, and W. Link, “High content screening: Seeing is believing,” Trends in Biotechnology, vol. 28, no. 5, pp. 237–245, May 2010, issn: 1879-3096. doi: 10.1016/j.tibtech.2010.02.005.
[12] K. Carlsson, R. Lenz, and N. Åslund, “Three-dimensional microscopy using a confocal laser scanning microscope,” Optics Letters, vol. 10, no. 2, pp. 53–55, 1985. doi: 10.1364/OL.10.000053.
[13] J. Schindelin, I. Arganda-Carreras, E. Frise, et al., “Fiji: An open-source platform for biological-image analysis,” Nature Methods, vol. 9, no. 7, pp. 676–682, 2012. doi: 10.1038/nmeth.2019.
[14] Health. [Online]. Available: http://www.un.org/sustainabledevelopment/health/ (visited on 03/28/2018).
[15] A. Håkansson, “Portal of Research Methods and Methodologies for Research Projects and Degree Projects,” in DIVA, CSREA Press U.S.A, 2013, pp. 67–73. [Online]. Available: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-136960 (visited on 01/24/2018).
[16] J. Caicedo, S. Cooper, F. Heigwer, et al., “Data-analysis strategies for image-based cell profiling,” Nature Methods, vol. 14, no. 9, pp. 849–863, 2017. doi: 10.1038/nmeth.4397.

[17] B. Edris, J. A. Fletcher, R. B. West, M. van de Rijn, and A. H. Beck, Comparative Gene Expression Profiling of Benign and Malignant Lesions Reveals Candidate Therapeutic Compounds for Leiomyosarcoma, Research article, 2012. doi: 10.1155/2012/805614. [Online]. Available: https://www.hindawi.com/journals/sarcoma/2012/805614/ (visited on 02/20/2018).
[18] E. Solomon, L. Berg, and D. W. Martin, Biology, 8th edition. Belmont, CA: Brooks Cole, Jan. 2007, isbn: 978-0-495-31714-2.
[19] K. Hofmann, “Enzyme Bioinformatics,” in Enzyme Catalysis in Organic Synthesis, K. Drauz and H. Waldmann, Eds., Wiley-VCH Verlag GmbH, 2002, pp. 139–162, isbn: 978-3-527-61826-2. doi: 10.1002/9783527618262.ch5. [Online]. Available: http://onlinelibrary.wiley.com.focus.lib.kth.se/doi/10.1002/9783527618262.ch5/summary (visited on 02/20/2018).
[20] R. Ke, M. Mignardi, A. Pacureanu, et al., “In situ sequencing for RNA analysis in preserved tissue and cells,” Nature Methods, vol. 10, no. 9, pp. 857–860, 2013. doi: 10.1038/nmeth.2563.
[21] D. J. S. Birch, Y. Chen, and O. J. Rolinski, “Fluorescence,” in Photonics, D. L. Andrews, Ed., John Wiley & Sons, Inc., 2015, pp. 1–58, isbn: 978-1-119-01180-4. doi: 10.1002/9781119011804.ch1. [Online]. Available: http://onlinelibrary.wiley.com.focus.lib.kth.se/doi/10.1002/9781119011804.ch1/summary (visited on 02/20/2018).
[22] N. Battich, T. Stoeger, and L. Pelkmans, “Image-based transcriptomics in thousands of single human cells at single-molecule resolution,” Nature Methods, vol. 10, no. 11, p. 1127, Oct. 2013, issn: 1548-7105. doi: 10.1038/nmeth.2657. [Online]. Available: https://www.nature.com.focus.lib.kth.se/articles/nmeth.2657 (visited on 01/25/2018).
[23] Y. Li, S. Wang, Q. Tian, and X. Ding, “A survey of recent advances in visual feature detection,” Neurocomputing, vol. 149, pp. 736–751, Feb. 2015, issn: 0925-2312. doi: 10.1016/j.neucom.2014.08.003. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0925231214010121 (visited on 01/22/2018).

[24] T. Lindeberg, “Detecting salient blob-like image structures and their scales with a scale-space primal sketch: A method for focus-of-attention,” International Journal of Computer Vision, vol. 11, no. 3, pp. 283–318, Dec. 1993, issn: 0920-5691, 1573-1405. doi: 10.1007/BF01469346. [Online]. Available: https://link-springer-com.focus.lib.kth.se/article/10.1007/BF01469346 (visited on 01/30/2018).
[25] Blob Detection Using OpenCV (Python, C++) | Learn OpenCV. [Online]. Available: https://www.learnopencv.com/blob-detection-using--python-c/ (visited on 02/21/2018).
[26] S. Lazebnik, Blob detection, Feb. 2011. [Online]. Available: http://www.cs.unc.edu/~lazebnik/spring11/lec08_blob.pdf (visited on 02/21/2018).
[27] T. Lindeberg and J.-O. Eklundh, “Scale detection and region extraction from a scale-space primal sketch,” 1990, pp. 416–426.
[28] A. Kaspers, “Blob detection,” Image Science Institute, UMC Utrecht, Tech. Rep., 2011. (visited on 02/21/2018).
[29] M. Sonka, V. Hlavac, and R. Boyle, Image Processing, Analysis, and Machine Vision. Thomson-Engineering, 2007, isbn: 978-0-495-08252-1.
[30] Template Matching — skimage v0.14dev docs. [Online]. Available: http://scikit-image.org/docs/dev/auto_examples/features_detection/plot_template.html (visited on 02/21/2018).
[31] M. Sezgin and B. Sankur, “Survey over image thresholding techniques and quantitative performance evaluation,” J. Electronic Imaging, vol. 13, pp. 146–168, Jan. 2004.
[32] S. van der Walt, J. L. Schönberger, J. Nunez-Iglesias, et al., “Scikit-image: Image processing in Python,” PeerJ, vol. 2, e453, Jun. 2014, issn: 2167-8359. doi: 10.7717/peerj.453. [Online]. Available: https://peerj.com/articles/453 (visited on 02/21/2018).
[33] T. Lindeberg, “Feature Detection with Automatic Scale Selection,” International Journal of Computer Vision, vol. 30, no. 2, pp. 79–116, 1998.

[34] T. Lindeberg, “Scale Selection Properties of Generalized Scale-Space Interest Point Detectors,” Journal of Mathematical Imaging and Vision, vol. 46, no. 2, pp. 177–210, Jun. 2013, issn: 0924-9907, 1573-7683. doi: 10.1007/s10851-012-0378-3. [Online]. Available: https://link-springer-com.focus.lib.kth.se/article/10.1007/s10851-012-0378-3 (visited on 03/22/2018).
[35] J. D. Hunter, “Matplotlib: A 2D Graphics Environment,” Computing in Science & Engineering, vol. 9, no. 3, pp. 90–95, 2007, issn: 1521-9615. doi: 10.1109/MCSE.2007.55. [Online]. Available: http://ieeexplore.ieee.org/document/4160265/ (visited on 02/21/2018).
[36] C. Sommer, C. Straehle, U. Kothe, and F. Hamprecht, “Ilastik: Interactive learning and segmentation toolkit,” 2011, pp. 230–233. doi: 10.1109/ISBI.2011.5872394.
[37] P. F. Felzenszwalb and D. P. Huttenlocher, “Efficient Graph-Based Image Segmentation,” International Journal of Computer Vision, vol. 59, no. 2, pp. 167–181, Sep. 2004, issn: 0920-5691, 1573-1405. doi: 10.1023/B:VISI.0000022288.19776.77. [Online]. Available: https://link-springer-com.focus.lib.kth.se/article/10.1023/B:VISI.0000022288.19776.77 (visited on 01/30/2018).
[38] A. Vedaldi and S. Soatto, “Quick Shift and Kernel Methods for Mode Seeking,” in Computer Vision – ECCV 2008, ser. Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, Oct. 2008, pp. 705–718, isbn: 978-3-540-88692-1, 978-3-540-88693-8. doi: 10.1007/978-3-540-88693-8_52. [Online]. Available: https://link-springer-com.focus.lib.kth.se/chapter/10.1007/978-3-540-88693-8_52 (visited on 01/30/2018).
[39] D. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004. doi: 10.1023/B:VISI.0000029664.99615.94.
[40] W. S. Qureshi, A. Payne, K. B. Walsh, R. Linker, O. Cohen, and M. N. Dailey, “Machine vision for counting fruit on tree canopies,” Precision Agriculture, vol. 18, no. 2, pp. 224–244, Apr. 2017, issn: 1385-2256, 1573-1618. doi: 10.1007/s11119-016-9458-5. [Online]. Available: https://link.springer.com/article/10.1007/s11119-016-9458-5 (visited on 01/22/2018).

[41] Comparison of segmentation and superpixel algorithms — skimage v0.14dev docs. [Online]. Available: http://scikit-image.org/docs/dev/auto_examples/segmentation/plot_segmentations.html (visited on 02/21/2018).
[42] A. L. Samuel, “Some Studies in Machine Learning Using the Game of Checkers,” IBM Journal of Research and Development, vol. 3, no. 3, pp. 210–229, Jul. 1959, issn: 0018-8646. doi: 10.1147/rd.33.0210.
[43] S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Prentice Hall, 2010, isbn: 978-0-13-604259-4.
[44] P. Berkhin, “A survey of clustering data mining techniques,” in Grouping Multidimensional Data: Recent Advances in Clustering, 2006, pp. 25–71. doi: 10.1007/3-540-28349-8_2.
[45] S. T. Roweis and L. K. Saul, “Nonlinear Dimensionality Reduction by Locally Linear Embedding,” Science, vol. 290, no. 5500, pp. 2323–2326, Dec. 2000, issn: 0036-8075, 1095-9203. doi: 10.1126/science.290.5500.2323. [Online]. Available: http://science.sciencemag.org/content/290/5500/2323 (visited on 02/23/2018).
[46] T. Menzies, “Data Mining,” in Recommendation Systems in Software Engineering, Springer, Berlin, Heidelberg, 2014, pp. 39–75, isbn: 978-3-642-45134-8, 978-3-642-45135-5. doi: 10.1007/978-3-642-45135-5_3. [Online]. Available: https://link-springer-com.focus.lib.kth.se/chapter/10.1007/978-3-642-45135-5_3 (visited on 02/26/2018).
[47] P. Domingos and M. Pazzani, “On the Optimality of the Simple Bayesian Classifier under Zero-One Loss,” Machine Learning, vol. 29, no. 2-3, pp. 103–130, Nov. 1997, issn: 0885-6125, 1573-0565. doi: 10.1023/A:1007413511361. [Online]. Available: https://link-springer-com.focus.lib.kth.se/article/10.1023/A:1007413511361 (visited on 02/26/2018).
[48] X. Wu, V. Kumar, J. R. Quinlan, et al., “Top 10 algorithms in data mining,” Knowledge and Information Systems, vol. 14, no. 1, pp. 1–37, Jan. 2008, issn: 0219-1377, 0219-3116. doi: 10.1007/s10115-007-0114-2. [Online]. Available: https://link-springer-com.focus.lib.kth.se/article/10.1007/s10115-007-0114-2 (visited on 02/28/2018).

[49] M. M. Deza and E. Deza, Encyclopedia of Distances, 4th ed. Berlin Heidelberg: Springer-Verlag, 2016, isbn: 978-3-662-52843-3. [Online]. Available: //www.springer.com/la/book/9783662528433 (visited on 03/22/2018).
[50] P. Hart, “The condensed nearest neighbor rule (Corresp.),” IEEE Transactions on Information Theory, vol. 14, no. 3, pp. 515–516, May 1968, issn: 0018-9448. doi: 10.1109/TIT.1968.1054155.
[51] D. L. Wilson, “Asymptotic Properties of Nearest Neighbor Rules Using Edited Data,” IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-2, no. 3, pp. 408–421, Jul. 1972, issn: 0018-9472. doi: 10.1109/TSMC.1972.4309137.
[52] V. Podgorelec and M. Zorman, “Decision trees,” in Computational Complexity: Theory, Techniques, and Applications, vol. 9781461418009, 2012, pp. 827–845. doi: 10.1007/978-1-4614-1800-9_53.
[53] T. K. Ho, “A Data Complexity Analysis of Comparative Advantages of Decision Forest Constructors,” Pattern Analysis & Applications, vol. 5, no. 2, pp. 102–112, Jun. 2002, issn: 1433-7541. doi: 10.1007/s100440200009. [Online]. Available: https://link-springer-com.focus.lib.kth.se/article/10.1007/s100440200009 (visited on 02/28/2018).
[54] Y. Freund and R. E. Schapire, “A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting,” Journal of Computer and System Sciences, vol. 55, no. 1, pp. 119–139, Aug. 1997, issn: 0022-0000. doi: 10.1006/jcss.1997.1504. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S002200009791504X (visited on 02/28/2018).
[55] D. Simovici, “Intelligent Data Analysis Techniques—Machine Learning and Data Mining,” in Artificial Intelligent Approaches in Petroleum Geosciences, Springer, Cham, 2015, pp. 1–51, isbn: 978-3-319-16530-1, 978-3-319-16531-8. doi: 10.1007/978-3-319-16531-8_1. [Online]. Available: https://link-springer-com.focus.lib.kth.se/chapter/10.1007/978-3-319-16531-8_1 (visited on 02/27/2018).
[56] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.

[57] M. D. Zeiler, “ADADELTA: An Adaptive Learning Rate Method,” arXiv:1212.5701 [cs], Dec. 2012. [Online]. Available: http://arxiv.org/abs/1212.5701 (visited on 04/11/2018).
[58] D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” arXiv:1412.6980 [cs], Dec. 2014. [Online]. Available: http://arxiv.org/abs/1412.6980 (visited on 04/11/2018).
[59] D. Mysid, A simplified view of an artificial neural network. Nov. 2006. [Online]. Available: https://commons.wikimedia.org/w/index.php?curid=1412126 (visited on 04/11/2018).
[60] S. Firdaus and M. A. Uddin, “A Survey on Clustering Algorithms and Complexity Analysis,” International Journal of Computer Science, vol. 12, no. 2, pp. 62–85, Mar. 2015, issn: 1694-0814. [Online]. Available: https://www.ijcsi.org/papers/IJCSI-12-2-62-85.pdf (visited on 03/01/2018).
[61] D. Comaniciu and P. Meer, “Mean shift: A robust approach toward feature space analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 603–619, May 2002, issn: 0162-8828. doi: 10.1109/34.1000236.
[62] Sklearn.cluster.MeanShift — scikit-learn 0.19.1 documentation. [Online]. Available: http://scikit-learn.org/stable/modules/generated/sklearn.cluster.MeanShift.html#sklearn.cluster.MeanShift (visited on 03/05/2018).
[63] A. Y. Ng, M. I. Jordan, and Y. Weiss, “On Spectral Clustering: Analysis and an algorithm,” in Advances in Neural Information Processing Systems, MIT Press, 2001, pp. 849–856.
[64] U. V. Luxburg, A Tutorial on Spectral Clustering. 2007.
[65] B. J. Frey and D. Dueck, “Clustering by Passing Messages Between Data Points,” Science, vol. 315, no. 5814, pp. 972–976, Feb. 2007, issn: 0036-8075, 1095-9203. doi: 10.1126/science.1136800. [Online]. Available: http://science.sciencemag.org/content/315/5814/972 (visited on 03/02/2018).

[66] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A Density-based Algorithm for Discovering Clusters in Large Spatial Databases with Noise,” in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, ser. KDD’96, Portland, Oregon: AAAI Press, 1996, pp. 226–231. [Online]. Available: http://dl.acm.org/citation.cfm?id=3001460.3001507 (visited on 03/02/2018).
[67] 2.3. Clustering — scikit-learn 0.19.1 documentation. [Online]. Available: http://scikit-learn.org/stable/modules/clustering.html#birch (visited on 03/02/2018).
[68] P. Rousseeuw, “Silhouettes: A graphical aid to the interpretation and validation of cluster analysis,” Journal of Computational and Applied Mathematics, vol. 20, no. C, pp. 53–65, 1987. doi: 10.1016/0377-0427(87)90125-7.
[69] Sklearn.metrics.silhouette_score — scikit-learn 0.19.1 documentation. [Online]. Available: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.silhouette_score.html#sklearn.metrics.silhouette_score (visited on 03/05/2018).
[70] A. Hervé and L. Williams, “Principal component analysis,” Wiley Interdisciplinary Reviews: Computational Statistics, vol. 2, no. 4, pp. 433–459, Jul. 2010, issn: 1939-5108. doi: 10.1002/wics.101. [Online]. Available: http://onlinelibrary.wiley.com/doi/abs/10.1002/wics.101 (visited on 04/03/2018).
[71] S. Ng, “Principal component analysis to reduce dimension on digital image,” vol. 111, 2017, pp. 113–119. doi: 10.1016/j.procs.2017.06.017.
[72] K. T. M. Han and B. Uyyanonvara, “A Survey of Blob Detection Algorithms for Biomedical Images,” in 2016 7th International Conference of Information and Communication Technology for Embedded Systems (IC-ICTES), Mar. 2016, pp. 57–60. doi: 10.1109/ICTEmSys.2016.7467122.
[73] K. Yamamoto, Y. Yoshioka, and S. Ninomiya, “Detection and counting of intact tomato fruits on tree using image analysis and machine learning methods,” 2013, pp. 664–667.

[74] N. J. B. McFarlane and C. P. Schofield, “Segmentation and tracking of piglets in images,” Machine Vision and Applications, vol. 8, no. 3, pp. 187–193, May 1995, issn: 0932-8092, 1432-1769. doi: 10.1007/BF01215814. [Online]. Available: https://link.springer.com/article/10.1007/BF01215814 (visited on 01/23/2018).
[75] J. Zavadil, J. Tuma, and V. Santos, “Traffic signs detection using blob analysis and pattern recognition,” 2012, pp. 776–779. doi: 10.1109/CarpathianCC.2012.6228752.
[76] L. Minor and J. Sklansky, “The Detection and Segmentation of Blobs in Infrared Images,” IEEE Transactions on Systems, Man and Cybernetics, vol. 11, no. 3, pp. 194–201, 1981. doi: 10.1109/TSMC.1981.4308652.
[77] W. Moon, Y.-W. Shen, M. Bae, C.-S. Huang, J.-H. Chen, and R.-F. Chang, “Computer-aided tumor detection based on multi-scale blob detection algorithm in automated breast ultrasound images,” IEEE Transactions on Medical Imaging, vol. 32, no. 7, pp. 1191–1200, 2013. doi: 10.1109/TMI.2012.2230403.
[78] C. A. Schneider, W. S. Rasband, and K. W. Eliceiri, NIH Image to ImageJ: 25 years of image analysis, Comments and Opinion, Jun. 2012. doi: 10.1038/nmeth.2089. [Online]. Available: http://www.nature.com/articles/nmeth.2089 (visited on 05/15/2018).
[79] J. Lee, E. Daugharthy, J. Scheiman, et al., “Highly multiplexed subcellular RNA sequencing in situ,” Science, vol. 343, no. 6177, pp. 1360–1363, 2014. doi: 10.1126/science.1250212.
[80] T. Stoeger, N. Battich, M. Herrmann, Y. Yakimovich, and L. Pelkmans, “Computer vision for image-based transcriptomics,” Methods, vol. 85, pp. 44–53, 2015. doi: 10.1016/j.ymeth.2015.05.016.
[81] O. Z. Kraus and B. J. Frey, “Computer vision for high content screening,” Critical Reviews in Biochemistry and Molecular Biology, vol. 51, no. 2, pp. 102–109, Mar. 2016, issn: 1040-9238. doi: 10.3109/10409238.2015.1135868. [Online]. Available: https://doi.org/10.3109/10409238.2015.1135868 (visited on 02/01/2018).

[82] A. Shariff, J. Kangas, L. P. Coelho, S. Quinn, and R. F. Murphy, “Automated image analysis for high-content screening and analysis,” Journal of Biomolecular Screening, vol. 15, no. 7, pp. 726–734, Aug. 2010, issn: 1552-454X. doi: 10.1177/1087057110370894.
[83] C. Sommer and D. Gerlich, “Machine learning in cell biology – teaching computers to recognize phenotypes,” Journal of Cell Science, vol. 126, no. 24, pp. 5529–5539, 2013. doi: 10.1242/jcs.123604.
[84] B. T. Grys, D. S. Lo, N. Sahin, et al., “Machine learning and computer vision approaches for phenotypic profiling,” J Cell Biol, vol. 216, no. 1, pp. 65–71, Jan. 2017, issn: 0021-9525, 1540-8140. doi: 10.1083/jcb.201610026. [Online]. Available: http://jcb.rupress.org/content/216/1/65 (visited on 01/23/2018).
[85] M. Wang, X. Zhou, F. Li, J. Huckins, R. King, and S. Wong, “Novel cell segmentation and online SVM for cell cycle phase identification in automated microscopy,” Bioinformatics, vol. 24, no. 1, pp. 94–101, 2008. doi: 10.1093/bioinformatics/btm530.
[86] K. Vermeer, J. van der Schoot, H. Lemij, and J. de Boer, “Automated segmentation by pixel classification of retinal layers in ophthalmic OCT images,” Biomedical Optics Express, vol. 2, no. 6, pp. 1743–1756, 2011.
[87] H. Irshad, A. Veillard, L. Roux, and D. Racoceanu, “Methods for nuclei detection, segmentation, and classification in digital histopathology: A review – current status and future potential,” IEEE Reviews in Biomedical Engineering, vol. 7, pp. 97–114, 2014. doi: 10.1109/RBME.2013.2295804.
[88] K. Sirinukunwattana, S. E. A. Raza, Y.-W. Tsang, D. R. J. Snead, I. A. Cree, and N. M. Rajpoot, “Locality Sensitive Deep Learning for Detection and Classification of Nuclei in Routine Colon Cancer Histology Images,” IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1196–1206, May 2016, issn: 0278-0062, 1558-254X. doi: 10.1109/TMI.2016.2525803. [Online]. Available: http://ieeexplore.ieee.org/document/7399414/ (visited on 05/03/2018).
[89] S. Niu and K. Ren, “Neural cell image segmentation method based on support vector machine,” vol. 9675, 2015. doi: 10.1117/12.2205114.

[90] N. Hatipoglu and G. Bilgin, “Cell segmentation in histopathological images with deep learning algorithms by utilizing spatial relationships,” Medical and Biological Engineering and Computing, vol. 55, no. 10, pp. 1829–1848, 2017. doi: 10.1007/s11517-017-1630-1.
[91] F. Piccinini, T. Balassa, A. Szkalisity, et al., “Advanced Cell Classifier: User-Friendly Machine-Learning-Based Software for Discovering Phenotypes in High-Content Imaging Data,” Cell Systems, vol. 4, no. 6, pp. 651–655.e5, Jun. 2017, issn: 2405-4712. doi: 10.1016/j.cels.2017.05.012.
[92] J. Bins and B. A. Draper, “Feature selection from huge feature sets,” in Proceedings Eighth IEEE International Conference on Computer Vision (ICCV 2001), vol. 2, 2001, pp. 159–165. doi: 10.1109/ICCV.2001.937619.
[93] K. Kira and L. A. Rendell, “The Feature Selection Problem: Traditional Methods and a New Algorithm,” in Proceedings of the Tenth National Conference on Artificial Intelligence, ser. AAAI’92, San Jose, California: AAAI Press, 1992, pp. 129–134, isbn: 978-0-262-51063-9. [Online]. Available: http://dl.acm.org/citation.cfm?id=1867135.1867155 (visited on 03/27/2018).
[94] J. Li, K. Cheng, S. Wang, et al., “Feature Selection: A Data Perspective,” arXiv:1601.07996 [cs], Jan. 2016. [Online]. Available: http://arxiv.org/abs/1601.07996 (visited on 03/27/2018).
[95] K. Yeager, LibGuides: SPSS Tutorials: Chi-Square Test of Independence. [Online]. Available: https://libguides.library.kent.edu/SPSS/ChiSquare (visited on 04/09/2018).
[96] K. Yeager, LibGuides: SPSS Tutorials: Pearson Correlation. [Online]. Available: https://libguides.library.kent.edu/SPSS/PearsonCorr (visited on 04/09/2018).
[97] R. Caruana and D. Freitag, “Greedy Attribute Selection,” in Proceedings of the Eleventh International Conference on Machine Learning, Morgan Kaufmann, 1994, pp. 28–36.
[98] F. Chollet et al., Keras. 2015. [Online]. Available: https://keras.io.

[99] M. Abadi, A. Agarwal, P. Barham, et al., TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. 2015. [Online]. Available: https://www.tensorflow.org/.
[100] F. Pedregosa, G. Varoquaux, A. Gramfort, et al., “Scikit-learn: Machine Learning in Python,” Journal of Machine Learning Research, vol. 12, no. Oct, pp. 2825–2830, 2011, issn: 1533-7928. [Online]. Available: http://jmlr.org/papers/v12/pedregosa11a.html (visited on 03/27/2018).
[101] T. Schaul, I. Antonoglou, and D. Silver, “Unit Tests for Stochastic Optimization,” arXiv:1312.6055 [cs], Dec. 2013. [Online]. Available: http://arxiv.org/abs/1312.6055 (visited on 04/11/2018).
[102] I. G. Goldberg, C. Allan, J.-M. Burel, et al., “The Open Microscopy Environment (OME) Data Model and XML file: Open tools for informatics and quantitative analysis in biological imaging,” Genome Biology, vol. 6, no. 5, R47, 2005, issn: 1474-760X. doi: 10.1186/gb-2005-6-5-r47.
[103] Sklearn.cluster.KMeans — scikit-learn 0.19.1 documentation. [Online]. Available: http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html (visited on 03/13/2018).
[104] L. Kamentsky, Python-javabridge: Python wrapper for the Java Native Interface, May 2018. [Online]. Available: https://github.com/LeeKamentsky/python-javabridge (visited on 05/15/2018).
[105] NumPy — NumPy. [Online]. Available: http://www.numpy.org/ (visited on 03/27/2018).
[106] A. Hagberg, P. Swart, and D. S. Chult, “Exploring Network Structure, Dynamics, and Function Using NetworkX,” in Proceedings of the 7th Python in Science Conference, Jan. 2008.
[107] W. McKinney, “Data Structures for Statistical Computing in Python,” in Proceedings of the 9th Python in Science Conference, S. v. d. Walt and J. Millman, Eds., 2010, pp. 51–56.
[108] Python-bioformats: Read and write life sciences file formats, Apr. 2018. [Online]. Available: https://github.com/CellProfiler/python-bioformats (visited on 05/15/2018).

[109] Riverbank | Software | PyQt | What is PyQt? [Online]. Available: https://www.riverbankcomputing.com/software/pyqt/intro (visited on 05/15/2018).
[110] S. v. d. Walt, J. L. Schönberger, J. Nunez-Iglesias, et al., “Scikit-image: Image processing in Python,” PeerJ, vol. 2, e453, Jun. 2014, issn: 2167-8359. doi: 10.7717/peerj.453. [Online]. Available: https://peerj.com/articles/453 (visited on 03/27/2018).
[111] E. Jones, T. Oliphant, P. Peterson, et al., SciPy: Open source scientific tools for Python. 2001. [Online]. Available: http://www.scipy.org/.
[112] S. Silvester, Tifffile: Read and write image data from and to TIFF files. [Online]. Available: https://github.com/blink1073/tifffile (visited on 03/27/2018).
[113] Pyyaml: Canonical source repository for PyYAML, May 2018. [Online]. Available: https://github.com/yaml/pyyaml (visited on 05/15/2018).
[114] L. van der Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, pp. 2579–2625, 2008.
[115] U. Köthe, Generische Programmierung für die Bildverarbeitung. Hamburg: Books on Demand, Sep. 2000, isbn: 978-3-8311-0239-6.
[116] C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval. New York, NY, USA: Cambridge University Press, 2008, isbn: 978-0-521-86571-5.
[117] M. Kozak, ““A Dendrite Method for Cluster Analysis” by Caliński and Harabasz: A Classical Work that is Far Too Often Incorrectly Cited,” Communications in Statistics - Theory and Methods, vol. 41, no. 12, pp. 2279–2280, Jun. 2012, issn: 0361-0926. doi: 10.1080/03610926.2011.560741. [Online]. Available: https://doi.org/10.1080/03610926.2011.560741 (visited on 05/16/2018).
[118] G. Bradski, “The OpenCV Library,” Dr. Dobb’s Journal of Software Tools, 2000.
[119] Numba — Numba. [Online]. Available: https://numba.pydata.org/ (visited on 05/16/2018).

[120] X. Zheng, Y. Wang, and G. Wang, “White blood cell segmentation using expectation-maximization and automatic support vector machine learning,” Shuju Caiji Yu Chuli/Journal of Data Acquisition and Processing, vol. 28, no. 5, pp. 614–619, 2013.
[121] D. Cireşan, A. Giusti, L. Gambardella, and J. Schmidhuber, “Deep neural networks segment neuronal membranes in electron microscopy images,” vol. 4, 2012, pp. 2843–2851.
[122] P. Moeskops, M. Viergever, A. Mendrik, V. De, M. Benders, and I. Isgum, “Automatic Segmentation of MR Brain Images with a Convolutional Neural Network,” IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1252–1261, 2016. doi: 10.1109/TMI.2016.2548501.
[123] G. Partel, G. Milli, and C. Wählby, “Improving Recall of In Situ Sequencing by Self-Learned Features and a Graphical Model,” arXiv:1802.08894 [cs, q-bio], Feb. 2018. [Online]. Available: http://arxiv.org/abs/1802.08894 (visited on 05/16/2018).

Appendix A

Experiment F software configurations

A.1 Crops

The coordinates of the crops that are processed by the four blob detection programs can be found in Table A.1.

A.2 MFB detector

The proposed blob detector in this thesis follows the blob detection process described in section 6.1. The top 10 features from Table 7.1 are extracted for each input image. The features are not compressed. A decision tree classifier is used to classify the pixels based on these features. Then an agglomerative clustering algorithm relying on inter-centroid distance is used to group the blob pixels into blobs. Finally, the blobs smaller than 4 pixels are filtered out and the blob centroids are calculated.
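For orientation, the per-chunk process described above can be condensed into the hypothetical sketch below (assuming a fitted scikit-learn classifier and a per-pixel feature array; the cut distance of 4.0 is illustrative):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.ndimage import label

    def detect_blobs(features, classifier, min_size=4, cut_distance=4.0):
        """Hypothetical condensation of the MFB per-chunk pipeline."""
        # 1. Classify every pixel from its feature vector; features has shape (z, y, x, n).
        mask = classifier.predict(features.reshape(-1, features.shape[-1]))
        mask = mask.reshape(features.shape[:-1]).astype(bool)
        # 2. Split each connected component with centroid-linkage clustering.
        components, n = label(mask)
        centroids = []
        for i in range(1, n + 1):
            coords = np.argwhere(components == i)
            if len(coords) > 1:
                Z = linkage(coords, method='centroid')
                labels = fcluster(Z, t=cut_distance, criterion='distance')
            else:
                labels = np.ones(1, dtype=int)
            for j in np.unique(labels):
                pts = coords[labels == j]
                if len(pts) >= min_size:               # 3. Filter out blobs smaller than 4 pixels.
                    centroids.append(pts.mean(axis=0))  # 4. Blob centroid.
        return np.asarray(centroids)

    # Toy usage with a dummy decision tree and random features:
    from sklearn.tree import DecisionTreeClassifier
    feats = np.random.RandomState(0).rand(8, 50, 50, 10)
    clf = DecisionTreeClassifier(max_depth=4).fit(feats.reshape(-1, 10),
                                                  (feats[..., 0] > 0.95).ravel())
    print(len(detect_blobs(feats, clf)))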


x      y
10462  6331
11060  5698
12158  4521
15579  4741
17481  13492
20019  13236
21200  7883
3729   2072
6315   8851
8038   12185

Table A.1: Locations in pixels of the random crops of the test image. All the crops are of size 500 × 500 × 16 pixels.

A.3 FIJI

ImageJ macro, executed with FIJI (ImageJ 1.51s) for each input image with Process -> Batch -> Macro...:

name=getTitle;
run("Smooth (3D)", "method=Gaussian sigma=1.000 use");
run("3D Fast Filters","filter=TopHat radius_x_pix=2.0 radius_y_pix=2.0 radius_z_pix=1.0 Nb_cpus=8");
run("Make Binary", "method=MaxEntropy background=Default");
run("3D Fill Holes");
run("3D Maxima Finder", "radiusxy=1.50 radiusz=0.5 noise=100");
run("3D Watershed Split", "binary=3D_TopHat seeds=peaks radius=1");
run("3D object counter...", "threshold=100 slice=8 min.=4 max.=4000000 statistics");
filename = name + "_blobs.csv";
saveAs("Results", "D:\\Single\\my-first-blobs\\analysis\\F\\fiji\\" + filename);

A.4 CellProfiler

CellProfiler pipeline, executed with CellProfiler 3.0.0 for each input image:

CellProfiler Pipeline: http://www.cellprofiler.org
Version:3
DateRevision:300
GitHash:
ModuleCount:13
HasImagePlaneDetails:False

Images:[module_num:1|svn_version:\'Unknown\'|variable_revision_number:2|show_window:False|notes:\x5B\'To begin creating your project, use the Images module to compile a list of files and/or folders that you want to analyze. You can also specify a set of rules to include only the desired files in your selected folders.\'\x5D|batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True|wants_pause:False]
    :
    Filter images?:Images only
    Select the rule criteria:and (extension does isimage) (directory doesnot containregexp "\x5B\\\\\\\\\\\\\\\\/\x5D\\\\\\\\.")

Metadata:[module_num:2|svn_version:\'Unknown\'|variable_revision_number:4|show_window:False|notes:\x5B\'The Metadata module optionally allows you to extract information describing your images (i.e, metadata) which will be stored along with your measurements. This information can be contained in the file name and/or location, or in an external file.\'\x5D|batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True|wants_pause:False]
    Extract metadata?:Yes
    Metadata data type:Text
    Metadata types:{}
    Extraction method count:1
    Metadata extraction method:Extract from file/folder names
    Metadata source:File name
    Regular expression to extract from file name:(?P.*)
    Regular expression to extract from folder name:(?P\x5B0-9\x5D{4}_\x5B0-9\x5D{2}_\x5B0-9\x5D{2})$
    Extract metadata from:All images
    Select the filtering criteria:and (file does contain "")

    Metadata file location:
    Match file and image metadata:\x5B\x5D
    Use case insensitive matching?:No

NamesAndTypes:[module_num:3|svn_version:\'Unknown\'|variable_revision_number:8|show_window:False|notes:\x5B\'The NamesAndTypes module allows you to assign a meaningful name to each image by which other modules will refer to it.\'\x5D|batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True|wants_pause:False]
    Assign a name to:All images
    Select the image type:Grayscale image
    Name to assign these images:input
    Match metadata:\x5B\x5D
    Image set matching method:Order
    Set intensity range from:Image metadata
    Assignments count:1
    Single images count:0
    Maximum intensity:255.0
    Process as 3D?:Yes
    Relative pixel spacing in X:1.0
    Relative pixel spacing in Y:1.0
    Relative pixel spacing in Z:3.7
    Select the rule criteria:and (file does contain "")
    Name to assign these images:DNA
    Name to assign these objects:Cell
    Select the image type:Grayscale image
    Set intensity range from:Image metadata
    Maximum intensity:255.0

Groups:[module_num:4|svn_version:\'Unknown\'|variable_revision_number:2|show_window:False|notes:\x5B\'The Groups module optionally allows you to split your list of images into image subsets (groups) which will be processed independently of each other. Examples of groupings include screening batches, microtiter plates, time-lapse movies, etc.\'\x5D|batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True|wants_pause:False]
    Do you want to group your images?:No
    grouping metadata count:1
    Metadata category:None

GaussianFilter:[module_num:5|svn_version:\'Unknown\'|variable_revision_number:1|show_window:False|notes:\x5B\x5D|batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True|wants_pause:False]

    Select the input image:input
    Name the output image:gaussian_filtered
    Sigma:0.3

EnhanceOrSuppressFeatures:[module_num:6|svn_version:\'Unknown\'|variable_revision_number:6|show_window:False|notes:\x5B\x5D|batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True|wants_pause:False]
    Select the input image:gaussian_filtered
    Name the output image:enhanced
    Select the operation:Enhance
    Feature size:12
    Feature type:Speckles
    Range of hole sizes:1,10
    Smoothing scale:2.0
    Shear angle:0.0
    Decay:0.95
    Enhancement method:Tubeness
    Speed and accuracy:Fast

Threshold:[module_num:7|svn_version:\'Unknown\'|variable_revision_number:10|show_window:False|notes:\x5B\x5D|batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True|wants_pause:False]
    Select the input image:enhanced
    Name the output image:thresholded
    Threshold strategy:Global
    Thresholding method:Manual
    Threshold smoothing scale:0.0
    Threshold correction factor:1.0
    Lower and upper bounds on threshold:0.0,1.0
    Manual threshold:0.10
    Select the measurement to threshold with:None
    Two-class or three-class thresholding?:Two classes
    Assign pixels in the middle intensity class to the foreground or the background?:Foreground
    Size of adaptive window:50
    Lower outlier fraction:0.05
    Upper outlier fraction:0.05
    Averaging method:Mean
    Variance method:Standard deviation
    # of deviations:2.0
    Thresholding method:Otsu

RemoveHoles:[module_num:8|svn_version:\'Unknown\'|variable_revision_number:1|show_window:False|notes:\x5B\x5D|batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True|wants_pause:False]
    Select the input image:thresholded
    Name the output image:removed_holes
    Size:1.0

Watershed:[module_num:9|svn_version:\'Unknown\'|variable_revision_number:1|show_window:False|notes:\x5B\x5D|batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True|wants_pause:False]
    Select the input image:removed_holes
    Name the output object:watershed
    Generate from:Distance
    Markers:None
    Mask:Leave blank
    Connectivity:8
    Downsample:1

MeasureObjectSizeShape:[module_num:10|svn_version:\'Unknown\'|variable_revision_number:1|show_window:False|notes:\x5B\x5D|batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True|wants_pause:False]
    Select objects to measure:watershed
    Calculate the Zernike features?:No

FilterObjects:[module_num:11|svn_version:\'Unknown\'|variable_revision_number:8|show_window:False|notes:\x5B\x5D|batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True|wants_pause:False]
    Select the objects to filter:watershed
    Name the output objects:filtered_objects
    Select the filtering mode:Measurements
    Select the filtering method:Limits
    Select the objects that contain the filtered objects:None
    Select the location of the rules or classifier file:Elsewhere...\x7C
    Rules or classifier file name:rules.txt
    Class number:1
    Measurement count:1
    Additional object count:0
    Assign overlapping child to:Both parents
    Select the measurement to filter by:AreaShape_Area
    Filter using a minimum measurement value?:Yes
    Minimum value:4
    Filter using a maximum measurement value?:Yes
    Maximum value:1000

MeasureObjectSizeShape:[module_num:12|svn_version:\'Unknown\'|variable_revision_number:1|show_window:False|notes:\x5B\x5D|batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True|wants_pause:False]
    Select objects to measure:filtered_objects
    Calculate the Zernike features?:No

ExportToSpreadsheet:[module_num:13|svn_version:\'Unknown\'|variable_revision_number:12|show_window:False|notes:\x5B\x5D|batch_state:array(\x5B\x5D, dtype=uint8)|enabled:True|wants_pause:False]
    Select the column delimiter:Comma (",")
    Add image metadata columns to your object data file?:Yes
    Select the measurements to export:No
    Calculate the per-image mean values for object measurements?:No
    Calculate the per-image median values for object measurements?:No
    Calculate the per-image standard deviation values for object measurements?:No
    Output file location:Elsewhere...\x7CD\x3A\\\\\\\\Single\\\\\\\\my-first-blobs\\\\\\\\analysis\\\\\\\\F\\\\\\\\
    Create a GenePattern GCT file?:No
    Select source of sample row name:Metadata
    Select the image to use as the identifier:None
    Select the metadata to use as the identifier:None
    Export all measurement types?:No
    Press button to select measurements:filtered_objects\x7CAreaShape_Area,filtered_objects\x7CAreaShape_MeanRadius,Image\x7CCount_filtered_objects,Image\x7CExecutionTime_01Images,Image\x7CExecutionTime_04Groups,Image\x7CExecutionTime_02Metadata,Image\x7CExecutionTime_11FilterObjects,Image\x7CExecutionTime_03NamesAndTypes,Image\x7CExecutionTime_07Threshold,Image\x7CExecutionTime_08RemoveHoles,Image\x7CExecutionTime_05GaussianFilter,Image\x7CExecutionTime_09Watershed,Image\x7CExecutionTime_06EnhanceOrSuppressFeatures,Image\x7CExecutionTime_10MeasureObjectSizeShape,Image\x7CFileName_input,Experiment\x7CModification_Timestamp,Experiment\x7CRun_Timestamp
    Representation of Nan/Inf:NaN
    Add a prefix to file names?:No

    Filename prefix:MyExpt_
    Overwrite existing files without warning?:Yes
    Data to export:filtered_objects
    Combine these object measurements with those of the previous object?:No
    File name:blobs.csv
    Use the object name for the file name?:No

Figure A.1: List of input images.

A.5 Ilastik

The parameters of the Ilastik Pixel Classification + Object Classification project, executed with Ilastik 1.3.0, can be found in Figures A.1, A.2, A.3, A.4, A.5 and A.6.

Figure A.2: Selected pixel features, based on the best features found in Table 7.1.

Figure A.3: Labels in the Training step. Label 1 denotes non-blob pixels and Label 2 denotes blob pixels. Around 50 example pixels of each were indicated.

Figure A.4: Parameters of the Thresholding step.

Figure A.5: Parameters of the Object Feature Selection step. Only the size feature was selected.

Figure A.6: Labels in the Object Classification step. The labels are irrelevant because object classification is not part of blob detection. However, two labels were needed in order to export the blobs.
