Machine Learning for Blob Detection in High-Resolution 3D Microscopy Images
DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2018

Machine learning for blob detection in high-resolution 3D microscopy images

MARTIN TER HAAK

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

EIT Digital Data Science
Date: June 6, 2018
Supervisor: Vladimir Vlassov
Examiner: Anne Håkansson
Electrical Engineering and Computer Science (EECS)

Abstract

The aim of blob detection is to find regions in a digital image that differ from their surroundings with respect to properties like intensity or shape. Bio-image analysis is a common application where blobs can denote regions of interest that have been stained with a fluorescent dye. In image-based in situ sequencing for ribonucleic acid (RNA), for example, the blobs are local intensity maxima (i.e. bright spots) corresponding to the locations of specific RNA nucleobases in cells.

Traditional methods of blob detection rely on simple image processing steps that must be guided by the user. The problem is that the user must seek the optimal parameters for each step, which are often specific to that image and cannot be generalised to other images. Moreover, some of the existing tools are not suitable for the scale of the microscopy images, which are often in very high resolution and 3D.

Machine learning (ML) is a collection of techniques that give computers the ability to "learn" from data. To eliminate the dependence on user parameters, the idea is to apply ML to learn the definition of a blob from labelled images. The research question is therefore how ML can be effectively used to perform blob detection.

A blob detector is proposed that first extracts a set of relevant and non-redundant image features, then classifies pixels as blobs and finally uses a clustering algorithm to split up connected blobs.
The detector works out-of-core, meaning it can process images that do not fit in memory, by dividing the images into chunks. Results demonstrate the feasibility of this blob detector and show that it can compete with other popular software for blob detection. Unlike other tools, however, the proposed blob detector does not require parameter tuning, making it easier to use and more reliable.

Keywords
Biomedical Image Analysis; Blob Detection; Machine Learning; 3D; Computer Vision; Image Processing

Abstract (Swedish version, translated into English)

The purpose of blob detection is to find regions in a digital image that differ from their surroundings with respect to properties such as intensity or shape. Biological image analysis is a common application where blobs can denote regions of interest that have been stained with a fluorescent dye. In image-based in situ sequencing of ribonucleic acid (RNA), the blobs are local intensity maxima (i.e. bright spots) corresponding to the locations of specific RNA nucleobases in cells.

Traditional methods for blob detection build on simple image processing steps that must be guided by the user. The problem is that the user must find optimal parameters for each step, which are often specific to that particular image and cannot be generalised to other images. In addition, some of the existing tools are not suited to the size of the microscopy images, which are often in very high resolution and 3D.

Machine learning (ML) is a collection of techniques that give computers the ability to "learn" from data. To eliminate the dependence on user parameters, the idea is to apply ML to learn the definition of a blob from labelled images. The research question is therefore how ML can be used effectively to perform blob detection.

A blob detection algorithm is proposed that first extracts a set of relevant and non-redundant image features, then classifies pixels as blobs and finally uses a clustering algorithm to split up merged blobs. The detection algorithm works out-of-core, which means that it can process images that do not fit in memory by dividing the images into smaller parts. The results show that the detection algorithm is feasible and that it can compete with other popular software for blob detection. In contrast to other tools, however, the proposed detection algorithm does not need its parameters adjusted, which makes it easier to use and more reliable.

Keywords
Biomedical image analysis; Blob detection; Machine learning; 3D; Computer vision; Image processing

Acknowledgements

First, I would like to express my gratitude towards my examiner Assoc. Prof. Anne Håkansson at the KTH Royal Institute of Technology for guiding me from the first project proposal all the way to the final deliverable. She was always open to answering the most troublesome questions or providing critical feedback. Thanks to her meticulous remarks I was able to reshape and tweak my work in order to achieve the high quality it has now.

I would also like to thank my supervisor Jacob Kowalewski at Single Technologies, under whom I performed this research. Not only would he provide me with the required resources at any moment, but he would also not hesitate to free up time for discussion. That I was able to finish the project well within the set time is most likely due to his dependable commitment. Moreover, his ideas and suggestions have strongly contributed to the approach applied in this project.

Furthermore, I would like to thank Single Technologies for providing me with a very interesting thesis subject and a pleasant working space. I want to thank my co-workers for the nice chats and the friendly ambience around the office.

Finally, I would like to thank my university supervisor Assoc. Prof. Vladimir Vlassov, who provided me with some highly needed hints so that I could proceed with my research.
Martin ter Haak
Stockholm, May 2018

Contents

  0.1 Acronyms and abbreviations
1 Introduction
  1.1 Background
  1.2 Problem
  1.3 Purpose
  1.4 Goals
    1.4.1 Benefits, ethics and sustainability
  1.5 Research methodology
  1.6 Delimitations
  1.7 Outline
2 An introduction to in situ RNA sequencing
3 Blob detection
  3.1 Automatic scale selection
  3.2 Algorithms
    3.2.1 Template matching
    3.2.2 Thresholding
    3.2.3 Local extrema
    3.2.4 Differential extrema
    3.2.5 Machine learning
    3.2.6 Super-pixel classification
4 Machine learning
  4.1 Classification
    4.1.1 Naive Bayes
    4.1.2 Logistic regression
    4.1.3 K-Nearest Neighbour
    4.1.4 Decision Tree
    4.1.5 Random Forest
    4.1.6 AdaBoost
    4.1.7 Support Vector Machines
    4.1.8 Neural network
    4.1.9 Validation
  4.2 Clustering
    4.2.1 K-means
    4.2.2 Agglomerative clustering
    4.2.3 MeanShift
    4.2.4 Spectral clustering
    4.2.5 Other clustering algorithms
    4.2.6 Validation
  4.3 Dimensionality reduction
    4.3.1 Principal Component Analysis (PCA)
5 Related work
  5.1 Blob detection
  5.2 Machine learning for biomedical image analysis
6 Methodology
  6.1 Blob detection process
    6.1.1 Feature extraction
    6.1.2 Feature compression
    6.1.3 Pixel classification
    6.1.4 Pixel clustering
    6.1.5 Blob extraction
    6.1.6 Blob filtration
    6.1.7 Chunking
  6.2 Experiments
    6.2.1 A: Feature extraction
    6.2.2 B: Feature compression
    6.2.3 C: Pixel classification
    6.2.4 D: Pixel clustering
    6.2.5 E: Run on whole image
    6.2.6 F: Comparison with state-of-the-art
    6.2.7 Summary
  6.3 Data collection
    6.3.1 Characteristics
    6.3.2 Labelling
  6.4 Experimental design
    6.4.1 Test system
    6.4.2 Software
    6.4.3 Data analysis
    6.4.4 Overall reliability and validity
7 Analysis
  7.1 Results from A: Feature extraction
  7.2 Results from B: Feature compression and C: Pixel classification
  7.3 Results from D: Pixel clustering
  7.4 Results from E: Run on whole image
  7.5 Results from F: Comparison with state-of-the-art
8 Conclusions
  8.1 Discussion
  8.2 Future work
Bibliography
A Experiment F software configurations
  A.1 Crops
  A.2 MFB detector