Detection of Mahjong Tiles from Videos Using Computer Vision
Aalto University
School of Science
Master's Programme in Computer, Communication and Information Sciences

Ossi Hirvola

Detection of Mahjong tiles from videos using computer vision

Master's Thesis
Espoo, May 21, 2019

Supervisor: Prof. Juho Kannala
Advisor: D.Sc. (Tech.) Juha Ylioinas

ABSTRACT OF MASTER'S THESIS

Pages: 49
Major: Computer Science
Code: SCI3042

Mahjong is a popular 4-player game originating from China. The result of a single game of Mahjong involves a considerable amount of chance, similarly to that of a few hands of Poker. Hence, long-term data analysis is often required to evaluate one's decisions. There are internet Mahjong platforms that provide replays of past games, but no solution exists that provides digital replays for games played in person. In this thesis, we approached this problem by means of object detection.

Recently, convolutional neural network (CNN) based methods have been popular in object detection. Training these methods requires large amounts of labeled training data. For this purpose, we implemented a synthetic data generator that produces synthetic images containing Mahjong tiles. We used the synthetic images to train a single-shot multibox detector (SSD), our object detector of choice.

The SSD network that was trained solely on synthetic training data performed remarkably well on synthetic validation data, but did not reach the desired accuracy on real video data. However, by introducing a small number of real images as part of the training data, we achieved reasonable accuracy on Mahjong tile detection from real video data.
Fine-tuning the synthetic data generation to better correspond to real data, as well as implementing error correction to further post-process object proposals, are potential future improvements.

Keywords: computer vision, object detection, convolutional neural networks, ssd
Language: English

DIPLOMITYÖN TIIVISTELMÄ (Finnish abstract of the Master's thesis)

Title: Mahjong-tiilien tunnistus videoista konenäöllä

Mahjong is a popular Chinese four-player game. The outcome of a single Mahjong game depends heavily on chance, much like the outcome of a few hands of poker. For this reason, assessing the quality of moves often requires analyzing games over a long period of time. Many digital Mahjong platforms on the internet offer players recordings of their past games, but no such solution exists for games played physically. In this work, we approach this problem by means of object detection.

Recently, methods based on convolutional neural networks have been popular in object detection. Training these methods requires large amounts of pre-labeled training data. For this reason, we developed a program that produces synthetic images containing Mahjong tiles. We use these synthetic images to train the Single-shot multibox detector (SSD) object detection method.

An SSD neural network trained entirely on synthetic data produced very accurate results on synthetic test data, but did not reach the desired accuracy on real videos. However, by adding a few real images to the training data alongside the synthetic images, the method achieved reasonable accuracy in Mahjong tile detection. In the future, accuracy can be improved by fine-tuning the synthetic data generation to better match real images and by improving the error correction methods.

Keywords: computer vision, object detection, convolutional neural networks, ssd
Language: English

Acknowledgements

I would like to thank Prof. Juho Kannala for his guidance and support. I would also like to thank Dr. Juha Ylioinas for his advice.

Espoo, May 21, 2019
Ossi Hirvola

Abbreviations and Acronyms

CNN  Convolutional Neural Network
SSD  Single-Shot Multibox Detector
mAP  Mean Average Precision
FPS  Frames Per Second

Contents

Abbreviations and Acronyms
1 Introduction
  1.1 Motivation
  1.2 Scope of the thesis
  1.3 Contributions
  1.4 Structure of the thesis
2 Background
  2.1 Object detection
    2.1.1 Datasets and tasks
    2.1.2 Traditional object detection
    2.1.3 Deep neural network based object detection
    2.1.4 Synthetic training data
  2.2 Riichi Mahjong
    2.2.1 Tiles
    2.2.2 Setup and game play
    2.2.3 Winning hand
    2.2.4 Calls
    2.2.5 Objective
3 Single-Shot MultiBox Detector
  3.1 Model structure
  3.2 Training
  3.3 Performance on standard benchmarks
4 Detection of Mahjong tiles
  4.1 Synthetic data generation
    4.1.1 Tiles
    4.1.2 Tile and camera positioning
    4.1.3 Background
    4.1.4 Lighting and shadows
    4.1.5 Post processing
    4.1.6 Bounding boxes
  4.2 SSD implementation
5 Experiments
  5.1 Training
  5.2 Synthetic validation performance
  5.3 Real data performance
  5.4 Combining real and synthetic training data
6 Discussion
  6.1 Result analysis
  6.2 Synthetic training data
  6.3 SSD for Mahjong tile detection
  6.4 Future work
7 Conclusions

Chapter 1 Introduction

The game of Mahjong is one of the most popular table games, with an estimated player base of 700 million people [1].
By Mahjong, we refer to the 4-player game originating from China, as shown in Figure 1.1; it should not be confused with the single-player digital tile-matching game. There are many regional variations of Mahjong. In this thesis, when referring to Mahjong, we specifically mean Riichi Mahjong, which originates from Japan.

Figure 1.1: Common view of a game of Mahjong.

1.1 Motivation

Mahjong is a game of chance: the outcome of a single Mahjong game depends considerably on chance, much like in Poker. Therefore, the immediate feedback from a decision does not necessarily determine whether the decision was good or bad. For players striving to improve, analyzing their games afterwards is crucial. Multiple online Mahjong platforms, such as Tenhou [2], provide complete digital game data of previously played online games for analysis. At the time of writing, however, there are no applications that provide digital replays of Mahjong games played in real life.

The recent progress in object detection research gives reason to expect that a satisfactory result on detecting Mahjong tiles can be achieved with the current state-of-the-art object detection methods. In particular, the latest convolutional neural network (CNN) based approaches achieve relatively accurate real-time object detection. The training of neural networks is an important part of these methods, and it often requires large amounts of labeled training data [3]. Therefore, using synthetic images as training data has been an important area of research. Since no suitable Mahjong dataset is publicly available at the time of writing, we explore the possibilities of synthetic training data generation for Mahjong tile detection.

The objective of this thesis is to bring digital replays of real-life Mahjong games one step closer by providing a base solution for accurately detecting Mahjong tiles from video data.
In addition, we examine the possibilities of the synthetic training data approach for a well-defined object detection problem. The video data used in this thesis was captured using inexpensive consumer web cameras.

1.2 Scope of the thesis

There are multiple approaches that could be used to achieve digital replays, but in this thesis we concentrate on the object detection approach, that is, acquiring game information from video data to produce a digital replay of the game. This approach can be roughly split into two phases: detecting Mahjong tiles from video data, and applying Mahjong rules to create a digital replay from the extracted tile information. The scope of this thesis is the first phase of the pipeline, the detection of Mahjong tiles from video data.

1.3 Contributions

• Generation of synthetic training data
• Accurate Mahjong tile detection using SSD
• Analysis of performance

1.4 Structure of the thesis

The remainder of this thesis is divided into six chapters. First, in Chapter 2, we present a background review of object detection and explain the essential Riichi Mahjong rules. Next, in Chapter 3, we thoroughly explain the Single-Shot MultiBox Detector (SSD) object detection method, which is used for Mahjong tile detection in this thesis. In Chapter 4, we give a step-by-step description of our synthetic training data generation process and explain our SSD implementation. The experimental setting and the results are presented in Chapter 5. The results are then discussed further in Chapter 6. Finally, we conclude the thesis in Chapter 7.

Chapter 2 Background

In this chapter we discuss object detection, after which we give a general explanation of the Riichi Mahjong rules that are essential for understanding the following chapters of this thesis.

2.1 Object detection

Object detection is the problem of classifying and localizing objects in images.
That is, in addition to recognizing objects in the image, each object's location in the image is to be estimated. Object locations are usually indicated with rectangular bounding boxes.

2.1.1 Datasets and tasks

Object detection methods can roughly be categorized into two subcategories: generic object detection and salient object detection. Generic object detection focuses on localizing objects by determining the bounding boxes around the objects [4].
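
To make the notion of a detection concrete, the following is a minimal Python sketch of what an object detector outputs for each object: a class label together with a rectangular bounding box. It is an illustration only, not the implementation used in this thesis. The `Detection` class, its field names, the pixel coordinate convention (origin at the top-left corner of the image), the example labels, and the helper `sort_left_to_right` are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Detection:
    """One detected object: a class label plus an axis-aligned bounding box.

    Hypothetical structure. Coordinates are assumed to be in pixels, with
    the origin at the top-left corner of the image.
    """
    label: str    # assumed class name, e.g. an identifier of a tile type
    x_min: float  # left edge of the box
    y_min: float  # top edge of the box
    x_max: float  # right edge of the box
    y_max: float  # bottom edge of the box

    def width(self) -> float:
        return self.x_max - self.x_min

    def height(self) -> float:
        return self.y_max - self.y_min

    def center(self) -> Tuple[float, float]:
        # Midpoint of the box, useful for ordering objects spatially.
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def sort_left_to_right(detections: List[Detection]) -> List[Detection]:
    """Order detections by the horizontal position of their box centers."""
    return sorted(detections, key=lambda d: d.center()[0])


if __name__ == "__main__":
    # Two made-up detections; the labels are placeholders, not real classes.
    found = [
        Detection("tile_b", 50.0, 0.0, 60.0, 10.0),
        Detection("tile_a", 0.0, 0.0, 10.0, 10.0),
    ]
    for det in sort_left_to_right(found):
        print(det.label, det.center(), det.width(), det.height())
```

Storing the box as two corner points is one common convention; a center point with width and height carries the same information, and either form could serve here.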