Automatic Recognition of Historical Buildings in Valletta Using Smartphone Technology
Donna Agius
Supervisor: Dr. George Azzopardi
Faculty of ICT
University of Malta
May 2016

Submitted in partial fulfillment of the requirements for the degree of B.Sc. ICT in Artificial Intelligence (Hons.)

Abstract

Building classification is a widely researched area in computer vision. In this research project, a smartphone application is developed that uses computer vision techniques. The idea is introduced by pointing out several advantages and disadvantages of such a system. The application allows the user to take a picture of a historical building; it then automatically classifies the building and provides information about it.

The technical terms used in this project are described in the following chapter, and previous work from the literature is highlighted. Building recognition is ultimately an object recognition problem, so in the second chapter we look at different approaches to object recognition algorithms applied to various building datasets.

An overview of the system is described next, and the design of the implementation is highlighted. The section describes how the user can upload a photo, receive information about the respective building and give feedback to the system. The same chapter also describes the process of creating three distinct datasets using data from thirteen Maltese buildings. Furthermore, an "Unknown" category is included to make the project more realistic.

The implementation is discussed in the fourth chapter, and the system that recognises a building is described in detail. The full operation is covered, i.e. uploading the image, recognising the building, sending the data back to the user, and the user sending feedback to the system. A flowchart is also provided to show the full process of recognising the building in a query image.
Next, several experiments were carried out, such as using different image processing techniques and applying the algorithm to various datasets. The application was also distributed to users for evaluation. Results show that the proposed system is very effective. The algorithm also achieves an accuracy equal to the state of the art on the Zurich Building benchmark dataset.

Finally, ideas to further improve the application are discussed, such as implementing augmented reality and using deep learning algorithms.

Acknowledgements

I would like to thank my supervisor, Dr. George Azzopardi, for his advice, guidance and constant feedback throughout this project. I would also like to thank my parents, my sisters and Julian for their continuous support and encouragement.

Contents

1 Introduction
  1.1 Thesis Statement
  1.2 Motivation
  1.3 Scope
  1.4 Approach
  1.5 Aims and Objectives
  1.6 Report Layout
2 Background and Literature Review
  2.1 Background
    2.1.1 Feature detection and description
    2.1.2 Feature description: Scale Invariant Feature Transform (SIFT)
    2.1.3 Local Binary Patterns (LBP)
    2.1.4 Bag of visual words
    2.1.5 Machine Learning
      2.1.5.1 K-means
      2.1.5.2 Support Vector Machines
  2.2 Literature Review
    2.2.1 Application Context
3 Specification and Design
  3.1 Client-Server Model
    3.1.1 Dataset Acquisition
    3.1.2 Client Application
4 Implementation
  4.1 Client
  4.2 Server
    4.2.1 System configuration
    4.2.2 Application
5 Evaluation
  5.1 Evaluation Protocol
  5.2 Bag of Words and Vector of Locally Aggregated Descriptors
  5.3 Local Binary Patterns (LBP)
  5.4 Kernel Fusion of SIFT and LBP Features
  5.5 Normalised and unnormalised data
  5.6 Cropped and uncropped datasets
  5.7 Dataset containing images sourced online
  5.8 Confusion Matrix
  5.9 Investigating the "Unknown" Category
  5.10 Zurich Building Dataset (ZuBuD)
  5.11 Mobile Application
  5.12 Discussion
6 Future Work
7 Conclusion
Appendix: Discover Valletta - Manual

List of Figures

1 A photo of a building, captured using a smartphone.
2 Multiple buildings in the same view.
3 An example of flat, edge and corner regions in an image.
4 Example of local feature matching.
5 The figure shows how the basic LBP operator works.
6 Overview of the system data flow.
7 Some buildings from the dataset.
8 Three different categories from the three different datasets.
9 A diagram showing the application, and its different functions.
10 A spatial pyramid using three levels.
11 A flowchart that shows the required steps to label a test image.
12 Various images from the dataset, where images are sourced online.
13 Confusion matrix of the Valletta Buildings dataset.
14 True positive, false positive and false negative samples from the dataset.

List of Tables

1 An overview of the building data sets and their complexity.
2 The datasets acquired.
3 Performance across different image processing techniques.
4 Results when testing normalised and unnormalised data.
5 Accuracy garnered from cropped and uncropped datasets.
6 Accuracy gained from images sourced online, using both techniques.
7 Results obtained when omitting the "Unknown" category from the dataset.
8 Results on ZuBuD Dataset.

1 Introduction

1.1 Thesis Statement

When tourists wander through a city, they may not be familiar with every building. They may come across an interesting building and want to find information about it as they walk along. In this research project, I develop a smartphone application that makes use of computer vision techniques. The application allows the user to take a picture of a historical building (Fig. 1); it then automatically analyses the picture and provides information about the building.
Valletta is chosen as the most suitable candidate for the application for two main reasons. First, the city contains many historical buildings within a rather small area. Second, Valletta is popular amongst tourists, and it will be the European Capital of Culture in 2018.

Figure 1: A photo of a building, captured using a smartphone.

1.2 Motivation

Building recognition and classification is an important task and an ongoing research topic, with applications in video surveillance [1], navigation [2], robot localisation [3] and 3D city reconstruction [4] [5], among others. The application that I developed can be used by people of all ages, particularly tourists, and it helps users develop their cultural knowledge further. Since Valletta is going to be the European Capital of Culture in 2018, the application is also an opportunity to increase the city's popularity in Europe.

1.3 Scope

In the proposed application, the user will be able to identify a single building for each picture submitted, i.e. multiple buildings in a single photo will not be considered. As a result, the user may crop any unnecessary clutter from the image, such as other buildings, trees, cars, etc. Finally, in order to predict the name of the building, the system will use computer vision techniques rather than the Global Positioning System (GPS).

For a human, the identification of objects, such as buildings, is an effortless operation. It is also easy for a human to identify the same objects even if they are seen from different angles or the objects themselves are skewed. For a machine, however, this is less simple, since the machine needs to identify the same building at different times of the day, in different weather conditions and from different angles, and each scenario presents its own challenges. In this project, the data contains photos of buildings in different conditions, such as photos taken in the morning or at night time.
The buildings may also be partially occluded by trees, moving vehicles or other buildings, which may interfere with the identification of the building as well.

1.4 Approach

The proposed project could be implemented as a desktop application, but I opted for a smartphone implementation because of its numerous advantages. With a smartphone, the information about the building is given on the spot and in real time, so the user does not need to do research beforehand. This is particularly beneficial for tourists, who can learn on the fly, and the application enhances their experience while wandering around the city. Furthermore, the proposed system is highly accessible, as it only requires a smartphone and internet access. An internet connection may be expensive for tourists; however, this issue will be minimised with the removal of roaming charges in the EU within the next few months.

The mobile platform offers several advantages over the desktop, which makes it the preferred choice of target platform. The user has access to the information on the building in real time, presented through a familiar interface with minimal distractions. Accessibility also improves, since most mobile devices have application stores from which the software can be downloaded in a few steps. These stores also provide application version control, meaning that updates can be delivered to the user in a more direct and convenient way. More features, information and identifiable buildings could be added to the application through these updates.

In this research project, the building is identified using computer vision techniques. Identifying the building could also be done using the user's position and orientation.