Master of Science in Computer Science September 2020

Domain Adaptation from 3D synthetic images to real images

Krishna Himaja Manamasa

Faculty of Computing, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden. This thesis is submitted to the Faculty of Computing at Blekinge Institute of Technology in partial fulfilment of the requirements for the degree of Master of Science in Computer Science. The thesis is equivalent to 20 weeks of full-time studies.

The authors declare that they are the sole authors of this thesis and that they have not used any sources other than those listed in the bibliography and identified as references. They further declare that they have not submitted this thesis at any other institution to obtain a degree.

Contact Information: Author: Krishna Himaja Manamasa E-mail: [email protected]

University supervisor: Prof. Håkan Grahn, Department of Computer Science

External supervisor: Xiaomeng Zhu E-mail: [email protected]

Faculty of Computing, Blekinge Institute of Technology, SE-371 79 Karlskrona, Sweden
Internet: www.bth.se | Phone: +46 455 38 50 00 | Fax: +46 455 38 50 57

Abstract

Background: Domain adaptation describes a model learning from a source data distribution and performing well on target data. Here, domain adaptation is applied to assembly-line production tasks to perform automatic quality inspection.

Objectives: The aim of this master thesis is to apply 3D domain adaptation from synthetic images to real images. It is an attempt to bridge the gap between different domains (synthetic and real point cloud images) by implementing deep learning models that learn from synthetic 3D point clouds (CAD model images) and perform well on actual 3D point clouds (3D camera images).

Methods: Over the course of the thesis project, various methods for understanding and analysing the data, with the goal of bridging the gap between CAD and CAM by making them similar, are examined. Literature review and controlled experiment are the research methodologies followed during implementation. In this project we experiment with four different deep learning models on the generated data and compare their performance to determine which model performs best.

Results: The results are reported through two metrics, accuracy and training time, obtained for each deep learning model after the experiment. These metrics are illustrated as graphs for a comparative analysis of the models on which the data is trained and tested. PointDAN showed better results, with higher accuracy than the other three models.

Conclusions: The results show that domain adaptation from synthetic images to real images is possible with the data generated. The PointDAN deep learning model, which focuses on local and global feature alignment with single-view point cloud data, shows the best results on our data.

Keywords: Domain adaptation, Transfer learning, Deep learning, 3D point clouds

Acknowledgments

This thesis is industrial research, carried out in the Smart Factory Lab, TEED, Scania CV AB, Södertälje, Sweden. I take immense pleasure in thanking my supervisor Xiaomeng Zhu for guiding me throughout the whole research project. Many thanks for her patience and support.

I would like to thank Prof. Håkan Grahn, my university supervisor and advisor, for all the valuable insights and support and for making this an incredible learning experience.

My special thanks to my manager Franz A Waker for providing such an amazing environment to develop my thesis, and to my colleague Juan Luis for helping me generate the CAD data. My colleagues at TEED, Smart Factory Lab equally helped me throughout the whole period and made it enjoyable.

Lastly, I thank my parents Rammohan Manamasa and Radha Rani Manamasa, my family and my friends for all the love and support; they stood by me through the tough times of the COVID-19 pandemic.

Contents

Abstract

Acknowledgments

1 Introduction
  1.1 Problem statement
  1.2 Aim
    1.2.1 Objectives
  1.3 Research questions
  1.4 Contribution
  1.5 Ethical, Societal and Sustainable aspects
  1.6 Outline

2 Background
  2.1 Automatic quality inspection
    2.1.1 Quality inspection
    2.1.2 Automatic quality inspection of assembly
  2.2 Machine learning and Deep learning
    2.2.1 Transfer learning
  2.3 Artificial Neural Network
    2.3.1 Activation function
    2.3.2 Loss function
    2.3.3 Back propagation
  2.4 Convolutional Neural Network
    2.4.1 Convolutional layer
    2.4.2 Pooling layer
    2.4.3 Fully-connected layer
  2.5 Domain adaptation
    2.5.1 Unsupervised domain adaptation
  2.6 Point cloud data

3 Related Work
  3.1 Literature review on previous studies
  3.2 Drawbacks

4 Deep learning models
  4.1 PointDAN
    4.1.1 Architecture
  4.2 Self-supervised domain adaptation network
    4.2.1 Architecture
  4.3 Pointnet++
    4.3.1 Architecture
  4.4 Pointnet
    4.4.1 Architecture

5 Methods
  5.1 Data preparation
    5.1.1 Source data
    5.1.2 Target data
  5.2 Data
  5.3 Dataset labelling
  5.4 Data cleaning
  5.5 Model selection through literature review
    5.5.1 Search terminology and Search strings
  5.6 Methodology and Experimental design
    5.6.1 Experimentation on the deep learning models
    5.6.2 Evaluation

6 Understanding the data
  6.1 WRL to point cloud conversion
  6.2 Single view point cloud
  6.3 Generation of data from camera
  6.4 Other methods

7 Results
  7.1 Results
    7.1.1 Model 1 - PointDAN
    7.1.2 Model 2 - Self-supervised network
    7.1.3 Model 3 - Pointnet++
    7.1.4 Model 4 - Pointnet

8 Analysis and Discussions
  8.1 PointDAN
  8.2 Self-supervised network
  8.3 Pointnet++
  8.4 Pointnet
  8.5 Limitations and Challenges
    8.5.1 Validity threats

9 Conclusions and Future work
  9.1 Conclusion
  9.2 Answering the research questions
  9.3 Recommended Future work

References

A UI for automatic quality inspection

List of Figures

1.1 Domain Adaptation [38]
2.1 Simple neural network [8]
2.2 Different activation functions [44]
2.3 Convolutional Neural Network [29]
2.4 Categories of domain adaptation [19]
2.5 Deep domain adaptation [46]
2.6 Unsupervised domain adaptation [9]
2.7 Example point cloud image [14]
4.1 Feature alignment [37]
4.2 PointDAN network [37]
4.3 Self-supervised domain adaptation network [1]
4.4 Pointnet++ network [36]
4.5 Pointnet network [34]
6.1 MeshLab software
6.2 Example of perspective projection [49]
6.3 Pedal car
6.4 Cropping the pedal car to project only the wheel with a bounding box, using functions and libraries in Python
6.5 Cropped wheel
6.6 Single view point cloud wheel
6.7 TriSpectorP1000 along with the robot
6.8 Operator of the robot to scan the wheel
6.9 CloudCompare software
7.1 Overall accuracy vs epoch graph when the data is tested with the PointDAN network
7.2 Loss vs epoch graph when the data is trained and tested on PointDAN
7.3 Class-wise accuracy for the PointDAN model
7.4 Overall accuracy vs epoch graph when the data is tested with the self-supervised network
7.5 Loss vs epoch graph when the data is trained and tested on the self-supervised network
7.6 Class-wise accuracy on the self-supervised model
7.7 Overall accuracy vs epoch graph when the data is tested with Pointnet++
7.8 Loss vs epoch graph when the data is trained and tested on Pointnet++
7.9 Class-wise accuracy on Pointnet++
7.10 Overall accuracy vs epoch graph when the data is tested on Pointnet
7.11 Loss vs epoch graph when the data is trained and tested on Pointnet
7.12 Class-wise accuracy on Pointnet
7.13 Accuracies of the 4 deep learning models
A.1 User interface for automatic quality inspection
A.2 Software initialised and application running for automatic quality inspection
A.3 Terminal that shows the accuracy of the image inspected for each class

List of Tables

5.1 CAD data
5.2 CAM data
5.3 Results obtained after conducting literature review
8.1 Accuracy results on the PointDA-10 dataset with different domains [37]
8.2 Test set accuracy results on a general dataset [1]
8.3 Pointnet++ deep learning model accuracy on ModelNet40 data [36]
8.4 Pointnet deep learning model accuracy on ShapeNet data [34]
8.5 Comparative analysis of accuracies with PointDAN and self-supervised deep learning models
8.6 Comparative analysis of accuracies with Pointnet++ and Pointnet deep learning models

Nomenclature

RGBD Red Green Blue Depth

ANN Artificial Neural Network

CAD Computer Aided Design

CAM Camera

CNN Convolutional Neural Network

DGCNN Dynamic Graph Convolutional Neural Network

DLID Deep Learning for Domain Adaptation by Interpolating between Domains

DOF Degrees of Freedom

FPS Farthest Point Sampling

GAN Generative Adversarial Network

GPU Graphical Processing Unit

IPS Industrial Path Solutions

MCD Maximum Classifier Discrepancy

MMD Maximum Mean Discrepancy

NN Neural Network

PCL Point Cloud Library

PLY Polygon File Format

SSL Self Supervised Learning

VRML Virtual Reality Modeling Language

WRL Extension to VRML

Chapter 1 Introduction

Domain adaptation is a branch of machine learning and is closely associated with transfer learning. Domain adaptation describes a model learning from a source data distribution and performing well on target data. In recent years, domain adaptation has come into the spotlight as it is applied in computer vision tasks such as detection, segmentation and image categorization. In most cases, these tasks are performed to bridge the gap between different domains.

Figure 1.1: Domain Adaptation [38]

For example, images of a specific area in a city are captured through an autonomous car's camera and a network model is trained on these images. When the car is tested in those particular city streets, everything works fine. But when it is taken to a different place, it won't be able to recognise certain elements because the scenario is different. The challenge faced in this example is the adaptation of the model.

1.1 Problem statement

Computer vision technologies have been widely applied in robotics and quality inspection. However, not many experiments have been done specifically for tasks that involve vehicle assembly quality inspection or automatic quality inspection for complex production processes [31]. Applying automatic quality inspection to vehicle assembly can make a significant difference in improving overall product quality at the manufacturing stage. Concepts of deep learning are investigated to discover the best solutions for automatic quality inspection.

The use case represented in this project is to observe a pedal car, scan it with a 3D camera to generate a 3D point cloud image, and inspect its quality automatically through deep learning. The application is based on recognising the specified parts and classifying the images into different classes. This classification helps in automatic quality inspection of the wheel.

Certain limitations and challenges arise when attempting domain adaptation in this field. The motivation to focus on domain adaptation comes from requirements such as the need for a fairly large training set to align synthetic images with real images, the need to match features to the rendered views of a CAD image [25], and the fact that generating the data consumes a large amount of time.

1.2 Aim

The aim of this project is to bridge the gap between different domains (synthetic and real point cloud images) by implementing deep learning models that learn from synthetic 3D point clouds (CAD model images) and perform well on real 3D point clouds (3D camera images). The focus is on 3D point cloud data because it offers various perspectives of the generated 3D image. These perspectives add a dimension and some extra parameters to the data, which can help a deep learning model learn from a specific set of features and achieve better results. This project also focuses on bridging the domain adaptation gap specifically for 3D images, as the extra dimension (3D instead of 2D) helps in gathering additional information, such as the effect of the environment (background and light) on the image, which also helps in the analysis of quality.

1.2.1 Objectives

• To prepare and analyse the generated data and narrow the gap between the two domains so that CAD and CAM look similar.

• To identify deep learning models that perform well on real 3D point clouds when trained on synthetic 3D point clouds.

• To apply the deep learning models to the generated data and check the performance of each model, which helps in automatic quality inspection.

1.3 Research questions

The research questions that are tackled throughout this project are detailed below:

RQ1: How can the gap between CAD-generated images and camera-generated images be narrowed to make them look similar?
Motivation: To analyse the data generated from the IPS software and from the SICK TriSpector camera, which helps in preparing and pre-processing the data.

RQ2: Can domain adaptation be achieved from synthetic images (CAD) to real images (CAM)?
Motivation: To identify state-of-the-art deep learning models that can bridge the gap between the domains, i.e. from synthetic CAD-generated images to camera images.

RQ3: How does the performance of the deep learning models selected in RQ2 compare on the target data (CAM)?
Motivation: Based on the research methodology, the selected deep learning models are applied to the generated data to check their performance and compare which outperforms the others for the application of automatic quality inspection.

These research questions were formulated by carrying out literature reviews on previous studies. In most of the research papers, the authors have explained domain adaptation by comparing different machine learning techniques and deep learning models [1, 4, 22, 30, 36]. Answering the above questions fulfils certain objectives and goals of this project.

1.4 Contribution

Work on domain adaptation has been done in several previous projects. By thoroughly going through articles, journals, literature and research papers [4, 16, 7, 6], we have concluded that this project is specific to an industrial setting, which is one of its strengths. The project also helps in achieving automatic quality inspection during assembly of the product, which is the major contribution to the manufacturing industry. It could serve as a proof of concept for other complex production processes in the manufacturing phase that can utilise automatic quality inspection to reduce labour and costs.

The main contributions of this project are :

• Analysis of different deep learning models through an extensive literature review of previous research, and selection of the ones suitable for the generated data.

• The concept of reducing a full-view point cloud to a single-view point cloud is introduced to make the domains look similar.

• Implementation of the deep learning models with a completely new set of generated data.

• Analysis of the deep learning models’ performance by identifying the images from different domains.

• Establishing a model that works well with the specified use case and that can also be applied in the future to other use cases.

5 1.5 Ethical, Societal and Sustainable aspects

Regarding ethical issues, the data and resources used in this experiment are controlled by authorised management, and the deep learning models used in the implementation are open source. No surveys or interviews involving people or their information are part of this project, so societal aspects are not applicable. As this project is still a proof of concept in its initial stages, sustainability aspects are not considered either.

1.6 Outline

Chapter 2 (Background) explains the background and theoretical aspects of this research project. Chapter 3 (Related Work) describes previous research studies and related work in this field that solve similar problems. Chapter 4 (Deep learning models) gives in-depth knowledge about the selected deep learning models and their respective architectures. Chapter 5 (Methods) discusses the underlying methods for implementing this research and describes the data. In Chapter 6 (Understanding the data) the data is described in detail, together with an explanation of how it is generated and formed. Chapter 7 (Results) presents the outcomes when the models are implemented, and further discussion is provided in Chapter 8 (Analysis and Discussions). Lastly, Chapter 9 (Conclusions and Future work) concludes the research project with the lessons learned and a proposal for future work.

Chapter 2 Background

2.1 Automatic quality inspection

The goal of this project is to achieve domain adaptation which further helps in automatic quality inspection in assembly.

2.1.1 Quality inspection

Quality inspection is defined as an activity of examining, testing or inspecting the characteristics of an object and comparing the results with the requirements. This confirms whether the results meet the specified characteristics. Automated quality inspection follows the same inspection procedure, but involves automation in some steps of the process. Some examples that implement automatic quality inspection are checking the quality of fruits and inspecting the quality of certain manufactured products or machines.

Advantages of automatic quality inspection:

• Systems can be programmed and operated remotely.
• They can be faster in inspection processes and are easily adaptable.
• Compared to manual inspection, automated inspections are more accurate.

2.1.2 Automatic quality inspection of assembly

Assembly is one of the stages in manufacturing and production processes. Assembling parts or smaller objects to finish the product as a whole is the kind of procedure observed at this stage. These procedures can be carried out manually or by a robot, and errors can occur when they are done manually by humans.

Quality inspection plays a crucial role in such procedures as inspecting the final product is an important task. Compared to manual quality inspection, automatic quality inspection can improve the quality of the product. As the process is more automated, it is easily repeatable and accurate which produces high quality products.

In assembly, conducting manual quality inspection can be time consuming, prone to human error, and labour intensive. Shifting assembly procedures to a more automated level can be cost efficient and get products to market more quickly.

2.2 Machine learning and Deep learning

The ability of computers to construct significant patterns and descriptions of real-world objects from images is termed computer vision [18]. Machine learning is a branch of artificial intelligence and is described as a technique that helps systems learn from data, identify patterns and make decisions.

The four main machine learning methods are:

• Supervised learning

• Unsupervised learning

• Semi-supervised learning

• Reinforcement learning

Supervised learning deals with labelled data: the algorithm tries to predict the output from the given input. Regression and classification are the tasks mainly performed by supervised models. Unsupervised learning deals with unlabelled data, and the algorithm tries to discover structure in the input. Clustering and association are some of the tasks performed by unsupervised models. Semi-supervised learning is a combination of supervised and unsupervised learning with some missing data labels, also known as partially labelled training data, which is fed to the model for it to learn. Reinforcement learning learns from the actions the model takes itself in a given context, by assigning rewards for the best possible actions and penalties for bad moves. The goal of reinforcement learning is to receive maximum reward by making the best possible choices.

Beyond conventional machine learning techniques, representation learning involves processing the data in its natural form and automatically discovering the representations needed for detecting patterns [21]; this is commonly known as the deep learning method. It is obtained through multiple levels, with a module for each representation after transformation, from the raw input level to more abstract levels. Deep learning techniques are defined by these layers, which are not designed by hand but learned from the data [21].

Deep learning is a family of machine learning methods applied in artificial intelligence. It is referred to as deep learning because of the number of layers through which the data is transformed: multiple layers are used to extract features from the given input. Applications of deep learning architectures such as convolutional neural networks, recurrent neural networks and deep neural networks are mostly seen in computer vision, natural language processing, image analysis, bioinformatics, etc.

2.2.1 Transfer learning

Transfer learning is one of the popular research problems in machine learning. This approach transfers the knowledge stored in an existing pre-trained model and applies it to create a new model for similar, related tasks [3, 27]. It is followed to save the time needed to train a newly created model for the same tasks and also helps in reducing the resources used for a new problem. The existing pre-trained model can be used for a completely new but related task by fine-tuning the model on the new dataset.

2.3 Artificial Neural Network

The concept behind the Artificial Neural Network (ANN) was derived from the structure of the biological neural network and its functions. An ANN is a computational model which comprises parameterized computational nodes called artificial neurons [15]. The neurons are connected to one another through a layered spatial arrangement. The output of a neuron from the previous layer is given as input to the next layer, and this continues along a specified path until the whole network produces an output. The connections among these neurons are weighted. Mathematically, a neuron can be defined as a function

    f(x; w_j) := \sigma\Big( w_{0j} + \sum_{i=1}^{n} w_{ij} x_i \Big)    (2.1)

where x is the input vector, w_j \in \mathbb{R}^{n+1} is the weight vector of the neuron, and \sigma : \mathbb{R} \to \mathbb{R} is the activation function.
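As a concrete illustration of equation 2.1 (added here for clarity, not part of the thesis implementation), the following minimal NumPy sketch evaluates a single artificial neuron; the variable names and values are chosen for illustration only.

```python
import numpy as np

def neuron(x, w, sigma):
    """Single artificial neuron: sigma(w0 + sum_i w_i * x_i), cf. equation 2.1."""
    bias, weights = w[0], w[1:]          # w has n + 1 entries: bias plus input weights
    return sigma(bias + np.dot(weights, x))

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))   # one possible activation function
x = np.array([0.5, -1.2, 3.0])                 # input vector with n = 3 components
w = np.array([0.1, 0.4, -0.3, 0.8])            # weight vector in R^(n+1)
print(neuron(x, w, sigmoid))
```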

The main components of a neural network are:

• neurons
• a propagation function
• connections and weights

In ANNs, the neurons are arranged in the form of a simple feedforward neural network which consists of multiple layers. As illustrated in figure 2.1, the three main types of layers are:

1. Input layer: In this layer, the input data is passed on to the hidden layers without any calculation.

2. Hidden layer: In this layer, the neurons perform linear calculations on the input given to them, and the results are passed on to the neurons in the subsequent layer.

3. Output layer: The output layer is similar to the hidden layers in that it takes input from the neurons of the previous layer to calculate the final result of the whole network.

Figure 2.1: Simple neural network[8]

2.3.1 Activation function

An activation function, also known as a transfer function, defines the output of a node for a given set of inputs. The activation function helps in mapping the results of the neural network into a range such as 0 to 1 or -1 to 1.

Without an activation function, a neural network is restricted to linear approximations, as only linear calculations with weighted inputs and a bias are performed inside each neuron. The activation function is introduced into the network to make the output of the whole neural network non-linear. Figure 2.2 depicts some of the commonly used activation functions: sigmoid, ReLU, TanH, Leaky ReLU and Exponential LU.

Figure 2.2: Different activation functions[44]
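For reference, the activation functions named above can be written in a few lines of NumPy; this sketch is added purely for illustration and is not taken from the thesis implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))          # squashes values into (0, 1)

def relu(z):
    return np.maximum(0.0, z)                # zero for negative inputs

def leaky_relu(z, slope=0.01):
    return np.where(z > 0, z, slope * z)     # small slope instead of zero

def tanh(z):
    return np.tanh(z)                        # squashes values into (-1, 1)

z = np.linspace(-3.0, 3.0, 7)
for name, fn in [("sigmoid", sigmoid), ("ReLU", relu), ("leaky ReLU", leaky_relu), ("tanh", tanh)]:
    print(name, fn(z))
```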

2.3.2 Loss function

The loss function compares the output value predicted by the model with the actual value and thereby quantifies the model's performance. If the loss is very high, the predictions do not match the actual values well; if the loss is low, the model has made good predictions. Two commonly used loss functions in neural networks are:

1. Cross entropy loss function

2. Quadratic loss function

The cross entropy loss function, also known as log loss, measures the difference between two probability distributions and is mostly used in classification problems. The cross entropy loss increases sharply as the gap between the predicted value and the actual value widens.

    L_{CE} = -\frac{1}{N} \sum_{i=1}^{N} \Big[ y_i \log(p(y_i)) + (1 - y_i) \log(1 - p(y_i)) \Big]    (2.2)

where y_i indicates the label and p(y_i) is the predicted probability, for N points.

The quadratic loss function, also known as mean squared error, is the average squared difference between the actual and predicted values. It deals with the average magnitude of the error and is not concerned with its direction.

    L_{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2    (2.3)
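A minimal NumPy sketch of equations 2.2 and 2.3, added for illustration only (the thesis relies on the loss implementations of the respective models):

```python
import numpy as np

def cross_entropy(y, p, eps=1e-12):
    """Binary cross-entropy (log loss), cf. equation 2.2."""
    p = np.clip(p, eps, 1.0 - eps)                      # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def mse(y, y_hat):
    """Quadratic loss (mean squared error), cf. equation 2.3."""
    return np.mean((y - y_hat) ** 2)

y = np.array([1.0, 0.0, 1.0, 1.0])       # labels
p = np.array([0.9, 0.2, 0.7, 0.4])       # predicted probabilities
print(cross_entropy(y, p), mse(y, p))
```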

2.3.3 Back propagation

Back propagation is a widely used machine learning algorithm for training feedforward neural networks. It calculates the gradient of the loss function with respect to the weights of the network [39]. The derivation of back propagation does not depend on the specific activation and loss functions used.

2.4 Convolutional Neural Network

A convolutional neural network (CNN) is a kind of deep neural network. The structure of a CNN consists of convolutional layers, pooling layers and fully-connected layers; these layers hold different parameters and perform operations based on the input data. Figure 2.3 illustrates a convolutional neural network and its layers.

Figure 2.3: Convolutional Neural Network[29]

2.4.1 Convolutional layer

In some applications of ANNs, such as image classification, the input given to the network is an image; this causes a parameter explosion in the neural network, which increases the cost of computation. To avoid such a parameter explosion, a CNN uses a neuron called a convolutional kernel, which slides over the image and extracts feature maps from the input image in the convolutional layer.

In these layers, filters are applied to the original input or, in a deep CNN, to feature maps. This layer holds most of the trainable parameters of the network. Some of the main parameters in this layer are the number of kernels, the size of the kernels and the activation function.

2.4.2 Pooling layer

Pooling layers perform specific functions such as max pooling or average pooling; otherwise they are similar to convolutional layers. These functions take the maximum value, or the average of the values, in a certain filter region. They also help in reducing the dimensionality of the network.

2.4.3 Fully-connected layer

These layers are placed before the classification output of the convolutional neural network and help to flatten the results before the classification is done. They aggregate the information from the final feature maps and generate the final classification.
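To make the layer types of section 2.4 concrete, the sketch below defines a small CNN in PyTorch. It is an illustrative example with arbitrary layer sizes and does not describe the networks used in this thesis.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Convolutional layers -> pooling layers -> fully-connected layer (section 2.4)."""
    def __init__(self, num_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # convolutional layer
            nn.ReLU(),
            nn.MaxPool2d(2),                              # pooling layer
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)  # fully-connected layer

    def forward(self, x):
        x = self.features(x)
        x = x.flatten(1)                  # flatten the final feature maps
        return self.classifier(x)

logits = SmallCNN()(torch.randn(2, 1, 32, 32))   # two 32x32 single-channel images
print(logits.shape)                               # torch.Size([2, 4])
```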

2.5 Domain adaptation

In the field of computer vision, domain adaptation is defined in terms of transfer learning and machine learning. When a model trained on a source data distribution is tested on another, target data distribution, a shift from one domain to the other occurs, which can degrade the performance of the model. Figure 2.4 describes and depicts various categories of domain adaptation.

Categories of domain adaptation:

• One-step domain adaptation: performed in a single transformation step.

• Multi-step domain adaptation: traverses multiple domains in the process.

Figure 2.4: Categories of domain adaptation[19]

Classification of domain adaptation is of 3 types:

• Supervised domain adaptation: all the data considered is labelled.

• Semi-supervised domain adaptation: only a set of the target data is labelled.

• Unsupervised domain adaptation: the learning model uses a labelled set of source data along with sets of unlabelled source and target examples.

In [30, 5, 6, 25] the authors have focused on the domain shift problem from synthetic to real images by analysing the performance of state-of-the-art deep learning models for domain adaptation and by experimenting on object detection with different approaches. From these papers, the concept of domain adaptation was thoroughly studied for application in this project. According to the authors, the annotations from the 3D synthetic images were difficult to render and match with the natural images; hence the performance of a model drops significantly when it is tested on data from a different domain. The working of domain adaptation, covering training and testing, is shown in figure 2.5. In this project, one of the goals is to reduce the domain shift problem by making the source and target domains look similar.

Figure 2.5: Deep domain adaptation[46]

This project aims to analyse the data generated and implement a deep learning model which helps in automatic quality inspection by reducing the domain gap in the manufacturing industry. Two main factors involved in domain adaptation are [4]:

1. Space coverage of the images from different domain images

2. Diversity in size or texture in the domains

This can be achieved by understanding the source data and changing it in accordance with the target data, i.e. by matching the coordinates and normalising the images, so that both domains have a similar size and texture and the gap between the domains is reduced.
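As a sketch of this kind of coordinate matching, the snippet below centres a point cloud and scales it into a unit sphere, assuming each cloud is stored as an N x 3 NumPy array; it illustrates the idea rather than the exact pre-processing used in this thesis.

```python
import numpy as np

def normalize_cloud(points):
    """Centre a point cloud at the origin and scale it into a unit sphere,
    so clouds from different domains share a comparable coordinate range."""
    points = points - points.mean(axis=0)            # match coordinate origins
    scale = np.max(np.linalg.norm(points, axis=1))   # distance of the furthest point
    return points / scale

cad = np.random.rand(1024, 3) * 100.0   # stand-in CAD cloud, e.g. in millimetres
cam = np.random.rand(1024, 3) * 0.1     # stand-in camera cloud on another scale
print(normalize_cloud(cad).max(), normalize_cloud(cam).max())
```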

2.5.1 Unsupervised domain adaptation

Unsupervised domain adaptation is performed when the source data is labelled and the target data is unlabelled. Some of the assumptions in this method are:

• Probability distributions of both source and target data are not equal.

• The conditional probability distributions, given the instances, are equal.

A transformation is performed from the source to the target domain to bring the distributions closer. The classifier is then trained on the transformed source distribution, and the accuracy of the model, which improves over time, is checked.

Neural networks play an important role in achieving these transformations. Let the transformation performed by the network be denoted F and the parameters of the neural network be W. Source and target instances are denoted s and t respectively. After the transformation is applied to the source and target instances,

    F(s, W) = V_s    (2.4)

    F(t, W) = V_t    (2.5)

where V_s is the source instance vector and V_t is the target instance vector.

As illustrated in figure 2.6, the three main components in unsupervised domain adaptation are:

• Feature extractor: a neural network that learns to perform the transformation on source and target instances.

• Label classifier: the transformation of the labelled source instances is passed through this neural network for classification.

• Domain classifier: this neural network predicts whether the feature extractor output comes from a source instance or a target instance.

Figure 2.6: Unsupervised domain adaptation[9]
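The following PyTorch sketch wires the three components of figure 2.6 together with toy multilayer perceptrons, only to show how the feature extractor output feeds both classifiers; the layer sizes are arbitrary assumptions and this is not the architecture used in the thesis.

```python
import torch
import torch.nn as nn

# The three components of figure 2.6, sketched with toy layer sizes.
feature_extractor = nn.Sequential(nn.Linear(3 * 1024, 256), nn.ReLU(), nn.Linear(256, 64))
label_classifier  = nn.Linear(64, 4)    # predicts the class of a (labelled) source instance
domain_classifier = nn.Linear(64, 2)    # predicts whether a feature came from source or target

s = torch.randn(8, 3 * 1024)            # flattened source point clouds
t = torch.randn(8, 3 * 1024)            # flattened target point clouds
v_s, v_t = feature_extractor(s), feature_extractor(t)      # V_s and V_t of eqs. 2.4-2.5
class_logits  = label_classifier(v_s)                      # trained with source labels only
domain_logits = domain_classifier(torch.cat([v_s, v_t]))   # trained to tell the domains apart
print(class_logits.shape, domain_logits.shape)
```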

2.6 Point cloud data

Point clouds are unified structures which are simple and easy to learn from, avoiding complexities and irregularities [34]. They are created from data points produced by 3D scanners or cameras. Point clouds are widely used to create 3D CAD images in quality inspection, manufacturing, etc., and recently large amounts of 3D point cloud data have been captured by cameras and sensors [37]. Point clouds derived from 3D sensors are a direct source of information describing the state of a particular object [32]. Figure 2.7 depicts the point cloud version of a general RGBD image.

Figure 2.7: Example point cloud image [14]
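In practice a point cloud is simply an N x 3 array of coordinates. The sketch below shows one way to load such data from a plain "x y z per line" ASCII file; the file name and layout are hypothetical, and real PLY or WRL files require a proper parser.

```python
import numpy as np

def load_xyz(path):
    """Load an ASCII point-cloud file with one 'x y z' triple per line into an (N, 3) array."""
    return np.loadtxt(path, usecols=(0, 1, 2))

# points = load_xyz("wheel_scan.xyz")   # hypothetical file name
points = np.random.rand(2048, 3)        # stand-in cloud for illustration
print(points.shape)                     # (2048, 3): N points with x, y, z per point
```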

Chapter 3 Related Work

To recognise and understand more approaches in this area, several pieces of existing literature and published academic articles and journals were examined. Reviewing these materials helped in understanding the concepts, algorithms and deep learning models used in domain adaptation.

3.1 Literature review on previous studies

In the papers [16, 4] the authors discuss bridging the gap between source and target domains using a GAN (Generative Adversarial Network), followed by experiments with a conditional generator and discriminator. The generator is responsible for new examples which are indistinguishable from the real examples, and the discriminator for identifying and classifying the real ones from the fake (newly generated) ones. Once the model is trained, it transforms synthetic point cloud data into real point cloud data, which minimises the gap between the domains. The paper [16] analyses these results with the CNN model YOLOv3, which is mostly used for object detection.

In the papers [7, 5, 28] the authors discuss their experiments with deep learning models and object detection on 3D models through unsupervised domain adaptation. Experimental tasks include closed-set classification, open-set classification and object detection performed on the trained model [28]. Based on the results acquired, the authors have proposed new benchmarks and methods such as Syn2Real, Vote3Deep and DLID for domain adaptation.

Papers [20, 6] illustrate some general techniques applied for domain adaptation, such as deep neural networks using a photo-realistic image style transfer algorithm for domain stylization. These studies were exemplar approaches applied to 3D point cloud data, using classifiers such as SVM and boosting to train the models.

The authors of the papers [34, 30, 26, 23] have prioritised research on analysing the performance of deep learning models through object detection on 3D point cloud data. To evaluate and identify 3D shape classifiers, models such as PointNet, PointRCNN and VoxNet were implemented on data generated from CAD images.

The domain adaptation problem in most of this research was solved by training with appropriate CNN methods. Some methods used an auto-encoder on the target domain to extract domain-invariant features [11], while other methods included clustering techniques and pseudo-labels [42]. To achieve accurate domain adaptation, methods that involve distribution matching in the features of the convolutional layers are also effective, an approach followed in [10, 43, 33].

The application of volumetric convolutions on grids obtained from point clouds is identified as a deep learning method on point clouds by [35, 48]. [12] discusses an unsupervised domain adaptation approach for detecting objects in different domains. The authors propose representing the data in a generative subspace within each domain; points between these domains are represented on a Grassmann manifold, and the domain shift is attained by obtaining sampling points along with the geometric information of the points.

Recently, in the paper [1], the authors introduced a new approach to solve the domain shift problem: self-supervised learning (SSL). This method learns meaningful geometric representations from unlabelled data and is adapted for domain shift, although the paper does not conclude that the model's behaviour would be the same for 3D perception. SSL is carried out with a technique known as region reconstruction, which is motivated by the deformations that images undergo in the transformation from synthetic to real. The training procedure used by this method is a Mix-up method.

The main base paper for this research project is [37]. The authors of [37] succeeded in achieving domain adaptation directly on 3D point cloud data. The researchers used three different datasets (ModelNet, ShapeNet, ScanNet) which consist of standardised images of objects like tables, chairs, beds, lamps, etc. The model focuses on the alignment of local and global features of images across domains.

Recently the papers [13, 50, 51, 2, 24, 52] have emphasised domain adaptation for point cloud data used for semantic segmentation of LiDAR point clouds.

3.2 Drawbacks

In most of the research discussed, the dataset considered is fairly generalised, containing images of standard objects like tables, chairs, lamps, etc. None of the studies have discussed their application in a real-time environment in detail.

The dataset generated in this thesis research brings out a novel approach for implementing these deep learning models in real-time environment. The challenging part in this project is the application of the deep learning model on real-time images generated, which helps in automatic quality inspection. Also this research is specific to manufacturing industry based on the dataset and the use-case represented in this thesis.

Chapter 4 Deep learning models

4.1 PointDAN

The PointDAN model is a novel approach designed to bridge the gap between domains through an unsupervised network on 3D point cloud data. The model works by aligning the local and global features of the data. This is attained through self-adaptive nodes which dynamically align the features across the domains. The two main features of the PointDAN model are:

• Attention module
• Adversarial training

Unsupervised domain adaptation models are defined to have a labelled source domain with certain data points and unlabelled target data points. The point cloud data has an input representation of 3-dimensional coordinates (x, y, z). The important aspect in unsupervised domain adaptation is to determine the mapping function that projects the input to a feature space shared among the different domains.

The local geometric information of the point cloud data from both domains plays a crucial role in the alignment of the features. For example, in figure 4.1 below, the object, a table, is given the same class name in both domains for identification. The table from the ScanNet dataset misses some parts due to environmental disturbance while scanning with LiDAR, whereas the table from the ModelNet dataset shows a proper, complete structure.

The main concept of local alignment here is to focus on the similarities with the structures and ignore other parts to extract and match the features across these domains. PointDAN model is constructed for selecting the key nodes for better alignment of the local features.

Figure 4.1: Feature alignment[37]

Self-adaptive nodes are constructed to attain this local and global alignment of features. This is carried out by defining nodes in the point cloud. Each point cloud is represented by a set of points and local geometric structures along with the nearest-neighbour points. Based on the location of a defined node, the local region and the points surrounding it are considered.

This model considers commonly used methods such as farthest point sampling or random sampling [36, 22] to obtain centre nodes, which are employed to achieve local features. This also allows domain alignment to cover the common characteristics of the 3D geometric shapes and to ignore particular parts of the objects.
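A minimal NumPy sketch of farthest point sampling, the node-selection strategy mentioned above; it illustrates only the greedy idea and is not the PointDAN implementation.

```python
import numpy as np

def farthest_point_sampling(points, k):
    """Greedy FPS: repeatedly pick the point furthest from the points already chosen.
    points: (N, 3) array, k: number of centre nodes to keep."""
    n = points.shape[0]
    chosen = [np.random.randint(n)]                 # random starting point
    dist = np.full(n, np.inf)
    for _ in range(k - 1):
        dist = np.minimum(dist, np.linalg.norm(points - points[chosen[-1]], axis=1))
        chosen.append(int(np.argmax(dist)))         # furthest point from the chosen set
    return points[chosen]

cloud = np.random.rand(2048, 3)
nodes = farthest_point_sampling(cloud, 64)          # e.g. 64 candidate node centres
print(nodes.shape)                                  # (64, 3)
```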

4.1.1 Architecture

Figure 4.2 describes the PointDAN model for the alignment of features. The semantic information in the module is reconstructed by adding a weight to each edge and aggregating them together to obtain the predicted direction. Through voting over edges with significant differences, the prediction shift is decided.

Figure 4.2: PointDAN Network [37]

Farthest point sampling is performed on the point clouds and the locations of the nodes are first initialised; the nearest neighbour points are then collected to form the regions around them. The equation to compute the offset for the c-th node is:

    \Delta\hat{x}_c = \frac{1}{k} \sum_{j=1}^{k} \big( R_T (v_{cj} - \hat{v}_c) \cdot (x_{cj} - \hat{x}_c) \big)    (4.1)

where
\Delta\hat{x}_c = predicted location offset for the c-th node,
R_T = weight of one convolutional layer used for feature transformation,
\hat{x}_c = node location,
x_{cj} = j-th neighbour point of the node,
x_{cj} - \hat{x}_c = edge direction,
v_{cj}, \hat{v}_c = mid-level feature points extracted from the encoder V.

Self-Adaptive node attention

An attention module is designed within this model to capture the relations among the nodes. This module is also important for weighting the various generated SA nodes according to their contribution, which helps in domain alignment and also captures some of the spatial features. For domain alignment, the node attention network in this model passes the SA nodes through a bottleneck network [14].

Self-Adaptive node feature alignment

Minimisation of the MMD (maximum mean discrepancy) loss is used for cross-domain alignment of the SA-node features. This is done to overcome the performance drop of GAN-based methods, since both the network parameters and the offsets are unstable due to disturbances in the gradient values under local alignment. MMD is defined as a measure of the distance between distributions in a feature space.
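For reference, a biased estimate of the squared MMD with a Gaussian kernel can be sketched in a few lines of PyTorch; the kernel choice and bandwidth here are assumptions for illustration, not the exact loss used in PointDAN.

```python
import torch

def mmd(x, y, sigma=1.0):
    """Biased estimate of the squared maximum mean discrepancy between two
    feature batches, using a Gaussian kernel of bandwidth sigma."""
    def kernel(a, b):
        d2 = torch.cdist(a, b) ** 2                 # pairwise squared distances
        return torch.exp(-d2 / (2 * sigma ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

source_feats = torch.randn(32, 64)   # stand-in SA-node features from the source domain
target_feats = torch.randn(32, 64)   # stand-in SA-node features from the target domain
print(mmd(source_feats, target_feats).item())
```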

Global feature alignment

To decrease the distance between features across domains, global feature alignment is also enforced in this model, once all the features have been extracted by the generator network. Compared to local feature alignment, global feature alignment is more stable with respect to the given input values and also widens the choice of GAN methods. This model uses MCD (Maximum Classifier Discrepancy) [41] for global feature alignment due to its exceptional performance on general domain alignment.

4.2 Self-supervised domain adaptation network

Self-supervised learning (SSL) is an evolving branch of machine learning most commonly used for insufficient or poorly labelled data. In the context of domain adaptation, we use self-supervised learning to solve the classification problem of a poorly labelled target-domain point cloud distribution using the well-labelled source point cloud data. As shown in figure 4.3, the modules present in this SSL network are detailed below.

There are two main modules in this model,

• Region Reconstruction SSL task

• Point-Cloud Mix-up

Region Reconstruction SSL task: This task captures semantic properties of the point cloud that are useful for dividing the data into meso-scale features, so that the reconstruction task can be repeated in the same way for the target data as well. In this module, distorted point clouds are generated over mid-sized regions, and the model is trained using the original point cloud regions and labels [1]. These regions are called voxels. For each voxel, a corresponding distorted voxel is generated with the same size but with points picked at random from an isotropic Gaussian distribution centred at the centre of the corresponding voxel, thus generating input-output pairs [1]. Finally, an encoder is trained to reconstruct the distorted voxel, using the Chamfer distance between the input-output pairs as the loss function.
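The Chamfer distance used as the reconstruction loss can be sketched as follows; this is a generic illustration assuming two (N, 3) tensors, not the code of [1].

```python
import torch

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point clouds a: (N, 3) and b: (M, 3):
    mean distance from each point to its nearest neighbour in the other cloud."""
    d = torch.cdist(a, b)                                # (N, M) pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

original  = torch.rand(1024, 3)
distorted = original + 0.01 * torch.randn(1024, 3)       # stand-in for a distorted voxel
print(chamfer_distance(original, distorted).item())
```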

Point-cloud Mix-up: The Mix-up method performs data augmentation using input samples and labels. It generates a labelled sample as a convex combination of different input points [1]. The idea behind the Mix-up method is that it generates a larger point cloud for a particular category, which captures the target distribution better.

Figure 4.3: Self-supervised domain adaptation network [1]

4.2.1 Architecture

• For feature extraction, a DGCNN is used with four point-cloud convolution layers and one fully connected layer before extracting a global feature vector by max-pooling [1].

• For the PCM method, a classification head is implemented using three fully connected layers, with 0.5 dropout between the two hidden layers [1].

• A transformation network is implemented to align the input set to a canonical space, using two point cloud convolution layers of sizes 64 and 128, a 1D convolution layer of size 1024, and three fully connected layers of sizes 512, 256 and 3 [1].

• The self-supervised head consists of four 1D convolution layers that take the global feature vectors as input. Batch normalization is applied after each convolution layer, with leaky ReLU activation with a slope of 0.2 [1].

4.3 Pointnet++

PointNet++ is a deep learning model based on the Pointnet classifier architecture. The Pointnet++ architecture hierarchically builds groups of points and generates larger local regions along the hierarchy [36]. In our experiment we build a transfer learning model based on the PointNet++ architecture: the Pointnet++ network is trained on the CAD dataset (source data) and the learnings from the CAD data are then transferred to the CAM data (target data) for classification. Pointnet++ uses hierarchical grouping of data points to generate features, in which local alignment of the features is seen.

Figure 4.4: pointnet++ network [36]

4.3.1 Architecture

Figure 4.4 illustrates the different layers in the architecture. The three major layers in the Pointnet++ [36] architecture are:

• Sampling Layer: It iteratively uses farthest point sampling (FPS) to choose a subset of points in this layer. It has better coverage of the entire pointset than random sampling.

• Grouping Layer: This generates groups of point sets where each group represents a local region in the image. Each region is generated using an algorithm similar to k nearest neighbours (kNN). While kNN finds a fixed number of neighbouring points, the grouping layer finds all the points that are within a radius of the query point (a ball query; see the sketch after this list).

• Pointnet Layer: The PointNet layer corresponds to the basic architecture of the Pointnet classifier.
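The grouping-layer idea can be illustrated with a simple ball query in NumPy; the radius and cloud below are arbitrary stand-ins, not the values used by Pointnet++.

```python
import numpy as np

def ball_query(points, centre, radius):
    """Return all points within `radius` of a query centre; unlike kNN,
    the number of neighbours per region is not fixed."""
    dist = np.linalg.norm(points - centre, axis=1)
    return points[dist <= radius]

cloud  = np.random.rand(2048, 3)
centre = cloud[0]                          # e.g. one centre chosen by the sampling layer
region = ball_query(cloud, centre, radius=0.1)
print(region.shape)                        # a variable number of points per local region
```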

4.4 Pointnet

Pointnet is a deep neural network that consumes a set of raw point cloud data [34]. Tasks such as object classification and semantic segmentation are addressed by implementing this architecture. For object classification, the input is a point cloud sample and the network outputs n scores for the n classes. For semantic segmentation, the input can be a single object for part region segmentation, or a part of a 3D scene, and the network outputs n x m scores for the n points and m semantic sub-categories.

Figure 4.5: pointnet network [34]

4.4.1 Architecture

Figure 4.5 depicts the architecture of the model, which consists of two main modules:

• Classification network

• Segmentation network

The classification network takes n points as input and applies feature transformations to them; the point features are then aggregated using a max pooling layer. The basic idea of this network is to align the given inputs into a canonical space and then extract the features [34].

The segmentation network acts as an extension of the classification network, where the output and the extracted features are combined for the alignment of source and target data. This network utilises the extracted global point cloud features, feeding them to each of the point features in the segmentation network to predict the global semantics of the target data [34].
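The core idea of the classification network described above, a shared per-point transformation followed by a symmetric max-pooling, can be sketched in PyTorch as below. The layer sizes are illustrative assumptions, and the input transformation networks of the full Pointnet are omitted.

```python
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    """Per-point MLP (as 1D convolutions) + max-pooling over points + classifier."""
    def __init__(self, num_classes=4):
        super().__init__()
        self.point_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 1024, 1), nn.ReLU(),
        )
        self.classifier = nn.Linear(1024, num_classes)

    def forward(self, x):                              # x: (batch, 3, n_points)
        per_point = self.point_mlp(x)                  # (batch, 1024, n_points)
        global_feat = per_point.max(dim=2).values      # order-invariant aggregation
        return self.classifier(global_feat)            # n scores for the n classes

scores = TinyPointNet()(torch.randn(2, 3, 1024))
print(scores.shape)                                    # torch.Size([2, 4])
```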

Chapter 5 Methods

This project is designed to implement a deep learning model that works well for domain adaptation on the generated data. This chapter includes details on how the data is prepared and how suitable deep learning models that best fit the dataset are investigated. A detailed explanation of the data generation, including narrowing the gap between CAD and CAM, is given in chapter 6. The research methodology used for implementation and for obtaining viable results is also described. Since the data generated is in the form of point clouds, not many models have explored this path to achieve domain adaptation. Based on the dataset and the data pre-processing steps, a few suitable models are identified and the best possible approach is implemented over the course of this project.

Each model that is a possible approach to reducing the gap between the domains, with respect to the collected data, is analysed and supported with proper reasoning and strategies. After the data has been generated and pre-processed as detailed in this chapter, it is divided into classes as shown in tables 5.1 and 5.2; this division of classes reflects the use case of this project, i.e. quality inspection of the front wheel.

5.1 Data preparation

Data preparation was carried out during the project timeline. The data on which the deep learning models are trained and tested is generated from the CAD image software known as IPS and from a SICK TriSpectorP laser triangulation 3D camera, and represents the front wheel of a pedal car. Automatic inspection of quality and assembly is applied after experimenting with the deep learning models on the generated data. In this project, a limited amount of data is generated, since producing large amounts of data on a production line is difficult and consumes a lot of time.

The data is generated from two different means :

1. Train data - IPS software

2. Test data - SICK TriSpectorP (laser triangulation 3D camera)

Based on variations of the rotation of the front wheel (up to 360 degrees) and of the front steering shaft (up to 60 degrees), the images are captured and generated accordingly. The data generated from both sources is in the form of point clouds, specified as the x, y, z geometric coordinates of each point in a data file.

5.1.1 Source data

The data to train the deep learning models is generated from IPS (Industrial Path Solutions), which is used internally by Scania. In the IPS software, the Lua language is used to write the script that generates the CAD images according to the requirements.

Lua is an embedded scripting language that supports procedural programming along with data description [17]. Specifications are given in the script when generating the CAD images from this software, and different assembly CAD images are generated through it.

In total, 4800 CAD images are created as the training data. These images are in the WRL file format. The WRL file extension belongs to the Virtual Reality Modeling Language (VRML), used to describe virtual environments. WRL files are plain ASCII text files that include 3D specification details such as polygon edges, vertices, image-mapped textures, surface colours, etc. These text files also include the coordinates and view points of the 3D image.

5.1.2 Target data

The test data is generated by a SICK TriSpectorP1000 laser triangulation 3D camera. It is a programmable 3D camera suited to current Industry 4.0 settings. Software tools and applications accompany the 3D camera, opening up a wide range of possibilities and solutions. These tools provide easy operability in areas such as quality control, profile verification and robot handling.

The TriSpectorP1000 is a stand-alone device that combines analysis, imaging and lighting. It is built on laser triangulation technology, is contrast and colour independent, and acquires true object shape data in millimetres.

Applications of the TriSpectorP1000:

• Part quality check
• Assembly check
• 3D robot guidance
• Profile verification
• Edge and surface defect detection

About 800 images are generated from this laser triangulation camera.

5.2 Data

The dataset consists of 4800 training samples and 800 testing samples, collected from the sources mentioned above. The dataset is split into two domains, CAD data and CAM data. Under CAD data there are 4 different classes, and each class is split into train and test; the same holds for the CAM data. The data split is in the ratio 80:20, where 80% is for training and 20% is for testing.

CAD DATA

Table 5.1: CAD data

Variation            Class 1   Class 2   Class 3   Class 4
Side of the wheel    Ok        Ok        No        No
Screw Placement      Ok        No        Ok        No

To generate the CAD data, the wheel is rotated 40 times and the shaft of the pedal car is rotated 30 times, which gives a total of 40 x 30 = 1200 images per class. For all 4 classes the final sample count is 4800.

CAM DATA

Table 5.2: CAM data

Variation            Class 1   Class 2   Class 3   Class 4
Side of the wheel    Ok        Ok        No        No
Screw Placement      Ok        No        Ok        No

In total, 800 samples of CAM data (for testing) are generated by rotating the wheel 10 times and the shaft 20 times for each class, i.e. 200 images per class.

5.3 Dataset labelling

The generated CAD and CAM images are labelled through a hierarchical file structure. A top-level folder is created with sub-folders holding the CAD and CAM images. Inside the individual CAD and CAM folders there are 4 class folders. Class 1 represents the presence of the screw and proper wheel placement. Class 2 has no screw but the wheel is in its proper place. Class 3 is the reverse of class 2, and class 4 has no screw present and no proper wheel placement. The data is then divided into train and test sets by having a train folder and a test folder inside each class folder.
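A small sketch of how such a folder hierarchy can be traversed to collect labelled samples is shown below; the folder names are hypothetical placeholders for the structure described above, not the actual paths used in the project.

```python
import os

def index_dataset(root):
    """Walk a root/<CAD|CAM>/<class_x>/<train|test>/<file> layout and collect
    (path, domain, class, split) tuples for every sample."""
    samples = []
    for domain in ("CAD", "CAM"):                              # hypothetical folder names
        for class_name in sorted(os.listdir(os.path.join(root, domain))):
            for split in ("train", "test"):
                folder = os.path.join(root, domain, class_name, split)
                for fname in sorted(os.listdir(folder)):
                    samples.append((os.path.join(folder, fname), domain, class_name, split))
    return samples

# samples = index_dataset("dataset")   # hypothetical dataset root folder
```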

5.4 Data cleaning

After the data is generated, its quality is checked and it is filtered to improve the performance of the models and to achieve good results. In particular, the CAM data is filtered by going through the images with poor-quality scans. A CAM image is judged to be of bad quality if the outline of the wheel is not properly visible or if the variations, such as the presence of the screw and the wheel placement, are not scanned properly. Initially there were 4800 CAD images and 800 CAM images. After filtering, the CAM data contains about 650 images, roughly 150 per class, which are used for testing.

Research question 1 is answered by analysing the generated data and bridging the gap between the source data (CAD) and target data (CAM) by making the images look similar. In the early stages of data generation, the CAD point cloud image is a full-view point cloud of the pedal car, whereas the CAM image is a single-view point cloud of just the front wheel. The gap is reduced by narrowing the full-view point cloud CAD image down to a single-view point cloud image, and by making sure that the coordinates of CAD and CAM are similar. This is further explained in chapter 6.

5.5 Model selection through literature review

5.5.1 Search terminology and Search strings

The literature review is conducted by collecting specific published papers and journals using proper search terminology. International digital libraries such as Scopus, IEEE Xplore and the ACM Digital Library, along with ResearchGate, were searched extensively. Some search keywords are: Domain adaptation, 3D point clouds, Object detection, Deep neural network (DNN), Deep learning models, etc. Some of the main search strings used to find the related literature are:

1. ((3D Domain Adaptation) AND (Point cloud))

2. ((Deep learning models) AND (Domain Adaptation))

3. ((Domain Adaptation) AND (Object detection) OR (Classification))

4. ((3D Domain adaptation) AND (Neural network))

Table 5.3: Results obtained after conducting literature review

Since PointDAN and the self-supervised network [37, 1] are recent research contributions, not many papers have cited or referred to them yet, whereas Pointnet++ and Pointnet [36, 34] have been cited in roughly 2500-3000 research papers. Published articles and literature found relevant to the thesis research area were selected through the snowballing technique, a search strategy described by Wohlin et al. [47] for selecting relevant articles, papers and academic literature specific to the research topic. After identifying the papers related to the topic, forward and backward snowballing were applied. To narrow down the search and to maintain the quality of the literature being assessed, the following inclusion and exclusion criteria were taken into consideration.

Inclusion criteria

• Journals with an impact factor (specific to the area and topic of research).

• Literature that discusses domain adaptation, point cloud data and object classification or detection.

• Published between 2015-2020.

• Full-length published articles in journals and proceedings.

• Presence of qualitative and quantitative results.

Exclusion criteria

• Short papers from conferences.

• Chapters in a book.

• Old articles, PowerPoint presentations and abstracts.

After carrying out a comprehensive literature review of the various deep learning models applied to achieve domain adaptation, some of the models were shortlisted for a comparative analysis of their performance on the generated data. The models listed below are experimented with for domain adaptation on point clouds.

• PointDAN [37]

• Self-supervised domain adaptation network [1]

• Pointnet++ [36]

• Pointnet [34]

The motivation for selecting these specific models is that the processing of point cloud data and the handling of the label structure are complicated for many models. Also, these are some of the recent state-of-the-art models, published between 2017 and 2020, that matched our research topic. Hence, these models were carefully analysed and selected so as to avoid drawbacks such as low accuracy when experimenting with the data that has been generated.

Each of these models includes modules such as local feature alignment, global feature alignment, both local and global feature alignment, or no feature alignment, as explained in chapter 4. These modules help the source data and target data match their common features and reduce the gap between the domains. In this project, a deeper investigation of the above models is carried out, and the design of the experiment on the data is explained in this chapter.

5.6 Methodology and Experimental design

In this research, RQ1 is answered by analysing the data generated and reducing the gap between CAD and CAM by narrowing the CAD image to a single-view point cloud image, which is explained in chapter 6. Literature review and controlled experiment are applied for answering research questions RQ2 and RQ3. These methods are chosen since the variables involved (dependent and independent) in this project are understandable and clearly defined.

Other methodologies, such as a case study, are not appropriate for this research, as the aim is to implement a deep learning model that learns from synthetic 3D point clouds (source domain) and performs well on actual 3D point clouds (target domain). This can be accomplished only by conducting a controlled experiment. The deep learning models chosen through the literature review are trained and tested with the generated data. Finally, a comparative analysis is drawn among the models to check their performance and find the most suitable model for our dataset.

1. Independent variables are the deep learning models used in the experiment.

2. The dependent variable is the performance of the deep learning models, measured by training time and accuracy.

5.6.1 Experimentation on the deep learning models

Training

In this project, the dataset generated is specific to the manufacturing industry. This 3D point cloud data is given to the models to check their performance and accuracy. Both the train and test data are given as input to the model for it to learn. Each model is trained based on the dataset, which is structured into a CAD dataset and a CAM dataset, i.e. the data labelling hierarchy.

Four different classes related to our use-case are placed inside the CAD and CAM folders respectively, and the division into train and test data happens inside each of these classes. The deep learning models PointDAN, self-supervised, Pointnet++ and Pointnet are fed with CAD data and a negligible amount of CAM data, so that they can learn from both domains and predict on CAM data.

The networks are trained using a method similar to transfer learning. Transfer learning is used because it can decrease the training time of the model and the computational resources required.
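A minimal sketch, assuming PyTorch, of what such a transfer-learning-style setup can look like is given below; the toy network, the attribute name classifier and the (commented) checkpoint path are illustrative assumptions and not the project's actual code.

import torch
import torch.nn as nn

# Toy stand-in for a point cloud network: a feature extractor plus a classifier head.
class ToyPointNet(nn.Module):
    def __init__(self, num_classes=4):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 128))
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x))

model = ToyPointNet()
# model.load_state_dict(torch.load("pretrained_weights.pth"), strict=False)  # hypothetical checkpoint

# Freeze the feature extractor and fine-tune only the classifier head,
# which is what reduces training time and compute.
for name, param in model.named_parameters():
    if not name.startswith("classifier"):
        param.requires_grad = False

optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-4)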

Hyperparameters

The hyperparameter settings used for each model are listed below; a short configuration sketch using these values follows the lists.

PointDAN

1. Optimizer = Adam

2. Learning rate = 0.0001

3. Weight decay = 0.0005

4. Batch size = 64

5. Epoch = 150

Self-supervised

1. Optimizer = Adam

2. Learning rate = 0.0001-0.001

3. Weight decay = 0.00005-0.0005

4. Batch size = 32-64

5. Epoch = 150

Pointnet++

1. Optimizer = Adam

2. Learning rate = 0.001

3. Weight decay = 0.5

4. Batch size = 32-64

5. Epoch = 150

Pointnet

1. Optimizer = Adam

2. Learning rate = 0.001

3. Weight decay = 0.5 increased to 0.99 gradually.

4. Batch size = 32-64

5. Epoch = 150
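As a sketch of how the PointDAN settings above translate into a PyTorch configuration (the other models follow the same pattern with their respective values), with a placeholder network standing in for the actual architectures:

import torch
import torch.nn as nn

model = nn.Linear(3, 4)  # placeholder; the real networks are the models listed above

# PointDAN values from the list above.
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001, weight_decay=0.0005)
batch_size = 64
num_epochs = 150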

Experimental setup

This project is developed on PyTorch version 1.0 with Python version 3.6. cuDNN and CUDA were installed so that the deep learning models could train on an Nvidia Quadro P5000 GPU (Graphics Processing Unit) with 32 GB of RAM. Python, one of the simplest and most user-friendly programming languages, was chosen for implementing the deep learning models.
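A short device-selection sketch consistent with this setup (illustrative, not the project's actual script):

import torch

# Train on the GPU when CUDA/cuDNN are available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if device.type == "cuda":
    print("Training on:", torch.cuda.get_device_name(0))
else:
    print("Training on CPU")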

5.6.2 Evaluation

The performance of the deep learning models experimented with is measured through accuracy and training time. Accuracy is defined as the number of correct predictions made by the deep learning model over the total number of predictions. Accuracy is a good performance metric if the classes in the target data are balanced.

Accuracy = No. of correct predictions / Total no. of predictions   (5.1)

The amount of time taken for the deep learning model to finish training and testing on the data is also measured. Training time may vary based on the architecture of the individual deep learning models, and sometimes it also depends on the data given to the model. These evaluation metrics are chosen because the main idea of this project is how accurately the model can identify the images in the CAM dataset when it learns from the CAD dataset, which supports the application of automatic quality inspection. Calculating the accuracy and training time of each deep learning model therefore makes it possible to compare their performance and conclude which of the selected models is best.
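A minimal sketch of how these two metrics could be computed for a PyTorch classifier; model and test_loader are assumed to exist, and the code is illustrative rather than the project's actual evaluation script.

import time
import torch

def evaluate(model, loader, device):
    # Accuracy = number of correct predictions / total number of predictions (equation 5.1).
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for points, labels in loader:
            points, labels = points.to(device), labels.to(device)
            predictions = model(points).argmax(dim=1)
            correct += (predictions == labels).sum().item()
            total += labels.size(0)
    return correct / max(total, 1)

# Training time is measured by timing the training loop.
start = time.time()
# train(model, train_loader, optimizer, device)   # training loop omitted in this sketch
train_time_seconds = time.time() - start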

Chapter 6 Understanding the data

This chapter analyses the data generated, i.e. the CAD images and the images scanned from the 3D camera. The previous chapter outlined the experimental studies and the various techniques involved in preparing the data. This chapter discusses the outcomes of the techniques used to narrow the gap between CAD and CAM. It also discusses the single view point cloud and the other methods applied to standardise the data and make the two domains similar, which answers research question 1 on how to bridge the gap between CAD and CAM images. The sections are presented in the order needed to achieve the goal of matching the source and target data.

6.1 WRL to point cloud conversion

The WRL file of the CAD image is converted to a point cloud file with the PLY extension, since the test data taken from the camera is in the PLY format. To match the data files between both domains, this conversion is required. The PLY file format, also known as the polygon file format or Stanford triangle format, is designed to store three-dimensional data and its properties taken from the 3D camera. The header in the text file specifies the number of vertices and polygons as well as properties associated with the 3D image, such as points, normals and coordinates.
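For reference, a typical ASCII PLY header of the kind described here looks as follows; the vertex count and the comment line are placeholders.

ply
format ascii 1.0
comment hypothetical scan of the front wheel
element vertex 120000
property float x
property float y
property float z
property float nx
property float ny
property float nz
element face 0
property list uchar int vertex_indices
end_header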

The conversion from WRL to PLY is performed with the software known as MeshLab. The MeshLab server is a tool that batch-converts the specified input files to the required output formats supported by MeshLab. The processing is done through batch-mode conversion, which allows the operations to be automated without the need for a GUI.

The MeshLab server is invoked with three arguments:

• Input file

• Output file

• Script file

The script file is an XML file read by the MeshLab server; it consists of filters, along with their parameters, which are applied to the input file.
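A hedged sketch of how such a batch conversion could be driven from Python; the -i, -o and -s flags follow the usual MeshLab server command line, but the exact invocation and the filter script used in the project may differ.

import glob
import subprocess

# Convert every exported WRL file to PLY using MeshLab server in batch mode.
# filter_script.mlx is an assumed MeshLab filter script (it may be empty for a plain conversion).
for wrl_path in glob.glob("cad_exports/*.wrl"):
    ply_path = wrl_path.replace(".wrl", ".ply")
    subprocess.run(
        ["meshlabserver", "-i", wrl_path, "-o", ply_path, "-s", "filter_script.mlx"],
        check=True,
    )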

Figure 6.1: MeshLab software

6.2 Single view point cloud

This section explains the functions used and the methods applied to obtain a single view point cloud from a full view point cloud. The 3D image generated from the CAD software is a full view point cloud of the pedal car. As the use case of this research is the front wheel of the pedal car, the rest of the view must be discarded and only the wheel projected as part of the training data. Once the data is converted from WRL to PLY files, the projection of the wheel is carried out.

Single view point cloud can be derived through various approaches, some of which are:

• Perspective projection

• Homography

• Interactive visualization (Crop Geometry)

• Image reconstruction

Perspective projection

Perspective projection, also known as linear projection, is defined as mapping the points of the image onto a 3-dimensional or 2-dimensional space or plane. It is categorised into three kinds of perspectives (one-point, two-point and three-point) depending on the orientation of the plane to the axis of the object. These transformations of the points onto the plane are carried out based on mathematical formulae.

Figure 6.2: Example of perspective projection [49]

Homography

A homography matrix describes the transformation between two planes. It is a 3×3 matrix with 8 DOF (degrees of freedom). Uses of the homography transformation are:

1. Perspective correction of the image.

2. Camera pose estimation

Figure: the homography matrix with its normalization equation.
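For completeness, the standard homography relation referred to here can be written (in LaTeX notation) as a mapping between homogeneous image points, defined only up to scale, which is why the 3×3 matrix has 8 degrees of freedom once it is normalized:

\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} \sim
H \begin{pmatrix} x \\ y \\ 1 \end{pmatrix},
\qquad
H = \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix},
\qquad h_{33} = 1 \ \text{(normalization)}.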

Interactive visualization(Crop geometry)

This concept is from the Open3d library in Python. Open3d is used to perform advanced visualisations and to crop the required image for the training data. It is also used for processing 3D files and is comparatively convenient and well regarded for its functionality. It can be set up on various platforms with some basic instructions [53].

Image reconstruction

Through certain imaging techniques, such as projections of the object, and by using iterative algorithms on 2D and 3D images, image reconstruction can be used to obtain view points. One of the major methods is a CNN that retrieves a single view from the 3D object by interpreting the points geometrically and semantically [45].

From the methods explained above, Interactive Visualization (Crop Geometry) through Open3d is selected to obtain the single view point cloud image (wheel) from the full view point cloud (pedal car), because compared to homography and image reconstruction, applying Interactive Visualization (Crop Geometry) was more feasible for the images generated.

This procedure executes two main functions to obtain the required result:

1. Crop_geometry()

2. Manual_registration()

Crop_geometry(), also known as the polygon selection function, reads the point clouds from the loaded image and calls another function, draw_geometries_with_editing(), which provides a feature for selecting the vertices of the image and cropping it. As seen in figure 6.4, by selecting the coordinates of the boundaries, the polygon is cropped; the remaining image is discarded and the cropped polygon is saved.
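A minimal Open3D sketch of this cropping step, assuming placeholder file paths; the editing window opened by draw_geometries_with_editing() lets the user select a polygonal region around the wheel and save the cropped cloud from within the editor.

import open3d as o3d

# Load the full-view pedal car point cloud (path is a placeholder).
pcd = o3d.io.read_point_cloud("pedal_car_full_view.ply")

# Open the interactive editing window; the wheel region is selected and cropped there,
# and the cropped point cloud is saved from the editor itself.
o3d.visualization.draw_geometries_with_editing([pcd])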

Figure 6.3: Pedal Car

Figure 6.4: Cropping the pedal car to project only the wheel with bounding box using functions and libraries in python.

By using point-to-point ICP, the Manual_registration() function registers two point clouds and aligns them. Before alignment it also reads the point clouds and visualises the image. Figure 6.5 shows the cropped wheel, viewed from a top angle perspective. Figure 6.6 depicts the single view point cloud of the wheel after cropping.
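A hedged sketch of this point-to-point ICP alignment with Open3D; depending on the library version the registration module is o3d.pipelines.registration (newer releases) or o3d.registration (older ones), and the threshold, initial guess and file paths are illustrative.

import numpy as np
import open3d as o3d

source = o3d.io.read_point_cloud("cad_wheel_single_view.ply")   # placeholder paths
target = o3d.io.read_point_cloud("cam_wheel_scan.ply")

threshold = 2.0               # maximum correspondence distance (illustrative value)
init = np.identity(4)         # initial transformation guess

result = o3d.pipelines.registration.registration_icp(
    source, target, threshold, init,
    o3d.pipelines.registration.TransformationEstimationPointToPoint(),
)
print(result.transformation)  # 4x4 rigid transform aligning source onto target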

Figure 6.5: Cropped wheel

Figure 6.6: Single view point cloud wheel

6.3 Generation of data from camera

Figure 6.7 shows the 3D camera/scanner along with the robot used to generate the real images. This 3D camera, attached to the robot, is operated using the robot operator shown in figure 6.8, which lets the robot scan the wheel vertically.

Figure 6.7: TrispectorP1000 along with the robot

Figure 6.8: Operator of the robot to scan the wheel

To analyse both the train and test images along with their coordinates, they are cross-verified in a 3D point cloud software known as CloudCompare. Initially, the purpose of this software was to detect changes in high-density 3D point clouds captured from laser cameras or scanners in industrial organizations; later it was revised into a more advanced 3D data processing software. Some manual rendering functions and advanced processing algorithms are provided in this software for performing operations like:

• Projections

• Geographical features estimation

• Registration

• Distance computation between the points

• Segmentation

It supports the file extensions of PLY, STL, OBJ, VTK, ASCII etc.

Both the test and train images obtained from the different sources are then checked to see whether their coordinates match. If there is a large gap between the coordinate values of the two images, the scale of the generated image is increased. Figure 6.9 below depicts the match of the coordinates of both test and train images; this ensures the size and structure of both sets of data are alike.
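A short sketch, assuming Open3D and placeholder paths, of the kind of coordinate-range check described above:

import open3d as o3d

cad = o3d.io.read_point_cloud("cad_wheel_single_view.ply")   # placeholder paths
cam = o3d.io.read_point_cloud("cam_wheel_scan.ply")

# Compare the axis-aligned extents of the two clouds to see whether the scales match.
cad_extent = cad.get_max_bound() - cad.get_min_bound()
cam_extent = cam.get_max_bound() - cam.get_min_bound()
print("CAD extent (x, y, z):", cad_extent)
print("CAM extent (x, y, z):", cam_extent)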

Figure 6.9: CloudCompare Software

6.4 Other methods

Other approaches to obtain a single view of the point cloud image were attempted and experimented with using PCL (Point Cloud Library), but they did not give the expected results. PCL contains various algorithms and modular libraries for 3D point cloud data processing tasks; some algorithms cover feature estimation, surface reconstruction, 3D registration etc. However, due to compatibility issues with the PCL library, which was developed a few years ago, visualising the point cloud image failed and the conversion from WRL to PLY was not supported.

The data obtained by conducting the processes mentioned above is fed to the deep learning models. The experiment conducted with this data on the models is explained in chapter 5.

Chapter 7 Results

This chapter discusses the results achieved by the four deep learning models, PointDAN, Pointnet++, Pointnet and the self-supervised model, and explains some of their advantages and disadvantages. The performance of these four models is then compared for the analysis. The evaluation of the four pre-trained models is analysed based on accuracy and training time. The graphical representations below depict the train data and test data for all four classes from the two domains, CAD and CAM, in the dataset considered in this project.

7.1 Results

7.1.1 Model 1-PointDAN

Figure 7.1 demonstrates the overall accuracy predicted by the PointDAN model and depicts the change in accuracy with a gradual increase in epochs. The accuracy reaches up to 82% when the model is trained on the CAD data and tested on the CAM data. The loss vs epoch graph shown in figure 7.2 details the progressive decrease in the loss, which ends at 0.98 and indicates that the prediction accuracy is good. The training time taken for this model to finish was about 1 hour 30 minutes.

From figure 7.1, we observe two accuracies: the blue line represents train accuracy and the orange line represents test accuracy. Train accuracy is the accuracy obtained on the training data and test accuracy is the accuracy obtained on the test data. Train accuracy is validated by fine-tuning the deep learning model and optimising the parameters. The graph for train accuracy is the outcome of training the model with CAD data and cross-validating on some of the CAD data. By doing so, observations can be drawn from both training and testing data, which is useful for assessing the robustness of the deep learning model.

As mentioned in the data section of chapter 5, the data is split in an 80-20 ratio: 80% for training and 20% for testing. The same applies to the other models, where both train and test accuracies are observed to check whether over-fitting or under-fitting takes place. Looking at all the graphs, the results seem to be in a good state, as neither over-fitting (a large gap between the train and test lines) nor under-fitting (the test line higher than the train line) is seen.

Figure 7.1: Overall accuracy vs epoch graph when the data is tested with the PointDAN network.

Figure 7.2: Loss vs epoch graph when the data is trained and tested on PointDAN

Figure 7.3 details each class in the dataset and its accuracy. Class 1, which has both the wheel and the screw correctly placed, has an accuracy of about 76%. Class 2 has the wheel placed correctly but no screw, with an accuracy of 79%. Class 3, with the wheel placed incorrectly and the screw present, has an accuracy of about 82%. Class 4 has an accuracy of 40%, with no screw and incorrect wheel placement. Compared to the other classes, class 4 has a low accuracy on average. This is due to the quality of the data generated for class 4 being low.

Figure 7.3: Class-wise accuracy for PointDAN model

7.1.2 Model 2-Self-supervised network

When the data is trained on the self-supervised network, the accuracy attained is around 79%. Figure 7.4 demonstrates the overall accuracy predicted by the self-supervised network and depicts the change in accuracy with a gradual increase in epochs. These results are obtained when the model is trained on the CAD data and tested on the CAM data. The loss vs epoch graph shown in figure 7.5 details the progressive decrease in the loss, with slight variations at the end, reaching 0.80.

Figure 7.4: Overall accuracy vs epoch graph when the data is tested with the self-supervised network.

Figure 7.5: Loss vs epoch graph when the data is trained and tested on the self-supervised network.

Figure 7.6 explains each class in the dataset and its accuracy. Class 1, which has both the wheel and the screw correctly placed, has an average accuracy of about 80%. Class 2 has the wheel placed correctly but no screw, with an accuracy of 81%. Class 3, with the wheel placed incorrectly and the screw present, has an accuracy of about 89%. Class 4 has an accuracy of 40%, with no screw and incorrect wheel placement. As described for the PointDAN results, the accuracy for class 4 turned out lower due to the low quality of its data.

Figure 7.6: Class-wise accuracy on Self-supervised model.

7.1.3 Model 3-Pointnet++

The Pointnet++ model produced an outcome with an accuracy of 76%. Figure 7.7 demonstrates the overall accuracy predicted by Pointnet++ and details the changes in accuracy with the increase in the number of epochs. The loss vs epoch graph shown in figure 7.8 shows the progressive decrease in the loss, which ends at 0.79.

Figure 7.7: Overall accuracy vs epoch graph when the data is tested with Pointnet++

Figure 7.8: Loss vs epoch graph when the data is trained and tested on Pointnet++

Figure 7.9 details each class in the dataset and its accuracy. Class 1, which has both the wheel and the screw correctly placed, has an average accuracy of about 71%. Class 2 has the wheel placed correctly but no screw, with an accuracy of 79%. Class 3, with the wheel placed incorrectly and the screw present, has an accuracy of about 63%. Class 4 has an average accuracy below 39%, with no screw and incorrect wheel placement, due to the bad quality of the data that was generated. Factors like lighting and the orientation of the wheel could be the possible reasons for the bad quality.

Figure 7.9: Class-wise accuracy on Pointnet++

7.1.4 Model 4-Pointnet

Figure 7.10 demonstrates the overall accuracy predicted by the Pointnet network and depicts the change in accuracy with a gradual increase in epochs. The accuracy reaches up to 75% when the model is trained on the CAD data and tested on the CAM data. The loss vs epoch graph shown in figure 7.11 details the progressive decrease in the loss, which indicates that the prediction accuracy is good. The loss was around 0.75, with a few variations in the last epochs.

Figure 7.10: Overall accuracy vs epoch graph when the data is tested on Pointnet

Figure 7.11: Loss vs epoch graph when the data is trained and tested on Pointnet

Figure 7.12 captures the accuracies of each class in the dataset. Class 1, which has both the wheel and the screw correctly placed, has an accuracy of about 61%. Class 2 has the wheel placed correctly but no screw, with an accuracy of 58%. Class 3, with the wheel placed incorrectly and the screw present, has an accuracy of about 75%. Class 4 has an accuracy of 48%, with no screw and incorrect wheel placement.

Figure 7.12: Class-wise accuracy on Pointnet

NOTE: The accuracies mentioned for the class-wise graphs are the mean accuracies.

Figure 7.13 illustrates the accuracies of the 4 chosen deep learning models in one graph for performance analysis and comparison. From the plotted graph, it is observed that the PointDAN deep learning model performs well on the generated dataset, i.e. on the CAD and CAM images, and adapts well from one domain to another, with increased accuracy compared with the other deep learning models chosen for the experiment. This is because the PointDAN model has both local and global feature alignment to match the features between the two domains, whereas the others use either only local-level alignment or only global-level alignment. Reducing the full view point cloud CAD image to a single view point cloud CAD image also played an important role in the increase in accuracy, since having a single view point cloud image for both domains made it easier to match the features.

Figure 7.13: Accuracies of the 4 deep learning models

Chapter 8 Analysis and Discussions

This chapter analyses the findings and results of the experimentation. Some discussions on the behaviour of the deep learning models with the data generated in this project are presented. Evaluation factors such as accuracy and training time are compared and analysed to check which deep learning model best fits the dataset. Additionally, the performance of the selected deep learning models on our dataset is compared with their performance on a generalised dataset (open-source data used in previous research, consisting of standard images of tables, chairs, beds etc.) to see the changes in domain adaptation from synthetic to real images.

8.1 PointDAN

The results of this deep learning model on our dataset are comparatively better than those obtained with the generalised dataset. While the accuracy is about 82% with our dataset, the model trained on the generalised dataset has an accuracy of 64%. The PointDAN deep learning model turned out to be the best fit for our generated dataset, as it focuses on both local feature alignment and global feature alignment.

Table 8.1: Accuracy results on PointDA-10 Dataset with different domains[37]

From table 8.1, previous research on PointDAN was based on adaptation between three domains, while our work focuses on domain adaptation between 2 domains, i.e. from CAD to CAM. Table 8.1 outlines the accuracies when adaptation takes place from one domain to another. For example, M->S indicates that the source data is the ModelNet dataset and the target dataset is ShapeNet; the other columns are likewise accuracies drawn from adaptation between different domain datasets. PointDAN and PointDAN1 are the two versions of the model implemented with the G (global alignment), L (local alignment) and A (self-training) components observed in the architecture.

8.2 Self-supervised network

The self-supervised network, when trained on our dataset, gives results with an accuracy of 79%; on the other hand, the same model on a generalised dataset was shown to produce results with an average accuracy of 75%, as shown in table 8.2. When a comparative analysis of the performance on the different datasets is drawn, it can be said that both results have almost equal outcomes.

Table 8.2: Test set accuracy results on general dataset [1]

Since the self-supervised deep learning model is structured differently compared to PointDAN, it may not be the preferred deep learning model to experiment with further, as our dataset showed better results with PointDAN.

Another possible explanation for why the self-supervised network is less preferable in our scenario is that it deals with deformation reconstruction, which involves de-constructing the image into smaller regions and re-constructing them again, something highly suitable for data such as tables, chairs, lamps etc. Since our data mainly focuses on the whole wheel and some parts within the wheel itself, reconstruction is not an ideal choice here.

8.3 Pointnet++

The Pointnet++ deep learning model focuses especially on local-level feature alignment: its architecture divides the data points of the image into specific local regions, which are grouped for extracting features and applying segmentation or classification. Table 8.3 displays the results of the Pointnet++ deep learning model on general data, with an accuracy of 91.9%. When the same deep learning model is trained on our dataset, the results show an accuracy of 76%.

Though the Pointnet++ deep learning model has proven to give better results on the generalised dataset, it was not applying the concept of domain adaptation during training, since the general dataset comprises only single-domain images, i.e. either the ModelNet40 point cloud dataset or the ScanNet point cloud dataset.

Table 8.3: Pointnet++ deep learning model accuracy on ModelNet40 data [36]

Since the concept of domain adaptation was not previously applied to Pointnet++ with the general dataset, the results can vary quite a lot, as in this project we tried to apply domain adaptation on the Pointnet++ deep learning model with CAD point cloud data as the source and CAM point cloud data as the target.

8.4 Pointnet

Table 8.4 shows the results of the Pointnet deep learning model when trained on a general dataset (ShapeNet). The mean accuracy achieved was 83%, which is comparatively higher than that of the same model trained with our data. However, Pointnet is designed to solve problems like classification or segmentation using data from a single domain, and table 8.4 again reports accuracies obtained from data of the same domain. This could be a possible reason why the accuracy of Pointnet with general data is higher than with our data, which reaches an accuracy of 75%.

Table 8.4: pointnet deep model accuracy on shapenet data [34]

NOTE: Tables 8.1, 8.2, 8.3 and 8.4 are reproduced from the respective cited references.

Table 8.5 and table 8.6 present the results derived from our data in a comparative form. These tables contain the accuracy and training time values. In table 8.5, it is observed that the PointDAN model achieved a mean accuracy of 82.65% when trained on the generated data. The time taken for the model to train was about 1 hour 30 minutes. The self-supervised model had an outcome of 79.41% with our data. This model comparatively took a very long time to train because the network architecture is heavy, i.e. according to its architecture this model has to perform reconstruction and deconstruction tasks for each image.

Table 8.5: Comparative analysis of accuracies with PointDAN and self-supervised deep learning models

Table 8.6: Comparative analysis of accuracies with Pointnet++ and Pointnet deep learning models

From table 8.6, Pointnet++ showed an accuracy of 76.91% with a training time of 7 hours, and Pointnet 75.97% with 1 hour of training time on our data. This is observed because the domains of the source and target data are similar to each other, and these models have comparatively simple architectures for point cloud data. That said, the feature alignment could have matched the points of the images more accurately, leading to higher accuracy.

From the results shown in chapter 7 and from tables 8.5 and 8.6, we can conclude that the PointDAN deep learning model has shown quite an improvement with our data. Another explanation for PointDAN being a better deep learning model for our dataset compared with the remaining three is that PointDAN exhibits both local feature alignment and global feature alignment in its architecture, which reduces the gap between the CAD and CAM domains and eventually gives better accuracy results, justifying our scenario of automatic quality inspection.

8.5 Limitations and Challenges

This section briefs on some of the challenges and limitations faced during this project period.

The size of the dataset considered was limited due to the chosen use-case and the time period available for generating the data. The data generated was based on various possibilities of the orientation of the wheel. If all possibilities and angles of the wheel were taken into consideration, data generation itself would consume a lot of time, as the 3D camera scans each image vertically and moves back to its initial position to start the process again.

In some cases, due to the bad quality of the image taken from the 3D camera, some of the data had to be filtered out for the models to train and test properly. Because of this filtering, the amount of data left for the models to learn from and adapt with made it difficult to produce better results.

Since the data is generated by considering 4 different classes, all 4 classes were important for collectively increasing the accuracy of the model. But, from the dataset generated, it was observed that the class 4 data (no screw and no proper wheel placement) shows comparatively bad performance in adapting. This is due to the quality of that particular class being bad, which therefore decreases the average accuracy of the model.

Another challenging aspect during the course of this project was matching the coordinates of the images from both domains (CAD and CAM) to maximise the accuracy of the results. Programmatically it was not possible to match the coordinates, as the CAM data had a large coordinate range compared to the CAD data. Hence, we decided to change the settings (increase the coordinate size) in the software tool that generates the CAD data.

8.5.1 Validity threats

Internal validity

There could be an internal validity threat if the factors affecting the investigation are not known to the researcher [40].

The main focus of the data generated in this project is to work on domain adaptation. Since there were around 4800 CAD images and only 800 CAM images, the deep learning models trained on the dataset could show bias in their performance. This is due to the lack of sufficient data. In addition, because of the bad quality of some of the generated data, more data was filtered from the dataset, leaving inadequate data for testing. This issue can be resolved by increasing the dataset with better quality data through proper wheel orientations when scanning with the 3D camera.

External validity

The extent to which the research can be replicated, and the results obtained can be used outside this research [40].

External validity does not pose a threat, since the data used in this project is real and was used to formulate and build a solution. Various criteria were considered for creating the dataset and then filtering it to bring out optimal results. This project allows for replication, as it covers the measures followed in data generation and the use of various deep learning models.

Chapter 9 Conclusions and Future work

This chapter concludes with the learnings discovered in this research project and also discusses the future work that can further improve the results.

9.1 Conclusion

This thesis is an initiative taken by Scania to explore and examine the process of automatic quality inspection using deep learning models. Over the course of this thesis project, it has been shown that automatic quality inspection can be developed and utilised in assembly and manufacturing processes. Deep learning models were investigated for domain adaptation, and data images from CAD and the 3D camera were generated. Four CNN-based deep learning models were experimented with on the generated dataset. The experimental results demonstrated that the models selected after the literature review were feasible and could give good results that reduce the gap between domains.

The deep learning models considered were trained on CAD point cloud images and could reduce the domain gap by identifying the use-cases, such as screw and wheel placement, on the actual point cloud images (3D camera point clouds). Initially, the CAD images generated were full view point clouds of the whole pedal car. But this project focused on the wheel of the pedal car, with the use-cases of screw presence and wheel alignment, since the 3D camera can scan only a single view of the pedal car. The full view point cloud CAD image is therefore cropped to a single view point cloud by separating only the required wheel. The dataset, which includes CAD point cloud images and CAM point cloud images, is created after the single view point cloud is generated.

A controlled experiment is performed on the deep learning models PointDAN, self-supervised network, Pointnet++ and Pointnet to compare their performance based on accuracy and training time. Among all these models, the PointDAN deep learning model performed better than the others, as the PointDAN architecture includes both local-level alignment and global-level alignment. Having both these features in the network has been an advantage in reducing the gap between the domains, with increased accuracy in identifying the use-cases mentioned. Another main reason for the PointDAN model performing better on the generated data was the reduction of the difference between the CAD and CAM images. This was done by focusing on a single view point cloud: the CAD images were reduced from full view point clouds to single view point clouds to match the CAM images. By doing this, the alignment of features between the different domains was more accurate and the model could perform better with our data. The self-supervised deep learning network was second best, and its results were close to PointDAN. The Pointnet++ model focused on local region feature alignment while Pointnet focused on global-level alignment of the features.

This project is concluded by,

• Applying and implementing the concept of domain adaptation which helps in automatic quality inspection of assembly.

• Improving the results of PointDAN on our dataset by reducing the gap between CAD and CAM images through investigating single view point clouds.

• Drawing comparisons and summarising the results achieved with local-level alignment, global-level alignment, both local and global level alignment, and reconstruction-deconstruction of features. The model with both local-level and global-level alignment proves to achieve better results.

9.2 Answering the research questions

RQ1

How to narrow the gap between CAD generated images and camera generated images, to make them look similar?

By analysing the source data (CAD) and target data (CAM) and applying data pre-processing techniques, the CAD and CAM images are matched by making them look similar. Initially, the CAD image was a full view point cloud image of the pedal car, i.e. the whole pedal car was generated as an image instead of just the wheel. This CAD image is therefore cropped by creating a bounding box and discarding the other parts, reducing it to a single view point cloud image that is similar to the CAM image.

RQ2

Can domain adaptation be achieved from synthetic images (CAD) to real images (CAM)?

Through an extensive literature review, state-of-the-art deep learning models such as pointDAN, Self-supervised network, Pointnet++, Pointnet are selected to perform 3D domain adaptation from synthetic point cloud data(CAD images) to actual point cloud data(CAM images).

RQ3

How is the performance of the selected deep learning models from RQ2 analysed on target data (CAM)?

A controlled experiment is performed on the 4 deep learning models and the results are analysed based on the accuracy and training time of each model. The PointDAN deep learning model was shown to achieve better results for domain adaptation when compared to the other models.

9.3 Recommended Future work

This project mainly focuses on a specific use-case and two variations observed in the wheel for automatic quality inspection of the front wheel of a pedal car. One recommended extension to this project is to consider different scenarios or use-cases. In this project we looked into the aspects of wheel alignment and screw placement; this could be extended with additional variations in the wheel, such as checking whether the shape of the fitted screw is correct, which would increase the number of classes.

Another suggestion is to look into more variants of deep learning models for domain adaptation, as more state-of-the-art models are currently being researched. These models can be trained with our data to generate more comprehensive results for comparison.

Different data augmentation methods, like image scaling or even adding noise to the data, could help achieve better performance. As the size of the data also has an effect on a deep learning model's performance, future researchers can increase the size of the dataset with better quality data.

References

[1] Idan Achituve, Haggai Maron, and Gal Chechik. Self-supervised learning for domain adaptation on point-clouds. arXiv preprint arXiv:2003.12641, 2020.

[2] Antonio Alliegro, Davide Boscaini, and Tatiana Tommasi. Joint supervised and self-supervised learning for 3d real-world challenges. arXiv preprint arXiv:2004.07392, 2020.

[3] Jason Brownlee. A gentle introduction to transfer learning for deep learning. Machine Learning Mastery, 2017.

[4] Wenzheng Chen, Huan Wang, Yangyan Li, Hao Su, Zhenhua Wang, Changhe Tu, Dani Lischinski, Daniel Cohen-Or, and Baoquan Chen. Synthesizing training images for boosting human 3d pose estimation. In 2016 Fourth International Conference on 3D Vision (3DV), pages 479–488. IEEE, 2016.

[5] Sumit Chopra, Suhrid Balakrishnan, and Raghuraman Gopalan. Dlid: Deep learning for domain adaptation by interpolating between domains. In ICML workshop on challenges in representation learning, volume 2, 2013.

[6] Aysegul Dundar, Ming-Yu Liu, Ting-Chun Wang, John Zedlewski, and Jan Kautz. Domain stylization: A strong, simple baseline for synthetic to real image domain adaptation. arXiv preprint arXiv:1807.09384, 2018.

[7] Martin Engelcke, Dushyant Rao, Dominic Zeng Wang, Chi Hay Tong, and Ingmar Posner. Vote3deep: Fast object detection in 3d point clouds using efficient convolutional neural networks. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pages 1355–1361. IEEE, 2017.

[8] Arnaud Nguembang Fadja, Evelina Lamma, Fabrizio Riguzzi, et al. Vision inspection with neural networks. In RiCeRcA@ AI* IA, 2018.

[9] Yaroslav Ganin and Victor Lempitsky. Unsupervised domain adaptation by backpropagation. arXiv preprint arXiv:1409.7495, 2014.

[10] Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. Domain-adversarial training of neural networks. The Journal of Machine Learning Research, 17(1):2096–2030, 2016.

[11] Muhammad Ghifary, W Bastiaan Kleijn, Mengjie Zhang, David Balduzzi, and Wen Li. Deep reconstruction-classification networks for unsupervised domain adaptation. In European Conference on Computer Vision, pages 597–613. Springer, 2016.

[12] Raghuraman Gopalan, Ruonan Li, and Rama Chellappa. Domain adaptation for object recognition: An unsupervised approach. In 2011 international conference on computer vision, pages 999–1006. IEEE, 2011.

[13] Yulan Guo, Hanyun Wang, Qingyong Hu, Hao Liu, Li Liu, and Mohammed Bennamoun. Deep learning for 3d point clouds: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.

[14] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.

[15] Jeff Heaton. Ian goodfellow, yoshua bengio, and aaron courville: Deep learning, 2018.

[16] Weixiang Hong, Zhenzhen Wang, Ming Yang, and Junsong Yuan. Conditional generative adversarial network for structured domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1335–1344, 2018.

[17] Roberto Ierusalimschy, Luiz Henrique De Figueiredo, and Waldemar Celes Filho. Lua—an extensible extension language. Software: Practice and Experience, 26(6):635–652, 1996.

[18] Jyoti A Kodagali and S Balaji. Computer vision and image analysis based techniques for automatic characterization of fruits - a review. International Journal of Computer Applications, 50(6), 2012.

[19] Akash Kumar. Domain adaptation: An in-depth survey analysis. https://medium.com/analytics-vidhya/domain-adaptation-an-in-depth-survey-analysis-part-i-17c8b4d7f9c8, September 2019.

[20] Kevin Lai and Dieter Fox. Object recognition in 3d point clouds using web data and domain adaptation. The International Journal of Robotics Research, 29(8):1019–1037, 2010.

[21] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. nature, 521(7553):436–444, 2015.

[22] Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Xinhan Di, and Baoquan Chen. Pointcnn: Convolution on x-transformed points. In Advances in neural information processing systems, pages 820–830, 2018.

[23] Weiping Liu, Jia Sun, Wanyi Li, Ting Hu, and Peng Wang. Deep learning on point clouds and its application: A survey. Sensors, 19(19):4188, 2019.

[24] Haifeng Luo, Kourosh Khoshelham, Lina Fang, and Chongcheng Chen. Unsupervised scene adaptation for semantic segmentation of urban mobile laser scanning point clouds. ISPRS Journal of Photogrammetry and Remote Sensing, 169:253–267, 2020.

[25] Francisco Massa, Bryan C Russell, and Mathieu Aubry. Deep exemplar 2d-3d detection by adapting from real to rendered views. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6024–6033, 2016.

[26] Daniel Maturana and Sebastian Scherer. Voxnet: A 3d convolutional neural network for real-time object recognition. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 922–928. IEEE, 2015.

[27] Emilio Soria Olivas, Jose David Martin Guerrero, Marcelino Martinez Sober, Jose Rafael Magdalena Benedito, and Antonio Jose Serrano Lopez. Handbook Of Research On Machine Learning Applications and Trends: Algorithms, Methods and Techniques (2 Volumes). Information Science Reference - Imprint of: IGI Publishing, Hershey, PA, 2009.

[28] Xingchao Peng, Ben Usman, Kuniaki Saito, Neela Kaushik, Judy Hoffman, and Kate Saenko. Syn2real: A new benchmark for synthetic-to-real visual domain adaptation. arXiv preprint arXiv:1806.09755, 2018.

[29] Van Hiep Phung, Eun Joo Rhee, et al. A high-accuracy model average ensemble of convolutional neural networks for classification of cloud image patches on small datasets. Applied Sciences, 9(21):4500, 2019.

[30] Pedro O Pinheiro, Negar Rostamzadeh, and Sungjin Ahn. Domain-adaptive single-view 3d reconstruction. In Proceedings of the IEEE International Conference on Computer Vision, pages 7638–7647, 2019.

[31] Klaus Pottler, Marc Röger, Eckhard Lüpfert, and Wolfgang Schiel. Automatic noncontact quality inspection system for industrial parabolic trough assembly. Journal of solar energy engineering, 130(1), 2008.

[32] Florent Poux, Romain Neuville, Pierre Hallot, and Roland Billen. Model for reasoning from semantically rich point cloud data. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 4:107–115, 2017.

[33] Sanjay Purushotham, Wilka Carvalho, Tanachat Nilanon, and Yan Liu. Variational recurrent adversarial deep domain adaptation. 2016.

[34] Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 652–660, 2017.

[35] Charles R Qi, Hao Su, Matthias Nießner, Angela Dai, Mengyuan Yan, and Leonidas J Guibas. Volumetric and multi-view cnns for object classification on 3d data. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5648–5656, 2016.

[36] Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in neural information processing systems, pages 5099–5108, 2017.

[37] Can Qin, Haoxuan You, Lichen Wang, C-C Jay Kuo, and Yun Fu. Pointdan: A multi-scale 3d domain adaption network for point cloud representation. In Advances in Neural Information Processing Systems, pages 7190–7201, 2019.

[38] Lin Ruiyuan. MCL research on domain adaptation. http://mcl.usc.edu/news/2018/12/16/mcl-research-on-domain-adaptation/, December 2018.

[39] David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning representations by back-propagating errors. nature, 323(6088):533–536, 1986.

[40] Per Runeson and Martin Höst. Guidelines for conducting and reporting case study research in software engineering. Empirical software engineering, 14(2):131, 2009.

[41] Kuniaki Saito, Kohei Watanabe, Yoshitaka Ushiku, and Tatsuya Harada. Maximum classifier discrepancy for unsupervised domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3723–3732, 2018.

[42] Ozan Sener, Hyun Oh Song, Ashutosh Saxena, and Silvio Savarese. Learning transferrable representations for unsupervised domain adaptation. In Advances in Neural Information Processing Systems, pages 2110–2118, 2016.

[43] Baochen Sun and Kate Saenko. Deep coral: Correlation alignment for deep domain adaptation. In European conference on computer vision, pages 443–450. Springer, 2016.

[44] Vivienne Sze, YH Chen, TJ Yang, and J Emer. Efficient processing of deep neural networks: A tutorial and survey. arxiv 2017. arXiv preprint arXiv:1703.09039, 2017.

[45] Maxim Tatarchenko, Stephan R Richter, René Ranftl, Zhuwen Li, Vladlen Koltun, and Thomas Brox. What do single-view 3d reconstruction networks learn? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3405–3414, 2019.

[46] Garrett Wilson and Diane J Cook. A survey of unsupervised deep domain adaptation. arXiv preprint arXiv:1812.02849, 2018.

[47] Claes Wohlin, Per Runeson, Martin Höst, Magnus C Ohlsson, Björn Regnell, and Anders Wesslén. Experimentation in software engineering. Springer Science & Business Media, 2012.

[48] Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. 3d shapenets: A deep representation for volumetric shapes. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1912–1920, 2015.

[49] Xinchen Yan, Jimei Yang, Ersin Yumer, Yijie Guo, and Honglak Lee. Perspective transformer nets: Learning single-view 3d object reconstruction without 3d supervision. In Advances in neural information processing systems, pages 1696–1704, 2016.

[50] Li Yi, Boqing Gong, and Thomas Funkhouser. Complete & label: A domain adaptation approach to semantic segmentation of lidar point clouds. arXiv preprint arXiv:2007.08488, 2020.

[51] Sicheng Zhao, Yezhen Wang, Bo Li, Bichen Wu, Yang Gao, Pengfei Xu, Trevor Darrell, and Kurt Keutzer. epointda: An end-to-end simulation-to-real domain adaptation framework for lidar point cloud segmentation. arXiv preprint arXiv:2009.03456, 2020.

[52] Sicheng Zhao, Xiangyu Yue, Shanghang Zhang, Bo Li, Han Zhao, Bichen Wu, Ravi Krishna, Joseph E Gonzalez, Alberto L Sangiovanni-Vincentelli, Sanjit A Seshia, et al. A review of single-source deep unsupervised visual domain adaptation. arXiv preprint arXiv:2009.00155, 2020.

[53] Qian-Yi Zhou, Jaesik Park, and Vladlen Koltun. Open3d: A modern library for 3d data processing. arXiv preprint arXiv:1801.09847, 2018.

Appendix A UI for automatic quality inspection

A user interface (UI) for automatic quality inspection of the wheel was built to classify and inspect the quality of real-time images generated from the 3D camera. This UI is built with the Streamlit package for machine learning and data science applications.

Figure A.1: User Interface for automatic quality inspection

The start software button is clicked for the application to start the inspection of images generated from the 3D camera.

The images generated are stored in a folder in the computer automatically. The software then accesses the folder and inspects the images. This inspection is done in real-time. The results show the quality description of the image.
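A minimal Streamlit sketch of such a front end; the watch folder, the classify() helper and the label text are assumptions for illustration, not the application's actual implementation.

import os
import streamlit as st

WATCH_FOLDER = "incoming_scans"   # assumed folder where the 3D camera stores images

def classify(path):
    # Hypothetical placeholder: the real application would run the trained model here.
    return "Class 1 (screw present, wheel correctly placed)"

st.title("Automatic quality inspection")

if st.button("Start software"):
    if os.path.isdir(WATCH_FOLDER):
        for name in sorted(os.listdir(WATCH_FOLDER)):
            st.write(f"{name}: {classify(os.path.join(WATCH_FOLDER, name))}")
    else:
        st.write("No scan folder found.")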

Figure A.2: Software initialised and application running for automatic quality inspection

Figure A.3: Terminal that shows the accuracy of the image inspected for each class.

