<<

TRANSFER LEARNING IN

TAKAGI-SUGENO FUZZY

MODELS

Hua

Faculty of Engineering and Information Technology

University of Technology Sydney

A thesis submitted for the Degree of

Doctor of Philosophy

May 2018 Certificate of Authorship/Originality

I certify that the work in this thesis has not previously been submitted for a degree nor has it been submitted as part of the requirements for a degree except as fully acknowledged within the text. I also certify that the thesis has been written by me. Any help that I have received in my research work and the preparation of the thesis itself has been acknowledged. In addition, I certify that all information sources and literature used are indicated in the thesis.

Signature of candidate:

Date: Acknowledgements

I would like to express my sincere gratitude to my principal supervisor, Profes- sor Guangquan , and my co-supervisors, Professor Jie and Dr Vahid Behbood. Their comprehensive guidance covered all aspects of my PhD study, including research methodology, research topic selection, experiments, academic writing skills and thesis writing, even sentence structure and formulas. Their critical comments and suggestions strengthened my study significantly. Their strict academic attitude and respectful personalities benefited my PhD study and will be a great treasure throughout my life. Without their excellent supervision and continuous encouragement, this research would not have been finished on time. Thanks to all of for your kind help! I am grateful to all members of the Decision Systems and e-Service Intelli- gent (DeSI) Lab in the Centre for Artificial Intelligence (CAI) for their careful participation in my presentation and their valuable comments on my research. I wish to express my appreciation to all friends (especially Hongshu Chen, Qian Zhang, Xue and Fan Dong) for their help, accompany and valued comments in not only on my study, but also on my life. I would like to thank Ms Sue Felix and Ms Jem Moore for helping me to proofread my publications and this thesis. I wish to express my appreciation for the financial support I received for my study. Special thanks go to the University of Technology Sydney (UTS) and the China Scholarship Council (CSC). Also thanks for the support of the Australian Research Council under DP 140101366. iv

Last but not least, I would also like to thank my family members. Thanks to my mother, father and my brother for their continued encouragement and generous support. Thank you for supporting me and always being there for me during my ups and downs. Pursuing a PhD has always been a long-term challenge for me and a dream of my brother’s. This dream would have not been achieved without the love, great empathy and kind assistance of my family. Abstract

In classical data-driven machine learning methods, massive amounts of labeled data are required to build a high-performance prediction model. However, the amount of labeled data in many real-world applications is insufficient, so estab- lishing a prediction model is impossible. Transfer learning has recently emerged as a way of exploiting previously acquired knowledge to solve new yet similar problems much more efficiently and effectively. It exploits the knowledge ac- cumulated in auxiliary domains to help construct prediction models in a target domain with inadequate training data. Most existing methods on transfer learn- ing address classification tasks, but studies on transfer learning in the case of regression problems, which is important category in prediction models, are still scarce. In addition, the existing works ignore the inherent phenomenon of uncertainty - a crucial factor during the knowledge transfer process. To fill these gaps, this research develops algorithms and methods to deal with transfer learning in homogeneous and heterogeneous spaces using fuzzy rule- based models. Fuzzy rules are first generated from the source domain through a learning process. These rules, as acquired knowledge, are then transferred to the target domain. First, a novel fuzzy rule-based domain adaptation method is proposed to transfer knowledge between domains in homogeneous spaces. It utilizes the existing fuzzy rules of source domain and modifies the input space with nonlinear mappings to adapt to the current regression tasks in the target domain. Second, a granular fuzzy domain adaptation framework, comprising vi three methods, is developed to handle the knowledge transfer problems based on the idea of granular computing. Thirdly, a fuzzy domain adaptation method is developed to handle the case that the numbers of fuzzy rules in two domains do not match. In addition, the proposed methods are also used to solve the classification problems in transfer learning. Fourthly, an innovative method that combines an infinite Gaussian mixture model with active learning is presented to discover the structure of data and actively augment information in a target domain. Fifthly, a fuzzy heterogeneous domain adaptation method is proposed to transfer knowledge in heterogeneous spaces. The proposed algorithms and methods are validated in each step of development using experiments performed on both synthetic and real-world datasets. The results show that this study significantly improve the performance of existing models when solving new tasks in the target domain in both homogeneous and heterogeneous domain adaptation settings. Table of Contents

Certificate of Authorship/Originality ii

List of Figures xii

List of Tables xv

1 Introduction 1 1.1 Background ...... 1 1.2 Research questions and objectives ...... 5 1.3 Research contributions ...... 9 1.4 Research significance ...... 11 1.5 Thesis structure ...... 12 1.6 Publications related to this thesis ...... 14

2 Literature Review 17

2.1 Definitions and categories of transfer learning and the category . 17

2.2 Different types of transfer learning methods based on content . . 19

2.3 Types of transfer learning models based on domain knowledge . 21

2.4 Types of transfer learning based on computational intelligence techniques ...... 24 2.4.1 Transfer learning using neural networks (NN) ...... 24 2.4.1.1 Transfer learning using deep NN ...... 24 Table of Contents viii

2.4.1.2 Transfer learning using multiple task NN . . . 26 2.4.1.3 Transfer learning using radial basis function NNs 27 2.4.2 Transfer learning using Bayes ...... 28 2.4.2.1 Transfer learning using naïve Bayes ...... 28 2.4.2.2 Transfer learning using Bayesian network . . 30 2.4.2.3 Transfer learning using a hierarchical Bayesian model ...... 33 2.4.3 Transfer learning using fuzzy systems and genetic algo- rithms ...... 34 2.4.4 Applications of transfer learning ...... 37 2.4.4.1 Natural language processing ...... 37 2.4.4.2 Computer vision and image processing .... 38 2.4.4.3 Biology ...... 40 2.4.4.4 Finance ...... 41 2.4.4.5 Business management ...... 41 2.5 Granular computing ...... 42 2.5.1 Concepts of granular computing ...... 42 2.5.2 The formalisms of granular computing ...... 44 2.5.3 The application of granular computing ...... 45 2.6 Summary ...... 47

3 Fuzzy Homogeneous Domain Adaptation 48

3.1 Introduction ...... 48

3.2 Takagi-Sugeno fuzzy model ...... 50

3.3 Fuzzy domain adaptation ...... 54

3.3.1 Problem statement ...... 54 3.3.2 Fuzzy rule-based domain adaptation in homogeneous space 55 Table of Contents ix

3.3.3 Knowledge transfer in fuzzy rule-based models ..... 57 3.4 Empirical results analysis ...... 61 3.4.1 Generation of synthetic datasets ...... 62 3.4.2 Comparison of evolution algorithms partical swarm optimization (PSO) and differential evolution (DE) ...... 68 3.4.3 Comparison of different ways of constructing mappings 73 3.4.4 Experiments on real-world datasets ...... 76 3.5 Summary ...... 80

4 Granular Domain Adaptation in Takagi-Sugeno Fuzzy Models 81 4.1 Introduction ...... 81 4.2 Problem statement ...... 83 4.3 The definitions of fuzzy domain adaptation ...... 83 4.4 Granular domain adaptation in Takagi-Sugeno fuzzy models . . 85 4.4.1 Granular transfer learning framework ...... 85 4.4.2 Methods and algorithms of GFRDA ...... 89 4.4.3 Performance index ...... 97 4.5 Empirical results analysis ...... 100 4.5.1 Experiments on synthetic datasets ...... 100 4.5.1.1 Datasets and experimental settings ...... 101 4.5.1.2 Experimental results ...... 103

4.5.2 Experiments on real-world datasets ...... 109

4.5.3 Fuzzy transfer learning for label space adaptation .... 111

4.6 Summary ...... 117

5 Fuzzy Domain Adaptation for the Dismatch of Fuzzy Rules 119

5.1 Introduction ...... 119 5.2 Problem statement ...... 120 Table of Contents x

5.3 Fuzzy transfer learning for dismatch of fuzzy rules ...... 121 5.4 Empirical results analysis ...... 126 5.4.1 Datasets and experimental settings ...... 127 5.4.2 Validation of algorithm effectiveness ...... 128 5.4.3 Parameter sensitivity analysis ...... 131 5.5 Comparison of two ways of constructing mappings ...... 134 5.6 Application to the classification problems ...... 139 5.7 Summary ...... 144

6 Fuzzy Transfer Learning Based on IGMM and Active Learning 145 6.1 Introduction ...... 145 6.2 Preliminaries ...... 146 6.2.1 The infinite Gaussian mixture model ...... 146 6.2.2 Active learning ...... 147 6.3 Domain adaptation through active learning ...... 148 6.3.1 The framework ...... 148 6.3.2 A transfer learning method based on infinite Gaussian mixture model (IGMM) and active learning ...... 152 6.3.3 Performance index ...... 158 6.4 Experiments on synthetic datasets ...... 159 6.4.1 Exploring data’s structure using IGMM ...... 160

6.4.1.1 Similar source and target data ...... 160

6.4.1.2 Different source and target ...... 166

6.4.2 Augmenting the information in the target domain with an active learning technique ...... 169

6.5 Experiment on real-world datasets ...... 171 6.6 Summary ...... 175 Table of Contents xi

7 Fuzzy Rule-based Domain Adaptation in Heterogeneous Spaces 176 7.1 Introduction ...... 176 7.2 Problem statement ...... 178 7.3 Fuzzy domain adaptation in heterogeneous spaces ...... 180 7.4 Empirical results analysis ...... 186 7.4.1 Experiments on synthetic datasets ...... 186 7.4.1.1 Datasets and experimental settings ...... 186 7.4.1.2 Regression results ...... 187 7.4.1.3 Parameter sensitivity analysis ...... 189 7.4.2 Experiments on real-world datasets ...... 190 7.5 Summary ...... 192

8 Conclusion and Future Research 194 8.1 Conclusions ...... 194 8.2 Future study ...... 196

Bibliography 198

Abbreviations 216 List of Figures

1.1 RSS collected in different time periods ...... 3 1.2 RSS collected by different devices ...... 3 1.3 Thesis structure ...... 12

3.1 Architecture of nonlinear mapping ...... 59 3.2 Input data of source domain ...... 70 3.3 Input data of target domain ...... 70 3.4 Source and target data (2D) ...... 70 3.5 Source and target data (3D) ...... 70 3.6 Two methods of constructing the mappings for the input space . 75 3.7 Housing data (2D) ...... 77 3.8 Housing data (3D) ...... 77

4.1 Knowledge transfer from a GrC perspective ...... 87 4.2 An example of conditions of fuzzy rules under space transformation 89 4.3 The granular fuzzy domain adaptation process ...... 90

4.4 Method 1: changing the input space ...... 92

4.5 Method 2: changing the output space ...... 95

4.6 Method 3: changing both the input and output spaces ...... 96 4.7 Source data for exp A ...... 112 4.8 Target datta for exp A ...... 112 List of Figures xiii

4.9 Source data for exp. B ...... 114 4.10 Target data for exp. B ...... 114

5.1 Architecture of piecewise linear mapping ...... 123 5.2 Sensitivity analysis of parameter p in three experiments ..... 132 5.3 Sensitivity analysis of parameter q in three experiments ..... 133 5.4 Sensitivity analysis of parameter h in three experiments ..... 133 5.5 Source data ...... 134 5.6 Target data ...... 134 5.7 Nonlinear mappings ...... 137 5.8 Piecewise liner mappings ...... 137 5.9 Source data ...... 137 5.10 Target data ...... 137 5.11 Nonlinear mappings ...... 139 5.12 Piecewise linear mappings ...... 139 5.13 Instances in source domain ...... 143 5.14 Instances in target domain ...... 143

6.1 The framework of the active learning-based fuzzy transfer learn- ing method ...... 149 6.2 Example of the results for IGMM ...... 153 6.3 The data structure in Experiment 1 ...... 161

6.4 The data structure in Experiment 2 ...... 163

6.5 The data structure in Experiment 3 ...... 164

6.6 The model’s performance with varying values for ε ...... 166

6.7 Three datasets with different numbers of clusters ...... 167

6.8 The structure of the three datasets ...... 168 6.9 IGMM’s results with the "air quality" dataset ...... 172 List of Figures xiv

6.10 IGMM’s results with the "housing" dataset ...... 174

7.1 Four cases distinguishing the source and target domains ..... 179 List of Tables

3.1 Parameters of input data in the source domain ...... 63 3.2 Parameters of input data in the source domain ...... 63 3.3 Parameters of input data in the target domain ...... 65 3.4 Coefficients of linear functions in the target domain ...... 65 3.5 Distributions of source data and target Data ...... 69 3.6 Coefficients of linear functions in two domains ...... 69 3.7 The results of experiments ...... 70 3.8 The stability of PSO and DE ...... 71

3.9 The values of Q3 under different parameters ...... 73 3.10 Distributions of source data and target data ...... 74

3.11 Summary of the experimental results Q3(mean± standard deviation) 76 3.12 Results of dataset "Housing" ...... 77

3.13 The values of Q1, Q2, and Q3 for "Housing" ...... 78 3.14 Results of dataset "Concrete Compressive Strength" ...... 79

4.1 The distributions of input data 1 and data 2 ...... 102

4.2 Coefficients of linear functions in the two different groups . . . 102

4.3 Datasets for three cases in domain adaptation ...... 102

4.4 Results of the first method by varying λ ...... 104 4.5 Results of the second method by varying λ ...... 105 List of Tables xvi

4.6 Results of the third method by varying λ ...... 106

4.7 The values of Q,Q1,Q2, Q3 with different number of labeled target data ...... 107 4.8 Comparison of the three methods in three cases ...... 108 4.9 Comparison of the three methods used in real-world datasets . . 111 4.10 Results with three clusters (fuzzy rules) ...... 113 4.11 Results with four clusters (fuzzy rules) ...... 113 4.12 Results with three clusters (fuzzy rules) ...... 115 4.13 Results with four clusters (fuzzy rules) ...... 115 4.14 Attributes of the "banknote authentication" dataset ...... 116 4.15 Results of the "banknote authentication" dataset ...... 117

5.1 Four datasets with a different number of fuzzy rules ...... 127 5.2 Results of the twelve experiments ...... 130 t 5.3 M1 built using/not using HU - model comparison ...... 131 5.4 Parameters in model Mt (nonlinear mappings) ...... 135 5.5 Parameters in model Mt (piecewise linear mappings) ...... 136 5.6 The regression results ...... 136 5.7 The regression results ...... 138 5.8 Three cases of different distributions ...... 140 5.9 Parameters in model Mt (nonlinear mappings) ...... 140 5.10 Information of datasets in source and target domains ...... 140

5.11 Accuracy of models in predicting the labels in domains ..... 141

5.12 Information of datasets in two domains ...... 141

5.13 Information of two datasets ...... 142

5.14 Accuracy of models in predicting the labels in domains ..... 142 5.15 Accuracy of models in predicting the labels in domains ..... 144 List of Tables xvii

6.1 Various source domains and a two-clusters target domain .... 161 6.2 Various source domains and a three-clusters target domain . . . 163 6.3 Various source domains and a four-clusters target domain .... 164 6.4 Various source domains and a four-clusters target domain .... 165 6.5 The values of Q2 with two clusters target domain ...... 168 6.6 The values of Q2 with three clusters target domain ...... 168 6.7 The values of Q2 with four clusters target domain ...... 169 6.8 Exploring the effect of the active learning technique ...... 170 6.9 The values of Q2 with varying clusters ...... 170 6.10 The values of Q2 with varying clusters for "air quality" ..... 172 6.11 The values of Q2 with varying clusters for "housing" ...... 174

7.1 Information of four datasets ...... 186 7.2 Heterogeneous domain adaptation datasets ...... 187 7.3 Results of the six experiments ...... 188

7.4 Using/not using HU - comparative experiments ...... 189 7.5 Results of the sensitivity analysis for c ...... 190 7.6 Results for fuzzy heterogeneous domain adaptation (FHeDA) on real-world datasets ...... 192