Target Curricula for Multi-Target Classification: The Role of Internal Meta-Features in Machine Teaching

by

Shannon Kayde Fenn, B Eng (Comp) / B Comp Sc

Thesis submitted in fulfilment of the requirements for the Degree of Doctor of Philosophy

Supervisor: Prof Pablo Moscato
Co-Supervisor: Dr Alexandre Mendes
Co-Supervisor: Dr Nasimul Noman

This research was supported by an Australian Government Research Training Program (RTP) Scholarship.

The University of Newcastle
School of Electrical Engineering and Computing
June 2019

© Copyright by Shannon Kayde Fenn 2019


Statement of Originality

I hereby certify that the work embodied in the thesis is my own work, conducted under normal supervision. The thesis contains no material which has been accepted, or is being examined, for the award of any other degree or diploma in any university or other tertiary institution and, to the best of my knowledge and belief, contains no material previously published or written by another person, except where due reference has been made. I give consent to the final version of my thesis being made available worldwide when deposited in the University's Digital Repository, subject to the provisions of the Copyright Act 1968 and any approved embargo.

Shannon Kayde Fenn
3rd June 2019


To Mum and Mar Mar:
the brightest lights may shine the shortest, but their light reaches farthest.


Acknowledgments

The first and most important acknowledgement must go to my wife, Coralie. You have been with me through every hard moment of my adult life, and the cause of most of the good ones. This thesis would not exist if not for you, nor would I be the person I am today. Thank you.

My heartfelt gratitude goes to my advisers, Pablo, Nasimul, and Alex. Your guidance and wisdom sheltered me from many a poor choice over the years. Your support in professional and personal matters went above and beyond. I am thankful to count you as friends.

To my father, Richard: you have always been my hero. The way I approach the world and treat the people in it is completely due to you. The day you met Mum and the day you married her were, I'm sure, her best.

To my little sisters, Jess and Deeds: never change. You've always made me feel loved and special. If ever two people made me feel like I could accomplish something, it was you two.

To Bo, Grizzy, Terry, Archie, Baxter, Jill, Brian, Jack, Amos, and everyone else in the Fenn clan and beyond: I can't remember a single time I didn't feel part of the family, and for that I can never thank you enough.

The best result of choosing to do a PhD has been the friends I made. To my friends from CIBM: Amer, Amir, Claudio, Francia, Heloisa, Jake, Inna, Leila, Luke, Łukasz, Marta, Nader, and Natalie, thank you for all the coffee breaks, the lunch conversations that were too good to want to leave, and the countless other helping hands along the way. To Pat, Matt, Greg, Amy, Chelsea, Ben, Bec, Jason, and far too many more good people to list, thank you for all your support throughout these years.

Shannon Kayde Fenn
The University of Newcastle
June 2019


List of Publications

• S. Fenn and P. Moscato, "Target Curricula via Selection of Minimum Feature Sets: A Case Study in Boolean Networks", The Journal of Machine Learning Research, vol. 18, no. 114, pp. 1–26, 2017.


Contents

Acknowledgments
List of Publications
List of Tables
List of Figures
Abstract

1 Introduction
  1.1 Curriculum Learning
  1.2 Machine Learning for Logic Synthesis
  1.3 Research Aims
  1.4 Thesis Overview

2 Background
  2.1 Learning Multiple Targets
    2.1.1 Formalism and Terminology
    2.1.2 The Label/Target Distinction
    2.1.3 Measuring Prediction Performance
    2.1.4 Methods for MTC
  2.2 Curriculum Learning
    2.2.1 Curricula in Human and Animal Learning
    2.2.2 Example Curricula
    2.2.3 Target Curricula
    2.2.4 Measuring and Comparing Curricula
    2.2.5 Summary
  2.3 Intrinsic Dimension and Feature Selection
    2.3.1 Feature Selection
    2.3.2 The Minimum Feature Set Problem
    2.3.3 Summary
  2.4 Logic Synthesis
  2.5 Boolean Networks
    2.5.1 Training Boolean Networks
    2.5.2 Late-Acceptance Hill Climbing
    2.5.3 Varying Sample Size
  2.6 Conclusion

3 Target Curricula and Hierarchical Loss Functions
  3.1 Can Guiding Functions Enforce a Curriculum?
    3.1.1 Hierarchical Loss Functions
    3.1.2 Cases with Suspected Curricula
    3.1.3 Training
    3.1.4 Experimental Results
  3.2 Appraising "Easy-to-Hard": Ablation Studies
  3.3 Discovering Curricula
    3.3.1 Target Complexity and Minimum Feature Sets
    3.3.2 Experiments and Results
  3.4 Real-World Problems
    3.4.1 ALU and Biological Models
    3.4.2 Inferring Regulatory Network Dynamics from Time-Series
    3.4.3 Results
  3.5 Discussion and Conclusion
    3.5.1 Issues and Limitations
    3.5.2 Future Work and Direction

4 Application of ID-Curricula to Classifier Chains
  4.1 Introduction
    4.1.1 Classifier Chains
    4.1.2 When and How to Order Chains
  4.2 Target-Aware ID-Curricula
  4.3 FBN Classifier Chains
  4.4 Experiments
    4.4.1 Datasets
    4.4.2 Training and Evaluation
    4.4.3 Qualifying Curricula
  4.5 Results
    4.5.1 Prior Benchmarks
    4.5.2 Random Cascaded Circuits
    4.5.3 LGSynth91
  4.6 Discussion and Conclusion
    4.6.1 Future Work and Direction

5 Adaptive Learning Via Iterated Selection and Scheduling
  5.1 Self-Paced Curricula and Internal Meta-Features
    5.1.1 Motivation: Stepping Stone State Discovery
    5.1.2 The Internal Meta-Feature Selection Principle
    5.1.3 Tractability
  5.2 Baseline Comparison
    5.2.1 Baselines
    5.2.2 Measures
    5.2.3 Results and Discussion
  5.3 Ablative Studies
    5.3.1 Aims and Experimental Design
    5.3.2 Results and Discussion
  5.4 Scaling to Larger Problems
  5.5 Conclusion
    5.5.1 Future Work

6 Application of ALVISS to Deep Neural Nets
  6.1 Feedforward Neural Networks
  6.2 Meta-Features and Target Curricula in NNs
  6.3 Experiments
    6.3.1 Architectures
    6.3.2 Hyper-parameter Selection
    6.3.3 Metrics
  6.4 Results
  6.5 Discussion and Conclusions

7 Conclusion and Future Work
  7.1 Conclusions and Contributions
  7.2 Suggestions for Future Work
    7.2.1 Feature Selection
    7.2.2 Domain Extension
    7.2.3 Benchmarks


List of Tables

3.1 Example error matrices and the associated values for $L_1$, $L_w$, $L_{lh}$, and $L_{gh}$ (ignoring a normalisation constant for readability). Note that the purpose is not to directly compare the different losses on a particular matrix, but instead to compare the pairwise disparities in the same loss on different error matrices. For example, the first and second matrices are equivalent under $L_1$, but the first is preferred by all other losses.
Rows of $E$ are examples and columns are targets (ordered by the curriculum being enforced). Also included are the equivalents of $E$ under the two hierarchical losses: the recurrences defining these losses can be thought of as defining a transformation on $E$ under which the respective loss is equivalent to $L_1$.

3.2 Instance sizes of initial test-bed problems.

3.3 Instance sizes of the real-world test-beds.

3.4 Results for the yeast dataset. See Figure 3.10 for the estimated hierarchies. Note that these results are the mean difference in test set accuracy, reported as a percentage, and not MCC. The use of curricula on the SK → {Ste9, Rum1} hierarchy yielded negligible improvement; however, we see more promise in the PP → {Cdc2/Cdc13, Cdc2/Cdc13*} hierarchy, particularly for $L_{gh}$.

3.5 Results for the E. coli dataset. Note that these results are the mean difference in test set accuracy, reported as a percentage, and not MCC. See Figure 3.10 for the estimated hierarchies. The use of curricula on the {G2, G8} → G6 hierarchy gave some improvement; however, for $L_{lh}$ we see a drop in performance on one of the base targets in the hierarchy. For $L_{gh}$ the results remained positive overall.

4.1 Test-bed instance sizes used in this chapter.

5.1 Parameters for Meta-heuristic for Randomised Priority Search (Meta-RaPS) [203], along with optimal values found using random parameter search [206].

5.2 Instance and training set sizes for the large adders. The number of targets and the example pool size are not given, as for these problems both are determined by the number of inputs $n_i$ (the example pool comprises all $2^{n_i}$ input patterns).

6.1 Architecture and hyper-parameter search space, and the values found using random parameter search [206].


List of Figures

2.1 A 56-node Feedforward Boolean Network (FBN) which correctly implements the 6-bit addition function. Each node takes 2 inputs and computes the NAND function as its output. Inputs (far left) have been coloured red and outputs (far right) green.
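To make the construction in Figure 2.1 concrete, here is a minimal illustrative sketch (my own, not code from the thesis): a Python model of a feedforward network of 2-input NAND nodes, hand-wired into the standard 9-gate full adder and cascaded into a 6-bit ripple-carry adder. The evolved 56-node network in the figure need not share this particular wiring; the sketch only demonstrates that NAND-only feedforward networks of roughly this size can realise 6-bit addition.

```python
from itertools import product

def nand(a: int, b: int) -> int:
    """A single network node: 2-input NAND on bits."""
    return 1 - (a & b)

def full_adder(a: int, b: int, cin: int) -> tuple:
    """Standard 9-NAND full adder; returns (sum, carry-out)."""
    n1 = nand(a, b)
    n2 = nand(a, n1)
    n3 = nand(b, n1)
    n4 = nand(n2, n3)      # a XOR b
    n5 = nand(n4, cin)
    n6 = nand(n4, n5)
    n7 = nand(cin, n5)
    s = nand(n6, n7)       # (a XOR b) XOR cin
    cout = nand(n1, n5)    # (a AND b) OR ((a XOR b) AND cin)
    return s, cout

def ripple_adder(xs: list, ys: list) -> list:
    """Add two equal-length little-endian bit lists by cascading full adders."""
    carry, out = 0, []
    for a, b in zip(xs, ys):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out + [carry]   # n-bit inputs yield an (n+1)-bit result

# Exhaustively verify the full adder, then spot-check 6-bit addition:
# this mirrors how an FBN's behaviour can be checked against the target
# function over the example pool of all 2^{n_i} input patterns.
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert 2 * cout + s == a + b + cin

for x, y in [(5, 7), (19, 44), (63, 63)]:
    xs = [(x >> i) & 1 for i in range(6)]
    ys = [(y >> i) & 1 for i in range(6)]
    bits = ripple_adder(xs, ys)
    assert sum(bit << i for i, bit in enumerate(bits)) == x + y
```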