Hierarchy and interpretability in neural models of language processing

ILLC Dissertation Series DS-2020-06

For further information about ILLC publications, please contact:
Institute for Logic, Language and Computation
Universiteit van Amsterdam
Science Park 107
1098 XG Amsterdam
phone: +31-20-525 6051
e-mail: [email protected]
homepage: http://www.illc.uva.nl/

The investigations were supported by the Netherlands Organization for Scientific Research (NWO), through Gravitation Grant 024.001.006 to the Language in Interaction Consortium.

Copyright © 2019 by Dieuwke Hupkes
Publisher: Boekengilde
Printed and bound by printenbind.nl
ISBN: 90-6402-222-1

Hierarchy and interpretability in neural models of language processing

Academic dissertation (Academisch Proefschrift) for the degree of doctor at the Universiteit van Amsterdam, by command of the Rector Magnificus prof. dr. ir. K.I.J. Maex, to be defended in public before a committee appointed by the Doctorate Board on Wednesday 17 June 2020 at 13:00, by Dieuwke Hupkes, born in Wageningen.

Doctoral committee

Supervisors (promotores):
  Dr. W.H. Zuidema, Universiteit van Amsterdam
  Prof. Dr. L.W.M. Bod, Universiteit van Amsterdam

Other members:
  Dr. A. Bisazza, Rijksuniversiteit Groningen
  Dr. R. Fernández Rovira, Universiteit van Amsterdam
  Prof. Dr. M. van Lambalgen, Universiteit van Amsterdam
  Prof. Dr. P. Monaghan, Lancaster University
  Prof. Dr. K. Sima'an, Universiteit van Amsterdam

Faculteit der Natuurwetenschappen, Wiskunde en Informatica

to my parents Aukje and Michiel

Contents

Acknowledgments

1 Introduction
  1.1 My original plan
  1.2 Neural networks as explanatory models
    1.2.1 Architectural similarity
    1.2.2 Behavioural similarity
    1.2.3 Model interpretability
  1.3 Summary and outline

2 Recurrent neural networks and grammatical structure
  2.1 Recurrent neural networks
    2.1.1 Simple recurrent networks
    2.1.2 Long short-term memory networks
    2.1.3 Gated recurrent units
    2.1.4 LSTM vs GRU
  2.2 Recurrent architectures
    2.2.1 Types of architectures
    2.2.2 Word embeddings
    2.2.3 Output layers
    2.2.4 Attention
  2.3 Approach 1: artificial data
    2.3.1 Formal grammars
    2.3.2 Compositional signal-meaning mappings
  2.4 Approach 2: naturalistic data
    2.4.1 Language models as psycholinguistic subjects
    2.4.2 The number agreement task
    2.4.3 Supervised number prediction
    2.4.4 SV agreement in language models
    2.4.5 SV agreement in language models, revisited
  2.5 Summary

Part One: Artificial languages

3 Diagnostic classification and the arithmetic language
  3.1 Arithmetic language
    3.1.1 Symbolic strategies
    3.1.2 Predictions following from strategies
  3.2 Can RNNs learn the arithmetic language?
    3.2.1 Training
    3.2.2 Evaluation
  3.3 Interpreting hidden activations
    3.3.1 Individual cell dynamics
    3.3.2 Gate activation statistics
  3.4 Diagnostic classification
  3.5 Cumulative or recursive?
    3.5.1 Diagnostic accuracies
    3.5.2 Plotting trajectories
  3.6 Refining the cumulative hypothesis
    3.6.1 Two hypotheses
    3.6.2 Computing scopes
    3.6.3 What is encoded where?
    3.6.4 Diagnosing gates
  3.7 Conclusion

4 PCFG SET
  4.1 Compositionality
    4.1.1 Systematicity
    4.1.2 Productivity
    4.1.3 Substitutivity
    4.1.4 Localism
    4.1.5 Overgeneralisation
  4.2 Data
    4.2.1 Input sequences: syntax
    4.2.2 Output sequences: semantics
    4.2.3 Data construction
  4.3 Architectures
    4.3.1 LSTMS2S
    4.3.2 ConvS2S
    4.3.3 Transformer
  4.4 Experiments and results
    4.4.1 Task accuracy
    4.4.2 Systematicity
    4.4.3 Productivity
    4.4.4 Substitutivity
    4.4.5 Localism
    4.4.6 Overgeneralisation
  4.5 Conclusion

Part Two: Natural language

5 Diagnostic classification and interventions
  5.1 Model
  5.2 Data
    5.2.1 Gulordava data
    5.2.2 Wikipedia dependency corpus
  5.3 Predicting number from activations
    5.3.1 Diagnostic classifier training
    5.3.2 Results
  5.4 Representations across time steps
    5.4.1 Diagnostic classifier training
    5.4.2 Results
  5.5 Representations across components
  5.6 Interventions
    5.6.1 Intervention procedure
    5.6.2 New diagnostic classifier accuracies
    5.6.3 NA-task accuracies
  5.7 Conclusion

6 Neuron ablation
  6.1 Data
    6.1.1 Linzen data set
    6.1.2 Synthetic data sets
  6.2 Task performance
  6.3 Long-distance number units
    6.3.1 Ablation experiment
    6.3.2 Singular and plural unit dynamics
    6.3.3 Correctly vs incorrectly processed sentences
  6.4 Short-distance number units
    6.4.1 Short and long-distance number information
    6.4.2 Ablating short-distance units
  6.5 Syntax units
    6.5.1 Tree-depth prediction
    6.5.2 Behaviour of syntax units
  6.6 Conclusion

7 Generalised contextual decomposition
  7.1 Contextual Decomposition
    7.1.1 Separating relevant and irrelevant parts
    7.1.2 Output logits z_t
    7.1.3 Hidden state h_t
    7.1.4 Memory cell c_t
    7.1.5 Gates f_t, o_t and i_t, and candidate activation c̃
    7.1.6 Shapley approximation
  7.2 Generalised contextual decomposition
    7.2.1 Choosing interaction sets
    7.2.2 Technical fixes
  7.3 Model and data
  7.4 Token contributions
    7.4.1 Decomposition matrix
    7.4.2 Average decomposition matrices
  7.5 Information ablation
    7.5.1 Subject information
    7.5.2 The role of the intercepts
    7.5.3 The decoder bias
  7.6 Conclusion

Part Three: Guiding models

8 Attentive Guidance
  8.1 Data
    8.1.1 Task description
    8.1.2 Data splits
  8.2 Model
    8.2.1 Architecture
    8.2.2 Attention mechanism
    8.2.3 Learning
  8.3 Attentive guidance
    8.3.1 Attentive guidance as a loss
  8.4 Experiments
    8.4.1 Model parameters …