Fact Extraction and Verification in Danish

Sidsel Latsch Jespersen ([email protected])
Mikkel Ekenberg Thygesen ([email protected])

June 1, 2020

Supervised by Leon Derczynski

Contents

1 Introduction
  1.1 Research Objective
2 Fact Extraction and Verification
  2.1 Background
  2.2 Measures to Counter Information Disorder
  2.3 Fact Extraction and Verification Definition
  2.4 Natural Language Inference
  2.5 The FEVER Shared Task
3 Collecting and Preprocessing Data
  3.1 Data Statement
  3.2 Data Characteristics
  3.3 Annotation Process
    3.3.1 Annotation Reliability
  3.4 Data Collection
    3.4.1 3K Data Set
    3.4.2 4K Data Set with Randomly Generated Claim Entities
  3.5 Individual Overrepresentation
    3.5.1 Addressing the Bias
4 A Model for Danish Fact Verification
  4.1 Data Preparation
  4.2 Pretrained BERT model
    4.2.1 Regular BERT
    4.2.2 Multilingual BERT
  4.3 Optimiser
  4.4 Scoring Metrics
  4.5 Baseline
5 Results
  5.1 Choosing Model Parameters
  5.2 Using Randomly Generated NotEnoughInfo-labelled Claims
  5.3 Performance
    5.3.1 SGD
    5.3.2 BertAdam
    5.3.3 Results Comparison
    5.3.4 Results Stability
  5.4 False Prediction Analysis
    5.4.1 Reasons for Incorrect Predictions
    5.4.2 Unsatisfactory Model Performance
6 Comparison with LSTM Model
  6.1 RNN
  6.2 LSTM architecture