Toward Deconfounding the Influence of Entity Demographics for Question Answering Accuracy Maharshi Gor∗ Kellie Webster Jordan Boyd-Graber∗ CS Google Research, New York CS, UMIACS, iSchool, LCS University of Maryland
[email protected] University of Maryland
[email protected] [email protected] NQ QB SQuAD TriviaQA Entity Type Abstract Train Dev Train Dev Train Dev Train Dev Person 32.14 32.27 58.49 57.18 14.56 9.56 40.89 40.69 QA No entity 21.97 22.61 15.88 20.13 37.27 47.95 14.85 14.93 The goal of question answering ( ) is to Location 28.42 27.82 34.50 35.11 33.52 29.34 44.32 44.10 answer any question. However, major QA Work of art 17.31 16.19 27.98 28.16 3.15 1.15 17.53 17.76 Other 2.66 2.70 9.83 9.88 5.08 3.10 14.38 14.61 datasets have skewed distributions over gen- Organization 17.73 18.05 20.37 17.78 21.34 19.14 22.15 21.14 der, profession, and nationality. Despite that Event 8.15 8.89 8.87 8.75 3.16 3.01 8.59 8.73 Product 0.85 1.06 0.88 0.63 1.39 0.33 3.99 3.99 skew, model accuracy analysis reveals little ev- Total Examples 106926 2631 112927 2216 130319 11873 87622 11313 idence that accuracy is lower for people based on gender or nationality; instead, there is more Table 1: Coverage (% of examples) of entity-types in variation on professions (question topic).