On the Efficacy of Adversarial Data Collection for Question Answering: Results from a Large-Scale Randomized Study

Divyansh Kaushik†, Douwe Kiela‡, Zachary C. Lipton†, Wen-tau Yih‡
†Carnegie Mellon University; ‡Facebook AI Research

Abstract

In adversarial data collection (ADC), a human workforce interacts with a model in real time, attempting to produce examples that elicit incorrect predictions. Researchers hope that models trained on these more challenging datasets will rely less on superficial patterns, and thus be less brittle. However, despite ADC's intuitive appeal, it remains unclear when training on adversarial datasets produces more robust models. In this paper, we conduct a large-scale controlled study focused on question answering, assigning workers at random to compose questions either (i) adversarially (with a model in the loop); or (ii) in the standard fashion (without a model).

investigated adversarial data collection (ADC), a scheme in which a worker interacts with a model (in real time), attempting to produce examples that elicit incorrect predictions (e.g., Dua et al., 2019; Nie et al., 2020). The hope is that by identifying parts of the input domain where the model fails, one might make the model more robust. Researchers have shown that models trained on ADC perform better on such adversarially collected data and that, with successive rounds of ADC, crowdworkers are less able to fool the models (Dinan et al., 2019). While adversarial data may indeed provide more challenging benchmarks, the process and its actual benefits vis-à-vis tasks of interest remain poorly
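To make the contrast between the two collection protocols concrete, the following is a minimal, hypothetical sketch, not code from the paper or its annotation interface. The names (`collect_standard`, `collect_adversarial`, `assign_condition`, `worker`, `model`, `max_tries`) are illustrative stand-ins, and the acceptance rule (keep a question once the model is fooled, or after a fixed number of tries) is one common ADC variant rather than the study's exact setup.

```python
# Hypothetical sketch of standard vs. model-in-the-loop (ADC) collection.
# `worker` and `model` are stand-in callables; in a real study these would be
# a crowdworker behind an annotation UI and a trained QA model, respectively.
import random
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (question, gold answer)

def collect_standard(worker: Callable[[], Example], n: int) -> List[Example]:
    """Standard collection: the worker writes questions with no model feedback."""
    return [worker() for _ in range(n)]

def collect_adversarial(worker: Callable[[], Example],
                        model: Callable[[str], str],
                        n: int,
                        max_tries: int = 5) -> List[Example]:
    """Model-in-the-loop collection: a question is kept once the model's
    real-time prediction disagrees with the worker's gold answer (the worker
    has fooled the model), or after max_tries attempts."""
    collected: List[Example] = []
    while len(collected) < n:
        question, gold = worker()
        for _ in range(max_tries - 1):
            if model(question) != gold:  # model fooled -> keep this example
                break
            question, gold = worker()    # otherwise the worker tries again
        collected.append((question, gold))
    return collected

def assign_condition(worker_id: str, seed: int = 0) -> str:
    """Randomized assignment of each worker to exactly one condition."""
    rng = random.Random(f"{seed}-{worker_id}")
    return rng.choice(["standard", "adversarial"])
```

Seeding the assignment on a worker identifier (an assumption here) keeps each worker deterministically in a single condition, which is one simple way to realize the between-worker randomization described above.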