Mitigating Political Bias in Language Models Through Reinforced Calibration Ruibo Liu,1 Chenyan Jia, 2 Jason Wei, 3 Guangxuan Xu, 1 Lili Wang, 1 Soroush Vosoughi 1 1 Department of Computer Science, Dartmouth College 2 Moody College of Communication, University of Texas at Austin 3 ProtagoLabs
[email protected],
[email protected] Abstract with particular keywords of the aforementioned attributes, and 2) Direct Bias, which measures bias in texts generated Current large-scale language models can be politically bi- using prompts that have directly ideological triggers (e.g., ased as a result of the data they are trained on, potentially causing serious problems when they are deployed in real- democrat, republican) in addition to keywords of aforemen- world settings. In this paper, we describe metrics for mea- tioned attributes. Table 1 shows four samples of text gen- suring political bias in GPT-2 generation and propose a re- erated by off-the-shelf GPT-2 with different attribute key- inforcement learning (RL) framework for mitigating political words in the prompts—all samples exhibit political bias. biases in generated text. By using rewards from word em- For example, when triggered with a prompt including mar- beddings or a classifier, our RL framework guides debiased ijuana, the generated text tends to present a favorable atti- generation without having access to the training data or re- tude (e.g., “I believe it should be legal and not regulated.”), quiring the model to be retrained. In empirical experiments which is mostly a liberal stance. More interestingly, even on three attributes sensitive to political bias (gender, loca- a prompt including a conservative trigger (republican) re- tion, and topic), our methods reduced bias according to both sults in generation which leans to the liberal side (“vote for our metrics and human evaluation, while maintaining read- ability and semantic coherence.