Resonance Analysis
Total Page:16
File Type:pdf, Size:1020Kb
Working Paper Resonance Analysis A Method for Detecting Author Affiliations Through Distinctive Patterns of Word Choice Joshua Mendelsohn, William Marcellino, Jaime L. Hastings, and Steven J. Morgan RAND Center for Network Analysis and System Science WR-A1008-1 January 2021 Prepared for the Pardee RAND Graduate School RAND working papers are intended to share researchers’ latest findings and to solicit informal peer review. This paper has been approved for circulation by the RAND Pardee Graduate School. Unless otherwise indicated, working papers can be quoted and cited without permission of the author, provided the source is clearly referred to as a working paper. RAND’s publications do not necessarily reflect the opinions of its research clients and sponsors. is a registered trademark. For more information on this publication, visit www.rand.org/pubs/working_papers/WRA1008-1.html Published by the RAND Corporation, Santa Monica, Calif. © Copyright 2021 RAND Corporation R® is a registered trademark Limited Print and Electronic Distribution Rights This document and trademark(s) contained herein are protected by law. This representation of RAND intellectual property is provided for noncommercial use only. Unauthorized posting of this publication online is prohibited. Permission is given to duplicate this document for personal use only, as long as it is unaltered and complete. Permission is required from RAND to reproduce, or reuse in another form, any of its research documents for commercial use. For information on reprint and linking permissions, please visit www.rand.org/pubs/permissions.html. The RAND Corporation is a research organization that develops solutions to public policy challenges to help make communities throughout the world safer and more secure, healthier and more prosperous. RAND is nonprofit, nonpartisan, and committed to the public interest. RAND’s publications do not necessarily reflect the opinions of its research clients and sponsors. Support RAND Make a tax-deductible charitable contribution at www.rand.org/giving/contribute www.rand.org Abstract Resonance analysis is an evolving text mining method for estimating how much affinity exists in a broader population for a specific group. The method involves a multistage procedure for scoring words according to how distinctive they are of authors in the specific group, and then scoring a broader population according to whether they make similar distinctive choices. While likely applicable to many different forms of written content, resonance analysis was developed for short-form social media posts and has been tested primarily on Twitter and Twitter-like data. In this working paper, we describe resonance analysis and provide detailed guidance for using it effectively. We then conduct an empirical test of real-world Twitter data to demonstrate and validate the method using tweets from Republican Party members, Democratic Party members, and members of the news media. The results show that the method is able to distinguish Republicans’ tweets from Democrats’ tweets with 92-percent accuracy. We then demonstrate and validate the method using simulated artificial language to create controlled experimental conditions. The method is very accurate when minimum data requirements have been met. In an operational field test, resonance analysis generated results that mirror real-world conditions and achieved statistically significant agreement with double-blind human analyst judgments of the same users. Finally, we provide examples of resonance analysis usage with social media data and identify opportunities for future research. ii Contents Abstract ........................................................................................................................................... ii Figures............................................................................................................................................ iv Tables .............................................................................................................................................. v Executive Summary ....................................................................................................................... vi Acknowledgments…………………………………………………………………………..…....xi Abbreviations ................................................................................................................................ xii 1. Context and Conceptual Framework .......................................................................................... 1 Motivating Problem .................................................................................................................................. 1 Methodological Foundations .................................................................................................................... 3 Conceptual Foundations ........................................................................................................................... 7 2. Methodology ............................................................................................................................... 9 Chapter Summary ..................................................................................................................................... 9 Procedure ................................................................................................................................................ 10 Assorted Guidance on Research Design Decisionmaking ..................................................................... 23 3. Empirical Validation Test Results ............................................................................................ 29 Chapter Summary ................................................................................................................................... 29 Data ......................................................................................................................................................... 30 Results .................................................................................................................................................... 32 4. Simulation-Based Validation Test Results ............................................................................... 44 Chapter Summary ................................................................................................................................... 44 Procedure ................................................................................................................................................ 44 Results .................................................................................................................................................... 49 5. Resonance Analysis in Practice: Closing Thoughts .................................................................. 53 Chapter Summary ................................................................................................................................... 53 Previous Work Using Resonance Analysis on Social Media ................................................................. 53 Opportunities for Further Research ........................................................................................................ 55 Conclusion .............................................................................................................................................. 57 References ..................................................................................................................................... 58 iii Figures Figure 2.1. Example Dual Calibration Chart from Performance Testing of Equation 2.2 ........... 22 Figure 2.2. Percentage of Tweeters in Top 13 Twitter Languages Meeting Minimum Volume Requirements over Time ....................................................................................................... 28 Figure 3.1. Correspondence Between Resonance Scores and NOMINATE Scores .................... 39 Figure 3.2. Correspondence Between Resonance Scores and Congressional Mentions Network 43 Figure 4.1. Word Distributions for Tweets in 15 Languages, Plus Our Synthetic Language ...... 47 Figure 4.2. Generating Distinctive Word Use Patterns ................................................................. 48 Figure 4.3. Average Accuracy ...................................................................................................... 51 Figure 4.4. Expected Accuracy Rates Given Moderate Topical Focus ........................................ 52 iv Tables Table 2.1. Example of Text Regularization—World Series Outcomes ........................................ 14 Table 2.2. Example of Signature Creation—Cities in Which Major League Baseball World Series Occurred, Before and After 1967 ............................................................................... 17 Table 2.3. Resonance Scoring Text Vignettes and Authors ........................................................ 18 Table 2.4. Resonance Detection Rates from the Test Simulation Underlying Figure 2.3 ............ 23 Table 2.5. Proposed Scoring Table for Population of Interest Language Distinctiveness ........... 25 Table 2.6. Guidance Table for Determining Minimum Number of Training Examples .............. 26 Table 3.1. Congressional Tweet Data ........................................................................................... 30 Table 3.2. New Media Data .........................................................................................................