
The Potential of the Computational Linguistic Analysis of Social Media for Population Studies

Letizia Mencarini
Bocconi University, Dondena Centre for Research on Social Dynamics and Public Policy
via Roentgen 1, 20136 Milan, Italy

Proceedings of the Second Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media, pages 62–68, New Orleans, Louisiana, June 6, 2018. © 2018 Association for Computational Linguistics

Abstract

The paper provides an outline of the scope for synergy between computational linguistic analysis and population studies. It first reviews where population studies stand in terms of using social media data. Demographers are entering the realm of big data in force. But, this paper argues, population studies have much to gain from computational linguistic analysis, especially in terms of explaining the drivers behind population processes. The paper gives two examples of how the method can be applied, and concludes with a fundamental caveat. Yes, computational linguistic analysis provides a possible key for integrating micro theory into any demographic analysis of social media data. But results may be of little value inasmuch as fundamental sample characteristics are unknown.

1 The incomplete data revolution in demography

Demography is the study of population. Traditionally, demography is concerned with measuring and estimating population change by births, deaths and migration. Demography is rooted in quantitative methods, with data at its heart. As the field moved through different epochs of data availability, in demography data have always been "big" (Billari and Zagheni, 2017). Starting with the exercise of mapping macro-level trends through population level parameters, based largely on census and administrative records, the field became more theory driven as individual data became available. It is fair to say that with the explosion in available survey data, a revolution in demographic studies took place. Rather than simply describing demographic patterns, today demographers are equally concerned with understanding both the drivers and the consequences of demographic processes. In doing so, demographers have assembled an enormously rich set of data for explaining not only population processes, but also the motivational and behavioral drivers behind these processes. However, data generated by surveys may have peaked. As survey and polling agencies struggle with increasing costs and declining survey response rates, statistics producers are increasingly looking towards big data. Still, given their quantitative pedigree, demographers are perhaps better placed than most other social scientists to take on the challenge of the new big data revolution.

Demographers are, in fact, already using big data to describe demographic processes, including data derived from social media. But there are challenges. Big data is messy and unstructured, and this is a considerable challenge for a scientific field acutely concerned with representativeness and unbiased estimation. Social media provides a promising avenue, however, as demographers are interested not only in describing population processes, but also in the motivations that individuals have for their behavior, which, ultimately, generates observed population processes. For demographers in search of the determinants and consequences of demographic behavior, the linguistic analysis of social media texts can offer a precious and rich new source. Caution, as this paper highlights, is necessary, since the non-representativeness and partiality of these texts make them problematic in social-science terms.

The rapid emergence of big data from social media outpaced social scientists’ capacity for using and analyzing them. That having been said, demographers have made a start in exploiting social media data. For example, Reis and Brownstein (2010) show that the volume of Internet searches for abortion is inversely proportional to local abortion rates and directly proportional to local restrictions on abortion. Billari et al. (2013) show that Google searches for fertility-related queries, like ‘pregnancy’ or ‘birth’, can be used to predict fertility intentions and consequently fertility rates, several months ahead of them being made public through other data sources. Ojala et al. (2017) use Google Correlate to detect evidence for different socio-economic contexts related to fertility (e.g., teen fertility, fertility in high income households, etc.). Email data have been used to track migrants (Zagheni and Weber, 2012) and Facebook data to monitor migrant stocks (Zagheni et al., 2017), while patterns of short- and long-term migration (Zagheni et al., 2014) and of family change (Billari et al., 2017) have been derived from Twitter data. These applications are important, and have demonstrated that the combination of survey and internet data improves predictive power and the accuracy of the described demographic phenomena. Billari and Zagheni (2017) triumphantly affirm that the Data Revolution is already here for the study of population processes. However, these studies are all ultimately about describing demographic processes. So far, progress in exploiting content analysis of texts and corpora has been limited, and existing studies have not yet tackled how social media data can explain the behavioral motivations that drive observed population processes. On this point, there is massive potential for synergy between demography and computational linguistics. Certain strands of the social sciences have started looking in this direction, as there are several examples in political science and political economy.

2 Why people’s opinions matter

In order to exploit social media data to explain the determinants of population processes, one has, perforce, to delve into the behavioral theories commonly invoked in demographic studies. For population studies, there is no single theory. Instead, demography being an interdisciplinary science, demographers borrow from a host of theoretical concepts from across the social sciences. One example is the Second Demographic Transition theory (Van de Kaa, 1987; Lesthaeghe, 2010), which has been a point of reference in family demography in recent decades. The theory stems from Inglehart’s (1971) work. He argued that with the onset of modernization, individuals now cared more about self-realization and less about traditional family life, which consequently fostered new demographic behaviors, such as out-of-wedlock childbearing, cohabitation replacing marriage and fertility decline. In other words, values, attitudes and opinions play a critical role. Another example concerns the theoretical concept of gender equality and equity. As women increasingly attain the same levels of higher education as men, their attitudes change. Besides having children, they also want fulfilling work careers (Esping-Andersen and Billari, 2015; Aassve et al., 2015). The sense of gender equity (Mencarini, 2014) changes as women reach men’s level in terms of education, but traditional attitudes may prevail within households. If so, there is a mismatch between gender equity and actual equality, which, McDonald (2000) argues, creates a gender conflict, which eventually leads to lower fertility. Yet another important theoretical concept originates in economics. Economic models are used to explain changes in divorce, migration drivers, fertility and so forth. Starting with individual preferences, behavior emerges through a process of decision making, where individuals’ (presumed) rational evaluations are made in order to maximize their wellbeing.

As one moves from survey data to a social media corpus, these theoretical concepts offer both challenges and opportunities. On the one hand, new methods, not always familiar to demographers, must be implemented. On the other, there is opportunity in the fact that social science theories can show us what one should be looking for in an otherwise complex and sometimes overwhelming amount of data.

3 Social media linguistic analysis as a middle ground between qualitative and quantitative analysis

One important reason behind the slow progress in the field is, perhaps, that demographers are more confident with the analysis of numbers than with text: i.e. with quantitative rather than qualitative analysis. Or, perhaps, there is still uncertainty and suspicion about the extent to which social media data can be used to properly infer theoretical concepts for demography. Developments are being made elsewhere in the social sciences. However, the most prominent examples are based on digitized historical texts. The approach taken is similar to what is being done with social media data, in the sense that one exploits distributional semantic techniques. This is a ‘usage-based’ theory of meaning built upon similarities of linguistic distributions in a corpus (Lenci, 2008), and it allows for the extraction of (near-) synonyms in a context-dependent way, for each document and period [...]

The challenge lies in how text data can be investigated for research questions which require closer analysis and nuanced interpretation. But neither traditional qualitative approaches requiring the manual screening and classification of all the material, nor quantitative statistical analysis, can be applied. In this sense, social media texts provide a middle ground between qualitative studies and [...]
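
The distributional idea mentioned in this section, that words occurring in similar contexts tend to have similar meanings, can be illustrated with a minimal sketch. This is not the paper's method or data: the toy corpus, the window size, and the function names (`cooccurrence_vectors`, `cosine`, `nearest`) are all assumptions made purely for illustration, and real applications would use far larger corpora and more refined weighting schemes.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest(word, vectors, k=3):
    """Return the k words whose context vectors are most similar to `word`."""
    target = vectors[word]
    scores = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical toy corpus standing in for social media posts.
corpus = [
    "we are expecting a baby in spring",
    "we are expecting a child in spring",
    "the baby kept us awake all night",
    "the child kept us awake all night",
    "the train was late again this morning",
]
vectors = cooccurrence_vectors(corpus)
print(nearest("baby", vectors, k=3))
```

In this toy corpus "baby" and "child" appear in exactly the same contexts, so they receive the same context vector and come out as nearest neighbours, while "train" shares only the article "the" with them. Building the vectors separately for each document or period would be one simple way to make such near-synonym extraction context-dependent.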