Big Data and Audience Measurement: a Marriage of Convenience? Lorie Dudoignon*, Fabienne Le Sager* Et Aurélie Vanheuverzwyn*
Total Page:16
File Type:pdf, Size:1020Kb
Big Data and Audience Measurement: A Marriage of Convenience? Lorie Dudoignon*, Fabienne Le Sager* et Aurélie Vanheuverzwyn* Abstract – Digital convergence has gradually altered both the data and media worlds. The lines that separated media have become blurred, a phenomenon that is being amplified daily by the spread of new devices and new usages. At the same time, digital convergence has highlighted the power of big data, which is defined in terms of two connected parameters: volume and the frequency of acquisition. Big Data can be as voluminous as exhaustive and its acquisition can be as frequent as to occur in real time. Even though Big Data may be seen as risking a return to the paradigm of census that prevailed until the end of the 19th century – whereas the 20th century belonged to sampling and surveys. Médiamétrie has chosen to consider this digital revolution as a tremendous opportunity for progression in its audience measurement systems. JEL Classification : C18, C32, C33, C55, C80 Keywords: hybrid methods, Big Data, sample surveys, hidden Markov model Reminder: The opinions and analyses in this article are those of the author(s) and do not necessarily reflect * Médiamétrie ([email protected]; [email protected]; [email protected]) their institution’s or Insee’s views. Receveid on 10 July 2017, accepted on 10 February 2019 Translation by the authors from their original version: “Big Data et mesure d’audience : un mariage de raison ?” To cite this article: Dudoignon, L., Le Sager, F. & Vanheuverzwyn, A. (2018). Big Data and Audience Measurement: A Marriage of Convenience? Economie et Statistique / Economics and Statistics, 505‑506, 133–146. https://doi.org/10.24187/ecostat.2018.505d.1969 ECONOMIE ET STATISTIQUE / ECONOMICS AND STATISTICS N° 505-506, 2018 133 uring the 20th century, census has gradu‑ Preamble: Data available in D ally declined in favor of sample surveys. Audience Measurement The founding act can be considered to be Anders N. Kiaer’s paper at the Congress of the Both survey data and Big Data exist for tele‑ International Statistical Institute in 1895 enti‑ vision and especially internet media. In both tled Observations et expériences concernant cases, audience measurement is based on a des dénombrements représentatifs. In 1934, panel and a semi‑automatic system of measure‑ Jerzy Neyman published the reference arti‑ ment. In this introduction, our aim is to briefly cle in sampling theory “On the two different describe the existing audience measurement aspects of representative methods, the method systems for television and internet applied by of stratified sampling and the method of pur- Médiamétrie in France. posive selection”. The growth of telephone equipment then encouraged the use of sample Internet surveys in many fields (public statistics, pol‑ itics, health, marketing, audience measure‑ Internet audience measurement relies on two ment, etc.). The end of the 20th century saw types of system: user‑centric measurement is a new paradigm shift with the emergence of dedicated to tracking internet site and app audi‑ Big Data: a return to the census. As a major ence behaviour for individuals across all of their player in this digital revolution, the media sec‑ devices. These systems are based on panels of tor has seen its measurement systems multiply individuals whose connections are measured and sometimes, inevitably, contradict itself. using meter software installed on their com‑ Médiamétrie, a benchmark institute for media puters, mobile phones or tablets that feeds data audience measurement in France, has had to back to Médiamétrie’s servers. The second change its methods to take advantage of the type of system is called site‑centric. This kind best of each source. of measurement relies on the insertion of tags (Box 1) into the websites and apps of clients subscribing to the measurement and produces The first part of the article deals with the rela‑ a total counting of the number of visits, page tive advantages and limits of survey data and views and connection times. Big Data, with an emphasis on the notion of quality in its various dimensions. This will Internet Audience Measurement on Computers allow to explain why Médiamétrie has chosen to see survey data and Big Data as complemen‑ Since the home computer is often a shared tary rather than in competition with one another. device, the panel consists of a cluster sample Indeed, we will look at how hybrid approaches: of all individuals aged 2 years and over within “the mix of two data sources that differ in a household. Therefore, the primary unit in the both nature and level to create a third, richer panel is the household and the secondary unit or more detailed one” have become the natural is the individual. Primary sampling units are approach (Médiamétrie, 2010). The second part recruited in accordance with the empirical quota will illustrate these approaches through two method. Once the meter has been installed on operational implementations in media audience all household computers, a pop‑up screen or measurement. We shall begin by introducing window will appear each time there is a con‑ the hybrid method used to measure internet nection. The secondary sampling units (indivi‑ audiences as part of the French market stand‑ dual household members) then have to identify ard since 2012 – an example of a so‑called themselves by ticking the box that corresponds to them. In September 2018, the panel com‑ panel‑up approach (Dudoignon et al., 2012). prised approximately 6,200 households with We shall finish by illustrating the so‑called internet access via a computer, i.e. more than log‑up approach used to measure the audience 14,000 individuals. of special interest channels (Dudoignon et al., 2014). In both cases, for Big Data to have any The scope of measurement is not limited to meaning or value, we must first understand connections to the internet at home. In fact, how it is acquired, which often includes tech‑ for the population in employment, a significant nical aspects performed to “clean” the data and proportion of their connections to the inter‑ process it in such a way as to create a poten‑ net occurs in the workplace. Nevertheless, the tially happy marriage with survey data. effort required by individuals to take part in 134 ECONOMIE ET STATISTIQUE / ECONOMICS AND STATISTICS N° 505-506, 2018 Big Data and Audience Measurement: A Marriage of Convenience? BOX 1 – Description of Measurement Technologies What is a Tag? the broadcast being measured of a mark (similar to a tattoo) that is inaudible to the human ear. This digital tat‑ In web analytics, a tag is an element that is inserted into too is inserted by a professional embedder validated by each content to be measured, so as to count the number Médiamétrie. The principle is to modify the signal broad‑ of content views. The content can be a page, an app, a casting the program with some additional information, podcast or even audio or video content. A code is inserted without affecting sequence audibility. At the other end, into the source code of the content. This generates a log the watermark is read by the TV meters connected to on the third‑party measurement system server each and the TV sets owned by panelists. The mark inserted by every time a content is viewed. This then makes a total the embedder contains identification information for the counting of connections to the tagged content possible. channel broadcasting the program, as well as regular What is Audio Watermarking? markers of the broadcast time. In this way, we can dif‑ ferentiate between the audience watching a live broad‑ A technology used for television audience measure‑ cast, the audience watching a pre‑recording and the ment, audio watermarking consists of the insertion into audience watching via a catch‑up TV platform. measurement – also known as the “response 11 years old, and in accordance with the con‑ burden” – prevents us from insisting that all straints imposed by France’s Data Protection secondary sampling units on the panel are addi‑ Act of 6th January 1978, participation by minors tionally measured at their place of work (if they is subject to the consent of an adult with paren‑ have a computer with internet access at work). tal authority. Like the system for measuring Such insistence would likely lead to very low connections via tablets, the panelist must install response rates. Therefore, the system is supple‑ an app on its mobile phone. This app routes the mented by an independent panel of individuals connections to Médiamétrie’s servers. All inter‑ who have internet access on a work computer. net traffic on the phone is attributed to the main In September 2018, there were 2,000 individu‑ user of the phone. Any use of the mobile phone als on this panel, and it is linked to the preced‑ by a secondary user is therefore, by convention, ing panel by statistical matching (Fisher, 2004). assigned to the main user. In September 2018, the panel consisted of 11,000 individuals aged Internet Audience Measurement on Tablets 11 and older. The principle of internet audience measure‑ Measurement of Secure Connections ment on tablets is very similar to that for com‑ puter measurement. Given that tablets are still Participation in user‑centric measurement sys‑ hardly used in businesses, the scope for meas‑ tems begins with the signature of an agreement urement is for the moment restricted to house‑ between Médiamétrie and its panelists. This holds. The panel consists of a cluster sample agreement details the respective commitments of individuals from within the recruited house‑ of Médiamétrie and the panelists. In particular, holds. The latter must install a measurement Médiamétrie undertakes to collect panelist user app on all tablets used in their home and must data for purely statistical purposes.