
Policy Research Working Paper 9329

Big Data for Sampling Design

The Venezuelan Migration Crisis in Ecuador

Juan Muñoz, José Muñoz, Sergio Olivieri

Poverty and Equity Global Practice
July 2020

Abstract

The worsening of Ecuador's socioeconomic conditions and the rapid inflow of Venezuelan migrants demand a rapid government response. Representative information on the migration and host communities is vital for evidence-based policy design. This study presents an innovative methodology based on the use of big data for the sampling design of a representative survey of migrants and host communities' populations. This approach tackles the difficulties posed by the lack of information on the total number of Venezuelan migrants (regular and irregular) and their geographical location in the country. The total estimated migrant population represents about 3 percent of the total Ecuadoran population and is settled across urban areas, mainly in Quito, Guayaquil, and Manta (Portoviejo). The strategy implemented may be useful in designing similar exercises in countries with limited information (that is, lack of a recent census or migratory registry) and scarce resources for rapidly gathering socioeconomic data on migrants and host communities for policy design.

This paper is a product of the Poverty and Equity Global Practice. It is part of a larger effort by the Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors may be contacted at [email protected].

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

Produced by the Research Support Team

Big Data for Sampling Design: The Venezuelan Migration Crisis in Ecuador∗

Juan Muñoz1, José Muñoz2 and Sergio Olivieri3

JEL: C8, C83, F22

Keywords: Sampling design, stratification, Big Data, migration, weight calibration, maximum entropy.

∗This paper has benefited from comments by Ana Aguilera, Tara Vishwanath, Nandini Krishnan, Beatriz Godoy, Teresa Reinaga, Ana Rivadeneira, Tanja Goodwin, Carlos Vayas from Telefonica de Ecuador, and Alexandra Escobar and Paul Guerrero from UNICEF Ecuador. We are very grateful to Roberto Carrillo (former Director of the National Statistical Office, INEC by its Spanish acronym) and to Christian Garces, Xavier Núñez and Francisco Céspedes from INEC's sampling division for facilitating access to updated information from the 2010 Census. Different versions of the paper benefited from comments by participants at the Conference on Inclusion of Refugees in National Surveys that Measure Poverty, October 2-3, 2019, Washington, DC; the Research Conference on Forced Displacement, January 16-18, 2020, Copenhagen; and Urban Migration & Forced Displacement: Data Collection in Fragile States, February 27, 2020, Washington, DC. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent.

1 Sistemas Integrales.
2 Telefónica de Ecuador.
3 World Bank, Poverty and Equity Global Practice.

I. Introduction

The República Bolivariana de Venezuela is experiencing a multifaceted humanitarian, economic, and social crisis that has led to an exodus without precedent in the region. By 2019, approximately 4.3 million people had left the República Bolivariana de Venezuela. This exodus resembles the experiences of other conflict-affected states, such as the Syrian Arab Republic, Afghanistan, Somalia, and South Sudan. Eighty percent of Venezuelan emigrants moved to other countries of the Americas, with Ecuador among the main recipient countries.
At the beginning of the Venezuelan exodus, Ecuador was characterized as a country of transit to Peru or to other countries, such as Chile. Since 2015, more than 1.2 million Venezuelans have passed through Ecuador in a "humanitarian corridor".4 However, between 2015 and September 2019, almost 400,000 Venezuelans decided to settle in Ecuador.5 The hardening of migration policies in other countries of the region, combined with family reunification and the search for better economic opportunities, persuaded many migrants to consider Ecuador as a destination country. The migrant stock in Ecuador tripled between 2017 and 2018, and by 2019 it had almost doubled again relative to the same period of the previous year (Figure I-1).

FIGURE I-1: THEY ARE NO LONGER IN TRANSIT; MORE AND MORE ARE DECIDING TO STAY
[Chart: monthly arrivals and accumulated migratory balance (from January 2015), by month, 2015 to 2019]
Source: Own estimates based on data from the Ministry of Government of Ecuador, August 2019.

The Venezuelan exodus coincided with Ecuador's economic crisis, but it was not its cause. Oil prices had been falling since mid-2014, and, as a result, the Ecuadoran government had to cut spending, which had been the main engine of growth. The economy stagnated in 2015 and contracted by 1.2 percent in 2016.6 Because of fiscal pressures, the coverage of Ecuador's main conditional cash transfer program was also reduced. Working conditions deteriorated: unemployment, underemployment, and informality rates rose to levels not seen in 10 years. Between 2013 and 2018, approximately 66,000 good jobs were lost. This significant reduction was partially offset by an increase in jobs offering fewer hours per week, lower wages, or both.

4 http://reporting.unhcr.org/node/2543.
5 Ministry of Government of Ecuador. Available at https://www.ministeriodegobierno.gob.ec/migracion/.
6 World Bank (2018).

The worsening of socioeconomic conditions demands a rapid government response, and representative information on migrants and host communities is vital for evidence-based policy design. However, conducting a representative survey of the host and migrant populations posed two main challenges: even though Ecuador has a reliable and updated sampling frame for the resident population, neither the total number of Venezuelans nor their geographical location in the country was known at the time of the survey.7 The official data provide only the net flows of migrants who entered and exited the country through an official border crossing; no official estimates exist for migrants who entered without registering, so the total Venezuelan migrant population could not be established. In 2019, Ecuador lagged behind its neighboring countries in the availability of such data: Peru conducted its national census in 2017, and Colombia had collected not only its 2018 national census but also a national migratory registry. Unlike in other Latin American host countries, Venezuelan migrants in Ecuador were expected to be spatially clustered along the corridor. For instance, while in Peru over 84 percent of migrants are in , in Ecuador it is estimated that only around 60 percent of Venezuelan migrants live in five main cities: Quito, Guayaquil, Cuenca, Manta, and Santo Domingo.8 This pattern is likely common in population movements of this type and has significant implications for both the survey's intended analyses and its sampling design. From an analytical standpoint, assessing the impact of immigrants on the host population depends on their spatial proximity and relative density.
From the sampling viewpoint, the clustering of immigrants makes it possible to implement a design that has proven successful in similar surveys elsewhere, such as surveys of immigrants from the rest of the continent in South Africa (Plaza, Navarrete and Ratha, 2011); of Syrian refugees and their hosts in Lebanon, Jordan and Iraqi Kurdistan (World Bank, 2018; Aguilera, Krishnan, Muñoz, Russo, Sharma and Vishwanath, 2020); and of Rohingya from Myanmar in Bangladesh (World Bank, 2020 b). The paper is structured as follows. Section II describes the innovative use of Big Data from Telefonica de Ecuador to devise a strategy for generating representative samples of host and migrant households for the Human Mobility and Host Community Survey (EPEC, by its Spanish acronym). Section III presents the implementation strategy. Section IV concludes by drawing general lessons from our experience in sampling migrant populations.

7 The Human Mobility and Host Communities Survey in Ecuador (Encuesta de personas en Movilidad y Comunidades de Acogida – EPEC acronym in Spanish). 8 World Bank (2020 a).


II. Big Data at Work

Ecuador's most recent population and housing census was conducted in 2010 by the National Statistical Institute (INEC, by its Spanish acronym). It reported the population and number of households in each of Ecuador's 40,681 census sectors. The census cartography is fully digitized, and polygon shapefiles are available for each sector. The cartography has been partially updated over the years and has been the primary source of information for the sampling design of all official surveys, such as labor, health, nutrition, and multipurpose surveys. These surveys are representative of the Ecuadoran population residing in the country at different geographical levels. However, the census cartography has not been updated enough to capture the recent influx of Venezuelan migrants, so it cannot be used on its own to build a representative sample of the migrant and host populations. Defining a sampling strategy to yield representative samples of host and displaced populations therefore demanded a key innovation: the incorporation of Big Data into the sampling design. The flow of migrants mainly affects urban areas and the main cities along the corridor from the northern border with Colombia to the southern border with Peru. This geographical pattern encouraged the use of Big Data from telecommunication companies, whose location data are most accurate in urban areas. Two leading cell phone providers in Ecuador were asked to collaborate, but only one of them, Telefonica de Ecuador, had the technical capacity to implement the proposed sampling strategy. Telefonica de Ecuador has extensive experience in processing and providing the type of information requested for this sample.9 Although its market share is only around 30 percent, or approximately 4.6 million clients in Ecuador, it is the only private company with a significant presence in the mobile phone market of the República Bolivariana de Venezuela.
It could therefore be expected to attract most Venezuelan consumers seeking cell phone services in Ecuador. Telefonica stores around 15 million call detail records (CDRs) and 180 million external detail records (xDRs) per day on average.10 For this project, Telefonica processed more than 4.7 billion mobile records (CDRs and xDRs combined) covering the period from June 2018 to March 2019.11 Telefonica's database was analyzed to determine how many of its active mobile phones were likely to belong to a Venezuelan living in each of INEC's census sectors. The process was conducted in three phases:

• First, each mobile phone in the database was tagged as active or not;
• Second, each active phone was tagged as likely or not to belong to a Venezuelan;

9 The company has been working since 2017 with the Ministry of Tourism to identify strategic travel destinations through the analysis of Big Data (Telefonica de Ecuador, 2018). It has also worked with Quito's municipality on an urban mobility study for the Metro (Telefonica de Ecuador, 2017).
10 Data are collected in two ways: actively or passively. Active collection occurs when the owner of the cellphone generates an event that can be charged to the owner, such as a call, a text message, or a data session; the coordinates of these events are registered and stored for future analysis. Passive collection occurs when the device captures an event that is not necessarily charged to the owner (Telefonica de Ecuador, 2019).
11 There is ample evidence in the literature on the use of CDRs and xDRs to determine individuals' place of residence, based on micro-simulation techniques, at the global level and across different issues such as employment shocks and infectious diseases (Telefonica de Ecuador, 2019).


• Third, each of the active Venezuelan phones was assigned to the primary sampling unit (PSU) where the owner was most likely to reside.

II.1 Identifying active mobile phones
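The activity rule stated below reduces to counting events and distinct days per phone inside a rolling window. The following Python sketch is illustrative only, not Telefonica's implementation: the `Event` record and `active_phones` function are hypothetical names, and it assumes each event has already been reduced to a phone id and a calendar day, and that "at least" applies to both thresholds.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass(frozen=True)
class Event:
    """One network event (power on/off, call, SMS or data session)."""
    phone_id: str
    day: date

def active_phones(events, as_of, window_days=90, min_events=60, min_days=30):
    """Ids of phones with >= min_events events on >= min_days distinct days
    within the window_days days ending on as_of (inclusive)."""
    start = as_of - timedelta(days=window_days)
    n_events, days_seen = {}, {}
    for ev in events:
        if start < ev.day <= as_of:
            n_events[ev.phone_id] = n_events.get(ev.phone_id, 0) + 1
            days_seen.setdefault(ev.phone_id, set()).add(ev.day)
    return {pid for pid, n in n_events.items()
            if n >= min_events and len(days_seen[pid]) >= min_days}

d0 = date(2019, 3, 31)
# Phone "a": 2 events a day on 30 days -> 60 events on 30 days (active)
events = [Event("a", d0 - timedelta(days=k)) for k in range(30) for _ in range(2)]
# Phone "b": 60 events squeezed into 10 days (not active)
events += [Event("b", d0 - timedelta(days=k)) for k in range(10) for _ in range(6)]
print(active_phones(events, d0))  # {'a'}
```

The distinct-day condition is what separates a phone used heavily for a short visit from one used regularly over the period.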

A phone was tagged as active if, during the past 90 days, it generated at least 60 of the following events on at least 30 different days: being turned on or off, making or receiving voice calls, sending or receiving text messages, or accessing the Internet.

II.2 Identifying Venezuelan mobile phones
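The nationality rule described below can be sketched as a registration check followed by a count of Venezuela-related contacts. This is a minimal illustration under stated assumptions, not the operator's code: `is_venezuelan` and the `(kind, target)` event format are hypothetical, the site list is a placeholder, and the sketch assumes the 30 uses are counted jointly across calls, texts and website visits (the text does not say whether each channel is counted separately). The only fixed fact used is Venezuela's international dialing code, +58.

```python
VE_PREFIX = "+58"  # international dialing code of Venezuela

def is_venezuelan(nationality, events_30d, sites_of_interest, min_contacts=30):
    """Sketch of the Section II.2 rule.

    nationality       : registered nationality of the line holder (e.g. "VE")
    events_30d        : [(kind, target)] for the past 30 days, where kind is
                        "call", "sms" or "web" and target is a number or domain
    sites_of_interest : domains of interest to Venezuelans (hypothetical list)
    """
    if nationality == "VE":
        return True                      # registered to a Venezuelan national
    hits = 0
    for kind, target in events_30d:
        if kind in ("call", "sms") and target.startswith(VE_PREFIX):
            hits += 1                    # call or text to/from Venezuela
        elif kind == "web" and target in sites_of_interest:
            hits += 1                    # visit to a site of interest
    return hits >= min_contacts

sites = {"visa-portal.example", "jobs-board.example"}  # placeholder domains
log = [("call", "+584121234567")] * 20 + [("web", "jobs-board.example")] * 10
print(is_venezuelan("EC", log, sites))        # True
print(is_venezuelan("EC", log[:29], sites))   # False
```

Ecuadoran numbers (+593) do not start with +58, so domestic traffic is not counted.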

Active mobile phones were tagged as Venezuelan if they were registered under the name of a Venezuelan national or if, during the past 30 days, they were used at least 30 times to make or receive voice calls to or from the República Bolivariana de Venezuela, to send or receive text messages to or from the República Bolivariana de Venezuela, or to access websites of interest to Venezuelans (for example, visa applications or job opportunities).

II.3 Georeferencing the phones
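The residence rule described below (night-time events, clusters of neighboring antennas, an event-weighted centroid, and a PSU lookup) can be sketched in a few functions. The paper does not specify the clustering method or distance, so this Python sketch assumes single-linkage grouping of antennas closer than a hypothetical 1,000 m, antenna centroids in projected planar coordinates (meters), and a plain ray-casting point-in-polygon test instead of a GIS library. All function names and antenna ids are illustrative.

```python
import math
from collections import Counter

def is_night(hour):
    """Evening window used to infer residence: 20:00 to 06:00."""
    return hour >= 20 or hour < 6

def cluster_antennas(points, radius):
    """Single-linkage grouping: antennas within `radius` of each other share a group."""
    ids = list(points)
    parent = {a: a for a in ids}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if math.dist(points[a], points[b]) <= radius:
                parent[find(a)] = find(b)
    groups = {}
    for a in ids:
        groups.setdefault(find(a), []).append(a)
    return list(groups.values())

def home_location(events, antenna_xy, radius=1000.0):
    """events: [(hour, antenna_id)]. Event-weighted centroid of the busiest night cluster."""
    counts = Counter(ant for hour, ant in events if is_night(hour))
    if not counts:
        return None
    clusters = cluster_antennas({a: antenna_xy[a] for a in counts}, radius)
    best = max(clusters, key=lambda c: sum(counts[a] for a in c))
    total = sum(counts[a] for a in best)
    x = sum(counts[a] * antenna_xy[a][0] for a in best) / total
    y = sum(counts[a] * antenna_xy[a][1] for a in best) / total
    return (x, y)

def point_in_polygon(pt, poly):
    """Ray-casting test; poly is a list of (x, y) vertices."""
    x, y = pt
    inside = False
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside

def assign_psu(pt, psu_polygons):
    """Return the id of the first PSU polygon containing pt, or None."""
    for psu_id, poly in psu_polygons.items():
        if point_in_polygon(pt, poly):
            return psu_id
    return None

antennas = {"A1": (0.0, 0.0), "A2": (400.0, 0.0), "B1": (5000.0, 5000.0)}
events = [(22, "A1")] * 3 + [(2, "A2")] + [(23, "B1")] * 2 + [(12, "B1")] * 10
home = home_location(events, antennas)
print(home)  # (100.0, 0.0)
psus = {"PSU-1": [(-500, -500), (500, -500), (500, 500), (-500, 500)]}
print(assign_psu(home, psus))  # PSU-1
```

In the example the ten daytime events on B1 are ignored, and A1 and A2 form the busiest night cluster, so the home point is pulled toward A1 in proportion to its three events.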

Cellphones are mobile by nature. For this exercise, a phone's residence was based on the events generated between 8 pm and 6 am, the evening hours, when the owner can reasonably be assumed to be at home.12 Each event uses one (or occasionally more than one) of Telefonica's antennas, and each antenna is characterized by a single territorial point: the centroid of its coverage area. The antennas used by a phone in the evening hours over a given number of days are first clustered into groups of neighboring antennas, and the most frequently used cluster during that period is selected as the one that defines the owner's residence. The phone's location is computed as the average of the centroids of the antennas in that cluster, weighted by the number of events that used each antenna. Finally, the phone is assigned to the PSU in which that point lies.

II.4 Extrapolating to the overall population
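The two adjustments described below can be sketched as successive scalings of Telefonica's sector counts. The text does not give the exact form of the second factor, so this Python sketch assumes, hypothetically, that it divides phone counts by the share of the population using a mobile phone (that is, one phone per user); the function name, the market share of 0.25 and the penetration of 0.5 are illustrative values, not figures from the study.

```python
def extrapolate(tel_counts, canton_share, penetration):
    """Scale Telefonica sector counts up to all operators and to people.

    tel_counts   : {sector: (telefonica_phones, telefonica_venezuelan_phones, canton)}
    canton_share : {canton: Telefonica market share in that canton}
    penetration  : assumed share of the population using a mobile phone
    Returns {sector: (estimated_total, estimated_venezuelan)}.
    """
    out = {}
    for sector, (phones, ven_phones, canton) in tel_counts.items():
        share = canton_share[canton]
        all_phones = phones / share        # all operators, via canton market share
        all_ven = ven_phones / share
        # Hypothetical second factor: phones -> people via the mobile-phone use rate
        out[sector] = (all_phones / penetration, all_ven / penetration)
    return out

est = extrapolate({"S1": (300, 30, "Quito")}, {"Quito": 0.25}, penetration=0.5)
print(est)  # {'S1': (2400.0, 240.0)}
```

Because both counts in a sector are divided by the same factors, the estimated Venezuelan share of each sector is unaffected by these adjustments; they matter only for the levels.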

The above process determines how many of Telefonica's phones reside in each PSU and how many of them are likely to be Venezuelan. These figures were adjusted by canton-level factors based on Telefonica's market shares (known from other sources) to estimate the total number of phones, and the total number of Venezuelan phones, from all companies in each sector. A further adjustment factor, based on the fraction of the population that uses mobile phones, was applied to make the figures consistent with the total number of households reported by INEC in each sector in the 2010 Census.

II.5 Results

At the country level, the total number of Venezuelan migrants in 2019 estimated from Telefonica de Ecuador's data was 470,095. This population represented about 3 percent of the total population of Ecuador in 2019 and

12 The workplace was also determined by monitoring cellphone activity from 9 am to 5 pm, to assess whether Venezuelans travel far from their homes to work.

spread across the whole territory. Venezuelan migrants are concentrated in the largest cities along the corridor, from the Colombian border in the north to the Peruvian border in the south (Figure II-1). Given that this migration flow mainly affects urban areas, it is not surprising that the map shows higher densities of migrants in Quito, Guayaquil, and Manta (Portoviejo), where migrants could find more economic opportunities, whether to settle or to continue their journey to countries farther south.

FIGURE II-1: WHERE VENEZUELAN MIGRANTS HAVE SETTLED IN ECUADOR

Source: Own estimates based on Telefonica de Ecuador, March 2019.

As Figure II-2 shows, there is high geographic variation, ranging from cantons hosting almost 90,000 Venezuelan migrants (Guayaquil and Quito) to others hosting none. Four cantons (Guayaquil, Quito, Manta, and Santo Domingo) concentrate more than half of the estimated Venezuelan migrant population. Moreover, these migrants not only reside but also work in the same areas where they settled (Figure II-2, Panel B).

FIGURE II-2: WHERE VENEZUELAN MIGRANTS RESIDE AND WORK
PANEL A: GUAYAS CAPTURES 30 PERCENT OF TOTAL VENEZUELANS [Chart: total Venezuelan residents and share of residents, by province]
PANEL B: VENEZUELANS RESIDE AND WORK IN THE SAME CANTON [Chart: total Venezuelan residents and workers, and their shares, by canton]
Source: Own estimates based on Telefonica de Ecuador, March 2019.


III. Implementation
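The first-stage design described in Section III.1 below combines a density-based stratification of census sectors with selection proportional to size (PPS) within each stratum. The following Python sketch is illustrative: the function names and the toy frame are hypothetical, the treatment of densities exactly equal to 0.05 or 0.15 is an assumption (the text does not say), and systematic PPS is one common way of drawing a PPS sample, assumed here rather than confirmed by the source. In the actual design the allocation would be `{"high": 100, "medium": 80, "low": 20}`.

```python
import random

def density_stratum(ven_phones, population):
    """Classify a sector by Venezuelan density (Section III.1 thresholds)."""
    d = ven_phones / population
    if d > 0.15:
        return "high"
    if d >= 0.05:          # handling of the 0.05 and 0.15 boundaries is an assumption
        return "medium"
    return "low"

def pps_systematic(units, k, rng):
    """Systematic PPS: pick k of [(unit_id, size)] with prob. proportional to size.
    A unit whose size exceeds total/k can be hit more than once."""
    total = sum(size for _, size in units)
    step = total / k
    start = rng.random() * step            # random start in [0, step)
    points = [start + j * step for j in range(k)]
    chosen, cum, idx = [], 0.0, 0
    for unit, size in units:
        cum += size
        while idx < k and points[idx] < cum:
            chosen.append(unit)
            idx += 1
    return chosen

def first_stage(frame, allocation, rng):
    """frame: [(sector, ven_phones, population, census_households)]."""
    by_stratum = {}
    for sector, ven, pop, households in frame:
        by_stratum.setdefault(density_stratum(ven, pop), []).append((sector, households))
    return {s: pps_systematic(units, allocation[s], rng) for s, units in by_stratum.items()}

frame = [("S1", 50, 200, 60), ("S2", 40, 400, 100), ("S3", 1, 300, 90), ("S4", 2, 500, 120)]
print(first_stage(frame, {"high": 1, "medium": 1, "low": 1}, random.Random(7)))
```

Telefonica's predictions enter only through `density_stratum`; the selection itself uses census size measures.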

The overall sampling strategy was standard. A nominal sample of 2,800 households was to be selected in two stages, with 200 census sectors (hereafter referred to as primary sampling units, or PSUs) in the first stage and 14 households per PSU in the second stage. However, the design departed from the standard in the way it addressed stratification in both stages.

III.1 First sampling stage

Census sectors were stratified into three categories according to their Venezuelan migrant density, defined as the ratio between the number of Venezuelan cellphones in the sector (as per Telefonica's prediction) and the total population of the sector (as per INEC's population census), and the 200 PSUs were allocated to these strata as follows:

o 100 high-density PSUs: where the density was more than 0.15;
o 80 medium-density PSUs: where the density was between 0.05 and 0.15; and
o 20 low-density PSUs: where the density was less than 0.05.

Within each stratum, the sample was selected with probability proportional to the number of households reported by the 2010 Census.

III.2 Second sampling stage

All households in each of the selected sectors were listed and stratified into three categories according to nationality and demographic composition. In each sector, households were selected from these strata as follows:

o Seven non-Venezuelan households (i.e., Ecuadoran, Peruvian, Colombian, or any other nationality except Venezuelan).
o Seven Venezuelan households without children under 5 years of age.13
o All Venezuelan households with children under 5 years of age.
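The three household strata above, together with the systematic equal-probability selection described next, can be sketched for a single PSU as follows. The listing format and function names are hypothetical; the sketch follows footnote 13 by taking all childless Venezuelan households when fewer than seven are listed, and treats Venezuelan households with children under 5 as a take-all stratum.

```python
import random

def systematic_sample(units, m, rng):
    """Equal-probability systematic selection of m units; take-all if m >= len(units)."""
    n = len(units)
    if m >= n:
        return list(units)
    step = n / m                      # sampling interval (> 1 here)
    start = rng.random() * step       # random start in [0, step)
    return [units[int(start + j * step)] for j in range(m)]

def second_stage(listing, rng, m_nonven=7, m_ven_nochild=7):
    """listing: [(household_id, is_venezuelan, has_child_under_5)] for one PSU."""
    strata = {"non_venezuelan": [], "ven_no_child": [], "ven_child": []}
    for hh_id, is_ven, child_u5 in listing:
        if not is_ven:
            strata["non_venezuelan"].append(hh_id)
        elif child_u5:
            strata["ven_child"].append(hh_id)
        else:
            strata["ven_no_child"].append(hh_id)
    return {
        "non_venezuelan": systematic_sample(strata["non_venezuelan"], m_nonven, rng),
        "ven_no_child": systematic_sample(strata["ven_no_child"], m_ven_nochild, rng),
        "ven_child": list(strata["ven_child"]),   # take-all stratum
    }

listing = [(f"N{i}", False, False) for i in range(20)]
listing += [("V1", True, False), ("V2", True, False), ("V3", True, False)]
listing += [("C1", True, True), ("C2", True, True)]
sample = second_stage(listing, random.Random(3))
print({k: len(v) for k, v in sample.items()})
# {'non_venezuelan': 7, 'ven_no_child': 3, 'ven_child': 2}
```

The example shows why the realized sample fell short of 14 households per PSU: with only three childless Venezuelan households listed, all three are taken.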

Within each stratum, households were selected by systematic equal-probability sampling. The second stage posed some fieldwork management difficulties but is conceptually straightforward. A full listing operation of all households was conducted in the 200 PSUs selected in the first stage. The listing operation consisted of a complete enumeration of all physical structures in the area, with each structure classified as a residential dwelling, commercial building, hospital, church, collective residence, building under construction, and so on. The listing operation recorded the number of households occupying each residential dwelling and the number of Venezuelan members of each household; for each Venezuelan household, it also recorded whether children under five years of age were present. Given the high mobility of migrants, the listing operation was conducted twice: first a month before fieldwork and again immediately before fieldwork. To ensure the quality and completeness of the listing, enumerators relied on the updated cartography from INEC and high-resolution paper maps identifying all buildings within each segment. The

13 In the PSUs with fewer than seven childless Venezuelan households, all of them were selected. Since this was the case in many PSUs, the sample size was reduced from 2,800 to 1,871 households.


procedure was implemented using smartphones and an application designed on the SurveyCTO Collect platform. Longitude, latitude, and altitude were also collected to monitor enumerators' work remotely. Enumerators created a record for each residential unit and household, following the protocol described in the 2019 Manual for Enumerators.14

III.3 Selection probabilities and sampling weights

Given the sampling design discussed above, the probability phij of selecting a household in the second-stage stratum hij of sector hi of first-stage stratum h is given by:

$$p_{hij} = \frac{k_h\, n_{hi}}{N_h} \times \frac{m_{hij}}{M_{hij}} \qquad (1)$$

where

kh is the number of sectors selected in the first-stage stratum h (20 in a stratum of low density, 80 in medium density and 100 in high density),

nhi is the population in sector hi, as per the 2010 Housing and Population Census;

Nh is the population in first-stage stratum h, as per the 2010 Housing and Population Census;

mhij is the number of households selected in the second-stage stratum hij; and

Mhij is the total number of households in the second-stage stratum hij.

The two fractions on the right-hand side of (1) are, respectively, the probability of selecting the sector in the first stage and the conditional probability of selecting the household in the second stage. The first probability is correct regardless of the accuracy of Telefonica's predicted number of Venezuelan phones in each sector: the predictions only determine the stratum to which each sector is assigned, while kh, nhi and Nh come from the sample allocation and the 2010 Census.
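Equation (1) and the weight whij = 1/phij can be checked with a small worked example. The Python sketch below uses a hypothetical function name and made-up numbers chosen to be exact in floating point: a high-density stratum with kh = 100 selected sectors, a sector of size 1,500 in a stratum of size 600,000, and 7 of 56 listed households selected.

```python
def selection_probability(k_h, n_hi, N_h, m_hij, M_hij):
    """p_hij from eq. (1): first-stage PPS probability times second-stage fraction."""
    first_stage = k_h * n_hi / N_h      # probability the sector is selected
    second_stage = m_hij / M_hij        # conditional probability for the household
    return first_stage * second_stage

p = selection_probability(k_h=100, n_hi=1_500, N_h=600_000, m_hij=7, M_hij=56)
print(p, 1 / p)  # 0.03125 32.0
```

Here the sector is drawn with probability 0.25 and the household with conditional probability 0.125, so each sampled household represents 32 households. If kh·nhi exceeded Nh for a very large sector, the first fraction would exceed one; such sectors would normally be treated as certainty selections, which the sketch does not handle.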

Each household in hij should be assigned a sampling weight (or raising factor) whij equal to the inverse of its selection probability (i.e., whij = 1/phij) to produce unbiased estimates from the sample.

III.4 Non-response adjustment and calibration

To account for non-response during the listing operation, the total number of households in each second-stage stratum hij (the denominator of the second ratio in (1)) was estimated as:

$$\hat{M}_{hij} = \begin{cases} \max\left(Y_{hij},\, Y'_{hij}\right) & \text{for non-Venezuelan households} \\ Y_{hij} & \text{for Venezuelan households without children} \\ m_{hij} & \text{for Venezuelan households with children} \end{cases} \qquad (2)$$

where Yhij is the total number of households listed in hij and Y'hij is the total number of households registered in hij by the 2010 Housing and Population Census.
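Equation (2) and the resulting prior weight can be sketched as follows. The function names and numbers are hypothetical; the comments give one reading of the rule (using the larger of the listed and census counts offsets households missed by the listing, and the take-all stratum has M̂hij = mhij), not a statement of the authors' code.

```python
def adjusted_stratum_total(kind, listed, census, selected):
    """Estimate M-hat_hij following eq. (2).

    kind     : "non_venezuelan", "ven_no_child" or "ven_child"
    listed   : Y_hij, households listed in the stratum
    census   : Y'_hij, households registered by the 2010 Census
    selected : m_hij, households selected in the stratum
    """
    if kind == "non_venezuelan":
        return max(listed, census)   # never below the census count
    if kind == "ven_no_child":
        return listed
    if kind == "ven_child":
        return selected              # take-all stratum: every listed household is selected
    raise ValueError(f"unknown stratum kind: {kind!r}")

def base_weight(k_h, n_hi, N_h, m_hij, M_hat):
    """Prior weight w_hij = 1 / p_hij, with M_hij replaced by its estimate."""
    return (N_h / (k_h * n_hi)) * (M_hat / m_hij)

M_hat = adjusted_stratum_total("non_venezuelan", listed=60, census=80, selected=7)
print(M_hat)  # 80
print(round(base_weight(100, 1500, 600_000, 7, M_hat), 2))  # 45.71
```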

The prior weights whij are constant for all members of all households in hij. These weights were calibrated using the characteristics of each household hijk in hij. For the Ecuadoran population, the calibrated weights match INEC's 2019 demographic projections for three geographic areas (Guayas, Pichincha, and the rest of the country) and five age groups by sex (0-4, 5-9, 10-14, 15-64, and 65+ years old). For the Venezuelan population,

14 Centre for Evaluation and Development (2019).


the calibrated weights match the net migrant flows reported by the Ministry of Government. Weights are the same for all members

of the household. We achieved this by looking for a set of weights {ωhijk} that differs as little as possible from the set {whij}, subject to several constraints, using the maximum entropy principle (Wittenberg, 2010). Calling w*hijk and ω*hijk the sums of the prior and calibrated weights of all members of household hijk, both should add up to the total population N:

$$\sum_{hijk} w^*_{hijk} = \sum_{hijk} \omega^*_{hijk} = N \qquad (3)$$

which can be written in terms of probabilities as

$$\sum_{hijk} r^*_{hijk} = \sum_{hijk} \rho^*_{hijk} = 1 \qquad (4)$$

where r*hijk = w*hijk/N and ρ*hijk = ω*hijk/N. The adjusted weights can be obtained by minimizing the cross-entropy

$$I(\rho, r) = \sum_{hijk} \rho^*_{hijk} \ln\left(\frac{\rho^*_{hijk}}{r^*_{hijk}}\right) \qquad (5)$$

subject to

$$E(x_c) = \sum_{hijk} \rho^*_{hijk}\, m_{x_c hijk}, \quad c = 1, \ldots, C; \qquad \sum_{hijk} \rho^*_{hijk} = 1 \qquad (6)$$

The term mxc,hijk is the mean of characteristic xc within household hijk, and E(xc) is the corresponding population value to be matched. In other words, this minimization problem asserts that we should pick the distribution ρ*hijk that meets the moment constraints and the normalization restriction and deviates as little as possible from the prior distribution r*hijk, that is, the distribution that requires the least additional information. The solution is given by:

$$\rho^*_{hijk} = \frac{r^*_{hijk}\, e^{X_{hijk}\hat{\lambda}}}{\Omega} \qquad (7)$$

where Xhijk is the vector of household means mxc,hijk, λ̂ is the vector of Lagrange multipliers for the moment constraints E(X), and Ω is the normalizing factor that makes the ρ*hijk sum to one, so that the calibrated weights ω*hijk = Nρ*hijk add up to the target population N. The numerical calculations used the maxentropy command for Stata.


IV. Final remarks

Weak economic growth and deteriorating social outcomes since the plunge in oil prices, combined with a large influx of Venezuelans, demanded a rapid response from Ecuador's government. The survey (EPEC) was designed and implemented to produce comparable findings on the lives and livelihoods of host communities and of Venezuelan migrants and refugees for policy design. Even though Ecuador has a reliable and updated sampling frame, the lack of information on how many Venezuelan migrants have settled in Ecuador and where they reside posed challenges for the design and implementation of the survey. This difficulty arises in many developing and developed countries affected by a rapid influx of people. Excluding these populations from national sampling frames yields a biased picture of the situation the country is experiencing (World Bank, 2018). Moreover, as the number of migrants and refugees continues to increase, it becomes more important to develop strategies to include them in frequent representative socioeconomic surveys.

The contribution of this study lies in its innovative methodological approach of using Big Data for sampling design. The methodology offers a practical way to approximate the total number of migrants (including regular and irregular migrants) and their location in a country. The strategy implemented in this survey can be useful in designing similar exercises in countries with limited information (i.e., lack of a recent census or a migratory registry) and scarce resources to rapidly gather socioeconomic data on migrants and host communities for policy design.15 Even though access to Big Data is beneficial for sampling design, researchers and policy makers may be limited by telecommunication companies' lack of knowledge and technical capacity to implement the method. Also, Big Data are particularly precise where both market share and coverage are high, as in urban areas. Thus, information on geographic regions where coverage is low or market shares are small (e.g., rural areas) should be treated with caution. This type of population is highly mobile and difficult to locate if the information is not frequently updated, which creates significant trade-offs between having the most recent data and the cost of obtaining it. Lastly, although user confidentiality is not an issue in this type of exercise (the number of Venezuelan phones was requested as an aggregate PSU-level figure), our telecommunication partner still censored the exact number whenever it referred to fewer than five clients.

15 On July 25, 2019, the Government of Ecuador approved Executive Decree 826 to conduct a migratory registry, to be completed by the end of March 2020.


References

Aguilera, A., Krishnan, N., Muñoz, J., Russo Riva, F., Sharma, D. and Vishwanath, T., 2020, Sampling for representative surveys of displaced populations, in: Hoogeveen, J. and Pape, U. (eds), Data Collection in Fragile States, Palgrave Macmillan, Cham.
Altonji, J. G., Elder, T. E. and Taber, C. R., 2005, Selection on Observed and Unobserved Variables: Assessing the Effectiveness of Catholic Schools, Journal of Political Economy, 113(1): 151-184.
Asian Development Bank, 2014, Purchasing Power Parities and Real Expenditures, Asian Development Bank, Mandaluyong City, Philippines.
Centre for Evaluation and Development, 2019, Manual para el uso del CAPI para alistamiento de hogares, April, Quito, Ecuador.
Plaza, S., Navarrete, M. and Ratha, D., 2011, Migration and Remittances Household Surveys in Sub-Saharan Africa: Methodological Aspects and Main Findings, World Bank and African Development Bank, Washington, DC.
Telefonica de Ecuador, 2017, Aplicación de tecnología móvil para la planificación y gestión de movilidad en el distrito metropolitano de Quito – Big Data, Telefonica Data Unit, Quito, Ecuador.
Telefonica de Ecuador, 2018, Aplicación de tecnología móvil para la planificación y gestión de movilidad cantonal enfocada en Turismo – Big Data, Telefonica Data Unit, Quito, Ecuador.
Telefonica de Ecuador, 2019, Insumos de estudio de la movilidad de migrantes venezolanos – Big Data, Telefonica Data Unit, Quito, Ecuador.
Wittenberg, M., 2010, An introduction to maximum entropy and minimum cross-entropy estimation using Stata, The Stata Journal, 10(3): 315-330.
World Bank, 2018, Poverty and Shared Prosperity Report: Piecing Together the Poverty Puzzle, Washington, DC.
World Bank, 2018, Syrian refugees and their host in Jordan, Lebanon and the Kurdistan Region in Iraq: lives, livelihoods, and local impacts, unpublished manuscript.
World Bank, 2018, Systematic Country Diagnostic, Washington, DC. http://documents.worldbank.org/curated/en/835601530818848154/Ecuador-Systematic-Country-Diagnostic
World Bank, 2020 (a), Between countries: challenges and opportunities of Venezuelan migration in Ecuador, mimeo, Washington, DC.
World Bank, 2020 (b), Drawing representative inferences of Rohingya displaced and their hosts populations: Sampling design of the 2019 Cox's Bazar Panel Survey, mimeo, Washington, DC.
