International Migration of Sex Workers


Word count: 15.549

Andreas Bogaerts

Student number: 01204442

Supervisor: Prof. dr. Luis Enrique Correa da Rocha

Master’s dissertation submitted to obtain the degree of:

Master of Science in Business Engineering

Main subject: Data Analytics

Academic year: 2019 – 2020

PERMISSION

I declare that the content of this Master’s Dissertation may be consulted and/or reproduced, provided that the source is referenced.

Name student: Andreas Bogaerts

Signature

Ghent, 7th of August 2020

Andreas Bogaerts

Preface

As I progressed through the Data Analytics master's program and learned more about the world of ‘big data’, I became aware of the endless possibilities of using ‘big data’ in a creative way to study seemingly unrelated phenomena. This combination of big data and creativity is a tremendous asset in our understanding of the world around us, be it from a social, economic or political perspective. Therefore, I am very grateful that Prof. Dr. Luis E. C. da Rocha gave me the opportunity to apply this creative, data-driven approach in my research on a topic that is important from all three perspectives: international migration. Working on this master’s thesis has also given me the opportunity to apply my acquired knowledge of data scraping and data visualisation to a real-life subject and to increase my comprehension of Python programming and the usage of network analysis on migration data.

I am very thankful to Prof. Dr. Luis E. C. da Rocha for his patient advice and guidance along the way. I would also like to thank my parents, who supported me throughout my education.

I hope you enjoy the reading.

Andreas Bogaerts

7th of August 2020, Ghent

Table of Contents

Preface

Table of Contents

List of Figures

List of Tables

1. Introduction

1.1. Motivation

1.2. Literature review & contextualization

1.2.1. International migration

1.2.2. Network analysis of migration

1.2.3. Migration data

1.2.4. Digitalization of Commercial sex work

1.2.5. Escort websites

1.2.6. Migrant sex work

1.2.7. Conclusion of the literature review

2. Materials & Methods

2.1. Data

2.1.1. Data Collection

2.1.2. Data cleaning

2.1.3. Data preparation

2.2. Methods

2.2.1. Networks

2.2.2. Network Measures

3. Results

3.1. Data exploration

3.2. International Migration Network

3.2.1. Visual Analysis

3.2.2. Network analysis

3.3. Socio-economic analysis

3.3.1. Price on Micro-level

3.3.2. Price on Meso-level

3.3.3. Price on Macro-level

3.3.4. In degree - GDP per capita

4. Discussion, Limitations, and Further Research

4.1. Contextualize findings with literature

4.2. Limitations of data, methods, and analysis

4.3. Ways Forward

5. Conclusion

6. Bibliography

7. Appendix

List of Figures

Figure 1: Boxplot of hourly rate before (A) and after filtering (B)

Figure 2: Visual representation of an undirected network

Figure 3: Nodes with betweenness centrality

Figure 4: Age-Ethnicity distribution

Figure 5: The figure plots the directed weighted version of the IMN

Figure 6: The figure reports the migration network with a focus on out-degree

Figure 7: (A) in-degree & (B) out-degree histogram

Figure 8: Cumulative degree distribution of the IMN, plotted on a log-log scale

Figure 9: Visualization of betweenness centrality in our network

Figure 10: Communities obtained with the Louvain method

Figure 11: One-hour price distribution incall (A) and outcall (B)

Figure 12: Correlation Plots between price and physical attributes

Figure 13: Boxplots of median prices for different ethnicities in our data set

Figure 14: Overview of cities in our database ranked on median price

Figure 15: Overview of countries in our database ranked on median price

Figure 16: Correlation between in-degree and pcGDP

Figure 17: Relationship between in-degree and out-degree

Figure 18: Visualization of emigration network (bigger version)

List of Tables

Table 1: Top 10 working locations of sex workers in our data set

Table 2: Major migration flows in our data set

Table 3: Example of our edge list

Table 4: Ranking of countries in our data set based on Degree, In-Degree and Out-Degree

Table 5: Biggest migration inflows into the United Arab Emirates

Table 6: Most attractive countries based on the weighted number of immigrants

Table 7: Countries ranked based on the weighted number of emigrants (weighted out-degree)

Table 8: Most attractive countries according to their PageRank

Table 9: Network measures compared to ensemble of 1000 random networks

Table 10: [Outcall price - Incall price]

Table 11: Expanded list of migration flows in our network

Table 12: Expanded table of centrality measures

Table 13: One-way ANOVA price/ethnicity

Table 14: Pearson correlation price/pcGDP

Table 15: Linear regression in-degree/GDP

Abbreviations

IMN: International Migration Network

GDP: Gross domestic product

pcGDP: per-capita gross domestic product

UAE: the United Arab Emirates

UK: United Kingdom

UNSD: United Nations Statistics Division

HFI: Human Freedom Index

EU: European Union

1. INTRODUCTION

1.1. MOTIVATION

With an estimated worldwide revenue of $186 billion¹, the commercial sex industry is bigger than ever before. Until recently, researchers interested in prostitution or other sex-work-related topics were confronted with a scarcity of data. Compared to other industries for which official statistics are widely available (e.g. tourism, retail), the sex work industry has been underrepresented in the literature. This underrepresentation is no surprise: before the Internet came along, studying the business of prostitution required fieldwork. Due to the stigmatized nature of sex work and the tendency of sex workers to protect their privacy, field research has not always been an easy task. Scholars needed to be innovative; in recent decades, they have used techniques such as email or phone calls to build small databases on which to conduct their research. Only recently, however, have researchers started to apply ‘big data’ analysis tools to publicly available data to study sex workers. In this thesis, we investigate the validity of using web-scraped data to build an international migration network.

¹ Prostitution: Prices and Statistics of the Global Sex Trade (Havoscope, 2015)

In our study, we aim to map a particular migration flow (i.e. that of sex workers), which is severely underrepresented in migration research. To achieve this, we collected a database of over 50.000 sex workers, described their demographics, and mapped their migration flows. The application of network analysis to examine socio-economic networks is relatively new. For a long time, work on macroscale trends of international migration focused on administrative records and national statistics (e.g. population registers), analyzed with simple measures such as the number of migrant inflows and outflows. Only recently have publicly available, self-reported geolocated data, such as Facebook advertising location data (Spyratos et al., 2019) or geotagged Twitter posts (Zagheni et al., 2014), been used to map migration movements. To our knowledge, no other studies have mapped the international migration of sex workers using complex networks. Earlier research on sex workers using network analysis techniques (Rocha et al., 2010) focused primarily on local communities. In our analysis, international migration is modeled as a graph (i.e. a network), where nodes are countries and edges represent migration flows. We study the properties of this international migration network (IMN) from the network perspective, evaluating centrality indices and detecting communities. This methodology aims to provide a ranking of the countries based on their importance for the migration process. Because there is little research on the demographics of online sex workers, we also performed a descriptive analysis of the data set we collected. Since no official statistics are available on either the migration or the demographics of online sex workers, we believe our data may provide valuable insights into an otherwise hard-to-reach demographic.

1.2. LITERATURE REVIEW & CONTEXTUALIZATION

1.2.1. International migration

International migration is becoming an increasingly important aspect of the global economy. The last few decades can be defined as a period during which international migration has accelerated, globalized, feminized, and diversified. In 2019, the number of international migrants worldwide reached nearly 272 million, up from 221 million in 2010². Over the last few decades, advancements in transportation and communication have led to remarkable increases in international migration (Elliot and Urry, 2010). Whereas migratory movements in the recent past often took place between neighboring countries or between countries bound by past colonial relationships (e.g. North Africa-France, Indonesia-The Netherlands), they are now globe-spanning. This shift has been termed the ‘globalization of migration.’ In studies, the ‘globalization of migration’ refers not only to the increasing number of countries affected by migration but also to the diversification of the origin areas and socio-economic backgrounds of migrants (Danchev, 2015).

² United Nations, Department of Economic and Social Affairs, Population Division (2019). International Migration 2019: Report (ST/ESA/SER.A/438).

Until the late 1970s, migration patterns mainly consisted of ‘large groups moving from a particular place to a new place,’ while global migration since the 1980s has involved ‘small numbers moving from many places to many places’ (Vertovec, 2010). There is a large body of literature that explores the motives of immigrants. Most of the existing theories on migration focus on geographic, demographic, economic, historical, and cultural factors. One of the earliest and most straightforward theories about migration is the push-pull theory. First written down by Ravenstein (1889), it defines push and pull factors for both the origin and destination countries. Push factors originate from the origin country and could be humanitarian or military conflicts, low wages, or political repression.

In comparison, pull factors for the destination country could be the prospect of better living standards, better job opportunities, and higher wages. Over the years, several other theories have been developed, but most of them are variants of the push-pull theory. According to Haas et al. (2019), the push-pull theory has two critical shortcomings: it fails to explain why so few people migrate, and why some countries have high rates of out-migration while others, with the same structural economic conditions, have low rates.

The neoclassical economic theory is a framework that suggests that international migration patterns are related to the global supply and demand for labor. On the micro-level, this theory treats migration as the result of decisions made by individual ‘rational’ actors, who weigh the pros and cons of moving relative to staying. Rational individuals try to maximize their benefit and decide to migrate to richer countries. Although such mechanisms rarely exist in pure form because of border control and immigration policies, the theory hypothesizes that global migration flows from poor to rich countries. We will test this hypothesis on our data.

Some migration theories focus on ‘chain migration’ (Bartram et al., 2014) between countries. Chain migration occurs when big waves of immigration (e.g. from Europe to the US in the 19th century) boost further inflow because the migrant groups already present in the destination country are, on average, large. This is why countries such as Canada and the US are regarded as tolerant and open and are still attractive to immigrants. The same applies to cultural characteristics such as religion and language. When migrants practice the same religion or already have a basic knowledge of the language, the cost of migration decreases.

1.2.2. Network analysis of migration

Network analysis has proven to be a useful tool to study relationships between actors (e.g. people, countries, web pages). The ability to study patterns of relationships that emerge from interacting actors and to examine how network structure affects the dynamics of a system, combined with the increasing complexity of international migration patterns, makes network analysis an excellent match for migration analysis. The study of migration from a complex-network perspective has gained popularity over the last decade. One of the main reasons why it took so long is the limited availability of migration data between countries (Fagiolo and Mastrorillo, 2013).

We can categorize the applications of network analysis on migration data into two main parts (Danchev and Porter, 2018). The first focuses on micro-networks, for example, in the form of migrants’ personal networks. These networks are constructed based on interpersonal relationships that link migrants and non-migrants. Recent work has used data from interviews and surveys (Popielarz and Cserpes, 2018) to analyze the migrant experience as a function of citizenship, legal status, or ethnicity. Studies often use ‘social network analysis’ (SNA) (Wasserman and Faust, 1994) to analyze these networks. SNA allows us to examine migrants’ personal networks and the relative importance of ties to other migrants and ties to non-migrants. Migrants’ personal networks can circulate information about destination opportunities or provide initial employment, accommodation, and overall assistance (Blumenstock et al., 2019). The second application of network analysis to migration focuses on macro-scale migration networks. Through aggregate data, migration is studied as a ‘mechanism that connects places’ (Gunther & Vyborny, 2005). Researchers have examined networks of movements that connect different states in the United States (Charyyev and Gunes, 2019) and global networks of international-migration ties that connect different countries (Fagiolo and Mastrorillo, 2013). In this thesis, we will focus on macro-scale migration networks since that is the best fit for our data. Note that SNA is also used to analyze macro-scale migration networks. In their work on internal migration between US states, Gunther & Vyborny (2005) stated that since SNA focuses on the relationships between actors, it also deals with interaction and is therefore suited for analyzing migration. In this context, the term ‘social’ has nothing to do with social networks such as networks of friends and relatives, but the same methods are used to study the connections between countries.

1.2.3. Migration data

One of the major factors holding back the use of network analysis to study macroscale international migration was the lack of compatible migration data. Migration research should ideally be based on data on migration decisions at the level of individual actors or households, e.g. a study by Massey (2015). Unfortunately, such data is not always available. Migration research has been able to draw on country-specific data for migration between areas or cities (if reported), but global macro-level migration could, until recently, only be studied using a very small sample of reporting countries. Research used to rely exclusively on national statistics (e.g. population registers, surveys) or administrative records. This scarcity of data has eased over the last few decades. In 2011, a reliable bilateral-migration database covering most of the world’s countries during the period 1960-2000 was made available (Docquier, 2011). This allowed researchers like Fagiolo and Mastrorillo (2013) and others in their wake to study migration from a complex-network perspective. The main limitations of administrative records and national statistics are that they are often highly aggregated and available only after a substantial time lag.

In the age of big data, there are unprecedented opportunities to scrape, generate, store, and analyze massive amounts of online geolocated data. Data can be scraped from websites, social media (Spyratos et al., 2018), and even from mobile phone records (Blumenstock, 2012). When online geolocated data is publicly available and accessible, it provides an opportunity to (responsibly) study social interactions and human mobility. Along with the advancements in computing infrastructure and data storage capacities, one could exploit the availability of such geolocated data to increase understanding of migration networks. However, because of various ethical, data-availability, and reproducibility challenges, the adoption of these new sources of migration data has been relatively slow.

1.2.4. Digitalization of Commercial sex work

The wide adoption of technology and e-commerce in today’s economy has revolutionized the way we do business. Whereas 15 years ago e-commerce was still a niche market reserved for the big players, today independent business owners can set up a webshop and advertise their goods and services in just a couple of clicks. Due to the quick rise of technology, e-commerce has become prevalent in all layers of society and provides unique opportunities for linking marketers with consumers. The Internet has had a profound impact on people buying and selling sexual services. The sex industry is increasingly operated online, and sex workers have been provided with a wide array of new online possibilities. Most importantly, they are provided with a new venue through which they can reach potential clients (Jones, 2015). Additionally, the Internet has also seen the development of completely new forms of commercial sex (e.g. webcam, private Snapchat accounts, selling personalized pictures/videos).

Generally recognized as one of the oldest professions, commercial sex work has been the focus of researchers from a wide range of fields. Public health studies have investigated the role of prostitution in the transmission of sexually transmitted infections (STIs) (Baral et al., 2012). Other research has focused on the use of Internet technology for solicitation (Cunningham and Kendall, 2011), on sex trafficking (Baker, 2013), or on specific groups of sex workers, e.g. male sex workers (Walby, 2012) or strippers (Hardy and Sanders, 2015).

Undoubtedly, commercial sex work has profited tremendously from digitalization. Given the possibilities the Internet offers, people involved in commercial sex work have tried to maximize their presence online. In recent years, studies have investigated how sex workers use digital technologies to reduce their exposure to classic risks and maximize profit (Teela et al., 2018).

In this thesis, we will narrow our focus to a special group of sex workers: escorts. ‘Escort’ is a term used to describe a sex worker (man, woman, trans) who is paid not for the act but for their time (usually at an hourly rate). Escort services are often more expensive and deemed to be of higher quality than the services of other sex workers. Literature about the movement of escorts to the Internet focuses on how they use the Internet for marketing their services and screening their clients (Walby, 2012; Cunningham and Kendall, 2011). For escorts, the move online enabled horizontal and vertical differentiation. Vertical product differentiation can be seen in the services they offer (erotic massage, BDSM, ‘girlfriend experience’). Horizontal product differentiation was enabled as the advertising and search costs for a specific variety of sex worker were reduced through online platforms, and as reviews enabled sex workers to build a reputation. On top of enabling differentiation, increasing the earning potential, improving the working conditions, and reducing many of the risks, the Internet has also expanded the overall sex work market. As stated by Jones (2015), “The increase of online sex work is not just a reflection of a unilateral move of existing prostitutes from the streets to online environments. Instead, online sex work reflects an expansion of the market of sexual commerce. The Internet has created additional spaces for the sale of sexual goods and services”. While the adoption of the Internet has, overall, had a positive influence on the safety of sex workers, there are still inherent dangers in being a sex worker. Even online, sex workers are often faced with harassment or violence.

1.2.5. Escort websites

The solicitation of escort services has shifted from street corners to easily accessed websites (Castle and Lee, 2008). Escort advertising websites are sites that feature profiles of sex workers containing photos, rates, physical attributes, information about the services they offer, and the city/country where they are currently located. Some of these sites charge a monthly fee to advertise; others are free to use and run on advertisements. As well as increasing the number of potential clients, escort websites offer the possibility of removing the need for third-party involvement (Bernstein, 2008). This evolution has allowed for a new type of sex worker: the independently employed, Internet-advertising sex worker. These sex workers differ from street-, brothel-, or agency-based sex workers in that they have a higher degree of control over their working circumstances, set their own rates, screen their clients, and keep more of the profit. Although some research is available about the content of escort websites (Castle and Lee, 2008), very little is known about the ethnic, racial, and class diversity of Internet-based sex workers. In this thesis, we will try to provide some insight into this subject.

Since sex work is not regulated in most of the world, little data about the price of sex is publicly available. Using data from escort websites allows us to collect price data and physical attributes of sex workers (e.g. age, height, weight). Research by Griffith et al. (2016) has shown that higher fees were associated with female escorts who advertised a waist-to-hip ratio near 0.7, a lower weight and body mass index, a younger age, and photographic displays of breasts, buttocks, and nudity. Successful female sex workers are aware of the traits and qualities male patrons are looking for, and they likely tailor their services and fees to the conditions of the market (Edlund and Korn, 2002).

1.2.6. Migrant sex work

Sex work should be seen as an international phenomenon, involving an increasing number of women and men from countries all over the world. Migration is a road many take to seek other opportunities or to break away from oppressive local conditions. Since the 1970s, there has been a notable global inflow of sex workers who migrated from Asia, Africa, and Latin America. For instance, in Western Europe, the sex work industry has seen a constant increase in the number of Central and Eastern Europeans who have been initiated into or continue to practice sex work (Brussa and Munk, 2010). In 1991, around seventy percent of the sex workers in Japan were reported to be Filipino, and the red-light district in Bombay, India, relied mostly upon migrant sex workers, many of whom originated from Nepal (Amoore, 2005). However, laws prohibiting or regulating prostitution can create complex and oppressive situations for sex workers who migrate. The illegal movement of persons for work elsewhere, commonly known as ‘trafficking,’ becomes a real issue for people involved in sex work. In this thesis, we will not go in-depth about trafficking.

Much of the research around the migration of sex workers has focused on prevention and on promoting awareness of HIV/AIDS and other STDs. One of the major driving forces behind much of the work in Europe is the TAMPEP (Transnational AIDS/STD Prevention among Migrant Prostitutes in Europe) project. TAMPEP was founded in 1993 in response to the needs of migrant sex workers across Europe. It advocates for the decriminalization of sex work and for equal access to health, rights, and justice for sex workers.

1.2.7. Conclusion of the literature review

Recent digital technologies offer exciting new opportunities to study age-old phenomena such as migration and sex work. In this thesis, we will build on previous research and combine network analysis with self-collected online sex worker data to create a migration network. We hope to achieve (A) insights on the individual level and (B) macro-level insights into the migration patterns of sex workers, such as identifying the most important countries in the migration network and defining drivers of migration. There is still limited academic interest in the economics and economic motivations of sex workers, so we hope this thesis can be a meaningful addition to the literature. We could not find previous studies researching the migration patterns of sex workers using network analysis. Having concluded our literature review, we wish to test the following hypotheses on our data.

Hypothesis 1: We expect that certain physical attributes of sex workers are linked to their rates.

Hypothesis 2: We expect migration ties to be linked to linguistic factors and a common colonial history.

Hypothesis 3: Following the push-pull theory of migration, we expect countries with a high GDP to attract more sex workers than countries with a lower GDP.

2. MATERIALS & METHODS

The following chapter will cover the scraping, description, and preparation of our data set. In order to test our hypotheses, we analyzed the data on an individual level and used an aggregated version to build a migration network.

2.1. DATA

The data for this dissertation was collected from a single ‘escort directory website.’ On this website, sex workers can make a personal profile to advertise their services for potential clients. On their profile, they display pictures, describe the services they offer, describe their physical attributes, and indicate the amount they charge for services. The website we used is explicitly catered to escorts, so no other services (e.g. webcam, chatting) are advertised.

The website is publicly accessible and ranks highly when googling for queries like ‘escort in + country/city.’ Using similarweb.com, we estimate a total of 2.1 million visitors during the month we scraped our data (March 2020). Note that this is ~20% fewer than in previous months, possibly due to the current influence of the COVID-19 virus on the sex work industry³. The choice of site was influenced by its search engine ranking, the high number of sex workers registered, and the vast amount of well-structured information we can scrape for each sex worker. Broadly, the information available for each sex worker can be divided into three categories. First, there is the general information about the sex worker (e.g. gender, age, weight, nationality, breast size), which sex workers must fill out when they create a profile on the website. The input needs to be selected from a drop-down menu; as such, the data scraped from this section is uniform across profiles, which eases the data processing.

³ During March 2020, global lockdowns shut down public life and severely limited international travel.

The second data category consists of rates, services, and reviews. Rates and services are not mandatory, and reviews are not always available; therefore, the information in this section varies for each profile. Services have to be selected from a drop-down menu. Hence, the names of the services are uniform across all profiles. A sex worker can select ‘Included’ or ‘Extra’ for specific services. When ‘Extra’ is selected, a price indicates how much extra the service will cost. The rates are listed based on time and on whether the client prefers incall (i.e. the client comes to the place of business of the sex worker) or outcall (i.e. the sex worker goes to the client). The prices are often noted in the local currency; we will deal with this in the data processing step. For our study, we did not process the reviews, but the data is available in the source files. For every review, a star rating, date, meeting length, city/country, and the cost of the encounter are mentioned. Note that in our data set, only 764 out of 58.612 sex workers have reviews.

The third data category is contact data. Here we find a cell phone number, and (if the sex worker is connected with an agency) a website URL. Also listed are the country and city where the sex worker is currently active. Cell phone number and website are not mandatory, but the country and city where they are currently active are. This means we have this information for every sex worker. We do not process cell phone numbers, photos, or individual-specific information in our data set. The key factor that allows us to create a migration network from this data set is the combination of nationality and the country where the sex worker is currently active; we will go more into detail in later sections.

To create our base table, the following features are scraped from the website.

Base table

• Unique identifier: model_id
• Description (short text where the sex worker can introduce themselves)
• Name
• Gender
• Age
• Location where the sex worker is active (City/Country)
• General features: Weight, Height, Eye Color, Hair color, Bust size, Bust type, …
• Ethnicity
• Orientation (Straight, Bisexual, Trans)
• Nationality
• Languages
• Provides (Incall/Outcall or both)
• Meeting with (Man, Women, both)
• Info about Services offered: service_included (1 if included; -2 if not listed), service_extra (price listed if extra; -1 if included in price; -2 if not listed)
• Price, listed in local currency and for incall/outcall
  o 1hr incall, 2hr incall, … 24hr incall
  o 1hr outcall, 2hr outcall, … 48hr outcall
• Dummy variables
  o Verified dummy, vip_dummy, review_dummy, independent_dummy

The various economic indicators (e.g. GDP) and statistics of countries/regions were obtained from UNData (https://data.un.org/), a data access system to the United Nations (UN) database.

2.1.1. Data Collection

The code for this project was written in Python and R. Visualizations were made using Python, R, and Tableau. The data collection process can be split into two parts.

First, a script was made to loop through the profiles of all sex workers on the website and download an HTML text file for each profile. During the scraping, a random delay was added between requests; this method allows for minimal impact on the servers of the website. We collected a total of 54.814 HTML files. During the extraction, we made a distinction between male and female profiles. Since the site is primarily focused on female sex workers, and there are only 1.324 male profiles, we decided to focus our research on female sex workers.
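The download loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used for this thesis: the function names, the User-Agent string, the output file naming, and the 1 to 3 second delay range are all hypothetical, and the standard-library `urllib` stands in for whichever HTTP client was actually used.

```python
import random
import time
from pathlib import Path
from urllib.request import Request, urlopen


def fetch_html(url, timeout=30.0):
    """Download one page and return its HTML as text."""
    # Hypothetical User-Agent; a real scraper should identify itself honestly.
    request = Request(url, headers={"User-Agent": "research-scraper-sketch"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def download_profiles(profile_urls, out_dir, fetch=fetch_html,
                      min_delay=1.0, max_delay=3.0, sleep=time.sleep):
    """Save one HTML file per profile URL, pausing randomly between requests."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, url in enumerate(profile_urls):
        html = fetch(url)
        path = out_dir / f"profile_{index:06d}.html"  # assumed naming scheme
        path.write_text(html, encoding="utf-8")
        saved.append(path)
        # Random delay (assumed range) to limit the load on the server.
        sleep(random.uniform(min_delay, max_delay))
    return saved
```

Passing `fetch` and `sleep` as parameters is a design choice of this sketch: it lets the loop be exercised offline with stand-in functions before it is pointed at a live site.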

Next, using the Python library BeautifulSoup, we parsed the HTML files to extract the required information and create a base table. Because profiles differ in how much information they list, parsing the HTML files was a bit of a challenge. We decided to parse as many features as possible in case they might be useful later on. No pictures were scraped, and telephone numbers, email addresses, and other identifying information were dropped from the data.
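The main difficulty mentioned above, profiles that list more or fewer fields than others, can be handled by always emitting the same set of columns and filling the gaps with a missing-value marker. The sketch below illustrates that idea only. The thesis used BeautifulSoup; to stay dependency-free, this example swaps in the standard-library `html.parser`. The `data-field` attribute, the field names, and the markup are hypothetical and do not reflect the real structure of the website.

```python
from html.parser import HTMLParser

# Hypothetical subset of the base-table columns.
EXPECTED_FIELDS = ("model_id", "age", "nationality", "city", "country",
                   "price_1hr_outcall")


class ProfileParser(HTMLParser):
    """Collect the text of every element carrying a data-field attribute.

    Assumes flat fields, i.e. no nested tags inside a field element.
    """

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        name = dict(attrs).get("data-field")
        if name is not None:
            self._current = name
            self.fields[name] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data

    def handle_endtag(self, tag):
        self._current = None


def parse_profile(html):
    """Return one base-table row; absent or empty fields become None."""
    parser = ProfileParser()
    parser.feed(html)
    parser.close()
    return {name: (parser.fields.get(name, "").strip() or None)
            for name in EXPECTED_FIELDS}
```

Because every row has the same keys, the rows can be stacked into a table directly, and missing values stay visible for the cleaning step.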

2.1.2. Data cleaning

Since the data collection was done by parsing HTML files, cleaning our data is relatively straightforward. The most important part was removing the bad records that were introduced during the scraping. We dropped the sex workers that did not have a nationality or a location listed since these are the essential features we need to build our network.

2.1.3. Data preparation

Nationality data

In order to build our migration network, we need to be able to define the country of origin of the sex workers. We make the assumption that the nationality (e.g. Russian, Brazilian) listed on the profile of the sex worker can be linked to the country of origin (e.g. Russia, Brazil). Note that this approach may introduce error, as how people report their ‘nationality’ is the result of personal interpretation. While we do not think this measurement error undermines our macro-level analysis, this data should not be seen as a substitute for official statistics.

To make this link, we use a nationality/country data set found online and merge it with our table using nationality as the key. We dropped the nationality column and kept only the new country column, which we name source. Using the location data listed on the sex worker’s profile, we create another column, destination, which indicates where the sex workers currently offer their services. Some small changes needed to be made to the data (e.g. changing the spelling of some countries to match in both columns). There were separate records for the UK and England; we changed both to the United Kingdom, since separating the UK and England would complicate the understanding of the network. To allow visualization, we added the latitude and longitude coordinates for every country to our data set.
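The linking step can be illustrated with a minimal sketch. The lookup tables, field names, and sample profiles below are hypothetical stand-ins (the actual nationality/country data set is not reproduced here), and the profile location is assumed to be already reduced to a country name.

```python
# Hypothetical lookup tables; the real nationality/country data set is external.
NATIONALITY_TO_COUNTRY = {
    "Russian": "Russia",
    "Brazilian": "Brazil",
    "British": "United Kingdom",
}
# Harmonize spellings so that source and destination use the same country names.
COUNTRY_ALIASES = {"UK": "United Kingdom", "England": "United Kingdom"}


def normalize_country(name):
    """Map alternative spellings to a single country name."""
    name = name.strip()
    return COUNTRY_ALIASES.get(name, name)


def add_source_destination(profiles):
    """Return profiles with 'source' and 'destination' columns added.

    Profiles without a known nationality or without a location are dropped.
    """
    rows = []
    for profile in profiles:
        source = NATIONALITY_TO_COUNTRY.get(profile.get("nationality"))
        location = profile.get("location")
        if source is None or not location:
            continue
        rows.append({
            **profile,
            "source": normalize_country(source),
            "destination": normalize_country(location),
        })
    return rows


# Illustrative records, not taken from the thesis data.
profiles = [
    {"model_id": 1, "nationality": "Russian", "location": "Italy"},
    {"model_id": 2, "nationality": "Brazilian", "location": "England"},
    {"model_id": 3, "nationality": None, "location": "Spain"},
]
prepared = add_source_destination(profiles)
```

The third profile is dropped because it has no nationality, which mirrors the cleaning rule described in section 2.1.2.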

Price data

In this thesis, we will look at price data on both the individual and country level. To analyze prices on a country level, we aggregated the price data by country and calculated the median price for each country. We used the median instead of the average to better deal with outliers. Not every sex worker lists the same hourly rates on their profile: some list only a 1hr incall or 1hr outcall price, while others list all prices (1hr incall/outcall, 2hr, …, 48hr). We therefore decided to use the 1hr outcall rate, because it is the most frequently listed price on the website. After dropping all the records that do not have price data for 1hr outcall, we are left with 35.5918 sex workers. Since most prices are listed in USD or the local currency, we converted all currencies to USD. In total, there were 27 unique currencies in our database.


Figure 1: Boxplot of hourly rate before (A) and after filtering (B)

While visually checking the price data (Figure 1, A), we identified some outliers within the converted prices, probably caused by junk data. We noticed that some of the local currency prices were wrongly listed as USD prices; this resulted in some converted USD prices being too high. For example, $200 is around 15.000 INR (Indian Rupee), so if a price in INR is accidentally listed as USD, the recorded price is far too high. To filter out these wrong records, we used the interquartile range (IQR). We removed the values that lie in the lowest quantile and the highest quantile.

In total, 661 records were removed. The aggregated data provides a way to compare sex worker prices between countries. To make the comparison meaningful, we removed all countries that had fewer than three records making up the median country price. Three countries were removed.
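The filtering and aggregation steps can be sketched as follows. The exact cut-offs are not stated above, so this sketch assumes the common rule of discarding values outside Q1 − 1.5·IQR and Q3 + 1.5·IQR; the field names, the quartile method, and the sample prices are likewise assumptions for illustration only.

```python
from collections import defaultdict
from statistics import median, quantiles


def iqr_filter(records, k=1.5):
    """Keep records whose price lies within [Q1 - k*IQR, Q3 + k*IQR].

    The factor k=1.5 is an assumed, conventional choice.
    """
    prices = [r["price_usd"] for r in records]
    q1, _, q3 = quantiles(prices, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r for r in records if low <= r["price_usd"] <= high]


def country_medians(records, min_records=3):
    """Median price per country, skipping countries with too few records."""
    by_country = defaultdict(list)
    for r in records:
        by_country[r["country"]].append(r["price_usd"])
    return {
        country: median(prices)
        for country, prices in by_country.items()
        if len(prices) >= min_records
    }


# Illustrative prices; 15000 mimics an INR amount wrongly listed as USD.
records = (
    [{"country": "A", "price_usd": p} for p in (150, 200, 220, 250)]
    + [{"country": "B", "price_usd": p} for p in (180, 210, 15000)]
)
clean = iqr_filter(records)
medians = country_medians(clean)
```

In this toy example the mislabelled price is removed, after which country B has only two records left and is therefore excluded from the country-level comparison.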

Other economic data

The country_statistics data set contains economic, social, and environmental indicators from the United Nations Statistics Division (UNSD). The data is from 2017 and contains information about population, surface area, GDP, and GDP per capita for every country in our database. We also use data from the HFI⁴ (Human Freedom Index). The HFI presents the state of human freedom in the world based on a broad measure that encompasses personal, civil, and economic freedom.

2.2. METHODS

2.2.1. Networks

We will briefly discuss the network-related terminology used in this thesis. For a more thorough reference on the network theory discussed here, we refer to Newman (2010).

Network analysis can be used as a powerful tool for studying relationships between actors (e.g. countries, people, web pages). Researchers from various fields such as economics, physics, and sociology have used network analysis to study complex systems of interconnected actors. In general, a network can be seen as a way to represent the relationships, called edges, between actors, called nodes. Over the years, researchers have described the world wide web (Barabási et al., 2000), flight patterns (Lin, 2012), and endless systems in biology, logistics, and transportation using networks (Ideker et al., 2012, Phillips et al., 1998). To create a network from a system, you need to extract the set of actors as nodes and a set of relationships between those actors as edges (e.g. hyperlinks, migration between countries, social ties). Using this data, network analysis makes it possible to study patterns that emerge from the relationships between actors, or to examine how network structure affects the importance of particular nodes and edges.

4 Data can be found at: https://www.cato.org/human-freedom-index-new

Mathematically, networks can be represented as graphs. A graph G = (V, E) consists of a collection V of N vertices (nodes) and a set E of M edges (links) connecting pairs of nodes in V. Each node can be identified by an integer value i = 1, 2, …, N; the edges are identified by a pair (i, j) that represents a connection from node i to node j. A common representation of a graph is the adjacency matrix. The adjacency matrix A of a network is the matrix with elements $A_{ij}$ such that

$$A_{ij} = \begin{cases} 1 & \text{if there is an edge between nodes } i \text{ and } j \\ 0 & \text{otherwise.} \end{cases}$$

Figure 2: Visual representation of an undirected network with 10 nodes (or vertices) and 11 edges (or links)

Edges

Two types of edges can be present in a network: directed edges and undirected edges. Directed edges connect two nodes and imply a direction. In the directed case, (i, j) ∈ E does not imply (j, i) ∈ E. Visually, this is represented by the edge having an arrowhead indicating a one-way direction. Undirected edges indicate a relationship between nodes but without direction. The presence of an undirected edge (i, j) ∈ E means that a connection exists from i to j and from j to i. As seen in Figure 2, undirected edges are visually represented by lines (no arrowheads). Networks can be either directed (i.e. all edges are directed) or undirected (i.e. no edges are directed). When talking about migration, the direction of the edge plays an important role, so the migration network will use directed edges.

An example of an undirected network could be a friendship network between members of a club. If the nodes represent members of the club, and there is an edge between two people if they are friends, then this graph is undirected because person A can be friends with person B only if person B is also friends with person A.

Edges can be either weighted or unweighted. An edge weight ω(i, j) is a numerical value attached to each edge. A weighted edge reflects the strength of the relationship between the nodes. In our migration network, we define the edge weights as the number of sex workers that move between countries connected by the edge.

Degree

The degree of a node i is the number of connections it has to other nodes. We denote the degree of node i as $k_i$. For undirected networks, the degree can be written as a function of the adjacency matrix as

$$k_i = \sum_j A_{ij} = \sum_j A_{ji}$$

As the network is undirected, the direction of the connection does not matter; there is no difference between $A_{ij}$ and $A_{ji}$. In a directed network, the direction of the edge does matter. The out-degree of node i, denoted as $k_i^{out}$, is the number of edges originating at or ‘pointing away’ from i. Accordingly, the in-degree, denoted as $k_i^{in}$, is the number of incoming edges, i.e. the number of edges ‘pointing towards’ i.

$$k_i^{out} = \sum_j A_{ij} \qquad k_i^{in} = \sum_j A_{ji}$$

The degree of a node in a directed network is the sum of the in- and out-degree: $k_i^{tot} = k_i^{in} + k_i^{out}$.
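As a small illustration of these definitions, the sketch below computes the out-, in-, and total degree from an adjacency matrix by summing rows and columns. The four-node matrix is a made-up example, not part of the thesis data.

```python
# Toy directed network: A[i][j] = 1 means there is an edge from node i to node j.
A = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
]
n = len(A)

# Out-degree: sum over row i. In-degree: sum over column i.
k_out = [sum(A[i][j] for j in range(n)) for i in range(n)]
k_in = [sum(A[j][i] for j in range(n)) for i in range(n)]
k_tot = [k_in[i] + k_out[i] for i in range(n)]
```

Every edge leaves one node and enters another, so the out-degrees and in-degrees always sum to the same total, namely the number of edges.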

2.2.2. Network Measures

2.2.2.1. Centrality measures

Not all nodes in a network are equally important in determining the network’s structure. Centrality indices give an estimation of the relative importance of a node in the context of other nodes in the network (Borgatti, 2005). We will describe two of the most commonly used centrality measures.

Degree Centrality

The most straightforward measure is the degree centrality. The degree centrality of a node is simply its degree. In directed networks, we have an in- and out-degree; both can be useful as measures of centrality depending on the circumstances. A node has a high degree centrality when it has high connectivity within the network. In our case, the degree centrality characterizes the number of different countries connected with the node.

Betweenness Centrality

Betweenness centrality (Freeman, 2004) measures how important a node is in the average pathway between other pairs of nodes. In a migration network, it assesses how crucial the presence of a country is for mediating human movements between different countries. Nodes with a high betweenness centrality may have a bigger influence within a network. The purpose of betweenness centrality is to measure the extent to which a node contributes to the global connectivity of the network. Mathematically, we can describe the betweenness centrality as

$$c_B(i) = \sum_{s,t \in V} \frac{\sigma_{st}(i)}{\sigma_{st}}$$

where V is the set of nodes, $\sigma_{st}$ is the number of shortest paths between nodes s and t, and $\sigma_{st}(i)$ represents the number of shortest paths from s to t that pass through node i. A visual representation of betweenness can be seen in Figure 3, where the nodes that connect other nodes have large betweenness (0.57).

Figure 3: Nodes with betweenness centrality
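The betweenness formula can be evaluated directly on a small network. The following is a minimal, unnormalized sketch for an unweighted directed graph that counts shortest paths with breadth-first search; it is meant to illustrate the definition, is not the implementation used for the results in this thesis, and the toy graph is hypothetical.

```python
from collections import deque


def bfs_counts(adj, source):
    """Return (dist, sigma): shortest-path lengths from source and the
    number of distinct shortest paths to every reachable node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(adj):
    """Unnormalized betweenness: sum over ordered pairs (s, t) of the
    fraction of shortest s-t paths that pass through node i."""
    nodes = set(adj) | {v for nbrs in adj.values() for v in nbrs}
    info = {s: bfs_counts(adj, s) for s in nodes}
    c_b = {i: 0.0 for i in nodes}
    for s in nodes:
        dist_s, sigma_s = info[s]
        for t in dist_s:
            if t == s:
                continue
            for i in nodes:
                if i in (s, t) or i not in dist_s:
                    continue
                dist_i, sigma_i = info[i]
                # i lies on a shortest s-t path if the distances add up.
                if t in dist_i and dist_s[i] + dist_i[t] == dist_s[t]:
                    c_b[i] += sigma_s[i] * sigma_i[t] / sigma_s[t]
    return c_b


# Toy graph: two equally short routes from S to T, via X or via Y.
adj = {"S": ["X", "Y"], "X": ["T"], "Y": ["T"], "T": []}
scores = betweenness(adj)
```

In the toy graph, X and Y each carry half of the shortest paths from S to T, while S and T do not lie between any other pair of nodes.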

2.2.2.2. PageRank

PageRank is an algorithm originally developed to rank web pages (Page et al., 1998). It gives each page a rating of its importance, which is a recursively defined measure whereby a page becomes important if important pages link to it. The underlying assumption is that more important pages are likely to receive more links from other websites.

One way to think about PageRank is to imagine a random migrant moving from country to country, following the edges (migratory flows) between countries. Occasionally, this migrant instead jumps to a randomly chosen country. Since weighted edges are considered, the higher the number of migrants between two countries, the higher the probability that the migrant will follow this edge. If the migrant moves an infinite number of times, the PageRank of a country is roughly the probability that the random migrant is in this country at any given time. In a migration network, PageRank identifies attractive destination countries.
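The random-migrant picture corresponds to a simple power iteration. The sketch below is illustrative only: the damping factor of 0.85 is the conventional default and is an assumption here, nodes without outgoing edges are assumed to spread their rank uniformly, and the edge weights are invented.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration.

    edges: list of (source, target, weight) tuples.
    damping: probability of following an edge instead of jumping randomly
             (0.85 is an assumed, conventional value).
    """
    nodes = sorted({n for s, t, _ in edges for n in (s, t)})
    n = len(nodes)
    out_weight = {node: 0.0 for node in nodes}
    for s, _, w in edges:
        out_weight[s] += w

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0)
        new_rank = {
            node: (1 - damping) / n + damping * dangling / n for node in nodes
        }
        for s, t, w in edges:
            # The migrant follows an edge with probability proportional to its weight.
            new_rank[t] += damping * rank[s] * w / out_weight[s]
        rank = new_rank
    return rank


# Hypothetical flows: most migrants from A and all migrants from B move to C.
edges = [("A", "C", 10), ("B", "C", 5), ("C", "A", 1), ("A", "B", 1)]
rank = pagerank(edges)
```

In this toy network, C receives the heaviest incoming flows and ends up with the highest PageRank, which matches the interpretation of PageRank as a measure of destination attractiveness.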

2.2.2.3. Triadic closure/Clustering

Triadic closure is often observed in social and spatial networks. The idea behind triadic closure is rather simple: in many networks, if node A is connected to node B, and node B is connected to node C, there is a larger than expected probability that node A will also be connected to node C. This is sometimes described as ‘friends of my friends are my friends.’ Factors that could influence triadic closure in networks could be social closeness (e.g. common religion, language) or geographical closeness. In this thesis, we will use the clustering coefficient $C_i$ as a measure for triadic closure. The clustering coefficient $C_i$ is defined as the number of triads in which node i participates divided by the maximum possible number of triads:

$$C_i = \frac{2 t_i}{d_i (d_i - 1)}$$

where $t_i$ denotes the number of triads around i, $d_i$ denotes the degree of node i, and $d_i(d_i - 1)/2$ is the maximum possible number of triads. The value will always be between 0 and 1, since it is the fraction of possible edges between the neighbors of i that are realized. The average clustering coefficient C for the entire network is the mean of the clustering coefficients of all n nodes:

$$C = \frac{1}{n} \sum_i C_i$$
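A minimal sketch of this computation for an undirected network is given below. The neighbor sets describe a made-up four-node network, and nodes with fewer than two neighbors are assigned a coefficient of 0 by convention (an assumption of this sketch).

```python
from itertools import combinations


def clustering(neighbors):
    """Local clustering coefficient C_i = 2 * t_i / (d_i * (d_i - 1)).

    neighbors: dict mapping each node to the set of its neighbors
               (undirected network, so the relation must be symmetric).
    """
    coefficients = {}
    for i, nbrs in neighbors.items():
        d = len(nbrs)
        if d < 2:
            coefficients[i] = 0.0  # assumed convention: no triad is possible
            continue
        # t_i: number of pairs of neighbors of i that are themselves connected.
        t = sum(1 for u, v in combinations(nbrs, 2) if v in neighbors[u])
        coefficients[i] = 2 * t / (d * (d - 1))
    return coefficients


# Toy network: triangle A-B-C plus a pendant node D attached to A.
neighbors = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
c = clustering(neighbors)
average_c = sum(c.values()) / len(c)
```

Node A has three neighbors of which only one pair (B, C) is connected, giving 1/3, whereas B and C sit in a closed triangle and reach the maximum value of 1.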

2.2.2.4. Shortest path & diameter

The clustering coefficient helps to identify local clusters in the network but provides limited information about the overall connectivity of the network. The average shortest path length can be used as an indicator of global connectivity. A path in a network can be defined as a sequence of edges which joins a sequence of distinct nodes. The shortest path is the path with the minimum number of edges between two nodes. The path length is equal to the number of edges traversed between those two nodes. Mathematically, the average shortest path length can be written as:

$$a = \sum_{s,t \in V} \frac{d(s,t)}{n(n-1)}$$

where V is the set of nodes, d(s, t) is the length of the shortest path from s to t, and n is the number of nodes in the network. In a weighted network, the weight of the edges is taken into account, and a weighted sum for d(s, t) is used.
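For an unweighted network, the average shortest path length can be sketched with one breadth-first search per node. The example assumes that every node appears as a key in the adjacency dictionary and that every node can reach every other node; the toy path graph is illustrative.

```python
from collections import deque


def bfs_distances(adj, source):
    """Number of edges on the shortest path from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_shortest_path_length(adj):
    """Sum of d(s, t) over all ordered pairs, divided by n * (n - 1)."""
    n = len(adj)
    total = 0
    for s in adj:
        dist = bfs_distances(adj, s)
        if len(dist) < n:
            raise ValueError("some node is unreachable from " + str(s))
        total += sum(dist.values())
    return total / (n * (n - 1))


# Toy undirected path A - B - C - D, stored with both directions.
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
a = average_shortest_path_length(adj)
```

For the four-node path, the pairwise distances are 1, 2, 3, 1, 2, and 1, so the average over all ordered pairs is 20/12 ≈ 1.67.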

2.2.2.5. Community detection

The ability to detect communities within large networks has been of great interest to researchers. For example, in a network of web pages, communities might correspond to sets of sites dealing with related topics; in a migration network, communities might group countries with dense internal migration connections. Over the years, a wide array of community detection algorithms have been developed; in this thesis, we will use the Louvain method (Blondel et al., 2008). The Louvain method is an algorithm for detecting communities in networks by maximizing a modularity score for each community.

Modularity quantifies the quality of an assignment of nodes to communities. This means evaluating how much more densely connected the nodes within a community are, compared to how connected they would be in a random network. In a randomly wired network, the connection pattern between the nodes is expected to be uniform, independent of the network’s degree distribution. Modularity can be either positive or negative, with positive values indicating the possible presence of community structure (Newman, 2006). Thus, when searching for divisions of a network that have positive (and preferably large) values of modularity, communities can be identified.
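To make this comparison against a random baseline concrete, the sketch below scores a given partition of a small directed weighted network. It compares each observed edge weight with the weight expected from the out- and in-degrees alone, which is the directed weighted form of modularity; the function, the toy edges, and the partitions are illustrative assumptions, and the sketch only evaluates a partition rather than searching for the best one as the Louvain method does.

```python
def modularity(edges, community):
    """Modularity of a partition of a directed weighted network.

    edges: list of (i, j, weight) tuples.
    community: dict mapping each node to its community label.
    """
    m = sum(w for _, _, w in edges)
    k_out, k_in, weight = {}, {}, {}
    for i, j, w in edges:
        k_out[i] = k_out.get(i, 0) + w
        k_in[j] = k_in.get(j, 0) + w
        weight[(i, j)] = weight.get((i, j), 0) + w

    q = 0.0
    for i in community:
        for j in community:
            if community[i] != community[j]:
                continue  # only pairs within the same community contribute
            expected = k_out.get(i, 0) * k_in.get(j, 0) / m
            q += weight.get((i, j), 0) - expected
    return q / m


# Toy network: two reciprocal pairs with no flows between them.
edges = [("A", "B", 1), ("B", "A", 1), ("C", "D", 1), ("D", "C", 1)]
q_split = modularity(edges, {"A": 0, "B": 0, "C": 1, "D": 1})
q_single = modularity(edges, {"A": 0, "B": 0, "C": 0, "D": 0})
```

Splitting the toy network into its two natural groups yields a clearly positive score, while placing all nodes in a single community yields zero.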

For a directed weighted graph, the modularity Q can be defined as:

$$Q = \frac{1}{m} \sum_{i,j} \left[ \omega_{ij} - \frac{k_i^{out} k_j^{in}}{m} \right] \delta(c_i, c_j)$$

With $m = \sum_{i,j} \omega_{ij}$ the sum of all edge weights; $k_i^{out}$ and $k_j^{in}$ the weighted out- and in-degrees; $c_i$ the community of node i; $\delta(c_i, c_j)$ equal to 1 if i and j are in the same community and 0 otherwise; and $(k_i^{out} k_j^{in})/m$ the expected weight of an edge from i to j in a random graph having the same configuration as ours. Now we need to find the partitioning producing the highest modularity Q. This will be the optimal partition for our network. To do this, we use the Louvain method (Blondel et al., 2008), which searches for the partition that maximizes the modularity score.

3. RESULTS

3.1. DATA EXPLORATION

Our data set contains a total of 58.612 unique profiles from sex workers. Because the website is advertised as a place to contact female sex workers, and the number of records for male sex workers is limited (approximately 1.000), we decided to focus this thesis on female sex workers.

Ranking   Country                 # sex workers
1         United Kingdom          7.958
2         Malaysia                5.478
3         Germany                 4.069
4         United Arab Emirates    3.286
5         Spain                   3.085
6         Netherlands             2.613
7         Turkey                  2.553
8         Italy                   2.397
9         France                  2.029
10        Russia                  1.888

Table 1: Top 10 working locations of sex workers in our data set

Table 1 shows the top countries in our data set; this ranking is based on the number of profiles currently active in each country. Within the top 10, we see a mix of Western European, Asian, and Middle Eastern countries. Assuming the number of sex workers registered in each country is an indication of popularity, we can assume that countries with a high number of registered sex workers provide more traffic to the site than countries with a low number of profiles. These locations are a mix of countries that have legalized (The Netherlands, Germany) or decriminalized (Malaysia) prostitution and countries where prostitution is illegal (United Arab Emirates). The variation of countries in Table 1 shows that our data set can provide an outlook on the demographics and migration patterns of sex workers throughout the world. In total, there are 106 unique countries represented in our database.

Our data set allows a unique insight into the demographics of internet-based sex workers. The average sex worker in our data set is a 24-year-old female, weighs 53.4 kg, is 167 cm tall, and charges $219 for 1h of services (outcall). The ethnicities of sex workers are predominantly European (60.93%) and Asian (19.61%). The other ethnicities, Latin (6.79%), Mixed (4.29%), Ebony (3.48%), Indian (3.03%), and Arabian (1.83%), are a minority in the data set. Around 69% of the sex workers meet exclusively with men, and 30% with both men and women. The remaining 1% meets with couples (0.6%) or exclusively with women (0.4%). There is no significant difference between the rates of sex workers based on whom they meet.

Figure 4: Age-Ethnicity distribution, number of records (x-axis) of a certain age (y-axis). Bars are colored based on ethnicity distribution within that age group

We observed that there are almost no 18-year-old sex workers in our data set. We could theorize this is because escort work requires more experience and there is a lower demand for 18-year-old sex workers in the industry. Most of the sex workers are between 19 and 30 years old. When looking at the ethnicities within the age distribution, we can make some observations. Most notably, we see a steep decline in Asian sex workers after age 25. Since the majority of Asian sex workers in our data set are from China, an explanation for this could be found in Chinese culture. In China, the derogatory term ‘leftover women’ is widely used by state-owned media to describe women in their late twenties who are not married (Fincher, 2014). As a result, there is significant pressure on Chinese women to get married before they reach thirty. The strong decline of Asian sex workers after age 25 is in line with the effects of China’s mass media ‘leftover women’ campaign. Although our data set is limited, this age distribution gives us an interesting view of a cultural difference between sex workers.

Source     Target                  Migration flow
China      Malaysia                2905
Russia     Italy                   1340
India      United Arab Emirates    923
Russia     Israel                  836
Russia     France                  780
Romania    United Kingdom          766
Ukraine    Turkey                  707
Russia     Turkey                  609
Russia     United Kingdom          582
Italy      Netherlands             439
Vietnam    Malaysia                434

Table 2: Major migration flows in our data set

Sex workers often participate in (short-term) migration. Around 66% of the sex workers in our data set are not active in their country of origin. Table 2 shows the major migration flows in our database. Notable is the migration flow between China and Malaysia. With a flow more than twice the size of the second largest, this is by far the largest migration flow in our data. Malaysia appears twice in the target column; we could hypothesize that Malaysia is a popular destination for sex workers in South East Asia. Malaysia is a country with a broad immigration policy, and immigrants represent around 40% of the population⁵. We notice the dominant presence of Russia in the source column. This implies that Russia is a high ‘exporter’ of sex workers. The strong migration flow between India and the United Arab Emirates is not unexpected. Given that the UAE forms a principal destination for Indian emigrants in search of jobs (de Bel-Air, 2015), many of the migration systems are already in place. Other migration flows may not be as straightforward to explain; we will discuss push-pull factors of migration in more detail later in this thesis.

5 Department of Statistics Malaysia, Population distribution study (2010)

3.2. INTERNATIONAL MIGRATION NETWORK

The global migration patterns of sex workers can be treated as a network. We call this the International Migration Network (IMN). Countries are viewed as nodes, and the migration flows between countries are edges. To build our network, we construct an edge list. The edge list consists of three columns: Source, Destination and Weight. The source column is based on the nationality of the sex worker, and the destination column is based on the location where the sex worker is currently active. We then aggregate all edges and count the number of times each connection occurs in our data set; we call this the weight. This will give us a weighted directed edge for each migration flow in the data set.

Source      Destination             Weight
China       Malaysia                2905
Germany     Germany                 2295
Russia      Russia                  1614
Russia      Italy                   1340
Malaysia    Malaysia                1131
India       India                   1067
India       United Arab Emirates    923
Russia      Israel                  836
Russia      France                  780
Romania     United Kingdom          766

Table 3: Example of our edge list. The source column indicates the country of origin; the destination column indicates where the sex worker is currently active. The weight indicates the size of the migration flow

Table 3 provides an example of our edge list. The first row is interpreted as follows: in our data set, there are 2905 sex workers born in China that are now active in Malaysia. Note that some of the edges with the highest weights are ‘self-loops.’ Self-loops are when a sex worker is active in their country of origin.
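The aggregation into a weighted edge list can be sketched with Python's collections.Counter. The rows and counts below are toy values chosen for illustration, and the field names source and destination follow the column names introduced in section 2.1.3.

```python
from collections import Counter


def build_edge_list(rows):
    """Count how often each (source, destination) pair occurs.

    Returns (source, destination, weight) tuples, heaviest edge first.
    """
    counts = Counter((r["source"], r["destination"]) for r in rows)
    return [(s, d, w) for (s, d), w in counts.most_common()]


# Toy profile rows; real weights are far larger.
rows = (
    [{"source": "China", "destination": "Malaysia"}] * 3
    + [{"source": "Germany", "destination": "Germany"}] * 2
    + [{"source": "Russia", "destination": "Italy"}]
)
edge_list = build_edge_list(rows)

# Self-loops (source equals destination) are not migration flows.
migration_edges = [(s, d, w) for s, d, w in edge_list if s != d]
```

Filtering out self-loops in a separate step keeps the full edge list available for descriptive statistics while leaving only cross-border flows for the network measures.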

3.2.1. Visual Analysis

Visualizing networks is a crucial step while studying them. To visualize our network, we added lat-long coordinates and projected each node on a world map. When visualizing networks, it is important to select the correct approach. We have selected two visualizations, each serving a unique purpose.

Figure 5: The figure plots the directed weighted version of the IMN where only links with weight larger than 5 are reported. The thickness of the edges is proportional to the weights. The size of the nodes is proportional to the log of country population. Node color represents country income, measured by the GDP per capita

Figure 5 is designed to increase the overall understanding of the structure of our network. Inspired by Mastrorillo and Fagiolo (2013), we plot the weighted IMN where node size is made proportional to the logged country population, while node color (from beige to darker red) represents country income, measured by the country's GDP per capita. To improve readability, only edges with a weight larger than 5 are shown. The map allows one to appreciate the relevant countries that are involved in the migration of sex workers within our data set. Notable is the widespread presence of low-income countries. The high density of edges in Europe shows the importance of European countries. The issue with this visualization is that it does not indicate which countries are the most important suppliers of sex workers in our data set. Therefore, we have included Figure 6.

Figure 6: The figure reports the migration network with a focus on out-degree. The size of the nodes is proportional to the out-degree. The thickness of the edge is proportional to the weight. Node color represents out-degree (darker indicates higher out-degree)

Figure 6⁶ shows a visual representation of the directed migration network, focused on emigration. The size of each node is scaled based on out-degree (i.e. the number of different countries receiving migrants from the node). Nodes with a higher out-degree are displayed bigger than nodes with a smaller out-degree. The weight of each migration flow defines the color and size of the edge. Edges with a higher weight will be thicker and have a darker color. The arrows indicate the direction of the migration flow. Some significant nodes are the United States and South American countries like Brazil, Colombia, and Venezuela. When we look at the edges, we can see a strong connection (darker lines) between Brazil and Portugal and between Colombia/Venezuela and Spain, supporting the hypothesis that countries with strong social proximity (language and colonial ties) will have more migration between them.

When observing Europe, we see that Central and Eastern European countries are the dominant nodes, confirming the idea that many of the sex workers in Europe originate from Eastern European countries. The biggest nodes are Ukraine, Poland and the Czech Republic. Since sex workers migrate for economic reasons, Ukraine being the second poorest country⁷ in Europe likely contributes to its high out-degree. The size of Russia stands out on the map. It is clear that Russia is a big supplier of sex workers and has strong connections (noted by darker red edges) to several European and Asian countries. In our data set, around 20% of migrant sex workers have a Russian nationality. The top 3 European destinations for Russian sex workers are Italy, France, and Spain, with Italy having nearly double the weight of France and France having approximately double the weight of Spain. We wanted to see if the strong connection between Russia and Italy was also present in official migration statistics; this is not the case. We could hypothesize that the exceptionally strong Russia-Italy connection is due to the high demand for Russian sex workers in Italy. These effects can then be enhanced as a result of migrated sex workers’ personal networks spreading information about destination opportunities and providing assistance to other sex workers, thereby reducing movement costs and risks (Gurak and Caces, 1992).

6 A bigger version of Figure 6 is available in the appendix

Weight-wise, the biggest migration flow in our network is the connection between China and Malaysia. Given that prostitution is illegal in most of China, it is expected that sex workers will migrate to more favorable countries. In Malaysia, there are no federal laws against prostitution, so from a policy perspective, this migration flow makes sense. Why so many Chinese sex workers migrate specifically to Malaysia is hard to say. Possible explanations are geographical proximity, or the fact that over 25% of Malaysians are ethnically Malaysian Chinese, which creates a social connection like the ones mentioned before. However, since there is limited research about female sex work in Malaysia, it would be hard to verify these assumptions. We should also take into account that our data set is based on one website; it could be that the website is simply popular in Malaysia.

3.2.2. Network analysis

The vast majority of complex networks share common features. Since migration occurs within a network, studying its properties is fundamental to understanding migration patterns and the underlying process of globalization. Unlike more straightforward measures (e.g. the total number of migrants living in each country, net migration rate), network measures can provide an integrated understanding of the IMN. In this section, we will look at the structure of our network and apply the measures discussed in section 2.2.

7 Using GDP per capita as a measure

3.2.2.1. Centrality measures

Not all nodes in a network are equally important. We will identify the most important nodes using centrality measures. In this thesis, the following measures will be evaluated: degree and weighted degree centrality, betweenness, and PageRank. The centrality measures are computed on the weighted directed IMN with self-loops⁸ removed.

The most straightforward centrality measure is simply to look at a node’s degree. We will start by discussing the simple and cumulative distribution of the degrees. Describing the degree distribution is an important step in learning more about the structure of our network. For directed networks, we have both in- and out-degree distributions. In most real-world networks, the degree distribution is highly skewed: most of the nodes have low degrees while a small but significant fraction of the nodes has high degrees. These well-connected nodes are called ‘hubs.’ Hubs are defined as nodes with many edges or with edges that place them in central positions for facilitating traffic over a network. Since the probability of hubs, although low, is significant, the degree distribution of real-life networks is often characterized by a positive skewness and a long tail.

We observe this long tail in the out- and in-degree distributions in Figure 7, most notably in the in-degree histogram. This asymmetric shape of the degree distribution has several consequences for the migration process. The highly connected nodes, the hubs, are generally responsible for keeping the network connected. Hubs are essential for spreading information in a network, but the removal of one of the hubs could cause the network to fall apart. In our case, the removal of a hub could occur due to new migration policies or laws targeting sex workers.


Figure 7: (A) in-degree & (B) out-degree histogram

8 A self-loop is when a node links to itself. In our data set, this means a sex worker is active in their country of origin.

A policy change in one of the hubs could dramatically increase the migration pressure on other major hubs. On the other hand, if one would like to reduce international migration among sex workers, targeting the hubs would be the most effective method. We will focus more on these hubs later. There has been a high level of interest in whether real-world networks follow power-law degree distributions. Networks that follow a power-law degree distribution are referred to as scale-free networks. An easy way to spot a power law is by looking at the cumulative degree distribution; this shows the likelihood of a randomly selected country possessing a degree greater than a particular reference value. The cumulative degree distribution of our network (Figure 8) does not display strong power-law characteristics.

Figure 8: Cumulative degree distribution of the IMN, plotted on a log-log scale
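A cumulative degree distribution of this kind can be computed in a few lines. The sketch below uses the convention P(K ≥ k), which is an assumption made here so that the largest observed degree still has a non-zero probability and can be shown on a log-log scale; the degree sequence is a toy example.

```python
def cumulative_degree_distribution(degrees):
    """Fraction of nodes with degree >= k, for every observed degree k."""
    n = len(degrees)
    return {
        k: sum(1 for d in degrees if d >= k) / n
        for k in sorted(set(degrees))
    }


# Toy degree sequence with one comparatively well-connected node.
degrees = [1, 1, 2, 3, 5]
ccdf = cumulative_degree_distribution(degrees)
```

On a log-log plot, a power-law degree distribution would appear as an approximately straight line in these values.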

3.2.2.2. Degree Centrality

The degree centrality of a network can be found by looking at the degree of each node. The degree of a node is equal to the number of countries that node is connected to. In a directed network, a distinction can be made between a node’s in- and out-degree. The in-degree of a node k corresponds to the number of incoming edges that terminate in node k (i.e. the number of different countries that send sex workers to country k). The out-degree should be interpreted as the number of outgoing edges that originate in node k (i.e. the total of countries sex workers from country k are active in).

In network analysis, the in-degree can be seen as an indication of popularity (Opsahl et al., 2010), and the out-degree as evidence of expansiveness. Nodes with a high degree are involved in the circulation of sex workers to and from multiple countries. These nodes are the so-called ‘hubs’ mentioned earlier. When translated to a migration context, nodes with a high out-degree send sex workers to the highest number of countries, i.e. they supply the labor force. Nodes with a high in-degree receive migrants from the highest number of countries. In other words, they attract sex workers from all over the world.

Table 4: Ranking of countries in our data set based on Degree, In-Degree and Out-Degree

Country            Degree    Country                 In-Degree    Country           Out-Degree
United Kingdom     137       United Arab Emirates    71           United Kingdom    69
Spain              114       United Kingdom          68           Russia            69
France             101       Oman                    60           United States     67
Russia             98        Spain                   59           Ukraine           56
Germany            96        Germany                 59           Spain             55
Italy              95        Malta                   58           Brazil            54
Turkey             85        Turkey                  56           Poland            51
Belgium            85        France                  54           Czech Rep.        49
Canada             83        Netherlands             52           Colombia          49
Netherlands        80        Italy                   51           France            47
China              77        China                   51           Italy             44
Greece             77        Singapore               50           Estonia           44
Sweden             73        Qatar                   50           Bulgaria          42
Poland             72        Belgium                 47           Romania           40
Austria            72        Canada                  44           Latvia            40
United Arab Em.    71        Bahrain                 43           Canada            39
Australia          70        Greece                  42           Belgium           38
Singapore          69        Sweden                  41           Germany           37
Romania            68        Austria                 40           Hungary           37

In Table 4, we rank the countries by their degree, in-degree, and out-degree. In the ranking based on degree, the highest-ranked nodes are the UK, Spain, France, and Russia. The top 10 is completed by Canada, Turkey, and some Western (Germany, Belgium, Netherlands) and Southern (Italy) European countries. Countries with a high degree can be seen as popular migrant destinations. In 2019, two-thirds of all international migrants were living in only 20 countries⁹. The top spots on that list are the United States¹⁰, Germany, Russia, the United Arab Emirates, France, and Italy. The overlap between the top migration countries and the most popular migration destinations in our data set indicates that the migration of sex workers is likely to follow overall migration patterns, while still displaying unique characteristics. The average degree in our network is 20.613; this means that each country is connected on average to about 20 other countries, by either out- or in-migration. In our case, it is interesting to look at the differences between countries with a high in-degree and countries with a high out-degree. A high in-degree implies that the country is attractive to sex workers (strong pull factors). Conversely, a high out-degree implies that many sex workers leave the country looking for better opportunities (strong push factors).

When we order the countries by in-degree, the ranking immediately changes. Two Middle Eastern countries (United Arab Emirates, Oman) enter the top 10, in first and third position, respectively. At first sight, the high in-degree (i.e. popularity) of these countries seems counterintuitive; prostitution is illegal in both of them. Nevertheless, the United Arab Emirates (UAE) has a reputation as being one of the Middle East’s top sex tourism destinations (Lageman, 2016). The high in-degree in our network seems to confirm this reputation. The lack of official statistics about sex work makes this observation valuable; it is empirical evidence that the UAE is a popular destination for sex workers from around the world. Following this reasoning, we could hypothesize that Oman and Qatar are similar hot-spots for sex workers in the Middle East.

| Ranking | Country | Weight |
|---|---|---|
| 1 | India | 923 |
| 2 | Russia | 308 |
| 3 | Ukraine | 219 |
| 4 | Poland | 146 |
| 5 | Romania | 111 |

Table 5: Biggest migration inflows into The United Arab Emirates

9 International Migrant Stock: Ten key messages 2019, UN DESA
10 Data concerning the USA is not representative in our data set due to the website not being active in the USA

Table 5 shows the biggest migration inflows into the UAE. Next to the strong migration flow between India and the UAE, which we explained earlier, the biggest suppliers of sex workers to the UAE are Russia and several Eastern European countries. The high income (as measured by pcGDP) discrepancy between the UAE and its major suppliers of sex workers supports the hypothesis that countries with a high in-degree offer economic opportunities for sex workers.

An interesting observation is that prostitution is illegal in both the UAE and Oman, and yet they are both in the top three most popular countries in our data set. This implies that policy has little effect on sex-worker migration. The primary drivers seem to be economic factors such as labor demand and income inequalities. It is notable that Middle Eastern countries like Qatar, the UAE, and Oman only have an in-degree and no out-degree: no sex worker in our database has listed their nationality as one of these countries. A possible explanation is that religious strictness limits the number of sex workers who are open about their nationality.

When we look at the out-degree, the top 10 changes again. The UK and Russia share the top position (an out-degree of 69 each), followed by the USA. Notable newcomers in the top 10 are Ukraine and Central European countries (Poland, Czech Republic) as well as South American countries (Brazil, Colombia). The high position of Ukraine is no surprise; the combination of high poverty rates and a lack of employment opportunities has led many women to sex work. Ukraine is known to have a greater number of trafficking victims than any other Eastern European nation (IOM, 2015). It could be that some of the sex workers with a Ukrainian nationality in our data set are victims of trafficking. If that were the case, looking at the major migration outflows from Ukraine could provide insights into trafficking patterns. The strongest migration outflows from Ukraine go to Turkey, the UAE, France, and Israel. In this thesis we will not go into detail about this, but it is an interesting potential application of the international migration network of sex workers we have built.

The USA is an interesting case. Since prostitution is illegal in the USA¹¹, the website we used for our data has no listings for escort services in the USA. We can assume this is to avoid legal trouble. Considering no sex workers from the data set are active in the USA, the in-degree is 0. The high out-degree is due to the number of workers with an American nationality working in different countries.

11 Nevada is the only U.S. jurisdiction to allow some legal prostitution

A note on the high position of the UK in all rankings is in order. Although the UK is one of the most popular migration destinations worldwide, its dominance in our network needs to be seen in the context of our data. Like most data scraped from public websites and forums, our data contains some inaccuracies. It was not published to be turned into a migration network, and we have to take this into account. In the case of the UK, we recorded all sex workers that had ‘English’ listed as their nationality as being from the United Kingdom. When manually checking the data, we noticed that some of the UK nationalities could be listed wrongly, especially since some of the sex workers with a UK nationality did not have English listed as one of the languages they speak. We suspect that, due to confusion when creating their profile, some sex workers listed ‘English’ as their nationality instead of their language. This should be taken into account when looking at the results. Of course, this does not take away from the United Kingdom being one of the most popular migration destinations worldwide.

Weighted Degree Centrality

Since countries are viewed as nodes of a weighted network, weighted variants of these degree measures are also available. Weighted degrees take into account not only the number of connections but also the strength of the migration flow between countries (i.e. the weight).

Weighted in-degree

| Ranking | Country | Weight |
|---|---|---|
| 1 | United Kingdom | 5769 |
| 2 | Malaysia | 4051 |
| 3 | United Arab Emirates | 2765 |
| 4 | Spain | 1949 |
| 5 | Italy | 1945 |
| 6 | Turkey | 1931 |
| 7 | Netherlands | 1858 |
| 8 | France | 1762 |
| 9 | Germany | 1421 |
| 10 | Israel | 1050 |

Table 6: Most attractive countries based on the weighted number of immigrants (weighted in-degree)

The weighted in-degree is an empirical measure of attractiveness, i.e. the total number of immigrants into a country. We can say the weighted in-degree highlights the popular immigration countries. Applied to our migration data, weighted measures will heavily favor countries in which the website is more popular. Table 6 shows the ranking based on weighted in-degree. We can observe that this ranking is almost identical to Table 1, where countries were ranked based on the active profiles in each country. The differences between both tables are limited to a lower position of Germany, and Russia being swapped with Israel. Germany is ranked lower in the weighted ranking due to the removal of self-loops in the network, and Russia is ranked lower due to not attracting many immigrants (low in-degree).

The similarities between Table 1 and Table 6 highlight the main shortcoming of using weighted degree measures to rank countries. The weighted in-degree is noisy and not always reliable. It varies over time and is subject to variability in the data. It provides a biased view because the users of the site are not evenly distributed across all countries in the database. Later in this thesis, we will try to link the attractiveness of certain countries to other possible factors such as wealth and GDP per capita. However, such measures require extra data. In the literature, other measures are proposed, such as dividing the weighted in-degree by the population, or creating a ranking based on $\text{ratio} = \frac{\text{weighted in-degree}}{\text{weighted out-degree}}$. These measures would either give too much weight to lowly populated countries or are simply not feasible with our data; for example, we cannot calculate the ratio for every country given the limitations of our data¹². Therefore, we will look at some other centrality measures to get a better understanding of our network.
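The weighted measures and the in/out ratio discussed above can be illustrated as follows. This is only a sketch on invented data, assuming edge weights are stored under the attribute name `weight` in a NetworkX `DiGraph`; the helper `in_out_ratio` is a hypothetical name introduced for this example. It also shows why the ratio cannot be computed for nodes with an out-degree of 0.

```python
import networkx as nx

# Hypothetical weighted flows (origin, destination, number of profiles).
G = nx.DiGraph()
G.add_weighted_edges_from([
    ("A", "B", 5),
    ("A", "C", 2),
    ("A", "D", 4),
    ("B", "C", 7),
    ("C", "A", 3),
])

# Weighted degrees sum the edge weights instead of counting edges.
w_in = dict(G.in_degree(weight="weight"))
w_out = dict(G.out_degree(weight="weight"))

def in_out_ratio(node):
    """Weighted in-degree divided by weighted out-degree.

    Returns None when the weighted out-degree is 0, in which case
    the ratio is undefined.
    """
    if w_out[node] == 0:
        return None
    return w_in[node] / w_out[node]

for node in sorted(G):
    print(node, w_in[node], w_out[node], in_out_ratio(node))
```

Country D only receives workers in this toy example, so its ratio is undefined, which mirrors the situation of destination-only countries in our network.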

Weighted out-degree

| Ranking | Country | Weight |
|---|---|---|
| 1 | Russia | 6814 |
| 2 | China | 3449 |
| 3 | Ukraine | 2024 |
| 4 | Romania | 1562 |
| 5 | United Kingdom | 1432 |
| 6 | Brazil | 1422 |
| 7 | Italy | 1272 |
| 8 | Czech Republic | 1204 |
| 9 | India | 1189 |
| 10 | Spain | 996 |

Table 7: Countries ranked based on the weighted number of emigrants (weighted out-degree)

Table 7 shows countries ranked based on weighted out-degree (i.e. migrant outflow). The highest weighted out-degrees belong to Russia, China, Ukraine, and Romania. Notable is the number of Eastern European countries that appear in the ranking. This observation provides empirical evidence in line with the tendency to think of migrant sex workers as being Eastern European. Since there are no official statistics that focus on sex workers, this is a valuable observation. Prior research¹³ on this subject focusses mainly on the trafficking¹⁴ of Eastern European women in the EU. China ranks second due to a very strong migration outflow to Malaysia.

12 Some important nodes have an out-degree of 0
13 Andrijasevic, Problematising trafficking for the sex sector: A case of eastern European women in the EU, 2007
14 We specifically refer to sex-trafficking: the action or practice of illegally transporting people from one country or area to another for the purpose of sexual exploitation.

PageRank

If we want a robust ranking based solely on the migration data itself, we can use the PageRank algorithm (explained in section 2.2). The approach of using PageRank to rank countries in migration networks was proposed by Cappart and Thonet (2015). In a migration network, PageRank identifies attractive migrant destination countries (Aleskerov et al., 2016). The algorithm builds on the idea that not all connections are equal. When a country receives migrants from an ‘important’ country, it gains more importance than when it receives migrants from a less ‘important’ country. Ranking countries based on their PageRank highlights countries that are ‘centers of international immigration’ and the countries which are directly linked with them through migration flows.
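To make the intuition behind PageRank concrete, the following is a simplified power-iteration sketch in plain Python. It is not the implementation used for our results; the function name, the damping factor of 0.85, the fixed number of iterations and the toy flows are all assumptions made for illustration. Each country passes its score to its destinations in proportion to the flow weights, and the score of countries without outgoing flows is spread evenly.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration on (origin, dest, weight) edges."""
    nodes = sorted({n for origin, dest, _ in edges for n in (origin, dest)})
    out_weight = {n: 0.0 for n in nodes}
    for origin, _, weight in edges:
        out_weight[origin] += weight

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Score held by countries without outgoing flows is shared by all.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0)
        base = (1 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {n: base for n in nodes}
        for origin, dest, weight in edges:
            new_rank[dest] += damping * rank[origin] * weight / out_weight[origin]
        rank = new_rank
    return rank

# Hypothetical flows: H receives from three countries, D receives one
# very large flow from a single country.
edges = [("A", "H", 1), ("B", "H", 1), ("C", "H", 1), ("E", "D", 10)]
rank = pagerank(edges)

weighted_in = {}
for _, dest, weight in edges:
    weighted_in[dest] = weighted_in.get(dest, 0) + weight

print(sorted(rank, key=rank.get, reverse=True))
```

In this toy example, D has the larger weighted in-degree, but H obtains the higher PageRank because it is connected to more sending countries. A single dominant inflow therefore does not translate into a high PageRank.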

| Ranking | Country |
|---|---|
| 1 | United Arab Emirates |
| 2 | United Kingdom |
| 3 | Spain |
| 4 | Germany |
| 5 | Turkey |
| 6 | Oman |
| 7 | Malta |
| 8 | Qatar |
| 9 | China |
| 10 | Italy |

Table 8: Most attractive countries according to their PageRank

We immediately notice a difference between Table 6, where we ranked countries based on weighted in-degree, and Table 8, where we use PageRank to rank the countries. For example, Malaysia, which was ranked second in Table 6, is ranked 17th based on PageRank¹⁵. We could say that Malaysia, although having a high weighted in-degree, is not an important or well-connected country in the migration network. In contrast, Malta, Oman, and Qatar are countries that did not appear in the weighted in-degree ranking. Their appearance in the PageRank ranking indicates they are popular destinations that share migration ties with other important nodes in our network.

15 See Table 11 in Appendix for full ranking of all centrality measures

Betweenness Centrality

Betweenness centrality measures how often a node lies on the shortest paths between other nodes. A node with high betweenness is ‘central’ in the sense of being fundamental for the network. If it is removed, it becomes substantially harder for the rest of the nodes to communicate with each other. In an airline network, nodes with high betweenness would be airports like Heathrow or other major airports one has to travel through to get from country A to B. The interpretation of betweenness centrality in a migration network is somewhat less straightforward. Since migrants are not constrained to the flow of links in the network, there is no reason for a migrant to stop in a certain country first. However, countries with high measures of betweenness centrality can still be thought of as hubs in the sense that they exchange large numbers of migrants with parts of the network that are otherwise not connected. Betweenness centrality assesses how crucial a country is for brokering human movements between different countries. Figure 9 shows a visual representation where the size and color of the node are scaled with betweenness centrality. It is immediately clear that only a few countries have high betweenness, and there is a fairly strict hierarchy among them. We see that the UK is by far the biggest and darkest node, indicating that the UK plays an important role in mediating migration flows in the IMN. The other nodes that stand out are Spain and Russia. Spain likely has a high betweenness due to the migrants from South America, while Russia is connected to many Middle Eastern and European countries.

Figure 9: Visualization of betweenness centrality in our network
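The brokering role captured by betweenness centrality can be sketched with NetworkX on an invented network. The example below is unweighted, which is an assumption: in shortest-path measures, edge weights are interpreted as distances, so migration counts would first have to be transformed before being used as weights. The node labels are hypothetical.

```python
import networkx as nx

# Two groups of countries that are only linked through "Hub".
G = nx.DiGraph()
G.add_edges_from([
    ("A", "B"), ("B", "A"),      # group 1
    ("C", "D"), ("D", "C"),      # group 2
    ("A", "Hub"), ("B", "Hub"),  # group 1 sends workers to the hub
    ("Hub", "C"), ("Hub", "D"),  # the hub sends workers to group 2
])

# Share of shortest paths between other pairs that pass through each node.
betweenness = nx.betweenness_centrality(G, normalized=True)
top = max(betweenness, key=betweenness.get)
print(top, betweenness)
```

Every shortest path from group 1 to group 2 passes through the hub, so it is the only node with a non-zero betweenness in this example.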

3.2.2.3. Community detection.

Information about the connections of each node can be used to identify community structures in a network. The goal is to partition the network into communities where connections within each group of nodes are stronger than the connections between groups (i.e. high modularity scores). Community detection goes further than descriptive measures; it is a tool to expose significant internal network structure, detecting relationships between nodes that may have otherwise gone unnoticed. In the migration context, communities are groups of countries that exchange many migrants with each other (and do not send migrants to other parts). It is not a given that these countries will share common social and cultural attributes. Previous research has shown that international migration is strongly related to cultural ties. It will be interesting to see if a community detection algorithm can replicate those ties, and find out to what extent cultural affinities explain migration patterns in our data. Language-based communities can be defined using major colonial European languages (English, French, Spanish, Portuguese). For community detection, we will use the Louvain method (Blondel & Guillaume, 2008) described in section 2.2.
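As an illustration of community detection with the Louvain method, the sketch below uses the `louvain_communities` function that ships with recent NetworkX releases (2.8 or later). Whether this matches the implementation used for Figure 10 is not stated here, so it should be read as an assumption; the toy graph, its weights and the seed are invented.

```python
import networkx as nx

# Two tightly connected groups joined by one weak migration tie.
G = nx.Graph()
G.add_weighted_edges_from([
    ("A", "B", 10), ("B", "C", 10), ("A", "C", 10),
    ("X", "Y", 10), ("Y", "Z", 10), ("X", "Z", 10),
    ("C", "X", 1),
])

# Louvain greedily moves nodes between communities to increase modularity.
communities = nx.community.louvain_communities(G, weight="weight", seed=42)
modularity = nx.community.modularity(G, communities, weight="weight")

print(sorted(sorted(c) for c in communities), round(modularity, 3))
```

The weak tie between C and X is not enough to merge the two groups, so the algorithm returns two communities with a clearly positive modularity score.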

Figure 10: Communities obtained with the Louvain method, countries belonging to the same community are plotted in the same color.

Figure 10 shows the result of community detection in the IMN. The 106 countries in our data set are divided into four communities. Each different color represents a community; the countries in grey are not present in our data. First is the orange (3) community of most South American countries, together with Spain and Portugal. Proximity factors could explain this community, but the inclusion of Spain and Portugal indicates the influence of linguistic factors (e.g. common languages) and historical factors (e.g. former colonies). Ecuador and Bolivia are not part of this cluster; this could be explained by the limited records we have for both these countries. The yellow community (2) groups Russia and former USSR countries like Belarus, Ukraine, and Kazakhstan. Notable is the inclusion of France and Italy in this community, indicating a strong connection between those countries.

Interestingly, we notice Europe is split between three clusters. This split is somewhat counterintuitive. Due to the freedom of movement and residence for persons in the European Union (EU), we would expect European countries to form a single community with strong migration ties. The clear split between European countries indicates that an open migration policy does not necessarily mean strong migration flows between countries. We could hypothesize that migration policy is not a primary driver of sex worker migration.

3.2.2.4. Small World Network

Many real-life networks display the small-world property. We can call a network small-world when most nodes are not neighbors of one another, but the neighbors of any given node are likely to be neighbors of each other, and most nodes can be reached from every other node by a small number of ‘hops.’ Research by Fagiolo & Mastrillo (2013) has made the argument that the decreasing average shortest path length in combination with a high clustering coefficient indicates small-world behavior in the international migration network. The implications of the small-world effect on a migration network would be the following: in networks with small distances, on average, any country can reach any other country through two or three hops. Potential migrants may obtain information about a large number of destinations and have access to more diverse opportunities.

To test whether our IMN exhibits small-world behavior, we use random graph null models. Random graph models are commonly used in research communities that analyze network data sets. This approach is based on generating a network that preserves selected characteristics of our empirical network while randomizing all other structure. Usually, properties of an empirical network are compared to the properties of an ensemble of randomized networks, in order to measure whether the empirical network properties are meaningful or whether they are a regular consequence of the underlying degree sequence (Fosdick et al., 2016). Most of the research concerning null models focuses on undirected networks without self-loops. We therefore transform our directed migration network into an undirected one¹⁶. The choice of the randomized null model directly affects the conclusions that can be drawn from the test. Since the degree distribution constrains many key properties of networks, it is common to use a space of graphs where the degrees of all the nodes are fixed, and the edges are placed between nodes uniformly at random. The most popular method to achieve this is called a configuration model.

To test whether our network displays the same ‘small-world behavior,’ we calculate the shortest path length ($L$) and the clustering coefficient ($C$) of our undirected empirical network. We then generate an ensemble of null-model networks and calculate the average of the shortest path length ($L_R$) and clustering coefficient ($C_R$) over this ensemble of null-model networks. To generate our ensemble of null-networks, we sampled 1000 random undirected migration networks¹⁷.
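The comparison with a null-model ensemble can be sketched as follows. This is not the procedure used for our results, which relied on igraph's degree-sequence sampler; here we assume NetworkX's `configuration_model` instead, collapse parallel edges and drop self-loops (which can slightly alter the degrees), and measure path lengths on the largest connected component. The karate club graph is only a stand-in for the undirected IMN, and the ensemble size of 20 is chosen for speed.

```python
import networkx as nx

def null_model_stats(G, n_samples=20, seed=0):
    """Clustering and path length of degree-preserving random graphs."""
    degrees = [d for _, d in G.degree()]
    clustering, path_length = [], []
    for i in range(n_samples):
        multigraph = nx.configuration_model(degrees, seed=seed + i)
        R = nx.Graph(multigraph)                         # collapse parallel edges
        R.remove_edges_from(list(nx.selfloop_edges(R)))  # drop self-loops
        # Path lengths are only defined inside a connected component.
        giant = R.subgraph(max(nx.connected_components(R), key=len))
        clustering.append(nx.average_clustering(R))
        path_length.append(nx.average_shortest_path_length(giant))
    return clustering, path_length

G = nx.karate_club_graph()  # stand-in for the undirected empirical network
C = nx.average_clustering(G)
L = nx.average_shortest_path_length(G)

C_R, L_R = null_model_stats(G)
mean_C_R = sum(C_R) / len(C_R)
mean_L_R = sum(L_R) / len(L_R)
print(round(C, 3), round(mean_C_R, 3), round(L, 3), round(mean_L_R, 3))
```

A small-world network would show an empirical clustering coefficient well above the ensemble average while keeping a comparably short average path length.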

| Measure | Empiric Network | Random Network | 95% CI | p-value |
|---|---|---|---|---|
| Clustering coefficient | 0.6333 | 0.6537 | [0.64639, 0.66103] | 0.989 |
| Avg. shortest path length | 1.7230907 | 1.7094806 | [1.6993299, 1.71971767] | 0.047 |
| Diameter | 4 | 3.66 | [3.4768003, 3.7454218] | 0.342 |

Table 9: Network measures compared to an ensemble of 1000 random networks

16 Implemented using NetworkX function to_undirected https://networkx.github.io/documentation/stable/reference/classes/generated/networkx.DiGraph.to_undirected.html
17 Random graphs were generated using Igraph https://igraph.org/r/doc/sample_degseq.html

In Table 9, we can see no substantial differences between the empirical and random networks. We tested for statistical significance by assigning a one-sided p-value, i.e. by counting how often $C_R$ yields a higher value than $C$. If $C_R$ yields higher scores in more than 95% of the runs, this would suggest that the clustering coefficient is mainly a function of the global network density rather than a result of our data. We conclude that the clustering in the random network is higher than in the empirical network, while the shortest path length is lower. These results suggest that the clustering coefficient in the IMN is primarily a result of the underlying degree sequence rather than of non-trivial local patterns.
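The counting procedure behind such a one-sided p-value can be written in a few lines. The sketch below is an assumption about the exact definition (it counts null-model values that are greater than or equal to the empirical value), and the numbers are invented rather than taken from Table 9.

```python
def one_sided_p_value(empirical, null_values):
    """Share of null-model runs whose value is at least the empirical value."""
    exceed = sum(1 for value in null_values if value >= empirical)
    return exceed / len(null_values)

# Hypothetical clustering coefficients of four randomized networks.
null_clustering = [0.64, 0.66, 0.65, 0.62]
p_value = one_sided_p_value(0.63, null_clustering)
print(p_value)
```

A value close to 1 means that almost every randomized network is at least as clustered as the empirical one, so the observed clustering cannot be attributed to structure beyond the degree sequence.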

Technically, our migration network does display the key property of small-world networks, i.e. the shortest path distance between all countries in the network is sufficiently small (Newman, 2010). Nevertheless, as stated by Danchev (2015), we must consider what this means in the specific context of international migration. Migration is often subject to restrictive policies. As a result, sex workers of certain nationalities might be restricted from traveling along the network, even if a path exists. We conclude that the argument that our IMN exhibits small-world behavior is questionable on both methodological ($C_R > C$ and $L_R < L$) and substantive grounds.

3.3. SOCIO-ECONOMIC ANALYSIS

3.3.1. Price on Micro-level

Due to the marginalized nature of sex work, little is known about the market for sex services. In the past, studies have collected field data by interviewing providers (Strathdee et al., 2008) or customers (Lowman and Atchison, 2006). Only recently has research been done using ‘big data’ techniques on the massive amount of information that has become available (Griffith et al., 2016). In this thesis, we would like to add some insights into this market, which has been largely overlooked by economists. Our data is about a specific type of sex worker, the female escort. Escorts provide sexual services and/or companionship in exchange for money (usually an hourly rate).

Incall vs Outcall

Different service locations expose sex workers to different risks (Taylor, 2003). If a sex worker can set their own service location, i.e. ‘incall,’ they will often be safer than when offering services at a location of the customer’s choosing, i.e. ‘outcall.’ In our database, the median hourly incall rate is $219, while the median hourly outcall rate is $272.3¹⁸, confirming earlier research indicating higher outcall than incall rates for sex workers.

Figure 11: One-hour price distribution incall (A) and outcall (B)

The difference between outcall and incall prices can be seen in Figure 11. In their work “Rational pricing in prostitution: Evidence from online sex ads” DeAngelo et al. (2018) studied data for more than 30 million online ads for real-world sex. They establish that there is a 15-19% price premium for outcall services. According to their research, this premium can be decomposed into travel cost (75%) and a remainder that is strongly correlated with violent crime risk.

One way to test whether the price premium is related to violent crime risk is by creating a new variable [outcall price – incall price]. Following DeAngelo et al. (2018), we would expect countries with a large discrepancy between outcall and incall prices to have higher travel costs or a higher crime risk. We tested this in our data set and found no significant correlation between the price discrepancy and crime statistics¹⁹, or other relevant statistics in the Human Freedom Index²⁰. Table 10 shows the countries with the highest discrepancy between incall and outcall prices.
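The construction of the [outcall price – incall price] variable and its correlation with a country-level statistic can be sketched as follows. All records and crime values below are invented, the aggregation by country median is an assumption, and the helper names are hypothetical; the sketch only shows the mechanics of the test.

```python
from statistics import mean, median

# Hypothetical records: (country, incall price, outcall price).
records = [
    ("X", 200, 260), ("X", 220, 300),
    ("Y", 150, 160), ("Y", 180, 200),
    ("Z", 300, 330),
]
crime_index = {"X": 40.0, "Y": 35.0, "Z": 55.0}  # hypothetical values

def premium_by_country(rows):
    """Median of (outcall - incall) per country."""
    diffs = {}
    for country, incall, outcall in rows:
        diffs.setdefault(country, []).append(outcall - incall)
    return {country: median(values) for country, values in diffs.items()}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

premium = premium_by_country(records)
countries = sorted(premium)
r = pearson([premium[c] for c in countries], [crime_index[c] for c in countries])
print(premium, round(r, 3))
```

With real data, the significance of `r` would still need to be assessed, since a correlation computed over a small number of countries can easily arise by chance.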

18 Calculated on 20.791 records for which we had both an incall and outcall rate available
19 Crime rate by country 2020: https://worldpopulationreview.com/country-rankings/crime-rate-by-country
20 The Human Freedom Index presents the state of human freedom in the world based on a broad measure that encompasses personal, civil, and economic freedom.

| Ranking | Country | [Outcall price – Incall price] |
|---|---|---|
| 1 | Saudi Arabia | 124.3 |
| 2 | Kuwait | 115.1 |
| 3 | Qatar | 103.2 |
| 4 | Iceland | 102.7 |
| 5 | Oman | 96.8 |

Table 10: Countries with the highest [Outcall price – Incall price]

Looking at the top 5 countries, four are strictly Islamic countries. We could hypothesize that these command a higher outcall premium due to strict religious policies. However, we must keep in mind that these countries are among those with the highest overall prices, which automatically results in a bigger absolute discrepancy between incall and outcall prices.

Migrant vs. non-migrant sex workers

Since we suspect a higher income to be a driver of sex worker migration, we compare the prices of migrant sex workers to those of non-migrant sex workers (i.e. sex workers who are active in their country of origin). We find that the median incall price for migrant sex workers ($219.1) is significantly higher than for non-migrant sex workers ($194). We also find the average age of migrant sex workers (24 years old) to be lower than that of non-migrant sex workers (25 years old). This provides evidence that points to increased economic opportunities for sex workers who decide to migrate.

In the literature, Stark (2006) observed a positive relationship between income inequality, as measured by the GINI index²¹, and the incentive to migrate. In our data set, we found no significant correlation between out- or in-degree and the GINI index. We would expect high income inequality to be a push factor, so that migration would occur from countries with higher income inequality to countries with lower income inequality. This is not the case in our data set. We could hypothesize that since the sex workers in our data set offer a high-end service, lower income inequality is not a pull factor for sex workers.

21 A measure to represent the income inequality or wealth inequality within a nation

Effects of physical attributes on price

On escort websites, sex workers can advertise their physical attributes. This can be done through pictures or by mentioning attributes on their profile, such as weight, height, and age. In our data, we have several physical features for every sex worker. As mentioned by Griffith et al. (2016), this allows a unique means of exploring short-term mating preferences. Sex workers who have traits highly valued by most men should be able to charge higher fees than less desirable competitors (Baumeister and Vohs, 2004).

We have information about height, weight, age, eye color, hair color, hair length, bust size, and ethnicity. After multiple regression analysis, we did not find any significant correlation between any of these variables and the 1hr/outcall price.

Figure 12: Correlation Plots between price and physical attributes

Figure 12 shows correlation plots between price and the age, weight, and height of the sex workers. Although significant, the correlation coefficients between price and physical features are very small. We see a minimal negative correlation between age and price. Even though this confirms trends in the literature indicating higher rates for younger sex workers (Cunningham et al., 2017), we do not attach much weight to these results: the correlation is only significant due to the large sample size. We see the strongest correlation between height and price. We hypothesize there is a smaller supply of tall female sex workers; thus, they can command a higher price premium. Height was not mentioned in any previous research on this subject. We also take into account that the correlation coefficients are influenced by inaccuracy in our price data. Most of these inaccuracies have been filtered out, but some are still present. One way to prevent this would be to only take into account prices listed in USD, but this would limit our data significantly.

When studying the relation between price and ethnicity, our findings differ somewhat from those of Cunningham and Kendall (2017) and are more in line with Nelson et al. (2019). We find that Ebony (Black) and Asian sex workers have the lowest median rates. We then see similar rates for sex workers indicating a European, Latin, or Mixed ethnicity. Providers with an Arabian or Indian ethnicity command the highest price premium in our data set.

Figure 13: Boxplots of median prices for different ethnicities in our data set

Although there is a trend, the effect size ($\eta^2 = 0.022$) is minimal, and using a One-Way ANOVA test (p-value = 0.353), we do not consider these price differences to be statistically significant.
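For reference, the effect size $\eta^2$ is the share of the total price variance that is explained by group membership, i.e. the between-group sum of squares divided by the total sum of squares. The sketch below computes it together with the one-way ANOVA F statistic on invented numbers; the function name is hypothetical and the values are unrelated to our price data.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (eta squared, F statistic) for a list of value groups."""
    values = [v for group in groups for v in group]
    grand_mean = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_within = ss_total - ss_between
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    eta_squared = ss_between / ss_total
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return eta_squared, f_statistic

# Hypothetical prices for three groups.
eta_squared, f_statistic = one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(eta_squared, f_statistic)
```

In this toy example half of the variance lies between the groups; an $\eta^2$ of 0.022, by contrast, means that only about 2% of the price variance is associated with ethnicity.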

3.3.2. Price on Meso-level

Figure 14: Overview of cities in our database ranked on median price. A bigger rectangle indicates a more expensive city.

Figure 14 shows a treemap of the median price on a city level. Dimensions of the treemap are defined based on the median price for each city, with more expensive cities displayed bigger. For increased accuracy, we only take into account cities where we have price data for at least 50 sex workers. Monte Carlo appears to be the most expensive city; not surprisingly, it is a district of Monaco, the country with the highest GDP per capita in our database. Being so small, Monte Carlo is a particular case. Nevertheless, we notice many of the other most expensive cities are capitals. As a result, we could say that if a sex worker would like to maximize her earning potential, migrating to a capital city would be the smart move. The second interesting observation is that the Dutch cities Rotterdam, Amsterdam, and Schiphol are all lower-priced cities, in the company of cities from countries with a lower GDP than the Netherlands. We could theorize this is due to prostitution being legal and well organized in the Netherlands, thus reducing the demand for, and therefore the price of, website-based escort services. Conversely, it would come as no surprise if escort services were more expensive in cities where prostitution is illegal.

3.3.3. Price on Macro-level

Figure 15: Overview of countries in our database ranked on median 1hr outcall price (bigger rectangle size = higher median outcall price) and legality of prostitution. Red: prostitution is illegal, Green: prostitution is legal, Yellow: not specifically addressed by the law, Orange: prostitution is restricted. For accurate representation, we only display countries with more than 50 observations.

Figure 15 shows a visual representation of the median outcall price and whether prostitution is legal, illegal, restricted, or unregulated. We notice the three most expensive countries are all countries where prostitution is illegal. This is no surprise: illegal goods and services are often much more expensive than legal ones (Reuter, 1985). An interesting thought is that, because economic reasons are strong pull factors in sex worker migration, laws that prohibit prostitution might actually make a country more attractive to sex workers. Of course, other variables also influence the median 1hr outcall price of a country; nevertheless, this is an interesting observation. In further support of this thought, countries like Austria and the Netherlands, where prostitution is legal and regulated, are among the cheaper countries in our data set.

3.3.4. In degree - GDP per capita

The movement of sex workers from their origin countries to new ones represents a demographic event at the international level. Although there have been studies on international migration among sex workers, much of the literature has focused on HIV & STI transmission in low- and middle-income countries in Africa, Asia, and Latin America. Limited attention has been paid to the driving factors of international migration among sex workers. In general, there are a number of reasons behind migration. It is common for migrants to leave their countries to escape civil wars and ethnic conflicts, but also poverty and the absence of a good labor market. Drivers of migration among women engaged in sex work are diverse and could include new economic opportunities, safety, career opportunities, or policies governing sex work and immigration. In our third hypothesis, we state that we expect countries with a high GDP to attract more sex workers. This hypothesis is based on the idea that migration flows from poor to rich countries. This is a result of the uneven global distribution of income and wealth, which encourages movement towards richer countries in an attempt to improve living conditions (Castles, 2013). We will test this on our data set. The economic strength of countries is often measured by their GDP. Earlier, we hypothesized that GDP is a pull factor that increases the attractiveness of individual countries. From an economic perspective, migrating to a country with a high GDP per capita makes sense, since these countries will often have a better market for high-end services like escorts.

Figure 16: Correlation between in-degree (y-axis) and GDP per capita (x-axis). The size of the circle indicates population size.

Figure 16 displays the correlation between the GDP per capita and the in-degree of the countries²² in our database. We find a significant positive correlation, with a correlation coefficient of 0.44 and an adjusted R² of 0.189. This result confirms that there is a correlation between the GDP per capita of a country and its attractiveness to migrants. An adjusted R² of 0.189 means that GDP per capita can explain 18.9% of the variance of the in-degree.

²² Monaco removed from visualization due to scaling issues.
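The statistics reported for Figure 16 can be illustrated with a short calculation. The sketch below is not the analysis code used for this thesis; it is a minimal pure-Python illustration of how a Pearson correlation coefficient, R², and adjusted R² relate to each other in a regression with a single predictor. The function names and the sample size in the example are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length with at least 3 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def adjusted_r_squared(r_squared, n_obs, n_predictors=1):
    """Adjusted R^2 penalises R^2 for the number of predictors in the model."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# With one predictor, R^2 is simply the squared correlation coefficient.
r_reported = 0.444                 # R from the regression summary (Table 15)
r_squared = r_reported ** 2        # about 0.197, the R Square in Table 15

# The number of countries is not restated here, so n_obs is a made-up value
# used only to show how the adjustment shrinks R^2.
example_adjusted = adjusted_r_squared(r_squared, n_obs=50)
```

Because the adjustment depends on the number of observations, the adjusted value is always slightly below R² for a finite sample, which matches the pattern in Table 15 (.197 versus .189).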

Other well-documented literature focusses on the ‘gravity’ model, in which migration is, among other things, directly related to the population of the origin and destination country, and inversely related to the distance between them. We tested this on our data set and found no relation between population and in-degree. Visually this can also be seen in Figure 16, where the circle size is scaled based on population. In our records, many of the countries with high in-degree are countries with small populations (Qatar, United Arab Emirates).
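To make the gravity idea concrete, the following minimal sketch computes a predicted flow that grows with the populations of the origin and destination and shrinks with the distance between them. It is an illustration only: the function name, the constant `g`, and the exponents `alpha`, `beta`, and `gamma` are hypothetical parameters that would normally be estimated from data, and no such model was fitted in this thesis.

```python
def gravity_flow(pop_origin, pop_dest, distance_km,
                 g=1.0, alpha=1.0, beta=1.0, gamma=1.0):
    """Predicted migration flow under a basic gravity model.

    flow = g * pop_origin**alpha * pop_dest**beta / distance_km**gamma

    All parameters are illustrative assumptions, not estimates.
    """
    if distance_km <= 0:
        raise ValueError("distance must be positive")
    return g * pop_origin ** alpha * pop_dest ** beta / distance_km ** gamma


# Toy numbers: doubling the distance halves the predicted flow when gamma = 1.
near = gravity_flow(10, 20, 5)
far = gravity_flow(10, 20, 10)
```

Under such a model, small countries would be expected to receive few migrants, which is the opposite of what we observe for destinations like Qatar and the United Arab Emirates.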

Figure 17 shows that when we look at the relation between the in-degree and out-degree of countries,²³ we can divide the countries into four quadrants. We have countries with a low in-degree and a high out-degree (Ukraine, Brazil, Poland), countries with a high in-degree and a low out-degree (the UAE, Malta, Singapore), countries with a high in-degree and a high out-degree (the UK, Spain, France), and countries with a low in-degree and a low out-degree. Countries in quadrant A are providers of sex workers (i.e. strong supply). Countries in quadrant D are mainly receivers (i.e. strong demand). Quadrant B contains countries with both a strong demand and a strong supply of sex workers, and countries in quadrant C are sparsely involved in the migration of sex workers.

Figure 17: Relationship between in-degree and out-degree, with quadrants labelled A (top left), B (top right), C (bottom left), and D (bottom right). Circle size is proportional to degree; color is scaled with in-degree, with dark green indicating higher in-degree. Lines indicate the average in-degree and out-degree; the grey area is the 95% confidence interval.

²³ Countries with in-degree or out-degree = 0 are not displayed.
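The quadrant assignment described for Figure 17 can be sketched as a simple rule that compares each country's in-degree and out-degree with the respective averages. This is a minimal illustration, not the code behind the figure; the function name is hypothetical and the degree values in the example are invented placeholders rather than values from our data set.

```python
from statistics import mean


def classify_quadrants(degrees):
    """Assign each country to quadrant A, B, C, or D.

    degrees maps a country name to a tuple (in_degree, out_degree).
    The thresholds are the mean in-degree and mean out-degree.
    """
    mean_in = mean(in_deg for in_deg, _ in degrees.values())
    mean_out = mean(out_deg for _, out_deg in degrees.values())

    labels = {}
    for country, (in_deg, out_deg) in degrees.items():
        high_in = in_deg > mean_in
        high_out = out_deg > mean_out
        if high_out and not high_in:
            labels[country] = "A"   # mainly a provider (strong supply)
        elif high_in and high_out:
            labels[country] = "B"   # strong supply and strong demand
        elif not high_in and not high_out:
            labels[country] = "C"   # sparsely involved
        else:
            labels[country] = "D"   # mainly a receiver (strong demand)
    return labels


# Invented degree values for four placeholder countries.
toy_degrees = {
    "Country P": (2, 30),
    "Country Q": (30, 28),
    "Country R": (3, 4),
    "Country S": (28, 2),
}
toy_labels = classify_quadrants(toy_degrees)
```

Using the mean as the cut-off mirrors the lines drawn in Figure 17; a median would be a reasonable alternative when a few hubs dominate the degree distribution.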

4. DISCUSSION, LIMITATIONS, AND FURTHER RESEARCH

4.1. CONTEXTUALIZE FINDINGS WITH LITERATURE

In this thesis, we provided an innovative approach to studying global human migration. We combined ‘big’ data collection with a number of methods and visualization strategies borrowed from network analysis (mainly SNA) to map the international migration of sex workers. Although our research started out as primarily explorative, we were able to define several hypotheses and successfully test them on our data. We investigated the possibilities of using publicly available data to get a better understanding of a subject where obtaining data has historically been very difficult.

First, we took a closer look at the demographics and features of our data. We found that the average female sex worker in our data set is 24 years old and charges $219 for one hour of her services (outcall). Consistent with existing literature (DeAngelo et al., 2018), we found outcall rates to be higher than incall rates. Inconsistent with the literature, however, we found no significant relationship between a sex worker's rate and physical attributes like age, weight, height, or ethnicity. We thereby reject our first hypothesis, which expected certain physical attributes of sex workers to be linked to their rates. Since this is a subject with little data availability, we consider this a valuable insight into sex worker statistics. We also found that migrant sex workers are younger and charge higher rates (i.e. earn a higher income) than non-migrant sex workers, suggesting that short-term mobility and migration increase social and economic opportunities for sex workers.

When analyzing network statistics, we noticed the dominant role of certain countries and identified them as hubs. We used the Louvain method to identify four communities within our data. These communities provide evidence that sex worker migration follows patterns similar to other human migration and is influenced by cultural ties. This supports our second hypothesis, in which we expected migration ties to be linked to linguistic factors and a common history. When comparing our international migration network to a random network, we concluded that it does not display ‘small-world properties’. In our socio-economic analysis, we found that there is a significant relationship between a country's in-degree and its GDP per capita. This relationship confirms our third hypothesis, in which we stated that countries with a high pcGDP would attract more sex workers than countries with a low pcGDP.
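The Louvain method searches for a partition of the network that maximizes modularity. The sketch below does not implement Louvain itself; it only evaluates the modularity of a given partition, which is the quantity the method optimizes. As a simplifying assumption, edges are treated as undirected and weighted, whereas the migration network in this thesis is directed. The function name and the toy graph are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected weighted graph.

    edges: iterable of (u, v, weight) with each undirected edge listed once.
    community_of: dict mapping every node to its community label.
    Q = sum over communities c of [ L_c / m - (d_c / (2 m))**2 ],
    where m is the total edge weight, L_c the weight of edges inside c,
    and d_c the summed weighted degree of the nodes in c.
    """
    total_weight = 0.0
    intra_weight = defaultdict(float)
    degree_sum = defaultdict(float)
    for u, v, w in edges:
        total_weight += w
        degree_sum[community_of[u]] += w
        degree_sum[community_of[v]] += w
        if community_of[u] == community_of[v]:
            intra_weight[community_of[u]] += w
    if total_weight == 0:
        raise ValueError("graph has no edge weight")
    return sum(
        intra_weight[c] / total_weight - (degree_sum[c] / (2 * total_weight)) ** 2
        for c in degree_sum
    )


# Toy graph: two triangles joined by a single bridge edge.
toy_edges = [
    ("a", "b", 1), ("b", "c", 1), ("a", "c", 1),
    ("d", "e", 1), ("e", "f", 1), ("d", "f", 1),
    ("c", "d", 1),
]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

q_two = modularity(toy_edges, two_groups)   # 5/14, about 0.357
q_one = modularity(toy_edges, one_group)    # 0.0
```

Splitting the toy graph into its two triangles yields a higher modularity than putting all nodes in one community, which is the kind of improvement Louvain looks for when it merges or moves nodes.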

4.2. LIMITATIONS OF DATA, METHODS, AND ANALYSIS

When interpreting the results of this analysis, it is important to consider the limitations of our data. Two key limitations impact the strength of our data set and the conclusions we draw from it. First, when collecting data from a single website, there will be selection bias. Websites might be more popular in certain countries or attract a specific type of sex worker, which could lead to our data not being representative of escorts worldwide. On top of that, not all profiles registered on the site will be real profiles. There will be fake profiles and old profiles that are no longer active. We have no way of filtering out these profiles ourselves; we rely on the website to maintain quality and to remove fake profiles frequently. Self-reported attributes like weight and height should be taken with a grain of salt, since providers have some incentive to lie about their physical appearance. To reduce selection bias, one could build a global database from several escort directory websites. Such a database would decrease the selection bias, increase the number of data points, and overall allow for a more accurate analysis.

Second, the data collection is limited to one point in time. It is often useful to look at networks (and specifically network measures) over a period of time, since doing so provides insight into how processes of globalization or other events influence our migration network. Having time-series data available would allow us to analyze how the IMN changes over time. An interesting approach would be to repeat the scraping of the website over a sizeable period (e.g. five years with biyearly scrapes). This approach could provide interesting information about the evolution of migration patterns among sex workers.

4.3. WAYS FORWARD

In our model, we did not take into account the distance between countries. In the migration literature, one of the most widely used methods to take distance into account is the gravity model. Gravity models hypothesize that the number of migrants moving between two countries is directly proportional to their economic size (sometimes population size is used as a proxy) and inversely proportional to the distance between them. Recent studies have, on average, confirmed this observation. Gravity models can be improved when supplemented with other factors, such as colony-colonizer relationships, a shared common language, or the effects of political borders and a common currency (Bergstrand and Egger, 2007). We did not take gravity models into account in this thesis to limit complexity. Still, the application of these models could be an interesting opportunity for further research. Gravity models have been successfully used to describe the International Trade Network (ITN), traffic flow, and migration. Furthermore, it would be interesting to see if there are any similarities between the international flow of goods and the international flow of sex workers.

5. CONCLUSION

From the literature review it was clear that the combination of ‘big data’ and network analysis is an emerging field with numerous opportunities. Globally, limited information is available about sex workers who engage in migration. We observed an opportunity to creatively combine ‘big’ data and network analysis to increase our understanding of an unknown and unappreciated migration pattern.

Our analysis suggests that within the set of destinations available, sex workers do not choose at random. Drivers of mobility and migration among women engaged in sex work are diverse and could include new economic opportunities, safety, and policies governing sex work. There is a significant relationship between a country's pcGDP and its in-degree, suggesting that sex workers prefer to migrate to richer countries. We identified major suppliers of sex workers (Russia, China, Ukraine, Romania) and major receivers of sex workers (the United Kingdom, the United Arab Emirates, Italy, Spain). We consider these valuable insights that help to map the structure of sex worker migration. By studying centrality measures, we identified several hubs: countries like the United Arab Emirates, the United Kingdom, and Spain play an important role in facilitating the migration flows of sex workers between countries. Through the use of community structure, we have shown that geographical, cultural, and linguistic influences at least partially explain the sex worker migration patterns in our data set. Interestingly, some of the most popular immigration destinations are countries where prostitution is illegal, implying that sex workers do not seem to base their migration decision on policies regarding sex work. It might even be that the criminalization of sex work has the opposite effect, increasing the income of sex workers and thus making the country more attractive to migrant sex workers. On the individual level, we find that migrant sex workers earn a higher income than non-migrant sex workers, once again indicating that economic gain is a major driver of migration.

Our analysis is a first attempt to use web-scraped data to build a migration network specifically for sex workers. The obvious question arising from this empirical study is whether the results can be generalized to the global migrant sex worker population. Due to the lack of official statistics, this question is particularly hard to answer. We think we have shown our methodology to be successful. Since we have neither used network analysis to its full potential nor explored all the options our sex worker data set has to offer, we see opportunities for further research. We are happy with the results; what started as exploratory research has in fact allowed us to test several hypotheses and make valuable observations about the migration of sex workers.

6. BIBLIOGRAPHY

Aleskerov, F.T., Meshcheryakova, N., Rezyapova, A., & Shvydun, S., (2016). Network Analysis of International Migration. Available at SSRN: https://ssrn.com/abstract=3196966 or http://dx.doi.org/10.2139/ssrn.3196966

Amoore, L. (Ed.), (2005). The Global Resistance Reader. Psychology Press. (p. 291)

Baker, C., (2013). Moving Beyond “Slaves, Sinners, and Saviors”: An Intersectional Feminist Analysis of US Sex-Trafficking Discourses, Law and Policy. Study of women and gender: Faculty Publications, Smith College, Northampton, MA.

Barabási, A.L., Albert, R., Jeong, H., (2000). Scale-free characteristics of random networks: The topology of the world-wide web. Physica A: Statistical Mechanics and its Applications 281(1), 69–77. doi: 10.1016/S0378-4371(00)00018-2

Baral, S., Beyrer, C., Muessig, K., Poteat, T., Wirtz, A.L., Decker, M.R., Sherman, S.G., Kerrigan, D., (2012). Burden of HIV among female sex workers in low-income and middle-income countries: A systematic review and meta-analysis. The Lancet Infectious Diseases 12(7), 538–549. doi: 10.1016/S1473-3099(12)70066-X

Bartram, D., Poros, M., Monforte, P., (2014). Key Concepts in Migration. Sage Publications Ltd.

Baumeister, R.F., & Vohs, K.D., (2004). Sexual economics: Sex as female resource for social exchange in heterosexual interactions. Personality and Social Psychology Review 8(4), 339-363. doi:10.1207/s15327957pspr0804_2

Bergstrand, J.H., & Egger, P., (2007). A knowledge-and-physical-capital model of international trade flows, foreign direct investment, and multinational enterprises. Journal of International Economics 73, 278–308. doi: 10.1016/j.jinteco.2007.03.004

Blondel, V.D., Guillaume, J.-L., Lambiotte, R., Lefebvre, E., (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment 2008(10), P10008. doi: 10.1088/1742-5468/2008/10/P10008

Blumenstock, J., (2012). Inferring patterns of internal migration from mobile phone call records: evidence from Rwanda. Information Technology for Development 18(2), 107–125. doi: 10.1080/02681102.2011.643209

Blumenstock, J., Chi, G., Tan, X., (2019). Migration and the Value of Social Networks. CEPR Discussion Papers, 13611.

Borgatti, S.P., (2005). Centrality and network flow. Social Networks 27(1), 55–71. doi: 10.1016/j.socnet.2004.11.008

Brussa L, & Munk V., (2010). Vulnerabilities and rights of migrant sex workers in Europe. HIV AIDS Policy & Law Review 15(1), 61-62.

Gurak, D.T., & Caces, F., (1992). Migration networks and the shaping of migration systems, in: International Migration Systems: A Global Approach, pp. 150–176.

Cappart, Q., & Thonet, A., (2015). The World Migration Network: rankings, groups and gravity models. Leuven, KU Leuven. Published in: IEEE EUROCON 2015 - International Conference on Computer as a Tool (EUROCON).

Castle, T., & Lee, J., (2008). Ordering sex in cyberspace: a content analysis of escort websites. International Journal of Cultural Studies 11(1), 107–121. doi: 10.1177/1367877907086395

Castles, S., (2013). The Forces Driving Global Migration. Journal of Intercultural Studies 34 (2), 122–140. doi: 10.1080/07256868.2013.781916

Champion, T., (1999). Book Review EXPLORING CONTEMPORARY MIGRATION by P. Boyle, K. Halfacree and V. Robinson. ISBN 0 582 25161 3. Longman Harlow, 1998. International Journal of Population Geography 5(2), 152–153. doi: 10.1002/(SICI)1099-1220(199903/04)5:2<152::AID-IJPG120>3.0.CO;2-W

Charyyev, B., & Gunes, M.H., (2019). Complex network of United States migration. Computational Social Networks 6(1). doi: 10.1186/s40649-019-0061-6

Cunningham, S., & Kendall, T.D., (2011). Prostitution 2.0: The changing face of sex work. Journal of Urban Economics 69(3), 273–287. doi: 10.1016/j.jue.2010.12.001

Cunningham, S., & Kendall, T.D., (2017). Prostitution, hours, job amenities and education. Review of Economics of the Household 15(4), 1055–1080. doi: 10.1007/s11150-017-9360-6

Danchev, V., (2015). Spatial Network Structures of World Migration: Heterogeneity of Global and Local Connectivity. Thesis, University of Oxford, UK.

Danchev, V., & Porter, M.A., (2018). Neither global nor local: Heterogeneous connectivity in spatial network structures of world migration. Social Networks 53, 4–19. doi: 10.1016/j.socnet.2017.06.003

De Bel-Air, F., (2015). Demography, Migration, and the Labour Market in the UAE. Gulf Labour Market and Migration Programme (GLMM). Explanatory Note.

Docquier, F., Marfouk, A., Özden, C., & Parsons, C., (2011). Geographic, Gender and Skill Structure of International Migration. Munich Personal RePEc Archive (MPRA).

Edlund, L., & Korn, E., (2002). A theory of prostitution. Journal of Political Economy 110 (1), 181–214. doi: 10.1086/324390

Fagiolo, G., & Mastrorillo, M., (2013). International migration network: Topology and modeling. Physical Review E - Statistical, Nonlinear, and Soft Matter Physics 88(1). doi: 10.1103/PhysRevE.88.012812

Fincher, L.H., (2016). Leftover Women: The Resurgence of Gender Inequality in China. Zed Books Ltd., pp. 1–10.

Fosdick, B.K., Larremore, D.B., Nishimura, J., Ugander, J., (2016). Configuring Random Graph Models with Fixed Degree Sequences. SIAM Review, 60(2), 315–355. doi: 10.1137/16M1087175

Freeman, L.C., (2004). The Development of Social Network Analysis. A study in the sociology of science. Empirical Press.

DeAngelo, G., Shapiro, J.N., Borowitz, J., Cafarella, M., Re, C., Shiffmann, G., (2017). Rational pricing in prostitution: Evidence from online sex ads. Working Paper.

Griffith, J.D., Capiola, A., Balotti, B., Hart, C.L., Turner, R., (2016). Online Female Escort Advertisements. Evolutionary Psychology 14(2), 1–9. doi: 10.1177/1474704916651270

Haas, H., Czaika, M., Flahaux, M., Mahendra, E., Natter, K., Vezzoli, S., Villares‐Varela, M., (2019). International Migration: Trends, Determinants, and Policy Effects. Population and Development Review 45(4), 885–922. doi: 10.1111/padr.12291

Hong Fincher, L., (2014). Leftover Women: The Resurgence of Gender Inequality in China. London: Zed Books Ltd.

IOM Press Release (20 October 2015). IOM Responds to Growing Human Trafficking Threat in Ukraine, via https://www.iom.int/news/iom-responds-growing-human-trafficking-threat-ukraine

Jones, A., (2015). Sex Work in a Digital Era. Sociology Compass 9(7), 558–570. doi: 10.1111/soc4.12282

Lageman, T., (20 January 2016). Dubai in United Arab Emirates a centre of human trafficking and prostitution. The Sydney Morning Herald, via https://www.smh.com.au/world/dubai-in-united-arab-emirates-an-epicentre-of-human-trafficking-and-prostitution-20160115-gm6mdl.html

Lin, J., (2012). Network analysis of China’s aviation system, statistical and spatial structure. Journal of Transport Geography 22, 109–117. doi: 10.1016/j.jtrangeo.2011.12.002

Lowman, J., & Atchison, C., (2006). Men who buy sex: A survey in the Greater Vancouver regional district. Canadian Review of Sociology and Anthropology 43(3), 281–296. doi: 10.1111/j.1755-618x.2006.tb02225.x

Maier, G. & Vyborny, M., (2005). Internal migration between US-states - A Social Network Analysis. SRE - Discussion Papers, 2005/04. Institut für Regional- und Umweltwirtschaft, WU Vienna University of Economics and Business, Vienna.

Massey, D.S., (2015). A missing element in Migration theories. Migration Letters 12(3), 279–299. doi: 10.33182/ml.v12i3.280

Nelson, A.J., Korgan, K.H., Izzo, A.M., Bessen, S.Y., (2019). Client desires and the Price of Seduction: Exploring the Relationship Between Independent Escorts’ Marketing and Rates. The Journal of Sex Research 57(5), 664–680. doi: 10.1080/00224499.2019.1606885

Newman, M., (2010). Networks: An Introduction. Oxford University Press. doi: 10.1093/acprof:oso/9780199206650.001.0001

Newman, M.E.J., (2006). Modularity and community structure in networks. Proceedings of the National Academy of Sciences of the United States of America (PNAS) 103(23), 8577–8582. doi: 10.1073/pnas.0601602103

Opsahl, T., Agneessens, F., Skvoretz, J., (2010). Node centrality in weighted networks: Generalizing degree and shortest paths. Social Networks 32(3), 245–251. doi: 10.1016/j.socnet.2010.03.006

Popielarz, P.A., & Cserpes, T., (2018). Comparing the discussion networks and voluntary association memberships of immigrants and non-immigrants in U.S. suburban gateways. Social Networks 53, 42–56. doi: 10.1016/j.socnet.2017.03.004

Ravenstein, E.G. (1889). The Laws of Migration. Journal of the Royal Statistical Society 52 (2), 241–305. doi: 10.2307/2979333

Reuter, P., (1985). The organization of illegal markets: An economic analysis (Vol. 84, No. 9). US Department of Justice, National Institute of Justice.

Rocha, L.E.C., Liljeros, F., Holme, P., (2010). Information dynamics shape the sexual networks of Internet-mediated prostitution. Proceedings of the National Academy of Sciences of the United States of America (PNAS) 107(13), 5706–5711. doi: 10.1073/pnas.0914080107

Sanders, T., Scoular, J., Campbell, R., Pitcher, J., Cunningham, S., (2018). Internet Sex Work: Beyond the Gaze. Hampshire: Palgrave MacMillan.

Spyratos, S., Vespe, M., Natale, F., Weber, I., Zagheni, E., Rango, M., (2018). Migration Data using Social Media. A European Perspective. JRC Technical Reports. Luxembourg: Publications Office of the European Union. doi: 10.2760/964282

Spyratos, S., Vespe, M., Natale, F., Weber, I., Zagheni, E., Rango, M., (2019). Quantifying international human mobility patterns using Facebook Network data. PLoS ONE 14 (10). doi: 10.1371/journal.pone.0224134

Stark, O., (2006). Inequality and migration: A behavioral link. Economics Letters 91(1), 146–152.

Strathdee, S.A., Philbin, M.M., Semple, S.J., Pu, M., Orozovich, P., Martinez, G., … Patterson, T.L., (2008). Correlates of injection drug use among female sex workers in two Mexico-U.S. border cities. Drug and Alcohol Dependence 92(1-3), 132–140. doi: 10.1016/j.drugalcdep.2007.07.001

Page, L., Brin, S., Motwani, R., Winograd, T., (1999). The PageRank citation ranking: Bringing order to the web. Stanford InfoLab.

Vertovec, S., (2010). Towards post-multiculturalism? Changing communities, conditions and contexts of diversity. International Social Science Journal 61(199), 83–95. doi: 10.1111/j.1468-2451.2010.01749.x

Walby, K., (2012). Touching Encounters: Sex, Work & Male-for-male Internet Escorting. University of Chicago Press.

Wasserman, S., & Faust, K., (1994). Network Data, Measurement and Collection, in: Social Network Analysis: Methods and Applications (Structural Analysis in the Social Sciences), pp. 28–66. Cambridge University Press. doi: 10.1017/CBO9780511815478

Wolkowitz, C., Cohen, R.L., Sanders, T., & Hardy, K., (Eds.) (2015). Body/Sex/Work: Intimate, Embodied and Sexualised Labour. Hampshire: Palgrave MacMillan.

Zagheni, E., Garimella, V.R.K., Weber, I., State, B., (2014). Inferring international and internal migration patterns from twitter data, in: WWW 2014 Companion - Proceedings of the 23rd International Conference on World Wide Web. pp. 439–444. Association for Computing Machinery, Inc. doi: 10.1145/2567948.2576930

7. APPENDIX

| Origin | Destination | # Migrants | Origin | Destination | # Migrants |
|---|---|---|---|---|---|
| China | Malaysia | 2905 | Moldova | Austria | 147 |
| Russia | Italy | 1340 | India | Malaysia | 146 |
| India | UAE | 923 | Poland | UAE | 146 |
| Russia | Israel | 836 | Venezuela | Spain | 145 |
| Russia | France | 780 | Russia | China | 134 |
| Romania | UK | 766 | Albania | Netherlands | 133 |
| Ukraine | Turkey | 707 | United States | UK | 131 |
| Russia | Turkey | 609 | France | UK | 128 |
| Russia | UK | 582 | Ukraine | Israel | 120 |
| Italy | Netherlands | 439 | China | UK | 119 |
| Vietnam | Malaysia | 434 | Colombia | Mexico | 118 |
| Russia | Spain | 426 | Estonia | Turkey | 113 |
| Italy | UK | 409 | Japan | UAE | 111 |
| Spain | UK | 379 | Romania | Germany | 110 |
| Brazil | Spain | 371 | Lithuania | UK | 108 |
| Hungary | UK | 367 | Latvia | UK | 105 |
| Czech Rep. | UK | 325 | Sweden | France | 104 |
| Thailand | Malaysia | 324 | Lebanon | Turkey | 103 |
| Russia | UAE | 308 | Ukraine | Cyprus | 102 |
| Russia | Greece | 259 | Romania | UAE | 96 |
| Brazil | Portugal | 254 | UK | Germany | 94 |
| Colombia | Spain | 254 | UK | South Africa | 90 |
| Czech Republic | Netherlands | 254 | United States | Canada | 90 |
| Russia | Cyprus | 249 | Latvia | Germany | 89 |
| China | Taiwan | 242 | Spain | Netherlands | 89 |
| Ukraine | UAE | 219 | Ukraine | Italy | 87 |
| Japan | UK | 213 | United States | Indonesia | 87 |
| Poland | UK | 213 | Moldova | Greece | 86 |
| Estonia | UK | 205 | Poland | Germany | 85 |
| Brazil | UK | 193 | Indonesia | Malaysia | 83 |
| Czech Republic | France | 191 | Poland | France | 81 |
| Ukraine | France | 173 | Bulgaria | Netherlands | 80 |
| Hungary | Netherlands | 169 | Brazil | UAE | 79 |
| Bulgaria | UK | 164 | UK | Netherlands | 79 |
| Moldova | UK | 163 | Germany | UK | 76 |
| Romania | Austria | 157 | Romania | Netherlands | 76 |
| Russia | Germany | 157 | UK | Netherlands | 74 |
| Austria | UK | 149 | Croatia | UK | 72 |
| Moldova | Austria | 147 | Italy | Germany | 71 |
| India | Malaysia | 146 | Spain | Germany | 71 |
| Poland | UAE | 146 | Canada | UK | 70 |

Table 11: Expanded list of migration flows in our network

| Country | Betweenness | PageRank | Closeness | Betweenness rank | PageRank rank | Closeness rank |
|---|---|---|---|---|---|---|
| UK | 0.088437486 | 0.027972021 | 0.867768595 | 1 | 2 | 1 |
| Spain | 0.049815076 | 0.023997862 | 0.807692308 | 2 | 3 | 2 |
| Russia | 0.032790734 | 0.01368734 | 0.766423358 | 3 | 26 | 4 |
| France | 0.023428031 | 0.019320023 | 0.766423358 | 4 | 12 | 4 |
| Germany | 0.017032976 | 0.022657636 | 0.755395683 | 9 | 4 | 6 |
| UAE | 0 | 0.029759026 | 0.755395683 | 85 | 1 | 6 |
| Italy | 0.022031503 | 0.019561527 | 0.744680851 | 5 | 10 | 7 |
| Turkey | 0.011469549 | 0.022096946 | 0.729166667 | 13 | 5 | 9 |
| USA | 0 | 0.00318259 | 0.729166667 | 85 | 100 | 9 |
| Canada | 0.022030405 | 0.016361686 | 0.719178082 | 6 | 16 | 10 |
| China | 0.011833732 | 0.020279996 | 0.704697987 | 11 | 9 | 12 |
| Belgium | 0.011546286 | 0.019177821 | 0.704697987 | 12 | 14 | 12 |
| Malta | 0.00306119 | 0.021947041 | 0.7 | 30 | 7 | 14 |
| Oman | 0 | 0.022033856 | 0.7 | 85 | 6 | 14 |
| Austria | 0.009225926 | 0.014795038 | 0.695364238 | 16 | 19 | 15 |
| Netherlands | 0.009107294 | 0.019195657 | 0.690789474 | 17 | 13 | 17 |
| Singapore | 0.007463547 | 0.019466833 | 0.690789474 | 19 | 11 | 17 |
| Brazil | 0.017685572 | 0.007143555 | 0.68627451 | 7 | 56 | 20 |
| Greece | 0.017418846 | 0.014262464 | 0.68627451 | 8 | 23 | 20 |
| Malaysia | 0.0052330 | 0.0162642 | 0.644172 | 240 | 170 | 32 |

Table 12: Expanded table of centrality measures

Figure 16: Visualization of emigration network (bigger version)

One-way ANOVA price/ethnicity

Table 13: One-way ANOVA price/ethnicity

Pearson correlation price/GDP_capita

Table 14: Pearson correlation price/pcGDP

Linear Regression in-degree/GDP

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .444a | .197 | .189 | 17.249 |

a. Predictors: (Constant), GDPpercapitacurrentUS$

Table 15: Linear regression in-degree/GDP