Measuring Algorithms in Online Marketplaces
Total Page:16
File Type:pdf, Size:1020Kb
Measuring Algorithms in Online Marketplaces A Dissertation Presented by Le Chen to The College of Computer and Information Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science Northeastern University Boston, Massachusetts May 2017 Dedicated to my courageous and lovely wife. i Contents List of Figures v List of Tables viii Acknowledgments x Abstract of the Dissertation xi 1 Introduction 1 1.1 Uber . 2 1.2 Amazon Marketplace . 4 1.3 Hiring Sites . 7 1.4 Outline . 9 2 Related Work 10 2.1 Online Marketplaces . 10 2.1.1 Price Competition . 10 2.1.2 Fraud and Privacy . 11 2.2 Hiring Discrimination . 11 2.2.1 The Importance of Rank . 12 2.3 Auditing Algorithms . 13 2.3.1 Filter Bubbles . 13 2.3.2 Price Discrimination . 14 2.3.3 Gender and Racial Discrimination . 14 2.3.4 Privacy and Transparency . 15 3 Uber’s Surge Pricing Algorithm 16 3.1 Background . 16 3.2 Methodology . 18 3.2.1 Selecting Locations . 19 3.2.2 The Uber API . 19 3.2.3 Collecting Data from the Uber App . 19 3.2.4 Calibration . 22 3.2.5 Validation . 24 ii 3.3 Analysis . 25 3.3.1 Data Collection and Cleaning . 26 3.3.2 Dynamics Over Time . 27 3.3.3 Spatial Dynamics and EWT . 29 3.4 Surge Pricing . 30 3.4.1 The Cost of Surges . 31 3.4.2 Surge Duration and Updates . 32 3.4.3 Surge Areas . 33 3.4.4 Algorithm Features and Forecasting . 34 3.4.5 Impact of Surge on Supply and Demand . 36 3.5 Avoiding Surge Pricing . 38 3.6 Summary . 40 4 Algorithmic Pricing on Amazon Marketplace 42 4.1 Background . 42 4.1.1 Amazon Marketplace . 43 4.1.2 The Buy Box . 45 4.1.3 Amazon Marketplace Web Service . 46 4.2 Data Collection . 46 4.2.1 Obtaining Sellers and Prices . 46 4.2.2 Crawling Methodology . 47 4.2.3 Selecting Products . 49 4.3 The Buy Box . 51 4.3.1 Seller Ranking . 51 4.3.2 Behavior of the Buy Box Algorithm . 51 4.3.3 Algorithm Features and Weights . 52 4.4 Dynamic Pricing Detection . 56 4.4.1 Methodology . 56 4.4.2 Algorithmic Pricing Sellers . 58 4.4.3 Price Matching Examples . 60 4.5 Analysis . 60 4.5.1 Business Practices . 61 4.5.2 Competition . 64 4.5.3 Amazon as a Seller . 66 4.6 Simulations . 67 4.6.1 Methodology . 68 4.6.2 Simulation Results . 70 4.6.3 Breaking Into the Buy Box . 74 4.7 Summary . 76 5 Impact of Gender on Hiring Sites 77 5.1 Background . 77 5.1.1 Resume Search Engines . 79 5.1.2 Hiring Websites . 80 5.2 Data Collection . 83 iii 5.3 Analysis . 86 5.3.1 Ranking Bias . 86 5.3.2 Fairness . 91 5.3.3 Uncovering Hidden Features . 101 5.4 Alternative Ranking Approaches . 102 5.4.1 Algorithms . 103 5.4.2 Evaluation . 104 5.5 Summary . 110 6 Conclusion 112 6.1 Summary and Impact . 112 6.2 Ethics for Algorithm Audits . 114 6.3 A Guide for Future Audits . 116 6.3.1 Incentive Misalignment . 116 6.3.2 Biases in Human Data . 117 Bibliography 119 iv List of Figures 3.1 Screenshot of the Uber Partner app. 18 3.2 Visibility radius of clients in two cities. 23 3.3 Locations of my Uber and taxi measurement points in SF and Manhattan. 24 3.4 Measured and ground-truth supply (left) and demand (right) of taxis in midtown Manhattan. 25 3.5 Lifespan of UberX cars in Manhattan and SF before data cleaning. 27 3.6 Distance of short-lived insiders to the boundary. 27 3.7 Lifespan of different types of Uber cars after data cleaning. 27 3.8 Supply, demand, surge multiplier, and EWT over time for midtown Manhattan and downtown SF. Surge multiplier and estimated wait time (EWT) are only shown for UberX. Diurnal patterns are observed in supply and demand, but the characteristics of the surge multiplier show less predictability. 28 3.9 Heatmaps for UberX cars in Manhattan. 30 3.10 Heatmaps for UberX cars in SF. 30 3.11 Distribution of EWTs for UberXs. 31 3.12 Distribution of surge multipliers for UberXs. 31 3.13 Duration of surges for UberXs. 31 3.14 Examples of surge over time as seen from (a) the API and (b) the Client app. Although surge is recalculated every 5 minutes in both cases, clients also observe surge jitters for 20-30 seconds (red arrows). 31 3.15 Moment when surge changes during each interval. 33 3.16 Distribution of surge multipliers seen during times of jitter. 33 3.17 Fraction of clients that observe jitter at the same time. 33 3.18 Surge areas in in Manhattan. 35 3.19 Surge areas in SF. 35 3.20 (Supply - Demand) vs. Surge for UberX. 35 3.21 EWT vs. Surge for UberX. 35 3.22 Transition probabilities of UberXs when all areas have equal surge, and when one area is surging. 37 3.23 Fraction of the time when passengers would receive lower surge prices on UberXs by walking to an adjacent area. ..