Hassanpour, Bigazzi, and MacKenzie

What Can Publicly-Available API Data Tell Us about Supply and Demand for New Mobility Services?

Amir Hassanpour
Department of Civil Engineering
University of , , , V6T 1Z4
Email: [email protected]

Alexander Bigazzi
Department of Civil Engineering and School of Community and Regional Planning
University of British Columbia, Vancouver, Canada, V6T 1Z4
Email: [email protected]
ORCiD: 0000-0003-2253-2991

Don MacKenzie
Department of Civil and Environmental Engineering
University of Washington
201 More Hall, Box 352700
Seattle, WA 98195-2700
Tel: 206-685-7198; Email: [email protected]

Forthcoming in the Transportation Research Record, and presented at the 2020 TRB Annual Meeting


Abstract

Better understanding of the impacts of New Mobility Services (NMS) is needed to inform evidence-based policy, but cities and researchers are hindered by a lack of access to detailed system data. Application Programming Interface (API) services can be a medium for real-time data sharing and access and have been used for data collection in the past, but the literature lacks a systematic examination of the potential value of publicly-available API data for extracting policy-relevant information, specifically supply and demand, on NMS. The objectives of this study are 1) to catalogue all the publicly available API data streams for NMS in the three major cities of the Cascadia Corridor (Vancouver, British Columbia; Seattle, Washington; and Portland, Oregon), 2) to create, apply, and share web data extraction tools (Python scripts) for each API, and 3) to assess the usefulness of the extracted data in quantifying supply and demand for each service. Results reveal some measures of supply and demand that can be extracted from API data and are useful in future analysis (mostly for bikeshare and carshare services, not ridesourcing). However, important information on supply and demand of most of the NMS in these cities cannot be obtained through API data extraction. Stronger open data policies for mobility services are therefore needed if policymakers want to obtain useful and independent insights on the usage of these services.

Keywords: carshare, bikeshare, ridesource, API


1 Introduction

In recent years, technological advances have led to the rapid expansion and proliferation of New Mobility Services (NMS) in transportation, such as bikesharing, carsharing, and ridesourcing. In the United States, 35 million trips were taken through bikesharing companies in 2017, which is 25% more than the previous year. Dockless bikesharing emerged in its modern form in early 2017 and generated a substantial increase in bikesharing. The total U.S. fleet size expanded from 42,500 at the end of 2016 to 100,000 at the end of 2017 (NACTO Bike Share Initiative, 2017). Carsharing companies were operating in 46 countries as of October 2016, a 31% increase from 2014 (Shaheen, Cohen, & Jaffee, 2018). Car2go is one of the largest carsharing companies in North America, and has expanded rapidly since beginning in 2010 (Car2go, 2018). The two dominant ridesourcing companies in North America are Uber and Lyft (Wirtz, Lovelock, Wirtz, & Tang, 2016), operating since 2009 and 2012 respectively. In 2016, Uber completed its second billion rides in just six months, after taking six years to provide the first billion (Somerville, 2016). Elsewhere in the world, services such as DiDi in China, Ola in India, and Kater in British Columbia are also growing rapidly. Projections indicate continued steady growth in the ridesourcing market (Costello, 2018).

As these services establish themselves and grow in many cities, their impacts, whether negative or positive, on cities and on people with different socio-demographics are increasingly important. Numerous studies have examined environmental, social, and mobility impacts of NMS, but more research is needed to generate the level of understanding that can inform evidence-based policy (Gehrke, Felix, & Reardon, 2019; Litman, 2017; Shaheen & Chan, 2016; Zhao, Deng, & Song, 2014). Public agencies and municipalities are struggling to address new issues raised by NMS with effective policies. Some cities are cautious about allowing new modes of transportation to operate, due to legitimate concerns about safety, congestion, and other impacts. For example, British Columbia excluded ridesourcing services from operating for almost the first decade of their prominence (Lindsay, 2019; Ma et al., 2018).

Supply and demand are fundamental characteristics of transportation services, and essential measures for understanding the impacts of new services and their relationships with internal and external components of the transportation system. Examination of supply and demand relationships provides information about resource allocation efficiency and equity, and can therefore enable policies that promote broad public benefits from emerging technology. Quantifying supply and demand to inform relevant policy decisions requires access to detailed, disaggregate trip and service availability data. However, NMS are often operated by private companies, which may have disincentives to share data due to concerns about customer and employee privacy and business intelligence. As evidence, in 2019 large NMS providers such as Uber, Lyft, and Bird supported a bill that would prevent cities in California from collecting granular data from NMS providers (Bliss, 2017). Similar barriers to third-party analysis have existed in the freight sector for decades, which is a recognized problem for travel modeling and transportation planning (Czerniak, Lahsene, & Chatterjee, 2000; Jiang, Johnson, & Calzada, 1999; Southworth, 2018).

Where access to detailed, disaggregate system data is restricted, an alternative approach used in recent years is extracting data from Application Programming Interface (API) services (Hughes & MacKenzie, 2016). Currently in the Cascadia Corridor, 7 out of 13 bikesharing, carsharing, and ridesourcing services have APIs that allow third-party access to limited information on their fleets and availability. APIs can be a medium for real-time data sharing, with appropriate data specifications and standards. Wolff, Possnig, & Petersen (2019) describe and evaluate five data-sharing framework scenarios for NMS in Vancouver, British Columbia, although no one approach has been widely adopted in practice. The Los Angeles Department of Transportation created data specifications for mobility APIs, to ensure that municipalities can evaluate and manage service providers (Los Angeles Local Government, 2018). In a slightly different approach, Austin, Texas, created the Austin Dock-less API to provide data reporting tools for NMS (Austin Transportation, 2019). While APIs have been used for data extraction in a few ad-hoc studies and cities, the literature lacks a systematic examination of the potential of publicly-available API data for improving understanding of NMS.


Recognizing the limited availability of detailed NMS system data, this work seeks to answer the question: “Can we derive policy-relevant information from publicly-available API data?” To work toward the goal of helping cities understand the impacts of NMS, including interacting demand for various modes of transportation, we must first examine what measures of supply and demand can be derived from API data. Thus, the objectives of this study are 1) to catalogue all the publicly available API data streams in the three major cities of the “Cascadia Corridor” (Vancouver, British Columbia; Seattle, Washington; and Portland, Oregon), 2) to create, apply, and share web data extraction tools for each API, and 3) to assess the usefulness of the extracted data in quantifying supply and demand for each service.

2 Method

The methodology consists of three steps: data extraction, creation of candidate measures of supply and demand, and evaluation of the candidate measures. To extract data, Python scripts were written for each NMS provider and run continuously for six months, saving the extracted data into a MySQL database. The extracted API data were then explored to create candidate measures of supply and demand to quantify serviced trips and service availability. Finally, the candidate measures were evaluated using three criteria based on conformity to microeconomic theory and travel behavior data.

2.1 Cataloguing the API

The first step was to compile a list of all bikesharing, carsharing, and ridesourcing companies that operate in Vancouver, British Columbia; Seattle, Washington; and Portland, Oregon. Each NMS company’s website was then researched for API services. If no information on API services was found on the website, the company was contacted through publicly-available email addresses to enquire about the availability of an API. A single account was created by the research team to obtain credentials (keys) for API access. Keys were requested by submitting a short description of the study. Past research has used “a couple of hundred” API keys to extract Uber and Lyft API data (Cooper, Castiglione, Mislove, & Wilson, 2018); we chose to limit the scope of this study to data extractable with a single key, both for feasibility and to avoid deactivation by the API providers.

Each API service consists of multiple “endpoints” that can be queried to extract data (more information about APIs is provided by Paruchuri (2019)). A reply (output) from an endpoint is typically one of two types: bulk data for the whole system, or output specific to an input location. For bulk-return endpoints, scripts were written to make data requests at specific time intervals throughout the data collection period. For location-specific endpoints, inputs for the request are either a single location or a pair of locations (i.e., an origin and a destination). A set of query locations was defined and used to systematically request data from location-specific endpoints over the data collection period. The number and spacing of query locations were established with consideration of three factors: 1) covering the intersection of all NMS territories in a city, 2) staying under the API query rate limits, and 3) keeping the total query run times for all endpoints and locations in a city under one hour. To avoid order bias, a random sequence of query locations within each city was generated for each data collection interval.

2.2 Data Extraction

The data gathering process for each API involved three steps: sending requests to endpoints on the server, receiving output from the server, and storing the raw data in a MySQL database. Scripts were written in Python 3 to perform those three steps for each city. The code is available online on GitHub (https://github.com/amirhhassanpour/Web-Data-Extraction-NMS). API data were extracted for 6 months from June 2018 to December 2019, using a Dell desktop with a 3.40 GHz Intel Core i7-6700 and 16 GB of RAM.
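The three extraction steps and the per-interval random ordering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' scripts (those are in the linked repository): `fake_endpoint`, `run_cycle`, the `raw_replies` table, and the coordinates are hypothetical, and an in-memory SQLite database stands in for MySQL so the sketch is self-contained.

```python
import json
import random
import sqlite3
import time
from typing import Callable, Dict, List, Optional, Tuple

Location = Tuple[float, float]  # (latitude, longitude)


def fake_endpoint(lat: float, lon: float) -> Dict:
    """Hypothetical stand-in for a location-specific API endpoint."""
    return {"lat": lat, "lon": lon, "eta_minutes": 4}


def run_cycle(
    conn: sqlite3.Connection,
    locations: List[Location],
    fetch: Callable[[float, float], Dict],
    pause_s: float = 0.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Query every location once, in a fresh random order, and store raw replies."""
    rng = rng or random.Random()
    order = list(locations)
    rng.shuffle(order)  # new random sequence each cycle, to avoid order bias
    stored = 0
    for lat, lon in order:
        reply = fetch(lat, lon)  # steps 1 and 2: send request, receive output
        conn.execute(  # step 3: store the raw reply
            "INSERT INTO raw_replies (queried_at, lat, lon, payload) "
            "VALUES (?, ?, ?, ?)",
            (time.time(), lat, lon, json.dumps(reply)),
        )
        stored += 1
        time.sleep(pause_s)  # spacing between requests, to respect rate limits
    conn.commit()
    return stored


# SQLite is an assumption made for portability; the study used MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_replies (queried_at REAL, lat REAL, lon REAL, payload TEXT)"
)
grid = [(47.60 + 0.01 * i, -122.33) for i in range(5)]  # made-up coordinates
n_stored = run_cycle(conn, grid, fake_endpoint, rng=random.Random(0))
```

Storing the unparsed reply keeps the extraction step independent of later decisions about which variables become candidate measures.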


2.3 Evaluating Candidate Measures of Supply and Demand

Variables extracted in the API data were evaluated for their usefulness in characterizing supply and demand. Several that were deemed not relevant to assessing supply and demand of NMS were excluded from consideration, using the following categories:

1) variables that are determined solely by the location inputs (e.g. Estimated Trip Distance),
2) variables that only contain descriptive data (e.g. Type of Service, Pricing Structure),
3) variables with no diurnal variability (e.g. Station Capacity, Vehicle Capacity), and
4) price variables (e.g. Estimated Cost and PrimeTime Percentage), which combine information on supply and demand; they are used in the evaluation below, but not as supply and demand measures themselves.

The remaining variables were examined for measures to quantify the volume of trips (as demand) and the availability of services (as supply) for each NMS.

The candidate supply and demand measures created from extracted API data were evaluated against three criteria, where possible. The three criteria were developed with consideration of 1) established characteristics of transport supply and demand variables (e.g., demand varies with activity intensity), and 2) dimensions over which the extracted data varied and so could be evaluated.

The first criterion is conformity of the measure to price relationships described by microeconomic theory and the literature, i.e. does demand decrease and supply increase as price rises? For NMS with dynamic pricing, the relationship between demand and price was investigated by Cohen et al. (2016) using internal Uber data, yielding estimated price elasticities of demand from -0.6 to -0.4. This provides evidence that Uber is an ordinary good, meaning an increase in price is expected to reduce the quantity demanded. Other researchers have found positive price elasticities of supply for Uber services, consistent with standard economic theory (Hall, Kendrick, & Nosko, 2015; Hall, Horton, & Knoepfle, 2019). Mixed effect regression models were created for the candidate measures against price variables, controlling for random spatial and fixed temporal effects. The sign and significance of the price coefficient were then used as indicators of measure validity.

The second criterion is conformity of measures to the temporal distribution of relevant travel data from other sources. The United States 2017 National Household Travel Survey and the TransLink 2011 Regional Household Trip Diary Survey were used as empirical data comparators. For bikeshare services, the temporal distribution of demand was compared to that of trips shorter than the longest observed bike trip in the travel survey (the assumed threshold for ‘bike-able’ trips). For other services, all trips in the travel survey data were used. Kolmogorov-Smirnov tests at a significance threshold of p<0.05 were used to compare the distributions.

The third criterion is conformity of the measures to the spatial relationships reported in relevant travel behavior literature. Correlation of the candidate supply and demand measures with spatial variables in the city was calculated and compared to relationships reported in the literature. A measure was considered good if more than half of the investigated relationships with spatial variables were in the expected direction. Population density, bike network density, and income were previously reported to be positively correlated with bikeshare demand, and distance to the Central Business District (CBD) and distance to the university negatively correlated (Buck & Buehler, 2012; El-Assi, Salah Mahmoud, & Nurul Habib, 2017; Faghih-Imani & Eluru, 2016; Tran, Ovtracht, & D’Arcier, 2015; Wang, Lindsey, Schoner, & Harrison, 2015; Zhao et al., 2014). Percentage of single-person households, population density, and income were previously reported to be positively correlated with carshare demand (Bhat & Guo, 2007; Handy, Cao, & Mokhtarian, 2006; Millard-Ball, 2005; Zhang, 2018).
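The second criterion compares two temporal distributions with a Kolmogorov-Smirnov test. The sketch below shows the mechanics of a two-sample test in plain Python; it is illustrative only, since the paper does not state which software was used. The function names and the trip start hours are made up, and the critical value uses the standard large-sample approximation, which is only rough for small samples or heavily tied data such as hourly bins.

```python
import math
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest absolute gap between the empirical CDFs of two samples."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for x in sorted(set(sa) | set(sb)):
        cdf_a = bisect_right(sa, x) / len(sa)
        cdf_b = bisect_right(sb, x) / len(sb)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_critical_value(n: int, m: int, alpha: float = 0.05) -> float:
    """Large-sample critical value: c(alpha) * sqrt((n + m) / (n * m))."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))  # about 1.358 at alpha = 0.05
    return c_alpha * math.sqrt((n + m) / (n * m))


# Made-up trip start hours: one sample from an API-derived demand measure,
# one from a household travel survey.
api_hours = [7, 8, 8, 9, 12, 17, 17, 18]
survey_hours = [7, 8, 9, 9, 12, 16, 17, 18]

d = ks_statistic(api_hours, survey_hours)
d_crit = ks_critical_value(len(api_hours), len(survey_hours))
same_distribution_not_rejected = d < d_crit
```

In this reading, a candidate demand measure passes the temporal criterion when the null hypothesis of a common distribution is not rejected.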

3 Results

3.1 Query locations and Data Extraction

A list of all companies operating NMS in Vancouver, Seattle, and Portland as of spring 2018 is given in Table 1. Among the 13 companies that operated in the 3 cities, 7 had API services, and 6 of them granted keys to access their API services. Car2go was the only service that declined the request for access, stating that they no longer support using their API for study purposes.

Table 1. API of all new mobility services operating in Vancouver, Seattle, and Portland

| Category | Company | Cities operating in | API services | Access |
|---|---|---|---|---|
| Ride-Hailing | uber | Seattle - Portland | Available | Granted |
| Ride-Hailing | lyft | Seattle - Portland | Available | Granted |
| Carsharing | zipcar | Vancouver - Seattle - Portland | Not Available | NA |
| Carsharing | car2go | Vancouver - Seattle - Portland | Available | Not Granted |
| Carsharing | reachnow | Seattle - Portland | Not Available | NA |
| Carsharing | evo | Vancouver | Not Available | NA |
| Carsharing | getaround | Seattle - Portland | Not Available | NA |
| Carsharing | modo | Vancouver | Available | Granted |
| Bikesharing | biketown | Portland | Available | Granted |
| Bikesharing | ofo | Seattle | Available | Granted |
| Bikesharing | limebike | Seattle | Not Available | NA |
| Bikesharing | dropbike | Vancouver | Not Available | NA |
| Bikesharing | mobibikes | Vancouver | Available | Granted |

Query locations were not necessary for Vancouver because the only endpoints for the available APIs (Modo and Mobibikes) were of the bulk-data type. Of the six companies with accessible API services, only Uber and Lyft had API query limits. Uber’s API had a short-term rate limit of 500 queries per 5 seconds and a long-term rate limit of 2000 queries per hour. Lyft had a limit of 5 queries per minute, which is equal to 300 queries per hour. Ofo had no specific API request limit reported online, but deactivated our API key after 1 month of data collection, possibly due to request volume.

Run time was a limiting factor in the number of query locations for Uber and Lyft. The average run time of each endpoint in the Uber and Lyft APIs ranged from 0.92 to 2.82 seconds. For each query location, seven API endpoints (three Uber and four Lyft) were queried, which required 10.95 seconds on average. Considering the API request limits and run times, the density of query locations was selected to cover the intersection of the NMS territories in each city. Sampling grids consisted of 113 locations in Seattle (spaced at 1.5 km) and 72 in Portland (with east-west spacing of 0.8 km and north-south spacing of 1.2 km). Requesting data at intervals of 30 seconds, extraction of Uber and Lyft API data for Seattle required 57 minutes. Ofo data were extracted at the same query locations in Seattle, once per hour. APIs of the other NMS (Mobibikes, Modo, Biketown) provided bulk system data upon a single request and did not have API rate limits. Data were extracted from these APIs at 15 minute intervals. At the end of 6 months, 2.25 GB of data had been extracted from the APIs of the 6 NMS operating within the 3 cities.

3.2 Extracted Variables

The structure of each API is illustrated in Figure 1. The list of all extracted variables is given in Table 2, including which were excluded from consideration as candidate measures of supply and demand. Types of available services, pricing structure of the services (base price, booking fee, cancellation fee, etc.), and vehicle capacity were excluded as they only contain descriptive information that does not vary. Location and capacity of stations and carshare parking locations exhibited some variation over the six months, but were excluded for a lack of diurnal variation. Length and duration of trip were determined by the query inputs and also excluded. Finally, the net change in the number of bicycles available at bikeshare stations was excluded because it did not reflect actual trip ends within the sample interval.
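The run-time constraint in Section 3.1 is simple arithmetic and can be checked with a short sketch. It assumes that "intervals of 30 seconds" means one query location is visited every 30 seconds, which reproduces the reported run time for Seattle (56.5 minutes, reported as 57). The Portland figure is not reported in the text and is computed here under the same assumption; the function names are hypothetical.

```python
def cycle_minutes(n_locations: int, interval_s: float) -> float:
    """Time to visit every query location once, at one location per interval."""
    return n_locations * interval_s / 60.0


def fits_hourly_cycle(n_locations: int, interval_s: float) -> bool:
    """Check the design constraint that a full cycle stays under one hour."""
    return cycle_minutes(n_locations, interval_s) < 60.0


INTERVAL_S = 30          # seconds between successive query locations (assumed reading)
SEATTLE_LOCATIONS = 113  # from the sampling grid described in Section 3.1
PORTLAND_LOCATIONS = 72

seattle_minutes = cycle_minutes(SEATTLE_LOCATIONS, INTERVAL_S)    # 56.5
portland_minutes = cycle_minutes(PORTLAND_LOCATIONS, INTERVAL_S)  # 36.0

# Largest grid that still completes within an hour at this spacing.
max_locations = int(3600 / INTERVAL_S) - 1  # 119
```

Under this reading the Seattle grid sits close to the one-hour ceiling, so a denser grid would have required a shorter interval between locations.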


Figure 1. API structure of new mobility services in Vancouver, Seattle, and Portland


Table 2. Variables extracted from new mobility service API in Vancouver, Seattle, and Portland

| Services | Variable | Description |
|---|---|---|
| Uber | types of available services* | List of type of services supplied by Uber |
| Uber | pricing structure of the services* | A list of booking and cancellation fee, base cost, minimum cost, and cost per distance and time |
| Uber | vehicle capacity* | Number of persons |
| Uber | pickup time estimate | Minutes |
| Uber | duration estimate of the trip* | Minutes |
| Uber | length estimate of the trip* | km |
| Uber | price estimate of the trip* | Dollars |
| Lyft | types of available services* | List of type of services supplied by Lyft |
| Lyft | pricing structure of the services* | A list of base and minimum cost, cost per distance and time, trust and services, and cancellation fee |
| Lyft | vehicle capacity* | Number of persons |
| Lyft | pickup time estimate | Minutes |
| Lyft | duration estimate of the trip* | Minutes |
| Lyft | length estimate of the trip* | km |
| Lyft | price estimate of the trip* | Dollars |
| Lyft | location of nearby drivers | Latitude and longitude of cars with no id |
| Lyft | prime time percentage* | In percentage |
| Biketown | location and capacity of stations* | List of stations with their capacity (number of docks) |
| Biketown | number of bikes at station* | Number of bikes available at each station |
| Biketown | location of bike left outside stations | Latitude and longitude of bikes left outside stations with id |
| Ofo | location of bikes | Latitude and longitude of bikes that are not being used |
| Mobibikes | location and capacity of stations* | List of stations with their capacity (number of docks) |
| Mobibikes | number of bikes at station* | Number of bikes available at each station |
| Modo | cars in the parking locations* | List of cars available at each parking location with id, model, make, color, and capacity |
| Modo | reserved cars and hours | List of hours that each car is reserved |

* variables excluded from consideration as candidate measures of supply and demand

3.3 Candidate Measures of Supply and Demand

Seven candidate measures of supply and demand were created from the variables not excluded in Table 2. For Uber and Lyft, none of the variables were indicators of trip volume, but measures of service availability were derived. Estimated Time of Arrival (ETA) was used without modification as a candidate measure of supply (also previously used by Hughes & MacKenzie (2016)). Locations of Cars (latitudes and longitudes) was transformed into Number of Cars and Car Density (number of cars per square km) as additional candidate measures of supply for Lyft.

Candidate measures of demand for Ofo and Biketown were created by assuming a trip was initiated each time a bicycle disappeared from the list of reported bikes. This approach assumes that actual trip volume far exceeds rebalancing events (which are not distinguished, and so the measure over-represents actual trips). This approach only captured Biketown trips initiated outside of a station. Biketown trips were aggregated at the census tract level, and Ofo trips were aggregated at 500-meter buffers around query locations because not all Ofo trips were captured in the data extraction. If there were abundant bicycles available around a query location, the Ofo API only returned bicycles within one square kilometer of the query location. Buffers of 200 to 800 m have been previously used in studies of bikeshare demand (Buck & Buehler, 2012; El-Assi et al., 2017; Faghih-Imani & Eluru, 2016; Krykewycz, Puchalsky, Rocks, Bonnette, & Jaskiewicz, 2010; Tran et al., 2015; Wang et al., 2015).

For Modo, trips per parking location was created as a measure of demand using the variable Reserved Cars and Hours. If a reservation existed 15 minutes before the reservation start time, it was counted as one trip made with Modo. In addition, available vehicle locations from the bikeshare and carshare service APIs provide direct measures of supply in those systems; these measures were excluded from evaluation because they are direct measures and because empirical data for comparison were lacking.

3.4 Evaluation of supply and demand measures

Table 3 gives summary results of the model-estimated price relationships for candidate supply measures of Uber and Lyft in Seattle and Portland. Price variable coefficients for Car Density and ETA do not conform to expected signs. Number of Cars does exhibit the expected positive relationship with price. No data were available on the expected temporal or spatial distribution of Uber and Lyft service availability, so the second and third criteria were not tested for these measures.
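The bikeshare demand rule of Section 3.3 (a trip is assumed to start whenever a bicycle disappears from the reported list) can be sketched as a set difference between consecutive snapshots. The sketch assumes each reported bike carries a stable identifier; Table 2 states this for Biketown bikes left outside stations, while for other feeds it is an assumption. The function names and bike ids are hypothetical, and, as in the paper, rebalancing removals are not distinguished from trips.

```python
from typing import List, Set


def inferred_trip_starts(snapshots: List[Set[str]]) -> List[int]:
    """For each pair of consecutive snapshots, count bikes that were listed
    in the earlier one and are missing from the later one."""
    return [len(prev - curr) for prev, curr in zip(snapshots, snapshots[1:])]


def car_density(n_cars: int, area_km2: float) -> float:
    """Cars per square kilometre, as in the Car Density supply measure."""
    return n_cars / area_km2


# Three made-up snapshots of available bike ids at successive sampling times.
snapshots = [
    {"b1", "b2", "b3"},
    {"b1", "b3", "b4"},  # b2 disappeared (counted as a trip start); b4 appeared
    {"b3", "b4"},        # b1 disappeared
]

trip_starts = inferred_trip_starts(snapshots)  # one count per sampling interval
total_trips = sum(trip_starts)
```

A bike that leaves and returns between two snapshots is invisible to this rule, so the sampling interval bounds the trip durations that can be detected.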

Table 3. Summary of model results for relationship of ridesource supply measures to price*

| Service | Dependent variable (measure of supply) | Expected sign of Price coefficient | Seattle: estimated sign of Price coefficient | Seattle: t-statistic** | Portland: estimated sign of Price coefficient | Portland: t-statistic** |
|---|---|---|---|---|---|---|
| Uber | ETA | - | + | 28.6 | + | 29.9 |
| Lyft | ETA | - | + | 31.6 | + | 40.2 |
| Lyft | Number of Cars | + | + | 3.5 | + | 7.3 |
| Lyft | Car Density | + | - | -7.8 | - | -9.9 |

* Mixed effect regression models for each dependent variable in Seattle and Portland against Price, with fixed temporal effects and random spatial effects (total of 8 models)
** All coefficients p<0.001

For the rest of the candidate measures, the first criterion could not be evaluated because there was no observed price variation for the bikeshare or carshare services. Results of the second evaluation criterion (temporal conformity) for demand measures of Biketown, Ofo, and Modo were positive. The null hypothesis (that the samples were drawn from the same distribution) was not rejected at 95 percent confidence for any of the three services, suggesting that the temporal distributions of the candidate measures are similar to the expected demand profile. The expected demand profile was created using trips shorter than the longest observed bike trip in the travel survey; however, the criterion was also tested against the expected demand profile using trips shorter than the 95th percentile of bike trips. This was to exclude outlier bicycle trips (such as long recreational or sport rides) and isolate more common distances; it did not affect the result of this evaluation criterion.

For the third evaluation criterion (spatial conformity), the correlation of Biketown, Ofo, and Modo trips with sociodemographic and built environment variables is given in Table 4, along with expected signs based on the literature. All tested spatial variables were correlated with the demand measures in the expected direction except for household income for the two bikeshare services.


Table 4. Correlation of spatial variables with Ofo, Biketown, and Modo candidate demand measures

| Service | Spatial variable | Pearson correlation with demand measure | Expected sign |
|---|---|---|---|
| Biketown | Median household income | -0.25 | + |
| Biketown | Distance to CBD | -0.46 | - |
| Biketown | Distance to university | -0.44 | - |
| Biketown | Population density | +0.23 | + |
| Biketown | Bike network density | +0.52 | + |
| Ofo | Median household income | -0.14 | + |
| Ofo | Distance to CBD | -0.41 | - |
| Ofo | Distance to university | -0.22 | - |
| Ofo | Population density | +0.53 | + |
| Ofo | Bike network density | +0.54 | + |
| Modo | Percentage single-person households | +0.16 | + |
| Modo | Median household income | +0.01 | + |
| Modo | Population density | +0.12 | + |

A summary of evaluation results for all measures is given in Table 5. Apart from ETA and Car Density, the candidate measures performed well: Number of Cars from the Lyft API as a measure of supply, and bikeshare and carshare system trips as measures of demand. As described above, direct measures of supply for bikeshare and carshare services were also available from the API. To illustrate the use of these measures, Figure 2, Figure 3, and Figure 4 show average daily trips for services in each of the three cities over the 6 months of data extraction.

Table 5. Evaluation of candidate supply and demand measures obtained from API data

| Measure | Service | Variable | Price relationship | Temporal pattern | Spatial pattern |
|---|---|---|---|---|---|
| Supply | Uber | ETA | Fail | - | - |
| Supply | Lyft | ETA | Fail | - | - |
| Supply | Lyft | Number of Cars | Pass | - | - |
| Supply | Lyft | Car Density | Fail | - | - |
| Demand | Biketown | Trips per Census Tract | - | Pass | Pass |
| Demand | Ofo | Trips per Buffer | - | Pass | Pass |
| Demand | Modo | Trips from parking station | - | Pass | Pass |

'-' indicates tests not performed
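The spatial "Pass" entries in Table 5 follow from the majority rule of the third criterion applied to the correlations in Table 4. A short sketch of that rule, using the Table 4 values verbatim (the data structure and function name are illustrative, not the authors' code):

```python
from typing import Dict, List, Tuple

# (spatial variable, Pearson correlation, expected sign), copied from Table 4.
TABLE_4: Dict[str, List[Tuple[str, float, str]]] = {
    "Biketown": [
        ("Median household income", -0.25, "+"),
        ("Distance to CBD", -0.46, "-"),
        ("Distance to university", -0.44, "-"),
        ("Population density", 0.23, "+"),
        ("Bike network density", 0.52, "+"),
    ],
    "Ofo": [
        ("Median household income", -0.14, "+"),
        ("Distance to CBD", -0.41, "-"),
        ("Distance to university", -0.22, "-"),
        ("Population density", 0.53, "+"),
        ("Bike network density", 0.54, "+"),
    ],
    "Modo": [
        ("Percentage single-person households", 0.16, "+"),
        ("Median household income", 0.01, "+"),
        ("Population density", 0.12, "+"),
    ],
}


def spatial_criterion(rows: List[Tuple[str, float, str]]) -> Tuple[int, int, bool]:
    """Return (matching signs, variables tested, pass flag).
    A measure passes if more than half of the signs match expectations."""
    matches = sum((r > 0) == (expected == "+") for _, r, expected in rows)
    return matches, len(rows), matches > len(rows) / 2


results = {service: spatial_criterion(rows) for service, rows in TABLE_4.items()}
```

The rule considers only the sign of each correlation, so a weak coefficient such as +0.01 for Modo income counts the same as a strong one.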


Figure 2. Average daily Modo trips by census tract in Vancouver

Figure 3. Average daily Ofo trips in 1 km² areas around query locations in Seattle


Figure 4. Average daily Biketown trips (outside of stations) by census tract in Portland

4 Conclusions

The aim of this research was to determine what useful information about supply and demand for new mobility services can be derived from publicly-available API data. Currently, API data provide limited information about NMS in the three large North American cities studied. Among the 13 bikeshare, carshare, and ridesource companies operating in Vancouver, Seattle, and Portland, only 6 provide accessible API services. The utility of an API depends on what data are available through it. API data from the bikesharing and carsharing services were the most useful for evaluating supply and demand. The data currently available through Uber and Lyft APIs provided limited information on supply and no information on demand. Number of Cars, extracted from Lyft API data, can serve as a measure of ridesourcing service availability or supply, but none of the publicly-available data provided a clear measure of trip volume or demand. The relationships between price and three out of the four candidate measures of supply investigated in this paper were inconsistent with economic theory and with previous research based on high-resolution internal Uber data. This highlights the inadequacy of current public API data for measuring supply and demand, and associated measures of welfare.

To address the issues described in the introduction, the ideal solution is public sharing of anonymized NMS system data through data repositories, at a level of aggregation that protects business intelligence and customer and employee privacy. This solution could be encouraged or enforced through stronger open data policies and inclusion of data sharing in operating license agreements (Wolff et al., 2019). If a publicly-accessible data repository is too costly to implement, there are also ways to improve the utility of API data for monitoring or research. The first is a consistent API data reporting structure across services, such as has been proposed in Los Angeles (Los Angeles Local Government, 2018). In addition, the data streams of publicly-available APIs are largely undocumented, and data dictionaries are needed to ensure accurate interpretation of extracted data. To effectively use the available data on bikeshare and free-floating carshare systems, additional information is needed on rebalancing events.


The major limitations of this study concern data quality. API rate limits and long run times for location-specific endpoint queries restricted the temporal and spatial data resolution that could be obtained. More API keys could be used along with additional data collection to gain more insights on a specific question or service, as in previous research (Chen, Mislove, & Wilson, 2015), or data could be obtained directly from companies in a cooperative research project, but our goal was to catalogue the possibilities of readily-available API data. Without data dictionaries, our analysis relied on several assumptions about the data streams based on preliminary investigations, such as the number of available bicycles returned from Ofo. The effects of rebalancing were neglected in creating demand measures for bikesharing services. Modo trips were created using the variable “reserved cars and hours”, which was extracted every 15 minutes; this could have created errors if a car was booked or canceled within 15 minutes of the trip beginning. Finally, using correlation to determine the quality of demand measures overlooks potential confounding effects.

Despite these limitations, this study makes several contributions to our understanding of modern transportation system data. We provide a comprehensive catalogue of all the publicly-available data streams from the APIs of new mobility services in three large North American cities, along with Python scripts for data extraction. The systematic examination of the API data reveals some measures of supply and demand that can be extracted from API data and are useful in future analysis (mostly for bikeshare and carshare services, not ridesourcing). However, important information on supply and demand of most of the New Mobility Services in these cities cannot be obtained through API data extraction. APIs can be a last-resort “backdoor” approach to getting some information, but what is clearly needed is stronger open data policies for mobility service providers operating on public facilities.

5 Acknowledgement
The authors would like to acknowledge support from Cascadia Urban Analytics Cooperative.

6 Author contribution statement
The authors confirm contribution to the paper as follows: study conception and design: AB, DM, and AH; coding and data collection: AH, DM, and AB; analysis and interpretation of results: AH, AB, and DM; manuscript preparation: AH, AB, and DM. All authors reviewed the results and approved the final version of the manuscript.

7 References
Austin Transportation. (2019). Dockless mobility. Retrieved from http://austintexas.gov/docklessmobility
Bhat, C. R., & Guo, J. Y. (2007). A comprehensive analysis of built environment characteristics on household residential choice and auto ownership levels. Transportation Research Part B: Methodological. https://doi.org/10.1016/j.trb.2005.12.005
Bliss, L. (2017). Why Uber and Lyft Will Continue to Dominate Ride-Hailing. Retrieved March 2, 2019, from https://www.citylab.com/transportation/2017/06/why-uber-will-still-dominate/529686/
Buck, D., & Buehler, R. (2012). Bike lanes and other determinants of trips. 91st Transportation Research Board Annual Meeting, 703–706. Retrieved from http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Bike+Lanes+and+Other+Determinants+of+Capital+Bikeshare+Trips#0
Car2go. (2018). Carsharing is growing rapidly: car2go celebrates over three million members. Retrieved from https://www.car2go.com/media/data/germany/microsite-press/files/180205_press-release_car2go-celebrates-over-three-million-members.pdf
Chen, L., Mislove, A., & Wilson, C. (2015). Peeking Beneath the Hood of Uber. https://doi.org/10.1145/2815675.2815681
Cohen, P., Hahn, R., Hall, J. V., Levitt, S., & Metcalfe, R. (2016). Using Big Data to Estimate Consumer Surplus: The Case of Uber. NBER Working Paper Series, 42. https://doi.org/10.3386/w22627
Cooper, D., Castiglione, J., Mislove, A., & Wilson, C. (2018). Profiling transport network company activity using big data. Transportation Research Record. https://doi.org/10.1177/0361198118798459
Costello, H. (2018). Global Ride-Sharing Market 2018 Research Analysis, Statistics, Business Growth Opportunity, Services and Facilities by Key Companies till 2022. Retrieved from Reuters website: https://www.reuters.com/brandfeatures/venture-capital/article?id=50078
Czerniak, R. J., Lahsene, J. S., & Chatterjee, A. (2000). Urban Freight Movement: What Form Will It Take? Transportation Research Board. Retrieved from http://onlinepubs.trb.org/onlinepubs/millennium/00139.pdf
El-Assi, W., Salah Mahmoud, M., & Nurul Habib, K. (2017). Effects of built environment and weather on bike sharing demand: a station level analysis of commercial bike sharing in Toronto. Transportation. https://doi.org/10.1007/s11116-015-9669-z
Faghih-Imani, A., & Eluru, N. (2016). Incorporating the impact of spatio-temporal interactions on bicycle sharing system demand: A case study of New York CitiBike system. Journal of Transport Geography. https://doi.org/10.1016/j.jtrangeo.2016.06.008
Fry, R. (2013). Young Adults After the Recession: Fewer Homes, Fewer Cars, Less Debt. Pew Research Center, 3, 5–8. Retrieved from http://www.pewsocialtrends.org/2013/02/21/young-adults-after-the-recession-fewer-homes-fewer-cars-less-debt/
Gehrke, S. R., Felix, A., & Reardon, T. G. (2019). Substitution of Ride-Hailing Services for More Sustainable Travel Options in the Greater Boston Region. Transportation Research Record: Journal of the Transportation Research Board, 2673(1), 438–446. https://doi.org/10.1177/0361198118821903
Godelnik, R. (2017). Millennials and the sharing economy: Lessons from a ‘buy nothing new, share everything month’ project. Environmental Innovation and Societal Transitions. https://doi.org/10.1016/j.eist.2017.02.002
Hall, J. V., Horton, J. J., & Knoepfle, D. T. (2019). Pricing Efficiently in Designed Markets: The Case of Ride-Sharing.
Hall, J., Kendrick, C., & Nosko, C. (2015). The effects of Uber’s surge pricing: A case study. The University of Chicago Booth School of Business.
Handy, S., Cao, X., & Mokhtarian, P. L. (2006). Self-selection in the relationship between the built environment and walking: Empirical evidence from Northern California. Journal of the American Planning Association. https://doi.org/10.1080/01944360608976724
Hughes, R., & MacKenzie, D. (2016). Transportation network company wait times in Greater Seattle, and relationship to socioeconomic indicators. Journal of Transport Geography, 56, 36–44. https://doi.org/10.1016/j.jtrangeo.2016.08.014
Jiang, F., Johnson, P., & Calzada, C. (1999). Freight Demand Characteristics and Mode Choice: An Analysis of the Results of Modeling with Disaggregate. Journal of Transportation and Statistics, 2(2), 149–158.
Krykewycz, G. R., Puchalsky, C. M., Rocks, J., Bonnette, B., & Jaskiewicz, F. (2010). Defining a Primary Market and Estimating Demand for Major Bicycle-Sharing Program in Philadelphia, Pennsylvania. Transportation Research Record: Journal of the Transportation Research Board. https://doi.org/10.3141/2143-15
Lindsay, B. (2019). No ride-hailing services in B.C. until late 2019, province says. Retrieved from CBC website: https://www.cbc.ca/news/canada/british-columbia/no-ride-hailing-services-in-b-c-until-late-2019-province-says-1.4753339
Litman, T. (2017). Introduction to Multi-Modal Transportation Planning. Victoria Transport Policy Institute, (September), 0–15. https://doi.org/10.1080/00420987620080731
Los Angeles Local Government. (2018). Data Standard for Mobility as a Service Providers, Los Angeles, California - Shared Mobility Policy Database. Retrieved March 3, 2019, from http://policies.sharedusemobilitycenter.org/#/policies/1044
Ma, B., Cadieux, S., Herbert, S. C., Elmore, M., Johal, J., Kahlon, R., … Weaver, D. A. (2018). Transportation Network Companies in British Columbia. Retrieved from https://www.leg.bc.ca/content/CommitteeDocuments/41st-parliament/2nd-session/CrownCorporations/Report/SSC-CC_41-2_Report-2018-02-15_Web.pdf
Millard-Ball, A. (2005). Carsharing: Where and how it succeeds (Vol. 108).
NACTO Bike Share Initiative. (2017). Bike Share in the US: 2017. Retrieved from https://nacto.org/bikeshare-statistics-2017
Paruchuri, V. (2019). Python API Tutorial: Getting Started with APIs. Retrieved November 29, 2019, from https://www.dataquest.io/blog/python-api-tutorial/
Ross, D. (2014). Millennials don’t care about owning cars, and car makers can’t figure out why. Retrieved from Fast Company, Co website: https://www.fastcompany.com/3027876/millennials-dont-care-about-owning-cars-and-car-makers-cant-figure-out-why
Shaheen, S., & Chan, N. (2016). Mobility and the sharing economy: Potential to facilitate the first- and last-mile public transit connections. Built Environment. https://doi.org/10.2148/benv.42.4.573
Shaheen, S., Cohen, A., & Jaffee, M. (2018). Innovative Mobility: Carsharing Outlook. Carsharing Market Overview, Analysis and Trends, 0–7. https://doi.org/10.7922/G2CC0XVW
Somerville, H. (2016). Uber reaches 2 billion rides six months after hitting its first billion. Retrieved from Reuters website: https://www.reuters.com/article/us-uber-rides-idUSKCN0ZY1T8
Southworth, F. (2018). Freight Flow Modeling in the United States. Applied Spatial Analysis and Policy, 11(4), 669–691.
Tran, T. D., Ovtracht, N., & D’Arcier, B. F. (2015). Modeling bike sharing system using built environment factors. Procedia CIRP. https://doi.org/10.1016/j.procir.2015.02.156
Wang, X., Lindsey, G., Schoner, J. E., & Harrison, A. (2015). Modeling Bike Share Station Activity: Effects of Nearby Businesses and Jobs on Trips to and from Stations. Journal of Urban Planning and Development. https://doi.org/10.1061/(asce)up.1943-5444.0000273
Wirtz, J., Lovelock, C., Wirtz, J., & Tang, C. (2016). Uber: Competing as Market Leader in the US versus Being a Distant Second in China. Services Marketing, 626–632. https://doi.org/10.1142/9781944659028_0019
Wolff, H., Possnig, C., & Petersen, G. (2019). An Open Data Framework for the New Mobility Industry. Retrieved from http://hendrikwolff.com/web/MaaS.pdf
Zhang, M. (2018). Intercity Variations in the Relationship between Urban form and Automobile Dependence. Transportation Research Record: Journal of the Transportation Research Board. https://doi.org/10.1177/0361198105190200107
Zhao, J., Deng, W., & Song, Y. (2014). Ridership and effectiveness of bikesharing: The effects of urban features and system characteristics on daily use and turnover rate of public bikes in China. Transport Policy. https://doi.org/10.1016/j.tranpol.2014.06.008
