
Comparative Analysis of Coffee Franchises in the Cambridge-Boston Area

May 10, 2010 ESD.86: Models, Data, and Inference for Socio-Technical Systems

Paul T. Grogan
Massachusetts Institute of Technology

Introduction

The placement of storefronts is a difficult question on which many corporations spend a great amount of time, effort, and money. There is a careful interplay between environment, potential customers, other storefronts from the same franchise, and other storefronts for competing franchises. From the customer’s perspective, the convenience of storefronts, especially for “discretionary” products or services, is of the utmost importance. In fact, some franchises develop mobile phone applications to provide their customers with an easy way to find the nearest storefront.1

This project takes an in-depth view of the storefront placements of Dunkin’ Donuts and Starbucks, two competing franchises with strong presences in the Cambridge-Boston area. Both franchises purvey coffee, coffee drinks, light meals, and pastries and cater especially well to sleep-deprived graduate students. However, Dunkin’ Donuts typically puts more emphasis on take-out (convenience) customers looking to grab a quick coffee before class whereas Starbucks provides an environment conducive to socializing, meetings, writing theses, or studying over a longer duration. These differences in target customers may drive differences in the distribution of storefronts in the area.

The goal of this project is to apply some of the concepts learned in ESD.86 on probabilistic modeling to the real-world system of franchise storefronts and customers. The analysis focuses on the “convenience” of accessing storefronts, measured by the distance from a random customer to the nearest location. The “nearest neighbor” probabilistic model is a natural choice for this problem. Under this model, the distance from a random uniformly-distributed customer to the closest spatially Poisson distributed storefront can be expressed with a closed-form equation. Of course, in the real-world system, there are several assumptions that must be checked.

• Can the franchise storefronts be modeled with a spatial Poisson distribution?
• Can the customers be modeled with a uniform distribution?
• Does the “nearest-neighbor” distance correlate with the “actual” closest storefront distance?
• Is the Euclidean or Manhattan distance metric appropriate for pedestrian walking paths?

To answer these questions, as well as the greater question of which coffee franchise provides better service to the residents of the Cambridge/Boston area, the project is broken down into three parts. First, data must be gathered on the existing storefront locations within an area of interest. Fortunately, both franchises provide “store locator” services from the corporate web sites. Additionally, data representing the demand distribution either through population density or other relevant features are required for constructing the customer model. Second, probabilistic distributions will be created in accordance with the nearest neighbor model. Using the data gathered in the first phase, storefront locations will be modeled as spatial Poisson distributions and customers will be modeled with uniform distributions. Finally, comparative analysis will investigate the differences between the two franchises as well as the underlying assumptions and accuracy of the probabilistic models.

1 myStarbucks App for iPhone and iPod Touch, http://www.starbucks.com/coffeehouse/mobile-apps/mystarbucks

Data Gathering

The data gathering portion of the project assembles the information required to build the probabilistic models. There are two primary formats of data needed: positional data and population data. Positional data provides coordinates for storefront locations for both franchises as well as locations of other features that may be helpful in the analysis. Population data provides a sense of customer density that will be used to help drive customer demand models.

Positional Coordinates

Not long ago, gathering position coordinates in a format conducive to numerical analysis would have been an insurmountable challenge for a term project. Fortunately, with the confluence of several technologies, it is no longer out of scope to build a very accurate representation of the real world.

The general process to gather location data is as follows:

1. Aggregate addresses using online-available services or documents
2. Process addresses into GPS coordinates using online GeoCoder tool2
3. Visualize GPS coordinates using online mapping applications such as Google Maps, iterating on improperly-identified addresses as necessary
4. Transform GPS coordinates into Cartesian coordinates using the haversine formula3
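Step 4 can be sketched as follows. This is only an illustration under stated assumptions, not the code used in the project (which is not reproduced in this report): the Earth radius of 3958.8 miles, the function names, and the choice of origin are all assumptions. The origin used here is simply the first Dunkin’ Donuts entry in Appendix A, which lies near Central Square.

```python
import math

EARTH_RADIUS_MI = 3958.8  # assumed mean Earth radius in miles


def haversine_mi(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def to_cartesian_mi(lat, lon, origin_lat, origin_lon):
    """Local (x, y) in miles: x is measured east of the origin, y north of it."""
    x = haversine_mi(origin_lat, origin_lon, origin_lat, lon)  # east-west leg
    y = haversine_mi(origin_lat, origin_lon, lat, origin_lon)  # north-south leg
    return (math.copysign(x, lon - origin_lon), math.copysign(y, lat - origin_lat))


# Hypothetical origin: 616 Massachusetts Ave, Cambridge (from Appendix A).
ORIGIN = (42.3650774, -71.1032368)
# 1 Broadway, Cambridge (from Appendix A) expressed in local miles.
x, y = to_cartesian_mi(42.362805, -71.084037, *ORIGIN)
```

Splitting the displacement into an east-west leg and a north-south leg is a small-area approximation; over a five-mile radius the curvature error is negligible compared with geocoding error.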

The main innovation in the above steps is the availability of the GeoCoder tool, which allows batch queries of addresses to either Yahoo or Google mapping applications. Though the queries are not always correct, it dramatically reduces the time required to generate GPS coordinates (latitude and longitude) from text-based addresses.

Franchise Storefronts

The franchise storefront addresses are readily available on both the Dunkin’ Donuts4 and Starbucks5 corporate websites. In both cases, the search criteria were limited to a target area within five miles of ZIP code 02139, which resolves to a location near Central Square in Cambridge, MA. In addition, all franchise storefront locations at Logan International Airport were removed under the assumption that airline customers are not locally-quantifiable customers. With these restrictions, there were a total of 163 Dunkin’ Donuts and 59 Starbucks franchise storefronts identified in the target area.

2 GeoCoder tool provides search queries using Yahoo or Google: http://www.gpsvisualizer.com/geocoder/
3 Haversine formula computes great-circle distances: http://en.wikipedia.org/wiki/Haversine_formula
4 Dunkin’ Donuts store locator: https://www.dunkindonuts.com/aboutus/store/Search.aspx
5 Starbucks store locator (legacy): http://ie.starbucks.com/en-ie/_Our+Stores/

MBTA Stations

As noted in one journal article, the optimal storefront placement for “discretionary services” may be at intersections of high pedestrian traffic.6 In the Boston area, the MBTA public transportation system hosts an average weekday ridership of 1.24 million customers7 as of April 2010 and is a prime target for storefront location placement. In this project, MBTA stations on the red, blue, green, orange, and silver lines were considered as inputs for a potential customer model. Also, as addresses are not widely used for these stations, a freely-distributable list of 142 stations, current through 2006 and including GPS coordinates, was used for station location data.8

Visualizations

As an important part of gathering data, visualizations were used throughout the project to verify locations. Figure 1 (below) shows plots of the storefront locations and MBTA stations using both GPS and Cartesian coordinate systems. In the Cartesian coordinate system, the five-mile radius is highlighted.

Figure 1: a) Raw GPS Position Coordinates b) Cartesian Position Coordinates with 5-Mile Radius Highlighted

To improve the context of the franchise storefronts and MBTA stations, the location data was overlaid on an area map9, as shown in Figure 2.

6 Berman, O., Larson, R., Fouska, N., “Optimal Location of Discretionary Service Facilities,” Transportation Science, Vol. 26, No. 3, pp. 201-211, August 1992.
7 Davey, R., “MBTA Scorecard,” April 2010. Retrieved 4/25/2010 from http://mbta.com/about_the_mbta/scorecard/
8 Demaine, E., “Boston Subway Google Map.” Retrieved 4/25/2010 from http://erikdemaine.org/maps/mbta/
9 Background map retrieved from Google Maps: http://maps.google.com


Figure 2: Location Data Overlaid on Map

Population Density

Gathering population density data was a challenge for this project. Although population data is commonly available from decadal censuses, it is usually aggregated by county or city, which is not conducive to spatial analysis. Fortunately, an online “Digital Atlas” of Boston includes population maps based on the 1990 census, using red dots to represent 100 persons randomly distributed within a census tract.10

With some post-processing using Adobe Photoshop, the image was cropped, resized, and filtered to display only the population information, which is readable using built-in MATLAB image processing functions. The processed data is shown in Figure 3. Though there are some concerns over the accuracy of the resulting population data,11 it should be internally consistent and helpful to the modeling process.

10 Bowen, W., “Boston and Vicinity: Total Population,” 1997. Retrieved 4/25/2010 from http://130.166.124.2/boston/bos1.GIF
11 It is unclear whether a “dot” is one pixel or two and whether the pixels were sampled with or without replacement. In some cases, one pixel could represent somewhere between 50 and 100 people, or more if dots overlap, though from rough estimates, 100 people per pixel seems to provide accurate population data.


Figure 3: a) Raw Population Data for Boston Area b) Processed Population Data of Target Area

Probabilistic Modeling

Within the topics covered in ESD.86, the discussion of spatial probability distributions involved the “nearest neighbor” problem of finding the expected distance to the closest neighbor from a random point. This problem uses a uniform distribution to select the “customer” and a spatial Poisson distribution for the neighboring “storefronts” within a specified area.

If this type of problem is to be extended to a real-world case of storefronts, ultimately selecting whether Dunkin’ Donuts or Starbucks is a closer neighbor for random customers, the distributions of both the customers and storefronts should be investigated. A city-wide Poisson distribution of storefronts is not likely to hold as there is clearly some location dependence in the storefront placing. On a smaller scale, however, a spatial Poisson distribution is conceivable, as the exact placement of a storefront within a small area may be independent of others. In a similar sense, a city-wide uniform distribution of customers is not likely to hold as there are significantly higher concentrations of customers in the city-centers. On smaller scales, however, uniformly-distributed customers may be a valid approximation.

To implement the concept of piecewise spatially Poisson distributed storefronts and uniformly distributed customers, the initial 78.5 square mile target area (a circle with a 5 mile radius) was reduced to a 49 square mile subregion (a square with 7 miles per side). This area was then broken down into 100 square sectors, each 0.7 miles per side, or 0.49 square miles in area.
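The sector grid can be sketched as a simple binning of Cartesian coordinates. The snippet below is a hypothetical illustration: it assumes the 7-mile square is centered on the coordinate origin and numbers rows from south to north and columns from west to east, neither of which is specified in this report.

```python
import math

SIDE_MI = 7.0                      # side of the square target area
N_PER_SIDE = 10                    # 10 x 10 grid gives 100 sectors
SECTOR_MI = SIDE_MI / N_PER_SIDE   # 0.7 miles per sector side


def sector_index(x, y):
    """Return (row, col) of the sector containing (x, y) in miles, or None if outside.

    Assumes the square is centered on the origin (an assumption of this sketch).
    """
    half = SIDE_MI / 2
    if not (-half <= x < half and -half <= y < half):
        return None
    col = int(math.floor((x + half) / SECTOR_MI))
    row = int(math.floor((y + half) / SECTOR_MI))
    # Guard against floating-point round-off at the upper edge.
    return (min(row, N_PER_SIDE - 1), min(col, N_PER_SIDE - 1))


circle_area = math.pi * 5.0 ** 2   # about 78.5 square miles
sector_area = SECTOR_MI ** 2       # about 0.49 square miles
```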

Figure 4: a) Target Area Divided into Sectors b) Sectors Highlighted by Neighborhood Assignment

With the relatively fine level of sector definition, many were not large enough to contain a storefront. In order to determine a non-zero storefront density for each unit of analysis, sectors were grouped into seven “neighborhoods.” The sizing of each neighborhood along with a description is provided in Table 1.

Table 1: Neighborhood Descriptions

Neighborhood  Number of Sectors  Description
Northwest      8                 Cambridge Highlands, Mt. Auburn, East Watertown
Cambridge     18                 Cambridge, East Cambridge
Northeast     21                 Everett, Somerville, Charlestown
Downtown       4                 Downtown Boston, North End
Back Bay       3                 Back Bay
Southwest     17                 Brookline, Aberdeen, Brighton
Southeast     17                 Roxbury, South Boston, Harrison Lenox

The process of neighborhood definition was done by hand using approximate city or geographical boundaries. The neighborhoods do not exactly correspond to the geographical equivalents due to the discretization of sectors, though the labeling scheme helps infer relative location. The only requirement of each neighborhood is that it must contain both a Dunkin’ Donuts and a Starbucks storefront, providing a non-zero storefront density. In some cases, sectors did not fit into an existing neighborhood, nor did they exhibit enough information to establish a new neighborhood, so they went unused.

Storefront Distribution Model

Using the neighborhood definitions, the storefront distribution model constructs a piecewise spatial Poisson distribution for each neighborhood. Since there is at least one storefront from each franchise in each neighborhood, the storefront densities are non-zero in all neighborhoods.

Figure 5: Storefront Locations by Neighborhood

Using a count of the number of storefronts within each neighborhood and the associated area, the storefront density parameter was determined for each neighborhood, as shown in Table 2.

Table 2: Storefront Model Parameters

                          Number of Storefronts        Storefront Density (γ, 1/mi²)
Neighborhood  Area (mi²)  Starbucks  Dunkin’ Donuts    Starbucks  Dunkin’ Donuts  Either
Northwest      3.92        1           1                0.2551     0.2551          0.5102
Cambridge      8.82       11          23                1.2472     2.6077          3.8549
Northeast     10.29        2          26                0.1944     2.5267          2.7211
Downtown       1.96       13          27                6.6327    13.7755         20.4082
Back Bay       1.47       16          14               10.8844     9.5238         20.4082
Southwest      8.33       10          19                1.2005     2.2809          3.4814
Southeast      8.33        6          20                0.7203     2.4010          3.1212
Total         43.12       59         130                1.3683     3.0148          4.3831

Customer Distribution Model

The customer distribution model uses the same sector and neighborhood definitions as the storefront distribution model. Using the assumption that customers are distributed in proportion to population, the customer distribution model uses the population data to weight the relative probabilities of customers emerging from each neighborhood.

Figure 6: Sectors with Estimated Population

Using the population density information previously gathered, the number of potential customers is determined for each sector, which is then aggregated into neighborhoods. The probability of a customer emerging from a particular neighborhood is in proportion to the neighborhood’s population fraction, as shown in Table 3.

Table 3: Customer Model Parameters

Neighborhood  Estimated Population  Customer Probability
Northwest      46900                0.0558
Cambridge     180600                0.2148
Northeast     162900                0.1937
Downtown       40800                0.0485
Back Bay       51700                0.0615
Southwest     182500                0.2170
Southeast     175500                0.2087
Total         840900                1.0000

City-wide Spatial Poisson Distribution

As an aside, I thought it would be interesting to test the hypothesis that Dunkin’ Donuts and Starbucks franchise storefronts follow a Poisson spatial distribution over all 100 sectors. This null hypothesis was tested with the Pearson goodness-of-fit test, using a Poisson distribution parameter estimated from the average number of storefronts per sector (1.36 Dunkin’ Donuts per sector, 0.59 Starbucks per sector).

In both cases, the null hypothesis was rejected (α=0.05, p=1). To illustrate why the hypothesis was rejected at such a high confidence level, histograms of expected versus observed storefronts per sector are displayed below. The Poisson spatial distribution over-estimates the low-frequency sectors and under-estimates the high-frequency sectors, which makes sense given a high concentration of storefronts in the city-center.

Figure 7: Histograms Generated During Goodness-of-Fit Test for Spatial Poisson Distribution

It should also be noted that there is some flexibility for running the test with different numbers of bins and different sized sectors. With larger sectors, there are more storefronts on average allowing more numerous bins, but the downside is that the frequency in each bin is decreased.
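The mechanics of the goodness-of-fit test described above can be sketched as follows. This is a minimal illustration rather than the project’s implementation: the function names, the binning rule (one bin per count plus a final tail bin), and the sector counts in the example are all hypothetical, and the statistic is compared against the tabulated chi-square critical value of 5.991 for two degrees of freedom at α=0.05.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def pearson_poisson_statistic(counts_per_sector, n_bins):
    """Chi-square statistic for H0: sector counts follow Poisson(sample mean).

    Bins are 0, 1, ..., n_bins - 2, plus a final bin for counts >= n_bins - 1.
    """
    n = len(counts_per_sector)
    lam = sum(counts_per_sector) / n           # estimated Poisson parameter
    observed = [0] * n_bins
    for c in counts_per_sector:
        observed[min(c, n_bins - 1)] += 1
    probs = [poisson_pmf(k, lam) for k in range(n_bins - 1)]
    probs.append(1.0 - sum(probs))              # tail probability
    expected = [n * p for p in probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    dof = n_bins - 2                            # bins - 1 - one estimated parameter
    return stat, dof, observed, expected


# Hypothetical, clustered counts for 100 sectors (not the project's data).
example_counts = [0] * 60 + [1] * 15 + [2] * 10 + [8] * 15
stat, dof, observed, expected = pearson_poisson_statistic(example_counts, n_bins=4)
reject = stat > 5.991  # chi-square critical value for dof = 2, alpha = 0.05
```

Clustered counts like these produce too many empty sectors and too many crowded sectors relative to a Poisson distribution with the same mean, which is the pattern described for the city-wide storefront data.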

Comparative Analysis

Armed with the probabilistic models for both storefront and customer distributions, the next step is to apply the models to compare the expected distances to each franchise between the neighborhoods, over the entire city, and also to test the validity of the models by processing existing data.

Visualizations

In order to frame the resulting discussion, several visualizations of the storefront, MBTA station, and population density are provided below. Storefronts are clearly focused in the Downtown and Back Bay neighborhoods (where the population density is highest), though there is also significant population density in the Cambridge and Southwest neighborhoods without similar storefront densities. Also, the MBTA station density is higher in the Southwest neighborhood due to the numerous stops.

Figure 8: Sector-based Heat Maps of Location and Population Data

Nearest Storefront Analysis

The primary comparative analysis between Dunkin’ Donuts and Starbucks seeks to determine which franchise is closer to more customers. From the customer’s perspective, the closest storefront may determine which franchise gets their business.

Model Results

Within the Poisson nearest neighbor model, the probability density function and expected value for distance to the nearest neighbor take the form of closed-form solutions given below for both Euclidean (De) and Manhattan (Dm) distance metrics, where r is the distance to the nearest storefront and γ is the storefront density.

$$f_{D_e}(r) = 2\pi\gamma r\, e^{-\pi\gamma r^2}, \qquad E[D_e] = \int_0^\infty r\, f_{D_e}(r)\, dr = \sqrt{\frac{1}{4\gamma}}$$

$$f_{D_m}(r) = 4\gamma r\, e^{-2\gamma r^2}, \qquad E[D_m] = \int_0^\infty r\, f_{D_m}(r)\, dr = \sqrt{\frac{\pi}{8\gamma}}$$

By applying the nearest-neighbor formulas, the expected distance to the nearest storefront of one or either franchise can be determined for each neighborhood. When combined with the customer model, aggregated values can be determined for the target area as a whole.
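As a worked sketch of this calculation, the snippet below evaluates the two closed-form expectations per neighborhood and weights them by population fraction. The areas, storefront counts, and populations are copied from Tables 2 and 3 (the counts are those in the first storefront-count column of Table 2); the function and variable names are assumptions of this example and not part of the original analysis.

```python
import math


def expected_euclidean(gamma):
    """E[De] = sqrt(1 / (4 * gamma)) for storefront density gamma (1/mi^2)."""
    return math.sqrt(1.0 / (4.0 * gamma))


def expected_manhattan(gamma):
    """E[Dm] = sqrt(pi / (8 * gamma)) for storefront density gamma (1/mi^2)."""
    return math.sqrt(math.pi / (8.0 * gamma))


# (area in mi^2, storefront count, estimated population) per neighborhood.
NEIGHBORHOODS = {
    "Northwest": (3.92, 1, 46900),
    "Cambridge": (8.82, 11, 180600),
    "Northeast": (10.29, 2, 162900),
    "Downtown": (1.96, 13, 40800),
    "Back Bay": (1.47, 16, 51700),
    "Southwest": (8.33, 10, 182500),
    "Southeast": (8.33, 6, 175500),
}


def aggregate(expected_fn):
    """Population-weighted expected distance over all neighborhoods."""
    total_pop = sum(pop for _, _, pop in NEIGHBORHOODS.values())
    return sum(
        (pop / total_pop) * expected_fn(count / area)
        for area, count, pop in NEIGHBORHOODS.values()
    )


agg_euclidean = aggregate(expected_euclidean)
agg_manhattan = aggregate(expected_manhattan)
```

With these inputs the aggregated values should land close to the first Euclidean and first Manhattan entries of the “Aggregated” row in Table 4, up to rounding.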

Table 4: Probabilistic Model Results by Neighborhood

                              Expected Min Euclidean Distance (De, mi)    Expected Min Manhattan Distance (Dm, mi)
Neighborhood  Customer Prob.  Starbucks  Dunkin’ Donuts  Either           Starbucks  Dunkin’ Donuts  Either
Northwest     0.0558          0.9899     0.9899          0.7000           1.2407     1.2407          0.8773
Cambridge     0.2148          0.4477     0.3096          0.2547           0.5611     0.3881          0.3192
Northeast     0.1937          1.1341     0.3146          0.3031           1.4214     0.3942          0.3799
Downtown      0.0485          0.1941     0.1347          0.1107           0.2433     0.1688          0.1387
Back Bay      0.0615          0.1516     0.1620          0.1107           0.1899     0.2031          0.1387
Southwest     0.2170          0.4563     0.3311          0.2680           0.5719     0.4149          0.3359
Southeast     0.2087          0.5891     0.3227          0.2830           0.7384     0.4044          0.3547
Aggregated    1.0000          0.6118     0.3383          0.2819           0.7668     0.4240          0.3533

The Downtown and Back Bay neighborhoods have the shortest expected distance to either storefront, with an average of just 580 feet to the closest Dunkin’ Donuts or Starbucks. A few neighborhoods show drastically different storefront placement strategies between the two franchises. Dunkin’ Donuts holds a large advantage in the Northeast and a moderate advantage in the Southeast. The only neighborhood where Starbucks holds an advantage is the Back Bay.


Figure 9: Expected Distance Advantage (Blue: Dunkin' Donuts, Green: Starbucks)

Comparison with “Exact” Population Density Demand

As the data set is relatively small, an exhaustive analysis can be performed on the raw data for a comparison to the model. In this comparison, each population “dot” is used as a simulated customer, maintaining the underlying customer probability distribution. For each customer, the actual closest storefront is located using either the Euclidean or Manhattan distance metric. To report data consistently with the model results, the customers and minimum distances are aggregated by sector and neighborhood.
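The exhaustive comparison amounts to a brute-force nearest-storefront search for every simulated customer. The sketch below shows the idea with hypothetical coordinates (in miles) and hypothetical function names; it is not the project’s code, and the aggregation by sector and neighborhood is omitted.

```python
import math


def euclidean(p, q):
    """Straight-line distance between points p and q."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def manhattan(p, q):
    """Right-angle (grid) distance between points p and q."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def mean_min_distance(customers, storefronts, metric):
    """Mean over customers of the distance to the closest storefront."""
    total = sum(min(metric(c, s) for s in storefronts) for c in customers)
    return total / len(customers)


# Hypothetical example: one customer and two storefronts.
customers = [(0.0, 0.0)]
storefronts = [(0.7, 0.7), (1.1, 0.0)]
mean_e = mean_min_distance(customers, storefronts, euclidean)
mean_m = mean_min_distance(customers, storefronts, manhattan)
```

In this small example the Euclidean metric selects the storefront at (0.7, 0.7) while the Manhattan metric selects the one at (1.1, 0.0), which illustrates that the closest storefront can depend on the metric.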

Table 5: Population-based Customer Results by Neighborhood

                              Mean Min Euclidean Distance (De, mi)        Mean Min Manhattan Distance (Dm, mi)
Neighborhood  Customer Count  Starbucks  Dunkin’ Donuts  Either           Starbucks  Dunkin’ Donuts  Either
Northwest      469            0.8599     0.6174          0.6315           1.1724     0.7626          0.7804
Cambridge     1806            0.4329     0.2995          0.2569           0.5536     0.3753          0.3202
Northeast     1629            1.2250     0.2788          0.2730           1.6372     0.3526          0.3435
Downtown       408            0.2485     0.2248          0.1812           0.3485     0.2805          0.2228
Back Bay       517            0.1604     0.1722          0.1448           0.2125     0.2145          0.1817
Southwest     1825            0.4499     0.3207          0.2641           0.5260     0.4043          0.3305
Southeast     1755            0.7784     0.3098          0.2796           0.8902     0.3897          0.3506
Aggregated    8409            0.6602     0.3085          0.2766           0.8314     0.3873          0.3457

Remarkably, the results closely mirror those of the model using the estimated expected distance under uniformly distributed customers and Poisson spatially-distributed storefronts in each sector. All of the aggregated measures are within 10% of the previous estimates. The largest neighborhood-specific differences occurred in the Northeast and Northwest neighborhoods (with errors around 0.5 miles for the advantage of the nearest storefront), indicating that the spatial Poisson distribution may not be a good fit for these regions.


Figure 10: a) Minimum Distance Advantage (Blue: Dunkin' Donuts, Green: Starbucks) b) Absolute Expected Distance Error

The effectiveness of the model can be statistically evaluated using a paired t-test by neighborhood. The null hypothesis that the mean distance to the closest storefront is the same under the two approaches cannot be rejected for Dunkin’ Donuts (p=0.418), Starbucks (p=0.492), or either franchise (p=0.994) using Euclidean distance.
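A paired t-test of this kind can be sketched with the standard library alone. The example below pairs the Dunkin’ Donuts Euclidean values from Table 4 (model) with those from Table 5 (population-based) by neighborhood. Whether these are exactly the inputs used for the reported p-values is an assumption, as are the function names and the use of Simpson’s rule to integrate the t density.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """P(|T| > |t|), using Simpson's rule on the t density over [0, |t|]."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (total * h / 3.0)


def paired_t_test(a, b):
    """Return (t statistic, degrees of freedom, two-sided p-value) for paired samples."""
    n = len(a)
    d = [x - y for x, y in zip(a, b)]
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)   # sample variance of differences
    t = mean / math.sqrt(var / n)
    return t, n - 1, two_sided_p(t, n - 1)


# Dunkin' Donuts Euclidean distances by neighborhood (Tables 4 and 5).
model = [0.9899, 0.3096, 0.3146, 0.1347, 0.1620, 0.3311, 0.3227]
actual = [0.6174, 0.2995, 0.2788, 0.2248, 0.1722, 0.3207, 0.3098]
t_stat, dof, p_value = paired_t_test(model, actual)
```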

Comparison with “Exact” MBTA Station Demand

Under the theory that optimal storefronts should be placed at the intersections of high-traffic areas, MBTA stations are a prime location for Dunkin’ Donuts and Starbucks. This next section investigates how the minimum distance to the closest storefront varies if customers originate only from MBTA stations, rather than their residences. MBTA stations for the red, green, blue, orange, and silver lines within the neighborhoods are used.

Table 6: MBTA-based Customer Results by Neighborhood

                                  Mean Min Euclidean Distance (De, mi)        Mean Min Manhattan Distance (Dm, mi)
Neighborhood  MBTA Station Count  Starbucks  Dunkin’ Donuts  Either           Starbucks  Dunkin’ Donuts  Either
Northwest       0                 -          -               -                -          -               -
Cambridge      14                 0.1628     0.0733          0.055            0.1707     0.0932          0.071
Northeast       6                 1.1817     0.2443          0.2681           1.5962     0.3321          0.3639
Downtown       11                 0.2182     0.1063          0.091            0.3292     0.1394          0.1204
Back Bay       12                 0.1199     0.1003          0.0643           0.1326     0.1216          0.0765
Southwest      45                 0.4340     0.2894          0.2314           0.5378     0.3517          0.2809
Southeast      33                 0.6083     0.2094          0.1799           0.7119     0.2744          0.2343
Aggregated    121                 0.4365     0.2049          0.1694           0.5361     0.2576          0.2132

In every neighborhood except for Southwest’s Starbucks with Manhattan distance, the mean distance from an MBTA station to the nearest storefront is lower than the corresponding population-based distance. In the case of Cambridge, the mean distances were 60-75% lower, indicating a strong preference for storefront locations near public transport stations. In the aggregate sense, the mean distances were consistently around 35% lower than simply using population density data.

Figure 11: Minimum Distance Advantage (Blue: Dunkin' Donuts, Green: Starbucks) with MBTA Customers

The difference in expected distance between the population-based and MBTA-based models can be statistically evaluated using a paired t-test by neighborhood. The null hypothesis that the mean distance to the closest storefront is the same under the two approaches can be rejected for Dunkin’ Donuts (p=0.022) and either franchise (p=0.028), but not for Starbucks (p=0.072), using Euclidean distance.

Distance Metric Comparison

Euclidean vs. Manhattan Distance Metrics

In ESD.86, as a part of the spatial probability distribution lecture, we found the following relation for the expected ratio between the Manhattan and Euclidean distance metrics.

$$E\left[\frac{D_m}{D_e}\right] = \frac{4}{\pi}$$

This formula, however, assumes that the origin and destination remain constant between the two metrics. In the application used in this project, of finding the nearest storefront using one or the other metric, the nearest physical location may differ between the two metrics. For example, even if storefront A is closest under the Euclidean metric, there may be another storefront B that is closer under the Manhattan metric. Therefore, in this application, we would expect the ratio between distance metrics to be less than 4/π.
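The 4/π value can be recovered with a short calculation. As a sketch, assume (this is not stated in the report) that the direction θ from the origin to a fixed destination is uniformly distributed on [0, 2π) relative to the street grid, so that Dm = De(|cos θ| + |sin θ|):

$$E\left[\frac{D_m}{D_e}\right] = \frac{1}{2\pi}\int_0^{2\pi} \left(|\cos\theta| + |\sin\theta|\right) d\theta = \frac{4 + 4}{2\pi} = \frac{4}{\pi} \approx 1.2732$$

When each metric selects its own nearest storefront, the Manhattan minimum can be no larger than the Manhattan distance to the Euclidean-nearest storefront s*:

$$\min_s D_m(s) \le D_m(s^*), \qquad s^* = \arg\min_s D_e(s)$$

which is consistent with a ratio of nearest-storefront distances at or below 4/π.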

To investigate, the 8409 pair-wise distances for the population-based analysis can be plotted against each other in a scatter plot. As expected, the Manhattan metric distance is never less than the Euclidean metric distance, but the expected ratio (highlighted in red) does not appear to be at the center of the distribution.

Figure 12: Scatter Plot of Closest Dunkin' Donuts Storefront under Different Distance Metrics

Using a hypothesis test on the mean, the ratio of Manhattan-to-Euclidean distances is found to be 1.2617 with a 95% confidence interval of [1.2591, 1.2642]. Of note, this interval does not include 4/π ≈ 1.2732, meaning the expected ratio between the closest-storefront distance under the Manhattan metric and the closest-storefront distance under the Euclidean metric is not 4/π in this application, though for all practical purposes, the approximation is fine.

Distance Metrics vs. Google Distance

Aside from how the distance metrics relate to each other, it is of interest to see how they compare to “real” distance calculations. Google Maps provides a direction generation service that provides a walking distance between an origin and a destination point. Although it is “still under development,” it often provides more realistic distance calculations based on obstructions such as waterways, highways, crooked Boston streets, and buildings.

Since the question of distance metric accuracy does not rely on the underlying customer distribution, customers were selected at random with a uniform distribution. In total, 80 customer origins were generated, each being paired with the closest Dunkin’ Donuts (via Euclidean metric) and the GPS coordinates were used to find the walking distance in Google Maps.

Figure 13: Paired Customers-Dunkin' Donuts for Distance Comparison

There are a few challenges to using the Google distance metric. First, the output is typically rounded to the nearest 0.1 mile, which causes some accuracy problems. Second, there are some locations for which walking directions do not exist (e.g. if the point falls in the middle of a highway) – these points were omitted from the analysis. Aside from these points, there were a couple “outliers” in the resulting data set where the Google distance was (much) greater than the Euclidean or Manhattan distance. These points typically corresponded to navigating the network of roads and highways between the coastal islands near Logan Airport.


Figure 14: Distance Ratios for a) Google vs. Euclidean b) Google vs. Manhattan

In general, the Manhattan metric outperformed the Euclidean metric when compared to the Google distance. The Euclidean metric under-estimated the Google distances by an average of 28% (37% if outliers are included). The Manhattan metric under-estimated the Google distance by an average of 9% (20% if outliers are included). Note that in this case, the Manhattan-to-Euclidean metric ratio confidence interval does include 4/π, as the origin-destination locations are invariant under the choice of metric.

Table 7: Distance Metric Ratios

        Without Outliers                With Outliers
Ratio   95% CI LB  Mean    95% CI UB    95% CI LB  Mean    95% CI UB
Dg/De   1.3191     1.4019  1.4847       1.2955     1.6151  1.9348
Dg/Dm   1.0356     1.1031  1.1706       1.0253     1.2615  1.4977
Dm/De   1.2562     1.2821  1.3080       1.2577     1.2831  1.3084

Conclusions

In conclusion, Dunkin’ Donuts holds a stronger grasp on the Cambridge-Boston area coffee market. The areas of greatest advantage for Dunkin’ Donuts include the Northeast and the Southeast. Only in the Back Bay neighborhood does Starbucks hold a shorter expected distance. Contributing to this analysis, several key assumptions have been checked and are summarized as follows.

Can the franchise storefronts be modeled with a spatial Poisson distribution?

On a city-wide scale, the spatial Poisson distribution does not accurately model the franchise storefront locations. However, on a smaller scale such as neighborhoods, especially in regions of uniform characteristics and low-density, spatial Poisson distributions can be used to accurately model the locations of franchise storefronts.

Can the customers be modeled with a uniform distribution?

Similar to the storefront distribution method, although it is difficult to model an entire city with a uniform customer distribution, on a smaller scale such as neighborhoods, piecewise-uniform customer distributions can be used to model demands.

Does the “nearest-neighbor” distance correlate with the “actual” closest storefront distance?

The “nearest-neighbor” expected distance was not statistically different from the actual distances to the closest storefront for both Dunkin’ Donuts and Starbucks, indicating that it seems to be an accurate estimation of actual closest storefronts. The approximation was more accurate in neighborhoods with a higher storefront density, such as Downtown and Back Bay, and less accurate in less well-defined neighborhoods such as the Northeast and Northwest.

Is the Euclidean or Manhattan distance metric appropriate for pedestrian walking paths?

When comparing the Euclidean and Manhattan distance metrics as applied in the “nearest-neighbor” problem, the differences observed were slightly less than the expected ratio due to changes in the closest storefront when transitioning from one metric to the other. When compared to a realistic Google distance metric (using Google Maps to calculate the walking distance), both the Euclidean and Manhattan metrics under-estimated the distances, though the Manhattan metric was generally closer (within 9% not considering outliers). There were a few outliers observed corresponding to customers originating in “difficult to access” spots such as between islands connected by few roads and/or highways.

Appendix A: Raw Data

Note: I cannot claim this data set is complete or accurate. I do not know how up-to-date the Dunkin’ Donuts and Starbucks corporate store locator web applications are (for example, the Dunkin’ Donuts location in the MIT Student Center is missing), nor can I vouch for the accuracy of the GeoCoder service. Whenever possible, I compared the coordinates from both the Google and Yahoo services in an attempt to detect any significant deviations, though this would not catch incorrect or missing addresses.

Dunkin’ Donuts Locations

Address | Latitude | Longitude
616 Massachusetts Ave, Cambridge, MA 02139 | 42.3650774 | -71.1032368
222 Broadway, Cambridge, MA 02142 | 42.3664749 | -71.0938686
1001 Cambridge St., Cambridge, MA 02141 | 42.372795 | -71.093455
1 Bow St, Cambridge, MA 02138 | 42.3721864 | -71.115655
65 Jfk St, Cambridge, MA 02138 | 42.3720035 | -71.1206418
1 Broadway, Cambridge, MA 02142 | 42.362805 | -71.084037
530 Commonwealth Ave, Boston, MA 02215 | 42.3487341 | -71.096409
282 Somerville Ave, Somerville, MA 02143 | 42.3790404 | -71.0940013
219 Cambridge St, Allston, MA 02134 | 42.3583 | -71.1263
1008 Beacon St, Brookline, MA 02446 | 42.3458805 | -71.1082714
1020 Commonwealth Ave, Boston, MA 02215 | 42.3517146 | -71.1216648
14 Mcgrath Hwy, Somerville, MA 02143 | 42.3745146 | -71.0855527
Station, Cambridge, MA 02163 | 42.373362 | -71.118956
209 N Harvard St, Allston, MA 02134 | 42.362548 | -71.1303143
519 Somerville Ave, Somerville, MA 02143 | 42.3831933 | -71.1062606
333 Newbury St, Boston, MA 02115 | 42.3485475 | -71.0864513
5 3rd St, Cambridge, MA 02141 | 42.3724142 | -71.0794484
1420 Boylston St, Boston, MA 02215 | 42.3435354 | -71.1015308
153 Massachusetts Ave, Boston, MA 02115 | 42.34649 | -71.0873454
100 Cambridgeside Place, Cambridge, MA 02141 | 42.367101 | -71.076376
90 Washington St, Somerville, MA 02143 | 42.381481 | -71.0851061
330 Brookline Ave, Boston, MA 02215 | 42.340164 | -71.105855
715 Boylston St, Boston, MA 02116 | 42.3497 | -71.08
800 Boylston St, Boston, MA 02199 | 42.348296 | -71.083099
210 Harvard Ave, Allston, MA 02134 | 42.3499 | -71.1302
179 Brighton Ave, Allston, MA 02134 | 42.3532517 | -71.1336082
53 Huntington Ave, Boston, MA 02199 | 42.3484649 | -71.0780325
1316 Beacon St, Brookline, MA 02446 | 42.3423157 | -71.1211249
350 Longwood Ave, Boston, MA 02115 | 42.3386435 | -71.1068662
154 Highland Ave, Somerville, MA 02143 | 42.3884292 | -71.1035077
283 Huntington Ave, Boston, MA 02115 | 42.3420175 | -71.0859189
430 Stuart St, Boston, MA 02116 | 42.3484 | -71.075
99 Cambridge St, Charlestown, MA 02129 | 42.3823624 | -71.0791666
457 Brookline Ave, Boston, MA 02215 | 42.3376055 | -71.1087189
145 Dartmouth St, Boston, MA 02116 | 42.34719 | -71.075406
60 Everett St, Allston, MA 02134 | 42.3559771 | -71.1385555
1 White St, Cambridge, MA 02140 | 42.3885801 | -71.1190066
509 Cambridge St, Allston, MA 02134 | 42.353605 | -71.1374228
434 Mass Ave, Boston, MA 02118 | 42.334632 | -71.0691073
220 Broadway, Somerville, MA 02145 | 42.3896854 | -71.0883978
125 Nashua St, Boston, MA 02114 | 42.3678 | -71.0649
Causeway Street, Boston, MA 02114 | 42.3654553 | -71.0611291
8 Park Plz, Boston, MA 02116 | 42.3521066 | -71.0673109
709 McGrath Hwy, Somerville, MA 02145 | 42.3904617 | -71.0873328
1631 Tremont St, Boston, MA 02120 | 42.3341 | -71.1039
106 Cambridge St, Boston, MA 02114 | 42.3611697 | -71.0628675
59 Causeway St, Boston, MA 02114 | 42.3643423 | -71.0633287
100 Cambridge St, Boston, MA 02114 | 42.360367 | -71.062124
22 Beacon St, Boston, MA 02108 | 42.3577347 | -71.063153
1131 Tremont St, Boston, MA 02120 | 42.3353579 | -71.0878575
1 Legends Way, Boston, MA 02114 | 42.3643579 | -71.0661193
11 Austin St, Charlestown, MA 02129 | 42.3751605 | -71.0650097
80 Boylston St, Boston, MA 02116 | 42.3523815 | -71.0650276
100 Legends Way, Boston, MA 02114 | 42.3643579 | -71.0661193
180 Canal St, Boston, MA 02114 | 42.3652919 | -71.0609754
127 Tremont St, Boston, MA 02108 | 42.3564 | -71.0618
8 Harvard St, Brookline, MA 02445 | 42.3334223 | -71.1188349
630 Washington St., Boston, MA 02111 | 42.352419 | -71.062727
750 Washington St, Proger Bldg, Boston, MA 02111 | 42.3490894 | -71.0601767
16 Tremont St., Boston, MA 02116 | 42.3592205 | -71.0595142
616 Massachusetts Ave, Boston, MA 02118 | 42.3369897 | -71.0774615
1 Summer St, Boston, MA 02110 | 42.354485 | -71.05942
16 Kneeland St, Boston, MA 02111 | 42.3508657 | -71.0622508
417 Washington St, Boston, MA 02108 | 42.3557073 | -71.0602716
76 Middlesex Ave, Somerville, MA 02145 | 42.3932588 | -71.0832451
20 Boylston St, Brookline, MA 02445 | 42.3318209 | -71.1174846
504 Broadway, Somerville, MA 02145 | 42.3975134 | -71.104938
214 N. Beacon St, Brighton, MA 02135 | 42.3558 | -71.149
1138 Washington St, Boston, MA 02118 | 42.3438149 | -71.0660403
244 Elm St, Somerville, MA 02144 | 42.3953081 | -71.1218721
100 City Hall Plaza, Boston, MA 02114 | 42.3582252 | -71.0590107
235 Washington St, Boston, MA 02108 | 42.3578883 | -71.0579421
2 City Hall Sq, Boston, MA 02108 | 42.3577028 | -71.0593792
498 Mystic Ave, Stop & Shop, Somerville, MA 02145 | 42.3930959 | -71.0884036
1 Congress St, Boston, MA 02114 | 42.362775 | -71.059621
101 Summer St., Boston, MA 02110 | 42.3535216 | -71.0578313
10 Winthrop Sq, Boston, MA 02110 | 42.354979 | -71.0577489
20 North St, Boston, MA 02109 | 42.3608194 | -71.0558302
3 Post Office Square, Boston, MA 02109 | 42.35672 | -71.056577
736 Cambridge St, Brighton, MA 02135 | 42.3486807 | -71.1496925
850 Harrison Ave, Roxbury, MA 02118 | 42.3348927 | -71.0751815
265 Boylston St, Brookline, MA 02445 | 42.3305118 | -71.1238184
111 , Boston, MA 02109 | 42.3591596 | -71.0548161
176 Federal St, Boston, MA 02110 | 42.3536133 | -71.0561934
2360 Washington St, Roxbury, MA 02119 | 42.3294746 | -71.084687
517 Concord Ave, Cambridge, MA 02138 | 42.3863016 | -71.1387706
One Fleet Center, Boston, MA 02114 | 42.365841 | -71.060724
230 Congress St, Boston, MA 02110 | 42.3543283 | -71.0543216
265 Franklin St, Boston, MA 02109 | 42.356767 | -71.0535218
201 Alewife Brook Pky, Cambridge, MA 02138 | 42.3892258 | -71.1429552
850 Broadway, Somerville, MA 02144 | 42.4009618 | -71.1169843
17 Melnea Cass Blvd, Boston, MA 02119 | 42.3312853 | -71.0748777
350 Washington St, Brighton, MA 02135 | 42.3490753 | -71.1530339
70 E India Row, Boston, MA 02110 | 42.3579577 | -71.0508084
635 Mount Auburn St, Watertown, MA 02472 | 42.3714388 | -71.157704
22 W Broadway, South Boston, MA 02127 | 42.3425236 | -71.0565548
315 Centre St, , MA 02130 | 42.3232 | -71.1036
2480 Massachusetts Ave, Cambridge, MA 02140 | 42.398885 | -71.1321608
5 Cambridgepark Dr, Cambridge, MA 02140 | 42.3946428 | -71.1424549
268 Summer St, Boston, MA 02210 | 42.3503648 | -71.0503943
275 Mystic Ave, Medford, MA 02155 | 42.4053859 | -71.101411
485 Arsenal St., Watertown, MA 02472 | 42.363237 | -71.155593
330 Congress St, Boston, MA 02210 | 42.351117 | -71.049438
3850 Mystic Valley Pkwy, Medford, MA 02155 | 42.4058055 | -71.0948504
50 Broadway, Everett, MA 02149 | 42.3967042 | -71.0651774
1955 Beacon St, Brighton, MA 02135 | 42.3358927 | -71.1499417
75 Old Colony Ave, South Boston, MA 02127 | 42.3358 | -71.056
200 Seaport Blvd, Boston, MA 02210 | 42.349535 | -71.040589
620 Fellsway, Medford, MA 02155 | 42.4074146 | -71.0826755
1100 Mass Ave, Dorchester, MA 02125 | 42.313405 | -71.0570969
1926 Columbus Ave, Roxbury, MA 02119 | 42.3168307 | -71.0982039
13-15 Maverick Sq, , MA 02128 | 42.369 | -71.0391999
510 Southampton St, Boston, MA 02127 | 42.3297 | -71.0572
600 Washington St, Brighton, MA 02135 | 42.3505411 | -71.1672294
7 Commercial Street, Medford, MA 02155 | 42.4108632 | -71.0881968
364 Boston Ave, Medford, MA 02155 | 42.4110023 | -71.1208938
34 Central Sq., East Boston, MA 02128 | 42.3743987 | -71.0395548
276 Beacham St, Chelsea, MA 02150 | 42.3958 | -71.0526
154 Main St, Medford, MA 02155 | 42.4142048 | -71.1106422
482 W Broadway, Boston, MA 02127 | 42.3355984 | -71.0458407
318 Broadway, Everett, MA 02149 | 42.403701 | -71.057403
132 Main St, Everett, MA 02149 | 42.4054905 | -71.0617024
684 Centre St, Jamaica Plain, MA 02130 | 42.3118183 | -71.114301
283 Middlesex Ave, Medford, MA 02155 | 42.4122525 | -71.0791565
15 Commonwealth Ave, Chestnut Hill, MA 02467 | 42.3399488 | -71.1671857
256 Boston St, Dorchester, MA 02125 | 42.3209 | -71.061
1 Old Harbor St, South Boston, MA 02127 | 42.3345983 | -71.0475223
130 Broadway, Chelsea, MA 02150 | 42.3892 | -71.0408
101 Broadway, Arlington, MA 02474 | 42.409729 | -71.1394997
379 Alewife Brook Pky, Somerville, MA 02144 | 42.4134649 | -71.131227
757 Centre St, Jamaica Plain, MA 02130 | 42.3104 | -71.1152
847 Dorchester Ave, Dorchester, MA 02125 | 42.3217677 | -71.0568335
2 Salem St, Medford Square, Medford, MA 02155 | 42.4182977 | -71.1096766
1885 Revere Beach Pkwy, Everett, MA 02149 | 42.4026488 | -71.0497033
1886 Revere Beach Pkwy, Everett, MA 02149 | 42.4029719 | -71.0576032
83 Everett Ave, Chelsea, MA 02150 | 42.3938 | -71.0386
456 Blue Hill Ave, Dorchester, MA 02121 | 42.3094486 | -71.0825407
49 Mount Auburn St, Watertown, MA 02472 | 42.3663261 | -71.1818116
234 Everett Ave, Chelsea, MA 02150 | 42.3983547 | -71.040518
100 Service Rd, East Boston, MA 02128 | 42.3685455 | -71.0300292
369 Massachusetts Ave, Arlington, MA 02474 | 42.4120094 | -71.1486312
524 Broadway, Everett, MA 02149 | 42.4095468 | -71.0532538
12 Washington St, Chelsea, MA 02150 | 42.3934689 | -71.0338541
430 Salem St, Medford, MA 02155 | 42.423165 | -71.0912071
200 Commercial St, Malden, MA 02148 | 42.4216695 | -71.0753317
4 Harvard Ave, Medford, MA 02155 | 42.4211848 | -71.1331323
345 Washington St., Newton, MA 02458 | 42.3567005 | -71.1875089
353 Trapelo Rd, Belmont, MA 02478 | 42.3853394 | -71.1833662
99 Charles St, Malden, MA 02148 | 42.423576 | -71.0712012
1236 Dorchester Ave, Dorchester, MA 02125 | 42.3084253 | -71.0582217
372 Washington Ave, Chelsea, MA 02150 | 42.4050143 | -71.0355293
321 Ferry St, Everett, MA 02149 | 42.4144921 | -71.0474161
171 Watertown Street, Watertown, MA 02472 | 42.3622518 | -71.1931601
21 Summer St, Arlington, MA 02474 | 42.4194124 | -71.1527705
57 Eastern Ave, Malden, MA 02148 | 42.4239636 | -71.0656994
245 Pleasant St, Malden, MA 02148 | 42.4273278 | -71.0740543
448 Main St, Watertown, MA 02472 | 42.3709389 | -71.1952175
52 Church St, Belmont, MA 02478 | 42.3870472 | -71.190931
7 Walk Hill St, Jamaica Plain, MA 02130 | 42.2956999 | -71.1162
424 Main St, Malden, MA 02148 | 42.4265519 | -71.0672918
356 Eastern Ave., Chelsea, MA 02150 | 42.3982 | -71.0207
936-942 Broadway, Chelsea, MA 02150 | 42.4009 | -71.0216
903 Broadway, Everett, MA 02149 | 42.4201478 | -71.0439663

Starbucks Locations

Address | Latitude | Longitude
655 Massachusetts Ave, Cambridge, MA 02139 | 42.3656973 | -71.1041728
750 Memorial Drive, Cambridge, MA 02139 | 42.3567886 | -71.0945479
775 Commonwealth Ave., Boston, MA 02215 | 42.350478 | -71.1096413
580 Commonwealth Ave, Boston, MA 02215 | 42.3488 | -71.0994
595 Commonwealth Ave, Boston, MA 02215 | 42.349347 | -71.099656
6 Cambridge Center, Cambridge, MA 02142 | 42.362707 | -71.0864538
874 Commonwealth Ave, Brookline, MA 02215 | 42.3507655 | -71.1138252
2 Cambridge Center, Cambridge, MA 02142 | 42.362793 | -71.086199
142-148 Brookline Avenue, Boston, MA 02215 | 42.3449155 | -71.1009163
468 Broadway, Cambridge, MA 02138 | 42.3738963 | -71.1126878
350 , Boston, MA 02116 | 42.3481 | -71.0872
147-151 Massachusetts Avenue, Boston, MA 02115 | 42.3466 | -71.0878
36 JFK Street, Cambridge, MA 02138 | 42.3720035 | -71.1206418
755 , Boston, MA 02116 | 42.349244 | -71.080974
39 Dalton Street, Boston, MA 02199 | 42.346302 | -71.084032
165 Newbury Street, Boston, MA 02116 | 42.3508 | -71.0789
31 Church Street, Cambridge, MA 02138 | 42.3743901 | -71.1200808
100 Cambridgeside Place, Cambridge, MA 02141 | 42.3673199 | -71.0775368
110 Huntington Ave, Boston, MA 02116 | 42.346175 | -71.079397
273 Huntington Ave, Boston, MA 02115 | 42.341792 | -71.0862592
10 Huntington Ave, Boston, MA 02116 | 42.3488089 | -71.0775803
364 Brookline Ave., Boston, MA 02215 | 42.3384972 | -71.107699
441 Stuart Street, Boston, MA 02116 | 42.3485141 | -71.076156
283 Longwood Avenue, Boston, MA 02115 | 42.3379 | -71.1045
346 Huntington Ave, Boston, MA 02115 | 42.3402211 | -71.0889516
277 Harvard St, Brookline, MA 02146 | 42.3491045 | -71.1299685
443 Boylston Street, Boston, MA 02116 | 42.3515 | -71.0729999
473 Havard Street, Brookline, MA 02146 | 42.3470869 | -71.1285487
97 , Boston, MA 02114 | 42.3588955 | -71.0707138
1 Charles Street, Boston, MA 02108 | 42.351456 | -71.067188
1662 Massachusetts Ave, Cambridge, MA 02138 | 42.3818755 | -71.1198028
222 Cambridge Street, Boston, MA 02114 | 42.3600106 | -71.0583794
627 Tremont St, Boston, MA 021181649 | 42.3424754 | -71.0744816
711-723 Somerville Ave, Somerville, MA 02143 | 42.3854634 | -71.1135012
143 Stuart Street, Boston, MA 02116 | 42.3511502 | -71.0662507
62 Boylston Street, Boston, MA 02116 | 42.352269 | -71.064369
15 Harvard Street, Brookline, MA 02445 | 42.3337518 | -71.1188552
821 Washington Street, Boston, MA 02111 | 42.348839 | -71.06427
12 , Boston, MA 02106 | 42.3557941 | -71.0613094
63-65 Court Street, Boston, MA 02108 | 42.3593092 | -71.0594142
27 , Boston, MA 02106 | 42.3774405 | -71.0647911
240 Washington St., Boston, MA 02108 | 42.357925 | -71.057883
1655 , Brookline, MA 02146 | 42.3388327 | -71.1367592
75-101 , Boston, MA 02110 | 42.3549 | -71.0569
1 Federal Street, Boston, MA 02110 | 42.355626 | -71.056716
125 , Boston, MA 02110 | 42.3531 | -71.0575
2-4 Faneuil Hall Market Place, Boston, MA 02101 | 42.35996 | -71.05579
84 State Street, Boston, MA 02109 | 42.3590059 | -71.0559325
1 Financial Center, Boston, MA 02111 | 42.352344 | -71.056266
211 , Boston, MA 02110 | 42.3547995 | -71.0548818
296 State St, Boston, MA 02109 | 42.3602 | -71.0509
1 International Place, Boston, MA 02110 | 42.3561655 | -71.0520816
1660-1670 Soldiers Field Rd, Brighton, MA 02135 | 42.3589 | -71.1536
2 , Boston, MA 02110 | 42.364 | -71.0505
260 Elm Street, Somerville, MA 02144 | 42.3955927 | -71.1220376
550 Arsenal Street, Watertown, MA 02472 | 42.363459 | -71.157439
222 Alewife Brook Parkway, Cambridge, MA 02138 | 42.389376 | -71.142569
7 Allstate Rd, Boston, MA 02125 | 42.3293 | -71.0627
470 Washington St, Brighton, MA 02135 | 42.3486 | -71.1596

MBTA Station Locations

Name | Latitude | Longitude
[name missing] | 42.39490705 | -71.14098072
Davis Station | 42.39606385 | -71.12205505
Porter Square Station | 42.38834612 | -71.1192441
Harvard Square Station | 42.373939 | -71.119106
Central Station | 42.36516345 | -71.10332251
Kendall/MIT Station | 42.36246023 | -71.08658552
Charles/Massachusetts General Hospital Station | 42.36127109 | -71.07208014
Park Street Station | 42.35619719 | -71.06229544
Station | 42.355295 | -71.060788
[name missing] | 42.35170961 | -71.05499983
Broadway Station | 42.3429 | -71.05713
[name missing] | 42.32955 | -71.05696
JFK / UMass Station | 42.32143786 | -71.05239272
Savin Hill Station | 42.31130702 | -71.05322957
Station | 42.30026198 | -71.06070757
[name missing] | 42.29279438 | -71.06578231
[name missing] | 42.285924 | -71.064219
[name missing] | 42.27842012 | -71.05974197
Butler Station | 42.27211695 | -71.06276751
Milton Station | 42.27034655 | -71.06794953
Central Avenue Station | 42.27001311 | -71.07324958
[name missing] | 42.26789332 | -71.08306646
Capen Street Station | 42.2675678 | -71.08722925
[name missing] | 42.26745665 | -71.09313011
[name missing] | 42.27481612 | -71.02917552
[name missing] | 42.26561466 | -71.01940155
[name missing] | 42.25093242 | -71.00497127
[name missing] | 42.23275157 | -71.00714922
Braintree Station | 42.20878042 | -71.00133419
[name missing] | 42.370582 | -71.076884
Science Park Station | 42.36667752 | -71.06816411
[name missing] | 42.365512 | -71.061423
Haymarket Station | 42.362498 | -71.058996
Government Center Station | 42.359297 | -71.059895
Boylston Street Station | 42.35239149 | -71.06487036
Arlington Station | 42.351868 | -71.070498
[name missing] | 42.349962 | -71.078089
Hynes Convention Center/ICA Station | 42.348097 | -71.088396
[name missing] | 42.348797 | -71.095296
Blandford Street Station | 42.349297 | -71.100796
Boston University East Station | 42.349648 | -71.103825
Boston University Central Station | 42.34993352 | -71.10618711
Boston University West Station | 42.35090086 | -71.1140728
St. Paul Street Station (B) | 42.3511308 | -71.11590743
Pleasant Street Station | 42.35134488 | -71.11821413
Babcock Street Station | 42.35174133 | -71.12126112
Packards Corner Station | 42.35207434 | -71.12486601
Harvard Avenue Station | 42.35023483 | -71.13102436
Griggs Street / Long Avenue Station | 42.34871243 | -71.13415718
Allston Street Station | 42.34844284 | -71.13778353
Warren Street Station | 42.34847455 | -71.14029408
Washington Street Station | 42.34368509 | -71.142869
Sutherland Street Station | 42.34149641 | -71.14662409
Chiswick Road Station | 42.3403386 | -71.15130186
Chestnut Hill Avenue Station | 42.33808635 | -71.15334034
South Street Station | 42.33957728 | -71.15778208
Station | 42.33994208 | -71.16619349
St. Marys Street Station | 42.34613537 | -71.10680938
Hawes Street Station | 42.34495386 | -71.11101508
Kent Street Station | 42.343997 | -71.114596
St. Paul Street Station (C) | 42.34322516 | -71.11734509
Coolidge Corner Station | 42.342097 | -71.121396
Winchester Street / Summit Avenue Station | 42.34128229 | -71.12461925
Brandon Hall Station | 42.340072 | -71.128526
Fairbanks Station | 42.339609 | -71.13134623
Washington Square Station | 42.33933937 | -71.13542318
Tappan Street Station | 42.33846702 | -71.13879204
Dean Road Station | 42.33770568 | -71.14196777
Englewood Avenue Station | 42.33713468 | -71.14512205
Station | 42.33589747 | -71.1507225
[name missing] | 42.34528691 | -71.10439539
Longwood Station | 42.34044962 | -71.11089706
Brookline Village Station | 42.33204296 | -71.11811757
[name missing] | 42.33121016 | -71.12586379
Beaconsfield Station | 42.33596092 | -71.14160299
Reservoir Station | 42.33493783 | -71.14940286
Chestnut Hill Station | 42.32667321 | -71.16551757
Newton Centre Station | 42.32935418 | -71.1923182
Newton Highland Station | 42.32169964 | -71.20617986
[name missing] | 42.31919287 | -71.21691942
[name missing] | 42.32626075 | -71.23117805
[name missing] | 42.33368473 | -71.24492168
Riverside Station | 42.33711088 | -71.2517345
[name missing] | 42.34563581 | -71.08158588
[name missing] | 42.342697 | -71.085095
Northeastern University Station | 42.34032274 | -71.08889222
Museum of Fine Arts Station | 42.33772154 | -71.09547973
Longwood Medical Area Station | 42.335837 | -71.100652
Station | 42.334097 | -71.104996
Fenwood Street Station | 42.33374818 | -71.10558629
Mission Park Station | 42.33322472 | -71.1070776
Station | 42.33197951 | -71.11207724
Back of the Hill Station | 42.33007596 | -71.11133695
Heath Street Station | 42.3287593 | -71.11059666
[name missing] | 42.358897 | -71.057795
Aquarium Station | 42.359456 | -71.05357
[name missing] | 42.36886 | -71.039926
Logan Airport Station | 42.37273343 | -71.0351944
Wood Island Station | 42.380797 | -71.023394
Orient Heights Station | 42.386676 | -71.006628
[name missing] | 42.38840159 | -71.00035787
Beachmont Station | 42.39741872 | -70.99219322
[name missing] | 42.40716336 | -70.99219322
[name missing] | 42.414246 | -70.992144
[name missing] | 42.43534302 | -71.07118964
[name missing] | 42.42731334 | -71.07387185
Wellington Station | 42.40429559 | -71.07700467
Station (Broadway Exit) | 42.38575484 | -71.07707977
(Cambridge Street Exit) | 42.38301288 | -71.07710123
Community College Station | 42.37263832 | -71.07027769
Chinatown Station | 42.352228 | -71.062892
New England Medical Center Station | 42.349873 | -71.063795
[name missing] | 42.34727722 | -71.07603908
Massachusetts Avenue Station (Orange) | 42.34155192 | -71.08321667
[name missing] | 42.33566748 | -71.090523
[name missing] | 42.33152742 | -71.09540462
[name missing] | 42.32273881 | -71.1000824
Stony Brook Station | 42.31920081 | -71.10282898
[name missing] | 42.31056915 | -71.10731363
Forest Hills Station | 42.29814321 | -71.11548901
Herald Street Station | 42.346377 | -71.064842
East Berkeley Street Station | 42.343878 | -71.066039
Union Park Street Station | 42.341197 | -71.069795
Newton Street Station | 42.338697 | -71.073795
Worcester Square Station | 42.337456 | -71.075812
Massachusetts Avenue Station (Silver) | 42.336441 | -71.077238
Lenox Street Station | 42.33504887 | -71.07881784
Station | 42.33290747 | -71.0810709
Dudley Square Station | 42.32889414 | -71.08511567
Courthouse Station | 42.35207434 | -71.04530096
World Trade Center Station | 42.3488393 | -71.04253292
Silver Line Way Station | 42.34801465 | -71.0371685
Silver Line (Airport = SL1) | 42.36628117 | -71.01931572
Northern Avenue at Harbor Street, Boston | 42.34661908 | -71.03523731
Northern Avenue at Tide Street | 42.34509659 | -71.03197575
25 Dry Dock Avenue | 42.344602 | -71.028307
88 Black Falcon Avenue | 42.34393885 | -71.02721214
Black Falcon Avenue at Design Center Place | 42.3438992 | -71.03431463
Dry Dock Avenue at Design Center Place | 42.34468 | -71.034797
Summer Street at Power House Street | 42.33986278 | -71.03553772
East First Street at M Street | 42.3381498 | -71.03345633
City Point | 42.3382291 | -71.02935791

Population Density

Data is formatted as a 254x254-pixel GIF, scaled to approximately 25.4 pixels per mile. Black pixels indicate 100 units of population; white pixels indicate 0 units of population.

Target Area Map

Data is formatted as a 500x500-pixel JPG, scaled to approximately 50 pixels per mile.
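The two scales quoted above imply that both rasters cover the same ground width, which agrees with the 10-mile map width set in the processing script of Appendix B. The MATLAB sketch below works through that arithmetic and shows one possible pixel-to-miles conversion. The centered origin and the axis directions are assumptions made for illustration; they are not taken from the attached ij2xy.m, and the helper names px2x and px2y are hypothetical.

```matlab
% Sketch only: ground width of each raster and a pixel-to-miles conversion.
% The centered origin and axis directions are assumptions, not ij2xy.m.
W_pop_px = 254;  ppm_pop = 25.4;   % population GIF: width (pixels), pixels per mile
W_map_px = 500;  ppm_map = 50;     % target map JPG: width (pixels), pixels per mile

W_pop_mi = W_pop_px/ppm_pop;       % ground width of the population raster (miles)
W_map_mi = W_map_px/ppm_map;       % ground width of the map raster (miles)

% hypothetical mapping: column i grows eastward, row j grows southward,
% and the origin sits at the image center
px2x = @(i,W_px,ppm) (i - (W_px+1)/2)/ppm;
px2y = @(j,W_px,ppm) ((W_px+1)/2 - j)/ppm;

x = px2x(1,W_pop_px,ppm_pop);      % westernmost column, close to -5 miles
y = px2y(1,W_pop_px,ppm_pop);      % top row, close to +5 miles
fprintf('widths: %.1f and %.1f miles; corner pixel at (%.2f, %.2f)\n', ...
    W_pop_mi, W_map_mi, x, y);
```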

Random Customers and Paired Dunkin’ Donuts

Customer Latitude | Customer Longitude | Paired Dunkin’ Donuts Latitude | Paired Dunkin’ Donuts Longitude | Euclidean Distance (De, miles) | Manhattan Distance (Dm, miles) | Google Distance (Dg, miles)
42.3082 | -71.1327 | 42.3104 | -71.1152 | 0.9063 | 1.0454 | 1.2
42.4231 | -71.0546 | 42.424 | -71.0657 | 0.5698 | 0.6263 | 0.7
42.3773 | -71.1012 | 42.379 | -71.094 | 0.3867 | 0.4878 | 0.4
42.3162 | -71.0922 | 42.3168 | -71.0982 | 0.3096 | 0.3501 | 0.5
42.3481 | -71.1549 | 42.3491 | -71.153 | 0.1167 | 0.1627 | 0.1
42.4119 | -71.1774 | 42.4194 | -71.1528 | 1.3603 | 1.7764 | 1.8
42.3666 | -71.0827 | 42.3628 | -71.084 | 0.2709 | 0.3305 | 0.4
42.3673 | -71.0242 | 42.3685 | -71.03 | 0.3098 | 0.3836 | 0.3
42.3539 | -71.1775 | 42.3567 | -71.1875 | 0.5464 | 0.7045 | 0.6
42.3199 | -71.1624 | 42.3359 | -71.1499 | 1.275 | 1.741 | 1.7
42.3285 | -71.1936 | 42.3399 | -71.1672 | 1.5634 | 2.1395 | 1.7
42.3332 | -71.0489 | 42.3346 | -71.0475 | 0.1195 | 0.1669 | 0.1
42.3654 | -71.1112 | 42.3651 | -71.1032 | 0.4071 | 0.4288 | 0.5
42.3694 | -71.1135 | 42.3722 | -71.1157 | 0.2217 | 0.3025 | 0.2
42.3503 | -71.0367 | 42.3495 | -71.0406 | 0.2055 | 0.2514 | 0.3
42.4149 | -71.1224 | 42.411 | -71.1209 | 0.2801 | 0.3462 | 0.4
42.3359 | -71.1003 | 42.3341 | -71.1039 | 0.2219 | 0.3082 | 0.3
42.3812 | -71.0732 | 42.3824 | -71.0792 | 0.315 | 0.3849 | 0.5
42.3337 | -71.1814 | 42.3399 | -71.1672 | 0.8444 | 1.1574 | 1
42.3293 | -71.1063 | 42.3341 | -71.1039 | 0.3536 | 0.4542 | 0.7
42.3598 | -71.0539 | 42.3592 | -71.0548 | 0.0644 | 0.091 | 0.08
42.4084 | -71.0839 | 42.4074 | -71.0827 | 0.0924 | 0.1306 | 0.3
42.3766 | -71.0385 | 42.3744 | -71.0396 | 0.1613 | 0.2059 | 0.2
42.3112 | -71.0867 | 42.3094 | -71.0825 | 0.2444 | 0.3334 | 0.3
42.3756 | -71.1229 | 42.3734 | -71.119 | 0.2539 | 0.356 | 0.3
42.3173 | -71.0822 | 42.3094 | -71.0825 | 0.5428 | 0.5599 | 0.6
42.3482 | -71.1098 | 42.3459 | -71.1083 | 0.1783 | 0.2383 | 0.3
42.3658 | -71.1789 | 42.3663 | -71.1818 | 0.153 | 0.185 | 0.2
42.4109 | -71.084 | 42.4109 | -71.0882 | 0.2143 | 0.2168 | 0.4
42.3824 | -71.0294 | 42.3892 | -71.0408 | 0.748 | 1.0518 | 1
42.3994 | -71.1121 | 42.401 | -71.117 | 0.2717 | 0.3573 | 0.3
42.3123 | -71.09 | 42.3094 | -71.0825 | 0.4288 | 0.5778 | 0.5
42.3648 | -71.092 | 42.3665 | -71.0939 | 0.15 | 0.2111 | 0.2
42.4141 | -71.1559 | 42.412 | -71.1486 | 0.3982 | 0.5155 | 0.5
42.3101 | -71.0786 | 42.3094 | -71.0825 | 0.2062 | 0.2462 | 0.3
42.3792 | -71.1679 | 42.3714 | -71.1577 | 0.7473 | 1.0568 | 0.9
42.3063 | -71.0601 | 42.3084 | -71.0582 | 0.1754 | 0.2427 | 0.2
42.3498 | -71.158 | 42.3491 | -71.153 | 0.2584 | 0.3036 | 0.3
42.3895 | -71.1657 | 42.3853 | -71.1834 | 0.9466 | 1.1894 | 1.4
42.3089 | -71.0686 | 42.3084 | -71.0582 | 0.5308 | 0.5626 | 0.7
42.3294 | -71.0452 | 42.3346 | -71.0475 | 0.3782 | 0.4777 | 2.7
42.3171 | -71.1547 | 42.3359 | -71.1499 | 1.321 | 1.5414 | 1.8
42.3781 | -71.1659 | 42.3714 | -71.1577 | 0.622 | 0.8787 | 0.7
42.3287 | -71.1906 | 42.3399 | -71.1672 | 1.4258 | 1.9726 | 1.7
42.3303 | -71.1389 | 42.3359 | -71.1499 | 0.6834 | 0.9501 | 0.8
42.4068 | -71.0983 | 42.4054 | -71.1014 | 0.1865 | 0.2565 | 0.4
42.348 | -71.0513 | 42.3504 | -71.0504 | 0.1698 | 0.2096 | 0.2
42.3095 | -71.0549 | 42.3084 | -71.0582 | 0.1851 | 0.2438 | 0.3
42.3212 | -71.191 | 42.3399 | -71.1672 | 1.7766 | 2.5112 | 2.4
42.3435 | -71.1304 | 42.3499 | -71.1302 | 0.4423 | 0.4524 | 0.8
42.4108 | -71.1094 | 42.4142 | -71.1106 | 0.2436 | 0.2987 | 0.3
42.3746 | -71.1139 | 42.3722 | -71.1157 | 0.1893 | 0.2564 | 0.2
42.3297 | -71.1879 | 42.3399 | -71.1672 | 1.2727 | 1.7656 | 1.5
42.3429 | -71.0875 | 42.342 | -71.0859 | 0.1012 | 0.1417 | 0.2
42.4125 | -71.0943 | 42.4109 | -71.0882 | 0.3315 | 0.4247 | 0.4
42.4142 | -71.1431 | 42.412 | -71.1486 | 0.3204 | 0.4337 | 0.4
42.422 | -71.1892 | 42.4194 | -71.1528 | 1.8684 | 2.0386 | 2.3
42.3311 | -71.1714 | 42.3399 | -71.1672 | 0.6481 | 0.8265 | 0.8
42.3686 | -71.1429 | 42.3632 | -71.1556 | 0.7465 | 1.0186 | 1.2
42.3192 | -71.0623 | 42.3209 | -71.061 | 0.1349 | 0.1838 | 0.2
42.3003 | -71.1411 | 42.2957 | -71.1162 | 1.3103 | 1.589 | 2.4
42.3312 | -71.0764 | 42.3313 | -71.0749 | 0.0779 | 0.0836 | 0.083
42.4268 | -71.0322 | 42.4201 | -71.044 | 0.7564 | 1.0603 | 0.9
42.3545 | -71.0299 | 42.3495 | -71.0406 | 0.6446 | 0.8887 | 8.2
42.3158 | -71.0907 | 42.3168 | -71.0982 | 0.3897 | 0.4543 | 0.6
42.4124 | -71.0296 | 42.405 | -71.0355 | 0.5933 | 0.813 | 0.8
42.2998 | -71.1247 | 42.2957 | -71.1162 | 0.5182 | 0.7172 | 1.1
42.311 | -71.1881 | 42.3399 | -71.1672 | 2.2673 | 3.0679 | 3.5
42.3287 | -71.1439 | 42.3359 | -71.1499 | 0.5849 | 0.8054 | 0.6
42.3957 | -71.1896 | 42.387 | -71.1909 | 0.6017 | 0.6658 | 1
42.4139 | -71.0443 | 42.4145 | -71.0474 | 0.1643 | 0.2 | 0.3
42.3251 | -71.1197 | 42.3305 | -71.1238 | 0.429 | 0.5842 | 0.8
42.3843 | -71.179 | 42.3853 | -71.1834 | 0.2342 | 0.2947 | 0.5
42.4035 | -71.1712 | 42.412 | -71.1486 | 1.2935 | 1.7401 | 1.7
42.4206 | -71.153 | 42.4194 | -71.1528 | 0.0829 | 0.0938 | 0.1
42.4108 | -71.1563 | 42.412 | -71.1486 | 0.4003 | 0.4751 | 0.8
42.3475 | -71.0438 | 42.3495 | -71.0406 | 0.216 | 0.3045 | 0.4
42.419 | -71.0619 | 42.424 | -71.0657 | 0.394 | 0.5369 | 0.6
42.3532 | -71.1351 | 42.3533 | -71.1336 | 0.0762 | 0.0797 | 0.078
42.3171 | -71.1281 | 42.3118 | -71.1143 | 0.7934 | 1.0694 | 1.2
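To make the De and Dm columns concrete, the MATLAB sketch below recomputes both distances for the first customer in the table. It is only an approximation of the tabulated values: it starts from the rounded coordinates printed above, measures the east-west leg at the customer's latitude rather than relative to the MIT origin used in the processing script, and assumes an Earth radius of 3958.8 miles. The anonymous function hav is a stand-in written for this example, not the attached haversine.m.

```matlab
% Sketch only: recompute De and Dm for the first table row from its
% rounded coordinates, so the results match the table only approximately.
R = 3958.8;  % mean Earth radius in miles (assumed constant)
hav = @(lat1,lon1,lat2,lon2) 2*R*asin(sqrt( ...
    sind((lat2-lat1)/2).^2 + ...
    cosd(lat1).*cosd(lat2).*sind((lon2-lon1)/2).^2));

cust  = [42.3082 -71.1327];   % customer (first row of the table)
store = [42.3104 -71.1152];   % paired Dunkin' Donuts (first row of the table)

dx = hav(cust(1),cust(2),cust(1),store(2));   % east-west leg, miles
dy = hav(cust(1),cust(2),store(1),cust(2));   % north-south leg, miles

De = sqrt(dx^2 + dy^2);       % Euclidean distance, miles
Dm = dx + dy;                 % Manhattan distance, miles
fprintf('De = %.3f miles, Dm = %.3f miles\n', De, Dm);
```

The results should land close to the tabulated 0.9063 and 1.0454 miles. The Google distance (Dg) cannot be reproduced this way because it depends on the street network.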

Appendix B: MATLAB Files

Attached files:

• Processing script (project.m)
• Haversine function (haversine.m)
• Pixel-Cartesian transformation (ij2xy.m)
• Cartesian-Pixel transformation (xy2ij.m)

5/5/10 6:47 PM C:\Documents and Settings\Paul Grogan\My Documents\...\project.m 1 of 15

% ESD.86 Term Project - Paul Grogan
% April 12, 2010

clc; clear all; close all;

dd = xlsread('data2.xlsx','DD');
sb = xlsread('data2.xlsx','SB');
mbta = xlsread('data2.xlsx','MBTA');
mit = [42.363253 -71.103086];

%% Calculate Distances using "Haversine" Formula

dd_dist = haversine(mit(1),mit(2),dd(:,1),dd(:,2));
sb_dist = haversine(mit(1),mit(2),sb(:,1),sb(:,2));
mbta_dist = haversine(mit(1),mit(2),mbta(:,1),mbta(:,2));

%% Transform to Cartesian Coordinates

dd_xy = [haversine(mit(1),mit(2),mit(1),dd(:,2)).*sign(dd(:,2)-mit(2))...
    haversine(mit(1),mit(2),dd(:,1),mit(2)).*sign(dd(:,1)-mit(1))];
sb_xy = [haversine(mit(1),mit(2),mit(1),sb(:,2)).*sign(sb(:,2)-mit(2))...
    haversine(mit(1),mit(2),sb(:,1),mit(2)).*sign(sb(:,1)-mit(1))];
mbta_xy = [haversine(mit(1),mit(2),mit(1),mbta(:,2)).*...
    sign(mbta(:,2)-mit(2))...
    haversine(mit(1),mit(2),mbta(:,1),mit(2)).*sign(mbta(:,1)-mit(1))];

%% Load Map Image and Transform to Pixel Coordinates

I = imread('map.jpg');
W_i = length(I); % image width (pixels)
W_m = 10; % map width (miles)
dd_ij = zeros(size(dd_xy));
[dd_ij(:,1) dd_ij(:,2)] = xy2ij(dd_xy(:,1),dd_xy(:,2),W_m,W_i);
sb_ij = zeros(size(sb_xy));
[sb_ij(:,1) sb_ij(:,2)] = xy2ij(sb_xy(:,1),sb_xy(:,2),W_m,W_i);
mbta_ij = zeros(size(mbta_xy));
[mbta_ij(:,1) mbta_ij(:,2)] = xy2ij(mbta_xy(:,1),mbta_xy(:,2),W_m,W_i);

%% Load and Process Population Data

P = imread('population.gif');
W_p = length(P);
pop_xy = zeros(sum(sum(P)),2);
pop_count = 1;
for i=1:length(P)
    for j=1:length(P)
        if P(j,i)
            [pop_xy(pop_count,1) pop_xy(pop_count,2)] = ij2xy(i,j,W_p,W_m);
            pop_count = pop_count+1;
        end
    end
end
pop_ij = zeros(size(pop_xy));
[pop_ij(:,1) pop_ij(:,2)] = xy2ij(pop_xy(:,1),pop_xy(:,2),W_m,W_i);

%% Discretize Data into Sectors

N_s = 10;
W_s = 7/N_s;
S_xy = zeros(N_s+1,1);
for i=0:N_s
    S_xy(i+1) = -N_s*W_s/2 + W_s*i;
end
S_ij = xy2ij(S_xy,S_xy,W_m,W_i);

number_dd = zeros(N_s);
number_sb = zeros(N_s);
number_mbta = zeros(N_s);
number_pop = zeros(N_s);
for i=1:N_s
    for j=1:N_s
        number_dd(j,i) = sum((dd_xy(:,1)>S_xy(i)) .* ...
            (dd_xy(:,1)<=S_xy(i)+W_s) .* ...
            (dd_xy(:,2)>S_xy(end-j)) .* ...
            (dd_xy(:,2)<=S_xy(end-j)+W_s));
        number_sb(j,i) = sum((sb_xy(:,1)>S_xy(i)) .* ...
            (sb_xy(:,1)<=S_xy(i)+W_s) .* ...
            (sb_xy(:,2)>S_xy(end-j)) .* ...
            (sb_xy(:,2)<=S_xy(end-j)+W_s));
        number_mbta(j,i) = sum((mbta_xy(:,1)>S_xy(i)) .* ...
            (mbta_xy(:,1)<=S_xy(i)+W_s) .* ...
            (mbta_xy(:,2)>S_xy(end-j)) .* ...
            (mbta_xy(:,2)<=S_xy(end-j)+W_s));
        number_pop(j,i) = 100*sum((pop_xy(:,1)>S_xy(i)) .* ...
            (pop_xy(:,1)<=S_xy(i)+W_s) .* ...
            (pop_xy(:,2)>S_xy(end-j)) .* ...
            (pop_xy(:,2)<=S_xy(end-j)+W_s));
    end
end

%% Define Neighborhoods (hard-coded for 10x10 grid)

neighborhoods = {'northwest'; 'cambridge'; 'northeast'; ...
    'downtown'; 'back_bay'; 'southwest'; 'southeast'};
cambridge = [13:14 23:24 33:36 43:47 53:57];
northeast = [5:10 15:20 25:30 37:39];
northwest = [11:12 21:22 31:32 41:42];
downtown = [48:49 58:59];
southwest = [51:52 61:65 71:75 81:85];
back_bay = 66:68;
southeast = [69:70 76:80 86:90 96:100];
unused = [1:4 40 50 60 91:95];

%% Build Neighborhood Probabilistic Model

POP_n = zeros(length(neighborhoods),1);
A_n = zeros(length(neighborhoods),1);
DD_n = zeros(length(neighborhoods),1);
SB_n = zeros(length(neighborhoods),1);
number_pop_s = reshape(number_pop',N_s^2,1);
number_dd_s = reshape(number_dd',N_s^2,1);
number_sb_s = reshape(number_sb',N_s^2,1);
for i=1:length(neighborhoods)
    A_n(i) = W_s^2*eval(['length(' neighborhoods{i} ')']);
    DD_n(i) = sum(eval(['number_dd_s(' neighborhoods{i} ')']));
    SB_n(i) = sum(eval(['number_sb_s(' neighborhoods{i} ')']));
    POP_n(i) = sum(eval(['number_pop_s(' neighborhoods{i} ')']));
end
gamma_dd = DD_n./A_n; % Dunkin' Donuts density (stores per square mile)
gamma_sb = SB_n./A_n; % Starbucks density (stores per square mile)
p_cust = POP_n./sum(POP_n); % share of customers in each neighborhood

%% Evaluate Model Results

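% expected nearest-neighbor distance per neighborhood:
% Euclidean metric 1/(2*sqrt(gamma)), Manhattan metric sqrt(pi/(8*gamma))
exp_de_dd = sqrt(1./(4*gamma_dd));
exp_de_sb = sqrt(1./(4*gamma_sb));
exp_dm_dd = sqrt(pi()./(8*gamma_dd));
exp_dm_sb = sqrt(pi()./(8*gamma_sb));

% nearest storefront of either franchise (densities add)
exp_de = sqrt(1./(4*(gamma_dd+gamma_sb)));
exp_dm = sqrt(pi()./(8*(gamma_dd+gamma_sb)));

% area-wide expectations, weighted by customer share
total_exp_de_dd = p_cust'*exp_de_dd;
total_exp_de_sb = p_cust'*exp_de_sb;
total_exp_dm_dd = p_cust'*exp_dm_dd;
total_exp_dm_sb = p_cust'*exp_dm_sb;

total_exp_dm = p_cust'*exp_dm;
total_exp_de = p_cust'*exp_de;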

%% Compare the Euclidean and Manhattan Metrics

% use population data as customers target_pop_xy = pop_xy((pop_xy(:,1)>S_xy(1)) .* ... (pop_xy(:,1)S_xy(1)) .* ... (pop_xy(:,2)

% use mbta stations as customers % target_pop_xy = mbta_xy((mbta_xy(:,1)>S_xy(1)) .* ... % (mbta_xy(:,1)S_xy(1)) .* ... % (mbta_xy(:,2)

    closest_de_dd(i) = min(sqrt((target_pop_xy(i,1)-dd_xy(:,1)).^2+ ...
        (target_pop_xy(i,2)-dd_xy(:,2)).^2));
    closest_de_sb(i) = min(sqrt((target_pop_xy(i,1)-sb_xy(:,1)).^2+ ...
        (target_pop_xy(i,2)-sb_xy(:,2)).^2));
    closest_dm_dd(i) = min(abs(target_pop_xy(i,1)-dd_xy(:,1))+ ...
        abs(target_pop_xy(i,2)-dd_xy(:,2)));
    closest_dm_sb(i) = min(abs(target_pop_xy(i,1)-sb_xy(:,1))+ ...
        abs(target_pop_xy(i,2)-sb_xy(:,2)));
end
closest_de = min(closest_de_dd,closest_de_sb);
closest_dm = min(closest_dm_dd,closest_dm_sb);

avg_de_dd_s = zeros(N_s^2,1);
avg_de_sb_s = zeros(N_s^2,1);
avg_dm_dd_s = zeros(N_s^2,1);
avg_dm_sb_s = zeros(N_s^2,1);
num_cust_s = zeros(N_s^2,1);
avg_de_s = zeros(N_s^2,1);
avg_dm_s = zeros(N_s^2,1);

for sector=1:N_s^2 num_cust_s(sector) = length(target_pop_xy(... (target_pop_xy(:,1)>S_xy(mod(sector-1,10)+1)) .* ... (target_pop_xy(:,1)S_xy(end-ceil(sector/10))) .* ... (target_pop_xy(:,2)S_xy(mod(sector-1,10)+1)) .* ... (target_pop_xy(:,1)S_xy(end-ceil(sector/10))) .* ... (target_pop_xy(:,2)S_xy(mod(sector-1,10)+1)) .* ... (target_pop_xy(:,1)S_xy(end-ceil(sector/10))) .* ... (target_pop_xy(:,2)S_xy(mod(sector-1,10)+1)) .* ... (target_pop_xy(:,1)S_xy(end-ceil(sector/10))) .* ... (target_pop_xy(:,2)S_xy(mod(sector-1,10)+1)) .* ... (target_pop_xy(:,1)S_xy(end-ceil(sector/10))) .* ... (target_pop_xy(:,2)S_xy(mod(sector-1,10)+1)) .* ... (target_pop_xy(:,1)S_xy(end-ceil(sector/10))) .* ... (target_pop_xy(:,2)

(target_pop_xy(:,1)>S_xy(mod(sector-1,10)+1)) .* ... (target_pop_xy(:,1)S_xy(end-ceil(sector/10))) .* ... (target_pop_xy(:,2)

avg_de_dd_n = zeros(length(neighborhoods),1);
avg_de_sb_n = zeros(length(neighborhoods),1);
avg_dm_dd_n = zeros(length(neighborhoods),1);
avg_dm_sb_n = zeros(length(neighborhoods),1);
avg_de_n = zeros(length(neighborhoods),1);
avg_dm_n = zeros(length(neighborhoods),1);
num_cust_n = zeros(length(neighborhoods),1);

for i=1:length(neighborhoods)
    sectors = eval(neighborhoods{i});
    for s=1:length(sectors)
        sector = sectors(s);
        avg_de_dd_n(i) = (avg_de_dd_n(i)*num_cust_n(i) + ...
            avg_de_dd_s(sector)*num_cust_s(sector))/...
            (num_cust_n(i)+num_cust_s(sector)+eps);
        avg_de_sb_n(i) = (avg_de_sb_n(i)*num_cust_n(i) + ...
            avg_de_sb_s(sector)*num_cust_s(sector))/...
            (num_cust_n(i)+num_cust_s(sector)+eps);
        avg_dm_dd_n(i) = (avg_dm_dd_n(i)*num_cust_n(i) + ...
            avg_dm_dd_s(sector)*num_cust_s(sector))/...
            (num_cust_n(i)+num_cust_s(sector)+eps);
        avg_dm_sb_n(i) = (avg_dm_sb_n(i)*num_cust_n(i) + ...
            avg_dm_sb_s(sector)*num_cust_s(sector))/...
            (num_cust_n(i)+num_cust_s(sector)+eps);
        avg_de_n(i) = (avg_de_n(i)*num_cust_n(i) + ...
            avg_de_s(sector)*num_cust_s(sector))/...
            (num_cust_n(i)+num_cust_s(sector)+eps);
        avg_dm_n(i) = (avg_dm_n(i)*num_cust_n(i) + ...
            avg_dm_s(sector)*num_cust_s(sector))/...
            (num_cust_n(i)+num_cust_s(sector)+eps);
        num_cust_n(i) = num_cust_n(i) + num_cust_s(sector);
    end
end

avg_de_dd = 0;
avg_de_sb = 0;
avg_dm_dd = 0;
avg_dm_sb = 0;
avg_de = 0;
avg_dm = 0;
num_cust = 0;

for i=1:length(neighborhoods)
    avg_de_dd = (avg_de_dd*num_cust + ...
        avg_de_dd_n(i)*num_cust_n(i))/...
        (num_cust+num_cust_n(i)+eps);
    avg_de_sb = (avg_de_sb*num_cust + ...
        avg_de_sb_n(i)*num_cust_n(i))/...
        (num_cust+num_cust_n(i)+eps);
    avg_dm_dd = (avg_dm_dd*num_cust + ...
        avg_dm_dd_n(i)*num_cust_n(i))/...
        (num_cust+num_cust_n(i)+eps);
    avg_dm_sb = (avg_dm_sb*num_cust + ...
        avg_dm_sb_n(i)*num_cust_n(i))/...
        (num_cust+num_cust_n(i)+eps);
    avg_de = (avg_de*num_cust + ...
        avg_de_n(i)*num_cust_n(i))/...
        (num_cust+num_cust_n(i)+eps);
    avg_dm = (avg_dm*num_cust + ...
        avg_dm_n(i)*num_cust_n(i))/...
        (num_cust+num_cust_n(i)+eps);
    num_cust = num_cust + num_cust_n(i);
end

table = [vertcat(num_cust_n,num_cust) vertcat(avg_de_sb_n,avg_de_sb) ...
    vertcat(avg_de_dd_n,avg_de_dd) vertcat(avg_de_n,avg_de)...
    vertcat(avg_dm_sb_n,avg_dm_sb) vertcat(avg_dm_dd_n,avg_dm_dd)...
    vertcat(avg_dm_n,avg_dm)];

R_de_dm = mean(closest_dm_dd./closest_de_dd);
E_de_dm = -norminv(0.05/2)*std(closest_dm_dd./closest_de_dd)/...
    sqrt(length(closest_dm_dd));
CI_de_dm = R_de_dm + [-E_de_dm E_de_dm];

%% Simulated Location Pairs for Metric Comparison

% lat = [min(dd(:,1)) max(dd(:,1))];
% long = [min(dd(:,2)) max(dd(:,2))];
% new_cust = [lat(1)+(lat(2)-lat(1))*rand(20,1) ...
%     long(1)+(long(2)-long(1))*rand(20,1)];

rand_cust = [
    42.3082 -71.1327; 42.4231 -71.0546; 42.3773 -71.1012; 42.3162 -71.0922;
    42.3481 -71.1549; 42.4119 -71.1774; 42.3666 -71.0827; 42.3673 -71.0242;
    42.3539 -71.1775; 42.3199 -71.1624; 42.3285 -71.1936; 42.3332 -71.0489;
    42.3654 -71.1112; 42.3694 -71.1135; 42.3503 -71.0367; 42.4149 -71.1224;
    42.3359 -71.1003; 42.3812 -71.0732; 42.3337 -71.1814; 42.3293 -71.1063;
    42.3598 -71.0539; 42.4084 -71.0839; 42.3766 -71.0385; 42.3112 -71.0867;
    42.3756 -71.1229; 42.3173 -71.0822; 42.3482 -71.1098; 42.3658 -71.1789;
    42.4109 -71.0840; 42.3824 -71.0294; 42.3994 -71.1121; 42.3123 -71.0900;
    42.3648 -71.0920; 42.4141 -71.1559; 42.3101 -71.0786; 42.3792 -71.1679;
    42.3063 -71.0601; 42.3498 -71.1580; 42.3895 -71.1657; 42.3089 -71.0686;
    42.3294 -71.0452; 42.3171 -71.1547; 42.3781 -71.1659; 42.3287 -71.1906;
    42.3303 -71.1389; 42.4068 -71.0983; 42.3480 -71.0513; 42.3095 -71.0549;
    42.3212 -71.1910; 42.3435 -71.1304; 42.4108 -71.1094; 42.3746 -71.1139;
    42.3297 -71.1879; 42.3429 -71.0875; 42.4125 -71.0943; 42.4142 -71.1431;
    42.4220 -71.1892; 42.3311 -71.1714; 42.3686 -71.1429; 42.3192 -71.0623;
    42.3003 -71.1411; 42.3312 -71.0764; 42.4268 -71.0322; 42.3545 -71.0299;
    42.3158 -71.0907; 42.4124 -71.0296; 42.2998 -71.1247; 42.3110 -71.1881;
    42.3287 -71.1439; 42.3957 -71.1896; 42.4139 -71.0443; 42.3251 -71.1197;
    42.3843 -71.1790; 42.4035 -71.1712; 42.4206 -71.1530; 42.4108 -71.1563;
    42.3475 -71.0438; 42.4190 -71.0619; 42.3532 -71.1351; 42.3171 -71.1281;
    ];

rand_cust_dist = haversine(mit(1),mit(2),rand_cust(:,1),rand_cust(:,2));
rand_cust_xy = [haversine(mit(1),mit(2),mit(1),rand_cust(:,2)).*...
    sign(rand_cust(:,2)-mit(2))...
    haversine(mit(1),mit(2),rand_cust(:,1),mit(2)).*...
    sign(rand_cust(:,1)-mit(1))];
rand_de_dd = zeros(length(rand_cust_xy),1);
rand_dm_dd = zeros(length(rand_cust_xy),1);
rand_dd = zeros(length(rand_cust_xy),2);
for i=1:length(rand_cust_xy)
    [C,CI] = min(sqrt((rand_cust_xy(i,1)-dd_xy(:,1)).^2+ ...
        (rand_cust_xy(i,2)-dd_xy(:,2)).^2));
    rand_dd(i,:) = dd(CI,:);
    rand_de_dd(i) = C;
    rand_dm_dd(i) = abs(rand_cust_xy(i,1)-dd_xy(CI,1))+ ...
        abs(rand_cust_xy(i,2)-dd_xy(CI,2));
end
rand_dd_dist = haversine(mit(1),mit(2),rand_dd(:,1),rand_dd(:,2));
rand_dd_xy = [haversine(mit(1),mit(2),mit(1),rand_dd(:,2)).*...
    sign(rand_dd(:,2)-mit(2))...
    haversine(mit(1),mit(2),rand_dd(:,1),mit(2)).*...
    sign(rand_dd(:,1)-mit(1))];

[rand_cust_ij(:,1) rand_cust_ij(:,2)] = ...
    xy2ij(rand_cust_xy(:,1),rand_cust_xy(:,2),W_m,W_i);
[rand_dd_ij(:,1) rand_dd_ij(:,2)] = ...
    xy2ij(rand_dd_xy(:,1),rand_dd_xy(:,2),W_m,W_i);
% for i=41:length(rand_cust)
%     disp(['from: ' num2str(rand_cust(i,1)) ', ' num2str(rand_cust(i,2)) ...
%         ' to: ' num2str(rand_dd(i,1)) ', ' num2str(rand_dd(i,2))])
% end
rand_dg_dd = [
    1.2; 0.7; 0.4; 0.5; 0.1; 1.8; 0.4; 0.3; 0.6; 1.7;
    1.7; 0.1; 0.5; 0.2; 0.3; 0.4; 0.3; 0.5; 1.0; 0.7;
    .080; 0.3; 0.2; 0.3; 0.3; 0.6; 0.3; 0.2; 0.4; 1.0;
    0.3; 0.5; 0.2; 0.5; 0.3; 0.9; 0.2; 0.3; 1.4; 0.7;
    2.7; 1.8; 0.7; 1.7; 0.8; 0.4; 0.2; 0.3; 2.4; 0.8;
    0.3; 0.2; 1.5; 0.2; 0.4; 0.4; 2.3; 0.8; 1.2; 0.2;
    2.4; .083; 0.9; 8.2; 0.6; 0.8; 1.1; 3.5; 0.6; 1.0;
    0.3; 0.8; 0.5; 1.7; 0.1; 0.8; 0.4; 0.6; .078; 1.2;
    ];

outliers = abs(rand_dg_dd-rand_de_dd)>3*min(rand_dg_dd,rand_de_dd);

R_de_dg_o = mean(rand_dg_dd./rand_de_dd);
E_de_dg_o = -norminv(0.05/2)*std(rand_dg_dd./rand_de_dd)/...
    sqrt(length(rand_dg_dd));
CI_de_dg_o = R_de_dg_o + [-E_de_dg_o E_de_dg_o];
R_de_dg_no = mean(rand_dg_dd(~outliers)./rand_de_dd(~outliers));
E_de_dg_no = -norminv(0.05/2)*std(rand_dg_dd(~outliers)./...
    rand_de_dd(~outliers))/sqrt(length(rand_dg_dd(~outliers)));
CI_de_dg_no = R_de_dg_no + [-E_de_dg_no E_de_dg_no];
R_dm_dg_o = mean(rand_dg_dd./rand_dm_dd);
E_dm_dg_o = -norminv(0.05/2)*std(rand_dg_dd./rand_dm_dd)/...
    sqrt(length(rand_dg_dd));
CI_dm_dg_o = R_dm_dg_o + [-E_dm_dg_o E_dm_dg_o];
R_dm_dg_no = mean(rand_dg_dd(~outliers)./rand_dm_dd(~outliers));
E_dm_dg_no = -norminv(0.05/2)*std(rand_dg_dd(~outliers)./...
    rand_dm_dd(~outliers))/sqrt(length(rand_dg_dd(~outliers)));
CI_dm_dg_no = R_dm_dg_no + [-E_dm_dg_no E_dm_dg_no];
R_de_dm_o = mean(rand_dm_dd./rand_de_dd);
E_de_dm_o = -norminv(0.05/2)*std(rand_dm_dd./rand_de_dd)/...
    sqrt(length(rand_dm_dd));
CI_de_dm_o = R_de_dm_o + [-E_de_dm_o E_de_dm_o];
R_de_dm_no = mean(rand_dm_dd(~outliers)./rand_de_dd(~outliers));
E_de_dm_no = -norminv(0.05/2)*std(rand_dm_dd(~outliers)./...
    rand_de_dd(~outliers))/sqrt(length(rand_dm_dd(~outliers)));
CI_de_dm_no = R_de_dm_no + [-E_de_dm_no E_de_dm_no];

ci_table = [
    [CI_de_dg_no(1) R_de_dg_no CI_de_dg_no(2) ...
        CI_de_dg_o(1) R_de_dg_o CI_de_dg_o(2)]
    [CI_dm_dg_no(1) R_dm_dg_no CI_dm_dg_no(2) ...
        CI_dm_dg_o(1) R_dm_dg_o CI_dm_dg_o(2)]
    [CI_de_dm_no(1) R_de_dm_no CI_de_dm_no(2) ...
        CI_de_dm_o(1) R_de_dm_o CI_de_dm_o(2)]
    ];
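
Each ratio interval in this part of the listing has the same form: the sample mean of the element-wise distance ratios, plus or minus the two-sided normal quantile times the standard error. The following is a minimal sketch of that pattern, not part of the original project.m. The helper names (z_quantile, ratio_mean, ratio_half) and the sample data are hypothetical, and erfcinv stands in for norminv so that the sketch runs without the Statistics Toolbox.

```matlab
% Hypothetical helpers (not in the original listing): normal-approximation
% confidence interval for the mean of the element-wise ratios a./b.
z_quantile = @(alpha) sqrt(2)*erfcinv(alpha);   % equals -norminv(alpha/2)
ratio_mean = @(a,b) mean(a./b);
ratio_half = @(a,b,alpha) z_quantile(alpha)*std(a./b)/sqrt(numel(a));

% Illustrative data only (not the project's measurements).
d_e = [0.5; 1.0; 0.8; 1.2; 0.3];   % Euclidean distances (miles)
d_m = [0.6; 1.3; 1.1; 1.5; 0.4];   % Manhattan distances (miles)

R  = ratio_mean(d_m,d_e);          % mean ratio D_m/D_e
E  = ratio_half(d_m,d_e,0.05);     % half-width of the 95% interval
CI = R + [-E E];
fprintf('R = %.3f, 95%% CI = [%.3f, %.3f]\n', R, CI(1), CI(2));
```

Because the interval relies on a normal approximation for the mean of the ratios, it is most trustworthy for larger samples such as the 80 simulated pairs used in the listing.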

%% Plot Locations using Longitude and Latitude (GPS) Coordinates

figure(1)
plot(dd(:,2),dd(:,1),'.b',...
    sb(:,2),sb(:,1),'.g',...
    mbta(mbta_dist<5,2),mbta(mbta_dist<5,1),'*r')
axis equal
xlabel('Longitude (\circ)')
ylabel('Latitude (\circ)')
legend('Dunkin Donuts','Starbucks','MBTA Station')

%% Plot Locations using Cartesian Coordinates

figure(2)
plot(dd_xy(:,1),dd_xy(:,2),'.b',...
    sb_xy(:,1),sb_xy(:,2),'.g',...
    mbta_xy(mbta_dist<5,1),mbta_xy(mbta_dist<5,2),'*r',...
    5*cos(linspace(0,2*pi(),100)),5*sin(linspace(0,2*pi(),100)),'-k')
xlabel('Distance (miles)')
ylabel('Distance (miles)')
legend('Dunkin Donuts','Starbucks','MBTA Station')
axis equal

%% Plot Locations Overlaid Map Image

figure(3)
imshow(I)
hold on
plot(dd_ij(:,1),dd_ij(:,2),'.b',...
    sb_ij(:,1),sb_ij(:,2),'.g',...
    mbta_ij(mbta_dist<5,1),mbta_ij(mbta_dist<5,2),'*r',...
    W_i/2+W_i/2*cos(linspace(0,2*pi(),100)),...
    W_i/2-W_i/2*sin(linspace(0,2*pi(),100)),'-k')
hold off
legend('Dunkin Donuts','Starbucks','MBTA Station')
axis image

%% Overlay Location Sector Sums on Location Map

figure(3)
hold on
for i=1:N_s+1
    for j=1:N_s+1
        plot(S_ij(i)*ones(100,1),linspace(S_ij(1),S_ij(end),100),'-k',...
            linspace(S_ij(1),S_ij(end),100),S_ij(j)*ones(100,1),'-k')
        if i<=N_s && j<=N_s
            text((S_ij(i)+S_ij(i+1))/2,(S_ij(j)+S_ij(j+1))/2,...
                ['\bf\color{blue}' num2str(number_dd(j,i)) ...
                '\newline\bf\color{green}' num2str(number_sb(j,i)) ...
                '\newline\bf\color{red}' num2str(number_mbta(j,i)) ],...
                'HorizontalAlignment','center',...
                'VerticalAlignment','middle')
        end
    end
end
hold off

%% Plot Population Data Overlaid on Map

figure(4)
imshow(I)
hold on
plot(pop_ij(:,1),pop_ij(:,2),'.magenta',...
    W_i/2+W_i/2*cos(linspace(0,2*pi(),100)),...
    W_i/2-W_i/2*sin(linspace(0,2*pi(),100)),'-k')
hold off
legend('100 People')
axis image

%% Overlay Population Sector Sums on Population Map

figure(4)
hold on
for i=1:N_s+1
    for j=1:N_s+1
        plot(S_ij(i)*ones(100,1),linspace(S_ij(1),S_ij(end),100),'-k',...
            linspace(S_ij(1),S_ij(end),100),S_ij(j)*ones(100,1),'-k')
        if i<=N_s && j<=N_s
            text((S_ij(i)+S_ij(i+1))/2,(S_ij(j)+S_ij(j+1))/2,...
                ['\bf' num2str(number_pop(j,i))],...
                'HorizontalAlignment','center')
        end
    end
end
hold off

%% Display Sector Labels Overlaid on Map

figure(5)
imshow(I)
hold on
for i=1:N_s+1
    for j=1:N_s+1
        plot(S_ij(i)*ones(100,1),linspace(S_ij(1),S_ij(end),100),'-k',...
            linspace(S_ij(1),S_ij(end),100),S_ij(j)*ones(100,1),'-k')
        if i<=N_s && j<=N_s
            text((S_ij(i)+S_ij(i+1))/2,(S_ij(j)+S_ij(j+1))/2,...
                ['\bf' num2str(10*(j-1)+i)],...
                'HorizontalAlignment','center')
        end
    end
end
hold off
axis off image

%% Overlay Neighborhood Colors on Sector Map

figure(5)
hold on
for sector=1:N_s^2
    color = 'w';
    if sum(cambridge==sector)>0 ...
            || sum(southeast==sector)>0
        color='y';
    elseif sum(northeast==sector)>0 ...
            || sum(northwest==sector)>0 ...
            || sum(back_bay==sector)>0
        color='g';
    elseif sum(downtown==sector)>0 ...
            || sum(southwest==sector)>0
        color='r';
    end
    rectangle('Position',[S_ij(mod(sector-1,10)+1),...
        S_ij(ceil((sector)/10)),...
        S_ij(mod(sector-1,10)+2)-S_ij(mod(sector-1,10)+1),...
        S_ij(ceil(sector/10)+1)-S_ij(ceil(sector/10))],'FaceColor',color);
end
hold off

%% Test Poisson Spatial Distribution - Dunkin Donuts

bins_dd = 0:1:4;
dd_exp = zeros(length(bins_dd),1);
dd_obs = zeros(length(bins_dd),1);
lambda_dd = sum(number_dd_s)/N_s^2;
for i=1:length(bins_dd)
    if i==1 && length(bins_dd)>1
        dd_exp(i) = N_s^2*poisscdf(bins_dd(i+1),lambda_dd);
        dd_obs(i) = sum(number_dd_s

figure(6)
bar(bins_dd,[dd_exp dd_obs],'group')
title('Dunkin Donuts Spatial Distribution Model')
ylabel('Frequency')
xlabel('Bin Lower Bound')
legend('Expected (Poisson)','Observed')

chi2_dd = sum(sum((dd_obs-dd_exp).^2./dd_exp));
% Reject H0 at the 5% level when the statistic exceeds the 95th percentile
% of the chi-square distribution; the p-value is the upper-tail probability.
if chi2_dd > chi2inv(0.95,length(bins_dd)-2)
    disp(['H0: Dunkin Donuts storefronts Poisson spatially distributed '...
        'is rejected (p=' ...
        num2str(1-chi2cdf(chi2_dd,length(bins_dd)-2)) ').'])
else
    disp(['H0: Dunkin Donuts storefronts Poisson spatially distributed '...
        'cannot be rejected (p=' ...
        num2str(1-chi2cdf(chi2_dd,length(bins_dd)-2)) ').'])
end

%% Test Poisson Spatial Distribution - Starbucks

bins_sb = 0:1:3;
sb_exp = zeros(length(bins_sb),1);
sb_obs = zeros(length(bins_sb),1);
lambda_sb = sum(number_sb_s)/N_s^2;
for i=1:length(bins_sb)
    if i==1 && length(bins_sb)>1
        sb_exp(i) = N_s^2*poisscdf(bins_sb(i+1),lambda_sb);
        sb_obs(i) = sum(number_sb_s

figure(7)
bar(bins_sb,[sb_exp sb_obs],'group')
title('Starbucks Spatial Distribution Model')
ylabel('Frequency')
xlabel('Bin Lower Bound')
legend('Expected (Poisson)','Observed')

chi2_sb = sum(sum((sb_obs-sb_exp).^2./sb_exp));
% Reject H0 at the 5% level when the statistic exceeds the 95th percentile
% of the chi-square distribution; the p-value is the upper-tail probability.
if chi2_sb > chi2inv(0.95,length(bins_sb)-2)
    disp(['H0: Starbucks storefronts Poisson spatially distributed ' ...
        'is rejected (p=' ...
        num2str(1-chi2cdf(chi2_sb,length(bins_sb)-2)) ').'])
else
    disp(['H0: Starbucks storefronts Poisson spatially distributed ' ...
        'cannot be rejected (p=' ...
        num2str(1-chi2cdf(chi2_sb,length(bins_sb)-2)) ').'])
end
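
The two cells above test whether sector counts are consistent with a spatial Poisson model by comparing observed and expected bin frequencies with a chi-square statistic. The sketch below shows one conventional way to set up such a test from start to finish. It is an illustration under stated assumptions, not the author's exact binning: the counts are made-up data, the last bin is assumed to collect all counts at or above its lower bound, and base functions (factorial, gammainc) replace the toolbox calls poisscdf and chi2cdf.

```matlab
% Illustrative sector counts (not the project's data): storefronts per sector.
counts = [0 1 0 2 1 0 3 1 0 1 2 0 1 0 0 1 4 0 1 2];
bins   = 0:3;                      % assumed bins: 0, 1, 2, and ">= 3"
lambda = mean(counts);             % Poisson rate estimated from the sample
n      = numel(counts);

% Poisson probability mass function written with base functions.
pmf = @(k) exp(-lambda).*lambda.^k./factorial(k);

% Expected frequencies; the last bin takes the remaining upper-tail mass.
expected = n*pmf(bins(:));
expected(end) = n - sum(expected(1:end-1));

% Observed frequencies with the same binning.
observed = zeros(numel(bins),1);
for i = 1:numel(bins)-1
    observed(i) = sum(counts == bins(i));
end
observed(end) = sum(counts >= bins(end));

chi2 = sum((observed-expected).^2./expected);
dof  = numel(bins) - 2;            % bins - 1 - one estimated parameter
p    = 1 - gammainc(chi2/2, dof/2);   % upper-tail chi-square probability
fprintf('chi2 = %.3f, dof = %d, p = %.3f\n', chi2, dof, p);
```

A small p-value (below 0.05, say) would reject the Poisson hypothesis; a large one means the hypothesis cannot be rejected. With counts this sparse some expected frequencies fall below the usual rule-of-thumb minimum of five, so in practice adjacent bins may need to be merged.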

%% "Heat Map" Visualizations

X = S_xy(1:end-1)+W_s/2;
Y = flipud(S_xy(1:end-1))+W_s/2;

figure(10)
colormap jet
contourf(X,Y,number_dd/W_s^2,120)
caxis([0 30])
colorbar
shading flat
axis off equal
title('Dunkin Donuts (per mi^2)')

figure(11)
contourf(X,Y,number_sb/W_s^2,120)
caxis([0 30])
colorbar
shading flat
axis off equal
title('Starbucks (per mi^2)')

figure(12)
contourf(X,Y,number_mbta/W_s^2,120)
caxis([0 30])
colorbar
shading flat
axis off equal
title('MBTA Stations (per mi^2)')

figure(13)
contourf(X,Y,number_pop/1000/W_s^2,120)
colorbar
shading flat
axis off equal
title('Population (thousands per mi^2)')

%% Plot Distance Metric Comparison Ratio

figure(14)
hold on
scatter(closest_de_dd,closest_dm_dd,'.k')
plot(linspace(0,2,100),linspace(0,2,100),'-k',...
    linspace(0,2,100),4/pi().*linspace(0,2,100),'--r',...
    linspace(0,2,100),R_de_dm.*linspace(0,2,100),'--b')
hold off
title('Closest Dunkin'' Donuts')
xlabel('Euclidean Distance (D_e, miles)')
ylabel('Manhattan Distance (D_m, miles)')
legend('Sample','R = 1',...
    'R = 4/\pi',['R = ' num2str(R_de_dm)]);
axis xy square

%% Plot Metric Comparison Ratios

figure(15)
hold on
scatter(rand_de_dd(~outliers),rand_dg_dd(~outliers),'.k')
scatter(rand_de_dd(outliers),rand_dg_dd(outliers),'.r')
plot(linspace(0,6,100),linspace(0,6,100),'-k',...
    linspace(0,6,100),R_de_dg_o*linspace(0,6,100),'--r',...
    linspace(0,6,100),R_de_dg_no*linspace(0,6,100),'--b')
hold off
xlabel('Euclidean Distance (D_e, miles)')
ylabel('Google Distance (D_g, miles)')
legend('Sample','Outlier','R=1',['R=' num2str(R_de_dg_o)...
    ' (w/ Outliers)'],['R=' num2str(R_de_dg_no) ' (w/o Outliers)'])
axis xy square

figure(16)
hold on
scatter(rand_dm_dd(~outliers),rand_dg_dd(~outliers),'.k')
scatter(rand_dm_dd(outliers),rand_dg_dd(outliers),'.r')
plot(linspace(0,6,100),linspace(0,6,100),'-k',...
    linspace(0,6,100),R_dm_dg_o*linspace(0,6,100),'--r',...
    linspace(0,6,100),R_dm_dg_no*linspace(0,6,100),'--b')
hold off
xlabel('Manhattan Distance (D_m, miles)')
ylabel('Google Distance (D_g, miles)')
legend('Sample','Outlier','R=1',['R=' num2str(R_dm_dg_o)...
    ' (w/ Outliers)'],['R=' num2str(R_dm_dg_no) ' (w/o Outliers)'])
axis xy square

figure(17)
hold on
scatter(rand_de_dd(~outliers),rand_dm_dd(~outliers),'.k')
scatter(rand_de_dd(outliers),rand_dm_dd(outliers),'.r')
plot(linspace(0,6,100),linspace(0,6,100),'-k',...
    linspace(0,6,100),R_de_dm_o*linspace(0,6,100),'--r',...
    linspace(0,6,100),R_de_dm_no*linspace(0,6,100),'--b')
hold off
xlabel('Euclidean Distance (D_e, miles)')
ylabel('Manhattan Distance (D_m, miles)')
legend('Sample','Outlier','R=1',['R=' num2str(R_de_dm_o)...
    ' (w/ Outliers)'],['R=' num2str(R_de_dm_no) ' (w/o Outliers)'])
axis xy square

%% Display Customer-Storefront Pairs Overlaid on Map

figure(18)
imshow(I)
hold on
plot(rand_cust_ij(:,1),rand_cust_ij(:,2),'.m',...
    rand_dd_ij(:,1),rand_dd_ij(:,2),'.b')
for i=1:length(rand_cust_ij)
    plot([rand_cust_ij(i,1) rand_dd_ij(i,1)],...
        [rand_cust_ij(i,2) rand_dd_ij(i,2)],'-k')
end
hold off
legend('Customer','Dunkin'' Donuts')
axis image

%%

figure(19)
hold on
for i=1:size(neighborhoods)
    plot([1 2],[exp_de_dd(i) avg_de_dd_n(i)],'-b')
    plot([1 2],[exp_de_sb(i) avg_de_sb_n(i)],'-g')
end
hold off
axis([0 3 0 1.5])

5/5/10 6:49 PM C:\Documents and Settings\Paul Grogan\My Documents\...\haversine.m 1 of 1

function [dist] = haversine(lat1, long1, lat2, long2)

radius = 3958.75587; % earth's radius (miles)

a = sin(deg2rad(lat2-lat1)/2).^2 + ...
    cos(deg2rad(lat1)).*cos(deg2rad(lat2)).*sin(deg2rad(long2-long1)/2).^2;
c = 2*atan2(sqrt(a),sqrt(1-a));
dist = radius*c;
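
As a quick sanity check of the haversine formula above, the sketch below restates it with anonymous functions so that it runs on its own, then verifies two known cases: zero distance between identical points, and one degree of latitude spanning radius*pi/180 miles (about 69.09). The names R_earth, to_rad, hav_a, and hav are hypothetical and not part of the original files. The two sample coordinates are the first two rows of rand_cust in project.m.

```matlab
% Self-contained restatement of the haversine formula (hypothetical names).
R_earth = 3958.75587;                  % earth's radius (miles), as in haversine.m
to_rad  = @(deg) deg*pi/180;           % degrees to radians
hav_a = @(lat1,long1,lat2,long2) sin(to_rad(lat2-lat1)/2).^2 + ...
    cos(to_rad(lat1)).*cos(to_rad(lat2)).*sin(to_rad(long2-long1)/2).^2;
hav = @(lat1,long1,lat2,long2) 2*R_earth*atan2( ...
    sqrt(hav_a(lat1,long1,lat2,long2)), ...
    sqrt(1-hav_a(lat1,long1,lat2,long2)));

% Known cases: identical points, and exactly one degree of latitude.
d_zero = hav(42.3082,-71.1327,42.3082,-71.1327);
d_lat  = hav(42,-71,43,-71);

% Vectorized call from one reference point to several destinations,
% mirroring how project.m passes column vectors.
pts = [42.3082 -71.1327; 42.4231 -71.0546];
d_vec = hav(pts(1,1),pts(1,2),pts(:,1),pts(:,2));
fprintf('one degree of latitude = %.4f miles\n', d_lat);
fprintf('distance between sample points = %.4f miles\n', d_vec(2));
```

The element-wise operators (.^ and .*) are what allow the second coordinate pair to be passed as column vectors.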

5/5/10 6:49 PM C:\Documents and Settings\Paul Grogan\My Documents\Acad...\xy2ij.m 1 of 1

function [i j] = xy2ij(x, y, W_xy, W_ij)

i = round(W_ij/2+W_ij/W_xy*x);
j = round(W_ij/2-W_ij/W_xy*y);

5/5/10 6:49 PM C:\Documents and Settings\Paul Grogan\My Documents\Acad...\ij2xy.m 1 of 1

function [x y] = ij2xy(i, j, W_ij, W_xy)

x = W_xy/W_ij*i-W_xy/2;
y = W_xy/2 - W_xy/W_ij*j;
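
The xy2ij and ij2xy functions are inverses up to pixel rounding: xy2ij maps map-centered coordinates in miles to image pixels (with the vertical axis flipped, since pixel rows increase downward), and ij2xy maps pixels back. The sketch below checks that round trip with anonymous-function restatements of the two conversions. The widths W_xy = 10 miles and W_ij = 1000 pixels are illustrative assumptions, not values taken from the project, and the names x2i, y2j, i2x, and j2y are hypothetical.

```matlab
% Illustrative widths (assumptions, not the project's values).
W_xy = 10;      % map width in miles
W_ij = 1000;    % image width in pixels

% Anonymous restatements of xy2ij and ij2xy, one coordinate at a time.
x2i = @(x) round(W_ij/2 + W_ij/W_xy*x);
y2j = @(y) round(W_ij/2 - W_ij/W_xy*y);
i2x = @(i) W_xy/W_ij*i - W_xy/2;
j2y = @(j) W_xy/2 - W_xy/W_ij*j;

% Sample points spanning the map, including both edges and the center.
x = [-5; -1.2345; 0;  2.5;  5];
y = [ 5;  3.3333; 0; -2.5; -5];

i = x2i(x);  j = y2j(y);           % miles -> pixels
x_back = i2x(i);  y_back = j2y(j); % pixels -> miles

% Rounding to whole pixels loses at most half a pixel per axis.
tol = 0.5*W_xy/W_ij;
fprintf('max round-trip error: %.4f miles (tolerance %.4f)\n', ...
    max([abs(x_back-x); abs(y_back-y)]), tol);
```

The map center lands on pixel (W_ij/2, W_ij/2), and the largest positive y maps to pixel row 0, which is the top of the image.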