Pokemon Go Lab
Dev Harrington, srh2162
31 October 2016

library(vegan, quietly = TRUE)

## This is vegan 2.4-1

library(clustsig)
library(ggplot2)
library(dplyr, warn.conflicts = FALSE)
library(magrittr)

Inputting the data and doing some cleanup

pokemon <- read.csv('PokemonGo.csv')

pokemon$Species <- gsub(pattern = '', replace = '', pokemon$Species)
pokemon$Species <- as.factor(pokemon$Species)
levels(pokemon$Species)

##  [1] "Abra"       "Bellsprout" "Bulbasaur"  "Caterpie"   "Clefairy"
##  [6] "Cubone"     "Drowzee"    "Ekans"      "Electabuzz" "Exeggcute"
## [11] "Gastly"     "Gengar"     "Golbat"     "Growlithe"  "Haunter"
## [16] "Hitmonchan" "Horsea"     "Hypno"      "Koffing"    "Krabby"
## [21] "Machop"     "Magikarp"   "Magnemite"  "Mankey"     "Meowth"
## [26] "Nidoran"    "Nidoran(F)" "Oddish"     "Paras"      "Pidgey"
## [31] "Poliwag"    "Psyduck"    "Rattata"    "Shellder"   "Slowpoke"
## [36] "Spearow"    "Staryu"     "Tentacool"  "Venonat"    "Voltorb"
## [41] "Weedle"     "Zubat"

pokemon$Location <- as.character(pokemon$Location)
pokemon$Location[which(pokemon$Location == "Little Itlay")] <- 'Little Italy'
pokemon$Location[which(pokemon$Location == "Riverside Park ")] <- 'Riverside Park'
pokemon$Location[which(pokemon$Location == "Morningside Park ")] <- 'Morningside Park'
# Order the sample locations from south to north
pokemon$Location <- factor(pokemon$Location,
                           levels = c('Chinatown', 'Little Italy', 'LES', 'Chelsea',
                                      'Midtown', 'AMNH', 'UWS', 'Morningside Park',
                                      'Riverside Park', '125th St Harlem', 'Bronx Zoo'))
levels(pokemon$Location)

##  [1] "Chinatown"        "Little Italy"     "LES"
##  [4] "Chelsea"          "Midtown"          "AMNH"
##  [7] "UWS"              "Morningside Park" "Riverside Park"
## [10] "125th St Harlem"  "Bronx Zoo"

Some of the character strings for the pokemon species and sample locations included some extra blank spaces or typos; these should be taken care of now. Also, the order of the sample locations is now south to north.
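As a side note, trailing blanks like the ones fixed above can also be stripped in one pass with base R's trimws(), leaving only genuine typos to correct by hand. The sketch below uses a small made-up vector (hypothetical values, not the lab data) and is only meant to illustrate the idea:

```r
# Hypothetical location strings with trailing blanks and one typo
locations <- c("Riverside Park ", "Little Itlay", "Morningside Park ", "Chinatown")

# Remove leading/trailing whitespace from every string at once
locations <- trimws(locations)

# Typos still need an explicit correction
locations[locations == "Little Itlay"] <- "Little Italy"

# Fix the level order explicitly (south to north in this toy example)
locations <- factor(locations,
                    levels = c("Chinatown", "Little Italy",
                               "Morningside Park", "Riverside Park"))
levels(locations)
```

Any string that fails to match one of the listed levels becomes NA in the factor, so checking anyNA(locations) afterwards is a cheap way to catch typos that were missed.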

A first glance

ggplot(data = pokemon) +
  geom_bar(mapping = aes(x = Location, fill = Species)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

[Figure: stacked bar chart of Pokemon counts at each location, filled by species. x-axis: Location; y-axis: count.]

Species accumulation curve

event.list <- unique(pokemon$Location)
nEvents <- length(event.list)
species.list <- vector(mode = 'character')
n.species <- vector(mode = 'numeric')
for (i in 1:nEvents) {
  # Add the species seen at site i to the running list of species seen so far
  species.list <- pokemon %>%
    filter(Location == event.list[i]) %>%
    use_series(Species) %>%
    droplevels() %>%
    levels() %>%
    c(species.list, .) %>%
    unique()

  # Cumulative number of species after i sites
  n.species[i] <- length(unique(species.list))
}

ggplot(mapping = aes(x = 1:nEvents, y = n.species)) +
  geom_point() +
  xlab('Locations Sampled') +
  ylab('Number of Species') +
  theme_bw() +
  theme(panel.grid.major = element_blank())

[Figure: species accumulation curve. x-axis: Locations Sampled; y-axis: Number of Species.]

Thought questions:

1) At the end of the sampling period was the curve increasing or flat? What does that mean for our sampling effort?

It looks like the curve is just starting to flatten out at the end of the sampling period, but it’s still generally increasing. This suggests that the sampling effort may not have been enough to capture the full species richness of Pokemon in NYC.

2) Did you notice the curve being smooth or were there certain stations which created jumps? What might cause a sudden jump in the rate of species accumulation?

There’s a bit of a jump when the second and third sites (Little Italy and the Lower East Side) are added, and again when Morningside Park is added. A jump in the rate of species accumulation is caused by a site having a higher number of species that were not present in the previously sampled sites. In the case of Little Italy and the Lower East Side, I think the apparent jumps might just reflect the high rate of species accumulation that is typical of the first few sites sampled. In other words, they are an effect of the order in which we add the sites, rather than an effect of the species at those sites. As for the jump when Morningside Park is added, perhaps uptown parks represent a substantially different Pokemon habitat type.
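The order effect described in the answer to question 2 can be illustrated with a small base-R sketch. The presence/absence matrix and the helper `accumulate()` below are hypothetical, invented for illustration; they are not the lab data or part of any package. The sketch accumulates richness for one site order, for the reverse order, and averaged over random orders:

```r
# Hypothetical presence/absence matrix: rows = sites, columns = species
pa <- rbind(
  site1 = c(1, 1, 0, 0, 0, 0),
  site2 = c(1, 0, 1, 0, 0, 0),
  site3 = c(0, 1, 1, 1, 0, 0),
  site4 = c(0, 0, 0, 1, 1, 1)
)

# Cumulative number of species when sites are added in a given order
accumulate <- function(mat, site_order) {
  sapply(seq_along(site_order), function(k) {
    seen <- colSums(mat[site_order[1:k], , drop = FALSE]) > 0
    sum(seen)
  })
}

curve_as_sampled <- accumulate(pa, 1:4)  # 2 3 4 6
curve_reversed   <- accumulate(pa, 4:1)  # 3 5 6 6

# Averaging over random site orders smooths out order-specific jumps
set.seed(42)
perm_curves <- replicate(200, accumulate(pa, sample(nrow(pa))))
mean_curve  <- rowMeans(perm_curves)
```

Both orders end at the same total richness, but the jumps fall in different places, which is why a jump tied to one particular site order should be interpreted with caution. vegan's specaccum() provides a permutation-based accumulation curve along the same lines.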

Species diversity indices

sites <- levels(pokemon$Location)
species <- levels(pokemon$Species)
# Site-by-species count matrix
species.mat <- matrix(nrow = length(sites), ncol = length(species))
rownames(species.mat) <- sites
colnames(species.mat) <- species
for (i in 1:nrow(species.mat)) {
  for (j in 1:ncol(species.mat)) {
    species.mat[i, j] <- pokemon %>%
      filter(Location == sites[i], Species == species[j]) %>%
      nrow
  }
}
div.indices <- data.frame(
  location = factor(sites,
                    levels = c('Chinatown', 'Little Italy', 'LES', 'Chelsea',
                               'Midtown', 'AMNH', 'UWS', 'Morningside Park',
                               'Riverside Park', '125th St Harlem', 'Bronx Zoo')),
  shannon = diversity(species.mat, index = 'shannon'),
  simpson = diversity(species.mat, index = 'simpson'),
  pielou = diversity(species.mat, index = 'shannon') / log(specnumber(species.mat)))
div.indices

##                          location  shannon   simpson    pielou
## Chinatown               Chinatown 2.260234 0.8808864 0.9425907
## Little Italy         Little Italy 2.287314 0.8884298 0.9538840
## LES                           LES 2.300611 0.8832000 0.9258338
## Chelsea                   Chelsea 2.145842 0.8760331 0.9766147
## Midtown                   Midtown 2.762350 0.9285714 0.9557074
## AMNH                         AMNH 2.338372 0.8977778 0.9751767
## UWS                           UWS 1.945910 0.8367347 0.9357850
## Morningside Park Morningside Park 2.460909 0.9049587 0.9594375
## Riverside Park     Riverside Park 1.609438 0.7600000 0.8982444
## 125th St Harlem   125th St Harlem 2.399204 0.9012346 0.9655108
## Bronx Zoo               Bronx Zoo 2.260234 0.8808864 0.9425907

ggplot(data = div.indices, mapping = aes(x = location, y = shannon)) +
  geom_point() +
  xlab("Locations Sampled") +
  ylab('Shannon-Wiener Index') +
  theme_bw() +
  theme(panel.grid.major = element_blank()) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

[Figure: Shannon-Wiener index by location. x-axis: Locations Sampled; y-axis: Shannon-Wiener Index.]

ggplot(data = div.indices, mapping = aes(x = location, y = simpson)) +
  geom_point() +
  xlab("Locations Sampled") +
  ylab('Simpson Index') +
  theme_bw() +
  theme(panel.grid.major = element_blank()) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

[Figure: Simpson index by location. x-axis: Locations Sampled; y-axis: Simpson Index.]

Thought questions:

1) Why do we have different measures of diversity? What does it tell you when one site is most diverse by one measure while a second site is most diverse by a different measure?

In general, we have different measures of diversity because they measure different things. Each has a different probabilistic interpretation, though they all can be reasonably lumped under “diversity.” (The Shannon-Wiener index can be interpreted as the uncertainty in predicting the species of an individual sampled at random from your dataset. The Simpson index as computed here by diversity(), which is 1 minus the sum of squared species proportions, can be interpreted as the probability that two individuals sampled (with replacement) from your dataset will be different species.) When different metrics give different results, you can compare those probabilistic interpretations to understand how the sites differ. In this case, the Shannon-Wiener Index and the Simpson Index give pretty similar (but not identical) results:

ggplot(data = div.indices, mapping = aes(x = simpson, y = shannon)) +
  geom_point() +
  xlab("Simpson Index") +
  ylab('Shannon-Wiener Index') +
  theme_bw() +
  theme(panel.grid.major = element_blank())

[Figure: Shannon-Wiener index plotted against Simpson index, one point per location. x-axis: Simpson Index; y-axis: Shannon-Wiener Index.]
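To make the three indices in the table concrete, here is a minimal base-R sketch that computes them by hand for a single site. The counts are made up for illustration (hypothetical, not from the lab data), and the formulas assume the definitions used by vegan's diversity(): natural-log Shannon entropy, and Simpson reported as 1 minus the sum of squared proportions.

```r
# Hypothetical counts of three species at a single site
counts <- c(Pidgey = 5, Rattata = 3, Zubat = 2)

# Relative abundance of each species
p <- counts / sum(counts)

# Shannon-Wiener index: uncertainty about the species of a random individual
shannon <- -sum(p * log(p))

# Simpson index (1 - D): chance that two random draws are different species
simpson <- 1 - sum(p^2)

# Pielou's evenness: Shannon relative to its maximum, log(number of species)
pielou <- shannon / log(length(counts))

round(c(shannon = shannon, simpson = simpson, pielou = pielou), 3)
```

Because Shannon weights rare species more heavily than Simpson does, two sites can swap ranks between the two indices when one has many rare species and the other has a few evenly abundant ones.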

2) Are there any geographic patterns in diversity? What would this tell us about the distribution of Pokemon in NYC?

The four southernmost locations have pretty similar diversity levels, whereas the uptown sites (AMNH, UWS, Morningside Park, Riverside Park, 125th St Harlem) are much more variable. It’s not completely clear whether that difference is an effect of the north-south gradient and is therefore a geographic pattern, or whether it reflects other site characteristics. For example, the four southernmost locations are all mixed residential and business districts (and could be considered homogeneous in that sense), whereas the uptown sites are a museum, two parks, a primarily residential neighborhood, and a commercial street in a primarily residential neighborhood (and could be considered more heterogeneous). The site characteristic hypothesis could explain the high Pokemon diversity in Midtown, as potentially related to the high density of human pedestrian traffic there. The low diversity in Riverside Park could be related to its proximity to the Hudson River, rather than its nature as a park or its relatively low human pedestrian density. Sampling more sites near large bodies of water (e.g. Battery Park, the Cloisters, Brooklyn Bridge) could help determine whether the low diversity is driven by proximity to a large body of water.

3) If a tourist with only a limited amount of time to play Pokemon Go asked you where to play, what neighborhood would you say was the best place to go? Why?

If the tourist was interested in catching the most diverse set of Pokemon possible, I would recommend playing in Midtown. It has the highest diversity according to both indices. There are also other points of interest to tourists there, which are both more likely to have Pokemon and potentially of interest in themselves.

Community similarity

site.mat <- dist(species.mat, upper = TRUE, method = 'binary')
tree <- hclust(site.mat)
plot(tree, xlab = 'Locations')

[Figure: Cluster Dendrogram of the sample locations, hclust (*, "complete"). x-axis: Locations; y-axis: Height.]
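For reference, dist() with method = 'binary' ignores counts and compares presence/absence only: the distance between two sites is the share of species found at exactly one of them, among the species found at either site. This is the Jaccard distance. A minimal sketch with two made-up sites (hypothetical data) checks the hand calculation against dist():

```r
# Hypothetical counts of five species at two sites
site_a <- c(3, 1, 0, 2, 0)
site_b <- c(1, 0, 0, 4, 2)

# Species present at either site, and species present at exactly one site
present_either   <- (site_a > 0) | (site_b > 0)
present_only_one <- xor(site_a > 0, site_b > 0)

# Binary (Jaccard) distance computed by hand
jaccard_by_hand <- sum(present_only_one) / sum(present_either)

# The same quantity from base R's dist()
jaccard_from_dist <- as.numeric(dist(rbind(site_a, site_b), method = "binary"))

c(by_hand = jaccard_by_hand, from_dist = jaccard_from_dist)  # both 0.5
```

Because abundances are discarded, two sites that share the same species list are at distance 0 even if one site has many more individuals than the other.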

The simprof.plot() function is giving me errors, so I’m not plotting with the significance, but I did find that simprof() creates a slightly different dendrogram:

tree2 <- simprof(data = species.mat, method.distance = 'binary')
plot(tree2$hclust, pch = 16)
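One likely reason the two dendrograms differ is the linkage method: the plot captions report "complete" linkage for the hclust() tree and "average" linkage for the simprof() tree. The sketch below uses a made-up presence/absence matrix (hypothetical data) to show how changing only the method argument of hclust() changes the merge heights while the distance matrix stays the same:

```r
# Hypothetical presence/absence data for four sites and six species
toy <- rbind(
  A = c(1, 1, 0, 0, 1, 0),
  B = c(1, 0, 1, 0, 1, 0),
  C = c(0, 1, 1, 1, 0, 0),
  D = c(0, 0, 0, 1, 1, 1)
)
toy_dist <- dist(toy, method = "binary")

# Same distances, two linkage rules
tree_complete <- hclust(toy_dist, method = "complete")
tree_average  <- hclust(toy_dist, method = "average")

# Complete linkage merges at the largest between-cluster distance,
# average linkage at the mean between-cluster distance
rbind(complete = tree_complete$height, average = tree_average$height)
```

Passing method = "average" to hclust() in the earlier call would be one way to check whether linkage alone accounts for the difference.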

[Figure: Cluster Dendrogram from simprof(), hclust (*, "average") on rawdata.dist. y-axis: Height.]

Thought questions:

1) Did sites taken from within the same neighborhood cluster together?

Not particularly. For example, Little Italy and Chinatown are in different clusters and all the uptown sites are in different clusters.

2) Was there a geographic pattern present? If so, what is it?

Not particularly.

3) Did you see a habitat influence (e.g. sites in parks clustered together while sites in urban blocks clustered together)?

Not particularly. At virtually every level, subtrees include habitats of different kinds.

4) What communities were the most different? Why do you think these communities were the most dissimilar?

The biggest difference is between Riverside Park and everything else. Riverside Park is different from the other sites for a couple of reasons: it’s the closest to a large body of water (the Hudson River), and depending on exact sampling locations, it might have the lowest density of human pedestrian traffic. To better understand whether those characteristics might be driving the difference, I would recommend sampling other sites near large bodies of water (the Cloisters, Battery Park, Brooklyn Bridge, Coney Island, etc.) and some other sites with relatively low human pedestrian traffic (such as quiet parts of residential neighborhoods and quiet parts of other parks).

NMDS

nmds <- metaMDS(species.mat,k=2)

The Shepard plot below looks pretty decent. There isn’t too much scatter around the line, suggesting that the original dissimilarities are reasonably well preserved in two dimensions.

stressplot(nmds)

[Figure: Shepard plot. x-axis: Observed Dissimilarity; y-axis: Ordination Distance. Non-metric fit, R2 = 0.984; linear fit, R2 = 0.914.]
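The stress summarizes the scatter in a Shepard plot: it compares the ordination distances with a monotone (step-function) fit of those distances against the observed dissimilarities. The sketch below computes Kruskal's stress-1 by hand in base R. It is an illustration under stated assumptions, not what metaMDS() does internally: the data are made up, and cmdscale() (classical MDS) stands in for the NMDS configuration.

```r
# Hypothetical site-by-species counts
toy <- rbind(
  s1 = c(5, 0, 2, 1),
  s2 = c(4, 1, 0, 3),
  s3 = c(0, 6, 1, 0),
  s4 = c(1, 5, 3, 2),
  s5 = c(2, 2, 6, 0)
)

observed <- as.vector(dist(toy))        # observed dissimilarities
config   <- cmdscale(dist(toy), k = 2)  # 2-D configuration (stand-in for NMDS)
ordin    <- as.vector(dist(config))     # distances in the ordination

# Monotone regression of ordination distance on observed dissimilarity
ord_idx <- order(observed)
fitted  <- isoreg(observed[ord_idx], ordin[ord_idx])$yf

# Kruskal's stress-1: residual scatter relative to the ordination distances
stress1 <- sqrt(sum((ordin[ord_idx] - fitted)^2) / sum(ordin^2))

# The "non-metric fit" statistic reported by vegan's stressplot()
nonmetric_r2 <- 1 - stress1^2
```

The same relationship can be checked against this report's own numbers: a stress of 0.1270986 gives 1 - 0.1270986^2, or about 0.984, which matches the non-metric fit R2 shown in the Shepard plot.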

The stress is 0.1270986, which is decent. A basic NMDS plot:

plot(nmds)

[Figure: basic NMDS plot. x-axis: NMDS1; y-axis: NMDS2.]

An NMDS plot with the species and sites labeled:

ordiplot(nmds, type = 'n')
orditorp(nmds, display = 'species', col = 'red', air = 0.01)
orditorp(nmds, display = 'sites', air = 0.01)

[Figure: NMDS plot with species (red) and sites labeled. x-axis: NMDS1; y-axis: NMDS2.]