University of New Hampshire Scholars' Repository

Master's Theses and Capstones Student Scholarship

Summer 2018

Abundance data change geographic estimates of terrestrial invasive plant risk in the contiguous United States

Mitchell W. O'Neill University of New Hampshire, Durham


Recommended Citation O'Neill, Mitchell W., "Abundance data change geographic estimates of terrestrial invasive plant risk in the contiguous United States" (2018). Master's Theses and Capstones. 1207. https://scholars.unh.edu/thesis/1207

This Thesis is brought to you for free and open access by the Student Scholarship at University of New Hampshire Scholars' Repository. It has been accepted for inclusion in Master's Theses and Capstones by an authorized administrator of University of New Hampshire Scholars' Repository.

ABUNDANCE DATA CHANGE GEOGRAPHIC ESTIMATES OF TERRESTRIAL PLANT INVASION RISK IN THE CONTIGUOUS UNITED STATES

BY

MITCHELL W. O’NEILL

Bachelor of Arts, Biology & Studio Art, University of Vermont, 2015

THESIS

Submitted to the University of New Hampshire in Partial Fulfillment of the Requirements for the Degree of

Master of Science in Natural Resources: Environmental Conservation

September, 2018


ALL RIGHTS RESERVED © 2018 Mitchell W. O’Neill


This thesis has been examined and approved in partial fulfillment of the requirements for the degree of Master of Science in Natural Resources: Environmental Conservation by:

Thesis Director, Dr. Jenica M. Allen, Assistant Professor, Natural Resources & the Environment

Dr. Thomas D. Lee, Associate Professor, Natural Resources & the Environment

Dr. Rebecca J. Rowe, Associate Professor, Natural Resources & the Environment

On May 22, 2018

Original approval signatures are on file with the University of New Hampshire Graduate School.


ACKNOWLEDGEMENTS

First, I’d like to thank my advisor, Dr. Jenica Allen, for supporting and challenging me as a budding scientist. I will forever appreciate being treated as an intellectual equal when I clearly have a lot to learn, and for your endless dedication to my success as a graduate student as well as my professional development. Working with you has been a pleasure. I would also like to thank my committee members Dr. Rebecca Rowe and Dr. Thomas Lee for their support of my graduate career and valuable insights on my thesis work. In addition, I thank Tom for his mentorship during my first semester here at UNH as his TA. I thank Dr. Bethany Bradley at the University of Massachusetts for her insights on my project from its conception to its completion. I also thank Rebekah Wallace, the EDDMapS Data coordinator, for sending us a full download of the database that allowed for the research in this document, and for her help in interpreting the abundance data.

Thank you to the Allen Lab (Dr. Jenica Allen, Dr. Tim Szewczyk, Katie Moran, and Ceara Sweetser) for the interesting and informative discussions on invasive species and the regular feedback on my research progress. I specifically thank Ceara for numerous data processing contributions to this project, and Tim for his advice on the final edits of this thesis and his help with various technical struggles.

I also thank my parents, without whom I would never have gotten to this point where I have the privilege of pursuing what fascinates me. Last but not least, I thank my girlfriend, Tori, for joining me in New Hampshire so that I could pursue my interests and for supporting and inspiring me every single day.

This project was funded by the National Science Foundation (BCS Award # 1560925).


TABLE OF CONTENTS

COPYRIGHT PAGE

COMMITTEE PAGE

ACKNOWLEDGEMENTS

LIST OF FIGURES

LIST OF TABLES

ABSTRACT

CHAPTER

I. Invasive plant impacts, management, and biogeography

Why are invasive plants important?

Strategies for managing invasive plants

Invasive plant biogeography

Summary

II. Abundance data change geographic estimates of terrestrial invasive plant risk in the contiguous United States

Introduction

Methods

Study region

Invasive species data

Environmental data

Species Distribution Modeling

Species-level abundance assessments

Hotspot comparison

Results

Model quality and species retained

Establishment and impact ranges

Invasive species hotspots

Discussion

Conclusion

LIST OF REFERENCES

APPENDIX A: Ecoregions and land cover in the contiguous United States

APPENDIX B: Invasive species data in the United States

APPENDIX C: Ordinal ranking and data cleaning for multiple abundance metrics

APPENDIX D: Abundance-based species distribution modeling using infested area

APPENDIX E: Intermediate model results

LIST OF FIGURES

1. Visualization of the modeling process for an example species

2. Composition of species traits for 1,045 terrestrial invasive plants, 155 species with sufficient abundance data, and 70 species that passed through all three model screenings

3. Histogram of Cohen’s kappas for the best fit ordinal regressions for the 70 species included in the analysis

4. Histograms of potential establishment range size, potential impact range size, and percent infilling

5. The relationship between range size and sample size and between establishment range size and impact range size

6. Invasive species richness and abundance hotspots overlaid and the proportion of high-abundance species

7. Ecoregion composition of the contiguous U.S., potential invasive species richness hotspots, and abundance hotspots

A1. Level I ecoregions in the contiguous U.S.

A2. Geographic distribution of absence records

A3. Spatial bias surface from modeling sampling effort as a response to road and human population density across the contiguous U.S.

A4. Box plot illustrating the congruency between infested area and text data

A5. Map of potential species richness and the potential number of species at high abundance

LIST OF TABLES

1. Summary of the combination of several abundance metrics into an ordinal abundance ranking scheme

2. Summary of sequential model screening

A1. Area and percent of the contiguous U.S. belonging to each ecoregion and land cover class

A2. Aggregation and selection of proportional land cover classes

A3. Invasive plant list sources for each state in the contiguous U.S.

A4. Invasive species abundance data availability by abundance metric

A5. Pairwise correlations between all abundance metrics

A6. Summary of model results for all 155 species with adequate data

ABSTRACT

ABUNDANCE DATA CHANGE GEOGRAPHIC ESTIMATES OF TERRESTRIAL PLANT INVASION RISK IN THE CONTIGUOUS UNITED STATES

by

Mitchell W. O’Neill

University of New Hampshire, September, 2018

Invasive plants pose severe threats to ecosystems worldwide. Geographic invasion risk is often assessed using occurrence-based species distribution modeling, which estimates where species can occur but cannot represent variation in abundance, a crucial component of impact. I assembled an ordinal abundance dataset for 155 terrestrial invasive plant species in the contiguous United States based on reports to the Early Detection & Distribution Mapping System, a georeferenced invasive species data repository. I used maximum entropy models to estimate range boundaries and ordinal regression to estimate abundance within ranges. I predicted the areas where establishment is possible (potential establishment ranges), and where high abundance is likely (potential impact ranges) for each species, and compared where many species may establish (richness hotspots) to areas where many species may achieve high abundance (abundance hotspots). The potential impact range encompasses a small portion of the potential establishment range for many species (median: 9%). High abundance populations have not yet been observed across much of potential impact ranges for each species, indicating substantial risk for expansion. Forty-seven percent of the richness hotspots were overlapped by the abundance hotspots. Eastern Temperate Forests constituted large portions of both hotspots (80% of richness hotspots, 49% of abundance hotspots), but abundance hotspots included large areas not identified by richness hotspots, particularly in western U.S. ecoregions. The delineation of impact ranges within establishment ranges can facilitate the prioritization of management to species and geographic areas associated with the highest impact risks. Using establishment ranges alone could over-estimate geographic invasion risk in areas where many species can establish but few are likely to be abundant, and under-estimate geographic invasion risk in areas where not many species can establish, but high abundance is likely for many of the potential establishers. Given the potential for abundance data to refine estimates of geographic invasion risk, more spatial analyses of invasive plants should include abundance data, and more abundance data should be collected to allow additional comprehensive analyses with greater numbers of species.

CHAPTER I:

Invasive plant impacts, management, and biogeography

Why are invasive plants important?

Human activities drive profound changes in global ecology, including the movement of species beyond natural dispersal capabilities (Millennium Ecosystem Assessment, 2005). Sometimes, this movement results in biological invasions where alien species have severe negative impacts on native ecosystems (Vilà et al., 2011; Simberloff et al., 2013; Blackburn et al., 2014). Human-mediated biotic exchange has increased over the past two centuries (Hulme, 2009; Seebens et al., 2017), and novel introductions continue due to expanding trade networks (Hulme, 2009; Bradley et al., 2012; Early et al., 2016; Seebens et al., 2018). Terrestrial plants are among the most studied taxa in biological invasion literature and are well-known for their impacts on native environments (Pyšek et al., 2008; Vilà et al., 2011). Invasive plants threaten native communities (Hejda et al., 2009), disrupt ecosystem services (Hejda et al., 2009; Vilà et al., 2011; Simberloff et al., 2013; Blackburn et al., 2014), and have negative economic impacts (Mack et al., 2000; Leung et al., 2002; Pimentel et al., 2005; Olson, 2006; Jardine & Sanchirico, 2018).

United States Executive Order 13112 (1999) defines invasive species as species that occur outside of their native ranges and negatively impact environments, economies, or human health. Invasive plant species have complex and highly variable effects at multiple ecological levels (Vilà et al., 2011; Blackburn et al., 2014) and spatial scales (Powell et al., 2013). At the population level, invasive plants tend to decrease the abundance of native species through competition (Vilà et al., 2011), though invasion-driven extirpations are rare (Gurevitch & Padilla, 2004; Powell et al., 2013). The impacts of invasive species on community-level biodiversity are widely studied, but results are mixed (e.g., Gurevitch & Padilla, 2004; Hejda et al., 2009), vary with scale (Powell et al., 2013), and largely depend on the traits of the invasive species and the invaded environment (Pyšek et al., 2012). At regional scales, plant invasion can increase species richness if all species are included in biodiversity calculations regardless of nativity, because species are added and extinctions driven by invasive plants are rare (Gurevitch & Padilla, 2004; Powell et al., 2013). However, the reduction of native species abundance at the population level (Vilà et al., 2011) can reduce plant biodiversity at the community level and at the landscape scale (Hejda et al., 2009; Pyšek et al., 2012). Invasive plants can also drive changes in the trophic structure of biological communities (Pearson, 2009; Burghardt et al., 2010) and dramatically alter ecosystem processes including microbial activity, nutrient availability, and nutrient cycling at local to regional scales (Ehrenfeld, 2010; Vilà et al., 2011; Pyšek et al., 2012).

The impacts of invasive plants at different ecological levels can be interrelated (Gurevitch et al., 2011). For example, impacts on nutrient cycling could also have effects on individual plant species performance, which can propagate changes in community structure (Vilà et al., 2011). Additionally, some invasive plants alter disturbance regimes (e.g., fire), sometimes in ways which favor the establishment of new individuals of the invasive species (Mack & D’Antonio, 1998; Brooks et al., 2004). Synergistic interactions among invasive species may accelerate the impacts of invasion on native ecosystems in a phenomenon known as invasional meltdown (Simberloff & Von Holle, 1999). In some cases established invasive plants facilitate the establishment of new invasive species (Simberloff, 2006; Braga et al., 2018), particularly similar plant species (Sheppard et al., 2018), but community and ecosystem-wide impacts of invasion synergism are not yet well understood (Simberloff, 2006; Braga et al., 2018).

In addition to driving adverse ecological and environmental effects, invasive plants can disrupt ecosystem services, causing harm to humans (Pyšek & Richardson, 2010) and economies (Pimentel et al., 2005; Stohlgren & Schnase, 2006; Hester et al., 2009; Yokomizo et al., 2009; Sheley et al., 2015). Agricultural weeds can severely reduce crop yields (McDonald et al., 2009; Yokomizo et al., 2009), and poison livestock or reduce forage quality and quantity (DiTomaso, 2000). Many invasive plants can cause skin damage and allergic reactions in humans (Pyšek & Richardson, 2010). Managing invasive plant populations is very expensive (Leung et al., 2002; Jardine & Sanchirico, 2018), so preventing plant invasion is both economically and environmentally beneficial (Sheley et al., 2015).

Strategies for managing invasive plants

Strategies for managing invasive plant impacts range from proactive prevention to reactive mitigation (Hulme, 2006). For invasive species that are not yet established in a region, managers can employ prevention strategies including border control and quarantine measures to manage the invasion pathway directly (Leung et al., 2002; Hulme, 2006, 2009; Pyšek & Richardson, 2010). As some potential invaders will inevitably pass through preventative screening measures, managers also apply the method of early detection and rapid response (EDRR) to species that are not yet well-established in the region of interest (Hulme, 2006). Managers conducting EDRR monitor the region of interest for new invasive plant species and eliminate them soon after detection (Hulme, 2006; Pyšek & Richardson, 2010). This strategy focuses on nascent populations, which can be eradicated much more effectively than well-established populations (Moody & Mack, 1988; Hulme, 2006).

Eradication is typically infeasible for well-established invasive plants, but these species can be actively managed to minimize potential impacts (Hulme, 2006). Established invasive plant species in the United States exhibit low infilling, meaning that many suitable areas are not yet infested and there is high risk of invasion into new areas within potential ranges (Bradley et al., 2015; Allen & Bradley, 2016). Therefore, containment is applied to invasive species that are well-established in the region of interest: managers control the spread from occupied areas to unoccupied areas within the region of establishment by monitoring for and addressing nascent populations in previously uninvaded areas (Hulme, 2006). In locations already infested by the invasive species, managers can mitigate negative impacts using management techniques such as herbicide treatment, prescribed burning, restoration, and biological control (Pyšek & Richardson, 2010). The considerable effort required to monitor for new incursions of invasive plants (whether within the establishment region for containment or in new regions for EDRR) highlights the need for prioritizing some species over others (Hulme, 2006; Pyšek & Richardson, 2010).

Invasive plant biogeography

Modeling geographic invasion risk is a useful way to estimate the potential impact of invasive species across space (Ibáñez et al., 2009a, 2009b; O’Donnell et al., 2012; Gallagher et al., 2013; Allen & Bradley, 2016). Correlative species distribution models (SDMs), alternatively referred to as environmental niche models, are often used to address biogeographical questions about the distributions of invasive species (e.g., Ibáñez et al., 2009a, 2009b; O’Donnell et al., 2012; Gallagher et al., 2013; Allen & Bradley, 2016). SDMs are statistical models that relate observations of a species in nature to the environmental conditions at the locations of those observations (Guisan & Zimmermann, 2000; Elith & Leathwick, 2009). Climate is the main driver of species distributions at regional and continental scales (Pearson & Dawson, 2003), so most SDMs relate species datasets to climate conditions in order to predict coarse-resolution distributions across large spatial extents (e.g., Franklin & Miller, 1997; Guisan & Zimmermann, 2000; Austin, 2007; Ibáñez et al., 2009a, 2009b; Elith & Leathwick, 2009; Kulhanek et al., 2011; O’Donnell et al., 2012; Peterson & Soberón, 2012; Gallagher et al., 2013; Bradley et al., 2015; Allen & Bradley, 2016; Bradley, 2016; Hui et al., 2016; Cross et al., 2017). Land cover may also be a useful predictor of invasive species distributions at the regional scale (Ibáñez et al., 2009a, 2009b; Merow et al., 2011; Vilà & Ibáñez, 2011; Allen et al., 2013).
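The idea of a correlative SDM, relating observations to environmental conditions at the observation locations, can be illustrated with a small sketch. This is not the modeling approach used in this thesis (which relies on maximum entropy models for range boundaries); it swaps in plain logistic regression on presence versus background points, and the data, predictor names, and parameter values are all made up for illustration.

```python
import math
import random

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit a logistic regression by batch gradient descent."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def suitability(w, b, x):
    """Predicted relative suitability (0 to 1) for environmental conditions x."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

# Synthetic, standardized predictors: [temperature, precipitation].
# Assumption for illustration: the species is recorded mostly at warm sites.
random.seed(42)
presence = [[random.gauss(1.0, 0.5), random.gauss(0.0, 1.0)] for _ in range(100)]
background = [[random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)] for _ in range(100)]

X = presence + background
y = [1] * len(presence) + [0] * len(background)
weights, bias = fit_logistic(X, y)

warm_site = suitability(weights, bias, [1.5, 0.0])
cold_site = suitability(weights, bias, [-1.5, 0.0])
print(f"warm site: {warm_site:.2f}, cold site: {cold_site:.2f}")
```

Applying the fitted function to the climate values of every grid cell would give a coarse suitability surface; turning that surface into a range map requires a threshold, which is a separate modeling choice.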

Understanding the biogeography of invasive plants is key to management efforts including prevention, EDRR, and containment. The considerable effort required to monitor for new incursions of invasive plants and the often large number of potential invasive species warrant the prioritization of species most likely to occur and negatively impact native ecosystems (Hulme, 2006; Pyšek & Richardson, 2010). Modeling the distributions of invasive species can identify which species could establish in particular areas (e.g., Ibáñez et al., 2009a, 2009b; O’Donnell et al., 2012; Gallagher et al., 2013; Allen & Bradley, 2016), and which ones are likely to be dominant if they do establish (e.g., Kulhanek et al., 2011; Curtis & Bradley, 2015; Bradley, 2016). Managers can use this information to cost-effectively target monitoring and management to the species that pose the greatest threats in a given area (Pyšek & Richardson, 2010). A spatial analysis of abundance data can identify appropriate areas for containment (areas of potential establishment within the region an invasive plant has already established) or EDRR (areas outside of the region in which an invasive plant has established but where the conditions are likely to support high abundance).
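The distinction between containment areas and EDRR areas described above can be written as a simple decision rule for a single species and a single map cell. The function name, argument names, and the "lower priority" label are hypothetical and only restate the definitions in the text; they are not part of any method reported here.

```python
def management_strategy(in_established_region, can_establish, high_abundance_likely):
    """Suggest a strategy for one species in one map cell.

    in_established_region: the cell lies within the region where the
        species has already established.
    can_establish: the cell is within the potential establishment range.
    high_abundance_likely: conditions in the cell are likely to support
        high abundance.
    """
    if in_established_region and can_establish:
        return "containment"
    if not in_established_region and high_abundance_likely:
        return "EDRR"
    return "lower priority"  # illustrative label, not from the thesis

print(management_strategy(True, True, False))    # containment
print(management_strategy(False, True, True))    # EDRR
print(management_strategy(False, True, False))   # lower priority
```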

Potential invasive species richness, which is used as a metric of geographic invasion risk, varies substantially across geographic space (e.g., Ibáñez et al., 2009a, 2009b; O’Donnell et al., 2012; Gallagher et al., 2013; Allen & Bradley, 2016). This spatial variation has prompted the adaptation of methods for assessing native biodiversity to quantifying geographic invasion risk across large spatial extents using species’ potential distributions (Ibáñez et al., 2009b; O’Donnell et al., 2012; Gallagher et al., 2013; Allen & Bradley, 2016). Delineating geographic regions of high native species richness as biodiversity hotspots highlights areas of particular conservation significance (Myers, 1988). Prioritizing biodiversity hotspots in the allocation of limited conservation resources provides a systematic method for addressing broad-scale extinction that maximizes the number of species preserved per dollar spent (Myers et al., 2000).

Similarly, identifying invasion hotspots, defined as geographic regions with conditions suitable to large numbers of invasive species, reveals where management and prevention efforts are most needed to mitigate potential invasion impacts across large spatial extents (Ibáñez et al., 2009b; O’Donnell et al., 2012; Gallagher et al., 2013; Allen & Bradley, 2016). Delineating hotspots from invasive species’ potential distributions could cost-effectively direct the allocation of finite resources for management and prevention efforts across large geographic extents (O’Donnell et al., 2012). Hotspot analyses based solely on where species could occur assume equal potential impact from all species able to establish at any location. Abundance can estimate impact (Parker et al., 1999; Brooks et al., 2004; Stohlgren & Schnase, 2006; Yokomizo et al., 2009; Kulhanek et al., 2011; Seabloom et al., 2013) and varies within ranges (Brown, 1984; McGill et al., 2007), so incorporating potential levels of abundance into invasion hotspot analysis could change geographic invasion risk assessment. Delineating regions of particularly high invasive plant abundance may highlight different regions where management is most needed.
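A toy example shows how hotspots based on where species could occur can differ from hotspots based on where species could be abundant. The sketch below stacks binary range maps for three hypothetical species over eight map cells and flags the top quarter of cells as hotspots. The 25% cutoff, the species names, and the layers are assumptions chosen for illustration, not the hotspot definition used in this thesis.

```python
def stack_counts(ranges):
    """Sum binary species layers cell by cell."""
    layers = list(ranges.values())
    return [sum(layer[i] for layer in layers) for i in range(len(layers[0]))]

def hotspot_cells(counts, top_fraction=0.25):
    """Cells whose species count ties or exceeds the top-fraction cutoff."""
    k = max(1, round(len(counts) * top_fraction))
    cutoff = sorted(counts, reverse=True)[k - 1]
    return {i for i, c in enumerate(counts) if c >= cutoff and c > 0}

# 1 = cell is inside the species' range, 0 = outside (hypothetical layers).
establishment = {
    "sp_a": [1, 1, 1, 1, 0, 0, 0, 0],
    "sp_b": [1, 1, 1, 0, 0, 0, 0, 0],
    "sp_c": [1, 1, 0, 0, 1, 0, 0, 0],
}
# Cells where high abundance is likely; each is a subset of the layer above.
impact = {
    "sp_a": [1, 0, 0, 1, 0, 0, 0, 0],
    "sp_b": [1, 0, 1, 0, 0, 0, 0, 0],
    "sp_c": [0, 0, 0, 0, 1, 0, 0, 0],
}

richness_hot = hotspot_cells(stack_counts(establishment))
abundance_hot = hotspot_cells(stack_counts(impact))
overlap = len(richness_hot & abundance_hot) / len(richness_hot)

print(sorted(richness_hot))   # [0, 1]
print(sorted(abundance_hot))  # [0, 2, 3, 4]
print(overlap)                # 0.5
```

In this toy case, cell 1 is a richness hotspot where no species is expected to be abundant, while cells 2 to 4 are abundance hotspots that a richness-only analysis would miss.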

Summary

Invasive plants can have profound negative ecological impacts at multiple biological levels and across spatial scales, as well as negative economic impacts. Geographic regions with especially high invasion risk are areas of particular management concern. We can estimate how invasion risk varies across geographic space using SDMs based on occurrence data and/or abundance data. While modeling occurrence can tell us where plant species can establish, modeling abundance can tell us where species will have the highest impacts within their potential ranges. Using abundance data adds detail that is valuable to managers concerned with ecological and economic impacts and may refine broad-scale assessments of geographic invasion risk.


CHAPTER II:

Abundance data change geographic estimates of terrestrial invasive plant risk in the contiguous United States

INTRODUCTION

Invasive plants pose severe threats to native ecosystems across the globe (Millennium Ecosystem Assessment, 2005). Many of these invasive species are capable of outcompeting native plants, reducing local biodiversity, altering nutrient cycling, and disrupting ecosystem function (Vilà et al., 2011; Pyšek et al., 2012; Simberloff et al., 2013). Invasive plants alter their environments at a variety of ecological levels (Vilà et al., 2011) and spatial scales (Powell et al., 2013), so understanding how their threat varies across geographic space is important to the conservation of native ecosystems. Geographic invasion risk is one way to estimate the potential impact of invasive species across space (Ibáñez et al., 2009a, 2009b; O’Donnell et al., 2012; Gallagher et al., 2013; Allen & Bradley, 2016).

We can assess the ranges of invasive species across large extents to examine broad-scale geographic invasion risk (e.g., Ibáñez et al., 2009a, 2009b; O’Donnell et al., 2012; Gallagher et al., 2013; Allen & Bradley, 2016). The potential establishment range of an invasive species represents the geographic area in which environmental conditions are suitable to its occurrence (Austin, 2007), so estimating the potential establishment ranges of many species identifies which species can invade any given area (Parker et al., 1999). The number of species that can invade varies substantially across geographic space, prompting the identification of areas with high potential invasive species richness as invasion hotspots (Ibáñez et al., 2009b; O’Donnell et al., 2012; Gallagher et al., 2013; Allen & Bradley, 2016). Invasion hotspots represent sub-regions of a study area that have the highest geographic invasion risk and require particular management attention (O’Donnell et al., 2012).

Management strategies depend on the extent of invasion (Hulme, 2006). While prevention or early detection and rapid response (EDRR) are applied to invasions that are new or unknown in the region of interest, containment is applied to well-established invasions: managers control the spread from occupied areas to unoccupied areas within the region of establishment (Hulme, 2006). Established invasive plant species in the United States often exhibit relatively low range infilling, meaning that much of the potential establishment ranges are not yet infested and there is high risk for invasion into new areas within the region of establishment (Bradley et al., 2015; Allen & Bradley, 2016). Therefore, estimating potential establishment ranges of well-established species highlights areas where containment is appropriate: unoccupied areas within the potential establishment range. For successful EDRR and containment, monitoring efforts must often be prioritized to a subset of all potential invaders due to the difficulty in detecting nascent low-abundance populations and the prohibitive length of potential invasive species lists (Hulme, 2006; Pyšek & Richardson, 2010).

While the establishment range represents the area in which a species could affect native ecosystems, abundance is an additional component of impact (Parker et al., 1999; Stohlgren & Schnase, 2006; Seabloom et al., 2013) that varies within the establishment range (Brown, 1984; McGill et al., 2007). Local populations with high abundance generally have larger ecological and economic impacts than low-abundance populations (e.g., Medd et al., 1985; Bobbink & Willems, 1987; Standish et al., 2001; Alvarez & Cushman, 2002; Hester et al., 2006; Yokomizo et al., 2009). Therefore, we can use abundance to estimate potential impacts (Parker et al., 1999; Brooks et al., 2004; Stohlgren & Schnase, 2006; Yokomizo et al., 2009; Kulhanek et al., 2011; Seabloom et al., 2013) and identify areas of particular management concern (Hulme, 2006; McDonald et al., 2009; Bradley, 2016; Cross et al., 2017). Hereafter I refer to the full geographic area where an invasive species can potentially establish as the potential establishment range, and the subset of this range where high abundance is likely as the potential impact range. Geographic invasion risk assessments based solely on potential establishment ranges assume uniform impacts across species ranges, so incorporating potential levels of abundance into invasion hotspot analysis could change geographic invasion risk assessment.

Species distribution models (SDMs) can be constructed using records of species occurrence or abundance (Guisan & Zimmermann, 2000; Potts & Elith, 2006; Elith & Leathwick, 2009). Georeferenced occurrence and abundance datasets have been aggregated from field surveys and herbaria in large volumes, especially in recent years (Elith & Leathwick, 2009; e.g., EDDMapS, 2016). Specimen and crowd-sourced datasets provide excellent opportunities for biogeographic analyses given the variety of statistical and machine learning tools available to model distributions (Guisan & Zimmermann, 2000; Potts & Elith, 2006; Elith & Leathwick, 2009) while accounting for biases and data quality issues in nonrandom samples (Phillips & Dudík, 2008; Merow et al., 2013, 2016; Bird et al., 2014). Specimen data, which are occurrence-only, are generally more common than data with abundance information (Bradley et al., 2018). Therefore, spatial analyses of many species across large geographic areas typically use occurrence datasets without incorporating information on abundance (e.g., Ibáñez et al., 2009a, 2009b; O’Donnell et al., 2012; Gallagher et al., 2013; Allen & Bradley, 2016).

Modeling abundance data would allow characterization of within-range variation in risk levels associated with invasive species impacts (Parker et al., 1999; Brooks et al., 2004; Stohlgren & Schnase, 2006; Yokomizo et al., 2009; Kulhanek et al., 2011; Seabloom et al., 2013). This added specificity in potential risk levels is an important addition for managers concerned with environmental and economic impacts (Hulme, 2006; McDonald et al., 2009; Bradley, 2016). However, abundance data require greater effort from the collector, who must choose a metric of abundance and make an estimate beyond simply recording the occurrence of the species (Pearce & Boyce, 2006; Bradley, 2016; Bradley et al., 2018). As such, reliable abundance data are less available than occurrence data at regional scales, presenting a barrier to the use of abundance data in SDMs (Pearce & Boyce, 2006; Bradley, 2016; Bradley et al., 2018).

Since occurrence data are more common than abundance data (Parker et al., 1999; Stohlgren & Schnase, 2006; Seabloom et al., 2013), using occurrence-based models to estimate abundance would be useful (Pearce & Ferrier, 2001; Nielsen et al., 2005; VanDerWal et al., 2009; Jiménez-Valverde et al., 2009; Estes et al., 2013; Gutiérrez et al., 2013; Bradley, 2016; Cross et al., 2017). Correlations between occurrence-based suitability and observed abundance are mixed for vertebrates (Pearce & Ferrier, 2001; Nielsen et al., 2017) and invertebrates (Jiménez-Valverde et al., 2009; Gutiérrez et al., 2013), but generally poor for plants (Pearce & Ferrier, 2001; Cross et al., 2017; Nielsen et al., 2017) unless based on occurrences of high abundance (Estes et al., 2013; Bradley, 2016). Invasive plant SDMs based solely on occurrence rarely reflect variation in abundance (Bradley, 2016). The ecological importance and management implications of abundance (Parker et al., 1999; Stohlgren & Schnase, 2006; Seabloom et al., 2013) and generally poor ability of occurrence-based models to predict abundance (Bradley, 2016) warrant modeling abundance data directly, despite potentially limited data availability.
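The relationship between occurrence-based suitability and observed abundance discussed above can be checked with a rank correlation when abundance is recorded on an ordinal scale. The sketch below assumes Spearman's rank correlation with average ranks for ties; the cited studies may use other statistics, and the suitability scores and abundance ranks are made up.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation (Pearson correlation of the ranks).

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical sites: modeled suitability and ordinal abundance (1 = low, 3 = high).
site_suitability = [0.2, 0.5, 0.6, 0.8, 0.9]
site_abundance = [1, 3, 1, 2, 3]
rho = spearman(site_suitability, site_abundance)
print(f"Spearman rho: {rho:.2f}")
```

A value near zero would indicate that suitability says little about abundance, which is the situation that motivates modeling abundance data directly.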

In this study I contribute a large-extent, coarse-resolution spatial analysis that directly draws on occurrence and abundance data to better understand how abundance varies within ranges and how that can refine invasive species geographic risk assessments. By evaluating how the type of data used may shape geographic invasion risk estimates, we can help determine best practices for constructing models of geographic invasion risk to guide the prioritization of management efforts. Specifically, I examined how abundance data alter our understanding of the geographic invasion risk of terrestrial invasive plants in the contiguous United States. I constructed SDMs from occurrence and abundance data for 70 species, which I then used to address the following questions:

1. How large is the potential impact range relative to the potential establishment range for terrestrial invasive plant species?

2. To what extent is the potential impact range currently occupied by high-abundance observations for individual species?

3. How do invasion hotspots estimated from potential impact ranges of many species differ from those estimated from potential establishment ranges?

Answering these questions will help assess the contribution of abundance data in SDM-based estimates of geographic invasion risk within and across species. This assessment will help inform decisions to incorporate abundance data in future estimates of geographic invasion risk and aid the prioritization of management efforts.

METHODS

Study region

The study extent encompasses the contiguous United States, including all states except Alaska and Hawaii and falling within 24-50° N and 66-125° W (WGS84). Political boundaries are an important consideration given the policy implications of conservation issues posed by invasive plants. The contiguous U.S. represents a large continuous area under the jurisdiction of a single federal government and with sufficient data on the distribution of many invasive species (e.g., EDDMapS, 2016). The study region exhibits many combinations of climatic conditions, which range from cool to warm and wet to dry and fall into ten broad ecoregions (Fig. A1; Commission for Environmental Cooperation (CEC), 1997). The majority of the contiguous U.S. area is made up of three main ecoregions: Eastern Temperate Forests (32%), Great Plains (28%), and North American Deserts (20%) (Commission for Environmental Cooperation (CEC), 1997). The remaining seven ecoregions each encompass from <1% to 11% of the contiguous U.S. (Table A1). In terms of land cover, the contiguous U.S. is primarily composed of forest (26%), agricultural land (23%), shrubland (23%), and grasslands (15%), with the remainder being developed, wetland, barren, or perennial ice or snow (Table A1; EPA, 2014).

Invasive species data

The candidate species list was compiled from the Federal Noxious Weed List (Executive Order 13112, 1999), state noxious and invasive plant lists (Table A3), and the Invasive Plant Atlas of the United States (Swearington & Bargeron, 2016). These sources were selected to encompass species that are not only exotic, but known or expected to impact a variety of environments from agricultural land to natural areas (Executive Order 13112, 1999; Swearington & Bargeron, 2016). Any plants classified as native to the contiguous U.S. in the PLANTS database (USDA & NRCS, 2017) or identified as aquatic, semiaquatic, or exclusive to wetlands were excluded. The candidate species list included 1,078 species. Of these 1,078 species, I included species based on data availability and model quality (see Species distribution modeling).


I used the terrestrial invasive plant occurrence dataset from Allen & Bradley (2016), which includes georeferenced points from 33 collections of field and museum data for modeling. Many of these data are from the Early Detection & Distribution Mapping System (EDDMapS, 2016), an online mapping system that compiles observations of invasive species in the U.S. from a variety of data contributors. Data contributors include citizen scientists reporting directly to EDDMapS, regional invasive species monitoring organizations, non-governmental organizations, and government agencies (EDDMapS, 2016). I appended an updated download of the EDDMapS (2016) database to update the Allen & Bradley (2016) dataset.

Nearly one third of the occurrence records from the EDDMapS database also contained abundance data. The dataset includes multiple forms of abundance data, including canopy cover, stem count, and the local extent of infestations. Abundance data are sparser than occurrence data and several metrics are in use, meaning that the subset of records with abundance information is further divided among many different metrics of abundance, including continuous, ordinal, and qualitative entries (Bradley et al., 2018). Infested area (local extent of an infestation) was the only metric that was consistently reported on a continuous scale (Table A4). A preliminary analysis using only infested area was unsuccessful due to small sample sizes and poor model fit (Appendix C). Using ordinal categories allowed me to incorporate several different abundance metrics and capture as much of the available abundance data as possible (Table A4). I implemented multiple quality control measures to ensure the quality of these data was sufficient for modeling (Appendix B).

Many of the cover and count data were reported as ranges (e.g., “11-100” stems), so for cover and count I selected the set of bins (Table 1) observed most often in the dataset as my ordinal ranking scheme. Integer entries for cover and count were categorized according to these bins. Qualitative metrics were ordered logically (Bradley et al., 2018). While the bins do not correspond perfectly between metrics, most of the binned metrics showed reasonable pairwise correlations (Table A5). The continuous area estimates were binned to maximize congruence with other categories (Appendix B, Fig. A4). For observations containing entries in more than one abundance field, I assigned ordinal ranks to the entries for each abundance metric and took the median value. Six hundred and nine species of the 1,078 candidate species had one or more abundance data points, over two hundred of which had abundance data in fifty or more climate grid cells (see Environmental data).

Table 1: Summary of the combination of several abundance metrics into an ordinal abundance ranking scheme.

Abundance rank  Cover (qualitative)  Cover (quantitative)  Count          Text                                                  Area
1               Low                  <5%                   1-10 plants    Single plant / Scattered plants / Scattered linearly  <100 ft²
2               Medium               5-25%                 11-100 plants  Scattered dense patches                               100 ft² - 0.25 acres
3               High                 >25%                  >100 plants    Dominant cover / Dense monoculture                    >0.25 acres
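The ranking scheme in Table 1 can be illustrated with a short sketch. This is illustrative Python, not the R workflow used in this study; the function names are hypothetical, the treatment of values that fall exactly on a bin edge is an assumption, and the source does not state how a median falling between two ranks (e.g., 1.5) was resolved, so the sketch simply returns it.

```python
from statistics import median

FT2_PER_ACRE = 43560.0  # square feet in one acre

def rank_cover(percent):
    """Ordinal rank for percent cover (Table 1: <5%, 5-25%, >25%)."""
    if percent < 5:
        return 1
    return 2 if percent <= 25 else 3

def rank_count(n_plants):
    """Ordinal rank for stem count (Table 1: 1-10, 11-100, >100)."""
    if n_plants <= 10:
        return 1
    return 2 if n_plants <= 100 else 3

def rank_area(area_ft2):
    """Ordinal rank for infested area (Table 1: <100 ft2, up to 0.25 acres, larger)."""
    if area_ft2 < 100:
        return 1
    return 2 if area_ft2 <= 0.25 * FT2_PER_ACRE else 3

def combined_rank(cover=None, count=None, area_ft2=None):
    """Median rank across whichever abundance fields a record reports.

    Returns None when no field is present. With an even number of fields
    the median can fall between ranks; resolving that is left open here.
    """
    ranks = []
    if cover is not None:
        ranks.append(rank_cover(cover))
    if count is not None:
        ranks.append(rank_count(count))
    if area_ft2 is not None:
        ranks.append(rank_area(area_ft2))
    return median(ranks) if ranks else None
```

For example, a record reporting 30% cover, 50 stems, and 50 ft² of infested area receives ranks 3, 2, and 1, and therefore a combined rank of 2.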

As I was unable to construct satisfactory models for all 1,078 species in the candidate species list (see Species distribution modeling), I used growth habit and duration to assess how well the subset of species I modeled represents the traits exhibited in the full candidate list. I selected duration (perennial, biennial, and/or annual) and growth habit (tree, shrub, forb, grass, vine, cactus and/or palm) as biologically meaningful traits that are known for many species. I collected habit and duration trait data from the PLANTS database (USDA & NRCS, 2017), the Invasive Plant Atlas of the U.S. (Swearington & Bargeron, 2016) and a variety of other online sources (Commonwealth Scientific and Industrial Research Organisation, 2004; Flora of North America Association, 2008; Kleyer et al., 2008; The vPlants Project, 2009; Parr et al., 2014; CABI, 2017; Center for Aquatic and Invasive Plants, 2017; Weldy et al., 2017; Dolan & Moore, 2017; Keener et al., 2017; Missouri Botanical Garden, 2017). I used analyses of variance (ANOVA) to test for significant differences in potential impact range sizes between species of different habits or durations to assess the importance of plant traits in geographic distributions.

Environmental data

The environmental variables in this study included interpolated climate data and land cover data. I obtained data for current climate from WorldClim (Hijmans et al., 2005), based on conditions at a 2.5 arc-minute spatial resolution (roughly 4km x 4km grid cells for most of the contiguous U.S.) averaged from 1950 to 2000. I selected six variables from the nineteen bioclimatic variables available: mean diurnal range, maximum temperature of the warmest month, minimum temperature of the coldest month, mean temperature of the wettest quarter, annual precipitation, and precipitation seasonality. These variables were selected to best represent the maxima, minima, and variation of temperature and precipitation while minimizing pairwise correlations within the contiguous U.S.

I used the 2011 National Land Cover Database (Homer et al., 2015) to represent land cover. Similar land cover classes were aggregated to minimize complexity, resulting in the following nine classes: evergreen forest, mixed/deciduous forest, shrubland, pasture/hay, cultivated crops, herbaceous/grassland, wetland, developed land, and unsuitable (including barren land, open water, heavily developed land, and perennial ice or snow, Table A2). The land cover dataset has a much finer spatial resolution (30 meter) than the climate data (Homer et al., 2015). To make the land cover data compatible with climate, I calculated the proportion of each land cover class contained within each climate grid cell. Out of these nine proportional land cover variables I added six (evergreen forest, mixed/deciduous forest, shrubland, pasture/hay, cultivated crops, herbaceous/grassland) that represent land covers important to terrestrial invasive plants (Ibáñez et al., 2009a; Merow et al., 2011; Allen et al., 2013) and exhibit minimal pairwise correlations to my set of environmental predictors. Land cover likely influences abundance at a much finer spatial scale (Ibáñez et al., 2009a, 2009b; Merow et al., 2011), but broad patterns in land cover across the U.S. should be sufficient for a coarse-resolution analysis of abundance over a regional extent (Allen et al., 2013). Coarsening land cover data is more justifiable than resampling climate data to a finer scale considering the location uncertainty of the invasive plant data points (Mitchell et al., 2017).
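The aggregation of fine-resolution land cover into per-cell proportions can be sketched as follows. This is an illustrative Python example under a strong simplifying assumption: the fine grid nests exactly inside the coarse grid by an integer factor, which is not true of 30 m land cover and 2.5 arc-minute climate cells in practice. The function name and class codes are hypothetical.

```python
import numpy as np

def class_proportions(fine_grid, factor, classes):
    """Proportion of each land cover class within each coarse cell.

    fine_grid : 2-D integer array of class codes on the fine grid
    factor    : number of fine cells per coarse cell along each axis
                (assumes exact nesting, a simplification)
    classes   : iterable of class codes to summarize
    Returns a dict mapping class code -> 2-D array of proportions.
    """
    rows, cols = fine_grid.shape
    if rows % factor or cols % factor:
        raise ValueError("fine grid must nest exactly in the coarse grid")
    # Split each axis into (coarse index, within-cell index).
    blocks = fine_grid.reshape(rows // factor, factor, cols // factor, factor)
    # Average the class indicator over the two within-cell axes.
    return {c: (blocks == c).mean(axis=(1, 3)) for c in classes}
```

Each returned array has one value per coarse cell, so it can be stacked alongside the climate layers as an additional predictor.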

Species distribution modeling

I constructed occurrence-based models for the 155 species with sufficient data for the abundance-based models (see below). Absence data were rare (2.7% of EDDMapS records), unavailable for >90% of species, and nearly exclusive to small regions within the Eastern U.S. (Fig. A2). Therefore, I modeled the distributions for each species using the maximum entropy modeling software package MaxEnt (Phillips et al., 2006), an effective machine-learning tool for SDM using occurrence-only data (Elith et al., 2011; Merow et al., 2013). MaxEnt estimates environmental suitability by comparing the environmental conditions at points where a species is present to the conditions at background points in the surrounding landscape (Elith et al., 2011). I used target group sampling to account for sampling bias (Phillips et al., 2009; Merow et al., 2013), with the target group defined as the 155 species with sufficient abundance data. Target group sampling uses all occurrence records in a group of taxonomically or ecologically similar species that are observed using similar techniques to account for sampling bias in the species-level occurrence datasets (Ponder et al., 2001; Phillips et al., 2009; Merow et al., 2013). I represented sampling bias as the estimated sampling effort from a target group MaxEnt model with human population density and road density as predictors (Fig. A3; Bocsi et al., 2016; Merow et al., 2016). This sampling bias surface was used in MaxEnt to assign non-uniform weights to each set of background points so that the spatial bias in the background points approximates the spatial bias in the occurrence data points (Ponder et al., 2001; Phillips et al., 2009; Merow et al., 2013). The same sampling bias surface was used for all species.

I modeled each species individually using all observations for the species and all twelve environmental (climate and land cover) variables. I removed duplicate points within each grid cell, used 10-fold cross-validation, and included linear and quadratic features to maximize ecological realism and interpretability (Merow et al., 2013; Moreno-Amat et al., 2015; Allen & Bradley, 2016). Model fit was assessed using average area under the curve (AUC) for test samples across the ten model runs, which measures how well occurrence points can be distinguished from background points (Fitzpatrick et al., 2013). I removed species whose MaxEnt models had test AUC scores below 0.70 because test AUC scores above 0.70 are considered informative (Swets, 1988). The potential establishment range for each species was defined using the threshold MaxEnt logistic suitability value that included 95% of the fitting occurrence points to reduce the influence of marginal and outlier occurrences (e.g., Fig. 1a; Fitzpatrick et al., 2013; Allen & Bradley, 2016; Bocsi et al., 2016). To assess the effect of sample size on the size of estimated ranges, I conducted simple linear regressions across species of (1) potential establishment range size as a response to occurrence sample size and (2) potential impact range size as a response to abundance sample size in R 3.4 (R Core Team, 2017).
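The 95% threshold rule can be sketched briefly. This is an illustrative Python example rather than the MaxEnt output used in this study; the function names are hypothetical, and treating cells exactly at the threshold as inside the range is an assumption.

```python
import numpy as np

def establishment_threshold(occ_suitability, retained=0.95):
    """Suitability value at or above which `retained` of occurrences fall.

    occ_suitability : suitability values extracted at the fitting
                      occurrence points
    """
    values = np.sort(np.asarray(occ_suitability, dtype=float))[::-1]
    # Number of occurrence points that must sit at or above the threshold;
    # rounding guards against floating-point noise before taking the ceiling.
    k = int(np.ceil(np.round(retained * values.size, 9)))
    return values[k - 1]

def establishment_range(suitability_grid, threshold):
    """Boolean mask of grid cells at or above the threshold."""
    return np.asarray(suitability_grid) >= threshold
```

With 20 occurrence points, the threshold is the 19th-highest suitability value, so the single lowest-suitability occurrence falls outside the estimated range.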

I also constructed abundance-based distribution models using the subset of occurrence data points containing abundance information. I thinned abundance data within species to one point per climate grid cell, as for the MaxEnt models, to avoid potential bias from points clustered within grid cells (Hernandez et al., 2006). For grid cells containing multiple abundance points from a single species, I took the median value to reflect the central tendency within those environmental conditions. I included species with at least 10 observations in each ordinal class (at least 30 observations total). For the 155 species that met this condition, I modeled ordinal abundance as a response to climate and land cover in R 3.4 (R Core Team, 2017) using the cumulative link model function (clm) in the ordinal package (Christensen, 2015). I assessed pairwise correlations of covariates for each species dataset individually before modeling and fit all possible linear and quadratic models in which no pair of covariates was highly correlated (Spearman’s rho > 0.6), allowing up to one term per every five records in the rarest abundance class to avoid inflating bias and type I error rate (Vittinghoff & McCulloch, 2007). I selected the best model using Akaike’s Information Criterion (Akaike, 1974) and assessed the agreement between observed and predicted abundance ranks using the kappa2 function in the irr package (Gamer et al., 2012) to calculate Cohen’s kappa (Cohen, 1960; Cross et al., 2017). I excluded species where agreement was not significantly (p < 0.05) better than random agreement.
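Two of the quantities in this paragraph are simple enough to sketch. The example below is illustrative Python, not the R functions used in this study; it computes unweighted Cohen's kappa (whether a weighted variant was used is not stated, so the unweighted form is an assumption) and the cap on model terms implied by the one-term-per-five-records rule. The function names are hypothetical, and the significance test is not reproduced.

```python
from collections import Counter

def cohens_kappa(observed, predicted):
    """Unweighted Cohen's kappa between two equal-length rank sequences."""
    n = len(observed)
    # Observed agreement: share of records where the ranks match.
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n
    # Chance agreement from the marginal frequencies of each rank.
    obs_counts, pred_counts = Counter(observed), Counter(predicted)
    categories = set(obs_counts) | set(pred_counts)
    p_e = sum(obs_counts[c] * pred_counts[c] for c in categories) / n ** 2
    return (p_o - p_e) / (1 - p_e)

def max_model_terms(class_counts, records_per_term=5):
    """Largest number of terms allowed given the rarest class size."""
    return min(class_counts) // records_per_term
```

A kappa of 0 indicates agreement no better than chance and 1 indicates perfect agreement; a species with 10, 40, and 25 records in the three classes would be limited to two model terms.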

The lack of absence data poses a special challenge to the use of abundance-based models. With no ordinal rank for absence, ordinal regression is not capable of predicting range margins. To account for this issue, I used the ordinal regression to model the variation in abundance within the potential establishment ranges estimated using MaxEnt. The abundance dataset is a subset of the occurrence dataset (e.g., Fig. 1a), so there is a possibility that for any species the points with abundance information are not representative of the full spectrum of environmental conditions observed in their potential establishment ranges. I used multivariate environmental similarity surfaces (MESS) for each species to identify environmental conditions across the potential establishment range that are not represented in the abundance dataset (Elith et al., 2011; O’Donnell et al., 2012). For each species, only the environmental variables in the best-fit ordinal regression were considered. The MESS analyses compared the values of the environmental variables at the points with abundance data to the values of the environmental variables in all grid cells within the species range, yielding similarity index values up to 100 and with no lower bound. I considered cells with index values equal to or below -10 to be non-analog, because this means that conditions are 10% or more beyond the range of values in the abundance dataset for one or more variables (Fig. 1b; Elith et al., 2010, 2011; O’Donnell et al., 2012; Li et al., 2016). I predicted abundance within the range for species where less than 10% of the range contained non-analog conditions (Fig. 1c). I performed MESS calculations in R 3.4 (R Core Team, 2017) using the mess function in the dismo package (Hijmans et al., 2017).
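The MESS index can be sketched directly from its definition in Elith et al. (2010). The example below is illustrative Python, not the dismo implementation used in this study; the function name is hypothetical, every variable is assumed to vary among the reference points, and the broadcasting approach is only practical for small inputs.

```python
import numpy as np

def mess(reference, targets):
    """Multivariate environmental similarity for each target cell.

    reference : (n_ref, n_vars) values at points with abundance data
    targets   : (n_cells, n_vars) values at cells in the range
    Returns one score per target cell; negative scores mean at least one
    variable lies outside the reference range.
    """
    reference = np.asarray(reference, dtype=float)
    targets = np.asarray(targets, dtype=float)
    ref_min, ref_max = reference.min(axis=0), reference.max(axis=0)
    span = ref_max - ref_min  # assumed non-zero for every variable
    # Percent of reference points with a smaller value, per cell and variable.
    f = 100.0 * (reference[None, :, :] < targets[:, None, :]).mean(axis=1)
    below = (targets - ref_min) / span * 100.0  # at or below the minimum
    above = (ref_max - targets) / span * 100.0  # above the maximum
    similarity = np.where(f == 0, below,
                 np.where(f <= 50, 2.0 * f,
                 np.where(f < 100, 2.0 * (100.0 - f), above)))
    # The score for a cell is that of its most dissimilar variable.
    return similarity.min(axis=1)
```

Under the rule described above, the share of non-analog cells for a species would then be `(scores <= -10).mean()`, and the species would be retained when that share is below 0.10.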

Species-level abundance assessments

I used the best-fit ordinal regression for each species to predict the probability of each abundance class (class 1, 2, or 3) in each grid cell within the species’ range. I then identified the grid cells where high abundance (class 3) is the most likely class (more probable than classes 1 and 2, and therefore with probability >0.33) as the potential impact range. I calculated the proportion by area of each species’ potential establishment range that is made up by the potential impact range (e.g., Fig. 1c). All area calculations were performed after projecting geographic data into the Albers conic equal-area projection for the contiguous U.S. (NAD83/Conus Albers, EPSG:5070). I also estimated the extent of high-abundance infilling of the potential impact range (hereafter referred to as infilling) by calculating the proportion of cells in the potential impact range that contain observations of high abundance (class 3). Using species records to calculate occupancy within ranges may underestimate infilling due to imperfect detection of invasive species, but species records provide the best available and most widely used estimates of where invasive plants currently exist (e.g., EDDMapS, 2016). To assess whether potential impact range size is proportional to potential establishment range size, I performed a linear regression of potential impact range size on potential establishment range size across all species in R 3.4 (R Core Team, 2017).

Figure 1: Visualization of the modeling process for an example species, paper mulberry (Broussonetia papyrifera). (a) The potential establishment range is estimated in MaxEnt, using all observations of paper mulberry. (b) To assess whether the subset of points with abundance data represents the climatic conditions in the range, a MESS analysis is conducted to identify non-analog conditions. (c) If less than 10% of the establishment range contains non-analog conditions, then ordinal regression can be used to predict abundance within the range. The area in red on panel (c) represents the impact range.
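The impact range and infilling calculations for a single species can be sketched as follows. This is illustrative Python with hypothetical function names; it assumes equal-area grid cells so that cell counts stand in for area, whereas the study computed areas after projecting to an equal-area coordinate system.

```python
import numpy as np

def impact_range(class_probs, in_range):
    """Cells in the establishment range where class 3 is most likely.

    class_probs : (n_cells, 3) predicted probabilities for classes 1-3
    in_range    : (n_cells,) boolean mask of the establishment range
    """
    most_likely = np.asarray(class_probs).argmax(axis=1) + 1
    return np.asarray(in_range, dtype=bool) & (most_likely == 3)

def impact_share(impact, in_range):
    """Share of the establishment range made up by the impact range."""
    return impact.sum() / np.asarray(in_range, dtype=bool).sum()

def infilling(impact, observed_high):
    """Share of impact-range cells with an observed class 3 record."""
    n_impact = impact.sum()
    if n_impact == 0:
        return float("nan")  # undefined when the impact range is empty
    return (impact & np.asarray(observed_high, dtype=bool)).sum() / n_impact
```

Cells outside the establishment range never count toward the impact range, even where class 3 has the highest predicted probability, which mirrors restricting abundance predictions to the MaxEnt range.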

Hotspot comparison

The potential establishment ranges for each species were summed so that each pixel contained the number of species predicted to be present, or invasive species richness (Allen & Bradley, 2016). Species richness represents the occurrence-based assessment of geographic invasion risk because richness is calculated from occurrence-based MaxEnt models. I also calculated the number of species most likely in the highest abundance class (class 3) for each pixel, yielding the number of highly abundant species, or abundant invasive species richness. I selected the number of high abundance species as a representation of invasion risk that incorporates abundance and is logically comparable to invasive species richness. To identify areas of particular geographic invasion risk, I designated grid cells in the top quartile for invasive species richness or abundant invasive species richness as richness hotspots and abundance hotspots, respectively (Ibáñez et al., 2009b; O’Donnell et al., 2012; Gallagher et al., 2013; Allen & Bradley, 2016).

I compared richness and abundance hotspots by overlaying the two hotspot types and quantifying percent overlap. This comparison shows us how the incorporation of abundance information in species distribution modeling changes the interpretation of geographic invasion risk at broad spatial scales. To help explain any differences between the two hotspots I divided the number of highly abundant species by potential species richness to estimate the proportion of potentially establishing invasive species that are likely to achieve high abundance in each grid cell. I performed a leave-one-out analysis in which I removed each of the species in the final analysis individually and recomputed the overlap between abundance and richness hotspots to assess sensitivity to species identity. Spatial data were processed in R 3.4 (R Core Team, 2017) using the raster (Hijmans, 2016), rgdal (Bivand et al., 2017), and rgeos (Bivand & Rundel, 2017) packages.
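The hotspot comparison and leave-one-out analysis can be sketched as below. This is illustrative Python, not the raster-based R workflow used in this study, and the function names are hypothetical. Two choices are assumptions: cells tied at the 75th-percentile cutoff are counted as hotspots, and overlap is expressed as the share of richness hotspot cells that are also abundance hotspot cells.

```python
import numpy as np

def hotspot(counts, top_fraction=0.25):
    """Cells at or above the cutoff for the top fraction of values."""
    cutoff = np.quantile(counts, 1.0 - top_fraction)
    return counts >= cutoff

def hotspot_overlap(establishment, impact):
    """Share of richness hotspot cells that are also abundance hotspots.

    establishment, impact : (n_species, n_cells) 0/1 arrays of each
    species' potential establishment and impact ranges.
    """
    richness = establishment.sum(axis=0)  # species able to establish
    abundant = impact.sum(axis=0)         # species likely highly abundant
    rich_hot, abun_hot = hotspot(richness), hotspot(abundant)
    return (rich_hot & abun_hot).sum() / rich_hot.sum()

def leave_one_out(establishment, impact):
    """Overlap recomputed with each species removed in turn."""
    n_species = establishment.shape[0]
    return [hotspot_overlap(np.delete(establishment, i, axis=0),
                            np.delete(impact, i, axis=0))
            for i in range(n_species)]
```

The spread of the leave-one-out values around the all-species overlap indicates how strongly the result depends on any single species.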

RESULTS

Model quality and species retained

Of the 155 species with abundance data suitable for modeling, 70 species (45%) were retained through all three quality screens (ordinal abundance models with kappa > 0 and p < 0.05, MaxEnt models with test AUC > 0.7, and <10% of cells with non-analog conditions; Table A6). Test AUC scores were above 0.70 for the vast majority of species (146 of 155). Out of the 146 species with informative establishment range estimates, abundance datasets represented range-wide environmental conditions for a majority (95 species; Table 2). There was significant agreement between observed and predicted abundance values in the best-fit ordinal regression for most (70 species) of the 95 species with informative establishment range estimates and representative abundance datasets.

Table 2: Summary of sequential model screening; only species that passed through screen 1 were considered for screen 2, and only species that passed through screen 2 were considered for screen 3.

Screening sequence  Model screen                                                Number of species retained  Percent of species retained (out of total species from previous screening)
1                   Test AUC (MaxEnt)                                           146                         94%
2                   Percent of range non-analogous to abundance dataset (MESS)  95                          65%
3                   Cohen’s kappa (ordinal regression)                          70                          74%

Perennials and woody species were retained at higher rates than other species, resulting in a slight overrepresentation of perennials and woody species and an underrepresentation of forbs, grasses, annuals, biennials, and species that exhibit multiple durations (Fig. 2). Palms and cacti were rare in the candidate species list (1 species each) and were not represented in the final species list. Potential impact range sizes were not significantly different between perennials and non-perennials, nor between species of different habits (ANOVA p > 0.05).

Agreement between observed abundance and the abundance class predicted to be most likely (Cohen’s kappa) varied greatly among the 70 species included in the final analysis. Forty-four species (63% of species) had slight agreement (kappa < 0.2), 22 (31% of species) had fair agreement (kappa = 0.2-0.4), and 4 (6% of species) had moderate agreement (kappa = 0.4-0.6) (Fig. 3; Landis & Koch, 1977). MaxEnt test AUC scores had a median of 0.868, and ranged from 0.710 to 0.996 (mean: 0.862, SD: 0.078).

Land cover and climatic variables were included together in the best-fit ordinal abundance models for 58 of the 70 species, with 7 climate-only models and 5 land cover-only models (Table A6). Climatic variables were slightly more common across all species (53% of variables selected overall). Individual species models contained 1-9 variables each (median: 4 variables). Each individual variable was included in best-fit models for 17-35 species, indicating that all candidate variables were somewhat important given this set of species. The most common variables included mean temperature of the wettest quarter (35 species), precipitation seasonality (33 species), maximum temperature of the warmest month (32 species), and the proportion of cultivated crop land cover (32 species). The least commonly selected variables included minimum temperature of the coldest month (17 species) and the proportion of shrub land cover (21 species). Quadratic terms were included for nearly half of the variables in best-fit models (44% and 48% for land cover variables and climatic variables, respectively).

Figure 2: Composition of species durations (a) and habits (b) for 1,045 terrestrial invasive plants, 155 species with sufficient abundance data, and 70 species that passed through all three model screenings (AUC > 0.7, kappa > 0, <10% of range with no-analog conditions).

Figure 3: Histogram of Cohen’s kappas for the best-fit ordinal regressions of the 70 species included in the analysis. Qualitative interpretations of kappa values from Landis and Koch (1977) are included along the horizontal axis.

Establishment and impact ranges

Individual species establishment ranges encompassed 22,236 to 6,250,172 km² (median: 1,718,143 km²; mean: 2,027,910 km²; SD: 1,477,979 km²), or 3-67% (median: 22%; mean: 26%; SD: 19%) of the study region (Fig. 4a). Potential impact ranges covered 0 to 1,657,087 km² (median: 146,945 km²; mean: 333,968 km²; SD: 438,581 km²), or 0-84% (median: 9%; mean: 16%; SD: 20%) of the potential establishment ranges (Fig. 4b). Infilling was low (mean: 0.36%, SD: 0.68%) for all potential impact ranges (Fig. 4c). The number of occurrence data points had a positive and highly significant relationship with potential establishment range size (460.7 km² per data point, p = 0.00001) that explained some of the variation in establishment range size (adjusted R² = 0.23), while the number of abundance data points had no significant relationship with potential impact range size (129.2 km² per data point, p = 0.12, adjusted R² = 0.02; Fig. 5a). The predicted abundance class varied throughout the potential impact range. There was a highly significant relationship between the size of the potential impact range and the size of the potential establishment range (0.14 km² increase in potential impact range per 1 km² increase in potential establishment range, p = 0.00002), but the ability of potential establishment range size to predict the variability in potential impact range size was limited (adjusted R² = 0.22; Fig. 5b).

Figure 4: Histograms for the 70 species in this analysis of (a) the size of the potential establishment range, (b) the size of the potential impact range, and (c) the percent infilling of the impact range.

Figure 5: The relationship between (a) range size and sample size and (b) establishment range size and impact range size.

For all but ten species, each abundance class was predicted to be most likely within some part of the potential establishment range. Of the ten species with fewer than three abundance classes predicted, there were seven species for which all cells within the range were predicted into classes 1 or 2 (Louise’s swallow-wort, Cynanchum louiseae; white leadtree, Leucaena leucocephala; para grass, Urochloa mutica; Scotch broom, Cytisus scoparius; silkreed, Neyraudia reynaudiana; princesstree, Paulownia tomentosa; and small-leaf climbing fern, Lygodium microphyllum). Similarly, there were two species where either rank 2 or 3 was most likely for all cells within the range (Japanese stiltgrass, Microstegium vimineum; and celandine, Chelidonium majus), and one species where rank 1 or 3 was most likely for all cells within the range (gray sheoak, Casuarina glauca).

Invasive species hotspots

Forty-seven percent of the richness hotspots was overlapped by the abundance hotspots, particularly in the interior southeast (Fig. 6a). Calculated overlap between the richness and abundance hotspots in the leave-one-out analysis for each species was generally similar (mean: 46.5%, SD: 1.3%) to the all-species 47% overlap (Table A5). Potential richness and abundance hotspots were most similar (56% overlap) when bird's-foot trefoil (Lotus corniculatus) was excluded and least similar (39% overlap) when kudzu (Pueraria montana var. lobata) or common periwinkle (Vinca minor) was excluded. Invasive species richness hotspots covered much of the eastern U.S. including the mid-Atlantic states and southern New England (Fig. 6a). The invasive species abundance hotspots were less contiguous, encompassing much of the interior southeast and northwest as well as substantial patches in the northern Great Plains (Fig. 6a). The majority (81%) of the richness hotspots fell in the Eastern Temperate Forest ecoregion, reflecting the generally higher richness in this region than surrounding areas (Fig. A5a). Eastern Temperate Forests are vastly overrepresented in the richness hotspots compared to their area in the U.S. (80% in the richness hotspots, 32% in the U.S.), as are the Northern Forests (11% in richness hotspots, 5% in U.S., Fig. 7). All other ecoregions were underrepresented in the richness hotspots (Fig. 7).

Figure 6: (a) Invasive species richness and abundance hotspots overlaid, and (b) the proportion of species where the highest abundance class is most likely out of all species to which the conditions are suitable for establishment. Level I ecoregion boundaries are shown in black.

Eastern Temperate Forests also made up the largest portion of the abundance hotspots, but to a lesser degree (49%). Northwestern Forested Mountains made up a much larger portion of the abundance hotspots (18%) than the richness hotspots (1%) and were overrepresented in the abundance hotspots compared to the contiguous U.S. (11%), reflecting the relatively large number of potentially high-abundance species in much of the northwestern U.S. (Fig. A5b). Similarly, most of the Marine West Coast Forests are identified as abundance hotspots, but not richness hotspots. The abundance hotspots also included substantially larger portions of the North American Deserts, Mediterranean California, and Temperate Sierras than the richness hotspots, but a smaller proportion of Northern Forests (Fig. 7). One key difference between regions identified as abundance hotspots only and richness hotspots only is the proportion of invasive plants that could potentially be highly abundant. In abundance-only hotspots 22-90% (mean: 49%, SD: 11%) of potential invasive species are likely to be highly abundant, while in richness-only hotspots only 0-16% (mean: 10%, SD: 3%) of potential invasive plants are likely to be highly abundant.

Figure 7: Ecoregion composition of the contiguous U.S., potential invasive species richness hotspots, and abundance hotspots.

DISCUSSION

Estimated potential impact ranges are considerably smaller than potential establishment ranges (Fig. 4a-b). Information on the modeled variation in abundance within potential establishment ranges can provide guidance for the prioritization of management and monitoring efforts beyond that available from potential establishment range maps (McDonald et al., 2009; Bradley, 2016; Cross et al., 2017). Potential impact and establishment ranges are significantly related, but potential establishment range sizes do not explain all the variation in potential impact range sizes across species. Assuming that the risk of a particular species is constant across a potential establishment range is unrealistic, and likely overestimates risk in many areas (Bradley, 2013; Howard et al., 2014; Gomes et al., 2018). The reliance of occurrence-based models on the large number of low-abundance observations in herbarium records (Bradley, 2013) and field datasets (Cross et al., 2017) appears to result in predicting impacts where abundance levels are likely to be minimal.

The low infilling of the potential impact range suggests high potential for incursions of high-abundance infestations in currently uninvaded areas. This would indicate large risks of future impacts that managers have the opportunity to manage through containment (Hulme, 2006). Invasive species exhibit relatively low infilling in the contiguous U.S., meaning that while suitable environmental conditions for a species may span large areas within the U.S., these species are generally not ubiquitous throughout their ranges (Bradley et al., 2015; Allen & Bradley, 2016; Bocsi et al., 2016).

The richness hotspots highlighted much of the Eastern Temperate Forests and Northern Forests, but included comparatively little of all other ecoregions, possibly due to the long history of introduction in the eastern U.S. compared to other regions, notably through the ornamental trade (Lehan et al., 2013). Forests in the western U.S. are generally invaded by fewer invasive plant species than forests in the eastern U.S. (Iannone et al., 2015; Oswalt et al., 2015), which is consistent with the lower proportions of western ecoregions in the richness hotspots. The eastern U.S. is a region of particularly high native plant diversity as well as invasive plant diversity (Kartesz & The Biota of North America Program (BONAP), 2015). Relationships between native and invasive plant diversity may be due to extrinsic factors related to landscape heterogeneity and human disturbance (Bartomeus et al., 2018). The abundance hotspots also included much of the Eastern Temperate Forests (particularly the interior southeast), but Marine West Coast Forests, Northwestern Forested Mountains, and Great Plains made up considerably higher proportions of the abundance hotspots than the richness hotspots. While these regions in the northwestern abundance hotspots are suitable to modest numbers of species, substantial proportions (>20%) of those species are likely to be highly abundant, leading to considerable risk of high-abundance invasion. The high impact risk in the western U.S. may reflect the distribution of high-impact invasive herbs such as species of knapweed, thistle, cheatgrass, and spurge (DiTomaso, 2000) that have been spreading through U.S. rangelands (Masters & Sheley, 2001; Sheley et al., 2016). The herbs leafy spurge (Euphorbia esula) and whitetop (Cardaria draba) are indeed among the species with impact ranges in the northwestern hotspots in this analysis.
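The contrast between the two kinds of hotspot can be made concrete with a small sketch. For each grid cell it counts the species whose potential establishment range covers the cell (richness) and the proportion of those species whose potential impact range also covers it. The species names, cell identifiers, and function name are hypothetical, and this is a simplified stand-in rather than the hotspot procedure used in this study.

```python
from collections import Counter


def cell_summaries(establishment_ranges, impact_ranges):
    """Summarize each grid cell as (richness, proportion highly abundant).

    establishment_ranges and impact_ranges map a species name to the set of
    cell identifiers in that species' range (hypothetical inputs). Impact
    cells are counted only where they fall inside the establishment range.
    """
    richness = Counter()
    abundant = Counter()
    for species, cells in establishment_ranges.items():
        cells = set(cells)
        richness.update(cells)
        abundant.update(set(impact_ranges.get(species, ())) & cells)
    return {cell: (n, abundant[cell] / n) for cell, n in richness.items()}


# Toy example with three hypothetical species on four cells.
establishment = {"sp_a": {1, 2, 3}, "sp_b": {2, 3}, "sp_c": {3}}
impact = {"sp_a": {3}, "sp_b": {2, 3, 4}}

summaries = cell_summaries(establishment, impact)
for cell in sorted(summaries):
    n_species, prop_abundant = summaries[cell]
    flag = "above 20%" if prop_abundant > 0.20 else "at or below 20%"
    print(cell, n_species, round(prop_abundant, 2), flag)
```

A cell can have modest richness yet a high proportion of species predicted to be abundant, which is the pattern described for the northwestern abundance hotspots.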

The apparent importance of the land cover variables in most of the abundance models presented in this study suggests that land cover information can augment climatic models in broad-scale, large-extent species distribution modeling. Land cover variables appeared virtually as often as climate variables in best-fit models, with the proportion of agricultural land being one of the most frequently selected variables overall. This may be due to the importance of agriculture and other human activity as invasion pathways for some invasive plant species (Radosevich et al., 2007; Goyal & Sharma, 2016). The slight positive relationship between range size and sample size may indicate that broadly distributed species are encountered and sampled more often, or it may suggest that the MaxEnt models underestimate ranges for data-poor species. The impact range, however, does not appear to be affected by sample size.

Exclusion of species from this study was due to small sample sizes, abundance datasets that did not represent conditions across the range, or poor model performance, which could be due to a number of issues including insufficient predictors (e.g., a lack of local factors), temporal variation in abundance, or data quality issues. The leave-one-out analysis suggests that the difference between patterns of potential invasive abundance and richness aggregated across many species is relatively robust to the identities of individual species, but retaining more species would still add value at the individual species level (e.g., the delineation of the impact range for a particular species). Biennial, annual, and herbaceous species were removed at higher rates than perennials and woody species. For species with shorter life cycles, local abundance may fluctuate substantially from year to year due to variation in weather. Therefore, relationships with variables that represent a broader time scale may be obscured by this finer temporal variation. Conversely, plants with woody tissue are likely more resilient to fine-scale variation in weather (Morris et al., 2008), and as a result I expect observations of their abundance at any given time to reflect climatic conditions at a broader time scale. This may indicate a need for a better understanding and incorporation of temporal variation in SDMs, particularly for short-lived species. Since potential impact ranges did not vary significantly with plant traits, increasing the retention of short-lived and nonwoody plants may not be essential to represent broad patterns across species (e.g., the level of overlap of abundance and richness hotspots, or the average size of impact ranges relative to establishment ranges). However, improving models for nonwoody and short-lived species could add value at the individual species level (e.g., better range estimates for these species) and allow us to include species that were excluded from this study, like poison hemlock (Conium maculatum, a biennial forb).

Our analyses highlight the need for the continued collection of abundance information for invasive plant species. Out of the over 1,000 terrestrial plant species considered invasive in the U.S. (Allen & Bradley, 2016), only 155 had sufficient abundance data for modeling. Giant hogweed (Heracleum mantegazzianum) and European swallow-wort (Cynanchum rossicum) were among the species excluded due to insufficient data. Furthermore, the MESS analyses suggested that for many species, including Japanese honeysuckle (Lonicera japonica), abundance is not sufficiently sampled throughout the environmental space encompassed by the species’ establishment range. Not only do we need more abundance data, but we need abundance data throughout species’ establishment ranges to reliably predict abundance across establishment ranges for a more comprehensive set of species. Additionally, the data processing and cleaning required for projects like this (e.g., Appendix B) can be a significant hurdle. Including existing abundance information for comprehensive analyses encompassing many species over large extents would be more straightforward if abundance data were collected more consistently.

Bradley et al. (2018) recommend that both the extent of local infestation (infested area) and the percent cover within that area be recorded, with percent cover assigned to the following bins: trace (<1%), low (1-5%), moderate (5-25%), and high (>25%). The combination of area and cover provides the most readily interpretable and reliable description of local abundance (Bradley et al., 2018). Data collected in this way would streamline the incorporation of abundance into SDMs, and the increased consistency would likely improve the performance of abundance models.
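The recommended cover bins can be written as a simple lookup, sketched below. The function name is hypothetical, and the handling of values that sit exactly on a shared boundary is an assumption: the bins as stated place 1% in "low" and 25% in "moderate", but do not say which bin a value of exactly 5% belongs to, so it is assigned to "low" here.

```python
def cover_class(percent_cover):
    """Assign a percent cover value (0-100) to an abundance bin.

    Bins follow the text: trace (<1%), low (1-5%), moderate (5-25%),
    high (>25%). Assumption: a value of exactly 5% is treated as "low".
    """
    if not 0 <= percent_cover <= 100:
        raise ValueError("percent cover must be between 0 and 100")
    if percent_cover < 1:
        return "trace"
    if percent_cover <= 5:
        return "low"
    if percent_cover <= 25:
        return "moderate"
    return "high"


# Toy field records: (infested area in acres, percent cover), both made up.
records = [(0.1, 0.5), (2.0, 3.0), (10.0, 18.0), (40.0, 60.0)]
classified = [(area, cover_class(cover)) for area, cover in records]
print(classified)
```

Keeping infested area alongside the cover class, as in the toy records, preserves both components of the recommended description of local abundance.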

Absence data are invaluable for SDMs that incorporate abundance information, allowing for abundance modeling with a zero-abundance class or with multinomial hurdle models, which predict presence/absence and then abundance if present (Potts & Elith, 2006; Zuur, 2012; Gonzáles-Andrés et al., 2016; Heersink et al., 2016). Both multinomial hurdle models and ordinal regressions including an absence class would allow abundance to be modeled within the same statistical framework, rather than melding together separate models built using different datasets.
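The two-part logic of a hurdle model can be illustrated with a short calculation: the probability of each abundance class is the probability of presence multiplied by the probability of that class given presence, and the probability of absence is the remainder. The function name and the probabilities below are made up for illustration and do not come from any fitted model in this study.

```python
def hurdle_class_probabilities(p_present, conditional_probs):
    """Combine the two parts of a hurdle model into one set of probabilities.

    p_present: probability that the species is present at a site.
    conditional_probs: dict mapping abundance class to its probability
    given presence; these must sum to 1.
    """
    if not 0.0 <= p_present <= 1.0:
        raise ValueError("p_present must be between 0 and 1")
    if abs(sum(conditional_probs.values()) - 1.0) > 1e-9:
        raise ValueError("conditional probabilities must sum to 1")
    probabilities = {"absent": 1.0 - p_present}
    for abundance_class, q in conditional_probs.items():
        probabilities[abundance_class] = p_present * q
    return probabilities


# Hypothetical site: 40% chance of presence; if present, mostly low cover.
site = hurdle_class_probabilities(
    0.4, {"low": 0.5, "moderate": 0.3, "high": 0.2}
)
for abundance_class, probability in site.items():
    print(abundance_class, round(probability, 3))
```

Because absence and every abundance class come out of one calculation, the resulting probabilities sum to one, which is the sense in which such models keep presence and abundance within a single framework.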

Richness hotspots identify the areas of the greatest establishment risk, where increased EDRR may be warranted. However, using predictions of potential establishment ranges alone to guide management assumes that potential impacts are constant throughout potential establishment ranges, while in reality impact varies with abundance (Parker et al., 1999; Brooks et al., 2004; Stohlgren & Schnase, 2006; Yokomizo et al., 2009; Kulhanek et al., 2011; Seabloom et al., 2013), which varies within potential establishment ranges (Brown, 1984; McGill et al., 2007). Estimates of potential establishment ranges are still useful to management, as they show the full geographic area where monitoring and management efforts may be appropriate. Furthermore, areas of a species’ potential establishment range not included in the potential impact range (areas where low abundance is likely) could serve as corridors for the spread of the invasive species into adjacent uninvaded areas within the potential impact range. However, abundance predictions reveal where high impacts are most likely within a species’ potential establishment range, and where management concerns for that species will likely be the highest (Parker et al., 1999; Hulme, 2009; McDonald et al., 2009; Pyšek & Richardson, 2010; Bradley, 2016). The locations of abundance hotspots indicate where impacts may be particularly drastic and management particularly important across many species (Parker et al., 1999; McDonald et al., 2009; Kulhanek et al., 2011). In areas where the list of potential establishers is much longer than what can feasibly be monitored, abundance predictions can be used to systematically prioritize species that threaten to have the most considerable impacts (Parker et al., 1999; Brooks et al., 2004; Stohlgren & Schnase, 2006; Yokomizo et al., 2009; Kulhanek et al., 2011; Seabloom et al., 2013). Given these insights, more spatial analyses of invasive plants should include existing abundance information so that managers can consider the distribution of abundance within ranges when prioritizing management and monitoring efforts.

CONCLUSION

The regions identified as high-risk by abundance-based and occurrence-based models were quite different, suggesting that areas at high risk of establishment are not necessarily at high risk of impact. Abundance-based models predict the variation in invasion risk within ranges of individual species, revealing that the potential impact range is often considerably smaller than the potential establishment range. Delineation of the impact range within the potential establishment range provides guidance for management and monitoring prioritization beyond that available from establishment ranges and considers the increased impact associated with abundant invasive plants. Therefore, more spatial analyses of invasive plants should include abundance data moving forward, and more abundance data should be collected across species ranges to allow additional comprehensive analyses with greater numbers of species in the future.


REFERENCES

Akaike H (1974) A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716–723.
Allen JM, Bradley BA (2016) Out of the weeds? Reduced plant invasion risk with climate change in the continental United States. Biological Conservation, 203, 306–312.
Allen JM, Leininger TJ, Hurd JD, Civco DL, Gelfand AE, Silander JA (2013) Socioeconomics drive woody invasive plant richness in New England, USA through forest fragmentation. Landscape Ecology, 28, 1671–1686.
Alvarez ME, Cushman JH (2002) Community-Level Consequences of a Plant Invasion: Effects on Three Habitats in Coastal California. Ecological Applications, 12, 1434–1444.
Austin M (2007) Species distribution models and ecological theory: A critical assessment and some possible new approaches. Ecological Modelling, 200, 1–19.
Bartomeus I, Sol D, Pino J et al. (2018) Deconstructing the native–exotic richness relationship in plants. Global Ecology and Biogeography, 21, 524–533.
Bird TJ, Bates AE, Lefcheck JS et al. (2014) Statistical solutions for error and bias in global citizen science datasets. Biological Conservation, 173, 144–154.
Bivand R, Rundel C (2017) Rgeos package for R, version 0.3-23.
Bivand R, Keitt T, Rowlingson B (2017) Rgdal package for R, version 1.2-7.
Blackburn TM, Essl F, Evans T et al. (2014) A Unified Classification of Alien Species Based on the Magnitude of their Environmental Impacts. PLoS Biology, 12, e1001850.
Bobbink R, Willems JH (1987) Increasing dominance of pinnatum (L.) beauv. in chalk grasslands: A threat to a species-rich ecosystem. Biological Conservation, 40, 301–314.
Bocsi T, Allen JM, Bellemare J, Kartesz J, Nishino M, Bradley BA (2016) Plants’ native distributions do not reflect climatic tolerance (ed Thuiller W). Diversity and Distributions, 22, 615–624.
Bouyer Y, Rigot T, Panzacchi M et al. (2015) Using Zero-Inflated Models to Predict the Relative Distribution and Abundance of Roe Deer Over Very Large Spatial Scales. Annales Zoologici Fennici, 52, 1–266.
Bradley BA (2013) Distribution models of invasive plants over-estimate potential impact. Biological Invasions, 15, 1417–1429.
Bradley BA (2016) Predicting abundance with presence-only models. Landscape Ecology, 31, 19–30.
Bradley BA, Blumenthal DM, Early R et al. (2012) Global change, global trade, and the next wave of plant invasions. Frontiers in Ecology and the Environment, 10, 20–28.


Bradley BA, Early R, Sorte CJB (2015) Space to invade? Comparative range infilling and potential range of invasive and native plants. Global Ecology and Biogeography, 24, 348–359.
Bradley BA, Allen JM, O’Neill MW, Wallace RD, Bargeron CT, Richburg JA, Stinson K (2018) Invasive species risk assessments need more consistent spatial abundance data. Ecosphere, in press, 1–16.
Braga RR, Gómez-Aparicio L, Heger T, Vitule JRS, Jeschke JM (2018) Structuring evidence for invasional meltdown: broad support but with biases and gaps. Biological Invasions, 20, 923–936.
Brooks ML, D’Antonio CM, Richardson DM et al. (2004) Effects of Invasive Alien Plants on Fire Regimes. BioScience, 54, 677–688.
Brown JH (1984) On the Relationship between Abundance and Distribution of Species. The American Naturalist, 124, 255–279.
Bullock JM (2006) Plants. In: Ecological Census Techniques, 2nd edn (ed Sutherland WJ), pp. 186–213. Cambridge University Press, Cambridge, United Kingdom.
CABI (2017) Invasive Species Compendium: Detailed coverage of invasive species threatening livelihoods and the environment worldwide. Centre for Agriculture and Biosciences International.
Center for Aquatic and Invasive Plants (2017) Plant Directory. University of Florida, Institute of Food and Agricultural Services, Gainesville.
Christensen RHB (2015) Ordinal package for R, version 3.4.2. 1–22.
Cohen J (1960) A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.
Commission for Environmental Cooperation (CEC) (1997) Ecological Regions of North America. Montreal, 60 pp.
Commonwealth Scientific and Industrial Research Organisation (2004) Water for a Healthy Country. Australian National Botanic Gardens Centre for Australian National Biodiversity Research.
Cross T, Finn JT, Bradley BA (2017) Frequency of invasive plant occurrence is not a suitable proxy for abundance in the Northeast United States. Ecosphere, 8, e01800.
Curtis CA, Bradley BA (2015) Climate Change May Alter Both Establishment and High Abundance of Red Brome (Bromus rubens) and African Mustard (Brassica tournefortii) in the Semiarid Southwest United States. Invasive Plant Science and Management, 8, 341–352.
DiTomaso JM (2000) Invasive weeds in rangelands: Species, impacts, and management. Weed Science, 48, 255–265.
Dolan RW, Moore ME (2017) Indiana Plant Atlas. Butler University Friesner Herbarium, Indianapolis, Indiana.
Early R, Bradley BA, Dukes JS et al. (2016) Global threats from invasive alien species in the twenty-first century and national response capacities. Nature Communications, 7, 1–9.


EDDMapS (2016) Early Detection and Distribution Mapping System. University of Georgia - Center for Invasive Species and Ecosystem Health, https://www.eddmaps.org/. Accessed 18 Oct, 2016.
Ehrenfeld JG (2010) Ecosystem Consequences of Biological Invasions. Annual Review of Ecology, Evolution, and Systematics, 41, 59–80.
Elith J, Leathwick J (2009) Species distribution models: ecological explanation and prediction across space and time. Annual Review of Ecology, Evolution, and Systematics, 40, 677–697.
Elith J, Kearney M, Phillips S (2010) The art of modelling range-shifting species. Methods in Ecology and Evolution, 1, 330–342.
Elith J, Phillips SJ, Hastie T, Dudík M, Chee YE, Yates CJ (2011) A statistical explanation of MaxEnt for ecologists. Diversity and Distributions, 17, 43–57.
EPA (2014) Report on Environment: Land cover.
Estes LD, Bradley BA, Beukes H et al. (2013) Comparing mechanistic and empirical model projections of crop suitability and productivity: Implications for ecological forecasting. Global Ecology and Biogeography, 22, 1007–1018.
Executive Order 13112 (1999) Executive Order 13112 of February 3, 1999. Federal Register, 64, 6183–6186.
Fitzpatrick MC, Gotelli NJ, Ellison A (2013) MaxEnt vs. MaxLike: Empirical comparisons with ant species distributions. Ecosphere, 4, 1–15.
Flora of North America Association (2008) Flora of North America. Flora of North America Association, Point Arena, California.
Franklin J, Miller JA (1997) Mapping species distributions: spatial inference and prediction. Board Member of Landscape Ecology Journal of Vegetation Science.
Gallagher RV, Englert Duursma D, O’Donnell J, Wilson PD, Downey PO, Hughes L, Leishman MR (2013) The grass may not always be greener: Projected reductions in climatic suitability for exotic grasses under future climates in . Biological Invasions, 15, 961–975.
Gamer M, Lemon J, Fellows I, Singh P (2012) Irr package for R, version 0.84.
Gomes VHF, Ijff SD, Raes N et al. (2018) Species Distribution Modelling: Contrasting presence-only models with plot abundance data. Scientific Reports, 8, 1–12.
Gonzáles-Andrés C, Lopes PFM, Cortés J et al. (2016) Abundance and Distribution Patterns of Thunnus albacares in Isla del Coco National Park through Predictive Habitat Suitability Models (ed Hyrenbach D). PLOS ONE, 11, e0168212.
Goyal N, Sharma GP (2016) Emerging Invaders from the Cultivated Croplands: An Invasion Perspective. In: Gene Pool Diversity and Crop Improvement, pp. 271–290.
Guisan A, Zimmermann NE (2000) Predictive habitat distribution models in ecology. Ecological Modelling, 135, 147–186.
Gurevitch J, Padilla DK (2004) Are invasive species a major cause of extinctions? Trends in Ecology & Evolution, 19, 470–474.
Gurevitch J, Fox GA, Wardle GM, Inderjit, Taub D (2011) Emergent insights from the synthesis of conceptual frameworks for biological invasions. Ecology Letters, 14, 407–418.
Gutiérrez D, Harcourt J, Díez SB, Gutiérrez Illán J, Wilson RJ (2013) Models of presence-absence estimate abundance as well as (or even better than) models of abundance: The case of the butterfly Parnassius apollo. Landscape Ecology, 28, 401–413.
Heersink DK, Meyers J, Caley P, Barnett G, Trewin B, Hurst T, Jansen C (2016) Statistical modeling of a larval mosquito population distribution and abundance in residential Brisbane. Journal of Pest Science, 89, 267–279.
Hejda M, Pyšek P, Jarošík V (2009) Impact of invasive plants on the species richness, diversity and composition of invaded communities. Journal of Ecology, 97, 393–403.
Hernandez PA, Graham C, Master LL, Albert DL (2006) The effect of sample size and species characteristics on performance of different species distribution modeling methods. Ecography, 29, 773–785.
Hester SM, Sinden JA, Cacho OJ (2009) Weed Invasions in Natural Environments: Toward a Framework for Estimating the Cost of Changes in the Output of Ecosystem Services. Agricultural and resource economics working paper number 2006–2007. University of New England, Armidale, New South Wales, Australia.
Hijmans RJ (2016) Raster package for R, version 2.5-8.
Hijmans RJ, Cameron SE, Parra JL, Jones PG, Jarvis A (2005) Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology, 25, 1965–1978.
Hijmans RJ, Phillips S, Leathwick JR, Elith J (2017) Dismo package for R, version 1.1-4.
Homer C, Dewitz J, Yang L et al. (2015) Completion of the 2011 National Land Cover Database for the conterminous United States - representing a decade of land cover change information. Photogrammetric Engineering and Remote Sensing, 81, 345–354.
Howard C, Stephens PA, Pearce-Higgins JW, Gregory RD, Willis SG (2014) Improving species distribution models: the value of data on abundance (ed McPherson J). Methods in Ecology and Evolution, 5, 506–513.
Hui C, Richardson DM, Landi P, Minoarivelo HO, Garnas J, Roy HE (2016) Defining invasiveness and invasibility in ecological networks. Biological Invasions, 18, 971–983.
Hulme PE (2006) Beyond control: Wider implications for the management of biological invasions. Journal of Applied Ecology, 43, 835–847.


Hulme PE (2009) Trade, transport and trouble: Managing invasive species pathways in an era of globalization. Journal of Applied Ecology, 46, 10–18.
Iannone BV, Oswalt CM, Liebhold AM et al. (2015) Region-specific patterns and drivers of macroscale forest plant invasions. Diversity and Distributions, 21, 1181–1192.
Ibáñez I, Silander Jr. JA, Wilson AM, LaFleur N, Tanaka N, Tsuyama I (2009a) Multivariate Forecasts of Potential Distributions of Invasive Plant Species. Ecological Applications, 19, 359–375.
Ibáñez I, Silander JA, Allen JM, Treanor SA, Wilson A (2009b) Identifying hotspots for plant invasions and forecasting focal points of further spread. Journal of Applied Ecology, 46, 1219–1228.
Jardine SL, Sanchirico JN (2018) Estimating the cost of invasive species control. Journal of Environmental Economics and Management, 87, 242–257.
Jiménez-Valverde A, Diniz F, Azevedo EB De, Borges PAV (2009) Species Distribution Models Do Not Account for Abundance: The Case of Arthropods on Terceira Island. Annales Zoologici Fennici, 46, 451–464.
Kartesz JT, The Biota of North America Program (BONAP) (2015) North American Plant Atlas. Chapel Hill, N.C.
Keener BR, Diamond AR, Devenport LJ et al. (2017) Alabama Plant Atlas. University of West Alabama, Livingston, Alabama.
Kleyer M, Bekker RM, Knevel IC et al. (2008) The LEDA Traitbase: a database of life-history traits of the Northwest European flora. Journal of Ecology, 96, 1266–1274.
Kulhanek SA, Leung B, Ricciardi A (2011) Using ecological niche models to predict the abundance and impact of invasive species: application to the common carp. Ecological Applications, 21, 203–213.
Landis JR, Koch GG (1977) The Measurement of Observer Agreement for Categorical Data. Biometrics, 33, 159–174.
Lehan NE, Murphy JR, Thorburn LP, Bradley BA (2013) Accidental introductions are an important source of invasive plants in the continental United States. American Journal of Botany, 100, 1287–1293.
Leung B, Lodge DM, Finnoff D, Shogren JF, Lewis MA, Lamberti G (2002) An ounce of prevention or a pound of cure: bioeconomic risk analysis of invasive species. Proceedings of the Royal Society B: Biological Sciences, 269, 2407–2413.
Li G, Du S, Wen Z (2016) Mapping the climatic suitable habitat of oriental arborvitae ( orientalis) for introduction and cultivation at a global scale. Scientific Reports, 6, 1–9.
Mack MC, D’Antonio CM (1998) Impacts of biological invasions on disturbance regimes. Trends in Ecology and Evolution, 13, 195–198.


Mack RN, Simberloff D, Mark Lonsdale W, Evans H, Clout M, Bazzaz FA (2000) Biotic invasion: causes, epidemiology, global consequences, and control. Ecological Applications, 10, 689–710.
Masters RA, Sheley RL (2001) Principles and Practices for Managing Rangeland Invasive Plants. Journal of Range Management, 54, 502–517.
McDonald A, Riha S, DiTommaso A, DeGaetano A (2009) Climate change and the geography of weed damage: Analysis of U.S. maize systems suggests the potential for significant range transformations. Agriculture, Ecosystems & Environment, 130, 131–140.
McGill BJ, Etienne RS, Gray JS et al. (2007) Species abundance distributions: moving beyond single prediction theories to integration within an ecological framework. Ecology Letters, 10, 995–1015.
Medd RW, Auld BA, Kemp DR (1985) The influence of wheat density and spatial arrangement on annual ryegrass, Lolium rigidum Gaud. competition. Australian Journal of Agricultural Research, 36, 361–371.
Merow C, LaFleur N, Silander Jr. JA, Wilson AM, Rubega M (2011) Developing Dynamic Mechanistic Species Distribution Models: Predicting Bird-Mediated Spread of Invasive Plants across Northeastern North America. The American Naturalist, 178, 30–43.
Merow C, Smith MJ, Silander JA (2013) A practical guide to MaxEnt for modeling species’ distributions: What it does, and why inputs and settings matter. Ecography, 36, 1058–1069.
Merow C, Allen JM, Aiello-Lammens M, Silander JA (2016) Improving niche and range estimates with Maxent and point process models by integrating spatially explicit information. Global Ecology and Biogeography, 25, 1022–1036.
Millennium Ecosystem Assessment (2005) Ecosystems and Human Well-Being: Synthesis. Washington, DC.
Missouri Botanical Garden (2017) Plant Finder. Missouri Botanical Garden, St. Louis, Missouri.
Mitchell PJ, Monk J, Laurenson L (2017) Sensitivity of fine-scale species distribution models to locational uncertainty in occurrence data across multiple sample sizes. Methods in Ecology and Evolution, 8, 12–21.
Moody ME, Mack RN (1988) Controlling the Spread of Plant Invasions: The Importance of Nascent Foci. Journal of Applied Ecology, 25, 1009–1021.
Moreno-Amat E, Mateo RG, Nieto-Lugilde D, Morueta-Holme N, Svenning JC, García-Amorena I (2015) Impact of model complexity on cross-temporal transferability in Maxent species distribution models: An assessment using paleobotanical data. Ecological Modelling, 312, 308–317.
Morris WF, Pfister CA, Tuljapurkar S et al. (2008) Longevity Can Buffer Plant and Animal Populations against Changing Climatic Variability. Ecology, 89, 19–25.
Myers N (1988) Threatened biotas: “Hot spots” in tropical forests. The Environmentalist, 8, 187–208.


Myers N, Mittermeier RA, Mittermeier CG, da Fonseca GAB, Kent J (2000) Biodiversity hotspots for conservation priorities. Nature, 403, 853–858.
Nielsen SE, Johnson CJ, Heard DC et al. (2017) Can Models of Presence-Absence be Used to Scale Abundance? Two Case Studies Considering Extremes in Life History. Ecography, 28, 197–208.
O’Donnell J, Gallagher RV, Wilson PD, Downey PO, Hughes L, Leishman MR (2012) Invasion hotspots for non-native plants in Australia under current and future climates. Global Change Biology, 18, 617–629.
Olson LJ (2006) The Economics of Terrestrial Invasive Species: A Review of the Literature. Agricultural and Resource Economics Review, 35, 178–194.
Oswalt CM, Fei S, Guo Q, Iannone III BV, Oswalt SN, Pijanowski BC, Potter KM (2015) A subcontinental view of forest plant invasions. NeoBiota, 24, 49–54.
Parker IM, Simberloff D, Lonsdale WM et al. (1999) Impact: toward a framework for understanding the ecological effects of invaders. Biological Invasions, 1, 3–19.
Parr CS, Wilson N, Leary P et al. (2014) The Encyclopedia of Life v2: Providing Global Access to Knowledge About Life on Earth. Biodiversity Data Journal, 2, e1079.
Pearce JL, Boyce MS (2006) Modelling distribution and abundance with presence-only data. Journal of Applied Ecology, 43, 405–412.
Pearce J, Ferrier S (2001) The practical value of modelling relative abundance of species for regional conservation planning: a case study. Biological Conservation, 98, 33–43.
Pearson RG, Dawson TP (2003) Predicting the impacts of climate change on the distribution of species: are bioclimate envelope models useful? Global Ecology and Biogeography, 12, 361–371.
Peterson AT, Soberón J (2012) Integrating fundamental concepts of ecology, biogeography, and sampling into effective ecological niche modeling and species distribution modeling. Plant Biosystems - An International Journal Dealing with all Aspects of Plant Biology, 146, 789–796.
Phillips SJ, Dudík M (2008) Modeling of species distribution with Maxent: new extensions and a comprehensive evaluation. Ecography, 31, 161–175.
Phillips SJ, Anderson RP, Schapire RE (2006) Maximum entropy modeling of species geographic distributions. Ecological Modelling, 190, 231–259.
Phillips SJ, Dudík M, Elith J, Graham CH, Lehmann A, Leathwick J, Ferrier S (2009) Sample Selection Bias and Presence-Only Distribution Models. Ecological Applications, 19, 181–197.
Pimentel D, Zuniga R, Morrison D (2005) Update on the environmental and economic costs associated with alien-invasive species in the United States. Ecological Economics, 52, 273–288.
Ponder WF, Carter GA, Flemons P, Chapman RR (2001) Evaluation of Museum Collection Data for Use in Biodiversity Assessment. Conservation Biology, 15, 648–657.


Potts JM, Elith J (2006) Comparing species abundance models. Ecological Modelling, 199, 153–163.
Powell KI, Chase JM, Knight TM (2013) Invasive Plants Have Scale-Dependent Effects on Diversity by Altering Species-Area Relationships. Science, 339, 316–318.
Pyšek P, Richardson DM, Pergl J, Jarošík V, Sixtová Z, Weber E (2008) Geographical and taxonomic biases in invasion ecology. Trends in Ecology & Evolution, 23, 237–244.
Pyšek P, Richardson DM (2010) Invasive Species, Environmental Change and Management, and Health. Annual Review of Environment and Resources, 35, 25–55.
Pyšek P, Jarošík V, Hulme PE, Pergl J, Hejda M, Schaffner U, Vilà M (2012) A global assessment of invasive plant impacts on resident species, communities and ecosystems: The interaction of impact measures, invading species’ traits and environment. Global Change Biology, 18, 1725–1737.
R Core Team (2017) R: A Language and Environment for Statistical Computing.
Radosevich SR, Holt JS, Ghersa C (2007) Ecology of weeds and invasive plants: relationship to agriculture and natural resource management. Wiley-Interscience, 454 pp.
Rawlins KA, Griffin JE, Moorhead DJ, Bargeron CT, Evans CW (2011) EDDMapS: Invasive Plant Mapping Handbook. The University of Georgia, Center for Invasive Species and Ecosystem Health, Tifton GA. BW-2011-02, 32 pp.
Seabloom EW, Borer ET, Buckley Y et al. (2013) Predicting invasion in grassland ecosystems: Is exotic dominance the real embarrassment of richness? Global Change Biology, 19, 3677–3687.
Seebens H, Blackburn TM, Dyer EE et al. (2017) No saturation in the accumulation of alien species worldwide. Nature Communications, 8, 1–9.
Seebens H, Blackburn TM, Dyer EE et al. (2018) Global rise in emerging alien species results from increased accessibility of new source pools. Proceedings of the National Academy of Sciences, 115, 201719429.
Sheley RL, Sheley JL, Smith BS (2015) Economic Savings from Invasive Plant Prevention. Weed Science, 63, 296–301.
Sheley RL, Mangold JM, Anderson JL (2016) Potential for Successional Theory to Guide Restoration of Invasive-Plant-Dominated Rangeland. 76, 365–379.
Sheppard CS, Carboni M, Essl F, Seebens H, Thuiller W (2018) It takes one to know one: Similarity to resident alien species increases establishment success of new invaders. Diversity and Distributions, 24, 680–691.
Simberloff D (2006) Invasional meltdown 6 years later: Important phenomenon, unfortunate metaphor, or both? Ecology Letters, 9, 912–919.
Simberloff D, Von Holle B (1999) Positive Interactions of Nonindigenous Species: Invasional Meltdown? Biological Invasions, 1, 21–32.
Simberloff D, Martin JL, Genovesi P et al. (2013) Impacts of biological invasions: What’s what and the way forward. Trends in Ecology and Evolution, 28, 58–66.
Standish RJ, Robertson AW, Williams PA (2001) The Impact of an Invasive Weed Tradescantia fluminensis on Native Forest Regeneration. Journal of Applied Ecology, 38, 1253–1263.
Stohlgren TJ, Schnase JL (2006) Risk analysis for biological hazards: What we need to know about invasive species. Risk Analysis, 26, 163–173.
Swearingen J, Bargeron C (2016) Invasive Plant Atlas of the United States. https://www.invasiveplantatlas.org/.
Swets JA (1988) Measuring the Accuracy of Diagnostic Systems. Science, 240, 1285–1293.
The vPlants Project (2009) vPlants: A Virtual Herbarium of the Chicago Region.
USDA, NRCS (2017) The PLANTS Database. National Plant Data Team, Greensboro, NC, USA 27401-4901, https://plants.usda.gov/java/.
Vilà M, Ibáñez I (2011) Plant invasions in the landscape. Landscape Ecology, 26, 461–472.
Vilà M, Espinar JL, Hejda M et al. (2011) Ecological impacts of invasive alien plants: A meta-analysis of their effects on species, communities and ecosystems. Ecology Letters, 14, 702–708.
Vittinghoff E, McCulloch CE (2007) Relaxing the rule of ten events per variable in logistic and cox regression. American Journal of Epidemiology, 165, 710–718.
Weldy T, Werier D, Nelson A (2017) New York Flora Atlas. New York Flora Association, Albany, NY.
Wood SN (2003) Thin plate regression splines. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 65, 95–114.
Wood SN (2011) Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. Journal of the Royal Statistical Society (B), 73, 3–36.
Wood SN (2017) Mgcv package for R, version 1.8-2.3.
Yokomizo H, Possingham HP, Thomas MB, Buckley YM (2009) Managing the impact of invasive species: The value of knowing the density-impact curve. Ecological Applications, 19, 376–386.


APPENDICES

APPENDIX A: Ecoregions and land cover in the contiguous United States

Figure A1: Level I ecoregions in the contiguous United States (Commission for Environmental Cooperation (CEC), 1997).


Table A1: Area and percent of the contiguous U.S. belonging to each ecoregion (Commission for Environmental Cooperation (CEC), 1997) and land cover class (EPA, 2014).

| Ecoregion | Area in study region (millions of acres) | Percent of study region by area |
|---|---|---|
| Eastern Temperate Forests | 622 | 32% |
| Great Plains | 554 | 29% |
| North American Deserts | 351 | 18% |
| Northwestern Forested Mountains | 203 | 11% |
| Northern Forests | 91 | 4% |
| Mediterranean California | 41 | 2% |
| Temperate Sierras | 27 | 1% |
| Marine West Coast Forest | 21 | 1% |
| Southern Semiarid Highlands | 11 | <1% |
| Tropical Wet Forests | 6 | <1% |

| Land cover | Area in study region (millions of acres) | Percent of study region by area |
|---|---|---|
| Forest | 486 | 26% |
| Agriculture | 442 | 23% |
| Shrubland | 431 | 23% |
| Grassland | 290 | 15% |
| Developed | 111 | 6% |
| Wetland | 101 | 5% |
| Barren | 24 | 1% |
| Perennial ice/snow | <1 | <1% |

Table A2: Aggregation and selection of proportional land cover classes.

| NLCD 2011 class(es) | Aggregated class | Included/excluded |
|---|---|---|
| Deciduous Forest, Mixed Forest | Mixed/deciduous forest | Included |
| Evergreen Forest | Evergreen forest | Included |
| Shrub/Scrub | Shrubland | Included |
| Grassland/Herbaceous | Grassland | Included |
| Pasture/Hay | Pasture/Hay | Included |
| Cultivated Crops | Cultivated Crops | Included |
| Wetlands | Wetlands | Excluded |
| Developed/Open Space, Developed/Low Intensity, Developed/Medium Intensity | Developed Land | Excluded |
| Developed/High Intensity, Barren Land, Open Water, Perennial Ice/Snow | Unsuitable | Excluded |


APPENDIX B: Invasive species data in the United States

Figure A2: Geographic distribution of absence records from the EDDMapS (2016) dataset.

Figure A3: Spatial bias surface from modeling sampling effort as a response to road and human population density across the contiguous U.S.


Table A3: Invasive plant list sources for each state in the contiguous U.S.

| State | Links to List(s) | Description of List Location |
|---|---|---|
| Alabama | http://www.alabamaadministrativecode.state.al.us/docs/agr/McWord10AGR14.pdf | Alabama Department of Agriculture Administrative Code Chapter 80-10-14 |
| Arizona | http://apps.azsos.gov/public_services/Title_03/3-04.pdf | Administrative Code Title 3 Chapter 4 |
| Arkansas | http://www.aad.arkansas.gov/Websites/aad/files/Content/5943045/Official_Standards_For_Seed_Certification_in_Arkansas.pdf | Sections 2-18-101 and 2-18-108 |
| California | https://govt.westlaw.com/calregs/Document/ID0CA0B50BE0A11E4A26BC7E8507C2F0D?viewType=FullText&originationContext=documenttoc&transitionType=CategoryPageItem&contextData=(sc.Default) | Title 3 Division 4 Chapter 6 Subchapter 6 of Official California Code of Regulations |
| Colorado | https://www.colorado.gov/pacific/agconservation/noxious-weed-species#a ; https://www.colorado.gov/pacific/sites/default/files/NoxiousWeedList.pdf | 8 CCR 1206-2 Colorado Noxious Weed Act |
| Connecticut | https://www.cga.ct.gov/current/pub/chap_446i.htm#sec_22a-381d | Section 22a-381d of Chapter 446i |
| Delaware | http://regulations.delaware.gov/AdminCode/title3/800/801.shtml | Title 3 801 Regulations for Noxious Weed Control and Title 3 Chapter 15 |
| Florida | https://www.flrules.org/gateway/RuleNo.asp?title=INTRODUCTION%20OR%20RELEASE%20OF%20PLANT%20PESTS,%20NOXIOUS%20WEEDS,%20ARTHROPODS,%20AND%20BIOLOGICAL%20CONTROL%20AGENTS&ID=5B-57.007 | Department of Agriculture Section 5B-57.007 |
| Georgia | http://rules.sos.state.ga.us/nllxml/georgiacodesGetcv.aspx?urlRedirected=yes&data=admin&lookingfor=40-12-4 | 40-12-4-.01 and Federal Noxious List Because of Department 40 Chapter 40-4 Subject 40-4-9.11 |
| Idaho | http://invasivespecies.idaho.gov/terrestrial-plants | Invasive Species of Idaho Page |
| Illinois | http://www.ilga.gov/legislation/ilcs/ilcs3.asp?ActID=1735&ChapAct=525%26nbsp%3BILCS%26nbsp%3B10%2F&ChapterID=44&ChapterName=CONSERVATION&ActName=Illinois+Exotic+Weed+Act%2E | Illinois Noxious Weed Act |
| Indiana | https://iga.in.gov/legislative/laws/2017/ic/titles/015/#15-16-7 | Title 15 Article 16 Chapter 7 Section 2, Violation Found Under IC14-24-5-8 |
| Iowa | http://www.weeds.iastate.edu/reference/weedlaw.htm ; http://iowaweedcommissioners.org/iowa-noxious-weed-law/ | Code of Iowa Chapter 317 |
| Kansas | https://agriculture.ks.gov/docs/default-source/statutes-ppwc/noxious_weed.pdf?sfvrsn=2 ; http://agriculture.ks.gov/docs/default-source/statutes-acap/seedlaw.pdf?sfvrsn=2 | Kansas Statutes Chapter 2 Article 13, the Kansas Law |
| Kentucky | http://www.lrc.state.ky.us/kar/012/001/120.htm ; http://lrc.ky.gov/Statrev/ACTS2014RS/0053.pdf | 12 KAR 1:120. Noxious Weed Seed and Acts of the Kentucky General Assembly Chapter 129 Pages 946 and 947 |
| Louisiana | http://www.ldaf.state.la.us/wp-content/uploads/2014/10/Part-XIII_Seed_Rules_and_Regs-6-3-14.pdf | Seed Rules and Regulations Title 7 Part XIII |
| Maine | http://www.maine.gov/dacf/php/horticulture/invasiveplants.shtml | Title 7 Part 2 Chapter 103 Subchapter 11: , and CMR 01-001 Chapter 273 |
| Maryland | http://mda.maryland.gov/plants-pests/Documents/List_target_spp_for_assessment_12Sept2017-1.pdf | Department of Agriculture Invasive Plant List Tier 1 and Tier 2 Plants with Effective Dates |
| Massachusetts | http://www.mass.gov/eea/agencies/agr/farm-products/plants/massachusetts-prohibited-plant-list.html | The Massachusetts Prohibited Plant List |
| Michigan | http://www.michigan.gov/mdard/0,4610,7-125-1569_16993-11250--,00.html | Natural Resources and Environmental Protection Act, Michigan Seed Law |
| Minnesota | http://www.mda.state.mn.us/plants/pestmanagement/weedcontrol/~/media/Files/plants/weeds/noxiousweeds2017.pdf ; https://www.mda.state.mn.us/licensing/licensetypes/~/media/Files/licensing/seed/proresweedseeds.pdf | Minnesota Department of Agriculture 2017 Noxious Weed List and Minnesota Seed Law 1510.0271 and 151.0320 |
| Mississippi | http://www.mdac.ms.gov/wp-content/uploads/bpi_plant_nox.pdf | Mississippi Noxious Weed List, Seed List |
| Missouri | http://agriculture.mo.gov/plants/pdf/missouri-noxious-weed-law.pdf | Title 2 Division 70 Chapter 45 |
| Montana | http://agr.mt.gov/Portals/168/Documents/Weeds/2017%20Noxious%20Weed%20List.pdf | 2017 Montana State Noxious Weed list |
| Nebraska | http://www.nda.nebraska.gov/plant/noxious_weeds/index.html | Nebraska Noxious Weed program website |
| Nevada | http://agri.nv.gov/Plant/Noxious_Weeds/Noxious_Weed_List/ | Nevada Noxious Weed list according to the Department of Agriculture |
| New Hampshire | https://www.agriculture.nh.gov/publications-forms/documents/prohibited-invasive-species.pdf ; https://www.agriculture.nh.gov/publications-forms/documents/restricted-invasive-species.pdf | Prohibited Invasive Plant Species rule Agr 3800 |
| New Jersey | http://www.state.nj.us/agriculture/divisions/pi/prog/noxious.html | Department of Agriculture Noxious Weed Seed Regulations |
| New Mexico | http://www.nmda.nmsu.edu/wp-content/uploads/2012/01/weed_memo_list.pdf | Department of Ag. Noxious Weed List |
| New York | http://www.dec.ny.gov/docs/lands_forests_pdf/isprohibitedplants2.pdf | Official compilation of Codes, Rules and Regulations of the state of NY Title 6 Chapter 5 Subchapter C part 575 |
| North Carolina | http://www.ncagr.gov/plantindustry/plant/weed/noxweed.htm ; http://reports.oah.state.nc.us/ncac/title%2002%20-%20agriculture%20and%20consumer%20services/chapter%2048%20-%20plant%20industry/subchapter%20c/subchapter%20c%20rules.html | 02 NCAC 48A .1702 and 02 NCAC 48C .0101 |
| North Dakota | http://www.legis.nd.gov/information/acdata/pdf/7-06-01.pdf ; https://www.nd.gov/seed/regulatory/seedlabelingrequirements.aspx | Chapter 07-06-01-02, and ND Seed Department |
| Ohio | http://codes.ohio.gov/oac/901%3A5-37 ; http://codes.ohio.gov/oac/901%3A5-27 | 901:5-37-01 of the Ohio Administrative Code and 901:5-27-06 |
| Oklahoma | http://www.oda.state.ok.us/cps/noxweedlaw.htm ; http://www.oda.state.ok.us/cps/okseedlaw.htm | Oklahoma Noxious Weed Law and Seed Law |
| Oregon | https://secure.sos.state.or.us/oard/viewSingleRule.action?ruleVrsnRsn=158408 ; https://secure.sos.state.or.us/oard/viewSingleRule.action?ruleVrsnRsn=158659 | Department of Agriculture 603-052-1200 and 603-056-0205 |
| Pennsylvania | https://www.pacode.com/secure/data/007/chapter110/chap110toc.html ; https://www.pacode.com/secure/data/007/chapter111/s111.22.html ; https://www.pacode.com/secure/data/007/chapter111/s111.23.html | Pennsylvania Code 111.22, 111.23, 110.1 |
| Rhode Island | http://www.dem.ri.gov/pubs/regs/regs/agric/seedregs.pdf | Rhode Island State Seed Law |
| South Carolina | None | NA |
| South Dakota | http://www.sdlegislature.gov/Rules/DisplayRule.aspx?Rule=12:51:03:01 ; https://www.sdstate.edu/sites/default/files/ps/sdcia/upload/Revised-Standards-_2_.pdf | 12:51:03:01 and SD General Seed Certification Standards |
| Tennessee | http://share.tn.gov/sos/rules/0080/0080-06/0080-06-24.20090413.pdf | Tennessee Department of Ag. Chapter 008-06-24 |
| Texas | http://texreg.sos.state.tx.us/fids/200701978-1.html | Texas Department of Agriculture's Invasive Plant List |
| Utah | http://ag.utah.gov/divs-progs/50-plants-and-pests/hay-grain-seed/599-noxious-weed-list.html ; https://rules.utah.gov/publicat/code/r068/r068-008.htm#T2 | State of Utah Noxious Weed List |
| Vermont | http://agriculture.vermont.gov/sites/ag/files/pdf/plant_protection_weed_management/noxious_weeds/NoxiousWeedsQuarantine.pdf ; https://www.law.cornell.edu/cfr/text/7/360.200 | Vermont Agency of Agriculture Food and Market Quarantine 3, 7 CFR 360.200 Designation of Noxious Weeds |
| Virginia | https://law.lis.virginia.gov/admincode/title2/agency5/chapter390/section20 | 2VAC5-390-20 |
| Washington | https://www.nwcb.wa.gov/washingtons-noxious-weed-laws | Washington State Weed Control Board |
| West Virginia | http://www.legis.state.wv.us/WVCODE/Code.cfm?chap=19&art=16 | Chapter 19 Article 16 West Virginia Seed Law, and Department of Ag. Noxious Weed List |
| Wisconsin | http://dnr.wi.gov/topic/Invasives/species.asp?filterBy=Terrestrial&filterVal=Y&catVal=PlantsReg#RegSelect ; http://docs.legis.wisconsin.gov/code/admin_code/nr/001/40.pdf | Wisconsin Department of Natural Resources Chapter 40 |
| Wyoming | http://www.wyoweed.org/images/Designated_List.pdf ; http://wyagric.state.wy.us/images/stories/pdf/techserv/rules/seedsregs2017.pdf | Statutes: Seeds Chapter 51 Regulations Pertaining to Seed Law and Wyoming Weed and Pest Control Act |


APPENDIX C: Ordinal ranking and data cleaning for multiple abundance metrics

Many types of abundance data are represented in the EDDMapS dataset (2016), with most falling into the following categories: cover data (percent cover and canopy cover fields), count data (stem count, number observed, and Texas abundance fields), text data, and area data (infested and gross area fields; Table A4). Percent plot covered was not included because relatively few observations used this field, and all but eight of the records that had percent plot covered also had one of the other selected metrics. The density column is essentially a catchall that mainly contains unitless numbers, area estimates, stem counts, and percent cover data (Bradley et al., 2018). Entries of density that included ‘%’ or one of the widely used qualitative cover bins (“Trace”, “Low”, “Medium”/“Moderate”, “High”) were added to the cover data, and entries of density that included a number followed by “plants” were added to the count data.

Table A4: Invasive species abundance data availability by abundance metric in the EDDMapS (2016) database (after data cleaning).

| Abundance metric | Number of points | Number of species with one or more points |
|---|---|---|
| Cover | 366,957 | 549 |
| Text | 277,849 | 490 |
| Count | 81,135 | 247 |
| Area | 83,498 | 458 |
| Any metric | 582,607 | 609 |

Cover data

Source: Cover data were extracted from the cover field, the canopy cover field (only when percent cover was not given; R. Wallace, personal communication, 2017), and the density field when the entry was either one of the main qualitative cover entries (see below) or a number or range of numbers reported with a “%”.

Definition of bins: In the most commonly reported bins (Table 1), the number on the border of two adjacent bins is included in both bins. Among the minority of observations whose bins avoid this overlap (e.g. those specifying 0-1%, 1.1-5%, 5.1-25%, 25.1-100%), schemes that placed the boundary number in the lower bin were most common, so I used that rule as the standard for binning the cover data. Likewise, observations spanning two qualitative cover bins (i.e. “low-medium” or “medium-high”) were binned according to the lower of the two qualitative values.

Some reporters used an additional class, “majority”, for >50% cover. However, most reporters did not differentiate between high and majority cover, so observations of “majority” were included in the high category. “Moderate” was interpreted as a synonym of “medium”.

Quality control: Records with numeric ranges that were identical to or fell within the categories in Table 1 were binned accordingly. Numbers above 100 were not included, as these are uninterpretable. Single numbers with percent symbols were binned according to the numeric ranges. Unitless numbers from two to one hundred were also binned according to the numeric ranges, because the field implies a unit of percent. Unitless numbers less than or equal to one (e.g. 0.02, 0.50, 0.85, 1.00) were not used, because I cannot determine whether these are estimates of less than one percent (e.g. 0.50 = 0.5%) or proportional cover estimates (e.g. 0.50 = 50% cover).

For numeric range entries spanning multiple bins (e.g. an entry of “20-30%” compared to the bins of 5-25% and >25%), the observation was placed into a bin based on the midpoint of the range entry (25% in the case of 20-30%, placing it in the 5-25% bin). When numerical ranges have no logical maximum, the observation was binned according to the minimum of the range (e.g. “>10%” was placed in the 5-25% bin).
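The cover rules above can be sketched in code. This is a minimal illustration rather than the script used for the thesis: the function names, the bin labels, and the regular expression are hypothetical, and the bin edges (1%, 5%, 25%, 100%) are taken from the examples in this appendix because Table 1 is not reproduced here. Two choices are assumptions: an open-ended entry such as ">25%" is read as strictly above its minimum, and unitless numbers between one and two, which the text does not address, are discarded.

```python
import re

# Upper edge of each cover bin in percent; a boundary value falls in the lower bin.
COVER_BINS = [(1.0, "trace"), (5.0, "low"), (25.0, "medium"), (100.0, "high")]

# Qualitative entries; "moderate" and "majority" fold into "medium" and "high".
QUALITATIVE = {"trace": "trace", "low": "low", "medium": "medium",
               "moderate": "medium", "high": "high", "majority": "high"}

NUMBER = re.compile(r"\d*\.?\d+")


def bin_percent(value, strictly_above=False):
    """Return the bin label for a percent value, or None if it is out of range."""
    if value < 0 or value > 100:
        return None
    for upper, label in COVER_BINS:
        if value < upper or (value == upper and not strictly_above):
            return label
    return None


def bin_cover(entry):
    """Bin one raw cover entry (a string); None means the record is not used."""
    text = entry.strip().lower()
    # Spans such as "low-medium" take the lower of the two qualitative values.
    first_word = text.split("-")[0].strip()
    if first_word in QUALITATIVE:
        return QUALITATIVE[first_word]
    numbers = [float(n) for n in NUMBER.findall(text)]
    if not numbers or max(numbers) > 100:
        return None  # no number, or an uninterpretable value above 100
    if text.startswith(">"):
        # Open-ended range: bin by its minimum, read as strictly above it (assumption).
        return bin_percent(numbers[0], strictly_above=True)
    if len(numbers) == 2:
        return bin_percent(sum(numbers) / 2)  # midpoint rule for ranges
    if len(numbers) == 1:
        if "%" not in text and numbers[0] < 2:
            # Unitless values <= 1 are ambiguous; values in (1, 2) are dropped by assumption.
            return None
        return bin_percent(numbers[0])
    return None
```

For example, `bin_cover("20-30%")` uses the midpoint of 25 and lands in the 5-25% bin, while `bin_cover("0.50")` is rejected as ambiguous.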


Count data

Source: Count data were extracted from the following fields: number observed, stem count, Texas abundance (where all observations were numeric ranges with “plants” as the unit, or the same qualitative counts found in the stem count and number observed columns), and density (when a number is reported with “plants” or “rosettes” as the unit).

Defining the bins: Most of the entries were numerical ranges, with the most common scheme in Table 1. Ranking schemes that placed the border number in the lower category were more common than those that placed the border number into the higher category. Therefore, I placed all records on the boundary of two bins in the lower bin.

Quality control: Records with numeric ranges reported that were identical or fell within the categories in Table 1 were binned accordingly. Singular numbers with “plants” as well as unitless numbers (because the unit is implied in the number observed and stem count columns) were also binned according to the scheme in Table 1. Qualitative counts (i.e. “few”, “many”, etc.) were also reported, but their correspondence to the numeric classes is not known or inferable. These entries were not included.

For numeric range entries spanning multiple bins (e.g. an entry of “0-20” plants compared to the bins of 1 plant, 2-10 plants, and 11-100 plants), the observation was placed into a bin based on the midpoint of the numeric range entered (10 in the case of 0-20, placing it in the 2-10 category). When numerical ranges have no logical maximum, the observation was binned according to the minimum of the range (e.g. “>50 plants” was placed in the 11-100 plants bin).


Text data

Source: These data were extracted from one field, abundance text. This column contains the data called “abundance” in the EDDMapS handbook (Rawlins et al., 2011) and allows six acceptable values.

Defining the bins: The acceptable values were “Single plant”, “Scattered plants”, “Scattered linearly”, “Scattered dense patches”, “Dominant cover”, and “Dense monoculture”. Although these terms describe dispersion, a hierarchy of abundance can be inferred (a dense monoculture corresponds with a higher abundance than scattered plants). There is no logical difference in abundance between scattered plants and scattered linearly, so these were included in one category. As the cover bins do not differentiate between dominant cover (~50%) and dense monoculture (~100%), these two entries were also included in one category.

Quality control: As these six terms were the only entries allowed, no quality control was required.

Area data

Source: Data on the extent of infestations are recorded in the infested area and gross area columns. Entries are typically visual estimates rather than measured values. While this indicates suboptimal accuracy and precision, visual estimation is a widely accepted and frequently used method for quantifying area that compares well against more objective measures (Bullock, 2006). Gross area is defined by drawing a border around the perimeter of an infestation (Rawlins et al., 2011). Infested area is the total area within the gross area that is infested by the target species, excluding the areas in between clumps of infestation (Rawlins et al., 2011).


Defining bins: Whole numbers of square feet or acres were chosen as the cutoffs to reflect how the data were collected. These data are described as visual estimates and are often reported in simple whole numbers. Most smaller area records are reported in square feet, while most large area entries are recorded in acres (Bradley et al., 2018). The bin cutoff values (Fig. 1) were chosen to capture the variation in infested area and to maximize congruence with the text and count ranking schemes, both graphically (Fig. A4) and logically (i.e., for the majority of species, a single plant is unlikely to exceed 10 square feet, so areas less than 10 square feet are categorized with the “Single plant” bin seen in other metrics of abundance). For instance, the median values of infested area within each text bin are included in that bin (or close to the bin boundary) wherever possible (Fig. A4).

Quality control: Infested area has the potential to be confounded with the gross area field in the EDDMapS database. Therefore, observations where infested area exceeds gross area (violating the definition in Rawlins et al., 2011) were not included. Observations from reporters known to enter aggregated data (R. Wallace, personal communication, 2017) were removed, as these would inflate estimates of local infested area. Observations of 1000 acres were only accepted when specified as reasonable by R. Wallace (personal communication, 2017).

Aligning the bins

While many abundance metrics distinguished between trace and low abundance (i.e. <1% and 1-5% cover, or single plants and 2-10 plants), observations with trace abundance were so rare that these two categories had to be lumped together. While a fourth category may further refine our predictions of abundance, the distinction between trace and low abundance is likely less important to managers than the distinctions between low, medium, and high abundance. The ordinal scales for each abundance data type were aligned so that, for example, cover ranks 1-3 were interpreted the same as text ranks 1-3. This method maximizes the volume of data captured for each species and should represent coarse-scale variation in abundance (e.g. high vs. low) even if some finer-scale information is lost by combining non-identical metrics. Using observations that had entries for more than one data type, I assessed the compatibility of the different abundance metrics. As expected, agreement between the different metrics is not perfect, but most pairs of metrics had moderate to high rank correlations (Kendall’s tau; Table A5).
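The alignment step can be illustrated with a small lookup. This is only a sketch under stated assumptions: the names `RANK`, `TEXT_LEVEL`, and `text_rank` are hypothetical, and the placement of the six text values on a trace/low/medium/high scale is my reading of the merges described above (scattered plants with scattered linearly, dominant cover with dense monoculture), not a mapping confirmed by Table 1.

```python
# Four-level labels collapsed to three ordinal ranks, because trace and low
# abundance were lumped together.
RANK = {"trace": 1, "low": 1, "medium": 2, "high": 3}

# Assumed placement of the six EDDMapS text values on the four-level scale.
TEXT_LEVEL = {
    "single plant": "trace",
    "scattered plants": "low",
    "scattered linearly": "low",
    "scattered dense patches": "medium",
    "dominant cover": "high",
    "dense monoculture": "high",
}


def text_rank(value):
    """Return the 1-3 ordinal rank for an abundance text entry."""
    return RANK[TEXT_LEVEL[value.strip().lower()]]
```

Under this mapping, `text_rank("Single plant")` and `text_rank("Scattered linearly")` both return rank 1, the same rank given to trace and low cover.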

Figure A4: Box plot illustrating the congruency between infested area (a continuous variable) and text (logically ordered qualitative estimates of invasive plant dispersion). The boundaries between bins (1-3, shown in red) are shown as blue horizontal lines.


Table A5: Pairwise correlations (Kendall’s tau, bottom left half of matrix) between all abundance metrics, based on the points (sample size, top right half of matrix) with entries for both metrics. All correlations were highly statistically significant (p<0.001).

| | Cover | Count | Text | Area |
|---|---|---|---|---|
| Cover | | 9,678 | 108,330 | 42,499 |
| Count | 0.37 | | 106,624 | 7,164 |
| Text | 0.28 | 0.54 | | 21,349 |
| Area | -0.08 | 0.36 | 0.26 | |
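For readers unfamiliar with the statistic in Table A5, a pairwise Kendall's tau can be sketched as follows. The thesis does not state which variant was used; this example assumes tau-b, which adjusts for the ties that binned ordinal data produce. The function name is hypothetical, and the quadratic-time loop is for illustration only (a library routine such as `scipy.stats.kendalltau` would be preferable for samples of the size reported in the table).

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length sequences of ranks."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both metrics: ignored by tau-b
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    denom = sqrt((concordant + discordant + tied_x_only)
                 * (concordant + discordant + tied_y_only))
    if denom == 0:
        return float("nan")  # undefined when one metric never varies
    return (concordant - discordant) / denom
```

Each input would hold the ordinal ranks (1-3) of the same set of points under two different metrics, for example cover ranks and text ranks.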

APPENDIX D: Abundance-based species distribution modeling using infested area

One of the most common metrics of abundance in EDDMapS (2016) is the local extent of infestations (area). This metric is especially attractive because it is continuous, while other common metrics like cover are either qualitative or binned. Data on the extent of infestations are recorded in the infested area and gross area columns. Entries are typically visual estimates rather than measured values. While this indicates suboptimal accuracy and precision, visual estimation is a widely accepted and frequently used method for quantifying area that compares well against more objective measures (Bullock, 2006). Gross area is defined by drawing a border around the perimeter of an infestation, while infested area is the total area within the gross area that is infested by the target species, excluding the areas in between clumps of infestation (Rawlins et al., 2011). Given this definition, infested area is a more specific metric of local abundance than gross area, and infested area should not exceed gross area (Rawlins et al., 2011). I used log-transformed infested area as the response variable in my SDMs and excluded all records where infested area exceeded gross area, contrary to the above definition.

One issue with the raw dataset is the large range in infested area values, from less than a square foot to hundreds of thousands of acres. The high values may be from aerial surveys or the aggregation of finer-scale ground survey data into broader-scale estimates of abundance (R. Wallace, personal communication, 2017). These large values would be problematic if included in my analyses because the data definitions imply a local scale and most of the data points (90%) are less than 10 acres. Therefore, I excluded all observations where gross area is greater than 10 acres. The threshold of 10 acres was applied to gross area because it is the best indicator of the scale at which the data were collected. This threshold was selected to minimize the loss of local-scale abundance data while maximizing the removal of broad-scale abundance data. An area of 10 acres can be considered local at the scale of my analysis; 10 acres covers less than 1% of a climate grid cell. Data were thinned to the 4 km x 4 km grid cells.

Starting with the same species list from Allen & Bradley (2016), I included all species with observations of abundance in at least 20 unique grid cells. Although this lower limit on data quantity is much more relaxed than the limit used in the ordinal regressions (10 or more events in the rarest bin), this resulted in a similarly sized set of 154 species (compared to 155 in the ordinal regressions). This illustrates the extent to which points are lost when using one abundance metric. Most species in the infested area dataset had fewer than 50 points, while most species in the ordinal abundance dataset had more than 50 points.
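The filtering steps described in this appendix can be sketched as a short routine. This is an illustration under stated assumptions, not the thesis workflow: the record fields (`species`, `cell_id`, `infested_acres`, `gross_acres`) and the function name are hypothetical, areas are assumed to be already converted to acres, and thinning keeps the first record encountered per species and grid cell because the text does not say which record was retained. The constant `CELL_SHARE` checks the statement that 10 acres covers less than 1% of a 4 km x 4 km cell.

```python
from collections import defaultdict
from math import log

SQ_M_PER_ACRE = 4046.8564224          # one international acre in square metres
MAX_GROSS_ACRES = 10.0                # local-scale threshold applied to gross area
MIN_CELLS = 20                        # minimum unique grid cells per species
CELL_AREA_SQ_M = 4000.0 * 4000.0      # one 4 km x 4 km climate grid cell

# Share of a grid cell covered by 10 acres (about 0.25%).
CELL_SHARE = MAX_GROSS_ACRES * SQ_M_PER_ACRE / CELL_AREA_SQ_M


def clean_area_records(records, min_cells=MIN_CELLS):
    """Return {species: {cell_id: log infested area}} for species with enough cells."""
    by_species = defaultdict(dict)
    for rec in records:
        infested, gross = rec["infested_acres"], rec["gross_acres"]
        if infested <= 0:
            continue  # the log transform is undefined for non-positive areas
        if infested > gross:
            continue  # violates the definition of infested vs. gross area
        if gross > MAX_GROSS_ACRES:
            continue  # likely aggregated or broad-scale data
        # Thinning: keep the first record per species and grid cell (assumption).
        by_species[rec["species"]].setdefault(rec["cell_id"], log(infested))
    return {sp: cells for sp, cells in by_species.items() if len(cells) >= min_cells}
```

Calling `clean_area_records(records)` on a list of such dictionaries returns only the species that meet the 20-cell threshold, each with one log-transformed value per grid cell.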

I used generalized additive models (GAMs) to model abundance as a response to the bioclimatic variables (mean diurnal range, minimum temperature of the coldest month, maximum temperature of the warmest month, mean temperature of the wettest quarter, annual precipitation, and precipitation seasonality). GAMs estimate relationships between one or more predictor variables and one response variable using smooth functions rather than assuming the shape of a response (Wood, 2011). I used thin plate regression splines, which are available in the mgcv package (Wood, 2017) in R 3.4 (R Core Team, 2017), to estimate non-linear response curves of abundance to climate variables for each species individually. “Thin plate” is an analogy to bending a metal sheet, where there is some resistance to bending. This allows for non-linear response curves that are not overfitted and is achieved through a penalized smoothness parameter that weights smoothness versus model fit (Wood, 2003). Out of all possible combinations of variables, the best model was selected using AIC.
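The all-combinations search can be sketched independently of the model-fitting software. The models in this appendix were fit with mgcv in R; the Python below only illustrates the selection logic. The callback `fit` is hypothetical and is assumed to return a log-likelihood and a parameter count (for a GAM, the effective degrees of freedom) for a given subset of variables. Whether an intercept-only model was also considered is not stated, so it is omitted here.

```python
from itertools import combinations

CLIMATE_VARS = [
    "mean_diurnal_range", "min_temp_coldest_month", "max_temp_warmest_month",
    "mean_temp_wettest_quarter", "annual_precipitation", "precipitation_seasonality",
]


def aic(log_likelihood, n_params):
    """Akaike information criterion: lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


def all_subsets(variables):
    """Yield every non-empty combination of the candidate variables."""
    for size in range(1, len(variables) + 1):
        yield from combinations(variables, size)


def select_by_aic(variables, fit):
    """Return (best AIC, best subset); fit(subset) -> (log_likelihood, n_params)."""
    best = None
    for subset in all_subsets(variables):
        log_likelihood, n_params = fit(subset)
        score = aic(log_likelihood, n_params)
        if best is None or score < best[0]:
            best = (score, subset)
    return best
```

With six candidate variables the search covers 63 models per species, which is small enough to fit exhaustively.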

Overall, model fit tended to be poor. Even in models with reasonable fit, the shapes of the fitted relationships were ecologically unreasonable. For example, some models predicted that as annual precipitation increases, abundance will decrease until a certain point, where it will increase and then decrease again. It was clear that the GAMs were overfitting implausible relationships to the small species-level datasets. Therefore, I conducted a follow-up analysis using linear regression, where linear and quadratic terms were included. This analysis resulted in even poorer model fits, and implausible relationships (those which are concave up) were very common.

These issues may have been due to the properties of the infested area dataset itself. Abundance was not necessarily sampled all along the gradient of possible infested areas (i.e. 0-10 acres) within species. For species where samples ranged from, say, 1-2 acres, relationships between abundance and climate variables were spurious, as variation in abundance was not adequately captured. Given that the points with infested area data are a subset of all occurrence points, the datasets for many species did not represent the full range of climatic conditions spanned by the species’ full range. In comparison to the ordinal abundance dataset, this issue is magnified by the fact that the infested area dataset is an even smaller subset of the full occurrence dataset.

Given these results, I decided to incorporate more abundance information into this analysis. Continuous variation in the local extent of infestations may not be well captured by the visual estimates used for the infested area and gross area data fields. Visual estimates are more likely to represent coarser-scale variation well, such as the difference between general ordinal categories (e.g. “low”, “medium”, “high”). Ordinal categorization also provides the opportunity to combine multiple metrics of abundance into a single metric. This option is especially attractive because it allows for the incorporation of much more abundance information from a variety of popularly recorded abundance metrics. As there were generally more data, the shapes of response curves were typically more reasonable, and model fits were more satisfactory. Therefore, the ordinal regressions were used for the main analysis.


APPENDIX E: Intermediate model results

Table A6: Summary of model input and output for all 155 species with adequate data, with the 70 species included in the final analyses placed first and in bold (listed by scientific name, column 1). For each species, I included: the number of climate grid cells with occurrence data and abundance data (columns 2-3); the values of metrics used for model screening (columns 4-6): MaxEnt test AUC score, percent of climate grid cells in the establishment range with MESS scores below -10, and Cohen’s kappa of the best-fit ordinal model; a summary of the selected climate (columns 7-12) and proportional land cover (columns 13-18) variables in the best-fit ordinal model; the size of the establishment range in square kilometers and proportional to the study region (columns 19-20); the size of the impact range in square kilometers and proportional to the establishment range (columns 21-22); infilling of the impact range (column 23); and the percent of the richness hotspot overlapped by the abundance hotspot when calculated without the species (leave-one-out analysis, column 24).

Climate variables: mean diurnal range, min temp of coldest mo., max temp of warmest mo., mean temp of wettest quart., annual precipitation, precipitation seasonality. Proportional land cover variables: mixed/decid. forest, pine forest, shrub/scrub, grassland, hay/pasture, cultivated cropland.

INCLUDED IN FINAL ANALYSES

| Scientific name | Number of cells with occurrence data | Number of cells with abundance data | MaxEnt test AUC | % of non-analog cells (MESS) | Ordinal regression kappa | Selected climate and land cover terms (columns 7-18) | Establishment range (10^6 km^2) | % of study region | Impact range (10^6 km^2) | % of estab. range | Infilling (%) | Leave-one-out hotspot overlap |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Aegopodium podagraria | 406 | 109 | 0.945 | 12.1^ | 0.16** | -L** -L** | 1.0 | 12.6 | 0.3 | 26.4 | <0.1 | 47.4 |
| Ailanthus altissima | 3623 | 1277 | 0.794 | 1.7 | 0.19** | -Q** -Q** -Q** +Q** +L** -Q** -L** +Q** | 2.9 | 37.6 | 0.5 | 18.0 | 0.2 | 46.4 |
| Albizia julibrissin | 2927 | 1668 | 0.824 | 4.7 | 0.22** | +Q** +L** -Q** +Q** -Q** +Q** -Q +L* | 1.7 | 21.4 | 0.2 | 11.9 | 0.8 | 46.0 |
| Alliaria petiolata | 3778 | 1423 | 0.814 | 1.5 | 0.07** | -L** -Q** -L** -L** -Q** -L** | 1.9 | 24.5 | 0.0 | 0.9 | 1.1 | 46.4 |
| Arundo donax | 1761 | 228 | 0.882 | 2.5 | 0.31** | +Q** +L** -Q** -Q** | 2.2 | 28.4 | 0.5 | 20.5 | 0.1 | 46.5 |
| Arctium minus | 1800 | 224 | 0.733 | 2.2 | 0.16** | -Q** -Q** +L** +L** | 5.1 | 65.2 | 0.8 | 15.7 | <0.1 | 50.2 |
| Berteroa incana | 1158 | 437 | 0.799 | 4.1 | 0.02** | -L** +L** -L** | 4.1 | 52.5 | 0.3 | 6.1 | <0.1 | 47.4 |
| Berberis thunbergii | 2440 | 1041 | 0.869 | 1.4 | 0.15** | +Q** -Q** -Q** +Q** | 1.3 | 16.8 | 0.0 | 1.8 | 0.0 | 46.9 |
| Broussonetia papyrifera | 617 | 192 | 0.890 | 1.6 | 0.27** | -Q** -L** | 1.7 | 21.3 | 0.4 | 23.8 | <0.1 | 44.7 |
| Cardaria draba | 5304 | 1837 | 0.765 | 8.3 | 0.02** | -Q** -Q** -Q** -L** -L** -L** | 2.5 | 31.7 | 0.4 | 14.2 | <0.1 | 48.4 |
| Casuarina glauca | 53 | 51 | 0.996 | 7.5 | 0.19** | +Q** | 0.0 | 0.5 | 0.0 | 84.3 | 1.2 | 47.2 |
| diffusa | 4157 | 670 | 0.785 | 6.0 | 0.07** | -Q** -L** +Q** -Q** +L** +L** | 2.5 | 32.0 | 0.2 | 8.7 | <0.1 | 47.5 |
| Celastrus orbiculatus | 2479 | 1171 | 0.874 | 0.9 | 0.08** | -L* -L** -Q** -L** -L** +Q** -L** | 1.3 | 16.2 | 0.0 | 0.1 | 0.0 | 47.1 |
| Chelidonium majus | 735 | 106 | 0.928 | 30.7^ | 0.18** | +Q** | 1.2 | 15.5 | 0.0 | 0.5 | 1.2 | 47.0 |
| Cynanchum louiseae | 581 | 141 | 0.954 | 0.0 | 0.21** | -Q +Q** | 0.6 | 7.5 | 0.0 | 0.0 | NA | 47.2 |
| Cynoglossum officinale | 4134 | 1286 | 0.793 | 7.0 | 0.06** | -Q** +L** +Q** -Q** +L** +L** +L** | 2.1 | 26.7 | 0.1 | 2.8 | <0.1 | 46.3 |
| Cytisus scoparius | 3451 | 159 | 0.848 | 8.4 | 0.6** | -Q** | 2.0 | 25.7 | 0.0 | 0.0 | NA | 45.2 |
| Dioscorea bulbifera | 769 | 587 | 0.961 | 0.3 | 0.11** | -Q** +L** +Q** +L** | 0.2 | 1.9 | 0.0 | 3.3 | 0.4 | 47.2 |
| Dioscorea oppositifolia | 277 | 69 | 0.955 | 1.9 | 0.25** | +Q** +L | 0.9 | 12.1 | 0.2 | 17.9 | <0.1 | 45.8 |
| Elaeagnus angustifolia | 3008 | 879 | 0.758 | 1.7 | 0.18** | -Q** -L** +Q** -L** -L** -Q** -L** -L** | 5.1 | 64.9 | 0.8 | 15.8 | <0.1 | 49.5 |
| Elaeagnus umbellata | 3029 | 1449 | 0.838 | 2.5 | 0.11** | +L** -L** +Q** -Q** +Q** +L** +Q** +L** | 1.9 | 23.7 | 0.1 | 6.7 | <0.1 | 46.5 |
| Euonymus alatus | 1060 | 432 | 0.929 | 4.4 | 0.17** | +Q** -L** +Q** | 1.0 | 12.4 | 0.0 | 2.8 | 0.0 | 47.0 |
| Euphorbia esula | 5015 | 2015 | 0.710 | 2.7 | 0.11** | +L** -L** -Q** +L** +Q** +L** -Q** -Q** +L** | 4.1 | 52.4 | 0.3 | 8.0 | 0.2 | 48.4 |
| Frangula alnus | 1391 | 625 | 0.904 | 3.3 | 0.14** | +Q** +Q** -Q** -L** -L** -Q** | 0.9 | 12.0 | 0.1 | 7.6 | <0.1 | 46.7 |
| Hemerocallis fulva | 1061 | 87 | 0.838 | 2.9 | 0.19** | -Q** +L** | 2.8 | 35.4 | 0.1 | 2.6 | <0.1 | 45.5 |
| Hesperis matronalis | 1554 | 167 | 0.788 | 1.2 | 0.22** | -L** +L** | 4.3 | 54.3 | 0.5 | 11.2 | <0.1 | 46.1 |
| Hypericum perforatum | 4807 | 317 | 0.744 | 7.2 | 0.36** | -L* +L* +Q** +L** +L* | 4.2 | 53.9 | 0.2 | 4.2 | <0.1 | 48.7 |
| Imperata cylindrica | 1983 | 1414 | 0.909 | 0.8 | 0.11** | -L** -L** -Q** +L** +L** +L** | 0.5 | 6.9 | 0.1 | 18.6 | 0.9 | 47.1 |
| Iris pseudacorus | 1540 | 208 | 0.824 | 8.6 | 0.22** | -Q** -L** +L** +Q** +Q** +L** +L** | 3.5 | 44.6 | 0.9 | 24.5 | <0.1 | 46.9 |
| Isatis tinctoria | 1471 | 593 | 0.848 | 9.8 | 0.03** | -Q** -L** +Q** | 4.0 | 51.5 | 0.0 | 0.9 | 0.0 | 45.4 |
| Lespedeza cuneata | 1973 | 720 | 0.852 | 3.8 | 0.44** | +L** -Q** -Q** +L** -L** | 2.0 | 26.0 | 0.5 | 23.5 | 0.9 | 45.2 |
| Leucaena leucocephala | 342 | 257 | 0.981 | 0.2 | 0.24** | -L** -L** | 0.1 | 1.2 | 0.0 | 0.0 | NA | 47.2 |
| Ligustrum sinense | 1952 | 1143 | 0.868 | 3.4 | 0.15** | -Q** +L** -Q** -Q** +L** | 1.5 | 18.7 | 0.1 | 7.3 | 0.3 | 46.1 |
| Linaria vulgaris | 3492 | 764 | 0.737 | 5.9 | 0.13** | -Q** -L** +Q** -Q** +Q** | 4.4 | 55.5 | 0.7 | 16.6 | <0.1 | 49.7 |
| Lotus corniculatus | 2785 | 1056 | 0.787 | 7.9 | 0.08** | -Q** +Q** | 4.3 | 55.5 | 1.4 | 32.9 | 0.7 | 55.7 |
| Lonicera maackii | 710 | 234 | 0.910 | 2.2 | 0.31** | +L** -Q** +Q* -L** -L** | 1.8 | 22.4 | 0.2 | 10.5 | 0.2 | 46.5 |
| Lonicera morrowii | 1742 | 582 | 0.881 | 6.0 | 0.35** | +Q** -L** +Q** -Q** +L* | 1.5 | 18.8 | 0.2 | 13.3 | 0.5 | 47.6 |
| Ludwigia peruviana | 557 | 429 | 0.971 | 3.9 | 0.21** | -L** -L** -L** -L** | 0.1 | 1.4 | 0.0 | 0.6 | 0.0 | 47.2 |
| Lygodium japonicum | 1498 | 1105 | 0.924 | 2.0 | 0.17** | -L** -Q** +Q** -L* -Q** -Q** +L** | 0.6 | 7.2 | 0.0 | 1.9 | 0.2 | 46.9 |
| Lygodium microphyllum | 1093 | 624 | 0.953 | 0.5 | 0.16** | -Q** -L** -L* -Q** -L** | 0.1 | 0.9 | 0.0 | 0.0 | NA | 47.2 |
| Lythrum salicaria | 7171 | 1291 | 0.737 | 1.1 | 0.18** | +L** +Q** -L** -Q** +Q** | 3.5 | 44.6 | 0.1 | 3.7 | 0.1 | 45.8 |
| Melia azedarach | 2029 | 1096 | 0.873 | 9.0 | 0.23** | -L** +L** -Q** +Q** +L** +L** | 1.2 | 15.9 | 0.1 | 10.7 | <0.1 | 46.7 |
| Microstegium vimineum | 2214 | 875 | 0.879 | 1.7 | 0.05** | +Q* -L** -Q** +Q** | 1.3 | 16.2 | 0.3 | 25.3 | 0.4 | 45.5 |
| Neyraudia reynaudiana | 196 | 127 | 0.991 | 0.1 | 0.12** | -L** -L** | 0.0 | 0.3 | 0.0 | 0.0 | NA | 47.2 |
| Onopordum acanthium | 5964 | 2228 | 0.759 | 4.6 | 0.17** | -Q** -L** -Q** -L** +Q* -L** -L** -L** | 2.2 | 28.6 | 0.1 | 3.3 | <0.1 | 46.9 |
| Panicum repens | 958 | 772 | 0.952 | 0.6 | 0.06** | +Q** -L** +L** +L** | 0.2 | 2.0 | 0.0 | 9.1 | 3.2 | 47.2 |
| Paulownia tomentosa | 807 | 229 | 0.914 | 7.1 | 0.28** | +L** -L** +L** | 1.5 | 19.2 | 0.0 | 0.0 | NA | 46.1 |

Phyllostachys aurea 460 244 0.926 19.9^ 0.22** -L** +L** 1.2 15.0 1.0 83.1 0.2 45.9
Phleum pratense 1915 88 0.767 27.9^ 0.33** +L** -L** 4.7 60.4 1.1 23.5 <0.1 46.4
Pinus sylvestris 232 70 0.933 88.6^ 0.35** +Q** +L** 1.2 15.7 0.2 13.0 <0.1 46.5
Polygonum cuspidatum 4702 2133 0.787 3.6 0.11** +L** -Q* +L** -Q** 2.2 28.2 1.2 53.7 1.1 40.3
Pueraria montana var. lobata 2922 1730 0.837 0.4 0.19** -Q** +L** -Q** -L** -Q** +Q** -Q** 1.8 22.6 1.5 83.3 1.4 39.1
Pyrus calleryana 824 563 0.909 4.4 0.28** +Q** +L** -L** -L** -L** -L** +Q** -L** 1.6 20.6 0.1 6.0 0.3 45.8
Ranunculus ficaria 218 77 0.954 8.7 0.25** -L** 1.2 14.9 0.2 16.9 <0.1 46.3
Rhamnus cathartica 2432 1253 0.856 0.7 0.13** -Q** -L** -L** -Q* -Q** 1.2 15.6 0.1 8.5 0.5 47.7
Rosa multiflora 4671 2057 0.798 2.8 0.12** +Q** +Q** -Q** -L** -Q** -L** -L* +Q** -L** 2.4 30.0 0.2 9.1 0.7 45.9
Rubus armeniacus 1829 65 0.891 8.6 0.54** -Q** +L** 1.8 22.7 0.6 34.6 <0.1 47.2
Schinus terebinthifolius 1658 1489 0.932 9.1 0.14** -Q** -Q* +L** +Q** +Q** +L** 0.1 1.2 0.0 0.5 3.8 47.2
Sesbania punicea 445 197 0.953 4.4 0.1** -L** -L** 0.7 9.0 0.1 11.0 0.0 47.2
Securigera varia 1253 497 0.820 6.3 0.18** +L** +L** -L* -Q* +Q* +Q** 3.0 38.4 1.7 55.0 0.2 45.2
Sorghum halepense 1718 334 0.781 0.0 0.16** -Q** +L** -Q* -Q** 4.3 54.3 0.1 2.6 <0.1 45.1
Tamarix ramosissima 4601 1689 0.755 2.0 0.1** +L** +Q** +Q** +L** +L** -L** -Q** 3.7 47.1 1.4 38.2 <0.1 50.4
Tanacetum vulgare 3517 2011 0.815 6.9 0.13** +Q** -Q** +Q** -Q** +Q** +Q** -Q** 3.2 40.7 1.4 44.4 0.1 41.6
Triadica sebifera 1916 1210 0.900 5.6 0.3** +L** +Q** +Q** -L** 0.8 10.2 0.3 35.6 0.6 47.7
Tribulus terrestris 2365 389 0.751 8.5 0.05** -Q** +Q** +L** -Q** 5.3 67.4 0.3 5.7 0.0 46.0
Tussilago farfara 1084 224 0.921 15.9^ 0.16** -L** +L** +Q** 0.8 10.0 0.0 0.6 0.0 47.1
Urochloa mutica 292 221 0.982 0.0 0.08** -L** -L** 0.1 1.0 0.0 0.0 NA 47.2
Vernicia fordii 151 94 0.980 18.3^ 0.55** +Q** +Q** 0.4 4.5 0.1 29.7 0.1 47.4
Verbascum thapsus 3428 871 0.749 3.1 0.25** +L** +Q** -L** -Q** -L** +L** 4.1 52.1 0.0 1.0 0.0 45.8
Vinca minor 1110 130 0.867 6.0 0.35** -L** -Q** +L** 2.5 31.4 1.2 47.0 <0.1 38.8

NOT INCLUDED IN FINAL ANALYSES
Abrus precatorius 376 325 0.980 2.9 <0.01^ -L* 0.1 0.9 0.0 0.0 NA
Acer ginnala 290 173 0.940 0.4 0.02^ +L 1.1 14.4 0.0 0.4 0.0
Acer platanoides 1145 246 0.894 11.5^ 0.11** +L** -L** 2.9 37.0 0.1 2.0 0.0
Achyranthes japonica 152 138 0.984 29^ 0.23** +Q** -Q** 0.3 3.4 0.0 1.0 0.0

Acroptilon repens 5301 1471 0.752 4.9 <0.01^ -L** -L** -Q** -L** -L** -L** 2.6 33.7 0.0 1.1 0.0
Alhagi maurorum 532 206 0.905 16.9^ 0.03*^ +L 3.7 46.8 0.8 22.7 0.0
Ampelopsis brevipedunculata 359 76 0.967 8.4 0.07^ -L** 0.5 6.8 0.0 0.0 NA
Anthriscus sylvestris 347 150 0.890 26^ 0.25** -Q** 3.2 40.6 0.0 0.0 NA
Arthraxon hispidus 454 65 0.910 31.5^ 0.04^ +L** 1.9 24.0 0.1 2.7 0.0
Artemisia vulgaris 765 50 0.886 27.5^ 0.29** +Q** 3.2 41.3 0.7 21.3 <0.1
Bromus tectorum 11084 3495 0.652^ 4.4 0.22** +Q** -Q** +L** +Q** -Q** -Q** +L** +L** 5.2 66.7 1.2 23.2 0.8
Carduus acanthoides 1179 412 0.795 17.6^ 0.37** +L** -L** +Q** +Q** -Q* +L** +L** 5.1 65.3 1.7 33.3 <0.1
Caragana arborescens 491 323 0.903 46.4^ 0.06** -Q** +Q** 2.9 37.0 0.3 11.1 <0.1
Casuarina equisetifolia 441 287 0.977 4.5 <0.01^ -Q** -Q** 0.1 0.7 0.0 0.0 NA
Carduus nutans 7879 2754 0.69^ 0.8 0.04** -Q** -Q** +L** +L** -L** +L** -L** +L** 5.0 63.2 0.3 5.7 <0.1
Centaurea solstitialis 3989 218 0.817 34.1^ <0.01^ +L** +L** 2.7 34.7 0.5 17.7 0.0
Centaurea stoebe ssp. micranthos 9248 4446 0.682^ 1.1 <0.01** -Q** +Q** +L** -Q* +L** +L** 3.7 47.6 0.0 0.1 2.3
Centaurea virgata 755 533 0.929 19.7^ 0.15** -L** -L** -L** +L** 1.6 20.1 0.2 11.1 0.0
Chondrilla juncea 2295 1229 0.838 54.6^ 0.02*^ +L** -Q** +Q** +Q** -L** 2.7 34.4 0.6 23.0 <0.1
Cirsium arvense 14686 6836 0.634^ 2.3 0.05** -Q** -Q** -Q** -L** -L** -Q** 4.0 51.3 0.0 0.0 0.0
Cinnamomum camphora 677 573 0.954 7.8 0.01^ -L** -L** 0.4 5.2 0.0 0.0 NA
Cirsium vulgare 8358 1541 0.673^ 0.8 0.23** -Q** -Q** +Q** -Q** +L* 5.0 64.3 0.0 0.0 0.0
Convolvulus arvensis 5588 1264 0.661^ 9.8 0.24** +Q** +L** -L** -L** -Q** -Q** +L** +L** -Q** +L** 5.7 73.2 1.2 21.4 <0.1
Colocasia esculenta 823 374 0.948 21.2^ 0.17** -Q** -L** +Q** -L** 0.6 7.4 0.0 5.9 0.1
Conium maculatum 3365 728 0.725 0.2 0.02^ -Q** -L* 5.1 65.0 0.0 0.0 NA
Cynodon dactylon 2321 260 0.776 0.1 <0.01^ -L** -Q** 4.3 55.1 0.0 0.2 0.0
Daucus carota 2831 130 0.776 17.9^ 0.3** -L** +L** +L** -L** 3.9 49.7 0.2 6.1 <0.1
Dactylis glomerata 2592 135 0.724 17.3^ 0.19** +Q** +L 5.2 66.7 1.5 29.0 <0.1
Dipsacus fullonum 1793 497 0.815 14.7^ 0.12** +L** -L* -Q** +L** +Q** 4.0 50.6 0.7 17.2 <0.1
Dipsacus laciniatus 630 120 0.877 26.6^ 0.3** +L** +L** +L 2.9 36.7 1.3 44.9 <0.1
Echium vulgare 770 175 0.862 17^ 0.33** +L** -L** +L** 3.0 38.7 0.3 9.6 0.0

Elaeagnus pungens 225 133 0.938 17^ 0.12*^ -L** +L** 1.2 15.3 0.0 2.2 0.0
Elymus repens 2232 146 0.686^ 6.7 0.36** -L** -L** -Q** 5.4 68.3 1.0 17.7 <0.1
Epipremnum pinnatum 83 73 0.994 0.0 0.1^ -Q** 0.0 0.5 0.0 0.0 NA
Erodium cicutarium 2795 528 0.784 35.7^ 0.29** +Q** +Q** -L** -Q** 4.8 61.6 0.5 10.5 <0.1
Euphorbia cyparissias 1623 107 0.813 21.1^ 0.12** +L* 3.8 48.8 0.8 20.6 <0.1
Euonymus fortunei 356 115 0.934 14.2^ 0.27** -L** +Q** +Q** 1.3 17.1 0.4 33.4 <0.1
Euphorbia myrsinites 200 62 0.942 16.9^ 0.23** +L** +L** 1.4 18.2 0.3 23.4 <0.1
Glechoma hederacea 2089 245 0.804 16.6^ <0.01^ +L** +L** 4.2 53.1 0.0 0.5 0.0
Halogeton glomeratus 1262 616 0.877 23^ 0.41** +L** -Q** +Q** +L** -Q** -Q +Q** 3.0 37.8 1.3 44.3 <0.1
Hedera helix 1387 351 0.880 3.0 0.04*^ -L** +Q** 2.0 26.0 0.1 4.6 <0.1
Hieracium aurantiacum 1101 468 0.872 16.9^ 0.15** -L** -L** +Q** -Q** -L** 2.1 26.5 1.4 65.3 0.2
Humulus japonicus 595 113 0.915 7.6 0.03^ -L* 1.7 21.1 0.0 0.0 NA
Hyoscyamus niger 1181 645 0.887 13.7^ 0.11** -Q** -L** -L** +L** +L** 1.9 24.1 0.0 2.1 <0.1
Lepidium latifolium 3357 710 0.794 1.5 0.02^ -L** -Q** +L** -L** +L** 2.7 34.8 0.2 7.9 0.0
Leucanthemum vulgare 4384 991 0.730 10.6^ 0.28** +L** +L +Q** -Q** -Q** +Q** 4.5 57.8 1.5 32.9 0.5
Linaria dalmatica 3709 738 0.778 18.4^ 0.09** +L** +L** +Q** -Q** +L** -Q** -L** 2.7 34.9 0.0 0.9 0.0
Ligustrum vulgare 711 60 0.896 23.5^ <0.01^ -Q** 2.4 31.0 0.0 0.0 NA
Lonicera japonica 4439 1783 0.742 13.5^ 0.18** +Q** +Q** -Q** -L** +Q** 3.2 41.2 1.1 33.4 0.1
Lonicera tatarica 938 150 0.843 18.9^ 0.42** -Q** -Q** +L** 3.5 44.9 0.2 6.4 0.1
Lysimachia nummularia 1507 87 0.853 17.8^ 0.04^ -Q** 2.2 27.5 0.0 0.0 NA
Melilotus officinalis 6144 1703 0.631^ 1.6 0.36** +Q** +Q** +L** +Q** -Q** -Q** +Q** 6.1 77.7 0.6 10.5 1.1
Melinis repens 507 399 0.969 14.5^ 0.16** +L** -L** -Q** 0.3 3.4 0.0 0.3 0.0
Miscanthus sinensis 454 107 0.939 2.1 0.12*^ +L** -L* -L* 1.2 15.8 0.1 5.0 0.2
Morus alba 1773 87 0.791 45.9^ 0.15** +Q** 4.5 57.9 0.8 17.2 <0.1
Murdannia keisak 294 57 0.943 16.5^ 0.13*^ -L** 1.1 13.7 0.0 0.0 NA
Nephrolepis cordifolia 379 326 0.978 4.0 0.04*^ -L** +L* 0.1 1.1 0.0 13.7 1.4
Oplismenus hirtellus ssp. undulatifolius 295 76 0.900 35.3^ 0.04^ -L 1.8 22.8 0.0 0.0 NA
Paederia foetida 237 209 0.985 0.4 0.02*^ -Q 0.1 1.5 0.0 0.0 NA
Pastinaca sativa 2897 1540 0.804 37^ 0.18** +Q** -L** -Q** +Q** -L** -Q** +Q** 3.4 43.2 0.8 22.9 0.1
Pennisetum purpureum 213 139 0.987 17^ 0.26** -L** -Q** -L** 0.1 1.6 0.0 34.8 0.3
Polygonum perfoliatum 442 170 0.962 0.6 <0.01^ -L* 0.5 5.8 0.0 0.0 NA
Polygonum sachalinense 488 82 0.869 24.6^ 0.21** -L** +L** 4.0 50.5 2.2 55.5 <0.1
Ricinus communis 360 183 0.938 85.3^ 0.16** -Q** 2.2 28.2 0.0 0.0 NA
Rumex acetosella 2702 84 0.711 10.6^ <0.01^ +L 5.3 67.0 0.0 0.0 NA

Ruellia caerulea 114 80 0.984 50.3^ 0.16** +L* +L 0.3 3.8 0.1 48.1 0.1
Sansevieria hyacinthoides 189 156 0.990 1.7 0.04^ -L 0.1 0.7 0.0 0.0 NA
Salsola kali 724 298 0.789 32.9^ 0.12** -Q* -L** 6.3 79.7 0.0 0.0 NA
Salsola 2706 572 0.721 35.7^ 0.33** +Q** +Q** -L** -Q** -Q** +L** -Q** +L** 5.5 70.0 0.9 17.3 <0.1
Senecio jacobaea 2940 82 0.865 18.4^ 0.49** -Q** 0.7 8.5 0.0 0.0 NA
Sisymbrium altissimum 3138 266 0.686^ 53.8^ 0.58** +L** -L** +L** -L** -L** 5.9 75.1 0.1 2.3 <0.1
Sonchus arvensis 1945 327 0.755 17.3^ 0.15** -Q** +L** -L** +L** 4.0 51.6 0.7 18.2 <0.1
Solanum dulcamara 2330 192 0.802 38.6^ 0.16** +Q** -L** -L** 3.9 49.4 1.0 26.7 <0.1
Sphagneticola trilobata 361 300 0.981 0.0 <0.01^ -L** 0.1 0.9 0.0 0.0 NA
Syngonium podophyllum 200 170 0.988 0.6 0.06^ -L* -Q** 0.1 0.7 0.0 0.0 NA
Taeniatherum caput-medusae 1328 74 0.874 42.3^ 0.21** +L** +L** 3.2 40.9 1.5 46.4 <0.1
Torilis japonica 290 81 0.909 6.7 0.05^ +L* 2.0 25.2 0.2 8.2 0.0
Tradescantia fluminensis 114 70 0.977 7.5 0.11^ +L -Q** 0.6 7.3 0.2 32.6 0.2
Ulmus pumila 1183 297 0.750 15.3^ 0.22** +L** +L** +Q** -L** -L* 5.3 67.6 0.8 15.6 <0.1
Urena lobata 932 815 0.957 0.1 <0.01^ -Q** -L* +L** 0.1 1.1 0.0 0.0 NA
Urochloa maxima 414 356 0.977 11.6^ 0.09** -L** +Q** +L** 0.1 1.4 0.0 2.1 0.8
Verbena incompta 202 137 0.945 27.6^ 0.44** -L** -Q** +L** +L** +L** 0.9 12.0 0.2 19.2 0.3
Vicia cracca 624 185 0.852 29.1^ 0.03*^ -L** 3.3 42.3 0.0 0.0 NA
Vicia villosa 1896 158 0.713 81.9^ 0.3** +L** +L** 5.4 68.6 4.3 79.1 <0.1

* marginal significance (0.05

Note: for variables with a quadratic and linear term, sign (+/-) and significance (*/**) pertain to the quadratic term

Figure A5: Maps of (a) potential species richness and (b) the potential number of species at high abundance.
