Provisional Floristic Checklist of the Middle Magdalena Valley

Rachel K. Rock August 2020

Thesis submitted in partial fulfilment for the MSc in the Biodiversity and of


Abstract

The Middle Magdalena Valley is an intermontane river basin located in the northwest of Colombia. It is situated between the Central and Eastern Cordilleras of the Colombian Andes and is a biodiversity hotspot home to thousands of species. This project aimed to expand the scope of knowledge concerning diversity in the area by creating a provisional checklist of the Middle Magdalena Valley.

This project began by defining the Middle Magdalena Valley using a combination of watershed, altitude, and political borders, resulting in a polygon covering 70,500 km². A total of 1,292,390 records of digitized, preserved specimens from Colombia detailed in GBIF were cleaned and analysed to yield a final checklist of 1,476 plant names thought to occur in the Middle Magdalena Valley. These names represent 171 families and 872 genera of plants and are supported by 16,384 specimen records. This project and checklist serve to spotlight the exceptional plant diversity present in the valley and underline why continuing work in this area and others like it is so vital to diversity and conservation research.


Acknowledgments

This thesis could not have been completed without the support system that has kept me both motivated and grounded during this process. I would like to offer thanks to everyone who has aided in the completion and presentation of this finished work, including:

I would like to offer special thanks to my supervisors; Peter Moonlight, who spent countless hours on calls explaining the intricacies of ‘R’ and neotropical botany to me, and for serving as support and a sounding board for all my doubts and questions throughout the process; David Harris, for expertise and feedback on BRAHMS databases and checklist construction, without which this project would not have been possible; and to Francisco ‘Pacho’ Fajardo, for his warm welcome during my time in Colombia, bridging communication gaps, and for his much appreciated expertise regarding the area.

I would also like to thank my cohort for all their support during this process. Without sharing the highs, lows, and complexities of this project and year, this process would have been infinitely more difficult and nowhere near as fulfilling.

And finally, thank you to my loved ones and family back home for their never-ending support during this journey. Thank you everyone who has had a hand in shaping both myself and this project. Your support and guidance will not be forgotten.

Table of Contents

Abstract
Acknowledgments
List of Tables and Figures
1. Introduction
1.1 Biodiversity in the Tropics
1.2 The Biodiversity Crisis
1.3 Shortfalls in Biodiversity Research
1.4 Floras and Checklists
1.5 Description of the Study Area
1.6 Aims
2. Study Area
2.1 Formation of the Middle Magdalena Valley
2.2 Geographic and Political Boundaries
2.3 Industrialization and Fragmentation
2.4 Previous Work in the Area
3. Methods
3.1 Data Sets
3.2 Development of the Cleaning Pipeline
3.3 Definition of the Middle Magdalena Valley
3.4 Checklists Construction
4. Results
4.1 Middle Magdalena Valley
4.2 Data Removal
4.3 Breakdown of Taxa
4.4 Other Relevant Data
5. Discussion
5.1 Pipeline Analysis
5.2 Comparison with Other Checklists
5.3 Conclusions from species composition and notable patterns
6. Conclusion
References
Appendixes


List of Tables and Figures

COVER || CANOPY IN THE MIDDLE MAGDALENA VALLEY DURING 2020 COLLECTION TRIP (CREDIT R. ROCK)
FIGURE 1 || NAMES AS DETERMINED BY TNRS
FIGURE 2 || SAMPLE OF CHECKLIST
FIGURE 3 || MAPS OF THE STUDY AREA
FIGURE 4 || MAP OF THE MMV SHOWING THE DISTRIBUTION OF INCLUDED RECORDS
FIGURE 5 || PIPELINE DATA REMOVAL BY NAME
FIGURE 6 || PIPELINE DATA REMOVAL BY RECORD
FIGURE 7 || RELATIVE COLLECTION RATES OF 25 MOST COLLECTED FAMILIES AND SPECIES
FIGURE 8 || NUMBER OF SPECIES IN CHECKLIST PER FAMILY AND GENUS
FIGURE 9 || DISTRIBUTION OF RECORDS
FIGURE 10 || DISTRIBUTION OF FABACEAE RECORDS
FIGURE 11 || DISTRIBUTION OF RUBIACEAE RECORDS
FIGURE 12 || DISTRIBUTION OF MALVACEAE RECORDS
FIGURE 13 || DISTRIBUTION OF POACEAE RECORD
FIGURE 14 || DISTRIBUTION OF APOCYNACEAE RECORDS
FIGURE 15 || DISTRIBUTION OF ASTERACEAE RECORDS
FIGURE 16 || DISTRIBUTION OF BIGNONIACEAE RECORDS
FIGURE 17 || DISTRIBUTION OF PIPERACEAE RECORDS
FIGURE 18 || DISTRIBUTION OF SAPINDACEAE RECORDS
FIGURE 19 || DISTRIBUTION OF MALPIGHIACEAE RECORDS
FIGURE 20 || DIGITIZATION OF BONPLAND’S 1801 COLLECTION OF CASEARIA CORYMBOSA
FIGURE 21 || NUMBER OF RECORDS INCLUDED IN THE FINAL CHECKLIST VISUALIZED BY YEAR
FIGURE 22 || NUMBER OF RECORDS PER COLLECTOR INCLUDED IN THE CHECKLIST FROM 1987
FIGURE 23 || NUMBER OF RECORDS INCLUDED IN THE FINAL CHECKLIST VISUALIZED BY DEPARTMENT
FIGURE 24 || DENSITY OF RECORDS BY ALTITUDE AND FAMILY (ACANTHACEAE – FABACEAE)
FIGURE 25 || DENSITY OF RECORDS BY ALTITUDE AND FAMILY (HELICONIACEAE – PASSIFLORACEAE)
FIGURE 26 || DENSITY OF RECORDS BY ALTITUDE AND FAMILY (POACEAE – VIOLACEAE)
TABLE 1 || COMPARISON OF CHECKLIST ELEMENTS


1. Introduction

1.1 Biodiversity in the Tropics

The tree of life is immense and attempts to show the evolutionary relationships between all life on Earth. Everything from the smallest algae to massive whales, anywhere in the world, is included. This variety is referred to as biodiversity. It encompasses not only the variety of species, but also their genetic diversity and the ecosystems they create. This diversity, however, is not evenly distributed across the globe. Differences in diversity are often measured in terms of species richness or alpha diversity, a measure of how many different species are in an area (Gaston, 2000). Areas like the arctic tundra tend to be less species rich than areas such as moist forests in the tropics. This follows the global pattern of areas closer to the equator being more species rich than areas closer to the poles (Stevens, 1989). This pattern is not uniform, though. The change in species richness is not matched in the Northern and Southern Hemispheres, with richness decreasing more rapidly in the Northern Hemisphere (Platnick, 1991). Other factors such as longitude, elevation, and climate also affect the species richness of an area. This leads to areas with very concentrated diversity, dubbed biodiversity hotspots. These hotspots are defined by their high concentrations of species and endemism, as well as the elevated threats of habitat loss that they face (Myers, 1988). These hotspots, while smaller in total area, are many times as species rich as areas of comparable or larger size. They also tend to have higher rates of endemic plant species. For comparison, the Chocó department of Colombia is an area of less than 50,000 km², but it is home to over 10,000 higher plant species (Myers, 1988). Meanwhile, the British Isles are roughly six times as large (300,000 km²) and home to fewer than 5,000 species of higher plants (Stace, 2010). The dense nature of these hotspots makes them prime candidates for botanical studies, and vital for conservation work.

1.2 The Biodiversity Crisis

This diversity is under threat. Even without any outside interference, species become extinct, whether due to the multitude of evolutionary pressures such as disease and competition, or from simple bad luck and misplaced meteors (Brusatte, 2015). Just as species become extinct, there is also a natural rate for the creation of new species through speciation. These two processes fluctuate in a balance but are not always equal. During the history of life on Earth there have been periods where the rates of extinction have outweighed the rates of speciation. These periods are called extinction events (Vignieri, 2014). During these events, the rate of species extinction greatly outweighs the rate of speciation, leading to a drastic decline in total species and biodiversity. Though there have been many, five are commonly cited as “mass extinction events”, which wiped out upwards of 70% of all species. The most well-known of these is the Cretaceous-Paleogene extinction, which killed the dinosaurs around 65 million years ago (Brusatte, 2015). These extinction events are important in providing a frame of reference for current trends in species loss.

When studying threats to biodiversity, current species loss is often compared to a measure called the background extinction rate (BER). BER refers to the historical average for species loss and is typically expressed as extinctions per million species per year (E/MSY) (Vos et al., 2014). These calculations are based on fossil records and, as such, suffer from limitations in the data they are based on. The calculations tend to focus on taxa that are well represented in the fossil record, such as hard-bodied marine animals, and are coarse temporally and taxonomically (Vos et al., 2014). Previous work has put the BER between 0.01 and 1 E/MSY, while more recent research that incorporates phylogenetic studies and speciation modelling refines the estimate to 0.1 E/MSY (Vos et al., 2014). Given this estimate and the approximately nine million eukaryotic species on Earth (what the BER is based on), the expected rate of species loss should average roughly one extinction per year. This is not currently the case.

Anthropogenic threats such as climate change, shifting land use, and pollution all jeopardize the future of global biodiversity (Brummitt and Lughadha, 2003; Jenkins et al., 2015). These threats and other secondary ones, such as invasive species and habitat fragmentation, are accelerating the rate of species extinction. Recent reports estimate that the current extinction rate is 1,000 times higher than the BER, and could reach as high as 10,000 times the BER if current trends in acceleration continue (Vos et al., 2014). Understanding what is being lost and where is vital for the future of the planet and for conservation efforts.
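The arithmetic behind these figures can be sketched in a few lines. The function name below is illustrative only, and the sketch assumes that the 1,000-fold and 10,000-fold multipliers are applied to the refined 0.1 E/MSY estimate, which the text does not state explicitly.

```python
def extinctions_per_year(rate_e_msy, species_count):
    """Convert a rate in extinctions per million species-years (E/MSY)
    into expected extinctions per year for a given number of species."""
    return rate_e_msy * species_count / 1_000_000


# Figures quoted in the text above.
BER_E_MSY = 0.1               # refined background extinction rate
SPECIES_ON_EARTH = 9_000_000  # approximate number of eukaryotic species

background = extinctions_per_year(BER_E_MSY, SPECIES_ON_EARTH)

# Assumption: the multipliers scale the 0.1 E/MSY estimate directly.
current = extinctions_per_year(BER_E_MSY * 1_000, SPECIES_ON_EARTH)
projected = extinctions_per_year(BER_E_MSY * 10_000, SPECIES_ON_EARTH)

print(f"background: about {background:.1f} extinctions per year")
print(f"current:    about {current:.0f} extinctions per year")
print(f"projected:  about {projected:.0f} extinctions per year")
```

Under these assumptions the background rate works out to a little under one extinction per year, against roughly 900 per year at the current estimate and roughly 9,000 per year in the worst case.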

Present reports estimate that, as with mammals and invertebrates, only a fraction of plant species have been described (Jenkins et al., 2015). There is the possibility, especially with current elevated extinction rates, that species are becoming extinct before they can be discovered or described (Jenkins et al., 2015). There are many organizations and groups globally at all levels that aim to study biodiversity and its loss. The International Union for Conservation of Nature (IUCN) is a group of hundreds of such member organizations and experts that conduct and compile research and policy suggestions regarding conservation (Vié, 2010). As part of this, they publish a list called the IUCN Red List, which was started in 1964 with the IUCN Red List of Threatened Plants. It was updated in the 1990s with more modern criteria and became the 1997 IUCN Red List of Threatened Plants (Gillett and Walter, 1998). Current lists include species from all taxa, with birds and amphibians being completely assessed (UNEP-WCMC, IUCN and NGS, 2018). Recent IUCN reports include the assessment of 43,556 plant species, 43,201 of them being classified as higher plants, and 41,515 of those being angiosperms. This represents only a little over 10% of the species of plants that are theorized to exist (Scotland and Wortley, 2003). Even with such a small portion of plant species assessed, 147 of the listed angiosperms are categorized as extinct or near extinct in the wild (Vié, 2010). The ratio of extinct and threatened species to assessed species is biased because at-risk and endemic species are the focus of more conservation work and research, but it still paints a bleak picture. Humphreys et al. (2019) claim the number of extinct seed plants to be closer to 600 species. Their analysis looks at species that have become extinct since Linnaeus’ Species Plantarum and puts the modern extinction rate for plant species described after 1900 at 18.3 E/MSY (Humphreys et al., 2019). Though this estimate is not as large as others (Jenkins et al., 2015), it does not consider confounding factors such as extinction debts and the longer time scale that many plants operate on. Plants serve as keystone species in ecosystems and their loss has widespread effects.

1.3 Shortfalls in Biodiversity Research

It is part of human nature to try to group and categorize the world around us as a way of understanding it. Science itself is an attempt to methodically and systematically investigate and describe the natural world. Botany is the part that concerns plants and is as old as or older than writing itself, as knowledge of different plants and their uses was passed down through the generations (Shoeb, 2008). The classification systems of plants have changed as knowledge of diversity and evolution has changed. Early classification systems (pre-1700s) grouped plants by form and function. They tended to split plants into trees, shrubs, and herbs, or into medicinal and agricultural plants. This system reflects the priorities and knowledge of the time. In the 18th century, Carl Linnaeus popularized binomial Latin nomenclature for the description of species (Woodland, 1991). This marked the beginning of what is now modern botanical taxonomy. The systems of naming and classification have changed since the publication of Species Plantarum in 1753 as knowledge of the relationships between species has evolved (Linnæus, 1753). Modern knowledge of evolution and genetics now greatly shapes the way biodiversity is described and studied. This reflects one of the major problems in biodiversity research: it depends on current knowledge to build the framework for discussion.

Current attempts at exploring and describing biodiversity are inherently limited (Hortal et al., 2015). Beyond the limitations imposed by the way we talk about and describe biodiversity, there are also limits on what is physically possible to know. Biodiversity is not a direct cause and effect of any one thing, and these interactions exponentially complicate the amount of data needed to have a complete understanding. Hortal et al. propose seven shortfalls in the current understanding of biodiversity stemming from a variety of sources (Hortal et al., 2015). They are the Linnaean, Wallacean, Prestonian, Darwinian, Raunkiaeran, Hutchinsonian, and Eltonian shortfalls.

The two most commonly discussed shortfalls when focusing on biodiversity research are the Linnaean and Wallacean shortfalls. The Linnaean shortfall refers to the fact that the majority of species on Earth have not yet been described. It also applies to extinct species and variation within species (Hortal et al., 2015). Without knowledge of all species, a complete picture of an ecosystem cannot be accurately constructed. The Wallacean shortfall, on the other hand, refers to the incomplete nature of species distribution data, especially at fine scales (Lomolino, 2004). The combination of these two shortfalls makes up the majority of the hurdles in floristic checklist creation.

The five other shortfalls also contribute, to lesser extents. The Prestonian shortfall is the lack of data on the relative abundance of species and the shifts in abundance relative to location and time (Cardoso et al., 2011). The Darwinian shortfall reflects how shifts in knowledge and theory concerning evolution affect the conversation about biodiversity (Diniz-Filho et al., 2013). The Raunkiaeran, Hutchinsonian, and Eltonian shortfalls refer to knowledge about species traits, species niches, and species interactions respectively (Hortal et al., 2015). These five shortfalls, in combination with the previously discussed two, highlight the current gaps in biodiversity knowledge. Understanding these gaps is key to “conscious ignorance”, or knowing what we do not know. The creation of a floristic checklist relies on existing knowledge of taxonomy and distribution. It can only be as complete and accurate as current knowledge for the species and area studied.

1.4 Floras and Checklists

In the same way that botany and taxonomy have been studied and recorded throughout human history, compilations of this knowledge have long existed. In their earliest forms, they were lists of edible or medicinal plants. They were grouped according to use or where the plants could be found, and were often limited in scope (Shoeb, 2008). Modern Floras, on the other hand, tend to be expansive in what they detail. Limited by area, either geographic or political, they aim to detail the species included.

Floras often describe the taxa they include, covering their history, distribution, and other relevant facts such as endemism. More local Floras with a narrower scope also tend to include dichotomous keys for identification purposes. A similar type of publication is a monograph, which is a comprehensive treatment of a taxon. Monographs usually include all of the known species in the genus or family being described and cover much of the same information as a Flora. They are, however, more limited and are usually done as a way to define or redefine species. The earliest botanical monograph is a treatment of Apiaceae from 1672 by Robert Morison (Oliver, 1913).

Checklists represent another type of botanical publication. Compared to floras and monographs they tend to contain less detailed information. Checklists compile an inventory of taxa for a specific area. The focus of these inventories usually revolves around assessing the taxonomic breakdown of the included species as a way of measuring the biodiversity in the area. Checklists are often more useful when trying to initially assess the biodiversity and breakdown of the flora in the area compared to a complete Flora.

1.5 Description of the Study Area

The tropics refer to the area of the globe bounded by the Tropic of Cancer in the Northern Hemisphere and the Tropic of Capricorn in the Southern Hemisphere (23 degrees north and south). They cover almost half of the Earth’s surface area and more than a third of its landmass (Kim Rutledge et al., 2011). All areas where the sun is directly overhead at some point in the year are included. While the term “tropics” is a geographic descriptor, it is more often used to describe a type of biome or climate. The tropics as a region include areas of deserts and snowy mountains in addition to areas that could be referred to as tropical. Tropical zones, or areas with tropical climates, tend to be hotter and wetter than the global average. They usually have wet and dry seasons as opposed to the four temperate seasons, though this does not always apply to tropical rainforests. Areas of North America, South America, Africa, Asia, and Australia can all be classified as part of the tropics.

The neotropics refer specifically to one of the biogeographic realms of Earth’s land surface. The realm is usually defined to include Florida in the United States, southern coastal Mexico, Central America, the Caribbean islands, and all of South America (Udvardy, 1975). This means it includes areas that are not technically in the tropics but is unified by floristic similarities due in part to the geologic history of the continents. The definition of the neotropics that is often used in botany, however, includes Florida in the United States, southern coastal Mexico, Central America, the Caribbean islands, and only parts of South America (Antonelli and Sanmartín, 2011).

Part of the tropical region is the Republic of Colombia, or Colombia. Colombia is located in the northernmost part of South America. It is bounded to the north by the Caribbean Sea and Panama, to the west by the Pacific Ocean, to the east by Venezuela, and to the south by Brazil, Peru, and Ecuador. This makes it the only South American country to border both the Pacific Ocean and the Caribbean Sea. Colombia is divided into 32 political districts, called departments, and five geographical regions (Oyuela-Caycedo, 2008). These regions cover a variety of climates with areas of savannas, deserts, rainforests, and montane climates. In the west the Andean mountain range dictates much of the climatic variability, while to the east large grasslands called Los Llanos dominate. The southern portion of the country is home to Amazonia, which contains the Colombian portions of the Amazon rainforest (Luhr, 2003). The broad variation in geography and climate leads to a megadiverse country with not only a high number of total species, but also a high number of endemic species (Condit, 1996).

One area of the country that is said to be especially biodiverse is the valleys of the Andean mountain range. The Andes are divided into three cordilleras: the Eastern Cordillera, the Central Cordillera, and the Western Cordillera. The Magdalena River runs south to north through these valleys, with the central course of the river following the intermontane basin between the Eastern and Central Cordilleras. This Magdalena River Valley is further divided into three regions, the Upper, Middle, and Lower Magdalena Valleys. The Middle Magdalena Valley (MMV), the area of study for this checklist, is bounded by the Palestina Fault to the west, the Bucaramanga Fault to the east, and the Upper and Lower Magdalena Valleys to the south and north respectively (Cubillos, 2008). The MMV covers an area of approximately 34,000 km², or roughly 3% of the country’s total land mass (Luhr, 2003), and connects the dry forests of the Upper Magdalena Valley with the moist forests of the Lower Magdalena Valley. The valley is surrounded at high elevations by montane forests and small areas of Northern Andean páramo. There are also various areas of swamps, wetlands, and rainforests below 500 meters. This mixture of conditions and climates lends the area to high speciation and biodiversity (Parsons, 2020). The high rate of diversity and endemism has made the valley a common site for botanical expeditions throughout the centuries. Consequently, the valley has been the location of discovery for dozens of species and is also the site where European explorers first encountered the potato in the 1500s (Simmonds, 1976).


1.6 Aims

There are currently no comprehensive checklists specific to the flora of the Middle Magdalena Valley in Colombia. The area is known to be part of a hotspot for biodiversity, with new species still being discovered. This, coupled with the growing threats of deforestation and climate change affecting the area and others like it, led to the development of this project. The first aim of the project is to create a comprehensive checklist for the flora of the Middle Magdalena Valley. Secondly, this project aims to look at major patterns in the species composition and collection distribution within the valley, in order to gain a more complete understanding of the valley and the flowering plants in it.


2. Study Area

2.1 Formation of the Middle Magdalena Valley

The Magdalena Valley is a region in north-western Colombia that is situated between the central and eastern ranges (also known as cordilleras) of the Colombian Andes. The valley is thought to have formed during the late Triassic to Cretaceous periods (65-120 MYA) through a series of expansions and subductions of tectonic plates (Cubillos, 2008). Originally, as a result of the collision of the Pacific Ocean Plate and the South American Continental Plate, a rift was formed between what are now called the Central and Eastern Cordilleras of the Colombian Andes (Butler and Schamel, 1988). Over time the plates continued to push together, raising up both the mountains and the basin. This created the series of intermontane river valleys filled with rich sediment of which the Middle Magdalena Valley is a part. The combination of the rich soil and the position of the valley allowed it to act as a bridge between the biota of Central America and South America (Webb, 1991; Cody et al., 2010). While plants had already been crossing the gap between North and South America before the formation of the land bridge, its formation 3.5 MYA allowed pollinators and other fauna to move more freely between the previously separate continents (Simpson and Neff, 1985; Silvestro et al., 2015).

2.2 Geographic and Political Boundaries

Modern-day Colombia is naturally divided into many regions based on ecological factors and geographic barriers (Oyuela-Caycedo, 2008). The country is usually split into five geographic regions: the Andes mountains region in the northwest, the Pacific Ocean and Caribbean Sea coastal regions, the Los Llanos region of tropical grasslands shared with Venezuela, and the Amazon rainforest basin in the south (Luhr, 2003). The effects of these regions are reflected in evidence of human settlements dating back to 12,500 BCE (Hammen and Urrego, 1978). The area served as a corridor for not only animals and plants but also humans traveling between Mesoamerica and the Caribbean to the north and the Andes and Amazon river basin to the south (Hammen and Urrego, 1978). In the 15th and 16th centuries, conquistadors and colonizers from Europe started coming to what is now Colombia in search of gold and other resources. They called the area that includes modern Colombia, Venezuela, and Ecuador New Granada. In the 19th century, many wars were fought which led to the formation of Gran Colombia, which over time was dissolved into Colombia, Ecuador, and Venezuela.

In 1886, modern-day Colombia, the Republic of Colombia, was formed (Saunders et al., 1991). Internal conflicts and outside influence have led to civil wars through the years, resulting in shifting internal and external borders. In 1991, a new constitution was written and the country now has 32 administrative departments along with the capital district of Bogotá (Gutiérrez, Acevedo and Juan Viatela, 2007). The geographic region known as the Middle Magdalena Valley includes portions of the modern departments of Santander, Boyacá, Cundinamarca, Bolívar, Cesar, Antioquia, and Caldas.

2.3 Industrialization and Fragmentation

Colombia is home to many diverse biomes which offer a wide array of resources. Areas formed through ocean uplift, like the Andes region, are rich in hydrocarbons and limestone. The occurrence of valuable resources in these areas has led to industrial drilling and mining for decades (Armenteras et al., 2006). These practices have shown that with the increasing demands of the 21st century, and as more countries become industrialized, moist and montane forests are especially at risk. Between 2001 and 2013, 9% (144 km²) of the deforestation in Colombia took place in the Magdalena Valley (Alvarez-Berríos and Mitchell Aide, 2015). Another threat to these habitats is the clear cutting and burning of forests for agricultural use. Whether due to cattle ranching, palm oil plantations, or illegal coca crops, each year more primary forest is lost (Etter et al., 2006; Dávalos et al., 2011). By 1998, over half of the land in the Andean region of the country had been subjected to heavy transformation (Etter et al., 2006).

Previous work in the Colombian Amazon has shown that as areas become increasingly fragmented, the life within them is put at increased risk from disease and anthropogenic effects (Armenteras et al., 2006). The deforestation tends to accelerate over time as more of the area becomes exposed. These fragments become cut off from one another and suffer physically and genetically (Link et al., 2010). In the last decade massive amounts of conservation and protection work have been undertaken by both the government and private individuals. Between 2010 and 2017 the land classed as nationally conserved increased from 13 million hectares to 28.4 million hectares (Catanoso, 2017). But these policies cannot stop the growing trend of illegal cropping, which often takes place in more remote and untouched areas of forest (Armenteras et al., 2006).

Colombia is a biodiversity hotspot. It leads the world in number of discovered endemic species and is second only to Brazil in measures of total species richness (Blumenthal, 2013). According to the Ministry of Environment, the MMV is home to many of the country’s more than 6,000 endemic plant species (Bernal, Gradstein and Celis, 2019). It is not known how many of these species are functioning under a local extinction debt, which makes creating a catalogue and increasing protection efforts even more pressing (Kuussaari et al., 2009).

2.4 Previous Work in the Area

A comprehensive flora of the Middle Magdalena Valley has not yet been undertaken, but Colombia as a whole has a long history of botanical expeditions and records (F. J. Hermann, 1948; Churchill, 1988; Idárraga-Piedrahíta et al., 2011; A., C. and P., 2018). One of the most famous large-scale expeditions in the area was undertaken by José Celestino Mutis. He was a Spanish-born priest and botanist who travelled to what was then called New Granada to study and practice medicine. Once there he petitioned the Spanish government to support a naturalist expedition of the area. In 1763 he began what would become a decades-long mission that catalogued thousands of plants, animals, and minerals (Villamil-Montero and Ming, 2016). He sent his findings to Linnaeus in Europe, who shared them with the scientific community. This inspired other botanists, explorers, and doctors to visit him and the neotropics. In the early 1800s Humboldt and Bonpland visited him upon their arrival in the area. They shared findings, many of which were later incorporated into Nova Genera et Species Plantarum, published in 1815 (Bonpland, Humboldt and Kunth, 1815). His influence can still be seen today in genera such as Matisia Bonpl. in Malvaceae or Mutisia L.f. in Asteraceae, both named in his honour.

Since then Colombia has been a place of focus for many naturalists. Given the wide array of habitats located in the country, it is no surprise that a large portion of all described species can be found there. The country is home to roughly 10% of the world’s total animal species (14% of all described amphibian species and 19% of bird species), and almost a fifth of all orchid species on the planet can be found in Colombia (Blumenthal, 2013). Many of the taxa found there are endemic to the country and often to a small area within it. Current estimates place the number of endemic plants at roughly 6,500 species, with new species and genera being described every year (Bernal, Gradstein and Celis, 2019). As recently as 2020 new taxa are still being described from the area. Mendoza-Cifuentes et al., in their 2020 publication, describe a new monospecific genus of Rubiaceae, Lintersemina. The type species, Lintersemina chucuriensis, is so far reported as endemic to the MMV, like multiple other species in the family (Mendoza-Cifuentes et al., 2020).

Other works in the area include the 1992 revision of Kohleria, which described four new species that are endemic to the country, and the 1993 revision of Malvaviscus, which discusses Malvaviscus williamsii found in the Andes (Kvist and Skog, 1992; Turner and Mendenhall, 1993). These examples are only a small sample of the work that has been done, carrying on the long tradition of botanical exploration in the area. Most of the work done is focused on specific taxa, and as the habitats in the valley become more fragmented and exposed to human interaction, more taxa are likely to be discovered and described. It can only be hoped that these discoveries and descriptions can outpace the extinction threatening the valley.


3. Methods

3.1 Data Sets

Records of preserved specimens are stored and digitized by many institutions, including national herbaria and international agencies. It is common for records to have duplicates stored in multiple locations, both physical and digital. Due to the preliminary nature of this project, it was decided that beginning with one source for data to serve as a framework would be appropriate. This reduced the problem of duplicate records and simplified the construction of the cleaning pipeline.

The Global Biodiversity Information Facility (GBIF) is an online resource that compiles biodiversity data from over 1,500 institutions, representing over 50,000 databases. It works with national and international institutes to make occurrence records available to the public for use. The Herbario Nacional Colombiano (COL), Universidad Nacional de Colombia, and many Colombian department herbaria all have records available through GBIF, in addition to older records from collection trips and expeditions taken during the colonization of Colombia. Of the databases considered, GBIF offered the widest scope and covered records from multiple sources that were also considered independently. It allowed data to be taken as a whole, while still covering records from multiple databases. For this project, records were filtered to include only those of preserved specimens listed as occurring in Colombia. Further filters were applied restricting the dataset to records identified as vascular plants.
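The filters described above can be sketched as follows. The pipeline in this study was written in R; this Python version is only an illustration. It assumes Darwin Core style field names and values as they commonly appear in GBIF downloads (`basisOfRecord`, `countryCode`, `phylum`), assumes vascular plants are recognised by the phylum Tracheophyta, and uses invented sample records.

```python
# Illustrative sketch only; the thesis pipeline itself was written in R.
# Field names follow Darwin Core terms; the sample records are invented.

def is_target_record(record):
    """Keep preserved specimens of vascular plants recorded from Colombia."""
    return (
        record.get("basisOfRecord") == "PRESERVED_SPECIMEN"
        and record.get("countryCode") == "CO"
        and record.get("phylum") == "Tracheophyta"  # assumed vascular-plant filter
    )

sample_records = [
    {"id": "a", "basisOfRecord": "PRESERVED_SPECIMEN", "countryCode": "CO", "phylum": "Tracheophyta"},
    {"id": "b", "basisOfRecord": "HUMAN_OBSERVATION", "countryCode": "CO", "phylum": "Tracheophyta"},
    {"id": "c", "basisOfRecord": "PRESERVED_SPECIMEN", "countryCode": "EC", "phylum": "Tracheophyta"},
    {"id": "d", "basisOfRecord": "PRESERVED_SPECIMEN", "countryCode": "CO", "phylum": "Bryophyta"},
]

kept = [r for r in sample_records if is_target_record(r)]
```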

However, taking data from large aggregate databases such as GBIF does pose drawbacks. The nature of these databases and the number of records stored in them mean that they are prone to errors. One common type is errors in GPS coordinates, such as missing data, rounded or assumed coordinates, and coordinates that fall outside of the listed locality. When cleaning the data, records representing over 12,000 names were removed due to discrepancies between the listed coordinates and the department they were recorded to occur in. Because of the sample size, many of these names were represented by multiple records and were not lost. Further cleaning removed records representing 8,659 names due to mismatched reported and actual altitude data. Again, many of these names were represented by multiple records and were not lost from the dataset completely, but this does emphasize one of the common problems in using this kind of data.

Other common errors involve the identification of the specimen. A recent study has shown that, when using aggregate databases such as GBIF, over half of the included tropical specimens were listed under synonymous or otherwise incorrect names (Goodwin et al., 2015). Figure 1 shows the breakdown of the taxonomic status of the names from this study after being evaluated in TNRS. Roughly a quarter of listed names from this dataset were in need of correction (77% accepted names, 13% synonyms, 10% other corrections).

Figure 1 || A chart showing the taxonomic breakdown of the 32,693 names as determined by TNRS

3.2 Development of Cleaning Pipeline

In this study, “pipeline” refers to the series of cleaning and corrective functions carried out in RStudio to resolve the previously discussed errors common in datasets like the one used. The framework for the scripts created here was based on work by Moonlight et al. (2020).

Because the checklist is restricted to a specific geographic region within the country, the first step of data cleaning was to remove any records with missing latitude and longitude data. This step was important since specimens missing these data could not be accurately categorized as inside or outside of the MMV. Removing these entries does pose a disadvantage, as many records from the study area may be cut simply because they were collected before the addition of GPS data was common practice. We felt that this was a justified compromise because it still left a sizable portion of the data, and there is always the possibility of integrating the removed records at a later date.
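A minimal sketch of this first step is shown here in Python for illustration (the study's scripts were written in R). The field names `decimalLatitude` and `decimalLongitude` are Darwin Core terms and are assumed here; treating out-of-range values as missing is an added assumption, not something stated in the text.

```python
# Illustrative sketch only; not the R code used in the study.

def has_coordinates(record):
    """Return True when a record carries a usable latitude/longitude pair."""
    lat = record.get("decimalLatitude")
    lon = record.get("decimalLongitude")
    if lat is None or lon is None:
        return False
    try:
        lat, lon = float(lat), float(lon)
    except (TypeError, ValueError):
        return False  # empty strings or non-numeric text count as missing
    # Assumption: values outside valid ranges (or NaN) are treated as missing.
    return -90 <= lat <= 90 and -180 <= lon <= 180

def drop_missing_coordinates(records):
    """Split records into those kept and those set aside for later review."""
    kept = [r for r in records if has_coordinates(r)]
    removed = [r for r in records if not has_coordinates(r)]
    return kept, removed
```

Returning the removed records separately reflects the possibility, noted above, of integrating them at a later date.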

Once the specimens missing coordinate data were removed, the next step was to update and correct the taxonomy of the specimens. The last 20 years have seen large-scale revisions of taxonomy as published by the APG in versions I-IV (Chase et al., 2016). Many smaller scale revisions at the family and genus level have also been undertaken, with particular attention being given to neotropical taxa (Cuatrecasas, 1961; Guarin, 2005, 2007; Sanín and Galeano, 2011). Many of the specimens in the data set predate these and other relevant revisions, and as such need to be standardized. This was done by exporting the unique names from the data into the Taxonomic Name Resolution Service (TNRS) (Boyle et al., 2013) and checking them. TNRS not only corrects common misspellings but also updates synonyms to match the current literature according to the selected source. In some cases, the names returned by TNRS require manual review, but in general they serve as a decent starting point and allow the data to be standardized. The standardized versions of the names were then reimported into R and matched to the corresponding records. This allowed the data to be broken into chunks and processed more accurately in subsequent steps.
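The export, resolve, and re-match cycle can be sketched as below, in Python for illustration only. The lookup table stands in for a TNRS result exported to a file; its structure, the field names `scientificName` and `acceptedName`, and the placeholder names are all hypothetical.

```python
# Illustrative sketch only; the study performed this matching in R.

def unique_names(records):
    """Collect the distinct names that would be submitted for resolution."""
    return sorted({r["scientificName"] for r in records})

def apply_resolved_names(records, resolved):
    """Attach the resolved name to each record.

    `resolved` maps submitted name -> accepted name (hypothetical TNRS export).
    Names with no match are returned separately for manual review.
    """
    updated, needs_review = [], set()
    for rec in records:
        accepted = resolved.get(rec["scientificName"])
        if accepted is None:
            needs_review.add(rec["scientificName"])
        updated.append({**rec, "acceptedName": accepted})
    return updated, sorted(needs_review)

# Placeholder names, not real taxa.
records = [
    {"scientificName": "Aus bus"},
    {"scientificName": "Aus bus"},
    {"scientificName": "Xus yus"},
]
resolved = {"Aus bus": "Aus cus"}
updated, needs_review = apply_resolved_names(records, resolved)
```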

The next section of the pipeline was built to remove records where the coordinates had been entered incorrectly. To achieve this, each record was plotted against a shapefile of the Colombian department matching the listed department using the raster package in R (Hijmans, 2020). The main drawback of this method is that, due to the history of the country, many of the internal borders have shifted significantly throughout the years, as have the names of some of the departments. This means that records with correct location data could still be removed. However, when these cases are compared to the number of records with truly inaccurate coordinate data, which plotted them in places like Japan and New York, it was felt to be a justifiable trade-off.
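The check described above amounts to a point-in-polygon test against the listed department. The sketch below uses a plain ray-casting test in Python as a stand-in for the raster-based approach used in R; the rectangular "department" and its name are invented, and the `stateProvince` field name is an assumption.

```python
# Illustrative sketch only; the study used shapefiles and the raster package in R.

def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; `polygon` is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def matches_listed_department(record, department_polygons):
    """True when the coordinates fall inside the department named on the record."""
    polygon = department_polygons.get(record["stateProvince"])
    if polygon is None:
        # Unknown or renamed departments fail the check, mirroring the
        # drawback noted in the text.
        return False
    return point_in_polygon(record["decimalLongitude"], record["decimalLatitude"], polygon)

# Hypothetical rectangular department for demonstration.
department_polygons = {"Department A": [(-75, 5), (-73, 5), (-73, 7), (-75, 7)]}
```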

Next, a mask of Colombia was constructed using SRTM altitude data obtained from CGIAR-CSI (Reuter, Nelson and Jarvis, 2007) at a resolution of 1 km2. Each record was then plotted against this mask. When the altitude at the coordinates and the altitude recorded on the specimen differed by more than 500 meters, the record was removed. The 500 meter cut-off was chosen due to the proximity of the study area to the Andes. Rapid shifts in elevation over relatively short distances are common in this area, so the cut-off allows for this variation while still excluding erroneous data. Removing these data served as a backup step to catch more of the records with incorrectly entered coordinate data. There is the possibility that, due to the resolution of the SRTM data, larger differences may still have been assigned to accurate coordinates, but the chance is remote.
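The altitude comparison reduces to a simple tolerance test, sketched here in Python for illustration. The function and argument names are hypothetical; keeping records that list no altitude is an assumption based on the later statement that a listed altitude was not required for inclusion.

```python
# Illustrative sketch only; the study extracted altitudes from a raster mask in R.

MAX_ALTITUDE_DIFF_M = 500  # cut-off stated in the text

def altitude_is_consistent(recorded_m, mask_m, tolerance_m=MAX_ALTITUDE_DIFF_M):
    """Keep a record unless its altitude differs from the mask by more than the tolerance."""
    if recorded_m is None or mask_m is None:
        # Assumption: records without a listed altitude cannot fail this check.
        return True
    return abs(recorded_m - mask_m) <= tolerance_m
```

For example, a specimen labelled at 300 m whose coordinates fall on a 950 m cell would be removed, while one labelled at 300 m on a 700 m cell would be kept.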

Using “Catálogo de plantas y líquenes de Colombia” (Bernal, Gradstein and Celis, 2019) as a reference, the remaining records were then compared to taxa previously known to occur in Colombia. This was done to reduce the inclusion of records that were most likely misidentified or mislabelled. The mask of the Colombian departments from earlier was used, and the specimens were plotted. These coordinates were then cross-referenced with the occurrence data by department from the Colombian catalogue. Since the checks were based on political bounds and not ecological bounds, there is the risk that records may have been falsely excluded due to sampling bias in certain departments. There is also the risk of poorly sampled taxa being excluded from this checklist on similar grounds.
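The catalogue check can be sketched as a lookup of each record's name and department, shown here in Python for illustration. The dictionary standing in for the catalogue, the field names, and the placeholder taxa and departments are all hypothetical.

```python
# Illustrative sketch only; the catalogue structure here is a hypothetical stand-in.

def passes_catalogue_check(record, catalogue):
    """Keep records whose name is catalogued for the department they plot in.

    `catalogue` maps accepted name -> set of departments where it is reported.
    """
    departments = catalogue.get(record["acceptedName"])
    if departments is None:
        return False  # name not previously recorded from Colombia
    return record["department"] in departments

# Placeholder names and departments, not real distribution data.
catalogue = {"Aus cus": {"Department A", "Department B"}}
records = [
    {"acceptedName": "Aus cus", "department": "Department A"},  # kept
    {"acceptedName": "Aus cus", "department": "Department C"},  # outside known distribution
    {"acceptedName": "Xus yus", "department": "Department A"},  # not in catalogue
]
kept = [r for r in records if passes_catalogue_check(r, catalogue)]
```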

The last cleaning step in the pipeline was the exclusion of records outside of the study area. A polygon was created delineating the MMV as defined in the study. Working with a local expert, an approximation of the valley was created with department and municipality borders. Shapefiles of the Magdalena watershed were added, along with restrictions in altitude (areas between 0 and 1,000 meters) and political borders, to create the final shape shown in Figure 3. A mask of this polygon was used to exclude records with coordinates outside of the study area. Because this is a human-made map that uses a combination of human and natural borders to define the study area, it is not perfect. However, the size of the data set reduces the chance that a species is excluded or included because of small changes in the study area borders that could be due to human error.

3.3 Definition of the Middle Magdalena Valley

The full process of creating the bounds of the MMV used in this project can be found as an annotated R script in Appendix 2.1. Firstly, the watershed for the Magdalena River was downloaded as a shapefile from HydroSHEDS (Lehner, Verdin and Jarvis, 2008). The watershed for the Magdalena River extends the length of the valley, not just the portion that constitutes the MMV. As such, it was trimmed to the longitudes of -80° to -66° E/W and the latitudes of 9° to -5° N/S to create a more accurate bound for the valley. The northern bound of the valley was chosen with the help of a local expert based on personal experience in the area (Francisco Fajardo, personal communication). After these restrictions, the altitude data for Colombia, downloaded from CGIAR-CSI (Reuter, Nelson and Jarvis, 2007), was used to create a mask. The mask excluded all locations with an altitude above 1,000 meters. This excluded a mountain peak in the middle of the valley, seen as a “hole” in the map in Figure 3. After adjustments for altitude, the polygon was further refined using political borders. The department and municipality borders were downloaded from DIVA-GIS (Hijmans, Guarino and Mathur, 2012). A mask was constructed from the department borders of Bolívar, Boyacá, Cesar, Cundinamarca, and Santander and the municipality borders of included municipalities from Antioquia and Caldas. A full list of the 37 municipalities of Antioquia and seven municipalities of Caldas included in the final polygon can be found in Appendix 2.1. A final polygon for the MMV was constructed by combining the watershed, altitude, department, and municipality borders and can be seen in Figure 3.
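The way the three criteria combine can be sketched as a single membership test, shown here in Python for illustration. This is a simplification of the polygon construction in Appendix 2.1: it assumes the watershed membership, altitude, department, and municipality of a location are already known, and the municipality entries are placeholders rather than the actual list.

```python
# Illustrative sketch only; the actual polygon was built from shapefiles in R.

MAX_ALTITUDE_M = 1000
FULL_DEPARTMENTS = {"Bolívar", "Boyacá", "Cesar", "Cundinamarca", "Santander"}
PARTIAL_DEPARTMENTS = {"Antioquia", "Caldas"}

def in_study_area(in_watershed, altitude_m, department, municipality, included_municipalities):
    """Combine watershed, altitude, and political criteria for one location.

    `included_municipalities` is a set of (department, municipality) pairs;
    the real list is given in Appendix 2.1 and is not reproduced here.
    """
    if not in_watershed or altitude_m > MAX_ALTITUDE_M:
        return False
    if department in FULL_DEPARTMENTS:
        return True
    if department in PARTIAL_DEPARTMENTS:
        return (department, municipality) in included_municipalities
    return False

# Placeholder municipality names for demonstration.
included = {("Antioquia", "Municipality X"), ("Caldas", "Municipality Y")}
```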

3.4 Checklist construction

The corrected records from the pipeline were imported into BRAHMS v7.9.14 as a Rapid Data Entry (RDE) file (Filer, 2013). From there, the records were imported into a new database founded on the template database available through the University of Oxford website. The records were sorted using FoxPro commands (Stark and Satonin, 1992) for export into the checklist. The checklist was structured by family in alphabetical order. Each family was further broken down by genus and species, also alphabetically. For each name listed, the administrative districts where the species was found are included to give an idea of the species distribution within the study area. A sample of the checklist is shown below in Figure 2 for reference.
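The ordering described above (family, then genus, then species, each with its departments) can be sketched as follows. The checklist itself was produced in BRAHMS; this Python version only illustrates the structure. The species and families are taken from the text, but the departments attached to them are invented for demonstration.

```python
# Illustrative sketch only; the checklist was produced in BRAHMS, not Python.
from collections import defaultdict

def build_checklist(records):
    """Return checklist lines grouped by family, then genus and species."""
    departments = defaultdict(set)
    for rec in records:
        key = (rec["family"], rec["genus"], rec["species"])
        departments[key].add(rec["department"])

    lines, current_family = [], None
    for family, genus, species in sorted(departments):
        if family != current_family:
            lines.append(family)
            current_family = family
        listed = ", ".join(sorted(departments[(family, genus, species)]))
        lines.append(f"  {genus} {species}: {listed}")
    return lines

# Departments below are invented for demonstration.
records = [
    {"family": "Rubiaceae", "genus": "Palicourea", "species": "guianensis", "department": "Santander"},
    {"family": "Rubiaceae", "genus": "Palicourea", "species": "guianensis", "department": "Antioquia"},
    {"family": "Rubiaceae", "genus": "Isertia", "species": "haenkeana", "department": "Santander"},
    {"family": "Passifloraceae", "genus": "Passiflora", "species": "vitifolia", "department": "Antioquia"},
]
checklist = build_checklist(records)
```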

Figure 2 || Segment of the checklist showing how it is structured and the information included

The data collection portion of the checklist creation did not allow for the physical examination of the included specimens, and as such observed specimens for each name have been left out. The full list of specimens included in the checklist creation can be found in Appendix 1. The limitations of the data collection also did not allow for the inclusion of endemic status, habit or global distribution of the taxa, which could be useful for conservation work in the future.

Other data were collected for each taxon that are not included in the checklist, because it was felt that these measurements and data would be better represented in other formats. Data on the relative collection rates for families in the study area and density of collection are shown in Figure 7 and Figures 9-19 respectively. Another useful metric that has been recorded is the altitude range for the different taxa. A full list of altitude ranges for all recorded taxa is available in Appendix 4, and diagrams illustrating the altitudinal range for some of the more collected taxa are shown in Figures 24-26.


4. Results

The results of this study can be broken into three main categories: the definition of the MMV, information on the data removed from the pipeline, and the breakdown of taxa in the produced checklist. Section 4.1 will break down the final area included in the study. Section 4.2 will describe the data and amounts removed in each section of the pipeline and its effects. Section 4.3 will provide a look at some of the relevant taxonomic information in the checklist. For the complete dataset and checklist, refer to Appendixes 1 and 4. A further section, 4.4, shows a summary of other noteworthy trends in the data. It includes prominent collectors, trends in collection through time, altitude data, and geographic variances.

4.1 Middle Magdalena Valley

Figure 3 shows the final area designated as the MMV in this study. It covers an area of approximately 70,500 km2, or 6% of the total land area in Colombia. Because the study area was defined using multiple factors instead of a single determinant like watersheds or political barriers, it differs from the commonly cited extent of the valley (34,000 km2) (Luhr, 2003). The MMV as defined in this study accounts for roughly double the area previously discussed as included.

Figure 3 || Maps (A, B) of the study area, the Middle Magdalena Valley (MMV), in Colombia.

A: The MMV as described in this study is highlighted in green, showing its position in Colombia relative to department borders outlined in grey.

B: A closer view of the study area showing the altitude in meters.


The records that comprise the checklist have been plotted over this polygon in Figure 4 for visualization.

Figure 4 || Map of the MMV showing the distribution of records used to create the final checklist. The study area is shown in grey scale rendering altitude. Relevant Colombian department borders are shown. Each triangle signifies a single record.

4.2 Data removal

The data originally brought in from GBIF included 32,315 names from 1,292,390 records from 357 published datasets. The first step in cleaning, the removal of data points without GPS coordinates and the consolidation of names according to TNRS, removed 6,705 names. This accounted for a removal of roughly 20% of the total names. The second step removed 3,043 names, roughly 9.5% of the total names, due to discrepancies between stated location and GPS coordinates. The third step removed names where the altitude on the specimen differed by more than 500 meters from the altitude at the stated GPS coordinates. This accounted for the smallest number of removed names, 465, only 1.5% of the total names. The next step accounted for the largest removal of names, removing 19,265 names or 60% of the total. These names were removed if they were not previously recorded as occurring in Colombia or occurring in their listed department. The final step removed remaining names with all records falling outside the MMV polygon. This removed 1,370 names, which accounted for 4% of the total names entered into the pipeline. However, these names accounted for 48% of the names that made it to the penultimate step. This discrepancy could reflect either sampling bias present in Colombia, or the variance in biodiversity resulting from hotspots.

The removal as visualized by records as opposed to by name follows similar trends. The first step removed the largest portion of records, 555,715, or 43% of the total records. Step two removed 143,078 records, 11% of the total. Step three removed 39,795 records and step four removed 46,398 records. The final step removed 77,020 records. This accounted for 6% of the total records that entered the pipeline. Compared to the number of names removed, though, it represents a higher proportion. The 77,020 removed records account for 82.5% of the records that made it to the previous step in the pipeline, while the final step only removed 48% of the names.
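The two kinds of percentage quoted above (share of the starting total versus share of what entered a given step) can be reproduced with a short calculation, sketched here in Python. The name counts per step and the final record counts are taken from the text; the function name is hypothetical.

```python
# Worked arithmetic for the percentages quoted in the text.

START_NAMES = 32_315
NAMES_REMOVED = [6_705, 3_043, 465, 19_265, 1_370]  # steps one to five

def step_shares(start, removed_per_step):
    """For each step, return (removed, % of starting total, % of what entered the step)."""
    rows, remaining = [], start
    for removed in removed_per_step:
        rows.append((removed, 100 * removed / start, 100 * removed / remaining))
        remaining -= removed
    return rows, remaining

rows, final_names = step_shares(START_NAMES, NAMES_REMOVED)

# Records: share of those entering the final step that were removed,
# using the 77,020 removed and the 16,384 retained in the checklist.
final_step_record_share = 100 * 77_020 / (77_020 + 16_384)
```

Here the last entry of `rows` gives the final step as about 4% of all starting names but about 48% of the names that reached it.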

A full list of the names and records included in the final checklist, as well as relevant accompanying data can be viewed in Appendixes 1 and 4.


Figure 5 || The figure above shows the number of names removed from the pipeline at each point. The first step excluded names based on lack of GPS coordinates. It also consolidated names according to the TNRS. The second step excluded names based on discrepancy in stated location and GPS coordinates. The third step excluded names where the altitude of the record differed by more than 500 meters from the known altitude at that location. The fourth step excluded names not previously listed as occurring in Colombia as per the Catálogo de plantas y líquenes de Colombia, or names that occurred outside of their known distribution. The final step excluded records outside of the MMV. The pipeline reduced the names from a starting point of 32,315 names to 1,467 names in the final checklist.

Figure 6 || The figure above shows the number of records removed from the pipeline at each point. The first step excluded records based on lack of GPS coordinates. It also consolidated names according to the TNRS. The second step excluded records based on discrepancy in stated location and GPS coordinates. The third step excluded records where the altitude of the record differed by more than 500 meters from the known altitude at that location. The fourth step excluded records whose names were not previously listed as occurring in Colombia as per the Catálogo de plantas y líquenes de Colombia, or whose names occurred outside of their known distribution. The final step excluded records outside of the MMV polygon. The pipeline reduced the records from a starting point of 1,292,390 records to 16,384 records in the final checklist.

4.3 Breakdown of taxa

The full checklist can be found in Appendix 1, and a full list of all taxa, including the number of records per taxon as well as the number of species per genus and family, can be found in Appendix 4. The graphs and results in this section focus only on the most common taxa as determined by number of collections and number of names.

The checklist is comprised of 1,467 names. These names are split into 171 families and 872 genera. The family represented by the most species is Fabaceae, with 173 recorded species included in the checklist. Families included in the checklist have a mean of 13 species, while many families are represented by a single species. Forty of the 171 families in the checklist, or 23% of the checklist’s families, are represented by a single species. Conversely, the 42 families with the most species account for over half of all species in the checklist.

Figure 7 shows the most collected families and species in terms of total records. The most collected family was Melastomataceae with 2,215 records, followed by Fabaceae and Rubiaceae with 1,209 and 1,186 records respectively. Melastomes account for 13.5% of the total records included in the final checklist. Interestingly, Melastomataceae does not feature in any of the top five most collected species. These five, Palicourea guianensis, Compsoneura mutisii, Isertia haenkeana, Passiflora vitifolia, and Mendoncia lindavii, come from the families Rubiaceae, Acanthaceae, Myristicaceae, and Passifloraceae. lacera, the most commonly collected Melastomataceae species, is represented by fewer than 100 collections. This is indicative of the sharp decline in the number of records per family and species as we move away from the most heavily collected taxa. Seven families in the checklist (Nephrolepidaceae, Podostemaceae, Dennstaedtiaceae, Amaryllidaceae, Cannaceae, Apodanthaceae, and Anemiaceae) are represented by single records, and 249 species in the checklist are represented by only a single collection.

Whereas Figure 7 shows the number of records per taxon, Figure 8 shows the breakdown of families and genera by total number of species. Families with 50 or more species are included in the visualization. The geographic distributions of these records are also plotted by family in Figures 9-19. These 11 families account for 42% of the total species in the checklist. The genera with more than 15 species are also included in Figure 8. These 11 genera account for 12% of the total species in the checklist and represent nine different families. Of these nine families, four appear as families with more than 50 species: Rubiaceae, Melastomataceae, Piperaceae, and Sapindaceae.

Figure 7 || Graphs representing the number of collections for different taxa. A: The 25 most collected families included in the checklist as shown by total records. B: The 25 most collected species as shown by total records. (Gloeospermum… refers to Gloeospermum sphaerocarpum)

Figure 8 || Graphs representing the number of species represented in the checklist for different families and genera. A: The number of species represented in the checklist broken down by family. Only families with more than 50 species are included. B: The number of species represented in the checklist broken down by genus. Only genera with more than 15 species are included.

Figure 9 || Distribution of Melastomataceae records. The study area is shown in grey scale rendering altitude in meters. Each triangle signifies a single record.

Figure 10 || Distribution of Fabaceae records. The study area is shown in grey scale rendering altitude in meters. Each triangle signifies a single record.

Figure 11 || Distribution of Rubiaceae records. The study area is shown in grey scale rendering altitude in meters. Each triangle signifies a single record.

Figure 12 || Distribution of Malvaceae records. The study area is shown in grey scale rendering altitude in meters. Each triangle signifies a single record.

Figure 13 || Distribution of Poaceae records. The study area is shown in grey scale rendering altitude in meters. Each triangle signifies a single record.

Figure 14 || Distribution of Apocynaceae records. The study area is shown in grey scale rendering altitude in meters. Each triangle signifies a single record.

Figure 15 || Distribution of Asteraceae records. The study area is shown in grey scale rendering altitude in meters. Each triangle signifies a single record.

Figure 16 || Distribution of Bignoniaceae records. The study area is shown in grey scale rendering altitude in meters. Each triangle signifies a single record.

Figure 17 || Distribution of Piperaceae records. The study area is shown in grey scale rendering altitude in meters. Each triangle signifies a single record.

Figure 18 || Distribution of Sapindaceae records. The study area is shown in grey scale rendering altitude in meters. Each triangle signifies a single record.

Figure 19 || Distribution of Malpighiaceae records. The study area is shown in grey scale rendering altitude in meters. Each triangle signifies a single record.

4.4 Other Relevant Data

The results of the pipeline included more than just taxa names and geographic locations. Information on altitude, collector, year of collection and department is also included in the report available in Appendix 4. When observing the collections temporally, two noteworthy trends can be seen. Firstly, there was a large uptick in the number of collections in the region between 1970 and 1990, corresponding to a period of civil change leading up to the creation of the 1991 constitution. The peak of this surge in collections occurred in 1987, with 2,072 collections in that year alone. These changes in collection rate are visualized in Figure 21, with a more complete breakdown of collections per year available in Appendix 4. A breakdown of the top collectors from 1987 is also shown in Figure 22. Secondly, the records included in the checklist span 218 years (1801-2019). The oldest record included in the checklist is a collection of Casearia corymbosa by Bonpland from the department of Bolívar dated April 1801 (Figure 20). It is held in Field Museum of Natural History - Botany Department in Berlin. The specimen is part of the collection digitized by J. F. Macbride.

Figure 20 || Digitization of Bonpland’s 1801 collection of Casearia corymbosa. (Credit: The Field Museum of Natural History (2014). J. F. Macbride's Historical Photographs (1929-1939) of Type Specimens from Berlin (B))


Figure 21 || Number of records included in the final checklist visualized by year.

Figure 22 || Number of records per collector included in the checklist from 1987

Another trend in the data to consider is the distribution of collections throughout the different departments of the MMV. When looking at Figures 9-19 it can be seen that, while different families have different distribution patterns, collections in the valley as a whole are skewed towards the western side, primarily in Antioquia. This trend can be seen more clearly in Figure 23, which shows the breakdown by department for all records included in the final checklist. Antioquia alone accounts for more than half of the records (11,715), while Santander and Cundinamarca, the next most heavily collected departments, account for 1,614 and 1,417 records, respectively. There are also a small number of records stated to occur in departments that are not among the original seven used to define the MMV (Antioquia, Santander, Cundinamarca, Bolívar, Caldas, Boyacá, and Cesar). Upon further investigation it was found that the records indicated as occurring in Tolima, Chocó, Córdoba, and Nariño are located along the western border of the valley and are most likely the result of the resolution capabilities of the mapping. Also, as the capital district of Bogotá falls within the department of Cundinamarca, records occurring in this area are part of the MMV.

Figure 23 || Number of records included in the final checklist visualized by department.

Included in Appendix 4 are the altitudes for each record included in the final checklist. The altitude data for the 25 most collected families were chosen to serve as representatives for the dataset as a whole and have been visualized using violin plots in Figures 24-26. The families are shown in alphabetical order.

A few general conclusions can be drawn from these figures. First, certain families tend to be clustered at different altitudes. Meliaceae, Myristicaceae, and Bignoniaceae all show a bias towards lower elevations, while Clusiaceae, Heliconiaceae, and Melastomataceae all have collections at higher elevations. This clustering is not true for all the families presented, with families such as Moraceae and Passifloraceae having a more even spread of collections across their entire recorded range. Another difference that can be noticed in these figures is in the comparative sizes of the altitudinal ranges of the families. Violaceae covers a relatively small range of 60-970 meters. Meanwhile, Sapindaceae and Malvaceae span almost the entire range of altitudes included in the valley (7-1610 m and 40-1645 m respectively).

However, having a listed altitude was not a requirement for specimens’ inclusion in the checklist, and as such, it is possible that there are records and taxa included in the final list with higher and/or lower altitudes. The records included in the final checklist that did list an altitude span from 2-1765 meters in altitude, with the highest recorded species being Blechnum polypodioides from Blechnaceae.
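The altitudinal ranges quoted above can be derived per family with a simple minimum and maximum over records that list an altitude, sketched here in Python for illustration. The field names are hypothetical, and the sample records are invented apart from the 60 m and 970 m Violaceae limits quoted in the text.

```python
# Illustrative sketch only; the altitude data themselves are in Appendix 4.

def altitude_ranges(records):
    """Return {family: (min_m, max_m)}, skipping records with no listed altitude."""
    ranges = {}
    for rec in records:
        alt = rec.get("altitude_m")
        if alt is None:
            continue  # a listed altitude was not required for inclusion
        low, high = ranges.get(rec["family"], (alt, alt))
        ranges[rec["family"]] = (min(low, alt), max(high, alt))
    return ranges

# Invented records; only the Violaceae limits come from the text.
records = [
    {"family": "Violaceae", "altitude_m": 970},
    {"family": "Violaceae", "altitude_m": 60},
    {"family": "Violaceae", "altitude_m": 400},
    {"family": "Violaceae", "altitude_m": None},
]
ranges = altitude_ranges(records)
```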

Figure 24 || Density of records by altitude and family. Each violin represents the altitudinal distribution of one family. The width indicates the relative abundance of collections at each altitude per family. The central box plots illustrate the mean as a dot and the median as a horizontal bar.

Figure 25 || Density of records by altitude and family. Each violin represents the altitudinal distribution of one family. The width indicates the relative abundance of collections at each altitude per family. The central box plots illustrate the mean as a dot and the median as a horizontal bar.

Figure 26 || Density of records by altitude and family. Each violin represents the altitudinal distribution of one family. The width indicates the relative abundance of collections at each altitude per family. The central box plots illustrate the mean as a dot and the median as a horizontal bar.


5. Discussion

5.1 Pipeline Analysis

The goal of the pipeline constructed in this project was to sort specimen records from Colombia for occurrences in the MMV. Of the 1,292,390 total records representing 32,315 names that entered the pipeline, 16,384 records representing 1,467 names were included in the final checklist. This signifies a roughly 99% reduction in records and 95% reduction in names. These numbers are in line with the expected values for a data set of this size. The different cleaning steps did not all remove equal amounts of data, and the data they removed were not always equivalent. The steps in the pipeline can be broken into two main categories: steps that removed missing or faulty data and steps that curated the list to be specific to the MMV. Steps one through three all fall broadly into the first category, while steps four and five fall into the latter.

The first step, the removal of records without associated GPS coordinates, removed a large portion of the data both in terms of total records and names removed. This step was designed to remove records that could not be definitively placed in the valley. A side effect of this step is that some records that were truly from the MMV were excluded, either due to human or mechanical error. In future iterations of the checklist, a secondary step that assesses records lacking GPS coordinates for other locality information would be helpful. This could allow for the construction of a more complete checklist and picture of species richness in the valley. Step three, removal of records by altitude, removed the smallest amount of data from the pipeline in terms of both total records and names. By this point a large portion of the records with incorrect location data had been removed by the previous steps. This may account for why step three removed such a small portion of the data when compared to the other steps. This small number of removed records does seem to indicate that, in combination with the previous steps, the first portion of the pipeline was successful in removing records with missing and faulty data.

The fourth step, removal of records using the Catálogo de plantas y líquenes de Colombia, was the first step that aimed to curate the data to the specific goals of the checklist. One aim of the step was to provide a check that could broadly remove misidentified records without individual scrutiny. Single records of species not thought to occur in Colombia could be caught where they otherwise would end up in the checklist. This step removed a large portion of names from the pipeline. Due to the limitations of this project it was not possible to examine these taxa individually, but future efforts could examine these removed taxa and prove useful in expanding the checklist. The final step of the pipeline refined the list of records to include only those reported from inside the MMV and illustrated another key point about the nature of the data removed in the pipeline.

When discussing the effectiveness of the pipeline, it is important to consider the disparity in the proportion of names to records removed in various steps. These differences can be looked at in regard to what kind of data each step tended to remove. Step one removed the most records (555,715 of 1,292,390), but step four removed the most names (19,265 of 32,315). These variations reflect the different aims of the two steps. As stated previously, the first step was designed primarily to remove records based on quality. As such it was completed early in the process and removed a large portion of the records. The fourth step, on the other hand, was designed to make sure the included data made sense in the framework of previous research and the scope of this project. This included checking if species were reported to occur in the area. Due to this, it resulted in the removal of whole taxa, not just individual records. Looking at the ratio of records removed to names removed in the final step, in light of what the step was designed to do, showcases an important takeaway of the checklist.

As stated in section 4.2, step five accounts for the removal of 48% of the names, but 82.5% of the records. This seems to suggest that while the MMV accounts for only a small portion of the total area of the country, it contains a large portion of the species. This supports the idea that the valley is a biodiversity hotspot and critical for efforts in conservation and research. Of the 6,499 species listed as endemics of the country by the Catalog of Plants and Lichens of Colombia, 400 are cited as occurring in the valley (Bernal, Gradstein and Celis, 2019). Further work with the checklist would be needed to verify how many of the listed endemics are recorded in the MMV, but these insights have the potential to prove invaluable for work in the area.
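The step one and step four figures quoted earlier in this section can be put on a common scale with a short calculation. The following is a minimal sketch in Python, not part of the R pipeline used in this project. The function name `share_removed` is introduced here purely for illustration, and the denominators are simply the totals quoted in the text.

```python
def share_removed(removed: int, total: int) -> float:
    """Return the percentage of `total` accounted for by `removed`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * removed / total


# Figures quoted in the text of this section.
records_step_one = share_removed(555_715, 1_292_390)  # records removed in step one
names_step_four = share_removed(19_265, 32_315)       # names removed in step four

print(f"Step one removed {records_step_one:.1f}% of records")
print(f"Step four removed {names_step_four:.1f}% of names")
```

On these figures, step one removed roughly 43% of the 1,292,390 records, while step four removed roughly 60% of the 32,315 names, which is the disparity between record-level and name-level removal discussed above.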

5.2 Comparison with Other Checklists

When discussing the success or failure of the produced checklist it is important to consider the objectives of the checklist and current paradigms in floristic checklists. This checklist is intended foremost to serve as a starting point for understanding the scope of plant diversity in the MMV. It is also by nature different from checklists produced from small study areas and datasets, as well as from checklists created directly from a survey of an area rather than from aggregate databases, as this checklist was. One criterion in the evaluation is what information is included. The checklist produced by this project (Appendix 1) includes three key pieces of information about each name listed: scientific name, authority, and the departments it occurs in within the MMV. It does not include other data attached to the records and taxa (found in Appendix 4) such as altitudinal range and collection notes on habit. Other information that was not included in this checklist but is featured in others includes synonymy, conservation status, endemism, and observed specimens.

Table 1 below shows a comparison of included elements between the checklist produced in this study and three other checklists of varying format and scope: Ecological Checklist Of The Missouri Flora For Floristic Quality Assessment (Ladd and Thomas, 2015), Checklist of the Vascular Plants of Niagara Regional Municipality, Ontario (Oldham, 2010), and An updated checklist of the vascular flora native to Italy (Bartolucci et al., 2018). For reference, the numbers of names included in each checklist are as follows: MMV (1,467), Missouri (2,961), Niagara (1,696), Italy (8,195).

Table 1 || Comparison of included checklist elements. (MMV = Middle Magdalena Valley)

MMV Missouri Niagara Italy

Scientific name X X X X

Authority X X X

Distribution X X X

Habit X

Common name X X

Synonymy X X

Conservation status X X

Endemism X

The Missouri checklist is split into three parts (common names, synonymy, and wetness ratings) and covers an area of 180,560 km2. The Niagara checklist is a technical report and covers an area of 1,854.23 km2. The Italy checklist is an update to a previous version of the Italian flora and covers an area of 301,340 km2. These details are given to illustrate the different scope and intent of each list.

The core element of what makes a report a checklist is the inclusion of a standardized list of plant names. Without these plant names it ceases to be useful botanically. The next element considered is the authority for the species name. This is another part that is considered standard in a plant checklist. The Missouri checklist does not include authorities with scientific names, but it is the opinion of this paper that this information is important in disambiguation and as such has been included. The final element included in the checklist presented in this project is the local distribution of each name, stated as departments. The checklist does not include global distribution or endemic status due to limitations in the data, but it was felt that the inclusion of distribution at the local level was still important to provide an idea of the ranges of the species. The Italy checklist included the distribution data in supplemental materials, which was considered for this project, but it was decided that due to the small number of possible regions it made sense to include it in the final checklist. The full scope of the distribution data is also available in Appendix 4 and select taxa are visualized in Figures 9-19, as a way of showing a more complete picture.

In this vein, altitude data was restricted to supplemental data and not included in the final version of the checklist. The main reason for this decision was that the pipeline did not require an associated altitude for records to be included in the checklist. There was concern that this data would be skewed, especially in species with few collections. For similar reasons, information on habit and habitat for each species was not included. Where records included information on habit and habitat, it can be found in Appendix 4. It was also felt that this type of information would be more appropriate in a full flora or revision than in a checklist.

While the other elements discussed would be welcome additions to this checklist, the limited scope of this project necessitated that full synonymies and conservation status be left out. Adding them provides a good place for future expansion of this work.

Another type of checklist has gained popularity in recent years: the searchable annotated checklist. Examples of these include the 2020 Flora of Brazil and the Catalogue of the Plants and Lichens of Colombia. They serve as a combination of database and checklist and allow users to filter results based on criteria like those discussed previously (Cardoso et al., 2017). Another example is Solanaceae Source, which was created from a BRAHMS database like the one used in this project (PBI Solanum Source, 2020). Adapting the checklist and associated database produced by this project into a searchable online format could be useful and offers a direction for future work.

5.3 Conclusions from species composition and notable patterns

The project ended with a list of 171 families, comprising 872 genera. These taxa are split into 1,467 names. The upper quartile of families, in terms of collections per family, accounts for over half of all the names included in the final checklist. The four families represented by the most names (Fabaceae, Rubiaceae, Melastomataceae, and Malvaceae) account for approximately a third of the checklist. These four families also appear near the top of the list of most commonly collected families. This reflects a pattern seen throughout the data, where collections and collectors tend to focus on a small subset of the taxa that are present in the area. The majority of these collections also tend to be made by a small number of collectors (Bebber et al., 2012).

The collections in 1987 provide a good case study of this trend. During that year there were 2,072 records, 773 of which are credited to Ricardo Callejas Posada. Other large collectors during that year included Juan Guillermo Ramírez and Alan E. Brant; together with Posada, their collections total over three quarters of all records from 1987. As background on these large-scale collection efforts, it helps to know that Posada is a professor at the University of Antioquia who completed his doctorate, a taxonomic revision of the family Piperaceae, in 1987. He also worked on completing the Flora of Antioquia (Idárraga-Piedrahíta et al., 2011) and other studies and revisions in the area. In 1987 he was part of a large project that aimed to create a more complete picture of the flora of Colombia. As such, this work considerably expanded the scope of knowledge regarding the presence of species in the country. Still, most of these collections focused on a few specific taxa. Only 175 of the over 4,000 species collected during that year were represented by a single collection. Even fewer families, 18 of the 101, were represented by only a single collection.

This brings forward a problem in the dataset: collection bias. Most collecting trips, especially in the neotropics, are limited in taxonomic scope. This is often necessary due to the immense amount of biodiversity present. Specialists focus on what they know and collect those taxa more often. Specimens that preserve well or make more attractive collections also tend to be better represented. This concentration on certain taxa can be seen as a double-edged sword. Repeated, intense focus on the same taxa leads to species like Palicourea guianensis being recorded over 100 times, often in the same locations and time frames, while other species are neglected. On the other hand, a focus on a family such as Piperaceae or Orchidaceae can lead to large amounts of progress in terms of the scope of knowledge in a relatively short amount of time. To get a full picture of biodiversity in the area it is necessary to have a mixture of generalist and focused collecting trips.

These biases are present not only in what is collected, but also in where collections are from. Most of the records in the database are from the western side of the valley, and Antioquia specifically. There are many reasons for this bias. Certain areas of the valley are more easily accessible due to both geography and political restrictions. Because of this, while interpretations of species distribution within the valley can be made, they need to be treated with caution. As such, distribution data presented in the checklist should be viewed in an inclusionary manner as opposed to an exclusionary one. The current checklist uses records from 375 datasets. Future work incorporating more datasets could bolster the validity of species distribution claims. Another possibility for improving the checklist and assessments of species distribution would be to add a temporal check on records. Some species currently included in the checklist, such as Passiflora magdalenae, have not been documented in the valley since the mid-20th century.
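The temporal check proposed above could take a form like the following. This is a minimal Python sketch rather than part of the R pipeline used in this project. The record layout, the function name `names_not_seen_since`, the cutoff year, and the example records are all hypothetical and do not come from the dataset.

```python
def names_not_seen_since(records, cutoff_year):
    """Return names whose most recent record predates `cutoff_year`.

    `records` is an iterable of (name, collection_year) pairs.
    """
    latest = {}
    for name, year in records:
        # Keep only the most recent collection year seen for each name.
        if name not in latest or year > latest[name]:
            latest[name] = year
    return sorted(name for name, year in latest.items() if year < cutoff_year)


# Hypothetical records for illustration only (not taken from the dataset).
example_records = [
    ("Species A", 1948),
    ("Species A", 1952),
    ("Species B", 1987),
    ("Species B", 2011),
    ("Species C", 1935),
]

flagged = names_not_seen_since(example_records, cutoff_year=1970)
print(flagged)
```

Names flagged in this way would be candidates for review rather than automatic removal, since an old record does not by itself show that a species is no longer present in the valley.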

Overall, this project shows that a great deal of work can be done with existing datasets, without the need for additional field work. Using data from 375 individual datasets, a preliminary checklist for the Middle Magdalena Valley was completed. The checklist is not comprehensive, but it is a good starting point for future work in the area and provides an introduction to some of the globally important biodiversity present there.

6. Conclusion

The aim of this project was to construct a preliminary floristic checklist for the Middle Magdalena Valley in northern Colombia. The valley is known as a hotspot for biodiversity, but until now a comprehensive checklist had not been undertaken. While the checklist presented here is preliminary in nature, it provides a strong starting point for future work. This research has succeeded in constructing a working definition of the Middle Magdalena Valley that combines political and geographic barriers. The map created based on this definition was used in conjunction with the cleaning pipeline to produce a dataset of records from GBIF of taxa occurring in the Middle Magdalena Valley. The analysis of this dataset and the checklist created from it has highlighted various trends not only in the taxonomic variation in the valley, but also in the long history of collections in the area. The final version of the checklist is based on 16,384 records covering 1,467 names split into 171 families and 872 genera. The records span more than 200 years of collections from over 1,000 different collectors or excursions. It is hoped that this project has highlighted the immense biodiversity present in the Middle Magdalena Valley and that the dataset and checklist created can serve as the basis for future work in the area.

References

A., G., C., A. and P., H. A. (2018) ‘First Report of Peridiscaceae for the Vascular Flora of Colombia’, Harvard Papers in Botany, 23(1), pp. 109–121. doi: 10.3100/hpib.v23iss1.2018.n12.

Alvarez-Berríos, N. L. and Mitchell Aide, T. (2015) ‘Global demand for gold is another threat for tropical forests’, Environmental Research Letters. IOP Publishing, 10(2). doi: 10.1088/1748-9326/10/2/029501.

Antonelli, A. and Sanmartín, I. (2011) ‘Why are there so many plant species in the Neotropics?’, 60(April), pp. 403–414. doi: 10.1002/tax.602010.

Armenteras, D. et al. (2006) ‘Patterns and causes of deforestation in the Colombian Amazon’, Ecological Indicators, 6, pp. 353–368. doi: 10.1016/j.ecolind.2005.03.014.

Bartolucci, F. et al. (2018) ‘An updated checklist of the vascular flora native to Italy’, Plant Biosystems. Taylor & Francis, pp. 1–125. doi: 10.1080/11263504.2017.1419996.

Bebber, D. P. et al. (2012) ‘Big hitting collectors make massive and disproportionate contribution to the discovery of plant species’, Proceedings of the Royal Society B: Biological Sciences, 279(1736), pp. 2269–2274. doi: 10.1098/rspb.2011.2439.

Bernal, R., Gradstein, S. and Celis, M. (2019) ‘Catalog of plants and lichens of Colombia’, Institute of Natural Sciences, National University of Colombia, Bogotá.

Blumenthal, D. A. (2013) ‘Nuevas Tendencias Internacionales De La Cooperación Internacional Y La Conservación De La Biodiversidad: Afectación Local Y Retos Institucionales Para Su Financiamiento’.

Bonpland, A., Humboldt, A. von and Kunth, K. S. (1815) Nova genera et species plantarum :quas in peregrinatione ad plagam aequinoctialem orbis novi collegerunt /descripserunt, partim adumbraverunt. Available at: https://www.biodiversitylibrary.org/item/11233 (Accessed: 24 August 2020).

Boyle, B. et al. (2013) ‘The taxonomic name resolution service: An online tool for automated standardization of plant names’, BMC Bioinformatics, 14(1). doi: 10.1186/1471-2105-14-16.

Brummitt, N. and Lughadha, E. N. (2003) ‘Biodiversity: Where’s Hot and Where’s Not’, Conservation Biology, 17(5), pp. 1442–1448.

Brusatte, S. (2015) ‘What killed the dinosaurs’, Scientific American, 313(6), pp. 54–59. doi: 10.1038/scientificamerican1215-54.

Butler, K. and Schamel, S. (1988) ‘Structure along the eastern margin of the central Cordillera, upper Magdalena Valley, Colombia’, Journal of South American Earth Sciences, 1(1), pp. 109–120. doi: 10.1016/0895-9811(88)90019-3.

Cardoso, D. et al. (2017) ‘Amazon plant diversity revealed by a taxonomically verified species list’, 114(40). doi: 10.1073/pnas.1706756114.

Cardoso, P. et al. (2011) ‘The seven impediments in invertebrate conservation and how to overcome them’, Biological Conservation. Elsevier Ltd, 144(11), pp. 2647–2655. doi: 10.1016/j.biocon.2011.07.024.

Catanoso, J. (2017) Colombia, an example to world, balances conservation and development, Mongabay. Available at: https://news.mongabay.com/2017/10/columbia-an-example-to-world-balances-conservation-and-development/.

Chase, M. W. et al. (2016) ‘An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG IV’, Botanical Journal of the Linnean Society, 181(1), pp. 1–20. doi: 10.1111/boj.12385.

Churchill, S. P. (1988) ‘Bryologia Novo Granatensis. Studies on the Moss Flora of Colombia II. Further Additions to Colombia and the Department of Antioquia’, The Bryologist, 91(2), pp. 130–133.

Cody, S. et al. (2010) ‘The great American biotic interchange revisited’, Ecography, 33(2), pp. 326–332. doi: 10.1111/j.1600-0587.2010.06327.x.

Condit, R. (1996) ‘Defining and mapping vegetation mega-diverse tropical forests’, Tree, 11(1).

Cuatrecasas, J. (1961) ‘A Taxonomic Revision of the Humiriaceae’, Systematic Plant Studies.

Cubillos, F. G. (2008) Structural Analysis Of An Area In The Northern Central Part.

Dávalos, L. M. et al. (2011) ‘Forests and drugs: Coca-driven deforestation in tropical biodiversity hotspots’, Environmental Science and Technology, 45(4), pp. 1219–1277. doi: 10.1021/es102373d.

Diniz-Filho, J. A. F. et al. (2013) ‘Darwinian shortfalls in biodiversity conservation’, Trends in Ecology and Evolution, 28(12), pp. 689–695. doi: 10.1016/j.tree.2013.09.003.

Etter, A. et al. (2006) ‘Modelling the conversion of Colombian lowland ecosystems since 1940: Drivers, patterns and rates’, Journal of Environmental Management, 79(1), pp. 74–87. doi: 10.1016/j.jenvman.2005.05.017.

F. J. Hermann (1948) ‘Additions to the Flora Of Colombia’, Caldasia, 5(21), pp. 33–42.

Filer, D. L. (2013) Botanical Research And Herbarium Management System training guide.

Gaston, K. J. (2000) ‘Global patterns in biodiversity’, Nature, 405(6783), pp. 220–227. doi: 10.1038/35012228.

Gillett, H. J. and Walter, K. S. (1998) 1997 IUCN red list of threatened plants, IUCN, Gland, Switzerland and Cambridge, UK. doi: 10.5962/bhl.title.44833.

Goodwin, Z. A. et al. (2015) ‘Widespread mistaken identity in tropical plant collections’, Current Biology, 25(22), pp. R1066–R1067. doi: 10.1016/j.cub.2015.10.002.

Guarin, F. A. (2005) ‘Three New Species of Bomarea (Alstroemeriaceae) from the Andean Region of Colombia’, Novon, 15(2), pp. 253–258.

Guarin, F. A. (2007) ‘Two New Species of Bomarea (Alstroemeriaceae) from Colombia’, A Journal for Botanical Nomenclature, 17(2), pp. 141–144.

Gutiérrez, F., Acevedo, T. and Juan Viatela (2007) ‘Violent Liberalism? State, Conflict and Political Regime in Colombia, 1930-2006’, Crisis States Working Papers.

Hammen, T. Van Der and Urrego, G. C. (1978) ‘Prehistoric Man On The Sabana De Bogota: Data For An Ecological Prehistory’, Palaeogeography, Palaeoclimatology, Palaeoecology, 25, pp. 179–190.

Hijmans, R. J. (2020) ‘raster: Geographic Data Analysis and Modeling’. Available at: https://cran.r-project.org/package=raster.

Hijmans, R. J., Guarino, L. and Mathur, P. (2012) DIVA-GIS.

Hortal, J. et al. (2015) ‘Seven Shortfalls that Beset Large-Scale Knowledge of Biodiversity’, Annual Review of Ecology, Evolution, and Systematics, 46(December), pp. 523–549. doi: 10.1146/annurev-ecolsys-112414-054400.

Humphreys, A. M. et al. (2019) ‘Global dataset shows geography and life form predict modern plant extinction and rediscovery’, Nature Ecology & Evolution. Springer US, 3, pp. 1043–1047. doi: 10.1038/s41559-019-0906-2.

Idárraga-Piedrahíta, Á. et al. (2011) Flora de Antioquia. Catálogo de las plantas vasculares, vol. II., Universidad de Antioquia.

Jenkins, C. N. et al. (2015) ‘The biodiversity of species and their rates of extinction, distribution, and protection’, Science, 344(6187). doi: 10.1126/science.1246752.

Kim Rutledge et al. (2011) Tropics, National Geographic. Available at: https://www.nationalgeographic.org/encyclopedia/tropics/.

Kuussaari, M. et al. (2009) ‘Extinction debt: a challenge for biodiversity conservation’, Trends in Ecology and Evolution, 24(10), pp. 564–571. doi: 10.1016/j.tree.2009.04.011.

Kvist, L. P. and Skog, L. E. (1992) ‘Revision of Kohleria (Gesneriaceae)’, Smithsonian Contributions to Botany, 79. doi: 10.5962/bhl.title.123256.

Ladd, D. and Thomas, J. R. (2015) ‘Ecological Checklist Of The Missouri Flora For Floristic Quality Assessment’, Phytoneuron, pp. 1–274.

Lehner, B., Verdin, K. and Jarvis, A. (2008) ‘New Global Hydrography Derived From Spaceborne Elevation Data’, EOS, Transactions, American Geophysical Union, 89(10), pp. 93–104.

Link, A. et al. (2010) ‘Initial effects of fragmentation on the density of three neotropical primate species in two lowland forests of Colombia’, Endangered Species Research, 13, pp. 41–50. doi: 10.3354/esr00312.

Linnæus, C. (1753) ‘Species Plantarium’, Genera Relatas, 1, p. 572. doi: 10.1017/CBO9781107415324.004.

Lomolino, M. (2004) Frontiers of Biogeography: New Directions in the Geography of Nature.

Luhr, J. F. (2003) Earth. Dorling Kindersley.

Mendoza-Cifuentes, H. et al. (2020) ‘Lintersemina (Rubiaceae: Condamineeae), a new and enigmatic genus from the Magdalena medio region of Colombia’, Phytotaxa, 451(1), pp. 1–20. doi: 10.11646/phytotaxa.451.1.1.

Moonlight, P. W. et al. (2020) ‘The strengths and weaknesses of species distribution models in biome delimitation’, Global Ecology and Biogeography, (July 2019), pp. 1–15. doi: 10.1111/geb.13149.

Myers, N. (1988) ‘Threatened Biotas: “Hot Spots” in Tropical Forests’, The Environmentalist, 8(3), pp. 187–208.

Oldham, M. J. (2010) Checklist of the Vascular Plants of Niagara Regional Municipality, Ontario.

Oliver, F. W. (Francis W. (ed.) (1913) Makers of British botany; a collection of biographies by living botanists. Cambridge, University press.

Oyuela-Caycedo, A. (2008) ‘Late Pre-Hispanic Chiefdoms of Northern Colombia and the Formation of Anthropogenic Landscapes’, in The Handbook of South American Archaeology, pp. 405–428. doi: 10.1007/978-0-387-74907-5_22.

Parsons, J. J. (2020) ‘Colombia’, Brittanica.

Platnick, N. I. (1991) ‘Patterns of biodiversity: Tropical vs temperate’, Journal of Natural History, 25(5), pp. 1083–1088. doi: 10.1080/00222939100770701.

Reuter, H. I., Nelson, A. and Jarvis, A. (2007) An evaluation of void-filling interpolation methods for SRTM data, International Journal of Geographical Information Science. doi: 10.1080/13658810601169899.

Sanín, M. J. and Galeano, G. (2011) A revision of the Andean wax palms, Ceroxylon (Arecaceae).

Saunders, D. A., Hobbs, R. J. and Margules, C. R. (1991) ‘Biological Consequences of Ecosystem Fragmentation: A Review’, Conservation Biology, 5(1), pp. 18–32. doi: 10.1111/j.1523-1739.1991.tb00384.x.

Scotland, R. W. and Wortley, A. H. (2003) ‘How many species of seed plants are there?’, Taxon, 52, pp. 101–104. doi: 10.2307/3647306.

Shoeb, M. (2008) ‘Anticancer agents from medicinal plants’, Bangladesh Journal of Pharmacology, 1(2), pp. 35–41. doi: 10.3329/bjp.v1i2.486.

Silvestro, D. et al. (2015) ‘Biological evidence supports an early and complex emergence of the Isthmus of Panama’, 112(24). doi: 10.1073/pnas.1509107112.

Simmonds, N. W. (1976) Evolution of Crop Plants.

Simpson, B. B. and Neff, J. L. (1985) ‘Plants, Their Pollinating Bees, and the Great American Interchange’, pp. 427–452. doi: 10.1007/978-1-4684-9181-4_16.

Stace, C. (2010) New Flora of the British Isles Third Edition.

Stark, R. and Satonin, S. (1992) FoxPro : the master reference. 2nd edn. Blue Ridge Summit, PA : Windcrest.

Stevens, G. C. (1989) ‘The latitudinal gradient in geographical range: how so many species coexist in the tropics’, American Naturalist, 133(2), pp. 240–256. doi: 10.1086/284913.

Turner, B. L. and Mendenhall, M. G. (1993) ‘A Revision of Malvaviscus (Malvaceae)’, Annals of the Missouri Botanical Garden, 80(2), pp. 439–457.

Udvardy, M. D. F. (1975) A Classification of the Biogeographical Provinces of the World.

UNEP-WCMC, IUCN and NGS (2018) Protected Planet Report 2018.

Vié, J.-C., Stuart, C. H.-T. and N., S. (2010) Wildlife in a Changing World An analysis of the 2008 IUCN Red List of Threatened SpeciesTM, Marine Ecology. IUCN, Gland, Switzerland. doi: 10.1111/j.1439-0485.2010.00364.x.

Vignieri, S. (2014) ‘Vanishing Fauna’, Science, 345(6195), pp. 392–395. doi: 10.1126/science.345.6195.392.

Villamil-Montero, D. A. and Ming, L. C. (2016) ‘The botanical explorations in Colombia: A review of the written botanical heritage with analyses of Lamiaceae collections as studied case’, Boletin Latinoamericano y del Caribe de Plantas Medicinales y Aromaticas, 15(2), pp. 128–135.

Vos, J. M. De et al. (2014) ‘Estimating the Normal Background Rate of Species Extinction’, Conservation Biology, p. 5.

Webb, S. D. (1991) ‘Ecogeography and the Great American Interchange’, Paleobiology, 17(3), pp. 266–280.

Woodland, D. W. (1991) Contemporary plant systematics. Englewood Cliffs, N.J. : Prentice Hall.

Appendix

A1. Preliminary Checklist of the Middle Magdalena Valley in Colombia

Checklist of the 1,467 species of seed plants occurring in the Middle Magdalena Valley.

Attached in supplemental materials as PCMMV.pdf

A2. R scripts

A2.1 Polygon Creation

Annotated script used in the creation of the shapefile of the study area. Attached in supplemental materials as Polygon_Creation.R.

A2.2 Cleaning Pipeline

Annotated script of the cleaning steps and construction of the data pipeline adapted from Moonlight et al. 2020. Attached in supplemental materials as Magdalena_Cleaning_Pipeline.R.

A3. Shapefiles of the Middle Magdalena Valley

Combined shapefiles of the polygon of the Middle Magdalena Valley created using the script in Appendix A2.1. Attached in supplemental materials as Polygon_Shapefiles.zip.

A4. BRAHMS Database

Export of the BRAHMS database created and used in the construction of the checklist. Attached in supplemental materials as Database.zip.

A5. Original GBIF Dataset

The full dataset originally used can be found at https://doi.org/10.15468/dl.45fk4c. It includes all preserved specimen records of seed plants in Colombia.