Assessing the impacts of environmental change on British pollinators (Syrphidae) using next generation sequencing techniques

Hannah Norman

Submitted for the degree of

Doctorate of Philosophy

Department of Life Sciences, Imperial College London

Declaration of Originality

All the work in this PhD is mine, and any programs, data and work which are not mine or were not carried out by me are referenced within the text.

Copyright Declaration

The copyright of this thesis rests with the author. Unless otherwise indicated, its contents are licensed under a Creative Commons Attribution-Non-Commercial 4.0 International License (CC BY-NC). Under this licence, you may copy and redistribute the material in any medium or format. You may also create and distribute modified versions of the work. This is on the condition that: you credit the author and do not use it, or any derivative works, for a commercial purpose. When reusing or sharing this work, ensure you make the licence terms clear to others by naming the licence and linking to the licence text. Where a work has been adapted, you should indicate that the work has been changed and describe those changes. Please seek permission from the copyright holder for uses of this work that are not included in this licence or permitted under UK Copyright Law.

Abstract

British pollinators are vital for their contribution to crop yields as well as maintenance of semi-natural environments across the UK. The Syrphidae family are highly diverse and thought to be an important pollinating group. In order to understand the evolutionary relationships in this family, mitochondrial genomes were used to build the largest tree of the Syrphidae family to date. This tree was used to establish relative ages for divergences within the family, and to explore the evolution of diverse syrphid larval life histories. The recent introduction of a UK pollinator monitoring programme has resulted in a large amount of data with the potential to inform conservation and management, presenting an opportunity for DNA analysis. A CO1 reference database was curated and tested for UK syrphids, resulting in sequences for 70% of UK species and highlighting difficulties in DNA identification. Barcoding of syrphids was then expanded by also barcoding pollen from gut samples. This showed the diversity of plant – syrphid interactions at an individual level across three different land use types. It also highlighted the importance of including syrphid larval life histories in analyses of this family, as these appeared to have a larger effect on species composition than pollen composition did. Finally, the reach of this thesis was expanded to include non-bee and non-syrphid pan trap visitors from the pollinator monitoring programme. This highlighted the diversity of non-syrphid Dipterans and allowed species and phylogenetic diversity to be compared in these bulk samples. Using DNA methods to analyse monitoring data has the potential to increase our knowledge of cryptic species, phylogenetic diversity and pollinator associations. Alongside this, DNA can be used to analyse large datasets of diverse pollinating insects which otherwise would be overlooked.

Table of Contents

Declaration of Originality
Copyright Declaration
Abstract
List of tables and figures
  Chapter 1
  Chapter 2
  Chapter 3
  Chapter 4
Literature review
  Pollination
  Syrphid pollinators
  Monitoring
  DNA monitoring
  Plant – pollinator networks
  Phylogenetics
  Conclusion
Chapter 1: Syrphidae Mitochondrial Genomes and Phylogeny
  Introduction
  Methodology
  Results
  Discussion
  Conclusion
Chapter 2: Developing and testing a Syrphidae CO1 reference database
  Introduction
  Methodology
  Results
  Discussion
  Conclusion
Chapter 3: diversity and pollen associations in a semi-urban landscape
  Introduction
  Methods
  Results
  Discussion
  Conclusion
Chapter 4: Metabarcoding of pan trap bycatch to identify non-bee and non-syrphid flower visitors
  Introduction
  Methods
  Results
  Discussion
  Conclusion
Discussion
Acknowledgements
Bibliography
Supplementary Information
  Chapter 1: Syrphidae Mitochondrial Genomes and Phylogeny
  Chapter 2: Developing and testing a Syrphidae CO1 reference database

List of tables and figures

Chapter 1: Tables 1 to 3, Figures 1 to 5
Chapter 2: Tables 1 to 2, Figures 1 to 6
Chapter 3: Tables 1 to 4, Figures 1 to 10
Chapter 4: Figures 1 to 6

Literature review

Pollination

In the UK, crops and wildflowers are pollinated by a wide variety of insects, providing an important ecosystem service to both semi-natural and agricultural land. It has been estimated that around 78% of temperate plant species are pollinated by animals (Ollerton, Winfree and Tarrant, 2011), and without them, many crop yields would be greatly reduced or eliminated (Klein et al., 2007). There is a huge diversity of pollinating insects in the UK, with over 280 species of bee (Falk, 2018) and 280 species of syrphid (Ball and Morris, 2015), both of which are important pollinators (Garibaldi et al., 2013). There are also many non-bee flower visiting insects which contribute an unknown amount to pollination (Orford, Vaughan and Memmott, 2015).

Our population is growing, increasing the demand on land for housing and agriculture. It is therefore important for farmers to maximize their crop yields. However, at the same time pollinators are facing a number of threats (Vanbergen and Initiative, 2013) which work in concert to reduce overall pollinator health and productivity (Goulson et al., 2015). Some of these threats are headline grabbing, such as neonicotinoid pesticides, which appear to affect bee behaviour and reduce pollination efficiency (Godfray et al., 2014; Rundlöf et al., 2015). Others include disease and parasites, spread from commercial bee hives into the wild (Singh et al., 2010; Fürst et al., 2014), which have a larger impact on individuals already weakened by pesticides (Evans et al., 2018). For pollination, the threat is not just that pollinating species will become less efficient or diseased, but that they will become uncoupled from the plants that they pollinate. This could be caused by climate change, which may shift species distributions by pushing the southern range limit northwards without expanding the northern limit (Kerr et al., 2015; Rafferty, 2017), as well as potentially disrupting plant-pollinator interactions by uncoupling emergence times (Memmott et al., 2007). In the UK there is also an increased threat to pollinators from land use change (Vanbergen, 2014), as more and more of our environment is given over to housing our increasing population, or growing the food needed to sustain it (D. Senapathi et al., 2015). This can result in pollinator deserts, where large swathes of monoculture provide no food for the insects.

Of course, the way that different pollinators will react to these threats is not straightforward. New bee and syrphid species have been arriving in the UK in recent years (Cross and Notton, 2017; Notton and Norman, 2017), suggesting that for some pollinator species climate change is opening up niches, and this may in turn benefit pollination services in the UK. Alongside this, pollinators react in different ways to land use changes, with some species thriving in urban areas due to the floral resources available in gardens (Bates et al., 2011). Overall, ecosystems are currently experiencing extraordinary upheaval, and their ability to continue to function remains uncertain. Their resistance and resilience to these changes depends in large part on interactions among species, and so the UK government are recognising the importance of monitoring pollinating insects (Department for Environment, 2014).

It appears that crop pollination alone may not be a suitable argument for conserving species diversity of pollinators, as a small proportion of total bee species provide the vast majority of flower visits to crops (Kleijn et al., 2015). This could be interpreted by those interested in pollination services, such as policy makers and farmers, as meaning that species diversity is unimportant for pollination, and even that the role could be filled by commercial pollinators. However, in reality the situation is more complex. Sapir et al. (2017) found that the pollination efficiency of honeybees on apples was increased by the presence of wild bee species, likely because they introduced competition for flowers, decreasing the time an individual spent on a single flower. Other studies have shown that increased diversity of pollinators increases fruit set and yield (Klein, Steffan-Dewenter and Tscharntke, 2003; Hoehn et al., 2008), and that wild bees are vital for this increase (Holzschuh, Dudenhöffer and Tscharntke, 2012; Garibaldi et al., 2013). Pollinators are similarly vital in semi-natural ecosystems, where they increase floral diversity and are therefore important contributors to those ecosystems. In an environment such as the UK, where pollinators face rapid change and numerous threats, it is also important to think about the resilience of a community. Reducing conservation of pollinators to the small number of bee species which are currently the most important crop pollinators fails to provide resilience to change. Ensuring a diverse community of pollinators means that they are more likely to be able to adapt to changes and threats in the environment (Deepa Senapathi et al., 2015), and continue to provide a pollination service.

Syrphid pollinators

In this thesis the focus is on the Syrphidae family (Diptera), which comprises some 5000 species and is a large group of flower-visiting flies that provide important pollination services (Jauker and Wolters, 2008). They are commonly known as hoverflies in the UK, where there are 280 species (Ball and Morris, 2015). Recent research has shown that there has been a long-term decline in syrphids across the UK, and that the rate of decline is markedly different from that of UK bees (Powney et al., 2019), making them an important group for monitoring and research in the UK.

One of the reasons for the difference in decline is that syrphids respond differently to land use change in comparison to most bee species (Jauker et al., 2009). Some research has shown syrphid species diversity decreases with an increasing urban landscape (Bates et al., 2011). This may be down to a reduction in potential larval habitat, or because of the abundance of complex floral morphologies which prevent flies accessing pollen (Geslin et al., 2013). Unlike bees, whose larvae feed solely on provisions from flowers, syrphid larvae have a large range of larval habitats and diets (Ball and Morris, 2015). Larval life histories range from predatory to aquatic to phytophagous. Aphidophagous species in particular may be able to thrive in agricultural environments and field edges (Sutherland, Sullivan and Poppy, 2001; Haenke et al., 2009), and may provide important biocontrol (Pascual-Villalobos et al., 2006).

Adult syrphids also have markedly different ecology from bees, which impacts their response to land use change. It has been suggested that syrphids can travel longer distances than bees, since they do not have to return to a nest site (Rader et al., 2011). This would make them more able to survive in substandard environments, and more adaptable to change. However, this distance is not known, nor are the impacts of travelling long distances in search of floral resources. There is evidence that syrphids are generalist pollinators, allowing them to exploit the floral resources that are available (Branquart and Hemptinne, 2000). However, the morphology of syrphid mouthparts means that flower morphology is important, with fewer open flowers resulting in fewer flower visits (Geslin et al., 2013). These complex interactions and the large number of factors involved mean that it is hard to predict how species will respond to land use change without evidence (Bartomeus et al., 2018).

There has been little research into how other known threats to pollinators affect syrphids. It has been found that neonicotinoids have no effect on the potentially vulnerable aquatic larvae of some species (Basley et al., 2018); however, the impact of neonicotinoids in pollen on adult flies is unknown. In bees, disease is a stress factor, but the effect of disease on syrphids is less well known. It is unlikely that disease is a huge issue in this family as they do not form social nests or aggregations, unlike large numbers of bee species, and so there is less chance of disease spreading through a population. However, it has been suggested that flowers are important points of disease and parasite exchange (Schwarz and Huck, 1997), since many different individuals of different species can visit a single flower. Alongside this, it was recently found that syrphids are capable of spreading bee diseases between flowers, potentially increasing bees' exposure to disease (Bailes et al., 2018). Surprisingly, there has been no research into whether these diseases were affecting the syrphid carriers, or how these diseases may be impacting syrphid populations.

Monitoring

The UK has monitoring programmes for many different groups, for one of two reasons. The first type of monitoring covers animals and plants which the UK has a legal obligation to monitor. For example, the great crested newt must be surveyed for in areas where there are planned developments (Rees et al., 2014). The second type of monitoring is for groups for which long-term data is available regarding trends and habitat, tracking populations and communities over time to detect any large changes. There are several examples of this, including the UK bat survey (Barlow et al., 2015) and the plant monitoring scheme of Britain and Ireland (Pescott et al., 2015). One example of how these long-term monitoring schemes can be vital for detecting change is the UK breeding birds survey. This has been running for four years and provided sufficient long-term data on breeding birds in the UK to detect changes in their migration phenology (Newson et al., 2016). These long-term monitoring programmes are vital for conservation, especially with the increasing threats of climate change and land use change. Understanding how populations and communities are changing is the first step towards ensuring they are conserved.

The majority of monitoring in the UK is carried out by volunteers who give their time and expertise for free to record wildlife. This enables large-scale long-term monitoring at much lower costs than it would take using paid experts, and thus increases the feasibility and longevity of programmes. However, setting up a new monitoring programme is a large and expensive undertaking because the surveying needs to be planned in a way that will provide meaningful data, and long-term survey sites need to be identified and accessed. Monitoring programmes also require volunteer coordination and the ability to compile, store and analyse the data collected. In 2015, a pilot was launched with Defra and The Centre for Ecology and Hydrology (CEH) to monitor bee and syrphid pollinators across the UK (Carvell et al., 2016). This has resulted in a programme which is a combination of volunteer and expert identification. Volunteers set up pan traps and collect insects which are sent to paid taxonomists to be identified. They also conduct flower visitor observations (FIT counts), by recording all of the insects that visit a flower for 5 minutes.

For many monitoring programmes, such as the breeding bird survey, volunteers are able to identify species during the survey. This is because there are a lot of amateur experts who take part in surveys and have a wealth of knowledge about the groups they are surveying (Newson et al., 2016), but also because it is fairly easy to identify some groups from sight. However, this is not the case for the pollinator survey, where many species of bee and syrphid are difficult to identify, and some require careful examination out of the field. Because of this, volunteer observations only distinguish between groups of pollinators such as bee, syrphid and other Diptera. This allows a large amount of data to be collected by volunteers; however, it does not give the species-level data required to understand detailed differences in communities over time. In order to obtain this data, the scheme also includes pan trapping in 12 locations five times over the course of the summer months. This provides a snapshot of the pollinator community with the bees and syrphids identified by expert taxonomists to species level. Although pan trapping involves lethal sampling, where individuals are removed from the environment, a recent study by Gezon et al. (2015) found that lethal sampling does not impact insect populations.

DNA monitoring

Paying taxonomists to identify a large sample of diverse insects is time consuming and expensive. For a group such as pollinators it most likely requires several taxonomists with different specialisms due to the different groups present. Currently, this makes the monitoring programme expensive, and may result in long waiting periods while the specimens are identified. In the future this may become more of an issue since the number of taxonomists is declining around the world, resulting in less expertise for identifying these groups. Recent studies have shown that DNA can provide accurate identification of specimens to species level (Creedy et al., 2019). Utilising DNA in the correct manner could be a beneficial way forward for taxonomists and molecular biologists alike, as it would allow taxonomists to transfer their skills from routine species identification to species discovery and identification of cryptic species. It is important to emphasise that a move to using DNA methods for species identification cannot be done without the vital input of taxonomists, who are indispensable for identifying reference specimens, validating DNA methods and establishing species identification for cryptic species that DNA cannot separate.

Recently, this move towards using DNA to monitor plants and animals has been gaining interest from policy makers, and for some groups DNA is already employed in monitoring. The most well established use of DNA monitoring in the UK is using environmental DNA (eDNA) to detect the presence of great crested newts (Rees et al., 2014; Biggs et al., 2015). This eDNA method does not require sequencing, making it faster and cheaper than other DNA methods. Instead, it utilises species-specific primers which exclusively amplify great crested newt DNA using qPCR, and it is now successfully used in monitoring of this species across the UK. In the case of great crested newts, a single species is being detected. This is different to monitoring of a mixed community, where a number of different species are present. If there were only a few species of interest, then the same methods could be applied but with several different primer sets for different species. However, for studies where there are many different species of interest, or where the focus is not on particular species but the community as a whole, a broader approach must be taken, using methods such as DNA barcoding, metabarcoding or genome skimming.

There are several DNA techniques that are not yet used in UK monitoring, which would enable large scale analysis of diverse samples. For moderately diverse samples, Illumina barcoding can be used. In this method, individuals from a sample are separated, and then DNA is extracted separately from each. When the barcoding region is amplified using PCR, a DNA tag is attached to the primers, allowing each sequence to be traced back to its specimen. Using different tags means multiple samples can be pooled together. Post sequencing, the tags are used to separate out each sample (Shokralla et al., 2015; Creedy et al., 2019). This has the benefit over other bulk sequencing methods of retaining the link between specimen and sequence, resulting in more powerful data which can be verified easily, and giving accurate abundance data. This is important in monitoring, where the abundance of species present in populations is vital information.
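The tag-based separation described above can be sketched in a few lines of Python. This is a minimal illustration only, not the pipeline used in the cited studies: the tag sequences, specimen names and reads are hypothetical, and it assumes each read begins with an error-free tag of fixed length.

```python
from collections import defaultdict


def demultiplex(reads, tag_to_specimen, tag_length):
    """Group pooled reads by the tag at their start and trim the tag off.

    Assumes (for illustration) that tags are error-free and of fixed length.
    """
    by_specimen = defaultdict(list)
    unassigned = []
    for read in reads:
        tag, insert = read[:tag_length], read[tag_length:]
        specimen = tag_to_specimen.get(tag)
        if specimen is None:
            # Tag not recognised: keep the read aside rather than guess.
            unassigned.append(read)
        else:
            by_specimen[specimen].append(insert)
    return dict(by_specimen), unassigned


# Hypothetical tags and reads, invented for this example.
tags = {"ACGT": "specimen_01", "TGCA": "specimen_02"}
pooled = ["ACGTGGATTA", "TGCAGGCTTA", "ACGTGGATTA", "NNNNGGATTA"]
assigned, unassigned = demultiplex(pooled, tags, tag_length=4)
```

Because every read is assigned back to one specimen, counting specimens per species afterwards gives abundance directly, which is the property the paragraph above highlights.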

For larger and more diverse samples, metabarcoding can be used. This technique uses PCR of a chosen barcode on a large diverse sample (Yu et al., 2012). A mixed sample is used, with multiple unknown species present, such as from a highly diverse soil community (Arribas et al., 2016). Identifying these species morphologically requires a lot of taxonomic expertise and could take years, and so molecular identification allows monitoring of biodiversity in these species-rich ecosystems. This methodology can be used for mixed samples of invertebrates, and is particularly useful for highly diverse communities such as rainforest canopy insects (Creedy, Ng and Vogler, 2019), and for challenging groups such as those found in soil communities (Andújar et al., 2015). Pan trap samples such as those obtained by the pollinator monitoring scheme (Carvell et al., 2016) would be ideal for metabarcoding, as they contain a diverse community of insects. Currently only the bees and syrphids from these samples are identified, and including molecular analysis of the bulk samples could therefore increase the amount of data gained from this monitoring scheme.

In order to identify the different species correctly a well-curated reference database is required, which provides a connection between the DNA and the traditional taxonomy of species. Monitoring programmes mostly require species level identifications, making DNA monitoring without linking to taxonomic species less useful. These reference databases are already being developed, particularly at country level. For example, there is now a curated reference database for Canadian bee species (Sheffield et al., 2017). For reference databases to be robust, they should ideally be linked to a database of specimens, so that the identifications can be verified. We already have long-term storage of specimens in museums, and these institutions are preserving specimens for DNA analysis. An example is the Natural History Museum in London which contains a molecular collection facility where specimens and DNA can be stored for future use.

There has been increasing focus on developing robust reference databases for monitoring fauna, and reference databases for regional bee fauna have been developed in Canada (Sheffield et al., 2017), Chile (Packer and Ruz, 2017), Ireland (Magnacca and Brown, 2012) and the UK (Creedy et al., 2019). There has also been development of a reference database for Afrotropical hoverflies (Jordaens et al., 2015). On a larger scale, the Barcode of Life Initiative have a long running project to barcode all life on Earth (Savolainen et al., 2005), which has resulted in large initiatives such as the German Barcode of Life Initiative, to barcode all German biodiversity (Geiger et al., 2016).

Plant – pollinator networks

Increasingly, research into pollinators is encompassing a network approach, as interactions between pollinators and plants are vital to understanding the ecology, threats and changes (Vanbergen, 2014) in these communities. Studies have shown that the structure of a network influences the response to habitat loss (Fortuna and Bascompte, 2006), and that species loss does not have the same effect on networks as interaction loss (Santamaría et al., 2016), suggesting that researching and quantifying pollinator communities cannot be done using species diversity and abundance alone (Forup et al., 2007). Networks also potentially give a greater understanding as to why changes are occurring. For example, they help to explain why syrphid flower visitation is lower in urban areas, which is likely due to an abundance of complex floral morphologies (Geslin et al., 2013), and why specialised pollinators are more vulnerable to change (Weiner et al., 2014).

Pollination networks are often constructed using data from flower visits, collected by observing flowers and recording insect visitors for a set length of time (Garbuzov, Samuelson and Ratnieks, 2015). Just recording visits assumes that every visit results in pollination, and so several studies have translated visitation into pollination using exclusion experiments (Ballantyne, Baldock and Willmer, 2015). These have shown that visitation is a useful proxy for pollination. However, these methods are limited in the area over which networks can be surveyed. Recording flower visits is labour intensive and requires participants with good taxonomic expertise. These considerations mean that implementing a large-scale monitoring programme for pollination networks is not a feasible management option. There is therefore a demand for a new methodology for establishing pollination networks, which addresses these concerns and presents a viable management option for monitoring this ecosystem service.

Pollination networks involve an exchange of plant material from the plant to the insect in the form of pollen, therefore providing a record of which plant species an individual has visited (Tur et al., 2014). By using pollen on an insect’s body to establish network links, the need for lengthy field observations is removed. Identifying pollen to species is already used in forensic biology (Karen L Bell et al., 2016), as well as for identifying honey sources (Hawkins et al., 2015), and has been shown to identify network links which were missed in field observations (Bosch et al., 2009). However, morphologically identifying pollen is time consuming, and still requires taxonomic expertise, which is often a limited resource. Identifying pollen under the microscope can be especially difficult in some families such as Rosaceae, where pollen cannot be separated to species level (Kendall and Solomon, 1973). Rather than identifying pollen from morphology, molecular methods can be used to establish the plant species present. These methods are well suited to processing large amounts of data, thus enabling large scale monitoring of the pollination system (Baird, Hajibabaei and Brunswick, 2012), as well as being well suited to identifying hidden interactions such as pollen on or inside an insect (Evans et al., 2016).

DNA has been used to identify pollen collected from hives (Keller et al., 2015), and from honey (Bruni et al., 2015; Hawkins et al., 2015; de Vere et al., 2017), where there is a large mixed sample of pollen. Alongside this, small mixed samples of pollen from individual insects have also been used to establish individual visitation networks, with metabarcoding of pollen at the specimen level (Lucas et al., 2018). Here, each individual is treated like a community, with the diverse pollen in or on the insect treated as a soil community would be (Arribas et al., 2016), or a pollen sample taken from a hive.
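As a rough illustration of how specimen-level pollen records could be aggregated into a visitation network, the sketch below counts, for each insect species and plant taxon, the number of individuals carrying that pollen. The species labels and records are hypothetical placeholders, and counting individuals rather than sequence reads is an assumption made for this example, not a method taken from the cited studies.

```python
from collections import Counter


def build_network(individuals):
    """Count individuals per (insect species, plant taxon) link.

    `individuals` is a list of (insect_species, plant_taxa) pairs, one per
    specimen. A plant taxon is counted once per individual, however many
    times it was detected in that individual's pollen sample.
    """
    links = Counter()
    for insect_species, plant_taxa in individuals:
        for plant in set(plant_taxa):
            links[(insect_species, plant)] += 1
    return links


# Hypothetical specimen-level records, invented for this example.
records = [
    ("Syrphid sp. 1", ["Plant X", "Plant Y"]),
    ("Syrphid sp. 1", ["Plant X"]),
    ("Syrphid sp. 2", ["Plant Y", "Plant Y"]),
]
network = build_network(records)
```

Keeping the individual as the unit means the same records can also be summarised per specimen, for example to ask how many plant taxa a single insect carried.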

There are some complications with DNA barcoding of plants. Unlike with insects, where the CO1 barcoding region is accepted as a barcode with appropriate inter-specific variation, there is not a single barcode which can be used to identify all plant species (CBOL Plant Working Group et al., 2009). This is due to differences in variability in different plant groups, meaning that more than one barcode should be used to identify pollen. Studies have used rbcL (de Vere et al., 2012), trnL (Taberlet et al., 2007), ITS2 (Yao et al., 2010) and matK, although a major constraint is the reference database. If pollen in the unknown sample comes from a plant species that is not present in the reference database, then that species will remain unidentified. This limitation is true both for barcoding of the insects and of the plants, and therefore an important step in developing a framework for monitoring pollination networks is to build a well curated reference dataset of plant and syrphid barcodes (de Vere et al., 2012).
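The reference database constraint can be made concrete with a small sketch. The function below assigns a query barcode to the best-matching reference sequence only if the match reaches a similarity threshold, and otherwise returns no identification. The sequences, species labels and threshold are illustrative assumptions; real analyses would use an alignment or search tool rather than this position-by-position comparison of pre-aligned sequences.

```python
def identity(a, b):
    """Proportion of matching positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def assign_species(query, reference, threshold):
    """Return the best-matching reference name, or None if below threshold."""
    best_name, best_score = None, 0.0
    for name, seq in reference.items():
        score = identity(query, seq)
        if score > best_score:
            best_name, best_score = name, score
    # A species missing from the reference can never be returned here.
    return best_name if best_score >= threshold else None


# Hypothetical reference barcodes, invented for this example.
reference = {"Species A": "ACGTACGTAC", "Species B": "TTGTACGTCC"}
known = assign_species("ACGTACGTAC", reference, threshold=0.9)
unknown = assign_species("GGGGGGGGGG", reference, threshold=0.9)
```

The second query has no close match, so it stays unidentified however good the sequencing is, which is why reference coverage limits both the insect and the plant side of a network.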

Phylogenetics

Currently, reasons for using DNA in monitoring and conservation focus on fast and accurate identification and high-throughput data analysis. However, DNA also provides the opportunity to gain more information from the data than traditional methods in the form of potential associations, such as detection of gut contents, parasites and disease. DNA also allows the analysis of phylogenetic diversity, which adds an extra measure of the health and resilience of a community alongside species diversity and abundance. Measures such as phylogenetic diversity and species diversity may react differently to changes in the environment (De Palma et al., 2017), and so not including these measures may result in important information for conservation being lost. This can be important for pollination, as a study by Grab et al. (2019) found that increasing agricultural land led to a decrease in phylogenetic diversity, alongside a decrease in pollination services.
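To show how phylogenetic diversity can differ between communities with the same number of species, the sketch below computes Faith's phylogenetic diversity, one commonly used measure, as the total branch length connecting a set of taxa to the root of a tree. The tree, branch lengths and taxon names are invented for illustration, and the choice of Faith's measure is an assumption rather than a statement of the metric used in the studies cited above.

```python
def faith_pd(tree, taxa):
    """Sum branch lengths on the union of paths from each taxon to the root.

    `tree` maps each node to a (parent, branch_length) pair; the root has
    no entry of its own.
    """
    visited = set()
    total = 0.0
    for taxon in taxa:
        node = taxon
        # Walk towards the root, stopping where a path was already counted.
        while node in tree and node not in visited:
            visited.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total


# Hypothetical tree: A and B are sister taxa, C is a distant relative.
tree = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "n1": ("root", 2.0),
    "C": ("root", 3.0),
}
close_pair = faith_pd(tree, ["A", "B"])
distant_pair = faith_pd(tree, ["A", "C"])
```

Both communities contain two species, but the pair of distant relatives spans more branch length, so a change in land use that replaces distant relatives with close ones would lower phylogenetic diversity without changing species richness.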

Sequencing the large amount of data needed for phylogenetics can be an expensive process; however, sequencing costs are falling and new techniques allow more data to be sequenced at once, thus further reducing costs (Shapland et al., 2015). Mitochondrial metagenomics is a technique by which multiple species can be sequenced in a single pooled library, enabling full mitochondrial genomes to be obtained for many specimens at once (Andújar et al., 2015). The DNA is sequenced without PCR amplification, thus reducing cost and potential bias towards certain species. This technique has been used to produce large amounts of genetic data to create well resolved phylogenetic trees in diverse groups (Crampton-Platt et al., 2015), and has been used to monitor wild bee populations (Tang et al., 2015).

There have been several published phylogenies of the Syrphidae family, although they generally have low numbers of taxa (Skevington and Yeates, 2000; Pauli et al., 2018) and low amounts of genetic data (Ståhls et al., 2003). There is often a trade-off between the amount of genetic data included in a phylogeny and the number of taxa included, because generating large amounts of data for large numbers of taxa is time consuming and expensive. However, using mitochondrial metagenomics helps to reduce the cost of generating whole mitochondrial genomes for large numbers of specimens (Andújar et al., 2015). A study using mitochondrial genomes for deep relationships in Diptera found that they were an informative source of phylogenetic data, resulting in topologies which agreed with consensus (Cameron et al., 2007). There are some mitochondrial genomes available for Syrphidae: two from Li (2019) and five from Sonet et al. (2019). However, more mitochondrial genomes are needed for a comprehensive phylogeny. Due to the limited number of taxa in current phylogenies, the focus of the topologies has been the relationship between the three subfamilies, Syrphinae, Eristalinae and Microdontinae. Ståhls et al. (2003) found the three subfamilies to be monophyletic, but other studies have found Eristalinae to be paraphyletic (Skevington and Yeates, 2000; Mengual, Ståhls and Rojo, 2015; Pauli et al., 2018).

A well-supported phylogeny with well-represented taxa would allow phylogenetic diversity to be investigated along with other measures of diversity. It also allows investigation of evolution of traits. In the case of syrphids, the diverse range of larval diets may be important when looking at species composition and habitat type, and give a greater understanding of the evolution of this large family of pollinators.

Conclusion

UK pollinators are an important community which contribute to agricultural yields as well as the maintenance of many habitat types across the country. They are often in the news; however, most attention and research focus is on bees, which, although important pollinators, form only a fraction of the taxonomic diversity of flower visitors. The Syrphidae family are an important pollinator group in the UK and appear to be responding differently from bees to threats such as land use change and climate change. DNA offers an opportunity to identify and survey this family in an efficient and accurate manner, whilst also increasing the data to include plant visitation and phylogenetic diversity. In this thesis the evolutionary history of the Syrphidae family will be explored with the largest Syrphidae phylogeny to date, resulting in further insights into the larval life histories of this family. The feasibility of identifying and monitoring syrphids using DNA will be investigated, before integrating pollen network data alongside this to investigate how syrphids are using a diverse mosaic landscape. Finally, the scope of this thesis will be expanded to investigate the non-bee and non-syrphid flower visitors across the UK, to give a fuller picture of UK pollinator diversity using DNA techniques.

Chapter 1: Syrphidae Mitochondrial Genomes and Phylogeny

Introduction

Pollinators are the subject of conservation, monitoring and research efforts around the world, many of which are investigating the species diversity of communities. However, functional and phylogenetic diversity are also important for understanding their resilience. Communities with low phylogenetic diversity are more likely to be impacted by a sudden change to the environment, and there is evidence that some insect communities are becoming less evolutionarily diverse in agricultural land (Grab et al., 2019). This is important for pollination of crops, for which it has been shown that an increase in species diversity leads to an increase in fruit set (Holzschuh, Dudenhöffer and Tscharntke, 2012; Garibaldi et al., 2013). The lack of a robust and well sampled phylogeny for the Syrphidae family means that there is a knowledge gap around how evolutionary diversity is impacting this important pollinator group.

The most recent tree of the Syrphidae is from 2016 (Young et al., 2016), using hybrid enrichment to obtain 559 loci for 30 species of Syrphidae, which established the relationships between the three subfamilies. Earlier trees used mitochondrial and nuclear genes, as well as morphological characters to look at the relationships (Skevington and Yeates, 2000; Ståhls et al., 2003; Mengual, Ståhls and Rojo, 2015). These studies have found the subfamily Microdontinae to be sister to the rest of Syrphidae, and Syrphinae to be monophyletic within a paraphyletic Eristalinae. These studies contain varying amounts of molecular data, but all contain low numbers of species. There is therefore a need for a phylogeny with a large number of taxa and a large amount of molecular data.

There is very little molecular data publicly available for the Syrphidae family beyond CO1 barcodes, which limits the confidence in current phylogenetic analysis. To date, only three mitochondrial genomes are available on GenBank, in addition to five mitochondrial genomes from Sonet et al. (2019), which together represent only six genera. Recent advances in sequencing and bioinformatics have allowed an increasingly large amount of data to be obtained, although large scale sequencing of genomes is still expensive. Mitochondrial genomes are much easier to obtain than nuclear genomes, due to their smaller size (around 15,000 bp in insects) and the fact that they are present in higher copy number in cells. This makes them amenable to genome skimming, i.e. genome assembly from low-coverage shotgun sequencing of total DNA (Straub et al., 2012). In a technique known as mitochondrial metagenomics (MMG), several specimens can be sequenced together in a single pooled library (Tang et al., 2014; Crampton-Platt et al., 2015). The mitochondrial genomes can then be assigned to the original specimens in the pool using bait sequences, usually of the CO1 barcode. This hugely reduces the cost of sequencing each individual, making mitochondrial genomes fairly easy to obtain for larger numbers of individuals. However, working with mixtures can lead to the formation of chimeras if sequences are assembled incorrectly after sequencing. To help avoid this, the specimen pooling is done to maximize the genetic distance between individuals in the pools. The approach has been applied successfully in phylogenetic studies of several groups, including Coleoptera (Crampton-Platt et al., 2015), high level Hymenoptera (Mao, Gibson and Dowton, 2015) and other groups within Diptera (Zhang et al., 2019).

Mitochondrial genomes can go a long way towards generating the desired greater number of nucleotides for each species, while also greatly expanding the number of specimens that can be sampled. MMG thus addresses a common trade-off in phylogenetics: because both more data and more species lead to a more robust tree, it removes the need to prioritise one over the other when building trees (Rokas and Carroll, 2002). However, as with any type of single locus marker, mitochondrial genomes suffer from non-uniform character variation that creates biases in the resulting tree searches and can lead to incorrect topologies, potentially with high support. Therefore, appropriate model choice and the use of efficient tree searches are critical to obtain the most realistic topology possible. Beyond parsimony, the most widely used model-based tree building method is Maximum Likelihood. This has efficient implementations in programs such as RAxML (Stamatakis, 2014), which is suitable for large datasets of the kind produced in MMG. Maximum likelihood methods can also deal with differences in the type of nucleotide variation within mitochondrial genomes. Genes are transcribed from different strands, leading to GC skew and great differences in codon usage, while rates vary among genes and between codon positions. Models of evolution can accommodate the heterogeneity in rates and nucleotide composition to various degrees and provide an accurate estimation of how the sequence is evolving and thus establish the best tree topology. The same is true of partitioning the data, which improves the models by estimating parameters separately for different portions of the data, for example different genes within an analysis or different codon positions.

Bayesian analysis can potentially implement more complex models, due to a more efficient search strategy using MCMC chains. The basic implementations use the same GTR (general time reversible) models as the popular likelihood approaches, but they provide an alternative implementation of the tree searches which can potentially reveal nodes of low confidence or other weaknesses in the inferences. In addition, Bayesian approaches allow the application of multiple independent evolutionary models with their own rate parameters and substitution matrices. In programs such as BEAST (Drummond et al., 2012), molecular clock models are also implemented, which can be allowed to vary across the tree. This allows the dating of trees and calculation of the evolutionary rate, which provides greater information about the evolution of the family. However, for large trees implementing these methods can be time consuming, and so methods such as least squares allow dating using a Gaussian model to estimate substitution rates and ages of ancestral nodes across the tree (To et al., 2016) in a fast and accurate manner.

In this study the aim is to increase the number of mitochondrial genomes available for the Syrphidae family and use these to produce a phylogenetic tree with a large number of taxa to investigate subfamily and tribe level relationships. This tree will then be further used to look at the evolutionary rates and relative ages of important nodes within this family, and to map the evolution of the diverse larval life histories across the phylogeny.

Methodology

Taxa Choice and DNA preparation

DNA for the mitochondrial genome sequencing was obtained through the Canadian National Collection (CDC), and the specimens were selected from the collections at the CDC. The species were chosen to represent the spread of genera and subfamilies, according to the topology of Young et al. (2016). Several species from closely related families were also chosen to form an outgroup. DNA was extracted from the samples by the collaborators at the CDC and DNA concentrations were measured using the Qubit high sensitivity kit. The specimens were identified by taxonomists, and alongside the DNA for shotgun sequencing the CO1 barcoding region was amplified and Sanger sequenced to obtain a reference barcode for each specimen, which could be used later in the analysis as a bait to identify contigs.

The DNA pooling was based firstly on the taxonomy of the specimens, and a CO1 tree was generated in RAxML using the CO1 barcodes to check the phylogenetic relationships. The libraries were designed to maximize phylogenetic diversity and to have equimolar concentrations as in previous metagenomic studies (Gillett et al., 2014). Specimens from the same genera were not pooled in the same library, as these are highly likely to form chimeras during the assembly, and it can be difficult to distinguish closely related species using CO1 barcodes. The specimens were pooled so that each library had equimolar concentrations of each specimen, giving a total of 200ng of DNA per library. Where possible with taxonomy constraints, samples with similar concentrations were pooled together. This meant that high concentration samples were not diluted to compensate for low concentrations, increasing the potential to recover these specimens. It also meant that low concentration samples were more likely to be recovered, as even after equimolar pooling high concentration samples likely have higher quality DNA.
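The pooling arithmetic described above can be sketched as follows. This is a minimal illustration only, assuming that an equal DNA mass per specimen is used as a proxy for equimolar input; the specimen names and Qubit concentrations are hypothetical and are not taken from the libraries in this study.

```python
def pooling_volumes(concentrations_ng_per_ul, total_ng=200.0):
    """Return the volume (in microlitres) of each specimen for an equal-mass pool.

    Assumption: equal mass per specimen approximates equimolar pooling.
    """
    if not concentrations_ng_per_ul:
        raise ValueError("at least one specimen is required")
    # Each specimen contributes the same share of the 200 ng library total.
    mass_per_specimen = total_ng / len(concentrations_ng_per_ul)
    volumes = {}
    for name, conc in concentrations_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"{name}: concentration must be positive")
        volumes[name] = mass_per_specimen / conc
    return volumes


# Hypothetical library of four specimens (concentrations in ng/µl).
library = {
    "specimen_A": 10.0,
    "specimen_B": 20.0,
    "specimen_C": 5.0,
    "specimen_D": 40.0,
}
volumes = pooling_volumes(library)
pooled_mass = sum(volumes[name] * library[name] for name in library)
```

In this sketch the lowest concentration specimen requires the largest volume, which illustrates why grouping specimens of similar concentration avoids heavy dilution of the high concentration samples.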

After the samples were pooled, they were dehydrated and shipped to the UK where they were re-hydrated and sent for sequencing on an Illumina HiSeq 2x250. The pooled libraries were re-run on the same Illumina HiSeq after the results from the first sequencing run were obtained, due to the low number of contigs obtained from the first run, and the lower than expected number of overall reads that the first run achieved.

Assemblies

Post sequencing quality analysis for each library was carried out using FastQC, and remaining Illumina adapters were removed using Trimmomatic (Bolger, Lohse and Usadel, 2014). Prior to assembly the dataset was filtered for potential mitochondrial reads against a database of Dipteran mitochondrial genomes, using dc-megablast under low stringency conditions to minimize loss of target reads. Putative mitochondrial reads from this step were extracted using FastqExtract3 and subjected to genome assembly using three different methods: Ray (Boisvert, Laviolette and Corbeil, 2010), SPAdes (Bankevich et al., 2012) and IDBA. Assemblies from each procedure were imported into Geneious (Kearse et al., 2012) and de novo assembled to produce super-contigs from the primary assemblies, which generally produced more and longer contigs than any one assembler on its own.

Gene predictions were obtained using the MITOS server (Bernt et al., 2013), based on existing annotations for a range of invertebrate mitochondrial genomes. The annotations were manually edited to obtain the correct start and (full or partial) stop codons, selecting among possible alternative start and stop codons by minimizing the intergenic spaces and overlap of genes. For simplicity, once several full length contigs had been annotated in this way, these were used as a reference genome for the rest of the unannotated genomes. Homology of the start and stop codons was tested by alignment for each gene using Muscle (Edgar, 2004). Full mitochondrial genomes were circularised.

A database of CO1 barcodes was generated for the same specimens included in the MMG libraries, to be used as ‘baits’ for identification of contigs. All contigs over 2kb were identified by blasting against the baits, with a positive identification if >98% match was found. Some contigs were unable to be identified in this way because they lacked CO1, but additional identifications were obtained by placement on a phylogenetic tree of identified and unidentified contigs. Each gene was extracted and aligned separately using the Muscle aligner, then concatenated with the SeqCat.pl script, before a tree was run in RAxML (Stamatakis, 2014) on the Cipres Science Gateway (Miller, Pfeiffer and Schwartz, 2010).
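The bait-matching rule described above (contigs over 2 kb, identified if a bait matches at more than 98%) can be sketched as a filter over BLAST tabular output. This is an illustrative sketch rather than the script used in this study: it assumes the default 12-column tabular format (`-outfmt 6`), and the contig names, bait names, contig lengths and hit values are all hypothetical.

```python
import csv
import io


def identify_contigs(blast_tsv, contig_lengths, min_length=2000, min_identity=98.0):
    """Assign each long contig to its best-scoring bait above the identity cut-off.

    Columns follow BLAST tabular format 6: query id, subject id, percent identity,
    ..., bit score (12th column).
    """
    best = {}
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row:
            continue
        contig, bait = row[0], row[1]
        identity, bitscore = float(row[2]), float(row[11])
        if contig_lengths.get(contig, 0) <= min_length:
            continue  # only contigs over 2 kb are considered
        if identity <= min_identity:
            continue  # a positive identification needs a >98% match
        if contig not in best or bitscore > best[contig][1]:
            best[contig] = (bait, bitscore)
    return {contig: hit[0] for contig, hit in best.items()}


# Hypothetical BLAST hits of contigs (queries) against the CO1 baits (subjects).
rows = [
    ["contig_1", "bait_Ceriana", "99.5", "658", "3", "0", "1", "658", "1", "658", "0.0", "1190"],
    ["contig_1", "bait_Psilota", "98.4", "658", "10", "0", "1", "658", "1", "658", "0.0", "1100"],
    ["contig_2", "bait_Psilota", "91.0", "658", "59", "0", "1", "658", "1", "658", "0.0", "800"],
    ["contig_3", "bait_Microdon", "100.0", "658", "0", "0", "1", "658", "1", "658", "0.0", "1215"],
]
blast_tsv = "\n".join("\t".join(r) for r in rows)
lengths = {"contig_1": 15200, "contig_2": 9000, "contig_3": 1500}
identified = identify_contigs(blast_tsv, lengths)
```

Here `contig_2` fails the identity threshold and `contig_3` is too short, so only `contig_1` is assigned. Contigs left unassigned in this way would be the candidates for identification by tree placement.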

Taxa with apparently misplaced positions based on the current taxonomy (Young et al., 2016) were investigated for potential chimera formation during assembly from mixed samples. All mitochondrial CO1 sequences from the potentially chimeric genera available from GenBank (Clark et al., 2015) and BOLD (Ratnasingham et al., 2007) were obtained, and aligned with the mitogenome CO1. Alongside this, all other genes were aligned to produce separate gene trees. Major discrepancies in the phylogenetic position of the focal sequences in the different gene trees were taken as evidence for chimera formation.

Phylogenetic analysis

Phylogenetic trees were generated from the 13 protein coding and two rRNA genes from all identified contigs. Maximum likelihood analysis was conducted using RAxML version 8.2.12 (Stamatakis, 2014) on the Cipres Science Gateway (Miller, Pfeiffer and Schwartz, 2010) and visualised in FigTree. A total of five different partitioning schemes were applied. The tree searches were performed with each of the 15 genes partitioned separately, or with the genes partitioned by forward and reverse strand. Both of these schemes were also run with additional partitioning for each of the three codon positions. A fifth analysis was run with partitioning by the three codon positions only. The GTR+I+G model was used for all of the analyses, estimating model parameters for each partition separately.
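As an illustration of how such schemes are passed to RAxML, the sketch below writes partition definitions in the RAxML partition file format, where `\3` selects every third site. It is not the partition file used in this study: the gene lengths are hypothetical, and keeping the rRNA genes as single partitions under codon partitioning is an assumption.

```python
def partition_lines(gene_lengths, by_codon=False, rrna=("rrnL", "rrnS")):
    """Build RAxML partition definitions for genes concatenated in the given order.

    Assumption: rRNA genes have no codon structure and stay as single partitions.
    """
    lines = []
    start = 1
    for gene, length in gene_lengths:
        end = start + length - 1
        if by_codon and gene not in rrna:
            # One partition per codon position, stepping through the gene in threes.
            for pos in range(3):
                lines.append(f"DNA, {gene}_pos{pos + 1} = {start + pos}-{end}\\3")
        else:
            lines.append(f"DNA, {gene} = {start}-{end}")
        start = end + 1
    return lines


# Hypothetical alignment lengths for three of the 15 genes.
genes = [("nad2", 1020), ("cox1", 1536), ("rrnL", 1300)]
by_gene = partition_lines(genes)
by_gene_and_codon = partition_lines(genes, by_codon=True)
print("\n".join(by_gene_and_codon))
```

The resulting lines would be saved to a text file and supplied to RAxML with the `-q` option.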

The RAxML tree was used to estimate the rate of evolution and the relative divergence dates of the nodes on the tree. This was done using least squares analysis (To et al., 2016), with the most recent common ancestor (the root) given a divergence time of 0 and the tips a time of 1. This allowed the calculation of relative divergence dates of nodes. The tree was rooted using Epalpus signifer, and the other outgroup sequences were removed as only the divergence times of the Syrphidae family were of interest. Confidence intervals were produced for the relative divergence times by running a simulation of 1000 trees.

Finally, a Bayesian tree was created using BEAST v1.8.4 (Drummond et al., 2012) under a molecular clock model. An XML file was created in BEAUti, with 15 partitions, one for each of the 13 protein coding genes and the two rRNA genes. The best evolutionary model was selected for each gene alignment in jModelTest (Darriba and Posada, 2014) based on the AIC value. As a result, the GTR+I+G evolutionary model was used for all but one of the partitions; for nad4l, HKY+I+G was used. A lognormal relaxed clock was applied, with a birth and death model, which allows each lineage to speciate or go extinct at a fixed rate. Unlike the evolutionary models, the tree prior and clock model were allowed to vary between the partitions. Finally, the syrphid ingroup was constrained as monophyletic, ensuring that the tree was rooted by the outgroup node. The BEAST analysis was run on the Cipres Science Gateway for 50M generations, for 150 hours. The log file was visualised in Tracer to determine the burn-in. TreeAnnotator was used to summarise the sampled trees and determine posterior probability values, generating a maximum clade credibility tree and keeping target node heights. The tree was visualised in FigTree and the relative clock scale bar shown on the tree, along with the relative divergence dates of the nodes.
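The AIC-based model choice mentioned above can be illustrated with a short sketch, using AIC = 2k - 2 ln L, where k is the number of free parameters and ln L the log-likelihood. The log-likelihoods below are hypothetical, and the parameter counts cover the substitution model only (branch lengths are ignored, which is an assumption made for simplicity and does not affect the comparison between models on the same tree).

```python
def aic(n_parameters, log_likelihood):
    """Akaike information criterion: lower values indicate a preferred model."""
    return 2 * n_parameters - 2 * log_likelihood


# Hypothetical fits for a single short gene alignment.
# GTR+I+G: 5 rate + 3 frequency + 1 invariant-sites + 1 gamma parameter = 10.
# HKY+I+G: 1 rate ratio + 3 frequency + 1 invariant-sites + 1 gamma parameter = 6.
candidates = {
    "GTR+I+G": {"k": 10, "lnL": -1000.0},
    "HKY+I+G": {"k": 6, "lnL": -1003.0},
}
scores = {name: aic(fit["k"], fit["lnL"]) for name, fit in candidates.items()}
best = min(scores, key=scores.get)
```

In this made-up case the simpler model is preferred because its slightly lower likelihood is outweighed by the penalty on the four extra parameters, which is the kind of outcome that can lead to a simpler model being chosen for a short gene.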

Larval life history evolution

Life history information for larval stages of Syrphidae species was available for the UK (Ball and Morris, 2015), and so CO1 barcodes from UK syrphids were added to the phylogeny. This was done using sequences from 59 specimens collected over the summer of 2016 in East Anglia, which were sequenced using Illumina barcoding to give the 418 bp barcoding region of CO1 (Chapter 3). These sequences were aligned with the CO1 gene from the 93 mitochondrial genomes using Muscle (Edgar, 2004), and this alignment was concatenated with the other 12 protein coding gene alignments and the two rRNA alignments. This concatenated alignment was used to run a RAxML tree on the Cipres Science Gateway. The mitochondrial genome tree was used as a backbone to constrain the topology, so that the barcodes were added into the existing topology. This meant that the large amount of missing data would not affect the tree topology.

The tree was visualised in FigTree and then used to construct the larval life history evolution. The larval life histories of the mitochondrial genome specimens were generalised to genus level life histories. The evolution of these traits was mapped on to the tree in R using the package Phytools (Revell, 2012) in which simmap was used to create a simulation of how the character traits mapped onto the tree. The simulation was run 1000 times, and the output trees used to map the posterior probability of a larval life history occurring at each node.
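The final summary step, turning repeated simulations into a posterior probability for each life history at each node, can be sketched as a simple frequency count. This is not the phytools implementation and does not simulate character histories itself; the node names, life history labels and simulated draws are hypothetical placeholders.

```python
from collections import Counter


def node_posteriors(simulations):
    """Summarise per-simulation node states as posterior probabilities.

    `simulations` is a list of dicts, each mapping a node name to the state
    sampled for that node in one simulation.
    """
    counts = {}
    for sim in simulations:
        for node, state in sim.items():
            counts.setdefault(node, Counter())[state] += 1
    n = len(simulations)
    return {
        node: {state: count / n for state, count in counter.items()}
        for node, counter in counts.items()
    }


# Four hypothetical simulations over two internal nodes.
simulations = [
    {"node_A": "saprophagous", "node_B": "predatory"},
    {"node_A": "saprophagous", "node_B": "predatory"},
    {"node_A": "predatory", "node_B": "predatory"},
    {"node_A": "saprophagous", "node_B": "predatory"},
]
posteriors = node_posteriors(simulations)
```

With 1000 simulations, as used in this study, the same counting yields the proportion of simulations in which each life history was reconstructed at a node.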

Results

Assemblies

By combining the results of the two sequencing runs, a total of 94 contigs were obtained. The breakdown of this across the two runs can be seen in table 1, with the second run adding a total of 30 mitochondrial genomes to the dataset. It also increased the length of 28 contigs obtained in the first run, adding more genes to the dataset. The contigs ranged in length from 2,655bp to 17,574bp, with an average length of 12,493bp. 60 of the contigs had a length of over 10,000bp, and 58 of the contigs contained all 13 protein coding genes.

Library   First run contigs   First & second run   Tree identified contigs
1         9                   10                   0
2         7                   15                   2
3         9                   11                   0
4         8                   12                   2
5         7                   12                   1
6         10                  16                   0
7         9                   13                   0
Total     59                  89                   5

Table 1. The number of identified contigs recovered from each of the seven libraries in the first sequencing run and once the repeat sequencing run data had been added. The third column shows the extra contigs which were identified based on tree placement in each library.

Overall, 94 identified contigs were obtained. Of these, 89 contained CO1 and thus could be identified with the CO1 reference database, while the others were identified by their placement on the phylogenetic tree. A total of 81 of the contigs belonged to species of syrphid, and 13 were outgroup species. Three contigs with improbable phylogenetic placements were investigated for chimeric structure based on incongruence among gene trees. Two of them, including Lejota, were consequently removed from the analysis, whereas no clear evidence for a chimeric sequence was obtained for the third, Milesia, which was retained (Supplementary Material).

Gene      Number of sequences
nad2      81
cox1      90
cox2      90
ATP8      84
ATP6      82
cox3      80
nad3      79
nad5      75
nad4      73
nad4l     69
nad6      67
cytb      67
nad1      67
rrnL      65
rrnS      51

Table 2. The number of sequences present in each of the 13 protein coding and two rRNA gene datasets. This varies between genes due to the incompleteness of many of the contigs.

Phylogenetic analysis

The five RAxML trees obtained under different partitioning schemes were compared for topology and support of clades expected based on the current higher-level taxonomy (table 3). All of the trees found the Syrphidae family to be monophyletic. At the subfamily level, all found Microdontinae to be monophyletic, and sister to the rest of the Syrphidae family. In all trees Eristalinae was found to be paraphyletic with respect to a monophyletic Syrphinae that was embedded in Eristalinae near Rhinginii. Out of 18 tribes initially included in the MMG, sequencing for four of them (including Cheilosiini, Paragini and Spheginobaccha) was not successful. A further four tribes (Callicerini, Merodontini, Sericomyiini and Toxomerini) were only represented by a single mitochondrial genome. Of the 10 tribes remaining, all RAxML trees found two Eristalinae tribes, Milesiini and Brachyopini, to be non-monophyletic. Bachini was found to be paraphyletic, as has been found in previous studies (Mengual, Ståhls and Rojo, 2015; Young et al., 2016), and Toxomerini was embedded within Syrphini (table 3). The only difference between the trees at tribe level was that the monophyletic Rhinginii clade differed in the levels of support, with partitioning by gene and codon giving the highest bootstrap value of 81.

                                                        RAxML        RAxML        RAxML        RAxML genes  RAxML fwd+reverse
Clade                                                   fwd+reverse  genes        codon only   and codons   and codon
Syrphidae monophyletic                                  YES (100)    YES (100)    YES (100)    YES (100)    YES (100)
Syrphinae monophyletic within paraphyletic Eristalinae  YES (93)     YES (95)     YES (85)     YES (97)     YES (85)
  (Mengual, Ståhls and Rojo, 2015; Young et al., 2016)
Microdontinae sister to the rest of Syrphidae           YES (100)    YES (100)    YES (100)    YES (100)    YES (100)
  (Young et al., 2016)
Bachini polyphyletic                                    YES          YES          YES          YES          YES
  (Mengual, Ståhls and Rojo, 2015; Young et al., 2016)
Brachyopini monophyletic                                NO           NO           NO           NO           NO
Ceriodini monophyletic                                  YES (100)    YES (100)    YES (100)    YES (100)    YES (100)
[tribe name missing] monophyletic                       YES (100)    YES (100)    YES (100)    YES (100)    YES (100)
[tribe name missing] monophyletic                       YES (100)    YES (100)    YES (100)    YES (100)    YES (100)
  (Mengual, Ståhls and Rojo, 2008)
Milesiini monophyletic                                  NO           NO           NO           NO           NO
Rhinginii monophyletic                                  YES (69)     YES (74)     YES (43)     YES (81)     YES (48)
  (Mengual, Ståhls and Rojo, 2015)
Toxomerini embedded within Syrphini                     YES          YES          YES          YES          YES
  (as in Mengual, Ståhls and Rojo, 2015)
Volucellini monophyletic                                YES (100)    YES (100)    YES (100)    YES (100)    YES (100)
  (Mengual, Ståhls and Rojo, 2015)
Parhelophilus monophyletic                              NO           NO           NO           NO           NO
Allograpta monophyletic                                 YES (92)     YES (95)     YES (89)     YES (97)     YES (87)
Criorhina monophyletic                                  NO           YES (79)     YES (83)     YES (64)     NO
Ocyptamus monophyletic                                  NO           NO           NO           NO           NO

Table 3. The five different partitioning methods used in the RAxML analysis, showing the monophyly of different clades on the trees, in relation to those found in two recent phylogenetic studies of the Syrphidae family. If a clade is monophyletic then the bootstrap support value is shown in brackets.

At the genus level, there were seven genera with more than one mitochondrial genome present on the tree, and all but three of them were monophyletic on all trees. Allograpta was monophyletic but with differing clade support, with the genes and codon partitioned tree having the highest support (table 3). Parhelophilus and Ocyptamus were paraphyletic on all trees. The two trees partitioned by forward and reverse strands also found Criorhina to be paraphyletic. There were differing levels of support for the Criorhina clade, with the codon partitioned tree having the highest support (bootstrap = 83). Overall the two forward and reverse strand partitioned trees had lower support than the gene and codon only partitioned trees.

Selecting the 15 gene partitioned tree as a better partitioning scheme was supported by comparing the bootstrap values across the five trees. The two forward and reverse partitioned trees had 46.67% of branches with bootstrap > 80, compared to 51.69% in the gene partitioned tree and 51.11% for the gene and codon partitioned tree. Overall the gene partitioned tree had the highest bootstrap values, with an average of 70, and 30% with support values of 100. This was compared to 26.67% of bootstrap values = 100 for the gene and codon partitioned tree and for both the forward and reverse partitioned trees. The codon only tree had an even lower percentage of bootstrap values = 100 (23.33%). The overall higher support on the 15 gene partitioned tree, along with the high support for specific clades, meant that it was selected as the final tree, which is shown in figure 1.
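The support summaries used in this comparison (average bootstrap, percentage of branches above 80 and percentage equal to 100) can be computed as in the sketch below. The bootstrap values in the example are hypothetical and are not those of any tree in this study.

```python
def summarise_support(bootstraps, threshold=80):
    """Return the mean bootstrap and the percentage of branches above two cut-offs."""
    if not bootstraps:
        raise ValueError("no bootstrap values supplied")
    n = len(bootstraps)
    return {
        "mean": sum(bootstraps) / n,
        "pct_above_threshold": 100 * sum(b > threshold for b in bootstraps) / n,
        "pct_equal_100": 100 * sum(b == 100 for b in bootstraps) / n,
    }


# Hypothetical bootstrap values for the internal branches of one tree.
example_tree = [100, 100, 95, 81, 80, 60, 45, 39]
summary = summarise_support(example_tree)
```

Note that a branch with a bootstrap of exactly 80 is not counted by the strict "greater than 80" criterion. Applying the same function to each of the five trees gives directly comparable figures.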

The final RAxML tree in figure 1 shows the Syrphidae family as monophyletic, with the subfamily Microdontinae sister to the other two subfamilies. The subfamily Syrphinae is monophyletic, within a paraphyletic Eristalinae. The tribe level relationships on the tree are indicated in figure 1. Two tribes in Eristalinae, Milesiini and Brachyopini, are paraphyletic and found across the subfamily. Merodontini is found to be sister to the rest of the Eristalinae, with Ceriodini and then Volucellini branching off afterwards. Eumerini and Callicerini form a clade, as do the Eristalini and a large portion of the Milesiini sequences. Rhinginii is found to be sister to the Syrphini, along with one of the Milesiini sequences. Within the Syrphinae, the Bachini tribe is sister to the rest of the subfamily, although one Bachini sequence is placed within the Syrphini tribe. The Toxomerini tribe is found to be within Syrphini, making that tribe paraphyletic.

There are not many genera that contain more than one sequence on the tree, but of the seven that do, five are monophyletic on the tree in figure 1. The genus Ocyptamus has been recently reorganized and the sequences from the genera Hybobathus, Nuntianus and Victoriana are thought to belong within Ocyptamus. In figure 1 these sequences do form a clade with one of the Ocyptamus sequences; however, the other Ocyptamus sequence forms a clade with Orphnabaccha, which is sister to the Ocyptamus clade and Toxomerus. The other genus which is found to be paraphyletic is Parhelophilus, where the two sequences form a clade with Lejops.

Figure 1. A rooted phylogeny of the Syrphidae family made using RAxML, with 15 gene partitions. The bootstrap values are shown on the branches. The three subfamilies are coloured, with Syrphinae in green, Eristalinae in pink and Microdontinae in orange. The outgroup is made up of closely related Diptera and is shown in black. The tribes are labelled next to the tree, with each tribe coloured individually. Non-monophyletic tribes appear more than once across the tree.

The least squares analysis gave an overall evolutionary rate of 0.3 with confidence intervals [0.29, 0.31]. The relative divergence times for each node are shown in figure 2, and the confidence intervals indicate high confidence in these times. The Microdontinae diverge from the rest of the Syrphidae family early on (0.0616), and the early divergences within this group also precede all of the divergences in the rest of the Syrphidae. The Syrphinae subfamily diverged from Eristalinae at a relative time of 0.35. There is a great deal of variation in genus level nodes, with the two Psilota species having a most recent common ancestor at 0.9, and so only diverging very recently. This is unlike other genera such as Ceriana, for which four species are present and the divergence between two clades is at 0.4694, a much earlier split.

The Bayesian tree produced in BEAST (figure 3) found the same subfamily level relationships as the RAxML tree in figure 1. The relative divergence dates are shown on the tree (figure 3) and can be compared to the least squares analysis. The BEAST analysis gives a much faster rate of evolution (1.93), given as the mean birth rate, than the least squares analysis. The scale of evolution on the tree is reversed in the BEAST analysis, with the root at 1 rather than 0. The divergence dates of the nodes put the divergence of Eristalinae and Syrphinae at 0.51, which is later than in the least squares analysis. Once again Microdontinae diverged from the rest of the Syrphidae family early on, at a relative time of 0.96. The genus level divergence times show the same pattern as the least squares analysis, with the two Psilota species diverging at 0.04, very close to the tips of the tree.
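Because the two analyses report relative ages on opposite scales, a direct comparison requires flipping one of them. The sketch below converts the BEAST values quoted above (root = 1, tips = 0) to the least squares scale (root = 0, tips = 1) by taking one minus the age; the function name and node labels are illustrative only.

```python
def to_least_squares_scale(beast_age):
    """Convert a relative age with root = 1, tips = 0 to the root = 0, tips = 1 scale."""
    if not 0.0 <= beast_age <= 1.0:
        raise ValueError("relative age must lie between 0 and 1")
    return round(1.0 - beast_age, 4)


# Relative node ages as reported for the BEAST tree in the text.
beast_ages = {
    "Eristalinae/Syrphinae split": 0.51,
    "Microdontinae split": 0.96,
    "Psilota species split": 0.04,
}
converted = {node: to_least_squares_scale(age) for node, age in beast_ages.items()}
```

On the common scale the BEAST estimates become 0.49, 0.04 and 0.96, which can be set against the least squares values of 0.35, 0.0616 and approximately 0.9 for the same nodes.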

Figure 2. Tree with results of least squares dating performed on the RAxML tree rooted with Epalpus signifer. The relative node ages are displayed on the tree, with the most recent common ancestor at time point 1 and the tips 0. The confidence intervals are shown as bars and were calculated for 1000 tree simulations.

Figure 3. Bayesian tree created using BEAST v1.8.4 using the 13 protein coding genes and two rRNA genes, partitioned by gene. The scale bar shows the relative evolutionary time scale of the tree, calculated using a relaxed clock model, and the relative divergence dates are shown on the tree, with the most recent common ancestor at time point 1 and the tips at time point 0. The three subfamilies are shown in different colours: orange = Microdontinae, green = Syrphinae and pink = Eristalinae.

Larval life history evolution

The RAxML tree with added CO1 barcodes from UK species can be seen in figure 4. There were no species from subfamily Microdontinae in the UK sample, but there were species from both Eristalinae and Syrphinae, which were all placed in the correct subfamilies on the tree. Including the barcodes added some genera which were not present in the mitogenome only tree, including Volucella and Eupeodes. It also increased the number of species in some genera, such as Cheilosia and Eristalis. All genera with multiple species were found to be monophyletic, except for Parhelophilus and Syrphus. Parhelophilus was paraphyletic, as in the mitogenome only tree, but another species, Anasimyia lineata, was also found in the same clade. All three of the Syrphus barcodes and the one Syrphus mitogenome came out in a single clade, but the barcode for Xanthandrus comtus was also found in the same clade.

Figure 4. Maximum likelihood tree created in RAxML with 15 gene partitions. A backbone tree of mitochondrial genomes was used to constrain the topology, with CO1 barcodes from UK Syrphidae species added on to the tree. Barcodes are labelled as ‘barcode’ on the tree. The three subfamilies are coloured on the tree, orange = Microdontinae, green = Syrphinae and purple = Eristalinae. Bootstrap values are shown on the branches.

Figure 5. Tree showing the evolution of larval life histories using stochastic character mapping. Five life histories are shown on the tree, represented by different colours. Turquoise = myrmecophiles, purple = saprophagous, green = insectivorous, blue = phytophagous and yellow = fungus feeding. The posterior probability for each trait over 1000 simulations is shown for each node in the form of a pie chart.

The trait mapping analysis is shown in figure 5, with the five broad categories of larval life histories mapped onto the phylogeny. In almost all of the simulations saprophagy comes out as the ancestral state, with Microdontinae breaking away early from the rest of the Syrphidae family and evolving into myrmecophiles. However, this was not found in all simulations, with a few finding myrmecophily as the ancestral state. There appears to be one large radiation of predatory life histories, occurring in Syrphinae, and then two smaller isolated radiations into predation in Volucella and Heringia. In a few simulations the ancestral nodes between Volucella and Heringia were also predatory states, suggesting a single evolution of this life history for these two genera. The fourth larval life history, phytophagy, appears twice on the tree, once evolving in the genus Cheilosia and once in the clade of the bulb flies, Merodon equestris and Eumerus funeralis. A single member of Cheilosia present in this phylogeny, Cheilosia soror, has switched from phytophagy to a fungal diet, but too few species are sampled here to determine whether this is an isolated evolution or a small radiation into a new niche within this genus.
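The pie charts in figure 5 summarise, for each node, the proportion of simulations in which each life history was reconstructed. A minimal Python sketch of that summary step is given below. The simulated states are invented for illustration, and this is not the stochastic mapping software used in the thesis.

```python
from collections import Counter


def node_state_probabilities(simulated_states):
    """Return the fraction of simulations assigning each state to one node."""
    counts = Counter(simulated_states)
    total = len(simulated_states)
    return {state: n / total for state, n in counts.items()}


# Hypothetical outcome for the root node over 1000 simulations (illustrative only).
root_states = ["saprophagous"] * 950 + ["myrmecophile"] * 50
probs = node_state_probabilities(root_states)
print(probs)
```

Each dictionary of proportions corresponds to one pie chart, so a node where a minority of simulations recover a different state shows up as a small second slice.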

Discussion

Taxa choice and Assemblies

Specimens were selected from a broad geographic range, to produce a phylogeny which reflects the relationships of the whole family and not just a specific geographic region. It was also important to select specimens for the phylogeny based on representation of the subfamilies and tribes within the Syrphidae family. Both of these aims were achieved by collaborating with the Skevington lab in Canada in the CDC, which has expert taxonomic knowledge of this family and the means to obtain specimens. Out of the 207 specimens sampled, contigs were recovered for 94. This is a lower number than expected, mostly due to low DNA concentrations. Each library was designed to contain over 200 ng of DNA, so that it was suitable for TruSeq Nano library preparation (Illumina, 2015). However, the DNA amounts of the libraries were close to or below the minimum amounts required, especially after quality assessment. This may have been exacerbated by DNA transport, with any DNA degradation reducing the long DNA fragments needed for MMG. Sequencing the libraries again increased the number of mitochondrial genomes obtained by 30, suggesting there was a stochastic loss of data. It is important to note that a large number of mitochondrial genomes were obtained for a cost which would not have been feasible without MMG. Although this study did not obtain all of the mitochondrial genomes it set out to, it increases the number of syrphid genomes available from eight (Sonet et al., 2019) to 92 and thus makes an important contribution to the public database of syrphid DNA. It also provides data for the most complete and well supported Syrphidae phylogeny to date (Skevington and Yeates, 2000; Ståhls et al., 2003; Mengual, Ståhls and Rojo, 2015; Young et al., 2016).

Not all of the contigs could be identified using CO1 baits. To try to increase the number of identified contigs, a tree was built containing all the contigs over 2 kb; however, this was only suitable for contigs which belonged to a genus already containing an identified specimen. In the future, sequencing more bait mitochondrial genes may enable more contigs to be identified. For other groups, it may be possible to find genes on GenBank to use as baits, but these were not available for the syrphid species sequenced here.

The preliminary tree showed three specimens in unlikely positions based on prior knowledge of the Syrphidae family (Young et al., 2016), especially as two were placed in different subfamilies from those to which they are assigned. It was important to investigate these sequences empirically, as the placement could have been due to the evolution of the mitochondrial data. The individual gene trees showed that for Nausigaster and Lejota there was a clear split along the genome as to where they placed on the tree. This showed that in both cases the contig was a chimera, which was causing the strange placement. Chimeras can be an issue in MMG, as many specimens from different species are included in a single pool of DNA. The likelihood of them occurring is minimised by including only one representative of each genus in each pool, so that closely related specimens are not in the same pool.
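The gene-by-gene check described above can be sketched in a few lines: if the clade in which a contig is placed changes exactly once along the gene order, the contig is a candidate chimera. The gene order, clade labels and function name below are hypothetical and only illustrate the logic, not the procedure actually run in this study.

```python
def find_chimera_breakpoint(gene_placements):
    """gene_placements: list of (gene, clade) tuples in genome order.

    Returns the index of the first gene after a single clean switch between
    two clades, or None if placements are uniform or switch more than once.
    """
    clades = [clade for _, clade in gene_placements]
    switches = [i for i in range(1, len(clades)) if clades[i] != clades[i - 1]]
    if len(switches) == 1:
        return switches[0]
    return None


# Hypothetical placements for one contig (gene order and clades are illustrative).
placements = [
    ("nad2", "Eristalinae"), ("cox1", "Eristalinae"), ("cox2", "Eristalinae"),
    ("atp6", "Syrphinae"), ("cox3", "Syrphinae"), ("nad5", "Syrphinae"),
]
breakpoint_index = find_chimera_breakpoint(placements)
if breakpoint_index is not None:
    print("possible chimera, placement changes at", placements[breakpoint_index][0])
```

Placements that flip back and forth between clades return None here, since that pattern is more suggestive of weak signal in individual genes than of two joined fragments.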

Phylogenetic analysis

The gene partitioned trees both performed better in terms of clade support and overall bootstrap values than the other strategies. However, the differences between all the trees were quite small, suggesting that the high-quality alignments and large amount of data were robust enough to be largely unaffected by different partitioning schemes. The gene and codon partitioned tree had lower support than the gene partitioned tree, suggesting that the gain in information did not outweigh the cost of increasing the number of partitions, which was tripled from 15 to 45. This is a large number of partitions and it is not unexpected that this tree therefore had lower support. There is a balance between finding an evolutionarily sound partitioning scheme and not over-partitioning the data, which can result in a higher rate of error.
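To make the difference between the two schemes concrete, the sketch below writes partition definitions by gene and by gene and codon position, which is how each gene partition becomes three. It assumes RAxML-style partition file syntax; the gene names and lengths are hypothetical and are not taken from the alignment used here.

```python
def partition_lines(genes, by_codon=False):
    """genes: list of (name, length) in alignment order; returns partition lines."""
    lines = []
    start = 1
    for name, length in genes:
        end = start + length - 1
        if by_codon:
            # One partition per codon position: every third site from an offset.
            for pos in range(3):
                lines.append(f"DNA, {name}_pos{pos + 1} = {start + pos}-{end}\\3")
        else:
            lines.append(f"DNA, {name} = {start}-{end}")
        start = end + 1
    return lines


# Hypothetical gene lengths, for illustration only.
genes = [("cox1", 1536), ("cox2", 684), ("atp6", 678)]
by_gene = partition_lines(genes)
by_gene_and_codon = partition_lines(genes, by_codon=True)
print(len(by_gene), len(by_gene_and_codon))
```

With three genes the partition count goes from three to nine. Each added partition carries its own set of model parameters estimated from a third of the sites, which is the trade-off discussed above.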

The placement of Microdontinae as sister to the rest of Syrphidae was established by Skevington and Yeates (2000), using 12S and 16S sequences to look at broad relationships between the subfamilies in Syrphidae. The topology of that tree is congruent with the topology found in this study, with Microdontinae sister to the rest of the Syrphidae, and Syrphinae monophyletic within paraphyletic Eristalinae. Other studies have also looked at these relationships (Ståhls et al., 2003); however, the small amount of data and specimens meant that relationships between Syrphinae and Eristalinae could not be resolved. This study supports the subfamily level topology found in Skevington and Yeates (2000), and firmly places Syrphinae within Eristalinae. This is the most complete phylogeny of the Syrphidae family to date, with 83 species and 15 genes, comprising 14,333 bases of DNA.

In this study three out of the ten tribes represented by more than one species were found to be polyphyletic, and one was found to be paraphyletic. This was the case for all of the RAxML trees. The tribe Syrphini is paraphyletic, with the one specimen from Toxomerini within it. Toxomerini was also found to be embedded in Syrphini in the study by Mengual, Ståhls and Rojo (2015), which looked at relationships within Syrphinae. The third Syrphinae tribe in this study, Bacchini, was found to be paraphyletic, in agreement with Mengual, Ståhls and Rojo (2015) and Young et al. (2016). The Milesiini tribe was heterogeneous and the members of the tribe were found across Eristalinae, as in Mengual, Ståhls and Rojo (2015), where Milesiini were found to be polyphyletic and distributed among the rest of Eristalinae. The Brachyopini tribe also formed multiple clades within Eristalinae, both in this study and in Mengual, Ståhls and Rojo (2015). The currently accepted tribal level relationships in Syrphidae are based on morphological characters, and it is clear that further research with more species is required to review these relationships.

In this study Rhingiini were found to be sister to the Syrphinae subfamily. In Mengual, Ståhls and Rojo (2015) and Young et al. (2016) Pipizini were found to be sister to Syrphinae, and are a tribe with aphidophagous larvae, like many Syrphinae species. However, this tribe was not recovered from the libraries in this study, and so was not represented in this dataset. This study is therefore limited in its ability to resolve the relationships between the Eristalinae and Syrphinae subfamilies, and to determine the sister group to Syrphinae. Further research into these tribal level relationships should focus on obtaining mitochondrial genomes for the tribes not recovered here, so that the gaps in the tribal level relationships can be resolved. A focus on tribes with divergent larval life histories, such as the Pipizini, would be especially informative.

The least squares dating and BEAST analysis show different rates of evolution across the tree. This shows the importance of using a fossil calibrated phylogeny, so that the rates can be calculated with greater accuracy. In this study only relative rates were used, because data on the Syrphidae fossil record is not readily available, and many recorded fossils are difficult to verify (Popov, 2015). Further research to include fossil calibration would allow these relative dates to be turned into time calibrated dates, as has been done for other groups (Espeland et al., 2018), and would give more information on the evolution of this family. Despite this, the two analyses show the same patterns for divergence of different clades. In both analyses the subfamily Microdontinae splits early on from the rest of the Syrphidae, and as a result the evolutionary distances between groups in this subfamily are greater than in the rest of the family. This supports previous calls to elevate Microdontinae to family level (Thompson, 1972). This has not been disputed by previous phylogenies (Young et al., 2016), but the addition of relative divergence times adds support to the theory that the divergence from the rest of Syrphidae is deep enough to warrant family status. Both trees also show the divergence time of Syrphinae from Eristalinae, which is later than many of the divergences within Eristalinae, providing further evidence that the relationships within these subfamilies are complex and may not be best explained by the current taxonomy. These relative rates may also go some way to explaining the inability of CO1 to separate some species, as was found to be the case in Chapter 2. Some of the genus level splits in the tree occur close to the tips, suggesting that species within these genera have only recently diverged in their mitochondrial DNA.

Larval life history evolution

Adding in CO1 barcodes from UK species allowed the larval life histories to be examined in more detail, as these are generally species for which more is known about their life histories (Ball and Morris, 2015). Alongside this, it added in genera that were not included in the mitogenome only tree, including Volucella, where a small radiation of predatory lifestyles has occurred. The tree showed that the predatory lifestyle has radiated within Syrphinae, but that there was also a smaller, separate evolution of a predatory lifestyle in Volucella. This is not the only larval life history to have evolved more than once, with phytophagous larvae evolving in two separate places within Eristalinae. The stochastic mapping analysis also placed the saprophagous life history as the ancestral state on the tree, with Microdontinae splitting off from the rest of the family and radiating into a new life history niche. This analysis gives an idea of the ways in which a robust phylogeny such as this one can be used to look in more detail at the evolution of this family of pollinators in the future.

Conclusion

This study resulted in 81 new syrphid mitochondrial genomes, massively increasing the amount of genetic data available for this family, which will continue to benefit syrphid research beyond this study. This is despite sequencing issues, which mean there is still a need to expand the list of mitochondrial genomes for tribes that were not captured here. The phylogenetic tree produced in this study is the most complete tree of this family to date, with more genetic data and more species than have been included previously. The topology supports the findings of previous studies and finds Microdontinae sister to the rest of the Syrphidae, and Syrphinae monophyletic within a paraphyletic Eristalinae. The tribal level relationships found here show several tribes are polyphyletic, and suggest the need for a revision of the tribes in this family. Finally, the relative ages of divergence found here support an elevation of Microdontinae to family level, and highlight the need for a fossil calibrated phylogeny. This topology has implications for the evolution of larval life histories, supporting multiple points where predatory larvae have evolved, and supporting an ancestral saprophagous lifestyle. Having this robust phylogeny also enables further research to incorporate phylogenetic diversity into studies of community diversity, adding another layer to the data available for monitoring and conservation of these important pollinators.

Chapter 2: Developing and testing a Syrphidae CO1 reference database

Introduction

In the UK a recent focus on pollinators has resulted in a national monitoring programme, with the aim of monitoring pollinator populations long-term. This will enable the detection of changes which may indicate conservation concerns (Carvell et al., 2016). Monitoring in the UK is carried out mainly by volunteers (Barlow et al., 2015; Pescott et al., 2015; Newson et al., 2016). However, reliance on volunteers can be challenging when trying to monitor a diverse or cryptic group. Both of these issues apply to monitoring pollinators, which are made up of many different insect groups, and which contain many species that are difficult to identify to species level. The current situation relies on paid expert taxonomists identifying bees and syrphid flies from pan trap samples, which is expensive and time intensive. The Syrphidae family are less well studied than bee pollinators, but Diptera are thought to be the second most important group of pollinators in the UK, and so monitoring this family provides an indicator for a super-diverse flower visiting group.

There is increasing interest from stakeholders and policy makers in new technology and research, and how it can enable us to address environmental issues in new and innovative ways. Several studies have used DNA to monitor animals and plants, showing the potential of DNA to make monitoring more time and money efficient. The current applications of DNA in monitoring have mainly been of environmental DNA (eDNA), which is already used to detect great crested newts in ponds across the UK (Rees et al., 2014; Biggs et al., 2015). However, there are also applications for using DNA to identify individuals and bulk samples, particularly of invertebrates (Arribas et al., 2016). These methods are best applied to diverse groups which are challenging to identify using morphology. There are around 280 syrphid species in the UK (Ball and Morris, 2015), as well as many other insect flower visitors, including other Diptera, Coleoptera and Lepidoptera, making it a very diverse group to survey.

DNA barcoding is a widely used method which aims to distinguish species quickly and in a cost effective manner. DNA barcoding was first used as a tool for identifying microbial communities, and in 2003 Hebert et al. proposed adapting it for identifying animals, using a region of CO1 as an appropriate barcode. This barcode is still the most widely used for animals, including insects, today. Since then DNA barcoding has been adapted for animals, plants and bacteria, although different barcoding regions are used for different groups.

A DNA barcode is a section of DNA which has more between species variation than within species variation, resulting in a barcode gap between species. For insects a region of CO1 is widely accepted as the barcode of choice, and there is a wealth of data publicly available on online databases such as NCBI GenBank. This can be very useful for identifying unknown specimens; however, there are also issues with online databases, as they are not curated and mistakes can be difficult to identify. Whilst the CO1 barcode is thought to be able to distinguish species, in reality there is a large range of variability between groups, with some fast evolving or newly separated groups unable to be split into species.
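The barcode gap can be illustrated with a small calculation: the largest pairwise distance within any species is compared with the smallest distance between species. The sketch below uses uncorrected p-distances on toy aligned sequences; the species labels, sequences and function names are invented for illustration, and real analyses would use full-length CO1 barcodes and often a corrected distance measure.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcode_gap(records):
    """records: list of (species, sequence). Returns (max within, min between)."""
    within, between = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        (within if sp1 == sp2 else between).append(p_distance(s1, s2))
    return max(within), min(between)


# Toy aligned sequences (illustrative only, far shorter than a real barcode).
records = [
    ("species_A", "ACGTACGTAC"),
    ("species_A", "ACGTACGTAT"),
    ("species_B", "ACGAACCTTC"),
    ("species_B", "ACGAACCTTC"),
]
max_within, min_between = barcode_gap(records)
print(max_within, min_between, min_between > max_within)
```

A gap exists when the minimum between species distance exceeds the maximum within species distance. For recently diverged species the two values can overlap, which is the situation described above where CO1 cannot split a group into species.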

DNA barcoding of individual specimens can now be done in large volumes due to the large number of reads provided by next generation Illumina sequencing, which provides enough reads for many samples. The method introduced by Shokralla et al. (2015) allows dual tagging, firstly of the primary PCR using a 6 base pair tag, and secondly during the library preparation. Large numbers of samples can be pooled together in the same library, and separated post-sequencing using bioinformatics.
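The post-sequencing separation step can be sketched as follows: each read is assigned to a specimen by the 6 base pair tag at its start, and the tag is then removed. This is a simplified illustration that handles only the PCR tag with exact matching; the tags, reads and function name are hypothetical, and the library-level index is assumed to have been resolved already.

```python
def demultiplex(reads, tag_to_sample, tag_length=6):
    """Assign reads to samples by their leading tag; strip the tag from each read."""
    assigned = {sample: [] for sample in tag_to_sample.values()}
    unassigned = []
    for read in reads:
        tag, insert = read[:tag_length], read[tag_length:]
        sample = tag_to_sample.get(tag)
        if sample is None:
            # Tag not recognised (sequencing error or contaminant): keep aside.
            unassigned.append(read)
        else:
            assigned[sample].append(insert)
    return assigned, unassigned


# Hypothetical 6 bp tags and reads, for illustration only.
tags = {"ACGTAC": "specimen_1", "TGCATG": "specimen_2"}
reads = ["ACGTACGGTCAACAAATC", "TGCATGGGTCAACAAATC", "NNNNNNGGTCAACAAATC"]
assigned, unassigned = demultiplex(reads, tags)
print({s: len(r) for s, r in assigned.items()}, len(unassigned))
```

Exact matching discards any read with an error in its tag. Allowing one mismatch would recover more reads, but only if the tags in use differ from each other at several positions.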

One of the biggest challenges of using DNA for monitoring is a lack of reference databases. Without a comprehensive reference database it is still possible to get valuable information which can be used for wide scale biodiversity studies (Creedy, Ng and Vogler, 2019), but identification to species level is used for all major monitoring schemes in the UK, and thus it is important that robust databases are created and curated. There are several public databases such as GenBank (Clark et al., 2016) and BOLD (Ratnasingham and Hebert, 2007) where sequences are stored, and these provide a valuable resource for identification. However, the nature of public databases is that they contain information which cannot always be verified, and so it is possible that errors in identification occur. They are also incomplete, as not all known species have been barcoded. When setting up a methodology for monitoring a group of species, it is important that the reference databases are curated, to reduce any online database errors, to fill gaps caused by missing sequences and to ensure that the CO1 barcode is suitable for species delimitation. The creation of a reference database requires specimens which have been collected and identified by expert taxonomists, and which ideally are held in public collections so that they can be validated into the future. This highlights the ongoing importance of taxonomists, and of the institutions which house natural history collections. These provide vital sources of DNA which is robustly identified. They also contain storage facilities for DNA collections, which are important for retaining the physical DNA of reference specimens.

Metabarcoding studies often use operational taxonomic units (OTUs) as a proxy for species. These are clusters of closely related sequences which are closer in distance to each other than to other clusters. In general, for the CO1 barcode a 3% barcode gap is used, with sequences that are ≥97% similar clustered together. However, the barcode distance between species is not static. For cryptic species, which may be the result of recent rapid evolution, the CO1 barcode may not have diverged enough to distinguish these species based on 97% clustering (Čandek and Kuntner, 2015). Alongside this, clustering results in a loss of data within communities, as population level differences within species are masked by the clustering process. There have been moves towards using individual sequences, or haplotypes, with metabarcoding data (Elbrecht et al., 2018; Turon et al., 2019) as this generally removes confusion between species. These are unique sequences, and if they are mapped on to a reference sequence with a very high threshold for sequence matches, different haplotypes within a species can also be monitored. This could be important for conservation, as it would show how connected populations are, and whether there are unique differences in a species between sites. Using haplotypes rather than OTUs gives a much more detailed picture of populations and results in more data for conservation decisions.

The aim of this study is to develop a reference dataset for UK hoverfly species, which will be a vital resource for future research, as well as for future monitoring programmes. Alongside this the study aims to investigate the success of DNA for species delimitation in a family containing many cryptic species, by using a monitoring sample as a test of identification. The use of OTUs for DNA identification will be compared to unique haplotypes, which are hypothesised to be more accurate and informative. This study complements that by Creedy et al. (2019), where bee specimens from the national pollinator monitoring programme were used to create a reference database for UK bees.

Methodology

Samples for the reference database

The sequences used to form the reference database for UK Syrphidae species were compiled using specimens from three sources: the reference samples from the National Pollinator and Pollination Monitoring Framework (NPPMF) pilot, fresh specimens from the Natural History Museum collections and online sequence records from the BOLD database.

Samples were caught during the 2015 pilot of the NPPMF, using line transects with hand netting, and pan traps at 12 sites across the UK throughout the summer (Carvell et al., 2016). The pan trap samples were transferred to ethanol, and then bees and syrphids were sorted out from the rest of the samples. The syrphids were sent to professional taxonomists and identified, and then sent for DNA analysis. A reference set of specimens was selected to represent the species present in the dataset, and these were kept separate from the other syrphids. These reference specimens were used to create the DNA reference dataset.

DNA was extracted from a single leg from each specimen, which was crushed and incubated overnight in lysis buffer made up of 180µl ATL and 20µl proteinase K, before following the Qiagen blood and tissue extraction protocol. A 600bp barcoding fragment of CO1 was amplified using the primers BEEf and BEEr, following the same protocol as in Creedy et al. (2019). PCR reactions consisted of 1-2.5 μl of DNA, 0.4 μM of each primer (at 10 mM), 2.5μl of the NH4 reaction buffer (Bioline, London, UK), 1.5 mM of MgCl2 solution, 200nM of each dNTP, 1 unit of BioTaq™ (Bioline), and ddH2O up to the final volume of 25μl. Standard PCR cycle conditions for CO1 developed by the Canadian Centre for DNA Barcoding (CCDB) were used: initial denaturation at 94°C for 2 minutes, followed by 5 cycles of 94°C for 30 seconds, annealing at 45°C for 40 seconds, and extension at 72°C for 1 minute, then 35 cycles of 94°C for 30 seconds, annealing at 51°C for 40 seconds, and extension at 72°C for 1 minute, and a final extension at 72°C for 10 minutes. The success of the PCR was checked by gel electrophoresis, and PCR products were purified using a QIAquick PCR Purification Kit (Qiagen) and sequenced in both directions using ABI dye terminator sequencing (Thermo Fisher Scientific, Waltham, MA, USA). Sequence chromatograms were assembled into contigs and manually edited using Geneious v5.3.6 (Kearse et al., 2012). The sequences were uploaded to BOLD in the BEEEE database along with bee reference sequences from the NPPMF.

The second source of data was from specimens caught over the summers of 2015 and 2016 by a curator of Hymenoptera at the Natural History Museum London (NHM), D. Notton. Samples were pinned and identified by DN and Diptera curator and taxonomist, N. Wyatt, before being accessioned into the collections at the NHM. DNA was extracted from a single leg which was crushed and incubated in lysis buffer made up of 180µl ATL and 20µl proteinase K, overnight on a shaking incubator at 56°C, and then DNA was extracted using the Qiagen blood and tissue DNA extraction kit. The 418bp barcoding region of CO1 was amplified for each specimen using primers BF and foldR. PCR reactions consisted of 2μl of DNA, 0.4 μM of each primer (at 10 mM), 2.5μl of the NH4 reaction buffer (Bioline, London, UK), 1.5 mM of MgCl2 solution, 0.25µl of each dNTP, 1 unit of BioTaq™ (Bioline), and ddH2O up to the final volume of 25μl. The PCR conditions were as follows: an initial denaturation at 94°C for 4 minutes, followed by 40 cycles of 94°C for 30 seconds, annealing at 48°C for 30 seconds, and extension at 72°C for 45 seconds, and a final extension at 72°C for 10 minutes. The success of the PCR was checked by gel electrophoresis, and the PCR product was purified using the AMPure XP bead purification kit to remove sequences below 200bp. The samples were sent for library preparation and sequencing on an Illumina HiSeq lane.

Post sequencing, forward and reverse sequences were merged using Pear, with a quality score of 26, and primers were removed. The paired end reads were put through NAPselect (Creedy, 2019), which selected the most abundant unique sequence for each sample, bootstrapping to check that it was significantly more abundant than the other reads. The top sequence was also blasted against all of the Diptera on GenBank (Clark et al., 2016), to reject any sequences that were contamination from other organisms such as fungi. To check for contamination between samples, a Muscle (Edgar, 2004) alignment of these sequences and the NPPMF barcodes was made in Geneious (Kearse et al., 2012), and a maximum likelihood phylogenetic tree was run in RAxML (Stamatakis, 2014) to ensure that genera were monophyletic, and that no sequences were misplaced. Sequences were then uploaded to BOLD and can be found in the SyrUK database.
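The selection of a single barcode per sample can be illustrated with a minimal sketch. This is not the NAPselect implementation: the function name `select_barcode`, the number of bootstrap replicates and the 95% support cut-off are assumptions made purely for illustration. The sketch counts the unique merged reads for one sample, takes the most abundant, and resamples the reads with replacement to check how often that sequence remains strictly the most abundant.

```python
import random
from collections import Counter


def select_barcode(reads, n_boot=200, min_support=0.95, seed=0):
    """Return the most abundant unique read if bootstrap support is high enough.

    Hypothetical illustration only: the resampling scheme and the support
    cut-off are assumptions, not the statistics used by NAPselect.
    """
    if not reads:
        return None
    rng = random.Random(seed)
    top_seq, _ = Counter(reads).most_common(1)[0]
    wins = 0
    for _ in range(n_boot):
        resample = rng.choices(reads, k=len(reads))
        counts = Counter(resample).most_common(2)
        # The top sequence "wins" only when it is strictly the most abundant.
        is_top = counts[0][0] == top_seq
        is_tied = len(counts) > 1 and counts[0][1] == counts[1][1]
        if is_top and not is_tied:
            wins += 1
    return top_seq if wins / n_boot >= min_support else None
```

Under these assumptions a sample dominated by one sequence returns that sequence, whereas a sample in which two sequences are equally abundant returns no barcode, mirroring the "no sequence" outcome described for specimens without a clearly dominant syrphid read.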

Finally, the data were supplemented by the online database BOLD. A list of all UK hoverfly species was compiled using Britain’s Hoverflies (Ball and Morris, 2015). These species were searched for on BOLD and any public sequences were downloaded in FASTA format.

Database cleaning and species delimitation

Before a well curated reference dataset could be collated, the sequences were quality filtered and the identifications tested. Sequences with ambiguities or missing data were removed, as these are indicative of lower quality data.
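The quality filter described above can be sketched as follows. The function names are hypothetical, and the sketch assumes that the sequences have already been read into (name, sequence) pairs; any character other than A, C, G or T (ambiguity codes, gaps or missing data) causes the record to be dropped.

```python
def is_clean(seq, min_length=1):
    """True if the sequence contains only the unambiguous bases A, C, G, T."""
    seq = seq.upper()
    return len(seq) >= min_length and set(seq) <= set("ACGT")


def filter_records(records):
    """Keep (name, sequence) pairs with no ambiguity codes, gaps or missing data."""
    return [(name, seq) for name, seq in records if is_clean(seq)]
```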

To determine whether the CO1 barcodes could separate species, three species delimitation analyses were carried out on genera with more than one sequence. Species were defined by the name of each sequence as identified by a taxonomist. Species could be lumped or split, or a combination of both. In lumped species the barcode gap was not large enough to separate them from another species of the same genus, and so the sequences clustered together. For split species the sequence variation within species meant that the barcode gap was just as wide within as between species, and resulted in the species being split into more than one group of sequences. For some species both of these issues were found and therefore some of the sequences were split between two barcode groups whilst others were found with sequences from another species. This method of defining the barcode clustering was used to align with that used in Creedy et al. (2019).
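The four outcomes defined above can be expressed as a short sketch. The function name and input layout are assumptions for illustration: each sequence is represented by a pair of its morphological species name and the group it was assigned to by a delimitation method.

```python
from collections import defaultdict


def classify_species(assignments):
    """Classify each species as congruent, split, lumped or split & lumped.

    `assignments` is a list of (species_name, group_id) pairs, one per
    sequence, where group_id is the cluster given by a delimitation method.
    """
    groups_of = defaultdict(set)   # species -> groups its sequences fall in
    species_in = defaultdict(set)  # group -> species found in that group
    for species, group in assignments:
        groups_of[species].add(group)
        species_in[group].add(species)

    result = {}
    for species, groups in groups_of.items():
        split = len(groups) > 1
        lumped = any(len(species_in[g]) > 1 for g in groups)
        if split and lumped:
            result[species] = "split & lumped"
        elif split:
            result[species] = "split"
        elif lumped:
            result[species] = "lumped"
        else:
            result[species] = "congruent"
    return result
```

A species is therefore congruent only when all of its sequences fall in one group and that group contains no other species.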

Firstly, for each genus the barcodes were aligned in Geneious (Kearse et al., 2012) using Muscle (Edgar, 2004), including an outgroup from a closely related genus. An xml file was created in BEAUti with a strict clock, Yule speciation and the evolutionary model GTR+I+G, and a tree produced using BEAST v1.8 (Drummond et al., 2012) on the CIPRES Science Gateway (Miller, Pfeiffer and Schwartz, 2010). This tree was checked in FigTree and then used to run a GMYC analysis in R using the package splits. The number of species per genus was recorded, as was the monophyly of each species and whether they were split or lumped with another species. Secondly, ABGD analysis was carried out by uploading the same Muscle alignments to the ABGD web server. The ABGD analysis was run using the Kimura (K80) distance with a TS/TV value of 2.0 (Puillandre et al., 2012), and the output was once again checked for whether species were split, lumped or both. The final species delimitation method used was BOLD BINs, which are used to define species on the BOLD database. The number of BINs a species occupied, and any other species in that BIN, was recorded from the BOLD website.

For each of the analyses the results were investigated for obvious misidentifications, for example single sequences lumped with another species. Secondly, sequences were removed where there was a geographic split. Since this is a UK reference database, sequences that split away from the UK barcodes were removed. After running the species delimitation methods, a final filtered dataset was produced containing sequences which had been established to be correct and geographically relevant. This filtered reference database was used to re-run the species delimitation analyses to get a final picture of the species delimitation capacity of the database.

After the barcodes were filtered, all of the remaining barcodes were clustered at 97% similarity using Usearch10. Clustering is the method most commonly used in metabarcoding to obtain OTUs, and so this tested whether this method can be used to find OTUs in syrphid datasets. Firstly, the barcodes were de-replicated so that only unique sequences were included in the clustering. Usually at this stage singleton sequences are discarded, but some species were only represented by one sequence, and these were retained by specifying -sizeout 1. Then the sequences were clustered at 97% similarity. The parameter -uparseout was used to produce a table showing each barcode and whether it formed a new OTU or was a >97% match to an existing OTU. Each species was checked to see whether it was split between OTUs, lumped in with another species, or split between OTUs which also contained other species.
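The logic of de-replication followed by 97% clustering can be sketched in a few lines. This is a simplified greedy centroid clustering and not the algorithm implemented in Usearch; it assumes that all sequences have been aligned and trimmed to the same length, and the function names are hypothetical.

```python
from collections import Counter


def identity(a, b):
    """Proportion of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def dereplicate(seqs):
    """Collapse identical sequences, most abundant first (singletons are kept)."""
    return Counter(seqs).most_common()


def greedy_cluster(seqs, threshold=0.97):
    """Assign each unique sequence to the first centroid it matches at >= threshold."""
    centroids = []   # representative sequence of each OTU
    membership = {}  # unique sequence -> index of its OTU
    for seq, _size in dereplicate(seqs):
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                membership[seq] = i
                break
        else:
            # No existing centroid is similar enough, so start a new OTU.
            centroids.append(seq)
            membership[seq] = len(centroids) - 1
    return centroids, membership
```

Because abundant sequences are processed first, they become the centroids, and rarer variants within 3% are absorbed into them; this is the step at which within-species haplotype variation is lost.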

The sequences were also split into individual haplotypes. This was done by blasting the reference sequences against themselves and then grouping them based on a 100% match along the whole of the CO1 barcoding region. The haplotypes were analysed to see how many were present for each species, and to check whether any haplotypes were shared between species. The ability of haplotypes to delimit species was then compared to the clustering analysis.
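As a minimal sketch of the haplotype grouping, identical sequences can be collapsed by exact string matching. This assumes that every barcode has been trimmed to the same CO1 region, so that string equality stands in for the 100% BLAST match used here; the function names are hypothetical.

```python
from collections import defaultdict


def group_haplotypes(records):
    """Group (species, sequence) records into haplotypes by exact sequence match."""
    haplotypes = defaultdict(set)  # sequence -> species names carrying it
    for species, seq in records:
        haplotypes[seq.upper()].add(species)
    return dict(haplotypes)


def haplotypes_per_species(haplotypes):
    """Count the distinct haplotypes recorded for each species."""
    counts = defaultdict(int)
    for species_set in haplotypes.values():
        for species in species_set:
            counts[species] += 1
    return dict(counts)


def shared_haplotypes(haplotypes):
    """Return the haplotypes that occur in more than one species."""
    return {seq: sp for seq, sp in haplotypes.items() if len(sp) > 1}
```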

Testing the reference database

To test the ability of the reference database to accurately identify a large number of syrphid specimens, it was used to identify specimens caught as part of the UK wide pilot pollinator monitoring scheme, NPPMF (Carvell et al., 2016). As has been mentioned previously, some of these specimens were removed by taxonomists and were used in developing the reference database. However, the rest of the specimens were identified by the taxonomists and returned to the pan trap samples and thus provided a large sample of syrphid specimens which could be used to test the reference database.

DNA was extracted from each syrphid specimen by piercing the abdomen and incubating the whole specimen in lysis buffer overnight on a shaking incubator at 56°C. DNA was extracted using the Biosprint, according to the Qiagen Biosprint protocol. The 418bp barcoding region of CO1 was amplified for each specimen using the primers BF and FoldR. The primers contained Illumina tails for library preparation, as well as a six base pair tag, so that samples could be pooled together (Shokralla et al., 2015). PCR was carried out using 2.5µl of Bioline buffer, 1µl of MgCl2, 0.25µl of dNTP mix, 0.75µl each of forward and reverse primer, 0.1µl of BioTaq™ (Bioline), 2µl of DNA, and 15.4µl of ddH2O to give a total volume of 25µl. The PCR was run with an initial denaturation of 94°C for 4 minutes, and then 45 cycles of 94°C for 30 seconds, 45°C for 30 seconds and 72°C for 45 seconds. The PCR products were visualised on an agarose gel and then purified using AMPure XP beads to retain DNA above 200bp. Ninety-six libraries were made by amplifying using i5 and i7 Nextera XT indices with 96 unique tags, and then quantified using the TapeStation. The libraries were pooled in equimolar concentrations and sequenced on an Illumina MiSeq.

The libraries were separated by the sequencing facility back into 96 libraries based on the NextEra indices. NAPdemux (Creedy, 2019) was used to separate out the samples based on the 6 base pair tags using cutadapt, while also pairing forward and reverse reads based on sequence names. The sequences were then combined, and the primers removed using Pear, with a quality score of 26.
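The second level of demultiplexing, by the six base pair tag, can be sketched as below. This is not the NAPdemux or cutadapt implementation: it assumes exact tag matches at the start of each read, and the tag sequences and sample names in the usage example are hypothetical.

```python
def demultiplex(reads, tag_to_sample, tag_length=6):
    """Sort reads into samples by the tag at the start of each read.

    Assumes exact tag matches at the 5' end; reads with an unknown tag are
    collected under the key None. The tag is trimmed from the returned read.
    """
    by_sample = {}
    for read in reads:
        tag, insert = read[:tag_length], read[tag_length:]
        sample = tag_to_sample.get(tag)
        by_sample.setdefault(sample, []).append(insert)
    return by_sample


# Hypothetical tags and sample names, for illustration only.
tags = {"AAACCC": "sample_1", "GGGTTT": "sample_2"}
sorted_reads = demultiplex(["AAACCCACGT", "GGGTTTTTTT", "CCCCCCAAAA"], tags)
```

In practice a tolerance for sequencing errors in the tag would usually be allowed, which is one reason dedicated tools are used for this step.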

Two methods were used to obtain the barcode sequence for each specimen. The first was using NAPselect, as for the reference barcodes from the NHM. The second method used clustering to obtain OTUs, using usearch90. The OTUs were blasted against the reference database and the top OTU blasting to Syrphidae was selected. This differed from the NAPselect method because a 97% similarity was used, allowing some sequence variation. The NAPselect barcodes were identified in two ways, firstly using 97% similarity against the reference dataset, and secondly 100% similarity to the reference dataset. The 100% similarity method matched the haplotype method used for separating the reference barcodes. The species determinations for each method were compared against each other and against the morphological identifications using ggplot2 (Wickham, 2016) in R.

Secondary OTUs

Although the NPPMF specimens were sequenced individually, they came from mixed samples and were sequenced at a greater depth than was required for obtaining a single barcode. The samples were therefore also investigated for secondary OTUs sequenced alongside the target specimen. This was done using the NAPcluster script which utilises usearch90. A strict target length of 418bp was used, as any length variation is likely to be caused by pseudogenes and PCR and sequencing errors. The minimum group size was increased from the default of 2 sequences to 5, as very rare OTUs are likely to be sequencing and PCR errors. Finally, denoising was carried out on the samples to detect any remaining errors.
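The two stricter settings described above, an exact target length and a larger minimum group size, can be illustrated with a short sketch. This is not the NAPcluster script; the function name is hypothetical, and only the 418bp length and the minimum size of 5 are taken from the text.

```python
from collections import Counter


def filter_reads_for_clustering(reads, target_length=418, min_size=5):
    """Keep unique sequences of exactly target_length seen at least min_size times.

    Returns a dict mapping each retained unique sequence to its read count.
    """
    counts = Counter(r for r in reads if len(r) == target_length)
    return {seq: n for seq, n in counts.items() if n >= min_size}
```

Sequences that pass this filter would then go on to denoising and clustering; reads of any other length, and unique sequences seen fewer than five times, are treated as likely pseudogenes or PCR and sequencing errors.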


Results

Samples for the reference database

The number of species and sequences from each dataset is shown in table 1, although this was reduced after filtering, and resulted in coverage for 73% (209/284) of syrphid species present in the UK. There was a high level of species overlap between the datasets, with the two specimen datasets used in this study each having fewer than ten unique species, as can be seen in table 1. The split of missing species across genera can be seen in figure 1a, with seven genera not represented in the database, although these are all genera with under three UK species. The majority of genera are well represented in the reference dataset. The 20% of species that were not included in the reference set included species which are critically endangered or nationally scarce, as is shown in figure 1b. This figure also shows that 16% of species with no barcode do not fall into these official categories but have fewer than ten records on GBIF, and so have rarely been found in studies of UK Syrphidae.

            NHM samples   NPPMF reference   BOLD sequences   Total
Species     43 (4)        45 (3)            211 (123)        228
Barcodes    129           75                3222             3682

Table 1. Number of barcodes and species found in each of the three datasets used in this study before filtering. The number of species unique to that dataset are shown in brackets.

Figure 1. A. Graph showing how many UK species are represented in the database for each genus. The species found in the dataset are shown in blue and the missing proportion of species in red. The total number of UK species for that genus is shown above each bar. B. Pie chart of the species present in the UK with no DNA barcode in the reference database. This shows the species which are nationally scarce, critically endangered, near threatened, priority species, have under ten records or have no special status. These classifications were taken from the NBN Atlas website on 26.9.2018 (https://nbnatlas.org/).

Database cleaning and species delimitation

25 short sequences were removed because they did not overlap with other sequences and could not be compared using the species delimitation methods or haplotype analysis. 110 sequences were removed from the BOLD data because they contained ambiguous characters and therefore were of poor quality. Most of the splits in species which could be resolved were caused by geographical variation in samples from the BOLD database. An example of this can be seen in figure 2, where one of the species in the genus Volucella is split between two GMYC groups. These sequences were split between those from the UK and the rest of Europe, and those specimens from Canada. Because of the UK focus of this study the Canadian sequences were removed, resulting in all the Volucella species being monophyletic. Many of the lumped species could not be resolved using the species delimitation methods, and this could be caused in part by an inability to split species based on the CO1 barcode. It was also the case that distinguishing true lumping from incorrect identification was not possible in many cases. Overall, 155 sequences were removed. This meant that the total number of species was reduced to 209.


Figure 2. Phylogenetic tree created in BEAST V1.8.2 using CO1 barcodes for the genus Volucella. The pink box shows the outgroup Eristalinus. The red branches show where the GMYC analysis has designated species, and the coloured boxes show the morphological species. Volucella bombylans is split between two GMYC groups (yellow boxes).

For all three species delimitation methods the number of lumped, split and split & lumped species was reduced after filtering (table 2). Lumped was the category which decreased the least in all cases and split & lumped species decreased the most. Both before and after filtering the BIN method found the highest number of congruent species. This was followed by ABGD, which only found one split species but found a higher number of lumped species. The GMYC analysis found fewer lumped species than ABGD but the most split species (29).

                   Method      Split   Lumped   Split & Lumped   Congruent
Before filtering   GMYC        34      52       26               95
                   ABGD        21      65       21               100
                   BOLD BINs   14      55       12               126
After filtering    GMYC        29      48       6                126
                   ABGD        1       57       2                149
                   BOLD BINs   0       45       4                154

Table 2. The number of species which were split, lumped, split & lumped or congruent with morphological species for the three different species delimitation methods. The top three rows show the numbers before filtering the data, and the bottom three rows show the number of species after filtering.

GMYC found the fewest congruent genera, due to the tendency to split species, as shown in figure 3A. For most of the incongruent genera GMYC found a single split species. ABGD and BIN methods found lumped species over more genera, as shown in figures 3B and 3C. In all methods the genera with the highest number of incongruences are also the genera with the highest number of species, particularly Cheilosia, Eupeodes, Platycheirus and Sphaerophoria. For all three methods, 27 genera were found to have no species which were split or lumped, and these were excluded from figure 3. For 16 other genera only one of the methods (GMYC) found any incongruences.

Figure 3. Graphs showing how the species in each genus are split and lumped by the three different species delimitation methods after filtering the barcodes. Three methods of species delimitation are shown for each genus. A. shows the GMYC, B. the ABGD, and C. the BINs analysis. The bars are divided into those species that are split by the methods, those that are lumped with another species, and those that are both split while also being in groups with another species. Species that are congruent are not shown, and genera where all species are congruent in all methods are not shown. The total number of species present in each genus is shown on the graphs.

The clustering analysis found a larger number of incongruent species than the species delimitation analyses. There were 36 lumped, 80 split and 24 split & lumped species. This indicates that the 97% clustering may not be sensitive enough to distinguish species in this family. The haplotype analysis provided a much more sensitive way to identify species, with only three genera with lumped species compared to 25 in the clustering analysis (figure 4).

Some genera containing high numbers of lumped species in the species delimitation methods, such as Cheilosia, had no haplotypes shared between species. This indicates that whilst there is a narrow barcode gap between Cheilosia species, they can be distinguished at the haplotype level. The genera which did contain shared haplotypes, shown in figure 4, were Sphaerophoria, Platycheirus and Melanostoma. It may be the case that all these sequence identifications are correct and that the haplotype is shared between species. These sequences were left in the dataset so that any of these haplotypes found could be identified to genus level. There was a link between number of sequences representing a genus and number of haplotypes present, with more sequences resulting in more haplotypes.

Figure 4. Graph showing the number of lumped species per genus for clustering and haplotype analysis. The cluster method is shown in red and the haplotype in blue. The total number of species for each genus is shown on the graph. Only genera with clustered species in at least one of the methods are shown.

Testing the reference database

The DNA identifications against the reference database were compared to the expert taxonomists' identifications. Using the barcode sequence, 68.18% of the specimens were an exact match to the morphology, matching to the same species. A further 22.29% matched to genus level. Of these genus level identifications, there were some which had only been identified to genus by the morphology but were identified to species by the DNA (1.94%). There were also some specimens which were not identified to species by the morphology or DNA (39.81%). There were others which had been given a species identification by morphology, but which were only identified to genus by DNA (38.83%).

[Figure 5: bar graph. Categories: match, genus, incorrect ID, no sequence. Methods: top OTU, NAPselect, haplotype.]

Figure 5. Bar graph showing how many specimens were identified to different levels compared to the morphology by the three molecular methods: 97% match to top OTU, 97% match to syrphid barcode and 100% haplotype match.

The barcode identity was compared to the top OTU, to see how the two methods differed (figure 5). The barcode identified by NAPselect clustered with the most abundant OTU for 481 of the samples. For the other four samples the NAPselect sequence clustered into the second most abundant OTU. This meant that the outcomes were very similar between the two methods, with both finding 22% of samples matching to genus level. The OTU method found slightly fewer species level matches than the barcode method, with 66% matching compared to 68.18%. This was due to more incorrect identifications, 10.17% as opposed to 8.66%. The barcode method had more samples for which no sequence was obtained (2.37% vs 1.73%), because if there was no syrphid barcode that was significantly more abundant than other sequences then no sequence was selected.

When using the barcode for haplotype analysis (100% match), the number of specimens identified to species level increased to 74%. Only 13.75% of specimens were identified to genus. However, the trade-off for this increase in species level identification was an increase in the number of specimens which were given no ID by the DNA (8.52%), as their haplotype was not present in the reference database.

At a species level, a higher percentage were identified by the NAPselect barcodes (72%) than at the specimen level (68.18%). This suggests that a few abundant species could not be identified to species level, and so decreased the number of species identifications overall. No species was consistently identified incorrectly as belonging to a different genus. There were species which were consistently identified only to genus level (27% for NAPselect), suggesting that some species cannot be identified to species level by the DNA.

Secondary OTUs

The depth at which the specimens were sequenced meant that DNA was recovered for more than just the target specimen. Secondary OTUs were found in almost all of the samples, with an average of 10 OTUs per sample. The taxonomic diversity of the secondary OTUs can be seen in figure 6. Most of the diversity was made up of other syrphids, potentially from contamination between samples. The largest group found in the secondary OTUs was Hymenoptera. These were present in low numbers in the samples, as can be seen from figure 6, but in total 52 OTUs were found. This may be due to the fact that the bees from the study, whilst not addressed here, were processed alongside the syrphids in the same manner and sequenced on the same MiSeq run. In total there were 36 syrphid species identified in the samples by the taxonomists. When running clustering on the default settings, 91 OTUs were identified as syrphids, but changing the parameters to strict sequence length selection and increasing the minimum size of OTUs reduced the number of syrphid OTUs to 50. All of the OTUs which matched a specimen barcode were also found as secondary OTUs in other samples.

[Figure 6: boxplot. Y-axis: OTU richness. X-axis: taxonomic group, including Syrphidae, Hymenoptera, Diptera, Coleoptera, other insects, other arthropods, fungi, bacteria, flowering plants, other vascular plants, vertebrates, other eukaryotes and unknown.]

Figure 6. Graph showing the taxonomic makeup of the OTUs found in the sample, which were not the target specimen. Any syrphid OTUs shown on the graph were found as secondary OTUs. The boxplot shows the average richness of each group of OTUs per sample, and the total number of OTUs in each taxonomic group is shown on the graph.

There were also OTUs present from other groups that were caught alongside the syrphids in the pan traps or are known associates. These included Coleoptera, fungi and flowering plants (figure 6). While these OTUs are most likely a result of mixing in the traps, the syrphid and Hymenoptera OTUs could come from a number of sources. These OTUs may be the result of mixing in the pan traps, contamination from specimens on the same plate during PCR, or contamination between specimens in the same well for sequencing, caused by tag jumping.

Discussion

The coverage of the reference database

The study succeeded in obtaining sequences for over 70% of UK species. The non-capture of species was due to a number of reasons, including geography, as fewer specimens were taken from the north of the UK, and the difficulty in collecting rare or restricted species. This study shows the potential issues of using public records to form reference databases, as these often contain incorrectly identified sequences. Here, species delimitation methods allowed incorrect sequences to be identified, making the syrphid reference database more robust and showing the importance of curating databases before using them.

Database cleaning and species delimitation

Many sequences were obtained from the BOLD online database public portal, and the majority of these were not from the UK, but were from countries such as Norway, Germany and Canada. All sequences available for a species were downloaded, no matter the country of origin, which did introduce more variability into the database. For three species, Parasyrphus nigritarsis, Leucozona lucorum and Volucella bombylans, two clades were found which split along geographic lines, with sequences from Europe split from sequences from Canada. For one species, Helophilus hybridus, the split was between sequences from Pakistan and sequences from Europe and Canada. The sequences that clustered away from European sequences were removed from the database, a choice that was made because of the focus on developing a robust UK database.

The majority of species in the database were successfully separated using the CO1 barcode. This supports other studies which have used the CO1 barcode to delimit species in syrphid genera including Merodon (Šašić et al., 2016) and Eumerus (Chroni et al., 2017). However, after filtering the reference database some species were still unable to be distinguished by any of the species delimitation methods. One example of this is Melanostoma, where the species Melanostoma mellinum and Melanostoma hirtella are unable to be distinguished using the CO1 barcode. This may in part be due to the difficulty in identifying these specimens morphologically, resulting in less reliable reference identifications; however, it is more likely that these species have diverged recently, and thus the CO1 barcode does not have the power to separate them. This can be true in rapidly evolving groups, or groups with cryptic species (Zhu et al., 2017), where the use of CO1 as a DNA barcode must be investigated to ensure that it has enough power to delimit species. In future, this issue could be resolved by using a multi-locus approach to identify those species (Dupuis, Roe and Sperling, 2012). This study would be a useful tool for this further research, as it shows the genera and species in the UK for which additional diagnostic tools are required. There were also species which were split by the species delimitation methods. This may be for several reasons, one of which is the geographic split as has been discussed above, but it could also be caused by cryptic species which have not been detected by morphological methods (Darwell and Cook, 2017).

Grouping identical sequences into haplotypes rather than 97% OTUs was useful because of the number of species that were inseparable using species delimitation methods. It decreased the number of genera with incongruent species to just three with shared haplotypes. Species can share haplotypes, and this has been found to occur in CO1 for another Diptera group (Whitworth et al., 2007), and so while this may be the result of incorrect specimen identification on BOLD, they may also be truly shared between species. Using the haplotypes also increases the potential data that can be obtained from this study and studies like this. Rather than species level differences, monitoring datasets can be used at a population level, to look at genetic variation as well as species variation.

Testing the reference database

When comparing the top sequence to the top OTU for identifying the syrphid specimen there was little difference, with the top sequence identifying slightly more specimens to species level. When sequencing individual specimens, as was done in this study, using 97% clustering should not be necessary as there should be no sequence variation in the sample.

This also has the potential to provide more information on genetic diversity in communities, as haplotype diversity can be measured alongside species diversity. This has been done in studies such as Elbrecht et al., (2018), and can add information on the health and diversity of populations as well as species diversity.

As can be seen from the reference data and species delimitation, several syrphid species are difficult to separate using CO1, and so using haplotypes rather than OTUs to assess diversity may be more robust, when the 3% barcode gap is not present in all species. However, sequencing errors can lead to erroneous sequences, especially in metabarcoding studies (Elbrecht et al., 2018). In these cases, it may be necessary to allow for sequence ambiguity using 97% clustering rather than haplotypes to identify specimens. Matching the sequences to a DNA haplotype meant that more sequences were identified to species level, however this was a trade-off as a larger number of specimens were not identified. This is because although all of the species present in the NPPMF dataset are present in the reference database, it is highly unlikely that all the haplotypes are. As it would be very challenging to produce a database for all UK syrphid haplotypes, a dual identification method is currently the most suitable. This way the haplotypes present can be identified, and then unidentified sequences checked using the 97% method. For genera where the species show clear delimitation these specimens can be identified to species level, and any new haplotypes can be recorded. For those genera such as Cheilosia, where several species are unable to be separated using species delimitation methods, those not identified by haplotypes can be given a genus level identification.
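The dual identification approach described above can be sketched as follows. This is a minimal illustration in Python rather than the pipeline used in this study: the function names, the structure of the reference dictionary and the simple position-by-position identity measure (which assumes aligned sequences of equal length) are all assumptions made for the example.

```python
def identity(a, b):
    """Proportion of matching positions between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def dual_identify(query, reference, genus_only, threshold=0.97):
    """Identify a barcode by exact haplotype match, then by 97% similarity.

    reference  : dict mapping reference sequence -> (genus, species)
    genus_only : set of genera whose species are not separable with CO1
    Returns (identification, method); identification is None if no match.
    """
    # Step 1: an exact haplotype match gives a species-level identification.
    if query in reference:
        genus, species = reference[query]
        return f"{genus} {species}", "haplotype"

    # Step 2: otherwise take the closest reference at or above the threshold.
    best = max(reference, key=lambda ref: identity(query, ref), default=None)
    if best is None or identity(query, best) < threshold:
        return None, "unidentified"

    genus, species = reference[best]
    if genus in genus_only:
        # Poorly delimited genus: report the genus only.
        return genus, "97% match (genus level)"
    # Well delimited genus: species-level ID, flagged as a new haplotype.
    return f"{genus} {species}", "97% match (new haplotype)"
```

In this sketch a sequence that fails both steps is returned as unidentified rather than forced onto the nearest reference, which mirrors the trade-off discussed above.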

For 90% of the specimens caught in the NPPMF pan traps, there was no disagreement between the morphological and molecular identifications. 68% of the identifications were an exact match, and 22% a match to genus level. This suggests that DNA can identify the majority of syrphid species caught in a UK wide monitoring programme. However, the lumping of species in the reference database meant that some species which could be identified by morphology could not be identified using DNA. Alongside this, several species could not be identified to species level by morphology or DNA, suggesting there is a wider issue with syrphid identification in monitoring. Species level identification of some species of syrphid is challenging no matter the method used, particularly for certain genera.

Secondary OTUs

Alongside the target specimen, other OTUs were also present in the data. Their presence shows the potential future of monitoring, with ecological associations, symbionts and diet OTUs able to be detected in the samples. The most abundant OTUs apart from Diptera were microbial, comprising fungal OTUs and the known bacterial insect symbiont Wolbachia.

There have been several studies recently using DNA to identify pollen from pollinating insects (de Vere et al., 2017; Lucas et al., 2018). In this study we find flowering plant OTUs present in the dataset in low numbers. The use of the CO1 barcode and insect specific primers meant that plant DNA was not present in large amounts, but the small number of OTUs suggests that with a more targeted DNA barcode this method could provide plant association data. Alongside species associations, the secondary OTUs also included other insects which were likely present in the pan traps. This included a Coleopteran OTU for the species Malthodes pumilus, and other Diptera, including several Phoridae OTUs. These are known flower visitors (Larson, Kevan and Inouye, 2001) and so were likely caught in the traps alongside the syrphids.

There were also many syrphids which were found as secondary OTUs in the samples. A few of these non-target species were found at high read counts and resulted in incorrect identification due to high level contamination, however most were at low read numbers. Some of these reads may be from specimens which were found alongside the target specimen in the same pan trap, and so are the result of specimen mixing in the same way as the other insect OTUs. Other low-level contamination is likely a result of protocols for morphological and molecular analysis, which are not yet streamlined. This results in specimen handling and mixing during morphological identification which does not conform to the strict protocols for DNA analysis. However, despite this the DNA analysis produced a result with 90% agreement to the morphological identification, giving confidence in the curated reference database.

Conclusion

This study has developed a well-curated and tested reference dataset of CO1 barcodes for UK Syrphidae species, covering almost all genera and >70% of all species, including all species commonly found in monitoring programmes. The robust curation has shown which species and genera are harder to separate using DNA, which will enable most species to be quickly and efficiently identified, whilst enabling taxonomists to focus on these cryptic and challenging groups. It has also been shown that using haplotypes rather than OTUs can provide accurate species level information, whilst potentially providing population level details which could inform conservation, as well as solving the majority of the species identification difficulties. However, the challenges with developing a reference database are magnified if all haplotypes need to be captured alongside all species, and thus presently a combination of species and haplotype identification provides the best overall result. This study complements that of Creedy et al., (2019), which developed a reference database for UK bees tested against the same monitoring dataset, and together these provide a valuable resource for future monitoring and research of UK pollinators.

Chapter 3: Hoverfly diversity and pollen associations in a semi-urban landscape

Introduction

Semi-urban environments may be important for insect preservation and species diversity, as they make up an increasing proportion of land use in the UK, alongside agriculture, as natural habitats are lost at large scales. Insect pollination in these areas may be retained to a better level than in strictly agricultural areas (Radford and James, 2013). This benefit of urban environments to pollinators leaves key questions about the resources, habitat requirements and specific interactions with native and introduced species of plants that maintain this high diversity. This is investigated here in the case of syrphids at the interface of a semi-urban to agricultural area typical of the landscapes of Britain.

The Syrphidae family have a diverse array of larval life histories, broadly falling into saprophagous, predatory, phytophagous and myrmecophilous groups. The saprophagous larvae can be further split into aquatic species and species which live in compost or dung. These diverse larval histories may play an important role in the diversity of syrphid species in a habitat, adding a further factor to species diversity alongside pollen and nectar availability for the flower visiting adults (Meyer, Jauker and Steffan-Dewenter, 2009).

Previous studies have found some degree of resource partitioning among several syrphid genera, suggesting that the floral community may have an impact on species diversity (Lucas et al., 2018). Understanding how these floral resources are used by syrphids is vital for understanding the network interactions between plants and pollinators, and in turn understanding how resilient the networks are to changes in floral diversity. Networks where pollinators have a high level of specificity, whether that is to a plant species, group or floral phenotype, are at risk of being destabilised by changes to the habitat (Weiner et al., 2014). This in turn could lead to greater species loss if there are changes to the land use or the habitat.

Semi-urban, agricultural and semi-natural habitats differ in the type, number and duration of floral resources available. Field edges of mass flowering crops will be impacted by the resources available in the field, as well as the potential refuge of the field edge (Haenke et al., 2014). Gardens may provide a diverse array of larval habitats and a longer period of diverse floral resources, as they are generally planted for long flowering periods; however, many floral resources in gardens come from non-native or cultivated plants which may not be attractive to syrphid pollinators (Geslin et al., 2013). Finally, semi-natural habitat managed for wildlife is likely to possess more native flowers, which may have an impact on species diversity and stability of the networks.

The mix of floral resources over a long flowering period in gardens may be especially important in areas surrounded by agricultural land that produces a mass of flowers for a short period, but no floral resources for the rest of the summer months (Osborne et al., 2008). These differences in the amount and the quality of floral resources throughout the summer months, especially in sites bordering mass flowering crops, may have an impact on the diversity of the syrphids as well as the diversity and shape of the plant-pollinator networks.

Research into pollinator networks is often done using visitation studies, where a flower is observed for a set time and visitors recorded (Geslin et al., 2013). This is a visitation network rather than a pollination network, as pollination can only be ensured when a flower is isolated after visitation and seed set measured (Ballantyne, Baldock and Willmer, 2015). These methods for establishing links are time consuming and difficult to scale up for large scale projects, and so studies have begun using pollen as a proxy for visitation and pollination networks (Alarcón, 2010). Presence of pollen on an insect’s body does not mean they are effectively pollinating the plants, but it may be a better proxy for pollination networks than observation because the insects are picking up pollen, a step closer to successful pollination than flower visitation. Pollen can be identified using morphology (Rader et al., 2011), but this is challenging and requires expert training and techniques (Bell et al., 2016). More recently, DNA identification methods have been employed to identify pollen from insect pollinators, including in studies of Diptera (Galliot et al., 2017) and UK syrphids (Lucas et al., 2018). These methods of pollen metabarcoding have used a variety of DNA barcodes including ITS2 (Richardson et al., 2015), rbcL (Garibaldi et al., 2013; de Vere et al., 2017), trnL and matK to successfully identify pollen from the bodies of insects. These studies have so far been able to identify pollen at a small scale and produce visitation networks for a variety of pollinators.

Monitoring of pollinating insects is an important part of working towards their conservation, and there is interest in making this monitoring as accurate and as time- and cost-efficient as possible. However, pollinators contribute important ecosystem services, and thus surveying species abundance and diversity of the pollinators alone may not provide the complete picture. Looking at network interactions between plants and pollinators gives a much more complete picture of how the insects are using the environment, which resources are important for them, and which pollinators are visiting economically important plants.

This study aims to use pollen metabarcoding for a large sample of flies, whilst keeping individual level networks to better understand the movement and diet of syrphids. The networks will show how floral resources are being utilised by flies over the summer months, and over sites managed for different uses. It is hypothesised that plant species composition will affect species composition between rounds and site-types. Different larval life histories will also be investigated between rounds and site-types, as these may also be a factor in driving fly species composition.

Methods

Testing of primers ITS2, rbcL and trnL

Samples for testing pollen metabarcoding were collected from two different locations, the Natural History Museum wildlife garden, London and a garden in South Norfolk. The syrphids were from two tribes and five genera. For DNA extraction the ethanol was removed and the samples ground using micro-pestles, to release the gut contents and break the hard shell of any pollen grains. DNA extraction was carried out using the Qiagen DNA blood and tissue extraction kit in an extraction hood, to reduce the risk of contamination. During each extraction two negative controls were taken through the same process. Three barcodes were tested for amplifying pollen DNA: rbcL, trnL and ITS2. The primers used were taken from previous studies amplifying DNA from pollen. For all of the barcodes the PCR reaction mix contained 2.5µl of Bioline buffer, 1µl of Bioline MgCl2, 2.5µl of dNTPs, 0.75µl of both the forward and reverse primer, 0.1µl of BioTaq™ (Bioline), 15.4µl of ddH2O and 2µl of DNA, to make a total reaction volume of 25µl.
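As a quick arithmetic check, the volumes listed above can be summed to confirm the stated 25µl reaction, and scaled into a master mix. The following Python sketch is illustrative only: the helper names and the optional pipetting overage are assumptions and do not describe how the reactions were prepared in this study.

```python
# Per-reaction volumes in µl, as listed in the text.
REACTION_MIX = {
    "buffer": 2.5,
    "MgCl2": 1.0,
    "dNTPs": 2.5,
    "forward primer": 0.75,
    "reverse primer": 0.75,
    "BioTaq": 0.1,
    "ddH2O": 15.4,
    "DNA": 2.0,
}


def total_volume(mix):
    """Total reaction volume in µl, rounded to avoid floating point noise."""
    return round(sum(mix.values()), 3)


def master_mix(mix, n_reactions, overage=0.0):
    """Scale the shared reagents (everything except template DNA).

    overage is a hypothetical fractional excess, e.g. 0.1 for 10% extra.
    """
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2) for name, vol in mix.items() if name != "DNA"}
```

Calling `total_volume(REACTION_MIX)` returns 25.0, in agreement with the reaction volume given in the text.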

For each of the primers, the PCR reaction was taken from previous studies. The ITS2 reaction was taken from Keller et al., (2015), the rbcL from Hawkins et al., (2015) and the trnL from Kraaijeveld et al., (2015). PCR products were purified using AMPure XP bead purification kit, and pooled and sequenced alongside other samples on an Illumina MiSeq. Post-sequencing, the forward and reverse reads were merged and quality filtered using Pear with a quality score of 26. The three different barcodes for each specimen were separated based on the primer sequence. The OTUs were clustered using Usearch10, and then blasted against GenBank and BOLD (Ratnasingham and Hebert, 2007). The blast output was inputted into Megan (Huson et al., 2016) to give an identification for each OTU. The OTU table and the taxonomy table were then imported into R where the package Vegan (Oksanen et al., 2016) was used to create bipartite networks for the insects and pollen in each of the two sites.

Fieldwork

Samples were collected throughout the summer of 2017 in a 1km² area of South Norfolk. Nine sites were selected from different habitat types in the area: agricultural, urban, and semi-natural, shown in figure 1. Sites were sampled five times throughout the summer, in May, June, July, August and September. Samples were only taken if the temperature was above 15°C, and the weather was sunny, with low wind levels and no precipitation. Sampling was carried out by two people with hand nets walking the site for an hour, catching any syrphid flies they encountered and storing them in individual collecting tubes. The insects were killed by freezing and then 100% ethanol was added to each tube before transportation. The samples were stored at -20°C before processing. A reference list of all the flowering plants present in a site was recorded alongside the fly sampling, so that the OTUs detected could be compared to the list from the sampling sites. The plants were identified in situ and a list recorded for each site and time period.

Figure 1. A. A map of the United Kingdom showing the field location in East Anglia with a red diamond symbol. B. An aerial view of the field location with the location of the nine sites labelled with the site-type they were assigned.

CO1 barcoding of syrphid flies

Flies were processed inside a fume hood to reduce any chance of contamination, and all equipment was sterilized using UV light before the dissection. For each fly, the guts were dissected out in a petri dish with sterile razor blades and pins. Each gut was macerated using a sterile micro-pestle and then incubated in 200μl of lysis buffer made up of 180µl ATL and 20µl proteinase K, overnight on a shaker at 56°C, before extraction with the Qiagen blood and tissue 96 plate extraction kit.

For the fly barcodes, the same protocol was used as in Chapter 2 and Creedy et al., (2019) to amplify the 418bp CO1 barcoding region. The primers contained a unique six base pair tag, shown in table 1, which enabled over 1000 fly samples to be pooled together. PCR was carried out using the Bioline PCR kit, with 2.5μl of buffer, 1μl of MgCl2, 0.25μl of dNTPs, 0.75μl each of the forward and reverse primers, 0.1μl of BioTaq™ (Bioline) and 18.65μl of H2O. Each reaction contained 1μl of DNA and a total volume of 25μl. PCR was carried out with an initial denaturing temperature of 94°C for 4 minutes, and then 40 cycles of 94°C for 30 seconds, 48°C for 30 seconds and 72°C for 45 seconds. A final extension was carried out at 72°C for 10 minutes. PCR products were visualised using electrophoresis.

The PCR products were assessed for band brightness to ensure that the samples were pooled in approximately equimolar amounts. The average concentration of the different band strengths was obtained by measuring 4 samples of each using the Qubit high sensitivity kit. For strong bands 12µl of PCR product was pooled, and for weak bands this was increased to 15µl. From the final pool, 40µl of sample was cleaned using the AMPure XP purification kit, removing DNA fragments below 200 base pairs. The purified samples were quantified using the Qubit. The pooled barcoding samples were combined with the pooled ITS2 metabarcoding samples, with the aim of achieving 12,000 reads per metabarcoding sample and 2000 reads per barcoding sample. The pooling was adjusted to account for the size difference between the fragments, as smaller fragments are preferentially sequenced. The final plate of samples was quantified using NanoDrop and Qubit and sequenced on an Illumina MiSeq using 2x300 base pair read lengths.
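The read targets above imply how the sequencing run should be divided between the two pooled libraries. The following Python sketch shows only that arithmetic; the function name is an assumption, the sample numbers passed to it are hypothetical, and the adjustment for fragment size described above is deliberately not modelled.

```python
def read_shares(n_metabarcoding, n_barcoding, reads_meta=12_000, reads_barcode=2_000):
    """Reads required, and the share of the run for each pooled library.

    Defaults follow the per-sample read targets stated in the text.
    Fragment-size bias during sequencing is NOT accounted for here.
    """
    meta_total = n_metabarcoding * reads_meta
    barcode_total = n_barcoding * reads_barcode
    total = meta_total + barcode_total
    return {
        "total_reads": total,
        "metabarcoding_share": meta_total / total,
        "barcoding_share": barcode_total / total,
    }
```

For example, with an equal (hypothetical) number of samples in each library, the metabarcoding pool would need six sevenths of the reads, whatever the number of samples.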


All of the data was demultiplexed into individual samples using the primer tags through NAPdemux (Creedy, 2019). The data was assembled into pairwise reads and quality filtered using Pear, with a quality score of 26, and then further quality filtered using fastx, with an eemax value of 1, removing sequences with Ns resulting from incomplete matching of paired reads. The merged and quality filtered CO1 fly barcodes were further processed using the NAPselect tool, in the same method used and tested in Chapter 2 and Creedy et al., (2019). The most abundant unique read was blasted against the UK syrphid reference database developed in Chapter 2 to ensure that the sequence was a syrphid, and to give a species identification. Those species which were unable to be distinguished in Chapter 2 were identified to genus level.
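The step of selecting the most abundant unique read for each specimen can be illustrated with a short Python sketch. This is not the NAPselect implementation, whose internals are not described here; the function name and the choice to discard reads containing Ns inside the same function are assumptions made for the example.

```python
from collections import Counter


def most_abundant_read(reads):
    """Return (sequence, count) for the most frequent unique read.

    Reads containing N are ignored. Returns None if no reads remain.
    """
    counts = Counter(read for read in reads if read and "N" not in read)
    if not counts:
        return None
    return counts.most_common(1)[0]
```

The sequence returned would then be compared against the reference database, with the count giving a simple indication of how strongly that read dominates the sample.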

The fly species presence and abundance were recorded for each sampling period. To account for sampling biases, this data was rarefied in R to give a more comparable dataset, after rarefaction parameters were investigated to remove the fewest samples whilst retaining as many fly species as possible. Fly diversity and composition were investigated using beta diversity, in the package betapart (Baselga and Orme, 2012), which was compared using DB-RDS for site, site-type and round.
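To make the beta diversity measure more concrete, the sketch below partitions pairwise Sørensen dissimilarity into its turnover and nestedness components, following the presence/absence framework on which betapart is based. It is written in Python for illustration only; the function name is an assumption, and whether the pairwise or multiple-site form was used in this analysis is not specified in the text.

```python
def beta_pair(site1, site2):
    """Partition Sørensen dissimilarity between two species lists.

    a = shared species, b and c = species unique to each site.
    Returns total dissimilarity, turnover (Simpson) and nestedness.
    """
    s1, s2 = set(site1), set(site2)
    if not s1 or not s2:
        raise ValueError("both sites must contain at least one species")
    a = len(s1 & s2)
    b = len(s1 - s2)
    c = len(s2 - s1)
    sorensen = (b + c) / (2 * a + b + c)
    turnover = min(b, c) / (a + min(b, c))
    return {
        "sorensen": sorensen,
        "turnover": turnover,
        "nestedness": sorensen - turnover,
    }
```

A site whose species are a strict subset of another gives zero turnover and a purely nested difference, whereas two sites that each hold species absent from the other show turnover.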

Species presence and abundance was plotted over time, site and site-type. Alongside this, the larval life histories for each species were researched (Ball and Morris, 2015) and the species grouped into seven functional categories: aphid predators, other insect predators, aquatic, compost/dung, fungi, myrmecophiles, and roots/bulb. The abundance and diversity of these groups was plotted over time, site and site-type (categorized as agricultural, urban or semi-natural).

ITS2 Metabarcoding of gut pollen

Previous studies obtained pollen from washing the pollen from the bodies of insects (Lucas et al., 2018). This method is useful and informative when assessing potential pollination networks, as the pollen on the body of an insect is available for transfer to the next flower visited. Here, the pollen was taken by dissecting the gut of the fly for DNA analysis. Although this makes these networks less of a proxy for pollination, it enables a more detailed study into which floral resources are important in the diet of syrphids (Holloway, 1976).

Primer name | Primer | Tag
ITS2-SF | ATGCGATACTTGGTGTGAAT | CTTGGA, GGTGTT, TTGACG, CCTTAC, CTACTC, CAGCTT, ATGCGA, ACTGGT, ACACAC, CATAGG, GTTCGT
ITS4 | TCCTCCGCTTATTGATATGC | CTTGGA, GGTGTT, TTGACG, CCTTAC, CTACTC, CAGCTT, ATGCGA, ACTGGT, ACACAC, CATAGG, GTTCGT
CO1 FoldegR | TANACYTCNGGRTGNCCRAARAAYCA | AACACC, AACGGA, GTACAG, ACGTGT, AAGAGG, AAGCCT, AAGGTC, AATCGC, ACAACG, AGCTAC, AGTGAG
CO1 Ill_B_F | CCNGAYATRGCNTTYCCNCG | AACACC, AACGGA, GTACAG, ACGTGT, AAGAGG, AAGCCT, AAGGTC, AATCGC, ACAACG, AGCTAC, AGTGAG

Table 1. The primers and tags used for ITS2 plant metabarcoding and CO1 insect barcoding. Each 96 well plate used a different six base pair tag, and overall 11 ITS2 tags and 11 CO1 tags were used.

Metabarcoding of gut contents was carried out on each specimen using the ITS2 barcode region. The primers used were ITS2-SF and ITS4, which have been used in previous pollen metabarcoding studies (Keller et al., 2015), and were shown in the trial to amplify pollen DNA successfully. Like the barcode primers, each primer contained a unique tag, so that the metabarcoding samples could be pooled together. The 6 base pair tags for the primers were randomly generated, and the tags, tails and primers for each primer pair were tested using the Thermofisher multiple primer analyser to check that they did not bind to each other. The tags and primers are shown in table 1.
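The role of the tags in separating pooled samples can be illustrated with a short Python sketch. This is not the NAPdemux tool: the function name is an assumption, and the sketch assumes for simplicity that each read begins with the six base pair tag followed directly by an exact copy of the forward primer, ignoring tails, reverse tags and sequencing errors.

```python
# ITS2 tags and forward primer (ITS2-SF) as listed in table 1.
ITS2_TAGS = [
    "CTTGGA", "GGTGTT", "TTGACG", "CCTTAC", "CTACTC", "CAGCTT",
    "ATGCGA", "ACTGGT", "ACACAC", "CATAGG", "GTTCGT",
]
ITS2_SF = "ATGCGATACTTGGTGTGAAT"


def demultiplex(reads, tags, primer, tag_length=6):
    """Sort reads into bins by their leading tag and trim tag and primer.

    Assumes an exact tag + primer prefix (a simplification).
    Returns (bins, unassigned), where bins maps tag -> trimmed reads.
    """
    bins = {tag: [] for tag in tags}
    unassigned = []
    for read in reads:
        tag, rest = read[:tag_length], read[tag_length:]
        if tag in bins and rest.startswith(primer):
            bins[tag].append(rest[len(primer):])
        else:
            unassigned.append(read)
    return bins, unassigned
```

Reads whose prefix matches no tag are kept in a separate list rather than discarded silently, so that the proportion of unassigned reads can be checked.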

Two PCR repeats were carried out for each sample to account for any stochastic variation in amplification (Dopheide et al., 2019). For each repeat, the Takara HS PCR reagents were used. This enzyme has higher accuracy of replication and reduces PCR errors that create artefactual variation, and so is more suitable for metabarcoding analysis. The PCR reaction contained 1.36µl of Takara buffer, 0.17µl of Takara dNTPs, 0.51µl each of the primers, 0.068µl of Takara Taq, and 2µl of DNA, with ddH2O added to give a total reaction volume of 17µl. PCR conditions were tested to choose a suitable number of cycles. A high number of cycles can cause overamplification of high abundance species in a metabarcoding sample, obscuring rarer species, but too few cycles can result in low levels of amplification of all species in a sample (Casbon et al., 2011; Nichols et al., 2018). 37, 35, 32 and 29 cycles were tested for three samples and a negative control, with the top number taken from Keller et al., (2015). The final PCR was carried out following the protocol from Keller et al., (2015), except with the lowest successful cycle number. This resulted in an initial denaturation at 94°C for 4 minutes, and then 29 cycles of 94°C for 45 seconds, 49°C for 45 seconds and 72°C for 1.30 minutes, followed by a final elongation step at 72°C for 10 minutes. The PCR products were visualised using electrophoresis.

15µl of each repeat was pooled for each sample, and each gel electrophoresis band categorized as strong or weak. Three representative samples from each category were quantified using the Qubit HS. The strong samples were diluted to match the average concentration of the weak samples, and all samples were then pooled together to a final volume of 110µl. The pooled metabarcoding sample was purified using AMPure XP beads to retain fragments over 100 base pairs, and then quantified using the Qubit HS. The samples were pooled with the CO1 barcodes and sequenced as described above.

The ITS2 barcodes were demultiplexed into individual samples using NAPdemux (Creedy, 2019) and quality filtered in the same way as the CO1 barcodes. The merged ITS2 metabarcoding sequences were processed through the NAPcluster pipeline. This utilised Usearch80 to dereplicate the sequences so that only unique sequences remained, and then clustered these unique sequences at 97% similarity into OTUs. Then all of the sequences were mapped back onto the OTUs, to give a table of reads per OTU for each of the samples.
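The three steps described above (dereplication, clustering at 97% similarity, and mapping reads back to OTUs) can be sketched in Python as follows. This is a simplified greedy clustering written for illustration and is not the algorithm implemented in Usearch or NAPcluster: the function names are assumptions, identity is measured position by position on sequences assumed to be aligned and of equal length, and no chimera filtering or denoising is performed.

```python
from collections import Counter


def identity(a, b):
    """Proportion of matching positions between two equal-length sequences."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_otus(reads_by_sample, threshold=0.97):
    """Build a reads-per-OTU table from a dict of sample -> list of reads."""
    # 1. Dereplicate: count each unique sequence across all samples.
    abundance = Counter(r for reads in reads_by_sample.values() for r in reads)

    # 2. Greedy clustering, most abundant first: a sequence founds a new OTU
    #    only if it is below the threshold to every existing centroid.
    centroids = []
    for seq, _count in abundance.most_common():
        if not any(identity(seq, c) >= threshold for c in centroids):
            centroids.append(seq)
    names = {seq: f"OTU{i}" for i, seq in enumerate(centroids, start=1)}

    # 3. Map every read back to its closest centroid.
    table = {sample: Counter() for sample in reads_by_sample}
    for sample, reads in reads_by_sample.items():
        for read in reads:
            best = max(centroids, key=lambda c: identity(read, c))
            if identity(read, best) >= threshold:
                table[sample][names[best]] += 1
    return names, table
```

Processing sequences in order of abundance means that the most common sequence in each cluster becomes its representative, which is the usual rationale for greedy approaches of this kind.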

Before the final NAPcluster was run, parameter exploration was used to test how different parameters affected the number of OTUs produced. This was especially important for ITS2 as there are often multiple copies of the region in plant genomes, and there can be considerable length variation. The final parameters for NAPcluster used a length of 350 with denoising, and a length variation of 5%, and were used to obtain the OTU sequences and OTU table. The OTU sequences were blasted against the GenBank database, and then inputted into MEGAN to obtain a well-supported identification for each OTU. OTUs which did not identify as plants were removed from the dataset.

The OTU data was rarefied to reduce sampling bias. Different rarefaction points were investigated, and a parameter selected which removed the fewest plant OTUs whilst retaining the most samples. The data was grouped into each sampling period rather than individual fly specimens, to investigate the beta diversity of plants visited in each sampling round. This was done using the same method as for the flies, with betapart in R and for site, round and site-type. The plant OTUs found in the pollen were compared to the plant lists recorded in each site for each sampling round, to see the similarity between the two plant lists.
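Rarefaction to a fixed depth can be sketched as below. This Python example is illustrative and is not the R code used in the analysis: the function name, the data layout and the fixed random seed are assumptions. Samples with fewer reads than the chosen depth are dropped, which is the source of the trade-off between retaining samples and retaining OTUs described above.

```python
import random


def rarefy(otu_counts, depth, seed=0):
    """Randomly subsample each sample to `depth` reads without replacement.

    otu_counts: dict of sample -> dict of OTU -> read count.
    Samples with fewer than `depth` reads are removed.
    """
    rng = random.Random(seed)
    rarefied = {}
    for sample, counts in otu_counts.items():
        # Expand counts into one entry per read, then draw `depth` of them.
        pool = [otu for otu, n in counts.items() for _ in range(n)]
        if len(pool) < depth:
            continue
        drawn = rng.sample(pool, depth)
        rarefied[sample] = {otu: drawn.count(otu) for otu in set(drawn)}
    return rarefied
```

Running the function over a range of depths and counting the samples and OTUs that remain at each one gives a simple way to explore the rarefaction point.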

Plant-pollen networks

Networks were created using the data for each individual fly, so that the bipartite networks were on an individual rather than species level. The networks were created using the bipartite package in R (Dormann, Gruber and Fründ, 2008), which was created specifically for pollination networks and so is well suited to this data. A network was created for each sample point, so that overall 44 networks were produced. The network indices nestedness, weighted nestedness, Shannon diversity and linkage density were calculated in the bipartite package. ggplot2 (Wickham, 2016) was used to produce boxplots for each network index over site-type and sampling round, and an ANOVA was used to check the significance of site-type and round on each index. If significance was found, a Tukey test was used to investigate where the significance was.
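Of the indices listed above, Shannon diversity is the simplest to illustrate. The Python sketch below computes the Shannon diversity of interactions, H = -Σ p ln p, where p is the proportion of all reads falling in each fly-plant link. It is intended only to show the calculation: the function name and data layout are assumptions, the plant labels in the usage example are placeholders, and nestedness and linkage density are not shown.

```python
import math


def interaction_shannon(network):
    """Shannon diversity of interactions for a weighted bipartite network.

    network: dict of fly -> dict of plant OTU -> interaction weight (reads).
    Each non-zero fly-plant link is treated as one category.
    """
    weights = [w for links in network.values() for w in links.values() if w > 0]
    total = sum(weights)
    if total == 0:
        return 0.0
    return -sum((w / total) * math.log(w / total) for w in weights)
```

For instance, `interaction_shannon({"fly1": {"plantA": 10, "plantB": 10}, "fly2": {"plantA": 10, "plantC": 10}})` contains four equally weighted links and so returns ln 4, while a network with a single link has a diversity of zero.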

Results

Testing of primers ITS2, rbcL and trnL

All of the plant barcodes were successfully amplified and sequenced. The rbcL fragment was 600bp long, which meant that the forward and reverse sequences could not be assembled, and so forward reads alone were used for further analysis. The trnL fragment was the shortest of the three genes, at 250bp long, potentially decreasing the power for separating species. Both ITS2 and trnL identified a greater number of plant families than rbcL, and only one plant family was identified solely by rbcL. The rbcL database for British flowering plants is very complete (de Vere et al., 2012), and so was unlikely to be the reason for this discrepancy. It is more likely that the inability to assemble the rbcL fragment reduced the barcoding power of this gene. In order to compare the three barcodes, plants were only identified to family level because of the different levels of identification between the three genes.

CO1 barcoding of syrphid flies

Out of 1041 fly specimens which were sequenced, 943 were given identified sequence barcodes by NAPselect using the UK syrphid reference database from Chapter 2, with 6.9% of specimens having no ID. After rarefaction 38 species of fly were retained, belonging to 22 genera (table 2). During rarefaction to 4 individuals per species per site, 21 species and six genera were lost from the dataset.


Genus | Species
Cheilosia | Cheilosia impressa, Cheilosia ranunculi, Cheilosia soror
Chrysotoxum | Chrysotoxum festivum
Epistrophe | Epistrophe eligans
Episyrphus | Episyrphus balteatus
Eristalis | Eristalis arbustorum, Eristalis horticola, Eristalis nemorum, Eristalis pertinax, Eristalis tenax
Eumerus | Eumerus funeralis
Eupeodes | Eupeodes corollae, Eupeodes luniger
Helophilus | Helophilus pendulus
Leucozona | Leucozona lucorum
Melangyna | Melangyna compositarum
Melanostoma | Melanostoma mellinum/scalare
Meliscaeva | Meliscaeva auricollis
Merodon | Merodon equestris
Myathropa | Myathropa florea
Paragus | Paragus haemorrhous
Platycheirus | Platycheirus albimanus, Platycheirus clypeatus, Platycheirus granditarsis, Platycheirus occultus, Platycheirus peltatus, Platycheirus rosarum, Platycheirus scutatus
Rhingia | Rhingia campestris
Sphaerophoria | Sphaerophoria interrupta
Syritta | Syritta pipiens
Syrphus | Syrphus ribesii, Syrphus torvus, Syrphus vitripennis
Xanthandrus | Xanthandrus comtus
Xanthogramma | Xanthogramma pedissequum

Table 2. All of the genera found in the samples after rarefaction and the species from each genus present. Overall 22 genera were found.

Fly species beta diversity was significantly influenced by site-type (P < 0.03) and round (P < 0.001), but not by individual site (figure 2). This was in contrast to the plant pollen beta diversity, which was significantly influenced by round but not by site-type or site (figure 3). This suggests that the difference in fly species composition between site-type is not driven by plant composition.

Figure 2. DB-RDS plots showing the total beta diversity for fly species over each sample site and round. The DB-RDS was run for A. site-type, B. site and C. round.

Figure 3. DB-RDS plots showing the total beta diversity for plant OTUs over each sample site and round. The DB-RDS was run for A. site-type, B. site and C. round.

Fly species presence and abundance were tracked over the five sampling periods. Overall more species were found later in the summer (table 3), particularly in August. This was also the month with the most unique species (species not present in any other sampling round), although all of the sampling rounds contained unique species, as can be seen in table 3. Figure 4 shows the species abundance over sampling rounds for non-unique species. For many species there is a peak in abundance in June. However, for some species their abundance peaks in other months, for example Syritta pipiens abundance peaks in August. Also, for some species such as Helophilus pendulus there is a double peak in abundance, one in June and another in August. The most abundant species found in the dataset, Episyrphus balteatus, dominates the July sampling period, with most other species declining or not present. The flies were sampled between May and September, and so there may also be peaks earlier in the spring or later in the autumn which were not captured by the data here.


Round | Unique species | Non-unique species | Total species
May | 3 | 9 | 12
June | 2 | 14 | 16
July | 3 | 8 | 11
August | 6 | 13 | 19
September | 5 | 11 | 16

Table 3. The number of species found during each sampling round, split into those sampled in more than one round and those unique to one round of sampling.
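The split between unique and non-unique species in table 3 follows from simple set operations on the species list for each round. The Python sketch below shows the calculation; the function name is an assumption and the species labels used to test it are placeholders rather than data from this study.

```python
def unique_and_shared(species_by_round):
    """Count unique, non-unique and total species for each sampling round.

    species_by_round: dict of round name -> iterable of species names.
    A species is unique to a round if it occurs in no other round.
    """
    summary = {}
    for rnd, species in species_by_round.items():
        present = set(species)
        # Pool the species recorded in every other round.
        others = set().union(
            *(set(s) for r, s in species_by_round.items() if r != rnd)
        )
        unique = present - others
        summary[rnd] = {
            "unique": len(unique),
            "non_unique": len(present) - len(unique),
            "total": len(present),
        }
    return summary
```

By construction, the unique and non-unique counts for each round sum to the total, as they do in each row of table 3.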

Figure 4. Graph showing how fly species abundance changed over the five sampling rounds (May to September) for each species. Only species found in more than one time period were included on the graph. The species names are shown in the figure key. Standard deviation error bars are shown on the graph.


The presence and abundance were also plotted for fly species over the three site-types (figure 5). All of the site-types contained around the same number of unique species per site, and the same total number of species in each site, showing that species number and uniqueness was not what resulted in differences between the site-types. Figure 5 shows the diversity and abundance of flies in each site-type, and it clearly shows that urban and agricultural sites were dominated by Episyrphus balteatus, which was present at much lower numbers in the semi-natural sites. The spread of species abundance is more even in the semi-natural sites, which are not dominated by a few abundant species.

[Figure 5: heatmap of species abundance (colour key, 0 to 6) by site-type (columns: semi-natural, urban, agricultural), with one row per species. Species shown: Xanthogramma pedissequum, Xanthandrus comtus, Syrphus vitripennis, Syrphus torvus, Syrphus ribesii, Syritta pipiens, Sphaerophoria interrupta, Rhingia campestris, Platycheirus scutatus, Platycheirus rosarum, Platycheirus peltatus, Platycheirus occultus, Platycheirus granditarsis, Platycheirus clypeatus, Platycheirus albimanus, Paragus haemorrhous, Myathropa florea, Merodon equestris, Meliscaeva auricollis, Melanostoma mellinum/scalare, Melangyna compositarum, Leucozona lucorum, Helophilus pendulus, Helophilus hybridus, Eupeodes luniger, Eupeodes corollae, Eumerus funeralis, Eristalis tenax, Eristalis pertinax, Eristalis nemorum, Eristalis horticola, Eristalis arbustorum, Episyrphus balteatus, Epistrophe eligans, Chrysotoxum festivum, Cheilosia soror, Cheilosia ranunculi, Cheilosia impressa.]

Figure 5. Heatmap showing fly species diversity and abundance over the different site-types, for species found in more than one site. The darker blue the boxes, the higher the abundance of that species in that site type, as indicated by the figure key.

As well as species diversity and abundance, the flies were grouped into larval life history categories (shown in table 4) and these were used to look at how larval diversity was spread over sampling rounds, sites and site-type. Looking at larval life history across sampling rounds in figure 6, the aphid predators are by far the most abundant type. However, in September there is a peak in aquatic life histories which overtakes the aphid predators. The large peak for aphid predators in July matches the peak in Episyrphus balteatus numbers, so the trend of this group is likely driven by the most abundant species in the dataset. There is also a peak in the compost/dung larval life histories in August. The larval life history is also shown across the three site-types (figure 7), which shows that these site-types are also dominated by aphidophagous species, except for the semi-natural sites, where aquatic species are more abundant. Overall, urban sites appear to have the most diverse larval life histories, with all categories present.

Category                 Species
Aphidophagous            Syrphus vitripennis, Syrphus torvus, Syrphus ribesii, Sphaerophoria interrupta, Platycheirus scutatus, Platycheirus rosarum, Platycheirus peltatus, Platycheirus occultus, Platycheirus granditarsis, Platycheirus clypeatus, Platycheirus albimanus, Meliscaeva auricollis, Melangyna compositarum, Leucozona lucorum, Eupeodes corollae, Episyrphus balteatus, Epistrophe eligans
Aquatic                  Helophilus pendulus, Helophilus hybridus, Eupeodes luniger, Eristalis tenax, Eristalis pertinax, Eristalis nemorum, Eristalis arbustorum, Eristalis horticola, Eristalinus sepulchralis, Myathropa florea
Compost/dung             Syritta pipiens, Rhingia campestris
Fungi                    Cheilosia soror
Myrmophiles              Chrysotoxum festivum, Xanthogramma pedissequum
Other invert predators   Xanthandrus comtus, Melanostoma mellinum/scalare
Roots/bulbs              Eumerus funeralis, Cheilosia ranunculi, Cheilosia impressa, Merodon equestris

Table 4. Table showing the seven larval life history categories that species were separated into, and a list of all of the species assigned to each category.

[Figure 6: line graph of abundance (y-axis, 0 to 15) against sampling round (x-axis: May, June, July, August, September). Key: aphid predators, aquatic, compost/dung/manure, myrmophiles, other invert predators, roots/bulbs, scavengers.]

Figure 6. Graph showing how the abundance of each larval life history changes over the summer months. Each larval life history category is represented by a different colour, shown in the figure key. Error bars show standard deviation.

[Figure 7: bar graph of abundance (y-axis, 0 to 30) by site-type (x-axis: semi-natural, urban, agricultural). Key: aphid predators, aquatic, compost/dung/manure, myrmophiles, other invert predators, roots/bulbs, scavengers.]

Figure 7. Bar graph showing the abundance of individuals belonging to the different larval life history categories across the three site-types. The different larval life histories are coloured separately, as shown in the key. Standard deviation error bars are shown.

ITS2 Metabarcoding of gut pollen

Several different parameters were investigated during the NAPcluster analysis of the ITS2 metabarcoding data. Default parameters resulted in a large number of OTUs (1257 total, with 1141 plant OTUs). Some of these OTUs were found to belong to the same species or genus and are likely to be repeats. Parameter exploration found that reducing the amount of variation around the sequence length to 5% resulted in 1016 OTUs, 922 of which were identified as plant OTUs, and also resulted in fewer OTUs which were identified as repeats. The OTUs retrieved from the gut contents were filtered to remove non-plant OTUs. These were mostly fungi, along with two OTUs blasting to a mite and a spider. Overall there were 922 plant OTUs, which were identified as belonging to 32 different plant families, alongside several others which could not be identified to family level (figure 8). The most abundant plant family in the dataset was Asteraceae, with a total of 136 OTUs. For most of the families, the OTU richness was low overall, with a few samples containing most of the richness of that family. However, for a few families (Asteraceae, Ranunculaceae, Apiaceae and Plantaginaceae), there were higher levels of OTU richness across many of the samples (figure 8). There were also six families with only one OTU present.

[Figure 8: boxplots of OTU richness per sample (y-axis, 0 to 100) by taxonomic group (x-axis), with the total number of OTUs per group printed on the plot. Groups shown: Unknown, Apiaceae, Taxaceae, Oleaceae, Araliaceae, Rubiaceae, Lamiaceae, Cornaceae, Adoxaceae, Asteraceae, Solanaceae, Primulaceae, Verbenaceae, Aquifoliaceae, Apocynaceae, Bignoniaceae, Boraginaceae, Polygonaceae, Papaveraceae, Caprifoliaceae, Cupressaceae, Balsaminaceae, Plantaginaceae, Ranunculaceae, Hydrangeaceae, Convolvulaceae, Orobanchaceae, Campanulaceae, Hydrophyllaceae, Caryophyllaceae, Plumbaginaceae, Chenopodiaceae, Scrophulariaceae, other or unknown eukaryote, other or unknown vascular plant.]

Figure 8. Graph showing the taxonomic makeup of the plant OTUs in all of the samples. Plants are grouped at family level. The boxplots show the average OTU richness per sample of the different plant families, and the total number of OTUs in each family is shown on the graph.


The plant OTUs were compared to the plant lists taken from each site and sampling round. Overall, 191 plants were recorded from the sampling sites, which was much lower than the 922 plant OTUs detected. Only 68 of the plants recorded from the sites were found in the pollen samples, suggesting that there was a large mismatch between the plants recorded and the plants visited. There were also a large number of OTUs found in the pollen that were not present in the plant survey.

Plant-pollen networks

Networks were created for each sampling event, giving 44 networks in total. For each network five indices were calculated, which were compared across sampling rounds. Nestedness differed significantly between rounds (p<0.005), with round 3 having significantly lower nestedness than all the other rounds: round 1 (p<0.0001), round 2 (p<0.005), round 4 (p<0.01) and round 5 (p<0.001) (figure 9a). When nestedness was weighted to include frequency of interactions, round 1 was significantly lower than rounds 3 (p<0.0005), 4 (p<0.01) and 5 (p<0.0005), and round 2 significantly lower than rounds 3 (p<0.005) and 5 (p<0.005). Shannon diversity increased in the middle of the summer (figure 9c), with round 3 significantly higher than rounds 1 (p<0.05), 2 (p<0.005) and 5 (p<0.05). The last network index calculated for each network was linkage density. The boxplots in figure 9d show a potential increase in linkage density in the middle of the summer, but this is not significant.
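The text does not give formulas for the network indices, so the following is only a minimal sketch of two of them for a single fly-plant network. It assumes Shannon diversity is calculated over interaction frequencies, and it uses a simple unweighted count of links per species as a stand-in for linkage density, which may differ from the weighted measure used in the analysis. The matrix values and function names are invented for illustration.

```python
import math

def shannon_diversity(matrix):
    """Shannon diversity of interactions: H = -sum(p * ln p) over non-zero cells."""
    total = sum(sum(row) for row in matrix)
    h = 0.0
    for row in matrix:
        for count in row:
            if count > 0:
                p = count / total
                h -= p * math.log(p)
    return h

def links_per_species(matrix):
    """Realised links divided by the number of nodes (rows plus columns).

    Unweighted stand-in for linkage density (an assumption of this sketch).
    """
    links = sum(1 for row in matrix for count in row if count > 0)
    return links / (len(matrix) + len(matrix[0]))

# Hypothetical toy network: rows are individual flies, columns are plant OTUs,
# cells are interaction frequencies (invented values).
toy_network = [
    [10, 0, 0],
    [5, 5, 0],
    [0, 0, 10],
]

h = shannon_diversity(toy_network)
lps = links_per_species(toy_network)
print(f"Shannon diversity: {h:.3f}, links per species: {lps:.3f}")
```

A network in which every fly carries pollen from a single plant gives a low Shannon diversity, whereas evenly spread interactions give a higher value.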

[Figure 9: four boxplot panels, each with sampling round (round1 to round5) on the x-axis. Panel y-axes: A. nestedness; B. weighted nestedness; C. Shannon diversity; D. linkage density.]

Figure 9. Boxplots showing the network indices for each network over the five sampling rounds. The network indices shown are: A. Nestedness, B. Weighted nestedness, C. Shannon diversity, D. Linkage density.

[Figure 10: four boxplot panels, each with site-type (semi-natural, urban, agricultural) on the x-axis. Panel y-axes: A. nestedness; B. weighted nestedness; C. Shannon diversity; D. linkage density.]

Figure 10. Boxplots showing the network indices for each network over the three site-types. The network indices shown are: A. Nestedness, B. Weighted nestedness, C. Shannon diversity, D. Linkage density.

These network indices were also compared over the three site-types (figure 10). In this case nestedness differed significantly between site-types (p<0.05), with agricultural sites having significantly higher nestedness than semi-natural sites (p<0.005). Weighted nestedness was significantly higher in semi-natural sites than in agricultural (p<0.05) and urban sites (p<0.05) (figure 10b), and Shannon diversity was significantly lower in agricultural sites than in urban (p<0.005) and semi-natural sites (p<0.05). Linkage density did not differ significantly between the three site-types, although in figure 10d it does appear to be lower for agricultural sites, suggesting these had fewer links per species. Overall linkage density varied between 1 and 9, with an average of 3: some networks had an average of only a single link, and others up to 9 links.

Discussion

Testing of primers ITS2, rbcL and trnL

All of the primers successfully amplified pollen DNA from the syrphids. However, the rbcL primers chosen resulted in a long fragment that could not be merged after sequencing and did not identify as many OTUs as the other two barcodes. The trnL fragment was short and therefore likely to be less able to separate groups, and so ITS2 was selected as the barcode of choice for the main study. Ideally when studying plants, more than one barcode will be used, as barcode distances are not constant (Bell et al., 2016). However, the size of the study meant that this was not possible, and many recent plant barcoding studies use a single gene (de Vere et al., 2017; Lucas et al., 2018). There are some issues with using ITS2, mainly that it varies in length and that there can be multiple copies in a single plant genome. These issues were minimised by exploring the clustering parameters when clustering the OTUs, and by stringent quality filtering of the data prior to clustering.

CO1 barcoding of syrphid flies

Overall there were 22 genera and 38 species of syrphid collected over the sampling period, from an area of 1 km². Most of the species were collected in small numbers, with Episyrphus balteatus dominating the sampling, followed by Syritta pipiens. Fly species diversity was influenced by round and site-type, unlike pollen composition, which was only affected by round. This suggests that pollen resources may have been a factor influencing the change in species composition over the summer months, but they do not explain the differences in species composition between the three site-types. This highlights the importance of factors other than floral resources for syrphid pollinators, as it appears that differences in larval habitat availability may be the driver behind species composition differences between sites, with semi-natural sites dominated by aquatic species rather than aphidophagous species. These differences between site-types match the ecology of the sites, as the semi-natural sites were both marshland, which is wet with many transient and permanent water bodies that could provide habitat for aquatic species. Although dominated by aphidophagous species, urban sites had the most diverse larval life histories of the site-types, potentially providing habitats including ponds, compost bins, wood piles and more, which may be important in supporting a wide range of species (Geslin et al., 2013). Previous research has shown that different pollinator groups react differently to changes in land use (Jauker et al., 2009; Bates et al., 2011), and this study shows that syrphid species diversity may be influenced by larval habitat even where floral resources do not differ. This highlights the importance of including this larval information in future studies into syrphid ecology, as it has an effect on the responses of syrphids to a changing landscape (Sutherland, Sullivan and Poppy, 2001), and may help to explain larger scale differences in syrphid and bee populations (Powney et al., 2019).

It should also be noted, however, that in this study all of the sites were situated close together, meaning that individuals could potentially travel between sites. Also, sites of the same type were sometimes closer to each other than to sites of a different type, meaning that site-type effects could be a result of spatial differences rather than habitat type. This study demonstrates that DNA barcoding of insects and metabarcoding of pollen can be done simultaneously, and thus shows the potential for large scale ecological studies on pollinators. However, in this study the configuration of the sample sites makes ecological analysis challenging. Further work is needed to understand Syrphidae pollinators and how they respond to land use change.

ITS2 Metabarcoding of gut pollen

Around 900 plant OTUs were found in the pollen, although the exact number of plants is difficult to identify, because OTUs do not always equate to species and it is highly likely that there are some copies of OTUs in the data. However, these were minimised using clustering parameter exploration, and the dataset is reliable enough to compare diversity between sites and individuals. This is an issue with all metabarcoding but is likely to be a greater issue when using ITS2 due to its length variation and repeats.

Whilst Chapter 2 provided a curated reference library to identify the syrphid species caught in this study, there was no equivalent ITS2 reference library for UK plants, meaning that the NCBI database was used to identify the plant species present. This has limitations in a number of ways, firstly because it is not known whether all of the plant species present in the pollen are represented in NCBI. This means that these plant species would be overlooked in the analysis. This was partly accounted for by using plant families rather than species, so that all barcodes could be identified to family level, but for families with large numbers of species present this meant that their diversity was lost. Secondly, Chapter 2 showed that online databases are often messy and contain sequences from different countries as well as potentially incorrectly identified barcodes which confound the analysis. This means that whilst the diversity of the pollen samples can be compared between individuals and samples, the pollen diversity cannot be compared with other studies as the identification database will not be the same.

Overall, 32 plant families were detected in the samples, suggesting that the syrphids were visiting a high diversity of flowering plants. Most of these were present at low numbers in the dataset, and so the flies were not visiting huge numbers of species from the same family, but rather flowers from a diversity of different families. This supports the view of syrphids as generalist pollinators (Branquart and Hemptinne, 2000). The most abundant family in the dataset was Asteraceae, which contained far more OTUs than the other families and was found in many samples. This family has a diversity of floral morphologies but includes those with open flowers with easily accessible pollen such as oxeye daisy (Leucanthemum vulgare), which was present in many of the sites. This abundant family supports the view that syrphids are more likely to visit plants with simple, open flower morphologies so that they have easy access to pollen and nectar resources (Gilbert, 1981; Geslin et al., 2013), as the mouthpart morphology of many species limits the flowers that they can access (van Rijn and Wäckers, 2016).

Not all of the plants surveyed during the sampling were detected in the pollen, again suggesting that although syrphids are generalist, they do not visit all flowers in an area. This may be because some flowers are unattractive to syrphids, or less attractive than other flowers in the site. This could be due to the morphology of the flowers (Gilbert, 1981; Garbuzov, Samuelson and Ratnieks, 2015) or because of the nutritional value of the pollen (Amorós-Jiménez et al., 2014). It could also be the case that some plants surveyed are present in the pollen but the ITS2 database on GenBank was not complete enough to identify them to genus or species level.

As well as surveyed plants not found in the pollen, there were also plant OTUs detected in the pollen which were not surveyed during collection. Most of these plants are likely to be species present outside of the collection area, indicating that the flies were travelling further distances. Some of the plants were also overlooked during the plant survey. This is especially true of the several tree species which were found in the pollen. Several different tree species including , willow, ash and yew were found in the pollen samples, indicating that these trees that often put out large amounts of pollen could be an important pollen resource for syrphid flies. This highlights the potential importance of pollen resources from wind pollinated plants and trees, which may form an important part of syrphid pollen diets (Saunders, 2018).

Plant-pollen networks

The networks are not true pollination networks, as these require exclusion experiments to ensure that seed set has taken place (Ballantyne, Baldock and Willmer, 2015). Other studies have looked at visitation networks using pollen washed from the insect's body (Lucas et al., 2018), as this pollen has the potential to be transferred to another plant and result in pollination. In this study, pollen from the gut was used rather than from the body. This pollen cannot contribute to pollination because it has been ingested by the fly; however, it gives a clear picture of fly resource use. This study therefore shows not only which plants the flies are visiting, but which plants are important floral food sources.

The average number of links per species varied between networks, but not significantly, and overall was low at around 3. This suggests that although overall the plant species visited were diverse, individuals tended to be specific in their diets (Tur et al., 2014). It could be that individuals specialise on certain flowers to be efficient, as has been shown to be the case in some bee species (Gruter and Ratnieks, 2011). It could also be that if these flies were surveyed for long time periods, they would visit several different types of flowers, but they do not switch plant species frequently enough for those species to be detected in pollen in the gut.

The third round of sampling in August produced networks with lower nestedness and higher Shannon diversity than the rest of the sampling rounds, suggesting that in the height of summer networks are more diverse and have fewer nested links. However, when nestedness was weighted for the size of the networks, it became significantly higher in August, suggesting that nestedness is highly influenced by the size of networks, as larger networks have the opportunity for more links. This is important at a species level because the more nested a network is, the more likely it is to be resilient to change, as no plants are reliant on particular pollinators. In this individual-level network, however, higher nestedness is an indication that individuals in the network are visiting the same plant species, thus increasing competition. In the summer months, when more flies and more species were around, an increase in competition is highly likely. However, this may not be a negative from a pollination perspective, as previous studies have shown that an increase in competition can increase the number of individual flowers an insect visits, maximising the spread of pollen (Sapir et al., 2017). This increase in competition may also explain the increase in Shannon diversity in August. These networks contained a higher diversity of links that were more even, meaning a more uniform proportion of pollen in flies’ guts from different sources.

The networks also varied between site-types, with the agricultural sites having a higher nestedness and lower Shannon diversity than urban and semi-natural sites, as well as fewer average links. This suggests that these sites have less diverse networks with fewer links between species and more nested links than the urban and semi-natural sites. Agricultural sites are often dominated by a few plant species, and so may have a lower diversity of network links than urban and semi-natural sites. However, because there was not a significant effect of site-type on pollen diversity, these differences between the networks are likely to be driven by the differences in fly species diversity, and also potentially by the behaviour of the individual flies, which may vary between site-types. These sites were all within a 1 km² area, and yet there are still significant differences between the habitat types, suggesting that there were barriers to dispersal (Lövei, Macleod and Hickman, 1998; Wratten et al., 2003) or that the short distances were enough to prevent mixing between site-types.

Studies of gut contents analysis have shown that the time DNA stays in the gut varies, with some studies finding a matter of hours (Millan et al., 2007) and some studies longer (Hosseini, Schmidt and Keller, 2008). Pollen is encased in a hard shell, and so it may remain longer in the gut than other types of food. The longer detection time helps to explain the pollen found from plants not on the survey list, as it is likely flies visited plants outside of the site and the pollen DNA still remains. The length of detection time in this study is not of importance, because there is enough time between sampling rounds for there not to be an overlap in detection, and travelling between sites may help to explain why there was no significant difference in plant beta diversity between sites and site-type.

Conclusion

This study shows that DNA can be used in a dual approach to survey fly species and pollen associations from a single DNA extraction. Fly beta diversity changes over the summer months, potentially influenced by changes to the floral resources available. However, the changes in fly species composition between semi-natural, urban and agricultural sites are not influenced by floral resources and are more likely affected by larval habitat availability, suggesting that the ecology of these syrphid pollinators is vital to understanding how they are affected by land use change. This study also shows that agricultural sites have less diverse and more nested networks than the other site-types, suggesting that networks in these sites are less robust and more vulnerable to further change. This study gives a detailed picture of how these complex mosaics of land use types, which make up a large proportion of the UK landscape, are influencing syrphid-plant interactions. The importance of urban and semi-natural sites for increasing fly diversity and network strength is clear, and future research should include larval life histories in order to fully understand how this family of pollinators is affected by land use and floral resources.

Chapter 4: Metabarcoding of pan trap bycatch to identify non-bee and non-syrphid flower visitors

Introduction

While current pollinator research has focused on bees, and to a lesser extent on the Diptera family Syrphidae, there are many other flower visiting insects which are also contributing to pollination. The importance of these insects to crop yields and ecology is unknown, as their contribution has not been measured, but Orford, Vaughan and Memmott (2015) found that non-syrphid Diptera had similar pollen-loads to syrphids. Research has shown that they probably make up a high proportion of flower visits (Rader et al., 2016), and thus should not be underestimated as pollinators. Alongside this, research into visitation networks, competition between species and disease transmission is incomplete without acknowledging the large diversity of flower visiting species.

This hidden pollinator diversity can be difficult to measure, because the species visiting flowers are extremely diverse and often difficult for volunteers to identify to species level. This means that the pollinator monitoring programme only categorises insects to higher levels during floral observations, obscuring species diversity. However, this diversity is also potentially collected during the pan trap monitoring in the pollinator monitoring scheme (Carvell et al., 2016). This pan trap monitoring was discussed in Chapter 2, when the syrphid pollinators from the pilot study were used to test the UK syrphid reference database. Alongside this, the bees from the same study were identified using DNA barcoding in the Creedy et al. (2019) study. This leaves the ‘bycatch’, defined as anything which is not a bee or a syrphid. This is a rich source of data on potential flower visitors which is currently not being utilised due to the constraints of performing taxonomic analysis.

The bulk pan trap samples could be identified by traditional taxonomists using morphology. However, diverse samples would have to be pre-sorted to separate groups and sent to relevant experts, a time-consuming step and an extra cost. Using DNA methods to identify these samples enables species identification quickly and without pre-sorting. DNA metabarcoding is a technique which allows identification of species from bulk samples, and has been used in many studies including soil biodiversity (Andújar et al., 2015) and hyper-diverse rainforest data (Creedy, Ng and Vogler, 2019). Bulk samples are taken and homogenised, before the barcode of choice is amplified and sequenced using next generation sequencing. This gives sequences of all of the species present in the dataset. For , the CO1 barcoding region is usually used for metabarcoding, using degenerate primers which will amplify DNA from all arthropod species.

As well as providing species identifications which would otherwise be time consuming to obtain, using metabarcoding to identify these species also gives more data than just a species identification, and allows phylogenetic diversity of the ‘bycatch’ species to be investigated and compared across sites and time, giving more information on the communities. Previous research has shown that bee phylogenetic diversity has decreased over time with changes in land use, and this may well have an impact on the communities’ resilience to further change. Little is known about these ‘bycatch’ flower visitors, and how diverse they are across different sites. Adding in a measure of phylogenetic diversity for a potentially highly diverse group provides more information on these potential pollinators and may give an indication of how they will respond to pressures such as land use change and climate change.

In this study pan trap samples from sites across the UK will be used to identify the taxonomy of these ‘bycatch’ flower visitors, to explore species and phylogenetic diversity. Previous research has focused on the high diversity of Diptera flower visitors (Orford, Vaughan and Memmott, 2015; Rader et al., 2016), and so the hypothesis that these non-syrphid Dipteran pollinators make up a large proportion of flower visitors will be explored. This data will then be used to look at whether species and phylogenetic diversity respond in the same way to differences across the sample sites and throughout the summer.

Methods

The samples were caught over 12 sites across five months in the summer of 2015, as part of the NPPMF pilot monitoring programme (Carvell et al., 2016). For each site a 25m transect was sampled, with five pan trap stations at 5m intervals along the transect. There were three pan traps at each station, one each of yellow, white and blue. During each sampling period pan traps were left out for four hours to sample the flower visiting invertebrate community. Once collected, the samples were stored in 70% ethanol and posted to the lab, where they were kept at -20°C. Bee and syrphid specimens were sorted and identified by expert taxonomists. The bees were kept separate, and the syrphid specimens returned to the bulk samples, to be sorted out for separate analysis at a later date. The ‘bycatch’ from the pan traps was stored at -20°C in the original 70% ethanol.

The samples were put into 5ml tubes, with some samples requiring more than one tube. These samples were dried using the SpeedVac to remove excess ethanol, which may interfere with DNA extraction. A 5mm metal bead was added to each tube and the samples were ground in the TissueLyser for 1 minute. 400µl of lysis buffer, made up of 350µl of ATL and 40µl of proteinase K, was added to the tubes so that the specimens were covered, and the tubes were incubated overnight at 56°C on a shaker at 75rpm. 100µl of the lysis buffer was taken from each tube, including each tube of samples that were split across multiple tubes. This meant that all the specimens were represented in the extraction. The DNA extraction was carried out following the protocol for the Qiagen DNA blood and tissue extraction kit.

PCR was carried out using the primers BF and foldR to amplify the 418bp barcoding region of CO1. The primers were tagged with the same 6bp tag on the forward and reverse so that they could be pooled with samples from other projects for sequencing, and separated post sequencing (Shokralla et al., 2015). PCR was carried out using 2.4µl of Takara buffer, 0.18µl of MgCl2, 0.3µl of dNTPs, 0.6µl of both forward and reverse primers, 0.1µl of Takara hot start Taq and 2µl of DNA. The volume was made up to 15µl with ddH2O. The PCR reaction conditions were an initial denaturation at 94°C for 4 minutes, followed by 29 cycles of 94°C for 30 seconds, 48°C for 30 seconds and 72°C for 3 minutes. The final elongation was carried out at 72°C for 10 minutes, and the PCR checked using electrophoresis. The PCR was repeated three times for each sample to account for stochasticity in the amplification, and then 10µl from each reaction was pooled. The samples were pooled with samples from other projects and run on a MiSeq 2x300.
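The reagent volumes listed above imply the volume of ddH2O needed to reach 15µl per reaction. The short sketch below works through that arithmetic and scales the mix, excluding template DNA, to the three replicate reactions run per sample. The function names are illustrative and are not taken from the thesis.

```python
# Reagent volumes per reaction in microlitres, as listed in the text.
reagents_ul = {
    "Takara buffer": 2.4,
    "MgCl2": 0.18,
    "dNTPs": 0.3,
    "forward primer": 0.6,
    "reverse primer": 0.6,
    "Takara hot start Taq": 0.1,
    "DNA": 2.0,
}
FINAL_VOLUME_UL = 15.0

def water_volume(reagents, final_volume):
    """Volume of ddH2O needed to bring one reaction up to the final volume."""
    return round(final_volume - sum(reagents.values()), 2)

def master_mix(reagents, final_volume, n_reactions):
    """Scale all components except template DNA to n reactions."""
    mix = {
        name: round(volume * n_reactions, 2)
        for name, volume in reagents.items()
        if name != "DNA"
    }
    mix["ddH2O"] = round(water_volume(reagents, final_volume) * n_reactions, 2)
    return mix

water = water_volume(reagents_ul, FINAL_VOLUME_UL)
mix_for_three = master_mix(reagents_ul, FINAL_VOLUME_UL, n_reactions=3)
print(f"ddH2O per reaction: {water} µl")
print(mix_for_three)
```

With the listed volumes the reagents total 6.18µl, so 8.82µl of ddH2O is needed per reaction.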

Post sequencing, the samples were de-multiplexed from the rest of the data using the 6bp tags, using cutadapt as part of the NAPdemux script. After this the primers were removed using the primer sequence in fastx_trimmer, and then the forward and reverse reads were paired using PEAR, with a quality score of 26. The samples were passed to fastq_filter to quality filter sequences with an eemax of <0.2. Metabarcoding analysis was run using NAPcluster, which clustered OTUs together using Usearch80 (Edgar, 2010) at 97% similarity. Strict filtering parameters were used, with a strict length filter of 418 and a minimum sequence number of 5. The strict length filter removed any sequences not exactly 418 base pairs long, which were highly likely to be pseudogenes or sequencing and PCR errors. The minimum sequence number removed very rare OTUs, which were highly likely to be PCR and sequencing errors.
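The three filters described above can be illustrated with a minimal sketch. It is not the NAPcluster or Usearch implementation. It assumes the standard definition of expected errors (the sum of per-base error probabilities derived from Phred scores), a strict threshold of below 0.2 as stated in the text, and an inclusive minimum of 5 sequences per OTU. All function names and toy inputs are hypothetical.

```python
def expected_errors(phred_scores):
    """Expected number of errors in a read: sum of 10^(-Q/10) over all bases."""
    return sum(10 ** (-q / 10) for q in phred_scores)

def passes_quality(phred_scores, max_ee=0.2):
    """Keep a read only if its expected errors fall below the threshold."""
    return expected_errors(phred_scores) < max_ee

def strict_length_filter(sequences, length=418):
    """Keep only sequences that are exactly `length` bases long."""
    return [seq for seq in sequences if len(seq) == length]

def min_size_filter(otu_sizes, min_size=5):
    """Drop OTUs supported by fewer than `min_size` sequences (assumed inclusive)."""
    return {otu: n for otu, n in otu_sizes.items() if n >= min_size}

# Invented toy inputs: a 418bp read with uniform quality Q40 or Q30.
high_quality = [40] * 418   # 418 * 0.0001 = 0.0418 expected errors
low_quality = [30] * 418    # 418 * 0.001  = 0.418 expected errors

print(passes_quality(high_quality))  # True
print(passes_quality(low_quality))   # False
print(len(strict_length_filter(["A" * 418, "A" * 417, "A" * 419])))  # 1
print(min_size_filter({"otu1": 120, "otu2": 5, "otu3": 4}))
```

A read of uniform Q30 over 418 bases already exceeds the threshold, which shows how stringent an expected error limit of 0.2 is for a fragment of this length.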

After clustering, different taxonomic assignment methods were used to identify the OTUs using two databases: the BOLD barcode database (Ratnasingham and Hebert, 2007) and the GenBank CO1 database. First the OTUs were searched against the GenBank database using BLAST, and identifications accepted if there was a >97% identity to a single species or a higher taxonomic classification. To identify using the BOLD database, all Insecta CO1 barcodes were downloaded from BOLD and used to create a custom BLAST database. As with the GenBank search, identities over 97% were accepted if there was no ambiguity. The GenBank BLAST output was also passed to MEGAN (Huson et al., 2016), where the identifications were accepted using the top 10% of hits to that species, with a minimum score of 50 and minimum support of 1. If these standards were not met, then a higher-level identification was given. Using MEGAN resulted in more conservative identification levels, which can reduce incorrect identifications, but may also result in correct species level identifications being discarded.
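The acceptance rule for the BLAST searches can be sketched as follows. This is only one reading of the rule, assuming that hits above 97% identity pointing to a single species give a species-level name, and that ambiguous hits fall back to a shared genus or are flagged for a higher-level identification. The hit format, function name and taxon names are hypothetical, and this is not how MEGAN assigns taxa.

```python
def assign_taxon(hits, min_identity=97.0):
    """Assign a taxon from BLAST-style hits.

    hits: list of (species, genus, percent_identity) tuples (hypothetical format).
    Returns a (rank, name) tuple.
    """
    good = [hit for hit in hits if hit[2] > min_identity]
    if not good:
        return ("unassigned", None)
    species = {hit[0] for hit in good}
    if len(species) == 1:
        return ("species", species.pop())
    genera = {hit[1] for hit in good}
    if len(genera) == 1:
        return ("genus", genera.pop())
    # Hits disagree above genus level: leave for a higher-level identification.
    return ("higher", None)

# Invented example hits using placeholder taxon names.
unambiguous = [("Genus_a species_1", "Genus_a", 99.1),
               ("Genus_a species_1", "Genus_a", 98.0)]
ambiguous = [("Genus_a species_1", "Genus_a", 99.1),
             ("Genus_a species_2", "Genus_a", 98.5)]

print(assign_taxon(unambiguous))  # ('species', 'Genus_a species_1')
print(assign_taxon(ambiguous))    # ('genus', 'Genus_a')
```

Falling back to a higher rank when hits disagree is the conservative choice: it loses some resolution but avoids naming a species that the barcode cannot separate.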

The taxon IDs were exported from MEGAN into R alongside the metadata and the read abundance table, which gives the OTU abundance for each sample. Any syrphid OTUs were removed from the dataset before the rest of the analysis, as the syrphids were analysed separately. The samples were rarefied for ecological analysis to a level based on a trade-off between the number of samples that were retained and the number of OTUs that would be included in the analysis. Ecological analysis was carried out in R using the Phytools package (Revell, 2012) to look at the taxonomic diversity of OTUs over the whole dataset. The beta diversity, including turnover and nestedness of samples, was investigated using Betapart (Baselga and Orme, 2012) and Vegan (Oksanen et al., 2016).
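Rarefaction, and the trade-off it involves, can be shown with a short Python sketch. The analysis itself was run in R; the function below is a hypothetical stand-in that subsamples reads without replacement and discards any sample with fewer reads than the chosen depth. The depth of 75 in the example is used only because it is the value reported in the Results, and the read counts are invented.

```python
import random


def rarefy(counts: dict[str, int], depth: int, seed: int = 0):
    """Subsample `depth` reads without replacement from one sample.

    Returns a new OTU -> read count dictionary, or None if the sample has
    fewer than `depth` reads and must be dropped.
    """
    if sum(counts.values()) < depth:
        return None
    # Expand the table into one entry per read, then draw without replacement.
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    picked = random.Random(seed).sample(pool, depth)
    rarefied: dict[str, int] = {}
    for otu in picked:
        rarefied[otu] = rarefied.get(otu, 0) + 1
    return rarefied


# Invented read counts for two samples.
deep_sample = {"otu1": 60, "otu2": 30, "otu3": 10}
shallow_sample = {"otu1": 10}
rarefied_deep = rarefy(deep_sample, depth=75)
rarefied_shallow = rarefy(shallow_sample, depth=75)  # None: sample is lost
```

Raising the depth keeps more reads, and therefore more rare OTUs, in each retained sample but removes more samples outright; lowering it does the reverse.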

The data was then used to look at phylogenetic diversity. One of the outputs of the Usearch clustering algorithm is a “representative” sequence from each OTU, which is selected as the central sequence around which other sequences cluster at 97%. It thus gives a representative sequence for each OTU which can be used for phylogenetic analysis. These sequences were aligned using Muscle (Edgar, 2004), and then converted into an xml file using BEAUti. A phylogenetic tree was run using BEAST V8.1.4 (Drummond et al., 2012) on the Cipres Science Gateway (Miller, Pfeiffer and Schwartz, 2010) for 50M generations, using the evolutionary model GTR+I+G. BEAST was used because it allowed the orders to be constrained to monophyly in BEAUti. The progress of the MCMC chains from the tree search was visualised in Tracer, to ensure that the chains had converged and to choose what to designate as burn-in. The tree was then selected using TreeAnnotator, with a burn-in of 1000000, and visualised in FigTree. The R package Picante (Kembel et al., 2010) was used to calculate Faith’s phylogenetic diversity and mean pairwise distance for all of the samples, which were summarised in ggplot2 (Wickham, 2016) using boxplots.
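To make the two measures concrete, the following Python sketch computes them on a toy tree. It is not the Picante code: the tree, the tip names and the function names are hypothetical, the root is included when summing branch lengths for Faith's PD (whether that matches the settings used in this analysis is an assumption), and the mean pairwise distance is unweighted by abundance.

```python
from itertools import combinations

# Hypothetical toy tree stored as child -> (parent, branch length); "root" has no entry.
TREE = {
    "n1": ("root", 1.0),
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "C": ("root", 3.0),
}


def path_to_root(tip, tree):
    """Edges from a tip up to the root, as (child node, branch length) pairs."""
    edges = []
    node = tip
    while node in tree:
        parent, length = tree[node]
        edges.append((node, length))
        node = parent
    return edges


def faith_pd(tips, tree):
    """Total branch length of the subtree joining the tips to the root."""
    edges = {}
    for tip in tips:
        edges.update(path_to_root(tip, tree))
    return sum(edges.values())


def patristic(a, b, tree):
    """Path length between two tips: edges on one root path but not the other."""
    pa, pb = dict(path_to_root(a, tree)), dict(path_to_root(b, tree))
    return (sum(l for n, l in pa.items() if n not in pb)
            + sum(l for n, l in pb.items() if n not in pa))


def mpd(tips, tree):
    """Mean pairwise patristic distance; None for fewer than two tips."""
    pairs = list(combinations(tips, 2))
    if not pairs:
        return None
    return sum(patristic(a, b, tree) for a, b in pairs) / len(pairs)


pd_close = faith_pd(["A", "B"], TREE)     # two close relatives
pd_all = faith_pd(["A", "B", "C"], TREE)  # adding a distant lineage
mpd_all = mpd(["A", "B", "C"], TREE)
```

Faith's PD grows with every OTU added to a sample, whereas mean pairwise distance is an average and can fall when close relatives are added, so the two measures need not rank samples in the same order.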

A second tree was also constructed in BEAST, without any constrained groups, using just the Diptera sequences. These were aligned using Muscle and run on the Cipres Science Gateway in the same way as the complete dataset. Once again, the tree was used to calculate phylogenetic diversity measures in R using the Picante package (Kembel et al., 2010).

Results

The output from sequencing after pairing and quality filtering varied, with one sample having no reads and the largest sample 8252 reads. Out of 53 samples, five had under 100 reads and 27 had under 1000 reads. The average read count per sample was 1568. Overall there were 138 OTUs produced from NAPcluster. Using GenBank blast and MEGAN, 31 syrphid OTUs were identified which were removed from the dataset. The data was rarefied to 75 reads per OTU per sample, which resulted in the loss of 18 out of 50 of the samples, leaving 32 samples for ecological analysis. Nine out of the 12 sites had data for more than one time point across the summer, and for two sites no data was lost. After rarefaction and removal of the syrphid OTUs, 77 OTUs remained.

The OTUs identified using BOLD and GenBank were compared to the identifications obtained by including MEGAN ID selection using the GenBank blast output (figure 1). Using the BOLD database resulted in a higher number of OTUs identified to species level than the two GenBank methods, suggesting that BOLD’s curated databases may contain more species references for this group of insects compared to GenBank. The MEGAN output had more OTUs identified to higher levels such as genus, family and order, which is in line with the fact that it is a more conservative method for identifying OTUs, and so some which were identified to species level by GenBank did not meet the criteria for MEGAN to assign a species level identification. Finally, figure 1 shows that all of the OTUs were assigned an identification by MEGAN, which was not the case for the other two methods. Using BOLD or GenBank databases on their own with a simple criterion for accepting or rejecting an identification resulted in high numbers of OTUs unidentified.

[Figure 1: bar chart of OTU counts (y-axis: count, 0 to 40) at each identification level (x-axis: species, genus, family, order, no ID) for the three methods BOLD, GenBank and MEGAN.]

Figure 1. Levels of OTU identification using three different methods. GenBank (green) is using the GenBank database to identify OTUs, whereas BOLD (red) is using the same methodology to identify the OTUs but using the BOLD database. The third method, MEGAN (blue), uses the assignment program to assign an ID to the OTUs post blast. Identifications are classified into different levels: species, genus, family, order and no ID.

Because of the more conservative criteria, and the fact that all of the OTUs were given an identification, the further analysis was carried out using the identifications from MEGAN. The OTUs were split into higher level groups in figure 2A, showing the number of OTUs per taxonomic group and the richness of each group per sample. Even after removing the syrphid OTUs from the dataset, the group with the largest diversity of OTUs was Diptera, with 51 OTUs. The second most abundant group, the Coleoptera, contained 11 OTUs. 33% of the Diptera were identified to species level, and 40% of the other 5 insect groups shown in figure 2A were also identified to species level.

[Figure 2: two boxplot panels with OTU richness on the y-axis and taxonomic group on the x-axis. Panel A groups: Diptera, Coleoptera, Trichoptera, Hymenoptera, Ephemeroptera, Unknown, Other or unknown eukaryote. Panel B Diptera families: Phoridae, Sciaridae, Muscidae, Hybotidae, Bibionidae, Empididae, Tachinidae, Calliphoridae, Anthomyiidae, Rhinophoridae, Sarcophagidae, Dolichopodidae, Scathophagidae, other Diptera, Unknown.]

Figure 2A. The average OTU richness of the samples shown by taxonomic assignment to order level. The total number of OTUs identified to each taxonomic group is shown on the plot. Figure 2B. The taxonomic makeup of the most diverse group found in the samples, Diptera, split into families. The total number of OTUs found for each family is shown on the graph. The boxplots show the average OTU richness per sample for each family.

22 OTUs were present in only one sample, and so were designated as rare OTUs. This meant that they had only been detected in one site at one of the sampling periods. The makeup of these OTUs included 10 Diptera OTUs, one parasitoid wasp, one Hemiptera and 10 Coleoptera OTUs. This meant that only one Coleoptera OTU was found in more than one sample, and that was Oedemera virescens, a known common flower visitor, which was found in three samples. Overall, OTUs were found in low numbers of samples, suggesting species composition varied between samples. 11 OTUs were found in over 10 samples, and of them four were found in over 20 samples. The most abundant OTU was an unidentified Diptera OTU which was found in 46 out of 50 samples and is shown on the phylogenetic tree in figure 4 marked with a star. It forms a clade with other unidentified Diptera OTUs, sister to the Tachinidae, Muscidae, Sarcophagidae, Calliphoridae and Anthomyiidae clade.

The number of OTUs found per site over all of the sampling rounds ranged from 10 to 29. 13 of the sites contained more than 20 OTUs over the whole summer. Generally, sites with more OTUs had more OTUs that were found in multiple sampling rounds. 32 OTUs were only found in one of the sites. 10 OTUs were found in over half of the sampled sites. Of these, eight were Diptera and two were Ephemeroptera. The only OTU which was found in all of the sites was the same unidentified Diptera OTU which was the most abundant OTU.

Site had a significant effect on the beta diversity of a sample (P < 0.001). This can be seen in figure 3A, with the samples from different sites clustering separately in the db-RDA analysis. There is also some clustering of samples based on sampling round (figure 3B); however, round does not significantly explain differences in beta diversity. This suggests that whilst there is a difference in species turnover and nestedness between sampling sites, there was not a significant turnover of species over the course of the summer.
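The distinction between turnover and nestedness can be made concrete with a small Python sketch of the pairwise partition of Sørensen dissimilarity into a turnover (Simpson) component and a nestedness-resultant component. The analysis itself was run with Betapart in R; the function name and the example OTU sets below are hypothetical, and the choice of the Sørensen family rather than Jaccard is an assumption, as the text does not state which was used.

```python
def beta_partition(sample_a: set, sample_b: set) -> dict[str, float]:
    """Partition pairwise Sørensen dissimilarity into turnover and nestedness.

    a = OTUs shared by both samples, b and c = OTUs unique to each sample.
    """
    if not sample_a or not sample_b:
        raise ValueError("both samples must contain at least one OTU")
    a = len(sample_a & sample_b)
    b = len(sample_a - sample_b)
    c = len(sample_b - sample_a)
    sorensen = (b + c) / (2 * a + b + c)
    turnover = min(b, c) / (a + min(b, c))
    return {
        "sorensen": sorensen,
        "turnover": turnover,
        "nestedness": sorensen - turnover,
    }


# One sample is a strict subset of the other: all dissimilarity is nestedness.
nested = beta_partition({"otu1", "otu2", "otu3", "otu4"}, {"otu1", "otu2"})
# Same richness but different OTUs: all dissimilarity is turnover.
replaced = beta_partition({"otu1", "otu2", "otu3"}, {"otu1", "otu4", "otu5"})
```

Two pairs of samples can therefore have the same total dissimilarity for quite different reasons, which is why the components are reported separately.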

[Figure 3: two ordination panels, A (samples grouped by site) and B (samples grouped by sampling round).]

Figure 3. db-RDA plots showing total beta diversity of flies. This beta diversity is shown over A. sites and B. sampling round.

The phylogeny in figure 4 shows the phylogenetic diversity of OTUs within the samples. There were six main groups present in the dataset, which are highlighted on the tree: Ephemeroptera, Hemiptera, Coleoptera, Diptera, Hymenoptera and Trichoptera. The tree also placed two unidentified Insecta OTUs within Diptera and Ephemeroptera, retrospectively increasing the identification level of these OTUs. The Diptera were by far the largest clade on the tree, and there was a large amount of phylogenetic diversity within this clade. The posterior probabilities on the tree are quite low, with 16% of nodes having a posterior probability of 1 and 33% a posterior probability of over 0.9. The average posterior probability is 0.56. This is likely because the tree is built from the CO1 barcode alone, and so uses only a single gene.

[Figure 4: phylogenetic tree with one tip per OTU, labelled with OTU number and taxonomic assignment, and posterior probabilities shown at the nodes. Scale bar: 0.3.]

Figure 4. Phylogenetic tree of all of the OTUs present in the dataset. The tree was built using BEAST v8.1.4 and posterior probabilities are shown on the tree. Each group is coloured separately: green = Ephemeroptera, blue = Diptera, orange = Coleoptera, grey = Hemiptera and purple = Hymenoptera. The most abundant OTU found in the dataset is marked on the tree with a red star.

Faith’s phylogenetic diversity, calculated for each sample based on the tree, varied from 2 to 10, with the lowest value at site SU0541 in round 2 and the highest at NT4820 in round 3. The variation between sites in figure 5A shows that some sites have a higher average Faith’s PD than others, with NS9052 having the highest average Faith’s PD and SU0541 the lowest. However, there is no significant difference between any of the sites, suggesting that overall phylogenetic diversity is even across the sites. The same is true of Faith’s PD across the four sampling rounds (figure 5B), which is extremely even between rounds, with all of the rounds having an average Faith’s PD of around 4.5.

As well as Faith’s PD, mean pairwise distances were also calculated for each sample, and the results per site and round are shown in figure 5C and D respectively. Here, site NT5444 has the lowest mean pairwise distance, and SH654 the highest, and so mean pairwise distance does not show the same pattern as Faith’s PD. However, as with Faith’s PD, there is no significant difference between the sites. Between rounds there is little variation, as can be seen in figure 5D where the average varies between 1 and 2. This matches the result for Faith’s PD, which showed there was little variation in phylogenetic diversity between rounds.

As Diptera was the largest group present in the samples, a phylogenetic tree was made using only this group to see how phylogenetic diversity of Diptera changed across sites and times. As can be seen in figure 6A and 6B, the patterns of Faith’s PD across site and round mirror those for the whole dataset, which is to be expected as Diptera make up a large proportion of all of the OTUs. However, unlike the whole dataset, site does have a significant effect on Faith’s PD (P < 0.01), and further investigation shows that this is due to the site with the highest phylogenetic diversity (NS9052) and the lowest (SU0541) being significantly different from each other, which can be clearly seen on the boxplot in figure 6A. Unlike Faith’s PD, the mean pairwise distance was not significantly different between any of the sites. However, unlike for all of the OTUs, the same sites were found to be highest (NS9052) and lowest (SU0541), as can be seen in figure 6C. There was once again very little variation in mean pairwise distance between rounds, with all of the average mean pairwise distances falling between 1.25 and 1.5 (figure 6D).

[Figure 5: four boxplot panels, A to D. Y-axes: faiths.PD (A, B) and mean.pairwise.distance (C, D). X-axes: site (ST8939, NT3959, NT4820, NT5379, NT5444, NT6860, SE1246, SE2911, NS9052, SH6354, SH6573, SU0541, SU5588, SU5692) and round (round1 to round4).]

Figure 5. Boxplots showing two phylogenetic diversity measures between sites and rounds for all of the OTUs in the rarefied dataset. A and B show Faith’s PD and C and D show mean pairwise distances. A and C show phylogenetic measures per site and B and D show phylogenetic measures per round.

[Figure 6: four boxplot panels, A to D. Y-axes: faiths.PD (A, B) and mean.pairwise.distance (C, D). X-axes: site (ST8939, NT3959, NT4820, NT5379, NT5444, NT6860, SE1246, SE2911, NS9052, SH6354, SH6573, SU0541, SU5588, SU5692) and round (round1 to round4).]

Figure 6. Boxplots of phylogenetic distance measures for each sample, using only the Diptera OTUs present in each sample. This is shown here as boxplots A and C per site and B and D per round. Distance measures are Faith’s PD for A and B and mean pairwise distance for C and D.

Discussion

All of the insects in this study were lethally sampled using pan traps. There is some debate as to whether sampling insects in this way has a negative effect on the community; however, studies have found no impact on species abundance (Gezon et al., 2015). Where possible it is more ethical to avoid lethal sampling, but in instances such as this, where a wide diversity of species could not be identified in the field and otherwise would not be sampled, lethal sampling is justified. Pan trap sampling was used here as it is a recognised way of sampling potential flower visitors, which are attracted to flower-like coloured pan traps. However, pan traps are biased and do not capture certain types of flower visitors. For example, large, powerful flying insects are able to pull themselves away from the water, and so groups such as bumblebees are often underrepresented in the data (Roulston, T’ai H., Smith, Stephen A., 2007; Wilson, Griswold and Messinger, 2008; Baum and Wallen, 2011). Alongside this, some orders such as Lepidoptera are not well sampled by pan traps (Vrdoljak and Samways, 2012), and in fact no Lepidoptera were found in this study. However, all sampling methods have biases, and it is important that all samples are collected in the same way so that they can be compared. This is the benefit of having a well-designed and maintained monitoring programme such as the Pollinator Monitoring Scheme (Carvell et al., 2016), as it results in long-term datasets which can be compared over time.

There were several syrphid OTUs present in the dataset, even though the syrphid specimens were removed and processed separately to the bycatch samples. This is in contrast to the bees, which were also analysed separately, as no bee OTUs were detected in the metabarcoding samples. This is likely because the bee specimens were removed and sent to taxonomists where they were identified and stored separately. Conversely, the syrphids were identified by taxonomists and then returned to the bulk samples where they were stored in the same ethanol until removal for DNA extraction. This means that there was a high likelihood that syrphid DNA would be present in the ethanol (Linard et al., 2016), potentially along with body parts such as legs and wings. Any syrphid OTUs detected were removed from the dataset. The specimens had been removed and so any OTUs detected would not have been an accurate representation of the syrphid species present, and the species diversity of this family has already been analysed separately (Chapter 2).

In this study all of the insect CO1 sequences present in GenBank and BOLD were used as a reference, because the samples contained a diverse group of insects. Only species with sequences in the reference database were identified to species level; otherwise higher-level identifications were given. In studies such as this one, where there is a diverse range of insects present, it is difficult to curate an accurate reference database, as the groups present are to a certain extent unknown, and there is currently no curated UK invertebrate reference database. It is also highly likely that the CO1 sequences available on GenBank and BOLD are biased towards certain groups. This is because not all groups present in the samples have been studied equally. There may have been a concerted effort to barcode a certain order such as Diptera in the UK, which would bias the database towards detecting more Diptera in the samples. This may have led to an overestimation of the dominance of certain groups in the samples, although it is likely that even species not present in the reference dataset are able to be identified to the family level used here, which mitigates some of these issues. However, this means that for families where there is a more robust reference library, some detail is lost, as family level identifications need to be used to allow comparisons between groups.

It is also highly likely that there are errors and incorrectly identified sequences in online databases (Harris, 2000), and so simply selecting the top hit as an identification is not a reliable way of obtaining an ID. Here MEGAN, a program designed to select identifications based on similarity and percentage of matching hits, was used to ensure that erroneous hits were not accepted. This can however lead to over-cautiousness and therefore to under-identification. This is especially true for species that are under-represented in the database, as several reference sequences are required to identify something to species level. The most important future work to improve the identification of OTUs is therefore the creation and curation of reliable and well populated reference databases.

There was a wide range of species caught in the pan traps. Although the majority were Diptera, there were also several Coleoptera, Ephemeroptera, Trichoptera and non-bee Hymenoptera detected. The Coleoptera OTUs tended to be rare OTUs, with all but one found in only one sample. Several Coccinellidae, or ladybird, OTUs were detected in the samples, alongside a leaf weevil (Phyllobius pyri), soldier beetles (Cantharis), a click beetle (Ctenicera cuprea) and a sun beetle (Amara). The Hymenoptera present were sawflies (Dolerus and Tenthredo) and a parasitoid wasp (Lissonota lineolaris). This shows the diversity of species present as potential flower visitors. Most research on flower visitors or pollinators is focused on bees and syrphids (Orford, Vaughan and Memmott, 2015; Rader et al., 2016), which visit flowers for food and thus are known to actively transport pollen between flowers. However, insects visit flowers for a variety of reasons including to rest, to predate other flower visitors, or to find mates (Kevan, 1983). Their potential as pollinators is often overlooked as they are not actively transporting pollen, but the diversity and abundance of these species, and their potential to contribute to pollination, should not be ignored. Alongside this, it is known that bumblebees change the behaviour of honeybees and make them more effective pollinators (Sapir et al., 2017). The presence of other insects, especially predators and parasitoids, also has an effect on the behaviour of pollinators even if they themselves are not contributing to pollination (Romero, Antiqueira and Koricheva, 2011), and can have an effect on seed set and yield (Goncalves-Souza et al., 2008; Antiqueira and Romero, 2016).

Of course, it may be the case that not all insects caught in the pan traps are flower visitors. These water filled pan traps may also trap aquatic species, or other passing insects. For example, one of the Coleoptera OTUs detected was Elmidae, a family of aquatic beetles (Painter, 2001) which may have been attracted by the water in the trap. Once they begin to fill up, traps may also trap species that are attracted to decaying matter, as the water contains bodies of other insects. However, this is likely to be minimised in this case because the pan traps were only left out for 6 hours in a single day (Carvell et al., 2016). In this study any insect caught in the trap is treated as a potential flower visitor, as our current knowledge of flower visiting insects is still small.

This study found Diptera to be the most common insect type to be caught in the pan traps. The taxonomic makeup of the Diptera was diverse, with thirteen different families detected. It has already been hypothesised that non-syrphid Diptera make up a large proportion of flower visitors (Orford, Vaughan and Memmott, 2015), and thus may be important and overlooked in terms of pollination. The results here support that hypothesis and show that potential flower visiting Diptera are a diverse group of insects. The within-Diptera phylogenetic diversity showed site level differences in this study. Conserving this phylogenetic diversity and looking at what drives it could be an important way of maintaining diversity and resilience in the community.

The species diversity in this study overall was quite low, with a total of 77 OTUs, and many of them found in only one or two samples. There were significant differences in species diversity and composition between sites. These sites were situated around England and Wales, and so there were differences in latitude which may have affected the species composition. The sites were either semi-natural or agricultural, and were selected from the already running UK plant survey sites (Carvell et al., 2016). The taxonomy, species diversity and species composition are data that could be obtained using traditional taxonomy methods. The rationale for using DNA is that it does not require specialist taxonomists for multiple groups and can be done with fewer person-hours. However, currently this may result in a loss of resolution at species level, as not all of the species in the dataset may be present in reference datasets online. The added benefit of DNA analysis is the addition of phylogenetic diversity. This adds another level of diversity that can be analysed and monitored from this survey data.

In this study there was no significant difference in phylogenetic diversity between sites, even though there was in species composition. This suggests that the changes in species diversity are likely to be turnover within closely related groups. This difference between species and phylogenetic diversity has also been found to be occurring in bee populations across Europe (De Palma et al., 2017), and so using only one of these metrics does not give the whole picture. In some studies, phylogenetic diversity has been used as a predictor of functional diversity, however these two measures are not necessarily correlated (Petchey and Gaston, 2002), and so in future, studies could also include measures of functional diversity to further inform conservation strategies.

Both species composition and phylogenetic diversity were stable over the summer, suggesting that many of the species were present throughout the sampling period. These four months are a relatively short amount of time over which to see change, and future studies in this area would benefit from data taken from multiple sampling years. This dataset is part of the National Pollinator Monitoring Scheme, which is now in its third year of data gathering. This analysis therefore has the potential to be applied to multiple sampling years, to give a more long-term picture of how species composition and phylogenetic diversity may be changing. Adding in a measure of phylogenetic diversity will give a level of detail not currently present in other monitoring programmes. A long-term study into bumblebee phylogenetic diversity found that this was decreasing with increasing agriculture, and that it was linked to decreases in pollination services (Grab et al., 2019), and long-term trends such as this would be missed without monitoring phylogenetic diversity alongside species diversity.

This multi-year dataset is more than just flower visitor data, as the broad taxonomy in the pan trap samples gives a potential dataset for monitoring insect decline in the UK. This is especially true for Diptera, which are well represented in the dataset. There is an increasing focus on overall insect diversity and decline, with studies showing widespread insect decline (Hallmann et al., 2017), including in species and phylogenetic diversity (Homburg et al., 2019). Using this dataset to measure species and phylogenetic diversity has the potential to widely increase our understanding of how insect communities in the UK are changing and inform conservation strategies.

An important metric for looking at insect decline is species abundance. This detects a decline before a species is totally lost from a community, and therefore is an important measure for monitoring and conservation. Abundance is currently difficult to measure using metabarcoding data, as there are biases in DNA extraction and amplification between species, different body types and body sizes, as well as random variation in extraction and amplification success. However, there are several studies looking at how abundance in metabarcoding can be accurately measured. Ji et al. (2019) added a standard into their PCR and sequencing reactions which could be used to correct the data, enabling them to adjust for biases in sequencing and make samples more comparable. In future, including standards in analyses of this type would enable a more accurate measure of abundance.
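The general idea of correcting read counts with an added standard can be sketched in a few lines of Python. This is a generic illustration and not the procedure of Ji et al. (2019): the function name, the "spike" label and the read counts are all hypothetical, and it assumes that the same known amount of standard was added to every sample.

```python
def spike_in_corrected(reads: dict[str, int], spike_id: str, spike_input: float) -> dict[str, float]:
    """Rescale OTU read counts by the reads recovered from a known standard.

    `spike_input` is the amount of standard added to the sample (arbitrary
    units). The result expresses each OTU relative to that amount.
    """
    spike_reads = reads.get(spike_id, 0)
    if spike_reads == 0:
        raise ValueError("no reads recovered from the standard; sample cannot be corrected")
    factor = spike_input / spike_reads
    return {otu: n * factor for otu, n in reads.items() if otu != spike_id}


# Two invented samples with the same raw read count for otu1 but different
# recovery of the standard, e.g. because one was sequenced more deeply.
sample_1 = {"spike": 100, "otu1": 200}
sample_2 = {"spike": 400, "otu1": 200}
corrected_1 = spike_in_corrected(sample_1, "spike", spike_input=1.0)
corrected_2 = spike_in_corrected(sample_2, "spike", spike_input=1.0)
```

In this toy case the raw counts suggest that otu1 is equally abundant in both samples, whereas the corrected values differ fourfold. A correction of this kind addresses differences between samples in sequencing and pooling; it does not remove species-specific extraction or amplification bias.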

Conclusion

This study has shown the diversity of non-bee and non-syrphid flower visiting insects across the UK, which are not currently part of UK pollinator monitoring. This includes OTUs from groups such as Coleoptera, Trichoptera and Ephemeroptera. The large diversity of non-syrphid Dipterans supports the hypothesis that this is an important group of flower visitors that is currently overlooked in pollinator research and monitoring. Using metabarcoding, this overlooked group could be easily included in analysis using already collected samples from the Pollinator Monitoring Scheme. Alongside giving species diversity data quickly and efficiently, metabarcoding also reveals the phylogenetic diversity of these communities, which in the case of UK flower visitors does not correlate with species diversity. There is also the potential to monitor flower visiting insects long term using metabarcoding, without an increase in sampling, making it a cost-efficient dataset with which to examine trends in insect decline. Future research can build on this study by extending it over more than a single year, and by including methods such as DNA standards (Ji et al., 2019) to add abundance measures to metabarcoding, which will increase the power of these data to inform conservation and detect changes in species trends.

Discussion

Throughout this thesis it has been shown that DNA barcoding and metabarcoding can offer an alternative to traditional taxonomic methods for monitoring of UK syrphids and non-bee pollinators. They can reduce the people-hours involved and potentially cost and time as well. However, the true value of these DNA methods is the increase in data that they provide, with phylogenetic diversity, diet associations and population level diversity adding to the overall picture of pollinator health in the UK.

Before DNA methods can be used to measure the diversity and evolutionary history of syrphid communities, an understanding of the evolution of this family is required in the form of a phylogenetic tree. Previous phylogenies have contained small numbers of taxa and limited data, and have not resolved tribal level relationships (Skevington and Yeates, 2000; Mengual, Ståhls and Rojo, 2015; Young et al., 2016), and include some ambiguity in the relationships between the three subfamilies (Ståhls et al., 2003). Chapter 1 provides strong evidence to support previous phylogenies which find Microdontinae to be sister to Eristalinae and Syrphinae, and Syrphinae monophyletic within a paraphyletic Eristalinae (Young et al., 2016). Using two different dating methods, this phylogeny has also provided support for raising the subfamily Microdontinae to family level (Thompson, 1972; Young et al., 2016), with an early divergence from the rest of the Syrphidae family. Alongside the phylogeny, this study adds 83 mitochondrial genomes, 58 of which are complete, to available syrphid DNA resources. This provides a valuable tool for further research into evolutionary relationships, trait evolution and even identification of cryptic species.

One way in which this phylogeny can improve our understanding of the Syrphidae family is to give a clearer insight into the evolution of diverse larval life histories. Chapter 3 shows that these histories are vital to understanding syrphid species composition and the potential impact of land use change on this group. Chapter 1 demonstrates how the phylogeny of the Syrphidae family can be used to look at how this important trait has evolved. It appears that a saprophagous life history is ancestral, with predatory behaviour evolving at least twice on the tree: once in the rapid radiation within Syrphinae and once in the Volucella and Heringia genera. Understanding how these traits evolved can allow us to look at the phylogenetic and functional diversity of a community and see how resilient it may be to land use change.

Whilst mitochondrial genomes provide large amounts of data for each specimen, they are expensive to obtain, and so CO1 barcodes are used to identify species at a large scale using DNA. This requires a reference database to provide robust species identifications. Building a reference database for UK syrphids resulted in sequences for around 70% of species recorded as resident in the UK. This included all of the species which were recorded during a national monitoring programme, and so although not all UK species are in the database, it is a reliable and useful tool for identifying the majority of syrphid diversity. Developing this database highlighted how challenging it can be to rely on species IDs from public databases, as around 100 sequences were removed for being incorrectly identified or separated geographically, such as in the case of Volucella bombylans. Alongside this, many species were challenging to separate using several different species delimitation methods and 97% clustering. This may be a result of rapid evolution in this family, which has produced species that cannot be distinguished using the CO1 barcoding region alone. This has been found to be the case for some species in other groups such as UK bees (Creedy et al., 2019) and ground beetles (Assmann et al., 2019), and thus shows the importance of testing DNA species identification for any group of interest so as to identify these issues. This lack of a barcoding gap between species was resolved by looking at the haplotypes in the dataset, which separated species in all but three genera. This also indicated an opportunity to include haplotype-level analysis in future monitoring and pollinator research. At a time when land use is becoming increasingly urban and agricultural, this type of analysis could give important insights into how isolated pollinator populations are becoming, indicating how fitness is affected before it impacts species diversity or abundance.
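To illustrate what the absence of a barcoding gap means in practice, the following minimal Python sketch compares the largest within-species distance with the smallest between-species distance for a handful of aligned sequences, and then checks whether any haplotype is shared between species. It is a toy example only: the sequences, the species labels and the helper names (`p_distance`, `barcode_gap`, `haplotypes_by_species`) are hypothetical and are not the data or the pipeline used in this thesis, and uncorrected p-distance is assumed purely for simplicity.

```python
from itertools import combinations

# Toy aligned fragments (hypothetical; real CO1 barcodes are far longer).
base = "ACGT" * 10                      # 40 aligned sites
barcodes = {
    "A_1": ("species_A", base),
    "A_2": ("species_A", base[:-1] + "A"),             # one substitution
    "B_1": ("species_B", base[:38] + "C" + base[39:]),  # one substitution
    "B_2": ("species_B", base[:38] + "C" + base[39:]),
}

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def barcode_gap(records):
    """Return (max intraspecific distance, min interspecific distance)."""
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(records.values(), 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    return max(intra), min(inter)

def haplotypes_by_species(records):
    """Map each unique sequence (haplotype) to the species carrying it."""
    haps = {}
    for species, seq in records.values():
        haps.setdefault(seq, set()).add(species)
    return haps

max_intra, min_inter = barcode_gap(barcodes)
has_gap = min_inter > max_intra          # a gap needs inter > intra
merged_at_97 = min_inter <= 0.03         # would fall in one 97% cluster
shared = [h for h, spp in haplotypes_by_species(barcodes).items()
          if len(spp) > 1]

print(f"max intra = {max_intra:.3f}, min inter = {min_inter:.3f}")
print("barcoding gap:", has_gap)
print("species merged at 97%:", merged_at_97)
print("haplotypes shared between species:", len(shared))
```

In this toy case the two species differ by less than 3% and show no barcoding gap, yet no haplotype is shared between them, which is the situation in which haplotype-level comparison can still separate species that threshold clustering merges.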

Developing and testing the reference database enabled DNA barcoding to be implemented throughout the rest of the thesis, and thus enabled the identification of individuals sampled in Chapter 3. Here the fly identification and gut contents were analysed from a single DNA extract, showing how DNA can enable a larger amount of data to be gained from insect samples than could be obtained using traditional methods. The individual-level networks showed that most networks contained a large variety of plant species, as was shown in previous research into syrphid pollen loads (Branquart and Hemptinne, 2000), but at an individual level there was an average of three links, suggesting that individuals visit few plant species. This suggests some specialism at an individual level, which has been detected previously in syrphids (Tur et al., 2014) and in some bee species, which specialise on a flower type to conserve the energy required to switch between flower morphologies (Gruter and Ratnieks, 2011).
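The contrast between a diverse network and narrow individual diets can be made concrete with a short sketch. The Python example below counts plant links per individual fly and compares the mean with the number of plant taxa in the pooled network. The individuals and plant genera are hypothetical placeholders chosen for illustration, not detections from Chapter 3.

```python
from statistics import mean

# Hypothetical gut-content detections: individual fly -> plant taxa found.
detections = {
    "fly_01": {"Ranunculus", "Taraxacum", "Rubus"},
    "fly_02": {"Taraxacum", "Heracleum"},
    "fly_03": {"Cirsium", "Rubus", "Trifolium", "Bellis"},
    "fly_04": {"Heracleum", "Achillea", "Senecio"},
}

# Links per individual: number of plant taxa detected in each fly.
links_per_individual = {fly: len(plants) for fly, plants in detections.items()}
mean_links = mean(links_per_individual.values())

# Pooled network: every plant taxon detected in at least one individual.
pooled_plants = set().union(*detections.values())

print("mean links per individual:", mean_links)
print("plant taxa in pooled network:", len(pooled_plants))
```

Here each individual carries only a few plant taxa while the pooled network contains many more, which is the pattern that points towards individual-level specialism within a generalist network.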

Larval life history appears to be an important factor in syrphid species diversity. The data suggest that availability of larval habitat plays a role in determining species composition, potentially more so than floral resources. Whilst pollen composition did not significantly affect species composition across sites, larval life history did, with aquatic larvae most common in the wet, semi-natural marshland. This is important when considering the different responses of bee and syrphid pollinators. Whilst adults rely on floral resource availability, there are large differences between larval life histories which may be causing some of the differences in syrphid response to land use change (Sutherland, Sullivan and Poppy, 2001; Jauker et al., 2009).

The majority of this thesis focuses on the Syrphidae family; however, these are only a proportion of the total insect flower visitors in the UK. The pilot of the NPPMF collected 53 samples from 12 sites around the UK, over five months in the summer of 2015. The bee specimens were analysed in Creedy et al. (2019) and the syrphid specimens were used in Chapter 2 of this thesis. These analyses left a large number of specimens unanalysed, which also have not been identified by taxonomists. Chapter 4 gives an insight into how DNA methods can be used to look at diverse mixed samples which would otherwise be overlooked, and to give a picture of the flower visiting species diversity across the UK. The majority of the non-bee and non-syrphid insects were Diptera, and this group has been suggested as an overlooked and diverse group of flower visitors for some time (Orford, Vaughan and Memmott, 2015; Rader et al., 2016). However, there were also OTUs from Coleoptera, non-bee Hymenoptera, Ephemeroptera and Trichoptera in the dataset. Not all of these may be pollinators, such as in the case of one Coleoptera OTU which belonged to an aquatic genus and may have been attracted to the water in the trap. Nevertheless, these samples provide an insight into the as yet unknown taxonomic makeup of UK flower visitors. Furthermore, these data have the broader potential for long-term monitoring of insect biodiversity in the UK, tracking insect diversity and abundance over time as has been done in other international studies (Homburg et al., 2019).

Once again, Chapter 4 shows the use of DNA methods in providing data which would have otherwise been difficult and time consuming to obtain. It further demonstrates the value of DNA metabarcoding in providing additional measures of diversity. Whilst traditional taxonomic methods are able to provide species diversity information, metabarcoding also provides phylogenetic insight. This is important because species and phylogenetic diversity do not always react to environmental change in the same way (De Palma et al., 2017), and so including the latter gives a further dimension through which to measure the health of the pollinator community.
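The difference between species diversity and phylogenetic diversity can be shown with a small worked example. The Python sketch below computes a root-inclusive version of Faith's phylogenetic diversity (the summed branch lengths connecting a set of taxa to the root) for two communities of equal species richness. The tree, its branch lengths and the function name `faith_pd` are hypothetical and chosen for illustration; this is not the implementation used for the analyses in Chapter 4.

```python
# Toy rooted tree: node -> (parent, branch length to parent).
# Topology and branch lengths are hypothetical.
tree = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "C": ("n2", 3.0),
    "D": ("root", 5.0),
    "n1": ("n2", 2.0),
    "n2": ("root", 2.0),
}

def faith_pd(tree, taxa):
    """Sum the branch lengths on the paths joining `taxa` to the root.

    Each branch is counted once, however many taxa share it.
    """
    seen = set()
    total = 0.0
    for taxon in taxa:
        node = taxon
        # Walk towards the root, stopping at a branch already counted.
        while node in tree and node not in seen:
            seen.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total

community_1 = {"A", "B"}   # two close relatives
community_2 = {"A", "D"}   # two distant relatives

for name, community in [("community_1", community_1),
                        ("community_2", community_2)]:
    print(name, "richness =", len(community),
          "PD =", faith_pd(tree, community))
```

Both communities contain two species, but the one made up of distant relatives has the larger phylogenetic diversity, which is why the two measures can respond differently to the same environmental change.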

Overall, the theme throughout this thesis has been the importance of using DNA in monitoring and research into pollinator communities, especially the Syrphidae family, with their diverse and cryptic species and complex life history strategies. The complexity within this family has exposed the constraints of DNA species identification, although solutions have been given to allow the full diversity to be explored. Alongside this, DNA has been shown to be not just a tool for simplifying monitoring, but a valuable method that facilitates a deeper exploration of insect communities. This includes phylogenetic diversity, species associations such as diet, and the composition of bulk samples. Specifically, this thesis has provided the most robust and complete Syrphidae phylogeny to date, as well as a resource for future research and monitoring of UK syrphids in the form of a well curated and tested reference database. It is hoped that these resources will be used to continue and further the research into this important family of UK pollinators.

Acknowledgements

The funding for this PhD was provided by NERC through the Grantham Institute Science and Solutions for a Changing Planet Doctoral Training Partnership at Imperial College London. Additional funding and a three-month placement were provided by Defra. Research facilities were provided at the Natural History Museum, London, where this PhD was based. All of the Illumina MiSeq sequencing in this thesis was carried out at the Natural History Museum sequencing facility. The Illumina HiSeq sequencing was carried out at the Earlham Institute in Norwich. I would like to thank my supervisor Alfried Vogler, and all members of his research group past and present who have helped and supported me along the way. Special thanks to David Notton for all his support and work collecting flies for me. Thank you to Jemilah Vanderpump for enabling me to expand my knowledge and the reach of my research at Defra. Thank you to the landowners who allowed me to collect in their gardens, and to Alison Norman for her excellent work as field assistant. Thank you to my family and friends, particularly my mum and dad for supporting me even when not entirely sure what I actually do. And finally, thank you to James; I would never have been able to do this without you.

Bibliography

Alarcón, R. (2010) ‘Congruence between visitation and pollen-transport networks in a California plant-pollinator community’, Oikos, 119(1), pp. 35–44. doi: 10.1111/j.1600-0706.2009.17694.x.

Amorós-Jiménez, R. et al. (2014) ‘Feeding preferences of the aphidophagous hoverfly Sphaerophoria rueppellii affect the performance of its offspring’, BioControl, 59(4), pp. 427–435. doi: 10.1007/s10526-014-9577-8.

Andújar, C. et al. (2015) ‘Phylogenetic community ecology of soil biodiversity using mitochondrial metagenomics’, Molecular Ecology, 24(14), pp. 3603–3617. doi: 10.1111/mec.13195.

Antiqueira, P. A. P. and Romero, G. Q. (2016) ‘Floral asymmetry and predation risk modify pollinator behavior, but only predation risk decreases plant fitness’, Oecologia. Springer Berlin Heidelberg, 181(2), pp. 475–485. doi: 10.1007/s00442-016-3564-y.

Arribas, P. et al. (2016) ‘Metabarcoding and mitochondrial metagenomics of endogean arthropods to unveil mesofauna of the soil’, Methods in Ecology and Evolution, pp. 1–11. doi: 10.13140/RG.2.1.3702.8566.

Assmann, T. et al. (2019) ‘Heaven and hell: Spotlights on some DNA barcodes for species identification and delimitation in ground beetles’, ARPHA Conference Abstracts, 2, pp. 0–1. doi: 10.3897/aca.2.e38819.

Bailes, E. J. et al. (2018) ‘First detection of bee viruses in hoverfly (syrphid) pollinators’, Biology Letters, 14(2), pp. 4–7. doi: 10.1098/rsbl.2018.0001.

Baird, D. J., Hajibabaei, M. and Brunswick, N. (2012) ‘Biomonitoring 2.0: a new paradigm in ecosystem assessment made possible by next-generation DNA sequencing.’, Molecular Ecology, 21(8), pp. 2039–44. doi: 10.1111/j.1365-294X.2012.05519.x.

Ball, S. and Morris, R. (2015) Britain’s Hoverflies. Second edition. Princeton University Press.

Ballantyne, G., Baldock, K. and Willmer, P. (2015) ‘Pollinator Importance Networks – Visitation and Pollen Deposition in A Heathland Plant Community’, Proceedings of the Royal Society B: Biological Sciences, 282(20151130).

Bankevich, A. et al. (2012) ‘SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing’, Journal of Computational Biology, 19(5), pp. 455–477. doi: 10.1089/cmb.2012.0021.

Barlow, K. E. et al. (2015) ‘Citizen science reveals trends in bat populations: The National Bat Monitoring Programme in Great Britain’, Biological Conservation, 182, pp. 14–26. doi: 10.1016/j.biocon.2014.11.022.

Bartomeus, I. et al. (2018) ‘On the inconsistency of pollinator species traits for predicting either response to land-use change or functional contribution’, Oikos, 127(2), pp. 306–315. doi: 10.1111/oik.04507.

Baselga, A. and Orme, C. D. L. (2012) ‘Betapart: An R package for the study of beta diversity’, Methods in Ecology and Evolution, 3(5), pp. 808–812. doi: 10.1111/j.2041-210X.2012.00224.x.

Basley, K. et al. (2018) ‘Effects of chronic exposure to thiamethoxam on larvae of the hoverfly Eristalis tenax (Diptera, Syrphidae)’, PeerJ, 6, p. e4258. doi: 10.7717/peerj.4258.

Bates, A. J. et al. (2011) ‘Changing Bee and Hoverfly Pollinator Assemblages along an Urban-Rural Gradient’, PLoS ONE, 6(8), p. e23459. doi: 10.1371/journal.pone.0023459.

Baum, K. A. and Wallen, K. E. (2011) ‘Potential Bias in Pan Trapping as a Function of Floral Abundance’, Journal of the Kansas Entomological Society, 84(2), pp. 155–159. doi: 10.2317/JKES100629.1.

Bell, K. L. et al. (2016) ‘Pollen DNA barcoding: current applications and future’, 640(April), pp. 629–640.

Bell, K. L. et al. (2016) ‘Review and future prospects for DNA barcoding methods in forensic palynology’, Forensic Science International: Genetics. Elsevier Ireland Ltd, 21, pp. 110–116. doi: 10.1016/j.fsigen.2015.12.010.

Bernt, M. et al. (2013) ‘MITOS: Improved de novo metazoan mitochondrial genome annotation’, Molecular Phylogenetics and Evolution, 69(2), pp. 313–319. doi: 10.1016/j.ympev.2012.08.023.

Biggs, J. et al. (2015) ‘Using eDNA to develop a national citizen science-based monitoring programme for the great crested newt (Triturus cristatus)’, Biological Conservation, 183, pp. 19–28. doi: 10.1016/j.biocon.2014.11.029.

Boisvert, S., Laviolette, F. and Corbeil, J. (2010) ‘Ray: simultaneous assembly of reads from a mix of high-throughput sequencing technologies.’, Journal of Computational Biology, 17(11), pp. 1519–33. doi: 10.1089/cmb.2009.0238.

Bolger, A. M., Lohse, M. and Usadel, B. (2014) ‘Trimmomatic: A flexible trimmer for Illumina sequence data’, Bioinformatics, 30(15), pp. 2114–2120. doi: 10.1093/bioinformatics/btu170.

Bosch, J. et al. (2009) ‘Plant-pollinator networks: adding the pollinator’s perspective’, Ecology Letters, 12(5), pp. 409–419. doi: 10.1111/j.1461-0248.2009.01296.x.

Branquart, E. and Hemptinne, J. L. (2000) ‘Selectivity in the exploitation of floral resources by hoverflies (Diptera: Syrphinae)’, Ecography, 23(6), pp. 732–742. doi: 10.1111/j.1600-0587.2000.tb00316.x.

Bruni, I. et al. (2015) ‘A DNA barcoding approach to identify plant species in multiflower honey’, Food Chemistry, 170, pp. 308–315. doi: 10.1016/j.foodchem.2014.08.060.

Cameron, S. L. et al. (2007) ‘A mitochondrial genome phylogeny of Diptera: Whole genome sequence data accurately resolve relationships over broad timescales with high precision’, Systematic Entomology, 32(1), pp. 40–59. doi: 10.1111/j.1365-3113.2006.00355.x.

Čandek, K. and Kuntner, M. (2015) ‘DNA barcoding gap: Reliable species identification over morphological and geographical scales’, Molecular Ecology Resources, 15(2), pp. 268–277. doi: 10.1111/1755-0998.12304.

Carvell, C. et al. (2016) ‘Design and Testing of a National Pollinator and Pollination Monitoring Framework’, Defra report WC1101, pp. 1–65.

Casbon, J. A. et al. (2011) ‘A method for counting PCR template molecules with application to next-generation sequencing’, Nucleic Acids Research, 39(12). doi: 10.1093/nar/gkr217.

CBOL Plant Working Group et al. (2009) ‘A DNA barcode for land plants.’, Proceedings of the National Academy of Sciences of the United States of America, 106(31), pp. 12794–7. doi: 10.1073/pnas.0905845106.

Chroni, A. et al. (2017) ‘Molecular species delimitation in the genus Eumerus (Diptera: Syrphidae)’, Bulletin of Entomological Research, 107(1), pp. 126–138. doi: 10.1017/S0007485316000729.

Clark, K. et al. (2016) ‘GenBank’, Nucleic Acids Research, 44, pp. 67–72. doi: 10.1093/nar/gkv1276.

Crampton-Platt, A. et al. (2015) ‘Soup to tree: the phylogeny of beetles inferred by mitochondrial metagenomics of a Bornean rainforest sample’, Molecular Biology and Evolution, 32(9), pp. 2302–2316. doi: 10.1093/molbev/msv111.

Creedy, T. J. et al. (2019) ‘A validated workflow for rapid taxonomic assignment and monitoring of a national fauna of bees (Apiformes) using high throughput barcoding’, bioRxiv, p. 575308. doi: 10.1101/575308.

Creedy, T. J. (2019) NAPtime. Available at: https://github.com/tjcreedy/NAPtime/wiki (Accessed: 16 September 2019).

Creedy, T. J., Ng, W. S. and Vogler, A. P. (2019) ‘Toward accurate species-level metabarcoding of arthropod communities from the tropical forest canopy’, Ecology and Evolution, 9(6), pp. 3105–3116. doi: 10.1002/ece3.4839.

Cross, I. and Notton, D. G. (2017) ‘Small-Headed Resin Bee, Heriades Rubicola’, British Journal of Entomology and Natural History, 30, pp. 1–8.

Darriba, D. and Posada, D. (2014) ‘jModelTest 2.0 Manual’, pp. 1–24. Available at: http://code.google.com/p/jmodeltest2.

Darwell, C. T. and Cook, J. M. (2017) ‘Cryptic diversity in a fig wasp community - morphologically differentiated species are sympatric but cryptic species are parapatric’, Molecular Ecology, 26(3), pp. 937–950. doi: 10.1111/mec.13985.

Department for Environment, F. & R. A. (2014) ‘The National Pollinator Strategy: for bees and other pollinators in England’, Defra Report, (November), pp. 1–36. Available at: https://www.gov.uk/government/publications/national-pollinator-strategy-for-bees-and-other-pollinators-in-england.

Dopheide, A. et al. (2019) ‘Impacts of DNA extraction and PCR on DNA metabarcoding estimates of soil biodiversity’, Methods in Ecology and Evolution, 10(1), pp. 120–133. doi: 10.1111/2041-210X.13086.

Dormann, C., Gruber, B. and Fründ, J. (2008) ‘Introducing the bipartite package: analysing ecological networks’, Interaction, 8(2), p. 0.2413793.

Drummond, A. J. et al. (2012) ‘Bayesian phylogenetics with BEAUti and the BEAST 1.7’, Molecular Biology and Evolution, 29(8), pp. 1969–1973. doi: 10.1093/molbev/mss075.

Dupuis, J. R., Roe, A. D. and Sperling, F. A. H. (2012) ‘Multi-locus species delimitation in closely related animals and fungi: One marker is not enough’, Molecular Ecology, 21(18), pp. 4422–4436. doi: 10.1111/j.1365-294X.2012.05642.x.

Edgar, R. C. (2004) ‘MUSCLE: Multiple sequence alignment with high accuracy and high throughput’, Nucleic Acids Research, 32(5), pp. 1792–1797. doi: 10.1093/nar/gkh340.

Edgar, R. C. (2010) ‘Search and clustering orders of magnitude faster than BLAST’, Bioinformatics, 26(19), pp. 2460–2461. doi: 10.1093/bioinformatics/btq461.

Elbrecht, V. et al. (2018) ‘Estimating intraspecific genetic diversity from community DNA metabarcoding data’, PeerJ, 6, p. e4644. doi: 10.7717/peerj.4644.

Espeland, M. et al. (2018) ‘A comprehensive and dated phylogenomic analysis of butterflies’, Current Biology, 28(5), pp. 770–778. doi: 10.1016/j.cub.2018.01.061.

Evans, A. N. et al. (2018) ‘Indirect effects of agricultural pesticide use on parasite prevalence in wild pollinators’, Agriculture, Ecosystems and Environment, 258, pp. 40–48. doi: 10.1016/j.agee.2018.02.002.

Evans, D. M. et al. (2016) ‘Merging DNA metabarcoding and ecological network analysis to understand and build resilient terrestrial ecosystems’, Functional Ecology, 30(12), pp. 1904–1916. doi: 10.1111/1365-2435.12659.

Falk, S. J. (2018) Field Guide to the Bees of Great Britain and Ireland. Bloomsbury Publishing.

Fortuna, M. A. and Bascompte, J. (2006) ‘Habitat loss and the structure of plant-animal mutualistic networks’, Ecology Letters, 9(3), pp. 281–286. doi: 10.1111/j.1461-0248.2005.00868.x.

Forup, M. L. et al. (2007) ‘The restoration of ecological interactions: plant-pollinator networks on ancient and restored heathlands’, Journal of Applied Ecology, 45(3), pp. 742–752. doi: 10.1111/j.1365-2664.2007.01390.x.

Fürst, M. A. et al. (2014) ‘Disease associations between honeybees and bumblebees as a threat to wild pollinators’, Nature, 506(7488), pp. 364–366. doi: 10.1038/nature12977.

Galliot, J. N. et al. (2017) ‘Investigating a flower-insect forager network in a mountain grassland community using pollen DNA barcoding’, Journal of Insect Conservation, 21(5–6), pp. 827–837. doi: 10.1007/s10841-017-0022-z.

Garbuzov, M., Samuelson, E. E. W. and Ratnieks, F. L. W. (2015) ‘Survey of insect visitation of ornamental flowers in Southover Grange garden, Lewes, UK’, Insect Science, 22(5), pp. 700–705. doi: 10.1111/1744-7917.12162.

Garibaldi, L. A. et al. (2013) ‘Wild pollinators enhance fruit set of crops regardless of honey bee abundance’, Science, 339, pp. 1608–1612.

Geiger, M. F. et al. (2016) ‘How to tackle the molecular species inventory for an industrialized nation-lessons from the first phase of the German Barcode of Life initiative GBOL (2012-2015)’, Genome, 59(9), pp. 661–670. doi: 10.1139/gen-2015-0185.

Geslin, B. et al. (2013) ‘Plant pollinator networks along a gradient of urbanisation’, PLoS ONE, 8(5), p. e63421. doi: 10.1371/journal.pone.0063421.

Gezon, Z. J. et al. (2015) ‘The effect of repeated, lethal sampling on wild bee abundance and diversity’, Methods in Ecology and Evolution, 6(9), pp. 1044–1054. doi: 10.1111/2041-210X.12375.

Gilbert, F. S. (1981) ‘Foraging ecology of hoverflies: morphology of the mouthparts in relation to feeding on nectar and pollen in some common urban species’, Ecological Entomology, 6(3), pp. 245–262.

Gillett, C. P. D. T. et al. (2014) ‘Bulk de novo mitogenome assembly from pooled total DNA elucidates the phylogeny of weevils (Coleoptera: Curculionoidea)’, Molecular Biology and Evolution, 31(8), pp. 2223–2237. doi: 10.1093/molbev/msu154.

Godfray, H. C. J. et al. (2014) ‘A restatement of the natural science evidence base concerning neonicotinoid insecticides and insect pollinators.’, Proceedings of the Royal Society B: Biological Sciences, 281(1786), p. 20140558. doi: 10.1098/rspb.2014.0558.

Goncalves-Souza, T. et al. (2008) ‘Trait-mediated effects on flowers: Artificial spiders deceive pollinators and decrease plant fitness’, Ecology, 89(7), pp. 2407–2413. doi: 10.1890/07-1881.1.

Goulson, D. et al. (2015) ‘Bee declines driven by combined stress from parasites, pesticides, and lack of flowers’, Science, 347(6229), pp. 1255957-1-1255957–9. doi: 10.1126/science.1255957.

Grab, H. et al. (2019) ‘Agriculturally dominated landscapes reduce bee phylogenetic diversity and pollination services’, Science, 363, pp. 282–284. doi: 10.1126/science.aat6016.

Gruter, C. and Ratnieks, F. L. W. (2011) ‘Flower constancy in insect pollinators’, Communicative and Integrative Biology, 4(6), pp. 633–636. doi: 10.4161/cib.4.6.16972.

Haenke, S. et al. (2009) ‘Increasing syrphid fly diversity and density in sown flower strips within simple vs. complex landscapes’, Journal of Applied Ecology, 46(5), pp. 1106–1114. doi: 10.1111/j.1365-2664.2009.01685.x.

Haenke, S. et al. (2014) ‘Landscape configuration of crops and hedgerows drives local syrphid fly abundance’, Journal of Applied Ecology, 51(2), pp. 505–513. doi: 10.1111/1365-2664.12221.

Hallmann, C. A. et al. (2017) ‘More than 75 percent decline over 27 years in total flying insect biomass in protected areas’, PLoS ONE, 12(10), p. e0185809. doi: 10.1371/journal.pone.0185809.

Harris, J. D. (2000) ‘Can you bank on GenBank?’, Trends in Ecology and Evolution, 18(7), pp. 317–319. doi: 10.1016/S0169-5347(03)00150-2.

Hawkins, J. et al. (2015) ‘Using DNA Metabarcoding to Identify the Floral Composition of Honey: A New Tool for Investigating Honey Bee Foraging Preferences’, Plos One, 10(8), p. e0134735. doi: 10.1371/journal.pone.0134735.

Hebert, P. D. N. et al. (2003) ‘Biological identifications through DNA barcodes’, Proc Biol Sci, 270(1512), pp. 313–321. doi: 10.1098/rspb.2002.2218.

Hoehn, P. et al. (2008) ‘Functional group diversity of bee pollinators increases crop yield’, Proceedings of the Royal Society B: Biological Sciences, 275(1648), pp. 2283–2291. doi: 10.1098/rspb.2008.0405.

Holloway, B. A. (1976) ‘Pollen-feeding in hover-flies (Diptera: Syrphidae)’, New Zealand Journal of Zoology, 3(4), pp. 339–350. doi: 10.1007/s13398-014-0173-7.2.

Holzschuh, A., Dudenhöffer, J. H. and Tscharntke, T. (2012) ‘Landscapes with wild bee habitats enhance pollination, fruit set and yield of sweet cherry’, Biological Conservation, 153, pp. 101–107. doi: 10.1016/j.biocon.2012.04.032.

Homburg, K. et al. (2019) ‘Where have all the beetles gone? Long-term study reveals carabid species decline in a nature reserve in Northern Germany’, Insect Conservation and Diversity, 12(4), pp. 268–277. doi: 10.1111/icad.12348.

Hosseini, R., Schmidt, O. and Keller, M. A. (2008) ‘Factors affecting detectability of prey DNA in the gut contents of invertebrate predators: A polymerase chain reaction-based method’, Entomologia Experimentalis et Applicata, 126(3), pp. 194–202. doi: 10.1111/j.1570-7458.2007.00657.x.

Huson, D. H. et al. (2016) ‘MEGAN Community Edition - Interactive Exploration and Analysis of Large-Scale Microbiome Sequencing Data’, PLoS Computational Biology, 12(6), pp. 1–12. doi: 10.1371/journal.pcbi.1004957.

Illumina (2015) ‘TruSeq Nano DNA library preparation reference guide’. catalog#: FC-121-9010DOC.

Jauker, F. et al. (2009) ‘Pollinator dispersal in an agricultural matrix: opposing responses of wild bees and hoverflies to landscape structure and distance from main habitat’, Landscape Ecology, 24(4), pp. 547–555. doi: 10.1007/s10980-009-9331-2.

Jauker, F. and Wolters, V. (2008) ‘Hover flies are efficient pollinators of oilseed rape’, Oecologia, 156(4), pp. 819–823. doi: 10.1007/s00442-008-1034-x.

Ji, Y. et al. (2019) ‘SPIKEPIPE: A metagenomic pipeline for the accurate quantification of eukaryotic species occurrences and intraspecific abundance change using DNA barcodes or mitogenomes’, Molecular Ecology Resources, pp. 1–12. doi: 10.1111/1755-0998.13057.

Jordaens, K. et al. (2015) ‘DNA Barcoding to Improve the Taxonomy of the Afrotropical Hoverflies (Insecta: Diptera: Syrphidae)’, Plos One, 10(10), p. e0140264. doi: 10.1371/journal.pone.0140264.

Kearse, M. et al. (2012) ‘Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data’, Bioinformatics, 28(12), pp. 1647–1649. doi: 10.1093/bioinformatics/bts199.

Keller, A. et al. (2015) ‘Evaluating multiplexed next-generation sequencing as a method in palynology for mixed pollen samples.’, Plant biology (Stuttgart, Germany), 17(2), pp. 558–66. doi: 10.1111/plb.12251.

Kembel, S. W. et al. (2010) ‘Picante: R tools for integrating phylogenies and ecology’, Bioinformatics, 26(11), pp. 1463–1464. doi: 10.1093/bioinformatics/btq166.

Kendall, D. A. and Solomon, M. (1973) ‘Quantities of pollen on the bodies of insects visiting apple blossom’, Journal of Applied Ecology, 10(2), pp. 627–634. doi: 10.2307/2402306.

Kerr, J. T. et al. (2015) ‘Climate change impacts on bumblebees converge across continents’, Science, 349(6244), pp. 177–180. doi: 10.1126/science.aaa7031.

Kevan, P. G. (1983) ‘Insects as flower visitors and pollinators’, Annual Review of Entomology, 28, pp. 407–53. doi: 10.1146/Annurev.en.28.010183.002203.

Kleijn, D. et al. (2015) ‘Delivery of crop pollination services is an insufficient argument for wild pollinator conservation’, Nature Communications, 6, p. 7414. doi: 10.1038/ncomms8414.

Klein, A. M. et al. (2007) ‘Importance of pollinators in changing landscapes for world crops’, Proceedings of the Royal Society B: Biological Sciences, 274(1608), pp. 303–313. doi: 10.1098/rspb.2006.3721.

Klein, A. M., Steffan-Dewenter, I. and Tscharntke, T. (2003) ‘Fruit set of highland coffee increases with the diversity of pollinating bees’, Proceedings of the Royal Society B: Biological Sciences, 270(1518), pp. 955–961. doi: 10.1098/rspb.2002.2306.

Kraaijeveld, K. et al. (2015) ‘Efficient and sensitive identification and quantification of airborne pollen using next-generation DNA sequencing’, Molecular Ecology Resources, 15(1), pp. 8–16. doi: 10.1111/1755-0998.12288.

Larson, B. M. H., Kevan, P. G. and Inouye, D. W. (2001) ‘Flies and flowers: taxonomic diversity of anthophiles and pollinators’, The Canadian Entomologist, 133, pp. 439–465. doi: 10.4039/Ent133439-4.

Li, H. (2019) ‘Characterization and phylogenetic implications of the complete mitochondrial genome of Idiocerinae (Hemiptera: Cicadellidae)’, Genes, 10(563), pp. 1–13. doi: 10.1016/j.ijbiomac.2018.08.191.

Linard, B. et al. (2016) ‘Lessons from genome skimming of arthropod-preserving ethanol’, Molecular Ecology Resources, 16(6). doi: 10.1111/1755-0998.12539.

Lövei, G. L., Macleod, A. and Hickman, J. M. (1998) ‘Dispersal and effects of barriers on the movement of the New Zealand hoverfly Melanostoma fasciatum (Dipt., Syrphidae) on cultivated land’, Journal of Applied Entomology, 122(1-2), pp. 115–120. doi: 10.1111/j.1439-0418.1998.tb01471.x.

Lucas, A. et al. (2018) ‘Floral resource partitioning by individuals within generalised hoverfly pollination networks revealed by DNA metabarcoding’, Scientific Reports, 8(1), p. 5133. doi: 10.1038/s41598-018-23103-0.

Magnacca, K. N. and Brown, M. J. F. (2012) ‘DNA barcoding a regional fauna: Irish solitary bees’, Molecular Ecology Resources, 12, pp. 990–998. doi: 10.1111/1755-0998.12001.

Mao, M., Gibson, T. and Dowton, M. (2015) ‘Higher-level phylogeny of the Hymenoptera inferred from mitochondrial genomes’, Molecular Phylogenetics and Evolution, 84, pp. 34–43. doi: 10.1016/j.ympev.2014.12.009.

Memmott, J. et al. (2007) ‘Global warming and the disruption of plant-pollinator interactions’, Ecology Letters, 10(8), pp. 710–717. doi: 10.1111/j.1461-0248.2007.01061.x.

Mengual, X., Ståhls, G. and Rojo, S. (2015) ‘Phylogenetic relationships and taxonomic ranking of pipizine flower flies (Diptera: Syrphidae) with implications for the evolution of aphidophagy’, Cladistics, 45, p. n/a-n/a. doi: 10.1111/cla.12105.

Meyer, B., Jauker, F. and Steffan-Dewenter, I. (2009) ‘Contrasting resource-dependent responses of hoverfly richness and density to landscape structure’, Basic and Applied Ecology, 10(2), pp. 178–186. doi: 10.1016/j.baae.2008.01.001.

Millan, S. M. C. et al. (2007) ‘The influence of time and temperature on molecular gut content analysis: Adalia bipunctata fed with Rhopalosiphum padi’, Insect Science, 14(5), pp. 353–358. doi: 10.1111/j.1744-7917.2007.00161.x.

Miller, M. A., Pfeiffer, W. and Schwartz, T. (2010) ‘Creating the CIPRES Science Gateway for inference of large phylogenetic trees’, 2010 Gateway Computing Environments Workshop, GCE 2010, pp. 1–8. doi: 10.1109/GCE.2010.5676129.

Newson, S. E. et al. (2016) ‘Long-term changes in the migration phenology of UK breeding birds detected by large-scale citizen science recording schemes’, Ibis, 158(3), pp. 481–495. doi: 10.1111/ibi.12367.

Nichols, R. V. et al. (2018) ‘Minimizing polymerase biases in metabarcoding’, Molecular Ecology Resources, 18(5), pp. 927–939. doi: 10.1111/1755-0998.12895.

Notton, D. G. and Norman, H. (2017) ‘Hawk’s-beard Nomad Bee, Nomada facilis, new to Britain (Hymenoptera: Apidae)’, British Journal of Entomology and Natural History, 30, pp. 201–214.

Oksanen, J. et al. (2016) ‘vegan: Community Ecology Package. Ordination methods, diversity analysis and other functions for community and vegetation ecologists’, 2006. https://cran.r-project.org/web/packages/vegan/index.html.

Ollerton, J., Winfree, R. and Tarrant, S. (2011) ‘How many flowering plants are pollinated by animals?’, Oikos, 120(3), pp. 321–326. doi: 10.1111/j.1600-0706.2010.18644.x.

Orford, K. A., Vaughan, I. P. and Memmott, J. (2015) ‘The forgotten flies: the importance of non-syrphid Diptera as pollinators.’, Proceedings. Biological sciences / The Royal Society, 282(1805), pp. 20142934-. doi: 10.1098/rspb.2014.2934.

Packer, L. and Ruz, L. (2017) ‘DNA barcoding the bees (Hymenoptera: Apoidea) of Chile: Species discovery in a reasonably well known bee fauna with the description of a new species of Lonchopria (Colletidae)’, Genome, 60(5), pp. 414–430. doi: 10.1139/gen-2016-0071.

Painter, D. (2001) ‘Macroinvertebrate distributions and the conservation value of aquatic Coleoptera, Mollusca and Odonata in the ditches of traditionally managed and grazing fen at Wicken Fen, UK’, Journal of Applied Ecology, 36(1). doi: 10.1046/j.1365-2664.1999.00376.x.

De Palma, A. et al. (2017) ‘Dimensions of biodiversity loss: Spatial mismatch in land-use impacts on species, functional and phylogenetic diversity of European bees’, Diversity and Distributions, 23(12), pp. 1435–1446. doi: 10.1111/ddi.12638.

Pascual-Villalobos, M. J. et al. (2006) ‘Effect of flowering plant strips on aphid and syrphid populations in lettuce’, European Journal of Agronomy, 24(2), pp. 182–185. doi: 10.1016/j.eja.2005.07.003.

Pauli, T. et al. (2018) ‘New data, same story: phylogenomics does not support Syrphoidea (Diptera: Syrphidae, Pipunculidae)’, Systematic Entomology, 43(3), pp. 447–459. doi: 10.1111/syen.12283.

Pescott, O. L. et al. (2015) ‘Ecological monitoring with citizen science: The design and implementation of schemes for recording plants in Britain and Ireland’, Biological Journal of the Linnean Society, 115(3), pp. 505–521. doi: 10.1111/bij.12581.

Petchey, O. L. and Gaston, K. J. (2002) ‘Extinction and the loss of functional diversity.’, Proceedings. Biological sciences / The Royal Society, 269(1501), pp. 1721–7. doi: 10.1098/rspb.2002.2073.

Popov, G. V. (2015) Syrphidae from the Cretaceous - refuted? 8th International Symposium on Syrphidae, Monchau, Germany.

Powney, G. D. et al. (2019) ‘Widespread losses of pollinating insects in Britain’, Nature Communications, 10(1), pp. 1–6. doi: 10.1038/s41467-019-08974-9.

Puillandre, N. et al. (2012) ‘ABGD, Automatic Barcode Gap Discovery for primary species delimitation’, Molecular Ecology, 21(8), pp. 1864–1877. doi: 10.1111/j.1365-294X.2011.05239.x.

Rader, R. et al. (2011) ‘Pollen transport differs among bees and flies in a human-modified landscape’, Diversity and Distributions, 17, pp. 519–529. doi: 10.1111/j.1472-4642.2011.00757.x.

Rader, R. et al. (2016) ‘Non-bee insects are important contributors to global crop pollination’, Proceedings of the National Academy of Sciences, 113(1), pp. 146–151. doi: 10.1073/pnas.1517092112.

Radford, K. G. and James, P. (2013) ‘Changes in the value of ecosystem services along a rural-urban gradient: A case study of Greater Manchester, UK’, Landscape and Urban Planning, 109(1), pp. 117–127. doi: 10.1016/j.landurbplan.2012.10.007.

Rafferty, N. E. (2017) ‘Effects of global change on insect pollinators: multiple drivers lead to novel communities’, Current Opinion in Insect Science, 23, pp. 22–27. doi: 10.1016/j.cois.2017.06.009.

Ratanasingham, S. and Hebert, P. D. N. (2007) ‘The Barcode of Life Data System BOLD’, Molecular Ecology Notes, 7, pp. 355–364. doi: 10.1111/j.1471-8286.2006.01678.x.

Rees, H. C. et al. (2014) ‘The application of eDNA for monitoring of the Great Crested Newt in the UK’, Ecology and Evolution, 4(21), pp. 4023–4032. doi: 10.1002/ece3.1272.

Revell, L. J. (2012) ‘phytools: An R package for phylogenetic comparative biology (and other things)’, Methods in Ecology and Evolution, 3(2), pp. 217–223. doi: 10.1111/j.2041-210X.2011.00169.x.

Richardson, R. T. et al. (2015) ‘Application of ITS2 Metabarcoding to Determine the Provenance of Pollen Collected by Honey Bees in an Agroecosystem’, Applications in Plant Sciences, 3(1), p. 1400066. doi: 10.3732/apps.1400066.

van Rijn, P. C. J. and Wäckers, F. L. (2016) ‘Nectar accessibility determines fitness, flower choice and abundance of hoverflies that provide natural pest control’, Journal of Applied Ecology, 53(3), pp. 925–933. doi: 10.1111/1365-2664.12605.

Rokas, A. and Carroll, S. B. (2002) ‘More Genes or More Taxa? The Relative Contribution of Gene Number and Taxon Number to Phylogenetic Accuracy’, 22(5), pp. 1337–1344. doi: 10.1093/molbev/msi121.

Romero, G. Q., Antiqueira, P. A. P. and Koricheva, J. (2011) ‘A meta-analysis of predation risk effects on pollinator behaviour’, PLoS ONE, 6(6), p. e20689. doi: 10.1371/journal.pone.0020689.

Roulston, T’ai H., Smith, Stephen A., B. A. L. (2007) ‘A Comparison of Pan Trap and Intensive Net Sampling Techniques for Documenting a Bee (Hymenoptera: Apiformes) Fauna’, Journal of the Kansas Entomological Society, 80(2), pp. 179–181.

Rundlöf, M. et al. (2015) ‘Seed coating with a neonicotinoid insecticide negatively affects wild bees’, Nature, 521, pp. 77–80. doi: 10.1038/nature14420.

Santamaría, S. et al. (2016) ‘Removing interactions, rather than species, casts doubt on the high robustness of pollination networks’, Oikos, 125(4), pp. 526–534. doi: 10.1111/oik.02921.

Sapir, G. et al. (2017) ‘Synergistic effects between bumblebees and honey bees in apple orchards increase cross pollination, seed number and fruit size’, Scientia Horticulturae, 219, pp. 107–117. doi: 10.1016/j.scienta.2017.03.010.

Šašić, L. et al. (2016) ‘Molecular and morphological inference of three cryptic species within the Merodon aureus species group (Diptera: Syrphidae)’, PLoS ONE, 11(8), pp. 6–8. doi: 10.1371/journal.pone.0160001.

Saunders, M. E.
(2018) ‘Insect pollinators collect pollen from wind-pollinated plants: implications for pollination ecology and sustainable agriculture’, Insect Conservation and Diversity, 11(1), pp. 13–31. doi: 10.1111/icad.12243. Savolainen, V. et al. (2005) ‘Towards writing the encyclopaedia of life: An introduction to DNA barcoding’, Philosophical Transactions of the Royal Society B: Biological Sciences,

139 360(1462), pp. 1805–1811. doi: 10.1098/rstb.2005.1730. Schwarz, H. H. and Huck, K. (1997) ‘Phoretic mites use flowers to transfer between foraging bumblebees’, Insectes Sociaux, 44(4), pp. 303–310. doi: 10.1007/s000400050051. Senapathi, Deepa et al. (2015) ‘Pollinator conservation - The difference between managing for pollination services and preserving pollinator diversity’, Current Opinion in Insect Science, 12, pp. 93–101. doi: 10.1016/j.cois.2015.11.002. Senapathi, D. et al. (2015) ‘The impact of over 80 years of land cover changes on bee and wasp pollinator communities in England’, Proceedings of the Royal Society B: Biological Sciences, 282(1806), pp. 20150294–20150294. doi: 10.1098/rspb.2015.0294. Shapland, E. B. et al. (2015) ‘Low-cost, high-throughput sequencing of DNA assemblies using a highly multiplexed Nextera process’, ACS Synthetic Biology, p. 150425205656002. doi: 10.1021/sb500362n. Sheffield, C. S. et al. (2017) ‘Contribution of DNA barcoding to the study of the bees (Hymenoptera: Apoidea) of Canada: Progress to date’, Canadian Entomologist, 149(6), pp. 736–754. doi: 10.4039/tce.2017.49. Shokralla, S. et al. (2015) ‘Massively parallel multiplex DNA sequencing for specimen identification using an Illumina MiSeq platform.’, Scientific reports, 5, p. 9687. doi: 10.1038/srep09687. Singh, R. et al. (2010) ‘RNA Viruses in Hymenopteran Pollinators: Evidence of Inter-Taxa Virus Transmission via Pollen and Potential Impact on Non-Apis Hymenopteran Species’, PLoS ONE, 5(12), p. e14357. doi: 10.1371/journal.pone.0014357. Skevington, J. H. and Yeates, D. K. (2000) ‘Phylogeny of the Syrphoidea (Diptera) inferred from mtDNA sequences and morphology with particular reference to classification of the Pipunculidae (Diptera)’, Molecular Phylogenetics and Evolution, 16(2), pp. 212–224. doi: 10.1006/m. Sonet, G. et al. (2019) ‘First mitochondrial genomes of five hoverfly species of the genus Eristalinus (Diptera: Syrphidae)’, Genome, 999, pp. 1-11. 
doi: https://doi.org/10.1139/gen- 2019-0009. Ståhls, G. et al. (2003) ‘Phylogeny of Syrphidae (Diptera) inferred from combined analysis of molecular and morphological characters’, Systematic Entomology, 28(4), pp. 433–450. doi: 10.1046/j.1365-3113.2003.00225.x. Stamatakis, A. (2014) ‘RAxML version 8: A tool for phylogenetic analysis and post-analysis of

140 large phylogenies’, Bioinformatics, 30(9), pp. 1312–1313. doi: 10.1093/bioinformatics/btu033. Straub, S. C. K. et al. (2012) ‘Navigating the tip of the genomic iceberg: Next-generation sequencing for plant systematics’, American Journal of Botany, 99(2), pp. 349–364. doi: 10.3732/ajb.1100335. Sutherland, J. P., Sullivan, M. S. and Poppy, G. M. (2001) ‘Distribution and abundance of aphidophagous hoverflies (Diptera: Syrphidae) in wildflower patches and field margin habitats’, Agricultural and Forest Entomology, 3(1), pp. 57–64. doi: 10.1046/j.1461- 9563.2001.00090.x. Taberlet, P. et al. (2007) ‘Power and limitations of the chloroplast trnL (UAA) intron for plant DNA barcoding’, Nucleic Acids Research, 35(3), p. e14. doi: 10.1093/nar/gkl938. Tang, M. et al. (2014) ‘Multiplex sequencing of pooled mitochondrial genomes - a crucial step toward biodiversity analysis using mito-metagenomics’, Nucleic Acids Research, 42(22), pp. e166–e166. doi: 10.1093/nar/gku917. Tang, M. et al. (2015) ‘High-throughput monitoring of wild bee diversity and abundance via mitogenomics’, Methods in Ecology and Evolution, 6, pp. 1034-1043. doi: 10.1111/2041- 210X.12416. Thompson, F. C. (1972) ‘A contribution to a generic revision of the neotropical Milesinae (Diptera: Syrphidae)’, Arquivos de Zoologia, 23(2), pp. 73–215. doi: 10.11606/issn.2176- 7793.v23i2p73-215 To, T. H. et al. (2016) ‘Fast Dating Using Least-Squares Criteria and Algorithms’, Systematic Biology, 65(1), pp. 82–97. doi: 10.1093/sysbio/syv068. Tur, C. et al. (2014) ‘Downscaling pollen-transport networks to the level of individuals’, Journal of Animal Ecology, 83(1), pp. 306–317. doi: 10.1111/1365-2656.12130. Turon, X. et al. (2019) ‘From metabarcoding to metaphylogeny: separating the wheat from the chaff’, bioRxiv. doi: 10.13811/j.cnki.eer.2019.06.011. Vanbergen, A. J. (2014) ‘Landscape alteration and habitat modification: impacts on plant– pollinator systems’, Current Opinion in Insect Science, 5, pp. 44–49. 
doi: 10.1016/j.cois.2014.09.004. Vanbergen, A. J. (2013) ‘Threats to an ecosystem service: pressures on pollinators’, Frontiers in Ecology and the Environment, 11(5), pp. 251–259. doi: 10.1890/120126. de Vere, N. et al. (2012) ‘DNA barcoding the native flowering plants and conifers of wales’,

141 PLoS ONE, 7(6), pp. 1–12. doi: 10.1371/journal.pone.0037945. de Vere, N. et al. (2017) ‘Using DNA metabarcoding to investigate honey bee foraging reveals limited flower use despite high floral availability’, Scientific Reports, 7, pp. 1–10. doi: 10.1038/srep42838. Vrdoljak, S. M. and Samways, M. J. (2012) ‘Optimising coloured pan traps to survey flower visiting insects’, Journal of Insect Conservation, 16(3), pp. 345–354. doi: 10.1007/s10841- 011-9420-9. Weiner, C. N. et al. (2014) ‘Land-use impacts on plant-pollinator networks: Interaction strength and specialization predict pollinator declines’, Ecology, 95(2), pp. 466–474. doi: 10.1890/13-0436.1. Whitworth, T. L. et al. (2007) ‘DNA barcoding cannot reliably identify species of the blowfly genus Protocalliphora (Diptera: Calliphoridae)’, Proceedings of the Royal Society B: Biological Sciences, 274(1619), pp. 1731–1739. doi: 10.1098/rspb.2007.0062. Wickham, H. (2016) ggplot2: Elegant Graphics for Data Analysis. New York: Springer-Verlag. Wilson, J. S., Griswold, T. and Messinger, O. J. (2008) ‘Sampling Bee Communities (Hymenoptera: Apiformes) in a Desert Landscape: Are Pan Traps Sufficient?’, Journal of the Kansas Entomological Society, 81(3), pp. 288–300. doi: 10.2317/jkes-802.06.1. Wratten, S. D. et al. (2003) ‘Field boundaries as barriers to movement of hover flies (Diptera: Syrphidae) in cultivated land.’, Oecologia, 134(4), pp. 605–611. doi: 10.1007/s00442-002-1128-9. Yao, H. et al. (2010) ‘Use of ITS2 region as the universal DNA barcode for plants and animals’, PLoS ONE, 5(10). doi: 10.1371/journal.pone.0013102. Young, A. D. et al. (2016) ‘Anchored enrichment dataset for true flies (order Diptera) reveals insights into the phylogeny of flower flies (family Syrphidae)’, BMC Evolutionary Biology, 16(1), pp. 1–13. doi: 10.1186/s12862-016-0714-0. Yu, D. W. et al. 
(2012) ‘Biodiversity soup: Metabarcoding of arthropods for rapid biodiversity assessment and biomonitoring’, Methods in Ecology and Evolution, 3(4), pp. 613–623. doi: 10.1111/j.2041-210X.2012.00198.x. Zhang, X. et al. (2019) ‘Mitochondrial genomes provide insights into the phylogeny of Culicomorpha (Insecta: Diptera)’, International Journal of Molecular Sciences, 20(3), pp. 747. doi: 10.3390/ijms20030747. Zhu, X. C. et al. (2017) ‘DNA barcoding and species delimitation of chaitophorinae

142 (Hemiptera, Aphididae)’, ZooKeys, 2017(656), pp. 25–50. doi: 10.3897/zookeys.656.11440.

Supplementary Information

Chapter 1: Syrphidae Mitochondrial Genomes and Phylogeny

Columns: taxon; contig length (bp); number of the 13 protein-coding genes present; number of rRNA genes (rrnL, rrnS) present.

Taxon                                  Length (bp)  Protein-coding genes  rRNA genes
Alipumilio avispas                     15,510       13                    0
Allobaccha monobia                     3,858        3                     0
Allograpta fascifrons                  13,402       13                    2
Allograpta obliqua                     15,075       13                    2
Argentinomyia sp. ECU9                 3,767        5                     0
Aristosyrphus                          14,923       13                    2
Asarkina sp.                           4,852        6                     0
Austalis copiosa                       15,345       13                    2
Baccha elongata                        2,804        2                     0
Betasyrphus serarius                   6,258        8                     0
Blera eoa                              6,605        7                     0
Brachypalpus oarus                     13,242       13                    1
Calcaretropidia sp.                    10,037       9                     0
Caliprobola sp.                        15,381       13                    2
Callicera aenea                        15,442       13                    2
Ceriana alboseta                       14,954       13                    2
Ceriana cacica                         3,975        4                     0
Ceriana vespiformis                    15,495       13                    2
Chalcosyrphus chalybeus                15,239       12                    2
Chamaesphegina sp.                     15,549       13                    2
Chasmomma nigrum                       15,704       13                    2
Cheilosia albitarsis                   13,886       13                    2
Claraeola sicilis                      11,482       12                    0
Copestylum sp. ECU3                    16,211       13                    2
Criorhina nigriventris                 16,004       13                    2
Cyphipelta rufocyanea                  15,248       13                    2
Dideopsis aegrota                      15,935       13                    2
Domodon peperpotensis                  8,160        8                     0
Doros destillatorius                   9,298        9                     0
Eristalinus aeneus                     15,182       13                    2
Epalpus signifer                       3,526        5                     0
Eristalis pratorum                     2,873        2                     0
Hadromyia pulchra                      4,661        6                     0
Hybobathus norina                      16,085       13                    2
Hypselosyrphus                         15,760       13                    2
Kertesziomyia violascens               15,485       13                    2
Lejops lunulata                        15,621       13                    2
Lejota ruficornis                      2,655        3                     0
Lycopale sp.                           15,207       13                    1
Mallota florea                         14,321       13                    1
Matsumyia nigrofacies                  14,584       13                    2
Meliscaeva auricollis                  15,759       13                    2
Microdon globosus                      13,093       13                    2
Microdon sp. ECU1                      14,420       13                    1
Milesia pendleburyi                    14,545       12                    2
Nausigaster sp. MEX11                  8,684        9                     0
Neoplesia analis                       15,509       13                    2
Nephrocerus lapponicus                 16,581       13                    2
Ocyptamus dimidiatus                   14,301       13                    1
Ocyptamus cubana                       15,520       13                    2
Ocyptamus melanorrhina                 16,191       13                    2
Ocyptamus priscilla                    13,942       13                    1
Ornidia obesa                          15,555       13                    2
Orthonevra nitida                      4,218        3                     2
Orthoprosopa grisea                    15,846       13                    2
Orthoprosopa multicolor                15,205       13                    2
Paramixogaster                         14,508       6                     0
Parhelophilus rex                      9,933        13                    2
Parhelophilus sp. HUN1                 14,682       13                    2
Pelecocera tricincta                   4,149        3                     0
Pseudomicrodon sp.                     15,969       13                    2
Psilota atra                           15,693       13                    2
Psilota anthracina                     15,701       13                    2
Pterallastes thoracicus                4,003        3                     0
Rodendorfia alpina                     7,237        2                     0
Salpingogaster sp. ECU13               9,177        10                    0
Senaspis dentipes                      14,389       13                    1
Serichlamys sp.                        2,725        12                    2
Sericomyia flagrans                    3,444        5                     0
Somula decora                          14,028       13                    2
Sterphus                               15,713       13                    2
Stipomorpha sp.                        13,682       12                    2
Syrphus rectus                         4,588        3                     0
Talahua sp.                            13,691       13                    1
Toxomerus saphiridiceps                17,574       13                    2
Tropidia rostrata                      10,416       8                     0
Xylota quadrimaculata                  15,055       13                    2
DQ866050 Simosyrphus grandicornis      16,141       13                    2
KM244713 Syrphidae sp.                 11,583       10                    2
KT272862 Ocyptamus sativus             15,214       13                    2

Number of contigs containing each protein-coding gene: nad2 69, cox1 77, cox2 77, atp8 72, atp6 70, cox3 67, nad3 66, nad5 63, nad4 60, nad4l 57, nad6 56, cob 56, nad1 56.

Table showing all of the Syrphidae contigs used in the phylogenetic analysis, their length and the genes present.


The placement of Nausigaster in each of the 13 gene trees, shown in the order in which the genes occur on the mitochondrial genome. The CO1 gene tree includes four Nausigaster CO1 sequences downloaded from GenBank. The three subfamilies are shown in different colours: orange = Microdontinae, pink = Eristalinae and green = Syrphinae. Nausigaster is marked on each tree with a red branch and a red star.

Chapter 2: Developing and testing a Syrphidae CO1 reference database

BOLD NPPMF NHM collections

GBMIN61940_17|Myolepta_dubia|COI_5P_KM270877 BEEEE357_16|Chrysotoxum_arcuatum|COI_5P SYRUK1_Platycheirus_clypeatus

AGAKB003_17|Chalcosyrphus_nemorum|COI_5P_MG166799 BEEEE358_16|Cheilosia_fraterna|COI_5P SYRUK2_Paragus_haemorrhous

ASDIP283_15|Syrphus_ribesii|COI_5P BEEEE359_16|Cheilosia_lasiopa|COI_5P SYRUK3_Syrphus_vitripennis

ASDIP288_15|Syrphus_ribesii|COI_5P BEEEE360_16|Chalcosyrphus_nemorum|COI_5P SYRUK4_Syrphus_ribesii

ASDIP290_15|Syrphus_rectus|COI_5P BEEEE361_16|Cheilosia_ranunculi|COI_5P SYRUK5_Eupeodes_luniger

ASDIP556_15|Syrphus_torvus|COI_5P BEEEE362_16|Dasysyrphus_pinastri|COI_5P SYRUK6_Platycheirus_clypeatus

ASDIP561_15|Syrphus_rectus|COI_5P BEEEE364_16|Episyrphus_balteatus|COI_5P SYRUK7_Dasysyrphus_albostriatus

ASDIP562_15|Syrphus_rectus|COI_5P BEEEE365_16|Eupeodes_corollae|COI_5P SYRUK8_Eupeodes_luniger

ASDIP563_15|Syrphus_rectus|COI_5P BEEEE366_16|Epistrophe_grossulariae|COI_5P SYRUK9_Melanostoma_mellinum

ASDMT1051_11|Eumerus_funeralis|COI_5P_MG164182 BEEEE366_16|Epistrophe_grossulariae|COI_5P_2 SYRUK10_Eupeodes_corollae

ASDMT1068_11|Eumerus_funeralis|COI_5P_MG163077 BEEEE367_16|Eristalis_horticola|COI_5P SYRUK11_Eupeodes_latifasciatus

ASDMT1274_11|Melangyna_lasiophthalma|COI_5P_MG163641 BEEEE369_16|Eupeodes_latifasciatus|COI_5P SYRUK12_Platycheirus_clypeatus

ASDMT1307_11|Melangyna_lasiophthalma|COI_5P_MG170934 BEEEE370_16|Eupeodes_luniger|COI_5P SYRUK13_Dasysyrphus_tricinctus

ASDMT1328_11|Melangyna_lasiophthalma|COI_5P_MG166162 BEEEE371_16|Eristalis_nemorum|COI_5P SYRUK14_Syrphus_ribesii

ASDMT1419_11|Syrphus_ribesii|COI_5P_MG165935 BEEEE372_16|Eristalis_pertinax|COI_5P SYRUK15_Platycheirus_clypeatus

ASDMT1459_11|Merodon_equestris|COI_5P_MG170543 BEEEE373_16|Eristalinus_sepulchralis|COI_5P SYRUK16_Cheilosia_vernalis

ASDMT1461_11|Merodon_equestris|COI_5P_MG164839 BEEEE374_16|Eumerus_strigatus|COI_5P SYRUK17_Pipizella_viduata

ASDMT1462_11|Merodon_equestris|COI_5P_MG164422 BEEEE375_16|Eristalis_tenax|COI_5P SYRUK18_Meliscaeva_auricollis

ASDMT1465_11|Merodon_equestris|COI_5P_MG165657 BEEEE376_16|Eristalis_tenax|COI_5P SYRUK19_Eupeodes_luniger

ASDMT1487_11|Merodon_equestris|COI_5P_MG163224 BEEEE377_16|Eristalis_tenax|COI_5P SYRUK20_Eupeodes_corollae

ASDMT1495_11|Merodon_equestris|COI_5P_MG169358 BEEEE378_16|Ferdinandea_cuprea|COI_5P SYRUK21_Syrphus_vitripennis

ASDMT1884_12|Syrphus_ribesii|COI_5P_KR428380 BEEEE379_16|Helophilus_pendulus|COI_5P SYRUK22_Eupeodes_luniger

ASDMT649_11|Eumerus_funeralis|COI_5P_MG169456 BEEEE380_16|Leucozona_lucorum|COI_5P SYRUK23_Syrphus_ribesii

ASDMT652_11|Eumerus_funeralis|COI_5P_MG165759 BEEEE381_16|Lejogaster_metallina|COI_5P SYRUK24_Dasysyrphus_albostriatus

ASDMT665_11|Eumerus_funeralis|COI_5P_MG163858 BEEEE382_16|Meliscaeva_cinctella|COI_5P SYRUK25_Melanostoma_mellinum

ASDMT869_11|Eupeodes_latifasciatus|COI_5P_MG163414 BEEEE383_16|Myathropa_florea|COI_5P SYRUK26_Melanostoma_mellinum

BARSM074_17|Chalcosyrphus_nemorum|COI_5P_MG163416 BEEEE384_16|Melanogaster_hirtella|COI_5P SYRUK27_Eupeodes_luniger

BARSM513_17|Syritta_pipiens|COI_5P_MG169778 BEEEE385_16|Melanogaster_hirtella|COI_5P SYRUK28_Platycheirus_clypeatus

BBDCM060_10|Neoascia_geniculata|COI_5P_JF866880_SUPPRESSED BEEEE386_16|Melanogaster_hirtella|COI_5P SYRUK29_Eupeodes_corollae

supplementary table barcode

Table containing the first 50 sequence names from each of the three databases. The rest of the sequence names making up the curated reference databases can be accessed through the hyperlink, and all of the sequence data is available on BOLD.
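The sequence names listed in the table follow two visible patterns: pipe-delimited names (for example `ASDIP283_15|Syrphus_ribesii|COI_5P`) and underscore-delimited names (for example `SYRUK4_Syrphus_ribesii`). As a minimal sketch of how such names could be split into their parts before tallying sequences per species, the Python below assumes that the first field is a record identifier, the second is the species name, and any text after `COI_5P_` is a trailing suffix. The names `SeqName` and `parse_seq_name` and these field interpretations are illustrative assumptions inferred from the examples, not a description of the scripts used in the thesis.

```python
from collections import Counter
from typing import NamedTuple, Optional


class SeqName(NamedTuple):
    source_id: str          # e.g. "ASDIP283_15" or "SYRUK4" (assumed record identifier)
    species: str            # e.g. "Syrphus ribesii"
    marker: Optional[str]   # e.g. "COI_5P"; None for underscore-delimited names
    suffix: Optional[str]   # any text following "COI_5P_", e.g. "MG166799"


def parse_seq_name(name: str) -> SeqName:
    """Split one sequence name from the table into its assumed fields."""
    if "|" in name:
        # Pipe-delimited pattern: identifier|Genus_species|marker[_suffix]
        source_id, species, rest = name.split("|", 2)
        marker, suffix = rest, None
        if rest.startswith("COI_5P_"):
            marker, suffix = "COI_5P", rest[len("COI_5P_"):]
        return SeqName(source_id, species.replace("_", " "), marker, suffix)
    # Underscore-delimited pattern: identifier_Genus_species
    source_id, species = name.split("_", 1)
    return SeqName(source_id, species.replace("_", " "), None, None)


# Tally sequences per species for a few names taken from the table.
names = [
    "ASDIP283_15|Syrphus_ribesii|COI_5P",
    "AGAKB003_17|Chalcosyrphus_nemorum|COI_5P_MG166799",
    "SYRUK4_Syrphus_ribesii",
]
per_species = Counter(parse_seq_name(n).species for n in names)
```

Here `per_species` maps each species name to the number of sequences carrying it, which is one simple way to check how many reference sequences each species has across the three databases.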
