Instituto de Investigación en Recursos Cinegéticos


CSIC - UCLM - JCCM

María José González Serna

Doctoral Thesis

Systematics of the genus Dociostaurus (Orthoptera: Acrididae) and genomic variation in species with different degrees of abundance: Implications for their management and conservation


CIUDAD REAL, 2019

Instituto de Investigación en Recursos Cinegéticos (IREC) (CSIC-UCLM-JCCM)

RECOMMENDED CITATION:

González-Serna, M.J. (2019) Sistemática del género Dociostaurus (Orthoptera: Acrididae) y variación genómica en especies con diferente grado de abundancia: Implicaciones para su manejo y conservación. Doctoral Thesis. Instituto de Investigación en Recursos Cinegéticos, IREC (CSIC-UCLM-JCCM), Ciudad Real, Spain.

COVER IMAGE: Dociostaurus hispanicus ♂, Valle de Alcudia (Ciudad Real). Photograph by Pedro J. Cordero.

COVER DESIGN: María José González Serna.


Dissertation submitted by María José González Serna in candidacy for the degree of Doctor from the Universidad de Castilla-La Mancha

The doctoral candidate: María José González Serna
Approved by the supervisor (Vº Bº): Dr. Joaquín Ortego Lozano
Approved by the supervisor (Vº Bº): Dr. Pedro J. Cordero Tapia

Instituto de Investigación en Recursos Cinegéticos (IREC) CSIC-UCLM-JCCM
Doctoral Programme in Agricultural and Environmental Sciences
Department of Agroforestry Science and Technology and Genetics
UNIVERSIDAD DE CASTILLA-LA MANCHA

During my Doctoral Thesis I held two Research Personnel Training (FPI) grants, implemented through several successive annual contracts for the training of doctors under Law 14/2011, of 1 June, on Science, Technology and Innovation, covering the period from December 2014 to December 2018. These FPI grants were co-funded by the Junta de Comunidades de Castilla-La Mancha and the European Social Fund, within the framework of the Human Resources Enhancement Programme of the Regional Research and Innovation Plan of Castilla-La Mancha and as part of the measures for talent retention and return in line with the objectives of the RIS3 (DOCM no. 155 of 13 August 2014, pp. 23906-23924; DOCM no. 232 of 29 November 2016, pp. 27644-27657). This Doctoral Thesis was also carried out in the context of several R&D&I projects funded by the Ministerio de Economía y Competitividad, the Junta de Comunidades de Castilla-La Mancha and the European Social Fund (CGL2011-25053, CGL2014-54671-P, CGL2016-80742-R, CGL2017-83433-P, PEII-2014023-P).

To my father Antonio, to my mother Mari Carmen

and to my sister Beatriz

Which one resembles all the animals?

The locust; because it has the horns of a deer, the eyes of a cow, the forehead of a horse, the legs of a stork, the tail of a snake and the wings of a dove.

BOWLES, D. (1775) INTRODUCCIÓN A LA HISTORIA NATURAL Y A LA FÍSICA DE ESPAÑA. P.258

CONTENTS

ABSTRACT

GENERAL INTRODUCTION

‐ STUDY SYSTEM
‐ JUSTIFICATION AND GENERAL OBJECTIVES
‐ STRUCTURE OF THE DOCTORAL THESIS
‐ METHODOLOGY

CHAPTER 1 A review of cross-backed grasshoppers of the genus Dociostaurus Fieber (Orthoptera: Acrididae) from the Western Mediterranean: insights from phylogenetic analyses and DNA-based species delimitation

CHAPTER 2 Using high-throughput sequencing to investigate the factors structuring genomic variation of a Mediterranean grasshopper of great conservation concern

CHAPTER 3 Spatiotemporally explicit demographic modelling supports a joint effect of historical barriers to dispersal and contemporary landscape composition on structuring genomic variation in a red-listed grasshopper

CHAPTER 4 Insights into the neutral and adaptive processes shaping the spatial distribution of genomic variation in the economically important Moroccan locust (Dociostaurus maroccanus)

GENERAL DISCUSSION

CONCLUSIONS

ACKNOWLEDGEMENTS

ABSTRACT

Dociostaurus maroccanus ♂ adult (top) and nymph (bottom), (Ciudad Real). Photograph by Pedro J. Cordero.

The rise of high-throughput sequencing technologies and the development of new analytical approaches currently offer a unique opportunity to resolve taxonomic problems and to infer, with unprecedented resolution, the ecological and evolutionary processes that have shaped the genomic variation of species across time and space. This knowledge is, in turn, a key tool for developing strategies aimed at the management and conservation of populations, species and communities, which is especially relevant in the current scenario of global change and biodiversity crisis. The general objectives of this Doctoral Thesis are (i) to clarify the systematics of the genus Dociostaurus (Orthoptera: Acrididae) and (ii) to deepen our knowledge of different aspects of the ecology and evolution of some of its species with different degrees of abundance, in order to obtain integrative and general answers that allow us to understand the processes that generate biological diversity at different levels. The reconstruction of the phylogenetic relationships among the Dociostaurus species distributed in the Western Mediterranean and the dating of their divergence times indicated that the main diversification of the genus took place during the Miocene-Pliocene and showed that the status of the currently recognized subgenera must be re-evaluated. Detailed species delimitation analyses resolved the taxonomic status of different sister species with a disjunct distribution in the Palearctic region, two of which are highly threatened Iberian endemics (Dociostaurus crassiusculus and D. hispanicus). Demographic and landscape genetic analyses for D. crassiusculus showed that the populations of this species are structured into two lineages comprising a total of three genetic groups whose distribution is explained by the boundaries between the main river basins. Comparison of spatiotemporally explicit demographic models for D. hispanicus revealed that its marked genetic structure has been shaped by the spatial configuration of topographic barriers and, to a lesser extent, by the fragmentation of its natural habitats. In contrast, genomic analyses for the Moroccan locust (D. maroccanus) indicated that populations of this pest species show weak genetic structure and revealed few differences in demography and in spatial patterns of genomic variation between gregarious and solitary populations. However, the presence of numerous loci under selection in this species suggests that some of its populations may have developed certain local adaptations in response to particular selective agents. Taken together, the results of this Doctoral Thesis highlight the importance of multidisciplinary and integrative approaches for studying the neutral and adaptive processes that shape genetic variation at different evolutionary scales, and show how this information can have relevant implications for establishing management strategies in organisms with very different conservation statuses.

GENERAL INTRODUCTION

Dociostaurus crassiusculus ♂, Laguna de Peña Hueca (Toledo). Photograph by Pedro J. Cordero.

Conservation biology emerged as a discipline to bridge the gap between genetics and ecology, on the one hand, and management and conservation practice, on the other (Soulé & Wilcox, 1980). The challenge this branch of science addresses is to generate solid, evidence-based knowledge that supports management and conservation policies aimed at preserving ecosystems, preventing species extinctions and minimizing the loss of their genetic variability. Its ultimate goal is therefore to avoid further "defoliating" the tree of life (Avise, 2005). This Doctoral Thesis is framed at the intersection between basic knowledge in systematics, ecology and evolution and the applied implications that this knowledge has for the management and conservation of biodiversity.

In many organisms, mainly invertebrates, progress in designing adequate conservation strategies has been hampered chiefly by the so-called "taxonomic" and "ecological" impediments, which refer to the scarcity of robust data on their systematics (e.g., phylogenetic reconstructions, delimitation of species and lineages) and to the enormous knowledge gaps regarding basic elements of their ecology (e.g., distribution, habitat, phenology) (Taylor, 1983; New & Samways, 2014). An aspect of great importance for both fundamental and applied science is to establish objective criteria that allow species to be rigorously delimited, since species are the units on which most conservation policies and programmes are based (IUCN, 2018). Species delimitation aims to achieve a more precise knowledge of biological diversity by separating these taxonomic units (Wiens, 2007; Carstens et al., 2013). The first obstacle is the lack of theoretical consensus on the species concept, an issue closely linked to the practical problems involved in delimiting species from the available empirical information (de Queiroz, 2007). The emergence of new types of data and analytical approaches, together with the progressive evolution of the different species concepts, has made it possible to address this problem from an increasingly integrative perspective (de Queiroz, 2007; Padial et al., 2010). A paradigmatic example of a taxonomic problem is the existence of cryptic species, understood as those that are classified under a single name on the basis of criteria, generally phenotypic, that have made it impossible to tell them apart (Bickford et al., 2007; Fišer et al., 2018). Other authors go further and include in this definition sister species that have diverged recently, are currently reproductively isolated and can only be separated using molecular data (Stebbins, 1950). The identification, within some formally recognized taxa, of lineages that exceed the levels of genetic divergence expected at the intraspecific level, combined with the detection of subtle phenotypic differences or their distribution in different environments or regions, has made it possible to establish the presence of cryptic species in groups of organisms across the entire tree of life (e.g., freshwater fishes: Feulner et al., 2006; tropical butterflies: Hebert et al., 2004; bivalves: Vrijenhoek et al., 1994; arctic plants: Grundt et al., 2006). Although the discovery of cryptic species is often a by-product of research on other questions, their identification represents a major advance in our knowledge of the planet's diversity and has multidisciplinary implications in fields as disparate as human health (e.g., the malaria-transmitting mosquitoes, Anopheles), pest management (e.g., variability in pesticide resistance) or the understanding of coevolutionary processes (Bickford et al., 2007). For all these reasons, species delimitation is a topic of enormous importance in taxonomy and systematics and one of the most hotly debated in evolutionary biology, since resolving it is the first step towards understanding the ecological and evolutionary processes underlying current patterns of biological diversity. Furthermore, species delimitation has profound implications that go beyond the purely academic sphere (de Queiroz, 2007; Wiens, 2007), given its great importance for drafting the environmental regulations that govern management and conservation policies (Zachos, 2015; IUCN, 2018).

In a broad sense, one of the goals of conservation biology is to preserve the outcome of past evolutionary processes, which constitute a fundamental and generally unique and unrepeatable component of present-day biodiversity (Bowen, 1999). These idiosyncratic evolutionary processes are not only those that lead to the formation of different species; very often they take place at the infraspecific level and involve populations, lineages or ecotypes that deserve to be preserved or, at least, considered in management and conservation plans. This has led to the development of Evolutionarily Significant Units (ESUs; Ryder, 1986; Waples, 1991; Moritz, 1994), which define those fractions of a given species that can be evaluated and designated on the basis of the genetic differences they have accumulated as a result of different evolutionary forces (e.g., genetic drift, divergent selection, local adaptation) (Fraser & Bernatchez, 2001). The ESU concept arose from the need to address the inability of taxonomy to capture infraspecific genetic diversity and the processes of local adaptation that represent an essential component of the evolutionary legacy of a species (Moritz, 2002). Its purpose is therefore to provide the theoretical basis for setting conservation priorities below the species level (Casacci et al., 2013). Unfortunately, the difficulty of establishing ESUs and the lack of consensus on their definition mean that the concept has had little impact on the assessment and definition of conservation priorities in government policies. As an alternative, the concept of "Designatable Units" (DUs) was developed as a practical solution to identify infraspecific entities of different kinds that differ in their conservation status even when their degree of ecological, genetic or adaptive differentiation has not been formally resolved (Green, 2005; Mee et al., 2015).

The processes that shape the neutral and adaptive genetic variation of a species and its populations can only be understood by studying them in the spatiotemporal context in which they take place. The enormous advances in our capacity to obtain genomic data, together with the development of new computational approaches, now allow us to infer with unprecedented resolution divergence times and migration rates between populations, spatial patterns of genetic structure and changes in effective population sizes through time, and to distinguish selective from neutral processes affecting particular loci or genomic regions (e.g., Ray et al., 2010; Excoffier & Foll, 2011; Frichot et al., 2013; Liu & Fu, 2015).


The study of the neutral processes that shape genetic variation at different evolutionary (genes, populations, lineages, species), spatial (local, regional) and temporal (historical vs. contemporary) scales not only allows us to infer how the different components of the landscape and their changes through time have determined the demographic history of organisms, but can also give us clues to predict the responses of populations, species and communities to future scenarios such as global warming or the fragmentation of natural habitats (Espindola et al., 2014; Brown et al., 2016). The current ability to obtain genomic data for non-model organisms has blurred the boundaries that long separated the fields of phylogeography (Avise, 2000) and landscape genetics (Manel et al., 2003; Manel & Holderegger, 2013). This has made it possible to address, within a single analytical framework, the processes that determine the distribution of genetic variation at different spatial and temporal scales, without the limitations previously imposed by the differing resolution of the available genetic markers (e.g., Ray et al., 2010; Excoffier & Foll, 2011; He et al., 2013). Among many other things, these new approaches can help us infer how historical processes (e.g., Pleistocene climatic changes: Hewitt, 2000; Hewitt, 2004) and contemporary processes (e.g., human-driven habitat fragmentation: Landguth et al., 2010; Ortego et al., 2010) have shaped the demography of populations. These inferences can be highly relevant for determining, for example, whether the genetic differences observed between populations result from fragmentation caused by human activities, in which case measures should be taken to restore connectivity and gene flow between populations, or whether they instead reflect historical processes of divergence that should be preserved as an intrinsic and irreplaceable part of the genetic and evolutionary heritage of the species (Cunningham & Moritz, 1998; Keller et al., 2013). In addition, current genomic tools make it possible to infer selective processes at particular loci or genomic regions that may be indicative of local adaptation, and help determine to what extent different populations show idiosyncratic evolutionary trajectories that must be considered in management and conservation plans (Bowen, 1999).


Integrating all these tools and conceptual approaches, and using the grasshopper genus Dociostaurus (Orthoptera: Acrididae) as a study system, this Doctoral Thesis first seeks to answer questions ranging from the delimitation of taxonomic units and the resolution of cryptic species to the understanding of the ecological and evolutionary processes that determine the distribution of genomic variation at different spatial and temporal scales in species with different degrees of abundance. Second, based on the inferences obtained for each of these questions, it derives recommendations with direct implications for the development of policies aimed at the management and conservation of the taxa analysed, or lays the foundations for future studies that can address these applied aspects in greater depth.


STUDY SYSTEM

The study system of this Doctoral Thesis is a set of orthopteran species of the genus Dociostaurus Fieber, 1853, belonging to the tribe Dociostaurini and the subfamily Gomphocerinae (Orthoptera: Acrididae), whose origin probably dates to the Miocene (García-Navas et al., 2017). The genus Dociostaurus is distributed mainly across southern Europe, northern Africa and central-southern Asia (Cigliano et al., 2019), and some of its species are among the most common and abundant in the Palearctic region (e.g., Sirin & Mol, 2013). The most representative traits of this genus are a conspicuous light-coloured pattern in the shape of a St Andrew's cross on the pronotum, three well-developed transverse sulci, lateral carinae obliterated in the second half of the prozona, a curved vertex, and closely set foveolae with sharply marked margins that are clearly visible from above (Fieber, 1853). Practically all the available knowledge on the genus Dociostaurus is descriptive, dealing either with its taxonomy or with particular aspects of the distribution, ecology and phenology of some species. The genus has undergone successive taxonomic revisions from the mid-19th century to the present, which have changed the taxonomic status of several species and subspecies over time (Soltani, 1978; Cigliano et al., 2019). This situation has fuelled an ongoing debate among taxonomists, both specialists in Orthoptera and those working on many other groups of organisms, about the need to base species delimitation not only on phenotypic traits but on their integration with genetic data, so that taxonomic entities can be defined more robustly and objectively (Wiens, 2007; Zachos, 2018; Padial et al., 2010).

Of the approximately 30 species that make up this genus (Cigliano et al., 2019), three have served as specific study models for different chapters of this Doctoral Thesis (Figure 1):

i. Dociostaurus crassiusculus (Pantel, 1886): a species endemic to the Iberian Peninsula whose body length ranges from 19 to 25 mm, with females larger than males. It is characterized by a robust body and a quadrangular pronotum (not narrowed in the middle), with the cross-shaped pattern characteristic of the genus better defined on the metazona than on the prozona. The tegmina never extend beyond the tips of the knees. The knees are brown and the tibiae reddish. The stridulatory file on the inner side of the femur has 27-35 conical pegs in males and 26-34 vestigial, less defined pegs in females (Pantel, 1886; Soltani, 1978; García et al., 2005). Ecologically, this Iberian endemic is much less well known than other species of the genus. It occupies pseudo-steppe habitat consisting of silty, gypsum-rich and saline land with sandy soil and sparse gypsophilous and halophilous vegetation. Notably, it is not readily found in all potentially suitable habitats; rather, its relict populations are highly fragmented and restricted to the centre and south of the Iberian Peninsula (Cordero et al., 2010). Its population peak occurs between late May and early June. The species is currently listed in the Red List of the International Union for Conservation of Nature (IUCN) as "Endangered" (EN) under criterion B2ab(iii,v), according to which the taxon is estimated to occupy an area of less than 500 km², with severe fragmentation of its populations and a continuing decline in both the extent and quality of its habitat and the number of mature individuals (Hochkirch et al., 2016).

ii. Dociostaurus hispanicus Bolívar, 1898: another species endemic to the Iberian Peninsula. Its body length ranges from 12 to 24 mm, with females larger than males. It is characterized by a slender body, a very well-defined cross pattern on the pronotum and sternal lobes located above the mesosternal space. The foveolae of the vertex are almost rectangular and, in males, the antennae extend beyond the pronotum by twice its length. The tegmina never extend beyond the tips of the knees. Both the knees and a small part of the tibia next to the knee are black, while the rest of the tibia is bright red. The stridulatory file of males is very long, uniform and straight, with 70-86 subconical pegs, whereas in females the file is vestigial and thin, with 73-88 almost invisible pegs (Pantel, 1886; Soltani, 1978; García et al., 2005). Its preferred habitat is the dehesas and sparsely vegetated pastures of the plains of the central Iberian Peninsula (García et al., 2005; Presa et al., 2016). Its population peak occurs in June and early July. This species is listed in a lower threat category than the previous one, "Near Threatened" (NT) according to the IUCN, owing to its small distribution range and its fragmented populations across much of the central peninsula (Hochkirch et al., 2016; Presa et al., 2016).

iii. Dociostaurus maroccanus (Thunberg, 1815): this species is known as the "Moroccan locust" because the type specimen, described by Thunberg at the beginning of the 19th century, came from the Atlas Mountains in Morocco. However, its distribution is much wider, although discontinuous, extending from some Macaronesian islands (Canary Islands and Madeira) to Afghanistan and southern Kazakhstan, in a belt spanning some 2,000 km in latitude (Latchininsky, 1998; Latchininsky, 2013). Its body length ranges from 16 to 37 mm, with females larger than males. This species is easily recognized because its tegmina extend well beyond the tips of the knees. The knees are black and the tibiae reddish. The posterior margin of the tenth abdominal tergite is smooth and the sternal lobes lie below the mesosternal space. The foveolae of the vertex are trapezoidal, wider next to the eyes and, in males, the antennae extend beyond the pronotum by once its length. The stridulatory file on the inner side of the femur in males is long and straight but not dense, with 68-80 subconical, rounded pegs, whereas females have 73-88 fainter pegs (Pantel, 1886; Soltani, 1978; García et al., 2005). This species can be found in isolated populations in its solitary form or in its gregarious form, producing devastating outbreaks that recur year after year in certain areas ("Phase Theory", Uvarov, 1921). Morphological differences have been observed between the forms, especially in body size (Latchininsky & Launois-Luong, 1992), with gregarious forms being larger than solitary ones (García de la Vega, 1980). In the gregarious form, individuals may fly 70-100 km, and never more than 200 km, over their lifetime (Latchininsky, 1998), which can lead to exchanges of individuals between distant populations. The preferred habitat of the polyphagous Moroccan locust is steppes and semi-desert areas with mosaics of low vegetation made up of ephemeral grasses (e.g., Poa bulbosa), sedges and sparse herbaceous cover, interspersed with bare ground. The presence of D. maroccanus is usually associated with overgrazed land or livestock farming (Latchininsky, 1998). Its population peak varies with latitude, from late May to early July in the Iberian Peninsula and from mid-June to the end of July in the Canary Islands.

Figure 1. Approximate distribution ranges of the three species studied in Chapters 2, 3 and 4 of this Doctoral Thesis. The photograph of D. hispanicus was taken by Piluca Álvarez, whereas those of D. crassiusculus and D. maroccanus were taken by Pedro J. Cordero.


For all these reasons, the model species on which this Doctoral Thesis is based are particularly attractive to study because of the applied interest associated with their diverse statuses, which range from "Endangered" species of conservation concern to pest species of great economic importance.


JUSTIFICATION AND GENERAL OBJECTIVES

The general objective of this Doctoral Thesis is to deepen our knowledge of different aspects of the systematics, ecology and evolution of the genus Dociostaurus in order to obtain integrative answers that allow us to understand the processes that generate biological diversity at different levels, from species to populations. In particular, this Thesis focuses on the genus Dociostaurus in the western part of the Mediterranean arc, first investigating its systematics and then analysing the demographic and adaptive processes that determine the spatial distribution of genetic variability in three species (D. crassiusculus, D. hispanicus and D. maroccanus) with contrasting ecological characteristics. To address these questions, this Doctoral Thesis integrates concepts and methodologies from multiple disciplines, including taxonomy, systematics, population genetics and phylogeography, to study the neutral and adaptive processes that shape genetic variation at different spatiotemporal and evolutionary scales.

Addressing these questions is particularly relevant given the biogeographic context of this Doctoral Thesis: the Mediterranean region. This region is one of the most important biodiversity hotspots on the planet (Médail & Quezel, 1999; Myers et al., 2000; Brooks et al., 2006), mainly because of its historical climatic stability (Blondel & Aronson, 1999), the low impact of the Pleistocene glaciations (Hewitt, 2000) and the connections among Mediterranean areas of Europe, Asia and Africa (Ribera & Blasco-Zumeta, 1998; Faille et al., 2014). At the same time, the Mediterranean basin is also one of the areas most altered by human activities (Blondel & Aronson, 1999; Ortego et al., 2015), which makes it particularly important to understand the processes that generate and maintain its biological diversity in order to establish adequate measures to guarantee its conservation (Barredo et al., 2016; Giorgi & Lionello, 2008). In view of the above, the knowledge generated by this Doctoral Thesis will ultimately contribute to the development of conservation and management plans both for the species studied and for the ecosystems with which they are associated. This Doctoral Thesis sets out a series of specific objectives, each of which is developed in a different chapter:

1. To reconstruct the phylogenetic relationships of the species of the genus Dociostaurus distributed in the Western Mediterranean using mitochondrial markers (12S, 16S and COI), and to use this information to re-evaluate the taxonomic status of the different subgenera, species and subspecies.

2. To determine the relative importance of historical and anthropogenic factors in shaping the contemporary genetic structure of the orthopteran D. crassiusculus, and to use the resulting inferences to define conservation units in this Iberian endemic, currently listed as "Endangered".

3. To illustrate the potential of integrating ecological data and different sources of spatial information to infer the processes (e.g., topographic barriers, Pleistocene climatic changes, human-driven habitat fragmentation) that have determined the spatial distribution of genomic variation in the Iberian endemic D. hispanicus, recently included in the Red List of threatened species.

4. To examine genetic structure and connectivity among populations, determine temporal changes in their effective population sizes and identify genomic signatures of selective processes in gregarious and solitary populations of the Moroccan locust (D. maroccanus), in order to better understand the demographic and evolutionary dynamics of this species of great economic interest.

STRUCTURE OF THE DOCTORAL THESIS

This Doctoral Thesis is divided into four chapters presenting a series of representative case studies that address the stated objectives using different approaches. I begin by exploring the current systematics of the genus Dociostaurus (Chapter 1) in order to clarify, among other aspects, the taxonomic status of two threatened species; the factors explaining genomic variation in their populations across their respective distribution ranges are then studied in detail (Chapters 2 and 3). Finally, the last chapter addresses connectivity, demography and genomic signatures of selection in a pest species of great economic importance, with the aim of better understanding its population and evolutionary dynamics (Chapter 4).

Chapter 1 reviews the systematics of the genus Dociostaurus, focusing on the species distributed in the western Mediterranean and using a molecular taxonomy approach; it is the first genetic study specifically devoted to the genus. The phylogeny of the studied taxa is reconstructed using a set of mitochondrial DNA markers (12S, 16S and COI), which shows that the currently accepted assignment of subgenera has important inconsistencies that should be studied in greater detail. The results also provide new information on the phylogenetic relationships among the studied species and an approximate dating of the splits among these taxa. The chapter likewise examines the phylogenetic relationships of certain cryptic species with disjunct distributions in the Palaearctic, namely D. crassiusculus vs. D. kraussi and D. hispanicus vs. D. brevicollis; D. crassiusculus and D. hispanicus are Iberian endemics, whereas D. kraussi and D. brevicollis inhabit eastern Europe and central Asia. The species delimitation carried out in this chapter showed that D. crassiusculus is a taxon with species rank, not a subspecies of D. kraussi, which resolves a taxonomic ambiguity that had persisted since the late 19th century.

Chapter 2 studies in depth the genetic structure and variation of D. crassiusculus, the taxon raised to species rank in the previous chapter. Genomic data (single nucleotide polymorphisms, SNPs), coalescent-based demographic models and a landscape genetics approach were used to identify the factors that have shaped the genetic structure of the populations of this endangered Iberian endemic. These analyses showed that the inferred genetic groups are delimited by the main river basins, revealing that the genetic structure of its few populations results from historical processes rather than from human-driven habitat fragmentation. This chapter also highlights the need to recognise the distinct genetic groups that make up the populations of this species as “evolutionarily significant units” or “designatable units” to be considered in future management plans aimed at protecting its populations.

Chapter 3 focuses on a detailed study of the factors that explain genomic variation in populations of the Iberian endemic D. hispanicus. First, genomic data (SNPs) were used to infer patterns of genetic diversity and structure in the species and to reconstruct the demographic profiles of its populations. These analyses showed marked genetic structure and high spatial variability in genetic diversity, resulting from the different demographic dynamics of its populations. Second, a series of spatiotemporally explicit models was tested; these consider different factors that may have shaped the genetic structure of the species' populations, such as distributional shifts driven by Pleistocene climatic oscillations, the presence of geographic barriers and human-driven habitat fragmentation. The results showed that the current distribution of the species is determined mainly by topographic barriers to dispersal (areas with steep slopes) and, to a lesser extent, by the current configuration of its habitats. This chapter exemplifies the need to integrate spatial and ecological information in a temporal context to gain detailed knowledge of the processes that determine genomic variation in taxa with highly fragmented populations at risk of extinction.

Finally, Chapter 4 also used genomic data (SNPs) to analyse the neutral and adaptive processes that determine the spatial distribution of genetic variability in populations of the Moroccan locust (D. maroccanus) at the western end of its distribution range (Iberian Peninsula and Canary Islands). The results show marked genetic structure between the populations of the Canary Islands and those of the Iberian Peninsula, whereas the populations sampled across the peninsula show very little genetic differentiation, possibly as a result of the high dispersal capacity of the species. Numerous loci showed signatures of divergent selection, suggesting that they could be involved in local adaptation in response to different environmental and ecological factors. The analyses in this chapter provide basic knowledge of the demographic and evolutionary dynamics of this taxon, which is of great economic interest because of the recurrent and damaging outbreaks it causes for agriculture in locust-prone areas throughout its distribution range around the Mediterranean.

METHODOLOGY

The methodological approaches on which this Doctoral Thesis is based are described below in the chronological order in which they were applied, from the sampling campaigns to the analysis of the genetic and genomic data used to prepare the scientific articles that make up the chapters (Figure 2). This is entirely original work, conceived in response to the need to address theoretical problems (species diversity and genetic variability) and applied problems (species conservation and management) that concern the understanding of the processes generating biological diversity in a broad sense and that can ultimately contribute to its conservation.

SAMPLING

Before carrying out the field sampling, I reviewed the available literature on the study taxa and designed the survey campaigns to cover, as efficiently as possible, their entire known distribution as well as other potentially suitable sites. Exhaustive sampling was carried out in May, June and July (depending on the phenology of each species) between 2015 and 2017. Grasshoppers were collected with an entomological net or by hand and preserved at -20°C in 2-5 mL microtubes with 96% ethanol until needed for subsequent analyses. All sampling was conducted with permission from the relevant authorities.

GENETIC DATA: MITOCHONDRIAL DNA

DNA from the selected specimens was extracted from femur tissue using DNA extraction and purification kits (NucleoSpin Tissue kits; Macherey-Nagel). Three mitochondrial gene fragments were amplified and sequenced: 12S rRNA (12S) and 16S rRNA (16S), both highly conserved and commonly used to resolve deep phylogenetic relationships, and cytochrome oxidase subunit I (COI), which, owing to its high mutation rate, allows entities to be identified at the species, subspecies and lineage levels. The mitochondrial sequences were aligned and edited mainly with SEQUENCHER (Gene Codes Corporation). The sequences were deposited in GenBank, the molecular database of the National Center for Biotechnology Information (NCBI), under accession numbers KX954639-KX954810 (Chapter 1).

GENOMIC DATA: SNPs

From the extracted DNA, a total of five genomic libraries were prepared following the double-digest RAD sequencing (ddRADseq) method described in Peterson et al. (2012) to obtain large-scale genomic data (SNPs, in our case). The protocol consists of a double digestion with the restriction enzymes MseI and EcoRI, followed by ligation of the digested products to Illumina-specific adapters carrying a unique 7-base-pair barcode that identifies each individual. Only DNA fragments between 475 and 580 base pairs long were selected for PCR amplification with a high-fidelity polymerase. The resulting product was sequenced on an Illumina HiSeq 2500 platform, which produced 151-base-pair reads. These data were processed bioinformatically with two programs commonly used for assembling and filtering RADseq data: STACKS (Hohenlohe et al., 2010; Catchen et al., 2011; Catchen et al., 2013) and PYRAD (Eaton, 2014). Because no reference genome is available for any species of the genus Dociostaurus, de novo assembly was used (Chapters 2, 3 and 4).
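
The digestion and size-selection steps can be illustrated with a short in silico sketch in Python. It cuts a sequence at EcoRI (G^AATTC) and MseI (T^TAA) sites and keeps only the fragments that are flanked by one site of each enzyme and fall within the 475-580 bp window. The function names are hypothetical, and measuring fragment length without adapters is a simplifying assumption; this is not the laboratory protocol, nor part of STACKS or PYRAD.

```python
import re

# Recognition motif and cut offset (bases from the motif start) per enzyme.
ENZYMES = {"EcoRI": ("GAATTC", 1), "MseI": ("TTAA", 1)}


def cut_sites(seq, enzymes=ENZYMES):
    """Return sorted (cut_position, enzyme_name) tuples for every site in seq."""
    sites = []
    for name, (motif, offset) in enzymes.items():
        # A lookahead finds overlapping occurrences of the motif as well.
        for match in re.finditer(f"(?={motif})", seq):
            sites.append((match.start() + offset, name))
    return sorted(sites)


def double_digest(seq, min_len=475, max_len=580):
    """Keep fragments bounded by two different enzymes within the size window."""
    sites = cut_sites(seq)
    kept = []
    for (start, left), (end, right) in zip(sites, sites[1:]):
        if left != right and min_len <= end - start <= max_len:
            kept.append(seq[start:end])
    return kept
```

Applied to an assembled genome or a long contig, `double_digest` gives a rough idea of how many loci a given enzyme pair and size window might recover, which is one reason such simulations are sometimes run before library preparation.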

PHYLOGENETIC ANALYSES

The nucleotide substitution models that best fit our mitochondrial DNA dataset were selected with JMODELTEST (Darriba et al., 2012), and phylogenetic relationships among taxa and divergence times were estimated with BEAST (Drummond et al., 2012) (Chapter 1). At the intraspecific level, phylogenetic relationships among the main genetic groups of D. crassiusculus were inferred from SNPs using coalescent-based methods implemented in SNAPP (Bryant et al., 2012) and SVDQUARTETS (Chifman & Kubatko, 2014) (Chapter 2).

SPECIES DELIMITATION

Species delimitation based on mitochondrial DNA sequences was carried out using two methods that do not require prior taxonomic information: the Generalized Mixed Yule Coalescent (GMYC; Fujisawa & Barraclough, 2013) and Bayesian Poisson Tree Processes (bPTP; Zhang et al., 2013). The two methods are complementary: GMYC models speciation in terms of neutral coalescent processes within a time-calibrated ultrametric tree, whereas bPTP models speciation directly from the number of substitutions (thus avoiding potential problems related to the calibration of phylogenetic trees) and provides Bayesian support for the delimitation (Chapter 1). In addition, mitochondrial genetic distances among the studied species were calculated with MEGA (Tamura et al., 2011).
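
As a minimal illustration of a pairwise genetic distance, the Python sketch below computes the uncorrected p-distance, i.e. the proportion of differing sites between two aligned sequences, ignoring positions with gaps or ambiguous bases. MEGA offers several distance models and the one used is not specified in this section, so the choice of p-distance and the function name are assumptions made for illustration only.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned DNA sequences.

    Positions where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)
```

For example, two aligned COI fragments that differ at 13 of 650 comparable sites would have a p-distance of 0.02.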

GENETIC VARIABILITY

Mitochondrial DNA haplotypes and their frequencies in the populations were identified with DNASP (Librado & Rozas, 2009) (Chapter 1). Estimates of genomic variability for the populations of each species were calculated with STACKS (Hohenlohe et al., 2010; Catchen et al., 2011; Catchen et al., 2013). These included the frequency of the most common allele (P), nucleotide diversity (π), observed (HO) and expected (HE) heterozygosity, and Wright's inbreeding coefficient (FIS) (Chapters 2, 3 and 4).
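
To make these summary statistics concrete, the following Python sketch computes P, HO, HE and FIS for a single biallelic SNP from genotypes coded as counts of the alternative allele (0, 1 or 2). It uses the textbook definitions (HE = 2pq, FIS = 1 - HO/HE) without small-sample corrections; the exact estimators implemented in STACKS may differ, nucleotide diversity (π) is not covered, and the function name is hypothetical.

```python
def locus_stats(genotypes):
    """Summary statistics for one biallelic SNP in one population.

    genotypes: list of 0/1/2 counts of the alternative allele; None = missing.
    """
    calls = [g for g in genotypes if g is not None]
    n = len(calls)
    if n == 0:
        raise ValueError("no genotyped individuals at this locus")
    q = sum(calls) / (2 * n)              # alternative-allele frequency
    p = 1 - q                             # reference-allele frequency
    ho = sum(1 for g in calls if g == 1) / n   # observed heterozygosity
    he = 2 * p * q                        # expected under Hardy-Weinberg
    fis = 1 - ho / he if he > 0 else float("nan")
    return {"P": max(p, q), "HO": ho, "HE": he, "FIS": fis}
```

Population-level values would then be obtained by averaging these per-locus statistics across all retained loci.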

ESTRUCTURA GENÉTICA POBLACIONAL

Genetic differentiation (FST) between pairs of populations was calculated from SNPs with ARLEQUIN (Excoffier & Lischer, 2010) (Chapters 2, 3 and 4). The spatial genetic structure of the populations of the species studied in Chapters 2-4 was examined by principal component analysis (PCA) and with STRUCTURE (Pritchard et al., 2000), which uses Bayesian inference to assign each individual probabilistically to a genetic group on the basis of its genotype.
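
A PCA of SNP genotypes can be sketched in a few lines of Python with NumPy. The example below centres each SNP and uses a singular value decomposition to obtain individual scores and the proportion of variance explained by each axis. It assumes a complete genotype matrix coded 0/1/2 with at least one variable SNP; the function name is hypothetical and the sketch does not reproduce the specific software or settings used in the thesis.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """PCA of an individuals x SNPs matrix of 0/1/2 codes (no missing data)."""
    G = np.asarray(genotypes, dtype=float)
    X = G - G.mean(axis=0)                       # centre each SNP column
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # individual coordinates
    explained = (S ** 2) / np.sum(S ** 2)             # variance proportions
    return scores, explained[:n_components]
```

Individuals from genetically distinct populations tend to separate along the first axes, which makes the scores a useful complement to model-based clustering.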

LANDSCAPE GENETICS

The conceptual framework of circuit theory implemented in CIRCUITSCAPE (McRae, 2006; McRae & Beier, 2007) was used to analyse the resistance to gene flow offered by several landscape components, including topography, lithology, river basin boundaries and habitat distribution, which made it possible to test alternative isolation-by-resistance (IBR) hypotheses (Chapter 2). In addition, the potential role of environmental dissimilarity between populations in their genetic differentiation was analysed to test the isolation-by-environment (IBE) hypothesis (Chapter 4). The relationship between genetic differentiation, on the one hand, and resistance and environmental distances, on the other, was analysed using multiple matrix regression with randomization (MMRR; Wang, 2013) (Chapters 2 and 4).
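
The idea behind MMRR can be sketched in Python with NumPy: the lower triangle of the genetic distance matrix is regressed on the standardised lower triangles of the predictor matrices, and significance is assessed by permuting the rows and columns of the response matrix together. This is a simplified illustration, not Wang's (2013) original implementation; function names are hypothetical, and only the overall fit (R²) is tested rather than each coefficient.

```python
import numpy as np


def lower_triangle(matrix):
    """Vectorise the strictly lower triangle of a square distance matrix."""
    rows, cols = np.tril_indices_from(matrix, k=-1)
    return matrix[rows, cols]


def mmrr(Y, Xs, n_perm=999, seed=0):
    """Regress distance matrix Y on predictor matrices Xs with a permutation test."""
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y, dtype=float)
    preds = [lower_triangle(np.asarray(X, dtype=float)) for X in Xs]
    # Intercept plus standardised predictors, so coefficients are comparable.
    design = np.column_stack(
        [np.ones(preds[0].size)] + [(p - p.mean()) / p.std() for p in preds]
    )

    def fit(response):
        y = lower_triangle(response)
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        return coef[1:], r2

    coef, r2 = fit(Y)
    n = Y.shape[0]
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        # Permute rows and columns jointly to respect the matrix structure.
        _, r2_perm = fit(Y[np.ix_(perm, perm)])
        if r2_perm >= r2:
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)
    return coef, r2, p_value
```

In a landscape genetics setting, `Y` would hold pairwise FST values and `Xs` the resistance and environmental distance matrices, so that the relative size of the standardised coefficients indicates which predictor is more strongly associated with genetic differentiation.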

ENVIRONMENTAL NICHE MODELLING

Environmental niche modelling for D. hispanicus was performed with the maximum entropy algorithm implemented in MAXENT (Phillips et al., 2006) and ENMEVAL (Muscarella et al., 2014). Occurrence data for the species and several layers of bioclimatic variables were used to reconstruct the potential distribution of the species both at present and in the past, specifically during the Last Glacial Maximum (LGM; approximately 21,000 years ago). These layers were used to generate spatiotemporally explicit hypotheses about how Pleistocene climatic changes may have affected the distribution and genetic variability of the species' populations (Chapter 3).

DEMOGRAPHIC INFERENCE

To understand the processes that may have determined the genetic structure and effective sizes of the populations, date their divergence times and infer patterns of gene flow among them, alternative demographic models were tested with FASTSIMCOAL2 (Excoffier & Foll, 2011) (Chapter 2). The past demographic history of the populations was also reconstructed with STAIRWAY PLOT (Liu & Fu, 2015), a method based on the site frequency spectrum (SFS) that infers changes in effective population size through time (Chapters 3 and 4).
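
The site frequency spectrum itself is simple to compute. The Python sketch below builds a folded SFS (counts of sites by minor-allele count) from a complete matrix of biallelic genotypes coded 0/1/2. It only illustrates the kind of input summarised for SFS-based methods; the function name is hypothetical, missing data are assumed to be absent, and the sketch does not reproduce the file formats expected by FASTSIMCOAL2 or STAIRWAY PLOT.

```python
def folded_sfs(genotype_matrix):
    """Folded site frequency spectrum from diploid genotypes.

    genotype_matrix: rows are individuals, columns are biallelic SNPs coded
    as 0/1/2 copies of the alternative allele (no missing data).
    Returns a list where entry i is the number of sites whose minor allele
    is observed i times in the sample.
    """
    n_chrom = 2 * len(genotype_matrix)
    sfs = [0] * (n_chrom // 2 + 1)
    for site in zip(*genotype_matrix):        # iterate over SNP columns
        alt = sum(site)
        minor = min(alt, n_chrom - alt)       # fold: count the rarer allele
        sfs[minor] += 1
    return sfs
```

An excess of rare variants relative to neutral expectations is typically interpreted as a signal of population expansion, while a deficit may point to a recent bottleneck.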

SPATIOTEMPORALLY EXPLICIT DEMOGRAPHIC MODELS

Spatial and ecological information was used to generate several spatiotemporally explicit scenarios representing alternative hypotheses about how landscape heterogeneity has affected the demography and the distribution of genetic variability of the study species. These scenarios were used to run demographic and genetic (coalescent) simulations with SPLATCHE2 (Ray et al., 2010) and, within an Approximate Bayesian Computation (ABC) framework, to compare the empirical genomic data with the data simulated under each scenario in order to determine which one best explains the spatial distribution of genetic variability in the species (Chapter 3).
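
The logic of ABC model choice can be illustrated with a toy rejection scheme in Python. Each scenario is represented by a function that simulates one summary statistic; the simulations closest to the observed value are retained, and the share of retained simulations coming from each scenario approximates its posterior probability. The scenario names and simulators in the usage note are purely hypothetical stand-ins for SPLATCHE2 runs, and real analyses use many summary statistics and more refined estimators.

```python
import random


def abc_model_choice(observed, simulators, n_sims=2000, keep=100, seed=1):
    """Approximate posterior model probabilities by rejection sampling.

    observed:   observed value of a single summary statistic.
    simulators: dict mapping scenario name -> function(rng) returning one
                simulated value of that statistic.
    """
    rng = random.Random(seed)
    distances = []
    for name, simulate in simulators.items():
        for _ in range(n_sims):
            distances.append((abs(simulate(rng) - observed), name))
    distances.sort(key=lambda item: item[0])          # closest simulations first
    accepted = [name for _, name in distances[:keep]]
    return {name: accepted.count(name) / keep for name in simulators}
```

For instance, with two hypothetical scenarios, `{"barriers": lambda r: r.gauss(0.10, 0.02), "no_barriers": lambda r: r.gauss(0.30, 0.02)}`, and an observed statistic of 0.10, nearly all accepted simulations would come from the first scenario.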

GENOME SCAN

ARLEQUIN (Excoffier & Lischer, 2010), BAYESCAN (Foll & Gaggiotti, 2008) and LFMM (Frichot & François, 2015) were used to detect genomic signatures of selection that could indicate local adaptation (Chapter 4).

Figure 2. Overview of the approaches and methods used to address the questions examined in the four chapters of this Doctoral Thesis.

REFERENCES

Avise, J.C. (2000) Phylogeography: The history and formation of species. 464 pp. Harvard University Press. Cambridge, USA.

Avise, J.C. (2005) Phylogenetic units and currencies above and below the species level. Phylogeny and Conservation. 76-100 pp. Purvis, Brooks & Gittleman (eds.). Cambridge University Press. Cambridge, UK.

Barredo, J.I., Caudullo, G. & Dosio, A. (2016) Mediterranean habitat loss under future climate conditions: Assessing impacts on the Natura 2000 protected area network. Applied Geography, 75, 83-92.

Bickford, D., Lohman, D.J., Sodhi, N.S. et al. (2007) Cryptic species as a window on diversity and conservation. Trends in Ecology & Evolution, 22, 148-155.

Blondel, J. & Aronson, J. (1999) Biology and wildlife of the Mediterranean region. 328 pp. Oxford University Press. Oxford, UK.

Bowen, B.W. (1999) Preserving genes, species, or ecosystems? Healing the fractured foundations of conservation policy. Molecular Ecology, 8, Suppl. 1, S5-S10.

Brooks, T.M., Mittermeier, R.A., da Fonseca, G.A. et al. (2006) Global biodiversity conservation priorities. Science, 313, 58-61.

Brown, J.L., Weber, J.J., Alvarado-Serrano, D.F. et al. (2016) Predicting the genetic consequences of future climate change: The power of coupling spatial demography, the coalescent, and historical landscape changes. American Journal of Botany, 103, 153-163.

Bryant, D., Bouckaert, R., Felsenstein, J. et al. (2012) Inferring species trees directly from biallelic genetic markers: bypassing gene trees in a full coalescent analysis. Molecular Biology and Evolution, 29, 1917-1932.

Carstens, B.C., Pelletier, T.A., Reid, N.M. & Salter, J.D. (2013) How to fail at species delimitation. Molecular Ecology, 22, 4369-4383.

Casacci, L.P., Barbero, F. & Balletto, E. (2013) The “Evolutionarily Significant Unit” concept and its applicability in biological conservation. Italian Journal of Zoology, 81, 182-193.

Catchen, J.M., Amores, A., Hohenlohe, P.A. et al. (2011) STACKS: building and genotyping loci de novo from short-read sequences. G3: Genes|Genomes|Genetics, 1, 171-182.

Catchen, J.M., Hohenlohe, P.A., Bassham, S. et al. (2013) STACKS: an analysis tool set for population genomics. Molecular Ecology, 22, 3124-3140.

Chifman, J. & Kubatko, L. (2014) Quartet inference from SNP data under the coalescent. Bioinformatics, 30, 3317-3324.

Cigliano, M.M., Braun, H., Eades, D.C. & Otte, D. (2019) Orthoptera Species File. Version 5.0/5.0. [WWW document]. URL http://orthoptera.speciesfile.org

Cordero, P.J., Llorente, V., Aguirre, M.P. & Ortego, J. (2010) Dociostaurus crassiusculus (Pantel, 1886), especie (Orthoptera: Acrididae) rara en la Península Ibérica con poblaciones locales en espacios singulares de Castilla-La Mancha (España). Boletín de la Sociedad Entomológica Aragonesa, 46, 461-465.

Cunningham, M. & Moritz, C. (1998) Genetic effects of forest fragmentation on a rainforest restricted lizard (Scincidae: Gnypetoscincus queenslandiae). Biological Conservation, 83, 19-30.

Darriba, D., Taboada, G.L., Doallo, R. & Posada, D. (2012) JMODELTEST2: more models, new heuristics and parallel computing. Nature Methods, 9, 772.

de Queiroz, K. (2007) Species concepts and species delimitation. Systematic Biology, 56, 879-886.

Drummond, A.J., Suchard, M.A., Xie, D. & Rambaut, A. (2012) Bayesian phylogenetics with BEAUti and the BEAST 1.7. Molecular Biology and Evolution, 29, 1969-1973.

Eaton, D.A. (2014) PYRAD: assembly of de novo RADseq loci for phylogenetic analyses. Bioinformatics, 30, 1844-1849.

Espindola, A., Carstens, B.C. & Álvarez, N. (2014) Comparative phylogeography of mutualists and the effect of the host on the genetic structure of its partners. Biological Journal of the Linnean Society, 113, 1021-1035.

Excoffier, L. & Foll, M. (2011) FASTSIMCOAL: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios. Bioinformatics, 27, 1332-1334.

Excoffier, L. & Lischer, H.E. (2010) ARLEQUIN suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Molecular Ecology Resources, 10, 564-567.

Faille, A., Andújar, C., Fadrique, F. & Ribera, I. (2014) Late Miocene origin of an Ibero-Maghrebian clade of ground beetles with multiple colonizations of the subterranean environment. Journal of Biogeography, 41, 1979-1990.

Feulner, P.G., Kirschbaum, F., Schugardt, C. et al. (2006) Electrophysiological and molecular genetic evidence for sympatrically occuring cryptic species in African weakly electric fishes (Teleostei: Mormyridae: Campylomormyrus). Molecular Phylogenetics and Evolution, 39, 198-208.

Fieber, F.X. (1853) Synopsis der europäischen Orthoptera mit besonderer Rücksicht auf die in Böhmen vorkommenden Arten als Auszug aus dem zum Drucke vorliegenden Werke “Die europäischen Orthoptera”. Lotos, 3, 118-119.

Fišer, C., Robinson, C.T. & Malard, F. (2018) Cryptic species as a window into the paradigm shift of the species concept. Molecular Ecology, 27, 613-635.

Foll, M. & Gaggiotti, O. (2008) A genome-scan method to identify selected loci appropriate for both dominant and codominant markers: A Bayesian perspective. Genetics, 180, 977-993.

Fraser, D.J. & Bernatchez, L. (2001) Adaptive evolutionary conservation. Molecular Ecology, 10, 2741-2752.

Frichot, E. & François, O. (2015) LEA: An R package for landscape and ecological association studies. Methods in Ecology and Evolution, 6, 925-929.

Frichot, E., Schoville, S.D., Bouchard, G. & François, O. (2013) Testing for associations between loci and environmental gradients using Latent Factor Mixed Models. Molecular Biology and Evolution, 30, 1687-1699.

Fujisawa, T. & Barraclough, T.G. (2013) Delimiting species using single-locus data and the Generalized Mixed Yule Coalescent approach: a revised method and evaluation on simulated data sets. Systematic Biology, 62, 707-724.

García, M.D., Larrosa, E., Clemente, M.E. & Presa, J.J. (2005) Contribution to the knowledge of genus Dociostaurus Fieber, 1853 in the Iberian Peninsula, with special reference to its sound production (Orthoptera: ). Anales de Biología, 27, 155-189.

García de la Vega, C. (1980) Algunos datos morfobiométricos sobre poblaciones de Dociostaurus maroccanus Thb. observadas durante el año 1980 en la comarca de la Serena. Boletín de Sanidad Vegetal. Plagas, 6, 49-55.

García-Navas, V., Noguerales, V., Cordero, P.J. & Ortego, J. (2017) Phenotypic disparity in Iberian short-horned grasshoppers (Acrididae): the role of ecology and phylogeny. BMC Evolutionary Biology, 17, 109.

Giorgi, F. & Lionello, P. (2008) Climate change projections for the Mediterranean region. Global and Planetary Change, 63, 90-104.

Green, D.M. (2005) Designatable units for status assessment of endangered species - Unidades designatables para la evaluación del estatus de especies en peligro. Conservation Biology, 19, 1813-1820.

Grundt, H.H., Kjolner, S., Borgen, L. et al. (2006) High biological species diversity in the arctic flora. Proceedings of the National Academy of Sciences of the United States of America, 103, 972-975.

He, Q., Edwards, D.L. & Knowles, L.L. (2013) Integrative testing of how environments from the past to the present shape genetic structure across landscapes. Evolution, 67, 3386-3402.

Hebert, P.D.N., Penton, E.H., Burns, J.M. et al. (2004) Ten species in one: DNA barcoding reveals cryptic species in the neotropical skipper butterfly Astraptes fulgerator. Proceedings of the National Academy of Sciences of the United States of America, 101, 14812-14817.

Hewitt, G. (2000) The genetic legacy of the Quaternary ice ages. Nature, 405, 907-913.

Hewitt, G.M. (2004) The structure of biodiversity: insights from molecular phylogeography. Frontiers in Zoology, 1, 1-16.

Hochkirch, A., Nieto, A., García-Criado, M. et al. (2016) European red list of grasshoppers, crickets and bush-crickets. 94 pp. Publications Office of the European Union. Luxembourg.

Hohenlohe, P.A., Bassham, S., Etter, P.D. et al. (2010) Population genomics of parallel adaptation in threespine stickleback using sequenced RAD tags. PLOS Genetics, 6, e1000862.

Keller, D., van Strien, M.J., Herrmann, M. et al. (2013) Is functional connectivity in common grasshopper species affected by fragmentation in an agricultural landscape? Agriculture, Ecosystems & Environment, 175, 39-46.

Landguth, E.L., Cushman, S.A., Schwartz, M.K. et al. (2010) Quantifying the lag time to detect barriers in landscape genetics. Molecular Ecology, 19, 4179-4191.

Latchininsky, A.V. & Launois-Luong, M.H. (1992) Le criquet marocain, Dociostaurus maroccanus (Thunberg, 1815) dans la partie orientale de son aire de distribution. Etude monographique à L'ex-URSS et aux pays proches. 270 pp. Cirad-Gerdat-Prifas. Montpellier, France.

Latchininsky, A.V. (1998) Moroccan locust Dociostaurus maroccanus (Thunberg, 1815): A faunistic rarity or an important economic pest? Journal of Conservation, 2, 167-178.

Latchininsky, A.V. (2013) Locusts and remote sensing: a review. Journal of Applied Remote Sensing, 7, 1-19.

Librado, P. & Rozas, J. (2009) DNASP v5: A software for comprehensive analysis of DNA polymorphism data. Bioinformatics, 25, 1451-1452.

Liu, X. & Fu, Y.-X. (2015) Exploring population size changes using SNP frequency spectra. Nature Genetics, 47, 555-559.

Manel, S. & Holderegger, R. (2013) Ten years of landscape genetics. Trends in Ecology & Evolution, 28, 614-621.

Manel, S., Schwartz, M.K., Luikart, G. & Taberlet, P. (2003) Landscape genetics: combining landscape ecology and population genetics. Trends in Ecology & Evolution, 18, 189-197.

McRae, B.H. (2006) Isolation by resistance. Evolution, 60, 1551-1561.

McRae, B.H. & Beier, P. (2007) Circuit theory predicts gene flow in plant and animal populations. Proceedings of the National Academy of Sciences of the United States of America, 104, 19885-19890.

Médail, F. & Quézel, P. (1999) Biodiversity hotspots in the Mediterranean Basin: setting global conservation priorities. Conservation Biology, 13, 1510-1513.

Mee, J.A., Bernatchez, L., Reist, J.D. et al. (2015) Identifying Designatable Units for intraspecific conservation prioritization: a hierarchical approach applied to the lake whitefish species complex (Coregonus spp.). Evolutionary Applications, 8, 423-441.

Moritz, C. (1994) Defining 'Evolutionarily Significant Units' for conservation. Trends in Ecology & Evolution, 9, 373-375.

Moritz, C. (2002) Strategies to protect biological diversity and the evolutionary processes that sustain it. Systematic Biology, 51, 238-254.

Muscarella, R., Galante, P.J., Soley-Guardia, M. et al. (2014) ENMEVAL: An R package for conducting spatially independent evaluations and estimating optimal model complexity for MAXENT ecological niche models. Methods in Ecology and Evolution, 5, 1198-1205.

Myers, N., Mittermeier, R.A., Mittermeier, C.G. et al. (2000) Biodiversity hotspots for conservation priorities. Nature, 403, 853-858.

New, T.R. & Samways, M.J. (2014) Insect conservation in the southern temperate zones: an overview. Austral Entomology, 53, 26-31.

Ortego, J., Aguirre, M.P. & Cordero, P.J. (2010) Population genetics of Mioscirtus wagneri, a grasshopper showing a highly fragmented distribution. Molecular Ecology, 19, 472-483.

Ortego, J., Aguirre, M.P., Noguerales, V. & Cordero, P.J. (2015) Consequences of extensive habitat fragmentation in landscape-level patterns of genetic diversity and structure in the Mediterranean esparto grasshopper. Evolutionary Applications, 8, 621-632.

Padial, J.M., Miralles, A., de la Riva, I. & Vences, M. (2010) The integrative future of . Frontiers in Zoology, 7, 1-14.

Pantel, S.J. (1886) Contribution a L’Orthoptérologie de L’Espagne Centrale. Anales de la Sociedad Española de Historia Natural, 15, 237-241.

Peterson, B.K., Weber, J.N., Kay, E.H. et al. (2012) Double digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species. PLOS ONE, 7, e37135.

Phillips, S.J., Anderson, R.P. & Schapire, R.E. (2006) Maximum entropy modeling of species geographic distributions. Ecological Modelling, 190, 231-259.

Presa, J.J., García, M., Clemente, M. et al. (2016) Dociostaurus hispanicus: the IUCN Red List of Threatened Species 2016: e.T16084433A75088044. [WWW document]. URL https://www.iucnredlist.org

Pritchard, J.K., Stephens, M. & Donnelly, P. (2000) Inference of population structure using multilocus genotype data. Genetics, 155, 945-959.

Ray, N., Currat, M., Foll, M. & Excoffier, L. (2010) SPLATCHE2: a spatially explicit simulation framework for complex demography, genetic admixture and recombination. Bioinformatics, 26, 2993-2994.

Ribera, I. & Blasco-Zumeta, J. (1998) Biogeographical links between steppe in the Monegros region (Aragon, NE ), the eastern Mediterranean, and central Asia. Journal of Biogeography, 25, 969-986.

Ryder, O.A. (1986) Species conservation and systematics: The dilemma of the subspecies. Trends in Ecology & Evolution, 1, 9-10.

Sirin, D. & Mol, A. (2013) New species and new song record of the genus Dociostaurus Fieber, 1853 (Orthoptera, Acrididae, Gomphocerinae) from Southern Anatolia, Turkey. Zootaxa, 3683, 486-500.

Soltani, A.A. (1978) Preliminary synonymy and description of new species in the genus Dociostaurus Fieber, 1853 (Orthoptera: Acridoidea: Acrididae, Gomphocerinae) with a key to the species in the genus. Journal of Entomological Society of Iran, 2, 1-93.

Soulé, M.E. & Wilcox, B. (1980) Conservation Biology: an evolutionary-ecological perspective. 395 pp. Sinauer Associates Inc. Sunderland, USA.

Stebbins, G.L. (1950) Variation and evolution in plants. 643 pp. Columbia University Press. New York, USA.

Tamura, K., Peterson, D., Peterson, N. et al. (2011) MEGA5: Molecular Evolutionary Genetics Analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Molecular Biology and Evolution, 28, 2731-2739.

Taylor, R.W. (1983) Descriptive taxonomy: past, present and future. Australian systematic Entomology: a bicentenary perspective. 93–134 pp. Highley & Taylor (eds.) CSIRO. Melbourne, Australia.

IUCN (2018) The IUCN Red List of Threatened Species. Version 2018-2. [WWW document]. URL https://www.iucnredlist.org

Uvarov, B.P. (1921) A revision of the genus Locusta L. (Pachytylus, Fieb.), with a new theory as to the periodicity and migration of locusts. Bulletin of Entomological Research, 12, 135-163.

Vrijenhoek, R.C., Schutz, S.J., Gustafson, R.G. & Lutz, R.A. (1994) Cryptic species of deep-sea clams (Mollusca, Bivalvia, Vesicomyidae) in hydrothermal vent and cold-seep environments. Oceanographic Research Papers, 41, 1171-1189.

Wang, I.J. (2013) Examining the full effects of landscape heterogeneity on spatial genetic variation: a multiple matrix regression approach for quantifying geographic and ecological isolation. Evolution, 67, 3403-3411.

Waples, R.S., Jones, R.P.J., Beckman, B.R. & Swan, G.A. (1991) Status review for Snake River fall Chinook salmon. 73 pp. Department of Commerce, National Oceanic and Atmospheric Administration. National Marine Fisheries Service. USA.

Wiens, J.J. (2007) Species delimitation: new approaches for discovering diversity. Systematic Biology, 56, 875-878.

Zachos, F.E. (2018) (New) Species concepts, species delimitation and the inherent limitations of taxonomy. Journal of Genetics, 97, 811-815.

Zhang, J., Kapli, P., Pavlidis, P. & Stamatakis, A. (2013) A general species delimitation method with applications to phylogenetic placements. Bioinformatics, 29, 2869-2876.


CHAPTER 1

A review of cross-backed grasshoppers of the genus Dociostaurus Fieber (Orthoptera: Acrididae) from the Western Mediterranean: insights from phylogenetic analyses and DNA-based species delimitation

María José González-Serna, Joaquín Ortego* & Pedro J. Cordero*

Systematic Entomology (2018), 43: 136-146. DOI: 10.1111/syen.12258

*Equal senior co-authorship


Abstract

Phylogenetic analyses and species delimitation methods are powerful tools for understanding patterns of species diversity. Given the current biodiversity crisis, there is an urgent need to assess and delimit valid species, particularly endangered and morphologically cryptic taxa from vulnerable areas subjected to strong climate change and progressive human intervention, such as the Mediterranean region. In this study, we applied two DNA-based species delimitation methods and performed a Bayesian phylogenetic reconstruction using three mitochondrial gene fragments (12S, 16S and COI) to resolve several taxonomic uncertainties among species of cross-backed grasshoppers (genus Dociostaurus Fieber) from the western Mediterranean. Phylogenetic analyses demonstrate the polyphyletic character of the subgenera Dociostaurus, Kazakia Bey-Bienko and Stauronotulus Tarbinsky and, thus, the need to revise the currently accepted subgenera within the genus Dociostaurus. We propose the split of closely related taxa with allopatric distributions such as D. (S.) kraussi and D. (S.) crassiusculus, considering the latter a distinct species limited to the Iberian Peninsula and excluding the name crassiusculus from other forms of D. (S.) kraussi from Eastern Europe and Asia. Estimates of divergence times indicate that diversification of Dociostaurus probably happened during the Miocene-Pliocene (3-7 Ma), and the split of the studied pairs of sister taxa took place during the middle and late Pleistocene (1-2 Ma). This study highlights the need for more molecular studies on the genus and its species for a better understanding of their evolution, genetic variation and population dynamics in order to prioritize strategies for their adequate conservation and management.

Keywords: cryptic species, divergence times, DNA barcoding, genetic divergence, Gomphocerinae, mitochondrial DNA, phylogeny, speciation.


INTRODUCTION

Understanding the origin and diversity of the living world requires a revision of traditional taxonomic practices, and the advent of molecular tools has been a great complement in this respect (Tautz et al., 2003; Pons et al., 2006; Vogler & Monaghan, 2006). Phylogenetic analyses, coupled with molecular-based species delimitation methods, are nowadays considered fundamental to understanding current patterns of biological diversity (e.g. Fujisawa & Barraclough, 2013; Huang et al., 2013; Zhang et al., 2013b; Solis-Lemus et al., 2015; Yang, 2015). Mitochondrial DNA has proven very useful for phylogenetic inference and species delimitation due to its universally amplifiable loci, small genome size, fast rates of molecular evolution, low or absent sequence recombination, and evolutionarily conserved gene products (Pons et al., 2006; Zhang et al., 2013a; Amaral et al., 2016). Protein-coding genes are suitable for resolving phylogenetic relationships among related species, whereas the most conserved regions of ribosomal RNA genes are useful for establishing deep levels of divergence (Simon et al., 1994). In recent years, the use of DNA markers in taxonomic delimitation has steadily increased and has helped to unravel cryptic patterns of species diversity that could not be resolved by classical morphological studies (e.g. Allegrucci et al., 2009; Grzywacz et al., 2013; Bocek & Bocak, 2016). This is particularly relevant in groups of species inhabiting geographic regions severely impacted by abrupt climate change and human activities, as these processes may rapidly lead to environmental degradation and the stochastic decline of natural populations (Myers et al., 2000).

Although the Mediterranean region is one of the areas of the world most altered by humans throughout history (Blondel & Aronson, 1999; Ortego et al., 2015), it constitutes an important biodiversity hotspot (Médail & Quézel, 1999; Myers et al., 2000; Brooks et al., 2006). The main reason for the great species richness and endemism of this region is believed to be its historically high climatic stability in comparison with northern temperate areas (Blondel & Aronson, 1999; Hewitt, 2000). It is also widely accepted that the northern Mediterranean peninsulas have served both as glacial refugia and as important diversification hotspots (Hewitt, 1999; Petit et al., 2003). Accordingly, most European thermophilous taxa present deep patterns of phylogeographic divergence driven by their retraction into five main refugia during the Pleistocene glacial cycles: the Iberian Peninsula, the Apennine Peninsula, the Balkans, Anatolia and North Africa (Hewitt, 1999). The Mediterranean region is predicted to experience the highest proportion of biodiversity loss among all terrestrial biomes due to its particular sensitivity to a wide range of threats, including land use alterations, global climate change, and their negative interactions (Giorgi & Lionello, 2008; Klausmeyer & Shaw, 2009). For these reasons, understanding the biological diversity of the Mediterranean region is necessary in order to establish priorities for conservation and inform management practices aimed at preserving its unique diversity (Blondel & Aronson, 1999).

In this study, we adopt an integrative taxonomic approach by examining the phylogenetic relationships and taxonomic status of western Mediterranean Dociostaurini (Mistshenko, 1974), a tribe of Orthoptera comprising several species of either great conservation concern or important economic interest (e.g. Latchininsky, 1998; Latchininsky, 2013; Hochkirch et al., 2016). This tribe comprises eight genera, three of which (Dociostaurus, Notostaurus and Xerohippus) are represented in the Western Mediterranean region. Dociostaurus are grasshoppers with an X-shaped pattern on their pronotum, three well-developed transverse grooves and obliterated lateral ridges in the posterior half of the prozona, a convex vertex and closed foveolae with distinct sharp margins visible (Fieber, 1853). Species within this genus are mainly distributed in South Europe, North Africa, Angola, the Canary Islands, the Madeira Archipelago, Central Asia, and the Hawaiian Islands (Cigliano et al., 2017). They are among the most common grasshoppers living in desert and semi-desert landscapes of the Palearctic region (e.g. Sirin & Mol, 2013). The genus comprises 30 described species and three subgenera separated by some morphological traits: sixteen species in the subgenus Dociostaurus Fieber; six species in the subgenus Kazakia Bey-Bienko; seven species in the subgenus Stauronotulus Tarbinsky (Cigliano et al., 2017); and a new controversial species, Dociostaurus biskrensis Moussi & Petit, that has not yet been assigned to any subgenus (Moussi et al., 2014). The description of new species in Dociostaurus is relatively recent, as more than half of the taxa within the genus were described in the 20th century. However, most taxonomic efforts on the group have focused on the identification of diagnostic morphological traits and bioacoustic signals (e.g. Harz, 1975; Soltani, 1978; García et al., 2005), an approach with certain limitations when dealing with closely related and phenotypically similar species (e.g. sibling and/or cryptic species).

The use of DNA markers can help to define species boundaries and resolve several taxonomic ambiguities within Dociostaurus, which is of particular interest given that different species of this genus are of great conservation concern or constitute important agricultural pests. According to the International Union for Conservation of Nature (IUCN), there are several species of Dociostaurus included in the European Red List of Orthoptera (Hochkirch et al., 2016). One of them is Dociostaurus (Dociostaurus) minutus La Greca, a brachypterous narrow endemic species restricted to some coastal dunes in the south-east of Sicily (Massa, 2011; Massa et al., 2012). The extremely small distribution range of this species, together with the considerable degradation of coastal habitats, has motivated its inclusion in the European Red List of Orthoptera with the category “endangered” (Bushell, 2013; Hochkirch et al., 2016). The situation is very different for the Moroccan locust Dociostaurus (Dociostaurus) maroccanus (Thunberg), a pest species of many crops with considerable economic impacts, although currently scarce in many areas (reviewed in Latchininsky, 1998; Latchininsky, 2013). Among Mediterranean species, there are also a couple of interesting cases of sibling/cryptic species of conservation concern that present disjunctive distributions across vast areas between the Western Mediterranean, on the one side, and Eastern Europe and Central Asia, on the other: D. (Stauronotulus) crassiusculus crassiusculus (Pantel) versus D. (S.) kraussi (Ingenitskii), and D. (D.) hispanicus Bolívar versus D. (K.) brevicollis (Eversmann). Dociostaurus (S.) c. crassiusculus and D. (D.) hispanicus are Iberian endemics that have been recently assigned to the categories “endangered” and “near threatened”, respectively, due to the high fragmentation of their small populations (Cordero et al., 2010; Hochkirch et al., 2016). The taxonomic relationship of D. (S.) crassiusculus and D. (S.) kraussi is controversial and has been modified by different authors according to morphological criteria (e.g. Harz, 1975; Soltani, 1978; Hodjat, 2016). Resolving the taxonomic status of these two putative species is particularly interesting because they involve one endangered Iberian endemic and a relatively abundant species presenting a disjunctive distribution in Eastern Europe and Asia. The same controversy affects the pair D. (D.) hispanicus and D. (K.) brevicollis, which also have disjunctive distributions, the latter being a common species widely distributed in Eastern Europe (Cigliano et al., 2017). Taxonomic problems also involve common and widely distributed species like Dociostaurus (Kazakia) jagoi Soltani, with populations showing subtle morphological differences on both sides of the Mediterranean Sea: D. (K.) j. occidentalis in South Europe and D. (K.) j. jagoi in North Africa.

The systematics of Dociostaurus from the West arc of the Mediterranean region is reviewed. In particular, this study: (i) analyzes the phylogenetic relationships for most taxa of the genus; (ii) evaluates the validity of current supra-specific classification (i.e. genera/subgenera); and (iii) employs species-delimitation methods to resolve the taxonomic status of controversial sibling species with a disjunctive Palearctic distribution.

Figure 1 Geographical location of samples of different species/subspecies of the genus Dociostaurus used for phylogenetic analyses. Dociostaurus (S.) kraussi and D. (S.) c. nigrogeniculatus were retrieved from GenBank and their location is approximate. Notostaurus anatolicus was used as outgroup.


MATERIALS AND METHODS

TAXON SAMPLING

Between 2007 and 2015, samples from different species belonging to the tribe Dociostaurini (Mistshenko, 1974) were collected: seven species of the genus Dociostaurus and one species of the genus Notostaurus (Table S1; Fig. 1). For each specimen, collection date, locality, geographical coordinates and elevation were recorded. Fresh whole adult specimens were stored in 2,000 µL of 96% ethanol at -20°C until needed for DNA extraction. The identification of doubtful specimens was checked against the entomological collection of the National Museum of Natural History (MNCN) in Madrid. Nine specimens of D. (K.) brevicollis collected in different localities from Turkey and Russia and sequences of D. (S.) kraussi and D. (S.) crassiusculus nigrogeniculatus deposited in GenBank (accession numbers KR005944, KR014937 and KM816675) were used for genetic comparison with D. (D.) hispanicus and D. (S.) c. crassiusculus, their respective putative sibling species from the Western Mediterranean region.

DNA EXTRACTION, AMPLIFICATION AND SEQUENCING

NucleoSpin Tissue kits (Macherey-Nagel, Düren, Germany) were used to extract and purify total DNA (mitochondrial DNA + genomic DNA) from a hind leg of each specimen. Segments of three mitochondrial genes, 12S rRNA (12S), 16S rRNA (16S) and cytochrome oxidase subunit I (COI), were amplified for each sample by polymerase chain reaction (PCR) (Table S2). Previous studies in Orthoptera have demonstrated that these molecular markers are informative and useful for comparison within and among species (e.g. Ortego et al., 2009; Vedenina & Mugue, 2011; Çiplak et al., 2014). A nuclear gene fragment, the internal transcribed spacer 2 (ITS2), was also amplified but it could not be used for subsequent analyses due to the high frequency of indels and ambiguous peaks in most sequences.

PCR amplifications were performed in 15 µL reaction volumes including 1× reaction buffer, 2 mM MgCl2, 10 mM of each dNTP, 10 µM of each primer and 5 U/µL of Immolase DNA Polymerase (Bioline Reagents, UK). Amplifications were carried out on a Mastercycler epgradient S (Eppendorf, Hamburg, Germany) thermal cycler under the following program: 9 minutes of denaturation at 95°C followed by 40 cycles of 30 seconds at 94°C, 45 seconds at the annealing temperature (12S: 50°C; 16S: 60°C; COI: 48°C) and 45 seconds at 72°C, ending with a final elongation stage of 10 minutes at 72°C. PCR products were visualized on 2% agarose gels stained with Orange G (10 mM Tris-HCl pH 7.6, 0.15% Orange G, 60% glycerol, 60 mM EDTA). Amplified products were commercially purified and sequenced (Macrogen, South Korea).
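As a quick check on the cycling program described above, its nominal duration can be tallied as in the following sketch. The constant and function names are illustrative only, and ramp times between temperatures are ignored, so the real run would take somewhat longer.

```python
# Nominal duration of the PCR program described in the text.
# Assumption: ramp times between temperature steps are ignored.
INITIAL_DENATURATION_S = 9 * 60   # 9 min at 95 °C
CYCLES = 40
CYCLE_STEPS_S = (30, 45, 45)      # 94 °C, annealing temperature, 72 °C
FINAL_ELONGATION_S = 10 * 60      # 10 min at 72 °C

# Annealing temperature (°C) per marker, as reported in the text.
ANNEALING_C = {"12S": 50, "16S": 60, "COI": 48}


def total_program_seconds() -> int:
    """Sum the hold times of every step in the program."""
    return INITIAL_DENATURATION_S + CYCLES * sum(CYCLE_STEPS_S) + FINAL_ELONGATION_S


print(total_program_seconds() / 60, "min")
```

Because the three markers differ only in annealing temperature, the nominal duration is the same for each of them.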

BAYESIAN PHYLOGENETIC RECONSTRUCTION

All sequences were visually inspected, edited and trimmed to the same length to remove ambiguous ends using the software SEQUENCHER v.5.2.4 (Gene Codes Corporation, Ann Arbor, MI, USA). Sequences were submitted to GenBank with accession numbers KX954639-KX954810 (Table S1). The genetic dataset was complemented with three sequences of COI from GenBank: two of D. (S.) kraussi (accession numbers KR005944 and KR014937) and one of D. (S.) c. nigrogeniculatus (accession number KM816675), all of them obtained from specimens collected in Xinjiang (China). Sequences were aligned using CLUSTALW on the Web Server of Kyoto University Bioinformatics Center with a gap opening penalty of 10 and a gap extension penalty of 0.10 (e.g. Allegrucci et al., 2014). The number of haplotypes, haplotype diversity and the number of polymorphic sites were calculated in DNASP v.5 (Librado & Rozas, 2009). Neutrality tests (Fu and Li’s D and Tajima’s D) were performed as implemented in DNASP v.5. Genetic differentiation among sequences was estimated using Kimura 2-parameter genetic distances in MEGA v.5.0 (Tamura et al., 2011).
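For readers unfamiliar with the Kimura 2-parameter (K2P) distance, the following is a minimal sketch of the standard formula, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences. It is not the MEGA implementation; the function name and the choice to skip sites with gaps or ambiguous bases are assumptions made for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences.

    Assumption: sites with a gap or ambiguous base in either sequence
    are skipped (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        valid += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if valid == 0:
        raise ValueError("no comparable sites")
    p = transitions / valid
    q = transversions / valid
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```

For example, `k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")` involves a single transition in ten sites and returns a value slightly above the uncorrected proportion of 0.1, reflecting the correction for multiple substitutions.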

The three mitochondrial gene fragments (12S, 16S and COI) were treated as separate data partitions in phylogenetic analyses. JMODELTEST v.2.1.7 was used to find the best-fitting model of nucleotide evolution for each gene fragment in a hierarchical hypothesis testing framework based on the Bayesian Information Criterion (BIC) (Darriba et al., 2012). For phylogenetic analyses, the three mitochondrial gene fragments were concatenated in a data matrix of 1507 bp using MEGA v.5.0 (Tamura et al., 2011).

45

CAPÍTULO I

We inferred an ultrametric tree and estimated divergence times for mtDNA sequences using BEAST 1.8.3 (Drummond et al., 2012). Analyses in BEAST were performed both on the concatenated data from the three mitochondrial gene fragments and on the COI gene alone. For both datasets, different clock and demographic models were considered. Each analysis was run with two independent Markov chains for 100 million generations sampled every 10,000 generations (i.e. 10,000 retained genealogies). Posterior probabilities were calculated from post-burn-in trees. Neither fossil evidence nor suitable geological events are available to calibrate the molecular clock. Thus, to approximate absolute ages of divergence among Dociostaurus species, molecular clocks were calibrated using mutation rates reported in the literature for insects (mean ± S.D.; 16S = 0.0049±0.0008 substitutions/site/MY; COI = 0.0169±0.0019 substitutions/site/MY; Papadopoulou et al., 2010), and default values from BEAST were used for the 12S gene. Each run was inspected in TRACER v.1.6 (Rambaut et al., 2014) to check that model parameters had converged to stationarity and that Effective Sample Sizes (ESS) were always much higher than 200. Afterwards, the two replicate independent runs for each analysis were combined using LOGCOMBINER v.1.8.3 (Drummond et al., 2012). The first 10 million generations were discarded as burn-in (10% of the MCMC). The best-fitting clock and demographic model were determined using Akaike’s information criterion through Markov chain Monte Carlo (AICM; Baele et al., 2012) with 100 bootstraps as implemented in TRACER 1.6. Finally, TREEANNOTATOR v.1.8.3 (Drummond et al., 2012) and FIGTREE v.1.4.2 (Rambaut, 2014) were used to draw Bayesian consensus trees and obtain 95% highest posterior density (HPD) intervals.
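The sampling scheme described above implies a fixed number of posterior samples after burn-in and combination of runs. The following sketch makes that bookkeeping explicit; the function name is hypothetical, and the count ignores the initial state of the chain, which some programs also write to the log.

```python
def retained_samples(chain_length: int, sample_every: int,
                     burnin_percent: int, runs: int = 1) -> int:
    """Number of posterior samples kept after burn-in across all runs.

    Assumption: the initial state (generation 0) is not counted.
    """
    samples_per_run = chain_length // sample_every
    burnin = samples_per_run * burnin_percent // 100
    return runs * (samples_per_run - burnin)


# Settings reported in the text: 100 million generations, sampled every
# 10,000, 10% burn-in, two independent runs combined.
print(retained_samples(100_000_000, 10_000, 10, runs=2))
```

With these settings each run contributes 10,000 samples, of which 1,000 are discarded, so the combined posterior would contain 18,000 samples.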

DNA-BASED SPECIES DELIMITATION

We used two independent species delimitation methods that do not require a priori taxonomic information: the General Mixed Yule Coalescent (GMYC) model (Fujisawa & Barraclough, 2013) and the Bayesian Poisson Tree Processes model (bPTP) (Zhang et al., 2013b). The GMYC model requires an ultrametric tree as input, so the Bayesian tree generated by BEAST as described above was used, both with the outgroup and after its removal, to provide the most robust diversity estimates (Montagna et al., 2016). The Bayesian tree generated by BEAST was converted into a Newick file with FIGTREE and used to run GMYC with the single-threshold method on the species delimitation web server (http://species.h-its.org/gmyc/).

In the bPTP species delimitation model, branch lengths represent the number of substitutions, not time, which eliminates the problems associated with requiring a calibrated tree when a priori information on divergence time is not available (Zhang et al., 2013b). bPTP tends to overestimate the number of species when using multiple sequences per population (Zhang et al., 2013b). Thus, the number of sequences in the data matrix was reduced to one individual per population (n = 50) for this species delimitation analysis. RAXML v.8.2.9 (Stamatakis, 2014) was used to build a non-ultrametric phylogenetic tree in the CIPRES Science Gateway (Miller et al., 2010). The output from RAXML, run with the RAXML-HPC BLACK BOX tool, was used as input for bPTP analyses on the species delimitation web server (http://species.h-its.org/ptp/), specifying the outgroup and considering 100,000 MCMC generations, a thinning value of 100, and a burn-in of 10%.
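The reduction to one individual per population can be sketched as below. The text does not state how the retained individual was chosen, so keeping the first record encountered for each population is purely an assumption, and the record layout and identifiers are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Record = Tuple[str, str, str]  # (population, sequence_id, sequence)


def one_per_population(records: Iterable[Record]) -> Dict[str, Tuple[str, str]]:
    """Keep a single (sequence_id, sequence) pair per population.

    Assumption: the first record seen for each population is retained.
    """
    kept: Dict[str, Tuple[str, str]] = {}
    for population, seq_id, sequence in records:
        kept.setdefault(population, (seq_id, sequence))
    return kept


# Toy input with made-up identifiers, for illustration only.
example = [
    ("pop_A", "ind_1", "ACGT"),
    ("pop_A", "ind_2", "ACGA"),
    ("pop_B", "ind_3", "ACGG"),
]
print(one_per_population(example))
```

Any other selection rule (for instance, the longest sequence per population) could be substituted without changing the overall approach.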

RESULTS

mtDNA SEQUENCE DATA AND POLYMORPHISM

Fifty-eight individual sequences of COI, 16S and 12S gene fragments were obtained, and three more COI sequences were retrieved from GenBank (KR005944, KR014937 and KM816675).

No sequence presented nucleotide double peaks that could indicate the presence of nuclear mitochondrial DNA sequences (NUMTs). In the case of COI, internal stop codons that could suggest the amplification of pseudogenes were also absent. Results of the genetic variability analyses obtained from the three different mtDNA genes used (12S, 16S and COI), calculated without considering gaps and missing data, are shown in Table 1.


In particular, COI showed the highest nucleotide and haplotype diversity values (Table 1). For the concatenated dataset of mitochondrial DNA fragments, Fu and Li’s D statistic was 2.21 and significantly higher than zero (P < 0.02). However, Tajima’s test of selective neutrality was not significant (P > 0.1). The average number of nucleotide differences for the concatenated dataset was 23.33 between sequences of D. (S.) crassiusculus and D. (S.) kraussi and 49.67 for the comparison involving D. (D.) hispanicus and D. (K.) brevicollis. Kimura 2-parameter genetic distances among species for COI ranged from 4.0% to 16.0%. Interspecific genetic distances for COI between pairs of closely related species were 4.0% for D. (S.) crassiusculus and D. (S.) kraussi, and 7.6% for D. (D.) hispanicus and D. (K.) brevicollis. Between the two putative subspecies of D. (S.) crassiusculus (D. (S.) c. crassiusculus and D. (S.) c. nigrogeniculatus), the pairwise genetic distance was 2.8%.

Table 1 Descriptive statistics for the three mtDNA genes used in this study (12S, 16S and COI): number of analyzed individuals (N); number of haplotypes (H); number of polymorphic sites (S); haplotype diversity (Hd); nucleotide diversity (π); and Theta per sequence from S (Theta-W).

            12S       16S       COI
N           58        56        61
H           21        22        39
S           51        57        177
Hd          0.940     0.949     0.978
π           0.032     0.024     0.100
Theta-W     11.017    12.409    37.822
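The Theta-W row of Table 1 can be related to the N and S rows through Watterson's estimator per sequence, theta_W = S / a_n with a_n = 1 + 1/2 + ... + 1/(n - 1). The sketch below is not the DNASP implementation and the names are illustrative; applied to the N and S values of Table 1 it should reproduce the reported Theta-W values to about three decimal places.

```python
def watterson_theta(n_sequences: int, segregating_sites: int) -> float:
    """Watterson's theta per sequence: S divided by the harmonic sum a_n."""
    if n_sequences < 2:
        raise ValueError("at least two sequences are required")
    a_n = sum(1.0 / i for i in range(1, n_sequences))
    return segregating_sites / a_n


# (N, S) pairs taken from Table 1.
TABLE_1 = {"12S": (58, 51), "16S": (56, 57), "COI": (61, 177)}

for gene, (n, s) in TABLE_1.items():
    print(gene, round(watterson_theta(n, s), 3))
```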


BAYESIAN PHYLOGENETIC RECONSTRUCTION

For phylogenetic analyses in BEAST, the Hasegawa-Kishino-Yano model with invariable sites (HKY+I) was used for 12S and 16S, and the General Time-Reversible nucleotide substitution model with gamma-distributed rate heterogeneity (GTR+G) for COI. The clock and demographic model best fitting (i.e. with the lowest AICM values) the concatenated dataset (12S, 16S and COI) was a “strict” molecular clock model with coalescent exponential growth.

All BEAST runs converged and ESS values obtained were always above 200. Bayesian and maximum likelihood phylogenetic analyses produced a similar consensus topology and, in general, most clades were well supported by both Bayesian posterior probabilities and bootstrap values (Fig. 2).

The obtained phylogenetic tree grouped most species/subspecies in good agreement with the traditional classification based on phenotypic data, but the analyses also revealed that the current taxonomic nomenclature within the genus Dociostaurus presents certain incongruences that need to be revised (Fig. 2): (i) phylogenetic analyses indicated that species within the genus Dociostaurus from the Western Mediterranean constitute a polyphyletic group; (ii) D. (S.) dantini Bolívar, 1914 from Morocco was the most distant taxon and grouped with the outgroup (genus Notostaurus); (iii) the two species from the subgenus Stauronotulus, the Iberian D. (S.) c. crassiusculus and the Asian D. (S.) kraussi, grouped together as sister species in a monophyletic clade that was quite distant from the rest of the species; (iv) in this clade, D. (S.) c. nigrogeniculatus grouped with D. (S.) kraussi (sequences from GenBank) but not with D. (S.) c. crassiusculus, contrary to expectations from the current taxonomic classification; (v) the Iberian D. (D.) hispanicus clustered as the sister taxon of D. (K.) brevicollis specimens from Russia and Turkey; (vi) the two subspecies of D. (K.) jagoi, D. (K.) j. occidentalis corresponding to specimens from southern Europe and D. (K.) j. jagoi from northern Africa, grouped as sister taxa; (vii) finally, the analyses showed that the subgenera Kazakia, Dociostaurus and Stauronotulus are polyphyletic, indicating that current hypotheses of subgeneric delimitation need to be revised (Table 2; Fig. 2).


Table 2 Summary of the different taxonomic classifications and synonyms for D. crassiusculus and D. kraussi. (S.) is the abbreviation for subgenus Stauronotulus and (D.) for subgenus Dociostaurus. Each row corresponds to a single taxon and its synonymous names are indicated in different columns. References: 1, Pantel (1886); 2, Ingenitskii (1897); 3, Tarbinsky (1928); 4, Bey-Bienko (1933); 5, Bey-Bienko & Mistshenko (1951); 6, Mistshenko (1974); 7, Soltani (1978); 8, Hodjat (2016); 9, Cigliano et al. (2017); 10, this study.

Past taxonomic classification (1-6) | | Soltani taxonomic classification (7) | | Current taxonomic classification (8-9) | | Proposed taxonomic classification (10) |
Species | Subspecies | Species | Subspecies | Species | Subspecies | Species | Subspecies
Dociostaurus (=Stauronotus) crassiusculus (1) (2); Dociostaurus (S.) crassiusculus (6) | - | Dociostaurus (D.) crassiusculus | D. (D.) c. crassiusculus | Dociostaurus (S.) crassiusculus | D. (S.) c. crassiusculus (8, 9) | Dociostaurus crassiusculus | -
Dociostaurus (=Stauronotus) kraussi (2) | D. k. kraussi (3, 4); D. (S.) k. kraussi (5, 6) | Dociostaurus (D.) crassiusculus | D. (D.) c. kraussi | Dociostaurus (S.) kraussi (8, 9) | - | Dociostaurus kraussi | D. k. kraussi
Dociostaurus (=Stauronotus) kraussi (2) | D. k. nigrogeniculatus (3, 4); D. (S.) k. nigrogeniculatus (5, 6) | Dociostaurus (D.) crassiusculus | D. (D.) c. nigrogeniculatus | Dociostaurus (S.) crassiusculus | D. (S.) c. nigrogeniculatus (8, 9) | Dociostaurus kraussi | D. k. nigrogeniculatus
Dociostaurus (=Stauronotus) kraussi (2) | D. k. aurantipes (4); D. (S.) k. aurantipes (5, 6) | Dociostaurus (D.) crassiusculus | D. (D.) c. kraussi | Dociostaurus (S.) crassiusculus | D. (S.) c. aurantipes (9) | Dociostaurus kraussi | D. k. aurantipes
Dociostaurus (=Stauronotus) kraussi (2) | D. (S.) k. claripes (5, 6) | Dociostaurus (D.) crassiusculus | D. (D.) c. kraussi | Dociostaurus (S.) kraussi (8, 9) | - | Dociostaurus kraussi | D. k. kraussi
Dociostaurus (=Stauronotus) kraussi (2) | D. (S.) k. ornatus (5, 6) | Dociostaurus (D.) crassiusculus | D. (D.) c. kraussi | Dociostaurus (S.) kraussi (8, 9) | - | Dociostaurus kraussi | D. k. kraussi


Figure 2 Phylogenetic reconstruction of the genus Dociostaurus using Bayesian analyses in BEAST 1.8.3 and maximum likelihood analyses in RAXML. Numbers in nodes indicate posterior probabilities from BEAST analyses performed on concatenated data for three mtDNA genes (12S, 16S and COI), posterior probabilities from BEAST analyses based only on the COI gene, and bootstrap support values (over 50%) from RAXML analyses based on the three mtDNA genes. Species delimited using the GMYC model are indicated with small red boxes and posterior delimitation probabilities from bPTP models are shown with red numbers. Notostaurus anatolicus was used as outgroup (shaded in grey). Sequences of Dociostaurus (S.) kraussi and D. (S.) c. nigrogeniculatus (only available for the COI gene) were retrieved from GenBank (striped in grey). Species names follow Cigliano et al. (2017). *: 1.00 / 1.00 / 85

Divergence time estimates based on the three mitochondrial gene fragments (12S, 16S and COI) indicated that the split into major clades occurred during the Miocene (~6.36 Ma; HPD: 4.82-8.16). Sister taxa such as D. (S.) c. crassiusculus - D. (S.) kraussi and D. (D.) hispanicus - D. (K.) brevicollis split around 1.01 Ma (HPD: 0.6-1.52) and 1.88 Ma (HPD: 1.30-2.56), respectively. The subspecies D. (S.) c. nigrogeniculatus diverged from D. (S.) kraussi 0.35 Ma (HPD: 0.13-0.63), more recently than the well-established subspecies D. (K.) j. jagoi and D. (K.) j. occidentalis (0.71 Ma; HPD: 0.46-1.03) (Fig. 3).
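As a rough plausibility check on these dates, a pairwise genetic distance can be converted into a naive split time under a strict clock as t = d / (2r). The sketch below assumes that the COI rate from Papadopoulou et al. (2010) applies per lineage, so that two lineages diverge at twice that rate; it is only a back-of-the-envelope calculation with a hypothetical function name, not the BEAST analysis, which uses all three partitions and a coalescent prior.

```python
def naive_split_time_ma(pairwise_distance: float, rate_per_lineage: float) -> float:
    """Naive strict-clock split time (in million years).

    Assumption: the rate is per lineage, so a pair of lineages accumulates
    divergence at 2 * rate substitutions/site/MY.
    """
    if rate_per_lineage <= 0:
        raise ValueError("rate must be positive")
    return pairwise_distance / (2.0 * rate_per_lineage)


COI_RATE = 0.0169  # substitutions/site/MY, as reported in the text

# COI Kimura 2-parameter distances reported in the text for the two sibling pairs.
for pair, distance in {"crassiusculus-kraussi": 0.040,
                       "hispanicus-brevicollis": 0.076}.items():
    print(pair, round(naive_split_time_ma(distance, COI_RATE), 2), "Ma")
```

Under these assumptions the two pairs give roughly 1.2 and 2.2 Ma, somewhat older than but of the same order as the BEAST means of 1.01 and 1.88 Ma.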


Figure 3 Phylogenetic tree showing the relationship between species of the genus Dociostaurus from the Western Mediterranean. Analyses were performed in BEAST 1.8.3 using sequence information from three mtDNA genes (12S, 16S and COI) and considering a strict clock and a coalescent exponential growth model. Mean ages in million years (Ma) of each node are given and horizontal shadow bars indicate the 95% highest posterior density (HPD) intervals. Notostaurus anatolicus was used as outgroup (shaded in grey). Sequences of Dociostaurus (S.) kraussi and D. (S.) c. nigrogeniculatus (only available for COI gene) were retrieved from GenBank (striped in grey). Species names follow Cigliano et al. (2017).

DNA-BASED SPECIES DELIMITATION

The GMYC model of species delimitation yielded 10 maximum likelihood (ML) entities, including the outgroup, with a confidence interval of 8-12. In agreement with the current number of recognized species, the results of GMYC showed that each entity corresponded to a described morphological taxon. Accordingly, sequences corresponding to different subspecies clustered together in a single species (Fig. 2). The bPTP model retrieved 12 species, including the outgroup, with a confidence interval of 10-21 estimated species. The most conservative number of species (10) yielded by the bPTP model matched exactly the taxa established by the current taxonomic classification. The bPTP model set boundaries of species delimitation at a lower taxonomic level than GMYC, identifying subspecies as species. However, the lowest posterior delimitation probabilities corresponded to those assigned to the two subspecies of D. (K.) jagoi and to the separation of D. (S.) kraussi and D. (S.) c. nigrogeniculatus (Fig. 2).

DISCUSSION

The GMYC and bPTP models yielded 10-12 taxonomic entities that correspond well with the current number of accepted species (Fig. 2). However, the general agreement between molecular and classical taxonomy at the species/subspecies level contrasts with a remarkable number of incongruences at higher taxonomic levels (genus/subgenus) that need to be thoroughly revised (Cigliano et al., 2017). These analyses indicate that increasing the number of molecular markers or adding different data in an attempt to improve species delimitation does not necessarily improve the performance of the analysis, although it could improve its statistical power (Blaimer, 2012). Phylogenetic analyses revealed that, at the subgeneric level, Dociostaurus, Kazakia and Stauronotulus do not cluster according to the expectations of the currently proposed taxonomic classification (Cigliano et al., 2017). Soltani (1978) completely reordered the genus Dociostaurus, synonymizing the subgenera Dociostaurus and Stauronotulus (see Table 2) and including the subgenus Notostaurus within the genus Dociostaurus. He also included the species D. (D.) minutus in the subgenus Kazakia, with D. (K.) brevicollis, D. (K.) jagoi and D. (K.) genei, instead of in its current subgenus Dociostaurus. This author also proposed to include D. (D.) hispanicus in the subgenus Kazakia because of its proximity to D. (K.) brevicollis, which is congruent with the results of this study (Figs. 2 and 3). The phylogenetic analyses placed D. (S.) dantini together with Notostaurus anatolicus (Krauss) (outgroup) (Figs. 2 and 3). This result is in agreement with the study by Soltani (1978) suggesting that the genus Notostaurus should be considered a subgenus within the genus Dociostaurus, a nomenclature also followed by other more recent studies (e.g. Sirin & Mol, 2013; Mol et al., 2014). Considering the different ways of assigning species, the results of the present study suggest a classification more akin to Soltani's proposal (Table 2). Given that taxonomic changes have occurred very often within the genus Dociostaurus on the basis of phenotypic traits that may vary among populations (e.g. Mistshenko, 1974; Soltani, 1978; Hodjat, 2016), we conclude that the status of subgenera within Dociostaurus is not satisfactory. A possible alternative is to use the results of the present phylogenetic study, which are in good agreement with the particular morphological traits previously described to separate the subgenus Dociostaurus (Bey-Bienko, 1933; Bey-Bienko & Mistshenko, 1951; Mistshenko, 1974). Thus, both morphological and genetic data support that D. (K.) brevicollis, D. (K.) genei and D. (K.) jagoi should be assigned to the subgenus Dociostaurus. Further phylogenetic analyses including a wider range of species from the tribe Dociostaurini would be of great help to re-evaluate these incongruences and determine the taxonomic value of the currently accepted supra-specific classification (Cigliano et al., 2017).

Intraspecific polymorphism, different kinds of morphological crypsis and hybridization are the main obstacles to establishing species boundaries (Evangelista et al., 2014). The analyses presented here resolve some taxonomic ambiguities involving closely related taxa with allopatric distributions. This is the case of the controversial taxonomic classification of D. (S.) kraussi, D. (S.) crassiusculus and their respective subspecies. A summary of the historical nomenclatural changes of these two species and our own proposal is shown in Table 2. The results from the present study indicate that the first historical taxonomic classification is probably more accurate than the present one. Based on our results, we propose that: (i) D. crassiusculus is a young species endemic to the Iberian Peninsula and should therefore be considered a conservation priority given the high fragmentation of its few and declining populations (Cordero et al., 2010); (ii) D. c. nigrogeniculatus, which the phylogenetic analyses indicate is much more closely related to D. kraussi than to D. crassiusculus, should be considered a subspecies of D. kraussi (e.g. Tarbinsky, 1928; Bey-Bienko, 1933; Bey-Bienko & Mistshenko, 1951; Mistshenko, 1974); (iii) similarly, D. c. aurantipes from the Republic of Tajikistan in Asia should probably be considered a subspecies of D. kraussi (e.g. Bey-Bienko & Mistshenko, 1951; Mistshenko, 1974) instead of a subspecies of D. crassiusculus until more analyses can be performed (Soltani, 1978; Hodjat, 2016; Cigliano et al., 2017) (Figs. 2 and 3; Table 2). The fact that D. kraussi has a wide distribution in Eastern Europe and Central and South Asia could explain the description of many different subspecies that may simply reflect phenotypic plasticity among populations (Mal’kotskii, 1963). Accordingly, most subspecies of D. kraussi, such as D. k. claripes and D. k. ornatus (Mistshenko, 1951), have recently been suggested to be synonyms of D. kraussi (Cigliano et al., 2017). Information on acoustic signals obtained in previous studies agrees with the phylogenetic relationships obtained here (Blondheim, 1990; Ragge & Reynolds, 1998; García et al., 1994, 2005; Savitsky, 2000; Vedenina & Mugue, 2011; Sirin & Mol, 2013). The calling song of D. c. crassiusculus (García et al., 2005) is almost identical to that of D. kraussi (Savitsky, 2000), which supports the close relationship between these species (Fig. 2). By contrast, the songs of D. brevicollis (Savitsky, 2000; Vedenina & Mugue, 2011; Sirin & Mol, 2013) and D. hispanicus (Ragge & Reynolds, 1998; García et al., 2005) differ more, which is concordant with the estimated earlier split between these two sister taxa (Fig. 2).

Phylogenetic analyses and molecular dating in BEAST revealed that the main diversification of the genus Dociostaurus probably took place during the late Miocene and the Pliocene (Fig. 3). This may be explained by the progressively drier and cooler climate during these epochs, which is known to have promoted the expansion of typical habitats for Dociostaurus such as steppes, savannahs and grasslands (Dowsett et al., 1994; Thompson & Fleming, 1996). Thus, the opening of new niche space in many areas and founder events after the colonization of suitable habitats might have favoured the diversification of the genus since the late Miocene (Song et al., 2015). The splits between sister taxa (1.88 Ma: D. hispanicus - D. brevicollis; 1.01 Ma: D. c. crassiusculus - D. kraussi) and between subspecies (0.71 Ma: D. j. jagoi - D. j. occidentalis; 0.35 Ma: D. kraussi - D. c. nigrogeniculatus) took place more recently, during the Pleistocene (Fig. 3). These findings are in agreement with previous studies suggesting that speciation in Orthoptera can happen in as little as 1-2 million years (e.g. Hemp et al., 2015). As shown for many other Mediterranean taxa, these recent or incipient speciation events have probably resulted from long-term population isolation mediated by geographical barriers (e.g. D. j. jagoi - D. j. occidentalis) or as a consequence of distributional shifts driven by Pleistocene climatic oscillations (e.g. D. hispanicus - D. brevicollis) (Knowles & Richards, 2005; Noguerales et al., 2016).

The phylogenetic relationships and timing of divergence of D. minutus and D. jagoi provide some insights into the biogeographical origin of the former. The Sicilian D. minutus is a species of great conservation concern that has been included in the European Red List of Orthoptera under the category “endangered” (Bushell, 2013; Hochkirch et al., 2016). The distribution of this brachypterous species is limited to a very restricted area in southern Sicily, and our results indicate that it diverged from D. jagoi around the early Pleistocene (~2.19 Ma; Fig. 3). The colonization of Sicily by the ancestor of the current D. minutus may have occurred through the Siculo-Tunisian Strait, an area that has been hypothesized to have facilitated the exchange of terrestrial fauna between the European and African continents when the lower sea level characterizing Pleistocene glacial periods resulted in the emergence of "stepping stone islands" and reduced the distance between the North African paleo-coast and the Sicilian landmass (Stöck et al., 2008). This is in accordance with the presence in Sicily of other grasshopper taxa also distributed in North Africa, such as Euchorthippus albolineatus (Lucas, 1849), Acinipe calabra (Costa, 1836), and Ocneridia nigropunctata (Lucas, 1849) (Massa, 2011; Massa et al., 2012). The ancestor of D. minutus would have evolved into a short-winged form due to isolation and genetic drift in Sicily, where D. jagoi is absent (Massa et al., 2012).
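As a rough sense of scale for this scenario, the present-day great-circle distance between the Sicilian sampling locality of D. minutus (Marza) and the Tunisian sampling locality of D. j. jagoi (Korba) can be computed from the coordinates listed in Table S1. The sketch below is illustrative only: the function name and the mean Earth radius are assumptions, and the result describes the distance between two sampling sites today, not the distance between paleo-coasts during glacial periods.

```python
import math

# Mean Earth radius in km (an assumed constant for this sketch).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Coordinates copied from Table S1.
MARZA_SICILY = (36.712590, 14.931700)   # D. minutus (Dmi1-Dmi4)
KORBA_TUNISIA = (36.520655, 10.838902)  # D. j. jagoi (Djj6, Djj7)

distance = haversine_km(*MARZA_SICILY, *KORBA_TUNISIA)
print(f"Marza - Korba: {distance:.0f} km")
```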

Overall, this study reveals the importance of considering genetic information to disentangle the intricate taxonomy of Dociostaurus and establishes background knowledge for future studies aimed at reconsidering the value of the currently accepted supra-specific classification into genera and subgenera (Cigliano et al., 2017). Future studies should consider employing nuclear markers for phylogenetic reconstructions (e.g. Song et al., 2015), a wider range of taxa, and sequencing more specimens from more localities for taxa with large distribution ranges (e.g. D. brevicollis, D. kraussi, and D. maroccanus). Detailed phylogeographic and population genetic analyses of some species of particular interest would also contribute greatly to increasing knowledge of their biogeography and evolutionary history, establishing evolutionary significant units for conservation (Vogler & DeSalle, 1994), identifying corridors for gene flow (e.g. Ortego et al., 2009; Ortego et al., 2015) and, in the case of pest species (i.e. D. maroccanus), gaining a better understanding of their demographic dynamics (e.g. Chapuis et al., 2008; Chapuis et al., 2009).

ACKNOWLEDGEMENTS

D. Chobanov, Shaun Winterton and three anonymous referees provided valuable comments that improved an earlier version of this manuscript. Michael G. Sergeev kindly provided us with samples of Dociostaurus (K.) brevicollis from Russia. We thank Milagros Coca-Abia and José Ramón Correas for providing us with information about the location of two populations of D. (S.) c. crassiusculus. We also acknowledge the unconditional support of Vicenta Llorente and Mercedes Paris during our visits to the entomological collections of the MNCN, and Víctor Noguerales and Pilar Aguirre for their help during sampling. The administrative authorities of each study area provided us with the corresponding permits for sampling. MJG was supported by a pre-doctoral scholarship from Junta de Comunidades de Castilla-La Mancha and European Social Fund. JO was supported by a Ramón y Cajal fellowship (RYC-2013-12501) and a research contract funded by Severo Ochoa Program (SEV-2012-0262). This work received financial support from research grants CGL2011-25053, CGL2014-54671-P, and CGL2016-80742-R (co-funded by the Dirección General de Investigación y Gestión del Plan Nacional I+D+i and European Social Fund); POII10-0197-0167 and PEII-2014-023-P (co-funded by Junta de Comunidades de Castilla-La Mancha and European Social Fund).

REFERENCES

Allegrucci, G., Massa, B., Trasatti, A. & Sbordoni, V. (2014) A taxonomic revision of western Eupholidoptera bush crickets (Orthoptera: Tettigoniidae): testing the discrimination power of DNA barcode. Systematic Entomology, 39, 7-23.

Allegrucci, G., Rampini, M., Gratton, P. et al. (2009) Testing phylogenetic hypotheses for reconstructing the evolutionary history of Dolichopoda cave crickets in the eastern Mediterranean. Journal of Biogeography, 36, 1785-1797.

Amaral, D.T., Mitani, Y., Ohmiya, Y. & Viviani, V. (2016) Organization and comparative analysis of the mitochondrial genomes of bioluminescent Elateroidea (Coleoptera: Polyphaga). Gene, 586, 254-262.

Baele, G., Lemey, P., Bedford, T. et al. (2012) Improving the accuracy of demographic and molecular clock model comparison while accommodating phylogenetic uncertainty. Molecular Biology and Evolution, 29, 2157-2167.

Bey-Bienko, G. (1933) Records and descriptions of some Orthoptera from U.S.S.R. Boletín de la Sociedad Española de Historia Natural, 33, 317-341.

Bey-Bienko, G.Y. & Mistshenko, L.L. (1951) Keys to the Fauna of the U.S.S.R. [1964 English translation no. 40]. Locusts and Grasshoppers of the U.S.S.R. and Adjacent Countries. Bey-Bienko & Mistshenko (Eds.); original Russian edition edited by E.N. Pavlovsky. 444 pp. Academy of Sciences. Moscow/Leningrad, Russia.

Blaimer, B.B. (2012) Untangling complex morphological variation: taxonomic revision of the subgenus Crematogaster (Oxygyne) in Madagascar, with insight into the evolution and biogeography of this enigmatic ant clade (Hymenoptera: Formicidae). Systematic Entomology, 37, 240-260.

Blondel, J. & Aronson, J. (1999) Biology and wildlife of the Mediterranean region. 328 pp. Oxford University Press. Oxford, UK.

Blondheim, S.A. (1990) Patterns of reproductive isolation between the sibling grasshopper species Dociostaurus curvicercus and D. jagoi jagoi (Orthoptera: Acrididae: Gomphocerinae). Transactions of the American Entomological Society, 116, 1-64.

Bocek, M. & Bocak, L. (2016) Species limits in polymorphic mimetic Eniclases net-winged beetles from New Guinean mountains (Coleoptera, Lycidae). ZooKeys, 593, 15-35.

Brooks, T.M., Mittermeier, R.A., da Fonseca, G.A. et al. (2006) Global biodiversity conservation priorities. Science, 313, 58-61.

Bushell, M. (2013) Dociostaurus minutus: the IUCN Red List of Threatened Species 2013: e.T16084624A44828840 [WWW document]. URL http://www.iucnredlist.org

Chapuis, M.P., Lecoq, M., Michalakis, Y. et al. (2008) Do outbreaks affect genetic population structure? A worldwide survey in Locusta migratoria, a pest plagued by microsatellite null alleles. Molecular Ecology, 17, 3640-3653.

Chapuis, M.P., Loiseau, A., Michalakis, Y. et al. (2009) Outbreaks, gene flow and effective population size in the migratory locust, Locusta migratoria: a regional-scale comparative survey. Molecular Ecology, 18, 792-800.

Cigliano, M.M., Braun, H., Eades, D.C. & Otte, D. (2017) Orthoptera Species File. Version 5.0/5.0. [WWW document]. URL http://orthoptera.speciesfile.org

Çiplak, B., Kaya, S., Boztepe, Z. & Gündüz, I. (2014) Mountainous genus Anterastes (Orthoptera, Tettigoniidae): autochthonous survival across several glacial ages via vertical range shifts. Zoologica Scripta, 44, 534-549.

Cordero, P.J., Llorente, V., Aguirre, M.P. & Ortego, J. (2010) Dociostaurus crassiusculus (Pantel, 1886), especie (Orthoptera: Acrididae) rara en la Península Ibérica con poblaciones locales en espacios singulares de Castilla-La Mancha (España). Boletín de la Sociedad Entomológica Aragonesa, 46, 461-465.

Darriba, D., Taboada, G.L., Doallo, R. & Posada, D. (2012) JMODELTEST2: more models, new heuristics and parallel computing. Nature Methods, 9, 772.

Dowsett, H., Thompson, R., Barron, J. et al. (1994) Joint investigations of the middle Pliocene climate I: PRISM paleoenvironmental reconstructions. Global and Planetary Change, 9, 169-195.

Drummond, A.J., Suchard, M.A., Xie, D. & Rambaut, A. (2012) Bayesian phylogenetics with BEAUti and the BEAST 1.7. Molecular Biology and Evolution, 29, 1969-1973.

Evangelista, D.A., Bourne, G. & Ware, J.L. (2014) Species richness estimates of Blattodea s.s. (Insecta: Dictyoptera) from northern Guyana vary depending upon methods of species delimitation. Systematic Entomology, 39, 150-158.

Fieber, F.X. (1853) Synopsis der europäischen Orthoptera mit besonderer Rücksicht auf die in Böhmen vorkommenden Arten als Auszug aus dem zum Drucke vorliegenden Werke “Die europäischen Orthoptera”. Lotos, 3, 118-119.

Fujisawa, T. & Barraclough, T.G. (2013) Delimiting species using single-locus data and the Generalized Mixed Yule Coalescent approach: a revised method and evaluation on simulated data sets. Systematic Biology, 62, 707-724.

García, M.D., Clemente, M.E. & Presa, J.J. (1994) The acoustic behaviour of Dociostaurus jagoi occidentalis Soltani, 1978 (Orthoptera, Acrididae). Zoologica Baetica, 5, 79-87.

García, M.D., Larrosa, E., Clemente, M.E. & Presa, J.J. (2005) Contribution to the knowledge of genus Dociostaurus Fieber, 1853 in the Iberian Peninsula, with special reference to its sound production (Orthoptera: Acridoidea). Anales de Biología, 27, 155-189.

Giorgi, F. & Lionello, P. (2008) Climate change projections for the Mediterranean region. Global and Planetary Change, 63, 90-104.

Grzywacz, B., Heller, K.G., Lehmann, A.W. et al. (2013) Chromosomal diversification in the flightless Western Mediterranean bushcricket genus Odontura (Orthoptera: Tettigoniidae: Phaneropterinae) inferred from molecular data. Journal of Zoological Systematics and Evolutionary Research, 52, 109-118.

Harz, K. (1975) Die Orthopteren Europas II. The Orthoptera of Europe II. Series Entomologica, 11. Dr.W. Junk B.V. Publishers (Ed.). 939 pp. The Hague, NL.

Hemp, C., Kehl, S., Schultz, O. et al. (2015) Climatic fluctuations and orogenesis as motors for speciation in East Africa: case study on Parepistaurus Karsch, 1896 (Orthoptera). Systematic Entomology, 40, 17-34.

Hewitt, G. (2000) The genetic legacy of the Quaternary ice ages. Nature, 405, 907-913.

Hewitt, G.M. (1999) Post-glacial re-colonization of European biota. Biological Journal of the Linnean Society, 68, 87-112.

Hochkirch, A., Nieto, A., García-Criado, M. et al. (2016) European red list of grasshoppers, crickets and bush-crickets. 94 pp. Publications Office of the European Union. Luxembourg.

Hodjat, S.H. (2016) A review of Iranian Dociostaurini (Orthoptera: Gomphocerinae) with keys to their species. Entomologia Generalis, 35, 253-268.

Huang, J.H., Zhang, A.B., Mao, S.L. & Huang, Y. (2013) DNA barcoding and species boundary delimitation of selected species of Chinese Acridoidea (Orthoptera: ). PLOS ONE, 8, e82400.

Ingenitskii (1897) Ueber eine neue Acridiiden-Art. Trudy Russkago Entomologicheskago Obshchestva. Horae Societatis Entomologicae Rossicae, 31, 63-71.

Klausmeyer, K.R. & Shaw, M.R. (2009) Climate Change, habitat loss, protected areas and the climate adaptation potential of species in Mediterranean ecosystems worldwide. PLOS ONE, 4, e6392.

Knowles, L.L. & Richards, C.L. (2005) Importance of genetic drift during Pleistocene divergence as revealed by analyses of genomic variation. Molecular Ecology, 14, 4023-4032.

Latchininsky, A.V. (1998) Moroccan locust Dociostaurus maroccanus (Thunberg, 1815): a faunistic rarity or an important economic pest? Journal of Insect Conservation, 2, 167-178.

Latchininsky, A.V. (2013) Locusts and remote sensing: a review. Journal of Applied Remote Sensing, 7, 1-32.

Librado, P. & Rozas, J. (2009) DNASP v5: A software for comprehensive analysis of DNA polymorphism data. Bioinformatics, 25, 1451-1452.

Mal’kotskii, M. P. (1963) On the polymorphism and phase variability of the Dociostaurus kraussi Ingen. (Orthoptera, Acrididae) in Mangyshlak and Ust-Urt, West Kazakhstan. Entomologicheskoe Obozrenie Leningrad, 42, 778-781.

Massa, B. (2011) Gli Ortotteri di Sicilia: check-list commentata. La Biogeografia della Sicilia. Biogeographia. Vol. 30, 567-626 pp. Palermo, IT.

Massa, B., Fontana, P., Buzzetti, F.M., Kleukers, R. & Odé, B. (2012) Fauna d’Italia: Orthoptera. Vol. 48, 563 pp + DVD. Calderini. Italy.

Médail, F. & Quézel, P. (1999) Biodiversity hotspots in the Mediterranean Basin: setting global conservation priorities. Conservation Biology, 13, 1510-1513.

Miller, M.A., Pfeiffer, W. & Schwartz, T. (2010) Creating the CIPRES Science Gateway for Inference of Large Phylogenetic Trees. Proceedings of the Gateway Computing Environments Workshop (GCE), 1-8. [WWW document]. URL http://www.phylo.org/

Mistshenko, L.L. (1974) To the knowledge of the grasshoppers belonging to the genus Dociostaurus Fieb. (Orthoptera, Acrididae). Revue d’Entomologie de l’U.R.S.S. Entomologicheskoe Obozrenie, 53, 334-342.

Mol, A., Sirin, D. & Taylan, M.S. (2014) Türkiye’de dağılım gösteren bazı Caelifera (Insecta: Orthoptera) türlerinin yeni lokalite kayıtları, endemizm, yaygınlık ve tarımsal zarar oluşturma açısından değerlendirilmesi. Bitki Koruma Bülteni, 54, 133-170.

Montagna, M., Kubisz, D., Mazur, M.A. et al. (2016) Exploring species-level taxonomy in the Cryptocephalus flavipes species complex (Coleoptera: Chrysomelidae). Zoological Journal of the Linnean Society [doi: 10.1111/zoj.12445]

Moussi, A., Abba, A., Harrat, A. & Petit, D. (2014) Description of Dociostaurus biskrensis sp. nov. and male allotypes of four species: Pamphagulus bodenheimeri dumonti, P. uvarovi, Sphingonotus ebneri and Notopleura pygmaea (Orthoptera: Acridoidea) in the region of Biskra, Algeria. Zootaxa, 3755, 379-390.

Myers, N., Mittermeier, R.A., Mittermeier, C.G. et al. (2000) Biodiversity hotspots for conservation priorities. Nature, 403, 853-858.

Noguerales, V., Cordero, P.J. & Ortego, J. (2016) Hierarchical genetic structure shaped by topography in a narrow-endemic montane grasshopper. BMC Evolutionary Biology, 16, 96.

Ortego, J., Aguirre, M.P., Noguerales, V. & Cordero, P.J. (2015) Consequences of extensive habitat fragmentation in landscape-level patterns of genetic diversity and structure in the Mediterranean esparto grasshopper. Evolutionary Applications, 8, 621-632.

Ortego, J., Bonal, R., Cordero, P.J. & Aparicio, J.M. (2009) Phylogeography of the Iberian populations of Mioscirtus wagneri (Orthoptera: Acrididae), a specialized grasshopper inhabiting highly fragmented hypersaline environments. Biological Journal of the Linnean Society, 97, 623-633.

Pantel, S.J. (1886) Contribution a L’Orthoptérologie de L’Espagne Centrale. Anales de la Sociedad Española de Historia Natural, 15, 237-241.

Papadopoulou, A., Anastasiou, I. & Vogler, A.P. (2010) Revisiting the insect mitochondrial molecular clock: the mid-aegean trench calibration. Molecular Biology and Evolution, 27, 1659-1672.

Petit, R.J., Aguinagalde, I., Beaulieu, J.L. et al. (2003) Glacial refugia: Hotspots but not melting pots of genetic diversity. Science, 300, 1563-1565.

Pons, J., Barraclough, T.G., Gomez-Zurita, J. et al. (2006) Sequence-based species delimitation for the DNA taxonomy of undescribed insects. Systematic Biology, 55, 595-609.

Ragge, D.R. & Reynolds, W.J. (1998) The Songs of the Grasshoppers and Crickets of Western Europe. Harley Books, Colchester.

Rambaut, A. (2014) FIGTREE v1.4.2. Computer Program [WWW document]. URL http://tree.bio.ed.ac.uk/software/figtree/

Rambaut, A., Suchard, M.A., Xie, D. & Drummond, A.J. (2014) TRACER v1.6. Computer Program [WWW document]. URL http://beast.bio.ed.ac.uk/Tracer/

Savitsky, V.Y. (2000) Acoustic signals, ecological features and reproductive isolation of grasshoppers of the genus Dociostaurus (Orthoptera, Acrididae) in semi-desert. Entomological Review, 80, 950-967.

Simon, C., Frati, F., Beckenbach, A. et al. (1994) Evolution, weighting, and phylogenetic utility of mitochondrial gene sequences and a compilation of conserved polymerase chain reaction primers. Annals of the Entomological Society of America, 87, 651-701.

Sirin, D. & Mol, A. (2013) New species and new song record of the genus Dociostaurus Fieber, 1853 (Orthoptera, Acrididae, Gomphocerinae) from Southern Anatolia, Turkey. Zootaxa, 3683, 486-500.

Solis-Lemus, C., Knowles, L.L. & Ané, C. (2015) Bayesian species delimitation combining multiple genes and traits in a unified framework. Evolution, 69, 492-507.

Soltani, A.A. (1978) Preliminary synonymy and description of new species in the genus Dociostaurus Fieber, 1853 (Orthoptera: Acridoidea: Acrididae, Gomphocerinae) with a key to the species in the genus. Journal of Entomological Society of Iran, 2, 1-93.

Song, H., Amédégnato, C., Cigliano, M.M. et al. (2015) 300 million years of diversification: elucidating the patterns of orthopteran evolution based on comprehensive taxon and gene sampling. Cladistics, 31, 621-651.

Stamatakis, A. (2014) RAXML Version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics, 30, 1312-1313.

Stöck, M., Sicilia, A., Belfiore, N.M. et al. (2008) Post-Messinian evolutionary relationships across the Sicilian channel: Mitochondrial and nuclear markers link a new green toad from Sicily to African relatives. BMC Evolutionary Biology, 8, 56.

Tamura, K., Peterson, D., Peterson, N. et al. (2011) MEGA5: Molecular Evolutionary Genetics Analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Molecular Biology and Evolution, 28, 2731-2739.

Tarbinsky (1928) On some new and little known Orthoptera from Palaearctic Asia (in Russian with English description). Bulletin of the Institute of applied Zoology and Phytopathology of Leningrad, 4, 51-61.

Tautz, D., Arctander, P., Minelli, A. et al. (2003) A plea for DNA taxonomy. Trends in Ecology & Evolution, 18, 70-74.

Thompson, R.S. & Fleming, R.F. (1996) Middle Pliocene vegetation: reconstructions, paleoclimatic inferences, and boundary conditions for climate modeling. Marine Micropaleontology, 27, 27-49.

Vedenina, V. & Mugue, N. (2011) Speciation in gomphocerine grasshoppers: molecular phylogeny versus bioacoustics and courtship behavior. Journal of Orthoptera Research, 20, 109-125.

Vogler, A.P. & DeSalle, R. (1994) Diagnosing units of conservation management. Conservation Biology, 8, 354-363.

Vogler, A.P. & Monaghan, M.T. (2006) Recent advances in DNA taxonomy. Journal of Zoological Systematics and Evolutionary Research, 45, 1-10.

Yang, Z.H. (2015) The BPP program for species tree estimation and species delimitation. Current Zoology, 61, 854-865.

Zhang, H.L., Huang, Y., Lin, L.L. et al. (2013a) The phylogeny of the Orthoptera (Insecta) as deduced from mitogenomic gene sequences. Zoological Studies, 52, 37.

Zhang, J., Kapli, P., Pavlidis, P. & Stamatakis, A. (2013b) A general species delimitation method with applications to phylogenetic placements. Bioinformatics, 29, 2869-2876.

SUPPORTING INFORMATION

Table S1 List of species names according to Cigliano et al. (2017) included in this study with their individual reference (ID), number of samples per locality (n), geographical location and GenBank accession numbers (corresponding to 12S ribosomal RNA gene, 16S ribosomal RNA gene, and cytochrome oxidase subunit I gene, in this order respectively).

| Species | ID | n | Location | Country | Latitude | Longitude | GenBank accession numbers |
|---|---|---|---|---|---|---|---|
| Dociostaurus (Kazakia) brevicollis (Eversmann, 1848) | Db1; Db2 | 2 | Slavgorodskiy (Altayskiy kray) | Russia | 53.106277 | 78.991363 | KX954639, KX954697, KX954753; KX954640, KX954698, KX954754 |
| | Db3; Db4 | 2 | Borovskoye (Altai Krai) | Russia | 52.640551 | 82.225441 | KX954641, KX954699, KX954755; KX954642, KX954700, KX954756 |
| | Db5; Db6 | 2 | Krasnozerskoye (Novosibirskaya) | Russia | 53.952226 | 79.239308 | KX954643, ------, KX954757; KX954644, KX954702, KX954758 |
| | Db7; Db8 | 2 | Voltchikha (Altayskiy Kray) | Russia | 52.037756 | 80.431709 | KX954645, KX954703, KX954759; KX954646, KX954704, KX954760 |
| | Db9 | 1 | Peçenek (Ankara) | Turkey | 40.420864 | 32.328815 | KX954647, KX954701, KX954761 |
| Dociostaurus (Stauronotulus) crassiusculus crassiusculus (Pantel, 1886) | Dcc1 | 1 | El Bonillo (Albacete) | Spain | 38.937583 | -2.565444 | KX954648, KX954705, KX954762 |
| | Dcc2 | 1 | (C.Real) | Spain | 39.464714 | -3.180559 | KX954649, KX954706, KX954763 |
| | Dcc3 | 1 | Belinchón (Cuenca) | Spain | 40.052653 | -3.057982 | KX954650, KX954707, KX954764 |
| | Dcc4 | 1 | Orce (Granada) | Spain | 37.751535 | -2.425993 | KX954651, KX954708, KX954765 |
| | Dcc5 | 1 | Perales del Tajuña (Madrid) | Spain | 40.206424 | -3.318093 | KX954652, KX954709, KX954766 |
| | Dcc6 | 1 | Villacañas (Toledo) | Spain | 39.520717 | -3.347664 | KX954653, KX954710, KX954767 |
| Dociostaurus (Stauronotulus) crassiusculus nigrogeniculatus Tarbinsky, 1928 | Dcn1 | 1 | Xinjiang | China | | | ------, ------, KM816675 |
| Dociostaurus (Stauronotulus) dantini Bolívar, 1914 | Dd1; Dd2; Dd3 | 3 | Ouarzazate (Ouarzazate) | Morocco | 31.314908 | -7.370333 | KX954654, KX954711, KX954768; KX954655, KX954712, KX954769; KX954656, KX954713, KX954770 |
| Dociostaurus (Kazakia) genei genei (Ocskay, 1832) | Dgg1 | 1 | Serradilla (Cáceres) | Spain | 39.722527 | -6.083853 | KX954657, KX954714, KX954771 |
| | Dgg2 | 1 | Belalcázar (Córdoba) | Spain | 38.594329 | -5.157927 | KX954658, KX954715, KX954772 |
| | Dgg3 | 1 | Cañada del Hoyo (Cuenca) | Spain | 39.937090 | -1.993173 | KX954659, KX954716, KX954773 |
| | Dgg4 | 1 | Puebla de Don Fadrique (Granada) | Spain | 38.046225 | -2.485038 | KX954660, KX954717, KX954774 |
| | Dgg5 | 1 | Cervera de Pisuerga (Palencia) | Spain | 42.908750 | -4.642257 | KX954661, KX954718, KX954775 |
| | Dgg6 | 1 | Majaelrayo (Guadalajara) | Spain | 41.125534 | -3.329502 | KX954662, KX954719, KX954776 |
| Dociostaurus (Dociostaurus) hispanicus Bolívar, 1898 | Dh1 | 1 | Ávila (Ávila) | Spain | 40.624106 | -4.689135 | KX954663, KX954720, KX954777 |
| | Dh2 | 1 | (Ciudad Real) | Spain | 38.566024 | -4.310637 | KX954664, KX954721, KX954778 |
| | Dh3 | 1 | Belalcázar (Córdoba) | Spain | 38.594329 | -5.157927 | KX954665, KX954722, KX954779 |
| | Dh4 | 1 | Santa Elena (Jaén) | Spain | 38.332111 | -3.529301 | KX954666, KX954723, KX954780 |
| | Dh5 | 1 | Chapinería (Madrid) | Spain | 40.391272 | -4.209263 | KX954667, KX954724, KX954781 |
| | Dh6 | 1 | Trabanca (Salamanca) | Spain | 41.242753 | -6.402215 | KX954668, KX954725, KX954782 |
| Dociostaurus (Kazakia) jagoi jagoi Soltani, 1978 | Djj1 | 1 | Aïn Leuh (Ifrane) | Morocco | 33.249180 | -5.309502 | KX954669, KX954726, KX954783 |
| | Djj2 | 1 | Chiker (Taza) | Morocco | 34.086929 | -4.108407 | KX954670, KX954727, KX954784 |
| | Djj3 | 1 | Dar Chaoui (Tetouan) | Morocco | 35.520889 | -5.732648 | KX954671, KX954728, KX954785 |
| | Djj4 | 1 | Ben Mansour (Kenitra) | Morocco | 34.632973 | -6.415268 | KX954672, KX954729, KX954786 |
| | Djj5 | 1 | Afourer (Azilal) | Morocco | 32.190371 | -6.508905 | KX954673, KX954730, KX954787 |
| | Djj6; Djj7 | 2 | Korba (Nābul) | Tunisia | 36.520655 | 10.838902 | KX954674, KX954731, KX954788; KX954675, KX954732, KX954789 |
| Dociostaurus (Kazakia) jagoi occidentalis Soltani, 1978 | Djo1 | 1 | Cabrera de Mar (Barcelona) | Spain | 41.532165 | 2.389093 | KX954676, KX954733, KX954790 |
| | Djo2 | 1 | Chiclana de la Frontera (Cádiz) | Spain | 36.385553 | -6.205986 | KX954677, KX954734, KX954791 |
| | Djo3 | 1 | La Hinojosa (Cuenca) | Spain | 39.707254 | -2.455189 | KX954678, KX954735, KX954792 |
| | Djo4; Djo5 | 2 | Ciudadela (Balearic Islands) | Spain | 39.932997 | 3.920243 | KX954679, KX954736, KX954793; KX954680, KX954737, KX954794 |
| | Djo6 | 1 | Majaelrayo (Guadalajara) | Spain | 41.125534 | -3.329502 | KX954681, KX954738, KX954795 |
| | Djo7 | 1 | Litago (Zaragoza) | Spain | 41.792921 | -1.763370 | KX954682, KX954739, KX954796 |
| | Djo8 | 1 | Stintino (Sardinia) | Italy | 40.885799 | 8.238018 | KX954683, ------, KX954797 |
| | Djo9 | 1 | Aguçadoura (Póvoa de Varzim) | Portugal | 41.443531 | -8.777878 | KX954684, KX954740, KX954798 |
| Dociostaurus (Stauronotulus) kraussi (Ingenitskii, 1897) | Dk1; Dk2 | 2 | Xinjiang | China | | | ------, ------, KR005944; ------, ------, KR014937 |
| Dociostaurus (Dociostaurus) maroccanus (Thunberg, 1815) | Dm1 | 1 | El Bonillo (Albacete) | Spain | 38.937583 | -2.565444 | KX954685, KX954741, KX954799 |
| | Dm2 | 1 | Castuera (Badajoz) | Spain | 38.748272 | -5.544498 | KX954686, KX954742, KX954800 |
| | Dm3 | 1 | Brazatortas (Ciudad Real) | Spain | 38.589580 | -4.353536 | KX954687, KX954743, KX954801 |
| | Dm4 | 1 | Bérchules (Granada) | Spain | 36.968460 | -3.214250 | KX954688, KX954744, KX954802 |
| | Dm5 | 1 | Caravaca de la Cruz (Murcia) | Spain | 38.102832 | -2.019989 | KX954689, KX954745, KX954803 |
| | Dm6 | 1 | Sando (Salamanca) | Spain | 40.969612 | -6.109545 | KX954690, KX954746, KX954804 |
| | Dm7 | 1 | Alhama de Aragón (Zaragoza) | Spain | 41.353294 | -1.936456 | KX954691, KX954747, KX954805 |
| Dociostaurus (Dociostaurus) minutus La Greca, 1962 | Dmi1; Dmi2; Dmi3; Dmi4 | 4 | Marza (Sicily) | Italy | 36.712590 | 14.931700 | KX954692, KX954748, KX954806; KX954693, KX954749, KX954807; KX954694, KX954750, KX954808; KX954695, KX954751, KX954809 |
| Notostaurus anatolicus (Krauss, 1896) | Na1 | 1 | Peçenek (Ankara) | Turkey | 40.420864 | 32.328815 | KX954696, KX954752, KX954810 |
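The accession cells of Table S1 follow a fixed convention: each specimen contributes three comma-separated entries in the order 12S, 16S, COI, specimens are separated by semicolons, and "------" marks a marker with no sequence. A minimal sketch of how such a cell could be read programmatically is given below; the function and constant names are hypothetical and are not part of the original study.

```python
# Marker order stated in the caption of Table S1.
MARKERS = ("12S", "16S", "COI")
# Placeholder used in Table S1 when a marker has no accession number.
MISSING = "------"


def parse_accessions(cell):
    """Split one accession cell into a list with one dict per specimen.

    Missing markers are returned as None.
    """
    specimens = []
    for group in cell.split(";"):
        codes = [code.strip() for code in group.split(",")]
        if len(codes) != len(MARKERS):
            raise ValueError(f"expected {len(MARKERS)} entries, got: {group!r}")
        specimens.append(
            {marker: (None if code == MISSING else code)
             for marker, code in zip(MARKERS, codes)}
        )
    return specimens


# Cell for specimens Db5 and Db6, copied from Table S1.
db5_db6 = parse_accessions(
    "KX954643,------, KX954757; KX954644, KX954702, KX954758"
)
for record in db5_db6:
    print(record)
```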

Table S2 mtDNA genes, PCR amplicon sizes, primers used for amplification and their respective sequences (Simon et al., 1994).

| Gene | Amplicon size | Primer | Sequence (5’ → 3’) |
|---|---|---|---|
| 12S | 390 bp | 12SF | TAC TAT GTT ACG ACT TAT |
| | | 12SR | AAA CTA GGA TTA GAT ACC C |
| 16S | 488 bp | 16Sar | CGC CTG TTT AAC AAA AAC AT |
| | | 16Sbr | CCG GTC TGA ACT CAG ATC ACG T |
| COI | 629 bp | LCO1490 | GGT CAA CAA ATC ATA AAG ATA TTG G |
| | | HCO2198 | TAA ACT TCA GGG TGA CCA AAA AAT CA |
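Basic properties of the primers in Table S2 (length, GC content and an approximate melting temperature) can be derived directly from their sequences. The sketch below assumes the Wallace rule of thumb, Tm = 2(A+T) + 4(G+C), which is only a rough guide and becomes less reliable for longer oligonucleotides. It is not the procedure used to set the PCR conditions in this study, and all function names are hypothetical.

```python
# Primer pairs (forward, reverse) per gene, copied from Table S2.
PRIMER_PAIRS = {
    "12S": ("TAC TAT GTT ACG ACT TAT", "AAA CTA GGA TTA GAT ACC C"),
    "16S": ("CGC CTG TTT AAC AAA AAC AT", "CCG GTC TGA ACT CAG ATC ACG T"),
    "COI": ("GGT CAA CAA ATC ATA AAG ATA TTG G",
            "TAA ACT TCA GGG TGA CCA AAA AAT CA"),
}


def clean(sequence):
    """Remove the triplet spacing used in the table and upper-case the sequence."""
    return sequence.replace(" ", "").upper()


def gc_fraction(sequence):
    """Proportion of G and C bases in the primer."""
    seq = clean(sequence)
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(sequence):
    """Approximate melting temperature (degrees C) from the Wallace rule."""
    seq = clean(sequence)
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


for gene, pair in PRIMER_PAIRS.items():
    for primer in pair:
        print(gene, len(clean(primer)), "nt",
              f"GC={gc_fraction(primer):.2f}", f"Tm~{wallace_tm(primer)} C")
```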

CAPÍTULO 2

Using high-throughput sequencing to investigate the factors structuring genomic variation of a Mediterranean grasshopper of great conservation concern

María José González-Serna, Pedro J. Cordero & Joaquín Ortego
Scientific Reports (2018), 8: 13436
DOI: 10.1038/s41598-018-31775-x

Abstract

Inferring the demographic history of species is fundamental for understanding their responses to past climate/landscape alterations and improving our predictions about the future impacts of the different components of ongoing global change. Estimating the time-frame at which population fragmentation took place is also critical to determine whether such a process was shaped by ancient events (e.g. past climate/geological changes) or if, conversely, it was driven by recent human activities (e.g. habitat loss). We employed genomic data (ddRADseq) to determine the factors shaping contemporary patterns of genetic variation in the Iberian cross-backed grasshopper Dociostaurus crassiusculus, an endangered species with limited dispersal capacity and narrow habitat requirements. Our analyses indicate the presence of two ancient lineages and three genetic clusters that resulted from historical processes of population fragmentation (~18-126 ka) predating the Anthropocene. Landscape genetic analyses indicate that the limits of major river basins are the main geographical feature explaining large-scale patterns of genomic differentiation, with no apparent effect of human-driven habitat fragmentation. Overall, our study highlights the importance of detailed phylogeographic, demographic and spatially-explicit landscape analyses to identify evolutionary significant units and determine the relative impact of historical vs. anthropogenic factors on processes of genetic fragmentation in taxa of great conservation concern.

Keywords: conservation genomics, ddRADseq, demographic inference, habitat fragmentation, landscape genetics, population structure.

INTRODUCTION

Inferring the evolutionary and demographic history of species and populations is fundamental for understanding how they were impacted by past environmental and landscape alterations and for anticipating their responses to different components of global change such as climatic variations (Cordero et al., 2010; Espindola et al., 2012; Jay et al., 2012), habitat loss (Brown et al., 2016) or the emergence of infectious diseases (Cullingham et al., 2009). Many organisms currently show highly fragmented distributions due to the naturally patchy distribution of their particular habitats (Harrison, 1997; Cordero et al., 2010) or because their originally continuous populations became isolated through habitat fragmentation driven by human activities (Saunders et al., 1991; Lindenmayer & Fischer, 2006) or past climatic/geological events (Ribera & Blasco-Zumeta, 1998; Wallis et al., 2016). The genetic, ecological and evolutionary consequences of severe population fragmentation are numerous, including alteration of selective pressures, genetic erosion, inbreeding, accumulation of deleterious mutations, reduced evolutionary potential and, ultimately, increased risk of extinction (Willi et al., 2006; Zastavniouk et al., 2017). For these reasons, the study of population fragmentation is a major area of research for conservation biologists and geneticists, and particular attention has been paid to taxa forming small populations and presenting narrow distributions, low dispersal capabilities, and specific habitat requirements (Frankham, 1995; Frankham & Ralls, 1998).

Estimating the time-frame at which population fragmentation took place is critical to determine whether such a process was driven by historical processes that predate the Anthropocene or if, conversely, it is a direct consequence of human activities (Cunningham & Moritz, 1998; Kalvik et al., 2012). This has important implications for on-the-ground conservation practices (Moritz, 2002). If recent human-induced habitat fragmentation is identified as the main driver of population genetic structure and disruption of gene flow, then conservation practices should focus on restoring population connectivity, either by creating dispersal corridors or by assisting gene flow, to avoid the long-term negative consequences of inbreeding and loss of genetic diversity (Wang et al., 2009; Aitken & Whitlock, 2013). If, instead, population genetic structure was driven by ancient processes, then the different clades, lineages or genetic clusters might represent Evolutionarily Significant Units (ESUs) with particular local adaptations that deserve to be managed independently to maximize the protection of both vicariant and adaptive diversity (Fraser & Bernatchez, 2001; Moritz, 2002; Yan et al., 2018). Beyond the temporal scale of population divergence, identifying the proximate factors shaping contemporary patterns of genetic structure is also fundamental to understand how organisms interact with the different components of the landscape (Manel et al., 2003). Genetic and spatial information has been successfully integrated to infer dispersal routes across different habitat types (Wang et al., 2009), identify natural barriers to dispersal (e.g. rivers: Quéméré et al., 2016; topography: Noguerales et al., 2016; geology: Pepper et al., 2008) and determine the consequences of human activities on disrupting gene flow of natural populations (e.g. agriculture: Ortego et al., 2015a; Tinnert et al., 2016; infrastructures: Ruiz-González et al., 2014). For this reason, testing alternative spatially-explicit scenarios of population connectivity under a landscape genetic framework can help to determine the relative role of human and natural barriers to gene flow in structuring present-day patterns of genetic variation (Segelbacher et al., 2010). This is particularly relevant in the case of specialist taxa with patchy distributions, as identifying contemporary barriers to gene flow and cryptic dispersal corridors is crucial to establish management practices aimed at restoring or enhancing connectivity among remnant populations (Wang et al., 2009).

The Iberian Peninsula constitutes an important biodiversity hotspot, with high species richness, rates of endemism and levels of intra-specific genetic diversity (Médail & Quézel, 1999; Myers et al., 2000; Brooks et al., 2006). Explanations for the high diversity of the Iberian Peninsula include its historically high climatic stability (Blondel & Aronson, 1999), the low impact of Pleistocene glaciations in comparison with northern temperate areas (Hewitt, 2000), its current proximity and Miocene connection with North Africa and other Mediterranean regions (Ribera & Blasco-Zumeta, 1998; Faille et al., 2014), and the presence of steep environmental gradients and a complex topography (Blondel & Aronson, 1999; Ferrer-Castán & Vetaas, 2005). Despite its high biodiversity and conservation value, the Mediterranean region has been exposed to millennia of strong human intervention (Blondel & Aronson, 1999; Ortego et al., 2015a) that has reduced the original extent of its primary vegetation by ~95% (Myers et al., 2000). This region is also predicted to be intensely impacted by climate change, with distributional shifts and remarkable range contractions expected in many taxa (Giorgi & Lionello, 2008; Barredo et al., 2016). Both severe habitat loss and climate warming represent serious threats for many taxa with small and highly fragmented distributions, which have considerable difficulty maintaining viable populations and face a risk of extinction (Hanski, 2011; Remón et al., 2012). Thus, understanding the evolutionary history, demographic trends, and interactions with landscape heterogeneity of these taxa is critical for establishing effective conservation policies and informed management practices that ensure their long-term persistence (Saccheri et al., 1998; Rubidge et al., 2012).

In this study, we use genomic data to infer the processes structuring genetic variation in the Iberian cross-backed grasshopper Dociostaurus crassiusculus (Pantel, 1886), a species of great conservation concern that has recently been catalogued as “endangered” in the Red List of European Orthoptera (Hochkirch et al., 2016). The taxonomic position of this species was controversial and, according to morphological criteria, it was long considered a subspecies of the Asian Dociostaurus kraussi (Ingenitskii, 1897) (Harz, 1975; Soltani, 1978; Hodjat, 2016). A recent re-evaluation of the taxonomic status of the genus using genetic data has supported the presence of two well recognized species in concordance with their disjunct distributions: D. crassiusculus in the Iberian Peninsula and D. kraussi in Asia (González-Serna et al., 2018). The full species status of D. crassiusculus makes it of higher conservation concern, given that the very few known populations of the species persist in highly isolated pockets of habitat embedded in an expansive matrix of unsuitable areas (Cordero et al., 2010). The species is currently distributed in central-southeast Iberia, mostly occupying pseudo-steppe habitats with halophytic plant communities linked to gypsum or hypersaline soils (Cordero et al., 2010). These narrow habitat requirements, together with the reduced flying capacity of the species and the progressive loss of its natural habitat due to human activities, have left all populations of D. crassiusculus extremely fragmented and at high risk of extinction from stochastic phenomena (Gangwere et al., 1985; Cordero et al., 2010).

Here, we employ double-digest restriction-site-associated DNA sequencing (ddRADseq), coalescent-based simulations and a landscape genetics framework to evaluate alternative demographic scenarios, estimate the timing of population fragmentation, and infer the processes shaping contemporary patterns of genetic structure across all known populations of D. crassiusculus. Specifically, we first used genomic data to analyse the spatial genetic structure of extant populations of the species, identify main lineages and establish their phylogenomic relationships, and define hierarchical units for conservation and management (Fraser & Bernatchez, 2001; Moritz, 2002). Second, we tested alternative coalescent-based demographic and migration models to infer spatial patterns and rates of inter-population gene flow, estimate the timing of population fragmentation at different spatial scales and, ultimately, determine whether the genetic structure of the species is a consequence of ancient events (e.g. past climate or geological changes) or if, conversely, it is compatible with human-driven population fragmentation (Cunningham & Moritz, 1998). Finally, we generated alternative isolation-by-resistance (IBR) scenarios of population connectivity within a spatially-explicit framework to identify the specific landscape features shaping genetic differentiation in the species and unravel the relative importance of natural (topography, lithology, limits of main river basins) vs. anthropogenic (habitat loss) processes of genetic fragmentation.

MATERIALS AND METHODS

STUDY AREA AND SAMPLING

During 2008-2015, we sampled six populations of Dociostaurus crassiusculus (Pantel, 1886) (Fig. 1; Table 1). All the populations were found in areas with a particular lithological composition (evaporites, limestones, and conglomerates) and with plant communities linked to gypsum or hypersaline soils. We are confident that these populations cover the entire distribution range of the species, as other areas with potentially adequate habitats (i.e. pseudo-steppe saline grounds, wastelands with halophytic vegetation and surroundings of hypersaline/saline lagoons with marl-gypsum outcrops) have been extensively prospected without any records of the species (Cordero et al., 2010; González-Serna et al., 2018). Dociostaurus crassiusculus has recently been assigned to the category “endangered” in the IUCN Red List of Threatened Species due to the high fragmentation of its very small populations (Cordero et al., 2010; Hochkirch et al., 2016) and, for this reason, we only collected 5-6 adult individuals per population. We aimed at collecting an equal number of males and females in each locality, but sample sizes were sometimes male-biased due to very low female numbers in some populations. Monitoring of some of the studied populations indicates that the abundance of D. crassiusculus in years before and after sampling was qualitatively similar, which suggests that the removal of only 5-6 individuals per locality had little impact on the population dynamics of the species. Fresh whole specimens were stored in 2,000 µL of 96% ethanol at -20°C until used for genomic analyses.

Table 1 Geographical location and number of samples (n) for the studied populations of the Iberian cross-backed grasshopper Dociostaurus crassiusculus.

| Locality | Province | Code | n males | n females | Latitude | Longitude |
|---|---|---|---|---|---|---|
| Perales de Tajuña | Madrid | TAJU | 4 | 1 | 40.2064 | -3.3181 |
| Belinchón | Cuenca | BELI | 3 | 3 | 40.0793 | -3.0446 |
| Laguna de Peña Hueca | Toledo | PHUE | 3 | 3 | 39.5158 | -3.3486 |
| Laguna de Salicor | Ciudad Real | SALI | 3 | 3 | 39.4637 | -3.1808 |
| El Bonillo | Albacete | BONI | 3 | 3 | 38.8779 | -2.4834 |
| Orce | Granada | ORCE | 3 | 3 | 37.7515 | -2.4259 |

Figure 1 (a) Geographical location and genetic structure of the studied populations of the Iberian cross-backed grasshopper Dociostaurus crassiusculus. Brown shading in the map represents elevation, with darker areas corresponding to higher altitudes. Black lines show the boundaries of the main river basins separating the three population groups: Northern (dark green), Central (light green) and Southern (orange). (b) Male of D. crassiusculus. (c) Typical habitat of the species, with gypsophilous grounds and wastelands with halophytic vegetation. (d) Barplots showing the genetic assignments of the different individuals based on the Bayesian method implemented in the program STRUCTURE for K = 2 and K = 3. Each individual is represented by a vertical bar, which is partitioned into k coloured segments showing the individual’s probability of belonging to the cluster with that colour. Thin vertical black lines separate individuals from different populations. These analyses are based on a random subset of 10,000 unlinked SNPs obtained with STACKS (p = 4).

DNA EXTRACTION AND GENOMIC LIBRARY PREPARATION

We used NucleoSpin Tissue kits (Macherey-Nagel, Düren, Germany) to extract and purify genomic DNA from the hind femur of each individual. Genomic DNA was individually barcoded and processed into one genomic library using the double-digestion restriction-fragment-based procedure (ddRADseq) described in Peterson et al. (2012). In brief, DNA was doubly digested with the restriction enzymes MseI and EcoRI (New England Biolabs, Ipswich, MA, USA) and Illumina adaptors including unique 7-bp barcodes were ligated to the digested fragments. Ligation products were pooled, size-selected between 475-580 bp with a Pippin Prep (Sage Science, Beverly, MA, USA) machine and amplified by PCR with 12 cycles using the iProof™ High-Fidelity DNA Polymerase (BIO-RAD, Hercules, CA, USA). The library was sequenced in a single-read 150-bp lane on an Illumina HiSeq2500 platform at The Centre for Applied Genomics (SickKids, Toronto, ON, Canada).

GENOMIC DATA PROCESSING AND BIOINFORMATICS

We used both STACKS v. 1.35 (Hohenlohe et al., 2010; Catchen et al., 2011; Catchen et al., 2013) and PYRAD v. 3.0.66 (Eaton, 2014) to assemble our sequences into de novo loci and call genotypes. This allowed us to examine the robustness of our analyses based on SNP datasets obtained using two of the most popular programs currently available to assemble RADseq data (Catchen et al., 2011; Eaton, 2014). The choice of different filtering thresholds using either STACKS or PYRAD had little impact on the obtained inferences (Ortego et al., 2018). For this reason, unless otherwise indicated, all downstream analyses were performed using a SNP dataset obtained with STACKS including only those loci that were represented in at least four populations (p = 4). See Supplementary Methods for additional details on sequence assembling and data filtering.

POPULATION GENETIC STATISTICS

Population genetic statistics, including major allele frequency (P), nucleotide diversity (π), observed (HO) and expected (HE) heterozygosity, and Wright’s inbreeding coefficient (FIS), were calculated using the program populations from STACKS (Catchen et al., 2013). For biallelic RADseq loci, π is an estimate of expected heterozygosity and is therefore a useful measure of the genetic diversity of populations. Furthermore, FIS measures the reduction in observed heterozygosity as compared to expected heterozygosity for an allele in a population, with positive values indicating non-random mating or cryptic population structure (Nei & Kumar, 2000; Hartl & Clark, 2007; Holsinger & Weir, 2009; Lanier et al., 2015). Pair-wise FST values of genetic differentiation were calculated between all pairs of populations in ARLEQUIN v.3.5 (Excoffier & Lischer, 2010). We used PGDSPIDER v. 2.1.0.3 (Lischer & Excoffier, 2012) to convert Variant Call Format (VCF) files provided by STACKS into the correct format needed for ARLEQUIN.
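The per-locus statistics described above can be illustrated with a minimal sketch based on textbook definitions for a single biallelic SNP. The function name `locus_stats` and the genotype counts are hypothetical, and the exact estimators implemented in STACKS may differ in detail (for instance, in how the sample-size correction enters FIS).

```python
def locus_stats(n_AA, n_Aa, n_aa):
    """Return (P, HO, HE, pi, FIS) for one biallelic SNP from genotype counts."""
    n = n_AA + n_Aa + n_aa             # number of diploid individuals genotyped
    p = (2 * n_AA + n_Aa) / (2 * n)    # frequency of allele A
    q = 1.0 - p
    major = max(p, q)                  # major allele frequency (P)
    h_obs = n_Aa / n                   # observed heterozygosity (HO)
    h_exp = 2.0 * p * q                # expected heterozygosity (HE)
    # Nucleotide diversity at the site: HE with the usual correction for
    # sampling 2n allele copies without replacement.
    pi = h_exp * (2 * n) / (2 * n - 1)
    # FIS as the proportional deficit of heterozygotes (assumed form: 1 - HO/HE).
    f_is = 1.0 - h_obs / h_exp if h_exp > 0 else float("nan")
    return major, h_obs, h_exp, pi, f_is


# Hypothetical locus scored in six individuals (2 AA, 2 Aa, 2 aa)
print(locus_stats(2, 2, 2))
```

In this toy locus the heterozygote deficit gives a positive FIS, the pattern the text associates with non-random mating or cryptic structure.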

POPULATION GENETIC STRUCTURE

We analysed population genetic structure and identified groups of individuals with similar ancestral gene pools using the Bayesian clustering method implemented in the program STRUCTURE v.2.3.3 (Pritchard et al., 2000; Falush et al., 2003; Hubisz et al., 2009). We ran STRUCTURE using a random subset of 10,000 unlinked SNPs from six different datasets obtained with STACKS and PYRAD considering different filtering/clustering parameters (see Supplementary Methods for further details). For each dataset, we ran STRUCTURE assuming correlated allele frequencies and admixture and without using prior population information (Hubisz et al., 2009). We conducted 15 independent runs for each value of K, where K ranged from 1 to n+1 for the dataset of n populations, to estimate the “true” number of clusters with a burn-in step of 100,000 iterations followed by 200,000 MCMC cycles. We retained the ten runs having the highest likelihood for each value of K and defined the number of populations best fitting the dataset using log probabilities [Pr(X|K)] (Pritchard et al., 2000) and the ΔK method (Evanno et al., 2005), as implemented in STRUCTURE HARVESTER (Earl & vonHoldt, 2012). We used CLUMPP v. 1.1.2 and the Greedy algorithm to align multiple runs of STRUCTURE for the same K value (Jakobsson & Rosenberg, 2007) and DISTRUCT v. 1.1 (Rosenberg, 2004) to visualize as bar plots the individuals’ probabilities of membership to each inferred genetic cluster. Complementary to Bayesian clustering analyses and in order to visualize the major axes of population genetic differentiation, we performed individual-based PCA using the R 3.3.3 (R Core Team 2017) package ADEGENET (Jombart, 2008).
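As an illustration of the ΔK criterion mentioned above, the following sketch computes ΔK from replicate log probabilities, taking the absolute second-order difference of the mean ln Pr(X|K) divided by the standard deviation across runs. The function name `delta_k` and all ln Pr(X|K) values are hypothetical and are not taken from this study.

```python
from statistics import mean, stdev


def delta_k(lnp):
    """lnp maps K -> list of ln Pr(X|K) values, one per replicate run."""
    avg = {k: mean(v) for k, v in lnp.items()}
    out = {}
    for k in sorted(lnp):
        # The second-order difference needs both neighbours, K - 1 and K + 1.
        if k - 1 in avg and k + 1 in avg:
            second_diff = abs(avg[k + 1] - 2 * avg[k] + avg[k - 1])
            out[k] = second_diff / stdev(lnp[k])
    return out


# Hypothetical replicate likelihoods for K = 1..4 (three runs each)
lnp = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-900.0, -901.0, -899.0],
    3: [-890.0, -895.0, -885.0],
    4: [-888.0, -880.0, -896.0],
}
dk = delta_k(lnp)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Because the statistic requires both neighbouring values of K, it is undefined for the smallest and largest K tested, so it cannot by itself favour K = 1; this is one reason to inspect Pr(X|K) alongside ΔK, as done here.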

PHYLOGENOMIC INFERENCE

We inferred the phylogenetic relationships among the studied populations using the coalescent model implemented in the SNAPP v.1.3.0 (Bryant et al., 2012) plug-in for BEAST v.2.4.5 (Bouckaert et al., 2014). Due to the large computational demands of this program, SNAPP analyses were conducted using a random subset of 2,500 SNPs and including four populations (TAJU, PHUE, BONI, and ORCE) representative of the main geographical areas (i.e. populations separated > 80 km; Fig. 1a) and the three genetic groups identified by PCA and Bayesian clustering analyses in STRUCTURE (Northern, Central and Southern clusters) (see RESULTS section). We ran these analyses using different theta priors to allow for different current and ancestral population sizes (scenario 1: α = 2; β = 200; and scenario 2: α = 2; β = 20,000). The forward (u) and reverse (v) mutation rates were set to be calculated by SNAPP and the remaining parameters were left at default values. We used the phrynomics R script written by Barb Banbury (https://github.com/bbanbury/phrynomics) to remove non-binary and invariant SNPs, code heterozygotes, and format input files for SNAPP. We used different starting seed numbers to run two independent runs for each scenario, each with >5 million generations sampled every 1,000 steps. Each run was inspected in TRACER v.1.6 (Rambaut et al., 2018) in order to check the convergence of the chains to stationarity and confirm that Effective Sample Sizes (ESS) for all parameters were always much higher than 200. Afterwards, we combined the two replicate runs for each analysis using LOGCOMBINER v.2.4.5, discarded 10% of trees as burn-in and used TREEANNOTATOR v.2.4.5 to obtain maximum clade credibility trees. Phylogenetic trees were displayed with DENSITREE v.2.2.5 (Bouckaert & Heled, 2014). Complementary to SNAPP, we also ran phylogenetic analyses using SVDQUARTETS (Chifman & Kubatko, 2014) as implemented in PAUP* v.4.0a152 (Swofford, 2002). Analyses with SVDQUARTETS included Dociostaurus maroccanus (Thunberg, 1815) as outgroup. Phylogenetic trees were constructed by exhaustively evaluating all possible quartets from the dataset and uncertainty in relationships was quantified using 1,000 bootstrapping replicates.

COALESCENT-BASED DEMOGRAPHIC MODELS

We used FASTSIMCOAL2 and the site frequency spectrum (SFS) (Excoffier & Foll, 2011; Excoffier et al., 2013) to compare six hypothetical models of gene flow (see Fig. 2), calculate the composite likelihood of the observed data given a specified model, and estimate divergence times (t), effective population sizes (θ), and migration rates per generation (m) under the best supported model/s. For FASTSIMCOAL2 analyses we considered the three genetic groups inferred by STRUCTURE and PCAs (Northern, Central and Southern) and the topology yielded by phylogenomic analyses in SNAPP (see RESULTS section) (Eaton et al., 2015; Lanier et al., 2015). For each of the three population groups considered in the simulations, we selected 11 individuals from the Northern cluster, 12 individuals from the Central cluster, and the 6 individuals from the Southern cluster. A folded joint SFS was calculated considering a single SNP per locus to avoid the effects of linkage disequilibrium (Papadopoulou & Knowles, 2015). Because we did not consider invariable sites in the SFS (i.e. “removeZeroSFS” option in FASTSIMCOAL2), we fixed the effective population size for one of the populations (ORCE; θS) to enable the estimation of other parameters in FASTSIMCOAL2 (Excoffier et al., 2013; Lanier et al., 2015; Papadopoulou & Knowles, 2015; Ortego et al., 2018). The effective population size fixed in the models was calculated from the level of nucleotide diversity (π) and estimates of mutation rate per site per generation (μ), since Ne = π/(4μ). Nucleotide diversity (π) for the population ORCE was estimated from polymorphic and non-polymorphic loci using STACKS (π = 0.0011; Table S1). We considered an average mutation rate per site per generation (Keightley et al., 2009; Papadopoulou & Knowles, 2015) of 3.50 × 10⁻⁹. To remove all missing data for the calculation of the joint SFS and minimize errors with allele frequency estimates, each population group was downsampled to 4-8 individuals (Northern group: 7 individuals; Central group: 8 individuals; Southern group: 4 individuals) using a custom Python script written by Qixin He and available on Dryad (Papadopoulou & Knowles, 2015). The final SFS contained information for 10,167 variable SNPs.
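The relationship Ne = π/(4μ) used to fix the effective population size of ORCE can be checked with a short worked calculation using the two values reported above. The variable names are ours, and the result is only what the reported, rounded π and μ imply; the exact value entered in the models is not stated in this section.

```python
# Values reported in the text
pi_orce = 0.0011   # nucleotide diversity of ORCE (Table S1)
mu = 3.50e-9       # mutation rate per site per generation

# Diploid effective population size implied by Ne = pi / (4 * mu)
ne_orce = pi_orce / (4 * mu)
print(round(ne_orce))
```

With these inputs the implied value is on the order of 78,600 diploid individuals, which has the same order of magnitude as the effective sizes later estimated for the other population groups.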

Figure 2 Alternative migration models tested using FASTSIMCOAL2. Parameters include ancestral (θANC, θN-C) and contemporary (θN, θC, θS) effective population sizes, timing of population split (TDIV2, TDIV1), and migration rates (m) between different pairs of populations. An asterisk and bold type indicate the three best supported migration models (see the RESULTS section and Table 2 for more details).

Each of the six models was run 100 times (independent replicates) using the computing resources provided by CESGA (Galician Supercomputer Center, Spain) and considering 100,000-250,000 simulations for the calculation of the composite likelihood, 10-40 expectation-conditional maximization (ECM) cycles, and a stopping criterion of 0.001 (Papadopoulou & Knowles, 2015; Ortego et al., 2018). We used an information-theoretic model selection approach based on Akaike’s information criterion (AIC) to determine the probability of each model given the observed data (Burnham & Anderson, 1998; Abascal et al., 2016; Thome & Carstens, 2016). After the maximum likelihood was estimated for each model, we calculated the AIC scores (Thome & Carstens, 2016). AIC values for each model were rescaled (ΔAIC) by calculating the difference between the AIC value of each model and the minimum AIC obtained among all competing models (i.e. the best model has ΔAIC = 0). Confidence intervals of parameter estimates for the best supported models were obtained from 100 parametric bootstrap replicates by simulating SFS from the maximum composite likelihood point estimates and re-estimating parameters each time (Lanier et al., 2015).
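The AIC rescaling and weighting described above can be reproduced from the log-likelihoods (lnL) and parameter counts (k) reported in Table 2. The sketch below uses the standard formulas AIC = 2k − 2lnL and ω = exp(−ΔAIC/2) / Σ exp(−ΔAIC/2); the variable names are ours, and because lnL is reported to two decimals, recomputed AIC values may differ from the table in the last digit.

```python
from math import exp

# (lnL, k) for each migration model, copied from Table 2
models = {
    "A": (-19539.06, 11),
    "B": (-19539.18, 10),
    "C": (-19546.21, 10),
    "D": (-19540.48, 9),
    "E": (-19566.71, 9),
    "F": (-19567.69, 8),
}

# AIC = 2k - 2 lnL
aic = {name: 2 * k - 2 * lnl for name, (lnl, k) in models.items()}

# Rescale against the best (lowest-AIC) model
best_aic = min(aic.values())
delta = {name: value - best_aic for name, value in aic.items()}

# Akaike weights: relative likelihoods normalised to sum to one
rel = {name: exp(-0.5 * d) for name, d in delta.items()}
total = sum(rel.values())
weights = {name: r / total for name, r in rel.items()}

for name in sorted(models):
    print(name, round(aic[name], 2), round(delta[name], 2), round(weights[name], 2))
```

Rounded to two decimals, the recomputed weights match those listed in Table 2, with model B receiving the highest weight and models A and D falling within two AIC units of it.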

LANDSCAPE GENETIC ANALYSES

We generated alternative spatially-explicit isolation-by-resistance (IBR) scenarios of population connectivity and tested which one is better supported by observed data of genetic differentiation (Wang, 2013). We applied circuit theory and used CIRCUITSCAPE 4.0 (McRae, 2006; McRae & Beier, 2007) to calculate resistance distance matrices between all pairs of populations under five hypothetical scenarios of gene flow: (i) a “flat” landscape in which all cells have equal resistance (resistance = 1), which is analogous to geographical distance but more appropriate for comparison with other competing models also generated with CIRCUITSCAPE (Noguerales et al., 2016); (ii) topographic roughness (slope); (iii) resistance offered by the boundaries of the main river basins from the study area (Tagus, Guadiana, Guadalquivir, Júcar, and Segura rivers; Fig. 1); (iv) resistance offered by non-natural landscapes and natural habitats not occupied by the species; and (v) resistance offered by areas with lithologies where the species is not present. Topographic roughness (slope) was calculated using a 90-m resolution digital elevation model from NASA Shuttle Radar Topography Mission (SRTM Digital Elevation Data; http://srtm.csi.cgiar.org/) and the final layer was transformed to 30 arc-sec (c. 1 km) resolution for subsequent analyses. Natural habitats occupied by the species, natural habitats not occupied by the species, and non-natural habitats were defined according to Corine Land Cover maps (CORINE land cover, 2012). We considered as natural habitats occupied by the species the Corine Land Cover categories “Natural grassland” and “Sclerophyllous vegetation”, which represent the two habitat classes used by the species according to our own occurrence data (Cordero et al., 2010). Natural habitats not occupied by the species included all other habitats falling within the category “Forest and semi-natural areas” plus the category “Pastures”. Non-natural habitats not occupied by the species grouped all other land cover categories, including agricultural areas and artificial surfaces (CORINE land cover, 2012). The lithological categories constituting the typical habitats occupied by the species (evaporites, limestones, and conglomerates) were identified according to our own occurrence data (Cordero et al., 2010) and mapped using the spatial dataset OneGeology-Europe (http://info.igme.es/cartografia/oneGeology.asp?mapa=oneGeology). In scenarios iii-v we assigned a range of resistance values (2.5-1,000,000) to the barrier (limit of main river basins) or the areas not occupied by the species (non-suitable habitats/lithologies), which allowed us to identify the resistance value for these landscape features that best fits our data of genetic differentiation (FST) (Andrew et al., 2012; Ortego et al., 2015a). Non-natural habitats (agricultural areas and artificial surfaces) were assumed to offer twice the resistance of natural habitats not occupied by the species (Table S3). Background areas (i.e. areas within main river basins and habitats/lithologies occupied by the species) were given a fixed value of 1.
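The resistance coding used in the habitat scenario (iv) can be sketched as a simple reclassification of a land-cover grid. This is a toy illustration only: the class codes, the function `resistance_surface`, the 2 × 3 grid and the intermediate candidate values are hypothetical, and the actual surfaces were built from Corine Land Cover rasters in a GIS.

```python
# Hypothetical integer codes for the three habitat classes
OCCUPIED, NATURAL_UNOCCUPIED, NON_NATURAL = 0, 1, 2


def resistance_surface(habitat, r):
    """Reclassify a grid of habitat codes into resistance values.

    Occupied habitat keeps the background value of 1, natural habitat not
    occupied by the species receives the candidate resistance r, and
    non-natural habitat receives twice that value.
    """
    lookup = {
        OCCUPIED: 1.0,
        NATURAL_UNOCCUPIED: float(r),
        NON_NATURAL: 2.0 * r,
    }
    return [[lookup[cell] for cell in row] for row in habitat]


# Toy 2 x 3 land-cover grid
grid = [
    [OCCUPIED, NATURAL_UNOCCUPIED, NON_NATURAL],
    [NON_NATURAL, OCCUPIED, OCCUPIED],
]

# One surface per candidate resistance value (illustrative values within
# the 2.5-1,000,000 range mentioned in the text)
surfaces = {r: resistance_surface(grid, r) for r in (2.5, 100, 1_000_000)}
print(surfaces[2.5])
```

Each surface would then be passed to the connectivity software to obtain a pairwise resistance-distance matrix, and the candidate value giving the best fit to FST retained for that scenario.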
All maps and GIS calculations were performed using ARCMAP v.10.2.1 (ESRI, Redlands, CA, USA). In CIRCUITSCAPE, we employed a four-neighbor cell connection scheme so that the resistance assigned to river basin boundaries was effective, as linear landscape features become permeable through pixel corners under the eight-neighbor cell connection scheme (McRae, 2006). Finally, we determined how well the different landscape resistance models fit observed data of genetic differentiation (FST) using multiple matrix regressions with randomization (MMRR) as implemented in R 3.3.3 (Wang, 2013). The final model was selected following a backward procedure, initially fitting all explanatory terms and progressively eliminating non-significant variables until all retained variables were significant. The significance of the variables excluded from the model was tested again until no additional variable reached significance (Ortego et al., 2015b).
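The randomization logic behind matrix regression can be sketched for the simplified case of a single predictor matrix. This is not the MMRR R function of Wang (2013), which handles several predictors at once and tests regression coefficients; it is a pure-Python illustration in which populations (rows and columns together) of the response matrix are permuted to build a null distribution for r². The function names and the 4 × 4 toy matrices are hypothetical.

```python
import random
from statistics import mean


def upper(mat):
    """Flatten the upper triangle (i < j) of a square distance matrix."""
    n = len(mat)
    return [mat[i][j] for i in range(n) for j in range(i + 1, n)]


def r_squared(x, y):
    """Coefficient of determination of a simple linear regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def matrix_regression(fst, resist, n_perm=999, seed=1):
    """Return (r2, p) for FST regressed on one resistance-distance matrix."""
    rng = random.Random(seed)
    x = upper(resist)
    observed = r_squared(x, upper(fst))
    n = len(fst)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        order = list(range(n))
        rng.shuffle(order)
        # Permute rows and columns jointly so each population keeps its
        # full set of pairwise distances (non-independence is preserved).
        permuted = [[fst[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if r_squared(x, upper(permuted)) >= observed:
            hits += 1
    return observed, hits / (n_perm + 1)


# Toy example: FST exactly proportional to resistance distance
resist = [[0, 1, 4, 9], [1, 0, 2, 7], [4, 2, 0, 3], [9, 7, 3, 0]]
fst = [[0.02 * v for v in row] for row in resist]
r2, p_value = matrix_regression(fst, resist)
print(r2, p_value)
```

Permuting whole populations rather than individual pairwise values is the key design choice, since the entries of a distance matrix are not independent observations.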

RESULTS

GENOMIC DATA AND GENETIC STATISTICS

A total of 91,666,732 reads were obtained for the 35 genotyped individuals of D. crassiusculus. The number of reads per individual (mean ± SD = 2,619,049 ± 841,054 reads) before and after different quality filtering steps is shown in Fig. S1. The datasets obtained with STACKS for p = 2 and p = 4 contained 80,534 and 65,459 unlinked SNPs, respectively. The datasets obtained with PYRAD for Wclust = 90% and MinCov = 11 and 23 contained 32,424 and 18,442 unlinked SNPs, respectively; and for Wclust = 95% and MinCov = 11 and 23 contained 42,053 and 23,628 unlinked SNPs, respectively. Population genetic statistics (P, π, HO, HE and FIS) calculated with STACKS for all positions (polymorphic and non-polymorphic) and considering loci that were represented in at least two (p = 2) and four populations (p = 4) and 50% of individuals within populations (r = 0.5) are presented in Table S1. Pair-wise FST values ranged from 0.063 to 0.237 and all were significantly different from zero based on 100 permutations (P < 0.05; Table S2).

POPULATION GENETIC STRUCTURE

STRUCTURE analyses based on a random subset of 10,000 unlinked SNPs from six different datasets obtained with STACKS and PYRAD considering different filtering/clustering parameters (see Supplementary Methods for further details) always identified K = 2 as the most likely clustering solution according to the ΔK criterion (Figs. S2-3). The two clusters presented no signature of genetic admixture and split the southernmost population (ORCE) from the remainder of the populations (Fig. 1d). STRUCTURE analyses for K = 3 divided Northern (TAJU-BELI) and Central populations (PHUE-SALI-BONI) into two different genetic clusters, but in this case the geographically closer populations (TAJU-BELI and PHUE-SALI) showed a considerable degree of genetic admixture (~25%) (Fig. 1d and Fig. S2). The results obtained with STRUCTURE were in agreement with those obtained from Principal Component Analyses (PCA), in which PC1 split the Southern population (ORCE) from the remainder of the populations, and PC2 separated Northern (TAJU-BELI) from Central populations (PHUE-SALI-BONI) (Fig. 3 and Fig. S4).

Figure 3 Principal component analyses (PCA) of genetic variation for populations of D. crassiusculus. Analyses are based on SNP datasets obtained with STACKS considering different filtering parameters: (a) 80,534 unlinked SNPs for p = 2; and (b) 65,459 unlinked SNPs for p = 4. Dotted-line rectangles group main population clusters. Population codes are described in Table 1.

PHYLOGENOMIC INFERENCE

Phylogenomic relationships among populations inferred by SNAPP were well-resolved and nodes presented high posterior probabilities (Fig. 4a). In agreement with analyses of genetic structure (STRUCTURE and PCAs), SNAPP analyses supported an earlier split of ORCE from the remainder of the populations, which in turn divided into Northern (TAJU) and Central populations (PHUE and BONI) (Fig. 4a). Analyses considering different current and ancestral population sizes (α = 2; β = 200 or α = 2; β = 20,000) and different population combinations for Central and Northern populations (i.e. BELI-SALI, BELI-PHUE, and TAJU-SALI) yielded analogous results (data not shown) (Ortego et al., 2018). The best tree from SVDQUARTETS yielded the same topology as SNAPP, but the relationships among populations were not well resolved (bootstrap support values < 70%), probably as a result of inter-population gene flow or incomplete lineage sorting (Fig. 4b).

Figure 4 Phylogenetic trees inferred with (a) SNAPP and (b) SVDQUARTETS considering four populations representative of the main geographical areas (populations separated > 80 km) and the three genetic groups identified by Principal Component Analyses (PCA) and Bayesian clustering analyses in STRUCTURE (Northern, Central and Southern populations). Bayesian posterior probabilities (for SNAPP) and bootstrapping support values (for SVDQUARTETS) are indicated on the nodes. Population codes are described in Table 1.

COALESCENT-BASED DEMOGRAPHIC MODELS

FASTSIMCOAL2 analyses supported models A, B and D (Fig. 2) as the best-fitting and statistically equivalent models (ΔAIC < 2.00; Table 2). These three migration models have in common that all of them consider gene flow between ancestral populations (mANC) (Fig. 2). Although analogous models without ancestral migration were tested (models C, E and F; Fig. 2), they were poorly supported (Table 2). Demographic parameters estimated under the three best supported models (A, B, and D) and their weighted averages are presented in Table 3.

Considering a 1-year generation time for D. crassiusculus (Cordero et al., 2010), FASTSIMCOAL2 analyses showed that the division between the Southern and Northern-Central populations (TDIV2) occurred ~126 ka (95% CIs: 90-197 ka), probably during the Eemian Interglacial period (115-130 ka) (Table 3). The weighted average estimate yielded by FASTSIMCOAL2 for the more recent split between Northern and Central populations (TDIV1) indicates that this event took place ~17 ka (95% CIs: 11-24 ka), around the last glacial maximum (LGM; 20 ka) (Table 3). Gene-flow estimates were low and the migration rate (m) inferred between Central and Southern populations (mC-S) was nearly an order of magnitude lower than the migration rate between Northern and Central populations (mN-C) and between ancestral populations (mANC) after the first population split (TDIV2) (Table 3).

Table 2 Comparison of alternative migration models (detailed in Fig. 2) tested using FASTSIMCOAL2. For each model, the table shows the maximum likelihood estimate (lnL), the number of parameters (k), the Akaike’s information criterion score (AIC), the difference in AIC value of each model from that of the strongest model (ΔAIC), and AIC weight (ωi). Best-supported equivalent models (ΔAIC < 2) are indicated in bold (Fig. 2).

Model   lnL          k    AIC         ΔAIC    ωi
A       -19,539.06   11   39,100.12    1.76   0.19
B       -19,539.18   10   39,098.36    0.00   0.46
C       -19,546.21   10   39,112.43   14.07   0.00
D       -19,540.48    9   39,098.97    0.61   0.34
E       -19,566.71    9   39,151.42   53.06   0.00
F       -19,567.69    8   39,151.38   53.02   0.00
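The AIC columns of Table 2 can be recomputed from the reported lnL and k values. The following is a minimal sketch, assuming the standard definitions AIC = 2k - 2 lnL and Akaike weights proportional to exp(-ΔAIC/2); the function and variable names are illustrative and are not taken from the FASTSIMCOAL2 workflow. Because the lnL values in Table 2 are rounded to two decimals, the recomputed AIC and ΔAIC for models C and D can differ from the table by 0.01.

```python
import math

# (lnL, k) for each migration model, copied from Table 2.
MODELS = {
    "A": (-19539.06, 11),
    "B": (-19539.18, 10),
    "C": (-19546.21, 10),
    "D": (-19540.48, 9),
    "E": (-19566.71, 9),
    "F": (-19567.69, 8),
}


def aic(ln_l, k):
    """Akaike's information criterion: AIC = 2k - 2 lnL."""
    return 2 * k - 2 * ln_l


def akaike_table(models):
    """Return {model: (AIC, delta_AIC, weight)} for a dict of (lnL, k)."""
    scores = {name: aic(ln_l, k) for name, (ln_l, k) in models.items()}
    best = min(scores.values())
    deltas = {name: score - best for name, score in scores.items()}
    # Relative likelihood of each model given the data.
    rel = {name: math.exp(-0.5 * delta) for name, delta in deltas.items()}
    total = sum(rel.values())
    return {name: (scores[name], deltas[name], rel[name] / total) for name in models}


table = akaike_table(MODELS)
for name, (score, delta, weight) in table.items():
    print(f"{name}: AIC={score:.2f}  dAIC={delta:.2f}  w={weight:.2f}")
```

Rounded to two decimals, the weights obtained for models A, B and D are 0.19, 0.46 and 0.34, in line with the ωi column of Table 2.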

Table 3 Parameters inferred from coalescent simulations with FASTSIMCOAL2 under the three best supported migration models (see Fig. 2 for details). The effective population size of one population (ORCE: θS) was calculated from nucleotide diversity estimates and fixed in the different models to enable the estimation of other parameters (see the METHODS section). The table shows point estimates under each model and model-averaged estimates with lower and upper 95% confidence intervals in parentheses. Estimates of time are given in units of generations.

Parameter   Model A        Model B        Model D        Model average (95% CIs)
θANC        99,783         95,231         111,992        101,840 (48,527 - 121,923)
θN-C        148,756        150,712        195,197        165,446 (134,574 - 201,611)
θN          163,990        166,040        130,914        153,687 (136,724 - 200,599)
θC          102,756        118,736        78,941         102,010 (91,758 - 131,587)
TDIV1       31,995         19,268         4,507          16,795 (11,509 - 23,731)
TDIV2       152,866        156,242        68,431         125,711 (90,218 - 197,204)
mANC        2.25 × 10^-5   1.45 × 10^-5   5.30 × 10^-6   1.30 × 10^-5 (8.37 × 10^-6 - 1.50 × 10^-4)
mN-C        3.86 × 10^-5   3.30 × 10^-5   -              3.47 × 10^-5 (2.69 × 10^-5 - 3.99 × 10^-5)
mC-S        1.83 × 10^-6   -              -              1.83 × 10^-6 (3.57 × 10^-7 - 2.97 × 10^-6)
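The model-averaged point estimates in Table 3 can be approximated by weighting the per-model estimates with Akaike weights derived from the ΔAIC values of Table 2. The sketch below rests on two assumptions that the text does not state explicitly: that the averages are Akaike-weighted means, and that weights are renormalised over the models that include a given parameter (e.g. mN-C is absent from Model D). Names are illustrative. The results are close to, but not identical with, the values reported in Table 3, so the exact averaging procedure may differ.

```python
import math

# Delta-AIC values of the three best-supported models (Table 2).
DELTA_AIC = {"A": 1.76, "B": 0.00, "D": 0.61}

# Point estimates from Table 3; None marks a parameter absent from a model.
ESTIMATES = {
    "TDIV1": {"A": 31995, "B": 19268, "D": 4507},
    "TDIV2": {"A": 152866, "B": 156242, "D": 68431},
    "mN-C": {"A": 3.86e-5, "B": 3.30e-5, "D": None},
}


def akaike_weights(delta_aic):
    """Akaike weights: exp(-delta/2), normalised to sum to one."""
    rel = {m: math.exp(-0.5 * d) for m, d in delta_aic.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


def model_average(values, weights):
    """Weighted mean over the models that include the parameter.

    Assumption: weights are renormalised over those models.
    """
    used = {m: v for m, v in values.items() if v is not None}
    norm = sum(weights[m] for m in used)
    return sum(weights[m] * v for m, v in used.items()) / norm


WEIGHTS = akaike_weights(DELTA_AIC)
AVERAGES = {p: model_average(v, WEIGHTS) for p, v in ESTIMATES.items()}

GENERATION_TIME_YEARS = 1  # 1-year generation time, as stated in the text
for param in ("TDIV1", "TDIV2"):
    ka = AVERAGES[param] * GENERATION_TIME_YEARS / 1000
    print(f"{param}: ~{ka:.0f} ka")
```

With a 1-year generation time, generations convert directly to years, which is why the averaged TDIV1 and TDIV2 values correspond to the ~17 ka and ~126 ka quoted in the text.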

LANDSCAPE GENETIC ANALYSES

Genetic differentiation was significantly and positively correlated with resistance distances obtained under all tested scenarios (Table S3). Hypothetical scenarios based on habitat and lithology reached the highest model fit at the lowest resistance value for the non-suitable category, indicating that they do not explain the data better than a flat landscape in which all cells have equal resistance (= 1) (Table S3; Fig. 5a). In contrast, model fit for the scenario incorporating the resistance offered by the boundaries of main river basins peaked when the resistance value offered by this landscape feature was set to 100 (R² = 0.830; P = 0.001) (Table S3; Fig. 5a). A multiple matrix regression with randomization (MMRR) analysis simultaneously considering the best-fit resistance value under each scenario showed that the scenario incorporating the resistance offered by the boundaries of main river basins was the only one retained in the final model (Table 4; Fig. 5b). This indicates that isolation in different river basins is the main factor explaining genetic differentiation in the species, with no apparent effect of topographic roughness, lithology or habitat (Table 4).

Figure 5 (a) Coefficient of determination (R²) for models analysing genetic differentiation (FST) in relation to resistance distances defined by limits of main river basins (blue dots/line), habitat (yellow dots/line), and lithology (red dots/line). Each scenario considered a range (2.5-1,000,000) of 23 hypothetical resistance values offered by the barrier (limit of main river basins) or the areas not occupied by the species (non-suitable habitats/lithologies). Resistance values for different scenarios (x-axis) are log-transformed for illustrative purposes. (b) Relationship between genetic differentiation (FST) and resistance distances calculated using CIRCUITSCAPE for the best fitting scenario (resistance offered by the boundaries of main river basins set to 100; see Table 4).

Table 4 Multiple matrix regression with randomization (MMRR) for genetic differentiation (FST) in relation to resistance distances defined by (i) a flat landscape, (ii) topographic roughness (slope), (iii) limits of main river basins, (iv) habitat, and (v) lithology. The last three scenarios initially considered a range (2.5-1,000,000) of hypothetical resistance values offered by the barrier (limit of main river basins) or the areas not occupied by the species (non-suitable habitats/lithologies), but only the resistance values (indicated in parentheses) yielding their respective best fitting models were included in this multivariate analysis (see Table S3).

Variable                             β         t         p
Explanatory terms
  Constant                           -0.246    -2.117    0.067
  Limits of main river basins (100)   1.382     7.973    0.001
Rejected terms
  Lithology (2.5)                               0.648    0.827
  Habitat (2.5)                                 2.366    0.312
  Topography (slope)                            0.304    0.836
  Flat landscape                               -1.102    0.257
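The logic of testing a distance matrix against a resistance matrix by permutation can be illustrated with a simplified, single-predictor example. This is a sketch only and not the multi-predictor MMRR procedure used in this chapter: it correlates the pairwise values of two matrices and assesses significance by jointly permuting the rows and columns of the response matrix. All data are hypothetical (six populations assigned to three basins, with arbitrary FST and resistance values), and every name in the code is illustrative. With only six populations, all 720 permutations can be enumerated exactly.

```python
from itertools import permutations
from math import sqrt


def lower_triangle(matrix, order):
    """Pairwise values below the diagonal after reordering populations."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(1, n) for j in range(i)]


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def matrix_permutation_test(response, predictor):
    """Exact two-sided test permuting rows and columns of the response jointly."""
    n = len(response)
    identity = tuple(range(n))
    x = lower_triangle(predictor, identity)
    observed = pearson_r(lower_triangle(response, identity), x)
    perms = list(permutations(range(n)))
    extreme = sum(
        1
        for p in perms
        if abs(pearson_r(lower_triangle(response, p), x)) >= abs(observed) - 1e-12
    )
    return observed, extreme / len(perms)


# Hypothetical example: six populations in three basins (not the study data).
BASIN = [0, 0, 1, 1, 1, 2]
N = len(BASIN)
RESISTANCE = [
    [0.0 if i == j else (1.0 if BASIN[i] == BASIN[j] else 100.0) for j in range(N)]
    for i in range(N)
]
FST = [
    [
        0.0 if i == j
        else (0.02 if BASIN[i] == BASIN[j] else 0.20) + 0.005 * abs(i - j)
        for j in range(N)
    ]
    for i in range(N)
]

r_obs, p_value = matrix_permutation_test(FST, RESISTANCE)
print(f"r = {r_obs:.3f}, P = {p_value:.4f}")
```

Permuting population labels, rather than individual pairwise values, preserves the non-independence of entries that share a population. The same consideration underlies the randomization step in matrix regression approaches. In this toy data set only the 12 relabelings that preserve basin membership match the observed correlation, which also shows how few populations limit the smallest attainable P-value.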

DISCUSSION

Genomic data revealed that populations of the endangered Iberian grasshopper D. crassiusculus show a marked hierarchical genetic structure, with the presence of two highly divergent cryptic lineages (Fig. 4) that comprise three genetic clusters (Figs. 1 and 2). One of the lineages is only represented by the highly isolated population (ORCE) located at the southernmost limit of the species' distribution (Southern cluster), whereas the other includes the remainder of the populations and is sub-structured into two genetic clusters (Northern and Central clusters). Our phylogenomic and coalescent-based analyses supported an early split of the two lineages and estimated that their divergence took place during the Upper Pleistocene (~126 ka), probably around the Eemian interglacial stage. The Northern (TAJU-BELI) and Central (SALI-PHUE-BONI) genetic clusters were estimated to have diverged much more recently (~17 ka), probably after the LGM. Note, however, that these estimates of divergence time must be interpreted with caution. In particular, the estimates of divergence time obtained under Model D differ markedly from those obtained under Models A-B (Table 3). The fact that Model D does not consider gene flow among contemporary populations is expected to have resulted in younger estimates of population split than in Models A-B. Thus, Model D and Models A-B are statistically indistinguishable but find two different solutions that fit our genomic data equally well (Models A-B: presence of contemporary gene flow and older estimates of divergence times; Model D: lack of contemporary gene flow and younger estimates of divergence times) (Tables 1-2). Statistical evaluation of alternative migration models showed that the most likely scenarios were always those considering ancient gene flow between ancestral populations and contemporary gene flow between populations from Northern and Central genetic clusters, although with very low absolute values for migration rates per generation (mANC = 1.30 × 10^-5; mN-C = 3.47 × 10^-5; Table 3 and Fig. 2). The consistent support for models including gene flow between ancestral populations (mANC) indicates that vicariance with multiple contacts (probably during glacial-interglacial cycles) is likely to have led to the current genetic structure of the species (i.e. isolation with gene flow). The best supported scenario (Model B) is the one considering gene flow between recently split populations across time, with higher contemporary migration rates among closer populations from Northern and Central genetic clusters (Fig. 2). Migration models involving gene flow with the Southern lineage were either not supported (Model C) or yielded point estimates of migration rates an order of magnitude lower (mC-S = 1.83 × 10^-6) than those inferred between Northern and Central populations (Model A) (Table 2; Fig. 2). These results are in agreement with Bayesian clustering analyses, which showed considerable genetic admixture (~25%) among nearby populations from Northern and Central genetic clusters but no signature of admixed ancestry for the Southern lineage (Fig. 1d). Thus, despite the small distribution range and the relatively short geographical distances separating the extant populations of D. crassiusculus, our results indicate that this species shows a remarkable genetic structure that is comparable to that reported for other Orthoptera taxa with patchy distributions and highly fragmented populations (Ortego et al., 2012; Ortego et al., 2015a; Tinnert et al., 2016; Schmid et al., 2018).

Our landscape genetic analyses indicate that geographical distance, the spatial distribution of suitable habitats, lithology or topography do not explain per se the degree of genetic differentiation among populations and reveal that the limits of major river basins are the main factor explaining large-scale patterns of genetic structure in D. crassiusculus (Fig. 1a). These results are in agreement with inferences from STRUCTURE and PCA analyses, which showed that the populations of the species are clustered according to the limits of main river basins: Northern genetic cluster in Tagus river basin, Central genetic cluster in Guadiana river basin, and Southern genetic cluster in Guadalquivir river basin (Fig. 1a). Apart from numerous freshwater fishes (Gómez & Lunt, 2006; Thacker et al., 2007), the importance of palaeo- and modern drainages in structuring genetic variation has also been reported in another steppe specialist grasshopper (Mioscirtus wagneri) presenting highly fragmented populations and inhabiting a similar geographic area (Ortego et al., 2012) and in geckos (genus Rhynchoedura) from arid regions of Australia (Pepper et al., 2011). These results indicate the importance of this landscape feature for the evolutionary histories of terrestrial organisms from steppe and arid landscapes (Pepper et al., 2011). Rivers themselves do not seem to be an important barrier to dispersal in our study system, as populations located within the same basin but on different sides of main river stems or their tributaries (e.g. TAJU and BELI) show low levels of genetic differentiation in comparison with populations located in different basins (Tantrawatpan et al., 2011). Estimates of divergence time among contemporary populations of D. crassiusculus (~17-126 ka; Table 3) and the timing of the species' split from its sister taxon D. kraussi (1.01 Ma; González-Serna et al., 2018) indicate that the origin of the species and its different lineages probably postdates the formation of the main river basins of central-southern Iberia, which are thought to have acquired their current configuration during the Oligocene-Pliocene (Doadrio, 1988; Gómez & Lunt, 2006; Pepper et al., 2011). Thus, the different genetic clusters and lineages are not likely to have resulted from population isolation in different palaeodrainages or ancient geological surfaces (Pepper et al., 2008), but probably reflect the role of river drainages and lowlands as corridors of suitable habitat facilitating connectivity among populations located within the same basin (Ortego et al., 2012). Given that a predominantly flat landscape characterizes the distribution area of D. crassiusculus and the main drainages are not separated by an abrupt topography (i.e. mountain systems), our results suggest that populations of the species have probably remained linked to lowlands (e.g. pseudo-steppe saline low grounds) from different river basins (Peinado, 1994; Ortego et al., 2012) rather than physically separated by ridges representing the divides between drainages. Accordingly, our analyses indicate that other landscape features such as topographic roughness (slope) or the distribution of the typical habitats and lithological formations occupied by the species are not important factors explaining spatial patterns of genetic structure in D. crassiusculus (Table 4). Previous studies have identified topographic roughness as a relevant factor shaping genetic differentiation in two montane grasshoppers inhabiting areas with abrupt landscapes (Noguerales et al., 2016; Noguerales et al., 2017), a situation contrasting with the predominantly flat areas characterizing the distribution range of D. crassiusculus (Cordero et al., 2010). The widespread presence of sedimentary lithologies (evaporites, limestones, and conglomerates) across the distribution range of the species could have reduced our ability to identify barriers to dispersal linked to unsuitable geological formations or, alternatively, might reflect the capacity of the species to cross them. In any case, we must point out that our landscape genetic analyses should be interpreted with extreme caution, given that the very few extant populations of the species (n = 6) strongly limit the power of our analyses and the scope of the obtained inferences.

Coalescent-based analyses support the view that range-wide patterns of genetic structure in D. crassiusculus are a consequence of ancient processes of population fragmentation (~17-126 ka; Table 3) that predate the Anthropocene. Accordingly, landscape genetic analyses suggest that land clearing for agriculture is not likely to explain large-scale patterns of genetic fragmentation (Fig. 5; Table 4). Based on the degree of divergence between the different lineages and genetic clusters, we recommend that the Northern, Central and Southern groups be recognized as Evolutionarily Significant Units (ESU; Waples et al., 1991), Designatable Units (DU; Green, 2005) or Conservation Significant Units (CSU; Yuan et al., 2011). These entities are likely to be substantially reproductively isolated from each other, represent an important component in the evolutionary legacy of the species, and include all discrete genetic and geographic subunits below the species level for status assessment, establishing conservation priorities and setting on-ground management strategies (Waples et al., 1991; Moritz, 1994). Of particular concern is the highly divergent Southern lineage because, as far as we know, it is currently represented by a single small population (ORCE) within the Guadalquivir river basin and Andalucía region (Fig. 1a,d). The correspondence between the identified units (lineages and genetic clusters) and the circumscription of different government administrations (Madrid, Castilla-La Mancha and Andalucía regions) could facilitate the establishment of regional conservation plans aimed at implementing the most efficient management strategies within each territory. Although nearby populations (TAJU-BELI and PHUE-SALI) showed no apparent signatures of genetic fragmentation (Fig. 1d), we must point out that several lines of evidence suggest that this finding is not incompatible with a dramatic impact of human activities on the decline of the species at local and regional scales (Cordero et al., 2010). For instance, historical museum records indicate that many populations from the Northern cluster (specifically, in Madrid province) have been extirpated in the last decades (Gangwere et al., 1985; Cordero et al., 2010). All remaining populations are extremely small and subject to severe impacts of human intervention (e.g. land ploughing, urbanization) and stochastic phenomena (e.g. flash flooding) that have been linked to sharp population declines (Gangwere et al., 1985; Cordero et al., 2010). The expected time-lag between population fragmentation/declines and the genetic consequences of such processes (disruption of gene flow, genetic differentiation, loss of genetic diversity, etc.) might explain why recent human impacts have not yet been reflected in spatial patterns of genetic variation (Cushman et al., 2006). Unfortunately, the small number of extant populations at local/regional scales (1-2 populations/genetic cluster) makes it difficult to perform detailed analyses to evaluate the role of current landscape structure (e.g. land clearing for agriculture, urbanization, etc.) on the genetic connectivity of contemporary populations (Zellmer & Knowles, 2009; Ortego et al., 2015a). Future genomic analyses of specimens available in museum collections (Cordero et al., 2010) could help to determine temporal changes in genetic diversity and study past patterns of gene flow in relation to historical landscape composition (Schmid et al., 2018).

Overall, our genomic data support the view that the different lineages and genetic clusters of D. crassiusculus can be regarded as independent units that require adequate conservation and management strategies to preserve their idiosyncratic evolutionary histories. Conservation actions for D. crassiusculus should focus on the preservation of areas with sensitive habitat occupied by the main lineages and units delineated by our genomic analyses. These should include the control of negative human interventions and the monitoring of local populations, actions that could also benefit other co-distributed and poorly-known species with similar ecological requirements and fragmented populations linked to gypsum and salt steppes of the Iberian Peninsula (Ribera & Blasco-Zumeta, 1998; Cordero & Llorente, 2008; Ortego et al., 2010; Ortego et al., 2015a; Ortego et al., 2015b). Given the extremely low number and size of extant populations of the species, ex-situ conservation plans and reintroduction/translocation programmes in restored habitats could help to reduce the chances of species/lineage extinction (Baur et al., 2017; Perl et al., 2018). These conservation actions should always consider the genomic singularity of the different units identified in this study and be accompanied by long-term habitat management and population monitoring (Baur et al., 2017; Tian et al., 2018). Future studies including detailed ecological information (e.g. diet analyses: McClenaghan et al., 2015) and genome scans to detect potential loci under selection implicated in ecological adaptation (Apple et al., 2010; Feng et al., 2015) would greatly improve our understanding of the processes underlying the evolutionary history of the different lineages and help to refine the conservation actions for this endangered species.

ACKNOWLEDGEMENTS

We wish to thank Milagros Coca-Abia and José Ramón Correas for providing us with information about D. crassiusculus locations from Orce and Perales de Tajuña, respectively. We also thank Anna Papadopoulou for her support with genomic data analyses and David Aragonés (LAST-EBD) for his help with GIS analyses. We thank the Centro de Supercomputación de Galicia (CESGA) for access to computer resources. The respective administrative authorities from each study area (Madrid, Castilla-La Mancha and Andalucía) provided us with the corresponding permits for sampling. MJG was supported by a pre-doctoral scholarship from Junta de Comunidades de Castilla-La Mancha and European Social Fund. JO was supported by a Ramón y Cajal (RYC-2013-12501) research fellowship. This work received financial support from research grants CGL2011-25053, CGL2014-54671-P, and CGL2016-80742-R (co-funded by the Dirección General de Investigación y Gestión del Plan Nacional I+D+i and European Social Fund); POII10-0197-0167 and PEII-2014-023-P (co-funded by Junta de Comunidades de Castilla-La Mancha and European Social Fund).

REFERENCES

Abascal, F., Corvelo, A., Cruz, F. et al. (2016) Extreme genomic erosion after recurrent demographic bottlenecks in the highly endangered Iberian lynx. Genome Biology, 17, 251.

Aitken, S.N. & Whitlock, M.C. (2013) Assisted gene flow to facilitate local adaptation to climate change. Annual Review of Ecology, Evolution, and Systematics, 44, 367-388.

Andrew, R.L., Ostevik, K.L., Ebert, D.P. & Rieseberg, L.H. (2012) Adaptation with gene flow across the landscape in a dune sunflower. Molecular Ecology, 21, 2078-2091.

Apple, J.L., Grace, T., Joern, A. et al. (2010) Comparative genome scan detects host-related divergent selection in the grasshopper Hesperotettix viridis. Molecular Ecology, 19, 4012-4028.

Barredo, J.I., Caudullo, G. & Dosio, A. (2016) Mediterranean habitat loss under future climate conditions: Assessing impacts on the Natura 2000 protected area network. Applied Geography, 75, 83-92.

Baur, B., Thommen, G.H. & Coray, A. (2017) Dynamics of reintroduced populations of Oedipoda caerulescens (Orthoptera, Acrididae) over 21 years. Journal of Insect Science, 17, 10.

Blondel, J. & Aronson, J. (1999) Biology and wildlife of the Mediterranean region. 328 pp. Oxford University Press. New York, USA.

Bouckaert, R. & Heled, J. (2014) DENSITREE 2: seeing trees through the forest. bioRxiv 012401. doi: 10.1101/012401

Bouckaert, R., Heled, J., Kühnert, D. et al. (2014) BEAST 2: a software platform for Bayesian evolutionary analysis. PLOS Computational Biology, 10, e1003537.

Brooks, T.M., Mittermeier, R.A., da Fonseca, G.A. et al. (2006) Global biodiversity conservation priorities. Science, 313, 58-61.

Brown, J.L., Weber, J.J., Alvarado-Serrano, D.F., Hickerson, M.J. et al. (2016) Predicting the genetic consequences of future climate change: The power of coupling spatial demography, the coalescent, and historical landscape changes. American Journal of Botany, 103, 153-163.

Bryant, D., Bouckaert, R., Felsenstein, J. et al. (2012) Inferring species trees directly from biallelic genetic markers: bypassing gene trees in a full coalescent analysis. Molecular Biology and Evolution, 29, 1917-1932.

Burnham, K.P. & Anderson, D.R. (1998) Model selection and inference: a practical information-theoretic approach. 355 pp. Springer New York, USA.

Catchen, J.M., Amores, A., Hohenlohe, P.A. et al. (2011) STACKS: building and genotyping loci de novo from short-read sequences. G3: Genes|Genomes|Genetics, 1, 171-182.

Catchen, J.M., Hohenlohe, P.A., Bassham, S. et al. (2013) STACKS: an analysis tool set for population genomics. Molecular Ecology, 22, 3124-3140.

Chifman, J. & Kubatko, L. (2014) Quartet inference from SNP data under the coalescent. Bioinformatics, 30, 3317-3324.

Cordero, P.J. & Llorente, V. (2008) New data on the 'silver-bell cricket' (Orthoptera, Gryllidae), a forgotten and overlooked cricket subject to a high risk of extinction in western Europe. Graellsia, 64, 171-180.

Cordero, P.J., Llorente, V., Aguirre, M.P. & Ortego, J. (2010) Dociostaurus crassiusculus (Pantel, 1886), especie (Orthoptera: Acrididae) rara en la Península ibérica con poblaciones locales en espacios singulares de Castilla-La Mancha (España). Boletín de la Sociedad Entomológica Aragonesa, 46, 461-465.

CORINE land cover (2012) EEA. Commission of the European Communities, Luxembourg, https://land.copernicus.eu/pan-european/corine-land-cover/clc-2012.

Cullingham, C.I., Kyle, C.J., Pond, B.A. et al. (2009) Differential permeability of rivers to raccoon gene flow corresponds to rabies incidence in Ontario, Canada. Molecular Ecology, 18, 43-53.

Cunningham, M. & Moritz, C. (1998) Genetic effects of forest fragmentation on a rainforest restricted lizard (Scincidae: Gnypetoscincus queenslandiae). Biological Conservation, 83, 19-30.

Cushman, S.A., McKelvey, K.S., Hayden, J. & Schwartz, M.K. (2006) Gene flow in complex landscapes: Testing multiple hypotheses with causal modeling. The American Naturalist, 168, 486-499.

Doadrio, I. (1988) Delimitation of areas in the Iberian Peninsula on the basis of freshwater fishes. Bonner Zoologische Beiträge, 39, 113-128.

Earl, D.A. & vonHoldt, B.M. (2012) STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conservation Genetics Resources, 4, 359-361.

Eaton, D.A. (2014) PYRAD: assembly of de novo RADseq loci for phylogenetic analyses. Bioinformatics, 30, 1844-1849.

Eaton, D.A., Hipp, A.L., González-Rodríguez, A. & Cavender-Bares, J. (2015) Historical introgression among the American live oaks and the comparative nature of tests for introgression. Evolution, 69, 2587-2601.

Espindola, A., Pellissier, L., Maiorano, L. et al. (2012) Predicting present and future intra-specific genetic structure through niche hindcasting across 24 millennia. Ecology Letters, 15, 649-657.

Evanno, G., Regnaut, S. & Goudet, J. (2005) Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular Ecology, 14, 2611-2620.

Excoffier, L. & Foll, M. (2011) FASTSIMCOAL: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios. Bioinformatics, 27, 1332-1334.

Excoffier, L. & Lischer, H.E. (2010) ARLEQUIN suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Molecular Ecology Resources, 10, 564-567.

Excoffier, L., Dupanloup, I., Huerta-Sánchez, E. et al. (2013) Demographic inference from genomic and SNP data. PLOS Genetics, 9, e1003905.

Faille, A., Andújar, C., Fadrique, F. & Ribera, I. (2014) Late Miocene origin of an Ibero-Maghrebian clade of ground beetles with multiple colonizations of the subterranean environment. Journal of Biogeography, 41, 1979-1990.

Falush, D., Stephens, M. & Pritchard, J.K. (2003) Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics, 164, 1567-1587.

Feng, X.J., Jiang, G.F. & Fan, Z. (2015) Identification of outliers in a genomic scan for selection along environmental gradients in the bamboo locust, Ceracris kiangsu. Scientific Reports, 5, 13758.

Ferrer-Castán, D. & Vetaas, O.R. (2005) Pteridophyte richness, climate and topography in the Iberian Peninsula: comparing spatial and nonspatial models of richness patterns. Global Ecology and Biogeography, 14, 155-165.

Frankham, R. & Ralls, K. (1998) Conservation biology - Inbreeding leads to extinction. Nature, 392, 441-442.

Frankham, R. (1995) Conservation genetics. Annual Review of Genetics, 29, 305-327.

Fraser, D.J. & Bernatchez, L. (2001) Adaptive evolutionary conservation: towards a unified concept for defining conservation units. Molecular Ecology, 10, 2741-2752.

Gangwere, S.K., Viedma, M.G. d. & Llorente, V. (1985) Libro rojo de los ortópteros ibéricos. 91 pp. Instituto Nacional para la Conservación de la Naturaleza (Ed.). Monografía 41, Ministerio de Agricultura, Pesca y Alimentación. Madrid, Spain.

Giorgi, F. & Lionello, P. (2008) Climate change projections for the Mediterranean region. Global and Planetary Change, 63, 90-104.

Gómez, A. & Lunt, D.H. (2006) Refugia within Refugia: Patterns of phylogeographic concordance in the Iberian Peninsula. In: Phylogeography of Southern European Refugia. 155-188 pp. Weiss, S. & Ferrand, N. (Eds.). Springer. The Netherlands.

González-Serna, M.J., Ortego, J. & Cordero, P.J. (2018) A review of cross-backed grasshoppers of the genus Dociostaurus Fieber (Orthoptera: Acrididae) from the western Mediterranean: insights from phylogenetic analyses and DNA-based species delimitation. Systematic Entomology, 43, 136-146.

Green, D.M. (2005) Designatable units for status assessment of endangered species - Unidades designatables para la evaluación del estatus de especies en peligro. Conservation Biology, 19, 1813-1820.

Hanski, I. (2011) Habitat loss, the dynamics of biodiversity, and a perspective on conservation. Ambio, 40, 248-255.

Harrison, S. (1997) How natural habitat patchiness affects the distribution of diversity in Californian serpentine chaparral. Ecology, 78, 1898-1906.

Hartl, D.L. & Clark, A.G. (2007) Principles of population genetics. 545 pp. Sinauer Associates, Inc. Publishers (4th Ed.). Sunderland. Massachusetts, USA.

Harz, A. (1975) Die Orthopteren Europas II / The Orthoptera of Europe II. 941 pp. Springer. The Netherlands.

Hewitt, G. (2000) The genetic legacy of the Quaternary ice ages. Nature, 405, 907-913.

Hochkirch, A., Nieto, A., García-Criado, M. et al. (2016) European red list of grasshoppers, crickets and bush-crickets. 94 pp. Publications Office of the European Union. Luxembourg.

Hodjat, S.H. (2016) A review of Iranian Dociostaurini (Orthoptera: Gomphocerinae) with keys to their species. Entomologia Generalis, 35, 253-268.

Hohenlohe, P.A., Bassham, S., Etter, P.D. et al. (2010) Population genomics of parallel adaptation in threespine stickleback using sequenced RAD tags. PLOS Genetics, 6, e1000862.

Holsinger, K.E. & Weir, B.S. (2009) Genetics in geographically structured populations: defining, estimating and interpreting FST. Nature Reviews Genetics, 10, 639-650.

Hubisz, M.J., Falush, D., Stephens, M. & Pritchard, J.K. (2009) Inferring weak population structure with the assistance of sample group information. Molecular Ecology Resources, 9, 1322-1332.

Jakobsson, M. & Rosenberg, N.A. (2007) CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure. Bioinformatics, 23, 1801-1806.

Jay, F., Manel, S., Álvarez, N., Durand, E.Y. et al. (2012) Forecasting changes in population genetic structure of alpine plants in response to global warming. Molecular Ecology, 21, 2354-2368.

Jombart, T. (2008) ADEGENET: A R package for the multivariate analysis of genetic markers. Bioinformatics, 24, 1403-1405.

Kalkvik, H.M., Stout, I.J. & Parkinson, C.L. (2012) Unraveling natural versus anthropogenic effects on genetic diversity within the southeastern beach mouse (Peromyscus polionotus niveiventris). Conservation Genetics, 13, 1653-1664.

Keightley, P.D., Trivedi, U., Thomson, M. et al. (2009) Analysis of the genome sequences of three Drosophila melanogaster spontaneous mutation accumulation lines. Genome Research, 19, 1195-1201.

Lanier, H.C., Massatti, R., He, Q. et al. (2015) Colonization from divergent ancestors: glaciation signatures on contemporary patterns of genomic variation in Collared Pikas (Ochotona collaris). Molecular Ecology, 24, 3688-3705.

Lindenmayer, D.B. & Fischer, J. (2006) Habitat Fragmentation and Landscape Change: An Ecological and Conservation Synthesis. 352 pp. Island Press. Washington, USA.

Lischer, H.E. & Excoffier, L. (2012) PGDSPIDER: an automated data conversion tool for connecting population genetics and genomics programs. Bioinformatics, 28, 298-299.

Manel, S., Schwartz, M.K., Luikart, G. & Taberlet, P. (2003) Landscape genetics: combining landscape ecology and population genetics. Trends in Ecology & Evolution, 18, 189-197.

McClenaghan, B., Gibson, J.F., Shokralla, S. & Hajibabaei, M. (2015) Discrimination of grasshopper (Orthoptera: Acrididae) diet and niche overlap using next-generation sequencing of gut contents. Ecology and Evolution, 5, 3046-3055.

McRae, B.H. (2006) Isolation by resistance. Evolution, 60, 1551-1561.

McRae, B.H. & Beier, P. (2007) Circuit theory predicts gene flow in plant and animal populations. Proceedings of the National Academy of Sciences of the United States of America, 104, 19885-19890.

Médail, F. & Quézel, P. (1999) Biodiversity Hotspots in the Mediterranean Basin: Setting Global Conservation Priorities. Conservation Biology, 13, 1510-1513.

Moritz, C. (1994) Defining 'Evolutionarily Significant Units' for conservation. Trends in Ecology & Evolution, 9, 373-375.

Moritz, C. (2002) Strategies to protect biological diversity and the evolutionary processes that sustain it. Systematic Biology, 51, 238-254.

Myers, N., Mittermeier, R.A., Mittermeier, C.G. et al. (2000) Biodiversity hotspots for conservation priorities. Nature, 403, 853-858.

Nei, M. & Kumar, S. (2000) Molecular evolution and phylogenetics. 352 pp. Oxford University Press. New York, USA.

Noguerales, V., Cordero, P.J. & Ortego, J. (2016) Hierarchical genetic structure shaped by topography in a narrow-endemic montane grasshopper. BMC Evolutionary Biology, 16, 96.

Noguerales, V., Cordero, P.J. & Ortego, J. (2017) Testing the role of ancient and contemporary landscapes on structuring genetic variation in a specialist grasshopper. Ecology and Evolution, 7, 3110-3122.

Ortego, J., Aguirre, M.P. & Cordero, P.J. (2010) Population genetics of Mioscirtus wagneri, a grasshopper showing a highly fragmented distribution. Molecular Ecology, 19, 472-483.

Ortego, J., Aguirre, M.P. & Cordero, P.J. (2012) Landscape genetics of a specialized grasshopper inhabiting highly fragmented habitats: a role for spatial scale. Diversity and Distributions, 18, 481-492.

Ortego, J., Aguirre, M.P., Noguerales, V. & Cordero, P.J. (2015a) Consequences of extensive habitat fragmentation in landscape-level patterns of genetic diversity and structure in the Mediterranean esparto grasshopper. Evolutionary Applications, 8, 621-632.

Ortego, J., Gugger, P.F. & Sork, V.L. (2018) Genomic data reveal cryptic lineage diversification and introgression in Californian golden cup oaks (section Protobalanus). New Phytologist, 218, 804-818.

Ortego, J., Gugger, P.F., Sork, V.L. & Riddle, B. (2015) Climatically stable landscapes predict patterns of genetic structure and admixture in the Californian canyon live oak. Journal of Biogeography, 42, 328-338.

Papadopoulou, A. & Knowles, L.L. (2015) Species-specific responses to island connectivity cycles: refined models for testing phylogeographic concordance across a Mediterranean Pleistocene Aggregate Island Complex. Molecular Ecology, 24, 4252-4268.

Peinado, M. (1994) Funcionamiento y variabilidad de los geosistemas de los humedales manchegos. 480 pp. PhD Thesis, Universidad Complutense de Madrid. Spain.

Pepper, M., Doughty, P., Arculus, R. & Keogh, J.S. (2008) Landforms predict phylogenetic structure on one of the world's most ancient surfaces. BMC Evolutionary Biology, 8, 152.

Pepper, M., Doughty, P., Hutchinson, M.N. & Keogh, J.S. (2011) Ancient drainages divide cryptic species in Australia's arid zone: Morphological and multi-gene evidence for four new species of Beaked Geckos (Rhynchoedura). Molecular Phylogenetics and Evolution, 61, 810-822.

Perl, R.G.B., Geffen, E., Malka, Y. et al. (2018) Population genetic analysis of the recently rediscovered Hula painted frog (Latonia nigriventer) reveals high genetic diversity and low inbreeding. Scientific Reports, 8, 5588.

Peterson, B.K., Weber, J.N., Kay, E.H. et al. (2012) Double digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species. PLOS ONE, 7, e37135.

Pritchard, J.K., Stephens, M. & Donnelly, P. (2000) Inference of population structure using multilocus genotype data. Genetics, 155, 945-959.

Quéméré, E., Crouau-Roy, B., Rabarivola, C. et al. (2010) Landscape genetics of an endangered lemur (Propithecus tattersalli) within its entire fragmented range. Molecular Ecology, 19, 1606-1621.

Rambaut, A., Drummond, A.J., Xie, D. et al. (2018) TRACER v.1.7, http://tree.bio.ed.ac.uk/software/tracer/

Remón, N., Galán, P. & Naveira, H. (2012) Chronicle of an extinction foretold: genetic properties of an extremely small population of Iberolacerta monticola. Conservation Genetics, 13, 131-142.

Ribera, I. & Blasco-Zumeta, J. (1998) Biogeographical links between steppe insects in the Monegros region (Aragon, NE Spain), the eastern Mediterranean, and central Asia. Journal of Biogeography, 25, 969-986.

Rosenberg, N.A. (2004) DISTRUCT: a program for the graphical display of population structure. Molecular Ecology Notes, 4, 137-138.

Rubidge, E.M., Patton, J.L., Lim, M. et al. (2012) Climate-induced range contraction drives genetic erosion in an alpine mammal. Nature Climate Change, 2, 285-288.

Ruiz-González, A., Gurrutxaga, M., Cushman, S.A. et al. (2014) Landscape genetics for the empirical assessment of resistance surfaces: The European pine marten (Martes martes) as a target-species of a regional ecological network. PLOS ONE, 9, e110552.

Saccheri, I., Kuussaari, M., Kankare, M. et al. (1998) Inbreeding and extinction in a butterfly metapopulation. Nature, 392, 491-494.

Saunders, D.A., Hobbs, R.J. & Margules, C.R. (1991) Biological consequences of ecosystem fragmentation - A review. Conservation Biology, 5, 18-32.

Schmid, S., Neuenschwander, S., Pitteloud, C. et al. (2018) Spatial and temporal genetic dynamics of the grasshopper Oedaleus decorus revealed by museum genomics. Ecology and Evolution, 8, 1480-1495.

Segelbacher, G., Cushman, S.A., Epperson, B.K. et al. (2010) Applications of landscape genetics in conservation biology: concepts and challenges. Conservation Genetics, 11, 375-385.

Soltani, A.A. (1978) Preliminary synonymy and description of new species in the genus Dociostaurus Fieber, 1853 (Orthoptera: Acridoidea: Acrididae, Gomphocerinae) with a key to the species in the genus. Journal of Entomological Society of Iran, 2, 1-93.

Swofford, D.L. (2002) PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4.0b10. Sinauer Associates, Sunderland, MA.

Tantrawatpan, C., Saijuntha, W., Pilab, W. et al. (2011) Genetic differentiation among populations of Brachytrupes portentosus (Lichtenstein 1796) (Orthoptera: Gryllidae) in Thailand and the Lao PDR: the Mekong River as a biogeographic barrier. Bulletin of Entomological Research, 101, 687-696.

Thacker, C.E., Unmack, P.J., Matsui, L. & Rifenbark, N. (2007) Comparative phylogeography of five sympatric Hypseleotris species (Teleostei: Eleotridae) in south‐eastern Australia reveals a complex pattern of drainage basin exchanges with little congruence across species. Journal of Biogeography, 34, 1518-1533.

Thome, M.T. & Carstens, B.C. (2016) Phylogeographic model selection leads to insight into the evolutionary history of four-eyed frogs. Proceedings of the National Academy of Sciences of the United States of America, 113, 8010-8017.

Tian, H.Z., Han, L.X., Zhang, J.L. et al. (2018) Genetic diversity in the endangered terrestrial orchid Cypripedium japonicum in East Asia: Insights into population history and implications for conservation. Scientific Reports, 8, 6467.

Tinnert, J., Hellgren, O., Lindberg, J. et al. (2016) Population genetic structure, differentiation, and diversity in Tetrix subulata pygmy grasshoppers: roles of population size and immigration. Ecology and Evolution, 6, 7831-7846.

Wallis, G.P., Waters, J.M., Upton, P. & Craw, D. (2016) Transverse alpine speciation driven by glaciation. Trends in Ecology & Evolution, 31, 916-926.

Wang, I.J. (2013) Examining the full effects of landscape heterogeneity on spatial genetic variation: a multiple matrix regression approach for quantifying geographic and ecological isolation. Evolution, 67, 3403-3411.

Wang, I.J., Savage, W.K. & Shaffer, H.B. (2009) Landscape genetics and least-cost path analysis reveal unexpected dispersal routes in the California tiger salamander (Ambystoma californiense). Molecular Ecology, 18, 1365-1374.

Waples, R.S., Jones, R.P.J., Beckman, B.R. & Swan, G.A. (1991) Status review for Snake River fall Chinook salmon. 73 pp. Department of Commerce, National Oceanic and Atmospheric Administration. National Marine Fisheries Service. USA.

Willi, Y., Van Buskirk, J. & Hoffmann, A.A. (2006) Limits to the adaptive potential of small populations. Annual Review of Ecology, Evolution, and Systematics, 37, 433-458.

Yan, F., Lu, J., Zhang, B. et al. (2018) The Chinese giant salamander exemplifies the hidden extinction of cryptic species. Current Biology, 28, R590-R592.

Yuan, J.H., Cheng, F.Y. & Zhou, S.L. (2011) The phylogeographic structure and conservation genetics of the endangered tree peony, Paeonia rockii (Paeoniaceae), inferred from chloroplast gene sequences. Conservation Genetics, 12, 1539-1549.

Zastavniouk, C., Weir, L.K. & Fraser, D.J. (2017) The evolutionary consequences of habitat fragmentation: Body morphology and coloration differentiation among brook trout populations of varying size. Ecology and Evolution, 7, 6850-6862.

Zellmer, A.J. & Knowles, L.L. (2009) Disentangling the effects of historic vs. contemporary landscape structure on population genetic divergence. Molecular Ecology, 18, 3593-3602.

SUPPORTING INFORMATION

Supplementary Methods

GENOMIC DATA PROCESSING AND BIOINFORMATICS

We used both STACKS v. 1.35 (Hohenlohe et al., 2010; Catchen et al., 2011; Catchen et al., 2013) and PYRAD v. 3.0.66 (Eaton et al., 2014) to assemble our sequences into de novo loci and call genotypes. This allowed us to examine the robustness of our analyses based on SNP datasets obtained using two of the most popular programs currently available to assemble RADseq data (Catchen et al., 2011; Eaton et al., 2014). Reads were de-multiplexed and filtered for overall quality using the program process_radtags, retaining reads with a Phred score > 10 (using a sliding window of 15%), no adaptor contamination, and an unambiguous barcode and restriction cut site. Raw reads were screened for quality with FASTQC v. 0.11.5 (Simon, 2018) and all sequences were trimmed to 129 bp using SEQTK (Heng, 2017) in order to remove low-quality bases near the 3´ ends.

First, we used the different programs distributed as part of the STACKS v. 1.35 pipeline (ustacks, cstacks, sstacks, and populations) to assemble our sequences into de novo loci and call genotypes (Hohenlohe et al., 2010; Catchen et al., 2011; Catchen et al., 2013). Filtered reads of each individual were assembled de novo into putative loci with the ustacks program. The minimum stack depth (m) was set to three and we allowed a maximum distance of two nucleotide mismatches (M) to group reads into a “stack”. We used the “removal” (r) and “deleveraging” (d) algorithms to eliminate highly repetitive stacks and resolve over-merged loci, respectively. Single nucleotide polymorphisms (SNPs) were identified at each locus and genotypes were called using a multinomial-based likelihood model that accounts for sequencing errors, with the upper bound of the error rate (ε) set to 0.2 (Hohenlohe et al., 2010; Catchen et al., 2011; Catchen et al., 2013). A catalogue of loci was built using the cstacks program, with loci recognized as homologous across individuals if the number of nucleotide mismatches between consensus sequences (n) was ≤2. Each individual was matched against this catalogue using the sstacks program and output files were exported in different formats for subsequent analyses using the program populations. We exported only the first SNP per RAD locus and retained loci that were sequenced in at least half of the individuals of each population (parameter r = 0.5) and represented in at least two (~33%; parameter p = 2) or four (~66%; parameter p = 4) populations (out of the six populations analysed; Table 1).
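
The logic of the r and p filters described above can be illustrated with a short sketch. This is a minimal illustration in Python and not the STACKS implementation: the function name `keep_locus`, the per-population genotype counts and the assumed six individuals per population are hypothetical and serve only to show how the two thresholds interact.

```python
from typing import Dict


def keep_locus(genotyped: Dict[str, int], pop_sizes: Dict[str, int],
               r: float = 0.5, p: int = 4) -> bool:
    """Return True if a locus passes the r and p thresholds.

    genotyped: individuals with a genotype call at this locus, per population.
    pop_sizes: individuals sequenced per population.
    """
    # A population counts towards p only if at least a fraction r
    # of its individuals were genotyped at the locus.
    passing = sum(
        1 for pop, n in pop_sizes.items()
        if n > 0 and genotyped.get(pop, 0) / n >= r
    )
    return passing >= p


# Hypothetical example: six populations with six individuals each.
toy_sizes = {"TAJU": 6, "BELI": 6, "PHUE": 6, "SALI": 6, "BONI": 6, "ORCE": 6}
toy_calls = {"TAJU": 3, "BELI": 6, "PHUE": 2, "SALI": 4, "BONI": 0, "ORCE": 5}

# Four populations (TAJU, BELI, SALI, ORCE) reach r = 0.5 in this toy locus.
passes_p4 = keep_locus(toy_calls, toy_sizes, r=0.5, p=4)
passes_p5 = keep_locus(toy_calls, toy_sizes, r=0.5, p=5)
```

Under these assumptions the toy locus is retained with p = 4 but would be discarded under a stricter, purely illustrative threshold of p = 5.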

Second, we assembled our sequences into de novo loci using PYRAD v. 3.0.66 (Eaton et al., 2014). Briefly, reads retained after process_radtags were further quality-filtered with PYRAD to convert base calls with a Phred score <20 into Ns and discard reads with >2 Ns. Retained reads were clustered within and across samples considering two different thresholds of sequence similarity (Wclust = 90 and 95%) and clusters with a coverage depth <5 were discarded (Mindepth = 5). Consensus sequences with more than five heterozygous sites were excluded (maxH = 5) as well as loci containing one or more heterozygous sites across more than 5 samples (~15% of individuals; maxSH = p.15), as we expect that this represents a fixed difference among clustered paralogs rather than a true polymorphism (Eaton et al., 2014; Eaton et al., 2015). In a final filtering step, we excluded loci that were not recovered in at least 11 or 23 samples (minCov = 11 and 23, corresponding to ~33 and ~66% of individuals, respectively).
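
The read-level quality filter described above (base calls with Phred <20 converted into Ns, reads with >2 Ns discarded) can be sketched as follows. This is an illustrative Python sketch and not PYRAD code: the function names are hypothetical, and the Phred+33 quality encoding is an assumption about the FASTQ files.

```python
from typing import List, Optional


def phred_scores(quality_string: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in quality_string]


def mask_low_quality(seq: str, quals: List[int],
                     min_phred: int = 20, max_n: int = 2) -> Optional[str]:
    """Mask low-quality bases with 'N'; return None if the read must be discarded."""
    # Bases below the Phred threshold are replaced by N.
    masked = "".join(base if q >= min_phred else "N"
                     for base, q in zip(seq, quals))
    # Reads with more than max_n ambiguous bases are dropped.
    return masked if masked.count("N") <= max_n else None


# Hypothetical four-base read, for illustration only.
toy_read = mask_low_quality("ACGT", phred_scores("I5#?"))
```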

The choice of different filtering thresholds using either STACKS or PYRAD had little impact on the obtained inferences (see Figs. S2-S4) (Eaton et al., 2015; Ortego et al., 2018). For this reason, unless otherwise indicated, all downstream analyses were performed using a SNP dataset obtained with STACKS including only those loci that were represented in at least four populations (p = 4).

Figure S1 Number of reads per individual before and after different quality filtering steps by STACKS. The total height of the bars represents the total number of raw reads obtained for each individual. Within each bar, the dark red color represents the reads that were discarded by process_radtags due to low quality, adapter contamination or ambiguous barcode, and the orange color represents the reads that were discarded by ustacks after filtering out repetitive elements and reads that did not meet the criteria required to create a “stack”. The green color represents the number of retained reads used to identify homologous loci. Populations are sorted from NW to SE and are labelled using the same codes presented in Table 1.

Figure S2 Results of Bayesian clustering analyses in STRUCTURE based on a random subset of 10,000 SNPs obtained with STACKS for p = 2 (a, c) and p = 4 (b, d). Panels a) and b) show the mean (± SD) log probability of the data (LnPr(X|K)) over the 10 best runs (left y-axis, black dots and error bars) for each value of K. The magnitude of ΔK as a function of K (right y-axis, open dots) indicates the best-supported number of clusters (K = 2 in all cases). Panels c) and d) show the individual’s probabilities of membership to each inferred genetic cluster from K = 2 to K = n (n: number of populations). Each individual is represented by a vertical bar, which is partitioned into K coloured segments showing the individual’s probability of belonging to the cluster with that colour. Thin vertical black lines separate individuals from different populations.
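
For readers unfamiliar with the ΔK statistic plotted in these panels, the following Python sketch shows how it is commonly computed from replicate STRUCTURE runs: the absolute second-order rate of change of the mean log probability between successive K values, divided by the standard deviation across runs at K. The function name `delta_k` and the log-probability values are hypothetical, and the sketch uses means across runs, which is an assumption about the exact variant of the calculation.

```python
from statistics import mean, stdev
from typing import Dict, List


def delta_k(lnp: Dict[int, List[float]]) -> Dict[int, float]:
    """Return delta-K for every K with runs available at K-1, K and K+1.

    lnp maps each K to the LnPr(X|K) values of its replicate runs.
    """
    result = {}
    for k in sorted(lnp):
        if k - 1 not in lnp or k + 1 not in lnp:
            continue  # undefined at both ends of the tested range
        # Absolute second-order rate of change of the mean log probability.
        second_diff = abs(mean(lnp[k + 1]) - 2 * mean(lnp[k]) + mean(lnp[k - 1]))
        spread = stdev(lnp[k])  # SD across replicate runs at this K
        result[k] = second_diff / spread if spread > 0 else float("inf")
    return result


# Hypothetical log probabilities (two runs per K), for illustration only.
toy_runs = {1: [-1000.0, -1002.0], 2: [-800.0, -802.0],
            3: [-790.0, -794.0], 4: [-785.0, -795.0]}
toy_delta_k = delta_k(toy_runs)
best_k = max(toy_delta_k, key=toy_delta_k.get)
```

In this toy example the largest ΔK is obtained at K = 2, where the log probability rises sharply and then levels off.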

Figure S3 Results of Bayesian clustering analyses in STRUCTURE based on a random subset of 10,000 SNPs obtained with PYRAD using two different clustering thresholds of sequence similarity (Wclust = 90% and 95%) and two different values of minimum taxon coverage in a given locus (MinCov = 11 and 23). Panels a), b), e) and f) show the mean (± SD) log probability of the data (LnPr(X|K)) over the 10 best runs (left y-axis, black dots and error bars) for each value of K. The magnitude of ΔK as a function of K (right y-axis, open dots) indicates the best-supported number of clusters (K = 2 in all cases). Panels c), d), g) and h) show the individual’s probabilities of membership to each inferred genetic cluster for K = 2 and K = 3. Each individual is represented by a vertical bar, which is partitioned into K coloured segments showing the individual’s probability of belonging to the cluster with that colour. Thin vertical black lines separate individuals from different populations.

Figure S4 Principal component analyses (PCA) of genetic variation for populations of D. crassiusculus. Panels a) and b) show analyses based on SNP datasets obtained with STACKS considering two different filtering parameters (p = 2 and p = 4). Panels c), d), e) and f) show analyses based on SNP datasets obtained with PYRAD using two different clustering thresholds of sequence similarity (Wclust = 90% and 95%) and two different values of minimum taxon coverage in a given locus (MinCov = 11 and 23). Dotted-line rectangles group main population clusters. Population codes are described in Table 1.

Table S1 Population genetic statistics (P, π, HO, HE, and FIS) for the studied populations of the Iberian cross-backed grasshopper Dociostaurus crassiusculus. Average values across loci are presented for major allele frequency (P), nucleotide diversity (π), observed (HO) and expected (HE) heterozygosity, and Wright’s inbreeding coefficient (FIS). Genetic statistics were calculated in STACKS for all positions (polymorphic and non-polymorphic), considering loci that were represented in at least two (p = 2) or four populations (p = 4) and in at least 50% of the individuals within each population (r = 0.5).

| Code | P (p = 2) | P (p = 4) | π (p = 2) | π (p = 4) | HO (p = 2) | HO (p = 4) | HE (p = 2) | HE (p = 4) | FIS (p = 2) | FIS (p = 4) |
|---|---|---|---|---|---|---|---|---|---|---|
| TAJU | 0.9992 | 0.9992 | 0.0013 | 0.0012 | 0.0009 | 0.0009 | 0.0011 | 0.0011 | 0.0007 | 0.0006 |
| BELI | 0.9992 | 0.9993 | 0.0012 | 0.0011 | 0.0009 | 0.0009 | 0.0010 | 0.0010 | 0.0005 | 0.0004 |
| PHUE | 0.9992 | 0.9992 | 0.0013 | 0.0012 | 0.0009 | 0.0009 | 0.0011 | 0.0011 | 0.0007 | 0.0006 |
| SALI | 0.9993 | 0.9993 | 0.0012 | 0.0011 | 0.0009 | 0.0009 | 0.0010 | 0.0010 | 0.0004 | 0.0004 |
| BONI | 0.9992 | 0.9993 | 0.0012 | 0.0011 | 0.0009 | 0.0009 | 0.0010 | 0.0010 | 0.0006 | 0.0005 |
| ORCE | 0.9992 | 0.9993 | 0.0012 | 0.0011 | 0.0009 | 0.0009 | 0.0010 | 0.0010 | 0.0006 | 0.0005 |
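
As an aid to interpreting these statistics, the sketch below computes them for a single biallelic site using textbook definitions. It is a hypothetical Python illustration rather than the STACKS code: the function name `site_stats`, the genotype coding (0, 1 or 2 copies of the alternative allele) and the convention of setting FIS to zero at monomorphic sites are assumptions. Because the values in Table S1 are averages across all positions, including non-polymorphic ones, the per-site relationship FIS = 1 - HO/HE is not expected to hold when applied to the averaged values.

```python
from typing import Dict, List


def site_stats(genotypes: List[int]) -> Dict[str, float]:
    """Diversity statistics at one biallelic site.

    genotypes: copies of the alternative allele (0, 1 or 2) per diploid individual.
    """
    n = len(genotypes)
    chromosomes = 2 * n
    p_alt = sum(genotypes) / chromosomes
    major = max(p_alt, 1.0 - p_alt)                  # major allele frequency (P)
    ho = sum(1 for g in genotypes if g == 1) / n     # observed heterozygosity
    he = 2.0 * p_alt * (1.0 - p_alt)                 # expected heterozygosity
    # Nucleotide diversity with the usual small-sample correction.
    pi = he * chromosomes / (chromosomes - 1)
    # Assumed convention: FIS is set to 0 where HE is 0 (monomorphic site).
    fis = 1.0 - ho / he if he > 0 else 0.0
    return {"P": major, "pi": pi, "HO": ho, "HE": he, "FIS": fis}


# Hypothetical genotypes for six individuals, for illustration only.
toy_site = site_stats([0, 1, 1, 2, 0, 0])
```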

Table S2 Pairwise FST values calculated in ARLEQUIN. All FST values are significantly different from zero after sequential Bonferroni corrections (P < 0.05).

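| Code | TAJU | BELI | PHUE | SALI | BONI | ORCE |
|---|---|---|---|---|---|---|
| TAJU | -- | | | | | |
| BELI | 0.063 | -- | | | | |
| PHUE | 0.071 | 0.099 | -- | | | |
| SALI | 0.074 | 0.107 | 0.033 | -- | | |
| BONI | 0.105 | 0.125 | 0.088 | 0.093 | -- | |
| ORCE | 0.206 | 0.233 | 0.212 | 0.192 | 0.237 | -- |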
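
The sequential Bonferroni correction applied to the 15 pairwise comparisons among six populations can be sketched as follows. This minimal Python example assumes the step-down (Holm-type) procedure that the term usually denotes; the function name and the p-values are hypothetical and are not taken from the ARLEQUIN output.

```python
from typing import List


def sequential_bonferroni(pvalues: List[float], alpha: float = 0.05) -> List[bool]:
    """Flag each test as significant after a step-down Bonferroni correction."""
    m = len(pvalues)
    # Visit tests from the smallest to the largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    significant = [False] * m
    for rank, idx in enumerate(order):
        # The threshold relaxes as tests are sequentially rejected.
        if pvalues[idx] <= alpha / (m - rank):
            significant[idx] = True
        else:
            break  # all remaining (larger) p-values are non-significant
    return significant


# Hypothetical p-values for four tests, for illustration only.
toy_flags = sequential_bonferroni([0.001, 0.04, 0.03, 0.005])
```

With these toy values, the two smallest p-values pass their thresholds (0.05/4 and 0.05/3), whereas the third (0.03 > 0.05/2) fails and stops the procedure.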

Table S3 Multiple Matrix Regressions with Randomization (MMRR) for genetic differentiation (FST) in relation to resistance distances defined by (i) a flat landscape, (ii) topographic roughness (slope), (iii) limits of main river basins, (iv) habitat, and (v) lithology. The last three scenarios considered a range of hypothetical resistance values offered by the barrier (limit of main river basins) or the areas not occupied by the species (non-suitable habitats/lithologies). Non-natural habitats (agriculture and artificial surfaces) were assumed to offer twice the resistance of natural habitats not occupied by the species (asterisks) (see Methods for further details). The best-supported scenario is indicated in bold.

| Model | Resistance values | R² | β | t | p |
|---|---|---|---|---|---|
| Flat landscape | - | 0.739 | 1.784 | 6.074 | 0.005 |
| Topography | - | 0.657 | 0.998 | 4.987 | 0.033 |
| Limits of main river basins | 2.5 | 0.751 | 1.764 | 6.259 | 0.006 |
| Limits of main river basins | 5 | 0.762 | 1.740 | 6.460 | 0.006 |
| Limits of main river basins | 10 | 0.778 | 1.703 | 6.750 | 0.006 |
| Limits of main river basins | 20 | 0.798 | 1.645 | 7.156 | 0.004 |
| Limits of main river basins | 30 | 0.810 | 1.597 | 7.437 | 0.004 |
| Limits of main river basins | 40 | 0.818 | 1.555 | 7.638 | 0.003 |
| Limits of main river basins | 50 | 0.823 | 1.519 | 7.780 | 0.003 |
| Limits of main river basins | 75 | 0.830 | 1.443 | 7.956 | 0.002 |
| Limits of main river basins | 100 | 0.830 | 1.382 | 7.973 | 0.001 |
| Limits of main river basins | 125 | 0.828 | 1.332 | 7.904 | 0.003 |
| Limits of main river basins | 150 | 0.824 | 1.289 | 7.789 | 0.003 |
| Limits of main river basins | 250 | 0.800 | 1.165 | 7.217 | 0.003 |
| Limits of main river basins | 500 | 0.745 | 1.001 | 6.155 | 0.002 |
| Limits of main river basins | 1,000 | 0.678 | 0.868 | 5.228 | 0.002 |
| Limits of main river basins | 2,500 | 0.606 | 0.756 | 4.470 | 0.004 |
| Limits of main river basins | 5,000 | 0.571 | 0.711 | 4.162 | 0.002 |
| Limits of main river basins | 10,000 | 0.548 | 0.684 | 3.973 | 0.004 |
| Limits of main river basins | 25,000 | 0.527 | 0.663 | 3.807 | 0.002 |
| Limits of main river basins | 50,000 | 0.515 | 0.652 | 3.712 | 0.025 |
| Limits of main river basins | 100,000 | 0.505 | 0.644 | 3.642 | 0.021 |
| Limits of main river basins | 250,000 | 0.497 | 0.638 | 3.584 | 0.018 |
| Limits of main river basins | 500,000 | 0.494 | 0.635 | 3.561 | 0.023 |
| Limits of main river basins | 1,000,000 | 0.492 | 0.634 | 3.549 | 0.020 |
| Habitat | 2.5/1.25* | 0.682 | 1.737 | 5.283 | 0.003 |
| Habitat | 5/2.5* | 0.611 | 1.531 | 4.516 | 0.010 |
| Habitat | 10/5* | 0.545 | 1.381 | 3.948 | 0.019 |
| Habitat | 20/10* | 0.494 | 1.280 | 3.564 | 0.029 |
| Habitat | 30/15* | 0.471 | 1.237 | 3.399 | 0.040 |
| Habitat | 40/20* | 0.456 | 1.212 | 3.304 | 0.044 |
| Habitat | 50/25* | 0.447 | 1.196 | 3.240 | 0.040 |
| Habitat | 75/37.5* | 0.432 | 1.172 | 3.144 | 0.046 |
| Habitat | 100/50* | 0.423 | 1.158 | 3.090 | 0.056 |
| Habitat | 125/62.5* | 0.418 | 1.150 | 3.054 | 0.063 |
| Habitat | 150/75* | 0.414 | 1.144 | 3.029 | 0.045 |
| Habitat | 250/125* | 0.405 | 1.130 | 2.973 | 0.064 |
| Habitat | 500/250* | 0.397 | 1.119 | 2.925 | 0.061 |
| Habitat | 1,000/500* | 0.392 | 1.113 | 2.898 | 0.075 |
| Habitat | 2,500/1,250* | 0.390 | 1.109 | 2.880 | 0.077 |
| Habitat | 5,000/2,500* | 0.389 | 1.108 | 2.874 | 0.067 |
| Habitat | 10,000/5,000* | 0.388 | 1.107 | 2.871 | 0.077 |
| Habitat | 25,000/12,500* | 0.388 | 1.107 | 2.869 | 0.063 |
| Habitat | 50,000/25,000* | 0.388 | 1.107 | 2.869 | 0.068 |
| Habitat | 100,000/50,000* | 0.388 | 1.107 | 2.868 | 0.086 |
| Habitat | 250,000/125,000* | 0.388 | 1.107 | 2.868 | 0.089 |
| Habitat | 500,000/250,000* | 0.388 | 1.106 | 2.868 | 0.079 |
| Habitat | 1,000,000/500,000* | 0.388 | 1.106 | 2.868 | 0.080 |
| Lithology | 2.5 | 0.777 | 1.444 | 6.723 | 0.006 |
| Lithology | 5 | 0.763 | 1.239 | 6.469 | 0.017 |
| Lithology | 10 | 0.736 | 1.093 | 6.018 | 0.041 |
| Lithology | 20 | 0.700 | 0.990 | 5.506 | 0.081 |
| Lithology | 30 | 0.675 | 0.942 | 5.199 | 0.102 |
| Lithology | 40 | 0.657 | 0.912 | 4.987 | 0.089 |
| Lithology | 50 | 0.642 | 0.890 | 4.830 | 0.098 |
| Lithology | 75 | 0.617 | 0.855 | 4.575 | 0.126 |
| Lithology | 100 | 0.601 | 0.832 | 4.427 | 0.111 |
| Lithology | 125 | 0.591 | 0.817 | 4.335 | 0.126 |
| Lithology | 150 | 0.584 | 0.805 | 4.276 | 0.138 |
| Lithology | 250 | 0.575 | 0.777 | 4.192 | 0.115 |
| Lithology | 500 | 0.581 | 0.752 | 4.244 | 0.100 |
| Lithology | 1,000 | 0.599 | 0.735 | 4.403 | 0.070 |
| Lithology | 2,500 | 0.621 | 0.722 | 4.618 | 0.042 |
| Lithology | 5,000 | 0.633 | 0.716 | 4.739 | 0.037 |
| Lithology | 10,000 | 0.642 | 0.712 | 4.823 | 0.027 |
| Lithology | 25,000 | 0.648 | 0.710 | 4.887 | 0.025 |
| Lithology | 50,000 | 0.650 | 0.709 | 4.912 | 0.019 |
| Lithology | 100,000 | 0.651 | 0.709 | 4.925 | 0.033 |
| Lithology | 250,000 | 0.652 | 0.708 | 4.933 | 0.018 |
| Lithology | 500,000 | 0.652 | 0.708 | 4.936 | 0.028 |
| Lithology | 1,000,000 | 0.652 | 0.708 | 4.937 | 0.026 |
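
The principle behind each row of Table S3 (a matrix regression whose significance is assessed by permutation) can be sketched in Python for the single-predictor case. This is a simplified illustration and not the code used for the analyses: the function names and both toy matrices are hypothetical, no standardization of the matrices is applied, and significance is evaluated on the absolute slope, which for a single predictor ranks permutations in the same way as the absolute t statistic.

```python
import random
from statistics import mean
from typing import List, Tuple

Matrix = List[List[float]]


def lower_triangle(m: Matrix) -> List[float]:
    """Unfold the lower triangle of a symmetric distance matrix."""
    return [m[i][j] for i in range(len(m)) for j in range(i)]


def slope_and_r2(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and coefficient of determination."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


def mmrr_single(y_mat: Matrix, x_mat: Matrix,
                n_perm: int = 999, seed: int = 1) -> Tuple[float, float, float]:
    """Single-predictor matrix regression with a permutation p-value."""
    rng = random.Random(seed)
    n = len(y_mat)
    x = lower_triangle(x_mat)
    beta, r2 = slope_and_r2(x, lower_triangle(y_mat))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        order = list(range(n))
        rng.shuffle(order)
        # Permute rows and columns of the response matrix simultaneously,
        # which preserves the non-independence of pairwise distances.
        y_perm = [[y_mat[order[i]][order[j]] for j in range(n)] for i in range(n)]
        b, _ = slope_and_r2(x, lower_triangle(y_perm))
        if abs(b) >= abs(beta):
            hits += 1
    return beta, r2, hits / (n_perm + 1)


# Hypothetical resistance distances among five populations.
toy_resistance = [[0.0, 10.0, 25.0, 40.0, 70.0],
                  [10.0, 0.0, 18.0, 33.0, 62.0],
                  [25.0, 18.0, 0.0, 15.0, 48.0],
                  [40.0, 33.0, 15.0, 0.0, 35.0],
                  [70.0, 62.0, 48.0, 35.0, 0.0]]
# Hypothetical differentiation values built as an exact linear function of resistance.
toy_fst = [[0.0 if i == j else 0.01 + 0.002 * d for j, d in enumerate(row)]
           for i, row in enumerate(toy_resistance)]
toy_beta, toy_r2, toy_p = mmrr_single(toy_fst, toy_resistance)
```

Repeating such a regression over a series of resistance surfaces, each built with a different hypothetical resistance value, yields a profile of model fit comparable in structure to the one reported in Table S3.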

Supplementary References

Catchen, J.M., Amores, A., Hohenlohe, P.A. et al. (2011) STACKS: building and genotyping loci de novo from short-read sequences. G3: Genes|Genomes|Genetics, 1, 171-182.

Catchen, J.M., Hohenlohe, P.A., Bassham, S. et al. (2013) STACKS: an analysis tool set for population genomics. Molecular Ecology, 22, 3124-3140.

Eaton, D.A. (2014) PYRAD: assembly of de novo RADseq loci for phylogenetic analyses. Bioinformatics, 30, 1844-1849.

Eaton, D.A., Hipp, A.L., González-Rodríguez, A. & Cavender-Bares, J. (2015) Historical introgression among the American live oaks and the comparative nature of tests for introgression. Evolution, 69, 2587-2601.

Heng, L. (2017) SEQTK. [WWW document]. URL https://github.com/lh3/seqtk

Hohenlohe, P.A., Bassham, S., Etter, P.D. et al. (2010) Population genomics of parallel adaptation in threespine stickleback using sequenced RAD tags. PLOS Genetics, 6, e1000862.

Ortego, J., Gugger, P.F. & Sork, V.L. (2018) Genomic data reveal cryptic lineage diversification and introgression in Californian golden cup oaks (section Protobalanus). New Phytologist, 218, 804-818.

Simon, A. (2018) FASTQC v.0.11.7. [WWW document]. URL http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

CHAPTER 3

Spatiotemporally explicit demographic modelling supports a joint effect of historical barriers to dispersal and contemporary landscape composition on structuring genomic variation in a red-listed grasshopper

María José González-Serna, Pedro J. Cordero & Joaquín Ortego

Molecular Ecology (2019), under review.

Abstract

Inferring the processes underlying spatial patterns of genomic variation is fundamental to understanding how organisms interact with landscape heterogeneity and to identifying the factors determining species distributional shifts. Here, we employ genomic data (ddRADseq) to test biologically-informed models representing historical and contemporary demographic scenarios of population connectivity for the Iberian cross-backed grasshopper Dociostaurus hispanicus, a species with a narrow distribution that currently forms highly fragmented populations. All models incorporated biological aspects of the focal taxon that could hypothetically impact its geographical patterns of genomic variation, including (a) the spatial configuration of impassable barriers to dispersal defined by topographic landscapes not occupied by the species, (b) distributional shifts resulting from the interaction between the species' bioclimatic envelope and Pleistocene glacial cycles, and (c) the contemporary distribution of suitable habitats after extensive land clearing for agriculture. Spatiotemporally-explicit simulations under different scenarios considering these aspects and statistical evaluation of competing models within an Approximate Bayesian Computation (ABC) framework supported the spatial configuration of topographic barriers to dispersal and human-driven habitat fragmentation as the main factors explaining the geographical distribution of genomic variation in the species, with no apparent impact of hypothetical distributional shifts linked to Pleistocene climatic oscillations. Collectively, this study indicates that both historical (i.e., topographic barriers) and contemporary (i.e., anthropogenic habitat fragmentation) aspects of landscape composition have shaped major axes of genomic variation in the studied species and emphasizes the potential of model-based approaches to gain insights into the temporal scale at which different processes impact the demography of natural populations.

Keywords: Approximate Bayesian Computation, ddRADseq, Dociostaurus hispanicus, environmental niche modelling, landscape genetics, SPLATCHE2.

INTRODUCTION

The neutral genetic makeup of populations is modulated by intrinsic demographic parameters, which in turn are determined by the interaction between species-specific biological traits (e.g. dispersal capacity, physiological tolerances, environmental niche) and the different spatial and temporal components that define landscape heterogeneity (e.g. current and past climate, geological changes, spatial configuration of barriers to dispersal) (Ortego et al., 2015; Paz et al., 2015; Sukumaran & Knowles, 2018). Range size expansions/contractions and population fragmentation/coalescence linked to past or contemporary phenomena are demographic processes experienced by most organisms that can ultimately have important evolutionary consequences, from species extinction to rapid speciation (Rand, 1948; Knowles, 2000; Lorenzen et al., 2011; van der Mescht et al., 2015; Talbot et al., 2016; Weir et al., 2016). For this reason, understanding how organisms have been impacted by past environmental changes and interact with landscape structure is fundamental to elucidate the evolutionary and ecological processes underlying their current distribution, abundance and spatial patterns of genetic variation. The development of spatiotemporally-explicit analytical approaches based on coalescent theory has made it possible to simulate complex demographic processes (e.g. bottlenecks and expansions) under a suite of contemporary and historical scenarios that can be statistically evaluated (e.g. Currat et al., 2004; Neuenschwander et al., 2008; Estoup et al., 2010; Ray et al., 2010; He et al., 2013). Recently, much emphasis in the literature has also been put on testing biologically-informed models to gain insights into the taxon-specific causal mechanisms structuring a species’ gene pool (Paz et al., 2015; Papadopoulou & Knowles, 2016; Sukumaran & Knowles, 2018). These conceptual and analytical advances have proven useful for generating refined hypotheses and testing the impacts of microhabitat preferences (Massatti & Knowles, 2016), physiological trade-offs (Bemmels et al., 2016), and climate-induced range shifts (He et al., 2013; Reid et al., 2018) on the spatial distribution of genetic variation in numerous organisms (Hoban, 2014). Ultimately, this can allow better inference of biological processes that in turn are useful to predict and mitigate threats associated with various components of global change, such as warming climate and habitat fragmentation (Espindola et al., 2012; Brown et al., 2016; Estrada et al., 2016; Prates et al., 2016; Morrison et al., 2018).

Climate shifts during the late Quaternary have altered the distribution and spatial patterns of genetic variation in many taxa (Hewitt, 2000), are recognized as an important engine of speciation (Weir et al., 2016), and have shaped large-scale gradients of species diversity (Sandel et al., 2011). Although the consequences of Pleistocene glaciations are well documented in numerous organisms from northern latitudes or linked to montane or alpine environments (Hewitt, 2000; Carstens & Knowles, 2007), their impacts on species distributed at lower latitudinal and elevational ranges, such as those inhabiting flat landscapes from the Mediterranean region, are more difficult to predict and much less well understood (Hewitt, 2000; Gómez & Lunt, 2006). Phylogeographic studies in southern Mediterranean peninsulas have identified multiple species-specific cryptic refugia (Gómez & Lunt, 2006; Stewart et al., 2010; Hewitt, 2011), which suggests that organisms from this region have probably responded to glacial and interglacial periods through local distributional shifts (Gómez & Lunt, 2006; Abellan & Svenning, 2014). Such local-scale distributional shifts can be difficult to predict from environmental niche models and currently available palaeoclimatic layers (Waltari et al., 2007). Such layers effectively capture drastic temperature and precipitation oscillations characterizing steep elevational gradients and northern latitudes that became covered by ice, but may fail to reflect changes in microclimatic conditions that are relevant for species distributed in other areas such as flat lowland landscapes from the climatically more stable Mediterranean region (Waltari et al., 2007; Worth et al., 2014). This region has also been subjected to millennia of intensive logging and land clearing for agriculture (Blondel & Aronson, 1999), which has reduced the original extent of its primary vegetation by around 95% (Myers et al., 2000). Although anthropogenic activities have altered every ecosystem in the Mediterranean region, the consequences of habitat fragmentation are particularly remarkable for organisms occupying flat landscapes at low elevations, as these areas are highly suitable for crop production (e.g. Ortego et al., 2015). It is consequently worth investigating whether the current spatial distribution of genetic variation in these species has been predominantly shaped by population fragmentation resulting from human-driven habitat destruction or if the genetic signals left by historical processes such as Pleistocene climate-change distributional shifts and the disruption of gene flow driven by geographical barriers to dispersal still prevail (Cunningham & Moritz, 1998; Vandergast et al., 2007; Zellmer & Knowles, 2009).

Although understanding how organisms interact with landscape heterogeneity at different time scales is critical for establishing effective conservation and informed management (Saccheri et al., 1998; Rubidge et al., 2012), disentangling the historical and contemporary processes shaping their spatial distribution of genetic variation is challenging for several reasons (Zellmer & Knowles, 2009). First, the time-lag between the point at which habitat fragmentation takes place and when signatures of disrupted gene flow become detectable is often long (Keyghobadi et al., 2005; Landguth et al., 2010). Second, the amount of time during which historical and contemporary processes impact the genetic makeup of a species generally differs by several orders of magnitude and their study has traditionally required the application of different genetic markers (e.g. mitochondrial DNA vs. microsatellite markers; Wang, 2010). Finally, historical and contemporary landscape structure can be spatially autocorrelated, which can make it difficult to tease apart the effects of past and recent changes in landscape composition (Zellmer & Knowles, 2009). In the last decade, the advent of high-throughput sequencing technology, computational advances and new modelling approaches have made it possible to bridge landscape genetics and phylogeography (Rissler, 2016), address questions at different evolutionary levels without being constrained by the nature of the genetic markers employed (Villaverde et al., 2018), and improve our capacity to disentangle processes taking place at contrasting spatiotemporal scales within unified, hypothesis-testing frameworks (He et al., 2013).

Here, we employ genomic data obtained via double-digest restriction site-associated DNA sequencing (ddRADseq; Peterson et al., 2012) to test alternative demographic models representing historical and contemporary scenarios of population connectivity in the cross-backed grasshopper Dociostaurus hispanicus Bolívar, 1898. This narrow-endemic species is exclusively distributed in dehesas, open Mediterranean forests, scrublands and pastures scattered in the flat landscapes characterizing the central Iberian Peninsula (García et al., 2005; Presa et al., 2016). Human-driven habitat destruction drove severe population declines and population fragmentation in this species and, as a result, it has been recently included in the IUCN Red List of Threatened Species with the category “Near Threatened” (Hochkirch et al., 2016; Presa et al., 2016). Specifically, we first used genomic data to infer the spatial patterns of genetic diversity and structure of the studied species and reconstruct historical demographic profiles of its different populations (Fig. 1). Second, we generated a suite of biologically-informed models representing hypothetical processes that might have shaped the spatial distribution of genomic variation in our focal taxon and tested them within an Approximate Bayesian Computation framework (e.g. Neuenschwander et al., 2008; He et al., 2013) (Fig. 1). All tested models incorporated biological aspects of the focal taxon that could have hypothetically impacted its geographical patterns of genetic variation, including the spatial configuration of barriers to dispersal defined by rugged landscapes not occupied by the species, distributional shifts resulting from the interaction between the species-specific bioclimatic envelope and Pleistocene glacial cycles, and the contemporary distribution of remnant habitat patches after large-scale processes of land clearing for agriculture. This information was used to inform competing demographic models and infer, through the evaluation of their relative statistical support, the role of historical and contemporary components of landscape heterogeneity in structuring genomic variation in the focal species (He et al., 2013). Our study exemplifies how a unified, model-based framework integrating genomic data and different sources of spatial information can help to tease apart the role of contemporary and historical processes in shaping geographical patterns of genetic variation. This can improve our ability to reach biologically-meaningful insights about the proximate ecological and evolutionary processes governing the distribution of species and, ultimately, understand how organisms respond to changing and heterogeneous landscapes at contrasting temporal scales (Massatti & Knowles, 2016).


Figure 1 Workflow summarizing the integrative approach employed in this study to infer the processes shaping the spatial distribution of genomic variation in the cross-backed grasshopper Dociostaurus hispanicus. iDDC: Integrative distributional, demographic and coalescent modelling; ABC: Approximate Bayesian computation; RM: Regularization multiplier in MAXENT; FC: Feature class in MAXENT; LGM: Last Glacial Maximum.


MATERIALS AND METHODS

POPULATION SAMPLING

During 2012-2016, we surveyed potentially suitable habitats (i.e., dry pastures in steppe lands, dehesas, wastelands with scattered slate formations, and lands with sparse vegetation) for Dociostaurus hispanicus and collected samples from ten populations (Table 1; Fig. 2) spanning the entire distribution range of the species (Presa et al., 2016). Dociostaurus hispanicus is classified as “Near Threatened” according to the IUCN Red List of Threatened Species due to the high fragmentation of its small populations (Hochkirch et al., 2016; Presa et al., 2016). For this reason, we analysed only 6-7 adult individuals per population (Table 1). We placed fresh whole adult specimens in vials with 2,000 µL of 96% ethanol and stored them at -20 °C until needed for genomic analyses.

Figure 2 Map displaying the geographical location of the sampled populations of Dociostaurus hispanicus in the Iberian Peninsula. Warmer colours and larger circle sizes denote higher levels of population genetic diversity (π, nucleotide diversity). The dashed polygon indicates the area considered to run SPLATCHE2 analyses (see Fig. 3). Population codes are described in Table 1.


Table 1 Geographical location and genetic statistics (π, HO, HE, P and FIS) for the studied populations of Dociostaurus hispanicus. Genetic statistics were calculated in STACKS for all positions (polymorphic and non-polymorphic) and for variant (polymorphic) positions only, considering loci that were represented in all populations (p = 10) and in 80% of the individuals within each population (r = 0.8). Number of analysed individuals (n); average values across loci are presented for nucleotide diversity (π), observed (HO) and expected (HE) heterozygosity, major allele frequency (P), and Wright’s inbreeding coefficient (FIS).

Columns marked (all) refer to all positions (variant and fixed); columns marked (var) refer to variant positions only.

| Locality (Province) | Code | n | Latitude | Longitude | π (all) | HO (all) | HE (all) | P (all) | FIS (all) | π (var) | HO (var) | HE (var) | P (var) | FIS (var) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Trabanca (Salamanca) | TRAB | 7 | 41.242753 | -6.402215 | 0.0010 | 0.0009 | 0.0010 | 0.9993 | 0.0003 | 0.1330 | 0.1188 | 0.1233 | 0.9094 | 0.0373 |
| Navas de San Antonio (Segovia) | NAVA | 6 | 40.744560 | -4.299340 | 0.0010 | 0.0009 | 0.0010 | 0.9993 | 0.0003 | 0.1354 | 0.1216 | 0.1237 | 0.9088 | 0.0327 |
| Puerto de Peña Negra (Ávila) | PNEG | 6 | 40.432180 | -5.316690 | 0.0010 | 0.0010 | 0.0009 | 0.9993 | 0.0000 | 0.1253 | 0.1237 | 0.1146 | 0.9150 | 0.0029 |
| Ávila (Ávila) | AVIL | 6 | 40.624106 | -4.689135 | 0.0009 | 0.0008 | 0.0008 | 0.9994 | 0.0003 | 0.1132 | 0.0985 | 0.1037 | 0.9224 | 0.0326 |
| (Madrid) | ALDE | 6 | 40.391270 | -4.209260 | 0.0011 | 0.0010 | 0.0010 | 0.9992 | 0.0003 | 0.1455 | 0.1292 | 0.1327 | 0.9028 | 0.0388 |
| Casa Ventorro (Cáceres) | VENT | 6 | 39.685520 | -5.763620 | 0.0012 | 0.0011 | 0.0011 | 0.9992 | 0.0004 | 0.1561 | 0.1359 | 0.1428 | 0.8954 | 0.0496 |
| Trujillo (Cáceres) | TRUJ | 6 | 39.474870 | -5.892462 | 0.0012 | 0.0011 | 0.0011 | 0.9992 | 0.0002 | 0.1556 | 0.1426 | 0.1424 | 0.8960 | 0.0318 |
| Belalcázar (Córdoba) | BELA | 6 | 38.594329 | -5.157927 | 0.0013 | 0.0011 | 0.0011 | 0.9992 | 0.0004 | 0.1617 | 0.1392 | 0.1478 | 0.8919 | 0.0550 |
| Valle de Alcudia (Ciudad Real) | ALCU | 6 | 38.566058 | -4.310914 | 0.0012 | 0.0011 | 0.0011 | 0.9992 | 0.0003 | 0.1554 | 0.1394 | 0.1420 | 0.8961 | 0.0369 |
| Santa Elena (Jaén) | SANT | 6 | 38.332111 | -3.529301 | 0.0012 | 0.0011 | 0.0011 | 0.9992 | 0.0004 | 0.1597 | 0.1397 | 0.1462 | 0.8934 | 0.0495 |


GENOMIC LIBRARY PREPARATION AND PROCESSING OF GENOMIC DATA

We used NucleoSpin Tissue kits (Macherey-Nagel, Düren, Germany) to extract and purify genomic DNA from the hind femur of each individual. Genomic DNA was processed into two genomic libraries using the double-digestion restriction-fragment-based procedure (ddRADseq) described in Peterson et al. (2012) (see Supporting Information Methods S1).

We used the different programs distributed as part of the STACKS v.1.35 pipeline (process_radtags, ustacks, cstacks, sstacks, and populations) to assemble our sequences into de novo loci and call genotypes (Hohenlohe et al., 2010; Catchen et al., 2011; Catchen et al., 2013). See Supporting Information Methods S2 for details on sequence assembling and data filtering.

POPULATION GENETIC STRUCTURE

We calculated genetic differentiation among populations and used Bayesian clustering and principal component analyses to determine the geographical patterns of genetic structure in the studied species. We then used this information to evaluate the degree of genetic fragmentation of the populations at different spatial scales, and to interpret inferences obtained from spatiotemporally explicit demographic model testing (Fig. 1). We calculated genetic differentiation (FST) between each pair of populations in ARLEQUIN v.3.5 (Excoffier & Lischer, 2010). We inferred genetic structure using the Bayesian Markov chain Monte Carlo clustering method implemented in the program STRUCTURE v.2.3.3 (Pritchard et al., 2000; Falush et al., 2003; Hubisz et al., 2009). We conducted STRUCTURE analyses hierarchically, initially analysing data from all populations jointly and, subsequently, running independent analyses for subsets of populations assigned to the same genetic cluster in the previous hierarchical level analysis (e.g. Noguerales et al., 2016). We performed 15 independent runs for each value of K, where K ranged from 1 to n+1 for each dataset of n populations, to estimate the “true” number of clusters. We ran STRUCTURE using 200,000 MCMC iterations after a burn-in step of 100,000 iterations, assuming correlated allele frequencies and admixture, and without using prior population information (Hubisz et al., 2009). We retained the ten runs having the highest likelihood for each value of K and identified the number of genetic clusters best fitting the data set using log probabilities [Pr(X|K)] (Pritchard et al., 2000) and the ΔK method (Evanno et al., 2005), as implemented in STRUCTURE HARVESTER (Earl & vonHoldt, 2012). We used CLUMPP v.1.1.2 and the Greedy algorithm to align multiple runs of STRUCTURE for the same K value (Jakobsson & Rosenberg, 2007) and DISTRUCT v.1.1 (Rosenberg, 2004) to visualize as bar plots each individual’s posterior probabilities of membership in the K genetic clusters. Complementary to Bayesian clustering analyses and in order to visualize the major axes of population genetic differentiation, we performed an individual-based principal components analysis (PCA) using the R v.3.3.3 (R Core Team, 2017) package ADEGENET (Jombart, 2008).
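As an illustration of the ΔK criterion mentioned above, the following minimal Python sketch computes ΔK from replicate log probabilities. It assumes the commonly used form ΔK = |L(K+1) - 2L(K) + L(K-1)| / SD[L(K)], where L(K) is the mean ln Pr(X|K) across runs; the function name delta_k and the toy values are hypothetical and are not taken from the STRUCTURE output of this study.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Return {K: delta K} given {K: [ln Pr(X|K) for each replicate run]}.

    Delta K is undefined for the smallest and largest K tested, because it
    needs the mean log probability at both K - 1 and K + 1.
    """
    mean_l = {k: mean(v) for k, v in lnp_by_k.items()}
    sd_l = {k: stdev(v) for k, v in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in mean_l or k + 1 not in mean_l:
            continue
        if sd_l[k] == 0:
            continue  # avoid division by zero when all runs are identical
        second_diff = abs(mean_l[k + 1] - 2 * mean_l[k] + mean_l[k - 1])
        result[k] = second_diff / sd_l[k]
    return result


# Toy log probabilities (invented for illustration only)
toy_runs = {
    1: [-1000, -1001, -999],
    2: [-800, -801, -799],
    3: [-790, -795, -785],
    4: [-788, -780, -796],
}
toy_delta_k = delta_k(toy_runs)
best_k = max(toy_delta_k, key=toy_delta_k.get)
```

In this toy example the largest ΔK is found at K = 2, the value at which the gain in log probability levels off.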

GENETIC DIVERSITY AND PAST DEMOGRAPHIC HISTORY

We used genomic data to reconstruct the time and scale of historical changes in effective population size (Ne) of each studied population and assess whether they have experienced parallel or contrasting demographic histories that might explain spatial heterogeneity in contemporary levels of genetic diversity (Fig. 1). First, we calculated different estimates of genetic diversity for each studied population (Table 1) using the program populations from STACKS (Catchen et al., 2013) and employed: (i) one-way ANOVAs to test for significant differences in genetic variability between the main clusters inferred by STRUCTURE and PCA analyses (see RESULTS section); and (ii) linear regressions to analyse geographical clines (i.e., latitude and longitude) of genetic diversity. We performed these analyses in SPSS v.24 software (IBM, NY, USA). Second, we reconstructed the past demographic history of the studied populations using the program STAIRWAY PLOT, a method based on the site frequency spectrum (SFS) that does not require whole-genome sequence data or reference genome information (Liu & Fu, 2015). Given the marked genetic structure and significant genetic differentiation of all studied populations (see RESULTS section), we ran STAIRWAY PLOT analyses for each sampled population individually (Liu & Fu, 2015; e.g. Mattila et al., 2017; Miles et al., 2017). To compute the SFS for each population, we ran the program populations from STACKS (Catchen et al., 2013) in order to export only the first SNP per RAD locus and retain loci with a minimum stack depth ≥ 5 (m = 5) that were represented in at least 50% of the individuals of the population (r = 0.5). To remove all missing data for the calculation of the SFS and minimize errors in allele frequency estimates, each population was down-sampled to five individuals using a custom Python script written by Qixin He and available on Dryad (Papadopoulou & Knowles, 2015). We ran STAIRWAY PLOT for each population fitting a flexible multi-epoch demographic model, considering a 1-year generation time, assuming the mutation rate per site per generation of 2.8 × 10⁻⁹ estimated for Drosophila melanogaster (Keightley et al., 2014), and performing 200 bootstrap replicates to estimate 95% confidence intervals.
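To make the SFS input concrete, the sketch below builds a folded site frequency spectrum from a population down-sampled to five diploid individuals with no missing data. It is not the Dryad script cited above: the genotype coding (0, 1 or 2 copies of the alternative allele), the use of a folded rather than unfolded spectrum, and the function name folded_sfs are assumptions made for illustration.

```python
from collections import Counter


def folded_sfs(genotypes, n_ind=5):
    """Folded SFS for n_ind diploid individuals without missing data.

    genotypes: list of SNPs, each a list of n_ind alternative-allele counts
    (0, 1 or 2). Returns a list whose element i - 1 is the number of SNPs
    with minor allele count i, for i = 1 .. n_ind.
    """
    n_chr = 2 * n_ind
    counts = Counter()
    for snp in genotypes:
        if len(snp) != n_ind or any(g not in (0, 1, 2) for g in snp):
            raise ValueError("each SNP needs n_ind genotypes coded 0, 1 or 2")
        alt = sum(snp)
        minor = min(alt, n_chr - alt)  # fold: count the rarer allele
        if minor > 0:                  # monomorphic sites are not in the SFS
            counts[minor] += 1
    return [counts[i] for i in range(1, n_ind + 1)]


# Toy genotypes: one SNP per locus, five individuals (invented values)
toy_genotypes = [
    [0, 0, 0, 0, 1],  # minor allele count 1
    [1, 1, 0, 0, 0],  # minor allele count 2
    [2, 2, 2, 2, 1],  # alternative count 9, folds to 1
    [2, 2, 2, 2, 2],  # fixed, ignored
    [1, 1, 1, 1, 1],  # minor allele count 5
]
toy_sfs = folded_sfs(toy_genotypes)
```

Down-sampling to a fixed number of individuals keeps the number of chromosomes, and hence the number of SFS bins, identical across loci.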

ENVIRONMENTAL NICHE MODELLING

We built an environmental niche model (ENM) to predict the geographic distribution of climatically suitable habitats for D. hispanicus both in the present and during the last glacial maximum (LGM, 21 ka). We used this information to create maps of environmental suitability in these two time periods and test dynamic demographic models considering hypothetical distributional shifts experienced by the species in response to Pleistocene climatic oscillations (e.g. He et al., 2013; Massatti & Knowles, 2016) (see details in section TESTING ALTERNATIVE DEMOGRAPHIC MODELS). To build the ENM, we used the maximum entropy algorithm implemented in MAXENT v.3.3.3 (Elith et al., 2006; Phillips et al., 2006; Phillips & Dudík, 2008; Elith et al., 2011) and the 19 bioclimatic variables from the WORLDCLIM dataset (http://www.worldclim.org/) interpolated to 30-arcsec resolution (~1 km² cell size) (Hijmans et al., 2005). To generate climate suitability maps during the LGM, we projected the ENM onto LGM bioclimatic conditions derived from the CCSM4 (Community Climate System Model; Braconnot et al., 2007) and the MIROC-ESM (Model of Interdisciplinary Research on Climate; Hasumi & Emori, 2004) general atmospheric circulation models. Palaeoclimatic reconstructions under both models yielded qualitatively similar predictions about the distribution of the species during the LGM, and we performed all subsequent analyses using the MIROC-ESM model (e.g. Wachter et al., 2016). We built the ENM using our own species occurrence data and records available in the literature, the Global Biodiversity Information Facility (https://www.gbif.org/), and entomological collections from the Spanish National Museum of Natural Sciences (MNCN, Madrid) (see Table S1). Prior to modelling, we mapped and examined all records to identify and exclude those having obvious geo-referencing errors. To reduce the problems associated with sampling bias and the spatial aggregation of records, we applied a systematic sampling correction by randomly selecting a single occurrence record among those falling within the same ~16-km² grid cell (Fourcade et al., 2014). After this data filtering step, we retained a total of 57 occurrence records. ENMs built using all available records (n = 103; Table S1) or performing systematic sampling corrections at other spatial scales (~1-km², n = 96 records; ~4-km², n = 89 records; ~8-km², n = 80 records) yielded qualitatively identical results. We used the R package ENMEVAL (Muscarella et al., 2014) and Akaike’s Information Criterion corrected for small sample size (AICc) (Burnham & Anderson, 1998; Warren & Seifert, 2011) to conduct parameter tuning and determine the optimal feature class (FC) and regularization multiplier (RM) settings for MAXENT. In addition, we evaluated the performance of the retained model using the “block” method for data partitioning into training and testing datasets (Muscarella et al., 2014). Specifically, we calculated the area under the receiver-operating characteristic plot on the testing data (AUCTEST) and the minimum training presence omission rate (ORMTP). An AUCTEST value > 0.9 suggests a high discriminatory ability of the model (Peterson et al., 2011), whereas an ORMTP close to zero is indicative of a low degree of model overfitting (Radosavljevic & Anderson, 2014). Further details on model and variable selection are presented in Supporting Information Methods S3.
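The systematic sampling correction described above (one randomly chosen record per grid cell) can be sketched as follows. This is only an illustration under stated assumptions: the grid is defined in decimal degrees through a hypothetical cell_deg parameter rather than in km², and the function name thin_records, the seed and the toy coordinates are invented.

```python
import math
import random


def thin_records(records, cell_deg, seed=0):
    """Keep one randomly chosen occurrence record per grid cell.

    records: list of (latitude, longitude) tuples in decimal degrees.
    cell_deg: side of the square grid cell in degrees (hypothetical choice).
    """
    rng = random.Random(seed)  # fixed seed so the thinning is reproducible
    cells = {}
    for lat, lon in records:
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells.setdefault(key, []).append((lat, lon))
    # One random record from each occupied cell
    return [rng.choice(group) for _, group in sorted(cells.items())]


# Toy records: the first two fall in the same cell, the third does not
toy_records = [(40.001, -4.001), (40.002, -4.002), (40.5, -4.5)]
thinned = thin_records(toy_records, cell_deg=0.04)
```

Repeating the thinning with different cell sizes is one way to check, as done in the text, that the model is not sensitive to the chosen spatial scale.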

TESTING ALTERNATIVE DEMOGRAPHIC MODELS

We used the integrative distributional, demographic and coalescent (iDDC) approach (He et al., 2013) and an Approximate Bayesian Computation (ABC) framework (Beaumont et al., 2002; Csilléry et al., 2010) to test alternative models representing different hypotheses about how landscape heterogeneity has impacted the demography and spatial distribution of genomic variation in our focal taxon (Fig. 1). This approach is described in detail in He et al. (2013) and consists of three main steps: (i) constructing alternative demographic models representing different hypotheses about how carrying capacities (k) vary across space and through time (i.e., spatiotemporally explicit models); (ii) running demographic and genetic simulations under each model using the software SPLATCHE2 (see Ray et al., 2010); (iii) evaluating the fit of observed genomic data (i.e., empirical data obtained after genotyping sampled populations) to the genetic expectations under each model, identifying the most probable model/s, and estimating demographic parameters (e.g. He et al., 2013; Massatti & Knowles, 2016) (Fig. 1). Below we indicate the most relevant aspects for each of these three steps and present an extended explanation in Supporting Information Methods S4.

Constructing demographic models. We generated three basic demographic models: (i) a static model in which carrying capacities (k) are homogeneous across space and time. This scenario is analogous to a flat landscape or an isolation-by-distance model (e.g. He et al., 2013); (ii) a dynamic model representing distributional shifts resulting from the interaction between the species’ bioclimatic envelope and Pleistocene glacial cycles (e.g. He et al., 2013; Massatti & Knowles, 2016). In this scenario, carrying capacities change over time according to climatic suitability maps obtained from projections of the ENM to the present and the LGM bioclimatic conditions (see section ENVIRONMENTAL NICHE MODELLING). This dynamic model considered landscapes from three consecutive time periods (LGM, intermediate, current) reflecting temporal shifts in the spatial distribution of environmentally suitable areas for the species in response to climate changes since the LGM (e.g. He et al., 2013; Massatti & Knowles, 2016). As done in previous studies, carrying capacities were scaled proportionally to logistic climatic suitability scores obtained from the ENM for each time period (e.g. He et al., 2013; Massatti & Knowles, 2016; Knowles & Massatti, 2017). Thus, we assumed that the carrying capacity for each grid cell was proportional to the estimated probability of presence of the species in that grid cell; (iii) a dynamic model starting with a flat (i.e., non-fragmented) landscape that shifted to a fragmented landscape in which carrying capacities for habitats unsuitable for the species were n times lower than those assigned to suitable habitats (Fig. 3; Table 2).

129

CAPÍTULO III

Figure 3 Layers used to run alternative demographic models in SPLATCHE2. Model A is a static model defined by a flat landscape in which carrying capacities of demes (k) are homogeneous across space and time. Model D is a dynamic model in which carrying capacities change over three time periods and are scaled proportionally to logistic environmental suitability scores (red: high; blue: low) obtained from projections of the species-specific environmental niche model (ENM) to the present (panel c) and the last glacial maximum (panel a). Model G is a dynamic model starting with a flat landscape (panels a and b) that shifted to a contemporary fragmented landscape in which carrying capacities for habitats unsuitable for the species (in blue) are smaller than those assigned to suitable habitats (in red) (panel c). Each of these three basic models was also run considering barriers to dispersal (areas with slopes > 20% not occupied by the focal taxon; in black) (Models B, E, and H, respectively) and incorporating a friction layer defined by topographic roughness (Models C, F, and I, respectively; not shown). Green dots in initial layers of each model indicate the locations of ancestral populations used to initiate the simulations. Panel (c) for Model D shows the species records (black crosses) used to build the ENM. The geographical location of the layers used to run the models is indicated in Fig. 2.


Table 2 Statistics from the Approximate Bayesian Computation (ABC) procedure used for evaluating the relative support of each demographic model. A higher marginal density corresponds to a higher model support and non-significant p-values indicate that the model is able to reproduce data in agreement with observed empirical data (Wegmann et al., 2010). Bayes factors (BF) represent the degree of relative support for the most supported model (in bold) over the other models. BF > 20 indicate strong support, while those > 150 indicate very strong support (Kass & Raftery, 1995). Models G-I were fine-tuned considering different relative values of carrying capacities for unsuitable and suitable habitats (1:10, 1:100 and 1:1000) (Ratio). The coefficient of determination (R²) between each parameter and the five partial least squares (PLS) components retained is indicative of the power of estimating the parameters.

| Model | Description | Ratio | Marginal density | p-value | Bayes Factor | R² (Kmax) | R² (m) | R² (NANC) |
|---|---|---|---|---|---|---|---|---|
| A | Flat landscape | - | 9.06 × 10⁻⁷ | 0.078 | 448.08 | 0.275 | 0.826 | 0.924 |
| B | Flat landscape + barriers | - | 3.60 × 10⁻⁵ | 0.357 | 11.26 | 0.417 | 0.837 | 0.916 |
| C | Flat landscape + roughness | - | 7.11 × 10⁻⁸ | 0.005 | 5708.09 | 0.385 | 0.812 | 0.927 |
| D | ENM | - | 4.02 × 10⁻⁸ | 0.006 | 10089.70 | 0.267 | 0.826 | 0.929 |
| E | ENM + barriers | - | 7.03 × 10⁻⁶ | 0.335 | 57.78 | 0.386 | 0.836 | 0.927 |
| F | ENM + roughness | - | 1.15 × 10⁻⁷ | 0.002 | 3534.66 | 0.331 | 0.818 | 0.923 |
| G | Habitat | 1:10 | 1.85 × 10⁻⁵ | 0.774 | 21.95 | 0.301 | 0.822 | 0.918 |
| G | Habitat | 1:100 | 2.89 × 10⁻⁵ | 0.908 | 14.07 | 0.313 | 0.824 | 0.918 |
| G | Habitat | 1:1000 | 1.93 × 10⁻⁵ | 0.804 | 20.99 | 0.288 | 0.826 | 0.920 |
| H | Habitat + barriers | 1:10 | 4.06 × 10⁻⁴ | 0.957 | 1.00 | 0.397 | 0.845 | 0.922 |
| H | Habitat + barriers | 1:100 | 1.88 × 10⁻⁴ | 0.988 | 2.16 | 0.372 | 0.825 | 0.924 |
| H | Habitat + barriers | 1:1000 | 7.72 × 10⁻⁵ | 0.804 | 5.26 | 0.366 | 0.820 | 0.921 |
| I | Habitat + roughness | 1:10 | 1.38 × 10⁻⁷ | 0.010 | 2943.72 | 0.371 | 0.834 | 0.923 |
| I | Habitat + roughness | 1:100 | 1.50 × 10⁻⁶ | 0.032 | 270.10 | 0.374 | 0.844 | 0.929 |
| I | Habitat + roughness | 1:1000 | 2.12 × 10⁻⁶ | 0.060 | 191.87 | 0.348 | 0.817 | 0.923 |
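The Bayes factors in Table 2 follow directly from the marginal densities: each one is the marginal density of the best-supported model divided by that of the alternative model. The short Python sketch below recomputes them from the rounded densities printed in the table, so small discrepancies with the tabulated values are expected; the dictionary keys and the helper name support_label are hypothetical.

```python
# Marginal densities as printed in Table 2 (rounded to three significant digits)
marginal_density = {
    "A": 9.06e-07, "B": 3.60e-05, "C": 7.11e-08,
    "D": 4.02e-08, "E": 7.03e-06, "F": 1.15e-07,
    "G 1:10": 1.85e-05, "G 1:100": 2.89e-05, "G 1:1000": 1.93e-05,
    "H 1:10": 4.06e-04, "H 1:100": 1.88e-04, "H 1:1000": 7.72e-05,
    "I 1:10": 1.38e-07, "I 1:100": 1.50e-06, "I 1:1000": 2.12e-06,
}

# The best-supported model is the one with the highest marginal density
best_model = max(marginal_density, key=marginal_density.get)

# Bayes factor of the best model over each alternative
bayes_factor = {
    model: marginal_density[best_model] / density
    for model, density in marginal_density.items()
}


def support_label(bf):
    """Thresholds quoted in the caption of Table 2 (Kass & Raftery, 1995)."""
    if bf > 150:
        return "very strong"
    if bf > 20:
        return "strong"
    return "below the 'strong' threshold"


labels = {model: support_label(bf) for model, bf in bayes_factor.items()}
```

Under these thresholds, the flat landscape model (A) is rejected with very strong support relative to model H (1:10), whereas the other two parameterizations of model H cannot be clearly distinguished from it.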

The distribution of suitable habitats was estimated from Corine Land Cover maps (CORINE, 2012). We considered as suitable habitats for the species the Corine Land Cover classes “broad-leaved forest”, “transitional woodland-shrub”, “land principally occupied by agriculture, with significant areas of natural vegetation”, “sclerophyllous vegetation”, “agro-forestry areas”, “pastures” and “natural grasslands”, which represent the habitats used by the species according to occurrence data (Table S1). We considered as unsuitable habitats all other land cover classes, which are never occupied by the species. Habitat specialist grasshoppers are tightly linked to their specific microhabitats and generally show low rates of inter-patch dispersal (e.g. Reinhardt et al., 2005; Ortego et al., 2010; Ortego et al., 2015). In the case of D. hispanicus this is evidenced by the strong genetic fragmentation of its populations at both local and regional spatial scales (see RESULTS section). Thus, the land use classes occupied by D. hispanicus are likely to be an accurate proxy of suitable habitats that sustain much larger carrying capacities than those habitat categories where the species has never been recorded. Dynamic models considering the impacts of human-driven habitat fragmentation started with a hypothetically non-fragmented landscape that 250 years ago shifted to a fragmented landscape with a spatial configuration of suitable/unsuitable habitats that has not changed until present. Thus, this model assumes that habitat fragmentation took place 250 years ago, that the spatial configuration of suitable habitats after fragmentation has remained unaltered until present times, and that the carrying capacities of unsuitable habitats are n times smaller than those of habitats suitable for the species. We fine-tuned this model considering different relative values of carrying capacities for unsuitable and suitable habitats (1:10, 1:100 and 1:1000) (e.g. Paquette et al., 2014; Ortego et al., 2015). The three orders of magnitude considered for the relative carrying capacities of unsuitable and suitable habitats span a range that has been found to be informative in another specialist grasshopper from the Iberian Peninsula inhabiting similar flat lowland landscapes (Ortego et al., 2015). Finally, we generated two variants of the three basic models described above by incorporating information from geographical features that might hypothetically hinder (topographic roughness) or impede (impassable barriers to dispersal) gene flow among populations. The first variant incorporated the hypothetical negative impact of topographic roughness on migration rates with neighbouring demes (specified as a friction parameter in SPLATCHE2; Ray et al., 2010). The second variant incorporated the presence of impassable barriers to dispersal (k = 0). We considered grid cells with a slope > 20% as impassable barriers to dispersal, as these areas are not occupied by the focal taxon according to occurrence data (Table S1). We calculated topographic roughness (slope) using a 90-m resolution digital elevation model from the NASA Shuttle Radar Topographic Mission (SRTM Digital Elevation Data; http://srtm.csi.cgiar.org/). Overall, we generated a total of nine demographic models resulting from the combinations of three basic static/dynamic models with the two hypothetical effects of topography (topographic roughness and the presence/absence of barriers to dispersal) (Fig. 1 and 3; Table 2).
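The way carrying capacities are assigned in the ENM-based, habitat-based and barrier variants can be sketched on a toy grid as follows. This is a simplified illustration, not the SPLATCHE2 input format: the function names, the 2 × 2 landscape and the maximum carrying capacity of 1,000 are invented, and the friction (roughness) variant is not shown.

```python
def suitability_to_k(suitability, k_max):
    """ENM-based layer: K proportional to suitability scores in [0, 1]."""
    return [[round(k_max * s) for s in row] for row in suitability]


def fragmented_k(suitable, k_max, ratio):
    """Habitat-based layer: suitable cells get k_max, others k_max / ratio."""
    return [[k_max if ok else round(k_max / ratio) for ok in row]
            for row in suitable]


def apply_barriers(k_layer, slope_pct, threshold=20.0):
    """Barrier variant: K = 0 where slope (in %) exceeds the threshold."""
    return [[0 if s > threshold else k for k, s in zip(k_row, s_row)]
            for k_row, s_row in zip(k_layer, slope_pct)]


# Toy 2 x 2 landscape (all values invented for illustration)
K_MAX = 1000
suitability = [[0.0, 0.5], [1.0, 0.25]]      # logistic ENM scores
suitable = [[True, False], [False, True]]    # land cover reclassification
slope_pct = [[5.0, 25.0], [0.0, 20.0]]       # slope in percent

layer_enm = suitability_to_k(suitability, K_MAX)
layer_habitat = fragmented_k(suitable, K_MAX, ratio=100)
layer_habitat_barriers = apply_barriers(layer_habitat, slope_pct)
```

Note that a cell with a slope of exactly 20% is not treated as a barrier here, following the "> 20%" criterion stated in the text.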


Demographic and genetic simulations. We used SPLATCHE2 to perform forward-in-time demographic simulations followed by backward-in-time genetic (coalescent) simulations under each model (see Ray et al., 2010), which are expected to produce contrasting patterns of genetic variation due to differences among scenarios in the way that carrying capacities vary across the landscape and through time (see Massatti & Knowles, 2016) (Fig. 3). For each model, we ran 200,000 simulations using the same uniform priors for the three demographic parameters employed: migration rate per deme per generation (m; range of log(m): -2.3, -1.2), carrying capacity of the deme with highest suitability (Kmax; range of log(Kmax): 5.2, 7.2), and ancestral population size (NANC; range of log(NANC): 2.0, 4.0). Following each time-forward demographic simulation, a spatially-explicit time-backward coalescent model informed by the deme-specific demographic parameters (K, m and NANC) was used to generate genetic data (Currat et al., 2004; Ray et al., 2010). We used ARLSUMSTAT v.3.5.2 to calculate a total of 67 summary statistics for simulated datasets under each model, including mean heterozygosity across loci for each population and across populations (H), number of segregating sites for each population and across populations (S), and pairwise population FST values (Excoffier & Lischer, 2010). The same 67 summary statistics were extracted from observed empirical data (Fig. 1).
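Drawing parameter values from priors that are uniform on a logarithmic scale can be sketched as below. The bounds are those given in the text, but the base of the logarithm is not stated there: base 10 is assumed here purely for illustration, and the names PRIORS_LOG, draw_parameters and the random seed are hypothetical.

```python
import random

# Prior bounds on the log scale, as given in the text.
# ASSUMPTION: logarithms are base 10 (the source only writes "log").
PRIORS_LOG = {
    "m": (-2.3, -1.2),     # migration rate per deme per generation
    "Kmax": (5.2, 7.2),    # carrying capacity of the most suitable deme
    "Nanc": (2.0, 4.0),    # ancestral population size
}


def draw_parameters(rng):
    """Draw one parameter set, uniform on the log scale for each parameter."""
    return {name: 10 ** rng.uniform(low, high)
            for name, (low, high) in PRIORS_LOG.items()}


rng = random.Random(42)  # fixed seed for reproducibility of the sketch
# A small batch for illustration; the study ran 200,000 simulations per model
prior_draws = [draw_parameters(rng) for _ in range(1000)]
```

Sampling uniformly on the log scale gives equal prior weight to each order of magnitude, which is convenient when a parameter is uncertain across several orders of magnitude.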

Model choice and parameter estimation. We used an Approximate Bayesian computation (ABC) framework to perform model selection and parameter estimation (for an overview of ABC, see Beaumont et al., 2002), as implemented in the programs TRANSFORMER and ABCESTIMATOR and R scripts (findPLS) distributed as part of ABCTOOLBOX (Wegmann et al., 2010; e.g. He et al., 2013; Massatti & Knowles, 2016). In order to account for correlations between summary statistics and reduce the “curse of dimensionality” associated with using a large number of statistics (Boulesteix & Strimmer, 2007), we used the R package PLS v.2.6-0 (Mevik & Wehrens, 2007) and the findPLS script to extract partial least squares (PLS) components with Box-Cox transformation from the summary statistics of the first 10,000 simulations for each model (Boulesteix & Strimmer, 2007; Wegmann et al., 2010; e.g. He et al., 2013). We used the linear combinations of summary statistics obtained from the first 10,000 simulations for each model to transform all datasets (observed empirical and simulated datasets) with the program TRANSFORMER (for details about this procedure, see Wegmann et al., 2010). For each model, we retained the 1,000 simulations (0.5%) closest to observed empirical data and used them to approximate marginal densities and posterior distributions of the parameters with a postsampling regression adjustment using the ABC-GLM (general linear model) procedure detailed in Leuenberger & Wegmann (2010) and implemented in ABCTOOLBOX (see also Csilléry et al., 2010). We used Bayes factors (BF) for model selection, defined as the ratio between marginal densities of the model with the highest marginal density and the alternative model (Jeffreys, 1961). The higher the ratio is, the more supported the first model is. A BF > 20 indicates strong relative support for the first model, while a BF > 150 indicates very strong support (Jeffreys, 1961; Kass & Raftery, 1995; Leuenberger & Wegmann, 2010). To evaluate the ability of each model to generate the observed empirical data, we calculated Wegmann’s p-value from the 1,000 retained simulations (Wegmann et al., 2010). The p-value is calculated as the fraction of the retained simulations with a smaller or equal likelihood than the observed empirical data, with low values indicating that a model is highly unlikely (Wegmann et al., 2010). We also assessed the potential for a parameter to be correctly estimated by computing the proportion of parameter variance that was explained (i.e., the coefficient of determination, R²) by the retained PLSs (Neuenschwander et al., 2008). Finally, for the most supported model/s, we determined the accuracy of parameter estimation using a total of 1,000 pseudo-observation datasets (PODs) generated from prior distributions of the parameters. If the estimation of the parameters is unbiased, posterior quantiles of the parameters obtained from PODs should be uniformly distributed (Cook et al., 2006; Wegmann et al., 2010). As done for observed empirical data, we calculated the posterior quantiles of true parameters for each pseudo run based on the posterior distribution of the regression-adjusted 1,000 simulations closest to each pseudo-observation (e.g. Massatti & Knowles, 2016).
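The rejection step described above, in which only the simulations closest to the observed data are retained, can be illustrated with the following sketch. It assumes a plain Euclidean distance between transformed summary statistics (PLS components) and is not the ABCTOOLBOX implementation; the function name retain_closest and the toy values are hypothetical, and the subsequent regression adjustment (ABC-GLM) is not shown.

```python
import math


def retain_closest(sim_stats, obs_stats, n_keep):
    """Return indices of the n_keep simulations closest to the observed data.

    sim_stats: list of summary-statistic vectors, one per simulation.
    obs_stats: summary-statistic vector of the observed data.
    Distance is Euclidean (an assumption made for this sketch).
    """
    distances = [(math.dist(stats, obs_stats), index)
                 for index, stats in enumerate(sim_stats)]
    distances.sort()
    return [index for _, index in distances[:n_keep]]


# Toy example with two PLS components per dataset (invented values)
toy_observed = [0.0, 0.0]
toy_simulated = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.1, 0.1]]
toy_retained = retain_closest(toy_simulated, toy_observed, n_keep=2)

# Tolerance used in the study: 1,000 retained out of 200,000 simulations
tolerance = 1000 / 200000
```

With 200,000 simulations per model and 1,000 retained, the tolerance is 0.5%, as stated in the text.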


RESULTS

GENOMIC DATASET

After quality filtering, a total of 136,580,290 reads were retained across all genotyped individuals (mean ± SD = 2,239,021 ± 657,973) (Fig. S1). The final exported dataset obtained with STACKS after removing loci that did not meet the different filtering requirements (p = 10, r = 0.8, and min_maf = 0.01) contained 18,249 SNPs, with a proportion of missing data of 1.68%.

POPULATION GENETIC STRUCTURE

Pair-wise FST values ranged from 0.082 to 0.307 and all estimates were significantly different from zero based on 100 permutations (P < 0.05; Table S2). STRUCTURE analyses revealed a good congruence between geography and the genetic clusters inferred at different hierarchical levels (Fig. 4). Analyses considering all populations showed the presence of two main genetic clusters corresponding to populations located north (TRAB, NAVA, PNEG, AVIL; hereafter, Northern populations) and south (ALDE, VENT, TRUJ, BELA, ALCU, SANT; hereafter, Southern populations) of the Central Mountain System (Fig. 4 and Figs. S2-S3).

STRUCTURE analyses performed at lower hierarchical levels for Southern populations also showed a good correspondence between geography and the identified genetic clusters, with two well defined genetic clusters separating southernmost populations (BELA, ALCU, SANT) from the rest of Southern populations (ALDE, VENT, TRUJ) (Fig. 4). However, the congruence between geography and genetic substructure among Northern populations was less obvious and some populations (AVIL, PNEG) were often assigned to different genetic clusters than those grouping nearby and spatially interspersed populations (Fig. 4). The results obtained with STRUCTURE were in good agreement with those obtained from Principal Component Analyses (PCA) in which PC1 split Northern and Southern populations, and PC2 separated southernmost populations from the rest of Southern populations (Fig. S4).


Figure 4 Results of STRUCTURE analyses showing posterior probability plots of individual assignments to the different genetic clusters inferred at each hierarchical level. Subsets of populations included in each analysis were defined according to their assignment to the genetic clusters inferred at the previous hierarchical level. Each individual is represented by a vertical bar, which is partitioned into k coloured segments showing the individual’s probability of belonging to the cluster with that colour. Thin vertical black lines separate individuals from different populations. Population codes are described in Table 1.

GENETIC DIVERSITY AND PAST DEMOGRAPHIC HISTORY

Population genetic statistics (π, HO, HE, P and FIS) calculated with STACKS for all positions (polymorphic and non-polymorphic) and only considering variant positions are presented in Table 1. One-way ANOVAs showed that Northern populations have lower levels of genetic diversity than Southern populations (π: F1,8 = 35.34, P < 0.001; HO: F1,8 = 13.43, P = 0.006). Accordingly, genetic diversity was negatively correlated with latitude (π: r = -0.853, t = -4.62, P = 0.002; HO: r = -0.818, t = -4.02, P = 0.004) but showed no association with longitude (π: r = 0.052, t = 0.15, P = 0.887; HO: r = 0.043, t = 0.12, P = 0.906). STAIRWAY PLOT analyses revealed that contemporary effective population sizes for Northern populations are lower than those inferred for Southern populations (Fig. 5), which is in agreement with the observed negative latitudinal cline of genetic diversity (Fig. 2). Two Northern populations (AVIL and PNEG) experienced a deep bottleneck starting 40 ka BP. These severe bottlenecks reduced Ne by ~75%, from ~100,000 to ~25,000 diploid individuals, and lasted until the LGM (21 ka), when the two populations abruptly expanded (Fig. 5a). Two populations (VENT and BELA) showed signals of small demographic bottlenecks (Ne reduction by ~10-15%) occurring at different time periods between 50 and 30 ka BP (Fig. 5b). With the exception of AVIL and PNEG, all populations have experienced demographic expansions of different magnitude after long stable periods with negligible fluctuations in population size. Demographic expansions in most of these populations (NAVA, VENT, TRUJ, BELA, ALCU) started 40-20 ka BP from an ancestral population of ~80,000-100,000 diploid individuals. Two populations (TRAB and ALDE) experienced earlier (100-120 ka BP) abrupt expansions followed by a long period of demographic stability spanning until the present. These expansions extended for over 10-20 ka and strong variability in their magnitude is reflected in the remarkable differences in Ne and estimates of genetic diversity among contemporary populations (Figs. 2 and 5; Table 1).

Figure 5. Demographic history of each studied population of Dociostaurus hispanicus inferred using STAIRWAY PLOT. Lines show the median estimate of Ne over time for (a) Northern populations (in blue) and (b) Southern populations (in red). Ne is the effective population size, assuming a mutation rate of 2.8 × 10⁻⁹ per site per generation and a 1-year generation time. Population codes shown in the legend of each panel are described in Table 1.

ENVIRONMENTAL NICHE MODELLING

Parameter tuning performed using ENMEVAL showed that an LQH feature class (FC) combination and a regularization multiplier (RM) of 2 minimized AICc. After removing highly correlated variables and those with a zero percent contribution, nine bioclimatic layers were retained to construct the final ENM. These variables, sorted by their percent contribution (in parentheses), included: temperature seasonality (BIO4; 26.2%); mean temperature of wettest quarter (BIO8; 23.2%); precipitation of wettest quarter (BIO16; 18.0%); precipitation seasonality (BIO15; 9.7%); precipitation of driest month (BIO14; 6.8%); mean temperature of coldest quarter (BIO11; 6.6%); isothermality (BIO3; 4.8%); mean diurnal range (BIO2; 2.7%); and mean temperature of driest quarter (BIO9; 1.9%). The high AUCTEST (0.905) and low ORMTP (0.017) estimates for the most supported model indicate that it has high discriminatory power and a low degree of overfitting, respectively. Accordingly, inspection of predicted distributions for the present confirmed that the ENM yielded a distribution pattern coherent with the observed current distribution of the species (see panel c from Model D in Fig. 3). The projection of the present-day climate niche envelope to LGM climatic conditions suggests some important temporal changes in the distribution and patterns of population connectivity of the species (Fig. 3a). Climatic suitability maps predicted a more continuous distribution and overall higher suitability during the LGM than in the present and suggest that the species has experienced a northward range contraction and considerable fragmentation across its entire distribution after the last glacial period (Fig. 3).
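The two evaluation metrics mentioned above can be illustrated with a minimal sketch. It assumes the usual definitions: AUC as the probability that a presence locality receives a higher suitability score than a background point (ties counted as one half), and ORMTP as the fraction of test localities whose suitability falls below the lowest suitability among training localities. The scores are made-up toy values, not data from this study, and the function names are illustrative rather than part of ENMEVAL.

```python
def auc(presence_scores, background_scores):
    """Probability that a presence outranks a background point (ties = 0.5)."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))

def omission_rate_mtp(train_scores, test_scores):
    """Fraction of test presences below the minimum training presence score."""
    threshold = min(train_scores)
    omitted = sum(1 for s in test_scores if s < threshold)
    return omitted / len(test_scores)

# Hypothetical suitability scores, for illustration only.
toy_train = [0.4, 0.6, 0.7]
toy_test = [0.3, 0.5, 0.8, 0.9]
toy_background = [0.1, 0.2, 0.35, 0.85]

print("AUC on test presences:", auc(toy_test, toy_background))
print("Omission rate (MTP):", omission_rate_mtp(toy_train, toy_test))
```

An AUC close to 1 points to good discrimination, while an omission rate close to 0 at this threshold is what a model that does not overfit the training localities would be expected to show.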

TESTING ALTERNATIVE DEMOGRAPHIC MODELS

Based on the relative information content contained in the PLS components, the first five PLSs were selected and extracted from the summary statistics for ABC analyses (Fig. S5). According to marginal densities calculated from the 0.5% of simulations closest to the observed empirical data, Model H had the highest support (Table 2). The high Wegmann p-value (p-value = 0.957) of this model suggests a strong correspondence between observed empirical data and simulated genetic data (Table 2). This model incorporates barriers to dispersal (areas with slopes > 20 % not occupied by the focal taxon are assigned a k = 0) and starts with a flat landscape with homogeneous carrying capacities (i.e., non-fragmented) that 250 years ago shifted to a fragmented landscape in which carrying capacities for habitats unsuitable for the species are ten times smaller than those assigned to suitable habitats. Fine-tuning of models G-I considering different ratios of carrying capacities for unsuitable and suitable habitats (1:10, 1:100 and 1:1000) did not have a strong impact on relative model fit to empirical data and Model H was consistently the most supported model in all cases (Table 2). The ratio of carrying capacities for unsuitable and suitable habitats that best fitted empirical data differed among models (1:100 for Model G, 1:10 for Model H and 1:1000 for Model I; Table 2), indicating that the best solution under each specific model was reached through different combinations of demographic parameter values and ratios of carrying capacities between the two habitat classes. Analogous models considering alternative times for the onset of habitat fragmentation had consistently lower support (125 years: BF = 14.53; 500 years: BF = 2.62; 1000 years: BF = 3.33; 2000 years: BF = 7.49). The second most supported model (Model B; BF = 11.26; Table 2) was the one considering a flat landscape incorporating barriers to dispersal (Fig. 3). The support for this model was not strongly different from that of Model H (BF < 20; Table 2). All other models had differences in BF > 20 with the most supported model (Table 2), indicating strong relative support for Model H (Jeffreys, 1961; Kass & Raftery, 1995).
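The interpretation of the Bayes factors reported above can be made explicit with a short sketch. It assumes the evidence categories of Kass & Raftery (1995) (1-3, 3-20, 20-150 and > 150) and that each BF expresses the support for Model H relative to the alternative. The BF values are those given in the text; the function name and dictionary labels are illustrative only.

```python
def evidence_category(bf):
    """Kass & Raftery (1995) category for a Bayes factor favouring the best model."""
    if bf < 1:
        return "favours the alternative model"
    if bf < 3:
        return "not worth more than a bare mention"
    if bf < 20:
        return "positive"
    if bf <= 150:
        return "strong"
    return "very strong"

# Bayes factors of Model H against each alternative, as reported in the text.
reported_bf = {
    "Model B (flat landscape with barriers)": 11.26,
    "fragmentation onset 125 years ago": 14.53,
    "fragmentation onset 500 years ago": 2.62,
    "fragmentation onset 1000 years ago": 3.33,
    "fragmentation onset 2000 years ago": 7.49,
}

for model, bf in reported_bf.items():
    print(f"{model}: BF = {bf:.2f} -> {evidence_category(bf)}")
```

Under this scale none of the listed alternatives reaches the BF > 20 threshold used in the text to indicate strong relative support for Model H.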

Posterior distributions of parameter estimates were considerably distinct from the prior, indicating that the simulated data contained information relevant to estimating the parameters (Fig. S6). The comparison of posterior distributions before and after the ABC-GLM also showed the improvement that this procedure had on parameter estimates (blue shading vs. solid black line in Fig. S6). The posterior distribution of maximum carrying capacity (Kmax) was flatter than those obtained for ancestral population size (NANC) and migration rates (m) (Fig. S6). Accordingly, the coefficients of determination (R²) from a multiple regression between each demographic parameter and the five retained PLS components indicated that the employed summary statistics had a high potential to correctly estimate all the parameters except Kmax (Table 2). The posterior quantiles of m were uniformly distributed, indicating that this parameter is estimated without bias (Fig. S7b). In contrast, the Kolmogorov-Smirnov test indicated that the posterior quantiles for Kmax and NANC deviated from a uniform distribution, suggesting a potential bias in the estimation of these parameters (Fig. S7a,c).
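The logic of this validation step can be sketched as follows: if a parameter is estimated without bias, the posterior quantiles of the true values across pseudo-observed data sets should follow a uniform distribution on [0, 1], and the Kolmogorov-Smirnov statistic measures the largest departure from it. The example below computes only the statistic D, not its p-value, and uses made-up quantiles rather than values from this study; the function name is illustrative.

```python
def ks_uniform_statistic(quantiles):
    """Kolmogorov-Smirnov distance between a sample and the Uniform(0, 1) CDF."""
    xs = sorted(quantiles)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        # Compare the empirical CDF just after and just before x with the
        # uniform CDF, which equals x on [0, 1].
        d = max(d, i / n - x, x - (i - 1) / n)
    return d

# Hypothetical posterior quantiles, for illustration only.
evenly_spread = [(i + 0.5) / 20 for i in range(20)]  # close to uniform
skewed_to_zero = [q ** 3 for q in evenly_spread]     # piled up near 0

print("D (evenly spread):", round(ks_uniform_statistic(evenly_spread), 3))
print("D (skewed):", round(ks_uniform_statistic(skewed_to_zero), 3))
```

A small D is compatible with unbiased estimation, whereas a large D, as in the skewed toy sample, corresponds to the kind of deviation from uniformity that would flag a potentially biased parameter.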

DISCUSSION

In a first (observational) step, we employed genomic data to infer the spatial patterns of genetic diversity and structure of the studied species and reconstruct historical demographic profiles of the different populations. In a second (hypothesis-testing) step, we used information on different aspects of the ecology of the focal taxon (environmental niche envelope, habitat-associations, topographic roughness, and barriers to dispersal) to generate biologically-informed models representing alternative hypotheses linking patterns of genetic variation inferred in the first step with the ecological and evolutionary processes shaping them. Statistical evaluation of the different competing scenarios revealed that the marked spatial genetic structure of the focal taxon has been primarily determined by topographic barriers to dispersal and the contemporary configuration of suitable habitats for the species, with no apparent effect of distributional shifts linked to Pleistocene glacial cycles.

Spatial distribution of genetic variation

Despite the narrow distribution of the cross-backed grasshopper D. hispanicus and the short geographical distances separating some of the studied populations, Bayesian clustering analyses revealed a marked hierarchical genetic structure in the species. Specifically, these analyses showed the presence of two main genetic clusters separating populations located north and south of the Central Mountain System that, in turn, are substructured at finer geographical scales. Such pronounced genetic structure is comparable to that reported for other Mediterranean grasshoppers showing low dispersal capacities and/or a highly fragmented distribution of suitable habitats (Ortego et al., 2010; Noguerales et al., 2016). Beyond the assignment to different genetic clusters, several lines of evidence indicate that the two main population groups have experienced contrasting demographic histories. Whereas expansions following long stable demographic periods with negligible fluctuations in population size characterize southern populations, some northern populations (PNEG, AVIL) have passed through severe demographic bottlenecks during the last glaciation. Increased genetic drift in northern populations is reflected in their higher levels of genetic differentiation and lower levels of genetic diversity in comparison with southern populations, which leads to a latitudinal decline of genetic diversity. Genetic drift and strong bottlenecks in northern populations might have led to some incongruences between geography and the inferred genetic substructure and explain why some nearby populations do not cluster together in STRUCTURE analyses (see also Ortego et al., 2018). Collectively, these results indicate considerable geographic heterogeneity in the distribution of genetic diversity across populations, suggesting that different temporal and spatial aspects of landscape composition have plausibly impacted the demography of the species and shaped its present-day patterns of genetic variation.

Testing alternative demographic models

Model comparison using an Approximate Bayesian Computation framework indicated that the spatiotemporally explicit demographic scenario most supported by genomic data was the one considering a landscape incorporating both topographic barriers to dispersal and the contemporary distribution of suitable habitats (Model H). The scenario based on a static flat landscape with impassable barriers to dispersal (Model B) was also relatively well supported (BF < 20; Kass & Raftery, 1995) and provided a reasonably good fit to the data. Remarkably, among models that did not incorporate the barrier, only the one based on the contemporary distribution of suitable habitats (Model G) was able to generate data in agreement with observed empirical data, whereas the other models had low to very low relative support (Wegmann p-values < 0.1).
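The landscape underlying Model H, as described in the Results, can be summarised as a per-cell rule for carrying capacity: steep unoccupied cells act as impassable barriers (k = 0), and unsuitable habitats receive a carrying capacity ten times smaller than suitable ones. The sketch below is only an illustration of that rule. The function name, its arguments and the value of Kmax are hypothetical, and it does not reproduce the input format of the simulation software.

```python
SLOPE_THRESHOLD = 20.0  # percent slope above which unoccupied cells are barriers

def cell_carrying_capacity(slope_percent, occupied, suitable, k_max, ratio=10.0):
    """Carrying capacity of one landscape cell under the Model H rule."""
    if slope_percent > SLOPE_THRESHOLD and not occupied:
        return 0.0  # topographic barrier to dispersal
    if suitable:
        return k_max
    return k_max / ratio  # unsuitable habitat: 1:10 in Model H

# Hypothetical maximum carrying capacity, for illustration only.
K_MAX = 1000.0

print(cell_carrying_capacity(35.0, False, True, K_MAX))   # steep, unoccupied cell
print(cell_carrying_capacity(5.0, False, True, K_MAX))    # flat, suitable habitat
print(cell_carrying_capacity(5.0, False, False, K_MAX))   # flat, unsuitable habitat
```

Changing `ratio` to 100 or 1000 gives the alternative ratios explored when fine-tuning models G-I, and ignoring the `suitable` flag corresponds to a flat landscape with barriers of the kind used in Model B.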

The predominant role of topographic barriers to dispersal in structuring genomic variation in the studied species is in agreement with its strong preferences for flat landscapes and is in line with previous studies documenting the importance of topographic features in shaping genetic divergence across multiple terrestrial organisms (e.g. Castillo et al., 2014; Benham & Witt, 2016). The effects of topographic barriers on structuring genetic variation can be explained by multiple non-mutually exclusive mechanisms, including the greater energetic expenditure associated with moving across topographically heterogeneous landscapes (Castillo et al., 2014), dispersal preferences for a particular environment (flat landscapes), behavioural reluctance to cross novel habitats (i.e., steep slopes) (Wang & Bradburd, 2014) or physiological constraints (i.e., thermal tolerances and sensitivity) and/or competitive disadvantage under the specific microclimatic conditions prevailing at high elevational ranges (Slatyer et al., 2016; Strangas et al., 2018). Remarkably, fitting topographic roughness as a friction parameter (Ray et al., 2010) yielded very poor model performance under any scenario (BF > 190; Wegmann p-value < 0.1), which indicates that the failure of the species to cross landscapes with slopes above a certain threshold (> 20 %), rather than topographic roughness per se, is the proximate factor structuring genetic variation in the focal taxon (see also González-Serna et al., 2018). The strong preferences of the species for flat areas might explain the observed “threshold effect”, in contrast to the “cumulative effects” of topographic roughness often documented in organisms inhabiting abrupt landscapes (Wang, 2012; Castillo et al., 2014; Benham & Witt, 2016), including some montane grasshoppers (e.g. Noguerales et al., 2016).

Limited gene flow through steep slopes is in strong agreement with inferences from STRUCTURE and PCA analyses, which showed that the different genetic clusters group populations separated by the main mountain systems in the study area (Central Mountain System, Montes de Toledo, and Sierra Morena) and that the hierarchy of such genetic splits is congruent with the extension and elevation of the different ranges. The fact that the uplift of these mountain ranges largely predates (Vera, 2004) the estimated divergence of D. hispanicus from its sister taxon D. brevicollis Eversmann 1848 (~1.9 Ma; González-Serna et al., 2018) points to range expansion followed by genetic drift, rather than fragmentation of an originally continuous population, as the most likely explanation for the observed link between contemporary patterns of genetic structure and the spatial configuration of barriers to dispersal (Short & Petren, 2011).

The most supported model suggested that the contemporary distribution of suitable habitats derived from land cover maps has impacted the spatial distribution of genetic variation in the studied species. The effect of contemporary landscape composition makes biological sense considering that land clearing for extensive agriculture (vineyards, olive groves and cereal crops) has dramatically fragmented the species' habitats (Presa et al., 2016). However, ABC model selection suggests that the impacts of human-driven habitat fragmentation on observed patterns of genetic variation are very small in comparison with the effects of topographic barriers to dispersal. In fact, the most supported model (Model H) was statistically followed by the one based on a flat landscape incorporating barriers to dispersal (Model B), indicating how difficult it is to discern the effects of a continuous vs. a fragmented landscape. This suggests that our range-wide sampling scheme was mostly able to capture the effects of historical processes (Schwartz & McKelvey, 2008; Oyler-McCance et al., 2013) or, more likely, that the time elapsed since human-driven habitat fragmentation took place has not been enough to strongly impact the genetic makeup of the species (Landguth et al., 2010). Accordingly, our demographic reconstructions do not support population declines during the Anthropocene and, on the contrary, indicate that all populations have actually experienced considerable expansions starting at different time periods followed by a phase of demographic stability until the present day. Overall, these results are in line with previous studies on Orthoptera showing that population fragmentation linked to anthropogenic habitat destruction can leave detectable genetic signatures but, generally, with very small effect sizes (Lange et al., 2010; Keller et al., 2013; Ortego et al., 2015) unlikely to blur phylogeographic structure resulting from historical processes (Cunningham & Moritz, 1998; Keller et al., 2013).

Although the ENM seemed to predict well the current distribution of the focal taxon, several lines of evidence indicate that estimates of climatic suitability derived from it across different time periods had a low ability to recover the demographic history of the species. Firstly, projection of the present-day bioclimatic envelope to the LGM predicted high climatic suitability across all populations during this period, which is incompatible with the comparatively lower levels of genetic diversity of northern populations and the severe bottlenecks experienced by some of them during the last glaciation according to demographic reconstructions (e.g. Carnaval et al., 2009; Cristofari et al., 2018; Noguerales et al., 2018; Yannic et al., 2018). Secondly, spatially-explicit simulations and ABC model choice gave much lower support to the dynamic model based on ENM than to the static model based on a flat landscape representing a baseline isolation-by-distance scenario (e.g. He et al., 2013). These results emphasize the importance of using independent sources of information (i.e., genomic data and, when available, fossil records and ancient DNA) to validate distributional shifts inferred from ENMs, an aspect that might be particularly relevant for species inhabiting Mediterranean lowlands showing less predictable demographic responses to past climatic oscillations than organisms distributed at high latitudes or in alpine/montane environments (Davis et al., 2014; Fordham et al., 2014; Metcalf et al., 2014).

Conclusions and final remarks

Our study highlights the power of integrating genomic data and model-based approaches to infer complex demographic and evolutionary processes taking place at contrasting temporal scales. Such processes would be difficult to unravel through traditional phylogeographic or landscape genetic approaches (Vandergast et al., 2007; Zellmer & Knowles, 2009). By integrating these approaches, we have been able to use a unified hypothesis-testing framework to analyse the relative role of historical (i.e., topography and Pleistocene climatic oscillations) and contemporary (i.e., human-driven habitat fragmentation) processes in shaping the distribution of genetic variation in a red-listed species. Our results suggest a low performance of ENM to predict the past demographic history of the studied species and indicate a predominant role of historical landscape composition (i.e., spatial configuration of topographic barriers to dispersal) and contemporary anthropogenic habitat fragmentation in determining its major axes of genomic variation. At this point, we must note that similar genetic patterns may result from different demographic processes and, thus, our most supported model, even if it is able to generate data in agreement with empirical data, is likely to have ignored many relevant aspects shaping the evolutionary history of the focal species (He et al., 2013; Massatti & Knowles, 2016). For instance, the marked spatial heterogeneity in genetic diversity (i.e., negative latitudinal cline) and the heterogeneous demographic profiles (bottlenecks vs. expansions) of the different populations are probably the outcome of complex ecological/evolutionary processes that are unlikely to be covered by the most supported scenarios. Acknowledging the limitations inherent to any model-based approach, our study highlights the potential of testing biologically-informed models to better understand the predominant processes shaping spatial patterns of genetic variation in organisms inhabiting regions severely impacted by changes in landscape composition acting at contrasting temporal scales (He et al., 2013; Papadopoulou & Knowles, 2016).

ACKNOWLEDGEMENTS

We wish to thank Carlos Muñoz-Alcón for providing us with information about the location of some populations, Anna Papadopoulou and the L. Lacey Knowles Lab for their great support with data analyses, David Aragonés (LAST-EBD) for his help with GIS analyses, Charles B. van Rees for kindly correcting the English language of the manuscript, and three anonymous referees for helpful and constructive comments on an earlier version of this article. We also thank the Centro de Supercomputación de Galicia (CESGA) and Doñana's Singular Scientific-Technical Infrastructure (ICTS-RBD) for access to computational resources. The respective administrative authorities from each study area (Castilla y León, Madrid, Castilla-La Mancha, Extremadura, and Andalucía) provided us with the corresponding permits for sampling. MJG was supported by a pre-doctoral scholarship from Junta de Comunidades de Castilla-La Mancha and European Social Fund. This work received financial support from research grants CGL2011-25053, CGL2014-54671-P, CGL2016-80742-R and CGL2017-83433-P (co-funded by the Dirección General de Investigación y Gestión del Plan Nacional I+D+i and European Social Fund) and PEII-2014-023-P (co-funded by Junta de Comunidades de Castilla-La Mancha and European Social Fund).

REFERENCES

Abellán, P. & Svenning, J.-C. (2014) Refugia within refugia - patterns in endemism and genetic divergence are linked to Late Quaternary climate stability in the Iberian Peninsula. Biological Journal of the Linnean Society, 113, 13-28.

Beaumont, M.A., Zhang, W. & Balding, D.J. (2002) Approximate Bayesian computation in population genetics. Genetics, 162, 2025-2035.

Bemmels, J.B., Title, P.O., Ortego, J. & Knowles, L.L. (2016) Tests of species-specific models reveal the importance of drought in postglacial range shifts of a Mediterranean-climate tree: insights from integrative distributional, demographic and coalescent modelling and ABC model selection. Molecular Ecology, 25, 4889-4906.

Benham, P.M. & Witt, C.C. (2016) The dual role of Andean topography in primary divergence: functional and neutral variation among populations of the hummingbird, Metallura tyrianthina. BMC Evolutionary Biology, 16, 22.

Blondel, J. & Aronson, J. (1999) Biology and wildlife of the Mediterranean region. 328 pp. Oxford University Press. Oxford, UK.

Boulesteix, A.L. & Strimmer, K. (2007) Partial least squares: a versatile tool for the analysis of high-dimensional genomic data. Briefings in Bioinformatics, 8, 32-44.

Braconnot, P., Otto-Bliesner, B., Harrison, S. et al. (2007) Results of PMIP2 coupled simulations of the Mid-Holocene and Last Glacial Maximum - Part 1: experiments and large-scale features. Climate of the Past, 3, 261-277.

Brown, J.L., Weber, J.J., Alvarado-Serrano, D.F. et al. (2016) Predicting the genetic consequences of future climate change: The power of coupling spatial demography, the coalescent, and historical landscape changes. American Journal of Botany, 103, 153-163.

Burnham, K.P. & Anderson, D.R. (1998) Model selection and inference: a practical information-theoretic approach. 355 pp. Springer New York, USA.

Carnaval, A.C., Hickerson, M.J., Haddad, C.F. et al. (2009) Stability predicts genetic diversity in the Brazilian Atlantic forest hotspot. Science, 323, 785-789.

Carstens, B.C. & Knowles, L.L. (2007) Estimating species phylogeny from gene-tree probabilities despite incomplete lineage sorting: an example from Melanoplus grasshoppers. Systematic Biology, 56, 400-411.

Castillo, J.A., Epps, C.W., Davis, A.R. & Cushman, S.A. (2014) Landscape effects on gene flow for a climate-sensitive montane species, the American pika. Molecular Ecology, 23, 843-856.

Catchen, J., Hohenlohe, P.A., Bassham, S. et al. (2013) STACKS: an analysis tool set for population genomics. Molecular Ecology, 22, 3124-3140.

Catchen, J.M., Amores, A., Hohenlohe, P. et al. (2011) STACKS: Building and genotyping loci de novo from short-read sequences. G3: Genes|Genomes|Genetics, 1, 171-182.

Cook, S.R., Gelman, A. & Rubin, D.B. (2006) Validation of software for Bayesian models using posterior quantiles. Journal of Computational and Graphical Statistics, 15, 675-692.

CORINE (2012) CORINE Land Cover, EEA, Commission of the European Communities. [WWW document]. URL https://land.copernicus.eu/pan-european/corine-land-cover/clc-2012

Cristofari, R., Liu, X., Bonadonna, F. et al. (2018) Climate-driven range shifts of the king penguin in a fragmented ecosystem. Nature Climate Change, 8, 245-251.

Csilléry, K., Blum, M.G.B., Gaggiotti, O.E. & François, O. (2010) Approximate Bayesian Computation (ABC) in practice. Trends in Ecology & Evolution, 25, 410-418.

Cunningham, M. & Moritz, C. (1998) Genetic effects of forest fragmentation on a rainforest restricted lizard (Scincidae: Gnypetoscincus queenslandiae). Biological Conservation, 83, 19-30.

Currat, M., Ray, N. & Excoffier, L. (2004) SPLATCHE: a program to simulate genetic diversity taking into account environmental heterogeneity. Molecular Ecology Notes, 4, 139-142.

Davis, E.B., McGuire, J.L. & Orcutt, J.D. (2014) Ecological niche models of mammalian glacial refugia show consistent bias. Ecography, 37, 1133-1138.

Earl, D.A. & vonHoldt, B.M. (2012) STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conservation Genetics Resources, 4, 359-361.

Elith, J., Graham, C.H., Anderson, R.P. et al. (2006) Novel methods improve prediction of species’ distributions from occurrence data. Ecography, 29, 129-151.

Elith, J., Phillips, S.J., Hastie, T. et al. (2011) A statistical explanation of MAXENT for ecologists. Diversity and Distributions, 17, 43-57.

Espindola, A., Pellissier, L., Maiorano, L. et al. (2012) Predicting present and future intra-specific genetic structure through niche hindcasting across 24 millennia. Ecology Letters, 15, 649-657.

Estoup, A., Baird, S.J., Ray, N. et al. (2010) Combining genetic, historical and geographical data to reconstruct the dynamics of bioinvasions: application to the cane toad Bufo marinus. Molecular Ecology Resources, 10, 886-901.

Estrada, A., Delgado, M.P., Arroyo, B. et al. (2016) Forecasting large-scale habitat suitability of European bustards under climate change: The role of environmental and geographic variables. PLOS ONE, 11, e0149810.

Evanno, G., Regnaut, S. & Goudet, J. (2005) Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular Ecology, 14, 2611-2620.

Excoffier, L. & Lischer, H.E. (2010) ARLEQUIN suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Molecular Ecology Resources, 10, 564-567.

Falush, D., Stephens, M. & Pritchard, J.K. (2003) Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics, 164, 1567-1587.

Fordham, D.A., Brook, B.W., Moritz, C. & Nogués-Bravo, D. (2014) Better forecasts of range dynamics using genetic data. Trends in Ecology & Evolution, 29, 436-443.

Fourcade, Y., Engler, J.O., Rödder, D. & Secondi, J. (2014) Mapping species distributions with MAXENT using a geographically biased sample of presence data: a performance assessment of methods for correcting sampling bias. PLOS ONE, 9, e97122.

García, M.D., Larrosa, E., Clemente, M.E. & Presa, J.J. (2005) Contribution to the knowledge of genus Dociostaurus Fieber, 1853 in the Iberian Peninsula, with special reference to its sound production (Orthoptera: Acridoidea). Anales de Biología, 27, 155-189.

Gómez, A. & Lunt, D.H. (2006) Refugia within Refugia: Patterns of phylogeographic concordance in the Iberian Peninsula. In: Phylogeography of Southern European Refugia. 155-188 pp. Weiss, S. & Ferrand, N. (Eds.). Springer. The Netherlands.

González-Serna, M.J., Ortego, J. & Cordero, P.J. (2018) A review of cross-backed grasshoppers of the genus Dociostaurus Fieber (Orthoptera: Acrididae) from the western Mediterranean: insights from phylogenetic analyses and DNA-based species delimitation. Systematic Entomology, 43, 136-146.

Hasumi, H. & Emori, S. (2004) K-1 Coupled GCM (MIROC) Description. Tokyo: Center for Climate System Research (CCSR), University of Tokyo. National Institute for Environmental Studies (NIES). Frontier Research Center for Global Change (FRCGC).

He, Q., Edwards, D.L. & Knowles, L.L. (2013) Integrative testing of how environments from the past to the present shape genetic structure across landscapes. Evolution, 67, 3386-3402.

Hewitt, G. (2000) The genetic legacy of the Quaternary ice ages. Nature, 405, 907-913.

Hewitt, G. (2011) Mediterranean Peninsulas: The Evolution of Hotspots. In: Biodiversity Hotspots: Distribution and Protection of Conservation Priority Areas. 123-148 pp. Zachos, F.E. & Habel, J.C. (Eds.). Springer-Verlag. New York, USA.

Hijmans, R.J., Cameron, S.E., Parra, J.L. et al. (2005) Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology, 25, 1965-1978.

Hoban, S. (2014) An overview of the utility of population simulation software in molecular ecology. Molecular Ecology, 23, 2383-2401.

Hochkirch, A., Nieto, A., García-Criado, M. et al. (2016) European red list of grasshoppers, crickets and bush-crickets. 94 pp. Publications Office of the European Union. Luxembourg.

Hohenlohe, P.A., Bassham, S., Etter, P.D. et al. (2010) Population genomics of parallel adaptation in threespine stickleback using sequenced RAD tags. PLOS Genetics, 6, e1000862.

Hubisz, M.J., Falush, D., Stephens, M. & Pritchard, J.K. (2009) Inferring weak population structure with the assistance of sample group information. Molecular Ecology Resources, 9, 1322-1332.

Jakobsson, M. & Rosenberg, N.A. (2007) CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure. Bioinformatics, 23, 1801-1806.

Jeffreys, H. (1961) Theory of probability. 472 pp. Oxford University Press. Oxford, UK.

Jombart, T. (2008) Adegenet: a R package for the multivariate analysis of genetic markers. Bioinformatics, 24, 1403-1405.

Kass, R.E. & Raftery, A.E. (1995) Bayes Factors. Journal of the American Statistical Association, 90, 773-795.

Keightley, P.D., Ness, R.W., Halligan, D.L. & Haddrill, P.R. (2014) Estimation of the spontaneous mutation rate per nucleotide site in a Drosophila melanogaster full-sib family. Genetics, 196, 313-320.

Keller, D., van Strien, M.J., Herrmann, M. et al. (2013) Is functional connectivity in common grasshopper species affected by fragmentation in an agricultural landscape? Agriculture, Ecosystems & Environment, 175, 39-46.

Keyghobadi, N., Roland, J., Matter, S.F. & Strobeck, C. (2005) Among- and within-patch components of genetic diversity respond at different rates to habitat fragmentation: an empirical demonstration. Proceedings of the Royal Society B-Biological Sciences, 272, 553-560.

Knowles, L.L. (2000) Tests of pleistocene speciation in montane grasshoppers (genus Melanoplus) from the sky islands of western North America. Evolution, 54, 1337-1348.

Knowles, L.L. & Massatti, R. (2017) Distributional shifts - not geographic isolation - as a probable driver of montane species divergence. Ecography, 40, 1475-1485.

Landguth, E.L., Cushman, S.A., Schwartz, M.K. et al. (2010) Quantifying the lag time to detect barriers in landscape genetics. Molecular Ecology, 19, 4179-4191.

Lange, R., Durka, W., Holzhauer, S.I. et al. (2010) Differential threshold effects of habitat fragmentation on gene flow in two widespread species of bush crickets. Molecular Ecology, 19, 4936-4948.

Leuenberger, C. & Wegmann, D. (2010) Bayesian computation and model selection without likelihoods. Genetics, 184, 243-252.

Liu, X. & Fu, Y.-X. (2015) Exploring population size changes using SNP frequency spectra. Nature Genetics, 47, 555-559.

Lorenzen, E.D., Nogués-Bravo, D., Orlando, L. et al. (2011) Species-specific responses of Late Quaternary megafauna to climate and humans. Nature, 479, 359-365.

Massatti, R. & Knowles, L.L. (2016) Contrasting support for alternative models of genomic variation based on microhabitat preference: species-specific effects of climate change in alpine sedges. Molecular Ecology, 25, 3974-3986.

Mattila, T.M., Tyrmi, J., Savolainen, O. & Pyhäjärvi, T. (2017) Genome-wide analysis of colonization history and concomitant selection in Arabidopsis lyrata. Molecular Biology and Evolution, 34, 2665-2677.

Metcalf, J.L., Prost, S., Nogués-Bravo, D. et al. (2014) Integrating multiple lines of evidence into historical biogeography hypothesis testing: a Bison bison case study. Proceedings of the Royal Society B-Biological Sciences, 281, 20132782.

Mevik, B.-H. & Wehrens, R. (2007) The pls Package: Principal component and partial least squares regression in R. Journal of Statistical Software, 18, 1-23.

Miles, A., Harding, N.J., Bottà, G. et al. (2017) Genetic diversity of the African malaria vector Anopheles gambiae. Nature, 552, 96-102.

Morrison, L., Estrada, A. & Early, R. (2018) Species traits suggest European mammals facing the greatest climate change are also least able to colonize new locations. Diversity and Distributions, 24, 1321-1332.

Muscarella, R., Galante, P.J., Soley-Guardia, M. et al. (2014) ENMEVAL: An R package for conducting spatially independent evaluations and estimating optimal model complexity for MAXENT ecological niche models. Methods in Ecology and Evolution, 5, 1198-1205.

Myers, N., Mittermeier, R.A., Mittermeier, C.G. et al. (2000) Biodiversity hotspots for conservation priorities. Nature, 403, 853-858.

Neuenschwander, S., Largiadèr, C.R., Ray, N. et al. (2008) Colonization history of the Swiss Rhine basin by the bullhead (Cottus gobio): inference under a Bayesian spatially explicit framework. Molecular Ecology, 17, 757-772.

Noguerales, V., Cordero, P.J. & Ortego, J. (2016) Hierarchical genetic structure shaped by topography in a narrow-endemic montane grasshopper. BMC Evolutionary Biology, 16, 96.

Noguerales, V., Cordero, P.J. & Ortego, J. (2018) Inferring the demographic history of an oligophagous grasshopper: Effects of climatic niche stability and host-plant distribution. Molecular Phylogenetics and Evolution, 118, 343-356.


CAPÍTULO III

Ortego, J., Aguirre, M.P. & Cordero, P.J. (2010) Population genetics of Mioscirtus wagneri, a grasshopper showing a highly fragmented distribution. Molecular Ecology, 19, 472-483.

Ortego, J., Aguirre, M.P., Noguerales, V. & Cordero, P.J. (2015) Consequences of extensive habitat fragmentation in landscape-level patterns of genetic diversity and structure in the Mediterranean esparto grasshopper. Evolutionary Applications, 8, 621-632.

Ortego, J., Gugger, P.F. & Sork, V.L. (2018) Genomic data reveal cryptic lineage diversification and introgression in Californian golden cup oaks (section Protobalanus). New Phytologist, 218, 804-818.

Oyler-McCance, S.J., Fedy, B.C. & Landguth, E.L. (2013) Sample design effects in landscape genetics. Conservation Genetics, 14, 275-285.

Papadopoulou, A. & Knowles, L.L. (2015) Species-specific responses to island connectivity cycles: refined models for testing phylogeographic concordance across a Mediterranean Pleistocene Aggregate Island Complex. Molecular Ecology, 24, 4252-4268.

Papadopoulou, A. & Knowles, L.L. (2016) Toward a paradigm shift in comparative phylogeography driven by trait-based hypotheses. Proceedings of the National Academy of Sciences of the United States of America, 113, 8018-8024.

Paquette, S.R., Talbot, B., Garant, D. et al. (2014) Modelling the dispersal of the two main hosts of the raccoon rabies variant in heterogeneous environments with landscape genetics. Evolutionary Applications, 7, 734-749.

Paz, A., Ibáñez, R., Lips, K.R. & Crawford, A.J. (2015) Testing the role of ecology and life history in structuring genetic variation across a landscape: a trait-based phylogeographic approach. Molecular Ecology, 24, 3723-3737.

Peterson, A.T., Soberón, J., Pearson, R.G. et al. (2011) Ecological Niches and Geographic Distributions. 328 pp. Princeton University Press. Princeton, USA.

Peterson, B.K., Weber, J.N., Kay, E.H. et al. (2012) Double digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species. PLOS ONE, 7, e37135.

Phillips, S.J., Anderson, R.P. & Schapire, R.E. (2006) Maximum entropy modeling of species geographic distributions. Ecological Modelling, 190, 231-259.

Phillips, S.J. & Dudík, M. (2008) Modeling of species distributions with MAXENT: new extensions and a comprehensive evaluation. Ecography, 31, 161-175.

Prates, I., Xue, A.T., Brown, J.L. et al. (2016) Inferring responses to climate dynamics from historical demography in neotropical forest lizards. Proceedings of the National Academy of Sciences of the United States of America, 113, 7978-7985.


Presa, J.J., García, M., Clemente, M. et al. (2016) Dociostaurus hispanicus: the IUCN Red List of Threatened Species (Publication no. http://dx.doi.org/10.2305/IUCN.UK.2016-3.RLTS.T16084433A75088044.en). [WWW document]. URL https://www.iucnredlist.org/species/16084433/75088044

Pritchard, J.K., Stephens, M. & Donnelly, P. (2000) Inference of population structure using multilocus genotype data. Genetics, 155, 945-959.

R Core Team. (2018) R: A Language and Environment for Statistical Computing. [WWW document]. URL https://www.R-project.org

Radosavljevic, A. & Anderson, R.P. (2014) Making better MAXENT models of species distributions: complexity, overfitting and evaluation. Journal of Biogeography, 41, 629-643.

Rand, A.L. (1948) Glaciation, an isolating factor in speciation. Evolution, 2, 314-321.

Ray, N., Currat, M., Foll, M. & Excoffier, L. (2010) SPLATCHE2: a spatially explicit simulation framework for complex demography, genetic admixture and recombination. Bioinformatics, 26, 2993-2994.

Reid, B.N., Kass, J.M., Wollney, S. et al. (2019) Disentangling the genetic effects of refugial isolation and range expansion in a trans-continentally distributed species. Heredity, in press. [doi:10.1038/s41437-018-0135-5]

Reinhardt, K., Köhler, G., Maas, S. et al. (2005) Low dispersal ability and habitat specificity promote extinctions in rare but not in widespread species: the Orthoptera of Germany. Ecography, 28, 593-602.

Rissler, L. (2016) Union of phylogeography and landscape genetics. Proceedings of the National Academy of Sciences of the United States of America, 113, 8079-8086.

Rosenberg, N.A. (2004) DISTRUCT: a program for the graphical display of population structure. Molecular Ecology Notes, 4, 137-138.

Rubidge, E.M., Patton, J.L., Lim, M. et al. (2012) Climate-induced range contraction drives genetic erosion in an alpine mammal. Nature Climate Change, 2, 285-288.

Saccheri, I., Kuussaari, M., Kankare, M. et al. (1998) Inbreeding and extinction in a butterfly metapopulation. Nature, 392, 491-494.

Sandel, B., Arge, L., Dalsgaard, B. et al. (2011) The influence of late Quaternary climate-change velocity on species endemism. Science, 334, 660-664.

Schwartz, M.K. & McKelvey, K.S. (2008) Why sampling scheme matters: the effect of sampling scheme on landscape genetic results. Conservation Genetics, 10, 441-452.


Short, K.H. & Petren, K. (2011) Fine-scale genetic structure arises during range expansion of an invasive gecko. PLOS ONE, 6, e26258.

Slatyer, R.A., Nash, M.A. & Hoffmann, A.A. (2016) Scale-dependent thermal tolerance variation in Australian mountain grasshoppers. Ecography, 39, 572-582.

Stewart, J.R., Lister, A.M., Barnes, I. & Dalén, L. (2010) Refugia revisited: individualistic responses of species in space and time. Proceedings of the Royal Society B-Biological Sciences, 277, 661-671.

Strangas, M.L., Navas, C.A., Rodrigues, M.T. & Carnaval, A.C. (2018) Thermophysiology, microclimates, and species distributions of lizards in the mountains of the Brazilian Atlantic Forest. Ecography, in press [doi:10.1111/ecog.03330]

Sukumaran, J. & Knowles, L.L. (2018) Trait-dependent biogeography: (re)integrating biology into probabilistic historical biogeographical models. Trends in Ecology & Evolution, 33, 390-398.

Talbot, B., Vonhof, M.J., Broders, H.G. et al. (2016) Range-wide genetic structure and demographic history in the bat ectoparasite Cimex adjunctus. BMC Evolutionary Biology, 16, 268.

van der Mescht, L., Matthee, S. & Matthee, C.A. (2015) Comparative phylogeography between two generalist flea species reveal a complex interaction between parasite life history and host vicariance: parasite-host association matters. BMC Evolutionary Biology, 15, 105.

Vandergast, A.G., Bohonak, A.J., Weissman, D.B. & Fisher, R.N. (2007) Understanding the genetic effects of recent habitat fragmentation in the context of evolutionary history: phylogeography and landscape genetics of a southern California endemic Jerusalem cricket (Orthoptera: Stenopelmatidae: Stenopelmatus). Molecular Ecology, 16, 977-992.

Vera, J.A. (2004) Geología de España. 884 pp. Vera, J.A. (Ed.). SGE-IGME. Madrid, Spain.

Villaverde, T., Pokorny, L., Olsson, S. et al. (2018) Bridging the micro- and macroevolutionary levels in phylogenomics: Hyb-Seq solves relationships from populations to species and above. New Phytologist, 220, 636-650.

Wachter, G.A., Papadopoulou, A., Muster, C. et al. (2016) Glacial refugia, recolonization patterns and diversification forces in Alpine-endemic Megabunus harvestmen. Molecular Ecology, 25, 2904-2919.

Waltari, E., Hijmans, R.J., Peterson, A.T. et al. (2007) Locating Pleistocene refugia: Comparing phylogeographic and ecological niche model predictions. PLOS ONE, 2, e563.

Wang, I.J. (2010) Recognizing the temporal distinctions between landscape genetics and phylogeography. Molecular Ecology, 19, 2605-2608.


Wang, I.J. (2012) Environmental and topographic variables shape genetic structure and effective population sizes in the endangered Yosemite toad. Diversity and Distributions, 18, 1033-1041.

Wang, I.J. & Bradburd, G.S. (2014) Isolation by environment. Molecular Ecology, 23, 5649-5662.

Warren, D.L. & Seifert, S.N. (2011) Ecological niche modeling in MAXENT: the importance of model complexity and the performance of model selection criteria. Ecological Applications, 21, 335-342.

Wegmann, D., Leuenberger, C., Neuenschwander, S. & Excoffier, L. (2010) ABCTOOLBOX: a versatile toolkit for approximate Bayesian computations. BMC Bioinformatics, 11, 116.

Weir, J.T., Haddrath, O., Robertson, H.A. et al. (2016) Explosive ice age diversification of kiwi. Proceedings of the National Academy of Sciences of the United States of America, 113, E5580-E5587.

Worth, J.R.P., Williamson, G.J., Sakaguchi, S. et al. (2014) Environmental niche modelling fails to predict Last Glacial Maximum refugia: niche shifts, microrefugia or incorrect palaeoclimate estimates? Global Ecology and Biogeography, 23, 1186-1197.

Yannic, G., Ortego, J., Pellissier, L. et al. (2018) Linking genetic and ecological differentiation in an ungulate with a circumpolar distribution. Ecography, 41, 922-937.

Zellmer, A.J. & Knowles, L.L. (2009) Disentangling the effects of historic vs. contemporary landscape structure on population genetic divergence. Molecular Ecology, 18, 3593-3602.


SUPPORTING INFORMATION

Supplementary Methods

Methods S1 GENOMIC LIBRARY PREPARATION

We used NucleoSpin Tissue kits (Macherey-Nagel, Düren, Germany) to extract and purify genomic DNA from the hind femur of each individual. Genomic DNA was processed into two genomic libraries using the double-digestion restriction-fragment-based procedure (ddRADseq) described in Peterson et al. (2012). In brief, DNA was doubly digested with the restriction enzymes MseI and EcoRI (New England Biolabs, Ipswich, MA, USA), and Illumina adaptors including unique 7-bp barcodes were ligated to the digested fragments. Ligation products were pooled, size-selected between 475 and 580 bp with a Pippin Prep machine (Sage Science, Beverly, MA, USA), and amplified by PCR with 12 cycles using the iProof™ High-Fidelity DNA Polymerase (BIO-RAD, Hercules, CA, USA). Single-read 150-bp sequencing was performed on an Illumina HiSeq2500 platform at The Centre for Applied Genomics (SickKids, Toronto, ON, Canada).

Methods S2 PROCESSING OF GENOMIC DATA

We used the different programs distributed as part of the STACKS v.1.35 pipeline (process_radtags, ustacks, cstacks, sstacks, and populations) to assemble our sequences into de novo loci and call genotypes (Hohenlohe et al., 2010; Catchen et al., 2011; Catchen et al., 2013). Reads were de-multiplexed and filtered for overall quality using the program process_radtags, retaining reads with a Phred score > 10 (using a sliding window of 15%), no adaptor contamination, and an unambiguous barcode and restriction cut site (see Fig. S1). Raw reads were screened for quality with FASTQC v.0.11.5 (Simon, 2018) and all sequences were trimmed to 129 bp using SEQTK (Heng, 2017) in order to remove low-quality bases near the 3′ ends. Filtered reads of each individual were assembled de novo into putative loci with the ustacks program. The minimum stack depth (m) was set to three and we allowed a maximum distance of two nucleotide mismatches (M) to group reads into a “stack”. We used the “removal” (r) and “deleveraging” (d) algorithms to eliminate highly repetitive stacks and resolve over-merged loci, respectively. Single nucleotide polymorphisms (SNPs) were identified at each locus and genotypes were called using a multinomial-based likelihood model that accounts for sequencing errors, with the upper bound of the error rate (ε) set to 0.2 (Hohenlohe et al., 2010; Catchen et al., 2011; Catchen et al., 2013).

A catalogue of loci was built using the cstacks program, with loci recognized as homologous across individuals if the number of nucleotide mismatches between consensus sequences (n) was ≤ 2. Each individual was matched against this catalogue using the sstacks program and output files were exported in different formats using the program populations. For all downstream analyses, we exported only the first SNP per RAD locus and retained loci with a minimum stack depth ≥ 5 (m = 5) that were sequenced in at least 80% of the individuals of each population (r = 0.8), were represented in all populations (p = 10), and had a minor allele frequency (MAF) ≥ 0.01 (min_maf = 0.01). The choice of different filtering thresholds had little impact on the obtained inferences (Ortego et al., 2018).

For all analyses, we removed outlier loci (i.e. putatively under selection) identified using BAYESCAN v.2.1 (Foll & Gaggiotti, 2008) and the hierarchical and non-hierarchical island models (FDIST method; Excoffier et al., 2009) implemented in ARLEQUIN v.3.5 (Excoffier & Lischer, 2010) (see Ortego et al., 2018). Outliers identified by ARLEQUIN and BAYESCAN were conservatively combined, such that a total of 357 unique outlier loci (1.94%) identified by either method were excluded to create a dataset containing only neutral loci (e.g. Brauer et al., 2016; Ortego et al., 2018). The resulting files were used for subsequent analyses or converted into other formats using the program PGDSPIDER v.2.1.0.3 (Lischer & Excoffier, 2012).
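The locus-retention criteria described above (r = 0.8, p = 10, min_maf = 0.01) can be illustrated with a minimal Python sketch. This is not the STACKS implementation: the function name `passes_filters` and the input layout (a dictionary of diploid genotypes per population, with `None` for missing individuals) are hypothetical, the SNP is assumed to be biallelic, and the stack-depth filter (m = 5) is not modelled.

```python
from collections import Counter

def passes_filters(genotypes_by_pop, n_pops=10, r=0.8, min_maf=0.01):
    """Return True if one SNP locus meets the p, r and MAF thresholds.

    genotypes_by_pop (hypothetical layout): dict mapping population code to a
    list of genotypes, each a tuple of two alleles, or None if missing.
    """
    # p: the locus must be represented in all populations
    if len(genotypes_by_pop) < n_pops:
        return False
    allele_counts = Counter()
    for genotypes in genotypes_by_pop.values():
        called = [g for g in genotypes if g is not None]
        # r: at least 80% of the individuals of each population genotyped
        if not genotypes or len(called) / len(genotypes) < r:
            return False
        for genotype in called:
            allele_counts.update(genotype)
    if len(allele_counts) < 2:
        return False  # monomorphic locus, MAF = 0
    # MAF: frequency of the rarer allele across all sampled gene copies
    maf = min(allele_counts.values()) / sum(allele_counts.values())
    return maf >= min_maf

# Toy data: 10 populations of 5 individuals each
pops = {f"P{i}": [("A", "A")] * 5 for i in range(10)}
pops["P0"] = [("A", "G"), ("G", "G"), ("A", "A"), ("A", "A"), None]
print(passes_filters(pops))  # polymorphic, 4/5 genotyped in P0
```

In this sketch a population with a single missing individual out of five sits exactly at the 80% threshold and is retained, because the comparison is "at least 80%".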

Methods S3 ENVIRONMENTAL NICHE MODELLING

We used the R package ENMEVAL (Muscarella et al., 2014) to conduct parameter tuning and determine the optimal feature class (FC) and regularization multiplier (RM) settings for MAXENT. We tested a total of 248 models of varying complexity by combining a range of regularization multipliers (from 0 to 15 in increments of 0.5) with eight different feature class combinations (L, LQ, LQP, H, T, LQH, LQHP, LQHPT, where L = linear, Q = quadratic, H = hinge, P = product and T = threshold) (Muscarella et al., 2014). We compared MAXENT models with different settings using the Akaike Information Criterion corrected for small sample size (AICc) (Burnham & Anderson, 1998; Warren & Seifert, 2011).

We followed a three-stage approach to select model parameters (RM and FC) and the set of environmental variables retained in the final model (Warren et al., 2014). In a first step, we built a full set of models including all bioclimatic variables, retained the model with the lowest AICc score and, among those variables that were spatially correlated (Pearson’s coefficient > 0.9, estimated using ENMTOOLS; Warren et al., 2010), we only retained for the next step the one with the highest percent contribution to the model. In a second step, we ran another full set of models with the subset of variables retained in the first step, selected the model with the lowest AICc score, and removed variables with zero percent contribution to the model. Finally, we re-ran a full set of models with the environmental variables retained in the previous steps and used the model with the lowest AICc score for all downstream analyses.
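As a minimal sketch of the tuning grid described above, the following Python snippet enumerates the RM × FC combinations and ranks candidate models by AICc. It uses the generic textbook AICc formula; how ENMEVAL derives the log-likelihood and the number of parameters from MAXENT output is not reproduced here, and the names `aicc` and `best_model` and the example values are hypothetical.

```python
from itertools import product

FEATURE_CLASSES = ["L", "LQ", "LQP", "H", "T", "LQH", "LQHP", "LQHPT"]
REG_MULTIPLIERS = [i * 0.5 for i in range(31)]  # 0.0, 0.5, ..., 15.0

# Every (RM, FC) pair is one candidate model: 31 x 8 = 248
model_grid = list(product(REG_MULTIPLIERS, FEATURE_CLASSES))
print(len(model_grid))

def aicc(log_likelihood, n_params, n_obs):
    """Generic AICc: AIC plus the small-sample correction term."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICc is undefined when n_obs <= n_params + 1")
    aic = 2 * n_params - 2 * log_likelihood
    return aic + (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)

def best_model(candidates, n_obs):
    """candidates: list of (label, log_likelihood, n_params) tuples.
    Returns the label of the candidate with the lowest AICc."""
    return min(candidates, key=lambda c: aicc(c[1], c[2], n_obs))[0]

# Hypothetical log-likelihoods and parameter counts, for illustration only
example = [("RM=1.0, FC=LQ", -520.0, 6), ("RM=2.5, FC=LQH", -515.0, 14)]
print(best_model(example, n_obs=60))
```

The correction term grows quickly as the number of parameters approaches the number of occurrence records, which is why AICc penalizes complex feature class combinations more strongly when few records are available.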

Methods S4 TESTING ALTERNATIVE DEMOGRAPHIC MODELS

Constructing demographic models. We generated three basic demographic models:

(i) a static model in which carrying capacities (k) are homogeneous across space and time. This scenario is analogous to a flat landscape or an isolation-by-distance model (e.g. He et al., 2013);

(ii) a dynamic model representing distributional shifts resulting from the interaction between the species’ bioclimatic envelope and Pleistocene glacial cycles (e.g. He et al., 2013; Massatti & Knowles, 2016). In this scenario, carrying capacities change over time according to climatic suitability maps obtained from projections of the ENM to the present and the LGM bioclimatic conditions (see section ENVIRONMENTAL NICHE MODELLING). This dynamic model considered landscapes from three consecutive time periods (LGM, intermediate, current) reflecting temporal shifts in the spatial distribution of environmentally suitable areas for the species in response to climate changes since the LGM (e.g. He et al., 2013; Massatti & Knowles, 2016). Maps for the present time period and for the LGM were obtained from projections of the ENM, whereas maps representing intermediate conditions were generated by averaging cell values from current and LGM raster maps. As SPLATCHE2 requires a single raster file with positive integer numbers, we first used ARCMAP v.10.3 to categorize cell values from logistic ENM maps (ranging continuously from 0 to 1) under each time period into 20 bins of equal magnitude (i.e. intervals of 0.05) (e.g. Bemmels et al., 2016). Second, we employed a custom Python script written by Q. He (deposited in Dryad; Bemmels et al., 2016) to convert the maps from the different time periods into a single raster map in which each category represents a unique combination of climatic suitability bins across the three time periods (e.g. He et al., 2013; Bemmels et al., 2016; Massatti & Knowles, 2016). Climatic suitability bins corresponding to each of the three periods (LGM, intermediate, current) were applied to one-third of the total number of simulated generations (see next section). As done in previous studies, carrying capacities were scaled proportionally to logistic climatic suitability scores obtained from the ENM for each time period (e.g. He et al., 2013; Massatti & Knowles, 2016; Knowles & Massatti, 2017). Thus, we assumed that the carrying capacity of each grid cell was proportional to the estimated probability of presence of the species in that grid cell;

(iii) a dynamic model starting with a flat (i.e. non-fragmented) landscape that shifted to a fragmented landscape in which carrying capacities for habitats unsuitable for the species were n times lower than those assigned to suitable habitats (Fig. 3; Table 2). The distribution of suitable habitats was estimated from Corine Land Cover maps (CORINE, 2012).
We considered as suitable habitats for the species the Corine Land Cover classes “broad-leaved forest”, “transitional woodland-shrub”, “land principally occupied by agriculture, with significant areas of natural vegetation”, “sclerophyllous vegetation”, “agro-forestry areas”, “pastures” and “natural grasslands”, which represent the habitats used by the species according to occurrence data (Table S1). We considered as unsuitable habitats all other land cover classes, which are never occupied by the species. Habitat-specialist grasshoppers are tightly linked to their specific microhabitats and generally show low rates of inter-patch dispersal (e.g. Reinhardt et al., 2005; Ortego et al., 2010; Ortego et al., 2015). In the case of D. hispanicus, this is evidenced by the strong genetic fragmentation of its populations at both local and regional spatial scales (see RESULTS section). Thus, the land use classes occupied by D. hispanicus are likely to be an accurate proxy of suitable habitats that sustain much larger carrying capacities than those habitat categories where the species has never been recorded. Dynamic models considering the impacts of human-driven habitat fragmentation started with a hypothetically non-fragmented landscape that shifted 250 years ago to a fragmented landscape with a spatial configuration of suitable/unsuitable habitats that has not changed until the present. Thus, this model assumes that habitat fragmentation took place 250 years ago, that the spatial configuration of suitable habitats has remained unaltered since then, and that the carrying capacities of unsuitable habitats are n times smaller than those of habitats suitable for the species. We fine-tuned this model considering different relative values of carrying capacities for unsuitable and suitable habitats (1:10, 1:100 and 1:1000) (e.g. Paquette et al., 2014; Ortego et al., 2015). The three orders of magnitude considered for the relative carrying capacities of unsuitable and suitable habitats span a range that has been found to be informative in another specialist grasshopper from the Iberian Peninsula inhabiting similar flat lowland landscapes (Ortego et al., 2015).

Finally, we generated two variants of the three basic models described above by incorporating information from geographical features that might hypothetically hinder (topographic roughness) or impede (impassable barriers to dispersal) gene flow among populations. The first variant incorporated the hypothetical negative impact of topographic roughness on migration rates with neighbouring demes (specified as a friction parameter in SPLATCHE2; Ray et al., 2010). The second variant incorporated the presence of impassable barriers to dispersal (k = 0).
We considered grid cells with a slope > 20% as impassable barriers to dispersal, as these areas are not occupied by the focal taxon according to occurrence data (Table S1). We calculated topographic roughness (slope) using a 90-m resolution digital elevation model from the NASA Shuttle Radar Topographic Mission (SRTM Digital Elevation Data; http://srtm.csi.cgiar.org/). Overall, we generated a total of nine demographic models resulting from the combination of the three basic static/dynamic models with the two hypothetical effects of topography (topographic roughness and the presence/absence of barriers to dispersal) (Figs. 1 and 3; Table 2).
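The conversion of continuous suitability maps into a single categorical raster, described above for the climate-driven dynamic model, can be sketched in Python as follows. This is not the script by Q. He deposited in Dryad: the integer encoding of bin combinations, the function names and the nested-list raster layout are assumptions made for illustration only.

```python
N_BINS = 20  # intervals of 0.05 on the logistic suitability scale

def suitability_bin(value, n_bins=N_BINS):
    """Map a logistic suitability score in [0, 1] to a bin from 1 to n_bins."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("suitability must lie in [0, 1]")
    return min(int(value * n_bins) + 1, n_bins)

def encode(b_lgm, b_mid, b_cur, n_bins=N_BINS):
    """Hypothetical encoding: one positive integer per combination of bins."""
    return (b_lgm - 1) * n_bins ** 2 + (b_mid - 1) * n_bins + (b_cur - 1) + 1

def decode(code, n_bins=N_BINS):
    """Recover the (LGM, intermediate, current) bins from a category code."""
    rest, cur = divmod(code - 1, n_bins)
    lgm, mid = divmod(rest, n_bins)
    return lgm + 1, mid + 1, cur + 1

def combine_periods(lgm_map, current_map):
    """Build one categorical raster from the LGM and current suitability maps.
    The intermediate period is the cell-wise mean of the two maps."""
    combined = []
    for lgm_row, cur_row in zip(lgm_map, current_map):
        row = []
        for lgm, cur in zip(lgm_row, cur_row):
            mid = (lgm + cur) / 2
            row.append(encode(suitability_bin(lgm),
                              suitability_bin(mid),
                              suitability_bin(cur)))
        combined.append(row)
    return combined

# Toy 1 x 3 rasters of logistic suitability
print(combine_periods([[0.0, 0.11, 1.0]], [[0.0, 0.33, 1.0]]))
```

With 20 bins per period the encoding can produce up to 8,000 categories, although only the combinations present in the study area would need a carrying-capacity entry for each time period.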


Demographic and genetic simulations. We used SPLATCHE2 to perform forward-in-time demographic simulations followed by backward-in-time genetic (coalescent) simulations under each model (see Ray et al., 2010), which are expected to produce contrasting patterns of genetic variation due to differences among scenarios in the way that carrying capacities vary across the landscape and through time (see Massatti & Knowles, 2016) (Fig. 3). To have a computationally tractable number of cells for demographic simulations, we statistically downscaled cell sizes to 16 km² (e.g. Bemmels et al., 2016; Massatti & Knowles, 2016). Although the studied species has a generation time of one year, we scaled it by a factor of 15 to make simulations computationally tractable (e.g. Massatti & Knowles, 2016), which results in a total of 1,400 generations from the LGM to the present (21 ka). Because of this scaling, any biological interpretation of absolute values of population genetic parameters would need to be adjusted accordingly (Massatti & Knowles, 2016). Forward demographic simulations under all models were initialized 21 ka BP from hypothesized ancestral populations located north and south of the Central Mountain System (Fig. 3), each one with an effective population size of NANC. We considered these two initial source populations because our genetic structure analyses (STRUCTURE and PCA) identified two well-differentiated genetic clusters corresponding to populations located in these two regions (see RESULTS section) (e.g. Massatti & Knowles, 2016). We allowed source population overflow, so that all individuals exceeding the carrying capacity of initial populations spread to neighbouring cells (Ray et al., 2010).
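The generation scaling above is simple arithmetic, shown here as a short Python sketch. The even split of simulated generations among the three climatic periods is an assumption for illustration: 1,400 is not divisible by three, and the text does not state how the remainder was allocated.

```python
YEARS_SINCE_LGM = 21_000
GENERATION_TIME_YEARS = 1   # one generation per year in the studied species
SCALING_FACTOR = 15         # scaling applied to keep simulations tractable

def scaled_generations(years, generation_time=GENERATION_TIME_YEARS,
                       scaling=SCALING_FACTOR):
    """Number of simulated generations covering a span of years."""
    return years / (generation_time * scaling)

def split_evenly(total, parts=3):
    """Split an integer total into near-equal integer parts
    (hypothetical allocation rule: earlier periods absorb the remainder)."""
    base, extra = divmod(total, parts)
    return [base + (1 if i < extra else 0) for i in range(parts)]

n_generations = scaled_generations(YEARS_SINCE_LGM)
print(n_generations)                     # 1400.0
print(split_evenly(int(n_generations)))  # [467, 467, 466]
```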
For each model, we ran 200,000 simulations using the same uniform priors for the three demographic parameters employed: migration rate per deme per generation (m; range of log(m): -2.3, -1.2), carrying capacity of the deme with highest suitability (Kmax; range of log(Kmax): 5.2, 7.2), and ancestral population size (NANC; range of log(NANC): 2.0, 4.0). Before setting the final prior values used in the simulations, we tested a broad range of priors in pilot runs to identify those that resulted in the colonization of the landscape within the time spanning from the LGM to the present and generated genetic data within the range of the observed empirical data. Following each forward-in-time demographic simulation, a spatially explicit backward-in-time coalescent model informed by the deme-specific demographic parameters (K, m and NANC) was used to generate genetic data (Currat et al., 2004; Ray et al., 2010). We ran an independent coalescent process to trace the genealogy of each locus from the present to the onset of population expansion from ancestral source populations 21 ka BP and beyond, until alleles coalesced in a single ancestral population of size NANC. We set a maximum of 10⁶ generations to provide ample time for coalescence. To make simulations computationally tractable, we randomly selected 1,250 loci for each focal individual. Genetic structure inferred for this subset of 1,250 loci did not differ from that estimated at all loci, indicating that the SNP dataset retained for SPLATCHE2 analyses represents well the major axes of genetic variation among populations (Figs. S2-S4) (e.g. Massatti & Knowles, 2016). Simulated datasets were sampled from the same geographical locations (grid cells) from which the empirical genomic data were obtained (see Table 1) and consisted of the same number of loci, number of individuals, and amount and pattern of missing data as the observed empirical data (for details see Massatti & Knowles, 2016).

We used ARLSUMSTAT v.3.5.2 to calculate a total of 67 summary statistics for simulated datasets under each model, including mean heterozygosity across loci for each population and across populations (H), number of segregating sites for each population and across populations (S), and pairwise population FST values (Excoffier & Lischer, 2010). The same 67 summary statistics were extracted from the observed empirical data (Fig. 1). We ran all simulations on the high-performance computing cluster of the Centro de Supercomputación de Galicia (CESGA, Spain). Simulations required ~7,200 CPU hours per model.
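Two pieces of bookkeeping in this section can be checked with a short Python sketch: how 67 summary statistics arise from 10 populations, and how one parameter set might be drawn from the priors. The sketch assumes that the prior ranges are on a log10 scale (the base is not stated in the text) and that the number of populations is the p = 10 used for locus filtering; the function names are hypothetical, and the actual sampling was handled by the simulation software.

```python
import random
from itertools import combinations

def count_summary_stats(n_pops):
    """H and S per population plus one overall value each, plus pairwise FST."""
    n_h = n_pops + 1
    n_s = n_pops + 1
    n_fst = len(list(combinations(range(n_pops), 2)))
    return n_h + n_s + n_fst

# Prior bounds as reported, assumed here to be log10-transformed values
PRIORS_LOG10 = {"m": (-2.3, -1.2), "Kmax": (5.2, 7.2), "NANC": (2.0, 4.0)}

def draw_parameters(rng):
    """Draw one parameter set: uniform on the log scale, back-transformed."""
    return {name: 10 ** rng.uniform(low, high)
            for name, (low, high) in PRIORS_LOG10.items()}

print(count_summary_stats(10))  # 11 + 11 + 45
print(draw_parameters(random.Random(1)))
```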

Model choice and parameter estimation. We used an approximate Bayesian computation (ABC) framework to perform model selection and parameter estimation (for an overview of ABC, see Beaumont et al., 2002), as implemented in the programs TRANSFORMER and ABCESTIMATOR and R scripts (findPLS) distributed as part of ABCTOOLBOX (Wegmann et al., 2010; e.g. He et al., 2013; Massatti & Knowles, 2016). In order to account for correlations between summary statistics and reduce the “curse of dimensionality” associated with using a large number of statistics (Boulesteix & Strimmer, 2007), we used the R package PLS v.2.6-0 (Mevik & Wehrens, 2007) and the findPLS script to extract partial least squares (PLS) components with Box-Cox transformation from the summary statistics of the first 10,000 simulations for each model (Boulesteix & Strimmer, 2007; Wegmann et al., 2010; e.g. He et al., 2013). We examined root-mean-squared error (RMSE) plots for each model and parameter before deciding upon the optimal number of PLS components to be used for parameter estimation (see Fig. S5). We used the linear combinations of summary statistics obtained from the first 10,000 simulations for each model to transform all datasets (observed empirical and simulated datasets) with the program TRANSFORMER (for details about this procedure, see Wegmann et al., 2010). For each model, we retained the 1,000 simulations (0.5%) closest to the observed empirical data and used them to approximate marginal densities and posterior distributions of the parameters with a post-sampling regression adjustment using the ABC-GLM (general linear model) procedure detailed in Leuenberger & Wegmann (2010) and implemented in ABCTOOLBOX (see also Csilléry et al., 2010).

We used Bayes factors (BF) for model selection, defined as the ratio between the marginal density of the model with the highest marginal density and that of the alternative model (Jeffreys, 1961). The higher the ratio, the stronger the support for the first model. A BF > 20 indicates strong relative support for the first model, while a BF > 150 indicates very strong support (Jeffreys, 1961; Kass & Raftery, 1995; Leuenberger & Wegmann, 2010). To evaluate the ability of each model to generate the observed empirical data, we calculated Wegmann’s p-value from the 1,000 retained simulations (Wegmann et al., 2010). The p-value is calculated as the fraction of the retained simulations with a likelihood smaller than or equal to that of the observed empirical data, with low values indicating that a model is highly unlikely (Wegmann et al., 2010). We also assessed the potential for a parameter to be correctly estimated by computing the proportion of parameter variance that was explained (i.e. the coefficient of determination, R²) by the retained PLS components (Neuenschwander et al., 2008). Finally, for the most supported model/s, we determined the accuracy of parameter estimation using a total of 1,000 pseudo-observed datasets (PODs) generated from the prior distributions of the parameters. If the estimation of the parameters is unbiased, posterior quantiles of the parameters obtained from PODs should be uniformly distributed (Cook et al., 2006; Wegmann et al., 2010). As done for the observed empirical data, we calculated the posterior quantiles of true parameters for each pseudo run based on the posterior distribution of the regression-adjusted 1,000 simulations closest to each pseudo-observation (e.g. Massatti & Knowles, 2016).
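The main quantities used for model choice and validation can be illustrated with a minimal, standard-library Python sketch. It is not the ABCTOOLBOX implementation: the rejection step uses a plain Euclidean distance on already transformed statistics, the ABC-GLM adjustment is omitted, and all function names and toy values are hypothetical.

```python
import math

def retain_closest(sim_stats, observed, n_keep):
    """Rejection step: indices of the n_keep simulations whose (transformed)
    summary statistics are closest to the observed ones."""
    ranked = sorted((math.dist(stats, observed), i)
                    for i, stats in enumerate(sim_stats))
    return [i for _, i in ranked[:n_keep]]

def bayes_factor(density_best, density_alternative):
    """Ratio of marginal densities: best model over an alternative model."""
    return density_best / density_alternative

def support_label(bf):
    """Thresholds quoted in the text: > 20 strong, > 150 very strong."""
    if bf > 150:
        return "very strong"
    if bf > 20:
        return "strong"
    return "not strong"

def fraction_leq(retained_likelihoods, observed_likelihood):
    """Fraction of retained simulations with a likelihood smaller than or
    equal to that of the observed data (the p-value described above)."""
    hits = sum(1 for x in retained_likelihoods if x <= observed_likelihood)
    return hits / len(retained_likelihoods)

def ks_uniform_statistic(quantiles):
    """Kolmogorov-Smirnov distance between posterior quantiles and U(0, 1);
    values near zero are consistent with unbiased estimation."""
    q = sorted(quantiles)
    n = len(q)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(q))

# Toy values, for illustration only
print(retain_closest([(0, 0), (3, 4), (1, 1)], (0, 0), n_keep=2))
print(support_label(bayes_factor(0.3, 0.001)))
print(fraction_leq([1, 2, 3, 4], 2))
print(ks_uniform_statistic([0.25, 0.75]))
```

Turning the Kolmogorov-Smirnov distance into a p-value requires the corresponding null distribution, which is left to a statistics library.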


Figure S1 Number of reads per individual before and after the different quality filtering steps performed by STACKS. The total height of the bars represents the total number of raw reads obtained for each individual. Within each bar, dark red represents the reads that were discarded by process_radtags due to low quality, adapter contamination or ambiguous barcode, and orange represents the reads that were discarded by ustacks after filtering out repetitive elements and reads that did not meet the criteria required to create a “stack”. Green represents the number of retained reads used to identify homologous loci. Populations are sorted from NW to SE and labelled using the same codes presented in Table 1.


Figure S2 Results of Bayesian clustering analyses in STRUCTURE based on (a, c, e) 10,000 unlinked SNPs and (b, d, f) the random subset of 1,250 unlinked SNPs used for SPLATCHE2 analyses. Analyses were performed for (a, b) all populations jointly and for (c, d) Northern and (e, f) Southern populations separately. The panels show the mean (± SD) log probability of the data (LnPr(X|K)) over 10 best runs (left y-axes, open dots and error bars) for each value of K and the magnitude of ΔK (right y-axes, black dots).


Figure S3 Results of STRUCTURE analyses showing posterior probability plots of individual assignments to the different genetic clusters inferred at each hierarchical level based on (a) 10,000 unlinked SNPs and (b) the random subset of 1,250 unlinked SNPs used for SPLATCHE2 analyses. Subsets of populations included in each analysis were defined according to their assignment to the genetic clusters inferred at the previous hierarchical level. Each individual is represented by a vertical bar, which is partitioned into k coloured segments showing the individual’s probability of belonging to the cluster with that colour. Thin vertical black lines separate individuals from different populations. Population codes are described in Table 1.


Figure S4 Principal component analyses (PCA) of genetic variation for populations of Dociostaurus hispanicus. Analyses are based on datasets including (a) 10,000 unlinked SNPs and (b) the random subset of 1,250 unlinked SNPs used for SPLATCHE2 analyses. Solid ellipses group populations from Northern (blue) and Southern (red) genetic clusters. Dashed ellipses group populations from the two main genetic groups identified within the Southern cluster. Population codes are described in Table 1.


Figure S5 Root Mean Square Error (RMSE) of parameter estimates against the number of Partial Least Squares (PLS) components under the nine demographic models (A-I; see Table 2) tested for Dociostaurus hispanicus.

Figure S6 Prior and posterior distributions of model parameters for the most supported model (Model H: habitats + barriers; Table 1). Red shading: prior distribution; blue shading: posterior distribution before the ABC-GLM procedure; solid black line: posterior distribution after ABC-GLM; dotted vertical line: mode of parameter estimate. (a) Kmax, carrying capacity; (b) m, migration rate; and (c) NANC, ancestral population size.

Figure S7 Distribution of posterior quantiles of true parameter values from 1,000 pseudo-observed data sets (PODs) used to assess bias in parameter estimation for the most supported model (Model H: habitats + barriers; Table 1). Posterior quantiles (grey bars) are compared to a uniform distribution (dashed red line) using a Kolmogorov-Smirnov test. Significant p-values indicate a deviation from a uniform distribution and potential bias in parameter estimation. (a) Kmax, carrying capacity; (b) m, migration rate; and (c) NANC, ancestral population size.
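
The uniformity check described for Figure S7 can be illustrated with a minimal sketch. The Python function below computes the one-sample Kolmogorov-Smirnov statistic D of a set of posterior quantiles against a Uniform(0, 1) distribution. The function name and the two example quantile sets are hypothetical and are not taken from the analyses in this chapter; in practice, the statistic and its p-value would be obtained from the validation tools of the ABC software or from a statistics library.

```python
def ks_statistic_uniform(quantiles):
    """Return the one-sample Kolmogorov-Smirnov statistic D against Uniform(0, 1)."""
    xs = sorted(quantiles)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one posterior quantile")
    d = 0.0
    for i, x in enumerate(xs, start=1):
        # At x the empirical CDF jumps from (i - 1) / n to i / n,
        # while the Uniform(0, 1) CDF equals x; keep the largest gap.
        d = max(d, i / n - x, x - (i - 1) / n)
    return d


# Hypothetical posterior quantiles of the true parameter value (illustration only).
evenly_spread = [0.1, 0.3, 0.5, 0.7, 0.9]       # close to uniform
piled_up = [0.80, 0.85, 0.90, 0.95, 0.99]       # concentrated near 1

print(ks_statistic_uniform(evenly_spread))
print(ks_statistic_uniform(piled_up))
```

A D close to zero indicates quantiles spread evenly over the unit interval, whereas a large D, as in the second example, indicates quantiles piled up at one end, which is the pattern expected when parameter estimates are biased.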

Table S1 Occurrence data for Dociostaurus hispanicus obtained from records available in the literature, the Global Biodiversity Information Facility (http://www.gbif.org), collections from the National Museum of Natural Sciences (MNCN, Madrid) and our own sampling sites. Geographical data for specimens deposited in the entomological collection of the MNCN are based on the localities specified in their respective labels.

Locality (Province) | Latitude | Longitude | Date | Reference
Negrilla (Salamanca) | 41.121594 | -5.668672 | 1939 | Aguirre-Segura, Barranco-Vega & Pascual, 1995
Trujillo (Cáceres) † | 39.520104 | -6.002698 | 2008 | Aragón, Coca-Abia, Llorente & Lobo, 2013
Belalcázar (Córdoba) † | 38.627145 | -5.116555 | 2008 | Aragón et al., 2013
El Escorial (Madrid) † | 40.573851 | -4.165536 | 2008 | Aragón et al., 2013
Campo de Argañán (Salamanca) † | 40.812937 | -6.771741 | 2008 | Aragón et al., 2013
Cipérez (Salamanca) † | 40.955956 | -6.200795 | 2008 | Aragón et al., 2013
El Abadengo (Salamanca) † | 40.829565 | -6.727795 | 2008 | Aragón et al., 2013
Los Arribes (Salamanca) † | 41.194323 | -6.683850 | 2008 | Aragón et al., 2013
Sierra de Francia (Salamanca) † | 40.479505 | -6.002698 | 2008 | Aragón et al., 2013
(Madrid) | 40.869469 | -3.618099 | 1992 | García, Larrosa, Clemente & Presa, 2005
Escurial de la Santa Peña de Francia (Salamanca) | 40.625864 | -5.959261 | 1988 | García et al., 2005
Sierra Arroyo, paraje Las Batuecas (Salamanca) | 40.528359 | -6.367747 | 1988 | García et al., 2005
Aldehuela de la Bóveda (Salamanca) | 40.847246 | -6.046293 | 1977 | González-García, 1980
Almenara de Tormes (Salamanca) | 41.057450 | -5.832213 | 1977 | González-García, 1980
Cabeza de Diego Gómez (Salamanca) | 40.928820 | -6.066411 | 1977 | González-García, 1980
Campillo (Salamanca) | 40.978521 | -6.059098 | 1977 | González-García, 1980
El Collado (Salamanca) | 40.764740 | -6.357955 | 1977 | González-García, 1980
El Palacio de los Villalones (Salamanca) | 40.885918 | -5.848617 | 1977 | González-García, 1980
Gargabete (Salamanca) | 40.928264 | -5.610810 | 1977 | González-García, 1980
Golpejas (Salamanca) | 40.992398 | -5.912633 | 1977 | González-García, 1980
La Fuente de San Esteban (Salamanca) | 40.798525 | -6.261795 | 1977 | González-García, 1980
Megrillán (Salamanca) | 40.887368 | -5.789338 | 1977 | González-García, 1980
Orejudos (Salamanca) | 40.873774 | -5.659121 | 1977 | González-García, 1980
Pelayos (Salamanca) | 40.649086 | -5.578344 | 1977 | González-García, 1980
Sando (Salamanca) | 40.966006 | -6.112292 | 1977 | González-García, 1980
Tordelalosa (Salamanca) | 40.844030 | -5.805571 | 1977 | González-García, 1980
Vecinos (Salamanca) | 40.777069 | -5.879503 | 1977 | González-García, 1980
Olmedo, Cerro de Fuente (Valladolid) | 41.313792 | -4.662833 | 1904 | Gutiérrez-Martín, 1905
Olmedo, Cuestas del Telégrafo y Alto (Valladolid) | 41.309722 | -4.673888 | 1904 | Gutiérrez-Martín, 1905
(Ciudad Real) | 38.680134 | -4.794070 | 1992 | Hernández-Crespo & Santiago-Alvarez, 1997
Belalcázar (Córdoba) | 38.594329 | -5.157927 | 1991 | Hernández-Crespo & Santiago-Alvarez, 1997
PN Monfragüe (Cáceres) | 39.877830 | -6.181870 | 2008-2011 | Llucià-Pomares & Fernández-Ortín, 2011
PN Monfragüe (Cáceres) | 39.784060 | -6.033920 | 2008-2011 | Llucià-Pomares & Fernández-Ortín, 2011
PN Monfragüe, Serradilla (Cáceres) | 39.838050 | -6.031590 | 2008-2011 | Llucià-Pomares & Fernández-Ortín, 2011
PN Monfragüe, Torrejón el Rubio (Cáceres) | 39.783760 | -6.022250 | 2008-2011 | Llucià-Pomares & Fernández-Ortín, 2011
PN Monfragüe (Cáceres) | 39.856940 | -5.934150 | 2008-2011 | Llucià-Pomares & Fernández-Ortín, 2011
PN Monfragüe (Cáceres) | 39.761409 | -5.790020 | 2008-2011 | Llucià-Pomares & Fernández-Ortín, 2011
Cabañas (Ávila) | 40.559443 | -4.763938 | 2010 | C. Muñoz-Alcón (pers. commun.)
Aldeacentenera, paraje Puente del Conde (Cáceres) | 39.557729 | -5.578050 | 2008, 2009, 2014 | C. Muñoz-Alcón (pers. commun.)
Aldeacentenera, paraje Arroyo del Abad (Cáceres) | 39.565367 | -5.659226 | 2014 | C. Muñoz-Alcón (pers. commun.)
Trabanca (Salamanca) | 41.246303 | -6.394735 | 2010 | C. Muñoz-Alcón (pers. commun.)
Villares de Yeltes, paraje Pedro Alvaro (Salamanca) | 40.895920 | -6.433751 | 2009 | C. Muñoz-Alcón (pers. commun.)
Vilvestre, paraje Vilvestre (Salamanca) | 41.115912 | -6.686962 | 2009 | C. Muñoz-Alcón (pers. commun.)
Cabañas, Sierra del Zapatero (Ávila) | 40.559443 | -4.763938 | 2010 | J. I. Ortega (pers. commun.)
Trabanca (Salamanca) | 41.246303 | -6.394735 | 2010 | J. I. Ortega (pers. commun.)
Villares de Yelte (Salamanca) | 40.895920 | -6.433751 | 2009 | J. I. Ortega (pers. commun.)
Vilvestre (Salamanca) | 41.115912 | -6.686962 | 2009 | J. I. Ortega (pers. commun.)
Castro Verde (Portugal) | 37.638700 | -8.073300 | 2014 | Pina et al., 2017
(Madrid) | 40.672677 | -4.009491 | 1975 | Presa-Asensio, 1977
(Madrid) | 40.686362 | -4.027663 | 1975 | Presa-Asensio, 1977
Cuestas de (Madrid) | 40.589544 | -4.015051 | 1975 | Presa-Asensio, 1977
(Madrid) | 40.720160 | -3.909280 | 1976 | Presa-Asensio, 1977
La Cabrera (Madrid) | 40.861778 | -3.627884 | 1976 | Presa-Asensio, 1977
La Jarosa (Madrid) | 40.669754 | -4.127836 | 1976 | Presa-Asensio, 1977
La Navata (Madrid) | 40.599611 | -3.995964 | 1976 | Presa-Asensio, 1977
(Madrid) | 40.738158 | -3.871657 | 1975 | Presa-Asensio, 1977
Miraflores (Madrid) | 40.819654 | -3.755685 | 1976 | Presa-Asensio, 1977
Robledondo (Madrid) | 40.583651 | -4.210748 | 1973, 1976 | Presa-Asensio, 1977
Santa María de la Alameda (Madrid) | 40.598061 | -4.265184 | 1975, 1976 | Presa-Asensio, 1977
(Madrid) | 40.878023 | -3.664012 | 1976 | Presa-Asensio, 1977
Chamartín de la Rosa (Madrid) | 40.469665 | -3.683231 | - | GBIF, Masó-Ros & Agulló-Villaronga, 2018
, Venta de Cárdenas (Ciudad Real) | 38.438482 | -3.490568 | 1923 | GBIF, Masó-Ros & Agulló-Villaronga, 2018
Cañada del Hoyo (Cuenca) | 39.980656 | -1.889820 | 1987 | MNCN
La Hinojosa (Cuenca) | 39.735519 | -2.424928 | 1979 | MNCN
Hontoba (Guadalajara) | 40.458668 | -3.036776 | 1985 | MNCN
(Madrid) | 40.395991 | -3.995715 | 1936, 1938 | MNCN
Carretera a desde (Madrid) | 40.777436 | -3.776613 | 1985 | MNCN
Casa de Campo (Madrid) | 40.428969 | -3.755441 | 1964 | MNCN
Cementerio Mingorrubio, El Pardo (Madrid) | 40.538115 | -3.783499 | 1979 | MNCN
El Molar (carretera Burgos, parada San Agustín) (Madrid) | 40.723579 | -3.580161 | 1902 | MNCN
El Pardo (Madrid) | 40.523660 | -3.785379 | 1933, 1934, 1935, 1977 | MNCN
El Pardo al Torreón de la Casa de Campo (Madrid) | 40.497757 | -3.764918 | 1978, 1980 | MNCN
El Torreón (El Pardo) (Madrid) | 40.520813 | -3.797708 | 1980 | MNCN
Gandullas (Madrid) | 41.013360 | -3.599210 | 1982 | MNCN
La Quinta de El Pardo (Madrid) | 40.507142 | -3.732476 | 1980, 1982 | MNCN
Miraflores de la Sierra (Madrid) | 40.786890 | -3.771375 | 1986 | MNCN
Miraflores de la Sierra (más allá de Soto del Real) (Madrid) | 40.785428 | -3.771289 | 1980 | MNCN
(Madrid) | 40.380192 | -4.256606 | - | MNCN
Peña Real (Soto del Real) (Madrid) | 40.739047 | -3.755936 | 1978, 1982 | MNCN
Peña Real (Soto del Real) Monte de San Pedro, Finca los Roncajal, Arbustos (Madrid) | 40.739047 | -3.755936 | 1982 | MNCN
Puerto de (Miraflores de la Sierra) (Madrid) | 40.875504 | -3.772373 | 1981 | MNCN
Soto del Real (Madrid) | 40.760993 | -3.770916 | 1985 | MNCN
(Madrid) | 40.496411 | -4.053810 | 1979 | MNCN
Valdemorillo (camping cercano), Herbazal (Madrid) | 40.505866 | -4.051138 | 1979 | MNCN
Valdemorillo (carretera Extremadura hacia la derecha, hacia Villaviciosa de Odón) (Madrid) | 40.474487 | -4.025446 | 1980 | MNCN
Negrillos (pueblo) (Salamanca) | 40.773976 | -5.930094 | - | MNCN
El Espinar (Segovia) | 40.725964 | -4.253588 | 1894 | MNCN
Navas de Río Frío (Segovia) | 40.872049 | -4.135070 | 1957 | MNCN
San Rafael (Segovia) | 40.719115 | -4.202048 | 1929, 1931 | MNCN
Ávila (Ávila) | 40.624106 | -4.689134 | 2015 | Own sampling
Puerto de Peña (Ávila) | 40.432180 | -5.316690 | 2013 | Own sampling
Cáceres, paraje Los Llanos (Cáceres) | 39.533104 | -6.324554 | 2015 | Own sampling
Casa del Ventorro (Cáceres) | 39.685520 | -5.763620 | 2012, 2015 | Own sampling
Serradilla (Cáceres) | 39.741750 | -6.089540 | 2012 | Own sampling
Trujillo (Cáceres) | 39.474870 | -5.892461 | 2015 | Own sampling
Cabañeros (Ciudad Real) | 39.340214 | -4.449220 | 2016 | Own sampling
Navas de Estena, Boquerón de la Estena (Ciudad Real) | 39.496502 | -4.541076 | 2016 | Own sampling
Valle de Alcudia (Ciudad Real) | 38.566058 | -4.310914 | 2012, 2013 | Own sampling
Belalcázar (Córdoba) | 38.594329 | -5.157927 | 2015 | Own sampling
Santa Elena (Jaén) | 38.332111 | -3.529301 | 2015 | Own sampling
Aldea del Fresno (Madrid) | 40.391270 | -4.209260 | 2012 | Own sampling
Trabanca (Salamanca) | 41.242752 | -6.402215 | 2015 | Own sampling
Navas de San Antonio (Segovia) | 40.744560 | -4.299340 | 2014 | Own sampling

† Estimated locations based on maps provided.

Table S2 Pairwise FST values calculated in ARLEQUIN. All FST values are significantly different from zero after sequential Bonferroni corrections (P < 0.05). Population codes are described in Table 1.

Code | TRAB | NAVA | PNEG | AVIL | ALDE | VENT | TRUJ | BELA | ALCU | SANT
TRAB | --
NAVA | 0.153 | --
PNEG | 0.203 | 0.188 | --
AVIL | 0.248 | 0.197 | 0.250 | --
ALDE | 0.225 | 0.206 | 0.269 | 0.307 | --
VENT | 0.185 | 0.185 | 0.231 | 0.277 | 0.124 | --
TRUJ | 0.197 | 0.188 | 0.227 | 0.277 | 0.133 | 0.092 | --
BELA | 0.222 | 0.200 | 0.242 | 0.277 | 0.145 | 0.119 | 0.098 | --
ALCU | 0.226 | 0.206 | 0.238 | 0.286 | 0.146 | 0.125 | 0.122 | 0.109 | --
SANT | 0.200 | 0.183 | 0.231 | 0.263 | 0.141 | 0.102 | 0.098 | 0.087 | 0.082 | --
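
As a reading aid for the lower-triangular layout of Table S2, the following Python sketch stores the tabulated FST values in a lookup keyed by unordered population pairs and retrieves the most and least differentiated pairs. The variable and function names are illustrative assumptions; the values are copied from the table and nothing is re-estimated.

```python
codes = ["TRAB", "NAVA", "PNEG", "AVIL", "ALDE", "VENT", "TRUJ", "BELA", "ALCU", "SANT"]

# Lower triangle of Table S2: row i holds FST between codes[i] and codes[0..i-1].
lower = [
    [],
    [0.153],
    [0.203, 0.188],
    [0.248, 0.197, 0.250],
    [0.225, 0.206, 0.269, 0.307],
    [0.185, 0.185, 0.231, 0.277, 0.124],
    [0.197, 0.188, 0.227, 0.277, 0.133, 0.092],
    [0.222, 0.200, 0.242, 0.277, 0.145, 0.119, 0.098],
    [0.226, 0.206, 0.238, 0.286, 0.146, 0.125, 0.122, 0.109],
    [0.200, 0.183, 0.231, 0.263, 0.141, 0.102, 0.098, 0.087, 0.082],
]


def pairwise_lookup(codes, lower):
    """Map each unordered pair of population codes to its FST value."""
    fst = {}
    for i, row in enumerate(lower):
        for j, value in enumerate(row):
            fst[frozenset((codes[i], codes[j]))] = value
    return fst


fst = pairwise_lookup(codes, lower)
most_divergent = max(fst, key=fst.get)    # pair with the highest FST
least_divergent = min(fst, key=fst.get)   # pair with the lowest FST

print(sorted(most_divergent), fst[most_divergent])
print(sorted(least_divergent), fst[least_divergent])
```

Keying the lookup by unordered pairs means a value can be retrieved regardless of the order in which the two population codes are given, which mirrors the symmetry of the pairwise matrix.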

Supplementary References

Aguirre-Segura, A., Barranco-Vega, P. & Pascual, F. (1995) La colección de ortópteros de la Estación Experimental de Zonas Áridas (C.S.I.C.) de Almería (Insecta, Orthoptera). Boletín de la Asociación Española de Entomología, 19, 133-155.

Aragón, P., Coca-Abia, M.M., Llorente, V. & Lobo, J.M. (2013) Estimation of climatic favourable areas for locust outbreaks in Spain: integrating species' presence records and spatial information on outbreaks. Journal of Applied Entomology, 137, 610-623.

Beaumont, M.A., Zhang, W. & Balding, D.J. (2002) Approximate Bayesian computation in population genetics. Genetics, 162, 2025-2035.

Bemmels, J.B., Title, P.O., Ortego, J. & Knowles, L.L. (2016) Tests of species-specific models reveal the importance of drought in postglacial range shifts of a Mediterranean-climate tree: insights from integrative distributional, demographic and coalescent modelling and ABC model selection. Molecular Ecology, 25, 4889-4906.

Boulesteix, A.L. & Strimmer, K. (2007) Partial least squares: a versatile tool for the analysis of high-dimensional genomic data. Briefings in Bioinformatics, 8, 32-44.

Brauer, C.J., Hammer, M.P. & Beheregaray, L.B. (2016) Riverscape genomics of a threatened fish across a hydroclimatically heterogeneous river basin. Molecular Ecology, 25, 5093-5113.

Burnham, K.P. & Anderson, D.R. (1998) Model selection and inference: a practical information-theoretic approach. 355 pp. Springer New York, USA.

Catchen, J., Hohenlohe, P.A., Bassham, S. et al. (2013) STACKS: an analysis tool set for population genomics. Molecular Ecology, 22, 3124-3140.

Catchen, J.M., Amores, A., Hohenlohe, P. et al. (2011) STACKS: building and genotyping loci de novo from short-read sequences. G3: Genes|Genomes|Genetics, 1, 171-182.

Cook, S.R., Gelman, A. & Rubin, D.B. (2006) Validation of software for Bayesian models using posterior quantiles. Journal of Computational and Graphical Statistics, 15, 675-692.

CORINE (2012) CORINE Land Cover, EEA, Commission of the European Communities. [WWW document]. URL https://land.copernicus.eu/pan-european/corine-land-cover/clc-2012

Csilléry, K., Blum, M.G.B., Gaggiotti, O.E. & François, O. (2010) Approximate Bayesian Computation (ABC) in practice. Trends in Ecology & Evolution, 25, 410-418.

Currat, M., Ray, N. & Excoffier, L. (2004) SPLATCHE: a program to simulate genetic diversity taking into account environmental heterogeneity. Molecular Ecology Notes, 4, 139-142.

Excoffier, L., Hofer, T. & Foll, M. (2009) Detecting loci under selection in a hierarchically structured population. Heredity, 103, 285-298.

Excoffier, L. & Lischer, H.E. (2010) ARLEQUIN suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Molecular Ecology Resources, 10, 564-567.

Foll, M. & Gaggiotti, O. (2008) A genome-scan method to identify selected loci appropriate for both dominant & codominant markers: a Bayesian perspective. Genetics, 180, 977-993.

García, M.D., Larrosa, E., Clemente, M.E. & Presa, J.J. (2005) Contribution to the knowledge of genus Dociostaurus Fieber, 1853 in the Iberian Peninsula, with special reference to its sound production (Orthoptera: Acridoidea). Anales de Biología, 27, 155-189.

González-García, M.J. (1980) Contribución al conocimiento de los Acridoidea (Orth.) de la Dehesa Salmantina. Boletín de la Asociación Española de Entomología, 4, 55-64.

Gutiérrez-Martín, D. (1905) Algunos ortópteros de Olmedo (Valladolid). Boletín de la Real Sociedad Española de Historia Natural, 5, 140-143.

He, Q., Edwards, D.L. & Knowles, L.L. (2013) Integrative testing of how environments from the past to the present shape genetic structure across landscapes. Evolution, 67, 3386-3402.

Heng, L. (2017) SEQTK. [WWW document]. URL https://github.com/lh3/seqtk

Hernández-Crespo, P. & Santiago-Alvarez, C. (1997) Entomopathogenic fungi associated with natural populations of the Moroccan locust Dociostaurus maroccanus (Thunberg) (Orthoptera: Gomphocerinae) and other Acridoidea in Spain. Biocontrol Science and Technology, 7, 357-364.

Hohenlohe, P.A., Bassham, S., Etter, P.D. et al. (2010) Population genomics of parallel adaptation in threespine stickleback using sequenced RAD Tags. PLOS Genetics, 6, e1000862.

Jeffreys, H. (1961) Theory of probability. 472 pp. Oxford University Press. Oxford, UK.

Kass, R.E. & Raftery, A.E. (1995) Bayes Factors. Journal of the American Statistical Association, 90, 773-795.

Knowles, L.L. & Massatti, R. (2017) Distributional shifts - not geographic isolation - as a probable driver of montane species divergence. Ecography, 40, 1475-1485.

Leuenberger, C. & Wegmann, D. (2010) Bayesian computation and model selection without likelihoods. Genetics, 184, 243-252.

Lischer, H.E. & Excoffier, L. (2012) PGDSPIDER: an automated data conversion tool for connecting population genetics and genomics programs. Bioinformatics, 28, 298-299.

Llucià-Pomares, D. & Fernández-Ortín, D. (2011) Nuevos datos sobre la ortopterofauna del Parque Nacional de Monfragüe y zona periférica de protección (Cáceres, España). Boletín de la Sociedad Entomológica Aragonesa, 48, 267-286.

Massatti, R. & Knowles, L.L. (2016) Contrasting support for alternative models of genomic variation based on microhabitat preference: species-specific effects of climate change in alpine sedges. Molecular Ecology, 25, 3974-3986.

Masó-Ros, G. & Agulló-Villaronga, J. (2018) Dociostaurus hispanicus. Global Biodiversity Information Facility (Museu de Ciències Naturals de Barcelona: MCNB-ART). [WWW documents]. URL https://www.gbif.org/occurrence/1248766649 and https://www.gbif.org/occurrence/1248766661

Mevik, B.-H. & Wehrens, R. (2007) The pls Package: Principal component and partial least Squares regression in R. Journal of Statistical Software, 18, 1-23.

Muscarella, R., Galante, P.J., Soley-Guardia, M. et al. (2014) ENMEVAL: an R package for conducting spatially independent evaluations and estimating optimal model complexity for MAXENT ecological niche models. Methods in Ecology and Evolution, 5, 1198-1205.

Neuenschwander, S., Largiadèr, C.R., Ray, N. et al. (2008) Colonization history of the Swiss Rhine basin by the bullhead (Cottus gobio): inference under a Bayesian spatially explicit framework. Molecular Ecology, 17, 757-772.

Ortego, J., Aguirre, M.P. & Cordero, P.J. (2010) Population genetics of Mioscirtus wagneri, a grasshopper showing a highly fragmented distribution. Molecular Ecology, 19, 472-483.

Ortego, J., Aguirre, M.P., Noguerales, V. & Cordero, P.J. (2015) Consequences of extensive habitat fragmentation in landscape-level patterns of genetic diversity and structure in the Mediterranean esparto grasshopper. Evolutionary Applications, 8, 621-632.

Ortego, J., Gugger, P.F. & Sork, V.L. (2018) Genomic data reveal cryptic lineage diversification and introgression in Californian golden cup oaks (section Protobalanus). New Phytologist, 218, 804-818.

Paquette, S.R., Talbot, B., Garant, D. et al. (2014) Modelling the dispersal of the two main hosts of the raccoon rabies variant in heterogeneous environments with landscape genetics. Evolutionary Applications, 7, 734-749.

Peterson, B.K., Weber, J.N., Kay, E.H. et al. (2012) Double digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species. PLOS ONE, 7, e37135.

Pina, S., Vasconcelos, S., Reino, L. et al. (2017) The Orthoptera of Castro Verde special protection area (Southern Portugal): new data and conservation value. ZooKeys, 691, 19-48.

Presa-Asensio, J.J. (1978) Los Acridoidea (Orthoptera) de la Sierra del . Cátedra de Entomología, trabajo nº 26. 281 pp. Universidad Complutense de Madrid, Facultad de Biología. Madrid, Spain.

Ray, N., Currat, M., Foll, M. & Excoffier, L. (2010) SPLATCHE2: a spatially explicit simulation framework for complex demography, genetic admixture and recombination. Bioinformatics, 26, 2993-2994.

Reinhardt, K., Köhler, G., Maas, S. & Detzel, P. (2005) Low dispersal ability and habitat specificity promote extinctions in rare but not in widespread species: the Orthoptera of Germany. Ecography, 28, 593-602.

Simon, A. (2018) FASTQC v.0.11.7. [WWW document]. URL http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Warren, D.L., Glor, R.E. & Turelli, M. (2010) ENMTOOLS: a toolbox for comparative studies of environmental niche models. Ecography, 33, 607-611.

Warren, D.L. & Seifert, S.N. (2011) Ecological niche modeling in MAXENT: the importance of model complexity and the performance of model selection criteria. Ecological Applications, 21, 335-342.

Warren, D.L., Wright, A.N., Seifert, S.N. & Shaffer, H.B. (2014) Incorporating model complexity and spatial sampling bias into ecological niche models of climate change risks faced by 90 California vertebrate species of concern. Diversity and Distributions, 20, 334-343.

Wegmann, D., Leuenberger, C., Neuenschwander, S. & Excoffier, L. (2010) ABCTOOLBOX: a versatile toolkit for approximate Bayesian computations. BMC Bioinformatics, 11, 116.

CHAPTER 4

Insights into the neutral and adaptive processes shaping the spatial distribution of genomic variation in the economically important Moroccan locust (Dociostaurus maroccanus)

María José González-Serna, Pedro J. Cordero & Joaquín Ortego
Evolutionary Applications (2019), submitted.

Abstract

Understanding the processes that shape neutral and adaptive genomic variation is a fundamental step to determine the demographic and evolutionary dynamics of pest species and develop informed management practices aimed at mitigating their negative impacts. Here, we use genomic data (~50,000 SNPs) obtained via restriction site-associated DNA sequencing (ddRADseq) to investigate the factors shaping the genetic structure of Moroccan locust (Dociostaurus maroccanus) populations from the westernmost portion of the species distribution (Iberian Peninsula and Canary Islands), infer historical demographic trends in outbreak and solitary populations, and determine the relative role of neutral vs. selective processes in shaping spatial patterns of genomic variation in this pest species of great economic importance. Our analyses showed that Iberian populations are characterized by high gene flow and a shallow genetic structure, whereas the highly isolated Canarian populations have experienced strong genetic drift and loss of genetic diversity. Historical demographic reconstructions revealed that all populations passed through a marked genetic bottleneck around the last glacial maximum (~21 ka BP) followed by an abrupt demographic expansion at the onset of the Holocene, indicating increased effective population sizes during warm periods as expected from the thermophilic nature of the species. Genome-scans and environmental association analyses identified several putative loci under selection, suggesting that local adaptation processes in certain populations might not be impeded by widespread gene flow. Finally, all analyses showed very few differences between traditionally outbreak and non-outbreak populations in patterns of genetic diversity and structure, demographic profiles, or signatures of selection. Integrated pest management practices should consider population connectivity at contrasting spatial scales, the potential importance of local adaptation processes for population persistence, and likely future range expansions and increased outbreak frequency in response to land use changes and ongoing climate warming.

Keywords: ddRADseq, demographic inference, environmental association analyses, genetic admixture, genetic structure, local adaptation, pest species.

INTRODUCTION

Pest species, either invasive or native, are responsible for considerable economic losses worldwide and, as a result, public, private and non-profit organizations annually invest vast amounts of resources to prevent or mitigate their negative impacts (Skaf et al., 1990; Enserink, 2004). In turn, such management practices often have undesirable side effects on wildlife, natural ecosystems and human health (e.g. Baker & Wilkinson, 1990; Carson, 2002). For these reasons, understanding the population dynamics, dispersal routes, demographic history and idiosyncratic evolutionary processes of pest species is a fundamental step to predict their future impacts and develop informed, less pernicious and more targeted management practices (Lankau et al., 2011; Abrol, 2014). Population genetic approaches have proven useful to address several of the above-mentioned aspects, and their potential has grown exponentially in recent years with the generalization of high-throughput sequencing techniques and the possibility of inferring both neutral and adaptive evolutionary processes at an unprecedented resolution (Wang et al., 2014; Crossley et al., 2017). The application of genomic tools is particularly important considering that pest species often show large effective population sizes, high dispersal potential, shallow genetic differentiation, and fluctuating and complex demographic dynamics that are difficult to study using traditional capture-mark-recapture approaches or standard genetic methods (Ibrahim et al., 2000; Chapuis et al., 2008; Chapuis et al., 2011a; Bekkevold et al., 2016).

The genetic makeup of a species is influenced by complex interactions between neutral and selective forces, life-history characteristics, and contemporary and past environmental conditions that collectively shape the evolutionary and demographic fate of its populations (Bernatchez et al., 2018). From a neutral perspective, recently developed analytical methods allow inferring complex demographic processes from genomic data and obtaining robust estimates of population size changes and gene flow at different time scales (Liu & Fu, 2015; Miles et al., 2017; Sherpa et al., 2018). Historical demographic reconstructions, linking population-size changes with past environmental fluctuations, can help to predict future demographic trends of pest species under hypothetical climate change scenarios (Espindola et al., 2012; Fordham et al., 2014; Brown et al., 2016), whereas contemporary estimates of population genetic connectivity and spatially-explicit landscape genetic analyses are useful to obtain baseline information on dispersal rates (Crossley et al., 2019) and identify corridors for gene flow that might facilitate pest spread (Venkatesan & Rasgon, 2010; Zepeda-Paulo et al., 2010; Karsten et al., 2016; Qin et al., 2018). From an adaptive perspective, new molecular tools bring the opportunity to infer selection at specific loci displaying patterns of variation that contrast with those characterizing genomic regions only affected by neutral processes such as mutation, genetic drift, migration and demographic changes (Luikart et al., 2003; Berdan et al., 2015). Identifying loci that exhibit a significant reduction of within-population genetic variability and higher divergence across populations consistent with disruptive selection (Vasemägi & Primmer, 2005), coupled with environmental association analyses (EAA; Rellstab et al., 2015) linking variation in allele frequencies with ecological gradients, can help to determine local adaptation processes in response to specific selective forces such as those imposed by climate (Guo et al., 2016; Crossley et al., 2017; Dowle et al., 2017; Dudaniec et al., 2018), pesticide application (Gassmann et al., 2009; Leftwich et al., 2016; Crossley et al., 2017) or host plant use (Gassmann et al., 2009; Soria-Carrasco et al., 2014; Simon et al., 2015). Gene flow is generally accepted to constrain local adaptation and, conversely, strong divergent selection is expected to prevent realized gene flow among populations (Lenormand, 2002). Thus, joint inference of both neutral and selective phenomena can provide a more comprehensive understanding of the relative role of dispersal and local adaptation processes in structuring genetic variation at different spatiotemporal scales (Guo et al., 2016; Dudaniec et al., 2018).

Locusts are a paradigmatic case of pest species with cyclical outbreaks that cause considerable agricultural losses and remission periods during which local populations either disappear or persist at very low densities (Skaf et al., 1990; Latchininsky, 1998; Enserink, 2004; Chapuis et al., 2009). The relative prevalence of these two phases also varies geographically, with populations from some areas recurrently becoming agricultural pests while those from traditionally non-outbreaking regions often occur at low numbers forming harmless populations (Latchininsky, 1998; Chapuis et al., 2009). An intriguing example of this demographic stochasticity is the case of the mysterious Rocky Mountain locust Melanoplus spretus, a devastating pest species endemic to North American prairies during the 19th century that suddenly went extinct within a 30-year interval without the causes of this phenomenon having yet been clearly identified (Chapco & Litzenberger, 2004; Lockwood, 2004). The extreme demographic oscillations characterizing locust populations are expected to have a considerable impact on genetic variation and the potential of species to respond to selection and evolve local adaptations. On the one hand, population crashes in the transition phase from gregarious to solitary forms are likely to leave genetic signatures of demographic bottlenecks that are probably ephemeral and blurred by genetic admixture after population expansions during outbreak periods (Ibrahim et al., 2000; Ibrahim, 2001; Chapuis et al., 2008). On the other hand, local adaptation processes in response to spatially varying selective pressures could be impeded by high gene flow (Lenormand, 2002; Pujolar et al., 2014; Babin et al., 2017) or restricted to isolated populations that are not swamped by gene flow from outbreaking populations (Chapuis et al., 2014). Previous single-locus and microsatellite-based studies on different locust species have found very shallow patterns of genetic structure at local/regional scales (e.g. Chapuis et al., 2011a), no genetic differentiation between gregarious and solitary phase populations (Chapuis et al., 2008), higher levels of gene flow among outbreaking populations than among recession or non-outbreak populations (Ibrahim et al., 2000; Ibrahim, 2001; Chapuis et al., 2009), and little impact of recession periods on genetic diversity (Ibrahim et al., 2000; Chapco & Litzenberger, 2004; Chapuis et al., 2014). However, with the exception of the recent sequencing and annotation of the Locusta migratoria genome (Wang et al., 2014) and the identification of some genes linked to physiology, phase change and dispersal capacity (Wang et al., 2014; Ernst et al., 2015; Martín-Blázquez et al., 2017; Bakkali & Martín-Blázquez, 2018), high-resolution genomic data have not yet been employed to determine fine-scale spatial patterns of genetic structure, perform robust demographic inferences in outbreak and non-outbreak populations, and assess the potential role of selective processes in shaping spatial patterns of genetic variation in these organisms of great economic importance (e.g. Crossley et al., 2017).

The so-called Moroccan locust, Dociostaurus maroccanus (Thunberg, 1815), is a xerophilous species distributed across most of the Western Palearctic, from the Canary Islands to South Kazakhstan (Latchininsky, 1998; Cigliano et al., 2019). The species is characterized by its broad polyphagia, extreme voracity, enormous fecundity, extraordinarily fluctuating population sizes, and high capability to migrate (del Cañizo & Moreno, 1949; Uvarov, 1977; Latchininsky, 1998; el Ghadraoui et al., 2002). Its distribution is discontinuous and consists of fragmented non-outbreaking populations in some areas and permanent foci of outbreak populations that cyclically become devastating agricultural pests (del Cañizo & Moreno, 1949; Latchininsky, 1998). It has been reported that Moroccan locusts can move distances of 70-100 km during their entire lifetime (rarely up to 200 km; Latchininsky, 1998), making possible the exchange of individuals between distant populations during swarming phases (Latchininsky, 1998). The species is considered a major agricultural pest of high economic importance, damaging pastures and a wide variety of crops during outbreaks, which requires extensive control operations and chemical interventions at a tremendous cost year after year in affected countries (e.g. Arias-Giralda et al., 1997; Latchininsky, 2013; Guerrero et al., 2019). Here, we focus on populations of the Moroccan locust from the Iberian Peninsula and the Canary Islands, which represent the western margin of the species’ distribution (Latchininsky, 2013). In the Iberian Peninsula, there are three main vast regions that are traditional foci of population outbreaks and have suffered considerable damage to pastures and crops for centuries: Monegros (Aragón), La Serena (Extremadura) and Valle de Alcudia (Castilla-La Mancha) (Arias-Giralda et al., 1993; Alberola-Roma, 2012). These regions are characterized by considerable cattle overgrazing, a factor that has been related to irruptive population growth in the Moroccan locust (Louveaux et al., 1996; Latchininsky, 1998). Besides, there are other regions of the Iberian Peninsula that represent historical outbreak areas that nowadays only sustain small populations or where the species has traditionally occurred at very low densities in isolated pockets of suitable habitat (Latchininsky, 1998; Aragón et al., 2013). The small size of some formerly outbreak populations has been hypothesized to be related to the expansion of agriculture and certain ploughing techniques, the destruction and fragmentation of suitable breeding areas linked to land use changes, and the massive application of pesticides to control locust populations (Latchininsky, 1998). In this respect, several Moroccan locust populations from outbreak areas have been considerably reduced by human intervention to the point that many of them have almost disappeared in the last decades (Latchininsky, 1998; Aragón et al., 2013). However, no study has been performed so far to understand the degree of genetic connectivity among populations of the Moroccan locust at regional scales, infer its past demographic history or determine the potential role of local adaptation processes, information that might help to shed light on key aspects of the ecology, distribution and evolutionary dynamics of this economically important species.

In this study, we use genomic data obtained via double-digest restriction site-associated DNA sequencing (ddRADseq) to investigate the factors shaping the genetic structure of Moroccan locust populations from the westernmost portion of the species’ distribution, infer the past demographic history of outbreak and solitary populations, and determine the relative role of neutral vs. selective processes in structuring spatial patterns of genetic variation in the species. Specifically, (i) we employed estimates of genomic differentiation and diversity and performed Bayesian clustering analyses to test the hypothesis of increased genetic differentiation and lower levels of genetic variation in solitary than in outbreak populations (Chapuis et al., 2014). (ii) Given the thermophilous character of the Moroccan locust, we hypothesized historical bottlenecks during glacial periods, population expansions in interglacials (Meco et al., 2011), and genomic signatures of recent demographic declines in solitary populations and in formerly outbreaking populations that have experienced remarkable retreats during the last decades (Ibrahim et al., 2000; Chapuis et al., 2014). Finally, (iii) we performed genome scans and environmental association analyses to identify loci under selection and evaluate the potential importance of local adaptation processes in shaping spatial patterns of genetic differentiation at non-neutral genomic regions.

MATERIALS AND METHODS

POPULATION SAMPLING

Between May and July 2009-2016, we prospected adequate habitats for the Moroccan locust (Dociostaurus maroccanus) (i.e. grazed grasslands, natural sparse vegetation, arid or semi-desert steppes, abandoned agricultural fields, etc.; Latchininsky, 1998; Latchininsky, 2013) in the Iberian Peninsula and the Canary Islands. We sampled a total of 21 localities representative of both outbreak and non-outbreak populations, a status defined according to our own field observations and information provided by regional government authorities implementing pest management programs (Fig. 1; Table S1). For this study, we analysed 5-8 adult individuals per locality (Table S1). Fresh whole adult specimens were placed in vials with 2-5 ml of 96% ethanol and stored at -20 °C until needed for genomic analyses.

Figure 1 Geographic location of the studied populations of Moroccan locust in (A) the Iberian Peninsula and (C) the Canary Islands. Black dots indicate those populations analysed with STAIRWAY PLOT (n = 8 individuals) and white squares the rest of the populations (n < 8 individuals). Panels (B) and (C) present the inferred demographic profiles for Iberian and Canarian populations, respectively. Lines show the median estimate of effective population size (Ne) over time, assuming a mutation rate of 2.8 × 10⁻⁹ and a 1-year generation time. Population codes are described in Table S1.

GENOMIC LIBRARY PREPARATION AND DATA PROCESSING

We used NucleoSpin Tissue kits (Macherey-Nagel, Düren, Germany) to extract and purify genomic DNA from the hind femur of each individual. Genomic DNA was processed into three genomic libraries using the double-digest restriction site-associated DNA sequencing procedure (ddRADseq) described in Peterson et al. (2012) (see Supporting Information Methods S1). We used the STACKS v. 1.35 pipeline to assemble our sequences into de novo loci and call genotypes (Hohenlohe et al., 2010; Catchen et al., 2011; Catchen et al., 2013). See Supporting Information Methods S2 for details on sequence assembly and genomic data filtering.

OUTLIER LOCI DETECTION AND ENVIRONMENTAL ASSOCIATION ANALYSES

In a first step, we screened for loci not conforming to neutral expectations using two outlier detection approaches: the coalescent-based FDIST method from ARLEQUIN (Excoffier & Lischer, 2010) and the Bayesian approach implemented in BAYESCAN v.2.1 (Foll & Gaggiotti, 2008). The FDIST method in ARLEQUIN was run in two different ways, considering both the non-hierarchical (Beaumont & Nichols, 1996) and hierarchical (Excoffier et al., 2009) island models. The non-hierarchical island model was run using 200,000 simulations, 100 demes and expected heterozygosity ranging from 0 to 1. The hierarchical island model was run grouping populations according to their geographical origin and the results obtained from genetic structure analyses (i.e. Iberian vs. Canarian populations; see RESULTS section), using the same settings as the non-hierarchical model, and considering three simulated groups (i.e. the number of defined population groups plus one, as recommended in Excoffier & Lischer, 2010). P-values were corrected for multiple testing using the p.adjust function in R (R Core Team, 2018), and loci significantly outside the neutral distribution at a false discovery rate (FDR) of 5% (i.e. q-value < 0.05) were considered outliers. BAYESCAN analyses were run under default settings (thinning interval size of 10; 20 pilot runs of 5,000 iterations; burn-in length of 50,000 iterations), except that the number of output iterations was increased to 10,000. We used prior odds of 10 (default) and adopted the same FDR to identify candidate loci under selection as in ARLEQUIN analyses (FDR of 5%, q-value < 0.05). BAYESCAN and ARLEQUIN analyses were run independently for two different genomic datasets, one considering all populations and another considering only populations from the Iberian Peninsula (e.g. Guo et al., 2016).
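The multiple-testing step can be illustrated with a minimal Python sketch. It assumes the Benjamini-Hochberg procedure (the "BH" method of p.adjust); the text does not state which p.adjust method was used, so this choice, the function names and the example p-values are illustrative assumptions rather than the authors' code.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices by ascending p
    q = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        q[i] = running_min
    return q


def outlier_indices(p_values, fdr=0.05):
    """Indices of loci whose q-value falls below the chosen FDR threshold."""
    return [i for i, q in enumerate(bh_adjust(p_values)) if q < fdr]


# Hypothetical per-locus p-values (not from the study).
example_p = [0.001, 0.008, 0.039, 0.041, 0.60]
print(outlier_indices(example_p))
```

Loci are flagged on the adjusted values rather than on the raw p-values, which is what keeps the expected proportion of false positives among the flagged loci at or below the chosen FDR.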

In a second step, we performed environmental association analyses (EAA) using Latent Factor Mixed Models (LFMM) implemented in the R v.3.3.3 (R Core Team, 2018) package LEA (Frichot & François, 2015). This approach uses a stochastic Markov chain Monte Carlo (MCMC) algorithm to test for associations between environmental/ecological variables and allele frequencies while simultaneously controlling for background levels of population structure (Frichot et al., 2013). As environmental information we used the 19 bioclimatic variables from the WORLDCLIM dataset interpolated to 30-arcsec resolution (~1 km² cell size) (Hijmans et al., 2005). We extracted the values of these variables from all cells adjacent to each sampling location (i.e. ~9 km²) using bilinear interpolation as implemented in ARCGIS 10.3 (ESRI, Redlands, CA, USA). To summarize and reduce redundancy among the 19 bioclimatic variables, we ran a principal component analysis (PCA) on all of them (e.g. Frichot & François, 2015). The first three principal components (PCs) cumulatively accounted for 94.82% of the variance (PC1: 38.58%; PC2: 33.77%; PC3: 22.47%) and were retained for LFMM analyses (Table S1). The contribution of bioclimatic variables to each axis (i.e. the factor loadings for each PC) is presented in Table S2. The MCMC algorithm was run for each of the three PCs (i.e. PC1, PC2 and PC3), using 10,000 iterations, a burn-in period of 5,000 iterations, and 5 independent replicates of the analysis. As indicated above for BAYESCAN and non-hierarchical ARLEQUIN analyses, LFMM analyses were run independently for all populations and for Iberian populations only. The number of latent factors included in the model as a covariate to control for demographic history was defined on the basis of STRUCTURE analyses (Pritchard et al., 2000; see RESULTS section) and sparse nonnegative matrix factorization (snmf) analyses implemented in the R package LEA (Frichot et al., 2014). We set the number of latent factors (K) at K = 2 for analyses including all populations and K = 1 for analyses focused on Iberian populations. The z-scores over the five replicates were combined and recalibrated using the genomic inflation factor (λ) (Frichot & François, 2015). Finally, we performed an FDR adjustment to control for multiple tests (FDR of 5%, q-value < 0.05) and identify candidate loci under environmental selection for each of the three PCs summarizing bioclimatic variation.
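The combination and recalibration of z-scores can be sketched as follows. The sketch assumes that replicate runs are combined by taking the per-locus median and that λ is the median of the squared combined z-scores divided by the median of a chi-square distribution with one degree of freedom; the text does not spell out these details, so they, together with all function and variable names, should be read as assumptions.

```python
import math
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def combine_and_recalibrate(z_runs):
    """z_runs holds one list of per-locus z-scores for each replicate run.

    Returns the genomic inflation factor and the recalibrated p-values."""
    n_loci = len(z_runs[0])
    # Combine replicates locus by locus (median across runs, an assumption).
    z = [median(run[j] for run in z_runs) for j in range(n_loci)]
    lam = median(v * v for v in z) / CHI2_1DF_MEDIAN
    # Rescale the test statistic by lambda before computing p-values.
    p_values = [chi2_sf_1df(v * v / lam) for v in z]
    return lam, p_values
```

The recalibrated p-values returned here would then go through the FDR adjustment. A λ close to 1 suggests that the chosen number of latent factors has absorbed most of the confounding population structure.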

GENETIC STRUCTURE

Genetic structure at neutral loci. We inferred genetic structure at neutral loci using the Bayesian Markov chain Monte Carlo clustering method implemented in the program STRUCTURE v.2.3.3 (Pritchard et al., 2000; Falush et al., 2003; Hubisz et al., 2009). Outlier loci detected by ARLEQUIN and BAYESCAN and SNPs identified by LFMM analyses to be under environmentally driven selection (a total of 9,346 loci; see RESULTS section) were conservatively compiled into a black list and excluded to create datasets containing only neutral loci (e.g. Brauer et al., 2016; Ortego et al., 2018). This yielded neutral datasets of 40,179 SNPs for all populations, 42,114 SNPs for Iberian populations, and 33,998 SNPs for Canarian populations. To make analyses computationally tractable, we ran STRUCTURE on a random subset of 10,000 unlinked neutral SNPs from each of these three datasets, using 200,000 MCMC iterations after a burn-in step of 100,000 iterations, assuming correlated allele frequencies and admixture, and without considering prior population information (Hubisz et al., 2009). We performed 15 independent runs for each value of K to estimate the “true” number of clusters. We retained the ten runs with the highest likelihood for each value of K and identified the most likely number of genetic clusters (K) using log probabilities of Pr(X|K) (Pritchard et al., 2000) and the ΔK method (Evanno et al., 2005), as implemented in STRUCTURE HARVESTER (Earl & vonHoldt, 2012). We used CLUMPP v. 1.1.2 and the Greedy algorithm to align multiple runs of STRUCTURE for the same K value (Jakobsson & Rosenberg, 2007) and DISTRUCT v. 1.1 (Rosenberg, 2004) to visualize as bar plots each individual’s probabilities of membership to each inferred genetic cluster. Complementary to Bayesian clustering analyses and in order to visualize the major axes of population genetic differentiation, we performed individual-based principal component analyses (PCA) using the R package ADEGENET (Jombart, 2008). PCAs were run using all neutral SNPs for the two main datasets (all populations and Iberian Peninsula).
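The ΔK criterion can be illustrated with a short Python sketch. It assumes the usual formulation, in which ΔK is the absolute second-order difference of the mean ln Pr(X|K) across replicate runs divided by the standard deviation of ln Pr(X|K) at that K. The function name and the likelihood values are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def delta_k(lnp):
    """lnp maps each K to the list of ln Pr(X|K) values from replicate runs.

    Returns {K: deltaK} for every K that has both K-1 and K+1 available."""
    means = {k: mean(values) for k, values in lnp.items()}
    result = {}
    for k in sorted(lnp):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_diff / stdev(lnp[k])
    return result


# Hypothetical log-likelihoods for three replicate runs per K.
example = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-800.0, -801.0, -799.0],
    3: [-790.0, -795.0, -785.0],
    4: [-788.0, -798.0, -778.0],
}
scores = delta_k(example)
print(max(scores, key=scores.get))
```

Because ΔK needs both neighbouring values of K, it cannot be evaluated at the smallest or largest K tested, which is one reason to inspect ln Pr(X|K) alongside it.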

Genetic structure at loci under divergent selection. Genetic structure at loci identified to be under diversifying selection by either ARLEQUIN or BAYESCAN (see RESULTS section) was analysed for all populations (694 SNPs) and Iberian populations (311 SNPs) using STRUCTURE and PCAs as detailed above for neutral loci.

GEOGRAPHICAL AND ENVIRONMENTAL DRIVERS OF GENETIC DIFFERENTIATION

We tested for the presence of isolation-by-distance (IBD) and/or isolation-by-environment (IBE) patterns of genetic structure by analysing the association between genetic differentiation (FST) and geographic and environmental distances among populations (Shafer & Wolf, 2013; Wang, 2013; Sexton et al., 2014). Genetic differentiation (FST) between all pairs of populations was calculated using the program populations from STACKS separately for neutral loci (Table S3) and for the subset of loci identified to be under diversifying selection by ARLEQUIN or BAYESCAN (Table S4). Geographic distance between each pair of populations was calculated using GEOGRAPHIC DISTANCE MATRIX GENERATOR v.1.2.3 (Ersts, 2018). Environmental distances were calculated for each PC (PC1, PC2 and PC3) obtained from a PCA on the 19 bioclimatic variables (see LFMM analyses above for details on PCA) using the “dist” function in R 3.3.3 (R Core Team, 2018). Genetic differentiation (FST) was tested against matrices of geographical and environmental distances using multiple matrix regressions with randomization (MMRR) as implemented in the “MMRR” function (Wang, 2013) in R 3.3.3 (R Core Team, 2018). We ran four independent analyses, considering the two subsets of loci (neutral loci and outlier SNPs under diversifying selection) for either all populations or only Iberian populations (e.g. Guo et al., 2016). We selected final models following a backward procedure, initially fitting all explanatory terms and progressively removing non-significant variables until all retained variables were significant. The significance of the variables excluded from the model was tested again until no additional term reached significance (e.g. Ortego et al., 2015).
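The randomization logic behind matrix regression can be sketched in Python for the simplest case of a single predictor matrix. This is a deliberate simplification: the MMRR function of Wang (2013) handles several predictor matrices at once, whereas the sketch below regresses FST on one distance matrix only and permutes the rows and columns of the response matrix jointly. All names and the example data are hypothetical.

```python
import random


def lower_triangle(mat):
    """Unfold the below-diagonal entries of a square matrix into a list."""
    return [mat[i][j] for i in range(len(mat)) for j in range(i)]


def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def matrix_regression(fst, dist, n_perm=999, seed=1):
    """Slope of FST on distance and its two-sided permutation p-value."""
    x = lower_triangle(dist)
    observed = slope(x, lower_triangle(fst))
    rng = random.Random(seed)
    n = len(fst)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)
        # Permute populations, i.e. rows and columns together.
        shuffled = [[fst[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
        if abs(slope(x, lower_triangle(shuffled))) >= abs(observed):
            hits += 1
    return observed, hits / (n_perm + 1)


# Hypothetical populations placed along a line; FST grows linearly with distance.
coords = [0, 1, 3, 7, 15, 31]
dist = [[abs(a - b) for b in coords] for a in coords]
fst = [[0.01 * d for d in row] for row in dist]
print(matrix_regression(fst, dist))
```

Permuting whole populations rather than individual matrix cells preserves the non-independence among pairwise distances, which is why significance in this kind of analysis is assessed by randomization instead of a standard regression test.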

GENETIC DIVERSITY AND PAST DEMOGRAPHIC HISTORY

We employed only neutral loci for calculating genetic diversity statistics and performing demographic inference analyses (Luikart et al., 2003). We used the program populations from STACKS to calculate several genetic statistics, including nucleotide diversity (π), observed (HO) and expected (HE) heterozygosity, major allele frequency (P), and Wright’s inbreeding coefficient (FIS) (Catchen et al., 2013). Standardized multilocus heterozygosity (sMLH) was calculated for each individual using the R package INBREEDR (Stoffel et al., 2016). sMLH is an individual-based metric defined as the total number of heterozygous loci in an individual divided by the sum of average observed heterozygosities in the population, over the subset of loci successfully typed in the focal individual (Coltman et al., 1999).
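The definition of sMLH given above translates directly into code. The following Python sketch is an illustration of that definition, not the INBREEDR implementation; the genotype coding (1 = heterozygous, 0 = homozygous, None = missing) and the function name are assumptions made for the example.

```python
def smlh(genotypes):
    """genotypes[i][l] is 1 (heterozygous), 0 (homozygous) or None (missing)
    for individual i at locus l. Returns one sMLH value per individual."""
    n_loci = len(genotypes[0])
    # Average observed heterozygosity per locus across typed individuals.
    locus_het = []
    for l in range(n_loci):
        typed = [g[l] for g in genotypes if g[l] is not None]
        locus_het.append(sum(typed) / len(typed) if typed else 0.0)
    values = []
    for g in genotypes:
        # Only loci successfully typed in the focal individual are used.
        typed_loci = [l for l in range(n_loci) if g[l] is not None]
        n_het = sum(g[l] for l in typed_loci)
        expected = sum(locus_het[l] for l in typed_loci)
        values.append(n_het / expected if expected > 0 else float("nan"))
    return values


# Three hypothetical individuals typed at four loci.
example = [
    [1, 0, 1, None],
    [0, 0, 1, 1],
    [1, 1, None, 0],
]
print(smlh(example))
```

Dividing by the heterozygosity expected at the loci typed in each individual makes values comparable among individuals that differ in which loci were successfully genotyped.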

We used STAIRWAY PLOT (Liu & Fu, 2015) to reconstruct the demographic history of the studied populations. This model-flexible method is based on the site frequency spectrum (SFS) and does not require whole-genome sequence data or reference genome information to infer changes in effective population size (Ne) over time. These analyses were restricted to populations with eight genotyped individuals (see Fig. 1 and Table S1), as the calculation of the SFS requires a down-sampling procedure to remove missing data. To compute the SFS for each population, we ran the program populations from STACKS (Catchen et al., 2013) to export the first SNP per RAD locus and retain loci with a minimum stack depth ≥ 5 (m = 5) that were represented in at least 50% of the individuals of the focal population (r = 0.5). To remove all missing data for the calculation of the SFS and minimize errors in allele frequency estimates, each population was down-sampled to 6 individuals using a custom Python script written by Qixin He and available on Dryad (Papadopoulou & Knowles, 2015).
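The down-sampling idea can be illustrated with a minimal Python sketch. It is not the script by Qixin He cited above. It assumes diploid genotypes coded as alternate-allele counts, a folded (minor-allele) spectrum, and the exclusion of SNPs with fewer than six called individuals; the text does not state whether the SFS was folded, and all names are hypothetical.

```python
import random


def folded_sfs(snps, n_keep=6, seed=1):
    """snps holds one list per SNP with the alternate-allele count (0, 1 or 2)
    of each individual, or None when the genotype is missing.

    Returns the number of SNPs in minor-allele classes 0 .. n_keep."""
    rng = random.Random(seed)
    n_chrom = 2 * n_keep
    sfs = [0] * (n_keep + 1)
    for genotypes in snps:
        called = [g for g in genotypes if g is not None]
        if len(called) < n_keep:
            continue  # too much missing data to down-sample this SNP
        # Draw n_keep individuals without replacement and count alternate alleles.
        alt = sum(rng.sample(called, n_keep))
        # Fold the spectrum: record the count of the rarer allele.
        sfs[min(alt, n_chrom - alt)] += 1
    return sfs


# Four hypothetical SNPs genotyped in eight individuals.
example = [
    [1, 0, 0, 0, 0, 0, None, None],
    [2, 2, 2, 2, 2, 1, None, None],
    [1, 1, 1, 1, 1, 1, None, None],
    [0, None, None, None, 0, 0, 0, 0],
]
print(folded_sfs(example))
```

Down-sampling every SNP to the same number of individuals yields a spectrum with a fixed number of chromosomes (12 here) and no missing data, at the cost of discarding SNPs typed in too few individuals.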

We ran STAIRWAY PLOT for each population fitting a flexible multi-epoch demographic model, assuming the mutation rate per site per generation of 2.8 × 10⁻⁹ estimated for Drosophila melanogaster (Keightley et al., 2014), a one-year generation time, four different numbers of random breakpoints [(nseq-2)/4, (nseq-2)/2, (nseq-2)*3/4, and nseq-2], and 200 bootstrap replicates to estimate 95% confidence intervals.

RESULTS

GENOMIC DATA ANALYSES

Illumina sequencing of ddRAD libraries generated >358 million reads in total after an initial quality filtering step using the program process_radtags. The number of reads per individual before and after the different quality filtering steps is shown in Fig. S1. Only one sample, from population BONI, was excluded from subsequent analyses due to its low number of reads (Fig. S1).

The dataset obtained with STACKS for all populations contained a total of 49,373 unlinked SNPs.

OUTLIER LOCI DETECTION AND ENVIRONMENTAL ASSOCIATION ANALYSES

For the dataset including all populations, we identified 318 (0.64%) outlier loci using ARLEQUIN (FDIST method) and 503 (1.02%) using BAYESCAN (Fig. 2a,b). When the analyses were restricted to Iberian populations, we identified 93 (0.18%) outlier loci using ARLEQUIN and 270 (0.52%) using BAYESCAN (Fig. 2a,b). Several SNPs were identified as outliers by both ARLEQUIN and BAYESCAN analyses (Fig. 2a,b). Most outlier loci were identified to be under divergent selection in analyses based on both the dataset including all populations (ARLEQUIN: n = 275, 86.47%; BAYESCAN: n = 467, 92.84%) and the one restricted to Iberian populations (ARLEQUIN: n = 76, 81.72%; BAYESCAN: n = 269, 99.63%). Environmental association analyses in LFMM showed that a high number of loci were significantly associated with environmental variation (Fig. 2c,d). For the dataset including all populations, LFMM detected 7,710 unique loci (15.62%) associated with at least one PC of environmental variation (PC1: 2,932 loci; PC2: 4,051 loci; PC3: 2,797 loci; Fig. S2), of which 196 (2.54%) showed significant associations with all three PCs (Fig. 2c). Similarly, when the analyses were restricted to Iberian populations, LFMM detected 6,104 unique loci (11.87%) associated with at least one PC of environmental variation (PC1: 2,713 loci; PC2: 3,081 loci; PC3: 2,982 loci), of which 363 (5.95%) were shared across all PCs (Fig. 2d). Only 42 loci for analyses based on all populations and 23 loci for analyses focused on Iberian populations showed associations with environmental variation in LFMM and were also identified as FST outliers by ARLEQUIN and BAYESCAN analyses (Fig. 2a,b).

Figure 2 Venn diagrams showing the overlap of number of loci identified to be under selection by ARLEQUIN (FDIST method) (red circles) and BAYESCAN (blue circles) and presenting environmental associations according to LFMM (yellow circles) for analyses based on (A) all populations and (B) restricted to populations from the Iberian Peninsula. Panels (C-D) show the number of loci presenting associations with environmental PC1 (green circles), PC2 (orange circles) and PC3 (purple circles) according to LFMM analyses based on (C) all populations and (D) restricted to populations from the Iberian Peninsula.

GENETIC STRUCTURE

Genetic structure at neutral loci. STRUCTURE analyses for all populations based on a random subset of 10,000 unlinked neutral SNPs identified K = 2 as the most likely clustering solution according to the ΔK criterion (Fig. S3). These two clusters show a very low degree of genetic admixture and split Canarian and Iberian populations (Fig. 3c). These results are in good agreement with those obtained from PCA, which showed a clear separation of Iberian and Canarian populations (Fig. 4a). STRUCTURE analyses for K = 3 showed that one population from the Iberian Peninsula (BERZ) split from the rest of the mainland populations, albeit with a high degree of genetic admixture with the other Iberian populations (~10-20%; Fig. 3c).

Figure 3 (A-B) Genetic diversity and (C) results of STRUCTURE analyses for the studied populations of Moroccan locust based on neutral loci. Panel (A) shows nucleotide diversity (π) for each population calculated in STACKS for all positions (polymorphic and non-polymorphic). Panel (B) shows standardized multilocus heterozygosity (sMLH) of each individual, with values below the 10th percentile shown in red. (C) Bar plots of STRUCTURE analyses show each individual’s probabilities of membership to each inferred genetic cluster for different K values. Each individual is represented by a vertical bar, which is partitioned into K coloured segments showing the individual’s probability of belonging to the cluster with that colour. Thin vertical black lines separate individuals from different populations. STRUCTURE analyses were performed at different hierarchical levels, analysing all populations together and Iberian (brownish) and Canarian (greenish) populations separately. Population codes are described in Table S1.

STRUCTURE analyses restricted to Iberian populations and based on a random subset of 10,000 unlinked neutral SNPs identified K = 2 as the most likely number of clusters (Fig. 3c and Fig. S3). However, all populations and individuals presented the same proportion of ancestry to the two inferred clusters (~20/80), indicating that these represent ghost clusters with no biological significance (see Guillot et al., 2005; Chen et al., 2007). STRUCTURE analyses for K = 3 and K = 4 showed that BERZ and HOYO were assigned to two different genetic groups, albeit with some degree of genetic admixture with the rest of the Iberian populations (Fig. 3c). PCA for the Iberian subset of populations showed that HOYO was the most differentiated population and that some individuals from two other populations (ALHA and ALPU) also stood out from the rest (Fig. 4b).

Figure 4 Principal component analyses (PCA) of genetic variation for populations of Moroccan locust based on (A, B) neutral loci and (C, D) loci identified to be under divergent selection by ARLEQUIN (FDIST method) and BAYESCAN. PCAs were performed for (A, C) all populations and (B, D) only considering populations from the Iberian Peninsula. Dashed brownish and greenish ellipses encircle Iberian and Canarian populations, respectively. The number of loci used in the different analyses is indicated in each panel. Population codes are described in Table S1.

Finally, STRUCTURE analyses restricted to populations from the Canary Islands showed K = 2 as the most likely clustering solution. These two clusters separated HIER and TENE populations and showed a very low degree of genetic admixture (Fig. 3c).

Genetic structure at loci under divergent selection. STRUCTURE analyses based on loci under divergent selection from the dataset including all populations (694 SNPs) identified K = 2 as the best supported number of clusters (Fig. S3b). As shown for analyses based on neutral loci, these two clusters separated Canarian and Iberian populations. Likewise, STRUCTURE analyses for K = 3 showed that the Iberian population BERZ split from the rest of the populations (Fig. 5a). These results are in full agreement with those obtained from a PCA (Fig. 4c). STRUCTURE analyses based on outlier loci identified to be under divergent selection from the dataset restricted to the Iberian Peninsula (311 SNPs) also showed K = 2 as the most likely clustering solution (Fig. S3d) and separated BERZ from the rest of the populations (Fig. 5b). STRUCTURE analyses for K = 3 and K = 4 revealed further genetic substructure, with one cluster mostly represented in the ALPU population and another grouping the ALHA, ALCU, BELA, and ESPA populations (Fig. 5b). A PCA restricted to Iberian populations showed a clear separation between BERZ and the rest of the populations (Fig. 4d). Figure 6 maps population allele frequencies for some outlier loci identified to be under divergent selection by both ARLEQUIN and BAYESCAN.

GEOGRAPHICAL AND ENVIRONMENTAL DRIVERS OF GENETIC DIFFERENTIATION

Multiple matrix regression with randomization (MMRR) for all populations and datasets of either neutral loci or loci under diversifying selection showed that the only variable significantly associated with genetic differentiation (FST) and retained in the final models was geographic distance (Table 1). MMRR analyses restricted to Iberian populations showed that no variable was retained in the final models, indicating that the association between genetic differentiation and geographic distance obtained across all populations was mostly driven by the highly differentiated and geographically isolated Canarian populations (Table 1).

Figure 5 STRUCTURE analyses for populations of Moroccan locust based on loci identified to be under divergent selection by ARLEQUIN (FDIST method) and BAYESCAN for (A) all populations (694 SNPs) and (B) only populations from the Iberian Peninsula (311 SNPs). Bar plots show the individual’s probabilities of membership to each inferred genetic cluster for different K values. Each individual is represented by a vertical bar, which is partitioned into k coloured segments showing the individual’s probability of belonging to the cluster with that colour. Thin vertical black lines separate individuals from different populations. Population codes are described in Table S1.

Table 1 Multiple matrix regressions with randomization (MMRR) for genetic differentiation (FST) in relation to geographical and environmental distances between each pair of populations. Environmental distances (PC1, PC2 and PC3) were estimated from a principal component analysis (PCA) on the 19 bioclimatic variables from the WORLDCLIM dataset. Analyses were performed for estimates of genetic differentiation based on neutral loci and loci identified to be under divergent selection by ARLEQUIN (FDIST method) and BAYESCAN and considering (A) all populations or (B) only populations from the Iberian Peninsula.

(A) All populations
                          Neutral loci                    Loci under divergent selection
                          β         t         P           β         t         P
Explanatory terms
  Constant                0.145     3.702     0.031       0.043     1.837     0.009
  Geographic distance     0.416     10.675    0.006       0.888     38.162    0.009
Rejected terms
  Environmental PC1                 -2.164    0.244                 -1.806    0.323
  Environmental PC2                 0.317     0.852                 1.379     0.422
  Environmental PC3                 0.315     0.885                 0.592     0.809

(B) Iberian populations
                          Neutral loci                    Loci under divergent selection
                          β         t         P           β         t         P
Rejected terms
  Geographic distance     -0.169    -3.510    0.055       -0.062    -0.966    0.625
  Environmental PC1       -0.098    -2.085    0.306       -0.022    -0.357    0.880
  Environmental PC2       -0.077    -1.630    0.299       -0.066    -1.068    0.473
  Environmental PC3       0.023     0.484     0.805       0.104     1.688     0.373

GENETIC DIVERSITY AND PAST DEMOGRAPHIC HISTORY

Population genetic statistics (P, HO, HE, π, FIS) calculated in STACKS for all positions (polymorphic and non-polymorphic) are presented in Table S1. Population genetic diversity (π) and standardized multilocus heterozygosity (sMLH) for each individual are shown in Fig. 3a,b. All estimates of population genetic diversity were significantly lower in Canarian than in Iberian populations (one-way ANOVAs; all Ps < 0.001; Table S1). STAIRWAY PLOT analyses showed no clear demographic differences between outbreak and non-outbreak populations, so we grouped the demographic profiles inferred for the different populations according to the two main geographical regions: Iberian Peninsula and Canary Islands. All Iberian populations showed an ancestral population increase around 150 ka BP, a severe demographic bottleneck around the last glacial maximum (LGM; ~21 ka BP) with a reduction in Ne of ~80% (from ~250,000 to ~50,000 diploid individuals), and an abrupt expansion at the onset of the Holocene to reach contemporary population sizes of ~450,000 individuals (Fig. 1b, Fig. S4). The two Canarian populations presented a demographic profile similar to that of the Iberian ones, although with a less pronounced decline in Ne during the last glacial period (Fig. 1c). Canarian populations showed a long period of stability in the past (200-100 ka BP) followed by a demographic bottleneck during the last glacial period (100-21 ka BP) and a subsequent expansion at the onset of the Holocene to reach contemporary effective population sizes of 220,000-350,000 diploid individuals (Fig. 1c; Fig. S4).

Figure 6 Allele frequency distributions for four unlinked SNPs identified to be under diversifying selection by both ARLEQUIN (FDIST method) and BAYESCAN for the studied populations of Moroccan locust. Red indicates the minor allele frequency in all cases. SNP codes are presented in each panel.

DISCUSSION

We employed a large single-nucleotide polymorphism dataset (~50,000 loci) to infer historical demographic trends and understand the neutral and selective processes shaping spatial patterns of genomic variation in the Moroccan locust, a grasshopper that has traditionally emerged as a devastating pest in overgrazed areas and caused extensive agricultural damage in Spain (e.g. Buj-Buj, 1992) and many other areas across its ample distribution range (Latchininsky, 1998). Our analyses, focused on the Iberian Peninsula and remote populations from the Canary Islands, indicated that the species is characterized by widespread gene flow and low levels of genetic differentiation at regional scales. They also showed few differences in past demography and in levels of genetic diversity and differentiation between traditionally outbreak and non-outbreak populations. Finally, genome scans revealed that a small fraction (~0.2-1.0%) of the sequenced loci are under divergent selection and might be involved in local adaptation processes in response to the different ecological and environmental gradients experienced by the species.

Population genetic structure at different spatial scales

Bayesian clustering and principal component analyses based on neutral loci showed the presence of two well-defined genetic groups corresponding to Iberian and Canarian populations. The large geographical distance between these two regions (1,700 km), their separation by oceanic water barriers to dispersal, and genetic drift in the small and highly isolated populations from the Canary Islands are probably the main drivers of the observed patterns of genetic structure (e.g. Chapuis et al., 2009). Consistent with this last point, Canarian populations presented lower levels of genetic diversity than most Iberian populations (Fig. 3a; Table S1) and STRUCTURE analyses showed that genetic drift after divergence for the cluster corresponding to the Canary Islands (F-value = 0.419) was more than threefold that estimated for Iberian populations (F-value = 0.121) (Pritchard et al., 2000). Hierarchical STRUCTURE analyses for the two main genetic clusters showed that Canarian populations split into two very well defined genetic clusters corresponding to the islands of Tenerife and El Hierro, whereas continental populations from the Iberian Peninsula show a much shallower genetic structure and considerable admixture (Fig. 3c). Levels of genetic differentiation among Iberian populations of D. maroccanus (mean FST = 0.067; range = 0.051-0.102) were much lower than those reported for other Iberian cross-backed grasshoppers (genus Dociostaurus) with narrower distributions such as D. crassiusculus (mean FST = 0.129; range = 0.033-0.237; González-Serna et al., 2018) and D. hispanicus (mean FST = 0.189; range = 0.082-0.307; the authors, unpublished data).

Although STRUCTURE analyses for Iberian populations identified K = 2 as the most likely number of clusters, this result has no biological meaning, as it showed two clusters with almost identical proportions of population membership for all individuals and populations (i.e. “ghost clusters” sensu Guillot et al., 2005). STRUCTURE analyses for K = 3 and K = 4 showed the presence of two genetic clusters mostly corresponding to the non-outbreaking and relatively isolated populations of BERZ and HOYO (Fig. 3c). The genetic clusters of these two populations had much higher estimates of genetic drift after divergence (BERZ: F-value = 0.263; HOYO: F-value = 0.084) than the one representing the rest of the Iberian populations (F-value = 0.021), which suggests that their relatively small sizes and/or low connectivity with other populations are probably the main causes underlying their stronger genetic differentiation. Accordingly, one of these populations (BERZ) had the lowest levels of genetic diversity among all Iberian populations (Table S1). Apart from these two exceptions, we did not find differences in spatial patterns of genetic variation between traditionally outbreak and non-outbreak populations.

Overall, these results are in agreement with previous studies on other locust taxa finding very low levels of genetic differentiation across large spatial scales (Chapuis et al., 2008; Chapuis et al., 2011a), similar to that reported for some pelagic marine species (e.g. Hoarau et al., 2002; Als et al., 2011), and occasional genetic drift and differentiation in non-outbreaking, solitarious or phase transition populations persisting at very low densities in isolated pockets of suitable habitats (Ibrahim et al., 2000; Chapuis et al., 2014).

Past demographic history

Demographic reconstructions for the Moroccan locust using STAIRWAY PLOT revealed the presence of a remarkable genetic bottleneck during the last glacial maximum (~21 ka BP) followed by an abrupt expansion at the onset of the Holocene (Fig. 1). Changes in effective population size (Ne) through time did not qualitatively differ among populations, indicating that all of them have experienced parallel demographic histories. Demographic reconstructions for the recent past did not reveal more pronounced expansions in outbreak populations or population bottlenecks in solitarious or phase transition populations. It is also noteworthy that Canarian and Iberian populations presented similar demographic profiles, with two main differences: Canarian populations consistently sustained lower effective population sizes over time and experienced less marked demographic declines during the last glacial period than Iberian populations. These results are congruent with the comparatively lower levels of genetic diversity of Canarian populations (Table S1) and compatible with the less severe impact of Pleistocene climatic oscillations at lower latitudes (Hewitt, 1999; Fernández-Palacios et al., 2016; Snyder, 2016).

The inferred demographic trends are also in agreement with fossil records of egg pods of D. maroccanus in sediments from the Canary Islands (Meco et al., 2010; Meco et al., 2011). Palaeontological evidence indicates that the abundance of the species has dramatically oscillated since the end of the Pliocene (3 Ma), with population peaks matching with warm interglacial periods of the Middle and Late Pleistocene (Meco et al., 2010; Meco et al., 2011; Snyder, 2016).

The overall good correspondence between peaks of population size and warm periods inferred from both fossil records (Meco et al., 2010; Meco et al., 2011) and genomic data (present study) can be explained by the thermophilic nature of the species and the fact that its development and distribution are limited by low ambient temperatures (Benlloch, 1947; Arias-Giralda et al., 1997; Quesada-Moraga & Santiago-Álvarez, 2000; Aragón & Lobo, 2012; Aragón et al., 2013).

Loci under selection and potential for local adaptation

Genome scans based on FST outlier tests revealed that a small portion of the sampled genome (between 0.2 and 1.0%, depending on the dataset and method) is under selection, whereas environmental association analyses (EAA) identified a much larger proportion of SNPs (12-15%) with allele frequencies correlated with one or more environmental gradients. Beyond these differences in numbers, the specific loci identified by the two methods showed little overlap (Fig. 2). Our results are in agreement with previous RADseq-based studies (e.g. Guo et al., 2016; Dudaniec et al., 2018) that interpreted such differences as a consequence of the contrasting sensitivities of each approach to the effects of genetic drift and structure and of the better performance of EAA methods in detecting weak or polygenic signatures of selection (de Villemereuil et al., 2014; Frichot & François, 2015). A BLAST search aimed at identifying candidate genes associated with outlier SNPs yielded no significant alignment with sequences available in the NCBI database (see also Jeffery et al., 2018). This might be explained by the very scarce genomic resources available for grasshoppers, with only one draft genome sequenced so far, for a species (Locusta migratoria) (Wang et al., 2014) belonging to a different subfamily than the Moroccan locust (Cigliano et al., 2019). Also, the signals of selection at certain loci could be a consequence of genetic hitchhiking resulting from linkage disequilibrium between the identified outlier SNPs and nearby non-sequenced genes actually subjected to diversifying selection (Smith & Haigh, 1974). This is particularly relevant considering the extraordinarily large genomes that characterize grasshoppers, which means that we have probably sampled only a relatively small fraction of the genome (Wang et al., 2014; Camacho et al., 2015).

As expected, Bayesian clustering analyses focused on the subset of outlier loci identified to be under divergent selection showed a stronger genetic structure than that yielded by analyses based on neutral loci (see also Tables S3-S4 for FST values). These analyses separated Iberian populations into a total of four relatively well-defined clusters with no clear geographical correspondence (Fig. 5). This indicates that, despite the high levels of gene flow, certain populations from different regions experience similar selective regimes that might have resulted in convergent processes of local adaptation. Once again, such genetic clusters were not restricted to non-outbreak or solitary populations and often involved outbreak populations (e.g. ALPU and ALHA), suggesting that selection at certain regions of the genome and potential local adaptation processes are not impeded by widespread gene flow (Guo et al., 2016; Bernatchez et al., 2018). Isolation-by-environment (IBE) analyses showed that genetic differentiation among populations at outlier SNPs was not explained by environmental distances, suggesting that other unmeasured selective forces (e.g. predators: Arias et al., 1994; Barranco et al., 2000; host-plant use: Kokanova, 2014; parasites: Valverde-García et al., 2018) are probably responsible for the observed spatial patterns of variation at loci subjected to divergent selection (Guo et al., 2016).
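To make the logic of an IBE test concrete, the following is a minimal Python sketch of a Mantel-style permutation test between a genetic and an environmental distance matrix. It is an illustration under stated assumptions, not the procedure used in this study: the function names (`upper_triangle`, `pearson`, `mantel_test`), the number of permutations and the one-tailed test are all hypothetical choices.

```python
import math
import random


def upper_triangle(matrix):
    """Return the off-diagonal upper-triangle values of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_test(genetic, environmental, n_perm=999, seed=1):
    """One-tailed Mantel-style test (hypothetical helper).

    Populations (rows and columns together) of the environmental matrix are
    permuted to build the null distribution of the matrix correlation.
    """
    rng = random.Random(seed)
    n = len(genetic)
    gen_values = upper_triangle(genetic)
    r_obs = pearson(gen_values, upper_triangle(environmental))
    order = list(range(n))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        permuted = [[environmental[order[i]][order[j]] for j in range(n)]
                    for i in range(n)]
        if pearson(gen_values, upper_triangle(permuted)) >= r_obs:
            hits += 1
    # The observed value is counted as one of the permutations.
    return r_obs, (hits + 1) / (n_perm + 1)
```

Under this sketch, a non-significant result at outlier loci would correspond to the pattern reported above, in which environmental distances do not explain genetic differentiation.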

Conclusions and prospects

Overall, our study shows for the first time that populations of the economically important Moroccan locust present shallow genetic differentiation and few differences in past demography, signatures of selection and contemporary levels of genetic diversity and structure between outbreak and non-outbreak populations. Outbreaks of Moroccan locust are usually linked to considerable cattle densities and overgrazing (Louveaux et al., 1996; Latchininsky, 1998) and it has been suggested that their frequency might increase in the future, favoured by global warming (Aragón et al., 2013). Although genetically non-differentiated populations have often been interpreted as a single panmictic unit (e.g. Hoarau et al., 2002; Schrey et al., 2008), the homogenizing effects of gene flow do not necessarily indicate demographic dependence or synchrony among populations (Chapuis et al., 2011a; Chapuis et al., 2011b) and the evidence of selective processes inferred in this study suggests that some populations might experience idiosyncratic evolutionary dynamics (e.g. Pujolar et al., 2014; Guo et al., 2016). Future longitudinal and functional genomic studies could help to identify the proximate factors and specific genes underlying the observed signatures of selection (e.g. Wang et al., 2014; Bakkali & Martín-Blázquez, 2018) and determine whether these are temporally stable or if, on the contrary, they are ephemeral and restricted to one or a few generations due to the homogenizing effects of gene flow (Pujolar et al., 2014; Laporte et al., 2016; Babin et al., 2017). Pest managers should consider the high connectivity among populations at local and regional scales, the possible existence of local adaptations and the probable future range expansions that might increase outbreak intensity in response to ongoing climate warming and land use alterations (Benfekih et al., 2002). This information, together with available predictive maps of outbreak favourability (Aragón et al., 2013), would be of great help to develop integrated management practices aimed at reducing the negative impacts of this locust species on agriculture (Lankau et al., 2011; Abrol, 2014).

ACKNOWLEDGEMENTS

We thank Mercedes París and Vicenta Llorente for their unconditional support during our visits to the entomological collections of the MNCN, José Miguel Aparicio for his help during sampling in the Canary Islands, María del Milagro Coca-Abia for providing us with samples from Zaragoza populations, Carlos Muñoz-Alcón, Antonio González, Benito Ortiz and Manolo Pérez for taking us to some sampling locations in Salamanca and the Canary Islands, and Anna Papadopoulou for her great support with genomic data analyses. We also wish to thank the Centro de Supercomputación de Galicia (CESGA) and Doñana's Singular Scientific-Technical Infrastructure (ICTS-RBD) for access to and use of computing resources. The respective administrative authorities from each study area (Andalucía, Aragón, Castilla-La Mancha, Castilla y León, Cataluña, El Hierro, Extremadura, La Rioja, Madrid, Melilla, Murcia, Navarra and Tenerife) provided us with the corresponding permits for sampling. MGS was supported by a pre-doctoral scholarship from Junta de Comunidades de Castilla-La Mancha and European Social Fund. This work received financial support from research grants CGL2011-25053, CGL2014-54671-P, CGL2016-80742-R and CGL2017-83433-P (co-funded by the Dirección General de Investigación y Gestión del Plan Nacional I+D+i and European Social Fund) and PEII-2014-023-P (co-funded by Junta de Comunidades de Castilla-La Mancha and European Social Fund).

REFERENCES

Abrol, D.P. (editor) (2014) Integrated pest management: Current concepts and ecological perspective. 576 pp. Academic Press. San Diego, USA.

Alberola-Roma, A. (2012) Plagas de langosta y clima en la España del siglo XVIII. Relaciones, 129, 21-50.

Als, T.D., Hansen, M.M., Maes, G.E. et al. (2011) All roads lead to home: Panmixia of European eel in the Sargasso sea. Molecular Ecology, 20, 1333-1346.

Aragón, P., Coca-Abia, M.M., Llorente, V. & Lobo, J.M. (2013) Estimation of climatic favourable areas for locust outbreaks in Spain: Integrating species' presence records and spatial information on outbreaks. Journal of Applied Entomology, 137, 610-623.

Aragón, P. & Lobo, J.M. (2012) Predicted effect of climate change on the invasibility and distribution of the western corn root-worm. Agricultural and Forest Entomology, 14, 13-18.

Arias-Giralda, A., Jiménez-Viñuelas, J. & Pérez-Romero, A. (1997) Observaciones sobre el desarrollo embrionario y el avivamiento de Dociostaurus maroccanus (Thunb) en una finca de "La Serena" (Extremadura). Boletín de Sanidad Vegetal. Plagas, 23, 113-132.

Arias-Giralda, A., Morales-Agacino, E., Cobos-Suárez, J.M. et al. (1993) La langosta mediterránea Dociostaurus maroccanus (Thunberg). Boletín de Sanidad Vegetal. Plagas, 19, 1001-1011.

Arias, A., Sánchez, M., Jiménez, J. et al. (1994) Distribución en el suelo de las ootecas de Dociostaurus maroccanus (Thunb.) e importancia de su depredación en dos fincas de Extremadura. Boletín de Sanidad Vegetal. Plagas, 20, 3-22.

Babin, C., Gagnaire, P.A., Pavey, S.A. & Bernatchez, L. (2017) RAD-seq reveals patterns of additive polygenic variation caused by spatially-varying selection in the American eel (Anguilla rostrata). Genome Biology and Evolution, 9, 2974-2986.

Baker, S.R. & Wilkinson, C.F. (editors) (1990) The effect of pesticides on human health. Princeton Scientific Publishing. Princeton, NJ, USA.

Bakkali, M. & Martín-Blázquez, R. (2018) RNA-seq reveals large quantitative differences between the transcriptomes of outbreak and non-outbreak locusts. Scientific Reports, 8, 9207.

Barranco, P., Pascual, F. & Cabello, T. (2000) Oviposition and egg predation in Dociostaurus maroccanus (Thunberg, 1815). (Orthoptera: Acrididae). Boletín de la Asociación Española de Entomología, 24, 161-177.

Beaumont, M.A. & Nichols, R.A. (1996) Evaluating loci for use in the genetic analysis of population structure. Proceedings of the Royal Society of London. Series B: Biological Sciences, 263, 1619-1626. https://doi.org/10.1098/rspb.1996.0237

Bekkevold, D., Gross, R., Arula, T. et al. (2016) Outlier loci detect intraspecific biodiversity amongst spring and autumn spawning herring across local scales. PLOS ONE, 11, e0148499.

Benfekih, L., Chara, B. & Doumandji-Mitche, B. (2002) Influence of anthropogenic impact on the habitats and swarming risks of Dociostaurus maroccanus and Locusta migratoria (Orthoptera, Acrididae) in the Algerian Sahara and the semiarid zone. Journal of Orthoptera Research, 11, 243-250.

Benlloch, M. (1947) Influencia de la humedad y la temperatura sobre la vitalidad y desarrollo de los huevos de langosta. Boletín de Patología Vegetal y Entomología Agrícola, 15, 271-274.

Berdan, E.L., Mazzoni, C.J., Waurick, I. et al. (2015) A population genomic scan in Chorthippus grasshoppers unveils previously unknown phenotypic divergence. Molecular Ecology, 24, 3918-3930.

Bernatchez, S., Xuereb, A., Laporte, M. et al. (2018) Seascape genomics of eastern oyster (Crassostrea virginica) along the Atlantic coast of Canada. Evolutionary Applications, in press [doi:10.1111/eva.12741]

Brauer, C.J., Hammer, M.P. & Beheregaray, L.B. (2016) Riverscape genomics of a threatened fish across a hydroclimatically heterogeneous river basin. Molecular Ecology, 25, 5093-5113.

Brown, J.L., Weber, J.J., Alvarado-Serrano, D.F. et al. (2016) Predicting the genetic consequences of future climate change: The power of coupling spatial demography, the coalescent, and historical landscape changes. American Journal of Botany, 103, 153-163.

Buj-Buj, A. (1992) Control de las plagas de langosta y modernización agrícola en la España de la segunda mitad del siglo XIX. Cuadernos críticos de Geografía Humana. Universidad de Barcelona, 95, 1-67.

Camacho, J.P.M., Ruiz-Ruano, F.J., Martín-Blázquez, R. et al. (2015) A step to the gigantic genome of the desert locust: Chromosome sizes and repeated DNAs. Chromosoma, 124, 263-275.

Carson, R. (2002) Silent spring (40th anniversary ed.). Houghton Mifflin. Boston, MA, USA.

Catchen, J.M., Amores, A., Hohenlohe, P.A. et al. (2011) STACKS: Building and genotyping loci de novo from short-read sequences. G3: Genes|Genomes|Genetics, 1, 171-182.

Catchen, J.M., Hohenlohe, P.A., Bassham, S. et al. (2013) STACKS: An analysis tool set for population genomics. Molecular Ecology, 22, 3124-3140.

Chapco, W. & Litzenberger, G. (2004) A DNA investigation into the mysterious disappearance of the rocky mountain grasshopper, mega-pest of the 1800s. Molecular Phylogenetics and Evolution, 30, 810-814.

Chapuis, M.P., Estoup, A., Augé-Sabatier, A. et al. (2008) Genetic variation for parental effects on the propensity to gregarise in Locusta migratoria. BMC Evolutionary Biology, 8, 37.

Chapuis, M.P., Plantamp, C., Blondin, L. et al. (2014) Demographic processes shaping genetic variation of the solitarious phase of the desert locust. Molecular Ecology, 23, 1749-1763.

Chapuis, M.P., Loiseau, A., Michalakis, Y. et al. (2009) Outbreaks, gene flow and effective population size in the migratory locust, Locusta migratoria: A regional-scale comparative survey. Molecular Ecology, 18, 792-800.

Chapuis, M.P., Popple, J.A., Berthier, K. et al. (2011a) Challenges to assessing connectivity between massive populations of the Australian plague locust. Proceedings of the Royal Society of London. Series B: Biological Sciences, 278, 3152-3160.

Chapuis, M.P., Simpson, S.J., Blondin, L. & Sword, G.A. (2011b) Taxa-specific heat shock proteins are over-expressed with crowding in the Australian plague locust. Journal of Insect Physiology, 57, 1562-1567.

Chen, C., Durand, E., Forbes, F. & François, O. (2007) Bayesian clustering algorithms ascertaining spatial population structure: A new computer program and a comparison study. Molecular Ecology Notes, 7, 747-756.

Cigliano, M.M., Braun, H., Eades, D.C. & Otte, D. (2019) Orthoptera Species File. Version 5.0/5.0. [WWW document]. URL http://orthoptera.speciesfile.org

Coltman, D.W., Pilkington, J.G., Smith, J.A. & Pemberton, J.M. (1999) Parasite-mediated selection against inbred soay sheep in a free-living, island population. Evolution, 53, 1259-1267.

Cristofari, R., Liu, X., Bonadonna, F. et al. (2018) Climate-driven range shifts of the king penguin in a fragmented ecosystem. Nature Climate Change, 8, 245-251.

Crossley, M.S., Chen, Y.H., Groves, R.L. & Schoville, S.D. (2017) Landscape genomics of Colorado potato beetle provides evidence of polygenic adaptation to insecticides. Molecular Ecology, 26, 6284-6300.

Crossley, M.S., Rondon, S.I. & Schoville, S.D. (2019) Patterns of genetic differentiation in Colorado potato beetle correlate with contemporary, not historic, potato land cover. Evolutionary Applications, in press [doi:10.1111/eva.12757]

de Villemereuil, P., Frichot, É., Bazin, É. et al. (2014) Genome scan methods against more complex models: When and how much should we trust them? Molecular Ecology, 23, 2006-2019.

del Cañizo, J. & Moreno, V. (1949) Biología y ecología de la langosta mediterránea o marroquí (Dociostaurus maroccanus Thunb.). Boletín de Patología Vegetal y Entomología Agrícola, 17, 209-242.

Dowle, E.J., Bracewell, R.R., Pfrender, M.E. et al. (2017) Reproductive isolation and environmental adaptation shape the phylogeography of mountain pine beetle (Dendroctonus ponderosae). Molecular Ecology, 26, 6071-6084.

Dudaniec, R.Y., Yong, C.J., Lancaster, L.T. et al. (2018) Signatures of local adaptation along environmental gradients in a range-expanding damselfly (Ischnura elegans). Molecular Ecology, 27, 2576-2593.

Earl, D.A. & vonHoldt, B.M. (2012) STRUCTURE HARVESTER: A website and program for visualizing structure output and implementing the Evanno method. Conservation Genetics Resources, 4, 359-361.

el Ghadraoui, L., Petit, D., Picaud, F. & el Yamani, J. (2002) Relationship between labrum sensilla number in the Moroccan locust Dociostaurus maroccanus and the nature of its diet. Journal of Orthoptera Research, 11, 11-18.

Enserink, M. (2004) Can the war on locusts be won? Science, 306, 1880-1882.

Ernst, U.R., van Hiel, M.B., Depuydt, G. et al. (2015) Epigenetics and locust life phase transitions. The Journal of Experimental Biology, 218, 88-99.

Ersts, P.J. (2018) GEOGRAPHIC DISTANCE MATRIX GENERATOR v.1.2.3. American Museum of Natural History, Center for Biodiversity and Conservation. [WWW document]. URL http://biodiversityinformatics.amnh.org

Espindola, A., Pellissier, L., Maiorano, L. et al. (2012) Predicting present and future intra-specific genetic structure through niche hindcasting across 24 millennia. Ecology Letters, 15, 649-657.

Evanno, G., Regnaut, S. & Goudet, J. (2005) Detecting the number of clusters of individuals using the software STRUCTURE: A simulation study. Molecular Ecology, 14, 2611-2620.

Excoffier, L., Hofer, T. & Foll, M. (2009) Detecting loci under selection in a hierarchically structured population. Heredity, 103, 285-298.

Excoffier, L. & Lischer, H.E. (2010) ARLEQUIN suite ver 3.5: A new series of programs to perform population genetics analyses under Linux and Windows. Molecular Ecology Resources, 10, 564-567.

Falush, D., Stephens, M. & Pritchard, J.K. (2003) Inference of population structure using multilocus genotype data: Linked loci and correlated allele frequencies. Genetics, 164, 1567-1587.

Fernández‐Palacios, J.M., Rijsdijk, K.F., Norder, S.J. et al. (2016) Towards a glacial‐sensitive model of island biogeography. Global Ecology and Biogeography, 25, 817-830.

Foll, M. & Gaggiotti, O. (2008) A genome-scan method to identify selected loci appropriate for both dominant and codominant markers: A Bayesian perspective. Genetics, 180, 977-993.

Fordham, D.A., Brook, B.W., Moritz, C. & Nogués-Bravo, D. (2014) Better forecasts of range dynamics using genetic data. Trends in Ecology & Evolution, 29, 436-443.

Frichot, E. & François, O. (2015) LEA: An R package for landscape and ecological association studies. Methods in Ecology and Evolution, 6, 925-929.

Frichot, E., Mathieu, F., Trouillon, T. et al. (2014) Fast and efficient estimation of individual ancestry coefficients. Genetics, 196, 973-983.

Frichot, E., Schoville, S.D., Bouchard, G. & François, O. (2013) Testing for associations between loci and environmental gradients using Latent Factor Mixed Models. Molecular Biology and Evolution, 30, 1687-1699.

Gassmann, A.J., Onstad, D.W. & Pittendrigh, B.R. (2009) Evolutionary analysis of herbivorous insects in natural and agricultural environments. Pest Management Science, 65, 1174-1181.

González-Serna, M.J., Cordero, P.J. & Ortego, J. (2018) Using high-throughput sequencing to investigate the factors structuring genomic variation of a Mediterranean grasshopper of great conservation concern. Scientific Reports, 8, 13436.

Guerrero, A., Ramos, V.E., López, S. et al. (2019) Enantioselective synthesis and activity of all diastereoisomers of (e)-phytal, a pheromone component of the Moroccan locust, Dociostaurus maroccanus. Journal of Agricultural and Food Chemistry, 67, 72-80.

Guillot, G., Estoup, A., Mortier, F. & Cosson, J.F. (2005) A spatial statistical model for landscape genetics. Genetics, 170, 1261-1280.

Guo, B., Li, Z. & Merilä, J. (2016) Population genomic evidence for adaptive differentiation in the Baltic sea herring. Molecular Ecology, 25, 2833-2852.

Hemmer-Hansen, J., Therkildsen, N.O. & Pujolar, J.M. (2014) Population genomics of marine fishes: Next-generation prospects and challenges. The Biological Bulletin, 227, 117-132.

Hewitt, G.M. (1999) Post-glacial re-colonization of European biota. Biological Journal of the Linnean Society, 68, 87-112.

Hijmans, R.J., Cameron, S.E., Parra, J.L. et al. (2005) Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology, 25, 1965-1978.

Hoarau, G., Rijnsdorp, A.D., Van Der Veer, H.W. et al. (2002) Population structure of plaice (Pleuronectes platessa L.) in northern Europe: Microsatellites revealed large-scale spatial and temporal homogeneity. Molecular Ecology, 11, 1165-1176.

Hohenlohe, P.A., Bassham, S., Etter, P.D. et al. (2010) Population genomics of parallel adaptation in threespine stickleback using sequenced RAD tags. PLOS Genetics, 6, e1000862.

Hubisz, M.J., Falush, D., Stephens, M. & Pritchard, J.K. (2009) Inferring weak population structure with the assistance of sample group information. Molecular Ecology Resources, 9, 1322-1332.

Ibrahim, K.M. (2001) Plague dynamics and population genetics of the desert locust: Can turnover during recession maintain population genetic structure? Molecular Ecology, 10, 581-591.

Ibrahim, K.M., Sourrouille, P. & Hewitt, G.M. (2000) Are recession populations of the desert locust (Schistocerca gregaria) remnants of past swarms? Molecular Ecology, 9, 783-791.

Jakobsson, M. & Rosenberg, N.A. (2007) CLUMPP: A cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure. Bioinformatics, 23, 1801-1806.

Jeffery, N.W., Bradbury, I.R., Stanley, R.R.E. et al. (2018) Genomewide evidence of environmentally mediated secondary contact of European green crab (Carcinus maenas) lineages in eastern North America. Evolutionary Applications, 11, 869-882.

Jombart, T. (2008) ADEGENET: A R package for the multivariate analysis of genetic markers. Bioinformatics, 24, 1403-1405.

Jones, C.M., Papanicolaou, A., Mironidis, G.K. et al. (2015) Genomewide transcriptional signatures of migratory flight activity in a globally invasive insect pest. Molecular Ecology, 24, 4901-4911.

Karsten, M., Addison, P., Jansen van Vuuren, B. & Terblanche, J.S. (2016) Investigating population differentiation in a major African agricultural pest: Evidence from geometric morphometrics and connectivity suggests high invasion potential. Molecular Ecology, 25, 3019-3032.

Keightley, P.D., Ness, R.W., Halligan, D.L. & Haddrill, P.R. (2014) Estimation of the spontaneous mutation rate per nucleotide site in a Drosophila melanogaster full-sib family. Genetics, 196, 313-320.

Kirk, H., Dorn, S. & Mazzi, D. (2013) Molecular genetics and genomics generate new insights into invertebrate pest invasions. Evolutionary Applications, 6, 842-856.

Kokanova, E.O. (2014) Food links of the Moroccan locust, Dociostaurus maroccanus (Thunberg, 1815) (Orthoptera, Acrididae) in Turkmenistan. Entomologicheskoe Obozrenie, 93, 53-57.

Lankau, R., Jørgensen, P.S., Harris, D.J. & Sih, A. (2011) Incorporating evolutionary principles into environmental management and policy. Evolutionary Applications, 4, 315-325.

Laporte, M., Pavey, S.A., Rougeux, C. et al. (2016) RAD sequencing reveals within-generation polygenic selection in response to anthropogenic organic and metal contamination in north Atlantic eels. Molecular Ecology, 25, 219-237.

Latchininsky, A.V. (1998) Moroccan locust Dociostaurus maroccanus (Thunberg, 1815): A faunistic rarity or an important economic pest? Journal of Insect Conservation, 2, 167-178.

Latchininsky, A.V. (2013) Locusts and remote sensing: a review. Journal of Applied Remote Sensing, 7, 1-19.

Leftwich, P.T., Bolton, M. & Chapman, T. (2016) Evolutionary biology and genetic techniques for insect control. Evolutionary Applications, 9, 212-230.

Lenormand, T. (2002) Gene flow and the limits to natural selection. Trends in Ecology & Evolution, 17, 183-189.

Liu, X. & Fu, Y.-X. (2015) Exploring population size changes using SNP frequency spectra. Nature genetics, 47, 555-559.

Lockwood, J.A. (2004) Locust: The devastating rise and mysterious disappearance of the insect that shaped the American frontier. 320 pp. Basic Books. New York, USA.

Louveaux, A., Mouhim, A., Roux, G. et al. (1996) Effect of pastoral activities upon locust populations in the Siroua Massif (Morocco). Revue d'Écologie: La Terre et La Vie, 51, 139-151.

Luikart, G., England, P.R., Tallmon, D. et al. (2003) The power and promise of population genomics: From genotyping to genome typing. Nature Reviews Genetics, 4, 981.

Martín-Blázquez, R., Chen, B., Kang, L. & Bakkali, M. (2017) Evolution, expression and association of the chemosensory protein genes with the outbreak phase of the two main pest locusts. Scientific Reports, 7, 6653.

Meco, J., Muhs, D.R., Fontugne, M. et al. (2011) Late Pliocene and quaternary Eurasian locust infestations in the Canary archipelago. Lethaia, 44, 440-454.

Meco, J., Petit-Maire, N., Ballester, J. et al. (2010) The Acridian plagues, a new Holocene and Pleistocene palaeoclimatic indicator. Global and Planetary Change, 72, 318-320.

Miles, A., Harding, N.J., Bottà, G. et al. (2017) Genetic diversity of the African malaria vector Anopheles gambiae. Nature, 552, 96.

Ortego, J., Aguirre, M.P., Noguerales, V. & Cordero, P.J. (2015) Consequences of extensive habitat fragmentation in landscape-level patterns of genetic diversity and structure in the Mediterranean esparto grasshopper. Evolutionary Applications, 8, 621-632.

Ortego, J., Gugger, P.F. & Sork, V.L. (2018) Genomic data reveal cryptic lineage diversification and introgression in Californian golden cup oaks (section Protobalanus). New Phytologist, 218, 804-818.

Papadopoulou, A. & Knowles, L.L. (2015) Species-specific responses to island connectivity cycles: Refined models for testing phylogeographic concordance across a Mediterranean Pleistocene aggregate island complex. Molecular Ecology, 24, 4252-4268.

Peterson, B.K., Weber, J.N., Kay, E.H. et al. (2012) Double digest RADseq: An inexpensive method for de novo SNP discovery and genotyping in model and non-model species. PLOS ONE, 7, e37135.

Pritchard, J.K., Stephens, M. & Donnelly, P. (2000) Inference of population structure using multilocus genotype data. Genetics, 155, 945-959.

Pujolar, J.M., Jacobsen, M.W., Als, T.D. et al. (2014) Genome-wide single-generation signatures of local selection in the panmictic European eel. Molecular Ecology, 23, 2514-2528.

Qin, Y.-J., Krosch, M.N., Schutze, M.K. et al. (2018) Population structure of a global agricultural invasive pest, Bactrocera dorsalis (Diptera: Tephritidae). Evolutionary Applications, 11, 1990-2003.

Quesada-Moraga, E. & Santiago-Álvarez, C. (2000) Temperature related effects on embryonic development of the Mediterranean locust, Dociostaurus maroccanus. Physiological Entomology, 25, 191-195.

R Core Team. (2017) R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. [WWW document]. URL https://www.R-project.org/

Rellstab, C., Gugerli, F., Eckert, A.J. et al. (2015) A practical guide to environmental association analysis in landscape genomics. Molecular Ecology, 24, 4348-4370.

Rosenberg, N.A. (2004) DISTRUCT: A program for the graphical display of population structure. Molecular Ecology Notes, 4, 137-138.

Schrey, N.M., Schrey, A.W., Heist, E.J. & Reeve, J.D. (2008) Fine-scale genetic population structure of southern pine beetle (Coleoptera: Curculionidae) in Mississippi forests. Environmental Entomology, 37, 271-276.

Sexton, J.P., Hangartner, S.B. & Hoffmann, A.A. (2014) Genetic isolation by environment or distance: Which pattern of gene flow is most common? Evolution, 68, 1-15.

Shafer, A.B. & Wolf, J.B. (2013) Widespread evidence for incipient ecological speciation: A meta-analysis of isolation-by-ecology. Ecology Letters, 16, 940-950.

Sherpa, S., Rioux, D., Goindin, D. et al. (2018) At the origin of a worldwide invasion: Unraveling the genetic makeup of the Caribbean bridgehead populations of the dengue vector Aedes aegypti. Genome Biology and Evolution, 10, 56-71.

Simon, J.-C., d’Alençon, E., Guy, E. et al. (2015) Genomics of adaptation to host-plants in herbivorous insects. Briefings in Functional Genomics, 14, 413-423.

Skaf, R., Popov, G.B., Roffey, J. et al. (1990) The desert locust: An international challenge [and discussion]. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 328, 525-538.

Smith, J.M. & Haigh, J. (1974) The hitch-hiking effect of a favourable gene. Genetical Research, 23, 23-35.

Snyder, C.W. (2016) Evolution of global temperature over the past two million years. Nature, 538, 226-228.

Soria-Carrasco, V., Gompert, Z., Comeault, A.A. et al. (2014) Stick insect genomes reveal natural selection's role in parallel speciation. Science, 344, 738-742.

Stoffel, M.A., Esser, M., Kardos, M. et al. (2016) INBREEDR: An R package for the analysis of inbreeding based on genetic markers. Methods in Ecology and Evolution, 7, 1331-1339.

Uvarov, B. (1977) Grasshoppers and locusts: A handbook of general Acridology (vol. 2): behaviour, ecology, biogeography, population dynamics. 613 pp. Centre for Overseas Pest Research. London, UK.

Valverde-García, P., Santiago-Álvarez, C., Thomas, M.B. et al. (2018) Comparative effects of temperature and thermoregulation on candidate strains of entomopathogenic fungi for Moroccan locust Dociostaurus maroccanus control. BioControl, 63, 819-831.

Vasemägi, A. & Primmer, C.R. (2005) Challenges for identifying functionally important genetic variation: The promise of combining complementary research strategies. Molecular Ecology, 14, 3623-3642.

Venkatesan, M. & Rasgon, J.L. (2010) Population genetic data suggest a role for mosquito-mediated dispersal of west Nile virus across the western United States. Molecular Ecology, 19, 1573-1584.

Wang, I.J. (2013) Examining the full effects of landscape heterogeneity on spatial genetic variation: A multiple matrix regression approach for quantifying geographic and ecological isolation. Evolution, 67, 3403-3411.

Wang, X., Fang, X., Yang, P. et al. (2014) The locust genome provides insight into swarm formation and long-distance flight. Nature Communications, 5, 2957.

Zepeda-Paulo, F.A., Simón, J.C., Ramírez, C.C. et al. (2010) The invasion route for an insect pest species: The tobacco aphid in the new world. Molecular Ecology, 19, 4738-4752.

SUPPORTING INFORMATION

Supplementary Methods

Methods S1 GENOMIC LIBRARY PREPARATION

We used NucleoSpin Tissue kits (Macherey-Nagel, Düren, Germany) to extract and purify genomic DNA from the hind femur of each individual. Genomic DNA was processed into three genomic libraries using the double-digestion restriction-site associated DNA sequencing procedure (ddRADseq) described in Peterson et al. (2012). In brief, DNA was doubly digested with the restriction enzymes MseI and EcoRI (New England Biolabs, Ipswich, MA, USA) and Illumina adaptors including unique 7-bp barcodes were ligated to the digested fragments. Ligation products were pooled, size-selected between 475 and 580 bp with a Pippin Prep machine (Sage Science, Beverly, MA, USA) and amplified by PCR with 12 cycles using the iProof™ High-Fidelity DNA Polymerase (BIO-RAD, Hercules, CA, USA). Libraries were sequenced in single-read 150-bp lanes on an Illumina HiSeq2500 platform at The Centre for Applied Genomics (SickKids, Toronto, ON, Canada).
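The double-digestion and size-selection steps can be illustrated with a small in silico sketch in Python. It assumes the standard recognition sites of the two enzymes (MseI: T^TAA; EcoRI: G^AATTC) and, as a simplification, applies the 475-580 bp window to the raw fragment length, ignoring the adaptors that are part of the ligation products selected in the laboratory. The function names are hypothetical and the code is not part of the published protocol.

```python
import re

# Recognition site and cut offset (bases from the start of the site).
MSEI = ("TTAA", 1)     # T^TAA
ECORI = ("GAATTC", 1)  # G^AATTC


def cut_positions(sequence, site, offset):
    """Positions where the enzyme cuts; lookahead finds overlapping sites."""
    return [m.start() + offset for m in re.finditer(f"(?={site})", sequence)]


def ddrad_fragments(sequence, min_len=475, max_len=580):
    """Fragments flanked by one MseI and one EcoRI cut within the size window.

    Returns (start, end) coordinates of each retained fragment.
    """
    cuts = sorted(
        [(pos, "MseI") for pos in cut_positions(sequence, *MSEI)]
        + [(pos, "EcoRI") for pos in cut_positions(sequence, *ECORI)]
    )
    fragments = []
    # Adjacent cut sites delimit the fragments of a complete digestion.
    for (start, enzyme_a), (end, enzyme_b) in zip(cuts, cuts[1:]):
        if enzyme_a != enzyme_b and min_len <= end - start <= max_len:
            fragments.append((start, end))
    return fragments
```

Only fragments bounded by a different enzyme at each end are kept, since these are the ones that would receive both adaptor types and be amplified.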

Methods S2 GENOMIC DATA ANALYSES

We used the programs distributed as part of the STACKS v. 1.35 pipeline (process_radtags, ustacks, cstacks, sstacks, and populations) to assemble our sequences into de novo loci and call genotypes (Hohenlohe et al., 2010; Catchen et al., 2011; Catchen et al., 2013). Reads were de-multiplexed and filtered for overall quality using the program process_radtags, retaining reads with a Phred score > 10 (using a sliding window of 15%), no adaptor contamination, and an unambiguous barcode and restriction cut site. Raw reads were screened for quality with FASTQC v. 0.11.5 (Simon, 2018) and all sequences were trimmed to 129 bp using SEQTK (Heng, 2017) in order to remove low-quality bases near the 3′ ends. Filtered reads of each individual were assembled de novo into putative loci with the ustacks program. The minimum stack depth (m) was set to three and we allowed a maximum distance of two nucleotide mismatches (M) to group reads into a “stack”. We used the “removal” (r) and “deleveraging” (d) algorithms to eliminate highly repetitive stacks and resolve over-merged loci, respectively. Single nucleotide polymorphisms (SNPs) were identified at each locus and genotypes were called using a multinomial-based likelihood model that accounts for sequencing errors, with the upper bound of the error rate (ε) set to 0.2 (Hohenlohe et al., 2010; Catchen et al., 2011; Catchen et al., 2013). A catalogue of loci was built using the cstacks program, with loci recognized as homologous across individuals if the number of nucleotide mismatches between consensus sequences (n) was ≤ 2. Data from each individual were matched against this catalogue using the sstacks program, and output files were exported in different formats for subsequent analyses using the program populations. For all downstream analyses, we exported only the first SNP per RAD locus (option write_single_snp) and retained loci with a minimum stack depth ≥ 5 (m = 5) that were sequenced in at least 50% of the individuals of each population (parameter r = 0.5), represented in ~66% of populations, and had a minor allele frequency (MAF) ≥ 0.01, in order to reduce the number of false polymorphic loci due to sequencing errors. The choice of different filtering thresholds had little impact on the obtained inferences (e.g. González-Serna et al., 2018; Ortego et al., 2018). The resulting files were used for subsequent analyses or converted into other formats using the program PGDSPIDER v.2.1.0.3 (Lischer & Excoffier, 2012).
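The locus-retention criteria described above (depth ≥ 5, genotyped in at least 50% of the individuals of a population, present in about 66% of populations, MAF ≥ 0.01) can be illustrated with a minimal Python sketch. This is not the STACKS implementation: the function name `keep_locus`, the input layout and the decision to compute the MAF across all genotyped individuals are assumptions made only for illustration, and the populations program may apply these filters differently.

```python
from collections import Counter, defaultdict


def keep_locus(calls, pop_of, min_depth=5, r=0.5, p=0.66, min_maf=0.01):
    """Decide whether one SNP locus passes the filters (illustrative only).

    calls  : dict sample -> (allele1, allele2, depth); absent sample = missing
    pop_of : dict sample -> population code
    """
    total = Counter(pop_of.values())   # individuals sampled per population
    if not total:
        return False
    genotyped = defaultdict(int)       # individuals passing the depth filter
    alleles = Counter()                # allele counts over genotyped individuals

    for sample, pop in pop_of.items():
        call = calls.get(sample)
        if call is None or call[2] < min_depth:
            continue                   # missing or too shallow: treated as missing
        genotyped[pop] += 1
        alleles.update(call[:2])

    # Populations in which at least a proportion r of individuals is genotyped
    pops_passing = sum(1 for pop, n in total.items() if genotyped[pop] / n >= r)
    if pops_passing / len(total) < p:
        return False

    if len(alleles) < 2:
        return False                   # monomorphic in the retained data

    # Assumption: biallelic SNP, MAF computed across all genotyped individuals
    maf = min(alleles.values()) / sum(alleles.values())
    return maf >= min_maf


# Hypothetical example: three populations with two individuals each
pop_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C", "c2": "C"}
polymorphic = {s: ("A", "G", 10) for s in pop_of}
print(keep_locus(polymorphic, pop_of))
```

Expressing each threshold as a keyword argument makes it straightforward to repeat the filtering under alternative settings, which is the kind of sensitivity check alluded to in the text.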


Figure S1 Number of reads per individual before and after the different quality filtering steps in STACKS. The total height of each bar represents the total number of raw reads obtained for that individual. Within each bar, dark red represents the reads discarded by process_radtags due to low quality, adaptor contamination or an ambiguous barcode, and orange represents the reads discarded by ustacks after filtering out repetitive elements and reads that did not meet the criteria required to create a “stack”. Green represents the number of retained reads used to identify homologous loci. The individual marked with an asterisk was removed from subsequent analyses (< 1,000,000 retained reads). Populations are labelled using the same codes presented in Table S1.


Figure S2 Results of LFMM analyses for three environmental principal components (PC1, PC2 and PC3). Histograms show the frequency of loci with different p-values for each environmental principal component tested (PC1, PC2 and PC3). Analyses were performed for (A, B, C) all populations and (D, E, F) only considering populations from the Iberian Peninsula. Grey bars indicate unadjusted p-values and green bars indicate p-values adjusted with the genomic inflation factor (λ) that is indicated in each panel.
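As a reading aid for the adjusted p-values shown in the figure, the sketch below illustrates one common way of applying a genomic inflation factor: λ is taken as the median of the squared z-scores divided by the median of a chi-square distribution with one degree of freedom, and p-values are recomputed from z²/λ. Whether the analyses in this chapter followed exactly this convention is an assumption; the function names are hypothetical and the z-scores are made up.

```python
import math
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation_factor(z_scores):
    """Lambda = median(z^2) / median of chi-square(1 d.f.)."""
    return median(z * z for z in z_scores) / CHI2_1DF_MEDIAN


def adjusted_p_values(z_scores, lam=None):
    """Two-sided p-values from z^2 / lambda under a chi-square(1 d.f.) null."""
    if lam is None:
        lam = genomic_inflation_factor(z_scores)
    # For X ~ chi-square(1): P(X > x) = erfc(sqrt(x / 2))
    return [math.erfc(math.sqrt(z * z / lam / 2.0)) for z in z_scores]


# Made-up z-scores for five loci
z = [0.0, 1.0, -1.0, 2.0, 3.0]
lam = genomic_inflation_factor(z)
print(round(lam, 3), [round(p, 4) for p in adjusted_p_values(z, lam)])
```

With λ > 1 the adjusted p-values are larger than the unadjusted ones, which flattens an excess of small p-values in the histogram.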


Figure S3 Results of Bayesian clustering analyses in STRUCTURE based on (A, C, E) neutral loci and (B, D) loci identified to be under divergent selection by ARLEQUIN (FDIST method) and BAYESCAN. Analyses were performed for (A, B) all populations, (C, D) only populations from the Iberian Peninsula and (E) only populations from the Canary Islands (neutral loci only). Panels show the mean (± SD) log probability of the data (LnPr(X|K)) over the 10 best runs (left y-axes, open dots and error bars) for each value of K and the magnitude of ΔK (right y-axes, black dots). The number of loci used in the different analyses is indicated in each panel.


Figure S4 Inferred demographic profiles for populations of Moroccan locust using STAIRWAY PLOT. Within each panel, the red line shows the median estimate of effective population size (Ne) over time, assuming a mutation rate of 2.8 × 10⁻⁹ and a 1-year generation time. Grey lines represent confidence intervals (CI) obtained in STAIRWAY PLOT: thick grey lines are the 97.5% and 2.5% percentiles, and thin grey lines are the 87.5% and 12.5% percentiles. Population codes are described in Table S1 and name colours are the same as in Fig. 1.


Table S1 Geographical location of the studied populations of Moroccan locust, population codes, sampling year, number of individuals per population (n) and scores for the three environmental principal components used in LFMM analyses (PC1, PC2 and PC3). Asterisks mark localities with pest outbreaks during the sampling year. Average values of genetic statistics across neutral loci are presented for major allele frequency (P), observed (HO) and expected (HE) heterozygosity, nucleotide diversity (π) and Wright’s inbreeding coefficient (FIS), calculated in STACKS for all positions (polymorphic and non-polymorphic).

Locality (Province)                  Code  Year  n  Latitude  Longitude     PC1     PC2     PC3       P      HO      HE       π     FIS
Trabanca (Salamanca)                 TRAB  2015  8  41.24275   -6.40222  -0.447  -0.283   1.282  0.9994  0.0007  0.0009  0.0010  0.0007
Sando (Salamanca)*                   SAND  2015  8  40.96961   -6.10954  -0.573  -0.279   0.630  0.9994  0.0007  0.0009  0.0010  0.0007
Salamanca (Salamanca)                SALA  2015  5  40.93764   -5.66806  -0.608  -0.335  -0.726  0.9994  0.0007  0.0009  0.0010  0.0006
Alhama de Aragón (Zaragoza)*         ALHA  2015  8  41.34571   -1.92277  -1.613  -0.820  -0.939  0.9994  0.0007  0.0009  0.0010  0.0007
Los Llanos de Cáceres (Cáceres)      CACE  2015  8  39.53310   -6.32455   1.072   0.887   0.094  0.9994  0.0007  0.0009  0.0010  0.0008
Puerto de la Berzocana (Cáceres)     BERZ  2012  5  39.44998   -5.42609  -0.349   0.401   1.006  0.9994  0.0007  0.0008  0.0009  0.0005
Castuera (Cáceres)*                  CAST  2015  8  38.74827   -5.54450   1.036   1.186   0.466  0.9994  0.0007  0.0009  0.0010  0.0007
Cañada del Hoyo (Cuenca)             HOYO  2015  8  39.93709   -1.99317  -1.787  -0.467   0.010  0.9994  0.0007  0.0009  0.0010  0.0007
Laguna de Tirez (Toledo)             TIRE  2015  7  39.54259   -3.35907   0.144   0.660  -1.063  0.9994  0.0007  0.0009  0.0010  0.0007
Raña Cornicabra (Ciudad Real)*       CORN  2011  5  39.13134   -4.73814   0.550   0.924  -0.471  0.9994  0.0007  0.0009  0.0010  0.0005
Valle de Alcudia (Ciudad Real)*      ALCU  2015  7  38.58958   -4.35354   0.560   1.059   0.502  0.9994  0.0008  0.0009  0.0010  0.0006
Belalcázar (Córdoba)*                BELA  2015  5  38.67942   -5.16172   1.125   1.311   0.342  0.9994  0.0008  0.0009  0.0010  0.0005
El Bonillo (Albacete)                BONI  2015  7  38.93758   -2.56544  -0.528   0.209  -0.507  0.9994  0.0007  0.0009  0.0010  0.0008
La Felipa (Albacete)                 FELI  2015  5  39.03552   -1.64832  -0.299  -0.185  -1.642  0.9994  0.0007  0.0009  0.0010  0.0005
Santa Elena (Jaén)                   SANT  2015  5  38.33753   -3.51923   0.594   0.935   0.098  0.9994  0.0008  0.0009  0.0010  0.0005
Santiago de la Espada (Jaén)         ESPA  2011  5  38.18128   -2.66677  -1.238  -0.058   0.920  0.9994  0.0007  0.0009  0.0010  0.0005
Jumilla (Murcia)*                    JUMI  2015  5  38.49016   -1.24497   0.228  -0.237  -1.815  0.9994  0.0007  0.0009  0.0010  0.0005
Caravaca de la Cruz (Murcia)*        CARA  2015  8  38.10283   -2.01999  -0.208  -0.053  -0.895  0.9994  0.0007  0.0009  0.0010  0.0007
Alpujarra (Granada)*                 ALPU  2015  8  36.96846   -3.21425  -0.986  -0.101   1.811  0.9994  0.0007  0.0009  0.0010  0.0007
Tenerife (Santa Cruz de Tenerife)    TENE  2016  8  28.41569  -16.40991   1.343  -2.257   1.449  0.9995  0.0006  0.0007  0.0008  0.0004
El Hierro (Santa Cruz de Tenerife)*  HIER  2016  8  27.77303  -17.95796   1.984  -2.497  -0.555  0.9995  0.0006  0.0007  0.0008  0.0004


Table S2 Contributions (factor loadings) of each environmental variable to the different principal components (PC1, PC2 and PC3). Values > 0.8 and < -0.8 are highlighted in grey.

Environmental variables                                        PC1     PC2     PC3
Annual Mean Temperature                                      0.855   0.392  -0.289
Mean Diurnal Range (Mean of monthly (max temp - min temp))  -0.478   0.837  -0.135
Isothermality (BIO2/BIO7) (* 100)                            0.085  -0.874  -0.262
Temperature Seasonality (standard deviation *100)           -0.438   0.896  -0.050
Max Temperature of Warmest Month                             0.167   0.975  -0.116
Min Temperature of Coldest Month                             0.940  -0.299  -0.074
Temperature Annual Range (BIO5-BIO6)                        -0.420   0.899  -0.050
Mean Temperature of Wettest Quarter                          0.198  -0.110  -0.835
Mean Temperature of Driest Quarter                           0.404   0.875  -0.242
Mean Temperature of Warmest Quarter                          0.458   0.848  -0.240
Mean Temperature of Coldest Quarter                          0.950  -0.187  -0.195
Annual Precipitation                                        -0.298   0.222   0.902
Precipitation of Wettest Month                               0.262  -0.423   0.793
Precipitation of Driest Month                               -0.955  -0.094  -0.121
Precipitation Seasonality (Coefficient of Variation)         0.812  -0.450   0.341
Precipitation of Wettest Quarter                             0.264  -0.386   0.867
Precipitation of Driest Quarter                             -0.965   0.063  -0.174
Precipitation of Warmest Quarter                            -0.947  -0.061  -0.202
Precipitation of Coldest Quarter                             0.380  -0.133   0.910


Table S3 Pairwise FST values calculated for neutral loci. Values above the diagonal were estimated in analyses including all populations; values below the diagonal were estimated in analyses restricted to populations from the Iberian Peninsula. Dashes mark the diagonal; NA marks comparisons involving the Canary Islands populations (TENE, HIER), which were not part of the analyses restricted to the Iberian Peninsula.

Pop.   TRAB   SAND   SALA   ALHA   CACE   BERZ   CAST   HOYO   TIRE   CORN   ALCU   BELA   BONI   FELI   SANT   ESPA   JUMI   CARA   ALPU   TENE   HIER
TRAB      -  0.052  0.061  0.053  0.053  0.078  0.052  0.058  0.054  0.060  0.054  0.062  0.061  0.063  0.062  0.064  0.061  0.051  0.058  0.084  0.078
SAND  0.052      -  0.060  0.054  0.051  0.079  0.051  0.058  0.054  0.061  0.054  0.061  0.061  0.062  0.061  0.063  0.061  0.051  0.056  0.083  0.077
SALA  0.061  0.061      -  0.063  0.062  0.096  0.060  0.069  0.066  0.077  0.065  0.076  0.073  0.078  0.077  0.079  0.077  0.061  0.065  0.101  0.095
ALHA  0.054  0.055  0.064      -  0.053  0.078  0.052  0.061  0.057  0.063  0.054  0.063  0.064  0.067  0.064  0.065  0.065  0.054  0.057  0.085  0.081
CACE  0.053  0.052  0.063  0.053      -  0.079  0.054  0.057  0.056  0.063  0.056  0.064  0.063  0.064  0.064  0.064  0.062  0.051  0.055  0.084  0.078
BERZ  0.079  0.079  0.097  0.079  0.079      -  0.078  0.086  0.082  0.101  0.085  0.099  0.092  0.097  0.101  0.102  0.096  0.077  0.084  0.123  0.116
CAST  0.052  0.051  0.060  0.053  0.054  0.079      -  0.056  0.056  0.062  0.056  0.064  0.062  0.063  0.063  0.062  0.062  0.052  0.056  0.083  0.078
HOYO  0.059  0.059  0.070  0.062  0.057  0.087  0.057      -  0.061  0.068  0.060  0.068  0.069  0.071  0.069  0.072  0.069  0.057  0.063  0.090  0.084
TIRE  0.055  0.055  0.066  0.058  0.056  0.083  0.056  0.061      -  0.067  0.059  0.068  0.067  0.070  0.067  0.068  0.068  0.055  0.058  0.088  0.082
CORN  0.060  0.061  0.078  0.063  0.063  0.101  0.062  0.068  0.067      -  0.067  0.079  0.074  0.080  0.080  0.081  0.078  0.061  0.064  0.102  0.096
ALCU  0.055  0.054  0.066  0.055  0.056  0.085  0.056  0.060  0.060  0.067      -  0.068  0.066  0.068  0.069  0.068  0.066  0.054  0.059  0.088  0.082
BELA  0.063  0.061  0.077  0.064  0.064  0.099  0.065  0.068  0.068  0.080  0.069      -  0.077  0.081  0.081  0.080  0.079  0.060  0.067  0.102  0.095
BONI  0.062  0.062  0.074  0.065  0.064  0.093  0.062  0.070  0.068  0.075  0.066  0.077      -  0.080  0.077  0.076  0.077  0.061  0.067  0.099  0.093
FELI  0.063  0.063  0.078  0.068  0.064  0.098  0.064  0.072  0.070  0.080  0.068  0.082  0.080      -  0.082  0.081  0.083  0.063  0.068  0.104  0.098
SANT  0.063  0.061  0.077  0.065  0.064  0.101  0.064  0.069  0.068  0.080  0.069  0.082  0.078  0.083      -  0.079  0.080  0.061  0.067  0.103  0.096
ESPA  0.064  0.063  0.080  0.066  0.064  0.102  0.063  0.073  0.068  0.081  0.068  0.080  0.077  0.081  0.080      -  0.079  0.062  0.070  0.105  0.097
JUMI  0.062  0.061  0.078  0.066  0.063  0.097  0.062  0.069  0.069  0.079  0.067  0.080  0.078  0.083  0.081  0.079      -  0.062  0.065  0.102  0.096
CARA  0.051  0.051  0.061  0.055  0.052  0.077  0.052  0.058  0.055  0.062  0.054  0.061  0.062  0.064  0.062  0.063  0.062      -  0.054  0.082  0.078
ALPU  0.059  0.057  0.065  0.058  0.056  0.085  0.056  0.064  0.059  0.065  0.060  0.068  0.068  0.068  0.068  0.070  0.065  0.055      -  0.089  0.082
TENE     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA      -  0.060
HIER     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA      -


Table S4 Pairwise FST values calculated for loci identified to be under diversifying selection by ARLEQUIN (FDIST method) and BAYESCAN. Values above the diagonal were estimated in analyses including all populations; values below the diagonal were estimated in analyses restricted to populations from the Iberian Peninsula. Dashes mark the diagonal; NA marks comparisons involving the Canary Islands populations (TENE, HIER), which were not part of the analyses restricted to the Iberian Peninsula.

Pop.   TRAB   SAND   SALA   ALHA   CACE   BERZ   CAST   HOYO   TIRE   CORN   ALCU   BELA   BONI   FELI   SANT   ESPA   JUMI   CARA   ALPU   TENE   HIER
TRAB      -  0.095  0.117  0.118  0.109  0.185  0.096  0.127  0.116  0.109  0.106  0.127  0.124  0.120  0.117  0.120  0.133  0.100  0.134  0.398  0.373
SAND  0.139      -  0.114  0.108  0.103  0.186  0.101  0.128  0.105  0.116  0.107  0.122  0.120  0.126  0.113  0.121  0.134  0.100  0.126  0.394  0.363
SALA  0.168  0.168      -  0.127  0.130  0.222  0.107  0.135  0.128  0.138  0.130  0.147  0.137  0.140  0.140  0.151  0.154  0.113  0.134  0.422  0.379
ALHA  0.177  0.154  0.165      -  0.124  0.195  0.123  0.141  0.126  0.127  0.120  0.139  0.138  0.144  0.126  0.147  0.139  0.116  0.132  0.411  0.383
CACE  0.158  0.148  0.163  0.174      -  0.191  0.102  0.134  0.130  0.115  0.105  0.123  0.117  0.134  0.113  0.138  0.133  0.100  0.130  0.395  0.372
BERZ  0.310  0.306  0.349  0.311  0.304      -  0.183  0.207  0.187  0.210  0.195  0.223  0.204  0.239  0.213  0.230  0.218  0.182  0.211  0.455  0.433
CAST  0.124  0.127  0.141  0.154  0.131  0.289      -  0.132  0.110  0.114  0.100  0.123  0.109  0.122  0.118  0.128  0.123  0.106  0.115  0.399  0.370
HOYO  0.185  0.190  0.193  0.198  0.198  0.352  0.184      -  0.138  0.141  0.136  0.158  0.142  0.144  0.133  0.163  0.156  0.123  0.155  0.421  0.392
TIRE  0.162  0.135  0.178  0.162  0.158  0.295  0.125  0.175      -  0.124  0.105  0.135  0.135  0.140  0.131  0.132  0.141  0.108  0.133  0.401  0.376
CORN  0.156  0.145  0.170  0.162  0.148  0.301  0.139  0.191  0.141      -  0.113  0.140  0.131  0.144  0.128  0.157  0.164  0.100  0.132  0.402  0.375
ALCU  0.153  0.130  0.169  0.158  0.151  0.295  0.120  0.184  0.129  0.157      -  0.134  0.120  0.146  0.122  0.139  0.141  0.099  0.119  0.406  0.381
BELA  0.171  0.185  0.196  0.186  0.179  0.344  0.159  0.249  0.159  0.205  0.159      -  0.156  0.169  0.143  0.166  0.166  0.123  0.153  0.411  0.388
BONI  0.156  0.159  0.185  0.180  0.160  0.311  0.132  0.202  0.170  0.162  0.151  0.209      -  0.150  0.132  0.151  0.147  0.111  0.141  0.413  0.381
FELI  0.154  0.171  0.201  0.195  0.200  0.370  0.153  0.213  0.183  0.189  0.184  0.232  0.179      -  0.154  0.164  0.161  0.127  0.157  0.431  0.401
SANT  0.193  0.179  0.226  0.196  0.166  0.338  0.179  0.216  0.178  0.191  0.187  0.204  0.194  0.226      -  0.150  0.151  0.113  0.134  0.406  0.376
ESPA  0.192  0.172  0.220  0.196  0.202  0.341  0.185  0.249  0.177  0.216  0.187  0.220  0.196  0.228  0.224      -  0.159  0.141  0.152  0.423  0.393
JUMI  0.185  0.167  0.209  0.185  0.182  0.343  0.147  0.216  0.163  0.190  0.173  0.218  0.180  0.237  0.223  0.217      -  0.125  0.165  0.422  0.395
CARA  0.148  0.139  0.143  0.162  0.127  0.289  0.137  0.166  0.140  0.131  0.138  0.179  0.152  0.166  0.171  0.215  0.175      -  0.125  0.391  0.363
ALPU  0.201  0.168  0.202  0.194  0.177  0.321  0.163  0.220  0.164  0.190  0.152  0.200  0.199  0.211  0.211  0.232  0.231  0.181      -  0.413  0.391
TENE     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA      -  0.126
HIER     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA      -


Supplementary References

Catchen, J.M., Amores, A., Hohenlohe, P.A. et al. (2011) STACKS: building and genotyping loci de novo from short-read sequences. G3: Genes|Genomes|Genetics, 1, 171-182.

Catchen, J.M., Hohenlohe, P.A., Bassham, S. et al. (2013) STACKS: an analysis tool set for population genomics. Molecular Ecology, 22, 3124-3140.

González-Serna, M.J., Ortego, J. & Cordero, P.J. (2018) A review of cross-backed grasshoppers of the genus Dociostaurus Fieber (Orthoptera: Acrididae) from the western Mediterranean: insights from phylogenetic analyses and DNA-based species delimitation. Systematic Entomology, 43, 136-146.

Heng, L. (2017) SEQTK. [WWW document]. URL https://github.com/lh3/seqtk

Hohenlohe, P.A., Bassham, S., Etter, P.D. et al. (2010) Population genomics of parallel adaptation in threespine stickleback using sequenced RAD tags. PLOS Genetics, 6, e1000862.

Lischer, H.E. & Excoffier, L. (2012) PGDSPIDER: an automated data conversion tool for connecting population genetics and genomics programs. Bioinformatics, 28, 298-299.

Ortego, J., Gugger, P.F. & Sork, V.L. (2018) Genomic data reveal cryptic lineage diversification and introgression in Californian golden cup oaks (section Protobalanus). New Phytologist, 218, 804-818.

Peterson, B.K., Weber, J.N., Kay, E.H. et al. (2012) Double digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species. PLOS ONE, 7, e37135.

Simon, A. (2018) FASTQC v.0.11.7. [WWW document]. URL http://www.bioinformatics.babraham.ac.uk/projects/fastqc/


GENERAL DISCUSSION

Dirt track covered by hundreds of D. maroccanus in the gregarious phase, Valle de Alcudia (Ciudad Real). Photograph by Pedro J. Cordero.


The inconsistencies and repeated changes in the taxonomy of several species of the genus Dociostaurus from the late eighteenth century to the present were taken as the starting point for re-evaluating it from a molecular perspective. The current availability of molecular tools, and their considerable power to resolve conflicts at different taxonomic levels, has allowed us to shed light on this problem (Bickford et al., 2007). Phylogenetic analyses at the genus/subgenus level revealed important discordances between classical and molecular taxonomy. At the supraspecific level, the results indicated that the species of the genus Dociostaurus do not cluster phylogenetically as expected under the current subgeneric classification, and we concluded that the current status of the subgenera is not appropriate. This is consistent with the fact that some species have been repeatedly reassigned to different subgenera over the last century and a half (e.g. see Table 2 in Chapter 1). These results also point to the need for phylogenetic analyses covering a larger number of species, which would allow a more conclusive evaluation of the taxonomic validity of the supraspecific classification currently accepted for the genus. Notably, on the basis of the results and conclusions of Chapter 1 of this doctoral thesis, a note has been added to the Orthoptera Species File database (http://orthoptera.speciesfile.org) indicating that the status of the current classification into subgenera of the genus Dociostaurus is not satisfactory (Cigliano et al., 2019). It should also be noted that the current categorization of the conservation status of orthopteran species by the International Union for Conservation of Nature (IUCN, 2018) relies on this taxonomic database (Hochkirch et al., 2016); we therefore expect the results of this chapter to also have an important applied impact on the conservation of some threatened species (see Chapters 2 and 3).

Resolving the phylogenetic relationships among species and approximately dating their divergence times can also help us to better understand the biogeographic origin of the different taxa. This is the case for D. minutus, a species of great interest because it is currently endangered: its small distribution range is restricted to dune systems in southern Sicily that have been heavily altered by human activities (Buzzetti et al., 2016; Hochkirch et al., 2016). Our phylogenetic analyses showed that its sister species is D. jagoi, a very abundant taxon widely distributed across the Iberian Peninsula, France, North Africa and the Near East, and we dated their divergence to around 2.19 million years ago. One hypothesis to explain the split between D. jagoi and D. minutus is that their common ancestor reached Sicily across the so-called Siculo-Tunisian land bridge as a result of the drop in Mediterranean sea level during Pleistocene glacial periods (see also Stöck et al., 2008), and that this colonization was followed by divergence in allopatry on either side of the strait.

The phylogenetic and species delimitation analyses of Chapter 1 have also allowed us to resolve the status of two pairs of taxa (Sáez & Lozano, 2005) with disjunct distributions in the Iberian Peninsula, on the one hand, and in Eastern Europe and Central Asia, on the other: D. crassiusculus vs. D. kraussi and D. hispanicus vs. D. brevicollis. The first member of each pair is an Iberian endemic that has recently been assigned a high threat level by the IUCN (Hochkirch et al., 2016). Our species delimitation analyses showed that D. crassiusculus should be raised to species rank rather than treated as a subspecies of D. kraussi, and that it should consequently be given high conservation priority in view of the scarcity of its populations and their high degree of fragmentation (Cordero et al., 2010; Chapter 2). Likewise, according to the phylogenetic and species delimitation analyses, the subspecies D. c. nigrogeniculatus and D. c. aurantipes are closer to D. kraussi and should be considered either subspecies of that taxon or simply part of its phenotypic variability across its wide distribution in eastern Europe and central Asia. In any case, additional genetic analyses are needed to test these hypotheses. With respect to D. hispanicus and D. brevicollis, our phylogenetic and species delimitation analyses confirm that the two taxa are well-defined sister species, thereby resolving the taxonomic confusion that has surrounded them over the last century and a half as a result of their close morphological resemblance. Taken together, our results expose the weakness of the morphological species concept that has governed the systematics of the genus Dociostaurus at different taxonomic scales (subgenus, species and subspecies) and point to the need to integrate ecological, phenotypic and genetic data in order to better understand its biological diversity (Knowlton, 1993; Sáez & Lozano, 2005).

Our molecular data indicate that the main diversification of the genus Dociostaurus probably took place during the Miocene-Pliocene, whereas most of the sister species studied split during the Pleistocene: about 1.88 million years ago for D. hispanicus - D. brevicollis and 1.01 million years ago for D. crassiusculus - D. kraussi. These divergence times are similar to those inferred for other pairs of sister species in the order Orthoptera (1-2 million years; e.g. Hemp et al., 2015; García-Navas et al., 2017). The most recent speciation events may have been linked to population fragmentation resulting from the climatic oscillations of the Pleistocene (Knowles & Richards, 2005; Noguerales et al., 2016) and, in some cases, to the geographic isolation of ancestral populations, which has given rise to species with allopatric distributions. As expected, the subspecies D. j. jagoi and D. j. occidentalis diverged more recently, around 0.71 million years ago. The same reasoning supports treating D. c. nigrogeniculatus as a subspecies of D. kraussi, since the molecular data suggest that their divergence took place only 0.35 million years ago, far more recently than the divergence estimated for the cryptic sister species pairs D. crassiusculus - D. kraussi or D. hispanicus - D. brevicollis.

Chapter 2 focused on D. crassiusculus, a threatened species endemic to the interior of the Iberian Peninsula. Molecular data obtained with high-throughput sequencing indicated that D. crassiusculus is structured into two main lineages (southern and central-northern) that diverged about 126,000 years ago. The central-northern lineage in turn comprises two genetic groups that split after the Last Glacial Maximum, about 17,000 years ago. The comparison of alternative demographic models points to ancestral gene flow between the two main lineages, which could be due to recurrent cycles of population connection and fragmentation during the climatic changes of the Pleistocene. However, the very low or absent migration among the present-day genetic groups highlights the strong fragmentation among the few small populations of this species that still persist. Analyses aimed at identifying the factors that determine gene flow among populations indicated that the boundaries between the main river basins are the principal factor explaining genetic differentiation, whereas no effect was detected for geographic distance, the spatial distribution of potential habitats, lithology or topography. These results are consistent with the fact that each genetic group is associated with one river basin: the southern group with the Guadalquivir, the central group with the Guadiana and the northern group with the Tagus. This highlights the importance of river basins as “corridors” that could facilitate long-term connectivity among populations of this species.

The results of Chapter 1 showed the need to raise the current conservation status of D. crassiusculus to a higher threat category (Hochkirch et al., 2016) as a species endemic to the Iberian Peninsula, whereas the analyses of genetic structure and demographic history in Chapter 2 allowed us to assign the three genetic groups identified to distinct Evolutionarily Significant Units (ESUs; Ryder, 1986; Waples, 1991; Moritz, 1994) that probably encompass the evolutionary legacy of the species (Waples et al., 1991). It is worth noting that these ESUs (southern, central and northern) fall within different regional administrations (Andalucía, Castilla-La Mancha, Madrid), which is an advantage for designing regional management strategies, conservation plans and protection policies that preserve their respective idiosyncratic evolutionary histories (Moritz, 1999). These conservation measures should give priority to protecting areas with saline and sedimentary lithology (e.g. evaporites and silts) and halophytic plant communities, which constitute the habitats occupied by D. crassiusculus, and should include periodic monitoring of the populations of the species to track long-term demographic trends and detect potential threats.

Chapter 3 addressed the phylogeography, landscape genetics and demography of D. hispanicus, an Iberian endemic with some degree of threat according to the IUCN, whose populations are highly fragmented because a large part of its natural habitats has been converted to agricultural land (Hochkirch et al., 2016; Presa et al., 2016). Our genomic analyses showed that populations of D. hispanicus exhibit marked genetic structure across the distribution range of the species. At a broad scale, populations are divided into two genetic groups separated by the Central System; at a finer scale, these are substructured according to the presence of other, lesser mountain ranges (Montes de Toledo, Sierra Morena). These results suggest that the physiographic features of the landscape are probably responsible for the spatial pattern of genetic structure and gene flow in the species and have shaped its distribution, demography and genetic variation. To formally analyse the processes underlying the genetic structure of the species, we used information on different aspects of its ecology to generate spatiotemporally explicit demographic models, which were then tested against the empirical genomic data in an approximate Bayesian computation (ABC) framework (He et al., 2013). These models represented alternative hypotheses about the processes that may have shaped genomic variation in populations of D. hispanicus, including topographic barriers to gene flow, distributional shifts linked to Pleistocene climatic changes, and human-driven habitat fragmentation. The best-supported model was the one incorporating both topographic barriers and the contemporary distribution of natural habitats. The important role of topographic barriers (slopes > 20%) highlights the strong preference of the species for flat landscapes, and their negative impact on gene flow may reflect the high energetic cost of moving across rugged terrain (Castillo et al., 2014), an aversion to crossing certain environments during dispersal (i.e. steep slopes; Wang & Bradburd, 2014), or the inability of the species to persist at high elevations where microclimatic conditions fall outside its physiological tolerance range (Slatyer et al., 2016; Strangas et al., 2018). Because the formation of these mountain systems predates the split of D. hispanicus from its sister taxon D. brevicollis (~1.9 million years ago; see Chapter 1), it can be assumed that the genetic structure of the species in relation to these topographic features does not result from the fragmentation of an originally continuous population but from a range expansion followed by genetic drift favoured by barriers that limit gene flow. Contrary to expectations, topographic roughness itself (i.e. the irregularity or ruggedness of the terrain) appears to have had no effect on genetic differentiation among populations, which points to a barrier effect rather than a gradual effect of topographic irregularity in limiting gene flow. Although anthropogenic habitat fragmentation has also shaped the distribution of genetic variability in the species, its impact was smaller than that of topographic barriers. This is consistent with the fact that demographic reconstructions did not detect a population decline during the Anthropocene, which suggests that the genetic footprint of recent population fragmentation is very small compared with the signals left by historical processes that have affected gene flow in the species over long periods of time (Lange et al., 2010; Keller et al., 2013; Ortego et al., 2015).

The case of the Moroccan locust (D. maroccanus) differs in many respects from the two taxa discussed above: it is a pest species harmful to agriculture, with major economic repercussions both in Spain and in other countries (Latchininsky, 1998). Analyses of genetic structure based on neutral loci revealed two well-defined genetic groups corresponding to the Canary Islands and the Iberian Peninsula. The two populations studied in the Canary Islands (i.e. Tenerife and El Hierro) are in turn structured into two well-defined genetic groups, whereas the Iberian populations as a whole show very little genetic differentiation. Likewise, the Canarian populations showed lower levels of genetic diversity than the Iberian populations, with the exception of two small, isolated peninsular populations (Puerto de la Berzocana, Cáceres, and Cañada del Hoyo, Cuenca), which also showed low levels of genetic diversity and/or greater differentiation from the remaining Iberian populations. The genetic structure between Canarian and Iberian populations can be explained by the barrier to gene flow imposed by the ocean that separates them, the large geographic distance between them (approx. 1,700 km), and strong genetic drift in the former as a result of their smaller effective population sizes and greater isolation. The weak genetic structure found across the Iberian Peninsula is an expected result for a species with such high dispersal capacity (Latchininsky, 1998) and contrasts markedly with the strong structure observed in D. crassiusculus (Chapter 2) and D. hispanicus (Chapter 3). Taken together, these results agree with previous research on outbreaking locusts (Chapuis et al., 2008; Chapuis et al., 2011a) and with findings for some pelagic marine species with high levels of gene flow (e.g. plaice: Hoarau et al., 2002; or eel: Als et al., 2011). In any case, the homogenizing effects of gene flow do not necessarily indicate demographic dependence or synchrony among populations, so factors linked to local environmental conditions are probably the main drivers of pest outbreaks (Chapuis et al., 2011a; Chapuis et al., 2011b; Aragón & Lobo, 2012; Aragón et al., 2013).
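The link drawn above between small effective population size and stronger drift can be illustrated with the classical expectation for an idealized Wright-Fisher population, in which expected heterozygosity decays as H_t = H_0 (1 - 1/(2N_e))^t. The sketch below is only an illustration of that textbook relationship: the function name and every numerical value are hypothetical and are not estimates from this thesis, and the model assumes constant size with no mutation, migration or selection.

```python
def expected_heterozygosity(h0: float, ne: float, generations: int) -> float:
    """Expected heterozygosity after pure drift in an ideal diploid population.

    Assumes a constant effective size `ne`, no mutation, no migration and no
    selection (textbook Wright-Fisher expectation, not the thesis's method).
    """
    if ne <= 0:
        raise ValueError("ne must be positive")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return h0 * (1.0 - 1.0 / (2.0 * ne)) ** generations


# Hypothetical values chosen only to contrast a small, isolated population
# with a large, well-connected one over the same number of generations.
small_population = expected_heterozygosity(0.30, ne=100, generations=200)
large_population = expected_heterozygosity(0.30, ne=10_000, generations=200)
print(f"small Ne: {small_population:.3f}, large Ne: {large_population:.3f}")
```

Under these assumptions the smaller population retains less of its initial diversity, which is the qualitative pattern invoked for the island populations and for the small, isolated peninsular ones.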

Demographic reconstructions for the studied Moroccan locust populations converge on a common general pattern, characterized mainly by a marked bottleneck during the Last Glacial Maximum (~21,000 years ago) followed by an abrupt population expansion at the onset of the Holocene. These results are consistent with the thermophilic nature of the species and agree with evidence from the fossil record indicating that it was much more abundant during the warmest periods of the Pleistocene (Meco et al., 2010; Meco et al., 2011). Nevertheless, the demographic profiles obtained for the Canary Islands populations show some notable particularities: these populations experienced a less pronounced demographic decline during the last glaciation than the Iberian ones and, in general, have sustained smaller effective population sizes over time than the peninsular populations. The first difference could be explained by the weaker impact of Pleistocene climatic oscillations at lower latitudes (Fernández-Palacios et al., 2016; Snyder, 2016), although these oscillations do nonetheless appear to have had an important impact on the demography of Canarian populations of the Moroccan locust for at least the last 400,000 years (Meco et al., 2010; Meco et al., 2011). These results suggest that the Moroccan locust may encounter more favourable environmental conditions in the future owing to rising global temperatures and the aridification affecting some regions, which could increase the risk of pest outbreaks in the medium and long term under certain land-use conditions (Aragón et al., 2013).

Loci under selection and loci with allele frequencies associated with environmental variables accounted for a small proportion of the sequenced portion of the Moroccan locust genome (0.2-1.0% and 12-15%, respectively). These results suggest that local adaptation may be occurring in certain populations in response to different selective agents. Notably, none of the loci under divergent selection showed homology with gene sequences available in the NCBI-GenBank database. This could be due to the scarcity of genomic resources for orthopterans (the only complete genome sequenced is that of Locusta migratoria; Wang et al., 2014) and/or to the observed signals resulting from genetic hitchhiking, i.e. linkage between our markers (SNPs) and nearby, unsequenced genes under divergent selection (Smith & Haigh, 1974). As expected, loci under divergent selection showed stronger genetic structure than neutral loci, with a total of four well-defined genetic groups, but with no clear correspondence to either geography or the status of the populations (gregarious vs. solitary). This can be explained by the fact that, despite the high levels of gene flow in this species, certain populations from different regions may have experienced similar selective regimes that led to convergent processes of local adaptation (Guo et al., 2016; Bernatchez et al., 2018). The fact that genetic differentiation at loci under divergent selection is not associated with environmental dissimilarity between populations could indicate that the ecological factors acting as agents of selection differ from those considered in our environmental characterization of the populations (bioclimatic variables). Such factors might include grazing intensity, host/food plants, predators or parasites (e.g. Arias et al., 1994; Louveaux et al., 1996; Barranco et al., 2000; Kokanova, 2014; Valverde-García et al., 2018).
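One common way to ask whether pairwise genetic differentiation tracks pairwise environmental dissimilarity is a Mantel-type permutation test. The sketch below is a generic illustration of that idea and is not the procedure used in Chapter 4, which this section does not describe; the function names and both toy matrices are hypothetical, and it uses only the Python standard library.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def _upper_triangle(m: Matrix, order: Sequence[int]) -> List[float]:
    """Upper-triangle values of a symmetric matrix after reordering populations."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_test(gen: Matrix, env: Matrix, permutations: int = 999,
                seed: int = 0) -> Tuple[float, float]:
    """Return (observed r, one-sided p-value) for a simple Mantel test.

    Rows and columns of `gen` are permuted together, which preserves the
    non-independence of pairwise distances sharing a population.
    """
    identity = list(range(len(gen)))
    env_vec = _upper_triangle(env, identity)
    r_obs = _pearson(_upper_triangle(gen, identity), env_vec)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)
        if _pearson(_upper_triangle(gen, perm), env_vec) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Hypothetical 4-population matrices (illustrative values only).
toy_gen = [[0.00, 0.10, 0.20, 0.30],
           [0.10, 0.00, 0.15, 0.25],
           [0.20, 0.15, 0.00, 0.12],
           [0.30, 0.25, 0.12, 0.00]]
toy_env = [[0.0, 2.0, 1.0, 0.5],
           [2.0, 0.0, 1.5, 3.0],
           [1.0, 1.5, 0.0, 2.5],
           [0.5, 3.0, 2.5, 0.0]]
r, p = mantel_test(toy_gen, toy_env)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

A non-significant result in a test of this kind would be compatible with the interpretation given above, namely that the relevant selective agents are not captured by the environmental variables entered into the dissimilarity matrix.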

The results obtained in Chapter 4 on the demographic and evolutionary dynamics of the Moroccan locust (e.g. inter-population and regional connectivity, the possible impact of local adaptations on population persistence, future expansions of the species' range, or a higher frequency of pest outbreaks due to climate change under certain land uses) can be of great help to those responsible for management policies for this species in establishing better-targeted management strategies that reduce damage to agriculture and its negative economic impact in locust-affected areas (Latchininsky, 1998).

REFERENCES

Als, T.D., Hansen, M.M., Maes, G.E. et al. (2011) All roads lead to home: panmixia of European eel in the Sargasso Sea. Molecular Ecology, 20, 1333-1346.

Aragón, P., Coca-Abia, M.M., Llorente, V. & Lobo, J.M. (2013) Estimation of climatic favourable areas for locust outbreaks in Spain: Integrating species' presence records and spatial information on outbreaks. Journal of Applied Entomology, 137, 610-623.

Aragón, P. & Lobo, J.M. (2012) Predicted effect of climate change on the invasibility and distribution of the western corn root-worm. Agricultural and Forest Entomology, 14, 13-18.

Arias, A., Sánchez, M., Jiménez, J. et al. (1994) Distribución en el suelo de las ootecas de Dociostaurus maroccanus (Thunb.) e importancia de su depredación en dos fincas de Extremadura. Boletín de Sanidad Vegetal. Plagas, 20, 3-22.

Barranco, P., Pascual, F. & Cabello, T. (2000) Oviposition and egg predation in Dociostaurus maroccanus (Thunberg, 1815). (Orthoptera: Acrididae). Boletín de la Asociación Española de Entomología, 24, 161-177.

Bernatchez, S., Xuereb, A., Laporte, M. et al. (2018) Seascape genomics of eastern oyster (Crassostrea virginica) along the Atlantic coast of Canada. Evolutionary Applications, in press [doi:10.1111/eva.12741]

Bickford, D., Lohman, D.J., Sodhi, N.S. et al. (2007) Cryptic species as a window on diversity and conservation. Trends in Ecology & Evolution, 22, 148-155.

Buzzetti, F.M., Hochkirch, A., Massa, B. et al. (2016) Dociostaurus minutus: the IUCN Red List of Threatened Species 2016: e.T16084624A70736274. [WWW document]. URL https://www.iucnredlist.org

Castillo, J.A., Epps, C.W., Davis, A.R. & Cushman, S.A. (2014) Landscape effects on gene flow for a climate-sensitive montane species, the American pika. Molecular Ecology, 23, 843-856.

Chapuis, M.P., Estoup, A., Augé-Sabatier, A. et al. (2008) Genetic variation for parental effects on the propensity to gregarise in Locusta migratoria. BMC Evolutionary Biology, 8, 37.

Chapuis, M.P., Popple, J.A., Berthier, K. et al. (2011a) Challenges to assessing connectivity between massive populations of the Australian plague locust. Proceedings of the Royal Society B-Biological Sciences, 278, 3152-3160.

Chapuis, M.P., Simpson, S.J., Blondin, L. & Sword, G.A. (2011b) Taxa-specific heat shock proteins are over-expressed with crowding in the Australian plague locust. Journal of Insect Physiology, 57, 1562-1567.

Cigliano, M.M., Braun, H., Eades, D.C. & Otte, D. (2019) Orthoptera Species File. Version 5.0/5.0. [WWW document]. URL http://orthoptera.speciesfile.org

Cordero, P.J., Llorente, V., Aguirre, M.P. & Ortego, J. (2010) Dociostaurus crassiusculus (Pantel, 1886), especie (Orthoptera: Acrididae) rara en la península ibérica con poblaciones locales en espacios singulares de Castilla-La Mancha (España). Boletín de la Sociedad Entomológica Aragonesa, 46, 461-465.

Fernández-Palacios, J.M., Rijsdijk, K.F., Norder, S.J. et al. (2016) Towards a glacial-sensitive model of island biogeography. Global Ecology and Biogeography, 25, 817-830.

García-Navas, V., Noguerales, V., Cordero, P.J. & Ortego, J. (2017) Phenotypic disparity in Iberian short-horned grasshoppers (Acrididae): the role of ecology and phylogeny. BMC Evolutionary Biology, 17, 109.

Guo, B., Li, Z. & Merilä, J. (2016) Population genomic evidence for adaptive differentiation in the Baltic Sea herring. Molecular Ecology, 25, 2833-2852.

He, Q., Edwards, D.L. & Knowles, L.L. (2013) Integrative testing of how environments from the past to the present shape genetic structure across landscapes. Evolution, 67, 3386-3402.

Hemp, C., Kehl, S., Schultz, O. et al. (2015) Climatic fluctuations and orogenesis as motors for speciation in East Africa: case study on Parepistaurus Karsch, 1896 (Orthoptera). Systematic Entomology, 40, 17-34.

Hoarau, G., Rijnsdorp, A.D., Van Der Veer, H.W. et al. (2002) Population structure of plaice (Pleuronectes platessa L.) in northern Europe: Microsatellites revealed large-scale spatial and temporal homogeneity. Molecular Ecology, 11, 1165-1176.

Hochkirch, A., Nieto, A., García-Criado, M. et al. (2016) European red list of grasshoppers, crickets and bush-crickets. 94 pp. Publications Office of the European Union. Luxembourg.

Keller, D., van Strien, M.J., Herrmann, M. et al. (2013) Is functional connectivity in common grasshopper species affected by fragmentation in an agricultural landscape? Agriculture, Ecosystems & Environment, 175, 39-46.

Knowles, L.L. & Richards, C.L. (2005) Importance of genetic drift during Pleistocene divergence as revealed by analyses of genomic variation. Molecular Ecology, 14, 4023-4032.

Knowlton, N. (1993) Sibling species in the sea. Annual Review of Ecology and Systematics, 24, 189-216.

Kokanova, E.O. (2014) Food links of the Moroccan locust, Dociostaurus maroccanus (Thunberg, 1815) (Orthoptera, Acrididae) in Turkmenistan. Entomologicheskoe Obozrenie, 93, 53-57.

Lange, R., Durka, W., Holzhauer, S.I. et al. (2010) Differential threshold effects of habitat fragmentation on gene flow in two widespread species of bush crickets. Molecular Ecology, 19, 4936-4948.

Latchininsky, A.V. (1998) Moroccan locust Dociostaurus maroccanus (Thunberg, 1815): A faunistic rarity or an important economic pest? Journal of Insect Conservation, 2, 167-178.

Louveaux, A., Mouhim, A., Roux, G., Gillon, Y. & Barral, H. (1996) Effect of pastoral activities upon locust populations in the Siroua Massif (Morocco). Revue d'Écologie: La Terre et La Vie, 51, 139-151.

Meco, J., Muhs, D.R., Fontugne, M. et al. (2011) Late Pliocene and Quaternary Eurasian locust infestations in the Canary Archipelago. Lethaia, 44, 440-454.

Meco, J., Petit-Maire, N., Ballester, J. et al. (2010) The Acridian plagues, a new Holocene and Pleistocene palaeoclimatic indicator. Global and Planetary Change, 72, 318-320.

Moritz, C. (1994) Defining 'Evolutionarily Significant Units' for conservation. Trends in Ecology & Evolution, 9, 373-375.

Moritz, C. (1999) Conservation units and translocations: Strategies for conserving evolutionary processes. Hereditas, 130, 217-228.

Noguerales, V., Cordero, P.J. & Ortego, J. (2016) Hierarchical genetic structure shaped by topography in a narrow-endemic montane grasshopper. BMC Evolutionary Biology, 16, 96.

Ortego, J., Aguirre, M.P., Noguerales, V. & Cordero, P.J. (2015) Consequences of extensive habitat fragmentation in landscape-level patterns of genetic diversity and structure in the Mediterranean esparto grasshopper. Evolutionary Applications, 8, 621-632.

Presa, J.J., García, M., Clemente, M. et al. (2016) Dociostaurus hispanicus: the IUCN Red List of Threatened Species 2016: e.T16084433A75088044. [WWW document]. URL https://www.iucnredlist.org

Ryder, O.A. (1986) Species conservation and systematics: The dilemma of the subspecies. Trends in Ecology & Evolution, 1, 9-10.

Sáez, A.G. & Lozano, E. (2005) Body doubles. Cryptic species: as we discover more examples of species that are morphologically indistinguishable, we need to ask why and how they exist. Nature, 433, 111.

Slatyer, R.A., Nash, M.A. & Hoffmann, A.A. (2016) Scale-dependent thermal tolerance variation in Australian mountain grasshoppers. Ecography, 39, 572-582.

Smith, J.M. & Haigh, J. (1974) The hitch-hiking effect of a favourable gene. Genetical Research, 23, 23-35.

Snyder, C.W. (2016) Evolution of global temperature over the past two million years. Nature, 538, 226-228.

Stöck, M., Sicilia, A., Belfiore, N.M. et al. (2008) Post-Messinian evolutionary relationships across the Sicilian channel: Mitochondrial and nuclear markers link a new green toad from Sicily to African relatives. BMC Evolutionary Biology, 8, 56.

Strangas, M.L., Navas, C.A., Rodrigues, M.T. & Carnaval, A.C. (2018) Thermophysiology, microclimates, and species distributions of lizards in the mountains of the Brazilian Atlantic forest. Ecography, 2, 354-364.

IUCN (2018) The IUCN Red List of Threatened Species. Version 2018-2. [WWW document]. URL https://www.iucnredlist.org

Valverde-García, P., Santiago-Álvarez, C., Thomas, M.B. et al. (2018) Comparative effects of temperature and thermoregulation on candidate strains of entomopathogenic fungi for Moroccan locust Dociostaurus maroccanus control. BioControl, 63, 819-831.

Wang, I.J. & Bradburd, G.S. (2014) Isolation by environment. Molecular Ecology, 23, 5649-5662.

Wang, X., Fang, X., Yang, P. et al. (2014) The locust genome provides insight into swarm formation and long-distance flight. Nature Communications, 5, 2957.

Waples, R.S., Jones, R.P.J., Beckman, B.R. & Swan, G.A. (1991) Status review for Snake River fall Chinook salmon. 73 pp. Department of Commerce, National Oceanic and Atmospheric Administration. National Marine Fisheries Service, USA.

CONCLUSIONS

Dociostaurus hispanicus ♂, Madrid. Photograph by Piluca Álvarez.

1. At the supraspecific level, phylogenetic analyses indicated that the species of the genus Dociostaurus do not cluster as expected under the current classification. This reveals incongruences between classical and molecular taxonomy and indicates that the current status of the subgenera is not appropriate and should be re-evaluated.

2. Molecular data indicated that the main diversification of the genus Dociostaurus took place during the Miocene-Pliocene, whereas the split of most of the sister species studied occurred during the Pleistocene, possibly as a result of vicariance.

3. Dociostaurus crassiusculus is raised to species rank and redefined as a taxon endemic to the Iberian Peninsula; it should therefore be considered a high conservation priority, given the scarcity of its populations and their high degree of fragmentation. In addition, phylogenetic analyses showed that the subspecies formerly named D. crassiusculus nigrogeniculatus and D. c. aurantipes should now be regarded either as subspecies of D. kraussi (the sister species of D. crassiusculus) or as part of its phenotypic variability.

4. D. crassiusculus comprises two highly divergent lineages and a total of three genetic groups associated with different river basins (Tagus, Guadiana and Guadalquivir), this physiographic feature being the main factor explaining the genetic structure of its populations. These results allowed us to assign the three genetic groups to distinct evolutionarily significant units (ESUs) falling under different regional administrations (Madrid, Castilla-La Mancha, Andalusia), which would be an advantage when developing management strategies, conservation plans and protection policies for this species.

5. Phylogenetic and species-delimitation results confirmed that D. brevicollis, distributed across Eastern Europe and Central Asia, and D. hispanicus, an Iberian endemic, are two well-defined sister species. Detailed genomic analyses of D. hispanicus indicated that this species shows marked genetic structure across its range in the Iberian Peninsula. Comparison of alternative spatiotemporally explicit demographic models revealed that its genomic variation has been shaped both by the spatial configuration of topographic barriers and by human-driven fragmentation of its natural habitats.

6. Canarian and Iberian populations of the Moroccan locust (D. maroccanus) correspond to two well-defined genetic groups, whereas peninsular populations show very low levels of genetic differentiation. Demographic reconstructions for all populations studied converged on a common general pattern, characterized by a bottleneck during the last glacial period followed by an abrupt population expansion at the onset of the Holocene.

7. The presence of numerous loci under divergent selection and of loci with allele frequencies associated with environmental variables suggests that the Moroccan locust may show local adaptations in response to particular selective agents. Loci under divergent selection showed much stronger genetic structure than neutral loci, but with no clear correspondence to geography, which indicates that certain populations from different regions may be experiencing similar selective processes.

8. Gregarious and solitary populations of the Moroccan locust showed hardly any differences in their levels of genetic diversity, demographic history or degree of genetic structure at either neutral loci or loci under divergent selection.
