bioRxiv preprint doi: https://doi.org/10.1101/190249; this version posted September 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. bioRxiv preprint

Coalescent theory of migration network motifs

NICOLAS ALCALA∗, 1, AMY GOLDBERG∗, UMA RAMAKRISHNAN† and NOAH A. ROSENBERG∗

∗Department of Biology, Stanford University, Stanford, CA 94305-5020, USA, †National Centre for Biological Sciences, Bangalore, 560065, India

ABSTRACT Natural populations display a variety of spatial arrangements, each potentially with a distinctive impact on genetic diversity and genetic differentiation among subpopulations. Although the spatial arrangement of populations can lead to intricate migration networks, theoretical developments have focused mainly on a small subset of such networks, emphasizing the island-migration and stepping-stone models. In this study, we investigate all small network motifs: the set of all possible migration networks among populations subdivided into at most four subpopulations. For each motif, we use coalescent theory to derive expectations for three quantities that describe genetic variation: nucleotide diversity,

FST, and half-time to equilibrium diversity. We describe the impact of network properties on these quantities, finding that motifs with a large mean node degree have the largest nucleotide diversity and the longest time to equilibrium, whereas

motifs with small density have the largest FST. In addition, we show that the motifs whose pattern of variation is most strongly influenced by loss of a connection or a subpopulation are those that can be split easily into several disconnected components. We illustrate our results using two example datasets—sky island birds of genus Brachypteryx and Indian tigers—identifying disturbance scenarios that produce the greatest reduction in genetic diversity; for tigers, we also compare the benefits of two assisted gene flow scenarios. Our results have consequences for understanding the effect of geography on genetic diversity and for designing strategies to alter population migration networks to maximize genetic variation in the context of conservation of endangered species.

KEYWORDS coalescent theory; genetic differentiation; network; population structure

1 such predictions enable descriptions of the impact of migration 11 2 OALESCENT theory is a powerful tool to predict patterns of as one of the main evolutionary forces influencing allele frequen- 12 3 C genetic variation in models of population structure, and cies. In molecular ecology, they help evaluate the consequences 13 4 many studies have investigated the predictions of coalescent of abiotic factors such as geographic barriers, and biotic factors 14 5 models about genetic variation under a variety of different as- such as assortative mating, on levels of genetic diversity and 15 6 sumptions about the genetic structure of populations (Donnelly genetic differentiation. In conservation genetics, they can be 16 7 and Tavaré 1995; Fu and Li 1999; Rosenberg and Nordborg 2002). used to quantify the impact of past and future disturbance, as 17 8 Correctly predicting the effect of connectivity patterns on the well as to predict the outcome of management initiatives. 18 9 expected amount of nucleotide diversity and genetic differentia-

10 tion is important in a range of settings. In population genetics, The two most frequently examined models of population 19 Copyright © 2017 by the Authors structure are the island-migration and stepping-stone models. 20

In the island model, individuals can migrate from any subpopu- 21 Manuscript compiled: Monday 18th September, 2017 1 Affiliation correspondence address and email for the corresponding author. lation to any other subpopulation, all with the same rate (Wright 22

bioRxiv September 2017 1 bioRxiv preprint doi: https://doi.org/10.1101/190249; this version posted September 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 x

s 44 e

f arbitrary connectivity patterns. The number of patterns grows t i r t e o

v Common models

- 45 m rapidly with the number of subpopulations, however, and the 1 island model

x 2 3 circular stepping-stone model 46

s comprehensive description of networks of arbitrary size is a e f t i linear stepping-stone model r t e o single population model v 47

- combinatorial challenge. Because small network motifs are the m 2

x 4 5 6 7 “building blocks” of large networks (Milo et al. 2002), the deriva- 48 s e f t i r t e o 49 v tion of their features can be a step in predicting properties of - m 3

12 complex connectivity networks. We thus characterize coales- 50

cent quantities under all possible motifs describing the spatial 51

52 8 9 10 13 15 17 18 arrangements of up to four subpopulations. We first derive x s e f t i r t the expected coalescence times between pairs of lineages sam- 53 e o v - m 4 pled in each of the subpopulations and pairs sampled from 54 11 14 16 different subpopulations. For each subpopulation, we compute 55

three population-genetic quantities: expected nucleotide diver- 56

Figure 1 All possible network motifs for sets of at most 4 ver- sity, expected FST values between pairs of subpopulations, and 57 tices. Purple motif backgrounds highlight motifs that follow half-time to equilibrium after a perturbation. For each motif, 58 standard models, island or stepping-stone or both. Note that we compute four network statistics—number of vertices, num- 59 we take the term “motif” to indicate a specific small undi- ber of edges, mean degree, and density—correlating them with 60 rected graph (rather than a small directed or undirected sub- the population-genetic quantities. Finally, we investigate the 61 graph statistically overrepresented in large empirical net- nucleotide diversity lost after a connectivity loss or a subpopula- 62 works, as in many applications). tion loss—a transition between motifs. We interpret the results 63

in relation to problems in conservation genetics, considering two 64

case studies, birds of genus Brachypteryx and Indian tigers. For 65 23 1951). In the stepping-stone model, individuals can only migrate

both examples, we (i) consider genetic data in a network motif 66 24 to neighboring subpopulations (Kimura 1953; Maruyama 1970).

framework, and (ii) evaluate the potential impacts of connectiv- 67 25 Stepping-stone models can represent multiple spatial arrange-

ity change on population-genetic variation. 68 26 ments. Under the circular stepping-stone model, subpopulations

27 are arranged in a circle, so that all individuals can migrate to Model 69 28 exactly two subpopulations.

29 Although the island and stepping-stone models can accom- Population connectivity 70

30 modate a variety of patterns of connectivity among subpopu- We consider K haploid or diploid subpopulations of equal size 71

31 lations, they represent only some of the possible patterns, or N individuals. We denote by Mij the scaled backward migration 72

32 network “motifs.” Indeed, these models account for only 7 of rate, representing twice the number of lineages per generation 73

33 18 motifs possible for sets of one to four subpopulations (Fig- from subpopulation i that originate from subpopulation j. Thus, 74

34 ure 1). Numbering motifs by the classification from Read and Mij = 2Nmij for haploids and 4Nmij for diploids, where mij is 75

35 Wilson (2005, p. 8), motif 1 corresponds to the panmictic pop- the probability for a lineage of subpopulation i to originate from 76

36 ulation model, motif 18 to the island model, motifs 6, 14, and subpopulation j in the previous generation. The total scaled 77

37 16 to stepping-stone models, and motifs 3 and 7 to both island migration rate of subpopulation i, or twice the scaled number 78 K 38 and stepping-stone models. Although tools of coalescent theory of lineages that originate elsewhere, is Mi = ∑j=1,j6=i Mij. We 79

39 to study arbitrary migration models are available (Wilkinson- further assume that the numbers of migrants from each non- 80

40 Herbots 1998), to our knowledge, patterns of variation expected isolated subpopulation are all equal to M, so that for two non- 81

41 from the remaining 11 motifs have not been described. isolated subpopulations i and j, Mi = Mj = M. Time is a 82

42 An objective in the study of spatial arrangements of popu- continuous variable t, scaled in units of the size of a single 83

43 lations is to examine the properties of networks representing subpopulation (N for haploids, 2N for diploids). We focus on 84

2 Alcala et al. bioRxiv preprint doi: https://doi.org/10.1101/190249; this version posted September 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

85 cases with 1 ≤ K ≤ 4, and we consider all possible connectivity State (11) State (12) State (13) 2 2 2 86 patterns between subpopulations, where each pattern represents ● ● ● 1 3 1 3 1 3 87 a distinct graph on at most four vertices (Figure 1). ● ● ● ● ● ● State (22) State (23) State (33) State (00) 2 2 2 88 Coalescence ● ● ● ● 1 3 1 3 1 3 89 We consider the fate of two gene lineages drawn from a specific ● ● ● ● ● ● ● ●

90 pair of subpopulations, either the same or different subpopula- Figure 2 Schematic representation of all states for two lineage 91 tions. We denote the state of the two lineages by (ij), where i in a population divided into K = 3 distinguishable subpopu- 92 and j correspond to subpopulations. As the coalescence times lations. Lineages appear in white, and subpopulations appear 93 between two lineages with initial states (ij) and (ji) are the same, in black. The two lineages can either be in different subpopu- 94 we consider that (ij) refers to both (ij) and (ji), and we assume lations (states (12), (13), and (23)), in the same subpopulation 95 without loss of generality that i ≤ j. Consequently, the number K ((11), (22), and (33)), or they can already have coalesced ((00)). 96 of states for two lineages in K subpopulations is (2)+K+1: this K 97 quantity includes (2) states where they are in different subpopu- The transition probabilities between states after a time inter- 110 98 lations, K where they are in the same subpopulation, and 1 state val of length t are given by 111 99 where they have coalesced.

100 Assuming that events cannot occur simultaneously, the coa- P(t) = eQt. (2) 101 lescent process can be described by a continuous-time Markov

102 chain (Kingman 1982; Wilkinson-Herbots 1998). The list of all The element pij,k`(t) of P(t) corresponds to the transition proba- 112

103 possible states of the Markov chain in the case where K = 3 is bility from state ij to state k` in time t. 113

104 represented in Figure 2. This general model embeds known models. Setting Mij = 114  105 The instantaneous rate matrix Q = qij,k` for the Markov M/(K − 1) for all i and j 6= i leads to the finite island model 115

106 chain, where qij,k` is the instantaneous transition rate from state (Notohara 1990; Nei and Takahata 1993). Considering subpop- 116

107 (ij) to state (k`), is defined by (Wilkinson-Herbots 1998): ulations along a circle and setting Mij = M/2 for all adjacent 117

subpopulations (i = j + 1, i = j − 1, or {i, j} = {1, K}) and 118  Mij = 0 for all non-adjacent subpopulations leads to the circular 119  −1 − Mi if i = j, i = k, and j = `  stepping-stone model (Strobeck 1987). Considering subpopula- 120   M Mj  − i − if i 6= j, i = k, and j = ` tions along a finite line and setting Mij = M/2 for 1 < i < K, 121  2 2   M12 = MK,K−1 = M, and Mij = 0 for all non-adjacent subpop- 122   Mik if i = j, i 6= k, and j = `  ulations leads to the linear stepping-stone model (Wilkinson- 123    M if i = j, i = k, and j 6= ` Herbots 1998). 124  i`     Mj`/2 if i 6= j, i = k, and j 6= `  Results 125 qij,k` =  (1)   M /2 if i 6= j, i 6= k, and i = `  jk  Expected coalescence time 126    Mik/2 if i 6= j, i 6= k, and j = ` The probability that coalescence has already occurred after time 127    t for two lineages sampled respectively in subpopulations i and 128  6= 6= = 6= `  Mi`/2 if i j, i k, j k, and i  j corresponds to the transition probability during time t from 129    1 if i = j, k = 0, and ` = 0 initial state (ij) to state (00). This probability is given by element 130    pij,00 from matrix P(t) (eq. 2). Because pij,00(t) is a cumulative 131 0 otherwise. probability, the associated density function is 132

108 It can be seen that the list in eq. 1 covers all cases for (i, j, k, `) by dpij,00(t) 109 ≤ ≤ ` f (t) = . (3) noting that by assumption, i j and k . ij dt

Coalescent theory of migration motifs 3 bioRxiv preprint doi: https://doi.org/10.1101/190249; this version posted September 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. T be3 able Motif 18 17 14 12 16 15 13 11 10 8 9 Table 1 Exact mean coalescence times and FST values for 2- xc encaecnetmsfr4vre ois wn osmere nmgainmtf (Figure motifs migration in symmetries to Owing motifs. 4-vertex for times coalescence mean Exact 3 ¯ 1440 vertex motifs. tij represents the expected coalescence time for ( 135 M 4 2 M ( 25 2 ( 26 25 45 2 a pair of lineages, one sampled from subpopulation i and one + M + M M t ¯ 3639 M 4 3 1 4 3 2 1 11 3 8 340 + + + + 27 28 207 44 M M sampled from subpopulation j (eq. 4). Fij is the value of FST 8 ) + + ) 208 2240

between subpopulations i and j (eq.6). ) 3

¯ ¯ ¯ ( Motif t t t F 135 11 22 12 12 160 M 4 2 M ( ( t 26 25 2 ¯ 25 45 2 22 + + M M M M 340 , 4 4 3 3 2 1 2 3 8 2 1 1 ∞ 1 403 + + + + t ¯ 44 27 28 M 26 52 M + ) ) + 208

1 1 246 3 2 2 2(1 + 2M ) 1+4M ) 480 135 M M 2 2 + + t t t ¯ ¯ ¯ 1171 340 4 4 1 2 1 1 1 3 11 11 Table 2 Exact mean coalescence times and FST values for 3- 33 M M + + 208

vertex motifs. Owing to symmetries in migration motifs (Fig- 688

ure 1), t¯22 = t¯33 and t¯12 = t¯13, and thus, F12 = F13. 4 ( 1 + Motif t¯11 t¯22, t¯33 t¯12, t¯13 t¯23 F12, F13 F23 480 4 M 3 M M 3 ( 3 ) ( 135 + 30  3 4 4 3 2 3 8 1579 M ( ( ( ( ( M 4 1 1 ∞ ∞ 1 1 M 25 1 1 1 1 1 ( 1 25 2 2 M M + + + + + t + + + ¯ ∞ ∞ M 12 + 51 340 2 + + 4 4 3 3 2 8 26 205 1 1 M 3 3 2 1 2 M M M M M 28 5 M 1670 8 5 1 2 ∞ 2(1 + ) 1 M + M + M M 2 1 4 ) ) ) ) ) ) +  − 20 + 208 M 32 ) 27 + ) 8 8 8 5  8 1  1 1 M 568 3

6 1 + 1 + + 3 3 3 8M 3 M + 16 1+2M 1 5 M 24  2 2 1 1 7 3 3 3(1 + 3M ) 3(1 + 3M ) 1+3M 1+3M 480 4 ( M 1 M ( + 3 90 135 + 4 3 M 1516 4 M M 1 ( ( M ( 2 1 1 ( 1 + ) 25 133 The expected coalescence time for two lineages sampled in sub- 2 M + + t + ¯ 241  + ∞ ∞ ∞ ∞ ∞ M 13 340 2 25 + + 4 3 M 26 M 3 1 2 M M 28 1519 M M 134 + i j M populations and is thus ) ) ) ) + + 156 + 208 M 207 27 8 + ) 480 ∞  Z t¯ = t f (t) dt. (4) ij ij 2 ( 3 4 4 3 3 3 8 M 1 ), M ( ( ( ( 0 + 1 1 1 1 ( 1 25 4 + + + + t t t + ¯ ¯ ¯ ) ∞ ∞ ∞ t ¯ M 12 12 14 ( 22 15 + 4 4 3 3 8 M 3 3 2 2 M M M M 28 5 M = + )

135 ) ) ) ) We derive in Appendix A the system of equations that can be  16 ) t ¯ 44 136

solved to obtain the expected coalescence times in cases with . 480 M 137 one to four subpopulations. The expected coalescence times for M ( 3 135 + 4 4 1738 3 ( ( M (

138 1 1 motif 1—one isolated subpopulation—is simply 1. The expected 1 2 M + + t t t + ¯ ¯ ¯ + ∞ ∞ ∞ ∞ ∞ 14 12 23 340 2 + 4 4 M 3 3 1 M M

139 2073 coalescence times for the two-vertex motifs (motifs 2 and 3) M ) ) ) + 208 M

140 + appear in Table 1, for the three-vertex motifs (4 to 7) in Table 2, ) 816

141 and for the four-vertex motifs (8 to 18) in Table3. ( 3 M 4 M ( 142 The set of all pairwise coalescence times of a motif is informa- + ( 1 2 90 135 + ) 4 3 2 ( 4 3 3 8 M M 160 ( ( ( M 4 ( ( 2 1 1 1 ( 3 M 1 143 1 1 25 2

tive about another quantity of interest: the total coalescence time, + M + + + t + ¯ + + ∞ ∞ + ) 185 M 24 2 340 ( + + 4 3 2 26 25 M M M M 3 1 1 2 1 M M M 403 28 1

144 M M that is, the coalescence time of two lineages randomly sampled M + ) ) )  ) ) ) + 92 M + + 208 27 26 + 246

145 ) in any two K subpopulations, possibly the same one. Indeed, the ) ) ¯ 2 K K ¯ 146 total coalescence time is simply tT = (1/K ) ∑i=1 ∑j=1 tij, the 4 4 2 3 ( ( ( ( 1 1 1 147 1 + + + t t t mean coalescence time across all possible subpopulation pairs. t ¯ ¯ ¯ ¯ + ∞ ∞ ∞ ∞ 12 23 12 34 4 4 2 M 3 3 1 1 148 M M M The total coalescence times for all motifs presented in Figure 1 ) ) ) )

149 appear in Table S1.

4 Alcala et al. bioRxiv preprint doi: https://doi.org/10.1101/190249; this version posted September 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

150 Expected within-subpopulation nucleotide diversity Half-time to equilibrium diversity 183

151 We next calculate the expected within-subpopulation nucleotide The dynamics of πS and FST are governed by the eigenvalues 184

152 diversity, that is, the expected number of differences between of matrix Q (eq. 1; e.g., Slatkin 1991). Considering an event that 185

153 two nucleotide sequences sampled from the same subpopula- changed the population demography τ time units ago, πS and 186

154 tion, assuming an infinitely-many-sites model (Kimura 1969) FST will be at equilibrium in the sense that their values are stable 187

155 and a scaled mutation rate θ per site per generation. Here, θ through time if the probability that coalescence occurs at time 188 Qτ T 156 represents twice the number of mutant lineages per generation t > τ is small, and thus, if P(τ) = e ≈ [0, 0, ... , 0, 1] , where 189

157 in a subpopulation (2Nµ for haploids, 4Nµ for diploids, where the last entry corresponds to the coalesced state. 190 −1 158 µ is the unscaled per-site per-generation mutation rate). We take Considering the eigendecomposition Q = UΛU , where Λ 191

159 the mean across all subpopulations of the pairwise coalescence is the diagonal matrix whose elements correspond to the eigen- 192

160 time within subpopulations: values of Q and U is the matrix whose columns are the eigenvec- 193 Λτ −1 T ! tors of Q, P(τ) = Ue U . Thus, P(τ) ≈ [0, 0, ... , 0, 1] when 194 1 K = t¯ Λτ T λi τ πS θ ∑ ii . (5) e ≈ [0, 0, ... , 0, 1] , which requires that e ≈ 0 for all eigen- 195 K i=1 λi τ values λi except one, for which e ≈ 1. This condition holds if 196 161 Note that πS is also informative about total nucleotide diversity the largest eigenvalue of Q is 0 and the second-largest—denoted 197 162 when M is large, because from Tables 1-3 and S1, the total coales- λτ by λ—satisfies e ≈ 0. Because Q is an irreducible instanta- 198 163 cence time tends to the mean within-subpopulation coalescence neous rate matrix, its largest eigenvalue is 0 and all other eigen- 199 164 time across all subpopulations as M → ∞. values are strictly negative (corollary 4.9 in Asmussen 2008). 200 165 We analytically computed the within-subpopulation nu- We define the half-time to equilibrium τ as a function of λ, 201 166 cleotide diversities for each motif by substituting the expected the second-largest eigenvalue of matrix Q, as 202 167 coalescence time from Tables 1-3 into eq. 5. Nucleotide diversity τ = − ln(2)/λ. (8) 168 appears in Figure S1 as a function of network metrics.

λτ τ corresponds to the time at which e = 1/2. Thus, when 203 169 Genetic differentiation T t  τ, P(t) ≈ [0, 0, ... , 0, 1] , and πS and FST are approximately 204 170 For each motif, we compute expected values of FST between at equilibrium. The value of τ gives a sense of the time needed 205 171 pairs of distinct subpopulations i and j, denoted by Fij, from for πS and FST to reach equilibrium values after a perturbation, 206 172 pairwise coalescence times. From Slatkin (1991), such as after a loss of a connection or a subpopulation. This 207 ij ij t¯ − t¯ value depends on subpopulation connectivity patterns. 208 = T S Fij ij , (6) ¯ We computed the half-time to equilibrium from eq. 8 for each 209 tT motif for sets of 1 to 4 subpopulations (Figure 1), numerically 210 ¯ij ¯ ¯ 173 where tS = (tii + tjj)/2 is the expected coalescence time of two evaluating the second-largest eigenvalue of Q. Results appear 211 ¯ij ¯ 174 lineages sampled in the same subpopulation, and tT = (tij + in Figure S1 as a function of network metrics. 212 ¯ij 175 tS )/2 is the expected coalescence time of two lineages sampled

176 in the total population. We compute eq. 6 using eq. 4. Network motifs and patterns of genetic variation 213

177 For a K-vertex motif, FST has mean To describe the influence of the properties of network motifs on 214

2 K−1 K our genetic variation measures, we computed the correlations be- 215 F = Fij (7) ( − ) ∑ ∑ 216 K K 1 i=1 j=i+1 tween four network metrics and the mean within-subpopulation

diversity πS, the mean FST across pairs of subpopulations, F, 217 178 across subpopulation pairs. We analytically computed the ex- and the half-time to equilibrium diversity τ. 218 179 pected FST from eq. 7 for each motif for sets of 3 and 4 subpop-

180 ulations (Figure 1). The expected pairwise FST values for 2-, 3-, Network metrics For a given motif, we denote by V and E its 219

181 and 4-vertex motifs appear in Tables 1, 2, and 4, respectively. FST sets of vertices and edges, so that |V| and |E| correspond to the 220

182 appears in Figure S1 as a function of network metrics. numbers of vertices and edges of the motif. 221

Coalescent theory of migration motifs 5 bioRxiv preprint doi: https://doi.org/10.1101/190249; this version posted September 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Table 4 Exact FST values for 4-vertex motifs.

Motif F12 F13 F14 F23 F24 F34

8 1 1 1 1 1 1

1 9 1 1 1 1 1+4M 1

1 1 1 10 16 1 16 1 1+2M 1 1+ 5 M 1+ 5 M 1 1 11 1+4M 1 1 1 1 1+4M

1 1 1 12 1+3M 1 1+3M 1 1+3M 1

1 1 1 1 1 1 13 1+3M 1+3M 1+3M 1+2M 1+2M 1+2M

( + ) 14 19M+20 3 51M 52 1 F 81M+92 F 60M2+83M+20 180M2+329M+156 3 14 180M2+289M+92 12 1+ 2 M 2 2 2(137M2+340M+204) 15 1104M +2783M+1704 972M +2405M+1440 F 1 F 2880M3+8370M2+7237M+1704 2880M3+8124M2+6709M+1440 12 480M3+1464M2+1393M+408 1+3M 23

1 1 1 1 1 1 16 8 1+2M 8 8 1+2M 8 1+ 3 M 1+ 3 M 1+ 3 M 1+ 3 M 17 295M+306 1 F F 1 F 800M2+1125M+306 1+2M 12 12 8 12 1+ 3 M 1 1 1 1 1 1 18 8 8 8 8 8 8 1+ 3 M 1+ 3 M 1+ 3 M 1+ 3 M 1+ 3 M 1+ 3 M

222 The first network metric we use is |V|, the motif size, or of connections. The largest F values occur for the least dense 246

223 number of subpopulations K; here, |V| ranges from 1 to 4. The motifs—2, 4, and 8—which have 0 edges. 247

224 second metric is |E|, which corresponds to the number of pairs Finally, τ is positively correlated with the four metrics, and 248

225 of subpopulations between which gene flow occurs; |E| ranges most strongly with the mean degree |E|/|V| (ρ = 0.95 for M = 249 |V| K 226 = 250 between 0 and ( 2 ) (2). Our third metric is the mean vertex 10; Figure 3C). For large M (Figure S1I), the largest τ values

227 degree |E|/|V|, or the number of connections of an average correspond to the motifs with largest mean degree (16, 17, and 251

228 subpopulation; it ranges from 0 to K − 1. The fourth network 18), whereas the lowest τ values occur for the motifs with the 252 |V| 229 | | 253 metric is the density E /( 2 ), the number of edges divided by lowest degree (1, 2, 4, and 8).

230 the maximum number of edges possible if the motif were a fully

Impact of a disturbance event 254 231 connected graph; it ranges from 0 to 1.

In this section, we focus on the impact of a disturbance event 255 232 Correlations between network metrics and patterns of genetic on mean genetic diversity πS. Of the three quantities we 256 233 variation Correlations between network metrics and πS, F, and computed—πS, F, and τ—this quantity is perhaps the most 257 234 τ for motifs with up to four subpopulations appear in Figure 3. central to conservation biology. 258 235 Diversity πS is positively correlated with all four metrics, most

236 strongly with the number of edges |E| (ρ = 0.96 for M = 10; Enumerating outcomes of disturbance events We enumerate 259

237 Figure 3A) and the mean degree |E|/|V| (ρ = 0.96 for M = 0.1 all possible outcomes that could follow a disturbance event 260

238 and M = 1; Figure 3A). Indeed, the highest values of πS occur that removes a connection between two subpopulations or that 261

239 for motifs 16, 17, and 18, which have the largest mean degree removes a subpopulation. To do so, we compute a “graph of 262

240 (2, 2.5, and 3, respectively), whereas the lowest values occur for motifs,” where each vertex represents a motif, and we draw an 263

241 motifs 1, 2, 4, and 8, which have mean degree 0. edge between two motifs if they differ by a single subpopulation 264

242 F correlates negatively with the four metrics, especially the or a single connection. We orient edges of this graph from the 265 |V| 243 | | = − = 266 density E /( 2 ) (ρ 0.92 for M 10; Figure 3B). Indeed, motif with the larger number of subpopulations or connections

244 for large M (Figure S1H), the lowest F values occur for the toward the motif with the smaller number of subpopulations or 267

245 densest motifs—3, 7, and 18—which have the maximal number connections. We give each edge a weight corresponding to the 268

6 Alcala et al. 278 277 276 275 274 273 272 271 270 269 297 296 295 294 293 292 291 290 289 288 287 286 285 284 283 282 281 280 279 ihnsbouaindvriy ntecs favre os we loss, vertex a of case the In diversity. within-subpopulation motif from transition the with h raetdvriyls cuswt h rniinfo motif from transition the with occurs loss diversity greatest The rprino ihnsbouaindvriycag associated change diversity within-subpopulation of proportion loiedvriy(eq. diversity cleotide httetasto rmmotif from transition the indicates that weight positive a whereas diversity, subpopulation motif from transition eq. edges 3 Figure ifrn uppltoswtotioaigaysubpopulations. any isolating without subpopulations from different sampled lineages for time coalescence the increases sition in migration rates all for diversity the increases transition edge-loss one Surprisingly, subpopulations. isolated two subpopulations—to has of 2—which motif pair connected single a has 3—which Figure red, dis- in into (transitions motif subpopulations a of split sets all connected losses greatest the to lead (Figure that 4% sitions of gain a to 50% of loss a of case the on focus Figure losses vertex and losses Edge diversity. within-subpopulation the of 75% of change a to leads 0, of diversity has “other” the and 1 diversity has (Table 2 have diversity each subpopulations two where 3, motif from example, transition for the 0; diversity has subpopulation lost the that consider ewr erc nld ubro vertices of number include metrics Network π bioRxiv preprint S i not certifiedbypeerreview)istheauthor/funder,whohasgrantedbioRxivalicensetodisplaypreprintinperpetuity.Itmadeavailable w h mato iest ftels favre agsfrom ranges vertex a of loss the of diversity on impact The from ranging changes diversity to lead can edge an of Loss stema ihnsbouaindvriycmue from computed diversity within-subpopulation mean the is 5 20 ple omotif to applied | [( = E o dels n nFigure in and loss edge for 4A ( | 0, ero orltosbtenntokmtisadgntcdvriymaue.(A) measures. diversity genetic and metrics network between correlations Pearson / ( ∞ 1 | V 2 ) + | h rniinfo oi 7t oi 6 hstran- This 16. motif to 17 motif from transition the : ) l ewr oisi Figure in motifs network All . 0 doi: ) / ,t oi ,weeasnl subpopulation single a where 1, motif to 1),

2 correlation A π https://doi.org/10.1101/190249 M − i .(B) 5). ( omotif to = i eaiewih niae htthe that indicates weight negative A . 2 1. + M = 0.1

2 M = 1 ) i F i / M = 10 omotif to enpairwise mean , to 2 j h rp fmtf per in appears motifs of graph The ] nue oso enwithin- mean of loss a induces / j , [( w 2 S ) neetnl,tetran- the Interestingly, 4B). ij + ( = j o etxls.We loss. vertex for 4D nue ano mean of gain a induces 2 ) |E|/ π / under a 2 S j ( ) |V| 2 = ] 1 − | V ; r osdrd nec ae,tems togycreae ercapasi red. in appears metric correlated strongly most the panel, each In considered. are F π | this versionpostedSeptember18,2017. ST ubro edges of number , − S i CC-BY-NC-ND 4.0Internationallicense ) .5 htis, that 0.75, cossbouain (eq. subpopulations across / π B F S i where , 4B). | E eueterslsfo u ewr-ae oe oreinter- to model network-based our from results the use We (circular 16 and 7 motifs and models) stepping-stone (linear 14 subpopulations—to connected four of set single a has 13—which hra h ot“out oisaetoeta r o split. not are that those are motifs “robust” most the whereas rtsailgntcsrcuei w nmleape.Using examples. animal two in structure genetic spatial pret Examples diversity. reduce substantially to likely thus is transition motifs robust and Fragile subpopulations. isolated three has 4—which motif Figure of red, sets in disconnected (transitions into subpopulations motif the of split gener- a losses to greatest correspond the ally to lead that losses vertex the (Figure case, 10% loss of loss a to 75% of loss a l,o a ihiln oes(Tables models island with mod- par all on of els, highest the among diversity, fragility high Their their loss. from vertex stems a by split easily not are however, els, mod- stepping-stone circular is The diversity. that reduce disconnection to a expected producing loss, vertex a by split are easily motifs stepping-stone linear The models). stepping-stone ure loss, edge an by components disconnected into split those Figure are in appear loss robustness edge by an ranked Motifs to set. vertex or edge the across mean all changes with diversity associated the compute we motif, greatest each the For to loss. leads diversity disturbance which for motifs the motifs: motif from transition the with associated is loss diversity greatest | ennme fegsprvertex per edges of number mean , oisrne yrbsns oavre osapa nFig- in appear loss vertex a to robustness by ranked Motifs .W a e httems rgl oisaemtf ,6 and 6, 3, motifs are motifs fragile most the that see can We 4F. |E|/ ( ) |V| 2 .(C) 7). The copyrightholderforthispreprint(whichwas . τ olsetter fmgainmtf 7 motifs migration of theory Coalescent aftm oeulbimdvriy(eq. diversity equilibrium to half-time , C τ | E | deor edge π ecnas dniytems “fragile” most the identify also can We S enwti-upplto nu- within-subpopulation mean , | V | .Tems rgl motifs fragile most The 4C. etxlse,rprigthe reporting losses, vertex ) iial oteedge the to Similarly 4E). | E 1, | / |E|/ | and 2, ) o ntne the instance, For 4E). V ( ) |V| 2 | n est of density and , .Aymotif Any 3). 8). 325 324 323 322 321 320 319 318 317 316 315 314 313 312 311 310 309 308 307 306 305 304 303 302 301 300 299 298 bioRxiv preprint doi: https://doi.org/10.1101/190249; this version posted September 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A Change of diversity after a transition B Transitions ranked by diversity change transitions splitting transitions not splitting the motif the motif 1 0% i

10% S i 0% S -20% )/ π i S )/ π i

2 3 S -50% -40% - π j - π S j S

( π -60% -100% ( π

4 5 6 7 change Diversity 5 → 4 6 → 5 9 → 8 7 → 6 3 → 2 Diversity change Diversity -80% 14 → 11 14 → 10 15 → 13 16 → 14 12 → 10 17 → 15 18 → 17 15 → 14 17 → 16 15 → 12 11 → 9 13 → 10 10 → 9 C Motifs ranked by mean diversity change 12 Edge loss all transitions split no transitions split 0% the motif the motif

8 9 10 13 15 17 18 i S -20% some transitions split )/ π i the motif S -40%

11 14 16 - π j S

-60% 3 5 6 9 7 14 10 11 13 12 17 18 16 15 ( π

Diversity change Diversity -80% D Change of diversity after a transition E Transitions ranked by diversity change transitions splitting transitions not splitting the motif 1 0% the motif i

10% S i 0% S -20% )/ π i S )/ π i 2 3 -50% S -40% - π j S - π j S -100% ( π -60% ( π Diversity change Diversity 4 5 6 7 Diversity change Diversity -80% 13 → 4 3 → 1 6 → 2 10 → 4 15 → 5 14 → 5 5 → 2 7 → 3 9 → 4 12 → 5 16 → 6 2 → 1 6 → 3 17 → 6 10 → 5 18 → 7 15 → 6 14 → 6 17 → 7 11 → 5 15 → 7 13 → 6 4 → 2 8 → 4 5 → 3 9 → 5 10 → 6 12 → 7 12 F Motifs ranked by mean diversity change

Vertex loss 0% i 8 9 10 13 15 17 18 S -20% )/ π i S -40% - π

j no transitions split

11 14 16 S -60% the motif ( π some transitions split the motif Diversity change Diversity -80% 3 7 5 2 9 4 8 6 17 14 16 15 13 18 10 12 11

j i i Figure 4 Change of within-subpopulation nucleotide diversity (πS − πS)/π following a transition from motif i to motif j, for all possible transitions involving the loss of a single edge or a single vertex. (A) Schematic representation of all possible motif transi- tions involving an edge loss. Lines connecting motifs represent edge losses and are colored by changes of within-subpopulation nucleotide diversity: red for loss and blue for gain (see legend). (B) Motif transitions involving an edge loss ranked from the largest loss to the largest gain of within-subpopulation diversity. (C) Motifs ranked from largest to smallest mean diversity loss follow- ing edge loss. For each motif, the mean loss or gain is computed across all possible transitions to another motif. For example, for motif 5, three subpopulations can be lost; loss of the isolated subpopulation produces motif 3, generating a diversity loss of 20%, and loss of one of the two other subpopulations produces motif 2 and a diversity loss of 60%. Therefore, the mean diversity loss for motif 5 is (20%+60%+60%)/3 ≈ 46.7%. (D) Schematic representation of all possible motif transitions involving a vertex loss. Lines connecting motifs represent vertex losses and are colored by changes of within-subpopulation nucleotide diversity. (E) Motif transitions involving a vertex loss ranked from the largest to the smallest diversity within-subpopulation diversity loss. (F) Motifs j i i ranked from largest to smallest mean diversity loss following vertex loss. In all panels, (πS − πS)/πS values assume M = 1; in j i i (B), (C), (E), and (F), black horizontal bars represent minimum and maximum values of (πS − πS)/πS for M in (0, ∞). Values of i j ¯ j i i πS and πS are computed from eq. 5 using coalescence times tii from Tables 1-3; minima and maxima of (πS − πS)/πS are obtained numerically. Pink shaded bars in (B), (C), (E), and (F) indicate that a fraction of edge losses or vertex losses split a motif: 1/4 for edge loss from motif 15 or vertex loss from motif 10, 13, or 15, 1/3 for vertex loss from motif 6, and 1/2 for vertex loss from motif 8 Alcala et al. 14. bioRxiv preprint doi: https://doi.org/10.1101/190249; this version posted September 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

326 published spatial and genetic information for each example, we Note that the losses reported are expected losses in the long 367

327 propose a network motif that might represent the structure of the term. The half-time to equilibrium τ values for motifs 5 and 368

328 population. We then ask what types of transitions could result 9 appear in Figure S1C, F, I. Interestingly, they are equal, and 369

329 in increased or decreased population structure and variation in correspond to 7.69 for M = 0.1, 1.81 for M = 1, and 1.42 for 370

330 the context of the conservation biology of the species examined. M = 10, in units of 2N generations. Thus, depending on the 371

migration rate, the future decrease of genetic diversity substan- 372

331 Indian sky island birds of genus Brachypteryx tially changes. The identical τ values for the two motifs result 373

332 First, we consider two species of genus Brachypteryx, birds en- from the fact that τ is determined by the motif component with 374

333 demic to the Western sky islands of India: the white- the lowest half time to equilibrium, and the two motifs have 375

334 bellied and the rufous-bellied shortwings Brachypteryx albiventris similar components—a pair of connected subpopulations and 376

335 and B. major. Robin et al. (2015) reported microsatellite data from either one or two isolated subpopulations. 377

336 multiple geographically separated subpopulations, sampling Comparing the edge loss and vertex loss scenarios, a vertex 378

337 218 individuals at 14 microsatellite loci. These subpopulations loss transition from motif 11 to motif 5 has a greater negative 379

338 have experienced changes in geographic range and gene flow effect on nucleotide diversity, because it has the largest long- 380

339 on both evolutionary and anthropogenic time scales owing to term effects and the equilibrium is reached as quickly as in 381

340 Pleistocene climate change that could have shifted the locations the edge loss transition. In this case, focusing on preserving 382

341 of suitable habitat and recent deforestation. Such changes can subpopulations rather than gene flow is predicted to avoid the 383

342 influence numbers of populations and gene flow between them, most detrimental loss of genetic diversity for the subpopulations. 384

343 and can be interpreted using our network model.

Indian tigers 385 344 Robin et al. (2015) stated that genetic differentiation in the

345 pair of species was not quite consistent with a simple island- Next, we consider genetic variation for tigers in India, repre- 386

346 migration model, so that our network approach might provide senting 60% of the global wild tiger population (Mondol et al. 387

347 additional insight. Indeed, consistent with geographic barriers, 2009). Natesh et al. (in press) considered the genetic diversity 388

348 Robin et al. (2015) observed genetic differentiation between the and structure across the Indian subcontinent of tigers, a species 389

349 two species, as well as two subgroups within each species. The that now occupies 7% of its historical range. India’s ∼2,500 390

350 data generally fit motif 11 (Figure 5A), containing two relatively tigers are distributed across many small groups, with a median 391

351 isolated sets of subpopulations, each with two subpopulations size of 19 across recognized groups. Understanding population 392

352 that exchange migrants. However, FST values between the two structure and connectivity is important to tiger conservation. 393

353 species sets (Table S3 of Robin et al. 2015) were lower than the Using 10,184 SNPs, Natesh et al. (in press) identified a north- 394

354 high values expected under motif 11 (Table 4), potentially as a western subpopulation (dark blue cluster I in Figure 6A), a 395

355 result of a short time scale of fragmentation. north/northeastern subpopulation (green cluster II), a central 396

356 Under the network model, supposing that the current motif is subpopulation (orange cluster III), and a southern subpopula- 397

357 11, we can investigate the future impact of the loss of an edge or tion (purple cluster IV). They reported evidence of gene flow 398

358 vertex, representing events possible for an endangered species between subpopulations III and IV and between subpopula- 399

359 (Figure 5B). The transition from motif 11 to motif 9 is seen as a tions II and III. The exact relationship between subpopulations, 400

360 loss of an edge, corresponding to a loss of migration between however was unclear. From the pairwise FST values reported 401

361 one of the pairs of subpopulations. This event decreases within- in Table 2 of Natesh et al. (in press), levels of divergence be- 402

362 subpopulation nucleotide diversity (-25%; Figure 4C), and leads tween subpopulation I and all other subpopulations were high, 403

363 to increasing FST genetic differentiation between subpopulations, suggesting isolation with limited gene flow. fastSTRUCTURE 404

364 particularly within each species. The loss of a subpopulation, analyses performed by Natesh et al. (in press) suggested con- 405

365 transitioning from motif 11 to motif 5, similarly leads to a loss of nectivity between subpopulations II and III (Figure 6A). Owing 406

366 within-subpopulation nucleotide diversity (-38%; Figure4F). to the large FST between the northeastern subpopulation (II) 407

Coalescent theory of migration motifs 9 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 428 429 430 431 432 6C). The 6C). Disturbance scenarios B. major. (A) Map (2016) suggested re- and IV IV III include BR: ; BN: include GR: Grasshills; MN: III

6C). By contrast, if one of the future

I motif 5 motif motif 9 motif II II B. major Kelly and Phillips 4D and

6C). The loss of a subpopulation, however, B. albiventris

-25% loss -38% loss . The copyright holder for this preprint (which was (which preprint this for holder copyright The

Brachypteryx albiventris connection connection

subpopulation 4B), reconnects the central subpopulation III with 4C and IV (2015). Colors and roman numerals represent the III

present Possible future motif transitions motif future Possible 11 motif I et al. II 6C). Two such reconnection scenarios can be imagined. The To maintain or restore some of the recently lost genetic di- B in diversity is expected (-50%;4F Figures and connecting isolated subpopulations by assisted migration (Fig- ure first scenario, which corresponds toroutes restoring (Figure lost migration transition from motif 9 to motif 8, involving the loss of an edge, 33% (Figures leads to qualitative differences in the genetic structureing depend- on the subpopulation lost. If thesubpopulation more I isolated or northwestern the southern subpopulation IV isthe lost, resulting then network is similar to motif 5,crease with in a within-subpopulation moderate nucleotide de- diversity (-17%) and a decrease in differentiation overall because an isolatedulation subpop- is lost (Figures connected subpopulations, the central subpopulation III or the northern/northeastern subpopulation II, is lost, then a decrease versity of Indian tigers, population sizes and increase fragmentation (Figure would decrease within-subpopulation nucleotide diversity by Robin III IV II et al. CC-BY-NC-ND 4.0 International license International 4.0 CC-BY-NC-ND 4). this version posted September 18, 2017. 2017. 18, September posted version this ; I in the Indian sky islands of the with sampling locations, and STRUC- under a under B. major values between subpopu- ST and F B. albiventris https://doi.org/10.1101/190249 doi: doi: A Application of the network theory framework to the Indian sky island birds not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available available made It is perpetuity. in preprint the display to a license bioRxiv granted has who the author/funder, is review) peer by certified not 6B), involving recent loss of connectivity between subpop- Ongoing perturbations to the network are likely, owing to Because of the smaller pairwise bioRxiv preprint preprint bioRxiv increasing human pressures and land-use changes that reduce nectivity loss between subpopulations I and2013). II (Mondol have occurred recently is supported byhistorical previous data: genetic an and earlier study with 10 microsatellitesuggests markers an older transition between motif 15 and 12, with con- of 40% of the within-subpopulation diversitytransition (-10% from from motif the 12 to motiftransition 10, from and motif 10 then to -33% motif 9). from That the connectivity loss might ure ulation IV and the other subpopulations, and leading to a loss tween subpopulations I and IV, we suggest thatin a recent network change structure occurred from motif 12 to motif 9 (Fig- lations II and IV and between subpopulations III and IV than be- subpopulation (III) and the southern subpopulationsuggest (IV), that the we motif most clearly fitting thestructure current is population motif 9. and the southern subpopulation (IV) and between the central Munnar; KD: Kodaik-anal; HW: High Wavys; and PR:motif Peppara. 11, (B) which Possible is future taken motif to transitions, represent based the onnucleotide current the diversity state transitions across of from subpopulations the (Figure population. Numbers on arrows represent predicted losses of mean four genetic clusters. Two-letter codes indicate the samplingBanasura; locations. VM: Sampling ; SP: locations Sispara; for OT: ; KT: Kothagiri. Sampling locations for Figure 5 of the distribution of TURE plot. The map and the STRUCTURE plot are adapted from 10 Alcala et al. 427 426 425 424 422 423 421 420 419 418 417 415 416 414 413 412 411 410 408 409 bioRxiv preprint doi: https://doi.org/10.1101/190249; this version posted September 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A B Historical motif transitions

I II connection I II connection I II loss loss - 10% - 33% motif 9 motif 12 motif 10 IV III IV III IV III II past present C Possible future motif transitions I I II Disturbance scenarios motif 8 III connectionloss IV III I II -33% II II subpopulation loss -17% or motif 5 motif 9

IV III subpopulation-50% I III IV III I I loss IV or motif 4 IV III IV II

I II Assisted gene flow

connection scenarios gain NW N NE CI SI I II motif 13 1.00 +100% 0.90 0.80 IV III 0.70

0.60 motif 9 +125% to 144% I II 0.50 0.40 0.30 IV III 0.20 connection 0.10 0.00 gain motif 14 IV III Simlipal Panna_2 Panna_1 Kanha_7 Kanha_6 Kanha_4 Kanha_5 Kanha_3 Kanha_1 Pench_2 Pench_1 Pench_3 Morigaon obt_6 Corbet _2 Corbet _1 Corbet _3 Corbet _4 Corbet _5 Corbet Arunanchal Bandipur_3 Bandipur_1 Bandipur_2 Wayanad_3 Wayanad_2 Wayanad_5 Wayanad_4 Kaziranga_2 Kaziranga_1 Nagarahole_2 Nagarahole_1 Satyamangalam Bandhavgarh_2

Ranthambhore_6 Ranthambhore_3 Ranthambhore_4 Ranthambhore_1 Ranthambhore_2 present future

Figure 6 Application of the network theory framework to Indian tigers. (A) Map of the distribution of tigers with sampling loca- tions, and fastSTRUCTURE plot. The figure is adapted from Natesh et al. (in press). Note that sample sizes for the fastSTRUCTURE plot include only subsets of individuals from Natesh et al. (in press). (B) Hypothetical sequence of past motif transitions based on Natesh et al. (in press) and Figure 4. (C) Possible future motif transitions, based on the transitions from motif 9 (Figure 4). For (B) and (C), motif 9 is taken to represent the current state of the population; percentages correspond to the proportion of within- subpopulation diversity change following each motif transition (from Figure 4).

Coalescent theory of migration motifs 11 bioRxiv preprint doi: https://doi.org/10.1101/190249; this version posted September 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

448 all other subpopulations, producing a transition from motif 9 to that motif density is the most strongly correlated with genetic 489

449 motif 13. It would lead to an increase of within-subpopulation differentiation between subpopulations. 490

450 diversity of 100%. Alternatively, a second scenario in which Our framework makes it possible to predict the impact on 491

451 subpopulations are reconnected along a line, forming a linear genetic diversity of disturbances such as loss of a subpopulation 492

452 stepping-stone, is possible, corresponding to a transition from or a connection between subpopulations. The effect of the loss of 493

453 motif 9 to motif 14. This scenario might seem less intuitive, as a vertex or edge depends on the context of the disturbance in the 494

454 it does not correspond to any previous population structure. population network. Whereas some disturbances that split the 495

455 Interestingly, it leads to a greater increase of diversity (+125% to network, including edge losses in transitions from motif 3 to 2 496

456 +144%, depending on the amount of gene flow; Table S2). and from 14 to 11 and the vertex loss in the transition from motif 497

457 Note that the losses and gains of diversity reported are ex- 13 to 4, substantially reduce genetic diversity, others such as the 498

458 pected losses and gains in the long term. The τ values for motifs transition from motif 17 to 16 instead increase mean diversity 499

459 4, 5, 8, 13, and 14—the motifs that are possible as the result of across subpopulations (Figure 4). 500

460 the transitions in Figure 6C—appear in Figures S1C, S1F, and

461 S1I. τ has the same value of 0.69 for both motifs 4 and 8, because Theoretical advances Our results extend classical coalescent 501

462 they have only isolated subpopulations. τ for motif 5 is 7.69 for theory results concerning migration models. Among the 18 502

463 M = 0.1, 1.81 for M = 1, and 1.42 for M = 10, in units of 2N network motifs we studied, 11 correspond to migration models 503

464 generations. Thus, in addition to being the transition leading to that differ from the standard models. 504

465 the greatest diversity loss, the transition from motif 9 to motif As has been seen previously (Slatkin 1987; Strobeck 1987; 505

466 4 is also the one that affects diversity the fastest. τ for motif Wilkinson-Herbots 2003), for motifs all of whose subpopula- 506

467 13 is 17.76 for M = 0.1, 3.34 for M = 1, and 2.19 for M = 10, tions are exchangeable and none of whose subpopulations are 507

468 in units of 2N generations. τ for motif 14 is 29.32 for M = 0.1, isolated, we find that the within-subpopulation pairwise coa- 508

469 4.72 for M = 1, and 2.68 for M = 10, in units of 2N generations; lescence times are independent of the migration rate (Table S2). 509

470 the time required to restore diversity exceeds the time it takes Interestingly, we found that this result on migration rate inde- 510

471 to lose it. Among assisted gene flow scenarios, the transition pendence also holds for motifs with disconnected components 511

472 to motif 14 that leads to the larger amount of diversity in the (motifs 2, 4, 5, 8-12), even though disconnection leads to viola- 512

473 long term among the pair of scenarios considered is the one with tion of the assumption of migration matrix irreducibility used 513

474 the slower change of diversity. This result suggests a trade-off in Slatkin (1987). This result can be explained by the fact that 514

475 between the magnitude and speed of the transition to long-term such motifs all involve juxtaposition of smaller motifs, each of 515

476 effects on diversity. which has exchangeable subpopulations, none of which are iso- 516

lated. Consequently, even though motifs 2, 4, 5, and 8-12 do not 517

477 Discussion satisfy the assumptions used in Slatkin (1987), that each compo- 518

519 478 We have presented a novel framework that combines network nent of the motif satisfies them suffices to ensure the result on

520 479 theory and population genetics to study the impact of popu- migration-rate independence.

480 lation structure on patterns of genetic variation under diverse Motifs 14, 15, and 17, for example, do not have exchangeable 521

481 assumptions about population connectivity. Treating a struc- vertices, nor can these motifs be decomposed into disconnected 522

482 tured population as a network containing vertices that represent components that each have exchangeable vertices. Their within- 523

483 subpopulations and edges that represent gene flow, consider- subpopulation coalescence times do depend on the migration 524

484 ing all possible population network motifs for sets of one to rate (Table S2). Nevertheless, within-subpopulation coalescence 525

485 four subpopulations, we have determined motif features that times of all motifs vary relatively little with the migration rate: 526

486 correlate with patterns of genetic variation. Among four mo- the difference between the maximum and minimum values is 527

487 tif statistics, we found that the mean node degree is the most less than 15% of the minimum (Table S2). We find that migration 528

488 strongly correlated with within-subpopulation diversity, and rates have only a small effect on within-subpopulation diversity 529

12 Alcala et al. bioRxiv preprint doi: https://doi.org/10.1101/190249; this version posted September 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

530 in many spatial configurations. Indian sky island birds of genus Brachypteryx and Indian tigers. 571

572 531 Our results also extend classical theoretical results about ge- After suggesting the most appropriate motif for each species and

the sequence of transitions that might have led to the current 573 532 netic differentiation. Under the island model, FST follows a

574 533 formula 1/(1 + αM), where the constant α > 0 determines the motif, we enumerated future disturbances and highlighted the

575 534 relative impact of drift and migration (Wright 1951; Nei and ones that would have the strongest long-term impact on genetic

576 535 Takahata 1993), and it approximately follows 1/(1 + αM) under diversity within subpopulations. For tigers, we enumerated

577 536 the stepping-stone model (Cox and Durrett 2002). For a fixed possible assisted gene flow scenarios and highlighted scenarios

578 537 number of subpopulations K, among networks with all nodes leading to the greatest long-term genetic diversity increase.

538 connected, α is smallest under the island model and largest Many studies have focused on deducing population networks 579

539 under the stepping-stone model (Cox and Durrett 2002). from genetic data in population genetics; most such applications 580

have focused on clustering using community detection algo- 581 540 We exhibit additional models under which pairwise FST fol- rithms, without using a population-genetic model (Dyer and Na- 582 541 lows 1/(1 + αM), where α is intermediate between that expected

son 2004; Dyer 2015; Garroway et al. 2008; Rozenfeld et al. 2008; 583 542 under the island and stepping-stone models (Table S3). All mod-

Ball et al. 2010; Munwes et al. 2010; Greenbaum et al. 2016). While 584 543 els with exchangeable vertices or decomposable into compo-

these statistical approaches are appealing for making sense of 585 544 nents each with exchangeable vertices (motifs 2-5, 7-9, 11, 12,

complex datasets, the mechanistic models we consider are useful 586 545 16, 18) have FST values that follow this formula. Interestingly, for providing predictions about genetic diversity patterns. We 587 546 motif 13, which does not have exchangeable vertices and is not

have demonstrated how simple network motifs can be deduced 588 547 decomposable in this manner, also has an FST that follows such a from cluster analyses and pairwise F values and can then be 589 548 formula, with α = 2 or 3. Its α values lie near those of the island- ST

used to predict the impact of future disturbances. 590 549 migration motif 18, with α = 8/3, and the linear stepping-stone

550 motif 14, with α ranging from 1.129 to 3.2 across population The network theory framework is promising for the analysis 591

551 pairs and across migration rates, and are also similar to that of of natural populations whose spatial arrangements do not follow 592

552 the circular stepping-stone motif 16 (α = 2 or 8/3), although classical migration models. For example, river systems involve 593

553 motif 16 has more connections. subpopulations arranged along a stream, leading to a motif 594

with a linear arrangement such as 3, 6, and 14, or in different 595 554 Motifs 15 and 17 also have non-exchangeable vertices and are streams, leading to a star-shaped motif such as 13 (Morrissey and 596 555 not decomposable, and they have FST values whose expressions de Kerckhove 2009). Geographic barriers owing to mountains, 597 556 involve rational functions of M (Table 4). We show in Table S3, valleys, and human occupation can isolate one (motifs 2, 5, 11 598 557 however, that their FST values approximately follow an expres- and 12) or several subpopulations (motifs 4, 8, and 9). Moreover, 599 558 sion of the form 1/(1 + αM), with α ranging from 1.747 to 2.989; many landscapes present a specific zone with high resistance, 600 559 in addition, their α values are close to that of motif 16, with for example owing to low habitat quality, leading to partial 601 560 α = 2 or 8/3. Although motifs 4, 5, and 8-12 have at least two isolation of a subpopulation from a strongly connected set of 602 561 disconnected components and thus their global FST is equal to 1 subpopulations (motif 15). 603 562 irrespective of the value of M, their pairwise FST values for con-

604 563 nected subpopulations do follow 1/(1 + αM), with α ranging Our exhaustive enumeration of motifs ensures that we can

605 564 from 2 to 4. Overall, our results highlight that the classical for- confront empirical data with expected patterns of genetic vari-

606 565 mula F = 1/(1 + αM) is a helpful approximation for all motifs ation under any spatial arrangement. This enumeration can

607 566 with up to four subpopulations. improve our ability to interpret genetic data, especially for threat-

ened species, which typically present high fragmentation and 608

567 Data applications Our results provide a framework for inter- are likely to undergo future disturbances resulting from further 609

568 preting empirical patterns of genetic diversity and differentia- human-induced habitat loss or from conservation efforts such 610

569 tion, and for predicting future patterns. We have illustrated how as assisted gene flow. The framework is also promising for con- 611

570 they provide insight into two systems of conservation interest, servation planning, because it suggests which connections or 612

Coalescent theory of migration motifs 13 bioRxiv preprint doi: https://doi.org/10.1101/190249; this version posted September 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

613 subpopulations are more important in contributing to genetic 2013; Alcala and Vuilleumier 2014). The expected diversities 654

614 variation. Historical human impacts, ongoing urbanization, and in Tables 1-4 correspond to long-term expectations, and give a 655

615 habitat fragmentation are leading to species range collapse and sense of the potential of a given spatial configuration to permit 656

616 population decline (e.g. carnivores; Ripple et al. 2014). Some large levels of genetic diversity. To assess the impact of a pertur- 657

617 species, such as the sky island birds of genus Brachypteryx, are bation, long-term expectations must be contrasted with the time 658

618 specialized to habitats that are naturally patchy and isolated to reach them. We thus advocate computation of the half-time 659

619 (Robin et al. 2015). Understanding the consequences of such to equilibrium τ, which gives a sense of the time needed for nu- 660

620 patchiness from a network perspective can provide insights on cleotide diversity and FST to approach their equilibrium values. 661

621 mitigation for ongoing habitat fragmentation. Interestingly, we find that τ is strongly correlated with the mean 662

663 622 In species such as the Indian tiger, conservation might require vertex degree; it would be worthwhile to assess the potential of

| | 664 623 management strategies that include assisted migration (Kelly V as a predictor of τ in larger networks.

624 and Phillips 2016). In such contexts, strategies can be designed

625 for maximizing genetic variation, by giving the existing set of Conclusion This work is a step toward developing a general 665 626 subpopulations the most favorable connections possible. For theory that links network topology and patterns of genetic vari- 666 627 designing such strategies, our approach provides an alternative ation. Small motifs are the building blocks of complex networks 667 628 to spatially explicit landscape-genetic models focused on effects (Milo et al. 2002); thus, for large networks, counting the number 668 629 in physical space, enabling assessment of the potential genetic of appearances of each 3- or 4-vertex motif can give an initial 669 630 consequences of alternative network motifs. idea of the fine-scale structure of the population. The results we 670

have derived make it possible to link this fine-scale structure 671 631 Extensions Several assumptions of our model could be relaxed to local patterns of variation. For example, if we find many in- 672 632 to make it more closely match natural systems. We only con- stances of motifs 17 or 18, we might conclude that the network is 673 633 sidered homogeneous subpopulations, with equal sizes and dense and thus has large diversity, low FST, long time to equilib- 674 634 similar migration rates in all non-isolated subpopulations, and rium after a perturbation and a high robustness to perturbation. 675 635 equilibrium genetic variation. Heterogeneous sizes are com- On the other hand, if we find many instances of lower-density 676 636 mon in environments with varying habitat quality (Dias 1996), motifs 9, 10, and 14, we might reach the opposite conclusions. 677 637 and migration rate differences are common in species that dis-

638 perse passively, such as by currents or wind (Vuilleumier and The detection of motifs that are overrepresented in certain 678

639 Possingham 2006). Permitting heterogeneity would increase the types of network (e.g. ecological, neural, protein-interaction) has 679

640 number of motifs possible for fixed numbers of subpopulations, been used to identify network classes that share common prop- 680

641 potentially introducing source-sink dynamics (Dias 1996). These erties despite describing different data types (Milo et al. 2002; 681

642 dynamics are expected to influence robustness to loss of a con- Alon 2007). Further work could consider motifs that are overrep- 682

643 nection or subpopulation: we expect nucleotide diversity to be resented in population networks, to assess whether population 683

644 robust to loss of a connection to a sink subpopulation or loss of networks have a shared “motif signature” or if certain networks 684

645 a sink subpopulation itself, because such subpopulations might are more common in certain habitats (e.g. marine, river, terres- 685

646 be small with relatively low nucleotide diversity. Conversely, trial). Such an approach could help identify similarities between 686

647 we expect nucleotide diversity to be less robust to the loss of population networks and other types of biological networks. 687

648 a source subpopulation, as these subpopulations are typically Our results can potentially be extended to larger networks, and 688

649 larger and more diverse. it could be assessed how global patterns in genetic diversity and 689

650 Non-equilibrium genetic diversity is common in species that FST can be predicted from information on the occurrence of small 690

651 face frequent environmental changes, and it can result in tran- motifs. Such an extension will become increasingly valuable as 691

652 sient levels of genetic variation that strongly differ from the more empirical studies sample genomic datasets from broad 692

653 equilibrium and that persist for many generations (Alcala et al. geographical scales with fine-grained sampling resolution. 693

14 Alcala et al. bioRxiv preprint doi: https://doi.org/10.1101/190249; this version posted September 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Population Biology 56: 1–10. 734 694 Acknowledgments

Garroway, C. J., J. Bowman, D. Carr, and P. J. Wilson, 2008 Ap- 735 695 We thank G. Greenbaum for comments. Part of this work was plications of graph theory to landscape genetics. Evolutionary 736 696 completed when UR was a visitor at the Stanford Center for Applications 1: 620–630. 737 697 Computational, Evolutionary, and Human Genomics (CEHG). Greenbaum, G., A. R. Templeton, and S. Bar-David, 2016 Infer- 738 698 We acknowledge support from NSF grant DBI-1458059, a CEHG ence and analysis of population structure using genetic data 739 699 postdoctoral fellowship, and Swiss National Science Foundation and network theory. Genetics 202: 1299–1312. 740 700 Early Postdoc.Mobility fellowship P2LAP3_161869. Kelly, E. and B. L. Phillips, 2016 Targeted gene flow for conser- 741

vation. Conservation Biology 30: 259–267. 742

701 Literature Cited Kimura, M., 1953 "Stepping Stone" model of population. Annual 743

Report of the National Institute of Genetics Japan 3: 62–63. 744 702 Alcala, N., D. Streit, J. Goudet, and S. Vuilleumier, 2013 Peak Kimura, M., 1969 The number of heterozygous nucleotide sites 745 703 and persistent excess of genetic diversity following an abrupt maintained in a finite population due to steady flux of muta- 746 704 migration increase. Genetics 193: 953–971. tions. Genetics 61: 893–903. 747 705 Alcala, N. and S. Vuilleumier, 2014 Turnover and accumulation Kingman, J. F. C., 1982 On the genealogy of large populations. 748 706 of genetic diversity across large time-scale cycles of isolation Journal of Applied Probability 19A: 27–43. 749 707 and connection of populations. Proceedings of the Royal Soci- Maruyama, T., 1970 Effective number of alleles in a subdivided 750 708 ety of London B: Biological Sciences 281: 20141369. population. Theoretical Population Biology 1: 273–306. 751 709 Alon, U., 2007 Network motifs: theory and experimental ap- Milo, R., S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, 752 710 proaches. Nature Reviews Genetics 8: 450–461. and U. Alon, 2002 Network motifs: simple building blocks of 753 711 Asmussen, S., 2008 Applied probability and queues. Springer-Verlag complex networks. Science 298: 824–827. 754 712 New York. Mondol, S., M. W. Bruford, and U. Ramakrishnan, 2013 Demo- 755 713 Ball, M. C., L. Finnegan, M. Manseau, and P. Wilson, 2010 Inte- graphic loss, genetic structure and the conservation implica- 756 714 grating multiple analytical approaches to spatially delineate tions for Indian tigers. Proceedings of the Royal Society of 757 715 and characterize genetic population structure: an application London B: Biological Sciences 280: 20130496. 758 716 to boreal caribou (Rangifer tarandus caribou) in central Canada. Mondol, S., K. U. Karanth, and U. Ramakrishnan, 2009 Why 759 717 Conservation Genetics 11: 2131–2143. the Indian subcontinent holds the key to global tiger recovery. 760 718 Cox, J. T. and R. Durrett, 2002 The stepping stone model: new PLoS Genetics 5: e1000585. 761 719 formulas expose old myths. Annals of Applied Probability 12: Morrissey, M. B. and D. T. de Kerckhove, 2009 The maintenance 762 720 1348–1377. of genetic variation due to asymmetric gene flow in dendritic 763 721 Dias, P. C., 1996 Sources and sinks in population biology. Trends metapopulations. American Naturalist 174: 875–889. 764 722 in Ecology & Evolution 11: 326–330. Munwes, I., E. Geffen, U. Roll, A. Friedmann, A. Daya, 765 723 Donnelly, P. and S. Tavaré, 1995 Coalescents and genealogical Y. Tikochinski, and S. Gafny, 2010 The change in genetic di- 766 724 structure under neutrality. Annual Review of Genetics 29: 401– versity down the core-edge gradient in the eastern spadefoot 767 725 421. toad (Pelobates syriacus). Molecular Ecology 19: 2675–2689. 768 726 Dyer, R. J., 2015 Population graphs and landscape genetics. An- Natesh, M., G. Atla, P. Nigam, Y. V. Jhala, A. Zachariah, 769 727 nual Review of Ecology, Evolution, and Systematics 46: 327– U. Borthakur, U. Ramakrishnan, Y. V. Jhala, and A. Zachariah, 770 728 342. in press Conservation priorities for endangered Indian tigers 771 729 Dyer, R. J. and J. D. Nason, 2004 Population graphs: the graph through a genomic lens. Nature Sci Rep xx: xx–xx. 772 730 theoretic shape of genetic structure. Molecular Ecology 13: Nath, H. B. and R. C. Griffiths, 1993 The coalescent in two 773 731 1713–1727. colonies with symmetric migration. Journal of Mathematical 774 732 Fu, Y.-X. and W.-H. Li, 1999 Coalescing into the 21st century: Biology 31: 841–852. 775 733 an overview and prospects of coalescent theory. Theoretical

Coalescent theory of migration motifs 15 bioRxiv preprint doi: https://doi.org/10.1101/190249; this version posted September 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

776 Nei, M. and N. Takahata, 1993 Effective population size, genetic vances in Applied Probability 35: 665–690. 818

777 diversity, and coalescence time in subdivided populations. Wright, S., 1951 The genetical structure of populations. Annals 819

778 Journal of Molecular Evolution 37: 240–244. of Eugenics 15: 323–354. 820

779 Notohara, M., 1990 The coalescent and the genealogical process

780 in geographically structured population. Journal of Mathe-

781 matical Biology 29: 59–75.

782 Read, R. C. and R. J. Wilson, 2005 An Atlas of Graphs (Mathematics).

783 Oxford University Press.

784 Ripple, W. J., J. A. Estes, R. L. Beschta, C. C. Wilmers, E. G.

785 Ritchie, M. Hebblewhite, J. Berger, B. Elmhagen, M. Letnic,

786 M. P. Nelson, et al., 2014 Status and ecological effects of the

787 world’s largest carnivores. Science 343: 1241484.

788 Robin, V. V., P. Gupta, P. Thatte, and U. Ramakrishnan, 2015

789 Islands within islands: two montane palaeo-endemic birds

790 impacted by recent anthropogenic fragmentation. Molecular

791 Ecology 24: 3572–3584.

792 Rosenberg, N. A. and M. Nordborg, 2002 Genealogical trees,

793 coalescent theory and the analysis of genetic polymorphisms.

794 Nature Reviews Genetics 3: 380–390.

795 Rozenfeld, A. F., S. Arnaud-Haond, E. Hernández-García, V. M.

796 Eguíluz, E. A. Serrão, and C. M. Duarte, 2008 Network anal-

797 ysis identifies weak and strong links in a metapopulation

798 system. Proceedings of the National Academy of Sciences

799 U.S.A. 105: 18824–18829.

800 Slatkin, M., 1987 The average number of sites separating DNA

801 sequences drawn from a subdivided population. Theoretical

802 Population Biology 32: 42–49.

803 Slatkin, M., 1991 Inbreeding coefficients and coalescence times.

804 Genetical Research 58: 167–175.

805 Strobeck, C., 1987 Average number of nucleotide differences in

806 a sample from a single subpopulation: a test for population

807 subdivision. Genetics 117: 149–153.

808 Vuilleumier, S. and H. P. Possingham, 2006 Does colonization

809 asymmetry matter in metapopulations? Proceedings of the

810 Royal Society of London B: Biological Sciences 273: 1637–1642.

811 Wakeley, J., 1998 Segregating sites in Wright’s island model.

812 Theoretical Population Biology 53: 166–174.

813 Wilkinson-Herbots, H. M., 1998 Genealogy and subpopulation

814 differentiation under various models of population structure.

815 Journal of Mathematical Biology 37: 535–585.

816 Wilkinson-Herbots, H. M., 2003 Coalescence times and FST val-

817 ues in subdivided populations with symmetric structure. Ad-

16 Alcala et al. bioRxiv preprint doi: https://doi.org/10.1101/190249; this version posted September 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

821 Appendix A. Deriving expected coalescence times 3-vertex motifs 843

822 Expected coalescence times can be obtained by first-step analysis. In the case of three subpopulations (Fig. A1), eq. A.1 becomes 844

823 The expected coalescence time of all states (ij) (eq. 4), where i

824 ranges in [1, 2, ... , K − 1] and j ranges in [i, i + 1, ... , K], can be

825 decomposed into a sum of expected coalescence times (Notohara 1 t¯12 M12 t¯13 M13 t¯11 = + + , M1 + 1 M1 + 1 M1 + 1 826 1990; Wakeley 1998) 1 t¯ M t¯ M t¯ = + 12 21 + 23 23 , 22 + + + K K M2 1 M2 1 M2 1 t¯ = E[t ] + p∗ t¯ , (A.1) 1 t¯ M t¯ M ij ij ∑ ∑ ij,k` k` t¯ = + 13 31 + 23 32 , k=1 `=k 33 M + 1 M + 1 M + 1 {k,`}6={i,j} 3 3 3 1 t¯ M t¯ M t¯ M t¯ M ¯ = + 11 21 + 13 23 + 22 12 + 23 13 t12 M M , 1 + 2 M1 + M2 M1 + M2 M1 + M2 M1 + M2 where E[tij] is the expected time before a change of state, and 2 2 1 t¯11 M31 t¯12 M32 t¯23 M12 t¯33 M13 t¯13 = + + + + , M1 M3 qij,k` + M1 + M3 M1 + M3 M1 + M3 M1 + M3 p∗ = , 2 2 ij,k` K K 1 t¯12 M31 t¯22 M32 t¯13 M21 t¯33 M23 ∑ ∑ qij,βγ t¯23 = + + + + . = = M2 M3 M + M M + M M + M M + M β 1 γ β 2 + 2 2 3 2 3 2 3 2 3 {β,γ}6={i,j} (A.4)

827 where q terms are taken from eq. 1. Eq. A.1 describes a system of We set the values Mij to reflect the network motifs of Figure 1, 845 K 828 (2) + K equations. In the next sections, we describe this system solve the linear system of equation, and report the corresponding 846

829 of equations in the cases of K = 1 to K = 4. expected times in Table 2. For example, for motif 5, M1 = 0, 847

M12 = M21 = M13 = M31 = 0 and M23 = M32 = M2 = M3 = 848

830 1-vertex motif M in all equations, and we obtain the system of equations 849

831 The case of one subpopulation has a single possible initial state,

832 which is given by classical coalescent results (Kingman 1982):

t¯11 = 1, t¯ = 1. (A.2) 1 t¯ M 11 t¯ = + 23 , 22 M + 1 M + 1 1 t¯23 M 833 This quantity directly gives the expected pairwise coalescence t¯ = + , 33 M + 1 M + 1 (A.5) 834 time of motif 1. 1 t¯ = + t¯ , 12 M/2 13 1 t¯ = + t¯ , 835 2-vertex motifs 13 M/2 12 1 t¯22 t¯33 836 In the case of two subpopulations, M = M = M = M = t¯ = + + . 12 21 1 2 23 M 2 2 837 M. Eq. A.1 then simplifies to

1 t¯ M t¯ = + 12 , 11 M + 1 M + 1 Note that this system is equivalent to considering that the iso- 850 1 t¯ M t¯ = + 12 , (A.3) lated subpopulation 1 follows eq. A.2, that subpopulations 2 and 851 22 M + 1 M + 1 3 follow eq. A.3 with labels 2 and 3 in place of 1 and 2, and that 852 1 t¯11 M t¯22 M t¯12 = + + . M 2M 2M coalescence times between subpopulations without migration (1 853

and 2 or 1 and 3) are infinite. 854 838 This system and its solution were derived in Nath and Griffiths

839 (1993). Setting M = 0 and solving the system for t¯11, t¯22, and t¯12 We can solve this system using substitution, by first noting 855

840 gives the expected pairwise coalescence times of motif 2 (Table 1). that t¯22 = t¯33, and by then substituting the expression for t¯22 856

841 Considering M > 0 and solving the system gives the expected into the equation of t¯23. We obtain (t¯11, t¯22, t¯33, t¯12, t¯13, t¯23) = 857

842 pairwise coalescence times of motif 3 (Table1). (1, 2, 2, ∞, ∞, 2 + 1/M) as reported in Table 2. 858

Coalescent theory of migration motifs 17 bioRxiv preprint doi: https://doi.org/10.1101/190249; this version posted September 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

859 4-vertex motifs

860 For four subpopulations, eq. A.1 simplifies to

1 t¯14 M14 t¯13 M13 t¯12 M12 t¯11 = + + + , M1 + 1 M1 + 1 M1 + 1 M1 + 1 1 t¯24 M24 t¯23 M23 t¯12 M21 t¯22 = + + + , M2 + 1 M2 + 1 M2 + 1 M2 + 1 1 t¯34 M34 t¯23 M32 t¯13 M31 t¯33 = + + + , M3 + 1 M3 + 1 M3 + 1 M3 + 1 1 t¯34 M43 t¯24 M42 t¯14 M41 t¯44 = + + + , M4 + 1 M4 + 1 M4 + 1 M4 + 1 1 t¯14 M24 t¯13 M23 t¯11 M21 t¯24 M14 t¯12 = + + + + M1 M2 M + M M + M M + M M + M 2 + 2 1 2 1 2 1 2 1 2 t¯23 M t¯22 M M + 13 + 12 , 13 M + M M + M 1 2 1 2 M /2 ¯ ¯ ¯ ¯ 31 1 t14 M34 t12 M32 t11 M31 t34 M14 M M /2 t¯13 = + + + + 12 23 M1 + M3 M + M M + M M + M M + M 2 2 1 3 1 3 1 3 1 3 11 12 13 t¯ M t¯ M M21/2 M32/2 + 33 13 + 23 12 , + + M /2M M1 M3 M1 M3 21 12 M21/2 M12/2 ¯ ¯ ¯ ¯ 1 t13 M43 t12 M42 t11 M41 t44 M14 M t¯14 = + + + + 23 M1 M4 M + M M + M M + M M + M 2 + 2 1 4 1 4 1 4 1 4 1 M31 M13/2 22 M /2 23 t¯ M t¯ M 32 + 34 13 + 24 12 , 1 M1 + M4 M1 + M4 M32 M23/2 1 t¯24 M34 t¯22 M32 t¯12 M31 t¯34 M24 t¯23 = + + + + M2 + M3 M + M M + M M + M M + M 1 2 2 2 3 2 3 2 3 2 3 coalescence 33 t¯ M t¯ M + 33 23 + 13 21 , M2 + M3 M2 + M3 Figure A1 State diagram of the Markov chain representing 1 t¯ M t¯ M t¯ M t¯ M ¯ = + 23 43 + 22 42 + 12 41 + 44 24 the coalescent process of two lineages sampled in K = 3 t24 M M 2 + 4 M2 + M4 M2 + M4 M2 + M4 M2 + M4 2 2 subpopulations. States appear in gray and correspond to t¯34 M23 t¯14 M21 + + , those presented in Figure 2; transition rates between states M2 + M4 M2 + M4 1 t¯33 M43 t¯23 M42 t¯13 M41 t¯44 M34 appear in black. Mij corresponds to the scaled migration rate t¯34 = + + + + M3 M4 M + M M + M M + M M + M 2 + 2 3 4 3 4 3 4 3 4 between subpopulations i and j. This diagram applies to all t¯ M t¯ M + 24 32 + 14 31 . motifs with K = 3 subpopulations—motifs 4 to 6 in Fig- M3 + M4 M3 + M4 (A.6) ure 1. For example, motif 4 corresponds to the case where

861 Similarly to the case of 3-vertex motifs, we set the values Mij M12 = M21 = M13 = M31 = M23 = M32 = 0, and mo-

862 to reflect the network motifs of Figure 1, solve the system of tif 5 corresponds to the case where M12 = M21 = M and

863 equations using substitution or matrix inversion, and report the M13 = M31 = M23 = M32 = 0.

864 corresponding expected coalescence times in Table 3.

18 Alcala et al.