Supplementary: Inferring livestock movement networks from archived data to support infectious disease control in developing countries

A. Muwonge1,4, P.R. Bessell1, T. Porphyre1,5, P. Motta1,6, G. Rydevik1, G. Devailly1,3, N.F. Egbe2, R.F. Kelly1, I.G. Handel1,4, S. Mazeri1,4, B.M.deC. Bronsvoort1,4

1. The Roslin Institute and the Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Midlothian, EH25 9RG, UK

2. School of Life Sciences, University of Lincoln, Brayford Pool, Lincoln LN6 7TS, United Kingdom

3. GenPhySE, Université de Toulouse, INRAE, ENVT, 31326, Castanet Tolosan, France

4. Epidemiology Economics and Risk Assessment group at The Roslin Institute and the Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Midlothian, EH25 9RG, UK

5. Université Lyon 1, CNRS, VetAgro Sup, Laboratoire de Biométrie Et Biologie Évolutive, Université de Lyon, Villeurbanne Cedex, France

6. The European Commission for the Control of Foot-and-Mouth Disease (EuFMD), Food and Agriculture Organization of the United Nations, Rome, Italy


Materials and methods

Hypothesis and conceptual framework

This comparative analysis was implemented under the following assumptions about livestock production: a) the cattle lifecycle includes three stages, highlighted here as “rearing on farm”, “movement for trading” and ending up “at the slaughterhouse”; and b) the census, empirical and molecular datasets used here represent these three stages respectively. We then generate networks from each dataset using the following methodologies:
1. Derive a cattle movement network as a function of human protein demand using gravity modelling.
2. Derive an empirical network topology from an edge list generated from a cross-sectional study our team conducted (DBSX1).
3. Derive a host-to-host pathogen transmission network using phylodynamic modelling, since pathogens are considered “hitchhikers” on hosts.
4. Derive a random network topology.
The empirical and random networks are used as controls, i.e. the former as the reference (gold standard) and the latter as the negative control (null).

The vast majority of network structures (topologies) are a product of dynamic processes [1,2]; one can therefore think of the resultant network topology as a relic of the contact structure. Based on this contact structure, we can elucidate disease spread by simulation. The novelty here is the ability to repurpose generally archived census and molecular data. Similarities and dissimilarities in topology and simulated disease characteristics between the gravity, molecular and empirical networks, as well as their random equivalents, allow us to examine the following:
a) the amount of overlap in the information captured;
b) the complementary utility, i.e. the extra information each captures;
c) how specific and non-specific each network can be.
All this information can be exploited to support data-driven livestock disease management, especially resource allocation.

Description of data source and context

Empirical data set (For R code see section-A2 in Network_Generation_Code)

The empirical network (EN) was generated using data collated on cattle movements through the livestock trading system across the Adamawa, West and North-West regions of Cameroon. The lists of cattle markets present within these regions were obtained from the Ministry of Livestock, Fisheries and Animal Industries (MINEPIA). Combining this information with the analysis of commercial connections between markets in each region identified a total of 59 livestock markets [18].

Census data summary

The census data used represent approximately 8.85 million head of cattle and 10.3 million humans. The difference between human and cattle populations across subdivisions within regions is shown in Fig S1. It is noteworthy that the ratio of humans to cattle is highest and lowest in the Adamawa and central regions respectively. Furthermore, areas without cattle or humans, or with data missing for one of the two populations, were excluded from our analysis.


Figure S1 Spatial distribution of the (A) human and (B) cattle population, generated using ggplot in R from the Cameroonian census data of 2005-2007 (DBSX3). The color scheme of the legend ranges from dark blue to yellow and represents population ranges of (A) 0-2 million and (B) 0-0.5 million.

Principles behind each network topology

Molecular network topology (For R code see section-A1 in Network_Generation_Code)

Here we exploit the principles of “measurably evolving populations” (MEP) of pathogens [3] to reconstruct a transmission network based on M. bovis (Fig 2).

[Figure S2, Panels 2a-2c. Panel 2a: tables listing isolate IDs with their MIRU-VNTR type and spoligopattern, the pairwise molecular distances derived from them, and the pairwise physical distances read from a map. Panels 2b and 2c: schematic of genotypes A1 and A2 descending from the MRCA (genotype A) and isolated in Location A and Location B, with transmission events, mutation events and the period of interest x-y marked.]

Figure S2 illustrates the approach used to generate the molecular network. Panel A shows how we extracted molecular distance from M. bovis genotypes and physical distance from the host (cattle). Panel B shows how molecular distance and physical distance are related to transmission and mutation events (MEP). Here transmission events are analogous to physical distance (µ) and mutation events are analogous to genetic distance (α). The “window” x-y defines the epidemiological window of interest in space and time [15]. Panel C puts this in a phylogeographic context: the pathogen genotypes are cast in space (µ) and time (α). The two genotypes A1 and A2 evolved from the most recent common ancestor (MRCA); note that they are isolated in two different locations, i.e. locations A and B. From phylogeography, there must be a direct relation between physical distance and molecular distance. It is this linear relationship that defines the data used to construct our molecular network.

To contextualize this, take two genotypes, A1 and A2, recovered from cattle in locations X and Y, and assume the genotypes have a recent common ancestor Ao (Figure S2-Panel B). We can use the molecular and physical distances α and µ to extract data points from DBSX2 that satisfy the linear relationship (Fig S3) and that ideally fall within our “window” of interest µ ~ α (Figure S2-Panel C), i.e. quadrant B (Fig S3). Given our datasets, we assume this window represents 2007 to 2014 and accounts for the period between pathogen transmission, latent infection and the infectious period for cattle in Africa [4]. This is why we use the census data for 2005-2007. In our dataset we define mutation changes as reported by [5], i.e. a mutation event is the difference in steps in a MIRU-VNTR type between any two isolates with the same spoligotype (Fig S2-Panel A) [16]. Physical distance is computed as the linear Euclidean distance between any two sub counties. We therefore use the data points from quadrant B (Fig S3) to generate the undirected molecular network, which we then direct using the molecular diversity at each subdivision.
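The distance calculation described above can be sketched as follows. This is a minimal Python illustration, not the authors' R code (section-A1 in Network_Generation_Code). The isolate records, the coordinates, the step metric (sum of absolute repeat-number differences across loci) and the window bounds `max_genetic` and `max_physical` are all hypothetical assumptions made for the example.

```python
from itertools import combinations
from math import hypot

# Hypothetical isolates: (id, spoligotype, MIRU-VNTR repeat counts, (x, y) of location).
isolates = [
    ("A", "SB0944", (2, 3, 5, 4), (0.0, 0.0)),
    ("B", "SB0944", (2, 3, 5, 4), (1.0, 1.0)),
    ("C", "SB0944", (2, 4, 5, 6), (3.0, 4.0)),
    ("D", "SB0953", (1, 3, 5, 4), (0.5, 0.5)),
]


def miru_steps(a, b):
    """Assumed step metric: sum of absolute repeat differences across loci."""
    return sum(abs(x - y) for x, y in zip(a, b))


def pairwise_distances(records):
    """Genetic and physical distance for every pair sharing a spoligotype."""
    rows = []
    for (id1, sp1, m1, xy1), (id2, sp2, m2, xy2) in combinations(records, 2):
        if sp1 != sp2:
            continue  # only isolates with the same spoligotype are compared
        genetic = miru_steps(m1, m2)
        physical = hypot(xy1[0] - xy2[0], xy1[1] - xy2[1])  # Euclidean distance
        rows.append((id1, id2, genetic, physical))
    return rows


def in_window(rows, max_genetic, max_physical):
    """Keep pairs inside a rectangular window (bounds are hypothetical)."""
    return [r for r in rows if r[2] <= max_genetic and r[3] <= max_physical]


pairs = pairwise_distances(isolates)
undirected_edges = in_window(pairs, max_genetic=4, max_physical=3.0)
```

The retained pairs would form the undirected edge list; directing the edges by molecular diversity, as described in the text, is not covered by this sketch.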

[Figure S3: scatter plots of genetic distance (y-axis) against physical geographic distance (x-axis), split into four panels labelled Quadrant A, Quadrant B, Quadrant C and Quadrant D.]

Figure S3 Relationship between genetic distance (α) and physical distance (µ) in our data. Quadrant B represents the “window” whose data points we used to derive the molecular network topology.

The dataset used for generating the molecular distance [5] contained 25 unique subdivisions, of which 20 were used for deriving the molecular network topology. Four unique spoligotypes and fifty-one unique MIRU-VNTR types were used to generate molecular distances. The spoligotypes used, listed in order of prevalence, were SB0944, SB0953, SB1025 and SB1460.

Gravity network topology (For R code see section-A3 in Network_Generation_Code)

Gravity models have been widely used in economics and are adapted from Newton’s law of gravitation in physics [6]. Here the model is used to measure bilateral cattle movement between any two given subdivisions of Cameroon. We assume that the movement of cattle between any two given subdivisions is directly proportional to their respective difference in cattle and human population and inversely proportional to the square of the Euclidean distance between them.

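[Figure S4: two subdivisions, Location A and Location B, each labelled with its human and cattle population. The equation shown in the figure gives the flow between two subdivisions as (human population of one × cattle population of the other) / (distance between them)^2.]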

Figure S4 illustrates the gravity model. Each circle represents a subdivision, and its size represents the size of the human and cattle populations. The shorter the distance between any two subdivisions, and the larger their populations, the greater the gravitational pull between them and hence the flow of cattle between them.
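A gravity calculation of this kind can be sketched in a few lines. The Python below is an illustration only, not the authors' implementation (their R code is in section-A3 of Network_Generation_Code). It assumes that the flow from an origin to a destination is the origin's cattle population times the destination's human population, divided by the squared Euclidean distance; the direction of flow, the subdivision names, populations, coordinates and distance units are all hypothetical. The 1.25-animal cut-off is the threshold reported in this supplement, applied here with an assumed "greater than or equal" rule.

```python
from math import hypot

# Hypothetical subdivisions: name -> (human population, cattle population, x, y).
subdivisions = {
    "S1": (1000, 200, 0.0, 0.0),
    "S2": (500, 50, 3.0, 4.0),
    "S3": (2000, 10, 300.0, 400.0),
}


def gravity_flows(units):
    """Return {(origin, destination): flow} for every ordered pair.

    Assumed form: flow = cattle_origin * human_destination / distance**2.
    """
    flows = {}
    for origin, (_, cattle_o, xo, yo) in units.items():
        for dest, (human_d, _, xd, yd) in units.items():
            if origin == dest:
                continue
            dist = hypot(xo - xd, yo - yd)
            flows[(origin, dest)] = cattle_o * human_d / dist ** 2
    return flows


THRESHOLD = 1.25  # animals; threshold reported in the source (Figure S5)

flows = gravity_flows(subdivisions)
# Keep an edge when the predicted flow reaches the threshold (assumed rule).
edges = sorted(pair for pair, flow in flows.items() if flow >= THRESHOLD)
```

With these made-up inputs, the two nearby subdivisions exchange large predicted flows in both directions, while the distant one receives an edge only from the subdivision with the largest herd.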


Gravity model threshold determination

[Figure S5: area under the curve (y-axis, approximately 0.50 to 0.70) plotted against the threshold cattle flow required to qualify as a movement (x-axis, 1 to 10000).]

Figure S5 shows the area under the curve (AUC) analysis between the gravity model and the empirical data, with the aim of determining the threshold animal movement required to define an edge (link) between any two nodes in the gravity network. In this case, we notice that 1.25 animals are required to define a link in the gravity model. This threshold is used to infer a national-level gravity network. For R code see Cameroon_Gravity_Model.R
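One way to run such a threshold search is sketched below in Python. The supplement does not state how the AUC was computed, so this sketch makes an assumption: for each candidate threshold the gravity flows are turned into a yes/no edge prediction, and the AUC of that binary classifier against the empirical edges is taken as (sensitivity + specificity) / 2. The flows, the observed edges and the candidate thresholds are hypothetical.

```python
def auc_at_threshold(flows, observed_edges, threshold):
    """AUC of the binary rule 'flow >= threshold' against observed edges."""
    tp = fp = tn = fn = 0
    for pair, flow in flows.items():
        predicted = flow >= threshold
        actual = pair in observed_edges
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return (sensitivity + specificity) / 2


def best_threshold(flows, observed_edges, candidates):
    """Candidate with the highest AUC (first one wins on ties)."""
    return max(candidates, key=lambda t: auc_at_threshold(flows, observed_edges, t))


# Hypothetical predicted flows and empirical (observed) edges.
flows = {
    ("a", "b"): 50.0, ("b", "a"): 200.0, ("b", "c"): 3.0,
    ("a", "c"): 0.5, ("c", "a"): 0.1, ("c", "b"): 0.9,
}
observed = {("a", "b"), ("b", "a"), ("b", "c")}
candidates = [0.1, 1, 10, 100]

best = best_threshold(flows, observed, candidates)
```

Scanning a finer grid of candidates would give a curve of the kind summarised in Figure S5.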

Results

[Figure S6: four panels (Empirical network, Molecular network, Gravity network, Random network), each plotting eigenvector centrality (y-axis) against betweenness centrality (x-axis) for the labelled nodes, with regions marked Dual-function, Pulse-takers and Gate-keepers.]

Figure S6 shows the key actor analysis, a node-level comparison aimed at understanding the role each node plays in the networks.
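The role labels in Figure S6 can be illustrated with a simple quadrant rule. The Python sketch below is an assumption about how such labels might be assigned, not the authors' procedure: it takes centrality values that have already been computed and applies hypothetical cut-offs `b_cut` and `e_cut`. Nodes high on both measures are called dual-function, nodes high only on eigenvector centrality pulse-takers, and nodes high only on betweenness gate-keepers. The node names and values are made up.

```python
def classify_key_actors(betweenness, eigenvector, b_cut, e_cut):
    """Assign a role to each node from two precomputed centrality scores."""
    roles = {}
    for node in betweenness:
        high_b = betweenness[node] >= b_cut
        high_e = eigenvector[node] >= e_cut
        if high_b and high_e:
            roles[node] = "dual-function"
        elif high_e:
            roles[node] = "pulse-taker"
        elif high_b:
            roles[node] = "gate-keeper"
        else:
            roles[node] = "peripheral"  # label used only in this sketch
    return roles


# Hypothetical centrality scores for four nodes.
betweenness = {"N1": 18.0, "N2": 1.0, "N3": 15.0, "N4": 0.5}
eigenvector = {"N1": 0.95, "N2": 0.80, "N3": 0.10, "N4": 0.05}

roles = classify_key_actors(betweenness, eigenvector, b_cut=10.0, e_cut=0.5)
```

In practice the cut-offs might be medians or chosen by eye from the scatter plot; the supplement does not say which.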

As part of the comparison, we simulate infections on the generated networks as described in the materials and methods. For further details on how to implement this in R, see code Epidemic simulation_Code.R

Figure S7 shows the impact of removing a node on the mean number infected at the end of the infection and on the proportion of infections that take off in each network. This analysis aims to identify the most important node in the network (for R code see Epidemic_Simulation_Leave_One_Out).
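A leave-one-out analysis of this kind can be sketched as below. This Python example is not the authors' simulation (see Epidemic_Simulation_Leave_One_Out for their R code). It assumes a very simple outbreak model in which each newly infected node gets one chance to infect each of its out-neighbours with probability `p_transmit`; the model, the toy edge list and all parameter values are hypothetical.

```python
import random


def simulate_outbreak(edges, seed_node, p_transmit, rng):
    """Spread along directed edges; return the set of nodes ever infected."""
    out = {}
    for u, v in edges:
        out.setdefault(u, []).append(v)
    infected = {seed_node}
    frontier = [seed_node]
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in out.get(u, []):
                if v not in infected and rng.random() < p_transmit:
                    infected.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return infected


def mean_final_size(edges, nodes, p_transmit, runs, rng):
    """Average outbreak size over runs seeded at random nodes."""
    total = 0
    for _ in range(runs):
        total += len(simulate_outbreak(edges, rng.choice(nodes), p_transmit, rng))
    return total / runs


def leave_one_out(edges, p_transmit=0.5, runs=500, seed=1):
    """Mean final size after removing each node in turn."""
    nodes = sorted({n for edge in edges for n in edge})
    results = {}
    for removed in nodes:
        kept_edges = [(u, v) for u, v in edges if removed not in (u, v)]
        kept_nodes = [n for n in nodes if n != removed]
        rng = random.Random(seed)  # same random stream for every removal
        results[removed] = mean_final_size(kept_edges, kept_nodes, p_transmit, runs, rng)
    return results


# Hypothetical toy network: D feeds a hub H, which feeds A, B and C.
toy_edges = [("D", "H"), ("H", "A"), ("H", "B"), ("H", "C")]
impact = leave_one_out(toy_edges)
```

The node whose removal gives the smallest mean final size is the most important one under this toy model; in the toy network that is the hub H, because removing it leaves no edges at all.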

Table S1: Comparison between networks by presence or absence of edges


Terminology

Table S2: Definitions for terms used in network analysis

Network characteristics
Density: The ratio of the number of links in a network to the number of possible links.
Clustering coefficient: The ratio of existing links connecting a subdivision’s neighbors to each other to the maximum possible number of such links.
Average path length: The average number of steps it takes to move from one subdivision to another in each of the predicted networks.
Diameter: The linear size of each of the networks, also defined as the longest of all shortest paths in each of the networks.
Assortativity: The propensity of a subdivision to be linked with other subdivisions that are similar in some way.
Reciprocity: A quantitative measure that describes the likelihood of subdivisions in the directed networks being mutually linked.
Transitivity: A measure of the extent to which a link between two subdivisions that are connected is transitive.

Node characteristics
Degree centrality: The number of links that are incident upon each subdivision in a network.
Betweenness centrality: A measure of centrality that quantifies the number of times a subdivision in a network acts as a bridge along the shortest path between two other subdivisions.
Eigenvector centrality: A measure of the influence of a subdivision in a network.
Closeness centrality: A measure of metric distance between any pair of subdivisions in a network, also known as their shortest path.
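Two of the definitions in Table S2 translate directly into arithmetic on an edge list. The Python sketch below is illustrative only. It assumes a directed network without self-loops and takes reciprocity to be the share of directed links whose reverse link also exists, which is one common convention; the edge list is hypothetical.

```python
def density(edges, n_nodes, directed=True):
    """Number of links divided by the number of possible links (no self-loops)."""
    possible = n_nodes * (n_nodes - 1)
    if not directed:
        possible //= 2
    return len(set(edges)) / possible


def reciprocity(edges):
    """Share of directed links (u, v) for which (v, u) is also present."""
    edge_set = set(edges)
    mutual = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)


# Hypothetical directed network on four subdivisions.
example_edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d")]

d = density(example_edges, n_nodes=4)   # 4 links out of 12 possible
r = reciprocity(example_edges)          # 2 of the 4 links are reciprocated
```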

References

1. Ortiz-Pelaez A, Pfeiffer DU, Soares-Magalhaes RJ, Guitian characterize the pattern of animal movements in the initial phases of the 2001 foot and mouth disease (FMD) epidemic in the UK. Prev. Vet. Med. In Press.
2. Chaters GL et al. 2019 Analysing livestock network data for infectious disease control: An argument for routine data collection in emerging economies. Philos. Trans. R. Soc. B Biol. Sci. 374. (doi:10.1098/rstb.2018.0264)
3. Biek R, Pybus OG, Lloyd-Smith JO, Didelot X. 2015 Measurably evolving pathogens in the genomic era. Trends Ecol. Evol. (doi:10.1016/j.tree.2015.03.009)
4. Reyes JF, Tanaka MM. 2010 Mutation rates of spoligotypes and variable numbers of tandem repeat loci in Mycobacterium tuberculosis. Infect. Genet. Evol. 10, 1046–51. (doi:10.1016/j.meegid.2010.06.016)
5. Egbe NF et al. 2017 Molecular epidemiology of Mycobacterium bovis in Cameroon. Sci. Rep. 7. (doi:10.1038/s41598-017-04230-6)
6. Eichengreen B, Irwin DA. 1998 The Role of History in Bilateral Trade Flows. (doi:10.3386/w5565)