bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Evidence against tetrapod-wide digit identities and for a limited

2 frame shift in bird wings 3 4 5 Thomas A. Stewart1,2*, Cong Liang3, Justin L. Cotney4, James P. Noonan5, Thomas J. Sanger6, 6 and Günter P. Wagner1,3* 7 8 1Department of Ecology and Evolutionary Biology, Yale University 9 2Minnesota Center for Philosophy of Science, University of Minnesota 10 3Systems Biology Institute, Yale University 11 4Department of Genetics and Genome Sciences, UConn Health 12 5Department of Genetics, Yale School of Medicine 13 6Department of Biology, Loyola University in Chicago 14 *To whom correspondence should be addressed 15 16 17 18 Summary: In crown group tetrapods, individual digits are homologized in relation to a pentadactyl 19 ground plan. However, testing hypotheses of digit homology is challenging because it is unclear 20 whether digits develop with distinct and conserved regulatory states. Here we show 21 dramatic evolutionary dynamism in the gene expression profiles of digits, challenging the notion 22 that five digit identities are conserved across amniotes. Transcriptomics of developing limbs 23 shows diversity in the patterns of genetic differentiation of digits, although the anterior-most digit 24 of the pentadactyl limb has a unique, conserved expression profile. Further, we identify a core set 25 of transcription factors that are differentially expressed among the digits of amniote limbs; their 26 spatial expression domains, however, vary between species. In light of these results, we 27 reevaluate the frame shift hypothesis of avian wing evolution and conclude that only the identity 28 of the anterior-most digit has shifted position, suggesting a 1,3,4 digit identity in the bird wing. 29 30 31 Limbs evolved from paired fins in the Late Devonian, and early tetrapods possessed 32 more than five digits on the fore- and hindlimbs1,2. Later in the tetrapod stem, a pentadactyl 33 pattern stabilized as the ground plan for the limb. Individual digits are homologized between 34 species and between fore- and hindlimbs in reference to this pentadactyl ground plan3. However, 35 it remains controversial whether such hypotheses of identity correspond to distinct developmental 36 programs among the digits (developmental identities), or just the relative position of digits along 37 the limb’s anteroposterior axis (positional identities)4-7. Below we use the symbols D1, D2, etc. to 38 indicate positional identities in the pentadactyl ground plan, rather than to indicate developmental

1 bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

39 identities. 40 The anterior-most digit (D1) (e.g., human thumb) appears to have a distinct 41 developmental identity as compared to the more posterior digits (D2-D5). D1 is marked by a 42 unique gene expression profile—low expression of HoxD11 and HoxD12 and high expression of 43 Zic3 relative to other digits7-9—and it appears able to develop independently of Shh signaling9-11. 44 Additionally, analysis of morphological variation in primates identified a high degree of variational 45 independence of D1 relative to the more posterior digits12. Models of posterior digit identity have 46 been proposed according to the relative exposure of limb bud mesenchymal cells to Shh, which 47 emanates from the zone of polarizing activity prior to digit condensation10,11. However, broadly 48 conserved marker for individual posterior digits have not been identified in the interdigital 49 mesenchyme, the signaling center that patterns digits13,14. For instance, while the combinatorial 50 expression of Tbx2 and Tbx3 is necessary to generate the phenotypes of D3 and D4 in chicken 51 hindlimb15, it is questionable whether these developmental identities are conserved in other 52 species, like mouse, with limited morphological differentiation of the posterior digits. 53 Debates of digit homology are especially challenging to resolve when limbs have fewer 54 than five digits. This problem has been most actively investigated in the tridactyl avian wing, 55 because of the appearance of conflict between paleontological and developmental data16. The 56 fossil record of theropod dinosaurs shows a clear pattern of reduction of the posterior two digits in 57 the lineage leading to birds, yet digits in the wing have been described as developing in the 58 middle three positions of a pentadactyl developmental groundplan17-22. To explain this 59 discrepancy, the frame shift hypothesis was proposed16. It posited that a homeotic shift occurred 60 in the avian stem such that the developmental programs that were once expressed in D1, D2, 61 and D3 are now executed in the digits that develop in positions D2, D3, and D4 respectively. 62 Comparative analyses of gene expression have found support for this hypothesis: in situ 63 hybridization and transcriptomics have revealed similarity between the anterior digit of the adult 64 avian wing, which develops in position D2, and D1 of other limbs7,23. 65 Studies that aim to test digit homology assume that developmental identities (1) were 66 present in a common ancestor, (2) are conserved among the descendent lineages, and (3) are 67 reflected in gene expression profiles. Here we present comparative transcriptomic data from five 68 species that challenge these assumptions among amniotes by documenting a surprising diversity 69 of digital gene expression profiles. Analyses further reveal a core set of genes 70 differentially expressed among digits and suggest a new model for the evolution of the bird wing. 71 72 73 Results 74 Disparity in digit expression profiles

2 bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

75 To characterize the gene expression profiles of digits in pentadactyl amniote limbs, we 76 sequenced RNA of developing digits and their associated posterior interdigital mesenchyme from 77 the forelimbs of mouse, green anole (Anolis), and American alligator (Fig. 1 a). In each of these 78 species, hierarchical cluster analysis (HCA) and principal component analysis (PCA) of the 79 transcriptomes shows a weak signal of sample clustering by digit (Extended Data Fig. 1). The 80 strongest signals of digit-specific expression profiles are observed in D1 of mouse and D4 of the 81 alligator. Groupings of the other digit samples are not well supported. We hypothesized that this 82 result might imply that any signal of gene expression differentiation among digits is overwhelmed 83 by noise when all genes are considered, because most genes are likely irrelevant to the 84 developmental identity of digits. If such a signal exists, we predict that it will be reflected 85 preferentially in the expression of transcription factor and signaling genes. Therefore, we again 86 performed HCA and PCA on the samples of each species, this time using two gene lists: a 87 curated set of known limb patterning genes that are sensitive to Shh signaling (N=159)24, and 88 transcription factor genes (N=2183)25. 89 In mouse and alligator, HCA and PCA of known limb patterning genes results in 90 clustering of samples by digit (Fig. 1 b, c). In mouse, D1 is strongly differentiated from the other 91 digits. In alligator, an anterior cluster, comprised of digits D1, D2, and D3, is differentiated from a 92 posterior cluster, comprised of D4 and D5. By contrast, analysis of known limb patterning genes 93 in Anolis shows weak clustering of samples by digits (Fig. 1 d). This suggests a level of 94 homogeneity among Anolis digits that is not observed in either mouse or alligator. Analysis of all 95 transcription factors for these species yields comparable results to what is recovered for limb 96 patterning genes, but with generally lower adjusted uncertainty values in HCAs (Extended Data 97 Fig. 2). 98 To further test the hypothesis that there is limited gene expression differentiation among 99 Anolis digits as compared to the other pentadactyl limbs sampled, we took advantage of a result 100 from multiple testing theory26: If a differential expression analysis is conducted on two sample 101 types that are not genetically differentiated, then the resultant frequency distribution of p values 102 will be uniform within the [0, 1] interval. On the other hand, if there are truly differentially 103 expressed genes among the compared sample types, then the p value distribution is expected to 104 be biased towards p=0. We conducted differential expression analyses of adjacent digits of the 105 forelimbs of mouse, alligator, and Anolis using EdgeR27,28 and inspected p value distributions 106 (Fig. 2). In Anolis, all comparisons of adjacent digits result in p value distributions that are close to 107 uniform, suggesting that there is very weak, if any, genetic differentiation of adjacent fingers. We 108 note that this result is independent of any p value significance threshold or false discovery 109 correction method. By contrast, most of adjacent pairwise digit comparisons for mouse and 110 alligator show a strongly biased p value distribution, the exception being D2 and D3 in mouse.

3 bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

111 This is consistent with the idea that, in general, most digits in a limb are genetically differentiated, 112 while in Anolis genetic differentiation of digits is minimal or absent. 113 Given that these three limbs differ in their broad patterns of gene expression 114 differentiation of digits, we next asked whether individual genes show divergent or constrained 115 expression patterns across the forelimb in the different species. Specifically, we compared 116 adjacent digits, identified differentially expressed transcription factor genes, and then assessed 117 which differences are shared among mouse, alligator, and Anolis. Of the 1133 transcription factor 118 genes that are one-to-one orthologs in these three species, only four genes are differentially 119 expressed in a conserved pattern among corresponding adjacent digits (Fig. 3). There are three 120 genes that differentiate D1 from D2 (Hoxd11, Hoxd12, and Sall1), and there is one that 121 differentiates D4 from D5 (Tbx15) in all three species. No transcription factors are differentially 122 expressed in all three species between the median digits (i.e., differentiating D2 from D3, or D3 123 from D4). 124 If the homogeneity observed among Anolis forelimb digits is a derived condition, then this 125 could limit our ability to diagnose plesiomorphic developmental identities. Therefore, we also 126 considered the chicken hindlimb, which has digits D1-D4. We reanalyzed published 127 transcriptomic data for hindlimb digits7, mapping reads to a new chicken genome (Galgal5.0)29. 128 HCA and PCA of digits of the chicken hindlimb show a unique pattern of similarity as compared to 129 pentadactyl limbs: an anterior cluster, comprised of D1 and D2, is differentiated from the posterior 130 cluster, comprised of D3 and D4 (Extended Data Fig. 3). Similar to alligator, this pattern of 131 correspondence among the digits is stable across the developmental window sampled (st. 28– 132 31). As before, we tested for differential expression in adjacent digits and identified one-to-one 133 orthologous transcription factor genes that are differentially expressed at the same position 134 between mouse and alligator forelimb and chicken hindlimb (Fig. 4 a, Extended Data Fig. 4). Of 135 the 1049 transcription factor genes, ten differentiate D1 and D2 (n=10), none distinguish D2 and 136 D3, and one (Tbx3) differentiates D3 from D4 in all three species (Fig. 4 a). 137 Overall, data from these four species do not support the hypothesis that amniote digits 138 have conserved developmental identities. The exception appears to be D1, which likely had a 139 distinct developmental program in the most recent common ancestor of amniotes. We further 140 tested whether D1 has a conserved gene expression profile by sequencing RNA from developing 141 human fore- and hindlimb, which were partitioned into D1 and the posterior digital plate (D2-5). Of 142 the ten genes identified above as differentiating D1 and D2, six show conserved patterns of 143 expression change at this position: in all limbs sampled Hand2, Hoxd11, Hoxd12, and Tfap2b are 144 more highly expressed in D2 than D1, and Alx1 and Pax9 are more lowly expressed in D2 than 145 D1 (Fig. 4 b). 146 T-box family genes are predicted to regulate the identities of posterior digits15. Our data 147 provide some support for the hypothesis that this function is conserved across amniotes (Fig. 4

4 bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

148 c). Tbx2, which was previously shown to regulate posterior digit identity in the chicken hindlimb14, 149 shows divergent patterns of expression in the posterior digits of other species. Tbx3 differentiates 150 D3 from D4 in mouse, alligator, and chicken hindlimb, and the likelihood that it was recovered by 151 chance alone is 7.2x10-6 (binomial test); however, it is not differentially expressed at this position 152 in Anolis forelimb. Tbx15 differentiates D4 from D5 among pentadactyl limbs (Fig. 3), and the 153 likelihood that it was recovered by chance alone is 1.9x10-5 (binomial test). 154 Analyses aiming to identify genes that are conserved and differentially expressed at a 155 particular position within the limb (e.g., between D1 and D2 in mouse, alligator, and Anolis) can 156 be affected by the threshold stringency of the false discovery rate (FDR). Binomial tests, as 157 presented above, are one means of accounting for this. We present a second strategy for 158 assessing whether genes identified as differentially expressed in one species behave similarly in 159 other species that does not depend on a particular FDR threshold being reached in all species. 160 Specifically, we consider the genes identified as differentially expressed in one species between 161 adjacent digits (e.g., in mouse, 129 transcription factor genes are identified between D1 and D2). 162 Then we ask how expression fold change between the two digits in the original species compares 163 to expression fold change of the same genes and also a set of randomly selected genes of similar 164 expression levels in other species. To make these comparisons, we calculated Pearson’s 165 correlation of the fold changes between the original genes versus each of the two gene sets 166 (orthologs and random genes) in other species. Results of this approach broadly mirror those 167 described, above. 168 Among the pentadactyl limbs sampled, genes differentially expressed between D1 and 169 D2 behave consistently between species and can be distinguished from random genes, and 170 comparisons of the more posterior digits do not clearly distinguish orthologs from random genes, 171 (Extended Fig. 5 a-d). If chicken hindlimb is considered instead of the Anolis forelimb, we again 172 obtain strong support for conserved behavior of genes at the position D1 and D2, weaker support 173 for conserved gene behavior between D2 and D3, and comparisons at the position D3 and D4 do 174 not clearly distinguish orthologs from random genes (Extended Fig. 5 e-g). Thus, testing for 175 genes that are differentially expressed at the same position can recover genes that behave 176 consistently across species (i.e., Tbx15 between D4 and D5 among pentadactyl limbs, and Tbx3 177 between D3 and D4 between mouse, alligator, and Anolis), while comparisons of all genes 178 differentially expressed for these species might not show evidence of broadly conserved profiles. 179 Conversely, while we might obtain modest evidence for shared behavior among differentially 180 expressed genes (i.e., between digits D2 and D3 among mouse, alligator, and chicken), there 181 might be no individual genes recovered as differentially expressed among the taxa at that 182 position. However, both types of comparisons between digits D1 and D2 paint the consistent 183 picture that D1 exhibits a shared digit identity across these limbs. 184

5 bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

185 186 A core set of digit patterning genes 187 Given our result that the gene expression profile of digits is evolutionarily dynamic, we 188 next tested whether a conserved set of genes might pattern amniote autopods, albeit in different 189 spatial patterns. Specifically, we reanalyzed transcriptomic data of mouse, alligator and Anolis 190 forelimbs and chicken hindlimb, conducting ANOVA to test for genes that were differentially 191 expressed between any two digits in the limb, not just adjacent digits. This analysis recovers 192 genes that are differentially expressed between some digits in the limb, but it does not indicate 193 between which digits a gene is differentially expressed. The number of differentially expressed 194 transcription factor genes differs greatly among species: 356 in mouse, 377 in alligator, 34 in 195 Anolis, and 144 in the chicken hindlimb (FDR <0.05, Fig. 5 a). This is consistent with previous 196 results (above) that showed the Anolis forelimb to be more homogeneous than other sampled 197 limbs. Therefore, we focused on transcription factor genes that are one-to-one orthologous 198 between mouse, alligator, and chicken and identified a set of 49 genes that are differentially 199 expressed in these three limbs (Fig. 5 b). We call these conserved differentially expressed genes 200 (CDEGs). The expected number of overlapping genes among these sets by chance alone is 7.57, 201 and the probability of observing an overlap of 49 genes or more by chance is <10-6 (binomial 202 test). Thirteen of the CDEGs are included in the list of limb patterning genes sensitive to Shh 203 signaling24. To assess whether this gene set is biologically meaningful, we performed HCA and 204 PCA on the samples of each species using the 49 CDEGs. In Anolis, we considered the subset 205 (n=42) that are one-to-one orthologs across all four species. In combination, CDEGs can produce 206 unique expression profiles of each digit within a limb (Fig. 5 c) and show patterns similar to those 207 generated by analyses of known limb patterning genes (Fig. 1 b-d, Extended Fig. 3 c). 208 Analysis of amniote limbs showed that targeted gene lists generated either 209 experimentally (i.e., known limb patterning genes24), by (i.e., all transcription 210 factors25), or statistically (i.e., 49 CDEGs), can reveal distinct gene expression profiles among 211 digits of a limb, which are not observed in the full transcriptome. The spatial digit expression 212 profiles of these genes, however, is species specific. In light of these results, we reevaluated the 213 frame shift hypothesis of bird wing origin16. 214 215 216 Reevaluating the frame shift hypothesis 217 The frame shift hypothesis predicts that the three digits of the adult avian forelimb, which 218 we refer to here as D2, D3, and D4 according to their developmental position17-22, will express the 219 developmental programs observed in the digits D1, D2, and D3 of other limbs16. This hypothesis 220 was tested previously by analyzing the transcriptomes of chicken fore- and hindlimb digits7. That

6 bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

221 study found correspondence between forelimb D2 and hindlimb D1, consistent with the frame 222 shift hypothesis. However, correspondence of more posterior digits was not detected7. 223 We re-analyzed published transcriptomic data of digits from the chicken forelimb7 and 224 compared them to digits of the chicken hindlimb. Surprisingly, when the 49 CDEGs are 225 considered, gene expression profiles of forelimb digits D2, D3, and D4 correspond to hindlimb 226 digits D1, D3, and D4, respectively (Fig. 6 a). Analyses of transcription factor genes and known 227 limb patterning genes show a consistent pattern (Extended Data Fig. 6). Similarity between the 228 posterior two digits of the chicken fore- and hindlimb (D3 and D4 in each limb) can also be 229 observed in the expression patterns of numerous individual genes that are known to be involved 230 in the patterning of digits (Fig. 6 b). 231 To assess whether spatial gene expression profiles can be conserved between the fore- 232 and hindlimbs of a species, even when they differ in digit number, we performed in situ 233 hybridization in alligator. We evaluated expression of Tbx2, Tbx3, and Sall1, three transcription 234 factor genes identified as differentially expressed between alligator forelimb D3 and D4. In situ 235 hybridization confirms their expression in the posterior interdigital mesenchyme (Fig. 6 c) and 236 shows conserved positional expression patterns for Tbx3 and Sall1 between the forelimb and 237 hindlimb. It is unclear whether the pattern also holds for Tbx2, where difference in expression 238 level detected from RNA sequencing appear to reflect the proximodistal extent of gene 239 expression. 240 We also tested the frame shift hypothesis by comparing the limbs of chicken to the 241 pentadactyl forelimbs of other species. For each pentadactyl species, PCA were run using the 242 CDEGs (49 in mouse and alligator, and 42 in Anolis), and chicken samples were projected into 243 the reference PCA plane as supplementary observations. CDEGs were used because they can 244 produce digit-specific expression profiles for mouse and alligator forelimb and chicken hindlimb, 245 and because these patterns are reflective of more inclusive gene lists, as described above. These 246 projections show that the digits D2, D3, and D4 of the bird wing consistently fall into regions of the 247 PCA plane corresponding to the digits D1, D3, and D4 of other limbs. Although it is difficult to 248 differentiate D3 and D4 expression in all species, it is clear that D3 of the chicken forelimb does 249 not correspond in its expression profile of these genes to the D2 of the other limbs sampled (Fig 6 250 d). 251 Some have proposed on the basis of Shh expression that the digits in the avian wing are 252 of positions D1, D2, and D330. This model, like the frame-shift hypothesis, predicts wing digits 253 have developmental identities corresponding to the digits D1, D2, D3 of other limbs. The results 254 presented here as well, as the morphological evidence for five digit condensations17-22, suggest 255 Shh expression is not a conserved marker of digit identity or position in limbs with highly reduced 256 digit number. 257

7 bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

258 259 Discussion 260 Serial homologs are repeated body parts, generated by a common developmental 261 program. In the case of digits, chondrogenic condensations are generated by a reaction-diffusion 262 Turing-type mechanism31,32. Serial homologs can be developmentally identical (homomorph 263 parts) or they can assume distinct developmental identities through the differential expression of 264 regulatory genes (paramorph parts)33. The degree to which serial homologs are individuated can 265 be difficult to assess from morphology alone, because the same developmental program can lead 266 to different morphological outcomes depending on the developmental environment34,35. However, 267 detailed analyses of gene expression and regulation can identify developmentally individualized 268 body parts. 269 In this study, we performed a comparative analysis of whole genome expression data to 270 test the hypothesis that digits have conserved developmental identities. In interpreting our data, 271 we acknowledge that gene expression does not demonstrate gene function. Nevertheless, a lack 272 of differential gene expression between digits is evidence of a lack of developmental 273 individuation, and a high level of differential expression (particularly in transcription factor and 274 signaling genes) is evidence for distinct gene regulatory states. 275 The developmental stages studied here were selected on the basis of previous 276 experimental work. Dahn and Fallon13 demonstrated in the chicken hindlimb that genes 277 expressed in the interdigital mesenchyme regulate digit-specific morphologies, including the 278 number of phalanges. Subsequent work in the chicken hindlimb showed that this signaling, in the 279 phalanx-forming region, is active between stages 27 and stage 3014. Here, we analyzed the 280 expression profiles chicken hindlimb digits of stages 28 and 31 and showed that expression 281 profiles of limb patterning genes and transcription factor genes are stable over this developmental 282 window (Fig. 4 b, c; Extended Data Fig. 3 b, c; Fig. 6 b). Thus, signals pertinent to digit patterning 283 continue to be expressed at late stages of limb development, even after phalanges have formed. 284 Analyses of alligator show a consistent pattern: between stages 18 and 19.5, the expression 285 patterns of limb patterning genes and transcription factor genes are stable as assessed by HCA, 286 PCA, and in profiles of genes of interest (Fig. 1 c, Extended Data Figure 2, Fig. 4 b, c, Fig. 6 c). 287 Although not all species were sampled at multiple time points, we argue on the basis of these 288 comparisons in chicken and alligator that it is unlikely our conclusions on the evolution and 289 development of digit identity are biased by temporal dynamism in gene expression within the 290 developmental window studied here. 291 Our analyses show that patterns of regulatory gene expression in digits are evolutionarily 292 dynamic (Fig. 7 a). The developmental identities of digits are evolving across amniotes and can 293 be lineage-specific. The exception is a conserved developmental identity that characterizes the 294 D1 of mouse, alligator and Anolis forelimbs, chicken hindlimb, and human fore- and hindlimbs

8 bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

295 (Fig. 4 b). This digit identity is unlikely to be an edge-effect (i.e., merely a corollary to which digit 296 occupies the most-anterior position in a limb). In the rabbit hindlimb, which has lost the digit D1, 297 this developmental identity is not observed in D2, despite that digit now occupying the anterior- 298 most position in the limb36. Additionally, in the hindlimb of Silkie chicken mutants, which have 299 additional anterior digit on their foot, developmental identity is preserved in the digit of the 300 morphology of the native D1, despite that digit no longer occupying the anterior-most position in 301 the limb23. 302 In contrast to D1 we do not find support for conserved digit identities in the more posterior 303 digits. Among the pentadactyl limbs we studied, no genes consistently differentiate the median 304 digits (D2, D3, and D4) from one another. And when we consider the chicken hindlimb rather than 305 Anolis, because similarity among Anolis digits might be secondarily derived, we find no gene 306 differentiates D2 and D3, and only one gene (Tbx3) differentiates D3 and D4. There is limited 307 evidence for a conserved developmental identity for digit D5. A single gene (Tbx15) is 308 differentially expressed between D4 and D5 among mouse, alligator and Anolis, however more 309 genes are shared between just mouse and alligator (Fig. 3, Extended Data Fig 4 b). 310 Our analyses also identified a core set of regulatory genes, which we call CDEGs, that 311 are differentially expressed among digits, although species differ in which digits differentially 312 express the genes (Fig. 5). We propose that the CDEGs represent a “digit differentiation tool kit” 313 deployed for the individuation of different sets of digits in different lineages, depending on the 314 adaptive needs of the species. Between mouse and human, 28 the 49 CDEGs have 315 demonstrated roles in patterning distal limb skeleton (Extended Data Table 1). Of the CDEGs, 316 only 15 are differentially expressed across the Anolis forelimb. This homogeneity appears to be a 317 derived condition among the taxa sampled, as it is unlikely that the other 34 CDEGs reflect 318 homoplasy between mammals and archosaurs. 319 In Anolis most fingers, though they differ in number of phalanges, lack developmental 320 individuality and, thus, appear to be homomorphic. We consider a number of alternative, non- 321 biological explanations for the unique Anolis pattern; however, these do not adequately explain 322 homogeneity in the data. For example, it is possible that is the limbs were sampled at too-late a 323 stage, after signals pertinent to digit patterning were expressed. We regard this explanation as 324 unlikely because, as discussed above, in limbs sampled at multiple time points gene expression 325 profiles are stable over broad developmental window, through late stages of development. 326 Another possible alternative explanation is that variance among Anolis samples is greater as 327 compared to other data sets, and that this diminished our ability to detect differentially expressed 328 genes. We assessed this possibility in two ways. First, we repeated all differential expression 329 analyses considering only the two most highly correlated samples of each digit for mouse, 330 alligator and Anolis, which consistently had correlation values above 0.99 (Extended Data Fig. 7 331 a). Results of these two-sample comparisons are consistent with analyses of all three samples

9 bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

332 (e.g., compare Fig. 3 and Extended Data Fig. 7 b), indicating that the unique Anolis pattern is not 333 an artifact of sample quality. 334 Second, we evaluated the dispersion values of our samples. Dispersion is a measure of 335 variance among samples that is calculated by the software edgeR. This parameter affects the 336 sensitivity of differential expression analyses (e.g., a set of samples with high dispersion will have 337 low sensitivity in tests of differential expression), and it can be impacted by specimen pedigree37. 338 Anolis embryos were collected from non-siblings, whereas mouse and alligator samples were 339 collected from siblings. As expected, the mean dispersion value of Anolis samples is greater than 340 either mouse or alligator (Extended Data Fig. 8). The Anolis mean dispersion value is consistent 341 with other data sets in which samples were collected across a population37. However, such 342 differences in dispersion cannot explain the unique Anolis pattern. Chicken hindlimb digits, which 343 were also collected from non-siblings and have dispersion values comparable to Anolis 344 (Extended Data Figure 8), show patterns of differential expression comparable to mouse and 345 alligator (Fig. 2, Extended Data Fig. 4 a, Fig. 5 a). Thus, neither timing, sample quality, nor 346 pedigree appears sufficient to explain the Anolis data. It appears that homogeneity among the 347 digits reflects biological reality, and digits in this lineage have undergone secondary 348 homogenization. Other lineages might have similarly experienced loss of digit identities (e.g., 349 ichthyosaur forelimbs), and the secondary homogenization of paramorphic serial homologs has 350 been described in other anatomical systems (e.g., the homodont dentition in cetaceans38 and the 351 snake vertebral column39). 352 Finally, we reassessed the homology of fingers in the bird wing and obtain the novel 353 result that the three digits reflect a combination of translocated digit identities and conserved 354 identities. The anterior-most digit in the chicken wing, although it develops in position D217-22, 355 exhibits a gene expression profile seen in the D1 of the other limbs; this is consistent with 356 previous studies and the frame shift hypothesis7,23. The gene expression profiles of the posterior 357 wing digits (D3 and D4), however, do not show evidence of translocation. This is observed most 358 clearly by comparison to the hindlimb of the chicken, with the pattern recovered when three 359 different gene lists are considered (transcription factors, limb patterning genes, and CDEGs). As 360 discussed above, although we cannot diagnose conserved gene expression profiles for the digits 361 D3 and D4 across amniotes, we obtain indirect evidence for a correspondence of avian digits to 362 the digits D1, D3, and D4 of other amniote limbs (Fig. 6 d). The possibility of a 1-3-4 pattern of 363 digit identity in the bird wing has been proposed previously40 on the basis of experimental 364 studies41. Still, this pattern of correspondence is surprising. It challenges the predominant 365 hypotheses of digit identity and suggests an alternative scenario for how limb development 366 evolved in the lineage leading to Aves (Fig. 7 b). Significantly, it indicates that diagnoses of digit 367 identity from the paleontological record and hypotheses of digit identity based upon gene 368 expression profiles have a more complex relationship than previously anticipated.

10 bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

369 The frame shift hypothesis is an integrative model. It aimed to explain an apparent 370 incongruity between paleontological and neontological data sets by providing a developmental 371 account for evolutionary transformation rooted in a mechanistic basis of homology. Our results 372 show that any such integrative model will be more complicated than previously presumed. Moving 373 forward, we recommend systematic reappraisal of phalangeal and metacarpal characters along 374 the avian stem. It has been proposed that patterns of digit reduction in theropods might be more 375 complex than is generally assumed40. For example, study of the ceratosaur Limusaurus led to the 376 hypothesis that in basal tetanurans metacarpal characters correspond to identities 2-3-4, while 377 phalanges have identities 1-2-341, although specifically how this taxa informs the plesiomorphic 378 avian condition has been contested42. Additionally, we recommend continued, broad taxonomic 379 sampling in studies of limb development. Building expanded, comparative data sets will allow for 380 quantification of homoplasy between species and between the fore- and hindlimbs, which could 381 impact hypotheses of digit identity presented here. Finally, continued functional genetic studies 382 are required to understand how digit-specific phenotypes are regulated and to test the hypothesis 383 that CDEGs play privileged roles in establishing gene regulatory states in the interdigital 384 mesenchyme. 385 The question of how to diagnose the digits of the avian wing is among the oldest in 386 comparative morphology3,43. This study tests several assumptions that underlay many 387 contemporary studies of the homology and developmental identity of digits. Indeed, it is the first to 388 comparatively analyze the full gene expression profiles of digits of different species. Such data, 389 and a willingness to consider hypotheses that previously might have been regarded as heterodox, 390 is required for the testing and refinement of integrative theories on the nature of limbs. 391

11 bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

392 Methods 393 Limbs of each species were sampled after digital condensations have formed and after inter- 394 digital webbing has begun to reduce. RNA was extracted from digits and their associated 395 posterior inter-digital webbing following the dissection strategy shown in Figure 1 a of Wang et 396 al.7. A summary of the taxonomic and tissue sampling strategy is presented in Extended Data 397 Fig. 9. Investigators were not blinded to the group allocation during the experiment or when 398 assessing outcomes. 399 400 Alligator mississipiensis. Fertilized eggs were collected from six nests of wild individuals at the 401 Rockefeller Wildlife Refuge in Grand Chenier, Louisiana (USA) in July 2015 by Dr. Ruth Elsey 402 and colleagues. Eggs were marked with pencil to indicate the side that was facing upwards in the 403 nest so that embryos would not be injured by rotation of during transfer. Eggs were transported to 404 Yale University in mesh wire boxes containing original nesting material, and they were incubated 405 in a temperature-controlled room at 32°C. Eggs were placed on a plastic rack, surrounded with 406 original nesting material. Racks were suspended four inches above the bottom of a 10 gallon 407 aquarium. The base of the aquarium was filled with three inches of water, which was heated to 408 90°F with a submerged aquarium heater. The top of the tank was covered with plexiglass 409 perforated with 1cm diameter holes to allow for airflow. Humidity within the tank was maintained 410 at 90%. 411 Embryos were collected at Ferguson44 st.18 and 19.5. The left and right limbs of ten 412 individuals were dissected at each stage. For each stage, individuals sampled were of a single 413 nest and, therefore, at least half-siblings45. Embryos were extracted under sterile, RNAse-free 414 conditions. Individual digits and the associated posterior interdigital webbing were dissected with 415 fine scissors and forceps and placed immediately in room temperature RNAlater (Sigma-Aldrich). 416 Digits were pooled into a single vial (n=20 digits) and divided into four samples of five randomly 417 selected digits. RNA was extracted from each sample with TRIzol (Thermo Fisher Scientific) 418 following described methods46. For digit D3 stage 18, two of the extractions yielded too little RNA 419 for sequencing approach described below; therefore, there are only two replicates of this sample 420 type. RNA quality was assessed using an Agilent Technologies 2100 Bioanalyzer, and samples 421 with RIN scores above 8.5 were submitted for sequencing at the Yale Genome Sequencing 422 Center. Sample size (three replicates per sample type) was selected for downstream differential 423 expression analyses, according to References 27 and 28. To generate strand-specific 424 polyadenylated RNA libraries, samples were processed as follows: Approximately 500 ng of RNA 425 was purified with oligo-dT beads and the mRNA recovered was sheared by incubation at 94 C. 426 First strand synthesis was performed with random primers, and then second strand synthesis was 427 performed with dUTP to generate strand-specific libraries for sequencing. cDNA libraries were 428 end-paired, A-tailed adapters were ligated, and the second strand was digested with Uricil-DNA-

12 bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

429 Glycosylase. qRT-PCR was performed using a commercially available kit (KAPA Biosystems) to 430 confirm library quality, and insert size distribution was determined with Agilent Bioanalyzer. 431 Samples were multiplexed on an Illumina Hiseq 2000. Each sample was sequenced to a depth of 432 approximately 50 million reads (single-stranded, 75 length). 433 Reads were mapped to the American alligator genome assembly (allMis0.2) with genome 434 assembly described by Green et al.47. Sequenced reads were mapped to the genome using 435 Tophat2 v2.0.6 on Yale University’s Ruddle computing cluster. In Tophat2, reads were first 436 mapped to the transcriptome, and the remaining reads were then mapped to the genome. 437 Mapped reads were assigned to genes with HTSeq v0.5.3p48, which was implemented with 438 Python v2.7.2. In HTSeq, we required that reads be mapped to a specific strand, and to account 439 for reads that mapped to more than one feature, we ran with the setting “intersection-nonempty.” 440 441 Mus musculus. Mice embryos (E13.5) were collected from a pregnant female of the strain 442 C57BL/6J (Jackson Laboratories) in accordance with Yale IACUC #2015-11-483. The female was 443 pregnant with nine embryos. Digits from the left and right forelimbs of each individual were 444 dissected as described for alligator and pooled. From these 18 digits, RNA was extracted for 445 three batches of five digits each. RNA extraction and sequencing methods are the same as 446 described above for alligator, with the exception of sequencing depth (30 million reads were 447 obtained for each mouse sample). Sequenced reads were mapped to the mouse genome 448 assembly GRCm38 with Ensembl annotation v85 and the same Bowtie2 and HTSeq settings as 449 described for alligator. 450 451 Anolis carolinensis. Animals were bred according to published protocols49 and in accordance 452 with Loyola University’s IACUC protocol #1992. Fertilized eggs were collected and transferred to 453 petri dishes containing vermiculate moistened by equal mass water. Embryos were shipped to 454 Yale University and incubated in a Digital Sportsman Incubator (No. 1502) at 26°C. Tissues were 455 extracted and dissected according to methods described for alligator. Stage 1050 embryos were 456 sampled, and RNA was extracted using Qiagen RNeasy Micro Kit. RNA quality was assessed 457 using with a BioAnalyzer. Samples with RIN scores above 9.0 were submitted for sequencing at 458 the Yale Genome Sequencing Center. The RNAseq library was prepared with Clontech’s Ultra 459 Low V4 kit (cat# 634890). Each sample was sequenced to a depth of approximately 30 million 460 reads (single-stranded, 75 base pair length). Sequenced reads were mapped to the Anolis 461 genome assembly (AnoCar2.0, GCA_000090745.1) with Ensembl annotation v85. 462 463 Gallus gallus. Published transcriptomes of the digits of the fore- and hindlimbs of chicken7, were 464 mapped to the newest chicken genome version (GalGal5.0) with Ensembl annotation v86 465 following analytic methods described for alligator.

13 bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

466 467 Homo sapiens. Three individuals of Carnegie stage 1851 were donated to Yale University’s 468 Medical School. The fore- and hindlimbs were sampled, and the anterior-most digit and its 469 posterior interdigital webbing was dissected from the posterior digital plate. Dissections were 470 performed and RNA was extracted and sequenced as previously described43. Limbs at this stage 471 are similar to E12.5 of mouse52. Sequenced reads were mapped to the assembly 472 GRCh37 with Ensembl annotation v82 using the same Bowtie2 and HTSeq settings as described 473 for alligator. 474 475 Hierarchical clustering analysis. To estimate relative mRNA abundance, we calculated 476 transcripts per million (TPM)53 for the genes of a given gene list (i.e., full trancriptome, 477 transcription factors, limb patterning genes, CDEGs). The TPM measurement standardizes for 478 sequencing depth and transcript length. If multiple transcripts are described for a gene, then the 479 median transcript length was used to calculate TPM; these lengths are available as a 480 supplementary data file. TPM measures were normalized by a square root transformation, and 481 hierarchical clustering was performed on the normalized TPM data with the R package 482 “pvclust”54. Clusters were generated from the correlation-based dissimilarity matrix using the 483 average-linked method. Adjusted uncertainty values were calculated from 1000 bootstrapping 484 analysis. 485 If analyses involved comparisons between developmental stages or limbs, a bulk 486 correction was performed with a mean transformation (i.e., mean-centering)7. In these instances, 487 Pearson’s correlation coefficients range from [-1:1], rather than from [0:1]. Negative correlation 488 values arise because after bulk correction, a gene’s expression values is negative for samples 489 with a sqrt(TPM) value less than the mean sqrt(TPM) value of that gene among all samples of the 490 bulk. Bulks were comprised of all samples from a particular stage or all samples of a particular 491 limb. 492 493 Principal component analyses. PCA were performed using the “prcomp” function in R for 494 various gene lists using square root TPMs as normalized measures of relative mRNA abundance. 495 As with HCA, if analyses included samples from multiple stages or from different types of limbs, a 496 bulk correction was performed with a mean transformation (i.e., mean-centering). Bulks were 497 comprised of all samples from a particular stage or all samples of a particular limb. Loading 498 values for samples in PCAs and also bootstrap values, which were calculated using the with the 499 “bootPCA” function of the bootSVD package55 with centerSamples=True and 1000 bootstrap 500 samples), are provided as a supplementary data file. 501

14 bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

502 Differential expression testing of adjacent digits. EdgeR (Release 3.1)27,28 was used to test 503 for differential expression of adjacent digits (e.g., D1 vs. D2) of mouse, alligator, Anolis and 504 chicken. We used function glmFit and glmLRT in EdgeR, which implemented a generalized 505 regression model for differential expression test. Specifically, in alligator and chicken PCA and 506 PCA revealed stable grouping of samples by digit number across stages. Therefore, subsequent 507 analyses consider these data from two stages simultaneously. 508 In Anolis, pairwise testing of samples revealed nonconventional p value distribution, with 509 a decrease near zero and sometimes a bump near 0.5. Because correlation between replicates 510 was lower than what was observed in either mouse or alligator (Extended Data Fig. 7), and 511 because PCA of the full transcriptomes revealed two major clusters of data that did not 512 correspond to biological phenomena (Extended Data Fig. 1 c), we corrected for the artifact of the 513 non-biological clusters by including the first principle component in the regression model in 514 EdgeR. Analyses were also run without this PC1 correction. Results presented in the manuscript 515 are robust to both analytic approaches, although PC1 correction results in discovery of slightly 516 more differentially expressed genes for a given false discovery rate. (e.g., compare Fig. 3 and 517 Extended Data Fig. 10). 518 Following analyses of differential expression, multiple hypothesis testing was accounted 519 for by adjusting p values following the Benjamini-Hochberg method26. We also considered a 520 second correction method, the q-value of Storey56. The major results presented in the study are 521 robust to both methods. Although the Storey method uniformly called more genes as significant at 522 the FDR threshold of 0.05, the same genes are recovered in the center of Venn diagrams (Fig. 3, 523 Fig 4a, Fig 5 b). 524 525 Comparing correlation of fold change between orthologs and random genes. 526 To assess whether genes differentially expressed at a given position in one species are behaving 527 similarly in other species, we compared the relative fold change of these genes to random genes 528 of similar expression level in the other species. For example, between D1 and D2 of alligator 46 529 genes are recovered as differentially expressed among the one-to-one orthologs of the three 530 pentadactyl species sampled (FDR threshold of 0.05) (Fig. 3). For these genes, we calculated the 531 fold change in TPM according to the equation in Fig. 4 b for all three species (e.g., mouse, 532 alligator and Anolis). Next, for each of the 46 genes, we identified the gene most similar in its 533 TPM value at the position of the anterior digit (e.g., for the comparisons of D1 vs D2, we matched 534 the TPM of D1) among the one-to-one orthologous transcription factor genes for the two other 535 species. Then, we calculated to Pearson’s correlation for the vector comprised the gene fold 536 changes from the original species (alligator) and the orthologs of the other species (mouse and 537 Anolis). We also calculated Pearson’s correlation between gene fold changes from the original 538 species (alligator) and the random list of random genes of similar expression level for each of the

15 bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

539 other species. This was repeated for each the three species, and if a limb was sampled at 540 multiple time points, for each time point. Finally, to assess whether at a given position 541 orthologous genes could be distinguished in their behavior to random genes, we compared the 542 means of these correlations using two tests: t-test and a Mann-Whitney U test. 543 544 Differential expression testing between all digits. EdgeR was also used to test for genes that 545 differed between any combination of digits within a limb. This was done by specifying multiple 546 coefficients to function glmLRT. As with the pairwise tests, we considered both stages of alligator 547 and chicken simultaneously and included the first principle component in the regression model for 548 Anolis. Genes identified as CDEGs for each species are available as supplemental information. 549 550 Transcription factors. To identify transcription factor genes, we utilized a published atlas of 551 human and mouse transcription factors25. The published gene IDs were matched with 552 human Ensembl gene IDs in Ensembl assembly v85 using BioMart (N=2183). To recover the 553 species-specific lists of transcription factors, two approaches were taken. In mouse and Anolis, 554 orthologous genes were identified using BioMart’s orthology predictions for Ensemble assembly 555 v85. In alligator and chicken, because these species were analyzed using different assembly 556 builds, orthology was determined by matching gene symbols to those of human from Ensembl 557 assembly v85. By this approach, we recovered transcription factors for each species as follows: 558 1838 in mouse, 1563 in Anolis, 1455 in Alligator, and 1217 in chicken. To identify human 559 orthologs of select genes identified by differential expression analyses, we first used gene 560 symbols and then confirmed that the ensemble IDs were consistent across Ensemble 561 assemblies. Genes identified as transcription factors for each species are available as 562 supplemental information. 563 To identify one-to-one orthologous transcription factor genes in mouse, alligator, and 564 Anolis, we used BioMart to generate a list of one-to-one orthologous genes between mouse and 565 Anolis in Ensembl assembly v85 that correspond to the published transcription factor Entrez 566 IDs31. This gene list was then matched to alligator and chicken by gene symbol to recover 567 transcription factor genes that are one-to-one orthologs in multiple species. 568 569 Limb patterning genes. A Ph.D. dissertation by Carkett24 identified genes that are sensitive to 570 Shh signaling by experimental pertubations and in silico analyses. From these studies, a 571 summary list of genes that pattern the autopod was produced (pg. 172). This gene list includes 572 transcription factor and signaling genes. These gene symbols were matched with each species to 573 identify the subset of genes present in the genome assemblies that we considered for mouse 574 (n=151), alligator (n=142), Anolis (n=140), and chicken (n=136). Genes identified as limb 575 patterning genes for each species are available as supplemental information.

16 bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

576 577 578 In situ hybridization. RNA from a stage 18 alligator limb was extracted, as described above for 579 sequencing, and cDNA was generated with the High Capacity cDNA Reverse Transcription kit 580 (Applied Biosystems). Primers were designed with Primer357 to amplify fragments of Tbx2 581 (forward, GACCTTGGGCCTTCTCCTAC; reverse, GGGAGTTGTTTGGGGTTTT), Tbx3 (forward, 582 ACCAGGGGTGGATGAACATA; reverse, GCCCTAAAGCAGAGACATGC), and Sall1 (forward, 583 CTCACAGCTCAACAACCCAC; reverse, AAACCACCAGCCTCTACCTC). PCR products were 584 purified with the QIAquick Gel Extraction kit (Qiagen) and cloned with the Topo TA cloning kit 585 (Invitrogen) into the pCR 4-TOPO vector. Vectors were transformed into DH5α-T1 competent 586 cells. Sense and antisense probes were prepared by linearizing plasmid with the restriction 587 endonucleases Not1 or Pme1 and then transcribing the linearized product with T7 or T3 588 polymerase, respectively. Probes are labeled with digoxigenin (Sigma-Aldrich) and hybridized 589 with the alligator embryos at 68°C. Methods for in situ hybridization followed GEISHA Project 590 miRNA Detection Protocol Version 1.1 (http://geisha.arizona.edu/). 591 592 Illustrations. The authors modified existing images of the mouse forelimb skeleton58, alligator 593 forelimb skeleton59, chicken limb skeleton22, and chicken and alligator silhouettes22. Anolis 594 silhouette by Sarah Werning.

17 bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

595 References 596 1 Coates, M. & Clack, J. Polydactyly in the earliest known tetrapod limbs. Nature 347, 66- 597 69 (1990). 598 2 Lebedev, O. A. & Coates, M. I. The postcranial skeleton of the Devonian tetrapod 599 Tulerpeton curtum Lebedev. Zoological Journal of the Linnean Society 114, 307-348, 600 doi:10.1111/j.1096-3642.1995.tb00119.x (1995). 601 3 Owen, R. On the nature of limbs. 132 (John Van Voorst, 1849). 602 4 Coates, M. Ancestors and homology. Acta Biotheoretica 41, 411-424, 603 doi:10.1007/BF00709374 (1993). 604 5 Tabin, C. J. Why we have (only) five fingers per hand: hox genes and the evolution of 605 paired limbs. Development 116, 289-296 (1992). 606 6 Zhu, J. & Mackem, S. John Saunders' ZPA, Sonic hedgehog and digit identity – How 607 does it really all work? Developmental Biology 429, 391-400, 608 doi:http://dx.doi.org/10.1016/j.ydbio.2017.02.001 (2017). 609 7 Wang, Z., Young, R. L., Xue, H. & Wagner, G. P. Transcriptomic analysis of avian digits 610 reveals conserved and derived digit identities in birds. Nature 477, 583–586, 611 doi:10.1038/nature10391 (2011). 612 8 Montavon, T., Garrec, J.-F. L., Kerzberg, M. & Duboule, D. Modelling HOX genes 613 regulation in digits: reverse collinearity and the molecular origin of thumbness. Genes 614 Dev 22, 346-359, doi:10.1101/gad.1631708 (2008). 615 9 Chiang, C. et al. Manifestation of the Limb Prepattern: Limb Development in the Absence 616 of Sonic Hedgehog Function. Developmental Biology 236, 421-435, 617 doi:http://dx.doi.org/10.1006/dbio.2001.0346 (2001). 618 10 Harfe, B. D. et al. Evidence for an expansion-based temporal Shh gradient in specifying 619 vertebrate digit identities. Cell 118, 517-528, doi:10.1016/j.cell.2004.07.024 (2004). 620 11 Scherz, P. J., McGlinn, E., Nissim, S. & Tabin, C. J. Extended exposure to sonic 621 hedgehog is required for patterning the posterior digits of the vertebrate limb. Dev Biol 622 308, 343-354, doi:10.1016/j.ydbio.2007.05.030 (2007). 623 12 Reno, P. L. et al. Patterns of correlation and covariation of anthropoid distal forelimb 624 segments correspond to Hoxd expression territories. Journal of Experimental Zoology 625 Part B: Molecular and Developmental Evolution 310B, 240-258, doi:10.1002/jez.b.21207 626 (2008). 627 13 Dahn, R. D. & Fallon, J. F. Interdigital regulation of digit identity and homeotic 628 transformation by modulated BMP signaling. Science 289, 438-441, 629 doi:10.1126/science.289.5478.438 (2000). 630 14 Suzuki, T., Hasso, S. M. & Fallon, J. F. Unique SMAD1/5/8 activity at the phalanx – 631 forming region determines digit identity. Proc Natl Acad Sci U S A 105, 4185-4190 632 doi:10.1073/pnas.0707899105 (2008). 633 15 Suzuki, T., Takeuchi, J., Koshiba-Takeuchi, K. & Ogura, T. Tbx Genes Specify Posterior 634 Digit Identity through Shh and BMP Signaling. Developmental Cell 6, 43-53, 635 doi:http://dx.doi.org/10.1016/S1534-5807(03)00401-5 (2004). 636 16 Wagner, G. P. & Gauthier, J. A. 1,2,3=2,3,4: A solution to the problem of the homology 637 of the digits in the avian hand. Proc Natl Acad Sci USA 96, 5111-5116, doi:10.1073/ 638 pnas.96.9.5111 (1999). 639 17 Larsson, H. C. E. & Wagner, G. P. Pentadactyl ground state of the avian wing. Journal of 640 Experimental Zoology 294, 146-151, doi:10.1002/jez.10153 (2002). 641 18 Feduccia, A. & Nowicki, J. The hand of birds revealed by early ostrich embryos. 642 Naturweissenschaften 89, 391-393 (2002). 643 19 KundráT, M., Seichert, V., Russell, A. P. & Smetana, K. Pentadactyl pattern of the avian 644 wing autopodium and pyramid reduction hypothesis. Journal of Experimental Zoology 645 294, 152-159, doi:10.1002/jez.10140 (2002). 646 20 Burke, A. C. & Feduccia, A. Developmental Patterns and the Identification of 647 Homologies in the Avian Hand. Science 278, 666-668, doi:10.1126/ science.278.5338.666 (1997).

18 bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

648 21 Welten, M. C. M., Verbeek, F. J., Meijer, A. H. & Richardson, M. K. Gene expression and 649 digit homology in the chicken embryo wing. Evolution & Development 7, 18-28, 650 doi:10.1111/j.1525-142X.2005.05003.x (2005). 651 22 de Bakker, M. A. G. et al. Digit loss in archosaur evolution and the interplay between 652 selection and constraints. Nature 500, 445-448, doi:10.1038/nature12336 653 http://www.nature.com/nature/journal/v500/n7463/abs/nature12336.html - supplementary- 654 information (2013). 655 23 Vargas, A. & Fallon, J. F. Birds have dinosaur wings: the molecular evidence. J Exp 656 Zoolog B Mol Dev Evol 304, 86-90, doi:10.1002/jez.b.21023 (2005). 657 24 Carkett, M. D. Comparative analysis of Sonic Hedgehog signalling and the response to 658 Sonic Hedegehog signalling in vertebrate forelimbs and hindlimbs. PhD thesis, University 659 College London, (2015). 660 25 Ravasi, T. et al. An Atlas of Combinatorial Transcriptional Regulation in Mouse and Man. 661 Cell 140, 744-752, doi:http://dx.doi.org/10.1016/j.cell.2010.01.044 (2010). 662 26 Benjamini, Y. & Hochberg, Y. Controlling the False Discovery Rate: A Practical and 663 Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B 664 (Methodological) 57, 289-300, doi:citeulike-article-id:1042553 665 doi: 10.2307/2346101 (1995). 666 27 Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for 667 differential expression analysis of digital gene expression data. Bioinformatics (Oxford, 668 England) 26, 139-40, doi:10.1093/bioinformatics/btp616 (2010). 669 28 McCarthy, J. D., Chen, Y. & Smyth, G. K. Differential expression analysis of multifactor 670 RNA-Seq experiments with respect to biological variation. Nucleic Acids Research 40, 9 671 (2012). 672 29 Warren, W. C. et al. A New Chicken Genome Assembly Provides Insight into Avian 673 Genome Structure. G3: Genes|Genomes|Genetics 7, 109 (2017). 674 30 Towers, M., Signolet, J., Sherman, A., Sang, H. & Tickle, C. Insights into bird wing 675 evolution and digit specification from polarizing region fate maps. Nat Commun 2, 1-7, 676 doi:10.1038/ncomms1437 (2011). 677 31 Newman, S. A. & Frisch, H. L. Dynamics of skeletal pattern formation in developing chick 678 limb. Science 205, 662-668 (1979). 679 32 Raspopovic, J., Marcon, L., Russo, L. & Sharpe, J. Digit patterning is controlled by a 680 Bmp-Sox9-Wnt Turing network modulated by morphogen gradients. Science 345, 681 566-570, doi:10.1126/science.1252960 (2014). 682 33 Wagner, G. P. Homology, Genes and Evolutionary Innovation. (University Press, 2014). 683 34 Bateson, W. Problems of Genetics. (Oxford University Press, 1913). 684 35 Stewart, T. A. The origin of a new fin skeleton through tinkering. Biol. Letts 11, 1-5, 685 doi:10.1098/rsbl.2015.0415 (2015). 686 36 Salinas-Saavedra, M. et al. New developmental evidence supports a homeotic frameshift 687 of digit identity in the evolution of the bird wing. Frontiers in Zoology 11, 33, 1-10, 688 doi:10.1186/1742-9994-11-33 (2014). 689 37 Yoon, S. & Nam, D. Gene dispersion is the key determinant of the read count bias in 690 differential expression analysis of RNA-seq data. BMC Genomics 18, 408, 691 doi:10.1186/s12864-017-3809-0 (2017). 692 38 Rose, K. D. The beginning of the age of mammals. 431 (The Johns Hopkins University 693 Press, 2006). 694 39 Carroll, R. Vertebrate Paleontology and Evolution (Freeman, 1988). 695 40 Xu, X. & Mackem, S. Tracing the evolution of Avian Wing Digits. Curr Biol 23, R538- 696 R544, doi:10.1016/j.cub.2013.04.071 (2013). 697 41 Xu, X. et al. A Jurassic ceratosaur from China helps clarify avian digital homologies. 698 Nature 459, 940–944, doi:10.1038/nature08124 699 https://www.nature.com/articles/nature08124 - supplementary-information (2009). 700 42 Vargas, A., Wagner, G. & Gauthier, J. Limusaurus and bird digit identity. Nature 701 Proceedings 713 (2009). 702 43 Belon, P. L'histoire de la natvre des oyseavx : avec levrs descriptions, & naïfs portraicts 703 retirez du natvrel, escrite en sept livres. (... Gilles Corrozet ... 1555).

19 bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

704 44 Ferguson, M. W. J. in Biology of the Reptilia Vol. 14 (ed Carl Gans) Ch. 5, 329-481 705 (1985). 706 45 Davis, L. M., Glenn, T. C., Elsey, R. M., Dessauer, H. C. & Sawyer, R. H. Multiple 707 paternity and mating patterns in the American alligator, Alligator mississippiensis. 708 Molecular Ecology 10, 1011-1024, doi:10.1046/j.1365-294X.2001.01241.x (2001). 709 46 Chomczynski, P. & Mackey, K. Short Technical Report - Modification of the TRIZOL 710 reagent procedure for isolation of RNA from Polysaccharaide and proteoglycan rich 711 sources. Biotechniques 19, 942-945 (1995). 712 47 Green, R. E. et al. Three crocodilian genomes reveal ancestral patterns of evolution 713 among archosaurs. Science 346, doi:10.1126/science.1254449 (2014). 714 48 Anders, S., Pyl, P. T. & Huber, W. HTSeq - A Python framework to work with high- 715 throughput sequencing data. Bioinformatics, doi:10.1093/bioinformatics/btu638 716 (2014). 717 49 Sanger, T. J., Hime, P. M., Johnson, M. A., Diani, J. & Lsos, J. B. Laboratory protocols 718 for husbandry and embryo collection of Anolis lizards. Herpetological Review 39, 58-63 719 (2008). 720 50 Sanger, T., Losos, J. & Gibson-Brown, J. A developmental staging series for the lizard 721 genus Anolis: a reptilian model system for the integration of evolution, development, and 722 ecology. J. Morph. 269, 129-137 (2008). 723 51 O'Rahilly, R. Early human development and the chief sources of information on staged 724 human embryos. Europ. J. Obstet. Gynec. Reprod. Biol. 9, 273-280 (1979). 725 52 Cotney, J. et al. The Evolution of Lineage-Specific Regulatory Activities in the Human 726 Embryonic Limb. Cell 154, 185-196, doi:http://dx.doi.org/10.1016/j.cell.2013.05.056 727 (2013). 728 53 Wagner, G. P., Kin, K. & Lynch, V. J. Measurement of mRNA abundance using RNA-seq 729 data: RPKM measure is inconsistent among samples. Theory in Biosciences 131, 281- 730 285, doi:10.1007/s12064-012-0162-3 (2012). 731 54 Suzuki, R. & Shimodaira, H. Pvclust: an R package for assessing the uncertainty in 732 hierarchical clustering. Bioinformatics 22, 1540-1542, doi:10.1093/bioinformatics/btl117 733 (2006). 734 55 Fisher, A., Caffo, B., Schwartz, B. & Zipunnikov, V. Fast, Exact Bootstrap Principal 735 Component Analysis for p > 1 million. Journal of the American Statistical Association 111, 736 846-860, doi:10.1080/01621459.2015.1062383 (2016). 737 56 Storey, J. D. The positive false discovery rate: a Bayesian interpretation and the q-value. 738 Ann. Statist. 31, 2013-2035, doi:10.1214/aos/1074290335 (2003). 739 57 Untergasser, A. et al. Primer3 - new capabilities and interfaces. Nucleic Acids Research 740 40, e115 (2012). 741 58 Krebs, O. et al. Replicated anterior zeugopod (raz): a polydactylous mouse mutant with 742 lowered Shh signaling in the limb bud. Development 130, 6037-6047, 743 doi:10.1242/dev.00861 (2003). 744 59 Muller, G. B. & Alberch, P. Ontogeny of the Limb Skeleton in Alligator mississipiensis: 745 Developmental lnvariance and Change in the Evolution of Archosaur Limbs Journalf o 746 Morphology 203, 151-164 (1990). 747

20 bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

748 Data availability 749 The RNA-sequencing data for mouse, alligator and Anolis (including counts of mapped reads), is 750 available on Gene Expression Omnibus (GEO) repository (https://www.ncbi.nlm.nih.gov/geo/) 751 under accession number GSE108337. Sequencing data for human limb samples is available 752 through the database of Genotypes and Phenotypes (dbGaP) under study accession number 753 phs001226.v1.p1. Supplementary data files include (1) unique gene IDs corresponding to all 754 gene lists described, (2) median gene lengths, (3) bootstrap values for all PCA plots, (4) mapped 755 reads for chicken. All code used for analyses is available on GitHub 756 (https://github.com/ThomasAStewart/digit_identity_project) and available upon request. 757 758 Acknowledgements 759 We thank the Wagner lab members and SE Newman for discussions on data interpretation. The 760 human embryonic material was provided by the Joint MRC (G0700089)/Wellcome Trust 761 (GR082557) Human Developmental Biology Resource. Work in the Wagner lab was supported by 762 the John Templeton Foundation (Integrating Generic and Genetic Explanations of Biological 763 Phenomena; ID 46919), by National Institutes of Health grant GM094780 (to J.P.N.), and by 764 National Science Foundation grant 1353691. 765 766 Author Contributions 767 T.A.S. and G.P.W. contributed project design and paper writing. T.A.S., T.J.S., J.L.C., and J.P.N. 768 contributed data for analysis. T.A.S., C.L., and G.P.W. contributed analyses of data. 769 Correspondence and requests for materials should be addressed to [email protected] 770 and [email protected] 771 772 Additional Information 773 Extended Data accompanies this paper. 774 Competing Financial Interests: The authors declare no competing financial interests.

21 bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

a forelimb 1 mm D1 D1 D1 D2 D1 D3 D1 D4 D5 mouse alligator (st.18) alligator (st.19.5) Anolis

b r(√TPM) 100 ● 1.00 50 ●● ● ● ●● ● ● 0.95 ● 0 ● 0.90 ● 79 -50 ● PC2 (14.3%) ● 96 -100 ● 100 -200 -100 0 100 0.08 0.04 0 PC1 (70.1%)

c r(z-√TPM) ● 99 1.0 ● ● 40 ● 99 ● ● 0.0 ● 0 ● 100 ● ● ● ● −1.0 ● −40 ●● 89 PC2 (9.9%) −80 92 96 −120 −100 0 100 PC1 (66.9%) st.18 91 st.18: outline st.19.5: filled st.19.5 1.0 0

r(√TPM) d 85 ● 1.00

0.95 100 ● 89 ● ● ● 0.90 ● ● ● 0 ● ●

PC2 (23.9%) ● ● −100 ● ● −100 0 100 0.04 0 PC1 (35.3%)

Figure 1 | Pentadactyl amniote limbs have disparate patterns of genetic differentiation of digits. (a) Photographs of right forelimbs at the stages sampled, dorsal perspective. Analyses of limb patterning genes show that in (b) mouse and (c) alligator, replicates of each digit form clusters, indicating that the digits have distinct gene expression profiles. By contrast, (c) Anolis digits do not show clear differentiation of gene expression profiles. bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Anolis forelimb mouse forelimb alligator forelimb

750 1500 genes=39 genes=1438 genes=660 TF=7 TF=177 TF=52 500 1000

count 250 500 D1 vs D2

0 0

750 1500 genes=19 genes=25 genes=150 TF=5 TF=8 TF=11 500 1000

count 500

D2 vs D3 250

0 0

750 1500 genes=0 genes=47 genes=814 TF=0 TF=13 TF=65 500 1000

count 500

D3 vs D4 250

0 0

750 1500 genes=35 genes=739 genes=1552 TF=5 TF=77 TF=135 500 1000

count 500

D4 vs D5 250

0 0 0.0 0.5 1.0 0.0 0.5 1.0 0.0 0.5 1.0 p value p value p value

Figure 2 | Differential expression analyses suggest homogeneity among Anolis digits. Histograms show the distribution of p values from expression analyses of adjacent digits. In Anolis, p value distributions that are close to uniform, indicating very weak genetic differentiation of adjacent fingers. Mouse and alligator, on the other hand, generally show strongly biased p value distributions.The number of genes that are identified as differentially expressed at a FDR threshold of 0.05 are noted in each panel as “genes,” and the number of transcription factors among these are noted in each panel as “TF.” bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

D1 vs D2 Anolis (6) Hoxd11 Dlx6 Pitx1 Hoxd12 Dmrt2 Rarb 1 Sall1 Ell2 Sall3 1 1 Tbx15 Hand2 Tbx22 3 Nr2F1 Klf9 Tfap2b 111 28 14 Alx1 Lhx9 Alx4 Pax9 mouse (129) alligator (46)

D2 vs D3 Anolis (5) Hand2 Hoxd11 1 Nr2f1 0 4 Tbx15 0 Tbx18 5 4 2 Tbx3

mouse (7) alligator (10)

D3 vs D4 Anolis (0) Hand2 Shox2 0 Tbx2 0 0 Tbx3 0 4 4 50

mouse (8) alligator (54)

D4 vs D5 Anolis (4) Tbx15 Foxc2 Satb1 Alx1 Gli2 Six2 0 Dach2 Hand2 Tbx3 1 2 1 Etv4 Irx6 Tbx5 42 93 Datch1 Lhx2 Zeb2 15 Dlx5 Mkx Dlx6 Rarb mouse (59) alligator (111)

Figure 3 | Few genes are differentially expressed at the same position between pentadactyl limbs. Venn diagrams of one-to-one orthologous transcription factors genes for mouse, alligator, and Anolis that were identified as differentially expressed between adjacent digits with a FDR threshold of 0.05. bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

a D1 vs D2 D2 vs D3 D3 vs D4 chicken h. (92) chicken h. (16) chicken h. (22) Tbx3 46 13 12 23 8 0 3 2 5 10 0 1 3 44 88 6 21 4 2 5 3

mouse f. (128) alligator f. (45) mouse f. (6) alligator f. (9) mouse f. (9) alligator f. (53)

b D2 vs. D1 D(2-5) vs. D1 c Tbx2 Tbx3 Alx1 20 40 Dlx6 Dmrt2 10 20 Hand2 √ TPM √ TPM Hoxd11

Hoxd12 0 0 Pax9 D1 D2 D3 D4 D5 D1 D2 D3 D4 D5 Pitx1 Tbx15 Sall1 Tfap2b 40 chicken h. (st.28) (st.31) alligator f. (st.18) 20 TPM -TPM (st.19.5)

a b √ TPM Anolis f. mouse f. mouse f. (TPMa+TPMb)/2

human f. (a) human f. (b) human h. (a) human h. (b) Anolis f. -2 0 2 0 D1 D2 D3 D4 D5 chicken h. (st.31) chicken h. (st.28) alligator f. (st.19.5) alligator f. (st.18)

Figure 4 | D1 has a unique, conserved gene expression profile across amniotes. (a) Venn diagrams of one-to-one orthologous transcription factors genes of mouse, alligator, and chicken that are differentially expressed between adjacent digits (FDR threshold of 0.05). (b) Heatmap showing relative expression of genes in D2 and D1. Human transcriptomic data provides additional support for the hypothesis that D1 has a conserved developmental identity across amniotes. (c) Expression levels of T-box family genes across the autopod. Transcript per million (TPM) values presented in panels b and c calculated from the gene list of one-to-one orthologous genes between mouse, alligator, Anolis, chicken, and human. bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Alx1 Hhex Runx2 a b Batf3 Hic1 Sall1 Dach1 Hoxa13 Sall3 mouse f. alligator f. Anolis f. chicken h. Dlx5 Hoxc10 Satb2 chicken h. (129) Dlx6 Hoxd11 Six2 4000 genes=2950 genes=4677 genes=306 genes=1505 Dmrt2 Hoxd12 Sox10 TF=356 TF=377 TF=34 TF=144 Ebf1 Hoxd13 Tbx2 25 En1 Id2 Tbx3 Erg Irx6 Tbx15 14 41 count 2000 49 Ets2 Ldb2 Tfap2b Lhx9 Trps1 106 74 119 Foxc2 Nkx3-2 Twist2 Foxo1 Pax9 Zeb2 0 Foxp2 Pcbd1 Zic2 mouse f. alligator f. Gsc Pitx1 Zic3 0 0.5 1 0 0.5 1 0 0.5 1 0 0.5 1 (243) (283) Hand1 Prdm6 p value p value p value p value Hand2 Runx1t1

r(√TPM) c ● 1.0 ● 0.9 100

0.8 ● 0 ● PC2 (19.7%) ● ● ● ● ● ● ● ●● −200 0 PC1 (73.3%) 0.1 0

r(z-√TPM) 1.0 ● D1 100 ●● D2 97 50 ●●● 0.0 ● D3 100 ●● D4 −1.0 0 D5 93 PC2 (16.2%) ● −50 ● 98 ● 94 ●● −100 0 100 100 st.18 PC1 (68.1%) st.19.5 st.18: outline st.19.5: filled 1.0 0

r(√TPM) 100 ● 1.0 ● 99 ● 0.9 ● ● ● ● 97 0.8 0 ● ● ● ● PC2 (10.2%) ● 61 ● ●● 79 −200 0 PC1 (62.6%) 0.04 0

● 100 r(z-√TPM) 99 1.0 100 ● ●● 100 0.0 0 100 ●● 88

−1.0 PC2 (8.8%) 100 −100 ● ●

1.0 0 −2500 −250 PC1 (86.5%)

Figure 5 | A conserved core set of digit patterning genes in amniotes. (a) ANOVA comparing all combination of digits within limbs. The number of genes identified as differentially expressed at a FDR threshold of 0.05 are noted in each panel as ‘genes.’ The number of transcription factors among these are noted as ‘TF.’ (b) Venn diagram showing one-to-one orthologous transcription factor genes that are differentially expressed across mouse, alligator, and chicken limbs. (c) HCA and PCA show these genes can recover patterns of digit correspondence similar to analyses of limb patterning genes and transcription factors. bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

a r(z-√TPM) 100 1 forelimb 96 0 D2 100 st.28 −1 fore. hind. 100 D1 0 100 D2 D2 PC2 (9.6%) 93 D3 −100 st.31 D4 100 −250 0 250 1 mm PC1 (83.7%) 1.0 0 fore D2 & hind D3 D4 hind D1 D2 b Dlx6 Hoxd11 Hoxd12 Lhx9

25 40 30 20 10 30 20 15 20 10 5 10 10 √ TPM 5 0 0 0 0 D2 D3 D4D1 D2 D3 D4 D2 D3 D4D1 D2 D3 D4 D2 D3 D4D1 D2 D3 D4 D2 D3 D4D1 D2 D3 D4 Tfap2b Tbx2 Tbx3 Sall1 12 4 15 9 3 20 10 6 2 √ TPM 10 5 3 1 0 0 0 0 D2 D3 D4D1 D2 D3 D4 D2 D3 D4D1 D2 D3 D4 D2 D3 D4D1 D2 D3 D4 D2 D3 D4D1 D2 D3 D4 fore. (st.28) hind. (st.28) (st.31) (st.31)

d c Tbx2 Tbx3 Sall1

20 40 100 15 other 10 fore. hind. species 20 10 √ TPM D1 forelimb st.18 5 0 D2 0 st.19.5 0 0 D1 D2 D3 D4 D5 D1 D2 D3 D4 D5 D1 D2 D3 D4 D5 D3 D4 −100 D1 D1 D1 D5

−200 0 200

forelimb 100 100

st.18 D5 st.19 D5 st.19 D5 50 0 D1 D1 D1

0 −100

−200

hindlimb −50 D4 st.18 D4 st.19 st.19 D4 −200 0 200 −200 0 200

Figure 6 | Digits D2, D3, D4 of the chicken forelimb correspond to D1, D3, and D4 of other limbs. This pattern is observed by (a) HCA and PCA of CDEGs for digits of chicken wing and hindlimb. (b) Expression levels of individual transcription factor genes in chicken. (c) In situ hybridization of alligator embryos confirms that the position of differential expression of transcription factors can be conserved between fore- and hindlimbs that differ in digit number. Arrowheads indicate distal-most expression along a developing digit. (d) Projection of chicken digit data upon PCA of CDEGs for mouse, alligator, and Anolis. TPM values shown in panels b and c calculated from the gene list of one-to-one orthologous genes between mouse, alligator, Anolis, chicken, and human. bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

a

D2

D1 D1 D1 D1

forelimb forelimb forelimb forelimb hindlimb

b Shift of D1 developmental identity

plesiomorphic D5 reduced avian condition archosaur condition

Figure 7 | The evolution of digit gene expression in amniotes is highly dynamic. (a) A phylogeny of the taxa sampled by this study and schematic graphs summarizing the relative similarity of digits within limbs, where connections and line thickness reflect degree of similarity in gene expression profiles. (b) Schematic of a limited frame shift model for evolutionary origin of the avian wing in which the developmental identity of D1 was translocated to position D2. bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

r(√TPM) forelimb a 1.000 ● D1 D2 0.995 50 ● ● D3 0.990 25 ● D4 ● 0 ● ● PC2 (21.8%) ● D5 ● ● ● ● −25 ● ● 100 ● −80 −40 0 40 0.008 0.002 PC1 (24.7%)

r(z-√TPM) b 98 1.0 60 ● ● 0.5 ● 0.0 30 ● ● 82 ● 66 −0.5 0 ● −1.0 ●● ● ● 69 ● PC2 (22.4%) −30 ● ●● 82 −60 −50 0 50 100 PC1 (26.6%) st.18 99 st.19.5 st.18: outline st.19.5: filled 0.61.0 0.2

c r(√TPM) 1.00 150 ●

100 ● 0.98 ● ● 50 0.96

0 ● ● ● ●● PC2 (24.7%) ● ● ● ● ● −50 ● −100 0 100 PC1 (48.6%) 0.2 00.1

Extended Data Figure 1 | Analyses of full transcriptomes do not show clustering of samples by digit. (a) In mouse, only D1 forms a cluster in HCA, indicating that similarity can be diagnosed when the expression of all genes is considered; more-posterior digits do not form clusters of replicates. (b) In alligator, a cluster with low bootstrap support is observed for D4 samples. Stage 18 and stage 19.5 samples are differentiated as filled or outlined points in PCA plot. (c) In Anolis, samples do not reveal stable clustering of digit replicates. bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

forelimb a r(√TPM) 40 ● D1 99 1.00 ● ● ● ● D2 ● ●● 0.99 ● ● ● D3 0 ● 0.98 D4 ● D5

PC2 (23.7%) −40 80 ● ● 97 −80 −40 0 40 PC1 (42.3%) 0.008 0

b 50 ● 100 r(z-√TPM) ● 1.0 ●● ●● ● ● 86 0 ● ● ● ● ● ●● 0.0 81 −50 −1.0 −100 PC2 (29.1%) 68 −150 −50 0 50 PC1 (30.7%) st.18 st.19.5 st.18: outline st.19.5: filled 1.2 0.8 0.4 0

c r(√TPM) 1.00 ● ● 40 ● 0.99 ● ● ● ●

0.98 ● 0 ●● ● ● −40 ● PC2 (21.8%) ● ● −80 −50 0 50 100 PC1 (35.2%) 0.008 0

Extended Data Figure 2 | Analyses of all transcription factors show patterns of sample clustering consistent with analyses of limb patterning genes. (a) In mouse, D1 is markedly distinct from the posterior digits. (b) In alligator, two major clusters of digits are observed: (D1, D2, D3)(D4, D5). Stage 18 and stage 19.5 samples are differentiated as filled or outlined points in PCA plot. (c) In Anolis, digits do not show gene expression differentiation when all transcription factors are considered. Broadly, adjusted uncertainty values recovered by HCA are lower for a given cluster (e.g., alligator D4) than when limb patterning genes are analyzed. bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

hindlimb

D2 D1 D1 D2 D1 D3 1 mm D4 st.28 st.31

a full transcriptome ● ● 100 r(z-√TPM) ● 99 1.0 ● ● 88 0 100 ● 99 0.0 ●

−1.0 100 PC2 (20.9%) −40 ● 1.2 0.8 0.4 −200 0 PC1 (46.6%)

b transcription factors ● 100 r(z-√TPM) ● 97 1.0 20 ● ● 100 0.0 98 0 ● 84 −1.0 ● 20 ● 100 PC2 (10.6%) ● 1.0 0.2 −100 0 PC1 (70.3%)

c limb patterning genes 92 r(z-√TPM) ● 99 1.0 ● ● ● 91 0.0 0 ● 100 ● 96 −1.0 PC2 (8.2%) ● 100 −100 ● 1.0 0 −200 0 PC1 (82.9%)

Extended Data Figure 3 | Clustering analyses of chicken hindlimb digit transcriptomes. PCA, heatmap of Pearson’s correlations and HCA of (a) the full transcriptome and (b) transcription factor genes, (c) and known limb patterning genes. Each digit is represented by two data points, which correspond to one sample from stage 28 and another from stage 31. Clustering analyses show stable gene expression profiles for individual digits over this developmental window, even after phalangeal number has been established. bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

a b chicken hindlimb D1 vs D2 Alx1 Bcl11b Lhx9 Hlx Prrx2 chicken h. (92) Dlx6 Bnc1 Rarb Hoxa11 Sim2 1500 Dmrt2 Dach2 Sall3 Hoxd13 Sox10 genes=890 49 Hand2 En1 Tbx22 Irx6 Tbx3 TF=102 1000 25 8 Hoxd11 Hoxc11 Dach1 Ldb2 Tbx15 10 Hoxd12 Lmo3 Dlx5 Msx2 Twist2 87 21 count 500 6 Pax9 Pcbd2 Ebf2 Nfatc2 Zic2 D1 vs D2 Pitx1 Sall4 Foxc2 Nkx3-2 Zic3 mouse f. (128) alligator f. (45) Sall1 Ell2 Hand1 Pbx3 Znf423 0 Tfaptb Klf9 Hic1 Pcbd1

1500 D2 vs D3 genes=45 TF=17 chicken h. (16) Dlx6 1000 Hoxc11 13 Hoxd11 count 500 Tbx18 D2 vs D3 0 3 0 Tbx3 4 2 4 0 mouse f. (6) alligator f. (9) 1500 genes=102 TF=24 D3 vs D4 Tbx3 1000 chicken h. (22) Hoxc10 Hand2

count 500 12 Hoxc11 Shox2 D3 vs D4 Nr2f1 Tbx2 2 6 1 Pax9 Hand1 0 3 3 43 Sall1 Zic2 0.0 0.5 1.0 Twist2 p value mouse f. (9) alligator f. (53)

D4 vs D5 Alx3 Hand2 Six2 chicken h. (na) Dach1 Hoxb2 Tbx15 Dlx5 Irx6 Tbx3 na Dlx6 Lhx2 Tbx5 na na na Foxc2 Mkx Zeb2 Gli2 Rarb Zic2 I 44 20 86 Hand1 Satb1 mouse f. (64) taOlligator f. (106)

Extended Data Figure 4 | Differential expression analyses of adjacent chicken hindlimb digits. (a) p value distrubitions for pairwise tests of differential expression. The number of genes that are identified as differentially expressed at a FDR threshold of 0.05 are noted in each panel as “genes” and of these the number of transcription factors are noted in each panel as “TF.” (b) Venn diagrams showing the genes that are one-to-one orthologous transcription factors and differentially expressed in each species to a FDR threshold of 0.05. bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

mouse, alligator, and Anolis mouse, alligator, and chicken

a t test: e 3 t test: p value = 0.022, 4 p value = 3.587x10-4, 2 t = 2.543, df = 15.383 t = 4.004, df = 31.166

count Mann-Whitney U test: 1 count 2 Mann-Whitney U test: p value = 0.002, -4 D1 vs D2 D1 vs D2 p value = 4.144x10 , W = 74 0 0 W = 247.5 −1.0 −0.5 0.0 0.5 1.0 −1.0 −0.5 0.0 0.5 1.0

b 2 f t test: 6 t test: p value = 0.939, p value = 0.058, t = 0.078, df = 17.907 4 t = 1.970, df = 30.253 1 count Mann-Whitney U test: count 2 Mann-Whitney U test: D2 vs D3 p value = 0.222, D2 vs D3 p value = 0.056, 0 W = 55 0 W = 200.5 −1.0 −0.5 0.0 0.5 1.0 −1.0 −0.5 0.0 0.5 1.0

g c 2 t test: 4 *only mouse t test: p value = 0.204, and alligator 3 p value = 0.371, t = 1.3686, df = 8.9615 t = 0.908, df = 31.668 1 2 count Mann-Whitney U test: count Mann-Whitney U test: D3 vs D4 p value = 0.937, D3 vs D4 1 p value = 0..344, 0 W = 19 0 W = 172.5 −1.0 −0.5 0.0 0.5 1.0 −1.0 −0.5 0.0 0.5 1.0

d 4 t test: p value = 0.200 t = 1.344, df = 14.436 orthologs 2

count Mann-Whitney U test: random genes D4 vs D5 p value = 0.053, 0 W = 40 −1.0 −0.5 0.0 0.5 1.0 correlation of fold change

Extended Data Figure 5 | Comparing fold change of differentially expressed genes to random genes. Fold change of genes differentially expressed between adjacent digits for one species was compared to those of orthologous genes in other species and also to randomly selected genes of similar expression level for the other species. Comparisons were made among the pentadactyl limbs (a-d), and also considering chicken, rather than Anolis (e-g). Broadly, genes differentially expressed between D1 and D2 behave consistently between species. Among more posterior digits there is limited evidence for conserved behavior. The genes number of genes recovered as differentially expressed at each position for each species are reported in Fig 3. and Extended Data Fig. 4 b. Vertical lines in each plot represent the mean values of correlation among comparisons between the sets of orthologous or random genes. bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

a full transcriptome r(z-√TPM) 1

0 40 fore. hind. 93 D1 −1 0 D2

D3 PC2 (12.5%) −40 D4 −80 −100 0 100 200 st. 28 PC1 (70.9%) st. 31 1.0 0 st.28: outline st.31: filled

r(z-√TPM) transcription factors b 1 76 100 0

80 −1 72 0

80 PC2 (25.1%)

82 −100 100 st. 28 −100 −50 0 50 st. 31 PC1 (51.2%) 1.0 0 st.28: outline st.31: filled fore D2 hind D2 fore & hind & hind D1 D3 & D4 c limb patterning genes r(z-√TPM) 1 94 100 0

96 −1 0 PC2 (9.4%) 100 −100 95

79 −200 0 200 st. 28 PC1 (77.5%) st. 31 st.28: outline st.31: filled 1.0 0.0 fore D2 hind D2 fore & hind & hind D1 D3 & D4

Extended Data Figure 6 | Correspondence of chicken forelimb and hindlimb digits. (a) PCA and HCA of full transcriptomes does not reveal correspondence between digits of the fore- and hindlimb. However, analyses of (b) all transcription factors and (c) limb patterning genes show that the three digits in the avian wing correspond to hindlimb digits D1, D3, and D4. Stage 28 and 31 samples are differentiated as filled or outlined points in PCA plot. bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

a Anolis alligator mouse 1.00 1.00 1.00

0.99 0.99 0.99

0.98 0.98 0.98 st.18 st.19.5 0.97 0.97 0.97 Pearson’s Correlation D1 D2 D3 D4 D5 D1 D2 D3 D4 D5 D1 D2 D3 D4 D5

b Differential expression analyses of the two most-highly correlated replicates

Anolis (3) Hoxd11 Dmrt2 Pax9 Sall1 Ell2 Pitx1 0 Tbx15 Hand2 Rarb D1 vs D2 1 0 Alx1 Hoxd12 Tfap2b 2 Alx4 Klf9 106 29 13 Dlx6 Lhx9 Mouse Alligator (122) (44)

Anolis (13) Hand2 Hoxd11 10 Tbx15 D2 vs D3 0 3 Tbx18 0 Tbx3 5 2 5

Mouse Alligator (7) (10)

Anolis (2) Hand2 Shox2 2 Tbx2 D3 vs D4 0 0 Tbx3 0 4 4 44

Mouse Alligator (8) (48)

Anolis (2) Tbx15 Hand2 Six2 Alx1 Irx6 Tbx3 0 Datch1 Lhx2 Tbx5 D4 vs D5 1 0 Dlx5 Mkx Zeb2 1 Dlx6 Rarb 40 90 14 Gli2 Satb1 Mouse Alligator (56) (105)

Extended Data Figure 7 | Evaluating how differential expression analyses are impacted by variance of replicates. (a) Pearson’s correlation values of sample replicates showing that in Anolis replicates are less highly correlated than the replicates of mouse and alligator; however, for the digits D1, D2, D4 and D5 two of the three replicates are correlated with values comparable to the other species (>0.99). Therefore, to assess whether variance in Anolis was biasing our analyses, tests of differential expression were replicated using only the two most-highly correlated replicates of each digit. (b) Venn diagrams of one-to-one orthologous transcription factors genes identified as differentially expressed between adjacent digits with a FDR threshold of 0.05 when differential expression analyses considered only the two most-highly correlated samples of each digit. bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Anolis mouse alligator st.18 alligator st.19.5 chicken hindlimb

0 0.01 0.10 1.00 Mean Dispersion

Extended Data Figure 8 | Mean dispersion values of the digit transcriptomes. Values calculated by edgeR showing that although pedigree likely impacts tests of differential gene expression, it does not explain the homogeneity of the Anolis digits. bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

D2-D5 D1 D2 D3 D4 D3 D2-D5 D3 D4 D3 D1 D2 D4 D2 D5 D1 D2 D4 D5 D5 D2 D1 D1 D1 D3

D4 Human forelimb and hindlimb Chicken forelimb and hindlimb Alligator forelimb Anolis forelimb Mouse forelimb - Sampled at st.18 - Published data set: Wang et al. 2011 - Sampled at st.18 and 19.5 - Sampled at st.10 - Sampled at st.18 (E13.5) - 2 replicates per sample type, designated - Sampled at st.28 and 31 - 3 replicates per digit - 3 replicates per digit - 3 replicates per digit a and b when plotted on Fig. 2 b. - One sample per digit for each time point - Each replicate contained - Each replicate contained - Each replicate - One sample of each type from Individual - Each sample contained 5 pooled digits 5 pooled digits contained 5 pooled digits 1; one sample of each forelimb type from 20 pooled digits Individual 2; one sample of each hindlimb type from Individual 3

Archosauria

Sauropsida Mammalia

Extended Data Figure 9 | Summary of the data sampled. Illustrations are of the adult skeletons from the dorsal perspective, anterior is left. ‘Replicates’ refer to biological replicates. bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Anolis (5) Hoxd11 Alx4 Lhx9 Hoxd12 Dlx6 Pax9 0 Sall1 Dmrt2 Pitx1 D1 vs D2 1 1 Tbx15 Ell2 Rarb 3 Nr2F1 Hand2 Tfap2b 106 28 12 Alx1 Klf9 Mouse Alligator (122) (44)

Anolis (4) Hand2 Hoxd11 1 Tbx15 D2 vs D3 0 3 Tbx18 0 Tbx3 5 2 5

Mouse Alligator (7) (10)

Anolis (0) Hand2 Shox2 0 Tbx2 D3 vs D4 0 0 Tbx3 0 4 4 44

Mouse Alligator (8) (48)

Anolis (2) Tbx15 Hand2 Six2 Alx1 Irx6 Tbx3 0 Datch1 Lhx2 Tbx5 D4 vs D5 1 0 Dlx5 Mkx Zeb2 1 Dlx6 Rarb 40 90 14 Gli2 Satb1 Mouse Alligator (56) (105)

Extended Data Figure 10 | Differential expression analyses of adjacent digits without PC1 correction of Anolis data. Venn diagrams of one-to-one orthologous transcription factors genes for mouse, alligator, and Anolis that were identified as differentially expressed between adjacent digits with a FDR threshold of 0.05. bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Extended Data Table 1 | Human and mouse phenotypes have been described for 28 of the 49 CDEGs gene human phenptype or syndrome mouse phenotypes Alx1 Camptodactyly [2] polydatyly [4] Dlx5 Split hand/foot malformation [1,3] ectrodactyly, monodactyly, syndactyly [4] Dlx6 brachydactyly, ectrodactyly, syndactyly [4] En1 25 phenotypes including adactyly, ectopic digits, polydactyly, syndactyly, truncation of digits, fused phalanges [4] Ets2 Chitayat syndrome [3] Hand1 hypoplastic limb buds [4] Hand2 19 phenotypes including oligodactyly, polydactyly, abnormal pollex morphology [4] Hic1 Hand-foot-genital syndrome [1] abnormal fore- and hindlimb morphology [4] Hoxa13 Guttmacher syndrome [1,2], Hand-foot- 21 phenotypes including brachydactyly, genital syndrome [1,3], Postaxial clinodactyly, syndactyly [4] hand polydactyly [2] Hoxd11 7 phenotypes including abnormal phalanx, abnormal and fused carpals [4] Hoxd12 16 phenotypes including brachyphalangia, brachydactyly, clinodactyly, oligodactyly [4] Hoxd13 Brachydactyly-syndactyly syndrome [1], 29 phenotypes including brachydactyly, VACTERL association, Brachydactyly type ectrodactyly, polydactyly, polysyndactyly [4] E [2] Nkx3-2 Spondylo-megaepiphyseal-metaphyseal dysplasia [3] Pax9 polydactyly, polysyndactyly [4] Pitx1 Clubfoot and Lower limb malformations [1], 11 phenotypes including brachydactyly, Liebenberg syndrome [3] oligodactyly, and clubfoot [4] Runx2 Brachydactyly [1], MDMHB [3], cleidocranial 10 phenotypes including abnormal phalanx dysplasia (which includes abnormal thumbs morphology [4] and brachydactyly) [4] Sall1 Townes-Brocks Syndrome [1,2,3] , Lenz 10 phenotypes including oligodactyly, preaxial microphthalmia syndrome [2] polydactyly, syndactyly and triphalangia [4] Satb2 2q32-q33 deletion syndrome (which includes clinodactyly archnodactyly, and Talpes equinovarus [4] Sox10 kallmann-syndrome [3], Klein-Waardenburg syndrome [3], PCWH syndrome (which includes Pes cavus) [4] Tbx2 17q23.1q23.2 microdeletion syndrome [2] Polydactyly, postaxial polydactyly [4] Tbx3 Ulnar-mammary syndrome [1], Limb- 16 phenotypes including oligodactyly [4] mammary syndrome [2], post-axial polydactyly [2] bioRxiv preprint doi: https://doi.org/10.1101/224147; this version posted December 1, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Tbx5 Holt-Oram syndrome [1,2] 10 phenotypes including abnormal phalanx morphology [4] Tbx15 Cousin syndrome (which includes 9 phenotypes including abnormal phalanx brachydactyly) [1,2] morphology [4] Tfap2b Char syndrome [1,2] Polydactyly, postaxial polydactyly [4] Trps1 Trichorhinophalangeal syndrome type 1 [1,2], type 2 [2], and type 3 [2] Twist2 Ablepharon macrostomia syndrome [2] Zic2 6 phenotypes mostly restricted to carpals [4] Zic3 VACTERL association [2], Radial abnormalities [2], aplasia/Hypoplasia of the radius [2]

References [1] Stenson et al. The Human Gene Mutation Database (HGMD®): 2003 Update. Hum. Mutat. 21, 577-581 (2003) [2] NIH Genetic and Rare Disease Information Center. http://rarediseases.info.nih.gov. [3] National Library of Medicine (US). Genetics Home Reference [Internet]. https://ghr.nlm.nih.gov (2013). [4] Blake JA et al. Mouse Genome Database (MGD)-2017: community knowledge resource for the . Nucl. Acids Res. 4, D723-D729 (2017).