Supplementary Information for
Genomic and transcriptomic analyses of the subterranean termite Reticulitermes speratus: gene duplication facilitates social evolution

Shuji Shigenobu, Yoshinobu Hayashi, Dai Watanabe, Gaku Tokuda, Masaru Y Hojo, Kouhei Toga, Ryota Saiki, Hajime Yaguchi, Yudai Masuoka, Ryutaro Suzuki, Shogo Suzuki, Moe Kimura, Masatoshi Matsunami, Yasuhiro Sugime, Kohei Oguchi, Teruyuki Niimi, Hiroki Gotoh, Masaru K Hojo, Satoshi Miyazaki, Atsushi Toyoda, Toru Miura, Kiyoto Maekawa

Corresponding authors: Shuji Shigenobu, Toru Miura, Kiyoto Maekawa
Email: [email protected], [email protected], [email protected]

This PDF file includes:

Supplementary text
Figures S1 to S36
Tables S1 to S28
Legends for Datasets S1
SI References

Other supplementary materials for this manuscript include the following:

Datasets S1


Supplementary Information Text

Supplementary Note: Genes involved in specific biological functions

Following the prediction of genes encoded in the Reticulitermes speratus genome, we manually annotated and investigated the genes of several functional categories that characterize the ecology, evolution, behavior, development and physiology of this subterranean termite. We analyzed 15 categories, of which five (lipocalin, glycoside hydrolase family, lysozyme family, geranylgeranyl diphosphate (GGPP) synthase and the novel secretion gene family TY) are described in the main text. The other 10 categories (sex determination; epigenetics; chemosensory genes; biogenic amines and neuropeptides; juvenile hormone-related genes; ecdysone-related genes; insulin/insulin-like signaling pathway; toolkit genes involved in wing formation; immunity; insecticide target and detoxification genes) are described here. Finally, we report the caste-specific expression of microRNAs (miRNAs).

Sex determination

In insects, sex determination is cell-autonomously controlled by a cascade of RNA splicing, in which mRNAs are alternatively spliced in a sex-specific manner. Although the upstream genes in this cascade differ among taxa, the most downstream key gene, doublesex (dsx), is conserved among insects and also in some crustaceans and chelicerates [1–4]. Since dsx encodes sex-specific transcription factors, the cascade results in sex-specific transcription of downstream genes that are responsible for sex differentiation. In contrast to those of holometabolous insects, the sex determination cascades of hemimetabolous insects, including termites, are poorly understood [5,6]. Thus, our genome research on Reticulitermes speratus provides important information on the gene repertoires related to the sex determination cascades in hemimetabolous insects.

The sex determination cascade might also be responsible for social organization in termites, since some termite species show sex-specific or sex-biased caste ratios, suggesting that some regulatory factors for caste differentiation differ between the sexes [7]. In R. speratus, the sex ratio of workers is known to be nearly equal, whereas those of nymphs and soldiers are female-biased [8]. For example, differentiation into female secondary reproductives was inhibited by a pheromone derived from female primary reproductives [9], suggesting that termites have a mechanism for responding to sex-specific pheromones that regulate ergatoid differentiation. Therefore, the sex determination cascade is one of the most likely candidate regulatory mechanisms for sex-specific caste differentiation, and the annotation of sex determination genes could also help to clarify the regulatory mechanisms underlying caste differentiation in termites. Here, we searched the R. speratus genome for orthologs of genes reported as sex determination genes in other species, and compared the expression levels of those genes between sexes and among castes by transcriptomic analyses.

We selected 27 candidate Drosophila melanogaster genes from those categorized as “sex determination (BSID: 492283)” in the BioSystems database at NCBI (http://www.ncbi.nlm.nih.gov/biosystems/) [10], and selected 4 candidate genes based on previous studies of sex determination in the silkworm Bombyx mori [11–13]. Database searches were performed using the full-length amino acid sequences of the 27 Drosophila genes and 3 Bombyx genes against our Reticulitermes gene model database (RsGM8_pep) with the BLASTP algorithm (see Supplementary Table 12 for the accession number of each query sequence). The search for an ortholog of Bombyx Feminizer, which encodes a small non-coding RNA, was performed with the BLASTN algorithm. These analyses revealed that the R. speratus genome possesses orthologs of 24 of the Drosophila genes and of the 3 Bombyx genes (Supplementary Table 12). Among the major components of the sex determination cascade in Drosophila, orthologs of Sex-lethal (Sxl), transformer (tra), transformer-2 (tra2) and fruitless (fru) were found, whereas, surprisingly, a dsx ortholog was not found. The BLAST search with Drosophila dsx as a query hit three orthologs of the doublesex-mab3 related transcription factor genes (Dmrt11B, Dmrt93B, Dmrt99B), but no dsx ortholog. Insect dsx is a member of the Dmrt gene family, which is conserved among a wide array of phyla [14]. Both dsx and the Dmrt paralogs share a well-conserved DNA-binding domain (DM domain), and some Dmrt genes also play essential roles in gonad development and sexual differentiation outside Insecta [14]. The dsx ortholog may have been lost in the R. speratus genome, and other factors, e.g., Dmrt genes, might have taken its place as the most downstream genes in the sex determination cascade. Alternatively, because the German cockroach Blattella germanica has a conserved dsx ortholog [6], the domain sequences may have diverged during the course of termite evolution.

The R. speratus orthologs of 3 genes (deadpan, groucho, scute) that regulate splicing of the most upstream gene (Sxl) in the Drosophila cascade were duplicated, whereas orthologs of two other such genes (sisterless-A and degringolade) were not found. An ortholog of stand still, which is required for Drosophila germline sex determination, was not found either. Additionally, the ortholog of B. mori Feminizer was not found, while orthologs of the other Bombyx genes, which encode proteins, were found in the R. speratus genome. Although the primary signal for sex determination in R. speratus is unknown, it should be different from those in D. melanogaster (the dose of X-linked signal elements [15]) and B. mori (Feminizer piRNA on the W chromosome [11]).

RNA-seq analysis revealed the expression patterns of 25 out of 30 candidate orthologs (Supplementary Table 12). Expression levels were compared between sexes and among three castes (primary reproductives, soldiers and workers) in two body parts [heads and the remaining parts (thorax + abdomen)] (biological triplicates; NCBI BioProject Accession No. PRJDB5589). Statistical analysis revealed that 14 out of the 25 orthologs showed caste-biased expression patterns, while none showed sex-biased expression (Supplementary Table 12). For example, expression of the Dmrt11 ortholog (RS007930) was higher in reproductive heads, moderate in worker heads, and lower in soldier heads (FDR = 7.85E-11, GLM, Supplementary Fig. 9), and the orthologs of outstretched (os, RS015475, FDR = 1.87E-08) and ovarian tumor (otu, RS009292, FDR = 8.00E-07) were highly expressed in female reproductive bodies (Supplementary Fig. 9). These results suggest that these orthologs are involved in sex determination or in the downstream pathways of sex differentiation.

Our analyses revealed that most orthologs of the sex determination genes identified in other insects are conserved in the R. speratus genome, and that some of them are expressed in a caste-biased manner. However, it remains unknown which of them play roles in the sex determination cascade. To test their roles in sex determination, it should be examined whether these genes are spliced in a sex-specific manner, and whether they actually regulate the expression of sex-specific traits.
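The caste-bias tests in this section are reported as FDR values from a GLM, but the note does not state which FDR procedure was applied. As an illustration only, the sketch below assumes the widely used Benjamini-Hochberg adjustment and applies it to hypothetical raw p-values; the gene names and numbers are placeholders, not data from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four genes (assumed, not from the study).
raw = {"geneA": 0.001, "geneB": 0.02, "geneC": 0.03, "geneD": 0.5}
fdr = dict(zip(raw, benjamini_hochberg(list(raw.values()))))

# Genes called caste-biased at the FDR < 0.05 threshold used in this note.
caste_biased = [gene for gene, q in fdr.items() if q < 0.05]
```

With many genes tested at once, thresholding adjusted values rather than raw p-values keeps the expected proportion of false positives among the reported caste-biased genes below the chosen level.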

Epigenetics: Histone modifying enzymes

Histone posttranslational modifications (PTMs), one of the epigenetic mechanisms, play important roles in regulating gene expression without genomic changes, resulting in alterations of development and behavior in various organisms [16–18]. Since such PTMs can change in response to environmental stimuli [16,19], polyphenic development in insects, including caste differentiation in social insects, is considered to be affected by PTMs [20–23]. For example, when honeybee larvae are fed royal jelly, which contains (E)-10-hydroxy-2-decenoic acid, a compound with histone deacetylase inhibitor activity, they differentiate into queen bees [24]. Furthermore, an association between PTM patterns (especially histone acetylation and methylation) and caste identities was shown in the carpenter ant Camponotus floridanus [25]. These findings suggest that histone acetylation and methylation have important roles in the regulation of caste differentiation in social hymenopterans. In termites, however, such roles of PTMs remain unknown. Here, as a first step toward understanding the roles of PTMs in caste differentiation in R. speratus, we searched for four types of histone-modifying enzyme genes, i.e., genes encoding histone acetyltransferases (HATs), histone deacetylases (HDACs), histone methyltransferases and histone demethylases. In addition, we compared the expression patterns of those genes between sexes and among castes by RNA-seq analysis.

The R. speratus gene models were searched for histone acetyltransferases, deacetylases, methyltransferases and demethylases with the BLASTP algorithm (E-value cutoff of 1e-10) [26], using as queries the protein sequences of those enzymes from Zootermopsis nevadensis [27], which belongs to a different termite family and is phylogenetically distant from R. speratus, and from D. melanogaster. We also investigated the expression levels of those genes by RNA-seq analysis (for the method of the RNA-seq analysis, see the main text).

We identified 69 histone-modifying enzyme genes in R. speratus (17 histone acetyltransferase genes, 13 histone deacetylase genes, 26 histone methyltransferase genes and 13 histone demethylase genes; Supplementary Table 13). The repertoire of these genes was almost identical between R. speratus and Z. nevadensis (Supplementary Table 13), suggesting that the repertoire has been highly conserved in the course of termite diversification.

Our RNA-seq analyses revealed that 47 of the 69 genes showed caste-biased expression patterns, while none of the 69 genes showed significant differences in expression levels between sexes (FDR < 0.05, Supplementary Fig. 10 and Supplementary Table 13). Interestingly, KDM1A (Lsd1, encoding a histone demethylase) and SETDB1 (eggless, encoding a histone methyltransferase), both of which regulate oogenesis in Drosophila melanogaster [28–30], were highly expressed in the thorax and abdomen of queens (Supplementary Fig. 10). Ovarian development in queens is thus suggested to be regulated by PTMs, especially histone methylation and demethylation. Furthermore, DOT1L (grappa, encoding a histone methyltransferase) is required for stress resistance in D. melanogaster (List et al. 2009), and grappa expression levels were relatively high throughout the body in soldiers of R. speratus (Supplementary Fig. 10). This suggests that enhanced stress resistance might be required for colony defense by soldiers.

Generally, aging is considered to be linked to the sirtuin lysine deacetylases (SIRT), and some genes encoding SIRT enzymes are highly expressed in the long-lived reproductive castes of an ant and a termite (SIRT1 and SIRT6 in Harpegnathos saltator; SIRT6 and SIRT7 in Z. nevadensis) [27,31]. In R. speratus, expression levels of SIRT6 and SIRT7 were higher in the body (thorax + abdomen) of female primary reproductives (queens) than in those of male reproductives (kings), workers and soldiers (Supplementary Fig. 10). However, queens derived from alates (= winged imagoes) are suggested to have shorter lifespans than kings [32]. Therefore, at least in R. speratus, SIRT6 and SIRT7 might contribute to egg-laying but not to the extension of lifespan.

These results suggest that the caste-biased expression of genes encoding histone-modifying enzymes may determine caste identities through caste-specific PTMs in R. speratus. Such caste-specific PTMs are presumably established during the caste differentiation process in postembryonic development. Elucidation of the regulatory mechanisms for caste-specific PTMs will help us to understand how the social organization of termites is maintained.
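The homology searches in this section rely on BLASTP with an E-value cutoff of 1e-10. As a minimal sketch of how such a cutoff can be applied after the search, the code below parses BLAST tabular output (the standard 12-column `-outfmt 6` layout) and keeps the best-scoring hit per query. The query and subject identifiers and all hit values are hypothetical, and the note does not describe the authors' actual post-processing.

```python
E_VALUE_CUTOFF = 1e-10  # threshold stated in the text

# Hypothetical hits in BLAST tabular format (-outfmt 6). Columns: qseqid,
# sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send,
# evalue, bitscore. All identifiers and values are made up for illustration.
SAMPLE = """\
queryHAT1\tsubjA\t62.5\t400\t150\t0\t1\t400\t1\t400\t1e-150\t520
queryHAT1\tsubjB\t30.1\t120\t84\t2\t5\t124\t10\t129\t2e-06\t48.1
queryHDAC1\tsubjC\t55.0\t350\t157\t1\t1\t350\t3\t352\t3e-90\t310
queryHDAC1\tsubjD\t58.0\t350\t147\t1\t1\t350\t3\t352\t1e-95\t330
"""


def best_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return {query: (subject, evalue, bitscore)} for hits passing the cutoff."""
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > cutoff:
            continue  # weaker than the E-value threshold
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


hits = best_hits(SAMPLE)
```

Here the second `queryHAT1` hit is discarded because its E-value (2e-06) exceeds the cutoff, and `queryHDAC1` is assigned the subject with the higher bit score.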

Epigenetics: DNA methylation

DNA methylation, the addition of methyl groups to DNA bases, is one of the most important regulatory mechanisms of gene expression. The regulation of gene expression by DNA methylation is considered a candidate proximate mechanism for producing phenotypic plasticity in insects [33,34]. Indeed, in the honeybee Apis mellifera, regulation of DNA methylation has been shown to play an important role in the caste differentiation of queens and workers [35]. Because termites exhibit relatively high levels of DNA methylation among insects [36,37] (also see the main text), DNA methylation deserves more attention in termite biology.

To date, several major factors regulating DNA methylation status have been reported. DNA (cytosine-5)-methyltransferase 1 (DNMT1) methylates the replicated DNA strand based on the methylation status of the template DNA [38,39], and DNA (cytosine-5)-methyltransferase 3 (DNMT3) adds methyl groups to unmethylated DNA regions [40,41]. Ten-eleven translocation methylcytosine dioxygenases (TETs) are involved in the oxidative demethylation of 5-methylcytosine, and thymine-DNA glycosylase (TDG) also contributes to DNA demethylation through the process of base excision repair [42]. Proteins of the methyl-CpG binding domain (MBD) family are capable of binding to methylated CpGs and are involved in the machinery that regulates gene expression [43,44]. Transcriptome sequencing data have suggested that DNMT1, DNMT3, MBD and TET genes are present [45,46] in the genome of Reticulitermes speratus, but their copy numbers and expression levels in different castes remain unknown. In addition to those genes, the copy number and expression level of the TDG gene should also be examined to accelerate the study of DNA methylation in R. speratus.

To identify those genes, we carried out BLASTP searches against the gene models and TBLASTN searches against the genome assembly of R. speratus. As query sequences for the BLAST searches, we used protein sequences of the DNMT1, DNMT3 and MBD genes identified in other insect species (listed in Hayashi et al. [45]) and those of the TET and TDG genes of Drosophila melanogaster (NCBI accession numbers NP_001261344.1 and NP_651925.1, respectively). We applied an E-value threshold of 1e-10 for the BLAST searches. In addition to R. speratus, we performed the BLAST searches for those genes against genomic and protein sequence data of Blattella germanica [47], Periplaneta americana [48], Zootermopsis nevadensis [27], Macrotermes natalensis [49] and Coptotermes formosanus [50]. Furthermore, we also mapped the query sequences of the BLAST searches onto the genome assemblies of the blattodean species using Exonerate [51].

We found one copy of each of those genes in R. speratus (Supplementary Table 14), meaning that R. speratus retains the major components of DNA methylation and demethylation. The other termites examined in this study were also shown to possess all of the genes. However, DNMT3 was not found in the two cockroach species, B. germanica and P. americana. Lineage-specific duplications and deletions of DNMT genes are known to have occurred independently in various taxa (reviewed in Glastad et al. [34]). DNMT3 has been lost in some cockroach lineages while being preserved in termite genomes, suggesting that DNMT3 preservation is important for the evolution and maintenance of social life. Further studies on DNMT3 function are required to test this hypothesis. Our RNA-seq analysis revealed that MBD-like and MBD-R2 were differentially expressed among castes both in the head and in the thorax and abdomen (Supplementary Fig. 11). The expression level of DNMT3 in the head and that of TDG in the thorax and abdomen were also significantly different among castes (Supplementary Fig. 11). Further research is required to examine whether expression differences in those genes contribute to caste differentiation and to the maintenance of caste identity.
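The lineage comparison above can be expressed as a simple presence/absence query. The sketch below encodes the pattern as described in the text (all five genes found in the four termites; DNMT3 not found in the two cockroaches) and extracts the genes retained in every termite but missing from every cockroach. Recording the other four genes as present in the cockroaches is a reading of the text rather than a statement of copy numbers, and the data structure and function name are illustrative.

```python
GENES = {"DNMT1", "DNMT3", "MBD", "TET", "TDG"}

# Presence/absence as summarized in the text (see Supplementary Table 14
# for the actual search results). Group labels are added for illustration.
PRESENCE = {
    "Reticulitermes speratus": ("termite", GENES),
    "Zootermopsis nevadensis": ("termite", GENES),
    "Macrotermes natalensis": ("termite", GENES),
    "Coptotermes formosanus": ("termite", GENES),
    "Blattella germanica": ("cockroach", GENES - {"DNMT3"}),
    "Periplaneta americana": ("cockroach", GENES - {"DNMT3"}),
}


def retained_only_in(group, table=PRESENCE):
    """Genes present in every species of `group` and absent from all others."""
    inside = [genes for g, genes in table.values() if g == group]
    outside = [genes for g, genes in table.values() if g != group]
    shared = set.intersection(*map(set, inside))
    found_elsewhere = set.union(*map(set, outside)) if outside else set()
    return sorted(shared - found_elsewhere)


termite_specific = retained_only_in("termite")
```

On this table the query returns only DNMT3, which is the pattern the paragraph interprets as potentially relevant to social evolution.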

Chemosensory genes

Social insects use diverse chemical signals to communicate various kinds of information and to maintain their colonies. In insects, chemical compounds are detected by receptor proteins expressed in olfactory and gustatory receptor neurons. Three classes of chemosensory receptors are known in insects. Odorant receptors (ORs) and gustatory receptors (GRs) are seven-transmembrane proteins and are generally expressed in the chemosensory appendages [52,53]. The ionotropic receptors (IRs) were discovered most recently and are members of the ionotropic glutamate receptor family [54]. Sensory neurons are surrounded by a hydrophilic sensillar lymph. Because many odorants are hydrophobic, these water-insoluble lipophilic compounds require water-soluble carrier proteins to reach the membrane receptors of sensory neurons [55]. A variety of proteins in the sensillar lymph are known to be involved in this process, including odorant binding proteins (OBPs) and chemosensory proteins (CSPs). These proteins help to solubilize hydrophobic chemicals and to transport specific ligands to the receptor proteins involved in odor detection, discrimination and coding [56,57]. Sensory neuron membrane proteins (SNMPs) are transmembrane proteins belonging to the CD36 protein family. SNMPs are expressed in olfactory sensory neurons, presumably supporting the ORs in capturing odor molecules on the sensory neurons [58].

Recent comparative genomics of social insects has revealed lineage-specific expansions of chemosensory genes. In ants, ORs are diversified, showing the largest repertoire among insects [59]. Subsets of OBPs and CSPs are specifically expressed in the antennae, but others are expressed primarily in non-chemosensory tissues [60–65], suggesting that OBPs and CSPs are not restricted to chemosensory functions. In contrast to ants, the genome sequence of Z. nevadensis revealed that the chemoreceptor repertoire was expanded in IRs, not in ORs, and that almost all OBPs are expressed in antennae [27]. These results suggest that expansion of chemosensory gene families is crucial for complex social life but, interestingly, that different classes of gene families were expanded in the independent social evolution of ants and termites. Here, we report the chemosensory gene repertoires of a subterranean termite (R. speratus), as well as gene expression patterns among castes (primary reproductives, soldiers and workers) and between sexes (females and males) for CSPs and SNMPs. Details of the analyses for OBPs and receptor genes will be reported elsewhere.

We used BLASTP to search the R. speratus gene models for chemosensory genes, using protein sequences of other insect species [27,66–72] as queries with an E-value cutoff of 1.0E−5. We also ran an HMM search using the OS-D superfamily (pfam03392), CD36 family (PF01130), 7tm Chemosensory receptor (pfam08395), Ligand-gated ion channel (pfam00060) and Ligated ion channel L-glutamate- and glycine-binding site (PF10613) profiles as queries. For phylogenetic analyses of CSPs, we produced an alignment with the E-INS-i strategy of MAFFT [73] using protein sequences of D. melanogaster, A. mellifera, A. pisum, P. humanus, Z. nevadensis and R. speratus. Ambiguous sections of the alignment were removed using trimAl (option ‘gappyout’) [74]. This alignment was used to produce a maximum likelihood phylogenetic tree using RAxML [75] with 100 bootstrap replicates.

For chemosensory receptor genes, we found 31 OR, 25 GR and 92 IR candidates among the automatically annotated gene models of R. speratus (Supplementary Tables 15-17). The number of OR genes was almost one-tenth of the OR numbers in ant species and about half of the number reported in Z. nevadensis. As suggested by Terrapon et al. [27], the OR gene repertoire might not be expanded in termites as it is in ants. However, because OR genes are difficult to assemble and annotate automatically [69], it is possible that many OR genes, as well as GR genes, were overlooked in our gene models. Further intensive manual annotation is needed to confirm the OR gene repertoires of Reticulitermes termites. The IR family was the most expanded receptor family in R. speratus. This is also true for the Z. nevadensis genome, which has two distinctive termite-specific expanded subfamilies [27]. Although ligand specificity is unknown for many IRs, this receptor family might have diverse functions in termite society.

We found 5 SNMPs in the R. speratus gene models and 3 SNMPs in the Z. nevadensis gene models (Supplementary Table 18). RNA-seq analyses detected the expression of 3 SNMPs (RspeSNMP1a, 1b and 2), which were expressed in both head and body (thorax + abdomen) parts (Supplementary Fig. 12). Although SNMPs have been suggested to be involved in sex pheromone reception [58], R. speratus SNMPs were not differentially expressed between sexes or among castes in the head part (Supplementary Fig. 13, FDR > 0.05). Therefore, SNMPs might not be involved in sex pheromone perception in R. speratus.

For odorant carrier proteins, we found 10 CSPs in the R. speratus gene models (Supplementary Table 19). We also found 10 CSPs in the Z. nevadensis genome, so the number of CSP genes in R. speratus is comparable to that in Z. nevadensis. Seven of the 10 CSPs were completely modeled in R. speratus. In ants, CSPs are major antennal proteins, and a number of CSP genes are specifically expanded [63,76]. The number of CSP genes was smaller in the termite species than in ants, and a clear termite-specific expansion was not detected in the CSP phylogenetic tree (Supplementary Fig. 14). RNA-seq analyses revealed that 5 CSPs were mainly expressed in the head part, but the others (2 CSPs) were mainly expressed in the body (thorax + abdomen) part (Supplementary Fig. 15), suggesting that CSPs are not restricted to peripheral chemosensory events. There were no sex-specific expression patterns of CSPs in the head parts. Among the 5 head-specific CSPs, 3 were differentially expressed among castes and were relatively highly expressed in the non-reproductive castes (soldiers and workers) compared with primary reproductives (Supplementary Fig. 16, FDR < 0.05). These 3 CSPs are likely to be involved in communication related to social tasks such as foraging and colony defense.
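The CSP phylogeny described in the methods of this section (MAFFT E-INS-i alignment, trimAl with gappyout, RAxML with 100 bootstrap replicates) can be sketched as three command lines. The helper below only assembles the argument lists and does not run anything. The file names, random seeds, RAxML executable name (`raxmlHPC`) and the substitution model (`PROTGAMMAAUTO`) are assumptions, since the note does not report them.

```python
def build_phylogeny_commands(fasta, prefix, bootstraps=100, seed=12345):
    """Return (argv, stdout_file) pairs for alignment, trimming and tree search.

    File names, seed and substitution model are illustrative assumptions.
    """
    aligned = f"{prefix}.aln.fasta"
    trimmed = f"{prefix}.trimmed.fasta"
    return [
        # MAFFT E-INS-i corresponds to --genafpair with --maxiterate 1000;
        # MAFFT writes the alignment to standard output.
        (["mafft", "--genafpair", "--maxiterate", "1000", fasta], aligned),
        # trimAl removes poorly aligned columns with the gappyout heuristic.
        (["trimal", "-in", aligned, "-out", trimmed, "-gappyout"], None),
        # RAxML rapid bootstrap plus best-scoring ML tree search (-f a).
        (["raxmlHPC", "-f", "a", "-m", "PROTGAMMAAUTO",
          "-p", str(seed), "-x", str(seed), "-#", str(bootstraps),
          "-s", trimmed, "-n", prefix], None),
    ]


commands = build_phylogeny_commands("csp_proteins.fasta", "csp")
mafft_cmd, trimal_cmd, raxml_cmd = (argv for argv, _ in commands)
```

Each pair could be passed to `subprocess.run`, redirecting standard output to the named file for the MAFFT step; keeping command construction separate from execution makes the parameters easy to inspect and record.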

Biogenic amines and neuropeptides

Biogenic amines, which include neurotransmitters, and neuropeptides regulate the behavior and physiological status of individuals. The genes related to these molecules are widely conserved among insects [77,78]. Their roles in regulating behavior and physiological status have been extensively examined in the honeybee Apis mellifera [79,80]. On the other hand, much remains to be explored in termites, which acquired eusociality independently of the honeybees. To further explore the roles of biogenic amines and neuropeptides in termites, we identified the relevant genes and analyzed their expression patterns among castes in R. speratus.

Using the sequence information of biogenic amine-related genes [27] and Z. nevadensis neuropeptide genes [81] as queries, we carried out BLAST searches for those genes against the gene models of R. speratus. As a result, there were essentially no differences in gene repertoires or numbers between Z. nevadensis and R. speratus (Supplementary Table 20).

Expression levels of each gene were compared among three castes, two sexes and two body parts (head and thorax + abdomen) using RNA-seq data. The results showed that some of the identified genes were expressed in a caste-specific manner (Supplementary Fig. 17). Interestingly, genes involved in dopamine biosynthesis, Pale (RS010906) and Dopa decarboxylase (RS006642), showed higher expression levels in soldiers and workers than in primary reproductives (Supplementary Fig. 17). On the other hand, Dopamine N acetyltransferase (RS005696), which is involved in the metabolism of dopamine to N-acetyldopamine [82], was highly expressed in soldiers (especially in heads; Supplementary Fig. 17). Consequently, intrinsic dopamine levels (probably in heads, including brains) may differ among castes in R. speratus. In insects generally, it is well known that species-specific behaviors, such as trophallactic behavior in ants and antipredator behavior in beetles, are affected by dopamine levels [83–85]. In the yellow fever mosquito Aedes aegypti, dopamine is also involved in cuticular sclerotization [86]. Indeed, in termites, brain dopamine levels were significantly higher in soldiers (and soldier-destined individuals) than in workers (and worker-destined individuals) in Hodotermopsis sjostedti and Z. nevadensis, respectively [87,88]. Consequently, it is possible that differences in dopamine levels are related to caste-specific social behavior and morphology in R. speratus.

Next, three dopamine receptor genes (Dop1-3) were identified, of which only Dop1 showed differential expression (Supplementary Fig. 18). The expression level of Dop1 in queens (thorax + abdomen) was much higher than that in the other castes, including kings (Supplementary Fig. 18). Because dopamine receptor genes are expressed in ovarian tissues in A. mellifera [89], R. speratus Dop1 may also be expressed in queen ovaries and involved in ovarian development. Tyramine β hydroxylase (RS013347), which is involved in octopamine biosynthesis, was expressed at slightly higher levels in soldiers (thorax + abdomen) than in the other castes (Supplementary Fig. 17). More clearly, higher expression levels of Octopamine-Tyramine receptor (RS000810) and Octopamine receptor (RS008926) were observed in soldier heads (Supplementary Fig. 18). In D. melanogaster, RNAi of Tyramine β hydroxylase resulted in decreased aggression in both males and females [90] and in a decrease of the locomotor speed induced by food deprivation [91]. In H. sjostedti, both octopamine and tyramine levels in the brain/suboesophageal ganglion were higher in soldiers, and octopamine-tyramine neurons were specifically enlarged in soldiers [87]. Consequently, the present results in R. speratus, together with previous studies in other species, suggest that biogenic amines such as dopamine, octopamine and tyramine are crucial for termite soldier-specific roles.

Finally, we identified 30 neuropeptide-related genes in the R. speratus genome. Expression levels of 15 genes were significantly different among castes (Supplementary Fig. 19). It should be noted that, in most cases, expression levels tended to be higher in heads than in thoraces and abdomens (refer to the RPKM values, Supplementary Fig. 19). Given that neuropeptides have crucial roles in regulating a wide range of insect behaviors [92], these 15 genes are likely to be involved in caste-specific behaviors and physiological actions in R. speratus.
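Expression levels in this section are discussed in terms of RPKM values. For reference, the standard definition (reads per kilobase of transcript per million mapped reads) is sketched below with hypothetical counts; the read numbers and gene length are placeholders, not values from the study.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    # Equivalent to read_count / (length in kb) / (mapped reads in millions).
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2,000 bp gene model in a library
# of 10 million mapped reads.
example = rpkm(read_count=500, gene_length_bp=2000, total_mapped_reads=10_000_000)
```

Normalizing by both gene length and library size is what allows values to be compared across genes and across samples such as heads versus thorax + abdomen.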

335 Juvenile hormone-related genes 336 Juvenile hormone (JH) is the central factor for polyphenisms seen in insects, including caste 337 differentiation in social insects93,94. Since it has long been known that the transition of JH titer 338 plays critical roles in the caste differentiation in termites95, the factors up- and downstream of the 339 JH action have been particularly focused, especially in lower termites (e.g. Miura and 340 Maekawa96). For example, RNAi of a JH binding protein gene Hexamerin (Hex) promotes the 341 presoldier molt in Reticulitermes flavipes, suggesting that the sequestration of JH is important for 342 the soldier differentiation97. Moreover, expression patterns of JH biosynthesis genes have been 343 elucidated during the presoldier molt under natural condition in Z. nevadensis98. The expression 344 changes of two genes (JHAMT and CYP15A1), involved in the final steps of JH biosynthesis, are 345 suggested to be crucial for the presoldier molt. RNAi for the receptor gene Methoprene-torelant 346 (Met) was also investigated during soldier and neotenic differentiations in Z. nevadensis and R. 347 speratus, respectively99,100, suggesting that caste-specific morphogenesis (e.g., head and 348 mandible enlargement in soldiers) and/or physiological changes (e.g., up-regulation of 349 Vitellogenin in neotenics) are regulated under the JH action. Recently, based on the comparisons 350 of expression patterns of JH-related genes between queens and workers in three species with 351 genome information, Jongepier et al.101 suggest that the JH action differs between lower and 352 higher termites. However, the roles of JH-related genes have yet to be elucidated in higher 353 termites. To clarify this issue, information of R. speratus is very important, because a sister group 354 relationship between a clade containing Reticulitermes and the (higher termites) was 355 strongly supported by multiple previous studies on the molecular phylogeney102–104. 
Here, we 356 identified JH-related genes in R. speratus, mainly based on the information of lower and higher 7

357 termites (Z. nevadensis and Macrotermes natalensis)27,49. As the results of expression analyses 358 of those genes among castes, we discuss about the roles of related genes and the diverse JH 359 action in termites. 360 First, we identified 15 JH biosynthetic genes and 5 signaling genes (Supplementary 361 Table 21). The gene repertoires and the numbers were conserved and essentially similar to those 362 in Z. nevadensis and M. natalensis. As shown in other reports on hemimetabolous insects 363 (Villalobos-Sambucaro et al. 105 2015), two Met isoforms (Met A and B) were found in termites and 364 cockroaches (Supplementary Fig. 20), although it was not clear whether there were any 365 functional differences between the two isoforms. Our expression analyses using RNA-seq data 366 showed that many genes in the early steps of the JH biosynthetic pathway (e.g. HMGS1, HMGR, 367 DD and IPPI) were highly expressed in all the body parts of soldiers, i.e., in heads, thoraces and 368 abdomens (Supplementary Fig. 21). Soldiers of this species have well-developed frontal glands in 369 their heads and thoraces producing defensive substances including isoprenoids, which are 370 synthesized via the early steps of the JH biosynthetic pathway (also known as the mevalonate 371 pathway)106. Consequently, the gene up-regulations seen in soldiers (Supplementary Fig. 21) 372 suggested to be responsible for the production of defensive chemicals in frontal glands. On the 373 other hand, in the late steps of the JH biosynthetic pathway, JHAMT and CYP15A1 were highly 374 expressed in heads of primary reproductives (Supplementary Fig. 22). Up-regulation of these 375 genes were reported to strongly correlate to JH levels in the locust Schistocerca gregaria107. 376 According to the previous work108, primary reproductives used for RNA-seq (4 months after 377 colony foundation) may start to increase JH level for reproduction. 
The results suggest that high JHAMT and CYP15A1 expression levels in heads are also related to these physiological changes in primary reproductives. Moreover, the expression levels of most genes in the late steps, including JHAMT and CYP15A1, tended to be higher in workers than in soldiers, and similar to those in primary reproductives (Supplementary Fig. 22). These tendencies in R. speratus are different from those in two lower termites, Z. nevadensis and Cryptotermes secundus, but similar to those in a higher termite, Macrotermes natalensis [101]. Similar tendencies were also observed in the expression patterns of JH signaling genes (Supplementary Fig. 23). For example, up-regulation of SRC (also known as taiman) in soldiers and workers compared to primary reproductives was observed (Supplementary Fig. 23). Moreover, although relatively high expression levels of both Kr-h1 and Br-C were observed in primary reproductives, E93 was highly expressed in soldiers, especially in heads (Supplementary Fig. 23). These tendencies were similar to those in the higher termite M. natalensis [101]. Overall, the similar expression patterns of JH-related genes between R. speratus and M. natalensis suggest that a change in JH signaling action is required for the evolution of the complex caste system observed in both Reticulitermes spp. and higher termites, both of which possess forked caste differentiation pathways [102–104].

Second, precursor and receptor genes of allatotropin/allatostatin were identified (Supplementary Table 21). High expression levels of the allatotropin receptor in queens and the allatostatin precursor in kings were observed in thorax and abdomen samples (Supplementary Fig. 23). Both allatotropin and allatostatin are neuropeptides that regulate JH biosynthesis in the corpora allata, but they have been shown to be highly expressed in the pupal stage and in tissue-specific patterns in the beetle Tribolium castaneum [109].
Further analyses should be performed to clarify the biological significance of the up-regulation of allatotropin/allatostatin in termite reproductives.

Finally, genes for JH binding and degradation were identified (Supplementary Table 21). Both Hex1 and Hex2 were more highly expressed in workers than in primary reproductives and soldiers (Supplementary Fig. 24). These patterns do not contradict those observed in the honeybee Apis mellifera, in which hexamerins were more highly expressed in workers than in queens [110]. Large numbers of JH esterases (JHEs) were obtained in R. speratus, as in two other species with genome information. A molecular phylogenetic tree based on amino acid sequences of JHEs was constructed with MEGA7 [111], using acetylcholinesterase genes of Drosophila melanogaster and three termites as outgroups (Supplementary Fig. 25). The resultant phylogeny showed a specific clade containing termite JHEs. Generally in insects, JHEs are mainly produced in fat bodies and catalyze the hydrolysis of JH in hemolymph [112,113]. Consequently, different JHE actions in each individual are probably involved in JH titer changes that may lead to caste differentiation. Interestingly, RS001960, 61, 65-67, which were located on the same scaffold (scaffold 129) and all of which belonged to the specific termite clade (Supplementary Fig. 25), were differentially expressed among castes (Supplementary Fig. 24). Similarly, RS004712-13 (scaffold 20) and RS014537-38 (scaffold 83) were also differentially expressed among castes (Supplementary Fig. 26). These patterns may reflect gene duplication and neo-/sub-functionalization, as discussed in the main text. Further detailed expression and functional analyses are required to clarify this possibility.

Ecdysone-related genes

Termite caste differentiation is deeply associated with molting events. Generally in insects, molting events are regulated by both juvenile hormone (JH) and 20-hydroxyecdysone (20E; the active form of ecdysone) [114]. The genes responsible for 20E biosynthesis and the signaling pathways have been well studied in some model insects (e.g. Niwa & Niwa [115]). Ecdysone is generally produced in the molt glands (also known as prothoracic glands) by a suite of enzymatic reactions, released into the hemolymph and converted into 20E in peripheral target tissues.

Studies on the roles of 20E in caste differentiation are very few in termites compared to those on JH described in the previous section. However, some recent studies have clarified the crucial roles of 20E signaling in termite caste differentiation. For example, artificial 20E application induced the worker-worker molt in Reticulitermes speratus [116], suggesting that there is a 20E signaling pathway regulating the worker molt, similar to the nymphal molt in other hemimetabolous insects. Moreover, for soldier differentiation (worker-presoldier and presoldier-soldier molts), the ecdysone receptor (EcR) was shown to be activated in Zootermopsis nevadensis [117]. Based on expression and functional analyses of 20E signaling genes in Z. nevadensis, Masuoka et al. [118] suggested that there are two different 20E signaling pathways, one with a role in molting to the next instar and the other with a role in soldier-specific morphogenesis. According to reports on the expression patterns of selected 20E-related genes in three termite species with genome information [47], similar queen-biased or worker-biased expression patterns were observed in the 20E biosynthesis genes.
To know whether the roles of 20E are common or diversified among termite species, especially between lower and higher termites, the expression patterns of related genes should be clarified in more detail. Here, we identified 20E-related genes in R. speratus using information from Drosophila melanogaster and Bombyx mori, and examined their expression patterns among castes.

We identified 20E biosynthesis (7), receptor (2) and signaling (11) genes from the genome of R. speratus (Supplementary Table 22). Although gene numbers and repertoires were essentially similar to those of D. melanogaster and B. mori, the gene for prothoracicotropic hormone (PTTH), i.e., the conserved neuropeptide that activates the molt glands, was not found in the current gene model (Rspe OGS1.0).

Expression analyses were performed using RNA-seq data. Large differences in expression levels among the three castes and two body parts (head and thorax + abdomen) were observed in many genes (Supplementary Fig. 27, 28). Among the 20E biosynthesis genes, neverland (RS010513) and shade (RS006327) were more highly expressed in thorax and abdomen than in head. The expression sites of neverland other than the molt glands are completely unknown in termites. The final step of 20E biosynthesis (hydroxylation) is normally mediated by shade in peripheral tissues including the fat body [115]. Up-regulation of shade was observed in both soldiers and workers, suggesting that the 20E titer and signaling pathway activity are elevated in those castes. Moreover, phantom (RS002862), shadow (RS010451) and spook (RS010514, primary reproductives and workers) were relatively highly expressed in the head. These genes may be expressed mainly in the molt glands of termites and involved in the biosynthesis of ecdysone.
Interestingly, these genes were also highly expressed in the thorax and abdomen of queens (all three genes) and kings (shadow). It is known that ecdysone is produced in the reproductive organs and is involved in ovarian development, proliferation of spermatogonia and sperm formation in some insects [119,120]. Further expression and functional analyses should be performed to determine whether phantom, shadow and spook are expressed in the ovary and testis in termites.

Expression patterns of EcR (RS006194) and USP (RS005985) were essentially similar to each other, except in the thorax and abdomen of primary reproductives. High expression levels of receptor genes in soldiers and workers (and of EcR in queen thorax and abdomen) did not contradict the results for the 20E biosynthesis genes. Notable expression patterns among the 20E signaling genes were observed in HR38 (RS008487) and E93 (RS003976); both genes were highly expressed in soldiers (and HR38 also in workers). In the honeybee Apis mellifera, up-regulation of 20E signaling genes was observed in workers (Kubo, 2012), and HR38 was expressed in forager brains [121]. There is a possibility that HR38-related 20E activity is involved in caste-specific behaviors in both honeybees and termites. It should be noted that the JH-Met-Kr-h1-E93 (MEKRE93) pathway has crucial roles in both hemi- and holometabolan metamorphosis [122–124]. Soldiers are the developmentally terminal stage and cannot molt into the next instar. Further analyses should be performed to clarify the functional significance of the high E93 expression in soldiers and the role of E93-related signaling in soldier formation in termites.

Insulin/insulin-like signaling pathway

The insulin/insulin-like signaling (IIS) pathway is highly conserved from invertebrates to mammals. In insects, insulin-like peptides are involved in the pathway and serve as hormones and growth factors [125]. Insulin signaling plays an important role in caste differentiation in some social insects. In honeybees, the IIS pathway contributes to caste differentiation and aging [126,127]. In a damp-wood termite, Hodotermopsis sjostedti, this pathway has also been shown to be involved in soldier differentiation (Hattori et al. [128]). Studies on IIS are thus fundamental to a better understanding of insect sociality. Here, we searched for the IIS genes [129] in the genome of R. speratus, and investigated differences in the expression levels of those genes among castes and between sexes based on the RNA-seq data (for details, see Methods in the main text).

We carried out BLASTP searches against the R. speratus gene models using the insulin signaling genes of D. melanogaster [129] (Supplementary Table 23), and identified homologs based on sequence similarity (E-value cutoff of 1e-10) and the phylogenetic relationships shown below. We identified all of the IIS components in the R. speratus genome, with some genes duplicated. We found that Ras85D (Ras1) genes were duplicated in the termite/cockroach lineages (Supplementary Table 23, Supplementary Fig. 29). Outside the termite/cockroach lineages, Ras1 gene duplication was observed only in the ponerine ant Harpegnathos saltator (Supplementary Fig. 29). We found three copies (RS000922, RS007018, and RS007019) of InR genes (Supplementary Fig. 30) in R. speratus. Three copies of InR are also found in Z. nevadensis (Xu and Zhang [130]). An increase in the copy number of InR might be involved in social evolution in termites. On the other hand, the ImpL2 gene was absent in the genomes of R. speratus and cockroaches (Supplementary Table 23).
The ImpL2 protein binds to insulin-like peptides and inhibits insulin signaling activity [131]. This suggests that IIS regulation in termites and cockroaches differs from that in other insects.

Our RNA-seq analysis revealed that, including the above-mentioned Ras1 and InR genes, there were significant differences in gene expression levels among primary reproductives, workers and soldiers in heads (16 genes) and in thoraxes and abdomens (20 genes) (Supplementary Fig. 31). Most of the differentially expressed genes in the soldier heads are considered to be involved in soldier-specific functions, because the soldier head exhibits highly specialized external and internal morphology associated with defensive tasks, with pheromonal exocrine glands [132,133]. The primary reproductives exhibited the highest expression levels of two copies (RS000933 and RS013615) of Ras1 in the thoraxes and abdomens (Supplementary Fig. 31), suggesting that these genes are involved in gonad development. Two InR genes (RS000922 and RS007019) were highly expressed in soldiers, whereas RS007018 was highly expressed in workers (FDR < 0.05) (Supplementary Fig. 31). This suggests that different gene copies of InR could possess caste-specific functions.
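The E-value screen used in the BLASTP searches of this section (cutoff of 1e-10) can be illustrated with a minimal sketch. It assumes BLAST tabular output with the default 12 columns (-outfmt 6), which is not stated in the text; the function name and the example rows with their identifiers are hypothetical and serve only as an illustration.

```python
# Sketch: keep BLASTP hits at or below an E-value cutoff (1e-10, as in the text).
# Assumption: tabular BLAST output with the default 12 columns, where
# column 11 is the E-value and column 12 is the bit score.

E_VALUE_CUTOFF = 1e-10

def filter_hits(lines, cutoff=E_VALUE_CUTOFF):
    """Return (query, subject, evalue, bitscore) tuples that pass the cutoff."""
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue <= cutoff:
            kept.append((query, subject, evalue, bitscore))
    return kept

# Hypothetical example rows (not real data).
example = [
    "query_A\thyp_gene_1\t62.5\t400\t150\t0\t1\t400\t1\t400\t1e-150\t520",
    "query_A\thyp_gene_2\t30.1\t90\t60\t3\t5\t94\t10\t99\t2e-04\t41.2",
]
hits = filter_hits(example)
```

Hits passing such a filter would still be candidates only; in the text, the final assignment also relied on phylogenetic relationships.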

Toolkit genes involved in wing formation

The acquisition of wings is one of the most important events in insect evolution and has led to the adaptive radiation of insects. To date, some genes involved in wing formation have been identified in insects [134]. It is also known that environmental factors can affect the expression of those genes, contributing to the regulation of wing polyphenism (e.g. Hartfelder and Emlen [135], Xu et al. [136]). Wing polyphenism is thus a remarkable example of gene-environment interaction in development. In some insect species, for example ants [137] and aphids [138], an association between the expression of some homologs of Drosophila wing patterning genes and wing polyphenism has been shown. Termites also exhibit wing polyphenism among castes; workers and soldiers do not have wings, but nymphs, which develop into adults, possess wing buds [139].

To identify in R. speratus the wing-development toolkit genes referred to in Abouheif and Wray [137] and Brisson et al. [138], we performed BLASTP searches with those genes of D. melanogaster against the gene models of R. speratus. Gene orthology was confirmed by reciprocal BLAST between R. speratus and D. melanogaster genes, and by phylogenetic analyses. We found no duplications or losses in the wing-development toolkit genes of R. speratus (Supplementary Table 24).

Our RNA-seq data showed that the expression of Daughters against dpp (Dad) in both soldiers and workers was lower than that in primary reproductives (FDR < 0.05, Supplementary Fig. 32). Dad is induced by Dpp and then acts as an antagonist of Dpp [52]. Therefore, the low expression of Dad in both soldiers and workers suggests low activity of Dpp signaling. Although the apterous gene is suggested to be involved in aphid wing polyphenism [138], no significant differences in its expression were observed among castes in R. speratus. Moreover, it is interesting to note that the wing-specific selector gene vestigial was significantly highly expressed in the heads of wingless soldiers (Supplementary Fig. 32). We need to further examine gene expression profiles and gene function in the wing primordia and wing buds during the development of each caste to discuss the developmental mechanism of termite wing polyphenism. The identification of the wing-development toolkit genes in this study will facilitate such studies.
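The reciprocal BLAST check mentioned in this section can be sketched as a reciprocal-best-hit test. This is an illustration under the assumption that "reciprocal BLAST" means each gene is the other's top-scoring hit; the text does not specify the exact criterion, and the function names and identifiers below are hypothetical.

```python
# Sketch of a reciprocal-best-hit (RBH) test between two species.
# Input rows are (query, subject, bitscore) tuples, e.g. parsed from
# tabular BLAST output. All identifiers are hypothetical.

def best_hits(rows):
    """Map each query to the subject with the highest bit score."""
    best = {}
    for query, subject, score in rows:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(forward_rows, reverse_rows):
    """Return sorted (query, subject) pairs that are each other's best hit."""
    forward = best_hits(forward_rows)
    reverse = best_hits(reverse_rows)
    return sorted((q, s) for q, s in forward.items() if reverse.get(s) == q)

# Forward: D. melanogaster queries against R. speratus gene models.
forward_rows = [
    ("dmel_A", "rspe_1", 300.0),
    ("dmel_A", "rspe_2", 120.0),
    ("dmel_B", "rspe_2", 200.0),
]
# Reverse: R. speratus queries against D. melanogaster proteins.
reverse_rows = [
    ("rspe_1", "dmel_A", 310.0),
    ("rspe_2", "dmel_A", 130.0),
    ("rspe_2", "dmel_B", 90.0),
]
pairs = reciprocal_best_hits(forward_rows, reverse_rows)
```

In this toy example only dmel_A and rspe_1 form a reciprocal pair, because rspe_2 scores higher against dmel_A than against dmel_B. Such a test cannot resolve lineage-specific duplications on its own, which is presumably why orthology was also checked by phylogenetic analyses.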

Immunity

The group living of social insects enables task partitioning among individuals, i.e., division of labor, which is thought to realize higher productivity per individual than solitary living does. On the other hand, group living carries a high risk of infectious diseases because of the high density of individuals and the strong and frequent interactions among them. Moreover, many termite species, including R. speratus, live in environments rich in pathogenic microbes, such as damp wood and soil [140]. Thus, pathogenic infections could have a large impact on the fitness of termite colonies. Since the immune system of individuals is generally the major defense mechanism against pathogens, it is likely that termite immune systems have evolved adaptively. Identification of genes involved in the immune system gives us an opportunity to study how termites respond to pathogens.

We searched the R. speratus genome for the immune-related genes listed in ImmunoDB, which are classified into 27 categories based on gene function and pathway [141]. First, we downloaded protein sequences of the immune-related genes of D. melanogaster, Anopheles gambiae, and Aedes aegypti from ImmunoDB. Then, using those protein sequences as queries, BLASTP searches were carried out against the gene model Rspe OGS1.0 to identify homologs of those genes in R. speratus. PfamScan was also performed for the categories with a specific protein domain.

We identified 251 immune-related genes in the R. speratus genome (Supplementary Table 25). Almost all genes involved in the IMD/JNK, Toll and JAK/STAT signaling pathways were found in R. speratus, as in the damp-wood termite Z. nevadensis [27]. In all of the categories except for lysozymes, the numbers of genes are not remarkably increased or decreased in R. speratus compared to other insect species (Supplementary Fig. 8).
Eight genes encoding antimicrobial peptides (AMPs) were identified in the R. speratus genome (two defensins, termicin, crustin, locusin, prolixicin, two thaumatins; Supplementary Table 25). Most of them showed caste-biased expression (Supplementary Fig. 33). One (RS002487) of the two defensin genes exhibited soldier-biased expression, while the termicin, crustin and one of the thaumatin genes exhibited worker-biased expression. Moreover, as mentioned in the main text, many of the lysozyme genes also exhibited caste-biased expression.

Our results on the differential expression of AMPs and lysozymes suggest the possibility of a division of labor among castes in colony-level immunity. AMPs and lysozymes are “effectors” of an immune system, which directly interact with microbes. Those effectors might have different target microbes (gram-positive bacteria, gram-negative bacteria, fungi, etc.). In addition, expression profiles of the effector genes were rather different among castes. Therefore, the differences in expression profiles may result in differences among castes in immune potential against different microbes. A division of labor among castes in colony-level immunity has also been suggested in R. speratus [142]. Further studies are required to understand the adaptive evolution of termites in response to pathogens.

Insecticide target and detoxification genes

R. speratus is a devastating pest species that can cause serious damage to wooden constructions and huge economic losses [143]. The use of insecticides is an efficient way to prevent damage by pest termites. To date, various kinds of insecticides have been developed and are in use for termite control. However, development of new insecticides is still important to further reduce negative effects on the environment and on organisms other than termites [144]. The genome sequence information of pest species, especially sequences of insecticide target genes and detoxification-related genes, is useful for developing such insecticides and for studying insecticide resistance mechanisms [145]. Here, we report some major insecticide target genes, namely, genes related to ion channels, chitin synthesis and muscle contraction. We also report the genes related to the detoxification of insecticides, that is, cytochrome P450 monooxygenases (CYP), glutathione S-transferases (GST) and carboxylesterases (CCE). In insects, those three groups of enzymes play major roles in the synthesis and degradation of endogenous substrates such as hormones and pheromones, as well as in the metabolism of exogenous substrates such as insecticides [146-148], and can be candidate causal factors of insecticide resistance.

To identify the insecticide target genes, we performed BLASTP or TBLASTN searches against the gene models or the genome assembly of R. speratus using the amino acid sequences of those genes of Drosophila melanogaster as queries. Identification of the detoxification genes was carried out by profile hidden Markov model (profile HMM) searches. Profile HMMs of CYP (PF00067), GST (PF00043 and PF02798) and CCE (PF00135), retrieved from the Pfam database (http://pfam.xfam.org), were searched against the gene models of R. speratus using HMMER v3.1b1 [149].
The profile HMM searches were also carried out against the gene models of Z. nevadensis [27] and M. natalensis [49] and the amino acid sequences obtained from transcriptome sequencing data (contig sequences assembled using Trinity [150]) of Periplaneta americana [151] and Cryptocercus punctulatus [152] (DRA001254 and DRA004598, respectively). To perform a phylogenetic analysis of the amino acid sequences with profile HMM hits, those sequences were aligned with MAFFT [71], and the best models of amino acid replacement for the alignments were determined using ProtTest v3.4 [153]. Then, maximum likelihood phylogenetic trees were generated from the alignments with the best replacement models using RAxML [73].

We identified 64 insecticide target genes in total (Supplementary Table 26). No remarkable gene family expansion was found in those genes. We found 106 CYP genes, 20 GST genes and 43 CCE genes in R. speratus (Supplementary Table 27). The numbers of genes identified in termites were smaller than those in the two cockroaches, P. americana and C. punctulatus (Supplementary Table 27, 28). In the phylogenetic tree of the CYP genes, we found a clade containing only R. speratus genes with high bootstrap support (Supplementary Fig. 34). In the GST and CCE genes, we did not find any species-specific gene duplications in R. speratus (Supplementary Fig. 35, 36).

The identification of those genes could contribute to the development of species-specific control methods for R. speratus. The duplicated CYP genes could be new targets for such control methods. Examining the functions of those genes is required for further development of the control methods. In the insecticide target genes and the GST and CCE genes, although no species-specific genes were found, there might be species-specific mutations resulting in changes in gene function. Those mutations could be new targets of species-specific control methods.
In this study, a large number of genes that could be targets of new control methods for R. speratus were identified, enabling new research on termite control.
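The counting of detoxification genes from profile HMM hits, as described in this section, might be sketched as follows. The sketch assumes that the searches were saved in HMMER's per-target tabular format (--tblout), in which the first column is the target sequence name and the fourth column is the query profile accession. The text does not state the output format or any score threshold, so none is applied here, and the function name and example rows are hypothetical.

```python
from collections import defaultdict

# Pfam accessions named in the text, mapped to enzyme families.
PFAM_TO_FAMILY = {
    "PF00067": "CYP",
    "PF00043": "GST",
    "PF02798": "GST",
    "PF00135": "CCE",
}

def count_family_genes(tblout_lines):
    """Count unique target genes per enzyme family from tblout-style lines."""
    genes = defaultdict(set)
    for line in tblout_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment lines and blank lines
        fields = line.split()
        target, query_accession = fields[0], fields[3]
        # Pfam accessions carry a version suffix (e.g. PF00067.22); drop it.
        family = PFAM_TO_FAMILY.get(query_accession.split(".")[0])
        if family is not None:
            genes[family].add(target)
    return {family: len(ids) for family, ids in genes.items()}

# Hypothetical example rows (not real search results).
example = [
    "# target name  accession  query name  accession  E-value  score  bias",
    "hyp_gene_1 - GST_C PF00043.25 3.0e-12 45.1 0.0",
    "hyp_gene_1 - GST_N PF02798.20 1.0e-15 55.3 0.1",
    "hyp_gene_2 - p450 PF00067.22 1.2e-80 270.1 0.0",
]
counts = count_family_genes(example)
```

Collecting targets in a set means that a GST gene matching both PF00043 and PF02798 is counted once, which matters because two profiles were used for that family.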

MicroRNAs (miRNAs)

To evaluate the role of miRNAs in caste polyphenism, we performed small RNA sequencing in workers and soldiers of R. speratus [154]. We identified eight miRNAs that were differentially expressed between soldiers and workers.


Supplementary Methodology

Insects

All mature colonies of Reticulitermes speratus used for genome, RNA, and bisulfite sequencing (BS-seq) were collected in Furudo, Toyama Prefecture, Japan (colonies #1-8) [Supplementary Table 1]. Pieces of logs were brought back to the laboratory and kept in plastic cases in constant darkness. For the extraction of genomic DNA, we used female secondary reproductives (nymphoids) from colony #1 (2 individuals in total), collected in November 2013. For RNA sequencing (RNA-seq), workers and soldiers were sampled from colonies #2, #3, and #4, collected in September 2014. Primary reproductives (queen and king) were sampled from incipient colonies newly founded by alates (winged adults) that emerged from colonies #5, #6, and #7, collected in April–May 2014. For BS-seq, workers and soldiers were sampled from colony #8, collected in October 2014, and primary reproductives were sampled from incipient colonies founded by alates that emerged from colonies #5 and #7. For in situ hybridization, each caste (alates, female neotenics, workers, and soldiers) was sampled from mature colonies collected in Himi, Toyama Prefecture, Japan (colonies #9 and #10) in April–May 2019. Queens were sampled from incipient colonies newly founded by alates that emerged from colonies #9 and #10. Sexes of individuals were identified by means of the morphological characteristics of the 7th and 8th abdominal sternites (workers and soldiers) [155,156] or abdominal tergites (reproductives) [157].

Genome sequencing and assembly

We used female secondary reproductives (nymphoids I and II) (see above for details; colony #1 in Supplementary Table 1) for genome sequencing. We excluded the gut and ovaries of the nymphoids to avoid contamination by DNA from the king or from microorganisms. Remaining body parts were frozen in liquid nitrogen and stored at -80°C until DNA extraction. Genomic DNA was isolated from each individual using a QIAGEN Genomic-tip 20/G (Qiagen, Venlo, Netherlands). We used 5 microsatellite loci (Rf6-1, Rf21-1, Rf24-2, Rs02, and Rs03) to confirm that the two individuals were homozygous at these loci and shared the same genotype. Primer sequences for the amplification of the Rf and Rs loci are described in Vargo [158] and Hayashi et al. [159], respectively. The quantity and quality of extracted DNA were analyzed using a NanoVue spectrophotometer (Cytiva, Marlborough, MA) and a Qubit 2.0 fluorometer (Thermo Fisher Scientific, Waltham, MA). The integrity of genomic DNA was analyzed using pulsed-field electrophoresis in a 0.75% agarose gel (80 V, 16 hours).

Genomic DNA (derived from nymphoid I) purified as described above was fragmented with a Covaris S2 sonicator (Covaris, Woburn, MA), size-selected with BluePippin (Sage Science, Beverly, MA), and then used to create two paired-end libraries using a TruSeq DNA Sample Preparation Kit (Illumina, San Diego, CA) with insert sizes of ~250 and ~800 bp [Supplementary Table 3]. The enrichment PCR was done using six cycles. These libraries were sequenced using an Illumina HiSeq 1500 with a 2 × 151 bp paired-end sequencing protocol in Rapid mode at the NIBB Functional Genomics Facility (Okazaki, Japan).
Four mate-pair libraries with peaks at ~3 kb, ~5 kb, ~8 kb and ~10 kb, respectively, were created from the genomic DNA (derived from nymphoid II, with the same genotype as described above) using a Nextera Mate Pair Sample Preparation Kit (Illumina) [Supplementary Table 3], and sequenced on a HiSeq system using a 2 × 151 bp paired-end sequencing protocol at the National Institute of Genetics (Mishima, Japan). Reads of the paired-end and mate-pair libraries were assembled using ALLPATHS-LG (build# 47878) [160] with default parameters. BUSCO v4.0.6 [161] was used for quantitative assessment of the genome assembly, using insecta_odb10 as the lineage dataset. A genome browser was built using JBrowse [162] and is available at http://www.termites.nibb.info.

Gene prediction

A protein-coding gene reference set was generated using EvidenceModeler (EVM) [163] with two main sources of evidence, aligned R. speratus transcripts and aligned homologous proteins of other insects, and a set of ab initio gene predictions. RNA-seq reads were assembled de novo using Trinity [164], and ORFs were predicted using TransDecoder [165]. We used CD-HIT-EST [166] to reduce the redundancy of the predicted ORFs. The ORF sequences were mapped to the genome using Exonerate [167] in est2genome mode for splice-aware alignment. We processed homology evidence at the protein level using the reference proteomes of Drosophila melanogaster (FlyBase; Attrill et al. 2016), Tribolium castaneum (accession no. AAJJ00000000), Apis mellifera (accession no. AADG05000000), Acyrthosiphon pisum (accession no. ABLF01000000), Daphnia pulex (accession no. ACJG00000000), Pediculus humanus (accession no. AAZO00000000), and Zootermopsis nevadensis (accession no. AUST00000000). We also included protein sequences predicted from de novo assemblies of RNA-seq reads of Periplaneta americana [168] and Nasutitermes takasagoensis [169]. These reference proteins were split-mapped to the R. speratus genome in two steps: first with BLASTX to find approximate loci, and then with Exonerate in protein2genome mode to obtain more refined alignments. For ab initio gene prediction, Augustus [170] was trained against a set of preliminary gene models of R. speratus (an earlier version of the EVM set) and was then used to predict gene models in the R. speratus genome. These gene models derived from multiple lines of evidence were merged using the EVM program to obtain the reference annotation for the genome, which yielded 15584 predicted genes. Lastly, genes of interest were manually inspected and corrected. In particular, tandemly duplicated genes discussed in the main text, such as GGPPS and lysozyme genes, were prone to incorrect gene models with erroneous exon–exon connections between paralogous genes in the tandemly repeated cluster. In total, 74 gene models were manually updated. The final gene set, composed of 15591 genes, was designated “Rspe OGS1.0” [Supplementary Data 2 (DOI: 10.6084/m9.figshare.14267381)].

Functional annotation of gene models

Functional annotation of Rspe OGS1.0 was carried out by homology searches and motif searches. For the homology searches, we used the protein sequences of Ensembl release-33 of D. pulex, P. humanus, A. mellifera, D. melanogaster, and T. castaneum, and the gene models of Z. nevadensis (Znev OGS v2.2 [29]) and Macrotermes natalensis (Mnat OGS3). We also scanned the protein sequences of Rspe OGS1.0 using InterProScan v5.17-56.0 [171] to annotate domains and motifs of predicted R. speratus coding genes. The Kyoto Encyclopedia of Genes and Genomes (KEGG) annotation for Rspe OGS1.0 was performed on the web server (https://www.kegg.jp/blastkoala/) with the BlastKOALA algorithm [172,173]. Gene Ontology terms were assigned to Rspe OGS1.0 gene models by analyzing the results of the BLASTP searches against the NCBI nr database (version of March 3rd, 2016) and the InterProScan searches with the Blast2GO pipeline (B2G4Pipe version 2.5.0) [174].

The quality of the Rspe OGS1.0 gene set was evaluated by assessing two types of evidence, homology evidence and expression evidence. Among 15591 genes, 12996 (83.3%) had hits in the NCBI nr database, 10440 (70.0%) included known protein motifs defined in the Pfam database, and 14302 (91.7%) showed evidence of expression with a threshold of RPKM = 1.0 in any sample of the caste-specific RNA-seq data. In sum, 15577 (99.9%) had evidence for the presence of homologs and/or expression.
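The expression-evidence criterion above can be illustrated with the standard definition of RPKM (reads per kilobase of transcript per million mapped reads). The following is a minimal sketch: whether the threshold was applied as "greater than or equal to" is an assumption, and the function names and numbers are hypothetical.

```python
# Sketch: RPKM and the "expressed in any sample" criterion.
# RPKM = read_count * 1e9 / (gene_length_bp * total_mapped_reads)

def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)

def is_expressed(per_sample_rpkm, threshold=1.0):
    """True if the gene reaches the threshold in at least one sample.

    Assumption: the threshold is inclusive (>=); the text only says
    'a threshold of RPKM = 1.0 in any sample'.
    """
    return any(value >= threshold for value in per_sample_rpkm)

# Hypothetical gene: 500 reads, 2000 bp long, 25 million mapped reads.
example_value = rpkm(500, 2000, 25_000_000)
expressed = is_expressed([0.2, example_value, 0.0])
silent = is_expressed([0.2, 0.9, 0.0])
```

Testing the maximum over samples, rather than the mean, keeps genes that are expressed in only one caste or body part, which suits a caste-specific data set.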

Orthology inference and gene duplication analysis

Orthology determination among three termites: Orthologous genes among the proteomes of three termite species, namely, R. speratus, Z. nevadensis, and M. natalensis (gene models Rspe OGS1.0, Znev OGS v2.2 [29], and Mnat OGS3, respectively), were determined by pairwise comparisons with InParanoid v4.1 followed by a three-species comparison with MultiParanoid [175,176]. Note that the M. natalensis gene set, Mnat OGS3, was built in this study using a pipeline similar to that used for R. speratus gene prediction. The BUSCO analysis indicated that Mnat OGS3 recovered 95.0% of insecta benchmarking universal single-copy orthologs (BUSCOs), a significant improvement over the original gene models, Mnat_gene_v1.2 [49], which captured 83.1% of insecta BUSCOs.

Ortholog analysis with arthropod proteomes: Orthology relationships of R. speratus genes (OGS1.0) with other arthropod genes were analyzed by referring to the OrthoDB gene orthology database. We downloaded the arthropod ortholog table and all protein sequences provided by the OrthoDB ver.8 database (87 arthropod species) [177]. We grouped R. speratus genes with the OrthoDB ortholog groups using a two-step clustering procedure implemented in custom Ruby scripts. For each R. speratus protein, BLASTP was used to find similar proteins among the arthropod proteins, and the ortholog group of the top hit was provisionally assigned to the query R. speratus gene. Then, the ortholog grouping was evaluated by comparing the similarity level (the BLAST bit score was used as a proxy) among members within the focal ortholog group. We kept the grouping if the BLAST bit score between the query R. speratus gene and the top arthropod gene was higher than the minimal score among the original cluster members. Among 15591 R. speratus OGS1.0 genes, 12434 genes were clustered into 9033 OrthoDB Arthropod ortholog groups. Gene duplication was assessed based on this clustering. If two or more members of one species were found in a single ortholog group, they were regarded as a multigene family.
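The two-step grouping and the multigene-family criterion described above can be sketched as follows. This is not the authors' Ruby implementation: it is a Python illustration that assumes the top BLASTP hit of each query and the minimal bit score within each ortholog group have already been computed. All names and values are hypothetical.

```python
from collections import Counter

def assign_groups(top_hits, protein_to_group, group_min_score):
    """Two-step ortholog grouping.

    top_hits: {query gene: (top arthropod hit, bit score)}
    protein_to_group: {arthropod protein: ortholog group}
    group_min_score: {group: minimal bit score among its original members}
    """
    assigned = {}
    for query, (subject, score) in top_hits.items():
        # Step 1: provisionally take the ortholog group of the top hit.
        group = protein_to_group.get(subject)
        # Step 2: keep it only if the query is more similar to its top hit
        # than the least similar pair of original members of that group.
        if group is not None and score > group_min_score[group]:
            assigned[query] = group
    return assigned

def multigene_groups(assigned):
    """Groups holding two or more genes of the query species."""
    counts = Counter(assigned.values())
    return sorted(group for group, n in counts.items() if n >= 2)

# Hypothetical inputs.
top_hits = {"q1": ("p1", 250.0), "q2": ("p2", 180.0), "q3": ("p3", 60.0)}
protein_to_group = {"p1": "OG1", "p2": "OG1", "p3": "OG2"}
group_min_score = {"OG1": 150.0, "OG2": 90.0}

assigned = assign_groups(top_hits, protein_to_group, group_min_score)
families = multigene_groups(assigned)
```

In this toy example q3 is left unassigned because its best score (60.0) falls below the minimal within-group score of OG2 (90.0), whereas q1 and q2 both join OG1 and are therefore flagged as a multigene family.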

Repeat sequence annotation

To annotate repeat sequences, we first generated repeat sequence models using RepeatModeler (http://www.repeatmasker.org) from the genome assemblies of R. speratus (this work), Z. nevadensis27, and M. natalensis49. Then, we pooled the repeat models of the three species. Using CD-HIT-EST166, we clustered sequences with >90% sequence similarity and retained only the longest sequence in each cluster as a repeat model. We identified repeat sequences in each of the three genome assemblies using RepeatMasker with the retained repeat models.
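The representative-selection step above (keeping only the longest sequence of each cluster) can be sketched as follows. The sketch assumes the clusters have already been parsed into lists of (sequence ID, length) pairs; the function name and the sequence IDs are hypothetical, and the clustering itself is left to CD-HIT-EST.

```python
def pick_representatives(clusters):
    """Return the ID of the longest sequence in each cluster.

    clusters: list of clusters, each a list of (sequence_id, length) tuples
    pooled from the three species' repeat models (hypothetical input format).
    """
    return [
        max(cluster, key=lambda member: member[1])[0]
        for cluster in clusters
        if cluster  # skip empty clusters
    ]
```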

RNA-seq

W4–5 workers (old workers) and soldiers were collected from each colony according to body size and antennal segments178. To collect primary reproductives, dealated adults were chosen randomly from each colony following a previously described method108, and female–male pairs were mated (colonies #5 and #6: 10 pairs, #5 and #7: 10 pairs, #6 and #7: 10 pairs; Supplementary Table 1). Each pair was placed in a 20-mL glass vial with c. 8 g of mixed sawdust food (Mitani, Ibaraki, Japan) and kept at 25°C in constant darkness. Colonies were sampled after 4 months. We observed multiple larvae and several workers in each colony, and kings and queens were sampled. Each individual was divided into head and body parts (thorax and abdomen, including the guts) on ice, immediately frozen in liquid nitrogen, and stored at -80°C until use.

We prepared RNA-seq libraries for 12 categories based on castes (reproductives, workers and soldiers), sexes (males and females) and body parts (head, and thorax and abdomen). Ten individuals were combined for each head sample of each caste and each sex, and five individuals for each thorax and abdomen sample. Three biological replicates of the 12 categories were prepared from three different field colonies, totaling 36 RNA-seq libraries [Supplementary Table 2]. Total RNA was isolated from each category using an SV total RNA isolation system (Promega, Madison, WI, USA). DNA was digested with RNase-free DNase I for 20 min at 37°C. The quantity and quality of the extracted RNA were checked using a NanoVue spectrophotometer (Cytiva), a Qubit 2.0 fluorometer (Thermo Fisher Scientific), and an Agilent 2100 bioanalyzer (Agilent Technologies, Palo Alto, CA). Illumina libraries for RNA-seq were prepared using a TruSeq Stranded mRNA Library Prep kit (Illumina) in accordance with the manufacturer's instructions.
First- and second-strand cDNA synthesis, adaptor ligation, and amplification were performed. The generated libraries were evaluated using RT-qPCR with a KAPA qPCR SYBR green PCR kit (Geneworks, Thebarton, Australia) and electrophoresis in an Agilent 2100 bioanalyzer (Agilent Technologies). All libraries were subjected to 101-bp single-end sequencing on a HiSeq 2500 (Illumina).

The raw sequencing reads were filtered to remove adapter sequences and low-quality bases using Trimmomatic v0.32179 with the following thresholds: leading and trailing bases with a Phred quality score (Q) lower than 20 were removed, reads were trimmed where the average quality in a 4-bp sliding window fell below Q20, and reads shorter than 50 bp after trimming were discarded. Subsequently, the filtered reads were mapped onto the genome assembly with TopHat v2.1.0180 guided by the gene models. Transcript abundances were then estimated using the featureCounts program of the Subread package181. To compare gene expression levels among castes and between sexes, counts per million (CPM) were first calculated from the estimated transcript abundances. We retained genes with a CPM of at least 1 in at least three samples for subsequent analyses. CPM values were then normalized with the trimmed mean of M-values (TMM) algorithm in edgeR182. Differentially expressed genes among castes and between sexes were detected in each body part (head / thorax and abdomen) using a generalized linear model with two factors, namely, caste and sex, in edgeR, with the thresholds set at a false discovery rate (FDR) < 0.01 and a log2 fold change of the expression level > 1. An MDS plot was generated using the plotMDS function implemented in edgeR. RPKM (reads per kilobase per million mapped reads) values were calculated by dividing the CPM values by the length of the genes in kilobases.

Methylome analysis

W4–5 workers (old workers) and soldiers were collected from the colony as described in the previous section. To collect primary reproductives, female–male pairs and incipient colonies were prepared as described in the previous section. Colonies were sampled after 6 months, and primary reproductives were collected. The head of each individual was frozen in liquid nitrogen and stored at -80°C until use. For DNA extraction, we used 10 heads per category. We prepared 6 categories based on castes (reproductives, workers, soldiers) and sexes (males, females) [Supplementary Table 3]. Total DNA was isolated from each category using a QIAGEN Genomic-tip 20/G (Qiagen). The quantity and quality of the extracted DNA were checked using a NanoVue spectrophotometer (Cytiva) and a Qubit 2.0 fluorometer (Thermo Fisher Scientific).

Samples containing 200 ng of genomic DNA for each of the 6 categories were used to construct methylome libraries using a post-bisulfite adaptor tagging (PBAT) technique to perform whole-genome bisulfite sequencing (WGBS)183. The isolated R. speratus genomic DNA, together with 1% unmethylated lambda DNA as a control, was subjected to bisulfite conversion using an EZ DNA Methylation-Gold Kit (ZYMO Research, Irvine, CA) in accordance with the manufacturer's instructions. The bisulfite-converted templates were then subjected to adaptor tagging as described previously183. The generated libraries were assessed on the Agilent Bioanalyzer 2100 platform and quantified with a standard curve-based qPCR assay (KAPA Biosystems, Wilmington, MA). The final quality-ensured libraries were pooled and sequenced using an Illumina HiSeq 2500 sequencer for 101-bp single-end sequencing.

De-multiplexed raw reads were trimmed of sequencing adapters and low-quality ends (100) or too low (<10) coverage. Differentially methylated regions (DMRs) among castes were investigated using two pipelines. First, we applied ANOVA to the ratio of methylated sites per exon. After multiple comparison correction (FDR) and filtering by difference (>30.0%), no DMRs were detected. Next, we used BSmooth software186 to find DMRs. Bismark CpG report files were loaded into BSmooth, and CpG sites with sufficient read coverage (more than 2 in all samples) were used for smoothing and DMR analysis.
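The first DMR pipeline above (multiple-testing correction followed by a difference filter) can be sketched as follows. Several points are assumptions made for illustration, not details given in the text: the FDR correction is taken to be Benjamini-Hochberg, the significance cutoff is set to 0.05, the ">30.0%" filter is read as the absolute difference between the highest and lowest caste methylation ratios, and the ANOVA p-values are supplied as input rather than computed. All function and variable names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def candidate_dmrs(exon_levels, pvalues, fdr_cutoff=0.05, min_diff=0.30):
    """exon_levels: {exon_id: {caste: methylation ratio in [0, 1]}}.
    pvalues: {exon_id: ANOVA p-value} (assumed to be precomputed).
    Returns exon IDs passing both the FDR and the difference filter."""
    exon_ids = list(exon_levels)
    qvalues = benjamini_hochberg([pvalues[e] for e in exon_ids])
    hits = []
    for exon_id, q in zip(exon_ids, qvalues):
        levels = list(exon_levels[exon_id].values())
        if q < fdr_cutoff and max(levels) - min(levels) > min_diff:
            hits.append(exon_id)
    return hits
```

An empty return value from `candidate_dmrs` corresponds to the outcome reported above, where no exon passed both filters.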

Manual annotations and analyses of specific categories

Lipocalins. Sequence alignments of lipocalin-related Pfam domains (PF00061, PF08212, PF02087, PF07137, and PF02098) of were generated using the Pfam website (https://pfam.xfam.org)187 and downloaded. Profile hidden Markov models (HMMs) of the alignments were then obtained with hmmbuild of the HMMER package version 3.1 beta2 (http://hmmer.org/). We searched the gene models of each of the arthropod species (D. pulex, Z. nevadensis, R. speratus, M. natalensis, P. humanus, A. mellifera, D. melanogaster, and T. castaneum) with the HMMs using hmmsearch in the HMMER package188. We annotated significantly similar sequence matches from the hmmsearch as lipocalins. In addition, we searched the InterProScan results of the arthropod species mentioned above for proteins with the lipocalin-related PROSITE (PS00213) and Pfam domain (IDs mentioned above) signatures, and annotated them as lipocalins. For phylogenetic tree construction, protein sequences of the lipocalins annotated above, SOL1 of Hodotermopsis sjostedti (NCBI accession no. BAA87882) and its homologous sequence of Coptotermes formosanus (AGM32427) were aligned using the E-INS-i strategy in MAFFT v7.3.1189. The best-fit model of amino acid replacement for the alignment was determined with ProtTest v3.4190 based on the Bayesian information criterion191. A maximum likelihood phylogenetic tree was generated from the alignment using RAxML75 with 100 bootstrap replicates.

Cellulases. Amino acid sequences deduced from the R. speratus, Z. nevadensis, M. natalensis, A. mellifera, and D. melanogaster genomes were annotated for CAZy (Carbohydrate Active Enzymes) families based on the dbCAN database v3192. hmmscan against dbCAN was performed with the default settings, i.e., E-value cutoffs of < 1e–3 and < 1e–5 for alignments shorter and longer than 80 amino acids, respectively. Because results with E-value < 1e–3 frequently contained false positives based on BLASTP, these were manually removed. For phylogenetic tree construction, 38 amino acid sequences of GH1 (β-glucosidase) genes were aligned using the Clustal method with MEGA v6.06193, and gaps were excluded manually. The best-fit model of amino acid sequence evolution was determined using the model selection option implemented in MEGA. A maximum likelihood phylogenetic tree was constructed using MEGA with 1000 bootstrap replicates.

Lysozymes. To identify lysozyme genes in R. speratus, BLASTP searches were carried out against the R. speratus gene model Rspe OGS1.0 using lysozyme protein sequences of D. melanogaster, Anopheles gambiae, and Aedes aegypti retrieved from ImmunoDB194 as queries. Moreover, BLASTP searches using i-type lysozyme protein sequences of various insects195 were also performed against the R. speratus gene model. The E-value cutoff of these BLASTP searches was set at 1e–20.
In addition, we performed PfamScan196 for the protein domain PF00062 (c-type lysozyme/alpha-lactalbumin family).

We further searched proteome datasets from various arthropod species for lysozyme genes to perform a phylogenetic analysis of lysozyme genes. Proteome sequence data of the arthropods were downloaded from the following web databases: Ensembl197 for Bombyx mori, Nasonia vitripennis, and T. castaneum; VectorBase198 for A. gambiae, Ixodes scapularis, P. humanus, and Rhodnius prolixus; Hymenoptera Genome Database199 for Acromyrmex echinatior, A. mellifera, and Camponotus floridanus; AphidBase200 for A. pisum; wFleaBase201 for D. pulex; and FlyBase202 for D. melanogaster, together with Znev OGS v2.229 and Mnat OGS3. When the proteome datasets included isoforms derived from the same gene, we retained only the longest one for further analyses. BLASTP searches and PfamScan were performed on the proteome datasets of those species using the same method as for R. speratus. The protein sequences of the lysozymes identified in the arthropods were aligned with the E-INS-i strategy of the MAFFT program189. Then, a codon-based alignment of the coding sequences of the lysozymes was generated using the PAL2NAL program203 in accordance with the protein alignment. A phylogenetic tree was reconstructed from the codon-based alignment with a maximum likelihood approach based on GTR + gamma using RAxML204. The codon-based alignment was partitioned into three codon positions, and a separate parameter set was estimated for each position. One thousand bootstrap replicates were made to assess the branch support.

GGPP synthases. We performed TBLASTX with the nucleotide sequences of A. pisum GGPP synthase (accession no. XP_008184262.1) as a query against the R. speratus gene models (an older version of OGS1.0) and the final genome assembly.
We found that GGPP synthase homologs were tandemly duplicated on scaffold 31, and the identified gene models were manually curated according to the homologous DNA and deduced amino acid sequences. We identified the homologous synteny blocks in the Z. nevadensis and M. natalensis genomes (scaffolds 797 and 103, respectively), and conservation of synteny between the termite genomes was revealed using dot plots. For phylogenetic tree construction, 26 amino acid sequences of GGPP synthase genes, including 13 homologs of R. speratus, were aligned using the MUSCLE method with MEGA v7205. The best-fit model of amino acid sequence evolution was determined using the model selection option implemented in MEGA. A maximum likelihood phylogenetic tree was constructed using MEGA with 100 bootstrap replicates.

For molecular evolutionary analysis, PRANK v170427206 was used to align the amino acid sequences of the 13 homologs of R. speratus and those of six other insects, A. pisum, A. mellifera, T. castaneum, D. melanogaster, Z. nevadensis, and M. natalensis, and then to back-translate the alignment to coding sequences. Gblocks v0.91b207 was used to eliminate poorly aligned positions. The aligned 289 aa (867 bp) sites were subjected to downstream analyses. A gene tree was constructed using RAxML ver. 8.2.12.204 with the following parameters: -f a -# 100 -m PROTGAMMAAUTO. To test for signatures of positive selection acting on the lineages leading to the R. speratus GGPP synthase paralogs, we compared the likelihood scores of selection models implemented in CODEML in the PAML package v4.9208, using likelihood ratio tests. We used the branch-site test of positive selection (branch-site model A), in which the branches of interest are treated as foreground, allowing three classes of sites (0<w<1, w=1, w>1), and the others as background, with two classes of sites (w=0, w=1), by specifying the following parameters: fix_omega = 0, omega = 1 and NSsites = 2. The likelihood ratio test statistics were compared against the χ² distribution with df = 1 to calculate P-values, followed by multiple testing correction using the method of Hommel implemented in R.

TY family. Secretion signal sequences in TY family proteins of R. speratus were predicted using the SignalP v4.1 program209. The TY family homologs in Z. nevadensis and M. natalensis were identified based on the ortholog analysis described above and synteny information. Multiple alignments of TY family homologs from termites were constructed using MUSCLE software210. Non-synonymous (Ka) and synonymous (Ks) substitution rates of pairwise paralogs were calculated with KaKs_Calculator v2.0211 for codon-aligned sequences generated by the tranalign program included in the EMBOSS suite212.
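The likelihood ratio test used in the GGPP synthase branch-site analysis above can be written out in a few lines. This is a minimal sketch, assuming that the log-likelihoods of the null and alternative models have already been read from the CODEML output; the function name is hypothetical, and the Hommel correction applied in R is not reproduced here. For one degree of freedom, the χ² survival function has the closed form erfc(sqrt(x/2)), so no external library is needed.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt):
    """Likelihood ratio test against the chi-square distribution with df = 1.

    Returns (test statistic, P-value), where the statistic is
    2 * (lnL_alt - lnL_null), floored at zero.
    """
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Survival function of chi-square with df = 1: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```

A statistic near 3.84 corresponds to P close to 0.05, which is a convenient sanity check before applying the multiple testing correction.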

RNA in situ hybridization

All castes examined except for female primary reproductives (queens) were collected from mature colonies (#9 and #10). Queens were sampled 4 months after incipient colony foundation, as described in the previous section. To prepare RNA probes, specific primers for three lipocalins (RS008881, RS008882, and RS008823), two GH1s (β-glucosidases; RS004136 and RS004624) and one GGPP synthase (RS100016) were designed using Primer3Plus213 (Supplementary Table 6). Total RNA for probe synthesis was extracted from the whole bodies of female neotenics (RS008881), queens (RS004624), workers (RS008882 and RS004136), and soldiers (RS008823 and RS100016) using ISOGEN II (Nippon Gene, Tokyo, Japan). After treatment with DNase I (Takara Bio, Shiga, Japan), the quality and quantity of total RNA were measured using a NanoVue spectrophotometer (Cytiva). cDNA was synthesized using a High-Capacity cDNA Reverse Transcription Kit (Thermo Fisher Scientific). The PCR products from specific primers (Supplementary Table 6) were purified using a QIAquick Gel Extraction Kit (Qiagen) and subcloned into a pGEM-T Easy vector (Promega, Madison, WI). The inserted DNA was amplified, and the PCR products were sequenced using a BigDye Terminator v. 3.1 Cycle Sequencing Kit and an automatic DNA Sequencer 3130 Genetic Analyzer (Thermo Fisher Scientific). Plasmids with the targeted fragments were extracted using a GenElute Plasmid Miniprep Kit (Sigma-Aldrich, St. Louis, MO). The digoxigenin (DIG)-labeled sense or antisense RNA probes were produced using a DIG RNA Labeling Kit (SP6/T7) (Sigma-Aldrich), and purified with Ethachinmate (Nippon Gene).

Prior to cryosectioning, the abdomens of queens [RS008881 (n = 4) and RS004624 (n = 3)], heads and thoraxes of workers [RS008882 (n = 3) and RS004136 (n = 3)], and heads of soldiers [RS008823 (n = 3) and RS100016 (n = 2)] were dissected from the bodies and fixed with 4% paraformaldehyde in phosphate-buffered saline. Samples were embedded in TissueTek O.C.T. Compound (Sakura Finetek USA Inc., Torrance, CA). Cryosections (10 µm) were collected on CREST-coated glass slides (Matsunami, Osaka, Japan) using a CM1510S cryostat (Leica Biosystems, Nussloch, Germany). The sections were hybridized with DIG-labeled sense or antisense RNA probes using In situ hybridization reagents (Nippon Gene) in accordance with the instructions provided by the manufacturer. Immunocytochemical detection of DIG-labeled RNA was performed using a DIG Nucleic Acid Detection Kit (Roche, Grenzacherstrasse, Basel, Switzerland) in accordance with the manufacturer's instructions. The images were captured using a Biozero microscope (Keyence, Tokyo, Japan).


Data availability

Data from whole-genome sequencing, transcriptome sequencing, and methylome sequencing have been deposited in the DDBJ database under BioProject accessions PRJDB2984, PRJDB5589 and PRJDB11323, respectively. The analyzed data, including the genome assembly, gene predictions, annotations, and gene expression data, are available through FigShare (doi: m9.figshare.14267342, doi: 10.6084/m9.figshare.14267381, doi: 10.6084/m9.figshare.14267498). The R. speratus genome browser is available at http://www.termite.nibb.info/retsp/.

Code availability

All software used in this study for data analyses is open source. Custom R, Ruby and Shell scripts were deposited in GitHub (https://github.com/termiteg/retsp_genome_paper).


Fig. S1. Global view of DNA methylation in the Reticulitermes speratus genome. (a) A snapshot of the genome browser showing the intensive gene body methylation. The top track indicates gene models with exon-intron structures. The orange barplots in the second (soldier female) and third (worker female) tracks indicate the methylation levels. The khaki barplot in the bottom track shows the %CpG observed / expected value, which shows a notable negative correlation with the methylation levels. (b) Methylation levels around start codons (left) and stop codons (right) of all genes were averaged and plotted. (c) Methylation levels by gene. The methylation level was calculated for each gene and summarized in a histogram, which shows a remarkable bimodal distribution (left). The same analysis conducted for only the first exons (right) indicated that the first exons are devoid of methylation in most genes. (d) Methylation level by genomic context. CpG sites were partitioned by genomic context, i.e., gene (exons and introns), protein-coding region (cds), intron, intergenic, 2k-upstream from start codon, 2k-downstream from stop codon, repeats, miRNA, piRNA, CpG island and acceptor-donor site, and the %CpG was calculated for each category.
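For readers unfamiliar with the CpG observed / expected value shown in panel (a), the following sketch computes it for a single sequence using a commonly used definition, (number of CpG × sequence length) / (number of C × number of G). That this exact formula underlies the browser track is an assumption, and the function name is hypothetical.

```python
def cpg_observed_expected(sequence):
    """CpG observed/expected ratio of a DNA sequence.

    Uses the common definition (CpG count * length) / (C count * G count);
    returns 0.0 when the sequence contains no C or no G.
    """
    seq = sequence.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if c_count == 0 or g_count == 0:
        return 0.0
    return cpg_count * len(seq) / (c_count * g_count)
```

Because methylated cytosines in CpG dinucleotides tend to be lost by deamination over evolutionary time, heavily methylated regions are expected to show low values of this ratio, in line with the negative correlation noted in the legend.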


Fig. S2. Differential methylation pattern among castes. (a) Comparison of CpG levels among castes. No significant differences were detected. (b) Methylation pattern correlation among castes. Methylation levels were calculated by sliding a 200-bp window for each caste, and the patterns were compared among castes. In every pairwise comparison, Pearson's correlation coefficients were close to 1, suggesting that the methylation patterns are indistinguishable among castes. Male worker (WM), female worker (WF1-2), male soldier (SM), female soldier (SF1-2), king (K), queen (Q). (c, d) Caste-biased genes are unmethylated. The comparison of head transcriptomes between soldiers and workers is shown as a representative. The other comparisons of castes or body parts showed similar patterns. In the MA plot comparing the transcriptomes of soldiers and workers (c), highly methylated genes (blue; >X%) tended to be plotted around logFC=0 (no changes between castes), while lowly methylated genes (red) were plotted away from the logFC=0 line. When genes are categorized into differentially expressed genes (DEG) and non-DE genes (non-DEG), the methylation levels are significantly different (d).
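The window-based comparison in panel (b) can be sketched as below. The sketch makes simplifying assumptions that are not stated in the legend: windows are fixed and non-overlapping rather than sliding, the per-window methylation level is the pooled fraction of methylated reads, and only windows covered in both castes are compared. The function names and the input format are hypothetical.

```python
import math


def window_levels(sites, window=200):
    """sites: iterable of (position, methylated_reads, total_reads) per CpG.
    Returns {window_index: pooled methylated fraction} for fixed windows."""
    methylated, total = {}, {}
    for position, meth_reads, all_reads in sites:
        index = position // window
        methylated[index] = methylated.get(index, 0) + meth_reads
        total[index] = total.get(index, 0) + all_reads
    return {i: methylated[i] / total[i] for i in total if total[i] > 0}


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def caste_correlation(sites_a, sites_b, window=200):
    """Correlate per-window methylation levels of two castes over shared windows."""
    levels_a = window_levels(sites_a, window)
    levels_b = window_levels(sites_b, window)
    shared = sorted(set(levels_a) & set(levels_b))
    return pearson([levels_a[i] for i in shared], [levels_b[i] for i in shared])
```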


Fig. S3. Local synteny breaks caused by tandem gene duplications. (a–c) Dot plots comparing the syntenic regions of the Reticulitermes speratus genome and the Macrotermes natalensis genome. Each pair of syntenic sequences was compared with BLASTN, and the aligned fragments were plotted with colors according to the bit scores. Scaffold_31 (a), scaffold_186 (b), and scaffold_154 (c) of the R. speratus assembly are shown.


Fig. S4. Comparison of the number of CAZyme-encoding genes among 5 insect species. The numbers of genes that belong to the Glycoside Hydrolase (GH), Glycosyl Transferase (GT), Carbohydrate Esterase (CE), Auxiliary Activity (AA) and Carbohydrate-Binding Module (CBM) families are compared among Reticulitermes speratus, Zootermopsis nevadensis, Macrotermes natalensis, Apis mellifera and Drosophila melanogaster. Bar plots of termites are colored in red.


Fig. S5. Glycoside hydrolase family 9 (GH9) in the Reticulitermes speratus genome. (a) Maximum likelihood (ML) tree of GH9 homologs based on the amino acid sequences obtained with a Le_Gascuel_2008 + Gamma model. The bootstrap percentages of 1000 ML trees in which the associated taxa clustered together are shown next to the nodes. The analysis involved 53 amino acid sequences. All positions containing gaps and missing data were eliminated. There were a total of 312 positions in the final dataset. Branches leading to clade A and clade B, which are 2 discrete GH9 groups of insects, are marked in blue and yellow, respectively. Two groups derived from termites are also marked. (b) Caste-biased expression patterns of GH9 family genes. Expression levels are indicated as RPKM calculated from RNA-sequencing analysis. Orange and blue points indicate females and males, respectively.


Fig. S6. In situ hybridization with lipocalin mRNA sense probes (negative control). (a) Vertical cryosection of the queen abdomen subjected to in situ hybridization with a sense DIG-labeled RS008881 mRNA probe. Arrowheads indicate the accessory gland cell layer (stained dark with an antisense probe in Fig. 3d). Asterisk indicates the spermatheca containing sperm. Bar = 0.2 mm. (b) Vertical cryosection of the soldier head subjected to in situ hybridization with a sense DIG-labeled RS008823 mRNA probe. The front of the head is on the left side. Arrowhead indicates the gland cell layer surrounding the frontal gland reservoir (R) (stained dark with an antisense probe in Fig. 3e). Asterisk indicates the brain. Bar = 0.1 mm. (c) Left to right: vertical cryosection of the worker antenna, horizontal cryosection of the worker labial palp (right palp) and maxillary palp [the last segment of left (upper) and right (lower) palp] subjected to in situ hybridization with a sense DIG-labeled RS008882 mRNA probe. Bar = 0.1 mm.


Fig. S7. In situ hybridization with GH1 and GGPPS mRNA sense probes (negative control). (a) Vertical cryosection of the worker thorax subjected to in situ hybridization with a sense DIG-labeled RS004136 mRNA probe. Magnified view is shown in the right panel. Arrowheads indicate the salivary gland cells (stained dark with an antisense probe in Fig. 4d). Bar = 0.2 (left) and 0.1 (right) mm. (b) Vertical cryosection of the queen abdomen subjected to in situ hybridization with a sense DIG-labeled RS004624 mRNA probe. Magnified view is shown in the lower panel. Arrowheads indicate the accessory gland cell layers (stained dark with an antisense probe in Fig. 4f). Bar = 0.2 (upper) and 0.1 (lower) mm. (c) Vertical cryosection of the soldier head subjected to in situ hybridization with a sense DIG-labeled RS100016 mRNA probe. Magnified view is shown in the lower panel. Arrowhead indicates the gland cell layer surrounding the frontal gland reservoir (R) (stained dark with an antisense probe in Fig. 6d). Asterisk indicates the brain. Bar = 0.2 (upper) and 0.05 (lower) mm.


Fig. S8. Comparison of the numbers of immune-related genes among 8 insect species. Species examined are Anopheles gambiae, Aedes aegypti, Apis mellifera, Bombus terrestris, Drosophila melanogaster, Nasonia vitripennis, Reticulitermes speratus and Tribolium castaneum. Gene numbers of D. melanogaster, An. gambiae and Ae. aegypti were obtained from ImmunoDB (Waterhouse et al. 2007), and those of N. vitripennis, T. castaneum, Ap. mellifera and B. terrestris were from Barribeau et al. (2015). Genes belonging to the Caspase and Caspase Activator families were combined into Caspase, and those belonging to the Toll receptor and Toll pathway families into Toll pathway.


Fig. S9. Expression levels of sex determination genes among royals (reproductives), soldiers and workers in Reticulitermes speratus. Expression levels are indicated as RPKM calculated from RNA-sequencing analysis. Orange and blue points indicate females and males, respectively.


Fig. S9. Continued.


Fig. S10. Expression levels of histone modifying enzyme genes among royals (reproductives), soldiers and workers in Reticulitermes speratus. Expression levels are indicated as RPKM calculated from RNA-sequencing analysis. Orange and blue points indicate females and males, respectively. KDM1A (RS005465) encodes a histone demethylase, SETDB1 (RS011823) and DOT1L (RS009425) encode histone methyltransferases, and SIRT6 (RS010824) and SIRT7 (RS012147) encode histone deacetylases. These genes show significant differences among castes in head and/or thorax and abdomen samples (*FDR < 0.05).


Fig. S11. Expression levels of DNA methylation-related genes among royals (reproductives), soldiers and workers in Reticulitermes speratus. Expression levels are indicated as RPKM calculated from RNA-sequencing analysis. Orange and blue points indicate females and males, respectively. All 4 genes show significant differences among castes in head and/or thorax and abdomen samples (*FDR < 0.05).


Fig. S12. Expression levels of sensory neuron membrane protein (SNMP) genes in the heads and bodies (thorax + abdomen) of Reticulitermes speratus. Expression levels are indicated as RPKM calculated from RNA-sequencing analysis.


Fig. S13. Expression levels of sensory neuron membrane protein (SNMP) genes among the heads of royals (reproductives), soldiers and workers in Reticulitermes speratus. Expression levels are indicated as RPKM calculated from RNA-sequencing analysis. Orange and blue points indicate females and males, respectively.


Fig. S14. Maximum likelihood (ML) tree of chemosensory protein (CSP) homologs based on the amino acid sequences obtained with the PROTGAMMALG model. Sequences from Drosophila melanogaster (Dmel), Apis mellifera (Amel), Acyrthosiphon pisum (Apis), Pediculus humanus (Phum), Zootermopsis nevadensis (Znev) and Reticulitermes speratus (Rspe) were included. Numbers above branches indicate bootstrap probabilities higher than 50% based on 100 ML trees.


Fig. S15. Expression levels of chemosensory protein (CSP) genes in the heads and bodies (thorax + abdomen) of Reticulitermes speratus. Expression levels are indicated as RPKM calculated from RNA-sequencing analysis.


Fig. S16. Expression levels of chemosensory protein (CSP) genes among the heads of royals (reproductives), soldiers and workers in Reticulitermes speratus. Expression levels are indicated as RPKM calculated from RNA-sequencing analysis. Orange and blue points indicate females and males, respectively.


Fig. S17. Expression levels of biosynthetic genes of biogenic amines among royals (reproductives), soldiers and workers in Reticulitermes speratus. Expression levels are indicated as RPKM calculated from RNA-sequencing analysis. Orange and blue points indicate females and males, respectively. All 6 genes show significant differences among castes in head and/or thorax and abdomen samples (*FDR < 0.05).


Fig. S18. Expression levels of receptor genes of biogenic amines among royals (reproductives), soldiers and workers in Reticulitermes speratus. Expression levels are indicated as RPKM calculated from RNA-sequencing analysis. Orange and blue points indicate females and males, respectively. All 6 genes show significant differences among castes in head and/or thorax and abdomen samples (*FDR < 0.05).


Fig. S19. Expression levels of neuropeptide genes among royals (reproductives), soldiers and workers in Reticulitermes speratus. Expression levels are indicated as RPKM calculated from RNA-sequencing analysis. Orange and blue points indicate females and males, respectively. All 15 genes show significant differences among castes in head and/or thorax and abdomen samples (*FDR < 0.05).


Fig. S19. Continued.


Fig. S20. Amino acid alignment of the JH receptor gene (Methoprene-tolerant; Met). Species (gene IDs) examined are Reticulitermes speratus (RS010120), Coptotermes formosanus (A: GFG37549.1, B: GFG37551.1), Zootermopsis nevadensis (BAR92640.1), Blattella germanica (CDO33887.1), Diploptera punctata (AIM47235.1), Locusta migratoria (AHA42531.1), Tribolium castaneum (NP_001092812.1, XP_008191439.1), Bombyx mori (NP_001108458.1), Harpegnathos saltator (EFN85711.1), Apis mellifera (XP_395005.5), Drosophila melanogaster (NP_001285132.1) and Daphnia pulex (BAM83853.1).


Fig. S21. Expression levels of JH biosynthetic genes (early steps) among royals (reproductives), soldiers and workers in Reticulitermes speratus. Expression levels are indicated as RPKM calculated from RNA-sequencing analysis. Orange and blue points indicate females and males, respectively. All 7 genes show significant differences among castes in head and/or thorax and abdomen samples (*FDR < 0.05).


Fig. S22. Expression levels of JH biosynthetic genes (late steps) among royals (reproductives), soldiers and workers in Reticulitermes speratus. Expression levels are indicated as RPKM calculated from RNA-sequencing analysis. Orange and blue points indicate females and males, respectively. All 6 genes show significant differences among castes in head and/or thorax and abdomen samples (*FDR < 0.05).


Fig. S23. Expression levels of JH signaling and neuropeptide related genes among royals (reproductives), soldiers and workers in Reticulitermes speratus. Expression levels are indicated as RPKM calculated from RNA-sequencing analysis. Orange and blue points indicate females and males, respectively. All 6 genes show significant differences among castes in head and/or thorax and abdomen samples (*FDR < 0.05).

Fig. S24. Expression levels of JH binding and degradation genes among royals (reproductives), soldiers and workers in Reticulitermes speratus. Expression levels are indicated as RPKM calculated from RNA-sequencing analysis. Orange and blue points indicate females and males, respectively. All 7 genes show significant differences among castes in the head and/or the thorax and abdomen samples (*FDR < 0.05).

Fig. S25. Maximum likelihood (ML) tree of JH esterase (JHE) homologs based on amino acid sequences. The tree with the highest log likelihood (-48105.8335) under the Dayhoff model is shown. Sequences from Reticulitermes speratus (RS), R. flavipes (Rf), Zootermopsis nevadensis (Znev), Macrotermes natalensis (Mnat), M. barneyi (Mbar), Tribolium castaneum (Tc), Apis mellifera (Am), Pandalopsis japonica (Pjap) and Drosophila melanogaster (FBpp) are included. A total of 6 acetylcholinesterase sequences are used as outgroups. We also estimated the tree topology with the neighbor-joining (NJ) method based on the JTT+G model. Numbers above or below each branch indicate bootstrap probabilities (BP) greater than 50% (100 and 5,000 replicates for the ML and NJ methods, respectively). Only one number is given if the BP values were identical at that node. An asterisk indicates a node that was not supported by the NJ method.

Fig. S26. Expression levels of JH degradation genes (JHEs) among royals (reproductives), soldiers and workers in Reticulitermes speratus. Expression levels are indicated as RPKM calculated from RNA-sequencing analysis. Orange and blue points indicate females and males, respectively. All 12 genes show significant differences among castes in the head and/or the thorax and abdomen samples (*FDR < 0.05).

Fig. S26. Continued.

Fig. S27. Expression levels of ecdysone synthesis genes among royals (reproductives), soldiers and workers in Reticulitermes speratus. Expression levels are indicated as RPKM calculated from RNA-sequencing analysis. Orange and blue points indicate females and males, respectively. All 7 genes show significant differences among castes in the head and/or the thorax and abdomen samples (*FDR < 0.05).

Fig. S28. Expression levels of ecdysone receptor and signaling genes among royals (reproductives), soldiers and workers in Reticulitermes speratus. Expression levels are indicated as RPKM calculated from RNA-sequencing analysis. Orange and blue points indicate females and males, respectively. All 7 genes show significant differences among castes in the head and/or the thorax and abdomen samples (*FDR < 0.05).

Fig. S29. Neighbor-joining (NJ) tree of Ras85D (Ras1) homologs based on amino acid sequences, obtained with the Poisson model. The percentages of 1000 bootstrap NJ trees in which the associated taxa clustered together are shown next to the nodes. The Ras1 homologs of R. speratus are RS013615, RS000933 and RS003738. The Ras64B (Ras2) and Rap1 (Ras3) homologs of R. speratus are RS007784 and RS000601, respectively. Amino acid sequences of Periplaneta americana and Cryptocercus punctulatus were obtained from contig sequences assembled using Trinity (Grabherr et al. 2011) (transcriptome data: DRA001254 and DRA004598, respectively).
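For readers unfamiliar with the Poisson model named in the NJ tree legends, the sketch below shows how a Poisson-corrected distance between two aligned amino acid sequences is commonly computed: the proportion of differing sites p is converted to d = -ln(1 - p). The function names, the pairwise exclusion of gapped sites and the example peptides are assumptions made for illustration; they do not describe the software or settings used to build the trees in this study.

```python
import math


def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap are skipped (pairwise deletion,
    an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def poisson_distance(seq_a: str, seq_b: str) -> float:
    """Poisson-corrected amino acid distance, d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        raise ValueError("distance is undefined when all sites differ")
    return -math.log(1.0 - p)


# Hypothetical aligned peptides differing at 1 of 10 sites.
d = poisson_distance("MTEYKLVVVG", "MTEYKLVVLG")
print(f"Poisson-corrected distance: {d:.4f}")
```

A matrix of such pairwise distances is the input from which a neighbor-joining tree is built; the correction accounts for multiple substitutions at the same site, so d is always slightly larger than p.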

Fig. S30. Neighbor-joining (NJ) tree of insulin receptor (InR) homologs based on amino acid sequences, obtained with the Poisson model. The percentages of 1000 bootstrap NJ trees in which the associated taxa clustered together are shown next to the nodes. Nucleotide sequences, except for those of Reticulitermes speratus, were taken from Xu and Zhang (2015). The InR homologs of R. speratus are RS000922, RS007018 and RS007019.

Fig. S31. Expression levels of insulin/insulin-like signaling pathway genes among royals (reproductives), soldiers and workers in Reticulitermes speratus. Expression levels are indicated as RPKM calculated from RNA-sequencing analysis using the head (a) and the thorax and abdomen (b) samples. Orange and blue points indicate females and males, respectively.

Fig. S32. Expression levels of toolkit genes among royals (reproductives), soldiers and workers in Reticulitermes speratus. Expression levels are indicated as RPKM calculated from RNA-sequencing analysis. Orange and blue points indicate females and males, respectively. All 19 genes show significant differences among castes in the head and/or the thorax and abdomen samples (*FDR < 0.05).

Fig. S32. Continued.

Fig. S32. Continued.

Fig. S33. Gene expression levels of antimicrobial peptides among royals (reproductives), soldiers and workers in Reticulitermes speratus. Expression levels are indicated as RPKM calculated from RNA-sequencing analysis. Orange and blue points indicate females and males, respectively.

Fig. S34. Maximum likelihood (ML) tree of cytochrome P450 monooxygenase (CYP) homologs based on amino acid sequences, obtained with the LG+I+G model. The percentages of 100 bootstrap ML trees in which the associated taxa clustered together are marked as circles on the nodes. Sequences from 5 species of cockroaches and termites are indicated by branches of different colors. Gene IDs of termites are shown in Supplementary Table 28. Amino acid sequences of Periplaneta americana and Cryptocercus punctulatus were obtained from contig sequences assembled using Trinity (Grabherr et al. 2011) (transcriptome data: DRA001254 and DRA004598, respectively).

Fig. S35. Maximum likelihood (ML) tree of glutathione S-transferase (GST) homologs based on amino acid sequences, obtained with the JTT+I+G model. The percentages of 100 bootstrap ML trees in which the associated taxa clustered together are marked as circles on the nodes. Sequences from 5 species of cockroaches and termites are indicated by branches of different colors. Gene IDs of termites are shown in Supplementary Table 28. Amino acid sequences of Periplaneta americana and Cryptocercus punctulatus were obtained from contig sequences assembled using Trinity (Grabherr et al. 2011) (transcriptome data: DRA001254 and DRA004598, respectively).

Fig. S36. Maximum likelihood (ML) tree of carboxylesterase (CCE) homologs based on amino acid sequences, obtained with the JTT+I+G model. The percentages of 100 bootstrap ML trees in which the associated taxa clustered together are marked as circles on the nodes. Sequences from 5 species of cockroaches and termites are indicated by branches of different colors. Gene IDs of termites are shown in Supplementary Table 28. Amino acid sequences of Periplaneta americana and Cryptocercus punctulatus were obtained from contig sequences assembled using Trinity (Grabherr et al. 2011) (transcriptome data: DRA001254 and DRA004598, respectively).

Table S1. Termite colonies used for next-generation sequencing.
Analysis | Colony no. | Location collected | Date collected | Castes
genome | #1 | Furudo, Toyama | November, 2013 | female secondary reproductives (nymphoids)
RNAseq | #2 | Furudo, Toyama | September, 2014 | worker and soldier
RNAseq | #3 | Furudo, Toyama | September, 2014 | worker and soldier
RNAseq | #4 | Furudo, Toyama | September, 2014 | worker and soldier
RNAseq | #5 | Furudo, Toyama | April-May, 2014 | reproductive
RNAseq | #6 | Furudo, Toyama | April-May, 2014 | reproductive
RNAseq | #7 | Furudo, Toyama | April-May, 2014 | reproductive
PBAT | #8 | Furudo, Toyama | October, 2014 | worker and soldier
PBAT | #5, #7 | Furudo, Toyama | May, 2014 | reproductive

Table S2. Reticulitermes speratus RNA sequencing libraries.
Library name | SRA ID | Sample description | Caste | Sex | Body part | Colony*
Rspe_RMH1 | DRR090830 | Reticulitermes speratus, royal, male, head, rep1 | reproductive | male | head | #5
Rspe_SFTA3 | DRR090831 | Reticulitermes speratus, soldier, female, thorax and abdomen, rep3 | soldier | female | thorax and abdomen | #4
Rspe_RMTA1 | DRR090832 | Reticulitermes speratus, royal, male, thorax and abdomen, rep1 | reproductive | male | thorax and abdomen | #5
Rspe_RMH3 | DRR090833 | Reticulitermes speratus, royal, male, head, rep3 | reproductive | male | head | #6
Rspe_RFH1 | DRR090834 | Reticulitermes speratus, royal, female, head, rep1 | reproductive | female | head | #6
Rspe_RMTA3 | DRR090835 | Reticulitermes speratus, royal, male, thorax and abdomen, rep3 | reproductive | male | thorax and abdomen | #6
Rspe_RFTA1 | DRR090836 | Reticulitermes speratus, royal, female, thorax and abdomen, rep1 | reproductive | female | thorax and abdomen | #6
Rspe_RFH3 | DRR090837 | Reticulitermes speratus, royal, female, head, rep3 | reproductive | female | head | #7
Rspe_WMH2 | DRR090838 | Reticulitermes speratus, worker, male, head, rep2 | worker | male | head | #3
Rspe_RFTA3 | DRR090839 | Reticulitermes speratus, royal, female, thorax and abdomen, rep3 | reproductive | female | thorax and abdomen | #7
Rspe_WMTA2 | DRR090840 | Reticulitermes speratus, worker, male, thorax and abdomen, rep2 | worker | male | thorax and abdomen | #3
Rspe_WFH2 | DRR090841 | Reticulitermes speratus, worker, female, head, rep2 | worker | female | head | #3
Rspe_WFTA2 | DRR090842 | Reticulitermes speratus, worker, female, thorax and abdomen, rep2 | worker | female | thorax and abdomen | #3
Rspe_SMH2 | DRR090843 | Reticulitermes speratus, soldier, male, head, rep2 | soldier | male | head | #3
Rspe_WMH1 | DRR090844 | Reticulitermes speratus, worker, male, head, rep1 | worker | male | head | #2
Rspe_RFTA2 | DRR090845 | Reticulitermes speratus, royal, female, thorax and abdomen, rep2 | reproductive | female | thorax and abdomen | #7
Rspe_SMTA2 | DRR090846 | Reticulitermes speratus, soldier, male, thorax and abdomen, rep2 | soldier | male | thorax and abdomen | #3
Rspe_SFH2 | DRR090847 | Reticulitermes speratus, soldier, female, head, rep2 | soldier | female | head | #3
Rspe_SFTA2 | DRR090848 | Reticulitermes speratus, soldier, female, thorax and abdomen, rep2 | soldier | female | thorax and abdomen | #3
Rspe_RMH2 | DRR090849 | Reticulitermes speratus, royal, male, head, rep2 | reproductive | male | head | #5
Rspe_RMTA2 | DRR090850 | Reticulitermes speratus, royal, male, thorax and abdomen, rep2 | reproductive | male | thorax and abdomen | #5
Rspe_RFH2 | DRR090851 | Reticulitermes speratus, royal, female, head, rep2 | reproductive | female | head | #7
Rspe_WMTA1 | DRR090852 | Reticulitermes speratus, worker, male, thorax and abdomen, rep1 | worker | male | thorax and abdomen | #2
Rspe_WMH3 | DRR090853 | Reticulitermes speratus, worker, male, head, rep3 | worker | male | head | #4
Rspe_WFH1 | DRR090854 | Reticulitermes speratus, worker, female, head, rep1 | worker | female | head | #2
Rspe_WMTA3 | DRR090855 | Reticulitermes speratus, worker, male, thorax and abdomen, rep3 | worker | male | thorax and abdomen | #4
Rspe_WFTA1 | DRR090856 | Reticulitermes speratus, worker, female, thorax and abdomen, rep1 | worker | female | thorax and abdomen | #2
Rspe_WFH3 | DRR090857 | Reticulitermes speratus, worker, female, head, rep3 | worker | female | head | #4
Rspe_SMH1 | DRR090858 | Reticulitermes speratus, soldier, male, head, rep1 | soldier | male | head | #2
Rspe_WFTA3 | DRR090859 | Reticulitermes speratus, worker, female, thorax and abdomen, rep3 | worker | female | thorax and abdomen | #4
Rspe_SMTA1 | DRR090860 | Reticulitermes speratus, soldier, male, thorax and abdomen, rep1 | soldier | male | thorax and abdomen | #2
Rspe_SMH3 | DRR090861 | Reticulitermes speratus, soldier, male, head, rep3 | soldier | male | head | #4
Rspe_SFH1 | DRR090862 | Reticulitermes speratus, soldier, female, head, rep1 | soldier | female | head | #2
Rspe_SMTA3 | DRR090863 | Reticulitermes speratus, soldier, male, thorax and abdomen, rep3 | soldier | male | thorax and abdomen | #4
Rspe_SFTA1 | DRR090864 | Reticulitermes speratus, soldier, female, thorax and abdomen, rep1 | soldier | female | thorax and abdomen | #2
Rspe_SFH3 | DRR090865 | Reticulitermes speratus, soldier, female, head, rep3 | soldier | female | head | #4
*corresponds to the colony # in Supplementary Table 1.

Table S3. Reticulitermes speratus Illumina libraries for genome sequencing and whole-genome bisulfite sequencing.
Library name | SRA ID | Sample description | Caste | Sex | Body part | Colony*
Genome
Rspe_GPE250 | DRR000000 | Paired-end 250 bp insert | female secondary reproductives (nymphoids) | female | whole body; gut and ovaries are excluded | #1
Rspe_GPE800 | DRR000000 | Paired-end 800 bp insert | female secondary reproductives (nymphoids) | female | whole body; gut and ovaries are excluded | #1
Rspe_GMP3k | DRR252502 | Mate-pair 3k bp insert | female secondary reproductives (nymphoids) | female | whole body; gut and ovaries are excluded | #1
Rspe_GMP5k | DRR252503 | Mate-pair 5k bp insert | female secondary reproductives (nymphoids) | female | whole body; gut and ovaries are excluded | #1
Rspe_GMP8k | DRR252504 | Mate-pair 8k bp insert | female secondary reproductives (nymphoids) | female | whole body; gut and ovaries are excluded | #1
Rspe_GMP10k | DRR252505 | Mate-pair 10k bp insert | female secondary reproductives (nymphoids) | female | whole body; gut and ovaries are excluded | #1
Methylome
Rspe_PBAT | DRR000000 | PBAT Primary reproductive, Male, Head | reproductive | male | head | #5
Rspe_PBAT | DRR000000 | PBAT Primary reproductive, Female, Head | reproductive | female | head | #7
Rspe_PBAT | DRR000000 | PBAT Soldier, Male, Head | soldier | male | head | #8
Rspe_PBAT | DRR000000 | PBAT Soldier, Female, Head | soldier | female | head | #8
Rspe_PBAT | DRR000000 | PBAT Worker, Male, Head | worker | male | head | #8
Rspe_PBAT | DRR000000 | PBAT Worker, Female, Head | worker | female | head | #8
*corresponds to the colony # in Supplementary Table 1.

Table S4. Over-represented Gene Ontology terms in caste-DEG.
Columns (left to right): body part; category; GO term; % in caste-biased genes; % in all; q-value; description.
Thorax + abdomen

BP GO:0000270 0.34% 0.08% 0.02242181 peptidoglycan metabolic process

GO:0005975 3.85% 2.54% 0.009769257 carbohydrate metabolic process

GO:0006022 1.69% 0.80% 0.003451719 aminoglycan metabolic process

GO:0006027 0.53% 0.15% 0.009769257 glycosaminoglycan catabolic process

GO:0006030 1.16% 0.56% 0.034011861 chitin metabolic process

GO:0006040 1.35% 0.71% 0.040302871 amino sugar metabolic process

GO:0006629 4.43% 2.84% 0.002648555 lipid metabolic process

GO:0006633 1.01% 0.44% 0.017959474 fatty acid biosynthetic process

GO:0006720 1.20% 0.44% 0.00066074 isoprenoid metabolic process

GO:0006721 0.87% 0.22% 4.46E-05 terpenoid metabolic process

GO:0006726 0.39% 0.08% 0.004304289 eye pigment biosynthetic process

GO:0008299 0.92% 0.29% 0.00066074 isoprenoid biosynthetic process

GO:0008610 2.17% 1.23% 0.009769257 lipid biosynthetic process

GO:0009253 0.34% 0.06% 0.004446678 peptidoglycan catabolic process

GO:0016063 0.29% 0.05% 0.012532405 rhodopsin biosynthetic process

GO:0016108 0.24% 0.05% 0.047939137 tetraterpenoid metabolic process

GO:0016114 0.58% 0.16% 0.004446678 terpenoid biosynthetic process

GO:0016116 0.24% 0.05% 0.047939137 carotenoid metabolic process

GO:0042441 0.39% 0.08% 0.007327479 eye pigment metabolic process

GO:0043052 0.34% 0.05% 0.001447822 thermotaxis

GO:0043324 0.39% 0.08% 0.007327479 pigment metabolic process involved in developmental pigmentation

GO:0043474 0.39% 0.08% 0.007327479 pigment metabolic process involved in pigmentation

GO:0044255 3.56% 2.40% 0.026230783 cellular lipid metabolic process

GO:0046154 0.29% 0.06% 0.034011861 rhodopsin metabolic process

GO:0048069 0.39% 0.08% 0.007327479 eye pigmentation

GO:1901136 1.11% 0.49% 0.010846089 carbohydrate derivative catabolic process

MF GO:0003796 0.48% 0.11% 0.000303785 lysozyme activity

GO:0004175 3.37% 1.82% 1.04E-05 endopeptidase activity

GO:0004252 1.97% 0.86% 1.04E-05 serine-type endopeptidase activity

GO:0004311 0.29% 0.07% 0.019075313 farnesyltranstransferase activity

GO:0004312 0.39% 0.09% 0.003311523 fatty acid synthase activity

GO:0004497 1.64% 0.84% 0.002474851 monooxygenase activity

GO:0004553 2.17% 0.88% 1.12E-06 hydrolase activity, hydrolyzing O-glycosyl compounds

GO:0004659 0.48% 0.18% 0.041592279 prenyltransferase activity

GO:0004806 0.19% 0.04% 0.04711786 triglyceride lipase activity

GO:0004871 3.81% 2.77% 0.034606867 signal transducer activity

GO:0004872 4.09% 2.83% 0.005844382 receptor activity

GO:0004888 3.32% 2.15% 0.003311523 transmembrane signaling receptor activity

GO:0004930 2.07% 1.29% 0.020926341 G-protein coupled receptor activity

GO:0005214 0.39% 0.13% 0.044135237 structural constituent of chitin-based cuticle

GO:0005506 2.17% 1.15% 0.000633723 iron ion binding

GO:0008061 0.92% 0.46% 0.037508132 chitin binding

GO:0008194 0.87% 0.44% 0.045884796 UDP-glycosyltransferase activity

GO:0008233 4.91% 3.20% 0.000226142 peptidase activity

GO:0008236 2.31% 1.02% 1.67E-06 serine-type peptidase activity

GO:0008237 1.59% 0.95% 0.033595482 metallopeptidase activity

GO:0008422 0.34% 0.08% 0.005252854 beta-glucosidase activity

GO:0008745 0.34% 0.06% 0.000750055 N-acetylmuramoyl-L-alanine amidase activity

GO:0015020 0.43% 0.16% 0.047143438 glucuronosyltransferase activity

GO:0015926 0.39% 0.13% 0.044135237 glucosidase activity

GO:0016160 0.19% 0.04% 0.04711786 amylase activity

GO:0016297 0.29% 0.06% 0.007854275 acyl-[acyl-carrier-protein] hydrolase activity

GO:0016614 1.35% 0.66% 0.003419662 oxidoreductase activity, acting on CH-OH group of donors

GO:0016620 0.87% 0.33% 0.002073942 oxidoreductase activity, acting on the aldehyde or oxo group of donors, NAD or NADP as acceptor

GO:0016705 2.31% 1.17% 0.000109621 oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen

GO:0016717 0.39% 0.09% 0.003311523 oxidoreductase activity, acting on paired donors, with oxidation of a pair of donors resulting in the reduction of molecular oxygen to two molecules of water

GO:0016798 2.36% 1.03% 1.67E-06 hydrolase activity, acting on glycosyl bonds

GO:0016903 0.92% 0.40% 0.006585745 oxidoreductase activity, acting on the aldehyde or oxo group of donors

GO:0017171 2.31% 1.02% 1.67E-06 serine hydrolase activity

GO:0020037 2.12% 1.14% 0.000941207 heme binding

GO:0038023 3.47% 2.36% 0.009272631 signaling receptor activity

GO:0042302 1.30% 0.44% 3.78E-06 structural constituent of cuticle

GO:0046906 2.17% 1.16% 0.000669074 tetrapyrrole binding

GO:0060089 4.09% 2.83% 0.005844382 molecular transducer activity

GO:0070011 4.72% 2.97% 9.27E-05 peptidase activity, acting on L-amino acid peptides

GO:0080019 0.72% 0.23% 0.000633723 fatty-acyl-CoA reductase (alcohol-forming) activity

GO:0099600 3.61% 2.38% 0.003311523 transmembrane receptor activity

CC GO:0005576 4.91% 2.72% 2.48E-07 extracellular region

Head

BP GO:0006022 2.03% 0.80% 0.000141493 aminoglycan metabolic process

GO:0006030 1.71% 0.56% 2.18E-05 chitin metabolic process

GO:0006040 1.84% 0.71% 0.000189878 amino sugar metabolic process

GO:0006629 5.19% 2.84% 1.55E-05 lipid metabolic process

GO:0006694 0.82% 0.27% 0.021334218 steroid biosynthetic process

GO:0006714 0.38% 0.07% 0.021334218 sesquiterpenoid metabolic process

GO:0006716 0.38% 0.07% 0.021334218 juvenile hormone metabolic process

GO:0006718 0.38% 0.07% 0.021334218 juvenile hormone biosynthetic process

GO:0006720 1.77% 0.44% 1.03E-08 isoprenoid metabolic process

GO:0006721 1.27% 0.22% 2.48E-09 terpenoid metabolic process

GO:0006816 0.63% 0.17% 0.021334218 calcium ion transport

GO:0008202 0.89% 0.32% 0.025400444 steroid metabolic process

GO:0008299 1.39% 0.29% 1.10E-08 isoprenoid biosynthetic process

GO:0008610 2.66% 1.23% 0.000189878 lipid biosynthetic process

GO:0016106 0.38% 0.07% 0.021334218 sesquiterpenoid biosynthetic process

GO:0016114 0.95% 0.16% 2.16E-07 terpenoid biosynthetic process

GO:0034754 0.57% 0.14% 0.021334218 cellular hormone metabolic process

GO:0040003 0.76% 0.24% 0.021334218 chitin-based cuticle development

GO:0042335 0.89% 0.30% 0.021334218 cuticle development

GO:0042445 0.70% 0.19% 0.012512124 hormone metabolic process

GO:0044255 4.31% 2.40% 0.000209167 cellular lipid metabolic process

GO:0070588 0.57% 0.15% 0.025400444 calcium ion transmembrane transport

GO:1901071 1.71% 0.59% 6.45E-05 glucosamine-containing compound metabolic process

MF GO:0004161 0.38% 0.05% 0.000793969 dimethylallyltranstransferase activity

GO:0004311 0.38% 0.07% 0.005865833 farnesyltranstransferase activity

GO:0004497 2.98% 0.84% 9.46E-14 monooxygenase activity

GO:0004553 2.22% 0.88% 1.32E-05 hydrolase activity, hydrolyzing O-glycosyl compounds

GO:0004659 0.70% 0.18% 0.001470475 prenyltransferase activity

GO:0005198 3.74% 2.14% 0.000580658 structural molecule activity

GO:0005201 0.44% 0.08% 0.001216335 extracellular matrix structural constituent

GO:0005214 0.63% 0.13% 0.000267327 structural constituent of chitin-based cuticle

GO:0005216 2.28% 1.41% 0.047855518 ion channel activity

GO:0005262 0.51% 0.12% 0.006835974 calcium channel activity

GO:0005319 0.63% 0.20% 0.018197038 lipid transporter activity

GO:0005506 3.86% 1.15% 2.29E-16 iron ion binding

GO:0008010 0.38% 0.08% 0.019573356 structural constituent of chitin-based larval cuticle

GO:0008061 1.33% 0.46% 0.000213188 chitin binding

GO:0008083 0.57% 0.19% 0.038334846 growth factor activity

GO:0008422 0.38% 0.08% 0.011539152 beta-glucosidase activity

GO:0015085 0.63% 0.16% 0.002009273 calcium ion transmembrane transporter activity

GO:0015171 0.63% 0.23% 0.047855518 amino acid transmembrane transporter activity

GO:0015248 0.25% 0.05% 0.047855518 sterol transporter activity

GO:0015926 0.57% 0.13% 0.001746016 glucosidase activity

GO:0016614 1.84% 0.66% 1.38E-05 oxidoreductase activity, acting on CH-OH group of donors

GO:0016620 1.14% 0.33% 7.89E-05 oxidoreductase activity, acting on the aldehyde or oxo group of donors, NAD or NADP as acceptor

GO:0016705 3.61% 1.17% 1.19E-13 oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen

GO:0016709 0.32% 0.07% 0.044419513 oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, NAD(P)H as one donor, and incorporation of one atom of oxygen

GO:0016765 0.95% 0.37% 0.011539152 transferase activity, transferring alkyl or aryl (other than methyl) groups

GO:0016798 2.28% 1.03% 0.00017759 hydrolase activity, acting on glycosyl bonds

GO:0016903 1.20% 0.40% 0.00026769 oxidoreductase activity, acting on the aldehyde or oxo group of donors

GO:0020037 3.80% 1.14% 4.06E-16 heme binding

GO:0022891 5.07% 3.53% 0.014299106 substrate-specific transmembrane transporter activity

GO:0042302 2.41% 0.44% 5.57E-19 structural constituent of cuticle

GO:0042813 0.25% 0.05% 0.047855518 Wnt-activated receptor activity

GO:0046906 3.86% 1.16% 2.29E-16 tetrapyrrole binding

GO:0046943 0.76% 0.31% 0.047855518 carboxylic acid transmembrane transporter activity

GO:0048037 2.34% 1.41% 0.029166002 cofactor binding

GO:0050660 1.27% 0.57% 0.012190946 flavin adenine dinucleotide binding

GO:0072509 0.63% 0.23% 0.040436179 divalent inorganic cation transmembrane transporter activity

GO:0080019 0.89% 0.23% 0.000173742 fatty-acyl-CoA reductase (alcohol-forming) activity

CC GO:0005576 5.51% 2.72% 2.39E-08 extracellular region

GO:0005578 0.89% 0.28% 0.007096274 proteinaceous extracellular matrix

GO:0031012 1.46% 0.45% 3.09E-05 extracellular matrix
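The statistical test behind the q-values is not stated in this table. As an illustration only, the sketch below shows one common way to obtain such values: a one-sided hypergeometric test for over-representation of a term among caste-biased genes, followed by Benjamini-Hochberg adjustment across terms. The function names and the example counts are hypothetical and are not taken from this study.

```python
from math import comb


def hypergeom_upper_tail(k: int, n_deg: int, K: int, N: int) -> float:
    """P(X >= k) for a hypergeometric draw.

    N: all annotated genes; K: genes carrying the term;
    n_deg: caste-biased genes; k: caste-biased genes carrying the term.
    """
    upper = min(n_deg, K)
    total = comb(N, n_deg)
    favourable = sum(
        comb(K, i) * comb(N - K, n_deg - i) for i in range(k, upper + 1)
    )
    return favourable / total


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical counts for three terms: (k, n_deg, K, N).
terms = [(12, 200, 40, 5000), (3, 200, 60, 5000), (8, 200, 50, 5000)]
p_vals = [hypergeom_upper_tail(*t) for t in terms]
for term, q in zip(terms, benjamini_hochberg(p_vals)):
    print(term, f"q = {q:.3g}")
```

Under this scheme, the two percentage columns of the table correspond to k / n_deg and K / N, and a term is reported when its q-value falls below the chosen threshold.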

Table S5. Over-represented Pfam domains in caste-DEG.
Columns (left to right): body part; Pfam domain; % in caste-biased genes; % in all; q-value; motif description.
Thorax + abdomen

PF00019 0.29% 0.07% 0.018855562 Transforming growth factor beta like domain

PF00049 0.19% 0.04% 0.047330812 Insulin/IGF/Relaxin family

PF00059 0.58% 0.16% 0.00082408 Lectin C-type domain

PF00061 0.29% 0.06% 0.010173654 Lipocalin / cytosolic fatty-acid binding protein family

PF00062 0.48% 0.10% 0.000162778 C-type lysozyme/alpha-lactalbumin family

PF00067 1.49% 0.74% 0.003042693 Cytochrome P450

PF00084 0.39% 0.11% 0.018855562 Sushi repeat (SCR repeat)

PF00089 1.73% 0.61% 4.06E-07 Trypsin

PF00151 0.39% 0.12% 0.02933154 Lipase

PF00201 0.72% 0.22% 0.000469997 UDP-glucoronosyl and UDP-glucosyl transferase

PF00232 0.63% 0.12% 2.96E-06 Glycosyl hydrolase family 1

PF00282 0.29% 0.08% 0.033068556 Pyridoxal-dependent decarboxylase conserved domain

PF00348 0.48% 0.13% 0.002788466 Polyprenyl synthetase

PF00379 1.20% 0.38% 2.96E-06 Insect cuticle protein

PF00394 0.24% 0.05% 0.02933154 Multicopper oxidase

PF00560 0.53% 0.17% 0.010173654 Leucine Rich Repeat

PF00650 0.77% 0.25% 0.000611959 CRAL/TRIO domain

PF00688 0.29% 0.07% 0.018855562 TGF-beta propeptide

PF01061 0.43% 0.14% 0.018249708 ABC-2 type transporter

PF01151 0.34% 0.11% 0.049368426 GNS1/SUR4 family

PF01400 0.29% 0.06% 0.010173654 Astacin (Peptidase family M12A)

PF01510 0.24% 0.05% 0.013999522 N-acetylmuramoyl-L-alanine amidase

PF01607 0.82% 0.35% 0.013820609 Chitin binding Peritrophin-A domain

PF01683 0.24% 0.05% 0.02933154 EB module

PF01757 0.34% 0.09% 0.01954313 Acyltransferase family

PF02244 0.19% 0.04% 0.047330812 Carboxypeptidase activation peptide

PF02958 0.53% 0.13% 0.000469997 Ecdysteroid kinase

PF03015 0.43% 0.14% 0.018249708 Male sterility protein

PF03145 0.87% 0.29% 0.000469997 Seven in absentia protein family

PF04083 0.34% 0.06% 0.00082408 Partial alpha/beta-hydrolase lipase region

PF06585 0.82% 0.23% 3.61E-05 Haemolymph juvenile hormone binding protein (JHBP)

PF07732 0.24% 0.05% 0.02933154 Multicopper oxidase

PF07993 0.58% 0.18% 0.003705549 Male sterility protein

PF12796 1.83% 1.02% 0.00667198 Ankyrin repeats (3 copies)

PF13637 0.92% 0.41% 0.013477688 Ankyrin repeats (many copies)

PF13855 2.12% 0.97% 2.41E-05 Leucine rich repeat

Head

PF00019 0.38% 0.07% 0.00636893 Transforming growth factor beta like domain

PF00024 0.38% 0.07% 0.00636893 PAN domain

PF00067 2.98% 0.74% 4.79E-16 Cytochrome P450

PF00094 0.32% 0.07% 0.048052598 von Willebrand factor type D domain

PF00100 0.57% 0.11% 0.000419475 Zona pellucida-like domain

PF00106 1.33% 0.44% 0.000146402 short chain dehydrogenase

PF00128 0.38% 0.09% 0.039392288 Alpha amylase, catalytic domain

PF00151 0.44% 0.12% 0.039392288 Lipase

PF00201 0.95% 0.22% 2.40E-05 UDP-glucoronosyl and UDP-glucosyl transferase

PF00232 0.57% 0.12% 0.00127824 Glycosyl hydrolase family 1

PF00348 0.76% 0.13% 4.21E-06 Polyprenyl synthetase

PF00379 2.28% 0.38% 3.78E-20 Insect cuticle protein

PF00688 0.32% 0.07% 0.048052598 TGF-beta propeptide

PF00732 0.70% 0.17% 0.00081502 GMC oxidoreductase

PF01061 0.51% 0.14% 0.020057828 ABC-2 type transporter

PF01151 0.51% 0.11% 0.00295571 GNS1/SUR4 family

PF01347 0.32% 0.06% 0.030771274 Lipoprotein amino terminal region

PF01391 0.38% 0.07% 0.00636893 Collagen triple helix repeat (20 copies)

PF01562 0.32% 0.07% 0.048052598 Reprolysin family propeptide

PF01607 1.08% 0.35% 0.00081502 Chitin binding Peritrophin-A domain

PF01757 0.38% 0.09% 0.039392288 Acyltransferase family

PF03015 0.57% 0.14% 0.003726942 Male sterility protein

PF04083 0.32% 0.06% 0.030771274 Partial alpha/beta-hydrolase lipase region

PF05199 0.70% 0.17% 0.00081502 GMC oxidoreductase

PF06585 1.14% 0.23% 9.04E-08 Haemolymph juvenile hormone binding protein (JHBP)

PF07993 0.76% 0.18% 0.000419475 Male sterility protein

Table S6. Primer sequences used for RNA probe synthesis and in situ hybridization.
Gene | Gene ID | Forward (5'-3') | Reverse (5'-3') | Length (bases)
lipocalin | RS008823 | TCGACGACAATCTCGACTGC | CGACCATCTGGCTGACATCA | 360
lipocalin | RS008881 | TGTCACAACCGAGACTGTGG | TAACAGGCGGAGTTGTCGAC | 403
lipocalin | RS008882 | ATTCTGCTTCGGACTGGTGT | ACAGTTCCTTGCACGCATGT | 407
GH1 (β-glucosidase) | RS004136 | GCTCATCCCATCTTCTCTGA | TGGGTCGATGAAATTCACTT | 567
GH1 (β-glucosidase) | RS004624 | TGCAAGAGCAAGAACACACC | GACGGCTCTTTTCAGCAATC | 1013
GGPP synthase | RS100016 | TGGAGGACATATTTGGCGTG | TGACGAGGTCAGCTTTGTTC | 378

Table S7. Lipocalin family genes in Reticulitermes speratus.
Gene ID | Genomic position | Subclass | RNA-Seq, head (rpkm)*: RF | RM | SF | SM | WF | WM | RNA-Seq, thorax + abdomen (rpkm)*: RF | RM | SF | SM | WF | WM
RS000420 | scaffold_1:11779874-11785972(+) | | 296.4 | 277.2 | 185.4 | 197.7 | 302.8 | 267.3 | 252.6 | 296.1 | 227.2 | 239.7 | 223.1 | 245.5
RS004657 | scaffold_20:145341-153000(+) | Clade A (SOL1 family) | 80.2 | 142.4 | 385.9 | 320.6 | 1078.5 | 689.9 | 364.0 | 713.9 | 1738.9 | 2026.9 | 2685.3 | 2349.5
RS005301 | scaffold_222:512009-520780(+) | | 46.0 | 42.3 | 35.2 | 42.5 | 43.5 | 32.1 | 111.6 | 116.7 | 112.2 | 86.2 | 107.1 | 108.8
RS005409 | scaffold_228:179179-188294(+) | | 0.7 | 1.9 | 0.3 | 0.6 | 2.2 | 1.5 | 0.5 | 0.7 | 0.8 | 0.9 | 17.9 | 39.9
RS008601 | scaffold_378:44567-47642(-) | Clade A (SOL1 family) | 11.2 | 9.4 | 8.2 | 9.3 | 12.0 | 13.5 | 8.1 | 8.2 | 5.4 | 7.5 | 6.5 | 6.8
RS008823 | scaffold_387:404890-413918(-) | Clade A (SOL1 family) | 6.2 | 7.0 | 82.8 | 80.6 | 8.3 | 6.0 | 0.4 | 0.6 | 25.3 | 21.9 | 1.2 | 2.4
RS008824 | scaffold_387:433200-439194(-) | Clade A (SOL1 family) | 0.2 | 0.2 | 8.7 | 8.4 | 2.6 | 2.7 | 0.0 | 0.0 | 4.6 | 3.5 | 1.2 | 0.7
RS008881 | scaffold_39:1097117-1100793(-) | Clade B | 13.5 | 13.8 | 9.4 | 10.6 | 13.9 | 15.6 | 4005.2 | 11.0 | 4.1 | 5.2 | 5.0 | 5.3
RS008882 | scaffold_39:1102331-1104831(-) | Clade B | 2228.8 | 2301.4 | 1432.2 | 1426.5 | 2993.8 | 3237.5 | 60.9 | 73.2 | 151.9 | 170.0 | 124.5 | 96.2
RS008884 | scaffold_39:1122915-1126368(-) | Clade B | 3.8 | 2.4 | 2.6 | 2.9 | 2.7 | 3.4 | 755.1 | 1.7 | 3.0 | 1.7 | 1.0 | 1.8
RS009761 | scaffold_43:3529390-3534162(+) | | 11.5 | 5.7 | 19.4 | 19.8 | 13.9 | 17.3 | 9.8 | 15.1 | 9.6 | 6.7 | 17.5 | 18.8
RS010556 | scaffold_48:3522458-3532425(-) | | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
RS011706 | scaffold_55:2524382-2537041(-) | | 176.8 | 167.1 | 224.3 | 233.4 | 275.8 | 266.7 | 733.1 | 731.5 | 996.7 | 830.5 | 2103.0 | 1829.7
RS012785 | scaffold_62:2793705-2806073(+) | | 3460.8 | 3284.3 | 2435.8 | 2137.1 | 797.0 | 798.1 | 1368.6 | 1480.2 | 349.6 | 313.0 | 185.7 | 306.4
RS013912 | scaffold_757:39734-48788(-) | Clade A (SOL1 family) | 0.6 | 0.6 | 1.3 | 2.0 | 63.9 | 41.4 | 0.4 | 2.1 | 1.4 | 0.8 | 10.0 | 8.3
RS013913 | scaffold_757:58631-120503(-) | Clade A (SOL1 family) | 20.8 | 11.2 | 1.4 | 0.8 | 16.8 | 12.5 | 7.3 | 1.6 | 0.5 | 0.0 | 7.8 | 10.8
RS013914 | scaffold_757:88485-96225(-) | Clade A (SOL1 family) | 13.5 | 18.1 | 45.5 | 32.7 | 10.1 | 10.7 | 9.5 | 4.0 | 13.5 | 13.5 | 1.2 | 1.6
RS014740 | scaffold_867:52755-60543(-) | | 172.9 | 161.9 | 199.5 | 226.0 | 130.8 | 196.1 | 107.5 | 116.0 | 267.4 | 267.9 | 151.0 | 142.7
*RF: female reproductives (queens), RM: male reproductives (kings), SF: female soldiers, SM: male soldiers, WF: female workers, WM: male workers.

Table S8. Cellulase genes in Reticulitermes speratus. Values are RNA-Seq expression levels (rpkm)* in the head (Head) and in the thorax + abdomen (T+A).

| Gene ID | Genomic position | Subclass | Head RF | Head RM | Head SF | Head SM | Head WF | Head WM | T+A RF | T+A RM | T+A SF | T+A SM | T+A WF | T+A WM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RS004136 | scaffold_186:490588-509600(-) | GH1 | 5.0 | 3.2 | 15.6 | 9.8 | 1.1 | 13.0 | 1603.9 | 2018.8 | 68.3 | 55.1 | 2105.6 | 2036.8 |
| RS004137 | scaffold_186:540509-564347(-) | GH1 | 60.7 | 65.7 | 22.3 | 22.6 | 80.2 | 78.0 | 7.9 | 11.8 | 10.6 | 11.1 | 17.5 | 17.2 |
| RS004143 | scaffold_186:624299-638822(+) | GH1 | 8.3 | 10.0 | 4.1 | 3.5 | 1.3 | 1.7 | 38.9 | 52.9 | 39.9 | 47.2 | 8.6 | 9.4 |
| RS004144 | scaffold_186:640651-659530(+) | GH1 | 1.7 | 2.0 | 1.1 | 1.4 | 1.9 | 3.0 | 67.8 | 56.6 | 61.6 | 55.3 | 58.5 | 57.6 |
| RS004146 | scaffold_186:661301-665599(+) | GH1 | 16.6 | 13.4 | 7.2 | 5.3 | 25.1 | 24.1 | 25.6 | 30.8 | 33.7 | 33.2 | 76.9 | 73.4 |
| RS100005 | scaffold_186:740796-759807(+) | GH1 | 3.2 | 3.7 | 4.1 | 4.5 | 13.1 | 13.1 | 31.0 | 47.1 | 44.6 | 49.5 | 137.6 | 124.7 |
| RS100006 | scaffold_186:765348-787391(+) | GH1 | 103.9 | 100.9 | 97.9 | 96.0 | 76.5 | 106.9 | 127.1 | 195.3 | 418.6 | 411.0 | 198.1 | 211.7 |
| RS100007 | scaffold_2:3954538-3976450(+) | GH1 | 134.5 | 143.0 | 70.7 | 71.1 | 135.7 | 164.7 | 5.5 | 5.9 | 10.5 | 9.7 | 11.9 | 12.1 |
| RS004147 | scaffold_2:3989223-4009606(+) | GH1 | 24.1 | 23.7 | 10.8 | 8.0 | 22.0 | 20.5 | 17.8 | 22.5 | 31.3 | 33.6 | 49.6 | 47.5 |
| RS004149 | scaffold_6:2019334-2035300(+) | GH1 | 45.0 | 38.6 | 44.2 | 39.1 | 67.1 | 50.1 | 26.1 | 18.4 | 56.6 | 61.3 | 46.6 | 47.8 |
| RS004623 | scaffold_6:2047153-2060696(+) | GH1 | 37.6 | 34.2 | 71.5 | 80.0 | 17.8 | 24.5 | 36.5 | 57.7 | 81.9 | 93.0 | 41.0 | 42.0 |
| RS004624 | scaffold_6:2070232-2087920(+) | GH1 | 2.0 | 2.5 | 12.1 | 12.4 | 6.7 | 7.6 | 1748.4 | 0.8 | 4.1 | 6.8 | 2.1 | 2.9 |
| RS012436 | scaffold_6:2095616-2111294(+) | GH1 | 6.5 | 7.1 | 3.9 | 5.8 | 81.5 | 76.3 | 0.6 | 0.6 | 1.5 | 3.3 | 25.0 | 19.8 |
| RS012437 | scaffold_186:676910-679770(+) | GH1 | 1.4 | 0.8 | 1.3 | 1.2 | 3.2 | 5.8 | 7.3 | 13.5 | 26.2 | 38.4 | 18.3 | 19.5 |
| RS012439 | scaffold_186:692051-703746(+) | GH1 | 99.1 | 106.4 | 136.0 | 134.0 | 126.7 | 113.4 | 47.0 | 65.2 | 96.3 | 106.0 | 129.5 | 108.4 |
| RS012440 | scaffold_186:706674-735672(+) | GH1 | 45.4 | 44.3 | 42.4 | 35.7 | 38.8 | 34.9 | 40.6 | 50.7 | 40.0 | 34.4 | 40.3 | 36.2 |
| RS006396 | scaffold_27:1888536-1904998(+) | GH9 | 32.3 | 36.7 | 35.1 | 30.2 | 28.5 | 40.5 | 21.9 | 23.0 | 27.9 | 31.2 | 31.4 | 28.4 |
| RS012684 | scaffold_611:179946-189568(+) | GH9 | 16.8 | 17.7 | 7.4 | 9.0 | 27.5 | 23.4 | 17.0 | 23.2 | 0.7 | 0.6 | 22.4 | 19.2 |
| RS012687 | scaffold_611:209423-219382(+) | GH9 | 49.3 | 62.2 | 69.6 | 68.5 | 69.3 | 116.7 | 1232.1 | 1411.5 | 36.8 | 43.4 | 3032.0 | 2772.4 |
| RS100101** | scaffold_611:225890-228919(+) + scaffold_564:265652-267717(-) | GH9 | 4.6 | 4.8 | 8.2 | 7.6 | 5.2 | 104.4 | 14962.2 | 19873.5 | 304.6 | 366.9 | 18894.0 | 16784.5 |

*RF: female reproductives (queens), RM: male reproductives (kings), SF: female soldiers, SM: male soldiers, WF: female workers, WM: male workers.


**RS100101 (GH9) lies between scaffold_611 and scaffold_564 in the current gene model (Rspe OGS1.0).


Table S9. Lysozyme genes in Reticulitermes speratus. Values are RNA-Seq expression levels (rpkm)* in the head (Head) and in the thorax + abdomen (T+A).

| Gene ID | Genomic position | Subclass | Head RF | Head RM | Head SF | Head SM | Head WF | Head WM | T+A RF | T+A RM | T+A SF | T+A SM | T+A WF | T+A WM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RS000427 | scaffold_10:4251-11117(-) | lysozyme c-type | 0.0 | 0.2 | 0.2 | 0.2 | 0.0 | 0.0 | 0.2 | 0.0 | 1.8 | 2.9 | 1.0 | 3.3 |
| RS002400 | scaffold_1370:6473-16438(-) | lysozyme c-type | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.7 | 2.4 | 2.5 | 7.1 | 14.2 | 15.9 | 23.5 |
| RS003406 | scaffold_16:6014313-6016559(-) | lysozyme c-type | 0.0 | 0.2 | 5.1 | 3.6 | 0.0 | 1.0 | 22.5 | 33.0 | 153.6 | 285.8 | 64.6 | 64.3 |
| RS008613 | scaffold_378:288875-289873(-) | lysozyme c-type | 0.6 | 0.7 | 0.0 | 1.6 | 0.0 | 0.3 | 39.6 | 44.8 | 1676.8 | 1384.0 | 250.9 | 224.9 |
| RS014698 | scaffold_859:24243-30080(+) | lysozyme c-type | 0.5 | 0.0 | 0.9 | 1.9 | 0.2 | 5.9 | 9.1 | 203.6 | 2144.6 | 2615.8 | 1739.5 | 1161.5 |
| RS100001 | scaffold_1097:7906-14032(+) | lysozyme c-type | 0.0 | 0.0 | 0.0 | 0.2 | 0.0 | 2.0 | 4.9 | 23.3 | 5.3 | 1.9 | 269.3 | 219.4 |
| RS100002 | scaffold_1097:42864-48573(+) | lysozyme c-type | 97.4 | 78.3 | 266.1 | 276.1 | 127.1 | 178.9 | 131.3 | 137.7 | 211.0 | 191.1 | 148.7 | 156.0 |
| RS100004 | scaffold_16:6112988-6118555(-) | lysozyme c-type | 1.6 | 2.3 | 1.2 | 1.6 | 1.2 | 2.5 | 2.9 | 1.2 | 2.2 | 2.2 | 0.3 | 2.1 |
| RS100022 | scaffold_859:4935-6501(+) | lysozyme c-type | 1.2 | 0.5 | 2.8 | 2.3 | 1.8 | 2.8 | 0.9 | 1.1 | 2.6 | 1.8 | 4.3 | 6.1 |
| RS100023 | scaffold_859:48011-50947(+) | lysozyme c-type | 0.4 | 0.4 | 0.6 | 0.0 | 2.0 | 1.6 | 0.8 | 0.3 | 1.7 | 1.4 | 2.2 | 1.7 |
| RS100024 | scaffold_859:63516-69373(+) | lysozyme c-type | 0.5 | 0.8 | 0.7 | 0.7 | 0.2 | 1.0 | 35.4 | 92.5 | 951.1 | 728.6 | 1517.7 | 1464.3 |
| RS100025 | scaffold_859:85037-90894(+) | lysozyme c-type | 0.0 | 0.0 | 0.2 | 0.2 | 0.5 | 0.6 | 19.1 | 40.6 | 353.4 | 298.3 | 621.1 | 639.9 |
| RS100026 | scaffold_859:99866-105566(+) | lysozyme c-type | 24.4 | 18.8 | 17.4 | 14.3 | 8.9 | 9.1 | 19.6 | 30.4 | 14.6 | 13.1 | 13.8 | 13.7 |
| RS006054 | scaffold_257:1332-27487(-) | lysozyme i-type | 0.8 | 0.0 | 1.9 | 3.8 | 0.8 | 2.1 | 50.9 | 130.6 | 4359.1 | 4978.4 | 609.9 | 511.6 |
| RS008547 | scaffold_374:174359-193956(+) | lysozyme i-type | 0.4 | 0.6 | 1.5 | 4.7 | 0.0 | 0.2 | 67.3 | 180.6 | 3849.1 | 4184.8 | 694.7 | 545.6 |
| RS015579 | scaffold_997:67510-70385(+) | lysozyme i-type | 0.2 | 0.0 | 0.4 | 0.0 | 0.2 | 0.2 | 0.6 | 5.1 | 10.5 | 19.4 | 45.6 | 35.2 |

*RF: female reproductives (queens), RM: male reproductives (kings), SF: female soldiers, SM: male soldiers, WF: female workers, WM: male workers.


Table S10. GGPP synthase genes in Reticulitermes speratus. Values are RNA-Seq expression levels (rpkm)* in the head (Head) and in the thorax + abdomen (T+A).

| Gene ID | Genomic position | Subclass | Head RF | Head RM | Head SF | Head SM | Head WF | Head WM | T+A RF | T+A RM | T+A SF | T+A SM | T+A WF | T+A WM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RS100010 | scaffold_31:2794158-2806101(+) | derived | 44.1 | 37.0 | 59.8 | 59.9 | 11.3 | 5.6 | 3.5 | 3.8 | 53.4 | 51.9 | 2.2 | 2.2 |
| RS007480 | scaffold_31:2813416-2826556(+) | derived | 11.8 | 9.6 | 93.5 | 88.3 | 6.4 | 6.0 | 4.6 | 0.3 | 52.9 | 55.9 | 0.5 | 0.3 |
| RS007481 | scaffold_31:2831788-2846259(+) | derived | 47.9 | 41.2 | 2.2 | 3.5 | 21.7 | 20.2 | 15.3 | 2.0 | 2.7 | 1.8 | 0.9 | 1.5 |
| RS100011 | scaffold_31:2854401-2866719(+) | derived | 14.8 | 15.2 | 6.9 | 8.7 | 14.8 | 12.1 | 5.9 | 6.5 | 7.1 | 6.3 | 5.4 | 5.8 |
| RS100012 | scaffold_31:2881695-2891432(+) | derived | 3.8 | 5.4 | 115.8 | 117.2 | 2.6 | 4.2 | 6.3 | 8.6 | 85.4 | 82.3 | 6.7 | 6.5 |
| RS007482 | scaffold_31:2898315-2910945(+) | derived | 126.6 | 133.6 | 51.9 | 50.2 | 79.8 | 69.5 | 4.9 | 5.1 | 23.5 | 28.7 | 1.2 | 1.3 |
| RS100013 | scaffold_31:2918062-2928930(+) | derived | 76.8 | 71.9 | 3.0 | 5.0 | 34.0 | 22.6 | 0.3 | 0.2 | 2.9 | 0.6 | 0.1 | 0.2 |
| RS100014 | scaffold_31:2939721-2948893(+) | derived | 0.4 | 0.3 | 1.5 | 2.3 | 0.4 | 0.2 | 0.7 | 1.0 | 1.6 | 1.3 | 0.1 | 0.6 |
| RS100015 | scaffold_31:2964788-2973091(+) | derived | 11.1 | 10.0 | 619.1 | 582.8 | 20.4 | 15.7 | 0.8 | 0.0 | 419.8 | 426.1 | 1.7 | 1.0 |
| RS100016 | scaffold_31:2982793-2992579(+) | derived | 192.0 | 202.8 | 1482.2 | 1440.8 | 387.8 | 278.4 | 3.2 | 3.7 | 998.3 | 836.4 | 38.1 | 38.5 |
| RS100017 | scaffold_31:3003886-3010980(+) | derived | 3.0 | 9.1 | 43.4 | 37.9 | 44.5 | 33.6 | 0.2 | 0.2 | 20.9 | 25.4 | 0.0 | 0.1 |
| RS007483 | scaffold_31:3020421-3029674(+) | derived | 10.8 | 13.4 | 36.3 | 35.9 | 32.4 | 26.9 | 1.1 | 0.6 | 26.1 | 26.4 | 2.4 | 1.7 |
| RS007484 | scaffold_31:3031695-3042573(+) | possibly ancestral | 7.4 | 8.7 | 8.9 | 9.3 | 9.5 | 9.5 | 9.1 | 8.5 | 8.7 | 7.9 | 10.5 | 7.9 |

*RF: female reproductives (queens), RM: male reproductives (kings), SF: female soldiers, SM: male soldiers, WF: female workers, WM: male workers.


Table S11. TY family genes in Reticulitermes speratus. Values are RNA-Seq expression levels (rpkm)* in the head (Head) and in the thorax + abdomen (T+A).

| Gene ID | Genomic position | Head RF | Head RM | Head SF | Head SM | Head WF | Head WM | T+A RF | T+A RM | T+A SF | T+A SM | T+A WF | T+A WM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RS001196 | scaffold_113:185192-185350(+) | 112 | 101 | 2,268 | 2,229 | 49,356 | 95,342 | 68 | 30 | 46 | 69 | 168 | 128 |
| RS001197 | scaffold_113:224999-225178(+) | 13,008 | 12,554 | 66,324 | 64,899 | 149,936 | 193,248 | 125 | 213 | 12,895 | 16,751 | 10,586 | 9,606 |
| RS001198 | scaffold_113:240134-240322(+) | 4,781 | 4,683 | 30,257 | 32,502 | 53,571 | 136,254 | 88 | 203 | 3,755 | 4,045 | 5,145 | 4,993 |

*RF: female reproductives (queens), RM: male reproductives (kings), SF: female soldiers, SM: male soldiers, WF: female workers, WM: male workers.


Table S12. Sex determination genes in Reticulitermes speratus.
Columns: Gene name; Symbol; OrthoDB7 ID; Drosophila homolog accession No.; Gene ID; Expression differences between sexes (FDR), head; Expression differences between sexes (FDR), thorax + abdomen; Expression differences among castes (FDR), head; Expression differences among castes (FDR), thorax + abdomen.

daughterless da EOG7BGW18 NM_001273411 RS000156 1 0.99 0.75 0.0037

Hairy/deadpan dpn EOG7X6ZFD NM_057575 RS006294 1 0.63 0.7 0.00073

Hairy/deadpan dpn EOG7X6ZFD NM_057575 RS006295 1 0.92 0.86 1.12E-05

degringolade dgrn EOG7KX50F NM_141339 - - - - -

dissatisfaction dsf EOG7VJ5CF NM_001273180 RS013719 N/A N/A N/A N/A

doublesex dsx EOG77DWNS NM_169202 - - - - -

Doublesex-Mab related 11B Dmrt11B EOG7TXKGH NM_078591 RS007930 1 0.084 7.85E-11 0.34

Doublesex-Mab related 93B Dmrt93B EOG7B8S48 NM_079704 RS006912 N/A N/A N/A N/A

Doublesex-Mab related 99B Dmrt99B EOG718KC7 NM_079825 RS002870 N/A N/A N/A N/A

extramacrochaetae emc EOG7QCMBK NM_079152 RS011015 1 0.71 0.54 0.9

female lethal d fl(2)d EOG7KT8QP NM_166010 RS002775 1 0.89 0.99 0.0064

fruitless fru EOG7S84VN NM_079673 RS001598 1 0.99 0.85 0.065

groucho gro EOG7BD0RV NM_001260380 RS013444 1 1 0.44 0.07

groucho gro EOG7BD0RV NM_001260380 RS010817 1 0.91 0.44 0.78

hermaphrodite her EOG7JXFJ7 NM_001273577 RS010220 1 0.87 0.89 0.0016

hopscotch hop EOG74JNNN NM_078564 RS002898 1 0.99 0.06 0.08

intersex ix EOG7162KM NM_136833 RS007777 1 0.86 0.44 0.98

outstretched os EOG7TJFZ9 NM_001103545 RS015475 1 0.08 0.32 1.87E-08

ovarian tumor otu EOG7N9B8P NM_001272403 RS009292 1 0.51 3.79E-07 8.00E-07

ovo/shavenbaby ovo EOG7C0766 NM_001169202 RS015380 1 0.83 0.003 0.24

runt run EOG73RPRJ NM_078700 RS006486 1 0.99 0.71 0.23

sansfille snf EOG78WZ6X NM_078490 RS011497 1 0.41 0.91 0.035

Sex-lethal Sxl EOG72ZQVX NM_001031891 RS002675 1 0.69 0.024 0.32

scute/sisterlessB sc EOG7HJ870 NM_057455 RS001853 1 0.94 2.13E-14 0.00021

scute/sisterlessB sc EOG7HJ870 NM_057455 RS001854 N/A N/A N/A N/A

scute/sisterlessB sc EOG7HJ870 NM_057455 RS008845 N/A N/A N/A N/A

sisterless-A sisA EOG7FZBDN NM_078561 - - - - -

standstill stil EOG7RVNX6 NM_057404 - - - - -

transformer tra EOG7HN4HG NM_079390 RS002588 1 0.70 0.73 0.52


transformer2 tra2 EOG7Z3SMD NM_057416 RS011357 1 0.87 0.82 0.76

virilizer vir EOG7DRWGS NM_080161 RS015320 1 0.75 0.51 0.41

P-element somatic inhibitor PSI EOG7K19SX NM_001110343* RS012378 0.68 1 0.08 5.77E-04

IGF-II mRNA BP IMP EOG738BJT XM_004929848* RS011169 0.98 1 9.54E-13 0.14

Masculinizer Masc EOG7QS39K AB840788* RS005147 0.62 1 0.11 0.52

Feminizer Fem AB840787* - - - - -

N/A: data are not available.
*Accession numbers of Bombyx mori orthologs, which are absent in the Drosophila genome.


Table S13. Histone-modifying enzyme genes in Reticulitermes speratus and their orthologs in Zootermopsis nevadensis and Drosophila melanogaster. Gene names follow the human nomenclature. Expression levels were compared among castes and between sexes in each body part (head and thorax + abdomen). Asterisks indicate significantly differentially expressed genes (GLM analysis, FDR < 0.05).
Columns: Gene name; D. melanogaster ortholog; Z. nevadensis ortholog; R. speratus gene ID; Expression differences among castes (FDR), thorax + abdomen; Expression differences among castes (FDR), head; Expression differences between sexes (FDR), thorax + abdomen; Expression differences between sexes (FDR), head.

Histone acetyltransferases

HAT1 FBgn0037376 Znev_12488 RS014176 0.41 0.409 0.985 1

KAT2A FBgn0020388 Znev_04968 RS007246 0.148 0.172 0.570 1

EP300 FBgn0261617 Znev_08401 RS009769 0.383 0.775 0.379 1

TAF1 FBgn0010355 Znev_01981 RS006173 8.02E-05* 0.655 0.428 1

KAT5 FBgn0026080 Znev_00128 RS009782 0.003* 0.360 0.794 1

KAT6A FBgn0034975 Znev_04899, Znev_04900, Znev_06347 RS008273 0.282 0.538 0.606 1

KAT7 FBgn28387 Znev_14581 RS001974 4.01E-04* 0.192 0.476 1

KAT8 - Znev_09388 RS014043 0.010* 0.731 0.985 1

KAT8 FBgn0014340 Znev_09335 RS014973 0.972 0.003* 0.906 1

ELP3 FBgn0031604 Znev_03596, Znev_16248 RS007305 0.002* 0.568 0.428 1

GTF3C4 - Znev_04938 RS006156 0.020* 0.437 0.805 1

NCOA2 - Znev_05082, Znev_05083 RS006636 1.56E-04* 2.83E-04* 0.840 1

CLOCK FBgn0023076 - RS010134 0.197 0.855 0.965 1

CSRP2BP FBgn0032691 Znev_02989 RS007957 0.290 0.122 0.913 1

ATF2 FBgn0265193 Znev_01083 RS004334 0.233 0.471 0.795 1

MGEA5 FBgn0038870 Znev_10779 RS014601 8.72E-09* 0.777 0.810 1

NAA60 FBgn0036039 Znev_04119 RS011057 0.019* 0.941 0.725 1

Histone deacetylases

HDAC1 FBgn0015805 Znev_03795 RS012536 0.020* 0.005* 0.683 1

HDAC3 FBgn0025825 Znev_05602, Znev_18002 RS008767 0.008* 0.067 0.891 1

HDAC4 FBgn0041210 Znev_00349 RS004692 0.006* 0.604 0.841 1

HDAC6 FBgn0026428 Znev_02211 RS001937 0.002* 0.022* 0.922 1

HDAC8 - Znev_12928 RS010779 0.246 0.706 0.458 1

HDAC11 FBgn0051119 Znev_10901 RS007375 0.606 1.34E-05* 0.881 1

SIRT1 FBgn0024291 Znev_11203 RS007459 0.157 4.99E-04* 0.992 1

SIRT2 FBgn0038788 Znev_11971 RS013047 0.291 0.723 0.598 1

SIRT3 - Znev_01239 RS012446 0.415 0.007* 0.901 1

SIRT4 FBgn0029783 Znev_10250 RS011722 0.035* 0.896 0.849 1


SIRT5 - Znev_14842 RS008195 0.131 0.315 0.598 1

SIRT6 FBgn0037802 Znev_09433 RS010824 3.54E-07* 0.054 0.322 1

SIRT7 FBgn0039631 Znev_03848 RS012147 0.047* 0.032* 0.598 1

Histone methyltransferases

PRMT1 FBgn0037834 Znev_11976 RS013040 0.342 7.78E-04* 0.821 1

CARM1 FBgn0037770 Znev_11468 RS009701 0.669 0.879 0.978 1

PRMT5 FBgn0015925 Znev_08220 RS011600 4.00E-04* 0.612 0.991 1

PRMT7 FBgn0034817 Znev_07771 RS014915 0.549 0.668 0.745 1

SUV39H2 FBgn0263755 Znev_00097 RS015134 0.759 0.633 0.662 1

EHMT1 FBgn0040372 Znev_05631 RS003423 0.010* 0.181 0.929 1

EHMT1 - - RS001807 0.138 0.163 0.936 1

SETDB1 FBgn0086908 Znev_15214 RS011823 5.88E-06* 0.224 0.634 1

KMT2B FBgn0003862 Znev_03032 RS005841 0.003* 0.002* 0.828 1

KMT2C FBgn0023518 Znev_09224 RS010172 0.263 0.765 0.502 1

KMT2C FBgn0263667 Znev_09226 RS010173 0.161 0.706 0.448 1

KMT2E FBgn0036398 Znev_06854 RS008681 0.083 0.972 0.298 1

SETD1A FBgn0040022 Znev_03918 RS010178 4.36E-04* 0.286 0.810 1

ASH1L FBgn0005386 Znev_16755 RS005572 0.216 0.641 0.639 1

SETD2 FBgn0030486 Znev_02205 RS014526 0.081 0.723 0.906 1

WHSC1L1 FBgn0039559 Znev_13059, Znev_14928 RS006777 4.42E-06* 0.052 0.403 1

SMYD3 FBgn0011566 Znev_00597 RS009274 0.020* 0.008* 0.443 1

DOT1L FBgn0264495 Znev_02995 RS009425 9.14E-05* 1.16E-07* 0.903 1

SETD8 FBgn0011474 Znev_05607 RS008772 0.472 1.13E-04* 0.559 1

SUV420H1 FBgn0025639 Znev_05656 RS005423 0.024* 0.003* 0.961 1

EZH2 FBgn0000629 Znev_02258 RS003652 0.010* 0.084 0.977 1

SETMAR FBgn0037841 Znev_04699 RS008713 6.48E-04* 0.497 0.909 1

SMYD4 FBgn0033427 Znev_01984 RS006171 3.35E-04* 1.69E-04* 0.511 1

SMYD5 FBgn0038869 Znev_01118 RS004372 0.013* 0.240 0.232 1

SETD3 FBgn0052732 Znev_12254 RS012270 0.545 0.005* 0.874 1

SETD4 FBgn0053230 Znev_05436 RS004593 0.015* 0.029* 0.866 1

Histone demethylases

KDM1A FBgn0260397 Znev_07217 RS006951 0.867 0.022* 0.885 1

KDM1A - Znev_00885 RS005465 0.012* 0.249 0.624 1

KDM1A - Znev_00889 RS005467 0.042* 0.082 0.630 1

KDM2A FBgn0037659 Znev_06646 RS001102 0.294 0.297 0.915 1

KDM3A Znev_17182, Znev_09543 RS010951 0.017* 1.000 0.800 1

KDM4C Znev_07618 RS014991 0.413 0.873 0.943 1

KDM5A KDM5 Znev_17776 RS002306 0.217 0.043* 0.637 1

KDM6A Utx Znev_00365 RS002457 0.049* 0.014* 0.975 1


KDM7A -- Znev_15990 RS015526 0.061 0.294 0.760 1

KDM8 FBgn0035166 Znev_07996 RS011528 0.055 0.516 0.991 1

JARID2 FBgn0036004 Znev_08621 RS010046 0.035* 0.165 0.560 1

JMJD6 FBgn0038948 Znev_10481 RS002889 0.008* 0.153 0.876 1

C14orf169 FBgn0266570 Znev_02698 RS011375 1.51E-04* 0.177 0.157 1

*FDR < 0.05


Table S14. DNA methylation-related genes in Reticulitermes speratus and their orthologs in Zootermopsis nevadensis and Drosophila melanogaster. Gene names follow the human nomenclature. Expression levels were compared among castes and between sexes in each body part (head and thorax + abdomen). Asterisks indicate significantly differentially expressed genes (GLM analysis, FDR < 0.05).
Columns: Gene name; D. melanogaster ortholog; Z. nevadensis ortholog; R. speratus gene ID; Expression differences among castes (FDR), thorax + abdomen; Expression differences among castes (FDR), head; Expression differences between sexes (FDR), thorax + abdomen; Expression differences between sexes (FDR), head.

DNMT1 - Znev_18516 RS003121 0.339 0.243 0.849 1

DNMT3 - Znev_11906, Znev_06587 RS003573 0.147 0.006* 0.993 1

DNMT3 - Znev_11906, Znev_06587 RS003574 - - - -

AGT FBgn0024912 Znev_07784 RS014911 0.572 0.276 0.973 1

CG9154 FBgn0031777 Znev_08521 RS005202 0.023* 0.512 0.219 1

DMAP1 FBgn0034537 Znev_09468 RS014390 0.069 0.269 0.842 1

MBD-like FBgn0027950 Znev_00583 RS002943 2.90E-06* 0.002* 0.202 1

MBD-R2 FBgn0038016 Znev_06566, Znev_01879 RS013654 0.006* 0.023* 0.625 1

MBD-R2 FBgn0038016 Znev_06566, Znev_01879 RS012154 0.692 0.812 0.633 1

TET FBgn0263392 Znev_11370 RS012619 0.868 0.593 0.961 1

TDG FBgn0026869 Znev_03074 RS003560 0.012* 0.083 0.711 1

*FDR < 0.05


Table S15. Odorant receptor (OR) genes in Reticulitermes speratus.

| Gene | Gene ID | Scaffold | Strand | Sequence* | Exons |
|---|---|---|---|---|---|
| RsOrco | RS006385 | 27 | - | complete | 7 |
| RsOr1 | RS010432 | 47 | - | fragment | 2 |
| RsOr2 | RS010431 | 47 | - | fragment | 1 |
| RsOr9 | RS007507 | 31 | + | NTE | 3 |
| RsOr10 | RS013588 | 703 | - | NTE | 3 |
| RsOr11 | RS005817 | 248 | - | fragment | 1 |
| RsOr12 | RS001899 | 1287 | + | fragment | 1 |
| RsOr13 | RS004141 | 186 | + | fragment | 2 |
| RsOr14 | RS004139 | 186 | + | complete | 3 |
| RsOr15 | RS004142 | 186 | + | fragment | 1 |
| RsOr16 | RS004140 | 186 | + | fragment | 1 |
| RsOr17 | RS004138 | 186 | + | fragment | 1 |
| RsOr26 | RS007955 | 337 | + | fragment | 1 |
| RsOr28 | RS013645 | 72 | + | fragment | 1 |
| RsOr30 | RS010974 | 50 | - | fragment | 1 |
| RsOr35 | RS000450 | 10 | - | complete | 6 |
| RsOr40 | RS013053 | 65 | - | fragment | 1 |
| RsOr41 | RS014428 | 807 | + | fragment | 1 |
| RsOr42 | RS001393 | 118 | - | NTE | 2 |
| RsOr43 | RS001394 | 118 | - | NTE | 5 |
| RsOr44 | RS001395 | 118 | - | fragment | 3 |
| RsOr45 | RS001396 | 118 | - | fragment | 4 |
| RsOr46 | RS011968 | 570 | + | fragment | 1 |
| RsOr47 | RS005073 | 212 | - | fragment | 4 |
| RsOr48 | RS008327 | 365 | - | fragment | 1 |
| RsOr56 | RS006316 | 269 | - | complete | 7 |
| RsOr66 | RS013051 | 65 | - | fragment | 1 |
| RsOr67 | RS013052 | 65 | - | fragment | 1 |
| RsOr68 | RS008460 | 37 | - | NTE | 3 |
| RsOr69 | RS008461 | 37 | - | fragment | 1 |
| RsOr70 | RS008462 | 37 | - | fragment | 1 |

*NTE: the initial methionine (M) is missing from the transcript sequence. CTE: the stop codon is missing from the transcript sequence.


Table S16. Gustatory receptor (GR) genes in Reticulitermes speratus.

| Gene | Gene ID | Scaffold | Strand | Sequence* | Exons |
|---|---|---|---|---|---|
| RsGr1 | RS001507 | 12 | - | complete | 8 |
| RsGr2 | RS001504 | 12 | - | NTE | 3 |
| RsGr3 | RS010223 | 458 | + | complete | 6 |
| RsGr4 | RS007965 | 338 | - | fragment | 3 |
| RsGr5a | RS013641 | 72 | + | fragment | 1 |
| RsGr5b | RS013642 | 72 | + | fragment | 1 |
| RsGr6 | RS013643 | 72 | - | complete | 7 |
| RsGr7 | RS001378 | 118 | + | complete | 6 |
| RsGr8 | RS011470 | 530 | + | CTE | 5 |
| RsGr9 | RS001377 | 118 | - | complete | 8 |
| RsGr10a | RS001379 | 118 | + | fragment | 1 |
| RsGr10b | RS001380 | 118 | + | fragment | 1 |
| RsGr11 | RS001506 | 12 | - | fragment | 1 |
| RsGr12 | RS001505 | 12 | - | fragment | 2 |
| RsGr13 | RS011406 | 53 | + | fragment | |
| RsGr14 | RS012780 | 62 | - | fragment | 1 |
| RsGr15 | RS005943 | 25 | + | NTE | 3 |
| RsGr16 | RS011323 | 524 | + | CTE | 1 |
| RsGr17 | RS009029 | 4 | - | complete | 1 |
| RsGr18 | RS011329 | 525 | + | complete | 1 |
| RsGr19 | RS015443 | 963 | - | NTE | 1 |
| RsGr20 | RS011328 | 525 | - | complete | 1 |
| RsGr21 | RS009385 | 410 | - | complete | 1 |
| RsGr22 | RS003203 | 155 | - | fragment | 1 |
| RsGr23 | RS003204 | 155 | + | fragment | 1 |

*NTE: the initial methionine (M) is missing from the transcript sequence. CTE: the stop codon is missing from the transcript sequence.


Table S17. Ionotropic receptor (IR) genes in Reticulitermes speratus.

| Gene | Gene ID | Scaffold | Strand | Sequence* | Exons |
|---|---|---|---|---|---|
| RsKAINATE1 | RS003449 | 1616 | - | CTE | 14 |
| RsKAINATE2 | RS001970 | 1291 | + | NTE | 16 |
| RsKAINATE3 | RS001366 | 1174 | + | complete | 16 |
| RsKAINATE4 | RS003544 | 166 | - | NTE | 16 |
| RsKAINATE5 | RS012852 | 63 | - | NTE | 16 |
| RsKAINATE6 | RS010902 | 5 | + | complete | 19 |
| RsNMDAR1 | RS012806 | 62 | - | complete | 17 |
| RsNMDAR2 | RS006920 | 293 | - | complete | 17 |
| RsNMDAR3 | RS004430 | 193 | + | complete | 13 |
| RsNMDAR4 | RS010428 | 47 | - | fragment | 1 |
| RsNMDAR4 | RS010430 | 47 | - | CTE | 1 |
| RsNMDAR4 | RS010426 | 47 | - | NTE | 8 |
| RsNMDAR4 | RS010427 | 47 | - | fragment | 4 |
| RsNMDAR4 | RS010429 | 47 | - | fragment | 1 |
| RsIR8a | RS015433 | 96 | - | complete | 16 |
| RsIR25a | RS008934 | 390 | + | complete | 17 |
| RsIR21a | RS004709 | 20 | - | NTE | 2 |
| RsIR21a | RS004710 | 20 | - | fragment | 4 |
| RsIR21b | RS011973 | 571 | + | complete | 9 |
| RsIR41a1 | RS004249 | 189 | - | complete | 4 |
| RsIR41a2 | RS001436 | 119 | - | NTE | 2 |
| RsIR41a2 | RS001437 | 119 | - | fragment | 1 |
| RsIR41a3 | RS009772 | 430 | + | complete | 5 |
| RsIR41a4 | RS013668 | 72 | + | NTE | 3 |
| RsIR41a4 | RS013667 | 72 | + | fragment | 3 |
| RsIR41a5 | RS004734 | 201 | - | fragment | 1 |
| RsIR41a5 | RS004733 | 201 | - | fragment | 1 |
| RsIR68a | RS000299 | 1 | - | fragment | 1 |
| RsIR68a | RS000298 | 1 | - | complete | 5 |
| RsIR68a | RS000297 | 1 | - | complete | 1 |
| RsIR75a | RS006718 | 281 | + | fragment | 2 |
| RsIR75b | RS009087 | 4 | + | complete | 11 |
| RsIR75c | RS011341 | 527 | - | NTE | 4 |
| RsIR75c | RS011342 | 527 | - | CTE | 5 |
| RsIR75d | RS012791 | 62 | + | fragment | 1 |
| RsIR75e | RS006720 | 281 | + | CTE | 1 |
| RsIR75e | RS006723 | 281 | + | NTE | 1 |
| RsIR75e | RS006722 | 281 | + | fragment | 1 |
| RsIR75e | RS006721 | 281 | + | fragment | 1 |
| RsIR75e | RS006719 | 281 | + | fragment | 1 |
| RsIR75f | RS006724 | 281 | + | complete | 6 |
| RsIR75g | RS012790 | 62 | + | fragment | 1 |
| RsIR75h | RS008223 | 353 | - | complete | 10 |
| RsIR75j | RS011943 | 57 | - | complete | 9 |
| RsIR75k | RS003408 | 160 | - | complete | 9 |
| RsIR75k | RS003409 | 160 | - | fragment | 1 |
| RsIR75m | RS012788 | 62 | + | complete | 3 |
| RsIR75m | RS012789 | 62 | + | CTE | 4 |
| RsIR75n | RS012793 | 62 | + | fragment | 2 |
| RsIR75n | RS012792 | 62 | + | fragment | 1 |
| RsIR75o | RS012794 | 62 | + | fragment | 1 |
| RsIR75p | RS012795 | 62 | + | CTE | 1 |
| RsIR76b | RS015388 | 955 | + | complete | 4 |
| RsIR76b | RS015389 | 955 | + | CTE | 4 |
| RsIR93a | RS004370 | 19 | - | fragment | 1 |
| RsIR93a | RS004368 | 19 | - | fragment | 5 |
| RsIR93a | RS004369 | 19 | - | fragment | 1 |
| RsIR100 | RS006921 | 293 | - | CTE | 1 |
| RsIR101 | RS000952 | 109 | + | complete | 2 |
| RsIR103 | RS000953 | 109 | + | complete | 3 |
| RsIR104 | RS011634 | 542 | + | complete | 11 |
| RsIR105 | RS009960 | 444 | + | complete | 11 |
| RsIR106 | RS006879 | 29 | + | complete | 8 |
| RsIR107 | RS008138 | 35 | + | complete | 1 |
| RsIR108 | RS007756 | 321 | + | fragment | 3 |
| RsIR109 | RS001336 | 116 | + | complete | 9 |
| RsIR110 | RS008632 | 379 | + | fragment | 1 |
| RsIR111 | RS015446 | 963 | + | fragment | 1 |
| RsIR111 | RS015444 | 963 | + | fragment | 1 |
| RsIR111 | RS015445 | 963 | + | fragment | 1 |
| RsIR112 | RS001346 | 1161 | - | NTE | 2 |
| RsIR113 | RS008629 | 379 | + | NTE | 1 |
| RsIR114 | RS013234 | 663 | + | fragment | 1 |
| RsIR115 | RS009235 | 401 | + | fragment | 1 |
| RsIR116 | RS008630 | 379 | + | CTE | 5 |
| RsIR117 | RS009236 | 401 | + | fragment | 1 |
| RsIR118 | RS014458 | 812 | + | fragment | 1 |
| RsIR119 | RS013409 | 69 | + | fragment | 1 |
| RsIR119 | RS013410 | 69 | + | fragment | 2 |
| RsIR120 | RS002319 | 1356 | - | CTE | 1 |
| RsIR121 | RS013712 | 728 | + | fragment | 1 |
| RsIR122 | RS015442 | 963 | - | fragment | 1 |
| RsIR123 | RS013711 | 728 | + | CTE | 1 |
| RsIR124 | RS008631 | 379 | + | fragment | 1 |
| RsIR125 | RS013235 | 663 | + | fragment | 2 |
| RsIR126 | RS013709 | 728 | - | fragment | 1 |
| RsIR127 | RS005838 | 248 | + | complete | 9 |
| RsIR128 | RS013236 | 663 | + | fragment | 1 |
| RsIR129 | RS001385 | 118 | + | fragment | 1 |
| RsIR130 | RS001384 | 118 | + | fragment | 8 |
| RsIR131 | RS001386 | 118 | + | fragment | 2 |
| RsIR132 | RS001387 | 118 | + | fragment | 1 |
| RsIR133 | RS003275 | 158 | + | complete | 13 |
| RsIR134 | RS001388 | 118 | + | fragment | 1 |
| RsIR135 | RS006500 | 273 | + | fragment | 1 |
| RsIR136 | RS003792 | 175 | + | complete | 19 |
| RsIR137 | RS007301 | 301 | - | NTE | 1 |
| RsIR138 | RS003198 | 155 | - | fragment | 3 |
| RsIR139 | RS006166 | 26 | - | complete | 25 |
| RsIR140 | RS009364 | 41 | - | complete | 3 |
| RsIR141 | RS015441 | 963 | - | fragment | 2 |
| RsIR142 | RS005465 | 23 | + | complete | 12 |
| RsIR143 | RS012712 | 615 | + | complete | 1 |
| RsIR144 | RS009304 | 41 | + | complete | 9 |
| RsIR145 | RS007402 | 308 | - | CTE | 1 |
| RsIR148 | RS002873 | 147 | - | complete | 2 |
| RsIR148 | RS002874 | 147 | - | fragment | 1 |
| RsIR149 | RS002123 | 130 | + | complete | 4 |
| RsIR149 | RS002121 | 130 | + | CTE | 1 |
| RsIR149 | RS002122 | 130 | + | fragment | 2 |
| RsIR150 | RS006006 | 250 | - | fragment | 2 |
| RsIR159 | RS011158 | 51 | - | complete | 1 |
| RsIR161 | RS009527 | 42 | - | NTE | 1 |
| RsIR162 | RS004299 | 19 | - | NTE | 1 |
| RsIR162 | RS004298 | 19 | - | complete | 1 |
| RsIR164 | RS006947 | 296 | - | NTE | 1 |
| RsIR165 | RS003995 | 18 | - | complete | 1 |
| RsIR169 | RS014584 | 84 | - | complete | 9 |
| RsIR172 | RS004637 | 2 | + | NTE | 1 |
| RsIR182 | RS006698 | 280 | + | complete | 1 |
| RsIR187 | RS003025 | 15 | - | CTE | 1 |
| RsIR192 | RS001765 | 1244 | - | complete | 1 |
| RsIR195 | RS003107 | 1510 | - | CTE | 1 |
| RsIR202 | RS010498 | 48 | - | complete | 1 |
| RsIR203 | RS007486 | 31 | - | complete | 1 |
| RsIR210 | RS013710 | 728 | - | complete | 1 |
| RsIR211 | RS014731 | 864 | - | complete | 2 |
| RsIR215 | RS008190 | 35 | - | complete | 1 |
| RsIR217 | RS008540 | 373 | + | complete | 1 |
| RsIR218 | RS008539 | 373 | - | complete | 1 |

*NTE: the initial methionine (M) is missing from the transcript sequence. CTE: the stop codon is missing from the transcript sequence.


Table S18. Sensory neuron membrane protein (SNMP) genes in Reticulitermes speratus.

| Gene | Gene ID | Scaffold | Strand | Sequence | Exons |
|---|---|---|---|---|---|
| RspeSNMP1a | RS007398 | 307 | - | complete | 9 |
| RspeSNMP1b | RS007395 | 307 | - | complete | 9 |
| RspeSNMP1c | RS007396 | 307 | - | fragment | 1 |
| RspeSNMP1c | RS007397 | 307 | - | fragment | 1 |
| RspeSNMP1d | RS007393 | 307 | - | fragment | 2 |
| RspeSNMP1d | RS007394 | 307 | - | fragment | 5 |
| RspeSNMP2 | RS007977 | 339 | - | complete | 9 |


Table S19. Chemosensory protein (CSP) genes in Reticulitermes speratus.

| Gene | Gene ID | Scaffold | Strand | Sequence* | Exons |
|---|---|---|---|---|---|
| RspeCSP1 | RS000584 | 1000 | - | complete | 2 |
| RspeCSP2 | RS000585 | 1000 | - | complete | 2 |
| RspeCSP3 | RS003292 | 1596 | + | complete | 2 |
| RspeCSP4 | RS003144 | 1536 | - | CTE | 1 |
| RspeCSP5 | RS001446 | 1196 | - | complete | 2 |
| RspeCSP6 | RS010441 | 471 | + | NTE | 1 |
| RspeCSP7 | RS010442 | 471 | + | complete | 2 |
| RspeCSP8 | RS009753 | 43 | + | NTE | 2 |
| RspeCSP9 | RS001447 | 1196 | - | complete | 2 |
| RspeCSP10 | RS002912 | 148 | - | complete | 3 |

*NTE: the initial methionine (M) is missing from the transcript sequence. CTE: the stop codon is missing from the transcript sequence.


Table S20. Biogenic amine- and neuropeptide-related genes in Reticulitermes speratus.

| Gene name | Gene ID |
|---|---|
| Biogenic amines (biosynthesis) | |
| Henna (Phenylalanine hydroxylase) | RS010135 |
| Pale (Tyrosine hydroxylase) | RS010906 |
| Dopa decarboxylase | RS006642 |
| Dopamine N acetyltransferase | RS005696 |
| Tyrosine decarboxylase 2 | RS013885 |
| Tyramine β hydroxylase | RS013347 |
| Tryptophan hydroxylase | RS010115 |
| Biogenic amines (receptor) | |
| Dopamine 1-like receptor 1 (Dop1) | RS005814 |
| Dopamine 1-like receptor 2 (Dop2) | RS005701 |
| Dopamine 2-like receptor (Dop3) | RS012926 |
| Dopamine/Ecdysteroid receptor | RS013326 |
| Octopamine-Tyramine receptor | RS000810 |
| Octopamine receptor | RS003890 |
| Octopamine receptor in mushroom bodies | RS006582 |
| Octopamine β2 receptor | RS008926 |
| 5-hydroxytryptamine (serotonin) receptor 1A | RS000074 |
| 5-hydroxytryptamine (serotonin) receptor 1 | RS001623 |
| 5-hydroxytryptamine (serotonin) receptor 2 | RS007168 |
| 5-hydroxytryptamine (serotonin) receptor 2B | RS007166 |
| muscarinic Acetylcholine Receptor, A-type | RS008036 |
| muscarinic Acetylcholine Receptor, B-type | RS008383 |
| Adenosine receptor | RS012570 |
| Histamine-gated chloride channel subunit 1 | RS002296 |
| Neuropeptides | |
| Neuropeptide Y receptor | RS006823 |
| CCHamide-1 receptor | RS002615 |
| Adipokinetic hormone | RS002158 |
| Pigment dispersing factor | RS003293 |
| Ion transport peptide | RS003304 |
| Neuroparsin | RS003352 |
| Pyrokinin | RS003469 |
| Neuropeptide F | RS003515 |
| Orcokinin A transcript | RS005362 |
| FMRFamide | RS005650 |
| Vasotocin-neurophysin | RS006779 |
| Neuropeptide F 2 | RS006944 |
| CRF-like Diuretic hormone | RS007172 |
| RYamide | RS007800 |
| Sulfakinin | RS008491 |
| Partner of Bursicon | RS008861 |
| Bursicon alpha | RS008862 |
| Glycoprotein hormone beta5 (GPB5) | RS009005 |
| Glycoprotein hormone alpha2 (GPA2) | RS009006 |
| Trissin | RS009815 |
| short Neuropeptide F | RS009857 |
| Myosuppressin | RS010591 |
| Tachykinin | RS012013 |
| Leucokinin | RS012022 |
| Crustacean cardioactive peptide | RS012242 |
| Neuropeptide-like precursor 1 | RS012714 |
| Calcitonin-like Diuretic hormone, Diuretic hormone 31 | RS012750 |
| Corazonin | RS013838 |
| Natalisin | RS013901 |
| CCHamide 2 | RS014742 |


Table S21. Juvenile hormone (JH)-related genes in Reticulitermes speratus.

| Gene name | Symbol | Gene ID |
|---|---|---|
| JH biosynthesis genes | | |
| Acetoacetyl-CoA thiolase | AcoAT | RS011067 |
| 3-Hydroxy-3-Methylglutaryl-CoA synthase 1 | HMGS1 | RS005033 |
| 3-Hydroxy-3-Methylglutaryl-CoA synthase 2 | HMGS2 | RS013349 |
| 3-Hydroxy-3-Methylglutaryl-CoA reductase | HMGR | RS000919 |
| Mevalonate kinase | MK | RS001132 |
| Phosphomevalonate kinase | PK | RS012473 |
| Diphosphomevalonate decarboxylase | DD | RS005846 |
| Isopentenyl-diphosphate δ-isomerase | IPPI | RS012944 |
| Farnesol oxidase | FO | RS008250 |
| Farnesal dehydrogenase 1 | FD1 | RS006558 |
| Farnesal dehydrogenase 2 | FD2 | RS014352 |
| JH acid methyltransferase | JHAMT | RS007861 |
| JH epoxidase | CYP15A1 | RS013787 |
| JH epoxidase homolog | CYP15F1 | RS000985 |
| JH epoxidase homolog | CYP4C7 | RS004449 |
| JH signaling genes | | |
| Methoprene-tolerant | Met | RS010120 |
| Steroid receptor coactivator | SRC | RS006636 |
| Krüppel homolog-1 | Kr-h1 | RS002081 |
| Broad-Complex | Br-C | RS011102 |
| Ecdysone-induced protein 93 | E93 | RS003976 |
| Neuropeptide-related genes | | |
| Allatotropin precursor | | RS006223 |
| Allatotropin receptor | | RS005076 |
| Allatostatin precursor | | RS000574 |
| Allatostatin receptor | | RS001538 |
| JH binding protein genes | | |
| Hexamerin 1 | Hex1 | RS000846 |
| Hexamerin 2 | Hex2 | RS011205 |
| JH degradation genes | | |
| JH esterase | JHE | RS001960 |
| JH esterase | JHE | RS001961 |
| JH esterase | JHE | RS001964 |
| JH esterase | JHE | RS001965 |
| JH esterase | JHE | RS001966 |
| JH esterase | JHE | RS001967 |
| JH esterase | JHE | RS002191 |
| JH esterase | JHE | RS003190 |
| JH esterase | JHE | RS003673 |
| JH esterase | JHE | RS003910 |
| JH esterase | JHE | RS004129 |
| JH esterase | JHE | RS004712 |
| JH esterase | JHE | RS004713 |
| JH esterase | JHE | RS006008 |
| JH esterase | JHE | RS011642 |
| JH esterase | JHE | RS014537 |
| JH esterase | JHE | RS014538 |
| JH epoxide hydrolase | JHEH | RS011542 |


Table S22. Ecdysone-related genes in Reticulitermes speratus.

| Gene name | Gene ID |
|---|---|
| 20E synthesis | |
| neverland | RS010513 |
| shroud | RS009788 |
| CYP 307a1 (spook) | RS010514 |
| CYP 306a1 (phantom) | RS002862 |
| CYP 302a1 (disembodied) | RS012246 |
| CYP 315a1 (shadow) | RS010451 |
| CYP 314a1 (shade) | RS006327 |
| 20E receptor | |
| Ecdysone receptor (EcR) | RS006194 |
| ultraspiracle (USP) | RS005985 |
| 20E signaling | |
| Hormone receptor 3 (HR3) | RS006489 |
| Hormone receptor 4 (HR4) | RS000766 |
| Hormone receptor-like in 38 (HR38) | RS008487 |
| Hormone receptor-like in 39 (HR39) | RS004674 |
| Hormone receptor-like in 78 (HR78) | RS003557 |
| fushi tarazu transcription factor 1 (FTZ-F1) | RS013785 |
| Ecdysone-induced protein 63 (E63) | RS002747 |
| Ecdysone-induced protein 74 (E74) | RS009331 |
| Ecdysone-induced protein 75 (E75) | RS014319 |
| Ecdysone-induced protein 93 (E93) | RS003976 |
| Ecdysone-induced protein 78 (E78) | RS011677 |


Table S23. Insulin signaling genes in Drosophila melanogaster, Zootermopsis nevadensis and Reticulitermes speratus.
Columns: Gene name; Symbol; Function; D. melanogaster; Z. nevadensis; R. speratus.

Insulin-like peptide 1 Ilp1 Ligand FBgn0044051

Insulin-like peptide 2 Ilp2 Ligand FBgn0036046

Insulin-like peptide 3 Ilp3 Ligand FBgn0044050

Insulin-like peptide 4 Ilp4 Ligand FBgn0044049

Insulin-like peptide 5 Ilp5 Ligand FBgn0044048

Insulin-like peptide 6 Ilp6 Ligand FBgn0044047

Insulin-like peptide 7 Ilp7 Ligand FBgn0044046

Insulin-like peptide 8 Ilp8 Ligand FBgn0036690

Insulin-like peptides (Z. nevadensis genes): Znev_05166, Znev_05167, Znev_07008, Znev_07935, Znev_07936

Insulin-like peptides (R. speratus genes): RS000535, RS000536, RS002145, RS008597, RS008598

Insulin-like receptor InR Receptor FBgn0283499 Znev_02684, Znev_02685, Znev_16736 RS000922, RS007018, RS007019

chico chico Receptor FBgn0024248 Znev_13974 RS000780

Lnk Lnk Receptor FBgn0028717 Znev_12363 RS006416

Phosphatidylinositol 3-kinase 92E Pi3K92E Downstream of insulin receptor FBgn0015279 Znev_01400 RS000592

Pi3K21B Pi3K21B Downstream of insulin receptor FBgn0020622 Znev_04386, Znev_11082 RS004930, RS006442

Phosphoinositide-dependent kinase 1 Pdk1 Downstream of insulin receptor FBgn0020386 Znev_04009 RS010411

Akt1 Akt1 Downstream of insulin receptor FBgn0010379 Znev_04339 RS000520

Phosphatase and tensin homolog Pten Downstream of insulin receptor FBgn0026379 Znev_16636 RS011450

Ribosomal protein S6 kinase S6k Downstream of insulin receptor FBgn0283472 Znev_12092 RS003806

Target of rapamycin Tor Downstream of insulin receptor FBgn0021796 Znev_11128 RS002594

Tsc1 Tsc1 Downstream of insulin receptor FBgn0026317 Znev_11179 RS004848

gigas gig Downstream of insulin receptor FBgn0005198 Znev_09179 RS006709

Ribosomal protein S6 RpS6 Downstream of insulin receptor FBgn0261592 Znev_07260 RS014227

Thor Thor Downstream of insulin receptor FBgn0261560 Znev_16639 RS011442

forkhead box, sub-group O foxo Downstream of insulin receptor FBgn0038197 Znev_14322 RS013726

Tif-IA Tif-IA Downstream of insulin receptor FBgn0032988 Znev_17679 RS000398

shaggy sgg Downstream of insulin receptor FBgn0003371 - RS007030

Ras homolog enriched in brain Rheb Downstream of insulin receptor FBgn0041191 Znev_02068 RS001480


widerborst wdb Downstream of insulin receptor FBgn0027492 Znev_18447 RS014632

raptor raptor Downstream of insulin receptor FBgn0029840 Znev_09185 RS007403

Myc Myc Downstream of insulin receptor FBgn0262656 Znev_03784 RS012525

Ras oncogene at 85D (Ras1) Ras85D Ras signal FBgn0003205 Znev_07076, Znev_07077, Znev_14471 RS000933, RS003738, RS013615

happyhour hppy Ras signal FBgn0263395 Znev_03645 RS015403

rolled rl Ras signal FBgn0003256 Znev_00943 RS002853

Raf oncogene Raf Ras signal FBgn0003079 - RS009493

Downstream of raf1 Dsor1 Ras signal FBgn0010269 Znev_04859 RS012464

Ecdysone-inducible gene L2 ImpL2 Binding to ILP FBgn0001257 - -

convoluted conv Binding to ILP FBgn0261269 Znev_01328 RS008068

steppke step other FBgn0086779 Znev_11787 RS014498

melted melt other FBgn0023001 Znev_04452 RS000118



Table S24. Toolkit genes involved in wing formation in Drosophila melanogaster and Reticulitermes speratus.
Columns: Gene name; Gene symbol; D. melanogaster; R. speratus; Expression differences among castes (FDR), head; Expression differences among castes (FDR), thorax + abdomen.

6.38E-13* 0.0002* achaete ac FBgn0000022 RS001853 scute sc FBgn0004170 0.719 0.261 asense ase FBgn0000137 RS001854 0.536 0.423 abdominal A abd-A FBgn0000014 RS009109 0.150 0.052 Antennapedia Antp FBgn0260642 RS009104 1.25E-06* 0.0099* apterous ap FBgn0267978 RS004477 0.731 0.008* bifid bi FBgn0000179 RS006410 0.011* 0.023* brinker brk FBgn0024250 RS014094 5.24E-06* 0.909 cubitus interruptus ci FBgn0004859 RS009075 1.04E-05* 0.005* cut cut FBgn0004198 RS000064 0.766 0.007* Daughters against dpp dad FBgn0020493 RS009849 0.857 0.375 Distal-less dll FBgn0000157 RS010450 1.40E-08* 0.0001* decapentaplegic dpp FBgn0000490 RS012434 0.517 0.862 engrailed en FBgn0000577 RS005170 0.669 0.039* escargot esg FBgn0001981

snail sna FBgn0003448 RS003859

worniu wor FBgn0001983 0.358 0.326 extradenticle exd FBgn0000611 RS014099 0.101 0.525 hedgehog hh FBgn0004644 RS014625 0.068 0.916 homothorax hth FBgn0001235 RS001474 0.083 0.513 Notch N FBgn0004647 RS013346 0.594 0.968 nubbin nub FBgn0085424 RS011952 0.907 0.028* patched ptc FBgn0003892 RS008136 1.07E-33* 0.009* spalt major sal FBgn0261648 RS013395 0.397 3.41E-07* Sex combs reduced scr FBgn0003339 RS009101 0.727 0.836 scalloped sd FBgn0003345 RS008357 0.001* 0.500 Serrate ser FBgn0004197 RS005737 1.29E-62* 0.009* spitz spi FBgn0005672 0.145 0.001* Keren Krn FBgn0052179 RS004578, RS007465 vein vn FBgn0003984

gurken grk FBgn0001137


4.05E-07* 0.071 blistered bs FBgn0004101 RS014165 7.02E-09* 0.018* teashirt tsh FBgn0003866 RS003447 tiptop tio FBgn0028979 0.447 0.092 Ultrabithorax Ubx FBgn0003944 RS009105 0.0001* 0.014* vestigial vg FBgn0003975 RS011568 0.163 0.865 ventral veins lacking vvl FBgn0086680 RS011806 0.023* 0.087 wingless wg FBgn0284084 RS015426

*FDR < 0.05


Table S25. Immune-related genes in Reticulitermes speratus.
Columns: Family; Subfamily; Gene name; Gene ID.

Antimicrobial peptide crustin-like protein RS006883

Antimicrobial peptide defensin RS002487

Antimicrobial peptide defensin RS100003

Antimicrobial peptide locustin-like protein RS008268

Antimicrobial peptide prolixicin RS000201

Antimicrobial peptide termicin RS006953

Antimicrobial peptide thaumatin RS015368

Antimicrobial peptide thaumatin RS015369

Autophagy Autophagy-related RS000130

Autophagy Autophagy-related RS001459

Autophagy Autophagy-related RS002992

Autophagy Autophagy-related RS004579

Autophagy Autophagy-related RS006202

Autophagy Autophagy-related RS006853

Autophagy Autophagy-related RS006855

Autophagy Autophagy-related RS007186

Autophagy Autophagy-related RS007411

Autophagy Autophagy-related RS008423

Autophagy Autophagy-related RS008527

Autophagy Autophagy-related RS011626

Autophagy Autophagy-related RS011915

Autophagy Autophagy-related RS012458

Autophagy Autophagy-related RS014090

Autophagy Autophagy-related RS014951

Autophagy Buffy RS006694

Autophagy Target of rapamycin RS002594

1,3-beta-D glucan binding protein RS002847

1,3-beta-D glucan binding protein RS002848

1,3-beta-D glucan binding protein RS004742

1,3-beta-D glucan binding protein RS100018

1,3-beta-D glucan binding protein RS100020

Caspase RS003133

Caspase RS003489


Caspase RS005313

Caspase RS008854

Caspase RS012279

Caspase RS014033

Caspase RS014035

Caspase RS014259

Caspase activator RS012758

Catalase RS001412

Catalase RS001654

Catalase RS001671

Catalase RS001672

CLIP-domain serine protease RS004304

CLIP-domain serine protease RS004305

CLIP-domain serine protease RS004306

CLIP-domain serine protease RS004307

CLIP-domain serine protease RS007082

CLIP-domain serine protease RS010297

CLIP-domain serine protease RS010298

CLIP-domain serine protease RS010333

CLIP-domain serine protease RS010976

CLIP-domain serine protease RS010995

CLIP-domain serine protease RS011937

CLIP-domain serine protease RS012021

CLIP-domain serine protease RS012936

CLIP-domain serine protease RS013065

CLIP-domain serine protease RS013066

CLIP-domain serine protease RS013067

CLIP-domain serine protease RS013068

CLIP-domain serine protease RS013070

CLIP-domain serine protease RS013766

C-type lectin RS000351

C-type lectin RS001294

C-type lectin RS001855

C-type lectin RS003812

C-type lectin RS005053

C-type lectin RS006091

C-type lectin RS006677

C-type lectin RS007270

C-type lectin RS007745

C-type lectin RS007746

C-type lectin RS008198

C-type lectin RS008199

C-type lectin RS008200

C-type lectin RS008201

C-type lectin RS008204

C-type lectin RS010439

C-type lectin RS010966

C-type lectin RS011333

C-type lectin RS011334

C-type lectin RS011520

C-type lectin RS011567

C-type lectin RS011914

C-type lectin RS013652

C-type lectin RS013842

C-type lectin RS014902

C-type lectin RS015474

Fibrinogen-related protein RS000785

Fibrinogen-related protein RS006501

Fibrinogen-related protein RS008150

Fibrinogen-related protein RS011288

Galactoside-binding lectin RS005311

Galactoside-binding lectin RS006436

Galactoside-binding lectin RS013017

Galactoside-binding lectin RS013896

Galactoside-binding lectin RS013897

Inhibitor of apoptosis BIR repeat containing ubiquitin-conjugating enzyme RS006599

Inhibitor of apoptosis Death-associated inhibitor of apoptosis 1 RS011068

Inhibitor of apoptosis Death-associated inhibitor of apoptosis 2 RS011069

Inhibitor of apoptosis Deterin RS011623


IMD pathway member caspar RS003316

IMD pathway member Fas-associated death domain ortholog RS010104

IMD pathway member immune deficiency RS008532

IMD pathway member immune response deficient 5 RS010217

IMD pathway member kenny RS007928

IMD pathway member poor Imd response upon knock-in RS006682

IMD pathway member TAK1-associated binding protein 2 RS007289

IMD pathway member TGF-β activated kinase 1 RS002792

JAK/STAT pathway member domeless RS014980

JAK/STAT pathway member hopscotch RS002898

JAK/STAT pathway member Signal-transducer and activator of transcription protein at 92E RS003724

Lysozyme c-type lysozyme RS000427

Lysozyme c-type lysozyme RS002400

Lysozyme c-type lysozyme RS003406

Lysozyme c-type lysozyme RS008613

Lysozyme c-type lysozyme RS014698

Lysozyme c-type lysozyme RS100001

Lysozyme c-type lysozyme RS100002

Lysozyme c-type lysozyme RS100004

Lysozyme c-type lysozyme RS100021

Lysozyme c-type lysozyme RS100022

Lysozyme c-type lysozyme RS100023

Lysozyme c-type lysozyme RS100024

Lysozyme c-type lysozyme RS100025

Lysozyme c-type lysozyme RS100026

Lysozyme i-type lysozyme RS006054

Lysozyme i-type lysozyme RS008547

Lysozyme i-type lysozyme RS015579

MD2-like receptor RS013029

MD2-like receptor RS013030

MD2-like receptor RS013031

MD2-like receptor RS013032

MD2-like receptor RS015155

Peptidoglycan recognition protein RS005405


Peptidoglycan recognition protein RS006292

Peptidoglycan recognition protein RS010040

Peptidoglycan recognition protein RS013012

Peptidoglycan recognition protein RS013013

Peptidoglycan recognition protein RS100019

Prophenoloxidase RS008762

Peroxidase RS002951

Peroxidase RS002966

Peroxidase RS004201

Peroxidase RS004202

Peroxidase RS004203

Peroxidase RS004289

Peroxidase RS004290

Peroxidase RS006166

Peroxidase RS009222

Peroxidase RS009427

Peroxidase RS012128

Peroxidase RS012299

Peroxidase RS013770

Peroxidase RS014619

Relish-like protein dorsal RS002016

Relish-like protein Relish RS007470

Scavenger receptor RS001883

Scavenger receptor RS001884

Scavenger receptor RS001885

Scavenger receptor RS001886

Scavenger receptor RS001887

Scavenger receptor RS002546

Scavenger receptor RS002548

Scavenger receptor RS002549

Scavenger receptor RS002550

Scavenger receptor RS003004

Scavenger receptor RS003767

Scavenger receptor RS005025

Scavenger receptor RS007394


Scavenger receptor RS007395

Scavenger receptor RS007398

Scavenger receptor RS007853

Scavenger receptor RS007977

Scavenger receptor RS012012

Scavenger receptor RS012281

Scavenger receptor RS013965

Scavenger receptor RS014587

Superoxide dismutase RS001583

Superoxide dismutase RS001584

Superoxide dismutase RS002757

Superoxide dismutase RS004133

Superoxide dismutase RS009320

Superoxide dismutase RS013005

Spaetzle-like protein RS000600

Spaetzle-like protein RS000979

Spaetzle-like protein RS003253

Spaetzle-like protein RS004223

Spaetzle-like protein RS011245

Spaetzle-like protein RS011246

Spaetzle-like protein RS011463

Spaetzle-like protein RS011594

Serine protease inhibitor RS000470

Serine protease inhibitor RS001220

Serine protease inhibitor RS004075

Serine protease inhibitor RS005022

Serine protease inhibitor RS005471

Serine protease inhibitor RS006339

Serine protease inhibitor RS008737

Serine protease inhibitor RS011058

Serine protease inhibitor RS013650

Serine protease inhibitor RS014220

Serine protease inhibitor RS014221

Serine protease inhibitor RS014444

Small regulatory RNA pathway Argonaute 1 RS005621


Small regulatory RNA pathway Argonaute 2a RS010079

Small regulatory RNA pathway Argonaute 2b RS003821

Small regulatory RNA pathway Argonaute 3 RS003744

Small regulatory RNA pathway armitage RS006369

Small regulatory RNA pathway aubergine RS004296

Small regulatory RNA pathway Dicer 1 RS010489

Small regulatory RNA pathway Dicer 2 RS013700

Small regulatory RNA pathway drosha RS007949

Small regulatory RNA pathway loquacious RS001759

Small regulatory RNA pathway partner of drosha RS011750

Small regulatory RNA pathway piwi 1 RS002624

Small regulatory RNA pathway piwi 2 RS011008

Small regulatory RNA pathway R2D2 RS004300

Small regulatory RNA pathway Rm62 RS001900

Small regulatory RNA pathway Rm62 RS001901

Small regulatory RNA pathway Rm62 RS001902

Small regulatory RNA pathway Rm62 RS001905

Small regulatory RNA pathway Rm62 RS001906

Small regulatory RNA pathway Rm62 RS002802

Small regulatory RNA pathway Rm62 RS002903

Small regulatory RNA pathway Rm62 RS005517

Small regulatory RNA pathway Rm62 RS006077

Small regulatory RNA pathway Rm62 RS006940

Small regulatory RNA pathway Rm62 RS012401

Small regulatory RNA pathway Rm62 RS013373

Small regulatory RNA pathway spindle E RS010943

Small regulatory RNA pathway Tudor staphylococcal nuclease RS004901

Small regulatory RNA pathway vasa intronic gene RS011955

Thio-ester containing protein RS000697

Thio-ester containing protein RS010197

Thio-ester containing protein RS013791

Thio-ester containing protein RS014232

Toll pathway cactus RS009203

Toll pathway Myd88 RS011504

Toll pathway pelle RS014381


Toll pathway TNF-receptor-associated factor-like RS011260

Toll pathway tube RS013249

Toll pathway Toll receptor RS002528

Toll pathway Toll receptor RS005466

Toll pathway Toll receptor RS005470

Toll pathway Toll receptor RS005472

Toll pathway Toll receptor RS005474

Toll pathway Toll receptor RS007296

Toll pathway Toll receptor RS007975

Toll pathway Toll receptor RS008286


Table S26. Insecticide-target genes in Reticulitermes speratus and their homologs in Drosophila melanogaster.

Gene name    D. melanogaster    R. speratus

Ion channels

Acetylcholine esterase FBgn0000024 RS013479, RS009432

nicotinic Acetylcholine Receptor α1 FBgn0000036 RS002929

nicotinic Acetylcholine Receptor α2 FBgn0000039 RS002930

nicotinic Acetylcholine Receptor α3 FBgn0015519 RS001773

nicotinic Acetylcholine Receptor α4 FBgn0266347 RS001774

nicotinic Acetylcholine Receptor α5 FBgn0028875 RS003729

nicotinic Acetylcholine Receptor α6 FBgn0032151 RS014310

nicotinic Acetylcholine Receptor α7 FBgn0086778 RS003145

nicotinic Acetylcholine Receptor α8 - RS015063

nicotinic Acetylcholine Receptor α9 - RS015059

nicotinic Acetylcholine Receptor α10 - RS015059

nicotinic Acetylcholine Receptor β1 FBgn0000038 RS003147

nicotinic Acetylcholine Receptor β2 FBgn0004118 RS002931

nicotinic Acetylcholine Receptor β3 FBgn0031261 RS012432

GABA type A receptor FBgn0004244 RS010905

GABA type A receptor FBgn0030707 RS013195

Ligand-gated chloride channel FBgn0010240 RS013193

Glycine receptor FBgn0001134 RS007641

Histamine-gated chloride channel subunit FBgn0037950 RS002296

Histamine-gated chloride channel subunit FBgn0003011 -

Voltage-gated chloride channel subunit FBgn0051116 RS008419

Voltage-gated chloride channel subunit FBgn0033755 RS003825

Voltage-gated chloride channel subunit FBgn0036566 RS008614


Voltage-gated chloride channel subunit FBgn0038721 RS011813

Voltage-gated sodium channel subunit FBgn0264255 RS008214

Voltage-gated sodium channel subunit FBgn0085434 RS015165

Chitin synthesis

Glycogen phosphorylase FBgn0004507 RS013552, RS006448, RS006452

Trehalase FBgn0003748 RS004154, RS005093, RS013303

Hexokinase A FBgn0001186 RS007194

Hexokinase C FBgn0001187 no homolog

Phosphoglucose isomerase FBgn0003074 RS012390

Glutamine:fructose-6-phosphate aminotransferase FBgn0027341 RS005235

Glutamine:fructose-6-phosphate aminotransferase FBgn0039580 -

Glucosamine-6-phosphate N-acetyltransferase FBgn0039690 RS004411

phosphoglucose mutase FBgn0003076 RS009458

mummy FBgn0259749 RS010490

Chitin synthase FBgn0001311 RS006626

Chitin synthase FBgn0029091 RS006625

Muscle-related

Ryanodine receptor FBgn0011286 RS000001, RS015395

sarcoplasmic/endoplasmic reticulum Ca2+ ATPase FBgn0263006 RS000365

myosin light chain kinase FBgn0265045 RS008592, RS012598, RS012599, RS014874, RS014875

Calcium/Calmodulin-dependent protein kinase FBgn0005666 RS014950

Calmodulin FBgn0000253 RS014134, RS013623

Calcium/calmodulin-dependent protein kinase I FBgn0016126 RS010237

Calcium/calmodulin-dependent protein kinase II FBgn0264607 RS005236

Calmodulin-binding transcription activator FBgn0259234 RS000076

Ras-related protein interacting with calmodulin FBgn0265605 RS000452

Trimeric intracellular cation channel FBgn0030745 RS011850

Calmodulin-binding protein related to a Rab3 GDP/GTP exchange protein FBgn0025864 RS014554

Aquaporin family

aquaporin FBgn0015872 RS000720

aquaporin FBgn0000180 RS005584, RS005585, RS005586

aquaporin FBgn0033807 RS011988

aquaporin FBgn0034883 RS013221

aquaporin FBgn0034884 -

aquaporin FBgn0034885 -

aquaporin FBgn0034882 -

aquaporin FBgn0033635 RS000719


Table S27. Cytochrome P450 (CYP), glutathione S-transferase (GST) and carboxylesterase (CCE) genes in Reticulitermes speratus, Zootermopsis nevadensis and Macrotermes natalensis.

CYP

R. speratus: RS000202, RS000210, RS000211, RS000212, RS000444, RS000445, RS000465, RS000466, RS000467, RS000812, RS000815, RS000859, RS000980, RS000981, RS000982, RS000983, RS000984, RS000985, RS000986, RS000987, RS001122, RS001799, RS001847, RS002862, RS002863, RS002871, RS002872, RS002910, RS003017, RS003149, RS003150, RS003151, RS003152, RS003153, RS003154, RS003155, RS003156, RS003157, RS003158, RS003160, RS003206, RS003207, RS003672, RS003688, RS003709, RS003788, RS004073, RS004422, RS004428, RS004445, RS004446, RS004448, RS004449, RS004450, RS005356, RS005658, RS005672, RS006327, RS007757, RS008449, RS008850, RS008851, RS008852, RS009217, RS010162, RS010163, RS010164, RS010165, RS010451, RS010514, RS010600, RS010601, RS010602, RS010608, RS010609, RS010610, RS010611, RS010975, RS011306, RS011875, RS011957, RS012246, RS012921, RS013159, RS013250, RS013635, RS013647, RS013663, RS013664, RS013665, RS013784, RS013787, RS013788, RS013835, RS014290, RS014291, RS014292, RS014413, RS014414, RS014484, RS014485, RS014624, RS014933, RS015215, RS015219, RS015221

Z. nevadensis: Znev_00012, Znev_00957, Znev_00958, Znev_01139, Znev_01838, Znev_01867, Znev_01868, Znev_02456, Znev_02808, Znev_03004, Znev_03222, Znev_04232, Znev_04417, Znev_04827, Znev_04985, Znev_05339, Znev_05340, Znev_05390, Znev_05391, Znev_05398, Znev_06057, Znev_06128, Znev_06541, Znev_06629, Znev_07037, Znev_08570, Znev_08701, Znev_08930, Znev_09012, Znev_09132, Znev_09277, Znev_09478, Znev_09480, Znev_09481, Znev_11665, Znev_12901, Znev_12912, Znev_13255, Znev_13889, Znev_13890, Znev_13891, Znev_13892, Znev_13893, Znev_14063, Znev_14143, Znev_14286, Znev_14287, Znev_14299, Znev_14300, Znev_14301, Znev_14302, Znev_14590, Znev_14632, Znev_14659, Znev_14677, Znev_14802, Znev_14833, Znev_15638, Znev_15869, Znev_15870, Znev_16120, Znev_16125, Znev_16153, Znev_16218, Znev_16223, Znev_16398, Znev_16438, Znev_16439, Znev_16771, Znev_17183, Znev_18486, Znev_18620, Znev_18647

M. natalensis: MN000227, MN000228, MN000347, MN001136, MN001137, MN001432, MN002329, MN002525, MN002815, MN002965, MN003123, MN003207, MN003446, MN003447, MN003965, MN004138, MN004139, MN004141, MN004142, MN004409, MN004466, MN004803, MN004999, MN005205, MN005352, MN005353, MN005542, MN005559, MN005560, MN005562, MN005695, MN005696, MN005697, MN005770, MN005961, MN006039, MN006125, MN006126, MN006243, MN006414, MN006416, MN006418, MN006598, MN006609, MN006690, MN006691, MN006758, MN007379, MN007380, MN007381, MN007382, MN007383, MN007384, MN007385, MN007386, MN007388, MN007624, MN007639, MN007913, MN007915, MN007918, MN008303, MN008754, MN008793, MN008794, MN008892, MN008893, MN009093, MN009118, MN009119, MN009543, MN009788, MN009789, MN010562, MN010563, MN010564, MN011715, MN011716, MN011746, MN011747, MN011748, MN012022, MN012023

GST

R. speratus: RS000232, RS001168, RS001200, RS003134, RS003137, RS005177, RS005799, RS007488, RS007489, RS007490, RS009342, RS009343, RS009344, RS009865, RS010039, RS011657, RS011756, RS013048, RS014031, RS015157

Z. nevadensis: Znev_00124, Znev_00817, Znev_00818, Znev_04286, Znev_04467, Znev_04468, Znev_04470, Znev_08437, Znev_11969, Znev_12236, Znev_13081, Znev_13082, Znev_14234, Znev_14305, Znev_15569, Znev_15570, Znev_15795, Znev_15978

M. natalensis: MN000171, MN000172, MN000173, MN000543, MN000546, MN001107, MN001778, MN002944, MN004973, MN005428, MN005538, MN006379, MN006654, MN009351, MN010121, MN010281, MN011310

CCE

R. speratus: RS001422, RS001423, RS001960, RS001961, RS001964, RS001965, RS001966, RS001967, RS002191, RS003673, RS003894, RS003895, RS003910, RS004129, RS004711, RS004712, RS004713, RS006008, RS006009, RS006010, RS006537, RS006938, RS007152, RS007153, RS007154, RS007155, RS007631, RS007632, RS007668, RS007829, RS009273, RS009432, RS010553, RS011141, RS011142, RS011143, RS011477, RS011642, RS012006, RS013406, RS013479, RS014537, RS014538

Z. nevadensis: Znev_00356, Znev_00985, Znev_01464, Znev_01490, Znev_01491, Znev_01695, Znev_01720, Znev_01734, Znev_01735, Znev_01778, Znev_01779, Znev_01844, Znev_02218, Znev_02219, Znev_02223, Znev_02224, Znev_02226, Znev_02227, Znev_02228, Znev_05318, Znev_07005, Znev_07007, Znev_07097, Znev_10214, Znev_12048, Znev_12702, Znev_12703, Znev_12704, Znev_12774, Znev_12960, Znev_13209, Znev_14317, Znev_14604, Znev_14951, Znev_15823, Znev_15907, Znev_17070, Znev_17401, Znev_17972, Znev_18481, Znev_18523

M. natalensis: MN000642, MN001220, MN001683, MN002124, MN002125, MN003389, MN003390, MN004198, MN005717, MN005718, MN006035, MN006244, MN006980, MN007226, MN008137, MN008387, MN010399, MN010706, MN011531, MN011532, MN011533, MN011534, MN011536, MN011537, MN011538, MN011539, MN011540, MN011541, MN011670, MN011672, MN011673, MN011731

Table S28. Gene numbers of cytochrome P450s (CYP), glutathione S-transferases (GST) and carboxylesterases (CCE) in six insect species.

Species                     CYP   GST   CCE
Drosophila melanogaster*    89    39    35
Periplaneta americana       264   55    108
Cryptocercus punctulatus    124   26    66
Zootermopsis nevadensis     75    18    41
Reticulitermes speratus     106   20    43
Macrotermes natalensis      94    17    32

*Gene numbers are taken from Ranson et al. (2002).
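The per-family gene numbers in Table S28 can be cross-checked against the gene IDs listed in Table S27 by simple counting. The sketch below is illustrative only and not part of the original analysis. It copies the GST identifiers from Table S27 into a dictionary named `GST_IDS` and the GST row of Table S28 into `TABLE_S28_GST`; both names, and the helper `count_ids`, are introduced here for the example.

```python
# Illustrative cross-check (not part of the original analysis):
# count the GST gene IDs listed in Table S27 and compare with Table S28.

GST_IDS = {
    "R. speratus": """RS000232, RS001168, RS001200, RS003134, RS003137,
        RS005177, RS005799, RS007488, RS007489, RS007490, RS009342,
        RS009343, RS009344, RS009865, RS010039, RS011657, RS011756,
        RS013048, RS014031, RS015157""",
    "Z. nevadensis": """Znev_00124, Znev_00817, Znev_00818, Znev_04286,
        Znev_04467, Znev_04468, Znev_04470, Znev_08437, Znev_11969,
        Znev_12236, Znev_13081, Znev_13082, Znev_14234, Znev_14305,
        Znev_15569, Znev_15570, Znev_15795, Znev_15978""",
    "M. natalensis": """MN000171, MN000172, MN000173, MN000543, MN000546,
        MN001107, MN001778, MN002944, MN004973, MN005428, MN005538,
        MN006379, MN006654, MN009351, MN010121, MN010281, MN011310""",
}

# GST row of Table S28 for the three termite species.
TABLE_S28_GST = {"R. speratus": 20, "Z. nevadensis": 18, "M. natalensis": 17}


def count_ids(id_list: str) -> int:
    """Count unique, non-empty, comma-separated identifiers."""
    ids = [token.strip() for token in id_list.split(",") if token.strip()]
    return len(set(ids))


counts = {species: count_ids(ids) for species, ids in GST_IDS.items()}
for species, n in counts.items():
    status = "matches" if n == TABLE_S28_GST[species] else "differs from"
    print(f"{species}: {n} GST genes ({status} Table S28)")
```

The same function can be applied to the CYP and CCE lists; any mismatch would point to either a transcription problem in the ID lists or a difference in how the summary counts were compiled.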


Dataset S1 (separate file). Caste-biased genes [Excel] (SI_Data_1_caste-biased_genes.xlsx).

SI References

1. Kato, Y., Kobayashi, K., Watanabe, H. & Iguchi, T. Environmental sex determination in the branchiopod crustacean Daphnia magna: deep conservation of a Doublesex gene in the sex-determining pathway. PLoS Genet. 7, e1001345 (2011).
2. Pomerantz, A. F. & Hoy, M. A. Expression analysis of Drosophila doublesex, transformer-2, intersex, fruitless-like, and vitellogenin homologs in the parahaploid predator Metaseiulus occidentalis (Chelicerata: Acari: Phytoseiidae). Exp. Appl. Acarol. 65, 1–16 (2015).
3. Price, D. C., Egizi, A. & Fonseca, D. M. The ubiquity and ancestry of insect doublesex. Sci. Rep. 5, 13068 (2015).
4. Verhulst, E. C., van de Zande, L. & Beukeboom, L. W. Insect sex determination: it all evolves around transformer. Curr. Opin. Genet. Dev. 20, 376–383 (2010).
5. Geuverink, E. & Beukeboom, L. W. Phylogenetic distribution and evolutionary dynamics of the sex determination genes doublesex and transformer in insects. Sex. Dev. 8, 38–49 (2014).
6. Wexler, J. et al. Hemimetabolous insects elucidate the origin of sexual development via alternative splicing. eLife 8, (2019).
7. Bourguignon, T., Hayashi, Y. & Miura, T. Skewed soldier sex ratio in termites: testing the size-threshold hypothesis. Insectes Soc. 59, 557–563 (2012).
8. Kitade, O., Miyata, H., Hoshi, M. & Hayashi, Y. Sex ratios and caste compositions in field colonies of the termite Reticulitermes speratus in Eastern Japan. Sociobiology 55, 379–386 (2010).
9. Matsuura, K. et al. Identification of a pheromone regulating caste differentiation in termites. Proc. Natl. Acad. Sci. U.S.A. 107, 12963–12968 (2010).
10. Geer, L. Y. et al. The NCBI BioSystems database. Nucleic Acids Res. 38, D492–D496 (2010).
11. Kiuchi, T. et al. A single female-specific piRNA is the primary determiner of sex in the silkworm. Nature 509, 633–636 (2014).
12. Suzuki, M. G., Imanishi, S., Dohmae, N., Asanuma, M. & Matsumoto, S. Identification of a male-specific RNA binding protein that regulates sex-specific splicing of Bmdsx by increasing RNA binding activity of BmPSI. Mol. Cell. Biol. 30, 5776–5786 (2010).
13. Suzuki, M. G. et al. Establishment of a novel in vivo sex-specific splicing assay system to identify a trans-acting factor that negatively regulates splicing of Bombyx mori dsx female exons. Mol. Cell. Biol. 28, 333–343 (2008).
14. Kopp, A. Dmrt genes in the development and evolution of sexual dimorphism. Trends Genet. 28, 175–184 (2012).
15. Erickson, J. W. & Quintero, J. J. Indirect effects of ploidy suggest X chromosome dose, not the X:A ratio, signals sex in Drosophila. PLoS Biol. 5, e332 (2007).
16. Goldberg, A. D., Allis, C. D. & Bernstein, E. Epigenetics: a landscape takes shape. Cell 128, 635–638 (2007).
17. Fagiolini, M., Jensen, C. L. & Champagne, F. A. Epigenetic influences on brain development and plasticity. Curr. Opin. Neurobiol. 19, 207–212 (2009).
18. Stilling, R. M. & Fischer, A. A Drosophila model for the role of epigenetics in brain function and development. Genome Biol. 12, 103 (2011).
19. Jaenisch, R. & Bird, A. Epigenetic regulation of gene expression: how the genome integrates intrinsic and environmental signals. Nat. Genet. 33 Suppl, 245–254 (2003).
20. Bonasio, R. The role of chromatin and epigenetics in the polyphenisms of ant castes. Brief. Funct. Genomics 13, 235–245 (2014).
21. Cridge, A. G., Leask, M. P., Duncan, E. J. & Dearden, P. K. What do studies of insect polyphenisms tell us about nutritionally-triggered epigenomic changes and their consequences? Nutrients 7, 1787–1797 (2015).
22. Ernst, U. R. et al. Epigenetics and locust life phase transitions. J. Exp. Biol. 218, 88–99 (2015).

23. Srinivasan, D. G. & Brisson, J. A. Aphids: a model for polyphenism and epigenetics. Genet. Res. Int. 2012, 1–12 (2012).
24. Spannhoff, A. et al. Histone deacetylase inhibitor activity in royal jelly might facilitate caste switching in bees. EMBO Rep. 12, 238–243 (2011).
25. Simola, D. F. et al. A chromatin link to caste identity in the carpenter ant Camponotus floridanus. Genome Res. 23, 486–496 (2013).
26. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
27. Terrapon, N. et al. Molecular traces of alternative social organization in a termite genome. Nat. Commun. 5, 3636 (2014).
28. Clough, E., Tedeschi, T. & Hazelrigg, T. Epigenetic regulation of oogenesis and germ stem cell maintenance by the Drosophila histone methyltransferase Eggless/dSetDB1. Dev. Biol. 388, 181–191 (2014).
29. Di Stefano, L., Ji, J.-Y., Moon, N.-S., Herr, A. & Dyson, N. Mutation of Drosophila Lsd1 disrupts H3-K4 methylation, resulting in tissue-specific defects during development. Curr. Biol. 17, 808–812 (2007).
30. Wang, X. et al. Histone H3K9 trimethylase Eggless controls germline stem cell maintenance and differentiation. PLoS Genet. 7, e1002426 (2011).
31. Bonasio, R. et al. Genomic comparison of the ants Camponotus floridanus and Harpegnathos saltator. Science 329, 1068–1071 (2010).
32. Matsuura, K. et al. Queen succession through asexual reproduction in termites. Science 323, 1687 (2009).
33. Moczek, A. P. & Snell-Rood, E. C. The basis of bee-ing different: the role of gene silencing in plasticity. Evol. Dev. 10, 511–513 (2008).
34. Glastad, K. M., Hunt, B. G., Yi, S. V. & Goodisman, M. A. D. DNA methylation in insects: on the brink of the epigenomic era. Insect Mol. Biol. 20, 553–565 (2011).
35. Kucharski, R., Maleszka, J., Foret, S. & Maleszka, R. Nutritional control of reproductive status in honeybees via DNA methylation. Science 319, 1827–1830 (2008).
36. Lo, N., Li, B. & Ujvari, B. DNA methylation in the termite Coptotermes lacteus. Insectes Soc. 59, 257–261 (2012).
37. Glastad, K. M., Gokhale, K., Liebig, J. & Goodisman, M. A. The caste- and sex-specific DNA methylome of the termite Zootermopsis nevadensis. Sci. Rep. 6, 1–14 (2016).
38. Bestor, T. H. Activation of mammalian DNA methyltransferase by cleavage of a Zn binding regulatory domain. EMBO J. 11, 2611–2617 (1992).
39. Pradhan, S., Bacolla, A., Wells, R. D. & Roberts, R. J. Recombinant human DNA (cytosine-5) methyltransferase I. Expression, purification, and comparison of de novo and maintenance methylation. J. Biol. Chem. 274, 33002–33010 (1999).
40. Okano, M., Xie, S. & Li, E. Cloning and characterization of a family of novel mammalian DNA (cytosine-5) methyltransferases. Nat. Genet. 19, 219–220 (1998).
41. Okano, M., Bell, D. W., Haber, D. A. & Li, E. DNA methyltransferases Dnmt3a and Dnmt3b are essential for de novo methylation and mammalian development. Cell 99, 247–257 (1999).
42. Tuesta, L. M. & Zhang, Y. Mechanisms of epigenetic memory and addiction. EMBO J. 33, 1091–1103 (2014).
43. Boyes, J. & Bird, A. DNA methylation inhibits transcription indirectly via a methyl-CpG binding protein. Cell 64, 1123–1134 (1991).
44. Roder, K. et al. Transcriptional repression by Drosophila Methyl-CpG-binding proteins. Mol. Cell. Biol. 20, 7401–7409 (2000).
45. Hayashi, Y. et al. Construction and characterization of normalized cDNA libraries by 454 pyrosequencing and estimation of DNA methylation levels in three distantly related termite species. PLoS One 8, e76678 (2013).
46. Mitaka, Y., Tasaki, E., Nozaki, T., Fuchikawa, T., Kobayashi, K. & Matsuura, K. Transcriptomic analysis of epigenetic modification genes in the termite Reticulitermes speratus. Insect Sci. 27, 202–211 (2020).
47. Harrison, M. C. et al. Hemimetabolous genomes reveal molecular basis of termite eusociality. Nat. Ecol. Evol. 2, 557–566 (2018).


48. Li, S. et al. The genomic and functional landscapes of developmental plasticity in the American cockroach. Nat. Commun. 9, 1–11 (2018).
49. Poulsen, M. et al. Complementary symbiont contributions to plant decomposition in a fungus-farming termite. Proc. Natl. Acad. Sci. U.S.A. 111, 14500–14505 (2014).
50. Itakura, S., Yoshikawa, Y., Togami, Y. & Umezawa, K. Draft genome sequence of the termite, Coptotermes formosanus: Genetic insights into the pyruvate dehydrogenase complex of the termite. J. Asia Pac. Entomol. 23, 666–674 (2020).
51. Slater, G. S. C. & Birney, E. Automated generation of heuristics for biological sequence comparison. BMC Bioinform. 6, 31 (2005).
52. Clyne, P. J. et al. A novel family of divergent seven-transmembrane proteins. Neuron 22, 327–338 (1999).
53. Clyne, P. J. Candidate taste receptors in Drosophila. Science 287, 1830–1834 (2000).
54. Benton, R., Vannice, K. S., Gomez-Diaz, C. & Vosshall, L. B. Variant ionotropic glutamate receptors as chemosensory receptors in Drosophila. Cell 136, 149–162 (2009).
55. Pelosi, P., Zhou, J.-J., Ban, L. P. & Calvello, M. Soluble proteins in insect chemical communication. Cell. Mol. Life Sci. 63, 1658–1676 (2006).
56. Campanacci, V. et al. Moth chemosensory protein exhibits drastic conformational changes and cooperativity on ligand binding. Proc. Natl. Acad. Sci. U.S.A. 100, 5069–5074 (2003).
57. Vogt, R. G. & Riddiford, L. M. Pheromone binding and inactivation by moth antennae. Nature 293, 161–163 (1981).
58. Benton, R., Vannice, K. S. & Vosshall, L. B. An essential role for a CD36-related receptor in pheromone detection in Drosophila. Nature 450, 289–293 (2007).
59. Zhou, X. et al. Chemoreceptor evolution in Hymenoptera and its implications for the evolution of eusociality. Genome Biol. Evol. 7, 2407–2416 (2015).
60. Calvello, M. et al. Expression of odorant-binding proteins and chemosensory proteins in some Hymenoptera. Insect Biochem. Mol. Biol. 35, 297–307 (2005).
61. Forêt, S., Wanner, K. W. & Maleszka, R. Chemosensory proteins in the honey bee: Insights from the annotated genome, comparative analyses and expressional profiling. Insect Biochem. Mol. Biol. 37, 19–28 (2007).
62. González, D. et al. The major antennal chemosensory protein of red imported fire ant workers. Insect Mol. Biol. 18, 395–404 (2009).
63. Hojo, M. K. et al. Antennal RNA-sequencing analysis reveals evolutionary aspects of chemosensory proteins in the carpenter ant, Camponotus japonicus. Sci. Rep. 5, 13541 (2015).
64. Leal, W. S. & Ishida, Y. GP-9s are ubiquitous proteins unlikely involved in olfactory mediation of social organization in the red imported fire ant, Solenopsis invicta. PLoS One 3, e3762 (2008).
65. McKenzie, S. K., Oxley, P. R. & Kronauer, D. J. C. Comparative genomics and transcriptomics in ants provide new insights into the evolution and function of odorant binding and chemosensory proteins. BMC Genom. 15, 718 (2014).
66. Croset, V. et al. Ancient protostome origin of chemosensory ionotropic glutamate receptors and the evolution of insect taste and olfaction. PLoS Genet. 6, e1001064 (2010).
67. Kirkness, E. F. et al. Genome sequences of the human body louse and its primary endosymbiont provide insights into the permanent parasitic lifestyle. Proc. Natl. Acad. Sci. U.S.A. 107, 12168–12173 (2010).
68. Nichols, Z. & Vogt, R. G. The SNMP/CD36 gene family in Diptera, Hymenoptera and Coleoptera: Drosophila melanogaster, D. pseudoobscura, Anopheles gambiae, Aedes aegypti, Apis mellifera, and Tribolium castaneum. Insect Biochem. Mol. Biol. 38, 398–415 (2008).
69. Robertson, H. M., Warr, C. G. & Carlson, J. R. Molecular evolution of the insect chemoreceptor gene superfamily in Drosophila melanogaster. Proc. Natl. Acad. Sci. U.S.A. 100, 14537–14542 (2003).


70. Robertson, H. M. & Wanner, K. W. The chemoreceptor superfamily in the honey bee, Apis mellifera: expansion of the odorant, but not gustatory, receptor family. Genome Res. 16, 1395–1403 (2006).
71. Smadja, C., Shi, P., Butlin, R. K. & Robertson, H. M. Large gene family expansions and adaptive evolution for odorant and gustatory receptors in the pea aphid, Acyrthosiphon pisum. Mol. Biol. Evol. 26, 2073–2086 (2009).
72. Vogt, R. G. et al. The insect SNMP gene family. Insect Biochem. Mol. Biol. 39, 448–456 (2009).
73. Katoh, K., Kuma, K., Toh, H. & Miyata, T. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33, 511–518 (2005).
74. Capella-Gutierrez, S., Silla-Martinez, J. M. & Gabaldon, T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).
75. Stamatakis, A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688–2690 (2006).
76. Kulmuni, J., Wurm, Y. & Pamilo, P. Comparative genomics of chemosensory protein genes reveals rapid evolution and positive selection in ant-specific duplicates. Heredity 110, 538–547 (2013).
77. Blenau, W. & Baumann, A. Molecular and pharmacological properties of insect biogenic amine receptors: lessons from Drosophila melanogaster and Apis mellifera. Arch. Insect Biochem. Physiol. 48, 13–38 (2001).
78. Mirabeau, O. & Joly, J.-S. Molecular evolution of peptidergic signaling systems in bilaterians. Proc. Natl. Acad. Sci. U.S.A. 110, E2028–E2037 (2013).
79. Liang, Z. S. et al. Molecular determinants of scouting behavior in honey bees. Science 335, 1225–1228 (2012).
80. Predel, R., Neupert, S., Russell, W. K., Scheibner, O. & Nachman, R. J. Corazonin in insects. Peptides 28, 3–10 (2007).
81. Veenstra, J. A. The contribution of the genomes of a termite and a locust to our understanding of insect neuropeptides and neurohormones. Front. Physiol. 5, 454 (2014).
82. Sasaki, K. & Harano, K. Multiple regulatory roles of dopamine in behavior and reproduction of social insects. Trends Entomol. 6, 1–13.
83. Miyatake, T. et al. Pleiotropic antipredator strategies, fleeing and feigning death, correlated with dopamine levels in Tribolium castaneum. Anim. Behav. 75, 113–121 (2008).
84. Okada, Y. et al. Social dominance and reproductive differentiation mediated by dopaminergic signaling in a queenless ant. J. Exp. Biol. 218, 1091–1098 (2015).
85. Wada-Katsumata, A., Yamaoka, R. & Aonuma, H. Social interactions influence dopamine and octopamine homeostasis in the brain of the ant Formica japonica. J. Exp. Biol. 214, 1707–1713 (2011).
86. Han, Q., Robinson, H., Ding, H., Christensen, B. M. & Li, J. Evolution of insect arylalkylamine N-acetyltransferases: structural evidence from the yellow fever mosquito, Aedes aegypti. Proc. Natl. Acad. Sci. U.S.A. 109, 11669–11674 (2012).
87. Ishikawa, Y., Aonuma, H., Sasaki, K. & Miura, T. Tyraminergic and octopaminergic modulation of defensive behavior in termite soldier. PLoS One 11, e0154230 (2016).
88. Yaguchi, H., Inoue, T., Sasaki, K. & Maekawa, K. Dopamine regulates termite soldier differentiation through trophallactic behaviours. R. Soc. Open Sci. 3, 150574 (2016).
89. Vergoz, V., Lim, J. & Oldroyd, B. P. Biogenic amine receptor gene expression in the ovarian tissue of the honey bee Apis mellifera. Insect Mol. Biol. 21, 21–29 (2012).
90. Zhou, C., Rao, Y. & Rao, Y. A subset of octopaminergic neurons are important for Drosophila aggression. Nat. Neurosci. 11, 1059–1067 (2008).
91. Koon, A. C. et al. Autoregulatory and paracrine control of synaptic and behavioral plasticity by octopaminergic signaling. Nat. Neurosci. 14, 190–199 (2011).
92. Schoofs, L., De Loof, A. & Van Hiel, M. B. Neuropeptides as regulators of behavior in insects. Annu. Rev. Entomol. 62, 35–52 (2017).
93. Korb, J. Juvenile Hormone. Adv. Insect Physiol. 48, 131–161 (2015).

94. Miura, T. Juvenile hormone as a physiological regulator mediating phenotypic plasticity in pancrustaceans. Dev. Growth Differ. 61, 85–96 (2019).
95. Cornette, R., Koshikawa, S. & Miura, T. Histology of the hormone-producing glands in the damp-wood termite Hodotermopsis sjostedti (Isoptera, Termopsidae): A focus on soldier differentiation. Insectes Soc. 55, 407–416 (2008).
96. Miura, T. & Maekawa, K. The making of the defensive caste: Physiology, development, and evolution of the soldier differentiation in termites. Evol. Dev. 22, 425–437 (2020).
97. Zhou, X., Oi, F. M. & Scharf, M. E. Social exploitation of hexamerin: RNAi reveals a major caste-regulatory factor in termites. Proc. Natl. Acad. Sci. U.S.A. 103, 4499–4504 (2006).
98. Yaguchi, H., Masuoka, Y., Inoue, T. & Maekawa, K. Expressions of juvenile hormone biosynthetic genes during presoldier differentiation in the incipient colony of Zootermopsis nevadensis (Isoptera: Archotermopsidae). Appl. Entomol. Zool. 50, 497–508 (2015).
99. Masuoka, Y., Yaguchi, H., Suzuki, R. & Maekawa, K. Knockdown of the juvenile hormone receptor gene inhibits soldier-specific morphogenesis in the damp-wood termite Zootermopsis nevadensis (Isoptera: Archotermopsidae). Insect Biochem. Mol. Biol. 64, 25–31 (2015).
100. Saiki, R., Gotoh, H., Toga, K., Miura, T. & Maekawa, K. High juvenile hormone titre and abdominal activation of JH signalling may induce reproduction of termite neotenics: JH titre and signalling pathways in termites. Insect Mol. Biol. 24, 432–441 (2015).
101. Jongepier, E. et al. Remodeling of the juvenile hormone pathway through caste-biased gene expression and positive selection along a gradient of termite eusociality. J. Exp. Zool. B. 330, 296–304 (2018).
102. Bourguignon, T. et al. The evolutionary history of termites as inferred from 66 mitochondrial genomes. Mol. Biol. Evol. 32, 406–421 (2015).
103. Bucek, A. et al. Evolution of termite symbiosis informed by transcriptome-based phylogenies. Curr. Biol. 29, 3728–3734.e4 (2019).
104. Inward, D., Beccaloni, G. & Eggleton, P. Death of an order: a comprehensive molecular phylogenetic study confirms that termites are eusocial cockroaches. Biol. Lett. 3, 331–335 (2007).
105. Villalobos-Sambucaro, M. J., Riccillo, F. L., Calderón-Fernández, G. M., Sterkel, M., Diambra, L. A. & Ronderos, J. R. Genomic and functional characterization of a methoprene-tolerant gene in the kissing-bug Rhodnius prolixus. Gen. Comp. Endocrinol. 216, 1–8 (2015).
106. Hojo, M., Toga, K., Watanabe, D., Yamamoto, T. & Maekawa, K. High-level expression of the Geranylgeranyl diphosphate synthase gene in the frontal gland of soldiers in Reticulitermes speratus (Isoptera: Rhinotermitidae). Arch. Insect Biochem. Physiol. 77, 17–31 (2011).
107. Marchal, E. et al. Final steps in juvenile hormone biosynthesis in the desert locust, Schistocerca gregaria. Insect Biochem. Mol. Biol. 41, 219–227 (2011).
108. Maekawa, K., Ishitani, K., Gotoh, H., Cornette, R. & Miura, T. Juvenile Hormone titre and vitellogenin gene expression related to ovarian development in primary reproductives compared with nymphs and nymphoid reproductives of the termite Reticulitermes speratus. Physiol. Entomol. 35, 52–58 (2010).
109. Abdel-latief, M. & Hoffmann, K. H. Functional activity of allatotropin and allatostatin in the pupal stage of a holometablous insect, Tribolium castaneum (Coleoptera, Tenebrionidae). Peptides 53, 172–184 (2014).
110. Evans, J. D. & Wheeler, D. E. Expression profiles during honeybee caste determination. Genome Biol. 2, RESEARCH0001 (2001).
111. Kumar, S., Stecher, G. & Tamura, K. MEGA7: Molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol. Biol. Evol. 33, 1870–1874 (2016).
112. Grieneisen, M., Mok, A., Kieckbusch, T. D. & Schooley, D. A. The specificity of juvenile hormone esterase revisited. Insect Biochem. Mol. Biol. 27, 365–376 (1997).
113. Goodman, W. G. & Cusson, M. The Juvenile Hormones. in Insect Endocrinology (ed. Gilbert, L. I.) 310–365 (Elsevier, 2012).
114. Nijhout, H. F. Insect hormones. (Princeton University Press, 1994).

115. Niwa, R. & Niwa, Y. S. Enzymes for ecdysteroid biosynthesis: their biological functions in insects and beyond. Biosci. Biotechnol. Biochem. 78, 1283–1292 (2014).
116. Masuoka, Y., Miyazaki, S., Saiki, R., Tsuchida, T. & Maekawa, K. High Laccase2 expression is likely involved in the formation of specific cuticular structures during soldier differentiation of the termite Reticulitermes speratus. Arthropod Struct. Dev. 42, 469–475 (2013).
117. Masuoka, Y. & Maekawa, K. Gene expression changes in the tyrosine metabolic pathway regulate caste-specific cuticular pigmentation of termites. Insect Biochem. Mol. Biol. 74, 21–31 (2016).
118. Masuoka, Y., Yaguchi, H., Toga, K., Shigenobu, S. & Maekawa, K. TGFβ signaling related genes are involved in hormonal mediation during termite soldier differentiation. PLoS Genet. 14, e1007338 (2018).
119. Dumser, J. B. In vitro effects of ecdysterone on the spermatogonial cell cycle in Locusta. Int. J. Invertebr. Reprod. 2, 165–174 (1980).
120. Takeda, N. Effect of ecdysterone on spermatogenesis in the diapausing slug moth pharate pupa, . J. Insect Physiol. 18, 571–580 (1972).
121. Yamazaki, Y. et al. Differential expression of HR38 in the mushroom bodies of the honeybee brain depends on the caste and division of labor. FEBS Lett. 580, 2667–2670 (2006).
122. Belles, X. Krüppel homolog 1 and E93: The doorkeeper and the key to insect metamorphosis. Arch. Insect Biochem. Physiol. 103, e21609 (2020).
123. Belles, X. & Santos, C. G. The MEKRE93 (Methoprene tolerant-Krüppel homolog 1-E93) pathway in the regulation of insect metamorphosis, and the homology of the pupal stage. Insect Biochem. Mol. Biol. 52, 60–68 (2014).
124. Ureña, E., Manjón, C., Franch-Marro, X. & Martín, D. Transcription factor E93 specifies adult metamorphosis in hemimetabolous and holometabolous insects. Proc. Natl. Acad. Sci. U.S.A. 111, 7024–7029 (2014).
125. Wu, Q. & Brown, M. R. Signaling and function of insulin-like peptides in insects. Annu. Rev. Entomol. 51, 1–24 (2006).
126. Corona, M. et al. Vitellogenin, juvenile hormone, insulin signaling, and queen honey bee longevity. Proc. Natl. Acad. Sci. U.S.A. 104, 7128–7133 (2007).
127. Wolschin, F., Mutti, N. S. & Amdam, G. V. Insulin receptor substrate influences female caste development in honeybees. Biol. Lett. 7, 112–115 (2011).
128. Hattori, A. et al. Soldier morphogenesis in the damp-wood termite is regulated by the insulin signaling pathway. J. Exp. Zool. B. 320, 295–306 (2013).
129. Teleman, A. A. Molecular mechanisms of metabolic regulation by insulin in Drosophila. Biochem. J. 425, 13–26 (2010).
130. Xu, H. J. & Zhang, C. X. Insulin receptors and wing dimorphism in rice planthoppers. Philos. Trans. R. Soc. Lond. B. 372, 20150489 (2017).
131. Honegger, B. et al. Imp-L2, a putative homolog of vertebrate IGF-binding protein 7, counteracts insulin signaling in Drosophila and is essential for starvation resistance. J. Biol. 7, 10 (2008).
132. Deligne, J., Quennedy, A. & Blum, M. S. The enemies and defense mechanisms of termites. in Social insects, vol. 2 (ed. Hermann, H. R.) 1–76 (Academic Press, 1981).
133. Watanabe, D., Gotoh, H., Miura, T. & Maekawa, K. Social interactions affecting caste development through physiological actions in termites. Front. Physiol. 5, 127 (2014).
134. Carroll, S. B., Grenier, J. K. & Weatherbee, S. D. From DNA to Diversity: Molecular Genetics and the Evolution of Animal Design 2nd edition (Blackwell Science, 2005).
135. Hartfelder, K. & Emlen, D. J. Endocrine control of insect polyphenism. in Comprehensive Mol. Insect Sci. 3, 651–703 (2005).
136. Xu, H.-J. et al. Two insulin receptors determine alternative wing morphs in planthoppers. Nature 519, 464–467 (2015).
137. Abouheif, E. Evolution of the gene network underlying wing polyphenism in ants. Science 297, 249–252 (2002).

138. Brisson, J. A., Ishikawa, A. & Miura, T. Wing development genes of the pea aphid and differential gene expression between winged and unwinged morphs: Wing development genes in the pea aphid. Insect Mol. Biol. 19, 63–73 (2010).
139. Nii, R., Oguchi, K., Shinji, J., Koshikawa, S. & Miura, T. Reduction of a nymphal instar in a dampwood termite: heterochronic shift in the caste differentiation pathways. EvoDevo 10, 10 (2019).
140. Gao, Q. & Thompson, G. J. Social context affects immune gene expression in a subterranean termite. Insectes Soc. 62, 167–170 (2015).
141. Waterhouse, R. M. et al. Evolutionary dynamics of immune-related genes and pathways in disease-vector mosquitoes. Science 316, 1738–1743 (2007).
142. Mitaka, Y., Kobayashi, K. & Matsuura, K. Caste-, sex-, and age-dependent expression of immune-related genes in a Japanese subterranean termite, Reticulitermes speratus. PLoS One 12, e0175417 (2017).
143. Tsunoda, K. & Yoshimura, T. Current termite management in Japan. Proceedings 1st Conference of Pacific Rim Termite Research Group, March 8–9, 2004, Penang, Malaysia, pp 1–5 (2004).
144. Allenza, P. & Eldridge, R. High-throughput screening and insect genomics for new insecticide leads. in Insecticides Design Using Advanced Technologies (ed. Ishaaya, I., Horowitz, A. R. & Nauen, R.) 67–86 (Springer, 2007).
145. Ishaaya, I., Palli, S. R. & Horowitz, A. R. Advanced technologies for managing insect pests (Springer, 2012).
146. Feyereisen, R. Insect P450 enzymes. Annu. Rev. Entomol. 44, 507–33 (1999).
147. Scott, J. G. Cytochromes P450 and insecticide resistance. Insect Biochem. Mol. Biol. 29, 757–777 (1999).
148. Ranson, H. et al. Evolution of supergene families associated with insecticide resistance. Science 298, 179–181 (2002).
149. Finn, R. D., Clements, J. & Eddy, S. R. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 39, W29–W37 (2011).
150. Grabherr, M. G. et al. Trinity: reconstructing a full-length transcriptome without a genome from RNA-Seq data. Nat. Biotechnol. 29, 644 (2011).
151. Blankenburg, S. et al. Cockroach GABAB receptor subtypes: molecular characterization, pharmacological properties and tissue distribution. Neuropharmacology 88, 134–144 (2015).
152. Hayashi, Y., Maekawa, K., Nalepa, C. A., Miura, T. & Shigenobu, S. Transcriptome sequencing and estimation of DNA methylation level in the subsocial wood-feeding cockroach Cryptocercus punctulatus (Blattodea: Cryptocercidae). Appl. Entomol. Zool. 52, 643–651 (2017).
153. Darriba, D., Taboada, G. L., Doallo, R. & Posada, D. ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics 27, 1164–1165 (2011).
154. Matsunami, M., Nozawa, M., Suzuki, R., Toga, K., Masuoka, Y., Yamaguchi, K., Maekawa, K., Shigenobu, S. & Miura, T. Caste-specific microRNA expression in termites: insights into soldier differentiation. Insect Mol. Biol. 28, 86–98 (2019).
155. Zimet, M. Sexual dimorphism in the immature stages of the termite, Reticulitermes flavipes (Isoptera: Rhinotermitidae). Sociobiology 7, 1–7 (1982).
156. Hayashi, Y., Kitade, O. & Kojima, J.-I. Parthenogenetic reproduction in neotenics of the subterranean termite Reticulitermes speratus (Isoptera: Rhinotermitidae). Entomol. Sci. 6, 253–257 (2003).
157. Weesner, F. M. "External anatomy" in Biology of Termites, Volume 1, K. Krishna, F. M. Weesner, Eds. (Academic Press, 1969), pp. 19–47.
158. Vargo, E. L. Polymorphism at trinucleotide microsatellite loci in the subterranean termite Reticulitermes flavipes. Mol. Ecol. 9, 817–820 (2000).
159. Hayashi, Y., Kitade, O. & Kojima, J.-I. Microsatellite loci in the Japanese subterranean termite, Reticulitermes speratus. Mol. Ecol. Notes 2, 518–520 (2002).
160. Gnerre, S. et al. High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc. Natl. Acad. Sci. U.S.A. 108, 1513–1518 (2011).

161. Seppey, M., Manni, M. & Zdobnov, E. M. BUSCO: Assessing genome assembly and annotation completeness. Methods Mol. Biol. 1962, 227–245 (2019).
162. Skinner, M. E., Uzilov, A. V., Stein, L. D., Mungall, C. J. & Holmes, I. H. JBrowse: a next-generation genome browser. Genome Res. 19, 1630–1638 (2009).
163. Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008).
164. Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29, 644–652 (2011).
165. Haas, B. J. et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 8, 1494–1512 (2013).
166. Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006).
167. Slater, G. S. C. & Birney, E. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6, 31 (2005).
168. Blankenburg, S. et al. Cockroach GABAB receptor subtypes: molecular characterization, pharmacological properties and tissue distribution. Neuropharmacology 88, 134–144 (2015).
169. Hojo, M., Shigenobu, S., Maekawa, K., Miura, T. & Tokuda, G. Duplication and soldier-specific expression of geranylgeranyl diphosphate synthase genes in a nasute termite Nasutitermes takasagoensis. Insect Biochem. Mol. Biol. 111, 103177 (2019).
170. Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 34, W435–9 (2006).
171. Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).
172. Kanehisa, M. et al. KEGG for linking genomes to life and the environment. Nucleic Acids Res. 36, D480–4 (2008).
173. Kanehisa, M., Sato, Y. & Morishima, K. BlastKOALA and GhostKOALA: KEGG tools for functional characterization of genome and metagenome sequences. J. Mol. Biol. 428, 726–731 (2016).
174. Conesa, A. et al. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21, 3674–3676 (2005).
175. Remm, M., Storm, C. E. & Sonnhammer, E. L. Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J. Mol. Biol. 314, 1041–1052 (2001).
176. Alexeyenko, A., Tamas, I., Liu, G. & Sonnhammer, E. L. L. Automatic clustering of orthologs and inparalogs shared by multiple proteomes. Bioinformatics 22, e9–15 (2006).
177. Kriventseva, E. V. et al. OrthoDB v8: update of the hierarchical catalog of orthologs and the underlying free software. Nucleic Acids Res. 43, D250–6 (2015).
178. Takematsu, Y. Biometrical study on the development of the castes in Reticulitermes speratus (Isoptera, Rhinotermitidae). Jap. J. Entomol. 60, 67–76 (1992).
179. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
180. Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36 (2013).
181. Liao, Y., Smyth, G. K. & Shi, W. The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote. Nucleic Acids Res. 41, e108 (2013).
182. Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
183. Miura, F., Enomoto, Y., Dairiki, R. & Ito, T. Amplification-free whole-genome bisulfite sequencing by post-bisulfite adaptor tagging. Nucleic Acids Res. 40, e136–e136 (2012).
184. Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17, 10–12 (2011).
185. Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27, 1571–1572 (2011).

186. Hansen, K. D., Langmead, B. & Irizarry, R. A. BSmooth: from whole genome bisulfite sequencing reads to differentially methylated regions. Genome Biol. 13, R83 (2012).
187. El-Gebali, S. et al. The Pfam protein families database in 2019. Nucleic Acids Res. 47, D427–D432 (2019).
188. Eddy, S. R. Accelerated Profile HMM Searches. PLoS Comput. Biol. 7, e1002195 (2011).
189. Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
190. Darriba, D., Taboada, G. L., Doallo, R. & Posada, D. ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics 27, 1164–1165 (2011).
191. Sullivan, J. & Joyce, P. Model Selection in Phylogenetics. Annu. Rev. Ecol. Evol. Syst. 36, 445–466 (2005).
192. Yin, Y. et al. dbCAN: a web resource for automated carbohydrate-active enzyme annotation. Nucleic Acids Res. 40, W445–51 (2012).
193. Tamura, K., Stecher, G., Peterson, D., Filipski, A. & Kumar, S. MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol. Biol. Evol. 30, 2725–2729 (2013).
194. Waterhouse, R. M. et al. Evolutionary dynamics of immune-related genes and pathways in disease-vector mosquitoes. Science 316, 1738–1743 (2007).
195. Van Herreweghe, J. M. & Michiels, C. W. Invertebrate lysozymes: diversity and distribution, molecular mechanism and in vivo function. J. Biosci. 37, 327–348 (2012).
196. Finn, R. D. et al. Pfam: the protein families database. Nucleic Acids Res. 42, D222–30 (2014).
197. Aken, B. L. et al. Ensembl 2017. Nucleic Acids Res. 45, D635–D642 (2017).
198. Giraldo-Calderón, G. I. et al. VectorBase: an updated bioinformatics resource for invertebrate vectors and other organisms related with human diseases. Nucleic Acids Res. 43, D707–13 (2015).
199. Elsik, C. G. et al. Hymenoptera Genome Database: integrating genome annotations in HymenopteraMine. Nucleic Acids Res. 44, D793–800 (2016).
200. Legeai, F. et al. AphidBase: a centralized bioinformatic resource for annotation of the pea aphid genome. Insect Mol. Biol. 19 Suppl 2, 5–12 (2010).
201. Colbourne, J. K., Singan, V. R. & Gilbert, D. G. wFleaBase: the Daphnia genome database. BMC Bioinformatics 6, 45 (2005).
202. Attrill, H. et al. FlyBase: establishing a gene group resource for Drosophila melanogaster. Nucleic Acids Res. 44, D786–92 (2016).
203. Suyama, M., Torrents, D. & Bork, P. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 34, W609–12 (2006).
204. Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).
205. Kumar, S., Stecher, G. & Tamura, K. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol. Biol. Evol. 33, 1870–1874 (2016).
206. Löytynoja, A. & Goldman, N. An algorithm for progressive multiple alignment of sequences with insertions. Proc. Natl. Acad. Sci. U.S.A. 102, 10557–10562 (2005).
207. Talavera, G. & Castresana, J. Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst. Biol. 56, 564–577 (2007).
208. Yang, Z. PAML 4: Phylogenetic Analysis by Maximum Likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).
209. Petersen, T. N., Brunak, S., von Heijne, G. & Nielsen, H. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat. Methods 8, 785–786 (2011).
210. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
211. Wang, D., Zhang, Y., Zhang, Z., Zhu, J. & Yu, J. KaKs_Calculator 2.0: a toolkit incorporating gamma-series methods and sliding window strategies. Genom. Proteom. Bioinf. 8, 77–80 (2010).
212. Rice, P., Longden, I. & Bleasby, A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 16, 276–277 (2000).

213. Untergasser, A. et al. Primer3—new capabilities and interfaces. Nucleic Acids Res. 40, e115–e115 (2012).
