Comparative genomics and divergence time estimation of the anaerobic fungi in herbivorous

Yan Wang1,2,*, Noha Youssef3, M.B. Couger4, Radwa Hanafy3, Mostafa Elshahed3, and Jason E. Stajich1,2,*

1 Department of Microbiology and Plant Pathology, University of California, Riverside, Riverside, California, USA
2 Institute for Integrative Genome Biology, University of California, Riverside, Riverside, California, USA
3 Department of Microbiology and Molecular Genetics, Oklahoma State University, Stillwater, Oklahoma, USA
4 High Performance Computing Center, Oklahoma State University, Stillwater, Oklahoma, USA

*Corresponding authors. Yan Wang, [email protected], and Jason E. Stajich, [email protected].

This Supplementary Materials PDF file includes:
• Supplementary Figures 1-3
• Supplementary Tables 1-2

Supplementary Figures

[Figure: maximum likelihood tree. Tips cover the Neocallimastigomycota genera Pecoramyces, Orpinomyces, Feramyces, Neocallimastix, Anaeromyces, Piromyces, and Caecomyces, plus five outgroup taxa (Chytriomyces sp. MP71, Rhizoclosmatium globosum, Entophlyctis helioformis JEL805, Gonapodya prolifera JEL478, Gaertneriomyces semiglobifer Barr 43). Scale bar: 0.3.]

Supp. Fig. 1. Maximum likelihood phylogenetic tree of Neocallimastigomycota using Chytridiomycota as the outgroup. All bootstrap values (out of 100) are labeled on the branches.

[Figure: presence/absence heatmap of gene families. Taxon groups: Neocallimastigomycota (Anaeromyces, Piromyces, Pecoramyces, Neocallimastix, Orpinomyces, Caecomyces, Feramyces) and Chytridiomycota. Legend: Presence, Absence.]

Supp. Fig. 2. Presence (dark gray) and absence (light gray) of the homologous gene families across the genomes (and transcriptomes) of Neocallimastigomycota and Chytridiomycota. The 4,824 gene families shown include the universal homologs, defined as families present in at least 22 of the 27 Neocallimastigomycota genomes (and transcriptomes) and missing from no more than 1 of the 5 included Chytridiomycota genomes. The figure also includes the unique gene families that are strictly absent from all Chytridiomycota but encoded by the Neocallimastigomycota (missing from no more than 5 of the 27 taxa).
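The two selection rules in this caption can be sketched as a filter over a presence table. This is a minimal illustration under assumptions, not the authors' pipeline: the function name `select_families`, the input layout (a dict mapping each family to the set of taxa encoding it), and the placeholder taxon names are all hypothetical. Only the thresholds (22 of 27, 1 of 5, 5 of 27) come from the caption.

```python
def select_families(presence, agf_taxa, chytrid_taxa,
                    min_agf_present=22, max_chytrid_missing=1,
                    max_agf_missing_unique=5):
    """Split gene families into 'universal' and 'AGF-unique' sets.

    presence: dict mapping family name -> set of taxa that encode it
    (hypothetical input layout, assumed for illustration).
    """
    universal, agf_unique = [], []
    for family, taxa in presence.items():
        n_agf = len(taxa & agf_taxa)          # AGF taxa encoding the family
        n_chy = len(taxa & chytrid_taxa)      # Chytridiomycota taxa encoding it
        chy_missing = len(chytrid_taxa) - n_chy
        agf_missing = len(agf_taxa) - n_agf
        if n_agf >= min_agf_present and chy_missing <= max_chytrid_missing:
            # Present in most AGF taxa and in nearly all Chytridiomycota
            universal.append(family)
        elif n_chy == 0 and agf_missing <= max_agf_missing_unique:
            # Strictly absent from Chytridiomycota, widespread in AGF
            agf_unique.append(family)
    return universal, agf_unique


# Placeholder taxon names; the real taxa are those shown in the figure.
agf = {f"agf_{i}" for i in range(27)}
chy = {f"chy_{i}" for i in range(5)}
demo = {
    "famA": agf | {"chy_0", "chy_1", "chy_2", "chy_3"},
    "famB": {f"agf_{i}" for i in range(22)},
}
print(select_families(demo, agf, chy))
```

With 27 taxa, "missing from no more than 5" is the same cutoff as "present in at least 22", so both rules require the same minimum coverage within Neocallimastigomycota and differ only in what they require of Chytridiomycota.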

[Figure: phylogenetic tree of the “Rhamnogal_lyase” domain and related domains. Scale bar: 0.4. Clade annotations, in order of appearance: “Rhamnogal_lyase” domain from Plant; “Rhamnogal_lyase” domain from Neocallimastigomycota (Novel); “Rhamnogal_lyase” domain from Bacterial group I; Unspecified domain from Bacteria; “Rhamnogal_lyase” domain from Bacterial group II (with HGT in Insects); Unspecified domain from Bacteria; “RhgB_N” domain from Bacteria; Unspecified domain from Dikarya; Unspecified domain from Dikarya (including Rhamnogalacturonate B & C); “RhgB_N” domain from Dikarya, Oomycetes, and Bacteria (including Rhamnogalacturonate lyase A).]

Supp. Fig. 3. Midpoint-rooted phylogenetic tree of the plant-like “Rhamnogal_lyase” domain encoded by the Neocallimastigomycota (red). Labels are consistent with Figure 3.
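For readers unfamiliar with midpoint rooting, the idea can be sketched on a toy tree: find the two tips that are farthest apart, then place the root halfway along the path between them. The software used for this figure is not stated here, so the code below is only an illustrative sketch. The edge-list input, the function names, and the toy branch lengths are all assumptions.

```python
from collections import defaultdict


def build_graph(edges):
    """Turn (node_a, node_b, branch_length) triples into an adjacency map."""
    graph = defaultdict(dict)
    for a, b, length in edges:
        graph[a][b] = length
        graph[b][a] = length
    return graph


def farthest(graph, start):
    """Return (farthest node, its distance, parent map) from `start`."""
    dist = {start: 0.0}
    parent = {start: None}
    stack = [start]
    while stack:
        node = stack.pop()
        for nbr, length in graph[node].items():
            if nbr not in dist:
                dist[nbr] = dist[node] + length
                parent[nbr] = node
                stack.append(nbr)
    far = max(dist, key=dist.get)
    return far, dist[far], parent


def midpoint_of_tree(edges):
    """Locate the midpoint of the longest tip-to-tip path.

    Returns ((node, next_node), offset, path_length): the root lies
    `offset` along the branch from `node` toward `next_node`.
    """
    graph = build_graph(edges)
    tip_a, _, _ = farthest(graph, next(iter(graph)))
    tip_b, path_length, parent = farthest(graph, tip_a)
    half = path_length / 2.0
    walked, node = 0.0, tip_b
    while parent[node] is not None:       # walk from tip_b back toward tip_a
        nxt = parent[node]
        length = graph[node][nxt]
        if walked + length >= half:
            return (node, nxt), half - walked, path_length
        walked += length
        node = nxt
    raise ValueError("tree needs at least one branch with positive length")


# Toy unrooted tree with hypothetical branch lengths.
toy_edges = [("A", "n1", 1.0), ("B", "n1", 3.0), ("n1", "n2", 2.0),
             ("C", "n2", 4.0), ("D", "n2", 1.0)]
print(midpoint_of_tree(toy_edges))
```

Midpoint rooting needs no outgroup, which suits a tree like this one that mixes plant, fungal, oomycete, and bacterial sequences, but it assumes roughly similar evolutionary rates across lineages.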

Supplementary Tables

Supplementary Table 1. Pfam domains, with annotated functions, that are uniquely maintained or lost in Neocallimastigomycota (AGF) compared with Chytridiomycota.

AGF Unique Domains AGF Lost Domains

Domain Function Domain Function Acyl-coenzyme A:6-aminopenicillanic 2OG- 2OG-Fe(II) AAT acid acyl- FeII_Oxy_3 superfamily Alpha/beta superfamily, hydrolytic of widely differing 2OG- 2OG-Fe(II) oxygenase Abhydrolase_5 phylogenetic origin and catalytic FeII_Oxy_4 superfamily function that share a common fold. Aspartic proteases are a catalytic type of 3-hydroxyanthranilic acid protease enzymes that use an activated dioxygenase, part of the Asp_protease water molecule bound to one or more 3-HAO kynurenine pathway for the aspartate residues for of their degradation of tryptophan and the peptide substrates. biosynthesis of nicotinic acid Acetyl-coenzyme A synthetase N- Asp_protease_2 Aspartic proteases ACAS_N terminus N-linked (asparagine-linked) glycosylation of is mediated by a highly conserved pathway in , in which a This is the catalytic region of aspzincins, (dolichol phosphate)-linked Aspzincin_M35 a group of lysine-specific metallo- Alg6_Alg8 oligosaccharide is assembled at the endopeptidases in the M35 family endoplasmic reticulum membrane prior to the transfer of the oligosaccharide moiety to the target asparagine residues. This belongs to the family of , those acting on carbon- bonds other than peptide bonds, specifically in linear amidines. The systematic name of this enzyme class is allantoate amidinohydrolase. This enzyme participates in metabolism by facilitating the utilization of as secondary This is a carbohydrate binding domain nitrogen sources under nitrogen- Carb_bind which has been shown in Allantoicase limiting conditions. While purine Schizosaccharomyces pombe to be degradation converges to required for septum localisation in all vertebrates, its further degradation varies from species to species. and microorganisms produce and using the uricolytic pathway. Allantoicase performs the second step in this pathway catalyzing the conversion of allantoate into ureidoglycolate and . 
CBAH Choloylglycine hydrolase family ATP_sub_h ATP synthase complex subunit h

5 AGF Unique Domains AGF Lost Domains Carbohydrate-binding module family 10 (CBM10) is found in two distinct sets of proteins with different functions. Those found in aerobic bacteria bind cellulose (or other carbohydrates); but in anaerobic fungi they are binding CBM_10 domains, referred to as dockerin ATP-synt_10 mitochondrial ATPase complex domains. The dockerin domains are subunit ATP1 believed to be responsible for the assembly of a multiprotein cellulase/ hemicellulase complex, similar to the cellulosome found in certain anaerobic bacteria. Carbohydrate-binding module family 25 Atp11p is a molecular chaperone (CBM25) binds alpha- of the mitochondrial matrix that CBM_25 glucooligosaccharides, particularly those ATP11 participates in the biogenesis containing alpha-1,6 linkages, and pathway to form F1, the catalytic granular starch. unit of the ATP synthase This is a mannan-specific carbohydrate binding domain, previously known as the X4 module. Unlike other NADH dehydrogenase CBM_35 carbohydrate binding modules, binding B12D (ubiquinone) to causes a conformational change Carbohydrate-binding module family 6 (CBM6) is unusual in that is contains two substrate-binding sites, cleft A and cleft B. Cellvibrio mixtus endoglucanase 5A contains two CBM6 domains, the CBM6 domain at the C-terminus displays distinct ligand binding CBM_6 specificities in each of the sustrate- Band_7_C C-terminal region of band_7 binding clefts. Both cleft A and cleft B can bind cello-oligosaccharides, laminarin preferentially binds in cleft A, xylooligosaccharides only bind in cleft A and beta1,4,-beta1,3-mixed linked glucans only bind in cleft B. 
Mitochondrial branched-chain CBM-like Polysaccharide lyase family 4, domain BCDHK_Ad alpha-ketoacid dehydrogenase III om3 kinase CBM26 is a carbohydrate-binding CBM26 module that binds starch BCS1_N Mitochondrial chaperone BCS1 CHB_HEX_C_ Chitobiase/beta-hexosaminidase C- Eukaryotic mitochondrial 1 terminal domain Bot1p regulator protein CotH kinase protein, members of this family include the spore coat protein H CotH (cotH). This protein is an atypical Chalcone_2 Chalcone like protein kinase that phosphorylates CotB and CotG

6 AGF Unique Domains AGF Lost Domains Members of this family probably act as chromate transporters. Members of this family are found in both bacteria and Cthe_2159 Carbohydrate-binding domain- Chromate_tr archaebacteria. The proteins are containing protein ansp composed of one or two copies of this region. The alignment contains two conserved motifs, FGG and PGP. Cytokine-induced anti-apoptosis inhibitor 1, Fe-S biogenesis. This family includes FeoA a small Anamorsin, subsequently named protein, probably involved in Fe2+ CIAPIN1 for cytokine-induced transport [1]. This presumed short anti-apoptosis inhibitor 1, in domain is also found at the C-terminus humans is the homologue of yeast FeoA of a variety of metal dependent CIAPIN1 Dre2, a conserved soluble transcriptional regulators. This suggests eukaryotic Fe-S cluster protein, that this domain may be metal-binding. that functions in cytosolic Fe-S In most cases this is likely to be either protein biogenesis. It is found in iron or manganese. both the cytoplasm and in the mitochondrial intermembrane space (IMS) Ferrous iron transport protein B C oxidase biogenesis FeoB_C terminus Cmc1 protein Cmc1 like In molecular biology, the FLYWCH zinc finger is a zinc finger domain. It is found in a number of eukaryotic FLYWCH proteins. FLYWCH is a C2H2-type zinc Coa1 Cytochrome oxidase complex finger characterised by five conserved assembly protein 1 hydrophobic residues, containing the conserved sequence motif: Complex1_L NADH dehydrogenase Gal_Lectin Galactose binding lectin domain YR (ubiquinone) Glycoside hydrolase family 11 CAZY This is a family of proteins GH_11 comprises enzymes with only carrying the LYR motif of family Glyco_hydro_1 one known activity, xylanase (EC Complex1_L Complex1_LYR, PF05347 likely 1 3.2.1.8). These enzymes were formerly YR_2 to be involved in Fe-S cluster known as cellulase family G biogenesis in mitochondria. 
Glycoside hydrolase family 48 CAZY GH_48 comprises enzymes with several Glyco_hydro_4 known activities; endoglucanase (EC COQ7 Ubiquinone biosynthesis protein 8 3.2.1.4); cellobiohydrolase (EC COQ7 3.2.1.91). Glycoside hydrolase family 6 comprises enzymes with several known activities including endoglucanase (EC 3.2.1.4) and cellobiohydrolase (EC 3.2.1.91). These enzymes were formerly known as cellulase family B. The 3D structure of COX15- Glyco_hydro_6 the enzymatic core of cellobiohydrolase CtaA II (CBHII) from the fungus Trichoderma reesei reveals an alpha-beta protein with a fold similar to the ubiquitous barrel topology first seen in triose phosphate isomerase.

AGF Unique Domains
Glyco_hydro_8 | Glycoside hydrolase family 8 CAZY GH_8 comprises enzymes with several known activities; endoglucanase (EC 3.2.1.4); lichenase (EC 3.2.1.73); chitosanase (EC 3.2.1.132). These enzymes were formerly known as cellulase family D.
GT-D | Glycosyltransferase GT-D fold
LRR_5 | A leucine-rich repeat (LRR) is a protein structural motif that forms an α/β horseshoe fold. It is composed of repeating 20–30 amino acid stretches that are unusually rich in the hydrophobic amino acid leucine. These repeats commonly fold together to form a solenoid protein domain, termed leucine-rich repeat domain. Typically, each repeat unit has beta strand-turn-alpha helix structure, and the assembled domain, composed of many such repeats, has a horseshoe shape with an interior parallel beta sheet and an exterior array of helices. One face of the beta sheet and one side of the helix array are exposed to solvent and are therefore dominated by hydrophilic residues. The region between the helices and sheets is the protein's hydrophobic core and is tightly sterically packed with leucine residues. Leucine-rich repeats are frequently involved in the formation of protein–protein interactions.
NRDD | Anaerobic ribonucleoside-triphosphate reductase
PDDEXK_2 | PD-(D/E)XK nuclease family transposase. These proteins are transposase proteins.
Pollen_allerg_1 | This family contains allergens lol PI, PII and PIII from Lolium perenne.
Pur_ac_phosph_N | Purple acid Phosphatase, N-terminal domain

AGF Lost Domains
COX6B | Cytochrome c oxidase, the last enzyme in the respiratory electron transport chain of mitochondria (or bacteria) located in the mitochondrial (or bacterial) membrane.
CtaG_Cox11 | Cytochrome c oxidase
Cupin_4 | Cupin superfamily protein
Cupin_8 | This cupin like domain shares similarity to the JmjC domain, which catalyse a novel modification
Cyto_heme_lyase | Holocytochrome-c synthase
Cytochrom_B561 | Cytochrome b561 is an integral membrane protein responsible for electron transport, binding two heme groups non-covalently.[1] It is a family of ascorbate-dependent enzymes
DAP3 | Mitochondrial ribosomal death-associated protein 3

AGF Unique Domains
Recombinase | This domain is usually found associated with PF00239 in putative integrases/recombinases of mobile genetic elements of diverse bacteria and phages.
Rhamnogal_lyase | Rhamnogalacturonate lyase (EC:4.2.2.-) degrades the rhamnogalacturonan I (RG-I) backbone of pectin. This family contains mainly members from plants, but also contains the plant pathogen Erwinia chrysanthemi.
Rubrerythrin | This domain has a ferritin-like fold
rve | Integrase, Retroviral integrase (IN) is an enzyme produced by a retrovirus (such as HIV) that enables its genetic material to be integrated into the DNA of the infected cell. Retroviral INs are not to be confused with phage integrases, such as λ phage integrase (Int) (see site-specific recombination). IN is a key component in the retroviral pre-integration complex (PIC). The complex of integrase bound to cognate viral DNA (vDNA) ends has been referred to as the intasome.
RVT_2 | Reverse transcriptase (RNA-dependent DNA polymerase), is usually indicative of a mobile element such as a retrotransposon or retrovirus. Reverse transcriptases occur in a variety of mobile elements, including retrotransposons, retroviruses, group II introns, bacterial msDNAs, hepadnaviruses, and caulimoviruses. This Pfam entry includes reverse transcriptases not recognised by the PF00078 model.
SASA | Carbohydrate esterase, sialic acid-specific acetylesterase. Sialic acid acetylesterase in autoimmunity

AGF Lost Domains
DNA_photolyase | Deoxyribodipyrimidine photolyase (DNA photolyase) is a DNA repair enzyme. It binds to UV-damaged DNA containing pyrimidine dimers and, upon absorbing a near-UV photon (300 to 500 nm), breaks the cyclobutane ring joining the two pyrimidines of the dimer.
EBP | Emopamil binding protein, encodes a non-glycosylated type I integral membrane protein of endoplasmic reticulum and shows high level expression in epithelial tissues. The EBP protein has emopamil binding domains, including the sterol acceptor site and the catalytic centre, which show Delta8-Delta7 sterol isomerase activity. Human sterol isomerase, a homologue of mouse EBP, is suggested not only to play a role in cholesterol biosynthesis, but also to affect lipoprotein internalisation.
ETC_C1_NDUFA4 | NADH dehydrogenase (ubiquinone)
ETF_QO | Electron transfer flavoprotein-ubiquinone oxidoreductase, 4Fe-4S
FAD_binding_7 | Photolyases (EC 4.1.99.3) are DNA repair enzymes that repair damage caused by exposure to ultraviolet light.
FAD_binding_8 | FAD-binding domain

AGF Unique Domains
Stealth_CR1 | Stealth protein CR1, Stealth_CR1 is the first of several highly conserved regions on stealth proteins in metazoa and bacteria. There are up to four CR regions on all member proteins. CR1 carries a well-conserved IDVVYT sequence-motif. The domain is found in tandem with CR2, CR3 and CR4 on both potential metazoan hosts and pathogenic eubacterial species that are capsular polysaccharide phosphotransferases. The CR domains appear on eukaryotic proteins such as GNPTAB, N-acetylglucosamine-1-phosphotransferase subunits alpha/beta. Horizontal gene-transfer seems to have occurred between host and bacteria of these sequence-regions in order for the bacteria to evade detection by the host innate immune system
Stealth_CR2 | Stealth_CR2 is the second of several highly conserved regions on stealth proteins in metazoa and bacteria. There are up to four CR regions on all member proteins. CR2 carries a well-conserved NDD sequence-motif. The domain is found in tandem with CR1, CR3 and CR4 on both potential metazoan hosts and pathogenic eubacterial species that are capsular polysaccharide phosphotransferases. The CR domains appear on eukaryotic proteins such as GNPTAB, N-acetylglucosamine-1-phosphotransferase subunits alpha/beta. Horizontal gene-transfer seems to have occurred between host and bacteria of these sequence-regions in order for the bacteria to evade detection by the host innate immune system
YoeB_toxin | A toxin-antitoxin system is a set of two or more closely linked genes that together encode both a protein 'poison' and a corresponding 'antidote'. When these systems are contained on plasmids – transferable genetic elements – they ensure that only the daughter cells that inherit the plasmid survive after cell division. If the plasmid is absent in a daughter cell, the unstable antitoxin is degraded and the stable toxic protein kills the new cell; this is known as 'post-segregational killing' (PSK). Toxin-antitoxin systems are widely distributed in prokaryotes, and organisms often have them in multiple copies.

AGF Lost Domains
Ferric_reduct | Ferric reductase like transmembrane component. This family includes a common region in the transmembrane proteins mammalian cytochrome B-245 heavy chain (gp91-phox), ferric reductase transmembrane component in yeast and respiratory burst oxidase from mouse-ear cress.
FLILHELTA | protein of unknown function
FPN1 | Ferroportin1, that may play a role in iron export from the cell. This family may represent a number of transmembrane regions in Ferroportin1

AGF Unique Domains
ZinT | There is strong evidence for involvement of the ZinT domain in zinc homeostasis and management of zinc in the periplasm. It may also facilitate zinc uptake from the environment through interactions with the znuABC zinc transporter. It is regulated by the metalloregulator gene Zur (zinc uptake regulator). The domain was originally discovered in the bacterial stress response to cadmium. Further studies have found that it binds to cadmium, zinc, nickel, and mercury, but not other common metals such as cobalt, copper, iron, and manganese. It may have a secondary function in managing heavy-metal toxicity.

AGF Lost Domains
Glyco_hydro_63 | Glycosyl hydrolase family 63 (CAZY GH_63) is a family of eukaryotic enzymes. They catalyse the specific cleavage of the non-reducing terminal glucose residue from Glc(3)Man(9)GlcNAc(2). Mannosyl oligosaccharide glucosidase EC 3.2.1.106 is the first enzyme in the N-linked oligosaccharide processing pathway.
Glyco_hydro_63N | Glycosyl hydrolase family 63 (CAZY GH_63) is a family of eukaryotic enzymes. They catalyse the specific cleavage of the non-reducing terminal glucose residue from Glc(3)Man(9)GlcNAc(2). Mannosyl oligosaccharide glucosidase EC 3.2.1.106 is the first enzyme in the N-linked oligosaccharide processing pathway
Glyco_transf_90 | This family of glycosyl transferases are specifically (mannosyl) glucuronoxylomannan/galactoxylomannan -beta 1,2-xylosyltransferases, EC:2.4.2.-.
Glyoxal_oxid_N | Glyoxal oxidase N-terminus
HAGH_C | Hydroxyacylglutathione hydrolase C-terminus. Substrate binding occurs at the interface between this domain and the catalytic domain
IDO | Indoleamine 2,3-dioxygenase. Indoleamine 2,3-dioxygenase is the first and rate-limiting enzyme of tryptophan catabolism through the kynurenine pathway, thus causing depletion of tryptophan which can cause halted growth of microbes as well as T cells.
IGR | IGR protein motif

AGF Lost Domains
INSIG | Insulin-induced protein, found in the endoplasmic reticulum and bind the sterol-sensing domain of SREBP cleavage-activating protein (SCAP), preventing it from escorting SREBPs to the Golgi. Their combined action permits feedback regulation of cholesterol synthesis over a wide range of sterol concentrations.
LIAS_N | LIAS_N is found as the N-terminal domain of the Radical_SAM family in the members that are lipoyl synthase enzymes, particularly the mitochondrial ones in metazoa but also those in bacteria.
LigB | Catalytic LigB subunit of aromatic ring-opening dioxygenase
Lipid_DES | Sphingolipid Delta4-desaturase (DES). Sphingolipids are important membrane signalling molecules involved in many different cellular functions in eukaryotes.
MAM33 | Mitochondrial glycoprotein
MCD | This family consists of several eukaryotic malonyl-CoA decarboxylase (MLYCD) proteins. Malonyl-CoA, in addition to being an intermediate in the de novo synthesis of fatty acids, is an inhibitor of carnitine palmitoyltransferase I, the enzyme that regulates the transfer of long-chain fatty acyl-CoA into mitochondria, where they are oxidised.
Mg_trans_NIPA | Magnesium transporter NIPA
Mgm101p | Mitochondrial genome maintenance MGM101
Mit_KHE1 | Mitochondrial K+-H+ exchange-related
Mitofilin | Mitochondrial inner membrane protein
MMADHC | Methylmalonic aciduria and homocystinuria type D protein
Mo-co_dimer | Mo-co oxidoreductase dimerisation domain. This domain is found in molybdopterin cofactor (Mo-co) oxidoreductases. It is involved in dimer formation, and has an Ig-fold structure

AGF Lost Domains
MOSC_N | MOSC N-terminal domain, predicted sulfur-carrier domain
MRP_L53 | 39S ribosomal protein L53
MRP-L20 | Mitochondrial ribosomal protein subunit L20
MRP-L28 | Mitochondrial ribosomal protein L28
MRP-L46 | 39S mitochondrial ribosomal protein L46
MRP-L47 | 39S mitochondrial ribosomal protein L47
MRP-S25 | Mitochondrial ribosomal protein S25
MTP18 | Mitochondrial 18 KDa protein
MWFE | NADH dehydrogenase (ubiquinone), is an enzyme of the respiratory chains of myriad organisms from bacteria to humans that falls under the H+ or Na+-translocating NADH Dehydrogenase (NDH) Family (TC# 3.D.1), a member of the Na+ transporting Mrp superfamily.
NAD_binding_6 | Ferric reductase NAD binding domain
NADH-u_ox-rdase | NADH-ubiquinone oxidoreductase complex I, 21 kDa subunit
NDUF_B7 | NADH dehydrogenase (ubiquinone)
NTPase_I-T | protein of unknown function
Ofd1_CTDD | Oxoglutarate and iron-dependent oxygenase degradation C-term
OHCU_decarbox | In molecular biology 2-oxo-4-hydroxy-4-carboxy-5-ureidoimidazoline decarboxylase (OHCU decarboxylase) EC 4.1.1.n1 is an enzyme involved in purine catabolism
OPA3 | Optic atrophy 3 protein, deficiency of which causes type III 3-methylglutaconic aciduria (MGA) in humans. This disease manifests with early bilateral optic atrophy, spasticity, extrapyramidal dysfunction, ataxia, and cognitive deficits, but normal longevity
Pam17 | Mitochondrial import protein Pam17
PDZ_1 | These domains play a key role in the formation and function of signal transduction complexes.

AGF Lost Domains
Peptidase_M76 | This is a family of metalloproteases. Proteins in this family are also annotated as Ku70-binding proteins.
PET117 | PET assembly of cytochrome c oxidase
Pet127 | Mitochondrial protein Pet127
Pet191_N | Cytochrome c oxidase
Pirin_C | Pirin C-terminal cupin domain
QCR10 | Ubiquinol-cytochrome-c reductase complex subunit
Rad52_Rad22 | The DNA single-strand annealing proteins (SSAPs), such as RecT, Red-beta, ERF and Rad52, function in RecA-dependent and RecA-independent DNA recombination pathways
RdRP | RNA dependent RNA polymerase, This family of proteins are eukaryotic RNA dependent RNA polymerases. These proteins are involved in post transcriptional gene silencing where they are thought to amplify dsRNA templates.
Rib_5-P_isom_A | Ribose 5-phosphate isomerase A (phosphoriboisomerase A)
Ribosomal_L33 | unknown
Ribosomal_L34 | unknown
Ribosomal_L36 | unknown
RPOL_N | DNA-directed RNA polymerase N-terminal. This domain has a role in interaction with regions of upstream promoter DNA and the nascent RNA chain, leading to the processivity of the enzyme. This domain undergoes a structural change in the transition from initiation to elongation phase. The structural change results in abolition of the promoter binding site, creation of a channel accommodating the heteroduplex in the active site and formation of an exit tunnel which the RNA transcript passes through after peeling off the heteroduplex.

AGF Lost Domains
S1-P1_nuclease | S1 P1 nuclease protein domain, which cleave RNA and single stranded DNA with no sequence specificity. They are found in both prokaryotes and eukaryotes and are thought to be associated in programmed cell death and also in tissue differentiation. Furthermore, they are secreted extracellular, that is, outside of the cell. Their function and distinguishing features mean they have potential in being exploited in the field of biotechnology.
SCO1-SenC | This family is involved in biogenesis of respiratory and photosynthetic systems.
SE | Squalene epoxidase. This domain is found in squalene epoxidase (SE) and related proteins which are found in taxonomically diverse groups of eukaryotes and also in bacteria. SE was first cloned from Saccharomyces cerevisiae where it was named ERG1. It contains a putative FAD binding site and is a key enzyme in the sterol biosynthetic pathway [1]. Putative transmembrane regions are found to the protein's C-terminus.
Sgf11 | The Sgf11 family is a SAGA complex subunit in Saccharomyces cerevisiae. The SAGA complex is a multisubunit protein complex involved in transcriptional regulation. SAGA combines proteins involved in interactions with DNA-bound activators and TATA-binding protein (TBP), as well as enzymes for histone acetylation and deubiquitylation
She9_MDM33 | mitochondrial inner membrane proteins with a role in inner mitochondrial membrane organisation and biogenesis
Solute_trans_a | Organic solute transporter Ostalpha. This family is a transmembrane organic solute transport protein. In vertebrates these proteins form a complex with Ostbeta, and function as bile transporters [1]. In plants they may transport brassinosteroid-like compounds and act as regulators of cell death

AGF Lost Domains
Spo12 | The Spo12 protein plays a regulatory role in two of the most fundamental processes of biology, mitosis and meiosis, and yet its biochemical function remains elusive. Spo12 is a nuclear protein. Spo12 is a component of the FEAR (Cdc fourteen early anaphase release) regulatory network, that promotes Cdc14 release from the nucleolus during early anaphase. The FEAR network is comprised of the polo kinase Cdc5, the separase Esp1, the kinetochore-associated protein Slk19, and Spo12.
SPO22 | SPO22/ZIP4 in yeast is a meiosis specific protein involved in sporulation. It has been shown to regulate crossover distribution by promoting synaptonemal complex formation
Suc_Fer-like | Sucrase/ferredoxin-like. This family contains a number of bacterial and eukaryotic proteins approximately 400 residues long that resemble ferredoxin and appear to have sucrolytic activity
TIM21 | TIM21 interacts with the outer mitochondrial TOM complex and promotes the insertion of proteins into the inner mitochondrial membrane
TMEM223 | Transmembrane protein 223
UcrQ | Coenzyme Q – cytochrome c reductase
Ureidogly_lyase | Ureidoglycolate lyase, one of the enzymes that acts upon ureidoglycolate, an intermediate of purine catabolism, releasing urea.
VRR_NUC | unknown
YCII | YCII-related domain. This domain is suggested to play a role in transcription initiation (Bateman A per. obs.). This domain is named after the most conserved motif in the alignment.
zf-4CXXC_R1 | Zinc-finger domain of monoamine-oxidase A repressor R1
zf-CHCC | NADH dehydrogenase (ubiquinone)
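The two column headings can be read as set differences over per-genome domain annotations. The sketch below is only an illustration under stated assumptions: it assumes "unique" means a domain found in at least one AGF genome and in no Chytridiomycota genome, and "lost" means the reverse. The function name, the genome labels, and the toy annotations are hypothetical and are not taken from the study; the thresholds actually applied by the authors may differ.

```python
from typing import Dict, Set, Tuple


def classify_domains(
    agf: Dict[str, Set[str]],
    chytrid: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str]]:
    """Split domain names into (unique_to_agf, lost_in_agf).

    Assumption (not from the source): a domain counts as present in a
    group if any genome of that group carries it.
    """
    agf_any = set().union(*agf.values())        # seen in any AGF genome
    chy_any = set().union(*chytrid.values())    # seen in any chytrid genome
    unique_to_agf = agf_any - chy_any           # AGF only
    lost_in_agf = chy_any - agf_any             # chytrids only
    return unique_to_agf, lost_in_agf


# Toy input: hypothetical genome labels with made-up annotations.
toy_agf = {
    "agf_genome_1": {"Glyco_hydro_48", "Rubrerythrin", "Pkinase"},
    "agf_genome_2": {"Glyco_hydro_48", "Pkinase"},
}
toy_chytrid = {
    "chytrid_genome_1": {"COQ7", "Pkinase"},
    "chytrid_genome_2": {"COQ7", "MRP-L47", "Pkinase"},
}

unique, lost = classify_domains(toy_agf, toy_chytrid)
print("unique:", sorted(unique))
print("lost:", sorted(lost))
```

A stricter variant could require presence in most genomes of a group rather than in any single one, which would reduce the influence of incomplete assemblies or transcriptomes.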

Supp. Table 2. Genome information of the animal hosts and diet plants used in the study to infer the genetic elements in Neocallimastigomycota that have a foreign origin.

Name | Taxon | Version | Source
Elephant | Loxodonta africana (African savanna elephant) | Loxafr3.0 | Broad Institute
Horse | Equus caballus | EquCab2.0 | Wade et al. 2009
Sheep | Ovis aries | Oar_v4.0 | International Sheep Genomics Consortium 2010
Yak | Bos mutus (wild yak) | BosGru_v2.0 | Qiu et al. 2012
Banana | Musa acuminata | DH-Pahang v2 | Martin et al. 2016
Palm | Elaeis guineensis (African oil palm) | EG5 | Singh et al. 2013
Bamboo | Phyllostachys heterocycla var. pubescens | v1.0 | Peng et al. 2013
GoatGrass | Aegilops tauschii subsp. tauschii | Aet_MR_1.0 | Zimin et al. 2017
Maize | Zea mays | B73 RefGen_v4 | Jiao et al. 2017
Rice | Oryza sativa Japonica Group | Build 4.0 | Rice Annotation Project 2007
Brome | Brachypodium distachyon | v2.0 | International Brachypodium Initiative 2010
Sorghum | Sorghum bicolor | v3 | Paterson et al. 2009
Arabidopsis | Arabidopsis thaliana | TAIR10 | Swarbreck et al. 2008
Moss | Physcomitrella patens | V1.1 | Rensing et al. 2008

References:

Broad Institute. Elephant Genome Project. (2018).
Jiao, Y. et al. Improved maize reference genome with single-molecule technologies. Nature 546, 524–527 (2017).
Martin, G. et al. Improvement of the banana ‘Musa acuminata’ reference sequence using NGS data and semi-automated bioinformatics methods. BMC Genomics 17, 1–12 (2016).
Paterson, A. H. et al. The Sorghum bicolor genome and the diversification of grasses. Nature 457, 551–556 (2009).
Peng, Z. et al. The draft genome of the fast-growing non-timber forest species moso bamboo (Phyllostachys heterocycla). Nat. Genet. 45, 456–461 (2013).
Qiu, Q. et al. The yak genome and adaptation to life at high altitude. Nat. Genet. 44, 946–949 (2012).
Rensing, S. A. et al. The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants. Science 319, 64–69 (2008).
Singh, R. et al. Oil palm genome sequence reveals divergence of interfertile species in Old and New worlds. Nature 500, 335–339 (2013).
Swarbreck, D. et al. The Arabidopsis Information Resource (TAIR): gene structure and function annotation. Nucleic Acids Res. 36, D1009–D1014 (2008).
The International Brachypodium Initiative et al. Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature 463, 763–768 (2010).
The International Sheep Genomics Consortium et al. The sheep genome reference sequence: a work in progress. Anim. Genet. 41, 449–453 (2010).
The Rice Annotation Project. Curated genome annotation of Oryza sativa ssp. japonica and comparative genome analysis with Arabidopsis thaliana. Genome Res. 17, 175–183 (2007).
Wade, C. M. et al. Genome sequence, comparative analysis, and population genetics of the domestic horse. Science 326, 865–867 (2009).

Zimin, A. V. et al. Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm. Genome Res. 27, 787–792 (2017).
