bioRxiv preprint doi: https://doi.org/10.1101/2020.04.29.069435; this version posted April 30, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
The integrin-mediated adhesome complex, essential to multicellularity, is present in the most recent common ancestor of animals, fungi, and amoebae.

Seungho Kang1,2, Alexander K. Tice1,2, Courtney W. Stairs3, Daniel J. G. Lahr4, Robert E. Jones1,2, Matthew W. Brown1,2*

1. Department of Biological Sciences, Mississippi State University, USA
2. Institute for Genomics, Biocomputing & Biotechnology, Mississippi State University, USA
3. Department of Cell and Molecular Biology, Uppsala University, Sweden
4. Department of Zoology, University of São Paulo, Brazil
*Correspondence: Matthew W. Brown, [email protected]

Keywords: Integrin, Amoebozoa, Protein Domain architecture, amoeba, reductive evolution

Abstract: Integrins are transmembrane receptor proteins that activate signal transduction
pathways upon extracellular matrix binding. The Integrin Mediated Adhesion Complex (IMAC)
mediates various cell physiological processes. The IMAC was thought to be animal-specific
machinery until, over the last decade, these complexes were discovered in Obazoa, the group
containing animals, fungi, and several microbial eukaryote lineages. Amoebozoa is the eukaryotic
supergroup sister to Obazoa. Even though Amoebozoa represents the closest outgroup to Obazoa,
little genomic-level data exists for the supergroup and little attention has been given to its
gene inventories. To examine the evolutionary history of the IMAC, we examine gene inventories
of a deeply sampled set of 100+ Amoebozoa taxa, including new data from several taxa. From these
robust data, sampled from the entire breadth of known amoebozoan clades, we show the presence of an
ancestral complex of integrin adhesion proteins that predates the evolution of the Amoebozoa. Our
results highlight that many of these proteins appear to have evolved earlier in eukaryote evolution
than previously thought. Co-option of an ancient protein complex was key to the emergence of
animal-type multicellularity. The role of the IMAC in a unicellular context is unknown, but the
complex must play a critical role for at least some unicellular organisms.

INTRODUCTION
Integrins are transmembrane signaling and adhesion heterodimers made up of α-integrin (ITGA)
and β-integrin (ITGB) protein subunits [1]. They transduce signals across the cell membrane and
associate with a set of intracellular proteins that act on the actin cytoskeleton; together these
are known as the integrin-mediated adhesion complex (IMAC) (figure 1A) [2]. The intracellular
component of the IMAC includes three kinds of adhesome proteins: 1) integrin-bound proteins
that can bind to actin directly (talin, α-actinin, and filamin); 2) integrin-bound proteins
that can bind indirectly to the cytoskeleton (integrin-linked kinase (ILK), particularly interesting
new cysteine-histidine rich protein (PINCH), and paxillin); and 3) non-integrin-binding proteins
such as vinculins.

The IMAC is best known for its critical role in aspects of cell behavior including development,
migration, proliferation, survival and metabolism [1,3,4]. Integrins function by relaying messages
in both directions (outside-in and inside-out) between the extracellular matrix and the internal
cytoskeleton[5].

Because integrin proteins have been demonstrated to be absent in other complex multicellular
lineages of eukaryotes (i.e., plants and fungi) as well as in the choanoflagellates, which are the
closest protistan relatives of Metazoa, integrins were initially believed to be an animal-specific
evolutionary innovation. However, over the last decade, investigations into the genomic content
of other close protistan relatives of animals have led to the discovery of integrins and other IMAC
components outside of Metazoa. Integrins and complete components of the IMAC have been
reported outside Metazoa in unicellular opisthokonts, such as Capsaspora owczarzaki[6],
Pigoraptor spp.[7], Syssomonas multiformis[7], Corallochytrium limacisporum[8], Creolimax
fragrantissima[9], and Sphaeroforma arctica[10], as well as in the apusomonad Thecamonas trahens[6]
and the breviate Pygsuia biforma[11]. Thus, the IMAC and its associated proteins have a
much more ancient evolutionary origin than previously expected, well outside of the animals, but
so far the IMAC has been reported only in the lineage called Obazoa, which is composed of
Opisthokonta, Breviatea, and Apusomonadida [11]. Are these complexes exclusive to
Obazoa, or did they appear even deeper in evolutionary history?

Amoebozoa is the eukaryotic supergroup that is sister to Obazoa; together they form the
Amorphea[12]. Despite the relatively close evolutionary proximity of Amoebozoa to the
animal-containing lineage, genomic-level data and relevant research into the supergroup remain sparse.
The genomic complement present in the last common ancestor of Amoebozoa has yet to be
determined. Recent work revealed that crucial components of the IMAC, namely ITGA and ITGB,
were missing in the amoebozoans Dictyostelium discoideum and Acanthamoeba castellanii [13].
Was this result simply due to a lack of data, or does it reflect an evolutionary trend?
Here, we examine the gene repertoires across the Amoebozoa supergroup to investigate the
evolutionary history of the adhesome associated with the integrin proteins. The absence of the
integrin proteins reported previously may simply be due to the great paucity of taxonomically
broad genomic data from this clade, which is a limiting factor in our knowledge of IMAC evolution.

To test whether the IMAC is truly absent or merely unobserved due to a lack of data from broad
taxonomic sampling, we took a comparative genomic/transcriptomic approach utilizing data
recently used to resolve the tree of Amoebozoa [14]. Here, we have found a number of predicted
IMAC proteins across a wide breadth of Amoebozoa and in some previously unexamined obazoans
(Ministeria vibrans (Opisthokonta [12]) and Lenisia limosa (Breviatea[15])), including integrins,
both ITGA and ITGB (figure 1B). Across the breadth of these taxa, we have identified a nearly
complete IMAC. Here we demonstrate that the IMAC is not exclusive to Obazoa. Rather, it was
present in the last common ancestor of Amorphea (i.e., Obazoa + Amoebozoa), which, from our
interpretation of the gene distributions, probably had a full set of these components (figure 1C).

BACKGROUND
Molecular details on Integrin proteins
The overall architecture of both integrin proteins includes a large extracellular N-terminal
ligand-binding head domain and a C-terminal stalk (also termed “legs”). Integrin activation depends
on the binding of a ligand, such as the extracellular matrix proteins collagen, fibronectin, and
laminin [5], and on the binding of divalent cations (primarily Ca2+, Mg2+, or Mn2+) at unique but
well-conserved cation binding motifs[16–19]. In animals, there are universally five cation binding sites
on the ITGA and three on the ITGB paralogs[16,17,19,20]. Intracellular integrin engagement with
cytoskeletal proteins in the IMAC can regulate cell-signaling pathways that control differentiation,
survival, and motility, making the integrins one of the most functionally diverse families of cell
adhesion molecules [5,21].

Integrin Alpha (ITGA)
Metazoan ITGA
In metazoan ITGA, which we refer to herein as “canonical ITGA proteins”, there is a β-propeller
made of seven blades (IPR013519)[18], an integrin leg domain (IPR013649)[18], a
transmembrane domain, and a C-terminal cytoplasmic tail amino acid motif (KXGFFXR,
IPR018184) [22] (numbers are INTERPRO ids derived from INTERPROSCAN version
5.27.66 (IPRSCAN)[23]). These proteins are typically 700-800 amino acids long in animals. The
β-propeller blade motif is composed of an FG-GAP domain (IPR013517), and in the final 3-4 blades
there is a cation binding site motif between the “FG” and “GAP” motifs. Of note, the consensus
pattern of the FG-GAP domain is highly variable and does not necessarily contain those amino
acids (i.e., FG and GAP)[20]. The cation binding blades of the β-propeller influence extracellular
ligand affinity and ligand specificity[24,25]. The cation motifs follow the consensus pattern
D/E-h-D/N-x-D/N-G-h-x-D/E (h=hydrophobic residue; x=any residue)[16]. In metazoan ITGA, there are
three ITGA leg domains (IPR013649) located next to the ITGA head. These ITGA leg
domains are structurally and functionally similar to the immunoglobulin (Ig) fold-like
beta sandwich[18].

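To make the cation motif consensus concrete, the pattern D/E-h-D/N-x-D/N-G-h-x-D/E can be written as a regular expression and scanned across a protein sequence. The following is only a minimal sketch: the residues counted as hydrophobic, the function name `find_cation_motifs`, and the toy sequence are assumptions made for illustration and are not taken from the study.

```python
import re

# Assumption: "h" (hydrophobic) is taken here as A, V, I, L, M, F, W or Y;
# the text does not list which residues it counts as hydrophobic.
HYDROPHOBIC = "AVILMFWY"

# D/E-h-D/N-x-D/N-G-h-x-D/E as a 9-residue pattern. The lookahead lets
# overlapping candidate sites be reported as well.
CATION_MOTIF = re.compile(
    rf"(?=([DE][{HYDROPHOBIC}][DN].[DN]G[{HYDROPHOBIC}].[DE]))"
)


def find_cation_motifs(seq):
    """Return (1-based start, matched residues) for every motif hit."""
    seq = seq.upper()
    return [(m.start() + 1, m.group(1)) for m in CATION_MOTIF.finditer(seq)]


# Hypothetical toy sequence (not a real integrin) carrying two motifs.
toy = "MKTAA" + "DVDGDGLAD" + "LLVGAPNNN" + "EIDNNGFPD" + "AAA"
print(find_cation_motifs(toy))
```

A strict pattern of this kind will miss divergent sites, so hits are best treated as candidates to be checked against domain predictions and alignments.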
We would classify as an authentic ITGA protein outside Metazoa a membrane-docked
protein with an ITGA head composed of a β-propeller made of several β-propeller blades
(IPR013519) (figure 2), some with FG-GAP domains (IPR013517) only and some with a cation
motif flanked by FG and GAP motifs (figure 4). However, we do not rule out
truncated ITGA, because these occur in metazoans [26,27] and in the opisthokont Creolimax
fragrantissima [9].

Integrin Beta (ITGB)
Metazoan ITGB
In metazoan ITGB, which we refer to herein as “canonical ITGB proteins”, there are seven
domains: a plexin-semaphorin-integrin (PSI) domain (IPR002165), a hybrid domain (described below),
three to four EGF modules (IPR013111), an integrin β tail subunit (IPR012896), and transmembrane
and signal peptide domains[18]. The metazoan hybrid domain is composed of a unique von
Willebrand factor A domain (known as the ITGB “inserted” domain, or β-I) (IPR002369) and an Ig-like
domain (IPR013783) upstream of the β-I domain [18] (figure 3). EGF domains make up a cysteine rich
stalk motif (CRSM) that corresponds to the β-subunit stalk region[18]. The consensus pattern of the
CRSM in Opisthokonta is CXCXXCXC[6,28]; in Metazoa there are typically 3-4 CRSMs. The
metazoan transmembrane domain contains a GXXXG membrane motif, which is important for
oligomerization of integrin proteins[29].

In Metazoa, the β-I domain contains three metal ion binding sites (comprising a ca. 200
amino-acid region in total)[18]. These are a metal ion dependent adhesion site (MIDAS), a site
adjacent to MIDAS (called ADMIDAS), and a synergistic metal ion binding site (SyMBS)[16].
These metal ion binding sites are key regulators of all integrin-ligand interactions and require
divalent cations such as manganese, calcium, and magnesium[16]. The three metal binding
sites have the consensus motifs DXSYSX95EX30D (MIDAS; at site 150 in human ITGB1, NCBI:P05556.2),
SXXDDX121DX83A (ADMIDAS; at site 154 in human ITGB1, NCBI:P05556.2), and
D/EX55NDXPXE (SyMBS; at site 189 in human ITGB1, NCBI:P05556.2)[17]. The MIDAS and
SyMBS (the positive regulatory site for integrin activation[16]) motifs are essential for
integrin-ligand binding[16,18], but ADMIDAS (the negative regulatory site for integrin inhibition[30]) is
not absolutely necessary for integrin activation[30]. The ITGB tail subunit (IPR012896) contains
two conserved motifs responsible for activating the integrin complex: the cytosolic tail motif
(NPXY/F, which scaffolds cytoskeletal proteins), located after the transmembrane region, and the
ITGA interacting motif (LLXXXHDRKE, a highly electrically charged amino acid motif)
downstream (C-terminal) of the NPXY/F motif, which interacts with scaffolding and signaling
proteins[31].

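The compact consensus notation used for these motifs (for example DXSYSX95EX30D or NPXY/F) maps directly onto regular expressions. The sketch below shows one way to do that translation. The helper name `consensus_to_regex` and the reading of the notation (X is any residue, X followed by a number is a run of that many residues, A/B is either residue) are assumptions for illustration. Real sequences may not keep the exact spacer lengths, so a practical search would probably need to relax them.

```python
import re


def consensus_to_regex(pattern):
    """Translate compact consensus notation into a regular expression.

    Assumed rules: "X" is any residue, "X" followed by digits is a run of
    that many residues, and "A/B" means either residue at that position.
    """
    tokens = re.findall(r"X\d+|[A-Z](?:/[A-Z])*", pattern)
    parts = []
    for tok in tokens:
        if tok == "X":
            parts.append(".")
        elif tok[0] == "X" and tok[1:].isdigit():
            parts.append(".{%s}" % tok[1:])
        elif "/" in tok:
            parts.append("[" + tok.replace("/", "") + "]")
        else:
            parts.append(tok)
    return "".join(parts)


# Consensus strings as given in the text.
MOTIFS = {
    "MIDAS": "DXSYSX95EX30D",
    "ADMIDAS": "SXXDDX121DX83A",
    "SyMBS": "D/EX55NDXPXE",
    "cytosolic tail": "NPXY/F",
}

for name, consensus in MOTIFS.items():
    print(name, consensus_to_regex(consensus))
```

The resulting strings can be passed to `re.search` together with a protein sequence to test for a candidate site.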
We would classify as an authentic ITGB protein outside Metazoa a membrane-docked
protein with an ITGB head composed of a von Willebrand factor A domain (IPR002369) with three
metal ion binding sites, and cysteine rich regions downstream (C-terminal) of it (figure 3). Again,
we do not rule out truncated ITGB, since these are present in both metazoans [26,27] and
Creolimax fragrantissima[9].

RESULTS
Integrin Alpha
Amoebozoan ITGA: General observations on ITGA
Our results show that ITGA homologs are found in 23 of the 113 amoebozoan transcriptomes
examined (figure 2). Of those 23 amoebozoans, eight have more than one ITGA paralog (figure
2). Amoebozoan ITGAs vary greatly in size, ranging from 523 to 1200 amino acids in length
(supplemental table 1). Many of the proteins we identified are predicted to have diverse numbers
of β-propeller blades, from few (3-5 blades), to canonical (6-7 blades), to many
(more than 8 blades), and of cation binding sites (1 to 5 sites). However, all amoebozoan
ITGA proteins have at least three β-propeller blade (IPR013517) domains (figure 2), with an
FG-GAP domain and an FG-GAP/cation binding motif (figure 3A). The positions of the cation binding
motif/FG-GAP domains are seemingly distributed randomly (supplemental table 1), and not
necessarily in the last 3-4 blades as in canonical animal ITGA. We noticed the presence of a
C-terminal transmembrane anchor (IPR021157) domain in three amoebozoan ITGAs. The animal-like
KXGFFXKR cytoplasmic tail motif is absent in all but one amoebozoan (figure 3E; supplemental
table 1).

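The blade-count categories used above can be written down as a small lookup, which also makes one gap explicit: the text defines "many" as more than 8 blades, so a count of exactly 8 is not assigned to any bin. The function name `propeller_class` and the labels it returns are hypothetical and serve only as an illustration of the binning.

```python
def propeller_class(n_blades):
    """Bin a predicted β-propeller by its blade count.

    Bins follow the text: few (3-5), canonical (6-7), many (more than 8).
    A count of exactly 8 is not assigned by the text, so it is reported as
    "unassigned" here rather than guessed. Counts below 3 fall under the
    minimum of three blades reported for all amoebozoan ITGA proteins.
    """
    if n_blades < 3:
        return "below reported minimum"
    if n_blades <= 5:
        return "few"
    if n_blades <= 7:
        return "canonical"
    if n_blades == 8:
        return "unassigned"
    return "many"


# Hypothetical blade counts for illustration.
for blades in (3, 7, 8, 10):
    print(blades, propeller_class(blades))
```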
We find that amoebozoan ITGAs are present in all three major clades, but interestingly these proteins
were not found in eight amoebozoan taxa that have genome sequences (Dictyostelium spp., Acytostelium
spp., Heterostelium spp., Speleostelium caveatum, Physarum polycephalum, Entamoeba spp.,
Acanthamoeba spp., and Protostelium fungivorum[32]). Importantly, however, we were able
to confidently find ITGA in Mastigamoeba balamuthi (figure 2), which has a complete genome
available [33]. To confirm the presence of these genes in the genome, we used a BLAST-based
homology search to map the two ITGA genes of M. balamuthi to the genome data (CBKX00000000)
on NCBI (see Methods) [86]. For ITGA, there are no introns in either paralog 1
(CBKX010025823.1) or paralog 2 (CBKX010005624.1).

We classified ITGA into two representative types based on their predicted IPRSCAN domains,
their cation motifs, their similarity to metazoan ITGA, and the presence of novel domains
previously unobserved in ITGA proteins.

Amoebozoan ITGA: Type I ITGA: similar architecture to Metazoa
For communication purposes we have classified the amoebozoan ITGA homologs as ‘Type I’ and
‘Type II’ based on their similar or divergent domain complements compared to metazoan ITGA
sequences, respectively (figure 2). From the 23 amoebozoan taxa in which we identified an ITGA,
19 taxa have Type I (figure 2). One amoebozoan (Amphizonella sp. 9 paralog 2) has a Type I ITGA
with a domain architecture and cation motifs that are virtually identical to the head region of
metazoan ITGA (figure 2, supplemental table 1). The rest of the amoebozoan Type I ITGAs have canonical
beta propellers, but their numbers of cation motifs vary and differ from those of the canonical
ITGA proteins (supplemental table 1). In all but four amoebozoan Type I ITGAs, most of the cation
motifs follow the consensus metazoan amino acid pattern (figure 3A, supplemental table 1).
Thirteen of the amoebozoan Type I ITGAs have both a predicted signal peptide and transmembrane
region (figure 2). The rest of the amoebozoan Type I ITGAs have only a predicted signal peptide
region or transmembrane region. Amoebozoan ITGAs without a transmembrane region may
represent endogenous soluble integrins. In a few cases we did not observe the signal peptide or the
transmembrane region. This may be due to truncation, as we are predicting proteins from a
transcriptome, or may be the product of shedding [34]. However, in the genome of Mastigamoeba
balamuthi all ITGA paralogs lack signal peptides and one lacks a transmembrane region (paralog
1).

Out of 19 Type I amoebozoan ITGA homologs, four ITGAs have an extracellular Ig-fold-like
(IPR013783) or cadherin-like (IPR015919) domain next to the ITGA head (figure 2). The
cadherin-like domain (IPR015919) and the Ig-fold-like domain (IPR013783) belong to the immunoglobulin
fold-like beta sandwich protein class, whose members fold into a similar structure[35]. The three
amoebozoan ITGAs that contain either an Ig-fold-like or a cadherin-like domain also have canonical
β-propellers (6-7 blades) (figure 2), although two of the four Type I ITGA have an extra cation motif
(supplemental table 1).

Amoebozoan ITGA: Type II ITGA with novel domains
We designate any ITGA homolog with a domain flanking the ITGA head that has not previously
been reported in an ITGA protein as a “Type II” ITGA (figure 2). From the 23 amoebozoan taxa
in which we identified an ITGA, 12 taxa have Type II (figure 2). Of the 23 Type II amoebozoan
ITGAs we discovered, two have a protein kinase domain (IPR000719) (figure 2) and five
have a proline rich extension motif called “extensins” (PR01217) (numbers are
SPRINTS ids derived from PRINTS version 42.0 [36] and INTERPROSCAN version 5.27.66
[23] (supplemental methods)) (figure 3F; supplemental table 1).

One amoebozoan Type II sequence (Flabellula citata paralog 1) uniquely has an
epidermal growth factor (EGF)-like domain (IPR000742) as its only novel domain, located
downstream (C-terminal) of the ITGA head (figure 2). Additionally, three amoebozoans uniquely
have two epidermal growth factor (EGF) domains (figure 3B), a Thrombospondin repeat-1 (TSP1)
(IPR000884) (figure 3C), and a carbohydrate binding domain called a wall stress component
(WSC) (IPR002889) (figure 3D), all with cysteine rich motifs, located upstream (N-terminal) of the
ITGA head (figure 2). We were unable to detect any other protein in the nr database with
significant similarity to the N-terminal domain composition of these sequences (EGF-TSP1-WSC)
using an e-value cut-off < 1e-10. This suggests that this EGF-TSP1-WSC architecture may be
unique to these three amoebozoans.

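The e-value threshold mentioned above amounts to a simple filter over BLAST hits. The following sketch assumes the hits are in BLAST+ default tabular output (`-outfmt 6`), in which the e-value is the 11th column. The output format used in the study is not stated here, and the function name `significant_hits` and the example rows are hypothetical.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-10  # threshold stated in the text


def significant_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Yield (query, subject, e-value) for rows with e-value below the cutoff.

    Assumes BLAST+ default tabular columns (-outfmt 6), where the e-value
    is the 11th tab-separated field.
    """
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(row[10])
        if evalue < cutoff:
            yield row[0], row[1], evalue


# Hypothetical rows for illustration only; the identifiers are made up.
example = (
    "query1\tsubjA\t45.0\t200\t110\t0\t1\t200\t5\t204\t3e-40\t150\n"
    "query1\tsubjB\t28.0\t90\t65\t0\t10\t99\t1\t90\t2e-04\t40\n"
)
print(list(significant_hits(example)))
```

Under this reading, a query with no rows passing the filter would be reported as having no significantly similar protein in the searched database.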
Seven of the amoebozoan ITGAs we designated as Type II possess canonical (6-7 blades)
β-propellers (figure 2) and differing numbers of cation motifs (supplemental table 1). The other five
amoebozoan Type II ITGAs have short (3-5 blades) β-propellers and a wide range in the number of
cation motifs (supplemental table 1). Four amoebozoan ITGA homologs of this type contained
both a signal peptide and a transmembrane domain (figure 2). Eight others had either a signal
peptide or a transmembrane region.

Obazoan ITGA: General observations on ITGA
We also found ITGA homologs in several underexplored obazoan transcriptomes: the opisthokont
Ministeria vibrans and the breviate Lenisia limosa (figure 2). Obazoan ITGA proteins have at least
three β-propeller blade (IPR013517) domains, with an FG-GAP domain (figure 2, supplemental
figure 5A) and an FG-GAP/cation binding motif; the positions of these domains are seemingly
distributed randomly (supplemental table 1). Animal-like KXGFFXKR cytoplasmic tail motifs were
observed in most obazoan transcriptomes that were examined in this study (figure 4E, supplemental
figure 5C). Among the non-animal obazoans, the seven novel obazoan ITGAs have an Ig-fold-like leg
domain next to the ITGA head (figure 2; supplemental figure 5D).

Ministeria vibrans has both Type I and Type II paralogs (figure 2). In Type II, paralogs 1 and 2
have a predicted long (10) ITGA β-propeller, but they do not contain transmembrane domains.
The other paralog has few (3) β-propeller blades (one with an FG-GAP/cation binding motif), a
cytosolic tail motif (figure 3E), and a transmembrane domain (figure 2). In both paralogs 1 and 3,
a signal peptide region was not predicted. Ministeria vibrans Type II ITGA uniquely has an
epidermal growth factor (EGF)-like domain (IPR000742) attached upstream (N-terminal) of the
ITGA head (figure 2). All obazoans in our study have a signal peptide or a transmembrane region
except for Pigoraptor spp., Sphaeroforma arctica, and Syssomonas multiformis (figure 2), which
lack both.

Phylogenetic analysis of ITGA
The unrooted tree of ITGA recovered all of the amoebozoan ITGA as a clade (figure 2), which is
generally sister to most of Obazoa. However, a few obazoan sequences are nested in this clade.
These include a paralog of Capsaspora owczarzaki (ITGA4: XP_004347912), both M. vibrans
paralogs, the breviates, and the lone Syssomonas multiformis ITGA. The major lineages or sublineages
of Amoebozoa were not recovered. Instead, amoebozoan ITGA are divided into two clades. One
clade consists primarily of tubulinid amoebozoans (+ Ministeria) and the other contains the rest of
Amoebozoa.
It should be noted that, in addition to these taxa, the genome sequence of the evolutionarily very
distant Stramenopile brown alga Ectocarpus siliculosus revealed the presence of
three ITGA homologs[37] (supplemental figure 8). However, no other available Stramenopile
genome or transcriptome data searched thus far (for a list of taxa searched, please see
supplemental text) contains integrin homologs in our examinations. Therefore, we inferred an
additional ITGA tree with this species (supplemental figure 3). Through BlastP searches against
NCBI-NR we identified several predicted beta propeller proteins in Archaea that have a similar
architecture to the canonical ITGA. An unrooted tree of beta propeller homologs recovered Archaea as
paraphyletic across our phylogeny (supplemental figure 8); this is further discussed below.
We observed more FG-GAP proteins on NCBI in other taxa such as cyanobacteria (e.g. Gloeobacter
violaceus; WP_011143221.1, Nostoc punctiforme; ACC84844.1 and Crocosphaera watsonii;
WP_007307935.1), Haptophyta (Emiliania huxleyi; XP_005778208.1), Cryptophyta (Guillardia
theta; XP_005842481.1), and Stramenopiles (e.g. Nannochloropsis spp. CCMP1776: TFJ84058.1,
EWM21519 and Cafeteria roenbergensis; KAA0148838.1). These FG-GAP proteins
either do not possess transmembrane and signal peptide regions or possess seven or more
transmembrane regions. Thus, in our estimation these FG-GAP proteins fail to meet our minimum
criteria for being an ITGA.

Integrin Beta
Amoebozoan ITGB: General observations on non-metazoan ITGB
We find that ITGB homologs are present in 19 of the 113 amoebozoan transcriptomes examined,
with three ITGB paralogs in Rhizamoeba saxonica and two ITGB paralogs in Idionectes vortex
(figure 4). Amoebozoan ITGBs are present in all three major clades of Amoebozoa.
However, as with ITGA, these proteins were not found in four amoebozoan groups that have genome
sequences (the dictyostelids, entamoebids, acanthamoebids, and Protostelium fungivorum[32]).
We were able to find an ITGB gene in the genome of the amoebozoan Mastigamoeba balamuthi
ATCC 30984 (figure 6). For ITGB, there were five introns in the genomic scaffold
CBKX010020319.1 and eight introns in CBKX010020318.1. These introns are canonical M.
balamuthi introns and the other genes on the scaffold are mastigamoebid/amoebozoan in nature.
We detected a tenascin ORF on ITGB genomic scaffold CBKX010020318.1 and a PA14 ORF on
CBKX010020318.1. We inferred phylogenetic trees for tenascin and PA14 by searching for
homologs in our extensive datasets, both of which clearly show amoebozoan signal (shown as a
subset in figure 6). The presence of introns within the genomic scaffold for ITGB clearly
demonstrates that this protein is indeed part of the genome. Additionally, the
other ORFs on these genomic scaffolds show amoebozoan phylogenetic signal and are not from
contamination in the genome project.

ITGB paralogs in Amoebozoa vary widely in size, from 595 to 3296 amino acids
(supplemental table 2). In addition to the novel ITGB proteins in Amoebozoa, we also found
previously unreported ITGB homologs in two obazoan transcriptomes, as well as in CRuMs, which
is a recently described clade closely related to Amorphea[38]. Based on domain and motif
architecture from IPRSCAN outputs, we classify non-metazoan ITGBs into two representative types:
ITGB that are similar to a canonical ITGB, and ITGB-like proteins with unique domain architectures.

Type I ITGB: similar architecture to Metazoa
Our first representative type of ITGB has a similar domain architecture to metazoan ITGB (figure
4). From the 19 amoebozoan taxa in which we identified an ITGB, 14 taxa have Type I (figure 4).
Most amoebozoans in this class appear to have the vWA domain, β-subunit stalk region and the
PSI domain, except two amoebozoans that lack PSI domains and one amoebozoan that possesses
a duplicated vWA domain (figure 4). Amoebozoa have the CRSM pattern DXCGXCGGX(3-6)CXGC
in the β-subunit stalk region (figure 5F), repeated three to seven times (Supplemental
table 2). Of the β-tail motifs (IPR012896), the cytosolic tail motif (NPXY/F) is present across
Amoebozoa (figure 5G) except in one amoebozoan (supplemental table 2), whereas the ITGA
interacting motif (LLXXXHDRKE) is absent in Amoebozoa. Twelve amoebozoans in this class
possess amino acids with a predicted electrostatic charge where the ITGA interacting motif
(LLXXXHDRKE) would be present in metazoan ITGB (figure 4, figure 5H). MIDAS motifs are
especially well conserved in amoebozoan β-I domains (figure 5A, B), and SyMBS motifs are well
conserved across the amoebozoans (figure 5C, D). However, ADMIDAS motifs are quite variable in
Amoebozoa compared to the metazoan consensus pattern (figure 5B, Supplemental table 2). A
signal peptide and a transmembrane domain were predicted in most amoebozoan ITGB in this
class, although three amoebozoans lack signal peptides (figure 4).

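The two CRSM consensus patterns quoted in the text, DXCGXCGGX(3-6)CXGC for Amoebozoa and CXCXXCXC for Opisthokonta, can be counted in a sequence with regular expressions. The sketch below assumes X stands for any residue and counts non-overlapping matches only. The function name `count_crsm` and the toy fragments are hypothetical and do not come from the dataset.

```python
import re

# DXCGXCGGX(3-6)CXGC from the text, with X read as any residue.
AMOEBOZOAN_CRSM = re.compile(r"D.CG.CGG.{3,6}C.GC")
# CXCXXCXC, the opisthokont pattern given in the text.
OPISTHOKONT_CRSM = re.compile(r"C.C..C.C")


def count_crsm(seq, pattern=AMOEBOZOAN_CRSM):
    """Count non-overlapping CRSM matches in a protein sequence."""
    return len(pattern.findall(seq.upper()))


# Hypothetical stalk fragment assembled from made-up repeats.
repeat_a = "DACGKCGGSNTCEGC"    # spacer of 3 residues (SNT)
repeat_b = "DPCGQCGGAAAAKCYGC"  # spacer of 5 residues (AAAAK)
toy_stalk = "MK" + repeat_a + "TT" + repeat_b + "SS" + repeat_a + "LL"

print(count_crsm(toy_stalk))
print(count_crsm("ACTCAACGCA", OPISTHOKONT_CRSM))
```

Counts obtained this way could be compared with the three to seven repeats reported for amoebozoan ITGB, bearing in mind that divergent repeats will not match a strict pattern.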
The ITGB proteins of non-metazoan obazoans have domain/motif architectures very similar to a
canonical metazoan ITGB (figure 4, supplemental figure 6). Among the β-tail motifs (IPR012896) of
Obazoa, the cytosolic tail motif (NPXY/F) is present in all obazoan ITGB homologs (supplemental
figure 6G), but we did not observe the ITGA interacting motif (LLXXXHDRKE) in ITGB of Breviatea.
We found a metazoan GXXXG motif in three amoebozoan ITGB (supplemental table 2),
suggesting that this motif is not animal-specific. There are multiple tandem repeats of the
opisthokont CRSM pattern in Obazoa ITGB, and both breviate taxa have a clear expansion of the
β-subunit stalk region, as previously reported [6–11] (figure 3). All of the obazoan ITGB MIDAS,
ADMIDAS, and SyMBS motifs were well conserved (supplemental figure 6A-D). We also
investigated a possible “choanoflagellate ITGB gene” that was recently reported to be present in
the transcriptome of Didymoeca costata[39]. The protein domain architecture has a serine protease
flanking the vWA and PSI domains. However, it is missing the CRSM, NPXY/F (IPR021157),
LLXXXHDRKE (IPR014836) and GXXXG motifs. To our surprise, Rigifila ramosa (in CRuMs)
has a protein that contains all the canonical domains of ITGB, such as vWA, the amoebozoan CRSM
pattern and C-terminal motifs (figure 3), and all typical cation motifs (supplemental table 2). We
were unable to find an ITGA homolog in either Rigifila ramosa or Didymoeca costata.

349 Type two ITGB: unique architecture compared to Metazoa and solely found in the amoebozoan
350 clade Variosea
351 Our second representative class of ITGB, which we refer to as Type II, comprises large proteins (1405 aa
352 to 3641 aa) that have novel domain architectures not previously observed in ITGB.
353 Of the 19 amoebozoan taxa that have ITGB, five have an ITGB
354 assigned to our Type II class (figure 4). Type II amoebozoan ITGB domain architectures include
355 Laminin G3 (LamG3) (IPR013320), Von Willebrand Type D (vWD) (IPR001846), Ig-like fold
356 (IPR013783), EGF-like (IPR013032) and a cyclic nucleotide binding (IPR000595) domain (figure
357 4). A canonical human ITGB has four EGF domains [16,18] that coincide with the location of Ig
358 regions in amoebozoan ITGB. Three amoebozoan ITGB in this class have an extracellular EGF-like
359 domain at the N-terminus, at the location of the PSI domain (figure 3). LamG3 (CSM) and
360 EGF-like domains are recovered in two amoebozoan ITGB (figure 3). The cation-binding
361 MIDAS and ADMIDAS motifs are hypervariable in Type II representatives compared to the metazoan
362 consensus pattern, but SyMBS is well conserved except in Filamoeba nolandi (supplemental table
363 2).
364
365
366 Phylogenetic analysis of ITGB
367 We selected Homo sapiens, Gallus gallus, and Nematostella vectensis as a
368 reference group because integrin proteins are highly conserved among Metazoa. The tree of ITGB reveals
369 three major clades when rooted with Opisthokonta as the outgroup (figure 3): type II
370 amoebozoan ITGB; breviate with apusomonad ITGB; and type I amoebozoan ITGB
371 with the CRuMs taxon Rigifila ramosa ITGB. Therefore, we do not recover Amoebozoa as a
372 monophyletic group.
373 Additionally, we investigated the evolutionary history of Sib (Similar to Integrin Beta) proteins.
374 Sib proteins are cell adhesion molecules similar to ITGB, originally discovered in
375 Dictyostelium discoideum [40]. However, they do not appear to have a canonical ITGB architecture
376 (Supplemental result). Additionally, an ITGB homolog was shown in cyanobacteria, but it
377 contained only one of the ITGB domains[6]. Therefore, we inferred an additional ITGB tree with
378 these SIBA proteins from these species (Didymoeca costata, Dictyostelium discoideum and
379 cyanobacteria) (supplemental figure 2). In this tree, a clade of Type II ITGB is placed between
380 the cyanobacteria Trichodesmium and the choanoflagellate Didymoeca. However, these apparent
381 ITGB homologs do not have an obvious ITGB stalk region or NPXY motifs. While they are
382 recovered well with other ITGB, given the important missing motifs and domains, the placement
383 of these in a phylogenetic context alone does not clearly define these as ITGB.
384
385 Scaffold Cytoplasmic proteins
386 In our comparative transcriptome and genome analysis, we observed the following intracellular
387 scaffold cytoskeletal adhesome proteins: talin, α-actinin, vinculin, PINCH, paxillin, and ILK
388 throughout Amoebozoa. These six adhesome proteins have been reported to be present in
389 Amoebozoa[6]. Since then, the number of core consensus adhesome proteins has been expanded[21].
390 Filamin, kindlin, and tensin are three consensus adhesome proteins and cross-actin linkers that
391 interact with ITGB[21]. We have observed two adhesome proteins (filamin, tensin) that were
392 not previously known in Amoebozoa (Supplemental figure 1). We suspect these cytoskeleton proteins may
393 be critical, and the absence of these genes in some amoebozoan taxa is probably a result of low gene
394 coverage in some transcriptomes. Talin, PINCH, filamin, and paxillin, key players in the integrin
395 adhesome, were not present in seven amoebozoan transcriptomes (figure 1B); however, their
396 absence is also suspected to be an artefact of low gene coverage in these data. Across the
397 amoebozoan clade, we also noticed a large number of what appear to be truncated talin proteins compared
398 to the other adhesome proteins (figure 1B).
399
400 ILK is scattered across the Amoebozoa, unlike in Apusomonadida and Opisthokonta[6–10].
401 α-actinin is the only signaling protein that is present in all of the amoebozoan taxa. Consistent with
402 previous reports, we did not detect FAK-cSRC or Parvin in any Amoebozoa or Breviatea[11].
403 Since the gene coverage for Pygsuia (Breviatea) is high and genomes are available for Lenisia
404 limosa (Breviatea) and Entamoeba spp. (Amoebozoa), we conclude that breviates and Entamoeba
405 spp. likely do not possess vinculin proteins (supplemental figure 1). The genome of Ectocarpus siliculosus
406 has talin and α-actinin homologs[37], but it does not contain any other
407 integrin-signaling adhesome protein homologs.
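
The presence/absence reasoning in this section amounts to tabulating, for each taxon, which adhesome components were detected. A minimal sketch of that bookkeeping follows. The component list is taken from the text (the six previously reported proteins plus filamin and tensin); the taxon names and detected sets are placeholders rather than data from this study, and `missing_components` is a hypothetical helper.

```python
# Scaffold/signaling components named in the text.
IMAC_COMPONENTS = ["talin", "alpha-actinin", "vinculin", "PINCH",
                   "paxillin", "ILK", "filamin", "tensin"]

def missing_components(detected):
    """Map each taxon to the components not detected, in IMAC_COMPONENTS order."""
    return {taxon: [c for c in IMAC_COMPONENTS if c not in found]
            for taxon, found in detected.items()}

# Placeholder input: taxon names and contents are invented for illustration.
detected = {
    "taxon_A": {"talin", "alpha-actinin", "vinculin", "PINCH",
                "paxillin", "ILK", "filamin", "tensin"},
    "taxon_B": {"alpha-actinin", "vinculin", "ILK"},
}

for taxon, missing in missing_components(detected).items():
    status = "complete" if not missing else "missing: " + ", ".join(missing)
    print(taxon, status)
```

As noted above, a component missing from a transcriptome may reflect low gene coverage, so an entry in the missing list is weaker evidence of loss than the same absence in a complete genome.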
408
409
410
411 DISCUSSION
412 Our findings show that the IMAC is not exclusive to obazoans as previously proposed[6,8–11].
413 We report that a nearly full set of IMAC proteins is present throughout Amorphea, including five
414 amoebozoan species (Flabellula citata, Rhizamoeba saxonica, Cryptodifflugia operculatum,
415 Mastigamoeba balamuthi, and Nebela sp.) and two previously unsurveyed obazoan taxa, Lenisia
416 limosa (Breviatea) and Ministeria vibrans (Opisthokonta). Our increased taxon sampling that
417 spans the breadth of the known diversity within Amoebozoa permitted discovery of these proteins.
418 While the integrin proteins undoubtedly played a role in the origins of animal
419 multicellularity[6,41,42], we do not find any evidence of integrins in the few multicellular lineages
420 of Amoebozoa, those which socially aggregate to form emergent fruiting body structures
421 (Copromyxa protea and the model dictyostelids). Interestingly, secondary loss of these proteins
422 seems to be rampant, as the majority of amoebozoan taxa appear to lack integrins or some of the
423 IMAC components, but the caveat is that some of these observations are based on incomplete
424 transcriptomic data. Therefore, their absence may reflect the available data. Nonetheless, the
425 most parsimonious explanation of IMAC evolution is that the ancestor of Amoebozoa
426 contained them. An analogous scenario of secondary loss of the IMAC plays out in obazoan
427 evolution, in which we see that the closest relative of animals, the Choanoflagellata[39,43,44], as
428 well as the Nucletmycea (also referred to as the Holomycota, the fungal lineage) are devoid of key
429 IMAC components (namely the integrins)[6,11].
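
The parsimony argument above can be made concrete with a Dollo-style count, which assumes that a complex trait is gained once and can only be lost afterwards. The sketch below illustrates that assumption and is not an analysis performed in this study; the toy tree, the tip names, and the presence calls are invented, and `tips` and `dollo_losses` are hypothetical helpers.

```python
def tips(node):
    """Return the set of tip names under a node (a tip is a str, a clade a tuple)."""
    if isinstance(node, str):
        return {node}
    return set().union(*(tips(child) for child in node))

def dollo_losses(tree, present):
    """Place a single gain and count losses under a Dollo (gain-once) model.

    The gain is placed on the smallest clade containing every tip in `present`.
    Each maximal clade inside it with no trait-bearing tip counts as one loss.
    Returns (tips of the clade where the gain occurs, number of losses).
    """
    present = set(present)
    if not present:
        raise ValueError("at least one tip must carry the trait")
    node = tree
    # Descend while a single child still contains every trait-bearing tip.
    while not isinstance(node, str):
        child = next((c for c in node if present <= tips(c)), None)
        if child is None:
            break
        node = child

    def count(sub):
        if not (tips(sub) & present):
            return 1          # whole clade lacks the trait: one loss
        if isinstance(sub, str):
            return 0          # trait-bearing tip
        return sum(count(c) for c in sub)

    return tips(node), count(node)

# Invented toy topology and presence calls, for illustration only.
toy_tree = ((("amoeba_1", "amoeba_2"), "amoeba_3"),
            (("animal", "choanoflagellate"), ("fungus", "breviate")))
gain_clade, n_losses = dollo_losses(toy_tree, {"amoeba_1", "animal", "breviate"})
print(sorted(gain_clade), n_losses)
```

Under this model, presence on both sides of the root places the single gain at the common ancestor and treats every absence as a secondary loss. Models that allow repeated gains can reconstruct the ancestor differently, so the conclusion depends on accepting that the complex is unlikely to have arisen more than once.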
430
431 The functional domains of ITGA in amoebozoans are similar to the canonical ITGA found in
432 animals and their close relatives like the opisthokont filastereans Capsaspora owczarzaki[6],
433 Ministeria vibrans, and Pigoraptor spp.[7]. Given the homologous nature of the functional domain
434 architecture in amoebozoan ITGA, it is likely that they function at the cellular-level in a similar
435 manner to those in animals. Therefore, we suspect the last common ancestor of Amorphea (LCAA)
436 probably already had ITGA within its genome. Of note, there are potential ITGA homologs in
437 Archaea; therefore, it is likely that ITGA homologs predate the eukaryotic last common ancestor but
438 have been retained mostly in Amorphea. We also suspect that the ITGA Ig-like domains possessed
439 by some amoebozoans and obazoans are equivalent to metazoan ITGA legs originally thought to
440 be specific to metazoans[6,11]. An ITGA leg domain was probably in the ITGA of the LCAA, and
441 secondarily lost in some taxa.
442
443 Some novel functional domains in Type II representatives are integrin binding domains that are
444 present in ECM proteins, such as LamG[45], TSP-1[46], and EGF-like domains[47]. Could the last
445 common ancestor of Type II representatives or the LCAA have contained these integrin binding domains?
446 Other functional domains in amoebozoan ITGA, such as protein kinase, TSP-1 together with WSC,
447 and intramolecular chaperone, are not found in Obazoa. Domain shuffling is an important
448 mechanism for creating novel genes in evolution[48]. It has been suggested that many domains
449 in the ECM are found in unrelated proteins[49]. Whether these proteins were part of the domain
450 shuffling in ITGA of the last common ancestor of Amoebozoa, which resulted in ECM evolution,
451 remains to be seen, and their function remains elusive. However, based on our results, it is possible
452 that we have predicted novel integrin proteins with catalytic as well as adhesive properties.
453
454 The domains of amoebozoan ITGB are very similar to the canonical metazoan ITGB and we
455 suspect that the last common ancestor of Amoebozoa retained these canonical ITGB structures
456 including the CRSM motif and three metal binding sites. The cytosolic tail motif “GFFKR” is
457 probably specific to Obazoa, and there may be no interaction in the C-terminal intracellular region
458 of integrins in Amoebozoa. The presence of transmembrane and signal peptide regions is
459 sporadic in Amorphea, including metazoans; a previous study showed that shedding (evolution from
460 a membrane-docked to a secretory function) of integrins occurs[27]. The amoebozoan ITGB
461 cysteine pattern of CGXCGGXXXXCXGC possesses more amino acids, including glycine,
462 compared to Obazoa (figure 5F). Glycine is a small non-polar amino acid[50] and, due to its small
463 size, glycine positions are often highly conserved within proteins[51,52] because glycine residues
464 maintain the overall architecture[53]. One interpretation is that the additional glycine residues help
465 maintain the overall protein structure in amoebozoan ITGB.
466
467 The type II ITGB form an independent clade that is sister to the other amoebozoans and R.
468 ramosa in figure 4. LamG3 and vWD in ITGB have never been observed in Obazoa, and the
469 functions of these type II ITGB proteins are unknown. Some unicellular protists such as breviates,
470 Capsaspora owczarzaki, and apusomonads have a long expansion of the CRSM for ITGB [6,11].
471 We suspect that breviates, with their long integrin legs, form a long integrin heterodimerization
472 complex at the extracellular area of the cell, as both ITGA and ITGB have long legs of nearly equal
473 proportions in Pygsuia biforma (figures 2 and 3)[11].
474
475 Our prediction is that most of Discosea, which includes Acanthamoeba (again with several
476 genomes available) and many other sampled transcriptomes in our study, have lost these proteins.
477 However, one representative discosean (Mycamoeba gemmipara) has a canonical ITGB homolog
478 (figure 4). Additionally, four other amoebozoan taxa with genome sequences available also appear
479 to have lost ITGB in their evolution. There are amoebozoans in which we have observed only
480 ITGA or ITGB receptors, without their counterpart integrin protein. Studies by Li et al. and
481 Schneider et al. showed that homodimerization of ITGA or ITGB might occur by clustering of
482 integrins in a lipid raft[54,55]. Our results show that many of the amoebozoan species possess
483 either ITGA or ITGB alone and yet possess all the signaling components of the IMAC.
484 These amoebozoans with a single integrin type may be capable of initiating the integrin signaling
485 pathway by a similar process.
486
487 Also, questions remain: the complexity of the extra integrin domains is not species related,
488 and these domains are combined in single multidomain proteins that are present only in Amoebozoa.
489 Phylogenetically these orthologs (i.e., amoebozoan ITGA and ITGB) form novel clades
490 independent of known organisms and most likely are ancestral to the Amoebozoa as a whole. In
491 order to further examine the overall distributions and protein architectures, genomic sequencing
492 will be necessary, and a combination of localization and gene knockout strategies will be needed
493 to understand the function of these ‘multicellularity’ proteins in unicellular taxa.
494
495 Conclusion
496 Our comprehensive comparative genomic and transcriptomic analysis of Amoebozoa has revealed the
497 presence of a full set of IMAC proteins across the Amoebozoa. To date, this complex was believed to be
498 Obazoa specific; our data show that these IMAC proteins predate Obazoa. Therefore, the last
499 common ancestor of Amorphea contained the necessary machinery of the IMAC, yet it remains
500 unknown if and what functional role they play in these unicellular protists. In future work,
501 localization and functionality studies of these IMAC proteins will be conducted.
502
503 Acknowledgements
504 This project was supported in part by the United States National Science Foundation (NSF)
505 Division of Environmental Biology (DEB) grant 1456054 (http://www.nsf.gov), awarded to MWB.
506 We thank Dr. Andrew J. Roger (Dalhousie University) for advance access to the Mastigamoeba
507 balamuthi transcriptome, which was supported by grant MOP-142349 from the Canadian Institutes
508 of Health Research awarded to A.J. Roger.
509 510 FIGURE LEGENDS
Figure 1: A) Repertoire of IMAC proteins in amoebozoan species that have either integrin alpha or beta. Names of IMAC proteins are listed at the bottom and names of amoebozoan taxa are listed on the side of the presence/absence heat map. Arrows indicate amoebozoan species that have a nearly complete set of IMAC proteins. Dark blue indicates the presence of a canonical IMAC protein, white indicates the absence of an IMAC protein, and light blue indicates a truncated form of an IMAC protein. B) Phylogenetic distribution of the IMAC. Circle, innovation/gain; bar, loss. Abbreviations: vin, vinculin; pax, paxillin; tal, talin; par, parvin; pin, particularly interesting new cysteine-histidine-rich protein; FAK, focal adhesion kinase; ILK, integrin-linked kinase; αA, alpha-actinin; α, α-integrin; β, β-integrin. C) Cartoon schematic of the membrane-docked IMAC, which associates with intracellular actin (modified from Brown et al., 2013); color corresponds with the species tree.
Figure 2: Domain architecture of ITGA from Amoebozoa. The domain architectures of ITGA for amoebozoan species are shown on the right side and the phylogenetic tree of Amoebozoa is shown on the left side of the figure. Keys to each domain’s color and shape are given at the bottom of the figure. The phylogenetic tree of amoebozoan ITGA was inferred from 160 sites and rooted with Obazoa ITGA. The tree was built using IQTREE v1.5.5 under the LG+C60+F+G model of protein evolution. Numbers at nodes are ML bootstrap (MLBS) values derived from 1000 ultrafast MLBS replicates. Values below 50% are not shown. The branch colors of amoebozoan sequences follow Kang et al. figure 3[14]. Where domains overlap because of different software algorithms, they are represented in transparent colors.
ITGA Motif [sequence logos, panels A-F]
Figure 3: Most significantly enriched motifs in 23 amoebozoan ITGA homologs, as obtained by MEME-suite except for (E). (A) Consensus cation motif and FG-GAP motifs found in 23 amoebozoan ITGA homologs. (B-D) These consensus motifs are exclusively enriched in three of the 23 amoebozoan ITGA homologs: (B) EGF-like motif, (C) Thrombospondin repeat 1 motif, (D) Carbohydrate WSC motif. (E) GFFKR motif found in three ITGA homologs from Filamoeba nolandi and two obazoan species, Ministeria vibrans and Pygsuia biforma. (F) Proline-rich extensin motif found in 8 of the 23 amoebozoan ITGA homologs.
Figure 4: Phylogenetic tree of ITGB homologs with a focus on Amoebozoa. The domain architectures of ITGB homologs are shown to the right of the phylogenetic tree of ITGB. Keys to each domain’s color and shape are given in the box at the bottom of the figure. The tree of amoebozoan ITGB was inferred from 295 amino acid sites and rooted with Obazoa. The tree was built using IQTREE v1.5.5 under the LG+C60+F+G model of protein evolution. Numbers at nodes on the phylogenetic tree are maximum likelihood bootstrap support values derived from 1000 ultrafast bootstrap replicates. Values below 50% are not shown. The branch colors of amoebozoan sequences follow Kang et al. figure 3[14]. Where domains overlap because of different software algorithms, they are represented in transparent colors.
ITGB Motif [sequence logos, panels A-H]
Figure 5: Most significantly enriched motifs in 18 amoebozoan ITGB homologs, as obtained by MEME-suite. (A)-(D) Three divalent cation binding motifs, MIDAS, ADMIDAS and SyMBS, are shown. (A) MIDAS (black arrow) and ADMIDAS (red arrow) motifs. (B) An amino acid shared by the MIDAS and ADMIDAS motifs (indicated by arrow). (C) SyMBS motif (green arrow) and an amino acid shared with the MIDAS motif (black arrow). These consensus motifs are obtained from type I amoebozoan ITGB homolog sequences only. (D) SyMBS motif (green arrow). (E-F) Cysteine-rich motifs: (E) PSI domain and (F) cysteine-rich stalk motif. These are recovered from type I amoebozoan ITGB sequences. (G-H) Motifs found at the C-terminus. (G) NPXY motif (blue arrow), found in most amoebozoan ITGB homologs except three. (H) An electrostatic charge motif, found exclusively in type I ITGB homologs.
Figure 6: The presence of the ITGB gene in a Mastigamoeba balamuthi genome sequence. Intron/exon boundary sequences are shown along with the corresponding intron sizes. The genome contig that possesses the ITGB gene is colored blue and green. The protein domain architecture of the ITGB gene is shown along with protein size. Adjacent tenascin genes on the ITGB genome contig are shown left and right along with the inferred phylogenetic tree.
587 Supplemental Materials
588 Supplemental Results: Details on the Sib (similar to integrin beta) proteins found in dictyostelids.
589 Supplemental Figure 1: Repertoire of IMAC including filamin and tensin in Amoebozoa.
590 Supplemental Figure 2: Repertoire of IMAC in Amoebozoa species. Supplemental Figure 3:
591 Maximum likelihood tree of ITA homolog and other cell adhesion molecules that contain beta-
592 propellers. Supplemental Figure 4: Maximum likelihood tree of ITB homolog and Sib proteins.
593 Supplemental Figure 6: Most significantly enriched motifs in Obazoan ITA, obtained by MEME-
594 suite. Supplemental Figure 7: Most significantly enriched motifs in Obazoan ITB, obtained by
595 MEME-suite. Supplemental Figure 8: Most significantly enriched motifs in Sib protein, obtained
596 by MEME.
597
598
599 Supplemental Table 1: The table lists characteristics of Amorphea ITGA proteins,
600 including: length, beta propeller motifs, subcellular location, any extra domains attached to ITGA,
601 C-terminal motif, signal peptide, transmembrane regions and type of ITGA. The taxa are
602 listed in accordance with Figure 2. The subcellular localization is displayed with the likelihood value
603 in parentheses. The location of extra domains attached to ITGA is shown as the C-terminal end
604 (CTE) or at the start of the N-terminus (NTB). The presence of the ITGA interacting motif with ITGB is
605 shown as GFFKR and GXXXG. The location of beta propeller motifs identified by MEME-suite
606 is displayed. The presence of FG-GAP motifs is shown by asterisks. The locations of beta motifs
607 that correspond to IPRSCAN output are highlighted by parentheses. The presence of consensus
608 cation motifs is indicated by a superscript 2.
609
610 Supplemental Table 2: The table shows ITGB motifs, subcellular localizations, signal peptide,
611 transmembrane regions and extra domains attached to ITGB proteins. The taxa are listed in
612 accordance with Figure 3. For the ITGB motifs MIDAS, ADMIDAS and SyMBS, any non-canonical
613 amino acids are highlighted in red; cysteine-rich motifs are shown as either EGF or PSI
614 domains. The presence of C-terminal motifs is shown as KLXXXD or NPXY/F and GXXXG.
615 The parentheses indicate the number of motifs present in ITGB.
616
617 Supplemental Table 3: This table shows RSEM values for protists that have either a complete set of
618 IMAC proteins, integrins with extra domains, or an unexpected integrin protein. The values
619 are shown in transcripts per million (TPM).
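
For readers unfamiliar with the unit, TPM divides each transcript's read count by its length and then rescales so that all values in a sample sum to one million. The following is a minimal sketch of that standard definition with invented numbers. RSEM itself estimates expected counts and effective lengths, which this sketch simply takes as given inputs, and the function name `tpm` is an illustrative assumption.

```python
def tpm(counts, lengths):
    """Transcripts per million from read counts and (effective) lengths in bases."""
    rates = [c / l for c, l in zip(counts, lengths)]  # reads per base
    total = sum(rates)
    return [r / total * 1e6 for r in rates]

# Invented example values, not taken from Supplemental Table 3.
values = tpm(counts=[100, 300, 600], lengths=[1000, 1500, 2000])
print([round(v, 1) for v in values])
```

Because the values are rescaled within each sample, TPM is comparable across transcripts of different lengths in one transcriptome, but a low value for an integrin transcript does not by itself distinguish low expression from shallow sequencing.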
620
621 Supplemental Table 4: A). Summary of amoebozoan ITA type I table listing: number of paralogs,
622 Beta propeller, FG-GAP and canonical motifs. The table also shows presence of signal peptide
623 and transmembrane region, c-terminal interacting motif, GXXXG motif and Ig-fold like/Cadherin
624 domain. B). Summary of amoebozoan ITA type II table listing: number of paralogs, Beta
625 propeller, FG-GAP and canonical motifs. The table also shows presence of signal peptide and
626 transmembrane region, c-terminal interacting motif, GXXXG motif and extra domain attached to
627 ITA.
628
629 REFERENCES
1. Hynes, R.O. (2002). Integrins: bidirectional, allosteric signaling machines. Cell 110, 673–687.
2. LaFlamme, S.E., and Auer, K.L. (1996). Integrin signaling. Semin. Cancer Biol. 7, 111–118.
3. Harburger, D.S., and Calderwood, D.A. (2009). Integrin signalling at a glance. J. Cell Sci. 122, 159–163.
4. Horton, E.R., Byron, A., Askari, J.A., Ng, D.H.J., Millon-Frémillon, A., Robertson, J., Koper, E.J., Paul, N.R., Warwood, S., Knight, D., et al. (2015). Definition of a consensus integrin adhesome and its dynamics during adhesion complex assembly and disassembly. Nat. Cell Biol. 17, 1577.
5. LaFlamme, S.E., Mathew-Steiner, S., Singh, N., Colello-Borges, D., and Nieves, B. (2018). Integrin and microtubule crosstalk in the regulation of cellular processes. Cell. Mol. Life Sci. 75, 4177–4185.
6. Sebé-Pedrós, A., Roger, A.J., Lang, F.B., King, N., and Ruiz-Trillo, I. (2010). Ancient origin of the integrin-mediated adhesion and signaling machinery. Proc. Natl. Acad. Sci. 107, 10142.
7. Hehenberger, E., Tikhonenkov, D.V., Kolisko, M., del Campo, J., Esaulov, A.S., Mylnikov, A.P., and Keeling, P.J. (2017). Novel Predators Reshape Holozoan Phylogeny and Reveal the Presence of a Two-Component Signaling System in the Ancestor of Animals. Curr. Biol. 27, 2043-2050.e6.
8. Grau-Bové, X., Torruella, G., Donachie, S., Suga, H., Leonard, G., Richards, T.A., and Ruiz-Trillo, I. (2017). Dynamics of genomic innovation in the unicellular ancestry of animals. eLife 6, e26036.
9. de Mendoza, A., Suga, H., Permanyer, J., Irimia, M., and Ruiz-Trillo, I. (2015). Complex transcriptional regulation and independent evolution of fungal-like traits in a relative of animals. eLife 4, e08904.
10. Dudin, O., Ondracka, A., Grau-Bové, X., A.B. Haraldsen, A., Toyoda, A., Suga, H., Bråte, J., and Ruiz-Trillo, I. (2019). A unicellular relative of animals generates an epithelium-like cell layer by actomyosin-dependent cellularization. bioRxiv, 563726.
11. Brown, M.W., Sharpe, S.C., Silberman, J.D., Heiss, A.A., Lang, B.F., Simpson, A.G., and Roger, A.J. (2013). Phylogenomics demonstrates that breviate flagellates are related to opisthokonts and apusomonads. Proc. R. Soc. Lond. B Biol. Sci. 280, 20131755.
12. Adl, S.M., Bass, D., Lane, C.E., Lukeš, J., Schoch, C.L., Smirnov, A., Agatha, S., Berney, C., Brown, M.W., Burki, F., et al. (2018). Revisions to the Classification, Nomenclature, and Diversity of Eukaryotes. J. Eukaryot. Microbiol. 0. Available at: https://doi.org/10.1111/jeu.12691 [Accessed October 18, 2018].
13. Cavalier-Smith, T. (2017). Origin of animal multicellularity: precursors, causes, consequences—the choanoflagellate/sponge transition, neurogenesis and the Cambrian explosion. Philos. Trans. R. Soc. B Biol. Sci. 372. Available at: http://rstb.royalsocietypublishing.org/content/372/1713/20150476.abstract.
14. Kang, S., Tice, A.K., Spiegel, F.W., Silberman, J.D., Pánek, T., Čepička, I., Kostka, M., Kosakyan, A., Alcântara, D., and Roger, A.J. (2017). Between a pod and a hard test: the deep evolution of amoebae. Mol. Biol. Evol.
15. Hamann, E., Gruber-Vodicka, H., Kleiner, M., Tegetmeyer, H.E., Riedel, D., Littmann, S., Chen, J., Milucka, J., Viehweger, B., Becker, K.W., et al. (2016). Environmental Breviatea harbour mutualistic Arcobacter epibionts. Nature 534, 254.
16. Zhang, K., and Chen, J. (2012). The regulation of integrin function by divalent cations. Cell Adhes. Migr. 6, 20–29.
17. Xia, W., and Springer, T.A. (2014). Metal ion and ligand binding of integrin α5β1. Proc. Natl. Acad. Sci. 111, 17863–17868.
18. Campbell, I.D., and Humphries, M.J. (2011). Integrin structure, activation, and interactions. Cold Spring Harb. Perspect. Biol. 3, a004994.
19. Lee, J.-O., Rieu, P., Arnaout, M.A., and Liddington, R. Crystal structure of the A domain from the α subunit of integrin CR3 (CD11b/CD18). Cell 80, 631–638.
20. Xiong, J.-P., Stehle, T., Diefenbach, B., Zhang, R., Dunker, R., Scott, D.L., Joachimiak, A., Goodman, S.L., and Arnaout, M.A. (2001). Crystal structure of the extracellular segment of integrin αVβ3. Science 294, 339–345.
21. Humphries, J.D., Chastney, M.R., Askari, J.A., and Humphries, M.J. (2019). Signal transduction via integrin adhesion complexes. Cell Archit. 56, 14–21.
22. Srichai, M.B., and Zent, R. (2010). Integrin structure and function. In Cell-Extracellular Matrix Interactions in Cancer (Springer), pp. 19–41.
23. Finn, R.D., Attwood, T.K., Babbitt, P.C., Bateman, A., Bork, P., Bridge, A.J., Chang, H.-Y., Dosztányi, Z., El-Gebali, S., Fraser, M., et al. (2017). InterPro in 2017—beyond protein family and domain annotations. Nucleic Acids Res. 45, D190–D199.
24. Oxvig, C., and Springer, T.A. (1998). Experimental support for a beta-propeller domain in integrin alpha-subunits and a calcium binding site on its lower surface. Proc. Natl. Acad. Sci. U. S. A. 95, 4870–4875.
25. Humphries, M.J., Symonds, E.J.H., and Mould, A.P. (2003). Mapping functional residues onto integrin crystal structures. Curr. Opin. Struct. Biol. 13, 236–243.
26. Jin, R., Trikha, M., Cai, Y., Grignon, D., and Honn, K.V. (2007). A naturally occurring truncated β3 integrin in tumor cells: Native anti-integrin involved in tumor cell motility. Cancer Biol. Ther. 6, 1559–1568.
27. Gomez, I.G., Tang, J., Wilson, C.L., Yan, W., Heinecke, J.W., Harlan, J.M., and Raines, E.W. (2012). Metalloproteinase-mediated Shedding of Integrin β2 promotes macrophage efflux from inflammatory sites. J. Biol. Chem. 287, 4581–4589.
28. Tan, S.-M., Walters, S.E., Mathew, E.C., Robinson, M.K., Drbal, K., Shaw, J.M., and Law, S.K.A. (2001). Defining the repeating elements in the cysteine-rich region (CRR) of the CD18 integrin β2 subunit. FEBS Lett. 505, 27–30.
29. Russ, W.P., and Engelman, D.M. (2000). The GxxxG motif: A framework for transmembrane helix-helix association. J. Mol. Biol. 296, 911–919.
30. Mould, A.P., Barton, S.J., Askari, J.A., Craig, S.E., and Humphries, M.J. (2003). Role of ADMIDAS cation-binding site in ligand recognition by integrin α5β1. J. Biol. Chem. 278, 51622–51629.
31. Calderwood, D.A., Fujioka, Y., de Pereda, J.M., García-Alvarez, B., Nakamoto, T., Margolis, B., McGlade, C.J., Liddington, R.C., and Ginsberg, M.H. (2003). Integrin β cytoplasmic domain interactions with phosphotyrosine-binding domains: A structural prototype for diversity in integrin signaling. Proc. Natl. Acad. Sci. 100, 2272.
32. Hillmann, F., Forbes, G., Novohradská, S., Ferling, I., Riege, K., Groth, M., Westermann, M., Marz, M., Spaller, T., Winckler, T., et al. (2018). Multiple Roots of Fruiting Body Formation in Amoebozoa. Genome Biol. Evol. 10, 591–606.
33. Nývltová, E., Šuták, R., Harant, K., Šedinová, M., Hrdý, I., Pačes, J., Vlček, Č., and Tachezy, J. (2013). NIF-type iron-sulfur cluster assembly system is duplicated and distributed in the mitochondria and cytosol of Mastigamoeba balamuthi. Proc Natl Acad Sci U A 110, 7371–7376.
34. Metcalf, J.A., Funkhouser-Jones, L.J., Brileya, K., Reysenbach, A.-L., and Bordenstein, S.R. (2014). Antibacterial gene transfer across the tree of life. Elife 3, e04266.
35. Clarke, J., Cota, E., Fowler, S.B., and Hamill, S.J. (1999). Folding studies of immunoglobulin-like β-sandwich proteins suggest that they share a common folding pathway. Structure 7, 1145–1153.
36. Attwood, T.K., Bradley, P., Flower, D.R., Gaulton, A., Maudling, N., Mitchell, A.L., Moulton, G., Nordle, A., Paine, K., Taylor, P., et al. (2003). PRINTS and its automatic supplement, prePRINTS. Nucleic Acids Res. 31, 400–402.
37. Cock, J.M., Sterck, L., Rouzé, P., Scornet, D., Allen, A.E., Amoutzias, G., Anthouard, V., Artiguenave, F., Aury, J.-M., Badger, J.H., et al. (2010). The Ectocarpus genome and the independent evolution of multicellularity in brown algae. Nature 465, 617.
38. Brown, M.W., Heiss, A.A., Kamikawa, R., Inagaki, Y., Yabuki, A., Tice, A.K., Shiratori, T., Ishida, K.-I., Hashimoto, T., Simpson, A.G.B., et al. (2018). Phylogenomics Places Orphan Protistan Lineages in a Novel Eukaryotic Super-Group. Genome Biol. Evol. 10, 427–433.
39. Richter, D., Fozouni, P., Eisen, M., and King, N. (2017). The ancestral animal genetic toolkit revealed by diverse choanoflagellate transcriptomes. bioRxiv. Available at: http://biorxiv.org/content/early/2017/12/26/211789.abstract.
40. Cornillon, S., Gebbie, L., Benghezal, M., Nair, P., Keller, S., Wehrle-Haller, B., Charette, S.J., Brückert, F., Letourneur, F., and Cosson, P. (2006). An adhesion molecule in free-living Dictyostelium amoebae with integrin β features. EMBO Rep. 7, 617–621.
41. Suga, H., Chen, Z., de Mendoza, A., Sebé-Pedrós, A., Brown, M.W., Kramer, E., Carr, M., Kerner, P., Vervoort, M., Sánchez-Pons, N., et al. (2013). The Capsaspora genome reveals a complex unicellular prehistory of animals. 4, 2325.
42. Brunet, T., and King, N. (2017). The Origin of Animal Multicellularity and Cell Differentiation. Dev. Cell 43, 124–140.
43. King, N., Westbrook, M.J., Young, S.L., Kuo, A., Abedin, M., Chapman, J., Fairclough, S., Hellsten, U., Isogai, Y., and Letunic, I. (2008). The genome of the choanoflagellate Monosiga brevicollis and the origin of metazoans. Nature 451, 783–788.
44. Fairclough, S.R., Chen, Z., Kramer, E., Zeng, Q., Young, S., Robertson, H.M., Begovic, E., Richter, D.J., Russ, C., and Westbrook, M.J. (2013). Premetazoan genome evolution and the regulation of cell differentiation in the choanoflagellate Salpingoeca rosetta. Genome Biol. 14, R15.
45. Pulido, D., Hussain, S.-A., and Hohenester, E. (2017). Crystal Structure of the Heterotrimeric Integrin-Binding Region of Laminin-111. Struct. Lond. Engl. 1993 25, 530–535.
46. Calzada, M.J., Annis, D.S., Zeng, B., Marcinkiewicz, C., Banas, B., Lawler, J., Mosher, D.F., and Roberts, D.D. (2004). Identification of novel β1 integrin binding sites in the type 1 and type 2 repeats of thrombospondin-1. J. Biol. Chem. 279, 41734–41743.
47. Hynes, R.O. (2012). The evolution of metazoan extracellular matrix. J. Cell Biol. 196, 671.
48. Babushok, D.V., Ohshima, K., Ostertag, E.M., Chen, X., Wang, Y., Mandal, P.K., Okada, N., Abrams, C.S., and Kazazian, H.H., Jr (2007). A novel testis ubiquitin-binding protein gene arose by exon shuffling in hominoids. Genome Res. 17, 1129–1138.
49. Adams, J.C. (2013). Extracellular Matrix Evolution: An Overview. In Evolution of Extracellular Matrix, F. W. Keeley and R. P. Mecham, eds. (Berlin, Heidelberg: Springer Berlin Heidelberg), pp. 1–25. Available at: https://doi.org/10.1007/978-3-642-36002-2_1.
50. Lodish, H., Darnell, J.E., Berk, A., Kaiser, C.A., Krieger, M., Scott, M.P., Bretscher, A., Ploegh, H., and Matsudaira, P. (2008). Molecular cell biology (Macmillan).
51. Creighton, T.E. (1993). Proteins: structures and molecular properties (Macmillan).
52. Branden, C.I., and Tooze, J. (2012). Introduction to protein structure (Garland Science).
53. Jacob, J., Duclohier, H., and Cafiso, D.S. (1999). The Role of Proline and Glycine in Determining the Backbone Flexibility of a Channel-Forming Peptide. Biophys. J. 76, 1367–1376.
54. Li, R., Mitra, N., Gratkowski, H., Vilaire, G., Litvinov, R., Nagasami, C., Weisel, J.W., Lear, J.D., DeGrado, W.F., and Bennett, J.S. (2003). Activation of integrin αIIbβ3 by modulation of transmembrane helix associations. Science 300, 795–798.
55. Schneider, D., and Engelman, D.M. (2004). Involvement of transmembrane domain interactions in signal transduction by α/β integrins. J. Biol. Chem. 279, 9840–9846.
56. Clarke, M., Lohan, A.J., Liu, B., Lagkouvardos, I., Roy, S., Zafar, N., Bertelli, C., Schilde, C., Kianianmomeni, A., and Burglin, T.R. (2013). Genome of Acanthamoeba castellanii highlights extensive lateral gene transfer and early evolution of tyrosine kinase signaling. Genome Biol 14.
57. Tice, A.K., Shadwick, L.L., Fiore-Donno, A.M., Geisen, S., Kang, S., Schuler, G.A., Spiegel, F.W., Wilkinson, K.A., Bonkowski, M., Dumack, K., et al. (2016). Expansion of the molecular and morphological diversity of Acanthamoebidae (Centramoebida, Amoebozoa) and identification of a novel life cycle type within the group. Biol. Direct 11, 69.
58. Urushihara, H., Kuwayama, H., Fukuhara, K., Itoh, T., Kagoshima, H., Shin-I, T., Toyoda, A., Ohishi, K., Taniguchi, T., Noguchi, H., et al. (2015). Comparative genome and transcriptome analyses of the social amoeba Acytostelium subglobosum that accomplishes multicellular development without germ-soma differentiation. BMC Genomics 16, 80–80.
59. Lahr, D.J.G., Kosakyan, A., Lara, E., Mitchell, E.A.D., Morais, L., Porfirio-Sousa, A.L., Ribeiro, G.M., Tice, A.K., Pánek, T., Kang, S., et al. (2019). Phylogenomics and Morphological Reconstruction of Arcellinida Testate Amoebae Highlight Diversity of Microbial Eukaryotes in the Neoproterozoic. Curr. Biol. 29, 991-1001.e3.
60. Tekle, Y.I., Anderson, O.R., Katz, L.A., Maurer-Alcalá, X.X., Romero, M.A.C., and Molestina, R. (2016). Phylogenomics of ‘Discosea’: A new molecular phylogenetic perspective on Amoebozoa with flat body forms. Mol Phylogenet Evol 99, 144–154.
61. Heidel, A.J., and Glöckner, G. (2008). Mitochondrial Genome Evolution in the Social Amoebae. Mol. Biol. Evol. 25, 1440–1450.
62. Eichinger, L., Pachebat, J.A., Glockner, G., Rajandream, M.A., Sucgang, R., Berriman, M., Song, J., Olsen, R., Szafranski, K., Xu, Q., et al. (2005). The genome of the social amoeba Dictyostelium discoideum. Nature 435, 43–57.
63. Gonzales, C.M., Spencer, T.D., Pendley, S.S., and Welker, D.L. (1999). Dgp1 and Dfp1 Are Closely Related Plasmids in the Dictyostelium Ddp2 Plasmid Family. Plasmid 41, 89–96.
64. Sucgang, R., Kuo, A., Tian, X., Salerno, W., Parikh, A., Feasley, C.L., Dalin, E., Tu, H., Huang, E., Barry, K., et al. (2011). Comparative genomics of the social amoebae Dictyostelium discoideum and Dictyostelium purpureum. Genome Biol. 12, R20–R20.
65. Loftus, B., Anderson, I., Davies, R., Alsmark, U.C.M., Samuelson, J., Amedeo, P., Roncaglia, P., Berriman, M., Hirt, R.P., Mann, B.J., et al. (2005). The genome of the protist parasite Entamoeba histolytica. Nature 433, 865.
66. Ehrenkaufer, G.M., Weedall, G.D., Williams, D., Lorenzi, H.A., Caler, E., Hall, N., and Singh, U. (2013). The genome and transcriptome of the enteric parasite Entamoeba invadens, a model for encystation. Genome Biol. 14, R77.
67. Tachibana, H., Yanagi, T., Pandey, K., Cheng, X.-J., Kobayashi, S., Sherchand, J.B., and Kanbara, H. (2007). An Entamoeba sp. strain isolated from rhesus monkey is virulent but genetically different from Entamoeba histolytica. Mol. Biochem. Parasitol. 153, 107–114.
68. Schaap, P., Barrantes, I., Minx, P., Sasaki, N., Anderson, R.W., Bénard, M., Biggar, K.K., Buchler, N.E., Bundschuh, R., Chen, X., et al. (2015). The Physarum polycephalum Genome Reveals Extensive Use of Prokaryotic Two-Component and Metazoan-Type Tyrosine Kinase Signaling. Genome Biol. Evol. 8, 109–125.
69. Heidel, A.J., Lawal, H.M., Felder, M., Schilde, C., Helps, N.R., Tunggal, B., Rivero, F., John, U., Schleicher, M., and Eichinger, L. (2011). Phylogeny-wide analysis of social amoeba genomes highlights ancient origins for complex intercellular communication. Genome Res.
70. Cavalier-Smith, T., Fiore-Donno, A.M., Chao, E., Kudryavtsev, A., Berney, C., Snell, E.A., and Lewis, R. (2015). Multigene phylogeny resolves deep branching of Amoebozoa. Mol Phylogenet Evol 83, 293–304.
71. Li, W., and Godzik, A. (2006). Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659.
72. Grabherr, M.G., Haas, B.J., Yassour, M., Levin, J.Z., Thompson, D.A., and Amit, I. (2011). Full-length transcriptome assembly from RNA-seq data without a reference genome. Nat Biotechnol 29.
73. Haas, B.J., Papanicolaou, A., Yassour, M., Grabherr, M., Blood, P.D., Bowden, J., Couger, M.B., Eccles, D., Li, B., Lieber, M., et al. (2013). De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc. 8, 1494–1512.
74. Croning, M.D.R., Apweiler, R., and Möller, S. (2001). Evaluation of methods for the prediction of membrane spanning regions. Bioinformatics 17, 646–653.
75. Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nat Methods 9.
76. Buchfink, B., Xie, C., and Huson, D.H. (2014). Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59.
77. Fischer, S., Brunk, B.P., Chen, F., Gao, X., Harb, O.S., Iodice, J.B., Shanmugam, D., Roos, D.S., and Stoeckert, C.J. (2002). Using OrthoMCL to Assign Proteins to OrthoMCL-DB Groups or to Cluster Proteomes Into New Ortholog Groups. In Current Protocols in Bioinformatics (John Wiley & Sons, Inc.).
78. Bolger, A.M., Lohse, M., and Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30.
79. Katoh, K., and Standley, D.M. (2013). MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol Biol Evol 30.
80. Criscuolo, A., and Gribaldo, S. (2010). BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evol Biol 10.
81. Nguyen, L.T., Schmidt, H.A., Haeseler, A., and Minh, B.Q. (2014). IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies. Mol Biol Evol 32.
82. Li, B., and Dewey, C.N. (2011). RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12.
83. Almagro Armenteros, J.J., Tsirigos, K.D., Sønderby, C.K., Petersen, T.N., Winther, O., Brunak, S., von Heijne, G., and Nielsen, H. (2019). SignalP 5.0 improves signal peptide predictions using deep neural networks. Nat. Biotechnol. 37, 420–423.
84. MacManes, M.D. (2017). The Oyster River Protocol: A Multi Assembler and Kmer Approach For de novo Transcriptome Assembly. bioRxiv. Available at: http://biorxiv.org/content/early/2017/08/16/177253.abstract.
85. Bailey, T.L., Boden, M., Buske, F.A., Frith, M., Grant, C.E., Clementi, L., Ren, J., Li, W.W., and Noble, W.S. (2009). MEME Suite: tools for motif discovery and searching. Nucleic Acids Res. 37, W202–W208.
86. Nielsen, H., Almagro Armenteros, J.J., Sønderby, C.K., Sønderby, S.K., and Winther, O. (2017). DeepLoc: prediction of protein subcellular localization using deep learning. Bioinformatics 33, 3387–3395.
87. Huerta-Cepas, J., Serra, F., and Bork, P. (2016). ETE 3: Reconstruction, Analysis, and Visualization of Phylogenomic Data. Mol. Biol. Evol. 33, 1635–1638.
88. El-Gebali, S., Mistry, J., Bateman, A., Eddy, S.R., Luciani, A., Potter, S.C., Qureshi, M., Richardson, L.J., Salazar, G.A., Smart, A., et al. (2018). The Pfam protein families database in 2019. Nucleic Acids Res. 47, D427–D432.
89. Van Dongen, S.M. (2000). Graph clustering by flow simulation.
90. Blandenier, Q., Seppey, C.V.W., Singer, D., Vlimant, M., Simon, A., Duckert, C., and Lara, E. (2017). Mycamoeba gemmipara nov. gen., nov. sp., the First Cultured Member of the Environmental Dermamoebidae Clade LKM74 and its Unusual Life Cycle. J. Eukaryot. Microbiol. 64, 257–265.
91. Picelli, S., Faridani, O.R., Bjorklund, A.K., Winberg, G., Sagasser, S., and Sandberg, R. (2014). Full-length RNA-seq from single cells using Smart-seq2. Nat Protoc 9.
92. Onsbring, H., Tice, A.K., Barton, B.T., Brown, M.W., and Ettema, T.J.G. (2019). An efficient single-cell transcriptomics workflow to assess protist diversity and lifestyle. bioRxiv, 782235.
93. Boratyn, G.M., Schäffer, A.A., Agarwala, R., Altschul, S.F., Lipman, D.J., and Madden, T.L. (2012). Domain enhanced lookup time accelerated BLAST. Biol. Direct 7, 12–12.
94. Krogh, A., Larsson, B., von Heijne, G., and Sonnhammer, E.L.L. (2001). Predicting transmembrane protein topology with a hidden markov model: application to complete genomes. J. Mol. Biol. 305, 567–580.
95. Fischer, S., Brunk, B.P., Chen, F., Gao, X., Harb, O.S., Iodice, J.B., Shanmugam, D., Roos, D.S., and Stoeckert, C.J., Jr (2011). Using OrthoMCL to assign proteins to OrthoMCL-DB groups or to cluster proteomes into new ortholog groups. Curr. Protoc. Bioinforma. Chapter 6, Unit-6.12.19.
96. Xie, Y., Wu, G., Tang, J., Luo, R., Patterson, J., and Liu, S. (2014). SOAPdenovo-Trans: de novo transcriptome assembly with short RNA-Seq reads. Bioinformatics 30.

METHODS
900
901 Method Details
We strategically sampled the transcriptomes and genomes of 118 Amorphea species to
cover all known Amorphea subgroups. We also searched for IMAC proteins in the
eukaryotic data available in NCBI.
905
906 Culturing and Isolation of Mycamoeba gemmipara
907 Mycamoeba gemmipara was obtained from Q. Blandenier and cultured as in Blandenier et al.
908 2017[90].
909
910 Ultra-low input/ Single-Cell cDNA library construction based on Smart-Seq2
Cultured Mycamoeba gemmipara amoebae were subjected to Smart-Seq2 [91] for cDNA
library construction as in Onsbring et al. 2019 [92]. About 50 cells were scraped off an agar plate using
a 30-gauge platinum wire and placed directly into a 200 µL thin-walled PCR tube containing the cell
914 lysis mixture [91]. Each reaction was then subjected to six freeze-thaw cycles in -80 °C
915 isopropanol and ~25 °C DI H2O respectively, before following the remainder of the protocol. A
916 modified version of Smart-Seq2 [91] was used for RNA isolation and dscDNA preparation. cDNA
917 libraries were prepared using a Nextera XT DNA Library Prep Kit (Illumina, CA) following the
918 manufacturer's protocol with dual index primers. The Mycamoeba gemmipara library was
919 sequenced using an Illumina HiSeq 4000 at Genome Quebec.
920
921 Transcriptomic Sequencing Assembly
Raw sequence data from the Illumina sequencing platform or from BioProject accession number
PRJNA380424 [14] were processed with TRIMMOMATIC v 0.35 [78] in paired-end mode to remove
adaptor sequences and to trim poorly called bases. The quality-filtered reads were
assembled de novo with TRINITY [73]. Nucleotide sequences were translated with TRANSDECODER v 5.5.0
(https://github.com/TransDecoder/TransDecoder/), which also predicts open reading frames (ORFs) and
PFAM domains [88]. Contigs of the transcriptomes were searched against the non-redundant
database of NCBI using the BLASTX program of the BLAST+ suite [93]. Each predicted ORF was
assigned an ortholog number through the OrthoMCL v5 [77] database. For taxa with clearly
truncated predicted proteins, the computationally laborious transcriptome assembly pipeline,
Oyster River Protocol [84], was run in an attempt to retrieve full-length predicted integrin
proteins.
934
935 Integrin Protein analysis
We selected model organism IMAC proteins from NCBI: ITGA5:P08648, ITGB1:NP_002202,
Talin:AAF27330, Parvin:AAH16713, PINCH:NP_060450.2, vinculin:AAH39174,
FAK:AAA35819, Paxillin:AAC50104, ILK:NP_001014794, α-actinin:AAC17470,
Filamin:AAF72339 and Tensin:AAG33700. INTERPROSCAN 5.27-66.0 [23] was used to determine
the domain architecture of these IMAC proteins, along with SIGNALP v 5.0 [83] and TMHMM v 2.0 [94].
MEME-SUITE 5.0.4 [85] was used to examine integrin motifs. DEEPLOC v 1.0 [86] was used to
determine the subcellular localization of integrin proteins. We set the minimum criteria for
canonical IMAC proteins based on their architecture and motifs. We used OrthoMCL to assign
944 IMAC proteins their own ortholog numbers.
945
Because the OrthoMCL database is heavily biased toward metazoans, we created an ortholog database
from the eukaryotes listed in the Key resources table. A modified version of OrthoMCL-DB [95] was used to create
a novel ortholog database from the transcriptomes and genomes listed above together with the whole
OrthoMCL DB v5. To do this, an all-against-all search with DIAMOND-BLASTP [76] was
conducted using each protein from the above data as a query. DIAMOND-BLASTP collected up
to 1,000 hits per query, with an e-value of 1e-5 as the cutoff for putative homology. The results were clustered
using the OrthoMCL pipeline methodology [95].
953
BLASTP was used to search the new ortholog database for novel amoebozoan IMAC proteins,
using the previously identified amoebozoan ortholog proteins as queries. A modified custom
pipeline was used to collect novel IMAC orthologs. These methods generated various orthologs
with a similar protein architecture. We created a FASTA file of all novel eukaryote integrin
proteins based on their protein architecture. These integrin proteins were clustered
with CD-HIT [71] at a global sequence identity threshold of 0.95. On the CD-HIT output,
we used DIAMOND [76] to perform an all-vs-all search with a database size of 50 million sequences.
Subsequently, a custom script implementing the Markov cluster (MCL) algorithm [89] was used to cluster the
output of DIAMOND. A custom PYTHON script was used to eliminate any MCL-clustered integrin
proteins shorter than 500 AA. We used a custom PYTHON pipeline to collect ortholog
sequences that clustered with the model metazoan IMAC orthologs listed above.
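
The length filter described above can be sketched as follows. This is a minimal illustration, not the custom script used in this study; the function names `read_fasta` and `drop_short` are hypothetical, and removing a trailing `*` (a stop-codon symbol that some ORF predictors append to peptide sequences) before measuring length is an assumption.

```python
MIN_LENGTH = 500  # minimum protein length in amino acids, as stated in the text


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_short(records, min_length=MIN_LENGTH):
    """Keep records whose sequence has at least min_length residues.

    A trailing '*' is not counted as a residue (assumption).
    """
    return [(h, s) for h, s in records if len(s.rstrip("*")) >= min_length]


# Hypothetical two-record input: one 550-residue and one 499-residue protein.
example = [">long_protein", "M" * 300, "K" * 250, ">short_protein", "M" * 499 + "*"]
kept = drop_short(read_fasta(example))
```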
965
In some cases, integrin proteins were not captured even after all of these steps. Therefore,
the PFAM IDs of integrin domains were used to retrieve any putative integrins from the TRANSDECODER output. For
ITGA, we used FG-GAP (PF01839), and for ITGB we used PSI_integrin (PF17205), integrin_beta
(PF00362), integrin_b_cyt (PF08725) and integrin B tail (PF07965). We disregarded any protein
sequences whose PFAM hits did not meet an e-value threshold of 1e-10. The identity of the putative integrins was
confirmed by INTERPROSCAN, SIGNALP, TMHMM, DEEPLOC, and MEME-SUITE.
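
The PFAM-based rescue step can be sketched in Python as below. The accession-to-subunit mapping is taken from the text; everything else is an assumption made for illustration: the input is a hypothetical list of `(protein_id, pfam_accession, evalue)` tuples, the function name `putative_integrins` is invented, and the threshold is interpreted as keeping hits with an e-value of 1e-10 or better.

```python
# PFAM accessions named in the text, mapped to the integrin subunit they indicate.
INTEGRIN_PFAM = {
    "PF01839": "ITGA",  # FG-GAP
    "PF17205": "ITGB",  # PSI_integrin
    "PF00362": "ITGB",  # integrin_beta
    "PF08725": "ITGB",  # integrin_b_cyt
    "PF07965": "ITGB",  # integrin B tail
}
MAX_EVALUE = 1e-10  # hits with a larger e-value are disregarded (our interpretation)


def putative_integrins(hits, max_evalue=MAX_EVALUE):
    """Map each protein ID to the set of subunits supported by its PFAM hits.

    `hits` is an iterable of (protein_id, pfam_accession, evalue) tuples.
    Accession version suffixes such as '.23' are ignored.
    """
    calls = {}
    for protein_id, accession, evalue in hits:
        subunit = INTEGRIN_PFAM.get(accession.split(".")[0])
        if subunit is not None and evalue <= max_evalue:
            calls.setdefault(protein_id, set()).add(subunit)
    return calls


# Hypothetical hits: p2 fails the e-value threshold, p3 is not an integrin domain.
example_hits = [
    ("p1", "PF01839.23", 1e-20),
    ("p2", "PF00362.18", 1e-5),
    ("p3", "PF00069.25", 1e-50),
    ("p4", "PF08725", 1e-12),
]
calls = putative_integrins(example_hits)
```

Candidates returned by such a filter would still need the confirmation steps listed above.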
972
For each putative integrin protein, we examined protein motifs with MEME-SUITE, and always
searched our proteins of interest with BLASTP against NCBI’s GenBank NR database to check for
possible contamination. For integrin proteins with novel domains, the read depth of the corresponding
transcripts was examined with RSEM. The final amoebozoan integrin architectures were compared
with those of metazoan integrin proteins. A custom PYTHON script was used to create a presence/absence
binary (1,0) matrix of IMAC proteins across Amoebozoa. A custom R script was used to create a
heatmap.
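
A presence/absence matrix of the kind described above can be built as in the following sketch. It is not the custom script used in this study: the protein labels follow the IMAC components listed earlier in the Methods, while the function names, the taxon names, and the tab-separated output layout are hypothetical.

```python
# IMAC components named in the Methods (the labels here are illustrative).
IMAC_PROTEINS = [
    "ITGA", "ITGB", "Talin", "Parvin", "PINCH", "Vinculin",
    "FAK", "Paxillin", "ILK", "Alpha-actinin", "Filamin", "Tensin",
]


def presence_absence(found, proteins=IMAC_PROTEINS):
    """Return (taxa, matrix), where matrix[i][j] is 1 if protein j was found in taxon i."""
    taxa = sorted(found)
    matrix = [[1 if p in found[t] else 0 for p in proteins] for t in taxa]
    return taxa, matrix


def to_tsv(taxa, matrix, proteins=IMAC_PROTEINS):
    """Serialize the matrix as tab-separated text with a header row."""
    lines = ["taxon\t" + "\t".join(proteins)]
    for taxon, row in zip(taxa, matrix):
        lines.append(taxon + "\t" + "\t".join(str(v) for v in row))
    return "\n".join(lines)


# Hypothetical search results: the set of IMAC proteins detected in each taxon.
found = {"taxon_B": {"ITGB", "Talin"}, "taxon_A": {"ITGA"}}
taxa, matrix = presence_absence(found)
table = to_tsv(taxa, matrix)
```

A table in this form can be read directly into R or another plotting environment to draw a heatmap.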
980
The data in this study are transcriptomic in nature, and so some of the predicted
transcripts may be truncated or missing. Identifying full-length transcripts from short
Illumina reads can be difficult because assembled transcripts can be fragmented
or incomplete due to alternative splice sites and untranslated regions [96]. It is therefore possible
that some transcripts are not fully assembled and that their corresponding predicted protein
is truncated, so that the transmembrane region and signal peptide region may be missing. The same
caveat applies to the absence of these proteins from our data: because transcriptomes contain only
genes being expressed at the time the mRNA was harvested, some genes within an organism’s genome
may be missed. Nonetheless, our methodology permitted the discovery of these
proteins across a whole supergroup, highlighting the utility of the method.
991
992 Manually assembled contigs in SEQUENCHER
Where necessary due to fragmentation, contigs for each gene of interest from our automated de novo
transcriptomic assemblies (as above) were searched with BLASTN against the raw nucleotide
data and against the assembly for each transcriptome. Hits were collected and assembled using
SEQUENCHER v 5.4.6 (GeneCodes, Madison, WI, USA). The taxa for which we had to take this
approach were Amphizonella sp. 2, Centropyxis aerophyla, Goncevia foncevia, Hyalosphenia
papilio, and Pellita catalonica for ITGA, and Nebela sp. for both integrin proteins.
999
1000 Presence of integrin in the Mastigamoeba genome
1001 To confirm the presence of these genes in the genome, we blasted the integrin transcripts of M.
balamuthi to the whole-genome shotgun data (CBKX00000000;
https://www.ncbi.nlm.nih.gov/nuccore/CBKX000000000.1) on NCBI [33]. Common splice
sites were searched for at the intron and exon boundaries of the integrin genes by assembling the integrin genomic
contig and integrin transcript in SEQUENCHER v 5.4.6. The genome contigs that contained an
integrin gene were searched for additional ORFs to examine the phylogenetic affinity of other
ORFs on the genomic contigs. We inferred maximum likelihood phylogenetic trees of tenascin
(adjacent to the ITGB gene on CBKX010020318.1) and PA14 (adjacent to the ITGB gene on
CBKX010020319.1) using the same conditions as for the integrin phylogenetic trees (see below).

Phylogenetic trees
1012 To examine the evolutionary relationships of ITGA and ITGB within Amorphea, protein sequences
1013 were aligned with MAFFT-LINSI using the parameters “--maxiterate 1000” and “--localpair”.
1014 Ambiguous sites were trimmed from the alignments using BMGE [80] with a gap-rate cutoff of 0.8.
1015 Maximum likelihood (ML) trees were inferred from these trimmed alignments in IQTREE v 1.5.5
1016 [81] under the LG model with the C60 series model of site heterogeneity. Support for each tree was
1017 assessed with 1,000 ML bootstrap (MLBS) pseudoreplicates. We then used a custom
1018 PYTHON script implementing ETE-TOOLKIT (www.etetoolkit.org) to map protein domain
1019 architecture (IPRSCAN domain IDs) onto the resulting tree.
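The alignment, trimming, and tree-inference steps described above can be sketched as command lines. The helper below only builds the commands and does not run them. The file names, the BMGE jar path, and the function names are hypothetical. The full model string LG+C60+F+G and the use of ultrafast bootstrapping (-bb) are assumptions taken from the supplemental figure captions, not from this paragraph.

```python
import shlex


def build_tree_commands(fasta, prefix, gap_rate=0.8, ufboot=1000, bmge_jar="BMGE.jar"):
    """Return (argv, stdout_path) steps for MAFFT L-INS-i, BMGE, and IQ-TREE.

    stdout_path is set only for MAFFT, which writes the alignment to standard output.
    """
    aligned = prefix + ".linsi.fasta"   # hypothetical output name
    trimmed = prefix + ".bmge.fasta"    # hypothetical output name
    return [
        # L-INS-i strategy: local pairwise alignment with iterative refinement
        (["mafft", "--localpair", "--maxiterate", "1000", fasta], aligned),
        # BMGE on amino-acid data; -g sets the maximum gap rate per site
        (["java", "-jar", bmge_jar, "-i", aligned, "-t", "AA",
          "-g", str(gap_rate), "-of", trimmed], None),
        # IQ-TREE 1.x: -bb requests ultrafast bootstrap replicates
        (["iqtree", "-s", trimmed, "-m", "LG+C60+F+G", "-bb", str(ufboot)], None),
    ]


def as_shell(steps):
    """Render the steps as shell command strings, adding the stdout redirect."""
    lines = []
    for argv, stdout_path in steps:
        line = " ".join(shlex.quote(arg) for arg in argv)
        if stdout_path:
            line += " > " + shlex.quote(stdout_path)
        lines.append(line)
    return lines


if __name__ == "__main__":
    for command in as_shell(build_tree_commands("ITGB.fasta", "ITGB")):
        print(command)
```

Building the commands as argument lists keeps each parameter explicit and lets the same steps be reused for ITGA, ITGB, tenascin, and PA14 inputs.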
1020
1021 Data and software availability
1022 The accession numbers for the Mycamoeba gemmipara and Mastigamoeba balamuthi
1023 transcriptome raw reads are NCBI-SRA: XXXXX and XXXXX, as also listed in the Key resources table.
1024 All protein and nucleotide sequences of each gene, all protein alignments (trimmed and untrimmed)
1025 for phylogenetic trees, and phylogenetic trees are available on the DRYAD repository
1026 (https://datadryad.org/ | Accession: XXXXX).
1027
1028
1029 Supplemental Text
1030 SUPPLEMENTAL METHODS, NOTES, AND RESULTS
1031 Key resources Table
1032 Resources | Sources | Identifier
Data Examined
Acanthamoeba castellanii | [56] | https://doi.org/10.1186/gb-2013-14-2-r11
Acanthamoeba pyroformis | [57] | https://doi.org/10.1186/s13062-016-0171-0
Acanthamoeba astronyxis | PRJEB7687 | https://www.ncbi.nlm.nih.gov/bioproject/PRJEB7687/
Acanthamoeba comandoni | PRJNA354221 | https://www.ncbi.nlm.nih.gov/bioproject/PRJNA354221
Acanthamoeba culbertsoni | PRJEB7687 | https://www.ncbi.nlm.nih.gov/bioproject/PRJEB7687
Acanthamoeba divionensis | PRJEB7687 | https://www.ncbi.nlm.nih.gov/bioproject/PRJEB7687
Acanthamoeba healyi | PRJEB7687 | https://www.ncbi.nlm.nih.gov/bioproject/PRJEB7687
Acanthamoeba lenticulata | PRJNA374454 | https://www.ncbi.nlm.nih.gov/bioproject/PRJNA374454
Acanthamoeba lugdunensis | PRJEB7687 | https://www.ncbi.nlm.nih.gov/bioproject/PRJEB7687
Acanthamoeba mauritaniensis | PRJEB7687 | https://www.ncbi.nlm.nih.gov/bioproject/PRJEB7687
Acanthamoeba pearcei | PRJEB7687 | https://www.ncbi.nlm.nih.gov/bioproject/PRJEB7687
Acanthamoeba polyphaga | PRJNA307312 | https://www.ncbi.nlm.nih.gov/bioproject/PRJNA307312
Acanthamoeba quina | PRJEB7687 | https://www.ncbi.nlm.nih.gov/bioproject/PRJEB7687
Acanthamoeba rhysodes | PRJEB7687 | https://www.ncbi.nlm.nih.gov/bioproject/PRJEB7687
Acanthamoeba royreba | PRJEB7687 | https://www.ncbi.nlm.nih.gov/bioproject/PRJEB7687
Acytostelium ellipticum | PRJEB14640 | https://www.ncbi.nlm.nih.gov/bioproject/PRJEB14640
Acytostelium leptosomum | PRJEB14640 | https://www.ncbi.nlm.nih.gov/bioproject/PRJEB14640
Acytostelium subglobosum | [58] | https://doi.org/10.1186/s12864-015-1278-x
Amoeba proteus | [14] | https://doi.org/10.1093/molbev/msx162
Amphizonella sp. 2 | [59] | https://doi.org/10.1016/j.cub.2019.01.078
Amphizonella sp. 7 | [59] | https://doi.org/10.1016/j.cub.2019.01.078
Amphizonella sp. 8 | [59] | https://doi.org/10.1016/j.cub.2019.01.078
Amphizonella sp. 9 | [59] | https://doi.org/10.1016/j.cub.2019.01.078
Amphizonella sp. 10 | [59] | https://doi.org/10.1093/molbev/msx162
Arcella gibbosa | [14] | https://doi.org/10.1093/molbev/msx162
Arcella vulgaris | [59] | https://doi.org/10.1016/j.cub.2019.01.078
Armaparvus languidus | MMETSP0420 | https://www.imicrobe.us/project/view/304
Cavenderia deminutiva | PRJEB14640 | https://www.ncbi.nlm.nih.gov/bioproject/PRJEB14640
Cavostelium apophysatum | [14] | https://doi.org/10.1093/molbev/msx162
Centropyxis aerophyla | [59] | https://doi.org/10.1016/j.cub.2019.01.078
Centropyxis sp. | [59] | https://doi.org/10.1016/j.cub.2019.01.078
Ceratiomyxa fruticulosa | [14] | https://doi.org/10.1093/molbev/msx162
Ceratiomyxella tahitiensis | [14] | https://doi.org/10.1093/molbev/msx162
Clastostelium recurvatum | [14] | https://doi.org/10.1093/molbev/msx162
Clydonella sp. | [60] | https://doi.org/10.1016/j.ympev.2016.03.029
Cochliopodium minus | [14] | https://doi.org/10.1093/molbev/msx162
Copromyxa protea | [14] | https://doi.org/10.1093/molbev/msx162
Coremiostelium polycephalum | PRJEB14640 | https://www.ncbi.nlm.nih.gov/bioproject/PRJEB14640
Cryptodifflugia operculata | [14] | https://doi.org/10.1093/molbev/msx162
Cunea sp. [JDS-Ruffled] | [14] | https://doi.org/10.1093/molbev/msx162
Cunea sp. [MMETSP0417] | MMETSP0417 | https://data.iplantcollaborative.org/dav/iplant/commons/community_released/imicrobe/projects/104/samples/1830/
Cyclopyxis sp. | [59] | https://doi.org/10.1016/j.cub.2019.01.078
Dermamoeba algensis | [14] | https://doi.org/10.1093/molbev/msx162
Dictyostelium citrinum | [61] | https://doi.org/10.1093/molbev/msn088
Dictyostelium discoideum | [62] | https://doi.org/10.1038/nature03481
Dictyostelium firmibasis | [63] | https://doi.org/10.1006/plas.1998.1385
Dictyostelium intermedium | PRJNA45879 | https://www.ncbi.nlm.nih.gov/bioproject/PRJNA45879
Dictyostelium purpureum | [64] | https://doi.org/10.1006/10.1186/gb-2011-12-2-r20
Difflugia bryophila | [59] | https://doi.org/10.1016/j.cub.2019.01.078
Difflugia sp. | [14] | https://doi.org/10.1093/molbev/msx162
Difflugia sp. D2 | [59] | https://doi.org/10.1016/j.cub.2019.01.078
Diplochlamys sp. | [14] | https://doi.org/10.1093/molbev/msx162
Dracoamoeba jomungandrii | [57] | https://doi.org/10.1186/s13062-016-0171-0
Echinamoeba exudans | [14] | https://doi.org/10.1093/molbev/msx162
Echinosteliopsis oligospora | [14] | https://doi.org/10.1093/molbev/msx162
Echinostelium bisporum | [14] | https://doi.org/10.1093/molbev/msx162
Echinostelium minutum | [14] | https://doi.org/10.1093/molbev/msx162
Endostelium zonatum [PRA-191] | [57] | https://doi.org/10.1186/s13062-016-0171-0
Endostelium zonatum LINKS | [14] | https://doi.org/10.1093/molbev/msx162
Entamoeba dispar | PRJNA12916 | https://www.ncbi.nlm.nih.gov/bioproject/12916
Entamoeba histolytica | [65] | https://doi.org/10.1038/nature03291
Entamoeba invadens | [66] | https://doi.org/10.1186/gb-2013-14-7-r77
Entamoeba moshkovskii | PRJNA12921 | https://www.ncbi.nlm.nih.gov/bioproject/12921
Entamoeba nuttalli | [67] | https://doi.org/10.1016/j.molbiopara.2007.02.006
Filamoeba nolandi | MMETSP0413 | https://www.imicrobe.us/assembly/view/293
Flabellula citata | [14] | https://doi.org/10.1093/molbev/msx162
Flamella aegyptia | [14] | https://doi.org/10.1093/molbev/msx162
Gocevia fonbrunei | [14] | https://doi.org/10.1093/molbev/msx162
Gocevia fonbrunei_Tekle_2016 | [60] | https://doi.org/10.1016/j.ympev.2016.03.029
Grellamoeba robusta | [14] | https://doi.org/10.1093/molbev/msx162
Heleopera sphagni | [59] | https://doi.org/10.1016/j.cub.2019.01.078
Heleopera sylvatica | [59] | https://doi.org/10.1016/j.cub.2019.01.078
Heterostelium multicystogenum | PRJNA495730 | https://www.ncbi.nlm.nih.gov/bioproject/PRJNA495730
Heterostelium pallidum | [61] | http://dictybase.org/
Hyalosphenia elegans | [59] | https://doi.org/10.1016/j.cub.2019.01.078
Hyalosphenia papilio | [59] | https://doi.org/10.1016/j.cub.2019.01.078
Idionectes vortex | PRJNA531640 | https://www.ncbi.nlm.nih.gov/bioproject/?term=Idionectes%20vortex
Lesquereusia sp. | [59] | https://doi.org/10.1016/j.cub.2019.01.078
Lingulamoeba sp. | [14] | https://doi.org/10.1093/molbev/msx162
Luapelamoeba arachisporum | [57] | https://doi.org/10.1186/s13062-016-0171-0
Luapelamoeba hula | [57] | https://doi.org/10.1186/s13062-016-0171-0
Mastigamoeba abducta | [14] | https://doi.org/10.1093/molbev/msx162
Mastigamoeba balamuthi (genome) | [33] | http://doi.org/10.1073/pnas.1219590110
Mastigamoeba balamuthi (transcriptome) | SRA TBD | To be provided
Mastigella eilhardi | [14] | https://doi.org/10.1093/molbev/msx162
Mayorella cantabrigiensis | [14] | https://doi.org/10.1093/molbev/msx162
Micriamoeba sp. | [14] | https://doi.org/10.1093/molbev/msx162
Microchlamys sp. | [59] | https://doi.org/10.1016/j.cub.2019.01.078
Mycamoeba gemmipara | SRA TBD | https://doi.org/10.1111/jeu.1235
Nebela sp. | [59] | https://doi.org/10.1016/j.cub.2019.01.078
Nematostelium gracile | [14] | https://doi.org/10.1093/molbev/msx162
Netzelia oviformis | [59] | https://doi.org/10.1016/j.cub.2019.01.078
Nolandella sp. | [14] | https://doi.org/10.1093/molbev/msx162
Ovalopodium desertum | [14] | https://doi.org/10.1093/molbev/msx162
Paradermamoeba levis | [14] | https://doi.org/10.1093/molbev/msx162
Paramoeba aestuarina | MMETSP0161_2 | https://www.imicrobe.us/assembly/view/205
Paramoeba atlantica | MMETSP0151_2 | https://www.imicrobe.us/assembly/view/210
Parvamoeba rugata | [14] | https://doi.org/10.1093/molbev/msx162
Parvamoeba monoura | [32] | https://doi.org/10.1016/j.ympev.2016.03.029
Pellita catalonica | [14] | https://doi.org/10.1093/molbev/msx162
Pelomyxa sp. | [14] | https://doi.org/10.1093/molbev/msx162
Phalansterium solitarium | [14] | https://doi.org/10.1093/molbev/msx162
Physarum polycephalum | [68] | https://dx.doi.org/10.1093/gbe/evv237
Pygsuia biforma | [11] | http://doi.org/10.1098/rspb.2013.1755
Planocarina carinata | [59] | https://doi.org/10.1016/j.cub.2019.01.078
Polysphondylium pallidum | [69] | http://dictybase.org/
Protacanthamoeba bohemica | [14] | https://doi.org/10.1093/molbev/msx162
Protosporangium articulatum | [14] | https://doi.org/10.1093/molbev/msx162
Protostelium nocturnum | [14] | https://doi.org/10.1093/molbev/msx162
Pyxidicula operculatum | [59] | https://doi.org/10.1016/j.cub.2019.01.078
Rhizamoeba saxonica | [14] | https://doi.org/10.1093/molbev/msx162
Rhizomastix elongata | [14] | https://doi.org/10.1093/molbev/msx162
Rhizomastix libera | [14] | https://doi.org/10.1093/molbev/msx162
Ripella sp. | [14] | https://doi.org/10.1093/molbev/msx162
Sapocribrum chincoteaguense | MMETSP0437 | https://www.imicrobe.us/assembly/view/314
Sappinia pedata | [14] | https://doi.org/10.1093/molbev/msx162
Schizoplasmodiopsis pseudoendospora | [14] | https://doi.org/10.1093/molbev/msx162
Schizoplasmodiopsis vulgaris | [14] | https://doi.org/10.1093/molbev/msx162
Speleostelium caveatum | PRJNA495862 | https://www.ncbi.nlm.nih.gov/bioproject/PRJNA495862
Soliformovum irregularis | [14] | https://doi.org/10.1093/molbev/msx162
Squamamoeba japonica | [14] | https://doi.org/10.1093/molbev/msx162
Stenamoeba limacina | [14] | https://doi.org/10.1093/molbev/msx162
Stenamoeba stenopodia | [14] | https://doi.org/10.1093/molbev/msx162
Stygamoeba regulata | MMETSP0447 | https://www.imicrobe.us/assembly/view/316
Synstelium polycarpum | PRJEB14640 | https://www.ncbi.nlm.nih.gov/bioproject/PRJEB14640
Thecamoeba quadrilineata | [60] | https://doi.org/10.1016/j.ympev.2016.03.029
Thecamoeba sp. | [14] | https://doi.org/10.1093/molbev/msx162
Thecamoebidae isolate [RHP1-1] | [14] | https://doi.org/10.1093/molbev/msx162
Trichosphaerium sp. (ATCC 40318) | MMETSP0405 | https://www.imicrobe.us/assembly/view/300
Tychosporium acutostipes | [14] | https://doi.org/10.1093/molbev/msx162
Vannella fimicola | [14] | https://doi.org/10.1093/molbev/msx162
Vannella robusta | MMETSP0166 | https://www.imicrobe.us/?#/samples/2469
Vannella schaefferi | MMETSP0417 | https://www.imicrobe.us/assembly/view/295
Vannella sp. | MMETSP0168 | https://www.imicrobe.us/assembly/view/34
Vermamoeba sp. | [14] | https://doi.org/10.1093/molbev/msx162
Vermamoeba vermiformis | [14] | https://doi.org/10.1093/molbev/msx162
Vermistella antarctica | [60] | https://doi.org/10.1016/j.ympev.2016.03.029
Vexillifera bacillipedes | [70] | http://doi.org/10.1016/j.ympev.2014.08.011
Vexillifera sp. | MMETSP0173 | https://www.imicrobe.us/#/samples/1757
1033 List of Stramenopile taxa from MMETSP
1034 Alexandrium minutum MOORE-via-CAMERA-MMETSP0328
1035 Aureoumbra_lagunensis CCMP1509 MMETSP0890
1036 Bolidomonas_sp RCC1657 MMETSP1321
1037 Bolidomonas_sp RCC2347 MMETSP1320
1038 Bolidomonas pacifica MMETSP1319-RCC208
1039 Cafeteria_sp CaronLabIsolate MMETSP1104
1040 Chattonella_subsalsa CCMP2191 MMETSP0947
1041 Chrysocystis_fragilis CCMP3189 MMETSP1165 ****
1042 Dictyocha_speculum CCMP1381 MMETSP1174
1043 Dinobryon_sp UTEXLB2267 MMETSP0019
1044 Fibrocapsa japonica-CCMP1661 MOORE-via-CAMERA-MMETSP1339
1045 Florenciella_sp RCC1587 MMETSP1324
1046 Florenciella parvula MMETSP1323-RCC1693
1047 Heterosigma_akashiwo CCMP2393 MMETSP0292
1048 Love2098 non_described LOVEJOY CMP2098 MMETSP0990
1049 Mallomonas_sp CCMP3275 MMETSP1167
1050 Ochromonas_sp BG1 MMETSP1105
1051 Paraphysomonas_bandaiensis CaronLabIsolate MMETSP1103
1052 Paraphysomonas_vestita GFlagA MMETSP1107
1053 Pelagomonas_calceolata CCMP1756 MMETSP0886
1054 Pelagococcus_subviridis CCMP1429 MMETSP0882
1055 Phaeomonas_parva CCMP2877 MMETSP1163
1056 Pinguiococcus_pyrenoidosus CCMP2078 MMETSP1160
1057 Pseudopedinella_elastica CCMP716 MMETSP1068
1058 Rhizochromulina_marina cf CCMP1243 MMETSP1173
1059 Spumella_elongata CCAP955 1 MMETSP1098
1060 Thraustochytrium_sp LLF1b MMETSP0198
1061 Vaucheria_litorea CCMP2940 MMETSP0945
1062
1063 Reagents
2-propanol (certified ACS) | Sigma-Aldrich | I9516-25ML
Ethanol (pure) 200 proof
Critical Commercial Assay Kits
Nextera XT DNA Library Prep Kit | Illumina | FC-131-1096
Nextera XT index kit v2 set A | Illumina | FC-131-2001
Nextera XT index kit v2 set B | Illumina | FC-131-2002
Nextera XT index kit v2 set C | Illumina | FC-131-2003
1064 Software and Algorithms
CD-HIT | [71] | https://github.com/weizhongli/cdhit
TRINITY V2.4.0 | [72] | https://github.com/trinityrnaseq/trinityrnaseq/
TRANSDECODER V 5.5.0 | [73] | https://transdecoder.github.io/
TMHMM V 2.0 | [74] | http://www.cbs.dtu.dk/services/TMHMM/
BOWTIE2 V 2.3.4.3 | [75] | https://sourceforge.net/projects/bowtie-bio/files/bowtie2/2.3.3.1
DIAMOND V 0.9.25 | [76] | https://github.com/bbuchfink/diamond
ORTHOMCL V 5.0 | [77] | http://orthomcl.org/orthomcl/
TRIMMOMATIC V 0.35 | [78] | https://github.com/timflutre/trimmomatic
MAFFT-LINSI V 7 | [79] | https://mafft.cbrc.jp/alignment/server/
BMGE V 1.12 | [80] | ftp://ftp.pasteur.fr/pub/GenSoft/projects/BMGE
SEQUENCHER v 5.4.6 | | http://www.genecodes.com/
BLAST 2.2.30+ | | https://blast.ncbi.nlm.nih.gov/
IQTREE v 1.5.5 | [81] | http://www.iqtree.org
RSEM | [82] | https://github.com/deweylab/RSEM
SIGNALP 5.0 | [83] | http://www.cbs.dtu.dk/services/SignalP/
OYSTER RIVER PROTOCOL V 2.1.1 | [84] | https://oyster-river-protocol.readthedocs.io/en/latest/
MEME-SUITE V 5.0.4 | [85] | http://meme-suite.org/index.html
DEEPLOC 1.0 | [86] | http://www.cbs.dtu.dk/services/DeepLoc/
INTERPROSCAN 5.27-66.0 | [23] | https://www.ebi.ac.uk/interpro/search/sequence-search
ETE3 | [87] | http://etetoolkit.org/documentation/ete-view/
PFAM | [88] | https://pfam.xfam.org/
MCL | [89] | https://micans.org/mcl/
1065
1066
1067
1068 Contact for reagent and resource sharing
1069 Further information and requests for resources should be directed to, and will be fulfilled by, the
1070 corresponding author Matthew W. Brown ([email protected]).
1071
1072 Sib (similar to integrin beta) proteins found in dictyostelids | Sib Proteins comparison with
1073 ITB-like
1074 SibA, a cell adhesion molecule that binds to phagocytic particles in a manner similar to ITB, was discovered
1075 in Dictyostelium discoideum [45]. Sib proteins (A to E) are capable of binding to talin, and SibC
1076 regulates cell adhesion [46]. However, their protein structure differs from that of the canonical ITB
1077 in lacking three cation-binding motifs as well as a cysteine-rich region at the C-terminus [26].
1078 Although the varioseans, described above, are phylogenetically close to dictyostelids, their type
1079 II ITB-like proteins with three cation motifs are distinct from Sib proteins, though with great
1080 variation in amino acid sequence. Interestingly, we did not observe Sib proteins outside of the
1081 Dictyostelia clade, suggesting that they evolved recently in the dictyostelids. The protein
1082 architecture of SibA resembles that of metazoan ITB solely in having a GXXXG motif in
1083 the transmembrane region, but this motif is observed in type II ITB (see Supplemental Table 2). In
1084 addition, the LamG3 and vWD domains found in type II ITB have never been observed in Sib.
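Short sequence motifs such as the GXXXG motif mentioned above can be located with simple pattern matching. The sketch below is only an illustration and is not the MEME-based analysis used in this study. The function name and the toy sequence are hypothetical, the DXSXS and NPXY patterns are added for comparison because they are named in the ITB motif figure caption, and the scan is not restricted to the transmembrane region.

```python
import re

# X stands for any residue; lookaheads allow overlapping matches to be reported.
MOTIFS = {
    "GXXXG": re.compile(r"(?=(G[A-Z]{3}G))"),
    "DXSXS": re.compile(r"(?=(D[A-Z]S[A-Z]S))"),
    "NPXY": re.compile(r"(?=(NP[A-Z]Y))"),
}


def find_motifs(protein):
    """Return {motif_name: [(1-based start position, matched residues), ...]}."""
    protein = protein.upper()
    return {
        name: [(m.start(1) + 1, m.group(1)) for m in pattern.finditer(protein)]
        for name, pattern in MOTIFS.items()
    }


# Hypothetical toy sequence, not a real integrin or Sib protein.
TOY_PROTEIN = "MDLSGSAAALLGVVIGLLKKNPIY"

if __name__ == "__main__":
    for name, matches in find_motifs(TOY_PROTEIN).items():
        print(name, matches)
```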
1085
1098 Supplemental Figure 1: Repertoire of IMAC proteins, including filamin and tensin, in Amoebozoa.
1099 Names of IMAC proteins are listed at the top and names of amoebozoan taxa are listed on the side of
1100 the map. Arrows indicate amoebozoan species that have a nearly complete set of IMAC proteins.
1101 Dark blue indicates the presence of a canonical IMAC protein, white indicates the absence of an
1102 IMAC protein, and light blue indicates a truncated form of an IMAC protein.
1106 Supplemental Figure 2: Repertoire of IMAC proteins in only those amoebozoan species that have either
1107 integrin alpha or beta. Names of IMAC proteins are listed at the top and names of amoebozoan
1108 taxa are listed on the side of the map. Arrows indicate amoebozoan species that have a nearly
1109 complete set of IMAC proteins. Dark blue indicates the presence of a canonical IMAC protein,
1110 white indicates the absence of an IMAC protein, and light blue indicates a truncated form of an
1111 IMAC protein.
Figure legend labels: Canonical Protein Present; Truncated Protein Present; Protein Absent in Data; Nearly Complete IMAC; Innovation/Gain; Loss.
1115 Supplemental Figure 3: Maximum likelihood tree of ITA homologs and other cell adhesion
1116 molecules that contain beta-propellers. Bootstrap values ≥50 are shown at the branch points.
1117 The phylogenetic tree was built with IQtree v1.5.5 under the LG+C60+F+G model of protein
1118 evolution, with ML bootstrap (MLBS) support from 1000 ultrafast BS replicates. The protein
1119 sequences were aligned with MAFFT using the parameters linsi, maxiterate 1000 and local pair.
1120 BMGE was used to mask the alignment with a gap-rate cutoff of 0.8. The ITA homolog
1121 phylogenetic tree is rooted with the Linkin protein. Amoebozoan ITA genes are mostly
1122 concentrated in a single clade (colored), except for the Flabellula citata paralog.
1140 Supplemental Figure 4: Maximum likelihood tree of ITB homologs and Sib protein. Bootstrap
1141 values ≥50 are shown at the branch points. The phylogenetic tree was built with IQtree v1.5.0 under
1142 the LG+C60+F+G model of protein evolution, with ML bootstrap (MLBS) support from 1000 ultrafast BS replicates.
1143 The protein sequences were aligned with MAFFT using the parameters linsi,
1144 maxiterate 1000 and local pair. BMGE was used to mask the alignment with a
1145 gap-rate cutoff of 0.6. The ITB homolog phylogenetic tree is rooted with the Sib protein. Amoebozoan ITB
1146 genes are mostly concentrated in a single clade (colored).
ITGA Obazoan Motif
1159 Supplemental Figure 6: Most significantly enriched motifs in Obazoan ITA, obtained by
1160 MEME-suite. (A) Consensus cation motif and FG-GAP obtained by RNA-seq for an Obazoan
1161 ITA, (B) EGF motif found exclusively in Ministeria vibrans, (C) GFFKR motif found in Obazoa,
1162 (D) immunoglobulin-like motif found in Obazoa.
ITGB Obazoan Motif
Supplemental Figure 7: Most significantly enriched motifs in Obazoan ITB, obtained by MEME. (A-D) Cation-binding motifs: (A) MIDAS binding motif DXSXS, (B) possible fifth position of MIDAS and ADMIDAS, (C) ITB SyMBS (green) and MIDAS (yellow), (D) possible E of SyMBS (first amino acid of SyMBS). (E-F) Cysteine-rich motifs: (E) PSI domain, (F) cysteine-rich motif stalk. (G-H) C-terminal motifs: (G) NPXY motif, (H) possible KLXXXD motif.
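To make the motif notation in the Supplemental Figure 7 caption concrete, the sketch below scans a protein sequence for DXSXS, NPXY, and KLXXXD, reading X as any residue. It is only an illustration using Python regular expressions. The motifs in this study were identified with MEME, and the pattern names, the `find_motifs` helper, and the toy sequence are hypothetical. NPXY is written to also accept NPXF, following the "NPXY/F" notation of Supplemental Table 2.

```python
import re

# Hypothetical pattern set: "X" in the captions is read as "any residue".
MOTIFS = {
    "MIDAS_DXSXS": "D.S.S",
    "NPXY": "NP.[YF]",   # also accepts the NPXF variant
    "KLXXXD": "KL...D",
}

def find_motifs(sequence, motifs=MOTIFS):
    """Return {motif name: [(1-based start, matched residues), ...]}."""
    seq = sequence.upper()
    hits = {}
    for name, pattern in motifs.items():
        # The lookahead wrapper lets overlapping occurrences be reported too.
        regex = re.compile("(?=(" + pattern + "))")
        hits[name] = [(m.start() + 1, m.group(1)) for m in regex.finditer(seq)]
    return hits

# Made-up sequence for illustration only, not taken from the study.
toy = "MADLSGSAAKLAAADGGNPAYQ"
result = find_motifs(toy)
```

Positions are reported 1-based so that they can be compared directly with residue numbers such as those listed in the supplemental tables. A literal pattern search of this kind is far stricter than a MEME position weight matrix, so it would miss the non-canonical variants reported in Supplemental Table 2.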
Sib Motif
Supplemental Figure 8: Most significantly enriched motifs in Sib protein, obtained by MEME. (A)-(D) Cation-binding motifs: (A) possible binding motif of MIDAS and ADMIDAS, (B) possible fifth position of MIDAS and ADMIDAS, (C) possible binding motif of SyMBS, (D) NPXY motif.
Supplemental Table 1: Characteristics of the Amorphea ITGA proteins, including length, beta-propeller motifs, subcellular localization, any extra domains attached to ITGA, C-terminal motif, signal peptide, transmembrane regions, and type of ITGA. Taxa are listed in the order used in Figure 2. The subcellular localization is displayed with its likelihood value in parentheses. Extra domains attached to ITGA are located at the C-terminal end (CTE) or at the N-terminal beginning (NTB). The ITGA motifs that interact with ITGB are shown as GFFKR and GXXXG. The locations of the beta-propeller motifs identified with MEME-suite are displayed. FG-GAP motifs are marked with asterisks. Beta-propeller motif locations that correspond to IPRSCAN output are enclosed in brackets. Consensus cation motifs are marked with a superscript 2.

| Species (sequence ID) | Size (aa) | Subcellular localization (likelihood) | Extra domain (CTE = C-terminal end, NTB = N-terminal beginning) | GFFKR / GXXXG motif | Location of beta-propeller motifs (* = predicted to possess FG-GAP; ² = cation motif D/E-h-D/N-x-D/N-G-h-x-D/E) | Signal peptide (SP) or transmembrane (TM) | Type (I or II) |
|---|---|---|---|---|---|---|---|
| Filamoeba nolandi (CAMPEP_0168571262) | 631 | Cell membrane protein, likelihood (0.971) | PRICHEXTENSN (CTE), protein kinase (CTE) | GFFKR, GIAVG | 16, 47, [109], 171 | TM | II |
| Ministeria vibrans paralog 2 (2753598) | 917 | Extracellular (0.9862), soluble (1), likelihood (0.997) | EGF 2 repeat (NT) |  | 195², 255², [319]², [436]²*, [581]², 772, 812² |  | II |
| Rhizamoeba saxonica (TR11094_c0_g1_i1) | 449 | Cell membrane protein, likelihood (0.437) | None |  | [55], 122², 184, 254²* | SP, TM | I |
| Flabellula citata paralog 3 (c2461_g1_i2) | 673 | Cell membrane protein, likelihood (0.301) | None |  | 57*, 121, 184*, 251*, 313 | SP, TM | I |
| Cryptodifflugia operculata paralog 2 (c8521_g1_i1) | 1200 | Cell membrane protein, likelihood (0.257) | (NTB) Thrombospondin type-1 repeat, EGF 4 repeat, intramolecular chaperone |  | 837², 889²*, [1033]², [1107]² | SP | II |
| Nolandella sp. AFSM (TR12097c0_g1_i1) | 975 | Cell membrane protein, likelihood (0.672) | (NTB) Thrombospondin type-1 repeat, EGF 4 repeat |  | 613, [670]²*, [734]²*, [805]²*, 934 | SP | II |
| Arcella intermedia (c23786_g1_i1) | 1074 | Nucleus membrane protein, likelihood (0.58) | (NTB) Thrombospondin type-1 repeat, EGF 4 repeat, intramolecular chaperone |  | [899]²*, [965]²* | SP | II |
| Flabellula citata paralog 1 (c9004_g2_i1) | 626 | Cell membrane protein, likelihood (0.994) | EGF (CTE) |  | 123, [259], [323]²* | SP | II |
| Flabellula citata paralog 2 (c8227_g1_i1) | 713 | Cell membrane protein, likelihood (0.669) | EGF (CTE) |  | 88*, 150*, 255, 310 | TM | II |
| Ministeria vibrans paralog 1 (2757223) | 860 | Peroxisome (0.3325), soluble (0.9929), likelihood (0.386) | None |  | 58², [125]²*, 185, 243, 310, 387², 448, [578]²*, [647], 748 |  | I |
| Flabellula citata paralog 5 (c8994_g2_i16) | 604 | Extracellular (0.5213), soluble (0.9842), likelihood (0.622) | None |  | 113, 172, [302]* |  | I |
| Flabellula citata paralog 4 (c8386_g1_i3) | 646 | Cell membrane protein, likelihood (0.833) | None |  | 53, [116]*, [178]* | SP, TM | I |
| Cryptodifflugia operculata paralog 1 (c43306_g1_i1) | 581 | Cell membrane protein, likelihood (0.445) | None |  | 58², [120]*², [188]*, 261², 325², 383*, [449]² | SP, TM | I |
| Amphizonella sp. 9 paralog 2 (TR2587c1_g1_i1) | 565 | Extracellular (0.9902), soluble (0.9998), likelihood (0.997) | None |  | [47], 121, [198]¹, 262, [344]², [414]², 482² | SP | I |
| Mastigamoeba balamuthi paralog 1 (MMETSP0035-201108093729) | 813 | Cytoplasm (0.3215), soluble (0.8872), likelihood (0.493) | Cadherin/Ig-like fold (CTE) |  | 66, [200]², 268, [345], 416, 468, [540] |  | I |
| Amphizonella sp. 8 paralog 2 (TR9459c0_g1_i1) | 569 | Vacuole (0.4562), soluble (0.8119), likelihood (0.362) | None |  | [71], 138, 207, 274, [353], 419 | SP | I |
| Heleopera sylvatica (Gene.370_NODE_157) | 743 | Cell membrane protein, likelihood (0.493) | ADP-ribose (NTB) | GIVAG | [56], 139, 215², [280]² | TM | II |
| Nebela sp. (Gene.16036_NODE_111) | 839 | Peroxisome (0.3381), soluble (0.6007), likelihood (0.125) | Cadherin (CTE) |  | [36], [109], [179], [244], [320], [388], [455] | TM | I |
| Difflugia sp. D2 (DN18414_c0_g1_i5) | 266 | Organelle (0.3485), soluble (0.8094), likelihood (0.287) | None |  | [55], [122], [190] |  | I |
| Protosporangium articulatum paralog 3 (TR6387\|m.15047) | 671 | Cell membrane protein, likelihood (0.937) | None | GAVVG | [48], [124], [194] | TM | I |
| Ceratiomyxa fruticulosa (c6850_g1_i1) | 565 | Vacuole (0.4506), membrane (0.7419), likelihood (0.215) | None |  | 74², [127], 198, 259, 321, 393, 471 | SP, TM | I |
| Diplochlamys sp. paralog 2 (m.4379) | 666 | Cell membrane protein, likelihood (0.989) | None | GAIAG | [120]², 249, [311]² | SP, TM |  |
| Paramoeba atlantica (2184673) | 623 | Cell membrane protein, likelihood (0.498) | None |  | 69, [125], 198, [337]², 425² | SP, TM | I |
| Mastigamoeba balamuthi paralog 2 (MMETSP0035-20110809854) | 1042 | Cell membrane protein, likelihood (0.674) | Cadherin/Ig-like fold (CTE) |  | [133], 202, 265, 343, 522, 613, [685], 754 | TM | I |
| Amphizonella sp. 9 paralog 3 (Gene.962_NODE_597) | 685 | Cell membrane protein, likelihood (0.87) | PRICHEXTENSN (CTE) |  | 68, [133]², [195], 255, [287]², [380]²*, 455, 520², 586 | SP, TM | II |
| Amphizonella sp. 9 paralog 1 (TR17939c0_g1_i1) | 677 | Extracellular (0.8519), soluble (0.8141), likelihood (0.936) | None |  | [62], [124], [199]², [268]*, [337], [409]², [488]² | SP | I |
| Diplochlamys sp. paralog 1 (2112184) | 540 | Vacuole (0.559), soluble (0.5526), likelihood (0.639) | None |  | [68], 143, 397, [468] | SP | I |
| Vermistella antarctica (156452) | 570 | Extracellular (0.7931), soluble (0.9906), likelihood (0.905) | None |  | 83, 224, 299², 368, 442², 516 | SP | I |
| Protosporangium articulatum paralog 4 (TR6387_c1_g4_i7) | 631 | Vacuole (0.4093), soluble (0.8173), likelihood (0.459) | None |  | [108], [180], 248, 329, [390]*, [487], [567]* | SP | I |
| Amphizonella sp. 8 paralog 1 (TR6668_c0_g1_i4) | 656 | Extracellular (0.8352), soluble (0.992), likelihood (0.964) | PRICHEXTENSN (CTE) |  | [53], [125], [200]², 266², 332², [403]², 473 | SP | II |
| Amphizonella sp. 10 paralog 1 (TR4717c0_g1_i1) | 563 | Cytoplasm (0.625), soluble (0.8464), likelihood (0.758) | PRICHEXTENSN (CTE) |  | 16, 87, [161], 237, 299², 370, [438] |  | II |
| Amphizonella sp. 2 (TR2129_c1_g3_i9) | 544 | Endoplasmic reticulum (0.2477), soluble (0.5541), likelihood (0.224) | None |  | 111, [189]², [257], [334]², [403]², 474, [539]² | SP | I |
| Endostelium zonatum LINKS (789702) | 683 | Cell membrane protein, likelihood (0.451) | PRICHEXTENSN (CTE) |  | 232², [295], 432, [497]², [572]², [647] | SP | II |
| Endostelium zonatum (m.5907) | 681 | Cell membrane protein, likelihood (0.479) | PRICHEXTENSN (CTE) |  | 231², [293], 430, [495]², [570]², [645] | SP | II |
| Protosporangium articulatum paralog 5 (TR2944c0_g1_i1) | 735 | Extracellular (0.4086), soluble (0.4867), likelihood (0.572) | None |  | [70], [137], [200], [268], [344], 407, 480 | SP | I |
| Protosporangium articulatum paralog 1 (TR2074c0_g1_i2) | 851 | Cell membrane protein, likelihood (0.448) | Protein kinase | GIVFG | 57², [131]², [201], [275]², [344], 418, 485 | SP, 2TM | II |
| Protosporangium articulatum paralog 2 (TR2817c1_g1_i1_6465) | 685 | Cell membrane protein, likelihood (0.997) | None | GVGLG | [105], 176, [240], 303, [378], 450, 512 | TM | I |
| Stenamoeba stenopodia (2782329) | 662 | Cell membrane protein, likelihood (0.93) | None |  | [168]², 310², 444², 514² | SP, TM | I |
| Gocevia fonbrunei paralog 1 (DN14950_c7_g1_i2) | 869 | Cell membrane protein, likelihood (0.999) | Cadherin/Immunoglobulin-like fold (CTE) | GAIIG | [66]², [207], 281, [348], 417, [488]² | TM | II |
| Gocevia fonbrunei paralog 2 (DN13664_c1_g1_i1) | 665 | Extracellular (0.5821), soluble (0.9361), likelihood (0.656) | None |  | 52, [121], 194, 257, [318]², [386]²*, 463 | SP | I |
| Amphizonella sp. 10 paralog 2 (TR11506c6_g2_i3) | 523 | Extracellular (0.9162), soluble (0.9984), likelihood (0.929) | None |  | [56], [129]², [201], [273]², 340², 397², [468] | SP | I |
| Ectocarpus siliculosus | 1105 | Cell membrane protein, likelihood (0.758) | PRICHEXTENSN (CTE) |  | [56], [127], 195, 260, 326, [396], 464* | SP, TM | II |
| Ministeria vibrans paralog 3 (m.23418) | 634 | Cell membrane protein, likelihood (0.214) | None | GFFKR | [18]²*, 78 | TM | I |
| Pygsuia biforma (ITGA) | 3312 | Extracellular (0.9162), soluble (0.9984), likelihood (0.93) | DUF11 4 repeats (CTE) | GFFKR, GLAIG | 267, [685], [807] | TM | I |
| Lenisia limosa (LenisMan47) | 3346 | Cytoplasm (0.5704), soluble (0.9615), likelihood (0.632) | Ig fold like 11 repeats (CTE) | GFFKR | 310², [701], [826]² |  | I |
| Thecamonas trahens | 2114 | Cell membrane protein, likelihood (0.975) | Ig fold like 2 repeats (CTE), Laminin_g3 (CTE) | GFFKR | 349*, [894]²*, [956]²* | SP, TM | I |

Supplemental Table 2: ITGB motifs, subcellular localizations, signal peptides, transmembrane regions, and extra domains attached to the ITGB protein. Taxa are listed in the order used in Figure 3. For the ITGB motifs MIDAS, ADMIDAS, and SyMBS, any non-canonical amino acids were highlighted in red. Cysteine-rich motifs are shown as either EGF or PSI domains. C-terminal motifs are shown as KLXXXD or NPXY/F and GXXXG. Numbers in parentheses indicate the number of motifs present in ITGB.
| Species | MIDAS | ADMIDAS | SyMBS | CRM (cysteine-rich motifs) | Size (aa) | Signal peptide and transmembrane | B-tail motifs | Type | Subcellular localization (DeepLoc) |
|---|---|---|---|---|---|---|---|---|---|
| Clastostelium recurvatum | 25D, 27T, 29S, 123E, 139R | 29S, 32S, 33D, 202A, 160D | 81E, 120R, 121P, 122D, 123E | None | 1405 | TM | GAVVG | II | Extracellular likelihood (0.3436), soluble likelihood (0.731) |
| Filamoeba nolandi | 893D, 895Y, 897S, 956E, 991D | 897S, 899K, 900A, 1066A, 991D | 906D, 949D, 950E, 951P, 952N | EGF, PSI (NTB) | 2714 | SP, TM | GAFIG | II | Cell membrane protein likelihood (0.5463) |
| Soliformovum irregularis | 872D, 874S, 876N, 958S, 975N | 874N, 879Y, 880N, 1071A, 975N | 924D, 1012D, 1013P, 1014N, 1015E | PSI | 2218 | SP | NPXY present | II | Extracellular likelihood (0.8972), soluble likelihood (0.9172) |
| Schizoplasmodiopsis vulgaris | 839D, 841T, 845S, 928E, 944D | 845S, 846Y, 857E, 1045A, 944D | 889D, 923N, 925Q, 927L, 928E | PSI | 3641 | SP, TM | NPXY present, GAAG | II | Extracellular likelihood (0.5465), soluble likelihood (0.6244) |
| Cavostelium apophysatum | 507D, 509T, 511S, 592N, 642D | 511S, 514S, 515D, 708A, 642D | 546D, 656N, 658D, 659E, 660D | PSI | 3296 | TM | NPXY present, GAASG | II | Cell membrane protein likelihood (0.7274) |
| Mycamoeba gemmipara | 277D, 279T, 281S, 375E, 416E | 281S, 285N, 286D, 530A, 390D | 328E, 403N, 411D, 414P, 416E | Unique cysteine-rich region pattern downstream from 500AA-860AA | 927 | TM | NPXY present, GVIAG | I | Cell membrane protein likelihood (0.998) |
| Mastigamoeba balamuthi | 159D, 161T, 163S, 242E, 271D | 163S, 170P, 171N, 371S, 271D | 198E, 238S, 239D, 241P, 242E | PSI, EGF-like present 5 times | 1315 | SP, TM | NPXA present, GAASG | I | Cell membrane protein likelihood (0.561) |
| Mastigella eilhardi | 127D, 129S, 131S, 212E, 230D | 131S, 134D, 135D, 339A, 230D | 198E, 207N, 209D, 211P, 212E | PSI, EGF-like present 5 times | 1292 | SP, TM | NPXA present, GAASG | I | Cell membrane protein likelihood (0.7939) |
| Rigifila ramosa | 124D, 126S, 128S, 206E, 237D | 128S, 131D, 133D, 391A, 237D | 164D, 201N, 203D, 205P, 206E | PSI, EGF-like present 13 times | 932 | SP, TM | NPXY present, GTATG | I | Extracellular soluble likelihood (0.681) |
| Echinamoeba exundans | 123D, 125S, 127S, 209E, 239D | 127S, 140G, 141E, 338A, 239D | 174E, 204N, 206E, 208P, 209E | PSI, EGF-like present 3 times | 593 | SP, TM | NPXY present (2), KMAAAD, GAVIG | I | Cell membrane protein likelihood (0.5596) |
| Difflugia sp. D2 | 115D, 117S, 119S, 213E, 249D | 119S, 136D, 137E, 295I, 249D | 177E, 208N, 210E, 212P, 213E | PSI, EGF present 3 times | 605 | TM | NPXY present (2) | I | Cell membrane protein likelihood (0.8598) |
| Nebela sp. | 131D, 133S, 135S, 225E, 264D | 135S, 138P, 139Y, 361A, 264D | 192E, 223N, 225E, 227P, 228E | PSI, EGF present 3 times | 630 | SP, TM | NPXY present (2) | I | Cell membrane protein likelihood (0.7413) |
| Heleopera sphagni | 127D, 129S, 131S, 221E, 266D | 131S, 134P, 135H, 352A, 266D | 174E, 216N, 218E, 220P, 221E | PSI, EGF present 3 times | 622 | SP, TM | NPXY present (2), KADX(4)E | I | Cell membrane protein likelihood (0.9332) |
| Difflugia sp. USP | 146D, 148S, 150S, 245E, 280D | 150S, 153P, 154Y, 370A, 280D | 197E, 239N, 241E, 243P, 245E | PSI, EGF present 3 times | 630 | SP, TM | NPXY present (2) | I | Cell membrane protein likelihood (0.9847) |
| Hyalosphenia papilio | 35D, 37S, 39S, 132E, 172D | 39S, 42P, 43Y, 270D, 172D | 85D, 127N, 129E, 131P, 132E | EGF present 3 times | 531 | TM | NPXY present (2), KADX(4)D | I | Cell membrane protein likelihood (0.8186) |
| Rhizamoeba saxonica paralog 1 (TR11158c0_g1_i1) | 143D, 145S, 147S, 228E, 259D | 147S, 150Q, 151E, 383A, 259D | 182E, 223N, 225D, 227P, 228E | PSI, EGF present 7 times | 951 | SP, TM | NPXY present (2) | I | Cell membrane protein likelihood (0.7754) |
| Cryptodifflugia operculata | 144D, 146S, 148S, 241E, 278D | 148S, 151P, 152L, 370G, 278D | 194E, 236N, 238E, 240P, 241E | PSI, EGF present 3 times | 633 | SP, TM | NPXY present (2) | I | Cell membrane protein likelihood (0.9426) |
| Rhizamoeba saxonica paralog 2 (TR7856c0_g1_i1) | 142D, 144S, 146S, 227E, 258D | 145S, 148D, 149D, 329A, 258D | 181E, 222N, 224D, 226C, 227E | PSI, EGF present 3 times | 932 | TM | NPXY present (2) | I | Cell membrane protein likelihood (0.8204) |
| Flabellula citata | 163D, 165S, 167S, 234E, 282D | 167S, 172M, 173E, 364A, 282D | 185D, 229G, 231D, 233P, 234E | PSI, EGF present 5 times | 1141 | SP | GAAVG | I | Cell membrane protein likelihood (0.7827) |
| Rhizamoeba saxonica paralog 3 (TR7441_c0_g1_i1) | 832D, 834S, 836S, 918E, 949D | 836S, 839P, 840F, 1063A, 949D | 871E, 913N, 915D, 917C, 918E | EGF-like present 2 times | 1456 | SP, TM | NPXY present (2) | I | Extracellular (0.5263), soluble likelihood (0.731) |
| Pyxidicula operculata | 133D, 135S, 137S, 229E, 264D | 137S, 140P, 141Y, 360A, 264D | 192E, 224N, 226E, 228P, 229E | PSI, EGF present 3 times | 621 | SP, TM | NPXY present, KADX(4)D, GECGG | I | Extracellular (0.9867), soluble likelihood (0.9995) |
| Pygsuia biforma | 110D, 112S, 114S, 204E, 231D | 114S, 117D, 118D, 331A, 231D | 161D, 200Q, 201D, 203P, 204E | PSI, EGF present 34 times | 1793 | TM | NPXY present, KAGX(4)E present | I | Cell membrane protein likelihood (0.9792) |
| Lenisia limosa | 130D, 132S, 134S, 218S, 246D | 134S, 137N, 138D, 357N, 246D | 182N, 215N, 216E, 217P, 218S | PSI, EGF present 12 times, EGF-like 4 times |  | SP, TM | NPXY present, RFNAAMKE present | I | Cell membrane protein likelihood (0.9934) |
| Ministeria vibrans paralog 1 (c191\|m.702) | 85D, 87S, 89S, 179E, 210D | 89S, 92D, 93D, 310S, 210D | 124E, 174N, 176D, 178P, 179E | EGF-like present 5 times |  | SP, TM | NPXY present, KAWQQWEE | I | Cell membrane protein likelihood (0.9827) |
| Ministeria vibrans paralog 2 (c46_198) | 73D, 75S, 77S, 194E, 225D | 77S, 80N, 81D, 339A, 225D | 122D, 189N, 191D, 193P, 194E | EGF-like present 14 times |  | TM | NPXF present (2), KLAQVAAD | I | Cell membrane protein likelihood (0.9941) |
| Ministeria vibrans paralog 3 (c334\|m.1235) | 594D, 596S, 598S, 682E, 714D | 598S, 601N, 602D, 824A, 714D | 633E, 677N, 679H, 681S, 682E | EGF-like present at N-terminal and C-terminal |  | SP, TM | NPXY present (2), KLITGAAD | I | Cell membrane protein likelihood (0.9985) |
| Didymoeca costata | 444D, 446S, 448S, 530E, 562D | 446S, 449D, 451D, 652A, 562D | 524S, 527D, 529P, 530E | None | 1673 | SP, TM | None | I | Cell membrane protein likelihood (0.7458) |
| Sphaeroforma arctica | 119D, 121S, 123S, 231E, 260D | 123S, 125D, 127D, 296A, 260D | 195E, 227N, 229D, 230P, 231E | EGF-like present 7 times | 1141 | TM | KGIQMVMD | I | Cell membrane protein likelihood (0.638) |

Supplemental Table 3: RSEM expression values for protists that have a complete set of IMAC components, integrins with extra domains, or an unexpected integrin protein. Values are given in transcripts per million (TPM).

| Species | ITA | ITB | Talin | PINCH | ILK | Paxillin | Actinin | Actin | eF1α | Vinculin |
|---|---|---|---|---|---|---|---|---|---|---|
| Filamoeba nolandi | 10.23 | 91.07(?) | 45.54 | 30.62 | 11.71 | 0.00 | 138.81 | 2417.45 | 5932.12 | 36.07 |
| Difflugia sp. D2 | 0.00 | 0.00 | 0.00 | 6.01 | N/A | 0.00 | 63.16 | 12.24 | 0.00 | 0.00 |
| Flabellula citata | 16.31 (paralog 1), 9.81 (paralog 2), 14.56 (paralog 3), 4.90 (paralog 4), 0.67 (paralog 5) | 421.32 | 18.56 | 34.78 | 5.54 | 25.09 | 109.90 | 2708.65 | 6180.65 | 7.28 |
| Nebela sp. | 3.73 | 11.34 | 3.95 | 85.57 | N/A | 25.24 | 444.60 | 653.00 | 2388.17 | 6.42 |
| Rhizamoeba saxonica | 174.00 | 52.55 (paralog 3), 82.61 (paralog 2), 121.63 (paralog 1) | 25.63 | 29.96 | 3.23 | 13.13 | 106.07 | 12079.17 | 5322.57 | 20.45 |
| Cryptodifflugia operculata | 4.93 (paralog 1), 1.80 (paralog 2) | 7.95 | 50.58 | 2.40 | 2.47 | 15.50 | 5.55 | 272.10 | 1857.80 | 9.65 (c7512_g1_i1) |
| Nolandella sp. AFSM | 2.77 | N/A | 9.84 | 41.65 | N/A | 11.66 | 3.44 | 2929.71 | 3621.35 | 12.38 |
| Arcella intermedia | 2.98 | N/A | 4.91 | 1.19 | N/A | 11.19 | 43.20 | 110.29 | 2001.59 | 4.71 (c17090_g2_i1) |
| Rigifila ramosa | N/A | 810.35 | 348.93 | 59.65 | N/A | 141.87 | 127.94 | 18519.18 | 2284.60 | 7.37 |
| Pygsuia biforma | 2.14 | 1.45 | 41.95 | 111.50 | N/A | 13.47 | 103.13 | 17397.16 | 7071.38 | N/A |
| Ministeria vibrans | 1.71 (paralog 1), 2.90 (paralog 2), 20.02 (paralog 3) | 0.00 (paralog 1), 5.17 (paralog 2), 10.56 (paralog 3) | 3.88 | 13.43 | 3.38 | 13.43 | 8.72 | 277.70 | 310.59 | 1.15 |

Supplemental Table 4: (A) Summary of amoebozoan type I ITA, listing the number of paralogs and the beta-propeller, FG-GAP, and canonical motifs. The table also shows the presence of a signal peptide and transmembrane region, C-terminal interacting motif, GXXXG motif, and Ig-fold-like/Cadherin domain. (B) Summary of amoebozoan type II ITA, listing the number of paralogs and the beta-propeller, FG-GAP, and canonical motifs. The table also shows the presence of a signal peptide and transmembrane region, C-terminal interacting motif, GXXXG motif, and extra domains attached to ITA.
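As a reminder of the unit used in Supplemental Table 3, TPM normalizes each transcript's read count by its length and rescales the result so that the values in one sample sum to one million. The following is a minimal sketch of that arithmetic in Python and is not the RSEM implementation: RSEM works from estimated (fractional) counts and effective transcript lengths, whereas this sketch assumes plain counts and lengths. The function name `tpm` and the toy numbers are hypothetical.

```python
def tpm(counts, lengths):
    """Convert per-transcript read counts to transcripts per million (TPM)."""
    if len(counts) != len(lengths):
        raise ValueError("counts and lengths must have the same size")
    # Length-normalized rate for each transcript (reads per base).
    rates = [c / l for c, l in zip(counts, lengths)]
    total = sum(rates)
    if total == 0:
        # No reads at all: every transcript gets a TPM of zero.
        return [0.0] * len(rates)
    # Rescale so that the values sum to one million.
    return [r / total * 1_000_000 for r in rates]

# Toy values for illustration only, not taken from the study.
example = tpm([10, 20, 30], [1000, 2000, 1000])
```

In this toy case the first two transcripts have the same reads-per-base rate and so receive the same TPM (about 200,000 each), while the third receives about 600,000. Because TPM is a within-sample proportion, the values in Supplemental Table 3 are most safely compared within a species rather than across species.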