SUPPORTING RESULTS

Generation of the Initial STAND Markov Model

To obtain a large number of STAND NTPase domains from databases, we generated a hidden Markov model to serve as a probe sensitive to both NACHT and NB-ARC domains.

We first selected an initial set of 207 canonical NB-ARC and NACHT domain peptide sequences, including NB-ARC domains of R-proteins, APAF1, and related fungal and bacterial

NTPases as well as the NACHT domains of NOD-like receptors. The numbers of NACHT and

NB-ARC sequences in this starting set were approximately equal. Starting with these 207 sequences, we used the CD-HIT program (1) to cluster the sequences by degree of sequence identity and to pick representative sequences that reflected their overall diversity (see Methods), leaving 133 sequences that we aligned using MUSCLE (2) and manual realignment. This alignment, which included 68 NACHT and 65 NB-ARC domain sequences, was used as the training set to generate a custom STAND hidden Markov model (HMM) using the hmmbuild program from the

HMMER package (3). While this custom HMM, which we refer to as STAND-HMM, is tuned for the NB-ARC and NACHT subfamilies of the STAND NTPases, it is also sensitive to the

SWACOS and MalT clades, but only weakly to the MNS clade.

Mining of GenBank for NTPase Domains and Subsequent Iterative Alignments and Phylogeny

The STAND Markov model, STAND-HMM, was used to probe the NCBI NR protein set containing 10,565,004 sequences, thereby identifying 15,500 STAND NTPases with scores of 30 or greater, a threshold that our analysis indicated was appropriate for detecting known NACHT and NB-ARC sequences while rejecting fragmentary and spurious matches (Figure S2). The CD-HIT program (1) was then used to select a representative subset of these sequences with a 90% identity cutoff, resulting in a set of 5,679 representative NBS domains. The HMMER hmmalign program, which aligns sequences using an HMM as a guide, was then used to create an initial alignment of these sequences with the STAND-HMM. Because this Markov model-guided alignment was well aligned only in the most highly conserved regions of the STAND NTPases,

SeaView (4) was used to align regions with less conservation using the MUSCLE (2) or MAFFT

(5) options, as well as manual alignment. For ease of handling and analysis, the alignment was split into two separate subset alignments of 4,178 NB-ARC domains (including SWACOS and MalT, plus 10 NACHT domain sequences from Leipe et al. as outgroups) and 1,501 NACHT domains that were handled separately before reuniting for final ML tree generation. To further simplify the process of alignment, a quarter of these domains (1,045 sequences for the NB-ARC alignment, and 377 for the NACHT alignment) were randomly selected. Of these, 817 NB-ARC and 286 NACHT domains had associated C-terminal HETHS domains (~150 amino acids) in their source protein sequences. In order to benefit from the additional phylogenetic information included in these HETHS domains, the corresponding HETHS domain sequences were appended to the 817 NB-ARC + HETHS sequences and 286 NACHT + HETHS sequences, and aligned as described above. Maximum likelihood phylogenies were inferred from these remaining 817 NB-ARC and outgroup sequences and 271 NACHT sequences using RAxML, with 100 bootstraps.

To identify rogue taxa, the script 'compare_trees_leaf_correlation.pl', developed in-house, was run on these sets of 100 bootstraps to calculate correlation coefficients of reciprocal distance, where distance is the number of branches between two tree leaves (Supporting

Methods). Taxa with correlation coefficients of reciprocal distance less than 0.8 were excluded from further analysis, further winnowing the number of remaining sequences to 664 NB-ARC and 271 NACHT sequences (Figure S2). The minimal value of 0.8 was chosen as a threshold to

remove the taxa with the greatest variability in nearest neighbors while keeping the most

phylogenetically stable taxa. Lastly, a set of 15 NB-ARC, 2 MalT, 2 SWACOS, and 10 NACHT

NTPase domains previously identified by Leipe and coworkers was added to the sequence alignments to serve as markers for clades identified in that earlier work (6). MalT, SWACOS, and NACHT domains served as outgroups. Also, a single TIR-NBS-WD40 protein, identified in the

Medicago truncatula genome sequencing project, which was of interest from an evolutionary

standpoint, was included. The alignments were combined into a single alignment containing 964

NBS + HETHS domains. DNA sequences were highly divergent and were not used for

phylogeny reconstruction.

Survey of NBS, NBS-LRR, NBS-WD40, NBS-TPR and NBS-ARM Domain Structures in the NCBI NR Protein Database.

The NCBI Protein NR Set from March 11, 2010 was scanned for matches to the STAND-HMM with scores of 30 or greater, as described in Supporting Methods. This cutoff was chosen to limit matches to complete STAND domains closely matching the model. Proteins with matches to the STAND-HMM were then scanned for C-terminal matches to ARM, LRR, TPR, and WD40 repeat Markov models obtained from the PFAM 26.0 database (see Supporting Methods), and associated with NCBI taxonomic information. The totals of NBS, NBS-ARM, NBS-LRR, NBS-TPR, and NBS-WD40 domain combinations in various eukaryotic clades are compiled in Table S5.

In the NCBI NR Database (March 11, 2010 release), 12,549 (82%) of identified STAND

NBS domains are in eukaryotic proteins, 2,685 (17.5%) in bacterial proteins, and fractions of a percent in archaeal and viral proteins. However, the vast majority (99.7% or 4,376 out of 4,388) of proteins with NBS-LRR architecture occur in eukaryotic sequences, the only exception being a small group of 12 NACHT-LRR proteins from actinobacteria and planctomycetes (Table S3).

In contrast, the NBS-WD40 architecture occurs within eukaryotes and prokaryotes in

approximately the same proportions as the NBS domain itself. The NBS-TPR domain

architecture, on the other hand, is predominantly bacterial. The NBS-ARM domain architecture

is extremely rare, with only 51 identified occurrences across the NCBI NR data set, including

archaeal, bacterial, and metazoan clades. Among eukaryotic sequences from NCBI, the NBS-LRR architecture is restricted to plants and metazoans, except for a single protein from

Tetrahymena thermophila (XP_001030846.1) that possesses an unusual combination of an

NTPase plus a series of WD40 and LRR repeats. Aside from some homologs in other alveolates and in the related rhizaria, the closest homologs of the STAND NTPase of this Tetrahymena

protein are eubacterial, suggesting a possible acquisition by horizontal gene transfer.

Survey of NBS-LRR, NBS-WD40, and NBS-TPR Domain Combinations in Sequenced Eukaryotic Genomes

Protein sequences and gene annotation data for 46 representative eukaryotic genomes

representing a broad phylogenetic sampling were downloaded from their respective genome

repositories (Dataset S1). These 46 full-genome sequences, which included both finished and draft genomes, were downloaded in October 2013 and therefore contain sequences not represented in the NCBI NR Database March 11, 2010 release described above. Protein data sets were scanned for the domain combinations of interest as described in Supporting Methods, and the number of each type of domain architecture in each genome identified is presented in Figure

2. In this case, a maximum E-value of 10 was required for individual matches so as to be permissive, and allow for maximal sensitivity for detection of STAND NTPases and repeat domains. Minor differences between the NCBI NR set survey and the eukaryotic genomes survey reflect the higher sensitivity of the latter survey, and the inclusion of recently-sequenced

and draft genomes from October 2013. The NBS-LRR domain combination appears in 25 of the

46 surveyed eukaryotic genomes. As in the NCBI database survey, there are few occurrences of

NBS-LRR sequences outside of land plants and metazoa. Consistent with the NCBI database

survey, among the 46 eukaryotic genomes surveyed, the NBS-WD40 and NBS-TPR architectures

have a fairly broad phylogenetic distribution, whereas NBS-LRRs are predominantly associated

with plants and metazoans. The NBS-LRR architecture is absent from the genome of

Mnemiopsis leidyi, a representative of the ctenophores (comb jellies), which have been suggested to be the earliest diverging metazoans (7), but there are ten proteins with this architecture in the

sponge Amphimedon queenslandica, nine of which are clearly homologs of the NLRs of

vertebrates. This suggests that NLRs might have evolved after the divergence of ctenophores and

all other metazoans. NBS-LRR proteins are absent in the unicellular chlorophyte algae

Ostreococcus tauri and Chlamydomonas reinhardtii, but their presence in all of the multicellular

viridiplantae, including reports of a remarkable variety of R-proteins in the liverwort Marchantia

polymorpha (8), suggests that proteins with the NBS-LRR architecture evolved very early in the

evolution of multicellular plants. Furthermore, the observation in the multicellular alga,

Klebsormidium flaccidum (9) of two proteins of the TIR-NBS-LRR domain structure

(kfl00170_0010_v1.1, and kfl00295_0030_v1.1) with STAND NTPases similar to those of

Clade I seemingly pushes the genesis of R-proteins back before the divergence of charophyte

algae and land plants.

Of the six NBS-LRR domain combinations detected outside the metazoan and viridiplantae

clades, most seem to be in some ways exceptional, with other domains interposed between the

NBS and the LRR, or with weak or truncated NTPase or LRR hits (Supporting Table 6). It is likely that the fairly permissive criteria of this survey contributed to their detection. Upon

examination, most of these NBS-LRR candidates appear to be poor analogs of R-proteins or NOD-like receptors, with such architectures as LRR-NBS-LRR (Ectocarpus siliculosus,

Esi0031_0015), Trans-Membrane-NBS-LRR (Naegleria gruberi,

jgi|Naegr1|54705|fgeneshHS_pg.scaffold_166000001), and NBS-WD40-LRR (Tetrahymena

thermophila, TTHERM_01006540), with STAND-NTPases apparently unrelated to those of R-

proteins or NLRs. Another NBS-LRR protein sequence from Dictyostelium discoideum is of

poor quality and appears to contain leucine rich repeats flanked only by fragmentary STAND-

NTPases in an NBS-LRR-NBS arrangement (DDB0191297|DDB_G0268636). Yet another

sequence with an NBS-LRR domain structure reported in the fungus Laccaria bicolor

(designated jgi|Lacbi2|469284|fgenesh1_pg.32_#_5) is nearly a perfect match to ARGH35

(XP_002324939.1), an R-protein from Populus trichocarpa, and is likely either a recent horizontal gene transfer from plants or a contaminant. Of the six non-plant non-metazoan candidate NBS-LRR proteins, only the Monosiga brevicollis protein

(jgi|Monbr1|25471|fgenesh2_pg.scaffold_10000121) seems to be a good structural analog of

NLRs and R-proteins, but has a NACHT NTPase that is closest to those of Telomerase Protein

Component 1, suggesting an independent evolution of the architecture. Although no close analogs of R-proteins or NLRs were identified in this survey, the identification of analogs such as the Monosiga protein suggests that evolution in and horizontal gene transfer from other clades is plausible for R-proteins and NLRs.

Occurrence of NBS-TPR and NBS-WD40 domain combinations in Sequenced Eukaryotes

The NBS-TPR and NBS-WD40 domain combinations appear in 35 and 29 of the 46 surveyed

eukaryotic clades, respectively, and in 15 and 12 of the 22 surveyed non-plant non-metazoan

clades. Not only do these architectures appear in more clades outside of plants and animals, but

they often appear in far greater diversity in those clades as well (Figure 3). It is intriguing that NBS-TPR proteins have undergone significant diversification in certain basal

deuterostomes (Branchiostoma floridae (31 proteins), Strongylocentrotus purpuratus (10

proteins), and Saccoglossus kowalevskii (16 proteins)), the cnidarian Hydra magnipapillata (46

proteins), the basal metazoan Trichoplax adhaerens (11 proteins), the mushroom, Laccaria bicolor (34 proteins), and two stramenopiles (kelp/diatoms/oomycetes), Ectocarpus siliculosus

(28 proteins) and Phaeodactylum tricornutum (10 proteins). Likewise diversification of NBS-

WD40 proteins has occurred in the unicellular alga Chlamydomonas reinhardtii (15 proteins),

Laccaria bicolor (57 proteins), the basal deuterostomes Saccoglossus kowalevskii (13 proteins) and Branchiostoma floridae (15 proteins), and certain free-living alveolates (protists), Tetrahymena thermophila (29 proteins) and Paramecium tetraurelia (23 proteins). These proteins are, in general, poorly characterized, and it is not known if they have an immune function, or if they play a role in programmed cell death in plants, animals, or in other eukaryotes.

AU Tests on a Set of Alternative Topologies

As an alternative to using the constrained trees, we also tested precisely defined

rearrangements of the ML tree using the AU test, and in particular, those with different

attachment points for the NACHT and NB-ARC clades. Topologies that placed the NACHT

clade (XVII) in a sister position with R-proteins (Table S4, Figure S3), or nested within the

clade including the NB-ARC domains of R-proteins and APAF1-like NBS-WD40 proteins (I-X),

(ALT1A, ALT2A, ALT3A) were rejected by the AU test (P values of 0.0003, 3 × 10^-5, and 0.001

respectively). Likewise the converse rearrangements, moving R-proteins or their immediate

parent clades into a sister position with NACHT domains (ALT1B, ALT2B, ALT3B) were

rejected (P values of 2 × 10^-3, 3 × 10^-8, and 0.002). Topologies that moved clade XVIII, that of NOD-like receptors and their closest bacterial cousins, into a sister position with R-proteins (ALT12A, ALT13A, ALT14A, and ALT15A) were also strongly rejected (P values of 0.0001, 0.0002, 0.0002, and 0.009), as were the converse rearrangements (ALT12B, ALT13B, ALT14B, and ALT15B; P values of 1 × 10^-63, 1 × 10^-36, 3 × 10^-5, and 2 × 10^-62). Furthermore, topologies that moved the NB-ARC domains of R-proteins (I-IV) toward the base of the NB-ARC phylogeny

(ALT7 and ALT8) were rejected by the AU test (P values of 1 × 10^-4 and 0.0002, respectively), although moving R-proteins to the base of the clade represented by I-X (ALT6) was permitted

(P value 0.153). Although many more modest rearrangements of the tree topology than these were plausible, tree topologies that moved the NB-ARC domains out of their nested position within the NB-ARC clade (I-XV) were rejected.

Detailed Description of the Reconstruction of the Ancestral Domain Structure associated with the common ancestor of the STAND NTPases of R-proteins and NLRs

Another way to assess the likelihood that the NBS-LRR architecture arose in the common ancestor of plant R-proteins and animal NLRs is to use the reconstructed ML tree to determine the likely domain structure of the ancestral protein that contained the last common ancestor

(LCA) of the STAND NTPase domains of R-proteins, NLRs, and presumably, all extant NB-

ARC and NACHT domains. Three complementary methods employing the phytools R package

(see Methods) were used to generate ancestral state likelihoods, two of which calculate marginal ancestral state likelihoods including the ML method (Figure 2, Table S5, Table S6) and the continuous time Markov chain method (CTMC, Figure S4A, Table S5, Table S6), and a third method which uses maximum likelihood methods to calculate conditional scaled likelihoods

(Figure S4B, Table S5, Table S6). Of these, the two marginal likelihood methods are somewhat more standard and typically give similar results, but since conditional scaled reconstructions only

use child nodes to calculate likelihoods in a parental node, we reasoned that such a reconstruction would be less subject to uncertainty about early branching in the tree, and therefore complementary to the other two methods.

Based on the ML tree alone, the marginal ML (likelihood=0.00008) (Table S5, Figure 2), the marginal CTMC (likelihood=0.00008, Table S5, Figure S4A), and conditional scaled ML

(likelihood=0.00647, Table S5, Figure S4B) reconstructions, all gave low probabilities for an

NBS-LRR domain structure in the common ancestor (LCA) of the STAND NTPases of R-proteins and NLRs. Instead, the marginal reconstructions predict an NBS-TPR domain structure of the LCA (ML: likelihood=0.99432, CTMC: likelihood=0.99432) and the conditional scaled reconstruction predicts either an NBS-TPR (likelihood=0.75602) or NBS (non-repeat associated) domain structure (likelihood=0.21342) (Table S5, Table S6, Figure 2, Figure S4).

To account for topological uncertainty in the ML tree upon which these ancestral state reconstruction probabilities were based, we performed 280 RAxML fast bootstraps (see

Supporting Methods), rooting the bootstrapped trees with clades XXI and XXII as outgroups, and reconstructing ancestral state likelihoods for each tree. Average domain architecture likelihoods across the 280 bootstraps were calculated for the LCA, the common ancestor of clades I-X, clades XVI-XX (NACHTS), and the tree root (Table S5, A-C). For the marginal-

ML, CTMC, and conditional-ML reconstructions of the bootstrapped trees, the NBS-LRR domain combination had probabilities of 0.00013, 0.00013 and 0.00305 in the LCA, and probabilities of an NBS-TPR domain structure of 0.98155, 0.98155 and 0.88921, respectively

(Table S5). Likewise, the reconstructed structure of the protein at the root of the tree, containing the presumed ancestor of the NB-ARC, NACHT, SWACOS, and MalT NTPases appears most likely to have been a protein with an NBS-TPR domain architecture (bootstrap average likelihoods of 0.98207 for all reconstruction methods). However, the common ancestor of clades

I-X appears most likely to have been a protein with an NBS-WD40 domain structure (marginal

ML bootstrap average, NBS-WD40 P=0.68577, Table S5A), whereas the common ancestor of

NACHTS appears to have had an NBS without associated repeats (marginal ML bootstrap average P=0.83084, Table S5A), which suggests that the evolutionary path from NBS-TPR architecture in the common ancestor of the STAND NTPases of R-proteins and NLRs proceeded through NBS-WD40 and NBS architectures to ultimately give the NBS-LRR domain structures of R-proteins and NLRs respectively.

Ancestral State Reconstruction Using Alternate Midpoint Rooting Method

To assess the degree to which rooting method could affect our conclusions based on ancestral state reconstruction, we performed ASR using the ML tree and bootstrap trees rooted using the midpoint method, implemented in the 'phangorn' R package. This method assumes a constant rate of evolutionary change, which is often an incorrect assumption, and outgroup-based rooting is generally preferred. In the case of the STAND NTPases, particularly long branch lengths among certain clades, including R-proteins and NLRs, suggest a higher rate of evolution in those branches. This is not particularly surprising, since innate immune receptors are often under diversifying selection pressure (10). Despite these caveats, the midpoint method offers a way to test the degree to which our conclusions about the ancestral states of STAND NTPases depend on our rooting method. For ML marginal reconstructions that use both child and parent nodes to calculate likelihoods, the choice of the rooting method is not expected to have much effect on the likelihoods of a particular structure in a given node. On the other hand, for conditional scaled reconstructions, which reconstruct ancestral nodes depending only on child nodes, the calculated ancestral state likelihoods could change. In both cases, however, a different rooting method could change which nodes were identified as common ancestors.

The midpoint method placed the root of the ML tree within the NACHT clade itself (Figure S4 D, E & F), inconsistent with earlier conclusions about the evolution of STAND NTPases (6).

As with the outgroup-rooted reconstructions, midpoint-rooted ASR suggests that an LCA with an

NBS-LRR structure is unlikely (ML marginal likelihood=0.01034, ML conditional-scaled likelihood=0.01650, Table S5 D&E). However, whereas the ancestral state reconstructions using outgroup rooted trees predict an ancestral NBS-TPR domain structure, both the ML Marginal

(Figure S4D, Table S5D) and ML Conditional Scaled reconstructions (Figure S4E, Table S5E) using midpoint-rooted trees suggest the LCA was most likely a non-repeat-associated NBS ancestral protein with bootstrap average likelihoods of 0.79098 and 0.76342 respectively. The

NBS-TPR ancestral structure does come in second place, however, with likelihoods of 0.14672 and 0.16451 respectively. As with the outgroup-rooted reconstructions (Figure S4 A-C, Table

S5 A-C), midpoint-rooted reconstructions suggest non-NBS-LRR intermediates along the path from the LCA to the R-proteins and NLRs, implying that the NBS-LRR architectures of R-proteins and NLRs were not inherited from their common ancestor. Based on previous conclusions of the relationships among STAND subclades (6) and our concerns about the midpoint rooting method, we deem it likely that the protein containing the LCA of the STAND

NTPases of R-proteins and NLRs possessed an NBS-TPR domain architecture.

Supporting Methods

Generating the custom Markov Model STAND-HMM (NACHT/NB-ARC Group)

A set of 207 representative NACHT, NB-ARC, and related domain sequences was aligned. This set initially included MalT and LuxR-type NTPase domains. The program CD-HIT (1) was used to cluster the sequences into groups with 40% or greater identity and pick a representative of each cluster. A Markov model was generated using hmmbuild from the HMMER software package

(3) version 3.0rc2, available from http://hmmer.janelia.org/. The resulting Markov model was found to be more sensitive for NACHT sequences than NB-ARC sequences, so additional NB-ARC sequences were added, and the alignment was adjusted to accommodate the new sequences using MUSCLE and manual realignment. In the end, 68 NB-ARC-containing sequences and 65 NACHT-containing sequences were included. The resulting HMM, STAND-HMM, was sensitive to NB-ARC, NACHT, SWACOS, and MalT NTPase domains.
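The clustering step can be illustrated with a minimal sketch of greedy incremental clustering, the general strategy CD-HIT is known for. This is not the CD-HIT implementation: the `identity` function below is a hypothetical stand-in that uses Python's `difflib` matching blocks in place of a real alignment, and the toy sequences and function names are assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Identical residues in matching blocks divided by the shorter length.

    Assumption: difflib matching blocks stand in for a real alignment.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matches = sum(block.size for block in matcher.get_matching_blocks())
    return matches / min(len(a), len(b))


def greedy_cluster(seqs: dict, threshold: float = 0.4) -> dict:
    """Return {representative_id: [member_ids]} by greedy incremental clustering."""
    clusters = {}
    # Longest sequences are considered first and become representatives.
    for name in sorted(seqs, key=lambda n: (-len(seqs[n]), n)):
        for rep in clusters:
            if identity(seqs[rep], seqs[name]) >= threshold:
                clusters[rep].append(name)  # join the first matching cluster
                break
        else:
            clusters[name] = [name]  # no match: start a new cluster
    return clusters


# Toy peptides (hypothetical, not taken from the training set).
toy = {
    "a": "MKTLLVAGGSGKTTLA",
    "b": "MKTLLVAGGSGKTT",
    "c": "QWERYPIHNDCFQWER",
}
reps = greedy_cluster(toy, threshold=0.4)
```

The keys of `reps` play the role of the representative sequences carried forward into the alignment; the 0.4 default mirrors the 40% identity cutoff stated above.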

Generating the custom Markov Model JMU_HETHS (HETHS domain)

HETHS domains were identified in NB-ARC-domain containing proteins by manual inspection, and 99 representative NB-ARC-associated HETHS domains were aligned using

MUSCLE followed by manual adjustment. The resulting alignment was used to generate a

Markov model using hmmbuild (HMMER package).

Unstable taxon removal:

The compare_trees_leaf_correlation.pl (URL) Perl script (developed in house) assesses the degree of consistency of the placement of a tree node across a set of multiple bootstraps by measuring the variability of distance between each leaf node and each other leaf node across all binary combinations of all trees in a set of bootstraps.

The distance metric used is the number of edges between pairs of leaf nodes. This value is determined for a combination of two leaf nodes by counting edges while ascending the tree from each of two leaf nodes and adding the number of edges when a common parental node is reached.

When using actual edge distances, the combined Pearson correlation coefficient for a given leaf node tended to be insensitive to changes in nearest neighbor relationships. This is

understandable, since each leaf node will tend to have only a couple of nearest neighbors of a given distance and many other leaves with remote distance, such that perturbations in the small number of nearest-neighbor distances are overwhelmed by the relatively many more distant leaves. However, using the reciprocal of this edge count distance gave greater sensitivity to perturbations of nearest neighbor relationships than using actual edge count distances when calculating correlation coefficients.

For each leaf node, across each binary combination of trees, the Pearson and Spearman correlations of each edge distance, and the Pearson correlation of each reciprocal edge distance to each other node is calculated. Individual Pearson correlations for each leaf node in each combination of trees are combined by taking the square root of the sum of the squares of each correlation, divided by the number of combinations of trees (one half the number of trees times the number of trees minus 1).
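The calculation described above can be sketched as follows. This is not the in-house Perl script; it is a minimal Python illustration under stated assumptions: trees are given as nested tuples of leaf names, only the Pearson correlation of reciprocal edge distances is computed, and the combining step is read as a root mean square over all pairs of trees (the wording above admits other readings). All function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def leaf_paths(tree, prefix=()):
    """Map each leaf name to the tuple of child indices leading to it from the root."""
    if isinstance(tree, str):
        return {tree: prefix}
    paths = {}
    for i, child in enumerate(tree):
        paths.update(leaf_paths(child, prefix + (i,)))
    return paths


def edge_distance(path_a, path_b):
    """Count edges from each leaf up to their deepest common parental node."""
    shared = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        shared += 1
    return (len(path_a) - shared) + (len(path_b) - shared)


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def leaf_stability(trees):
    """Combined reciprocal-distance correlation per leaf over all pairs of trees."""
    all_paths = [leaf_paths(t) for t in trees]
    leaves = sorted(all_paths[0])
    scores = {}
    for leaf in leaves:
        others = [o for o in leaves if o != leaf]
        # One profile per tree: reciprocal edge distance to every other leaf.
        profiles = [
            [1.0 / edge_distance(p[leaf], p[o]) for o in others] for p in all_paths
        ]
        corrs = [pearson(a, b) for a, b in combinations(profiles, 2)]
        # Assumed reading of the text: root mean square over tree pairs.
        scores[leaf] = sqrt(sum(r * r for r in corrs) / len(corrs))
    return scores


# Toy bootstrap set: t3 swaps leaves B and D relative to t1.
t1 = ((("A", "B"), "C"), ("D", "E"))
t3 = ((("A", "D"), "C"), ("B", "E"))
scores = leaf_stability([t1, t1, t3])
unstable = sorted(leaf for leaf, s in scores.items() if s < 0.8)
```

Leaves collected in `unstable` correspond to the taxa that would be excluded under the 0.8 threshold used in Supporting Results. Because the correlations are squared before combining, a strongly negative correlation contributes as much as a strongly positive one in this sketch.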

Taxonomy-Based Classification of NBS-Containing Proteins

The ‘fastacmd’ program from NCBI was run using the ‘-T’ flag to obtain taxon names and NCBI taxonomy ids for 15,458 protein sequences derived from the NCBI NR database and known to contain matches to the STAND-HMM with scores equal to or greater than 30.

Taxonomy data was downloaded from ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz, parsed, and loaded into a MySQL relational database. Using data from the ‘nodes.dmp’ and ‘names.dmp’ files, complete taxonomies were obtained for organisms by matching the NCBI taxonomy ids with the corresponding taxon names, parental taxonomy ids, etc. In cases where this information was not available in the downloaded data, taxonomies were fetched from NCBI using the LWP Perl module, or manually downloaded. Taxonomies were saved in a flat-file database. Taxa were assigned colors based on their taxonomy for the purpose of representation in the Dendroscope program.
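As an illustration of how a complete taxonomy can be assembled from the taxdump tables, the sketch below walks parent links from a taxonomy id up to the root. It is a minimal Python example, not the Perl/MySQL pipeline used here; the function names are hypothetical, and the three toy rows are truncated to the leading columns of each file.

```python
def parse_dmp(lines):
    """Split NCBI taxdump rows, which use tab-pipe-tab field separators."""
    for line in lines:
        line = line.rstrip("\n")
        if line.endswith("\t|"):
            line = line[:-2]  # drop the row terminator
        yield line.split("\t|\t")


def load_taxonomy(nodes_lines, names_lines):
    """Build {tax_id: parent_id} and {tax_id: scientific name} lookups."""
    parent = {}
    for fields in parse_dmp(nodes_lines):
        parent[int(fields[0])] = int(fields[1])
    name = {}
    for fields in parse_dmp(names_lines):
        if fields[3] == "scientific name":
            name[int(fields[0])] = fields[1]
    return parent, name


def lineage(tax_id, parent, name):
    """Walk parent links up to the root, which is its own parent."""
    names = []
    while True:
        names.append(name.get(tax_id, str(tax_id)))
        if parent[tax_id] == tax_id:
            break
        tax_id = parent[tax_id]
    return list(reversed(names))


# Toy rows in taxdump layout (real files carry additional columns).
nodes = [
    "1\t|\t1\t|\tno rank\t|\n",
    "131567\t|\t1\t|\tno rank\t|\n",
    "2759\t|\t131567\t|\tsuperkingdom\t|\n",
]
names = [
    "1\t|\troot\t|\t\t|\tscientific name\t|\n",
    "131567\t|\tcellular organisms\t|\t\t|\tscientific name\t|\n",
    "2759\t|\tEukaryota\t|\t\t|\tscientific name\t|\n",
    "2759\t|\teukaryotes\t|\t\t|\tgenbank common name\t|\n",
]
parent, name = load_taxonomy(nodes, names)
path = lineage(2759, parent, name)
```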

Domain Structure Classification of NBS-Containing Proteins

Using the hmmscan program (HMMER package), 4,270 protein sequences already known to

contain matches to the STAND-HMM were scanned for domain matches against the PFAM 26.0

domain model database (ftp://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam26.0/). The

‘--domtblout’ option was used with hmmscan to output a list of per-domain hits. The per-domain output was then imported into a flat-file database for each sequence. Since a coiled-coil domain profile was not available in PFAM, the Multicoil2 program was used to identify coiled-coil regions (11). To generate a domain structure classification for each protein sequence, the target names (e.g. TPR, LRR, NB-ARC) of its hits against the PFAM database were ordered by position within the sequence and concatenated together separated by brackets and hyphens.

These classifications were searched using Perl regular expression functions corresponding to domain classifications of interest: NACHT-LRR, NB-ARC-LRR, etc. Depending on the resulting classification, taxa were assigned colors for the purpose of tree representation in the

Dendroscope program (12).
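The classification step can be sketched as follows. This is a Python illustration rather than the Perl code that was used: hits are ordered by start position, joined as bracketed names separated by hyphens, and searched with regular expressions. The exact patterns, the model-name prefixes (LRR, TPR, WD40, Arm), and the function names are assumptions made for the example.

```python
import re


def domain_string(hits):
    """Order (target_name, start) hits by position and join them as [A]-[B]-[C]."""
    ordered = sorted(hits, key=lambda h: h[1])
    return "-".join(f"[{name}]" for name, _ in ordered)


# Assumed patterns: an NBS hit followed, anywhere C-terminal, by a repeat hit.
NBS = r"\[(?:NB-ARC|NACHT)\]"
PATTERNS = {
    "NBS-LRR": re.compile(NBS + r".*\[LRR[^\]]*\]"),
    "NBS-TPR": re.compile(NBS + r".*\[TPR[^\]]*\]"),
    "NBS-WD40": re.compile(NBS + r".*\[WD40[^\]]*\]"),
    "NBS-ARM": re.compile(NBS + r".*\[Arm[^\]]*\]"),
}


def classify(hits):
    """Return every architecture label whose pattern matches the domain string."""
    s = domain_string(hits)
    labels = [label for label, pattern in PATTERNS.items() if pattern.search(s)]
    if labels:
        return labels
    return ["NBS"] if re.search(NBS, s) else []


# Toy hits: (model name, start coordinate in the protein).
example = [("LRR_1", 620), ("NB-ARC", 150), ("LRR_8", 700)]
label = classify(example)
n_terminal_repeat = classify([("LRR_1", 10), ("NACHT", 200)])
```

In this sketch a repeat that lies N-terminal to the NBS does not count toward an NBS-repeat label, in line with the C-terminal matching described in Supporting Results.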

Detailed AU Test Method

Constrained trees were generated using a Perl script developed in-house, ‘constraint_tree_maker.pl’. Using the RAxML feature that permits generation of an ML tree based on a predefined constraint (via the -f d, -g, and --no-bfgs options), the constrained ML trees were generated prior to performing multiscale bootstraps and the AU and other tests using the CONSEL program (below). Since it was observed that optimization of the ML tree followed by multiscale bootstrap analysis did not always give identical P values, these two steps were performed in 9 parallel replicates, except for CON4, which was performed in 18 parallel replicates.

Tree topologies were redrawn using a perl script (move_tree_branch.pl URL) developed in-

house, and model parameters and branch lengths were re-optimized using the "-f e" option of

RAxML version 7.0.4. Subsequently, using the "-f g" option of RAxML version 7.2.8-ALPHA,

single-site-specific log likelihood values were calculated for the best tree (reference) and

alternative tree topologies. Using a perl script developed in house,

'RAxML_perSiteLLs_to_CONSEL.pl' (URL), the output 'RAxML_perSiteLLs' file was

converted to a CONSEL-compatible mt format, and the makermt, consel, and catpv programs

from the CONSEL package version 0.20 (13) were used to perform multiscale bootstraps and

calculate P values and generate reports for the SH and AU tests.

Ancestral State Reconstruction

Fast bootstraps that were to be used for ancestral state reconstruction were performed using the MPI version of RAxML with the '-f a' and '-k' options to calculate branch lengths. Ancestral state reconstruction by the ML method was performed using the ace (Ancestral Character

Estimation) function from the APE package version 3.4 (14) in R version 3.2.3, with both marginal and conditional probabilities of ancestral states calculated. Ancestral state reconstruction by the continuous time Markov chain (CTMC) method was performed using the

'rerootingMethod' function from the phytools R package version 0.5-20. Domain structure classifications of NBS-containing proteins (Supporting Methods) were coded as integers prior to import into the R environment, and the matrix for transitions between NBS, NBS-LRR, NBS-

WD40, NBS-TPR and NBS-ANK domain structures was assumed to be equal-probability for all transitions. Taxon names in the trees and structure classifications were altered to remove

characters illegal for taxon names under R. Perl scripts were used to reroot the ML and bootstrap trees and make them strictly dichotomous using clades XXI and XXII as outgroups.

Alternatively, where indicated, rooting was performed by the midpoint method using the

'midpoint' function from the phangorn R package version 2.0.4. Internal nodes were labeled numerically, or with Roman numerals representing the 22 clades. After rooting, nodes of interest were identified by node number and tabulated, including the last common ancestor of R-protein and NLR NTPases (LCA), the last common ancestor of NACHTS, the last common ancestor of clades I-X, and the tree root. Visualization of ancestral state probabilities was done using functions from the phytools package.
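To make the idea of conditional scaled likelihoods concrete, the sketch below runs a Felsenstein pruning pass under an equal-rates model for a discrete character. It is a Python illustration, not the APE or phytools code: the transition rate is fixed at an assumed value instead of being estimated by maximum likelihood, only four states are included, and the tree, tip names, and branch lengths are hypothetical.

```python
from math import exp

STATES = ["NBS", "NBS-LRR", "NBS-WD40", "NBS-TPR"]


def transition_matrix(t, rate, k):
    """Equal-rates transition probabilities over a branch of length t."""
    decay = exp(-k * rate * t)
    same = 1.0 / k + (k - 1.0) / k * decay
    diff = 1.0 / k - decay / k
    return [[same if i == j else diff for j in range(k)] for i in range(k)]


def conditional_likelihoods(node, tip_states, rate):
    """Pruning pass; a node is a leaf name or a list of (child, branch_length)."""
    k = len(STATES)
    if isinstance(node, str):
        observed = STATES.index(tip_states[node])
        return [1.0 if i == observed else 0.0 for i in range(k)]
    lik = [1.0] * k
    for child, length in node:
        child_lik = conditional_likelihoods(child, tip_states, rate)
        p = transition_matrix(length, rate, k)
        for i in range(k):
            # Probability of the data below this child given state i at the node.
            lik[i] *= sum(p[i][j] * child_lik[j] for j in range(k))
    return lik


def scaled(lik):
    """Rescale conditional likelihoods so that they sum to one."""
    total = sum(lik)
    return [x / total for x in lik]


# Toy tree: two NBS-TPR tips form a clade; an NBS-LRR tip is their sister.
tree = [([("tpr1", 0.1), ("tpr2", 0.1)], 0.2), ("lrr1", 0.5)]
tips = {"tpr1": "NBS-TPR", "tpr2": "NBS-TPR", "lrr1": "NBS-LRR"}
root = dict(zip(STATES, scaled(conditional_likelihoods(tree, tips, rate=1.0))))
best = max(root, key=root.get)
```

Only information from descendants enters each node's values here, which is the property that makes conditional scaled reconstructions depend on where the root is placed.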

Survey of NCBI NR Protein Database for NBS-LRR, NBS-WD40, NBS-TPR, and NBS-ARM domain structures.

Using the ‘--domtblout’ option to output domain-level hits, the hmmscan program (HMMER package) was used to scan the NR set for matches to the JMU_NNBAG1 HMM. These were filtered, giving a set of 15,500 domain hits in 15,458 protein sequences. The ‘hmmfetch’ program

(HMMER package) was used to extract multiple HMMs from the PFAM 26.0 database relating to the various repeats such that sets of LRR, TPR, WD40 and ARM repeat HMMs were created.

Using the ‘--domtblout’ option, hmmscan was used to scan the 15,458 sequences from above containing matches to ‘JMU_NNBAG1’ against these repeat HMMs. Using Perl regular expression functions, these sequences were classified as NBS-LRR, NBS-TPR, NBS-WD40, or NBS-ARM. Taxonomy of these 15,458 protein sequences was combined with the domain structure data, and the numbers of the various domain structures were totaled for each taxon of interest.
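A minimal sketch of reading per-domain hits from a HMMER3 domain table and applying a score cutoff is shown below, in Python rather than the Perl used for this work. It assumes that the cutoff of 30 is applied to the per-domain bit score (the text does not say whether the per-domain or full-sequence score was used); the function name and the two table rows are hypothetical.

```python
def parse_domtblout(lines, min_score=30.0):
    """Yield (target, query, domain_score, ali_from, ali_to) for hits passing the cutoff."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        # The final (23rd) column is a free-text description that may contain spaces.
        fields = line.split(None, 22)
        score = float(fields[13])  # per-domain bit score
        if score >= min_score:
            yield fields[0], fields[3], score, int(fields[17]), int(fields[18])


# Toy rows in domain-table column order (values are invented for illustration).
rows = [
    "# target name  accession  tlen  query name ...",
    "STAND-HMM - 300 seqA - 950 1e-40 140.2 0.1 1 1 1e-42 1e-40 139.8 0.1 "
    "5 290 160 455 150 460 0.95 toy description",
    "STAND-HMM - 300 seqB - 400 0.002 18.5 0.0 1 1 0.001 0.002 18.1 0.0 "
    "40 120 30 110 25 115 0.80 fragment",
]
kept = list(parse_domtblout(rows))
```

With hmmscan the target column holds the model name and the query column holds the protein, so `kept` pairs each retained model hit with its protein and alignment coordinates.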

Survey of Sequenced Genomes for NBS-LRR, NBS-WD40, NBS-TPR, NBS-ARM, and related Domain Structures

To analyze the phylogenetic distribution of different architectures of NBS-domain-containing proteins, a set of 47 sequenced eukaryotic genomes was selected to give a broad phylogenetic sampling across the eukaryotic tree of life. Protein sequence data for the genomes was downloaded from the various sequencing project sites as well as gene extents data in the form of gff files, gff3 files, and sequencing project ad hoc gene feature text formats (see Table S7). Gene feature data was converted to GFF3 format before analysis. Using the hmmscan program

(HMMER package (3)) with the ‘--domtblout’ option to output domain-based hit and score information, the protein sequences for each genome were scanned against an HMM database containing NACHT/NB-ARC, LRR, TPR, WD40, ARM, and TIR domain models. Treating different versions of these archetype domains (LRR_1, LRR_2, etc.) as the archetype (LRR), the sequences were scanned for NBS, NBS-LRR, NBS-TPR, NBS-WD40, NBS-ARM, NBS-LRR-NBS-LRR, LRR-NBS, TPR-NBS, NBS-WD40-NBS-WD40, WD40-NBS, ARM-NBS, and

NBS-NBS architectures. Using GFF3 data to provide chromosomal locations of the genes coding for the searched proteins, it was determined which protein sequence identifiers corresponded to isoforms of the same gene locus. The single longest transcript was used as a representative. Totals of the number of loci with each given domain architecture were thereby tallied.
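The per-locus tally described above can be sketched as follows. This is an illustrative Python outline, not the scripts used for the analysis. It assumes that a protein-to-locus mapping has already been derived from the GFF3 data, uses protein length as a stand-in for transcript length, and keeps the first isoform seen when lengths tie. The function and argument names are hypothetical.

```python
from collections import Counter


def tally_loci(protein_to_locus, protein_length, protein_architecture):
    """Count loci per domain architecture using the longest isoform per locus.

    protein_to_locus:      protein identifier -> gene locus identifier
    protein_length:        protein identifier -> length in residues
    protein_architecture:  protein identifier -> architecture string
                           (only proteins with an NBS-type architecture)
    """
    representative = {}
    for protein, locus in protein_to_locus.items():
        current = representative.get(locus)
        # Keep the longest isoform; on a tie the first one seen is kept.
        if current is None or protein_length[protein] > protein_length[current]:
            representative[locus] = protein

    counts = Counter()
    for protein in representative.values():
        architecture = protein_architecture.get(protein)
        if architecture is not None:
            counts[architecture] += 1
    return counts
```

Because only the representative isoform is consulted, a locus whose shorter isoform is NBS-only but whose longest isoform is NBS-LRR is counted once, as NBS-LRR.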

Colors were added to Dendrogram files using scripts written in Perl.

REFERENCES

1. Li W & Godzik A (2006) Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22(13):1658-1659.
2. Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32(5):1792-1797.
3. Eddy SR (2011) Accelerated Profile HMM Searches. PLoS Computational Biology 7(10):e1002195.
4. Gouy M, Guindon S, & Gascuel O (2010) SeaView Version 4: A Multiplatform Graphical User Interface for Sequence Alignment and Phylogenetic Tree Building. Molecular Biology and Evolution 27(2):221-224.
5. Katoh K & Toh H (2008) Recent developments in the MAFFT multiple sequence alignment program. Briefings in Bioinformatics 9(4):286-298.
6. Leipe DD, Koonin EV, & Aravind L (2004) STAND, a class of P-loop NTPases including animal and plant regulators of programmed cell death: multiple, complex domain architectures, unusual phyletic patterns, and evolution by horizontal gene transfer. Journal of Molecular Biology 343(1):1-28.
7. Ryan JF, et al. (2013) The Genome of the Ctenophore Mnemiopsis leidyi and Its Implications for Cell Type Evolution. Science 342(6164):1242592.
8. Xue J-Y, et al. (2012) A Primary Survey on Bryophyte Species Reveals Two Novel Classes of Nucleotide-Binding Site (NBS) Genes. PLOS ONE 7(5):e36700.
9. Hori K, et al. (2014) Klebsormidium flaccidum genome reveals primary factors for plant terrestrial adaptation. Nature Communications 5:3978.
10. Parniske M, et al. (1997) Novel Disease Resistance Specificities Result from Sequence Exchange between Tandemly Repeated Genes at the Cf-4/9 Locus of Tomato. Cell 91(6):821-832.
11. Trigg J, Gutwin K, Keating AE, & Berger B (2011) Multicoil2: Predicting Coiled Coils and Their Oligomerization States from Sequence in the Twilight Zone. PLOS ONE 6(8):e23519.
12. Huson DH, et al. (2007) Dendroscope: An interactive viewer for large phylogenetic trees. BMC Bioinformatics 8(1):460.
13. Shimodaira H & Hasegawa M (2001) CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17(12):1246-1247.
14. Paradis E, Claude J, & Strimmer K (2004) APE: Analyses of Phylogenetics and Evolution in R language. Bioinformatics 20(2):289-290.
15. Stamatakis A, Hoover P, & Rougemont J (2008) A Rapid Bootstrap Algorithm for the RAxML Web Servers. Systematic Biology 57(5):758-771.
16. Takken FLW & Tameling WIL (2009) To Nibble at Plant Resistance Proteins. Science 324(5928):744-746.
17. Tenthorey JL, Kofoed EM, Daugherty MD, Malik HS, & Vance RE (2014) Molecular Basis for Specific Recognition of Bacterial Ligands by NAIP/NLRC4 Inflammasomes. Molecular Cell 54(1):17-29.
18. Acehan D, et al. (2002) Three-Dimensional Structure of the Apoptosome: Implications for Assembly, Procaspase-9 Binding, and Activation. Molecular Cell 9(2):423-432.
19. Shen Q-H, et al. (2007) Nuclear Activity of MLA Immune Receptors Links Isolate-Specific and Basal Disease-Resistance Responses. Science 315(5815):1098-1103.

Figure S1. A Comparison of Architectures of STAND NTPase-containing Proteins. The STAND NTPases of plant R-proteins are in the NB-ARC subclade (PF00931) whereas those of NOD-like receptors are in the NACHT subclade (PF05729). In both cases, they have associated N-terminal effector domains and C-terminal leucine rich repeats (LRRs). In plant R-proteins, the N-terminal effector domain is frequently a TIR domain (PF01582) or a coiled-coil (CC) domain, whereas in metazoan NLRs it can be a Death-fold domain, either a CARD (PF00619), pyrin (PF02758), Death domain (DD, PF00531), or Death effector domain (DED, PF01335), or alternatively an Inhibitor of Apoptosis (IAP/BIR, PF00653) domain. The general architecture of an N-terminal effector domain and C-terminal repeats is repeated in other STAND NTPases, although often with different effector domains, and almost always with different C-terminal repeat domains, such as WD40, tetratricopeptide (TPR), Ankyrin (ANK), or Armadillo (ARM) repeats.

Figure S2. Generating the NB-ARC Multiple Sequence Alignment. The hmmsearch program (3) was used to scan the NCBI-NR database (downloaded March 11, 2010, containing 10,565,004 protein sequences) using the STAND hidden Markov model as a query. 15,500 putative nucleotide-binding domains were identified with scores of 30 or greater, a threshold that our analysis indicated was appropriate for detecting known NACHT and NB-ARC sequences while rejecting fragmentary and spurious matches. Subsequences that aligned to the Markov model were extracted. Sequence clustering was performed using the CD-HIT (1) program, grouping sequences with more than 90% identity to one another, and selecting single representative members of each cluster. This gave 5,679 representative STAND-NTPase domains. The HMMER hmmalign program (3) was then used to perform initial alignment of the trimmed representative STAND NTPase sequences to the STAND HMM. This step aligned only the most strongly conserved regions. Subsequently, extensive manual realignment was performed using the SeaView sequence editor. Multiple iterations of alignment and phylogeny reconstruction were performed using RAxML. Using the taxa inferred by the phylogenetic reconstruction as a guide, NB-ARC and NACHT-containing sequences were separated into two parallel alignments and handled separately. The NB-ARC alignment included plant R-proteins, animal APAF1 and CED-4 apoptosis related proteins, related microbial and fungal proteins, and SWACOS and MalT domain sequences for use as outgroups, and included 4178 NTPase domain sequences. The NACHT alignment included 1501 metazoan NOD-like receptors, and related fungal and bacterial proteins. Since these alignments still included a large number of sequences, random subsets of a quarter of the sequences were selected, containing 1,045 and 377 NB-ARC and NACHT sequences respectively. To improve the confidence of the phylogeny reconstruction, we used the HETHS HMM and hmmscan to identify 817 HETHS domains adjacent to the C-terminus of the NB-ARC domain and 286 HETHS domains adjacent to the NACHT domains. These C-terminal adjacent HETHS domains were appended to the aligned NTPase sequences, and aligned using MUSCLE/MAFFT and manual methods. The resulting alignments including the NTPase and HETHS domains were used for 100 bootstraps of phylogeny reconstruction with the RAxML program using its fast bootstrap option (15), the WAG substitution matrix, and the Γ evolutionary model with 4 rate classes assumed and empirically-determined amino acid frequencies. The resulting bootstrap trees were used for leaf stability analysis using the script 'compare_trees_leaf_correlation.pl'. Unstable leaves with reciprocal leaf distance correlations less than 0.80 were removed from the alignment, leaving 664 NB-ARC sequences and 271 NACHT sequences for subsequent phylogeny reconstruction.

[Flowchart not reproduced. Its final steps, which are not described in the caption, add 19 and 10 Leipe reference domains to the two alignments, giving a combined alignment of 964 NBS + HETHS domains.]

Figure S3. Alternative Topologies Evaluated Using AU and Other Likelihood Tests. A. Tests ALT1A, ALT1B, ALT2A, ALT2B, ALT3A, ALT3B, ALT4A and ALT4B. B. Tests ALT5A, ALT5B, ALT6, ALT7, ALT8, ALT9A, ALT9B, ALT10A, ALT10B and ALT11. C. Tests ALT12A, ALT12B, ALT13A, ALT13B, ALT14A, ALT14B, ALT15A, and ALT15B. Topological rearrangements are indicated by the arrows. Results are in Table S4.

[Figure S3, panels A-C, not reproduced. Each panel shows the same tree of 22 collapsed clades (scale bar 0.1), with arrows marking the rearrangements tested in that panel. The clades are listed below with the value printed beside each clade in the figure.]

Clade  Value  Composition
XXII   99     Proteobacteria, Chloroflexi, Other Bacteria (MalT: NBS, NBS-TPR; 9 taxa)
XXI    67     Actinobacteria, Proteobacteria, Other Bacteria (SWACOS: NBS, NBS-TPR; 22 taxa)
XX     95     Firmicutes, Cyanobacteria (NACHT: NBS; 3 taxa)
XIX    94     Actinobacteria, Proteobacteria, Bacteroidetes (NACHT: NBS-WD40, NBS, others; 10 taxa)
XVIII  91     Metazoa, Bacteria (NACHT: NBS-LRR, NBS, others; 83 taxa)
XVII   72     Metazoa, Bacteria, Choanoflagellida (NACHT: NBS-WD40, NBS-TPR, NBS, others; 29 taxa)
XVI    49     Fungi, Proteobacteria, Cyanobacteria, other Bacteria, Metazoa (NACHT: NBS-WD40, NBS-ANK, NBS-TPR, NBS, others; 155 taxa)
XV     100    Bacteria (NBS-TPR, NBS; 3 taxa)
XIV    98     Cyanobacteria (NBS-WD40, NBS; 20 taxa)
XIII   98     Fungi, Bacteria, Hydra, Amphioxus (NBS-TPR; 86 taxa)
XII    98     Actinobacteria (NBS-TPR, others; 38 taxa)
XI     97     Actinobacteria, Proteobacteria, Other Bacteria, Plants (NBS-TPR, NBS; 50 taxa)
X      99     Actinobacteria, Proteobacteria (NBS-WD40; 4 taxa)
IX     100    Plant (NBS-ARM; 2 taxa)
VIII   82     Animal APAF1s, CED4 (NBS-WD40, NBS; 17 taxa)
VII    86     Archaea, Chloroflexus, Cyanobacteria, Proteobacteria (NBS-WD40, TIR-NBS-WD40; 7 taxa)
VI     100    Cyanobacteria (NBS-ARM, others; 4 taxa)
V      100    Amphioxus (NBS-WD40; 2 taxa)
IV     100    Plant (NBS-LRR, CC-NBS-LRR, NBS; 58 taxa)
III    84     Plant (NBS-LRR, CC-NBS-LRR, NBS; 273 taxa)
II     100    Plant (NBS-LRR, CC-NBS-LRR, NBS; 8 taxa)
I      100    Plant (TIR-NBS-LRR, NBS-LRR, NBS; 81 taxa)

Figure S4. Continuous Time Markov Chain (CTMC) and Conditional Scaled Ancestral State Reconstructions. A. Continuous Time Markov Chain reconstruction of marginal ancestral state likelihoods in the ML tree using the rerootingMethod function of the phytools package. B. ML reconstruction of conditional scaled ancestral state likelihoods in the ML tree using the ace function of the ape R package (14). Pie charts at nodes represent fractional likelihoods of each state. Color codes are green for NBS-LRR, blue for NBS-WD40, red for NBS-TPR, yellow for NBS-ARM, and gray for NBS (non-repeat associated). C. Representative key with tree nodes labeled (see also Table S6). D. Marginal ML ancestral state reconstruction on the ML tree rooted using the midpoint method. E. Conditional-scaled ancestral state reconstruction on the ML tree rooted using the midpoint method. Note that MP reconstruction places the root of the tree within the NACHT group (XVI-XX). F. Representative MP-rooted tree with nodes labeled.

[Figure S4, panels A-F, not reproduced. Each panel shows clades I-XXII. Panels C and F mark the NACHT (NLR group) and R-protein clades and label the internal nodes. Panel C: N0, N1, N2, N5, N6, N7, N281, N284, N304, N305, N391, N478, N479, N484, N485, N508, N512, N514, N572, N845, N933. Panel F: N0, N1, N2, N3, N277, N278, N279, N280, N281, N282, N283, N284, N285, N286, N287, N288, N709, N732, N737, N823, N931.]

Figure S5. A Model of Activation of R-Proteins Proposed by Takken and coworkers (16). In the “off” state of the protein, the nucleotide-binding pocket of the domain has a tightly bound ADP molecule. Interaction with a ligand, via the LRR repeat domain, or with the NBS itself as in the case of NAIPs (17), with possibly some contribution by the TIR or CC domain, causes a conformational change that allows ADP to be released from the nucleotide binding site and replaced by ATP, at which point the R-protein enters its activated state. What happens in this activated state is still unclear. The activated R-protein may oligomerize, as is the case with APAF1 of vertebrates (18), or alternatively, it may interact with cellular machinery or with factors such as WRKY transcription factors (19). Eventually, the NTPase hydrolyzes the bound ATP and shifts to the off state. It has been suggested that the HETHS domain may function as a “lever” to couple hydrolysis of a bound NTP with a change in the conformation of the effector domain (6).

Table S1. Statistics of the Alignment of 964 STAND NTPases.

Alignment Length: 2786
Distinct Alignment Patterns: 2488
Proportion of Gaps: 88.69%
Sequence Count: 964
Mean Sequence Length: 315
Standard Deviation: 25.6
Minimum Length: 209
Median Length: 312
Maximum Length: 426

Table S2. Enumeration of Domain Architectures by Clade. Totals of the various domain architectures are given by clade.

Clade  NBS  NBS-TPR  NBS-WD40  NBS-LRR  NBS-ARM  NBS-ANK  Total
I      19   0        0         62       0        0        81
II     2    0        0         6        0        0        8
III    25   0        0         248      0        0        273
IV     3    0        0         55       0        0        58
V      0    0        2         0        0        0        2
VI     1    0        0         0        3        0        4
VII    1    0        6         0        0        0        7
VIII   4    0        13        0        0        0        17
IX     0    0        0         0        2        0        2
X      0    0        4         0        0        0        4
XI     8    42       0         0        0        0        50
XII    0    38       0         0        0        0        38
XIII   16   70       0         0        0        0        86
XIV    4    0        16        0        0        0        20
XV     0    3        0         0        0        0        3
XVI    62   13       41        0        0        39       155
XVII   6    9        14        0        0        0        29
XVIII  31   0        0         51       0        1        83
XIX    5    0        5         0        0        0        10
XX     3    0        0         0        0        0        3
XXI    13   9        0         0        0        0        22
XXII   1    8        0         0        0        0        9

Table S3. Prokaryotic NBS-WD40 Homologs of APAF1.

Table S4. AU Tests on Rearranged Alternative Topologies. Using RAxML and the CONSEL package, tree topologies were tested using the AU, KH, SH, weighted KH (WKH), weighted SH (WSH), NP, BP, and PP tests, as described in Supporting Methods. P values are given for each test and each topology when compared to the reference. In addition, a description of the test is given. P values less than 0.05 are indicated in dark red, and P values less than 0.01 are indicated in light red.

Topology  rank  item  obs    au      np     bp      pp      kh      sh     wkh     wsh    Description
REF       1     1     -6     0.961   0.515  0.512   0.991   0.862   1      0.803   1
ALT1A     16    2     44.7   0.0003  2E-5   0       4E-20   7E-5    0.113  7E-5    0.001  NACHTs (XVI-XX) to sister position with R-Protein NTPases (I-IV)
ALT1B     22    3     80.2   2E-39   6E-17  0       1E-35   0       0.007  0       0.001  R-Protein NTPases (I-IV) to sister position with NACHTs (XVI-XX)
ALT2A     15    4     42.8   3E-5    4E-6   0       2E-19   8E-6    0.125  8E-6    0.001  NACHTs (XVI-XX) to sister position with APAF1, R-Protein & related NTPases (I-VIII)
ALT2B     23    5     89.4   3E-8    5E-7   0       1E-39   0       0.002  0       0      APAF1, R-Protein & related NTPases (I-VIII) to sister position with NACHTs (XVI-XX)
ALT3A     11    6     22.8   0.001   7E-5   0.0001  1E-10   0.007   0.514  0.007   0.054  NACHTs (XVI-XX) to sister position with NTPases of R-proteins, APAF1-like NBS-WD40, etc and plant NBS-ARM proteins (I-X)
ALT3B     13    7     26.2   0.002   0.001  0.001   4E-12   0.009   0.43   0.009   0.064  NTPases of R-protein, APAF1-like NBS-WD40 proteins, etc and plant NBS-ARM proteins (I-X) to sister position with NACHTs (XVI-XX)
ALT4A     7     8     8.1    0.059   0.007  0.006   0.0003  0.066   0.928  0.066   0.494  NACHTs (XVI-XX) to sister position with NTPases of R-proteins, APAF1-like NBS-WD40 proteins, plant NBS-ARM proteins & many actinobacterial NBS-TPR proteins (I-XIII)
ALT4B     5     9     6.6    0.215   0.083  0.083   0.001   0.122   0.945  0.122   0.712  NTPases of R-proteins, APAF1-like NBS-WD40 proteins, plant NBS-ARM proteins & many actinobacterial NBS-TPR proteins (I-XIII) to sister position with NACHTs (XVI-XX)
ALT5A     6     10    7.3    0.221   0.066  0.068   0.001   0.111   0.919  0.111   0.653  NTPases of Plant NBS-ARMs (IX) to sister position with R-protein NTPases (I-IV)
ALT5B     9     11    12     0.091   0.007  0.006   6E-6    0.06    0.811  0.06    0.422  R-protein NTPases (I-IV) to sister position with plant NBS-ARM NTPases (IX)
ALT6      8     12    11.3   0.153   0.03   0.03    1E-5    0.073   0.83   0.073   0.492  R-protein NTPases (I-IV) to sister position with the closely related NBS-WD40 bacterial and metazoan NTPases (V-X)
ALT7      18    13    52.6   0.0001  5E-5   0       1E-23   0.001   0.075  0.001   0.006  R-protein NTPases (I-IV) to sister position with similar Bacterial and Metazoan NBS-WD40 proteins and bacterial NBS-TPRs (V-XIII)
ALT8      21    14    67.5   0.0002  2E-5   0       5E-30   0.0005  0.021  0.0005  0.002  R-protein NTPases (I-IV) to a sister position with other NB-ARCs (V-XV)
ALT9A     2     15    6      0.247   0.102  0.064   0.002   0.138   0.94   0.138   0.716  NTPases of R-proteins, APAF1-like metazoan & bacterial NBS-WD40 & related NBS-ARM proteins (I-X) to sister position with the divergent NBS-WD40 NTPases (XIV)
ALT9B     3     16    6      0.235   0.042  0.044   0.002   0.138   0.94   0.138   0.716  NTPases of divergent cyanobacterial NBS-WD40s (XIV) to a sister position with NTPases of metazoan & bacterial APAF1-like NBS-WD40 & NBS-ARM proteins (I-X)
ALT10A    10    17    18.4   0.052   0.013  0.013   1E-8    0.026   0.627  0.026   0.195  R-protein like Amphioxus NBS-WD40 STAND NTPases (V) to sister position with Animal APAF1-like (VIII)
ALT10B    12    18    25.5   0.021   0.006  0.006   8E-12   0.017   0.453  0.017   0.112  Animal APAF1-like NTPases (VIII) to sister position with Amphioxus NBS-WD40 NTPases (V)
ALT11     4     19    6      0.32    0.167  0.166   0.002   0.197   0.912  0.197   0.767  NTPases of TIR-NBS-LRR R-proteins (I) to sister position with those of CC-NBS-LRR R-proteins (II-IV)
ALT12A    20    20    65.2   0.0001  6E-6   0       5E-29   0.0001  0.012  0.0001  0.001  Metazoan NOD-like NTPases and bacterial homologs (XVIII) to sister position with R-Proteins (I-IV)
ALT12B    26    21    169.2  1E-63   3E-18  0       3E-74   0       0      0       0      R-Protein NTPases (I-IV) to sister position with metazoan NOD-like NTPases and bacterial homologs (XVIII)
ALT13A    19    22    61     0.0002  6E-6   0       3E-27   0.0002  0.02   0.0002  0.003  NTPases of metazoan NLRs and bacterial homologs (XVIII) to sister position with NTPases of R-Proteins, metazoan and bacterial APAF1-like NB-WD40, and related NBS-Arm NTPases (I-VIII)
ALT13B    27    23    232    1E-36   2E-15  0       2E-101  0       0      0       0      NTPases of APAF1 and R-Proteins (I-VIII) into sisterhood position with NTPases of metazoan NOD-like sequences and bacterial homologs (XVIII)
ALT14A    17    24    50.6   0.0002  2E-5   3E-5    1E-22   0.001   0.067  0.001   0.01   NTPases of metazoan NLRs and bacterial homologs (XVIII) to sister position with R-proteins, APAF1-like NBS-WD40 proteins, etc and plant NBS-ARM proteins (I-X)
ALT14B    25    25    135.4  3E-5    5E-6   0       2E-59   0       0      0       0      R-proteins, APAF1-like NBS-WD40 proteins, etc and plant NBS-ARM proteins (I-X) to sister position with metazoan NOD-like sequences and bacterial homologs (XVIII)
ALT15A    14    26    37     0.009   0.004  0.005   8E-17   0.006   0.212  0.006   0.047  Metazoan NOD-like sequences and bacterial homologs (XVIII) to sister position with R-proteins, APAF1-like NBS-WD40 proteins, plant NBS-ARM proteins, and many actinobacterial NBS-TPR proteins (I-XIII)
ALT15B    24    27    126.3  2E-62   4E-19  0       1E-55   0       0      0       0      R-proteins, APAF1-like NBS-WD40 proteins, plant NBS-ARM proteins, and many actinobacterial NBS-TPR proteins (I-XIII) to sister position with metazoan NOD-like sequences and bacterial homologs (XVIII)

Table S5. Ancestral State Likelihoods From ML-tree and Bootstrap Ancestral State Reconstructions. A. ML-marginal, B. ML-conditional scaled, and C. Continuous Time Markov Chain-based (CTMC) ancestral state likelihoods were calculated on the ML tree or averaged across 280 RAxML bootstraps, using clades XXI and XXII (SWACOS and MalT NTPases, respectively) as outgroups. D. ML-marginal, E. ML-conditional scaled ancestral state likelihoods for the midpoint-rooted (MP) ancestral state reconstructions. Ancestral state likelihoods are presented for the last common ancestor of the STAND NTPases of R-proteins and NLRs (LCA), the common ancestor of clades I-X, the common ancestor of NACHT proteins, and the root. Likelihoods are represented both numerically and as heatmaps with red intensity proportional to ancestral state likelihood.

A. ML Marginal (OG)
Ancestor  Node  Analysis    NBS      NBS-LRR  NBS-WD40  NBS-TPR  NBS-ANK  NBS-ARM
ROOT      N0    ML          0.00433  0.00007  0.00010   0.99535  0.00007  0.00007
ROOT      N0    BS Average  0.00541  0.00013  0.01093   0.98207  0.00011  0.00135
LCA       N1    ML          0.00532  0.00008  0.00013   0.99432  0.00008  0.00008
LCA       N1    BS Average  0.00577  0.00013  0.01107   0.98155  0.00012  0.00136
I-X       N478  ML          0.00021  0.00055  0.98569   0.00573  0.00020  0.00762
I-X       N478  BS Average  0.02068  0.01757  0.68577   0.16872  0.00338  0.10387
NACHT     N2    ML          0.95000  0.00155  0.00159   0.04374  0.00156  0.00155
NACHT     N2    BS Average  0.83084  0.00331  0.01619   0.14452  0.00341  0.00172

B. ML Conditional Scaled (OG)
Ancestor  Node  Analysis    NBS      NBS-LRR  NBS-WD40  NBS-TPR  NBS-ANK  NBS-ARM
ROOT      N0    ML          0.00433  0.00007  0.00010   0.99535  0.00007  0.00007
ROOT      N0    BS Average  0.00541  0.00013  0.01093   0.98207  0.00011  0.00135
LCA       N1    ML          0.21342  0.00647  0.01115   0.75602  0.00647  0.00647
LCA       N1    BS Average  0.08027  0.00305  0.02018   0.88921  0.00308  0.00420
I-X       N478  ML          0.00021  0.00055  0.99116   0.00020  0.00020  0.00767
I-X       N478  BS Average  0.03525  0.02184  0.79107   0.01691  0.00466  0.13027
NACHT     N2    ML          0.99154  0.00163  0.00166   0.00191  0.00164  0.00163
NACHT     N2    BS Average  0.90945  0.00441  0.02157   0.05758  0.00465  0.00234

C. CTMC (OG)
Ancestor  Node  Analysis    NBS      NBS-LRR  NBS-WD40  NBS-TPR  NBS-ANK  NBS-ARM
ROOT      N0    ML          0.00433  0.00007  0.00010   0.99535  0.00007  0.00007
ROOT      N0    BS Average  0.00541  0.00013  0.01093   0.98207  0.00011  0.00135
LCA       N1    ML          0.00532  0.00008  0.00013   0.99432  0.00008  0.00008
LCA       N1    BS Average  0.00577  0.00013  0.01107   0.98155  0.00012  0.00136
I-X       N478  ML          0.00021  0.00055  0.98569   0.00573  0.00020  0.00762
I-X       N478  BS Average  0.02068  0.01757  0.68577   0.16872  0.00338  0.10387
NACHT     N2    ML          0.95000  0.00155  0.00159   0.04374  0.00156  0.00155
NACHT     N2    BS Average  0.83144  0.00331  0.01619   0.14394  0.00341  0.00172

D. ML Marginal (MP)
Ancestor  Node  Analysis    NBS      NBS-LRR  NBS-WD40  NBS-TPR  NBS-ANK  NBS-ARM
ROOT      N0    ML          0.97935  0.00092  0.00100   0.01686  0.00095  0.00092
ROOT      N0    BS Average  0.78765  0.00953  0.04186   0.14547  0.01207  0.00341
LCA       N0    ML          0.97935  0.00092  0.00100   0.01686  0.00095  0.00092
LCA       N0    BS Average  0.79098  0.01034  0.04198   0.14672  0.00657  0.00341
I-X       N282  ML          0.00021  0.00055  0.98569   0.00573  0.00020  0.00762
I-X       N282  BS Average  0.02080  0.01757  0.68569   0.16868  0.00339  0.10387
NACHT     N0    ML          0.97935  0.00092  0.00100   0.01686  0.00095  0.00092
NACHT     N0    BS Average  0.85339  0.00821  0.04075   0.08502  0.01067  0.00196

E. ML Conditional Scaled (MP)
Ancestor  Node  Analysis    NBS      NBS-LRR  NBS-WD40  NBS-TPR  NBS-ANK  NBS-ARM
ROOT      N0    ML          0.97935  0.00092  0.00100   0.01686  0.00095  0.00092
ROOT      N0    BS Average  0.78765  0.00953  0.04186   0.14547  0.01207  0.00341
LCA       N0    ML          0.97935  0.00092  0.00100   0.01686  0.00095  0.00092
LCA       N0    BS Average  0.76342  0.01650  0.04479   0.16451  0.00628  0.00451
I-X       N282  ML          0.00021  0.00055  0.99116   0.00020  0.00020  0.00767
I-X       N282  BS Average  0.03636  0.02185  0.79110   0.01573  0.00468  0.13028
NACHT     N0    ML          0.97935  0.00092  0.00100   0.01686  0.00095  0.00092
NACHT     N0    BS Average  0.86741  0.00839  0.04144   0.06988  0.01081  0.00207

Table S6. Ancestral State Likelihoods from A. Marginal ML, B. Conditional Scaled, and C. CTMC Reconstructions of the ML tree. Data are presented for the ancestral representative of each clade, and all previous clades, and listed by node identifier. Node labels correspond to the labels printed in Figure S4C.

A. ML Marginal
Node   NBS      NBS-LRR  NBS-WD40  NBS-TPR  NBS-ANK  NBS-ARM
N0     0.00433  0.00007  0.00010   0.99535  0.00007  0.00007
N1     0.00532  0.00008  0.00013   0.99432  0.00008  0.00008
N281   0.00216  0.00002  0.00015   0.99762  0.00002  0.00002
N284   0.00268  0.00012  0.00132   0.99565  0.00012  0.00012
N304   0.00003  0.00000  0.00040   0.99956  0.00000  0.00000
N478   0.00021  0.00055  0.98569   0.00573  0.00020  0.00762
N484   0.00006  0.00057  0.99237   0.00072  0.00004  0.00624
N508   0.00046  0.00524  0.97937   0.00080  0.00027  0.01386
N512   0.00074  0.01938  0.96935   0.00095  0.00063  0.00896
N514   0.00008  0.99760  0.00210   0.00007  0.00007  0.00009
N572   0.00001  0.99987  0.00010   0.00000  0.00000  0.00000
N845   0.00015  0.99980  0.00002   0.00001  0.00001  0.00001
I      0.00105  0.99888  0.00002   0.00002  0.00002  0.00002
II     0.00498  0.99479  0.00006   0.00006  0.00006  0.00006
III    0.00000  0.99999  0.00000   0.00000  0.00000  0.00000
IV     0.00000  0.99997  0.00001   0.00000  0.00000  0.00000
V      0.00008  0.00052  0.99897   0.00008  0.00007  0.00027
VI     0.02165  0.00075  0.02114   0.00066  0.00065  0.95516
N485   0.00001  0.00002  0.99978   0.00002  0.00001  0.00016
VII    0.00002  0.00000  0.99996   0.00000  0.00000  0.00001
VIII   0.00011  0.00001  0.99984   0.00001  0.00001  0.00001
N479   0.00021  0.00055  0.98569   0.00573  0.00020  0.00762
IX     0.00006  0.00007  0.00062   0.00007  0.00006  0.99911
X      0.00001  0.00002  0.99980   0.00007  0.00001  0.00009
N305   0.00003  0.00000  0.00040   0.99956  0.00000  0.00000
N391   0.00001  0.00001  0.00004   0.99993  0.00001  0.00001
XI     0.00001  0.00000  0.00000   0.99998  0.00000  0.00000
XII    0.00000  0.00000  0.00000   1.00000  0.00000  0.00000
XIII   0.00002  0.00001  0.00003   0.99991  0.00001  0.00001
XIV    0.42540  0.00339  0.51103   0.05339  0.00339  0.00339
XV     0.00014  0.00008  0.00008   0.99955  0.00008  0.00008
N2     0.95000  0.00155  0.00159   0.04374  0.00156  0.00155
N5     0.99238  0.00003  0.00012   0.00738  0.00006  0.00003
XVI    0.99826  0.00003  0.00016   0.00116  0.00036  0.00003
N6     0.99194  0.00004  0.00016   0.00774  0.00007  0.00004
XVII   0.49593  0.00096  0.03728   0.46390  0.00096  0.00096
N7     0.99919  0.00008  0.00008   0.00051  0.00007  0.00007
XVIII  0.99938  0.00043  0.00005   0.00006  0.00004  0.00004
XIX    0.99955  0.00004  0.00029   0.00005  0.00004  0.00004
XX     0.99591  0.00032  0.00033   0.00279  0.00033  0.00032
N933   0.00334  0.00003  0.00004   0.99653  0.00003  0.00003
XXI    0.00928  0.00003  0.00003   0.99062  0.00003  0.00003
XXII   0.00011  0.00005  0.00005   0.99971  0.00005  0.00005

B. ML Conditional Scaled
Node   NBS      NBS-LRR  NBS-WD40  NBS-TPR  NBS-ANK  NBS-ARM
N0     0.00433  0.00007  0.00010   0.99535  0.00007  0.00007
N1     0.21342  0.00647  0.01115   0.75602  0.00647  0.00647
N281   0.00487  0.00114  0.00730   0.98442  0.00114  0.00114
N284   0.06942  0.00962  0.10820   0.79350  0.00962  0.00964
N304   0.00015  0.00015  0.00409   0.99530  0.00015  0.00017
N478   0.00021  0.00055  0.99116   0.00020  0.00020  0.00767
N484   0.00142  0.01404  0.95979   0.00093  0.00093  0.02289
N508   0.02344  0.27001  0.23569   0.01395  0.01395  0.44296
N512   0.01617  0.50008  0.43523   0.01617  0.01617  0.01617
N514   0.00008  0.99964  0.00007   0.00007  0.00007  0.00007
N572   0.00035  0.99910  0.00014   0.00014  0.00014  0.00014
N845   0.01586  0.98047  0.00092   0.00092  0.00092  0.00092
I      0.02958  0.96840  0.00051   0.00051  0.00051  0.00051
II     0.19147  0.79965  0.00222   0.00222  0.00222  0.00222
III    0.00007  0.99963  0.00007   0.00007  0.00007  0.00007
IV     0.00013  0.99934  0.00013   0.00013  0.00013  0.00013
V      0.00092  0.00092  0.99541   0.00092  0.00092  0.00092
VI     0.02222  0.00066  0.00066   0.00066  0.00066  0.97513
N485   0.00041  0.00032  0.99831   0.00032  0.00032  0.00032
VII    0.00180  0.00022  0.99731   0.00022  0.00022  0.00022
VIII   0.00426  0.00039  0.99418   0.00039  0.00039  0.00039
N479   0.01927  0.01927  0.73624   0.01927  0.01927  0.18670
IX     0.00007  0.00007  0.00007   0.00007  0.00007  0.99967
X      0.00027  0.00027  0.99863   0.00027  0.00027  0.00027
N305   0.00015  0.00015  0.00015   0.99927  0.00015  0.00015
N391   0.00071  0.00071  0.00071   0.99645  0.00071  0.00071
XI     0.00017  0.00010  0.00010   0.99943  0.00010  0.00010
XII    0.00004  0.00004  0.00004   0.99979  0.00004  0.00004
XIII   0.00103  0.00086  0.00086   0.99553  0.00086  0.00086
XIV    0.44699  0.00358  0.53869   0.00358  0.00358  0.00358
XV     0.00148  0.00148  0.00148   0.99262  0.00148  0.00148
N2     0.99154  0.00163  0.00166   0.00191  0.00164  0.00163
N5     0.99097  0.00026  0.00102   0.00696  0.00053  0.00026
XVI    0.97203  0.00143  0.00623   0.00227  0.01660  0.00143
N6     0.63994  0.01084  0.03351   0.29427  0.01072  0.01072
XVII   0.02987  0.00188  0.07263   0.89187  0.00188  0.00188
N7     0.99048  0.00214  0.00202   0.00178  0.00178  0.00178
XVIII  0.98611  0.00978  0.00103   0.00103  0.00103  0.00103
XIX    0.98992  0.00086  0.00663   0.00086  0.00086  0.00086
XX     0.98656  0.00269  0.00269   0.00269  0.00269  0.00269
N933   0.01137  0.00167  0.00167   0.98194  0.00167  0.00167
XXI    0.17895  0.00069  0.00069   0.81830  0.00069  0.00069
XXII   0.00090  0.00090  0.00090   0.99548  0.00090  0.00090

C. CTMC

Node    NBS       NBS-LRR   NBS-WD40  NBS-TPR   NBS-ANK   NBS-ARM
N0      0.00433   0.00007   0.00010   0.99535   0.00007   0.00007
N1      0.00532   0.00008   0.00013   0.99432   0.00008   0.00008
N281    0.00216   0.00002   0.00015   0.99762   0.00002   0.00002
N284    0.00268   0.00012   0.00132   0.99565   0.00012   0.00012
N304    0.00003   0.00000   0.00040   0.99956   0.00000   0.00000
N478    0.00021   0.00055   0.98569   0.00573   0.00020   0.00762
N484    0.00006   0.00057   0.99237   0.00072   0.00004   0.00624
N508    0.00046   0.00524   0.97937   0.00080   0.00027   0.01386
N512    0.00074   0.01938   0.96935   0.00095   0.00063   0.00896
N514    0.00008   0.99760   0.00210   0.00007   0.00007   0.00009
N572    0.00001   0.99987   0.00010   0.00000   0.00000   0.00000
N845    0.00015   0.99980   0.00002   0.00001   0.00001   0.00001
I       0.00105   0.99888   0.00002   0.00002   0.00002   0.00002
II      0.00498   0.99479   0.00006   0.00006   0.00006   0.00006
III     0.00000   0.99999   0.00000   0.00000   0.00000   0.00000
IV      0.00000   0.99997   0.00001   0.00000   0.00000   0.00000
V       0.00008   0.00052   0.99897   0.00008   0.00007   0.00027
VI      0.02165   0.00075   0.02114   0.00066   0.00065   0.95516
N485    0.00001   0.00002   0.99978   0.00002   0.00001   0.00016
VII     0.00002   0.00000   0.99996   0.00000   0.00000   0.00001
VIII    0.00011   0.00001   0.99984   0.00001   0.00001   0.00001
N479    0.00021   0.00055   0.98569   0.00573   0.00020   0.00762
IX      0.00006   0.00007   0.00062   0.00007   0.00006   0.99911
X       0.00001   0.00002   0.99980   0.00007   0.00001   0.00009
N305    0.00003   0.00000   0.00040   0.99956   0.00000   0.00000
N391    0.00001   0.00001   0.00004   0.99993   0.00001   0.00001
XI      0.00001   0.00000   0.00000   0.99998   0.00000   0.00000
XII     0.00000   0.00000   0.00000   1.00000   0.00000   0.00000
XIII    0.00002   0.00001   0.00003   0.99991   0.00001   0.00001
XIV     0.42540   0.00339   0.51103   0.05339   0.00339   0.00339
XV      0.00014   0.00008   0.00008   0.99955   0.00008   0.00008
N2      0.95000   0.00155   0.00159   0.04374   0.00156   0.00155
N5      0.99238   0.00003   0.00012   0.00738   0.00006   0.00003
XVI     0.99826   0.00003   0.00016   0.00116   0.00036   0.00003
N6      0.99194   0.00004   0.00016   0.00774   0.00007   0.00004
XVII    0.49593   0.00096   0.03728   0.46390   0.00096   0.00096
N7      0.99919   0.00008   0.00008   0.00051   0.00007   0.00007
XVIII   0.99938   0.00043   0.00005   0.00006   0.00004   0.00004
XIX     0.99955   0.00004   0.00029   0.00005   0.00004   0.00004
XX      0.99591   0.00032   0.00033   0.00279   0.00033   0.00032
N933    0.00334   0.00003   0.00004   0.99653   0.00003   0.00003
XXI     0.00928   0.00003   0.00003   0.99062   0.00003   0.00003
XXII    0.00011   0.00005   0.00005   0.99971   0.00005   0.00005
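
As a reading aid for the table above, the following minimal Python sketch picks the highest-valued architecture for a node and lists nodes where no single architecture dominates. It assumes each row holds per-node probabilities over the six architectures in the printed column order. The four rows are copied from the table; the function names and the 0.9 cutoff are illustrative assumptions, not part of the original analysis.

```python
# Column order as printed in the table header.
STATES = ["NBS", "NBS-LRR", "NBS-WD40", "NBS-TPR", "NBS-ANK", "NBS-ARM"]

# A few rows copied from the table (node label -> values per architecture).
ROWS = {
    "N0":   [0.00433, 0.00007, 0.00010, 0.99535, 0.00007, 0.00007],
    "VI":   [0.02165, 0.00075, 0.02114, 0.00066, 0.00065, 0.95516],
    "XIV":  [0.42540, 0.00339, 0.51103, 0.05339, 0.00339, 0.00339],
    "XVII": [0.49593, 0.00096, 0.03728, 0.46390, 0.00096, 0.00096],
}


def best_state(probs):
    """Return (architecture, value) for the largest entry in a row."""
    i = max(range(len(probs)), key=probs.__getitem__)
    return STATES[i], probs[i]


def ambiguous_nodes(rows, threshold=0.9):
    """List nodes whose largest value falls below a (hypothetical) cutoff."""
    return [node for node, probs in rows.items()
            if best_state(probs)[1] < threshold]


for node, probs in ROWS.items():
    state, p = best_state(probs)
    print(f"{node:5s} {state:9s} {p:.5f}")

print("No dominant architecture:", ambiguous_nodes(ROWS))
```

With these four rows, nodes XIV and XVII are the ones reported as lacking a dominant architecture, since their largest values are 0.51103 and 0.49593.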

Table S7. Survey of the NCBI NR Protein Set for NBS-LRR, NBS-WD40, NBS-TPR, NBS-ARM Domain Structures. The NCBI NR Protein database was scanned for NBS, NBS-ARM, NBS-LRR, NBS-TPR, and NBS-WD40 domain architectures (see Supporting Methods). Occurrences of these combinations of domains were classified by taxon, and Archaeal, Bacterial, Eukaryotic, and subsets of these domains were tallied. Possessing one domain structure does not exclude other domain structures. For example, the NBS domain structure includes NBS-ARM, NBS-LRR, NBS-TPR, and NBS-WD40 proteins as subsets. Counts of each architecture type are enumerated.

Kingdoms                    NBS      NBS-ARM  NBS-LRR  NBS-TPR  NBS-WD40
Archaea                     50       6        0        11       2
Bacteria                    2685     28       12       1253     133
Eukaryota                   12549    17       4376     361      476
Viruses                     35       0        0        0        0
Artificial                  2        0        0        0        0
Total:                      15321    51       4388     1625     611

Eukaryote Subsets           NBS      NBS-ARM  NBS-LRR  NBS-TPR  NBS-WD40
Alveolata                   59       0        1        1        47
Amoebozoa                   10       0        0        4        4
Euglenozoa                  3        0        0        0        0
Heterolobosea               1        0        0        0        0
Parabasalia                 404      0        0        0        0
Fungi                       1160     7        0        247      206
Choanoflagellida            9        0        0        2        6
Metazoa                     1627     1        742      69       206
Viridiplantae               9264     9        3633     33       7
stramenopiles               12       0        0        5        0
Total:                      12549    17       4376     361      476

Metazoan Subsets            NBS      NBS-ARM  NBS-LRR  NBS-TPR  NBS-WD40
Deuterostomia               1336     1        738      52       129
Protostomia                 201      0        0        0        70
Cnidaria                    62       0        4        11       4
Placozoa                    28       0        0        6        3
Total:                      1627     1        742      69       206

Viridiplantae Subsets       NBS      NBS-ARM  NBS-LRR  NBS-TPR  NBS-WD40
Bryophyta                   44       0        3        0        0
Coniferopsida               97       0        13       0        0
Liliopsida                  2702     5        1648     28       6
eudicotyledons              6416     4        1969     5        1
Total:                      9259     9        3633     33       7

Fungi Subsets               NBS      NBS-ARM  NBS-LRR  NBS-TPR  NBS-WD40
Ascomycota                  995      7        0        221      157
Basidiomycota               164      0        0        26       49
Microsporidia               1        0        0        0        0
Total:                      1160     7        0        247      206

Bacterial Subsets           NBS      NBS-ARM  NBS-LRR  NBS-TPR  NBS-WD40
Actinobacteria              1097     0        11       727      38
Aquificae                   11       0        0        0        0
Bacteroidetes/Chlorobi      86       0        0        31       2
Chlamydiae/Verrucomicrobia  16       0        0        5        0
Chloroflexi                 119      2        0        86       3
Cyanobacteria               461      24       0        83       77
Deferribacteres             1        0        0        0        0
Deinococcus-Thermus         11       0        0        10       0
Firmicutes                  121      0        0        41       1
Fusobacteria                2        0        0        1        0
Nitrospirae                 6        0        0        4        0
Planctomycetes              17       0        1        5        0
Proteobacteria              725      2        0        254      12
Spirochaetes                1        0        0        0        0
Thermotogae                 1        0        0        0        0
Fibrobacteres               0        0        0        0        0
Acidobacteria               4        0        0        1        0
unclassified/environmental  6        0        0        5        0
Total:                      2685     28       12       1253     133

Proteobacteria Subsets      NBS      NBS-ARM  NBS-LRR  NBS-TPR  NBS-WD40
Deltaproteobacteria         100      1        0        30       10
Epsilonproteobacteria       11       0        0        1        1
Alphaproteobacteria         109      0        0        49       0
Betaproteobacteria          125      0        0        51       1
Gammaproteobacteria         375      1        0        121      0
Zetaproteobacteria          1        0        0        0        0
unclassified                4        0        0        2        0
Total:                      725      2        0        254      12

Archaea Subsets             NBS      NBS-ARM  NBS-LRR  NBS-TPR  NBS-WD40
Crenarchaeota               6        0        0        1        0
Euryarchaeota               38       5        0        10       2
Thaumarchaeota              3        0        0        0        0
environmental               3        1        0        0        0
Total:                      50       6        0        11       2
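
The non-exclusive counting described in the Table S7 legend can be sketched as follows. This is a minimal Python illustration, not the survey pipeline itself: the input format (a taxon label plus the set of domains detected in each protein), the function name `tally`, and the example records are all hypothetical. The point it shows is that every NBS-containing protein is counted under NBS, and is additionally counted under each architecture whose partner domain it also carries.

```python
from collections import Counter

# Architectures tallied in Table S7, besides the plain NBS count.
ARCHITECTURES = ["NBS-ARM", "NBS-LRR", "NBS-TPR", "NBS-WD40"]


def tally(proteins):
    """Count architectures per taxon.

    proteins: iterable of (taxon, set_of_domain_names) pairs (hypothetical
    input format). Categories are not mutually exclusive: one protein may
    add to several columns.
    """
    counts = {}
    for taxon, domains in proteins:
        if "NBS" not in domains:
            continue  # only NBS-containing proteins are surveyed
        row = counts.setdefault(taxon, Counter())
        row["NBS"] += 1
        for arch in ARCHITECTURES:
            partner = arch.split("-", 1)[1]  # e.g. "NBS-WD40" -> "WD40"
            if partner in domains:
                row[arch] += 1
    return counts


# Made-up records, for illustration only.
EXAMPLE = [
    ("Bacteria", {"NBS", "TPR"}),
    ("Bacteria", {"NBS"}),
    ("Eukaryota", {"NBS", "LRR", "WD40"}),
    ("Eukaryota", {"LRR"}),  # no NBS domain, so it is skipped
]

for taxon, row in tally(EXAMPLE).items():
    print(taxon, [row[c] for c in ["NBS"] + ARCHITECTURES])
```

In this toy input the single eukaryotic NBS protein contributes to the NBS, NBS-LRR, and NBS-WD40 columns at once, which is why column values in a row need not sum to the NBS count.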

Table S8. List of Prokaryotic NBS-LRR proteins found in the survey of the NCBI NR Protein Set.

Accession       Organism                               Description
ZP_02736921.1   Gemmata obscuriglobus UQM 2246         hypothetical protein GobsU_34225
ZP_06272469.1   Streptomyces sp. ACTE                  transcriptional regulator, XRE family
ZP_05505833.1   Streptomyces sp. C                     large ATP-binding protein
YP_003342826.1  Streptosporangium roseum DSM 43021     NTPase (NACHT family)-like protein
YP_003302145.1  Thermomonospora curvata DSM 43183      putative signal transduction protein with NACHT domain
YP_001103465.1  Saccharopolyspora erythraea NRRL 2338  large ATP-binding protein
ZP_05512066.1   Streptomyces hygroscopicus ATCC 53653  large ATP-binding protein
ZP_04484261.1   Stackebrandtia nassauensis DSM 44728   putative signal transduction protein with NACHT domain
NP_825875.1     Streptomyces avermitilis MA-4680       large ATP-binding protein
ZP_04694130.1   Streptomyces roseosporus NRRL 11379    putative large ATP-binding protein
ZP_04692245.1   Streptomyces roseosporus NRRL 11379    large ATP-binding protein
ZP_05515866.1   Streptomyces hygroscopicus ATCC 53653  large ATP-binding protein

Table S9. Non-plant non-metazoan NBS-LRR proteins.

Species                   Architecture       Identifier                                        Most Similar Clade  Comment
Tetrahymena thermophila   NBS-WD40-LRR       TTHERM_01006540                                   XVIII               Not strict NBS-LRR
Ectocarpus siliculosus    LRR-NBS-LRR        Esi0031_0015                                      -                   Single LRR after NBS is extremely weak hit. Identified as LRR-GTPase of the ROCO family.
Naegleria gruberi         ABC_TM1F*-NBS-LRR  jgi|Naegr1|54705|fgeneshHS_pg.scaffold_166000001  -                   Has N-Terminal ABC_TM1F, ABC-Transporter Trans Membrane Domain. Likely novel ABC-Transporter.
Laccaria bicolor          NBS-LRR            jgi|Lacbi2|469284|fgenesh1_pg.32_#_5              III                 Nearly identical to ARGH35 (XP_002324939.1), an R-protein from Populus trichocarpa. Likely contaminant or recent horizontal gene transfer.
Dictyostelium discoideum  NBS-LRR-NBS        DDB0191297|DDB_G0268636                           -                   Both NBS hits are fragmentary. Poor sequence quality. Hard to interpret.
Monosiga brevicollis      NBS-LRR            jgi|Monbr1|25471|fgenesh2_pg.scaffold_10000121    XVII                NBS resembles the NACHT Domains of Telomerase Protein Component 1 of Strongylocentrotus purpuratus. Possible independently evolved NBS-LRR.

*ABC_TM1F=ABC Transporter Trans-Membrane Domain
