PLASTID-TARGETED PROTEINS ARE ABSENT FROM THE PROTEOMES OF ACHLYA HYPOGYNA AND THRAUSTOTHECA CLAVATA (OOMYCOTA, STRAMENOPILA): IMPLICATIONS FOR THE ORIGIN OF CHROMALVEOLATE PLASTIDS AND THE ‘GREEN GENE’ HYPOTHESIS

Lindsay Rukenbrod

A Thesis Submitted to the University of North Carolina Wilmington in Partial Fulfillment of the Requirements for the Degree of Master of Science

Center for Marine Science

University of North Carolina Wilmington

2012

Approved by

Advisory Committee

D. Wilson Freshwater

Jeremy Morgan

Allison Taylor

J. Craig Bailey, Chair

Accepted by

Dean, Graduate School

This thesis has been prepared in the style and format consistent with the Journal of Eukaryotic Microbiology.


TABLE OF CONTENTS

ABSTRACT

ACKNOWLEDGMENTS

DEDICATION

LIST OF TABLES

LIST OF FIGURES

CHAPTER 1: Implications for the origin of chromalveolate plastids

INTRODUCTION

METHODS

RESULTS AND DISCUSSION

Revised Hypotheses for the Evolution of Chromalveolate Plastids

CHAPTER 2: Do chromalveolate genomes encode ‘green genes’?

INTRODUCTION

METHODS

RESULTS AND DISCUSSION

Green Genes in and Other Chromalveolates?

SUPPLEMENTAL INFORMATION

LITERATURE CITED


ABSTRACT

Chapter 1

The chromalveolate hypothesis predicts that extant nonphotosynthetic stramenopiles are secondarily nonphotosynthetic and derived from ancestors bearing a secondary red-type plastid. To test this hypothesis, the proteomes of the oomycetes Achlya hypogyna and Thraustotheca clavata were canvassed for plastid-targeted genes. Proteins from each species encoding putative plastid-targeting signal peptides were identified, annotated, and, where possible, assigned to protein families. Forty-six candidate proteins were culled from the two genomes. Bioinformatic analyses revealed that the proteomes of Achlya and Thraustotheca do not encode plastid-targeted genes acquired by endosymbiotic gene transfer. All identified proteins possessing non-mitochondrial-targeting signal peptides were judged to belong to the secretome (i.e., extracellularly secreted proteins).

These results indicate that oomycetes are ancestrally aplastidic stramenopiles and do not support the chromalveolate hypothesis of plastid evolution. Revised hypotheses for the origin of plastids characterized by chlorophylls a and c and fucoxanthin are presented. It is concluded that alveolate and stramenopile plastids are likely tertiary or higher-order plastids, not secondary plastids.

Chapter 2

The hypothesis that a green algal symbiont preceded the red algal symbiont that gave rise to red-type plastids in the ancestors of the chromalveolates is reexamined. A network approach was used to detect nuclear-encoded proteins from the genomes of Achlya hypogyna, Thraustotheca clavata, other oomycetes, and other chromalveolates that cluster with green algal genes. Twelve proteins clustering with green algal genes at high stringency were annotated and selected for further analyses.

Representative homologs from all other eukaryotic taxa available were aligned to sequences comprising each network and maximum likelihood trees were constructed from these alignments. Protein trees derived from these data exhibited obvious errors resulting from taxon biases and heterotachy. These results argue that ‘green genes’ detected in phylogenomics studies are artifactual and not indicative of endosymbiotic gene transfer.


ACKNOWLEDGMENTS

My thanks go to my advisor, Dr. J. Craig Bailey, whose enthusiasm about molecular caught my interest in the very beginning of my scientific education. His continuous encouragement, wit, and sense of humor made this journey an enjoyable one. Ian Misner and Dr. Chris Lane of the University of Rhode Island have also been instrumental in my education, providing feedback and technical support in my research. I’d also like to thank my committee members, Dr. D. Wilson Freshwater, Dr. Jeremy Morgan, and Dr. Allison Taylor, for their encouragement and flexibility throughout this process.

My lab mates past and present, particularly Cory Dashiell, Erika Shwarz, Ashley Hayes, and Allison Martin, helped me maintain my focus over the years throughout failed DNA extractions, computer malfunctions, approaching deadlines, and many other graduate school related challenges.

The Department of Biology and Marine Biology, the Center for Marine Science and the National Science Foundation provided financial support for my education and research.

Finally, I’d like to thank my parents and my husband for supporting me every step of the way.


DEDICATION

I’d like to dedicate this to my mother, whose endless patience has allowed me to explore life with few restrictions and overwhelming love and support.


LIST OF TABLES

Chapter 1

1. Protein IDs for 46 hypothetical proteins detected in the genomes of Thraustotheca and/or Achlya

2. Protein ID numbers, annotations and protein family designations

3. Proteins sorted into one of 14 unique protein families

4. List of seven proteins from the Achlya and Thraustotheca genomes and putative homologs found in the plastid proteome

Chapter 2

1. List of 12 annotated proteins from the Achlya and/or Thraustotheca proteomes or other oomycetes found in EGNs


LIST OF FIGURES

Chapter 1

1. Hypotheses for the origin of complex, higher-order chlorophyll a+c-containing plastids in chromalveolates

Chapter 2

1. Three examples of putative green genes in oomycete genomes based on EGN analysis

2. DEXDc ML tree

3. RPB ML tree

4. ALDH ML tree

5. TOR-containing kinase ML tree

6. YAK1 ML tree

7. ALS ML tree


CHAPTER 1: Implications for the origin of chromalveolate plastids.

INTRODUCTION

The evolutionary origin and subsequent movement of secondary and higher-order plastids among photosynthetic eukaryotes is the subject of intense debate. The key to unraveling the evolutionary history of plastids is an accurate understanding of the relationships among both host and plastid lineages (Archibald 2009; Green 2011). This goal is hampered by the mosaic nature of eukaryotic genomes, which comprise lineage-specific genes inherited vertically, thousands of genes acquired by endosymbiotic gene transfer (EGT), and genes obtained via lateral gene transfer (LGT) (Archibald 2008; Green 2011; Keeling 2009; Larkum 2007).

The chromalveolate hypothesis posits that the , , and stramenopiles are monophyletic and that the last common ancestor of these lineages was a photosynthetic alga bearing a red-type plastid (Cavalier-Smith 1999; 2003). This notion is supported, first, by the fact that photosynthetic members of these chlorophyll a+c-containing groups all possess red-type plastids surrounded by three or four unit membranes [the so-called chloroplast endoplasmic reticulum, or CER], a feature indicative of secondary endosymbiosis (Dodge 1975; Foth and McFadden 2003; Guillot and Gibbs 1980a, b; Gibbs 1981a, b; Köhler et al. 1997). Second, nuclear-encoded plastid-targeted proteins in these lineages are characterized by the presence of a 5’ bipartite signal sequence that directs gene products to the plastid and across the outer and inner pairs of plastid membranes (Kroth 2002; Soll and Schleiff 2004). Third, in terms of coding capacity, gene content, and organization, the plastid genomes of chromalveolates resemble those of red algae far more closely than they resemble the plastid genomes of green algae (Delwiche 1999; Keeling 2004; Yoon et al. 2002). Cavalier-Smith (1999) originally emphasized that the chromalveolate hypothesis is consistent with the idea that the CER and the complex protein-trafficking systems that characterize chromalveolates are unlikely to have evolved independently on different occasions (see Kroth 2002; Ralph et al. 2004).

Over the last decade, the ‘chromalveolate’ concept has been tested, implicitly or explicitly, by numerous broad-scale phylogenetic studies. The chromalveolates have not been recovered as a monophyletic group in any study (Archibald 2009; Baurain et al. 2010). More recent studies imply that the relationships among chromalveolate host cells and their plastids are more complex than originally supposed, perhaps involving tertiary and higher-order transfers among hosts (Archibald 2009; Bodyl 2005; Keeling 2004; Sanchez-Puerta and Delwiche 2008). In this paper the chromalveolate hypothesis is re-examined in light of new genomic data available for nonphotosynthetic members of the Stramenopila.

The stramenopiles, one of the four principal taxa included in the Chromalveolata, are divided into two groups. (i) The ‘photosynthetic stramenopiles’, ‘ algae’, or ‘ochrophytes’ comprise chlorophyll a+c-containing photosynthetic algae, including phaeophytes, chrysophytes, diatoms, eustigmatophytes, pelagophytes, and xanthophytes (Lee et al. 2000). (ii) The nonphotosynthetic stramenopiles are bactivorous, parasitic, or saprobic heterotrophs, including bicosoecids, hyphochytrids, labyrinthulids, oomycetes, and thraustochytrids, among others (Lee et al. 2000). The oomycetes are the most diverse, best studied, and most economically important of all nonphotosynthetic stramenopiles.

The chromalveolate hypothesis implies that extant aplastidic stramenopiles are derived from ancestors that once possessed a secondary red-type plastid. However, there is no ultrastructural or DNA evidence that bicosoecids, hyphochytrids, labyrinthulids, oomycetes, or thraustochytrids possess, or once possessed, a plastid, and evidence for cryptic plastids in these organisms is absent or controversial (Lee et al. 2000; Reyes-Prieto et al. 2008; Slamovits and Keeling 2008; Stiller et al. 2009).

In this study the proteomes of the oomycetes Achlya hypogyna and Thraustotheca clavata were canvassed in search of photosynthesis-related genes.

METHODS

Full-length predicted proteins were obtained from ongoing genome sequencing projects for Achlya hypogyna (ATCC 48635) and Thraustotheca clavata (ATCC 34112), which are estimated to encode 17,430 and 12,154 predicted proteins, respectively; additional details will be published separately. The Achlya and Thraustotheca proteomes were searched for possible plastid-targeted proteins using the chloroplast transit peptide prediction program ChloroP (v.1.1) (Emanuelsson et al. 1999). Hypothetical proteins returned from these searches were subsequently analyzed using SignalP (v.4.0) (Petersen et al. 2011), annotated, and, where possible, assigned to protein families using the Conserved Domain Database (CDD) (Marchler-Bauer et al. 2007). Mitochondrion-targeted proteins and proteins possessing transmembrane regions identified using TMHMM (v.2.0) were removed from the data set (Krogh et al. 2001). Searches for heterokont-like bipartite plastid-targeting peptides, consisting of both signal and transit peptide motifs, were conducted using HECTAR (Gruber et al. 2007; Gschloessl et al. 2008; Waller et al. 2000). Finally, the oomycete proteins were BLASTed against the Arabidopsis thaliana plastid proteome database (which includes plastid-encoded and nuclear-encoded plastid-targeted proteins) using plprot v.2.3 (Baginsky et al. 2005; Kleffmann et al. 2004; 2006).
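The exclusion logic described above can be summarized as a single filtering pass. The sketch below is a minimal illustration, not the pipeline actually used: it assumes each protein's predictor outputs (ChloroP, SignalP, a mitochondrial-targeting call, and a TMHMM helix count) have already been parsed into the hypothetical `Prediction` record, and it treats a call from either ChloroP or SignalP as evidence of an N-terminal presequence, since Table 1 lists some proteins without a ChloroP call. The HECTAR categories and the plprot BLAST step are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Predictor calls for one protein (field names are hypothetical)."""
    protein_id: str
    chlorop_transit: bool  # ChloroP reports a chloroplast transit peptide
    signalp_signal: bool   # SignalP reports a signal peptide
    mito_targeted: bool    # some predictor reports mitochondrial targeting
    tm_helices: int        # number of TMHMM-predicted transmembrane helices


def candidate_plastid_proteins(predictions):
    """Return IDs of proteins that survive the filtering steps, in input order."""
    kept = []
    for p in predictions:
        # Assumption: a call from either predictor counts as an N-terminal presequence.
        if not (p.chlorop_transit or p.signalp_signal):
            continue
        if p.mito_targeted:   # exclusion step 1: mitochondrion-targeted proteins
            continue
        if p.tm_helices > 0:  # exclusion step 2: proteins with transmembrane regions
            continue
        kept.append(p.protein_id)
    return kept


# Toy input with hypothetical IDs, not real Achlya/Thraustotheca data.
preds = [
    Prediction("prot_A", True, True, False, 0),    # kept
    Prediction("prot_B", True, False, True, 0),    # mitochondrial, dropped
    Prediction("prot_C", False, True, False, 2),   # TM helices, dropped
    Prediction("prot_D", False, False, False, 0),  # no presequence, dropped
]
print(candidate_plastid_proteins(preds))  # ['prot_A']
```

Because a hydrophobic signal peptide can be reported by TMHMM as a transmembrane helix, a real filter would probably need to ignore helices overlapping the predicted signal peptide; the zero-helix rule here is a simplification.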

RESULTS AND DISCUSSION

The chromalveolate hypothesis implies that the ancestors of oomycetes were photosynthetic organisms bearing red-type plastids, and putative plastid-related genes have been reported from the genomes of the pathogens Phytophthora ramorum and P. sojae (Tyler et al. 2006). The competing hypothesis is the long-held view that oomycetes are ancestrally aplastidic. It is possible that the ancestors of oomycetes were photosynthetic but that extant members of the group have not retained any plastid-associated genes. On the other hand, empirical data, including studies of apicomplexans and other taxa, imply that plastid-associated genes are unlikely to be completely purged from the genome, even in organisms that lack a vestigial, nonphotosynthetic plastid (Barbrook et al. 2006; de Koning and Keeling 2004; Matsuzaki et al. 2008; Sanchez-Puerta et al. 2007; Wilson 2004).

Thirty hypothetical proteins from the Achlya genome and 16 from the Thraustotheca genome putatively possessing a 5’ plastid-targeting signal peptide were identified (Table 1). Of these 46 proteins, 22 are presently characterized as hypothetical proteins of unknown function; the remaining 24 were annotated (E-value < 1E-25) and found to represent 14 unique protein families (Tables 2, 3).
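The family counts above can be checked directly against Table 3. The short sketch below transcribes the Table 3 assignments (pfam identifiers as printed, including the truncated `0017`) and tallies families and proteins. It recovers 14 families covering 20 proteins; the four GPI-anchored serine-rich proteins listed in Table 2 without a pfam assignment would account for the difference, assuming they are among the 24 annotated proteins.

```python
from collections import Counter

# pfam ID -> protein IDs, transcribed from Table 3 of this chapter.
TABLE3 = {
    "00188": ["THRCLA_03737"],
    "07648": ["THRCLA_04285"],
    "00112": ["THRCLA_08011", "ACHHYP_06505", "ACHHYP_12628", "ACHHYP_15409"],
    "00964": ["THRCLA_11271"],
    "00728": ["THRCLA_11391", "ACHHYP_01095"],
    "12937": ["ACHHYP_01226"],
    "13365": ["ACHHYP_02169"],
    "00704": ["THRCLA_10997", "ACHHYP_03044"],
    "00652": ["ACHHYP_04706", "ACHHYP_09519"],
    "01565": ["ACHHYP_05005"],
    "00150": ["ACHHYP_05326"],
    "00024": ["ACHHYP_08323"],
    "12796": ["ACHHYP_10824"],
    "0017": ["ACHHYP_11286"],  # identifier as printed in Table 3
}


def tally(table):
    """Return (number of families, number of proteins, proteins per genome prefix)."""
    proteins = [pid for ids in table.values() for pid in ids]
    assert len(set(proteins)) == len(proteins), "a protein is listed twice"
    per_genome = Counter(pid.split("_")[0] for pid in proteins)
    return len(table), len(proteins), dict(per_genome)


n_families, n_proteins, per_genome = tally(TABLE3)
print(n_families, n_proteins, per_genome)  # 14 20 {'THRCLA': 6, 'ACHHYP': 14}
```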

BLASTp queries revealed that none of the oomycete proteins (Table 2) matches any protein encoded by the 271 eukaryotic plastid genomes sequenced to date. None of the 46 presequences examined here possesses the ASAFP (Y/W/L) motif necessary for plastid import in diatoms, although the significance of this observation is unclear (Gruber et al. 2007) (see supplementary Tables S1 and S2).
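The motif check can be illustrated with a minimal sketch. As we read Gruber et al. (2007), the key feature is the residue immediately downstream of the signal-peptide cleavage site (the +1 position), where phenylalanine is typical and, per the "(Y/W/L)" note above, tyrosine, tryptophan, or leucine may substitute; that reading is an assumption here, and the function names are hypothetical. The cleavage site is taken as the signal-peptide length reported by a predictor such as SignalP, so the +1 residue sits at 0-based index `cleavage_site`. The sequences are toy examples, not oomycete presequences.

```python
# Residues accepted at the +1 position (assumption based on the "(Y/W/L)" note in the text).
PLUS_ONE_RESIDUES = frozenset("FYWL")


def plus_one_residue(sequence, cleavage_site):
    """Residue right after a signal peptide of length `cleavage_site`, or None if out of range."""
    if 0 < cleavage_site < len(sequence):
        return sequence[cleavage_site].upper()
    return None


def has_plus_one_motif(sequence, cleavage_site):
    """True if the +1 residue is F, Y, W, or L."""
    return plus_one_residue(sequence, cleavage_site) in PLUS_ONE_RESIDUES


def flank(sequence, cleavage_site, k=3):
    """Up to k residues on each side of the cleavage site, for manual inspection."""
    start = max(0, cleavage_site - k)
    return sequence[start:cleavage_site + k]


toy = "MKTLLLAVALASAFAPTEK"     # hypothetical presequence; cleavage after residue 13
mutant = "MKTLLLAVALASAKAPTEK"  # same, with K at +1
print(flank(toy, 13), has_plus_one_motif(toy, 13), has_plus_one_motif(mutant, 13))
```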

Putative homologs of seven of the oomycete proteins were detected in the A. thaliana plastid proteome (Table 4). These seven oomycete proteins are more-or-less distant relatives of three A. thaliana genes. Both Achlya and Thraustotheca encode proteins similar to the zinc-finger-type WRKY1 DNA-binding transcription factor, which plays a role in disease resistance in A. thaliana (Dong et al. 2003; Shindo et al. 2012). Three Achlya proteins and one Thraustotheca protein putatively encoding the cysteine proteinase RD21A also have homologs in the A. thaliana plastid proteome. Finally, a single Achlya protein distantly related (E-value 6E-17) to A. thaliana aldehyde dehydrogenase (ALDH) was also detected.

These three genes are not indicators of photosynthesis per se, because homologs have been detected from across the tree of life in photosynthetic (e.g., and green algae) and nonphotosynthetic organisms (e.g., eubacteria, , fungi, and the amoebozoan Dictyostelium). Homologs of each of these genes that are more closely related to the Achlya and Thraustotheca proteins have previously been detected in the genomes of Phytophthora infestans, P. sojae (Pythiales), and the white rust Albugo laibachii (Tyler et al. 2006).

The annotated proteins recovered in this study include nine known to belong to oomycete secretomes, six of which are common hydrolytic enzymes such as proteases, chitinase, and cellulase (Tables 2, 3; Birch et al. 2006; Gaulin et al. 2008; Kamoun 2006; Levesque et al. 2010). One of the proteins belongs to the elicitin family, a family of virulence genes unique to oomycetes (Jiang et al. 2006). Based upon these data, plastid-associated genes are not present in the Achlya or Thraustotheca predicted proteomes.

Revised Hypotheses for the Evolution of Chromalveolate Plastids

These data, together with the study by Stiller et al. (2009), indicate that oomycetes are ancestrally aplastidic despite reports to the contrary (Tyler et al. 2006). This information and the results of recent phylogenomic investigations have been synthesized, and revised hypotheses for the evolution of chromalveolate plastids are presented in Figures 1 and 2. These diagrams reflect a number of assumptions that are enumerated here for the sake of clarity. (i) The Chromalveolata sensu stricto is paraphyletic (e.g., Iida et al. 2007; Khan et al. 2007; reviewed in Green 2011; Rogers et al. 2007). (ii) Oomycetes, all other heterotrophic stramenopiles, as well as the are ancestrally aplastidic (Archibald 2008; Reyes-Prieto et al. 2008; Tyler et al. 2006). (iii) The SAR clade is recognized as natural (Burki et al. 2007; Hackett et al. 2007; Lane & Archibald 2008).

(iv) Recent studies imply that SAR and Hacrobia host cells are likely distantly related (Baurain et al. 2010; Hackett et al. 2007; Parfrey et al. 2010). For these reasons, no specifically defined relationship between SAR and Hacrobia host cells is implied in Figure 1. The diagrams comprising Figure 1 are drawn under the assumption that the Hacrobia is monophyletic (Burki et al. 2007; Hackett et al. 2007; Harper et al. 2005; Patron et al. 2007).

These hypotheses share elements with prior models of chromalveolate plastid evolution in which multiple plastid acquisitions (or plastid replacements) are inferred via serial endosymbiotic transfer (Archibald 2008; Bodyl 2005; Bodyl et al. 2009; Bodyl and Moszczynski 2006; Sanchez-Puerta & Delwiche 2008). Two predictions derived from these models bear emphasizing: (1) alveolates and stramenopiles likely possess tertiary or quaternary plastids, and (2) it is conceivable that one of these taxa, the alveolates or the stramenopiles, obtained its plastid from the other (Fig. 1). Finally, it is noted that the number of membranes surrounding higher-order, complex plastids seems to be fixed at four or fewer.
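The ordinal labels used for these plastids follow from counting endosymbiotic steps back to the cyanobacterial ancestor. The sketch below encodes the three scenarios of Fig. 1 as hypothetical recipient-to-donor maps, assuming the Hacrobia plastid is secondary (derived from a red alga), and recovers the 3° and 4° designations given in the figure legend. It is an illustration of the counting argument only, not a phylogenetic method.

```python
# Donor chain shared by all scenarios (assumption):
# red alga <- cyanobacterium (primary); Hacrobia <- red alga (secondary).
SHARED = {"red alga": "cyanobacterium", "Hacrobia": "red alga"}

# Recipient -> donor for the chlorophyll a+c lineages in Fig. 1 (hypothetical encoding).
SCENARIOS = {
    "A": {"alveolates": "Hacrobia", "stramenopiles": "Hacrobia"},
    "B": {"stramenopiles": "Hacrobia", "alveolates": "stramenopiles"},
    "C": {"alveolates": "Hacrobia", "stramenopiles": "alveolates"},
}


def plastid_order(lineage, donors):
    """Count endosymbiotic steps from `lineage` back to the free-living cyanobacterium."""
    order, seen = 0, set()
    while lineage in donors:
        if lineage in seen:
            raise ValueError(f"cyclic donor chain at {lineage!r}")
        seen.add(lineage)
        lineage = donors[lineage]
        order += 1
    return order


orders = {
    name: {host: plastid_order(host, {**SHARED, **edges}) for host in edges}
    for name, edges in SCENARIOS.items()
}
print(orders)  # A: both 3 (tertiary); B: alveolates 4; C: stramenopiles 4
```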


Fig. 1 Hypotheses for the origin of complex, higher-order chlorophyll a+c-containing plastids in chromalveolates. (A) Independent acquisition of a tertiary (3°) plastid in the alveolate and stramenopile lineages from the Hacrobia lineage. (B) Serial endosymbiotic transfer resulting in a quaternary (4°) alveolate plastid from the 3° stramenopile plastid. (C) Serial endosymbiotic transfer resulting in a 4° stramenopile plastid from the 3° alveolate plastid.


Table 1. Protein IDs for 46 hypothetical proteins detected in the genomes of Thraustotheca and/or Achlya characterized by the presence of a putative plastid-targeting 5’ signal peptide sequence. ChloroP was used to detect classical plastid transit peptides. HECTAR was used to search for bipartite plastid-targeting leader sequences characteristic of stramenopiles and other 3° chromalveolates (Kilian and Kroth 2003, McFadden and van Dooren 2004, Vesteg et al. 2009).

Protein ID      SignalP  ChloroP  HECTAR

Thraustotheca clavata
THRCLA_02069    Y        Y        Chloroplast
THRCLA_03737    Y        Y        Signal peptide
THRCLA_03876    Y        Y        Signal peptide
THRCLA_04285    Y        -        Signal peptide AA
THRCLA_04386    Y        Y        Signal peptide
THRCLA_04952    N        Y        Signal peptide
THRCLA_05863    Y        Y        Signal peptide
THRCLA_06099    Y        Y        Signal peptide
THRCLA_07047    Y        Y        Signal peptide
THRCLA_08011    Y        -        Signal peptide
THRCLA_10855    N        -        Signal peptide
THRCLA_10997    N        -        Signal peptide
THRCLA_11248    Y        Y        No N-terminal target peptide found
THRCLA_11271    Y        Y        Chloroplast
THRCLA_11391    Y        Y        Signal peptide
THRCLA_11516    Y        Y        Signal peptide

Achlya hypogyna
ACHHYP_00269    Y        Y        Signal peptide
ACHHYP_01095    Y        -        Signal peptide
ACHHYP_01226    Y        -        Signal peptide B
ACHHYP_01546    Y        Y        Signal peptide B
ACHHYP_02169    Y        Y        Chloroplast
ACHHYP_02305    Y        Y        Signal peptide
ACHHYP_03044    Y        Y        Signal peptide
ACHHYP_03052    N        -        Signal peptide
ACHHYP_04549    Y        Y        Chloroplast
ACHHYP_04706    Y        Y        Signal peptide
ACHHYP_04908    Y        Y        Signal peptide
ACHHYP_05005    Y        Y        Signal peptide
ACHHYP_05180    Y        Y        Signal peptide
ACHHYP_05326    Y        Y        Signal peptide
ACHHYP_05770    Y        Y        Signal peptide
ACHHYP_06287    Y        -        Signal peptide
ACHHYP_06505    Y        -        Signal peptide
ACHHYP_06977    Y        Y        Signal peptide
ACHHYP_07400    Y        Y        Chloroplast
ACHHYP_08323    Y        -        Signal peptide
ACHHYP_09221    Y        Y        Chloroplast
ACHHYP_09519    Y        Y        Chloroplast
ACHHYP_10824    Y        Y        Signal peptide
ACHHYP_11025    Y        Y        Chloroplast
ACHHYP_11286    Y        -        Signal peptide
ACHHYP_11397    Y        Y        Signal peptide
ACHHYP_12628    Y        -        Chloroplast
ACHHYP_13722    Y        Y        Chloroplast
ACHHYP_14385    Y        Y        Signal peptide
ACHHYP_15409    Y        Y        Chloroplast


Table 2. Protein ID numbers, annotations (E-value < 1E-25), and protein family designations for 46 proteins from the Thraustotheca and Achlya genomes putatively possessing 5’ plastid-targeting signal peptides.

Gene/Protein ID | Annotation | pfam

Thraustotheca clavata
THRCLA_02069 | putative GPI-anchored serine-rich hypothetical protein | -
THRCLA_03737 | cd05384: SCP_PRY1_like [COG2340] | pfam00188
THRCLA_03876 | hypothetical protein, with EGF-like motif | -
THRCLA_04285 | Kazal-type serine proteinase inhibitor | pfam07648
THRCLA_04386 | hypothetical protein | -
THRCLA_04952 | hypothetical protein | -
THRCLA_05863 | hypothetical protein | -
THRCLA_06099 | putative GPI-anchored serine-rich hypothetical protein | -
THRCLA_07047 | hypothetical protein | -
THRCLA_08011 | cysteine protease family C01A, putative | pfam00112
THRCLA_10855 | hypothetical protein | -
THRCLA_10997 | chitinase D-like | pfam00704
THRCLA_11248 | hypothetical protein, unknown function | -
THRCLA_11271 | hypothetical protein, elicitin superfamily | pfam00964
THRCLA_11391 | beta-N-acetylglucosaminidase | pfam00728
THRCLA_11516 | hypothetical protein, unknown function | -

Achlya hypogyna
ACHHYP_00269 | putative GPI-anchored serine-rich hypothetical protein | -
ACHHYP_01095 | beta-N-acetylglucosaminidase | pfam00728
ACHHYP_01226 | hypothetical protein | pfam12937
ACHHYP_01546 | hypothetical protein | -
ACHHYP_02169 | trypsin-like serine protease | pfam13365
ACHHYP_02305 | putative GPI-anchored serine-rich hypothetical protein | -
ACHHYP_03044 | putative chitinase-like carbohydrate-binding protein | pfam00704
ACHHYP_03052 | hypothetical protein | -
ACHHYP_04549 | hypothetical protein | -
ACHHYP_04706 | hypothetical protein encoding ricin_B_lectin | pfam00652
ACHHYP_04908 | hypothetical protein | -
ACHHYP_05005 | putative D-lactate dehydrogenase | pfam01565
ACHHYP_05180 | hypothetical protein | -
ACHHYP_05326 | cellulase | pfam00150
ACHHYP_05770 | hypothetical protein | -
ACHHYP_06287 | hypothetical protein | -
ACHHYP_06505 | papain family cysteine protease | pfam00112
ACHHYP_06977 | hypothetical protein | -
ACHHYP_07400 | hypothetical protein | -
ACHHYP_08323 | hypothetical protein containing PAN domain | pfam00024
ACHHYP_09221 | hypothetical protein | -
ACHHYP_09519 | hypothetical protein encoding ricin_B_lectin | pfam00652
ACHHYP_10824 | ankyrin repeat protein | pfam12796
ACHHYP_11025 | hypothetical protein | -
ACHHYP_11286 | aldehyde dehydrogenase | pfam0017
ACHHYP_11397 | hypothetical protein | -
ACHHYP_12628 | papain-like cysteine protease C1 | pfam00112
ACHHYP_13722 | hypothetical protein | -
ACHHYP_14385 | hypothetical protein | -
ACHHYP_15409 | papain-like cysteine protease C1 | pfam00112


Table 3. Proteins investigated in this study were sorted into one of 14 unique protein families, which are listed below. Note that all proteins investigated (see Table 2) are predicted to have a 5’ signal peptide and that nine of the 14 families include secreted proteins. Six of the families include proteases, and the elicitin family of virulence proteins is secreted extracellularly and is unique to oomycetes.

pfam ID | Protein ID(s) | Protein family / Conserved domains
00188 | THRCLA_03737 | Cysteine-rich secretory protein family
07648 | THRCLA_04285 | Kazal_2: Kazal-type serine protease inhibitor domain
00112 | THRCLA_08011, ACHHYP_06505, ACHHYP_12628, ACHHYP_15409 | Peptidase_C1: Papain family cysteine protease
00964 | THRCLA_11271 | Elicitin
00728 | THRCLA_11391, ACHHYP_01095 | Glyco_hydro_20: Glycosyl hydrolase family 20, catalytic domain
12937 | ACHHYP_01226 | F-box-like
13365 | ACHHYP_02169 | Trypsin_2: Trypsin-like peptidase domain
00704 | THRCLA_10997, ACHHYP_03044 | Glyco_hydro_18: Glycosyl hydrolases family 18
00652 | ACHHYP_04706, ACHHYP_09519 | Ricin_B_lectin: Ricin-type beta-trefoil lectin domain
01565 | ACHHYP_05005 | FAD_binding_4: FAD binding domain
00150 | ACHHYP_05326 | Cellulase: Cellulase (glycosyl hydrolase family 5)
00024 | ACHHYP_08323 | PAN_1: PAN domain
12796 | ACHHYP_10824 | Ank_2: Ankyrin repeats
0017 | ACHHYP_11286 | Aldehyde dehydrogenase superfamily (ALDH-SF)


Table 4. List of seven proteins from the Achlya hypogyna and Thraustotheca clavata oomycete genomes and putative homologs found in the Arabidopsis thaliana plastid proteome. Reference refers to functional studies of the genes identified in this analysis.

Oomycete protein ID | A. thaliana plastid proteome ID | Gene annotation | E-value | Reference
ACH_05770 | plp_at_01492 | disease resistance protein related to DNA-binding protein WRKY1 | 2.00E-20 | -
THR_04952 | plp_at_01492 | disease resistance protein related to DNA-binding protein WRKY1 | 7.00E-23 | -
THR_08011 | plp_at_00089 | cysteine proteinase RD21A (= thiol protease RD21A) | 4.00E-53 | Shindo et al. 2012
ACH_15409 | plp_at_00089 | cysteine proteinase RD21A (= thiol protease RD21A) | 1.00E-24 | Shindo et al. 2012
ACH_12628 | plp_at_00089 | cysteine proteinase RD21A (= thiol protease RD21A) | 1.00E-18 | Shindo et al. 2012
ACH_06505 | plp_at_00089 | cysteine proteinase RD21A (= thiol protease RD21A) | 9.00E-47 | Shindo et al. 2012
ACH_11286 | plp_at_00466 | aldehyde dehydrogenase (ALDH) | 6.00E-17 | -

CHAPTER 2: Do chromalveolate genomes encode ‘green genes’?

INTRODUCTION

One of the most vexing problems in systematics concerns the interrelationships among the so-called ‘chromalveolates’ (Archibald 2008; Cavalier-Smith 1999; Green 2011; Keeling 2004). The Chromalveolata is a paraphyletic taxon whose members can be divided into two groups. The first group (the SAR clade) includes the alveolates (apicomplexans, dinoflagellates, and ciliates), which are sister to the stramenopiles (including phaeophytes, chrysophytes, and oomycetes). In turn, these two clades are sister to the Rhizaria, a group principally comprised of free-living amoebae (Burki et al. 2007; Hackett et al. 2007; Lane and Archibald 2008; Rogers et al. 2007). The second group, the Hacrobia, includes cryptomonads and haptophytes and lesser-known relatives such as the telonemids, , and picobiliphytes (Burki et al. 2007; Elias and Archibald 2009; Hackett et al. 2007; Okamoto et al. 2009; Rice and Palmer 2006; Patron et al. 2007). The exact relationship between host cells and plastids belonging to members of the SAR and Hacrobia clades is unclear (Baurain et al. 2010; Harper et al. 2005). Despite these uncertainties, it is clear that all photosynthetic chromalveolates possess secondary or higher-order plastids, bounded by three or four membranes, that are ultimately derived from a red alga (Hackett et al. 2004; Janouskovec et al. 2010; Khan et al. 2007; Yoon et al. 2002; 2004; Sanchez-Puerta et al. 2007). How these plastids were acquired is a contentious issue, but most recent models reflect a growing consensus that multiple independent origins and/or serial endosymbiotic events best explain the available data (Bodyl 2005; Bodyl and Moszczynski 2006; Sanchez-Puerta and Delwiche 2008).

Understanding of the evolutionary history of chromalveolates has recently been further complicated by the unexpected discovery of so-called ‘green genes’ in chromalveolate genomes. Whole-genome sequencing and EST studies have revealed that the genomes of chromalveolate species encode hundreds or thousands of genes apparently derived from within the green algal lineage (Moustafa et al. 2009; Tyler et al. 2006; Woehle et al. 2011). For example, the genomes of the diatoms Phaeodactylum and Thalassiosira reportedly contain thousands of genes whose phylogenetic affinities lie within the green algae (Armbrust et al. 2004; Bowler et al. 2008; Chan et al. 2011; Moustafa et al. 2009). Putative ‘green genes’ (albeit fewer in number) have also been detected in the genomes of other chromalveolates (Cock et al. 2010). The presence of ‘green genes’ has led some authorities to speculate that the last common ancestor of the chromalveolates once harbored a green algal symbiont that was later replaced by a red algal symbiont, which gave rise to the chlorophyll a+c-containing red-type plastids that characterize most extant chromalveolates (Armbrust 2009; Dorrell & Smith 2011; Frommolt et al. 2008; Moustafa et al. 2009). In short, the green genes found in chromalveolate genomes are hypothesized to have been obtained via endosymbiotic gene transfer (EGT) (Huang et al. 2004; Reyes-Prieto et al. 2008; Slamovits and Keeling 2008; Tyler et al. 2006).

Other studies suggest, implicitly or explicitly, that the green phylogenetic signal in chromalveolate (particularly diatom) genomes may be more apparent than real. Biases associated with the heuristic phylogenomics pipelines needed to construct genome-scale trees, and with the uneven distribution of protein sequences among eukaryotic taxa, have been described previously (Stiller et al. 2009; Woehle et al. 2011). In this study, the genomes of two chromalveolates, the nonphotosynthetic stramenopiles Achlya and Thraustotheca, were canvassed for proteins of putative green algal origin. These proteins were annotated and combined with homologs from other oomycete genomes or expressed sequence tag (EST) databases, and with homologs representing all other available eukaryotic taxa. The resulting phylogenetic trees were used (1) to determine whether nonphotosynthetic, aplastidic oomycetes encode green algal genes similar to those found in diatoms and other chromalveolates (if oomycetes are ancestrally nonphotosynthetic, their genomes should not encode ‘green genes’), and (2) to critically reassess the veracity of the green genes reported in chromalveolates in toto.

METHODS

The genomes of Achlya hypogyna (ATCC 48635) and Thraustotheca clavata (ATCC 34112) were sequenced and assembled, yielding 17,430 and 12,154 predicted proteins, respectively. Green genes possibly obtained by horizontal gene transfer (HGT) or EGT events were identified using evolutionary gene network (EGN) analyses as described in Bittner et al. (2010). In brief, all sequences were BLASTed against one another. Two sequences were connected in the EGN connected-components graph when their BLASTp E-value fell below a user-defined threshold and their sequence identity equaled or exceeded a user-defined minimum. For example, an EGN with user-defined parameters of ‘1E-20 at 80% similarity’ connects sequences that have BLASTp E-values below 1E-20 and sequence identities equal to or greater than 80%.
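The edge rule above, followed by extraction of connected components, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the EGN software of Bittner et al. (2010): BLASTp hits are assumed to be pre-parsed into `(query, subject, evalue, percent_identity)` tuples, the function name `egn_components` and the sequence IDs are hypothetical, and any additional filters applied by the published tool are not modeled. Components are found with a union-find structure.

```python
def egn_components(nodes, hits, max_evalue=1e-20, min_identity=80.0):
    """Group sequences into connected components of a thresholded BLAST graph.

    An edge joins two sequences when evalue < max_evalue and
    percent identity >= min_identity.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for query, subject, evalue, identity in hits:
        if query != subject and evalue < max_evalue and identity >= min_identity:
            union(query, subject)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(group) for group in groups.values())


nodes = ["oom1", "oom2", "chlamy1", "fungus1", "diatom1"]
hits = [
    ("oom1", "chlamy1", 1e-40, 85.0),
    ("chlamy1", "oom2", 1e-25, 81.0),
    ("oom1", "fungus1", 1e-30, 60.0),     # fails the 80% identity cutoff
    ("diatom1", "fungus1", 1e-10, 90.0),  # fails the 1E-20 E-value cutoff
]
print(egn_components(nodes, hits))
print(egn_components(nodes, hits, min_identity=35.0))
```

At a 35% identity cutoff the lower-identity oomycete-fungal hit also becomes an edge and the fungal sequence joins the network, which illustrates how network membership depends on the identity threshold chosen.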


In this study, batches of networks were separately constructed with minimum protein identity thresholds of 35, 45, and 65% and an E-value threshold of 1E-20. Networks including oomycete proteins and one or more protein sequences derived from representatives of (1) the green algal lineage (GAL) or (2) Fungi were selected for further investigation. Annotations for candidate HGT/EGT proteins in the Achlya and Thraustotheca genomes were then refined using NCBI’s Conserved Domain Database (CDD) and KOG databases (Marchler-Bauer et al. 2007; Tatusov et al. 2003) and used to drive BLASTp searches aimed at recovering more distantly related eukaryotic homologs from GenBank. Homologous sequences from representatives of all available eukaryotic lineages were selected and aligned using “Geneious Alignment” with default settings in Geneious v5.5 (Drummond et al. 2011) and manually edited as necessary. Thus, each protein alignment included all sequences in the EGN of interest, as well as a number of more distant homologs from other eukaryotes. Maximum likelihood trees for each protein alignment were constructed using PhyML (Guindon et al. 2010) with the WAG substitution model (Whelan & Goldman 2001) and 500 bootstrap replicates. Bayesian posterior probabilities were calculated using the MrBayes plugin for Geneious, run with default settings under the WAG substitution model.
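The network-selection criterion described above can be expressed as a filter over the components computed at each identity threshold. In this sketch the lineage labels (`"oomycete"`, `"GAL"`, `"Fungi"`, `"diatom"`), the sequence IDs, the function name `select_networks`, and the per-threshold component lists are hypothetical toy data; only the rule itself (an oomycete sequence plus at least one green algal lineage or fungal sequence) comes from the text.

```python
def select_networks(components, lineage_of, partners=("GAL", "Fungi")):
    """Keep components containing an oomycete and at least one partner-lineage sequence."""
    wanted = set(partners)
    selected = []
    for component in components:
        lineages = {lineage_of[seq_id] for seq_id in component}
        if "oomycete" in lineages and lineages & wanted:
            selected.append(component)
    return selected


lineage_of = {
    "oom1": "oomycete", "oom2": "oomycete", "oom3": "oomycete",
    "chlamy1": "GAL", "chlamy2": "GAL", "diatom1": "diatom", "yeast1": "Fungi",
}
# Components per minimum identity threshold (%); toy data only.
by_threshold = {
    35: [["oom1", "chlamy1", "oom2", "diatom1"], ["oom3", "yeast1"], ["chlamy2"]],
    65: [["oom1", "chlamy1"], ["oom2", "diatom1"], ["oom3"], ["yeast1"], ["chlamy2"]],
}
selected = {t: select_networks(c, lineage_of) for t, c in by_threshold.items()}
print(selected)
```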

RESULTS AND DISCUSSION

Because they are ancestrally aplastidic, oomycetes are an ideal foil for examining the hypothesis that chromalveolate genomes harbor varying numbers of green genes acquired via EGT from an ancient green algal endosymbiont (Dorrell & Smith 2011; Moustafa et al. 2009). Genes of cyanobacterial and/or red algal origin were originally reported for the genomes of Phytophthora ramorum and P. sojae, but it has since been demonstrated that these genes are very unlikely to reflect cyanobacterial or red algal contributions to these genomes (Tyler et al. 2006; Stiller et al. 2009; Woehle et al. 2011).

In this study, 12 protein-encoding genes from the Achlya, Thraustotheca, or other oomycete genomes were examined that, based on EGN analyses, are closely related to genes found in green algae (Table 1). Three exemplary networks are depicted in Figure 1. These networks indicate that Phytophthora spp. share one or more copies of the pyruvate, phosphate dikinase (PPDK) gene with the green algae Chlamydomonas and Volvox (Fig. 1a). The PPDK gene is, however, absent from the genomes of Achlya and Thraustotheca. This observation, coupled with the current understanding of oomycete systematics, implies that PPDK was likely acquired in the Phytophthora lineage following the pythialean/saprolegnialean divergence (Beakes & Sekimoto 2009; Sekimoto et al. 2009). In any event, the PPDK network clearly demonstrates a putative green algal gene in Phytophthora spp. that is not known from other oomycetes. If the Phytophthora PPDK genes were acquired via EGT, this observation is most parsimoniously interpreted as a recent event, not one that can be associated with the presence of an ancient green algal symbiont. All oomycetes examined encode single copies of eukaryotic translation initiation factor 5B and of an aldehyde dehydrogenase whose most similar homologs are putatively found in the bryophyte Physcomitrella patens (Figs. 1b and 1c, respectively).
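The parsimony reasoning applied to PPDK can be made explicit with a unit-cost small-parsimony (Sankoff) count on a presence/absence character. This is an illustrative sketch, not the analysis performed in this study. It assumes a simplified rooted tree in which the Phytophthora species form one lineage, sister to the saprolegnialeans Achlya and Thraustotheca. It also assumes that events are counted from an ancestor that lacked the gene. The taxon set, the resolution within Phytophthora, and the function names are hypothetical.

```python
INF = float("inf")
STATES = (0, 1)  # 0 = gene absent, 1 = gene present


def sankoff(node, leaf_state):
    """For each possible state at `node`, return the minimum number of
    gains/losses in the subtree below it (unit-cost small parsimony).
    Trees are nested tuples; tips are strings."""
    if isinstance(node, str):  # a tip: only its observed state is allowed
        return {s: (0 if leaf_state[node] == s else INF) for s in STATES}
    cost = {s: 0 for s in STATES}
    for child in node:
        child_cost = sankoff(child, leaf_state)
        for s in STATES:
            # a change of state along the branch to `child` costs one event
            cost[s] += min(child_cost[t] + (s != t) for t in STATES)
    return cost


def min_events(tree, leaf_state, ancestral_state, root_state=None):
    """Minimum events when the lineage leading to the root starts in
    `ancestral_state`; `root_state` optionally fixes the state at the root."""
    cost = sankoff(tree, leaf_state)
    candidates = STATES if root_state is None else (root_state,)
    return min(cost[r] + (r != ancestral_state) for r in candidates)


# Hypothetical tree; resolution within Phytophthora does not affect the counts
tree = ((("P_infestans", "P_sojae"), "P_ramorum"), ("Achlya", "Thraustotheca"))
ppdk = {"P_infestans": 1, "P_sojae": 1, "P_ramorum": 1,
        "Achlya": 0, "Thraustotheca": 0}

recent = min_events(tree, ppdk, ancestral_state=0)                  # free root
ancient = min_events(tree, ppdk, ancestral_state=0, root_state=1)   # present at root
print(recent, ancient)
```

Under these assumptions, a gain within the Phytophthora lineage needs a single event. A gene already present in the oomycete common ancestor needs that gain plus a loss in the saprolegnialean lineage.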


Maximum likelihood (ML) trees for six of the 12 oomycete proteins of putative green algal origin examined in this study are depicted in Figures 2–7. These six were selected for illustration because they are the most taxon-rich and best supported; trees for the remaining six proteins are equally problematic, or worse (see below).

A tree of DEXDc homologs is presented in Fig. 2. The EGN for DEXDc implies a green origin for this gene in oomycetes, specifically uniting oomycete homologs with the sequence for Chlamydomonas reinhardtii (not shown). Note, however, that in the tree the C. reinhardtii DEXDc sequence terminates a very long branch and that, when other eukaryotic homologs are added, the oomycete/green relationship becomes less clear. In fact, this tree implies that oomycetes share a common ancestor with the Opisthokonts (fungi and animals), a result clearly at odds with current understanding of eukaryotic systematics. In summary, (at least) two phylogenetic errors are apparent in the DEXDc tree: long branch attraction, and a topological error that can likely be traced to problems with taxon sampling, i.e., clear homologs of the algal, plant, oomycete, and fungal DEXDc genes have yet to be identified in other eukaryotes.

The same issue of taxon sampling, specifically the differential distribution of homologs among eukaryotic lineages, also plagues the RPB tree (Fig. 3). Bearing in mind that protein sequences for animals, fungi, and plants far outnumber those available for other organisms, the RPB subunit II tree implies that the alveolates are sister to a clade including stramenopiles (brown algae, diatoms, and oomycetes), animals, and green algae + land plants (Fig. 3). This topological error is likely compounded by the fact that the alveolate sequences terminate long branches whereas the embryophyte sequences terminate shorter ones; heterotachy is a well-known source of phylogenetic error (Kolaczkowski & Thornton 2008; Pagel & Meade 2008; Philippe et al. 2008; Shalchian-Tabrizi et al. 2006). The ALDH tree implies that the stramenopiles are not monophyletic: green algal sequences are nested within a clade including sequences for alveolates and stramenopiles (Fig. 4).
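Long terminal branches, such as those of the alveolate RPB sequences, can be screened for before a topology is interpreted. One simple heuristic is sketched here as an assumption; it is not a method used in this study. The idea is to compute root-to-tip distances from an ML tree and flag tips whose distance exceeds some multiple of the median. The Newick string, tip labels, branch lengths, and the factor of 2 are all hypothetical. The screen flags rate outliers only; it is not a test for heterotachy itself.

```python
import statistics


def parse_newick(text):
    """Parse a Newick string into nested (name, branch_length, children) tuples.
    Internal-node labels (e.g., bootstrap values) are read but not used."""
    text = text.strip().rstrip(";")
    pos = 0

    def read_until(stops):
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in stops:
            pos += 1
        return text[start:pos]

    def parse_node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            while True:
                children.append(parse_node())
                if text[pos] == ",":
                    pos += 1
                elif text[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError(f"unexpected {text[pos]!r} at position {pos}")
        name = read_until(",():")
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1
            length = float(read_until(",()"))
        return (name, length, children)

    return parse_node()


def root_to_tip(node, depth=0.0):
    """Map each tip name to its summed branch length from the root."""
    name, length, children = node
    depth += length
    if not children:
        return {name: depth}
    distances = {}
    for child in children:
        distances.update(root_to_tip(child, depth))
    return distances


def flag_long_branches(distances, factor=2.0):
    """Tips whose root-to-tip distance exceeds `factor` times the median."""
    cutoff = factor * statistics.median(distances.values())
    return sorted(name for name, d in distances.items() if d > cutoff)


# Hypothetical tree with two fast-evolving "alveolate" tips
newick = ("((alv_A:1.2,alv_B:1.1):0.3,((stram_A:0.2,stram_B:0.25):0.05,"
          "(plant_A:0.15,animal_A:0.2):0.05):0.1);")
flagged = flag_long_branches(root_to_tip(parse_newick(newick)))
print(flagged)
```

A median-based cut-off is used so that a few very long branches do not inflate the threshold they are being compared against.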

The same types of phylogenetic error are also evident in Figures 5–7 and are not discussed again in detail. What these trees clearly demonstrate, however, is the pervasive influence that the vast number of sequences available for fungi (80+ complete genomes) may have on phylogenomic studies (cf. Stiller et al. 2009). The TOR-containing kinase tree suggests that green algae may not be monophyletic and that green algae and stramenopiles are, again, sister to animals and fungi (Opisthokonts) (Fig. 5). The unorthodox relationships among green algae, oomycetes, and fungi are also recovered in the YAK1 tree (Fig. 6). The ALS tree is equally vexing and seems to suggest that the chromalveolates (perhaps in toto) obtained their copy of this gene via horizontal gene transfer from fungi (Fig. 7).
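One common way to limit the influence of heavily sampled groups such as fungi is to cap the number of sequences per lineage before alignment and tree building. The sketch below reports lineage composition and keeps the first k sequences per lineage. It is not the procedure used in this study, and the IDs, lineage labels, counts, and cap are hypothetical. In practice, a phylogenetically informed or random choice of representatives would usually be preferable to simply taking the first k.

```python
from collections import Counter, defaultdict


def lineage_composition(records):
    """Fraction of sequences contributed by each lineage."""
    counts = Counter(lin for _, lin in records)
    total = sum(counts.values())
    return {lin: n / total for lin, n in counts.items()}


def cap_per_lineage(records, max_per_lineage):
    """Keep at most `max_per_lineage` sequences per lineage, preserving order."""
    kept, seen = [], defaultdict(int)
    for seq_id, lin in records:
        if seen[lin] < max_per_lineage:
            kept.append((seq_id, lin))
            seen[lin] += 1
    return kept


# Hypothetical data set dominated by fungal sequences
records = ([(f"fungus_{i}", "Fungi") for i in range(80)]
           + [(f"oomycete_{i}", "Oomycota") for i in range(3)]
           + [(f"green_{i}", "GAL") for i in range(2)]
           + [(f"alveolate_{i}", "Alveolata") for i in range(2)])

before = lineage_composition(records)
capped = cap_per_lineage(records, max_per_lineage=3)
after = lineage_composition(capped)
print(round(before["Fungi"], 2), round(after["Fungi"], 2), len(capped))
```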

Green Genes in Oomycetes and Other Chromalveolates?

On the basis of the data collected, the notion that chromalveolate genomes encode hundreds or thousands of genes derived from green algae is false. Critical analyses of protein-encoding sequences of putative green algal origin from oomycetes and other chromalveolates yielded trees seriously compromised by a number of obvious and well-known sources of phylogenetic error. These included, at a minimum, biased taxon sampling, long branch attraction, and heterotachy. This argument is bolstered by the curious fact that so-called ‘green genes’ can be detected in oomycetes even though these organisms are ancestrally aplastidic. These results, and those of Stiller et al. (2009), suggest that such biases are currently so prevalent that broad-scale evolutionary scenarios drawn from phylogenomic studies should be interpreted with greater skepticism.
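Part of the reason long branches attract is saturation. As substitutions accumulate, the proportion of identical residues between two sequences decays toward the level expected by chance (1/20, or 5%, for amino acids), so unrelated long branches come to share residues by convergence. The sketch below assumes an equal-rates 20-state (Jukes-Cantor/Poisson-type) model, not the WAG model used for the trees in this study. It gives the expected identity for a distance d (substitutions per site) and inverts it to a corrected distance.

```python
import math


def expected_identity(d, k=20):
    """Expected fraction of identical sites between two sequences separated by
    d substitutions per site under an equal-rates k-state model."""
    return 1.0 / k + (k - 1) / k * math.exp(-k * d / (k - 1))


def poisson_distance(p_identical, k=20):
    """Invert expected_identity: corrected distance from observed identity."""
    b = (k - 1) / k
    p_diff = 1.0 - p_identical
    if p_diff >= b:
        return math.inf  # saturated: identity at or below the chance level
    return -b * math.log(1.0 - p_diff / b)


for d in (0.1, 0.5, 1.0, 2.0, 4.0):
    print(f"d = {d:>3}: expected identity = {expected_identity(d):.3f}")
```

Beyond a few substitutions per site, identity is barely distinguishable from 5%. This is why branch lengths, and the phylogenetic signal they carry, become unreliable for the long-branch taxa discussed above.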


Table 1. List of 12 annotated proteins from the Achlya and/or Thraustotheca proteomes, or from other oomycetes, found in EGN connected-component graphs clustering with homologs from green algae.

Protein                               Annotation
TOR-phosphatidylinositol              phosphatidylinositol kinase, putative target of rapamycin (TOR) kinase
Yak1                                  PKc-like superfamily, Yak1-like protein kinase
acetolactate synthase (ALS or AHAS)   TPP_AHAS [cd02015], Thiamine pyrophosphate (TPP) family, Acetohydroxyacid synthase (AHAS) subfamily
DEXDc                                 DEXDc superfamily, pre-mRNA-splicing factor ATP-dependent RNA helicase PRP16, putative
RPB                                   RNA polymerase beta subunit, cd00653: RNA_pol_B_RPB2
RRM                                   RRM superfamily, PREDICTED: cleavage stimulation factor subunit 2-like
RRM2                                  RRM superfamily, PREDICTED: similar to RNA binding motif protein
Sm_D1                                 Sm-like superfamily, small nuclear ribonucleoprotein D1
Sm_E                                  Sm-like superfamily, small nuclear ribonucleoprotein E
thioredoxin peroxidase                thioredoxin-like superfamily, cd03015: PRX_Typ2cys
threonine protease                    threonine protease family T01A, putative, cd01911: proteasome_alpha
ALDH                                  ALDH-SF superfamily, cd07084: ALDH_KGSADH-like

Fig. 1. Three examples of putative green genes in oomycete genomes based on EGN analysis conducted at 65% protein identity. (A) All species of Phytophthora in this analysis share a copy of pyruvate, phosphate dikinase (PPDK; P. infestans gene ID 03724) with Chlamydomonas reinhardtii and Volvox carteri, two microscopic green algae. Note that PPDK is not encoded in the Achlya or Thraustotheca genomes. (B) The moss Physcomitrella patens shares both eukaryotic translation initiation factor 5B (P. infestans gene ID 20386) and (C) an aldehyde dehydrogenase (P. infestans gene ID 00034) with all oomycetes included in this analysis.

Fig. 2. DEXDc ML tree. Oomycetes are shown sister to animals, sharing a common ancestor with fungi. The phylogenetic errors demonstrated include long branch attraction and topological error due to sampling bias.

Fig. 3. RPB ML tree. Alveolates are shown as sister to a clade including stramenopiles, animals, and GAL. Long branches in the alveolate clade and short branches in the GAL and stramenopile clades are indicative of topological error due to heterotachy.

Fig. 4. ALDH ML tree. Stramenopiles and GAL are shown as not monophyletic. Long branch attraction among GAL, stramenopiles, and alveolates is likely responsible for the phylogenetic error.

Fig. 5. TOR-containing kinase ML tree. Stramenopiles are sister to GAL and are shown sharing a common ancestor with animals. Heterotachy and topological error due to sampling bias are demonstrated.

Fig. 6. YAK1 ML tree. GAL and stramenopiles are shown sharing a common ancestor with fungi. Long branch attraction between GAL and stramenopiles, heterotachy, and topological error due to sampling bias are demonstrated.

Fig. 7. ALS ML tree. The tree is split into two clades, making inferences about the relationship between them impossible. The phylogenetic error is likely due to the abundance of available fungal genome data (sampling bias).

SUPPLEMENTAL INFORMATION

Table S1. Selected hypothetical proteins (n=16) from the Thraustotheca clavata genome possessing putative N-terminal targeting peptides. Chloroplast transit peptides predicted using ChloroP (v.1.1) are shown in bold face. Signal peptide sequences predicted using SignalP (v.4.0) are underlined.

>THRCLA_02069
MVRISALLGTFALIHAQTTTAPPASASNSWTMTTVNSIQARVVSDAATWDATNKKFG
LVMKQNTVTFPDQYRAAMDTVNTASVEGALFYVQTEGINKQFDVNCMRKTNMSYIWF
LNVTIVQPTFAIAEYADNGGVVPEYGKFIAMDNGQCTPLDTKGTMSDECMTLGGLNYH
ANIGPFIGGEPRKEHLLAKYPDNIWFSYPNSCFTKTFIAKDTKCREAQKGGLCPLGVQP
DGIKCTYSFDILGYIRIDELVGITNLTNSQTGQKYKDRVEFCKDSKVEFDFSTMKSDLTF
WDNPTDEAANTNRTTKMLELYNNLIKTGTGDAAYMKSLPTAAELTAKNPPCWKNSPIC
ATAEFGCRRKLTAQICEKCTSASPDCKKPTSSDSVPPKLTKAVAPPLPTDASGKTTVP
RNPTGAGGNGNAAAAESSASSLVAFTSLIITLAALFA
>THRCLA_03737
MKSTFVLLAAISLVNASSSTKLRGAAPCPNSNSGSSDNSSDYSGSESNWDSGSGSD
WDDCGSGSTSTSDSGSNDYPSNWDSNSGSDTTEEPATYAPAPTSAPTSAPTETPAT
SKGTLKEQIIHQTNLIRAAHGLGPVKWNDELAAKMQAWANSDPQQNGGGHGGPPGN
QNLASFDVCNDNCMRMTGPAWAWYSGEEKLWDYDANKSRDGIWETTGHFSNSMDP
GVNEIACGYSTFYNPQIGHDDSLVWCNYLGGNNGVIPRPRIDQATLEKQLTSAY
>THRCLA_03876
MNLKAWILSVAIASAAAASGSSSGSGSTTDAPLTQENLSSRPGLCNTSKDCAKYTKG
SNVYSCIAVKSNIVNLTTLKQCVLGDGCSGGKAGSCPTFTSWPQKFRQVQPVCAFVA
VPNCNSAVNSQGQVVSVRSLREQAAKPGNVTCFQAKFGSNSSSSDDSATVYGIYQCV
DKKLYAEKNLGYLDNTPKQLQSCAGNVTVVNGQSVSNVLCNGHGTCVPQTDFSDIYK
CLCSTGYSDKDNCGAATGNVCSAFGQCGNGNCNPDTGKCVCPYGSTGDQCSKCDP
AQNNNASVTNMCNGNGKCGIDGTCQCSDGYLGTNCETQIKKNSTASSATGSTTSSKK
SAASGLHEASIAIFSIATIFAAALI
>THRCLA_04285
MQIKSIIATLTLAALAQADNNNCEKSCTKELSPLCASNNETYNNLCLFQIAQCQQPTLTI
SANQSCSTNVKFCTRLCPTVYQPVCGSDNTTYPTECDLKNKACNNPSLTVTKQGACD
NCPKACLEILAPVCGSDGKTYDNTCFLLKTACANPSLNLTFVSTGSCTNGNNTTTTAPP
SGTTLPPSGTTLPPTTSGNPSTTTTPPTTKPASSATTAMLSLMSAAAIAITYML
>THRCLA_04386
MKWQVALLSLVTSGIAQDHCGSTTVPTIVPTPAPTLAPTPAPTPAPTPAPTPAPTPAPT
PAPTPAPTPAPTPAPTPAPTPAPTPAPTPAPTPAPTPAPTPAPTPAPLAVATATWTNLW
SDIVQVATDNTQICIRETNGDVDCKPWSTDSSLPTVYGGHSSNFLATGGGWSISTVNN
VNYLVVISPLYNANVMVLDEAILYAATDGATCCITTSTFRCASQKLDMTFVKMTDKYITS
SSIYNAVIYGVDAQGKLYKGSTASISTGVANWQEVSTPCPFTQVSYDGTTLCGLYAST
NTIVCTSGTLSLQPNWVALQSNKWKQFSITQSYIYAVDTSNNVQRLQISQPIAVAP
>THRCLA_04952
MTLASSPTFSRPLLLPPLTSALSPSIAQQMKRQHECEGGGSVKRHCSTFPYMEMPRL
PSITQPSSHIGYLSESYYPSPTSLPMLPPASTLLQQATRKSMDLVPSNAYAPTLPEPCT
LYKSNENTKPSPSNEEVRGECLDAQCHNSVKHRGYCKLHGGARRCDVPGCPKGVQG
GNLCIGHGGGKRCRFPGCSKATQSQGLCKAHGGGVRCKYDGCNKSSQGGGFCRRH
GGGKRCSVAGCPRGAQRGTTCAQHGGKAQCMIDGCVRADRGGGYCEVHRKDKVC
RQGYCNRLARIKCEGYCTQHHREFCITSPPQ
>THRCLA_05863
MGSVLVLLFSPLHAWLTSSSSSSLPCLQTSFLLSQNASLDQIATAQPRINDAIVSFLAQ
SNVQSRIQWTNVDITTSTIDGNGPAIVQMCLYVPPNTSVNQVAMAMSATVNWSGLKTS
ISTLHRRFQLFDLTTPLQVLNQVTQYNFQRIPFPYVQWYLVVQGRFDYFWPQIKIKHAIA
VLLNISSSSVIPQDIIFPPYDAYNDIATILPFAITQVNSSTFARTANTLTGPLQDILALHGILL
LTQFPDPNGNGKLQQSVPWPEYPQLDPTSFYPFHNWTPVPNSFVVKLIYGGLLTLTN
MSSVILQVLDVLDSPQTANFTDFQTLTLTYPPNNGTATFESSRYNTLDFIVAGDRSTLE
ANQQTLGESLYQIGVSIFDVIDINSTMQTAQWYPYMQLDCPYNLSALASIIQRIALAAFF
SIPLSSIQLIEIATNSTTFEIACNDTLEQRYLKKQLKETTRWSTVMNNFTANSAFCTIGGE
SLAYPPMFPGSTYGWSQPSSSMDNTCSVNTIELTACDQCDRYLNAVCFTNPNCYQTQ
TTLLSQLLVSSNASSVFQQLSLSTSANTKTLNTLALYYSCIAAFQCLIAPNTSIITSDEVYT
IDINANGANFSTTLYYPQDDIYLVLNDQTTLEEIQINLSSSISNSIFVNVSGTSSSFNVTM
DSVVIPFQLPVIAYSTVPATIQRISASIPQLVFLSNSSNDTTVLLNGKCTTCLTQMDECK
MSPSCPSIAICWSNVVESAISQLDSVYSTLEISTQLISCYENASLEDFEMFLRVQKCLLQ
SSCPISPTLESIVKGTMIVLRSTTGFQTIELTPTPAVTLTIGTESIILSSNSISGLQATMINFL
SPLCQASIQSNTANLTIQFNDFGAPILPTINGTIYSQMPRIFLDRMPLDSSRFGFSYQSY
KQLSPSSLPNAFTTTLNSNCQMCQNLFDQCLLSSFCASIISNFQNTIAGATNAFIGWSV
ALQRLSFDIPEWDQFAQTLSCFEIHNCPINSTISMLKNGRMLLLSSTPVVLSVTFSSSPF
EAAIYVQRFRQPINVSSNSSAAYIQGQFQMNFGSLALTNVSITNTSMELSLNSYYGPTP
EFMVTSSEFSNKTIILGTSMVSVVSYSPAAYFPY
>THRCLA_06099
MKFALVSSLAVLASAQTNNSSAGSNSNVNCPLQFTSACANTQECGTLNGYPLECQV
YGSVKQCVCSKENANCQNSTNIANTIPQFGVCTGGKQCAGSGFKALQTPVRTCSEQL
VCIPQYASGNELQSICHTCSSCKQQNKPDATGRLIFNCTQICPLGQGDPIVTIPPVTTAP
TNSTKKNDSSKGSGSTAGSKPKSAATSIVAGVATVAIVAIASLF
>THRCLA_07047
MILINLLFGLRLCTDGVSLLQQQVPRKPSKRTKQSRCKHVPFVASTALKPTHETLAPL
MPLVVYQEVTENDMAHLISLVDNQDNQEDNEEITENVVADVFVPLVDNQVSQEANEN
VVVDFADNQDNQEANENVAEFEPLVNYQDNQENVAEFVSLVDNQGSQEASENVVVE
LVPSVVCRDSFEPTEEDVAAVLHGRFAANQAALLRVSSFQPADDRSLTAIQLIRYFELY
HLVRMDYNQLRHLEPSRLEKIQLVRLSILERQAIEAMLSDVAELWSRQPNDVSSAKKLQ
WFKNLQYGLMWDMLELLEHQKPDHHCARGLCPQLYQEKLDIIYSE
>THRCLA_08011
MKTIFLTTALLASTSCALQMTNKERNEILDELNKWKQSAVGKAALVHNFLPSSQRQEG
LSIDAKQDLEITRFAHTKKVVEQLNKEHKGSAVFSTNNMFALMSDEEYKKWVKGAFGR
DHKKRQLRGENIQLELTAEQREASGIDWTSNKCMPAVKNQGQCGSCWTFASVGAAE
MAHCLVTGNLLDLAEQQLVDCASDAGQGCQGGWPTKALQYITQTGMCTSRDYPYTA
SDGQCNNSCKKTKLSIGEPVDIQGESALQSALNKQPISVVVEAGNDVWRNYQSGIVQQ
CPGAQSDHAVIAVGYGSDGGDYFKIRNSWGAEWGEQGYIRLRRGVGGKGMCNVAE
GPSYPSMSGKPNPDGPTDEPSNDPTDEPSNDPTDEPSNDPTDEPSNDPTDDPSDDP
TDDPWNGSNDWDWGN
>THRCLA_10855
MIVPVVLMALTGISITGVTLRWCCSHRQKTSWKEERKEPLLATPVPLPRIKTDVFIERSI
AMDVPLMETCSGCGAWIDPSLAAIANGLCVVCSYQTIPSLEIDIDENISENDETSNDKDS
DKPILTDEDTTADIESPKDVEIQIEESNFEDEMEDISTTSQDMVIPISQDNNCNEGDEDV
EIAEEALALVQDMWDIAYQAHLGVGDDPTADIFVEMALDLDATAEAIKKEPHLLSESFH
FLSLSLASLMELVPEAWVAHVEATELKFEALQFRYHSKLTVENCLDLATHLYELVECAQ
EFGVDPAVASSLMDGLEELVEAIEETPCELVSWLAYLAATVKLLKSYQRDFEQAEMWD
TVVECERNLEPLEMHCWEIYSPC
>THRCLA_10997
MKASLCIATLAAMGSIASSRNIRHHAESVMGNPVQRRSESTRLPTHPLTGYWHDFPN
PAGDTYPLTQITKDWDVIVVAFANSLGSGKVGFDVDPKAGSETQFIKDISTLKAAGKTIV
LSLGGQNGAVTLNDATETANFVSSVYDLIKKFGFDGIDLDLENGISKDLPIINNLITAVKQ
LKQKVGDSFYLSMAPTYGGIWGAYLPIIDGLRNELTQIHVQYYNNGGFVYTDGRTLNE
GTVDCLVGGSVMLIEGFQTNYGNGWKFNGLRPDQVSFGVPSGTSAAGRGFVTPEVV
KRALTCLVQGVGCDTVKPPKTYPTYRGAMTWSINWDSHDGYVFSRPARQALDSLGG
SPPQPNPTAVNPTDAPNPLTNPPTSRPTNTPTVTPTQSPRPTSQPTSLPTSSPSSVPTI
NPTPIPTSVAPQPTQAPSSSC
>THRCLA_11248
MANTIQWLFIYCVIVASQGPPNNGERTCSVTLGGPVSQTSTAGTMSFCTAFPQERCC
LPVHDEYVKSTFYALLDSGYICASATNTAIAHLQTMFCLACDPSMSLYLTPPRNTTFFS
APQTLKVCRALAISFKQHIDAVSPYYFSDCGLTYAGDRNNLCIPKTAISPNMVFPGCSE
GQNICYSTTQGYYSPIWYCSSSPCGPDTPFGLNDIPCSGPTCTPAFQFLNDNRAAKPP
FFEPFAVEIIDESTCAPGESSCCMTDSSIVPTS
>THRCLA_11271
MKTTAFVLALASTAAASSPCTGSAVITAVTPLIAQATTCSTDSGFDLVALISGTTPTDA
QKQKFLTAESCKTLYASVQKSLAGITPACTIGDIDTSGWSTVSMDKGLDALIKSLPSLLA
SSGATNSTSNSTANSTISSTTVSPSSTTAAPAKSGVAATGVTIAAVALTTAILHLNANKQ
QEIHEHLRLTIKESDVETLGEVMSMSLIPAAEAHQFI
>THRCLA_11391
MKLSILLAAFGVVASSSIPKHTYKCNDGVCVQTPLNGAGVSLGSPLLSLRMCEMTCG
AGSLWPYPASVSLGTTATAIDTNKVSHSIKINGAEATSTLTNSIVQTFNEGVKAKTKWV
RGQSEIGAISHSIYGTISSNNEVLGQDTDESYELSIDGPRVKINAATIYGYRHALTTLNQL
IDYDELTNSVKMISKATISDKPAYSHRGIVLDTSRNFYPIESLKRMIDTMGANKLNTFHW
HMTDSSSFPIEINGEPRLTTYGAYSAEQIYTQDQIRDLVQFAKARGVRIIPELDAPAHAG
AGWQWGPKAGYGDLTLCYGADPWMNYCLEPPCGQLNPLNKQVYSVLDTVYKELTSL
FDGDVFHMGGDEVSIPCWNSSKVITDHLKDTNKPGAFFDLWGDFQTKAAAMLNKKVM
VWSSDLTTDPYLKYFEPNNTIIQLWGGSTDGDATRITSQGYDVVASYWDAYYLDCGFG
GWVSKGNGWCAPYKSWQVIYDLDITANMTAANAKHVLGSEVAMWSEIADAHVVETKV
WPRAAALAERLWTNPKTDWKSAMGRMRIQRDRIADAGIGADAVHPLWCRQNPGKCQ
LV
>THRCLA_11516
YTCVAVQTAIAGIALASQCVLGTTCGGNSAGQCPTFSSWSSSYQKIQPVCAFVNVTN
CVNFIKAGSEAKATSGSGSTSTVNCYQATFSANNISQVVSGIYKCVDSGLYVSQNLGAI
KNLTTTQMDVCAGNLTTSVGALCNGHGTCAPTAAFSSKYQCICNEGYSATDNCNVAT
SNVCNAFGSCGAGNTCDTTSKQCSCTTGTTGPQCSLCDPTASSSVVCNGNGVCSSS
GTCTCNSDYTGSLCSRTATTNSTGSNKSSSSSHLVASLATIATCLLAILM
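SignalP and ChloroP are trained predictors. The feature SignalP relies on most heavily for secretory signal peptides, a short hydrophobic core near the N-terminus, can nonetheless be illustrated with a Kyte-Doolittle hydropathy scan. The sketch below is a crude illustration only and is not a substitute for either program. The 9-residue window and 30-residue N-terminal region are arbitrary assumptions. The example uses the first 30 residues of THRCLA_02069 from Table S1 and a hypothetical, highly charged control sequence.

```python
# Kyte & Doolittle (1982) hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def max_window_hydropathy(seq, window=9, region=30):
    """Highest mean hydropathy over sliding windows within the N-terminal
    `region`; non-standard residues raise KeyError."""
    head = seq[:region].upper()
    if len(head) < window:
        raise ValueError("N-terminal region is shorter than the window")
    scores = [
        sum(KYTE_DOOLITTLE[aa] for aa in head[i:i + window]) / window
        for i in range(len(head) - window + 1)
    ]
    return max(scores)


thrcla_02069_nterm = "MVRISALLGTFALIHAQTTTAPPASASNSW"  # from Table S1
charged_control = "MDDEEKKRRSSNNQQDDEEKKRRSSNNQQE"     # hypothetical control
print(round(max_window_hydropathy(thrcla_02069_nterm), 2))
print(round(max_window_hydropathy(charged_control), 2))
```

A strongly positive peak is consistent with, but does not establish, a secretory signal peptide. Transit peptides targeting plastids lack such a core and would not be detected this way.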

Table S2. Selected hypothetical proteins (n=30) from the Achlya hypogyna genome possessing putative N-terminal targeting peptides. Chloroplast transit peptides predicted using ChloroP (v.1.1) are shown in bold face. Signal peptide sequences predicted using SignalP (v.4.0) are underlined.

>ACHHYP_00269
MVRTLSLLLLAAGVAGQTSTTPVPTPAVSNPPFTMTLVNSIQARVVAEAATWDETNQ
KFGLVLKQNTNTFEERYRAVMDTVNTASVEGALYYVQTEGIDKPLQTGCMRKTNMSYI
WFLNITMVQPTFAIAEYQDNGGVVPEYGKFVAMDGGLCTPVGTETPLECLTYGGLNFN
KNLGQWVGGEARKKNGRANYDDNYWFSFPNSCYTMRFDAKTKACRDLQKGGLCPIG
TQPDGVKCTYSFDVLGYLAIDDLVGITSMKNTLTGQNFKGFSEFCKAGKTEYNFADSS
SDLTFWNDPLEPAANANRTKVMMQKYNDLVQNGVGDQKHMKALPSVEELTKANPPC
WKNSPRCATAANGCRRKLLSQICEVCSAPADDCKKPGPNDKAAPMLNKQFQPALPTD
ATGNTKQPRAPNAAPLDAPAGGAGGNVIKGSGAAATSLILATAVGLVALAV
>ACHHYP_01095
MLARLAALIGVAAALQVPFTTYECVRGRCEPRPRSFSPPDSASSLRLCEMTCGAGNL
WPLPTSVSLGTTTRVVSVDYVSHTVTFLDNSVPISPLVGAIQRIFDNTLALKATECALAS
VGGAELAVTASIESGNEVRDYFRTFTMAADDNTMVQELELETDESYTLTIVDGAATIHA
ATVYGYRHALTTLSQLIEYDELSHDMHIISAVTITDAPHFAHRGIVLDTSRQYYSVPAIKR
LLDGMGATKLNSFHWHFTDTASFPIEIKGEPRLTAFGAYHPRSVYTQQAMRDIVAYAR
ARGVRVIPEVDAPSHVGAGWQWGKDAGLGELAVCFGHNPWTEACVEPPCGQLNPF
NPHVYDVLETVYEELNEIFDSDVFHMGGDEVHLGCWNMSAAVTAHMTDRSPDAFYRV
WGRFQMQARQLVGEKKIAVWTSDLTNAPYLRKYFDPASTIIQMWTLSTGSDAARFTA
QGYPVIASYYDAYYLDCGFGNWLLKGADWCTPYHHWSVLYDLDVLHNVPAAQRNLVL
GGEVALWSEEVDEATMDAKIWPRAAAAAERWWSNPVNGTWKDAIDRMRIQRDRLVD
IGLQADALQPLWCRQNAGDLSQGSGISISATVKSKSEALTVDTDESYELSIDGPKVSIN
AATVYGYRHALTTLNQLIDYDEISNSVKMIAKAKIADKPAYSHRGIVLDTARNYYSIDSLK
RLVDTMGANKLNTFHWHFSDSSSFPFEIKSEPRLTSYGAYSKDQVYTQDQIRDFVQFA
KARGVRIIPELDAPSHAGAGWQWGPKAGYGELTLCYGSDPWMDYCLEPPCGQLNPL
NDHVYDILKTVFEEMHGLFDSNVFHMGGDEVSVPCWNSSKVITDHLKNTTSNAPFFDL
WGTFQTKAGALIEKANKKIMVWTSDLTTDPYLKYFKPSNTIVQLWGGSTDGDAERLTS
KGYEVVASYWDAYYLDCGFGGWVSKGNGWCAPYKSWQVIYDLDVRANLTATNAKRV
LGSEVAMWSEIADEKAVEAKIWPRAAALAERLWTNPKTNWKSAMTRMRIQRDRIADA
GVGTDAVHPLWCRQNPGKCTLV
>ACHHYP_01226
MTALADAVWLAVMAFLDGQDLSRLMRVSRAHWRRLQAQVRRWREIQLGLGLGHWV
QRNVRLTINTQVQEAQSLAVQRSPDARVPPRVETIQKELGPIEAERSVHRLTATTPLFT
ATQQAVLVLSFDCTSADTKPLLVHTSQRARTLYTTLTLTIFDRTLRRHVYHKASGDLAT
VPVAEKQAWTNAGATLRCDVASNDKSCQVQLGLPARLDGKIDCYHIERVDFTLHKREL
YPVFSLPLEPSLPTCWIHLQFHDLARAQCLARVSAPCHALLEMAASRTDDTNHPARRT
AVEQLEVATFRSTQPTSLPDISSLAKPGMISMVISGPERHQAFYHTAFGHSGATRKSDS
AHVLAATWVPGVLEFAMYPDTLNRRVLKGIFTLEFAVSGALTSLVVLAQHLSPRRLLRY
NARVASYSRRPEAERNEDA
>ACHHYP_01546
MVALFLGTAIALALASSATGSFTGLAMPAANSSEPKSGQCKLMKLLPRATQFNVALS
PRHYGRGGHCGRCVQTQCDRCAASAPIIAQVTDRASDVGLSKPMLRALFGSGAPSAV
TWDFVDCPVNDPIALCTKPRNTSAYIIYVQPTNTVAGVQNMTIDGFRGRLTNASYHFKA
PMPANWSNVRVSMKSFTGDAIAASVALRPGRCVTIPHQFSPSPAAASGTPAVIDYDGD
EDADSITVPPPYK
>ACHHYP_02169
MAWIVVLGILAHVATALQSSLCATSSAFSPPGCHANRRLATWSRAIVRLNAGGHVCT
GWFVGSEGHILTAHHCIHKARAVEVVVEETPAQTCPPRTIRGRMTTGIDVVAFSVALDY
ALLRPLNRSVRGPVHLQLHSSAADIVGLEAIVAQHVDASSPVVLSEAGRIVSTTFAGCG
RRDRLAYALDTKASASGSPILSTATGAVLGLHTCGGIHCHGKSVPMWIVIGCSSEPGH
WNSGAVAADVVADLRQRHHLPPDAVAHETLSAPTPSTIIVERGRLVQRAANTTSVDAY
LLTMAMPGRVTLDLLAWTMDAQGRWHDLRRDCDGSFFDTKVILAVVDDADGRPLLRR
IAENDNDTRHQGMGDGSIDNRDAFLDVYLASPGDYYVLVGTAAMLLPAVFAPRLSAPT
DGGQHLYGCGNTRATEANYNLRITTDDGTLQRIEAPFPRTAACSSSARKCPAAHADTA
LTLDAVVAGTLHRTYSSGTSMDHISFELTKAGRIAIDVVSYQEHTNGSIAIDGLHDVCGR
AYLDTVLYVFGATIPSGEYLDPAALVATASDRPPTHVASQRYRSVSTRDPYVEVDLPA
GNFTLVVGQQPLSLFEAVRVLYPGSRETDAPLLCGRPHPFGHYHVFFWVQHRRMLSA
TMPGSFDHAACTHEVCSDSML
>ACHHYP_02305
MKFTTLLVATVFGQNTTTAPSSAPTPAPTKCLLQFTSPCKSSSECGDLNGFNLTCIKS
GSNKQCNFNGGSTVAKDNQFKAADNLVYQFGDCSTASCTTGHGFTEGLPTTVTCQE
PLVCVKEINDNPGVVLKSQCHTCGSCKAQSLKDTRFDCSKVCPLTPAPTTKAPKVPGA
TGSAASSGSGSETSAPATRAPKTGTPAPTAASSASTALVSGIAVVALAFAQLC
>ACHHYP_03044
MAGLIVGILAAVGTFSGSGESISTGTSSTPAPTTHTPTTLSPSPTTKPTTVTPTPTLAN
GLCPLRGMYLSGTSCVACPTPKKTFSVFWESQVDCSTFATSSAAAYVTHIYWSFALID
PTTGTVSSTFQGSSATLKACIAAARAKCIKNYVSIGGATMRQTFVALNSSAQLTTFALS
AAQVVQEYGFDGVDIDDESGNLLAGGDWKANALPNVLVYLQGLKTQLAALPRAATEP
KYQITWDEFPTSLSTGCDLASGDYLRCFDVRIANIVDQVNIMMYNSASSTDYDNFLNVV
TPTEWATAMPASKIVIGGCVGPIGTIGGCAFGAAPTATQLKAYASLLDPALHERLSRMD
LGFMLDLARDELLVLLESEQAHNPGVAVREGEGREDKQQQRRVQREVGAEEVDEAH
VGEERVEGGVRRDLAGVEQQ
>ACHHYP_03052
MAAVSNPLLPLQLALADLLERPIHAALDDALRQPSNEQHLHHCVRSLPPSATVDALD
ASLAFVVHARALLTICSDYLDQHIAPQHALKKITDLLSVSREIANDAEVNATADDADVDE
AATDDSDQFASPKGEPPVGPWSGSETPAAPTSRQSWWAQIWGGDEDNDSAGDDVS
APPEEETLPSLPVEVANTIASLAAFPTNLKLQLHGLEALVEYVHGPCCCESVGPLYAAP
DMLPAVLHAISSLAQSKRAQIAGLSLLANPSSPKANMPMLPANLPTQQVRRLILRAMQR
FKAHAQIQGLGCLALSNLCRGPAISESHALKARGCRLVWSSWLLALICASSGTSMRAH
PLTGGPEDMQYAVLDAGSVAVVEAASRRFQDDDRVRKHADMALREMLQKHASRRAP
QCAFQ
>ACHHYP_04549
MRARAFFVLAGCATAAASPPLPWQSSCQVCAHTGRCGGASSPIKFCGTWPTGACC
CSANVNCPTPGVHATCDCGFLADYPVDAALPPVADVLGYNFS
>ACHHYP_04706
MRASVLAIAATVAAAANNQTATTKVFSLEVGTVGVHASRNQDSVLIPCKSNVCVPTG
SATLEFCRKACNRETGEHDCTTNCACNGTTPGYMCAGICNKAKTADECGSPVFQTCS
GEDLVPDYECANYKCTNHQRTNYLGANNRCANYERAAYPLPDIHARVRFVKLLNPGE
AHLIEYYTGLYFGPGQNNANDGFIWNPSVGSIKSISGNSCLDAYVAVDHNVYVHTYPC
DDSNPNQWWLYDSSLHQLRHKTHSTMCLDADPNDANKKVQMYLCSPGNANQYFDM
RPILS
>ACHHYP_04908
MTSVVAVTACLLSWLQRSRASPPVAYSAPNSVAFPAEIVHIKVILSRRRSSVLANGVL
PPVAPPRRAAHHEGHLAPLTRDLLSDKSGNAPP
>ACHHYP_05005
MSHCHFAFFVPMLARSLASFTRASRRCFSTEGPFEHRAVSAEVIAELKALYGDRVSTA
ASVREHHGTDESYHTPSPPDVVVYADSTEEVSKILQIASASKTPVIPFGAGSSLEGHISA
LHGGISLDLTNMKSVISVEQENMSCRVQCGVTRLQLESELRATGLFFPVDPGADATLG
GMVATNASGTTTVRYGNMKSNVLGLTAVMADGKIIKTGSKARKSSAGYDLTRLFIGSE
GTLAVVTEVELRLQGVPEAQKIAVCSFPTIQDAVDTCTVIMQMGIPVARMEFMDHKAIE
ATNSYSKLNNIVSPCLVIEMNGTPEEIEHHTATVQALAEEYSVQRMSWAATEEDRKELL
KARHSAWYATMNLVPGSRALSTDVCVPISNLTQVIVDTQADLEASNLVGTIVGHVGDG
NFHVMLPFLPEDEPAVRAFSDRLVERALAADGTCTGEHGIGSGKIKYLRMEHGDSVDV
MRTIKQALDPHNILNPSKLF
>ACHHYP_05180
MYNTADSVAFLSLLTSTVRAITPLPPLQFRVQAKFATGPLPASKPSPSSFISVRFVWNI
LVRLVVYRRRATPTPVDMAQERTVLA
>ACHHYP_05326
MHCTFFLSIVTAALAGVAGHVQQRIRSGAVKARGVNLGSWLVTEHFMMPQSPIYQNV
SADLQPLGEYVVTTALGRAVADPLFKAHRSSWITENDIKEIASFGLNTVRVPVGWWIYE
DPNDSDWQAYSPGGIQYLDALINDWALKYNVAVLVGMHGAKGSQNGEGHSAPQLPG
ESHFTDDADNVYTTMQSAKFIMSRYQSSVAFLGLEMLNEPTITPGRVYNIDRTKLIIYYT
NLYSKLRAICSSCIIMLSPLLNEQYESFGNQWANVLPTGSNNWIDWHKYLIWGFENWS
MKDIINTGTQWIANDITLWQSRRSAPIFVGEWSLAAAEGILGELKNGTNLNTYANRALA
AMKEAKAGWTYWSWKVNATDWRSYGWNMQALLRAGVIDLKNA
>ACHHYP_05770
MSKLSLAFLLHPTALACPPGPEAYVCPLSPETIVCPLSPRVSPASSARAKPKRSPPA
PRSRPCKEPGCTKYAVTRGHCIAHGGGKRCSVEQCPSGAKSNGLCWKHGGSKTCS
FPKCSNRSKTYGVCWSHGGGKQCADPNCTKTALRHGFCWAHGGGKRCRTEGCQR
PAYERNDNLCDVHCAKAS
>ACHHYP_06287
MQLSHILLFATAAAAQHTLLDSGTPEDRPSSWGSPVTKQIPSAVRFRSSGLCGEAQTI
DYVDFMVNTDLADIKANATWIGVEICPSVEDVPACPPTSVAEQIPIEVRGKRTTLHWVP
ATPKVLEPESLYWFIVSSNVENALQAVSWYPGSKRYGTDNDPKSDVASATRMLVPWG
GMDWVVEPSGGVAPLDHRRVPNAKIVVKA
>ACHHYP_06505
MIKSFTITATLLASASSLQMTNKERNELIDELNQWKKSQAGKTALVQGLLPPHPKTESF
DANAKLEAELVRFATTKKVVEKLNAEHNGSAVFSTDNQFALMTDDEFKKYVQGAFGK
PHKKRQLRGENIQLELTPAQREASGKDWTTSKCMPAVKNQGSCGSCWSFAAVGASA
MAHCLVSGKLIDLSEQQLVSCASSAGQGCQGGWPNKALEYIAQTGVCTAADFPYTQS
NGQCKQSCRKNKLSIGRPVDIRGESALQSALDKQPVTVVVEAGNNVWRNYKSGIVKS
CPGAQSDHAVIAVGYGNGFFKIRNSWGANWGEQGYMRLQKGSGGNGMCNVAEAPS
YPSMSGSPKPNNDDNNMPDDNDD
>ACHHYP_06977
MKISRVAVIGLLFVAARSTRAQSTSSSTQSNTETTSTESTPFSSSSSSGPAPIVDVIAAA
IDAGATPKQAAIVAVAADTGASLAAIIQTAVDAGVSPSIASAVASAANSAAGSGADDVTS
APITTVADAAVDAGATTAQAAAIANAASSGVSGDDLVNVAISVGVPASIASSVASAAGS
AAGTPAPIADVIQAALDSGASLNQAAAIAVAVSAGVSVDDIQTAAIQQGLPASVASSIAS
AVQSTIASAAGSTSADALAANGLGVTSASSTSYVPPSEVTPLKLTGAKDPEAASDVNS
PEAYSFSAPMTSGSTKSSESPLSGISGMFNNIVALVTSAPSPAEEPKPRLRASCRTA
>ACHHYP_07400
MKTPAFLASALFAVATGERPACGPDTPSPTMTPTADPTFAPTSGPTFPPTPAPGQWT
SLGGFAHDISFDGTNVCVKNGDGAFCGFAGQPFDQWKPVATQLKDIEQVACAKGVAF
VWGRSSGDLVMKTINLKTGEEHDAKMQDGESPRQFSTDGSVVCGTTNSRLFGAKVT
NGALGAYSTISEDHEIYKTAVAGEFLIVAGYDGALQATLLDAENWDTFSFDVVPVDLRA
REISTDGVDLCIVTYELDIACSKLSSGLEKWTKVPGEWKTVAVSNNTIYGVDFKSSEIRY
TYLK
>ACHHYP_08323
MVAWAWLPAAAAVVAATETHWSHLGNASSDRGLRIHTPITRADLHDEYNDAPVTQR
RLSGSAASLFRAVAGYGFRGLSNAAIFSGVTLDMCASACVTDARCLSFDYEASTCYIA
HTDRYAYPADFVPRATSTYYEWQGAAATPTIEPNGGRLTSYGAFQLFTTSRAAAMYY
QFKSLENGTVTVYTLYSPGTTVTLPEYPCVVQAYTTKAGLSDSIVLVSNAFTVYAARYA
YLVPFYNGLGFHGLVTRVQLDVQGVKRPRPSRVLEFTDINSTLGIGPFRGQLSTINLTA
YDARLAGFFDAFTGITTTLCPQVESRVAVSTVTYVNVSLQVFQNASRWVLVPAPLYAS
APGDLVFSSSVSLVEEYLYLCPHQNAKGHAGVIAKVNLRAFNATSHLPFQPAIEMLDLT
VIDPSLTGFGSCFANRNYGYFVQRRNAAGLAGQIVRVNLDLFAQPALAVTVLNATTFD
ARFVGFSGAVVYKNVAYLVPFERNKVGLELNPNYKYFPTPTSSIMGRLDLTTFSTVTPV
DLSVLDVKYACGYFGGFTVSYYVYLVPNMWTTDTTSPGVNPYHGLVARLNTLTMNVE
SLDLTLVDPSLKGFMRGFAFGRYAILVPHRNGLTTELPVRLNKSQKNNLGTIVAIDTDNF
TPSGVRYLDLTLALRSQIPNMPDADLRGFIGGGVSGEYGFFVPYFNGVRFSGKVVRVN
LRKFGEVQVLDMTQVHTSLRGFTNAVFPQLYEPTVTSLWNYVIPDGTQTPYTFITVDV
>ACHHYP_09221
MVSVTTPSMTLLGAIALVAGQATVAPTTATPSAPSASPTKGPWAFKSVRTVQARVQA
DVPVWDAAHKEWVAVFPQNTVTFEQRYRAAMDTINTATVEGALFYVQTEGIDKAVQA
ANGCMRKSNMSYIWYYDIEVVQPVYSVAEFGQNTGYAPEYGPFIAMDNGMCTPTSGT
TVPQGCMQFTGLAGNIALGNYIGGEPRTKHQYANYANNYWFSYPNSCFTKSFTAKTD
ACRNSPMQKGGLCPYGTKPDGINCTYSFSVLGYLSIDDLVGITSTVNPQTGKAFSNHM
EFCKAGKYEWDFTTSTGLPFWADPLNVTANAARSAKMMDLYTAKVAAGVGEYANMK
PFPKVSELVAQNPSCSDNSPYCAKQPHGCQRSLLGQICVPCSSASPSCKPPTRAFPA
LPVATTPPPVTDAAGNVVPMSTNLLGQAVPATSSASTVAFSATAAILVLALA
>ACHHYP_09519
MIVSAIVFAVLASAAGQSPLKIASSVPYALTIDGSAPVSTVISNTRATSLSVHIASMNLP
PGATLTIGTVDGKDKVVYTGAHTNLVSDYFIQNKVVVSYAAASYSNNTTPLVAIDKYFA
GTPDAGGLESICSTTGDLSRPAACYATSEPVKYAKARAIARLVIGGSSLCTGWLFGSE
GHLLTNNHCINNDRLAASTQVEFGAECASCSDGSNNVQLACKGTIVASNVTLLATSSK
LDFALVKINLNAGVDLSKYGYLQARDSAPVLNEPVWLAGHPQGDPLRMAVATSNNAE
GAIVSTNVTDSCKDNQVGYLLDTQGGSSGSPVMSTVDNSVVAIHNCGGCDSETPSNG
GIPLTKILAYLRANNIALPKNSVSAAPAPTTAKTTTASPATPAPSTSAPATAAPKPPTFTL
CSVSNKVISEYYTGLYVAPAGHTANEQFSYSPDTGAIQVQSNGQCLDAYWGGSSFLV
HTWPCDRGNNNQKWTVANNQVMHRVHGVCLTSVAGSKSLGVAPCNAADVRQWIYT
NCDTANVRNFVQLRTPRGALVSEWYSSVLAKQPQSSWTELWEINGQQMRSFSGSTC
LDAYWDNSRFQVHTWQCDPTNGNQQWRVGNSVVAHATHSNLCLDVDPTDPRQAAQ
VWGCHSATINSNQLFDVVAF
>ACHHYP_10824
MASISQWLCLSCWAPMSTPKTTMATDAWCGTFWKHMLMSVSVTPPPACDCSTDGA
TALFFAAQRGHSDIVYLLMSAGATAEESTLGISPKQIAQANGHTIVAAIFDTLPPPLPHRL
HWERSSVLFLSSFLVYRCNLLLLRH
>ACHHYP_11025
MHARFFAPVLGTLSLVAGSATTLAVNSSRTPQVNAQVRRLSKRALPRDMGKSSTSA
QAPEGSSKPDMMKDFPIFLFTIE
>ACHHYP_11286
MASESTPLLALLELPLLKPTSAETIQGHVTALRASFISGAMRPLAARKAQLRAIRALVE
DGCEILQAAMWKDLHKHAAETFVTETSSVLLEVQDHLDNLDDWAAPHKVGTNLLNLP
GSSYIRSDPLGVACIMDTWNYPIMLLLMPLIGAI
>ACHHYP_11397
MDRLLLLSALATAVAVDDAAPRPSRAPLPTTLVPWGSPLAAPTAPCTWGGRAHALD
WNLTTSVPGSRQCFPNLFAADQPLEFPYPRSSYNYDLDPPVVGPRVQVQWTNGVTN
VTAPVAAFDYRTFEMTGDELLFHALPDAPGVYRLAVQAFDWDRASSECRACLAVTDQ
VRPRATVARAGLCGASTTAPYSPEALAAADDRVRALVRYRATATNNDACSDRRCDAV
TVAQTGFLSAFPTAVVDGANAAVDAVPDGWLGCLAAPLSARERQRLTTPLALVDDAR
DYFVALQELYTPFRCGAPPGRPTCAGAASETCALMQAVVLPASHLVARVAVKLKATAG
HIADPAAAFPGAGYLPPSARHLHLAIPCYPTNASFSSFCADTVEWRVSDLFELSAELNA
SQPWGFDAAAPLVTWFVQQGPAWVAVADNKRLAFDKFQDTLVFRAMTPCGQVGEDI
AWTVFSHRAEALSVDAWWNSLWSCGGCNVPKADFSVCRFRFDPTSPLVSAMLHPPA
SCRDAAGRSCRNGCLARGQCNGRSTAASCGQQAGATWCDARGSALLAAAVPRYSL
RSLQCVWQYANTSSANWSVAVDVAVDTAFALKLRNADATELSVSCTLTFDPDTGEPA
VVKTRSLALSLRNCDGPRFEDHALAFVKDRCDASWRPGVGRQPAPRQACAGHLVFP
STTDAAATVLLTPADDLACCSGPVAAFSCQPLPGHPGLKQCQRADTATALLAAEPQA
WPPVALAASLALVFVLVRRRRQPSDTDLSRPLIDGDRC
>ACHHYP_12628
MIVQILALAATASAFTKCHIRHPNRTEVLSTPCPHEYVTELPASFDWRNVNGTNFVTV
SRNQHVPHYCGSCWAFAATSALSDRVRIARERNSEGKDRVLVTRQVNLSPQVLLNC
DKEDMGCHGGEGLSAYRYIHENGIPEEGCQRYLATGHDVGNTCTAIDVCRNCEPSKG
CFPQPSYDTYHVSEYGAVDGEAKMMAEIFARGPIVCGVAVTDEFLNYSGGVIDDKSGR
TDIDHDISIVGWGVDGSGTKYWVGRNSWGTYWGEEGWFRLRRGNNNLGVETDCAF
GVPADDGWPKRHTETTSPAKAAVWSGEIKSLLQPSRAQAKSRAPVHFVGGEKVLSPR
PHEEIDVLALPKQWDWRNIAGINYVTWDKNQHIPQYCGSCWAQATTSALSDRIAILRN
ASWPEIALSPQVVVNCHGGGSCEGGNPGAVYEYAHRHGIPDQTCQAYVAKDGQCNA
LGVCETCWPTNSSFTPGKCVAVPKFKSYYVAEYGHVRGADKMKAELYKRGPIGCGM
HVTDKFEAYTGGIYSEKTWFPIPNHEISIAGWGFDEATQTEYWIGRNSWGTYWGENG
WFRIKMHSDNLGIEGDCDWGVPIPDGSQPLL
>ACHHYP_13722
MKCFAVLAFAAFAAAATSEQAATTQPATTTAAVTTAAPNTTTVVPLVSTKAPNTTTVTP
APTTKAVTTVPVTTVKANTTAPVTTVPVVTQTNVTSPDETETPEPVIEQPTDAPLPVPT
KKKSNATTVPPSASASISMLSVASVAVAVAAYVM
>ACHHYP_14385
MATSVLALCFSSLTANSTNTPEPKYQTRTVDTVVYESSAKWPKYMGKGSAIQMYTTA
ALSAQILVSFPETTTVLEKVATVGPLVSLSAVIFFGAKYLGERVITNVTSCRTVGQRGIT
DAIYLYLDEFLKIQVAGGLRPKTFECYPKGVSALRLISYLKLVSKDENGMCNVKINRTTF
WLDLGKAQVHQEQSLKILLDGKPLLVRKGKIKKAARA
>ACHHYP_15409
MGLFAPVLAFATVAVAGSSSTTLPTAPASLSTTRSVPLTDRAALIQELAKWKDSKAGK
YAAANGFLKLSRLESAGDAEAELAAFAETKATVEALNQQYPLARFSTENPFALLTNDEF
ATWVSGGRDKVQRKVPEASTTQSTTASIAPGTVDWTMSGCVASVRSQGVCGSCFAF
AAVAAAESAYCLLHDRHLTPFSDQQVLSCGPGNGCMGGWSDQSLAWMASHGVCTG
ASYPHTNDWNTTAAACIPECKALSMPYSSVASVAGEHELEAAIALQPVAVDISATSPVF
KNYESGIITGGCNVDFNHVVLGVGYGVAEVPYFKMKNSWGDWWGEGGFVRLQRGV
GGVGTCGLARHAAYPVVFPMPFNLVTFRGVVISEYYSNLFASAKQGSVNELWTYDAIT
RHITVGSNHQCLDAYPTGSSYAVHTYSCDAKNDNQKWVIDSANHAIKHAVHPTLCLDV
DPNQNNKVQVWSCSPGNQNQWVAVSEERVKLWNVNGNFLASDGNLIQFYSPSSPSY
EWAVSNLDHTWRARSNVGAPDLCLDAYEPWNGGAVHLYTCDSTNGNQKWIYDAKTQ
QLRHLTHVGFCLDMRTALGDKAHLWTCNTPANSLQKFQYKSLTFPA


LITERATURE CITED

Archibald, J. M. 2008. The origin and spread of eukaryotic photosynthesis: evolving views in light of genomics. Bot. Mar., 52:95--103.

Archibald, J. M. 2009. The puzzle of plastid evolution. Curr. Biol., 19:R81--R88.

Armbrust, E. V. 2009. The life of diatoms in the world’s oceans. Nature, 459:185--192.

Armbrust, E. V., Berges, J. A., Bowler, C., Green, B. R., Martinez, D., Putnam, N. H., Zhou, S., Allen, A. E., Apt, K. E., Bechner, M., Brzezinski, M. A., Chaal, B. K., Chiovitti, A., Davis, A. K., Demarest, M. S., Detter, J. C., Glavina, T., Goodstein, D., Hadi, M. Z., Hellsten, U., Hildebrand, M., Jenkins, B. D., Jurka, J., Kapitonov, V. V., Kröger, N., Lau, W. W. Y., Lane, T. W., Larimer, F. W., Lippmeier, J. C., Lucas, S., Medina, M., Montsant, A., Obornik, M., Parker, M. S., Palenik, B., Pazour, G. J., Richardson, P. M., Rynearson, T. A., Saito, M. A., Schwartz, D. C., Thamatrakoln, K., Valentin, K., Vardi, A., Wilkerson, F. P. & Rokhsar, D. S. 2004. The genome of the diatom Thalassiosira pseudonana: Ecology, evolution and metabolism. Science, 306:79--86.

Baginsky, S., Kleffmann, T., von Zychlinski, A. & Gruissem, W. 2005. Analysis of shotgun proteomics and RNA profiling data from Arabidopsis thaliana chloroplasts. J. Proteome Res., 4:637--640.

Barbrook, A. C., Howe, C. J. & Purton, S. 2006. Why are plastid genomes retained in non-photosynthetic organisms? Trends Plant Sci., 11:101--108.

Baurain, D., Brinkmann, H., Petersen, J., Rodríguez-Ezpeleta, N., Stechmann, A., Demoulin, V., Roger, A. J., Burger, G., Lang, B. F. & Philippe, H. 2010. Phylogenomic evidence for separate acquisition of plastids in cryptophytes, haptophytes and stramenopiles. Mol. Biol. Evol., 27:1698--1709.

Beakes, G. W. & Sekimoto, S. 2009. The evolutionary phylogeny of oomycetes - insights gained from studies of holocarpic parasites of algae and invertebrates. In: K. Lamour and S. Kamoun (eds.), Oomycete Genetics and Genomics: Diversity, Interactions, and Research Tools. John Wiley & Sons, Inc., Hoboken, NJ, USA. doi:10.1002/9780470475898.ch1.

Birch, P. R. J., Rehmany, A. P., Pritchard, L., Kamoun, S. & Beynon, J. L. 2006. Trafficking arms: oomycete effectors enter host plant cells. Trends Microbiol., 14:8--11.

Bittner, L., Halary, S., Payri, C., Cruaud, C., de Reviers, B., Lopez, P. & Bapteste, E. 2010. Some considerations for analyzing biodiversity using integrative metagenomics and gene networks. Biol. Direct, 5:47. doi:10.1186/1745-6150-5-47.

Bodyl, A. & Moszczynski, K. 2006. Did the peridinin plastid evolve through tertiary endosymbiosis? A hypothesis. Eur. J. Phycol., 41:435--448.

Bodyl, A. 2005. Do plastid-related characters support the chromalveolate hypothesis? J. Phycol., 41:712--719.

Bodyl, A., Stiller, J. W. & Mackiewicz, P. 2009. Chromalveolate plastids: direct descent or multiple endosymbiosis. Trends Ecol. Evol., 3:119--121.

Bowler, C., Allen, A. E., Badger, J. H., Grimwood, J., Jabbari, K., Kuo, A., Maheswari, U., Martens, C., Maumus, F., Otillar, R. P., Rayko, E., Salamov, A., Vandepoele, K., Beszteri, B., Gruber, A., Heijde, M., Katinka, M., Mock, T., Valentin, K., Verret, F., Berges, J. A., Brownlee, C., Cadoret, J. P., Chiovitti, A., Choi, C. J., Coesel, S., De Martino, A., Detter, J. C., Durkin, C., Falciatore, A., Fournet, J., Haruta, M., Huysman, M. J., Jenkins, B. D., Jiroutova, K., Jorgensen, R. E., Joubert, Y., Kaplan, A., Kroger, N., Kroth, P. G., La Roche, J., Lindquist, E., Lommer, M., Martin-Jezequel, V., Lopez, P. J., Lucas, S., Mangogna, M., McGinnis, K., Medlin, L. K., Montsant, A., Oudot-Le Secq, M. P., Napoli, C., Obornik, M., Parker, M. S., Petit, J. L., Porcel, B. M., Poulsen, N., Robison, M., Rychlewski, L., Rynearson, T. A., Schmutz, J., Shapiro, H., Siaut, M., Stanley, M., Sussman, M. R., Taylor, A. R., Vardi, A., von Dassow, P., Vyverman, W., Willis, A., Wyrwicz, L. S., Rokhsar, D. S., Weissenbach, J., Armbrust, E. V., Green, B. R., Van de Peer, Y. & Grigoriev, I. V. 2008. The Phaeodactylum genome reveals the evolutionary history of diatom genomes. Nature, 456:239--244.

Burki, F., Shalchian-Tabrizi, K., Minge, M., Skjaeveland, A., Nikolaev, S. I., Jakobsen, K. S. & Pawlowski, J. 2007. Phylogenomics reshuffles the eukaryotic supergroups. PLoS ONE, 2:e790.

Cavalier-Smith, T. 1999. Principles of protein and lipid targeting in secondary symbiogenesis: euglenoid, dinoflagellate, and sporozoan plastid origins and the eukaryote family tree. J. Eukaryot. Microbiol., 46:347--366.

Cavalier-Smith, T. 2003. Genomic reduction and evolution of novel genetic membranes and protein-targeting machinery in eukaryote-eukaryote chimaeras (meta-algae). Philos. Trans. R. Soc. Lond. B Biol. Sci., 358:109--134.

Chan, C. X., Reyes-Prieto, A. & Bhattacharya, D. 2011. Red and green algal origin of diatom membrane transporters: insights into environmental adaptation and cell evolution. PLoS ONE, 6:e29138. doi:10.1371/journal.pone.0029138.

Cock, J. M., Sterck, L., Rouze, P., Scornet, D., Allen, A. E., Amoutzias, G., Anthouard, V., Artiguenave, F., Aury, J. M., Badger, J. H., Beszteri, B., Billiau, K., Bonnet, E., Bothwell, J. H., Bowler, C., Boyen, C., Brownlee, C., Carrano, C. J., Charrier, B., Cho, G. Y., Coelho, S. M., Collen, J., Corre, E., Da Silva, C., Delage, L., Delaroque, N., Dittami, S. M., Doulbeau, S., Elias, M., Farnham, G., Gachon, C. M. M., Gschloessl, B., Heesch, S., Jabbari, K., Jubin, C., Kawai, H., Kimura, K., Kloareg, B., Küpper, F. C., Lang, D., Le Bail, A., Leblanc, C., Lerouge, P., Lohr, M., Lopez, P. J., Martens, C., Maumus, F., Michel, G., Miranda-Saavedra, D., Morales, J., Moreau, H., Motomura, T., Nagasato, C., Napoli, C. A., Nelson, D. R., Nyvall-Collén, P., Peters, A. F., Pommier, C., Potin, P., Poulain, J., Quesneville, H., Read, B., Rensing, S. A., Ritter, A., Rousvoal, S., Samanta, M., Samson, G., Schroeder, D. C., Ségurens, B., Strittmatter, M., Tonon, T., Tregear, J. W., Valentin, K., von Dassow, P., Yamagishi, T., Van de Peer, Y. & Wincker, P. 2010. The Ectocarpus genome and the independent evolution of multicellularity in brown algae. Nature, 465:617--621.

De Koning, A. P. & Keeling, P. J. 2004. Nucleus-encoded genes for plastid-targeted proteins in Helicosporidium: functional diversity of a cryptic plastid in a parasitic alga. Eukaryot. Cell, 3:1198--1205.

Delwiche, C. F. 1999. Tracing the thread of plastid diversity through the tapestry of life. Am. Nat., 154:S164--S177.

Dodge, J. D. 1975. A survey of chloroplast ultrastructure in the Dinophyceae. Phycologia, 14:253--263.

Dong, J., Chen, C. & Chen, Z. 2003. Expression profiles of the Arabidopsis WRKY gene superfamily during plant defense response. Plant Mol. Biol., 51:21--37.

Dorrell, R. G. & Smith, A. G. 2011. Do red and green make brown?: perspectives on plastid acquisitions within chromalveolates. Eukaryot. Cell, 10:856--868.

Drummond, A. J., Ashton, B., Buxton, S., Cheung, M., Cooper, A., Duran, C., Field, M., Heled, J., Kearse, M., Markowitz, S., Moir, R., Stones-Havas, S., Sturrock, S., Thierer, T. & Wilson, A. 2011. Geneious v5.5. www.geneious.com.

Elias, M. & Archibald, J. M. 2009. Sizing up the genomic footprint of endosymbiosis. BioEssays, 31:1273--1279.

Emanuelsson, O., Nielsen, H. & von Heijne, G. 1999. ChloroP, a neural network-based method for predicting chloroplast transit peptides and their cleavage sites. Protein Sci., 8:978--984.

Foth, B. J. & McFadden, G. I. 2003. The apicoplast: a plastid in Plasmodium falciparum and other Apicomplexan parasites. Int. Rev. Cytol., 224:57--110.

Gaulin, E., Madoui, M. A., Bottin, A., Jacquet, C., Mathé, C., Couloux, A., Wincker, P. & Dumas, B. 2008. Transcriptome of Aphanomyces euteiches: new oomycete putative pathogenicity factors and metabolic pathways. PLoS ONE, doi:10.1371/journal.pone.0001723.

Gibbs, S. 1981a. The chloroplast endoplasmic reticulum: structure, function, and evolutionary significance. Int. Rev. Cytol., 72:49--99.

Gibbs, S. 1981b. The chloroplast of some algal groups may have evolved from endosymbiotic eukaryotic algae. Ann. N.Y. Acad. Sci., 361:193--208.

Green, B. R. 2011. After the primary endosymbiosis: an update on the chromalveolate hypothesis and the origins of algae with Chl c. Photosynth. Res., 107:103--115.

Gruber, A., Vugrinec, S., Hempel, F., Gould, S. B., Maier, U. G. & Kroth, P. G. 2007. Protein targeting into complex diatom plastids: functional characterisation of a specific targeting motif. Plant Mol. Biol., 64:519--530.

Gschloessl, B., Guermeur, Y. & Cock, J. M. 2008. HECTAR: a method to predict subcellular targeting in heterokonts. BMC Bioinformatics, doi:10.1186/1471-2105-9-393.

Guillot, M. & Gibbs, S. 1980a. Evidence that the chloroplast and of cryptomonads are remnants of a eukaryotic symbiont. J. Cell Biol., 87:186.

Guillot, M. & Gibbs, S. 1980b. The nucleomorph: its ultrastructure and evolutionary significance. J. Phycol., 16:558--568.

Guindon, S., Dufayard, J. F., Lefort, V., Anisimova, M., Hordijk, W. & Gascuel, O. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol., 59:307--321.

Hackett, J. D., Yoon, H. S., Li, S., Reyes-Prieto, A., Rümmele, S. E. & Bhattacharya, D. 2007. Phylogenomic analysis supports the monophyly of cryptophytes and haptophytes and the association of Rhizaria with chromalveolates. Mol. Biol. Evol., 24:1702--1713.

Hackett, J. D., Yoon, H. S., Soares, M. B., Bonaldo, M. F., Casavant, T. L., Scheetz, T. E., Nosenko, T. & Bhattacharya, D. 2004. Migration of the plastid genome to the nucleus in a peridinin dinoflagellate. Curr. Biol., 14:213--218.

Harper, J. T., Waanders, E. & Keeling, P. J. 2005. On the monophyly of chromalveolates using a six-protein phylogeny of eukaryotes. Int. J. Syst. Evol. Microbiol., 55:487--496.

Huang, J., Mullapudi, N., Lancto, C. A., Scott, M., Abrahamsen, M. S. & Kissinger, J. C. 2004. Genomic evidence supports past endosymbiosis, intracellular and horizontal gene transfer in Cryptosporidium parvum. Genome Biol., 5:R88.

Iida, K., Takishita, K., Ohshima, K. & Inagaki, Y. 2007. Assessing the monophyly of chlorophyll-c containing plastids by multi-gene phylogenies under the unlinked model conditions. Mol. Phylogenet. Evol., 45:227--238.

Janouskovec, J., Horak, A., Obornik, M., Lukes, J. & Keeling, P. J. 2010. A common red algal origin of the apicomplexan, dinoflagellate, and heterokont plastids. Proc. Natl. Acad. Sci. USA, 107:10949--10954.

Jiang, R. H., Tyler, B. M., Whisson, S. C., Hardham, A. R. & Govers, F. 2006. Ancient origin of elicitin gene clusters in Phytophthora genomes. Mol. Biol. Evol., 23:338--351.

Kamoun, S. 2006. A catalogue of the effector secretome of plant pathogenic oomycetes. Annu. Rev. Phytopathol., 44:41--60.

Keeling, P. J. 2004. Diversity and evolutionary history of plastids and their hosts. Am. J. Bot., 91:1481--1493.

Keeling, P. J. 2009. Role of horizontal gene transfer in the evolution of photosynthetic eukaryotes and their plastids. Methods Mol. Biol., 532:501--515.

Khan, H., Parks, N., Kozera, C., Curtis, B. A., Parsons, B. J., Bowman, S. & Archibald, J. M. 2007. Plastid genome sequence of the cryptophyte alga Rhodomonas salina CCMP1319: lateral transfer of putative DNA replication machinery and a test of chromist plastid phylogeny. Mol. Biol. Evol., 24:1832--1842.

Kleffmann, T., Russenberger, D., von Zychlinski, A., Christopher, W., Sjolander, K., Gruissem, W. & Baginsky, S. 2004. The Arabidopsis thaliana chloroplast proteome reveals pathway abundance and novel protein functions. Curr. Biol., 14:354--362.

Kleffmann, T., Hirsch-Hoffmann, M., Gruissem, W. & Baginsky, S. 2006. plprot: a comprehensive proteome database for different plastid types. Plant Cell Physiol., 47:432--436.

Köhler, S., Delwiche, C. F., Denny, P. W., Tilney, L. G., Webster, P., Wilson, R. J., Palmer, J. D. & Roos, D. S. 1997. A plastid of probable green algal origin in apicomplexan parasites. Science, 275:1485--1489.

Kolaczkowski, B. & Thornton, J. W. 2008. A mixed branch length model of heterotachy improves phylogenetic accuracy. Mol. Biol. Evol., 25:1054--1066.

Kroth, P. G. 2002. Protein transport into secondary plastids and the evolution of primary and secondary plastids. Int. Rev. Cytol., 221:191--255.

Lane, C. E. & Archibald, J. M. 2008. The eukaryotic tree of life: endosymbiosis takes its TOL. Trends Ecol. Evol., 23:268--275.

Larkum, A. W. D., Lockhart, P. J. & Howe, C. J. 2007. Shopping for plastids. Trends Plant Sci., 12:189--195.

Lee, J. J., Leedale, G. F. & Bradbury, P. (eds) 2000. Illustrated Guide to the Protozoa. 2nd ed. Society of Protozoologists, Allen Press, Lawrence, Kansas.

Marchler-Bauer, A., Anderson, J. B., Derbyshire, M. K., DeWeese-Scott, C., Gonzales, N. R., Gwadz, M., Hao, L., He, S., Hurwitz, D. I., Jackson, J. D., Ke, Z., Krylov, D., Lanczycki, C. J., Liebert, C. A., Liu, C., Lu, F., Lu, S., Marchler, G. H., Mullokandov, M., Song, J. S., Thanki, N., Yamashita, R. A., Yin, J. J., Zhang, D. & Bryant, S. H. 2007. CDD: a conserved domain database for interactive domain family analysis. Nucleic Acids Res., 35:D237--D240.

Moustafa, A., Beszteri, B., Maier, U. G., Bowler, C., Valentin, K. & Bhattacharya, D. 2009. Genomic footprints of a cryptic plastid endosymbiosis in diatoms. Science, 324:1724--1726.

Okamoto, N., Chantangsi, C., Horák, A., Leander, B. S. & Keeling, P. J. 2009. Molecular phylogeny and description of the novel katablepharid Roombia truncata gen. et sp. nov., and establishment of the Hacrobia taxon nov. PLoS ONE, 4:e7080. doi:10.1371/journal.pone.0007080.

Pagel, M. & Meade, A. 2008. Modelling heterotachy in phylogenetic inference by reversible-jump Markov chain Monte Carlo. Phil. Trans. R. Soc. B., 363:3955--3964.

Parfrey, L. W., Grant, J., Tekle, Y. I., Lasek-Nesselquist, E., Morrison, H. G., Sogin, M. L., Patterson, D. J. & Katz, L. A. 2010. Broadly sampled multigene analyses yield a well-resolved eukaryotic tree of life. Syst. Biol., 59:518--533.

Patron, N. J., Inagaki, Y. & Keeling, P. J. 2007. Multiple gene phylogenies support the monophyly of cryptomonad and haptophyte host lineages. Curr. Biol., 17:887--891.

Petersen, T. N., Brunak, S., von Heijne, G. & Nielsen, H. 2011. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nature Methods, 8:785--786.

Philippe, H., Zhou, Y., Brinkmann, H., Rodrigue, N. & Delsuc, F. 2005. Heterotachy and long-branch attraction in phylogenetics. BMC Evol. Biol., 5:50. doi:10.1186/1471-2148-5-50.

Ralph, S. A., van Dooren, G. G., Waller, R. F., Crawford, M. J., Fraunholz, M. J., Foth, B. J., Tonkin, C. J., Roos, D. S. & McFadden, G. I. 2004. Metabolic maps and functions of the Plasmodium falciparum apicoplast. Nature Rev. Microbiol., 2:203--216.

Reyes-Prieto, A., Moustafa, A. & Bhattacharya, D. 2008. Multiple genes of apparent algal origin suggest ciliates may once have been photosynthetic. Curr. Biol., 18:956--962.

Rice, D. W. & Palmer, J. D. 2006. An exceptional horizontal gene transfer in plastids: gene replacement by a distant bacterial paralog and evidence that haptophyte and cryptophyte plastids are sisters. BMC Biol., 4:31.

Rogers, M. B., Patron, N. J. & Keeling, P. J. 2007. Horizontal transfer of a eukaryotic plastid-targeted protein gene to cyanobacteria. BMC Biol., 5:26.

Sanchez-Puerta, M. V. & Delwiche, C. F. 2008. A hypothesis for plastid evolution in chromalveolates. J. Phycol., 44:1097--1107.

Sanchez-Puerta, M. V., Lippmeier, J. C., Apt, K. E. & Delwiche, C. F. 2007. Plastid genes in a non-photosynthetic dinoflagellate. Protist, 158:105--117.

Sekimoto, S., Klochkova, T. A., West, J. A., Beakes, G. W. & Honda, D. 2009. Olpidiopsis bostrychiae sp. nov.: an endoparasitic oomycete that infects Bostrychia and other red algae (Rhodophyta). Phycologia, 48:460--472.

Shindo, T., Misas-Villamil, J. C., Hörger, A. C., Song, J. & van der Hoorn, R. A. L. 2012. A role in immunity for Arabidopsis cysteine protease RD21, the ortholog of the tomato immune protease C14. PLoS ONE, 7:e29317. doi:10.1371/journal.pone.0029317.

Slamovits, C. H. & Keeling, P. J. 2008. Plastid-derived genes in the nonphotosynthetic alveolate Oxyrrhis marinus. Mol. Biol. Evol., 25:1297--1306.

Soll, J. & Schleiff, E. 2004. Protein import into chloroplasts. Nature Rev. Mol. Cell Biol., 5:198--208.

Stiller, J. W., Huang, J., Ding, Q., Tian, J. & Goodwillie, C. 2009. Are algal genes in nonphotosynthetic protists evidence of historical plastid endosymbioses? BMC Genomics, doi:10.1186/1471-2164-10-484.

Tatusov, R. L., Natale, D. A., Fedorova, N. D., Jackson, J., Jacobs, A., Krylov, D. M., Mekhedov, S. L., Nikolskaya, A. N., Rao, B. S., Wolf, Y. I., Aravind, L., Lanczycki, C., Mazumder, R., Sreekumar, K., Vasudevan, S., Walker, D. R., Tatusova, T. A., Yao, K., Yin, J. & Koonin, E. V. 2003. The COG database: an updated version includes eukaryotes. BMC Bioinformatics, 4:41.

Tyler, B. M., Tripathy, S., Zhang, X., Dehal, P., Jiang, R. H. Y., Aerts, A., Arredondo, F. P., Baxter, L., Bensasson, D., Beynon, J. L., Chapman, J., Damasceno, C. M. B., Dorrance, A. E., Dou, D., Dickerman, A. W., Dubchak, I. L., Garbelotto, M., Gijzen, M., Gordon, S. G., Govers, F., Grunwald, N. J., Huang, W., Ivors, K. L., Jones, R. W., Kamoun, S., Krampis, K., Lamour, K. H., Lee, M. K., McDonald, W. H., Medina, M., Meijer, H. J. G., Nordberg, E. K., Maclean, D. J., Ospina-Giraldo, M. D., Morris, P. F., Phuntumart, V., Putnam, N. H., Rash, S., Rose, J. K. C., Sakihama, Y., Salamov, A. A., Savidor, A., Scheuring, C. F., Smith, B. M., Sobral, B. W. S., Terry, A., Torto-Alalibo, T. A., Win, J., Xu, Z., Zhang, H., Grigoriev, I. V., Rokhsar, D. S. & Boore, J. L. 2006. Phytophthora genome sequences uncover evolutionary origins and mechanisms of pathogenesis. Science, 313:1261--1266.

Whelan, S. & Goldman, N. 2001. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol., 18:691--699.

Wilson, R. J. M. 2004. Plastid functions in the . Protist, 155:11--12.

Woehle, C., Dagan, T., Martin, W. F. & Gould, S. B. 2011. Red and problematic green phylogenetic signals among thousands of nuclear genes from the photosynthetic and apicomplexa-related Chromera velia. Genome Biol. Evol., 3:1220--1230.

Yoon, H. S., Hackett, J. D., Ciniglia, C., Pinto, G. & Bhattacharya, D. 2004. A molecular timeline for the origin of photosynthetic eukaryotes. Mol. Biol. Evol., 21:809--818.

Yoon, H. S., Hackett, J. D., Pinto, G. & Bhattacharya, D. 2002. The single, ancient origin of chromist plastids. Proc. Natl. Acad. Sci. USA, 99:15507--15512.