Supplemental Data Heidel Et Al
Table of Contents

1. Sequencing strategy and statistics
2. Genome structure
   2.1 Extrachromosomal elements
   2.2 Chromosome structure
   2.3 Repetitive elements
3. Coding sequences
   3.1 Homopolymer tracts
   3.2 Gene families and orthology relationships
   3.3 Synteny analysis
4. Protein functional domains
5. Protein families
   5.1 Primary metabolism
   5.2 Secondary metabolism
   5.3 Cell shape, organization and motility
      5.3.1 Kinesins
      5.3.2 Myosins
      5.3.3 The microfilament system
   5.4 Gene transcription
   5.5 Cell adhesion
   5.6 Cell signaling
      5.6.1 Seven transmembrane domain receptors including G-protein coupled receptors
      5.6.2 Cyclic nucleotide synthesis, detection and degradation
      5.6.3 Sensor histidine kinases
      5.6.4 Monomeric G-proteins
      5.6.5 ABC transporters
6. Supplemental Methods
   6.1 DNA isolation
   6.2 Sequencing and Assembly
   6.3 Chromosome structure analysis
   6.4 Gene prediction, Blast and synteny analysis
   6.5 Gene family detection using domain analysis
   6.6 Protein variation
   6.7 Phylogeny and species split dating
7. References

1. Sequencing strategy and statistics

The sequencing projects for both the P. pallidum (PP) and D. fasciculatum (DF) genomes were initiated by paired-end Sanger sequencing of 1-2 kb sheared genomic DNA inserts cloned in pUC18. This was soon followed by four runs of more cost-effective pyrosequencing on the 454 Roche platform, yielding a total coverage of close to 15 x for each genome (Table S1.1). A dense physical map of both genomes was prepared by paired-end sequencing of ~30 kb genomic DNA fragments inserted into the fosmid vector pCC2FOS. The gaps within supercontigs were filled in by primer walking until the sequence on both sides of a gap became too repetitive for primer design. At this point only 52 and 33 gaps remained in the PP and DF genomes, respectively, which compares very favorably with the current state of 226 gaps in the D. discoideum (DD) genome, 5 years after completion and continued polishing (Table 1, main text).

Table S1.1: Sequencing statistics

                                          P. pallidum    D. fasciculatum
reads from fosmid clones                  7937           5033
reads from small insert library clones    112268         80433
454 runs/MB raw data                      4/416          4/418
contigs from initial Newbler assembly     6352           7292
gap closing reads                         1068           1165
genome coverage                           14.4 x         14.9 x
final assembly: contigs/supercontigs      52/41          33/25

2. Genome structure

2.1 Extrachromosomal elements

Metazoans and plants integrate tandem arrays of rRNA genes into existing chromosomes (Schwarzacher and Wachtler 1993) to provide a large number of copies for simultaneous transcription. In contrast, some protists, such as Tetrahymena (Blomberg et al. 1997), amplify their rRNA genes on extrachromosomal palindromes. All analysed Dictyostelia contain an amplified extrachromosomal sequence that codes for rRNA genes.

Sequences of extrachromosomal elements are highly overrepresented in the sequencing reads because of their high abundance. Their repetitiveness nevertheless prevented assembly from 454 read data alone, so small insert library reads were used to assemble the full-length palindrome arms. rRNA genes were defined based on homology to their DD counterparts and to the common eukaryote set of rRNA genes.

Based on the relative abundance of sequencing reads that match the palindromes, compared to reads from unique parts of the genome, the PP, DF and DD palindromes make up around 5, 9 and 25% of the genomic DNA content, respectively. This difference is mainly attributable to the shorter palindrome arms in PP (15 kb) and DF (26 kb) compared to 45 kb for DD. Thus, the number of palindrome molecules is in the same range in all species. The palindrome organization is the same as in DD (Sucgang et al. 2003): the rRNA genes reside at the ends of the arms with transcription directed towards the telomeric regions. The observed plasticity in palindrome size is thus due to different amounts of repeated sequences in their central regions.
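The reasoning above, that a large difference in DNA share is compatible with similar molecule numbers, can be checked with simple arithmetic. The sketch below divides each palindrome's approximate share of genomic DNA by its arm length, using only the figures quoted in the text. It assumes, as a simplification not stated in this section, that the three genomes are of similar size, so that share per kb of arm is roughly proportional to the number of palindrome molecules. The function and variable names are illustrative only.

```python
# Figures quoted in the text: approximate share of genomic DNA (%)
# and palindrome arm length (kb) for each species.
PALINDROMES = {
    "PP": {"percent_of_dna": 5.0, "arm_kb": 15.0},
    "DF": {"percent_of_dna": 9.0, "arm_kb": 26.0},
    "DD": {"percent_of_dna": 25.0, "arm_kb": 45.0},
}


def relative_copy_index(percent_of_dna, arm_kb):
    """Share of DNA per kb of palindrome arm.

    A crude proxy for the number of palindrome molecules.
    ASSUMPTION (not from the source): the genomes are of similar size.
    """
    return percent_of_dna / arm_kb


def summarize(data):
    """Return the per-species index and the max/min fold difference."""
    index = {
        species: relative_copy_index(v["percent_of_dna"], v["arm_kb"])
        for species, v in data.items()
    }
    fold_range = max(index.values()) / min(index.values())
    return index, fold_range


index, fold_range = summarize(PALINDROMES)
for species, value in index.items():
    print(f"{species}: {value:.3f} % of DNA per kb of arm")
print(f"fold difference between species: {fold_range:.2f}")
```

Under these assumptions, the fivefold spread in DNA share (5% to 25%) shrinks to less than twofold once arm length is taken into account, which is consistent with the molecule numbers being in the same range.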
To highlight conserved regions in the extrachromosomal palindromic elements of DD, PP and DF, the full (DD) or half palindromes (PP and DF) were aligned and represented pairwise in a dot-matrix using Tuple_plot (Szafranski et al. 2006), an algorithm that reduces noise caused by repetitive sequence. Only the regions that contain the ribosomal RNA genes are conserved between the three palindromes, and these regions are typically located at the ends of the palindrome arms (Fig. S2.1).

A Comparison between the extrachromosomal elements of DD (X-axis) and DF (Y-axis).
B Comparison between DD (X-axis) and PP (Y-axis).
C Comparison between DF (X-axis) and PP (Y-axis).

Figure S2.1: Pairwise alignments between DD, PP and DF extrachromosomal elements represented as Tuple_plots. The full palindrome is shown only for DD. Yellow shadowing denotes the positions of the rRNA genes. Black dots represent tuples in the same direction, red dots inverse tuples. Representations are drawn to scale so that the figures are comparable.

2.2 Chromosome structure

In DD, no telomere structures common to eukaryotes are present. However, when we searched the assembled genomes of PP and DF for the common TTTAGA motif and variations thereof, we found repeated structures of such motifs in both species (14 in PP and 12 in DF). The motifs were all located at contig ends, indicating a role as telomeres. In the case of DF, half of these structures were directly associated with DIRS transposable elements, reflecting the location of DIRS elements at telomere ends in DD. The repeated telomere motifs in PP were found in the neighborhood of complex structures (see main text), but no DIRS elements could be found in the entire PP genome. Figure S2.2 shows the chromosomes of PP separated by Pulsed Field Gel Electrophoresis (PFGE), confirming the prediction of 7 chromosomes from 14 telomeres.

Figure S2.2: Pulsed-field gel separation of Pp PN500