
Abbreviations

A Adenine
A Alanine
aa amino acid(s)
ABC ATP-binding cassette
Ala Alanine
ALR Alignment Length Region
Arg Arginine
Asn Asparagine
Asp Aspartic acid
ATP Adenosine-triphosphate
AUE AU-rich Element (in tRNA)
B nucleotide G or T or C
BLAST Basic Local Alignment Search Tool
BLASTN BLAST for DNA sequences searching DNA sequences
BLASTP BLAST for protein sequences searching protein sequences
BLASTX BLAST for DNA sequences searching protein sequences
bp base pair(s)
BSE Bovine Spongiform Encephalopathy
C Cytidine
C Cysteine
C a programming language
CBS Center for Biological Sequence Analysis
CLUSTAL a multiple alignment program
CLUSTALW CLUSTAL with a command line interface
CLUSTALX CLUSTAL with a graphical user interface


CML Chemical Markup Language
COG Cluster of Orthologous Genes
CRISPR Clustered Regularly Interspaced Short Palindromic Repeats
C-terminal Carboxy-terminal
CSV Comma Separated Value file
Cys Cysteine
D Aspartic acid
DBMS Database Management System
DDBJ DNA Data Bank of Japan
DNA Deoxyribonucleic acid
dsDNA double-strand DNA
E Glutamic acid
EBI European Bioinformatics Institute
EC Enzyme Commission
eggNOG Evolutionary Genealogy of Genes: Non-supervised Orthologous Groups
EMBL European Molecular Biology Laboratory
E-value Expectation value
ExPASy Expert Protein Analysis System
F Phenylalanine
FASTA pairwise alignment program / file format
FIS binding site for Fis protein
G Guanine
G Glycine
GCDML Genomic Contextual Data Markup Language
GEI Genomic Island
GIS Geographic Information System
Gln Glutamine
Glu Glutamic acid
Gly Glycine
GOLD Genomes OnLine Database
H Histidine
H nucleotide A or C or T
H-bond Hydrogen bond
His Histidine

HTML HyperText Markup Language
HTTP HyperText Transfer Protocol
I Isoleucine
IHF Integration Host Factor
Ile Isoleucine
INSDC International Nucleotide Sequence Database Collaboration
IQR Inter-Quartile Range
IS Insertion Sequence
ISBN International Standard Book Number
ITS Internally Transcribed Spacer
K Lysine
K nucleotide G or T
Ka non-synonymous mutation rate
kDa kilodalton
Ks synonymous mutation rate
L Leucine
Leu Leucine
Lip Lipoprotein secretion signal
LPS lipopolysaccharide
LSU large ribosomal subunit
Lys Lysine
M Methionine
M nucleotide A or C
MCM Markov Chain Model
Met Methionine
MIGS Minimal Information about a Genome Sequence
MIMS Minimal Information about a Metagenomic Sequence
mRNA messenger RNA
N any nucleotide
N Asparagine
NCBI National Center for Biotechnology Information
ncRNA non-coding RNA
NEWT a taxonomy database
NIH National Institutes of Health
NLM National Library of Medicine

nt nucleotide(s)
N-terminal Amino-terminal
OH Hydroxyl
ORF Open reading frame
P Proline
P promoter
PAI Pathogenicity Island
PCR Polymerase Chain Reaction
PDB Protein Data Bank
PEDANT Protein Extraction, Description and Analysis Tool
Phe Phenylalanine
PHP a scripting language
PID Project Identifier
PIR Protein Information Resource
PMID PubMed Identifier
Pro Proline
PSD Protein Sequence Database
PubMed a publication database
pur purine
pyr pyrimidine
Q Glutamine
R Arginine
R nucleotide G or A
R a programming language
RCSB Research Collaboratory for Structural Bioinformatics
ROF Relative Oligonucleotide Frequency
RNA Ribonucleic acid
rRNA ribosomal RNA
S Serine
S nucleotide G or C
S Svedberg unit
Sec secretion
Ser Serine
SIB Swiss Institute of Bioinformatics
SNP Single Nucleotide Polymorphism

SOA Service Oriented Architecture
SOAP Simple Object Access Protocol (deprecated name)
SQL Structured Query Language
sRNA small RNA
ssDNA single-strand DNA
SSU small ribosomal subunit
STRING Search Tool for the Retrieval of Interacting Genes/Proteins
T Threonine
T Thymine
T1SS type I secretion system
T3SS type III secretion system
T4SS type IV secretion system
Tat twin Arginine (secretion signal)
Thr Threonine
tmRNA transfer-messenger RNA
trEMBL Translated EMBL database
tRNA transfer RNA
Trp Tryptophan
Tyr Tyrosine
U Uracil
URI Uniform Resource Identifier
URL Uniform Resource Locator
V Valine
V nucleotide G or C or A
Val Valine
VCR Vibrio cholerae repeat
W Tryptophan
W nucleotide A or T
WGS Whole Genome Shotgun database
WSDL Web Services Description Language
X any amino acid
XML Extensible Markup Language
Y Tyrosine
Y nucleotide T or C
ZOM Zero Order Markov method

Index

16S rRNA, 62, 102, 154, 166 bias, 123 , 230 deviation, 124 structure, 156 distribution of, 125 of intergenic regions, 124 A of metagenomic DNA, 234 ABC transport system, 183 AT skew, 40, 42, 119, 130 Accession number, 38, 59, 60, 61, 62, 64, 74, Atlas 81, 82, 97 A-DNA Atlas, 129 Acidobacteria, 117 Base Atlas, 40, 119 Acinetobacter baumannii, 29 BLAST Atlas, 206 Actinobacteria, 117, 142 Chromatin Atlas, 171 Adaptation, 246 Expression Atlas, 168 Adenine, 8 Genome Atlas, 42, 147 A-DNA, 127, 129 Metagenome Atlas, 236 A-DNA Atlas, 129, 130 Repeat Atlas, 143 Aeropyrum pernix, 32 Structure Atlas, 128, 131 Agrobacterium rhizogenes, 131, 132 Z-DNA Atlas, 130 Agrobacterium tumefaciens, 128, 129 ATP-binding cassette, 183 , 26, 27, 66, 83 Alignment B multiple alignment, 26 Bacillus anthracis, 221, 225 pairwise alignment, 99, 108 Bacillus cereus, 155, 197, 222 Alphabet Bacillus subtilis, 44, 161, 222 DNA alphabet, 28 Bacillus thuringiensis, 222 protein alphabet, 20 Bacillus weihenstephanensis, 194, 195, 196 Alpha helix, 15, 182, 193 Bacterial concept, 220 Anaeromyxobacter, 157 Bacteriophage φX174, 37 Annotation Bacteroides, 117, 142, 185 over-annotation, 32 Bacteroides fragilis, 113 strand direction, 102, 201 Base Atlas, 40, 41, 42, 78, 79, 80, 119, 120 under-annotation, 32 Absolute Base Atlas, 40, 41 Annotation quality, 100, 101 Relative Base Atlas, 42 Anticodon, 10, 161 Base composition, 118 Antigen prediction, 185 Base composition bias, 138 Aquificae, 117 Base pairs, 8, 9, 13 Archaea, 3, 11, 49, 113, 157 Base skew, 120 Artemis, 99, 108, 248 Bdellovibrio bacteriovorus, 161, 162 AT content, 41, 50, 77, 78, 111, 114, 115, 116, B-DNA, 127 117, 119, 122, 123, 124, 180 Bendability, 131


Beta sheet, 193, 195 Clostridium tetani, 119, 120, 121 Bioclipse, 90 CLUSTAL Bioconductor, 89 alignment, 29, 157 Biofilm, 229, 253 CLUSTALW, 27 BioMake, 90 Cluster of Orthologous Genes, 191 BioMed Central, 67 Coding density, 100, 101, 108 BioPerl, 89 Coding strand, 10, 13 , 89 Codon, 10, 11, 77, 154 BLAST, 23, 32, 56, 156, 189, 190, 201 Codon-anticodon recognition, 11 bit score, 24 Codon usage, 77, 78, 138, 162, 163, E-value, 24, 25, 203 180, 234 score value, 204 COG, 191 BLAST Atlas, 203, 206, 207, 208, 226, Comparative , 47, 214 227, 236 Compiler, 85 BLAST Matrix, 203, 204, 205, 206 Complementary strand, 13, 21, 140, 141, BLASTN, 23 142, 201 BLASTP, 23 Computing time, 205, 216 BLASTX, 23, 25 Contamination Borrelia burgdorferi, 50 database, 31 Box-and-whiskers plot, 96, 104, 108, 117, 118, metagenomic application, 241 123, 159 sequence, 102 Browser, 75 Contig, 60, 198, 199, 200 Buchnera, 117 Copenhagen Models, 195 Buchnera aphidicola, 161, 162, 255 Core genome, 214, 215, 217, 220, 221, Burkholderia, 118, 223, 224, 250 222, 223 Burkholderia cenocepacia, 124 Creationism, 244, 252 Burkholderia mallei, 224 Cruciform, 146 Burkholderia pseudomallei, 224, 250 CSV file, 74 Burkholderia xenovorans, 97, 223 Curved DNA, 123, 126, 127, 128, 131, 175 C Cyanidioschyzon merolae, 114 C and C++, 85 Cyanobacteria, 240 C#, 85 Cytosine, 8 Campylobacter jejuni, 24, 102, 103 Carsonella ruddii, 47, 98 D Caulobacter crescentus, 164 Data Cellular localization, 180 extraction, 73 Chaperone, 15, 181, 182, 193 metadata, 70 Chimera, 26 primary, 70 Chimeric sequence, 24, 31 visualization, 77 Chlamydiae, 96, 117, 142, 143 Database, 54, 55 Chlamydia muridarum, 130 non-redundant, 25, 60 Chloroflexi, 117 Database contamination, 31 Chromatin Atlas, 170, 171 Database Management System, 74 Chromatin silencing, 171, 176 DBMS, 74 Chromosome, 5 DDBJ, 57 Chromosome numbering, 119 Degradation tag, 165 Cis-acting element, 175, 176 Deinococcus radiodurans, 139 Cloacamonas acidaminovorans, 239 Deinococcus-Thermus, 117 Cloning insert, 44 Deletion, 20, 226, 227 Clostridia, 235 Deoxyribose, 8 Clostridium, 117 Destabilizing energy, 179 Clostridium botulinum, 235 Desulfotalea psychrophila, 121, 132

Dideoxy Expression nucleotide, 38 , 168 , 38 protein, 14, 168 Dinucleotide, 137 Expression Atlas, 168 Direct repeat, 139, 145 Extremophile, 133 Distribution plot, 101, 108, 115, 116, 134 Diversity, 244 F DNA FASTA, 26 A-DNA, 127 file format, 61, 62, 73 alphabet of, 28 File format, 61, 74 B-DNA, 127 FASTA, 61, 62, 73 bendability, 131 GenBank, 61, 63, 73 curved DNA, 127 , 73 double helix, 9 Firmicutes, 117, 122, 138, 142 double-strand, 13 FIS, 171 melted DNA, 127 binding site, 170, 171, 176 , 8 Fortran, 85 repeat, 139 Fragile program, 82 strand direction, 13 Frankia, 118 structure, 125, 147 Fusobacteria, 117 supercoiled, 127 G transfer, 244 Gap cost, 20 Z-DNA, 127 GC skew, 40, 41, 118, 119, 121, 123, 144, 226 DNA fingerprint, 112 GEIs, see Genome island DNA polymerase, 14, 38, 62 GenBank, 23, 53, 55, 57, 58 Downstream, 12 file format, 61, 63, 73 Drosophila, 5, 112, 241 Gene dsDNA, 13 acquisition, 244 clustering, 250 E content, 100, 215, 225 EC number, 62 duplication, 142, 191 EMBL, 57 expression, 125, 154, 167, 168, 171 E. coli, 104, 115, 176, 177, 178, 203, 204 family, 103, 215, 217 E. coli K-12, 168, 171, 206 finding, 32, 198, 200 E. coli O157:H7, 44, 206 location, 62, 169, 180 Effector, 182, 183 orientation, 120, 121 Enterobacter sakazakii, 205 synteny, 254 Environmental change, 252 Genetic code, 10, 11, 154 Epitope, 185 redundance, 11, 154 prediction, 185 Genome Eubacteria, 11, 157, 164 alignment, 99, 108 Eucarya, 3 annotation, 31, 32, 101, 111, 189, 197 Eukaryotes, 3, 4, 6, 11, 62, 112, 113 length, 112 Eukaryotic genome, 114 Project ID, 60 E-value, 24 reduction, 100, 253 Everted repeat, 140, 141, 143 size, 95, 108, 255 Evolution, 243 Genome Atlas, 42, 43, 44, 45, 47, 48, 125, 147, directed, 252 148, 158, 179, 225, 250 prediction of, 253 Genome island, 160, 183, 225, 249 time scale, 246 Genomes Online Database (GOLD), 59 Execution pipeline, 76, 89 Geobacillus kaustophilus, 240 ExPASy, 66 Global direct repeat, 140 Expectation value, see E-value Global inverted repeat, 140

Global repeat, 141, 145 M Guanine, 8 Make, 89 Markov Chain Method, 137, 138 H Markup language, 86, 87 Haemophilus influenzae, 47, 48, 49, 58 Markup tag, 81, 86 Hairpin, 147, 160 attribute, 86 Helicobacter pylori, 31, 49, 58, 214 Melted DNA, 127 Heliobacterium modesticaldum, 235 Membrane topology, 196 Helix breaker, 193 Messenger RNA, 7, 153, 164 Histone-like protein, 26, 128, 170, 171 Metadata, 70, 71 binding site, 171, 175 derived, 70 H-NS, 26 primitive, 70 Homolog, 190, 202 Metagenome Atlas, 235, 237, 238, 240 HTML, 81, 86 Metagenomics, 229 Hydrophobicity, 195 of marine , 240 Hypothetical protein, 191 of soil bacteria, 235 visualization of, 235 I Methanobrevibacter, 117 Identity score, 22 Methanocaldococcus jannaschii, 49 IHF, 15, 27, 30, 170 Methanococcus, 117 binding site, 27, 170, 171, 175, 176 Microbial community, 213 Imperfect repeat, 141 Minimal gene set, 214 In silico, 33, 73 Mirror repeat, 140, 141, 143 Indel, 20, 95, 99, 100 Mitochondrial DNA, 11, 154 Initiation of transcription, 173 Mobile element, 141, 147, 148, 160 Insertion, 19 mRNA, 7, 11, 161 sequence, 31, 147, 248 concentration, 168 Integration Host Factor, see IHF half-life, 175 Internally transcribed spacer (ITS), 157 stability, 175 Interpreter, 84 Multiple alignment, 29 Intrinsic DNA curvature, 42, 126, 148 Mycobacteria, 143 Inverted repeat, 43, 140, 145 Mycobacterium leprae, 100 Mycobacterium tuberculosis, 155 J Mycoplasma, 11, 117 Java, 85 Mycoplasma genitalium, 47 JavaScript, 84 Mycoplasma hyopneumoniae, 99

K N Key Nanoarchaeota, 117 identification, 60 Nanoarchaeum equitans, 47 primary, 58 National Center for Biotechnology Information Kingdom, 3 (NCBI), 55, 56 Klebsiella pneumoniae, 205 ncRNA, 12, 175 Non-coding RNA, see ncRNA L Non-synonymous mutation, 202 Lactobacillus delbrueckii, 162 Nucleotides, 7, 8 Lagging strand, 14, 40, 111, 119, 120, 121, 122 Leading strand, 14, 40, 111, 118, 119, 120, O 122, 123, 157 Oligomer Leptospira interrogans, 101 bias, 121 Lipoprotein, 182 strand difference, 122 Local repeat, 142, 146 Ontology, 71, 72 direct repeat, 140 Open pan-genome, 218

Open reading frame (ORF), 10, 32, 102, Programming, 69 200, 201 programming language, 83 Operon, 172 compiled, 83, 85 Origin of replication (Ori), 14, 111, 119, 121, interpreted, 83, 84 122, 123, 128, 157, 160, 169, 171, object oriented, 85 200 Project Identifier (PID), 60, 61, 97, 108 Ortholog, 142, 191 pipeline, 76, 89, 200, 201, 215 Over-annotation, 32 Prokaryotes, 3 Promoter, 11, 171, 177 P consensus sequence, 176 Pairwise alignment, 23 location, 12, 171 Palindrome, 43, 139, 140, 145 structural property, 177 Palindromic repeat, 147 structure, 176 Pan-genome, 214, 215, 216, 217, 218, 219, Propeller twist, 131 220, 222, 224, 225 ProSite, 66 Paralog, 191, 203, 204 Protein Parser/parsing, 74 alphabet of, 20 Pathogenesis, 222 category, 191 Pathogenicity, 58 denaturation, 181 island (PAI), 249 expression, 14, 168 Pectobacterium atrosepticum, 205 family, 103 PEDANT, 65, 66 folding, 15, 180 Pelagibacter ubique, 98 function, 190, 201 Pelotomaculum thermopropionicum, 143, 145, functional category, 191 148, 236 homology, 190 Periplasm, 182 induced deformability, 131 Perl, 84 length distribution, 101, 108 Photobacterium profundum, 154, 158, 159 membrane-embedded protein, 195 Photorhabdus luminescens, 143, 205 modification, 180 PHP, 84 secretion, 16, 181 , 4, 5, 30, 103, 107, 158, 165, similarity, 190 166, 231 structure, 15, 193 rooted, 30 structure prediction, 193 unrooted, 30 water-soluble, 193 Phylum, 97 Protein-coding genes, 190 sequenced genomes, 230 Protein database (PDB), 64, 65 PID, 60, 61, 97, 108 search, 23 pipeline, 76, 89, 200, 201, 215 Protein-protein interaction, 195 PIR-PSD, 65 Proteobacteria, 97, 111, 142, 157, 222 Planctomycetes, 117 Proteome, 103, 203 PMID, 58 Pseudomonas syringae, 249 Polycistronic mRNA, 160, 161, 172 PubMed, 57 Position preference, 42, 125, 126, PubMed Identification Number 131, 148 (PMID), 58 Post-translational modification, 16, 181 Purine pyrimidine step, 174 Preprotein, 182 Purine stretch, 130, 146 Primary Pyrimidine, 8 data, 70 Pyrimidine stretch, 130, 146 key, 59, 60 Python, 84 structure, 193 Prochlorococcus, 117 Q Prochlorococcus marinus, 164, 165, 240 Quaternary structure, 193 ProDom, 66 Query sequence, 19

R rRNA, 12 R, 84 16S rRNA, 62, 102, 154, 166 Raster graphic, 79 AT content, 159 Reading frame, 22 gene count, 154 Recombination, 202 operon, 47, 102, 154, 177, 179 RefSeq, 60 location, 158 Regulation promoter, 176, 177 of gene expression, 175 rRNA gene, 154 of transcription, 12, 167, 169 Ruby, 84 of translation, 179 Regulatory protein, 13 S Relational database, 74, 75 Salmonella, 26, 177, 178, 206 Relative oligonucleotide frequency, 138 Salmonella enterica, 222 Relaxed DNA, 127 Salmonella typhimurium, 139 Release factor, 164 Sanger sequencing method, 38 Repeat Sargasso Sea, 233 direct, 139 Scalability, 83 everted, 140, 141, 143 Scatter plot, 144, 149 frequency, 141 Sec-dependent secretion system, 182 global, 141 Secondary structure, 193 direct, 140 Secretion inverted, 140, 144 sec-dependent, 182 imperfect, 141 signal prediction, 184 local, 142 Tat-dependent, 182 direct, 140, 144 type I, 183 everted, 144 type II, 183 inverted, 144 type III, 183, 249 mirror, 141, 143 type IV, 183 simple, 139 Secretome, 183, 184 score, 141 Sec signal peptide, 182 spacer, 140 Selection, 243 Repeat Atlas, 143, 144, 145 pressure, 202 Replication, 14 Selenocysteine, 154 Restriction enzyme, 62, 139, 140 Selfish DNA, 248 Ribose, 8 Sequence, 7 phosphate backbone, 9 alignment, 19, 21, 23, 27 Ribosomal RNA, see rRNA assembly, 198 Ribosome, 10 conservation, 19, 202 binding site, 12 logo plot, 28, 162, 174, 178 large subunit, 155 read, 199 small subunit, 155 similarity, 22 stalled, 164 Sequencing technology, 197, 199 RNA dideoxy sequencing, 38 database, 62 Sanger method, 38 mRNA, 7, 11 Service Oriented Architecture, 82, 88 ncRNA, 12 Shell scripts, 84 nucleotides, 8 Shewanella, 117 polycistronic, 157, 175 Shigella, 176, 178, 206, 222 rRNA, 12 Shigella flexneri, 247 tmRNA, 153, 165 Shigella sonnei, 205 tRNA, 10, 11, 12, 153 Shotgun cloning, 46 RNA polymerase, 9 Shotgun DNA sequencing, 46 Rose plot, 162 SIDD value, 179

Sigma 54, 103, 104, 105 Synonymous mutation, 202 binding site, 174 Synthetic biology, 98, 253 Sigma 70, 104, 105, 172, 177 Synthetic gene, 180 binding site, 174 Sigma factor, 11, 103, 105, 106, 171, 172 T alternative sigma factor, 104 TATAAT box, 174 binding, 173, 174 Tat secretion, 182 binding site, 173 signal, 182 primary, 104 Taverna, 90 stress-response, 181 Taxonomy, 58, 66, 115, 157, 158 Signal peptide, 182 Telomere, 114 Similarity chain, 31 Termite Group 1 bacterium, 239 Similarity score, 22 Terrestrial bacteria, 235 Simple Object Access Protocol (SOAP), 88 Tertiary structure, 193 Simple repeat, 139, 145 Tetranucleotide, 138, 139 Single Nucleotide Polymorphism Text file, 73, 74 (SNP), 199 Thalassiosira pseudonana, 3 Sliding window analysis, 43 Thermoanaerobacter, 102 SNP, 199 Thermobispora bispora, 158 Sodalis glossinidius, 100, 249, 255 Thermophile, 117, 230 Soil bacteria, 117, 233 Thermotoga, 117 Solibacter usitatus, 97 Thermus aquaticus, 62 Sorangium cellulosum, 97 Thermus thermophilus, 161, 162 Spirochaetes, 117, 142 Thymine, 8 Spiroplasma, 11 tmRNA, 153, 164 Stacking energy, 126, 148 database, 64 Stalled ribosome, 164 structure, 165 Start codon, 10, 12 Trans-acting factor, 176 Stop codon, 11, 12 Transcription, 9, 10 Strand global regulation, 169 difference plot, 134 local regulation, 169 lagging, 119, 120, 121, 122 termination, 12 leading, 14, 40, 111, 118, 119, 120, 121, Transcriptional regulation, 169 122, 123, 157 Transcription start, 11, 172 direction, 13 Transcriptome, 153, 201 annotation, 102, 201 Transfer-messenger RNA, see tmRNA Streptococcus, 219 Transfer RNA (tRNA), 10 Streptococcus agalactiae, 214 gene, 160 Streptococcus pneumoniae, 220 Translation, 10 Streptococcus pyogenes, 220, 225 efficiency, 164 Streptococcus thermophilus, 220 Translocation pore, 182 Streptomyces, 118 Trans-membrane helix, 195 Streptomyces coelicolor, 162 Transposon, 145 Structure Atlas, 128, 129, 131, 132, Trans-translation, 164 133, 179 TrEMBL, 65, 66 Structured Query Language (SQL), 75 Trinucleotide, 137 Subpopulation, 244 Trivially parallel task, 76 Supercoiled DNA, 127 tRNA, 10, 11, 12, 153 Supercoiling, 175 database, 64 Superhelical DNA, 127 gene, 12, 64 Superintegron, 253 gene count, 154 Surface protein, 253 structure, 161 Swiss-Prot, 102 Twin Arginine secretion, 182

Two-component signal transduction, 175, 253 Visualization Types, secretion methods, 77 I, 183 tool, 134, 149, 166, 186 II, 183 III, 183, 249 W IV, 183 Web Service, 75, 88 Wigglesworthia glossinidia, 162, 206 U Window analysis, 141, 170 Under-annotation, 32 Wolbachia, 241 UniParc, 65 Word frequency, 137 UniProt, 65 WSDL, 89 UniProtKB, 65, 66 UniRef, 65 X Untranslated RNA, 153 XML, 87 UP element, 175 document, 87 Upstream, 11, 171 Uracil, 8 Y Yersinia, 177, 178, 206 V Vector graphic, 79 Z Vibrio cholerae, 50, 253 Z-DNA, 127 Vibrio parahaemolyticus, 253 Z-DNA Atlas, 130 Violin plot, 185, 186 Zero order Markov method (ZOMs), 138