Test Documentation Release Test
test Documentation
Release test

test

Dec 06, 2017

Contents

1 Biopython
    1.1 Tutorial and Cookbook

2 Introduction
    2.1 What is Biopython?
    2.2 What can I find in the Biopython package
    2.3 About these notebooks

3 Quick Start
    3.1 General overview of what Biopython provides
    3.2 Working with sequences
    3.3 A usage example
    3.4 Parsing sequence file formats
    3.5 Connecting with biological databases
    3.6 What to do next

4 Sequence Objects
    4.1 Sequences and Alphabets
    4.2 Sequences act like strings
    4.3 Slicing a sequence
    4.4 Turning Seq objects into strings
    4.5 Concatenating or adding sequences
    4.6 Changing case
    4.7 Nucleotide sequences and (reverse) complements
    4.8 Transcription
    4.9 Translation
    4.10 Translation Tables
    4.11 Comparing Seq objects
    4.12 MutableSeq Objects
    4.13 UnknownSeq Objects
    4.14 Working with strings directly

5 Sequence annotation objects
    5.1 The SeqRecord Object
    5.2 Creating a SeqRecord
    5.3 Feature, location and position objects
    5.4 References
    5.5 The format method
    5.6 Slicing a SeqRecord
    5.7 Adding SeqRecord objects
    5.8 Reverse-complementing SeqRecord objects

6 Sequence Input/Output
    6.1 Parsing or Reading Sequences
    6.2 Reading Sequence Files
    6.3 Parsing sequences from compressed files
    6.4 Parsing sequences from the net
    6.5 Sequence files as dictionaries
    6.6 Writing sequence files

7 Multiple Sequence Alignment objects
    7.1 Parsing or Reading Sequence Alignments
    7.2 Writing Alignments
    7.3 Manipulating Alignments
    7.4 Alignment Tools

8 BLAST
    8.1 Running BLAST over the Internet
    8.2 Saving blast output
    8.3 Running BLAST locally
    8.4 Parsing BLAST output
    8.5 The BLAST record class
    8.6 Deprecated BLAST parsers
    8.7 Bio.Blast.NCBIStandalone

9 BLAST and other sequence search tools (experimental code)
    9.1 The SearchIO object model
    9.2 A note about standards and conventions
    9.3 Reading search output files
    9.4 Dealing with large search output files with indexing
    9.5 Writing and converting search output files

10 Accessing NCBI’s Entrez databases
    10.1 Entrez Guidelines
    10.2 EInfo: Obtaining information about the Entrez databases
    10.3 ESearch: Searching the Entrez databases
    10.4 EPost: Uploading a list of identifiers
    10.5 ESummary: Retrieving summaries from primary IDs
    10.6 EFetch: Downloading full records from Entrez
    10.7 ELink: Searching for related items in NCBI Entrez
    10.8 EGQuery: Global Query - counts for search terms
    10.9 ESpell: Obtaining spelling suggestions
    10.10 Parsing huge Entrez XML files
    10.11 Handling errors
    10.12 Specialized parsers
    10.13 Using a proxy
    10.14 Examples
    10.15 Using the history and WebEnv

11 Swiss-Prot and ExPASy
    11.1 Parsing Swiss-Prot files
    11.2 Parsing Prosite records
    11.3 Parsing Prosite documentation records
    11.4 Parsing Enzyme records
    11.5 Accessing the ExPASy server
    11.6 Scanning the Prosite database

12 Going 3D: The PDB module
    12.1 Reading and writing crystal structure files
    12.2 Structure representation
    12.3 Disorder
    12.4 Hetero residues
    12.5 Navigating through a Structure object
    12.6 Analyzing structures
    12.7 Common problems in PDB files
    12.8 Accessing the Protein Data Bank

13 Bio.PopGen: Population genetics
    13.1 GenePop
    13.2 Operations on GenePop records
    13.3 Coalescent simulation

14 Phylogenetics with Bio.Phylo
    14.1 Demo: what is in a tree?
    14.2 I/O functions
    14.3 View and export trees
    14.4 Using Tree and Clade objects
    14.5 Running external applications
    14.6 PAML integration

15 Sequence motif analysis using Bio.motifs
    15.1 Motif objects
    15.2 Reading motifs
    15.3 Writing motifs
    15.4 Position-Weight Matrices
    15.5 Position-Specific Scoring Matrices
    15.6 Searching for instances
    15.7 Each motif object has an associated Position-Specific Scoring Matrix
    15.8 Comparing motifs
    15.9 De novo motif finding

16 Cluster analysis
    16.1 Data representation
    16.2 Missing values
    16.3 Random number generator
    16.4 Euclidean distance
    16.5 City-block distance
    16.6 The Pearson correlation coefficient
    16.7 Absolute Pearson correlation
    16.8 Uncentered correlation (cosine of the angle)
    16.9 Absolute uncentered correlation
    16.10 Spearman rank correlation
    16.11 Kendall’s τ
    16.12 Weighting
    16.13 Calculating the distance matrix
    16.14 Calculating the cluster centroids
    16.15 Calculating the distance between clusters
    16.16 k-means and k-medians
    16.17 k-medoids clustering
    16.18 Representing a hierarchical clustering solution
    16.19 Performing hierarchical clustering
    16.20 Calculating the distance matrix
    16.21 Calculating