WebLogo Documentation, Release 3.7.9.dev2+g7eab5d1.d20210504
Gavin E. Crooks
May 04, 2021

Contents:

1 Distribution and Modification
    1.1 WebLogo API
    1.2 Alphabets and Sequences
    1.3 Sequence IO
        1.3.1 Sequence file reading and writing
        1.3.2 Supported File Formats
    1.4 Logo Data, Options, and Format
    1.5 Logo Formatting
Python Module Index
Index

WebLogo is software designed to make the generation of sequence logos easy and painless.

A sequence logo is a graphical representation of an amino acid or nucleic acid multiple sequence alignment. Each logo consists of stacks of symbols, one stack for each position in the sequence. The overall height of the stack indicates the sequence conservation at that position, while the height of symbols within the stack indicates the relative frequency of each amino or nucleic acid at that position. In general, a sequence logo provides a richer and more precise description of, for example, a binding site, than would a consensus sequence.

WebLogo features a web interface (http://weblogo.threeplusone.com), and a command line interface that provides more options and control (http://weblogo.threeplusone.com/manual.html#CLI). These pages document the API.

The main WebLogo webserver is located at http://weblogo.threeplusone.com

Please consult the manual (also located in the weblogolib/htdocs subdirectory) for installation instructions and more information:
    http://weblogo.threeplusone.com/manual.html

For help on the command line interface run

    weblogo --help

To build a simple logo run

    weblogo < cap.fa > logo0.eps

To run as a standalone webserver at localhost:8080

    weblogo --serve

CHAPTER 1
Distribution and Modification

This package is distributed under the new BSD Open Source License. Please see the LICENSE.txt file for details on copyright and licensing. The WebLogo source code can be downloaded from https://github.com/WebLogo/weblogo

WebLogo requires Python 3.6 or 3.7. Generating logos in PDF or bitmap graphics formats requires that the ghostscript program 'gs' be installed. The Scalable Vector Graphics (SVG) format also requires the program 'pdf2svg'.

1.1 WebLogo API

To create a logo in Python code:

>>> from weblogo import *
>>> with open('cap.fa') as fin:
...     seqs = read_seq_data(fin)
>>> logodata = LogoData.from_seqs(seqs)
>>> logooptions = LogoOptions()
>>> logooptions.title = "A Logo Title"
>>> logoformat = LogoFormat(logodata, logooptions)
>>> eps = eps_formatter(logodata, logoformat)

1.2 Alphabets and Sequences

Alphabetic sequences and associated tools and data.

Seq is a subclass of a Python string with additional annotation and an alphabet. The characters in the string must be contained in the alphabet. Various standard alphabets are provided.

Classes:
    Alphabet -- A subset of non-null ASCII characters
    Seq -- An alphabetic string
    SeqList -- A collection of Seq's

Alphabets:
    • generic_alphabet -- A generic alphabet. Any printable ASCII character.
    • protein_alphabet -- IUPAC/IUB Amino Acid one letter codes.
    • nucleic_alphabet -- IUPAC/IUB Nucleic Acid codes 'ACGTURYSWKMBDHVN-'
    • dna_alphabet -- Same as nucleic_alphabet, with 'U' (Uracil) an alternative for 'T' (Thymidine).
    • rna_alphabet -- Same as nucleic_alphabet, with 'T' (Thymidine) an alternative for 'U' (Uracil).
    • reduced_nucleic_alphabet -- All ambiguous codes in 'nucleic_alphabet' are alternatives to 'N' (aNy).
    • reduced_protein_alphabet -- All ambiguous ('BZJ') and non-canonical amino acid codes ('U', Selenocysteine and 'O', Pyrrolysine) in 'protein_alphabet' are alternatives to 'X'.
    • unambiguous_dna_alphabet -- 'ACGT'
    • unambiguous_rna_alphabet -- 'ACGU'
    • unambiguous_protein_alphabet -- The twenty canonical amino acid one letter codes, in alphabetic order, 'ACDEFGHIKLMNPQRSTVWY'

Amino Acid Codes:

    Code  Alt.  Meaning
    -----------------------------------
    A           Alanine
    B           Aspartic acid or Asparagine
    C           Cysteine
    D           Aspartate
    E           Glutamate
    F           Phenylalanine
    G           Glycine
    H           Histidine
    I           Isoleucine
    J           Leucine or Isoleucine
    K           Lysine
    L           Leucine
    M           Methionine
    N           Asparagine
    O           Pyrrolysine
    P           Proline
    Q           Glutamine
    R           Arginine
    S           Serine
    T           Threonine
    U           Selenocysteine
    V           Valine
    W           Tryptophan
    Y           Tyrosine
    Z           Glutamate or Glutamine
    X     ?     any
    *           translation stop
    -     .~    gap

Nucleotide Codes:

    Code  Alt.  Meaning
    ------------------------------------------------------
    A           Adenosine
    C           Cytidine
    G           Guanine
    T           Thymidine
    U           Uracil
    R           G A (puRine)
    Y           T C (pYrimidine)
    K           G T (Ketone)
    M           A C (aMino group)
    S           G C (Strong interaction)
    W           A T (Weak interaction)
    B           G T C (not A) (B comes after A)
    D           G A T (not C) (D comes after C)
    H           A C T (not G) (H comes after G)
    V           G C A (not T, not U) (V comes after U)
    N     X?    A G C T (aNy)
    -     .~    A gap

Refs:
    http://www.chem.qmw.ac.uk/iupac/AminoAcid/A2021.html
    http://www.chem.qmw.ac.uk/iubmb/misc/naseq.html

Authors: GEC 2004, 2005

class weblogo.seq.Alphabet
    An ordered subset of printable ASCII characters.

    Status: Beta
    Authors:
        • GEC 2005

    alphabetic(string)
        True if all characters of the string are in this alphabet.

    chr(n)
        The n'th character in the alphabet (zero indexed), or the null character ('\0').

    chrs(sequence_of_ints)
        Convert a sequence of ordinals into an alphabetic string.

    letters()
        Letters of the alphabet as a string.
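The alphabet descriptions above rely on alternative symbols that share the ordinal position of a canonical letter (for example, 'U' standing in for 'T' in dna_alphabet). The sketch below illustrates that mapping with a small class, ToyAlphabet, which is hypothetical and written only for this example; it is not weblogo's Alphabet implementation. It assumes alternatives are supplied as (alternative, canonical) pairs, ignores any case-folding the real alphabets may perform, and, following the description of ord() below, maps characters outside the alphabet to 255.

```python
class ToyAlphabet:
    """Hypothetical ordered alphabet with alternative symbols (illustration only)."""

    def __init__(self, letters, alternatives=()):
        self._letters = letters
        # Each canonical letter maps to its position in `letters`.
        self._ord = {c: i for i, c in enumerate(letters)}
        # An alternative shares the ordinal of its canonical letter.
        for alt, canonical in alternatives:
            if canonical in self._ord and alt not in self._ord:
                self._ord[alt] = self._ord[canonical]

    def letters(self):
        return self._letters

    def alphabetic(self, string):
        # True if every character is a letter or a known alternative.
        return all(c in self._ord for c in string)

    def ord(self, c):
        # 255 marks a character that is not in the alphabet.
        return self._ord.get(c, 255)

    def ords(self, string):
        return bytearray(self.ord(c) for c in string)

    def chr(self, n):
        # In this sketch an out-of-range n raises IndexError.
        return self._letters[n]

    def chrs(self, ordinals):
        return "".join(self.chr(n) for n in ordinals)

    def normalize(self, string):
        # Map every alternative symbol to its canonical letter.
        if not self.alphabetic(string):
            raise ValueError("string contains non-alphabetic characters")
        return self.chrs(self.ords(string))


# A DNA-like alphabet in which 'U' is an alternative for 'T'.
dna = ToyAlphabet("ACGT", alternatives=[("U", "T")])
print(dna.normalize("GAUUACA"))  # GATTACA
print(list(dna.ords("GAT")))     # [2, 0, 3]
print(dna.ord("x"))              # 255
```

Because normalization is just "look up the ordinal, then map the ordinal back to a canonical letter", any number of alternatives can be folded onto one letter without changing the ordinal range.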
    normalize(string)
        Normalize an alphabetic string by converting all alternative symbols to the canonical equivalent in 'letters'.

    ord(c)
        The ordinal position of the character c in this alphabet, or 255 if no such character.

    ords(string)
        Convert an alphabetic string into a byte array of ordinals.

    static which(seqs, alphabets=None)
        Returns the most appropriate unambiguous protein, RNA or DNA alphabet for a Seq or SeqList. If a list of alphabets is supplied, then the best alphabet is selected from that list.

        The heuristic is to count the occurrences of letters for each alphabet and downweight longer alphabets by the log of the alphabet length. Ties go to the first alphabet in the list.

class weblogo.seq.Seq
    An alphabetic string. A subclass of "str" consisting solely of letters from the same alphabet.

    alphabet -- A string or Alphabet of allowed characters.
    name -- A short string used to identify the sequence.
    description -- A string describing the sequence.

    Authors: GEC 2005

    back_translate()
        Translate a protein sequence back into coding DNA, using the standard genetic code. See weblogo.transform.GeneticCode for details and more options.

    complement()
        Returns the complementary nucleic acid sequence.

    join(iterable) -> str
        Return a string which is the concatenation of the strings in the iterable. The separator between elements is the sequence whose join() method is called.

    lower()
        Return a lower case copy of the sequence.

    mask(letters='abcdefghijklmnopqrstuvwxyz', mask='X')
        Replace all occurrences of letters with the mask character. The default is to replace all lower case letters with 'X'.

    ords()
        Convert the sequence to an array of integers in the range [0, len(alphabet)).

    remove(delchars)
        Return a new alphabetic sequence with all characters in 'delchars' removed.

    reverse()
        Return the reversed sequence. Note that this method returns a new object, in contrast to the in-place reverse() method of list objects.
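Since Seq is a subclass of str, the mask(), remove(), complement(), and reverse() operations described above correspond closely to ordinary string operations. The functions below are a plain-str sketch under that assumption, not weblogo's own code. The complement table is the standard IUPAC DNA pairing (A/T, C/G, R/Y, K/M, B/V, D/H, with S, W, N and gaps mapping to themselves); whether weblogo's complement() uses exactly this table is an assumption here.

```python
import string

# Standard IUPAC DNA complement pairs, upper and lower case. S, W, N and
# gap characters are their own complements, so they are left unchanged.
_COMPLEMENT = str.maketrans(
    "ACGTRYKMBVDHacgtrykmbvdh",
    "TGCAYRMKVBHDtgcayrmkvbhd",
)


def mask(seq, letters=string.ascii_lowercase, mask="X"):
    # Replace every character found in `letters` with the mask character.
    return seq.translate(str.maketrans(letters, mask * len(letters)))


def remove(seq, delchars):
    # Delete every character found in `delchars`.
    return seq.translate(str.maketrans("", "", delchars))


def complement(seq):
    return seq.translate(_COMPLEMENT)


def reverse(seq):
    # Returns a new string; Python strings are immutable.
    return seq[::-1]


print(mask("ACGTacgt"))                # ACGTXXXX
print(remove("AC-GT--A", "-"))         # ACGTA
print(reverse(complement("GATTACA")))  # TGTAATC
```

Composing reverse() with complement() yields the other strand of a DNA sequence, which is what reverse_complement() is documented to return.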
    reverse_complement()
        Returns the reversed complementary nucleic acid sequence (i.e. the other strand of a DNA sequence).

    tally(alphabet=None)
        Counts the occurrences of alphabetic characters.

        Arguments:
            - alphabet -- an optional alternative alphabet
        Returns:
            A list of character counts in alphabetic order.

    tostring()
        Converts Seq to a raw string.

    translate()
        Translate a nucleotide sequence to a polypeptide using full IUPAC ambiguities in DNA/RNA and amino acid codes, using the standard genetic code. See weblogo.transform.GeneticCode for details and more options.

    upper()
        Return an upper case copy of the sequence.

    word_count(k, alphabet=None)
        Return a count of all subwords in the sequence.

        >>> from weblogo.seq import *
        >>> Seq("abcabc").word_count(3)
        [('abc', 2), ('bca', 1), ('cab', 1)]

    words(k, alphabet=None)
        Return an iteration over all subwords of length k in the sequence. If an optional alphabet is provided, only words from that alphabet are returned.

        >>> list(Seq("abcabc").words(3))
        ['abc', 'bca', 'cab', 'abc']

weblogo.seq.rna(string)
    Create an alphabetic sequence representing a stretch of RNA.

weblogo.seq.dna(string)
    Create an alphabetic sequence representing a stretch of DNA.

weblogo.seq.protein(string)
    Create an alphabetic sequence representing a stretch of polypeptide.

class weblogo.seq.SeqList(alist=[], alphabet=None, name=None, description=None)
    A list of sequences.

    isaligned()
        Are all sequences of the same length and alphabet?

    ords(alphabet=None)
        Convert the sequence list into a 2D array of ordinals.

    profile(alphabet=None)
        Counts the occurrences of characters in each column.

        Returns: Motif(counts, alphabet)

    tally(alphabet=None)
        Counts the occurrences of alphabetic characters.

        Parameters:
            alphabet -- an optional alternative alphabet
        Returns:
            A list of character counts in alphabetic order.
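The subword counting of words() and word_count(), and the per-column counting of SeqList.profile(), can be sketched in plain Python. Per-column counts are also what determine the stack and symbol heights described in the introduction. The functions below are hypothetical illustrations, not weblogo code. The conservation measure here is the textbook information content of a column, log2(N) minus the Shannon entropy of the letter frequencies for an alphabet of N letters; WebLogo's own estimate is more elaborate (it accounts for finite sample sizes, for instance), so treat these numbers as an approximation.

```python
from collections import Counter
from math import log2


def words(seq, k):
    # Every overlapping subword of length k, in order of position.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def word_count(seq, k):
    # (word, count) pairs, sorted alphabetically by word.
    return sorted(Counter(words(seq, k)).items())


def profile(seqs, letters):
    # Per-column counts for an alignment, one count per letter in `letters`.
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must all have the same length")
    return [[column.count(c) for c in letters] for column in zip(*seqs)]


def conservation_bits(counts):
    # log2(N) minus the Shannon entropy of the column, with no
    # small-sample correction.
    total = sum(counts)
    entropy = -sum((n / total) * log2(n / total) for n in counts if n)
    return log2(len(counts)) - entropy


def letter_heights(counts):
    # Each letter's share of the stack: its frequency times the conservation.
    total = sum(counts)
    bits = conservation_bits(counts)
    return [bits * n / total for n in counts]


print(word_count("abcabc", 3))  # [('abc', 2), ('bca', 1), ('cab', 1)]
alignment = ["ACGT", "ACGA", "ACTT"]
columns = profile(alignment, "ACGT")
print(columns)  # [[3, 0, 0, 0], [0, 3, 0, 0], [0, 0, 2, 1], [1, 0, 0, 2]]
print([round(conservation_bits(c), 3) for c in columns])  # [2.0, 2.0, 1.082, 1.082]
```

Under this measure a fully conserved DNA column scores log2(4) = 2 bits, and a column in which all four bases are equally frequent scores 0.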