BIOINFORMATICS APPLICATIONS NOTE Pages 380-381

Vol. 14 no. 4 1998

MView: a web-compatible database search or multiple alignment viewer

Nigel P. Brown, Christophe Leroy and Chris Sander

European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK

Received on December 10, 1997; revised and accepted on January 15, 1998

© Oxford University Press

Abstract

Summary: MView is a tool for converting the results of a sequence database search into the form of a coloured multiple alignment of hits stacked against the query. Alternatively, an existing multiple alignment can be processed. In either case, the output is simply HTML, so the result is platform independent and does not require a separate application or applet to be loaded.

Availability: Free from http://www.sander.ebi.ac.uk/mview/ subject to copyright restrictions.

Contact: [email protected]

Often when running FASTA (Pearson, 1990) or BLAST (Altschul et al., 1990), it is desired to visualize the database hits stacked against the query sequence. The BEAUTY post-processor (Worley et al., 1996) parses BLAST output and shows the query and hits as a stack of line segments, but this is neither generally applicable, nor are the sequences themselves stacked. At the other extreme, given a full multiple alignment generated by other tools, viewers [e.g. Belvu (Sonnhammer, 1995), an X window client], editors [e.g. Cinema (Attwood et al., 1997), a Java applet] and formatters for hard copy [e.g. Boxshade (Hofmann and Baron, 1992)] exist.

By comparison, MView is intended as a filter for post-processing searches or alignments to generate a reformatted alignment with HTML mark-up. Its principal uses so far are as an embedded tool inside Web applications for viewing precomputed searches and alignments under GeneQuiz (Scharf et al., 1994; Andrade et al., 1998), and in displaying FASTA/BLAST search output (Lopez, 1997).

Driven from the command line, the program comprises a back end implementing different parsers, and a front end providing a set of formatting options applied to the internally assembled (or simply read in) multiple alignment. HTML mark-up may be switched off so that plain ASCII output can be produced for loading into another tool, such as Belvu.

The basic output consists of columns of optional descriptor information followed by a column of alignment strings (see Figure 1). Descriptors include a number giving the original rank of a sequence in the input, a sequence identifier, which may be hyperlinked to the SRS system (Etzold et al., 1996), a text field, a field of scoring information from searches, and a field reporting the per cent identity of each sequence with respect to a preferred sequence in the alignment, usually the query in the case of a search.

Multiple alignments require minimal parsing and are subjected only to formatting stages. Search hits are first stacked against the ungapped query sequence and require special processing. Ungapped search (e.g. BLAST) hit fragments are assembled into a single string by overlaying them preferentially by score onto a template string, while gapped search (e.g. FASTA) hits have columns corresponding to query gaps excised. Consequently, the stacked alignment is a patchwork of reconstituted sequences that nevertheless is informative and visually striking.

MView offers three kinds of input filtering. A threshold maximum pairwise sequence identity can be specified to screen out close homologues, and an upper bound can be set on the number of sequences to be reported. Additionally, specific to the type of input, cut-offs on, for example, BLAST p-value or score, can be set. Three colour schemes are supported: none, colour residues by property using amino acid physicochemical classes (Taylor, 1986), or colour by identity and property. In the last case, residues are coloured if identical to their counterpart in any preferred sequence in the alignment, normally the query in the case of a search. This reference sequence is also used to calculate the displayed per cent identities. Formatting options include pagination of the alignment: the default is to produce one single scrollable band, but this can be broken into panes by specifying a desired number of alignment columns per pane. A ruler can be attached to the top of the alignment, and various other minor settings are possible, such as a choice of gap character.

Parsers all inherit from a generic class offering incremental parsing of record-based files, and are quite easy to add. At the time of writing, parsers for protein sequence searches have been implemented for FASTA 1.6, 2.0, 3.0, BLASTP 1.4, BLAST2 (WashU) 2.0, BLAST2 (NCBI) 2.0 and PSI-BLAST 2.0 (Altschul et al., 1997). Multiple alignment formats recognized are Pearson/FASTA, MSF, CLUSTALW, MaxHom/HSSP, and a trivial one comprising paired columns of identifiers and aligned sequences.

MView and its underlying class libraries are implemented in Perl, Version 5 (Wall et al., 1996) for UNIX, and should be easily portable to other systems. Formatting and colouring of HTML alignments require a fixed-width font (e.g. Courier) and support for the <FONT> tag, so a recent version of a browser such as Netscape is recommended.

Fig. 1. A single pane of output from MView assembled from a search of P2CA_RAT (a class 2C protein phosphatase) using BLASTP 1.4.9 against a non-redundant database. Fields from the left are: rank of each hit; SRS-linked identifier; score, p-value, fragment count as reported by BLASTP; per cent identity of hit fragments to query. Maximum pairwise identity was set at 80%, colouring is by identity to query and by property, with other residues in grey. These proteins belong to an extended family (Bork et al., 1996) that also includes a mitochondrial phosphatase, the adenylate cyclases and several bacterial phosphatases. The BLAST scores in the lower rows are weak, but the tool readily allows the user to identify promising candidates for further study.

Acknowledgements

Thanks to S.Hoersch, R.Lopez, C.Reich, A.Franchini and M.Andrade for testing and suggestions.

References

Altschul,S.F., Gish,W., Miller,W., Myers,E.W. and Lipman,D.J. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410.

Altschul,S.F., Madden,T.L., Schaeffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.

Andrade,M., Brown,N.P., Leroy,C., Hoersch,S., Reich,C., Franchini,A., de Daruvar,A., Tamames,J., Valencia,A., Ouzounis,C. and Sander,C. (1998) GeneQuiz: Automated genome sequence analysis and annotation (manuscript in preparation). http://www.sander.ebi.ac.uk.genequiz/.

Attwood,T., Payne,A.W.R., Michie,A.D. and Parry-Smith,D.J. (1997) A Colour INteractive Editor for Multiple Alignments - CINEMA. EMBnet.news, 3(3). http://www2.ebi.ac.uk/embnet/news/.

Bork,P., Brown,N.P., Hegyi,H. and Schultz,J. (1996) The protein phosphatase 2C (PP2C) superfamily: Detection of bacterial homologues. Protein Sci., 5, 1421–1425.

Etzold,T., Ulyanov,A. and Argos,P. (1996) SRS: information retrieval system for molecular biology data banks. Methods Enzymol., 266, 114–128.

Hofmann,K. and Baron,M.D. (1992) BOXSHADE. ISREC, Switzerland; Institute for Animal Health, U.K. http://ulrec3.unil.ch/software/BOX_form.html.

Lopez,R. (1997) Fasta3 and blast service at the EBI. EMBnet.news, 4(1). http://www2.ebi.ac.uk/embnet/news/.

Pearson,W.R. (1990) Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol., 183, 63–98.

Scharf,M., Schneider,R., Casari,G., Bork,P., Valencia,A., Ouzounis,C. and Sander,C. (1994) GeneQuiz: a workbench for sequence analysis. In Altman,R., Brutlag,D., Karp,P., Lathrop,R. and Searls,D. (eds), Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology. AAAI Press, Menlo Park, CA, pp. 348–353.

Sonnhammer,E. (1995) Belvu - a multiple alignment viewer. Sanger Centre, UK. http://www.sanger.ac.uk/~esr/Belvu.html.

Taylor,W.R. (1986) The classification of amino acid conservation. J. Theor. Biol., 119, 205–218.

Wall,L., Christiansen,T. and Schwartz,R.L. (1996) Programming Perl, 2nd edn. Nutshell Handbooks, O’Reilly & Associates, Inc., Sebastopol, CA, USA.

Worley,K.C., Wiese,B.A. and Smith,R.F. (1996) BEAUTY: An enhanced BLAST-based search tool that integrates multiple biological information resources into sequence similarity search results. Genome Res., 5, 173–184.
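
Illustrative sketch. The note describes three operations used to stack search hits against the query: ungapped hit fragments are overlaid by score onto a template string, gapped hits have the columns at query gaps excised, and a per cent identity is reported against a reference sequence. The Python below is a minimal sketch of those ideas only. It is not MView's Perl code. The Fragment structure, the function names, the "-" gap character, the 0-based coordinates and the choice of denominator for per cent identity are all assumptions made for illustration.

```python
from typing import List, NamedTuple

GAP = "-"  # assumed gap character; the note says the user can choose one


class Fragment(NamedTuple):
    """One ungapped hit fragment (hypothetical structure, not MView's)."""
    query_start: int  # assumed 0-based offset of the fragment on the query
    sequence: str     # hit residues, one per query position covered
    score: float      # search score reported for the fragment


def overlay_fragments(query_length: int, fragments: List[Fragment]) -> str:
    """Assemble the ungapped fragments of one hit into a single row.

    Fragments are written lowest score first, so where two fragments
    overlap, the higher-scoring one is written last and stays visible.
    """
    template = [GAP] * query_length
    for frag in sorted(fragments, key=lambda f: f.score):
        for offset, residue in enumerate(frag.sequence):
            pos = frag.query_start + offset
            if 0 <= pos < query_length:  # ignore anything off the query
                template[pos] = residue
    return "".join(template)


def excise_query_gaps(aligned_query: str, aligned_hit: str) -> str:
    """Drop the columns of a gapped pairwise alignment where the query has a gap."""
    if len(aligned_query) != len(aligned_hit):
        raise ValueError("aligned strings must have equal length")
    return "".join(h for q, h in zip(aligned_query, aligned_hit) if q != GAP)


def percent_identity(reference: str, row: str) -> float:
    """Per cent identity of a stacked row to the reference sequence.

    Assumption: the denominator is the number of non-gap positions in the
    row. The note does not state which normalisation MView uses.
    """
    pairs = [(r, h) for r, h in zip(reference, row) if h != GAP]
    if not pairs:
        return 0.0
    matches = sum(1 for r, h in pairs if r == h)
    return 100.0 * matches / len(pairs)
```

For example, with a ten-residue query MKVLAAGIVG and three fragments of one hit (MKIL at offset 0 with score 50, VLSA at offset 2 with score 80, IVG at offset 7 with score 30), overlay_fragments returns MKVLSA-IVG: the highest-scoring fragment overwrites the overlap at offsets 2 and 3, and the uncovered offset 6 remains a gap. Sorting in ascending score order keeps the overlay a single pass with no explicit overlap bookkeeping.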

