Tunca Doğan, Alex Bateman, Maria J. Martin

UniProt Domain Architecture Alignment: A New Approach for Protein Similarity Search using InterPro Domain Annotation

Tunca Doğan (1), Alex Bateman (1), Maria J. Martin (1)
(1) European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Correspondence: [email protected]

ABSTRACT

Motivation: Similarity-based methods have been widely used to infer the properties of genes and gene products that have little or no experimental annotation. The most popular of these are sequence similarity search methods such as BLAST. New approaches that overcome the limitations of methods relying solely upon sequence similarity are emerging. One of these novel approaches is the comparison of the organization/architecture of the structural domains in proteins. The idea is that shared structural units may indicate shared evolutionary and functional properties.

Results: Here we propose a new algorithm for the comparison of domain architectures in order to identify similarities and to propagate functional annotations between the proteins in the UniProt Database. The method, "UniProt Domain Architecture Alignment", differs from previous approaches in three major ways: (i) the use of the InterPro Database for the domain annotation, (ii) the incorporation of domain weights into the dynamic programming step, and (iii) the inclusion of information regarding non-annotated regions of the proteins in the domain architectures. The performance of the method was measured through the identification of orthology using the OMA database (F1 score: 0.62). The results indicated the effectiveness of the approach for similarity detection. We plan to integrate the algorithm into a learning-based system for the automatic annotation of uncharacterized proteins in the UniProtKB/TrEMBL database.

INTRODUCTION

• Discovery of functional properties for proteins is a key step in biomedical research.
• Experimental identification of proteins is still quite a laborious and expensive task.
• This led to many computational methods being developed to infer the unknown properties of proteins based on their sequence similarities to experimentally annotated proteins (e.g. BLAST, PSI-BLAST).
• Different approaches have been tried lately, especially in the field of protein function prediction, to augment the performance of sequence methods.
• One of these approaches is the study of protein domains: the structural building blocks in proteins that are able to function and fold independently from the rest of the protein.
• The concept of domain architectures (DA), defined as the organizational properties of a protein regarding the domains it …

METHODOLOGY

Generation of the Domain Architectures:
1) Collect the hits for each protein from InterPro.
2) Remove all non-domain type hits.
3) Order the domain hits sequentially.
4) Merge the hits from the same InterPro hierarchy into single hits using the condensed view algorithm provided by this resource.
5) Treat the overlapping hits from unrelated InterPro entries.
6) Add the stretches of residues without domain hits (> 30 a.a.) as "GAP" domains in the DAs.

Figure 1. Different types of overlapping domain hits on protein sequences.
Figure 2. Resolution process for the overlap hits.

Domain weighting:
Inverse domain frequency:
  N_t : total number of proteins in the test set
  N_d : number of proteins containing domain d
Neighboring domain count:
  E_d : total number of distinct neighboring domains to d
Term frequency:
  N_{d,p} : domain copy number of domain d in protein p
  D_p : total number of domains in protein p
Domain hit sizes:
  Z_min(d1,d2) and Z_max(d1,d2) : sizes of the shorter and longer hits, respectively, of domain d in protein 1 and in protein 2
  Z_av : average size of all domain hits on all proteins in the set
Domain similarity measure:
  O_{d,e} : similarity ratio between domain d and domain e
Weight matrix:
  A_{p1,p2}, C_{p1,p2}, F_{p1,p2}, S_{p1,p2} and I_{p1,p2} : local weight matrices
Final scoring matrix:
  R_{p1,p2} : raw scoring matrix
  W_{p1,p2} : general weight matrix between proteins 1 and 2

Weighted Domain Architecture Alignment:
The Needleman-Wunsch global sequence alignment algorithm (Needleman and Wunsch, 1970) is the core of the proposed DA alignment method:
• Modification of the algorithm in order to work with 7137 distinct InterPro domains as its alphabet instead of 20 amino acids.

RESULTS & DISCUSSION

InterPro Domains, DAs and DA Alignment

Domain annotation coverage difference between domain databases:
Figure 3. Domain hit statistics of UniProtKB/SwissProt proteins from various databases.

Overlap domain hits problem in the InterPro database:
Figure 4. The fraction of overlap hits by InterPro domains on the residues of all UniProtKB/SwissProt proteins.

Statistics about the directionality in DAs:
Figure 5. Co-occurrence frequencies of a selection of domain pairs, hit together on UniProtKB/SwissProt proteins (InterPro accessions of the domains are shown at the top of the bars).

Evaluation of the performance of the method

The performance of the proposed method was evaluated on the identification of orthologous proteins from the Orthologous Matrix (OMA) project, March 2014 release (Altenhoff et al., 2011). Randomly selected UniProtKB/SwissProt proteins from the OMA groups were subjected to the DA alignment procedure. The performance of the method was evaluated by measuring its ability to identify the orthologous proteins, as orthologs usually share the same function.

Table 1. Performance results of the proposed method in the identification of orthologous proteins in OMA groups.

CONCLUSIONS
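
As an illustration of the methodology, the sketch below builds a domain architecture from ordered domain hits (adding "GAP" domains for unannotated stretches longer than 30 residues) and aligns two architectures with a Needleman-Wunsch style global alignment whose alphabet is InterPro accessions. It is a minimal sketch under stated assumptions, not the poster's implementation: the function names `build_da` and `align_das`, the per-domain match weights, the flat mismatch and indel penalties, and the example hit coordinates are all hypothetical, and the weight matrices defined on the poster are not reproduced here. Hits are assumed to be already filtered, merged and non-overlapping (steps 1, 2, 4 and 5).

```python
from typing import Dict, List, Tuple

GAP_DOMAIN = "GAP"   # label for unannotated stretches (step 6)
INDEL = "-"          # alignment gap symbol (not a domain)

Hit = Tuple[int, int, str]  # (start, end, accession), 1-based inclusive


def build_da(hits: List[Hit], protein_length: int, min_gap: int = 30) -> List[str]:
    """Order hits along the sequence and add GAP domains for long unannotated stretches."""
    da: List[str] = []
    pos = 1  # first residue not yet covered by a hit
    for start, end, accession in sorted(hits):
        if start - pos > min_gap:          # unannotated stretch before this hit
            da.append(GAP_DOMAIN)
        da.append(accession)
        pos = end + 1
    if protein_length - pos + 1 > min_gap:  # unannotated C-terminal stretch
        da.append(GAP_DOMAIN)
    return da


def align_das(da1: List[str], da2: List[str], weights: Dict[str, float],
              mismatch: float = -1.0, indel: float = -1.0):
    """Global alignment of two DAs; a match scores the (assumed) weight of the domain."""
    def sub(d1: str, d2: str) -> float:
        return weights.get(d1, 1.0) if d1 == d2 else mismatch

    n, m = len(da1), len(da2)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + indel
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(da1[i - 1], da2[j - 1]),
                              score[i - 1][j] + indel,
                              score[i][j - 1] + indel)

    # Traceback from the bottom-right cell to recover one optimal alignment.
    a1: List[str] = []
    a2: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(da1[i - 1], da2[j - 1]):
            a1.append(da1[i - 1]); a2.append(da2[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + indel:
            a1.append(da1[i - 1]); a2.append(INDEL); i -= 1
        else:
            a1.append(INDEL); a2.append(da2[j - 1]); j -= 1
    return score[n][m], a1[::-1], a2[::-1]


# Hypothetical example proteins (coordinates and weights are made up).
da_p1 = build_da([(10, 100, "IPR000001"), (150, 250, "IPR000002")], protein_length=260)
da_p2 = build_da([(5, 95, "IPR000001"), (100, 200, "IPR000002")], protein_length=210)
total, aln1, aln2 = align_das(da_p1, da_p2, {"IPR000001": 2.0, "IPR000002": 3.0})
print(da_p1)   # ['IPR000001', 'GAP', 'IPR000002']
print(total, aln1, aln2)
```

In this toy case the 49-residue unannotated stretch in the first protein becomes a "GAP" domain, which the alignment pairs with an indel in the second protein. Replacing the per-domain match weight with a pairwise weight matrix would bring the sketch closer to the weighting scheme outlined on the poster.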
