A Java Application Combining Protein Information with Epitope Prediction Software

Total Page:16

File Type:pdf, Size:1020Kb

A Java Application Combining Protein Information with Epitope Prediction Software EpiProt: A Java Application Combining Protein Information With Epitope Prediction Software The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters Citation O'Mara, Patrick. 2016. EpiProt: A Java Application Combining Protein Information With Epitope Prediction Software. Master's thesis, Harvard Extension School. Citable link http://nrs.harvard.edu/urn-3:HUL.InstRepos:33797405 Terms of Use This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http:// nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of- use#LAA EpiProt: A Java Application Combining Protein Information with Epitope Prediction Software Patrick O’Mara A Thesis in the Field of Biotechnology for the Degree of Master of Liberal Arts in Extension Studies. Harvard University November 2016 Abstract EpiProt is a bioinformatics suite designed to help users determine antigenic regions within proteins for the production of antibodies. It combines seven epitope prediction programs (five from the Immune Epitope Database, ABCPred and BcePred) with protein annotation information giving users a comprehensive view of a protein targeted for antigen design. The user can perform BLAST searches using UniProtJAPI to find similar proteins to perform pairwise and multiple sequence alignments using ClustalW, Clustal Omega, MUSCLE, T-coffee, or MAFFT. The resulting alignment is shown in a display where the user can annotate the alignments with predicted epitope regions, post-translational modifications (from PhophoSitePlus and UniProt), subcellular location (UniProt), protein processing (UniProt), secondary structure (from SIFTS/PDB) and predicted secondary structure (PsiPred and JPred). Built using Java 8, EpiProt uses Java Swing to create the graphical user interface. EpiProt is designed with a model-view- presenter (MVP) architecture pattern requiring an internet connection to run and collect data via Representational State Transfer (RESTful) for viewing. Table of Contents List of Tables ................................................................................................................... viii List of Figures .................................................................................................................... ix I. Introduction .....................................................................................................................1! Antibodies and Antigens ..........................................................................................2! Epitopes ....................................................................................................................2! In Vivo Generation of Antibodies ............................................................................3! Manufacturing Antibodies for Applications ............................................................4! Antigen Design Principles .......................................................................................7! Accessibility .................................................................................................8! Immunogenicity ...........................................................................................8! Specificity ....................................................................................................9! Antigen Design Process .........................................................................................10! Intra-species Alignment .............................................................................10! Inter-species Alignment .............................................................................11! Isoform Alignment .....................................................................................11! Protein Annotation .....................................................................................12! Protein structure .............................................................................12! Topology and protein interaction information ...............................14! Post-translational modifications and mutation sites ......................14! Protein processing ..........................................................................15! iv General guidelines .........................................................................16! Antigen Design Summary ..........................................................................16! Epitope Prediction Sources ....................................................................................17! Similar Approaches and Deficiencies ....................................................................17! MacVector ..................................................................................................18! Jalview .......................................................................................................19! AbDesigner ................................................................................................21! Ig-score ..........................................................................................23! Uniqueness .....................................................................................23! Conservation ..................................................................................23! II. Software Design ...........................................................................................................25! Development Software ...........................................................................................25! Third-Party Software .............................................................................................25! PsiPred 3.5 .................................................................................................26! JPred 1.5 .....................................................................................................29! JABAWS 2.1.0 ..........................................................................................29! III. Software Architecture .................................................................................................30! View .......................................................................................................................32! General Display .........................................................................................32! Menu Bars ......................................................................................33! BLAST panel .................................................................................34! Header Pane ...................................................................................34! v Editor Pane .....................................................................................35! Purpose .......................................................................................................35! Presenter .................................................................................................................37! Model .....................................................................................................................37! Data Access Objects ..................................................................................39! Not MVP: Protein.java...............................................................................40! The Services ...............................................................................................41! UniprotService.java .......................................................................41! UniprotBlastService.java ...............................................................42! PsiPredService.java ........................................................................42! JPredService.java ...........................................................................42! SiftService.java ..............................................................................43! PhosphoSitePlus.java .....................................................................43! IedbEpitopePredictionService.java ................................................43! BcePredPredictionService.java ......................................................44! ABCPredService.java ....................................................................46! JabawsService.java ........................................................................48! IV. Summary and Conclusion ...........................................................................................50! Proposed vs. Actual Program .................................................................................50! Future Considerations ............................................................................................51! Summary ................................................................................................................52! Appendix ............................................................................................................................55! vi Additional Tables ...................................................................................................55! Source Code ...........................................................................................................59! Bibliography ....................................................................................................................423! vii List of Tables Table 1. Summary of the epitope prediction programs available in EpiProt
Recommended publications
  • Analysis of Gene Expression Data for Gene Ontology
    ANALYSIS OF GENE EXPRESSION DATA FOR GENE ONTOLOGY BASED PROTEIN FUNCTION PREDICTION A Thesis Presented to The Graduate Faculty of The University of Akron In Partial Fulfillment of the Requirements for the Degree Master of Science Robert Daniel Macholan May 2011 ANALYSIS OF GENE EXPRESSION DATA FOR GENE ONTOLOGY BASED PROTEIN FUNCTION PREDICTION Robert Daniel Macholan Thesis Approved: Accepted: _______________________________ _______________________________ Advisor Department Chair Dr. Zhong-Hui Duan Dr. Chien-Chung Chan _______________________________ _______________________________ Committee Member Dean of the College Dr. Chien-Chung Chan Dr. Chand K. Midha _______________________________ _______________________________ Committee Member Dean of the Graduate School Dr. Yingcai Xiao Dr. George R. Newkome _______________________________ Date ii ABSTRACT A tremendous increase in genomic data has encouraged biologists to turn to bioinformatics in order to assist in its interpretation and processing. One of the present challenges that need to be overcome in order to understand this data more completely is the development of a reliable method to accurately predict the function of a protein from its genomic information. This study focuses on developing an effective algorithm for protein function prediction. The algorithm is based on proteins that have similar expression patterns. The similarity of the expression data is determined using a novel measure, the slope matrix. The slope matrix introduces a normalized method for the comparison of expression levels throughout a proteome. The algorithm is tested using real microarray gene expression data. Their functions are characterized using gene ontology annotations. The results of the case study indicate the protein function prediction algorithm developed is comparable to the prediction algorithms that are based on the annotations of homologous proteins.
    [Show full text]
  • Macvector 13.5 Workshop
    MacVector 13 Workshop September 2014 MacVector 13.5 for Mac OS X Getting Started with MacVector 13.5 Support: 1 866 338 0222 +44 (0) 1223 410552 [email protected] MacVector 13 Workshop 1 MacVector 13 Workshop September 2014 MacVector Resources Tutorials A number of tutorials are available for download; http://www.macvector.com/downloads.html Videos http://www.macvector.com/Screencasts/screencasts2.html Manual There is a downloadable PDF version of the manual (12.0) at http://www.macvector.com/downloads.html#MacVector12UserGuide Discussion Forums To post questions or follow ongoing discussions, check out the user forums at; http://www.macvector.com/phpbb/index.php Copyright Statement Copyright MacVector, Inc, 2014. All rights reserved. This document contains proprietary information of MacVector, Inc and its licensors. It is their exclusive property. It may not be reproduced or transmitted, in whole or in part, without written agreement from MacVector, Inc. The software described in this document is furnished under a license agreement, a copy of which is packaged with the software. The software may not be used or copied except as provided in the license agreement. MacVector, Inc reserves the right to make changes, without notice, both to this publication and to the product it describes. Information concerning products not manufactured or distributed by MacVector, Inc is provided without warranty or representation of any kind, and MacVector, Inc will not be liable for any damages. This version of the workshop guide was published in November
    [Show full text]
  • Macvector 10.6 2 Macvector User Guide Copyright Statement
    MacVector 10.6 2 MacVector User Guide Copyright statement Copyright MacVector, Inc, 2008. All rights reserved. This document contains proprietary information of MacVector, Inc and its licensors. It is their exclusive property. It may not be reproduced or transmitted, in whole or in part, without written agreement from MacVector, Inc. The software described in this document is furnished under a license agreement, a copy of which is packaged with the software. The software may not be used or copied except as provided in the license agreement. MacVector, Inc reserves the right to make changes, without notice, both to this publication and to the product it describes. Information concerning products not manufactured or distributed by MacVector, Inc is provided without warranty or representation of any kind, and MacVector, Inc will not be liable for any damages. MacVector User Guide 3 4 MacVector User Guide 1 Introduction to the User Guide ....................................19 Overview....................................................................................19 The MacVector documentation set.............................................20 About this user guide..............................................................20 Conventions in this user guide...................................................20 Interface conventions .............................................................21 Navigation aids ..........................................................................21 2 Introduction to MacVector.............................................23
    [Show full text]
  • Variation in Protein Coding Genes Identifies Information
    bioRxiv preprint doi: https://doi.org/10.1101/679456; this version posted June 21, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Animal complexity and information flow 1 1 2 3 4 5 Variation in protein coding genes identifies information flow as a contributor to 6 animal complexity 7 8 Jack Dean, Daniela Lopes Cardoso and Colin Sharpe* 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 Institute of Biological and Biomedical Sciences 25 School of Biological Science 26 University of Portsmouth, 27 Portsmouth, UK 28 PO16 7YH 29 30 * Author for correspondence 31 [email protected] 32 33 Orcid numbers: 34 DLC: 0000-0003-2683-1745 35 CS: 0000-0002-5022-0840 36 37 38 39 40 41 42 43 44 45 46 47 48 49 Abstract bioRxiv preprint doi: https://doi.org/10.1101/679456; this version posted June 21, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Animal complexity and information flow 2 1 Across the metazoans there is a trend towards greater organismal complexity. How 2 complexity is generated, however, is uncertain. Since C.elegans and humans have 3 approximately the same number of genes, the explanation will depend on how genes are 4 used, rather than their absolute number.
    [Show full text]
  • Comparative Analysis of Human Chromosome 7Q21 and Mouse
    Downloaded from genome.cshlp.org on October 2, 2021 - Published by Cold Spring Harbor Laboratory Press Letter Comparative analysis of human chromosome 7q21 and mouse proximal chromosome 6 reveals a placental-specific imprinted gene, TFPI2/Tfpi2, which requires EHMT2 and EED for allelic-silencing David Monk,1,6 Alexandre Wagschal,2 Philippe Arnaud,2 Pari-Sima Mu¨ller,3 Layla Parker-Katiraee,4 Déborah Bourc’his,5 Stephen W. Scherer,4 Robert Feil,2 Philip Stanier,1 and Gudrun E. Moore1 1Institute of Child Health, London WC1N 1EH, United Kingdom; 2Institute of Molecular Genetics, CNRS UMR-5535 and University of Montpellier-II, 34293 Montpellier, France; 3Sir William Dunn School of Pathology, University of Oxford, Oxford OX1 3RE, United Kingdom; 4Center for Applied Genomics, The Hospital for Sick Children, Toronto M5G 1L7, Canada; 5Inserm U741, F-75251 Paris Cedex 05, France Genomic imprinting is a developmentally important mechanism that involves both differential DNA methylation and allelic histone modifications. Through detailed comparative characterization, a large imprinted domain mapping to chromosome 7q21 in humans and proximal chromosome 6 in mice was redefined. This domain is organized around a maternally methylated CpG island comprising the promoters of the adjacent PEG10 and SGCE imprinted genes. Examination of Dnmt3l−/+ conceptuses shows that imprinted expression for all genes of the cluster depends upon the germline methylation at this putative “imprinting control region” (ICR). Similarly as for other ICRs, we find its DNA-methylated allele to be associated with trimethylation of lysine 9 on histone H3 (H3K9me3) and trimethylation of lysine 20 on histone H4 (H4K20me3), whereas the transcriptionally active paternal allele is enriched in H3K4me2 and H3K9 acetylation.
    [Show full text]
  • Clustering and Apple
    Apple in Research Rajiv Pillai Power of UNIX. Simplicity of Macintosh. Mac OS X: The easy way to be open Comand Line Interface FreeBSD 5 Editors Commands and Utilities Shells Scripting languages The Best Foundation The Best Foundation Secure Scalable Open standards High performance Rock-solid stability Advanced networking Built on Open Source Over 100 Open Source Projects Apple Confidential Modern Languages • GCC 3.3 • Perl 5.8.1 • Python 2.3 • PHP 4.3.2 • TCL 8.4.2 • Ruby 1.6.8 • Bash 2.05 Integrated X11 Quartz window Runs side-by- manager side with native applications (or full screen) Accelerated graphics Launch from Finder Dock menu A Whole New World of Solutions Bringing Mac OS X into new markets Scientific Distributed Enterprise Mathematica Platform LSF Oracle 10g MATLAB Globus Sybase BLAST Sun Grid Engine HP OpenView HMMER MPI SAP client GROMACS PBS SAS GeneSpring Myrinet JBoss (J2EE) PyMol Infiniband Tomcat (JSP) IBM XL Fortran iNquiry Axis (SOAP) Mac OS X The Best of Both Worlds Open Like Linux Convenient Like Windows Open Source Shrink-wrap solutions Open Standards Fits in to existing networks Open APIs & Applications Single point of support Runs all the Apps a Scientist Needs A single system on their desk • Their favorite GUI applications (e.g. Mathematica, Gaussian, Vector NTI, TurboWorx, more) • And their favorite UNIX applications (e.g. Phred/Phrap, HMMer, BLAST, Smith-Waterman, more) • And their favorite productivity tools (e.g. Photoshop, Microsoft Office, Outlook email client) • All run simultaneously on Mac OS X (Yes! Side by side) Apple Confidential Over 100 installations since March Academic Government and Commercial Harvard University Naval Medical Research Center Isis Pharmaceuticals Stanford University Scripps Research Institute Cincinnati Children’s Hospital Cornell University Children’s Mercy Hospital U.S.
    [Show full text]
  • Macvector 12.5 Getting Started Guide
    MacVector 12.5 Getting Started Guide Copyright statement Copyright MacVector, Inc, 2011. All rights reserved. This document contains proprietary information of MacVector, Inc and its licensors. It is their exclusive property. It may not be reproduced or transmitted, in whole or in part, without written agreement from MacVector, Inc. The software described in this document is furnished under a license agreement, a copy of which is packaged with the software. The software may not be used or copied except as provided in the license agreement. MacVector, Inc reserves the right to make changes, without notice, both to this publication and to the product it describes. Information concerning products not manufactured or distributed by MacVector, Inc is provided without warranty or representation of any kind, and MacVector, Inc will not be liable for any damages. 2 Table of Contents COPYRIGHT STATEMENT ................................................................................ 2 INTRODUCTION.................................................................................................. 4 GETTING SEQUENCE INFORMATION INTO MACVECTOR ...................... 4 IMPORTING SEQUENCE FILES .............................................................................4 CREATING NEW SEQUENCES...............................................................................4 OPENING SEQUENCES FROM ENTREZ ...............................................................5 VIEWING AND EDITING SEQUENCES...........................................................
    [Show full text]
  • The Arf1p Gtpase-Activating Protein Glo3p Executes Its Regulatory Function Through a Conserved Repeat Motif at Its C-Terminus
    2604 Research Article The Arf1p GTPase-activating protein Glo3p executes its regulatory function through a conserved repeat motif at its C-terminus Natsuko Yahara1,*, Ken Sato1,2 and Akihiko Nakano1,3,‡ 1Molecular Membrane Biology Laboratory, RIKEN Discovery Research Institute and 2PRESTO, Japan Science and Technology Agency, Hirosawa, Wako, Saitama 351-0198, Japan 3Department of Biological Sciences, Graduate School of Science, University of Tokyo, Hongo, Bunkyo-ku, Tokyo 113-0033, Japan *Present address: Department of Biochemistry, University of Geneva, Sciences II, Geneva, Switzerland ‡Author for correspondence (e-mail: [email protected]) Accepted 21 March 2006 Journal of Cell Science 119, 2604-2612 Published by The Company of Biologists 2006 doi:10.1242/jcs.02997 Summary ADP-ribosylation factors (Arfs), key regulators of ArfGAP, Gcs1p, we have shown that the non-catalytic C- intracellular membrane traffic, are known to exert multiple terminal region of Glo3p is required for the suppression of roles in vesicular transport. We previously isolated eight the growth defect in the arf1 ts mutants. Interestingly, temperature-sensitive (ts) mutants of the yeast ARF1 gene, Glo3p and its homologues from other eukaryotes harbor a which showed allele-specific defects in protein transport, well-conserved repeated ISSxxxFG sequence near the C- and classified them into three groups of intragenic terminus, which is not found in Gcs1p and its homologues. complementation. In this study, we show that the We name this region the Glo3 motif and present evidence overexpression of Glo3p, one of the GTPase-activating that the motif is required for the function of Glo3p in vivo. proteins of Arf1p (ArfGAP), suppresses the ts growth of a particular group of the arf1 mutants (arf1-16 and arf1-17).
    [Show full text]
  • Macvector 12.6 User Guide2.Pdf
    MacVector 12.6 MacVector User Guide 1 2 MacVector User Guide Copyright statement Copyright MacVector, Inc, 2012. All rights reserved. This document contains proprietary information of MacVector, Inc and its licensors. It is their exclusive property. It may not be reproduced or transmitted, in whole or in part, without written agreement from MacVector, Inc. The software described in this document is furnished under a license agreement, a copy of which is packaged with the software. The software may not be used or copied except as provided in the license agreement. MacVector, Inc reserves the right to make changes, without notice, both to this publication and to the product it describes. Information concerning products not manufactured or distributed by MacVector, Inc is provided without warranty or representation of any kind, and MacVector, Inc will not be liable for any damages. Trademarks Gateway®, TOPO®, Vector NTI® and Zero Blunt® are regiestered trademarks of Life Technologies, Carlsbad, California, USA. Vector NTI Advance™ is a trademark of Life Technologies, Carlsbad, California, USA. MacVector User Guide 3 4 MacVector User Guide 1 Introduction to the User Guide ....................................19 Overview....................................................................................19 The MacVector documentation set.............................................20 About this user guide..............................................................20 Conventions in this user guide...................................................20
    [Show full text]
  • Views [10-12] in Favour of Information Theory- Methods
    BMC Bioinformatics BioMed Central Research article Open Access Evaluation of GO-based functional similarity measures using S. cerevisiae protein interaction and expression profile data Tao Xu1,2, LinFang Du*2 and Yan Zhou*3,1 Address: 1Shanghai-MOST Key Laboratory of Health and Disease Genomics, Chinese National Human Genome Center at Shanghai, Shanghai 201203, PR China, 2College of Life Sciences, Sichuan University, Chengdu 610064, PR China and 3Department of Microbiology, School of Life Sciences, Fudan University, Shanghai 200433, PR China Email: [email protected]; LinFangDu*[email protected]; Yan Zhou* - [email protected] * Corresponding authors Published: 6 November 2008 Received: 18 March 2008 Accepted: 6 November 2008 BMC Bioinformatics 2008, 9:472 doi:10.1186/1471-2105-9-472 This article is available from: http://www.biomedcentral.com/1471-2105/9/472 © 2008 Xu et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Abstract Background: Researchers interested in analysing the expression patterns of functionally related genes usually hope to improve the accuracy of their results beyond the boundaries of currently available experimental data. Gene ontology (GO) data provides a novel way to measure the functional relationship between gene products. Many approaches have been reported for calculating the similarities between two GO terms, known as semantic similarities. However, biologists are more interested in the relationship between gene products than in the scores linking the GO terms.
    [Show full text]
  • Macvector 11 2 Macvector User Guide Copyright Statement
    MacVector 11 2 MacVector User Guide Copyright statement Copyright MacVector, Inc, 2009. All rights reserved. This document contains proprietary information of MacVector, Inc and its licensors. It is their exclusive property. It may not be reproduced or transmitted, in whole or in part, without written agreement from MacVector, Inc. The software described in this document is furnished under a license agreement, a copy of which is packaged with the software. The software may not be used or copied except as provided in the license agreement. MacVector, Inc reserves the right to make changes, without notice, both to this publication and to the product it describes. Information concerning products not manufactured or distributed by MacVector, Inc is provided without warranty or representation of any kind, and MacVector, Inc will not be liable for any damages. Trademarks Gateway®, TOPO®, Vector NTI® and Zero Blunt® are regiestered trademarks of Life Technologies, Carlsbad, California, USA. Vector NTI Advance™ is a trademark of Life Technologies, Carlsbad, California, USA. MacVector User Guide 3 4 MacVector User Guide 1 Introduction to the User Guide ....................................19 Overview....................................................................................19 The MacVector documentation set.............................................20 About this user guide..............................................................20 Conventions in this user guide...................................................20 Interface conventions
    [Show full text]
  • MV 12.5.1 Release Notes
    MacVector 12.5.1 for Mac OS X System Requirements MacVector 12.5 runs on any PowerPC or Intel Macintosh running Mac OS X 10.5 or higher. It is a Universal Binary, meaning that it runs natively on both PowerPC and Intel based Macintosh computers. There are no specific hardware requirements for MacVector – if your machine can run OS X 10.5 or above, it can run MacVector. A complete installation of MacVector 12.5 uses approximately 160 MB of disk space. Installation and License Activation Install MacVector 12.5 by double-clicking on the MacVector 12.5.mpkg installer application. You will be prompted for a system administrator account and password during installation. As with MacVector 12.0, once installation is complete, you must enter a valid serial number and activation code the first time you run MacVector. This information is usually sent by e-mail and is also printed on the inside of the CD sleeve. If you previously installed MacVector 12.0 and have a serial number with a maintenance end date of Oct 1st 2011 or later, MacVector 12.5 will automatically use your existing license and you will not be required to enter the details again. Changes for MacVector 12.5.1 Bug Fixes Several occasional crashes have been fixed. Printing from the multiple sequence alignment editor now no longer prints additional blank pages. A bug leading to corrupted data when copying and pasting items between Restriction Enzyme documents has been fixed. The CDS translations display in the single sequence editor now updates correctly when residues are inserted before CDS features in the editor.
    [Show full text]