Vita et al. Immunome Research 2011, x:xx Page 1 of 8 http://www.immunome-research.net/xxx RESEARCH Open Access A Model for Collaborative Curation, The IEDB and ChEBI Curation of Non-peptidic Epitopes Randi Vita1*, Bjoern Peters1, Zara Josephs2, Paula de Matos2, Marcus Ennis2, Steve Turner2, Christoph Steinbeck2, Emily Seymour1, Laura Zarebski1, and Alessandro Sette1 Abstract The Immune Epitope Database (IEDB) recently expanded and enhanced its non-peptidic epitope related data utilizing a collaboration with Chemical Entities of Biological Interest (ChEBI), resulting in the first re- source that brings together published immunological data with the expertise of the ChEBI database. This procedure took advantage of the distinct expertise of the IEDB and ChEBI databases to improve content and enhance interoperability of both databases. This project has resulted in the comprehensive inventory and curation of immune epitope data related to non-peptidic structures and serves as a model for success- ful collaborative curation between established resources. Introduction plete; resulting in the manual curation of 11,641 published manuscripts. Throughout this experience, The important discoveries that make up the scientific the IEDB has undergone major revision and ex- literature are hidden within published manuscripts panded its database fields, features, and curation and laboratory notebooks. In order to be fully util- guidelines to ensure accurate, thorough, and consis- ized, this data needs to be converted into the stan- tent curation of peptidic epitope data. However, data dardized formats of databases and these formats related to non-peptidic epitopes such as carbohy- need to be compatible. There exists a need for ex- drates, lipids, and chemicals were largely uncurated. perts in each field of curation in order to understand In order to expand the content of the IEDB to in- the data being curated. This specialized training can clude non-peptidic epitopes we reached out to the be beneficial across different research specialties if experts studying these epitopes and began a fruitful the individual databases collaborate and cooperate. collaboration with the ChEBI resource. Currently, several collaborative standards exist, such as the Open Biomedical Ontologies (OBO) Foundry Results [1] and Minimum Information for Biological and Inventory of the literature describing non- Biomedical Investigations (MIBBI) [2]. Here we peptidic immune epitopes present a novel collaborative curation method that takes advantage of the specialized curators of two As a first step towards making non-peptidic very different databases; the IEDB and ChEBI. This epitope data available to the immunological commu- collaboration resulted in enhancements to both data- nity and scientific community in general, papers po- sets and provided new depth and insights to the data tentially containing data that could be curated and being curated. entered in the IEDB were inventoried. To this end The IEDB contains data related to antibody we adapted an approach already described and vali- and T cell epitopes for humans, non-human pri- dated elsewhere [4]. Briefly, the PubMed database mates, rodents, and other animal species [3]. To date, was searched utilizing a query, purposely designed curation of peptidic epitope data relating to infec- to be very broad and thus inclusive. The results of tious diseases, allergens, and autoimmunity is com- this query were then narrowed to select potentially relevant papers, by the use of an automated text clas- 1La Jolla Institute for Allergy and Immunology, Vaccine Discovery, sifier. This automated classifier was trained on La Jolla, California, USA. 2Chemoinformatics and Metabolism Team, European Bioinformatics Institute, 20,910 abstracts that were manually assigned to be Hinxton, Cambridge, UK curatable or not by a domain expert. The automated * Corresponding author: [email protected] classifier is used to discard all references that with Bjoern Peters [email protected] 95% confidence do not contain curatable informa- Zara Josephs [email protected] tion. The remaining references are manually re- Paula de Matos [email protected] Marcus Ennis [email protected] viewed by a domain expert to select the curatable Steve Turner [email protected] ones. These manual assignments are used to continu- Christoph Steinbeck [email protected] ously update the automated classifier, resulting in its Emily Seymour [email protected] st Laura Zarebski [email protected] continuous improvement [5]. As of October 1 2010, Alessandro Sette [email protected] © 2011 Vita et al; licensee Nikolai Petrovsky Publishing. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Vita et al. Immunome Research 2011, x:xx Page 2 of 8 http://www.immunome-research.net/xxx a total of 27,636 potentially relevant refer- ences were identified and categorized as a function of subject matter. Of those, 2,642 related to HIV research were not scruti- nized further, as HIV is currently outside of the scope of the IEDB. The remaining 24,994 references categorized as described in Davies et al [4] were separated based on whether they described peptidic or non- peptidic epitopes. As shown in Figure 1, it was found that 20% of the references iden- tified in PubMed describe non-peptidic epitopes. While the majority (80%) of the literature describes peptidic epitopes, this large proportion of non-peptidic epitopes was notable and unexpected. Next, we probed in more detail the nature of the non-peptidic epitopes described in these references [Figure 2]. As done previ- ously for peptidic epitopes, the references were clas- references related to transplantation, including for sified first and foremost on the basis of their associa- example complex carbohydrate blood sugar antigens tion to particular diseases and biological processes, and galactose residues involved in xenotransplanta- and then also on the basis of the chemical structure tion. Epitopes predominantly recognized in cancer of the epitope itself. Approximately half (56%) of all research include the Tn, Thomsen-Friedenreich, and non-peptidic epitope references relate to specific dis- Lewis antigens, which are also implicated in autoim- eases, such as allergic diseases (776 references; munity and transplantation. 15%), autoimmunity (648; 13 %), infectious diseases (517; 10%), transplantation (341; 7%) or cancer However, the largest category of non–peptidic refer- (542; 11%). Non-peptidic epitopes related to allergic ences (44%) describe haptens and carbohydrate reactions mostly include molecules involved in con- moieties not directly associated with a particular dis- tact dermatitis, such as nickel or haptens used in ex- ease. These include model epitopes/antigens that are perimental models of allergy. Non-peptidic epitopes used to study fundamental mechanisms of immunity identified in autoimmunity research are mainly lipids (TNP, DNP, ABA, NIP) and other small molecules and glycolipids. Example structures include cardi- such as natural hormones, narcotics, drugs and their olipin and phosphatidic acid. Infectious disease ref- metabolites, pollutants, poisons, and other assorted erences mostly describe carbohydrate and glycolipid small molecules for which detection assays are being epitopes such as LPS and glycosylated proteins. A developed. number of non-peptidic epitopes are discussed in The relative distribution of references relating to non -peptidic epitopes in comparison to ref- erences describing peptidic epitopes also revealed some striking differences. Most notably about 46% of peptidic references are related to infectious diseases, com- pared to about 10% for non-peptidic. Conversely, only about 13% of the pep- tidic references are related to model anti- gens and other molecules, compared to about 44% for non-peptidic. These large differences in distribution are likely the reflection of the widespread use of well- defined small haptens to study immune responses, especially in early immu- nological literature, combined with the technical challenges associated with the exact definition of non-peptidic epitopes recognized in infectious diseases. In © 2011 Vita et al; licensee Nikolai Petrovsky Publishing. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Vita et al. Immunome Research 2011, x:xx Page 3 of 8 http://www.immunome-research.net/xxx conclusion, a large fraction of the immunological Indeed, by far the most challenging aspect literature describes epitopes that are non-peptidic. relating to the curation of non-peptidic epitopes is the representation of their molecular structure and Representing non-peptidic data in the IEDB identification through free text and alpha-numeric identifiers. The commonly used nomenclature is not Having identified references within the IEDB scope uniform. In different references the same chemical and their relative subject matter, we next determined entity can be described by its molecular formula, its to what extent the existing database structure was simplified molecular input line entry specification amenable to the representation of non-peptidic
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages8 Page
-
File Size-