Research Article

Title: The shrinking human protein coding complement: are there now fewer than 20,000 genes?

Authors: Iakes Ezkurdia1*, David Juan2*, Jose Manuel Rodriguez3, Adam Frankish4, Mark Diekhans5, Jennifer Harrow4, Jesus Vazquez6, Alfonso Valencia2,3, Michael L. Tress2,*.

Affiliations:
1. Unidad de Proteómica, Centro Nacional de Investigaciones Cardiovasculares, CNIC, Melchor Fernández Almagro, 3, 28029, Madrid, Spain
2. Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Melchor Fernández Almagro, 3, 28029, Madrid, Spain
3. National Bioinformatics Institute (INB), Spanish National Cancer Research Centre (CNIO), Melchor Fernández Almagro, 3, 28029, Madrid, Spain
4. Wellcome Trust Sanger Institute, Wellcome Trust Campus, Hinxton, Cambridge CB10 1SA, UK
5. Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), 1156 High Street, Santa Cruz, CA 95064, USA
6. Laboratorio de Proteómica Cardiovascular, Centro Nacional de Investigaciones Cardiovasculares, CNIC, Melchor Fernández Almagro, 3, 28029, Madrid, Spain

*: these two authors wish to be considered as joint first authors of the paper.

Corresponding author: Michael Tress, [email protected], Tel: +34 91 732 80 00 Fax: +34 91 224 69 76

Running title: Are there fewer than 20,000 protein-coding genes?

Keywords: Protein coding genes, proteomics, evolution, genome annotation

Abstract

Determining the full complement of protein-coding genes is a key goal of genome annotation. The most powerful approach for confirming protein-coding potential is the detection of cellular protein expression through peptide mass spectrometry experiments. Here we map the peptides detected in seven large-scale proteomics studies to almost 60% of the protein-coding genes in the GENCODE annotation of the human genome.