Bioinformatic Characterization of Genes and Proteins Involved in Blood Clotting in Lampreys
Total Page:16
File Type:pdf, Size:1020Kb
UC San Diego UC San Diego Previously Published Works Title Bioinformatic Characterization of Genes and Proteins Involved in Blood Clotting in Lampreys. Permalink https://escholarship.org/uc/item/0d21p5zb Journal Journal of molecular evolution, 81(3-4) ISSN 0022-2844 Author Doolittle, Russell F Publication Date 2015-10-05 DOI 10.1007/s00239-015-9701-0 Peer reviewed eScholarship.org Powered by the California Digital Library University of California Bioinformatic Characterization of Genes and Proteins Involved in Blood Clotting in Lampreys Russell F. Doolittle Journal of Molecular Evolution ISSN 0022-2844 Volume 81 Combined 3-4 J Mol Evol (2015) 81:121-130 DOI 10.1007/s00239-015-9701-0 1 23 Your article is protected by copyright and all rights are held exclusively by Springer Science +Business Media New York. This e-offprint is for personal use only and shall not be self- archived in electronic repositories. If you wish to self-archive your article, please use the accepted manuscript version for posting on your own website. You may further deposit the accepted manuscript version in any repository, provided it is only made publicly available 12 months after official publication or later and provided acknowledgement is given to the original source of publication and a link is inserted to the published article on Springer's website. The link must be accompanied by the following text: "The final publication is available at link.springer.com”. 1 23 Author's personal copy J Mol Evol (2015) 81:121–130 DOI 10.1007/s00239-015-9701-0 ORIGINAL ARTICLE Bioinformatic Characterization of Genes and Proteins Involved in Blood Clotting in Lampreys Russell F. Doolittle1 Received: 2 September 2015 / Accepted: 24 September 2015 / Published online: 5 October 2015 Ó Springer Science+Business Media New York 2015 Abstract Lampreys and hagfish are the earliest diverging coagulation factors VIII and IX, both of which are critical of extant vertebrates and are obvious targets for investi- for the ‘‘intrinsic’’ clotting system and responsible for gating the origins of complex biochemical systems found hemophilia in humans. On the other hand, they have three in mammals. Currently, the simplest approach for such each of genes for factors VII and X, participants in the inquiries is to search for the presence of relevant genes in ‘‘extrinsic’’ clotting system. The strategy of using raw trace whole genome sequence (WGS) assemblies. Unhappily, in sequence ‘‘reads’’ together with partial WGS assemblies the past a high-quality complete genome sequence has not for lampreys can be used in studies on the early evolution been available for either lampreys or hagfish, precluding of other biochemical systems in vertebrates. the possibility of proving gene absence. Recently, improved but still incomplete genome assemblies for two Keywords Lampreys Á Blood clotting Á Trace databases Á species of lamprey have been posted, and, taken together Whole genome sequence assembly problems with an extensive collection of short sequences in the NCBI trace archive, they have made it possible to make reliable counts for specific gene families. Particularly, a Introduction multi-source tactic has been used to study the lamprey blood clotting system with regard to the presence and Lampreys and hagfish are the only two extant genera of absence of genes known to occur in higher vertebrates. As jawless fish (Agnatha). As such, they are central to our was suggested in earlier studies, lampreys lack genes for understanding of the early evolution of vertebrates, offer- ing the possibility of finding simpler versions of complex physiological systems observed in mammals. For example, In the article describing the whole genome assembly for the Japanese it was long ago determined that lampreys have single-chain lamprey (Mehta et al. 2013), the organism is referred to as Lethenteron japonicum. However, the downloaded version of the hemoglobins and not the tetrameric kind found in most database describes each entry as Lethenteron camtschaticum. In the vertebrates (Wald and Riggs 1951). Similarly, early bio- present article, the first-named version is used. Moreover, because of chemical studies suggested that lampreys have a simpler the frequent back and forth comparing the sea lamprey and Japanese blood clotting scheme than do higher vertebrates and one lamprey, occasionally the shortened terms PM (for Petromyzon marinus) and LJ (for Lethenteron japonicum) are used. that is limited to the so-called ‘‘extrinsic’’ clotting system (Doolittle and Surgenor 1962). The same underlying Electronic supplementary material The online version of this question has been asked about numerous other biochemical article (doi:10.1007/s00239-015-9701-0) contains supplementary material, which is available to authorized users. systems: namely, can the early evolutionary stages of complex systems found in higher vertebrates be better & Russell F. Doolittle understood by examining the situation in lampreys or [email protected] hagfish? Over the years, we and others have been attempting to 1 Departments of Chemistry & Biochemistry and Molecular Biology, University of California, San Diego, La Jolla, trace the appearance of the various genes involved in CA 92093-0314, USA vertebrate blood coagulation. Currently, the most direct 123 Author's personal copy 122 J Mol Evol (2015) 81:121–130 method of attempting such evolutionary reconstructions is pairs provided (Table 1). The other recently reported the examination of whole genome DNA sequences (WGS) assembly was for the closely related Japanese lamprey of representative organisms. In this regard, high-quality (Lethenteron japonicum) (Mehta et al. 2013). It has only a genomes are available for several jawed fish, frogs, lizard, slightly higher coverage, but the much longer scaffolds birds, and numerous mammals, and it has been possible to provide vital overlaps for joining segments found in the detail many of the events that have made mammalian blood other databases (Table 1). The lamprey genome has been clotting so complex (Davidson et al. 2003a, b; Jiang and estimated to contain between 1.6 and 2.2 billion bp (Gre- Doolittle 2003; Ponczec et al. 2008). Unhappily, whole gory 2005), suggesting that the average coverage in both of genome sequences for lamprey and hagfish have been late the new assemblies may be as low as 60 % (Table 1). This in coming and are still imperfect, especially for cases may be misleading, however, because gene-rich regions of where a consideration of gene absence is critical. the genome were likely easier to sequence than gene-poor, Several years ago, we attempted to circumvent the high-repeat regions. problem of not having a lamprey WGS by exhaustively In spite of their limitations, in the work reported here, studying individual DNA sequences stored in the NCBI these assemblies have been used in combination with the Trace Archive, which, in the case of the sea lamprey abundant short sequences stored in the Trace Archive (Petromyzon marinus), is an especially rich trove of raw collection to reconstruct fully most of the genes and pro- sequence data. As it happened, the concurrent appearance teins involved in the lamprey coagulation pathway. All the of a draft assembly based on the same Trace data made it evidence supports the earlier conclusions about the possible to make judgments about the presence and likely absences of factors VIII and IX and the presence of addi- absence of the various clotting factors found in mammals, tional factors VII and X in lampreys. The biggest challenge and in the end we cautiously proposed that lampreys have a was that most of the gene products of interest are the result reduced set of such genes (Doolittle et al. 2008). of gene duplications that took place about the same time as In particular, the data suggested that lampreys lack the appearance of vertebrates, exacerbating the problem of genes for factors VIII and IX, two genes that are essential distinguishing orthologs from highly similar paralogs. The for the ‘‘intrinsic’’ clotting system in higher vertebrates and cases of coagulation factors V and VIII are even more defects in which are responsible for hemophilia in humans. problematic because of their being members of the fer- It was already known that genes for these two proteins are roxidase family of proteins, which includes ceruloplasmin present in jawed fishes like the pufferfish (Davidson et al. and hephaestin proteins, themselves composed of three 2003a; Jiang and Doolittle 2003) and zebrafish (Hanu- major domains that are the result of (tandem) duplications. manthaiah et al. 2002), and the proposal was that in the interval between the divergence of the jawless and jawed fish more or less simultaneous duplications of genes for Methods factors X and V gave rise to those for factors IX and VIII. Crucially, factors VIII and IX interact with each other in Various lamprey databases were downloaded on to an in- the same way that factors V and X do, the first-named of house computer. These included current versions of the each pair (VIII and V) serving as a large molecular weight Trace Archive for P. marinus, which, in addition to the vast co-factor for the second-named vitamin K-dependent pro- numbers of random shotgun reads, also contains numerous tease (factors IX and X). sequences from non-genomic sources, including extensive The study also revealed that lampreys have two genes cDNA and EST entries. The Trace Archive also contains for factor X and three for factor VII, vitamin K-dependent many mate pairs that can be used to establish neighborli- factors at the heart of the ‘‘extrinsic’’ clotting system. The ness. As noted in our earlier report, we had also downloaded main shortcoming of the work was that it was not possible a 2007 draft assembly based on the same Trace data from to link together all the various trace sequences for every ftp://genome.wustl.edu/pub/petromyzon_marinus.