bioRxiv preprint doi: https://doi.org/10.1101/2020.06.07.138958; this version posted June 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Genome-wide analysis of the cupin superfamily in cowpea (Vigna unguiculata)

Antônio J. Rocha1*, Mario Ramos de Oliveira Barsottini2, Ana Luiza Sobral Paiva1, José Hélio Costa1, Thalles Barbosa Grangeiro1

1Departamento de Bioquímica e Biologia Molecular, Centro de Ciências, Campus do Pici, Universidade Federal do Ceará, Fortaleza, Ceará, 60.440-900, Brazil
2Laboratory of Genome and BioEnergy (LGE), Institute of Biology, State University of Campinas, Campinas, São Paulo, Brazil
3Laboratório de Genética Molecular, Departamento de Biologia, Centro de Ciências, Campus do Pici, Universidade Federal do Ceará, Fortaleza, Ceará, 60.440-900, Brazil

*To whom all correspondence should be addressed
E-mail: [email protected]

Abstract

Cowpea [Vigna unguiculata (L.) Walp.] is an essential food crop that is cultivated in many important arid and semi-arid regions of the world. In this study, the genome-wide database of cowpea genes was searched for genomic sequences coding for globulins, specifically members of the cupin superfamily, a well-documented multigenic family belonging to the globulin protein class. A total of seventy-seven genes belonging to the cupin superfamily were found and divided into six families. The V. unguiculata genes were classified into two subgroups: classical cupins with one cupin domain (fifty-nine proteins) and bicupins with two cupin domains (eighteen members). In addition, a search for cupin members in other closely related species of the Fabaceae family [V. angularis, V. radiata and Phaseolus vulgaris (common bean)] was performed.
Based on those data, a detailed characterization and comparison of the cupin genes in these species was performed with the aim of better understanding the connections and functions of cupin proteins from different, but related, plant species. This study was the first attempt to investigate the cupin superfamily in V. unguiculata, allowing the identification of six cupin families and a better understanding of the structural features of those proteins, such as the number of domains and alternative splicing.

Keywords: Vicilin, Leguminous, amino acid sequences, protein domain

1. Introduction

Cowpea [Vigna unguiculata (L.) Walp.] is an important food crop that is cultivated in arid and semi-arid regions of Africa, Asia and the Americas. In Brazil, it is mainly found in the northeast region, where it is a source of food for the local population (Ehlers et al., 1999).

Cowpea seed storage proteins are classified into four groups based on their solubility: albumins (water-soluble), prolamins (alcohol-soluble), glutelins (acid- or alkali-soluble) and globulins (soluble in dilute saline solution) (Osborne, 1924). Globulins, in turn, are divided into two subgroups according to their sedimentation coefficients: the 7S and 11S globulin types, known as vicilins and legumins, respectively (Ponzoni et al., 2018). Vicilins constitute the major source of nutrients during cowpea seed development (Kriz et al., 1999) and are composed of several isoforms encoded by multigenic families, which are categorized based on the presence or absence of enzymatic activity (Shotwell and Larkins, 2012).

Furthermore, cupins comprise a ubiquitous protein superfamily characterized by the presence of a conserved barrel domain (Dunwell, 1998). This domain has two conserved motifs of β-strands separated by a less conserved region composed of another two β-strands with an intervening variable loop (Dunwell et al., 2000, 2001, 2002, 2003).

Cowpea 7S vicilins were found to contain two cupin_1 domains (bicupins), and β-vignins are the main representatives of this protein class (Sales et al., 1992, 2001). Cowpea β-vignins associate in trimers that form a carbohydrate-binding multiprotein complex. Each monomer possesses an oligosaccharide-interacting site that confers a specific carbohydrate-binding property on the oligomeric structure (Dunwell et al., 2002, 2003). These binding sites are located at the vertices of the triangle-shaped oligomer. The interaction between β-vignins and oligosaccharides, mainly through hydrogen bonds (a typical feature of carbohydrate-protein interactions), was suggested by computational simulations (Rocha et al., 2018).

In this study, the genome-wide database of cowpea genes was searched for genomic sequences coding for cupins, given that they represent a well-documented multigenic family of globulins. A total of seventy-seven genes belonging to the cupin superfamily were found, which were then classified into six families by phylogenetic reconstruction methods. V. unguiculata cupin genes were categorized into two groups: classical cupins (fifty-nine proteins) and bicupins (eighteen members). In addition, a search for cupin members in other related species of the Fabaceae family (V. angularis, V.
radiata and Phaseolus vulgaris) was also performed. Based on these data, a detailed characterization and comparison of the cupin genes in the species analyzed was performed with the aim of better understanding the connections and functions of cupin proteins from different, but related, plant sources.

2. Methods

2.1 Dataset

The proteome of V. unguiculata IT97K-499-35 (genome assembly v1.0), available at the Phytozome database (http://phytozome.jgi.doe.gov/) (Goodstein et al., 2012), was searched for proteins of the cupin superfamily. Furthermore, six sequences cloned by Rocha et al. (2018) were used: three from the genotype IT81D-1053 (resistant to C. maculatus) and three from the genotype EPACE-10 (susceptible to C. maculatus).

2.2 Sequence analysis

Analyses of the predicted cupin superfamily proteins and identification of the cupin domain were performed using the following web servers: the Pfam protein database 2.0 (Finn et al., 2016); HMMER, for biosequence analysis using profile hidden Markov models (Potter et al., 2018); SMART (Simple Modular Architecture Research Tool) from the EMBL server (Schultz et al., 2000); and the Conserved Domain Database (CDD) tool from NCBI (Marchler-Bauer, 2017). When applicable, only the most significant result (the hit with the lowest e-value) was considered for analysis. BioEdit 7.2 software was used for editing (insertions and deletions) of amino acid sequences. The presence or absence of a signal peptide was assessed with the SignalP 4.1 server (Petersen et al., 2011). The MEGA 7 software (Tamakura et al., 2016) was used to construct the phylogenetic tree using the Neighbor-Joining method (Saitou et al., 1987) with bootstrap values (1000 replicates).

2.3 Protein structural modeling, docking and molecular dynamics

Protein structural modeling, docking, and molecular dynamics were performed essentially as described elsewhere (Rocha et al., 2018).

3. Results and discussion

3.1 Cupin gene identification and analysis

We identified 77 gene sequences encoding proteins containing one or two copies of the cupin superfamily domain in the genome of V. unguiculata (Table S1). These sequences were grouped into six cupin families (cupin-1 to cupin-5 and cupin-8), and no sequences related to the cupin-6 and cupin-7 families were found (Table S1).

The cupin-1 domain consists of a conserved barrel structure, and members of the cupin-1 family are represented by 11S and 7S seed storage globulins (termed legumins and vicilins, respectively) and germins (Dunwell et al., 1998). Legumins and vicilins are two-domain proteins (bicupins), whereas germins are single-domain molecules (monocupins) (Rocha et al., 2018) (Figure 1). β-vignins are the most abundant vicilins found in the genome of cowpea, comprising 17 sequences (Tables S1 and S3), of which four are devoid of secretion signal peptide sequences: Vigun03g085800.1, Vigun03g085900.1, Vigun05g254700.1 and Vigun11g151800.1 (Table S2).

In a recent study based on computational simulations, Rocha et al. (2018) demonstrated the presence of two cupin-1 domains in the primary structure of several β-vignin isoforms from two V. unguiculata genotypes (EPACE-10 and IT81D-1053) that differ in their resistance to the bruchid beetle (Callosobruchus maculatus). In that study, the authors observed that β-vignin sequences presented a unique chitin-binding site (ChBS) in the N-terminal and C-terminal ends (Figure S1). Those findings revealed the presence of three ChBS, which supports the hypothesis of the interaction of V.
unguiculata β-vignins with the monosaccharide N-acetyl-D-glucosamine (GlcNAc) and possibly its oligomeric derivatives, as observed for other bicupins in the present study (Figure S1).

Cupin families 2 and 4 were represented by only one sequence with one cupin domain each.
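
The split into single-domain cupins and bicupins used throughout this section can be illustrated with a minimal sketch. It assumes that domain hits from a tool such as Pfam or HMMER have already been exported into simple records. The record layout, the function names and the protein identifiers below are hypothetical and are not taken from the study. The sketch also assumes the usual convention that a smaller e-value means a more significant hit, so for each protein region only the most significant hit is kept before domains are counted.

```python
from collections import defaultdict
from typing import NamedTuple


class DomainHit(NamedTuple):
    """One predicted domain on one protein (hypothetical record layout)."""
    protein_id: str
    domain: str
    start: int      # first residue of the hit (inclusive)
    end: int        # last residue of the hit (inclusive)
    evalue: float   # assumption: smaller means more significant


def best_nonoverlapping_hits(hits):
    """Per protein, keep the most significant hit for each distinct region."""
    kept_by_protein = defaultdict(list)
    # Visit hits from most to least significant.
    for hit in sorted(hits, key=lambda h: h.evalue):
        kept = kept_by_protein[hit.protein_id]
        # Accept the hit only if it overlaps none of the hits already kept.
        if all(hit.end < k.start or hit.start > k.end for k in kept):
            kept.append(hit)
    return kept_by_protein


def classify_cupins(hits):
    """Label each protein by how many distinct cupin domains it carries."""
    labels = {}
    for protein_id, kept in best_nonoverlapping_hits(hits).items():
        n = len(kept)
        if n == 1:
            labels[protein_id] = "monocupin"
        elif n == 2:
            labels[protein_id] = "bicupin"
        else:
            labels[protein_id] = f"{n} domains"
    return labels


# Hypothetical example: protA has two separate cupin regions plus one weaker,
# redundant hit that overlaps the first region; protB has a single region.
example_hits = [
    DomainHit("protA", "Cupin_1", 30, 180, 1e-40),
    DomainHit("protA", "Cupin_1", 250, 400, 1e-35),
    DomainHit("protA", "Cupin_1", 40, 170, 1e-5),
    DomainHit("protB", "Cupin_2", 20, 90, 1e-12),
]
labels = classify_cupins(example_hits)
```

In this example the weaker overlapping hit on protA is discarded, so protA is counted as a bicupin and protB as a monocupin. Real domain tables may need additional filtering, such as an e-value cutoff, which is left out here.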