UNIVERSIDADE FEDERAL DO CEARÁ CENTRO DE CIÊNCIAS DEPARTAMENTO DE BIOQUÍMICA E BIOLOGIA MOLECULAR PROGRAMA DE PÓS-GRADUAÇÃO EM BIOQUÍMICA

JOSÉ EDNÉSIO DA CRUZ FREIRE

POST-TRANSLATIONAL MODIFICATIONS REVEAL THAT Mo-CBP3, A CHITIN-BINDING 2S ALBUMIN FROM Moringa oleifera, IS A COMPLEX MIXTURE OF ISOFORMS

FORTALEZA 2018


Doctoral thesis presented to the Programa de Pós-Graduação em Bioquímica of the Departamento de Bioquímica e Biologia Molecular, Universidade Federal do Ceará, in partial fulfillment of the requirements for the degree of Doctor in Biochemistry. Area of concentration: Plant Biochemistry.

Advisor: Prof. Dr. Thalles Barbosa Grangeiro


APPROVED ON: 28/03/2018

EXAMINING COMMITTEE

Prof. Dr. Thalles Barbosa Grangeiro (Advisor)
Universidade Federal do Ceará (UFC)

Prof. Dr. Geancarlo Zanatta
Universidade Federal do Ceará (UFC)

Prof. Dr. Bruno Lopes de Sousa
Universidade Estadual do Ceará (UECE)

Prof. Dr. Ito Liberato Barroso Neto
Centro Universitário Unichristus

Prof. Dr. Rômulo Farias Carneiro
Universidade Federal do Ceará (UFC)

To God, for the knowledge needed to carry out this work.

To my dear and understanding wife, Cindy;

To my late and dearly missed parents, Genésio and Edite;

To my stepmother, Maria, for always helping whenever needed;

To my siblings, Benedito, Genedito, Evandito, Genedier, Efigênia and Lidiene,

I dedicate this work, with love.

ACKNOWLEDGMENTS

To Prof. Dr. Thalles Barbosa Grangeiro, for welcoming me into his laboratory and for his invaluable guidance; for the questions, explanations, solutions and example, which I will try to follow in my scientific career.

To Prof. Dr. José Edvar Monteiro Júnior, for his great contribution to this work through suggestions, explanations and co-supervision.

To Prof. Dr. Geancarlo Zanatta, Dr. Bruno Lopes de Sousa, Dr. Ito Liberato Barroso Neto and Dr. Rômulo Farias Carneiro, for accepting the invitation to serve on this committee. To all the professors of the Departamento de Bioquímica e Biologia Molecular who, in some way, contributed to this work.

To all members of the Microbial Genetics Laboratories.

To God, for the knowledge needed to carry out this work.

THANK YOU VERY MUCH!


INSTITUTIONAL ACKNOWLEDGMENTS

This work was made possible by the support of the following institutions:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES).

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq).

Fundação Cearense de Apoio ao Desenvolvimento Científico e Tecnológico (FUNCAP).

Universidade Federal do Ceará, Departamento de Bioquímica e Biologia Molecular, Centro de Ciências, and Universidade de Fortaleza, Núcleo de Biologia Experimental, Centro de Ciências da Saúde, in whose laboratories this research was carried out.


RESUMO

O estudo de albuminas 2S expressas em sementes de Moringa oleifera tem sido um processo muito complexo, especialmente quando são empregadas técnicas proteômicas convencionais, devido à presença de muitas isoformas. A grande diversidade de isoformas proteicas pode advir, em parte, da origem multigênica, além do intenso processamento pós-traducional. Neste contexto, este trabalho teve como objetivos investigar a ocorrência de novas isoformas de Mo-CBP3 e caracterizar estas proteínas quanto à presença de modificações pós-traducionais (MPTs). O trabalho empregou uma abordagem experimental que incluiu (1) clonagem e sequenciamento de fragmentos de DNA genômico codificando Mo-CBP3; (2) análise de massas moleculares por LC-ESI-MS; e (3) análise computacional, com o objetivo de correlacionar massas teóricas e experimentais. Um total de 32 clones diferentes de Mo-CBP3 foram obtidos e completamente sequenciados. Após alinhamento múltiplo das sequências de aminoácidos deduzidas, foi possível agrupá-las em oito grupos distintos. Destas, quatro isoformas foram descritas anteriormente (Mo-CBP3-1, Mo-CBP3-2, Mo-CBP3-3 e Mo-CBP3-4), enquanto as isoformas Mo-CBP3-2A (163 resíduos de aminoácidos), Mo-CBP3-2B (162 resíduos), Mo-CBP3-3A (160 resíduos) e Mo-CBP3-3B (160 resíduos) foram descritas pela primeira vez neste trabalho. Análises das sequências de aminoácidos, deduzidas de DNA genômico, sugeriram que as isoformas de Mo-CBP3 são sintetizadas como preproproteínas, contendo um peptídeo sinal N-terminal, um propeptídeo N-terminal, uma cadeia menor com cerca de 4 kDa, um peptídeo de ligação entre as duas cadeias, uma cadeia maior com aproximadamente 8 kDa e uma extensão C-terminal. Os experimentos de ESI-MS revelaram 147 valores distintos de massas moleculares, dos quais 89 correspondiam às variantes da cadeia menor e 58 às variantes da cadeia maior. Valores de massa molecular foram calculados a partir das sequências de aminoácidos, assumindo diferentes graus de processamento proteolítico nas extremidades N- e C-terminais de cada cadeia. Além disso, possíveis alterações produzidas por diferentes MPTs, como hidroxilação de prolina, fosforilação de serina ou treonina, oxidação de metionina e ciclização de glutamina N-terminal em piroglutamato (pGlu), foram também levadas em consideração. Usando essa estratégia, a maioria das massas experimentais da cadeia menor (72 de um total de 89) e quase todas as massas da cadeia maior (57 de um total de 58) puderam ser atribuídas a sequências específicas de aminoácidos das diferentes isoformas de Mo-CBP3. Estes resultados sugerem que: i) as isoformas de Mo-CBP3 são oriundas, primordialmente, de membros proximamente relacionados de uma família multigênica; ii) uma vez processados, os precursores codificados a partir dos diferentes mRNAs originam duas cadeias, uma menor (4 kDa) e outra maior (8 kDa), unidas por pontes dissulfeto; as extremidades N- e C-terminais de ambas as cadeias podem sofrer processamento proteolítico, resultando na remoção de um ou alguns resíduos de cada sequência e gerando cadeias com massas ligeiramente distintas; iii) modificações pós-traducionais em determinados resíduos de uma das cadeias polipeptídicas ou de ambas são uma terceira fonte de variação na massa molecular, aumentando substancialmente o número de isoformas. Em conclusão, uma mistura extremamente complexa de isoformas de Mo-CBP3, oriundas de alguns poucos genes, é produzida mediante diferentes combinações de MPTs distintas. Estes resultados constituem importantes avanços na compreensão dos mecanismos pós-traducionais que operam durante a biossíntese das albuminas 2S nas sementes de M. oleifera.

Palavras-chave: Moringa oleifera. Modificação pós-traducional. Albuminas 2S. Mo-CBP3.


ABSTRACT

The study of 2S albumins expressed in Moringa oleifera seeds has been a very complex process, especially when conventional proteomic techniques are used, because of the presence of many isoforms. This great diversity of isoforms may be due in part to their multigenic origin, in addition to intense post-translational processing. In this context, the aim of this work was to investigate the occurrence of new Mo-CBP3 isoforms and to characterize them for the presence of post-translational modifications (PTMs). The work employed an experimental approach that included (1) cloning and sequencing of genomic DNA fragments encoding Mo-CBP3; (2) molecular mass analysis by LC-ESI-MS; and (3) computational analysis, with the objective of correlating theoretical and experimental masses. A total of 32 different Mo-CBP3 clones were obtained and completely sequenced. After multiple alignment of the deduced amino acid sequences, it was possible to sort them into eight distinct isoform groups. Four of these isoforms had been described previously (Mo-CBP3-1, Mo-CBP3-2, Mo-CBP3-3 and Mo-CBP3-4), whereas the isoforms Mo-CBP3-2A (163 amino acid residues), Mo-CBP3-2B (162 residues), Mo-CBP3-3A (160 residues) and Mo-CBP3-3B (160 residues) are reported for the first time in this work. Analysis of the amino acid sequences, deduced from genomic DNA, suggested that Mo-CBP3 isoforms are synthesized as preproproteins containing an N-terminal signal peptide, an N-terminal propeptide, a small chain of about 4 kDa, a linker peptide between the two chains, a large chain of approximately 8 kDa, and a C-terminal extension. ESI-MS experiments revealed 147 distinct molecular mass values, of which 89 corresponded to variants of the small chain and 58 to variants of the large chain. Molecular mass values were calculated from the amino acid sequences, assuming different degrees of proteolytic processing at the N- and C-terminal ends of each chain. In addition, possible changes produced by different PTMs, such as proline hydroxylation, serine or threonine phosphorylation, methionine oxidation, and cyclization of N-terminal glutamine into pyroglutamate (pGlu), were also taken into account. Using this strategy, most of the experimental masses of the small chain (72 out of 89) and almost all experimental masses of the large chain (57 out of 58) could be assigned to specific amino acid sequences of the different Mo-CBP3 isoforms. These results suggest that: i) Mo-CBP3 isoforms are derived primarily from closely related members of a multigene family; ii) once processed, the precursors encoded by the different mRNAs give rise to two chains, one smaller (4 kDa) and one larger (8 kDa), joined by disulfide bonds; the N- and C-terminal ends of both chains can undergo proteolytic processing, resulting in the removal of one or a few residues from each sequence and yielding chains with slightly different masses; iii) post-translational modifications of certain residues in one or both polypeptide chains are a third source of variation in molecular mass, substantially increasing the number of isoforms. In conclusion, an extremely complex mixture of Mo-CBP3 isoforms, derived from a few genes, is produced by different combinations of distinct PTMs. These results constitute important advances in understanding the post-translational mechanisms that operate during the biosynthesis of 2S albumins in seeds of M. oleifera.
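The mass-matching strategy summarized in the abstract can be illustrated with a minimal sketch. It is not the pipeline used in this work: the function names, the toy sequence, the tolerance and the trimming depth are hypothetical, and the residue masses are standard average values chosen here as an assumption. Only terminal trimming and N-terminal pyroglutamate formation are enumerated; other PTMs would add their own mass shifts in the same way (roughly +16.00 Da for hydroxylation or oxidation, +79.98 Da for phosphorylation), and the redox state of the cysteines is ignored.

```python
from itertools import product

# Standard average residue masses in Da (an assumption: the text does not
# state which mass table was used).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528      # one water per free polypeptide chain
PGLU_SHIFT = -17.0305  # loss of NH3 when N-terminal Gln cyclizes to pGlu


def chain_mass(seq):
    """Average mass of an unmodified linear chain."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def candidate_masses(seq, max_trim=2):
    """Yield (label, mass) for every N-/C-terminal trimming up to max_trim
    residues, with and without pGlu when the new N-terminus is Gln."""
    for n_cut, c_cut in product(range(max_trim + 1), repeat=2):
        trimmed = seq[n_cut:len(seq) - c_cut]
        if not trimmed:
            continue
        base = chain_mass(trimmed)
        label = f"-{n_cut}N/-{c_cut}C"
        yield label, base
        if trimmed[0] == "Q":
            yield label + " +pGlu", base + PGLU_SHIFT


def assign(experimental, seq, tol=1.0, max_trim=2):
    """Return the candidates whose theoretical mass lies within tol Da
    of an experimental mass."""
    return [
        (label, round(mass, 2))
        for label, mass in candidate_masses(seq, max_trim)
        if abs(mass - experimental) <= tol
    ]


# Toy tripeptide, not a Mo-CBP3 sequence.
print(assign(257.25, "QGA", tol=0.05, max_trim=0))
```

Enumerating candidates first and matching afterwards keeps the two assumptions (which variants are plausible, and how close a match must be) separate, so either can be changed without touching the other.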

Keywords: Moringa oleifera. Post-translational modification. 2S albumins. Mo-CBP3


CONTENTS

1 GENERAL INTRODUCTION
2 THEORETICAL BACKGROUND
3 HYPOTHESIS
4 OBJECTIVES
5 POST-TRANSLATIONAL MODIFICATIONS REVEAL THAT Mo-CBP3, A 2S MORINGA OLEIFERA CHITIN-BINDING ALBUMIN, IS A COMPLEX MIXTURE OF ISOFORMS
6 FINAL CONSIDERATIONS
REFERENCES
APPENDIX - SUPPLEMENTARY MATERIAL: POST-TRANSLATIONAL MODIFICATIONS REVEAL THAT Mo-CBP3, A 2S MORINGA OLEIFERA CHITIN-BINDING ALBUMIN, IS A COMPLEX MIXTURE OF ISOFORMS

1 GENERAL INTRODUCTION

Moringa oleifera is a plant of the family Moringaceae, native to the sub-Himalayan region of northwestern India. It is known worldwide by several common names, including drumstick tree (RAMACHANDRAN; PETER; GOPALAKRISHNAN, 1980) and miracle tree, the latter owing to its many beneficial properties, including its nutritional potential (SAHAY; YADAV; SRINIVASAMURTHY, 2017; KOU et al., 2018; SREELATHA; JEYACHITRA; PADMA, 2011). In Brazil, notably in the Northeast region, M. oleifera was introduced around the 1950s and has been widely used, especially to purify water for domestic use, because its seeds contain natural flocculant molecules (VICENTE et al., 2017; ABIYU et al., 2018). In addition, M. oleifera seeds are of great importance to the food industry, since they contain a large amount of good-quality oil, whose main constituent is oleic acid, at up to 78.7% (LEONE et al., 2016). Another reason why M. oleifera has attracted the attention of many researchers is its deleterious effects on viruses (LIPIPUN et al., 2003), bacteria (ELGAMILY et al., 2016), fungi (ELGAMILY et al., 2016; NETO et al., 2017), arthropods (FERREIRA et al., 2009) and nematodes (ONYEKE; AKUESHI, 2012). Among the antifungal molecules identified in M. oleifera, the chitin-binding proteins generically called Mo-CBP (chitin-binding protein from Moringa oleifera) stand out, in particular Mo-CBP3, a thermostable protein with flocculant activity that inhibits the phytopathogenic fungi Fusarium solani, F. oxysporum, Colletotrichum musae and C. gloeosporioides (GIFONI et al., 2012), and Mo-CBP2, which inhibits Candida albicans, C. parapsilosis, C. krusei and C. tropicalis (NETO et al., 2017).

Besides its antifungal potential, Mo-CBP3 is stable over a wide range of temperature and pH. According to Freire et al. (2015), Mo-CBP3 comprises the products of a multigene family, since four isoforms are currently known: Mo-CBP3-1, Mo-CBP3-2, Mo-CBP3-3 and Mo-CBP3-4. Although it was identified as a chitin-binding protein, further studies have shown that this protein is a genuine member of the 2S albumin family (FREIRE et al., 2015). The term "albumin" was a generic name given to relatively small (1.7S – 2.2S), water-soluble plant storage proteins rich in cysteine and glutamine residues in conserved regions (YOULE; HUANG, 1981; KREBBERS et al., 1988; MYLNE; HARA-NISHIMURA; ROSENGREN, 2014), which also share features that include cationic residues, disulfide bonds and antimicrobial peptides (NAWROT et al., 2014). Many of these proteins are synthesized as pre-pro-albumins, which contain an N-terminal sequence (signal peptide), an N-terminal extension that separates the signal peptide from the small chain, a linker (a short amino acid sequence that connects the two polypeptide chains) and a C-terminal sequence. These regions are absent from the mature protein, as exemplified by pre-pro-Mabinlin-II from Capparis masaikai (NIRASAWA et al., 1993) and pre-pro-Mo-CBP3 from M. oleifera (FREIRE et al., 2015). Although 2S albumins have precursors of different sizes, their structures are highly conserved, with stability maintained by eight cysteine residues linked by disulfide bonds (..C..C../..CC..CXC..C..C..) (PANTOJA-UCEDA et al., 2002). In general, 2S albumin structures are organized into four α-helices, a pattern maintained by four disulfide bonds, similar to that found in bifunctional α-amylase/trypsin inhibitors and in non-specific lipid transfer proteins (nsLTPs) (CÂNDIDO et al., 2011; EDSTAM et al., 2011; NAWROT et al., 2014). However, some 2S albumins have been reported to show only three α-helices, such as Ber e 1 (PDB: 2LFV) from Bertholletia excelsa, whereas others may have up to five α-helices, as observed in napin BnIb (PDB: 1PNB) from Brassica napus (RICO et al., 1996).

Despite advances in its characterization, the mode of action of Mo-CBP3 is not completely understood. Mo-CBP3 is thought to negatively affect the activity of proton pumps (H+-ATPase) in the cell membrane and to cause deformations in the cell wall of phytopathogenic fungi (BATISTA et al., 2014), such as F. solani, F. oxysporum, Colletotrichum musae and C. gloeosporioides (GIFONI et al., 2012). Mass spectrometry and SDS-PAGE results indicated that Mo-CBP3 is formed by two polypeptide chains of about 4.1 kDa and 8.1 kDa (BATISTA et al., 2014; FREIRE et al., 2015). Although recent studies have sought answers about the mechanism of action of this protein, considerable effort is still needed to describe all existing Mo-CBP3 isoforms and to understand the physiological role of each one, given the great physiological importance of albumins during plant development and the defense mechanisms in which they are involved. Although there are Mo-CBP3 isoforms with point differences in their amino acid sequences, it is plausible that many others result from post-translational modifications, including oxidation, phosphorylation and hydroxylation of particular residues.


2 THEORETICAL BACKGROUND

Moringa oleifera Lamarck

General considerations

Moringa oleifera (Figure 1) is a plant of the family Moringaceae, native to the sub-Himalayan region of northwestern India. It is known worldwide as the drumstick tree (RAMACHANDRAN; PETER; GOPALAKRISHNAN, 1980; GANGULY; GUHA, 2008), and also as horseradish (SHIH et al., 2011) or miracle tree, the latter owing to its many beneficial attributes, including its nutritional potential (SAHAY; YADAV; SRINIVASAMURTHY, 2017; KOU et al., 2018; SREELATHA; JEYACHITRA; PADMA, 2011).

FIGURE 1 – Morphological characteristics of Moringa oleifera.

Source: prepared by the author.

In Brazil, especially in the Northeast region, M. oleifera was introduced around the 1950s and has been widely used in several ways, notably as an ornamental in parks and gardens, in animal feed, as a human dietary supplement and in medicine (VIEIRA; CHAVES; VIÉGAS, 2008). According to Gallão et al. (2006), cultivation of M. oleifera has expanded throughout the Brazilian semi-arid region because of its use in treating water for domestic use. The genus Moringa comprises thirteen species (RANI; HUSAIN; KUMOLOSASI, 2018), which have been widely studied because of the flocculant properties of their seeds and, consequently, their application in purifying water for human and animal consumption. Many species of the genus Moringa, especially M. oleifera, are also characterized by resistance to different pest species, ease of cultivation and rapid growth (RAMACHANDRAN; PETER; GOPALAKRISHNAN, 1980).

Uses of Moringa oleifera

Because of the flocculant activity of M. oleifera seeds, this species is of great relevance, above all in underdeveloped countries, where its seeds are commonly used to purify water for domestic use (VICENTE et al., 2017; ABIYU et al., 2018). The flocculant properties of M. oleifera seeds have been attributed to 3.0 kDa polyelectrolytes, globulins and albumins (OKUDA et al., 2001; ZAKU et al., 2015; BAPTISTA et al., 2017). Other studies have linked this property to cationic proteins with molecular masses between 6.5 and 13.0 kDa (GASSENSCHMIDT et al., 1995; NDABIGENGESERE; NARASIAH; TALBOT, 1995; GHEBREMICHAEL et al., 2005; GOYAL et al., 2007), or to polypeptides (CHEN, 2009; MANGALE; CHONDE; RAUT, 2012). In addition, M. oleifera seeds are of industrial importance because they contain a large amount of good-quality oil, whose main constituent is oleic acid, at up to 78.7% (LEONE et al., 2016).

Beyond the attributes discussed above, M. oleifera has been used for centuries in folk medicine on several continents. For this reason, many studies have been designed to isolate molecules with biotechnological potential from this plant. In a study by Lipipun et al. (2003), an ethanolic extract of M. oleifera leaves was shown to have antiviral activity against HSV-1 (herpes simplex virus type 1), inhibiting viral development in vitro and reducing the mortality of mice previously infected with HSV-1. For decades, antimicrobial agents have been identified and isolated from M. oleifera seeds. Both ethanolic and aqueous extracts inhibited the growth of the microorganisms Bacillus subtilis, Escherichia coli, Pasteurella multocida, Pseudomonas aeruginosa, Staphylococcus aureus, S. mutans and Vibrio cholerae (CÁCERES et al., 1991; JABEEN et al., 2008; VIERA; CHAVES; VIÉGAS, 2010; ELGAMILY et al., 2016). Adverse effects on fungi have been demonstrated for Basidiobolus haptosporus, B. ranarum, Candida albicans, Colletotrichum gloeosporioides, C. musae, Fusarium solani, F. oxysporum, Microsporum canis, Rhizopus solani, Trichophyton rubrum and T. mentagrophytes (NWOSU; OKAFOR, 1995; JABEEN et al., 2008; ROCHA et al., 2011; GIFONI et al., 2012; BATISTA et al., 2014; ELGAMILY et al., 2016; NETO et al., 2017). Deleterious effects on the mosquito Aedes aegypti (FERREIRA et al., 2009) and nematicidal activity against Meloidogyne incognita (ONYEKE; AKUESHI, 2012) have also been observed.

Most of the properties discussed above have been attributed to molecules isolated from M. oleifera seeds. According to Gallão et al. (2006), every 100 g of moringa seed flour contains 39.3 g of protein. It is therefore possible that many classes of proteins expressed in M. oleifera seeds are directly related to the various properties attributed to this part of the plant. Among the proteins already purified from moringa seeds is a cationic protein capable of flocculating particles suspended in water (AGRAWAL; SHEE; SHARMA, 2007). Makkar and Becker (1997) suggested that the antinutritional properties of the seeds of this species may be associated with a lectin. This hypothesis was corroborated by the identification of two basic hemagglutinating lectins, cMoL (30 kDa) and WSMoL (20 kDa), which are thermostable and active in the pH ranges 4.0-9.0 and 4.5-9.5, respectively (SANTOS et al., 2009; ROLIM et al., 2011).

A chitin-binding protein called Mo-CBP4 showed anti-inflammatory and antinociceptive properties (PEREIRA et al., 2011). Subsequently, another protein, named Mo-CBP3, was described; it is thermostable, has flocculant activity and inhibits the phytopathogenic fungi Fusarium solani, F. oxysporum, Colletotrichum musae and C. gloeosporioides (GIFONI et al., 2012). More recently, a protein called Mo-CBP2 was characterized, which inhibits Candida albicans, C. parapsilosis, C. krusei and C. tropicalis. Because of its antifungal potential and its stability over a wide range of temperature and pH, Mo-CBP3 may be useful in the development of antifungal drugs. It comprises, however, the products of a multigene family (FREIRE et al., 2015), and four isoforms are currently known: Mo-CBP3-1, Mo-CBP3-2, Mo-CBP3-3 and Mo-CBP3-4. Although it was identified as a chitin-binding protein, further studies have shown that it is a genuine member of the 2S albumin family, as shown by the presence of an eight-cysteine motif, which is characteristic of the prolamin superfamily (FREIRE et al., 2015).


Plant 2S albumins

General considerations

The term "albumin" was originally a generic name for any water-soluble protein (MYLNE; HARA-NISHIMURA; ROSENGREN, 2014). These proteins have also been called arabins, because of their similarity to napin (Brassica napus), the first 2S albumin described (HEATH et al., 1986). Today, 2S albumins constitute a class of small (1.7S – 2.2S) plant storage proteins, rich in cysteine and glutamine residues in conserved regions and generally classified on the basis of their sedimentation coefficient (YOULE; HUANG, 1981; KREBBERS et al., 1988); they also share features that include cationic residues, disulfide bonds and antimicrobial peptides (NAWROT et al., 2014). These proteins belong to the prolamin superfamily and are the main proteins in seeds and hard-shelled fruits, being widely distributed in monocotyledonous and dicotyledonous plants (BREITENEDER; RADAUER, 2004; GUPTA; GAUR; SALUNKE, 2008). As storage proteins, 2S albumins are stored in protein bodies (protein storage vacuoles, PSVs) and serve as a source of nutrients (amino acids) and carbon skeletons during germination and seedling growth (YOULE; HUANG, 1981; LI et al., 2006; CANDIDO et al., 2011). The transport of storage proteins to PSVs is thought to be regulated by proteolytic enzymes present in seeds (TAN-WILSON; WILSON, 2012), based on the subcellular co-localization of the enzymes and the storage proteins, as well as on conditions suitable for the activity of the proteolytic enzymes (WILSON et al., 2016).

According to Wang and Bunkers (2000), plants may use storage proteins to generate antimicrobial proteins that protect them against attack by their natural enemies, especially during seed germination, when seedlings are vulnerable because they lack a more organized inducible defense system.

Biosynthesis of 2S albumins

In general, 2S albumins are encoded by multigene families and are formed from precursors of 18-21 kDa that usually undergo extensive proteolytic processing in vacuoles, yielding two subunits, the larger of 8-14 kDa and the smaller of 3-10 kDa (SHEWRY; NAPIER; TATHAM, 1995; MORENO; CLEMENTE, 2008; NAWROT et al., 2014). Many of these proteins are synthesized as pre-pro-albumins, which contain an N-terminal sequence (signal peptide), an N-terminal extension that separates the signal peptide from the small chain, a linker (a short amino acid sequence that connects the two polypeptide chains) and a C-terminal sequence. These regions are absent from the mature protein, as exemplified by pre-pro-Mabinlin-II from Capparis masaikai (NIRASAWA et al., 1993) and pre-pro-Mo-CBP3 from M. oleifera (FREIRE et al., 2015). A general scheme is shown in Figure 2.

FIGURE 2 - General scheme of a pre-pro-albumin. Labeled segments: signal peptide, N-terminal extension, small chain, inter-chain linker peptide, large chain and C-terminal extension.

Source: prepared by the author.
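The segment order of a pre-pro-albumin can also be expressed as a small data structure. The sketch below is only illustrative: the segment lengths and the toy sequence are hypothetical, not Mo-CBP3 data, and real processing sites would have to come from sequence analysis rather than fixed lengths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    length: int       # residues; the values used below are hypothetical
    in_mature: bool   # True only for the two chains kept after processing


def split_precursor(sequence, layout):
    """Cut a precursor into named segments, from N- to C-terminus."""
    if sum(seg.length for seg in layout) != len(sequence):
        raise ValueError("segment lengths must add up to the sequence length")
    pieces, start = {}, 0
    for seg in layout:
        pieces[seg.name] = sequence[start:start + seg.length]
        start += seg.length
    return pieces


def mature_chains(sequence, layout):
    """Return only the segments retained in the mature protein."""
    pieces = split_precursor(sequence, layout)
    return [pieces[seg.name] for seg in layout if seg.in_mature]


# Segment order as described in the text; lengths are made up for the example.
LAYOUT = [
    Segment("signal_peptide", 3, False),
    Segment("n_terminal_extension", 2, False),
    Segment("small_chain", 4, True),
    Segment("linker", 2, False),
    Segment("large_chain", 5, True),
    Segment("c_terminal_extension", 1, False),
]

toy_precursor = "MKLAPQCGREEQQCCRS"  # toy sequence, 17 residues
print(mature_chains(toy_precursor, LAYOUT))
```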

Among the events involved in the maturation of 2S albumins in seeds, the N-terminal sequence stands out, since it is responsible for directing these still-inactive proteins to the storage vacuole, which is therefore a general mechanism of ontogeny in these organs (VITALE; GALILI, 2001). The targeting of 2S albumins to storage vacuoles is probably due to the greater abundance of these vacuoles compared with lytic vacuoles (MARTINOIA; MAESHIMA; NEUHAUS, 2007). As for the metabolic events involved, processing appears in many cases to be very costly, since the precursors have a high molecular mass. An example is the 2S albumin of Helianthus annuus (HaG5), with a molecular mass of 38 kDa, which is cleaved into two peptides of approximately 19 kDa (ALLEN et al., 1987). In seeds of Ricinus communis, a 2S albumin precursor (2S ASP-Ib) with a molecular mass of 29.3 kDa can, after proteolytic processing, give rise to up to two isoforms, with small subunits of 4 kDa and 4.3 kDa and large subunits of about 7 kDa and 8 kDa, respectively (SHARIEF; LI, 1982; IRWIN et al., 1990; SILVA et al., 1996). There are, however, plants that appear to have developed mechanisms to avoid this energy loss during albumin synthesis by producing much smaller precursors, such as the albumins Ses i 1 and Ses i 2 of Sesamum indicum, whose known precursors have molecular masses close to 17.5 kDa (AF091841) and 17.4 kDa (ID: AF240005), respectively (TAI et al., 1999; TAI et al., 2001; MORENO et al., 2005). Proteolytic processing of Ses i 1 and Ses i 2 yields two heterodimers: mature Ses i 1 has a small chain of about 4 kDa and a large chain of about 8 kDa, and mature Ses i 2 has a small chain of about 4.9 kDa and a large chain of about 8.7 kDa (TAI et al., 1999; MORENO et al., 2005). Malva parviflora is another plant species with mechanisms that avoid energy loss during synthesis of the 2S albumin CW-1, whose processing yields two subunits of 5 kDa and 3 kDa (WANG; BUNKERS, 2000). There are also 2S albumins that consist of a single polypeptide (KHAN et al., 2016). It is possible that smaller 2S albumin precursors give rise to only one subunit, as in the case of the 2S albumin SFA-8, isolated from H. annuus seeds, which has a molecular mass of 12.1 kDa (KORTT; CALDWELL, 1990).

More recent research, however, indicates that some 2S albumins are synthesized from a precursor of approximately 13 kDa (HUMMEL; WIGGER; BROCKMEYER, 2015), resulting in mature proteins composed of a large subunit of about 9 to 10 kDa and a small subunit of about 3 to 4 kDa. Although 2S albumins have precursors of different sizes, their structures are highly conserved, with stability maintained by eight cysteine residues linked by disulfide bonds (..C..C../..CC..CXC..C..C..) (PANTOJA-UCEDA et al., 2002), as can be seen in Figure 3. The post-translational modifications that occur include cleavage of the signal peptide and removal of an additional N-terminal fragment, an internal segment and a few C-terminal residues (CASTRO et al., 1987; SCOFIELD; CROUCH, 1987; KREBBERS et al., 1988).


FIGURE 3 - Schematic representation of the pattern of disulfide bonds formed between the eight conserved cysteine residues in the 2S albumin family.

Source: prepared by the author.
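The conserved cysteine skeleton (..C..C../..CC..CXC..C..C..) lends itself to a simple pattern check. The sketch below is one possible reading of that motif, stated as an assumption: exactly two cysteines in the small chain, and in the large chain a CC pair, a CXC triplet and two further cysteines, with at least one non-cysteine residue between those blocks. The function name and the toy sequences are hypothetical.

```python
import re

# Small chain: two cysteines separated by at least one other residue.
SMALL_CHAIN = re.compile(r"[^C]*C[^C]+C[^C]*")
# Large chain: CC, then CXC, then two single cysteines, in that order.
LARGE_CHAIN = re.compile(r"[^C]*CC[^C]+C[^C]C[^C]+C[^C]+C[^C]*")


def has_eight_cysteine_motif(small_chain, large_chain):
    """True if both chains carry the cysteine skeleton
    ..C..C../..CC..CXC..C..C.. and no additional cysteines."""
    return (SMALL_CHAIN.fullmatch(small_chain) is not None
            and LARGE_CHAIN.fullmatch(large_chain) is not None)


# Toy chains (alanine spacers), not real 2S albumin sequences.
print(has_eight_cysteine_motif("AACAAACAA", "AACCAAACACAAACAAACAA"))
```

Using `fullmatch` with `[^C]` spacers means that a chain with a missing or extra cysteine is rejected, which is stricter than searching for the motif anywhere in the sequence.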

Some 2S albumins have been reported to be structurally organized into four α-helices, a pattern maintained by four disulfide bonds, similar to that found in bifunctional α-amylase/trypsin inhibitors and in non-specific lipid transfer proteins (nsLTPs) (CÂNDIDO et al., 2011; EDSTAM et al., 2011; NAWROT et al., 2014). However, to date, except for the 2S albumin Ber e 1 (PDB: 2LFV) from Bertholletia excelsa, which has three α-helices (RUNDQVIST et al., 2012), all solved 2S albumin structures (Figure 4) have five α-helices, as observed in the proteins napin BnIb (PDB: 1PNB) from Brassica napus (RICO et al., 1996), RicC3 (PDB: 1PSY) from Ricinus communis (PANTOJA-UCEDA et al., 2003), napin BnIb (PDB: 1SM7) from Brassica napus (PANTOJA-UCEDA et al., 2004a), SFA-8 (PDB: 1S6D) from Helianthus annuus (PANTOJA-UCEDA et al., 2004b), Ara h 2 (PDB: 1W2Q) from Arachis hypogaea (LEHMANN et al., 2006), Mabinlin II (PDB: 2DS2) from Capparis masaikai (LI et al., 2008) and Mo-CBP3-4 (PDB: 5DOM) from Moringa oleifera (ULLAH et al., 2015).

Biological activities already described for 2S albumins

The diverse biological functions determined so far for 2S albumins may be due, in part, to the existence of several isoforms, especially in seeds of non-leguminous plants (in these plants, vicilins are the main storage proteins), as shown, for example, in Arabidopsis thaliana (VAN DER KLEI et al., 1993), Bertholletia excelsa (MORENO et al., 2004), S. indicum (ORRUÑO; MORGAN, 2011), Corylus avellana (PFEIFER et al., 2015), M. oleifera (FREIRE et al., 2015) and A. hypogaea (BOUALEG; BOUTEBBA, 2017). Consequently, isolating a specific albumin has proven to be a complex process when conventional techniques are used, owing to the presence of many isoforms. It is plausible that this diversity of forms results, in part, from their multigene origins and from the intense post-translational processing that these proteins undergo (CÂNDIDO et al., 2011; PFEIFER et al., 2015). In addition to these factors, environmental conditions, including temperature, humidity and salinity, may induce allelic variation in protein sequences among plant genotypes, resulting in a heterogeneous protein pool when the sample is not of genetically uniform origin. In many cases, it is necessary to clone and express the 2S albumins present in a given sample and then characterize them individually, as was done for the albumins of B. excelsa (GANDER et al., 1991), Glycine max (LIN et al., 2006) and C. avellana (GARINO et al., 2010b).

Antifungal potential of 2S albumins

For decades, studies have demonstrated the strong potential of 2S albumins against fungal species. Notable examples isolated from plant seeds are listed below by species.

A. hypogaea: The heterodimeric albumin 2S-1 can exist in at least three isoforms (DUAN et al., 2013) and withstands temperatures from 30 to 100 °C for 10 minutes and pH values from 2.0 to 11.0. It inhibits the fungus Aspergillus flavus (DUAN et al., 2013).

Capsicum annuum: The 2S albumin Ca-Alb is structurally characterized as a heterodimer (4 kDa and 9 kDa) and is active against the yeasts Candida tropicalis, Kluyveromyces marxianus and Saccharomyces cerevisiae (RIBEIRO et al., 2011a).

Cucurbita maxima: An unnamed 2S albumin showed antifungal activity against F. oxysporum (TOMAR et al., 2014a).


FIGURE 4 - Solved structures of 2S albumins. (A) 2LVF from Bertholletia excelsa, (B) 1PNB from Brassica napus, (C) 1PSY from Ricinus communis, (D) 1SM7 from Brassica napus, (E) 1S6D from Helianthus annuus, (F) 1W2Q from Arachis hypogaea, (G) 2DS2 from Capparis masaikai and (H) 5DOM from Moringa oleifera. α-Helices of the small chain are shown in red and those of the large chain in blue.

Source: prepared by the author.


H. annuus: A pool of albumins identified as Ha-AF15 reduced the growth of Sclerotinia sclerotiorum and F. solani but was unable to completely inhibit spore germination of these fungi (REGENTE; DE LA CANAL, 2001).

M. parviflora: Two 2S albumins, CW-1 and CW-4, have been identified to date. CW-1 is a heterodimeric protein with a large chain of about 6 kDa and a small chain of about 3 kDa (WANG; BUNKERS, 2000). Further analyses showed that CW-1 tolerates variations in salt concentration and has deleterious effects on the fungus F. graminearum (WANG; BUNKERS, 2000). According to Wang et al. (2001), CW-4 has a polypeptide chain of 5 kDa and remains active against Phytophthora infestans even after incubation at 100 °C for 20 minutes.

M. oleifera: Several 2S albumins have been identified, all of them heterodimeric, with polypeptide chains of approximately 8.1 kDa and 4.1 kDa (FREIRE et al., 2015). These isoforms are thermostable, remaining active after incubation at 100 °C for 1 h (GIFONI et al., 2012). They are currently designated Mo-CBP3-1, Mo-CBP3-2, Mo-CBP3-3, Mo-CBP3-4 (FREIRE et al., 2015) and Mo-CBP2 (NETO et al., 2017). These proteins inhibit F. solani, F. oxysporum, C. musae and C. gloeosporioides (GIFONI et al., 2012; BATISTA et al., 2014).

Passiflora alata: To date, only the 2S albumin Pa-AFP1 has been identified. It consists of two polypeptide chains, a large chain of 7 kDa and a small chain of 4.5 kDa, and inhibits the fungus C. gloeosporioides (RIBEIRO et al., 2011b).

P. edulis f. flavicarpa: Among the 2S albumins purified from this plant, three heterodimeric proteins stand out: Pf1 (3.5 kDa and 8 kDa), Pf2 (3.5 kDa and 8 kDa) and Pf-Alb (5 kDa and 9 kDa) (AGIZZIO et al., 2003; RIBEIRO et al., 2011a). In an in vitro assay, Pf1 and Pf2, individually and in combination, inhibited the growth of the fungi F. oxysporum and F. solani and of the yeast S. cerevisiae, and strongly inhibited, in a dose-dependent manner, the glucose-stimulated acidification of the medium by F. oxysporum (AGIZZIO et al., 2003). Pf-Alb had deleterious effects on the yeasts C. albicans, C. guilliermondii, C. parapsilosis, C. tropicalis, Kluyveromyces marxianus and Pichia membranifaciens (RIBEIRO et al., 2011a).


Putranjiva roxburghii: The 2S albumin putrin has two polypeptide chains, a large chain of about 7.5 kDa and a small chain of about 4.5 kDa. Antifungal assays showed that putrin is active against Aspergillus flavus, F. oxysporum and Phanerochaete chrysosporium (TOMAR et al., 2014b).

Raphanus sativus: Known as Rs-2Ss (Rs-2S1 to Rs-2S5), the isoforms in this group consist of two polypeptide chains, a large chain of about 10 kDa and a small chain of about 4 kDa (TERRAS et al., 1992). Assays have shown that the protein pool formed by these albumins, even after treatment at 100 °C for 10 minutes, inhibits the following species: Alternaria brassicola, Ascochyta pisi, Botrytis cinerea, C. lindemuthianum, F. culmorum, F. oxysporum f.sp. lycopersici, F. oxysporum f.sp. pisi, Mycosphaerella fijiensis var. fijiensis, Nectria haematococca, Phoma betae, Phytophthora infestans, Pyricularia oryzae, Trichoderma hamatum and Verticillium dahliae (TERRAS et al., 1992; TERRAS et al., 1993).

Taraxacum officinale: Five 2S albumin isoforms have been identified, but antifungal activity was observed only for To-A1, To-A2 and To-A3, all of them heterodimeric, with a large chain of about 10 kDa and a small chain of about 4 kDa (ODINTSOVA et al., 2010). The antifungal assays showed that the To-A isoforms differ in their effects: To-A1 inhibited germination of Helminthosporium sativum (Bipolaris sorokiniana) and Phoma betae; To-A2 had the same effect on H. sativum, P. betae and F. oxysporum; and To-A3 was active against H. sativum, P. betae and Verticillium albo-atrum (ODINTSOVA et al., 2010).

Although studies of the inhibitory activity of 2S albumins against fungi have generally focused on phytopathogenic fungi, these proteins probably show the same functional property when tested against fungi of clinical interest. This hypothesis is supported by some 2S albumins, for example Mo-CBP2, a protein purified from M. oleifera seeds, which inhibited the growth of fungi of clinical relevance, with activity against C. albicans, C. parapsilosis, C. krusei and C. tropicalis (RIBEIRO et al., 2011a; NETO et al., 2017).


Antibacterial potential of 2S albumins

Over the last ten years, studies have also shown that some 2S albumins have antibacterial activity. These include the 2S albumin SiAMP2, isolated from Sesamum indicum seeds, which inhibited species of the genus Klebsiella (MARIA-NETO et al., 2011); the 2S albumin putrin, isolated from Putranjiva roxburghii seeds, which was active against Bacillus subtilis and Micrococcus flavus (TOMAR et al., 2014b); and the albumin Rc-2S-Alb, isolated from R. communis, which was active against B. subtilis, Klebsiella pneumoniae and Pseudomonas aeruginosa (SOUZA et al., 2016).

Allergenic potential of 2S albumins

For decades, many plant seed proteins have been identified as potential allergens, and many of these are albumins. Among nuts and seeds in which albumins are the main allergenic molecules, noteworthy examples include Sin a I from Sinapis alba (MENÉNDEZ-ARIAS et al., 1988), SFA-8 from H. annuus (PANTOJA-UCEDA et al., 2004b), Ana o 3 from Anacardium occidentale (ROBOTHAM et al., 2005), Pis v 1 from Pistacia vera (AHN et al., 2009), Cor a 14 from C. avellana (GARINO et al., 2010; PFEIFER et al., 2015), Car i 1 from Carya illinoinensis (SHARMA et al., 2011), and Gly m 5 and Gly m 6 from G. max (EBISAWA et al., 2013), all of which have induced serious local and/or systemic adverse effects.

Potential nuclease activity of 2S albumins

Many proteins are capable of degrading nucleic acids. Some degrade them completely, whereas others only cleave them at specific sites. In this context, some 2S albumins appear to have diverged toward this biological function. Albumins with nuclease activity include the unnamed 2S albumin from C. maxima seeds, which fragments DNA and RNA (FANG et al., 2010; TOMAR et al., 2014a); the 2S albumin putrin from P. roxburghii seeds, which cleaves both DNA and RNA (TOMAR et al., 2014b); the 2S albumin Mo-CBP2 from M. oleifera seeds, which fragments DNA (NETO et al., 2017); and, more recently, the 2S albumin WTA, purified from Wrightia tinctoria seeds, which showed DNase activity (SHARMA et al., 2017).

Other activities reported for 2S albumins

Anticancer activity has also been reported for this group of proteins. It was described for the unnamed 2S albumin from C. maxima, which showed strong activity against breast cancer (MCF-7), ovarian teratocarcinoma (PA-1), prostate cancer (PC-3 and DU-145) and hepatocellular carcinoma (HepG2) cell lines (TOMAR et al., 2014a). Some 2S albumins can permeabilize cell membranes by forming pores, an effect already described for Pf1 and Pf2 from seeds of Passiflora edulis f. flavicarpa (AGIZZIO et al., 2006) and for Mo-CBP2 purified from M. oleifera seeds (NETO et al., 2017). Moreover, Mo-CBP2 was shown to substantially increase the amount of reactive oxygen species when incubated with yeast cells (NETO et al., 2017). Another activity demonstrated for this group of proteins is the ability to emulsify, shown for SFA-8 from H. annuus (BURNETT et al., 2002; PANTOJA-UCEDA et al., 2004b). Some 2S albumins have also been shown to inhibit serine proteases and subtilisin (a non-specific protease), an activity observed in albumins isolated from seeds of Brassica juncea and B. nigra (GENOV et al., 1997; MANDAL et al., 2002). Recently, Rc-2S-Alb purified from R. communis seeds showed inhibitory activity against trypsin (SOUZA et al., 2016), and the albumin Cb2SA, a heterodimeric protein (5 kDa and 9 kDa) isolated from Caryocar brasiliense, showed trypsin inhibitory activity in vivo in Spodoptera frugiperda (COSTA et al., 2015). In addition, one of the chains of PA1b, a 2S albumin isolated from Pisum sativum, was active in vivo in assays against Culex pipiens and Aedes aegypti (GRESSENT et al., 2011). Later studies with the rice weevil Sitophilus oryzae and with Tribolium castaneum showed that an interaction between PA1b and the V-ATPase triggers apoptosis, culminating in the death of these insects (EYRAUD et al., 2017). Likewise, apoptotic activity has been reported for the unnamed 2S albumin from C. maxima (TOMAR et al., 2014a), and other plant albumins may share this biological function.

Mo-CBP3, a 2S albumin from Moringa oleifera

Despite advances in the characterization of Mo-CBP3 and of its mode of action, this protein is not yet fully understood. Mo-CBP3 appears to affect the proton pumps (H+-ATPase) of the cell membrane and to cause cell wall deformations in phytopathogenic fungi (BATISTA et al., 2014) such as F. solani, F. oxysporum, Colletotrichum musae and C. gloeosporioides (GIFONI et al., 2012). Mass spectrometry and SDS-PAGE show that Mo-CBP3 consists of two polypeptide chains of about 4.1 kDa and 8.1 kDa (BATISTA et al., 2014; FREIRE et al., 2015). Although recent studies have sought to clarify the mechanism of action of this protein, further effort is needed to describe all existing Mo-CBP3 isoforms and to understand the physiological role of each, given the physiological importance of albumins during plant development and the defense mechanisms in which they are involved.


3 HYPOTHESIS

Mo-CBP3 is a set of isoforms of multigene origin present in Moringa oleifera, forming a complex mixture that is difficult to separate by conventional proteomic methods. Moreover, each isoform may have a distinct molecular mass (according to the biochemical needs of the cell) owing to the presence of sites for post-translational modifications (PTMs) such as methionine oxidation, phosphorylation and hydroxylation.
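
The mass argument in this hypothesis can be written as a short worked equation. This is an illustrative sketch rather than a formula taken from the thesis: the symbols M_calc, M_mono, n_i and Δm_i are introduced here for convenience, and the shifts are the standard monoisotopic values for each modification.

```latex
\[
M_{\mathrm{calc}} = M_{\mathrm{mono}} + \sum_{i} n_i \,\Delta m_i ,
\qquad
\Delta m_{\mathrm{oxidation}} = \Delta m_{\mathrm{hydroxylation}} \approx 15.9949\ \mathrm{Da},
\quad
\Delta m_{\mathrm{phosphorylation}} \approx 79.9663\ \mathrm{Da}
\]
% Example: a chain carrying one oxidized methionine and one phosphate group
\[
M_{\mathrm{calc}} - M_{\mathrm{mono}} = 15.9949 + 79.9663 = 95.9612\ \mathrm{Da}
\]
```

Because oxidation and hydroxylation each add one oxygen atom, they produce the same shift and cannot be distinguished from the intact mass alone.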


4 OBJECTIVES

General objective

To understand the patterns of post-translational modifications (PTMs) possible in Mo-CBP3 isoforms, using molecular biology and proteomic (ESI-MS) techniques.

Specific objectives

- Isolate and purify gDNA from young leaves of Moringa oleifera;

- Amplify the coding region of Mo-CBP3 isoforms;

- Clone and sequence the coding region of Mo-CBP3 isoforms;

- Identify all sequences encoding Mo-CBP3 isoforms in the M. oleifera genome;

- Compare and analyze the nucleotide sequences obtained and the deduced amino acid sequences of the Mo-CBP3 isoforms against sequences available in public databases;

- Reduce and alkylate Mo-CBP3 samples and analyze them by mass spectrometry (ESI-MS);

- Identify all masses obtained for both chains of Mo-CBP3;

- Calculate and associate the monoisotopic masses (mass obtained from the amino acid sequence) and the calculated masses (monoisotopic mass plus the mass of possible PTMs);

- Compare and classify all isoforms based on the calculated and experimental masses.
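
The last three objectives amount to simple mass arithmetic, which can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the thesis: the function names (`monoisotopic_mass`, `calculated_mass`, `match_ptms`), the tolerance, the limit on sites per modification and the example peptide are all hypothetical. Alkylation is assumed to be carbamidomethylation of every cysteine, which the text does not specify. Residue and modification masses are standard monoisotopic values.

```python
from itertools import product

# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565  # one water per free polypeptide chain

# Mass shifts (Da) for the PTMs named in the hypothesis.
PTM_SHIFT = {
    "oxidation": 15.994915,
    "hydroxylation": 15.994915,
    "phosphorylation": 79.966331,
}
# Assumption: alkylation is carbamidomethylation of each cysteine.
CARBAMIDOMETHYL = 57.021464


def monoisotopic_mass(sequence, alkylated=False):
    """Monoisotopic mass of an unmodified chain from its sequence."""
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    if alkylated:
        mass += CARBAMIDOMETHYL * sequence.count("C")
    return mass


def calculated_mass(sequence, ptm_counts, alkylated=False):
    """Monoisotopic mass plus the mass of the given PTMs."""
    shift = sum(PTM_SHIFT[name] * n for name, n in ptm_counts.items())
    return monoisotopic_mass(sequence, alkylated) + shift


def match_ptms(sequence, experimental_mass, max_sites=2, tol=0.02,
               alkylated=False):
    """List PTM combinations whose calculated mass fits the experiment."""
    names = sorted(PTM_SHIFT)
    hits = []
    for counts in product(range(max_sites + 1), repeat=len(names)):
        ptm_counts = dict(zip(names, counts))
        mass = calculated_mass(sequence, ptm_counts, alkylated)
        if abs(mass - experimental_mass) <= tol:
            hits.append((ptm_counts, round(mass, 4)))
    return hits


if __name__ == "__main__":
    peptide = "MSCK"  # hypothetical peptide, not a Mo-CBP3 sequence
    observed = calculated_mass(peptide, {"oxidation": 1}, alkylated=True)
    for ptm_counts, mass in match_ptms(peptide, observed, alkylated=True):
        print(ptm_counts, mass)
```

The search reports both oxidation and hydroxylation for a single +15.9949 Da shift, since the two are isobaric. Telling them apart would need fragment-level (MS/MS) evidence or knowledge of which residues are present.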


REFERENCES

ABIYU, A. et al. Wastewater treatment potential of Moringa stenopetala over Moringa olifera as a natural coagulant, antimicrobial agent and heavy metal removals. Cogent Environmental Science, v. 4, n. 1, p. 1–13, 2018.

AGIZZIO, A. P. et al. A 2S albumin-homologous protein from passion fruit seeds inhibits the fungal growth and acidification of the medium by Fusarium oxysporum. Archives of Biochemistry and Biophysics, v. 416, p. 188–195, 2003.

AGIZZIO, A. P. et al. The antifungal properties of a 2S albumin-homologous protein from passion fruit seeds involve plasma membrane permeabilization and ultrastructural alterations in yeast cells. Plant Science, v. 171, p. 515–522, 2006.

AGRAWAL, H.; SHEE, C.; SHARMA, A. K. Isolation of a 66 kDa Protein with coagulation activity from seeds of Moringa oleifera. Research Journal of Agriculture and Biological Sciences, v. 3, n. 5, p. 418–421, 2007.

AHN, K. et al. Identification of two pistachio allergens, Pis v 1 and Pis v 2, belonging to the 2S albumin and 11S family. Clinical and Experimental Allergy, v. 39, p. 926–934, 2009.

ALLEN, R. D. et al. Sequence and expression of a gene encoding an albumin storage protein in sunflower. Molecular & General Genetics, v. 210, n. 2, p. 211–218, 1987.

ASHRAF, M. et al. Microscopic evaluation of the antimicrobial activity of seed extracts of Moringa oleifera. Agriculture, v. 40, n. 4, p. 1349–1358, 2008.

BAPTISTA, A. T. A. et al. Protein fractionation of seeds of Moringa oleifera lam and its application in superficial water treatment. Separation and Purification Technology, v. 180, p. 114–124, 2017.

BATISTA, A. B. et al. New insights into the structure and mode of action of Mo-CBP3, an antifungal chitin-binding protein of Moringa oleifera seeds. PLoS ONE, v. 9, n. 10, p. e111427, 2014.

BOUALEG, I.; BOUTEBBA, A. Purification of water soluble proteins (2S albumins) extracted from peanut defatted flour and isolation of their isoforms by gel filtration and anion exchange chromatography. Scientific Study & Research, v. 18, n. 2, p. 135–143, 2017.

BREITENEDER, H.; RADAUER, C. A classification of plant food allergens. Journal of Allergy and Clinical Immunology, v. 113, p. 821–830, 2004.

CÁCERES, A. et al. Pharmacological properties of Moringa oleifera. 1: Preliminary screening for antimicrobial activity. Journal of Ethnopharmacology, v. 33, p. 213–6, 1991.

CÂNDIDO, E. DE S. et al. Plant storage proteins with antimicrobial activity: novel insights into plant defense mechanisms. The FASEB Journal, v. 25, n. 1, p. 3290–3305, 2011.

CHEN, M. Elucidation of bactericidal effects incurred by Moringa oleifera and chitosan. Journal of the U.S. SJWP, v. 4, n. 2009, p. 65–79, 2009.

COSTA, T. G. et al. Identification of a novel 2S albumin with antitryptic activity from Caryocar brasiliense seeds. Journal of Agricultural Science, v. 7, n. 6, p. 197–206, 2015.

DA SILVA, J. G. et al. Amino acid sequence of a new 2S albumin from Ricinus communis which is part of a 29-kDa precursor protein. Archives of Biochemistry and Biophysics, v. 336, n. 1, p. 10–18, 1996.

DUAN, X. H. et al. Some 2S albumin from peanut seeds exhibits inhibitory activity against Aspergillus flavus. Plant Physiology and Biochemistry, v. 66, p. 84–90, 2013.

EBISAWA, M. et al. Gly m 2S albumin is a major allergen with a high diagnostic value in soybean-allergic children. Journal of Allergy and Clinical Immunology, v. 132, n. 4, 2013.

ELGAMILY, H. et al. Microbiological assessment of Moringa oleifera extracts and its incorporation in novel dental remedies against some oral pathogens. Macedonian Journal of Medical Sciences, v. 4, n. 4, p. 585–590, 2016.

EYRAUD, V. et al. The interaction of the bioinsecticide PA1b (Pea Albumin 1 subunit b) with the insect V-ATPase triggers apoptosis. Scientific Reports, v. 7, n. 1, p. 1–10, 2017.

FANG, E. F. et al. Biochemical characterization of the RNA-hydrolytic activity of a pumpkin 2S albumin. FEBS Letters, v. 584, n. 18, p. 4089–4096, 2010.

FERREIRA, P. M. P. et al. Larvicidal activity of the water extract of Moringa oleifera seeds against Aedes aegypti and its toxicity upon laboratory animals. Anais da Academia Brasileira de Ciências, v. 81, p. 207–16, 2009.

FREIRE, J. E. C. et al. Mo-CBP3, an antifungal chitin-binding protein from Moringa oleifera seeds, is a member of the 2S albumin family. PLoS ONE, v. 10, n. 3, p. 1–24, 2015.

GALLÃO, M. I.; DAMASCENO, L. F.; BRITO, E. S. Avaliação química e estrutural da semente de moringa. Revista Ciência Agronômica, v. 37, n. 1, p. 106–9, 2006.

GANGULY, R.; GUHA, D. Alteration of brain monoamines & EEG wave pattern in rat model of Alzheimer’s disease & protection by Moringa oleifera. Indian Journal of Medical Research, v. 128, p. 744–751, 2008.

GARINO, C. et al. Isolation, cloning, and characterization of the 2S albumin: A new allergen from hazelnut. Molecular Nutrition and Food Research, v. 54, n. 9, p. 1257–1265, 2010.

GASSENSCHMIDT, U. et al. Isolation and characterization of a flocculating protein from Moringa oleifera Lam. Biochimica et Biophysica Acta, v. 1243, n. 3, p. 477–81, Apr. 1995.

GENOV, N. et al. A novel thermostable inhibitor of trypsin and subtilisin from the seeds of Brassica nigra: Amino acid sequence, inhibitory and spectroscopic properties and thermostability. Biochimica et Biophysica Acta - Protein Structure and Molecular Enzymology, v. 1341, n. 2, p. 157–164, 1997.

GHEBREMICHAEL, K. A. et al. A simple purification and activity assay of the coagulant protein from Moringa oleifera seed. Water Research, v. 39, p. 2338–2344, 2005.

GIFONI, J. M. et al. A novel chitin-binding protein from Moringa oleifera seed with potential for plant disease control. Biopolymers, v. 98, n. 4, p. 406–415, 2012.

GOYAL, B. R. et al. Phyto-pharmacology of Moringa oleifera Lam. - An overview. Natural Product Radiance, v. 6, n. 4, p. 347–353, 2007.

GRESSENT, F. et al. Pea albumin 1 subunit b (PA1b), a promising bioinsecticide of plant origin. Toxins, v. 3, n. 12, p. 1502–1517, 2011.

GUPTA, P.; GAUR, V.; SALUNKE, D. M. Purification, identification and preliminary crystallographic studies of a 2S albumin seed protein from Lens culinaris. Acta Crystallographica Section F: Structural Biology and Crystallization Communications, v. 64, p. 733–736, 2008.

HEATH, J. D. et al. Analysis of storage proteins in normal and aborted seeds from embryo-lethal mutants of Arabidopsis thaliana. Planta, v. 169, n. 3, p. 304–312, 1986.

IRWIN, S. D. et al. The Ricinus communis 2S albumin precursor: A single preproprotein may be processed into two different heterodimeric storage proteins. Molecular and General Genetics, v. 222, n. 2–3, p. 400–408, 1990.

KHAN, S. et al. Purification and characterization of 2S albumin from Nelumbo nucifera. Bioscience, Biotechnology and Biochemistry, v. 80, n. 11, p. 2109–2114, 2016.

KOU, X. et al. Nutraceutical or Pharmacological Potential of Moringa oleifera Lam. Nutrients, v. 10, n. 3, p. 343, 2018.

KREBBERS, E. et al. Determination of the processing sites of an Arabidopsis 2S albumin and characterization of the complete gene family. Plant Physiology, v. 87, p. 859–866, 1988.

LEONE, A. et al. Moringa oleifera seeds and oil: Characteristics and uses for human health. International Journal of Molecular Sciences, v. 17, n. 12, p. 1–14, 2016.

LEHMANN, K. et al. Structure and stability of 2S albumin-type peanut allergens: implications for the severity of peanut allergic reactions. Biochemical Journal, v. 395, p. 463–472, 2006.

LI, D.-F. et al. Crystal structure of Mabinlin II: A novel structural type of sweet proteins and the main structural basis for its sweetness. Journal of Structural Biology, v. 162, p. 50–62, 2008.

LI, L. et al. MAIGO2 is involved in exit of seed storage proteins from the endoplasmic reticulum in Arabidopsis thaliana. The Plant Cell, v. 18, n. 12, p. 3535–3547, 2006.

LIPIPUN, V. et al. Efficacy of Thai medicinal plant extracts against Herpes simplex virus type 1 infection in vitro and in vivo. Antiviral Research, v. 60, p. 175–180, 2003.

MAKKAR, H. P. S.; BECKER, K. Nutrients and antiquality factors in different morphological parts of the Moringa oleifera tree. The Journal of Agricultural Science, v. 128, p. 311–322, 1997.

MANDAL, S. et al. Precursor of the inactive 2S seed storage protein from the Indian mustard Brassica juncea is a novel trypsin inhibitor. Characterization, post-translational processing studies, and transgenic expression to develop insect-resistant plants. Journal of Biological Chemistry, v. 277, n. 40, p. 37161–37168, 2002.

MARIA-NETO, S. et al. Bactericidal activity identified in 2S albumin from sesame seeds and in silico studies of structure-function relations. Protein Journal, v. 30, n. 5, p. 340–350, 2011.

MARTINOIA, E.; MAESHIMA, M.; NEUHAUS, H. E. Vacuolar transporters and their essential role in plant metabolism. Journal of Experimental Botany, v. 58, n. 1, p. 83–102, 2007.

MENÉNDEZ-ARIAS, L. et al. Primary structure of the major allergen of yellow mustard (Sinapis alba L.) seed, Sin a I. European Journal of Biochemistry/FEBS, v. 177, n. 1, p. 159–166, 1988.

MORENO, F. J. et al. Mass spectrometry and structural characterization of 2S albumin isoforms from Brazil nuts (Bertholletia excelsa). Biochimica et Biophysica Acta - Proteins and Proteomics, v. 1698, p. 175–186, 2004.

MORENO, F. J. et al. Thermostability and in vitro digestibility of a purified major allergen 2S albumin (Ses i 1) from white sesame seeds (Sesamum indicum L.). Biochimica et Biophysica Acta - Proteins and Proteomics, v. 1752, p. 142–153, 2005.

MORENO, F. J.; CLEMENTE, A. 2S albumin storage proteins: What makes them food allergens? The Open Biochemistry Journal, v. 2, p. 16–28, 2008.

MYLNE, J. S.; HARA-NISHIMURA, I.; ROSENGREN, K. J. Seed storage albumins: biosynthesis, trafficking and structures. Functional Plant Biology, v. 41, p. 671–7, 2014.

NAWROT, R. et al. Plant antimicrobial peptides. Host Defense Peptides and Their Potential as Therapeutic Agents, v. 59, p. 181–196, 2014.

NDABIGENGESERE, A.; SUBBA NARASIAH, K.; TALBOT, B. G. Active agents and mechanism of coagulation of turbid waters using Moringa oleifera. Water Research, v. 29, p. 703–710, 1995.

NETO, J. X. S. et al. A chitin-binding protein purified from Moringa oleifera seeds presents anticandidal activity by increasing cell membrane permeability and reactive oxygen species production. Frontiers in Microbiology, v. 8, p. 1–12, 2017.

NIRASAWA, S. et al. Cloning and sequencing of a cDNA encoding a heat-stable sweet protein, mabinlin II. Gene, v. 181, p. 225–7, 1993.

NWOSU, M.; OKAFOR, J. I. Preliminary studies of the antifungal activities of some medicinal plants against Basidiobolus and some other pathogenic fungi. Mycoses, v. 38, p. 191–5, 1995.

ODINTSOVA, T. I. et al. Antifungal activity of storage 2S albumins from seeds of the invasive weed dandelion Taraxacum officinale Wigg. Protein and Peptide Letters, v. 17, n. 4, p. 522–529, 2010.

OKUDA, T. et al. Isolation and characterization of coagulant extracted from Moringa oleifera seed by salt solution. Water Research, v. 35, n. 2, p. 405–410, 2001.

ONYEKE, C. C.; AKUESHI, C. O. Infectivity and reproduction of Meloidogyne incognita (Kofoid and White) Chitwood on African yam bean, Sphenostylis stenocarpa (Hochst Ex. A. Rich) Harms accessions as influenced by botanical soil amendments. African Journal of Biotechnology, v. 11, n. 67, p. 13095–103, Aug. 2012.

ORRUÑO, E.; MORGAN, M. R. A. Resistance of purified seed storage proteins from sesame (Sesamum indicum L.) to proteolytic digestive enzymes. Food Chemistry, v. 128, n. 4, p. 923–929, 2011.

PANTOJA-UCEDA, D. et al. Solution structure and stability against digestion of rproBnib, a recombinant 2S albumin from rapeseed: relationship to its allergenic properties. Biochemistry, v. 43, p. 16036–16045, 2004a.

PANTOJA-UCEDA, D. et al. Solution structure of a methionine-rich 2S albumin from sunflower seeds: Relationship to its allergenic and emulsifying properties. Biochemistry, v. 43, p. 6976–6986, 2004b.

PANTOJA-UCEDA, D. et al. Solution structure of RicC3, a 2S albumin storage protein from Ricinus communis. Biochemistry, v. 42, p. 13839–13847, 2003.

PEREIRA, M. L. et al. Purification of a chitin-binding protein from Moringa oleifera seeds with potential to relieve pain and inflammation. Protein and Peptide Letters, v. 18, p. 1078–1085, 2011.

PFEIFER, S. et al. Cor a 14, the allergenic 2S albumin from hazelnut, is highly thermostable and resistant to gastrointestinal digestion. Molecular Nutrition & Food Research, v. 59, n. 10, p. 2077–2086, 2015.

RAMACHANDRAN, C.; PETER, K. V.; GOPALAKRISHNAN, P. K. Drumstick (Moringa oleifera): A Multipurpose Indian Vegetable. Economic Botany, v. 34, n. 3, p. 276–283, 1980.

RANI, N. Z. A.; HUSAIN, K.; KUMOLOSASI, E. Moringa Genus: A review of phytochemistry and pharmacology. Frontiers in Pharmacology, v. 9, p. 1–26, 2018.

REGENTE, M.; DE LA CANAL, L. Do sunflower 2S albumins play a role in resistance to fungi? Plant Physiology and Biochemistry, v. 39, p. 407–413, 2001.

RIBEIRO, S. F. F. et al. Antifungal and other biological activities of two 2S albumin-homologous proteins against pathogenic fungi. Protein Journal, v. 31, p. 59–67, 2011a.

RIBEIRO, S. M. et al. Identification of a Passiflora alata Curtis dimeric peptide showing identity with 2S albumins. Peptides, v. 32, n. 5, p. 868–874, 2011b.

RICO, M. et al. 1H NMR assignment and global fold of napin BnIb, a representative 2S albumin seed protein. Biochemistry, v. 35, p. 15672–15682, 1996.

ROBOTHAM, J. M. et al. Ana o 3, an important cashew nut (Anacardium occidentale L.) allergen of the 2S albumin family. Journal of Allergy and Clinical Immunology, v. 115, p. 1284–1290, 2005.

ROCHA, M. F. G. et al. Extratos de Moringa oleifera e Vernonia sp. sobre Candida albicans e Microsporum canis isolados de cães e gatos e análise da toxicidade em Artemia sp. Ciência Rural, v. 41, p. 1807–1812, 2011.

ROLIM, L. A. D. M. M. et al. Genotoxicity evaluation of Moringa oleifera seed extract and lectin. Journal of Food Science, v. 76, p. T53–T58, 2011.

RUNDQVIST, L. et al. Solution structure, copper binding and backbone dynamics of recombinant Ber e 1 – The major allergen from brazil nut. PLoS ONE, v. 7, n. 10, p. e46435, 2012.

SAHAY, S.; YADAV, U.; SRINIVASAMURTHY, S. Potential of Moringa oleifera as a functional food ingredient: A review. International Journal of Food Science and Nutrition, v. 2, n. 5, p. 31–37, 2017.

SANTOS, A. F. S. et al. Isolation of a seed coagulant Moringa oleifera lectin. Process Biochemistry, v. 44, p. 504–508, 2009.

SAPANA, M. M.; SONAL, G. C.; RAUT, P. Use of Moringa oleifera (Drumstick) seed as natural absorbent and an antimicrobial agent for ground water treatment. Research Journal of Recent Sciences, v. 1, n. 3, p. 31–40, 2012.

SHARIEF, F. S.; LI, S. S. Amino acid sequence of small and large subunits of seed storage protein from Ricinus communis. The Journal of Biological Chemistry, v. 257, n. 24, p. 14753–14759, 1982.

SHARMA, A. et al. Purification and characterization of 2S albumin from seeds of Wrightia tinctoria exhibiting antibacterial and DNase activity. Protein and Peptide Letters, v. 24, n. 4, p. 368–378, 2017.

SHARMA, G. M. et al. Cloning and characterization of 2S albumin, Car i 1, a major allergen in pecan. Journal of Agricultural and Food Chemistry, v. 59, p. 4130–4139, 2011.

SHEWRY, P. R.; NAPIER, J. A.; TATHAM, A. S. Seed storage proteins: structures and biosynthesis. The Plant Cell, v. 7, n. July, p. 945–956, 1995.

SHIH, M. C. et al. Effect of different parts (leaf, stem and stalk) and seasons (summer and winter) on the chemical compositions and antioxidant activity of Moringa oleifera. International Journal of Molecular Sciences, v. 12, p. 6077–6088, 2011.

SOUZA, P. F. N. et al. A 2S albumin from the seed cake of Ricinus communis inhibits trypsin and has strong antibacterial activity against human pathogenic bacteria. Journal of Natural Products, v. 79, p. 2423–31, 2016.

TAI, S. S. K. et al. Molecular cloning of 11S globulin and 2S albumin, the two major seed storage proteins in Sesame. Journal of Agricultural and Food Chemistry, v. 47, p. 4932–8, 1999.

TAI, S. S. K. et al. Expression pattern and deposition of three storage proteins, 11S globulin, 2S albumin and 7S globulin in maturing sesame seeds. Plant Physiology and Biochemistry, v. 39, n. 11, p. 981–992, 2001.

TAN-WILSON, A. L.; WILSON, K. A. Mobilization of seed protein reserves. Physiologia Plantarum, v. 145, n. 1, p. 140–153, 2012.

TERRAS, F. R. et al. A new family of basic cysteine-rich plant antifungal proteins from Brassicaceae species. FEBS Letters, v. 316, n. 3, p. 233–240, 1993.

TERRAS, F. R. G. et al. Analysis of two novel classes of plant antifungal proteins from radish (Raphanus sativus L.) seeds. Journal of Biological Chemistry, v. 267, p. 15301–15309, 1992.

TOMAR, P. P. S. et al. Characterization of anticancer, DNase and antifungal activity of pumpkin 2S albumin. Biochemical and Biophysical Research Communications, v. 448, p. 349–54, 2014a.

TOMAR, P. P. S. et al. Purification, characterization and cloning of a 2S albumin with DNase, RNase and antifungal activities from Putranjiva roxburghii. Applied Biochemistry and Biotechnology, v. July, p. 1–12, 2014b.

ULLAH, A. et al. Crystal structure of mature 2S albumin from Moringa oleifera seeds. Biochemical and Biophysical Research Communications, v. 468, p. 365–371, 2015.

VAN DER KLEI, H. et al. A fifth 2S albumin isoform is present in Arabidopsis thaliana. Plant Physiology, v. 101, n. 4, p. 1415–1416, 1993.

VICENTE, T. et al. Tratabilidade de água superficial utilizando coagulantes naturais à base de tanino e extratos de sementes de Moringa oleifera. Ensaios e Ciência: C. Biológicas, Agrárias e da Saúde, v. 1, n. 3, p. 152–155, 2017.

VIEIRA, H.; CHAVES, L. H. G.; VIÉGAS, R. A. Acumulação de nutrientes em mudas de moringa (Moringa oleifera Lam) sob omissão de macronutrients. Revista Ciência Agronômica, v. 39, n. 1, p. 130–136, 2008.

VIERA, G. H. F. et al. Antibacterial effect (in vitro) of Moringa oleifera and Annona muricata against Gram positive and Gram negative bacteria. Revista do Instituto de Medicina Tropical de São Paulo, v. 52, n. 3, p. 129–132, 2010.

VITALE, A.; GALILI, G. The endomembrane system and the problem of protein sorting. Plant Physiology, v. 125, n. 1, p. 115–118, 2001.

WANG, X. et al. Purification and characterization of three antifungal proteins from cheeseweed (Malva parviflora). Biochemical and Biophysical Research Communications, v. 282, n. 5, p. 1224–1228, 2001.

WANG, X.; BUNKERS, G. J. Potent heterologous antifungal proteins from cheeseweed (Malva parviflora). Biochemical and Biophysical Research Communications, v. 279, n. 2, p. 669–673, 2000.

WILSON, K. A. et al. Role of vacuolar membrane proton pumps in the acidification of protein storage vacuoles following germination. Plant Physiology and Biochemistry, v. 104, p. 242–249, 2016.

YOULE, R. J.; HUANG, A. H. C. Occurrence of low molecular weight and high cysteine containing albumin storage proteins in oilseeds of diverse species. American Journal of Botany, v. 68, n. 1, p. 44, 1981.

ZAKU, S. G. et al. Moringa oleifera: An underutilized tree in Nigeria with amazing versatility: A review. African Journal of Food Science, v. 9, n. 9, p. 456–461, 2015.


5 POST-TRANSLATIONAL MODIFICATIONS REVEAL THAT Mo-CBP3, A CHITIN-BINDING 2S ALBUMIN FROM MORINGA OLEIFERA, IS A COMPLEX MIXTURE OF ISOFORMS

Abstract

Mo-CBP3 is a mixture of proteins (Mo-CBP3-1, Mo-CBP3-2, Mo-CBP3-3 and Mo-CBP3-4) from Moringa oleifera seeds, as described previously, and it is classified as a member of the 2S albumin family of seed storage proteins. The four isoforms share the same molecular weight but diverge slightly in primary sequence. X-ray diffraction studies revealed that Mo-CBP3-4 is a heterodimeric protein consisting of a small and a large polypeptide chain stabilized by disulfide bonds. This work aimed to investigate the occurrence of other Mo-CBP3 isoforms and to characterize these proteins with regard to physicochemical properties and post-translational modifications (PTMs). DNA isolation, molecular cloning and DNA sequencing, along with analysis of the deduced amino acid sequences, demonstrated the existence of four previously undescribed Mo-CBP3 isoforms, which are most closely related to Mo-CBP3-2 and Mo-CBP3-3. In addition, the native Mo-CBP3 protein complex extracted from M. oleifera seeds was purified and further separated and characterized on an ultra-performance liquid chromatography system coupled to an electrospray ionization tandem mass spectrometer (MS). Both the intact and the reduced and alkylated proteins were used. These analyses made it possible to separate and identify peaks corresponding to the four known and the four new Mo-CBP3 isoforms. Furthermore, MS/MS experiments revealed a large number of clusters of peptides (89 and 57 for the small and large chains, respectively) presenting fixed and variable PTMs, which may account for an extensive diversity of Mo-CBP3 isoforms. The biological meaning and implications of these findings are discussed.

Keywords: Isoform. Post-translational modification. Processing. 2S albumins. PTMs.

Introduction

Mo-CBP3 is a complex mixture of seed storage proteins from the tree Moringa oleifera belonging to the 2S albumin group [1], a class of water-soluble proteins that contribute to the nutrition and development of germinating seeds and are widely distributed in monocotyledonous and dicotyledonous plants, including cowpea [2], Brazil nut [3], mustard [4], peanut [5], rapeseed [6] and others. In addition to its nutritional contribution to developing seeds, Mo-CBP3 was found to be a chitin-binding protein (CBP) with antifungal activity against both spore germination and mycelial growth of the phytopathogenic fungus Fusarium solani, an activity observed even after the protein was heated at 100 °C for 1 h. Other economically important phytopathogenic fungi, F. oxysporum, Colletotrichum musae and C. gloeosporioides [7;8], were also inhibited in vitro by Mo-CBP3. Crystallographic studies of Mo-CBP3-4 (PDB ID: 5DOM), an isoform of Mo-CBP3, showed that it is a heterodimeric protein composed of two subunits, termed the small and large chains, which differ in amino acid composition and length and are held together and stabilized by two disulfide bonds, a feature commonly found in other 2S albumins [1;9].

Additional studies showed that the mature, active Mo-CBP3 polypeptide results from the proteolytic processing of a larger precursor: an N-terminal signal peptide is removed first, and the precursor is then cleaved at four specific points, yielding two asymmetric chains with apparent molecular masses of around 4.1 and 8.1 kDa [1]. This pattern of processing and maturation is shared by many other 2S albumins [10;11], with the exception of some members whose processing results in mature proteins formed by a single polypeptide chain [12]. Like other 2S albumins [13;14], Mo-CBP3 has a multigenic origin, which explains the previously described isoforms Mo-CBP3-1, Mo-CBP3-2, Mo-CBP3-3 and Mo-CBP3-4 [1]. In addition to their multigenic origin, 2S albumins are generally subject to a variety of post-translational modifications (PTMs), which may cause local physicochemical changes in the molecule, mostly related to charge and volume, and concomitantly adjust protein-ligand or protein-protein interactions [15;16]. Since Mo-CBP3, as a classical 2S albumin, may be further diversified by PTMs, one can speculate that not only the known isoforms but also many other putative isoforms are present and active in seeds of M. oleifera. However, because of the rapid and unexpected changes that PTMs introduce in a protein, it is very difficult to predict from the amino acid sequence alone whether a protein is modified and how extensive the modification is. In addition, conventional molecular biology techniques are of limited use for studying PTMs, which require equipment with accuracy and detection limits high enough to identify and differentiate molecular alterations in the order of 1 Da or even less [17]. For this reason, mass spectrometry used in conjunction with molecular biology techniques can detect and characterize PTMs with a reasonable degree of precision and accuracy. This work was conducted with the aim of identifying and characterizing new isoforms within the complex mixture of 2S albumins collectively named Mo-CBP3, including possible isoforms arising from PTMs.

Materials and methods

Plant material

Seeds and leaves of M. oleifera were harvested from trees growing at the Campus do Pici, Fortaleza, Ceará, Brazil. Voucher specimens (EAC 54112) were deposited at the Herbário Prisco Bezerra, Universidade Federal do Ceará. Since M. oleifera is not native to the Brazilian flora, no specific permission from local authorities was required to collect the samples used in this work. Plant material was thoroughly washed in distilled water, immediately frozen in liquid nitrogen and stored at -80 °C until use.

Plasmid, bacterial strain and reagents

The pGEM-T Easy cloning vector was purchased from Promega (Madison, WI, USA). Escherichia coli TOP10Fʹ cloning cells were obtained from Invitrogen (Carlsbad, CA, USA). All the reagents used in this work were of analytical grade and high purity.

DNA extraction, amplification and cloning of PCR products

Frozen leaves of M. oleifera were ground to a fine powder with a mortar and pestle in the presence of liquid nitrogen and immediately subjected to genomic DNA (gDNA) extraction using a cetyltrimethylammonium bromide (CTAB)-based protocol, as described by Warner [18]. The purified gDNA was resuspended in Tris-EDTA buffer (10 mM Tris-HCl, pH 8.0, 1 mM EDTA) and stored at -20 °C. The quality and yield of the gDNA were assessed by checking its integrity on agarose gel electrophoresis [19;20] and by measuring the 260/280 nm spectrophotometric ratio, respectively.

Three pairs of primers were designed and combined at random to identify the combinations suitable for recovering all potential amplicons for M. oleifera 2S albumins (Supplementary Table S1). Briefly, PCR amplifications were carried out in a 50 μL reaction volume containing 1.2 µg of gDNA, 10 µL of 1X Green GoTaq reaction buffer, 1.5 mM MgCl2, 200 μM of each dNTP, 0.5 μM of each primer, and 2.5 U of GoTaq DNA Polymerase (Promega, Madison, WI, USA). PCR conditions were: 95 °C for 4 min (initial denaturation), followed by 33 cycles of 95 °C for 1 min (denaturation), 60 °C for 1.5 min (primer annealing) and 72 °C for 1.5 min (target elongation), and a final elongation step at 72 °C for 15 min. PCR products were separated by 1% agarose gel electrophoresis and analyzed after ethidium bromide staining and image digitization [21]. The sizes of the amplicons were estimated with the program E-Capt (version 14.1 for Windows). An aliquot of the remaining amplified products was purified using the Illustra GFX PCR DNA and Gel Band Purification Kit (GE Healthcare, Buckinghamshire, UK) following the manufacturer's instructions. Purified DNA molecules were ligated into the pGEM-T Easy cloning vector using 1 U of T4 DNA ligase (Promega) at 4 °C for 16 h. Ligation products were introduced into E. coli TOP10Fʹ cells by electroporation. Positive clones were selected on LB agar plates containing 100 μg.mL-1 carbenicillin, 25 μg.mL-1 streptomycin, 0.4 μM IPTG and 80 μg.mL-1 X-Gal. Plasmid DNA was isolated from antibiotic-resistant colonies using the alkaline lysis method [22], and the presence of the inserts was confirmed by digestion with the restriction endonuclease EcoRI (Fermentas Life Sciences, Ontario, Canada) at 37 °C for 30 min. EcoRI was then inactivated at 65 °C for 20 min.
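As a quick check on the thermocycling program described above, the total programmed hold time can be computed from the step durations. The sketch below is only an illustration: the function name pcr_run_minutes is hypothetical, and ramping time between temperatures is ignored, so an actual run would take somewhat longer.

```python
def pcr_run_minutes(initial_denaturation, cycle_steps, n_cycles, final_elongation):
    """Total programmed hold time in minutes (temperature ramping is ignored)."""
    return initial_denaturation + n_cycles * sum(cycle_steps) + final_elongation


# Durations taken from the protocol above: 4 min initial denaturation,
# 33 cycles of 1 + 1.5 + 1.5 min, and a 15 min final elongation.
total_minutes = pcr_run_minutes(4.0, [1.0, 1.5, 1.5], 33, 15.0)
print(total_minutes)  # 151.0
```

Under these assumptions the program holds for 151 min, that is, about two and a half hours per run.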

DNA sequencing and assembly

Plasmid samples for DNA sequencing were purified using the AxyPrep Plasmid Miniprep Kit (Axygen Scientific, Union City, CA, USA) according to the manufacturer's instructions. Sequencing was performed by the Macrogen Sequencing Service (Macrogen Inc., Seoul, Korea) using the standard Sanger dideoxy (chain-termination) method on a 3730xl DNA Analyzer (Applied Biosystems). Amplicons obtained from the different primer combinations were all sequenced in order to recover the maximum number of different isoforms of the protein of interest [1].


Sequence analysis and prediction of disulfide bonds and signal peptides

Sequences of interest were screened using the ABI Format Sequence Analysis software and the Phred/Phrap/Consed program package was used to verify the quality of the sequences. The chromat file for each sequence was used by Phred (for base-calling), Phrap (for assembly of the sequences), and Consed (for viewing and editing the analysis) for alignment and analysis of the reads. The output from the analysis included color-coded variant nucleotides and base variants were identified based on the relative peak heights of each dye after normalization.

The nucleotide and deduced amino acid sequences of Mo-CBP3 isoforms were used as queries in BLAST searches of the NCBI database using the BLASTp tool [23] at the web server https://www.ncbi.nlm.nih.gov/. Protein domains were identified and delimited using the Conserved Domain Database (CDD) server [24]. Multiple alignments of DNA and deduced amino acid sequences were performed using the Molecular Evolutionary Genetics Analysis (MEGA) software version 6.0 [25] with the MUSCLE tool [26], together with the BioEdit 7.2.5 software package [27] and the program Clustal Ω [28] at the web server www.ebi.ac.uk/Tools/msa/clustalo/, which were routinely used for sequence manipulation, editing and comparison. The default alignment parameters of Clustal Ω were employed, except that the number of combined iterations and the maximum number of HMM (hidden Markov model) iterations were both set to five. The possible presence, number and arrangement of disulfide bonds were analyzed with the DiANNA 1.1 web server [37]. The presence of possible signal peptides was evaluated with the SignalP 4.1 web server, http://www.cbs.dtu.dk/services/SignalP/.

Sample preparation and mass spectrometry analysis

Mo-CBP3 was purified from crude extracts of mature M. oleifera seeds by affinity chromatography on a chitin matrix followed by cation exchange chromatography on a CM-Sepharose (GE Healthcare) matrix, as described previously [7]. The purity of the samples was verified by tricine-SDS-polyacrylamide gel electrophoresis [29], and protein bands were stained with 0.1% (w/v) Coomassie Brilliant Blue R-250 in 40% methanol/1% acetic acid. Destaining was carried out in 50% (v/v) methanol.

Alternatively, some samples were separated into their two chains by reduction and alkylation of cysteine residues, according to Shevchenko and collaborators [30], before MS analysis. Purified Mo-CBP3 (5 mg) was reduced with 100 μL of 65 mM DTT in 100 mM ammonium bicarbonate at 56 °C for 30 min. The protein was then immediately alkylated with 100 μL of 200 mM iodoacetamide in 100 mM ammonium bicarbonate at room temperature for 30 min in the dark. Two cycles of washing in 200 μL of 100 mM ammonium bicarbonate for 10 min and dehydration in 200 μL of absolute acetonitrile for 5 min were performed, and the samples were completely dried under vacuum. Reduced and alkylated samples of Mo-CBP3 were injected directly onto a reversed-phase high-performance liquid chromatography (HPLC) C18 column (75 μm x 10 cm, 1.7 μm particle size) coupled to a nanoAcquity UPLC sample manager at a flow rate of 0.35 mL.min-1 (Souza et al. 2012). Mobile phases consisted of 0.1% formic acid in water (solution A) and 0.1% formic acid in acetonitrile (solution B). Retained proteins were eluted with an acetonitrile gradient consisting of a linear increase from 0 to 13.5% of solution B in 15 min, followed by an increase from 13.5 to 50% of solution B in 15 min. This condition (50% of solution B) was then held constant for an additional 50 min. Mass spectra were acquired on a Synapt G1 HDMS instrument (Waters Co., Milford, MA, USA) in data-dependent acquisition (DDA) mode, in which the top peaks (eluting between 27% and 47% of solution B) were subjected to LC-ESI-MS analysis. The data were processed using the ProteinLynx Global Server software (Waters Co.).
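The elution program described above can be written as a piecewise-linear function of time, which makes it easy to see what percentage of solution B is programmed at any point of the run. This is a minimal sketch, assuming that both ramps are linear and that the three segments run back to back for a total of 80 min; the function name percent_b is hypothetical and system dead volume is ignored.

```python
def percent_b(t_min):
    """Programmed percentage of solution B at time t_min (minutes)."""
    if t_min < 0 or t_min > 80:
        raise ValueError("time outside the 0-80 min program")
    if t_min <= 15:
        # First ramp: 0 to 13.5% B over 15 min.
        return 13.5 * t_min / 15
    if t_min <= 30:
        # Second ramp: 13.5 to 50% B over the next 15 min.
        return 13.5 + (50 - 13.5) * (t_min - 15) / 15
    # Hold at 50% B for the final 50 min.
    return 50.0


for t in (0, 15, 22.5, 30, 80):
    print(t, percent_b(t))
```

Under these assumptions the 27% to 47% B window mentioned above falls entirely within the second ramp.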

Post-translational modification (PTM) analysis

The data from the ProteinLynx Global Server software were analyzed for the presence of one fixed and four variable PTMs on peptides from the small and large chains of Mo-CBP3 isoforms. The fixed PTM was the formation of carboxyamidomethylcysteine, which was considered for all peptides under analysis and results in a mass increment of +57.02146 Da. The variable PTMs analyzed were: 1) Pyroglutamic acid (pGln) formation: this modification arises from the cyclization of N-terminal Gln residues and results in a mass change of -17.02655 Da; 2) Methionine oxidation: like pGln formation, this is a variable PTM, and its occurrence could only be determined by manually identifying the affected Met residues from the resulting mass change (+15.99491 Da); 3) Phosphorylation: this variable PTM results in a mass change of +79.96633 Da and was analyzed using the NetPhos 3.1 web server (http://www.cbs.dtu.dk/services/NetPhos/), which predicts possible phosphorylation sites; and 4) Hydroxylation: the iHyd-PseAAC server (http://app.aporc.org/iHyd-PseAAC/) was used to predict hydroxylation sites on the peptides analyzed. Hydroxylation corresponds to a monoisotopic mass change of +15.99491 Da.
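Because these mass shifts are additive, the predicted mass of a modified peptide can be sketched as the monoisotopic mass plus one fixed shift per alkylated Cys plus the variable shifts. The snippet below is an illustration only: the function name predicted_mass and the abbreviations Oxi, Pho and Hyd are chosen here to match the tables, and the value of two carbamidomethylated Cys residues per small-chain peptide is an assumption, adopted because it reproduces the predicted mass reported for the first peptide in Table 1.

```python
CARBAMIDOMETHYL = 57.02146  # fixed shift per alkylated Cys (Da)
VARIABLE_SHIFTS = {
    "pGln": -17.02655,  # N-terminal Gln cyclization
    "Oxi": 15.99491,    # methionine oxidation
    "Pho": 79.96633,    # phosphorylation
    "Hyd": 15.99491,    # hydroxylation
}


def predicted_mass(monoisotopic, n_cys, variable_mods=None):
    """Monoisotopic mass plus fixed and variable PTM shifts (Da)."""
    mass = monoisotopic + n_cys * CARBAMIDOMETHYL
    for name, count in (variable_mods or {}).items():
        mass += count * VARIABLE_SHIFTS[name]
    return mass


# First small-chain peptide of Mo-CBP3-1 in Table 1 (31 aa, 1x Oxi).
# n_cys=2 is an assumption, not a value stated in the text.
mass = predicted_mass(3781.89, n_cys=2, variable_mods={"Oxi": 1})
delta = abs(mass - 3911.2564)  # difference from the experimental mass
print(round(mass, 4), round(delta, 4))  # 3911.9278 0.6714
```

The same arithmetic reproduces the second row of Table 1 (pGln and 1x Pho), which suggests that the predicted masses in the tables were obtained in this additive way.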

Results and discussions

cDNA cloning and sequence analysis of Mo-CBP3 isoforms

Genomic DNA was extracted from fresh leaves of M. oleifera trees according to the method described by Freire and collaborators [1]. Five different amplicons were generated by PCR, with estimated sizes of around 716, 680, 680, 680 and 628 base pairs according to the E-Capt software (Fig. 1).

Source: prepared by the author.

Figure 1. Agarose gel electrophoresis of PCR products encoding Mo-CBP3 isoforms. Genomic fragments encoding Mo-CBP3 isoforms were amplified by PCR using total genomic DNA from M. oleifera as template and different combinations of the primers described in Table S1. Lane 1: DNA size markers; Lane 2: PCR products obtained with primers P5 and P2 (~700 bp); Lane 3: PCR products obtained with primers P5 and P6 (~680 bp); Lane 4: PCR products obtained with primers P1 and P2 (~680 bp); Lane 5: PCR products obtained with primers P3 and P4 (~680 bp); Lane 6: PCR products obtained with primers P3 and P2 (~630 bp).


PCR products were extracted from the gel, purified and cloned into E. coli TOP10Fʹ. A total of 32 clones coding for different Mo-CBP3 isoforms were obtained and completely sequenced. When the deduced amino acid sequences were aligned, it was possible to sort them into six groups according to their overall similarity (Supplementary Figs. S1 – S6). It is important to note that the known 2S albumin sequences Mo-CBP3-1 (160 aa; accession number KF616830.1) and Mo-CBP3-2 (163 aa; accession number KF616832.1) were not observed here, although they were reported in the previous article [1]. A multiple alignment of all deduced amino acid sequences for Mo-CBP3 is shown in Figure 2 (a multiple alignment of all nucleotide sequences is shown in Supplementary Fig. S7).

Based on sequence similarities with the previously described Mo-CBP3 proteins, we report here four new Mo-CBP3 isoforms, which were named Mo-CBP3-2A (163 aa), Mo-CBP3-2B (162 aa), Mo-CBP3-3A and Mo-CBP3-3B (both with 160 aa). Mo-CBP3-2A and Mo-CBP3-2B share amino acid identities of 99 and 98% with Mo-CBP3-2, respectively, while Mo-CBP3-3A and Mo-CBP3-3B each share 99% amino acid identity with Mo-CBP3-3 (Fig. S2). Matrices of pairwise comparisons of the gDNA (this work), cDNA (Freire et al., 2015) and amino acid sequences (isoforms 1, 2, 2A, 2B, 3, 3A, 3B and 4) can be seen in the supplementary tables and S3.

According to the SignalP 4.1 server [31], the new Mo-CBP3 isoforms described here present the conserved eukaryotic signal peptide sequence (Supplementary Fig. S8). This sequence has the classical tripartite structure, comprising a positively charged N-terminal region (3 aa residues, ending with a lysine residue), a central hydrophobic core (13 aa residues, with a high proportion of leucine residues) and a neutral but polar C-terminal region (4 aa residues). This C-terminal region contains the putative signal peptide cleavage site (Fig. 2). Positions −1 and −3 relative to this site are occupied by Ala20 and Ala18 (isoforms 1, 3, 3A, 3B and 4) or Ala20 and Thr18 (isoforms 2, 2A and 2B), respectively. This finding is consistent with the canonical cleavage site shared by common signal peptides [32] (Fig. 2).

Purification and analysis of Mo-CBP3 isoforms by LC-ESI-MS

Purification of Mo-CBP3 followed the protocol described previously [7]. The crude protein extract of M. oleifera seeds yielded approximately 2.33 g (11.65%) of albumins/g of freeze-dried seed after 4 h of extraction with 0.05 M Tris-HCl buffer, pH 8.0, containing 0.15 M NaCl at 4 °C. Affinity chromatography on a chitin column produced a non-retained peak (NR), and the adsorbed proteins (retained peak, RP) were subsequently eluted with 0.05 M acetic acid. Fractions from RP were pooled, dialyzed against distilled water, lyophilized and applied (750 mg of protein) onto a cation exchange matrix (CM-Sepharose). This chromatographic step produced a non-retained peak (Mo-CBP1) and three retained fractions (Mo-CBP2, Mo-CBP3 and Mo-CBP4), which were eluted with 0.4, 0.5 and 0.6 M NaCl, respectively.

Approximately 18 mg of the fraction of interest, Mo-CBP3 (identified on the basis of published data [7]), were obtained from RP, representing about 2.4% of the total soluble protein. When analyzed by electrophoresis on a 17.5% polyacrylamide/Tris-tricine gel under denaturing and reducing conditions, Mo-CBP3 was found to comprise two polypeptide chains (a large chain of around 8 kDa and a small chain of approximately 4 kDa), as noted before [1;8]. Previous studies had demonstrated that this Mo-CBP3 fraction is in fact a mixture of at least four isoforms, which are difficult to resolve and identify by 2D gel electrophoresis [7;8]. Like napin, a 2S albumin from Brassica napus [13], Mo-CBP3 is a heterodimeric protein whose precursor contains two main domains, a small and a large chain, separated by a linker region, and a signal peptide located at its N-terminal region [1]. The mature polypeptide lacks the N-terminal region, the signal peptide and the linker region, and is thus constituted only by the small and large chains, which are connected and stabilized by disulfide bonds [33;34;35;36].

Previously reduced and alkylated Mo-CBP3 samples were subjected to HPLC (Fig. 3) and then to mass spectrometry analysis by LC-ESI-MS on a Synapt G1 HDMS Q-ToF mass spectrometer (Waters Co., Milford, MA, USA) in an attempt to better characterize the Mo-CBP3 isoforms. The MS data revealed not only the presence of the four previously described isoforms, Mo-CBP3-1, 2, 3 and 4 [1], but also four new, uncharacterized proteins, termed here Mo-CBP3-2A and 2B and Mo-CBP3-3A and 3B, which are variants of Mo-CBP3-2 and Mo-CBP3-3, respectively. When the reduced form of the protein was analyzed, the maximum length observed for the small chain was 34 aa (Mo-CBP3-4), 35 aa (Mo-CBP3-1, 2), 36 aa (Mo-CBP3-2A, 2B) and 38 aa (Mo-CBP3-3, 3A and 3B), while the maximum length observed for the large chain was 70 aa (Mo-CBP3-3, 3A and 3B), 71 aa (Mo-CBP3-1 and 4), 72 aa (Mo-CBP3-2, 2A) and 73 aa (Mo-CBP3-2B). Tables 1 and 2 summarize the variation in

Figure 2. Multiple sequence alignment of Mo-CBP3 isoforms isolated from M. oleifera. The amino acid sequences of Mo-CBP3-1 (GenBank: AHG99681.1), Mo-CBP3-2 (GenBank: AHG99683.1), Mo-CBP3-3 (GenBank: AHG99684.1), Mo-CBP3-4 (GenBank: AHG99682.1), CAA22018] and the new Mo-CBP3 isoforms determined in this work were aligned using the online version of Clustal Ω. Shaded residues indicate ≥50% prevalence (black) or <50% prevalence (white), and the Cys residues are highlighted in yellow. Disulfide bridges are indicated with yellow lines above the sequences. The positions of the α-helices of the small chain (α1 and α2) and large chain (α3, α4 and α5) of Mo-CBP3-3 (PDB code 5DOM) are indicated above the sequence (blue spirals). The signal peptide is indicated in the sequence (blue lines). The N-terminal extension (NTE), the linker peptide (LP) and the C-terminal extension (CTE) of proMo-CBP3-3 are labeled. The processing sites in the proMo-CBP3-3 sequence: phosphorylation sites (green residues), oxidation sites (red residues) and hydroxylation sites (blue residues).

Source: prepared by the author.

Source: prepared by the author.

Figure 3. High-performance liquid chromatography (HPLC) chromatogram of reduced and alkylated Mo-CBP3. The light gray fill (peak 1) corresponds to the small chain; the dark gray fill (peak 2) corresponds to the large chain.

the length and amino acid composition observed between the large and small chains and among different isoforms of Mo-CBP3.

In addition to their amino acid composition, the isoforms of Mo-CBP3 differ in the number and arrangement of the disulfide bonds (Fig. 4) that stabilize their structure, according to the DiANNA 1.1 web server [37] and the previously solved three-dimensional structure of Mo-CBP3-4 (PDB ID: 5DOM) [9]. Most isoforms (Mo-CBP3-1, 2, 2A, 3, 3A, 3B and 4) have two intramolecular disulfide bonds, while Mo-CBP3-2B has only one (Cys146 is replaced by Arg146 relative to Mo-CBP3-2 and Mo-CBP3-2A). Likewise, seven isoforms have two intermolecular disulfide bonds, the exception being Mo-CBP3-3A, which has only one (Cys110 is replaced by Arg110 relative to Mo-CBP3-3 and Mo-CBP3-3B). Several other 2S albumins are reported to contain four disulfide bonds, being heterodimeric proteins stabilized by two intermolecular and two intramolecular disulfide bonds (napin, PDB ID: 1PNB, from Brassica napus [38] and mabinlin II, PDB ID: 2DS2, from Capparis masaikai [39]). Indeed, the arrangement in which two chains are cross-braced by four disulfide bonds is the predominant and conserved array throughout the 2S albumin family [40]. However, there are also cases in which 2S albumins are not processed into heterodimers and remain monomeric proteins that are nevertheless stabilized by four disulfide bonds (Ber e 1, PDB ID: 2LVF, from Bertholletia excelsa [41]; RicC3, PDB ID: 1PSY, from Ricinus communis [42]; and BnIb, PDB ID: 1SM7, from Brassica napus [42]).

Most isoforms of Mo-CBP3 have the classical arrangement of disulfide bonds observed for napin and mabinlin II; however, to our knowledge, this is the first report of 2S albumins stabilized by an unusual array of three disulfide bonds (Mo-CBP3-2B and Mo-CBP3-3A) rather than four. In order to understand the mass variation observed in the intact isoforms of Mo-CBP3, and taking into consideration the intricate array of disulfide bonds observed here, we decided to separate the protein into its two constituent chains by reduction and alkylation of cysteine residues before the mass spectrometry studies. The mixture was then applied to a reversed-phase C18 column and the retained peaks were eluted under a gradient of 14.81-50% acetonitrile (Fig. S11). Fractions corresponding to the small (RP-HPLC 1) and large (RP-HPLC 2) chains were independently applied to a Synapt G1 HDMS Q-ToF mass spectrometer, and the results of the mass analysis are summarized in Figs. 5 and 6. A complete in-depth analysis of the results is shown in the supplementary data (Supplementary Figs. S9 – S30).

Clusters of Mo-CBP3-3 isoforms

A total of 89 mass clusters were obtained for the small chain of Mo-CBP3. Of these, 20 clusters of peptides were identified as belonging to Mo-CBP3-1, 8 to Mo-CBP3-2, 7 to Mo-CBP3-2A and 2B, 30 to Mo-CBP3-3, 3A and 3B, and 7 to Mo-CBP3-4, while 17 clusters could not be assigned.

Mo-CBP3-2, 2A and 2B were included in the same group because of the high sequence similarity shared by these three isoforms, which differ in only one amino acid residue (Gln38 in Mo-CBP3-2 is replaced by Arg38 in Mo-CBP3-2A and 2B). In the same way, Mo-CBP3-3, 3A and 3B were grouped in a single cluster because the small chains of these three isoforms have exactly the same amino acid sequence; only the large chain differs among these proteins. A total of 57 mass clusters were attributed to peptides forming the large chain of Mo-CBP3 isoforms, and these clusters were subdivided into 7 groups as follows: 7 peptide clusters for Mo-CBP3-1, 15 for Mo-CBP3-2, 11 for Mo-CBP3-2A, 11 for Mo-CBP3-2B, 2 for Mo-CBP3-3 and 3B, 2 for Mo-CBP3-3A, 8 for Mo-CBP3-4 and 1 unassigned cluster. Mo-CBP3-3 and 3B, which form a single cluster, share exactly the same amino acid sequence in both their small and large chains. The difference between these two proteins was only observed when the full-length amino acid

Source: prepared by the author.

Figure 4. General scheme of disulfide bond formation in Mo-CBP3 isoforms. (A) The isoforms Mo-CBP3-1, 2, 2A, 3, 3B and 4 have two intramolecular and two intermolecular disulfide bonds; (B) the isoform Mo-CBP3-2B has one intramolecular and two intermolecular disulfide bonds; (C) the isoform Mo-CBP3-3A has two intramolecular and one intermolecular disulfide bond.

Peptide sequence | Monoisotopic mass | Predicted mass | Experimental mass | SD | ∆ Da | Variable modifications
Mo-CBP3-1 (39-QRC ... PME-69), 31 aa | 3781.89 | 3911.9278 | 3911.2564 | 0.1680 | 0.6714 | 1x Oxi
Mo-CBP3-1 (38-QQR ... PME-69), 32 aa | 3909.95 | 4086.9327 | 4086.8108 | 0.0000 | 0.1219 | pGln and 1x Pho
Mo-CBP3-1 (38-QQR ... MED-70), 33 aa | 4024.98 | 4218.9893 | 4219.2536 | 0.0684 | 0.2644 | 1x Pho
Mo-CBP3-1 (38-QQR ... MED-72), 35 aa | 4253.09 | 4367.1329 | 4367.0242 | 0.5719 | 0.1087 | -
Mo-CBP3-2 (39-QRC ... PLD-69), 31 aa | 3795.97 | 3926.0078 | 3925.1062 | 0.0404 | 0.9016 | 1x Hyd
Mo-CBP3-2 (39-QRC ... LDE-70), 32 aa | 3925.01 | 4038.0213 | 4038.3741 | 0.0248 | 0.3528 | pGln and 1x Hyd
Mo-CBP3-2 (37-QQQ ... LDE-70), 34 aa | 4181.13 | 4278.1464 | 4279.4971 | 0.0000 | 1.3507 | pGln
Mo-CBP3-2 (37-QQQ ... DEV-71), 35 aa | 4280.20 | 4393.2113 | 4393.4531 | 0.1126 | 0.2418 | pGln and 1x Hyd
Mo-CBP3-2A/2B (38-RQR ... DEV-71), 34 aa | 4081.18 | 4391.1342 | 4390.1842 | 0.0000 | 0.9500 | 1x Pho and 1x Hyd
Mo-CBP3-2A/2B (38-RQR ... DEV-71), 34 aa | 4081.18 | 4374.1893 | 4373.8339 | 0.1999 | 0.3554 | 1x Pho
Mo-CBP3-2A/2B (37-QRQ ... DEV-71), 35 aa | 4308.24 | 4438.2778 | 4438.7791 | 0.1749 | 0.5012 | 1x Hyd
Mo-CBP3-2A/2B (37-QRQ ... EVE-72), 36 aa | 4437.29 | 4534.3064 | 4536.0366 | 0.0000 | 1.7302 | pGln
Mo-CBP3-3/3A/3B (44-QCR ... ALE-73), 30 aa | 3584.85 | 3761.8403 | 3761.9399 | 0.0000 | 0.0996 | pGln and 1x Pho
Mo-CBP3-3/3A/3B (44-QCR ... ALE-73), 30 aa | 3584.85 | 3778.8669 | 3780.6914 | 0.0888 | 1.8246 | 1x Pho
Mo-CBP3-3/3A/3B (43-QQC ... ALE-73), 31 aa | 3712.91 | 3826.9590 | 3826.5420 | 0.1072 | 0.4170 | -
Mo-CBP3-3/3A/3B (37-QQG ... LED-74), 36 aa | 4410.23 | 4524.2729 | 4521.3341 | 0.0226 | 2.9387 | -
Mo-CBP3-4 (39-QRC ... EDV-71), 33 aa | 4010.00 | 4140.0378 | 4139.7871 | 0.0625 | 0.2507 | 1x Oxi
Mo-CBP3-1 (37-QQQ ... PME-69), 33 aa | 4052.02 | 4324.9639 | 4324.6920 | 0.1290 | 0.2719 | 1x Oxi and 2x Pho
Mo-CBP3-4 (37-QQQ ... PME-69), 33 aa | 4052.02 | 4246.0293 | 4245.6079 | 0.0000 | 0.4214 | 1x Pho
Mo-CBP3-4 (37-QQQ ... MED-70), 34 aa | 4167.05 | 4344.0327 | 4343.1727 | 0.0927 | 0.8600 | pGln and 1x Pho

Source: prepared by the author.

Table 1. Assignment of experimental molecular masses, as determined by ESI-MS, to amino acid sequences of the Mo-CBP3 small chain. The numbers flanking each peptide are the positions of its first and last residues. Monoisotopic mass values were calculated from each amino acid sequence. Predicted mass values were obtained from the monoisotopic masses by including the mass changes due to fixed and variable modifications. SD - Standard deviation.

Monoisotopic Predicted Experimental Peptide sequence SD ∆ Da Variable modifications mass mass mass 92 160 Mo-CBP3.1 ( RPP ... RQQ )69 aa 7912.95 8335.0451 8334.7607 0.0000 0.2844 1x Pho 91 160 Mo-CBP3.1 ( RRP ... RQQ )70 aa 8069.05 8427.1737 8426.5811 0.0000 0.5926 1x Oxi 90 160 Mo-CBP3.1 ( ARR ... RQQ )71 aa 8140.09 8530.2035 8529.4355 0.0000 0.7680 2 x Oxi and 1x Hyd 91 160 Mo-CBP3.1 ( RRP ... RQQ )70 aa 8069.05 8619.0962 8618.6650 0.0000 0.4312 2x Oxi, 2x Pho and 1x Hyd

89 158 Mo-CBP3.2 ( GRQ ... RQQ )71 aa 8050.94 8409.0637 8408.6855 0.0000 0.3782 1x Oxi 89 158 Mo-CBP3.2 ( GRQ ... RQQ )71 aa 8050.94 8473.0433 8472.3204 0.3267 0.7229 1x Pho 88 158 Mo-CBP3.2 ( PGR ... RQQ )71 aa 8147.99 8490.1188 8490.4727 0.0000 0.3539 89 158 Mo-CBP3.2 ( GRQ ... RQQ )70 aa 8050.94 8505.0249 8505.3010 0.1514 0.2761 2x Oxi and 1x Pho

87 158 Mo-CBP3.2A ( GPG ... RQQ )72 aa 8188.98 8531.1088 8531.8895 0.2543 0.7808 88 158 Mo-CBP3.2A ( PGR ... RQQ )71 aa 8131.96 8538.0684 8538.0788 0.5254 0.0104 3x Oxi and 1x Hyd 88 158 Mo-CBP3.2A ( PGR ... RQQ )71 aa 8131.96 8554.0551 8553.9866 0.3301 0.0685 1x Pho 87 158 Mo-CBP3.2A ( GPG ...RQQ )72 aa 8188.98 8834.9874 8834.7402 0.0000 0.2472 3x Oxi, 3x Pho and 2x Hyd

89 158 Mo-CBP3.2B ( GRQ ... RQQ )70 aa 8088.00 8531.9076 8532.3174 0.4674 0.4098 2x Pho 89 158 Mo-CBP3.2B ( GRQ ... RQQ )70 aa 8088.00 8549.0349 8547.4759 0.6474 1.5590 1x Oxi and 2x Pho 87 158 Mo-CBP3.2B ( GPG ... RQQ )71 aa 8242.07 8591.1569 8590.3839 0.3944 0.7730 3x Oxi and 1x Hyd 89 158 Mo-CBP3.2B ( GRQ ... RQQ )73 aa 8088.00 8660.9910 8661.4876 2.0690 0.4966 3x Oxi and 3x Pho

91 160 Mo-CBP3.3/3B ( ARR ... GQQ )70 aa 8022.07 8460.1600 8460.7627 0.0000 0.6027 1x Oxi and 1x Pho 92 160 Mo-CBP3.3/3B ( RRP ... GQQ )69 aa 7951.03 8485.0812 8485.0348 0.1823 0.0464 2x Oxi and 2x Pho

92 160 Mo-CBP3.3A ( RRP ... GQQ )69 aa 7905.05 8382.0798 8381.7925 0.2013 0.2873 2x Oxi and 1x Pho 91 160 Mo-CBP3.3A ( ARR ... GQQ )70 aa 7976.08 8398.1751 8397.4561 0.0000 0.7190 1x Pho

90 160 Mo-CBP3.4 ( ARR ... RQQ )71 aa 8156.12 8498.2488 8498.6921 0.2299 0.4433 90 160 Mo-CBP3.4 ( ARR .... RQQ )71 aa 8156.12 8514.2437 8514.5244 0.0000 0.2807 1x Oxi 91 160 Mo-CBP3.4 ( RRP ... RQQ )70 aa 8085.08 8603.1363 8604.6192 0.4706 1.4829 1x Oxi and 2x Pho 91 160 Mo-CBP3.4 ( RRP ... RQQ )70 aa 8085.08 8667.1160 8667.5136 0.4063 0.3976 3x Pho Fonte: elaborada pelo autor.

Table 2. Assignment of experimental molecular masses, as determined by ESI-MS, to amino acid sequences of Mo-CBP3 large chain. Monoisotopic mass values were those calculated from each amino acid sequence. Predicted mass values were obtained from the monoisotopic masses, including mass changes due to fixed and variable modifications. SD - Standard deviation.
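As a quick consistency check on Table 2, the ∆ Da column can be recomputed as the absolute difference between the predicted and the experimental mass. The sketch below is illustrative only and is not part of the original analysis: the function name `delta_da` is a hypothetical helper, and the four rows are values copied from the table.

```python
def delta_da(predicted: float, experimental: float) -> float:
    """Absolute difference (Da) between predicted and experimental mass."""
    return round(abs(predicted - experimental), 4)


# (isoform, predicted mass, experimental mass, reported delta), from Table 2
ROWS = [
    ("Mo-CBP3.1", 8335.0451, 8334.7607, 0.2844),
    ("Mo-CBP3.2", 8409.0637, 8408.6855, 0.3782),
    ("Mo-CBP3.2B", 8549.0349, 8547.4759, 1.5590),
    ("Mo-CBP3.4", 8603.1363, 8604.6192, 1.4829),
]

for isoform, predicted, experimental, reported in ROWS:
    computed = delta_da(predicted, experimental)
    # Flag any row whose recomputed delta disagrees with the reported one.
    status = "ok" if abs(computed - reported) < 5e-4 else "check"
    print(f"{isoform}: computed {computed:.4f} Da, "
          f"reported {reported:.4f} Da ({status})")
```

The same comparison could be run over every row of the table to spot transcription errors in the predicted or experimental columns.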


sequence, rather than the fully mature polypeptide, was analyzed, and this difference consisted of the replacement of Glu81 in Mo-CBP3-3 by Gly81 in Mo-CBP3-3B, in the linker region. The apparent discordance between the total number of clusters found for both chains and the number of expected isoforms (eight) suggests that Mo-CBP3 undergoes post-translational modifications, an event commonly reported for other proteins, including members of the 2S albumin group such as mabinlin II [43], napin [13], the 2S albumin-type peanut allergen [44] and the 2S albumin from cacao [45].

Post-translational modifications (PTMs) mapping

Five classes of post-translational modifications (PTMs) were considered to explain the extraordinary number of Mo-CBP3 isoform variants found in this work: carboxyamidomethylcysteine formation, pyroglutamic acid formation, methionine oxidation, phosphorylation and hydroxylation. The reduction and alkylation of cysteine residues inevitably leads to carboxyamidomethylcysteine formation, a well-known fixed modification that affects Cys residues. Since all Mo-CBP3 isoforms contain Cys residues, some of which are involved in disulfide bonds, all the peptides under analysis were considered to be prone to this modification. It is important to note that this type of PTM is a side effect of sample preparation rather than a naturally occurring biological event. Pyroglutamic acid formation is a variable PTM reported to result from the enzymatic or spontaneous cyclization of N-terminal Gln residues [46]. Some Gln residues were found to be affected by this modification, whereas others were not. Regarding methionine oxidation, all Met residues were considered potential sites for this PTM; however, only small-chain peptides belonging to the Mo-CBP3-1 and Mo-CBP3-4 isoforms were detected as oxidized, at Met68 in both cases. A more detailed analysis and discussion of all these modifications is presented in the next section.

According to the NetPhos 3.1 server [47], only Ser and Thr residues were putative phosphorylation sites in the small chain of the Mo-CBP3 isoforms. The sites identified were Ser47 and Ser62 (Mo-CBP3-1), Thr62 (Mo-CBP3-2, 2A and 2B), Thr66 (Mo-CBP3-3, 3A and 3B), and Thr47 and Ser62 (Mo-CBP3-4). In the large chain, on the other hand, no phosphorylation sites were found at Thr or Tyr residues (Supplementary Table S4), and the putatively phosphorylated Ser residues were Ser107, Ser114, Ser121 and Ser142 (Mo-CBP3-1); Ser105, Ser112 and Ser140 (Mo-CBP3-2, 2A and 2B); Ser114, Ser142 and Ser154 (Mo-CBP3-3, 3A and 3B); and Ser107, Ser114, Ser121 and Ser142 (Mo-CBP3-4).

The iHyd-PseAAC server [48] predicted that only Pro residues were prone to hydroxylation in all Mo-CBP3 isoforms. In the small chain, the only site was Pro67 (Mo-CBP3-1, Mo-CBP3-2, 2A, 2B and Mo-CBP3-4); no hydroxylation sites were identified in the small chain of the Mo-CBP3-3, 3A and 3B isoforms. In the large chain, all isoforms presented some degree of hydroxylation, at the following sites: Pro108, Pro145 and Pro152 (Mo-CBP3-1 and Mo-CBP3-4); Pro106, Pro107 and Pro150 (Mo-CBP3-2, 2A and 2B); Pro108 and Pro154 (Mo-CBP3-3 and 3B); and Pro152 (Mo-CBP3-3A).
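Residue labels such as Ser62 or Pro67 refer to positions in the precursor sequence, not to positions within the isolated chain. The following minimal sketch shows how such labels can be derived from a peptide and the number of its first residue. The helper `residue_labels` and the constant names are hypothetical, and the peptide is the Mo-CBP3-1 small chain (residues 38-72) quoted in the detailed analysis of PTMs. The code only lists candidate residues; it does not reproduce the NetPhos or iHyd-PseAAC predictions.

```python
# Three-letter codes for the residue types discussed in this section.
THREE_LETTER = {
    "C": "Cys", "M": "Met", "P": "Pro",
    "Q": "Gln", "S": "Ser", "T": "Thr",
}

# Small chain of Mo-CBP3-1; its first residue is number 38 in the precursor.
SMALL_CHAIN_MO_CBP3_1 = "QQRCRHQFQSQQRLRACQRVIRRWSQGGGPMEDVE"
FIRST_POSITION = 38


def residue_labels(sequence: str, first_position: int, residues: str) -> list:
    """Return labels such as 'Ser62' for every residue of the requested types."""
    return [
        f"{THREE_LETTER[aa]}{first_position + offset}"
        for offset, aa in enumerate(sequence)
        if aa in residues
    ]


print(residue_labels(SMALL_CHAIN_MO_CBP3_1, FIRST_POSITION, "C"))   # ['Cys41', 'Cys54']
print(residue_labels(SMALL_CHAIN_MO_CBP3_1, FIRST_POSITION, "ST"))  # ['Ser47', 'Ser62']
print(residue_labels(SMALL_CHAIN_MO_CBP3_1, FIRST_POSITION, "PM"))  # ['Pro67', 'Met68']
```

For this peptide the candidate Ser, Pro and Met positions coincide with the sites reported for the Mo-CBP3-1 small chain.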

Detailed analysis of PTMs in Mo-CBP3 isoforms

The formation of carboxyamidomethylcysteine increases the monoisotopic mass of the molecule under analysis by 57.02146 Da per modified Cys. This fixed modification was observed in all peptides analyzed, including the small and the large chains of all Mo-CBP3 isoforms. For example, the monoisotopic mass of the peptide 38QQRCRHQFQSQQRLRACQRVIRRWSQGGGPMEDVE72, corresponding to the small chain of Mo-CBP3-1, was expected to be 4253.09 Da according to the ExPASy Compute pI/Mw tool [49]. The experimental mass obtained for this peptide by LC-ESI-MS analysis was 4367.0242 Da. Considering that the two Cys residues in this peptide (Cys41 and Cys54) are modified to carboxyamidomethylcysteine, with a mass increment of 114.0429 Da (+57.02146 Da for each Cys), the predicted mass is 4367.1329 Da. The difference between the predicted and the experimental masses was 0.1087 Da, which supports the reliability of the assignment. In some cases, carboxyamidomethylcysteine formation was accompanied by other PTMs, such as the enzymatic or spontaneous cyclization of Gln residues to pyroglutamic acid. The peptide 37QQQRCRQQFQTHQRLRACQRFIRRRTQGGGPLDE70, corresponding to the small chain of Mo-CBP3-2, had a monoisotopic mass of 4181.13 Da and an experimental mass of 4279.4971 Da. The difference between the monoisotopic and the experimental masses was explained by two modifications: 1) carboxyamidomethylcysteine formation (Cys41 and Cys54, mass variation of +114.0429 Da)

Source: prepared by the author.

Figure 5. General scheme of a mass spectrum showing the different masses corresponding to the small chain of Mo-CBP3 isoforms.

Source: prepared by the author.

Figure 6. General scheme of a mass spectrum showing the different masses corresponding to the small chain of Mo-CBP3 isoforms.

and 2) pyroglutamic acid formation (Gln37, mass variation of -17.02655 Da). When these two modifications are considered, the predicted mass is 4278.1464 Da. Comparing the predicted (4278.1464 Da) with the experimental mass (4279.4971 Da), the difference is only 1.3507 Da. Oxidation, phosphorylation and hydroxylation increase the mass of a polypeptide by 15.99491, 79.96633 and 15.99491 Da per modified residue, respectively. These PTMs were validated by the same procedure used for the carboxyamidomethylcysteine and pyroglutamic acid modifications discussed above. The peptide 39QRCRHQFQSQQRLRACQRVIRRWSQGGGPMEDV71 belongs to the small chain of Mo-CBP3-1 and carries all the kinds of modification evaluated in this work. The monoisotopic mass calculated for this peptide is 3995.99 Da, whereas the experimental mass was 4204.7363 Da, a difference of 208.7463 Da. This peptide contains two Cys residues (Cys41 and Cys54), which contribute +114.0429 Da due to carboxyamidomethylcysteine formation. In addition, the Gln residue at its N-terminal end (Gln39) can cyclize to pyroglutamic acid, with a loss of 17.02655 Da. Considering also the possible oxidation of Met68 (+15.99491 Da), phosphorylation of Ser62 (+79.96633 Da) and hydroxylation of Pro67 (+15.99491 Da), the predicted mass for this peptide is 4204.9625 Da. The difference of 0.2262 Da between the predicted and the experimental masses strongly suggests that the PTMs presented here explain the deviation of the observed mass from the calculated monoisotopic mass. The same procedure was employed for all 89 and 58 peptide clusters (small and large chains, respectively) of the Mo-CBP3 isoforms (Supplementary Tables S5-S16).
Taken together, these results demonstrate the robustness of the analysis.
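The arithmetic behind the three worked examples can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions, not the software used in the thesis: the dictionary `PTM_DELTA` and the function `predicted_mass` are hypothetical names, while the mass shifts, monoisotopic masses and experimental masses are the values quoted in the text.

```python
# Monoisotopic mass shifts (Da) per modified residue, as quoted in the text.
PTM_DELTA = {
    "carboxyamidomethyl": 57.02146,  # alkylated Cys (fixed modification)
    "pyroglutamate": -17.02655,      # cyclization of N-terminal Gln
    "oxidation": 15.99491,           # Met
    "phosphorylation": 79.96633,     # Ser or Thr
    "hydroxylation": 15.99491,       # Pro
}


def predicted_mass(monoisotopic: float, ptm_counts: dict) -> float:
    """Add the mass shift of each modification to the unmodified peptide mass."""
    return monoisotopic + sum(
        PTM_DELTA[name] * count for name, count in ptm_counts.items()
    )


# (peptide, monoisotopic mass, modifications, experimental mass)
EXAMPLES = [
    ("Mo-CBP3-1 small chain, 38-72", 4253.09,
     {"carboxyamidomethyl": 2}, 4367.0242),
    ("Mo-CBP3-2 small chain, 37-70", 4181.13,
     {"carboxyamidomethyl": 2, "pyroglutamate": 1}, 4279.4971),
    ("Mo-CBP3-1 small chain, 39-71", 3995.99,
     {"carboxyamidomethyl": 2, "pyroglutamate": 1, "oxidation": 1,
      "phosphorylation": 1, "hydroxylation": 1}, 4204.7363),
]

for peptide, mono, ptms, experimental in EXAMPLES:
    predicted = predicted_mass(mono, ptms)
    print(f"{peptide}: predicted {predicted:.4f} Da, "
          f"difference {abs(predicted - experimental):.4f} Da")
```

Run as written, the loop reports predicted masses of 4367.1329, 4278.1464 and 4204.9625 Da and differences of 0.1087, 1.3507 and 0.2262 Da, matching the values given above.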

The Biological Meaning of PTM on Mo-CBP3 isoforms

PTMs are part of an intricate mechanism by which living cells modify proteins, significantly expanding their proteomic arsenal and allowing fast responses at comparatively low energetic cost [2]. As observed in this work, the 2S albumin storage protein from M. oleifera seeds, Mo-CBP3, exists in at least eight forms that diverge slightly in amino acid composition. In addition to this variation, which may be a consequence of genetic events such as gene duplication or even mRNA editing [50;51], the two main chains that constitute the mature Mo-CBP3 polypeptide are subjected to a multitude of PTMs, as identified by MS analysis. This variety of PTMs may result in an explosion of possible isoforms, and the cross-talk among isoforms of the same protein can coordinately determine its activity and protein-protein interactions [2]. Beyond the great diversity of PTMs found on the small and large chains, another intriguing finding of the mass spectrometry analysis was that a specific peptide, such as 39QRCRHQFQSQQRLRACQRVIRRWSQGGGPMEDVE72 (Mo-CBP3-1 small chain), may be present with different types of PTMs. In one case, this peptide was found with glutamine cyclization (Gln39), methionine oxidation (Met68) and proline hydroxylation (Pro67), resulting in a mass of 4253.9771 Da. In another, the same peptide presented an experimental mass of 4350.8582 Da, which is accounted for by serine phosphorylation (Ser62) instead of glutamine cyclization, in addition to the other two modifications. This finding is in agreement with the observation that PTMs are reversible and that, therefore, the same amino acid residue can alternate among different modification states. Events such as cellular metabolism, protein folding or protein-protein interactions are possible reasons for this. Furthermore, it is known that a specific PTM may influence the occurrence of another PTM [2].

As a typical 2S albumin, Mo-CBP3 is fundamentally a storage protein, acting primarily as a source of nitrogen and sulfur for the developing seedling during germination and growth, as reported for other 2S albumins such as napin from B. napus and arabin from Arabidopsis thaliana [40]. Beyond this well-documented function as a source of stored energy for nutrition and development, unusual activities have been attributed to this class of proteins in recent years, including the capacity to degrade DNA and RNA molecules. The 2S albumin from pumpkin presented potent RNase activity against RNA isolated from yeast and DNase activity against the plasmid pBR322 [52;53]. The 2S albumin from Putranjiva roxburghii, called putrin, was likewise able to degrade both RNA from a human colonic carcinoma cell line and the plasmid pBR322 [54].

A similar case of DNase activity was reported for Mo-CBP2 (another 2S albumin from M. oleifera seeds), which effectively digested the plasmid pUC18 [55]. RNase and DNase activities are uncommon and unrelated to the well-described storage function of 2S albumins, and we speculate that these activities may be related, among other reasons, to the diversity of isoforms these proteins may assume because of PTMs, as detected for Mo-CBP3. However, additional studies addressing the specific implications of PTMs for the activities of Mo-CBP3 should be conducted.

Conclusions

This work demonstrated that the 2S albumins from M. oleifera seeds are a complex group of proteins composed not of four isoforms, as previously described, but of eight variants. The four new isoforms are closely related to Mo-CBP3-2 and Mo-CBP3-3. In addition, mass spectrometry analysis revealed for the first time that the two polypeptide chains that constitute the mature protein are subjected to different types of PTMs, revealing an extraordinary number of variants, which we hypothesize could be part of a diversification process in the proteomic repertoire of the M. oleifera plant to regulate cellular homeostasis.

Acknowledgements

This work was supported by research grants from Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) and Fundação Cearense de Apoio ao Desenvolvimento Científico e Tecnológico (FUNCAP). JECF was the recipient of doctoral fellowships from CAPES and CNPq. TBG is a senior researcher (PQ-2) of CNPq.

References

[1] J.E.C. Freire, I.M. Vasconcelos, F.B.M.B. Moreno, A.B. Batista, M.D.P. Lobo, M.L. Pereira, J.P.M.S. Lima, R.V.M. Almeida, A.J.S. Sousa, A.C.O. Monteiro-Moreira, J.T.A. Oliveira, T.B. Grangeiro, Mo-CBP3, an antifungal chitin-binding protein from Moringa oleifera seeds, is a member of the 2S albumin family, PLoS One. 10 (2015) 1–24. doi:10.1371/journal.pone.0119871.
[2] G. Friso, K.J. van Wijk, Update: Post-translational protein modifications in plant metabolism, Plant Physiol. 169 (2015) pp.01378.2015. doi:10.1104/pp.15.01378.
[3] S. De la Cruz, I.M. López-Calleja, M. Alcocer, I. González, R. Martín, T. García, TaqMan real-time PCR assay for detection of traces of Brazil nut (Bertholletia excelsa) in food products, Food Control. 33 (2013) 105–113. doi:10.1016/j.foodcont.2013.01.053.
[4] M. Hummel, T. Wigger, J. Brockmeyer, Characterization of mustard 2S albumin allergens by bottom-up, middle-down, and top-down proteomics: A consensus set of isoforms of Sin a 1, J. Proteome Res. 14 (2015) 1547–1556. doi:10.1021/pr5012262.
[5] I. Boualeg, A. Boutebba, Purification of water soluble proteins (2S albumins) extracted from peanut defatted flour and isolation of their isoforms by gel filtration and anion exchange chromatography, Sci. Study Res. 18 (2017) 135–143.
[6] D. Pantoja-Uceda, O. Palomares, M. Bruix, M. Villalba, R. Rodríguez, M. Rico, J. Santoro, Solution structure and stability against digestion of rproBnIb, a recombinant 2S albumin from rapeseed: Relationship to its allergenic properties, Biochemistry. 43 (2004) 16036–16045. doi:10.1021/bi048069x.
[7] J.M. Gifoni, J.T.A. Oliveira, H.P.H.D. Oliveira, A.B. Batista, M.L. Pereira, A.S. Gomes, T.B. Grangeiro, I.M. Vasconcelos, A novel chitin-binding protein from Moringa oleifera seed with potential for plant disease control, Biopolymers. 98 (2012) 406–415. doi:10.1002/bip.22068.
[8] A.B. Batista, J.T.A. Oliveira, J.M. Gifoni, M.L. Pereira, M.G.G. Almeida, V.M. Gomes, M. Da Cunha, S.F.F. Ribeiro, G.B. Dias, L.M. Beltramini, J.L.S. Lopes, T.B. Grangeiro, I.M. Vasconcelos, New Insights into the Structure and Mode of Action of Mo-CBP3, an Antifungal Chitin-Binding Protein of Moringa oleifera Seeds, PLoS One. 9 (2014) e111427. doi:10.1371/journal.pone.0111427.
[9] A. Ullah, R.B. Mariutti, R. Masood, I.P. Caruso, G.H. Gravatim Costa, C. Millena De Freita, C.R. Santos, L.M. Zanphorlin, M.J. Rossini Mutton, M.T. Murakami, R.K. Arni, Crystal structure of mature 2S albumin from Moringa oleifera seeds, Biochem. Biophys. Res. Commun. 468 (2015) 365–371. doi:10.1016/j.bbrc.2015.10.087.
[10] S.M. Ribeiro, R.G. Almeida, C.A.A. Pereira, J.S. Moreira, M.F.S. Pinto, A.C. Oliveira, I.M. Vasconcelos, J.T.A. Oliveira, M.O. Santos, S.C. Dias, O.L. Franco, Identification of a Passiflora alata Curtis dimeric peptide showing identity with 2S albumins, Peptides. 32 (2011) 868–874. doi:10.1016/j.peptides.2010.10.011.
[11] P.F.N. Souza, I.M. Vasconcelos, F.D.A. Silva, F.B. Moreno, A.C.O. Monteiro-Moreira, L.M.R. Alencar, S.G. Abreu, J.S. Sousa, J.T.A. Oliveira, A 2S albumin from the seed cake of Ricinus communis inhibits trypsin and has strong antibacterial activity against human pathogenic bacteria, J. Nat. Prod. 79 (2016) 2423–31. doi:10.1021/acs.jnatprod.5b01096.
[12] Y. Kawagoe, K. Suzuki, M. Tasaki, H. Yasuda, K. Akagi, E. Katoh, The Critical Role of Disulfide Bond Formation in Protein Sorting in the Endosperm of Rice, Plant Cell. 17 (2005) 1141–1153. doi:10.1105/tpc.105.030668.
[13] P.M. Gehrig, A. Krzyzaniak, J. Barciszewski, K. Biemann, Mass spectrometric amino acid sequencing of a mixture of seed storage proteins (napin) from Brassica napus, products of a multigene family, Proc. Natl. Acad. Sci. U. S. A. 93 (1996) 3647–3652. doi:10.1073/pnas.93.8.3647.
[14] S. Pfeifer, M. Bublin, P. Dubiela, K. Hummel, J. Wortmann, G. Hofer, W. Keller, C. Radauer, K. Hoffmann-Sommergruber, Cor a 14, the allergenic 2S albumin from hazelnut, is highly thermostable and resistant to gastrointestinal digestion, Mol. Nutr. Food Res. 59 (2015) 2077–2086. doi:10.1002/mnfr.201500071.
[15] P. Beltrao, P. Bork, N.J. Krogan, V. van Noort, Evolution and functional cross-talk of protein post-translational modifications, Mol. Syst. Biol. 9 (2013) 1–13.
[16] N.Y. Liu, H.H. Lee, Z.F. Chang, Y.G. Tsay, Examination of segmental average mass spectra from liquid chromatography-tandem mass spectrometric (LC-MS/MS) data enables screening of multiple types of protein modifications, Anal. Chim. Acta. 892 (2015) 115–122. doi:10.1016/j.aca.2015.07.032.
[17] J.S. Cottrell, Protein identification using MS/MS data, J. Proteomics. 74 (2011) 1842–1851. doi:10.1016/j.jprot.2011.05.014.
[18] S.A.J. Warner, Genomic DNA isolation and lambda library construction, in: G.D. Foster, D. Twell (Eds.), Plant Gene Isol. Princ. Pr. John Wiley Sons, West Sussex, 1996: pp. 51–73.
[19] J. Sambrook, T. E. Fritsch, Molecular Cloning: A Laboratory Manual, in: 2nd Ed., Cold Spring Harb. Lab. Press. Cold Spring Harb., 1989.
[20] P.Y. Lee, J. Costumbrado, C.-Y. Hsu, Y.H. Kim, Agarose gel electrophoresis for the separation of DNA fragments, J. Vis. Exp. 62 (2012) 1–5. doi:10.3791/3923.
[21] J. Sambrook, E. Fritsch, T. Maniatis, Molecular Cloning: A Laboratory Manual. 2nd ed. Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 1989.
[22] T.J. Gibson, J.E. Sulston, Preparation of large numbers of plasmid DNA samples in microtiter plates by the alkaline lysis method, Gene Anal. Tech. 4 (1987) 41–44. doi:10.1016/0735-0651(87)90016-1.
[23] S.F. Altschul, T.L. Madden, A.A. Schäffer, J. Zhang, Z. Zhang, W. Miller, D.J. Lipman, Gapped BLAST and PSI-BLAST: A new generation of protein database search programs, Nucleic Acids Res. 25 (1997) 3389–3402. doi:10.1093/nar/25.17.3389.
[24] A. Marchler-Bauer, Y. Bo, L. Han, J. He, C.J. Lanczycki, S. Lu, F. Chitsaz, M.K. Derbyshire, R.C. Geer, N.R. Gonzales, M. Gwadz, D.I. Hurwitz, F. Lu, G.H. Marchler, J.S. Song, N. Thanki, Z. Wang, R.A. Yamashita, D. Zhang, C. Zheng, L.Y. Geer, S.H. Bryant, CDD/SPARCLE: Functional classification of proteins via subfamily domain architectures, Nucleic Acids Res. 45 (2017) D200–D203. doi:10.1093/nar/gkw1129.
[25] K. Tamura, G. Stecher, D. Peterson, A. Filipski, S. Kumar, MEGA6: Molecular evolutionary genetics analysis version 6.0, Mol. Biol. Evol. 30 (2013) 2725–2729. doi:10.1093/molbev/mst197.
[26] R.C. Edgar, MUSCLE: Multiple sequence alignment with high accuracy and high throughput, Nucleic Acids Res. 32 (2004) 1792–1797. doi:10.1093/nar/gkh340.
[27] T. Hall, BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT, Nucleic Acids Symp. Ser. 41 (1999) 95–98. doi:citeulike-article-id:691774.
[28] F. Sievers, D.G. Higgins, Clustal Omega, accurate alignment of very large numbers of sequences, in: Russell D. Mult. Seq. Alignment Methods. Methods Mol. Biol. (Methods Protoc. Vol 1079. Humana Press. Totowa, NJ, 2014. http://www.springer.com/us/book/9781627036450?wt_mc=ThirdParty.SpringerLink.3.EPR653.About_eBook.
[29] H. Schägger, Tricine-SDS-PAGE, Nat. Protoc. 1 (2006) 16–22. doi:10.1038/nprot.2006.4.
[30] A. Shevchenko, M. Wilm, O. Vorm, M. Mann, Mass spectrometric sequencing of proteins from silver-stained polyacrylamide gels, Anal. Chem. 68 (1996) 850–858. doi:10.1021/ac950914h.
[31] T.N. Petersen, S. Brunak, G. von Heijne, H. Nielsen, SignalP 4.0: discriminating signal peptides from transmembrane regions, Nat. Methods. 8 (2011) 785–786. doi:10.1038/nmeth.1701.
[32] G. Von Heijne, Patterns of amino acids near signal-sequence cleavage sites, Eur. J. Biochem. 133 (1983) 17–21.
[33] S. Oguri, M. Kamoshida, Y. Nagata, Y.S. Momonoki, H. Kamimura, Characterization and sequence of tomato 2S seed albumin: a storage protein with sequence similarities to the fruit lectin, Planta. 216 (2003) 976–984. doi:10.1007/s00425-002-0950-y.
[34] T.C. Jyothi, S. Sinha, S.A. Singh, A. Surolia, A.G. Appu Rao, Napin from Brassica juncea: thermodynamic and structural analysis of stability, Biochim. Biophys. Acta. 1774 (2007) 907–919. doi:10.1016/j.bbapap.2007.04.008.
[35] V.V. Do Nascimento, H.C. Castro, P.A. Abreu, A. Elenir, A. Oliveira, J.H. Fernandez, J. Da Silva Araújo, O.L.T. Machado, In silico structural characteristics and α-amylase inhibitory properties of Ric c 1 and Ric c 3, allergenic 2S albumins from Ricinus communis seeds, J. Agric. Food Chem. 59 (2011) 4814–4821. doi:10.1021/jf104638b.
[36] T.G. Costa, O.L. Franco, L. Migliolo, S.C. Dias, Identification of a novel 2S albumin with antitryptic activity from Caryocar brasiliense seeds, J. Agric. Sci. 7 (2015) 197–206. doi:10.5539/jas.v7n6p197.
[37] F. Ferrè, P. Clote, DiANNA 1.1: An extension of the DiANNA web server for ternary cysteine classification, Nucleic Acids Res. 34 (2006) 182–185. doi:10.1093/nar/gkl189.
[38] M. Rico, M. Bruix, C. González, R.I. Monsalve, R. Rodríguez, 1H NMR assignment and global fold of napin BnIb, a representative 2S albumin seed protein, Biochemistry. 35 (1996) 15672–15682. doi:10.1021/bi961748q.
[39] D.F. Li, P. Jiang, D.Y. Zhu, Y. Hu, M. Max, D.C. Wang, Crystal structure of Mabinlin II: A novel structural type of sweet proteins and the main structural basis for its sweetness, J. Struct. Biol. 162 (2008) 50–62. doi:10.1016/j.jsb.2007.12.007.
[40] B. Franke, A.M. James, M. Mobli, M.L. Colgrave, J.S. Mylne, K.J. Rosengren, Two proteins for the price of one: Structural studies of the dual-destiny protein preproalbumin with sunflower trypsin inhibitor-1, J. Biol. Chem. 292 (2017) 12398–12411. doi:10.1074/jbc.M117.776955.
[41] L. Rundqvist, T. Tengel, J. Zdunek, E. Björn, J. Schleucher, M.J.C. Alcocer, G. Larsson, Solution Structure, Copper Binding and Backbone Dynamics of Recombinant Ber e 1-The Major Allergen from Brazil Nut, PLoS One. 7 (2012). doi:10.1371/journal.pone.0046435.
[42] D. Pantoja-Uceda, M. Bruix, G. Giménez-Gallego, M. Rico, J. Santoro, Solution Structure of RicC3, a 2S Albumin Storage Protein from Ricinus communis, Biochemistry. 42 (2003) 13839–13847. doi:10.1021/bi0352217.
[43] S. Nirasawa, Y. Masuda, K. Nakaya, Y. Kurihara, Cloning and sequencing of a cDNA encoding a heat-stable sweet protein, mabinlin II, Gene. 181 (1993) 225–7.
[44] K. Lehmann, K. Schweimer, G. Reese, S. Randow, M. Suhr, W.-M. Becker, S. Vieths, P. Rösch, Structure and stability of 2S albumin-type peanut allergens: implications for the severity of peanut allergic reactions, Biochem. J. 395 (2006) 463–472. doi:10.1042/BJ20051728.
[45] N.M. Zaini, A. Awang, C. Budiman, K.F. Rodrigues, Single step purification of 2S albumin from Theobroma cacao, Int. J. Adv. Appl. Sci. 4 (2017) 57–61.
[46] A. Kumar, A.K. Bachhawat, Pyroglutamic acid: Throwing light on a lightly studied metabolite, Curr. Sci. 102 (2012) 288–297. doi:10.2307/24083854.
[47] N. Blom, T. Sicheritz-Pontén, R. Gupta, S. Gammeltoft, S. Brunak, Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence, Proteomics. 4 (2004) 1633–1649. doi:10.1002/pmic.200300771.
[48] Y. Xu, X. Wen, X.J. Shao, N.Y. Deng, K.C. Chou, iHyd-PseAAC: Predicting hydroxyproline and hydroxylysine in proteins by incorporating dipeptide position-specific propensity into pseudo amino acid composition, Int. J. Mol. Sci. 15 (2014) 7594–7610. doi:10.3390/ijms15057594.
[49] E. Gasteiger, C. Hoogland, A. Gattiker, S. Duvaud, M.R. Wilkins, R.D. Appel, A. Bairoch, Protein Identification and Analysis Tools on the ExPASy Server, Proteomics Protoc. Handb. (2005) 571–607. doi:10.1385/1-59259-890-0:571.
[50] E. Krebbers, L. Herdies, A. De Clercq, J. Seurinck, J. Leemans, J. Van Damme, M. Segura, G. Gheysen, M. Van Montagu, J. Vandekerckhove, Determination of the Processing Sites of an Arabidopsis 2S Albumin and Characterization of the Complete Gene Family, Plant Physiol. 87 (1988) 859–866. doi:10.1104/pp.87.4.859.
[51] R.I. Monsalve, M. Villalba, R. Rodríguez, Allergy to Mustard Seeds: The Importance of 2S Albumins as Food Allergens, Internet Symp. Food Allergens. 3 (2001) 57–69.
[52] E.F. Fang, J.H. Wong, P. Lin, T.B. Ng, Biochemical characterization of the RNA-hydrolytic activity of a pumpkin 2S albumin, FEBS Lett. 584 (2010) 4089–4096. doi:10.1016/j.febslet.2010.08.041.
[53] P.P.S. Tomar, K. Nikhil, A. Singh, P. Selvakumar, P. Roy, A.K. Sharma, Characterization of anticancer, DNase and antifungal activity of pumpkin 2S albumin, Biochem. Biophys. Res. Commun. 448 (2014) 349–54. doi:10.1016/j.bbrc.2014.04.158.
[54] P.P.S. Tomar, N.S. Chaudhary, P. Mishra, D. Gahloth, G.K. Patel, P. Selvakumar, P. Kumar, A.K. Sharma, Purification, characterisation and cloning of a 2S albumin with DNase, RNase and antifungal activities from Putranjiva roxburghii, Appl. Biochem. Biotechnol. July (2014) 1–12. doi:10.1007/s12010-014-1078-9.
[55] J.X.S. Neto, M.L. Pereira, J.T.A. Oliveira, L.C.B. Rocha-Bezerra, T.D.P. Lopes, H.P.S. Costa, D.O.B. Sousa, B.A.M. Rocha, T.B. Grangeiro, J.E.C. Freire, A.C.O. Monteiro-Moreira, M.D.P. Lobo, R.S.N. Brilhante, I.M. Vasconcelos, A chitin-binding protein purified from Moringa oleifera seeds presents anticandidal activity by increasing cell membrane permeability and reactive oxygen species production, Front. Microbiol. 8 (2017) 1–12. doi:10.3389/fmicb.2017.00980.


6 FINAL CONSIDERATIONS

The study of 2S albumins expressed in Moringa oleifera seeds has provided evidence of a great diversity of isoforms, due in part to their multigenic origin and in part to intense post-translational modification (PTM). The PTMs considered were proline hydroxylation, serine or threonine phosphorylation, methionine oxidation and cyclization of N-terminal glutamine to pyroglutamate (pGlu). After multiple alignment of the deduced amino acid sequences, they could be sorted into eight distinct groups. Four isoforms had been described previously (Mo-CBP3-1, Mo-CBP3-2, Mo-CBP3-3 and Mo-CBP3-4), whereas the isoforms Mo-CBP3-2A (163 amino acid residues), Mo-CBP3-2B (162 residues), Mo-CBP3-3A (160 residues) and Mo-CBP3-3B (160 residues) are described for the first time in this work. There are, however, other 2S albumin isoforms in M. oleifera that could not be identified in this work. Analyses of the amino acid sequences deduced from genomic DNA suggested that the Mo-CBP3 isoforms are synthesized as preproproteins containing an N-terminal signal peptide, an N-terminal propeptide, a small chain of about 4 kDa, a linker peptide between the two chains, a large chain of approximately 8 kDa, and a C-terminal extension. These results constitute important advances in understanding the post-translational mechanisms that operate during the biosynthesis of 2S albumins in M. oleifera seeds.

REFERÊNCIAS

ABIYU, A. et al. Wastewater treatment potential of Moringa stenopetala over Moringa olifera as a natural coagulant, antimicrobial agent and heavy metal removals. Cogent Environmental Science. v. 4, p. 1–13, 2018. AGIZZIO, A. P. et al. A 2S albumin-homologous protein from passion fruit seeds inhibits the fungal growth and acidification of the medium by Fusarium oxysporum. Archives of Biochemistry and Biophysics. v. 416, p. 188–195, 2003. AGIZZIO, A. P. et al. The antifungal properties of a 2S albumin-homologous protein from passion fruit seeds involve plasma membrane permeabilization and ultrastructural alterations in yeast cells. Plant Science. v. 171, p. 515–522, 2006. AGRAWAL, H.; SHEE, C.; SHARMA, A. K. Isolation of a 66 kDa Protein with coagulation activity from seeds of Moringa oleifera. Research Journal of Agriculture and Biological Sciences. v. 3, p. 418–421, 2007. AHN, K. et al. Identification of two pistachio allergens, Pis v 1 and Pis v 2, belonging to the 2S albumin and 11S globulin family. Clinical and Experimental Allergy. v. 39, p. 926–934, 2009. ALLEN, R. D. et al. Sequence and expression of a gene encoding an albumin storage protein in sunflower. Molecular & General Genetics. v. 210, p. 211–218, 1987. ALTSCHUL, S. F. et al. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Research. v. 25, p. 3389-3402, 1997. ASHRAF, M. et al. Microscopic evaluation of the antimicrobial activity of seed extracts of Moringa oleifera. Agriculture. v. 40, p. 1349–1358, 2008. BAPTISTA, A. T. A. et al. Protein fractionation of seeds of Moringa oleifera lam and its application in superficial water treatment. Separation and Purification Technology. v. 180, p. 114–124, 2017.

BATISTA, A. B. et. al. New Insights into the structure and mode of action of Mo-CBP3, an antifungal chitin-binding protein of Moringa oleifera seeds. PLoS One. v.9, p. e111427, 2014. BELTRAO, P. et. al. Evolution and functional cross-talk of protein post-translational modifications. Molecular Systems Biology. v. 9, p. 1-13, 2013. BLOM, N. et al. Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence. Proteomics. v. 4, p. 1633-1649, 2004. BOUALEG, I.; BOUTEBBA, A. Purification of water soluble proteins (2S albumins) extracted from peanut defatted flour and isolation of their isoforms by gel filtration and anion exchange chromatography. Social Studies of Science. v. 18, 135-143, 2017. BREITENEDER, H.; RADAUER, C. A classification of plant food allergens. Journal of Allergy and Clinical Immunology. v. 113, p. 821–830, 2004. CÁCERES, A. et al. Pharmacological properties of Moringa oleifera. 1: Preliminary screening for antimicrobial activity. Journal of Ethnopharmacology. v. 33, p. 213–6, 1991. 68

CÂNDIDO, E. DE S. et al. Plant storage proteins with antimicrobial activity: novel insights into plant defense mechanisms. The FASEB Journal. v. 25, p. 3290–3305, 2011.
CHEN, M. Elucidation of bactericidal effects incurred by Moringa oleifera and chitosan. Journal of the U.S. SJWP. v. 4, p. 65–79, 2009.
COSTA, T. G. et al. Identification of a novel 2S albumin with antitryptic activity from Caryocar brasiliense seeds. Journal of Agricultural Science. v. 7, p. 197–206, 2015.
COTTRELL, J. S. Protein identification using MS/MS data. Journal of Proteomics. v. 74, p. 1842–1851, 2011.
Da SILVA, J. G. et al. Amino acid sequence of a new 2S albumin from Ricinus communis which is part of a 29-kDa precursor protein. Archives of Biochemistry and Biophysics. v. 336, p. 10–18, 1996.
De la CRUZ, S. et al. TaqMan real-time PCR assay for detection of traces of Brazil nut (Bertholletia excelsa) in food products. Food Control. v. 33, p. 105–113, 2013.
DUAN, X. H. et al. Some 2S albumin from peanut seeds exhibits inhibitory activity against Aspergillus flavus. Plant Physiology and Biochemistry. v. 66, p. 84–90, 2013.
EBISAWA, M. et al. Gly m 2S albumin is a major allergen with a high diagnostic value in soybean-allergic children. Journal of Allergy and Clinical Immunology. v. 132, 2013.
EDGAR, R. C. MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research. v. 32, p. 1792–1797, 2004.
ELGAMILY, H. et al. Microbiological assessment of Moringa oleifera extracts and its incorporation in novel dental remedies against some oral pathogens. Macedonian Journal of Medical Sciences. v. 4, p. 585–590, 2016.
EYRAUD, V. et al. The interaction of the bioinsecticide PA1b (Pea Albumin 1 subunit b) with the insect V-ATPase triggers apoptosis. Scientific Reports. v. 7, p. 1–10, 2017.
FANG, E. F. et al. Biochemical characterization of the RNA-hydrolytic activity of a pumpkin 2S albumin. FEBS Letters. v. 584, p. 4089–4096, 2010.
FERRÈ, F.; CLOTE, P. DiANNA 1.1: An extension of the DiANNA web server for ternary cysteine classification. Nucleic Acids Research. v. 34, p. 182–185, 2006.
FERREIRA, P. M. P. et al. Larvicidal activity of the water extract of Moringa oleifera seeds against Aedes aegypti and its toxicity upon laboratory animals. Anais da Academia Brasileira de Ciências. v. 81, p. 207–216, 2009.
FRANKE, B. et al. Two proteins for the price of one: Structural studies of the dual-destiny protein preproalbumin with sunflower trypsin inhibitor-1. Journal of Biological Chemistry. v. 292, p. 12398–12411, 2017.

FREIRE, J. E. C. et al. Mo-CBP3, an antifungal chitin-binding protein from Moringa oleifera seeds, is a member of the 2S albumin family. PLoS One. v. 10, p. 1–24, 2015.
FRISO, G.; van WIJK, K. J. Update: Post-translational protein modifications in plant metabolism. Plant Physiology. v. 169, p. 1–43, 2015.

GALLÃO, M. I.; DAMASCENO, L. F.; BRITO, E. S. Avaliação química e estrutural da semente de moringa. Revista Ciência Agronômica. v. 37, p. 106–109, 2006.
GANGULY, R.; GUHA, D. Alteration of brain monoamines & EEG wave pattern in rat model of Alzheimer’s disease & protection by Moringa oleifera. Indian Journal of Medical Research. v. 128, p. 744–751, 2008.
GARINO, C. et al. Isolation, cloning, and characterization of the 2S albumin: A new allergen from hazelnut. Molecular Nutrition and Food Research. v. 54, p. 1257–1265, 2010.
GASSENSCHMIDT, U. et al. Isolation and characterization of a flocculating protein from Moringa oleifera Lam. Biochimica et Biophysica Acta. v. 1243, p. 477–481, 1995.
GASTEIGER, E. et al. Protein identification and analysis tools on the ExPASy server. In: The Proteomics Protocols Handbook. 2005. p. 571–607.
GEHRIG, P. M. et al. Mass spectrometric amino acid sequencing of a mixture of seed storage proteins (napin) from Brassica napus, products of a multigene family. Proceedings of the National Academy of Sciences. v. 93, p. 3647–3652, 1996.
GENOV, N. et al. A novel thermostable inhibitor of trypsin and subtilisin from the seeds of Brassica nigra: Amino acid sequence, inhibitory and spectroscopic properties and thermostability. Biochimica et Biophysica Acta - Protein Structure and Molecular Enzymology. v. 1341, p. 157–164, 1997.
GHEBREMICHAEL, K. A. et al. A simple purification and activity assay of the coagulant protein from Moringa oleifera seed. Water Research. v. 39, p. 2338–2344, 2005.
GIBSON, T. J.; SULSTON, J. E. Preparation of large numbers of plasmid DNA samples in microtiter plates by the alkaline lysis method. Gene Analysis Techniques. v. 4, p. 41–44, 1987.
GIFONI, J. M. et al. A novel chitin-binding protein from Moringa oleifera seed with potential for plant disease control. Biopolymers. v. 98, p. 406–415, 2012.
GOYAL, B. R. et al. Phyto-pharmacology of Moringa oleifera Lam.: An overview. Natural Product Radiance. v. 6, p. 347–353, 2007.
GRESSENT, F. et al. Pea albumin 1 subunit b (PA1b), a promising bioinsecticide of plant origin. Toxins. v. 3, p. 1502–1517, 2011.
GUPTA, P.; GAUR, V.; SALUNKE, D. M. Purification, identification and preliminary crystallographic studies of a 2S albumin seed protein from Lens culinaris. Acta Crystallographica Section F: Structural Biology and Crystallization Communications. v. 64, p. 733–736, 2008.
HALL, T. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symposium Series. v. 41, p. 95–98, 1999.
HEATH, J. D. et al. Analysis of storage proteins in normal and aborted seeds from embryo-lethal mutants of Arabidopsis thaliana. Planta. v. 169, p. 304–312, 1986.

HUMMEL, M.; WIGGER, T.; BROCKMEYER, J. Characterization of mustard 2S albumin allergens by bottom-up, middle-down, and top-down proteomics: A consensus set of isoforms of Sin a 1. Journal of Proteome Research. v. 14, p. 1547–1556, 2015.
IRWIN, S. D. et al. The Ricinus communis 2S albumin precursor: A single preproprotein may be processed into two different heterodimeric storage proteins. Molecular and General Genetics. v. 222, p. 400–408, 1990.
JYOTHI, T. C. et al. Napin from Brassica juncea: thermodynamic and structural analysis of stability. Biochimica et Biophysica Acta. v. 1774, p. 907–919, 2007.
KAWAGOE, Y. et al. The critical role of disulfide bond formation in protein sorting in the endosperm of rice. The Plant Cell. v. 17, p. 1141–1153, 2005.
KHAN, S. et al. Purification and characterization of 2S albumin from Nelumbo nucifera. Bioscience, Biotechnology and Biochemistry. v. 80, p. 2109–2114, 2016.
KOU, X. et al. Nutraceutical or Pharmacological Potential of Moringa oleifera Lam. Nutrients. v. 10, p. 343, 2018.
KREBBERS, E. et al. Determination of the processing sites of an Arabidopsis 2S albumin and characterization of the complete gene family. Plant Physiology. v. 87, p. 859–866, 1988.
KUMAR, A.; BACHHAWAT, A. K. Pyroglutamic acid: Throwing light on a lightly studied metabolite. Current Science. v. 102, p. 288–297, 2012.
LEE, P. Y. et al. Agarose gel electrophoresis for the separation of DNA fragments. Journal of Visualized Experiments. v. 62, p. 1–5, 2012.
LEHMANN, K. et al. Structure and stability of 2S albumin-type peanut allergens: implications for the severity of peanut allergic reactions. Biochemical Journal. v. 395, p. 463–472, 2006.
PEREIRA, M. L. et al. Purification of a chitin-binding protein from Moringa oleifera seeds with potential to relieve pain and inflammation. Protein and Peptide Letters. v. 18, p. 1078–1085, 2011.
LEONE, A. et al. Moringa oleifera seeds and oil: Characteristics and uses for human health. International Journal of Molecular Sciences. v. 17, p. 1–14, 2016.
LI, D. F. et al. Crystal structure of mabinlin II: A novel structural type of sweet proteins and the main structural basis for its sweetness. Journal of Structural Biology. v. 162, p. 50–62, 2008.
LI, L. et al. MAIGO2 is involved in exit of seed storage proteins from the endoplasmic reticulum in Arabidopsis thaliana. The Plant Cell. v. 18, p. 3535–3547, 2006.
LIPIPUN, V. et al. Efficacy of Thai medicinal plant extracts against Herpes simplex virus type 1 infection in vitro and in vivo. Antiviral Research. v. 60, p. 175–180, 2003.
LIU, N. Y. et al. Examination of segmental average mass spectra from liquid chromatography-tandem mass spectrometric (LC-MS/MS) data enables screening of multiple types of protein modifications. Analytica Chimica Acta. v. 892, p. 115–122, 2015.

MAKKAR, H. P. S.; BECKER, K. Nutrients and antiquality factors in different morphological parts of the Moringa oleifera tree. The Journal of Agricultural Science. v. 128, p. 311–322, 1997.
MANDAL, S. et al. Precursor of the inactive 2S seed storage protein from the Indian mustard Brassica juncea is a novel trypsin inhibitor. Characterization, post-translational processing studies, and transgenic expression to develop insect-resistant plants. Journal of Biological Chemistry. v. 277, p. 37161–37168, 2002.
MARCHLER-BAUER, A. et al. CDD/SPARCLE: Functional classification of proteins via subfamily domain architectures. Nucleic Acids Research. v. 45, p. D200–D203, 2017.
MARIA-NETO, S. et al. Bactericidal activity identified in 2S albumin from sesame seeds and in silico studies of structure-function relations. Protein Journal. v. 30, p. 340–350, 2011.
MARTINOIA, E.; MAESHIMA, M.; NEUHAUS, H. E. Vacuolar transporters and their essential role in plant metabolism. Journal of Experimental Botany. v. 58, p. 83–102, 2007.
MENÉNDEZ-ARIAS, L. et al. Primary structure of the major allergen of yellow mustard (Sinapis alba L.) seed, Sin a I. European Journal of Biochemistry. v. 177, p. 159–166, 1988.
MONSALVE, R. I.; VILLALBA, M.; RODRÍGUEZ, R. Allergy to mustard seeds: The importance of 2S albumins as food allergens. Internet Symposium on Food Allergens. v. 3, p. 57–69, 2001.
MORENO, F. J. et al. Mass spectrometry and structural characterization of 2S albumin isoforms from Brazil nuts (Bertholletia excelsa). Biochimica et Biophysica Acta - Proteins and Proteomics. v. 1698, p. 175–186, 2004.
MORENO, F. J. et al. Thermostability and in vitro digestibility of a purified major allergen 2S albumin (Ses i 1) from white sesame seeds (Sesamum indicum L.). Biochimica et Biophysica Acta - Proteins and Proteomics. v. 1752, p. 142–153, 2005.
MORENO, F. J.; CLEMENTE, A. 2S albumin storage proteins: What makes them food allergens? The Open Biochemistry Journal. v. 2, p. 16–28, 2008.
MYLNE, J. S.; HARA-NISHIMURA, I.; ROSENGREN, K. J. Seed storage albumins: biosynthesis, trafficking and structures. Functional Plant Biology. v. 41, p. 671–677, 2014.
NASCIMENTO, V. V. et al. In silico structural characteristics and α-amylase inhibitory properties of Ric c 1 and Ric c 3, allergenic 2S albumins from Ricinus communis seeds. Journal of Agricultural and Food Chemistry. v. 59, p. 4814–4821, 2011.
NAWROT, R. et al. Plant antimicrobial peptides. Host Defense Peptides and Their Potential as Therapeutic Agents. v. 59, p. 181–196, 2014.
NDABIGENGESERE, A.; SUBBA NARASIAH, K.; TALBOT, B. G. Active agents and mechanism of coagulation of turbid waters using Moringa oleifera. Water Research. v. 29, p. 703–710, 1995.
NETO, J. X. S. et al. A chitin-binding protein purified from Moringa oleifera seeds presents anticandidal activity by increasing cell membrane permeability and reactive oxygen species production. Frontiers in Microbiology. v. 8, p. 1–12, 2017.

NIRASAWA, S. et al. Cloning and sequencing of a cDNA encoding a heat-stable sweet protein, mabinlin II. Gene. v. 181, p. 225–227, 1993.
NWOSU, M.; OKAFOR, J. I. Preliminary studies of the antifungal activities of some medicinal plants against Basidiobolus and some other pathogenic fungi. Mycoses. v. 38, p. 191–195, 1995.
ODINTSOVA, T. I. et al. Antifungal activity of storage 2S albumins from seeds of the invasive weed dandelion Taraxacum officinale Wigg. Protein and Peptide Letters. v. 17, p. 522–529, 2010.
OGURI, S. et al. Characterization and sequence of tomato 2S seed albumin: a storage protein with sequence similarities to the fruit lectin. Planta. v. 216, p. 976–984, 2003.
OKUDA, T. et al. Isolation and characterization of coagulant extracted from Moringa oleifera seed by salt solution. Water Research. v. 35, p. 405–410, 2001.
ONYEKE, C. C.; AKUESHI, C. O. Infectivity and reproduction of Meloidogyne incognita (Kofoid and White) Chitwood on African yam bean, Sphenostylis stenocarpa (Hochst Ex. A. Rich) Harms accessions as influenced by botanical soil amendments. African Journal of Biotechnology. v. 11, p. 13095–13103, 2012.
ORRUÑO, E.; MORGAN, M. R. A. Resistance of purified seed storage proteins from sesame (Sesamum indicum L.) to proteolytic digestive enzymes. Food Chemistry. v. 128, p. 923–929, 2011.
PANTOJA-UCEDA, D. et al. Solution structure and stability against digestion of rproBnIb, a recombinant 2S albumin from rapeseed: relationship to its allergenic properties. Biochemistry. v. 43, p. 16036–16045, 2004a.
PANTOJA-UCEDA, D. et al. Solution structure of a methionine-rich 2S albumin from sunflower seeds: Relationship to its allergenic and emulsifying properties. Biochemistry. v. 43, p. 6976–6986, 2004b.
PANTOJA-UCEDA, D. et al. Solution structure of RicC3, a 2S albumin storage protein from Ricinus communis. Biochemistry. v. 42, p. 13839–13847, 2003.
PETERSEN, T. N. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nature Methods. v. 8, p. 785–786, 2011.
PFEIFER, S. et al. Cor a 14, the allergenic 2S albumin from hazelnut, is highly thermostable and resistant to gastrointestinal digestion. Molecular Nutrition & Food Research. v. 59, p. 2077–2086, 2015.
RAMACHANDRAN, C.; PETER, K. V.; GOPALAKRISHNAN, P. K. Drumstick (Moringa oleifera): A Multipurpose Indian Vegetable. Economic Botany. v. 34, p. 276–283, 1980.

RANI, N. Z. A.; HUSAIN, K.; KUMOLOSASI, E. Moringa Genus: A review of phytochemistry and pharmacology. Frontiers in Pharmacology. v. 9, p. 1–26, 2018.
REGENTE, M.; DE LA CANAL, L. Do sunflower 2S albumins play a role in resistance to fungi? Plant Physiology and Biochemistry. v. 39, p. 407–413, 2001.
RIBEIRO, S. F. F. et al. Antifungal and other biological activities of two 2S albumin-homologous proteins against pathogenic fungi. Protein Journal. v. 31, p. 59–67, 2011a.
RIBEIRO, S. M. et al. Identification of a Passiflora alata Curtis dimeric peptide showing identity with 2S albumins. Peptides. v. 32, p. 868–874, 2011.
RICO, M. et al. 1H NMR assignment and global fold of napin BnIb, a representative 2S albumin seed protein. Biochemistry. v. 35, p. 15672–15682, 1996.
ROBOTHAM, J. M. et al. Ana o 3, an important cashew nut (Anacardium occidentale L.) allergen of the 2S albumin family. Journal of Allergy and Clinical Immunology. v. 115, p. 1284–1290, 2005.
ROCHA, M. F. G. et al. Extratos de Moringa oleifera e Vernonia sp. sobre Candida albicans e Microsporum canis isolados de cães e gatos e análise da toxicidade em Artemia sp. Ciência Rural. v. 41, p. 1807–1812, 2011.
ROLIM, L. A. D. M. M. et al. Genotoxicity evaluation of Moringa oleifera seed extract and lectin. Journal of Food Science. v. 76, p. T53–T58, 2011.
RUNDQVIST, L. et al. Solution structure, copper binding and backbone dynamics of recombinant Ber e 1 - The major allergen from Brazil nut. PLoS One. v. 7, p. e46435, 2012.
SAHAY, S.; YADAV, U.; SRINIVASAMURTHY, S. Potential of Moringa oleifera as a functional food ingredient: A review. International Journal of Food Science and Nutrition. v. 2, p. 31–37, 2017.
SAMBROOK, J.; FRITSCH, T. E. Molecular Cloning: A Laboratory Manual. 2nd ed. Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 1989.
SANTOS, A. F. S. et al. Isolation of a seed coagulant Moringa oleifera lectin. Process Biochemistry. v. 44, p. 504–508, 2009.
SAPANA, M. M.; SONAL, G. C.; RAUT, P. Use of Moringa oleifera (Drumstick) seed as natural absorbent and an antimicrobial agent for ground water treatment. Research Journal of Recent Sciences. v. 1, p. 31–40, 2012.
SCHÄGGER, H. Tricine-SDS-PAGE. Nature Protocols. v. 1, p. 16–22, 2006.
SHARIEF, F. S.; LI, S. S. Amino acid sequence of small and large subunits of seed storage protein from Ricinus communis. The Journal of Biological Chemistry. v. 257, p. 14753–14759, 1982.
SHARMA, A. et al. Purification and characterization of 2S albumin from seeds of Wrightia tinctoria exhibiting antibacterial and DNase activity. Protein and Peptide Letters. v. 24, p. 368–378, 2017.
SHARMA, G. M. et al. Cloning and characterization of 2S albumin, Car i 1, a major allergen in pecan. Journal of Agricultural and Food Chemistry. v. 59, p. 4130–4139, 2011.

SHEVCHENKO, A. et al. Mass spectrometric sequencing of proteins from silver-stained polyacrylamide gels. Analytical Chemistry. v. 68, p. 850–858, 1996.
SHEWRY, P. R.; NAPIER, J. A.; TATHAM, A. S. Seed storage proteins: structures and biosynthesis. The Plant Cell. v. 7, p. 945–956, 1995.
SHIH, M. C. et al. Effect of different parts (leaf, stem and stalk) and seasons (summer and winter) on the chemical compositions and antioxidant activity of Moringa oleifera. International Journal of Molecular Sciences. v. 12, p. 6077–6088, 2011.
SIEVERS, F.; HIGGINS, D. G. Clustal Omega, accurate alignment of very large numbers of sequences. In: RUSSELL, D. (Ed.). Multiple Sequence Alignment Methods. Methods in Molecular Biology (Methods and Protocols), v. 1079. Totowa, NJ: Humana Press, 2014.
SOUZA, P. F. N. et al. A 2S albumin from the seed cake of Ricinus communis inhibits trypsin and has strong antibacterial activity against human pathogenic bacteria. Journal of Natural Products. v. 79, p. 2423–2431, 2016.
TAI, S. S. K. et al. Expression pattern and deposition of three storage proteins, 11S globulin, 2S albumin and 7S globulin in maturing sesame seeds. Plant Physiology and Biochemistry. v. 39, p. 981–992, 2001.
TAI, S. S. K. et al. Molecular cloning of 11S globulin and 2S albumin, the two major seed storage proteins in Sesame. Journal of Agricultural and Food Chemistry. v. 47, p. 4932–4938, 1999.
TAMURA, K. et al. MEGA6: Molecular evolutionary genetics analysis version 6.0. Molecular Biology and Evolution. v. 30, p. 2725–2729, 2013.
TAN-WILSON, A. L.; WILSON, K. A. Mobilization of seed protein reserves. Physiologia Plantarum. v. 145, p. 140–153, 2012.
TERRAS, F. R. et al. A new family of basic cysteine-rich plant antifungal proteins from Brassicaceae species. FEBS Letters. v. 316, p. 233–240, 1993.
TERRAS, F. R. G. et al. Analysis of two novel classes of plant antifungal proteins from radish (Raphanus sativus L.) seeds. Journal of Biological Chemistry. v. 267, p. 15301–15309, 1992.
TOMAR, P. P. S. et al. Characterization of anticancer, DNase and antifungal activity of pumpkin 2S albumin. Biochemical and Biophysical Research Communications. v. 448, p. 349–354, 2014a.
TOMAR, P. P. S. et al. Purification, characterisation and cloning of a 2S albumin with DNase, RNase and antifungal activities from Putranjiva roxburghii. Applied Biochemistry and Biotechnology. v. 174, p. 1–12, 2014b.
ULLAH, A. et al. Crystal structure of mature 2S albumin from Moringa oleifera seeds. Biochemical and Biophysical Research Communications. v. 468, p. 365–371, 2015.

VAN DER KLEI, H. et al. A fifth 2S albumin isoform is present in Arabidopsis thaliana. Plant Physiology. v. 101, p. 1415–1416, 1993.
VICENTE, T. et al. Tratabilidade de água superficial utilizando coagulantes naturais à base de tanino e extratos de sementes de Moringa oleifera. Ensaios e Ciência: C. Biológicas, Agrárias e da Saúde. v. 1, p. 152–155, 2017.
VIEIRA, H.; CHAVES, L. H. G.; VIÉGAS, R. A. Acumulação de nutrientes em mudas de moringa (Moringa oleifera Lam) sob omissão de macronutrients. Revista Ciência Agronômica. v. 39, p. 130–136, 2008.
VIERA, G. H. F. et al. Antibacterial effect (in vitro) of Moringa oleifera and Annona muricata against Gram positive and Gram negative bacteria. Revista do Instituto de Medicina Tropical de São Paulo. v. 52, p. 129–132, 2010.
VITALE, A.; GALILI, G. The endomembrane system and the problem of protein sorting. Plant Physiology. v. 125, p. 115–118, 2001.
von HEIJNE, G. Patterns of amino acids near signal-sequence cleavage sites. European Journal of Biochemistry. v. 133, p. 17–21, 1983.
WANG, X. et al. Purification and characterization of three antifungal proteins from cheeseweed (Malva parviflora). Biochemical and Biophysical Research Communications. v. 282, p. 1224–1228, 2001.
WANG, X.; BUNKERS, G. J. Potent heterologous antifungal proteins from cheeseweed (Malva parviflora). Biochemical and Biophysical Research Communications. v. 279, p. 669–673, 2000.
WARNER, S. A. J. Genomic DNA isolation and lambda library construction. In: FOSTER, G. D.; TWELL, D. (Eds.). Plant Gene Isolation: Principles and Practice. West Sussex: John Wiley & Sons, 1996. p. 51–73.
WILSON, K. A. et al. Role of vacuolar membrane proton pumps in the acidification of protein storage vacuoles following germination. Plant Physiology and Biochemistry. v. 104, p. 242–249, 2016.
XU, Y. et al. iHyd-PseAAC: Predicting hydroxyproline and hydroxylysine in proteins by incorporating dipeptide position-specific propensity into pseudo amino acid composition. International Journal of Molecular Sciences. v. 15, p. 7594–7610, 2014.
YOULE, R. J.; HUANG, A. H. C. Occurrence of low molecular weight and high cysteine containing albumin storage proteins in oilseeds of diverse species. American Journal of Botany. v. 68, p. 44, 1981.
ZAINI, N. M. et al. Single step purification of 2S albumin from Theobroma cacao. International Journal of Advanced and Applied Sciences. v. 4, p. 57–61, 2017.
ZAKU, S. G. et al. Moringa oleifera: An underutilized tree in Nigeria with amazing versatility: A review. African Journal of Food Science. v. 9, p. 456–461, 2015.


APPENDIX - SUPPLEMENTARY MATERIAL: POST-TRANSLATIONAL MODIFICATIONS REVEAL THAT Mo-CBP3, A 2S MORINGA OLEIFERA CHITIN-BINDING ALBUMIN, IS A COMPLEX MIXTURE OF ISOFORMS

Table S1. DNA sequences and primer-binding sites of the oligonucleotide primers used in the present work. The primer sequences were based on the cDNA sequences previously determined by Freire et al. (2015).

Primer  Sequence                         Primer-binding site  Target Mo-CBP3 isoform  GenBank accession number of the cDNA sequence
P1      5ʹ-CGTCAGTATATCAGAAGCAGTTTAA-3ʹ  27-51 (+)            1/4                     KF616830, KF616831
P2      5ʹ-AGCTTCGAGCTCTACGAACACACAC-3ʹ  670-694 (−)          1/4                     KF616830, KF616831
P3      5ʹ-CGTCAGTATATCAGAAGCAGTTTAC-3ʹ  28-50 (+)            2                       KF616832
P4      5ʹ-CACGGGGTACATTTGAGCAACTAGC-3ʹ  692-716 (−)          2                       KF616832
P5      5ʹ-TCAGCAGCAACCAACACCACACCGG-3ʹ  27-51 (+)            3                       KF616833
P6      5ʹ-GTTACACCGCTAGTGGCTCTCGTCT-3ʹ  664-688 (−)          3                       KF616833

Source: prepared by the author.
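The (+) and (−) annotations in Table S1 indicate the strand to which each primer binds, so a reverse primer should match the reverse complement of the sequence as printed. The following is a minimal Python sketch of that check, not part of the original work; the function and variable names are illustrative assumptions, and the sequence fragment is the last row of Figure S1.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


# Reverse primer P4 from Table S1, annotated as binding the (-) strand.
P4 = "CACGGGGTACATTTGAGCAACTAGC"
# Last row of the (+) strand sequence printed in Figure S1.
FIG_S1_LAST_ROW = "GCTAGCTAGTTGCTCAAATGTACCCCGTG"

p4_site = reverse_complement(P4)
# Offset of the P4 binding site within that row, or -1 if it is absent.
p4_offset = FIG_S1_LAST_ROW.find(p4_site)
print(p4_site, p4_offset)
```

Under these assumptions, the reverse complement of P4 coincides with the final 25 nucleotides of the sequence in Figure S1, which is consistent with P4 being the reverse primer for the Mo-CBP3-2 target.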


4     TCAGTATATCAGAAGCAGTTTACCATGGCAAAGATCACTCTCCTCCTCGCCACCTTCGGT
                               M  A  K  I  T  L  L  L  A  T  F  G   12
65    TTGCTCCTTCTCCTGACCAACGCTTCTATCTACCGCACAACTGTCGAGCTCGACGAGGAG
       L  L  L  L  L  T  N  A  S  I  Y  R  T  T  V  E  L  D  E  E   32
125   GCTGACGAGAACCAGCGGCAGAGATGTCGCCAGCAATTTCAGACCCACCAGCGCCTCAGG
       A  D  E  N  Q  R  Q  R  C  R  Q  Q  F  Q  T  H  Q  R  L  R   52
185   GCGTGCCAGCGCTTCATCCGGCGACGGACCCAGGGTGGAGGTCCCCTGGACGAGGTTGAA
       A  C  Q  R  F  I  R  R  R  T  Q  G  G  G  P  L  D  E  V  E   72
245   GACGAAGTAGACGAAATCGAAGAGGTTGTGGAGCCCGACCAGGGTCCCGGTCGACAACCG
       D  E  V  D  E  I  E  E  V  V  E  P  D  Q  G  P  G  R  Q  P   92
305   GCCTTCCAGCGTTGCTGCCAACAGCTGCGGAACATATCTCCTCCTTGCAGGTGCCCATCA
       A  F  Q  R  C  C  Q  Q  L  R  N  I  S  P  P  C  R  C  P  S   112
365   CTCAGGCAAGCAGTACAGTTGACACACCAGCAGCAGGGACAGGTGGGTCCTCAGCAGGTA
       L  R  Q  A  V  Q  L  T  H  Q  Q  Q  G  Q  V  G  P  Q  Q  V   132
425   AGGCAGATGTACCGCGTCGCTAGCAACATCCCCTCCATGTGCAACCTGCAGCCGATGAGC
       R  Q  M  Y  R  V  A  S  N  I  P  S  M  C  N  L  Q  P  M  S   152
485   TGCCCCTTCCGTCAGCAGCAAAGCTCGTGGCTCTAAACTGTATGGCCCATAAGGTGGTCA
       C  P  F  R  Q  Q  Q  S  S  W  L  *                           163
545   CGTGCAGTCATGCACGAGGATAATAGACCTCACTCTTGCCAGCCCTTCACTGACAAGGGT
605   AACGAGCTATGTAATAATAAAAGCACATAGTATCGTGTGCGTTTATGGAGTCTGAAGCTA
634   GCTAGCTAGTTGCTCAAATGTACCCCGTG

Source: prepared by the author.

Figure S1. Genomic DNA sequence (GenBank accession number: MH000615) and deduced amino acid sequence of Mo-CBP3-2A. Numbers for the first nucleotide and the last amino acid residue in each row are shown on the left and right, respectively. The stop codon is indicated by an asterisk and primer sequences are underlined.
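As an illustration of how a deduced amino acid sequence such as the one in Figure S1 is obtained, the sketch below translates a nucleotide row with the standard genetic code, starting at the first ATG. It is a hedged example rather than the procedure used in this work; the function name and the choice to begin at the first ATG are assumptions made for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_from_first_atg(dna: str) -> str:
    """Translate from the first ATG until a stop codon or the end of the sequence."""
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon, shown as an asterisk in the figures
            break
        protein.append(aa)
    return "".join(protein)


# First row of the sequence shown in Figure S1.
FIG_S1_FIRST_ROW = "TCAGTATATCAGAAGCAGTTTACCATGGCAAAGATCACTCTCCTCCTCGCCACCTTCGGT"
print(translate_from_first_atg(FIG_S1_FIRST_ROW))  # MAKITLLLATFG
```

The twelve residues returned for the first row agree with the residue count (12) printed at the right of that row in Figure S1.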


4     TCAGTATATCAGAAGCAGTTTACCATGGCAAAGATCACTCTCCTCCTCGCCACCTTCGGT
                               M  A  K  I  T  L  L  L  A  T  F  G   12
65    TTGCTCCTTCTCCTGACCAACGCTTCTATCTACCGCACAACTGTCGAGCTCGACGAGGAG
       L  L  L  L  L  T  N  A  S  I  Y  R  T  T  V  E  L  D  E  E   32
125   GCTGACGAGAACCAGCGGCAGAGATGTCGCCAGCAATTTCAGACCCACCAGCGCCTCAGG
       A  D  E  N  Q  R  Q  R  C  R  Q  Q  F  Q  T  H  Q  R  L  R   52
185   GCGTGCCAGCGCTTCATCCGGCGACGGACCCAGGGTGGAGGTCCCCTGGACGAGGTTGAA
       A  C  Q  R  F  I  R  R  R  T  Q  G  G  G  P  L  D  E  V  E   72
245   GACGAAGTAGACGAAATCGAAGAGGTTGTGGAGCCCGACCAGGGTCCCGGTCGACAACCG
       D  E  V  D  E  I  E  E  V  V  E  P  D  Q  G  P  G  R  Q  P   92
305   GCCTTCCAGCGTTGCTGCCAACAGCTGCGGAACATATCTCCTCCTTGCAGGTGCCCATCA
       A  F  Q  R  C  C  Q  Q  L  R  N  I  S  P  P  C  R  C  P  S   112
365   CTCAGGCAAGCAGTACAGTTGACACACCAGCAGCAGGGACAGGTGGGTCCTCAGCAGGTA
       L  R  Q  A  V  Q  L  T  H  Q  Q  Q  G  Q  V  G  P  Q  Q  V   132
425   AGGCAGATGTACCGCGTCGCTAGCAACATCCCCTCCATGCGCAACCTGCAGCCGATGAGC
       R  Q  M  Y  R  V  A  S  N  I  P  S  M  R  N  L  Q  P  M  S   152
485   TGCCCCTTCCGTCAGCAGCAAAGCTCGTGGCTCTAAACTGTATGGCCCATAAGGTGGTCA
       C  P  F  R  Q  Q  Q  S  S  W  L  *                           163
545   CGTGCAGTCATGCACGAGGATAATAGACCTCACTCTTGCCAGCCCTTCACTGACAAGGGT
605   AACGAGCTATGTAATAATAAAAGCACATAGTATCGTGTGCGTTTATGGAGTCTGAAGCTA
634   GCTAGCTAGTTGCTCAAATGTACCCCGTG

Source: prepared by the author.

Figure S2. Genomic DNA sequence (GenBank accession number: MH000616) and deduced amino acid sequence of Mo-CBP3-2B. Numbers for the first nucleotide and the last amino acid residue in each row are shown on the left and right, respectively. The stop codon is indicated by an asterisk and primer sequences are underlined.


2     TCAGCAGCAACCAACACCACACCGGCAGTGCTTACAATGGCAAAGTTCACTCTCCTCCTT
                                           M  A  K  F  T  L  L  L   8
63    GCCATCTTCGCTTTGTTCCTCATTCTGGCCAACGCCAACGTCTACCGCACCACTGTCGAG
       A  I  F  A  L  F  L  I  L  A  N  A  N  V  Y  R  T  T  V  E   28
123   CTCGACGAGGAACCTGACGACAACCAGCAAGGCCAGCAGCAGCAGCAATGCCGCCAGCAG
       L  D  E  E  P  D  D  N  Q  Q  G  Q  Q  Q  Q  Q  C  R  Q  Q   48
183   TTTTTGACCCATCAACGCCTCAGGGCTTGCCAGCGCTTCATCCGACGACAGACCCAGGGT
       F  L  T  H  Q  R  L  R  A  C  Q  R  F  I  R  R  Q  T  Q  G   68
243   GGAGGCGCCCTCGAGGATGTCGAAGACGACGTAGAAGAAATCGAGGAAGTGGTGGAGCCC
       G  G  A  L  E  D  V  E  D  D  V  E  E  I  E  E  V  V  E  P   88
303   GACCAGGCCCGTCGACCAGCCATCCAACGTTGCTGCCAACAGCTGCGGAACATACAGCCT
       D  Q  A  R  R  P  A  I  Q  R  C  C  Q  Q  L  R  N  I  Q  P   108
363   CGCTGCAGGTGCCCTTCACTGAGGCAGGCAGTACAGCTCGCACACCAGCAGCAGGGACAG
       R  C  R  C  P  S  L  R  Q  A  V  Q  L  A  H  Q  Q  Q  G  Q   128
423   GTGGGTCCTCAACAGGTAAGGCAGATGTACCGCCTTGCTAGCAACATCCCCGCTATCTGC
       V  G  P  Q  Q  V  R  Q  M  Y  R  L  A  S  N  I  P  A  I  C   148
483   AACCTGCGGCCAATGAGCTGCCCATTCGGTCAGCAGTGAAGCTTGTGGCTGTAAACTATA
       N  L  R  P  M  S  C  P  F  G  Q  Q  *                        160
543   TGGCCCTGGTGGTCACCAGTACTCATGCACGAAGACAATCGATGCATGGCGATAATAAAC
603   CTTACTCTTACTCTTTACTCTTCGACTGTTTAGGTGGAGACGAGAGCCACTAGCGGTGTA
646   ACAATAAAAGCACATTATCGTGTGTGTTCGTAGAGCTCGAAGC

Source: prepared by the author.

Figure S3. Genomic DNA sequence (GenBank accession number: MH000617) and deduced amino acid sequence of Mo-CBP3-3. Numbers for the first nucleotide and the last amino acid residue in each row are shown on the left and right, respectively. The stop codon is indicated by an asterisk and primer sequences are underlined.


4     GCAACCAACACCACACCGGCAGTGCTTACAATGGCAAAGTTCACTCTCCTCCTTGCCATC
                                     M  A  K  F  T  L  L  L  A  I   10
65    TTCGCTTTGTTCCTCATTCTGGCCAACGCCAACGTCTACCGCACCACTGTCGAGCTCGAC
       F  A  L  F  L  I  L  A  N  A  N  V  Y  R  T  T  V  E  L  D   30
125   GAGGAACCTGACGACAACCAGCAAGGCCAGCAGCAGCAGCAATGCCGCCAGCAGTTTTTG
       E  E  P  D  D  N  Q  Q  G  Q  Q  Q  Q  Q  C  R  Q  Q  F  L   50
185   ACCCATCAACGCCTCAGGGCTTGCCAGCGCTTCATCCGACGACAGACCCAGGGTGGAGGC
       T  H  Q  R  L  R  A  C  Q  R  F  I  R  R  Q  T  Q  G  G  G   70
245   GCCCTCGAGGATGTCGAAGACGACGTAGAAGAAATCGAGGAAGTGGTGGAGCCCGACCAG
       A  L  E  D  V  E  D  D  V  E  E  I  E  E  V  V  E  P  D  Q   90
305   GCCCGTCGACCAGCCATCCAACGTTGCTGCCAACAGCTGCGGAACATACAGCCTCGCCGC
       A  R  R  P  A  I  Q  R  C  C  Q  Q  L  R  N  I  Q  P  R  R   110
365   AGGTGCCCTTCACTGAGGCAGGCAGTACAGCTCGCACACCAGCAGCAGGGACAGGTGGGT
       R  C  P  S  L  R  Q  A  V  Q  L  A  H  Q  Q  Q  G  Q  V  G   130
425   CCTCAACAGGTAGGGCAGATGTACCGCCTTGCTAGCAACATCCCCGCTATCTGCAACCTG
       P  Q  Q  V  G  Q  M  Y  R  L  A  S  N  I  P  A  I  C  N  L   150
485   CGGCCAATGAGCTGCCCATTCGGTCAGCAGTGAAGCTTGTGGCTGTAAACTATATGGCCC
       R  P  M  S  C  P  F  G  Q  Q  *                              160
545   TGGTGGTCACCAGTACTCATGCACGAAGACAATCGATGCATGGCGATAATAAACCTTACT
601   CTTACTCTTTACTCTTCGACTGTTTAGGTGGAGACGAGAGCCACTAGCGGTGTAAC

Source: prepared by the author.

Figure S4. Genomic DNA sequence (GenBank accession number: MH000618) and deduced amino acid sequence of Mo-CBP3-3A. Numbers for the first nucleotide and the last amino acid residue in each row are shown on the left and right, respectively. The stop codon is indicated by an asterisk and primer sequences are underlined.


4     GCAACCAACACCACACCGGCAGTGCTTACAATGGCAAAGTTCACTCTCCTCCTTGCCATC
                                     M  A  K  F  T  L  L  L  A  I   10
65    TTCGCTTTGTTCCTCATTCTGGCCAACGCCAACGTCTACCGCACCACTGTCGAGCTCGAC
       F  A  L  F  L  I  L  A  N  A  N  V  Y  R  T  T  V  E  L  D   30
125   GAGGAACCTGACGACAACCAGCAAGGCCAGCAGCAGCAGCAATGCCGCCAGCAGTTTTTG
       E  E  P  D  D  N  Q  Q  G  Q  Q  Q  Q  Q  C  R  Q  Q  F  L   50
185   ACCCATCAACGCCTCAGGGCTTGCCAGCGCTTCATCCGACGACAGACCCAGGGTGGAGGC
       T  H  Q  R  L  R  A  C  Q  R  F  I  R  R  Q  T  Q  G  G  G   70
245   GCCCTCGAGGATGTCGAAGACGACGTAGAAGGAATCGAGGAAGTGGTGGAGCCCGACCAG
       A  L  E  D  V  E  D  D  V  E  G  I  E  E  V  V  E  P  D  Q   90
305   GCCCGTCGACCAGCCATCCAACGTTGCTGCCAACAGCTGCGGAACATACAGCCTCGCTGC
       A  R  R  P  A  I  Q  R  C  C  Q  Q  L  R  N  I  Q  P  R  C   110
365   AGGTGTCCTTCACTGAGGCAGGCAGTACAGCTCGCACACCAGCAGCAGGGACAGGTGGGT
       R  C  P  S  L  R  Q  A  V  Q  L  A  H  Q  Q  Q  G  Q  V  G   130
425   CCTCAACAGGTAAGGCAGATGTACCGCCTTGCTAGCAACATCCCCGCTATCTGCAACCTG
       P  Q  Q  V  R  Q  M  Y  R  L  A  S  N  I  P  A  I  C  N  L   150
485   CGGCCAATGAGCTGCCCATTCGGTCAGCAGTGAAGCTTGTGGCTGTAAACTATATGGCCC
       R  P  M  S  C  P  F  G  Q  Q  *                              160
545   TGGTGGTCACCAGTACTCATGCACGAAGACAATCGATGCATGGCGATAATAAACCTTACT
601   CTTACTCTTTACTCTTCGACTGTTTAGGTGGAGACGAGAGCCACTAGCGGTGTAAC

Source: prepared by the author.

Figure S5. Genomic DNA sequence (GenBank accession number: MH000619) and deduced amino acid sequence of Mo-CBP3-3B. Numbers for the first nucleotide and the last amino acid residue in each row are shown on the left and right, respectively. The stop codon is indicated by an asterisk and primer sequences are underlined.


2     CGTCAGTATATCAGAAGCAGTTTAATTACTATGGCAAAGCTCACTCTCCTCCTCGCCACC
                                     M  A  K  L  T  L  L  L  A  T   10
63    TTAGCTTTGCTCGTCCTCCTGGCCAACGCCTCCATCTACCGCACCACTGTCGAGCTCGAC
       L  A  L  L  V  L  L  A  N  A  S  I  Y  R  T  T  V  E  L  D   30
123   GAGGAGCCTGACGACAACCAGCAGCAGAGATGTCGCCATCAATTTCAGACCCAACAGCGC
       E  E  P  D  D  N  Q  Q  Q  R  C  R  H  Q  F  Q  T  Q  Q  R   50
183   CTCAGGGCTTGCCAGCGCGTCATCCGGCGATGGAGCCAGGGTGGAGGTCCCATGGAGGAC
       L  R  A  C  Q  R  V  I  R  R  W  S  Q  G  G  G  P  M  E  D   70
243   GTTGAAGACGAAATAGACGAAACAGACGAAATCGAGGAAGTCGTTGAGCCCGACCAGGCC
       V  E  D  E  I  D  E  T  D  E  I  E  E  V  V  E  P  D  Q  A   90
303   CGTCGACCACCAACTCTCCAGCGTTGCTGCCGACAGCTGCGGAACGTATCTCCTTTCTGC
       R  R  P  P  T  L  Q  R  C  C  R  Q  L  R  N  V  S  P  F  C   110
363   AGGTGCCCTTCACTCAGGCAAGCAGTACAGTCTGCACAGCAGCAACAGGGACAGGTCGGT
       R  C  P  S  L  R  Q  A  V  Q  S  A  Q  Q  Q  Q  G  Q  V  G   130
423   CCTCAGCAGGTAGGTCACATGTACCGCGTCGCCAGTCGCATCCCTGCCATCTGTAACCTG
       P  Q  Q  V  G  H  M  Y  R  V  A  S  R  I  P  A  I  C  N  L   150
483   CAGCCCATGAGGTGCCCGTTCCGTCAGCAGCAAAGCTCGTGAACGCAGGTGGTCACCAGC
       Q  P  M  R  C  P  F  R  Q  Q  Q  S  S  *                     163
543   ACTCATGCACGAAAAACAATCGATGCATGAGGACAATAAACCTCACTCTTGCTCTTATCT
603   GGAAAGAGTGACGAGCTATGTAACAATAAAAACACAGTTCCCTGTGTGTGTTCGTAGAGC
611   TCGAAGCT

Source: prepared by the author.

Figure S6. Genomic DNA sequence (GenBank accession number: MH000620) and deduced amino acid sequence of Mo-CBP3-4. Numbers for the first nucleotide and the last amino acid residue in each row are shown on the left and right, respectively. The stop codon is indicated by an asterisk and primer sequences are underlined.

10 20 30 40 50 60 70 80 90 100 ....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....| Mo-CBP3-1_cDNA ATGGCAAAGCTCACTCTCCTCCTCGCCACCTTCGCTTTGCTCGTCCTCCTGGCCAACGCCTCCATCTACCGCACCACTGTCGAGCTCGACGAGGAGCCTG Mo-CBP3-2_cDNA ATGGCAAAGATCACTCTCCTCCTCGCCACCTTCGGTTTGCTCCTTCTCCTGACCAACGCTTCTATCTACCGCACAACTGTCGAGCTCGACGAGGAGGCTG Mo-CBP3-2A_gDNA ATGGCAAAGATCACTCTCCTCCTCGCCACCTTCGGTTTGCTCCTTCTCCTGACCAACGCTTCTATCTACCGCACAACTGTCGAGCTCGACGAGGAGGCTG Mo-CBP3-2B_gDNA ATGGCAAAGATCACTCTCCTCCTCGCCACCTTCGGTTTGCTCCTTCTCCTGACCAACGCTTCTATCTACCGCACAACTGTCGAGCTCGACGAGGAGGCTG Mo-CBP3-3_gDNA ATGGCAAAGTTCACTCTCCTCCTTGCCATCTTCGCTTTGTTCCTCATTCTGGCCAACGCCAACGTCTACCGCACCACTGTCGAGCTCGACGAGGAACCTG Mo-CBP3-3_cDNA ATGGCAAAGTTCACTCTCCTCCTTGCCATCTTCGCTTTGTTCCTCATTCTGGCCAACGCCAACGTCTACCGCACCACTGTCGAGCTCGACGAGGAACCTG Mo-CBP3-3A_gDNA ATGGCAAAGTTCACTCTCCTCCTTGCCATCTTCGCTTTGTTCCTCATTCTGGCCAACGCCAACGTCTACCGCACCACTGTCGAGCTCGACGAGGAACCTG Mo-CBP3-3B_gDNA ATGGCAAAGTTCACTCTCCTCCTTGCCATCTTCGCTTTGTTCCTCATTCTGGCCAACGCCAACGTCTACCGCACCACTGTCGAGCTCGACGAGGAACCTG Mo-CBP3-4_gDNA ATGGCAAAGCTCACTCTCCTCCTCGCCACCTTAGCTTTGCTCGTCCTCCTGGCCAACGCCTCCATCTACCGCACCACTGTCGAGCTCGACGAGGAGCCTG Mo-CBP3-4_cDNA ATGGCAAAGCTCACTCTCCTCCTCGCCACCTTAGCTTTGCTCGTCCTCCTGGCCAACGCCTCCATCTACCGCACCACTGTCGAGCTCGACGAGGAGCCTG

                        110       120       130       140       150       160       170       180       190       200
                 ....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|
Mo-CBP3-1_cDNA   ACGACAACCAG------------CAGCAGAGATGTCGCCATCAATTTCAGTCCCAACAGCGCCTCAGGGCTTGCCAGCGCGTCATCCGGCGATGGAGCCA
Mo-CBP3-2_cDNA   ACGAGAACCAG------------CAGCAGAGATGTCGCCAGCAATTTCAGACCCACCAGCGCCTCAGGGCGTGCCAGCGCTTCATCCGGCGACGGACCCA
Mo-CBP3-2A_gDNA  ACGAGAACCAG------------CGGCAGAGATGTCGCCAGCAATTTCAGACCCACCAGCGCCTCAGGGCGTGCCAGCGCTTCATCCGGCGACGGACCCA
Mo-CBP3-2B_gDNA  ACGAGAACCAG------------CGGCAGAGATGTCGCCAGCAATTTCAGACCCACCAGCGCCTCAGGGCGTGCCAGCGCTTCATCCGGCGACGGACCCA
Mo-CBP3-3_gDNA   ACGACAACCAGCAAGGCCAGCAGCAGCAGCAATGCCGCCAGCAGTTTTTGACCCATCAACGCCTCAGGGCTTGCCAGCGCTTCATCCGACGACAGACCCA
Mo-CBP3-3_cDNA   ACGACAACCAGCAAGGCCAGCAGCAGCAGCAATGCCGCCAGCAGTTTTTGACCCATCAACGCCTCAGGGCTTGCCAGCGCTTCATCCGACGACAGACCCA
Mo-CBP3-3A_gDNA  ACGACAACCAGCAAGGCCAGCAGCAGCAGCAATGCCGCCAGCAGTTTTTGACCCATCAACGCCTCAGGGCTTGCCAGCGCTTCATCCGACGACAGACCCA
Mo-CBP3-3B_gDNA  ACGACAACCAGCAAGGCCAGCAGCAGCAGCAATGCCGCCAGCAGTTTTTGACCCATCAACGCCTCAGGGCTTGCCAGCGCTTCATCCGACGACAGACCCA
Mo-CBP3-4_gDNA   ACGACAACCAG------------CAGCAGAGATGTCGCCATCAATTTCAGACCCAACAGCGCCTCAGGGCTTGCCAGCGCGTCATCCGGCGATGGAGCCA
Mo-CBP3-4_cDNA   ACGACAACCAG------------CAGCAGAGATGTCGCCATCAATTTCAGACCCAACAGCGCCTCAGGGCTTGCCAGCGCGTCATCCGGCGATGGAGCCA

210 220 230 240 250 260 270 280 290 300 ....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....| Mo-CBP3-1_cDNA GGGTGGAGGTCCCATGGAGGACGTTGAAGACGAAATAGGCGAAACAGACGAAATCGAGGAAGTCGTTGAGCCCGACCAG---GCCCGTCGACCACCAACT Mo-CBP3-2_cDNA GGGTGGAGGTCCCCTGGACGAGGTTGAAGACGAAGTA------GACGAAATCGAAGAGGTTGTGGAGCCCGACCAGGGTCCCGGTCGACAACCGGCC Mo-CBP3-2A_gDNA GGGTGGAGGTCCCCTGGACGAGGTTGAAGACGAAGTA------GACGAAATCGAAGAGGTTGTGGAGCCCGACCAGGGTCCCGGTCGACAACCGGCC Mo-CBP3-2B_gDNA GGGTGGAGGTCCCCTGGACGAGGTTGAAGACGAAGTA------GACGAAATCGAAGAGGTTGTGGAGCCCGACCAGGGTCCCGGTCGACAACCGGCC Mo-CBP3-3_gDNA GGGTGGAGGCGCCCTCGAGGATGTCGAAGACGACGTA------GAAGAAATCGAGGAAGTGGTGGAGCCCGACCAG------GCCCGTCGACCAGCC Mo-CBP3-3_cDNA GGGTGGAGGCGCCCTCGAGGATGTCGAAGACGACGTA------GAAGAAATCGAGGAAGTGGTGGAGCCCGACCAG------GCCCGTCGACCAGCC Mo-CBP3-3A_gDNA GGGTGGAGGCGCCCTCGAGGATGTCGAAGACGACGTA------GAAGAAATCGAGGAAGTGGTGGAGCCCGACCAG------GCCCGTCGACCAGCC Mo-CBP3-3B_gDNA GGGTGGAGGCGCCCTCGAGGATGTCGAAGACGACGTA------GAAGGAATCGAGGAAGTGGTGGAGCCCGACCAG------GCCCGTCGACCAGCC Mo-CBP3-4_gDNA GGGTGGAGGTCCCATGGAGGACGTTGAAGACGAAATAGACGAAACAGACGAAATCGAGGAAGTCGTTGAGCCCGACCAG---GCCCGTCGACCACCAACT Mo-CBP3-4_cDNA GGGTGGAGGTCCCATGGAGGACGTTGAAGACGAAATAGACGAAACAGACGAAATCGAGGAAGTCGTTGAGCCCGACCAG---GCCCGTCGACCACCAACT

Source: prepared by the author.

Figure S7 (continues below). Multiple alignment of genomic DNA sequences encoding isoforms of Mo-CBP3.

310 320 330 340 350 360 370 380 390 400 ....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....| Mo-CBP3-1_cDNA CTCCAGCGTTGCTGCCGACAGCTGCGGAACGTATCTCCTTTCTGCAGGTGCCCTTCACTCAGGCAAGCAGTACAGTCTGCACAGCAGCAACAGGGACAGG Mo-CBP3-2_cDNA TTCCAGCGTTGCTGCCAACAGCTGCGGAACATATCTCCTCCTTGCAGGTGCCCATCACTCAGGCAAGCAGTACAGTTGACACACCAGCAGCAGGGACAGG Mo-CBP3-2A_gDNA TTCCAGCGTTGCTGCCAACAGCTGCGGAACATATCTCCTCCTTGCAGGTGCCCATCACTCAGGCAAGCAGTACAGTTGACACACCAGCAGCAGGGACAGG Mo-CBP3-2B_gDNA TTCCAGCGTTGCTGCCAACAGCTGCGGAACATATCTCCTCCTTGCAGGTGCCCATCACTCAGGCAAGCAGTACAGTTGACACACCAGCAGCAGGGACAGG Mo-CBP3-3_gDNA ATCCAACGTTGCTGCCAACAGCTGCGGAACATACAGCCTCGCTGCAGGTGCCCTTCACTGAGGCAGGCAGTACAGCTCGCACACCAGCAGCAGGGACAGG Mo-CBP3-3_cDNA ATCCAACGTTGCTGCCAACAGCTGCGGAACATACAGCCTCGCTGCAGGTGCCCTTCACTGAGGCAGGCAGTACAGCTCGCACACCAGCAGCAGGGACAGG Mo-CBP3-3A_gDNA ATCCAACGTTGCTGCCAACAGCTGCGGAACATACAGCCTCGCCGCAGGTGCCCTTCACTGAGGCAGGCAGTACAGCTCGCACACCAGCAGCAGGGACAGG Mo-CBP3-3B_gDNA ATCCAACGTTGCTGCCAACAGCTGCGGAACATACAGCCTCGCTGCAGGTGTCCTTCACTGAGGCAGGCAGTACAGCTCGCACACCAGCAGCAGGGACAGG Mo-CBP3-4_gDNA CTCCAGCGTTGCTGCCGACAGCTGCGGAACGTATCTCCTTTCTGCAGGTGCCCTTCACTCAGGCAAGCAGTACAGTCTGCACAGCAGCAACAGGGACAGG Mo-CBP3-4_cDNA CTCCAGCGTTGCTGCCGACAGCTGCGGAACGTATCTCCTTTCTGCAGGTGCCCTTCACTCAGGCAAGCAGTACAGTCTGCACAGCAGCAACAGGGACAGG

410 420 430 440 450 460 470 480 490 500 ....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....| Mo-CBP3-1_cDNA TCGGTCCTCAGCAGGTAGGTCACATGTACCGCGTCGCCAGTCGCATCCCTGCCATCTGTAACCCGCAGCCCATGAGGTGCCCGTTCCGTCAGCAGCAAGG Mo-CBP3-2_cDNA TGGGTCCTCAGCAGGTAAGGCAGATGTACCGCGTCGCTAGCAACATCCCCTCCATGTGCAACCTGCAGCCGATGAGCTGCCTCTTCCGTCAGCAGCAAAG Mo-CBP3-2A_gDNA TGGGTCCTCAGCAGGTAAGGCAGATGTACCGCGTCGCTAGCAACATCCCCTCCATGTGCAACCTGCAGCCGATGAGCTGCCCCTTCCGTCAGCAGCAAAG Mo-CBP3-2B_gDNA TGGGTCCTCAGCAGGTAAGGCAGATGTACCGCGTCGCTAGCAACATCCCCTCCATGCGCAACCTGCAGCCGATGAGCTGCCCCTTCCGTCAGCAGCAAAG Mo-CBP3-3_gDNA TGGGTCCTCAACAGGTAAGGCAGATGTACCGCCTTGCTAGCAACATCCCCGCTATCTGCAACCTGCGGCCAATGAGCTGCCCATTCGGTCAGCAG----- Mo-CBP3-3_cDNA TGGGTCCTCAACAGGTAAGGCAGATGTACCGCCTTGCTAGCAACATCCCCGCTATCTGCAACCTGCGGCCAATGAGCTGCCCATTCGGTCAGCAG----- Mo-CBP3-3A_gDNA TGGGTCCTCAACAGGTAGGGCAGATGTACCGCCTTGCTAGCAACATCCCCGCTATCTGCAACCTGCGGCCAATGAGCTGCCCATTCGGTCAGCAG----- Mo-CBP3-3B_gDNA TGGGTCCTCAACAGGTAAGGCAGATGTACCGCCTTGCTAGCAACATCCCCGCTATCTGCAACCTGCGGCCAATGAGCTGCCCATTCGGTCAGCAG----- Mo-CBP3-4_gDNA TCGGTCCTCAGCAGGTAGGTCACATGTACCGCGTCGCCAGTCGCATCCCTGCCATCTGTAACCTGCAGCCCATGAGGTGCCCGTTCCGTCAGCAGCAAAG Mo-CBP3-4_cDNA TCGGTCCTCAGCAGGTAGGTCACATGTACCGCGTCGCCAGTCGCATCCCTGCCATCTGTAACCTGCAGCCCATGAGGTGCCCGTTCCGTCAGCAGCAAAG

510 ....|....| Mo-CBP3-1_cDNA CTCG------ Mo-CBP3-2_cDNA CTCGTGGCTC Mo-CBP3-2A_gDNA CTCGTGGCTC Mo-CBP3-2B_gDNA CTCGTGGCTC Mo-CBP3-3_gDNA ------ Mo-CBP3-3_cDNA ------ Mo-CBP3-3A_gDNA ------ Mo-CBP3-3B_gDNA ------ Mo-CBP3-4_gDNA CTCG------ Mo-CBP3-4_cDNA CTCG------

Source: prepared by the author.

Figure S7. Multiple alignment of genomic DNA sequences encoding isoforms of Mo-CBP3. Genomic DNA sequences determined in the present work were aligned with previously determined cDNA sequences (GenBank accession numbers: KF616830, encoding Mo-CBP3-1; KF616832, encoding Mo-CBP3-2; KF616833, encoding Mo-CBP3-3; and KF616831, encoding Mo-CBP3-4; Freire et al., 2015). The program PAL2NAL (Suyama et al., 2006) was used to align the codons.
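The codon-based alignment mentioned in the legend of Figure S7 can be illustrated with a minimal sketch. It is not PAL2NAL itself; it only shows the underlying idea of threading a coding sequence onto its aligned protein sequence, one codon per residue and three gap characters per protein gap. The function name `thread_codons` and the gapped toy protein `MAK-L` are hypothetical; the nucleotides are the first twelve bases of Mo-CBP3-1_cDNA in the alignment above.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Map an unaligned coding sequence onto its aligned protein sequence.

    Each residue consumes one codon (three nucleotides) and each protein
    gap becomes three nucleotide gaps. The sketch only checks lengths; it
    does not verify that every codon translates to the paired residue.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) != 3 * n_residues:
        raise ValueError("CDS length must be three times the number of residues")

    codons = []
    position = 0
    for aa in aligned_protein:
        if aa == gap:
            codons.append(gap * 3)
        else:
            codons.append(cds[position:position + 3])
            position += 3
    return "".join(codons)


# Hypothetical gapped protein (M A K - L) and the first 12 nt of Mo-CBP3-1_cDNA.
print(thread_codons("MAK-L", "ATGGCAAAGCTC"))  # ATGGCAAAG---CTC
```

Because gaps are always inserted in multiples of three, the reading frame is preserved across the nucleotide alignment, which is the property a codon alignment is meant to guarantee.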

Table S2. Matrix of pairwise comparisons of the gDNA (this work) and cDNA (Freire et al., 2015) sequences encoding Mo-CBP3 (isoforms 1, 2, 2A, 2B, 3, 3A, 3B and 4). For each pair of sequences, the percentage of sequence identity (above the diagonal) and the number of nucleotide differences (below the diagonal) are shown. These values were calculated from the multiple sequence alignment shown in Fig. S7.

         1-cDNA   2-cDNA   2A-gDNA  2B-gDNA  3-gDNA   3-cDNA   3A-gDNA  3B-gDNA  4-gDNA   4-cDNA
1-cDNA   -        84.1     84.1     83.9     77.4     77.4     77.4     77.0     98.9     98.9
2-cDNA   79       -        99.5     99.3     79.6     79.6     79.2     79.2     84.5     84.5
2A-gDNA  79       2        -        99.7     79.6     79.6     79.2     79.2     84.5     84.5
2B-gDNA  80       3        1        -        79.4     79.4     79.0     79.0     84.3     84.3
3-gDNA   113      102      102      103      -        100.0    99.5     99.5     77.6     77.6
3-cDNA   113      102      102      103      0        -        99.5     99.5     77.6     77.6
3A-gDNA  113      104      104      105      2        2        -        99.1     77.6     77.6
3B-gDNA  115      104      104      105      2        2        4        -        77.2     77.2
4-gDNA   5        77       77       78       112      112      112      114      -        100.0
4-cDNA   5        77       77       78       112      112      112      114      0        -

Source: prepared by the author.
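The two quantities reported in Table S2 (number of differing positions and percentage of identity) can be computed from any pair of aligned sequences along the following lines. This is a minimal sketch, not the procedure used in the thesis: the function name `pairwise_stats` is hypothetical, and skipping columns that contain a gap in either sequence is an assumption, since the treatment of gaps is not stated here. The example uses the first 20 columns of Mo-CBP3-1_cDNA and Mo-CBP3-2_cDNA from Figure S7.

```python
def pairwise_stats(seq_a: str, seq_b: str, gap: str = "-") -> tuple[int, float]:
    """Return (number of differing positions, percent identity).

    Both sequences must come from the same alignment. Columns with a gap
    in either sequence are ignored (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    compared = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != gap and b != gap
    ]
    if not compared:
        raise ValueError("no ungapped columns to compare")

    differences = sum(1 for a, b in compared if a != b)
    identity = 100.0 * (len(compared) - differences) / len(compared)
    return differences, identity


# First 20 alignment columns of Mo-CBP3-1_cDNA and Mo-CBP3-2_cDNA.
print(pairwise_stats("ATGGCAAAGCTCACTCTCCT", "ATGGCAAAGATCACTCTCCT"))  # (1, 95.0)
```

Applying the function to every pair of rows in the alignment and placing identities above the diagonal and difference counts below it would give a matrix with the layout of Table S2.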

Table S3. Matrix of pairwise comparisons of the amino acid sequences of Mo-CBP3 (isoforms 1, 2, 2A, 2B, 3, 3A, 3B and 4). For each pair of sequences, the percentage of sequence identity (above the diagonal) and the number of amino acid differences (below the diagonal) are shown. These values were calculated from the multiple sequence alignment shown in Fig. S9.

     1      2      2A     2B     3      3A     3B     4
1    -      74.6   74.6   74.0   70.6   70.6   70.0   96.9
2    42     -      98.7   98.1   75.4   74.2   74.8   75.9
2A   42     2      -      99.3   75.4   74.2   74.8   75.9
2B   43     3      1      -      74.8   73.6   74.2   75.3
3    49     41     41     42     -      98.7   99.3   71.2
3A   49     43     43     44     2      -      98.1   71.2
3B   50     42     42     43     1      3      -      70.6
4    5      40     40     41     48     48     49     -

Source: prepared by the author.

Source: prepared by the author.

Figure S8. Graphical output from SignalP 4.1. The graphics were generated when the amino acid sequences of Mo-CBP3-2A (A), Mo-CBP3-2B (B), Mo-CBP3-3 (C), Mo-CBP3-3A (D), Mo-CBP3-3B (E) and Mo-CBP3-4 (F), as deduced from genomic DNA sequences, were analyzed through the program’s web server (http://www.cbs.dtu.dk/services/SignalP/).

Source: prepared by the author.

Figure S9. ESI-MS spectra of protein fractions obtained when reduced and alkylated Mo-CBP3 was analyzed by RP-HPLC.

Source: prepared by the author.

Figure S10. ESI-MS spectra of protein fractions obtained when reduced and alkylated Mo-CBP3 was analyzed by RP-HPLC.

Source: prepared by the author.

Figure S11. ESI-MS spectra of protein fractions obtained when reduced and alkylated Mo-CBP3 was analyzed by RP-HPLC.

Source: prepared by the author.

Figure S12. ESI-MS spectra of protein fractions obtained when reduced and alkylated Mo-CBP3 was analyzed by RP-HPLC.

Source: prepared by the author.

Figure S13. ESI-MS spectra of protein fractions obtained when reduced and alkylated Mo-CBP3 was analyzed by RP-HPLC.

Source: prepared by the author.

Figure S14. ESI-MS spectra of protein fractions obtained when reduced and alkylated Mo-CBP3 was analyzed by RP-HPLC.

Source: prepared by the author.

Figure S15. ESI-MS spectra of protein fractions obtained when reduced and alkylated Mo-CBP3 was analyzed by RP-HPLC.

Source: prepared by the author.

Figure S16. ESI-MS spectra of protein fractions obtained when reduced and alkylated Mo-CBP3 was analyzed by RP-HPLC.

Source: prepared by the author.

Figure S17. ESI-MS spectra of protein fractions obtained when reduced and alkylated Mo-CBP3 was analyzed by RP-HPLC.

Source: prepared by the author.

Figure S18. ESI-MS spectra of protein fractions obtained when reduced and alkylated Mo-CBP3 was analyzed by RP-HPLC.

Source: prepared by the author.

Figure S19. ESI-MS spectra of protein fractions obtained when reduced and alkylated Mo-CBP3 was analyzed by RP-HPLC.

Source: prepared by the author.

Figure S20. ESI-MS spectra of protein fractions obtained when reduced and alkylated Mo-CBP3 was analyzed by RP-HPLC.

Source: prepared by the author.

Figure S21. ESI-MS spectra of protein fractions obtained when reduced and alkylated Mo-CBP3 was analyzed by RP-HPLC.

Source: prepared by the author.

Figure S21. ESI-MS spectra of protein fractions obtained when reduced and alkylated Mo-CBP3 was analyzed by RP-HPLC.

Source: prepared by the author.

Figure S22. ESI-MS spectra of protein fractions obtained when reduced and alkylated Mo-CBP3 was analyzed by RP-HPLC.

Source: prepared by the author.

Figure S23. ESI-MS spectra of protein fractions obtained when reduced and alkylated Mo-CBP3 was analyzed by RP-HPLC.

Source: prepared by the author.

Figure S24. ESI-MS spectra of protein fractions obtained when reduced and alkylated Mo-CBP3 was analyzed by RP-HPLC.

Source: prepared by the author.

Figure S25. ESI-MS spectra of protein fractions obtained when reduced and alkylated Mo-CBP3 was analyzed by RP-HPLC.

Source: prepared by the author.

Figure S26. ESI-MS spectra of protein fractions obtained when reduced and alkylated Mo-CBP3 was analyzed by RP-HPLC.

Source: prepared by the author.

Figure S27. ESI-MS spectra of protein fractions obtained when reduced and alkylated Mo-CBP3 was analyzed by RP-HPLC.

Source: prepared by the author.

Figure S28. ESI-MS spectra of protein fractions obtained when reduced and alkylated Mo-CBP3 was analyzed by RP-HPLC.

Source: prepared by the author.

Figure S29. ESI-MS spectra of protein fractions obtained when reduced and alkylated Mo-CBP3 was analyzed by RP-HPLC.

Source: prepared by the author.

Figure S30. ESI-MS spectra of protein fractions obtained when reduced and alkylated Mo-CBP3 was analyzed by RP-HPLC.

Table S4. Phosphorylation sites predicted by the NetPhos server in the amino acid sequences of the small and large chains of Mo-CBP3. Score values are those calculated by the server.

Isoform           Chain   Residue   Sequence    Score and kinase
Mo-CBP3-1         Small   S47       HQFQSQQRL   0.590 DNAPK
Mo-CBP3-1         Small   S62       IRRWSQGGG   0.994 unsp; 0.822 PKA; 0.620 DNAPK; 0.525 ATM; 0.510 RSK; 0.502 PKG
Mo-CBP3-1         Large   S107      LRNVSPFCR   0.736 unsp; 0.609 PKA; 0.576 cdk5
Mo-CBP3-1         Large   S114      CRCPSLRQA   0.866 unsp; 0.717 PKC; 0.675 PKA
Mo-CBP3-1         Large   S121      QAVQSAQQQ   0.538 PKC
Mo-CBP3-1         Large   S142      YRVASRIPA   0.962 unsp; 0.590 PKA; 0.562 PKG
Mo-CBP3-2/2A/2B   Small   S62       IRRRTQGGG   0.936 unsp; 0.750 PKA; 0.555 PKG; 0.551 DNAPK
Mo-CBP3-2/2A/2B   Large   S107      LRNISPPCR   0.599 PKA; 0.589 unsp; 0.556 cdk5
Mo-CBP3-2/2A/2B   Large   S112      CRCPSLRQA   0.869 unsp; 0.638 PKA; 0.571 PKC
Mo-CBP3-2/2A/2B   Large   S140      YRVASNIPS   0.812 unsp; 0.520 PKG; 0.504 PKC
Mo-CBP3-3/3A/3B   Small   T66       IRRQTQGGG   0.985 unsp; 0.738 PKA; 0.568 DNAPK
Mo-CBP3-3/3A/3B   Large   S114      CRCPSLRQA   0.973 unsp; 0.772 PKC; 0.585 PKB; 0.585 PKA; 0.528 PSK
Mo-CBP3-3/3A/3B   Large   S142      YRLASNIPA   0.715 unsp; 0.507 cdc2; 0.506 PKA
Mo-CBP3-3/3A/3B   Large   S154      LRPMSCPFG   0.585 PKA
Mo-CBP3-4         Small   T47       HQFQTQQRL   PKA; DNAPK
Mo-CBP3-4         Small   S62       IRRWSQGGG   0.994 unsp; 0.822 PKA; 0.620 DNAPK; 0.525 ATM; 0.510 RSK; 0.502 PKG
Mo-CBP3-4         Large   S107      LRNVSPFCR   0.736 unsp; 0.609 PKA; 0.576 cdk5
Mo-CBP3-4         Large   S114      CRCPSLRQA   0.866 unsp; 0.717 PKC; 0.675 PKA
Mo-CBP3-4         Large   S121      QAVQSAQQQ   0.538 PKC
Mo-CBP3-4         Large   S142      YRVASRIPA   0.962 unsp; 0.615 PKA; 0.562 PKG

Source: prepared by the author.

Table S5. Assignment of experimental molecular masses, as determined by ESI-MS, to amino acid sequences of the Mo-CBP3-1 small chain. Monoisotopic mass values were those calculated from each amino acid sequence. Predicted mass values were obtained from the monoisotopic masses, including mass changes due to fixed and variable modifications (Oxi: Met oxidation; Hyd: Pro hydroxylation; Pho: Ser or Thr phosphorylation; pGlu: cyclization of N-terminal Gln).

Peptide  Monoisotopic mass  Predicted mass  Experimental mass  SD  ∆ (Da)  Variable modifications

39 69 Mo-CBP3-1 ( QRC ... PME )31 aa 3781,89 3911,9278 3911,2564 0,1680 0,6714 1x Oxi

39 69 Mo-CBP3-1 ( QRC ... PME )31 aa 3781,89 3974,8676 3974,7311 0,0813 0,1366 pGlu, 1x Oxi and 1x Pho

39 69 Mo-CBP3-1 ( QRC ... PME )31 aa 3781,89 3990,8625 3990,6306 0,0000 0,2319 pGlu, 1x Oxi, 1x Pho and 1x Hyd

39 70 Mo-CBP3-1 ( QRC ... MED )32 aa 3896,92 4025,9262 4025,6528 0,0000 0,2734 pGlu, 1x Oxi and 1x Hyd

39 70 Mo-CBP3-1 ( QRC ... MED )32 aa 3896,92 4073,9027 4074,1838 0,0000 0,2811 pGlu and 1x Pho

38 69 Mo-CBP3-1 ( QQR ... PME )32 aa 3909,95 4086,9327 4086,8108 0,0000 0,1219 pGlu and 1x Pho

39 70 Mo-CBP3-1 ( QRC ... MED )32 aa 3896,92 4122,9191 4122,8218 0,0000 0,0973 1x Oxi, 1x Pho and 1x Hyd

39 71 Mo-CBP3-1 ( QRC ... EDV )33 aa 3995,99 4188,9676 4188,7649 0,0659 0,2027 pGlu, 1x Oxi and 1x Pho

39 71 Mo-CBP3-1 ( QRC ... EDV )33 aa 3995,99 4204,9625 4204,7363 0,0000 0,2262 pGlu, 1x Oxi, 1x Pho and 1x Hyd

38 70 Mo-CBP3-1 ( QQR ... MED )33 aa 4024,98 4218,9893 4219,2536 0,0684 0,2644 1x Pho

38 70 Mo-CBP3-1 ( QQR ... MED )33 aa 4024,98 4233,9525 4233,3965 0,0000 0,5560 pGlu, 1x Oxi and 1x Pho

39 72 Mo-CBP3-1 ( QRC ... DVE )34 aa 4125,03 4254,0362 4253,9771 0,0366 0,0591 pGlu, 1x Oxi and 1x Hyd

37 70 Mo-CBP3-1 ( QQQ ... MED )34 aa 4153,04 4267,0829 4267,5342 0,1429 0,4513

39 72 Mo-CBP3-1 ( QRC ... DVE )34 aa 4125,03 4302,0127 4302,7402 0,0000 0,7275 pGlu and 1x Pho

38 71 Mo-CBP3-1 ( QQR ... EDV )34 aa 4124,05 4317,0276 4317,5176 0,0578 0,4900 pGlu, 1x Oxi and 1x Pho

39 72 Mo-CBP3-1 ( QRC ... DVE )34 aa 4125,03 4351,0291 4350,8582 0,0757 0,1709 1x Oxi, 1x Pho and 1x Hyd

38 72 Mo-CBP3-1 ( QQR ... DVE )35 aa 4253,09 4367,1329 4367,0242 0,5719 0,1087

38 72 Mo-CBP3-1 ( QQR ... DVE )35 aa 4253,09 4430,0727 4430,3102 0,4087 0,2375 pGlu and 1x Pho

37 71 Mo-CBP3-1 ( QQQ ... EDV )35 aa 4252,10 4446,1093 4446,3865 0,1419 0,2772 pGlu, 1x Oxi and 1x Pho

38 72 Mo-CBP3-1 ( QQR ... DVE )35 aa 4253,09 4399,1227 4399,6560 0,1986 0,5333 1x Oxi and 1x Hyd

Source: prepared by the author.
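The relationship between the monoisotopic and predicted masses in Table S5 can be sketched as a sum of modification mass shifts. The sketch below rests on assumptions that are not stated in the table itself: the fixed modification is taken to be carbamidomethylation of Cys (+57.021464 Da per residue), and the small chain is taken to carry two alkylated Cys residues, because the row without variable modifications (4153,04 versus 4267,0829) differs by about 114.043 Da. The shifts used for Oxi, Hyd, Pho and pGlu are the standard monoisotopic values; the function name `predicted_mass` is hypothetical.

```python
# Standard monoisotopic mass shifts, in Da.
DELTA = {
    "Oxi": 15.994915,    # Met oxidation
    "Hyd": 15.994915,    # Pro hydroxylation
    "Pho": 79.966331,    # Ser or Thr phosphorylation
    "pGlu": -17.026549,  # cyclization of N-terminal Gln (loss of NH3)
}

# Assumed fixed modification: carbamidomethylation of each alkylated Cys.
CARBAMIDOMETHYL = 57.021464


def predicted_mass(monoisotopic: float, n_cys: int, **variable_mods: int) -> float:
    """Add fixed and variable modification shifts to a monoisotopic mass."""
    mass = monoisotopic + n_cys * CARBAMIDOMETHYL
    for name, count in variable_mods.items():
        mass += count * DELTA[name]
    return mass


# First two rows of Table S5 (peptide 39-69, monoisotopic mass 3781,89).
print(round(predicted_mass(3781.89, 2, Oxi=1), 4))                 # 3911.9278
print(round(predicted_mass(3781.89, 2, pGlu=1, Oxi=1, Pho=1), 4))  # 3974.8676
```

The ∆ (Da) column then corresponds to the absolute difference between the predicted and experimental masses, for example |3911,9278 - 3911,2564| = 0,6714 for the first row.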

Table S6. Assignment of experimental molecular masses, as determined by ESI-MS, to amino acid sequences of the Mo-CBP3-2 small chain. Monoisotopic mass values were those calculated from each amino acid sequence. Predicted mass values were obtained from the monoisotopic masses, including mass changes due to fixed and variable modifications.

Peptide  Monoisotopic mass  Predicted mass  Experimental mass  SD  ∆ (Da)  Variable modifications

39 69 Mo-CBP3-2 ( QRC ... PLD )31 aa 3795,97 3926,0078 3925,1062 0,0404 0,9016 1x Hyd

39 70 Mo-CBP3-2 ( QRC ... LDE )32 aa 3925,01 4038,0213 4038,3741 0,0248 0,3528 pGlu and 1x Hyd

37 70 Mo-CBP3-2 ( QQQ ... LDE )34 aa 4181,13 4278,1464 4279,4971 0,0000 1,3507 pGlu

37 70 Mo-CBP3-2 ( QQQ ... LDE )34 aa 4181,13 4294,1413 4293,5144 0,0525 0,6269 pGlu and 1x Hyd

37 70 Mo-CBP3-2 ( QQQ ... LDE )34 aa 4181,13 4311,1678 4310,6533 0,2483 0,5145 1x Hyd

37 71 Mo-CBP3-2 ( QQQ ... DEV )35 aa 4280,20 4393,2113 4393,4531 0,1126 0,2418 pGlu and 1x Hyd

37 71 Mo-CBP3-2 ( QQQ ... DEV )35 aa 4280,20 4473,1776 4471,8106 0,1646 1,3671 pGlu, 1x Pho and 1x Hyd

37 71 Mo-CBP3-2 ( QQQ ... DEV )35 aa 4280,20 4457,1827 4456,3848 0,0000 0,7979 pGlu and 1x Pho

Source: prepared by the author.

Table S7. Assignment of experimental molecular masses, as determined by ESI-MS, to amino acid sequences of the Mo-CBP3-2A/2B small chain. Monoisotopic mass values were those calculated from each amino acid sequence. Predicted mass values were obtained from the monoisotopic masses, including mass changes due to fixed and variable modifications.

Peptide  Monoisotopic mass  Predicted mass  Experimental mass  SD  ∆ (Da)  Variable modifications

38 70 Mo-CBP3-2A/2B ( RQR ... LDE )33 aa 4081,12 4194,1313 4195,7803 0,0000 1,6490 pGlu and 1x Hyd

38 71 Mo-CBP3-2A/2B ( RQR ... DEV )34 aa 4081,18 4391,1342 4390,1842 0,0000 0,9500 1x Pho and 1x Hyd

38 71 Mo-CBP3-2A/2B ( RQR ... DEV )34 aa 4081,18 4374,1893 4373,8339 0,1999 0,3554 1x Pho

38 72 Mo-CBP3-2A/2B ( RQR ... EVE )35 aa 4309,23 4422,2413 4422,8447 0,0000 0,6034 pGlu and 1x Hyd

37 71 Mo-CBP3-2A/2B ( QRQ ... DEV )35 aa 4308,24 4438,2778 4438,7791 0,1749 0,5012 1x Hyd

37 71 Mo-CBP3-2A/2B ( QRQ ... DEV )35 aa 4308,24 4502,2493 4501,0205 0,0000 1,2288 1x Pho

37 72 Mo-CBP3-2A/2B ( QRQ ... EVE )36 aa 4437,29 4534,3064 4536,0366 0,0000 1,7302 pGlu

Source: prepared by the author.

Table S8. Assignment of experimental molecular masses, as determined by ESI-MS, to amino acid sequences of the Mo-CBP3-3/3A/3B small chain. Monoisotopic mass values were those calculated from each amino acid sequence. Predicted mass values were obtained from the monoisotopic masses, including mass changes due to fixed and variable modifications.

Peptide  Monoisotopic mass  Predicted mass  Experimental mass  SD  ∆ (Da)  Variable modifications

44 72 Mo-CBP3-3/3A/3B ( QCR ... GAL )29 aa 3455,81 3712,7590 3712,7739 0,0000 0,0149 pGlu and 2x Pho

44 73 Mo-CBP3-3/3A/3B ( QCR ... ALE )30 aa 3584,85 3761,8403 3761,9399 0,0000 0,0996 pGlu and 1x Pho

44 73 Mo-CBP3-3/3A/3B ( QCR ... ALE )30 aa 3584,85 3778,8669 3780,6914 0,0888 1,8246 1x Pho

42 71 Mo-CBP3-3/3A/3B ( QQQ ... GGA )30 aa 3598,84 3792,8493 3792,4390 0,0228 0,4103 1x Pho

44 74 Mo-CBP3-3/3A/3B ( QCR ... LED )31 aa 3699,88 3796,8964 3798,3441 0,0306 1,4477 pGlu

42 72 Mo-CBP3-3/3A/3B ( QQQ ... GAL )31 aa 3711,93 3808,94637 3805,0896 0,0000 3,8568 pGlu

43 73 Mo-CBP3-3/3A/3B ( QQC ... ALE )31 aa 3712,92 3809,9325 3810,5969 0,0000 0,6644 pGlu

43 73 Mo-CBP3-3/3A/3B ( QQC ... ALE )31 aa 3712,91 3826,9590 3826,5420 0,1072 0,4170

43 72 Mo-CBP3-3/3A/3B ( QQC ... GAL )30 aa 3583,87 3840,8190 3840,3687 0,0000 0,4503 pGlu and 2x Pho

44 74 Mo-CBP3-3/3A/3B ( QCR ... LED )31 aa 3699,88 3876,8627 3876,0911 0,2331 0,7716 pGlu and 1x Pho

43 73 Mo-CBP3-3/3A/3B ( QQC ... ALE )31 aa 3712,91 3889,8988 3890,5552 0,0000 0,6564 pGlu and 1x Pho

44 75 Mo-CBP3-3/3A/3B ( QCR ... EDV )32 aa 3798,95 3895,9664 3897,6879 1,6699 1,7215 pGlu

42 72 Mo-CBP3-3/3A/3B ( QQQ ... GAL )31 aa 3711,93 3905,9393 3904,9390 0,0000 1,0003 1x Pho

43 74 Mo-CBP3-3/3A/3B ( QQC ... LED )32 aa 3827,94 3941,9829 3940,2865 0,0462 1,6964

42 72 Mo-CBP3-3/3A/3B ( QQQ ... GAL )31 aa 3711,93 3968,8790 3967,4628 0,2623 1,4162 pGlu and 2x Pho

42 73 Mo-CBP3-3/3A/3B ( QQQ ... ALE )32 aa 3840,97 4017,9573 4018,6013 0,0000 0,6440 pGlu and 1x Pho

44 76 Mo-CBP3-3/3A/3B ( QCR ... DVE )33 aa 3927,99 4025,0064 4025,3928 0,0000 0,3864 pGlu

40 71 Mo-CBP3-3/3A/3B ( QQQ ... GGA )32 aa 3854,96 4048,9693 4048,0107 0,0000 0,9586 1x Pho

44 75 Mo-CBP3-3/3A/3B ( QCR ... EDV )32 aa 3798,95 4055,8990 4056,4688 0,0000 0,5698 pGlu and 2x Pho

41 73 Mo-CBP3-3/3A/3B ( QQQ ... ALE )33 aa 3969,03 4066,0495 4065,7617 0,0638 0,2877 pGlu

40 72 Mo-CBP3-3/3A/3B ( QQQ ... GAL )33 aa 3968,05 4082,0929 4080,7900 0,0000 1,3029

44 76 Mo-CBP3-3/3A/3B ( QCR ... DVE )33 aa 3927,99 4104,9727 4105,4468 0,0000 0,4741 pGlu and 1x Pho

41 73 Mo-CBP3-3/3A/3B ( QQQ ... ALE )33 aa 3966,03 4163,0424 4163,0103 0,9469 0,0321 1x Pho

44 76 Mo-CBP3-3/3A/3B ( QCR ... DVE )33 aa 3927,99 4184,9390 4185,9189 0,0000 0,9799 pGlu and 2x Pho

41 74 Mo-CBP3-3/3A/3B ( QQQ ... LED )34 aa 4084,06 4198,1029 4196,8547 0,0078 1,2482

40 75 Mo-CBP3-3/3A/3B ( QQQ ... EDV )36 aa 4311,18 4408,1964 4407,8840 0,2937 0,3124 pGlu

39 74 Mo-CBP3-3/3A/3B ( GQQ ... LED )36 aa 4269,14 4463,1493 4464,7344 0,4160 1,5852 1x Pho

38 73 Mo-CBP3-3/3A/3B ( QGQ ... ALE )36 aa 4282,17 4476,1808 4476,5762 0,0000 0,3954 1x Pho

38 74 Mo-CBP3-3/3A/3B ( QGQ ... LED )37 aa 4397,20 4494,2164 4494,6184 0,0657 0,4020 pGlu

37 74 Mo-CBP3-3/3A/3B ( QQG ... LED )38 aa 4410,23 4524,2729 4521,3341 0,0226 2,9387

Source: prepared by the author.

Table S9. Assignment of experimental molecular masses, as determined by ESI-MS, to amino acid sequences of the Mo-CBP3-4 small chain. Monoisotopic mass values were those calculated from each amino acid sequence. Predicted mass values were obtained from the monoisotopic masses, including mass changes due to fixed and variable modifications.

Peptide  Monoisotopic mass  Predicted mass  Experimental mass  SD  ∆ (Da)  Variable modifications

38 69 Mo-CBP3-4 ( QQR ... PME )32 aa 3923,97 4116,9476 4116,4297 0,0000 0,5179 pGlu, 1x Oxi and 1x Pho

39 71 Mo-CBP3-4( QRC ... EDV )33 aa 4010,00 4140,0378 4139,7871 0,0625 0,2507 1x Oxi

37 69 Mo-CBP3-4 ( QQQ ... PME )33 aa 4052,02 4324,9639 4324,6920 0,1290 0,2719 1x Oxi and 2x Pho

37 69 Mo-CBP3-4 ( QQQ ... PME )33 aa 4052,02 4246,0293 4245,6079 0,0000 0,4214 1x Pho

37 69 Mo-CBP3-4 ( QQQ ... PME )33 aa 4052,02 4260,9925 4260,8975 0,4093 0,0950 1x Oxi, 1x Pho and 1x Hyd

37 70 Mo-CBP3-4 ( QQQ ... MED )34 aa 4167,05 4344,0327 4343,1727 0,0927 0,8600 pGlu and 1x Pho

38 72 Mo-CBP3-4 ( QQR ... DVE )35 aa 4267,10 4381,1429 4381,7926 0,1290 0,6497

Source: prepared by the author.

Table S10. Assignment of experimental molecular masses, as determined by ESI-MS, to amino acid sequences of the Mo-CBP3-1 large chain. Monoisotopic mass values were those calculated from each amino acid sequence. Predicted mass values were obtained from the monoisotopic masses, including mass changes due to fixed and variable modifications.

Peptide  Monoisotopic mass  Predicted mass  Experimental mass  SD  ∆ (Da)  Variable modifications

92 160 Mo-CBP3.1 ( RPP ... RQQ )69 aa 7912,95 8335,0451 8334,7607 0,0000 0,2844 1x Pho

91 160 Mo-CBP3.1 ( RRP ... RQQ )70 aa 8069,05 8427,1737 8426,5811 0,0000 0,5926 1x Oxi

90 160 Mo-CBP3.1 ( ARR ... RQQ )71 aa 8140,09 8514,2086 8513,2761 0,0913 0,9324 2x Oxi

90 160 Mo-CBP3.1 ( ARR ... RQQ )71 aa 8140,09 8530,2035 8529,4355 0,0000 0,7680 2x Oxi and 1x Hyd

90 160 Mo-CBP3.1 ( ARR ... RQQ )71 aa 8140,09 8562,1851 8562,2362 0,5360 0,0511 1x Pho

91 160 Mo-CBP3.1 ( RRP ... RQQ )70 aa 8069,05 8619,0962 8618,6650 0,0000 0,4312 2x Oxi, 2x Pho and 1x Hyd

90 160 Mo-CBP3.1 ( ARR ... RQQ )71 aa 8140,09 8722,1178 8721,7988 0,0000 0,3189 3x Pho

Source: prepared by the author.

Table S11. Assignment of experimental molecular masses, as determined by ESI-MS, to amino acid sequences of the Mo-CBP3-2 large chain. Monoisotopic mass values were those calculated from each amino acid sequence. Predicted mass values were obtained from the monoisotopic masses, including mass changes due to fixed and variable modifications.

Peptide  Monoisotopic mass  Predicted mass  Experimental mass  SD  ∆ (Da)  Variable modifications

87 158 Mo-CBP3.2 ( GPG ... RQQ )72 aa 8205,01 8771,0511 8771,7002 0,0000 0,6491 3x Oxi, 2x Pho and 1x Hyd

89 158 Mo-CBP3.2 ( GRQ ... RQQ )71 aa 8050,94 8409,0637 8408,6855 0,0000 0,3782 1x Oxi

89 158 Mo-CBP3.2 ( GRQ ... RQQ )71 aa 8050,94 8441,0541 8440,3082 0,4715 0,7459 3x Oxi

89 158 Mo-CBP3.2 ( GRQ ... RQQ )71 aa 8050,94 8473,0433 8472,3204 0,3267 0,7229 1x Pho

88 158 Mo-CBP3.2 ( PGR ... RQQ )71 aa 8147,99 8490,1188 8490,4727 0,0000 0,3539

89 158 Mo-CBP3.2 ( GRQ ... RQQ )70 aa 8050,94 8505,0249 8505,3010 0,1514 0,2761 2x Oxi and 1x Pho

88 158 Mo-CBP3.2 ( PGR ... RQQ )71 aa 8147,99 8522,1086 8521,8914 0,4001 0,2172 2x Oxi

88 158 Mo-CBP3.2 ( PGR ... RQQ )71 aa 8147,99 8570,0933 8570,5693 0,3216 0,4760 1x Pho

87 158 Mo-CBP3.2 ( GPG ... RQQ )72 aa 8205,01 8579,1286 8579,2542 0,8600 0,1256 2x Oxi

87 158 Mo-CBP3.2 ( GPG ... RQQ )72 aa 8205,01 8595,1235 8595,4322 0,5542 0,3087 3x Oxi

90 158 Mo-CBP3.2 ( RQP ... RQQ )69 aa 7993,91 8623,9225 8623,2999 0,6377 0,6226 3x Oxi and 3x Pho

89 158 Mo-CBP3.2 ( GRQ ... RQQ )70 aa 8050,94 8664,9576 8665,2533 0,2346 0,2957 2x Oxi and 3x Pho

89 158 Mo-CBP3.2 ( GRQ ... RQQ )70 aa 8050,94 8680,9525 8680,6878 0,4290 0,2647 3x Oxi and 3x Pho

87 158 Mo-CBP3.2 ( GPG ... RQQ )72 aa 8205,01 8723,0746 8724,4606 0,5492 1,3861 1x Oxi and 2x Pho

87 158 Mo-CBP3.2 ( GPG ... RQQ )72 aa 8205,01 8819,0276 8820,3047 0,0000 1,2771 2x Oxi and 3x Pho

Source: prepared by the author.

Table S12. Assignment of experimental molecular masses, as determined by ESI-MS, to amino acid sequences of the Mo-CBP3-2A large chain. Monoisotopic mass values were those calculated from each amino acid sequence. Predicted mass values were obtained from the monoisotopic masses, including mass changes due to fixed and variable modifications.

Peptide  Monoisotopic mass  Predicted mass  Experimental mass  SD  ∆ (Da)  Variable modifications

90 158 Mo-CBP3.2A ( RQP ... RQQ )69 aa 7977,88 8463,9547 8463,5996 0,0000 0,3551 3x Oxi, 1x Pho and 1x Hyd

89 158 Mo-CBP3.2A ( GRQ ... RQQ )70 aa 8034,90 8456,9951 8455,1758 0,0000 1,8193 1x Pho

91 158 Mo-CBP3.2A ( QPA ... RQQ )68 aa 7821,78 8499,7772 8499,5645 0,0000 0,2127 3x Oxi, 3x Pho and 3x Hyd

87 158 Mo-CBP3.2A ( GPG ... RQQ )72 aa 8188,98 8531,1088 8531,8895 0,2543 0,7808

88 158 Mo-CBP3.2A ( PGR ... RQQ )71 aa 8131,96 8538,0684 8538,0788 0,5254 0,0104 3x Oxi and 1x Hyd

88 158 Mo-CBP3.2A ( PGR ... RQQ )71 aa 8131,96 8554,0551 8553,9866 0,3301 0,0685 1x Pho

89 158 Mo-CBP3.2A ( GRQ ... RQQ )70 aa 8034,90 8568,9512 8569,0008 0,0714 0,0496 2x Oxi and 2x Pho

87 158 Mo-CBP3.2A ( GPG ... RQQ )72 aa 8188,98 8611,0751 8610,7493 0,6266 0,3258 1x Pho

88 158 Mo-CBP3.2A ( PGR ...RQQ )71 aa 8131,96 8761,9725 8760,5547 0,0000 1,4178 3x Pho

88 158 Mo-CBP3.2A ( PGR ...RQQ )71 aa 8131,96 8793,9623 8792,0915 1,0120 1,8708 3x Oxi, 3x Pho and 2x Hyd

87 158 Mo-CBP3.2A ( GPG ... RQQ )72 aa 8188,98 8834,9874 8834,7402 0,0000 0,2472 3x Oxi, 3x Pho and 2x Hyd

Source: prepared by the author.

Table S13. Assignment of experimental molecular masses, as determined by ESI-MS, to amino acid sequences of the Mo-CBP3-2B large chain. Monoisotopic mass values were those calculated from each amino acid sequence. Predicted mass values were obtained from the monoisotopic masses, including mass changes due to fixed and variable modifications.

Peptide  Monoisotopic mass  Predicted mass  Experimental mass  SD  ∆ (Da)  Variable modifications

89 158 Mo-CBP3.2B ( GRQ ... RQQ )70 aa 8088,00 8531,9076 8532,3174 0,4674 0,4098 2x Pho

89 158 Mo-CBP3.2B ( GRQ ... RQQ )70 aa 8088,00 8549,0349 8547,4759 0,6474 1,5590 1x Oxi and 2x Pho

87 158 Mo-CBP3.2B ( GPG ... RQQ )71 aa 8242,07 8591,1569 8590,3839 0,3944 0,7730 3x Oxi and 1x Hyd

89 158 Mo-CBP3.2B ( GRQ ... RQQ )70 aa 8088,00 8613,0145 8613,4457 0,3343 0,4312 3x Oxi, 2x Pho and 2x Hyd

87 158 Mo-CBP3.2B ( GPG ... RQQ )71 aa 8242,07 8639,1335 8638,0301 0,2036 1,1034 2x Oxi and 1x Pho

88 158 Mo-CBP3.2B ( PGR ... RQQ )70 aa 8185,05 8646,0849 8645,0156 0,0000 1,0693 1x Oxi and 2x Pho

90 158 Mo-CBP3.2B ( RQP ... RQQ )71 aa 8030,98 8651,9558 8652,5990 0,6392 0,6432 3x Oxi, 3x Pho and 3x Hyd

89 158 Mo-CBP3.2B ( GRQ ... RQQ )73 aa 8088,00 8660,9910 8661,4876 2,0690 0,4966 3x Oxi and 3x Pho

88 158 Mo-CBP3.2B ( PGR ... RQQ )70 aa 8185,05 8694,0696 8694,9947 0,2688 0,9251 3x Oxi, 2x Pho and 1x Hyd

89 158 Mo-CBP3.2B ( GRQ ... RQQ )71 aa 8088,00 8708,9758 8709,2909 0,8155 0,3151 3x Oxi, 3x Pho and 3x Hyd

87 158 Mo-CBP3.2B ( GPG ... RQQ )71 aa 8242,07 8751,0896 8750,4605 0,6738 0,6291 3x Oxi, 2x Pho and 1x Hyd

Source: prepared by the author.

Table S14. Assignment of experimental molecular masses, as determined by ESI-MS, to amino acid sequences of the Mo-CBP3-3/3B large chain. Monoisotopic mass values were those calculated from each amino acid sequence. Predicted mass values were obtained from the monoisotopic masses, including mass changes due to fixed and variable modifications.

Peptide  Monoisotopic mass  Predicted mass  Experimental mass  SD  ∆ (Da)  Variable modifications

91 160 Mo-CBP3.3/3B ( ARR ... GQQ )70 aa 8022,07 8460,1600 8460,7627 0,0000 0,6027 1x Oxi and 1x Pho

92 160 Mo-CBP3.3/3B ( RRP ... GQQ )69 aa 7951,03 8485,0812 8485,0348 0,1823 0,0464 2x Oxi and 2x Pho

Source: prepared by the author.

Table S15. Assignment of experimental molecular masses, as determined by ESI-MS, to amino acid sequences of the Mo-CBP3-3A large chain. Monoisotopic mass values were those calculated from each amino acid sequence. Predicted mass values were obtained from the monoisotopic masses, including mass changes due to fixed and variable modifications.

Peptide  Monoisotopic mass  Predicted mass  Experimental mass  SD  ∆ (Da)  Variable modifications

92 160 Mo-CBP3.3A ( RRP ... GQQ )69 aa 7905,05 8382,0798 8381,7925 0,2013 0,2873 2x Oxi and 1x Pho

91 160 Mo-CBP3.3A ( ARR ... GQQ )70 aa 7976,08 8398,1751 8397,4561 0,0000 0,7190 1x Pho

Source: prepared by the author.

Table S16. Assignment of experimental molecular masses, as determined by ESI-MS, to amino acid sequences of the Mo-CBP3-4 large chain. Monoisotopic mass values were those calculated from each amino acid sequence. Predicted mass values were obtained from the monoisotopic masses, including mass changes due to fixed and variable modifications.

Peptide Monoisotopic mass Predicted mass Experimental mass SD ∆ (Da) Variable modifications

90 160 Mo-CBP3.4 ( ARR ... RQQ )71 aa 8156,12 8498,2488 8498,6921 0,2299 0,4433

90 160 Mo-CBP3.4 ( ARR .... RQQ )71 aa 8156,12 8514,2437 8514,5244 0,0000 0,2807 1x Oxi

91 160 Mo-CBP3.4 ( RRP ... RQQ )70 aa 8085,08 8603,1363 8604,6192 0,4706 1,4829 1x Oxi and 2x Pho

90 160 Mo-CBP3.4 ( ARR ... RQQ )71 aa 8156,12 8626,1998 8626,4795 0,0000 0,2797 2x Oxi, 1x Pho and 1x Hyd

91 160 Mo-CBP3.4 ( RRP ... RQQ )70 aa 8085,08 8667,1160 8667,5136 0,4063 0,3976 3x Pho

91 160 Mo-CBP3.4 ( RRP ... RQQ )70 aa 8085,08 8731,0874 8731,9766 0,0000 0,8892 2x Oxi, 3x Pho and 2x Hyd

90 160 Mo-CBP3.4 ( ARR ... RQQ )71 aa 8156,12 8738,1560 8737,9868 0,8197 0,1692 3x Pho

90 160 Mo-CBP3.4 ( ARR ... RQQ )71 aa 8156,12 8802,1274 8802,2155 0,0759 0,0881 2x Oxi, 3x Pho and 2x Hyd

Source: prepared by the author.