Chemical Proteomics: mass spectrometry-based label-free quantitative approach in cell systems

Catarina Guimarães Frazão de Faria

Thesis to obtain the Master Science Degree in Biological Engineering

Supervisors: PhD Alfonsina D’Amato Prof. Maria Matilde Soares Duarte Marques

Examination Committee Chairperson: Prof. Arsénio do Carmo Sales Mendes Fialho Supervisor: Prof. Maria Matilde Soares Duarte Marques Member of the Committee: Dra. Alexandra Maria Moita Antunes

October 2018 ACKNOWLEDGMENTS

First and foremost, I would like to dedicate this work to my grandparents Maria Helena and José Manuel, who unfortunately did not live long enough to accompany my latest journeys yet had always encouraged me to pursue my goals. The second note goes to my parents, for their categorical support, and for their effort to financially make it possible for me to embark on a lifetime experience: the 6-months-Erasmus in Milan that I had long dreamed about, to do the research internship that prospered this thesis. All and all, they are the people that over the years gave all the tools for me to build my character, to enrich my vision on things, and to thrive.

I am sincerely thankful for the opportunity of work and the conditions presented at Unità di Ricerca di Analisi Farmaceutica e Biofarmaceutica, Dipartimento di Scienze Farmaceutiche (DISFARM), Università degli Studi di Milano, by the hands of Prof. Marina Carini and Prof. Giancarlo Aldini, who accepted to receive me, and Alfonsina D’Amato PhD, in the quality of my supervisor. To my Italian colleagues, with whom I shared most of my time at the lab, words cannot describe how incredibly fortunate I feel to have found myself in such a joyful, enthusiastic and caring environment. More than simply being co-workers, we pieced together memories that truly marked my experience abroad, for which I am very grateful. And luckily alongside I had the chance to become somewhat fluent in Italian; se non fosse stato per il loro incentivo quotidiano, non sarei mai riuscita a farlo. Ricorderò sempre tutti i momenti divertenti a pausa pranzo, la loro perplessità con il cibo che portavo, le mille volte che ho detto che non volevo caffè; le serate di beach volley e in palestra, e tutti gli altri programmi che abbiamo fatto fuori dal laboratorio, dagli aperitivi all’Union Club (che odiavo) e pranzi al Doge alla cena portoghese a casa mia, non dimenticando mai le giornate ai laghi di Como e d'Iseo, e in particolare il bellissimo giorno del mio compleanno. E per tutto questo e molto altro, per tutti i ricordi che ho nel cuore, lascio loro una parola di ringraziamento: a Giacomo, dal primo giorno il mio compagno di lavoro in questo progetto; a Laura, Marco, Giovanna, Martina, Ettore, Paola, Nicolas, Federica; e ad Alessandra.

I would like as well to thank my internal supervisor at Instituto Superior Técnico, Prof. Matilde Marques, for the guidance and help provided.

On another personal note, I would like to give a special thank you to all the friends that supported me throughout the years, particularly during this Erasmus period, which if it had to be summarized in one word it would be dramatic. To my dearest friends from Portugal, some that came to visit, some that were abroad too and I went to visit, and some that I met in Milan; and to my international Erasmus buddies Theresa, Sofija and Tomasz. They say Erasmus is about the people you encounter and those whom you become close with, and it could not be any truer. I am tremendously grateful for the 6 months I got to spend in Milan developing this work; for all the new friendships, the travelling, the chance to expand my horizons and, above all, for allowing me to grow as a person and to acquire life lessons that I will forever recall.

3

PREFACE

The work presented in this thesis was performed at Unità di Ricerca di Analisi Farmaceutica e Biofarmaceutica, Dipartimento di Scienze Farmaceutiche (DISFARM), Università degli Studi di Milano, address Via Luigi Mangiagalli 25-20133, Milan, Italy, under the supervision of Prof. Marina Carini and Alfonsina D’Amato PhD, during the period February – July 2018 within the frame of the Erasmus+ programme. It was co-supervised at Instituto Superior Técnico by Prof. Matilde Marques.

4

ABSTRACT

For the present project, a novel mass spectrometry-based label-free untargeted method was followed under the paradigm of Quantitative Proteomics. The work developed provides an example of a modern-era explorative proteomic approach in cell systems, supporting the current multidisciplinary context of Chemical Proteomics as a fundamental contributor to drug development. The method was based on applying cellular shotgun proteomics without labelling, making use of the high- resolution OrbitrapTM FusionTM TribridTM mass spectrometer for a technologically advanced mass spectrometric analysis coupled with on-line nano-high performance liquid chromatographic separation, which allows an in- depth identification and characterization of entire proteomes. In this case, it was studied the proteolytic digested proteome of Caco-2 cells upon treatment with 4-hydroxy- 2-nonenal (HNE) in concentration range from 0 to 100 µM. HNE is an electrophile molecule derived from lipid peroxidation of poly-unsaturated fatty acids, known to play a key cytoprotective role in oxidative stress prior to exerting toxic effects. Thus, the aim was to attain an integrative profiling of this complex biological matrix to assess protein changes and post-translational modifications that could be related to the beneficial character of HNE in oxidative stress, and the cellular pathways involved in the antioxidant response. From a total of 3354 identified with high-confidence, the ones differentially expressed against the control were brought to closer attention, using various bioinformatic analysis tools to generate protein networks clustered on the premise of relevant biological processes modulated by HNE. These particularly included response to stimulus and stress, detoxification of reactive oxygen species, oxidation metabolism, protein regulation and degradation, assembly, inflammation and immune response. As result of the functional proteomic analysis, several protein targets and pathways identified might be further explored and validated for biomarkers of oxidative stress and endorse the design of new bioactive compounds for the treatment of oxidative stress-related conditions such as cancer, neurodegenerative, autoimmune and cardiovascular diseases. In conclusion, a progressive high-tech label-free proteomic approach with human cells is here reported highlighting the search for novel potential beneficial effects of HNE at molecular level, emphasizing the importance of mass spectrometry-based proteomics in pre-clinical pharmaceutical research.

Keywords quantitative proteomics, mass spectrometry, functional proteomic analysis, HNE, oxidative stress, antioxidant response

5

Resumo

No presente projecto, sob o paradigma da Proteómica Quantitativa, foi seguido um método original de espectrometria de massa sem marcação e não-direccionado. O trabalho desenvolvido fornece um exemplo de uma abordagem moderna de proteómica explorativa em sistemas celulares, no actual contexto multidisciplinar da Proteómica Química como contribuinte fundamental para o desenvolvimento de fármacos. O método baseou-se na aplicação de proteómica celular do tipo shotgun, usando o espectrómetro de massa de alta-resolução OrbitrapTM FusionTM TribridTM para uma análise espectrométrica tecnologicamente avançada acoplada on-line com separação cromatográfica líquida de nano-alto desempenho, que permite a identificação e caracterização aprofundada de proteomas completos. Neste caso, estudou-se o proteoma digerido de células Caco-2 após tratamento com 4-hidroxi-2-nonenal (HNE) em gama de concentração 0 a 100 µM. HNE é uma molécula eletrofílica derivada da peroxidação lipídica de ácidos gordos poli-insaturados, que desempenha um importante papel citprotector durante o stress oxidativo antes de exercer efeitos nocivos. O objectivo passou assim por obter um perfil proteico integrativo desta matriz biológica complexa de modo a avaliar alterações proteicas e modificações pós-tradução que possam relacionar- se com o carácter benéfico do HNE no stress oxidativo e as vias celulares envolvidas na resposta antioxidante. De um total de 3354 proteínas identificadas com nível alto de confiança, aquelas que se mostraram diferencialmente expressas em comparação com os controlos foram analisadas com maior atenção, recorrendo a diversas ferramentas de análise bioinformática para gerar redes de proteínas agrupadas na premissa de processos biológicos relevantes modulados pelo HNE. Estes incluíram particularmente resposta a estímulos e stress, destoxificação de espécies oxigenadas reactivas, metabolismo de oxidação, regulação e degradação proteica, organização de nucleossomas, resposta inflamatória e imunitária. Como resultado de uma análise proteómica funcional, várias proteínas-alvo e vias de interesse identificadas poderão ser exploradas como biomarcadores de stress oxidativo e endossar o desenho de novos compostos bioactivos para o tratamento de doenças relacionadas com o stress oxidativo, por exemplo, cancro e doenças neurodegenerativas, autoimunes e cardiovasculares. Em conclusão, reporta-se aqui uma abordagem proteómica sem marcação com células humanas, progressiva e de alta tecnologia, destacando a procura de novos potenciais efeitos benéficos do HNE ao nível molecular, que enfatiza a importância da proteómica de espectrometria de massa ao nível pré-clínico de estudos farmacêuticos.

Palavras-chave proteómica quantitativa, espectrometria de massa, análise proteómica funcional, HNE, stress oxidativo, resposta antioxidante

6

TABLE OF CONTENTS

ACKNOWLEDGMENTS ...... 3 PREFACE ...... 4 ABSTRACT ...... 5 Resumo ...... 6 TABLE OF CONTENTS ...... 7 LIST OF ABBREVIATIONS AND ACRONYMS...... 9 LIST OF FIGURES AND TABLES ...... 11 INTRODUCTION ...... 13 1. Chemical Proteomics ...... 13 2. The impact of Proteomics ...... 15 2.1 Background context ...... 15 2.2 Expression vs. Structural and Functional Proteomics ...... 16 2.3 Quantitative Proteomics ...... 16 2.4 Mass spectrometry-based Proteomics: Multi-dimensional Protein Identification Technology ...... 17 2.4.1 Tandem mass spectrometry ...... 17 2.4.2 Top-down vs. Bottom-up Proteomics ...... 20 2.4.3 Methods of quantification ...... 21 2.4.4 Interpretation of the fragmentation spectra and peptide identification ...... 25 2.5 Protein annotation and function prediction ...... 29 2.5.1 Annotation methods ...... 29 2.5.2 Functional classification systems ...... 30 2.5.3 Genome and protein integration databases ...... 32 2.5.4 Limitations of annotation procedures ...... 33 3. The importance of cell systems in research ...... 34 3.1 Applications of cell culture ...... 34 3.2 Human colorectal cancer cell line Caco-2 as a system model ...... 34 3.2.1 Cell line development, characteristics and properties ...... 34 3.2.2 General maintenance protocol ...... 36 3.2.3 Caco-2 in the research field: advantages and disadvantages ...... 36 4. Case study: antioxidant expression activated by 4-hydroxy-2-nonenal ...... 38 4.1 4-hydroxy-2-nonenal: a product of lipid peroxidation ...... 38 4.2 Keap1-Nrf2 signalling pathway for an antioxidant response ...... 41 4.3 Effect of oxidative stress and Nrf2 in carcinogenesis ...... 44 5. Scope of the present project ...... 45 MATERIALS AND METHODS ...... 46

7

1. Materials ...... 46 1.1 Chemicals, Reagents and Kits ...... 46 1.2 Solutions ...... 47 1.3 Instruments ...... 48 2. Methods ...... 48 2.1 Cell culture conditions ...... 48 2.2 Cell treatment with HNE ...... 49 2.3 Cell lysis ...... 49 2.4 Protein digestion ...... 50 2.5 ZipTip procedure ...... 50 2.6 High-resolution mass spectrometry analysis ...... 51 2.7 Data processing with bioinformatic tools ...... 51 RESULTS AND DISCUSSION ...... 53 1. Caco-2 proteome modulated by HNE ...... 53 1.1 Up- and down-regulation of proteins ...... 53 1.1.1 Protein identification by Peptide Spectral Match with MaxQuant ...... 53 1.1.2 Statistical breakdown with Perseus ...... 53 1.1.3 Protein annotation and functional proteomic analysis ...... 63 1.2 Non-enzymatic post-translational modifications ...... 76 1.2.1 Protein identification by Peptide Spectral Match with Proteome Discoverer ...... 76 1.2.2 Search for HNE-adducts ...... 77 CONCLUSIONS AND FUTURE PERSPECTIVES ...... 80 REFERENCES ...... 81 APPENDICES ...... 89 Appendix A – Protein identification software workflows to perform Label-Free quantification analysis ...... 89 Appendix B – List of up- and down-regulated proteins in HNE-treated Caco-2 samples vs. control sample ...... 92 Appendix C – Statistical evidence of replicate reproducibility in HNE-treated Caco-2 samples ...... 96 Appendix D – List of post-translational modifications by HNE in HNE-treated Caco-2 samples ...... 97 Appendix E – Poster communication P080, XXVII Congresso della Divisione di Chimica Analitica, Bologna 16-20 September 2018 ...... 98

8

LIST OF ABBREVIATIONS AND ACRONYMS

2-DE: two-dimensional gel Electrophoresis EDTA: Ethylenediaminetetraacetic Acid 4-HHE or HNE: 4-hydroxy-2-hexenal EP: Electrophile 4-HNA: 4-hydroxynonanoic acid ESI: Electrospray Ionization 4-HNE: 4-hydroxy-2-nonenal ETD: Electron-Transfer Dissociation 4-ONE: 4-oxo-2-nonenal FA: Formic Acid ABPP: Activity-based Protein Profiling FBS: Fetal Bovine Serum ACN: Acetonitrile FDR: False Discovery Rate ADH: Alcohol Dehydrogenase FT-ICR: Fourier Transform Ion Cyclotron Resonance ADMET: Absorption, Distribution, Metabolism, FWHM: full width at half maximum Excretion and Toxicity Gcl: glutamate-cysteine ligase AIM: Antioxidant Inflammation Modulators Gly or G: Glycine AKR: Aldo-Keto Reductase GO: AMBIC: Ammonium bicarbonate GPx: Glutathione Peroxidase AQUA: Isotopic-labeled Absolute Quantification GSH: Glutathione ARE: Antioxidant Response Element GST: Glutathione-S-Transferase Arg or R: Arginine HCA: Hierarchical Cluster Analysis AUC: Area Under the Curve HCD: Higher-energy Collisional Dissociation BSA: Bovine Serum Albumin HEPES: 4-(2-hydroxyethyl)-1- bZip: Basic Leucine Zipper piperazineethanesulfonic Acid Caco-2: Human Epithelial Colorectal His or H: Histidine Adenocarcinoma cell line HO-1: Heme Oxygenase-1 CDDO-Me: bardoloxone methyl HPLC: High-Performance Liquid Chromatography CID: Collision-Induced Dissociation HSE: Heat Shock Element CNC: Cap-N-Collar HTS: High-Throughput Screening CRC: Colorectal Cancer ICAT: Isotope-coded Affinity Tags Cul3: Cullin-3 iTRAQ: Isobaric Tags for Relative and Absolute Cys or C: Cysteine Quantification DDA: Data-Dependent Acquisition Keap1: Kelch-like ECH- associated protein 1 DHN: 1,4-dihydroxy-2-nonene KEGG: Kyoto Encyclopedia of and Genomes DIA: Data-Independent Acquisition LC: Liquid Chromatography DMA: dimethylacetal LEGO: Logical Extension of the Gene Ontology DMEM: Dulbecco’s Modified Eagle Medium LFQ: Label-Free Quantification DMF: dimethylfumarates LPO: Lipid Peroxidation DTT: 1,4-Dithiothreitol Lys or K: Lysine ECD: Electron-Capture Dissociation m/z: Ratio of mass over charge

9

MALDI: Matrix-Assisted Laser Desorption or PSM: Peptide-Spectral Match Ionization PTM: Post-Translational Modifications MDA: Malondyaldehyde PUFA: Polyunsaturated Fatty Acids MOA: Mode of Action Q: Quadrupole MRM: Multiple Reaction Monitoring ROS: Reactive Oxygen Species Mrp: Multidrug Resistance-associated protein RPMIM: Roswell Park Memorial Institute Medium MS: Mass Spectrometry SID: Surface-Induced Dissociation MS/MS or MSn: Tandem Mass Spectrometry SILAC: Stable Isotope Labeling by Amino Acids in Cell MudPit: Multidimensional Protein Identification Culture Technology SRM: Selected Reaction Monitoring NMR: Nuclear Magnetic Resonance STRING: Search Tool for the Retrieval of Interacting NQO-1: NAD(P)H:Quinone Oxidoreductase-1 Genes/Proteins Nrf2: Nuclear Factor (erythroid-derived 2)-like 2 TMT: Tandem Mass Tags PANTHER: Protein Analysis Through Evolutionary TOF: Time-of-Flight Relationships TR-1: Thioredoxin-1 PBS: Phosphate Buffered Saline Tris: Tris(hydroxymethyl)aminomethane PCA: Principal Component Analysis TXNRD1: Thioredoxin Reductase-1 PCC: Pearson Correlation Coefficient UniProt: Universal Protein Resource PD: Pharmacodynamics UniProtKB: UniProt Knowledgebase PDB: UniParc: UniProt Archive Pen-Strep: Penicillin-streptomycin UniRef: UniProt Reference Clusters PK: Pharmacokinetics UPR: Ubiquitin Proteosome System Prx-1: Peroxiredoxin-1 XIC: Extracted-Ion Chromatogram PRM: Paralell Reaction Monitoring

10

LIST OF FIGURES AND TABLES

Figure 1 – Applications of Proteomics at different stages of Drug Discovery...... 14 Figure 2 – From genome to proteome: an increase in complexity and dynamics...... 15 Figure 3 – Orbitrap Fusion™ Tribrid™ Mass Spectrometer...... 19 Figure 4 – Orbitrap Fusion™ ion path...... 19 Figure 5 – Difference between protein-based top-down and peptide-based bottom-up proteomics...... 21 Figure 6 – Stages of incorporation of stable isotope labels in typical labeling workflows in comparison to label-free in quantitative proteomics and respective mass spectrometric data output...... 24 Figure 7 – Peptide fragmentation ion series and correspondent MS/MS fragmentation spectra...... 26 Figure 8 – Database searching algorithms...... 27 Figure 9 – Schematic of the peptide scoring algorithm in Andromeda...... 28 Figure 10 – Similarity between differentiated Caco-2 cells and feature-specific enterocytes...... 35 Figure 11 – Reasons for failure in Drug Discovery through the years...... 37 Figure 12 – Skeletal structural formulas of some end-products of lipid peroxidation...... 38 Figure 13 – Overview of the relationship between HNE metabolism and toxicity...... 39 Figure 14 – Reaction of HNE with cysteinyl, histidyl and lysyl residues via 1,4-Michael addition, and with lysil residues via Schiff base mechanism with pentylpyrrole rearrangement...... 40 Figure 15 – Electrophile-cell interactions: dose-response curves...... 41 Figure 16 – Activation of the Nrf2/ARE signalling pathway by HNE...... 42 Figure 17 – Organization and functions of structure domains in Nrf2...... 42 Figure 18 – Organization and functions of structure domains in Keap1...... 43 Figure 19 – Multi-scatter plot representation (with Pearson Correlation Coefficient) of LFQ intensity amongst replicates for the identified proteins on the Caco-2 sample treated with HNE in concentration 10 µM...... 54 Figure 20 – Volcano plot representation of the protein datasets on the Caco-2 samples treated with HNE in concentration 0.1 µM, 1 µM, 10 µM and 100 µM compared to the control...... 55 Figure 21 – Principal Component Analysis based on LFQ intensity of the identified proteins on the Caco-2 sample replicates: control and treated with HNE in concentration 0.1 µM, 1 µM, 10 µM and 100 µM...... 59 Figure 22 – Principal Component Analysis based on the identified proteins on the Caco-2 sample replicates: control and treated with HNE in concentration 0.1 µM, 1 µM, 10 µM and 100 µM...... 60 Figure 23 – Hierarchical Clustering Analysis for clustering the technical and biological replicates, and the identified proteins on the Caco-2 samples: control and treated with HNE in concentration 0.1 µM, 1 µM, 10 µM and 100 µM. Dendrograms and heat map obtained with the dissimilarity measures Euclidean distance and Average-Linkage...... 62 Figure 24 – Expression pattern across the conditions (replicates) on the Caco-2 samples: control and treated with HNE in concentration 0.1 µM, 1 µM, 10 µM and 100 µM, for two selected protein clusters...... 62 Figure 25 – STRING network diagram with minimum confidence of 0.4 (medium) for the up- and down-regulated proteins on the Caco-2 sample treated with HNE in concentration 0.1 µM – 10 µM combined...... 64 Figure 26 – STRING network diagram with minimum confidence of 0.4 (medium) for the up-regulated proteins on the Caco-2 sample treated with HNE in concentration 100 µM...... 64 Figure 27 – STRING network diagram with minimum confidence of 0.4 (medium) for the down-regulated proteins on the Caco-2 sample treated with HNE in concentration 100 µM...... 65 Figure 28 – PANTHER over-representation test with PANTHER GO-Slim biological process annotation for the list of up-regulated proteins on the Caco-2 sample treated with HNE in concentration 100 µM...... 68 Figure 29 – GOrilla over-representation test with GO biological process annotation for the list of up-regulated proteins on the Caco-2 sample treated with HNE in concentration 100 µM...... 69

11

Figure 30 – ClueGO over-representation test with GO annotation biological process for the list of up-regulated proteins on the Caco-2 sample treated with HNE in concentration 100 µM...... 71 Figure 31 – ClueGO over-representation test with GO annotation biological process for the list of down-regulated proteins on the Caco-2 sample treated with HNE in concentration 100 µM...... 71 Figure 32 – BiNGO over-representation test with GO annotation for biological process for the list of down-regulated proteins on the Caco-2 sample treated with HNE in concentration 100 µM...... 73 Figure 33 – Reactome over-representation test for the list of down-regulated proteins on the Caco-2 sample treated with HNE in concentration 100 µM...... 74 Figure 34 – Top-pathways of the Reactome over-representation test for the list of down-regulated proteins on the sample treated with HNE in concentration 100 µM, according to Ensembl resource...... 74 Figure 35 – MSn spectra of HNE-adducted peptides in the Caco-2 samples treated with HNE...... 78

Figure A1 – Workflow implemented in the Perseus framework to process the results of protein identification by MaxQuant Label-Free analysis of MS data generated by Orbitrap Fusion™ Tribrid™ mass spectrometer...... 89 Figure A2 – Default processing workflow implemented in Proteome Discoverer for Label-Free analysis of MS data generated by Orbitrap Fusion™ Tribrid™ mass spectrometer...... 90 Figure A3 – Default consensus workflow implemented in Proteome Discoverer for Label-Free analysis of MS data generated by Orbitrap Fusion™ Tribrid™ mass spectrometer...... 90 Figure C1 – Multi-scatter plot representation (with Pearson Correlation Coefficient) of LFQ intensity amongst replicates for the identified proteins on the Caco-2 samples: control and treated with HNE in concentration 0.1 µM, 1 µM, 10 µM and 100 µM. ……………………………………..96

Table 1 – Properties of the Caco-2 cell line...... 35 Table 2 – Absorbance, protein concentration and volume of lysate corresponding to 10 µg of protein on the Caco-2 samples: control and treated with HNE in concentration 0.1 µM, 1 µM, 10 µM and 100 µM, for the biological replicates A and B...... 50 Table 3 – Relevant outliers list for up-regulated proteins on the Caco-2 samples treated with HNE in concentration 0.1 µM, 1 µM, 10 µM and 100 µM...... 55 Table 4 – Relevant outliers list for down-regulated proteins on the Caco-2 samples treated with HNE in concentration 0.1 µM, 1 µM, 10 µM and 100 µM...... 57 Table 5 – GO-terms and KEGG pathways enrichment analysis for the list of up-regulated proteins on the Caco-2 sample treated with HNE in concentration 100 µM...... 66 Table 6 – Relevant HNE-adducted peptides in the Caco-2 samples treated with HNE in concentration 0.1 µM and 100 µM...... 77

Table A1 – Protein groups, proteins, peptide groups and peptide-spectral matches count on the Caco-2 samples: control and treated with HNE in concentration 0.1 µM, 1 µM, 10 µM and 100 µM, after applying filters within Proteome Discoverer...... 91 Table B1 – Up- and down-regulated proteins on the Caco-2 sample treated with HNE in concentration 0.1 µM. ………………………………………… 92 Table B2 – Up- and down-regulated proteins on the Caco-2 sample treated with HNE in concentration 1 µM. ………………………………………….. 92 Table B3 – Up- and down-regulated proteins on the Caco-2 sample treated with HNE in concentration 10 µM. ………………………………………… 93 Table B4 – Up- and down-regulated proteins on the Caco-2 sample treated with HNE in concentration 100 µM. ……………………………………….93 Table D1 – HNE-modified proteins on the Caco-2 control sample. ………………………………………………………………………………………………………………. 97 Table D2 – HNE-modified proteins on the Caco-2 sample treated with HNE in concentration 0.1 µM. ……………………………………………………….. 97 Table D3 – HNE-modified proteins on the Caco-2 sample treated with HNE in concentration 100 µM. ……………………………………………………….97

12

INTRODUCTION

1. Chemical Proteomics

Focusing a variety of science disciplines, such as biochemistry, proteomics and informatics, towards an analytical application for medicinal chemistry studies, Chemical Proteomics or Chemoproteomics, defined as the large-scale study of proteins using chemical methods, has recently emerged as a powerful pre-clinical integrated research component for a rapid comprehensive profiling of novel drug targets and biomarkers of disease. (1–3) The main goal is to catalog possible protein binding partners of bioactive small molecules with unknow modes of action in live cells and then apply it in the drug discovery process, in order to, not only understand the produced effects of the drug candidates and to enlighten the cellular mechanisms resulting in the observed phenotype, but also to search for alternative mechanisms linked to previously unrecognized therapeutic potential. (4–6) In this sense, Chemical proteomics is applicable to the functional study of proteomes and post- translational modifications (PTM), of special interest considering that many diseases are associated with changes in protein-protein interactions and PTM that can alter protein activity. This being, the field is ever more being used to assess variations in the reactive proteome and in protein function at different stages of disease progression, aiming at the identification of new biomarkers and therapeutic targets. (1,2,7) Adding to this, the efforts in Chemical Proteomics attempt to prioritize candidate targets, streamline the drug development pipeline and reduce the number of failures downstream. (4)

From the advantage taken of the advances in mass spectrometry-based proteomics methods, having extended the scope of target identification to the whole proteome level, robust and high-throughput workflows for exploring drug-target-phenotype relationships are now available. (7) One of the key developments is that measurements have increasingly become more quantitative, thus enabling a more accurate and reliable assessment of protein changes. (1) For both target-based and phenotypic-based (global) workflows, illustrated in figure 1, proteomics empowers different steps of the process based on its applications to: characterize direct or indirect drug-target interactions for target deconvolution and selectivity/drug affinity profiling; elucidate the mechanism of action by which a drug exerts its pharmacological effect, for target characterization and validation; and identify biomarkers useful for monitoring the result of target modulation, i.e. the phenotype, in vivo. (5,8) Identification of specific protein targets of drugs is a critical step in unraveling their mode of action (MOA), thereby enhancing the understanding of their pharmacodynamics (PD) and allowing the refining of their future clinical applications. (3)

13

Figure 1 – Applications of Proteomics at different stages of Drug Discovery. (From Schirle et. al (2012) (2))

With targeted chemoproteomics, the strategy relies on the use of chemical probes specifically engineered to capture and identify protein targets, focusing on its design and on the analytical strategy used to detect and quantify the captured proteins. The two types of bait probes are: noncovalent tagged or immobilized tool compounds, like a biotin tag; and covalent active site-labelling probes, denoted as activity-based protein profiling (ABPP). (2–4,6,8) The main limits with such approach involve the identification of true protein targets out of a large number of hits since unspecific interactions are nearly impossible to disregard and lead to false positives. Other than developing more elaborate and improved probe strategies, namely with ABPP, this non-specific binding problem can be overcome by quantitative proteomics. (3)

Global protein profiling, which was the focus for the present project, aims at a wide-ranging analysis of protein expression levels in a cell system, and has become a worthwhile complement of gene expression profiling to study MOA of drugs. Since they can often be determined by comparing the phenotype of drug-treated cells with untreated cells, quantitative proteomics is ideal for detecting changes in protein expression or in post- translational modifications. (2,5,8) Monitoring (co)regulation of proteins implicated in a certain cellular process upon drug treatment can functionally link a drug and/or target to induction or inhibition of those mechanisms. This approach proves to be valuable when investigating drugs that affect the stability of proteins such as protease inhibitors or other enzymes involved in proteolytic pathways, particularly the ubiquitin system. (5) Moreover, global proteome changes can reveal complex relationships in signaling cascades and establish causal links between apparently unrelated biochemical events. (8) In typical approaches for deep proteome analysis of single-cells, proteins are extracted from the sample after treatment, digested or not into peptides, and analyzed by single-load separation with liquid chromatography (LC) coupled to tandem mass spectrometry (MS/MS). For advanced PTM profiling, a PTM- specific enrichment step is usually included. (5) Nevertheless, optimized strategies for the generation of comprehensive human proteomes have shown that a multi-load separation allows for greater peak capacity and shorter gradients, not only to increase the speed, dynamic range and resolving power of proteome analysis, but also that of PTM without requiring enrichment. (9,10) Consequently, such workflows are nowadays allowing the identification of 5000 to 10000 proteins and far-reaching PTM sites in a single-cell line, due to the tremendous improvements in LC-MS/MS technology (8), which will be further explored in the next section.

14

2. The impact of Proteomics 2.1 Background context

For the past 15 years, after the Project was completed in 2003, proteomic studies have taken great advancement based on the development of modern equipment and innovative techniques. The information obtained from genome sequencing, under the improvement of the methods to perform it, namely with Next- and Third-Generation Sequencing, has allowed a better understanding of protein networks and cellular function at a systems-wide level, and has prompted new approaches for the comprehension of cellular proteomes and the profiling of protein changes under different cellular states. (11,12) Therefore, genomic, transcriptomic and proteomic information has been, and will continue to be, essential in identifying inherited disorders, characterizing mutations that drive cancer progression and tracking disease outbreaks. (13)

Proteomic studies, in comparison with Genomics and Transcriptomics, render more informative data regarding living cells systems, as they look at the variability associated with changes of protein expression during the cell cycle and changes derived from adaptive response processes to intracellular or extracellular stimuli; while the genome is static, the proteome presents dynamic properties. Moreover, the occurrence of alternative splicing and post-translational modifications, as shown in figure 2, dictates that monitoring protein expression cannot be equated to mRNA microarray technology, because not only variations in protein abundance and mRNA abundance are not necessarily proportional, but the generated products present great diversity, complexity and heterogeneity. Post-translational modifications, such as acetylation, hydroxylation, phosphorylation and glycosylation, modulate protein activity, stability, localization, and function, playing essential roles in many critical cell signaling events, in both healthy and disease states, therefore intrinsically increasing the difficulty of proteome research. The determination of PTM is one of the main goals of proteomics and makes it a highly demanding task to achieve extensive analysis in a comprehensive, accurate and quantitative manner. (14–17)

Figure 2 – From genome to proteome: an increase in complexity and dynamics. The diversity of protein products originating from a single gene is mainly due to alternative splicing of transcripts and post-translational modification of proteins. (From Jensen (2004) (15))

15

2.2 Expression vs. Structural and Functional Proteomics

There are two distinct fundamental approaches in proteomics. The original concept, Expression Proteomics, refers to the quantitative analysis of protein expression levels at genomic scale. Thus, it attempts to catalog the expression of all proteins present in cells, tissues or organisms, and with this give a differential analysis of biological systems upon reaction to external stimuli or under specific disease conditions, in order to identify proteins that are up- or down-regulated, underlying the possibility of finding therapeutic targets or diagnostic biomarkers. (16) However, this strategy presents several limitations, including the overall large range of protein expression in biological systems, with regulatory proteins being frequently hidden by abundant proteins, low-copy proteins being hard to detect and quantify, and with difficulty in separating proteins with extreme isoelectric point or molecular weight. Also to consider is the fact that expression-based proteomics only offers correlative relationships, requiring extensive validation through biological and technical replicates. (14)

The other approach, Structural and Functional Proteomics, is fundamentally and strategically different, because it denotes the analysis of structure, function, interactions and sub-cellular localization of proteins at genomic scale. On a first basis, large-scale mapping of protein structures by using experimental techniques like crystallization and X-ray crystallography, and also in silico molecular modelling, is an important research step due to the direct correlation between the three-dimensional structure of a protein and its molecular function. (18) Nevertheless, understanding protein functions within the cell is more importantly dependent on the identification of interacting protein partners, since the association with specific protein complexes involved in certain molecular mechanisms is strongly suggestive of the biological function of the protein, particularly if it is an unknown protein. Moreover, the identification of protein partners will lead to a more detailed description and recognition of cellular mechanisms at the molecular level. Under this fundament, Functional Proteomics aims to elucidate the biological aspects of protein-protein interactions, thus providing an input towards the characterization of cellular signaling pathways. (12,16) The technologies used include those: i) based on affinity purification procedures and physical measurement of the associated protein partners from physiological fluids; ii) relying on pairwise testing of two partners, based on biochemical automation or chip technologies; iii) based on genetic readout systems such as the various two- hybrid systems and also phage-display technologies; and iv) based on computational prediction methods, some of which are based on known 3-D structures and binding motifs. (14)

2.3 Quantitative Proteomics

The overall goal of protein quantification in systems biology is to obtain snapshots of protein concentration associated with different states, to render an intertwined functional network of expression. Instead of providing just the list of proteins identified in a certain sample (qualitative proteomics), quantitative proteomics yields

16 information about the amount of proteins present, and about the physiological differences between two biological states such as treated/non-treated or healthy/disease. Therefore, quantification can be absolute or relative, respectively. Relative quantitative proteomics reports the change of protein amount between two or more samples, and although several techniques are widely used for it, being the most common quantification approach nowadays, the ultimate purpose of quantitative proteomics is the absolute measurement of protein abundance, because it provides a more precise description of molecular events in biological processes. (19–21)

Quantitative proteomics was traditionally associated with two-dimensional gel electrophoresis (2-DE) techniques and subsequent software-assisted direct-read gel analyses for picking lists, followed by mass spectrometry for the downstream protein identification. (12) However, the limitations associated with running gels, like detection resolution and sensitivity and protein solubility, impelled the development of a better and more sensitive quantification process, for which quantitative mass spectrometry (MS) technology has taken charge in proteome studies. More accurate and reliable results have been achieved within, by the large use of stable-isotope-labelling or label-free methodologies. (3,12)

2.4 Mass spectrometry-based Proteomics: Multi-dimensional Protein Identification Technology

Mass spectrometry is a powerful tool for an accurate and unbiased mass determination and characterization of proteins, having great potential for understanding disease mechanisms at systems level since proteins are the effector molecules for all cell functions, being directly responsible for determining cell phenotype. (22) Hence, a variety of methods and instrumentations have been developed for its many uses. Two manners of proceeding with Multidimensional Protein Identification Technology (MudPIT) (23) by mass spectrometry are Top-Down, which analyzes intact proteins, and Bottom-Up, which analyzes peptides in proteolytic digests. (12)

2.4.1 Tandem mass spectrometry

Mass spectrometers mainly consist of an ion source and optics, a mass analyzer, and the data processing electronics. (10) A key development in MS instrumentation in its early days and presently bench-top upgraded was the introduction of soft ionization methods that allowed for proteins and peptides, which are polar, nonvolatile, and thermally unstable species, to transfer into the gas phase without extensive degradation to be further analyzed. (10,24) The two primary procedures used for the ionization of protein/peptides are Electrospray Ionization (ESI) (25,26) and Matrix-Assisted Laser Desorption/Ionization (MALDI) (27), whose character on peptide detection in large-scale proteomics has proven to be complementary. (24)

17

To determine the mass-to-charge ratio (m/z) of the ions and separate them accordingly, these ion sources are conjugated with mass analyzers: either Scanning and Ion-Beam mass spectrometers, such as Time-of-Flight (TOF) and Quadrupole (Q); or Ion Trapping mass spectrometers. The latter include the Penning trap, under the designated Fourier Transform Ion Cyclotron Resonance (FT-ICR) mass spectrometry, which is based on the cyclotron frequency of the ions in a fixed magnetic field; the Paul trap, which uses dynamic electric fields switched at radio-frequencies rates; and Orbitrap, which works based on electrostatic interactions, then using the Fourier transform of the frequency signal to obtain the converted mass spectrum. (10,28–30) The scanning mass analyzers like TOF are usually interfaced with MALDI to perform pulsed analysis, whereas the ion-beam and trapping instruments are frequently coupled to a continuous ESI source. (10) Each mass analyzer has unique properties, such as mass range, analysis speed, resolution, sensitivity, ion transmission, and dynamic range. For instances, with FT-ICR-MS it is possible to achieve higher levels of mass accuracy than with other types of mass spectrometers, since a superconducting magnet is considerably more stable than, for example, radio-frequency voltage. Orbitrap differs from FT-ICR by the absence of a magnetic field and hence has a significantly slower decrease of resolving power with increasing m/z. (29) Hybrid mass spectrometers, such as the Orbitrap Fusion™ Tribrid™ that was used in this project, combine more than one mass analyzer to answer specific needs during analysis. (10)

Fragmentation of ions of a particular m/z ratio (precursor ions) is then accomplished by collision-induced dissociation (CID) methods, namely higher-energy collisional dissociation (HCD); electron-capture and electron- transfer dissociation (ECD and ETD) methods; photodissociation; or surface-induced dissociation (SID), to give different product ions for a following stage of separation and detection. In this way, designated tandem mass spectrometry (MS/MS or MSn), it is possible to obtain, not only the masses of the proteins or peptides given by the full-scan MS, but also the primary sequence information yielded by the fragmentation scans. Different MS/MS methods vary in how ions are selected and measured in the MS2 scan, thus data can be obtained in several modes: data-dependent acquisition (DDA), data-independent acquisition (DIA), multiple reaction monitoring (MRM), parallel reaction monitoring (PRM), or selected reaction monitoring (SRM). (31–33) The several fragments cause a pattern in the mass spectrum that accounts for a short, easily identifiable series of sequence ions, which yields a partial sequence. This peptide sequence tag is a highly specific identifier of the peptide, and it ultimately leads to protein sequencing, as the multiple sequence tags obtained can be used to identify peptides in a protein database: by comparing the masses of the proteolytic peptides, or their tandem mass spectra, with those predicted from a sequence database or annotated as fragmentation patterns in a peptide spectral library, peptides can be identified and assembled into a protein identification. (34–36)

Orbitrap Fusion™ Tribrid™: a powerful novel combination

As stated by Thermo Fischer Scientific®, “this instrument combines the best of quadrupole, linear ion trap and Orbitrap mass analysis in a revolutionary tri-architecture to provide unprecedented depth of analysis and ease of use.” (37)

18

Orbitrap Fusion Tribrid, depicted in figure 3, can analyze samples in a broad multi-purpose range, from in- depth discovery experiments to characterization of complex PTM, or purely comprehensive qualitative/quantitative workflows, including non-targeted ones. (38) Using a quadrupole mass filter for precursor ion selection allows the ion trap and Orbitrap mass analyzers to operate in parallel, as shown in figure 4, for excellent sensitivity and selectivity, thus increasing the amount of high-quality data acquired in the same time-frame when comparing to other instruments. The availability of multiple fragmentation techniques, since precursor fragmentation can take place in the ion-routing multipole (HCD), in the ion trap (CID, ETD) or both (EThCD), followed by fragment ion detection by either the linear ion trap or Orbitrap mass analyzers, at any fragmentation stage of MSn, maximizes flexibility, versatility and performance. Next-generation ion sources and ion optics increase system ease of operation and robustness. (37,38) An upgraded Thermo Scientific™ Xcalibur™ processing and instrument control software complements the equipment, adding powerfulness to the implemented methods. As an exclusive tune editor technology, Dynamic Scan Management allows for intelligent and efficient scheduling and prioritization of the scan events. It also enables selection, sorting, and optimum routing of precursors to different fragmentation modes and analyzers based on user-selected parameters, including m/z, intensity, and charge. (38)

Figure 3 – Orbitrap Fusion™ Tribrid™ Mass Spectrometer. (From Thermo Fisher Scientific® (37))

Figure 4 – Orbitrap Fusion™ ion path. (From Thermo Fisher Scientific® - Product Specifications (38))

Hence, elucidative structural information from metabolites, glycans and other small molecules, and PTM and sequence polymorphisms can be attained more thoroughly, accurately and with greater speed. (37,38)

19

Orbitrap Fusion offers simultaneously an ultrahigh peak resolution and a fast peak scan rate, giving it a major advantage regarding the ability to confidently identify isobaric species and to determine the molecular weight of lower-abundance proteins, such as transcription factors. In this context, it is also ideal to perform analyses with monoclonal antibodies and other intact proteins, because a quick and accurate assessment of product quality and safety—including sequence integrity, glycan heterogeneity, and purity – essential to the development of new biotherapeutics, can be guaranteed throughout each step. (37,38) In mass spectrometry, a high number of scans per peak per unit of time (Scan Rate) is required to perform correct peak picking and mass spectral deconvolution of full-scan mass spectra. On the other hand, mass resolution describes the ability of separating two narrow mass spectral peaks; the higher, the better the separation. It is characterized by the parameter mass resolving power, defined as R=m/dm, where m designates the mass over charge ratio and dm the peak width necessary for separation at that mass. In the peak width definition, the minimum peak separation is measured at a specified fraction of the peak height, for instances 50% which unveils the full width at half maximum (FWHM) concept as a unit of resolution. (39) When developing hybrid mass spectrometers, one of the goals is to implement operation at high resolving power (R>50000 FWHM) with fast scans (scan rate > 10 spec/sec), to circumvent the fact that the highest resolving power, as it happens with FT-ICR, doesn’t necessarily provide a fast analysis. (39) With Orbitrap Fusion, resolution goes up to 500000 at m/z 200 and scan rate goes up to 20 Hz, giving unmatched spectral quality. (38)

Another relevant parameter in protein quantification is mass accuracy, which improves the analytical confidence of the results. With the optional Thermo Scientific™ EASY-IC™ Ion Source providing real-time internal calibration, Orbitrap Fusion can achieve a confidence-building sub-1-ppm mass accuracy in every scan. With external calibration, it stands sub-3-ppm. (38) Overall, Orbitrap Fusion steps up to respond to the most challenging biological research objectives, being able to properly process low-abundance and high-complexity samples in proteomics, metabolomics, glycomics, lipidomics, and similar applications, providing identification of compounds to its fullest extent. (37)

2.4.2 Top-down vs. Bottom-up Proteomics

As illustrated in figure 5, the main advantage of Top-down analysis, besides the fact that sample preparation is greatly simplified and the mixture complexity is reduced, is the ability to detect degradation products, sequence variants (e.g. mutations, polymorphisms, alternatively spliced isoforms) and combinations of post- translational modifications simultaneously in one spectrum. (40) Overall, a higher sequence coverage of target proteins is obtained, which reduces the ambiguities of the peptide-to-protein mapping, allowing for the identification of specific protein isoforms. (41,42) However, it faces limitations at large scale due to the lack of intact protein fractionation methods integrated with tandem mass spectrometry that can match the resolution of two-dimensional gel electrophoresis, considering that an effective fractionation is critical for sample handling before MS-based proteomics. (41)

20

Figure 5 – Difference between protein-based top-down and peptide-based bottom-up proteomics. (From Gregorich et. al (2014) (22))

On the other hand, the more conventional Bottom-up approach, involving digestion of the protein extract, usually with trypsin or chymotrypsin, is systematized with high throughput and automation, rendering better front-end separation of tryptic peptides compared to proteins, and thus higher sensitivity. (40) In this scenario, the extract may first be purified by gel electrophoresis resulting in, after in-gel digestion, one or a few proteins in each proteolytic digest; or alternatively, an in-solution digestion is performed directly, followed by one- or multi-dimension separation of the peptides by high performance liquid chromatography (HPLC) coupled to mass spectrometry, a technique known as Shotgun Proteomics. Nevertheless, a Bottom-up strategy presents limitations in the analysis of protein modifications, mainly because of an only partial sequence coverage and a loss of connections among modifications on different portions of a protein. Adding to this, there is loss of specific information concerning the individual proteins, since only a small and variable fraction of the digested peptides is recovered, and it is inevitable to find redundant peptide sequences, leading to origin ambiguity. (10,40)

2.4.3 Methods of quantification

2.4.3.1 Stable-isotope label-based quantification

Isotope labels can be applied metabolically, chemically or enzymatically, introducing a known mass difference between peptides from different experimental conditions. The differently labeled samples are then combined and subjected to quantitative MS. (19) An extensive overview and comparison of labelling methods in MS-based quantitative proteomics is presented by Sap and Demmers (2012) (43), here briefly addressed.

21

Regarding metabolic labeling, SILAC (stable isotope labeling by amino acids in cell culture) and heavy 15N- labeling are two of the best studied relative quantification methods. Labeling single residues like arginine, lysine, leucine and isoleucine, or replacing all 14N atoms for their heavy 15N variant during cell growth, reduces variance, giving reproducibility, and ensures accuracy in the quantification, since the cells are labeled early in the work- flow. (20) The relative abundance ratio of peptides is experimentally measured by comparing heavy/light peptide pairs, since a heavy-labeled peptide produces a specific mass shift compared to its light counterpart visible in the mass spectrum, and then protein levels are inferred from statistical evaluation of the peptide ratios by bioinformatic software tools. (3,10) Chemical derivatization techniques include ICAT (isotope-coded affinity tags), used for the labeling of free cysteine residues; iTRAQ (isobaric tags for relative and absolute quantification), used for the labeling of free amines; and TMT (tandem mass tags), that are fused to proteins or peptides after proteolytic digestion. (3,10,20) With enzymatic labeling, proteins are digested in the presence of 18O-labeled water so that the heavy atom is incorporated at the carboxyl end of each peptide. (44) Each of these relative approaches has its own advantages and limitations. In short review, with SILAC, variation is largely avoided due to the well-controlled efficiency of protein labeling in the cells and the fact that treatment and control samples are pooled at the early stage of the experiment. Nevertheless, it is time consuming and might not be suitable for certain types of cells, besides being difficult or practically impossible to apply it to tissue and body fluid samples, which are of particular relevance to biomedical research. Therefore, a great advantage of chemical labeling methods is that they are not limited by sample types. Adding to this, iTRAQ in comparison to SILAC allows for more samples to be simultaneously analyzed, making it feasible to include biological replicates in the same LC-MS/MS single run. However, they require complex sample handling and preparation which forces experimental errors and incurs in a less accurate quantification when mixing and comparing samples. (3,21)

A powerful method to determine the exact protein or peptide concentration is the use of isotopic-labeled absolute quantification (AQUA) peptides, which serve as heavy-labeled internal standards to mimic native peptides formed by proteolysis. For this, synthetic solid-phase peptides (prepared with covalent modifications) of a known concentration are added to the protein extracts of cell lysates during proteolytic in-gel digestion, and like with SILAC, their stable isotope labeled amino acids containing 13C and 15N give a predefined mass shift. (20) The retention time and fragmentation pattern of the native peptide formed by trypsinization is identical to that of the AQUA internal standard peptide, which is determined first-hand by LC–MS/MS using SRM for validation. In this method, specific ions are selected in the first stage of a tandem mass spectrometer and then ion products of the fragmentation reaction of the precursor ions are selected in the second stage to be monitored specifically in rapid succession. Because an absolute amount of the AQUA peptide is added, the area under the curve (AUC) ratio can be used to determine the exact concentration levels of a protein or modified protein of interest in the original cell lysate. (45) This strategy presents great advantages since LC–SRM results in a highly specific and sensitive measurement of both internal standard and analyte directly from complex peptide mixtures, allied to the fact that the internal

22 standard is present during in-gel digestion of the native peptides, thus reducing the variability and loss of efficiency due to the procedures of gel-extraction, sample handling and introduction into the LC-MS system. (45) Nevertheless, synthesizing AQUA peptides is expensive, and so, only suitable for the quantification of a few peptides of interest. Moreover, it is necessary to have a priori information about those peptides, such as elution time, m/z ratio, charge state and the best fragmentation method, to quantify them by AQUA. (20)

2.4.3.2 Label-free quantification

Label-free quantification (LFQ) presents a high-throughput method in the field of quantitative proteomics that compared to the stable isotope-labeling strategies does not require mixing steps, as seen in figure 6. It works based on the peak intensity of the peptide ions, by using the extracted-ion chromatogram (XIC), for which one or more m/z values representing one or more analytes of interest are isolated from the entire data set for a chromatographic run; or based on the identification frequency of a particular protein by spectral count to obtain quantitative data. (20,21) In principle, this approach allows for the quantification of any protein from which a peptide is unambiguously identified, whereas with absolute quantification by stable-isotope labeling it is only possible to quantify the proteins with corresponding isotope standards. This implies that label-free methods are suitable for large scale analysis, which complements the fact that they can be applied to all types of samples. However, they provide less accurate quantitative values, due to run-to-run variations within LC-MS/MS, error-prone injection of samples, and the ion-suppression effect of co-detected ions, since each sample is processed separately in individual runs. (3,21,43)

The measurement of AUC signal intensity of the peptide precursor ion is typically accomplished by integration of the average of intensities of the three most intense ions of any given peptide over its chromatographic elution profile. Change in protein abundance is calculated through comparison of the integrated signal response of all individual unique peptides matching the respective protein among different analyses. (20,21) Concerns with LC signal resolution can arise when peptide signals are spread over a large retention time range causing overlap with co-eluting peptides, or when multiple signals are shown for the same peptide. (43) Spectral counting, a factor of identification frequency, is based on determining and comparing the number of tandem fragment-ion spectra of peptides between different samples without considering physicochemical peptide properties. (20,21) It has shown the highest correlation with relative protein abundance, thus suggesting it to be the best index for relative quantification. (46) The most significant disadvantage of spectrum counting is that it behaves very poorly with proteins of low abundance and few spectra. (43)

23

Internal standard Label-free Metabolic labelling Chemical or enzymatic labelling (e.g. AQUA) protocols

Figure 6 – Stages of incorporation of stable isotope labels in typical labeling workflows in comparison to label-free in quantitative proteomics and respective mass spectrometric data output. The light and dark grey diamonds represent the two protein samples to be differentially labeled or compared. (Adapted from Sap and Demmers (2012) (43))

Despite it all, an accurate and robust proteome-wide quantification with label-free approaches has still been a challenge, prompting the development of novel procedures for intensity determination and normalization. The major two problems presented are: with pre-fractionation, separate sample processing inevitably introduces differences in the fractions to be compared; and the selection of the peptide signals that should contribute to the optimal determination of the protein signal across the samples. (47) In this context, MaxLFQ method, which has proved to be fully compatible with any peptide or protein separation prior to MS/MS analysis, and with standard statistical analysis workflows, was implemented in MaxQuant platform for proteomics. Based on globally minimizing inter-sample differences for each observed peptide, MaxLFQ introduces ‘delayed normalization’. Because the total peptide ion signals are spread over the several runs on LC-MS/MS, it is not possible to sum up the peptide ion signals before knowing the normalization coefficients for each fraction. By delaying normalization, summing up intensities with normalization factors as free variables, the quantities of the peptides are determined by a global optimization procedure based on achieving the least overall proteome variation. (47) Furthermore, this strategy works under a novel approach for protein quantification, extracting the maximum ratio information from peptide signals in arbitrary numbers of samples to achieve the highest possible accuracy.

24

Ratios deriving from individual peptide signals are taken into account rather than intensities summed up because the XIC ratios for each peptide are already a measurement of the protein ratio. (47) Another thing to consider is that, due to stochastic MS/MS sequencing and differences in protein abundances across samples, peptide identification can be missing in specific samples. A way to obtain a signal for each peptide in every sample is to integrate the missing peptide intensities over the mass retention time using the integration boundaries from the samples where the peptide has been identified. (47)

2.4.4 Interpretation of the fragmentation spectra and peptide identification

2.4.4.1 Peptide fragmentation and de novo sequencing

Data-dependent acquisition is one of the focal techniques for protein identification in shotgun proteomics. In this automatic process, a subset of single precursor ions from the survey scan is chosen by the MS instrument on the basis of predetermined rules, namely higher abundance, to continue on a second stage of mass selection. This means that tandem mass spectra will not be acquired for all available ions of the eluted peptides, and that information from the previous spectrometric scan determines the parameters of the subsequent ones. (46) DDA presents as main advantages its flexibility, extent of detection, and easiness of setup and analysis. In terms of peptide quantification, relative methods can be applied with low-to-high precision liable on being label- free or labeled. However, irreproducibility and inaccuracy are inherent to the concept, due to the hitches of measuring co-eluting and low-abundance peptides, and the stochastic nature of precursor ion selection. Additionally, to survey as many peptides as possible, and thus render thousands of hit identifications, DDA deliberately samples each peptide species only once or twice, which not only prevents precise absolute quantification, but also averts reaching high-saturation levels on identifications when sampling complex protein mixtures (for example, whole cell lysates), as both require multiple measurements per peptide. (33,46)

Whether using DDA or another tandem MS mode, the selected peptide ions undergo gas-phase dissociation pathways, the fragmentation being strongly dependent on two factors: the energy liberated, because at low energies fragmentation mainly occurs along the peptide backbone bonds, whereas at higher energies fragments generated by amino acid sidechains are also observed; and where the charge resides in the products. (32,47) With this, peptide fragmentation is not sequential, meaning that the first event does not start at the N- terminal and proceed one residue at a time down to the C-terminal. Happening at different sites results in different fragments, with some fragmentations being preferred over others as noted by the variation in the abundance of observed peaks in the tandem spectra. The distribution of the peaks reflects the population of fragment ions produced in the collision cell of the mass spectrometer. (31,48)

The method of de novo sequencing means assembling the sequence of the peptides by assignment of fragment ions from the mass spectra. Considering that doing it manually is labor-intensive and time consuming, and impossible with large-scale proteomic analysis, appropriate algorithms come already implemented in the mass spectrometers for the interpretation of spectra. (49,50)

25

The sequence of the peptide can be disclosed by the mass difference between the peaks, greatly helped by the fact that ion series can be established according to the site of backbone cleavage, (48) as illustrated in figure 7. Three different types of backbone bonds can be broken to form peptide fragments: alkyl carbonyl (CHR-CO), peptide amide (CO-NH), and amino alkyl (NH-CHR); appearing to extend either from the N-terminal or the C- terminal, which leads to six types of sequence ions: a, b, c and x, y, z, respectively. In most cases, b-ions and y- ions, and neutral losses of water or ammonia, depending on the amino acid composition, dominate the mass spectrum because the vast majority of instruments use low-energy CID and most de novo algorithms assume that peptides preferentially fragment into b- and y- series since the peptide amide bond is the most vulnerable. (31,48,49,51,52)

a. b. C-term

N-term

Figure 7 – Peptide fragmentation ion series and correspondent MS/MS fragmentation spectra. a. Fragmentation ion series a, b, c (N-term) and x, y, z (C-term) according to the site of backbone cleavage: alkyl carbonyl (CHR-CO), peptide amide (CO-NH), and amino alkyl (NH-CHR), respectively. b. Simplified representation of an MS/MS spectrum with b- and y- ions: peptide IYEVEGMR. (From Sadygov et. al (2004) (31)) Distances between peaks on the m/z axis can be used to infer partial sequences of the peptide, by knowing each residue isotopic mass and the fact that b-ions m/z values refer to the mass of the shortened peptide minus OH- and y-ions m/z values to it plus H+. (49)

One source of uncertainty in analyzing tandem mass spectra is the charge multiplicity of the fragment ions and the determination of the charge state of the precursor peptide, since ionization occurs mainly at +2/+3 charge, being critical to accurately calculating the molecular weight. Methods to determine the charge state of ions involve deconvolution of the m/z spread of isotope peaks. (31,53)

2.4.4.2 Peptide-Spectral Match

In a typical LC–MS/MS workflow, experimental peptide product ion MS/MS spectra are compared with those derived theoretically from protein sequence databases by in silico digestion and fragmentation, or with collections of identified spectra, in the method of database searching. Subsequently, the analysis of the large volume of proteomics data relies on automated algorithms to produce peptide-spectral matches (PSM) and simultaneously assess their quality based on scoring functions, which will lead to reliable peptide identifications. (34) Typically, less than 50% of the MS/MS spectra results in peptide-spectrum matches. Then, only a small fraction of PSM passes the preset false discovery rate (FDR), which controls the probability of false positive findings in the set given only the matches showing a statistical value lower than the threshold. (54)

26

A PSM scoring function assigns a numerical value to a peptide-spectrum pair (P,S), expressing the likelihood that the fragmentation of a peptide P is recorded in the experimental mass spectrum S. Most scoring functions form a generative statistical model for the probability Prob(S|P) that spectrum S was created from a fragmentation of peptide P, and then the likeliest peptide P* to do so is the one that maximizes that probability. In theory, if the model Prob(S|P) is sufficiently accurate, that peptide that maximizes Prob(S|P) will be the true peptide whose fragmentation is recorded in S. The probability Prob(S|P) is often converted to a score by using a log ratio test that compares it to the probability obtained from a null hypothesis, such as FDR. (34,55) The problem with generative models for PSM scoring is that peptide fragmentation is extremely complex and not easily represented with high confidence by simple statistical models, particularly when searching against large databases, where they can fail to discriminate between correct PSM and false positives. (34) When scoring a PSM from a database search, one cannot assume that the correct peptide sequence for a query spectrum is necessarily there because the peptide might originate from an unknown gene or an unknown splicing. Hence, this approach fails to recognize novel peptides since it can only match to existing sequences in the database. This challenges the ranking of the PSM candidate list, because in addition to bringing the correct PSM to the top, the scoring function also needs to assign all correct PSM a higher score than all incorrect ones, which can be difficult by reason of quality of the fragmentation spectra. (34,56)

Database searching algorithms can follow one of four approaches: descriptive, interpretative, stochastic and probability-based modeling, briefly described in figure 8.

Figure 8 – Database searching algorithms. (From Sadygov et. al (2004) (31))

Under the probability model, a group of database search algorithms have been implemented that use collective properties of database sequences to calculate the probability that a match between the calculated and measured fragment masses is a random event. In this context appeared Andromeda (57), a peptide search engine integrated into MaxQuant environment, which was the proteomic platform used in this project for the analysis of the mass spectrometry data. A key advantage of Andromeda is its extensibility. The algorithm supports high accuracy MS/MS data, being able to assign and score complex patterns of post-translational modifications (e.g. highly phosphorylated peptides), as well as handling many modifications of the same peptide. (57) It also reveals incredible flexibility by including a second peptide identification algorithm that enhances the characterization of complex mixtures. These mixtures frequently lead to co-fragmentation of co-eluting peptides

27 with similar masses, which will reduce the number of peptides identified in database searches, especially if the co-fragmented peptides generate peaks of comparable intensity. Thus, for each isotope pattern that is detected in the LC-MS data but is not targeted for fragmentation, the algorithm checks if it intersects the m/z isolation window of any MS/MS event. If so, fragment ions that have already been assigned to a peptide sequence during the main search are subtracted and the search is repeated with the precursor mass of the co-fragmented peptide, in a statistically rigorous independent way to overcome the lower quality in comparison to the original MS/MS spectra. (57) Andromeda has proven to be a robust and scalable search engine that in combination with MaxQuant represents a powerful and unified analysis pipeline for quantitative proteomics. (57) The scoring function at the heart of Andromeda is built on a simple binominal distribution probability formula, as shown in figure 9. To efficiently score MS/MS spectra, it is important to retrieve all candidate peptides showing a suitable calculated precursor mass within a given tolerance, therefore a list of all peptides obtained from the protein sequences in databases by the specified digestion rule is generated, considering all possible combinations of preset variable modifications. (57)

Figure 9 – Schematic of the peptide scoring algorithm in Andromeda. (From Cox et. al (2011) (57))

Andromeda divides the experimental MS/MS spectra into mass ranges of 100 Da. In each one, the number of experimental peaks offered for matching is dynamically tested in an intensity prioritized manner. Because there are many signals present, including some low intense noise signals, this number is limited to a maximum. The parameter q is defined as the number of allowed peaks in the mass interval and it is needed to calculate the probability of random matches. (57)

28

So, the first step in the calculation of the score is to count the number of matches k between the n theoretical fragment masses and the peaks in the spectrum. If the difference between calculated and measured masses is less than a predefined value, a match is counted. The higher k is compared to n, the lower the chance that this happened by chance. (57) Andromeda score is then calculated as the negative log10 of the probability of matching at least k out of the n theoretical masses by chance. The final score involves an optimization over the inclusion of modification-specific neutral losses and of the highest intensity q peaks accounted per 100 Da m/z interval. (57)

2.5 Protein annotation and function prediction

The ever-increasing interest in high-throughput proteomic studies by mass spectrometry has been greatly accompanied by the remarkable growth of protein databases, which not only allow for the identification of the peptide fragments to predict what proteins are present in the samples, but also provide a handful of information regarding its features, structure, activity, functions and involvement in cellular pathways, collectively known as protein annotation.

2.5.1 Annotation methods

Protein annotation was typically performed by and homology identification against already annotated proteins from public databases. The thought behind these homology-based methods was that proteins of similar sequence would have similar functions. However, this was not always verified: closely related proteins sharing sequence similarity could have evolved completely different functions. (58) Therefore, other methods were pursued with the purpose of giving a more realistic and accurate prediction of likely protein functions. At first, sequence motif-based methods account for the search of protein domains given a query sequence, since within domains shorter signatures known as motifs are associated with particular functions and can be used to subcellular localization of a protein. Databases for protein domains and families include Pfam (59) and PROSITE (60). On another perspective, structure-based methods look for the similarity between 3D structures of proteins, in the conception that structural alignments can be a better indicator of similar function in comparison to sequence alignments, due to the fact that structure is generally better conserved, particularly motifs such as the active site or binding site. Thus, proteins can present a very different amino acid sequence but evidence a structural arrangement that promotes the same kind of protein function. (58) Protein Data Bank (PDB) (61) is the key database for structural data, which is mainly obtained by X-ray crystallography or NMR spectroscopy, and perfected through computational modelling techniques, like solvent mapping, to ponder conformational changes caused by the binding of small molecules and enlighten the likely location of an active site. Leaving behind the comparison of sequence or structure, genomic context-based methods for instances are based on the remark that two or more proteins with the same pattern of presence or absence in different

29 genomes most likely have a functional link. Following the idea that similar pathways reflect a sharing of genomic context across all species, this approach can be used to better predict the biological process or mechanism in which a protein acts within the cell. (62)

In recent years, the so-called network-based methods gained popularity, aiming to establish functional association networks for target groups of proteins linked to each other by edges representing evidence of shared function. In this sense, patterns of protein interactions within networks are used to infer function and to improve the system-level understanding of cellular processes. (63) The STRING database (Search Tool for the Retrieval of Interacting Genes/Proteins) is widely used, holding known and predicted protein–protein interaction knowledge imported from various sources, including experimental data, public text collections and computational prediction methods. Through literature curation, text mining and orthology-based inspection of protein interaction on model organisms, STRING contains information on almost 10 million proteins from more than 2000 organisms. (64,65) This type of network resource also serves to highlight functional enrichments in user-provided lists of proteins, using a number of functional classification systems of great span and relevance in biology, medicine and computer science, such as Gene Ontology (GO) (66,67), PANTHER (68), KEGG (69,70), and Reactome (71).

2.5.2 Functional classification systems

Gene Ontology and PANTHER

The Gene Ontology (GO) project provides representation and nomination of defined terms depicting gene product properties through an universal vocabulary, covering three domains: i) molecular function (the molecular-level activities); ii) biological process (the biological programs accomplished by multiple molecular activities); and iii) cellular component (the locations of action relative to cellular structures). (67) The GO knowledgebase is composed firstly by the Gene Ontology itself, which provides the logical structure of the terms and their relationships, manifested as a directed acyclic graph. Each term has defined relationships to other terms in the same domain, and sometimes in other domains. The second component is the core of GO annotations, supported directly or indirectly by scientific literature: evidence-based statements relating a specific gene product, whether a protein, a non-coding RNA, or a macromolecular complex, to a specific ontology term. Together, the ontology and the annotations aim to describe a comprehensive model of biological systems and network biology. (67) The gene set enrichment analysis tool of GO directs to the interface of PANTHER database (Protein ANalysis THrough Evolutionary Relationships). This enables to take advantage of user-friendly visualization tools, such as the hierarchical view that organizes enrichment results using the relations of GO (which are constantly updated to avoid outdated annotations), by grouping related terms together to facilitate biological interpretation. (67,68) PANTHER contains comprehensive information on the evolution and function of protein-coding genes from completely sequenced genomes, allowing the classification of new protein sequences, and to analyze gene lists obtained from large-scale genomics experiments. (68)

30

As a future direction to allow more inclusive, accurate statements about gene function and how multiple genes may function together, GO annotations are steeping to models of biology by the means of a new formalism designated LEGO (Logical Extension of the Gene Ontology). Currently under development, it aims to combine traditional GO annotations together into a larger model of gene and system function, which holds a fully integrated, yet simpler, representation of how gene functions relate to each other and to bigger biological processes. (67)

KEGG Pathways

KEGG (Kyoto Encyclopedia of Genes and Genomes) is a knowledgebase for systematic analysis of gene functions, linking genomic information with higher order functional information. Therefore, it comprises two main databases: Genes and Pathways. The first stores the genomic information, in a collection of gene catalogs for all completely sequenced (and some partially sequenced) genomes, with up-to-date annotation of gene functions. The second contains the higher order biological functions in terms of generalized networks of interacting proteins, with graphical representations of the associated cellular processes supplemented by a set of ortholog group tables for the information about conserved sub-pathways. There is a third database, Ligand, covering information about chemical compounds, enzyme molecules and enzymatic reactions. (70) Since functional annotation of individual genes is still largely incomplete, the KEGG databases are updated in an effort to level the submissions of newly sequenced genomes (from a growing number of organisms) with its gene sets being linked with cell pathways through functional assignment. (70)

Reactome Pathways

The Reactome knowledgebase provides an ordered network of molecular transformations regarding cellular processes, resulting in an extended version of a classic metabolic map. It functions both as an archive of biological processes and as a tool for discovering unexpected functional relationships, such as gene expression profiles or somatic mutation catalogues from tumor cells. (71) Reactome differs from other resources by focusing its manual annotation on a single species, Homo sapiens, and applying a single consistent data model across various domains of biology like KEGG or PANTHER. (72) It has entries for over 10000 human genes, 53% of the predicted human protein-coding genes, supporting the annotation of almost 25000 specific forms of proteins distinguished by post-translational modifications and subcellular localizations. (71) Besides a novel Pathway Diagram Viewer, Reactome also provides an over-representation analysis tool to aid the interpretation of expression data sets, calculating whether a user-generated list of proteins, for example whose expression is changed in response to a stress, contains more annotated to each pathway than what would be expected. (71)

31

2.5.3 Genome and protein integration databases

The development of functional classification systems to better understand and characterize biological events at cellular level makes tremendous use of the several different up-to-date genome databases of various model organisms. One of such is the Ensembl genome database project, launched 20 years ago parallel to the initial drafts of the Human Genome Project. Functioning as a genome browser, sequence data is fed into the gene annotation system which creates a set of predicted gene locations and saves them for analysis. (73) Central to its concept is the ability to automatically generate graphical views of the alignment of query genes and other genomic data, like genetic variations and regulatory features, against a reference genome. Over time, Ensembl has expanded as its resources describe multiple fields of genomics, in particular gene annotation, comparative genomics and epigenomics; and covers a growing number of genome assemblies from other model organisms. Additionally, its interface enables data at varying levels of resolution, from level to text-based representations of DNA and amino acid sequences, as well as to present other displays such as homologous- genes trees across a range of species. Overall, this supports research ranging from quick browsing to genome- wide bioinformatic analysis. (73,74)

Taking into consideration that so many diverse protein knowledgebases have been shaped, each one revealing particular advantages and up-features, but also limitations, it is of keen interest to use databases that promote aggregation and integration of the most relevant information collected on protein features, functions and linked pathways, in a way that is sufficient but still allows a further deep analysis. Under this light, the Universal Protein Resource (UniProt) is a comprehensive, high-quality resource of protein sequence and functional information, with many entries being derived from genome sequencing projects. (75) It lodges two core databases: UniProt Knowledgebase (UniProtKB) and UniProt Archive (UniParc), which individually gather non-redundant, accurate information extracted, respectively, from scientific literature, biocurator-evaluated computational analysis and also unreviewed automatic annotation, or from publicly available protein sequence databases. Whereas UniProtKB is partially manually annotated, updated to the latest scientific findings, UniParc contains only protein sequences, with no annotation. (76,77) A third database, UniProt Reference Clusters (UniRef), aims at clustering highly homologous sequences from UniProtKB and selected UniParc records to significantly reduce database size, thus enabling faster sequence searches. (78) The greatest advantage of UniProt is that sequence data is obtained from several protein and genome source collections, including PDB and Ensembl, but merged into unique sequences stored only once and given unique identifiers, making it possible to reliably identify the same protein from different source databases, and have linked to it a broad annotation scope. (79)

32

2.5.4 Limitations of annotation procedures

The main limitations of annotation enrichment come from the annotations themselves. At protein level, the most common proteins are more thoroughly annotated and described, and well-known functions and biological processes present more reliable and complete terms to it. Thus, a growing number of submitted new protein sequences lack functional annotation, which introduces a certain bias into the statistical analysis. (80) Another constraint is the complexity and detail of annotation associated with large gene/protein sets. This happens because resources like Reactome and GO can be very intricate in their annotation, generating complicated networks of inter-related and similar terms. The simplest approach to unravel this is to make use of simplified ontologies, such as GOslims in GO, where fine detailed terms are removed and assigned to broader, more general parent terms. (80)

An alternative is offered by Cytoscape software platform: a project for integrating biomolecular interaction networks with high-throughput expression profiles and phenotypes into a unified conceptual outline display through powerful visual styles. (81) The core component of Cytoscape is a network graph, with molecular species represented as nodes and intermolecular interactions represented as links/edges between nodes, for which visual representation and integrated data, selection and filtering tools are provided. A key aspect of Cytoscape architecture is the additional use of plugins for specialized features, which are available for network and molecular profiling analyses, new layouts, and connection with large databases of functional annotations. (81) For instances, the apps BiNGO (82) and ClueGO (83), updated with the newest files from Gene Ontology, KEGG and Reactome, represent the results as a network of terms where directed edges represent term relationships as defined in the ontology used. This allows tools from graph theory to be used in reorganizing the network layout to expose communalities within, helping to simplify the output. (80)

33

3. The importance of cell systems in research 3.1 Applications of cell culture

Cell culture is a major tool used in cellular and molecular biology, providing excellent model systems for studying the normal physiology and biochemistry of cells, the effects of drugs and toxic compounds, and mutagenesis and carcinogenesis. It is also used in drug screening and development, and for the large-scale manufacturing of biological compounds such as vaccines and therapeutic proteins. The major advantage of using cell culture for these applications is the consistency and reproducibility of results. (84)

3.2 Human colorectal cancer cell line Caco-2 as a system model

Colorectal cancer (CRC) is one of the most common cancers worldwide, with a frequency of about 1.4 million cases every year. It is also one of the most lethal, holding around 700 000 deaths per year as of 2012, which gives an approximately 50% fatality rate. Its burden is expected to increase by 60% to more than 2.2 million new cases and 1.1 million deaths by 2030. (85) CRC is a useful model for studying carcinogenesis and tumor progression, as colonocytes follow a systematic process of proliferation, differentiation and transformation from adenomas to carcinomas. (86)

3.2.1 Cell line development, characteristics and properties

Starting from the 1970’s, several cell lines were established from gastrointestinal tumors aiming to perform studies on cancer mechanisms and possible therapies. (87) However, they generally failed to show a specific intestinal differentiation, even if partial differentiation could be induced by treatment with synthetic or biological factors, with exception of the colorectal adenocarcinoma Caco-2 cell line. (88) These cells were found to have the capability to differentiate spontaneously into a monolayer of cylindrical polarized cells with morphological, biochemical and functional features of mature intestinal enterocytes, even though they originate from the colon. Amongst those features, illustrated on figure 10, are well-developed microvilli on the apical membrane, functional tight junctions between adjacent cells, and intestinal hydrolase activity such as that of lactase, aminopeptidase N, sucrase-isomaltase, dipeptidyl peptidase-IV, and alkaline phosphatase, which is frequently used as a marker of cell differentiation in colon cancer cells. (88–91) However, no P-450 metabolizing enzyme activity has been reported. (91) Table 1 summarizes the discussed properties of Caco-2 cells.

34

a. Figure 10 – Similarity between differentiated Caco-2 cells and feature-specific enterocytes. a. Representation of Caco-2 enterocyte-differentiated monolayer in a culture chamber. (From ATC Pharma – Caco-2 Permeability Assay (92)) b. Representation of a human enterocyte and its associated b. membrane transporters. (From SOLVO Biotechnology – Drug absorption in the intestine (93))

Caco-2 cells can spontaneously differentiate into small intestinal cells to form a functional polarized epithelium mimicking the gastrointestinal barrier, including the presence of microvilli, tight junctions, specific transporters and metabolic enzymes.

Proliferating Caco-2 cells spontaneously initiate the enterocytic differentiation process after reaching confluence, nonetheless also demonstrating at this point a colonocyte phenotype. The transition from proliferation to differentiation seems to be triggered by specific biochemical events including E-cadherin- mediated cell–cell adhesion. Additionally, extracellular matrix proteins, cytokines, hormones and growth factors play a role in cellular growth arrest and parallel initiation of the differentiation program. (94) During differentiation, the content of colonocyte-specific proteins decreases, whereas that for colonocyte-specific increases. The timing and degree of this phenotypic switch is relevant to the interpretation of experiments using Caco-2 cells as a model of the intestinal barrier. (95)

Table 1 – Properties of the Caco-2 cell line. (From Lea (2015) (91))

35

3.2.2 General maintenance protocol

Caco-2 cells are routinely maintained in medium suitable for the culture of mammalian cells such as RPMIM (Roswell Park Memorial Institute Medium) 1640 or DMEM (Dulbecco’s Modified Eagle Medium). The presence of glucose, glutamine, phenol red, serum supplement, HEPES buffer and sodium pyruvate can be selected upon preference, as well as the addition of Non-Essential Amino Acids, thyoglycerol and the type of antibiotics, like gentamycin, penicillin or streptomycin, in order to prepare a complete medium. (91)

The cells should be kept at 37°C in a humidified atmosphere containing 5% CO2. For propagation in culture flasks, Caco-2 cells are seeded in a concentration of 105 cells/cm2. Medium should be changed every 1-2 days and the confluence should be maintained at 50-60% to avoid stressing the cells, thus splitting, normally 1:2 or 1:3, is required every 2-3 days. For this, trypsinization is performed by incubating the cells with a solution of Trypsin/EDTA (maximum 5 mL) after rinsing them first with PBS. The incubation with the enzyme should be as short as possible as this process affects cell viability. As soon as the cells are detached from the surface, the process is stopped by resuspending them in complete medium. In counting and checking cell viability, the content of dead cells should not exceed 5%. (91)

3.2.3 Caco-2 in the research field: advantages and disadvantages

Caco-2 cell line has proved to be an upright, indispensable tool for intestinal ADMET, i.e, Absorption, Distribution, Metabolism, Excretion and Toxicity, studies that describe the disposition of a pharmaceutical compound within an organism, therefore being of extreme relevance for research in areas of nutrition, pharmacology, physiopathology and toxicology, besides disease progression, considering that good pharmacokinetics (PK) is the ultimate go-to in drug development, close-handed with desired pharmacodynamics. Nowadays, for over 20 years, it has been a well-established in vitro model system for the human intestinal barrier, giving a reliable and reproducible assessment of intestinal permeability of compounds during pre-clinical trials, by which one can predict absorption and efflux of drugs in the human gastrointestinal tract. (90,96,97)

In this context, the increasing use of Caco-2 cell line over other cell lines might be an ascribing factor for the shift in the early 2000’s regarding the main reasons of failure in drug discovery, as a great decrease from about 40% to less than 10% was evidenced concerning pharmacokinetics/bioavailability issues, as seen in figure 11. An overall reduction of 1.5 times was verified when considering other factors reporting to ADMET properties such as clinical safety and toxicity. (98)

36

1990's 2000's Figure 11 – Reasons for failure in Drug Discovery through the years. (Adapted from Kola and Landis (2004) (98))

Nevertheless, variations between gene expression profiles of a transformed epithelial cell line like Caco-2 and normal human intestinal epithelium have been observed, for instances by Bourgine et. al (2012) (99). In fact, while investigating in vitro and in vivo expression level correlation, it was verified that the best agreement between human tissue and the representative human colon cancer cell lines did not happen with Caco-2, but with HT29, another cell line originating from a human colon adenocarcinoma. (99)

However, the simplicity and reproducibility of this model sustains it as one of the most preferred for pharmacological and toxicological studies, even if there cannot be a fully direct comparison with the in vivo situation. The limitations include the fact that the normal epithelium contains more than one differentiated cell type, not only enterocytes; that the mucus and unstirred water layer are not present; and that several non- cellular parameters affect the absorption of certain compounds, for instances lipophilicity and solubility, which can be influenced by the absence of in vivo specific features of the gastrointestinal tract such as the action of bile acids. (91)

37

4. Case study: antioxidant gene expression activated by 4-hydroxy-2-nonenal 4.1 4-hydroxy-2-nonenal: a product of lipid peroxidation

4-hydroxy-2-nonenal (4-HNE), an α,β-unsaturated hydroxy aldehyde, is one of the main end-products deriving from the oxidation of polyunsaturated fatty acids (PUFA) in membrane lipid layers by reactive oxygen species (ROS) produced during cell metabolism, namely superoxide anion, hydrogen peroxide and hydroxyl radical. (100) Lipid peroxidation (LPO) can induce changes in the permeability and fluidity of the membrane lipid bilayer, compromising cell integrity; and through the action of its end-products can affect cellular processes like proliferation, differentiation and apoptosis of normal and neoplastic cells. Therefore, the accumulation of ROS, generalizing the phenomena of oxidative stress, is damaging to the cell, prompting a set of enzymatic and non- enzymatic defense mechanisms to promote an antioxidant effect and re-establish homeostasis. (101–103)

4-hydroxy-2-alkenals resulting from the oxidation of ω-6 PUFA, e.g. linoleic acid and arachidonic acid, represent the most biologically active end-products due to their high electrophilic power, hence the capacity to form adducts with low-molecular-weight compounds like glutathione (GSH); with proteins and, at higher concentrations, nucleic acids. Based on their prolonged half lives and their ability to diffuse from their sites of formation, they have been considered second messengers of oxidative stress. (104) These molecules have 3 reactive groups: an aldehyde at C-1, a double-bond at C-3, and a hydroxyl group at C-4, all contributing to make them highly reactive with nucleophiles, with the primary reactivity lying at the unsaturated bond. Besides 4-HNE, other relevant end-products of lipid peroxidation include 4-oxo-2-nonenal (4-ONE), 4- hydroxy-2-hexenal (4-HHE), malondialdehyde (MDA) and acrolein, represented in figure 12. (105)

4-hydroxy-2-nonenal acrolein malondialdehyde

4-oxo-2-nonenal 4-hydroxy-2-hexenal

Figure 12 – Skeletal structural formulas of some end-products of lipid peroxidation.

HNE is present in the free form up to 1 µM in normal plasma but drastically increases during pathological oxidative stress, serving as a biomarker, and once formed, it can interfere with several signaling processes, as well as gene expression pathways and protein functions. (106) In order to control the steady-state intracellular concentration of aldehydes like HNE, by balancing the formation by LPO with the catabolism into less reactive compounds, the cell presents several enzymatic detoxification mechanisms (both phase I and II), mainly: glutathione conjugation, catalyzed by glutathione-S-

38 transferases (GST) (but can also occur spontaneously), forming HNE-GS, which is further metabolized and found in urine mostly as the mercapturic acid derivative, HNE-MA; reduction to alcohols like 1,4-dihydroxy-2-nonene (DHN) by aldo-keto reductases (AKR) or alcohol dehydrogenases (ADH); and oxidation to acids like 4- hydroxynonanoic acid (4-HNA) by aldehyde dehydrogenases (ALDH). (102,104,107,108) On the other hand, non-enzymatic biotransformations of HNE, related to the formation of adducts with proteins and nucleic acids (and potentially other small molecules like carnosine) have only recently been better elucidated and summarized. Despite being obvious pathways of toxicity, associated with cell carcinogenesis, necrosis, apoptosis and hypersensitivity, that can lead to severe systemic side effects and chronic inflammation as reported in cancer, neurodegenerative, cardiovascular and autoimmune diseases, there is evidence of an indirect link to detoxification mechanisms. The generation of HNE-protein adducts can, in some cases, indirectly represent a contrast to disease progression and promote adaptive cell responses by their involvement in several important biochemical pathways, namely redox signal transduction and gene expression, demonstrating that HNE isn’t only a toxic product of LPO, but can also be a determining regulatory molecule. (100,103,108–111) An extensive review on the detoxification of HNE has been written by Mol et. al (2017) (108), with the main pathways roughly represented in figure 13.

Figure 13 – Overview of the relationship between HNE metabolism and toxicity. Toxicity may arise from the accumulation of parent compounds or the formation of chemically reactive products, that, if not detoxified, can effect covalent modification of biological macromolecules. The nature and functional consequences of such binding determines the resulting toxicological response. (Adapted from Mol et. al (2017) (108) and Park et. al (2011) (111)).

HNE preferentially forms 1,4-Michael-type adducts via the C-3 unsaturated bond with nucleophiles like proteins. Amino acids known to react with HNE via 1,4-addition are Cys (sulfhydryl group), Lys (ε-amino group) and His (imidazole group), from highest to lowest reactivity. It has been proposed that HNE can also modify Arg residues. In addition, HNE can also react with lysyl residues via the C-1 aldehyde group through Schiff base mechanism to give imine and pentylpyrrole adducts. (102,107,109,112) These HNE modifications are depicted in figure 14.

39

Michael adduction

Schiff base adduction with pentylpyrrole rearrangement

Figure 14 – Reaction of HNE with cysteinyl, histidyl and lysyl residues via 1,4-Michael addition, and with lysil residues via Schiff base mechanism with pentylpyrrole rearrangement. (Adapted from Barrera et. al (2014) (109))

The reaction of electrophiles (EP) like HNE with the Cys residue of proteins via its sulfhydryl group, since it has an unshared pair of electrons, presents itself as a very interesting feature based on the duality between cytotoxicity and cytoprotection. Highly reactive EP tend to react non-specifically, and since GSH is the most abundant thiol in the cell, it may be depleted, reducing the cell’s reductive capacity and so leaving it more susceptible to oxidative stress and damage. Nevertheless, EP within limited concentrations have been reported to activate transcription-dependent pathways by binding to specific cysteines, majorly within the Keap1/Nrf2 pathway that induces potent antioxidative enzymatic systems, thus allowing an “electrophilic counterattack”. (113–115) Therefore, the dose-response curves regarding electrophile–cell interactions come in an inverted-U- shape, and in this context, Satoh et. al (2013) have described three categories of EP according to their protective action versus toxic effects, presented in figure 15. (113–115)

40

Figure 15 – Electrophile-cell interactions: dose-response curves. Distinctive groups of electrophiles can be defined in terms of broadness of their therapeutic index: EP1, none; EP2, limited; EP3, broad. The size of the arrows indicates the relative strength of the opposing effects of GSH depletion vs electrophilic counterattack in response to EP treatment. (From Satoh et. al (2013) (113))

The difference between EP2 and EP3 compounds is the degree of GSH depletion for an equally strong electrophilic counterattack. In practical terms, EP3 compounds may represent desirable drug candidates since they show broad therapeutic dose ranges and minimal potential side effects due to lower GSH depletion (even though they still do exert some toxic effect). For clinical tolerability and druggability, the reduction in toxic side effects is rendered more critical than a further increase in the protective effect. (113–115) Regarding HNE for instances, which has been predicted to fall under the category of EP3, thus motivating the search for experimental confirmation evidence, a question to emerge is the definition of the parameters related to drug-like properties that are essential for the development of viable anti-oxidative drug candidates.

4.2 Keap1-Nrf2 signalling pathway for an antioxidant response

A major mechanism in the cellular defense against oxidative or electrophilic stress is the activation of the Nrf2 signaling pathway, which controls the expression of genes whose protein products are involved in the detoxification and elimination of reactive oxidants and electrophilic agents through conjugative reactions, and by enhancing cellular antioxidant capacity. (116,117) As hinted before, the predominant protective activity is mediated by the reaction of an EP with a critical thiol in the cytoplasmic protein Kelch-like ECH-associated protein 1 (Keap1), causing it to dissociate from the transcription factor nuclear factor (erythroid-derived 2)-like 2 (Nrf2). Nrf2 is then free to enter the nucleus for subsequent activation of Antioxidant Response Element (ARE)- mediated transcription of antioxidant enzymes, illustrated in figure 16. (113,118,119)

41

Figure 16 – Activation of the Nrf2/ARE signalling pathway by HNE. Keap1 retains Nrf2 near the proteasome, prompting its degradation in the cytoplasm of quiescent cells. Under conditions of oxidative or xenobiotic stress, Nrf2 is released from Keap1 repression and translocates to the nucleus, where binding to the antioxidant response element (ARE) in association with Maf promotes an antioxidant response. (From Siow et. al (2007) (110))

Under unstressed conditions, Keap1 (actin-bound) acts as a binding protein to trap Nrf2 in the cytoplasm functioning as an adaptor protein for Cullin 3 E3 ubiquitin ligase, which polyubiquitinates Nrf2 leading to its degradation by the proteasome. On the other hand, redox-disrupting stimuli by directly modifying Keap1 inactivate this protein, thus reducing the ubiquitin E3 ligase activity of Keap1-Cul3 complex and allowing Nrf2 to be stabilized, provoking the induction of a battery of cytoprotective genes. (118,120–122) It has been established for over 15 years that activated Nrf2 is a major regulator of ARE-mediated gene expression and belongs to the Cap-N-Collar (CNC) family of basic region-leucine zipper transcription factors, heterodimerizing with small Maf proteins, with subsequent binding to the ARE of target genes. Evidence show that Nrf2 activation might be dependent on the phosphorylation of Ser40 residue by protein kinases (110,118,123) Nrf2, represented in figure 17, is a basic leucine Zipper (bZIP) transcription factor, containing a highly- conserved bZIP region (Neh1 domain) that mediates sequence specific DNA binding properties and allows for the dimerization of DNA binding regions. The Neh2 domain at the N-terminal contains DLG and ETGE (high-affinity) motifs that bind to Keap1 Kelch domain. Domains Neh3 at the C-terminal, Neh4 and Neh5 mediate Nrf2 transactivational activity by binding to acetyltransferases. Neh6 has the function of Keap1-independent negative regulation of Nrf2. (124)

Figure 17 – Organization and functions of structure domains in Nrf2. The reactive key cysteine residues, nuclear import (white down arrow), and nuclear export (white up arrow) signaling sites are indicated. (From Magesh et. al (2012) (124))

Keap1, represented in figure 18, is a Cys thiol-rich biosensor of redox stimuli that shares some homology with actin-binding Kelch protein and serves as a negative regulator of Nrf2. It contains 5 domains: N-terminal

42 region (NTR); Broad complex, Tramtrack, and Bric-a-Brac (BTB); linker intervening region (IVR); Kelch domain; and C-terminal region (CTR). BTB attached to actin-binding proteins is responsible for homodimerization and interaction with Cullin-based ubiquitin E3 ligase complex. IVR domain contains the cysteine residues that are sensitive to oxidation. Kelch domain has six kelch repeats with multiple protein contact sites that mediate association of Keap1 with Neh2 domain of Nrf2. (121,124)

Figure 18 – Organization and functions of structure domains in Keap1. The reactive key cysteine residues and nuclear import signaling site (white down arrow) are indicated. The redox sensitive cysteine residues are marked with (*). (From Magesh et. al (2012) (124))

As indicated in figure 18, Cys151, Cys273 and Cys288 residues represent the most susceptible thiol groups, being Cys151 at N-terminal the one where electrophiles preferably covalently react by S-alkylation, which has been observed using mass spectrometry. Each of these cysteine thiols may differentially regulate anti-oxidant gene expression, in accordance to the redirection of Nrf2 to a certain ARE. ARE is the cis-regulatory element containing specific DNA sequences found in the upstream regulatory regions of genes encoding detoxifying enzymes and cytoprotective proteins. (122–126)

Target enzymes regulated by the Keap1/Nrf2/ARE pathway are essentially involved in: i) synthesis of glutathione: glutamate-cysteine ligase catalytic and modifier subunits (Gclc, Gclm); ii) elimination of ROS: heme oxygenase-1 (HO-1), peroxiredoxin-1 (Prx-1), thioredoxin-1 (TR-1), thioredoxin reductase-1 (TXNRD1), glutathione peroxidase (GPx); iii) detoxification of xenobiotics: NAD(P)H:quinone oxidoreductase-1 (NQO-1), glutathione S-transferase (GST); and iv) drug transport: Multidrug resistance-associated protein (Mrp). (120,121) Several studies have demonstrated that HO-1 is prominently induced under various oxidative stress conditions in many different cell types, namely macrophages, vascular smooth muscle cells and fibroblasts. (110,127) HO-1 is a microsomal heat-shock protein that catalyzes the rate-limiting step in the degradation of the pro-oxidant heme to biliverdin, iron and carbon monoxide. These products play important roles in cellular oxidant defense: biliverdin is subsequently converted by biliverdin reductase to bilirubin, an antioxidant which can scavenge lipid peroxyl radicals (128); carbon monoxide exhibits anti-apoptotic and anti-inflammatory properties (129); and by regulation of HO-1 itself, the export of cellular iron is increased to maintain cell viability during oxidative stress. (130)

Targeting Keap1-Nrf2-ARE signaling pathway has therefore become an attractive strategy for the development of preventive and therapeutic agents referred to as antioxidant inflammation modulators (AIM) regarding oxidative stress-related diseases and conditions. (124) Thus, NRF2 activators have been shown to be anti-inflammatory and neuroprotective at least in part via redox regulation. (122)

43

Besides HNE, other known electrophilic activators of Nrf2 include dimethylfumarates (DMF), which have been approved by United States Food and Drug Administration as a new first-line oral drug to treat patients with relapsing forms of multiple sclerosis; and bardoxolone methyl (CDDO-Me), which was undergoing phase 3 clinical trials to be used in the treatment of chronic kidney disease and cancer but was disregarded for safety reasons. (122,131,132) As stated previously, the major problem with electrophiles such as DMF is that they also produce severe systemic side effects, in part due to non-specific S-alkylation of cysteine thiols and subsequent depletion of glutathione. To avoid these effects, AIM are being developed to follow other Nrf2 activation mechanisms, for instances using safe pro-electrophilic drugs, such as carnosic acid from the herb Rosmarinus officinalis and zonarol from seaweed Dictyopteris undulata (113,133,134); or by non-covalent pharmacological inhibition of protein-protein interactions, in particular domain-specific Keap1-Nrf2 interaction or other repressor proteins involved in this transcriptional pathway. (122,135) In this context, a recent review by Satoh et. al (2017) (122) discusses Nrf2 as a druggable target by covalent and non-covalent interactions. A broader, very comprehensive review on Nrf2 as a therapeutic target for chronic diseases is presented by Cuadrado et. al (2018) (136), vastly covering a mechanism-based systems medicine approach together with a multi-scale network analysis of the Nrf2 interactome, diseasome and the drugome to validate its fundamental role in several pathophenotypes.

4.3 Effect of oxidative stress and Nrf2 in carcinogenesis

When present in lower levels, enough to maintain homeostasis, Nrf2 prompts the elimination of ROS, carcinogens, and other DNA-damaging agents, which inhibits tumor initiation and cancer metastasis. (137) However in cancer, Nrf2 is usually upregulated and aberrantly accumulated, helping malignant cells to tolerate high levels of ROS that propel tumor progression, through the activation of specific redox signaling circuits and inflammatory processes that favor survival and proliferation of tumor cells. (136,138,139) Constitutively high levels of Nrf2 have also been shown to protect cancer cells against ionizing radiation, inducing chemoresistance. (121,140) Therefore, Nrf2 seems to have a dual role, acting as tumor suppressor in normal cells, but under prolonged activation it has been shown to favor tumor growth and relapse, as it also maintains the redox balance and generates antioxidants in cancer cells. (136,141) For instance, the Nrf2 signature in cancer stem cells from human colorectal tumors pointed out protective mechanisms mediated by high levels of Gclc, glutathione peroxidase, and thioredoxin reductase-1, that underlie the ability of these cells to counteract stressors. (142) From this perspective, NRF2 behaves in cancer cells like an oncogene, that by inducing chronic activation of ARE-mediated cytoprotective responses affords adaptation to their oxidative environment. (143) Different mechanisms lead to an observed increase in Nrf2 activity in cancer, derived from: i) somatic mutations in Nrf2, Keap1, or Cul3; ii) epigenetic silencing of Keap1; iii) microRNA-mediated regulation of Nrf2 and Keap1; iv) disruption of Nrf2-Keap1 interaction by aberrantly accumulated proteins; v) transcriptional up- regulation of Nrf2 through oncogene-dependent signaling; and vi) modification of Keap1 by metabolic intermediates. (137,141)

44

5. Scope of the present project

Ensuing the topics introduced, this project addresses the implementation of a novel mass spectrometry- based label-free untargeted method as an explorative quantitative proteomics approach in cell systems. A shotgun proteomics technique without labelling was applied to Caco-2 cells upon treatment with HNE in concentration range from 0 to 100 µM, using the high-resolution OrbitrapTM FusionTM TribridTM mass spectrometer for an in-depth protein identification. The results were then interpreted by a functional proteomic analysis recurring to several bioinformatic tools to investigate the action of HNE as an antioxidant and stress response agent.

45

MATERIALS AND METHODS

1. Materials

If not said otherwise, all the listed items were available at the laboratories of Unità di Ricerca di Analisi Farmaceutica e Biofarmaceutica, Dipartimento di Scienze Farmaceutiche (DISFARM), Università degli Studi di Milano, address Via Luigi Mangiagalli 25-20133, Milan, Italy.

1.1 Chemicals, Reagents and Kits

The water used throughout the procedures was obtained with a purification system Milli-Q® Gradient A10 (Millipore®, Vimodrone, Italy) coupled with a 0.22 µm Millipak membrane filter, that produces Type 1 water of very high quality, suitable for sensitive applications such as chromatography. It guarantees a product with maximum electrical resistivity of 18 MΩ⋅cm and electrical conductivity of 0.056 uS/cm at 25°C, and free of particles with a diameter superior to 200 nm.

Cell culture Dulbecco’s Modified Eagle Medium (DMEM), high glucose, no glutamine, no phenol red, 1X, 4.5 g/L D- glucose, Gibco® #31053-028 L-glutamine 200 mM, Lonza® #BE17-605E Sodium pyruvate 100 mM, Gibco® #11360-070 Penicillin-streptomycin (Pen-Strep) 10K/10K, Lonza® # DE17-602E Fetal Bovine Serum (FBS), Euroclone® #ECS0180L Sterile Phosphate Buffered Saline (PBS), 1X, 0.0067 M, HyClone® # SH30256.01 Trypsin-EDTA, 0.5%, no phenol red, 10X, Gibco® #15400-054 Non-sterile Phosphate Buffered Saline (PBS), pH 7.4, 1X, Gibco® #10010-023 Tissue Culture (TC) Flask T75, Stand., Vent. Cap, Sarstedt® # 83.3911.002

Cell treatment with HNE 4-hydroxy-2-nonenal-dimethylacetal (HNE-DMA) 40 mg/mL in dichlorometane Hydrochloric acid (HCl), 37% v/v, MM=36.46 g/mol, ρ=1.19 g/cm3, Sigma-Aldrich® #30721-M

Cell lysis Urea, MM=60.06 g/mol, Bio-Rad® #161-0731 Sodium chloride (NaCl), 99.5%, MM= 58.44 g/mol, Sigma-Aldrich® #31434-M Tris(hydroxymethyl)aminomethane (Tris), 99.8%, MM=121.14 g/mol, Bio-Rad® #161-0719 Protease Inhibitor Cocktail, Sigma-Aldrich® #P8340

46

Bradford Assay Bradford Reagent, Sigma-Aldrich® #B6916 Bovine Serum Albumin (BSA), 96%, MM=66kDa, Sigma-Aldrich® #A2153

Protein Digestion Ammonium bicarbonate (AMBIC), 99%, MW=79.06 g/mol, Sigma-Aldrich® #A6141 1,4-Dithiothreitol (DTT), 97%, MW=154.25 g/mol, Roche® #10708984001 Iodoacetamide, 99%, MW=175.18 g/mol, Sigma® #I6125 Trypsin sequencing grade, 25 µg, Roche® # 11418475001

ZipTip procedure 0.2 µL micro-C18 resin Millipore ZipTips, Sigma-Aldrich® #ZTC18M096 Acetonitrile (ACN) supragradient HPLC grade, MM=41.05, ρ=0.786 g/cm3, Scharlau® #AC03312500 Formic Acid (FA), 98-100%, MM=46.03 g/mol, ρ=1.22 g/cm3, Scharlau® #AC10851000

High resolution mass spectrometry analysis Nano-Trap column, 100 µm x 2 cm, packed with Thermo Scientific™ Acclaim PepMap100 C18, 5 µM, 100Å, Thermo Fisher Scientific®, #164199 EASY-Spray column, 15 cm x 75 µm, packed with Thermo Scientific™ Acclaim PepMap RSLC C18, 3 µm, 100Å, Thermo Fisher Scientific®, #11352263

1.2 Solutions

DMEM complete: 5 mL L-glutamine 200 mM, 5 mL Sodium pyruvate, 5 mL Pen-Strep, 50 mL FBS for initial 500 mL bottle of DMEM HCl 1 mM: 8.3 µL HCl 37% v/v in 100 mL make-up with Milli-Q water HNE 18.9 µM: 500 µL HCl 1mM added onto a vial of 50 µL HNE-DMA 40 mg/mL in dichloromethane after evaporation of the solvent under nitrogen stream 1X Trypsin-EDTA: 5 mL 10X Trypsin-EDTA in 50 mL make-up with PBS NaCl 1 M: 17.532 g NaCl in 300 mL make-up with Milli-Q water Tris-HCl 500 mM pH 8.5: 18.171 g Tris in 300 mL make-up with Milli-Q water, adjusting with HCl 1 mM until pH=8.5 Lysis Buffer: 4.8543 g Urea, 0.3 mL NaCl 1 M, 1 mL Tris-HCl 500 mM pH 8.5, in 10 mL make-up with Milli-Q water, plus 2 tablets of protease/phosphatase inhibitors BSA standard 1 µg/µL: 1 mg BSA in 1 mL make-up with PBS To prepare the other standards, consecutive dilutions 1:2 to obtain concentrations 0.500 µg/µL, 0.250 µg/µL, 0.125 µg/µL, 0.0625 µg/µL. The Zero Standard is 1 mL PBS. AMBIC 50 mM: 40 mg AMBIC in 10 mL make-up with Milli-Q water

47

DTT 50 mM: 7.71265 mg DTT in 1 mL make-up with AMBIC Iodoacetamide 100 mM: 17.5184 mg iodoacetamide in 1 mL make-up with AMBIC Trypsin 0.25 µg/µL: 100 µL AMBIC added onto a vial of 25 µg trypsin ZipTip “Elution”: 800 µL acetonitrile, 1 µL formic acid in 1 mL make-up with Milli-Q water ZipTip “Washing”: 50 µL acetonitrile, 10 µL formic acid in 1 mL make-up with Milli-Q water

1.3 Instruments

Milli-Q® Gradient A10 water purification system, Millipore® #MPGL04001 Microbiological Safety Cabinet Class II, S@feflow 1.2, Euroclone® #135-3086

Incubator CB-150 CO2, Binder® #9040-0038 ED Heating Immersion Circulator for water bath, Julabo® # 9116000 Microbiological Safety Cabinet VBH58 C2 Compact, Steril®, #ST-002900000 Heraeus™ Megafuge™ 16R Centrifuge, Thermo Fisher Scientific®, #75004270 IEC™ CL31R Multispeed Centrifuge, Thermo Fisher Scientific®, #1210917 Termomixer Compact, Eppendorf® #T1317 PowerWave HTTM Microplate Spectrophotometer, Bio-TEK®, #BT-RPRWI TC20™ Automated Cell Counter, Bio-Rad® #1450102 SpeedDry Vacuum Concentrator RVC 2-18 CDplus, Christ®, #100248 DionexTM UltiMateTM 3000 RSLCnano system, Thermo Fisher Scientific®, #5200.0355 connected to Thermo Scientific™ Orbitrap Fusion™ Tribrid™ Mass Spectrometer, Thermo Fisher Scientific®, #IQLAAEGAAPFADBMBCX (available at Fondazione Filarete, address Viale Ortles 22/4 – 20139, Milan, Italy)

2. Methods 2.1 Cell culture conditions

Human colorectal adenocarcinoma cell line Caco-2, commercially available, was cultured in Dulbecco's Modified Eagle Medium (DMEM) high glucose medium (1X, Gibco®) supplemented with 1% L-glutamine 200 mM (Lonza®), 1% sodium pyruvate 100 mM (Gibco®), 1% antibiotic mixture penicillin-streptomycin (Lonza®) and 10% fetal bovine serum (FBS) (Euroclone®).

2 Cells were grown in 75 cm TC flasks (Sarstedt®) at 37°C in a humidified atmosphere of 95% air/5% CO2. Cells were passaged every 2-3 days by rinsing first with 5-10 mL sterile PBS (1X, 0.0067 M, HyClone®) and then adding 2-4 mL trypsin-EDTA (0.5%, 10X, Gibco®) diluted 1:10 in PBS, followed by incubation for 5-7 minutes at 37°C, 5%

CO2. Resuspended cells in complete medium, in volume according to the splitting proportion, were then plated at a desired density of 1∙105 cells⁄cm2 to give 8∙106 cells in 10 mL medium per flask. Cell counting throughout culturing was performed using TC20™ Automated Cell Counter (Bio-Rad®) according to its associated protocol.

48

2.2 Cell treatment with HNE

A vial of 18.9 µM 4-hydroxy-2-nonenal (HNE) was prepared from its corresponding diacetal, 4-hydroxy-2- nonenal-dimethylacetal (HNE-DMA). A solution of 40 mg/mL HNE-DMA in dichloromethane had been synthetized following the protocol of Rees et. al (1995) (144), from which 50 µL were taken. The solvent was evaporated under a nitrogen stream, and 500 µL of HCl 1 mM prepared from 37% v/v HCl (Sigma-Aldrich®) were added, leaving on aluminum foil for 1h at 25°C to allow the hydrolysis. The concentration of HNE was then validated by UV measurement (λ=224 nm, ε=13750 mol-1.cm-1), and the vial was kept at 4°C, to be stable for at least 24h, until used.

After performing enough passages to have the necessary number of flasks for the experiment, and with 90% confluency, Caco-2 cells were treated with HNE to give final concentrations of 0.1 µM, 1 µM, 10 µM and 100 µM. A stock of 100 µM HNE in 25 mL DMEM was first prepared, from which the volumes to give the desired concentrations in 10 mL DMEM flasks were taken. For the control, filtered HCl 1 mM in a proportional-to-the- stock volume in 10 mL was added instead. As required for the experiment to be meaningful, biological replicates (A and B) were prepared. The flasks were then incubated at 37°C, 5% CO2 for 24h.

2.3 Cell lysis

After 24h incubation, treated Caco-2 cells were washed with sterile PBS and trypsin-EDTA diluted 1:10 in PBS was added, following the protocol described for cell passage. After collection with PBS to Falcon tubes, the samples were centrifuged at 1000g, 4°C for 10 min (Heraeus™ Megafuge™ 16R, Thermo Fisher Scientific®), then washed twice with cold PBS, pH 7.4 (1X, Gibco®) re-suspending the pellet, and centrifuged again. Total extracts were recovered with a lysis buffer in Milli-Q® water containing 8 M Urea (Bio-Rad®), 30 mM NaCl (Sigma-Aldrich®), 50 mM Tris (Bio-Rad®) adjusted to pH 8.5 with HCl 1mM, and 1 tablet of Protease Cocktail Inhibitor (Sigma-Aldrich®), by adding 200 µL of buffer to the pellet. After transferring to Eppendorf tubes, the samples were centrifuged at 14000g, 4°C for 30 min (IEC™ CL31R Multispeed, Thermo Fisher Scientific®).

The Bradford Assay to determine the protein concentration in the lysates was performed using Bovine Serum Albumin (BSA) (Sigma-Aldrich®) standards of concentration 0, 0.0625 µg/µL, 0.125 µg/µL, 0.250 µg/µL, 0.500 µg/µL and 1 µg/µL in PBS. In a 96-well plate, 5 µL of the standards, and 2 µL of the lysates’ supernatant diluted 1:10 with Milli-Q® water plus 3 µL of PBS, were added to 250 µL of Bradford Reagent (Sigma-Aldrich®). Absorbance (A) was read at wavelength 595 nm in a microplate spectrophotometer (PowerWave HTTM, Bio-

TEK®). Obtaining the calibration curve 퐴 = 0.5192 ∙ 퐶푃 + 0.6877, 푟² = 0.9911, the protein concentration in the samples (Cp) was calculated, alongside the volume of lysate that corresponds to 10 µg of protein necessary to perform the digestive step, which are depicted in table 2.

49

Table 2 – Absorbance, protein concentration and volume of lysate corresponding to 10 µg of protein on the Caco-2 samples: control and treated with HNE in concentration 0.1 µM, 1 µM, 10 µM and 100 µM, for the biological replicates A and B. Samples Absorbance Protein concentration Volume for 10 µg (µg/µL) (µL) CTRL A 0.8425 1.490 6.71 0.1 µM A 0.8345 1.413 7.08 1 µM A 0.8505 1.567 6.38 10 µM A 0.7965 1.047 9.55 100 µM A 0.8145 1.221 8.19 CTRL B 0.8585 1.644 6.08 0.1 µM B 0.8665 1.721 5.81 1 µM B 0.8790 1.842 5.43 10 µM B 0.8610 1.668 5.99 100 µM B 0.8205 1.278 7.82

2.4 Protein digestion

Solutions of 50 mM ammonium bicarbonate (AMBIC) (Sigma-Aldrich®) in Milli-Q® water, 50 mM dithiothreitol (DTT) (Roche®) in AMBIC, and 100 mM iodoacetamide (Sigma-Aldrich®) in AMBIC, were prepared, alongside a solution of trypsin for sequencing grade (Roche®) of concentration 0.25 µg/µL in AMBIC. For each sample lysate, in new Eppendorf, the volume of AMBIC necessary to add was calculated for a total volume of 50 µL discounting the volume of lysate to be withdrawn (table 2) plus the volume of reagents to add to it: 5 µL of DTT, 7.5 µL of iodoacetamide and 2 µL of trypsin. The volume of lysate was then added accordingly. Reduction with DTT was performed in a thermomixer (Compact, Eppendorf®) at 52°C, 600 rpm for 30 min, followed by alkylation with iodoacetamide, in the dark at room temperature, for 20 min. Digestion with trypsin was then carried at 37°C, 400 rpm, overnight, and the samples were afterwards stored at -20°C until further use.

2.5 ZipTip procedure

Millipore 20 µL ZipTips (Sigma-Aldrich®) imbedded with 0.2 µL of micro-C18 resin, fixed at its to avoid dead volume, were used to perform a micro-chromatographic step to concentrate and purify the digestion samples prior to the downstream sensitive analysis by mass spectrometry. Designated “Elution” and “Washing” aqueous Milli-Q® solutions were prepared containing 80% ACN (Scharlau®), 0.1% FA (Scharlau®); and 5% ACN, 1% FA, respectively. Considering the protein concentration of 10 µg/50 µL on the digested samples, 10 µL were withdrawn to process 2 µg of protein and diluted 1:2 with washing solution. A sequence of 1. conditioning the tip with the elution solution; 2. washing three times; 3. aspirating the sample two times to allow the binding; 4. washing again three times; and 5. eluting to a new Eppendorf, was performed. The final samples were then placed for 1 hour under speed dry (RVC 2-18 CDplus, Christ®) for vacuum concentration, and afterwards stored at -20°C until further use.

50

2.6 High-resolution mass spectrometry analysis

The digested samples of Caco-2 cells treated with HNE in concentration range from 0 to 100 µM were then processed by high-resolution mass spectrometry coupled on-line with a nano-high performance liquid chromatography system. Tryptic peptides were analyzed using a DionexTM UltiMateTM 3000 RSLCnano system (Thermo Fisher Scientific®) (145) connected to the Thermo Scientific™ Orbitrap Fusion™ Tribrid™ mass spectrometer (Thermo Fisher Scientific®) (38) equipped with a nano-electrospray ion source, available at Fondazione Filarete (Milan, Italy). Peptide mixtures were pre-concentrated on a Nano-Trap column, 100 µm x 2 cm, packed with Thermo Scientific™ Acclaim PepMap100 C18, 5µM, 100 Å (Thermo Fisher Scientific®) and separated on an EASY-Spray column, 15 cm x 75 µm, packed with Thermo Scientific™ Acclaim PepMap RSLC C18, 3 µm, 100 Å (Thermo Fisher Scientific®). The temperature was set to 35 °C and the flow rate to 300 nL.min−1. Mobile phases were the following: 0.1% FA in water (solvent A); 0.1% FA in water/ACN with a 2/8 ratio (solvent B). Peptides were eluted from the column with the following gradient: 4% to 28% of B for 90 min, 28% to 40% of B in 10 min, and then to 95% within the following 6 min to rinse the column. The column was re-equilibrated for 20 min. Total run time was 130 min. One blank was run between triplicates to prevent sample carryover. MS spectra were collected over an m/z range of 375-1500 Da at 120000 resolutions, operating in the data- dependent mode, cycle time 3 s between master scans. HCD mode was performed with collision energy set at 35 eV. Three technical replicates were used for each sample, to guarantee the necessary validation of the spectrometric method, therefore each condition of HNE treatment had 6 reported matching spectrum results.

2.7 Data processing with bioinformatic tools

The Xcalibur (Thermo Fisher Scientific®) MS raw data was grouped together as the different biological conditions at stake, i.e. control and treated with an increasing concentration of HNE, to proceed with an untargeted label-free quantitative proteomic analysis using the software packages MaxQuant 1.6.2 (Max Planck Institute of Biochemistry) by the algorithm Andromeda, and Proteome Discoverer 2.2 (Thermo Fisher Scientific®) via the algorithm Sequest HT. The results obtained from MaxQuant were then processed using the software framework Perseus 1.6.1.3 (Max Planck Institute of Biochemistry), whereas the results from Proteome Discoverer were analyzed by its own interface, for data annotation and analysis. In order to perform the matching between experimental spectra data and theoretical in silico created information, the Homo Sapiens sequence database was downloaded from UniProt (ELIXIR Core Data Resource); trypsin was set as the proteolytic enzyme; cysteine carbamidomethylation (+57.021 Da), methionine oxidation (+15.995 Da) and N-terminal acetylation (+42.011) were set as variable/dynamic modifications. On the other hand, a search for other dynamic modifications was also conducted on Proteome Discoverer: HNE modified cysteine, lysine, arginine and histidine residues forming Michael Adducts (+156.116 Da), 2-pentylpyrrole (+120.094 Da), and Schiff Base (+138.104 Da).

51

Overall default settings for protein identification were used and the descriptors of Label-Free Quantification were applied within both software. The False Discovery Rate of protein identification was set to FDR=0.05 (medium-confidence). The implemented workflows are further described in Appendix A.

Lists of all the expressed proteins in each condition, with the respective algorithm score and other relevant analytical parameters, were prompted, grounded on which further qualitative, quantitative and statistical studies were carried out in attempt to elucidate if a differential expression had occurred during the treatment with HNE, seeking to understand which cellular pathways or metabolic functions could be involved in modulation by that molecule and so to characterize its possible effect on the intestinal barrier. Based on integrated information from functional annotation databases like STRING, Gene Ontology, KEGG Pathways and Reactome, selecting Homo sapiens as organism reference, the up- and down-regulated proteins against the control were characterized by their biological processes, cellular localization or molecular functions and protein-interaction networks were exposed. For this, the open source bioinformatic software Cytoscape 3.6.1 (Institute of Systems Biology) was used, with several result outputs being given via its internally installed plugins STRING, BiNGO and ClueGO, as well as other open source tools of enrichment available online.

52

RESULTS AND DISCUSSION

1. Caco-2 proteome modulated by HNE 1.1 Up- and down-regulation of proteins 1.1.1 Protein identification by Peptide Spectral Match with MaxQuant

The primary LFQ analysis of the Orbitrap Fusion™ Tribrid™ mass spectrometer raw data was performed with MaxQuant quantitative proteomics platform (146–148), using Andromeda algorithm. Andromeda is a search engine for matching experimental MS/MS spectra to theoretical peptide sequences in a database based on probabilistic scoring. It follows the approach of calculating the probability that the observed number of matches between the estimated and measured fragment masses could have occurred by chance. The final score involves an optimization for the number of highest intensity peaks that are considered per 100 Da m/z interval. Generally, the score is higher if the matches are among the more intense peaks. (57) The output consists of a raw list of scored peptide candidates per spectrum, together with the associated protein list. The latter contains the LFQ intensity values for all the proteins identified across the processed samples, besides other relevant parameters such as number of peptides and unique peptides, sequence coverage, molecular weight, final algorithm score, MS/MS count, and information provided by the integrated database for basic protein description. (57)

1.1.2 Statistical breakdown with Perseus

The protein list output file of the MS analysis was inputted in the Perseus framework (149) to shape the obtained results. After following filtering procedures, a total of 3354 protein groups were identified, built on which a statistical analysis was then carried to retrieve useful content regarding the outcome of HNE treatment in Caco-2 cell samples. Firstly, on LFQ approaches it is fundamental to have a very strong linear correlation between replicates (technical and biological) and spread amongst all samples. Because the statistic r is not robust, its value can be misleading if outliers are present, therefore the Pearson correlation coefficient (PCC) for instances is preferable, as it is more sensitive to the data distribution. PCC is desired to be near +1 for a good positive linear correlation between variables. The multi-scatter plots for the protein LFQ intensity comparison between replicates for all samples are shown in Appendix C, having selected the Pearson correlation. All the plots evidence PCC>0.97, giving a fairly strong linear correlation. The plots for the replicates (technical and biological) of the sample treated with 10 µM HNE are presented in figure 19, with one of them emphasized to showcase point distribution.

53

Figure 19 – Multi-scatter plot representation (with Pearson Correlation Coefficient) of LFQ intensity amongst replicates for the identified proteins on the Caco-2 sample treated with HNE in concentration 10 µM.

Continuing the statistical data analysis, one of the most important statistical tools in omics experiments, the Volcano plot, was prompted. This type of scatter plot is extremely useful to quickly identify meaningful changes in large datasets composed of replicate data points between two conditions. By plotting the negative log of a measure of statistical significance against the positive log of the magnitude of change (fold-change) between the two conditions, one can see highly significant data points appearing toward the top of the plot, and large changes appearing equidistant from the center and far to the left or right. Figure 20 presents the Volcano plot for the protein datasets on the Caco-2 samples treated with HNE in concentration 0.1 µM, 1 µM, 10 µM and 100 µM in comparison to the control sample, highlighting the points in the regions of higher fold-change alongside most significance, i.e. the outliers. In this case, the fold-change is the positve log2 of the ratio between LFQ intensities of treated and control samples, and the statistical measure is the negative log10 of the p-value. The matrixes of the outliers, for which the positive fold-change (difference) is associated with the up-regulated proteins and the negative one with the down-regulated proteins upon treatment with HNE, are shown extensively in Appendix B. To summarize the most interesting findings for discussion, tables 3 and 4 are presented next.

54

a. b.

c. d.

Figure 20 – Volcano plot representation of the protein datasets on the Caco-2 samples treated with HNE in concentration 0.1 µM, 1 µM, 10 µM and 100 µM compared to the control. a. 0.1 µM, b. 1 µM, c. 10 µM, d. 100 µM.

Plot: statistical significance (-log10 p-value) versus fold-change (Difference = log2 LFQ intensity Treated/Control). The up- regulated proteins are highlighted in green and the down-regulated in red, having imposed a fold-change range >0.5 and <0.5, respectively, for p-value<0.05.

Table 3 – Relevant outliers list for up-regulated proteins on the Caco-2 samples treated with HNE in concentration 0.1 µM, 1 µM, 10 µM and 100 µM.

For each protein, the difference, -log10 p-value, number of peptides and unique peptides, and sequence coverage are reported.

Accession Unique Sequence Protein up-regulated Gene name Difference - log (p-value) Peptides number 10 peptides coverage [%] 100 µM HNE Q71DI3 .2 HIST2H3A 2.652668635 1.841015717 9 1 58.1 P52895 Aldo-keto reductase family 1-member C2 AKR1C2 2.438052877 7.333678304 17 4 65.0 Q04828 Aldo-keto reductase family 1-member C1 AKR1C1 2.286650976 8.941107437 18 1 67.2 ADP/ATP translocase 2;ADP/ATP P05141 SLC25A5 1.985486984 3.894005921 11 4 35.6 translocase 2, N-terminally processed P0DMV9, Heat shock 70 kDa protein 1B;Heat shock HSPA1B, 1.757443746 12.14322701 46 27 64.0 P0DMV8 70 kDa protein 1A HSPA1A Glutamate--cysteine ligase regulatory P48507 GCLM 1.472600778 2.840540652 4 4 20.8 subunit Q4ZG72 ATP-dependent RNA helicase DDX18 DDX18 1.30824852 4.092307315 8 8 19.6 Q5TEC6 Histone H3 HIST2H3PS2 1.262562434 1.957199926 5 1 27.2 Hydroxyacyl-coenzyme A dehydrogenase, Q16836 HADH 1.247538249 1.215141734 6 6 28.8 mitochondrial Sodium/potassium-transporting ATPase P05023 ATP1A1 1.19863383 3.490769308 20 11 26.8 subunit alpha-1 H3F3B,H3F3A Histone H3;Histone H3.3;Histone P84243 HIST3H3 1.154080391 2.378280444 10 2 59.8 H3.1t;Histone H3.3C H3F3C Sodium/potassium-transporting ATPase P05026 ATP1B1 1.13394022 2.95518102 4 4 15.8 subunit beta-1

55

Table 3 (cont.) – Relevant outliers list for up-regulated proteins on the Caco-2 samples treated with HNE in concentration 0.1 µM, 1 µM, 10 µM and 100 µM. For each protein, the difference, -log10 p-value, number of peptides and unique peptides, and sequence coverage are reported.

Accession Unique Sequence Protein up-regulated Gene name Difference - log (p-value) Peptides number 10 peptides coverage [%] 100 µM HNE P10412 .4 HIST1H1E 1.117386818 2.197013578 11 3 26.9 HIST1H4H;HI P62805 Histone H4 1.101834615 2.673068048 13 13 62.1 ST1H4A Q16881 Thioredoxin reductase 1, cytoplasmic TXNRD1 1.00251929 7.630571263 13 12 46.1 Glutamate--cysteine ligase catalytic P48506 GCLC 0.879188855 5.017323095 17 17 32.2 subunit P16401 Histone H1.5 HIST1H1B 0.849722544 3.659302532 8 8 26.1 Q12931 Heat shock protein 75 kDa, mitochondrial TRAP1 0.643105189 3.562072968 18 18 29.8 P30837 Aldehyde dehydrogenase X, mitochondrial ALDH1B1 0.528113683 2.57515175 12 11 28.4 10 µM HNE Q9UHG3 Prenylcysteine oxidase 1 PCYOX1 0.805989265 4.393974376 9 9 24.6 1 µM HNE P02679 Fibrinogen gamma chain FGG 1.662530581 3.239816532 5 5 20.4 Q9UHG3 Prenylcysteine oxidase 1 PCYOX1 0.903085709 4.727033966 9 9 24.6 P37268 Squalene synthase FDFT1 0.902542114 2.863163499 4 4 14.2 Q4ZG72 ATP-dependent RNA helicase DDX18 DDX18 0.787450473 3.333942714 8 8 19.6 0.1 µM HNE Q4ZG72 ATP-dependent RNA helicase DDX18 DDX18 0.972569466 5.682290832 8 8 19.6 Q05086 Ubiquitin-protein ligase E3A UBE3A 0.54363807 4.200667914 5 5 10.2

At higher concentration of HNE (100 µM), the list of up-regulated proteins from the Caco-2 digested proteome includes aldo-keto reductases (AKR1C1, AKR1C2) and aldehyde dehydrogenase (ALDH1B1), which are enzymes known to be generally involved in the catabolism of aldo-electrophiles like HNE. Clearly represented are examples of enzymes regulated by the Keap1/Nrf2/ARE pathway, particularly the ones involved in the synthesis of glutathione (to which HNE is known to bound to be metabolized): glutamate-cysteine ligase catalytic and modifier subunits (GCLC, GCLM); and involved the elimination of ROS: thioredoxin reductase-1 (TXNRD1). All these enzymes are known to be part of the response to oxidative stress and xenobiotic stimulus. (108,120) Stress-induced proteins associated to the regulation of homeostasis and apoptotic processes, like heat shock 70 kDa protein 1 (HSPA1) and heat shock 75 kDa protein (TRAP1), are also manifested, thus correlating to the positive-activation of the Nrf2 signaling pathway that promotes an electrophilic counterattack.

For lower concentrations (0.1 µM-10 µM), it seems that the proteome reports a pro-atherogenic overlook, derived from pro-oxidant, pro-hyperlipidemic and pro-fibrotic actions associated, respectively, with the proteins prenylcysteine oxidase 1 (PCYOX1), squalene synthase (FDFT1) and fibrinogen gamma chain (FGG). It suggests that there is some accumulation of ROS and the possibility of them inducing systemic side effects, but no major stress response seems to be activated, perhaps due to an insufficient concentration of HNE to produce an effective antioxidant defense mechanism request. This is supported by the fact that ubiquitin-protein ligase E3A (UBE3A) is up-regulated at 0.1 µM, which indicates that the complex Cul3-Keap1 is still functioning for targeting Nrf2 to proteasome degradation, and so the transcription factor is not activated to promote the expression of the detoxifying enzymes that were evidenced at the highest concentration of HNE.

56

Table 4 – Relevant outliers list for down-regulated proteins on the Caco-2 samples treated with HNE in concentration 0.1 µM, 1 µM, 10 µM and 100 µM.

For each protein, the difference, -log10 p-value, number of peptides and unique peptides, and sequence coverage are reported.

Accession Unique Sequence Protein down-regulated Gene name Difference - log (p-value) Peptides number 10 peptides coverage [%] 100 µM HNE P62851 40S ribosomal protein S25 RPS25 -0.635217031 2.542026423 8 8 37.6 Q9BZL1 Ubiquitin-like protein 5 UBL5 -0.715422948 1.951687923 2 2 28.8 Q9Y2S6 Translation machinery-associated protein 7 TMA7 -0.716721217 2.370193597 4 4 40.6 P05161 Ubiquitin-like protein ISG15 ISG15 -0.743067106 2.497565446 5 5 46.1 O76003 Glutaredoxin-3 GLRX3 -0.791659355 1.838233361 12 12 34.9 P18077 60S ribosomal protein L35a RPL35A -0.791750272 2.308424346 7 7 36.4 P62979, Ubiquitin-40S ribosomal protein P62987, S27a;Ubiquitin;40S ribosomal protein RPS27A,UBB, -0.807513555 3.559757824 15 3 72.4 P0CG47, S27a;Polyubiquitin-B;Ubiquitin;Polyubiquitin- UBC,UBA52 P0CG48 C;Ubiquitin P62081 40S ribosomal protein S7 RPS7 -0.809478124 4.719759513 17 17 66.5 Eukaryotic translation initiation factor 3 O75821 EIF3G -0.887653033 2.506411337 13 13 45.6 subunit G Q71UI9, H2AFV .V;Histone H2A.Z;Histone H2A -0.974391301 2.867893193 6 4 53.9 P0C0S5 H2AFZ Eukaryotic translation initiation factor 5A; P63241 Eukaryotic translation initiation factor 5A- EIF5A, Q6IS14 1;Eukaryotic translation initiation factor 5A- EIF5AL1, -1.064674377 6.21485533 14 14 76.7 Q9GZV4 1-like;Eukaryotic translation initiation factor EIF5A2 5A-2 Q8N543 Prolyl 3-hydroxylase OGFOD1 -1.080378532 1.681852867 3 3 10.0 O15230 Laminin subunit alpha-5 LAMA5 -1.13339138 2.043215896 31 31 12.8 O75506 Heat shock factor-binding protein 1 HSBP1 -1.206366062 1.398235145 2 2 34.2 P15531 Nucleoside diphosphate kinase A NME1 -1.370045344 3.234450072 13 3 86.2 10 µM HNE Glycerol-3-phosphate P43304 dehydrogenase;Glycerol-3-phosphate GPD2 -0.735952377 3.198443013 12 12 21.5 dehydrogenase, mitochondrial Q9H9B4 Sideroflexin-1 SFXN1 -0.790905635 4.657627272 9 9 34.5 1 µM HNE Pyruvate dehydrogenase E1 component subunit alpha;Pyruvate dehydrogenase E1 P08559 PDHA1 -0.626162465 2.215686897 8 8 25.6 component subunit alpha, somatic form, mitochondrial Cytochrome c oxidase subunit 5A, P20674 COX5A -0.669151306 2.207590154 9 9 58 mitochondrial P00441 Superoxide dismutase [Cu-Zn] SOD1 -0.683326085 2.738603203 11 11 97.4 Cytochrome c oxidase subunit 5B, P10606 COX5B -0.716614087 2.126159862 6 6 30.2 mitochondrial Sodium/potassium-transporting ATPase P05023 ATP1A1 -0.795041084 2.914359097 20 11 26.8 subunit alpha-1 Q9H9B4 Sideroflexin-1 SFXN1 -0.870193799 4.61408463 9 9 34.5 0.1 µM HNE P62979, Ubiquitin-40S ribosomal protein P62987, S27a;Ubiquitin;40S ribosomal protein RPS27A,UBB, -0.774700483 3.969514981 15 3 72.4 P0CG47, S27a;Polyubiquitin-B;Ubiquitin;Polyubiquitin- UBC,UBA52 P0CG48 C;Ubiquitin

Analyzing the list of down-regulated proteins at 100 µM concentration of HNE, ubiquitin (RPS27A, UBB, UBC, UBA52) and ubiquitin-like proteins (UBL5, ISG15) pop up, which might suggest that ubiquitination-type processes function in a non-conventional way when the cell is reacting to oxidative damage. The heat shock factor-binding protein 1 (HSBP1) evidences a really high negative fold-change, consistent with its role as a negative regulator of heat shock response in mammalian cells on conditions of thermal and chemical stress, as its dissociation from heat shock factor 1 stimulates a transcriptionally-activated stress counteraction. (150)

57

Another interesting observation is the under-expression of glutaredoxin-3 (GLRX3), a redox enzyme shown to be critical in maintaining redox homeostasis and regulating cell survival pathways in cancer cells. Decreased levels of GLRX3 are associated with susceptibility to cellular oxidative stress, supporting the idea of having been induced an oxidative stress response. (151)

When the concentration of HNE is lower, near basal levels within the cell, the Caco-2 proteome shows that common under-expressed proteins include sideroflexin-1 (SFXN1), pyruvate dehydrogenase E1 (PDHA1), glycerol-3-phosphate dehydrogenase 2 (GPD2), superoxide dismutase [Cu-Zn] (SOD1) and cytochrome c oxidase subunit 5 (COX5A, COX5B). These findings are in relative accordance to the previous ones: at this stage, the cell is not requiring a great defense against oxidative stress, particularly by having a key enzyme of ROS elimination, the superoxide dismutase, down-regulated. Nonetheless, the presence of those reactive species is already being felt by the cell, since cytochrome c oxidase, which is the terminal oxidase of the mitochondrial electron transport chain and has direct influence on cellular level of ATP, is depleted, and evidence on literature suggests that its dysfunction is invariably associated with increased mitochondrial reactive oxygen species production and cellular toxicity. (152) Cells compensate for decreased mitochondrial ATP production through aerobic respiration by upregulating the less efficient glycolytic portion of this pathway. Key enzymes associated to respiration are precisely glycerol- 3-phosphate dehydrogenase and pyruvate dehydrogenase. GDP2 (mitochondrial) catalyzes the irreversible oxidation of glycerol-3-phosphate to dihydroxyacetone phosphate and concurrently transfers electrons to the electron transport chain. PDHA1 contributes to the process of pyruvate decarboxylation transforming pyruvate into acetyl-CoA, which may then be used in the mitochondrial citric acid cycle to carry out cellular respiration. The fact that these enzymes are down-regulated suggests a shift towards the fermentative pathway derived from the oxidative stress imbalance. (153,154) Furthermore, oxidative damage can be linked to iron excess and accumulation within the cell, and this correlates to the fact that sideroflexin-1, a mitochondrial tricarboxylate and iron carrier protein, essential for the regulation of iron homeostasis, is here under-expressed. (155,156)

It is necessary to point out, nevertheless, that the factor of incubation time with HNE was not taken into account in this study, since it was only done one assay for 24h treatment. This variable is of high importance to proteomic studies because the proteome is extremely sensitive to the exterior ambient and is constantly changing subsequently. An extended/shorter period of exposure to such a highly electrophilic species could give a more assertive indication on its dual role in cytotoxicity-cytoprotection.

Another relevant statistical procedure is the Principal Component Analysis (PCA), which by giving a lower- dimensional picture can be used to visualize variation and strong patterns in a dataset, since it is a data compression method, removing noise and redundancy. It is defined as an orthogonal linear transformation of possibly correlated variables into a smaller number of linearly uncorrelated variables called principal components. In this new coordinate system, the first principal component has the largest possible variance,

58 accounting for as much of the variability in the data (how spread the data is) as possible, and each succeeding component has the highest variance possible under the constraint of being orthogonal to the preceding ones. Figures 21 and 22 evidence PCA based on LFQ intensity of all the sample replicates and on the list of identified proteins, respectively, with the first to third principal components.

a. v

b. v

Figure 21 – Principal Component Analysis based on LFQ intensity of the identified proteins on the Caco-2 sample replicates: control and treated with HNE in concentration 0.1 µM, 1 µM, 10 µM and 100 µM. a. PC2 vs. PC1 plot, b. PC3 vs. PC1 plot. The technical replicates are grouped together in distinctive colors according to the respective concentration of HNE.

Analyzing the plot for LFQ intensity on figure 21, it is clear that two separate regions of data are shown by the first component with the highest variance (43.7%), corresponding to the biological replicates A or B of the processed samples: on the positive x-axis side are all samples A, and on the negative x-axis side all samples B. This can somewhat be explained by the fact that samples B were run in the mass spectrometer posteriorly to samples A, and so the LC-MS initial conditions were not met to be fully reciprocated, due to batch-to-batch variability. Nevertheless, it is relatively possible to visualize by region (mostly on samples Aside) that differences in protein expression were greater at higher concentration of HNE (100 µM), as compared to lower concentrations (0.1 µM to 10 µM) and untreated controls, whose groups are shown closer to one another and to the zero line. This confirms that the exposure to an electrophilic environment initially results in a soft response, manifesting small changes in protein expression, which then increase tremendously when the concentration of HNE is 10 times higher reaching a more toxic level for the cell.

59

Looking at the second and third principal components on the y-axis, respectively holding 28.5% and 12.4% of data variance, there is not a very clear pattern of distinction between groups, as for instances one depicting an obvious maximum variation from the rest. With PC2, the region of the samples B is quite compact, but the one treated with 100 µM HNE is a bit separated from the others, and for samples A, the groups of bigger variation are the ones associated with the concentrations 1 µM and 100 µM. With PC3, both regions evidence the bigger variation for the concentrations 0.1 µM and 100 µM. The particularities are difficult to explain, however the common dissimilarity of the sample treated with the highest concentration of HNE goes along what has been discussed so far, that such concentration leads to the most changes in the proteome in comparison to the more basal levels.

The technical replicates for each sample are, on most cases, visibly nearby one another, which validates good data reproducibility and can help finding potential clusters, indicating protein expression differences among different groups according to the concentration of HNE. In regard to this, however, the A samples seem to be more widespread between replicates. The overall dispersion can be explained by the presence of outliers in the dataset: both the variance and the covariance matrix are known to be sensitive to outliers, making PCA a non- robust method, which leads to principal components being distorted so as to fit the outlier well. (157) It is also noticeable that the plots show majorly a negative PC correlation, due to the negative covariance relation in the dataset.

Considering that the dataset at work presents a high level of redundancy, evidenced by figure 19, PCA is useful to reduce data dimensionality, as proven by figure 22, where one can find a central, compact group of proteins and then only some outliers.

a. b. v v

Figure 22 – Principal Component Analysis based on the identified proteins on the Caco-2 sample replicates: control and treated with HNE in concentration 0.1 µM, 1 µM, 10 µM and 100 µM. a. PC2 vs. PC1 plot, b. PC3 vs. PC1 plot. The outliers of the plots are highlighted.

Looking at those outliers, on the left-side of the plot are shown a cytoskeletal keratin (KRT18) and a histone (HIST1H2AH), and on the right-side two (HIST1H4H, HIST1H2BI) and a cytoplasmic actin (PS1TP5BP1). The separation of these particular proteins might be a suggestion of biomarkers to discriminate the increasing effect of oxidative stress on the cell, since in order to be PCA outliers it is inferable that their expression alters the most along the tested concentrations of HNE. Histones type H2A, H2B, and H4 are part of the core histones, being directly associated with the formation of the nucleosome block, therefore regulating gene transcription.

60

In general, genes that are active have less bound histone, while inactive genes are highly associated with histones during interphase, this is, the metabolic phase of the cell cycle in which the cell conducts ordinary functions to its growth. Hence, it is not surprising that upon an induced oxidative stress response that requires the expression of certain antioxidant genes and alters the priority of cellular processes, there are alterations on the histone expression pattern.

Adding to this analysis, clustering is an important technique for data mining in proteomics experiments; for instances, it allows to organize groups of genes with related expression patterns, that can contain functionally related proteins such as enzymes for a specific pathway, or genes that are co-regulated. The purpose of clustering is therefore grouping a set of objects in such a way that objects in the same cluster are more similar to each other than to those in other clusters. Hierarchical Clustering Analysis (HCA) was performed in a bottom-up approach, meaning each observation/object started in its own cluster and then the most similar pairs of clusters were merged until obtaining just one cluster containing all objects. In order to decide which clusters should be combined, a measure of dissimilarity between sets of observations is required. In this case, the Euclidean distance was first used as measure of distance between pairs of objects, and then an average-linkage criterion was followed as measure of distance between sets, taking that distance as the average distance between their objects. Afterwards, the agglomeration of clusters on each iteration is done based on the minimal distance between one another. Figure 23 represents the obtained dendrograms for the simultaneous hierarchical clustering of proteins (rows) and conditions (columns), with the latter being the LFQ intensity for the technical replicates of each sample, also giving the heat map. The ultimate goal of this analysis is to identify groups of proteins whose expression might indicate a higher or lower concentration of HNE, meaning a change in expression across the analyzed samples, while verifying if the technical replicates are clustered together in accordance to their respective concentration.

The conditions dendrogram reveals that a high threshold (red line) can be established to separate the conditions in two major clusters. Each one of these clusters comprises all the technical replicates of both control and treatment, from either sample type A or B, which corroborates the results given by PCA: a differentiation on the biological replicates. Further looking into this dendrogram, 12 smaller clusters can be grouped holding a lower threshold (green line), each one containing the technical replicates of their respective concentration of HNE, within sample type A or B, with exception of two errors in the B section for the clusters of control and 1 µM. It is also possible to notice that the least similar cluster is the one corresponding to the highest concentration of HNE; and that the replicates of control and 0.1 µM, and 1 µM and 10 µM, can be grouped into two clusters, and these are mergeable into one with a threshold (purple line) separating it from the 100 µM cluster.

61

samples A samples B

Figure 23 – Hierarchical Clustering Analysis for clustering the technical and biological replicates, and the identified proteins on the Caco-2 samples: control and treated with HNE in concentration 0.1 µM, 1 µM, 10 µM and 100 µM. Dendrograms and heat map obtained with the dissimilarity measures Euclidean distance and Average-Linkage. The column tree highlights in color 12 clusters and 3 threshold lines, and the row tree 10 clusters and 1 threshold line.

The results of the heat map don’t show a clear change in protein expression between the different conditions. This can be further confirmed by selecting as example two protein clusters (highlighted in pink and baby blue) and focus on the expression of those groups of proteins across the several conditions (figure 24), which does not vary significantly, meaning it is not possible to validate certain clusters of proteins as a pattern to identify up- or down-regulation in accordance to the concentration level of HNE in the samples.

Figure 24 – Expression pattern across the conditions (replicates) on the Caco-2 samples: control and treated with HNE in concentration 0.1 µM, 1 µM, 10 µM and 100 µM, for two selected protein clusters.

The pink protein cluster size is 578 proteins and the baby blue one is 22 proteins.

62

This does not mean, however, that there aren’t proteins whose expression specifically changes when increasing the concentration of HNE, as that was proven to be case by finding outliers both with Volcano plots and PCA. It means instead that the way the protein clusters were grouped in this clustering analysis does not necessarily put together those proteins, and corresponding to a certain concentration replicate, so in average the variation in expression is more or less flat, and peaks are hard to distinguish to be attributed to one of the previously identified outliers. In similar fashion to the two protein clusters represented in figure 24, no particular cluster evidenced outliers as with a major difference in expression.

Figure 24 also showcases that the protein clusters were grouped in such way that they only contain, for all the conditions, either over-expressed (green) or under-expressed (red) proteins. Looking again at figure 23, the threshold on protein clusters (blue line) separates it into two larger clusters and from the heat map it is visible that one contains all the repressed proteins. The other contains all the over-expressed ones with exception of a group of repressed proteins that is part of a smaller cluster (highlighted in gold), zoomed in the same figure. Nevertheless, several grey areas appear in the heat map, associated in particular with two clusters of proteins (highlighted in dark blue and orange) for the conditions related to samples B, which represent the missing values denoted as “NA” in the protein matrix of LFQ intensity.

1.1.3 Protein annotation and functional proteomic analysis

Making use of several bioinformatic plugins within Cytoscape software framework, the lists of identified up- and down-regulated proteins in the processed samples were put to generate networks of protein interaction and connections of different biological processes, molecular functions and cellular localization.

The first goal was to retrieve a STRING network to see how the different proteins connect to one another. STRING is a database of protein-protein interactions, accounting for both physical and functional associations. Figures 25-27 illustrate the networks obtained for the up- and down-regulated proteins, with the lists of the low concentration samples (0.1 µM – 10 µM) combined.

63

a. b.

Figure 25 – STRING network diagram with minimum confidence of 0.4 (medium) for the up- and down-regulated proteins on the Caco-2 sample treated with HNE in concentration 0.1 µM – 10 µM combined. a. up-regulated proteins, b. down-regulated proteins. Interesting groups are circled in red.

ANTIOXIDANT ENZYMES

HISTONES

Figure 26 – STRING network diagram with minimum confidence of 0.4 (medium) for the up-regulated proteins on the Caco-2 sample treated with HNE in concentration 100 µM. Interesting groups are circled in red.

64

Figure 27 – STRING network diagram with minimum confidence of 0.4 (medium) for the down-regulated proteins on the Caco-2 sample treated with HNE in concentration 100 µM. Interesting groups are circled in red.

UBIQUITINS

For the highest concentration of HNE (figures 26 and 27), it is possible to observe groups of proteins with high-confidence interactions: on the over-expressed side, histones (HIST1H4A, HIST1H4H, HIST3H3, HIST1H1E, HIST1H1B, H3F3B, H3F3A) and the antioxidant enzymes mentioned before (AKR1C1, AKR1C2, GCLC, GCLM, ALDH1B1, TXNRD1); on the under-expressed side, ubiquitins (UBB, UBC, UBA52, RPS27A). For the lower concentrations of HNE (figure 25), the up-regulated proteins appear mostly without interactions and no relevant groups strike out, contrary to the down-regulated proteins, that show some histones and ubiquitins in close interaction.

Nevertheless, in all cases, some of the identified proteins are not involved in protein-protein interactions, which correlates to the heterogeneity of the Caco-2 digested proteome since the totality of proteins was considered for mass spectrometry without a prior enrichment step, and many cellular targets are associated. Another interesting piece of information is the fact that the networks present a number of interactions higher than expected. This means that the proteins of these datasets show more interactions among themselves than what would be expected for a random protein set of similar size drawn from the reference Homo sapiens genome. Such an increase indicates that the proteins are at least partially biologically connected as groups important to certain cellular contexts.

The produced STRING networks were further enriched with functional information from Gene Ontology and KEGG Pathways, in order to highlight and explore different pathway clusters. Given a set of genes that are up- or down-regulated under certain conditions, an enrichment analysis finds which pathways are over-represented using annotations for that gene set. Reporting the functional enrichment tool of STRING, table 3 registers which

65

GO-terms and KEGG pathways were significantly enriched for the list of up-regulated proteins on the sample treated with 100 µM HNE.

Table 5 – GO-terms and KEGG pathways enrichment analysis for the list of up-regulated proteins on the Caco-2 sample treated with HNE in concentration 100 µM. Hypergeometric statistical test with Benjamini and Hochberg False Discovery Rate multiple test correction, with a cut-off value < 0.05.

UP-REGULATED 100 µM

count in gene set FDR p-value

Biological Process (GO)

DNA methylation on cytosine 4 0.0231

Molecular Function (GO)

Binding 56 0.0138 Protein binding 35 0.0029 Enzyme binding 20 3.8E-5 Poly(A) RNA binding 20 1.3E-4 Macromolecular complex binding 14 0.0177 Nucleoside-triphosphatase activity 11 0.025 Protein complex binding 9 0.0491 Structure-specific DNA binding 8 0.00216 DNA binding 5 0.00252 Nucleossomal DNA binding 3 0.0029 Atpase activity coupled to transmembrane movement of ions 3 0.0491 Glutamate-cysteine ligase activity 2 0.00796

Cellular Component (GO)

Organelle part 50 6.4E-5 Cytoplasmic part 41 0.023 Extracellular exosome 39 5.45E-12 Organelle lumen 38 1.15E-6 Macromolecular complex 32 0.00418 Protein complex 29 0.00444 Intracellular non-membrane-bounded organelle 28 0.00232 Nucleoplasm 22 0.0206 Organelle membrane 22 0.0298 Cytoplasmic vesicle 13 0.0111 Nucleolus 12 0.00444 Catalytic complex 11 0.0234 Nucleosome 8 2.96E-7 Mitochondrial matrix 8 0.00444 Chromatin 8 0.00686

KEGG Pathways

Metabolic pathways 12 0.0346 Alcoholism 6 0.00499 Protein processing in the endoplasmic reticulum 6 0.00641 Rap1 signaling pathway 5 0.0384 Ras signaling pathway 5 0.0299 MAPK signaling pathway 5 0.034 VEGF signaling pathway 4 0.00719 Pancreatic cancer 4 0.0308 Colorectal cancer 3 0.00719

66

The main biological process evidenced is DNA methylation, accounting the enrichment for the group of histones mentioned before. This suggests that the presence of HNE in high concentrations might promote an increase in histone expression. As mentioned before, histones are critical regulators of transcriptional processing through their recruitment of transcription factors via post-translational modifications and their restructuring of chromatin to allow accessibility to DNA. (105,158) Therefore, it is reasonable to consider that in response to oxidative stress conditions, to activate the transcription of detoxifying enzymes, histones play an important role, which can at the same time be compromised since they provide a fertile residue-source for adduction by exogenous and endogenous electrophiles. It has been demonstrated that some products of lipid peroxidation, like 4-oxo-2-nonenal and acrolein, bind to histones due to their cysteine-deficient, lysine-rich content, generating adducts that interfere with nucleosome assembly and disrupt chromosome replication. (158) On the other hand, HNE has been found to preferably adduct cysteine-containing regulatory chromatin proteins, with a relative aversion to histones. (105) Nonetheless, there is evidence altogether in the presented up-regulated proteins upon HNE treatment hinting that this compound can indirectly endorse cellular antioxidant actions.

For the molecular functions, the majority report to usual binding modes, not particularly derived from the presence of HNE. However, glutamate-cysteine ligase activity stands out, confirming what had been proposed by the previous Volcano plot general analysis: the activation of detoxifying enzymes by meanings of glutathione synthesis linked to the negative feedback effect of an HNE increase in the cell, since glutathione is a major electrophile scavenger.

Looking at the focal KEGG pathways identified, metabolic pathways are unsurprisingly the most enriched given that cell life and sustainability depend on it, especially under stress conditions that require for a re-balance of the enzymatic chemical transformations happening in the cell. Next, alcoholism, i.e. the dependence on ethanol exposure, is referenced as it is known to be interconnected with oxidative stress; an increased consumption of alcohol leads to free radical production and lipid peroxidation, thus inducing an imbalance between pro-oxidant and antioxidant mechanisms, elevating the former. (159) Because the liver is the main ethanol-detoxifying organ, ethanol-induced alterations have been mostly studied in hepatic tissue, and alcoholic liver disease has proven to be a good model to study chronic oxidative stress and the role of lipid peroxidation and protein adduction in disease. (160) Alongside other lipid peroxidation products, e.g. malondialdehyde, HNE is able to mediate the modulation of tissue and cellular events responsible for the progression of liver fibrosis, which recurrently will lower the threshold level for alcohol metabolism. (161)

To have a different view on enrichment based off the list of up-regulated proteins, other open source tools available online were used, namely GO Enrichment through PANTHER Classification System (http://pantherdb.org/tools/compareToRefList.jsp) and GOrilla (http://cbl-gorilla.cs.technion.ac.il/). Overall, they offer the same type of information regarding the classification in GO terms through different visualization and organization modes.

67

Figures 28 and 29 display, respectively, PANTHER-GO-Slim and GOrilla enrichment results for the biological process domain over the list of up-regulated proteins on the sample treated with 100 µM HNE.

Figure 28 – PANTHER over-representation test with PANTHER GO-Slim biological process annotation for the list of up- regulated proteins on the Caco-2 sample treated with HNE in concentration 100 µM. Fisher's Exact statistical test with Benjamini and Hochberg False Discovery Rate multiple test correction, with a cut-off value < 0.05.

68

Figure 29 – GOrilla over-representation test with GO biological process annotation for the list of up-regulated proteins on the Caco-2 sample treated with HNE in concentration 100 µM. Hypergeometric statistical test with Benjamini and Hochberg with False Discovery Rate multiple test correction, with a cut-off value < 0.05.

N - is the total number of genes B - is the total number of genes associated with a specific GO term n - is the number of genes in the top of the user's input list or in the target set when appropriate b - is the number of genes in the intersection Enrichment = (b/n) / (B/N)

69

These tools have provided a much richer content regarding the biological processes linked to the up- regulated proteins on the condition of over-exposure to HNE. 100 µM is an extremely high concentration for an electrophile species that is usually present in the cell up to 1 µM, and so, as expected, a notorious stress response has been activated. This had already been proposed with the previous analysis, nonetheless, now accentuated with the identified pathways by GOrilla, which can be broadly grouped as two major inter-connected clusters: nucleosome assembly and response to a chemical stimulus. The automatic arrangement of the protein/genes associated with each GO term makes it easier to assert their connections to the cellular processes they are involved in. Regarding the enriched groups for response to stimulus, they count the heat shock proteins, aldo-keto reductases and glutamate-cysteine ligase subunits, as pointed out before, but new, not so obvious information comes with the association of an ATPase for sodium/potassium transport (ATP1A1, ATP1B1) (which was down- regulated at lower concentration), an hydroxyacyl-coa dehydrogenase (HADH), an ADP/ATP translocase carrier protein (SLC25A5) and a DEAD-box protein helicase (DDX18). A search of these proteins on UniProt database (https://www.uniprot.org/) clarifies that they can be implicated in several other cellular processes, thus are not so bluntly related to specific cellular responses to the presence of HNE as an aggressor, yet can be indirectly important due to their roles in metabolism and maintaining cell homeostasis. In respect to nucleosome assembly, also evidenced in the PANTHER enrichment with the pathways of chromatin organization, the intervenient proteins are clearly the various histones already highlighted with STRING, once again reinforcing the fact that this is an important cellular process triggered in the follow-up of the electrophilic attack by HNE. Besides the eminence of histone adduction itself, it is moreover owing to the potential damaging effect of this compound when forming DNA-adducts, compromising cell function and viability. Not only HNE but also other electrophiles, being endogenous lipid peroxidation products or coming from exogenous sources, have been reported to induce a significant decrease of the overall acetylation state of core histones, thus altering nucleosome assembly. (158,162) Regardless of the mechanism of action, data suggests that these compounds may alter the underlying epigenetic state of the cell, potentially propagating stress responses such as inflammation. As a fact, classical examples of mechanisms that introduce epigenetic changes are DNA methylation and histone modification, each altering how genes are expressed without altering the underlying DNA sequence. (105)

Accessible within Cytoscape software are BiNGO and ClueGo plugins, presenting as main advantage over other GO tools the fact that can be used directly and interactively on molecular interaction graphs. Figures 30 – 32 give the ClueGO groups network based on the GO biological process for, respectively, the up-, down-, and combined up- and down-regulated proteins on the sample treated with 100 µM HNE. Figure 33 presents the BiNGO network of GO biological process for the down-regulated proteins.

70

VEGF SIGNALLING β-CATENIN:TCF COMPLEX

NUCLEOSOME

MICROTUBULE POLYMERIZATION

LUPUS

Figure 30 – ClueGO over-representation test with GO annotation biological process for the list of up-regulated proteins on the Caco-2 sample treated with HNE in concentration 100 µM. Two-sided hypergeometric statistical test with Bonferroni p-value multiple test correction, with a cut-off value < 0.05, showing medium network specificity.

TRANSLATION

REGULATION

Figure 31 – ClueGO over-representation test with GO annotation biological process for the list of down-regulated proteins on the Caco-2 sample treated with HNE in concentration 100 µM. Two-sided hypergeometric statistical test with Bonferroni p-value multiple test correction, with a cut-off value < 0.05, showing medium network specificity.

71

For the up-regulated proteins (figure 30), ClueGo highlighted several intertwined biological processes relating the nucleosome with systemic lupus erythematosus disease and β-catenin:TCF activating complex. It is not surprising for lupus to be referenced since this autoimmune inflammatory disease can be linked to oxidative stress due to its contribution to immune system dysregulation, abnormal activation and processing of cell-death signals and autoantibody production that characterize the pathology. Oxidative modification of self- antigens triggers autoimmunity, and the degree of modification of serum proteins has shown correlation with disease activity and organ damage in lupus. Excessive production and ineffective clearance of ROS, and depletion of reduced glutathione whose action regulates mitochondrial hyperpolarization to balance out oxidative stress, have been found in peripheral blood lymphocytes from patients with this disease, appearing to be detrimental in its development. (163,164) As commented before, glutathione synthesis was found in this analysis to be up- regulated upon the treatment with HNE in high concentration. On the other hand, ROS are also known to modulate signaling by the Wnt/β-catenin pathway, which is fundamental to the control of cell proliferation, development and fate. (165) Wnt proteins activate different intracellular signal transduction pathways, and a central component is the effector β-catenin, which interacts with TCF transcription factor family to activate the expression of Wnt target genes involved in benefic cell effects. (166) However, Wnt signaling is upregulated in inflammatory processes and oxidative stress and is highly associated with numerous cancers, in particular colorectal. (165) Adding to this, increased ROS production diverts the limited pool of β-catenin from TCF- to FOXO-mediated transcription of genes participating in antioxidant response, which shifts cellular processes to be altered by the adverse effects of oxidative stress. Thus, β-Catenin has been viewed as a key molecule in defense against oxidative stress (167), correlating to the noticed up- regulation of its associated pathways as part of the Caco-2 response against the HNE stimulus. Another pathway modulated by oxidative stress here evidenced is the VEGF signaling pathway, identified before as an enriched KEGG pathway (table 3), which is associated to angiogenesis, i.e. the process of sprouting new blood vessels from preexisting vasculature. VEGF signaling is required for normal vascular development and homeostasis, but it is also involved in tumor progression by promoting growth of tumor vasculature. (168) ROS stimulate the induction of VEGF expression in various cell types (endothelial cells, smooth muscle cells, macrophages), that controls cell migration. (169) Several studies have identified new mechanisms of ROS- activated angiogenesis that operate in a VEGF-independent manner, one of these mechanisms mediating proangiogenic activities of ROS reporting to lipid peroxidation products like HNE. (170)

The down-regulated proteins network (figure 31) majorly evidences processes related to the regulation of translational events and ribosome/ribonucleoprotein complex binding, correlating to the under-expression of ribosomal proteins (RPS25, RPLP2, RPL35A, RPS7), translation initiation factors (EIF5A, EIF5AL1, EIF5A2, EIF3G), termination regulators (OGFOD1) and a nucleoside diphosphate kinase (NME1), listed on table 4. While a possible causal link between oxidative stress, translation activity reduction or inhibition, and the onset of oxidative stress- related diseases has been hypothesized, the molecular consequences of mRNA and rRNA damage on ribosome functions and protein synthesis have yet to be uncovered in detail. (171)

72

TRANSLATION REGULATION

TRANSLATION REGULATION

Figure 32 – BiNGO over-representation test with GO annotation for biological process for the list of down-regulated proteins on the Caco-2 sample treated with HNE in concentration 100 µM. Hypergeometric statistical test with Benjamini and Hochberg False Discovery Rate multiple test correction, with a cut-off value < 0.01. Interesting groups are circled in red.

The BiNGO network obtained (figure 32) is quite intricate, making it difficult to discern essential cellular pathways associated to the proteome down-regulation when Caco-2 cells were treated with HNE at high concentration. However, one that is noted at various end points of this hierarchical display is the regulation of translational elongation and termination mentioned before.

Therefore, to further explore the list of down-regulated proteins after exposure to HNE, the open source tool of Reactome Pathways v.65 available online (https://reactome.org/PathwayBrowser/) was used, prompting the network presented in figure 33. The results show 531 retrieved pathways based on Uniprot database entries and 17 pathways from Ensembl, with the latter being enlisted on figure 34.

73

Figure 33 – Reactome over-representation test for the list of down-regulated proteins on the Caco-2 sample treated with HNE in concentration 100 µM. Hypergeometric statistical test with Benjamini and Hochberg False Discovery Rate multiple test correction, with a cut-off value < 0.05.

Figure 34 – Top-pathways of the Reactome over-representation test for the list of down-regulated proteins on the sample treated with HNE in concentration 100 µM, according to Ensembl resource. Hypergeometric statistical test with Benjamini and Hochberg False Discovery Rate multiple test correction, with a cut-off value < 0.05.

74

Figure 33 shows a genome-wide hierarchically overview of the results for the pathway analysis. Each step away from the bursts center represents the next level lower in the pathway hierarchy. The color code denotes over-representation of that pathway in the inputted dataset: yellow signifies pathways that are significantly over- represented, and it is possible to conclude that they are majorly associated with the cellular mechanisms of response to external stimulus, signal transduction, protein metabolism, DNA repair and replication, and immune response. This was fairly expected considering the electrophilic attack by a noxious agent and the subsequent cell actions to maintain homeostasis and so propel effective defensive responses. A closer look on the top identified pathways on figure 33 discriminates more specific stress-related processes, namely cellular responses to hypoxia and heat, and detoxification of reactive oxygen species, being particularly annotated the activation and regulation of heat shock factor 1 (HSF1). Overall linked to these processes is the under-expression of ubiquitins (UBB, UBC, UBA52, RPS27A – human genes coding for ubiquitin) and heat shock factor-binding proteins (HSBP1), in completion to what had been suggested by STRING connections. Heat shock factor 1 is a transcription factor that activates gene expression in response to external aggressions like heat shock and oxidative stress. During this response, HSF1 undergoes conformational transition from an inert non-DNA-binding monomer to an active functional trimeric structure that will bind to a DNA promotor called the heat shock element (HSE). Fighting the activation of HSF1 is its factor binding protein HSBP1, which is nuclear-localized and interacts with the active trimeric state of HSF1 to negatively regulate HSF1-DNA binding activity. (150,172,173) Therefore, the evidenced under-expression of HSBP1 upon treatment with a high concentration of HNE sustains the transactivational activity of HSF1, interacting with the promotor element in order to accomplish an expression-induced cell response to the applied stress. On the other hand, ubiquitins are key participants in protein regulation and degradation pathways: the ubiquitination of a substrate protein (by covalent tagging of ubiquitins mainly to lysine or cysteine residues) can mark them for degradation via the proteasome, alter their cellular location, affect their activity and promote or prevent protein interactions. (174) This process is generally augmented in heat shock and other stress conditions due to the likelihood of generating damaged proteins, which can compromise cellular and genomic integrity, and therefore need to be eliminated. As so, the Ubiquitin Proteosome System (UPR) plays a crucial role in counteracting the effects of stressors. However, the case of oxidative stress, alongside other examples such as iron limitation, denotes an opposing particularity, because the stress sensors are ubiquitin ligase complexes bound to regulators of response elements, whose ubiquitin activity is strongly diminished upon redox-disrupting stimuli. This way, it occurs the stabilization, instead of degradation, of the correlated transcription-activating factors, thus prompting a cellular response through the expression of detoxifying enzymes. (175) Under this conception, there is a clear connection between the observed down-regulation of ubiquitins in the presence of a high concentration of HNE to the previous observation of some of those detoxifying enzymes being up- regulated. Altogether, there is strong evidence pointing to an HNE-induced activation of an antioxidant signaling pathway, most likely the Cul3 ubiquitin ligase-Keap1/Nrf2 pathway explained in the introduction section.

75

Also highlighted in figures 33 and 34 are immune-related processes, such as signaling by cytokines (e.g. interferons), being the corresponding under-expressed protein an ubiquitin-like protein (ISG15). Interferon (IFN) signaling pathway plays a central role in initiating immune responses, especially antiviral and antitumor effects. Among the proteins induced by IFN-α/β (type I) is the ubiquitin-like ISG15, leading to its conjugation to target proteins in the control of cellular responses to DNA damage conditions and viral activity. (176–178) Like in ubiquitination, ISGylation involves an enzymatic cascade with E1 activating enzyme, E2 conjugating enzyme and E3 ligase, which catalyze the conjugation of ISG15 to a lysine residue on the target protein. (178) However, it seems that there is not a common fate for ISGylated proteins, in comparison to the degradation of the majority of ubiquitinated proteins, but it varies according to the particular function of the individual targeted proteins. (179) Besides several antiviral regulatory proteins, proteins that function in stress responses had been identified as ISG15 targets, including key enzymes involved in oxidative stress such as thioredoxin reductase-1 and glutathione S-transferase, and several heat-shock proteins, as reported in earlier proteomic works of Zhao et. al (2005). (176) Considering that these particular proteins have been found to be up-regulated upon treatment with HNE in high concentration, and reversely ISG15 down-regulated alongside ubiquitins, it powers the suggestion that their intracellular level under normal cell conditions is kept low by E3 ligase-mediated direct ISGylation and indirect ubiquitination of Nrf2 for proteasome degradation, yet under oxidative stress conditions they are keen to detoxification therefore the deconjugation of ubiquitin and ubiquitin- like proteins is impelled. For deISGylation, UBP43/USP18 was the first ISG15-specific protease reported (180), so accompanying the previous evidence for ISG15 under-expression in the presence of HNE, it could be expected for this protein to be up-regulated, however that was not particularly verified.

1.2 Non-enzymatic post-translational modifications 1.2.1 Protein identification by Peptide Spectral Match with Proteome Discoverer

A second LFQ analysis of the Orbitrap Fusion™ Tribrid™ mass spectrometer raw data was performed with Proteome Discoverer quantitative proteomics platform (181,182), using Sequest HT algorithm. Sequest HT is another search engine for matching experimental MS/MS spectra to theoretical peptide sequences in a database but based on automated cross-correlation. It works by retrieving the mass of the intact peptides from the observed spectra and using it to select a set of candidate peptides sequences fitting that mass range; then for each one, the algorithm projects a theoretical tandem mass spectrum that is put to comparison with the measured fragmentation pattern. The best hit will have the highest value of Xcorr, as this final correlation scoring is used to produce the ultimate ranking of candidate peptides in the search. XCorr values are usually higher for well-matched, large peptides, and lower for smaller peptides. (35,183) Similar to other software packages, but with the advantage of being more user-friendly, the output consists of a list of identified proteins, each one holding its several scored peptide groups that were matched to the experimental spectrum and for which there is extensive information regarding modifications of residues, m/z ratio, theoretical mass, retention time and attributed XCorr value, among other parameters related to peptide

76

fragmentation. The proteins enlisted refer the number of peptides and unique peptides, sequence coverage, molecular weight, final algorithm score, etc. and evidence a comprehensive annotation provided by the software for protein characterization, including linked biological processes and molecular functions. (182)

1.2.2 Search for HNE-adducts

Non-enzymatic post-translational modifications of proteins occur when a nucleophilic or redox-sensitive amino acid side chain encounters a reactive metabolite. Often, the biological function of these modifications is limited by their irreversibility, and so they can be considered indicators of stress and disease. Nevertheless, certain non-enzymatic post-translational modifications can be reversed, which provides an additional layer of regulation and renders these modifications suitable for controlling several cellular processes ranging from signaling to metabolism. (184) Despite other modifications having been identified, the focus of this PTM search was to collect proteins that were modified in a valid manner by HNE, mainly giving to the formation of Michael adducts with lysine, arginine, histidine and cysteine residues, as they are susceptible to the electrophilic attack. The lists of HNE modified proteins in the Caco-2 proteome upon treatment with HNE are presented in Appendix D, showing for the samples treated with 0.1 µM and 100 µM, and for the control. It stands out that several modifications occur independently of the concentration of HNE, being reported in the control sample as well. Table 6 summarizes the information associated with the modified peptide sequences of highlighted proteins, and some of the correspondent MS/MS fragmentation spectra are shown in figure 35.

Table 6 – Relevant HNE-adducted peptides in the Caco-2 samples treated with HNE in concentration 0.1 µM and 100 µM. For each protein, the sequence of the peptide with the amino acid (aa) modified, monoisotopic m/z ratio, charge, retention time (RT) and XCorr value are reported. The symbol # indicates the modification site. All the modifications account for the formation of Michael Adducts except the one signed with *, describing a 2-pentylpyrrole modification.

Accession Gene aa Protein modified Annotated Sequence m/z (Da) Charge RT (min) XCorr number name mod 100 µM [K].VGSAADIPINISETDLSLLTATVVPPSG P21333 Filamin-A FLNA [R29] 980.52556 +4 115.7036 5.86 r#EEPcLLK.[R] P08238 Heat shock protein HSP 90-beta HSP90AB1 [R19] [R].VFIMDScDELIPEYLNFIr#.[G] 844.08645 +3 116.8056 3.69

Q15149 Plectin PLEC [R19] [K].EQELQQTLQQEQSVLDQLr#GEAEA 789.15291 +4 98.7809 3.92 AR.[R] P14618 Pyruvate kinase PKM PKM [K22] [R].LAPITSDPTEATAVGAVEASFk#.[C] 777.74454 +3 90.8321 3.40 P05455 Lupus La protein SSB [K4] [R].SPSk#PLPEVTDEYKNDVK.[N] 551.29003 +4 45.5721 4.94 0.1 µM P35579 Myosin-9 MYH9 [R6] [R].QAQQEr#DELADEIANSSGK.[G] 562.02461 +4 47.5763 4.01 O43707 Alpha-actinin-4 ACTN4 [H13] [K].IcDQWDALGSLTh#SR.[R] 479.48633 +4 68.8971 3.43 bifunctional purine biosynthesis P31939 ATIC [K23] [R].AEISNAIDQYVTGTIGEDEDLI .[W] 884.11183 +3 107.9773 3.00 protein purH k# glyceraldehyde-3-phosphate [R].VIISAPSADAPMFVMGVNHEKYDNSL P04406 GAPDH [K27] 618.71701 +5 79.9858 4.96 dehydrogenase k#.[I] [R]. ELEQVcNPIISGLYQGAGGPGPGG P0DMV8 heat shock 70 kDa protein 1A HSPA1A [K1]E k# 835.67756 +3 87.7891 5.58 FGAQGPK.[G] [K].EGITGPPADSS PIGPDDAIDALSSD P20810 Calpastatin * CAST [K12] k# 931.4417 +4 89.9919 3.95 FTcGSPTAAGK.[K]

77

a.

1 36 [K]. VGSAADIPINISETDLSLLTATVVPPSGr#EEPCLLK. [R] (R29-HNE) Filamin-A

b.

1 18 [R]. SPSk#PLPEVTDEYKNDVK. [N] (K4-HNE) K4-HNE Lupus La protein

Figure 35 – MSn spectra of HNE-adducted peptides in the Caco-2 samples treated with HNE. a. [K].1VGSAADIPINISETDLSLLTATVVPPSGr#EEPCLLK36.[R] (R29-HNE), on Filamin-A. b. [R].1SPSk#PLPEVTDEYKNDVK.18[N] (K4-HNE), on Lupus La protein, pinpointing the modification peak by comparing to the correspondent fragment matches report. The tandem MS spectra were recorded in HCD mode, with a mass range 150-1600 m/z, and are characterized by the diagnostic y and b fragment ion series.

It is important to refer that for some of the modifications identified by the software it was not possible to visualize the correspondent peak on the MS/MS spectrum, which reports uncertainty to being a valid, real modification. Another note comes with the fact that lys-modifications at the C-terminal residue might indicate a false positive: due to a change in charge of the lateral chain caused by HNE adduction, the tryptic enzyme usually fails to correctly recognize the modified amino acid and a missed cleavage occurs.

It has been shown in various cell types that the quantitative share of HNE binding to proteins and modifying them is usually low with about 1–8% of total HNE consumption, as measured by the addition of free HNE. The

78 greater part is associated with enzymatic detoxification pathways of HNE. Nonetheless, this share can change under real stress conditions as the cellular metabolic response might be able to metabolize the amount of formed HNE quicker. (112,185) In fact, not many modifications were found in this study, and it was evidenced a common note regarding HNE targets from other MS analyses, with modified proteins being those of the proteostasis system, as seen with heat shock protein 70 (HSP70) and heat shock protein 90 (HSP90). (112) HNE-modified proteins have been shown to be degraded in vitro by the 20S proteasome, the known proteasomal form for degrading oxidized proteins, but there is also reporting of the ubiquitination pathway for degradation, e.g. with glyceraldehyde-3-phosphate dehydrogenase, which was here identified as a modified protein at low concentration of HNE. This proteolytic susceptibility often varies depending on the concentration of HNE used and the stage of HNE-protein modification. With the treatment here performed with Caco-2 cells there was some indication of proteasomal inhibition by HNE at 100 µM, commented before with the down- regulation of ubiquitins, which is generally only achieved at high concentrations, whereas when mild concentrations up to 10 µM are used protein degradation is higher. This diminishment in proteasomal activity might be attributed to post-translational modifications of the system itself, as noted. (186–189)

Regarding modifications on cysteine residues (data not shown), they were found on peptides of several tubulin chains (TUBA8, TUBA1A, TUBA4A) and on a transketolase (TKT), on the sample treated in concentration 0.1 µM. Tubulin chains also evidenced lys- and his-modifications by HNE (appendix D), overall correlating to the modulation of microtubule polymerization by HNE as pointed by the ClueGo analysis (figure 30). Data has suggested a mechanistic basis for the impairment of tubulin function by 4-HNE leading to inhibited polymerization, therefore prompting a positive regulation response by the cell. (190) On the other hand, a modification in transketolase presents itself very interesting. In a recent study, the expression of this enzyme, that alongside glucose-6-phosphate dehydrogenase connects the pentose phosphate pathway to glycolysis, was demonstrated to be controlled by Nrf2 detoxification pathway activation, to regulate the production of NADPH that maintains antioxidant glutathione in its reduced form. A TKT knockdown resulted in a decrease of NADPH and an increase of ROS. (191) TKT is therefore involved in counteracting oxidative stress, particularly in cancer cells by playing an important role in redox homeostasis in cancer development. Along these lines, targeting enzymes for antioxidant production represents a direction for cancer treatment. (191)

79

CONCLUSIONS AND FUTURE PERSPECTIVES

The work herein presented offers undoubtful comprehension that a mass spectrometry-based global proteomic approach can play a very important role in preliminary research experiments of drug discovery. The label-free untargeted method followed using the high-resolution OrbitrapTM FusionTM TribridTM mass spectrometer was capable of describing a complex biological matrix, serving full coverage of the Caco-2 digested proteome after treatment with HNE; not only all present proteins were identified, 3354 in total, but also non- enzymatic post-translational modifications were found, namely related to the formation of HNE-adducts. With label-free techniques, although the samples from drug-treated and untreated cells cannot be combined to perform multiple analysis in a single run, as multiplexing requires labelling, an accurate semi-quantitative study can still be attained. Whilst guaranteeing statistical significance of the results, the main goal achieved was the identification of up- and down-regulated proteins in comparison to the control experiment, which renders great utility in pre-clinical applications. Based on these outliers, a functional proteomic analysis was performed, seeking to understand which cellular pathways and metabolic functions could be related to modulation by HNE considering its involvement in oxidative stress and the antioxidant response. Taking advantage of several bioinformatic tools available, differentially expressed pathways with a significant p-value were discussed, focusing on response to stimulus and stress, detoxification of reactive oxygen species, oxidation metabolism, protein regulation and degradation, nucleosome assembly, inflammation and immune response. Several identified proteins revealed high significance in these pathways, which gives indication of potential drug targets and biomarkers of oxidative stress, and so prompts future verification and validation via targeted approaches, as with immunofluorescence and Western Blot analysis. A targeted mass spectrometry method with predetermined focus on detecting one or several selected proteins is also a feasible option, banking on its highly quantitative performance character. Therefore, even if Label-Free Quantification does not allow to unambiguously differentiate nonspecifically bounded proteins from specific targets, it provides a list of potential drug targets for further analysis, underlining the contribution of proteomic workflows to the pharmaceutical field by generating assertive groundwork that subsequently boosts medicinal chemistry studies. In fact, before working on lead compound optimization in all its involvement, with computational studies to assess ligand-target interactions and pharmacokinetic studies to evaluate drug-like properties afterwards, finding suitable targets for drug design on the basis of biological plausibility is the fundamental primary stage of drug development. Moreover, and looking further on the developmental route, MS technology for protein biomarker quantification is becoming robust enough to allow the transition of Chemoproteomics into clinical applications as well, by being possible to perform very sensitive, reliable analysis of large-size sets (from several hundred to a few thousand) and of certain sample types, for instances serum or peripheral blood cells. These developments bring the opportunity of adapting quantitative workflows to apply predictive medicine, by evaluating the magnitude of changes to the functional proteome as they can be linked to wellness-to-disease transitions; to better understand drug efficacy and selectivity; and to study disease progression, since broad profiling at various stages of disease could identify diagnostic markers. (1)

80

REFERENCES

1. Jones LH, Neubert H. Clinical chemoproteomics - Opportunities and obstacles. Sci Transl Med. 2017;9(386):1–7. 2. Riccardi Sirtori F, Altomare A, Carini M, Aldini G, Regazzoni L. MS methods to study macromolecule-ligand interaction: Applications in drug discovery. Methods. 2018;144(February):152–74. 3. Wang J, Gao L, Lee YM, Kalesh KA, Ong YS, Lim J, et al. Target identification of natural and traditional medicines with quantitative chemical proteomics approaches. Pharmacol Ther. 2016;162:10–22. 4. Han SY, Seong HK. Introduction to chemical proteomics for drug discovery and development. Arch Pharm (Weinheim). 2007;340(4):169–77. 5. Schirle M, Bantscheff M, Kuster B. Mass spectrometry-based proteomics in preclinical drug discovery. Chem Biol. 2012;19(1):72–84. 6. Wright MH, Sieber SA. Chemical proteomics approaches for identifying the cellular targets of natural products. Nat Prod Rep. 2016;33(5):681–708. 7. Kwon HJ, Karuso P. Chemical proteomics, an integrated research engine for exploring drug-target-phenotype interactions. Proteome Sci. 2018;16(1):1–2. 8. Hess S. The emerging field of chemo- and pharmacoproteomics. Proteomics - Clin Appl. 2013;7(1–2):171–80. 9. Bekker-Jensen DB, Kelstrup CD, Batth TS, Larsen SC, Haldrup C, Bramsen JB, et al. An Optimized Shotgun Strategy for the Rapid Generation of Comprehensive Human Proteomes. Cell Syst. 2017;4(6):587–99. 10. Yates JR, Ruse CI, Nakorchevsky A. Proteomics by Mass Spectrometry: Approaches, Advances, and Applications. Annu Rev Biomed Eng. 2009;11(1):49–79. 11. Tzeng S-C, Maier CS. Label-Free Proteomics Assisted by Affinity Enrichment for Elucidating the Chemical Reactivity of the Liver Mitochondrial Proteome toward Adduction by the Lipid Electrophile 4-hydroxy-2-nonenal (HNE). Front Chem. 2016;4(March):1–17. 12. Resing KA, Ahn NG. Proteomics strategies for protein identification. FEBS Lett. 2005;579(4 SPEC. ISS.):885–9. 13. Briggs J. Whole Genome Sequencing [Internet]. Individualized Medicine blog. 2015 [cited 2018 Apr 14]. Available from: https://individualizedmedicineblog.mayoclinic.org/2015/12/23/whole-genome-sequencing/ 14. Köcher T, Superti-Furga G. Mass spectrometry-based functional proteomics: From molecular machines to protein networks. Nat Methods. 2007;4(10):807–15. 15. Jensen ON. Modification-specific proteomics: Characterization of post-translational modifications by mass spectrometry. Curr Opin Chem Biol. 2004;8(1):33–41. 16. Monti M, Orrù S, Pagnozzi D, Pucci P. Functional proteomics. Clin Chim Acta. 2005;357(2):140–50. 17. Mann M, Jensen ON. Proteomic analysis of post-translational modifications. Nat Biotechnol. 2003;21(3):255–61. 18. Ewing RM, Doyle DA. Structural Proteomics: Large-Scale Studies. eLS. 2015;1–7. 19. Ong S-E, Mann M. Mass spectrometry-based proteomics turns quantitative. Nat Chem Biol. 2005;1(5):252–62. 20. Lindemann C, Thomanek N, Hundt F, Lerari T, Meyer HE, Wolters D, et al. Strategies in relative and absolute quantitative mass spectrometry based proteomics. Biol Chem. 2017;398(5–6):687–99. 21. Kito K, Ito T. Mass Spectrometry-Based Approaches Toward Absolute Quantitative Proteomics. Curr Genomics. 2008;9(4):263–74. 22. Gregorich ZR, Chang YH, Ge Y. Proteomics in heart failure: Top-down or bottom-up? Pflugers Arch Eur J Physiol. 2014;466(6):1199– 209. 23. Washburn MP, Wolters D, Yates JRCN-2498, Yates 3rd JR, Yates JRCN-2498. Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat Biotechnol. 2001;19(3):242. 24. Nadler WM, Waidelich D, Kerner A, Hanke S, Berg R, Trumpp A, et al. MALDI versus ESI: The Impact of the Ion Source on Peptide Identification. J Proteome Res. 2017;16(3):1207–15. 25. Fenn JB, Mann M, Meng CK, Wong SF, Whitehouse CM. Electrospray Ionization for Mass Spectrometry of Large Biomolecules. Science (80- ). 1989;246(4926):64–71. 26. Ho CS, Lam CWK, Chan MHM, Cheung RCK, Law LK, Lit LCW, et al. Electrospray ionisation mass spectrometry: principles and clinical applications. Clin Biochem. 2003;24(1):3–12.

81

27. Hillenkamp F, Karas M. Mass spectrometry of peptides and proteins by matrix-assisted ultraviolet laser desorption/ionization. Methods Enzymol. 1990;193:280–95. 28. Han X, Aslanian A, Yates III JR. Mass spectrometry for proteomics. Curr Opin Chem Biol. 2008;12(5):483–90. 29. Perry RH, Cooks RG, Noll RJ. Orbitrap mass spectrometry: Instrumentation, ion motion and applications. Mass Spectrom Rev. 2008;27(6):661–99. 30. Khan O. Mass Analyzers (Mass Spectrometry) [Internet]. Analytical Chemistry - Instrumental. 2017 [cited 2018 Apr 17]. Available from: https://chem.libretexts.org/Textbook_Maps/Analytical_Chemistry/Supplemental_Modules_(Analytical_Chemistry)/Instrumental_An alysis/Mass_Spectrometry/Mass_Spectrometers_(Instrumentation)/Mass_Analyzers_(Mass_Spectrometry) 31. Sadygov RG, Cociorva D, Yates III JR. Large-scale database searching using tandem mass spectra: Looking up the answer in the back of the book. Nat Methods. 2004;1(3):195–202. 32. Hunt DF, Yates JR, Shabanowitz J, Winston S, Hauer CR. Protein sequencing by tandem mass spectrometry. Proc Natl Acad Sci. 1986;83(17):6233–7. 33. Hu A, Noble WS, Wolf-Yadlin A. Technical advances in proteomics: new developments in data-independent acquisition. F1000Research. 2016;5:419. 34. Frank AM. A ranking-based scoring function for peptide-spectrum matches. J Proteome Res. 2009;8(5):2241–52. 35. Eng J, McCormack A, Yates J. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J Am Soc Mass Spectrom. 1994;5(11):976–89. 36. Mann M, Wilm M. Error-Tolerant Identification of Peptides in Sequence Databases by Peptide Sequence Tags. Anal Chem. 1994;66(24):4390–9. 37. Thermo Fisher Scientific. Orbitrap FusionTM TribridTM Mass Spectrometer [Internet]. Mass Spectrometry Systems & Components. [cited 2018 Jul 25]. Available from: https://www.thermofisher.com/order/catalog/product/IQLAAEGAAPFADBMBCX 38. Thermo Fisher Scientific. Orbitrap FusionTM TribridTM Mass Spectrometer - Product Specifications. Application note Thermo Fisher Scientific. 2015. 39. Fiehn Lab. Mass Resolution and Resolving Power [Internet]. 2016 [cited 2018 Aug 9]. Available from: http://fiehnlab.ucdavis.edu/projects/seven-golden-rules/mass-resolution 40. Zhang H, Ge Y. Comprehensive Analysis of Protein Modifications by Top-Down Mass Spectrometry. Circ Cardiovasc Genet. 2011;4(6):1– 21. 41. Tran JC, Zamdborg L, Ahlf DR, Lee JE, Catherman AD, Durbin KR, et al. Mapping intact protein isoforms in discovery mode using top- down proteomics. Nature. 2011;480(7376):254–8. 42. Kelleher NL, Lin HY, Valaskovic GA, Aaserud DJ, Fridriksson EK, McLafferty FW. Top Down versus Bottom Up Protein Characterization by Tandem High-Resolution Mass Spectrometry. J Am Chem Soc. 1999;121(4):806–12. 43. Sap KA, Demmers JA. Labeling Methods in Mass Spectrometry Based Quantitative Proteomics. In: Leung H-C, Man TK, editors. Integrative Proteomics. IntechOpen; 2012. 44. Yao X, Freas A, Ramirez J, Demirev PA, Fenselau C. Proteolytic 18O-labeling for comparative proteomics. Model studies with two serotypes of adenovirus. Anal Chem. 2001;73(page 2839):2836–42. 45. Gerber SA, Rush J, Stemman O, Kirschner MW, Gygi SP. Absolute quantification of proteins and phosphoproteins from cell lysates by tandem MS. Proc Natl Acad Sci. 2003;100(12):6940–5. 46. Liu H, Sadygov RG, Yates III JR. A model for random sampling and estimation of relative protein abundance in shotgun proteomics. Anal Chem. 2004;76:4193–201. 47. Cox J, Hein MY, Luber CA, Paron I, Nagaraj N, Mann M. Accurate Proteome-wide Label-free Quantification by Delayed Normalization and Maximal Peptide Ratio Extraction, Termed MaxLFQ. Mol Cell Proteomics. 2014;13(9):2513–26. 48. Wysocki VH, Tsaprailis G, Smith LL, Breci LA. Mobile and localized protons: A framework for understanding peptide dissociation. J Mass Spectrom. 2000;35(12):1399–406. 49. Medzihradszky KF, Chalkley RJ. Lessons in de novo peptide sequencing by tandem mass spectrometry. Mass Spectrom Rev. 2015;34(1):43–63. 50. Hughes C, Ma B, Lajoie GA. De Novo Sequencing Methods in Proteomics. Proteome Bioinforma. 2010;604:105–21. 51. Roepstorff P, Fohlman J. Proposal for a common nomenclature for sequence ions in mass spectra of peptides. Biomed Mass Spectrom. 1984;11(11):601.

82

52. Chu IK, Siu CK, Lau JKC, Tang WK, Mu X, Lai CK, et al. Proposed nomenclature for peptide ion fragmentation. Int J Mass Spectrom. 2015;390:24–7. 53. Mann M, Meng C, Fenn J. Interpreting Mass Spectra of Multiply Charged Ions. Anal Chem. 1989;61:1702–8. 54. Ivanov M V., Levitsky LI, Lobas AA, Panic T, Laskay ÜA, Mitulovic G, et al. Empirical multidimensional space for scoring peptide spectrum matches in shotgun proteomics. J Proteome Res. 2014;13(4):1911–20. 55. Cannon WR, Jarman KH, Webb-Robertson BJM, Baxter DJ, Oehmen CS, Jarman KD, et al. Comparison of probability and likelihood models for peptide identification from tandem mass spectrometry data. J Proteome Res. 2005;4(5):1687–98. 56. Kim S, Gupta N, Pevzner PA. Spectral probabilities and generating functions of tandem mass spectra: A strike against decoy databases. J Proteome Res. 2008;7(8):3354–63. 57. Cox J, Neuhauser N, Michalski A, Scheltema RA, Olsen J V., Mann M. Andromeda: A peptide search engine integrated into the MaxQuant environment. J Proteome Res. 2011;10(4):1794–805. 58. Whisstock JC, Lesk AM. Prediction of protein function from protein sequence and structure. Q Rev Biophys. 2003;36(3):307–40. 59. Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, et al. Pfam: The protein families database. Nucleic Acids Res. 2014;42(Databse):D222–30. 60. Sigrist CJA, Cerutti L, De Castro E, Langendijk-Genevaux PS, Bulliard V, Bairoch A, et al. PROSITE, a protein domain database for functional characterization and annotation. Nucleic Acids Res. 2009;38(Database):D161–D166. 61. Berman HM. The Protein Data Bank: a historical perspective. Acta Crystallogr Sect A. 2007;64:88–95. 62. Sleator RD, Walsh P. An overview of in silico protein function prediction. Arch Microbiol. 2010;192(3):151–5. 63. Sharan R, Ulitsky I, Shamir R. Network-based prediction of protein function. Mol Syst Biol. 2007;3(88):1–13. 64. Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, et al. STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 2015;43(D1):D447–52. 65. Wodak SJ, Pu S, Vlasblom J, Séraphin B. Challenges and Rewards of Interaction Proteomics. Mol Cell Proteomics. 2009;8(1):3–18. 66. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene ontology: Tool for the unification of biology. Nat Genet. 2000;25(1):25–9. 67. The Gene Ontology Consortium. Expansion of the Gene Ontology knowledgebase and resources. Nucleic Acids Res. 2017;45(Database):D331–8. 68. Mi H, Huang X, Muruganujan A, Tang H, Mills C, Kang D, et al. PANTHER version 11: Expanded annotation data from Gene Ontology and Reactome pathways, and data analysis tool enhancements. Nucleic Acids Res. 2017;45(Database):D183–9. 69. Kanehisa M, Goto S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 2000;28(1):27–30. 70. Kanehisa M, Sato Y, Kawashima M, Furumichi M, Tanabe M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 2016;44(Database):D457–62. 71. Fabregat A, Jupe S, Matthews L, Sidiropoulos K, Gillespie M, Garapati P, et al. The Reactome Pathway Knowledgebase. Nucleic Acids Res. 2018;46(Database):D649–55. 72. Fabregat A, Sidiropoulos K, Garapati P, Gillespie M, Hausmann K, Haw R, et al. The Reactome pathway Knowledgebase. Nucleic Acids Res. 2016;44(Database):D481–7. 73. The Ensembl Consortium. Ensembl 2018. Nucleic Acids Res. 2017;46(D1):D754–61. 74. Flicek P, Aken B, Ballester B. Ensembl’s 10th year. Nucleic Acids Res. 2010;38. 75. The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 2017;45(D1):D158–D169. 76. Breuza L, Poux S, Estreicher A, Famiglietti ML, Magrane M, Tognolli M, et al. The UniProtKB guide to the human proteome. Database. 2016;2016:1–10. 77. Reynaud E. UniProt archive. Nat Educ. 2010;3(9):3236–7. 78. Suzek BE, Wang Y, Huang H, McGarvey PB, Wu CH. UniRef clusters: A comprehensive and scalable alternative for improving sequence similarity searches. . 2015;31(6):926–32. 79. Pundir S, Martin MJ, O’Donovan C. UniProt protein knowledgebase. Methods Mol Biol. 2017;1558:41–55. 80. European Bioinformatics Institute (EMBL-EBI). Limitations of annotation enrichment [Internet]. Network analysis of protein interaction data: an introduction. Cambridgeshire; 2018 [cited 2018 Aug 27]. Available from: https://www.ebi.ac.uk/training/online/course/network-analysis-protein-interaction-data-introduction/limitations-annotation- enrichment

83

81. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, et al. Cytoscape: A software Environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13(11):2498–504. 82. Maere S, Heymans K, Kuiper M. BiNGO: A Cytoscape plugin to assess overrepresentation of Gene Ontology categories in Biological Networks. Bioinformatics. 2005;21(16):3448–9. 83. Bindea G, Mlecnik B, Hackl H, Charoentong P, Tosolini M, Kirilovsky A, et al. ClueGO: A Cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks. Bioinformatics. 2009;25(8):1091–3. 84. Thermo Fisher Scientific. Introduction to Cell Culture [Internet]. Cell Culture Basics — Gibco. [cited 2018 Aug 16]. Available from: https://www.thermofisher.com/pt/en/home/references/gibco-cell-culture-basics/introduction-to-cell-culture.html 85. Arnold M, Sierra MS, Laversanne M, Soerjomataram I, Jemal A, Bray F. Global patterns and trends in colorectal cancer incidence and mortality. Gut. 2017;66(4):683–91. 86. Gonzalez-Donquiles C, Alonso-Molero J, Fernandez-Villa T, Vilorio-Marqués L, Molina AJ, Martín V. The NRF2 transcription factor plays a dual role in colorectal cancer: A systematic review. PLoS One. 2017;12(5):1–14. 87. Fogh J, Fogh JM, Orfeo T. One Hundred and Twenty-Seven Cultured Human Tumor Cell Lines Producing Tumors in Nude Mice. J Natl Cancer Inst. 1977;59(1):221–6. 88. Pinto M, Robine-leon S, Appay M, Kedinger M, Triadou N, Dussaulx E, et al. Enterocyte-like differentiation and polarization of the human colon carcinoma cell line caco-2 in culture. Biol Cell. 1983;47:323–30. 89. Matsumoto H, Erickson RH, Gum JR, Yoshioka M, Gum E, Kim YS. Biosynthesis of alkaline phosphatase during differentiation of the human colon cancer cell line Caco-2. Gastroenterology. 1990;98(5 PART 1):1199–207. 90. Buhrke T, Lengler I, Lampen A. Analysis of proteomic changes induced upon cellular differentiation of the human intestinal cell line Caco-2. Dev Growth Differ. 2011;53(3):411–26. 91. Lea T. Caco-2 Cell Line. In: Verhoeckx K, Cotter P, López-Expósito I, Kleiveland C, Lea T, Mackie A, et al., editors. The Impact of Food Bioactives on Health. Springer, Cham; 2015. p. 103–11. 92. ATC Pharma. Caco-2 permeability assay [Internet]. [cited 2018 Jul 18]. Available from: http://www.atc-pharma.be/en/node/151 93. SOLVO Biotechnology. Drug absorption in the intestine [Internet]. Intestinal Barrier. [cited 2018 Jul 18]. Available from: https://www.solvobiotech.com/barriers/intestinal-barrier 94. Burgess AW. Growth control mechanisms in normal and transformed intestinal cells. Philos Trans R Soc B. 1998;353(1370):903–9. 95. Engle MJ, Goetz GS, Alpers DH. Caco-2 cells express a combination of colonocyte and enterocyte phenotypes. J Cell Physiol. 1998;174(3):362–9. 96. Sambuy Y, De Angelis I, Ranaldi G, Scarino ML, Stammati A, Zucco F. The Caco-2 cell line as a model of the intestinal barrier: influence of cell and culture-related factors on Caco-2 cell functional characteristics. 2005;21:1–26. 97. Kumar KK V, Karnati S, Reddy MB, Chandramouli R. Caco-2 cell lines in drug discovery- an updated perspective. J Basic Clin Pharm. 2010;1(2):63–9. 98. Kola I, Landis J. Can the pharmaceutical industry reduce attrition rates? Nat Rev Drug Discov. 2004;3(August):1–5. 99. Bourgine J, Billaut-Laden I, Happillon M, Lo-Guidice J-M, Maunoury V, Imbenotte M, et al. Gene expression profiling of systems involved in the metabolism and the disposition of xenobiotics: comparison between human intestinal biopsy samples and colon cell lines. Drug Metab Dispos. 2012;40(4):694–705. 100. Zhang Y, Sano M, Shinmura K, Tamaki K, Katsumata Y, Matsuhashi T, et al. 4-Hydroxy-2-nonenal protects against cardiac ischemia- reperfusion injury via the Nrf2-dependent pathway. J Mol Cell Cardiol. 2010;49(4):576–86. 101. Ayala A, Muñoz MF, Argüelles S. Lipid peroxidation: Production, metabolism, and signaling mechanisms of malondialdehyde and 4- hydroxy-2-nonenal. Oxid Med Cell Longev. 2014;2014. 102. Pizzimenti S, Toaldo C, Pettazzoni P, Ciamporcero E, Dianzani MU, Barrera G. Lipid Peroxidation in Colorectal Carcinogenesis: Bad and Good News. In: Ettarh R, editor. Colorectal Cancer Biology – From Genes to Tumor. IntechOpen; 2008. p. 155–88. 103. Barrera G, Pizzimenti S, Dianzani MU. Lipid peroxidation: control of cell proliferation, cell differentiation and cell death. Mol Aspects Med. 2008;29(1–2):1–8. 104. Zhong H, Yin H. Role of lipid peroxidation derived 4-hydroxynonenal (4-HNE) in cancer: Focusing on mitochondria. Redox Biol. 2015;4:193–9. 105. Galligan JJ, Marnett LJ. Histone adduction and its functional impact on epigenetics. Chem Res Toxicol. 2016;30(1):376–87.

84

106. Pizzimenti S, Menegatti E, Berardi D, Toaldo C, Pettazzoni P, Minelli R, et al. 4-Hydroxynonenal, a lipid peroxidation product of dietary polyunsaturated fatty acids, has anticarcinogenic properties in colon carcinoma cell lines through the inhibition of telomerase activity. J Nutr Biochem. 2010;21(9):818–26. 107. Esterbauer H, Schaur RJ, Zollner H. Chemistry and biochemistry of 4-hydroxynonenal, malonaldehyde and related aldehydes. Free Radic Biol Med. 1991;11(1):81–128. 108. Mol M, Regazzoni L, Altomare A, Degani G, Carini M, Vistoli G, et al. Enzymatic and non-enzymatic detoxification of 4-hydroxynonenal: Methodological aspects and biological consequences. Free Radic Biol Med. 2017;111(January):328–44. 109. Barrera G, Pizzimenti S, Ciamporcero ES, Daga M, Ullio C, Arcaro A, et al. Role of 4-Hydroxynonenal-Protein Adducts in Human Diseases. Antioxid Redox Signal. 2015;22(18):1681–702. 110. Siow RCM, Ishii T, Mann GE. Modulation of antioxidant gene expression by 4-hydroxynonenal: atheroprotective role of the Nrf2/ARE transcription pathway. Redox Rep. 2007;12(1–2):11–5. 111. Park BK, Laverty H, Srivastava A, Antoine DJ, Naisbitt D, Williams DP. Drug bioactivation and protein adduct formation in the pathogenesis of drug-induced toxicity. Chem Biol Interact. 2011;192(1–2):30–6. 112. Castro JP, Jung T, Grune T, Siems W. 4-Hydroxynonenal (HNE) modified proteins in metabolic diseases. Free Radic Biol Med. 2017;111(September 2016):309–15. 113. Satoh T, McKerch SR, Lipton SA. Nrf2/ARE-Mediated Antioxidant Actions of Pro-Electrophilic Drugs. Free Radic Biol Med. 2013;65:645– 57. 114. Satoh T, Akhtar MW, Lipton SA. Combating oxidative/nitrosative stress with electrophilic counterattack strategies. In: Jakob, U., Reichmann D, editor. Oxidative stress and redox regulation. Dordrecht: Springer; 2013. p. 277–307. 115. Satoh T, Lipton SA. Redox regulation of neuronal survival mediated by electrophilic compounds. Trends Neurosci. 2007;30(1):37–45. 116. Nguyen T, Nioi P, Pickett CB. The Nrf2-antioxidant response element signaling pathway and its activation by oxidative stress. J Biol Chem. 2009;284(20):13291–5. 117. Itoh K, Wakabayashi N, Katoh Y, Ishii T, Igarashi K, Engel JD, et al. Keap1 represses nuclear activation of antioxidant responsive elements by Nrf2 through binding to the amino-terminal Neh2 domain. Genes Dev. 1999;13(1):76–86. 118. Itoh K, Tong KI, Yamamoto M. Molecular mechanism activating Nrf2-Keap1 pathway in regulation of adaptive response to electrophiles. Free Radic Biol Med. 2004;36(10):1208–13. 119. Levonen A-L, Landar A, Ramachandran A, Ceaser EK, Dickinson D a, Zanoni G, et al. Cellular mechanisms of redox cell signalling: role of cysteine modification in controlling antioxidant defences in response to electrophilic lipid oxidation products. Biochem J. 2004;378(Pt 2):373–82. 120. Taguchi K, Motohashi H, Yamamoto M. Molecular mechanisms of the Keap1-Nrf2 pathway in stress response and cancer evolution. Genes Cells. 2011;16(2):123–40. 121. Ma Q. Role of Nrfe in Oxidative Stress and Toxicity. Annu Rev Pharmacol Toxicol. 2013;53(1):401–26. 122. Satoh T, Lipton SA. Recent advances in understanding NRF2 as a druggable target: development of pro-electrophilic and non-covalent NRF2 activators to overcome systemic side effects of electrophilic drugs like dimethyl fumarate. F1000Research. 2017;6(2138):1–10. 123. Dinkova-Kostova AT, Holtzclaw WD, Cole RN, Itoh K, Wakabayashi N, Katoh Y, et al. Direct evidence that sulfhydryl groups of Keap1 are the sensors regulating induction of phase 2 enzymes that protect against carcinogens and oxidants. Proc Natl Acad Sci. 2002;99(18):11908–13. 124. Magesh S, Chen Y, Hu L. Small Molecule Modulators of Keap1-Nrf2-ARE Pathway as Potential Preventive and Therapeutic Agents. Med Res Rev. 2012;32(4):687–726. 125. Yamamoto XM, Kensler TW, Motohashi H. The Keap1-Nrf2 System: A Thiol-Based Sensor-Effector Apparatus for Maintaining Redox Homeostasis. Physiol Rev. 2018;98(16):1169–203. 126. Hong F, Freeman ML, Liebler DC. Identification of sensor cysteines in human Keap1 modified by the cancer chemopreventive agent sulforaphane. Chem Res Toxicol. 2005;18(12):1917–26. 127. Ishii T, Itoh K, Takahashi S, Sato H, Yanagawa T, Katoh Y, et al. Transcription factor Nrf2 coordinately regulates a group of oxidative stress-inducible genes in macrophages. J Biol Chem. 2000;275(21):16023–9. 128. Baranano DE, Rao M, Ferris CD, Snyder SH. Biliverdin reductase: A major physiologic cytoprotectant. Proc Natl Acad Sci. 2002;99(25):16093–8.

85

129. Otterbein LE, Bach FH, Alam J, Soares M, Tao Lu H, Wysk M, et al. Carbon monoxide has anti-inflammatory effects involving the mitogen-activated protein kinase pathway. Nat Med. 2000;6(4):422–8. 130. Ferris CD, Jaffrey SR, Sawa A, Takahashi M, Brady SD, Barrow RK, et al. Heme oxygenase-1 prevents cell death by regulating cellular iron. Nat Cell Biol. 1999;1(3):152–7. 131. Wang YY, Yang YX, Zhe H, He ZX, Zhou SF. Bardoxolone methyl (CDDO-Me) as a therapeutic agent: An update on its pharmacokinetic and pharmacodynamic properties. Drug Des Devel Ther. 2014;8:2075–88. 132. Lu MC, Ji JA, Jiang ZY, You QD. The Keap1–Nrf2–ARE Pathway As a Potential Preventive and Therapeutic Target: An Update. Med Res Rev. 2016;36(5):924–63. 133. Satoh T, Kosaka K, Itoh K, Kobayashi A, Yamamoto M, Shimojo Y, et al. Carnosic acid, a catechol-type electrophilic compound, protects neurons both in vitro and in vivo through activation of the Keap1/Nrf2 pathway via S-alkylation of targeted cysteines on Keap1. J Neurochem. 2008;104(4):1116–31. 134. Shimizu H, Koyama T, Yamada S, Lipton SA, Satoh T. Zonarol, a sesquiterpene from the brown algae Dictyopteris undulata, provides neuroprotection by activating the Nrf2/ARE pathway. Biochem Biophys Res Commun. 2015;457(4):718–22. 135. Jiang Z-Y, Lu M-C, You Q-D. Discovery and Development of Kelch-like ECH-Associated Protein 1 . Nuclear Factor Erythroid 2 ‑ Related Factor 2 ( KEAP1 : NRF2 ) Protein − Protein Interaction Inhibitors : Achievements , Challenges , and Future Directions. J Med Chem. 2016;59(24):10837−10858. 136. Cuadrado A, Manda G, Hassan A, Alcaraz MJ, Barbas C, Daiber A, et al. Transcription Factor NRF2 as a Therapeutic Target for Chronic Diseases: A Systems Medicine Approach. Pharmacol Rev. 2018;70(2):348–83. 137. Milkovic L, Zarkovic N, Saso L. Controversy about pharmacological modulation of Nrf2 for cancer therapy. Redox Biol. 2017;12(March):727–32. 138. Schumacker PT. Reactive oxygen species in cancer cells: Live by the sword, die by the sword. Cancer Cell. 2006;10(3):175–6. 139. Reuter S, Cupta SC, Chaturvedi MM, Aggarwal BB. Oxidative stress, inflammation, and cancer: How are they linked? Free Radic Biol Med. 2010;49(11):1603–16. 140. Singh A, Bodas M, Wakabayashi N, Bunz F, Biswal S. Gain of Nrf2 Function in Non-Small-Cell Lung Cancer Cells Confers Radioresistance. Antioxid Redox Signal. 2010;13(11):1627–37. 141. Jaramillo MC, Zhang DD. The emerging role of the Nrf2–Keap1 signaling pathway in cancer. Genes Dev. 2013;27(20):2179–91. 142. Emmink BL, Verheem A, Van Houdt WJ, Steller EJ, Govaert KM, Pham T V., et al. The secretome of colon cancer stem cells contains drug-metabolizing enzymes. J Proteomics. 2013;91:84–96. 143. Panieri E, Santoro MM. ROS homeostasis and metabolism: a dangerous liason in cancer cells. Cell Death Dis. 2016;7(6):e2253. 144. Rees MS, Siakotos AN, Mundy BP. Improved synthesis of various isotope labeled 4-hydroxyalkenals and peroxidation intermediates. Synth Commun. 1995;25(20):3225–36. 145. Thermo Fisher Scientific. DionexTM UltiMateTM 3000 RSLCnano System - Product Specifications. Chromatography. 2016. 146. Cox J. MaxQuant [Internet]. Software. [cited 2018 Jul 25]. Available from: http://www.biochem.mpg.de/5111795/maxquant 147. Cox J, Mann M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol. 2008;26(12):1367–72. 148. Tyanova S, Temu T, Carlson A, Sinitcyn P, Mann M, Cox J. Visualization of LC-MS/MS proteomics data in MaxQuant. Proteomics. 2015;15(8):1453–6. 149. Tyanova S, Temu T, Sinitcyn P, Carlson A, Hein MY, Geiger T, et al. The Perseus computational platform for comprehensive analysis of (prote)omics data. Nat Methods. 2016;13(9):731–40. 150. Satyal SH, Chen D, Fox SG, Kramer JM, Morimoto RI. Negative regulation of the heat shock transcriptional response by HSBP1. Genes Dev. 1998;12(13):1962–74. 151. Pham K, Pal R, Qu Y, Liu X, Yu H, Shiao SL, et al. Nuclear glutaredoxin 3 is critical for protection against oxidative stress-induced cell death. Free Radic Biol Med. 2015;85:197–206. 152. Srinivasan S, Avadhani NG. Cytochrome c oxidase dysfunction in oxidative stress. Free Radic Biol Med. 2012;53:1252–63. 153. Mráček T, Drahota Z, Houštěk J. The function and the role of the mitochondrial glycerol-3-phosphate dehydrogenase in mammalian tissues. Biochim Biophys Acta - Bioenerg. 2013;1827(3):401–10. 154. Eyassu F, Angione C. Modelling pyruvate dehydrogenase under hypoxia and its role in cancer metabolism. R Soc Open Sci. 2017;4(10):1– 9.

86

155. Miyake SI, Yamashita T, Taniguchi M, Tamatani M, Sato K, Tohyama M. Identification and characterization of a novel mitochondrial tricarboxylate carrier. Biochem Biophys Res Commun. 2002;295(2):463–8. 156. Galaris D, Pantopoulos K. Oxidative stress and iron homeostasis: Mechanistic and health aspects. Crit Rev Clin Lab Sci. 2008;45(1):1– 23. 157. Serneels S, Verdonck T. Principal component analysis for data containing outliers and missing elements. Comput Stat Data Anal. 2008;52(3–1):1712–27. 158. Galligan JJ, Rose KL, Beavers WN, Hill S, Tallman KA, Tansey WP, et al. Stable histone adduction by 4-oxo-2-nonenal: A potential link between oxidative stress and epigenetics. J Am Chem Soc. 2014;136(34):11864–6. 159. Kandi S, Deshpande N, Bharath V, Pinnelli K, Devaki R, Rao P. Alcoholism and Its Role in the Development of Oxidative Stress and DNA Damage : An Insight. Am J Med Sci Med. 2014;2(3):64–6. 160. Galligan JJ, Smathers RL, Shearn CT, Fritz KS, Backos DS, Jiang H, et al. Oxidative Stress and the ER Stress Response in a Murine Model for Early-Stage Alcoholic Liver Disease. J Toxicol. 2012;2012:1–12. 161. Parola M, Robino G. Oxidative stress-related molecules and liver fibrosis. J Hepatol. 2001;35(2):297–306. 162. Chen D, Fang L, Li H, Tang MS, Jin C. Cigarette smoke component acrolein modulates chromatin assembly by inhibiting histone acetylation. J Biol Chem. 2013;288(30):21678–87. 163. Perl A. Oxidative stress in the pathology and treatment of systemic lupus erythematosus. Nat Rev Rheumatol. 2013;9(11):674–86. 164. Lightfoot YL, Blanco LP, Kaplan MJ. Metabolic abnormalities and oxidative stress in lupus. Curr Opin Rheumatol. 2017;29(5):442–9. 165. Karimaian A, Majidinia M, Bannazadeh Baghi H, Yousefi B. The crosstalk between Wnt/β-catenin signaling pathway with DNA damage response and oxidative stress: Implications in cancer therapy. DNA Repair (Amst). 2017;51:14–9. 166. Korswagen HC. Regulation of the Wnt/β-catenin pathway by redox signaling. Dev Cell. 2006;10(6):687–8. 167. Manolagas SC, Almeida M. Gone with the Wnts: β-Catenin, T-Cell Factor, Forkhead Box O, and Oxidative Stress in Age-Dependent Diseases of Bone, Lipid, and Glucose Metabolism. Mol Endocrinol. 2007;21(11):2605–14. 168. Kim Y-W, Byzova T V. Oxidative stress in angiogenesis and vascular disease. Blood [Internet]. 2014;123(5):625–31. Available from: http://www.bloodjournal.org/cgi/doi/10.1182/blood-2013-09-512749 169. Wang Y, Zang QS, Liu Z, Wu Q, Maass D, Dulan G, et al. Regulation of VEGF-induced endothelial cell migration by mitochondrial reactive oxygen species. Am J Physiol Physiol. 2011;301(3):C695–704. 170. Kim Y-W, West XZ, Byzova T V. Inflammation and oxidative stress in angiogenesis and vascular disease. J Mol Med. 2013;91(3):323–8. 171. Gonskikh Y, Polacek N. Alterations of the translation apparatus during aging and stress response. Mech Ageing Dev. 2017;168:30–6. 172. Sakurai H, Enoki Y. Novel aspects of heat shock factors: DNA recognition, chromatin modulation and gene expression. FEBS J. 2010;277(20):4140–9. 173. Anckar J, Sistonen L. Regulation of HSF1 Function in the Heat Stress Response: Implications in Aging and Disease. Annu Rev Biochem. 2011;80(1):1089–115. 174. Mukhopadhyay D, Riezman H. Proteasome-independent functions of ubiquitin in endocytosis and signaling. Science (80- ). 2007;315(5809):201–5. 175. Flick K, Kaiser P. Protein degradation and the stress response. Semin Cell Dev Biol. 2012;23(5):515–22. 176. Zhao C, Denison C, Huibregtse JM, Gygi S, Krug RM. Human ISG15 conjugation targets both IFN-induced and constitutively expressed proteins functioning in diverse cellular pathways. Proc Natl Acad Sci. 2005;102(29):10200–5. 177. Morales DJ, Lenschow DJ. The antiviral activities of ISG15. J Mol Biol. 2013;425(24):4995–5008. 178. Jeon YJ, Park JH, Chung CH. Interferon-Stimulated Gene 15 in the Control of Cellular Responses to Genotoxic Stress. Mol Cells. 2017;40(2):83–9. 179. Zhang D, Zhang D-E. Interferon-Stimulated Gene 15 and the Protein ISGylation System. J Interf Cytokine Res. 2011;31(1):119–30. 180. Malakhov MP, Malakhova OA, Il Kim K, Ritchie KJ, Zhang DE. UBP43 (USP18) specifically removes ISG15 from conjugated proteins. J Biol Chem. 2002;277(12):9976–81. 181. Thermo Fisher Scientific. Proteome DiscovererTM Software [Internet]. Data Management Software. [cited 2018 Jul 25]. Available from: https://www.thermofisher.com/order/catalog/product/OPTON-30795 182. Thermo Fisher Scientific. Thermo Proteome Discoverer 2.2. User Guide Thermo Fisher Scientific. 2017. 183. Tabb DL. The SEQUEST Family Tree. J Am Soc Mass Spectrom. 2015;26(11):1814–9. 184. Harmel R, Fiedler D. Features and regulation of non-enzymatic post-translational modifications. Nat Chem Biol. 2018;14(3):244–52.

87

185. Siems W, Grune T. Intracellular metabolism of 4-hydroxynonenal. Mol Aspects Med. 2003;24(4–5):167–75. 186. Botzen D, Grune T. Degradation of HNE-modified proteins-possible role of ubiquitin. Redox Rep. 2007;12(1):63–7. 187. Grune T, Davies KJA. The proteasomal system and HNE-modified proteins. Mol Aspects Med. 2003;24(4–5):195–204. 188. Shringarpure R, Grune T, Mehlhase J, Davies KJA. Ubiquitin conjugation is not required for the degradation of oxidized proteins by proteasome. J Biol Chem. 2003;278(1):311–8. 189. Okada K, Wangpoengtrakul C, Osawa T, Toyokuni S, Tanaka K, Uchida K. 4-Hydroxy-2-nonenal-mediated impairment of intracellular proteolysis during oxidative stress. Identification of proteasomes as target molecules. J Biol Chem. 1999;274(34):23787–93. 190. Stewart BJ, Doom JA, Petersen DR. Residue-specific adduction of tubulin by 4-hydroxynonenal and 4-oxononenal causes cross-linking and inhibits polymerization. Chem Res Toxicol. 2007;20(8):1111–9. 191. Xu IM-J, Lai RK-H, Lin S-H, Tse AP-W, Chiu DK-C, Koh H-Y, et al. Transketolase counteracts oxidative stress to drive cancer development. Proc Natl Acad Sci. 2016;113(6):E725–34.

88

APPENDICES

Appendix A – Protein identification software workflows to perform Label-Free quantification analysis

MaxQuant & Perseus Xcalibur raw spectra files obtained from Orbitrap Fusion™ Tribrid™ mass spectrometer were loaded to MaxQuant platform integrated with Andromeda search engine for Peptide Spectral Match. Label-Free Quantification parameters were selected, maintaining general default settings for protein identification, with exception to the number of peptides set to be greater or equal than 3, the minimum peptide length set to 6 amino acids, and the maximum mass range set to 4600 Da, with five running processors. The protein groups list output file of the MS analysis was then processed within Perseus framework, following the workflow illustrated in figure A1, which tailed as: filter rows based on categorical columns for only identified by site, reverse and potential contaminants; transform LFQ Intensity columns with log2 values; input categorical annotation rows, by grouping the LFQ Intensity columns according to its associated condition of treatment; and filter columns and rows based on valid values (minimum 3, in at least one group). An overall number of 3354 protein groups identified across all samples conditions was obtained. Along the way, it was possible to retrieve several graphs of interest.

Figure A1 – Workflow implemented in the Perseus framework to process the results of protein identification by MaxQuant Label-Free analysis of MS data generated by Orbitrap Fusion™ Tribrid™ mass spectrometer.

Proteome Discoverer Xcalibur raw spectra files obtained from Orbitrap Fusion™ Tribrid™ mass spectrometer were loaded as well to Proteome Discoverer platform. This software also works for protein identification through Peptide Spectral Match, but different-extent processing and consensus workflows can be created, which were in this case the ones associated with an all-inclusive Label-Free Quantification, illustrated in figures A2 and A3.

89

Figure A2 – Default processing workflow implemented in Proteome Discoverer for Label-Free analysis of MS data generated by Orbitrap Fusion™ Tribrid™ mass spectrometer.

This workflow was modified to include in the Sequest HT search engine node certain dynamic modifications: cysteine, lysine, arginine or histidine HNE-Michael Adduct, Schiff base and 2-pentylpyrrole. Proteins were searched both in the Homo Sapiens and in the Decoy Database by using the Percolator node.

“ProcessingWF_LTQ_Orbitrap\PWF_OT_Precursor_Quan_and_LFQ_CID_SequestHT_Percolator_HNE.pdProcessingWF”

“ConsensusWF_Comprehensive_Enhanced_Annotation_LFQ_and_Precursor.pdConsensusWF”

Figure A3 – Default consensus workflow implemented in Proteome Discoverer for Label-Free analysis of MS data generated by Orbitrap Fusion™ Tribrid™ mass spectrometer. The three standard validating levels of protein identification are obtained with PSM Grouper, Peptide Validator and Protein FDR Validator nodes. Annotations for all identified proteins were retrieved directly from ProteinCenter website by using the ProteinCenter Annotation node.

90

However, the components for the relative quantification analysis, the Minora Feature Detector node in the processing workflow, and the Feature Mapper and Precursor Ions Quantifier nodes in the consensus workflow, were not available due to software license expiration, therefore only a qualitative protein listing was performed.

General default settings for protein identification were used, with exception to the mass range set to 350-5000 Da, the signal/noise threshold set to 1.5, the precursor mass tolerance set to 10 ppm and the fragment mass tolerance set to 0.6 Da. The output files of the MS analysis showing the list of proteins identified for each condition of treatment were then processed with several reduction-filters, including: on the protein page, the number of peptides to be greater or equal than 3, and the peptide length to be greater or equal than 6 amino acids; on the peptide page, the Xcorr value to be greater or equal than 2.2, and modifications to contain HNE. Table A1 summarizes the numbers associated to the final lists of identified proteins for the Caco-2 samples treated with HNE according to the imposed parameters.

Table A1 – Protein groups, proteins, peptide groups and peptide-spectral matches count on the Caco-2 samples: control and treated with HNE in concentration 0.1 µM, 1 µM, 10 µM and 100 µM, after applying filters within Proteome Discoverer. Samples Control 0.1 µM 1 µM 10 µM 100 µM

Protein Groups 3208 3113 3107 3034 2749 Proteins 1238/7974 1019/7739 1196/7840 1126/7587 607/6898 Peptide Groups 298/21004 292/17808 7/17694 208/16608 183/14497 PSM 47099/56354 45171/53429 42222/52274 42300/50734 32239/42092

91

Appendix B – List of up- and down-regulated proteins in HNE-treated Caco-2 samples vs. control sample

The lists represent the outliers of the Volcano Plots obtained with Perseus software, for a fold-change < 0.5 and > 0.5, with p-value < 0.05. Several information was exported for each quantified protein: protein accession number, protein name, gene name, total peptides number, unique peptides number, percentage of coverage, molecular weight, matching scores, besides the values for statistical significance and fold-change.

Table B1 – Up- and down-regulated proteins on the Caco-2 sample treated with HNE in concentration 0.1 µM.

UP-REGULATED 0.1 µM

Unique Sequence Significant -LOG(p-value) Difference Protein IDs Protein names Gene names Peptides Razor + unique peptides Mol. weight [kDa] Score peptides coverage [%] Q4ZG72;A0A024RAH8;Q9NVP1;Q8N254;H7C452; + 5.682290832 0.972569466 ATP-dependent RNA helicase DDX18 DDX18 8 8 8 19.6 61.594 23.036 Q59FR7 + 4.070842863 0.89635102 P51571;A6NLM8 Translocon-associated protein subunit delta SSR4 6 6 6 37 18.998 11.519 A0A0D9SG77;A0A1B0GVL3;Q9H2G0;Q05086;Q5W + 4.200667914 0.54363807 7F7;Q9BUI6;Q96GR7;A0A0D9SGI3;A0A0D9SG63;S Ubiquitin-protein ligase E3A UBE3A 5 5 5 10.2 88.082 9.1169 4R306 DOWN-REGULATED 0.1 µM Unique Sequence Significant -LOG(p-value) Difference Protein IDs Protein names Gene names Peptides Razor + unique peptides Mol. weight [kDa] Score peptides coverage [%] Q5RKT7;B2RDW1;P62979;A0A248RGE3;J3QS39;J3 QTR3;F5H6Q2;F5GYU3;F5H2Z3;F5H265;Q5UGI3;B4 RPS27A;HEL112; DV12;F5H388;F5H747;F5GXK7;J3QKN0;Q5U5U6;Q Ubiquitin-40S ribosomal protein S27a;Ubiquitin;40S ribosomal protein UBB;UBC;UbC;D + 3.969514981 -0.774700483 5PY61;Q96C32;Q96MH4;L8B4I8;L8B196;L8B4R0;A8 15 15 3 72.4 17.905 49.631 S27a;Polyubiquitin-B;Ubiquitin;Polyubiquitin-C;Ubiquitin KFZp434K0435; K674;L8B4Z6;L8B4M0;L8B4J3;P0CG47;P0CG48;Q9 UBA52 UFQ0;Q96H31;Q66K58;Q59EM9;M0R1V7;Q49A90; Q8WYN9;F5GZ39;J3QSA3;A8CGI2 + 3.470979815 -0.830869675 P11387;B9EG90;Q5IFJ5;Q9BVT2;Q6PK95;Q6NWZ5 DNA topoisomerase 1 TOP1 16 16 15 25 90.725 37.455 + 4.231658355 -0.844214122 Q96AG4;I3L223 Leucine-rich repeat-containing protein 59 LRRC59 9 9 9 42.7 34.93 20.101

Table B2 – Up- and down-regulated proteins on the Caco-2 sample treated with HNE in concentration 1 µM.

UP-REGULATED 1 µM Razor + unique Unique Sequence coverage Mol. weight Significant -LOG(p-value) Difference Protein IDs Protein names Gene names Peptides Score peptides peptides [%] [kDa] A0A140VJJ6;C9JEU5;C9JC84;P02679;C9JPQ9;C9JU00;D3DP16; + 3.239816532 1.662530581 Fibrinogen gamma chain FGG 5 5 5 20.4 49.496 7.467 Q7Z664 High mobility group nucleosome-binding domain-containing + 4.3516225 1.492972692 Q15651 HMGN3 3 3 2 40.4 10.666 4.5653 protein 3 + 2.068861168 0.928467592 Q9Y5Z4;Q05DB4;Q5THN1 Heme-binding protein 2 HEBP2 7 7 7 34.6 22.875 12.744 Q584P1;B7Z3Y2;B7Z8A2;Q9UHG3;C9K055;F8W8W4;C9JM55;C + 4.727033966 0.903085709 Prenylcysteine oxidase 1 PCYOX1 9 9 9 24.6 45.138 15.314 9JGT6 B7Z9R8;E9PNM1;Q6IAX1;B4DND3;A0A1W2PQ47;P37268;B4D + 2.863163499 0.902542114 Squalene synthase FDFT1 4 4 4 14.2 40.753 7.0239 TK0;B4DWP0 NOMO3;NOMO + 3.551057179 0.882767359 J3KN36;P69849;Q5JPE7;Q1LZN2;A0A087WTD5;H3BPS9 Nodal modulator 3;Nodal modulator 2 8 8 1 9.7 139.38 15.573 2 + 3.044628488 0.822576523 A6NMQ3;Q5T5H1;O43768;A0A1W2PRU0;H3BTD3 Alpha-endosulfine ENSA 7 7 6 51.4 15.634 22.659 + 2.263200016 0.790232976 G1UI16;Q29RF7;B3KMN2;Q96DB6 Sister chromatid cohesion protein PDS5 homolog A PDS5A 7 7 7 6.8 150.83 12.718 + 3.333942714 0.787450473 Q4ZG72;A0A024RAH8;Q9NVP1;Q8N254;H7C452;Q59FR7 ATP-dependent RNA helicase DDX18 DDX18 8 8 8 19.6 61.594 23.036 + 2.364661042 0.787115733 E7EVH9;Q08623 Pseudouridine-5-phosphatase HDHD1 3 3 3 24.6 21.466 4.5671 Q05DA4;Q5HYD8;O15460;C9JCP0;C9JX45;A8MXE0;E7ERI1;E7E P4HA2;DKFZp68 + 2.341716816 0.783104579 Prolyl 4-hydroxylase subunit alpha-2 5 5 5 13.1 57.377 7.7608 NX0;E7EPI9 6M0919 + 4.621241511 0.706046422 I1W660;O94907;D6RGF1;D6RCC2;Q9UBU2 Dickkopf-related protein 1 DKK1 4 4 4 15.4 28.671 8.7227 + 3.451328822 0.668786685 P18077;F8WBS5;F8WB72;C9K025 60S ribosomal protein L35a RPL35A 7 7 7 36.4 12.538 11.213 + 3.347382564 0.637969971 Q9H2U1;E7EWK3;B7Z8P5;A0A0G2JRP3;A0A0G2JQU7;H7C5F5 ATP-dependent RNA helicase DHX36 DHX36 10 10 10 13.3 114.76 14.141

+ 2.927839955 0.63589557 Q96H79 Zinc finger CCCH-type antiviral protein 1-like ZC3HAV1L 3 3 3 18 32.962 7.4328 RPL17;hCG_2448 A0A087WXM6;J3QQT2;J3KRX5;A0A024R261;A0A0A6YYL6;P18 + 3.018332742 0.628766696 60S ribosomal protein L17 7;RPL17- 10 10 1 47.9 19.586 23.534 621;J3KRB3;A0A087WWH0;J3QS96;A0A087WY81;J3KSJ0 C18orf32 E9PQR7;Q548N1;Q9UK41;E9PLM9;E9PR04;E9PM90;A0A0S2Z5 + 3.048031815 0.621810913 Vacuolar protein sorting-associated protein 28 homolog VPS28 4 4 4 36 18.469 10.176 H5 THUMPD1;44M2 + 2.466380861 0.60701561 A0A024R388;Q9NXG2;H3BNW0;O60362;Q6MZT3;B4DFY9 THUMP domain-containing protein 1 .1;DKFZp686C10 4 4 4 20.4 39.315 18.126 54 DOWN-REGULATED 1 µM Razor + unique Unique Sequence coverage Mol. weight Significant -LOG(p-value) Difference Protein IDs Protein names Gene names Peptides Score peptides peptides [%] [kDa] Q53GE3;A0A024RBX9;P08559;A5YPB6;A5PHJ9;Q5JPU0;Q5JPT Pyruvate dehydrogenase E1 component subunit alpha;Pyruvate PDHA1;PDHA1/L + 2.215686897 -0.626162465 9;Q5JPU1;Q5JPU3;Q5JPU2;Q9UNV7;Q99725;Q99724;Q6LCA3; dehydrogenase E1 component subunit alpha, somatic form, 8 8 8 25.6 43.216 20.431 OC79064 P78557;P29803 mitochondrial E7D7X9;A0A024R8U9;P32322;E2QRB3;E7D7Y0;A0A1B2JLU7;Q Pyrroline-5-carboxylate reductase;Pyrroline-5-carboxylate + 2.267809888 -0.664071083 2TU78;B7Z8T1;J3KQ22;Q8TBX0;J3QL24;J3QR88;J3QLK9;J3QL3 PYCR1;PIG45 9 9 8 44.8 33.39 19.499 reductase 1, mitochondrial 2;J3QKT4;J3KTA8;J3QKT3;J3QRZ0;J3QL23;B4DQK8 + 2.207590154 -0.669151306 P20674;H3BNX8;H3BRM5;H3BV69;H3BRI0;Q71UP1 Cytochrome c oxidase subunit 5A, mitochondrial COX5A 9 9 9 58 16.762 31.384 V9HWC9;P00441;H7BYH4;W8Q444;A1YYW4;X5D3G4;X5CN18; + 2.738603203 -0.683326085 Superoxide dismutase [Cu-Zn] HEL-S-44;SOD1 11 11 11 97.4 15.936 54.504 X5CHU1;X5D5A9 + 2.126159862 -0.716614087 P10606;Q6FHM4;Q6FHJ9 Cytochrome c oxidase subunit 5B, mitochondrial COX5B 6 6 6 30.2 13.696 12.264 O75821;Q6IAM0;K7EL20;A8K5K5;K7ER90;K7ENA8;K7EP16;B0 + 3.023562451 -0.721892675 Eukaryotic translation initiation factor 3 subunit G EIF3G 13 13 13 45.6 35.611 43.455 AZV5;K7EL60;B7Z1G0;K7ENH0 Q13509;Q9BV28;Q53G92;B2RBD5;A0A0B4J269;Q3ZCR3;G3V2 + 3.027231483 -0.729504267 Tubulin beta-3 chain TUBB3 21 5 5 57.6 50.432 18.038 A3;G3V5W4;G3V2R8;G3V3R4;G3V2N6;G3V3J6;G3V3W7 + 2.250365936 -0.752418518 Q8N5N7 39S ribosomal protein L50, mitochondrial MRPL50 3 3 3 29.1 18.325 7.332 + 2.628724644 -0.787349065 A0A1L1UHR1;P21796;B3KTS5;C9JI87;B4DEI3 Voltage-dependent anion-selective channel protein 1 VDAC1 14 14 14 62.9 30.772 32.678 P05023;B7Z3V1;B3KRI1;B1AKY9;B2RAQ4;A0A0S2Z3W6;P5099 + 2.914359097 -0.795041084 3;B7Z1N5;B3KX35;H0Y7C1;B4DIQ8;B3KW93;Q58I23;A0A2H4Y Sodium/potassium-transporting ATPase subunit alpha-1 ATP1A1 20 20 11 26.8 112.89 42.3 9T5 + 2.687114993 -0.802395058 Q71UA6;Q15758;M0QXM4;Q59ES3;M0QX44;B4DE27 Amino acid transporter;Neutral amino acid transporter B(0) SLC1A5 4 4 4 13.1 56.583 13.023 + 1.961195711 -0.859285196 Q9BWJ5 Splicing factor 3B subunit 5 SF3B5 4 4 4 46.5 10.135 7.8636 + 4.61408463 -0.870193799 Q9H9B4;D6RFI0;D6RDG7;S4R2X2;D6RAE9 Sideroflexin-1 SFXN1 9 9 9 34.5 35.619 17.208 MST065;TOMM2 + 2.822104784 -0.99582243 Q549C5;Q9NS69;Q53GB0 Mitochondrial import receptor subunit TOM22 homolog 5 5 5 66.2 15.521 15.223 2 A0A0A0MR02;A0A024QZT0;A0A024QZN9;P45880;B4DKM5;Q + 4.863776097 -1.00235494 Voltage-dependent anion-selective channel protein 2 VDAC2 10 10 10 48.2 30.348 23.048 5JSD2;Q5JSD1;A2A3S1 + 3.922439704 -1.095991453 Q8TE01 derp12 11 11 11 32.7 38.192 21.929 HIST1H2AB;HIST Histone H2A;Histone H2A type 1-C;Histone H2A type 3;Histone + 3.964983169 -1.100026957 Q08AJ9;A0A024R017;Q93077;Q7L7L0;P04908 1H2AC;HIST3H2 7 2 2 57.7 14.135 41.457 H2A type 1-B/E A Q53T76;A0A140VJK2;P43304;A8K1Z2;B7Z601;Q8WUQ0;F5GYK Glycerol-3-phosphate dehydrogenase;Glycerol-3-phosphate + 3.337658525 -1.14129289 GPD2 12 12 12 21.5 77.252 19.638 7;A0A0S2Z3S0 dehydrogenase, mitochondrial

92

Table B3 – Up- and down-regulated proteins on the Caco-2 sample treated with HNE in concentration 10 µM.

UP-REGULATED 10 µM Razor + unique Unique Sequence coverage Significant -LOG(p-value) Difference Protein IDs Protein names Gene names Peptides Mol. weight [kDa] Score peptides peptides [%] High mobility group nucleosome-binding domain- + 2.214304894 1.066178004 Q15651 HMGN3 3 3 2 40.4 10.666 4.5653 containing protein 3 + 3.031389589 0.86313858 G1UI16;Q29RF7;B3KMN2;Q96DB6 Sister chromatid cohesion protein PDS5 homolog A PDS5A 7 7 7 6.8 150.83 12.718 J3KN36;P69849;Q5JPE7;Q1LZN2;A0A087WT + 4.54558314 0.861730576 Nodal modulator 3;Nodal modulator 2 NOMO3;NOMO2 8 8 1 9.7 139.38 15.573 D5;H3BPS9 A8K126;A0A024RDS3;Q9UBW7;A0A087X151; + 2.337864272 0.856110891 Zinc finger MYM-type protein 2 ZNF198;ZMYM2 3 3 3 3.1 139.73 2.0089 B4DZH0 + 2.532042058 0.831194083 E7ET48;Q9NWG5 AIM1L 6 6 6 19.3 49.276 10.945 J3KP58;A0A024RBR1;P30622;F5H6A0;F5H0N + 2.865243476 0.816943487 7;Q59GJ4;F5H1T5;F6VGP8;Q86WU4;B3KXA5; CAP-Gly domain-containing linker protein 1 CLIP1;RSN 7 7 7 7.3 148.13 8.4977 Q6P5Z9;F5H367;F5H270 Q584P1;B7Z3Y2;B7Z8A2;Q9UHG3;C9K055;F8 + 4.393974376 0.805989265 Prenylcysteine oxidase 1 PCYOX1 9 9 9 24.6 45.138 15.314 W8W4;C9JM55;C9JGT6 + 2.613283638 0.770235729 R4GN98;P06703;B2R577 Protein S100;Protein S100-A6 S100A6 5 5 5 48.2 9.681 10.43 Q05DA4;Q5HYD8;O15460;C9JCP0;C9JX45;A8 P4HA2;DKFZp686M0 + 2.682967274 0.764158885 Prolyl 4-hydroxylase subunit alpha-2 5 5 5 13.1 57.377 7.7608 MXE0;E7ERI1;E7ENX0;E7EPI9 919 Vesicle-associated membrane protein-associated protein + 3.031193929 0.662232399 E5RK64;Q53XM7;O95292;Q59EZ6 VAPB 3 3 3 42.3 7.8009 6.4423 B/C DOWN-REGULATED 10 µM Razor + unique Unique Sequence coverage Significant -LOG(p-value) Difference Protein IDs Protein names Gene names Peptides Mol. weight [kDa] Score peptides peptides [%] + 3.133553004 -0.647580783 A0A024R845;P61106;X6RFL8 Ras-related protein Rab-14 RAB14 9 9 9 51.2 23.897 18.791 Q53T76;A0A140VJK2;P43304;A8K1Z2;B7Z601 Glycerol-3-phosphate dehydrogenase;Glycerol-3- + 3.198443013 -0.735952377 GPD2 12 12 12 21.5 77.252 19.638 ;Q8WUQ0;F5GYK7;A0A0S2Z3S0 phosphate dehydrogenase, mitochondrial P23378;A0A1J0GNC2;A0A1W2PP74;A0A1W2 PQV3;A0A1W2PPB1;A0A1W2PQU6;A0A1W2 + 2.561699917 -0.76705869 Glycine dehydrogenase (decarboxylating), mitochondrial GLDC 13 13 13 20.7 112.73 25.823 PPH6;Q9HDA3;A0A1W2PPG0;A0A1W2PPK8; A0A1W2PPE1 Q71UA6;Q15758;M0QXM4;Q59ES3;M0QX44; + 3.262911222 -0.783893394 Amino acid transporter;Neutral amino acid transporter B(0) SLC1A5 4 4 4 13.1 56.583 13.023 B4DE27 + 4.657627272 -0.790905635 Q9H9B4;D6RFI0;D6RDG7;S4R2X2;D6RAE9 Sideroflexin-1 SFXN1 9 9 9 34.5 35.619 17.208 + 3.861721204 -0.824496587 Q549C5;Q9NS69;Q53GB0 Mitochondrial import receptor subunit TOM22 homolog MST065;TOMM22 5 5 5 66.2 15.521 15.223 + 2.80269102 -0.93421491 B2R4V2;Q969Q0;R4GN19 60S ribosomal protein L36a-like RPL36AL;RPL36A 7 7 2 29.2 12.527 8.4066

Table B4 – Up- and down-regulated proteins on the Caco-2 sample treated with HNE in concentration 100 µM.

UP-REGULATED 100 µM Razor + unique Unique Significant -LOG(p-value) Difference Protein IDs Protein names Gene names Peptides Sequence coverage [%] Mol. weight [kDa] Score peptides peptides + 1.841015717 2.652668635 Q71DI3 Histone H3.2 HIST2H3A 9 1 1 58.1 15.388 11.885 + 7.333678304 2.438052877 P52895;B4DK69;Q59GU2;S4R3P0 Aldo-keto reductase family 1 member C2 AKR1C2 17 4 4 65 36.735 10.248 + 8.941107437 2.286650976 Q04828;H0Y804;A6NHU4;B4E0M1 Aldo-keto reductase family 1 member C1 AKR1C1 18 18 1 67.2 36.788 35.678 ADP/ATP translocase 2;ADP/ATP translocase 2, N-terminally + 3.894005921 1.985486984 P05141;Q6NVC0;O75961 SLC25A5 11 11 4 35.6 32.852 23.021 processed A8K5I0;A0A0G2JIW1;P0DMV9;P0DMV8;A0A1U9X7W4;B4DNT8; HEL-S- + 12.14322701 1.757443746 Q59EJ3;B4DFN9;B4DWK5;B4E388;B4DI39;B4E1S9;B4DVU9;V9GZ Heat shock 70 kDa protein 1B;Heat shock 70 kDa protein 1A 103;HSPA1B;HSPA 46 42 27 64 70.051 132.55 37;B3KTT5;B4DNX1;B4E1T6;Q9UQC1;B4DNV4 1A

+ 2.840540652 1.472600778 P48507;D3DT44 Glutamate--cysteine ligase regulatory subunit GCLM 4 4 4 20.8 30.727 47.613 + 4.092307315 1.30824852 Q4ZG72;A0A024RAH8;Q9NVP1;Q8N254;H7C452;Q59FR7 ATP-dependent RNA helicase DDX18 DDX18 8 8 8 19.6 61.594 23.036 + 1.347241987 1.297547023 Q5ISQ0;Q6FHV6;P09104;F5H0C8;A8K3B0 Gamma-enolase;Enolase ENO2 6 2 2 21.4 45.54 2.0356 Q13501;E7EMC7;E3WH17;E3W990;B4DE82;B4E3V2;E9PFW8;C9J SQSTM1;SQSTM1- + 4.919939425 1.268679937 Sequestosome-1;Tyrosine-protein kinase receptor 13 13 13 47.7 47.687 49.535 RJ8;Q13502;D6RBF1;C9J6J8 ALK + 1.957199926 1.262562434 Q5TEC6 Histone H3 HIST2H3PS2 5 1 1 27.2 15.43 2.4567 A0A1W2PNM1;B2RB06;A0A140VK76;B3KTT6;A0A0D9SFP2;Q168 + 1.215141734 1.247538249 36;A0A1W2PQ78;E9PF18;A0A1W2PQV5;A0A0A0MSE2;A0A1W2 Hydroxyacyl-coenzyme A dehydrogenase, mitochondrial HADH 6 6 6 28.8 29.096 14.276 PQC2;A0A1W2PRT2;A0A1W2PQ55;A0A1W2PP40

P05023;B7Z3V1;B3KRI1;B1AKY9;B2RAQ4;A0A0S2Z3W6;P50993;B + 3.490769308 1.19863383 Sodium/potassium-transporting ATPase subunit alpha-1 ATP1A1 20 20 11 26.8 112.89 42.3 7Z1N5;B3KX35;H0Y7C1;B4DIQ8;B3KW93;Q58I23;A0A2H4Y9T5 K7EK07;B2R4P9;P84243;B4E380;B2R6Y1;A8K4Y7;A0A1X8XL64;Q1 H3F3B;H3F3A;HIST + 2.378280444 1.154080391 Histone H3;Histone H3.3;Histone H3.1t;Histone H3.3C 10 10 2 59.8 14.914 25.502 6695;K7EMV3;B4DEB1;K7ES00;Q6NXT2;Q6TXQ4;K7EP01 3H3;H3F3C

+ 2.205199663 1.147715886 P31930;B4DUL5;B4DZB9 Cytochrome b-c1 complex subunit 1, mitochondrial UQCRC1 9 9 9 32.1 52.645 21.881

+ 2.95518102 1.13394022 B7Z9S8;A6NGH2;Q6LEU2;A3KLL5;P05026;Q58I20;V9GYR2 Sodium/potassium-transporting ATPase subunit beta-1 ATP1B1 4 4 4 15.8 28.627 7.9529

+ 1.328052403 1.124486605 J3KN36;P69849;Q5JPE7;Q1LZN2;A0A087WTD5;H3BPS9 Nodal modulator 3;Nodal modulator 2 NOMO3;NOMO2 8 8 1 9.7 139.38 15.573 + 2.197013578 1.117386818 Q4VB24;B2R984;A3R0T8;A3R0T7;P10412;A1L407;P22492;Q02539 Histone H1.4 HIST1H1E 11 11 3 26.9 21.893 25.265

+ 1.614475604 1.114023844 B7ZA56;B4E184;A0A0S2Z500;Q96CV9;A0A0S2Z5I6;A0A087WY28 Optineurin OPTN 3 3 3 7.5 59.559 9.9504

+ 1.195894857 1.107519786 A8K3Y5;Q9NX58;D6RDJ1 Cell growth-regulating nucleolar protein LYAR 4 4 4 11.1 43.622 6.8963 + 2.673068048 1.101834615 B2R4R0;P62805;Q0VAS5;Q6B823 Histone H4 HIST1H4H;HIST1H4 13 13 13 62.1 11.367 142.25 E5KS60;E5KS55;Q6N0B1;Q9P2R7;A0A0S2Z511;B4DY87;Q7Z503;B DKFZp686D0880;S Succinyl-CoA ligase subunit beta;Succinyl-CoA ligase [ADP- + 2.872749974 1.051732699 7ZAF6;B4DRV2;Q9Y4T0;B7Z9S6;Q5T9Q5;Q5T9Q8;A0A0U1RQF8; UCLA2;DKFZp564P 5 5 5 12.7 50.317 8.6612 forming] subunit beta, mitochondrial A0A0U1RQL1;A0A0U1RQU7 2062 E9PIR7;B2R5P6;F8W809;B7Z2S5;A0A087WSW9;A0A182DWI3;A0 A024RBK9;A0A140VKC9;A0A087WSY9;Q16881;E2QRB9;Q6ZR44; + 7.630571263 1.00251929 Thioredoxin reductase 1, cytoplasmic TXNRD1 13 13 12 46.1 53.166 32.558 E9PKD3;E9PLT3;E9PKI4;A0A0B4J225;E9PIZ5;E9PRI8;E9PQI3;B3K UQ5

+ 2.782179694 0.956207275 Q53GY1;O95817;C9JFK9 BAG family molecular chaperone regulator 3 BAG3 7 7 7 18.8 61.603 11.786 + 2.989317786 0.934076627 P11388;J3KTB7;E9PCY5;B4DKD0;Q71UH4;Q02880 DNA topoisomerase 2-alpha TOP2A 8 8 8 6.9 174.38 13.681 + 1.808585952 0.933511893 P51571;A6NLM8 Translocon-associated protein subunit delta SSR4 6 6 6 37 18.998 11.519

Q6NX51;B7Z321;Q96A65;B7Z4J9;Q8TAR2;B7Z689;B3KTF7;B7Z50 + 1.356810885 0.927825928 Exocyst complex component 4 EXOC4 3 3 3 4.6 110.5 6.4749 2 Single-stranded DNA-binding protein;Single-stranded DNA- + 4.675172079 0.913513819 Q567R6;A4D1U3;Q04837;A0A0G2JLD8;E7EUY5;B7Z268;C9K0U8 SSBP1 7 7 7 60.8 17.359 13.774 binding protein, mitochondrial

B4DJV2;A0A024RB75;O75390;A0A0C4DGI3;B3KTN4;Q0QEL2;H0Y + 5.230384393 0.904898008 IC4;F8W4S1;B7Z1E1;F8VZK9;F8VX07;F8VRP1;F8VX68;F8VWQ5;F Citrate synthase;Citrate synthase, mitochondrial CS 11 11 10 35.3 50.431 21.26 8VPF9;F8VPA1;F8VTT8;F8W1S4;H0YH82;F8W642;F8VRI6

+ 3.534574928 0.890849304 Q9HDC9;H0Y512 Adipocyte plasma membrane-associated protein APMAP 7 7 7 24.5 46.48 15.333

B3KSG3;B7Z4H1;F8W8M4;A0A0A0MRL6;O14639;H0Y7N6;F6XFR5 + 1.937044861 0.889584223 Actin-binding LIM protein 1 ABLIM1 4 4 4 13 60.189 8.5168 ;B4DQA3;J3QSX6 B7Z9R8;E9PNM1;Q6IAX1;B4DND3;A0A1W2PQ47;P37268;B4DTK + 2.753430763 0.88890775 Squalene synthase FDFT1 4 4 4 14.2 40.753 7.0239 0;B4DWP0 Q584P1;B7Z3Y2;B7Z8A2;Q9UHG3;C9K055;F8W8W4;C9JM55;C9J + 5.197240327 0.886361758 Prenylcysteine oxidase 1 PCYOX1 9 9 9 24.6 45.138 15.314 GT6 + 4.649112529 0.885055224 A0A024R172;Q14914;F2Z3J9;Q5JVP2;F6XGT7 Prostaglandin reductase 1 LTB4DH;PTGR1 9 9 9 43.2 35.885 18.467 + 4.116973111 0.88087527 Q643R0;Q9ULW0;B3KM90;Q96FC3 Targeting protein for Xklp2 HCTP4;TPX2 12 12 12 26.1 85.652 18 Q14TF0;P48506;B4E2I4;E1CEI4;A0A0C4DGB2;D6R959;H0Y9I7;H0Y + 5.017323095 0.879188855 Glutamate--cysteine ligase catalytic subunit GCLC 17 17 17 32.2 72.765 31.866 AB6;D6REX4;D6RGF8 + 3.659302532 0.849722544 P16401;Q14463 Histone H1.5 HIST1H1B 8 8 8 26.1 22.58 16.5 + 2.747922464 0.84292539 B4DW11;H0YC35;P10909;H0YAS8 ;Clusterin;Clusterin beta chain;Clusterin alpha chain CLU 2 2 2 17.7 25.683 5.6022

B9A067;Q16891;C9J406;B4DS66;B4DQY2;B4DKR1;H7C463;B4DT2 + 3.158018265 0.836988131 MICOS complex subunit MIC60 IMMT 19 19 19 36.1 78.973 38.133 0;B4E2B5;Q05DN3;Q3B7X4;A0A087WYS0;D6RAW4 I1SRC5;A0A024RAV5;L7RSL8;P01116;Q92468;X5D945;Q5U091;P0 + 1.945351203 0.810368538 1112;P01111;Q14014;P78460;Q96T27;Q7KZ06;Q71SP6;A0A221ZR 9 4 4 45.3 33.983 6.4979 Z1;G3V4K2;A0A221ZRZ5;A0A221ZRZ4;G3V5T7;Q14015 Q01650;Q2MCL6;H0YJ95;A0A0C4DGL4;G3V4Z6;B4DVT0;Q96QB2; + 2.421839814 0.799834887 Large neutral amino acids transporter small subunit 1 SLC7A5 4 4 4 8.1 55.01 8.6462 A0A0S2Z502;A0A024R719;Q9UM01;Q92536 + 1.451372875 0.794154803 O43719;B4DRS4;Q5H919;Q5H918 HIV Tat-specific factor 1 HTATSF1 6 6 6 16.6 85.852 13.696 Q6FHM2;P62879;E7EP32;C9JXA5;C9JIS1;C9JZN1;Q9HAV0;A8K3F + 3.08762371 0.791423162 Guanine nucleotide-binding protein G(I)/G(S)/G(T) subunit beta-2 GNB2 12 7 7 51.2 37.331 31.628 6;C9JD14;H7C5J5

Ras-related C3 botulinum toxin substrate 1;Ras-related C3 A4D2P1;A4D2P0;P63000;P60763;A4D2P2;Q5G7L4;A0A024R1P2;P RAC1;RAC3;RAC2; + 2.380108802 0.782858149 botulinum toxin substrate 3;Ras-related C3 botulinum toxin 4 3 3 19.8 21.45 9.5183 15153;J3KSC4;J3QLK0;B1AH77;A0A024R9T5;B1AH78;B1AH80 hCG_20693 substrate 2

E5KNY5;P42704;B4DSR0;B8ZZ38;A0A0C4DG06;C9JCA9;A0A0C4D + 1.777302065 0.76747481 Leucine-rich PPR motif-containing protein, mitochondrial LRPPRC 37 37 37 32.9 157.9 82.635 G51;A8K6U0;A0A024R746;Q9NP80 + 2.695951778 0.76606547 Q14684;Q6PJM8 Ribosomal RNA processing protein 1 homolog B RRP1B 8 8 8 13.9 84.427 17.35

+ 2.145882011 0.72242864 G1UI16;Q29RF7;B3KMN2;Q96DB6 Sister chromatid cohesion protein PDS5 homolog A PDS5A 7 7 7 6.8 150.83 12.718

+ 2.18213022 0.719856898 B2RAU5;Q9Y5X1;B3KXH8;O95061;Q59ES5 Sorting nexin;Sorting nexin-9 SNX9 4 4 4 9.7 66.561 6.9032

+ 3.400787241 0.714480464 E9PQR7;Q548N1;Q9UK41;E9PLM9;E9PR04;E9PM90;A0A0S2Z5H5 Vacuolar protein sorting-associated protein 28 homolog VPS28 4 4 4 36 18.469 10.176 + 3.577770528 0.712407112 B4DUG4;Q3MHD2;K7ELG9 Protein LSM12 homolog LSM12 3 3 3 38.9 12.508 4.7838

+ 2.604224972 0.706515948 P22695;H3BRG4;H3BSJ9;H3BP04;H3BUE4;H3BUI9;A0A087WVZ4 Cytochrome b-c1 complex subunit 2, mitochondrial UQCRC2 10 10 10 36.4 48.442 37.10793 A0A0S2Z3S5;A0A0S2Z3H8;P63092;Q5JWF2;B0AZR9;Q5FWY2;Q1 4455;S4R3E3;O60726;A0A0A0MR13;Q5JWE9;H0Y7F4;A0A1W2PR Guanine nucleotide-binding protein G(s) subunit alpha isoforms E1;A2A2R6;A0A1W2PQK2;O75633;A0A087WTB6;Q5JWD1;O7563 + 3.50445256 0.68833224 short;Guanine nucleotide-binding protein G(s) subunit alpha GNAS;GSA 9 9 8 28.9 44.266 13.871 2;H0Y7E8;A0A1W2PRJ7;S4R3V9;A0A087WZE5;A0A1W2PS82;Q93 isoforms XLas 020;A0A024R2Z1;Q5T697;B3KP89;A0A142CHG9;P11488;P63096; P19087;P09471;A8MTJ3;P04899 TMED7- + 2.152876196 0.685355377 A0A0A6YYA0;Q9Y3B3;Q3B7W7;B4E2C1;G3V2Y2 Transmembrane emp24 domain-containing protein 7 3 3 3 18.6 21.233 6.8982 TICAM2;TMED7 + 2.319189851 0.668915558 A0A024RB02;Q86W92;F5GZP6;B4DFU8;H0YFE4;B3KM46 Liprin-beta-1 PPFIBP1 6 6 4 8.6 96.964 12.325 Q6FI51;Q6FHS4;P25685;M0R080;M0R1D6;M0QZD0;M0QYT3;M0 + 4.638306623 0.66754055 DnaJ homolog subfamily B member 1 DNAJB1 13 13 12 39.7 38.01 23.451 QXK0;B4DKK9;M0R128;A0A1B0GX60

+ 3.088211938 0.654720465 A0A024RDB0;A0AVT1;B3KSS1;H0Y8S8 Ubiquitin-like modifier-activating enzyme 6 UBE1L2;UBA6 10 10 10 12.1 117.97 12.68

+ 1.765955638 0.652869225 B3KMB9;Q8NEU8;B7Z1Q8;A8K3E7;F8W124;F8VS86;F8VXB0 DCC-interacting protein 13-beta APPL2 5 5 5 13.9 74.432 10.913

Dipeptidyl peptidase 4;Dipeptidyl peptidase 4 membrane + 3.173133528 0.649496714 P27487;F8WE17;H7C1N5;F8WBB6 DPP4 17 17 17 24.3 88.278 36.355 form;Dipeptidyl peptidase 4 soluble form

+ 3.278211214 0.644838015 Q71UA6;Q15758;M0QXM4;Q59ES3;M0QX44;B4DE27 Amino acid transporter;Neutral amino acid transporter B(0) SLC1A5 4 4 4 13.1 56.583 13.023 Q9BV61;Q59EK6;Q53G55;A0A140VJY2;Q12931;Q53FS6;Q8N9Z3; + 3.562072968 0.643105189 Heat shock protein 75 kDa, mitochondrial TRAP1 18 18 18 29.8 79.356 49.743 K0A7K7;Q5CAQ4;I3L0K7;I3L239;I3L253;I3L3L4;I3L2D5 + 3.182008086 0.641665777 A8K9K6;O00567;A0PJ92;H0Y653;H0YDU4;Q5JXT2;Q9BSN3 Nucleolar protein 56 NOP56 17 17 17 37.2 65.977 37.023

+ 3.484762811 0.638024648 A8KAK1;Q9NYU2;H7BZG0;Q9NYU1;Q9NV86 UDP-glucose:glycoprotein glucosyltransferase 1 UGGT1 24 24 24 22.1 175 43.455 + 1.543783978 0.626900355 Q9Y3U8;J3QSB5 60S ribosomal protein L36 RPL36 4 4 4 22.9 12.254 22.429 A0A1B0GUA3;Q96EK5;A0A1B0GUH6;A0A1B0GTE6;A0A0D9SFK7; + 3.687379808 0.618414561 KIF1-binding protein KIAA1279 5 5 5 16.4 74.783 10.6 A0A1B0GU96;A0A1B0GV79;A0A1B0GVY9;A0A1B0GW21 A0A024RAE4;B4E1U9;P60953;B7ZAY4;B4DMH5;A0A024RAE6;A0 A024R9T1;Q5JYX0;Q9UJM1;Q9UJM0;F8WET9;B1AH79;Q59FQ0;G + 2.029086768 0.617617925 Cell division control protein 42 homolog CDC42;hCG_39634 7 7 6 43.5 21.258 11.227 3V4H1;G3V476;V9HWD0;Q7Z513;A0A024R692;Q59G91;P17081; Q9H4E5 + 1.448118946 0.616496563 A8K5D9;A0A024RA49;Q9NQW6;B4DSL6;H7C3S1;C9JJT6;H7C1C2 Actin-binding protein anillin ANLN 6 6 6 8.9 119.9 8.6825

Solute carrier family 2, facilitated glucose transporter member + 1.697364928 0.613102913 B7Z966;P11169;B7Z5A7;Q59F54;B7Z5J8;Q8TDB8;F5GYR5 3;Solute carrier family 2, facilitated glucose transporter member SLC2A3;SLC2A14 9 9 8 21 47.394 28.627 14

+ 1.578509664 0.610093117 C9JLU1;P52434 DNA-directed RNA polymerases I, II, and III subunit RPABC3 POLR2H 4 4 4 39.9 16.925 9.2436 + 4.856275417 0.607052167 Q96HE7;G3V3E6;G3V5B3;G3V503;G3V2H0;Q5TAE8;Q86YB8 ERO1-like protein alpha ERO1L 18 18 17 59 54.392 38.109 + 2.690874463 0.606653372 O75607 Nucleoplasmin-3 NPM3 2 2 2 17.4 19.343 31.46 + 2.309760852 0.60609436 O75976;B7Z843 Carboxypeptidase D CPD 3 3 3 4.6 152.93 17.894 Q6IBR0;Q53EP4;P04843;Q96HX3;B4DL99;B7Z4L4;B4DNJ5;F8WF3 Dolichyl-diphosphooligosaccharide--protein glycosyltransferase + 5.004899385 0.605189323 RPN1 18 18 18 34.6 68.606 37.5 2 subunit 1 + 2.510435276 0.6046079 J3KS05;Q6IBN6;P83916;B5MD17;K7ELA4;C9JWS9;Q9Y654 Chromobox protein homolog 1 CBX1 8 7 7 54.3 20.03 29.078 + 3.36512255 0.600270271 P78527;B4DL41;F5GX40 DNA-dependent protein kinase catalytic subunit PRKDC 40 40 40 12.9 469.08 58.466 + 2.57515175 0.528113683 P30837;B4DLJ0 Aldehyde dehydrogenase X, mitochondrial ALDH1B1 12 11 11 28.4 57.206 19.705 UP-REGULATED 100 µM Razor + unique Unique Significant -LOG(p-value) Difference Protein IDs Protein names Gene names Peptides Sequence coverage [%] Mol. weight [kDa] Score peptides peptides + 1.841015717 2.652668635 Q71DI3 Histone H3.2 HIST2H3A 9 1 1 58.1 15.388 11.885 + 7.333678304 2.438052877 P52895;B4DK69;Q59GU2;S4R3P0 Aldo-keto reductase family 1 member C2 AKR1C2 17 4 4 65 36.735 10.248 + 8.941107437 2.286650976 Q04828;H0Y804;A6NHU4;B4E0M1 Aldo-keto reductase family 1 member C1 AKR1C1 18 18 1 67.2 36.788 35.678 ADP/ATP translocase 2;ADP/ATP translocase 2, N-terminally + 3.894005921 1.985486984 P05141;Q6NVC0;O75961 SLC25A5 11 11 4 35.6 32.852 23.021 processed A8K5I0;A0A0G2JIW1;P0DMV9;P0DMV8;A0A1U9X7W4;B4DNT8; HEL-S- + 12.14322701 1.757443746 Q59EJ3;B4DFN9;B4DWK5;B4E388;B4DI39;B4E1S9;B4DVU9;V9GZ Heat shock 70 kDa protein 1B;Heat shock 70 kDa protein 1A 103;HSPA1B;HSPA 46 42 27 64 70.051 132.55 37;B3KTT5;B4DNX1;B4E1T6;Q9UQC1;B4DNV4 1A

+ 2.840540652 1.472600778 P48507;D3DT44 Glutamate--cysteine ligase regulatory subunit GCLM 4 4 4 20.8 30.727 47.613 + 4.092307315 1.30824852 Q4ZG72;A0A024RAH8;Q9NVP1;Q8N254;H7C452;Q59FR7 ATP-dependent RNA helicase DDX18 DDX18 8 8 8 19.6 61.594 23.036 + 1.347241987 1.297547023 Q5ISQ0;Q6FHV6;P09104;F5H0C8;A8K3B0 Gamma-enolase;Enolase ENO2 6 2 2 21.4 45.54 2.0356 Q13501;E7EMC7;E3WH17;E3W990;B4DE82;B4E3V2;E9PFW8;C9J SQSTM1;SQSTM1- + 4.919939425 1.268679937 Sequestosome-1;Tyrosine-protein kinase receptor 13 13 13 47.7 47.687 49.535 RJ8;Q13502;D6RBF1;C9J6J8 ALK + 1.957199926 1.262562434 Q5TEC6 Histone H3 HIST2H3PS2 5 1 1 27.2 15.43 2.4567 A0A1W2PNM1;B2RB06;A0A140VK76;B3KTT6;A0A0D9SFP2;Q168 + 1.215141734 1.247538249 36;A0A1W2PQ78;E9PF18;A0A1W2PQV5;A0A0A0MSE2;A0A1W2 Hydroxyacyl-coenzyme A dehydrogenase, mitochondrial HADH 6 6 6 28.8 29.096 14.276 PQC2;A0A1W2PRT2;A0A1W2PQ55;A0A1W2PP40

P05023;B7Z3V1;B3KRI1;B1AKY9;B2RAQ4;A0A0S2Z3W6;P50993;B + 3.490769308 1.19863383 Sodium/potassium-transporting ATPase subunit alpha-1 ATP1A1 20 20 11 26.8 112.89 42.3 7Z1N5;B3KX35;H0Y7C1;B4DIQ8;B3KW93;Q58I23;A0A2H4Y9T5

K7EK07;B2R4P9;P84243;B4E380;B2R6Y1;A8K4Y7;A0A1X8XL64;Q1 H3F3B;H3F3A;HIST + 2.378280444 1.154080391 Histone H3;Histone H3.3;Histone H3.1t;Histone H3.3C 10 10 2 59.8 14.914 25.502 6695;K7EMV3;B4DEB1;K7ES00;Q6NXT2;Q6TXQ4;K7EP01 3H3;H3F3C

+ 2.205199663 1.147715886 P31930;B4DUL5;B4DZB9 Cytochrome b-c1 complex subunit 1, mitochondrial UQCRC1 9 9 9 32.1 52.645 21.881

+ 2.95518102 1.13394022 B7Z9S8;A6NGH2;Q6LEU2;A3KLL5;P05026;Q58I20;V9GYR2 Sodium/potassium-transporting ATPase subunit beta-1 ATP1B1 4 4 4 15.8 28.627 7.9529

+ 1.328052403 1.124486605 J3KN36;P69849;Q5JPE7;Q1LZN2;A0A087WTD5;H3BPS9 Nodal modulator 3;Nodal modulator 2 NOMO3;NOMO2 8 8 1 9.7 139.38 15.573 + 2.197013578 1.117386818 Q4VB24;B2R984;A3R0T8;A3R0T7;P10412;A1L407;P22492;Q02539 Histone H1.4 HIST1H1E 11 11 3 26.9 21.893 25.265

+ 1.614475604 1.114023844 B7ZA56;B4E184;A0A0S2Z500;Q96CV9;A0A0S2Z5I6;A0A087WY28 Optineurin OPTN 3 3 3 7.5 59.559 9.9504

+ 1.195894857 1.107519786 A8K3Y5;Q9NX58;D6RDJ1 Cell growth-regulating nucleolar protein LYAR 4 4 4 11.1 43.622 6.8963 + 2.673068048 1.101834615 B2R4R0;P62805;Q0VAS5;Q6B823 Histone H4 HIST1H4H;HIST1H4 13 13 13 62.1 11.367 142.25 E5KS60;E5KS55;Q6N0B1;Q9P2R7;A0A0S2Z511;B4DY87;Q7Z503;B DKFZp686D0880;S Succinyl-CoA ligase subunit beta;Succinyl-CoA ligase [ADP- + 2.872749974 1.051732699 7ZAF6;B4DRV2;Q9Y4T0;B7Z9S6;Q5T9Q5;Q5T9Q8;A0A0U1RQF8; UCLA2;DKFZp564P 5 5 5 12.7 50.317 8.6612 forming] subunit beta, mitochondrial A0A0U1RQL1;A0A0U1RQU7 2062 E9PIR7;B2R5P6;F8W809;B7Z2S5;A0A087WSW9;A0A182DWI3;A0 A024RBK9;A0A140VKC9;A0A087WSY9;Q16881;E2QRB9;Q6ZR44; + 7.630571263 1.00251929 Thioredoxin reductase 1, cytoplasmic TXNRD1 13 13 12 46.1 53.166 32.558 E9PKD3;E9PLT3;E9PKI4;A0A0B4J225;E9PIZ5;E9PRI8;E9PQI3;B3K UQ5

+ 2.782179694 0.956207275 Q53GY1;O95817;C9JFK9 BAG family molecular chaperone regulator 3 BAG3 7 7 7 18.8 61.603 11.786 + 2.989317786 0.934076627 P11388;J3KTB7;E9PCY5;B4DKD0;Q71UH4;Q02880 DNA topoisomerase 2-alpha TOP2A 8 8 8 6.9 174.38 13.681 + 1.808585952 0.933511893 P51571;A6NLM8 Translocon-associated protein subunit delta SSR4 6 6 6 37 18.998 11.519

Q6NX51;B7Z321;Q96A65;B7Z4J9;Q8TAR2;B7Z689;B3KTF7;B7Z50 + 1.356810885 0.927825928 Exocyst complex component 4 EXOC4 3 3 3 4.6 110.5 6.4749 2

Single-stranded DNA-binding protein;Single-stranded DNA- + 4.675172079 0.913513819 Q567R6;A4D1U3;Q04837;A0A0G2JLD8;E7EUY5;B7Z268;C9K0U8 SSBP1 7 7 7 60.8 17.359 13.774 binding protein, mitochondrial

B4DJV2;A0A024RB75;O75390;A0A0C4DGI3;B3KTN4;Q0QEL2;H0Y + 5.230384393 0.904898008 IC4;F8W4S1;B7Z1E1;F8VZK9;F8VX07;F8VRP1;F8VX68;F8VWQ5;F Citrate synthase;Citrate synthase, mitochondrial CS 11 11 10 35.3 50.431 21.26 8VPF9;F8VPA1;F8VTT8;F8W1S4;H0YH82;F8W642;F8VRI6

+ 3.534574928 0.890849304 Q9HDC9;H0Y512 Adipocyte plasma membrane-associated protein APMAP 7 7 7 24.5 46.48 15.333

B3KSG3;B7Z4H1;F8W8M4;A0A0A0MRL6;O14639;H0Y7N6;F6XFR5 + 1.937044861 0.889584223 Actin-binding LIM protein 1 ABLIM1 4 4 4 13 60.189 8.5168 ;B4DQA3;J3QSX6 B7Z9R8;E9PNM1;Q6IAX1;B4DND3;A0A1W2PQ47;P37268;B4DTK + 2.753430763 0.88890775 Squalene synthase FDFT1 4 4 4 14.2 40.753 7.0239 0;B4DWP0 Q584P1;B7Z3Y2;B7Z8A2;Q9UHG3;C9K055;F8W8W4;C9JM55;C9J + 5.197240327 0.886361758 Prenylcysteine oxidase 1 PCYOX1 9 9 9 24.6 45.138 15.314 GT6 + 4.649112529 0.885055224 A0A024R172;Q14914;F2Z3J9;Q5JVP2;F6XGT7 Prostaglandin reductase 1 LTB4DH;PTGR1 9 9 9 43.2 35.885 18.467 + 4.116973111 0.88087527 Q643R0;Q9ULW0;B3KM90;Q96FC3 Targeting protein for Xklp2 HCTP4;TPX2 12 12 12 26.1 85.652 18 Q14TF0;P48506;B4E2I4;E1CEI4;A0A0C4DGB2;D6R959;H0Y9I7;H0Y + 5.017323095 0.879188855 Glutamate--cysteine ligase catalytic subunit GCLC 17 17 17 32.2 72.765 31.866 AB6;D6REX4;D6RGF8 + 3.659302532 0.849722544 P16401;Q14463 Histone H1.5 HIST1H1B 8 8 8 26.1 22.58 16.5 + 2.747922464 0.84292539 B4DW11;H0YC35;P10909;H0YAS8 Clusterin;Clusterin;Clusterin beta chain;Clusterin alpha chain CLU 2 2 2 17.7 25.683 5.6022 B9A067;Q16891;C9J406;B4DS66;B4DQY2;B4DKR1;H7C463;B4DT2 + 3.158018265 0.836988131 MICOS complex subunit MIC60 IMMT 19 19 19 36.1 78.973 38.133 0;B4E2B5;Q05DN3;Q3B7X4;A0A087WYS0;D6RAW4 I1SRC5;A0A024RAV5;L7RSL8;P01116;Q92468;X5D945;Q5U091;P0 + 1.945351203 0.810368538 1112;P01111;Q14014;P78460;Q96T27;Q7KZ06;Q71SP6;A0A221ZR 9 4 4 45.3 33.983 6.4979 Z1;G3V4K2;A0A221ZRZ5;A0A221ZRZ4;G3V5T7;Q14015 Q01650;Q2MCL6;H0YJ95;A0A0C4DGL4;G3V4Z6;B4DVT0;Q96QB2; + 2.421839814 0.799834887 Large neutral amino acids transporter small subunit 1 SLC7A5 4 4 4 8.1 55.01 8.6462 A0A0S2Z502;A0A024R719;Q9UM01;Q92536 + 1.451372875 0.794154803 O43719;B4DRS4;Q5H919;Q5H918 HIV Tat-specific factor 1 HTATSF1 6 6 6 16.6 85.852 13.696 Q6FHM2;P62879;E7EP32;C9JXA5;C9JIS1;C9JZN1;Q9HAV0;A8K3F + 3.08762371 0.791423162 Guanine nucleotide-binding protein G(I)/G(S)/G(T) subunit beta-2 GNB2 12 7 7 51.2 37.331 31.628 6;C9JD14;H7C5J5

Ras-related C3 botulinum toxin substrate 1;Ras-related C3 A4D2P1;A4D2P0;P63000;P60763;A4D2P2;Q5G7L4;A0A024R1P2;P RAC1;RAC3;RAC2; + 2.380108802 0.782858149 botulinum toxin substrate 3;Ras-related C3 botulinum toxin 4 3 3 19.8 21.45 9.5183 15153;J3KSC4;J3QLK0;B1AH77;A0A024R9T5;B1AH78;B1AH80 hCG_20693 substrate 2 E5KNY5;P42704;B4DSR0;B8ZZ38;A0A0C4DG06;C9JCA9;A0A0C4D + 1.777302065 0.76747481 Leucine-rich PPR motif-containing protein, mitochondrial LRPPRC 37 37 37 32.9 157.9 82.635 Table B4 (cont.) – Up- andG51;A8K6U0;A0A024R746;Q9NP80 down-regulated proteins on the Caco-2 sample treated with HNE in concentration 100 µM. + 2.695951778 0.76606547 Q14684;Q6PJM8 Ribosomal RNA processing protein 1 homolog B RRP1B 8 8 8 13.9 84.427 17.35 UP-REGULATED 100 µM + 2.145882011 0.72242864 G1UI16;Q29RF7;B3KMN2;Q96DB6 Sister chromatid cohesion protein PDS5 homolog A PDS5A 7 7 7 6.8 150.83 12.718 Razor + unique Unique Significant -LOG(p-value) Difference Protein IDs Protein names Gene names Peptides Sequence coverage [%] Mol. weight [kDa] Score + 2.18213022 0.719856898 B2RAU5;Q9Y5X1;B3KXH8;O95061;Q59ES5 Sorting nexin;Sorting nexin-9 SNX9 4 peptides4 peptides4 9.7 66.561 6.9032 + 1.841015717 2.652668635 Q71DI3 Histone H3.2 HIST2H3A 9 1 1 58.1 15.388 11.885 + 3.400787241 0.714480464 E9PQR7;Q548N1;Q9UK41;E9PLM9;E9PR04;E9PM90;A0A0S2Z5H5 Vacuolar protein sorting-associated protein 28 homolog VPS28 4 4 4 36 18.469 10.176 + 7.333678304 2.438052877 P52895;B4DK69;Q59GU2;S4R3P0 Aldo-keto reductase family 1 member C2 AKR1C2 17 4 4 65 36.735 10.248 + 3.5777705288.941107437 0.7124071122.286650976 B4DUG4;Q3MHD2;K7ELG9Q04828;H0Y804;A6NHU4;B4E0M1 ProteinAldo-keto LSM12 reductase homolog family 1 member C1 AKR1C1LSM12 183 183 31 38.967.2 12.50836.788 4.783835.678 ADP/ATP translocase 2;ADP/ATP translocase 2, N-terminally + 2.6042249723.894005921 0.7065159481.985486984 P22695;H3BRG4;H3BSJ9;H3BP04;H3BUE4;H3BUI9;A0A087WVZ4P05141;Q6NVC0;O75961 Cytochrome b-c1 complex subunit 2, mitochondrial UQCRC2SLC25A5 1011 1011 104 36.435.6 48.44232.852 37.10723.021 processed A0A0S2Z3S5;A0A0S2Z3H8;P63092;Q5JWF2;B0AZR9;Q5FWY2;Q1A8K5I0;A0A0G2JIW1;P0DMV9;P0DMV8;A0A1U9X7W4;B4DNT8; HEL-S- + 12.14322701 1.757443746 4455;S4R3E3;O60726;A0A0A0MR13;Q5JWE9;H0Y7F4;A0A1W2PRQ59EJ3;B4DFN9;B4DWK5;B4E388;B4DI39;B4E1S9;B4DVU9;V9GZ Heat shock 70 kDa protein 1B;Heat shock 70 kDa protein 1A 103;HSPA1B;HSPA 46 42 27 64 70.051 132.55 Guanine nucleotide-binding protein G(s) subunit alpha isoforms E1;A2A2R6;A0A1W2PQK2;O75633;A0A087WTB6;Q5JWD1;O756337;B3KTT5;B4DNX1;B4E1T6;Q9UQC1;B4DNV4 1A + 3.50445256 0.68833224 short;Guanine nucleotide-binding protein G(s) subunit alpha GNAS;GSA 9 9 8 28.9 44.266 13.871 2;H0Y7E8;A0A1W2PRJ7;S4R3V9;A0A087WZE5;A0A1W2PS82;Q93 + 2.840540652 1.472600778 P48507;D3DT44 isoformsGlutamate--cysteine XLas ligase regulatory subunit GCLM 4 4 4 20.8 30.727 47.613 020;A0A024R2Z1;Q5T697;B3KP89;A0A142CHG9;P11488;P63096; + 4.092307315 1.30824852 P19087;P09471;A8MTJ3;P04899Q4ZG72;A0A024RAH8;Q9NVP1;Q8N254;H7C452;Q59FR7 ATP-dependent RNA helicase DDX18 DDX18 8 8 8 19.6 61.594 23.036 + 1.347241987 1.297547023 Q5ISQ0;Q6FHV6;P09104;F5H0C8;A8K3B0 Gamma-enolase;Enolase TMED7-ENO2 6 2 2 21.4 45.54 2.0356 + 2.152876196 0.685355377 A0A0A6YYA0;Q9Y3B3;Q3B7W7;B4E2C1;G3V2Y2Q13501;E7EMC7;E3WH17;E3W990;B4DE82;B4E3V2;E9PFW8;C9J Transmembrane emp24 domain-containing protein 7 SQSTM1;SQSTM1- 3 3 3 18.6 21.233 6.8982 + 4.919939425 1.268679937 Sequestosome-1;Tyrosine-protein kinase receptor TICAM2;TMED7 13 13 13 47.7 47.687 49.535 + 2.319189851 0.668915558 A0A024RB02;Q86W92;F5GZP6;B4DFU8;H0YFE4;B3KM46RJ8;Q13502;D6RBF1;C9J6J8 Liprin-beta-1 PPFIBP1ALK 6 6 4 8.6 96.964 12.325 + 1.957199926 1.262562434 Q6FI51;Q6FHS4;P25685;M0R080;M0R1D6;M0QZD0;M0QYT3;M0Q5TEC6 Histone H3 HIST2H3PS2 5 1 1 27.2 15.43 2.4567 + 4.638306623 0.66754055 DnaJ homolog subfamily B member 1 DNAJB1 13 13 12 39.7 38.01 23.451 QXK0;B4DKK9;M0R128;A0A1B0GX60A0A1W2PNM1;B2RB06;A0A140VK76;B3KTT6;A0A0D9SFP2;Q168 + 1.215141734 1.247538249 36;A0A1W2PQ78;E9PF18;A0A1W2PQV5;A0A0A0MSE2;A0A1W2 Hydroxyacyl-coenzyme A dehydrogenase, mitochondrial HADH 6 6 6 28.8 29.096 14.276 + 3.088211938 0.654720465 A0A024RDB0;A0AVT1;B3KSS1;H0Y8S8PQC2;A0A1W2PRT2;A0A1W2PQ55;A0A1W2PP40 Ubiquitin-like modifier-activating enzyme 6 UBE1L2;UBA6 10 10 10 12.1 117.97 12.68

+ 1.765955638 0.652869225 B3KMB9;Q8NEU8;B7Z1Q8;A8K3E7;F8W124;F8VS86;F8VXB0P05023;B7Z3V1;B3KRI1;B1AKY9;B2RAQ4;A0A0S2Z3W6;P50993;B DCC-interacting protein 13-beta APPL2 5 5 5 13.9 74.432 10.913 + 3.490769308 1.19863383 Sodium/potassium-transporting ATPase subunit alpha-1 ATP1A1 20 20 11 26.8 112.89 42.3 7Z1N5;B3KX35;H0Y7C1;B4DIQ8;B3KW93;Q58I23;A0A2H4Y9T5 Dipeptidyl peptidase 4;Dipeptidyl peptidase 4 membrane + 3.173133528 0.649496714 P27487;F8WE17;H7C1N5;F8WBB6 DPP4 17 17 17 24.3 88.278 36.355 K7EK07;B2R4P9;P84243;B4E380;B2R6Y1;A8K4Y7;A0A1X8XL64;Q1 form;Dipeptidyl peptidase 4 soluble form H3F3B;H3F3A;HIST + 2.378280444 1.154080391 Histone H3;Histone H3.3;Histone H3.1t;Histone H3.3C 10 10 2 59.8 14.914 25.502 6695;K7EMV3;B4DEB1;K7ES00;Q6NXT2;Q6TXQ4;K7EP01 3H3;H3F3C + 3.278211214 0.644838015 Q71UA6;Q15758;M0QXM4;Q59ES3;M0QX44;B4DE27 Amino acid transporter;Neutral amino acid transporter B(0) SLC1A5 4 4 4 13.1 56.583 13.023 + 2.205199663 1.147715886 P31930;B4DUL5;B4DZB9 Cytochrome b-c1 complex subunit 1, mitochondrial UQCRC1 9 9 9 32.1 52.645 21.881 Q9BV61;Q59EK6;Q53G55;A0A140VJY2;Q12931;Q53FS6;Q8N9Z3; + 3.562072968 0.643105189 Heat shock protein 75 kDa, mitochondrial TRAP1 18 18 18 29.8 79.356 49.743 + 2.95518102 1.13394022 K0A7K7;Q5CAQ4;I3L0K7;I3L239;I3L253;I3L3L4;I3L2D5B7Z9S8;A6NGH2;Q6LEU2;A3KLL5;P05026;Q58I20;V9GYR2 Sodium/potassium-transporting ATPase subunit beta-1 ATP1B1 4 4 4 15.8 28.627 7.9529 + 3.182008086 0.641665777 A8K9K6;O00567;A0PJ92;H0Y653;H0YDU4;Q5JXT2;Q9BSN3 Nucleolar protein 56 NOP56 17 17 17 37.2 65.977 37.023 + 1.328052403 1.124486605 J3KN36;P69849;Q5JPE7;Q1LZN2;A0A087WTD5;H3BPS9 Nodal modulator 3;Nodal modulator 2 NOMO3;NOMO2 8 8 1 9.7 139.38 15.573 + 3.484762811 0.638024648 A8KAK1;Q9NYU2;H7BZG0;Q9NYU1;Q9NV86 UDP-glucose:glycoprotein glucosyltransferase 1 UGGT1 24 24 24 22.1 175 43.455 + 2.197013578 1.117386818 Q4VB24;B2R984;A3R0T8;A3R0T7;P10412;A1L407;P22492;Q02539 Histone H1.4 HIST1H1E 11 11 3 26.9 21.893 25.265 + 1.543783978 0.626900355 Q9Y3U8;J3QSB5 60S ribosomal protein L36 RPL36 4 4 4 22.9 12.254 22.429 A0A1B0GUA3;Q96EK5;A0A1B0GUH6;A0A1B0GTE6;A0A0D9SFK7; + 3.6873798081.614475604 0.6184145611.114023844 B7ZA56;B4E184;A0A0S2Z500;Q96CV9;A0A0S2Z5I6;A0A087WY28 KIF1-bindingOptineurin protein KIAA1279OPTN 53 53 53 16.47.5 74.78359.559 9.950410.6 A0A1B0GU96;A0A1B0GV79;A0A1B0GVY9;A0A1B0GW21 + 1.195894857 1.107519786 A0A024RAE4;B4E1U9;P60953;B7ZAY4;B4DMH5;A0A024RAE6;A0A8K3Y5;Q9NX58;D6RDJ1 Cell growth-regulating nucleolar protein LYAR 4 4 4 11.1 43.622 6.8963 + 2.673068048 1.101834615 A024R9T1;Q5JYX0;Q9UJM1;Q9UJM0;F8WET9;B1AH79;Q59FQ0;GB2R4R0;P62805;Q0VAS5;Q6B823 Histone H4 HIST1H4H;HIST1H4 13 13 13 62.1 11.367 142.25 + 2.029086768 0.617617925 E5KS60;E5KS55;Q6N0B1;Q9P2R7;A0A0S2Z511;B4DY87;Q7Z503;B Cell division control protein 42 homolog CDC42;hCG_39634DKFZp686D0880;S 7 7 6 43.5 21.258 11.227 3V4H1;G3V476;V9HWD0;Q7Z513;A0A024R692;Q59G91;P17081; Succinyl-CoA ligase subunit beta;Succinyl-CoA ligase [ADP- + 2.872749974 1.051732699 7ZAF6;B4DRV2;Q9Y4T0;B7Z9S6;Q5T9Q5;Q5T9Q8;A0A0U1RQF8; UCLA2;DKFZp564P 5 5 5 12.7 50.317 8.6612 Q9H4E5 forming] subunit beta, mitochondrial A0A0U1RQL1;A0A0U1RQU7 2062 + 1.448118946 0.616496563 A8K5D9;A0A024RA49;Q9NQW6;B4DSL6;H7C3S1;C9JJT6;H7C1C2E9PIR7;B2R5P6;F8W809;B7Z2S5;A0A087WSW9;A0A182DWI3;A0 Actin-binding protein anillin ANLN 6 6 6 8.9 119.9 8.6825 A024RBK9;A0A140VKC9;A0A087WSY9;Q16881;E2QRB9;Q6ZR44; + 7.630571263 1.00251929 Thioredoxin reductase 1, cytoplasmic TXNRD1 13 13 12 46.1 53.166 32.558 E9PKD3;E9PLT3;E9PKI4;A0A0B4J225;E9PIZ5;E9PRI8;E9PQI3;B3K Solute carrier family 2, facilitated glucose transporter member + 1.697364928 0.613102913 B7Z966;P11169;B7Z5A7;Q59F54;B7Z5J8;Q8TDB8;F5GYR5UQ5 3;Solute carrier family 2, facilitated glucose transporter member SLC2A3;SLC2A14 9 9 8 21 47.394 28.627 14 + 2.782179694 0.956207275 Q53GY1;O95817;C9JFK9 BAG family molecular chaperone regulator 3 BAG3 7 7 7 18.8 61.603 11.786 + 1.5785096642.989317786 0.6100931170.934076627 C9JLU1;P52434P11388;J3KTB7;E9PCY5;B4DKD0;Q71UH4;Q02880 DNA-directedDNA topoisomerase RNA polymerases 2-alpha I, II, and III subunit RPABC3 POLR2HTOP2A 48 48 48 39.96.9 16.925174.38 9.243613.681 + 4.8562754171.808585952 0.6070521670.933511893 Q96HE7;G3V3E6;G3V5B3;G3V503;G3V2H0;Q5TAE8;Q86YB8P51571;A6NLM8 ERO1-likeTranslocon-associated protein alpha protein subunit delta ERO1LSSR4 186 186 176 5937 54.39218.998 38.10911.519 + 2.690874463 0.606653372 O75607 Nucleoplasmin-3 NPM3 2 2 2 17.4 19.343 31.46 Q6NX51;B7Z321;Q96A65;B7Z4J9;Q8TAR2;B7Z689;B3KTF7;B7Z50 + 2.3097608521.356810885 0.9278259280.60609436 O75976;B7Z843 CarboxypeptidaseExocyst complex component D 4 EXOC4CPD 33 33 33 4.64.6 152.93110.5 17.8946.4749 Q6IBR0;Q53EP4;P04843;Q96HX3;B4DL99;B7Z4L4;B4DNJ5;F8WF32 Dolichyl-diphosphooligosaccharide--protein glycosyltransferase + 5.004899385 0.605189323 RPN1 18 18 18 34.6 68.606 37.5 2 subunitSingle-stranded 1 DNA-binding protein;Single-stranded DNA- + 4.675172079 0.913513819 Q567R6;A4D1U3;Q04837;A0A0G2JLD8;E7EUY5;B7Z268;C9K0U8 SSBP1 7 7 7 60.8 17.359 13.774 + 2.510435276 0.6046079 J3KS05;Q6IBN6;P83916;B5MD17;K7ELA4;C9JWS9;Q9Y654 Chromoboxbinding protein, protein mitochondrial homolog 1 CBX1 8 7 7 54.3 20.03 29.078 + 3.36512255 0.600270271 P78527;B4DL41;F5GX40 DNA-dependent protein kinase catalytic subunit PRKDC 40 40 40 12.9 469.08 58.466 + 2.57515175 0.528113683 P30837;B4DLJ0B4DJV2;A0A024RB75;O75390;A0A0C4DGI3;B3KTN4;Q0QEL2;H0Y Aldehyde dehydrogenase X, mitochondrial ALDH1B1 12 11 11 28.4 57.206 19.705 + 5.230384393 0.904898008 IC4;F8W4S1;B7Z1E1;F8VZK9;F8VX07;F8VRP1;F8VX68;F8VWQ5;F Citrate synthase;Citrate synthase, mitochondrial CS 11 11 10 35.3 50.431 21.26 8VPF9;F8VPA1;F8VTT8;F8W1S4;H0YH82;F8W642;F8VRI6

+ 3.534574928 0.890849304 Q9HDC9;H0Y512 Adipocyte plasma membrane-associated protein APMAP 7 7 7 24.5 46.48 15.333

B3KSG3;B7Z4H1;F8W8M4;A0A0A0MRL6;O14639;H0Y7N6;F6XFR5 + 1.937044861 0.889584223 Actin-binding LIM protein 1 ABLIM1 4 4 4 13 60.189 8.5168 ;B4DQA3;J3QSX6 B7Z9R8;E9PNM1;Q6IAX1;B4DND3;A0A1W2PQ47;P37268;B4DTK + 2.753430763 0.88890775 Squalene synthase FDFT1 4 4 4 14.2 40.753 7.0239 0;B4DWP0 Q584P1;B7Z3Y2;B7Z8A2;Q9UHG3;C9K055;F8W8W4;C9JM55;C9J + 5.197240327 0.886361758 Prenylcysteine oxidase 1 PCYOX1 9 9 9 24.6 45.138 15.314 GT6 + 4.649112529 0.885055224 A0A024R172;Q14914;F2Z3J9;Q5JVP2;F6XGT7 Prostaglandin reductase 1 LTB4DH;PTGR1 9 9 9 43.2 35.885 18.467 + 4.116973111 0.88087527 Q643R0;Q9ULW0;B3KM90;Q96FC3 Targeting protein for Xklp2 HCTP4;TPX2 12 12 12 26.1 85.652 18 Q14TF0;P48506;B4E2I4;E1CEI4;A0A0C4DGB2;D6R959;H0Y9I7;H0Y + 5.017323095 0.879188855 Glutamate--cysteine ligase catalytic subunit GCLC 17 17 17 32.2 72.765 31.866 AB6;D6REX4;D6RGF8 + 3.659302532 0.849722544 P16401;Q14463 Histone H1.5 HIST1H1B 8 8 8 26.1 22.58 16.5 + 2.747922464 0.84292539 B4DW11;H0YC35;P10909;H0YAS8 Clusterin;Clusterin;Clusterin beta chain;Clusterin alpha chain CLU 2 2 2 17.7 25.683 5.6022 B9A067;Q16891;C9J406;B4DS66;B4DQY2;B4DKR1;H7C463;B4DT2 + 3.158018265 0.836988131 MICOS complex subunit MIC60 IMMT 19 19 19 36.1 78.973 38.133 0;B4E2B5;Q05DN3;Q3B7X4;A0A087WYS0;D6RAW4 I1SRC5;A0A024RAV5;L7RSL8;P01116;Q92468;X5D945;Q5U091;P0 + 1.945351203 0.810368538 1112;P01111;Q14014;P78460;Q96T27;Q7KZ06;Q71SP6;A0A221ZR 9 4 4 45.3 33.983 6.4979 Z1;G3V4K2;A0A221ZRZ5;A0A221ZRZ4;G3V5T7;Q14015 Q01650;Q2MCL6;H0YJ95;A0A0C4DGL4;G3V4Z6;B4DVT0;Q96QB2; + 2.421839814 0.799834887 Large neutral amino acids transporter small subunit 1 SLC7A5 4 4 4 8.1 55.01 8.6462 A0A0S2Z502;A0A024R719;Q9UM01;Q92536 + 1.451372875 0.794154803 O43719;B4DRS4;Q5H919;Q5H918 HIV Tat-specific factor 1 HTATSF1 6 6 6 16.6 85.852 13.696 Q6FHM2;P62879;E7EP32;C9JXA5;C9JIS1;C9JZN1;Q9HAV0;A8K3F + 3.08762371 0.791423162 Guanine nucleotide-binding protein G(I)/G(S)/G(T) subunit beta-2 GNB2 12 7 7 51.2 37.331 31.628 6;C9JD14;H7C5J5

Ras-related C3 botulinum toxin substrate 1;Ras-related C3 A4D2P1;A4D2P0;P63000;P60763;A4D2P2;Q5G7L4;A0A024R1P2;P RAC1;RAC3;RAC2; + 2.380108802 0.782858149 botulinum toxin substrate 3;Ras-related C3 botulinum toxin 4 3 3 19.8 21.45 9.5183 15153;J3KSC4;J3QLK0;B1AH77;A0A024R9T5;B1AH78;B1AH80 hCG_20693 substrate 2 E5KNY5;P42704;B4DSR0;B8ZZ38;A0A0C4DG06;C9JCA9;A0A0C4D + 1.777302065 0.76747481 Leucine-rich PPR motif-containing protein, mitochondrial LRPPRC 37 37 37 32.9 157.9 82.635 G51;A8K6U0;A0A024R746;Q9NP80

+ 2.695951778 0.76606547 Q14684;Q6PJM8 Ribosomal RNA processing protein 1 homolog B RRP1B 8 8 8 13.9 84.427 17.35

+ 2.145882011 0.72242864 G1UI16;Q29RF7;B3KMN2;Q96DB6 Sister chromatid cohesion protein PDS5 homolog A PDS5A 7 7 7 6.8 150.83 12.718

+ 2.18213022 0.719856898 B2RAU5;Q9Y5X1;B3KXH8;O95061;Q59ES5 Sorting nexin;Sorting nexin-9 SNX9 4 4 4 9.7 66.561 6.9032

+ 3.400787241 0.714480464 E9PQR7;Q548N1;Q9UK41;E9PLM9;E9PR04;E9PM90;A0A0S2Z5H5 Vacuolar protein sorting-associated protein 28 homolog VPS28 4 4 4 36 18.469 10.176 + 3.577770528 0.712407112 B4DUG4;Q3MHD2;K7ELG9 Protein LSM12 homolog LSM12 3 3 3 38.9 12.508 4.7838

+ 2.604224972 0.706515948 P22695;H3BRG4;H3BSJ9;H3BP04;H3BUE4;H3BUI9;A0A087WVZ4 Cytochrome b-c1 complex subunit 2, mitochondrial UQCRC2 10 10 10 36.4 48.442 37.107 A0A0S2Z3S5;A0A0S2Z3H8;P63092;Q5JWF2;B0AZR9;Q5FWY2;Q1 4455;S4R3E3;O60726;A0A0A0MR13;Q5JWE9;H0Y7F4;A0A1W2PR Guanine nucleotide-binding protein G(s) subunit alpha isoforms E1;A2A2R6;A0A1W2PQK2;O75633;A0A087WTB6;Q5JWD1;O7563 + 3.50445256 0.68833224 short;Guanine nucleotide-binding protein G(s) subunit alpha GNAS;GSA 9 9 8 28.9 44.266 13.871 2;H0Y7E8;A0A1W2PRJ7;S4R3V9;A0A087WZE5;A0A1W2PS82;Q93 isoforms XLas 020;A0A024R2Z1;Q5T697;B3KP89;A0A142CHG9;P11488;P63096; P19087;P09471;A8MTJ3;P04899 TMED7- + 2.152876196 0.685355377 A0A0A6YYA0;Q9Y3B3;Q3B7W7;B4E2C1;G3V2Y2 Transmembrane emp24 domain-containing protein 7 3 3 3 18.6 21.233 6.8982 TICAM2;TMED7 + 2.319189851 0.668915558 A0A024RB02;Q86W92;F5GZP6;B4DFU8;H0YFE4;B3KM46 Liprin-beta-1 PPFIBP1 6 6 4 8.6 96.964 12.325 Q6FI51;Q6FHS4;P25685;M0R080;M0R1D6;M0QZD0;M0QYT3;M0 + 4.638306623 0.66754055 DnaJ homolog subfamily B member 1 DNAJB1 13 13 12 39.7 38.01 23.451 QXK0;B4DKK9;M0R128;A0A1B0GX60

+ 3.088211938 0.654720465 A0A024RDB0;A0AVT1;B3KSS1;H0Y8S8 Ubiquitin-like modifier-activating enzyme 6 UBE1L2;UBA6 10 10 10 12.1 117.97 12.68

+ 1.765955638 0.652869225 B3KMB9;Q8NEU8;B7Z1Q8;A8K3E7;F8W124;F8VS86;F8VXB0 DCC-interacting protein 13-beta APPL2 5 5 5 13.9 74.432 10.913 Dipeptidyl peptidase 4;Dipeptidyl peptidase 4 membrane + 3.173133528 0.649496714 P27487;F8WE17;H7C1N5;F8WBB6 DPP4 17 17 17 24.3 88.278 36.355 form;Dipeptidyl peptidase 4 soluble form

+ 3.278211214 0.644838015 Q71UA6;Q15758;M0QXM4;Q59ES3;M0QX44;B4DE27 Amino acid transporter;Neutral amino acid transporter B(0) SLC1A5 4 4 4 13.1 56.583 13.023 Q9BV61;Q59EK6;Q53G55;A0A140VJY2;Q12931;Q53FS6;Q8N9Z3; + 3.562072968 0.643105189 Heat shock protein 75 kDa, mitochondrial TRAP1 18 18 18 29.8 79.356 49.743 K0A7K7;Q5CAQ4;I3L0K7;I3L239;I3L253;I3L3L4;I3L2D5 + 3.182008086 0.641665777 A8K9K6;O00567;A0PJ92;H0Y653;H0YDU4;Q5JXT2;Q9BSN3 Nucleolar protein 56 NOP56 17 17 17 37.2 65.977 37.023

+ 3.484762811 0.638024648 A8KAK1;Q9NYU2;H7BZG0;Q9NYU1;Q9NV86 UDP-glucose:glycoprotein glucosyltransferase 1 UGGT1 24 24 24 22.1 175 43.455 + 1.543783978 0.626900355 Q9Y3U8;J3QSB5 60S ribosomal protein L36 RPL36 4 4 4 22.9 12.254 22.429 A0A1B0GUA3;Q96EK5;A0A1B0GUH6;A0A1B0GTE6;A0A0D9SFK7; + 3.687379808 0.618414561 KIF1-binding protein KIAA1279 5 5 5 16.4 74.783 10.6 A0A1B0GU96;A0A1B0GV79;A0A1B0GVY9;A0A1B0GW21 A0A024RAE4;B4E1U9;P60953;B7ZAY4;B4DMH5;A0A024RAE6;A0 A024R9T1;Q5JYX0;Q9UJM1;Q9UJM0;F8WET9;B1AH79;Q59FQ0;G + 2.029086768 0.617617925 Cell division control protein 42 homolog CDC42;hCG_39634 7 7 6 43.5 21.258 11.227 3V4H1;G3V476;V9HWD0;Q7Z513;A0A024R692;Q59G91;P17081; Q9H4E5

+ 1.448118946 0.616496563 A8K5D9;A0A024RA49;Q9NQW6;B4DSL6;H7C3S1;C9JJT6;H7C1C2 Actin-binding protein anillin ANLN 6 6 6 8.9 119.9 8.6825

Solute carrier family 2, facilitated glucose transporter member + 1.697364928 0.613102913 B7Z966;P11169;B7Z5A7;Q59F54;B7Z5J8;Q8TDB8;F5GYR5 3;Solute carrier family 2, facilitated glucose transporter member SLC2A3;SLC2A14 9 9 8 21 47.394 28.627 14

+ 1.578509664 0.610093117 C9JLU1;P52434 DNA-directed RNA polymerases I, II, and III subunit RPABC3 POLR2H 4 4 4 39.9 16.925 9.2436 + 4.856275417 0.607052167 Q96HE7;G3V3E6;G3V5B3;G3V503;G3V2H0;Q5TAE8;Q86YB8 ERO1-like protein alpha ERO1L 18 18 17 59 54.392 38.109 + 2.690874463 0.606653372 O75607 Nucleoplasmin-3 NPM3 2 2 2 17.4 19.343 31.46 + 2.309760852 0.60609436 O75976;B7Z843 Carboxypeptidase D CPD 3 3 3 4.6 152.93 17.894 Q6IBR0;Q53EP4;P04843;Q96HX3;B4DL99;B7Z4L4;B4DNJ5;F8WF3 Dolichyl-diphosphooligosaccharide--protein glycosyltransferase + 5.004899385 0.605189323 RPN1 18 18 18 34.6 68.606 37.5 2 subunit 1 + 2.510435276 0.6046079 J3KS05;Q6IBN6;P83916;B5MD17;K7ELA4;C9JWS9;Q9Y654 Chromobox protein homolog 1 CBX1 8 7 7 54.3 20.03 29.078 + 3.36512255 0.600270271 P78527;B4DL41;F5GX40 DNA-dependent protein kinase catalytic subunit PRKDC 40 40 40 12.9 469.08 58.466 + 2.57515175 0.528113683 P30837;B4DLJ0 Aldehyde dehydrogenase X, mitochondrial ALDH1B1 12 11 11 28.4 57.206 19.705 94

Table B4 (cont.) – Up- and down-regulated proteins on the Caco-2 sample treated with HNE in concentration 100 µM.

DOWN-REGULATED 100 µM Razor + unique Unique Significant -LOG(p-value) Difference Protein IDs Protein names Gene names Peptides Sequence coverage [%] Mol. weight [kDa] Score peptides peptides + 2.639735029 -0.601013819 Q14444;E9PLA9;G3V153;B4DSF8;B3KXU8 Caprin-1 CAPRIN1 16 16 16 29.2 78.365 70.668 + 2.521027395 -0.607437134 A0A024R179;Q09161;X6R941;F2Z2T1 Nuclear cap-binding protein subunit 1 NCBP1 25 25 25 42.5 91.838 40.856 + 3.019445615 -0.623521487 B8ZZU8;Q15370;I3L0M9;A0A0B4J296 Transcription elongation factor B polypeptide 2 TCEB2 6 6 6 88.5 12.527 25.415 + 2.542026423 -0.635217031 P62851 40S ribosomal protein S25 RPS25 8 8 8 37.6 13.742 14.142

+ 2.252493004 -0.650155226 A2VCL2 Coiled-coil domain-containing protein 162 CCDC162P 2 2 2 2.8 103.84 2.4803 + 2.645464368 -0.652169228 Q9P1F3;Q5SZC9 Costars family protein ABRACL ABRACL 3 3 3 39.5 9.0564 8.9503 + 1.883394466 -0.652745883 A2VCK8;P62328;Q0P5T0;Q0P5U7;Q0P5Q0;Q0P5P4;Q0P5N8 Thymosin beta-4;Hematopoietic system regulatory peptide TMSB4X 7 7 7 88.6 5.0526 18.838

+ 2.064138517 -0.654881096 Q9Y5L4;K7EIT2 Mitochondrial import inner membrane translocase subunit Tim13 TIMM13 5 5 5 53.7 10.5 12.556

+ 2.45772341 -0.66170152 B2R7S7;Q5T2W1;A0A024R4E8;A0A0C4DG67;A8MUH7 Na(+)/H(+) exchange regulatory cofactor NHE-RF3 PDZK1 10 10 10 31.6 57.081 43.571 SERBP1;DKFZp686 + 2.963694299 -0.663713137 Q8NC51;Q5VU21;Q63HR1;D3DQ70;D3DQ69 Plasminogen activator inhibitor 1 RNA-binding protein 14 14 14 41.2 44.965 34.804 P17171

Coiled-coil-helix-coiled-coil-helix domain-containing protein CHCHD2;CHCHD2P + 2.905227414 -0.670174281 Q9Y6H1;Q5T1J5 2;Putative coiled-coil-helix-coiled-coil-helix domain-containing 3 3 3 33.8 15.512 12.019 9 protein CHCHD2P9, mitochondrial

+ 2.029443103 -0.671827952 V9HWC4;Q6FIE5;Q9NRX4 14 kDa phosphohistidine phosphatase HEL-S- 2 2 2 21.6 13.832 4.9719 A0A024RAI5;A0A024RAJ2;A0A024RAF0;B7Z6Y2;A0A024RAI6;A0 A024RAG9;A0A024RAG8;A0A024RAE9;A0A024RAI4;A0A024RAF + 5.560021251 -0.674361229 1;O00499;Q9BTH3;A0A024RAJ3;A0A024RAF6;Q8WWH9;B7Z2Z2; Myc box-dependent-interacting protein 1 BIN1 20 20 20 52.9 53.02 43.521 Q8NFL3;S4R418;F5H0W4;B7Z6F3;A0A0A0MT10;B2RAR0;A0A087 X188;A8K5Y2;Q9UBW5;P49418

G3V4W0;B4DY08;B2R5W2;G3V4C1;B2R603;A8K9A4;P07910;G3V Heterogeneous nuclear ribonucleoproteins C1/C2;Heterogeneous HNRNPC;hCG_164 576;Q5IST8;G3V2Q1;B3KX96;G3V555;G3V575;D3DPI2;G3V5X6;G nuclear ribonucleoprotein C-like 1;Heterogeneous nuclear 1229;HNRNPCL1;H + 2.442103717 -0.685183843 3V3K6;G3V251;A0A0G2JNQ3;O60812;B2RXH8;B4DMJ1;B4DSU6; ribonucleoprotein C-like 2;Heterogeneous nuclear 20 20 20 59.2 28.916 43.325 NRNPCL2;HNRNPC B4DQQ2;G3V2D6;A0A0G2JPF8;P0DMR1;B7ZW38;G3V4M8;G3V5 ribonucleoprotein C-like 4;Heterogeneous nuclear L4;HNRNPCL3 V7;G3V2H6;Q6PKD2;Q569J8;E5RG71;B3KT61;B3KSX3;Q86SE5 ribonucleoprotein C-like 3

+ 3.16944854 -0.705166181 P51991;B4E036 Heterogeneous nuclear ribonucleoprotein A3 HNRNPA3 25 25 4 46.8 39.594 92.438 GRWD1;DKFZp564 + 3.039809861 -0.714096705 Q8NCK5;Q9BQ67;Q8TCJ8;B2RDP6;B4DTI1;M0QX71 Glutamate-rich WD repeat-containing protein 1 7 7 7 27.8 49.346 11.368 C172 + 1.951687923 -0.715422948 B2R4N3;A0A024R7B0;Q9BZL1 Ubiquitin-like protein 5 UBL5 2 2 2 28.8 8.5478 3.5723 + 2.370193597 -0.716721217 A0A024R306;Q9Y2S6 Translation machinery-associated protein 7 CCDC72;TMA7 4 4 4 40.6 7.0662 5.8922 + 1.71000447 -0.719799042 Q6IBA2;Q59G24;P53999;B7Z1Z0;Q6E433 Activated RNA polymerase II transcriptional coactivator p15 PC4;SUB1 8 8 8 42.5 14.395 29.582 + 7.972800205 -0.720065435 A0A024RCA7;P05387;H0YDD8 60S acidic ribosomal protein P2 RPLP2 10 10 10 80.9 11.665 89.165 + 7.209432907 -0.733125369 L0R512 NCAM2 1 1 1 21.4 4.7347 1.7464 + 2.497565446 -0.743067106 P05161;A0A096LNZ9;A0A096LPJ4 Ubiquitin-like protein ISG15 ISG15 5 5 5 46.1 17.887 10.771 + 5.55243249 -0.788380941 Q99584;D3DV53 Protein S100-A13 S100A13 7 7 7 72.4 11.471 16.423 + 1.838233361 -0.791659355 A0A140VJK1;O76003 Glutaredoxin-3 GLRX3 12 12 12 34.9 37.432 21.439 + 2.308424346 -0.791750272 P18077;F8WBS5;F8WB72;C9K025 60S ribosomal protein L35a RPL35A 7 7 7 36.4 12.538 11.213 Q5RKT7;B2RDW1;P62979;A0A248RGE3;J3QS39;J3QTR3;F5H6Q2; F5GYU3;F5H2Z3;F5H265;Q5UGI3;B4DV12;F5H388;F5H747;F5GXK RPS27A;HEL112;UB 7;J3QKN0;Q5U5U6;Q5PY61;Q96C32;Q96MH4;L8B4I8;L8B196;L8B Ubiquitin-40S ribosomal protein S27a;Ubiquitin;40S ribosomal + 3.559757824 -0.807513555 B;UBC;UbC;DKFZp 15 15 3 72.4 17.905 49.631 4R0;A8K674;L8B4Z6;L8B4M0;L8B4J3;P0CG47;P0CG48;Q9UFQ0;Q protein S27a;Polyubiquitin-B;Ubiquitin;Polyubiquitin-C;Ubiquitin 434K0435;UBA52 96H31;Q66K58;Q59EM9;M0R1V7;Q49A90;Q8WYN9;F5GZ39;J3QS A3;A8CGI2 Q5U0D2;Q01995;Q6FI52;H0YCU9;Q59FA5;E9PJ32;Q96FG7;B0YJ5 + 6.457667513 -0.808653514 Transgelin TAGLN 20 20 2 87.1 22.611 56.418 8;Q7Z517 + 4.719759513 -0.809478124 P62081;B5MCP9;Q59FS3 40S ribosomal protein S7 RPS7 17 17 17 66.5 22.127 36.294 + 3.727084781 -0.831898371 Q496C9;Q8TEA8;A0A087WZV9 D-tyrosyl-tRNA(Tyr) deacylase;D-tyrosyl-tRNA(Tyr) deacylase 1 DTD1 4 4 4 25.8 23.483 8.0404 A0A087WZG9;B4DSP0;A0A087WXK2;A0A087WUL4;A0A087WX2 + 8.153146372 -0.846232096 Retrotransposon-derived protein PEG10 PEG10 6 6 6 17.5 40.471 12.53 3;Q86TG7 + 2.486983702 -0.851186752 Q76LA1;P04080;A0A1W2PS52;A0A1W2PQG6 Cystatin-B CSTB 4 4 4 70.4 11.139 16.104 + 4.0678599 -0.86452891 Q9Y5J9;G3XAN8;E9PIR3 Mitochondrial import inner membrane translocase subunit Tim8 B TIMM8B 4 4 4 42.2 9.3435 10.327 Q13509;Q9BV28;Q53G92;B2RBD5;A0A0B4J269;Q3ZCR3;G3V2A3; + 3.692261088 -0.867581685 Tubulin beta-3 chain TUBB3 21 5 5 57.6 50.432 18.038 G3V5W4;G3V2R8;G3V3R4;G3V2N6;G3V3J6;G3V3W7 O75821;Q6IAM0;K7EL20;A8K5K5;K7ER90;K7ENA8;K7EP16;B0AZ + 2.506411337 -0.887653033 Eukaryotic translation initiation factor 3 subunit G EIF3G 13 13 13 45.6 35.611 43.455 V5;K7EL60;B7Z1G0;K7ENH0 + 5.973932631 -0.920789401 P29966;Q6NVI1;Q05C82 Myristoylated alanine-rich C-kinase substrate MARCKS 12 12 12 49.4 31.554 39.347 A0A0D9SF70;G5E9L0;Q8N6H7;E9PQP3;B4E327;B3KV00;H0YDN9; + 3.689802662 -0.9528265 E9PIY6;E9PMU9;E9PMR9;E9PM63;E9PNY8;H0YDX1;E9PJT7;E9PK ADP-ribosylation factor GTPase-activating protein 2 ARFGAP2 6 6 5 20.5 42.153 25.421 28;E9PN48;B7Z6H9 + 1.573676497 -0.954432869 P26232;C9J144;C9IZ88 Catenin alpha-2 CTNNA2 7 1 1 8.5 105.31 2.9395 Myosin regulatory light chain 12A;Myosin regulatory light chain + 3.713899459 -0.970846494 J3QRS3;P19105;O14950;Q53HL1;Q2F834;J3KTJ1 MYL12A;MYL12B 9 9 4 60.5 20.457 57.71 12B + 2.867893193 -0.974391301 Q71UI9;P0C0S5;C9J0D1;C9J386 Histone H2A.V;Histone H2A.Z;Histone H2A H2AFV;H2AFZ 6 4 4 53.9 13.509 14.824

Eukaryotic translation initiation factor 5A;Eukaryotic translation EIF5A;EIF5AL1;EIF5 + 6.21485533 -1.064674377 I3L397;I3L504;P63241;Q6IS14;F8WCJ1;C9J7B5;C9J4W5;Q9GZV4 initiation factor 5A-1;Eukaryotic translation initiation factor 5A-1- 14 14 14 76.7 16.019 48.479 A2 like;Eukaryotic translation initiation factor 5A-2

+ 1.681852867 -1.080378532 Q8N543;H3BUA6;B4E360;A0A0A0MTR2;H3BR79 Prolyl 3-hydroxylase OGFOD1 OGFOD1 3 3 3 10 63.245 3.2959 O15230;A0A087WYH7;A0A075B783;Q566Q1;A0A0A0MSA0;B0YJ + 2.043215896 -1.13339138 Laminin subunit alpha-5 LAMA5 31 31 31 12.8 399.73 61.532 32;Q16787 + 1.398235145 -1.206366062 O75506 Heat shock factor-binding protein 1 HSBP1 2 2 2 34.2 8.5435 6.7723 + 3.217315486 -1.317394733 P15531 Nucleoside diphosphate kinase A NME1 13 3 3 86.2 17.149 29.847 + 3.234450072 -1.370045344 P07108;B8ZWD1;A0A024RAF2;A0A0A0MTI5;B8ZWD9;B8ZWD8 Acyl-CoA-binding protein DBI 9 9 9 90.8 10.044 72.733 B7Z6P1;P68133;P68032;D2JYH4;A8K3K1;P63267;P62736;Q7Z7J6; A6NL76;B3KW67;B4DUI8;F8WB63;B8ZZJ2;C9JFL5;F6UVQ4;F6QU T6;B7Z6I1;B3KPP5;F8WCH0;K4EQ44;K4EP00;K4ENJ5;A0A173GM Actin, alpha skeletal muscle;Actin, alpha cardiac muscle 1;Actin, ACTA1;ACTC1;ACT + 3.791575799 -1.941323407 X0;Q562Y4;Q562X7;Q562X6;Q562X4;Q562X2;Q562X1;Q562X0;Q5 21 3 1 38.2 38.579 9.5728 gamma-enteric smooth muscle;Actin, aortic smooth muscle A2;ACTG2 62W2;Q562U1;Q562U0;Q562T8;Q562T6;Q562T5;Q562T4;Q562T3; Q562T0;Q562R6;Q562R5;Q562R0;Q562Q9;Q562Q8;Q562Q6;Q56 2Q4;Q562Q2;Q562Q0;Q562P8;Q562L3 + 1.633555577 -2.22867775 F8WBN3;B9A031;C9JQZ0;B8ZZY7;F8WBV6;P84101 Small EDRK-rich factor 2 SERF2 2 2 2 27.5 4.5792 3.7333

95

Appendix C – Statistical evidence of replicate reproducibility in HNE-treated Caco- 2 samples

Figure C1 – Multi-scatter plot representation (with Pearson Correlation Coefficient) of LFQ intensity amongst replicates for the identified proteins on the Caco-2 samples: control and treated with HNE in concentration 0.1 µM, 1 µM, 10 µM and 100 µM.

96

Appendix D – List of post-translational modifications by HNE in HNE-treated Caco- 2 samples

The lists represent the identified proteins with non-rejected peptide modifications by HNE obtained with Proteome Discoverer software. Several information was exported for each quantified protein: protein accession number, protein name, total peptides number, unique peptides number, percentage of coverage, molecular weight, isoelectric point (pI), algorithm scores, besides the modified residue in the annotated sequence, with the monoisotopic mass over charge ratio (m/z), theoretical mass shift, retention time (RT) and Xcorr value.

Table D1 – HNE-modified proteins on the Caco-2 control sample. All modifications report to the formation of Michael adducts, except the formation of pentylpyrrole on Calpastatin.

MODIFICATIONS HNE - CTR Acc # Protein # peptides # unique peptides Sum PEP Score Coverage MW (kDa) calculated pI Score Sequest AA mod Annotated Sequence m/z Theo. MH+ (Da) RT (min) Xcorr P35579 Myosin-9 130 112 760.967 60% 226.4 5.6 1564.87 [R6]D [R].QAQQErDELADEIANSSGK.[G] 749.0331 2245.09425 45.8834 3.89 P21333 Filamin-A 109 99 604.482 54% 280.6 6.06 1133.3 [R29]E [K].VGSAADIPINISETDLSLLTATVVPPSGrEEPcLLK.[R] 980.527 3919.0984 115.0381 6.48 Q15149 Plectin 99 96 388.575 29% 531.5 5.96 583.4 [R19]G [K].EQELQQTLQQEQSVLDQLrGEAEAAR.[R] 789.1531 3153.60221 97.5155 3.80 P08238 Heat shock protein HSP 90-beta 52 31 376.185 65% 83.2 5.03 1617.1 [R19]G [R].VFIMDScDELIPEYLNFIr.[G] 844.0873 2530.26079 116.4332 4.75 P07900 Heat shock protein HSP 90-alpha 54 35 353.741 65% 84.6 5.02 1584.28 [R19]G [R].VFIMDNcEELIPEYLNFIr.[G] 857.7613 2571.28734 116.1292 4.29 O43707 Alpha-actinin-4 54 40 317.762 67% 104.8 5.44 697.18 [H13]S [K].IcDQWDALGSLThSR.[R] 479.4862 1914.93781 67.9762 3.40 P14618 Pyruvate kinase PKM 37 37 276.431 72% 57.9 7.84 880.2 [R22]V [R].GIFPVLcKDPVQEAWAEDVDLr.[V] 679.0996 2713.39055 98.4651 3.19 P31939 bifunctional purine biosynthesis protein purH 42 42 237.109 84% 64.6 6.71 494.51 [K23]W [R].AEISNAIDQYVTGTIGEDEDLIk.[W] 884.1121 2650.33454 107.6084 4.33 [H19]E [R].VIISAPSADAPMFVMGVNhEK.[Y] 593.06 2369.22434 77.1297 3.95 P04406 glyceraldehyde-3-phosphate dehydrogenase 24 24 221.999 79% 36 8.46 919.49 [K21]Y [R].VIISAPSADAPMFVMGVNHEk.[Y] 593.06 2369.22434 77.1297 4.00 [K27]I [R].VIISAPSADAPMFVMGVNHEKYDNSLk.[I] 773.1438 3089.5686 78.1213 5.02

Q9BQE3 Tubulin alpha-1C chain 31 3 188.02 82% 49.9 5.1 706.54 [H4]P [R].QLFhPEQLITGKEDAANNYAR.[G] 643.5813 2571.32017 66.5164 4.14

Q71U36 Tubulin alpha-1A chain 30 1 184.88 78% 50.1 5.06 783.19 [H4]P [R].QLFhPEQLITGKEDAANNYAR.[G] 643.5813 2571.32017 66.5164 4.14 P20810 Calpastatin 26 26 166.95 54% 76.5 5.07 276.95 [K12]P [K].EGITGPPADSSkPIGPDDAIDALSSDFTcGSPTAAGK.[K] 931.4462 3722.77418 90.4237 2.78 [H4]P [R].QLFhPEQLITGKEDAANNYAR.[G] 643.5813 2571.32017 66.5164 4.14 P68366 Tubulin alpha-4A chain 25 5 166.37 59% 49.9 5.06 678.03 [K13]V [R].SIQFVDWcPTGFk.[V] 580.9556 1740.86655 83.1023 3.38 Q13748-1 tubulin alpha-3C/D chain 22 0 142.733 57% 49.9 5.1 604.85 [H4]P [R].QLFhPEQLITGKEDAANNYAR.[G] 643.5813 2571.32017 66.5164 4.14 P12277 Creatine kinase B-type 20 20 140.296 69% 42.6 5.59 362.32 [R21]L [R].GTGGVDTAAVGGVFDVSNADr.[L] 707.6829 2121.04584 71.2255 4.04 P51991-1 Heterogeneous nuclear ribonucleoprotein A3 22 4 130.868 45% 39.6 9.01 275.7 [R16]E [K].LFIGGLSFETTDDSLr.[E] 643.0105 1927.00588 90.5109 5.67 Q32P51 Heterogeneous nuclear ribonucleoprotein A1-like 2 16 1 101.361 38% 34.2 9.00 326.24 [R16]D [R].SHFEQWGTLTDcVVMrDPNTK.[R] 670.0701 2677.27487 112.5313 3.10

Table D2 – HNE-modified proteins on the Caco-2 sample treated with HNE in concentration 0.1 µM. All modifications report to the formation of Michael adducts, except the formation of pentylpyrrole on Calpastatin.

MODIFICATIONS HNE - 0.1 µM Acc # Protein # peptides # unique peptides Sum PEP Score Coverage MW (kDa) calculated pI Score Sequest AA mod Annotated Sequence m/z Theo. MH+ (Da) RT (min) Xcorr [R6]D [R].QAQQErDELADEIANSSGK.[G] 562.02461 2245.09425 47.5763 4.01 P35579 Myosin-9 110 94 681.04 54% 226.4 5.6 1547.4 [R11]A [R].QLEEAEEEAQrANASR.[R] 662.99083 1986.97268 36.2978 3.03 P08238 Heat shock protein HSP 90-beta 57 26 376.626 67% 83.2 5.03 1676.95 [R19]G [R].VFIMDScDELIPEYLNFIr.[G] 844.08873 2530.26079 116.3031 4.39 Q15149 Plectin 80 78 312.833 22% 531.5 5.96 545.8 [R19]G [K].EQELQQTLQQEQSVLDQLrGEAEAAR.[R] 789.15423 3153.59509 97.8126 3.36 O43707 Alpha-actinin-4 51 36 276.934 66% 104.8 5.44 633.46 [H13]S [K].IcDQWDALGSLThSR.[R] 479.48633 1914.93781 68.8971 3.43 P31939 bifunctional purine biosynthesis protein purH 41 41 237.904 82% 64.6 6.71 500.29 [K23]W [R].AEISNAIDQYVTGTIGEDEDLIk.[W] 884.11183 3560.68699 107.9773 3.00 P04406-1 glyceraldehyde-3-phosphate dehydrogenase 24 24 224.473 73% 36 8.46 926.94 [K27]I [R].VIISAPSADAPMFVMGVNHEKYDNSLk.[I] 618.71701 3089.5686 79.9858 4.96 P60174 Triosephosphate isomerase 27 27 202.458 79% 30.8 5.92 578.6 [K13]V [R].HVFGESDELIGQk.[V] 538.94689 1614.83735 46.2216 3.42 P0DMV8 heat shock 70 kDa protein 1A 32 23 188.603 54% 70 5.66 382.5 [K1]E [R].kELEQVcNPIISGLYQGAGGPGPGGFGAQGPK.[G] 835.67756 3339.6884 87.7891 5.58 P46940 Ras GTPase-activating-like protein IQGAP1 33 32 145.644 28% 189.1 6.48 266.81 [H17]A [K].IGGILANELSVDEAALhAAVIAINEAIDR.[R] 779.42806 3114.70449 122.0868 3.34 P20810 Calpastatin 23 23 136.429 45% 76.5 5.07 216.49 [K12]P [K].EGITGPPADSSkPIGPDDAIDALSSDFTcGSPTAAGK.[K] 931.4417 3722.77418 89.9919 3.95 P35241-1 radixin 27 19 119.202 41% 68.5 6.37 255.08 [K1]P [R].kPDTIEVQQMK.[A] 491.6015 1472.80288 27.7567 4.08 P15311 Ezrin 31 23 110.527 48% 69.4 6.27 274.33 [K1]P [R].kPDTIEVQQMK.[A] 491.6015 1472.80288 27.7567 4.08 P16422 Epithelial cell adhesion molecule 8 8 52.565 28% 34.9 7.46 110.19 [K2]P [R].AkPEGALQNNDGLYDPcDESGLFK.[A] 728.09287 2909.35093 71.4835 4.71

Table D3 – HNE-modified proteins on the Caco-2 sample treated with HNE in concentration 100 µM. All modifications report to the formation of Michael adducts.

MODIFICATIONS HNE - 100 µM Acc # Protein # peptides # unique peptides Sum PEP Score Coverage MW (kDa) calculated pI Score Sequest AA mod Annotated Sequence m/z Theo. MH+ (Da) RT (min) Xcorr P21333 Filamin-A 78 71 337.949 41% 280.6 6.06 635.34 [R29]E [K].VGSAADIPINISETDLSLLTATVVPPSGrEEPCLLK.[R] 980.52556 3919.0984 115.7036 5.86 P08238 Heat shock protein HSP 90-beta 55 32 313.276 66% 83.2 5.03 1570.4 [R19]G [R].VFIMDScDELIPEYLNFIr.[G] 844.08645 2530.26079 116.8056 3.69 Q15149 Plectin 71 69 244.662 20% 531.5 5.96 375.05 [R19]G [K].EQELQQTLQQEQSVLDQLrGEAEAAR.[R] 789.15291 3153.60221 98.7809 3.92 P11142 Heat shock cognate 71 kDa protein 34 28 227.252 50% 70.9 5.52 723.69 [K11]H [R].FDDAVVQSDMk.[H] 705.83908 1410.6821 46.9487 3.61 P14618 Pyruvate kinase PKM 39 39 203.892 77% 57.9 7.84 850.91 [K22]C [R].LAPITSDPTEATAVGAVEASFk.[C] 777.74454 2331.23297 90.8321 3.40 Q9NY65 Tubulin alpha-8 chain 15 1 96.953 37% 50.1 5.06 435.19 [H15]S [K].LTDAcSGLQGFLIFhSFGGGTGSGFTSLLMER.[L] 880.68592 3519.72185 121.4466 2.55

97

Appendix E – Poster communication P080, XXVII Congresso della Divisione di Chimica Analitica, Bologna 16-20 September 2018

The work developed led to the presentation of a poster entitled “Quantitative analyses of protein profiling to study the dual protective/damaging effect of 4-hydroxynonenal (HNE) on the intestinal barrier” at XXVII Congresso della Divisione di Chimica Analitica della Società Chimica Italiana, 16-20 September 2018, Bologna, Italy. The abstract is present below and also available at: https://analitica2018.it/event/1/attachments/1/25/Abstract_book_Analitica_2018.pdf (page 249) Authors: Alessandra Altomare, Giacomo Mosconi, Catarina Guimarães Faria, Giancarlo Aldini, Marina Carini, Alfonsina D’Amato Department of Pharmaceutical Sciences, Università degli Studi di Milano, Via Luigi Mangiagalli 25, 20133, Milan, Italy

98