Biomarkers and genetics of canine visceral haemangiosarcoma

Patharee Oungsakul Doctor of Veterinary Medicine, Master of Veterinary Studies

0000-0003-4523-3484

A thesis submitted for the degree of Doctor of Philosophy at The University of Queensland in Year 2020

School of Veterinary Science

Abstract

Visceral haemangiosarcoma (HSA) is a vascular endothelial cell cancer that arises in internal organs, especially in the . HSA carries a very poor prognosis but is also difficult to diagnose. Therefore, dogs with HSA symptoms are often euthanized without full investigation. The goal of the thesis was to improve the diagnosis and prevention of this cancer by exploring several HSA biomarkers.

The thesis comprises four independent biomarker studies regarding glycoproteins, genetics and Infrared spectral markers. The aim of the first study (chapter 2) was to validate the candidates for canine visceral HSA serum biomarkers, by performing tissue immuno- and lectin-labelling of HSA (n = 32) and HSA-like (n=26) formalin-fixed paraffin-embedded (FFPE) tissues with C7 (component complement 7), MGAM (maltase-glucoamylase), VTN (vitronectin) antibodies and DSA (Datura stramonium), WGA (Wheat germ agglutinin), SNA (Sambucus nigra) and PSA (Pisum sativum)lectins. The results showed lectin/immuno-positive signal on HSA cells, endothelial cell of the veins and arteries. IHC and LHC signal intensities were heterogeneous across the tissue types, diagnosis groups (HSA vs HSA-like) and tissue markers. A semi-quantitative assay was applied to assess and compare the levels of IHC/LHC signal intensities of each marker. Among the candidates tested, complement component 7 (C7) and DSA binding glycoproteins were considered the most promising tissue markers as they demonstrated the ability to distinguish HSA tissue from HSA-like tissues (e.g. hematoma), by the comparison of glycoprotein expression level.

HSA is overrepresented in Golden Retrievers, German Shepherds and Labrador Retrievers which suggested that genetic background plays an important role in this cancer. Inherited germline mutation was hypothesized to be the major factor. The aim of chapter 3 was to identify germline variants/polymorphisms that were related to HSA, allowing for development of HSA predisposing genetic markers in the future. Whole exome sequence data analysis was performed in 28 dogs with HSA. The sources of data included novel whole exome sequencing from the samples collected in this study (n = 11) and database archival data (n = 17). Control data (n = 247) were obtained from sequence read archive databases. Exome variants association scan was performed in parallel with functional annotation analysis. The results showed genetic variants/polymorphisms in four , including PLSCR3, FANCM, MRPL40 and ZNF704, were strongly associated with HSA. The allele frequencies were detected in over 80% of HSA cases and were mostly homozygous. While the allele frequencies in the control were around 50%, none was homozygous. set enrichment analysis showed five molecular pathways where genes with HSA-related variants resided, including

i

Interferon gamma, Complement, Adipogenesis, Genes down-regulated in CD4 T-cells, and Genes down-regulated in marginal zone B-lymphocytes.

In chapter 4, somatic mutations were investigated further in five samples, where matched blood- tumour samples were available, by performing whole exome sequencing and bioinformatics data analysis. Gene mutation candidates were manually selected based on; 1) their annotated functions, 2) co-mutated candidates which most HSA tumours have in common, and 3) genes that had high numbers of mutations. Co-mutated (more than 60% of samples) somatic mutations were found in ADCY1, CNN2, ARHGAP45, ATXN2L, CYP1A2, KRT3, SBNO2, TUFM and U6 genes. When considering the severity of molecular functions that were disrupted by gene mutations, genes containing one or more binding sites for MZF1 (myeloid zinc finger 1) in their regions (DUSP6, NFKBID, COPS3, KPNB1, KIF21A, SERF2, PABPC1L and COL24A1) were found to have undergone mutations in canine HSA tumours. Genes with high numbers of mutations included GLUL, RIDA, ESRRA, COL11A2, ZNF296, RSAD2, RIN2, XRCC1, POLRMT, MTFR2 and PSMD4, with high numbers of the mutations occurring in the 3'-UTR. Combining all the mutations, the mitotic spindle pathway was found to be disrupted by mutations in NF1, SASS6, CLASP1, ARHGEF3 and STAU1 genes. The proportion of samples having mutations on gene TP53 and PIK3CA in this study cohort was 20%.

The study in chapter 5 explored the capability of Fourier Transform Infrared (FTIR) imaging plus Partial Least Squares-Discriminant analysis (PLS-DA) in discriminating FFPE spleen tissue sections containing HSA (cancer), which could lead to the development of rapid, label-free tissue diagnostic technologies to aid in canine HSA differential diagnosis. FTIR images of 26 splenic HSA and 32 non-HSA splenic lesions were blind tested. The prediction model correctly identified 24 out of 26 HSA samples while 21 non-HSA samples were incorrectly predicted as cancer, which yielded a sensitivity of 92% and 34% specificity. Potential improvement for this technology is suggested.

The studies in this thesis contributed to further development of canine visceral HSA biomarkers such as differential diagnostic markers, disease predisposing genetic markers and genomic markers for personalized treatments.

ii

Declaration by author

This thesis is composed of my original work, and contains no material previously published or written by another person except where due reference has been made in the text. I have clearly stated the contribution by others to jointly authored works that I have included in my thesis.

I have clearly stated the contribution of others to my thesis as a whole, including statistical assistance, survey design, data analysis, significant technical procedures, professional editorial advice, financial support and any other original research work used or reported in my thesis. The content of my thesis is the result of work I have carried out since the commencement of my higher degree by research candidature and does not include a substantial part of work that has been submitted to qualify for the award of any other degree or diploma in any university or other tertiary institution. I have clearly stated which parts of my thesis, if any, have been submitted to qualify for another award.

I acknowledge that an electronic copy of my thesis must be lodged with the University Library and, subject to the policy and procedures of The University of Queensland, the thesis be made available for research and study in accordance with the Copyright Act 1968 unless a period of embargo has been approved by the Dean of the Graduate School.

I acknowledge that copyright of all material contained in my thesis resides with the copyright holder(s) of that material. Where appropriate I have obtained copyright permission from the copyright holder to reproduce material in this thesis and have sought permission from co-authors for any jointly authored works included in the thesis.

iii

Publications included in this thesis

No publication included.

iv

Submitted manuscripts included in this thesis

No manuscripts submitted for publication.

v

Other publications during candidature

Conference abstracts

1. Patharee Oungsakul, Caroline O’Leary, Michelle M. Hill, Helle Bielefeldt-Ohmann, David Duffy. (2019) Identifying visceral haemangiosarcoma related germline variants in dogs using whole exome sequencing [Poster presentation and flash talk oral presentation] Proceeding of the 10th International Conference on Canine and Feline Genetics and Genomics (ICCFGG). May 26 – 29, 2019, Bern, Switzerland.

2. Patharee Oungsakul, Caroline O’Leary, Michelle M. Hill, David Duffy, Helle Bielefeldt- Ohmann. (2018) Genetic variations associated with splenic haemangiosarcoma in the dog [Oral presentation, Best oral presentation] at School of Veterinary Science Research Conference. July 20, 2018, the University of Queensland, Gatton Campus

3. Patharee Oungsakul, Caroline O’Leary, Michelle M. Hill, David Duffy, Helle Bielefeldt- Ohmann. (2017) Validation of candidate glycoprotein biomarkers for canine visceral haemangiosarcoma using semi-quantitative lectin/immunohistochemical assay [Oral presentation] School of Veterinary Science Research Conference. September 15, 2017, the University of Queensland, Gatton Campus.

4. Patharee Oungsakul, Caroline O’Leary, Michelle M. Hill, David Duffy, Helle Bielefeldt- Ohmann. (2016) Improving the detection and prognostication of canine haemangiosarcoma [Oral presentation] Program and Abstract of School of Veterinary Science Research Conference. October 27, 2016, the University of Queensland, Gatton Campus.

Manuscripts in preparation

1. Patharee Oungsakul, David Perez Guaita, Alok K. Shah, Eunju Choi, David Duffy, Helle Bielefeldt-Ohmann, Bayden Wood, Michelle M. Hill. Use of FTIR imaging for differential diagnosis of splenic canine hemangiosarcoma. [Manuscript in preparation]

2. Patharee Oungsakul, Eunju Choi, Alok K. Shah, Ahmed Mohamed, Caroline O’Leary, David Duffy, Michelle M. Hill, Helle Bielefeldt-Ohmann. Validation of candidate glycoprotein biomarkers for canine visceral hemangiosarcoma using semi-quantitative lectin/immunohistochemical assays. [Manuscript in preparation]

vi

Contributions by others to the thesis

Dr. Helle Bielefeld-Ohmann has contributed to the conceptual design of this PhD research project and has contributed to the pathologic diagnosis for the samples included in the studies in chapter 2 and chapter 5.

A/Prof. Michelle Hill has contributed to the conceptual design of studies included in chapter 2 and chapter 5. A/Prof David Duffy and Dr. Caroline O'Leary have contributed to the conceptual design of the genetics studies included in chapter 3 and chapter 4 of this thesis.

Dr. David Perez Guaita has contributed to data analysis in cancer prediction model construction and testing which was included in chapter 5.

Dr. Eunju Choi has contributed to sample procurement for chapter 5.

Dr. Helle Bielefeldt-Ohmann, A/Prof. Michelle Hill, A/Prof. David Duffy, Dr. David Perez Guaita and Dr. Alok Shah contributed to varying degree to the writing and editing of the thesis and manuscripts both submitted and in preparation that made up part or all of Chapters 2, 3 and 5.

Statement of parts of the thesis submitted to qualify for the award of another degree

No works submitted towards another degree have been included in this thesis

vii

Research involving or animal subjects

Some of the samples used in the projects (chapter 2, 3, 4, 5) were obtained from client-owned animals. Sample collection was performed under the animal ethics approval certificate (AEC Approval Number SVS/438/15/KIBBLE/CRF) with client’s consent forms. The animal ethics approval certificate is provided in Appendix 14.

viii

Acknowledgments

Firstly, I would like to thank to my principal supervisor, Dr. Helle Bielefeldt-Ohmann for have been supporting me for a long-long time. Dr. Helle is one of the most committed and reliable people I have ever met. I felt very blessed to know that she agreed to be supervising me through my PhD candidature. Many thanks may not be enough for her kindness, patience and everything she has done to support me, so thank you.

Thank you to my co-supervisor, A/Prof. Michelle Hill for giving me advice and constructive feedback on the work I have done, and for allowing me to join her research group at TRI and QIMRB where I had an opportunity to participate in laboratory meeting and discussions. I was also able to attend the seminars in various biomedical research topics at the institutes.

Thank you to my co-supervisor, A/Prof. David Duffy for his patience and for guiding me through genomic data analysis where I had started from not knowing what to do. And thank you to my co- supervisor, Dr. Caroline O'Leary, for giving me advice and great ideas about the research project.

For the great opportunity to study abroad at the University of Queensland, I would like to thank to the generosity of Royal Thai Government Scholarship for the faculty of veterinary science, Rajamangala University of Technology Srivijaya which was the major fund for my tuition fee and living stipends. Also thank to the University of Queensland's Top Up Assistant Program (TUAP) for the top-up living allowance. Thank you to Thai Embassy Office of Educational Affair (OEA) staff for being supportive in all aspects.

Thank you to the financial supports from Australian National Kennel Council, John and Mary Kibble trust, Fred Z Eager Research Prize in Veterinary Science and the donation from an anonymous supporter to support canine visceral haemangiosarcoma research projects in this thesis.

I also would like to thank,

UQ School of Veterinary Science, PM staff, Dr. Willy Suen, Dr. Sarah Kaye for sample provision and nice photographs to be included in the thesis and presentation, Chris Cazier, Lana Bradshaw and Jo Gordon for help and advice in samples preparation. UQ Veterinary Medical Center,

Dr. Mark Haworth, Dr. Jayne McGhie, and Dr. Alana Rosenblatt at UQ Veterinary Medical Center for samples provision. UQ Vet school post-grad coordinators, A/Prof. Chiara Palmieri,

Dr. Francois-Rene Bertin. Thanks to the HDR candidature revision panels who provided constructive feedback in every milestone. UQ Vet school admin, Annette Winter, Christine Cowell,

ix

Deborah McDonald. UQ Vet school and uni friends, for sharing nice time together, Halley, Jaz, Tutu, Emily, Sara, Priya, Michelle, Pandji, Pong, Dung, Gervais, Agness, Trang, Wana, Fon, Maria, Ankit, Henry, Hussain, Partha, Santosh, Binita, Manika will miss you guys for sure!

Translational Research Institute (TRI), Microscopic facility, Adler Ju, Genomic facility,

Ivon Harliwong for helping with microscopic slide scanning and DNA quantitation training.

QIMR Berghofer Medical Research Institute (QIMRB)

Thank you to A/Prof. Michelle Hill's lab members, Renee, Jeff, Ahmed, Rui chao, Mriga, Thomas, Julie, Harley, Harshi, Oom and specially thanks to Dr. Alok Shah who always answers to every of my questions from glycoproteomic to FTIR microscope, really appreciate your help and support.

Hpc admin, Scott Wood, Xiaping Lin, scientific staff in microscopy and histology facilities for training provision.

Centre for Biospectroscopy and School of Chemistry, Monash University

Thank you to Dr. Bayden Wood and Dr. David Perez Guaita, for providing FTIR microscope usage training workshop, for the advice and data analysis. Also, thanks to Mustafa Kanzis, Thomas Hennessy and David Haines from Agilent Technologies for the support and advice regarding FTIR machine.

School of Chemistry and Molecular Bioscience, thank you to Prof. Roy Hall for allowing me to work in the lab and to the lab members, Jodie, Caitlin, Agathe, Tisun, Jess, Laura.

Ramaciotti Centre for Genomics, Dr. Helena Mangs, Dr. Catriona Murray.

Integrated Science, Jenny Peters, Dr. Kaushal Ghandi.

I would like to personally thank to my family relatives, aunties who give me advice and support since I was young. Finally, I would like to express the highest gratitude to the most important people in my life, to mom and dad, who showed me that unconditional love does exist in this world.

x

Financial support

The research projects in this thesis were funded and awarded by:

Royal Thai Government Scholarship

The University of Queensland Top Up Assistant Program (TUAP) scholarship

Australian National Kennel Council

John and Mary Kibble trust

Fred Z Eager Research Prize in Veterinary Science

Donation from anonymous supporters

Keywords dog, visceral haemangiosarcoma, glycoprotein biomarkers, semi-quantitative lectin- immunohistochemistry, germline polymorphisms, somatic mutations, whole exome sequencing, FTIR imaging

xi

Australian and New Zealand Standard Research Classifications (ANZSRC)

ANZSRC code: 070703, Veterinary Diagnosis and Diagnostics, 50%

ANZSRC code: 111203, Cancer Genetics, 25%

ANZSRC code: 060408, Genomics, 25%

Fields of Research (FoR) Classification

FoR code: 0707, Veterinary Science, 30%

FoR code: 1112, Oncology and Carcinogenesis, 35%

FoR code: 0601, Biochemistry and Cell Biology, 35%

xii

Table of Contents

Abstract ...... i

Declaration by author ...... iii

Publications included in this thesis ...... iv

Submitted manuscripts included in this thesis ...... v

Other publications during candidature ...... vi

Conference abstracts ...... vi

Manuscripts in preparation ...... vi

Contributions by others to the thesis ...... vii

Statement of parts of the thesis submitted to qualify for the award of another degree ...... vii

Research involving human or animal subjects ...... viii

Acknowledgments ...... ix

Financial support ...... xi

Keywords ...... xi

Table of Contents ...... xiii

List of Figures ...... xix

List of Tables ...... xxii

List of abbreviations used in the thesis ...... xxv

xiii

Chapter 1 Introduction and Literature Review ...... 1

1.1 Introduction ...... 1

1.2 Canine visceral haemangiosarcoma ...... 2

1.2.1 Terminology ...... 2

1.2.2 Prevalence ...... 2

1.2.3 Clinical signs and differential diagnosis ...... 4

1.2.4 Diagnosis ...... 5

1.2.5 Macroscopic and microscopic features ...... 7

1.2.6 Clinical staging, histologic grading and classification ...... 10

1.2.7 Treatment ...... 14

1.2.8 Prognosis ...... 14

1.3 Etiology and pathogenesis of canine visceral haemangiosarcoma ...... 15

1.3.1 Genomic instability and genetic alterations ...... 15

1.3.2 Breed and genetic background predisposition ...... 19

1.3.3 Neutering status...... 19

1.3.4 Pathogens ...... 20

1.3.5 Hypoxia and inflammation...... 20

1.4 Biomarker ...... 21

1.4.1 Biomarker definition and classification ...... 21

1.4.2 Biomarkers in cancer research ...... 22

1.4.3 Steps in biomarkers development ...... 23

1.5 Serum biomarkers for canine visceral haemangiosarcoma ...... 25

1.6 Tissue markers for canine visceral haemangiosarcoma ...... 29

1.7 Fourier Transform Infrared (FTIR) spectroscopic imaging ...... 33

1.7.1 Infrared (IR) spectrum ...... 33

1.7.2 Fourier Transform Infrared (FTIR) spectroscopic imaging ...... 33

xiv

1.7.3 Using FTIR spectroscopy to identify biochemical components in canine tissues ...... 35

1.8 Hypothesis, aims and thesis structure ...... 36

1.8.1 Rationale 1 Glycoprotein biomarkers ...... 36

1.8.2 Rationale 2 Inherited HSA predisposing genetic markers ...... 37

1.8.3 Rationale 3 Somatic mutations in canine visceral haemangiosarcoma ...... 38

1.8.4 Rationale 4 Using Fourier transformed infrared (FT-IR) imaging to distinguish splenic lesions in canine formalin-fixed, paraffin-embedded (FFPE) tissues...... 38

Chapter 2 Validation of candidate glycoprotein biomarkers for canine visceral haemangiosarcoma using semi-quantitative lectin/immunohistochemical assay ...... 41

2.1 Introduction ...... 41

2.2 Materials and methods ...... 44

2.2.1 Samples ...... 44

2.2.2 Lectin-histochemistry (LHC) ...... 45

2.2.3 Immunohistochemistry (IHC) ...... 48

2.2.4 Positive and negative controls...... 49

2.2.5 Semi-quantitative lectin/immunohistochemistry scoring ...... 49

2.2.6 Statistical analysis ...... 52

2.3 Results ...... 54

2.3.1 Lectin/immunohistochemistry protocol optimization ...... 54

2.3.2 Microscopic images of lectin/immunohistochemistry ...... 55

2.3.3 Glycoprotein expression on the tissue components assessed by H-scores...... 66

2.3.4 Glycoprotein expression on normal vascular endothelium in spleen with HSA and in non-HSA tissues...... 72

2.3.5 HSA histopathologic grades and level of tissue glycoprotein expression...... 72

2.3.6 HSA growth patterns and level of glycoprotein expression...... 74

2.3.7 Potential HSA tissue markers assessed by recursive partitioning ...... 75

xv

2.4 Discussion ...... 78

Chapter 3 Identifying visceral haemangiosarcoma related germline variants in dogs, using whole exome sequencing ...... 82

3.1 Introduction ...... 82

3.2 Materials and methods ...... 83

3.2.1 Samples ...... 83

3.2.2 DNA extraction and pre-quality control assessment...... 85

3.2.3 Whole Exome Sequencing ...... 85

3.2.4 Data pre-processing, variants calling and data preparation for analysis ...... 86

3.2.5 Candidate variants filtering and prioritization ...... 88

3.3 Results ...... 95

3.3.1 DNA quality control and whole exome sequencing ...... 95

3.3.2 Exome variant association scan ...... 99

3.3.3 Variant functional analysis ...... 107

3.3.4 Highlighted HSA related genes ...... 123

3.4 Discussion ...... 126

Chapter 4 Somatic alterations in splenic haemangiosarcoma of five dogs ...... 130

4.1 Introduction ...... 130

4.2 Materials and methods ...... 133

4.2.1 Sample ...... 133

4.2.2 Genomic DNA extraction and whole exome sequencing ...... 133

4.2.3 Identification of somatic mutations and functional analysis ...... 134

4.3 Results ...... 136

4.3.1 DNA quality ...... 136

4.3.2 Single nucleotide mutations ...... 141

xvi

4.3.3 Mutational patterns ...... 155

4.3.4 Copy number variations ...... 158

4.4 Discussion ...... 160

Chapter 5 Using Fourier transformed infrared (FT-IR) imaging to distinguish splenic lesions in canine formalin-fixed, paraffin-embedded (FFPE) tissues...... 166

5.1 Introduction ...... 166

5.2 Materials and methods ...... 168

5.2.1 Samples ...... 168

5.2.2 Sample preparation ...... 169

5.2.3 Scan area selection for FTIR imaging ...... 169

5.2.4 FTIR image spectral data acquisition ...... 171

5.2.5 Creation, optimization, model selection and blind testing of the selected model ...... 171

5.3 Results ...... 175

5.3.1 Sample preparation methods ...... 175

5.3.2 HSA prediction model selection ...... 177

5.3.3 Cancer prediction from FTIR images ...... 182

5.4 Discussion ...... 185

Chapter 6 Conclusions and future perspectives ...... 189

Aim-1 Validation of candidate glycoprotein biomarkers for canine visceral haemangiosarcoma, using semi-quantitative lectin/immunohistochemical assay...... 189

Aim-2 Identifying genetic risk markers for canine visceral haemangiosarcoma ...... 190

Aim-3 Identifying somatic mutations in canine visceral haemangiosarcoma...... 192

Aim-4 Using Fourier transformed infrared (FT-IR) imaging to distinguish splenic lesions in canine formalin-fixed, paraffin-embedded (FFPE) tissues...... 195

Conclusions ...... 195

xvii

References ...... 196

Appendices ...... 213

Appendix 1 ...... 213

Appendix 2 ...... 216

Appendix 3 ...... 217

Appendix 4 ...... 219

Appendix 5 (part 1) ...... 221

Appendix 5 (part 2) ...... 223

Appendix 5 (part 3) ...... 228

Appendix 6 ...... 236

Appendix 7 ...... 237

Appendix 8 ...... 240

Appendix 9 ...... 241

Appendix 10 ...... 242

Appendix 11 ...... 248

Appendix 12 ...... 250

Appendix 13 ...... 251

Appendix 14 ...... 252

xviii

List of Figures

Figure 1- 1 Gross and microscopic features of canine haemangiosarcoma...... 8 Figure 1- 2 Microscopic features of human angiosarcoma, ...... 12 Figure 1- 3 Tools for biomarker discovery...... 23 Figure 1- 4 Electromagnetic spectrum...... 33 Figure 1- 5 Sample of IR spectrum acquired from FTIR spectroscope [108]...... 34 Figure 1- 6 Thesis structure and chapters...... 40

Figure 2- 1 Distribution of thirty-two HSA samples assigned by histopathologic grades, ...... 45 Figure 2- 2 Lectin-histochemistry...... 46 Figure 2- 3 Immunohistochemistry...... 49 Figure 2- 4 Microphotographs generated from a scanned slide of CD31 immunohistochemical labelling of endothelial cells of baboon intestinal tissue...... 56 Figure 2- 5 Microscopic photo of DSA histochemical labelling on dog spleen tissue section ... 57 Figure 2- 6 Scanned microscopic photos of lectin-immunohistochemical labelling by different lectins/antibodies (DSA-Datura stramonium, WGA-Wheat Germ Agglutinin and SNA-Sambucus nigra lectins and anti-human CD31, anti-human VTN and anti-human complement C7 antibodies) on consecutive splenic HSA sections (sample 13_019400E)...... 59 Figure 2- 7 Scanned microscopic photos of lectin-immunohistochemical labelling by DSA- Datura stramonium, WGA-Wheat Germ Agglutinin and SNA-Sambucus nigra lectins and anti- human CD31, anti-human VTN and anti-human complement C7 antibodies on consecutive splenic HSA sections (sample 13_019400E)...... 60 Figure 2- 8 Scanned microscopic photos of lectin-immunohistochemical labelling DSA-Datura stramonium, WGA-Wheat Germ Agglutinin and SNA-Sambucus nigra lectins and anti-human CD31, anti-human VTN and anti-human complement C7 antibodies, respectively, on consecutive cardiac HSA sections (sample 13_023446B)...... 61 Figure 2- 9 Scanned microscopic photos of histochemical labelling with DSA-Datura stramonium, WGA-Wheat Germ Agglutinin and SNA-Sambucus nigra lectins and anti-human CD31, anti-human VTN and anti-human complement C7 antibodies, respectively, on consecutive cardiac HSA sections (sample 13_023446B)...... 62 Figure 2- 10 Microscopic images of lectin/immunohistochemical labelling using DSA-Datura stramonium, WGA-Wheat Germ Agglutinin and SNA-Sambucus nigra lectins and anti-human CD31, anti-human VTN and anti-human complement C7 antibodies...... 63 Figure 2- 11 Microscopic images of lectin/immunohistochemical labelling using DSA-Datura stramonium, WGA-Wheat Germ Agglutinin and SNA-Sambucus nigra lectins and anti-human CD31, anti-human VTN and anti-human complement C7 antibodies, respectively, on HSA with different growth patterns...... 65 Figure 2- 12 Dot charts of H-score distribution, assessed from lectin-immunohistochemical labelling by lectins/antibodies (DSA-Datura stramonium, WGA-Wheat Germ Agglutinin and SNA-Sambucus nigra lectins and anti-human CD31, anti-human VTN and anti-human complement C7 antibodies), respectively...... 69

xix

Figure 2- 13 Boxplot showing the distribution of lectin-immunohistochemical signal intensities, assessed by given H-scores, using six antibodies/lectins ...... 70 Figure 2- 14 Boxplots of H-scores distribution, pairwise comparison of the H-scores was made between three tissue components: ...... 71 Figure 2 - 15 Boxplots of H-score distribution, assessed by lectin/immunohistochemistry for DSA-Datura stramonium, WGA-Wheat Germ Agglutinin and SNA-Sambucus nigra lectins and anti-human CD31, anti-human VTN and anti-human complement C7 antibodies, ...... 73 Figure 2- 16 Boxplots of H-score distribution, assessed for CD31, DSA, WGA, SNA, VTN and C7 lectin/immunohistochemical labelling signal intensity on thirty-two HSA FFPE tissue sections...... 75 Figure 2- 17 Scatter plots of H-scores distribution, assessed from lectin-immunohistochemical labelling for complement C7 and VTN as well as DSA, WGA and SNA...... 76

Figure 3- 1 Flowchart demonstrating the workflow from whole exome sequence raw data (FASQ files) pre-processing until variant discovery (calling)...... 87 Figure 3- 2 HSA-related germline variants filtering workflow: ...... 89 Figure 3- 3 Gene Set Enrichment Analysis (GSEA) data input approaches...... 93 Figure 3- 4 Electrophoresis gel image of genomic DNAs extracted from HSA dog blood...... 95 Figure 3- 5 Electrophoresis gel image of genomic DNAs extracted from; ...... 97 Figure 3- 6 DNA electrophoresis of samples after DNA shearing...... 98 Figure 3- 7 GENetic EStimation and Inference in Structured samples (GENESIS) generalized mixed model analysis of association between HSA and all variants...... 99 Figure3- 8 Statistical summary of the predicted consequences of the HSA-related variants identified by variant association analysis...... 105 Figure3- 9 Bar graph represents number of genes that carry single nucleotide germline variants and the Hallmark molecular signature pathways that might be involved in dogs with HSA. ... 118 Figure 3- 10 Summary of HSA related gene lists generated by two major variant identification approaches, exome variant association scan and variant functional analysis...... 123

Figure 4- 1 Workflow for somatic mutations identification from matched tumour-normal whole exome sequence data to obtain individual's single nucleotide variations (SNVs), small insertion- deletion (Indels) and copy number variations (CNVs)...... 135 Figure 4- 2 Pre-assessment electrophoresis gel image of genomic DNAs extracted from HSA tumors and blood samples...... 138 Figure 4- 3 On arrival electrophoresis gel image of submitted genomic DNA samples...... 139 Figure 4- 4 DNA electrophoresis of samples after DNA shearing...... 140 Figure 4- 5 Genomic view of TP53 gene in five tumour samples...... 146 Figure 4- 6 Genomic view of TP53 gene in paired blood and tumour from sample 's1422'. .. 146 Figure 4- 7 Genomic view of HSPG2 gene in paired blood and tumour from sample 's1625'...... 151 Figure 4- 8 Genomic view of AMT gene in paired blood and tumour from sample 'sMAR'... 151

xx

Figure 4- 9 Genomic view of AHNAK gene in paired blood and tumour from sample 's1810'...... 152 Figure 4- 10 A, B, C. Genomic view of AHNAK gene copy number aberrations in paired blood and tumour from all samples...... 153 Figure 4- 11 A. Mutational spectrum and B. 96 Mutational profile of each sample...... 156 Figure 4- 12 A. Heat map showing cosine similarity between the sample mutational patterns and 33 COSMIC mutational signature patterns. B. Relative contribution of each sample to mutational patterns...... 157 Figure 4- 13 Sample of copy number plot of a sample's whole exome sequence data...... 159

Figure 5- 1 Forty-nine FFPE tissue samples were assigned to data set groups...... 168 Figure 5- 2 Images of scanned microscopic slides, ...... 170 Figure 5- 3 Workflow for data analysis...... 172 Figure 5- 4 Sample of images acquired at each stage of the study...... 173 Figure 5- 5 Options for FTIR image data acquisition methods and images acquired from the experimental trials...... 176 Figure 5- 6 Pixel-wise and sample-wise cancer prediction methods...... 181 Figure 5- 7 Classified FTIR images by cancer prediction model...... 183 Figure 5- 8 Classified 'ground true' image before and after merging 'endothelial cell' and 'connective tissue' classes...... 186

Figure 6- 1 The hypothesized pathways of inherited HSA-related germline variants among overrepresented dog breeds...... 191

xxi

List of Tables

Table 1- 1 Percentages of cancer and HSA of dogs with splenic lesions...... 3 Table 1- 2. Differential diagnosis for /splenic nodules in the dog ...... 4 Table 1- 3 Cytologic morphology of canine haemangiosarcoma...... 9 Table 1- 4 Clinical staging system for canine haemangiosarcoma...... 10 Table 1- 5 Grading criteria for canine haemangiosarcoma and canine soft tissue mesenchymal tumours (STMTs) grading...... 13 Table 1- 6 Genomic events including simple mutations (e.g. single nucleotide mutations, indels etc.) and copy number variations (gain/loss) in human sarcoma, angiosarcoma and canine haemangiosarcoma...... 18 Table 1- 7 Steps in cancer biomarker development ...... 24 Table 1- 8 Potential plasma and serum biomarkers for canine visceral haemangiosarcoma...... 26 Table 1- 9 Top 20 candidate glycoprotein biomarkers for canine visceral haemangiosarcoma, . 28 Table 1- 10 Markers that were explored for immunohistochemistry on canine haemangiosarcoma (HSA), haemangioma (HA) and other tissue samples...... 30

Table 2- 1 Number of cases and clinical parameters...... 45 Table 2- 2 Lectins, including glycan chain specificity and antibodies used in the study and their antigenic retrieval (AR) methods and dilutions...... 47 Table 2- 3 H-score calculation and sample microscopic images of different levels of visualised glycoprotein on the tissues...... 51 Table 2- 4 Features of data collected from semi-quantitative lectin/immunohistochemistry assay...... 53 Table 2- 5 Average H-scores for lectin/immunohistochemical signal intensities of each component of the control samples, ...... 55 Table 2- 6 Overall H-scores assessed from lectin/immunohistochemistry with DSA-Datura stramonium, WGA-Wheat Germ Agglutinin and SNA-Sambucus nigra lectins and anti-human CD31, anti-human VTN and anti-human complement C7 antibodies on HSA samples, ...... 67 Table 2- 7 Overall H-scores, assessed from lectin/immunohistochemistry using DSA-Datura stramonium, WGA-Wheat Germ Agglutinin and SNA-Sambucus nigra lectins and anti-human CD31, anti-human VTN and anti-human complement C7 antibodies, ...... 68 Table 2 - 8 Overall H-scores assessed from lectin/immunohistochemistry with DSA-Datura stramonium, WGA-Wheat Germ Agglutinin and SNA-Sambucus nigra lectins and anti-human CD31, anti-human VTN and anti-human complement C7 antibodies in three histopathological grades...... 74

Table 3- 1 Genomic DNA extracted from EDTA blood of eleven dogs diagnosed with HSA were submitted for whole exome sequencing...... 84 Table 3- 2 Variant Effect Predictor (VEP)...... 91 Table 3- 3 Genomic DNA quality control before sample submission and on arrival at genomic

xxii

sequencing facility quality control...... 96 Table 3- 4 Significant HSA associated germline SNVs that were predicted as high impact variants by VEP...... 101 Table 3- 5 Potential HSA associated germline single nucleotide variants...... 102 Table 3- 6 Gene Set Enrichment Analysis (GSEA) highlighted top pathways where significant HSA associated germline variants overlapped, and genes belonging to the gene sets...... 106 Table 3- 7 Numbers of germline variants identified from variant calling to variant filtered. 107 Table 3- 8 Highlighted HSA related germline variants that were detected at multiple loci on the same gene...... 109 Table 3- 9 Possible damaging HSA related germline variants based on two variant effect prediction tools...... 112 Table 3- 10 The top ten hallmark molecular pathways where the candidate genes overlapped with the hallmark gene set in the Molecular Signature Database (MSigDB) computed by Gene Set Enrichment Analysis (GSEA)...... 115 Table 3- 11 Hallmark molecular pathways and descriptions where the candidate genes overlapped with the hallmark gene set in the Molecular Signature Database (MSigDB) computed by Gene Set Enrichment Analysis (GSEA)...... 116 Table 3- 12 Hallmark molecular pathways and their genes harboring the variants identified in the HSA cohort, ...... 119 Table 3- 13 Multiple list comparison between the results from various HSA related germline variant identification approaches mainly variant association scan ("Association") and variant functional analysis with different variant selection criteria...... 124 Table 3- 14 Genes with HSA-related germline SNVs identified by two variant identification approaches, exome variant association scan and variant functional analysis...... 124

Table 4- 1 Animal profile...... 133 Table 4- 2 Genomic DNA quality control before sample submission and on arrival at genomic sequencing facility quality control...... 137 Table 4- 3 Genes where single nucleotide mutations were predicted to cause damaging effects on the transcripts and ...... 141 Table 4- 4 Single nucleotide mutations that were detected in at least 60% of samples...... 142 Table 4- 5 Possible hot spots of mutations genes...... 144 Table 4- 6 A. HSA panel and B. Pairwise comparison between point somatic mutations in the genes identified in each sample and to the HSA panel...... 145 Table 4 - 7 Pairwise comparison between single nucleotide somatic mutations in each sample and to genes that had HSA related germline variants (possibly mutations)...... 147 Table 4- 8 Genes with HSA-related germline variants that also underwent somatic mutations...... 148 Table 4- 9 Hallmark molecular signature pathways of somatic mutated genes by sample...... 154 Table 4- 10 Cosine similarity between the sample mutational patterns and selected top 7 COSMIC mutational signature patterns...... 157

xxiii

Table 5- 1 Number (in pixels) and proportion (percentage) of classified 'ground true' FTIR images in 'CAL' and 'VAL' datasets in the calibration/model selection step...... 177 Table 5- 2 List of seven pre-processing methods used for optimization of the prediction model...... 178 Table 5- 3 Classified tissue components (ground true) and the sensitivity, specificity and correct classification rate (CCR) obtained by two classification algorithms, Random Forest (RF) and Partial Least Square Discriminant Analysis (PLS-DA)...... 178 Table 5- 4 Possible combinations of the regions and definition of the spectral assignments. 180 Table 5- 5 Confusion matrix, or 2 x 2 contingency table of the FTIR imaging plus cancer prediction model outcome...... 182 Table 5- 6 Table summarizes number of FTIR images, and the model prediction results. .... 184

Table 6- 1 Genomic events including simple mutations (e.g. single nucleotide mutations, indels etc.) and copy number variations (gain/loss) in human sarcoma, angiosarcoma and canine haemangiosarcoma...... 194

xxiv

List of abbreviations used in the thesis

Abbreviations Terms 3'-UTR 3' untranslated region ABC Avidin-Biotin peroxidase Complex AEC 3-amino-9-ethylcarbazole AGP α-1 acid glycoprotein AR Antigen retrieval AS Angiosarcoma ATR Attenuated Total Reflection AUC Area under the curve AUROC Area Under Receiver Operating Characteristic C7 complement component 7 CaF2 Calcium Fluoride CCR Correct Classification Rate CNV Copy number variations COSMIC catalogue of somatic mutations in cancers CT Computed Tomography CV Coefficient of Variations DAB 3,3′-diaminobenzidine DFI disease free interval DNA Deoxyribonucleic acid DSA Datura stramonium EC Endothelial Cell EFSW electric field standing wave ELISA Enzyme-linked immunosorbent assay EMBL European Molecular Biology Laboratory EBI European Bioinformatics Institute EPC endothelial cell progenitor cell ET-1 endothelin 1 EVA European Variation Archive FDR False Discovery Rate FFPE Formalin-fixed paraffin-embedded

xxv

Abbreviations Terms FNA Fine Needle Aspiration FPA Focal Plane Array FTIR Fourier Transform Infrared GATK Genome Analysis Tool Kit GENESIS GENetic EStimation and Inference in Structured samples GLMM Generalized linear mixed model GR Golden Retriever GS German Shepherd GSEA Gene Set Enrichment Analysis H&E Hematoxylin and Eosin HA Haemangioma HER2 human epidermal growth factor receptor 2 HIER heat induced epitope retrieval HPF High Power Field HSA Haemangiosarcoma IHC Immunohistochemistry indels small insertions and deletions IQR Inter Quartile Range IR Infrared LeMBA Lectin magnetic bead array LHC Lectin histochemistry Low-E Low emission LYVE-1 endothelial receptor 1 MAC Membrane attack complex MAD Median Absolute Deviation MC Mitotic Count MCT Mercury Cadmium Telluride MGAM Maltase-glucoamylase MI Mitotic Index MRI Magnetic Resonance Imaging

xxvi

Abbreviations Terms MS/MS tandem mass spectrometry MSigDB Molecular signature database NCB Needle Core Biopsy oaCGH Oligonucleotide Array Comparative Genomic Hybridization OR Odd Ratio PBS Phosphate buffer saline PCA Principal Components Analysis PCR Polymerase chain reaction PDGF platelet derived growth factor PECAM-1 Platelet Endothelial Cell Adhesion Molecule 1 PLS-DA Partial Least Squares-Discriminant analysis PROX-1 Prospera-related homeobox gene 1 PSA Pisum sativum RBC Red blood cell RF Random Forest RHE retiform haemangioendothelioma RNA Ribonucleic acid SBS Single Base Substitution SD Standard Deviation SIFT Sorting Intolerance From Tolerance SNA Sambucus nigra SNP Single nucleotide polymorphism SNV Single nucleotide variant ST Survival Time STMTs Soft Tissue Mesenchymal Tumours TAT Thrombin-anti thrombin TBST Tris buffer saline with Tween 20 TK-1 Thymidine kinase 1 TP53 tumor p53 UEA-1 Ulex europaeus-1 agglutinin

xxvii

Abbreviations Terms VCF Variant Call Format VEGF vascular endothelial growth factor VEGFR vascular endothelial growth factor receptor VEP Variant Effect Predictor VTN vitronectin vWf von Willebrand's factor WES Whole Exome Sequencing WBC White blood cell WGA Wheat germ agglutinin

xxviii

Chapter 1 Introduction and Literature Review

1.1 Introduction

Most dog owners have experienced the moments when they had to say the last good-bye to their companions. Even though dogs are well perceived as human's best friend, they have shorter lifespan than human. Dogs resemble human in the types of diseases they have. Cancer is one of the leading causes of death/euthanasia in pet dogs, accounting for nearly 16% of dog deaths [1, 2].

Visceral haemangiosarcoma (HSA) is a vascular endothelial cell cancer that arises in internal organs, especially in the spleen and heart which is difficult to detect. The clinical signs are non- specific until the tumour ruptures and the affected dogs are brought to an emergency unit because of the consequences of internal body cavity blood loss. A diagnosis of HSA carries a very poor prognosis with most dogs surviving less than a year. Immediate euthanasia is therefore commonly chosen by owners and veterinarians.

If there were markers that could inform dog owners whether their dogs were likely to develop this cancer later in life (i.e., predisposition), or if there was a panel of tests that could inform veterinarians about HSA, more HSA dogs could be saved from suffering through the end-stage HSA. One of the most promising methods for detecting canine HSA is the use of specific biomarkers. Previous studies have indicated the possible use of glycoprotein biomarkers in detection of canine HSA. However, the validation of the canine HSA detection using glycoprotein biomarkers has not been widely performed in the independent cohorts.

Dog genetic background such as germline mutation is one of the factors contributing to HSA development. HSA predisposing genetic markers have been identified in a single dog breed, the Golden Retriever. Studies in various dog breeds of different geographical regions that also suffer from HSA were not widely performed. In the HSA cancer cell, genomic alterations were identified. However, the driver mutations were not obvious and still needed to be investigated further.

The diagnosis of HSA is somewhat challenging. Even with the microscopic examination of tissue samples which is the gold standard, ambiguous samples are sometimes misdiagnosed due to the similarity between cancer and non-cancerous hematoma tissues. Fourier Transform Infrared (FTIR) imaging is one of the novel methods in cancer diagnosis with the advanced feature of distinguishing cancerous from normal tissues in some cancer types. However, the capability of the technique has not been explored in canine HSA tissues. 1

This chapter provides the overview of canine visceral HSA, biomarkers in general, serum biomarkers for canine visceral HSA, canine HSA tissue markers and the use of FTIR imaging in cancer diagnosis. Research hypotheses and aims are defined in the final section of this chapter.

1.2 Canine visceral haemangiosarcoma

1.2.1 Terminology

‘Haemangiosarcoma’ or HSA is the malignant tumour of blood vascular endothelium whereas ‘Lymphangiosarcoma’ means the malignant tumour arising from the endothelium of lymphatic vessels [3]. The term ‘angiosarcoma’ is used mostly in the human medicine literature, where it denotes a tumour of the endothelial cell of vasculature, either blood vessels or lymphatic vessels, but usually angiosarcoma and haemangiosarcoma are used interchangeably where haemangiosarcoma is normally used in the veterinary literature. The term ‘hemangioendothelioma’ appears in several publications, in both human and veterinary medicine, and is a form of benign vascular tumour which forms hypercellular vascular channels. The microscopic and biological features of this form of tumour is somewhere in between benign and malignancy So, the term ‘malignant hemangioendothelioma’ can sometimes refer to haemangiosarcoma [3-6].

1.2.2 Prevalence

According to the 2019 national survey of pets and people, Australia had approximately 5.1 million dog population, own by nearly 40% of households [7]. The proportion of dog owning households in Australia was close to the United States of America where 38% of the households own 77 million dogs [8]. In the United Kingdom, nearly 10 million dogs own by 26% of households were reported in 2019 [9].

Visceral HSA accounted for around 2% of 81,744 cases recorded at a veterinary diagnostic laboratory in the US, during six-year period, the number of visceral cases were slightly higher than cutaneous HSA cases [10]. Three dog breeds that were over presented with splenic HSA included Golden Retriever (18%), German Shepherd (10%) and Labrador Retriever (15%).

Necropsy survey of 1,206 records over 15-year period showed that 56.6% (168/297) of Golden Retrievers, 36.7% (113/308) of German Shepherd and 30.9% (82/265) of Labrador Retrievers were euthanized because of cancers. The most common cancer found in those breeds was HSA, which accounted for 31.5, 32.7 and 22% in Golden Retriever, German Shepherd and Labrador, respectively [11]. The percentages of cancer as the cause of death were assessed in Veterinary Medical Database (VMDB) over 20-year period. Slightly different percentages compared to

2

previous study were reported; 49.9% in Golden Retriever, 27.7% in German Shepherd and 34% in Labrador Retriever [12]. Another 17-year medical record assessment found 65% of Golden Retriever dogs died from cancers while HSA accounted for 22.64% [13].

HSA is the most common malignant tumour of the spleen. Splenic lesions with mass rupture and hemoabdomen were more likely to be cancer and HSA (76.1 - 80% of the lesions were cancer, 88 – 92.6% of the cancers were HSA), while non-rupture lesions were less likely (29.5% of the lesions were cancer, 58% of the cancers were HSA) [14-16]. In several reports 43.7 – 58% of splenic tumours were likely to be cancers where 32 - 73.5% of the cancers were diagnosed as HSA [17-20]. However, small dog breeds (less than 15 kg.) were less likely to have splenic HSA [21]. Table 1- 1 summarises the percentages of dogs with splenic lesions that were diagnosed with cancers and percentages of HSA among splenic cancers.

Table 1- 1 Percentages of cancer and HSA of dogs with splenic lesions.

Cohort Cases Cancers (%) HSA (%) References

Splenic mass 71 76.1 92.6 [16] with hemoabdomen and anemia Splenic mass, 30 80 88 [15] with hemoabdomen Splenic mass, non-ruptured 105 29.5 58 [14]

Splenic mass 87 43.7 44.7 [17]

Splenic mass 249 53 73.5 [18]

Splenic mass 325 58 32 [20]

Splenic mass 64 48.43 80 [19]

Splenic mass, 370 56 27 [21] small breeds (<15 kg.)

3

1.2.3 Clinical signs and differential diagnosis

Presenting signs for haemangiosarcoma are non-specific, varying from asymptomatic to acute death, depending on the site where the tumour occurs. Signs of blood loss from a ruptured tumour range from mild (lethargy, weakness, pale mucous membrane, tachycardia, tachypnea) to severe (hypovolemic shock, collapse or sudden death). Cardiovascular signs are presented with cardiac HSA e.g. exercise intolerance, dyspnea, right-sided heart failure, ascites, cardiac tamponade. Bleeding from splenic or hepatic HSA results in hemoperitoneum and abdominal distension might be noticeable, reabsorption of blood back into the circulation makes the clinical signs resolve and become vague [22, 23].

Patients profiles e.g. haematology, serum biochemistry, coagulation profiles, urinalysis, thoraco- abdominal imaging, abdominal ultrasound, echocardiography, electrocardiogram are assessed to aid in clinical staging. Non-specific clinical examination findings such as anemia, thrombocytopenia, DIC, increase alkaline phosphatase and alanine transferase may be detected [22, 23]. The differential diagnoses for clinical presentation of splenomegaly/splenic nodules in dogs, without histopathologic or cytologic confirmation were summarized by Smith (2003) [22] is shown in Table 1- 2.

Table 1- 2. Differential diagnosis for splenomegaly/splenic nodules in the dog [22].

Non-neoplastic disease Neoplastic disease Hyperplastic nodular enlargement Benign Hematoma Fibroma Vascular stasis/congestion Hemangioma Thrombosis/infarction Lipoma/myclolipoma Torsion Malignant Traumatic rupture Haemangiosarcoma Abscessation Fibrosarcoma Extramedullary hematopoiesis Leiomyosarcoma Inflammation/necrosis Lymphosarcoma Myxosarcoma Osteosarcoma Liposarcoma Rhabdomyosarcoma Chondrosarcoma Mesenchymoma Undifferentiated/anaplastic sarcoma Histiocytosis/fibrohistiocytic nodules Mast cell tumour Metastatic neoplasia

4

1.2.4 Diagnosis

Final diagnosis of the HSA depends on microscopic examination of tissue samples. These samples are best obtained via splenectomy or post-mortem necropsy. Immunohistochemistry (IHC) for vascular endothelial cell markers (Factor VIII-related antigen/vonWillebrand’s factor and CD 31/PECAM-1) are used to confirm vascular tumours in case of solid HSA pattern, which looks similar to other soft tissue sarcomas [24-27]. IHC of CD31 and Factor VIII-related antigen should be performed together, because cancer cells in some patients are not immuno-positive for either or both CD31 and Factor VIII-related antigen [28, 29]. Lymphatic vessel markers, LYVE-1 (Lymphatic vessel endothelial receptor-1) and PROX-1 (Prospera-related homeobox gene-1) were used to successfully differentiate lymphangiosarcoma from haemangiosarcoma [30]. There are still no proven specific markers for HSA that can differentiate HSA cells from normal vascular endothelial cells, hence normal vascular endothelial cells were assessed as an internal control when performing immunohistochemistry for vascular neoplasms [31]. Some antibodies that were able to distinguish malignancy from benign will be discussed later in the chapter.

Cytologic diagnosis using aspirated cells from splenic nodules sampled via ultrasound guided biopsy is the least invasive diagnostic technique that provides the most accurate results when compared to the gold standard so far. The accuracy of cytologic examination from the FNA derived specimens compared to histopathologic examination were 100% for hematopoietic neoplasia while for HSA was 66.67% (2/3) accurate [32]. Another comparison for the diagnosis of splenic lesions was conducted, and complete agreement of the results between two techniques was 59%, partial agreement 29% and disagreement 12%. The study also found different proportions of benign and malignancy lesions detected by cytologic vs. histopathologic examination. The prevalence of malignant lesions detected by FNA was 20%, benign lesions 29%, normal 28%, inflammatory lesion 3% and equivocal 20% while histopathologic examination could detect malignant lesions in 43%, benign in 49% and inflammatory lesion in 8% [33].

Fine Needle Aspiration (FNA) sampling to get large numbers of cells from this tumour can be challenging because most of the cytologic samples comprises red blood cells as background. Haemangiosarcoma can be very fragile and easy to rupture and bleed, close monitoring for patient’s condition is suggested when performing the FNA [28, 32, 34-38].

The combination of FNA and NCB (Needle Core Biopsy) was studied to see if the techniques were safe for aid in the diagnosis of splenic mass rather than only FNA alone. The study did not report major complications, only more than 10% drop in PCV was observed in 3.9% of the participating

5

cases, 25.6% had peritoneal haemorrhage from NCB, 2.6% had haemorrhage from FNA and 69.2% had visualized NCB needle tract. The correlation comparison was not between FNA and NCB vs. histopathologic examination from splenectomy, but between FNA and NCB. Complete agreement was 51.4%, partial agreement 8.6% and disagreement 40%. Unfortunately, one of the cases, that had undergone splenectomy, was histopathological diagnosed as haemangiosarcoma, but had been diagnosed as lymphoid hyperplasia from FNA and NCB cytopathologic results [36].

Non-definitive but less invasive imaging techniques for HSA diagnosis such as radiography, ultrasound, computed tomography (CT) and magnetic resonance imaging (MRI) were studied. Standard radiography could only tell that there was a mass or space occupying substance around the organ. Conventional ultrasound could help clarify whether the echogenicity of that substance was hypo-, hyper-, iso- or heteroechoic. There was an attempt to classify ultrasonographic features for tentative diagnosis of liver parenchymal lesion. It was 85% sensitive for hepatic HSA diagnosis by using ultrasound and the ultrasonographic appearance was found to be heteroechoic in 54% of the cases. It was concluded that using the ultrasound alone would not allow a definitive diagnosis for canine liver lesion [39]. X-ray and ultrasound together were good tools to diagnose internal organ mass at the initial stage, but for definitive diagnosis, more invasive protocols e.g. FNA or splenectomy were still needed.

Conventional ultrasound was not a useful tool for differentiation between malignancy and benign unless as an aid for ultrasound guided FNA. In advanced ultrasonography, the hypothesis is that malignant tumour lesions tend to appear more hypoechoic compared to normal tissue echogenicity. The echogenicity of the HSA on conventional B-mode ultrasound appeared to be mixed pattern (10 HSA; 6 hypoechoic, 4 mixed) as well as hematoma (3 hematomas; 1 hypoechoic, 2 mixed) [40]. Contrast harmonic ultrasound imaging was then applied. Based on the same concept about the hypo-echogenicity of the malignant tumours, this ultrasound technique facilitates the investigation of tissue blood perfusion in each phase. The results were promising and might be able to aid in the differentiation between benign and malignant canine splenic tumour [41]. However, this technique requires an ultrasound expert and contrast medium to be injected to the patient. Elastography (the ultrasonographic technology for measurement of tissue elasticity) might be a promising technology in the diagnostic of human breast, prostate or thyroid cancer, because those cancers have the tendency to lose tissue elasticity, whereas in canine patients, this technique was unable to distinguish between benign and malignant splenic lesions [40].

6

Computed Tomography (CT) scan and Magnetic Resonance Imaging (MRI) have advantages over ultrasound in case of a large-sized splenic mass. While MRI was considered the best technique for distinguishing between soft tissue and a mass, the ability of both CT and MRI to diagnose canine HSA in a visceral organs was inconsistent [42-45]. However, these techniques, including positron emission tomography/computed tomography (PET/CT) can be helpful for detection of metastases before or after surgical splenectomy, so clinical staging and prognosis can be made [42, 46, 47]. Some advanced techniques are promising auxiliary diagnostic tools, but to make a definitive diagnosis of the HSA or to differentiate benign splenic lesion from malignant lesion nowadays, histopathologic examination is still required.

1.2.5 Macroscopic and microscopic features

Haemangiosarcoma in visceral organ can vary in size, from as small as 1 mm to 20 cm or larger blood-containing masses. The colour of the mass itself is white, but mottled red, black or yellow/white dependent on the amount of red blood cells present. The shape is usually nodular, but flat, raised, cystic or areas of necrosis with blood-oozing when cut through the surface can also be seen. The consistency of the mass varies from soft/gelatinous, blood-filled multiple cysts, sponge- like to firm. The wide range of macroscopic appearance of HSA is in concordance with the varieties of its microscopic features [23, 48, 49].

‘Typical’ HSA cells are elongated, spindle-shaped, plump, anaplastic vascular endothelial cells and arrange themselves to form vascular chambers along with a supporting acellular, bright eosinophilic hyaline substance forming the stroma (Figure 1- 1) [3, 38, 48, 50]. Cytologic examination of HSA specimens derived from Fine Needle Aspiration (FNA) has different appearance from the histopathologic features as described by Bertazzolo et al. (2005) [28] and are summarized in Table 1- 3.

HSA growth patterns were summarized following the dominant pattern of solid, capillary/small calibre spaces, cavernous/cystic/large vascular spaces or the combination of all patterns [29, 49]. Atypical forms of HSA in dog such as retiform haemangioendothelioma (RHE)-like HSA, epithelioid variant HSA were also reported [5, 6].

7

A B

C D E

Figure 1- 1 Gross and microscopic features of canine haemangiosarcoma. A. Haemangiosarcoma of a dog’s spleen. (Animal and image sources by courtesy of Dr. Mark Haworth, UQ Veterinary Medical Centre and Dr. Willy Suen, UQ SVS Pathology)

B. Microscopic image of H&E stained canine haemangiosarcoma tissue, x60 objective magnification. This is a ‘slit-like’ or ‘capillary’ growth pattern which is considered typical form of HSA. C. Typical or classical type HSA. The HSA cells are attempting to form vascular channels which appear 'slit-like'. They produce characteristic hyalinized extracellular matrix, which is the glossy, light pink, clouded substance that appears adjacent to the HSA cells in H&E stained histopathologic micro image (star). D. Cavernous type HSA. The HSA cells are forming multiple blood filled vascular 'cyst-like' channels. E. Solid type HSA. Sometimes the growth pattern of the HSA cells is similar to those of the soft tissue tumours. Few vascular channels or extracellular matrix are observed. The scale bars in C, D and E = 100 µm.

8

Table 1- 3 Cytologic morphology of canine haemangiosarcoma. The table and included figures were summarised from Bertazzolo et al. (2005) [28].

Features Low - grade High - grade Epithelioid

sarcomatous sarcomatous subgroup subgroup subgroup

Cytoplasm Spindle, monomorphic Spindle, oval, flamed, Round, oval, shaped . stellate pleomorphic spindle shaped with shaped. distinct cellular Severe anisocytosis border . Size :15- 20 µm . 20-50 up to 70-80 µm . 15-50 µm . Punctate vacuolated, Punctate vacuolated Punctate vacuolated basophilic cytoplasm . cytoplasm . cytoplasm .

Nucleus Round or oval shaped . Round, oval, Severe pleomorphic shape . anisokaryosis . Anisokaryosis, bi -or multiple nuclei Nucleoli Indistinct Indent, prominent - N : C ratio High Low Low to high .

9

1.2.6 Clinical staging, histologic grading and classification

Similar to other tumours, haemangiosarcoma has three clinical stages according to the TNM system as shown in Table 1- 4. The clinical staging applies to the tumour’s extent. In stage I, the tumour is confined to the organ (spleen), stage II with ruptured spleen but no metastases and stage III implies ruptured spleen and evidence of metastases [23, 51].

Table 1- 4 Clinical staging system for canine haemangiosarcoma. [23]

Primary Tumour (T)

T0 No evidence of tumour

T1 Tumour less than 5 cm diameter and confined to primary site

T2 Tumour 5 cm or greater or ruptured: invading subcutaneous tissue

T3 Tumour invading adjacent structures, including muscle

Regional Lymph Nodes (N)

N0 No regional involvement

N1 Regional lymph node involvement

N2 Distant lymph node involvement

Distant Metastasis (M)

M0 No evidence of distant metastasis

M1 Distant metastasis

Stages

I T0 or T1, N0, M0 II T1 or T2, N0 or N1, M0 III T2 or T3, N0, N1 or N2, M1

Tumour grading is usually based on the following criteria when the tissue is visualized under the microscope: degree of differentiation, mitotic index or count (MI or MC) per 10 high-power fields

10

(HPF), degree of cellular/nuclear pleomorphism, nuclear size and number, percent necrosis, invasiveness, stromal reaction, overall cellularity and lymphoid response. The most used criteria, because they are objective features, are mitotic index, percentage of necrosis and nucleolar features [27, 52].

Standard histopathologic grading was not established specifically for canine HSA, but some reports have been using an HSA scoring system established by Ogilvie et al. (1996) [53], which was modified from soft tissue sarcoma. The cut-off value for evaluation of each parameter e.g. differentiation score, mitotic count or tumour necrosis score for canine soft tissue mesenchymal tumours (STMTs) was not specifically determined and still needed to be standardized [52]. Table 1- 5 demonstrates the HSA grading system modified by Ogilvie et al. (1996), Sabattini and Bettini (2009) and proposed grading score for STMTs by Meuten et al. (2016) [52-54].

In human low-grade angiosarcoma, and varieties of atypical angiosarcoma microscopic features were described. It can be vasoformative, sieve or slit-like, Karposi-form or solid pattern. At higher-grade of angiosarcoma, the majority of poorly differentiated cells are seen which could be epithelioid, spindle or pleomorphic cells (Figure 1- 2) [55].

Human vascular tumours were classified by WHO into 3 categories, benign (Haemangioma), intermediate (Haemangioendotheliomas) and malignant (Angiosarcoma and Epithelioid haemangioendothelioma) [56]. In an updated edition in year 2013, pseudomyogenic hemangioendothelioma was introduced into the intermediate category [57].

In domestic animals, histologic classification of mesenchymal tumours of skin and soft tissue also classified animal vascular tumours in 3 categories as in human. Only Kaposi-like vascular tumour (the morphology of the tumour was described similar to human Kaposi’s sarcoma and kaposiform hemangioendothelioma) was categorized in the intermediate vascular tumour by that time [58]. Subsequently, several reports came out identifying similarity between canine variant forms of canine intermediate vascular tumour [6, 59-62]. Hence, dog and human HSA and AS share notable similarities between each other.

Nevertheless, a report of two Golden Retrievers which had retiform haemangioendothelioma (RHE)-like vascular lesion in the skin and heart did not conform with the human vascular tumour classification. While RHE in human is classified as a low-grade malignant tumour with low likelihood to metastasize, the RHE-like vascular lesion in dogs was very aggressive and categorized as malignant haemangioendothelioma or haemangiosarcoma, so re-classification of vascular tumours in dog was suggested [5].

11

A B C

D E

F G H

Figure 1- 2 Microscopic features of human angiosarcoma, which was graded according to degree of differentiation and pleomorphism [55] A-E: Low-grade angiosarcoma, well-differentiated, typical form of angiosarcoma.

A. Well-differentiated, typical angiosarcoma. Vascular channels formation entraps collagenous stroma and forms intraluminal papillary structure.

B. Vasoformative pattern. C. Sieve or slit-like pattern. D. Karposiform. E. Solid pattern.

F-H: High-grade angiosarcoma. Cells are poorly differentiated and can be of any form.

F. Epithelioid cells. G. Spindle cells. H. Pleomorphic cells.

12

1 Table 1- 5 Grading criteria for canine haemangiosarcoma and canine soft tissue mesenchymal tumours (STMTs) grading. Grading Ogilvie et al. [52] Sabattini and Bettini [53] Meuten et al. [51] criteria Score Definition Score Score Definition Prognosis

Differentiation (1-3) 1: well- 1: well-differentiated with (1-3) 1: well-defined differentiates irregular vascular channels. diagnostic pattern 2: moderately 2: moderately-differentiated 2: poorly-defined differentiated with at least 50% of the 3: un-differentiated 3: poorly- tumour showing well-defined differentiated vascular channels. 3: poorly-differentiated solid tumour with few vascular channels. Pleomorphism (0-3) 0: none 1: Minimal - - 1: mild 2: Moderate 2: moderate 3: Marked 3: marked Necrosis (0-3) 0: ≤ 10 - (0-2) 0: no necrosis amount 1: 10 – 20% 1: ≤ 50% necrosis 2: 21 - 49% 2:> 50% necrosis 3: > 50% Mitotic count count per 0: ≤ 10 - count per 1: 0 - 9 figures 10 HPF 1: 10 - 20 2.37 mm2 2: 10 - 19 figures (0-3) 2: 21 - 29 (1-3) 3: > 19 figures 3: > 30 Metastases Local recurrence Total 0 - 5 = Grade 1 Score 1, 2, 3 Total score ≤ 3 = Grade 1 0 % 7% score 6 - 9 = Grade 2 15 4 -5 =Grade 2 27% 35% 24 10 - 12 = Grade 3 ≥ 6 = Grade 3 22% 75%

2 13

1.2.7 Treatment

Surgical removal of the spleen (splenectomy) is the primary recommended treatment for splenic HSA and adjuvant chemotherapy treatment is indicated for almost all cases. Doxorubicin-based chemotherapy protocols are used, it can be administered as a single agent or combined with Cyclophosphamide or Vincristine depending on the veterinarian’s decision [23]. Radiation therapy in dog with HSA as a primary therapy is not widely studied, but palliative radiation was reviewed by Smith (2003) [22].

Studies about developing treatment protocols, drug delivery systems, cancer vaccines, immunomodulators, anti-metastatic especially anti-angiogenesis agents or molecular drug target research are still ongoing [63-67]. Alternative anti-cancer agents were also studied such as Polysaccharopeptide from mushroom or the Chinese herb, Yunnan Baiyao, might be other HSA treatment options [68, 69].

1.2.8 Prognosis

The prognosis for dogs diagnosed with visceral HSA is poor. Median survival time for a dog with HSA ranges from 19 to 240 days depending on clinical stage. Adjunctive therapies may have some influences to prolong survival time, but unfortunately less than 10% survive up to one year, while approximately 30% may survive up to two months [23, 70]. This is shorter than cutaneous haemangiosarcoma, which has median survival time up to 987 days [71].

Well differentiated tumours had a greater tendency to have longer disease-free interval (DFI) or survival time (ST) after surgical removal of the spleen, yet studies to evaluate significant associations between HSA histologic grades and survival time are still needed [53]. In one study of human primary angiosarcoma, well-differentiated tumours by histologic grade showed significant longer survival time than poorly differentiated tumour, whereas tumour growth patterns showed no significant relation with survival time [29, 72].

Tumour fragility was suggested as one of the criteria predicting malignancy or benign, if the tumour is ruptured (more fragile), it has higher chance to be a malignant tumour and the median survival time is significantly shorter, 110 days in malignant and 436 days in benign [14]. Score system constructed from combination of clinical parameters; 'HeLP' has been developed to predict the likelihood of HSA in non-traumatic hemoabdomen canine patients. Area Under the Receiver Operating Characteristics (AUROC) that measured the performance of the model (the closer to 1

14

the better the performance) was 0.77 in validation population, which was considered as good model [73].

1.3 Etiology and pathogenesis of canine visceral haemangiosarcoma

1.3.1 Genomic instability and genetic alterations

Cancer is a genomic disease of multiple contributing factors. According to the latest established cancer hallmarks, genomic instability is a key driver for initiation of carcinogenesis [74-77].

In , angiosarcoma (AS) is much rarer than in the dog. Common causes of human secondary AS are breast cancer radiation treatment and , neither of which have been reported to feature in canine HSA. Genomic amplification of MYC genes was detected in over 90% of samples, while lower number (less than 10%) of sample population had mutations in PLCG1 and KDR genes [78]. While the cause of human primary angiosarcoma remains unclear, genomic profile of the cancer seems to be different. In a case of primary splenic angiosarcoma, genomic profile was assessed to select the most suitable target treatment. The results showed splenic angiosarcoma tumour underwent PIK3CA mutations, FOS and MCL1 gene amplification and a part of PIK3R1 gene deletion [79].

The (human) Angiosarcoma (ASC) project was established to collect clinical information from patients to aid in the research of this rare cancer (Painter et al, 2020). Whole exome sequencing data from 47 tumours showed top three genes that underwent genomic alteration were TP53 (30%), KDR (26%) and PIK3CA (21%). While some samples with TP53 simple somatic mutations (e.g. single nucleotide mutations, indels etc.) also had copy number losses, the alterations in KDR gene were dominantly gain of copy numbers. Only simple somatic mutations were identified in PIK3CA gene [80]. When compared top mutated genes to co-mutations percentages in human sarcoma, apart from TP53 mutations where the percentages of human sarcoma and AS were 39.24% and 30%, respectively, other genes were found highly mutated in AS than sarcoma in general (Table 1- 6) [80, 81].

In a canine HSA cohort, MYC, PTEN, PDGFRB, TP53 gene amplifications were not prominent. Thomas et al, (2014) performed Oligonucleotide Array Comparative Genomic Hybridization (oaCGH) genomic profiling of HSA tumors from 40 Golden Retriever (GR) dogs plus 35 non-GR dogs and found more than 20% of the samples had significant gain of copy number, suggesting amplifications on 5, 12, 13, 24 and 31 where genes SKI, VEGFA, PDGFRA, KIT and KDR reside. Loss of copy number, suggesting genetic deletion was detected on chromosomes 11

15

and 16 where tumor suppressor genes CDKN2A/B and its interacting protein gene CDKN2AIP reside. Differences in copy number variations (CNVs) between Golden Retrievers and other dog breeds, including Australian Shepherd Dog, Bernese Mountain Dog, Flat-coated Retriever and German Shepherd Dog were observed, suggesting that copy number profile may differ between some breeds.

Somatic copy number variations were also identified from the oaCGH data of 69 HSA tumors from Golden Retrievers [82]. Copy number gains were observed in VEGFA (19% of the cohort), KDR (22%), KIT (17%) and MYC (9%), whereas copy number deletions were observed in CDKN2A/B (22%) and AXIN1 (22%). Some samples had a mix of both gain and loss of copy number in PTK6 (gain 8.7%, loss 17%), ARFRP1 (gain 13%, loss 17%) and RTEL1 (gain 13%, loss 17%). Both studies of canine HSA genomic profile using oaCGH showed similar results in that copy number deletions were found on tumor suppressor genes CDKN2A/B and copy number gains were detected on genes that were involved in cancer promotion.

While CNVs were not recurrent on cancer-associated genes TP53, PTEN, PDGFRB and MYC, somatic point mutations in some of those genes were reported to be significant in canine HSA [82- 84]. Whole exome sequencing of matched tumor-normal samples was performed in 20 dogs of various breeds and highlighted candidate mutations in canine HSA such as TP53 which were detected in 35% of the cohort, PTEN, 10%, PIK3CA, 45% and PLCG1, 5% [84]. Similar findings were observed in 47 Golden Retrievers as they showed significant numbers of mutations in seven genes, TP53(60%), PIK3CA (30%) PIK3R1, ORC1, RASA1, ARPC1A, and ENSCAFG00000017407 (<10%) [82]. Target sequencing of HSA candidate gene panels including TP53, PTEN, PIK3CA, PLCG1 and NRAS were conducted in 20 cases from a previous study [84] plus 30 additional cases, and the results showed somatic mutations in one or more genes in the panel, with gene mutations in TP53 in 66% of the dogs, PTEN, 6%, PIK3CA, 46% and PLCG1 in 4% of tumors [85]. Based on these results five distinct groups for HSA intra-tumor molecular subtypes were proposed [85]. Mutations of genes in Ras family e.g. VHL, NRAS, KRAS and HRAS were also investigated in 10 HSA dog cell lines, the mutations of those genes were infrequent and may not play major roles in canine HSA [86].

Gene expression profile and molecular pathways analysis were investigated in several dog breeds with HSA [86, 87]. The results showed over expression of VEGFR1, VEGFA, TIMP-1, ADAM9, PDGFC, MMP14, TNF-α and acid ceramidase. Molecular function involved in canine HSA were cancer, hypoxia, inflammatory, angiogenesis and possible ceramide signaling pathways.

16

Most studies of canine HSA genomic and gene expression profile have noted heterogeneity of the results and clustering among dog breeds. The factors contributing to the heterogeneity of genetic events in canine HSA were unclear but likely the genetic background of the dog breeds plays a role. Exploration in larger cohorts with greater variety of breeds are still needed.

17

Table 1- 6 Genomic events including simple mutations (e.g. single nucleotide mutations, indels etc.) and copy number variations (gain/loss) in human sarcoma, angiosarcoma and canine haemangiosarcoma. Gene Human Human Canine HSA (%) Sarcoma (%) AS (%) TP53 39.24m 30m,cl 66m 60m 35m -

3.46cg - - - - -

34.23cl - - - - -

PTEN 5.06m - 6m - 10m -

4.29cg - - - - -

8.08cl - - - - -

PIK3CA 3.38m 21m 46m 30m 45m -

5.0cg - - - - -

5.77cl - - - - -

PLCG1 1.69m 17m 4m - 5m -

5.77cg - - - - -

2.31cl - - - - -

KDR 1.69m 26m,cg - - - 20cg

10.0cg - - 22cg - -

3.08cl - - - - -

MYC 0.84m 11m,cg - - - -

9.62cg - - 9cg - -

5.77cl - - - - -

KIT 2.53m 9m,g - - - -

10.0cg - - 17cg - 20cg

3.46cl - - - - -

References [81] [80] [85] [82] [84] [83] m = simple mutations e.g. single nucleotide mutations, indels, frameshift mutations etc. cg = copy number gain, cl = copy number loss

18

1.3.2 Breed and genetic background predisposition

German shepherd and Golden Retriever were among the top breeds that reported to develop HSA, followed by Labrador Retriever [10, 11, 15, 17, 18, 88, 89]. The overrepresentation of HSA in German Shepherd and Golden Retriever breeds was significant [89] which demonstrated that breed and genetic background contributed to the high incidences of this cancer. Inherited germline mutations were suspected to be the cause.

A study to identify canine HSA germline risk mutations was conducted by Tonomura et al (2015) [90]. In a genome wide association study in 142 Golden Retrievers with HSA, fine mapping was performed by whole genome sequencing in 9 HSA dogs. The study identified germline risk loci located on 5 at 29Mb and 32Mb positions. No protein coding genes were identified, however, the contributions to higher HSA risk were suspected to be germline variants on non-coding regions of regulatory genes in T-cell mediated immune response in the tumour. Since the study was conducted using a single dog breed within one geographical region, further exploration in other dog breeds of different geographical regions may help construct the picture of HSA risk genotypic landscape of the dogs worldwide.

1.3.3 Neutering status

Neutering was found to be associated with higher risk of HSA in female dogs. From a medical record data assessed in 15-year period, spayed female dogs had higher risk for HSA (OR = 3.18 + 0.73) which is higher than castrated male dogs (OR = 1.39 + 0.17) [91].

Another medical record assessment was conducted in Golden Retriever dogs presented at a veterinary academic centre over 17-year period, 65% of Golden Retriever dogs died from cancers while HSA accounted for 22.64% [13]. No statistically significant difference was observed between the proportion of HSA diagnosed cases in neuter and intact female dogs. Odd ratios of having any cancers according to the neutering status increased with age, from 0.99 to 1.25 in male dogs. While in female dogs, the odd ratios of neutering and cancer risk was 1.73 then 1.19 in older dogs. In an assessment of Veterinary Medical Database (VMDB) during a 40-year period, the odd ratio of neutering and splenic HSA was 1.59 in female dogs. The study also demonstrated that German Shepherd and Golden Retriever were among breeds that highly association with HSA [89].

19

1.3.4 Pathogens

Bartonalla spp. was suspected to be one of oncogenic infectious agents for vascular tumour as Bartonalle vinsonii subsp. berkhoffi gentoype II was isolated in blood from a patient suffering liver epithelioid hemangioendothelioma. The DNA of the pathogen was also detected in the patient's tissue sample. In another case of dog's cutaneous hemangiopericytoma, the pathogen was also detected [92].

From the findings, prevalence of vector-borne pathogens such as Bartonella spp., Babesia spp., Mycoplasma spp., Ehrlichia/Anaplasma spp., Hepatozoon spp., Leishmania spp., and Rickettsia spp. were investigated in dogs with splenic lesions [19, 93, 94]. The DNA of Bartonella spp. was detected in 26% of 50 canine HSA tissue samples. Later study found 73% of 110 HSA dogs were positive for Bartonella spp [93]. by the detection of DNAs of this microorganism, not only in the tumour tissues, but non-tumour tissues collected from other anatomical sites of the same dog [94]. The prevalence of vector-borne pathogens in dogs with splenic lesions was different in the Mediterranean region. Bartonella spp. was not detected in any splenic lesions while Babesia canis was detected in four samples out of HSA blood or tissue samples [19].

1.3.5 Hypoxia and inflammation

Even though the pathogenesis of canine HSA remains unclear regarding the origins of HSA, whether the cancer arises from transformed endothelial cells or from hematopoietic precursor cells. As, apart from endothelial cell markers, CD31 or vWf, endothelial progenitor cells, CD133, CD34 and hematopoietic cell marker, CD117 were expressed by HSA cells [95-97]. An angiosarcoma- induced mouse model may help elucidate the pathogenesis of HSA as a comparative model [98].

The study suggested hypoxia and inflammation as important factors that lead to hepatic angiosarcoma development. Interestingly, the organs involved were not the organ where the tumour located (liver), but bone marrow and spleen were also involved. 2-Butoxyethanol (2-BE) was used for induction of red blood cell hemolysis, consequently created hypoxic condition which; 1) triggered an inflammation processes in the liver which induced endothelial cell (EC) proliferation. 2) activated Erythropoietin signaling that promoted endothelial cell progenitor cell (EPC) differentiation and endothelial cell proliferation in the bone marrow and spleen. Those cells were exported to the site of tumour. During the phase of EC proliferation and EPC/EC recruitment from the bone marrow and spleen, the mechanisms that maintain genomic stability was failed at some stage, then the mutated proliferated cells developed HSA [98]. According to the model in mouse, it is possible that there were two main sources of proliferating cells that could undergo

20

carcinogenesis. The involvement of hypoxia and inflammation processes in HSA was found in Gene Set Enrichment Analysis of differently expressed genes in a dog HSA cohort, especially of Golden Retriever breeds, suggested those pathways may play important roles in canine HSA pathogenesis as well [87].

1.4 Biomarker

1.4.1 Biomarker definition and classification

In biomedical research, clinical practice and medical product development, the FDA-NIH Biomarker Working Group has given the definition and categorized types of biomarkers according to the applications in their BEST (Biomarkers, EndpointS, and other Tools) Resource, describing a biomarker as a ‘defined characteristic that is measured as an indicator of normal biological processes, pathogenic processes, or responses to an exposure or intervention, including therapeutic interventions’ [99].

Anything which is able to be measured objectively (has standard parameter) can be a biomarker such as heart rate, alteration in mean arterial blood pressure indicating a physiologic or pathologic condition, and blood glucose concentration in Diabetes Mellitus are examples of classical biomarkers/biological markers. When it comes to the molecular biology era, DNA, RNA, protein and metabolites have become measurable, and are called ‘molecular biomarker’ which are the focus for researchers who are trying to find a specific biomarker for a specific condition [100].

Again, when categorizing a biomarker according to its application purposes, a biomarker can be a diagnostic, monitoring, pharmacodynamics/response, predictive, prognostic, safety or susceptibility/risk biomarker [99]. Often one biomarker can be used for more than one purpose. For example, testing of blood glucose could be for routine health check screening or for monitoring the efficacy of glycaemic control treatment. So, it could be a screening diagnostic biomarker at first as well as monitoring biomarker after a patient was confirmatively diagnosed.

It is noted that predictive and prognostic biomarker might sound similar, but they are not. The predictive biomarker is used to identify (predict) the individual which would respond to a particular treatment [99]. Clinicians use predictive biomarker to determine which treatment option is the best suitable for their patients, so the medical care nowadays can become more individualized and more precise.

21

Biomarkers in the drug research field are also classified into Type 0, Type I and Type II according to their utility in drug discovery processes. Type 0 biomarker is to indicate disease status i.e. diagnostic biomarker; Type I is to detect the drug effect and Type II is considered as surrogate endpoint [99-101]. A biomarker becoming a surrogate endpoint is the ultimate goal of biomarker research, because it reflects the target of interest directly. Examples of biomarkers that have a surrogate endpoint potential: serum creatinine concentration reflecting renal function or, blood glucose which indicates a patient’s glycaemic status. Biomarkers which have become surrogate endpoint are used worldwide, aid in disease staging and impact clinician’s decision making. They also help the researchers, healthcare professionals, patients and the public to communicate more easily because of their ability to depict patient’s disease condition substantively and unarguably.

1.4.2 Biomarkers in cancer research

Nowadays, the frame work of cancer biomarkers research are based on the investigation into specific molecular pathways of the cancer cell to find either a target for treatment, to predict the most suitable treatment option or to draw a prognosis [102]. Technologies such as genomics, epigenomics, transcriptomics, proteomics and metabolomics are applied (Figure 1- 3).

Cancer biomarkers can be assessed directly by measuring the secreted substances from cancer cells or indirectly by changes that occur in response to the development of cancer e.g. immune response, hormonal changes, profile of serum proteins by using mass spectrometry. Sometimes more than a single biomarker is required [103].

The ideal biomarkers for detecting cancer at an earlier stage should ideally have the following properties; 1) Non-invasive sample collecting methods. Samples such as urine or blood are preferred. 2) Inexpensive. 3) Once the standard protocols for analysis is established, it can be done anywhere and yields the same valid results. 4) High specificity (low false positive rate) [103].

Even though the number of papers about biomarkers in veterinary species is less than human, promising biomarkers for disease detection like chronic kidney disease or appropriate breed selection in livestock species are available [104, 105]. Cancer biomarkers in companion animals are studied because veterinary species such as dog is a suitable animal model to study human cancers [106, 107].

22

Figure 1- 3 Tools for biomarker discovery. (Summarized from Jain (2010) and Niedbala et al. (2009) [100, 101]. Parts of the image were modified from Baker et al. (2014) [108].

1.4.3 Steps in biomarkers development

To deliver a biomarker into clinical use, it needs to pass several steps which could take years of research time, involving hundreds of researchers in multidisciplinary fields and thousands of samples sacrificed [109]. Steps in biomarker development are comprised of discovery, validations and clinical implementation or they could be assigned into 5 phases as proposed formerly as shown in Table 1- 7 [103, 110].

In phase 1 or Discovery step, research questions should be clearly defined. For example, what is the target population, what sample specimen to collect, including the inclusion/exclusion criteria. The ideal method of sample collection is prospective collection, but most of the times, archival samples have also been used. Retrospective or observational studies were designed with points to consider such as; 1) variations in methods of sample collection and processing, 2) data analysis procedures and 3) low sample size which could affect the results of the analysis which were likely to cause higher false positive results [110].

Next, in the validation steps, analytical validation or phase 2, the aim is to assess how accurate and reliable the test can measure the analytes. The sample set used in this phase could be the same as

23

used in phase 1, and the biomarkers are validated by; 1) Same assay which is done initially to see how constant the results are measured and 2) a different assay. For example, candidate protein biomarkers discovered by mass spectrometry are validated by an immunohistochemistry; gene expression profile’s candidate gene biomarkers are validated by RT-PCR or in situ hybridization [110].

The clinically utility validation or phase 3 and 4 of the biomarker development, is to assess how reliable the test is when correlating with the outcome. These phases may employ either a retrospective-longitudinal repository study or prospective screening studies, if there is no financial constraint For Clinical implementation, there are 4 key elements to consider when delivering the novel biomarkers to be utilized; 1) regulatory approval, 2) commercialization, 3) coverage by health insurance companies and 4) incorporation in clinical practice guideline [103, 110].

Table 1- 7 Steps in cancer biomarker development [103, 110]. The asterisk (*) indicates where studies of this thesis are at.

Pepe et al. (2001) [103] Goossens et al. (2015) [110]

Phase 1 Preclinical exploratory* Biomarker Discovery*

Phase 2 Clinical assay and validation Analytical validation*

Phase 3 Retrospective longitudinal study Clinical utility validation

Phase 4 Prospective screening

Phase 5 Cancer control Clinical implementation

24

1.5 Serum biomarkers for canine visceral haemangiosarcoma

For the ideal biomarkers, they should be collected from patients by the least invasive methods and should be specific to the disease as well. While the spleen is the most common organ in dogs that develops HSA, but not all the masses are HSA. They could be other types of cancers such as , histiocytic sarcoma or benign lesions like hematoma [14, 18, 111].

Plasma and serum biomarkers are considered good options for canine visceral HSA screening or diagnosis markers because of their less invasive sample collection techniques. However, commonly reported concerns were about the specificity. Increased concentration of serum α-1 acid glycoprotein (AGP), plasma thrombin-anti thrombin (TAT) complex and thymidine kinase 1 (TK-1) were detected in HSA and various types of tumours, both malignant and benign [112-114]. While serum vascular endothelial growth factor (VEGF) and ferritin were also elevated in HSA when compared to healthy dogs, none of the markers could distinguish HSA from other HSA-like splenic lesions such as haematoma [115-117]. Serum collagen XXVII alpha-1 concentration is higher in dogs with HSA compared to healthy dogs or dogs with other cancers or inflammatory diseases, but the concentration of this marker in the serum of dogs with HSA-like splenic lesions was not investigated [118].

Plasma/serum HSA biomarkers that were capable of differentiating HSA from other benign splenic lesions (so called 'HSA-like') included Big endothelin-1 (ET-1), circulating micro RNAs and several glycoproteins. Serum ET-1 concentration was higher in dogs with HSA when compared to HSA-like splenic lesions including benign and other malignant tumours [119]. Plasma circulating micro RNAs, miR-214 and miR-126 were detected in dogs with HSA at higher levels than in healthy dogs and dogs with non-malignant lesions [120]. Despite a range of biomarkers for splenic lesions including HSA were discovered, to deliver a biomarker into clinical use, it needs to pass several steps from the discovery through analytical validation, clinical utility validation and then clinical implementation. This could take years of research time, involving hundreds of researchers in multidisciplinary fields and thousands of samples are needed [110]. Canine visceral HSA serum/plasma biomarkers that have been studied are summarized in Table 1- 8.

In addition to those published studies, a PhD thesis from The University of Queensland School of Veterinary Science describes the identification of serum glycoprotein biomarkers for the detection of canine visceral HSA. In this study, candidate glycoprotein biomarkers were isolated from HSA, HSA-like and normal dog serum using lectin-coated magnetic bead array, then protein identification was performed by tandem mass spectrometry (LeMBA/MS/MS). Among a panel of lectins

25

(proteins that specifically recognize and bind to glycans of glycoproteins, [121]) used to identify glycosylation signatures, there were four lectins which differentially isolated candidate glycoproteins from HSA serum, and thus potentially could be used to measure differential glycosylation of candidate protein biomarkers for canine visceral HSA; WGA (Wheat germ agglutinin), SNA (Sambucus nigra), DSA (Datura stramonium) and PSA (Pisum sativum) [122]. Top 20 candidate glycoprotein biomarkers for canine visceral haemangiosarcoma are shown in Table 1- 9.

Table 1- 8 Potential plasma and serum biomarkers for canine visceral haemangiosarcoma.

Potential biomarkers Implications Notes References

Plasma vascular Elevate Potential. Could not distinguish [115, 117] endothelial growth between HSA and benign vascular factor (VEGF) lesions (e.g. hematoma).

Plasma Thrombin- Elevate Is found elevated in various types [112] anti-Thrombin (TAT) of tumours, not specific to HSA. complex

Platelet count and TS < 5.8 Markers were useful only when [16] Total solid g/dL and dogs already presented with clinical concentration (TS) platelet signs. count < 90,000 cells/µL

Plasma cardiac Elevate Marker reflects cardiac myocardial [123] Troponin I (cTnI) damage, not a specific marker for HSA.

Serum α-1 acid Elevate Is found elevated in various types [114] glycoprotein (AGP) of tumours, not specific to HSA.

Serum Collagen Elevate Highest concentration when [118] XXVII alpha 1 compare to other cancers and inflammatory diseases. Comparison between HSA and vascular lesion (e.g. hematoma) was not investigated.

Serum Thymidine Elevate Is found elevated in various types [113] kinase 1 (TK-1) of tumours, malignant and benign.

Continue the next page

26

Table 1-8 (Continue)

Potential biomarkers Implications Notes References

Serum ferritin Elevate Potential. Could not distinguish [116] between HSA and benign vascular lesions (e.g. hematoma).

Serum glycoproteins Decrease Potential HSA-related glycoprotein [122] captured by lectins; biomarkers. Further validation is Maltase-glucoamylase needed. (MGAM), complement C7, vitronectin (VTN)

Assessment of Highly No association found at this stage. [124] acanthocyte presented in blood smear of HSA dog.

Serum Big Elevate Potential HSA biomarker. [119] Endothelin-1 (ET-1) Further validation is needed.

Plasma circulating Detected Has potential to be HSA biomarker, [120] miRNAs (Expressed) further validation is needed. -214, -126

Serum monocyte Elevate Comparison of monocyte [125] chemokine CCL2 chemokine CCL2 was made between serum from HSA to healthy dogs. In metastatic site (), higher density of CD18+ cells were presented in HSA when compared to other cancers (OSA, STS, melanoma, TCC). Comparison between HSA and vascular lesion (e.g. hematoma) was not investigated.

27

Table 1- 9 Top 20 candidate glycoprotein biomarkers for canine visceral haemangiosarcoma, pulled down by Lectin magnetic bead microarray then analysed by mass spectrometry (LeMBA/MS), ranked by analysis of area under the curve (AUC). General specificity; SNA/sialic acid, WGA/D-N-Acetyl glucosamine, DSA/D-N-Acetyl glucosamine, PSA/α-L-Fucose and LPH/complex specificities [122].

Number Lectin_protein AUC Number Lectin_protein AUC

1 SNA_MGAM 0.8105 11 SNA_CP 0.7250

2 LPH_MGAM 0.7594 12 SNA_ATIII 0.7224

3 SNA_C3 0.7500 13 PSA_CP 0.7208

4 WGA_C7 0.7449 14 LPH_ATIII 0.7112

5 SNA_AFM 0.7408 15 WGA_MGAM 0.7078

6 PSA_C7 0.7403 16 LPH_C7 0.7041

7 DSA_C7 0.7286 17 NPL_C7 0.7020

8 SNA_VTN 0.7276 18 WGA_GSN 0.6976

9 WGA_ATIII 0.7264 19 PSA_MGAM 0.6970

10 DSA_MGAM 0.7254 20 WGA_LRG1 0.6858

28

1.6 Tissue markers for canine visceral haemangiosarcoma

Judged on published reports, the purposes for studying canine HSA tissue markers focused in 3 major areas, which are (i) finding the treatment target, (ii) developing diagnostic and prognostic methods and (iii) to understand biological behaviour or pathogenesis of this cancer.

Factor VIII related antigen or von Willebrand factor (vWf) and Ulex europaeus-1 agglutinin (UEA- 1) lectin were used for human endothelial cell markers in immunohistochemistry of FFPE tissue samples, they also expressed on vascular neoplasms such as Haemangioma (HA) and Haemangiosarcoma (HSA) [31, 126]. In canine HSA and HA, Factor VIII related antigen was first reported as endothelial cell markers which was also immuno-positive in 100% of HA tissue samples. While fewer positive samples (89%) were detected in the HSA [24]. Lectin histochemistry using several lectins, including UEA-1 for canine endothelial cell markers were performed by Augustin-Voss et al (1990). In contrast to the results previously reported in human tissue, lectin histochemical signal for UEA-1 lectin was not detected on normal endothelial cells of dog tissue while the signals on HA and HSA tissues were varied. In 1995, the utility of Platelet Endothelial Cell Adhesion Molecule (PECAM) or CD31 as canine endothelial cell markers was reported [25]. Immunohistochemistry on canine HSA and HA FFPE tissues in the study showed better performance of CD31 in detecting HSA (100% positive samples) when compared to vWf (73% positive samples). Since then, CD31 along with Factor VIII related antigen (vWf) have been widely used in immunohistochemistry as endothelial cell markers to help distinguish vascular tumours from other soft tissue tumours.

Some antibodies showed differences immunohistochemical signal intensity between malignant vascular tumour and benign tumours or lesions, i.e. HSA vs HA. Claudin-5, VEGF, flk-1, VEGFR- 3, CD44, CD117, AGP-α and PDGF-β were among those proteins, suggesting the roles of those proteins in malignancy [54, 114, 127-129]. Because the assessment of immunohistochemical signals is subjective, to apply any markers for comparing levels of signal intensities, standard immunohistochemical scoring system is needed.

Some proteins such as COX-2, p53 and cyclinD which were known to have important roles in cancer were explored, however the studies reported no immunohistochemical signal was detected on HSA tissues [130, 131]. Immunohistochemical labelling of canine HSA tissues with some molecules helped demonstrate tissue expression of those molecules and provide information on possible treatment targets. Table 1- 10 summarizes the results of the studies that have explored antibodies for canine HSA immunohistochemistry.

29

Table 1- 10 Markers that were explored for immunohistochemistry on canine haemangiosarcoma (HSA), haemangioma (HA) and other tissue samples.

Markers Sample (n) Results summary References

HSA HA Others

COX-2 19 20 17 No IHC signal detected on HSA tissues. [130]

STATs 22 18 - IHC for signal transducer and activator [132] of transcription-3 showed higher percentage of positive cells on HSA tissues when compared to HA.

VEGF, 73 15 3 IHC for VEGF and flk-1 in HSA showed [128] bFGF, higher signal compared to HA, suggested important roles of VEGF in malignant flt-1, proliferation. flk-1

p53, 6 6 - No IHC signal for p53 and cyclinD [131] p21, detected on any HSA and most of HA tissues. p16 IHC signal for Estrogen Receptor-α was cyclinD, higher in HSA tissues. PCNA, pAkt, telomerase, ER-α

Claudin-5 26 6 60 All HSA tissues were positive for [127] immunohistochemical reaction with Claudin-5 and CD31. In solid HSA growth pattern, IHC signal for vWf was detected in 6/11 samples.

HoxA9, 78 30 8 The Positive Index (%) of HSA's [133] HoxB3, homeobox protein expression, especially HoxD3 and Pbx1 was higher than HA's. HoxD3, The expression of homeobox proteins in HoxB7, HSA cell is similar to those in the active Pbx1, endothelial cell during angiogenesis. Meis1 (Continue next page)

30

Table 1-10 (Continue)

Markers Sample (n) Results summary References

HSA HA Others

CD117 40 29 10 HSA tissue expression of CD117, [54] (KIT), (GT) VEGFR-3 and CD44 is higher than HA. VEGFR-3, Normal vascular endothelial cells in granulation tissue (GT) also expressed CD44 CD117.

AGP-α 4 - 3 HSA and other malignant tumours were [114] immuno-positive for alpha-1 acid glycoprotein (AGP-α), while IHC signal was not detected in hematoma and nodular hyperplasia tissue.

PDGF-β, 46 21 - HSA had the highest proportion of [129] PDFG-BB, samples that showed strong IHC signal for PDGF-β, when compared to HA. PDGF-α

flk-1, 122 - - HSA samples showed various IHC signal [29] intensities of the markers performed. VEGFA, Ang-2, Tie-2

PDGFR-α, 27 20 - Despite the mutations of PDGFRs genes, [134] PDGFR-β, the expression of the proteins was still detected by IHC. More than half of HSA samples showed higher PDGFR- β signal intensity when compared to HA.

uPA, 57 26 - IHC for urokinase plasminogen activator [135] uPAR and the urokinase plasminogen receptor were detected in 73.2% and 100% of HSA splenic tissue samples, respectively. Lower proportion of positive samples were detected in HA tissues.

(Continue next page)

31

Table 1-10 (Continue)

Markers Sample (n) Results summary References

HSA HA Others

RTKs, 10 - 2 c-kit and PDGFR-2 were suggested as [136] c-kit, specific Ag for HSA VEGFR-2, PDGFR-2, PI3K, phosphor- Akt, phosphor- mTOR, ERK

KIT 36 16 GT Despite the mutations of KIT genes in [137] HSA cells, IHC signal for KIT was detected in 94.4% of HSA samples. While no IHC signal was detected in HA samples.

PSMA 27 - - IHC signal for Prostate-specific [138] membrane antigen was detected in all HSA samples

32

1.7 Fourier Transform Infrared (FTIR) spectroscopic imaging

1.7.1 Infrared (IR) spectrum

‘Infrared (IR) radiation refers broadly to that part of the electromagnetic spectrum between the visible and microwave regions’ [139] (Figure 1- 4). The properties of electromagnetic spectrum are addressed in several aspects depending on which region is used and what is the most convenient way to mention such as energy (E), wavelength (λ) or frequency (ν) [140]. The size, or wavelength of the IR spectra is slightly longer than visible light and about the size of a pinpoint (Figure 1- 4).

Figure 1- 4 Electromagnetic spectrum. (Image source: https://mynasadata.larc.nasa.gov/science-practices/electromagnetic-diagram/).

1.7.2 Fourier Transform Infrared (FTIR) spectroscopic imaging

FTIR technology allows us to identify chemical composition in the substance. FTIR imaging tools provide spatial information of chemical composition [141]. A wide range of FTIR imaging usage has been applied, for examples, forensic analysis of foreign materials in biopharmaceutical manufacture [142] or identification of pigments, binders and degradation products in an antique painting [143]. Recently researchers in the biomedical field became interested in using IR spectra to assess biochemical contents in biological materials (e.g. cells, tissue, biofluid etc.) [108]. Based on the principle that when IR light travels across molecules in biological materials, the radiation is absorbed and converted into vibrational energy. The absorbance at particular wavenumber can then

33

be measured. For biological material analysis by using IR spectra, the wavelength is measured and stated in the reciprocal term, wavenumber (ṽ) where the most convenient scale of unit is cm-1. Each molecular bond in biological material such as lipid, protein, DNA, carbohydrate etc. has its own wavenumber absorption preference, i.e., has a distinct IR spectrum. The IR spectra range from 10,000 to 100 cm-1 while the spectra used for analysing biological materials is within 4000 - 400 cm-1 [108, 139].

The former method for measuring IR absorbance was a dispersion device, the absorption area of the sample was plotted as frequency versus intensity. With the Fourier Transform Infrared spectroscope (FTIR), the ‘interferometer’ was introduced to create an ‘interferogram’, which is the outcome from passing different IR wavelengths through the sample at the same time. Another part introduced into the machine is the computer part which is to ‘transform’ the interferogram into the typical IR spectrum of each sample. This allowed the peaks to be plotted and analysed. The IR spectral peaks reflect biological contents in each point where IR spectrum passes through (Figure 1- 5) [139, 144].

Figure 1- 5 Sample of IR spectrum acquired from FTIR spectroscope [108].

Workflow for FTIR spectroscopic imaging and diagnosis has 3 major steps; sample preparation, FTIR spectral acquisition and data analysis. Types of sample (tissue, live cells or biofluid) and substrate suitable for FTIR spectral sampling mode are first considered in the sample preparation

34

step. In the FTIR spectral acquisition step, there are 3 modes; transflection (or reflection), transmission and attenuated total reflection (ATR). Substrate to be used depends on which mode of spectral acquisition chosen, e.g. Low emission (Low-E) slide is suitable for transflection mode, and Calcium fluoride is suitable for transmission mode. Transflection mode means the IR spectra pass through the sample and then reflect to the detector while in transmission mode the IR spectra pass through sample to the detector. Choice of substrate and sampling mode is considered based on previous works on similar samples, as this technology has been introduced into biological material analysis comparatively recently, optimization in those steps may need to be done [108].

For initial data analysis, raw spectral data can be processed into ‘chemical image’ or categorized into ‘clusters’ which is part of the training process. For diagnostic purpose, the trained classification system will be applied to the test data set; the workflow diagram can be found in Baker et al. (2014) [108]. FTIR spectroscopy computational software is an important part for data analysis.

In cancer diagnosis research using tissue samples, the advantages of FTIR spectroscopy over traditional H&E stain or special stains such as Masson’s trichrome, cytokeratin stain etc. include: minimizing sample preparation protocols and time, does not need contrast agents to be stained and the chemical changes recorded by IR spectroscopy yielded the same as histological stains [145].

1.7.3 Using FTIR spectroscopy to identify biochemical components in canine tissues

FTIR spectroscopic imaging of canine advanced hepatic histiocytic sarcoma and normal liver tissues were compared using different spectral sampling modes (transmission vs transflection) and sample preparation techniques (deparaffinized vs paraffinized sample). Sample preparation for transmission mode was 8 µm tissue on CaF2 slide and for transflection mode, 4 µm tissue was placed on Mirr-IR Low-E slide substrate. IR spectral sample collected using Focal plane array (FPA) detector, 4000-900 cm-1 spectral range, 8 cm-1 spectral resolution. For IR spectral data analysis, Cytospec and Unscrambler software were used [146].

Even though transflection mode was known to have interference from electric field standing wave (EFSW) artefacts greater than in transmission mode [147]. And deparaffinization was recommended before acquiring IR spectra, non-deparaffinized tissue samples on Low-E slides were shown to yield similar diagnostic results as in transmission mode with some consideration and adjustment needed [108, 146]. This suggests that besides good sample preparation technique and appropriate FTIR spectral acquisition, data processing by using computer software is helpful to overcome some limitation of the transflection mode.

35

1.8 Hypothesis, aims and thesis structure

The visceral form of HSA is difficult to detect. The clinical signs are non-specific and affected dogs are often brought to an emergency unit due to the consequences of tumour rupture such as blood loss into the abdominal cavity, collapse or death. A diagnosis of HSA carries a very poor prognosis with most dogs surviving less than a year. Immediate euthanasia is therefore commonly chosen by owners and veterinarians.

The goal of the project is to help those dogs by exploring biomarkers that can improve HSA diagnosis and prevention. The thesis comprises three major themes; 1) glycoprotein biomarkers, 2) genetic markers and 3) infrared spectral markers, with four independent studies conducted. The study rationales and hypotheses are described as followings:

1.8.1 Rationale 1 Glycoprotein biomarkers

From previous studies [122], glycoproteins were promising biomarkers for dog HSA diagnosis. This work has been initiated and candidate glycoproteins were sorted out by comparing glycoproteins in serum from HSA dogs to non-HSA (or HSA-like) dogs. However, serum samples analysis by using mass spectrometry could not demonstrate which cell the glycoproteins were produced from unless immunohistochemistry (IHC) or lectin histochemistry (LHC) is performed on cancer tissues to visualize the actual location of glycoprotein production.

Hypothesis 1

The candidate glycoproteins for canine HSA detected in the serum, should be detected in HSA cells and the level of glycoprotein expression is different from normal vascular endothelial cell

Aim 1

Validation of canine HSA glycoprotein biomarker candidates and determine sites of those glycoproteins’ production/expression on HSA tissues.

Objective 1.1 To locate sites of the production/expression of these glycoproteins.

Objective 1.2 To demonstrate differences in the level of glycoprotein marker expression between HSA and non-HSA tissues.

Objective 1.3 To determine possible HSA-specific tissue markers.

36

Significance

1. Narrow down canine HSA glycoprotein biomarker candidates, facilitate further development.

2. There are possibilities to propose novel HSA tissue specific markers in addition to those that were unable to clearly distinguish malignant HSA from non-malignant vascular lesions.

3. Proteins/glycoproteins expression characteristics such as sites/cells involved in glycoprotein production could be determined and contribute to more understanding of this cancer.

1.8.2 Rationale 2 Inherited HSA predisposing genetic markers

The pathogenesis of canine HSA remains unclear. It is commonly seen in some dog breeds such as Golden Retriever and German Shepherd, which implies that genetic components could be involved. Predisposing factor that increase the risks for these dogs to develop HSA was suspected to be germline mutated, inherited and then segregated in some breeds because of in-breeding process.

There are still no conclusive variants, 'alterations in the most common DNA nucleotide sequence' (NCI Dictionary of genetics terms) to be HSA predisposing gene markers. Whole sequencing plus bioinformatic analysis allowed us to explore multiple variables and can fine map the altered base sequences. There are possibilities to identify 'carrier' dogs at a higher risk to develop HSA that should be bred away by dog breeders or raise the awareness of dog owners to monitor their pet's health condition.

Hypothesis 2

Dogs that develops HSA were carriers for HSA related germline variants while normal dog were not. The variants can be identified by association analysis study.

Aim 2

Identification of genetic markers associated with HSA that can be used to predict individual risk.

Objective 2.1 To identify visceral HSA related germline variants in dogs, by the analysis of whole exome sequence data.

Significance

1. Prevalence of HSA in dogs can be reduced. When there are genetic markers that can identify HSA risk carrier dogs, breeding management can be performed by avoid breeding the carrier dogs.

37

2. Whole exome sequence data generated in this study will be contributed to the data base archive for future studies.

1.8.3 Rationale 3 Somatic mutations in canine visceral haemangiosarcoma

Genomic instability is one of the hallmarks of cancer, including HSA. To date, genomic profile of cancer patients can be obtained using next generation sequencing technologies. Knowing what went wrong in the cancer genes can help elucidate the pathogenesis of the cancer and also be informative for treatment design in precision medicine such as development of drug target, being selective markers for chemotherapeutic agents or becoming prognostic markers. Whole Exome Sequencing studies showed possible driver mutated genes in dogs were similar to human (PIK3CA, TP53, PTEN, PLCG1) and dog could therefore possibly be an animal model to study human Angiosarcoma (AS), which is rare compared to dog HSA. However, the driver mutations in dog were not obvious and still needed to be investigated further.

Hypothesis 3

HSA cell underwent genomic alterations, its genomic profile is different from those of normal somatic cell and can be identified.

Aim 3

Identification of genetic alterations in canine visceral HSA

Objective 3.1 To identify genes that undergo somatic mutations that significantly related to canine visceral HSA, by analysis of whole exome sequence data.

Significance

1. Potential driver mutations in canine HSA are identified and highlighted.

2. Whole exome sequence data generated in this study will be contributed to the data base archive for future studies.

1.8.4 Rationale 4 Using Fourier transformed infrared (FT-IR) imaging to distinguish splenic lesions in canine formalin-fixed, paraffin-embedded (FFPE) tissues.

Based on the hypothesis that HSA cells in the spleen produce different biochemical components from normal cells in the splenic parenchyma. Biochemical alterations may precede visible

38

morphological changes in cancerous cells. The detection of those is one of the keys to a successful early cancer diagnosis.

Detecting biochemical component differences between cancerous patient samples versus non- cancerous samples is a method to discover cancer biomarkers. There is a non-sample destructive, label-free chemical compound analysis method using infrared spectrum called Fourier Transform Infrared (FTIR) imaging which recently has been applied to analyse biochemical components in a wide range of biological samples, including cancer tissues. However, there are countable numbers of studies using dog cancer tissues, especially the spleen.

This study is to explore the use of this technique to discover which biochemical compounds in dog HSA tissues are different from HSA-like tissues and the capability of FTIR imaging plus machine learning algorithms in distinguishing HSA tissues from HSA-like tissues.

By knowing what biochemical compounds to look for, further steps, i.e. serum biochemistry profile analysis, could be done in the future to ascertain the efficacy as HSA biomarkers. Or a rapid cancer scanning test using freshly collected tissue samples from dogs could be developed.

Hypothesis 4

The HSA cell contains different biochemical components, compared to those of normal endothelial cell and can be detected using FTIR imaging plus machine learning and classification testing.

Aim 4

Exploring the capability of FTIR imaging for the detection of canine HSA in FFPE tissue sample.

Objective 4.1 To optimize FTIR image data acquisition method: Reflection/Transmission modes.

Objective 4.2 To optimize sample preparation methods for FTIR image data acquisition.

Objective 4.3 To detect FTIR spectral data differences between HSA cancer cells and normal endothelial cells.

Objective 4.4 To test FTIR imaging plus machine learning algorithms whether they can correctly classify canine HSA tissue from non-cancerous tissue.

Significance

1. Reports proof-of-concept evaluation of Fourier Transform Infrared (FTIR) imaging for canine HSA diagnosis.

39

This thesis includes four experimental chapters according to the rationale above. The structure of the thesis is demonstrated in Figure 1- 6.

Figure 1- 6 Thesis structure and chapters. The thesis comprises three main topics, glycoprotein biomarkers (chapter 2), genetics regarding germline mutations (chapter 3), somatic mutations (chapter 4) and exploration of novel cancer diagnostic method (chapter 5).

40

Chapter 2 Validation of candidate glycoprotein biomarkers for canine visceral haemangiosarcoma using semi-quantitative lectin/immunohistochemical assay

2.1 Introduction

The visceral form of haemangiosarcoma (HSA) remains one of the more common, invariable fatal cancers in dogs. The cancer develops in visceral organs such as spleen, liver and heart. There are usually no specific clinical signs until the tumour grows and ruptures, resulting in massive blood loss. Sudden death or euthanasia are often the consequence. In practice, the mass along with the affected organ, especially the spleen, may be surgically removed, however, the survival time of dogs diagnosed with HSA is less than a year [23, 70]. Definitive diagnosis for HSA is histopathologic examination of tissue samples obtained via surgical spleen removal or post-mortem necropsy.

Under microscopic examination of hematoxylin & eosin (H&E) stained sections, typical HSA cells are elongated, spindle-shaped, plump, anaplastic vascular endothelial cells and arrange themselves to form vascular channels or chambers along with a supporting acellular, bright eosinophilic hyaline substance forming the stroma [3, 38, 48, 50]. HSA growth patterns were summarized following the dominant pattern of solid, capillary/small calibre spaces (typical), cavernous/cystic/large vascular spaces (cavernous) or the combination of all patterns (mixed). However, no significant differences were found in the survival time or HSA tissue expression of endothelial cell markers (CD31/ PECAM-1 and Factor VIII-related antigen/vonWillebrand’s factor) among the different growth patterns [29, 49, 148].

Standard histopathologic grading was not established specifically for canine HSA; most reports have been using HSA scoring systems established by Ogilvie et al. (1996), Sabattini and Bettini (2009) and Meuten et al. (2016) which were modified from soft tissue sarcoma [52-54]. The major parameter was the level of cell differentiation from well-, moderately- and poorly-differentiated HSA, then pleomorphism, extend of necrosis and mitotic count were assessed, given histopathologic grades from low to high grade or grade 1 to 3 [52-54].

Like for many other diseases, the sooner correct diagnosis is made, the greater chance of surviving the patient gets. Pre-splenectomy diagnostic methods to aid in HSA detection have been studied. Imaging techniques including radiography, computed tomography (CT-scan), magnetic resonance

41

imaging (MRI) are useful for assessment of metastatic lesions before or after surgical intervention [42, 46, 47]. Conventional radiography and ultrasonography are helpful for non-invasive internal organ exploration for suspected mass [149]. The ultrasound may give some hints on types of the lesions but could not be used to draw a definitive diagnosis [41]. Elastography (the ultrasonographic technology for measurement of tissue elasticity) might be a promising technology in the diagnosis of human breast, prostate or thyroid cancer, because those cancers have the tendency to lose tissue elasticity, whereas in canine patients, this technique was unable to distinguish between benign and malignant splenic lesions [40]. Ultrasound guided fine needle aspiration (FNA) biopsy is usually performed to collect samples for cytologic examination, however, histopathology remains the final definitive diagnostic method [150].

While cytologic examination from FNA biopsy provided over 80% correlation with histopathologic examination when performed on skin masses (90.9%), mammary gland tumors (81% for malignant tumors, 93% for benign tumors) and at various organ sites (90.4%) [151-153], the correlation between cytologic and histopathologic examination was lesser for the internal organs. FNA biopsied cytologic examination of focal lesions in the liver was 60% and 16.7% correlated with histopathologic results for overall neoplasia and sarcoma, respectively [154]. In splenic lesions, cytologic examination results was 59% correlated with histopathologic examination [33]. In another study, assessing correlation of test results between FNA and needle core biopsy (NCB) of canine splenic lesions, it was found that among two HSA samples, one cytologic result agreed with NCB while the other result disagreed [36]. While the combination of ultrasound guided FNA with NCB may be helpful for diagnosis of splenic lesions, caution must be exercised when performing it on HSA suspected lesions, because the risk of excessive bleeding and tumor seeding are major concerns, especially when using larger than 18G needle [35-37]. It was also noted that the aspirates from spleen tend to have large amounts of red blood cell contamination [35].

Plasma and serum biomarkers are considered good options for canine visceral HSA screening or diagnosis markers because of their less invasive sample collection techniques. However, commonly reported concerns were about the specificity. Increased concentration of serum α-1 acid glycoprotein (AGP), plasma thrombin-anti thrombin (TAT) complex and thymidine kinase 1 (TK-1) were detected in HSA and various types of tumours, both malignant and benign [112-114]. While serum vascular endothelial growth factor (VEGF) and ferritin were also elevated in HSA when compared to healthy dogs, none of the markers could distinguish HSA from other HSA-like splenic lesions such as haematoma [115-117]. Serum collagen XXVII alpha-1 concentration is higher in

42

dogs with HSA compared to healthy dogs or dogs with other cancers or inflammatory diseases, but the concentration of this marker in the serum of dogs with HSA-like splenic lesions was not investigated [118].

Plasma/serum HSA biomarkers that were capable of differentiating HSA from other benign splenic lesions (so called 'HSA-like') included Big endothelin-1 (ET-1), circulating micro RNAs and several glycoproteins. Serum ET-1 concentration was higher in dogs with HSA when compared to HSA-like splenic lesions including benign and other malignant tumours [119]. Plasma circulating micro RNAs, miR-214 and miR-126 were detected in dogs with HSA at higher levels than in healthy dogs and dogs with non-malignant lesions [120]. Despite a range of biomarkers for splenic lesions including HSA were discovered, to deliver a biomarker into clinical use, it needs to pass several steps from the discovery through analytical validation, clinical utility validation and then clinical implementation. This could take years of research time, involving hundreds of researchers in multidisciplinary fields and thousands of samples are needed [110].

Glycoproteins are proteins with carbohydrate or glycans attached to the core molecules as the products of post-translational modification. They are a promising class of cancer biomarkers because glycoproteins are involved in most of the mechanisms of the cancer hallmarks [155-157]. A previous PhD candidate in our group conducted a glycoproteomic study, which identified potential serum HSA glycoprotein biomarkers, using lectin magnetic bead array-coupled tandem mass spectrometry (LeMBA/MS/MS) [122]. The study identified HSA related glycoproteins captured by an array of lectins, the proteins that have binding sites for glycans. Glycoproteins captured by lectins DSA (Datura stramonium), WGA (Wheat germ agglutinin), SNA (Sambucus nigra) and PSA (Pisum sativum) lectins were the top HSA candidate glycoprotein biomarkers and included Maltase-glucoamylase (MGAM), complement component 7 (C7), and vitronectin (VTN).

However, serum sample analysis using mass spectrometry does not reveal which cell type(s) produces the glycoproteins. The aim of the study described in this chapter was to validate previously discovered candidate glycoprotein biomarkers for canine visceral HSA using semi- quantitative lectin- and immunohistochemical assays. To obtain this information immunohistochemistry (IHC) or lectin histochemistry (LHC) must be performed on cancer tissues to visualize the actual or likely location of glycoprotein production. Therefore, further steps in narrowing down the candidates and validation in different sources of samples and/or larger sample size is essential. Using archived tissue samples for study may not directly aid early diagnosis but, a specific marker would then be re-validated in a convenient sample, such as blood, using

43

immunoassays, such as ELISA, before the biomarker can be launched or commercially developed in the future.

2.2 Materials and methods

2.2.1 Samples

Formalin-fixed paraffin-embedded (FFPE) tissues were collected between 2010 - 2017 from the Veterinary Medical Centre, the University of Queensland, and Brisbane Veterinary Specialist Centre (BVSC) or, provided by the Small Animal Specialist Hospital (SASH), Sydney. The sample collection at the University of Queensland Veterinary Medical Centre and BVSC was performed under the Animal Ethics Approval (AEC) number SVS/438/15/KIBBLE/CRF with the animal owner's consent. The inclusion criteria prioritized cases presenting with an abnormal mass in spleen regardless of the diagnoses. Cases with HSA-suspected mass in other internal organs, such as liver and heart, were also collected.

Fifty-eight FFPE samples were selected after histopathologic diagnosis was made by pathologists. Forty-nine samples were collected from spleen, six from liver and one sample from heart, peritoneal mass and sternal mass, respectively. Immunohistochemistry targeting CD31 and myeloid lineage antigen (MAC 387) were performed on ambiguous samples to confirm the endothelial and macrophage/monocytic cell origin, respectively.

The samples were assigned to two groups, HSA (n = 32) and non-HSA (n = 26). Thirty-two samples were diagnosed as HSA whereas twenty-six samples were diagnosed as the following: adenocarcinoma (n = 1), apocrine gland tumour (n = 1), fibrohistocytic nodule (n = 1), haemorrhage (n = 2), hepatocellular carcinoma (n = 5), hematoma (n = 5), lymphoid hyperplasia (n = 1), lymphoma (n = 3), nodular hyperplasia (n = 1), normal spleen tissue (n = 3), splenitis (n = 1), ambiguous diagnosis (n = 2). One patient had two lesions, HSA and lymphoid hyperplasia, respectively. Number of cases and their diagnosis were summarised in Table 2- 1. Further information of patient profile is included in Appendix 1.

For HSA samples, histopathologic grades from 1 – 3, according to tumour differentiation (1 = well- differentiated, 2 = moderately differentiated, 3 = poorly-differentiated) were assigned [53]. Four types of growth patterns [29, 49] were also assessed for further analysis. Numbers samples in each category are shown in Figure 2- 1.

44

Table 2- 1 Number of cases and clinical parameters. Diagnosis Cases Sex Cases Haemangiosarcoma 32 Male intact 11 Adenocarcinoma 1 Male sterile 11 Apocrine gland tumour 1 Female intact 7 Fibrohistiocytic nodule 1 Female sterile 12 Haemorrhage 2 No information 17 Hematoma 5 Hepatocellular Carcinoma 5 Age (median, range) 13 ± 3, 6 – 12 Lymphoid hyperplasia 1 Lymphoma 2 Nodular hyperplasia 1 Normal spleen 3 Splenitis 1 Questioned 2

Figure 2- 1 Distribution of thirty-two HSA samples assigned by histopathologic grades, (left chart) and growth patterns (right chart).

2.2.2 Lectin-histochemistry (LHC)

Five µm thick FFPE tissue sections were cut and placed on Menzel-Glaser Superfrost PLUS® 25 x 75 x 1 mm microscope slides (Thermo® Scientific), heat-fixed for 90 minutes at 60o C, then cooled before deparaffinization and rehydration. No antigen retrieval was performed prior to DSA and WGA histochemistry, while heat induced epitope retrieval (HIER) using Tris/EDTA buffer, pH9, was performed for SNA lectin. Blocking of nonspecific peroxidase activity and protein binding were done by incubating with peroxidase (Peroxidase Block, Dako®, Agilent Technologies) for 15 minutes at room temperature followed by 15 minutes incubation with 0.15 M glycine in PBS, where a quick rinse with TBST wash buffer (0.05 mol/L Tris-HCl, 0.3 mol/L NaCl, 0.1% Tween 20, 0.01%, pH 7.6, TBST Tris Buffer Saline with Tween 20, S3306; Dako®, Agilent Technologies) was

45

applied in between. The tissues were then incubated with antibody diluent with background reducing components ready-to-use (Dako®, Agilent Technologies) for 30 minutes at room temperature as a final step of non-specific protein blocking. Then the tissues were incubated with PBS-diluted lectins at their optimal dilutions shown in Table 2- 2.

Glycan (sugar) chains of the tissue glycoproteins (GlcNAc)2-4, GlcNAc, Neu5Acα6Gal/GalNAc and αMan/ αGlc were detected by biotinylated lectins; DSA (Datura stramonium), WGA (Wheat germ agglutinin), SNA (Sambucus nigra) and PSA (Pisum sativum), respectively. Lectins and their main sugar specificity are shown in Table 2- 2. The slides were incubated in a moist chamber at room temperature for one hour. The lectin binding was visualised using the avidin-biotin peroxidase complex method (Vectastain Elite® ABC Kit Standard, Vector Laboratories Ltd.) and the chromogen substrate 3-amino-9-ethylcarbazole (AEC). The resulting signal was a bright red histochemical signal. Nuclei of all cells were counterstained with Mayer’s Hematoxylin (Lillie’s Modification) Histological Staining Reagent (Dako® Agilent Technologies). A summary of the lectin-histochemistry protocols is shown in Table 2- 2. Figure 2- 2 demonstrates lectin- histochemistry, binding and detection. The sections, mounted with aqueous based mounting medium, Faramount® (Dako®, Agilent Technologies, Denmark), were examined by light microscopy and scanned (Olympus slide scanner VS120, Olympus®) for further analysis.

Figure 2- 2 Lectin-histochemistry. Lectin binding to glycan chains of the glycoproteins on tissue samples. Biotinylated lectins bound to the glycan chains of the glycoproteins on the tissue. Peroxidase-linked avidin-biotin complex (ABC) then binds to the biotinylated lectins. The chromogenic substrates were then added to visualize the Avidin-biotin peroxidase complexes.

46

Table 2- 2 Lectins, including glycan chain specificity and antibodies used in the study and their antigenic retrieval (AR) methods and dilutions. Lectins and Antibodies Class Host No AR Citrate Tris EDTA Proteinase Buffer K pH 6 HIER pH 9 HIER 20 min. 20 min. 15 min. 90o C 90o C Room Temp DSA na na 1:200 - - -

(GlcNAc)2-4 WGA na na 1:500 - - - GlcNAc SNA na na × (1:50)* 1:50 × Neu5Acα6Gal/GalNAc PSA# na na × × × × αMan/ αGlc MGAM# Polyclonal Rabbit × × × × (Clone: HPA002270)

C7 Polyclonal Rabbit - 1:75 - - (Clone: ab192346) Vitronectin Polyclonal Rabbit × 1:100 (1:100)* × (Clone: PA5-27909) CD31 Monoclonal Mouse × × (1:20)* 1:10 (Clone: JC70A) MAC387 Monoclonal Mouse - - - 1:300 (Clone: MAC 387)

× = no lectin/immunohistochemical reaction signal detected - = not performed na = not applicable * = The dilution shown within the parenthesis was performed for optimization, but not on the samples, because the immunohistochemical reaction was weaker than the other AR method. # = All antigenic retrieval techniques were attempted, none resulted in positive IHC.

47

2.2.3 Immunohistochemistry (IHC)

For comparison to lectin-histochemistry results, immunolabeling of candidate glycoproteins, maltase-glucoamylase (MGAM), complement C7, and vitronectin was performed, using rabbit polyclonal anti-MGAM (HPA002270, Sigma Life Science), anti-complement C7 (ab192346, abcam) and anti-VTN antibody (PA5-27909, Thermo Fisher Scientific), respectively. Cross- reactivity between human and canine epitopes was based on the results from previous PhD study [122] where candidate glycoprotein biomarkers were discovered using LeMBA Mass Spectrometry. The identification of glycoproteins (MGAM, C7 and VTN) was based on between human and canine species.

Routinely used endothelial cell marker, CD31 (mouse monoclonal antibody Clone JC70A, M0823, Dako®, Agilent Technologies) was used to confirm the endothelial cell origin of target cells [158]. The cross-reactive mouse monoclonal anti-human myeloid lineage marker (MAC 387, Dako®, Agilent Technologies) was used to aid in the diagnosis of ambiguous tissue samples, notably differentiation of HSA and fibrous histiocytoma.

Following tissue deparaffinization and rehydration, antigen retrieval and blocking of non-specific binding sites were performed using the same protocol as for lectin-histochemistry. Antigen retrieval methods and antibody dilutions for each of the reagents are shown in Table 2- 2. The tissues with antibodies were incubated in a moist chamber overnight at 4o C.

Detection of bound antibodies were done using a two-step, peroxidase labelled polymer conjugated with secondary antibodies (EnVision+ System HRP, Dako®, Agilent Technologies) with visualization using AEC substrate and counterstaining with Meyer’s hematoxylin.

To establish optimal dilutions for lectins and primary antibodies, various tissue antigen retrieval (AR) techniques included no-AR, HIER with Tris/EDTA buffer, pH9, citrate buffer, pH6 (Target Retrieval Solution, S2367, S1699, Dako®, Agilent Technologies) and proteinase K treatment (Proteinase K Ready-to-use; Dako®, Agilent Technologies) were trialled on two randomly selected HSA tissue samples. The optimized antigen retrieval techniques and dilutions for each antibody are shown in Table 2- 2. A schematic of the principles of the IHC protocol is shown in Figure 2- 3.

48

Figure 2- 3 Immunohistochemistry. Binding of primary antibody to epitope of the tissue protein. Peroxidase labelled polymer with conjugated secondary antibody was applied to facilitate visualization.

2.2.4 Positive and negative controls

A typical HSA case was chosen as batch control for lectin/immunohistochemistry. Two microscopic slides with 5 µm consecutive tissue sections were included in every experimental batch, one being a positive control and the other for negative control. This was to ensure constant signal intensity of the lectin/immuno-histochemical labelling and to facilitate relative signal intensity scoring. The internal positive controls were normal vascular endothelial cells in the samples, which showed lectin/immunohistochemical reaction at various intensities. For the external negative controls, at the primary antibody/lectin incubation step, antibody diluent was applied instead of the antibody or lectin.

2.2.5 Semi-quantitative lectin/immunohistochemistry scoring

To achieve the third aim of the study, which was to quantify the candidate glycoprotein expressions in HSA tissues compared to non-HSA, ‘Tissue’ was assessed by components that expressed the glycoproteins, i.e., (1) HSA cells (2) arterial endothelial and (3) venous endothelial cells. The scoring system used for the assessment of lectin/immunohistochemical reaction in this study was modified from the H-score system, which is widely used for evaluating immunohistochemical reaction signal intensity expressed by tissues/cells, such as for the evaluation of steroid receptors in breast cancer [159]. The scores were given to each component separately.

The data collected was based on the assessment of two parameters for tissue glycoprotein expression, the intensity level of the lectin/immunohistochemical reaction (mild, moderate, high or

49

1-3) and the proportion of cells expressing the glycoprotein as a percentage. When combining those two parameters, the H-scores ranged between 0 – 300, which then were classified into four overall intensities from no signal detected to high signal intensity (Table 2- 3).

50

Table 2- 3 H-score calculation and sample microscopic images of different levels of visualised glycoprotein on the tissues. The H-scores were calculated from the combination of lectin/immunohistochemical reaction signal intensity levels (0-3) combined to the proportion of the glycoprotein expressed cells (%).

H-score (0-300) = [1*(% cells 1+)] + [2*(% cells 2+)] + [3*(% cells 3+)]

H-score Overall intensity classification Microscopic image

0 0 (no signal detected)

1-100 1 (mild)

101-200 2 (moderate)

201-300 3 (high)

51

2.2.6 Statistical analysis

The overall intensities of the lectin-immunohistochemical signal of each lectin/antibody and tissue components of the samples were assessed. The comparisons were performed at three levels. The first level, microscopic components, was to compare lectin-immunohistochemical signal intensities expressed by each antibody/lectin across tissue components, i.e., HSA cells, arterial endothelium and venous endothelium. Wilcoxon-signed rank test was applied to test for the difference between each pair of tissue components. Bonferroni correction was performed to adjust p-values for multiple comparison. The second level was to compare the signal intensities across two groups of sample population, dogs with HSA and dogs without HSA. Kruskal-Wallis test was applied for this comparison. The graphical illustrations and statistical analysis were created and performed using R (3.5.1 – 3.6.1) and R package, ggplot2 [160, 161].

The third level was to assess the performance of those lectins/antibodies used in the study, by prioritizing the reagents that had the ability to distinguish between HSA tissues and non-HSA tissues. In this analysis, the LHC/IHC H-scores of ‘HSA’ tissue samples (n = 32) were compared with tissue samples that had histologic morphology resembling HSA, but not actually HSA, i.e., ‘HSA-like’ (n = 6). HSA-like samples included four splenic hematomas and two cases of haemorrhage. Data was visualized by matrix scatter plots and analysed by Recursive Partitioning using R package, Rpart [162].

In addition, the trends of relationship between lectin/immunohistochemical signal intensity by H- scores and HSA histopathological grades or growth pattern were also assessed. For the relationship between H-scores and HSA histopathological grades, Jonckheere-Terpstra trend test and testing of linear model fitting was perform using clinfun, R package [163]. Kruskal-Wallis test was applied for assessing the difference of H-scores among HSA growth patterns.

52

Table 2- 4 Features of data collected from semi-quantitative lectin/immunohistochemistry assay. Level of cell/tissue glycan/glycoprotein expression was assessed by the given H-score, ranges from 0 to 300. The samples were comprised of two disease condition groups, HSA and non- HSA (Column I). The grades and growth patterns subgroups were assigned to each of the HSA samples (Column II and III). Because the glycan/glycoprotein was expressed on multiple cell types, and not specific to the cancer, H-scores were assessed by cell types separately. In the HSA group, there were three cell types: HSA cancer cells, arterial endothelium and venous endothelium; whereas in the non-HSA group, only two cell types: arterial endothelium and venous endothelium were present (Column IV). The data values were assessed as H-scores (Column V). I II III IV V Samples Subgroups Subgroups Cell types/ Values Grades Growth patterns Tissue component Group 1 Grade 1 Typical A. Cancer cell H-scores (0-300) HSA Grade 2 Solid B. Arterial Grade 3 Cavernous endothelium Mixed C. Venous endothelium Group 2 - - B. Arterial non-HSA endothelium C. Venous endothelium

53

2.3 Results

In general, the tissue components that had positive lectin/immunohistochemistry (LHC/IHC) reaction were HSA cancer tissue, arterial endothelium and venous endothelium. Background or non-specific lectin/immune labelling on mononuclear cells and necrotic debris was observed in some samples. Even though the tissue trimming process aimed to collect tissue samples where the normal-cancer tissue margin was presented, not all tissue sections contained every component. Poor quality tissue samples, e.g. very necrotic, containing more than 50% area of cell debris or blood clots, were excluded from the analysis. The features of the LHC/IHC reaction or signal detected on the tissue were different among tissue components, lectins and antibodies. The results are described individually.

2.3.1 Lectin/immunohistochemistry protocol optimization

Optimization protocols for LHC/IHC were performed using seven lectins/antibodies (PSA-Pisum sativum, DSA-Datura stramonium, WGA-Wheat Germ Agglutinin and SNA-Sambucus nigra lectins, anti-human MGAM, anti-human VTN and anti-human complement C7 antibodies). Regardless of antigen retrieval technique applied, no specific signal was obtained for the PSA lectin and the anti-human MGAM antibody, and these two reagents were therefore excluded from the study. Average (mean) of assessed H-scores and Coefficient of Variations (CV, %) for each lectin/antibody were calculated. HSA cells which were immuno-labelled with anti-human complement C7 antibody had the least variations of the IHC signal intensity among experimental batches (9.7 % CV). For arterial and venous endothelial cells, anti- human CD31 antibody showed the least variations (8.4 % and 4.4 % CV for arterial endothelium and venous endothelium, respectively). Table 2- 5 summarizes the number of experimental repeats, average H-scores and the percentage CV for each lectin/antibody.

54

Table 2- 5 Average H-scores for lectin/immunohistochemical signal intensities of each component of the control samples, labelled by each lectin/antibody (DSA-Datura stramonium, WGA-Wheat Germ Agglutinin and SNA-Sambucus nigra lectins and anti-human CD31, anti-human VTN and anti-human complement C7 antibodies). The control tissue slides were included in every batch of the experiments. The Coefficient of Variations (CV, %) were calculated to assess the variations among the repetitive runs. Antibodies/Lectins CD31 DSA WGA SNA VTN C7 Repeats 5 9 11 9 8 8 HSA Mean H-score na* 188.3 219.1 115.5 40.3 253 CV (%) (28.1) (22.7) (81.3) (105.6) (9.7) Positive HSA cells na* 83.3 % 74.6 % 59.4 % 34 % 86.9 % Arterial endothelium Mean H-score 280.8 4.4 245 81.1 21.4 92.5 CV (%) (8.4) (95.4) (11.2) (49.9) (135.2) (53.8) Positive endothelial cells 93.6 % 4.4 % 86.1 % 54.4 % 21.4 % 63.8 % Venous endothelium Mean H-score 275.4 247.2 29.4 87.2 39.5 154.4 CV (%) (4.4) (19.9) (138.6) (77.2) (130.7) (48.5) Positive endothelial cells 91.8 % 83.9 % 18.3 % 45 % 20.8 % 66.9 % na = For CD31 antibody, baboon tissue (courtesy of Dr. Penny Farrell, Univ. of Sydney, with approval from the SSWAHS Animal Welfare Committee) was used for positive control to optimize immunohistochemistry protocol on dog tissues.

2.3.2 Microscopic images of lectin/immunohistochemistry

Selected microscopic images of lectin/immunohistochemistry were captured from microscopic slide scanned images. Anti-CD31 antibody was used to confirm endothelial cell origin of the HSA tissue samples. CD31 immunohistochemical labelling on vascular endothelium appeared bright red when visualized with AEC (Figure 2- 4, A and B). On dog HSA tissues, CD31 mostly labelled arterial endothelium rather than venous endothelium as shown by the overall H-score in Table 2- 6 and Table 2- 7. In contrast, on baboon tissue, the CD31 immunohistochemical signal intensities for arterial and venous endothelium were very similar (average H-score = 280.8 for arterial endothelium and 275.4 for venous endothelium; Table 2- 5).

55

A

B

Figure 2- 4 Microphotographs generated from a scanned slide of CD31 immunohistochemical labelling of endothelial cells of baboon intestinal tissue. The expression of CD31 in the cytoplasm of the endothelial cells was visualized by AEC (bright red, arrow). The counterstain used was H&E (blue). The consecutive tissues were included in every batch of CD31 immunohistochemistry as an external positive control. Scale bar in A = 200 µm; scale bar in B = 50 µm.

DSA-bound glycoproteins were detected in the cytoplasm of the cells with varying histochemical signal intensities. Among three target tissue components, DSA histochemical signal was observed in the cytoplasm of venous endothelium at the highest intensity where heterogeneous signal intensities were observed in HSA cells (Figure 2- 5). Non-specific labelling was observed on some mononuclear cells, red blood cells and cellular debris.

56

Figure 2- 5 Microscopic photo of DSA histochemical labelling on dog spleen tissue section The histochemical signal intensity score of 3 was assigned to the endothelium of the venules in the venous sinus (arrow). The intensity score of 1 was assigned to the endothelium of splenic artery (arrowhead). Lectin-immunohistochemical reaction was visualized by AEC (bright red), H&E was used for counterstaining (blue), while red blood cell appears brown. (sample 11_008151, 20x objective magnification).

WGA histochemistry also labelled the three tissue components. However, the overall H-scores observed for venous endothelium and arterial endothelium were the opposite of what was seen for DSA. No or less histochemical signal intensity was observed on the venous endothelium, while most of the histochemical signal was observed in the arterial endothelium (Table 2- 6 and Table 2- 7). Non-specific labelling was observed on some of the mononuclear cells in the spleen. Background staining was common, especially on cellular debris.

Unlike, WGA and DSA, the lectin histochemical signals were very low to undetectable in SNA lectin histochemistry. Low histochemical signal intensity was observed in the arterial endothelium while almost none was observed in the venous endothelium. The HSA cells also expressed low SNA histochemical signal, while most of the positive reaction was observed extracellularly, i.e. in between chambers of the slit-like HSA tissues or in vascular lumens. The immunohistochemical labelling pattern of VTN was similar to those of the SNA, where most of the immunohistochemical signal intensity was described as low and appeared extracellularly rather than intracellularly.

57

For complement C7 immunohistochemistry, the location of the IHC signal was observed in the cytoplasm of HSA and appeared granular (Figure 2- 6, C7 and Figure 2- 7, C7). The signal intensity was quite uniform across the sample. Additional labelling was observed in mononuclear cells and macrophages in the spleen.

Figure 2- 6 to Figure 2- 9 show microscopic images of LHC/IHC reaction on the consecutive tissues of splenic and cardiac HSA. Three tissue components that were positive for LHC/IHC at various levels of signal intensity are shown in Figure 2- 10. LHC/IHC reaction on different types of HSA growth patterns; typical, solid and cavernous, labelled by the six lectins/antibodies are shown in Figure 2- 11.

58

CD31 DSA

WGA SNA

VTN C7

Figure 2- 6 Scanned microscopic photos of lectin-immunohistochemical labelling by different lectins/antibodies (DSA-Datura stramonium, WGA-Wheat Germ Agglutinin and SNA- Sambucus nigra lectins and anti-human CD31, anti-human VTN and anti-human complement C7 antibodies) on consecutive splenic HSA sections (sample 13_019400E). Cancer tissues (star) are separated from normal spleen parenchyma (x) by fibrous tissue (arrow). Lectin-immunohistochemical reaction to each lectin/antibody was visualized by AEC (bright red), H&E was used for counterstaining (blue), while red blood cell appears brown. A scale bar indicating 1mm is provided at the bottom right of the figures, except for WGA, where a 500 µm scale bar is provided.

59

CD31 DSA

WGA SNA

VTN C7

Figure 2- 7 Scanned microscopic photos of lectin-immunohistochemical labelling by DSA- Datura stramonium, WGA-Wheat Germ Agglutinin and SNA-Sambucus nigra lectins and anti- human CD31, anti-human VTN and anti-human complement C7 antibodies on consecutive splenic HSA sections (sample 13_019400E). Glycoprotein expression on HSA cells, detected by lectin-immunohistochemistry differs among lectins/antibodies. Lectin-immunohistochemical reaction to each lectin/antibody was visualized by AEC (bright red), H&E was used for counterstaining (blue) while red blood cell appears brown. The lectin-histochemical reaction was dectected in the extracellular matrix of the cancer cells for SNA-bound glycoproteins (arrow). Scale bars = 50 µm.

60

CD31 DSA

WGA SNA

VTN C7

Figure 2- 8 Scanned microscopic photos of lectin-immunohistochemical labelling DSA-Datura stramonium, WGA-Wheat Germ Agglutinin and SNA-Sambucus nigra lectins and anti-human CD31, anti-human VTN and anti-human complement C7 antibodies, respectively, on consecutive cardiac HSA sections (sample 13_023446B). Tissue expression of glycoproteins specific for each reactant was visualized by AEC (bright red), H&E was used for counterstaining (blue) while red blood cell appears brown (within circle). Cancer tissue (star) lies next to fibrous tissue (arrow). Scale bars = 100 µm.

61

CD31 DSA

WGA SNA

VTN C7

Figure 2- 9 Scanned microscopic photos of histochemical labelling with DSA-Datura stramonium, WGA-Wheat Germ Agglutinin and SNA-Sambucus nigra lectins and anti-human CD31, anti-human VTN and anti-human complement C7 antibodies, respectively, on consecutive cardiac HSA sections (sample 13_023446B). Tissue expression of glycoproteins specific to each reactant was visualized by AEC (bright red), H&E was used for counterstaining (blue). Cancer tissue (star) lies next to fibrous tissue (arrow). Scale bars = 50 µm.

62

Cancer Arterial endothelium Venous endothelium CD31

DSA

WGA

SNA

VTN

C7

Figure 2- 10 Microscopic images of lectin/immunohistochemical labelling using DSA-Datura stramonium, WGA-Wheat Germ Agglutinin and SNA-Sambucus nigra lectins and anti-human CD31, anti-human VTN and anti-human complement C7 antibodies. Lectin-immunohistochemical reaction to each lectin/antibody was visualized by AEC (bright red), H&E was used for counterstain (blue), while red blood cell appears brown. Three tissue types which expressed the glycoproteins were HSA (cancer), the arterial endothelium and venous endothelium. Lectin/immuno-labelling by the different reactants show varying levels and proportion of tissue glycoprotein expression. Areas where high lectin/immunohistochemical signal was present were selected to demonstrate the labelling patterns (sample 12_036750A, cancer; 11_008151, endothelium). The lectin- immunohistochemical labelling across all the samples was heterogeneous and H-scores were- 63

assigned for further analysis. Scale bars = 50 µm.

64

Typical slit-like Solid Cavernous CD31

DSA

WGA

SNA

VTN

C7

Figure 2- 11 Microscopic images of lectin/immunohistochemical labelling using DSA-Datura stramonium, WGA-Wheat Germ Agglutinin and SNA-Sambucus nigra lectins and anti-human CD31, anti-human VTN and anti-human complement C7 antibodies, respectively, on HSA with different growth patterns. Lectin-immunohistochemical reaction to each lectin/antibody was visualized by AEC (bright red), H&E was used for counterstaining (blue) while red blood cell appears brown. (sample 18/015280E; cavernous, 11/065749G; solid, 12/025492A; typical – WGA, SNA, VTN, C7, 12/030488F; typical - DSA, CD31). The lectin-immunohistochemical labelling across all the samples was heterogeneous and H-scores were assigned for further analysis. Scale bars = 100 µm.

65

2.3.3 Glycoprotein expression on the tissue components assessed by H-scores.

Various levels of lectin-immunohistochemical signal intensity as well as positive area percentages were observed across the samples and lectins/antibodies, hence the data were reported by H-scores. Table 2- 6 summarizes number of positive samples, percentage of positive cells, overall H-scores (Mean ± SD, Median ± MAD) and Inter Quartile Range (IQR) by each lectin/antibody. Three tissue components (HSA cells, arterial endothelium and venous endothelium) were assed in HSA samples, two tissue components were assessed in non-HSA samples as shown in Table 2- 7.

The dot charts in Figure 2- 12 show distribution of the H-score assessed in HSA cells, arterial endothelium and venous endothelium. The distribution of the data varies across tissue components and types of lectins/antibodies. Most of the data were not normally distributed, but some trend of either left or right-skewed distributions are apparent.

Tissue components comparison was performed using Kruskal-Wallis test for the differences between the components. Boxplots demonstrating the distribution of the H-scores across all the reactants can be viewed in Figure 2- 13 where the details for tissue components comparison are demonstrated in Figure 2- 14. Differences among tissue components were observed (p-value < 0.05) across the reactants, except for VTN (p-value = 0.6). Wilcoxon signed-rank test was performed for the pair-wise comparison of H-scores differences between two components at a time, p-values were labelled above the significant bar (Figure 2- 14).

For every lectin/antibody, the H-scores of three tissue components including HSA cells, arterial and venous endothelium were significantly different from each other. In WGA, SNA and CD31- LHC/IHC, the signal intensity was found to be higher in arterial endothelium when compared to venous endothelium. In the opposite way, DSA and C7 LHC/IHC signal intensity detected in arterial endothelium was lower when compared to venous endothelium.

66

Table 2- 6 Overall H-scores assessed from lectin/immunohistochemistry with DSA-Datura stramonium, WGA-Wheat Germ Agglutinin and SNA-Sambucus nigra lectins and anti-human CD31, anti-human VTN and anti-human complement C7 antibodies on HSA samples, three tissue components; HSA cell, arterial endothelium and venous endothelium. Tissue components Antibodies and Lectins HSA samples (n = 32) CD31 DSA WGA HSA Positive samples 25/29 32/32 29/31 Positive cells (%) 43.7 78.7 69 H-score (Mean ± SD) 123.7 ± 109.8 171.3 ± 76.7 157.9 ± 109.2 H-score (Median ± MAD) 180 ± 133.4 180 ± 89 150 ± 148.3 Inter Quartile Range 215 135 197.5 Arterial endothelium Positive samples 27/28 24/27 28/28 Positive cells (%) 52.68 30 73.6 H-score (Mean ± SD) 134.8 ± 87.5 46.9 ± 58.9 194.6 ± 78.5 H-score (Median ± MAD) 140 ± 126 26 ± 29.7 215 ± 70.4 Inter Quartile Range 167.5 60 110 Venous endothelium Positive samples 19/26 26/27 24/28 Positive cells (%) 13.7 65.7 15.1 H-score (Mean ± SD) 37.1 ± 33.4 172.9 ± 59.3 29.4 ± 13.3 H-score (Median ± MAD) 22.5 ± 41.3 190 ± 81.3 10 ± 60.4 Inter Quartile Range 52.5 100 19 Tissue components Antibodies and Lectins HSA samples (n = 32) SNA VTN C7 HSA Positive samples 24/30 25/30 26/28 Positive cells (%) 41 45.5 68.3 H-score (Mean ± SD) 78.8 ± 91.2 57.4 ± 60.4 175.8 ± 102.6 H-score (Median ± MAD) 20 ± 29.7 45.1 ± 62.5 220 ± 103.8 Inter Quartile Range 125.75 82.5 192.5 Arterial endothelium Positive samples 26/26 18/25 26/26 Positive cells (%) 53.3 38.7 65.9 H-score (Mean ± SD) 88.7 ± 68.9 38.7 ± 35.3 83.9 ± 45.6 H-score (Median ± MAD) 65 ± 66.7 30 ± 44.5 77.5 ± 24.1 Inter Quartile Range 106.25 75 26.9 Venous endothelium Positive samples 16/25 19/28 26/28 Positive cells (%) 26 37.2 62.7 H-score (Mean ± SD) 52 ± 14.8 65.7 ± 51.5 139 ± 118.6 H-score (Median ± MAD) 10 ± 60.4 75 ± 76.2 25 ± 90.5 Inter Quartile Range 60 60 150

67

Table 2- 7 Overall H-scores, assessed from lectin/immunohistochemistry using DSA-Datura stramonium, WGA-Wheat Germ Agglutinin and SNA-Sambucus nigra lectins and anti-human CD31, anti-human VTN and anti-human complement C7 antibodies, non-HSA samples, two tissue components. Tissue components Antibodies and Lectins non-HSA samples (n = 26) CD31 DSA WGA Arterial endothelium Positive samples 12/12 18/19 22/23 Positive cells (%) 63.8 33 72.4 H-score (Mean ± SD) 191.3 ± 52.01 38 ± 30.5 175.4 ± 75 H-score (Median ± MAD) 195 ± 44.5 25 ± 37.1 180 ± 66.7 Inter Quartile Range 52.5 52.5 95 Venous endothelium Positive samples 7/11 19/19 18/24 Positive cells (%) 7.4 67.6 21.3 H-score (Mean ± SD) 18.3 ± 35 187.1 ± 87.1 40 ± 66.5 H-score (Median ± MAD) 10 ± 14.8 230 ± 44.5 5 ± 7.4 Inter Quartile Range 15 85 36.8 Tissue components Antibodies and Lectins non-HSA samples (n = 26) SNA VTN C7 Arterial endothelium Positive samples 21/21 17/22 23/24 Positive cells (%) 58.3 38.4 50 H-score (Mean ± SD) 79.5 ± 43.2 43.9 ± 41.7 71.3 ± 58.0 H-score (Median ± MAD) 75 ± 22.2 35 ± 52 50 ± 44.5 Inter Quartile Range 20 52.5 52.5 Venous endothelium Positive samples 17/22 14/22 20/22 Positive cells (%) 23.9 38 50.2 H-score (Mean ± SD) 33 ± 39.2 82.5 ± 97.8 116.8 ± 98.6 H-score (Median ± MAD) 25 ±36.3 35 ± 52 85 ± 96.4 Inter Quartile Range 45.5 147.5 182.5

68

Figure 2- 12 Dot charts of H-score distribution, assessed from lectin-immunohistochemical labelling by lectins/antibodies (DSA-Datura stramonium, WGA-Wheat Germ Agglutinin and SNA-Sambucus nigra lectins and anti-human CD31, anti-human VTN and anti-human complement C7 antibodies), respectively. HSA = HSA cell, EA = Arterial Endothelium, EV = Venous Endothelium.

69

Figure 2- 13 Boxplot showing the distribution of lectin-immunohistochemical signal intensities, assessed by given H-scores, using six antibodies/lectins (CD31, DSA, WGA, SNA, VTN and C7). Three tissue components; HSA cells, arterial endothelium and venous endothelium within two disease condition groups; dogs with HSA and dogs without HSA were grouped as shown in the figure legend 'Group'.

70

Figure 2- 14 Boxplots of H-scores distribution, pairwise comparison of the H-scores was made between three tissue components: HSA/cancer cells, arterial endothelium and venous endothelium. H-scores were allocated to lectin-immunohistochemical labelled tissue samples using DSA-Datura stramonium, WGA- Wheat Germ Agglutinin and SNA-Sambucus nigra lectins and anti-human CD31, anti-human VTN and anti-human complement C7 antibodies, respectively. Significant bars show only significant p-values, derived from Wilcoxon test, adjusted by Bonferroni corrections. Note. AE = Arterial Endothelium, VE = Venous Endothelium.

71

2.3.4 Glycoprotein expression on normal vascular endothelium in spleen with HSA and in non-HSA tissues.

The results described previously showed that the arterial and venous endothelial cells expressed glycoproteins differently. Therefore, I next compared overall H-scores of the arterial and venous endothelium among two disease condition groups; tissue samples from dogs with HSA (HSA) and from dogs without HSA (non-HSA), to see whether there were differences in level of normal tissue glycoproteins detected between the two groups.

There was no statistically significant difference in the lectin/immunohistochemical reaction of arterial and venous endothelial cells between HSA and non-HSA tissues (Appendix 2). The results suggested that glycoproteins or glycans expressed by normal vascular endothelium of the spleen were not altered by the presence of HSA in the organ. This supports the use of adjacent vascular endothelium as an internal control when performing lectin/immunohistochemistry using the following reagents: anti-human CD31 antibody, anti-human VTN antibody, anti-human complement C7 antibody and DSA, WGA and SNA lectins.

2.3.5 HSA histopathologic grades and level of tissue glycoprotein expression.

The histopathological grades were applied to thirty-two HSA samples based on cancer differentiation, from well-differentiated (grade 1), moderately differentiated (grade 2) to poorly differentiated (grade 3), as summarised in Table 1- 5 (Chapter 1). The majority of the HSA samples (n = 14) were moderately differentiated, whereas fewer samples (n = 11) were poorly differentiated and the lowest number of samples (n = 7) were well-differentiated HSA (Figure 2- 1, left chart). The growth patterns of the HSA were also identified as; typical (n = 10), solid (n = 7), cavernous (n = 2) and mixed pattern (n = 13) (Figure 2- 1, right chart).

For CD31, which was considered the standard marker of endothelial cell origin, the HSA cells of the well-differentiated and poorly differentiated cancers tended to express lower CD31 than those of the moderately differentiated HSA. No statistically significant differences among histopathological grades were observed after testing by Jonckheere-Terpstra trend test, with the obtained p-values = 0.62. However, the dispersion of H-scores, demonstrated by the length of the boxplot in Figure 2 - 15 and the IQRs shown in Table 2 - 8 (grade 1 = 10, grade 2 = 85 and grade 3 = 192.5) increase along with the cancer grades.

Levels of DSA histochemical reaction on HSA cells and venous endothelial cells assessed by H- score were statistically significantly different among histopathological grades, with increasing

72

DSA-expression as the HSA became less differentiated (Figure 2 - 15). Jonckheere-Terpstra trend test returned p-values = 0.04 and the fitting of Linear model testing yielded a p-value = 0.026 for F- statistic.

For SNA lectin, there was a borderline trend of decreasing HSA cell expression of the SNA-binding glycoproteins with decreasing differentiation, as performed by Jonckheere-Terpstra trend test and fitting of Linear model testing, p-values = 0.07 and 0.06, respectively.

No significant differences in lectin/histochemical reaction among three HSA histopathological grades were observed for WGA, VTN and C7 lectin/histochemistry. The boxplots of lectin/immunohistochemistry H-score distribution for all antibodies/lectins are shown in Figure 2 - 15.

Figure 2 - 15 Boxplots of H-score distribution, assessed by lectin/immunohistochemistry for DSA-Datura stramonium, WGA-Wheat Germ Agglutinin and SNA-Sambucus nigra lectins and anti-human CD31, anti-human VTN and anti-human complement C7 antibodies, on thirty-two HSA FFPE tissue sections. The samples were divided into three groups by histopathological grades, three tissue components were assessed separately. Trends of correlation between level of glycoprotein expression and histopathological grade were observed in DSA and SNA histochemistry.

73

Table 2 - 8 Overall H-scores assessed from lectin/immunohistochemistry with DSA-Datura stramonium, WGA-Wheat Germ Agglutinin and SNA-Sambucus nigra lectins and anti-human CD31, anti-human VTN and anti-human complement C7 antibodies in three histopathological grades. Antibodies/Lectins CD31 DSA WGA

Grade 1 Mean ± SD 63 ± 115.95 132 ± 50.57 215 ± 115.57 Median ± MAD 15 ± 7.41 100 ± 14.83 270 ± 192.74 IQR 10 55 187.5 Grade 2 Mean ± SD 166.2 ± 97.83 175.4 ± 83.65 160.4 ± 114.13 Median ± MAD 210 ± 44.48 177.5 ± 85.25 170 ± 151.97 IQR 85 97.5 220 Grade 3 Mean ± SD 101 ± 110.15 200.6 ± 67.6 148.60 ± 108.89 Median ± MAD 10 ± 14.83 180 ± 59.30 120 ± 151.97 IQR 192.5 90 116.25

Antibodies/Lectins SNA VTN C7

Grade 1 Mean ± SD 152 ± 126.72 87.2 ± 91.91 93.75 ± 116.08 Median ± MAD 90 ± 107.49 70 ± 44.48 45 ±103.78 IQR 190 64.5 130 Geade 2 Mean ± SD 86.54 ± 89.2 67.94 ±54.15 200.7 ± 100.25 Median ± MAD 40 ± 59.30 75 ± 66.53 245.0 ± 37.07 IQR 150 76.25 92.5 Grade 3 Mean ± SD 43 ± 60.07 38.18 ± 40.64 180.9 ± 93.22 Median ± MAD 20 ±29.7 30 ± 44.48 150 ± 133.43 IQR 70 82.5 125

2.3.6 HSA growth patterns and level of glycoprotein expression.

For growth patterns assessment, there was only a limited number of samples with cavernous growth pattern (shown in Figure 2- 1, right chart). Hence, I included only solid and typical types of HSA growth patterns for comparison. No statistical difference between the overall H-scores were detected. However, for the solid type of HSA there was a trend of lesser expression of CD31, complement C7, VTN, DSA, and SNA bound glycoproteins than typical has. In contrast, WGA bound glycoproteins were acting in the opposite way. Solid HSA was more likely to express higher WGA bound glycoproteins than typical HSA. Figure 2- 16 shows the comparison of overall H- score between two growth pattern types of HSA. 74

Figure 2- 16 Boxplots of H-score distribution, assessed for CD31, DSA, WGA, SNA, VTN and C7 lectin/immunohistochemical labelling signal intensity on thirty-two HSA FFPE tissue sections. No statistically difference in the signal intensity between solid or typical HSA growth pattern was found.

2.3.7 Potential HSA tissue markers assessed by recursive partitioning

The final objective was to select top reactants among the six antibodies/lectins (as tissue markers) that could best distinguish HSA from HSA-like according to their differences in the level of lectin- immunohistochemical signal intensities. The expected patterns of signal intensities were either high in HSA/low in HSA-like or vice versa.

A matrix scatter plots of the sample H-scores distribution when paired with different possibilities was created (Figure 2-17, A). Recursive partitioning results suggested two of the markers tested would be useful in distinguishing between HSA and HSA-like tissues based on cut-off point values.

75

Complement C7 antibody and DSA lectin were the top two target-reagents based on this set of data. Two cut-off points, at the C7 H-score above 67.5 and DSA H-score above 145, were able to conclusively distinguish HSA tissue samples from HSA-like tissue as shown in Figure 2-17, B.

A

B

Figure 2- 17 Scatter plots of H-scores distribution, assessed from lectin-immunohistochemical labelling for complement C7 and VTN as well as DSA, WGA and SNA. A. Matrix scatter plots created for the candidate antibodies/lectins H-scores. The darker dots (dark brown) represent H-score assessed from the cancer component of HSA splenic tissue samples. While the lighter dots (light green) represent H-score assessed from the HSA-like lesion of non- HSA (hematoma and haemorrhage) splenic tissue samples. B. Scatter plot of H-scores distribution 76

assessed from lectin-immunohistochemical labelling with the anti-human complement C7 antibody and DSA lectin. The pair of reactants was selected from Recursive partitioning results. Two cut-off points complement C7 (H-score > 67.5) then DSA (H-score > 145) allow clear distinction between HSA and HSA-like tissues.

77

2.4 Discussion

There were three reasons for manual LHC/IHC signal intensity assessment or semi-quantitative method to be chosen over image analysis software or immunofluorescence technique. (1) The organ of primary interest, spleen, contains large quantities of red blood cells and background labelling “noise” was common. Immunofluorescence labelling was attempted but yielded no promising results. Eventually, bright red coloured 3-amino-9-ethylcarbazole (AEC) substrate chromogen was used for lectin/histochemical reaction visualization instead of the commonly used 3,3′- diaminobenzidine (DAB) which presents in brown colour and can mimic the colour of poorly non- specific proteins-blocked red blood cells and hemosiderin in splenic macrophages. (2) There was no specific antibody or lectin which only labelled HSA cells without labelling the adjacent normal endothelial cells encountered elsewhere in the tissue samples. The image analysis software would need to be able to distinguish these cells based on morphology alone, which is still a challenging task, although machine learning algorithms may eventually be possible (however, see Chapter 5). (3) HSA is a non-uniform cancer; it has various growth patterns and degrees of differentiation. Because of the nature of the samples, semi-quantitative LHC/IHC was considered the most suitable data collection method for this experimental design. It was acknowledged, that the tissue pre- processing (sample collection, preservation, paraffin embedding) might contribute to inconsistencies of LHC/IHC reactions. However, considering that glycoprotein expression on normal vascular endothelium was consistent between samples, it may be concluded that sample handling and processing were of negligible concern.

Small sample size and non-comprehensive sampling were other limitations when studying archival samples of naturally occurring diseases. The larger distribution of moderate and high-grade HSA in the sample population, pointed to the fact that the animals were diagnosed when the cancer had already advanced to a critical clinical stage. An attempt to minimize the effect of this limitation was to apply two dimensions of data observation, the modified H-scoring system. The samples are tissue sections which comprise up to > 50 million cells per a 5 mm x 10 mm tissue section, assuming the 5 µm-thick tissue section contain one cell layer. The size of the tissue sections included in this study varied between 5 mm x 10 mm to 20 x 30 mm, making up to at least 2,900 million cells. H-scoring system allowed me to gather information from the LHC/IHC results at a deeper level, not only binomial, but the level of signal intensity plus proportion of cells in each level. It was shown to be a robust protein expression measurement method which could be applied in automated quantitative immunohistochemistry (qIHC) for measuring human epidermal growth

78

factor receptor 2 (HER2) expression in human breast cancer [164]. Depending on the nature of samples, various IHC scoring systems may be assessed to determine the most suitable for the study [165]. For spleen tissues, H-scoring system appeared useful when dealing with LHC/IHC results where the signal intensity of reactants varies across the sample.

In addition to the HSA cancer cells, arterial and venous endothelial cells also showed heterogeneity of glycoprotein expression. Distinctive patterns of glycoprotein expression on different types of vascular endothelium (artery vs. vein) were demonstrated. Dog spleen arterial endothelium expressed higher CD31 and WGA-gps than venous endothelium, while higher expression of VTN, complement C7 and DSA-gps was detected in the venous endothelium. The arterial and venous endothelial cells expressed proteins differently as might be expected. Known molecular identification markers for arterial and venous endothelium were also established; VEGF-R2, Notch and Ephrin-B2 for arteries, BRG1, COUP-TFII and Eph-B4 for veins, reviewed by dela Paz and D'Amore (2003) and Wolf et al. (2019) [166, 167]. Based on this finding, caution should be exercised when assessing the LHC/IHC signal intensity of endothelial cell markers on dog endothelial cells; i.e., the type of endothelial cell should be clearly identified.

The heterogeneity of LHC/IHC reaction on HSA cells was due to the heterogenous nature of the cancer, where cell functions may have been altered and protein expression in the cells likewise changed. In contrast, there was consistency in the LHC/IHC signal intensity on arterial and venous endothelia, respectively, for all the antibodies and lectins used regardless of whether the spleen contained HSA or not. I demonstrated that normal vascular endothelium can be an internal control, when performing lectin-immunohistochemical labelling with anti-human CD31, VTN, complement C7 antibodies and DSA, WGA and SNA lectins on dog spleen tissue. Whether this will apply to other endothelial molecules remains to be ascertained but seems likely.

When glycoprotein expression was compared between normal vascular endothelial cells of the HSA and non-HSA groups, the results showed no statistically difference. This suggests that differences in the levels of serum glycoprotein biomarkers [122] were unlikely to be influenced by normal vascular endothelium, but rather that HSA cells or possibly liver if the cancer has effect on liver functions [168], were the source of such changes. Still, roles of normal vascular endothelia in HSA cannot be completely ruled out based on this study alone, as there were two limitations of the study: one is with the overall number of samples and the other was in regard to the non-HSA group. The non-HSA group included various splenic lesions, only 3 out of 26 tissue samples were normal spleen. It could be possible that other disease conditions could affect the level of glycoprotein

79

expression of the vascular endothelium as well. To validate this, comparison of vascular endothelial cell glycoprotein expression between three groups; HSA, non-HSA and healthy control in a larger cohort is suggested.

The trend in decreasing HSA cell expression of SNA bound glycans or glycoproteins as the cancer becomes more poorly differentiated, which is the opposite of what was seen for DSA bound glycans or glycoproteins, suggests SNA bound glycans may be involved in HSA cancer biological process as well. However, the evidence from this study is not as strong as for the DSA bound glycans. Previously, glycosylation changes in HSA cancer cell was demonstrated by lectin immunohistochemistry using a panel of lectins, but DSA and SNA were not included [169]. The question remains what the underlying reason for the altered lectin-histochemical signal intensities, i.e., whether it is changes in the proteins or the glycan parts of the glycoproteins or both. To determine this will require that the DSA and SNA bound glycoproteins are identified. Some of the possibilities of biological events leading to changes in glycoprotein expression in HSA are (1) impaired function of the cell protein synthesis process (impaired transcription, ); (2) protein can be synthesized but cannot be conjugate with the glycans (post-translational modification problems) or (3) glycan changes (increase or decrease) by unknown mechanism [170]. The recommendation would be to research further by focussing on glycan studies.

Complement C7 is overexpressed in HSA cells compared to immunohistochemical signal presented on hematoma tissues. This finding suggested complement C7 may be involved in HSA cancer biology. C7 is a glycoprotein that forms a membrane attack complex (MAC) together with complement components C5b, C6, C8 and C9 (O'Leary et al, 2016). The C7 gene is conserved in human and some other species including dog. Their DNA and amino acid sequences were 84.4% and 81.4% identical, respectively [171]. Apart from the liver, complement C7 is secreted by endothelial cells [172]. Proposed roles of complement C7 in ovarian and non-small cell lung cancer are inflammatory process regulation and possibly tumour suppression [173, 174]. In hepatocellular carcinoma, complement C7 was highly expressed in the cancer cells where the role was demonstrated to maintain tumour initiating cell stemness [175].

In summary, the locations of candidate HSA biomarkers and lectin-binding glycan/glycoprotein expression profile in spleen were demonstrated. Levels of the glycoprotein expression were assessed using a semi-quantitative lectin/immunohistochemistry approach. Moreover, complement C7 and DSA may supplement the use of CD31 and von Willebrand factor in immunohistochemistry to confirm vascular endothelial cell origin in HSA suspected tumours. However, since most of the

80

antibodies used were rabbit polyclonal antibodies against human antigens, antibodies validation steps confirming cross-reactivity of the antibodies are still needed to be performed if IHC protocols for those antibodies were to be developed in laboratory diagnostic test [176]. And the practicality should be evaluated in a larger sample size especially with more controls. The results suggest that complement C7 may play a potential role in HSA cancer biology. Identifying glycoproteins which bind the lectin DSA and SNA may also be the key to a better understanding of HSA pathobiology and should be pursued.

81

Chapter 3 Identifying visceral haemangiosarcoma related germline variants in dogs, using whole exome sequencing

3.1 Introduction

Hemangiosarcoma (HSA) is a cancer which arises from blood vessels and is the most frequent malignant neoplasia of the canine spleen. It is more commonly seen in some popular dog breeds, including Golden Retrievers, German shepherd and Labrador Retriever [10, 11, 15, 17, 18, 88, 89]. Dogs with HSA carry a poor prognosis despite treatment, the diagnosis usually leads to immediate euthanasia, chosen by owners and veterinarians.

From the accumulation of evidences such as over representing of HSA in particular dog breeds, different of genomic profile among breeds and breed clustered gene expression profile, it is suggested that genomic background plays an important role in this cancer [10, 11, 15, 17, 18, 83, 86-89]. It was hypothesised that inherited germline mutations that contributed to the higher risk of developing HSA had been segregated in some dog breeds.

A study to identify canine HSA germline risk mutations was conducted by [90]. In a genome wide association study in 142 Golden Retrievers with HSA, fine mapping was performed by whole genome sequencing in 9 HSA dogs. The study identified germline risk loci located on chromosome 5 at 29Mb and 32Mb base pair positions. No protein coding genes were identified, however, the contributions to higher HSA risk were suspected to be germline variants on non-coding regions of regulatory genes in T-cell mediated immune response in the tumour. Since the study was conducted using a single dog breed within one geographical region, further exploration in other dog breeds of different geographical regions may help construct the picture of HSA risk genotypic landscape of the dogs worldwide.

The aim of this chapter was to highlight potential germline single nucleotide variants based on the combination of (1) highly associated with canine visceral HSA but not related to breeds or geographical region and (2) HSA-related in terms of biological function significance. Two variant filtering approaches were explored, one filtered with stringent statistically significant cut-off for exome variant association scan and the other approach with the predicted functional significance selection.

82

3.2 Materials and methods

3.2.1 Samples

There were three sources of data available for the present analysis: (i) whole exome sequencing (WES) of EDTA blood samples from local and US HSA case animals; (ii) previously published WES datasets for HSA cases and controls from other laboratories and, (iii) a large set of control whole genome sequences covering multiple dog breeds.

For the first source of data, I performed genomic DNA extraction from 11 EDTA blood samples of HSA-diagnosed dogs and submitted for whole exome sequencing. Seven EDTA blood samples were collected from Brisbane Veterinary Specialist Centre and Veterinary Medical Centre, School of Veterinary Science, the University of Queensland between 2010 and 2017. All samples were collected under the Animal Ethics Approval (AEC) number SVS/438/15/KIBBLE/CRF with the animal owner's consent. Four EDTA blood samples were purchased from the Medical Oncology Research and Tissue Archiving, Veterinary Teaching Hospital, Colorado State University, Fort Collins, Colorado, USA. Profile of the animals are shown in Table 3- 1.

The second source, additional whole exome sequence data of HSA dogs (n = 17) and non- HSA (n = 28) were downloaded from the European Nucleotide Archives (ENA) after searching through relevant literature containing exome sequence data derived from HSA and non-HSA dogs using Illumina sequencing platforms [84, 177]. Summarised sample data sources for 45 dogs are shown in Appendix 3. The final number of exome sequence data sets were 28 dogs diagnosed with HSA and 28 dogs without HSA for comparison.

The external control dataset which was the VCF file containing genomic variations identified in normal dogs of several breeds (n=219) was downloaded from EMBL-EBI’s European Variation Archive (EVA) through project ‘High quality variant calls from multiple dog genome project - Run1’, project accession number PRJEB24066 where Illumina sequencing platform and GATK tools were used for whole genome sequence variants identification; Data retrieved November, 2018.

83

Table 3- 1 Genomic DNA extracted from EDTA blood of eleven dogs diagnosed with HSA were submitted for whole exome sequencing. Breed Sample ID Age Sex Source Golden Retriever 1810B 8 FS a 1422B 10 FS 1340B 11 FS 1625B 12 MC 12/030488/ASAB 7 na b 13/021750/JESB 14 FS 14/059575/ZACB 11 M German Shepherd 13/006850/MAXB 7 MS 13/042014/KUMB 13 MS Australian Cattle Dog 12/070341/WATB na na Bullmastiff 17/026978/MARB na F c a = Medical Oncology Research and Tissue Archiving, Veterinary Teaching Hospital, Colorado State University, Fort Collins, Colorado, USA. b = Brisbane Veterinary Specialist Centre. c = Veterinary Medical Centre, School of Veterinary Science, the University of Queensland. na = data not available, F = Female, FS = Female Sterile, M = Male, MC = Male Castrated.

84

3.2.2 DNA extraction and pre-quality control assessment

Genomic DNA was extracted from EDTA blood samples using the Isolate® II Genomic DNA kit (Bioline, UK) following the manufacturer’s protocol. Adequate amount, purity and integrity of the genomic DNA are the main considerations in DNA quality control prior to further steps in library preparation and whole exome sequencing. A minimum of fifty microlitres of 50 ng/µl genomic DNA dilutions with greater than 1.8 A 260/280 absorbance ratio were accepted by the genomic sequencing facility. An electrophoresis gel image of non- degraded DNA materials was also required.

The extracted genomic DNAs were measured using spectrophotometer (Nanodrop® Lite, Thermo Fisher Scientific, Inc.) and fluorometer (Qubit® 3.0, Thermo Fisher Scientific, Inc.). Up to 5 µl or at least 500 ng of DNA per sample was loaded and run on a 0.8% agarose gel containing 0.002% of DNA gel stain (SYBR® Safe, Invitrogen; Thermo Fisher Scientific Inc., USA) in sodium boric acid-based conductive medium (SB™; Faster Better Media LLC, USA) for DNA electrophoresis. The gel image was visualized and captured using gel imaging system with imaging software (ChemiDoc™ XRS+ System and Image Lab 3.0 Software; Biorad Laboratories Pty. Ltd., Australia).

Some of the samples that did not pass the pre-quality control due to either insufficient amounts of DNA or due to severe DNA degradation (smearing on the agarose gel image) were replaced by backup DNA samples with optimal quality. A minimum of 50 µl at 50 ng/µl of DNA per sample were sent to the Ramaciotti Centre for Genomics, UNSW, for library preparation and whole exome sequencing.

The DNA quality was assessed on arrival at the sequencing facility by spectrophotometer (Xpose®), fluorometer (Qubit®) and genomic DNA integrity assay using automated electrophoresis tool (Agilent 2200 TapeStation®, Agilent Technologies, Inc.).

3.2.3 Whole Exome Sequencing

Libraries were prepared using SureSelectXT HS Target Enrichment System for Illumina Multiplex Sequencing (Agilent Technologies, Inc., USA) then sequenced by NextSeq500 ® sequencer. These processes were performed by the Ramaciotti Centre for Genomics, UNSW.

85

3.2.4 Data pre-processing, variants calling and data preparation for analysis

I performed all the data analysis using the high-performance computing cluster at Queensland Institute of Medical Research (QIMR Berghofer, Brisbane). Exome sequence data were pre- processed as recommended by Genome Analysis Toolkit best practices pipeline [178], where raw exome sequences were mapped by aligning them against the reference sequences. Dog reference genome sequence, genome assembly: CanFam3.1 (GCA_000002285.2) was downloaded from Ensembl (www.ensembl.org, data retrieved January, 2018). Exome sequence data post-processing was performed to remove duplicate sequences and to recalibrate the aligned sequences, after which the data were ready to be analysed.

Variants detection or variants calling was to compare the sample sequences with the reference sequence, with ‘variant’ defined as the position where the sequence of the samples and the reference differed. The tools applied for variants calling were GATK HaplotypeCaller and VarScan2 with default settings [178, 179]. The detailed workflow is illustrated in Figure 3- 1.

86

Figure 3- 1 Flowchart demonstrating the workflow from whole exome sequence raw data (FASQ files) pre-processing until variant discovery (calling). Data pre-processing included aligning whole exome sequences to the reference sequence, removing PCR duplications, indel re-alignment and base quality recalibration. Two variants caller tools were applied in this study. The first tool, GATK (version 4.0) was used for variant calling in individual samples, then combined into a single variant call format (vcf) file using the commands, HaplotypeCaller, CombineGVCFs and GenotypeGVCFs respectively. The second tool, VarScan2 (version 2.4.2) with default setting for germline variants calling was used for detecting variants in an individual sample. The samples were assigned into five groups according to HSA status (HSA – non-HSA), breed (Golden Retriever – non-Golden Retriever) and geographical regions (the United States – Australia), in case breeds and regions

87

contributed to the results. The variant call format (vcf) files were combined using GATK’s (version 3.8) CombineVariants command, generating the dataset which contains combined variants from all fifty-six samples in a single vcf file for allele frequencies calculation. However, in the initials step, samples were grouped into binary data, cases (HSA) and controls (non-HSA). 3.2.5 Candidate variants filtering and prioritization

I hypothesised that HSA-related variants were segregated in dogs with HSA. So that either significantly higher or lower allele frequencies of the HSA-related variants would be observed when comparing HSA to non-HSA sequences. However, due to the small sample size, stringent statistical cut-off may filter out informative results at the borderline cut-off value. Therefore, another approach, which is based on the consequence of non-synonymous variants, was also applied.

Two candidate variants filtering approaches, exome-wide variant association scan and functional annotation were performed. The HSA-related variants were filtered using the statistical significance cut off for the first approach and biological/functional significance plus statistical significance for the latter. The schematic workflow of the two approaches and expected results are demonstrated in Figure 3- 2.

88

Figure 3- 2 HSA-related germline variants filtering workflow: candidate germline variants filtering based on stringent statistical significance and biological function significance. For statistically significant HSA-related variants, exome variant association scan (diagram on the left), additional external control variants data was downloaded from available database archive (PRJEB24066). R package contained generalized linear mixed model (GLMM) association analysis, GENESIS: GENetic EStimation and Inference in Structured samples was used to analyse the associations between the variants and HSA (case) status compared to non-HSA (control). The variants that were significantly associated with HSA status were filtered out. Variants with read depth less than 30x were removed. The candidate variants were then annotated with Ensembl’s Variant Effect Predictor tools (VEP) and analysed for possible involved pathways using Gene Set Enrichment Analysis (GSEA). For analysing HSA-related variants by their functional significance (diagram on the right), functional annotation using Ensembl’s Variant Effect 89

Predictor tools (VEP) and Gene Set Enrichment Analysis (GSEA) were applied to identify variants and pathways involved in the germline of dogs with HSA. The allele frequencies of the filtered variants between dogs with HSA and dogs without HSA were compared using Sib Pair program. Then possible HSA-related germline variants were highlighted.

3.2.5.1 Exome-wide germline variants scan for HSA associated markers To diminish the effect of sample population stratification, for example breeds, ages, geographical regions that causes strong false positive associations, R package GENetic EStimation and Inference in Structured samples (GENESIS) was applied [180]. Two adjustments for population structures were carried out, Genetic Principal Components Analysis (PCA) to calculate the first four principal components which are included in the mixed model as fixed effects. The empirical kinship matrix for all animals in the study was calculated based on the sequence data and included in the mixed model as the random effects. Because the outcome is binary, GENESIS allowed us to fit in the logistic regression framework [180]. I tested the candidate variants one at a time.

As demonstrated in Figure 3- 2, exome variant association scan, external control dataset (n = 219) was included in the control group (project accession PRJEB24066, data retrieved November 2018. HSA-associated variants were selected based on significance difference of the allele frequencies observed in the HSA dogs (case, n = 28) compared to non-HSA dogs (control, n = 28 + 219).

The selected variants were filtered by removing variants with sequence depth less than 30x and possible artefacts (i.e. variants with 100% allele frequencies in both groups). Then the filtered variants were annotated using Ensembl’s Variant Effect Predictor (VEP, version 99) [181]. From the candidate variants, possible molecular signature pathways involved were investigated in Gene Set Enrichment Analysis (GSEA 4.0), utilizing Molecular Signature Data base (MSigDB 7.0) [182]. False Discovery Rate (FDR) q-values for the analysis equal or less than 0.05 were considered as significant. The top ten significant pathways, to which the genes harbouring variants belonged, were chosen and the candidate variants were listed.

90

3.2.5.2 Variant functional analysis approach The variants detected in both dogs with HSA and without HSA were identified in terms of their genomic annotations. Effects on the transcripts and their involvement in molecular pathways were predicted using Ensembl's Variant Effect Predictor tools (VEP 92.3) and then Gene Set Enrichment Analysis (GSEA 3.0) software, utilizing Molecular Signature Data base (MSigDB 6.2) [181, 182].

From VEP predicted results, the variants were filtered and selected following two prediction categories; 1) VEP predictions and 2) Sorting Intolerance From Tolerance (SIFT) predictions. VEP predicted the impacts of the variants on the transcripts, SIFT predicted the effects of the amino acid changes on the protein functions [181, 183]. Table 3- 2 shows VEP predicted impacts and sequence ontology.

Table 3- 2 Variant Effect Predictor (VEP). This approach predicts the consequence impacts of the variants, sequence ontology terms and description (extracted from the full list on Ensembl's VEP website, www.asia.ensembl.org). High and moderate impact variants were filtered. VEP Impact Sequence ontology term Sequence ontology description HIGH transcript_ablation A feature ablation whereby the deleted region includes a transcript feature. HIGH splice_acceptor_variant A splice variant that changes the 2 base region at the 3' end of an intron. HIGH splice_donor_variant A splice variant that changes the 2 base region at the 5' end of an intron. HIGH stop_gained A sequence variant whereby at least one base of a codon is changed, resulting in a premature stop codon, leading to a shortened transcript. HIGH frameshift_variant A sequence variant which causes a disruption of the translational reading frame, because the number of nucleotides inserted or deleted is not a multiple of three. HIGH stop_lost A sequence variant where at least one base of the terminator codon (stop) is changed, resulting in an elongated transcript. HIGH start_lost A codon variant that changes at least one base of the canonical start codon. HIGH transcript_amplification A feature amplification of a region containing a transcript. (Continue next page)

91

Table 3-2 (Continue)

VEP Impact Sequence ontology Sequence ontology description term MODERATE inframe_insertion An inframe non synonymous variant that inserts bases into in the coding sequence. MODERATE inframe_deletion An inframe non synonymous variant that deletes bases from the coding sequence. MODERATE missense_variant A sequence variant, that changes one or more bases, resulting in a different amino acid sequence but where the length is preserved. MODERATE protein_altering_variant A sequence_variant which is predicted to change the protein encoded in the coding sequence. LOW splice_region_variant A sequence variant in which a change has occurred within the region of the splice site, either within 1-3 bases of the or 3-8 bases of the intron. LOW incomplete_terminal_ A sequence variant where at least one base of codon_variant the final codon of an incompletely annotated transcript is changed. LOW start_retained_variant A sequence variant where at least one base in the start codon is changed, but the start remains. LOW stop_retained_variant A sequence variant where at least one base in the terminator codon is changed, but the terminator remains. LOW synonymous_variant A sequence variant where there is no resulting change to the encoded amino acid.

Four criteria were established for candidate variants selection, including variants that were detected at multiple loci of the same gene (criterion 1), variants that were predicted as damaging consequence by both predictor tools (criterion 2), variants that were predicted as damaging consequence by VEP (criterion 3) and by SIFT (criterion 4).

For each SNVs filtered in criteria 1 and 2, allele frequencies were inspected in the HSA population versus control population then the association p-values were calculated. Association p-value less than 0.05 was set as the cut-off for candidate variants selection. Carrier cases (HSA cases that carry the allele, either homozygous or heterozygous) of two geographical regions, the United States (n = 21) and Australia (n = 7) were reported. 92

The variants filtered in all four criteria were combined and Gene Set Enrichment Analysis to identify major molecular signature pathways that might be involved in HSA, according to the damaging variants in the genes, was performed. I also explored two different approaches (so called, approach A and B) of data input. The concept was to create different sets of both variants and variants-related pathways identified in dogs with HSA and the dogs without HSA, then remove the common variants/pathways, so only the variants/pathways unique to HSA were identified. The only difference between the two experimental approaches was whether variants or pathways set intersecting were performed. The workflow is illustrated in Figure 3- 3.

Figure 3- 3 Gene Set Enrichment Analysis (GSEA) data input approaches. Two approaches were tested. Approach A, the germline variants identified in HSA and control groups were intersected after which the shared variants were removed. Only genes containing variants unique to HSA were analysed in GSEA. In Approach B, germline variants/genes containing variants identified in both HSA and control groups were analysed by GSEA, then the HSA-related pathways related to HSA were extracted later. HSA-related pathways were proposed and the pathways that were identified by both approaches were prioritized before the pathways identified by either approach.

In the first approach (Approach A in Figure 3- 3), the candidate variants/gene symbols containing variants uniquely identified in the HSA cohort was analysed by GSEA to see what pathways were involved. In the other approach, (Approach B in Figure 3- 3), the candidate variants/gene symbols containing variants identified were analysed by GSEA at the 93

beginning, then the pathways identified in both groups were compared. Pathways unique to HSA were reported. Finally, the candidate HSA-related pathways and variants belonging to the pathways were identified giving the priority to pathways and variants presented in both approaches (Figure 3- 3)

94

3.3 Results

3.3.1 DNA quality control and whole exome sequencing

DNA samples extracted from EDTA stabilized blood of HSA dogs were pre-assessed using spectrophotometer (Nanodrop®), fluorometer (Qubit® 3.0) and 0.8% agarose gel electrophoresis. The DNA quantity, concentration and A260/280 absorbance ratio are recorded in Table 3- 3, and the agarose gel images are shown in Figure 3- 4. Quality controls were performed on arrival to genomic sequencing facility, electrophoresis gel images are shown in Figure 3- 5, A and B. Post-shearing DNAs electrophoresis gel image is shown in Figure 3- 6. Phred quality score above 30 was reported for all sequenced samples (mean 31.4725; 31.18 - 31.68) which suggested 99.9% accuracy of the base call by the sequencing platform.

Figure 3- 4 Electrophoresis gel image of genomic DNAs extracted from HSA dog blood. Eleven samples (Gel image A. inside the box, “MARB”, B. “Zac, Wait”, C. “1810B, 1422B,

95

1340B, 1625B, ASAB, JESB, MAXB, KUMB”) showed intense bands with few smears in the lanes compared to smears in the adjacent lanes. The position referenced by the ladder on the left suggested the majority of DNA sizes were larger than 1 kbp.

Table 3- 3 Genomic DNA quality control before sample submission and on arrival at genomic sequencing facility quality control. Quality control on arrival data was provided by Ramaciotti Centre for Genomics, 2017. Sample Pre-quality control Quality control on arrival

name Nano Qubit A260/280 Xpose1 Xpose2 Qubit A260/280 DIN 1810B 113.9 76.0 1.86 86.6 86.6 83.1 1.82 8.7 1422B 346.2 520.0 1.86 225.0 230.2 215.1 1.84 8.3 1340B 95.0 105.4 1.95 72.4 112.6 66.6 1.92 8.0 1625B 84.6 94.4 1.86 80.2 80.2 71.7 1.86 8.2 ASAB 136.7 316.0 1.87 164.1 164.1 159.3 1.85 8.9 JESB 132.1 131.0 1.85 136.4 136.4 131.0 1.86 9.3 ZACB* 114.8 64.5 1.78 97.3 na 92.7 1.85 9.4 MAXB 97.0 153.0 1.80 88.1 88.1 84.7 1.82 9.3 KUMB 360.7 632.0 1.84 202.8 202.8 183.9 1.85 9.1 WATB* 227.5 72.9 1.92 156.2 na 110.0 1.93 9.3 MARB* 157.2 66.2 1.87 143.1 na 91.6 1.87 8.1

Nano = dsDNA concentration (ng/µl) measured by Nanodrop® Qubit = dsDNA concentration (ng/µl) measured by Qubit® Xpose1 = dsDNA concentration (ng/µl) measured by Xpose® Xpose2 = nucleic acid concentration (ng/µl) measured by Xpose® DIN = DNA Integrity Number * = Re-submitted samples. na = data not available.

96

Figure 3- 5 Electrophoresis gel image of genomic DNAs extracted from; A. Eight samples (1810B, 1422B, 1340B, 1625B, ASAB, JESB, MAXB, KUMB inside the box) had DNA integrity number (DIN) over 7, which were acceptable for genomic DNA library preparation. In contrast, sample “ZACB” showed vast smear and low DIN number, 2.5. This sample was re-extracted then submitted again. B. Re-submitted test sample which did not pass the quality control in the first sample submission (ZACB, Figure 3-5, A) and two additional samples (MARB, WATB, inside the box). The DNA integrity number (DIN) of all samples were over 7 and the gel showed intense DNA bands without visible smears, suggesting most of the DNA fragments were intact. Images were generated by automated electrophoresis tool, Agilent 2200 TapeStation® with Agilent Genomic DNA ScreenTape® System. (Modified, image retrieved from Ramaciotti Centre for Genomics, 2017).

97

Figure 3- 6 DNA electrophoresis of samples after DNA shearing. The sizes of the DNA fragments of all samples, including DNAs extracted from eleven blood samples from dogs with HSA (MARB to KUMB, inside red box) were about 200 bps. Image generated by automated electrophoresis tool, Agilent 2200 TapeStation ® with High Sensitivity D1000 ScreenTape® System (Modified, image retrieved from Ramaciotti Centre for Genomics, 2017).

98

3.3.2 Exome variant association scan

Possible sample stratifications which may lead to false positive associations were removed first by four principal components (Figure 3- 7, A). The diagnostic quantile-quantile plot of association test statistics is shown in Figure 3- 7, B with the inflation factor (λ) of 0.95 where the observed p-values started to deviate from the normal distribution non-associated p-values at 10-2. The p-values of the associations between each variant on the chromosomes (1-38) and HSA status were plotted along the x-axis, creating a Manhattan plot (Figure 3- 7, C). The Genome-wide significant variants (p-value 5 < 10-8) were filtered. Possible Linkage Disequilibrium (LD) locations were observed in chromosomes.

A B

C

Figure 3- 7 GENetic EStimation and Inference in Structured samples (GENESIS) generalized mixed model analysis of association between HSA and all variants. A. First four principal components, including breeds were adjusted to remove population stratification. B. Diagnostic quantile-quantile plot of association test statistics. The inflation factor (λ) of the analysis is 0.95. C. Manhattan plot representing the variants p-values of

99

associations. The variants identified are plotted along the x-axis according to their genomic location, chromosome 1 – 38, with their p-values according to the y-axis. Variants with genome-wide significant (p-value < 5 x 10-8) were filtered.

One hundred and thirty-nine significantly HSA-associated SNVs (p-value < 5 x 10-8) were filtered, functionally annotated and their effects predicted. Four SNVs on chromosomes 5, 8, 26 and 29 were predicted by VEP to have high impact on consequent transcripts (Table 3- 4). SNP 5: 32330465 A/G (c.139T>C, p.*47Gln) resides on exon2 of the phospholipid scramblase 3 gene (PLSCR3, 5:32,327,475-32,331,401). The variant was predicted to code for glutamine instead of being a stop codon. The G allele frequency in the HSA dogs was 92.59%, while in non-HSA it was 6.7%. All 20 US dogs with HSA were found to be homozygous for the allele while about 70% (5/7) of Australian dogs with HSA were homozygous.

At SNP 8:22594311 G/T on the coding region of the Fanconi anemia complementation group M (FANCM) gene, two transcripts were predicted to code for a stop codon (exon 10/23, c.1642 G>T, p.Glu548* and exon 9/22, c.1564 G>T, p.Glu522*). Considering the allele frequencies, the majority of HSA cases do not carry "T" allele (10.71%) but "G" allele (89.29%), while the frequency of "T" and "G" alleles were almost equal in non-HSA group (46.43% and 53.57%, respectively) (Table 3-4). The predicted transcript of SNP 26:29775156 G/A (c.299 C>T, p.Arg74*) on exon 3/4 of the mitochondrial ribosomal protein L40 (MRPL40) gene also code for a stop codon. The "A" allele frequency in the HSA cases was 3.57% while the "G" allele frequency was 96.43% while both alleles in non-HSA group were 50% (Table 3-4).

SNP 29:28067178 A/G (c.229T>C, exon 2/9, p.*77Arg) on coding region of the zinc finger protein 704 (ZNF704) gene was predicted to code for arginine instead of a stop codon. In the majority of HSA cases (98.21%) the "A" alleles were present and homozygous, meaning the transcripts coded by normal cells in dogs at HSA risk may be shorter compared to another polymorphic (G) genotype. Meanwhile, the proportion of the allele frequencies was different in non-HSA group, where the frequency of A and G alleles were 55.36% and 44.64%, respectively (Table 3-4).

100

Table 3- 4 Significant HSA associated germline SNVs that were predicted as high impact variants by VEP. Allele frequencies in cases (dogs with HSA), controls (dogs without HSA), number of dogs carrying the allele (homozygous + heterozygous/total) among the HSA cases from the United States of America and Australia are shown. Gene AF AF US AUS Assoc VEP Chr:Position Case Control HSA HSA p-value Prediction (allele) (%) (%) (cases) (cases) PLSCR3 5:32330465 (A/G) 92.59 6.70 (20+0)/20 (5+0)/7 4.68 e-11 stop lost FANCM 8:22594311 (G/T) 10.71 46.43 (0+4)/21 (0+2)/7 2.63 e-08 stop gained (G) 89.2 53.57 (17+4)/21 (5+2)/7 MRPL40 26:29775156 (G/A) 3.57 50.00 (0+2)/21 (0+0)/7 4.00 e-11 stop gained (G) 96.43 50.00 (19+2)/21 (7+0)/7 ZNF704 29:28067178 (A/G) 1.79 55.36 (0+1)/21 (0+0)/7 1.99 e-12 stop lost (A) 98.21 44.64 (20+1)/28 (7+0)/7

In addition to functional annotation which have highlighted PLSCR3, FANCM, MRPL40 and ZNF704 as potential HSA related genes. Germline SNVs were selected according to their association significance and the segregation of the allele frequencies in cases compared to control population. The highlighted HSA related SNVs shown in Table 3- 5 included the variants with the genotypic alleles which were segregated in most of the cases. SNPs 5: 32330465 A/G and 5:32330470 G/A on coding region of gene PLSCR3 were associated with HSA. The first SNP resulted in elongation of exon 2 transcripts as mentioned previously and the other SNP led to missense transcripts (c.134 C>T, p.Ala45Val). The allele frequencies and genotypes showed that the polymorphic alleles were homozygous in 100% of US HSA cases and 70% of Australian HSA cases. The SNPs appeared to be on 3' regions (downstream) of ACADVL and TNK1genes. Another SNP, 5:77412492 G/A was cross- checked with the indels data (Appendix 4), two base pairs deletion on exon 1 of MARVEL domain containing 3 (MARVELD3) gene resulting in reading frameshift (c.393_394delAA, p.Arg131fs) was predicted. 101

SNPs on chromosome 6, 15, 21 and 23 were likely to be associated with HSA cases as alternate allele frequencies were segregated in most of the cases and more than 90% of HSA cases from US samples carried more than 90% homozygous alleles. In Australian HSA cases, more than 50% of samples carried homozygous and heterozygous alleles. Other SNPs were also highlighted in Table 3- 5. The full list of significant HSA-related SNVs is shown in Appendix 5 (part 1), Appendix 5 (part 2) and Appendix 5 (part 3).

Table 3- 5 Potential HSA associated germline single nucleotide variants. Chromosome, base pair positions, annotated genes, allele frequencies in cases-controls and number of dogs carrying the allele (homozygous + heterozygous/total) among the HSA cases from the United States of America (US) and Australia (Aus) are highlighted. Chr_position Ref/ Consequence AF (%) HSA HSA p value Gene Alt Case Ctrl US Aus 1_102123402 C/G missense 48.15 0 (1+19)/20 (0+7)/7 4.80E-13 ZNF865 _variants 2_77345896 T/C synonymous 42.59 1.79 (0+18)/21 (0+5)/7 2.72E-10 HSPG2 _variant 2_77345896 T/C intron 42.59 1.79 (0+18)/21 (0+5)/7 2.72E-10 HSPG2 _variant 2_77345897 A/G intron 40.74 1.79 (0+18)/20 (0+4)/7 5.17E-10 HSPG2 _variant 2_77345897 A/G missense 40.74 1.79 (0+18)/20 (0+4)/7 5.17E-10 HSPG2 _variant 5_77412492 G/A missense 83.3 2.35 (15+0)/19 (5+0)/5 1.74E-11 MARVELD3 _variant 5_32330465 A/G downstream_gene 92.59 6.7 (20+0)/20 (5+0)/7 4.68E-11 ACADVL _variant 5_32330465 A/G stop_lost 92.59 6.7 (20+0)/20 (5+0)/7 4.68E-11 PLSCR3 _variant 5_32330465 A/G downstream 92.59 6.7 (20+0)/20 (5+0)/7 4.68E-11 TNK1 _gene_variant 5_32330470 G/A downstream_gene 92.0 9.3 (19+0)/19 (4+0)/6 1.80E-08 ACADVL _variant

(Continue next page)

102

Table 3-5 (Continue)

Chr_position Ref/ Consequence AF (%) HSA HSA p value Gene Alt Case Ctrl US Aus 5_32330470 G/A downstream_gene 92.0 9.3 (19+0)/19 (4+0)/6 1.80E-08 ACADVL _variant 5_32330470 G/A missense 92.0 9.3 (19+0)/19 (4+0)/6 1.80E-08 PLSCR3 _variant 5_32330470 G/A downstream_gene 92.0 9.3 (19+0)/19 (4+0)/6 1.80E-08 TNK1 _variant 6_39891567 G/A downstream_gene 91.61 20.87 (16+0)/17 (6+0)/7 2.47E-08 FBXL16 _variant 6_39891567 G/A upstream_gene 91.61 20.87 (16+0)/17 (6+0)/7 2.47E-08 JMJD8 _variant 6_39891567 G/A upstream_gene 91.61 20.87 (16+0)/17 (6+0)/7 2.47E-08 JMJD8 _variant 6_39891567 G/A downstream_gene 91.61 20.87 (16+0)/17 (6+0)/7 2.47E-08 STUB1 _variant 6_39891567 G/A 3_prime_UTR 91.61 20.87 (16+0)/17 (6+0)/7 2.47E-08 WDR24 _variant 8_72320921 A/C intron 35.71 0 (0+14)/21 (0+6)/7 5.23E-08 AKT1 _variant 9_49748755 A/C missense 46.43 6.12 (0+20)/21 (0+6)/7 5.36E-09 ABO _variant 10_26681571 G/C synonymous 50.0 11.54 (0+21)/21 (0+7)/21 1.07E-08 SOX10 _variant 10_26681601 G/C synonymous 50.0 11.37 (0+21)/21 (0+7)/7 1.49E-08 SOX10 _variant 10_17055863 C/G intron 50.0 0 (3+13)/17 (0+5)/7 2.33E-08 HDAC10 _variant 10_17055863 C/G synonymous 50.0 0 (3+13)/17 (0+5)/7 2.33E-08 PANX2 _variant 11_2055922 C/G missense 53.57 6.40 (4+16)/21 (0+6)/7 4.52E-09 HNRNPH1 _variant

(Continue next page) 103

Table 3-5 (Continue)

Chr_position Ref/ Consequence AF (%) HSA HSA p value Gene Alt Case Ctrl US Aus 11_2055922 C/G intron 53.57 6.40 (4+16)/21 (0+6)/7 4.52E-09 RUFY1 _variant 14_39516858 C/T missense 58.93 8.16 (6+14)/21 (0+7)/7 1.94E-09 CBX3 _variant 15_52226379 T/A intron 95.64 22.13 (19+2)/21 (6+1)/7 1.45E-08 FGB _variant 15_18558883 T/A intron 89.29 12.02 (21+0)/21 (4+0)/7 5.25E-08 KRR1 _variant 15_18558883 T/A missense 89.29 12.02 (21+0)/21 (4+0)/7 5.25E-08 SALL2 _variant 16_46742570 A/C missense 98.21 57.14 (0+1)/21 (0+0)/7 9.43E-09 RWDD4 _variant 20_39437199 T/A intron 56.52 0.004 (11+0)/17 (2+0)/6 2.25E-08 TRAIP _variant 21_26043319 T/A intron 88.89 10.66 (19+0)/20 (5+0)/7 6.17E-08 NUMA1 _variant 23_19762376 T/A intron 97.92 9.0 (18+0)/18 (5+1)/6 8.67E-09 NR1D2 _variant 24_18184314 A/C missense 46.43 4.55 (3+15)/21 (1+3)/7 3.14E-10 AVP _variant 24_42409111 A/C synonymous 37.04 4.17 (0+16)/20 (0+4)/7 5.03E-09 RBM38 _variant 25_28042882 T/A intron 85.19 6.9 (20+0)/21 (3+0)/6 1.10E-11 MSRA _variant 26_22615544 T/A 5_prime_UTR 85.71 5.61 (13+0)/14 (5+0)/7 4.70E-09 GASL1 _variant 27_24589659 A/T intron 94.44 10.86 (18+1)/20 (7+0)/7 3.74E-09 ETNK1 _variant 28_4228581 T/C intron 98.15 44.08 (19+1)/20 (7+0)/7 1.46E-08 BMS1 _variant

104

According to the Manhattan plot in Figure 3- 7 C, when lowering the association significant cut-off to p-value < 0.05, there were 8,808 HSA associated germline SNVs. VEP was applied for annotation and filtering possible functional significant variants, 132 variants were filtered. Variant effects of 240 transcripts were predicted. The summary of the consequences is shown in Figure3- 8. The annotated candidate variants were computed in Gene Set Enrichment Analysis (GSEA) to identify the overlapping gene set, so that it could link to possible molecular signatures. Among eight Molecular Signature Database (MSigDB) collections in the GSEA program no overlapping gene set was found for the Hallmark, Collection 3 (motif gene set), Collection 4 (computational gene set) and Collection 6 (Oncogenic signatures) when computed separately. When selecting all the MSigDB collections for computing together, twenty-five overlapped genes from three gene sets in Collection 2 (curated gene set) and Collection 7 (immunologic signatures) MSigDB were the top overlapped gene sets. Summary of the pathways and genes are shown in Table 3- 6.

Figure3- 8 Statistical summary of the predicted consequences of the HSA-related variants identified by variant association analysis. The figures were generated by Ensemble’s VEP (asia.ensembl.org) from the input of 132 variants. 105

Table 3- 6 Gene Set Enrichment Analysis (GSEA) highlighted top pathways where significant HSA associated germline variants overlapped, and genes belonging to the gene sets. Gene set MSigDB Gene set description standard name collection REACTOME_DISEASE C2 Disease. (p-value = 1.7 e-7) GSE22140_HEALTHY_ C7 Genes down regulated in CD4 VS_ARTHRITIC_MOUSE_ [GeneID=920] T cells under specific CD4_TCELL_DN pathogen free conditions: healthy versus arthritis (KRN model). (p-value = 2.83 e-6) GSE28737_BCL6_HET_ C7 Genes down regulated in marginal zone B VS_BCL6_KO_MARGINAL_ lymphocytes with knockout of BCL6 ZONE_BCELL_DN [GeneID=604]: heterozygous versus homozygous. (p-value = 2.91 e-6) Gene Gene Description HDAC5 histone deacetylase 5 ATP6V1H ATPase H+ transporting V1 subunit H XRCC6 X-ray repair cross complementing 6 HDAC10 histone deacetylase 10 PSMA7 proteasome subunit alpha 7 TSG101 tumor susceptibility 101 AKT1 AKT serine/threonine kinase 1 AVP arginine vasopressin FGB fibrinogen beta chain RPS8 ribosomal protein S8 AP2A2 adaptor related protein complex 2 subunit alpha 2 HSPG2 heparan sulfate proteoglycan 2 G6PC3 glucose-6-phosphatase catalytic subunit 3 GAS2L1 growth arrest specific 2 like 1 NUMA1 nuclear mitotic apparatus protein 1 GGA3 golgi associated, gamma adaptin ear containing, ARF binding protein 3 AHNAK AHNAK nucleoprotein RBM38 RNA binding motif protein 38 MIS18BP1 MIS18 binding protein 1

106

3.3.3 Variant functional analysis

From whole exome sequence data, 1,615,738 germline variants were detected by Haplotype Caller tool in the whole sample set (n = 56). A different tool, VarScan2 was used for calling the germline variants in subgroup individuals as shown in (Table 3- 7, VarScan2). Variant Effect Predictor tools predicted between 700,000s to 1,000,000s transcript effects following SNVs and between 60,000s to 80,000s effects from indels (Table 3- 7, VEP). After filtering the HSA-related variants that were likely to cause high impact consequence on the transcripts and deleterious effects on the proteins predicted by VEP and SIFT, respectively, thousands of variants remained (Table 3- 7, Filtered from VEP results).

Table 3- 7 Numbers of germline variants identified from variant calling to variant filtered.

Abbreviations: US_GR = American Golden Retrievers, US_Others = American non-Golden Retrievers, Aus_GR = Australian Golden Retrievers, Aus_Others = Australian non-Golden Retrievers, Crit 1 = possible damaging variants that were found on multiple loci of the same gene, Crit 2 = possible damaging variants that were predicted by both variant effect predictor tools (VEP and SIFT), Crit 3 = possible damaging variants that were predicted by VEP only, Crit 4 = possible damaging variants that were predicted by SIFT only.

107

Genes where germline SNVs were identified at multiple loci (criterion 1) were highlighted and annotated (Table 3- 8). They were found to be on chromosomes 1, 15, 16, 20, 3 and 30. On chromosome 1, the affected loci were within 200 bp from 1:112,075,070-11,2075,256. Genes in the region included TMEM145 (transmembrane protein 145), PRR19 (proline rich 19) and PAFAH1B3 (platelet activating factor acetylhydrolase 1B3). Possible damaging variants were predicted to cause missense transcriptions (c.247G>A, p.Val81Met; c.67C>T, p.Arg21Cys; c.61C>G, p.Arg19Gly) on exon 1 of the PRR19 gene. The polymorphic alleles were not detected in control population and in the case, only heterozygous carriers were presented. Most of Australian samples appeared as heterozygous carriers (6-7 dogs out of 7) while in the US samples, less than half of dogs carried the alleles.

Multiple missense variants were identified in LINS1 (Lines homolog 1) gene. This gene is 21 kilo base long and resides at base pair position 40,415,599-40,437,075 of chromosome 3. Two major regions of the gene burdened high numbers of variants, first region covered 7.5 kbp (3:40,429,662-40,432,228), the other region covered 10 bp (3:40,434,162-40,434,171). The allele frequencies of the variants within the 7.5 kbp region were segregated almost evenly between the cases and controls. While at 10 bp region, 3:40,434,162 A/T (exon 4/5, c.1318A>T, p.Asn414Tyr; exon 6/7, c.1985A>T, p.Asn444Tyr) and 3:40,434,171 G/C (exon 4/5, c.1327C>C, p.Glu417Gln; exon 6/7, c.1994G>C, p.Glu447Gln), allele "T" at 3:40,434,162 and allele "C" at 3:40,434,171 were lower in the cases, 3.57% and 3.70%, respectively. Most of HSA cases from both US and Australia were homozygous for reference alleles.

On chromosome 20, multiple missense variants of genes MITF (melanocyte inducing transcription factor) and AMT (amino methyltransferase) were detected. For the variants 20: 21,855,582_A/T and 20: 21,855,583_G/A on the of MITF gene, four possible transcripts are transcribed. The variants cause changes in the transcripts that lead to amino acid changes. Higher allele frequencies were presented in the cases, about half of the cases from the US and Australia were heterozygous. For variants on exons of the AMT gene, the allele frequencies in the cases were slightly higher, both US and Australia cases were all heterozygous. Other highlighted genes by this criterion were TXNDC12 (thioredoxin domain containing 12), PPP2CB (protein phosphatase 2 catalytic subunit beta), TCTA (T cell leukemia translocation altered) and CYP1A2 (cytochrome P450 family 1 subfamily A member 2) (Table 3- 8).

108

Table 3- 8 Highlighted HSA related germline variants that were detected at multiple loci on the same gene. For number of HSA cases in the US and Aus = (homozygous + heterozygous)/Total cases. Chr_pos_ref/alt Consequence AF (%) HSA HSA p value Gene Impact Case Control US Aus 1_112075070_C/T upstream_gene_variant 19.64 0 (0+4)/21 (0+7)/7 0.0018 TMEM145 1_112075070_C/T missense_variant 19.64 0 (0+4)/21 (0+7)/7 0.0018 (C) 80.36 100 (17+4)/21 (0+7)/7 PRR19 1_112075070_C/T intron_variant 19.64 0 (0+4)/21 (0+7)/7 0.0018 PAFAH1B3 1_112075250_G/A upstream_gene_variant 14.29 0 (0+2)/21 (0+6)/7 0.0190 TMEM145 1_112075250_G/A missense_variant 14.29 0 (0+2)/21 (0+6)/7 0.0190 (G) 85.71 100 (19+2)/21 (1+6)/7 PRR19 1_112075256_G/C upstream_gene_variant 12.5 0 (0+1)/21 (0+6)/7 0.0221 TMEM145 1_112075256_G/C intron_variant 12.5 0 (0+1)/21 (0+6)/7 0.0221 PAFAH1B3 1_112075256_G/C missense_variant 12.5 0 (0+1)/21 (0+6)/7 0.0221 (G) 87.50 100 (20+1)/21 (1+6)/7 PRR19 15_9291312_A/G missense_variant 1.85 19.67 (0+0)/21 (0+1)/7 0.0183 TXNDC12 16_33557718_T/C missense_variant 1.85 14.08 (0+0)/20 (0+1)/7 0.0387 PPP2CB 16_33557721_C/T missense_variant 1.85 14.2 (0+0)/20 (0+1)/7 0.0353 PPP2CB 20_21855582_A/T missense_variant 26.79 6.76 (0+11)/21 (0+4)/7 0.0104 MITF 20_21855583_G/A missense_variant 26.79 6.76 (0+11)/21 (0+4)/7 0.0104 MITF (Continue next page)

109

Table 3-8 (Continue)

Chr_pos_ref/alt Consequence AF (%) HSA HSA p value Gene Impact Case Control US Aus 20_39819613_G/T downstream_gene_variant 50.0 44.06 (0+21)/21 (0+7)/7 0.0053 TCTA 20_39819613_G/T missense_variant 50.0 44.06 (0+21)/21 (0+7)/7 0.0053 AMT 20_39819642_G/A downstream_gene_variant 50.0 42.21 (0+21)/21 (0+7)/7 0.0027 TCTA 20_39819642_G/A missense_variant 50.0 42.21 (0+21)/21 (0+7)/7 0.0027 AMT 20_39819690_C/T downstream_gene_variant 44.64 43.24 (0+18)/21 (0+7)/7 0.0135 TCTA 20_39819690_C/T missense_variant 44.64 43.24 (0+18)/21 (0+7)/7 0.0135 AMT 3_40429662_G/C missense_variant 44.64 44.69 (0+18)/21 (0+7)/7 0.0468 LINS1 3_40430885_C/T missense_variant 32.14 47.35 (0+18)/21 (0+0)/7 0.0046 LINS1 3_40430978_A/G missense_variant 26.79 14.61 (0+15)/21 (0+0)/7 0.0052 LINS1 3_40432228_T/G missense_variant 32.14 47.96 (0+18)/21 (0+0)/7 0.0054 LINS1 3_40434162_A/T missense_variant 3.57 30.20 (0+1)/21 (0+1)/7 0.0071 (A) 96.43 69.80 (20+1)/21 (6+1)/7

LINS1 3_40434171_G/C missense_variant 3.70 30.45 (0+1)/20 (0+1)/7 0.0104 (G) 96.3 69.55 (19+1)/20 (6+1)/7 LINS1 30_37820700_G/A missense_variant 32.14 41.53 (0+13)/21 (0+5)/7 0.0264 CYP1A2

110

The next criterion for selecting HSA related germline variant candidates was based on possible effects of the variants on the transcripts and/or proteins. Variants that showed significant difference between the allele frequencies in case and controls were on chromosome 28, 37 and 38. Shown in Table 3- 9, SNP 28:13675592 T/G, c.31A>C, p.Met1Leu resides on exon 1/3 of gene MRPL43 (mitochondrial ribosomal protein L43) was predicted as start_lost variant which means the change occurs in the start codon. The "G" allele frequency was segregated lower in HSA cases than controls, while allele "T" was segregated more in the HSA cases (p-value, 4.93 e-08). Most of the HSA cases (82.61%) carried "T" allele. In US HSA, 68.8% (11/16 cases) were homozygous, 31.2% (5/16 cases) were heterozygous. In Australian HSA, 57.1% (4/7cases) were homozygous and 42.9% (3/7 cases) were heterozygous.

Alternate allele "T" on chromosome 37:5,608,759 C/T, c.8606G>A, p.Ser2869Asn was predicted to cause a missense transcription on exon 46/66 of the dynein axonemal heavy chain 7 (DNAH7) gene. The allele frequency was slightly higher in the HSA cases (42.86%, p-value = 0.0571). The carrier status for US HSA cases was 33.3% (7/21) homozygous, 23.8% (5/21) heterozygous, while for Australian HSA cases 14.3% (1/7) were homozygous and 42.9% (3/7) heterozygous.

SNP 38: 21302502 T/G, c.159T>G, p.Met1Arg resides on exon 2/7 of the beta-1,4- galactosyltransferase 3 (B4GALT3) gene and the 3' region of the protoporphyrinogen oxidase (PPOX) gene. While in US HSA-cases homozygous and heterozygous for both alleles were found in almost equal numbers (8 G/G, 5 T/T, 13 T/G), no heterozygous cases were found in Australian HSA cases in the cohort. Other variants that were predicted as damaging but the allele frequencies between cases and controls were not significantly different and are shown in Table 3- 9.

111

Table 3- 9 Possible damaging HSA related germline variants based on two variant effect prediction tools.

Chr_pos_ref/alt Consequence AF (%) HSA HSA p value Gene Impact Case Control US Aus 1_108633874_T/C start_lost 28.57 33.13 (3+5)/21 (2+1)/7 0.4824 DHX34 HIGH 10_45364871_G/T missense_variant 5.36 6.46 (0+2)/21 (0+1)/7 0.9614 OXER1 MODERATE 10_45364871_G/T downstream_gene 5.36 6.46 (0+2)/21 (0+1)/7 0.9614 HAAO MODIFIER 12_1254270_C/A start_lost 23.21 6.88 (2+6)/21 (1+1)/7 0.0960 VWA7 HIGH 12_1254270_C/A downstream_gene 23.21 6.88 (2+6)/21 (1+1)/7 0.0960 VARS1 MODIFIER 14_5616329_G/C missense_variant 26.79 34.89 (1+8)/21 (2+1)/7 0.8955 PODXL MODERATE 15_603168_A/C start_lost 36.96 41.96 (4+9)/20 (0+3)/3 0.6549 C15H1orf50 HIGH 15_603168_A/C upstream_gene 36.96 41.96 (4+9)/20 (0+3)/3 0.6549 P3H1 MODIFIER 15_6921385_C/T missense_variant 1.79 0 (0+1)/21 (0+0)/7 na ZMYM1 MODERATE 15_51920844_C/T missense_variant 5.36 5.71 (0+3)/21 (0+0)/7 0.8327 DCHS2 MODERATE 15_51984210_G/A missense_variant 7.14 2.87 (0+2)/21 (1+0)/7 na DCHS2 MODERATE 17_7923519_T/G splice_acceptor 19.64 14.62 (3+4)/21 (0+1)/7 0.8812 SLC66A3 HIGH 2_44046606_T/C start_lost 1.79 0 (0+1)/21 (0+0)/7 na SETD9 HIGH 25_47087155_C/A start_lost 3.57 3.28 (0+0)/21 (1+0)/7 na ASB18 HIGH (Continue next page)

112

Table 3-9 (Continue)

Chr_pos_ref/alt Consequence AF (%) HSA HSA p value Gene Impact Case Control US Aus 28_13675592_T/G start_lost 17.39 55.56 (0+5)/16 (0+3)/7 4.93 e-08 (T) HIGH 82.61 44.44 (11+5)/16 (4+3)/7 MRPL43 28_13675592_T/G downstream_gene 17.39 55.56 (0+5)/16 (0+3)/7 4.93 e-08 SEMA4G MODIFIER 28_13675592_T/G upstream_gene 17.39 55.56 (0+5)/16 (0+3)/7 4.93 e-08 TWNK MODIFIER 29_10953431_A/T missense_variant 26.92 34.06 (0+2)/20 (0+4)/6 na RAB2A MODERATE 35_23926662_C/T missense_variant 69.64 50.84 (9+11)/21 (4+2)/7 0.9427 TRIM38 MODERATE 37_5608759_C/T missense_variant 42.86 25.31 (7+5)/21 (1+3)/7 0.0571 DNAH7 MODERATE 38_21302502_T/G start_lost 60.71 73.54 (8+8)/21 (5+0)/7 0.0192 (T) HIGH 39.29 26.46 (5+8)/21 (2+0)/7 B4GALT3 38_21302502_T/G downstream_gene 60.71 73.54 (8+8)/21 (5+0)/7 0.0192 PPOX MODIFIER 5_75191814_T/C start_lost 38.24 41.40 (3+7)/17 na na ADAT1 HIGH 5_75191814_T/C downstream_gene 38.24 41.40 (3+7)/17 na na KARS1 MODIFIER 5_82180196_G/A missense_variant 20.0 15.43 (4+1)/18 (0+1)/7 0.5741 EXOC3L1 MODERATE 5_82180196_G/A upstream_gene 20.0 15.43 (4+1)/18 (0+1)/7 0.5741 E2F4 MODIFIER 8_10706059_A/G start_lost 50.0 38.15 (1+19)/21 (0+7)/7 0.1173 DTD2 HIGH (Continue next page)

113

Table 3-9 (Continue)

Chr_pos_ref/alt Consequence AF (%) HSA HSA p value Gene Impact Case Control US Aus 9_49843218_G/A missense_variant 3.57 6.67 (0+1)/21 (0+1)/7 0.3085 ADAMTS13 MODERATE 9_55323281_A/T missense_variant 18.75 45.87 (0+2)/14 (2+0)/2 na C9H9orf16 MODERATE 9_55323281_A/T downstream_gene 18.75 45.87 (0+2)/14 (2+0)/2 na CIZ1 MODIFIER 9_55323281_A/T downstream_gene 18.75 45.87 (0+2)/14 (2+0)/2 na LCN2 MODIFIER

Three molecular signature pathways, Adipogenesis, Complement and Interferon gamma response were the results of HSA-related gene set enrichment analysis. These pathways comprised the gene set where the variants uniquely belonged to HSA, regardless of the data input approaches. The top ten overlapping pathways from the analysis of four different gene sets input are shown in Table 3- 10. Other molecular signature pathways that may be involved in the HSA were Mitotic spindle, Estrogen response late/early, Heme metabolism, Apical junction, Bile acid metabolism, G2M checkpoint, KRAS signaling up, E2F targets and Inflammatory response (Table 3- 11). The full list of the pathways and genes with SNVs are in Table 3- 12.

114

Table 3- 10 The top ten hallmark molecular pathways where the candidate genes overlapped with the hallmark gene set in the Molecular Signature Database (MSigDB) computed by Gene Set Enrichment Analysis (GSEA). The gene symbols of the identified germline variants via two different approaches (Figure 3- 3, A and B) were investigated. Inside the parentheses are numbers of genes belonging in the pathways. Approach A: HSA cohort p-value Approach A: control cohort p-value

Mitotic spindle (37) 2.34 e-14 Estrogen response early (17) 4.61 e-7 Estrogen response late (33) 1.52 e-11 E2F target (16) 2.21 e-6 Adipogenesis (32) 7.01 e-11 Inflammatory response (16) 2.21 e-6 Heme metabolism (29) 5.54 e-9 G2M checkpoint (15) 9.96 e-6 Estrogen response early (28) 2.2 e-8 Mitotic spindle (15) 9.96 e-6 Fatty acid metabolism (24) 4.5 e-8 MYC target V2 (8) 1.6 e-5 Complement (27) 8.38 e-8 Epithelial mesenchymal transition 4.18 e-5 (14) Interferon gamma response 8.38 e-8 KRAS signaling up (14) 4.18 e-5 (27) Apical junction (26) 3.07 e-7 Unfolded protein response (10) 7.48 e-5 Bile acid metabolism (18) 9.02 e-7 Estrogen response late (13) 1.64 e-4

Approach B: HSA cohort p-value Approach B: control cohort p-value

Mitotic spindle (52) 2.3 e-19 Estrogen response early (18) 3.89 e-7 Estrogen response late (46) 4.16 e-15 MYC target V2 (10) 4.1 e-7 Estrogen response early (45) 1.93 e-14 E2F targets (17) 1.79 e-6 G2M checkpoint (40) 2.73 e-11 Inflammatory response (17) 1.79 e-6 Interferon gamma response 2.73 e-11 G2M checkpoint (16) 7.78 e-6 (40) Complement (39) 1.06 e-10 Epithelial mesenchymal transition 3.17 e-5 (15) KRAS signaling up (39) 1.06 e-10 Hypoxia (15) 3.17 e-5 E2F targets (38) 4.02 e-10 Mitotic spindle (15) 3.17 e-5 Inflammatory response (38) 4.02 e-10 TNFA signaling via NFTB (16) 3.17 e-5 Adipogenesis (37) 1.47 e-9 Unfolded protein response (11) 3.37 e-5

115

Table 3- 11 Hallmark molecular pathways and descriptions where the candidate genes overlapped with the hallmark gene set in the Molecular Signature Database (MSigDB) computed by Gene Set Enrichment Analysis (GSEA). Adipogenesis, Complement and Interferon gamma response pathways (bold letters) were identified uniquely in the HSA group and by both approaches (AB). Gene set name Approach(es) Unique Descriptions to HSA

Mitotic spindle AB no Genes important for mitotic spindle assembly. Estrogen response late AB no Genes defining late response to estrogen. Adipogenesis AB yes Genes up-regulated during adipocyte differentiation (adipogenesis). Heme metabolism A yes Genes involved in metabolism of heme (a cofactor consisting of iron and porphyrin) and erythroblast differentiation. Estrogen response early AB no Genes defining early response to estrogen. Fatty acid metabolism A yes Genes encoding proteins involved in metabolism of fatty acids. Complement AB yes Genes encoding components of the complement system, which is part of the innate immune system. Interferon gamma AB yes Genes up-regulated in response response to IFNG [Gene ID=3458]. Apical junction A yes Genes encoding components of apical junction complex. (Continue next page)

116

Table 3-11 (Continue)

Gene set name Approach(es) Unique Descriptions to HSA

Bile acid metabolism A yes Genes involve in metabolism of bile acids and salts.

G2M checkpoint B no Genes involved in the G2/M checkpoint, as in progression through the cell division cycle. KRAS signaling up B no Genes up-regulated by KRAS activation. E2F targets B no Genes encoding cell cycle related targets of E2F transcription factors. Inflammatory response B no Genes defining inflammatory response.

117

Number of genes carrying single nucleotide variants in the pathways 0 10 20 30 40 50 60 * Adipogenesis 37 * Interferon gamma response 40 * Complement 39 Mitotic spindle 52 Estrogen response late 49 Estrogen response early 45 Heme metabolism 29 Fatty acid metabolism 24 Apical junction 26 Bile acid metabolism 18 G2M checkpoint 40 KRAS signaling up 39 E2F targets 38 Inflammatory response 38 Unfolded protein response 10 MYC target V2 8 Epithelial mesenchymal transition 14 Hypoxia 15 TNFA signaling via NFKB 15

Figure3- 9 Bar graph represents number of genes that carry single nucleotide germline variants and the Hallmark molecular signature pathways that might be involved in dogs with HSA.

118

Table 3- 12 Hallmark molecular pathways and their genes harboring the variants identified in the HSA cohort, ranking from the most likely related to the least HSA-related. Rank Hallmark molecular pathways Genes

Adipogenesis C3, ACADL, FAH, LIFR, APLP2, COQ9, (37 gene sets) PPP1R15B, COL15A1, CD36, ABCA1, GBE1, NDUFS3, ARAF, RREB1, SLC5A6, PQLC3, ACAA2, SOD1, COL4A1, BAZ2A, OMD, YWHAG, CIDEA, DECR1, EPHX2, ADCY6, CHUK, ENPP2, UCK1, LIPE, ACADM, SPARCL1, ACLY, DNAJC15, AGPAT3, MAP4K3, SORBS1. Interferon gamma response CASP7, IFIT2, NMI, NOD1, CIITA, APOL6, (40 gene sets) TNFSF10, HERC6, ICAM1, IFIH1, IFIT1, CSF2RB, MX2, ST3GAL5, RTP4, DDX60, ISOC1, IL4R, SP110, ARID5B, OAS3, JAK2, CD40, TRIM2, BTG1, IL2RB, LGALS3BP, IL18BP, XAF1, CMKLR1, FAS, GBP6, TAP1, BST2, TRIM25, KLRK1, RNF213, CD69, TRAFD1, BANK1. Complement CA2, OLR1, XPNPEP1, CTSH, FCN1, GNB4, (39 gene sets) KYNU, BRPF3, C3, SERPINB2, LRP1, F8, GCA, DYRK2, COL4A2, KCNIP2, CD36, CEBPB, RHOG, GZMB, PCLO, JAK2, PDGFB, ERAP2, CASP7, DUSP5, C4BPB, CPM, ITGAM, LTF, PPP2CB, SPOCK2, F3, CDK5R1, DPP4, F7, DOCK10, SCG3, F5. Mitotic spindle ABL1, ARHGAP4, CLIP2, GSN, MYO1E, PRC1, (52 gene sets) SMC4, TUBGCP6, AKAP13, ARHGAP5, DLG1, HDAC6, NET1, PXN, SPTBN1, WASF2, ALMS1, BIN1, DOCK2, KIF1B, NOTCH2, RABGAP1, STAU1, WASL, ALS2, BRCA2, DST, KIF20B, NUMA1, RALBP1, SUN2, ANLN, CDC42EP4, EPB41, LATS1, PCNT, ROCK1, TBCD, ARFIP2, CDK1, FARP1, MAP1S, PLEKHG2, SASS6, TRIO, ARHGAP29, CLIP1, FLNA, MAP3K11, PPP4R2, SMC1A, TUBGCP5. (Continue next page)

119

Table 3-12 (2) (Continue)

Rank Hallmark molecular pathways Genes

Estrogen response late ACOX2, CALCR, DNAJC12, HSPA4L, MOCS2, (46 gene sets) PLXNB1, SLC29A1, UNC13B, ADD3, CD44, FARP1, JAK2, MYOF, PRLR, SLC7A5, WFS1, ALDH3B1, CDH1, GAL, KLF4, NCOR2, RBBP8, ST14, XRCC3, BCL2, CELSR2, GJB3, KLK10, OPN3, SCNN1A, TMPRSS3, ZFP36, CA2, CHST8, HMGCS2, LSR, PERP, SERPINA3, TRIM29, CACNA2D2, DCXR, HR, LTF, PGR, SLC26A2, UGDH. Estrogen response early ADCY1, CALCR, FARP1, KLF10, MUC1, OPN3, (45 gene sets) SH3BP5, TGM2, ADD3, CBFA2T3, HR, KLF4, MYBL1, PGR, SLC1A1, TMPRSS3, ALDH3B1, CD44, IGF1R, KLK10, MYOF, PODXL, SLC26A2, WFS1, BCL2, CELSR2, JAK2, KRT15, NAV2, RBBP8, SLC7A5, BHLHE40, DLC1, KAZN, MLPH, NCOR2, RRP12, SVIL, CALB2, ELF1, KDM4B, MREG, OLFML3, SCNN1A, TBC1D30. Heme metabolism EPB41, ADD1, FTCD, KHNYN, SIDT2, RBM38, (29 gene sets) RNF123, TMCC2, CA2, SLC6A8, ADD2, OSBP2, GLRX5, BMP2K, PQLC1, C3, LPIN2, EPB42, ANK1, AHSP, PSMD9, CAST, UROS, TRIM10, EZH1, IGSF3, PPOX, RHAG, KAT2B. Fatty acid metabolism CA2, CD36, ACADM, MLYCD, RDH11, GRHPR, (24 gene sets) HSD17B7, RAP1GDS1, HMGCS2, ACAA2, ACADL, ENO3, EHHADH, CYP1A1, METAP1, D2HGDH UGDH, DECR1, UROS, EPHX1, HAO2, FMO1, NTHL1, CBR3. Apical junction WASL, PECAM1, SHC1, ADAMTS5, GRB7, (26 gene sets) MAP4K2, DHX16, CDH1, ITGB1, IRS1, CNTN1, NRAP, RAC2, MPZL1, CALB2, LAMA3, ACTN1, ADRA1B, PCDH1, ACTB, ICAM1, GTF2F1, CDH8, NFASC, PPP2R2C, WNK4. Bile acid metabolism ABCA1, EPHX2, ISOC1, ABCD3, ABCA2, ABCA8, (18 gene sets) ABCA5, GC, PEX1, SOD1, MLYCD, CROT, HSD3B7, ABCA9, CYP7A1, ABCA4, PIPOX, PECR. (Continue next page)

120

Table 3-12 (Continue)

Rank Hallmark molecular pathways Genes

G2M checkpoint ABL1, BARD1, CDC25B, FANCC, KIF20B, NOLC1, (40 gene sets) PRC1, TACC3, ARID4A, BRCA2, CDK1, FOXN3, MCM6, NOTCH2, SLC7A5, TNPO2, ATF5, CASP8AP2, CHAF1A, HSPA8, MKI67, NUMA1, SMC1A, TRA2B, ATRX, CCNT1, DMD, HUS1, MTF2, NUP98, SMC4, TROAP, AURKB, CDC25A, EXO1, ILF3, NCL, POLE, SYNCRIP, WRN. KRAS signaling up ACE, CBL, EVI5, IL7R, NGF, RGS16, SPON1, (39 gene sets) TNNT2, AKAP12, CIDEA, GPRC5B, KCNN4, PECAM1, SCG3, SPP1, TOR1AIP2, ALDH1A3, CMKLR1, HSD11B1, KLF4, PIGR, SERPINA3, STRN, WDR33, C3AR1, CROT, IGF2, LCP1, PPBP, SOX9, TLR8, ZNF639, CA2, DOCK2, IKZF1, NAP1L2, RELN, SPARCL1, TNFRSF1B. E2F targets AURKB, CDC25B, HELLS, MSH2, POLE, RAN, (38 gene sets) SMC6, TCF19, BARD1, CDK1, HUS1, NBN, PPM1D, RPA1, SNRPB, TRA2B, BRCA2, DEPDC1, ILF3, NCAPD2, PRPS1, SHMT1, SSRP1, XRCC6 CCP110, DIAPH3, MCM6, NOLC1, PSIP1, SMC1A, SYNCRIP, CDC25A, E2F8, MKI67, PMS2, RAD51AP1, SMC4, TACC3. Inflammatory response ABCA1, C3AR1, GPR132, KIF1B, OLR1, PTPRE, (38 gene sets) RTP4, TLR2, ADORA2B, CD40, ICAM1, MEFV, OSMR, RELA, SEMA4D, TNFRSF1B, AHR, CD69, IL2RB, MSR1, P2RX7, RGS16, SLAMF1, TNFSF10 ATP2B1, CMKLR1, IL4R, NMI, PTGER2, RHOG, SLC4A4, BST2, F3, IL7R, NPFFR2, PTGIR, ROS1, STAB1. (Continue next page)

121

Table 3-12 (Continue)

Rank Hallmark molecular pathways Genes Unfolded protein response ATF3, EXOC2, EXOSC9, HYOU1, SLC7A5, (10 gene sets) EEF2, EXOSC10, FUS, RRP9, WFS1. MYC target V2 GNL3, PHB, RRP9, UTP20, MPHOSPH10, PPRC1, (8 gene sets) TMEM97, WDR74. Epithelial mesenchymal transition COL16A1, COL5A1, COPA, FLNA, LOXL2, (14 gene sets) MXRA5, TNC, COL4A2, COL5A3, FAS, LAMA1, MMP3, SPP1, VCAN. Hypoxia AKAP12, ATP7A, COL5A1, EFNA1, GPI, PDGFB, (15 gene sets) PPFIA4, ZFP36, ATF3, BHLHE40, DUSP1, GAPDHS, PAM, PNRC1, VLDLR. TNFA signaling via NFKB ATF3, BCL6, CD69, EFNA1, KYNU, PNRC1, TNC, (15 gene sets) ZFP36, ATP2B1, BHLHE40, DUSP1, IL23A, MXD1, TLR2, ZBTB10.

= The results from the GSEA investigation of HSA variants gene symbol, which were agreed by both approaches and the pathways were unique to HSA group.

= The results from the GSEA investigation of HSA variants gene symbol, which were not agreed by both approaches, but the pathways were unique to HSA group. Or the results were agreed by both approaches, but the pathways were not unique to HSA group.

122

3.3.4 Highlighted HSA related genes

Genes with HSA-related germline SNVs were highlighted based on several variant identification methods and approaches. Fifty-five annotated genes were highlighted by exome variant association scan while 2,198 genes were filtered by variant functional analysis. Fifteen genes were identified in both approaches, so they were prioritized as the most likely HSA related by both functions and statistical associations and comprised: CYP1A2, FANCM, BMS1, GAS2L1, JAKMIP1, MIS18BP1, PTPRB, RBM38, RPS8, SOX10, TSG101, XRCC6, AHNAK, MYNN and SALL2. Figure 3- 10 illustrates the lists of HSA related genes/variants and the numbers of genes identified by various approaches are shown in Table 3- 13. Table 3- 14: (1) top prioritized HSA related candidate genes, (2) genes with SNVs that were statistically associated with HSA and (3) genes that might be associated with HSA by the predicted functions included those (3a) having damaging variants at multiple loci (multiple hits), (3b) were predicted by both variant effect predictors (VEP and SIFT) and (3c) genes involved in HSA related hallmark molecular signature pathways.

Figure 3- 10 Summary of HSA related gene lists generated by two major variant identification approaches, exome variant association scan and variant functional analysis. Two candidate gene lists were highlighted as the results of exome variant association scan while five candidate gene lists were highlighted using variant functional analysis approaches. Genes with top prioritized single nucleotide variants from two approaches were combined, 15 genes were highlighted as potential HSA related germline variants

123

Table 3- 13 Multiple list comparison between the results from various HSA related germline variant identification approaches mainly variant association scan ("Association") and variant functional analysis with different variant selection criteria. Fifteen genes were identified by both approaches. (Calculation by Multiple List Comparator tool, molbiotools.com)

Table 3- 14 Genes with HSA-related germline SNVs identified by two variant identification approaches, exome variant association scan and variant functional analysis. 1) Top HSA related genes were identified by both approaches. 2) Significant HSA related genes identified by exome variant association scan. 3a) HSA related genes identified by functional analysis, and the variants were detected at multiple loci on a single gene. 3b) Genes that have damaging variants which were predicted by both variant effect predictor tools. 3c) Genes belong to the hallmark molecular signatures that were possible involved in canine HSA. 1) Top HSA related genes, identified by both approaches

CYP1A2 FANCM BMS1 GAS2L1 JAKMIP1 MIS18BP1 PTPRB RBM38 RPS8 SOX10 TSG101 XRCC6 AHNAK MYNN SALL2

2) Genes with SNVs that were statistically associated with HSA HDAC5 FGB AHNAK CBX3 ARMC3 ETNK1 TNK1 ATP6V1H RPS8 RBM38 SALL2 TRAIP BMS1 TLE7 XRCC6 AP2A2 MIS18BP1 KRR1 NR1D2 PDE7A MARVELD3 HDAC10 HSPG2 ZNF865 ZNF777 TTLL9 ZNF704 WDR24 PSMA7 G6PC3 KLK2 RWDD4 WFDC10A JAKMIP1 FANCM TSG101 GAS2L1 PTPRB PLEKHO1 LSM14B CYP1A2 CYTH1 AKT1 NUMA1 SOX10 PRPF3 MSRA MYNN PLSCR3 AVP GGA3 HNRNPH1 SFRP4 MRPL40 ETNK2

124

3a) Multiple hits, genes with high burden of variants ALPK2 CFAP157 FTH1 ITGAD MTUS1 PPP2CB RP1 AMT CTIF GFI1B KIF21B NLRP10 PROB1 SHROOM3 ARHGEF5 CYP1A2 GRB10 LINS1 OR13C8 PRR19 TG CASP6 DNAH5 IRF2BP2 MICAL2 PARP15 PTGIR TRAPPC2 CEACAM23 FAT2 ITGA1 MITF PKHD1 RAC1 TXNDC12 VCAN VWA8 WWC3

3b) Genes with damaging variants predicted by two prediction tools ADAMTS13 B4GALT3 DCHS2 EXOC3L1 MRPL43 PODXL SLC2A4RG ADAT1 C15H1orf50 DHX34 GFRA2 NPFFR2 PQLC3 TERT ARHGAP22 C9H9orf16 DNAH7 HTR2B OXER1 RAB2A TNFRSF4 ASB18 CPA1 DTD2 KANSL2 OXTR SETD9 TRIM38 TRIM47 VWA7 ZMYM1

3c) Genes in the hallmark molecular signatures that were possible involved in the HSA C3 ACAA2 ACLY CSF2RB IL18BP CTSH PDGFB ACADL SOD1 DNAJC15 MX2 XAF1 FCN1 ERAP2 FAH COL4A1 AGPAT3 ST3GAL5 CMKLR1 GNB4 DUSP5 LIFR BAZ2A MAP4K3 RTP4 FAS KYNU C4BPB APLP2 OMD SORBS1 DDX60 GBP6 BRPF3 CPM COQ9 YWHAG CASP7 ISOC1 TAP1 SERPINB2 ITGAM PPP1R15B CIDEA IFIT2 IL4R BST2 LRP1 LTF COL15A1 DECR1 NMI SP110 TRIM25 F8 PPP2CB CD36 EPHX2 NOD1 ARID5B KLRK1 GCA SPOCK2 ABCA1 ADCY6 CIITA OAS3 RNF213 DYRK2 F3 GBE1 CHUK APOL6 JAK2 CD69 COL4A2 CDK5R1 NDUFS3 ENPP2 TNFSF10 CD40 TRAFD1 KCNIP2 DPP4 ARAF UCK1 HERC6 TRIM2 BANK1 CEBPB F7 RREB1 LIPE ICAM1 BTG1 CA2 RHOG DOCK10 SLC5A6 ACADM IFIH1 IL2RB OLR1 GZMB SCG3 PQLC3 SPARCL1 IFIT1 LGALS3BP XPNPEP1 PCLO F5

125

3.4 Discussion

The potential canine HSA related germline SNVs and genes involved were identified and highlighted. Some of the highlighted variants were predicted to affect the transcripts of genes with tumor suppressing roles. For an example, SNPs 5:32,330,465 (A/G) on the exon of PLSCR3 gene, which is involved in enhancing Tumor Necrosis Factor-α [184], a down regulation of PLSCR3 could delay the induction of tumor apoptosis so that the animal with germline mutations of these genes may be more susceptible to develop cancer. The results demonstrated that 100% of US dogs with HSA carry homozygous possible pathogenic allele, c.139T>C, p.*47Gln. This finding agreed with GWAS results reported by Tonomura et al. (2015) [90], which have identified shared germline risk loci for HSA and B-cell lymphoma on chromosome 5 at 29Mb and 33Mb in 142 American Golden Retriever dogs.

FANCM is a tumor suppressor gene with DNA repair and chromosomal stability maintenance functions. Recent studies also demonstrated the involvement of FANCM in cancer cell survival regulatory mechanisms [185-190]. FANCM loss of function has been studied to a great extent, as damaging germline mutations contributed as a predisposing factor for human breast and ovarian cancer [191-195]. The carriers can have mutations in one allele or both, and it was suggested that carrying biallelic mutations may entail greater susceptibility to cancer [196]. In a case of human ovarian cancer, where the patient underwent germline gene panel testing included FANCM, was reported to carry monoallelic mutations of the gene [197]. In this canine HSA study, significant allele frequency difference was observed between HSA and the control population. It is therefore highly possible that the HSA population may be carrier for this pathogenic allele. Further analytical validation, and investigation in a larger cohort is recommended.

In contrast to the tumor suppressor genes, some highlighted candidates were found to be involved in cancer development. MRPL40 encodes one of the mitochondrial ribosomal proteins, involved in mitochondrial biogenesis that helps translate protein components of oxidative phosphorylation and was found to be expressed in human breast cancer tissue [198]. ZNF704 encodes a member of the zinc finger proteins, a transcription factor family. Roles of some genes in the family were described as tumor suppressor, but many of them were found to be oncogenes in various types of cancers [199]. ZNF704 was included in the group of genes that were associated with prognosis outcome for human lung adenocarcinoma and

126

prostate cancers [200]. In human prostate cancer, high expression of ZNF704 along with 3 other genes were found in patients with metastatic-lethal stage of cancer [201].

To explore whether the genes known to be somatically mutated in HSA were good candidates for being germline predisposing variants, germline variants on genes: TP53, PIK3CA, PTEN, PLCG1, KIT, KDR and CDKN2A/B were inspected. The results from GENESIS R package did not report any significant variants on those genes which were associated with HSA group at genome-wide significant level (p-value < 10-8). Top HSA associated variants with p-value < 0.05 were listed in Appendix 6.

Exome variant association scan coupled with gene set enrichment analysis results found that highly HSA associated germline variants may have effects on immune system i.e. genes down-regulated in CD4 T-cells and genes down-regulated in marginal zone B-lymphocytes. Several genes in the set were found to be involved in tumorigenesis mechanisms such as XRCC6 and HDACs. Polymorphisms in a DNA repair gene, XRCC6 (X-Ray Repair Cross Complementing 6 or X-Ray Repair in Chinese hamster Cell 6) were found to have various effects on cancer risks, either increasing or decreasing the risks of cancers, depending on the SNPs and types of cancers [202]. Polymorphisms in Histone Deacetylase (HDAC) genes, HDAC5 and HDAC10 were found to be associated with HSA in this study cohort. The roles of HDACs are not completely understood but believed to be promoting cancer via their regulation of chromatin remodeling and gene expression (reviewed by Glozak and Seto (2007) [203]). Epigenetic modifying agents that inhibit HDACs (HDACi) have been used in cancer patients as therapeutic agents [204]. In dogs, anti-proliferative effect of HDACi agents were observed in many types of cancer cell lines including B-cell lymphoma, osteosarcoma, prostate cancer, urothelial carcinoma, mast cell tumour and histiocytic sarcoma. The agents appeared to reduce tumour growth in mouse tumour xenograft as well [205-212]. There has been on case-report of a dog diagnosed with HSA which was treated with HDACi after splenectomy and have survived more than 1,000 days, however the evidence of clinical trial for the use of HDACi for treating HSA is limited [213, 214]. As HDACs were found to be associated with canine HSA, further validation studies for the roles of HDACs in canine HSA may be worth considered.

Another variant filtering approach, functional annotation coupled gene set enrichment analysis clearly pointed to the involvement of HSA related genes in Adipogenesis (genes up- regulated during adipocyte differentiation), Complement (genes encoding components of the

127

complement system, which is part of the innate immune system) and Interferon gamma response (Genes up-regulated in response to IFNG) hallmark molecular signatures. The results provided additional evidence that some germline variants may have impacts on genes regulating immune function and facilitate HSA development as the consequence. The findings are corroborating other gene expression profile pathway analysis studies of HSA tumor, which suggested that the disruption in T-cell mediated immune response, inflammation, angiogenesis, cancer and hypoxia mechanisms could be involved in canine HSA cancer biology and possible distinct molecular subtypes [86, 87, 90, 96]. With germline variants already identified, further investigation on somatic mutations was performed using the tumor samples obtained from 5 dogs which were included in this study (see Chapter 4).

Looking at the gene set enrichment analysis results from two input data set, obtained by two approaches, functional analysis approach gene sets resulted in cancer related hallmark molecular signatures while no hallmark molecular signature was identified when using the input from variant association scan. Each variant identification approach has its own strength, between statistical and biological sense. GSEA tend to work well when having adequate numbers of genes that are involved in similar biological pathways, so the input from functional analysis approach results in better categorized gene sets. Here two methods of data input by functional analysis approach were also tested (Figure 3- 3). The results showed that selecting the set of genes that were uniquely identified in HSA cases may help narrow down the molecular signatures which are specific to the HSA.

Amongst the highlighted HSA related variants, no genotypic difference between samples from two geographical regions was observed so far. The differences of the genotypes were shown to be strongly related to HSA status. The segregation of risk genotypes in each breed should be investigated further to confirm which factors contributed to higher HSA risk, between the risk genes themselves or whether they were already segregated in some breeds.

Possible factors that may cause false positive associations in variant association analysis especially breed-related markers were adjusted using GENESIS R/Bioconductor package [180]. Even though inflation factor (λ) below one was obtained after the adjustment, large numbers of highly significant HSA associated markers were still detected which suggested incomplete genomic control of breed population structure since the case and control included in this study were dogs of various breeds. Taking this limitation into account, over interpretation of the results was minimized by performing variant association analysis along with functional annotation of the candidates, prioritizing cancers

128

related genes and recommendation of further analysis in specific breed or even in an individual patient.

In summary, HSA dogs in the sample cohort had germline mutations in some of the tumor suppressor genes, oncogenes, as well as genes involved in adipogenesis, complement and interferon gamma pathways. Possible loss of function mutations in tumor suppressor genes (e.g. FANCM, PLSCR3, XRCC6 etc.) may lead to genomic instability thus heighten the risk of developing HSA. While germline mutations of genes in complement and interferon gamma pathways may alter the functions of immune regulatory system which helps preventing or eliminating cancer cells. However, the mechanism is not yet clearly understood and should be further investigated.

This study has highlighted possible HSA related germline variants and genes that might have been affected by or involved in the tumorigenesis. To continue further investigation, several steps in validation are still needed such as sequence viewing, exploration of a larger cohort and performing association analysis using case and control from the same breed. With the list of candidate genes having been narrowed down, genotype scan in another cohort can be done by target only genes/variants of interests using several methods such as target sequencing, SNPs array or PCR. Inherited germline mutation cannot be confirmed unless familial genomic data are analyzed. Further study may be conducted by genotyping families of dogs that have cancer and perform pedigree analysis of least three generations.

129

Chapter 4 Somatic alterations in splenic haemangiosarcoma of five dogs

4.1 Introduction

Hemangiosarcoma (HSA) is a vascular endothelial malignancy, which occur in both humans and dogs. Splenic HSA in dogs is common, especially in some breeds such as Golden Retriever, German shepherd, Labrador Retriever [10, 11, 15, 17, 18, 88, 89]. In humans, angiosarcoma is considered rare when compared to other cancers [215, 216]. The dog may be a useful animal model to study this 'common in dog/rare in human' cancer to gain more information which might then contribute to future personalized medicine for both species [82, 84].

Companion animal medicine is developing in parallel with human medicine and is often translational between the species. Cancer is one of the diseases where dogs and humans share similarity in many aspects, hence it is no wonder that dogs are becoming popular models for studying various types of human cancers and also one of the clinical trial candidates in the era of personalized medicine [217, 218]. For cancer patients, target sequencing of the somatic and cancer genome may be performed as a routine protocol to obtain the individual's genomic data for selecting the appropriate treatments [79]. And with patient's consent, the gathering of data can be fundamental for the research to improve the preciseness in future tailored treatment [219-223].

The mechanism of HSA ontogeny is not completely understood; however, ongoing research have revealed important genomic events of canine HSA. Target sequencing of an HSA panel and sub-typing may be possible in the future [85]. As for most of the biomarkers, before the markers can be implemented in the clinic, multiple validations in different cohorts are needed [110].

Thomas et al, (2014) [83] performed Oligonucleotide Array Comparative Genomic Hybridization (oaCGH) genomic profiling of HSA tumors from 40 Golden Retriever (GR) dogs plus 35 non-GR dogs and found significant gain of copy number on chromosomes 5, 12, 13, 24 and 31, while copy number losses were found on chromosomes 11 and 16, where tumor suppressor genes CDKN2A/B and its interacting protein gene CDKN2AIP reside. In more than 20% of the sample cohort, copy number aberrations were found at the regions where genes

130

involved in HSA and other cancers reside, such as SKI (21%), VEGFA (22% in GR cohort, 40% in non-GR), PDGFRA (20%), KIT (27%) and KDR (28%). Unlike in many dog cancers, copy number variations (CNVs) were not recurrent in famous tumor-associated genes like PDGFRB, TP53, MYC and PTEN in the HSA cohort. Differences in CNVs between Golden Retrievers and other dog breeds, including Australian Shepherd Dog, Bernese Mountain Dog, Flat-coated Retriever and German Shepherd Dog were observed, suggesting that copy number profile may differ between some breeds.

Somatic copy number variations were also identified from the oaCGH data of 69 HSA tumors from Golden Retrievers [82]. Copy number gains were observed in VEGFA (19% of the cohort), KDR (22%), KIT (17%) and MYC (9%), whereas copy number deletions were observed in CDKN2A/B (22%) and AXIN1 (22%). Some samples had a mix of both gain and loss of copy number in PTK6 (gain 8.7%, loss 17%), ARFRP1 (gain 13%, loss 17%) and RTEL1 (gain 13%, loss 17%). Both studies of canine HSA genomic profile using oaCGH showed similar results in that copy number deletions were found on tumor suppressor genes CDKN2A/B and copy number gains were detected on genes that were involved in cancer promotion.

While CNVs were not recurrent on cancer-associated genes TP53, PTEN, PDGFRB and MYC [83], somatic point mutations in some of those genes were reported to be significant in canine HSA [82, 84]. Whole exome sequencing of matched tumor-normal samples was performed in 20 dogs of various breeds and highlighted candidate mutations in canine HSA such as TP53 which were detected in 35% of the cohort, PTEN, 10%, PIK3CA, 45% and PLCG1, 5% [84]. Similar findings were observed in 47 Golden Retrievers as they showed significant numbers of mutations in seven genes, TP53(60%), PIK3CA (30%) PIK3R1, ORC1, RASA1, ARPC1A, and ENSCAFG00000017407 (<10%) [82]. Target sequencing of HSA candidate gene panels including TP53, PTEN, PIK3CA, PLCG1 and NRAS were conducted in 20 cases from a previous study [84] plus 30 additional cases, and the results showed somatic mutations in one or more genes in the panel, with gene mutations in TP53 in 66% of the dogs, PTEN, 6%, PIK3CA, 46% and PLCG1 in 4% of tumors [85]. Based on these results five distinct groups for HSA intra-tumor molecular subtypes were proposed [85].

The factors contributing to the heterogeneity of genetic events in canine HSA were unclear but likely the genetic background of the dog breeds plays a role. The results that showed clustering within breeds, mostly Golden Retriever, were observed in DNA copy number

131

profile and gene expression profile [83, 87], whereas intra-tumor molecular subtypes were reported by[96]. Exploration in larger cohorts with greater variety of breeds are still needed.

The aim of this study was to identify somatic alterations in the exome of canine HSA focusing on single nucleotide mutations, with exploration of copy number variations. Whole exome sequencing of matched tumor-normal tissue samples obtained from five HSA diagnosed dogs was performed, followed by bioinformatic analyses. As samples included in this study were dogs of two distinct genetic background, Golden Retriever and Bull Mastiff, hence, I anticipated to see differences in the somatic alteration profiles between the two breeds and more similarity within the same breed.

132

4.2 Materials and methods

4.2.1 Sample

Fresh frozen HSA tissues and EDTA blood samples were obtained from four Golden Retrievers and a Bull Mastiff. Sample collection from the Veterinary Medical Centre, School of Veterinary Science, the University of Queensland in 2017, was performed under the Animal Ethics Approval (AEC) number SVS/438/15/KIBBLE/CRF with the animal owner's consent. Four HSA tissues with corresponding EDTA blood samples were purchased from Medical Oncology Research and Tissue Archiving, Veterinary Teaching Hospital, Colorado State University, Fort Collins, Colorado, USA. Animal profiles are shown in Table 4- 1.

Table 4- 1 Animal profile. Sample Age Sex Breed Sample Tumour site Source* s1810T_GR 8 FS Golden Retriever Blood and tissue spleen a s1422T_GR 10 FS Golden Retriever Blood and tissue spleen a s1340T_GR 11 FS Golden Retriever Blood and tissue spleen a s1625T_GR 12 MC Golden Retriever Blood and tissue spleen a sMART_BM na F Bull Mastiff Blood and tissue spleen b a = Medical Oncology Research and Tissue Archiving, Veterinary Teaching Hospital, Colorado State University, Fort Collins, Colorado, USA. b = Veterinary Medical Centre, School of Veterinary Science, the University of Queensland. na = not available.

4.2.2 Genomic DNA extraction and whole exome sequencing

Genomic DNA extraction from HSA tissues and matched EDTA blood samples was performed as described in the materials and methods in Chapter 3. Additional steps in tissue lysate and DNA cleaning were conducted to obtain high quality genomic materials. DNA quality control was performed as described in Chapter 3.

Libraries were prepared using SureSelectXT HS Target Enrichment System for Illumina Multiplex Sequencing (Agilent Technologies, Inc., USA) then sequenced by NextSeq500 ® sequencer. These processes were performed by the Ramaciotti Centre for Genomics, UNSW.

133

4.2.3 Identification of somatic mutations and functional analysis

I performed all the data analysis using the high-performance computing cluster at Queensland Institute of Medical Research (QIMR Berghofer, Herston). Data pre-processing was performed as described in Chapter 3, using canine reference genome (CanFam3.1, GCA_000002285.2) for the exome sequence alignment. Bioinformatic tools and workflow of data processing are illustrated in Figure 4- 1. Variant caller tools, VarScan2 and Genome Analysis Toolkit's (GATK 4.0.1.0) Mutect2 were applied to detect somatic mutations included single nucleotide variants (SNVs) and small insertion and deletion (indels) [178, 179]. Copy number variations (CNVs) were also identified using the default settings of VarScan2.

SNVs were then functional annotated using Variant Effect Predictor tool (VEP 92.3). I filtered the Single nucleotide variants that were predicted to cause high impacts on consequent transcripts plus missense variants where an amino acid substitution could have affected protein functions according to calculated variant consequences and SIFT (Sorting Intolerant From Tolerant) predictor tools [181, 183]. All the somatic mutations identified by VarScan2, regardless of their predicted effects were investigated for molecular pathways, that the mutations in the genes might have involved, by searching through molecular signatures database (MSigDB) using GSEA: Gene Set Enrichment Analysis [182]. The exome sequence alignment at the identified mutation sites were visualized using Integrative Genomics Viewer (IGV) [224].

Using the somatic mutations called by GATK's Mutect2 (Figure 4- 1), single base substitution patterns of each sample were generated using R package, MutationalPatterns [225]. The mutational patterns of the samples were compared to human's COSMIC: Catalogue of Somatic Mutation in Cancer [226].

The identification of copy number variations in exome sequence data was based on the ratio of sequence reads compared to mean number of reads along the entire chromosomes. Copy number variations results from VarScan were plotted and inspected using R package, DNAcopy [227]. Recentering was performed in some samples using VarScan's recenter command, then smoothing and segmentation were performed. The cut-off points were set, above 0.6 for amplification and below -1.0 for deletion.

134

Figure 4- 1 Workflow for somatic mutations identification from matched tumour-normal whole exome sequence data to obtain individual's single nucleotide variations (SNVs), small insertion-deletion (Indels) and copy number variations (CNVs). The effect of SNVs and indels were predicted using Variant Effect Predictor tool (VEP), the mutations which were predicted to be damaging in the cancer were filtered and highlighted. Gene Set Enrichment Analysis (GSEA) was also performed using the highlighted mutations to see which molecular signature hallmarks were possibly affected by the somatic mutations. Mutational patterns were visualised and compared to human catalogue of somatic mutations in cancers (COSMIC).

135

4.3 Results

4.3.1 DNA quality

DNA samples extracted from fresh frozen tumor and EDTA blood of HSA dogs were pre- assessed using spectrophotometer (Nanodrop®), fluorometer (Qubit® 3.0) and 0.8% agarose gel electrophoresis. Samples with inadequate DNA quantity and quality were re-extracted until prerequisite amount and quantity were met. The DNA quantity, concentration and

A260/280 absorbance ratio are recorded in Table 4- 2, and the pre-assessed agarose gel images are shown in Figure 4- 2, A and B. Genomic DNAs were assessed on arrival at the sequencing facility (Figure 4-3, A and B) then the samples were processed through exome libraries construction and whole exome sequencing. Figure 4- 4 shows DNA electrophoresis image of samples after DNA shearing. Phred quality score above 30 was reported for all sequenced samples (mean 31.4725; 31.18 - 31.68) which suggested 99.9% accuracy of the base call by the sequencing platform.

136

Table 4- 2 Genomic DNA quality control before sample submission and on arrival at genomic sequencing facility quality control. Quality control on arrival data was provided by Ramaciotti Centre for Genomics, 2017. Sample Pre-quality control Quality control on arrival

name Nano Qubit A260/28 Xpose Qubit A260/28 DIN

0 0

1810B 113.9 76.0 1.86 86.6 83.1 1.82 8.7

1422B 346.2 520.0 1.86 225.0 215.1 1.84 8.3

1340B 95.0 105.4 1.95 72.4 66.6 1.92 8.0

1625B 84.6 94.4 1.86 80.2 71.7 1.86 8.2

MARB* 157.2 66.2 1.87 143.1 91.6 1.87 8.1

1810T na na na 17.1 10.6 1.92 2.9

1422T 412.0 326.0 1.88 404.2 364.2 1.89 7.5

1340T 107.9 146.4 1.87 na 56 1.85 6.6

1625T 153.4 328.0 1.87 196 165 1.87 7.4

MART* 878.4 416.0 1.88 1009.4 619 1.86 7.7

Nano = dsDNA concentration (ng/µl) measured by Nanodrop® Qubit = dsDNA concentration (ng/µl) measured by Qubit® Xpose = dsDNA concentration (ng/µl) measured by Xpose® DIN = DNA Integrity Number * = Re-submitted samples. na = data not available or not performed according to limited sample quantity.

137

Figure 4- 2 Pre-assessment electrophoresis gel image of genomic DNAs extracted from HSA tumors and blood samples. A. Matched tumor-blood samples of dog No. 1810, 1422, 1340 and 1625. Most DNA samples showed strong, less smear bands except for the DNA extracted from the tumor of No. 1810 dog (1810T) where the smear indicated overall DNA fragmentations. B. Matched tumor-blood samples of dog sample MAR. The position referenced by the ladder on the left suggested most of the DNA sizes were larger than 1 kbp.

138

Figure 4- 3 On arrival electrophoresis gel image of submitted genomic DNA samples. A, B. The intense bands of the DNA according to the ladders showed that the length of the DNA strands were between 7 – 48.5 kbps. DNA integrity number (DIN) indicates the level of DNA degradation, DIN above 7 is generally acceptable for downstream processes. Two samples, 1340T and 1810T had low DIN, due to limited sample availability, the samples were processed along with the others in the batch. Images were generated by automated electrophoresis tool, Agilent 2200 TapeStation® with Agilent Genomic DNA ScreenTape® System. (Modified, image retrieved from Ramaciotti Centre for Genomics, 2017).

139

Figure 4- 4 DNA electrophoresis of samples after DNA shearing. The next process following DNA quality assessment was to shear the DNA into desirable size. The sizes of the DNA fragments of all samples, were about 200 bps. Image generated by automated electrophoresis tool, Agilent 2200 TapeStation® with High Sensitivity D1000 ScreenTape® System (Modified, image retrieved from Ramaciotti Centre for Genomics, 2017).

140

4.3.2 Single nucleotide mutations

The candidates presented here are selected from various aspects included; 1) functional annotated consequence of the mutations: gene mutations that were predicted to cause damaging effects were selected, 2) case investigation: based on the hypothesis that same causative events can be found in the same disease, I scanned for mutations that presented in 60% or more of the cases, 3) hot spot of mutations: as multiple mutations on some genes may lead to genomic instability and cancer, I highlighted the genes that harbored multiple mutations, 4) previously reported significant mutations: I compared the mutations identified in this sample cohort to the published canine HSA significant somatic mutations including 'HSA panel' [82, 85] and, 5) two-hit cancer model [228, 229]: I identified germline mutation candidates for canine HSA in the previous chapter, therefore those genes were inspected for any possibilities of 'second hit' somatic mutations.

4.3.2.1 Mutations with impacts on consequent transcripts and amino acids by prediction Nearly 200 single nucleotide mutations per sample were identified by variant caller; predicted damaging mutations are listed Table 4-3. Among 23 candidates, gene set enrichment analysis of molecular signatures found overlap of 8 genes; DUSP6, NFKBID, COPS3, KPNB1, KIF21A, SERF2, PABPC1L and COL24A1 in the regulatory target gene set (Collection C3), which were genes containing one or more binding sites for UniProt:P28698 (MZF1, myeloid zinc finger 1) in their promoter region.

Table 4- 3 Genes where single nucleotide mutations were predicted to cause damaging effects on the transcripts and proteins.

cDNA Amino Chr_position_ref/alt Consequence Gene Symbol position acids Codons ENSCAFG 1_112135246_T/G splice_donor_variant 00000004882 - - - 1_113170067_A/C splice_donor_variant SPTBN4 - - - 1_116769288_A/C splice_acceptor_variant NFKBID - - - 10_17238266_T/G splice_donor_variant IL17REL - - - 11_52219775_A/C start_lost TPM2 67 M/R aTg/aGg 13_37303254_T/G stop_lost CCDC166 1311 */C tgA/tgC 15_30326507_A/C start_lost DUSP6 2 M/R aTg/aGg 17_60313406_C/T stop_gained PSMD4 1082 R/* Cga/Tga 19_32026476_C/G splice_acceptor_variant DDX18 - - - 20_44859191_T/G splice_acceptor_variant MAST3 - - - 21_31322561_T/C splice_acceptor_variant CYB5R2 - - - 24_32347960_T/G splice_donor_variant PABPC1L - - - 25_7445937_A/G start_lost ENSCAFG 2 M/T aTg/aCg 141

00000006373 27_14259005_G/T stop_gained KIF21A 910 E/* Gag/Tag 30_10536215_G/C start_lost SERF2 3 M/I atG/atC 32_10540827_C/G stop_gained AFF1 638 S/* tCa/tGa ENSCAFG 32_30753341_G/A start_lost 00000030091 1 V/M Gtg/Atg 5_42143514_A/C start_lost COPS3 41 M/L Atg/Ctg 6_61955536_G/T stop_gained COL24A1 619 G/* Gga/Tga 7_31380090_A/C splice_donor_variant MAEL - - - 7_887484_T/G start_lost GPR37L1 328 M/L Atg/Ctg 9_24004331_T/G splice_donor_variant KPNB1 - - - X_120804813_T/G splice_donor_variant ZNF185 - - -

4.3.2.2 Co-mutated mutations Mutations found in 80% (4/5) of samples occurred in the ADCY1 and CNN2 genes. Three in five samples (60%) had mutations in ARHGAP45, ATXN2L, CYP1A2, KRT3, SBNO2, TUFM andU6 genes. The list of genes and predicted consequence is shown in Table 4- 4. When inspecting the mutation sites of gene ADCY1 in each sample, two samples had missense mutations while the others were synonymous, meaning the change of one base pair do not change the encoded amino acid. Mutations in the CNN2 gene appeared to be at the intron and at the 3' region of the gene.

Another missense mutation was found in the coding region of ARHGAP45 gene in sample s1422, while the mutations in other samples were found to be on the introns. Two samples (s1340 and s1422) had single nucleotide mutation at the same position on chromosome 6:18355050 (T/C), highlighted (bold letters) in Table 4- 4, where it is found to be on a sequence further down from 3' untranslated region (3'-UTR) end of genes ARHGAP45 and TUFM. The full list of co-mutated genes is shown in Appendix 7.

Table 4- 4 Single nucleotide mutations that were detected in at least 60% of samples.

Chr_position_ref/alt Gene Consequence cDNA Protein Sample 16_1038452_A/T ADCY1 missense_variant c.604T>A p.Tyr202Asn s1422 16_1039431_C/G missense_variant c.475G>C p.Asp159His s1625 c.3372G>C p.Asp1048His s1625 c.3522G>C p.Asp1098His s1625 16_1115585_G/A synonymous_variant c.1664C>T s1810 c.1433C>T s1810 c.1514C>T s1810 16_1115570_A/G synonymous_variant c.1679T>C s1810 16_1120681_G/A synonymous_variant c.1319C>T sMAR

142

c.1088C>T sMAR c.1169C>T sMAR 20_57636039_T/G CNN2 downstream_gene_variant s1340 20_57657476_A/C intron_variant s1422 20_57681788_T/C downstream_gene_variant s1625 20_57683293_T/G downstream_gene_variant sMAR 20_57657476_A/C ARHGAP45 missense_variant c.3168T>G p.Val1024Gly s1422 20_57716139_T/G intron_variant s1422 20_57681788_T/C intron_variant s1625 20_57683293_T/G intron_variant sMAR 6_18355050_T/C ATXN2L downstream_gene_variant s1340 6_18355050_T/C downstream_gene_variant s1422 6_18354403_T/G intron_variant s1810 30_37796763_C/T CYP1A2 upstream_gene_variant s1340 30_37819105_A/G intron_variant s1422 30_37822789_T/A intron_variant s1810 27_2296529_A/C KRT3 intron_variant s1340 27_2227083_A/C intron_variant s1422 27_2450104_G/C intron_variant s1625 20_57636039_T/G SBNO2 intron_variant s1340 20_57390536_C/A intron_variant s1422 20_57519418_A/C intron_variant s1625 6_18355050_T/C TUFM downstream_gene_variant s1340 6_18355050_T/C downstream_gene_variant s1422 6_18354403_T/G downstream_gene_variant s1810 20_44337284_T/G U6 downstream_gene_variant snRNA s1340 19_2224458_T/G upstream_gene_variant snRNA s1810 19_33434040_G/C downstream_gene_variant snRNA s1810 6_21295918_G/T downstream_gene_variant snRNA sMAR

4.3.2.3 Possible hot spots of mutations Possible hot spots of mutations were analysed in gene set enrichment analysis. The results showed 10 genes including GLUL, RIDA, ESRRA, COL11A2, ZNF296, RSAD2, RIN2, XRCC1, POLRMT, MTFR2 and PSMD4 (Table 4- 5) that were involved in the top 3 overlapping molecular signatures. According to the database, the gene sets were found; 1) up-regulated in ME-A cells (breast cancer) undergoing apoptosis in response to doxorubicin (GRAESSMANN_APOPTOSIS_BY_DOXORUBICIN_UP), 2) related to the arrangement and bonding together of a set of macromolecules to form a protein-containing complex (GO_PROTEIN_CONTAINING_COMPLEX_ASSEMBLY) and 3) partially belong to a protein complex, that caps one or both ends of the proteasome core complex and regulates

143

entry into, or exit from, the proteasome core complex (GO_PROTEASOME_ACCESSORY_COMPLEX).

Table 4- 5 Possible hot spots of mutations genes.

Chr_position_ref/alt Gene Consequence cDNA Protein Sample 7_15732271_G/A GLUL 3_prime_UTR_variant s1422 7_15732308_G/T 3_prime_UTR_variant s1422 7_15732280_C/T 3_prime_UTR_variant s1625 13_349073_A/T RIDA 3_prime_UTR_variant s1810 13_349080_G/A 3_prime_UTR_variant s1810 13_349068_A/G 3_prime_UTR_variant s1810 13_349080_G/A 3_prime_UTR_variant sMAR 18_52761147_A/C ESRRA synonymous_variant c.756T>G s1422 18_52766521_A/C upstream_gene_variant s1422 12_2630170_A/C COL11A2 missense_variant c.4604T>G p.Val1513Gly s1422 12_2630170_A/C missense_variant c.4462T>G p.Val1427Gly s1422 12_2630170_A/C missense_variant c.4399T>G p.Val1406Gly s1422 12_2647882_G/C missense_variant c.1038C>G p.Phe324Leu s1422 1_110411271_T/G ZNF296 missense_variant c.1029T>G p.Cys322Gly s1625 1_110415194_T/G downstream_gene_variant s1625 17_4501967_C/G RSAD2 intron_variant s1340 17_4501972_C/G intron_variant s1340 24_3373979_T/G intron_variant s1422 24_3375178_G/C intron_variant s1422 24_3373979_T/G RIN2 intron_variant s1422 24_3375178_G/C intron_variant s1422 1_111566218_C/G XRCC1 synonymous_variant c.595C>G s1422 1_111567070_A/G intron_variant s1422 20_57915383_T/C POLRMT intron_variant s1340 20_57916935_C/G intron_variant sMAR 1_28830850_A/G MTFR2 downstream_gene_variant sMAR 1_28830916_A/G 3_prime_UTR_variant sMAR 1_28830940_T/C 3_prime_UTR_variant sMAR 1_28830947_C/T 3_prime_UTR_variant sMAR 1_28830948_A/G 3_prime_UTR_variant sMAR 17_60313510_A/G PSMD4 3_prime_UTR_variant s1810 17_60313403_A/G missense_variant c.1079A>G p.Ile352Val s1810 17_60313406_C/T stop_gained c.1082C>T p.Arg353* s1810 17_60313413_C/T missense_variant c.1089C>T p.Ala355Val s1810 17_60313500_G/A 3_prime_UTR_variant s1810

144

4.3.2.4 HSA panel mutated genes A list of HSA somatic gene mutations generated from previously reported studies [82, 85] consisted of 35 significant genes where point somatic mutations occurred in the canine HSA cohort, 'HSA panel' (Table 4- 6, A). Pairwise comparison was performed between HSA panel to the identified candidate genes from five samples in my set.

Sample "s1422" had 4 overlapping mutations with the HSA panel: TP53, PIK3CA, COL1A2 and USP21 while sample "sMAR" had one overlap gene, HRAS (Table 4- 6, B). Figure 4- 5 shows genome view of TP53 gene from all dogs, no mutations on TP53 were observed in other samples.

To explore two-hits phenomenon on tumour suppressor gene TP53, paired blood-tumour exome sequences were inspected; no germline variants were detected on TP53 gene of sample s1422 (Figure 4- 6). Germline variants on the other genes that underwent somatic mutations (PIK3CA, COL1A2, USP21 and HRAS) were also inspected, but no evidence of two-hit phenomenon observed in this cohort.

Table 4- 6 A. HSA panel and B. Pairwise comparison between point somatic mutations in the genes identified in each sample and to the HSA panel. (Calculation and image were generated by Multiple List Comparator tool, molbiotools.com)

A B TP53 TYRO3 KDM6A KRAS DPH1 PIK3CA USP21 MYD88 MAPK1 KDR PTEN GRM5 SEMA3A PDGFRB BRAF PLCG1 ANO2 FOXO3 CDKN2B ORC1 NRAS AKIP1 CBLC COL1A2 RASA1 DSCAM BMP15 OTULIN KIT ARPC1A IREB2 FRK HRAS FBXW7 ENSCAFG00000017407

145

Figure 4- 5 Genomic view of TP53 gene in five tumour samples. Each sample was performed paired-end whole exome sequencing. Only sample 's1422' has a missense somatic mutation at chromosome: base pair position Chr5:32,563,409 (C>T). No somatic mutation of TP53 was detected in other samples. (Image generated by Integrative Genomics Viewer, IGV).

Figure 4- 6 Genomic view of TP53 gene in paired blood and tumour from sample 's1422'. No germline variants were observed. Two hits phenomenon did not occur in TP53 gene in this individual. (Image generated by Integrative Genomics Viewer, IGV). 146

4.3.2.5 Germline-somatic mutations HSA related germline variations were highlighted in Chapter 3 by various approaches. The affected genes included; 1) those having germline variants that were presented in most of the HSA dogs but not in the control, 'germline_assoc' (Appendix 8), 2) genes that were involved in adipogenesis, interferon gamma and complement pathways and genes that have damaging germline variants, 'germline_func', (Appendix 9). Table 4 - 7 shows pairwise comparison between genes with HSA-related germline variants and somatic mutated gene candidates. With 'germline_assoc' gene list, three samples had somatic mutations in CYP1A2 while two samples had mutations in HSPG2, KRR1 and SALL2 genes (Table 4 - 7, A). The comparison between 'germline_func' gene list and somatic mutated genes also revealed three samples with germline and somatic mutations in CYP1A2 gene while DNAH7 and LINS1 were found to be mutated in two samples (Table 4 - 7, B).

Table 4 - 7 Pairwise comparison between single nucleotide somatic mutations in each sample and to genes that had HSA related germline variants (possibly mutations). A. compared with variant association scan approach, 'germline_assoc' and B. variant functional analysis, 'germline_func'. Genes listed below the tables were likely to have both germline and somatic mutations. (Calculation and image were generated by Multiple List Comparator tool, molbiotools.com)

A B

s1340 CYP1A2 HDAC10 s1340 CYP1A2 DNAH7 CTIF PROB1 s1422 CYP1A2 HSPG2 KRR1 SALL2 s1422 CYP1A2 DNAH7 B4GALT3 CIDEA ATP6V1H CYTH1 ZNF704 COQ9 CSF2RB DHX34 VCAN s1625 HSPG2 KRR1 SALL2 TTLL9 VWA7 WWC3 s1810 CYP1A2 AHNAK MRPL40 TSG101 s1625 AMT TRAPPC2 XAF1 s1810 CYP1A2 LINS1 COL4A1 TG sMAR MIS18BP1 sMAR LINS1 C9H9orf16 F8 MICAL2 SLC5A6 SORBS1

147

Matched normal-tumour exome sequences of each sample were aligned and visualized by Integrative Genomics Viewer, to confirm true somatic mutation and to search for evidence of second hit on the genes with HSA-related germline variants reported in Table 4 - 7 which may disrupt the function of the gene. The proportion of the base pair alleles was noted in Table 4- 8. Germline variants on gene HSPG2 at chromosome position 2:77345896-97 was also detected by VarScan as loss of heterozygosity (LOH), p-value 0.0025 and 0.0036, respectively. Samples of IGV screen capture of the viewed exome sequences (s1625-HSPG2 and sMAR-AMT) are shown in Figure 4- 7 and Figure 4- 8. Copy number aberrations were observed on almost the entire segments of AHNAK genes in sample s1810, both normal and tumour samples, reflecting the alteration of this event occurred in the germline. The depth of coverage of the sequences were as high as 400 reads (Figure 4- 9). The exploration was extended to other samples, all samples showed evidence of copy number aberrations of the AHNAK genes (Figure 4- 10, A, B, C).

Table 4- 8 Genes with HSA-related germline variants that also underwent somatic mutations. Changes in the proportion of heterozygous alleles were visualized on IGV. The proportion of reference and alternate alleles at the chromosome: position of the gene is reported as percentages. Two paired-end sequences (R1 and R2) from normal and tumour are shown.

Sample s1422 GENE Chromosome: Ref/Alt Normal Reads Tumour Reads Position R1 (Ref/Alt) % R1 (RefAlt) % R2 (Ref/Alt) % R2 (Ref/Alt) % HSPG2 2:77345896 T/C 70/30 53 72/28 39 75/25 55 77/23 43 2:77345897 A/G 67/32 52 72/28 39 75/25 55 77/23 43 2:77345944 A/G 82/15 39 80/16 45 87/13 47 91/9 45 2:77345947 A/G 82/15 40 80/15 46 89/9 46 89/9 45 VWA7 12:1249662 A/G 52/45 33 37/63 41 45/55 38 36/64 45 12:1249677 G/T 46/50 28 64/36 36 57/43 35 62/38 42

148

Sample s1625 GENE Chromosome: Ref/Alt Normal Reads Tumour Reads Position R1 (Ref/Alt) % R1 (Ref/Alt) % R2 (Ref/Alt) % R2 (Ref/Alt) % HSPG2 2:77345896 T/C 77/23 43 86/14 49 LOH, p-value = 70/30 47 92/6 51 0.0025 2:77345897 A/G 77/23 44 86/14 49 LOH, p-value = 70/30 47 94/6 51 0.0036 AMT 20:39819372 T/C 80/18 61 94/6 64 89/11 63 92/8 60 20:39819399 G/A 75/25 65 90/10 71 80/20 64 87/11 70 20:39819421 T/C 75/25 79 88/9 75 74/22 65 86/13 72 20:39819437 C/T 73/26 78 88/12 76 77/23 64 86/14 78 20:39819441 C/T 72/27 78 88/12 78 75/25 65 86/13 79 20:39819613 G/T 63/37 52 84/16 45 67/33 57 80/20 40 20:39819617 G/A 65/33 52 84/16 45 67/33 57 80/20 40 20:39819620 G/A 65/35 52 84/16 45 63/36 56 79/21 39 Sample s1810 GENE Chromosome: Ref/Alt Normal Reads Tumour Reads Position R1 (Ref/Alt) % R1 (Ref/Alt) % R2 (Ref/Alt) % R2 (Ref/Alt) % CYP1A2 30:37819854 C/T 71/29 98 82/18 119 70/30 92 74/25 135 30:37821271 C/T 78/22 68 76/23 90 74/25 80 77/23 94 30:37823718 T/C 76/24 38 78/19 36

149

78/22 37 78/20 40 30:37823758 C/T 69/29 68 70/29 69 75/24 68 75/25 77 LINS 3:40434144 C/T 77/22 43 63/37 27 75/25 44 68/32 22 3:40434158 A/G 81/19 43 63/38 32 75/25 44 68/28 25 3:40434162 A/T 81:19 42 63/38 32 73/27 44 64/36 25 3:40434171 G/C 78/22 46 67/33 39 73/27 49 67/33 30 Sample sMAR GENE Chromosome: Ref/Alt Normal Reads Tumour Reads Position R1 (Ref/Alt) % R1 (Ref/Alt) % R2 (Ref/Alt) % R2 (Ref/Alt) % LINS1 3:40429562 C/T 73/27 82 90/10 79 77/23 83 87/13 86 3:40429628 C/T 75/25 61 89/11 46 79/21 56 90/10 51 3:40434171 C/G 66/34 47 81/19 47 71/29 48 81/19 42 3:40436465 G/A 5/95 41 28/72 43 11/89 38 27/23 37

150

Figure 4- 7 Genomic view of HSPG2 gene in paired blood and tumour from sample 's1625'. Heterozygous germline variants are detected in normal sample while in the tumour, homozygous alleles were seen. (Image generated by Integrative Genomics Viewer, IGV).

Figure 4- 8 Genomic view of AMT gene in paired blood and tumour from sample 'sMAR'. Heterozygous germline variants are detected in normal sample while in the tumour, homozygous alleles were seen. (Image generated by Integrative Genomics Viewer, IGV).

151

Figure 4- 9 Genomic view of AHNAK gene in paired blood and tumour from sample 's1810'. High depth of coverage (up to 300 reads in this IGV screen shot) are observed across the gene. (Image generated by Integrative Genomics Viewer, IGV).

152

A

B

C

Figure 4- 10 A, B, C. Genomic view of AHNAK gene copy number aberrations in paired blood and tumour from all samples. 153

4.3.2.6 Hallmark molecular signature pathways involvement from somatic mutations

Gene Set Enrichment Analysis was performed using the list of genes underwent somatic mutations in; a) all samples, pooled and b) individual sample. The pooled gene lists resulted in one hallmark molecular signature, mitotic spindle (FDR q-value 9.4 e-3). Genes NF1, SASS6, CLASP1, ARHGEF3 and STAU1 are important genes for mitotic spindle assembly. While performing GSEA of somatically mutated genes sample-wisely, various hallmark molecular signature pathways were found to be involved (Table 4- 9). Mutated genes in sample 's1625' did not show overlapping genes in hallmark molecular signature pathways.

Table 4- 9 Hallmark molecular signature pathways of somatic mutated genes by sample.

FDR Sample Hallmark molecular signature pathways p-value q-value s1422 HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION 4.83E-8 2.42E-6

HALLMARK_ADIPOGENESIS 2.24E-6 5.6E-5 HALLMARK_APICAL_JUNCTION 1.35E-5 2.24E-4 HALLMARK_MITOTIC_SPINDLE 1.58E-3 1.17E-2 HALLMARK_E2F_TARGETS 1.63E-3 1.17E-2 HALLMARK_KRAS_SIGNALING_DN 1.63E-3 1.17E-2 HALLMARK_MYC_TARGETS_V1 1.63E-3 1.17E-2 HALLMARK_PI3K_AKT_MTOR_SIGNALING 5.86E-3 2.7E-2 HALLMARK_ESTROGEN_RESPONSE_EARLY 6.48E-3 2.7E-2 HALLMARK_HEME_METABOLISM 6.48E-3 2.7E-2

s1340 HALLMARK_MYOGENESIS 6.48E-8 3.24E-6

s1810 HALLMARK_EPITHELIAL_MESENCHYMAL_TRANSITION 2.82E-5 1.41E-3

HALLMARK_MTORC1_SIGNALING 1.83E-3 4.57E-2

sMAR HALLMARK_APICAL_JUNCTION 4.14E-4 1.04E-2

HALLMARK_MYOGENESIS 4.14E-4 1.04E-2 HALLMARK_HYPOXIA 2.79E-3 4.65E-2

154

4.3.3 Mutational patterns

The mutation spectrum across all samples showed close similarity and mean numbers of point somatic mutations in kilobase was 138.8 ± 3.06, ranging from 133.809 to 141.184 Kbs (Figure 4- 11, A). Among six classes of base substitution, the dominating base substitution types are C>T and TT mutations occur at the CpG islands. The individual 96 mutational profiles were also similar (Figure 4- 11, B). The mutational patterns of canine HSA were the closest to human COSMIC Single Base Substitution (SBS) signatures 5 with mean cosine similarity of 0.8252 ± 0.0037. Lesser similarity to COSMIC database were found with SBS-12, SBS-26, SBS-20, SBS-1, SBS-6 and SBS-25, respectively. The cosine similarity values are shown in Table 4- 10 where the heat map of cosine similarity and sample contribution to mutational pattern is demonstrated in Figure 4- 12.

155

A

B

Figure 4- 11 A. Mutational spectrum and B. 96 Mutational profile of each sample.

156

Table 4- 10 Cosine similarity between the sample mutational patterns and selected top 7 COSMIC mutational signature patterns.

Sample Signature Signature Signature Signature Signature Signature Signature 5 12 26 20 1 6 25 1340T_GR 0.82821 0.73101 0.72163 0.71319 0.70854 0.70564 0.59015 1422T_GR 0.82849 0.73060 0.72157 0.71248 0.70852 0.70426 0.59228 1625T_GR 0.82567 0.72916 0.72103 0.71121 0.70981 0.70531 0.58960 1810T_GR 0.81932 0.72535 0.71674 0.71165 0.71002 0.70730 0.58564 MART_BM 0.82415 0.73376 0.72399 0.70991 0.70385 0.70254 0.58796

A

B

Figure 4- 12 A. Heat map showing cosine similarity between the sample mutational patterns and 33 COSMIC mutational signature patterns. B. Relative contribution of each sample to mutational patterns.

157

4.3.4 Copy number variations

Top CNVs were selected using the cut-off points (0.6 for amplification, -1.0 for deletion). Because the segment size was 100 bp long, results from cut-off selection were likely to be noises. Another option to inspect copy number variations was to inspect the plots of whole chromosomes. Segments on the regions of the chromosomes were plotted against the ratio of amplifications or deletions, compared to mean copy numbers of the entire chromosome. Figure 4- 13, A shows the plots of copy number variations across all chromosomes, Figure 4- 13, B is an enlarged figure of chromosome 7, where two regions on the chromosome may be deleted. According to the main purpose, which was to prioritize single nucleotide mutations, no further investigation was carried on from this stage.

158

A

B

Figure 4- 13 Sample of copy number plot of a sample's whole exome sequence data. A. Plots of copy number segments along the chromosomes. Position of the base pair (Kb) is shown on X-axis, ratios of copy numbers that deviated from the mean (0), were plotted along Y-axis. B. Enlarged figure of chromosome 37, two large segments are likely to be deletion as the ratios of copy numbers were below the average line.

159

4.4 Discussion

Genes that underwent somatic alterations, with a focus on single nucleotide changes, in canine splenic HSA were identified. Single nucleotide mutated genes analyzed by Gene Set Enrichment analysis highlighted one hallmark molecular signature, mitotic spindle assembly, indicating that important genes in the mitotic spindle assemble process were the most mutated in this pooled sample cohort. The function of the mitotic spindle, which is the array of microtubules is to pull duplicated mitotic chromosomes apart during cell division process [230]. Disruption of mitotic spindle assembly such as misorientation (when two centrosomes did not align properly) could result in; (i) aneuploidy (abnormal number of chromosome), (ii) disorganization of tissue morphology, hence promoting cancer cell metastasis and (iii) expansion of cancer stem cell compartments, reviewed by Pease and Tirnauer (2011) [231]. The consequences of mitotic spindle assembly disruption were also demonstrated when treating cancer cells with alternating electric fields-Tumor Treating Fields, which resulted in aneuploidy, cellular multinucleation and apoptosis of daughter cells [232]. The implications of mitotic spindle disruption in oncogenesis are not clearly understood, whether they were cause, consequence or coexistence with cancer formation. However, it is noted that mitotic spindle disruption was associated with the mutations of genes with tumor suppressing function such as APC, VHL, E-cadherin, PTEN or oncogenes e.g. PI3K and VEGF, reviewed by Pease and Ternauer (2011) and Noatynska et al. (2012) [233].

Disruption of the spindle assembly complex was found to be involved in canine histiocytic malignancies due to copy number loss on chromosome 16 (83% histiocytic malignancy cases) and 31 (70% of histiocytic malignancy cases) [234]. The study also assessed copy number losses on chromosome 16 and 31 in HSA cases and found 50% and 16% of HSA cases had copy number loss on chromosome 16 and 31, respectively. While performing GSEA sample-wisely, the results showed two in five samples had alteration in genes involved in PI3K signaling pathway which is one of the frequently altered pathways in cancers [235]. By performing data analysis in an individual sample, Mutation detection is a step toward applying personalized healthcare to dogs when combining with targeted therapies. Further studies are worth conducting. It is also beneficial to have species-specific molecular databases to enhance the preciseness of the patient management.

Heterogeneity of the genes and molecular hallmark pathways identified is possibly due to the mixture of normal cells and HSA cells which infiltrated splenic parenchyma. Whole exome sequencing was performed using DNA extracted from fresh tumour tissue samples. Some molecular signature pathways may be physiological processes of normal cell that responded to

160

tumour invasion. Pure HSA cells can be extracted by cell culture or micro dissection from tissue section and this may be an approach for future studies.

Genes in adipogenesis and hypoxia pathways were reported to be differentially expressed in canine HSA cells [86, 87, 96]. Two samples (s4122 and sMAR) in this study also had somatic mutated genes in those pathways. In chapter 3 of this study it was shown that genes associated with adipogenesis, complement and interferon gamma response pathways had single nucleotide polymorphisms (SNPs) differences between HSA dog population and non-disease dog population. Those SNPs were suspected to be inherited from germline mutations in the past. While the chronology of gene expression regulatory processes remains unclear, Gene Set Enrichment Analysis results suggested the involvement of adipogenesis and hypoxia molecular mechanisms in canine HSA should be investigated further.

Based on the 'two-hit' model of cancer, the first 'hit' or mutation of the predisposing genes occurs in germline for the hereditary cancers, and the next hit occurs in somatic, while in the case of non-hereditary cancers both hits occur in somatic cells [228]. Some canine HSA predisposing genes highlighted in the previous chapter were found to undergo somatic mutations. The exploratory study of both germline and somatic mutations helped narrowing down the set of genes that underwent somatic mutations alone. Several hot spots of gene mutations included genes that maintain DNA stability, such as X-Ray Repair Cross Complementing 1 (XRCC1) that codes for DNA single-strand break repair protein [171] and genes involved in endothelial cell functions like Ras and Rab interactor 2 (RIN2), a mediator for endothelial cell adhesion and morphogenesis [236]. RIN2 was one of the somatic mutated gene identified in the previous study, however, it was not defined as a strong driver mutation candidate [84].

Another gene set enrichment analysis of possible hot spots of gene mutations in this cohort showed that they were members of the genes that were up regulated in response to doxorubicin treatment of breast cancer (human data). The possibility of those genes to become biomarkers have been investigated in various types of human cancers, including Radical S‐adenosyl methionine domain containing 2 (RSAD2) and Estrogen related receptor alpha (ESRRA) as prognostic markers for breast and endometrial cancers [237-239] and proteasome 26S subunit, non-ATPase 4 (PSMD4) as drug sensitivity selective marker for breast cancer [240]. Moreover, many of the genes in this set had point mutations in 3’-UTR (3’-untranslated regions) where the alterations in 3'-UTR tend to promote tumorigenesis in

161

various types of human cancers through several mechanisms that resulted in increased proto- oncogene expression or repressed tumor suppressor genes [241-243]. Taken together the results from germline and somatic mutations in canine has suggest the possibility that there were predisposing gene sets (first hit) and another gene set that underwent somatic mutations (second hit). The hits are not confined to the coding regions of the genes, but the mutations on the non-coding regions nearby e.g. intron, 3'UTR or 5'UTR may also have significant effects.

Commonly mutated genes that were identified in more than 60% of the samples occurred in non-coding regions of the genes such as ATXN2L, CYP1A2, KRT3, SBNO2 and TUFM. These genes were found to be mutated among Golden Retriever samples, but not in Bull Mastiff. Gene ARHGAP45 is found to be mutated in both Golden Retriever and Bull Mastiff samples. A missense mutation was found in one Golden Retriever sample while mutations on the intron part of the gene were found in other two samples. Even though the mutations were not on the exome, the mutation sites on the intron were located within 60 kb from the missense mutation. The role of protein ArhGAP45 (Rho GTPase activating protein 45) or HMHA1 (minor histocompatibility antigen 1) was recently investigated in various types of human endothelial cell culture [244]. The study demonstrated that ArhGAP45 is a negative regulator of the endothelial cell integrity promoter. Silencing the gene resulted in increasing endothelial cell migration rate and tubular sprouting [244]. The endothelial cells lining the tubules also adapted themselves to perform better in resisting the shear-induced force. The function of this gene has not been emphasized in canine. According to most of the samples (60%) were found to have mutations on nearby spots of this gene, and it is therefore recommended that the potency of this gene in being one of canine HSA key mutations should be assessed.

The mutational patterns were uniform across all samples. HSA samples demonstrated the closest similarity to somatic mutational pattern SBS-5, which is commonly found in most cancer types, but the etiology was unknown [226]. Next mutational pattern that HSA demonstrated was SBS-1, which is related to ageing. This finding agreed with a previous study where mutational signature of ageing was consistently present across all samples [82]. In addition to common mutational signatures which were observed in most cancers, the mutation pattern of canine HSA samples partially resembled the mutational signature observed in a small portion of human liver cancers (less than 20%). SBS-25 was found in a Hodgkin lymphoma cell line. The etiology of both mutational signatures in human remains

162

unknown. SBS-6, -20 and -26 which were often observed together and were found in many types of human cancers (e.g. breast, cervical and colorectal cancers etc.) were the results from either or both defective DNA mismatch repairs and/or microsatellite instability. The findings demonstrated that even though human and dog cancer molecular profiles might be comparative, it would be worthwhile gathering the data and establish catalogues of mutations in cancer for non-human species for more preciseness in future study.

With regards to CNVs detection in WES data, Megquier et al, (2019) [82] described possible artefacts in identifying CNVs from WES data when using FFPE samples. Higher amount of CNVs, including amplifications and deletions were detected in the WES data obtained from FFPE when compared to fresh frozen samples. As tumor samples in this study were all fresh frozen, I decided to expand my exploration to include copy number variations. However, I have experienced high levels of noise. Others have pointed out that the algorithms for CNVs identification in WES data showed variable results and the optimizations are still in progress [245, 246]. The existing data set may be re-analyzed and validated when the approach has been further developed.

Other factors that might contribute to the noise and biases for copy number variation identification in this study were DNA quality of some sample and the proportion of normal cells to cancer cells in the tumor tissues from which the DNAs were extracted from. The results showed borderline DNA quality obtained from sample s1810's. The number of DNA segments identified for copy number variations analysis in this sample was the highest when compared to other samples, so it is possible that this sample's DNA was fragmented and produced artefacts. The algorithm chosen to detect copy number variations was based on the comparison of sequence depth or number of reads between normal and tumor samples [179, 245, 247]. As I performed DNA extraction from fresh frozen tumor tissues and I did not have the corresponding histological samples to assess the morphology, there is a chance that the samples shared different proportions between normal and cancer cells. This could affect the results when comparing between numbers of sequence reads at each position but should not affect somatic mutation identification results because the numbers of sequence reads were not used for comparison, but for quality control.

Even though the identification of copy number alterations was not focused, I have manually detected possible copy number aberrations of AHNAK (AHNAK nucleoprotein) in both normal and tumour genes of all five samples by IGV inspection. The depth of coverage at

163

some segments were as high as 400 reads compared to other candidate genes where the sequence depths were between 60 to 120 reads. AHNAK is involved in a wide range of cellular processes, and although mechanistically not fully elucidated, the involvement of this protein in cancers was found to be diverse. High level of laryngeal carcinoma tissue expression of AHNAK and inflammatory markers was associated with poor survival time, while over-expression of AHNAK in triple negative breast cancer was suggested to have tumour suppression effects on cell line and mouse xenografts [248-250]. A recent study found AHNAK to be over-expressed in doxorubicin resistant breast cancer cell line when treated with anti-cancer drug, doxorubicin, suggesting the possibility of AHNAK as predictive biomarker for anti-cancer treatment response or as manipulative target in cancer treatment [251]. Since HSA dogs in this cohort tended to have AHNAK copy number aberrations, comparison of AHNAK copy numbers between HSA and non-HSA dogs may be considered to seek for an evidence of association between AHNAK copy number variations and canine HSA. Further study of this gene in canine HSA is recommended.

Only one out of five samples (s1422) had mutations in many of the well represented genes in the 'HSA panel' such as TP53, PIK3CA, USP21 and COL1A2. This is a lower proportion than those identified in larger cohorts, i.e. TP53 mutations was found in more than half of the samples that consisted of 47 Golden Retrievers and 50 various dog breeds [82, 85] and 35% of 20 samples [84]. The fewer cases with TP53 mutations identified in this cohort may reflect the small sample size. The number of identified somatic mutations in this sample set was the highest among all (Appendix 10). The heterogeneity of the mutated genes among samples from the same breed (Golden Retriever) may be due to different stages of cancer development, intra tumor molecular subtypes or an individual’s unique genomic profile. Histopathological stage and growth pattern in one sample (sMAR) were moderate and dominant solid type, respectively. The histopathological features of other samples were not assessed because of limited sample quantity, precluding assessment of the relations between histopathological features and the exome mutational profile.

In conclusion, this exploratory study of whole exome sequence of canine HSA focusing on single nucleotide mutations has: (i) highlighted potential key genes in HSA development, (ii) supported the concept of hereditary HSA predisposing factors by demonstrating the different sets of mutated genes in normal somatic cell and HSA cancer cells and (iii) found that mutations in non-coding gene regions such as 3'UTR or introns may have significant effect if

164

occurring in significant genes. There are other genomic events that can be further explored such as small insertion and deletions (indels) and copy number variations, or re-filtering the point mutation candidates with other tools/criteria. As cancer is a disease that results from multifactorial causes including genomic events, the study has contributed by adding up to parts of that. As this study included limited sample size, further validation in a larger cohort is needed.

165

Chapter 5 Using Fourier transformed infrared (FT-IR) imaging to distinguish splenic lesions in canine formalin-fixed, paraffin- embedded (FFPE) tissues.

5.1 Introduction

Visceral haemangiosarcoma (HSA) is a vascular endothelial cell cancer that arises in internal organs, especially in the spleen, heart and liver. Presenting signs for HSA are non-specific, depending on the site where the tumour occurs. Signs of blood loss from a ruptured tumour range from mild weakness to severe hypovolemic shock or sudden death [22, 23]. Definitive diagnosis for canine HSA relies on microscopic examination of the tumour sample. HSA carries a very poor prognosis, hence, a dog with HSA related symptoms such as hemoabdomen are often euthanized without full investigation.

However, microscopic examination of HSA remains challenging. While endothelial cell markers, CD31 and vWf are used to distinguishing HSA from other soft tissue sarcomas, they are not specific to HSA cells as normal endothelial cells also express those glycoproteins [24, 25]. Without proven HSA specific tissue markers which can clearly highlight HSA from non- malignant vascular lesions, the differentiation between HSA and HSA-like lesions such as hematoma had to depend mainly on microscopic examination of H&E stained slides, while the interpretation was often ambiguous. Hematomas were usually found concurrently with splenic lesions including HSA. Inadequate numbers of samples collected from splenic lesions could cause the HSA diagnosis to be missed [252, 253].

In cancer cell or tissue, biochemical changes may proceed while morphologic changes remain undetectable under microscope. Fourier Transform Infrared (FTIR) spectro-microscopy or FTIR imaging technique has been applied for characterizing biomolecular compositions in biological materials for few decades [108]. Different types of brain tumours and normal brain tissues could be characterized based on collagen contents [254]. FTIR imaging could also differentiate molecular damaged oral mucosa tissues which had been exposed to radiation in the therapy for head and neck cancers from normal tissues [255]. The study noted that oral mucosa tissues from cancer patients who had not been treated with radiation therapy also showed different IR spectral features from normal tissue whereas H&E stained microscopic images were similar.

166

The proposed applications of FTIR imaging in cancer research included, but not limited to assisting pathologist in cancer differentiation and cancer grading [108, 256, 257]. The technique also had the potential to be a rapid, stain-free micro-imaging technique that can detect cancer cells in sentinel lymph nodes or demarcate cancer margin during surgical procedures in real time fashion [258-261].

The use of FTIR imaging as an auxiliary cancer diagnostic tool in dogs was not extensively investigated as much as in human cancer research [262]. Kochan et al, (2015) [146] reported the capability of FTIR imaging coupled with multivariate data analysis in distinguishing hepatic histiocytoma tissue from normal hepatic parenchyma where the choice of data acquisition modes/substrates used had small effects on the results. FTIR imaging study of dog HSA tissue was still limited, therefore the optimisation of sample preparation techniques and FTIR spectral data acquisition setting were needed to obtain the most informative raw data for the subsequent data analysis.

I hypothesized that the FTIR spectral features of HSA cells differ from those of their normal cell counterpart, the vascular endothelium. The aim of this chapter was to explore the capability of the FTIR imaging in association with multivariate analysis and cancer prediction model to distinguish HSA from other non-HSA splenic lesions in formalin-fixed paraffin- embedded (FFPE) tissue sections. The project had four experimental objectives: (1) to find the optimal substrates and mode of data acquisition (CaF2, transmission mode vs. Mirr-IR, reflection mode) for dog spleen tissue, (2) to find the optimal sample preparation technique (remove paraffin vs. retain paraffin) FTIR data acquisition technique (Transmission vs. Reflection) mode, (3) to identify spectral differences between HSA cells and normal endothelial cells and, as mentioned in the aim, (4) to assess the capability of FTIR imaging for dog HSA diagnosis.

167

5.2 Materials and methods

5.2.1 Samples

Formalin-fixed, paraffin-embedded (FFPE) dog spleen tissues, which had previously been examined and diagnosed by veterinary pathologists, with various types of lesions including HSA as well as other neoplastic and nonneoplastic lesions (i.e., "Non-HSA") were used for the studies. Forty-nine samples were collected from the University of Queensland Veterinary Medical Centre under the Animal Ethics Approval (AEC) number SVS/438/15/KIBBLE/CRF and with the animal owner's consent, or obtained from Section of Anatomic Pathology, Department of Biomedical Sciences, College of Veterinary Medicine, Cornell University, courtesy of Dr. Eunju Choi.

The samples were randomly assigned into three sample data sets, ‘CAL’, ‘VAL’ and ‘BLIND’. ‘CAL’ and ‘VAL’ data sets consisted of twenty-four samples (HSA, n = 15; non- HSA, n = 9). This data set was used for supervised classification and model optimization as calibration and cross-validation samples. The ‘BLIND’ data set consisted of twenty-five samples (HSA, n = 13; non-HSA, n = 12). This data set was used as an external data set for testing the cancer predicting ability of the FTIR imaging coupled with a selected cancer prediction model. Figure 5- 1 demonstrates the purposes of the assigned data set. “HSA” were tissue sections with HSA lesions whereas “Non-HSA” were tissues derived from normal spleen, haemorrhagic spleen, hematoma, nodular hyperplasia, hemangioma, lymphoma and histiocytoma. Tissue samples derived from other organs besides spleen (brain, kidney and heart) were included in the testing set to challenge the capability of FTIR imaging plus machine learning algorithms when dealing with untrained data set. Patient profile can be viewed in Appendix 11.

Figure 5- 1 Forty-nine FFPE tissue samples were assigned to data set groups. Supervised FTIR images classification and model selection were conducted using 'CAL' (training/calibration) and 'VAL' (test/validation) data sets. The model that could predict the HSA with the highest Correct Classification Rate (CCR) was chosen and applied to an external independent data set, 'BLIND'. Then the accuracy of cancer prediction was 168

calculated.

5.2.2 Sample preparation

The first two aims for the study were to select the optimal substrate and data acquisition mode for the experiment. There were two options for sample substrates and mode of data acquisition to be assessed, first Mirr-IR substrate (MirrIR Low-e microscope slides for reflective infrared studies, Kevley Technologies, Ohio, USA) for transflecion mode and

Calcium Fluoride substrate (76 mm x 26 mm x 1 mm UV Polished Calcium Fluoride (CaF2) optical window, CAFP76-26-1U, Crystran Ltd., UK) for transmission mode of FTIR data acquisition. The next selection was sample preparation technique, i.e., whether to remove paraffin from tissues before acquiring FTIR spectral data or to retain the paraffin. The decision was made after a trial experiment, attempt to achieve less standing wave artefacts, scattering effects and as recommended by spectroscopists [108].

After sample preparation technique had been determined, the FFPE tissue blocks were cut into

5 µm thick sections and placed on 76 mm x 26 mm x 1 mm UV Polished CaF 2 optical window (CAFP76-26-1U, Crystran Ltd., UK). A consecutive tissue section was placed on a conventional microscope glass slide (75 mm x 25 mm x 1 mm) for routine hematoxylin and eosin (H&E) staining. The tissue sections on the slides were heat-fixed at 60o C for 10 minutes. The tissue sections on microscope glass slides were stained with H&E using an autostainer (Multistainer-Automated Slide Stainer; Leica ST5020, Biosystem, USA) and cover slipped with xylene-based mounting medium. The CaF2 slides with tissue sections were placed in a microscopic slide box with 2 packets of 10 g silica gel overnight before FTIR spectral data acquisition.

5.2.3 Scan area selection for FTIR imaging

The size of tissue samples varied from 5 mm x 10 mm to 20 mm x 30 mm which were considered too large for the whole area to be scanned by the FTIR microscope. Areas of interests, from 704 µm x 704 µm to 2,112 µm x 1,408 µm, were designated for each tissue sample by marking on the microscopic photos of the H&E slides, scanned using a digital slide scanner (Olympus slide scanner VS120, Olympus®). Photos of H&E slides and an example of chosen areas of interests are shown in Figure 5- 2 A and B.

169

On the microscopic image of H&E stained tissue, normal dog spleen is a capsular organ containing blood vessels forming the circulatory structure through clusters of white blood cells, called the white pulp, which are surrounded by a network of red blood cells filled venules called venous sinuses. The red blood cells area is called the red pulp. Combined the white and red pulp is called splenic parenchyma. Detailed microscopic images of normal dog spleen are shown in Appendix 12. HSA tissue consists of HSA cells and extracellular matrix, which often appears as hyalinized substance adjacent to the neoplastic endothelial cells. HSA cells that produce this extracellular matrix are considered the classical HSA type.

In the 'CAL' and 'VAL' data sets, the clear images of HSA tissues were targeted for FTIR spectral data acquisition for the HSA group, where normal splenic parenchyma was targeted for the non-HSA group. Then the selected areas on the H&E stained tissues were matched with the visible images taken from light mode of the FTIR microscope to facilitate annotation of tissue components.

A

B

Figure 5- 2 Images of scanned microscopic slides, A. where the consecutive tissue sections were to be scanned by FTIR spectro-microscope at the designated areas. B. Sample microscopic slide photo where three FTIR spectro- microscope scanning areas (within the squares) were selected.

170

5.2.4 FTIR image spectral data acquisition

FTIR image spectral data were acquired at 19o C, with nitrogen gas continuously circulated through the spectrometer and sample holding stage during the entire acquisition session. Paraffin wax was not removed from the tissue section samples to avoid sample loss during the xylene washing procedures. FTIR image spectral data were collected using FTIR microscope imaging system (Agilent Cary 620, Agilent Technologies), which was equipped with x15 magnification objective microscope and liquid nitrogen cooled 128 x 128 pixels Mercury Cadmium Telluride (MCT) Focal Plane Array (FPA) detector. A 704 µm x 704 µm single tile image which contains 128 of 5.5 µm pixel size is generated. Multiple tiles can be stitched together when acquiring data from larger area, the mosaic image can then be created using Resolution Pro® software (ver. 5.3.0), which was the computer program used for collecting FTIR image spectral data. The data acquisition methods were set to transmission mode to obtain the data between 3950 – 900 cm-1 IR spectral range at 8 cm-1 spectral resolution.

For tissue sections in the 'CAL' and 'VAL' sets, the samples were scanned co-averaging 128 scans, while the background image was obtained with 512 FPA scans. The selected areas of interest ranged from a single tile (704 µm x 704 µm) to six (3 x 2 tiles) mosaic images. For the 'BLIND' set, lower number of scans (64 times for sample and 128 times for the background), single tile images were taken, allowing for shorter data acquisition time and smaller size of raw FTIR image spectral data.

5.2.5 Creation, optimization, model selection and blind testing of the selected model

The whole process intended to i) find the best parameters for the multi classification of the hyperspectral images considering: “paraffin”, “cancer cell and cancer matrix”, “connective tissue and normal endothelium” and “splenic parenchyma” classes (red blood cells and white blood cells classes were not considered), ii) the estimation of the generalization of the model pixel-wise for all the classes using an independent test set and, iii) the blind testing of the model for the detection of cancer. Two major processes from creating the most suitable model ('Calibration') to the testing of the external blind data set ('Testing') were performed for the data analysis. The workflow is demonstrated in Figure 5- 3.

171

Figure 5- 3 Workflow for data analysis. The calibration step to select the most suitable model for predicting 'Cancer' (HSA) in ‘BLIND’ data set was performed by manually labelling the components in the FTIR images in the ‘CAL’ and ‘VAL’ sets to classes where the ground true pixels were created. Then the data analyst randomly split the data, by pixels into two groups. The models which could be used to discriminate each pixel class from the other were created and then tested to find the most effective model. The selected model was used to predict the cancer (HSA) FTIR images in the 'BLIND' data set.

172

First, secondary data image preparation was conducted in only the ‘CAL’ and ‘VAL’ data sets in the calibration step, using an in-house program called GMarker (Dr. David Perez-Guaita, Centre for Biospectroscopy, School of Chemistry, Monash University). GMarker is a graphical user interface (GUI) developed using the APP designer in Matlab to manually assign the spatial characteristics to the different tissue types. Six classes of tissue types were assigned to the FTIR images: (1) paraffin, (2) cancer cell and cancer matrix, (3) connective tissue and normal endothelium, (4) red blood cells, (5) white blood cells, and (6) splenic parenchyma. The class-labelled FTIR images became 'ground true' images for cross- validation. Figure 5- 4, A-E show images acquired from the FTIR microscope, both visible and hyperspectral image with an H&E stained tissue image from the consecutive tissue section. Figure 5- 4, F shows secondary data image or ‘ground true’ image labelled using GMarker.

Figure 5- 4 Sample of images acquired at each stage of the study. A, B. The microscopic images of H&E stained dog spleen tissue, splenic artery and part of splenic parenchyma (star), scale bar = 1 mm (A) and 200 µm (B). C, D. Visible images, taken from light microscope of the FTIR microscope, from adjacent tissue sections of the same block, E. Hyperspectral image or FTIR image, obtained from the spectral data acquired from the unstained, paraffin retained adjacent tissue section on CaF2 substrate (slide) and F. Secondary data image, 'ground true', generated by manually classify the type of tissue. Shown in the sample, this hyperspectral image contains splenic parenchyma (blue), connective tissue (purple), endothelium (fluorescence green), red blood cells (yellow) and paraffin (dark red).

173

At model creation step (Figure 5- 3, Creation of the model), two classification methods, Random Forest (RF) and Partial Least Squares-Discriminant analysis (PLS-DA) were tested. FTIR images were imported from the Agilent software to Matlab using a function obtained from the chitoolbox (https://bitbucket.org/AlexHenderson/chitoolbox/src/ master/). PLS-DA and pre-processing were performed using the functions available at the PLS-toolbox from Eigenvector (Washington, USA). The optimization was carried through the creation of several models combining pre-processing and spectral regions as described in [263, 264].

Once the optimum model was selected, the generalization error was established by predicting the "VAL" dataset of images and compared with the reference data for each pixel. correct classification rate (CCR) according to the test, as well as specificity and sensitivity for each class was computed. According to low number of samples, the data analysis was performed pixel-wise rather than sample-wise.

Unlike the 'CAL' and 'VAL' datasets in the calibration step, the 'BLIND' dataset of the testing step which was used for testing the predicting accuracy of the FTIR imaging coupled with the selected classification model. The dataset contained 58 hyperspectral images. The images in this data set were not labelled using GMarker. The only reference information of this sample set was the presence of cancer or not, without any spatial information regarding the location of the cancerous tissue. Hence, for each image the reference data was either a "1" = HSA or "0" = Non-HSA (one reference value per image). This information had not been communicated to the data analyst until the 'best’ model for prediction method was selected. Finally, the selected model was used for predicting the FTIR images in the 'BLIND' set. Pseudo colour images representing classified tissue components or predicted images were generated. The presentation of “cancer and cancer matrix” class in the predicted images were then classified as “HSA”, where images without the presentation of "cancer and cancer matrix" class were classified as "Non-HSA". The sensitivity and specificity of the test were then calculated using the pathologist's diagnosis as reference.

174

5.3 Results

5.3.1 Sample preparation methods

There are several options for sample preparation and modes of data acquisition when acquiring spectral data from formalin-fixed paraffinized (FFPE) tissue sections for FTIR imaging (Baker et al., 2014). One of the major differences between two modes of data acquisitions; transmission and transflection, was the price of substrates where the tissue sections were placed on. To acquire data using transmission mode, the more expensive, transparent, low IR reflecting Calcium Fluoride (CaF2) substrate was used. The substrate of the same size as microscopic slide costs up to hundreds Australian dollar by the time the study was conduct. For the transflection mode of FTIR spectral data acquisition, low emission, IR reflective (MirrIR, Low-e) substrate slide was used.

To address the first two study objectives, I compared Calcium Fluoride vs. Mirr-IRsubstrates, with and without paraffin removal prior to FTIR scanning. I also explored the optimum tissue section thickness for dog spleen FFPE from 4 – 7 µm.

Three tissue blocks were prepared at different sample preparation techniques. When removing the paraffin with xylene washes, spleen tissue sections fell off the Mirr-IR substrates when dried or became visibly shrunken as shown in Figure 5- 5, A1 and A2. Due to the technical problem, transmission mode of FTIR data acquisition with retention of paraffin was the most practical for spleen tissue and 5 µm tissue section thickness was also recommended. Data acquisition technique (transmission mode using Calcium Fluoride substrate, 5 µm sample thickness, retaining paraffin wax) was determined and performed for the whole sample set.

175

A1 A2

B

Figure 5- 5 Options for FTIR image data acquisition methods and images acquired from the experimental trials. A1, A2. Samples of visible images (black and white) and hyperspectral or FTIR images (coloured) taken from different sample preparation methods and mode of FTIR image data acquisition. Tissues on CaF2 substrate after paraffin removal by xylene following the recommended protocol (Baker et al., 2014) appeared dried and detached from the substrate. B. Samples of visible images and hyperspectral images obtained from the same sample tissue block of different thicknesses. Transmission mode for FTIR data acquisition on CaF 2 substrate was performed. Note: FTIR spectral data collected from a 4 µm-thick tissue sample are not shown here

176

5.3.2 HSA prediction model selection

FTIR spectral data acquired in the calibration step were from 24 dogs (HSA, n = 15; non- HSA, n = 9), 35 images were obtained. Five images with significant different spectra from the group when performing principal component analysis (PCA) in preliminary data assessment were excluded because instrumental error was suspected.

The remaining images (n = 30) were randomly divided into 'CAL' and 'VAL' sets where the 'ground true' classification (splenic parenchyma, White blood cells, Red blood cells, connective tissue plus endothelium, cancer cell plus cancer matrix and paraffin) were assigned to the images. The proportion of the data in pixels are shown in Table 5- 1.

Table 5- 1 Number (in pixels) and proportion (percentage) of classified 'ground true' FTIR images in 'CAL' and 'VAL' datasets in the calibration/model selection step. (WBCs = White blood cells, RBCs = Red blood cells. Classified FTIR CAL (21 images) VAL (9 images) images

Number Proportion Number Proportion (pixels) (%) (pixels) (%) 1 Splenic parenchyma 410,300 40.59 223,322 44.51 2 WBCs 33,077 3.27 26,098 5.20 3 RBCs 59,212 5.86 49,001 9.77 4 Connective tissue and 121,373 12.01 113,181 22.56 endothelium 5 Cancer cell and matrix 384,890 38.08 79,554 15.85 6 Paraffin 1,973 0.20 10,609 2.11

The most suitable predicting model was chosen from the algorithms where better correct classification rate (CCR) was obtained. Models created using Random Forest (RF) and Partial Least Square Discriminant Analysis (PLS-DA) were tested pixel-wise (the sensitivity and specificity were calculated from the predicting results pixel-by-pixel). Six models were created using RF algorithm from 2 IR spectral regions with 4 pre-processing methods while 441 models were created using PLS-DA from 6 IR spectral regions with 7 pre-processing methods. The pre-processing methods were shown in Table 5- 2. The sensitivity, specificity

177

and CCR obtained from two algorithms after independent testing within the calibration dataset was shown in Table 5- 3.

Table 5- 2 List of seven pre-processing methods used for optimization of the prediction model. SNV = Standard Normal Variate, MC = Mean Centring Methods SNV MC Derivatives Smoothing points 1 SNV MC 2 SNV MC 1st 7 3 MC 1st 9 4 SNV MC 1st 15 5 SNV MC 2nd 7 6 MC 2nd 11 7 SNV MC 2nd 15

Table 5- 3 Classified tissue components (ground true) and the sensitivity, specificity and correct classification rate (CCR) obtained by two classification algorithms, Random Forest (RF) and Partial Least Square Discriminant Analysis (PLS-DA). (SEN = Sensitivity, SPC = Specificity). Class Pixels RF PLS-DA SEN SPC SEN SPC Splenic parenchyma 223,322 0.47 0.86 0.66 0.91 WBCs 26,098 0.50 0.92 0.09 0.93 RBCs 49,001 0.88 0.93 0.60 0.96 Connective tissue and 113,181 0.83 0.88 0.77 0.88 endothelium Cancer cell and matrix 79,554 0.24 0.89 0.74 0.91 Paraffin 1,069 0.02 0.95 0.82 0.98 Correct classification rate 55.2 % 67 %

178

For the prediction of red blood cells, white blood cells and connective tissue/vascular endothelium classes, higher sensitivity was obtained by Random Forest method. Whereas PLS-DA provided higher sensitivity for cancer cell and splenic parenchyma prediction plus higher specificity for all classes (Table 5- 3). In general, PLS-DA provided better correct classification rate (CCR) than RF method, 67% and 55.2%, respectively. So, the model created by PLS-DA was chosen to predict the HSA in ‘BLIND’ dataset at the testing step (Figure 5- 3).

For the models created using PLS-DA, six IR spectral regions comprising 1601-1770, 1490- 1600, 1429-1489, 1296-1428, 1139-1295, 900-1138 were distributed in all the possible combinations (including the absence of the region) to provide 63 different spectral regions. They were then combined with 7 pre-processing methods namely, Standard Normal Variate (SNV) + mean centring (MC), first derivative (7 Smoothing points) + SNV + MC, first derivative (9 Smoothing points) + MC, first derivative (15 Smoothing points) + SNV + MC, second derivative (7 Smoothing points) + SNV + MC, second derivative (11 Smoothing points) + MC and second derivative (15 Smoothing points) + SNV + MC (Table 5- 2). Using the 441 combinations of pre-processing and spectral regions, models were then created using the “CAL” dataset and validated using venetian blinds cross validation. For each model, the number of latent variables (LVs) was selected, considering the highest correct classification rate according to the cross validation (CV-CCR). The 441 combinations were then sorted according to the CV-CCR and a model was selected considering the CV-CCR and the sensitivity to each class according to the CV. The possible combinations of the regions are shown in Table 5- 4.

179

Table 5- 4 Possible combinations of the regions and definition of the spectral assignments. Wave numbers and definition of spectral assignments were extracted from [265]. Band position Wave numbers Definition of the spectral assignments ranges (cm-1) included (cm-1) 1601-1770 1740 C=O stretching vibrations of triglycerides, cholesterol esters. 1639 Amide I: C=O stretching vibrations of proteins. 1490-1600 1545 Amide II: N-H bending and C-N stretching vibrations of proteins.

1429-1489 1453 CH2 bending vibrations of lipids. 1296-1428 1402 COO- symmetric stretching: fatty acid side chains. 1310 Peptide side chain vibrations. 1139-1295 1234 PO-2 symmetric stretching: fully hydrogen bonded: mainly nucleic acids with the little contribution from phospholipids. 1152 CO-O-C antisymmetric stretching vibrations of glycogen and nucleic acid ribose. 900-1138 1080 PO-2 symmetric stretching: nucleic acids and phospholipids; C-O stretching: glycogen, polysaccharide, glycolipids. 1045 CO stretching vibrations of carbohydrates, glycogen; deoxyribose/ribose of nucleic acids. 1025 Mainly from glycogen. 925 Sugar vibrations in backbone of DNA-Z form.

Sensitivity and specificity were calculated from the confusion matrix, pixel-wise and sample- wise (Figure 5- 6, A and B). For pixel-wise, the confusion matrix between the ground true classes and the predicted results were created. While in sample-wise, 2 x2 confusion table was created because the results would be either “HSA” or “non-HSA”. The testing of the model’s sensitivity and specificity on the ‘BLIND’ dataset in the next step, were performed sample-wise.

180

A

B

Figure 5- 6 Pixel-wise and sample-wise cancer prediction methods. A. In pixel-wise fashion, all image pixels were pooled and analysed regardless of the spatial data. Diagonal matrix shows numbers of correctly predicted pixels. B. Sample-wise testing of PLS-DA cancer prediction model. FTIR images were predicted 'positive' by the presentation of cancer class pixels. (Figures provided by Dr. David Perez-Guaita, Centre for Biospectroscopy, School of Chemistry, Monash University.)

181

5.3.3 Cancer prediction from FTIR images

The FTIR imaging plus multivariate classification models were conducted to classify tissue types including HSA cells and its cancer matrix in the calibration step, using ‘CAL’ and ‘VAL’ dataset. In the testing step, the selected model (PLS-DA) was applied to predict HSA out of 58 FTIR images (HSA, n = 26; non-HSA, n = 32) obtained from 25 dogs from a different source.

The size of raw spectral data obtained from the 'Calibration' set ranged from 114 MB (single tile FTIR images) to 1.6 GB (3 x 2 tiles mosaic FTIR images). Estimated scanning time was 2 minutes for a single tile image and up to 20 minutes for six tiles mosaic images. The estimated scanning time for the samples in the 'Testing' set was one minute per sample and the size of a single tile image was also 114 MB.

Cancer prediction was performed on FTIR images using the prediction model. The model classified IR spectra to each class according to previously calibrated spectra. Predicted FTIR images where there are light blue pixels were interpreted as cancerous (Figure 5- 7). The confusion matrix compared the diagnosis made by pathologists and the predicted results are summarized in Table 5- 5.

Table 5- 5 Confusion matrix, or 2 x 2 contingency table of the FTIR imaging plus cancer prediction model outcome. The test was conducted using 26 pathologist diagnosed splenic HSA and 32 non-HSA splenic lesions hyperspectral FTIR images. The images were generated from the infrared spectra acquired from the FTIR microscope scanning of the formalin-fixed, paraffin-embedded (FFPE) tissue sections. HSA Non-HSA FTIR predicted HSA 24 (TP) 21 (FP) FTIR predicted Non-HSA 2 (FN) 11 (TN)

182

Predicted Classes

Paraffin

Cancer cell and matrix

Connective tissue and endothelium

Red blood cells

White blood cells

Splenic parenchyma

Figure 5- 7 Classified FTIR images by cancer prediction model. The model selected from previous calibration step was applied to classify tissue types of the unknown ‘BLIND’ dataset. Any images that showed “cancer and cancer matrix” (light blue) class on the images were interpreted as “HSA” (cancer). The remaining were interpreted as “non-HSA”. For an example, the top left corner image (‘unknown01’), circled, was interpreted as HSA, where the ‘unknown02’ image was interpreted as non-HSA. The white points in the predicted images were the pixels that did not pass quality control, where the spectra were significantly different from those of the spectra in the calibration data set (95% CI). (Figures provided by Dr. David Perez-Guaita, Centre for Biospectroscopy, School of Chemistry, Monash University.)

183

The machine correctly predicted 24 images out of 26 cancer images, which yielded 92% sensitivity. However, 21 non-cancer images were incorrectly predicted as cancer, wherefore the specificity of the test was 34%. The false positive FTIR images were haematoma (7 images), haemangioma (2 images), histiocytoma (2 images), and 10 images acquired from liver, heart, kidney and brain as described in Table 5- 6. The overall accuracy of the test was 60.34%.

Table 5- 6 Table summarizes number of FTIR images, and the model prediction results. Among 26 HSA FTIR images, 24 were correctly predicted as HSA (true positive), while two were predicted non-HSA (false positive). The remaining FTIR images were non-HSA. Haemangioma and Histiocytoma were predicted as HSA (false positive). Lymphoma and nodular hyperplasia were correctly predicted as non-HSA (true negative). Ten FTIR images of other tissues (brain, kidney, liver and heart) were included to assess the capability of the machine when dealing with untrained or unknown samples. These were all predicted as HSA, i.e. false positive.

184

5.4 Discussion

This chapter has explored the capability of FTIR imaging plus PLS-DA analysis in discriminating formalin-fixed paraffin-embedded spleen tissue sections containing HSA (cancer) tissue from spleen tissue without HSA (non-cancer or non-HSA). The data was derived from FTIR transmission measurement of the sample spectra on CaF 2 substrate without removal of the paraffin from the sample.

The step of paraffin removal was omitted in this study because many of the sample tissue sections dried and detached from the substrate, both Mirr-IR and CaF2 after paraffin removal by xylene immersion in the trial experiment. Because the spleen is a blood cell rich organ and many of the samples were hematoma lesions, where most of the tissue sections comprised mainly blood, the retention of the tissues on the substrate slide was very poor. Since an important differential for splenic HSA is splenic hematoma, it was considered more practical to retain paraffin for the sample preparation to prevent sample loss during paraffin removal step. Furthermore, omitting the step while conserving the informative data could help reduce time and the use of a hazardous reagent such as xylene.

Optimal FTIR data acquisition mode and substrates (between Mirr-IR and CaF2) was not chosen by the comparison between the results from those techniques, due to low sample number and time constraints. The transmission mode of FTIR data acquisition on CaF 2 substrate was chosen for this experiment because there was no FTIR imaging study in dog spleen tissue for references. However, a comparison of the FTIR spectral data derived from the transmission mode measurement of prostate cancer tissue could be discriminated in a tighter cluster in principle component analysis compared to the data derived from reflection mode [266].

One of the objectives for this study to compare biochemical components between the endothelial cell of normal spleen and the HSA cells was not achieved due to a low proportion of endothelial cell pixels in the FTIR images (2.43%, Appendix 13). Higher resolution FTIR images may help overcome this limitation.

The limitation of low number of FTIR image pixels in some classes e.g. endothelial cell (2.43%, Appendix 13) and cancer matrix (5.08%, Appendix 13) in the preliminary results was resolved by; 1) re-classifying the assigned pixels in the calibration FTIR images. The pixels that were assigned to 'endothelial cell' class were merged with 'connective tissue class' while 'cancer matrix' class was merged with 'cancer cell', shown in Figure 5- 8, and 2) subtraction of un-used

185

classes e.g. protein and empty pixels from the analysis. Therefore, FTIR images were classified to 6 classes as shown in Table 5- 1.

Figure 5- 8 Classified 'ground true' image before and after merging 'endothelial cell' and 'connective tissue' classes.

Even though one of the chapter experimental objectives; to identify spectral differences between HSA cells and normal endothelial cells was not achieved, the aim of the chapter which was to explore the capability of the FTIR imaging for dog HSA diagnosis was accomplished.

FTIR images derived from normal kidney, brain, liver and heart tissues were predicted as HSA (false positive). Some organs like kidney consist of multiple tubules with endothelial cells and extracellular matrix and may resembles HSA. Organ specific algorithm may therefore be needed. Within the same tissue type such as the spleen, FTIR images of hematoma were the most likely to be incorrectly predicted as cancerous.

The ambiguous features between HSA and hematoma were also known to be the issue in histopathologic examination of H&E stained microscopic image. Hematoma tissue also consists newly formed blood vessels and hyalinised extracellular matrix which morphologically resemble to those of the HSA tissue [267]. There were two possibilities to explain the overwhelming numbers of false positive hematoma FTIR images. First, the possibility of FTIR imaging is capable of detecting cancerous biochemical components in the hematoma tissue prior to morphological changes. Which means the hematoma tissues could have been sampled from HSA spleen but did not involve the cancerous lesions. A recent study has shown that the lower number of samples collected from splenic HSA lesions, the higher risk of misdiagnosis because it is common in splenic HSA that haemorrhage and hematoma tissues are sampled rather than neoplastic tissue [253]. The other possibility was about the specificity of FTIR imaging coupled with cancer prediction model, which was not enough to distinguish between ambiguous lesions in the spleen and this is likely to be the case. For future investigation, the study may be re-designed to emphasis the prediction of HSA from the following sample groups; 1) splenic HSA tissue, 2) hematoma from HSA affected spleen and 3) hematoma tissue from

186

non-HSA affected . If the method demonstrates the ability to predict hematoma from HSA affected spleen as cancerous, this may assist the pathologists when dealing with inadequate number of samples from splenic lesions.

The sensitivity and specificity of the test can be optimized by adjusting the detection cut-off limit. In this study, classified FTIR images were generated by the prediction model, however the interpretation was performed subjectively by visualization, with the judging criteria as described in the results. This task can be proceeded by additional algorithm which calculates number and proportion of cancer pixels to the whole images, so that the cut-off can be established objectively.

The success of FTIR imaging in cancer prediction relies on sample (FTIR image) acquisition and cancer prediction models equally. In addition to good quality FTIR image data, the accuracy of the test can be optimized according to choices of data analytic methods. As reported by Zhang et al, (2015) and Mao et al, (2016) where FTIR images obtained from fresh osteoarthritis (OA) articular cartilage tissues were compared with normal articular cartilage tissues. FTIR imaging coupled with PLS-DA model yielded 100% accuracy in the calibration step and 90.24% in testing step while PCA and Fisher discriminant analysis yielded 94% accuracy in calibration step and 86.67% in testing step [268, 269].

The heterogeneous structure of the spleen is a big challenge in supervised classification of the tissue components. The border pixels between different classes e.g. splenic parenchyma and cancer, connective tissue and cancer etc. could be incorrectly classified. One of the solutions to this is to delete pixels between the border and increase sample size to compensate.

The FTIR imaging coupled PLS-DA machine learning showed promising results on how this technology could discriminate HSA from non-HSA of the spleen which is one of the most challenging tissue organs to analyse. To make use of the findings and improve on what has not yet been achieved, I recommend: (1) try fresh frozen tissue. This can aid in quick and label-free diagnosis. Moreover, lipid information can also be assessed. Unlike FFPE where lipid was removed by tissue processing protocol. Recommended samples would be FNA samples. Because of high positive predictive value of this test methods, it is a good screening diagnostic test before more specific tests (histopathologic slide interpretation by pathologists, panels of IHCs) to be done, and (2) Higher resolution FTIR image data acquisition. With higher the resolution, the longer it will take for scanning, but in the future, higher resolution and faster

187

scanning time instruments and methods will be developed, (3) Optimization of data analysis algorithms and (4) increasing the sample size, especially the control group.

188

Chapter 6 Conclusions and future perspectives

The visceral form of canine HSA is difficult to detect. The clinical signs are non-specific, and the diagnosis of HSA carries a very poor prognosis with most dogs surviving less than a year. Immediate euthanasia is therefore commonly chosen by owners and veterinarians. The goal of the thesis is to help those dogs by exploring potential biomarkers that can improve HSA diagnosis and prevention.

I chose to explore biomarkers for canine visceral HSA in three major aspects: 1) glycoprotein biomarkers, 2) genetic markers and 3) infrared spectral markers. The aims, major findings, future directions and contributions towards improving HSA diagnosis are summarised as followings:

Aim-1 Validation of candidate glycoprotein biomarkers for canine visceral haemangiosarcoma, using semi-quantitative lectin/immunohistochemical assay

Major findings

1) The locations of candidate glycoprotein expression on HSA tissues included three tissue components: HSA cells, vascular endothelial cells of the artery and vein.

2) The level of tissue glycoprotein expression varied between different markers, but was consistent among tissue components (HSA cells, endothelium) and disease status (HSA vs non-HSA).

3) Among canine visceral HSA glycoprotein candidates tested, complement component 7 (C7) and DSA bound glycoproteins were considered the most promising tissue markers as they demonstrated the ability to distinguish HSA tissue from HSA-like tissues (e.g. hematoma), by the comparison of glycoprotein expression level.

Future direction

Complement component C7 is one of the promising canine visceral HSA biomarkers. This IHC experiment supported the results from LeMBA/MS. The future of C7 application as HSA biomarkers can be as a differential diagnostic and prognostic marker. In a larger sample size where there are dogs with splenic HSA, non-HSA splenic lesions and other types of cancer, serum C7 levels can be evaluated with regard to sensitivity and specificity as a differential

189

diagnostic marker, based on the hypothesis that dogs with HSA will have lower serum C7- levels compared to dogs with other splenic lesions. For being a prognostic marker, serum and tissue C7 level can be assessed in HSA dogs along with different clinical stages and survival time. So, further investigation of complement C7 in serum/tissue and HSA is recommended.

Contributions

I have narrowed down canine visceral HSA glycoprotein biomarker candidates which could facilitate further biomarkers development. Furthermore, I have demonstrated proteins/glycoproteins expression characteristics such as sites/cells involved in glycoprotein production, these could be determined and contribute to more understanding of this cancer.

Aim-2 Identifying genetic risk markers for canine visceral haemangiosarcoma

Major findings

1) Genetic polymorphisms in four genes; PLSCR3, FANCM, MRPL40 and ZNF704 were strongly related to canine visceral HSA where the variants allele frequencies were detected in over 80% of HSA cases, mostly homozygous. While the variants allele frequencies were around 50% in the control, none was homozygous.

2) Five molecular pathways in which genes with HSA-related variants resided include Interferon gamma, Complement, Adipogenesis, Genes down-regulated in CD4 T-cells and Genes down-regulated in marginal zone B-lymphocytes pathways.

Future direction

1) To apply HSA-risk genotypic testing for planned breeding or as a genotypic test used in practice, results validation in larger sample cohorts is still needed. As I have highlighted and narrowed down HSA related germline variant candidates, future studies can target specific genes instead of all the coding genes to save time and cost.

2) To confirm that HSA related germline variants were segregated among some dog breeds, as proposed in Figure 6- 1, breed-based genotypic analysis is worth investigating further; predisposing or risk markers should be validated and specified.

190

Contributions

I have identified candidate genetic risk markers for canine visceral HSA that may be developed as conventional genotypic test to be used by dog breeders, veterinarians or dog owners. By being able to identify the risk genotype of the dogs, dog breeders can avoid breeding risk carrier dogs together, to reduce the segregation of risk gene polymorphisms in the dog populations.

For the veterinarians and dog owners, it is better to know whether the dogs are at risk for developing HSA later in life or not, so that close monitoring for HSA can be performed during health checkup. Whole exome sequence data generated in this study will be submitted to sequence read archive database for future research studies.

Figure 6- 1 The hypothesized pathways of inherited HSA-related germline variants among overrepresented dog breeds. It started with germline mutations that could have increased the risk of HSA. The breeding of pedigree dogs increased the chance of passing HSA risk polymorphisms to the offspring, then

191

in-bred again. From selective mating the heterozygous carrier with another heterozygous carrier, the chance of having homozygous carriers increased and segregated among the pedigree dog breeds. The study has highlighted the potential HSA risk markers that could be used in genotypic testing to help identify dogs at high HSA risk. To further validate the results, genotypic testing in larger and independent cohort of HSA dogs is needed. (A dog silhouette image was modified from www.shutterstock.com). Aim-3 Identifying somatic mutations in canine visceral haemangiosarcoma

Major findings

1) Co-mutated (more than 60% of samples) somatic mutations were found in ADCY1, CNN2, ARHGAP45, ATXN2L, CYP1A2, KRT3, SBNO2, TUFM and U6 genes.

2) Genes that burdened high numbers of mutations included GLUL, RIDA, ESRRA, COL11A2, ZNF296, RSAD2, RIN2, XRCC1, POLRMT, MTFR2 and PSMD4.

3) The proportion of samples having mutations on gene TP53 and PIK3CA in this study cohort was slightly lower, when compared to previous studies (Table 6- 1)

4) Sample-wise Gene Set Enrichment Analysis yielded more informative results regarding the involvement of mutated genes in cancer molecular pathways, comparing to pooled samples. GSEA is one of the powerful tools to assess biomolecular processes in cancer patient to personalize the optimal healthcare management for the patient.

5) AHNAK copy number aberrations may have important roles in canine HSA and should be investigated further.

Future direction

Since somatic mutations in canine visceral HSA was found to be heterogeneous, cancer sub- typing by molecular profile (e.g. germline polymorphisms, gene mutations, gene expression and protein expression) may help in better understanding of disease mechanisms. This may then lead to successfully development of personalized biomarkers such as diagnostic, prognostic and treatment selective markers.

Contributions

I have identified genes that underwent somatic mutations in canine visceral HSA and have narrowed down the potential candidates that were related to canine HSA which can be developed as subtyping markers, therapeutic targets, therapeutic drug selective markers or

192

prognostic markers. Whole exome sequence data generated in this study was to be submitted to sequence read archive database for future research studies.

193

Table 6- 1 Genomic events including simple mutations (e.g. single nucleotide mutations, indels etc.) and copy number variations (gain/loss) in human sarcoma, angiosarcoma and canine haemangiosarcoma. Gene Human Human Canine HSA Sarcoma (%) AS (%) (%) TP53 39.24m 30m,cl 66m 60m 35m - 20m 3.46cg ------34.23cl ------PTEN 5.06m - 6m - 10m - - 4.29cg ------8.08cl ------PIK3CA 3.38m 21m 46m 30m 45m - 20m 5.0cg ------5.77cl ------PLCG1 1.69m 17m 4m - 5m - - 5.77cg ------2.31cl ------KDR 1.69m 26m,cg - - - 20cg - 10.0cg - - 22cg - - - 3.08cl ------MYC 0.84m 11m,cg - - - - - 9.62cg - - 9cg - - - 5.77cl ------KIT 2.53m 9m,g - - - - - 10.0cg - - 17cg - 20cg - 3.46cl ------Ref [81] [80] [85] [82] [84] [83] Chapter4 results

m = simple mutations e.g. single nucleotide mutations, indels, frameshift mutations etc. cg = copy number gain. cl = copy number loss.

194

Aim-4 Using Fourier transformed infrared (FT-IR) imaging to distinguish splenic lesions in canine formalin-fixed, paraffin-embedded (FFPE) tissues.

Major findings

1) FTIR imaging coupled with Partial Least Square Discriminant Analysis (PLS-DA) was able to distinguish HSA from HSA-like FFPE tissue sections with 92% sensitivity and 64% specificity. However, the sensitivity and specificity of the test can be adjusted according to the detection limit cut-off.

Future direction

To apply any tools for clinical use, cost, time and the performance are among the top considerations. For FTIR imaging, standard operating procedures regarding; 1) choice of samples and preparation techniques, 2) mode of data acquisition and 3) data analysis algorithms, are still needed to be optimized and adjusted.

Contributions

I have reported proof-of-concept evaluation of Fourier Transform Infrared (FTIR) imaging for canine HSA diagnosis.

Conclusions

In this PhD project, I have explored the biomarkers of canine visceral HSA in several aspects, using various marker detection methods, from conventional to novel. The findings of my research projects in canine visceral HSA provided the validation for previously discovered glycoprotein biomarkers, proposed genetic risk marker candidates for genotypic testing, and identified somatic mutations in the tumours. These results and data can be used for developing therapeutic targets and reports proof-of-concept evaluation of Fourier Transform Infrared (FTIR) imaging for canine HSA diagnosis.

195

References

1. Michell AR. Longevity of British breeds of dog and its relationships with sex, size, cardiovascular variables and disease. Vet Rec. 1999;145(22):625-9. 2. Adams VJ, Evans KM, Sampson J, Wood JL. Methods and mortality results of a health survey of purebred dogs in the UK. J Small Anim Pract. 2010;51(10):512-24. 3. Robinson WF, Robinson NA. Cardiovascular System. In: Grant MM, editor. Jubb, Kennedy & Palmer's Pathology of Domestic Animals: Volume 3. 6th ed: W.B. Saunders; 2016. p. 99-101. 4. Requena L, Kutzner H. Hemangioendothelioma. Semin Diagn Pathol. 2013;30(1):29-44. 5. Lombardini ED, Summers BA. Two canine malignant vascular tumours with features of human retiform haemangioendothelioma. J Comp Pathol. 2013;148(2–3):225-9. 6. Warren AL, Summers BA. Epithelioid variant of hemangioma and hemangiosarcoma in the dog, horse, and cow. Vet Pathol. 2007;44(1):15-24. 7. Pets in Australia: a national survey of pet and people [Internet]. Animal Medicine Australia (AMA). 2019 [cited 2020]. Available from: https://animalmedicinesaustralia.org.au/report/pets-in- australia-a-national-survey-of-pets-and-people/. 8. AVMA Pet Ownership and Demographics Sourcebook [Internet]. American Veterinary Medical Association. 2017-2018 Edition [cited 2020]. Available from: https://www.avma.org/sites/default/files/resources/AVMA-Pet-Demographics-Executive- Summary.pdf. 9. PAW Report [Internet]. 2019 [cited 2020]. Available from: https://yougov.co.uk/topics/resources/articles-reports/2019/10/31/pdsa-animal-wellbeing-paw- report. 10. Schultheiss PC. A retrospective study of visceral and nonvisceral hemangiosarcoma and hemangiomas in domestic animals. J Vet Diagn Invest. 2004;16(6):522-6. 11. Craig LE. Cause of death in dogs according to breed: a necropsy survey of five breeds. J Am Anim Hosp Assoc. 2001;37(5):438-43. 12. Fleming JM, Creevy KE, Promislow DE. Mortality in north american dogs from 1984 to 2004: an investigation into age-, size-, and breed-related causes of death. J Vet Intern Med. 2011;25(2):187-98. 13. Kent MS, Burton JH, Dank G, Bannasch DL, Rebhun RB. Association of cancer-related mortality, age and gonadectomy in golden retriever dogs at a veterinary academic center (1989- 2016). PLoS One. 2018;13(2):e0192578. 14. Cleveland MJ, Casale S. Incidence of malignancy and outcomes for dogs undergoing splenectomy for incidentally detected nonruptured splenic nodules or masses: 105 cases (2009– 2013). J Am Vet Med Assoc. 2016;248(11):1267-73. 15. Pintar J, Breitschwerdt EB, Hardie EM, Spaulding KA. Acute nontraumatic hemoabdomen in the dog: a retrospective analysis of 39 cases (1987-2001). J Am Anim Hosp Assoc. 2003;39(6):518- 22. 16. Hammond TN, Pesillo-Crosby SA. Prevalence of hemangiosarcoma in anemic dogs with a splenic mass and hemoperitoneum requiring a transfusion: 71 cases (2003–2005). J Am Vet Med Assoc. 2008;232(4):553-8.

196

17. Day MJ, Lucke VM, Pearson H. A review of pathological diagnoses made from 87 canine splenic biopsies. J Small Anim Pract. 1995;36(10):426-33. 18. Eberle N, von Babo V, Nolte I, Baumgartner W, Betz D. Splenic masses in dogs. Part 1: Epidemiologic, clinical characteristics as well as histopathologic diagnosis in 249 cases (2000- 2011). Tierarztl Prax Ausg K Kleintiere Heimtiere. 2012;40(4):250-60. 19. Movilla R, Altet L, Serrano L, Tabar MD, Roura X. Molecular detection of vector-borne pathogens in blood and splenic samples from dogs with splenic disease. Parasit Vectors. 2017;10(1):131. 20. Sherwood JM, Haynes AM, Klocke E, Higginbotham ML, Thomson EM, Weng HY, et al. Occurrence and clinicopathologic features of splenic neoplasia based on body weight: 325 dogs (2003-2013). J Am Anim Hosp Assoc. 2016;52(4):220-6. 21. Fernandez S, Lang JM, Maritato KC. Evaluation of nodular splenic lesions in 370 small-breed dogs (<15 kg). J Am Anim Hosp Assoc. 2019;55(4):201-9. 22. Smith AN. Hemangiosarcoma in dogs and cats. Vet Clin North Am Small Anim Pract. 2003;33(3):533-52. 23. Thamm DH. Miscellaneous Tumors. In: Vail DM, Page RL, editors. Withrow and MacEwen's Small Animal Clinical Oncology. 5th ed. Saint Louis: W.B. Saunders; 2013. p. 679-715. 24. von Beust BR, Suter MM, Summers BA. Factor VIII-related antigen in canine endothelial neoplasms: an immunohistochemical study. Vet Pathol. 1988;25(4):251-5. 25. Ferrer L, Fondevila D, Rabanal RM, Vilafranca M. Immunohistochemical detection of CD31 antigen in normal and neoplastic canine endothelial cells. J Comp Pathol. 1995;112(4):319-26. 26. Giuffrida MA, Bacon NJ, Kamstock DA. Use of routine histopathology and factor VIII-related antigen/von Willebrand factor immunohistochemistry to differentiate primary hemangiosarcoma of bone from telangiectatic osteosarcoma in 54 dogs. Vet Comp Oncol. 2016. 27. Ehrhart EJ, Kamstock DA, Powers BE. The pathology of neoplasia. In: Vail DM, Page RL, editors. Withrow and MacEwen's small animal clinical oncology. 5th ed. Saint Louis: W.B. Saunders; 2013. p. 51-67. 28. Bertazzolo W, Dell'Orco M, Bonfanti U, Ghisleni G, Caniatti M, Masserdotti C, et al. Canine angiosarcoma: cytologic, histologic, and immunohistochemical correlations. Vet Clin Pathol. 2005;34(1):28-34. 29. Goritz M, Muller K, Krastel D, Staudacher G, Schmidt P, Kuhn M, et al. Canine splenic haemangiosarcoma: influence of metastases, chemotherapy and growth pattern on post-splenectomy survival and expression of angiogenic factors. J Comp Pathol. 2013;149(1):30-9. 30. Halsey CHC, Worley DR, Curran K, Charles JB, Ehrhart EJ. The use of novel lymphatic endothelial cell-specific immunohistochemical markers to differentiate cutaneous angiosarcomas in dogs. Vet Comp Oncol. 2016;14(3):236-44. 31. Little D, Said JW, Siegel RJ, Fealy M, Fishbein MC. Endothelial cell markers in vascular neoplasms: An immunohistochemical study comparing factor VIII-related antigen, blood group specific antigens, 6-keto-PGF1 alpha, and Ulex europaeus 1 lectin. J Pathol. 1986;149(2):89-95. 32. Christopher MM. Cytology of the spleen. Vet Clin North Am Small Anim Pract. 2003;33(1):135-52.

197

33. Christensen N, Canfield P, Martin P, Krockenberger M, Spielman D, Bosward K. Cytopathological and histopathological diagnosis of canine splenic disorders. Aust Vet J. 2009;87(5):175-81. 34. Pedro B, Linney C, Navarro-Cubas X, Stephenson H, Dukes-McEwan J, Gelzer AR, et al. Cytological diagnosis of cardiac masses with ultrasound guided fine needle aspirates. J Vet Cardiol. 2016;18(1):47-56. 35. Glinska-Suchocka K, Jankowski M, Kubiak K, Spuzak J, Dzimira S, Nicpon J. Fine needle biopsy of abdominal organs in dogs -- indications, contraindications and performance technique. Pol J Vet Sci. 2013;16(4):835-42. 36. Watson AT, Penninck D, Knoll JS, Keating JH, Sutherland-Smith J. Safety and correlation of test results of combined ultrasound-guided fine-needle aspiration and needle core biopsy of the canine spleen. Vet Radiol Ultrasound. 2011;52(3):317-22. 37. O'Keefe DA, Couto CG. Fine-needle aspiration of the spleen as an aid in the diagnosis of splenomegaly. J Vet Intern Med. 1987;1(3):102-9. 38. Kiehl AR, Mays MBC. Atlas for the diagnosis of tumors in the dog and cat. Ames, Iowa: John Wiley & Sons, Inc.; 2016. 248 p. 39. Warren-Smith CM, Andrew S, Mantis P, Lamb CR. Lack of associations between ultrasonographic appearance of parenchymal lesions of the canine liver and histological diagnosis. J Small Anim Pract. 2012;53(3):168-73. 40. Alder D, Bass D, Sporri M, Kircher P, Ohlerth S. Does real-time elastography aid in differentiating canine splenic nodules? Schweiz Arch Tierheilkd. 2013;155(9):491-6. 41. Ohlerth S, Dennler M, Rüefli E, Hauser B, Poirier V, Siebeck N, et al. Contrast harmonic imaging characterization of canine splenic lesions. J Vet Intern Med. 2008;22(5):1095-102. 42. Kim M, Choi S, Choi H, Lee Y, Lee K. Diagnosis of a large splenic tumor in a dog: computed tomography versus magnetic resonance imaging. J Vet Med Sci. 2016;77(12):1685-7. 43. Boddy KN, Sleeper MM, Sammarco CD, Weisse C, Ghods S, Litt HI. Cardiac magnetic resonance in the differentiation of neoplastic and nonneoplastic pericardial effusion. J Vet Intern Med. 2011;25(5):1003-9. 44. Fukuda S, Kobayashi T, Robertson ID, Oshima F, Fukazawa E, Nakano Y, et al. Computed tomographic features of canine nonparenchymal hemangiosarcoma. Vet Radiol Ultrasound. 2014;55(4):374-9. 45. Jones ID, Lamb CR, Drees R, Priestnall SL, Mantis P. Associations between dual-phase computed tomography features and histopathologic diagnoses in 52 dogs with hepatic or splenic masses. Vet Radiol Ultrasound. 2016;57(2):144-53. 46. Borgatti A, Winter AL, Stuebner K, Scott R, Ober CP, Anderson KL, et al. Evaluation of 18-F- fluoro-2-deoxyglucose (FDG) positron emission tomography/computed tomography (PET/CT) as a staging and monitoring tool for dogs with stage-2 splenic hemangiosarcoma - A pilot study. PLoS One. 2017;12(2):e0172651. 47. Lee G, Kwon SY, Son K, Park S, Lee JH, Cho KO, et al. Clinical usefulness of post-operative 18F-fluorodeoxyglucose positron emission tomography-computed tomography in canine hemangiosarcoma. J Vet Sci. 2016;17(2):257-60. 48. Pearson GR, Head KW. Malignant haemangioendothelioma (angiosarcoma) in the dog. J Small Anim Pract. 1976;17(11):737-45.

198

49. Cullen JM. Tumors of the liver and gall bladder. In: Meuten DJ, editor. Tumors in domestic animals. 5th ed. Ames, Iowa: John Wiley & Sons, Inc.; 2017. p. 602-31. 50. Hendrick MJ. Mesenchymal tumors of the skin and soft tissues. In: Meuten DJ, editor. Tumors in domestic animals. 5th ed: John Wiley & Sons, Inc.; 2016. p. 142-75. 51. North SM, Banks TA. Small Animal Oncology. Philadelphia, USA: Elsvier Health Science; 2009. 304 p. 52. Meuten DJ, Moore FM, George W. Appendix: Diagnostic schemes and algorithms. In: Meuten DJ, editor. Tumors in domestic animals. 5th ed. Ames, Iowa: John Wiley & Sons, Inc.; 2016. p. 942-78. 53. Ogilvie GK, Powers BE, Mallinckrodt CH, Withrow SJ. Surgery and doxorubicin in dogs with hemangiosarcoma. J Vet Intern Med. 1996;10(6):379-84. 54. Sabattini S, Bettini G. An immunohistochemical analysis of canine haemangioma and haemangiosarcoma. J Comp Pathol. 2009;140(2-3):158-68. 55. Lucas DR. Angiosarcoma, radiation-associated angiosarcoma, and atypical vascular lesion. Arch Pathol Lab Med. 2009;133(11):1804-9. 56. Fletcher CDM, Krishnan UK, Fredrik M, editors. WHO classification of tumours. Pathology and genetics of tumours of soft tissue and bone. 3 ed. Lyon, France: IARC Press; 2002. 57. Jo VY, Fletcher CDM. WHO classification of soft tissue tumours: an update based on the 2013 (4th) edition. Pathology. 2014;46(2):95-104. 58. Hendrick MJ, Mahaffy EA, Moore FM, Vos JH, Walder EJ. Histological classification of mesenchymal tumors of skin and soft tissues of domestic animals. 3rd ed. Washington D.C.: Armed Forces Institute of Pathology American Registry of Pathology; 1998. 64 p. 59. Gamlem H, Nordstoga K. Canine vascular neoplasia--histologic classification and inmunohistochemical analysis of 221 tumours and tumour-like lesions. APMIS Suppl. 2008(125):19-40. 60. Machida N, Arimura T, Otagiri Y, Kiryu K, Oka T. Epithelioid haemangioendothelioma of the lung in a dog. J Comp Pathol. 1998;119(3):317-22. 61. Vincek V, Zaulyanov L, Mirzabeigi M. Kaposiform hemangioendothelioma: the first reported case in a nonhuman animal species. Vet Pathol. 2004;41(6):695-7. 62. Hendrick MJ. Kaposiform hemangioendothelioma: the first reported case in a nonhuman animal species. Vet Pathol. 2005;42(4):518. 63. Finotello R, Stefanello D, Zini E, Marconato L. Comparison of doxorubicin-cyclophosphamide with doxorubicin-dacarbazine for the adjuvant treatment of canine hemangiosarcoma. Vet Comp Oncol. 2017;15(1):25-35. 64. Clifford CA, Mackin AJ, Henry CJ. Treatment of canine hemangiosarcoma: 2000 and beyond. J Vet Intern Med. 2000;14(5):479-85. 65. Gardner HL, London CA, Portela RA, Nguyen S, Rosenberg MP, Klein MK, et al. Maintenance therapy with toceranib following doxorubicin-based chemotherapy for canine splenic hemangiosarcoma. BMC Vet Res. 2015;11(1):131. 66. Kahn SA, Mullin CM, de Lorimier LP, Burgess KE, Risbon RE, Fred RM, 3rd, et al. Doxorubicin and deracoxib adjuvant therapy for canine splenic hemangiosarcoma: a pilot study. Can Vet J. 2013;54(3):237-42.

199

67. U'Ren LW, Biller BJ, Elmslie RE, Thamm DH, Dow SW. Evaluation of a novel tumor vaccine in dogs with hemangiosarcoma. J Vet Intern Med. 2007;21(1):113-20. 68. Wirth KA, Kow K, Salute ME, Bacon NJ, Milner RJ. In vitro effects of Yunnan Baiyao on canine hemangiosarcoma cell lines. Vet Comp Oncol. 2016;14(3):281-94. 69. Brown DC, Reetz JR. Single agent polysaccharopeptide delays metastases and improves survival in naturally occurring hemangiosarcoma. Evid Based Complement Alternat Med. 2012;2012:8. 70. Spangler WL, Kass PH. Pathologic factors affecting postsplenectomy survival in dogs. J Vet Intern Med. 1997;11(3):166-71. 71. Szivek A, Burns RE, Gericota B, Affolter VK, Kent MS, Rodriguez CO, Jr., et al. Clinical outcome in 94 cases of dermal haemangiosarcoma in dogs treated with surgical excision: 1993- 2007*. Vet Comp Oncol. 2012;10(1):65-73. 72. Pandey M, Sutton GR, Giri S, Martin MG. Grade and prognosis in localized primary angiosarcoma. Clin Breast Cancer. 2015;15(4):266-9. 73. Schick AR, Hayes GM, Singh A, Mathews KG, Higginbotham ML, Sherwood JM. Development and validation of a hemangiosarcoma likelihood prediction model in dogs presenting with spontaneous hemoabdomen: The HeLP score. J Vet Emerg Crit Care (San Antonio). 2019;29(3):239-45. 74. Hanahan D, Weinberg RA. The hallmarks of cancer. Cell. 2000;100(1):57-70. 75. Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell. 2011;144(5):646-74. 76. Newkirk KM, Brannick EM, Kusewitt DF. Neoplasia and Tumor Biology. In: Zachary JF, editor. Pathologic Basis of Veterinary Disease. 6th ed. St. Louis, Missouri: Elsvier; 2017. p. 286- 321. 77. Argyle DJ, Khanna C. Tumor Biology and Metastasis. In: Vail DM, Page RL, editors. Withrow and MacEwen's small animal clinical oncology. 5th ed. Saint Louis: W.B. Saunders; 2013. p. 30-50. 78. Huang SC, Zhang L, Sung YS, Chen CL, Kao YC, Agaram NP, et al. Recurrent CIC gene abnormalities in angiosarcomas: A molecular study of 120 cases with concurrent investigation of PLCG1, KDR, MYC, and FLT4 gene alterations. Am J Surg Pathol. 2016;40(5):645-55. 79. Cao L, Hong J, Wang Y, Yu J, Ma R, Li J, et al. A primary splenic angiosarcoma hepatic metastasis after splenectomy and its genomic alteration profile. Medicine (Baltimore) [Internet]. 2019 [cited 2019]; 98(28):[e16245 p.]. Available from: https://journals.lww.com/md- journal/Fulltext/2019/07120/A_primary_splenic_angiosarcoma_hepatic_metastasis.17.aspx. 80. Painter CA, Jain E, Tomson BN, Dunphy M, Stoddard RE, Thomas BS, et al. The Angiosarcoma Project: enabling genomic and clinical discoveries in a rare cancer through patient- partnered research. Nat Med. 2020;26(2):181-7. 81. Grossman RL, Heath AP, Ferretti V, Varmus HE, Lowy DR, Kibbe WA, et al. Toward a shared vision for cancer genomic data. N Engl J Med. 2016;375(12):1109-12. 82. Megquier K, Turner-Maier J, Swofford R, Kim JH, Sarver AL, Wang C, et al. Comparative genomics reveals shared mutational landscape in canine hemangiosarcoma and human angiosarcoma. Mol Cancer Res. 2019. 83. Thomas R, Borst L, Rotroff D, Motsinger-Reif A, Lindblad-Toh K, Modiano JF, et al. Genomic profiling reveals extensive heterogeneity in somatic DNA copy number aberrations of canine hemangiosarcoma. Chromosome Res. 2014;22(3):305-19. 200

84. Wang G, Wu M, Maloneyhuss MA, Wojcik J, Durham AC, Mason NJ, et al. Actionable mutations in canine hemangiosarcoma. PLoS One. 2017;12(11):e0188667. 85. Wang G, Wu M, Durham AC, Radaelli E, Mason NJ, Xu X, et al. Molecular subtypes in canine hemangiosarcoma reveal similarities with human angiosarcoma. PLoS One. 2020;15(3):e0229728. 86. Tamburini BA, Phang TL, Fosmire SP, Scott MC, Trapp SC, Duckett MM, et al. Gene expression profiling identifies inflammation and angiogenesis as distinguishing features of canine hemangiosarcoma. BMC Cancer. 2010;10:619. 87. Tamburini BA, Trapp S, Phang TL, Schappa JT, Hunter LE, Modiano JF. Gene expression profiles of sporadic canine hemangiosarcoma are uniquely associated with breed. PLoS One. 2009;4(5):e5549. 88. Boston SE, Higginson G, Monteith G. Concurrent splenic and right atrial mass at presentation in dogs with HSA: a retrospective study. J Am Anim Hosp Assoc. 2011;47(5):336-41. 89. Robinson KL, Bryan ME, Atkinson ES, Keeler MR, Hahn AW, Bryan JN. Neutering is associated with developing hemangiosarcoma in dogs in the Veterinary Medical Database: An age and time-period matched case-control study (1964-2003). Can Vet J. 2020;61(5):499-504. 90. Tonomura N, Elvers I, Thomas R, Megquier K, Turner-Maier J, Howald C, et al. Genome-wide association study identifies shared risk loci common to two malignancies in golden retrievers. PLoS Genet. 2015;11(2):e1004922. 91. Belanger JM, Bellumori TP, Bannasch DL, Famula TR, Oberbauer AM. Correlation of neuter status and expression of heritable disorders. Canine Genet Epidemiol. 2017;4:6. 92. Breitschwerdt EB, Maggi RG, Varanat M, Linder KE, Weinberg G. Isolation of Bartonella vinsonii subsp. berkhoffii genotype II from a boy with epithelioid hemangioendothelioma and a dog with hemangiopericytoma. J Clin Microbiol. 2009;47(6):1957-60. 93. Varanat M, Maggi RG, Linder KE, Breitschwerdt EB. Molecular prevalence of Bartonella, Babesia, and hemotropic Mycoplasma sp. in dogs with splenic disease. J Vet Intern Med. 2011;25(6):1284-91. 94. Lashnits E, Neupane P, Bradley JM, Richardson T, Thomas R, Linder KE, et al. Molecular prevalence of Bartonella, Babesia, and hemotropic Mycoplasma species in dogs with hemangiosarcoma from across the United States. PLoS One. 2020;15(1):e0227234. 95. Lamerato-Kozicki AR, Helm KM, Jubala CM, Cutter GC, Modiano JF. Canine hemangiosarcoma originates from hematopoietic precursors with potential for endothelial differentiation. Exp Hematol. 2006;34(7):870-8. 96. Gorden BH, Kim JH, Sarver AL, Frantz AM, Breen M, Lindblad-Toh K, et al. Identification of three molecular and functional subtypes in canine hemangiosarcoma through gene expression profiling and progenitor cell characterization. Am J Pathol. 2014;184(4):985-95. 97. Kakiuchi-Kiyota S, Obert LA, Crowell DM, Xia S, Roy MD, Coskran TM, et al. Expression of hematopoietic stem and endothelial cell markers in canine hemangiosarcoma. Toxicol Pathol. 2020;48(3):481-93. 98. Laifenfeld D, Gilchrist A, Drubin D, Jorge M, Eddy SF, Frushour BP, et al. The role of hypoxia in 2-butoxyethanol-induced hemangiosarcoma. Toxicol Sci. 2010;113(1):254-66. 99. FDA-NIH Biomarker Working Group. BEST (Biomarkers, EndpointS, and other Tools) Resource. Food and Drug Administration (US) Co-published by National Institutes of Health (US); 2016.

201

100. Jain KK. Biomarkers of Cancer. The Handbook of Biomarkers. Totowa, NJ: Humana Press; 2010. p. 189-326. 101. Niedbala RS, Mauck C, Harrison P, Doncel GF. Biomarker discovery: validation and decision- making in product development. Sex Transm Dis. 2009;36(3 Suppl):S76-80. 102. Committee on the Review of Omics-Based Tests for Predicting Patient Outcomes in Clinical Trials, Board on Health Care Services, Board on Health Sciences Policy, Institute of Medicine. Evolution of translational omics: Lessons learned and the path forward. In: Micheel CM, Nass SJ, Omenn GS, editors. Evolution of translational omics: Lessons learned and the path forward. Washington (DC): National Academies Press (US); 2012. 103. Pepe MS, Etzioni R, Feng Z, Potter JD, Thompson ML, Thornquist M, et al. Phases of biomarker development for early detection of cancer. J Natl Cancer Inst. 2001;93(14):1054-61. 104. Myers MJ, Smith ER, Turfle PG. Biomarkers in veterinary medicine. Annu Rev Anim Biosci. 2016;5:65-87. 105. Nabity MB, Lees GE, Boggess MM, Yerramilli M, Obare E, Yerramilli M, et al. Symmetric dimethylarginine assay validation, stability, and evaluation as a marker for the early detection of chronic kidney disease in dogs. J Vet Intern Med. 2015;29(4):1036-44. 106. Rowell JL, McCarthy DO, Alvarez CE. Dog models of naturally occurring cancer. Trends Mol Med. 2011;17(7):380-8. 107. Schiffman JD, Breen M. Comparative oncology: what dogs and other species can teach us about humans with cancer. Philos Trans R Soc Lond B Biol Sci. 2015;370(1673). 108. Baker MJ, Trevisan J, Bassan P, Bhargava R, Butler HJ, Dorling KM, et al. Using Fourier transform IR spectroscopy to analyze biological materials. Nat Protoc. 2014;9(8):1771-91. 109. Poste G. Bring on the biomarkers. Nature. 2011;469(7329):156-7. 110. Goossens N, Nakagawa S, Sun X, Hoshida Y. Cancer biomarker discovery and validation. Transl Cancer Res. 2015;4(3):256-69. 111. Davies O, Taylor AJ. Refining the "double two-thirds" rule: Genotype-based breed grouping and clinical presentation help predict the diagnosis of canine splenic mass lesions in 288 dogs. Vet Comp Oncol. 2020;18(4):548-58. 112. Maruyama H, Watari T, Miura T, Sakai M, Takahashi T, Koie H, et al. Plasma thrombin- antithrombin complex concentrations in dogs with malignant tumours. Vet Rec. 2005;156(26):839- 40. 113. Thamm DH, Kamstock DA, Sharp CR, Johnson SI, Mazzaferro E, Herold LV, et al. Elevated serum thymidine kinase activity in canine splenic hemangiosarcoma*. Vet Comp Oncol. 2012;10(4):292-302. 114. Yuki M, Machida N, Sawano T, Itoh H. Investigation of serum concentrations and immunohistochemical localization of alpha1-acid glycoprotein in tumor dogs. Vet Res Commun. 2011;35(1):1-11. 115. Clifford CA, Hughes D, Beal MW, Mackin AJ, Henry CJ, Shofer FS, et al. Plasma vascular endothelial growth factor concentrations in healthy dogs and dogs with hemangiosarcoma. J Vet Intern Med. 2001;15(2):131-5. 116. Chikazawa S, Hori Y, Hoshi F, Kanai K, Ito N, Higuchi S. Hyperferritinemia in dogs with splenic hemangiosarcoma. J Vet Med Sci. 2013;75(11):1515-8.

202

117. Frenz M, Kaup FJ, Neumann S. Serum vascular endothelial growth factor in dogs with haemangiosarcoma and haematoma. Res Vet Sci. 2014;97(2):257-62. 118. Kirby GM, Mackay A, Grant A, Woods P, McEwen B, Khanna C, et al. Concentration of lipocalin region of collagen XXVII alpha 1 in the serum of dogs with hemangiosarcoma. J Vet Intern Med. 2011;25(3):497-503. 119. Fukumoto S, Miyasho T, Hanazono K, Saida K, Kadosawa T, Iwano H, et al. Big endothelin-1 as a tumour marker for canine haemangiosarcoma. Vet J. 2015;204(3):269-74. 120. Heishima K, Mori T, Ichikawa Y, Sakai H, Kuranaga Y, Nakagawa T, et al. MicroRNA-214 and microRNA-126 are potential biomarkers for malignant endothelial proliferative diseases. Int J Mol Sci. 2015;16(10):25377-91. 121. Varki A CR, Esko JD, et al. Essentials of Glycobiology 2nd ed. New York: Cold Spring Harbor Laboratory Press; 2009 [cited 2019]. Available from: https://www.ncbi.nlm.nih.gov/books/NBK1932/. 122. Choi E. Diagnostic glycoprotein biomarkers for canine hemangiosarcoma: a potential model of human angiosarcoma [Dissertation/Thesis]: The University of Queensland, School of Veterinary Science. ; 2013. 123. Chun R, Kellihan HB, Henik RA, Stepien RL. Comparison of plasma cardiac troponin I concentrations among dogs with cardiac hemangiosarcoma, noncardiac hemangiosarcoma, other neoplasms, and pericardial effusion of nonhemangiosarcoma origin. J Am Vet Med Assoc. 2010;237(7):806-11. 124. Wong RW, Gonsalves MN, Huber ML, Rich L, Strom A. Erythrocyte and biochemical abnormalities as diagnostic markers in dogs with hemangiosarcoma related hemoabdomen. Vet Surg. 2015;44(7):852-7. 125. Regan DP, Escaffi A, Coy J, Kurihara J, Dow SW. Role of monocyte recruitment in hemangiosarcoma metastasis in dogs. Vet Comp Oncol. 2017;15(4):1309-22. 126. Hosaka M, Murase N, Orito T, Mori M. Immunohistochemical evaluation of factor VIII related antigen, filament proteins and lectin binding in haemangiomas. Virchows Archiv A. 1985;407(3):237-47. 127. Jakab C, Halasz J, Kiss A, Schaff Z, Rusvai M, Galfi P, et al. Claudin-5 protein is a new differential marker for histopathological differential diagnosis of canine hemangiosarcoma. Histol Histopathol. 2009;24(7):801-13. 128. Yonemaru K, Sakai H, Murakami M, Kodama A, Mori T, Yanai T, et al. The significance of p53 and retinoblastoma pathways in canine hemangiosarcoma. J Vet Med Sci. 2007;69(3):271-8. 129. Asa SA, Murai A, Murakami M, Hoshino Y, Mori T, Maruo K, et al. Expression of platelet- derived growth factor and its receptors in spontaneous canine hemangiosarcoma and cutaneous hemangioma. Histol Histopathol. 2012;27(5):601-7. 130. Heller DA, Clifford CA, Goldschmidt MH, Holt DE, Manfredi MJ, Sorenmo KU. Assessment of cyclooxygenase-2 expression in canine hemangiosarcoma, histiocytic sarcoma, and mast cell tumor. Vet Pathol. 2005;42(3):350-3. 131. Chandler HL, Newkirk KM, Kusewitt DF, Dubielzig RR, Colitz CM. Immunohistochemical analysis of ocular hemangiomas and hemangiosarcomas in dogs. Vet Ophthalmol. 2009;12(2):83- 90.

203

132. Petterino C, Rossetti E, Drigo M. Immunodetection of the signal transducer and activator of transcription-3 in canine haemangioma and haemangiosarcoma. Res Vet Sci. 2006;80(2):186-8. 133. Kodama A, Sakai H, Murakami M, Murai A, Mori T, Maruo K, et al. Immunohistochemical demonstration of angiogenesis-associated homeobox proteins in canine vascular tumours. J Comp Pathol. 2009;141(2-3):199-203. 134. Abou Asa S, Mori T, Maruo K, Khater A, El-Sawak A, Abd el-Aziz E, et al. Analysis of genomic mutation and immunohistochemistry of platelet-derived growth factor receptors in canine vascular tumours. Vet Comp Oncol. 2015;13(3):237-45. 135. Anwar S, Yanai T, Sakai H. Immunohistochemical detection of urokinase plasminogen activator and urokinase plasminogen activator receptor in canine vascular endothelial tumours. J Comp Pathol. 2015;153(4):278-82. 136. Adachi M, Hoshino Y, Izumi Y, Takagi S. Immunohistochemical detection of a potential molecular therapeutic target for canine hemangiosarcoma. J Vet Med Sci. 2016;78(4):649-56. 137. Chen YC, Liao JW, Hsu WL, Chang SC. Identification of the two KIT isoforms and their expression status in canine hemangiosarcomas. BMC Vet Res. 2016;12(1):142. 138. Dowling M, Samuelson J, Fadl-Alla B, Pondenis HC, Byrum M, Barger AM, et al. Overexpression of prostate specific membrane antigen by canine hemangiosarcoma cells provides opportunity for the molecular detection of disease burdens within hemorrhagic body cavity effusions. PLoS One. 2019;14(1):e0210297. 139. Silverstein RM, Webster FX, Kiemle D. Spectrometric identification of organic compounds. 7th ed. Chichester: Wiley; 2005. 512 p. 140. Butcher G. Anatomy of an electromagnetic wave. In: Parkinson C L, Wollack EJ, editors. The tour of the electromagnetic spectrum. 3rd ed. Washington D.C.: National Aeronautics and Space Administration; 2016. 141. Bhargava R. Infrared spectroscopic imaging: the next generation. Appl Spectrosc. 2012;66(10):1091-120. 142. Li G, Torraca G, Jing W, Wen Z-q. Applications of FTIR in identification of foreign materials for biopharmaceutical clinical manufacturing. Vib Spectrosc. 2009;50(1):152-9. 143. Pięta E, Olszewska-Świetlik J, Paluszkiewicz C, Zając A, Kwiatek WM. Application of ATR- FTIR mapping to identification and distribution of pigments, binders and degradation products in a 17th century painting. Vib Spectrosc. 2019;103:102928. 144. Thermo Nicolet Corporation. Introduction to Fourier transform infrared spectroscopy. Madison WI: Thermo Nicolet Corporation; 2001. 145. Tiwari S, Bhargava R. Extracting knowledge from chemical imaging data using computational algorithms for digital cancer diagnosis. Yale J Biol Med. 2015;88(2):131-43. 146. Kochan K, Heraud P, Kiupel M, Yuzbasiyan-Gurkan V, McNaughton D, Baranska M, et al. Comparison of FTIR transmission and transfection substrates for canine liver cancer detection. Analyst. 2015;140(7):2402-11. 147. Filik J, Frogley MD, Pijanka JK, Wehbe K, Cinque G. Electric field standing wave artefacts in FTIR micro-spectroscopy of biological materials. Analyst. 2012;137(4):853-61. 148. Maharani A, Aoshima K, Onishi S, Gulay KCM, Kobayashi A, Kimura T. Cellular atypia is negatively correlated with immunohistochemical reactivity of CD31 and vWF expression levels in canine hemangiosarcoma. J Vet Med Sci. 2018;80(2):213-8. 204

149. Miles K. Imaging abdominal masses. Vet Clin North Am Small Anim Pract. 1997;27(6):1403- 31. 150. Ballegeer EA, Forrest LJ, Dickinson RM, Schutten MM, Delaney FA, Young KM. Correlation of ultrasonographic appearance of lesions and cytologic and histologic diagnoses in splenic aspirates from dogs and cats: 32 cases (2002-2005). J Am Vet Med Assoc. 2007;230(5):690-6. 151. Ghisleni G, Roccabianca P, Ceruti R, Stefanello D, Bertazzolo W, Bonfanti U, et al. Correlation between fine-needle aspiration cytology and histopathology in the evaluation of cutaneous and subcutaneous masses from dogs and cats. Vet Clin Pathol. 2006;35(1):24-30. 152. Simon D, Schoenrock D, Nolte I, Baumgärtner W, Barron R, Mischke R. Cytologic examination of fine-needle aspirates from mammary gland tumors in the dog: diagnostic accuracy with comparison to histopathology and association with postoperative outcome. Vet Clin Pathol. 2009;38(4):521-8. 153. Mills JN, Griffiths GL. The accuracy of clinical diagnoses by fine-needle aspiration cytology. Aust Vet J. 1984;61(8):269-71. 154. Bahr KL, Sharkey LC, Murakami T, Feeney DA. Accuracy of US-guided FNA of focal liver lesions in dogs: 140 cases (2005-2008). J Am Anim Hosp Assoc. 2013;49(3):190-6. 155. Pinho SS, Reis CA. Glycosylation in cancer: mechanisms and clinical implications. Nat Rev Cancer. 2015;15(9):540-55. 156. Munkley J, Elliott DJ. Hallmarks of glycosylation in cancer. Oncotarget. 2016;7(23):35478-89. 157. Vajaria BN, Patel PS. Glycosylation: a hallmark of cancer? Glycoconj J. 2017;34(2):147-56. 158. Ramos-Vara JA, Miller MA, Dusold DM. Immunohistochemical expression of CD31 (PECAM-1) in nonendothelial tumors of dogs. Vet Pathol. 2018;55(3):402-8. 159. van Diest PJ, van Dam P, Henzen-Logmans SC, Berns E, van der Burg ME, Green J, et al. A scoring system for immunohistochemical staining: consensus report of the task force for basic research of the EORTC-GCCG. European Organization for Research and Treatment of Cancer- Gynaecological Cancer Cooperative Group. J Clin Pathol. 1997;50(10):801-4. 160. Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York2016 [cited 2019]. Available from: https://ggplot2.tidyverse.org. 161. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.2018-2019. Available from: https://www.R-project.org/. 162. Therneau T, Atkinson B, (2009). rpart: Recursive Partitioning and Regression Tree. R package version. 4.1-15 ed2019. https://CRAN.R-project.org/package=rpart. 163. Seshan VE. clinfun: Clinical Trial Design and Data Analysis Functions. R package. 1.0.15 ed2018. https://CRAN.R-project.org/package=clinfun. 164. Jensen K, Krusenstjerna-Hafstrom R, Lohse J, Petersen KH, Derand H. A novel quantitative immunohistochemistry method for precise protein measurements directly in formalin-fixed, paraffin-embedded specimens: analytical performance measuring HER2. Mod Pathol. 2017;30(2):180-93. 165. Specht E, Kaemmerer D, Sanger J, Wirtz RM, Schulz S, Lupp A. Comparison of immunoreactive score, HER2/neu score and H score for the immunohistochemical evaluation of somatostatin receptors in bronchopulmonary neuroendocrine neoplasms. Histopathology. 2015;67(3):368-77.

205

166. dela Paz NG, D'Amore PA. Arterial versus venous endothelial cells. Cell Tissue Res. 2009;335(1):5-16. 167. Wolf K, Hu H, Isaji T, Dardik A. Molecular identity of arteries, veins, and lymphatics. J Vasc Surg. 2019;69(1):253-62. 168. Kacevska M, Mahns A, Sharma R, Clarke SJ, Robertson GR, Liddle C. Extra-hepatic cancer represses hepatic drug metabolism via interleukin (IL)-6 signalling. Pharm Res. 2013;30(9):2270-8. 169. Augustin-Voss HG, Smith CA, Lewis RM. Phenotypic characterization of normal and neoplastic canine endothelial cells by lectin histochemistry. Vet Pathol. 1990;27(2):103-9. 170. Stowell SR, Ju T, Cummings RD. Protein glycosylation in cancer. Annu Rev Pathol. 2015;10:473-510. 171. NCBI, Resource, Coordinators. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2013;42(D1):D7-D17. 172. Langeggen H, Pausa M, Johnson E, Casarsa C, Tedesco F. The endothelium is an extrahepatic site of synthesis of the seventh component of the complement system. Clin Exp Immunol. 2000;121(1):69-76. 173. Bossi F, Rizzi L, Bulla R, Debeus A, Tripodo C, Picotti P, et al. C7 is expressed on endothelial cells as a trap for the assembling terminal complement complex and may exert anti-inflammatory function. Blood. 2009;113(15):3640-8. 174. Ying L, Zhang F, Pan X, Chen K, Zhang N, Jin J, et al. Complement component 7 (C7), a potential tumor suppressor, is correlated with tumor progression and prognosis. Oncotarget. 2016;7(52):86536-46. 175. Seol HS, Lee SE, Song JS, Rhee JK, Singh SR, Chang S, et al. Complement proteins C7 and CFH control the stemness of liver cancer cells via LSF-1. Cancer Lett. 2016;372(1):24-35. 176. Ramos-Vara JA, Kiupel M, Baszler T, Bliven L, Brodersen B, Chelack B, et al. Suggested guidelines for immunohistochemical techniques in veterinary diagnostic laboratories. J Vet Diagn Invest. 2008;20(4):393-413. 177. Broeckx BJG, Derrien T, Mottier S, Wucher V, Cadieu E, Hedan B, et al. An exome sequencing based approach for genome-wide association studies in the dog. Sci Rep. 2017;7(1):15680. 178. Auwera Vd, Geraldine A, Carneiro MO, Hartl C, Poplin R, del Angel G, et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr Protoc Bioinformatics. 2013;11(1110):11.0.1-.0.33. 179. Koboldt DC, Zhang Q, Larson DE, Shen D, McLellan MD, Lin L, et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 2012;22(3):568-76. 180. Conomos MP GS, Brown L, Chen H, Rice K, Sofer T, Thornton T, Yu C. GENESIS: GENetic EStimation and Inference in Structured samples (GENESIS): Statistical methods for analyzing genetic data from samples with population structure and/or relatedness. R package version 2.14.1 2019 [Available from: https://github.com/UW-GAC/GENESIS. 181. McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GR, Thormann A, et al. The Ensembl Variant Effect Predictor. Genome Biol. 2016;17(1):122.

206

182. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences. 2005;102(43):15545-50. 183. Ng PC, Henikoff S. Predicting deleterious amino acid substitutions. Genome Res. 2001;11(5):863-74. 184. Liu J, Epand RF, Durrant D, Grossman D, Chi NW, Epand RM, et al. Role of phospholipid scramblase 3 in the regulation of tumor necrosis factor-alpha-induced apoptosis. Biochemistry. 2008;47(15):4518-29. 185. Pan X, Chen Y, Biju B, Ahmed N, Kong J, Goldenberg M, et al. FANCM suppresses DNA replication stress at ALT telomeres by disrupting TERRA R-loops. Sci Rep. 2019;9(1):19110. 186. Wang H, Li S, Oaks J, Ren J, Li L, Wu X. The concerted roles of FANCM and Rad52 in the protection of common fragile sites. Nat Commun. 2018;9(1):2791. 187. Basbous J, Constantinou A. A tumor suppressive DNA translocase named FANCM. Crit Rev Biochem Mol Biol. 2019;54(1):27-40. 188. Silva B, Pentz R, Figueira AM, Arora R, Lee YW, Hodson C, et al. FANCM limits ALT activity by restricting telomeric replication stress induced by deregulated BLM and R-loops. Nat Commun. 2019;10(1):2253. 189. Bernal M, Yang X, Lisby M, Mazón G. The FANCM family Mph1 helicase localizes to the mitochondria and contributes to mtDNA stability. DNA Repair. 2019;82:102684. 190. Xue X, Sung P, Zhao X. Functions and regulation of the multitasking FANCM family of DNA motor proteins. Genes Dev. 2015;29(17):1777-88. 191. Peterlongo P, Catucci I, Colombo M, Caleca L, Mucaki E, Bogliolo M, et al. FANCM c.5791C>T nonsense mutation (rs144567652) induces exon skipping, affects DNA repair activity and is a familial breast cancer risk factor. Hum Mol Genet. 2015;24(18):5345-55. 192. Kiiski JI, Pelttari LM, Khan S, Freysteinsdottir ES, Reynisdottir I, Hart SN, et al. Exome sequencing identifies FANCM as a susceptibility gene for triple-negative breast cancer. Proc Natl Acad Sci U S A. 2014;111(42):15172-7. 193. Kiiski JI, Tervasmäki A, Pelttari LM, Khan S, Mantere T, Pylkäs K, et al. FANCM mutation c.5791C>T is a risk factor for triple-negative breast cancer in the Finnish population. Breast Cancer Res Treat. 2017;166(1):217-26. 194. Dicks E, Song H, Ramus SJ, Oudenhove EV, Tyrer JP, Intermaggio MP, et al. Germline whole exome sequencing and large-scale replication identifies FANCM as a likely high grade serous ovarian cancer susceptibility gene. Oncotarget. 2017;8(31):50930-40. 195. Lopes JL, Chaudhry S, Lopes GS, Levin NK, Tainsky MA. FANCM, RAD1, CHEK1 and TP53I3 act as BRCA-like tumor suppressors and are mutated in hereditary ovarian cancer. Cancer Genet. 2019;235-236:57-64. 196. Catucci I, Osorio A, Arver B, Neidhardt G, Bogliolo M, Zanardi F, et al. Individuals with FANCM biallelic mutations do not develop Fanconi anemia, but show risk for breast cancer, chemotherapy toxicity and may display chromosome fragility. Genet Med. 2018;20(4):452-7. 197. Nikolaidi A, Konstantopoulou I, Pistalmantzian N, Fostira F, Yannoukakos D, Athanasiadis I. A patient affected with serous ovarian/peritoneal carcinoma carrying the FANCM mutation. Case Rep Oncol Med. 2019;2019:9357924.

207

198. Sotgia F, Whitaker-Menezes D, Martinez-Outschoorn UE, Salem AF, Tsirigos A, Lamb R, et al. Mitochondria "fuel" breast cancer metabolism: fifteen markers of mitochondrial biogenesis label epithelial cancer cells, but are excluded from adjacent stromal cells. Cell Cycle. 2012;11(23):4390- 401. 199. Jen J, Wang YC. Zinc finger proteins in cancer progression. J Biomed Sci. 2016;23(1):53. 200. Liu H, Zhao H. Prognosis related miRNAs, DNA methylation, and epigenetic interactions in lung adenocarcinoma. Neoplasma. 2019;66(3):487-93. 201. Cheng A, Zhao S, FitzGerald LM, Wright JL, Kolb S, Karnes RJ, et al. A four-gene transcript score to predict metastatic-lethal progression in men treated for localized prostate cancer: development and validation studies. Prostate. 2019;79(14):1589-96. 202. Jia J, Ren J, Yan D, Xiao L, Sun R. Association between the XRCC6 polymorphisms and cancer risks: a systematic review and meta-analysis. Medicine (Baltimore). 2015;94(1):e283. 203. Glozak MA, Seto E. Histone deacetylases and cancer. Oncogene. 2007;26(37):5420-32. 204. Tang F, Choy E, Tu C, Hornicek F, Duan Z. Therapeutic applications of histone deacetylase inhibitors in sarcoma. Cancer Treat Rev. 2017;59:33-45. 205. Eto S, Saeki K, Yoshitake R, Yoshimoto S, Shinada M, Ikeda N, et al. Anti-tumor effects of the histone deacetylase inhibitor vorinostat on canine urothelial carcinoma cells. PLoS One. 2019;14(6):e0218382. 206. Da Ros S, Aresu L, Ferraresso S, Zorzan E, Gaudio E, Bertoni F, et al. Validation of epigenetic mechanisms regulating gene expression in canine B-cell lymphoma: An in vitro and in vivo approach. PLoS One. 2018;13(12):e0208709. 207. Kisseberth WC, Murahari S, London CA, Kulp SK, Chen CS. Evaluation of the effects of histone deacetylase inhibitors on cells from canine cancer cell lines. Am J Vet Res. 2008;69(7):938- 45. 208. Elshafae SM, Kohart NA, Altstadt LA, Dirksen WP, Rosol TJ. The effect of a histone deacetylase inhibitor (AR-42) on canine prostate cancer growth and metastasis. Prostate. 2017;77(7):776-93. 209. Wittenburg LA, Bisson L, Rose BJ, Korch C, Thamm DH. The histone deacetylase inhibitor valproic acid sensitizes human and canine osteosarcoma to doxorubicin. Cancer Chemother Pharmacol. 2011;67(1):83-92. 210. Thayanithy V, Park C, Sarver AL, Kartha RV, Korpela DM, Graef AJ, et al. Combinatorial treatment of DNA and chromatin-modifying drugs cause cell death in human and canine osteosarcoma cell lines. PLoS One. 2012;7(9):e43720. 211. Nagamine MK, Sanches DS, Pinello KC, Torres LN, Mennecier G, Latorre AO, et al. In vitro inhibitory effect of trichostatin A on canine grade 3 mast cell tumor. Vet Res Commun. 2011;35(6):391-9. 212. Dias JNR, Aguiar SI, Pereira DM, André AS, Gano L, Correia JDG, et al. The histone deacetylase inhibitor panobinostat is a potent antitumor agent in canine diffuse large B-cell lymphoma. Oncotarget. 2018;9(47):28586-98. 213. Cohen LA, Powers B, Amin S, Desai D. Treatment of canine haemangiosarcoma with suberoylanilide hydroxamic acid, a histone deacetylase inhibitor. Vet Comp Oncol. 2004;2(4):243- 8. 214. Thamm DH. SAHA and hemangiosarcoma: another view. Vet Comp Oncol. 2005;3(2):101. 208

215. Neuhauser TS, Derringer GA, Thompson LD, Fanburg-Smith JC, Miettinen M, Saaristo A, et al. Splenic angiosarcoma: a clinicopathologic and immunophenotypic study of 28 cases. Mod Pathol. 2000;13(9):978-87. 216. Cioffi A, Reichert S, Antonescu CR, Maki RG. Angiosarcomas and other sarcomas of endothelial origin. Hematol Oncol Clin North Am. 2013;27(5):975-88. 217. Katogiritis A, Khanna C. Towards the delivery of precision veterinary cancer medicine. Vet Clin North Am Small Anim Pract. 2019;49(5):809-18. 218. Rao SR, Somarelli JA, Altunel E, Selmic LE, Byrum M, Sheth MU, et al. From the clinic to the bench and back again in one dog year: How a cross-species pipeline to identify new treatments for sarcoma illuminates the path forward in precision medicine. Front Oncol. 2020;10:117. 219. Schwartzberg L, Kim ES, Liu D, Schrag D. Precision oncology: Who, how, what, when, and when not? Am Soc Clin Oncol Educ Book. 2017;37:160-9. 220. Galanina N, Bejar R, Choi M, Goodman A, Wieduwilt M, Mulroney C, et al. Comprehensive genomic profiling reveals diverse but actionable molecular portfolios across hematologic malignancies: Implications for next generation clinical trials. Cancers (Basel). 2018;11(1). 221. Mukherjee S. Genomics-guided immunotherapy for precision medicine in cancer. Cancer Biother Radiopharm. 2019;34(8):487-97. 222. Morganti S, Tarantino P, Ferraro E, D’Amico P, Viale G, Trapani D, et al. Complexity of genome sequencing and reporting: Next generation sequencing (NGS) technologies and implementation of precision medicine in real life. Crit Rev Oncol Hematol. 2019;133:171-82. 223. Takeda M, Sakai K, Takahama T, Fukuoka K, Nakagawa K, Nishio K. New Era for Next- Generation Sequencing in Japan. Cancers (Basel). 2019;11(6). 224. Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, et al. Integrative genomics viewer. Nat Biotechnol. 2011;29(1):24-6. 225. Blokzijl F, Janssen R, van Boxtel R, Cuppen E. MutationalPatterns: comprehensive genome- wide analysis of mutational processes. Genome Med. 2018;10(1):33. 226. Tate JG, Bamford S, Jubb HC, Sondka Z, Beare DM, Bindal N, et al. COSMIC: the Catalogue Of Somatic Mutations In Cancer. Nucleic Acids Res. 2019;47(D1):D941-d7. 227. Olshen AB, Venkatraman ES, Lucito R, Wigler M. Circular binary segmentation for the analysis of array‐based DNA copy number data. Biostatistics. 2004;5(4):557-72. 228. Knudson AG. Hereditary cancer: two hits revisited. J Cancer Res Clin Oncol. 1996;122(3):135- 40. 229. Knudson AG. Cancer genetics. Am J Med Genet. 2002;111(1):96-102. 230. Alberts B. Molecular biology of the cell. 6th ed. New York: Garland Science, Taylor and Francis Group; 2015. 1465 p. 231. Pease JC, Tirnauer JS. Mitotic spindle misorientation in cancer--out of alignment and into the fire. J Cell Sci. 2011;124(Pt 7):1007-16. 232. Giladi M, Schneiderman RS, Voloshin T, Porat Y, Munster M, Blat R, et al. Mitotic spindle disruption by alternating electric fields leads to improper chromosome segregation and mitotic catastrophe in cancer cells. Sci Rep. 2015;5:18046. 233. Noatynska A, Gotta M, Meraldi P. Mitotic spindle (DIS)orientation and DISease: cause or consequence? J Cell Biol. 2012;199(7):1025-35. 209

234. Kennedy K, Thomas R, Durrant J, Jiang T, Motsinger-Reif A, Breen M. Genome-wide DNA copy number analysis and targeted transcriptional analysis of canine histiocytic malignancies identifies diagnostic signatures and highlights disruption of spindle assembly complex. Chromosome Res. 2019;27(3):179-202. 235. Sanchez-Vega F, Mina M, Armenia J, Chatila WK, Luna A, La KC, et al. Oncogenic signaling pathways in The Cancer Genome Atlas. Cell. 2018;173(2):321-37.e10. 236. Sandri C, Caccavari F, Valdembri D, Camillo C, Veltel S, Santambrogio M, et al. The R- Ras/RIN2/Rab5 complex controls endothelial cell adhesion and morphogenesis via active integrin endocytosis and Rac signaling. Cell Res. 2012;22(10):1479-501. 237. Tang FH, Chang WA, Tsai EM, Tsai MJ, Kuo PL. Investigating novel genes potentially involved in endometrial adenocarcinoma using Next-Generation Sequencing and bioinformatic approaches. Int J Med Sci. 2019;16(10):1338-48. 238. Tang J, Yang Q, Cui Q, Zhang D, Kong D, Liao X, et al. Weighted gene correlation network analysis identifies RSAD2, HERC5, and CCL8 as prognostic candidates for breast cancer. J Cell Physiol. 2020;235(1):394-407. 239. Zhou H, Zhang C, Li H, Chen L, Cheng X. A novel risk score system of immune genes associated with prognosis in endometrial cancer. Cancer Cell Int. 2020;20:240. 240. Fejzo MS, Anderson L, Chen HW, Guandique E, Kalous O, Conklin D, et al. Proteasome ubiquitin receptor PSMD4 is an amplification target in breast cancer and may predict sensitivity to PARPi. Gene Chromosomes Canc. 2017;56(8):589-97. 241. Diederichs S, Bartsch L, Berkmann JC, Fröse K, Heitmann J, Hoppe C, et al. The dark matter of the cancer genome: aberrations in regulatory elements, untranslated regions, splice sites, non- coding RNA and synonymous mutations. EMBO Mol Med. 2016;8(5):442-57. 242. Deshpande A, Pastore A, Deshpande AJ, Zimmermann Y, Hutter G, Weinkauf M, et al. 3'UTR mediated regulation of the cyclin D1 proto-oncogene. Cell Cycle. 2009;8(21):3592-600. 243. Park HJ, Ji P, Kim S, Xia Z, Rodriguez B, Li L, et al. 3′ UTR shortening represses tumor- suppressor genes in trans by disrupting ceRNA crosstalk. Nat Genet. 2018;50(6):783-9. 244. Amado-Azevedo J, Reinhard NR, van Bezu J, van Nieuw Amerongen GP, van Hinsbergh VWM, Hordijk PL. The minor histocompatibility antigen 1 (HMHA1)/ArhGAP45 is a RacGAP and a novel regulator of endothelial integrity. Vascul Pharmacol. 2018;101:38-47. 245. Nam JY, Kim NK, Kim SC, Joung JG, Xi R, Lee S, et al. Evaluation of somatic copy number estimation tools for whole-exome sequencing data. Brief Bioinform. 2016;17(2):185-92. 246. Zare F, Dow M, Monteleone N, Hosny A, Nabavi S. An evaluation of copy number variation detection tools for cancer using whole exome sequencing data. BMC Bioinformatics. 2017;18(1):286. 247. Kadalayil L, Rafiq S, Rose-Zerilli MJ, Pengelly RJ, Parker H, Oscier D, et al. Exome sequence read depth methods for identifying copy number changes. Brief Bioinform. 2015;16(3):380-92. 248. Davis TA, Loos B, Engelbrecht AM. AHNAK: The giant jack of all trades. Cell Signal. 2014;26(12):2683-93. 249. Chen B, Wang J, Dai D, Zhou Q, Guo X, Tian Z, et al. AHNAK suppresses tumour proliferation and invasion by targeting multiple pathways in triple-negative breast cancer. J Exp Clin Cancer Res. 2017;36(1):65.

210

250. Dumitru CA, Bankfalvi A, Gu X, Zeidler R, Brandau S, Lang S. AHNAK and inflammatory markers predict poor survival in laryngeal carcinoma. PLoS One. 2013;8(2):e56420. 251. Davis T, van Niekerk G, Peres J, Prince S, Loos B, Engelbrecht AM. Doxorubicin resistance in breast cancer: A novel role for the human protein AHNAK. Biochem Pharmacol. 2018;148:174-83. 252. Cole PA. Association of canine splenic hemangiosarcomas and hematomas with nodular lymphoid hyperplasia or siderotic nodules. J Vet Diagn Invest. 2012;24(4):759-62. 253. Herman EJ, Stern AW, Fox RJ, Dark MJ. Understanding the efficiency of splenic hemangiosarcoma diagnosis using Monte Carlo simulations. Vet Pathol. 2019;56(6):856-9. 254. Noreen R, Chien CC, Delugin M, Yao S, Pineau R, Hwu Y, et al. Detection of collagens in brain tumors based on FTIR imaging and chemometrics. Anal Bioanal Chem. 2011;401(3):845-52. 255. Ukkonen H, Vuokila S, Mikkonen JJW, Dekker H, Schulten E, Bloemena E, et al. Biochemical changes in irradiated oral mucosa: A FTIR spectroscopic study. Biosensors (Basel). 2019;9(1). 256. Akalin A, Ergin A, Remiszewski S, Mu X, Raz D, Diem M. Resolving Interobserver discrepancies in lung cancer diagnoses by spectral histopathology. Arch Pathol Lab Med. 2019;143(2):157-73. 257. Notarstefano V, Sabbatini S, Conti C, Pisani M, Astolfi P, Pro C, et al. Investigation of human pancreatic cancer tissues by Fourier Transform Infrared Hyperspectral Imaging. J Biophotonics. 2020;13(4):e201960071. 258. Das K, Kendall C, Isabelle M, Fowler C, Christie-Brown J, Stone N. FTIR of touch imprint cytology: a novel tissue diagnostic technique. J Photochem Photobiol B. 2008;92(3):160-4. 259. Urboniene V, Pucetaite M, Jankevicius F, Zelvys A, Sablinskas V, Steiner G. Identification of kidney tumor tissue by infrared spectroscopy of extracellular matrix. J Biomed Opt. 2014;19(8):087005. 260. Tian P, Zhang W, Zhao H, Lei Y, Cui L, Zhang Y, et al. Intraoperative detection of sentinel lymph node metastases in breast carcinoma by Fourier transform infrared spectroscopy. Br J Surg. 2015;102(11):1372-9. 261. Pucetaite M, Velicka M, Urboniene V, Ceponkus J, Bandzeviciute R, Jankevicius F, et al. Rapid intra-operative diagnosis of kidney cancer by attenuated total reflection infrared spectroscopy of tissue smears. J Biophotonics. 2018;11(5):e201700260. 262. Wood BR, Kiupel M, McNaughton D. Progress in Fourier transform infrared spectroscopic imaging applied to venereal cancer diagnosis. Vet Pathol. 2014;51(1):224-37. 263. Roy S, Perez-Guaita D, Andrew DW, Richards JS, McNaughton D, Heraud P, et al. Simultaneous ATR-FTIR based determination of malaria parasitemia, glucose and urea in whole blood dried onto a glass slide. Anal Chem. 2017;89(10):5238-45. 264. Heraud P, Chatchawal P, Wongwattanakul M, Tippayawat P, Doerig C, Jearanaikoon P, et al. Infrared spectroscopy coupled to cloud-based data management as a tool to diagnose malaria: a pilot study in a malaria-endemic country. Malar J. 2019;18(1):348. 265. Aksoy C, Guliyev A, Kilic E, Uckan D, Severcan F. Bone marrow mesenchymal stem cells in patients with beta thalassemia major: molecular analysis with attenuated total reflection-Fourier transform infrared spectroscopy study as a novel method. Stem Cells Dev. 2012;21(11):2000-11. 266. Pilling MJ, Bassan P, Gardner P. Comparison of transmission and transflectance mode FTIR imaging of biological tissue. Analyst. 2015;140(7):2383-92.

211

267. Ohta N, Watanabe T, Ito T, Kubota T, Suzuki Y, Ishida A, et al. Clinical and pathological characteristics of organized hematoma. Int J Otolaryngol. 2013;2013:539642. 268. Zhang XX, Yin JH, Mao ZH, Xia Y. Discrimination of healthy and osteoarthritic articular cartilages by Fourier transform infrared imaging and partial least squares-discriminant analysis. J Biomed Opt. 2015;20(6):060501. 269. Mao ZH, Yin JH, Zhang XX, Wang X, Xia Y. Discrimination of healthy and osteoarthritic articular cartilage by Fourier transform infrared imaging and Fisher's discriminant analysis. Biomed Opt Express. 2016;7(2):448-53.

212

Appendices

Appendix 1

Patients profile summary.

Diagnosis Breed Sex Age Organ Growth Grade Pattern

HSA Australian Cattle Dog na* na spleen solid 3

HSA Beagle MS 7 spleen solid 1

HSA Border Collie MS 9 sternal mass solid 3

HSA Bull Mastief F na spleen solid 2

HSA Bull terrier na 14 spleen mixed 3

HSA Crossbred M 6 spleen typical 3

HSA Crossbred M 12 spleen typical 3

HSA Crossbred MS 11 spleen mixed 3

HSA Crossbred na na spleen mixed 2

HSA Crossbred na na spleen mixed 3

HSA Crossbred F 13 spleen typical 2

HSA Curly Retriever M 12 heart typical 2

HSA Curly Retriever na 12 spleen mixed 1

HSA Dachshund FS 6 spleen mixed 2

HSA Dalmatian MS 9 spleen mixed 3

HSA German Shepherd MS 7 spleen typical 2

HSA German Shepherd M 11 peritoneal mass cavernous 1

HSA German Shepherd MS 13 spleen mixed 3

HSA Golden Retriever na 7 spleen mixed 2

HSA Golden Retriever FS 14 spleen solid 1

HSA Golden Retriever M 14 spleen mixed 1

HSA Irish setters F 11 spleen typical 2

213

HSA Irish setters F 12 liver typical 2

HSA Maltese FS 11 spleen solid 2

HSA na na 9 spleen mixed 2

HSA na na na spleen solid 3

HSA na na na liver typical 1

HSA Rottweiller na na spleen mixed 3

HSA Springer spaniel MS 9 spleen typical 2

HSA Staffordshire bull terrier FS 9 spleen cavernous 1

HSA Stumpy tail cattle dog FS 14 spleen typical 2

HSA Terrier F 12 spleen mixed 2

Adenocarcinoma German Shepherd F 10 spleen na na

Apocrine gland German Shepherd MS 13 spleen na na tumor

Fibrohistiocytic Dachshund M 8 spleen na na nodule

Haemorrhage Maltese M 6 spleen na na

Haemorrhage na na na spleen na na

Hematoma Australian Cattle Dog F 11 spleen na na

Hematoma Beagle FS 12 spleen na na

Hematoma na FS na spleen na na

Hematoma Staffordshire bull terrier FS 10 spleen na na

Hematoma na na 16 spleen na na

Hepatocellular Crossbred na 13 spleen na na Carcinoma

Hepatocellular Jack Russell terrier MS na liver na na Carcinoma

Hepatocellular na na na liver na na Carcinoma

Hepatocellular na na na liver na na Carcinoma

214

Hepatocellular West Highland terrier FS 10 liver na na Carcinoma

Lymphoid Golden Retriever FS 14 spleen na na hyperplasia

Lymphoma Labrador Retriever MS na spleen na na

Lymphoma Rhodesian ridgeback M 12 spleen na na

Lymphoma Staffordshire bull terrier FS 11 spleen na na

Nodular Maltese FS 14 spleen na na hyperplasia

Normal spleen Crossbred M 13 spleen na na

Normal spleen French bulldog na na spleen na na

Normal spleen Terrier M na spleen na na

Questioned Labrador Retriever MS 9 spleen na na

Questioned na na na spleen na na

Splenitis Crossbred M 8 spleen na na

* na = not available/not applicable

215

Appendix 2

Boxplots of H-scores distribution for normal arterial and venous endothelial cells. The H-scores were assessed from lectin-immunohistochemical labelling using DSA-Datura stramonium, WGA- Wheat Germ Agglutinin and SNA-Sambucus nigra lectins and anti-human CD31, anti-human VTN and anti-human complement C7 antibodies. The H-scores of arterial endothelium and venous endothelium from two disease status groups (HSA vs. non-HSA) were compared, no statistically significant differences were observed. Note. AE = Arterial Endothelium, VE = Venous Endothelium.

216

Appendix 3

Downloaded whole exome sequence data (n = 45).

Group Breed SRR number Bio Project Source

non-HSA Golden Retriever SRR5519101 PRJNA385156 [172] (Control) SRR5519102

SRR5519103

SRR5519104

SRR5519106

SRR5519107

Labrador retriever SRR5519108

SRR5519109

SRR5519110

SRR5519111

SRR5519112

SRR5519113

Poodle SRR5519114

SRR5519115

SRR5519116

SRR5519117

SRR5519118

SRR5519119

SRR5519120

SRR5519093

SRR5519094

SRR5519095

SRR5519096

SRR5519097

217

SRR5519098

SRR5519099

SRR5519100

SRR5519105

HSA Golden Retriever SRR6277173 PRJNA417727 [83]

SRR6277211 PRJNA417727

SRR6277175 PRJNA417727

SRR6277177 PRJNA417727

SRR6277179 PRJNA417727

Labrador SRR6277206 PRJNA417727 Retriever

German Shepherd SRR6277209 PRJNA417727

German Shepherd SRR6277192 PRJNA417727 mix

Bichon Frise SRR6277191 PRJNA417727

Nova Scotia Duck SRR6277196 PRJNA417727 Tolling Retriever

Beagle mix SRR6277193 PRJNA417727

Rottweiler mix SRR6277188 PRJNA417727

Soft-coated SRR6277185 PRJNA417727 wheaten terrier

na SRR6277183 PRJNA417727

Beagle mix SRR6277214 PRJNA417727

Portugese water SRR6277202 PRJNA417727 dog

Bishon Frise SRR6277199 PRJNA417727

218

Appendix 4

Exome wide significantly HSA associated small insertions and deletions (indels) and annotated genes.

chr_pos_allele(s) Gene AF AF HSA HSA p-value Case Control US Aus

5_77412493_ MARVELD3 83.33 2.35 (15+0)19 (5+0)/5 1.74E-11 AA/-

10_23657143_ DESI1 91.07 53.57 (16+5)/21 (7+0)/7 8.41E-09 TT/-

15_18558884_ KRR1 89.29 12.02 (21+0)/21 (4+0)/7 5.25E-08 AAAAA/-

13_5659244_ DPYS 53.57 4.34 (15+0)/21 (0+7)/7 3.97E-07 TTGG/-

11_18011622_ ISOC1 85.19 15.07 (19+0)/20 (4+0)/7 7.16E-07 AA/-

18_12018551_ SFRP4 94.64 60.68 (19+2)/21 (6+1)/7 1.70E-06 TT/-

20_36972378_ SFMBT1 78.57 55.77 (16+5)/21 (0+7)/7 2.56E-05 AGAG/-

6_49476645_ GPR88 92.31 29.11 (19+0)/19 (5+0)/7 1.75E-04 AA/-

16_33555270_ PPP2CB 69.64 96.28 (8+13)/21 (3+4)/7 4.15E-04 TG/-

21_27089869_ OR51C1P 51.79 55.33 (1+20)/21 (0+7)/7 5.08E-04 TCAGT/-

1_108350589_ EHD2 28.57 5.35 (8+0)/21 (0+0)/7 9.45E-04 TTTT/-

30_32297987_ FEM1B 60.71 12.39 (17+0)/21 (0+0)/7 1.22E-03 AAAA/-

219

35_7519752_ DSP 50.0 0 (3+2)/15 (7+0)/7 1.76E-03 CAGCG/-

35_7519760_ DSP 47.83 0 (3+2)/16 (7+0)/7 2.44E-03 TGAGAAA/-

1_102181775_ SBK3 47.62 4.74 (8+0)/17 (2+0)/4 3.55E-03 ATAA/-

6_9233200_ MEPCE 87.50 98.21 (14+7)/21 (7+0)/21 7.38E-03 CTCCGGCCC CGCCCTTCAG/-

2_73579846_ DHDDS 82.14 100 (17+4)/21 (1+6)/7 8.90E-03 CTT/-

7_29257444_ NME7 83.93 93.44 (12+9)/21 (7+0)/7 1.11E-02 GC/-

32_443981_ U1 65.22 39.83 (9+5)/16 (2+3)/7 1.17E-02 TG/-

35_23691735_ H2BC1 51.92 32.0 (6+6)/19 (3+3)/7 2.13E-02 CA/-

34_7277813_ UBE2QL1 62.96 26.71 (16+0)/20 (0+1)/7 3.04E-02 GAAAG/-

30_21662183_ MNS1 96.43 84.62 (20+1)/21 (6+1)/7 4.02E-02 GC/-

220

Appendix 5 (part 1)

Allele frequencies (%) of annotated single nucleotide variants identified using exome variant association scan approach. Cases with higher allele segregation when compared to controls are shown. Number of cases carrying the alleles in US samples and Australian samples demonstrated by: (Homozygous+Heterozygous)/Total.

AF HSA HSA Position AF Chr (bp) Allele Gene Case Control US Aus p value 1 102123402 C/G ZNF865 48.15 0 (1+19)/20 (0+7)/7 4.80E-13 2 77345896 T/C HSPG2 42.59 1.79 (0+18)/21 (0+5)/7 2.72E-10 2 77345896 T/C HSPG2 42.59 1.79 (0+18)/21 (0+5)/7 2.72E-10 2 77345897 A/G HSPG2 40.74 1.79 (0+18)/20 (0+4)/7 5.17E-10 2 77345897 A/G HSPG2 40.74 1.79 (0+18)/20 (0+4)/7 5.17E-10 5 77412492 G/A MARVELD3 83.3 2.35 (15+0)/19 (5+0)/5 1.74E-11 5 32330465 A/G ACADVL 92.59 6.7 (20+0)/20 (5+0)/7 4.68E-11 5 32330465 A/G PLSCR3 92.59 6.7 (20+0)/20 (5+0)/7 4.68E-11 5 32330465 A/G TNK1 92.59 6.7 (20+0)/20 (5+0)/7 4.68E-11 5 32330470 G/A ACADVL 92.0 9.3 (19+0)/19 (4+0)/6 1.80E-08 5 32330470 G/A PLSCR3 92.0 9.3 (19+0)/19 (4+0)/6 1.80E-08 5 32330470 G/A TNK1 92.0 9.3 (19+0)/19 (4+0)/6 1.80E-08 6 39891567 G/A FBXL16 91.61 20.87 (16+0)/17 (6+0)/7 2.47E-08 6 39891567 G/A JMJD8 91.61 20.87 (16+0)/17 (6+0)/7 2.47E-08 6 39891567 G/A JMJD8 91.61 20.87 (16+0)/17 (6+0)/7 2.47E-08 6 39891567 G/A STUB1 91.61 20.87 (16+0)/17 (6+0)/7 2.47E-08 6 39891567 G/A WDR24 91.61 20.87 (16+0)/17 (6+0)/7 2.47E-08 8 72320921 A/C AKT1 35.71 0 (0+14)/21 (0+6)/7 5.23E-08 9 49748755 A/C ABO 46.43 6.12 (0+20)/21 (0+6)/7 5.36E-09 10 26681571 G/C SOX10 50.0 11.54 (0+21)/21 (0+7)/21 1.07E-08 10 26681601 G/C SOX10 50.0 11.37 (0+21)/21 (0+7)/7 1.49E-08 10 17055863 C/G HDAC10 50.0 0 (3+13)/17 (0+5)/7 2.33E-08 10 17055863 C/G PANX2 50.0 0 (3+13)/17 (0+5)/7 2.33E-08 11 2055922 C/G HNRNPH1 53.57 6.40 (4+16)/21 (0+6)/7 4.52E-09 11 2055922 C/G RUFY1 53.57 6.40 (4+16)/21 (0+6)/7 4.52E-09 14 39516858 C/T CBX3 58.93 8.16 (6+14)/21 (0+7)/7 1.94E-09

221

15 52226379 T/A FGB 95.64 22.13 (19+2)/21 (6+1)/7 1.45E-08 15 18558883 T/A KRR1 89.29 12.02 (21+0)/21 (4+0)/7 5.25E-08 15 18558883 T/A SALL2 89.29 12.02 (21+0)/21 (4+0)/7 5.25E-08 16 46742570 A/C RWDD4 98.21 57.14 (0+1)/21 (0+0)/7 9.43E-09 20 39437199 T/A TRAIP 56.52 0.004 (11+0)/17 (2+0)/6 2.25E-08 21 26043319 T/A NUMA1 88.89 10.66 (19+0)/20 (5+0)/7 6.17E-08 23 19762376 T/A NR1D2 97.92 9.0 (18+0)/18 (5+1)/6 8.67E-09 24 18184314 A/C AVP 46.43 4.55 (3+15)/21 (1+3)/7 3.14E-10 24 42409111 A/C RBM38 37.04 4.17 (0+16)/20 (0+4)/7 5.03E-09 25 28042882 T/A MSRA 85.19 6.9 (20+0)/21 (3+0)/6 1.10E-11 26 22615544 T/A GASL1 85.71 5.61 (13+0)/14 (5+0)/7 4.70E-09 27 24589659 A/T ETNK1 94.44 10.86 (18+1)/20 (7+0)/7 3.74E-09 28 4228581 T/C BMS1 98.15 44.08 (19+1)/20 (7+0)/7 1.46E-08

222

Appendix 5 (part 2)

Allele frequencies (%) of annotated single nucleotide variants identified using exome variant association scan approach. Cases with lower allele segregation when compared to controls are shown. Number of cases carrying the alleles in US samples and Australian samples demonstrated by: (Homozygous+Heterozygous)/Total.

Chr Position Allele Gene AF AF HSA US HSA p value (bp) Case Control Aus

1 106010837 C/T KLK1 15.38 50.0 (0+4)/19 (0+4)/7 4.70E-08

1 106010837 C/T KLK2 15.38 50.0 (0+4)/19 (0+4)/7 4.70E-08

1 106010837 C/T KLK4 15.38 50.0 (0+4)/19 (0+4)/7 4.70E-08

2 10296439 G/T ARMC3 12.50 50.0 (0+6)/21 (0+1)/21 2.10E-08

3 72537104 A/G PDS5A 19.64 50.0 (4+0)/21 (0+7)/7 1.61E-10

3 71251182 A/T JAKMIP1 1.92 41.07 (0+1)/19 (0+0)/7 1.40E-09

3 71251189 G/T JAKMIP1 1.92 41.07 (0+1)/19 (0+0)/7 1.40E-09

3 72537445 T/C PDS5A 19.64 47.32 (0+4)/21 (0+7)/7 6.40E-08

5 77232978 T/G TLE7 1.79 39.29 (0+1)/21 (0+0)/7 1.65E-08

5 77232993 A/G TLE7 1.79 39.29 (0+1)/21 (0+0)/7 1.65E-08

8 22594357 C/G FANCM 7.14 46.43 (0+3)/21 (0+1)/7 3.36E-10

8 22594357 C/G FANCM 7.14 46.43 (0+3)/21 (0+1)/7 3.36E-10

8 22594357 C/G MIS18BP1 7.14 46.43 (0+3)/21 (0+1)/7 3.36E-10

8 22594350 T/G FANCM 8.93 46.43 (0+3)/21 (0+2)/7 8.74E-09

8 22594350 T/G MIS18BP1 8.93 46.43 (0+3)/21 (0+2)/7 8.74E-09

8 22594311 G/T FANCM 10.71 46.43 (0+4)/21 (0+2)/7 2.63E-08

8 22594311 G/T FANCM 10.71 46.43 (0+4)/21 (0+2)/7 2.63E-08

8 22594313 A/G FANCM 10.71 46.43 (0+4)/21 (0+2)/7 2.63E-08

223

8 22594313 A/G FANCM 10.71 46.43 (0+4)/21 (0+2)/7 2.63E-08

8 22594311 G/T MIS18BP1 10.71 46.43 (0+4)/21 (0+2)/7 2.63E-08

8 22594313 A/G MIS18BP1 10.71 46.43 (0+4)/21 (0+2)/7 2.63E-08

9 2661907 C/T CYTH1 57.14 100 (3+18)/21 (1+6)/7 1.18E-11

9 5234418 T/G GGA3 10.71 48.21 (0+4)/21 (0+2)/7 1.52E-08

9 5234418 T/G GGA3 10.71 48.21 (0+4)/21 (0+2)/7 1.52E-08

9 5234418 T/G MIF4GD 10.71 48.21 (0+4)/21 (0+2)/7 1.52E-08

9 5234418 T/G MRPS7 10.71 48.21 (0+4)/21 (0+2)/7 1.52E-08

9 19299928 G/T HDAC5 2.08 39.29 (0+1)/16 (0+0)/7 7.05E-08

10 12356994 A/G PTPRB 2.08 46.43 (0+0)/17 (0+1)/6 7.81E-11

10 12356994 A/G PTPRB 2.08 46.43 (0+0)/17 (0+1)/6 7.81E-11

10 23658488 T/C DESI1 1.79 41.07 (0+1)/21 (0+0)/7 4.39E-09

10 23658488 T/C XRCC6 1.79 41.07 (0+1)/21 (0+0)/7 4.39E-09

10 23658488 T/C XRCC6 1.79 41.07 (0+1)/21 (0+0)/7 4.39E-09

10 23657142 A/G DESI1 8.93 46.43 (0+5)/21 (0+0)/7 8.41E-09

10 23657152 G/C DESI1 8.93 46.43 (0+5)/21 (0+0)/7 8.41E-09

10 23657154 G/C DESI1 8.93 46.43 (0+5)/21 (0+0)/7 8.41E-09

10 23657142 A/G XRCC6 8.93 46.43 (0+5)/21 (0+0)/7 8.41E-09

10 23657152 G/C XRCC6 8.93 46.43 (0+5)/21 (0+0)/7 8.41E-09

10 23657154 G/C XRCC6 8.93 46.43 (0+5)/21 (0+0)/7 8.41E-09

15 15544670 C/T RPS8 1.85 44.64 (0+1)/20 (0+0)/7 7.43E-10

15 15544670 C/T SNORD38A 1.85 44.64 (0+1)/20 (0+0)/7 7.43E-10

15 15544670 C/T SNORD38B 1.85 44.64 (0+1)/20 (0+0)/7 7.43E-10

224

15 15544670 C/T SNORD39 1.85 44.64 (0+1)/20 (0+0)/7 7.43E-10

15 15544670 C/T SNORD46 1.85 44.64 (0+1)/20 (0+0)/7 7.43E-10

15 18698940 G/A KRR1 8.93 46.3 (0+1)/21 (0+4)/7 4.75E-09

15 18698940 G/A KRR1 8.93 46.3 (0+1)/21 (0+4)/7 4.75E-09

15 18698942 C/T KRR1 8.93 46.3 (0+1)/21 (0+4)/7 4.75E-09

15 18698942 C/T KRR1 8.93 46.3 (0+1)/21 (0+4)/7 4.75E-09

15 15544646 G/A RPS8 5.36 44.64 (0+2)/21 (0+1)/7 2.58E-08

15 15544648 G/A RPS8 5.36 44.64 (0+2)/21 (0+1)/7 2.58E-08

15 15544646 G/A SNORD38A 5.36 44.64 (0+2)/21 (0+1)/7 2.58E-08

15 15544648 G/A SNORD38A 5.36 44.64 (0+2)/21 (0+1)/7 2.58E-08

15 15544646 G/A SNORD38B 5.36 44.64 (0+2)/21 (0+1)/7 2.58E-08

15 15544648 G/A SNORD38B 5.36 44.64 (0+2)/21 (0+1)/7 2.58E-08

15 15544646 G/A SNORD39 5.36 44.64 (0+2)/21 (0+1)/7 2.58E-08

15 15544648 G/A SNORD39 5.36 44.64 (0+2)/21 (0+1)/7 2.58E-08

15 15544646 G/A SNORD46 5.36 44.64 (0+2)/21 (0+1)/7 2.58E-08

15 15544648 G/A SNORD46 5.36 44.64 (0+2)/21 (0+1)/7 2.58E-08

16 14366546 C/T ZN777 1.79 50.0 (0+1)/21 (0+0)/7 1.23E-13

16 14366553 C/T ZN777 1.79 50.0 (0+1)/21 (0+0)/7 1.23E-13

16 14366562 G/T ZN777 1.85 50.0 (0+1)/21 (0+0)/7 3.64E-13

16 14366563 G/T ZN777 1.85 50.0 (0+1)/21 (0+0)/6 3.64E-13

16 14366516 T/C ZNF777 1.79 44.23 (0+1)/21 (0+0)/7 1.64E-10

16 46742573 C/T RWDD4 1.79 42.86 (0+1)/21 (0+0)/7 9.43E-09

17 59571075 A/G PRPF3 7.14 51.79 (0+4)/21 (0+0)/7 1.34E-10

225

17 59411591 G/A PLEKHO1 2.0 48.21 (0+0)/18 (0+1)/7 8.63E-10

18 12018565 T/A SFRP4 3.57 40.65 (0+1)/21 (0+1)/7 6.08E-08

18 12018584 G/C SFRP4 1.79 38.55 (0+0)/21 (0+0)/7 6.49E-08

18 54104345 A/G AHNAK 1.79 42.21 (0+0)/21 (0+1)/7 8.41E-08

18 45362284 C/T AP2A2 3.57 46.47 (0+1)/21 (0+1)/6 9.36E-08

21 40941597 T/C TSG101 5.36 50.0 (0+3)/21 (0+0)/7 7.95E-10

24 46178769 G/A LSM14B 1.92 46.15 (0+0)/19 (0+1)/7 4.30E-11

24 21379094 A/T PPRG1 3.57 89.09 (0+2)/21 (0+0)/7 1.35E-10

24 21379094 A/T TTLL9 3.57 89.09 (0+2)/21 (0+0)/7 1.35E-10

24 32910604 G/A WFDC10A 3.57 43.21 (0+2)/21 (0+0)/7 1.93E-08

26 29775195 A/C MRPL40 1.79 50.0 (0+1)/21 (0+0)/7 7.28E-12

26 29775156 G/A MRPL40 3.57 50.0 (19+2)/21 (7+0)/7 4.00E-11

26 29775207 C/T HIRA 3.57 50.0 (0+2)/21 (0+0)/7 1.28E-10

26 29775207 C/T MRPL40 3.57 50.0 (0+2)/21 (0+0)/7 1.28E-10

26 29775177 G/A HIRA 5.36 50.0 (0+3)/21 (0+0)/7 3.26E-10

26 29775177 G/A MRPL40 5.36 50.0 (0+3)/21 (0+0)/7 3.26E-10

26 29775136 C/T HIRA 1.79 42.86 (0+1)/21 (0+0)/7 1.28E-08

26 29775136 C/T MRPL40 1.79 42.86 (0+1)/21 (0+0)/7 1.28E-08

26 27303267 G/C G6PC3 5.36 42.98 (0+1)/21 (0+2)/7 7.05E-08

29 28067178 A/G ZNF704 1.79 55.36 (0+1)/21 (0+0)/7 1.99E-12

29 15409348 G/A PDE7A 1.79 40.0 (0+1)/21 (0+0)/7 3.13E-09

29 15409350 C/A PDE7A 1.79 40.0 (0+1)/21 (0+0)/7 3.13E-09

29 15409381 G/C PDE7A 1.79 40.0 (0+1)/21 (0+0)/7 3.13E-09

226

29 5339956 C/T ATP6V1H 3.57 50.0 (0+0)/21 (0+2)/7 4.76E-09

29 5339934 C/G ATP6V1H 3.57 46.0 (0+0)/21 (0+2)/7 6.10E-08

29 5339935 A/G ATP6V1H 3.57 46.0 (0+0)/21 (0+2)/7 6.10E-08

30 37816883 G/C CYP1A2 1.79 43.25 (0+1)/21 (0+0)/7 2.87E-09

30 37816875 A/C CYP1A2 1.92 43.25 (0+1)/19 (0+0)/7 3.97E-08

34 34385269 A/G MYNN 1.79 40.0 (0+1)/20 (0+0)/7 2.34E-09

38 725236 C/T ETNK2 1.85 48.21 (0+1)/19 (0+0)/7 2.12E-09

38 725229 G/A ETNK2 1.85 46.43 (0+1)/20 (0+0)/7 6.70E-09

38 725296 G/A ETNK2 8.93 53.57 (0+5)/21 (0+0)/7 3.56E-08

38 725301 A/T ETNK2 8.93 53.57 (0+5)/21 (0+0)/7 3.56E-08

38 725256 G/A ETNK2 5.56 50.0 (0+3)/20 (0+0)/7 5.96E-08

227

Appendix 5 (part 3)

Consequence and predicted impact of annotated significantly HSA associated single nucleotide variants.

Chr Position Allele Gene Consequence Impact

1 102123402 C/G ZNF865 missense_variant MODERATE

1 106010837 C/T KLK1 intron_variant MODIFIER

1 106010837 C/T KLK2 intron_variant MODIFIER

1 106010837 C/T KLK4 downstream_gene_variant MODIFIER

2 77345896 T/C HSPG2 synonymous_variant LOW

2 77345896 T/C HSPG2 intron_variant MODIFIER

2 77345897 A/G HSPG2 intron_variant MODIFIER

2 77345897 A/G HSPG2 missense_variant MODERATE

2 10296439 G/T ARMC3 intron_variant MODIFIER

3 72537104 A/G PDS5A intron_variant MODIFIER

3 71251182 A/T JAKMIP1 intron_variant MODIFIER

3 71251189 G/T JAKMIP1 intron_variant MODIFIER

3 72537445 T/C PDS5A intron_variant MODIFIER

5 77412492 G/A MARVELD3 missense_variant MODERATE

5 32330465 A/G ACADVL downstream_gene_variant MODIFIER

5 32330465 A/G PLSCR3 stop_lost_variant HIGH

5 32330465 A/G TNK1 downstream_gene_variant MODIFIER

228

5 77232978 T/G TLE7 missense_variant MODERATE

5 77232993 A/G TLE7 synonymous_variant LOW

5 32330470 G/A ACADVL downstream_gene_variant MODIFIER

5 32330470 G/A PLSCR3 missense_variant MODERATE

5 32330470 G/A TNK1 downstream_gene_variant MODIFIER

6 39891567 G/A FBXL16 downstream_gene_variant MODIFIER

6 39891567 G/A JMJD8 upstream_gene_variant MODIFIER

6 39891567 G/A JMJD8 upstream_gene_variant MODIFIER

6 39891567 G/A STUB1 downstream_gene_variant MODIFIER

6 39891567 G/A WDR24 3_prime_UTR_variant MODIFIER

8 22594357 C/G FANCM intron_variant MODIFIER

8 22594357 C/G FANCM missense_variant MODERATE

8 22594357 C/G MIS18BP1 intron_variant MODIFIER

8 22594350 T/G FANCM intron_variant MODIFIER

8 22594350 T/G MIS18BP1 intron_variant MODIFIER

8 22594311 G/T FANCM intron_variant MODIFIER

8 22594311 G/T FANCM stop_gained HIGH

8 22594313 A/G FANCM intron_variant MODIFIER

8 22594313 A/G FANCM synonymous_variant LOW

8 22594311 G/T MIS18BP1 intron_variant MODIFIER

229

8 22594313 A/G MIS18BP1 intron_variant MODIFIER

8 72320921 A/C AKT1 intron_variant MODIFIER

9 2661907 C/T CYTH1 intron_variant MODIFIER

9 49748755 A/C ABO missense_variant MODERATE

9 5234418 T/G GGA3 upstream_gene_variant MODIFIER

9 5234418 T/G GGA3 upstream_gene_variant MODIFIER

9 5234418 T/G MIF4GD downstream_gene_variant MODIFIER

9 5234418 T/G MRPS7 missense_variant MODERATE

9 19299928 G/T HDAC5 downstream_gene_variant MODIFIER

10 12356994 A/G PTPRB intron_variant MODIFIER

10 12356994 A/G PTPRB upstream_gene_variant MODIFIER

10 23658488 T/C DESI1 upstream_gene_variant MODIFIER

10 23658488 T/C XRCC6 missense_variant MODERATE

10 23658488 T/C XRCC6 intron_variant MODIFIER

10 23657142 A/G DESI1 upstream_gene_variant MODIFIER

10 23657152 G/C DESI1 upstream_gene_variant MODIFIER

10 23657154 G/C DESI1 upstream_gene_variant MODIFIER

10 23657142 A/G XRCC6 missense_variant MODERATE

10 23657152 G/C XRCC6 synonymous_variant LOW

10 23657154 G/C XRCC6 missense_variant MODERATE

230

10 26681571 G/C SOX10 synonymous_variant LOW

10 26681601 G/C SOX10 synonymous_variant LOW

10 17055863 C/G HDAC10 intron_variant MODIFIER

10 17055863 C/G PANX2 synonymous_variant LOW

11 2055922 C/G HNRNPH1 missense_variant MODERATE

11 2055922 C/G RUFY1 intron_variant MODIFIER

14 39516858 C/T CBX3 missense_variant MODERATE

15 15544670 C/T RPS8 missense_variant MODERATE

15 15544670 C/T SNORD38A upstream_gene_variant MODIFIER

15 15544670 C/T SNORD38B upstream_gene_variant MODIFIER

15 15544670 C/T SNORD39 downstream_gene_variant MODIFIER

15 15544670 C/T SNORD46 upstream_gene_variant MODIFIER

15 18698940 G/A KRR1 missense_variant MODERATE

15 18698940 G/A KRR1 intron_variant MODIFIER

15 18698942 C/T KRR1 synonymous_variant LOW

15 18698942 C/T KRR1 intron_variant MODIFIER

15 52226379 T/A FGB intron_variant MODIFIER

15 15544646 G/A RPS8 missense_variant MODERATE

15 15544648 G/A RPS8 synonymous_variant LOW

15 15544646 G/A SNORD38A upstream_gene_variant MODIFIER

231

15 15544648 G/A SNORD38A upstream_gene_variant MODIFIER

15 15544646 G/A SNORD38B upstream_gene_variant MODIFIER

15 15544648 G/A SNORD38B upstream_gene_variant MODIFIER

15 15544646 G/A SNORD39 downstream_gene_variant MODIFIER

15 15544648 G/A SNORD39 upstream_gene_variant MODIFIER

15 15544646 G/A SNORD46 upstream_gene_variant MODIFIER

15 15544648 G/A SNORD46 upstream_gene_variant MODIFIER

15 18558883 T/A KRR1 intron_variant MODIFIER

15 18558883 T/A SALL2 missense_variant MODERATE

16 14366546 C/T ZN777 missense_variant MODERATE

16 14366553 C/T ZN777 missense_variant MODERATE

16 14366562 G/T ZN777 missense_variant MODERATE

16 14366563 G/T ZN777 missense_variant MODERATE

16 14366516 T/C ZNF777 missense_variant MODERATE

16 46742570 A/C RWDD4 missense_variant MODERATE

16 46742573 C/T RWDD4 missense_variant MODERATE

17 59571075 A/G PRPF3 missense_variant MODERATE

17 59411591 G/A PLEKHO1 missense_variant MODERATE

18 12018565 T/A SFRP4 missense_variant MODERATE

18 12018584 G/C SFRP4 synonymous_variant LOW

232

18 54104345 A/G AHNAK intron_variant MODIFIER

18 45362284 C/T AP2A2 downstream_gene_variant MODIFIER

20 39437199 T/A TRAIP intron_variant MODIFIER

21 40941597 T/C TSG101 synonymous_variant LOW

21 26043319 T/A NUMA1 intron_variant MODIFIER

23 19762376 T/A NR1D2 intron_variant MODIFIER

24 46178769 G/A LSM14B intron_variant MODIFIER

24 21379094 A/T PPRG1 downstream_gene_variant MODIFIER

24 21379094 A/T TTLL9 intron_variant MODIFIER

24 18184314 A/C AVP missense_variant MODERATE

24 42409111 A/C RBM38 synonymous_variant LOW

24 32910604 G/A WFDC10A intron_variant MODIFIER

25 28042882 T/A MSRA intron_variant MODIFIER

26 29775195 A/C MRPL40 missense_variant MODERATE

26 29775156 G/A MRPL40 stop_gained_variant HIGH

26 29775207 C/T HIRA upstream_gene_variant MODIFIER

26 29775207 C/T MRPL40 missense_variant MODERATE

26 29775177 G/A HIRA upstream_gene_variant MODIFIER

26 29775177 G/A MRPL40 missense_variant MODERATE

26 22615544 T/A GASL1 5_prime_UTR_variant MODIFIER

233

26 29775136 C/T HIRA upstream_gene_variant MODIFIER

26 29775136 C/T MRPL40 synonymous_variant LOW

26 27303267 G/C G6PC3 intron_variant MODIFIER

27 24589659 A/T ETNK1 intron_variant MODIFIER

28 4228581 T/C BMS1 intron_variant MODIFIER

29 28067178 A/G ZNF704 stop_lost_variant HIGH

29 15409348 G/A PDE7A missense_variant MODERATE

29 15409350 C/A PDE7A synonymous_variant LOW

29 15409381 G/C PDE7A missense_variant MODERATE

29 5339956 C/T ATP6V1H intron_variant MODIFIER

29 5339934 C/G ATP6V1H intron_variant MODIFIER

29 5339935 A/G ATP6V1H intron_variant MODIFIER

30 37816883 G/C CYP1A2 upstream_gene_variant MODIFIER

30 37816883 G/C CYP1A2 intron_variant MODIFIER

30 37816875 A/C CYP1A2 upstream_gene_variant MODIFIER

30 37816875 A/C CYP1A2 intron_variant MODIFIER

34 34385269 A/G MYNN splice_region_variant LOW

38 725236 C/T ETNK2 3_prime_UTR_variant MODIFIER

38 725229 G/A ETNK2 3_prime_UTR_variant MODIFIER

38 725296 G/A ETNK2 3_prime_UTR_variant MODIFIER

234

38 725301 A/T ETNK2 3_prime_UTR_variant MODIFIER

38 725256 G/A ETNK2 3_prime_UTR_variant MODIFIER

235

Appendix 6

Frequent somatically mutated genes and the allele frequencies (%) in case (HSA) and control. Number of cases carrying the alleles in US HSA samples and Australian HSA samples demonstrated by: (Homozygous+Heterozygous)/Total.

GENE Chr_position Allele AF (%) p-value HSA Case Control US AUS TP53 5_32565114 C/T 0.0179 0.0000 0.0337 (0+1)/21 (0+0)/7 5_32563990 T/G 0.9444 1.0000 0.0369 (17+3)/20 (7+0)/7 5_32562479 G/GGA 0.6200 0.7561 0.0432 (10+4)/18 (3+1)/7 PIK3CA 34_12669287 C/A 0.0000 0.0957 0.0030 (0+0)/16 (0+0)/7 34_12651251 G/C 0.3333 0.8618 0.0034 (0+0)/1 (1+0)/2 34_12673499 T/A 0.3269 0.4895 0.0241 (3+4)/19 (3+1)/7 34_12662688 A/AT 0.3333 0.1371 0.0391 (2+0)/5 (1+0)/4 PTEN 26_37912737 C/A 0.0000 0.2759 0.0002 (0+0)/9 (0+0)/2 26_37853125 A/G 0.1964 0.0615 0.0017 (2+4)/15 (1+1)/7 26_37910175 C/G 0.0833 0.1866 0.0539 (0+4)/20 (0+0)/4 KIT 13_47174856 C/T 0.0000 0.6379 6e-5 (0+0)/2 (0+0)/2 13_47187798 C/T 0.0000 1.0000 0.0004 (0+0)/2 (0+0)/6 13_47120002 T/C 0.1500 0.5068 0.0010 (1+1)/9 (1+0)/1 13_47144179 G/C 0.0000 0.0796 0.0031 (0+0)/21 (0+0)/7 13_47187819 A/G 0.0476 0.2072 0.0041 (1+0)/14 (0+0)/6 13_47189659 C/T 0.0357 0.1408 0.0112 (0+1)/20 (0+1)/7 13_47144532 C/T 0.3214 0.1865 0.0234 (3+9)/21 (1+1)/7 13_47148652 A/AT 0.1389 0.2988 0.0290 (1+1)/13 (1+0)/5 PLCG1 24_29198747 A/C 0.5357 0.3996 0.0520 (9+6)/21 (2+2)/7 24_29212918 G/C 0.0588 0.0000 0.0747 (1+0)/13 (3+0)/13 KDR 13_47466618 C/T 0.0000 0.1776 2e-5 (0+0)/18 (0+0)/7 13_47473824 G/A 0.2143 0.0449 4e-5 (1+6)/21 (2+0)/7 13_47460841 A/G 0.1389 0.0082 8e-5 (1+1)/12 (1+0)/6 13_47477985 C/A 0.0000 0.1327 0.0001 (0+0)/7 (0+0)/21 13_47463782 G/T 0.0000 0.1824 0.0001 (0+0)/14 (0+0)/5 13_47474691 G/A 0.8333 0.5816 0.0001 (15+3)/20 (6+0)/7 13_47460135 T/C 0.6964 0.4385 0.0002 (6+7)/21 (6+0)/7 13_47471294 A/G 0.8571 0.6633 0.0016 (16+4)/21 (6+0)/7 13_47469217 A/G 0.5893 0.3959 0.0058 (8+8)/21 (4+1)/7 13_47478328 G/A 0.6667 0.4306 0.0115 (5+0)/9 (5+0)/6 13_47450511 G/A 0.5714 0.4078 0.0198 (9+6)/21 (4+0)/7 13_47467611 G/T 0.7143 0.5633 0.0270 (11+6)/21 (6+0)/7 MYC 13_25200798 G/T 0.0000 0.1429 0.0193 (0+0)/12 (0+0)/7 13_25203038 A/C 0.0000 0.0370 0.0894 (0+0)/21 (0+0)/7 CDKN2A 11_40856638 A/G 0.2308 0.4862 0.0090 (2+1)/11 (0+1)/2 11_40856631 G/C 0.2692 0.4401 0.0799 (3+0)/11 (0+1)/2 CDKN2B 9_22005173 C/T 0.0536 0.1721 0.0102 (0+2)/19 (1+6)/7

236

Appendix 7

Co-mutations of single nucleotide mutations identified in the samples

Gene Occurrences Samples ADCY1 4 s1422, s1625, s1810, sMAR CNN2 4 s1340, s1422, s1625, sMAR ARHGAP45 3 s1422, s1625, sMAR ATXN2L 3 s1340, s1422, s1810 CYP1A2 3 s1340, s1422, s1810 KRT3 3 s1340, s1422, s1625 SBNO2 3 s1340, s1422, s1625 TUFM 3 s1340, s1422, s1810 U6 3 s1340, s1810, sMAR 5S_rRNA 2 s1340, s1625 ABCA13 2 s1422, sMAR ACADVL 2 s1340, s1625 ADAMTS7 2 s1422, s1625 ADCK5 2 s1625, s1810 AMD1 2 s1340, s1422 BLZF1 2 s1422, s1625 CD2BP2 2 s1422, s1625 CFAP57 2 s1422, sMAR CLEC10A 2 s1340, s1422 COL4A3 2 s1340, s1422 COL5A3 2 s1810, sMAR CPSF1 2 s1625, s1810 DDX18 2 s1625, s1810 DGAT1 2 s1422, s1625 DHX15 2 s1340, s1422 DMBT1 2 s1340, s1422 DNAH7 2 s1340, s1422 EBNA1BP2 2 s1422, sMAR ELF3 2 s1422, s1625 GCGR 2 s1625, s1810 GDE1 2 s1340, sMAR GLUL 2 s1422, s1625 HSPG2 2 s1422, s1625 ITGA7 2 s1422, s1810 KDM4C 2 s1625, sMAR 237

KIF21A 2 s1422, s1810 KRR1 2 s1422, s1625 LINS1 2 s1810, sMAR MAP3K11 2 s1340, s1422 MAP4K1 2 s1340, s1625 MOV10L1 2 s1340, sMAR NCOA3 2 s1340, sMAR NOTCH1 2 s1422, sMAR NOXO1 2 s1340, s1422 OBSL1 2 s1422, sMAR OR4Q2 2 s1422, s1810 OTUD7A 2 s1340, s1810 P2RX3 2 s1340, s1422 PCNX3 2 s1340, s1422 PDE2A 2 s1422, sMAR PFDN1 2 s1340, s1422 PLCB1 2 s1625, s1810 PLXNB2 2 s1340, s1422 PNLDC1 2 s1340, s1422 POLRMT 2 s1340, sMAR PPFIA2 2 s1625, sMAR PPP1R9B 2 s1422, s1625 PRKD2 2 s1340, s1810 RABL6 2 s1625, s1810 RIDA 2 s1810, sMAR SALL2 2 s1422, s1625 SELENOO 2 s1340, s1422 SH3GLB2 2 s1422, s1810 SLC28A3 2 s1340, s1625 SNCB 2 s1422, s1810 SNORA70 2 s1810, sMAR SOX18 2 s1340, s1810 SPTBN4 2 s1422, s1810 STRN4 2 s1340, s1810 SULF2 2 s1340, sMAR TM9SF2 2 s1340, sMAR TNRC18 2 s1422, s1625 TNS2 2 s1625, s1810 TRAF7 2 s1340, s1422

238

TRAPPC9 2 s1422, s1625 TSKS 2 s1340, s1625 TSPEAR 2 s1422, s1810 TTLL6 2 s1422, s1625 TUBG1 2 s1810, sMAR U4 2 s1422, s1810 UBE2I 2 s1422, s1625 WNK3 2 s1422, s1625 WSB1 2 s1422, sMAR WTAP 2 s1340, s1625 ZNF598 2 s1340, s1422 cfa-mir-8813-1 2 s1625, s1810

239

Appendix 8

Genes with HSA related germline variants that were highlighted in chapter 3 by exome variant scan approach, 'germline_assoc'.

HDAC5 AP2A2 KLK2 PRPF3 ETNK1 MARVELD3 ATP6V1H HSPG2 PTPRB SFRP4 BMS1 WDR24 XRCC6 G6PC3 SOX10 ARMC3 PDE7A FANCM HDAC10 GAS2L1 HNRNPH1 TRAIP ZNF704 CYTH1 PSMA7 NUMA1 CBX3 NR1D2 JAKMIP1 PLSCR3 TSG101 GGA3 SALL2 TTLL9 CYP1A2 AKT1 AHNAK KRR1 WFDC10A MYNN AVP RBM38 ZNF777 LSM14B ETNK2 FGB MIS18BP1 RWDD4 MSRA TNK1 RPS8 ZNF865 PLEKHO1 MRPL40 TLE7

240

Appendix 9

Genes with HSA related germline variants that were highlighted in chapter 3 by variant functional annotation 'germline_func'.

ALPK2 PPP2CB EXOC3L1 PPP1R15B SPARCL1 ARID5B FCN1 SPOCK2 AMT PROB1 GFRA2 COL15A1 ACLY OAS3 GNB4 F3 ARHGEF5 PRR19 HTR2B CD36 DNAJC15 JAK2 KYNU CDK5R1 CASP6 PTGIR KANSL2 ABCA1 AGPAT3 CD40 BRPF3 DPP4 CEACAM23 RAC1 MRPL43 GBE1 MAP4K3 TRIM2 C3 F7 CFAP157 RP1 NPFFR2 NDUFS3 SORBS1 BTG1 SERPINB2 DOCK10 CTIF SHROOM3 OXER1 ARAF CASP7 IL2RB LRP1 SCG3 CYP1A2 TG OXTR RREB1 IFIT2 LGALS3BP F8 F5 DNAH5 TRAPPC2 PODXL SLC5A6 NMI IL18BP GCA FAT2 TXNDC12 PQLC3 PQLC3 NOD1 XAF1 DYRK2 FTH1 VCAN RAB2A ACAA2 CIITA CMKLR1 COL4A2 GFI1B VWA8 SETD9 SOD1 APOL6 FAS KCNIP2 GRB10 WWC3 SLC2A4RG COL4A1 TNFSF10 GBP6 CEBPB IRF2BP2 ADAMTS13 TERT BAZ2A HERC6 TAP1 RHOG ITGA1 ADAT1 TNFRSF4 OMD ICAM1 BST2 GZMB ITGAD ARHGAP22 TRIM38 YWHAG IFIH1 TRIM25 PCLO KIF21B ASB18 TRIM47 CIDEA IFIT1 KLRK1 PDGFB LINS1 B4GALT3 VWA7 DECR1 CSF2RB RNF213 ERAP2 MICAL2 C15H1orf50 ZMYM1 EPHX2 MX2 CD69 CASP7 MITF C9H9orf16 C3 ADCY6 ST3GAL5 TRAFD1 DUSP5 MTUS1 CPA1 ACADL CHUK RTP4 BANK1 C4BPB NLRP10 DCHS2 FAH ENPP2 DDX60 CA2 CPM OR13C8 DHX34 LIFR UCK1 ISOC1 OLR1 ITGAM PARP15 DNAH7 APLP2 LIPE IL4R XPNPEP1 LTF PKHD1 DTD2 COQ9 ACADM SP110 CTSH PPP2CB

241

Appendix 10

Possible damaging single nucleotide mutations identified by sample.

Sample: s1340 Amino Chr_position_ref/alt Consequence Gene cDNA Codon acid Position 1_108709404_T/G missense_variant CCDC9 1793 Aag/Cag K/Q 322 1_114449827_G/C missense_variant MAP4K1 897 aaG/aaC K/N 299 11_52219775_A/C start_lost TPM2 67 aTg/aGg M/R 1 11_74110859_G/T missense_variant CNTRL 1533 Ggt/Tgt G/C 501 12_487192_G/C missense_variant PPP1R18 1495 Cct/Gct P/A 499 13_63189705_G/C missense_variant - 131 tCt/tGt S/C 11 15_30326507_A/C start_lost DUSP6 2 aTg/aGg M/R 1 15_39264355_G/T missense_variant UHRF1BP1L 1129 tCt/tAt S/Y 325 17_851325_T/G missense_variant PXDN 1766 gAc/gCc D/A 589 2_60433955_T/G missense_variant LPCAT2 1024 Aac/Cac N/H 297 2_64688537_A/C missense_variant SNX20 795 gAc/gCc D/A 239 20_44859191_T/G splice_acceptor_variant MAST3 - - - - 20_48160650_T/G missense_variant PKN1 3349 Acg/Ccg T/P 1117 24_44367936_G/C missense_variant PHACTR3 559 agG/agC R/S 81 24_47297242_T/G missense_variant ZBTB46 2080 Aac/Cac N/H 369 25_19916586_G/T missense_variant - 160 Gtg/Ttg V/L 54 25_36805239_G/C missense_variant LZTS1 712 Gaa/Caa E/Q 114 31_39775263_A/C missense_variant DIP2A 4951 gAc/gCc D/A 1375 5_32278943_A/C missense_variant - 3557 cTg/cGg L/R 1186 5_35959298_T/G missense_variant DNAH9 9773 gTg/gGg V/G 3258 5_56687279_A/C missense_variant MIB2 406 Aac/Cac N/H 136 6_38742239_T/G missense_variant DNASE1L2 1030 Atc/Ctc I/L 29 6_61955536_G/T stop_gained COL24A1 619 Gga/Tga G/* 182 6_68974317_C/G missense_variant NEXN 1421 Gaa/Caa E/Q 373 9_13136988_G/A missense_variant PSMD12 13 Ggc/Agc G/S 5 9_13137010_G/A missense_variant PSMD12 35 cGc/cAc R/H 12

242

Sample: s1422 Amino Chr_position_ref/alt Consequence Gene cDNA Codon acid Position 20_53259196_A/T missense_variant - 818 cTt/cAt L/H 273 32_30753341_G/A start_lost - 1 Gtg/Atg V/M 1 12_49630430_C/A missense_variant - 626 aGg/aTg R/M 209 18_52766521_A/C missense_variant - 104 gTt/gGt V/G 35 31_37236424_A/C missense_variant - 1229 gTg/gGg V/G 403 35_24047354_C/G missense_variant - 457 Gcc/Ccc A/P 153 16_15091679_T/G missense_variant ABCB8 1289 gTc/gGc V/G 413 16_1038452_A/T missense_variant ADCY1 604 Tat/Aat Y/N 202 3_87373279_C/G missense_variant ADGRA3 210 tCt/tGt S/C 57 13_36642632_A/C missense_variant ADGRB1 2335 Atc/Ctc I/L 578 24_46286231_T/G missense_variant ADRM1 489 gTg/gGg V/G 138 32_10540827_C/G stop_gained AFF1 638 tCa/tGa S/* 213 6_42250504_A/C missense_variant AMPD2 1447 Ttg/Gtg L/V 269 10_69280462_A/C missense_variant ATP6V1B1 608 aAc/aCc N/T 203 7_29193302_G/C missense_variant BLZF1 660 tCt/tGt S/C 169 18_53233634_T/G missense_variant C18H11orf95 243 gTc/gGc V/G 23 13_37303254_T/G stop_lost CCDC166 1311 tgA/tgC */C 437 27_6583694_T/G missense_variant CCDC184 1096 Att/Ctt I/L 366 9_877421_G/C missense_variant CEP131 2280 Gac/Cac D/H 593 18_45246550_C/G missense_variant CHID1 178 Gac/Cac D/H 60 missense_variant, 6_47500628_C/G splice_region_variant COL11A1 1505 Ccg/Gcg P/A 380 13_38060253_T/G missense_variant COMMD5 161 tTg/tGg L/W 54 30_32690848_T/G missense_variant CORO2B 1271 Tgg/Ggg W/G 316 28_33459273_T/G missense_variant CPXM2 1502 Aag/Cag K/Q 293 9_45614151_C/T missense_variant CRK 997 Gac/Aac D/N 333 21_31322561_T/C splice_acceptor_variant CYB5R2 - - - - 5_23931379_C/G missense_variant DDX10 129 cGg/cCg R/P 13 X_121470502_T/G missense_variant DUSP9 524 cTg/cGg L/R 175 5_32486970_A/G missense_variant EIF4A1 163 gAt/gGt D/G 51 9_1894464_T/G missense_variant ENPP7 1144 Acc/Ccc T/P 84 7_6099139_C/A missense_variant FCAMR 1660 Gtc/Ttc V/F 451 6_72134013_A/G missense_variant FPGT 389 aTt/aCt I/T 130 6_11848608_G/A missense_variant GRID2IP 1214 aCg/aTg T/M 405 11_2581249_G/T missense_variant GRM6 710 cGg/cTg R/L 237 10_19872612_G/C missense_variant GTSE1 990 gCt/gGt A/G 306

243

5_76964996_A/C missense_variant HYDIN 1418 gTc/gGc V/G 473 18_46304477_A/C missense_variant IGF2 89 gTc/gGc V/G 30 18_46304518_T/G missense_variant IGF2 48 caA/caC Q/H 16 10_15817475_A/C missense_variant KCNC2 623 Tgg/Ggg W/G 202 8_61222237_A/C missense_variant KCNK13 1205 aAc/aCc N/T 402 1_114670506_A/C missense_variant KCNK6 707 gTg/gGg V/G 236 10_43842598_A/C missense_variant KIAA1211L 2352 gAc/gCc D/A 551 missense_variant, X_54818207_G/T splice_region_variant KIF4A 1840 aaG/aaT K/N 594 9_24004331_T/G splice_donor_variant KPNB1 - - - - 2_34096852_C/G missense_variant LARP4B 1522 ttC/ttG F/L 354 24_6808975_C/G missense_variant MACROD2 1159 Gaa/Caa E/Q 387 7_31380090_A/C splice_donor_variant MAEL - - - - 17_62882266_T/G missense_variant MOV10 461 cAt/cCt H/P 154 6_28936035_C/G missense_variant MRTFB 2397 caG/caC Q/H 799 X_50763697_T/G missense_variant MSN 925 gTt/gGt V/G 160 1_106416113_T/A missense_variant MYH14 2438 cAg/cTg Q/L 813 1_116769288_A/C splice_acceptor_variant NFKBID - - - - 5_32342162_A/C missense_variant NLGN2 1828 gAg/gCg E/A 88 20_3642856_T/G missense_variant NUP210 4532 gAc/gCc D/A 1511 2_78848656_C/G missense_variant OTUD3 1065 aaG/aaC K/N 355 3_80387499_T/A missense_variant PCDH7 1368 caA/caT Q/H 456 1_53517618_C/T missense_variant PDE10A 1120 Gtg/Atg V/M 374 missense_variant, 21_25630829_T/G splice_region_variant PDE2A 1846 Ttc/Gtc F/V 594 13_37451070_T/G missense_variant PLEC 13526 cAg/cCg Q/P 4509 6_11436457_C/A missense_variant PMS2 152 cGg/cTg R/L 51 21_41110129_T/G missense_variant PTPN5 737 Acc/Ccc T/P 223 29_3348573_A/C missense_variant PXDNL 2516 gTc/gGc V/G 794 7_969013_T/G missense_variant RNPEP 1481 gAc/gCc D/A 494 12_2663480_A/C missense_variant RXRB 227 Tgt/Ggt C/G 19 15_18560175_T/G missense_variant SALL2 1373 Acc/Ccc T/P 435 18_25550548_T/G missense_variant SIGIRR 809 gAc/gCc D/A 152 1_113170067_A/C splice_donor_variant SPTBN4 - - - - 20_37252651_A/C missense_variant STAB1 6101 cTt/cGt L/R 2034 24_20280229_A/C missense_variant TCF15 687 Aag/Cag K/Q 190 17_48982838_T/G missense_variant TET3 5108 aAc/aCc N/T 1703 5_32563409_C/T missense_variant TP53 845 tGc/tAc C/Y 230 1_83436578_A/C missense_variant TRPM6 6024 gaA/gaC E/D 2008

244

5_34185438_A/C missense_variant USP43 2900 gAc/gCc D/A 967 3_24191951_T/C missense_variant VCAN 2077 Aga/Gga R/G 691 18_52105928_T/G missense_variant VPS51 119 tAc/tCc Y/S 40 12_1254193_G/T missense_variant VWA7 80 gCa/gAa A/E 27 20_57716139_T/G missense_variant WDR18 533 Acc/Ccc T/P 131 18_53861141_T/C missense_variant WDR74 928 Tat/Cat Y/H 310 X_120804813_T/G splice_donor_variant ZNF185 - - - - 13_38143408_A/G missense_variant ZNF250 434 gTc/gCc V/A 145

Sample: s1625 Amino Chr_position_ref/alt Consequence Gene cDNA Codon acid Position 1_106290501_A/G missense_variant MYBPC2 382 Tcc/Ccc S/P 110 1_106791927_A/C missense_variant TSKS 394 Acc/Ccc T/P 118 1_110411271_T/G missense_variant ZNF296 1029 Tgt/Ggt C/G 322 1_111621578_T/G missense_variant PHLDB3 1400 caT/caG H/Q 421 1_112135246_T/G splice_donor_variant - - - - - 1_48329660_G/T missense_variant RSPH3 1098 Cgt/Agt R/S 319 10_45295747_G/C missense_variant FER1L5 492 ttC/ttG F/L 164 11_26047548_T/G missense_variant EGR1 602 aTt/aGt I/S 201 13_37742405_G/C missense_variant HSF1 826 Gag/Cag E/Q 232 13_37876362_A/C missense_variant KIFC2 131 gAc/gCc D/A 44 13_38453575_T/G missense_variant WFS1 2095 Acc/Ccc T/P 669 15_23671998_T/G missense_variant PPFIA2 1253 Aac/Cac N/H 111 15_46197224_T/G missense_variant ARHGAP10 512 tTt/tGt F/C 171 19_32026476_C/G splice_acceptor_variant DDX18 - - - - 19_50452009_G/C missense_variant EPC2 438 gaG/gaC E/D 146 2_68322525_C/G missense_variant AZIN2 414 Gcc/Ccc A/P 33 2_8827198_T/G missense_variant ARHGAP21 387 gaT/gaG D/E 129 20_39063549_C/G missense_variant SEMA3B 637 Gaa/Caa E/Q 213 20_50327178_A/C missense_variant YIPF2 1356 Acg/Ccg T/P 162 21_19852915_A/C missense_variant TENM4 299 gAc/gCc D/A 100 21_19852921_A/C missense_variant TENM4 305 gAg/gCg E/A 102 24_24822566_G/A missense_variant SCAND1 502 Cgg/Tgg R/W 168 missense_variant, 24_46551855_C/T splice_region_variant SLCO4A1 1713 aCg/aTg T/M 429 25_48189016_A/C missense_variant RAB17 821 Tcg/Gcg S/A 199 26_27330960_T/A missense_variant - 163 Tca/Aca S/T 55

245

27_1734327_T/G missense_variant MAP3K12 2719 aTc/aGc I/S 775 27_2879508_C/T missense_variant KRT80 1103 gCc/gTc A/V 368 27_42765284_T/G missense_variant WNK1 104 gTg/gGg V/G 35 29_188488_T/G missense_variant PRKDC 3453 aAa/aCa K/T 1150 31_39596586_G/C missense_variant PCNT 635 aGa/aCa R/T 130 35_24288524_A/C missense_variant BTN1A1 926 cAc/cCc H/P 309 6_17792688_T/G missense_variant QPRT 461 gTg/gGg V/G 154 7_887484_T/G start_lost GPR37L1 328 Atg/Ctg M/L 1 X_9435478_C/T missense_variant - 596 tCt/tTt S/F 199 X_9435484_C/A missense_variant - 602 aCc/aAc T/N 201

Sample: s1810 Amino Chr_position_ref/alt Consequence Gene cDNA Codon acid Position 1_102371271_G/C missense_variant BRSK1 1775 tCc/tGc S/C 592 12_220183_G/T missense_variant MED23 1722 ttG/ttT L/F 571 12_32255010_G/A missense_variant - 491 cGc/cAc R/H 34 13_2579883_A/T missense_variant PABPC1 215 aTg/aAg M/K 46 14_14186002_A/C missense_variant ADAM22 680 aAa/aCa K/T 227 16_45179793_G/C missense_variant CCDC110 2536 gaG/gaC E/D 787 17_60313406_C/T stop_gained PSMD4 1082 Cga/Tga R/* 353 18_14814687_T/G missense_variant ATXN7L1 2620 gTg/gGg V/G 873 18_17682635_A/C missense_variant PTPN12 2293 aaA/aaC K/N 445 21_32381230_A/C missense_variant DENND5A 2812 Tgt/Ggt C/G 938 24_32347960_T/G splice_donor_variant PABPC1L - - - - 25_7445937_A/G start_lost - 2 aTg/aCg M/T 1 27_14259005_G/T stop_gained KIF21A 910 Gag/Tag E/* 66 3_40430885_C/T missense_variant LINS1 553 Ctc/Ttc L/F 159 30_10536215_G/C missense_variant SERF2 106 atG/atC M/I 15 37_29707979_C/A missense_variant SERPINE2 788 Gcg/Tcg A/S 263 5_42143514_A/C start_lost COPS3 41 Atg/Ctg M/L 1 7_60918113_C/G missense_variant CDH2 2312 Ctg/Gtg L/V 740 8_2211049_A/G missense_variant - 349 Agg/Ggg R/G 79 9_4712154_A/C missense_variant MRPL38 1087 cAg/cCg Q/P 361

Sample: sMAR Amino Chr_position_ref/alt Consequence Gene cDNA Codon acid Position

246

10_17238266_T/G splice_donor_variant IL17REL - - - - 10_17238266_T/G splice_donor_variant IL17REL - - - - 1_110193971_T/G missense_variant KLC3 483 gAc/gCc D/A 157 12_66350713_T/G missense_variant MICAL1 2924 gAt/gCt D/A 975 16_59301450_C/T missense_variant - 65 tCa/tTa S/L 22 18_52021299_T/G missense_variant SPDYC 823 cAc/cCc H/P 141 21_35318514_A/C missense_variant MICAL2 4682 Acc/Ccc T/P 1395 28_8920011_C/G missense_variant SORBS1 1273 caG/caC Q/H 319 31_39318707_T/G missense_variant COL6A1 2682 gTg/gGg V/G 871 38_2049135_A/C missense_variant MFSD4A 906 gAc/gCc D/A 161 9_19042413_A/C missense_variant GPATCH8 4175 cAc/cCc H/P 1392 9_6805830_A/C missense_variant SDK2 1915 Aca/Cca T/P 639

247

Appendix 11

Patient samples included in the 'CAL', 'VAL' and 'BLIND' sets.

Reference number Diagnosis FTIR images Group 'CAL', 'VAL' acquired 13/034453 Normal spleen 2 non-HSA 14/061885_A Normal spleen 2 13/006868_A Haemorrhage 3 12/0066186_A Hematoma 1 12/025495_J Hematoma 2 17/151879_4 Hematoma 1 17/119979_3 Hematoma 1 17/117517_2 Hematoma 1 17/120973_1 Nodular hyperplasia 1 17/68968_1 Haemangiosarcoma, spleen 1 HSA 10/077306_T Haemangiosarcoma, spleen 1 13/080574_E Haemangiosarcoma, spleen 2 13/019400_E Haemangiosarcoma, spleen 1 12/025492_A Haemangiosarcoma, spleen 1 12/030488_F Haemangiosarcoma, spleen 2 12/036750_A Haemangiosarcoma, spleen 1 14/057063_C Haemangiosarcoma, spleen 1 17/146723_1 Haemangiosarcoma, spleen 1 17/38707_1 Haemangiosarcoma, spleen 2 17/80266_3 Haemangiosarcoma, spleen 2 17/122447_2 Haemangiosarcoma, spleen 2 11/065749_G Haemangiosarcoma, spleen 1 12/066193_D Haemangiosarcoma, spleen 1 13/042014_A Haemangiosarcoma, spleen 2 'BLIND'

16/1555799_1 Hematoma 2 non-HSA 16/45421_1 Hematoma 4 17/120068_3 Hematoma 2 17/31800_1 Hematoma 2

248

17/60980_1 Hematoma 1 17/70610_2 Hematoma 2 17/68655_4 Nodular hyperplasia 2 17/134340_1 Nodular hyperplasia 2 17/90856_1 Hemangioma 2 15/14401_1 Histiocytoma 2 17/33755_2 Lymphoma 2 17/131203_1 Brain tissue 3 15/38532_2 Haemangiosarcoma, spleen 1 HSA 17/144311_1 Haemangiosarcoma, spleen 1 17/170400_1 Haemangiosarcoma, spleen 2 16/149156_3 Haemangiosarcoma, spleen 1 17/177476_4 Haemangiosarcoma, spleen 2 17/198098_3 Haemangiosarcoma, spleen 2 17/90692_3 Haemangiosarcoma, spleen 2 17/151319_1 Haemangiosarcoma, kidney 3 HSA, non-HSA 17/173773_2 Haemangiosarcoma, kidney 3 17/197590_2 Haemangiosarcoma, liver 2 17/199283_5 Haemangiosarcoma, liver 4 17/76114_1 Haemangiosarcoma, heart 4 17/85759_1 Haemangiosarcoma, heart 5

249

Appendix 12

Microscopic images of normal spleen tissue, H&E stained.

250

Appendix 13

Number (in pixels) and proportion (percentage) of classified 'ground true' FTIR images in 'CAL' and 'VAL' datasets in the calibration/model selection step, prior to reclassification of the tissue components (WBCs = White blood cells, RBCs = Red blood cells.

Class Primary classification Number Proportion (pixels) (%)

1 Splenic parenchyma 658,759 37.93% 2 Cancer cell 445,500 25.65% 3 Cancer matrix 88,251 5.08% 4 Connective tissue 285,102 16.42% 5 WBCs 59,175 3.41% 6 RBCs 73,806 4.25% 7 Empty 66,285 3.82% 8 Paraffin 16,440 0.95% 9 Protein 1,250 0.07% 10 Endothelial cell 42,136 2.43%

251

Appendix 14

Animal Ethics Approval Certificate

252