UNIVERSITÀ DEGLI STUDI DI PAVIA Dipartimento di Biologia e Biotecnologie “Lazzaro Spallanzani”

Investigating the genetics of cord blood transplantation: from classical and non-classical HLA towards non-HLA genetics and more

Paola Bergamaschi

Dottorato di Ricerca in Genetica, Biologia Molecolare e Cellulare XXX Ciclo – A.A. 2014-2017

Supervised by Prof. Laura Salvaneschi and Prof. Antonio Torroni

This thesis is dedicated to my beloved grandmothers, Piera and Maria. May their force be with me. ∞

«...os homini sublime dedit caelumque videre iussit et erectos ad sidera tollere vultus».

(P.Ovidii Nasonis, Met., I, 85-86)

On the cover: a group of women carrying their products to the Chimbote market, Chimborazo region, Ecuador, 1998 (modified from a photograph by Sebastião Salgado). In the background: the schematic representation of the HLA region (lower right), mitochondrial DNA (mtDNA, upper left) and the Y chromosome (Y-chr, upper right).

Abstract

Cord blood represents an alternative source of hematopoietic progenitor cells for transplantation in patients with both hematological and inborn disorders. Cord blood banks have therefore become common worldwide, providing repositories of cryopreserved cord blood units that are ready for use. By means of national donor registries, candidate units can be selected for a patient in need using a set of minimum essential data required for the search procedure, mainly relying on HLA match categories and cell dose. The HLA system plays a primary role in immunity and consequently represents, in allografts between non-HLA-identical individuals, a major barrier that needs to be overcome to counteract immunological complications such as graft rejection and Graft versus Host Disease. Due to the tolerogenic properties of cord blood, donor/recipient HLA matching is less stringent in this setting. Current HLA definition in cord blood banking programs is therefore based on serological/low resolution antigenic typing for HLA-A and -B and high resolution allelic typing for HLA-DRB1, where up to 1-2 mismatches are permitted. Recently, allele-level definition of the cord blood donor/recipient HLA match has been reported as increasingly relevant, and it is modifying not only the current donor selection algorithm but also the strategies of cord blood banking programs. Furthermore, the impact of non-HLA genetic factors on clinical outcome, which could explain at least in part the occurrence of unexpected complications in HLA-matched transplants (or their absence in HLA-mismatched ones), is still debated. Due to its extensive polymorphism, the HLA system displays considerable population diversity.
Linkage disequilibrium, that is, the existence of non-random allele combinations that tend to be inherited together as gene blocks (haplotypes), leads to patterns of HLA genetic variation worldwide that could be informative with regard to human geographic expansions, demographic history and cultural diversification. Along with the improvements in molecular typing technologies, it has recently been reasserted that, besides mitochondrial DNA and the Y chromosome (uniparental markers), immunogenetic polymorphisms may still represent an important and complementary tool for population studies. In this scenario, cord blood banks may play an important role by making available their DNA sample archives and their donors' data collections, which include ethnic and geographical origin extended back to the grandparents for both the maternal and paternal lineages. Furthermore, the organization of cord blood banks in networks that comprise the national registries may amplify their contribution in this setting. During the three years of my doctoral studies, I focused my research activity mainly on three projects aimed at investigating the genetics of cord blood transplantation, from classical and non-classical HLA towards non-HLA genetics, including population genetics. The first project aimed to validate a platform that achieves high definition HLA typing of cord blood units at the time of banking in a quick, accurate and cost-effective manner. In response to the increasing importance of the allele-level donor-recipient match for the outcome after cord blood transplantation, high definition typing of both class I and class II HLA loci at the time of listing is a way to improve the attractiveness of our Cord Blood Bank inventory in Pavia, reducing the time for donor search and procurement and simplifying donor choice, in particular for patients of non-European heritage.
The second research project aimed to evaluate the impact of non-HLA genetics on clinical outcome after cord blood transplantation. With the development of human genomics, many studies of single nucleotide polymorphisms in immune response and drug metabolism genes have shown their influence on post-transplant outcomes. However, in the setting of cord blood transplantation only one study, including a small and heterogeneous group of recipients, has been described, with no significant association between any of the polymorphisms studied and transplant outcomes. In a multicentric retrospective analysis promoted by Eurocord, including a dataset of 851 cord blood units (85 of them from Pavia) and 173 patients, several candidate genes related to immune response were analysed. We demonstrated the association of the cord blood CTLA-4 GG genotype with lower survival and higher non-relapse mortality, suggesting that this polymorphism might be considered for cord blood donor selection when more than one unit meeting the current criteria of cell dose and HLA matching is available for a patient. The third study was carried out on 48 cord blood donors with maternal or paternal geographical origins documented to be from Central and South America, aiming to investigate the contribution of HLA polymorphisms to human population genetics, in parallel with mitochondrial DNA and Y-chromosome analyses. High resolution definition of class II HLA alleles proved able to provide complementary information, in particular if the maternal HLA typing is available, enabling the definition of the maternally (and indirectly the paternally) inherited haplotype. Moreover, the samples from Ecuador and Peru were included in a larger dataset and contributed to a study on mitogenome variation aimed at shedding light on the Paleo-Indian entry into South America. In the course of my Ph.D. studies I also contributed to two additional projects.
In the first, we investigated the polymorphisms of the HLA-G gene, a non-classical HLA class I locus, in 85 cord blood units, aiming to assess their potential role in tolerance mechanisms and the related implications for donor selection; in the second, we analysed the immunogenetic data of 42 families from Venezuela, together with mitochondrial DNA and Y-chromosome analyses, to acquire new insights into the genetic characteristics of the population of this country. Taken together, the data reported in this thesis underline that the genetics of cord blood transplantation is still an evolving field, providing an arena where multidisciplinary approaches may contribute to progress in many areas of investigation, ranging from clinical hematopoietic stem cell transplantation to the sustainability of cord blood banking programs, and extending to apparently unrelated spheres of interest such as the study of human origins and migrations.

Abbreviations

3'UTR, 3' untranslated region
5'URR, 5' upstream regulatory region
AF, allele frequency
allo-HSCT, allogeneic hematopoietic stem cell transplantation
BM, bone marrow
BMDW, Bone Marrow Donors Worldwide
BMT, bone marrow transplantation
CB, umbilical cord blood
CBBs, cord blood banks
CBT, cord blood transplant
CBUs, cord blood units
CCR5, C-C chemokine receptor type 5
CFU-GM, colony forming unit-granulocyte monocyte
CIBMTR, Center for International Blood and Marrow Transplant Research
CMV, cytomegalovirus
CTLA-4, cytotoxic T-lymphocyte antigen 4
DFS, disease free survival
EBMT, European Group for Blood and Marrow Transplantation
EMDIS, European Marrow Donor Information System
G-CSF, granulocyte colony stimulating factor
GvHD, Graft versus Host Disease
GvL, graft versus leukaemia
GWAS, genome-wide association studies
HLA, Human Leukocyte Antigen
HSCs, hematopoietic stem cells
HSCT, hematopoietic stem cell transplantation
IL, interleukin
KIR, killer immunoglobulin-like receptor
LD, linkage disequilibrium
LFS, leukaemia-free survival
MHC, Major Histocompatibility Complex
MUD, matched unrelated donor
NIMA, non-inherited maternal antigens
MSY, male specific region of the Y chromosome
mtDNA, mitochondrial DNA
NK, natural killer
NMDP, National Marrow Donor Program
NRM, non-relapse mortality
OS, overall survival
PBG, peptide binding groove
PBSC, peripheral blood stem cell
PCR-SSP, Polymerase Chain Reaction - Sequence Specific Primed
revPCR-SSO, Reverse Polymerase Chain Reaction - Sequence Specific Oligonucleotide Probe
RFS, relapse-free survival
SNPs, single nucleotide polymorphisms
TC, transplant center
TNC, total nucleated cell
TNF, tumour necrosis factor
TRM, transplant related mortality
sHLA-G, soluble HLA-G molecules
WMDA, World Marrow Donor Association

Contents

ABSTRACT

ABBREVIATIONS

CONTENTS

1. INTRODUCTION

1.1 THE GENETICS OF CORD BLOOD TRANSPLANTATION
1.1.1 CORD BLOOD FOR HAEMATOPOIETIC STEM CELL TRANSPLANTATION
1.1.2 THE GENETICS OF CORD BLOOD TRANSPLANTATION
1.1.2.1 Classical HLA genetics
1.1.2.2 Non-classical HLA genetics
1.1.2.3 Non-HLA genetics
1.2 HUMAN POPULATION GENETICS
1.2.1 MITOCHONDRIAL DNA
1.2.1.1 Mitochondrial genome
1.2.1.2 Genetic code
1.2.1.3 Mitochondrial DNA features
1.2.2 THE HUMAN Y CHROMOSOME
1.2.2.1 Genes
1.2.2.2 Origin

2. REVIEW OF THE LITERATURE

2.1 THE IMPORTANCE OF HLA IN CBT
2.2 THE IMPORTANCE OF NON-HLA GENETIC FACTORS IN CBT
2.3 CONTRIBUTION OF HLA TO POPULATION GENETICS
2.3.1 THE EVOLUTION OF MHC
2.3.2 THE HLA DIVERSITY IN DIFFERENT POPULATIONS
2.3.3 HLA GENETIC DIFFERENTIATION IN AMERICA
2.3.3.1 HLA analysis of South America populations: Ecuador and Peru
2.3.3.2 HLA analysis of South America populations: Venezuela
2.4 CONTRIBUTION OF MTDNA TO POPULATION GENETICS
2.4.1 THE MOLECULAR CLOCK
2.4.2 MTDNA NOMENCLATURE
2.4.3 MTDNA REFERENCE SEQUENCES
2.4.4 MTDNA WORLDWIDE PHYLOGENY
2.4.4.1 The origin of modern humans
2.4.4.2 The 'Out of Africa' exit
2.4.4.3 Human colonization of the world
2.4.4.3.1 The peopling of Australasia
2.4.4.3.2 The peopling of Europe
2.4.4.3.3 The peopling of the Americas
2.5 CONTRIBUTION OF Y-CHROMOSOME TO POPULATION GENETICS
2.5.1 Y-CHROMOSOME POLYMORPHISMS
2.5.1.1 Single Nucleotide Polymorphisms (SNPs)
2.5.1.2 Variable Number of Tandem Repeats (VNTRs)
2.5.2 PHYLOGENY
2.5.3 PHYLOGEOGRAPHY
2.5.4 RECONSTRUCTING THE ROUTES FOLLOWED BY MODERN HUMANS
2.5.5 Y-CHROMOSOME HAPLOGROUPS
2.5.5.1 Native American haplogroups

3. MY CONTRIBUTION

3.1 CLASSICAL HLA GENETICS: EVALUATION OF LUMINEX® XMAP® TECHNOLOGY APPLIED TO HIGH DEFINITION HLA TYPING OF CORD BLOOD UNITS PRIOR TO LISTING
3.1.1 AIM OF THE RESEARCH
3.1.2 BACKGROUND
3.1.3 THE SAMPLE
3.1.4 RESULTS AND DISCUSSION
3.1.4.1 Analysis of the time from banking to listing
3.1.4.2 Analysis of the test repetitions
3.1.4.3 Analysis of the costs
3.1.5 CONCLUSION AND PERSPECTIVES
3.2 NON-HLA GENETICS: EVALUATION OF THE IMPACT OF IMMUNE RESPONSE GENES ON CBT OUTCOME
3.2.1 AIM OF THE RESEARCH
3.2.2 BACKGROUND
3.2.3 THE SAMPLE
3.2.4 RESULTS
3.2.4.1 Recipients, donors and transplant characteristics
3.2.4.2 Analysis of the genetic polymorphism
3.2.4.3 Analysis of the clinical outcomes
3.2.5 DISCUSSION
3.2.6 CONCLUSION AND PERSPECTIVES
3.3.1 CONTRIBUTION OF THE CORD BLOOD BANKS TO POPULATION GENETICS STUDIES
3.3.1.1 BACKGROUND
3.3.1.2 AIM OF THE RESEARCH
3.3.1.3 THE SAMPLE
3.3.1.4 RESULTS AND DISCUSSION
3.3.1.4.1 Mt-DNA analysis
3.3.1.4.2 Y-chromosome analysis
3.3.1.4.3 HLA allele and haplotype analyses
3.3.1.5 CONCLUSION AND PERSPECTIVES
3.3.2 MITOGENOME VARIATION IN ECUADOR AND PERU
3.3.2.1 BACKGROUND
3.3.2.2 AIM OF THE RESEARCH
3.3.2.3 THE SAMPLE
3.3.2.4 RESULTS
3.3.2.4.1 The mitogenome variation
3.3.2.4.2 Phylogenetic analysis
3.3.2.5 DISCUSSION
3.3.2.6 CONCLUSION
3.4 ADDITIONAL PROJECTS
3.4.1 NON-CLASSICAL HLA GENETICS: ANALYSIS OF RS1233334 (HLAG -725 G/C/A) AND RS1063320 (HLAG +3142 C/G) IN CORD BLOOD UNITS
3.4.1.1 Background
3.4.1.2 Aim of the research
3.4.1.3 The sample
3.4.1.4 Results and discussion
3.4.1.5 Conclusion and perspectives
3.4.2 STUDY OF THE HLA CHARACTERISTICS OF 46 FAMILIES FROM VENEZUELA
3.4.2.1 Background
3.4.2.2 Aim of the research
3.4.2.3 The sample
3.4.2.4 Results and discussion
3.4.2.5 Conclusion and perspectives

4. MATERIALS AND METHODS

4.1 DNA EXTRACTION
4.2 DNA QUANTIFICATION
4.3 ANALYSIS OF HLA LOCI
4.3.1 Molecular analysis by revPCR-SSO and PCR-SSP
4.3.2 Molecular analysis by revPCR-SSO and Luminex® xMAP® technology
4.3.3 Test repetition
4.3.4 Statistical analysis

4.4 ANALYSIS OF THE POLYMORPHISMS OF IMMUNE RESPONSE GENES AND CBT OUTCOMES
4.4.1 Genetic polymorphism
4.4.2 Statistical analysis
4.5 MTDNA ANALYSIS
4.5.1 Long range PCR for Illumina sequencing
4.5.2 Next Generation Sequencing: sequence analysis
4.5.3 DNA amplification for Sanger sequencing
4.5.4 Phylogenetic (and other) analyses
4.6 Y-CHROMOSOME ANALYSIS
4.6.1 Biallelic markers PCR amplification reactions and conditions
4.6.2 AFLP and RFLP analysis
4.7 HLA-G POLYMORPHISMS ANALYSIS
4.7.1 Sequencing method
4.7.2 Real-Time PCR

REFERENCES

LIST OF ORIGINAL MANUSCRIPTS

1. INTRODUCTION

1.1 The genetics of cord blood transplantation

1.1.1 Cord blood for haematopoietic stem cell transplantation

Hematopoietic stem cell transplantation (HSCT) was born after the atomic bomb explosions in Japan at the end of World War II, as a means to treat the effects of irradiation. After the initial experiments in mice and dogs, the first human bone marrow transplants were performed in 1959, giving proof of concept that infusing bone marrow provides hematological reconstitution in lethally irradiated patients with acute leukaemia (Thomas et al., 1957). In 1965 Mathé was the first to describe long-term engraftment, demonstrating chimerism, tolerance and an anti-leukaemic effect (Mathé et al., 1965). Major progress came from the discovery of the Human Leukocyte Antigen (HLA) system by J. Dausset (Dausset, 1958) and J.J. van Rood (van Rood, 1968), after which selecting HLA-identical siblings as bone marrow donors diminished the risk of rejection and Graft versus Host Disease (GvHD). Among the major landmarks that contributed to the progress of HSCT over the years, we can mention: 1) the development of bone marrow registries for treating patients who do not have an HLA-identical sibling donor; 2) improved methods of high resolution HLA typing; 3) the use of new sources of hematopoietic stem cells, such as granulocyte colony stimulating factor (G-CSF)-mobilized peripheral blood stem cells (Korbling and Freireich, 2011), cryopreserved umbilical cord blood (Gluckman, 2009) and haploidentical related hematopoietic stem cells (Aversa et al., 1998); 4) international collaboration through non-profit organizations such as EBMT (European Group for Blood and Marrow Transplantation); CIBMTR (Center for International Blood and Marrow Transplant Research), which for more than 40 years has been developing a worldwide database, now including data on 465,000 autologous, related and unrelated donor transplant recipients, and performing both observational and prospective research; NMDP (National Marrow Donor Program), founded in 1986, today called Be the Match Registry and comprising 13 million donors and
more than 185,000 cord blood units; WMDA (World Marrow Donor Association), a voluntary organization of representatives of bone marrow donor registries, cord blood banks and other organizations, which discusses issues regarding the clinical use of hematopoietic stem cells from unrelated donors across international boundaries and formulates guidelines; BMDW (Bone Marrow Donors Worldwide), which collects the HLA phenotypes and other relevant data of 27,708,380 volunteer stem cell donors and 688,938 cord blood units, and which in 2017 became a service of the WMDA; Eurocord, a registry of cord blood transplants (about 13,500 from 615 centers, 53 countries and 42 cord blood banks) working in close collaboration with cord blood banks (members of Netcord) to analyze the results of cord blood transplants and provide quality standards to unrelated cord blood banks.

As a result, allogeneic hematopoietic stem cell transplantation (allo-HSCT) is nowadays widely employed to treat patients with malignant and non-malignant hematological disorders. Traditionally, hematopoietic stem cells (HSCs) were harvested from the posterior iliac crests under general anesthesia; the principal source of HSCs was therefore bone marrow (BM) from an HLA-identical sibling, for transplantation in children and young adults. Today, other stem cell sources are available, such as G-CSF-mobilized peripheral blood stem cells (PBSC) or umbilical cord blood (CB); the donor can therefore be an HLA-identical sibling, a matched unrelated donor (MUD), a haploidentical family PBSC or BM donor, or an HLA-mismatched unrelated CB donor (figure 1.1.1). Unmanipulated CB cells collected and cryopreserved at birth have been used in both related and unrelated, HLA-matched and mismatched allogeneic transplants in children, and more recently in adults. A comparison of the yield of the various cell sources is given in table 1.1.1. Baldomero et al. performed an EBMT survey of HSCT activity in 2009 (Baldomero et al., 2011), analyzing 31,322 HSCT reports from 624 centers in 43 countries. For allo-HSCT, more unrelated than HLA-identical sibling donors were reported (51% vs 43%). The proportion of peripheral blood as stem cell source was 99% for autologous and 71% for allogeneic HSCT, while unrelated cord blood was used in 756 cases (7%) (www.ebmt.org).

Figure 1.1.1 Sources of hematopoietic stem cells for transplantation.

Table 1.1.1 Cell content according to stem cell source.

Cluster of Differentiation (CD) is a nomenclature system used for the classification of monoclonal antibodies directed against epitopes on the surface molecules of leukocytes (white blood cells), providing targets for immunophenotyping. According to CD nomenclature, CD34 is one of the CD markers of hematopoietic stem cells and CD3 is the CD marker for T-cells.
*per kg recipient body weight

Focusing on cord blood, since the first human cord blood transplant (CBT) was performed in 1988, cord blood banks (CBBs) have been established worldwide for the collection and cryopreservation of cord blood for allo-HSCT (Gluckman et al., 2011). The main practical advantages of using cord blood as an alternative source of stem cells are the relative ease of procurement, the absence of risk for mothers and donors, the reduced likelihood of transmitting infections, particularly cytomegalovirus (CMV), and the ability to store fully tested and HLA-typed potential transplants in the frozen state, available for immediate use. Recently, the absence of ethical concerns and the potentially unlimited supply of cells have aroused increasing interest in the use of CB for regenerative medicine. Nowadays, an international network of CBBs and transplant centers has been established and provides a common inventory: more than 700,000 cord blood units (CBUs) are cryopreserved in more than 100 CBBs, and more than 20,000 CBTs have been performed worldwide (www.bmdw.org, www.nmdp.org). As the number of CBUs is increasing, to improve the quality of the units and the cost-efficient management of the CBBs, only the largest units are cryopreserved and banked, in order to obtain at least 3 x 10^7 total nucleated cells (TNC)/kg recipient body weight. In fact, in the CBT setting several studies have shown that the number of cells is the most important factor for engraftment, while some degree of HLA mismatch is acceptable (Gluckman et al., 2004; Gluckman et al., 2011). In this setting, close cooperation between two organizations has made a major contribution to the knowledge achieved so far. The first is Eurocord, which was established in 1995 with the principal objective of collecting outcome data from cord blood banks and transplant centers. From 1988 to 2010, 6736 CBT reports from Europe and from transplant centers in other countries were collected.
Of these, 596 transplants were reported using related donors (mainly HLA-identical sibling donors for children with malignant and non-malignant disorders) and 6140 were performed in the unrelated transplant setting, for children (n=3287) and adults (n=2770). Based on this international cooperation of many transplant centers, Eurocord has published crucial reports of multicentric studies, allowing the rapid development of CBT. The second organization is Netcord (www.netcord.org). This network was created in 1998 to establish good practices in umbilical cord blood storage, facilitate donor search, improve the quality of the grafts, and establish procedures for bank accreditation. To promote quality throughout all phases of cord blood banking and achieve consistent production of high quality CBUs for transplantation, international standards have been established (see the latest version of the Netcord-FACT Standards, www.factwebsite.org).

Several studies, including those by Eurocord, demonstrated that neutrophil and platelet recovery were associated with the degree of HLA mismatch, the number of TNC collected and infused, and the use of G-CSF after transplant. Coexistence of HLA class I and II disparities and a high CD34+ cell dose (the stem cell dose) in the graft were associated with only a modest increase in the incidence of severe GvHD (grade III-IV); in contrast, disease relapse was higher in matched transplants, indicating a graft versus leukaemia (GvL) effect (Gluckman et al., 2004; Welte et al., 2010; Barker et al., 2010; Barker et al., 2011). Current HLA definition in cord blood banks is based on serological typing for HLA-A and -B and allelic typing for HLA-DRB1, avoiding cord blood units with 3 or 4 HLA disparities. At selection, the diagnosis and the presence in the patient of antibodies against the HLA antigens of the CBU should also be carefully evaluated. Overall, HLA compatibility appears to be more important for patients with non-malignant disorders than for those with malignant disorders. Cell dose requirements refer only to TNC and CD34+ cells, and must increase with the number of HLA mismatches. If the criterion for the minimum number of cells for a single CBT is not met, double CBT (infusion of two CBUs from different donors into the same recipient) could be an option.
By contrast, CFU-GM (colony forming unit-granulocyte monocyte) and viability are generally not considered criteria for donor selection. In brief, when multiple CBUs are identified as potentially suitable for a patient, cord blood selection proceeds as follows. For CBUs with a 6/6 or 5/6 HLA match, HLA-A or HLA-B mismatches are preferable to HLA-DRB1 mismatches. In malignant disorders, the nucleated cell dose should be >2.5 to 3.0 x 10^7/kg at freezing and >2.0 to 2.5 x 10^7/kg after thawing; the CD34+ cell dose should be approximately 1.2 to 1.7 x 10^5/kg at freezing or after thawing. In non-malignant disorders, higher total and CD34+ cell doses are requested, and an HLA match is preferable. For CBUs with a 4/6 HLA match, HLA-A or HLA-B mismatches are better than HLA-DRB1 mismatches. An HLA-DRB1 mismatch may lead to a high GvL effect in an advanced phase of the disease. In malignant disorders, the nucleated cell dose should be >3.5 x 10^7/kg at freezing or >3.0 x 10^7/kg after thawing, and the CD34+ cell dose approximately >1.7 x 10^5/kg at freezing or after thawing. In non-malignant disorders, both the nucleated cell dose and the CD34+ cell dose should be higher. CBUs with a 3/6 HLA match should be avoided. In extremely severe cases, for patients with malignant disorders, 3/6 matched CBUs can be used if a high nucleated cell dose is given, while they are not recommended for patients with non-malignant disorders. If several cord blood units fit the above criteria, the following factors should be taken into consideration: cord blood bank accreditation status and location, ABO compatibility, allele-level typing of HLA-A and -B, other HLA factors such as HLA-C, high resolution HLA typing, non-inherited maternal antigens (NIMA), and anti-donor HLA antibodies in the patient. Double cord blood transplants can be recommended if the cell dose is insufficient with a single CBU. The total dose should be at least 3 x 10^7 TNC/kg, and the two units and the recipient should be as closely HLA-matched as possible; double CBT is associated with good engraftment and survival, with more GvHD and less relapse than single CBT.

In 2004, to evaluate the contribution of cord blood to allo-HSCT versus the other stem cell sources, two studies compared the outcome of unrelated CBT and bone marrow transplantation (BMT) in children with malignant diseases (Rocha et al., 2004; Laughlin et al., 2004). On behalf of Eurocord, Rocha et al. published a study comparing the outcome of matched unrelated BMT (HLA 6 out of 6), either unmanipulated or T-cell depleted, with that of mismatched CBT. The results showed that after CBT engraftment was delayed, GvHD was reduced to a degree similar to T-cell depleted BMT, and there was no difference in relapse or in leukaemia-free survival (LFS). Eapen et al. (2007), for the CIBMTR and the New York Cord Blood Bank (NYCBB), compared the outcomes of 503 children with acute leukaemia given an unrelated mismatched CBT with those of 282 unrelated BM transplant recipients. HLA allele-mismatched BM recipients had more acute and chronic GvHD, without a decrease in LFS.
Importantly, even when an allele-matched BM donor was used, LFS was not statistically different from that after a CBT with one or two HLA disparities, and HLA-matched CBT recipients had better outcomes than HLA allele-matched BM recipients (figure 1.1.2). However, increased transplant related mortality was observed in children transplanted with a low CB cell dose (<3 x 10^7/kg) and a one HLA disparate CB graft, or in children given a two HLA disparate CBT independently of the cell dose infused. Interestingly, the use of a two HLA mismatched CBT was associated with a lower incidence of relapse (Eapen et al., 2007).

Similar studies were performed in adults with malignancies. Eurocord compared adults with acute leukaemia receiving either a matched unrelated bone marrow transplant (HLA 6 out of 6) or a mismatched cord blood transplant. The results showed that, despite delayed engraftment, CBT gave an LFS similar to that of BMT. In the same issue of the journal, CIBMTR and NYCBB showed that, in adults with malignancies, CBT gave the same LFS as a one antigen mismatched unrelated bone marrow transplant (Rocha et al., 2004; Laughlin et al., 2004). Recently, Eurocord and CIBMTR performed a joint study comparing the outcome of unrelated HLA-matched or 1-2 antigen mismatched bone marrow (n=364) or G-CSF-mobilized peripheral blood (n=728) transplants with that of mismatched cord blood transplants (n=148) in adults with acute leukaemia. In multivariate analysis, transplant related mortality (TRM) was higher after CBT, but relapse rate and GvHD were lower, resulting in the same LFS as with the other sources of stem cells (Eapen et al., 2010). Taken together, the results of these comparative studies and of a meta-analysis (Hwang et al., 2007) showed that CBT is feasible in adults when a cord blood unit contains a high number of cells and should be considered an option as an allogeneic stem cell source for patients lacking an HLA-matched bone marrow donor; despite increased HLA disparity, unrelated CBUs offer sufficiently promising results compared with MUD-HSCT in adults with hematological malignancies. Further improvement has been obtained by the use of double cord blood transplants and of reduced intensity conditioning regimens (Brunstein et al., 2011). These papers were a landmark in the worldwide development of cord blood transplantation, as they clearly demonstrated that CBT could be used in adults as well as in children and that unrelated mismatched CBT gave the same results as HLA-matched unrelated BMT. Furthermore, the results of these studies, both in children and adults, led to the conclusion that the search for BM and CB from unrelated donors should be started simultaneously, especially in patients with acute leukaemia, where the time factor is crucial (Hwang et al., 2007; Brunstein et al., 2011). Figure 1.1.3 illustrates the algorithm of donor choice.

Figure 1.1.2 Leukaemia-free survival (adjusted probability shown as %) according to stem cell source (modified from Eapen et al., 2007). BM = bone marrow; PBPC = peripheral blood progenitor cells; UCB = umbilical cord blood

Figure 1.1.3 Algorithm of donor choice (EBMT-ESH Handbook on Haematopoietic Stem Cell Transplantation, 2012 edition). ABO = ABO blood group system; CMV = cytomegalovirus
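
The cord blood unit selection criteria summarized in section 1.1.1 (HLA match out of 6 plus minimum cell doses) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not a clinical tool: the function names and the data layout are hypothetical, the match count is a simple per-locus comparison that ignores mismatch direction, only the thresholds for malignant disorders at freezing are encoded, and where the text gives a range its lower bound is used.

```python
from collections import Counter

# Loci used for the conventional 6-point match (two types per locus).
LOCI = ("A", "B", "DRB1")

# Minimum doses per kg recipient body weight, at freezing, malignant disorders.
# Assumption: where the text gives a range, its lower bound is used here.
MIN_TNC_PER_KG = {6: 2.5e7, 5: 2.5e7, 4: 3.5e7}
MIN_CD34_PER_KG = {6: 1.2e5, 5: 1.2e5, 4: 1.7e5}


def hla_match_score(unit_typing, patient_typing):
    """Count shared HLA types over HLA-A, -B and -DRB1 (maximum 6).

    Assumption: a multiset intersection per locus, which ignores the
    direction of the mismatch (graft versus host or host versus graft).
    """
    score = 0
    for locus in LOCI:
        shared = Counter(unit_typing[locus]) & Counter(patient_typing[locus])
        score += sum(shared.values())
    return score


def dose_per_kg(total_cells, recipient_weight_kg):
    """Convert a total cell count into a dose per kg recipient body weight."""
    return total_cells / recipient_weight_kg


def unit_meets_criteria(match_score, tnc_per_kg, cd34_per_kg):
    """Check a unit against the encoded thresholds.

    Units with a 3/6 match or lower are rejected, since the text
    recommends avoiding them outside extremely severe cases.
    """
    if match_score not in MIN_TNC_PER_KG:
        return False
    return (tnc_per_kg > MIN_TNC_PER_KG[match_score]
            and cd34_per_kg >= MIN_CD34_PER_KG[match_score])


# Hypothetical typings: one HLA-A mismatch, all other types shared.
patient = {"A": ("A1", "A2"), "B": ("B7", "B8"), "DRB1": ("15:01", "03:01")}
unit = {"A": ("A1", "A3"), "B": ("B7", "B8"), "DRB1": ("15:01", "03:01")}

score = hla_match_score(unit, patient)
tnc = dose_per_kg(1.5e9, 50)  # hypothetical unit and 50 kg recipient
print(score, tnc, unit_meets_criteria(score, tnc, 1.5e5))
```

In this sketch the same unit would fail at a 4/6 match, because the nucleated cell dose threshold rises with the number of HLA mismatches. The secondary factors listed in the text (accreditation, ABO compatibility, HLA-C, NIMA, anti-donor antibodies) are deliberately left out.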

1.1.2 The genetics of cord blood transplantation

Our knowledge of donor selection strategies has been enhanced considerably over the years thanks to the publication of studies analyzing the outcome in very large groups of transplant patients, all with high resolution tissue typing results. Besides cell dose, HLA compatibility represents a major criterion for appropriate donor selection. This is why the immunogenetics of allogeneic HSCT is a major topic of interest in this setting. Despite the large degree of consensus in the literature regarding the selection of the optimal donor, many questions remain, in particular regarding the impact of non-classical HLA loci or non-HLA genetics on clinical outcome.

1.1.2.1 Classical HLA genetics

The primary role of HLA molecules is to present peptides to T-cells, enabling recognition and clearance of non-self particles, and also to prevent the misrecognition of self as foreign. These natural functions, crucial for an efficient immune response and for the control of autoimmunity, represent a major barrier in transplantation and need to be overcome (or manipulated) in order to allow grafts between HLA non-identical individuals.

The HLA system displays extensive polymorphism, most likely driven by evolutionary pressure on the immune system to adapt to the variety of infectious pathogens and so control the emergence of diseases. Despite this massive diversity, compatible donors can be identified, making allo-HSCT a possible and relatively safe procedure, thanks to the Mendelian inheritance of HLA and the presence of well defined haplotypes and linkage disequilibrium.
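
Linkage disequilibrium between two alleles is commonly quantified as the difference between the observed haplotype frequency and the frequency expected under independence, D = p_AB - p_A * p_B, often normalized to D' by its maximum attainable value. The following Python sketch shows this standard calculation; the function name and the example frequencies are hypothetical and are not data from this thesis.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, D') for alleles A and B at two loci.

    p_ab: observed frequency of the haplotype carrying both A and B
    p_a, p_b: frequencies of allele A and allele B
    D' is D divided by the largest magnitude D can reach for these
    allele frequencies, so it keeps the sign of D.
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    return d, d_prime


# Hypothetical frequencies: two alleles at 10% each, found together on 8%
# of haplotypes instead of the 1% expected if they were independent.
d, d_prime = linkage_disequilibrium(0.08, 0.10, 0.10)
print(round(d, 4), round(d_prime, 4))
```

A D value of zero corresponds to independent assortment of the two alleles, while values of D' close to 1 indicate that the alleles are almost always inherited together as a haplotype.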

1.1.2.1.1 The HLA system

The Major Histocompatibility Complex (MHC) was discovered in mice by Peter Gorer and George Snell (Marsh et al., 2000). The identification of such antigens in humans followed the description of anti-leukocyte antisera, detectable by agglutination assays, in the sera of patients who had received multiple blood transfusions. Thus Human Leukocyte Antigen (HLA) corresponds to the MHC in humans.

Genetic organization. The MHC contains more than 200 genes, mainly related to immunity, and is contained within 4.2 Mbp of DNA on the short arm of chromosome 6 at 6p21.3. Three main regions can be identified: the HLA Class I region (containing HLA-A, -B and -C genes), the HLA Class II region (containing HLA-DR, -DQ and -DP genes) and between these, the Class III region (including genes that encode complement factors and tumor necrosis factor, TNF) (see figure 1.1.4).

Figure 1.1.4 Gene organization of HLA region on the short arm of chromosome 6 and the corresponding antigens on the cell membrane.


Structure and function of HLA molecules. Overall, HLA Class I and Class II molecules display a similar structure, where most of the polymorphisms are located in the peptide binding groove (PBG) (Marsh et al., 2000). HLA molecules are expressed on the cell surface, where their function is to present peptides to T-cells as antigens. Each MHC allele can present thousands of different peptides to the T-cell receptor, which is able to recognize a peptide only if it is presented by the same MHC molecule as encountered during priming, a concept known as “MHC restriction” (Marsh et al., 2000).

Figure 1.1.5 The interaction between HLA class I and II molecules and T-cell subsets. On the right, schematic representation of HLA genes and corresponding antigen on the cell membrane; on the left, HLA class I molecules interact with a CD8+ T-cell (bottom) while HLA class II molecules interact with a CD4+ T-cell (top), in the process of antigen presentation.

Class I molecules are found on most nucleated cells and platelets and consist of an α chain which is associated with β2 microglobulin (β2m) (a non polymorphic protein encoded on chromosome 15). Bound peptides are classically 8-10 amino acids long, interacting with the Class I molecule through pockets in the PBG. The exposed portion of the peptide and the upper faces of the two α-helices of the Class I molecule interact with the CD8+ T-cell receptor.


Class II molecules are generally restricted to cells of the immune system (e.g. B cells, dendritic cells), even if they can be induced on other cell types during the immune response, and consist of two transmembrane glycoproteins, the α and β chains. Bound peptides are longer (12–24 amino acids), as the PBG of Class II molecules has open ends which allow the peptide to extend beyond the groove, where it is presented to the CD4+ T-cell receptor.

Polymorphism. The HLA region is the most polymorphic in the human genome (http://www.ebi.ac.uk/imgt/hla/stats.html) (table 1.1.2). While in Class I molecules the α chain is highly polymorphic, in Class II molecules it is the β chain that is highly polymorphic, with limited polymorphism in the α chains. The polymorphism is concentrated in the regions encoding the PBG and the sites of interaction with the T-cell receptor. This maximizes the likelihood of an efficient immune response against the largest number of different triggers.

Table 1.1.2 The number of HLA alleles currently known at each locus (updated April 2011).

Nomenclature and typing methods. Owing to the extensive polymorphism, serological typing techniques soon proved completely inadequate to uncover the level of diversity present in the HLA system. DNA-based tissue typing techniques have introduced great advances in the field and nowadays DNA typing can yield low, medium or high resolution data (also see table 1.1.3). HLA nomenclature has been revised accordingly. Each HLA allele name is unique and follows strict nomenclature conventions, where a number consisting of up to four sets of digits separated by colons is assigned. The first set of digits (before the first colon) describes the allele group, which often corresponds to the serological antigen, e.g. A*24 (the asterisk denotes DNA-based typing). This is the level obtained by low resolution typing techniques. Low resolution typing (or serology) may be appropriate in certain circumstances (e.g. screening of potential sibling donors), but it is generally insufficient for selecting unrelated donors. The second set of digits (after the first colon) lists the subtypes, with different numbers denoting one or more nucleotide substitutions that change the amino acid sequence of the encoded protein, e.g. A*24:02 or A*24:05. Medium resolution tissue typing techniques (e.g. SSO, SSP) can define specific allele groups and subtypes, although often as a string of possible alleles within a particular allele group, where the use of National Marrow Donor Program (NMDP) codes can be helpful for unambiguous definition (http://bioinformatics.nmdp.org/HLA/hla_res_idx.html). High resolution typing methods resolve the tissue type to allele level, with no ambiguity. A type defined by the first two sets of digits is referred to as high resolution level, which is recommended for the selection of unrelated donors. The allele name can be extended by further sets of digits, representing first synonymous mutations and then intronic or other non-coding variants, which are currently not considered when selecting donors or scoring HLA matches. The addition of an optional letter at the end of the name indicates a major alteration in expression (e.g. an “N” for a null allele). Novel alleles are regularly reported as soon as they are identified and sequenced. Therefore HLA Nomenclature updates are produced on a monthly basis and published in the journals Tissue Antigens, Human Immunology and the International Journal of Immunogenetics, listing all the new and confirmatory sequences reported to the Nomenclature Committee, plus information on errors and corrections to sequences (https://www.ebi.ac.uk/ipd/imgt/hla/; http://hla.alleles.org/nomenclature/index.html).
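As a minimal illustration of the naming convention described above, the following Python sketch splits an allele name into its colon-separated sets of digits and reduces it to the two-field level used for donor selection. The function names `parse_hla_allele` and `two_field`, the regular expression and the dictionary keys are assumptions made for this example and do not belong to any official nomenclature tool; only the general pattern (locus, asterisk, up to four sets of digits, optional expression letter) is taken from the text.

```python
import re

# Hypothetical pattern for names such as "A*24:02" or "HLA-A*24:02:01:02L":
# optional "HLA-" prefix, locus, asterisk, one to four sets of digits
# separated by colons, and an optional trailing expression letter (e.g. "N").
_ALLELE_RE = re.compile(
    r"^(?:HLA-)?(?P<locus>[A-Z0-9]+)\*"
    r"(?P<fields>\d{2,4}(?::\d{2,4}){0,3})"
    r"(?P<suffix>[A-Z])?$"
)


def parse_hla_allele(name):
    """Split an HLA allele name into its nomenclature fields."""
    match = _ALLELE_RE.match(name.strip())
    if match is None:
        raise ValueError(f"not a recognised HLA allele name: {name!r}")
    fields = match.group("fields").split(":")
    # Pad to four entries so that missing fields are reported as None.
    fields += [None] * (4 - len(fields))
    return {
        "locus": match.group("locus"),
        "allele_group": fields[0],       # first set: allele group (low resolution)
        "protein": fields[1],            # second set: amino acid sequence subtype
        "synonymous": fields[2],         # third set: synonymous substitutions
        "noncoding": fields[3],          # fourth set: intronic/non-coding variants
        "expression_suffix": match.group("suffix"),
    }


def two_field(name):
    """Reduce an allele name to its first two sets of digits."""
    parsed = parse_hla_allele(name)
    if parsed["protein"] is None:
        raise ValueError(f"{name!r} is typed at allele-group level only")
    return f"{parsed['locus']}*{parsed['allele_group']}:{parsed['protein']}"
```

In this sketch the reduction to two fields deliberately drops the expression letter; a real matching procedure would have to decide how to treat alleles with altered expression, which is outside the scope of the example.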

Figure 1.1.6 Nomenclature for factors of the HLA System (modified from Marsh SGE, http://hla.alleles.org).


Table 1.1.3 An example of HLA nomenclature and its relation to tissue typing techniques.

Linkage disequilibrium and haplotypes. Linkage disequilibrium (LD) refers to the fact that certain alleles occur together with a frequency greater than would be expected by chance (non-random gametic association). In general, LD is more frequently observed between loci that are in close proximity. A haplotype describes a group of genes which are inherited together. Certain HLA haplotypes are common in particular ethnic groups. The α and β subunits of each HLA Class II molecule display strong LD, which explains why tissue typing involves only the highly polymorphic β subunits. Among HLA class I molecules, HLA-B and HLA-C display strong LD; however, more than one HLA-C allele may be in LD with a particular HLA-B allele. Allele prediction based on LD studies is not completely accurate, particularly in ethnic groups where HLA types have not yet been well studied. For this reason, high resolution typing pre-transplant is mandatory to avoid mismatches. LD may extend across the entire class I and II regions (haplotype); however, identity for all the HLA loci does not necessarily predict identity for the intervening genes. In fact, recombination events may occur, with increased frequency at certain points within the MHC (Marsh et al., 2000). One example is between HLA-DP and the other class II loci, which explains the difficulty in finding a donor matched for HLA-DPB1 in addition to the other HLA loci.

HLA structures and functions. Antigenic mismatches are characterized by amino acid substitutions in both the peptide binding and T-cell recognition regions, whereas allelic mismatches are characterized by amino acid substitutions in the peptide binding regions only. Both limited and extensive polymorphic differences may result in functional immunogenicity, because limited polymorphisms are not functionally null. In fact, even small differences between MHC molecules are recognized to influence T-cell recognition.
Prior to DNA sequencing, serologically defined groups of related HLA variants were subdivided on the basis of their differential reactivity with alloreactive T-cells. When the foreign MHC molecule closely resembles self-MHC, a cross-reaction with self-educated T-cells is more likely, thus explaining why limited differences may induce alloresponses even stronger than numerous differences (Lechler et al., 1990). Originally donor selection included functional studies (mixed lymphocyte cultures, MLC, or cytotoxic T-lymphocyte precursor frequency, CTLp) between patient and donor to demonstrate directly the degree of allogenicity within a pair. These methods have subsequently been replaced by DNA typing methods. Attempts have been made to create scoring systems that electronically predict the likelihood of allogenicity, based on functional and structural differences between two HLA alleles; however, these methods are not yet available in routine practice.

1.1.2.1.2 Donor search and procurement

The procurement of unrelated donors (adult volunteers or CBUs) is one of the major issues in allo-HSCT, including the organizational framework involved in providing unrelated HSCs and the complex decision tree between the three types of stem cell product and between the different available donors or units. In fact, in most countries where allo-HSCT is a therapeutic procedure widely available to the patients in need, families have been so small for many decades that an HLA identical sibling donor is available for less than a third of the patients. This is the reason why the majority of patients have to rely on alternatives: PBSC or BM from adult MUDs or frozen CBUs. The probability and speed of finding a matched unrelated donor are significantly improved when high resolution typing of the patient is already available prior to the donor search. In the setting of allo-HSCT, typing must be done by DNA methods to avoid hidden mismatches, particularly antigenically silent alleles, and shall include at least HLA-A, -B, -C and -DRB1, with HLA-DQB1 typing considered desirable and HLA-DPB1 optional. The probability of identifying an unrelated donor can be estimated using various surrogate markers, such as the patient haplotype, the presence of rare alleles or unusual HLA associations (B/C, DRB1/DQB1) and the number of potential donors identified at search. The search is generally performed by the transplant center (TC) by means of an associated registry. Unrelated donor (and CBU) search reports can be difficult to interpret. Therefore it is critical for the person interpreting the search report to have a good knowledge of HLA haplotypes and LD, as well as of NMDP codes and HLA nomenclature, in order to select appropriate donors for confirmatory typing in a time efficient manner.
Often the best donor may not be the first one listed on the search report, therefore a close collaboration with the reference histocompatibility laboratory is also highly desirable. In this setting, BMDW is a database with the HLA phenotypes of MUDs and more comprehensive information on CBUs from all registries and CBBs accessible for international transplants worldwide. BMDW is operated by the Dutch registry, Europdonor, located in Leiden, and provides a search tool to obtain a list of matched or acceptably mismatched MUDs or CBUs based on a patient's HLA phenotype. As of today, BMDW can be regarded as complementary to the European Marrow Donor Information System (EMDIS). This is a network connecting the computer systems of several registries with MUDs and CBUs on 5 continents that, for every new patient requiring an international search, forwards the search to all partners and receives the list of matched MUDs and/or CBUs electronically. Although only 40% of the registries are connected via EMDIS, over 80% of international transplant activity takes place between these partners. In particular, the lists of CBUs display a number of distinct features: 1) CBUs are grouped according to the number of HLA differences for HLA-A, -B and -DRB1, where the former two are evaluated at serological or low-resolution level and HLA-DRB1 at allele level; 2) they provide important additional information, in particular the TNC and CD34+ cell counts, with a link to the detailed CBU report. Although the management of MUDs and publicly available CBUs varies between and even within countries, certain structures and functional entities are similar. Ideally, a Registry is the single point of access to all MUDs and CBUs in a country and is designated as the national hub, collecting the relevant data from the inventories of its affiliated donor centers and cord blood banks into a single file and maintaining adequate administrative and IT infrastructure to facilitate the search process. A search unit operating within or on behalf of a transplant center accesses the registry for national patients, acting in close contact with the transplant physicians responsible for the patient and the local HLA laboratory, with the aim of identifying a suitable donor. The Registry provides lists of suitable MUDs and/or CBUs to requesting organizations and manages the flow of information in the donor search processes. At the other end of the network, there are the providers of the stem cell sources, either the donor centers for MUDs or the cord blood banks for CBUs.
The donor center is responsible for recruiting new donors, obtaining the informed consent of interested volunteers and registering their personal and search relevant data. The work-up leading to the stem cell harvest is a complex process carried out by a trained physician to ensure a maximum of safety for both donors and patients, and is usually done via the national hub for details of the harvest and transport of the product.

In this setting, the Cord Blood Bank (CBB) represents a multidisciplinary structure that is responsible for the recruitment and subsequent management of donors/mothers as well as the collection, processing, testing, cryopreservation, storage, listing, reservation, release, and distribution of CBUs (Figure 1.1.7). It is typically affiliated to a registry and is also responsible for evaluating clinical outcomes to ascertain that the units shipped for transplantation are safe and potent (Rubinstein, 2009; Gluckman et al., 2011) (Figure 1.1.8).


Figure 1.1.7 Schematic representation of CBB functions. Cord blood is collected at birth from healthy pregnant women upon signing informed consent (collection); cord blood units with appropriate cell content are fully characterized, miniaturized by volume reduction and cryopreserved for storage in liquid nitrogen vapor tanks (banking); the data of banked CBUs are submitted to the national registry (i.e. IBMDR) to be available for unrelated donors search (listing).


Figure 1.1.8 The organizational framework involved in providing unrelated cord blood stem cells. Cord blood units data are all part of a global inventory managed by NetCord and BMDW (listing). Cord blood donor search is performed by the means of the national registry (i.e. IBMDR). Upon confirmation of procurement for a patient by a requesting institution (the transplant center), a cord blood unit is shipped in frozen status (issue for transplantation). Once infused in the patient, the follow-up data (outcomes) are sent back to the providing registry (and cord blood bank) by the means of Eurocord, where these data are used for retrospective analysis in large multicentric collaborative studies.


CB donation. Donation refers to all activities necessary to ensure safe collection. First, prior to labor, an information program allows healthy pregnant women to reach an autonomous decision on donation, and a signed consent to collect and donate is obtained. During labor, the potential donor is evaluated by a trained health professional according to donor eligibility criteria, using a specific health questionnaire to disclose risk behaviors and travel history. In general, any disease of infectious, genetic, neoplastic or immune origin causes donor exclusion. Consenting eligible donors undergo cord blood collection, generally performed by venipuncture of the umbilical cord vein after birth under aseptic conditions, with drainage of the blood into sealed bags. Two main collection techniques are in place, one performed while the placenta is still in utero and the other performed immediately after placental delivery in a contiguous area (ex utero collection). Importantly, CB collection must not interfere with the management of labor. Once collected, units are kept under controlled conditions before transportation to the processing center. Finally, donor follow-up is obtained to update the anamnesis with previously unknown medical conditions of the mother, the newborn or relatives.

CB manufacturing. Cord blood tissue refers to the blood, including HSCs, harvested from placental and umbilical vessels after birth. Collected products are sent to a processing center, where they are evaluated against a minimum cellular threshold defined to identify which units will be processed, with the purpose of increasing the efficiency of the inventories. Products are qualified according to safety, identity, purity and potency. The units then undergo preparation for cryopreservation and freezing, to be kept on inventory for many years at < -150°C.

CB provision. In order to make inventories available for transplantation, cord blood banks list their validated units using the minimum essential data required for search procedures. Searches usually first sort potential candidates according to HLA match categories and then by cell dose (see figure 1.1.9) (Gluckman and Rocha, 2004; Gluckman et al., 2006). Therefore the data usually exported from the local processing databases to publicly accessible global databases are the HLA typing of the HLA-A, -B and -DRB1 loci and the cell counts. For historical reasons, search databases contain units typed at different resolutions, even if current standards recommend that new units are typed using molecular techniques at low resolution for class I loci and high resolution for class II loci. Cell dose is primarily evaluated by the amount of TNC and, in addition, by the CD34+ cell count, which is increasingly used as an indicator of graft potency.
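The sorting logic described above (HLA match category first, then cell dose) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the algorithm of any registry: the function names, the dictionary layout and the field `tnc_per_kg` are hypothetical, matches are counted out of six (HLA-A and -B compared on the first set of digits, HLA-DRB1 on the first two sets), and the direction of the mismatch is ignored.

```python
from collections import Counter


def first_field(allele):
    """Low resolution level, e.g. 'A*24:02' -> 'A*24'."""
    return allele.split(":")[0]


def two_fields(allele):
    """High resolution level, e.g. 'DRB1*03:01:01' -> 'DRB1*03:01'."""
    return ":".join(allele.split(":")[:2])


def locus_matches(patient_alleles, unit_alleles, level):
    """Count matched alleles (0, 1 or 2) at one locus at the given level."""
    patient = Counter(level(a) for a in patient_alleles)
    unit = Counter(level(a) for a in unit_alleles)
    # Multiset intersection keeps each shared type once per matched copy.
    return sum((patient & unit).values())


def match_grade(patient_hla, unit_hla):
    """Number of matches out of 6 across HLA-A, -B (low res) and -DRB1 (high res)."""
    grade = 0
    for locus in ("A", "B"):
        grade += locus_matches(patient_hla[locus], unit_hla[locus], first_field)
    grade += locus_matches(patient_hla["DRB1"], unit_hla["DRB1"], two_fields)
    return grade


def rank_units(patient_hla, units):
    """Sort candidate units by HLA match grade, then by TNC dose per kg."""
    return sorted(
        units,
        key=lambda u: (-match_grade(patient_hla, u["hla"]), -u["tnc_per_kg"]),
    )
```

Each candidate unit is assumed here to be a dictionary with an `hla` entry (two alleles per locus) and a `tnc_per_kg` entry already normalized to the recipient body weight; a 6/6 unit is therefore always listed before a 5/6 unit, and units with the same grade are ordered by decreasing cell dose.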


Figure 1.1.9 Basic criteria for cord blood donor selection. The number of HLA disparities correlates with neutrophil recovery (left), with a statistically significant decrease for 3-4 mismatches versus no or 1 or 2 mismatches. TNC (total nucleated cell) dose at collection correlates with TRM (transplant related mortality) (right), which decreases as increasing cell doses are infused and is significantly higher for cell doses lower than 2 × 10⁷ (modified from Gluckman et al., 2004, left, and Gluckman and Rocha, 2004, right). Cell dose is shown as number/kg of recipient body weight. CI = cumulative incidence.

After listing, the CBB may receive further requests from the TC, such as preliminary reports, additional testing, reservation and allocation of a CBU for a defined patient, including identity confirmation and shipment. As the CB graft is already harvested, the shipment can be arranged immediately, although a number of administrative and technical procedures are required before transporting the unit using trained couriers. On reception at the transplant center, the unit needs to be transiently stored below -150°C; then, after conditioning, the CBU is reconstituted for infusion after a dilution/washing step. Cord blood bank tasks end when a basic clinical follow-up analysis is obtained. In fact, engraftment and low toxicity after infusion represent the best proof of CB quality. In this regard, Eurocord in Europe and CIBMTR in the US are the main providers of follow-up data, which are the data source for retrospective studies on CB efficacy that may reveal important insights for future modification of CBB practices. In the setting of donor search and procurement, WMDA defines the standards for registries, encompassing all their relevant collaborating entities. Several registries worldwide have been accredited successfully so far, including the Italian Bone Marrow Donor Registry (IBMDR) (www.worldmarrow.org).
In particular, among HSC sources cord blood is a highly regulated product. In this setting, the NetCord-FACT International Standards for Cord Blood Collection, Banking, and Release for Administration, produced through a collaborative effort between NetCord and the Foundation for the Accreditation of Cellular Therapy (FACT), provide a tool for harmonizing the procedures in use worldwide (www.factwebsite.org). As the production of cord blood grafts is expensive and the cellular threshold for a secure engraftment is well defined, a useful parameter to consider for the management of a CBB program is the minimal cell dose acceptable for processing. For instance, to target a patient body weight of 50 kg and accepting 2.5 × 10⁷ TNC per kilogram as a safe target, the cellular threshold should be at least 125 × 10⁷ TNC post-processing. International networking is also necessary to cover the entire population, including ethnic minorities: enlarging the inventory has only marginal benefits, while approaches focused on enrolling candidate donors of non-European descent can be attempted to profitably increase the genetic diversity.
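The arithmetic behind the processing threshold mentioned above can be written out as a short Python sketch. The function names are hypothetical and the values are simply those of the worked example in the text (50 kg target body weight, 2.5 × 10⁷ TNC per kilogram); a real banking program would set its own targets.

```python
def min_tnc_threshold(target_weight_kg, dose_per_kg):
    """Minimum post-processing TNC content needed to reach the target
    dose per kilogram for a recipient of the given body weight."""
    return target_weight_kg * dose_per_kg


def max_recipient_weight(unit_tnc, dose_per_kg):
    """Heaviest recipient (in kg) for whom a unit with the given
    TNC content still provides the target dose per kilogram."""
    return unit_tnc / dose_per_kg


# Worked example from the text: 50 kg x 2.5e7 TNC/kg = 125e7 TNC.
threshold = min_tnc_threshold(50, 2.5e7)
```

Read in the other direction, the same relation indicates the heaviest recipient a banked unit can serve at the target dose: a unit containing 125 × 10⁷ TNC covers patients up to 50 kg.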

1.1.2.1.3 Clinical effects of HLA matching

Discrepancies can be found between different studies reporting on the impact of HLA matching, and caution should be exercised when comparing or interpreting the results. The year of transplantation, patient demographic profile and ethnicity (homogeneity of HLA in the population), conditioning type, stem cell source and T-cell depletion are likely to affect the results and should be taken into account. As previously detailed, the HLA system plays a major role in alloreactivity. As a consequence, HLA mismatches within a donor-recipient pair lead to immunological reactions after allo-HSCT, both in the host-versus-graft and in the graft-versus-host direction (see figure 1.1.10).

Figure 1.1.10 Immunological complications after allogeneic HSCT. Graft rejection acts in the recipient versus donor direction (top), while GvHD and GvL/GvT act in the donor versus recipient direction (bottom). Graft rejection and GvHD severely affect the outcome, while GvL/GvT has a beneficial effect.

The best donor remains an HLA-identical sibling donor, confirming that, despite numerous other donor factors such as age and gender, genetic factors are the most important donor determinant of patient outcome. However, recombination events may occur and result in an HLA mismatch in siblings, particularly for HLA-DPB1. The best unrelated donor is the one who is matched, at high resolution, for the major polymorphic HLA loci. HLA-A, -B, -C, -DRB1 (8/8) are all considered critical, so that an 8/8 matched donor represents the current gold standard. Although some controversy remains about the need to include HLA-DQB1 in donor selection strategies (10/10), many transplant centers routinely type their patients for this locus and consider a 10/10 matched donor as the gold standard. HLA-DPB1 is not routinely included in donor selection strategies, even if it is considered in certain circumstances, such as haploidentical HSCT. Recent large studies from the NMDP/CIBMTR have investigated the impact of HLA mismatching in recipients of myeloablative allo-HSCT (Flomenberg et al., 2004; Lee et al., 2007; Arora et al., 2009). All showed a significant decrease in survival with any degree of HLA mismatching at HLA-A, -B, -C, -DRB1 (8/8), so that a 6/8 match is worse than a 7/8 match, which is worse than an 8/8 match. Two of the studies showed no difference in mortality between a single class I and a single DRB1 mismatch (Flomenberg et al., 2004; Arora et al., 2009). In contrast, in the study by Lee et al., the impact of an HLA-A or -DRB1 mismatch on overall survival (OS) was more marked than that of an HLA-B or -C mismatch (Lee et al., 2007). The earlier NMDP study suggested that low resolution mismatches at HLA-A, -B, -C or -DRB1 were associated with a more adverse outcome than high resolution mismatches (Flomenberg et al., 2004), while the later study showed no significant differences in survival depending on whether the mismatch was allelic or antigenic, except at HLA-C, where an antigenic mismatch increased transplant risks while an allelic mismatch did not (Arora et al., 2009).
Data from the International Histocompatibility Working Group in Hematopoietic Cell Transplantation (Petersdorf, 2007) showed that a single mismatch for HLA-A, -B and -C was significantly associated with reduced overall survival, while a mismatch for a single HLA-DRB1 or -DQB1 allele was not. While some studies have not shown any survival disadvantage associated with a single HLA-DQB1 mismatch (Flomenberg et al., 2004; Lee et al., 2007), others have found significantly worse survival when an HLA-DQB1 mismatch was present in addition to a class I mismatch (Petersdorf et al., 2004; Arora et al., 2009). Several studies have shown that matching of HLA-DPB1 results in lower GvHD risk and increased relapse risk (Shaw et al., 2010), although not all studies have shown a survival detriment. It should be mentioned that there is clinical evidence for tolerated HLA mismatches. In certain circumstances the use of an HLA mismatched donor has been shown to be associated with an outcome similar to that obtained with an HLA matched donor. Several studies have shown that single, or even multiple, mismatches may be tolerated in the setting of high risk/late stage disease (Petersdorf et al., 2004; Lee et al., 2007) and in transplants using T-cell depletion in the conditioning. Furthermore, changes at an amino acid or epitope level may be more significant triggers of an allogeneic response than allelic mismatches, with the result that certain mismatches may be permissive (i.e. not associated with worse clinical outcomes than a match), while others may be non-permissive (i.e. associated with worse clinical outcomes than either a match or a permissive mismatch). Therefore care should be taken not to delay the search unnecessarily if an acceptable HLA mismatched donor is available. Moreover, uncovering such mismatches is extremely complex, due to the polymorphic nature of HLA and the clinical heterogeneity of transplant study populations. Comparisons between Japanese and European-descent patients have found that different allelic mismatches are tolerated differently in these populations (Morishima et al., 2007). For the HLA-DPB1 locus, which is frequently mismatched in patient/donor pairs, thus allowing the analysis of specific mismatches, Zino et al. developed a functional epitope-based algorithm in which different DPB1 mismatches are classified as permissive or non-permissive based on immunogenicity to a shared T-cell epitope (Fleischhauer et al., 2001; Zino et al., 2004). The outcomes in patients matched for DPB1 at an allelic level were similar to those in patients with an allelic but permissive epitope mismatch. Conversely, non-permissive mismatches resulted in a significantly worse overall survival.

1.1.2.2 Non-classical HLA genetics

Among non-classical HLA loci, HLA-G and HLA-E have been extensively investigated in pregnancy, where they play a major role in the tolerance established at the feto-maternal interface. In particular, the expression of HLA-G molecules on cytotrophoblasts (the inner layer of the trophoblast, which is a fetal tissue developing into a large part of the placenta with a major role in implantation) can block the maternal immune response by inhibiting uterine natural killer (NK) cell lysis and CD8+ T-cell cytotoxicity, suppressing maternal allogeneic CD4+ T-cell proliferation and promoting T helper 2 activation. Since Medawar first proposed the so called “paradox of the fetal allograft” in 1953, numerous studies have attempted to shed light on the mechanisms by which fetuses survive unharmed in a genetically foreign host, the mother, during successful pregnancies (Medawar, 1953). Both HLA-G and -E molecules are expressed on the human placenta and their gene polymorphisms have been correlated with pathological conditions related to a deregulated maternal immune response during pregnancy, such as pre-eclampsia and miscarriage, and have also been used in the in-vitro fertilization setting as markers of successful embryo implantation (Persson et al., 2017). Based on the concept that the relationship and trafficking between the semi-allogeneic fetus and the mother represent a model for allogeneic transplantation, HLA-G has also been studied in the setting of transplantation, including allo-HSCT. In solid organ transplants, it has been shown that high levels of soluble HLA-G molecules (sHLA-G) in the blood are correlated with reduced incidences of acute and chronic graft rejection in heart and kidney recipients. For this reason, sHLA-G levels have been proposed as a non-invasive means of monitoring organ transplant patients during their entire follow-up. High sHLA-G blood levels have been demonstrated to correlate with reduced incidence of both rejection and GvHD in allogeneic PBSC transplantation (Le Maux et al., 2008). HLA-G is a non-classical HLA class Ib gene which expresses seven isoforms derived from alternative splicing of a primary transcript: the membrane-bound molecules G1, G2, G3, G4 and the soluble forms G5, G6, G7 (see figure 1.1.11). Although the HLA-G gene is poorly polymorphic in its coding sequences, the unexpectedly high rate of variation in the 5'URR and in the 3'UTR points to the importance of regulating HLA-G cell surface expression and protein release (figure 1.1.12 provides a summary of the most investigated polymorphisms and their potential function).

Figure 1.1.11 Schematic representation of the HLA-G gene in the HLA region (6p21.2-21.3), primary mRNA and the seven isoforms derived from alternative splicing.


Figure 1.1.12 Schematic representation of the HLA-G gene with the most investigated polymorphisms, their potential functions and a summary of HLA-G isoforms (Nilsson et al., 2014).

During pregnancy, both transmembrane and soluble HLA-G molecules (sHLA-G) are physiologically expressed and down-regulate the maternal immune response. sHLA-G is contained in cord blood at birth; it is plausible that sHLA-G is also present in cord blood collections and could play some role in CBT. It has been reported that not only cord blood mesenchymal cells, but also CD34+ cell progenies produce sHLA-G (Avanzini et al., 2009; Buzzi et al., 2012). Taking into account that cord blood derives from pregnancy and that HLA-G molecules act as immune-modulators in both pregnancy and transplantation, our group has previously reported on the HLA-G genotyping of 85 CBUs, aiming to identify the best sHLA-G producers among our cord blood donors and thus to provide a possible explanation (at least in part) of the tolerogenic properties of cord blood. Considering the HLA-G 14bp insertion/deletion (INS/DEL) polymorphism in the 3'UTR of the HLA-G gene, which affects mRNA stability and protein expression (Gonzalez et al., 2010), and measuring sHLA-G levels, we demonstrated a statistically significant correlation between sHLA-G and CD34+ cell concentrations in the group of HLA-G 14bp INS/INS carriers (r=0.5662, p-value=0.0060) (Capittini et al., 2014), with possible implications for donor selection. Other polymorphisms are known to be involved in sHLA-G production and could be investigated in this setting. The HLA-G -725C/G polymorphism (rs1233334) is located in the promoter region of the HLA-G gene, closely flanking an IRF (interferon response factor-1) binding motif. In a study by Ober et al., -725C/G was associated with an increased risk of miscarriage in couples where both partners are carriers of the G allele, compared with couples not carrying the G allele (Ober et al., 2003).
Several authors have previously hypothesized differences in the transcriptional properties of the HLA-G -725G and -725C alleles, probably due to the introduction of an additional methylated cytosine on a CpG dinucleotide in the presence of the -725C allele, which may down-regulate transcription of the HLA-G gene itself. A SNP (+3142 C/G) in the 3'UTR has also been suggested to affect HLA-G expression. This site was proposed to be a target of miRNAs, and the presence of G instead of C was claimed to favour mRNA degradation by miRNAs (Veit et al., 2009).

1.1.2.3 Non-HLA genetics

Even when the HLA loci are identical between the donor and the recipient (in both unrelated donor and sibling transplantation), GvHD and graft rejection may still occur. In fact, it is recognized that other genetic factors exist which may mediate transplant complications through various mechanisms. Unfortunately, only a few multicenter collaborative studies have been performed, in contrast to the many single-center studies reporting the impact of these factors on HSCT outcome. The relevance of the latter is limited by the fact that the study populations often differ and that the reported clinical impact of a given factor frequently conflicts with other reports. Therefore non-HLA immunogenetic factors have not yet entered routine practice in the HSCT field and are only considered in particular subgroups of patients or treatment protocols (Chien et al., 2012; Takami, 2013). In the setting of CBT, it has been described that matching for killer immunoglobulin-like receptor (KIR) ligands or non-inherited maternal antigens (NIMA) may represent non-HLA factors of CBUs able to affect outcomes after CBT and could be used as criteria for CBU selection (Rocha et al., 2012; Rocha et al., 2016). The effect of KIR-ligand matching after CBT was studied in 461 patients with acute myeloid leukemia. Donor-recipient HLA matching considered allele-level matching at HLA-A, -B, -C, and -DRB1. Separate analyses were conducted for 6-7/8 HLA-matched and 3-5/8 HLA-matched transplants to prevent HLA matching from confounding the effect of KIR-ligand matching. Among recipients of transplants with 1-2 HLA mismatches, no significant differences in NRM, relapse, or overall mortality were found according to KIR-ligand match status. Conversely, NRM and overall mortality were higher among recipients of 3-5 HLA-mismatched transplants with KIR-ligand mismatches in the host-versus-graft direction (compared with KIR-ligand matched transplants).
Therefore these data do not support selecting CB units based on KIR-ligand match status for transplants with 1-2 HLA mismatches, while KIR-ligand mismatching should be avoided to lower mortality in transplants with 3 or more HLA mismatches (Rocha et al., 2016).

Based on the concept that in utero exposure to non-inherited maternal antigens (NIMA) induces T regulatory cells in the fetus that are specific to the non-inherited maternal haplotype (and consequently tolerant to NIMA), it has been hypothesized that the alloresponse could be reduced in CBT recipients who are matched to donor NIMAs. In other words, if the mismatch between a CBU and the corresponding recipient of CBT is a NIMA (an antigen of the mother who donated the CBU which the infant donor does not share with her), the presence in the cord blood of regulatory T cells originated after in utero exposure to NIMA leads to tolerance to this mismatch, and theoretically decreases alloreactivity to the same antigen if present in the CBT recipient. To verify this hypothesis, 48 NIMA-matched CBTs (where the NIMA of the donor CBU matched the patient) and 116 non-NIMA-matched CBTs were included in a Eurocord retrospective analysis. TRM was lower after NIMA-matched CBTs compared with NIMA-mismatched CBTs, and consequently overall survival was higher (55% after NIMA-matched CBTs versus 38% after NIMA-mismatched CBTs, p = 0.04). Therefore it has been suggested that, when multiple HLA-mismatched CB units are available for a patient, a NIMA-matched CBU should be chosen to improve survival (Rocha et al., 2012).

Besides KIR, NIMA and minor histocompatibility antigens, cytokine, chemokine and immune response gene polymorphisms are a topic of growing interest in the HSCT setting (Dickinson and Norden, 2015). For instance, proinflammatory cytokines, their receptors and related inhibitors have been implicated in a large number of immunological diseases, including GvHD following allo-HSCT. It has been hypothesized that single nucleotide polymorphisms (SNPs) in the cytokine genes and their promoter regions may be important, particularly if they produce a variation in the functional level or activity of the cytokine. The 2008 paper by Dickinson provided a comprehensive review of the polymorphic cytokine genes that have been studied in the transplant context, including tumor necrosis factor (TNF) genes, interleukin (IL)-10, the IL-1 gene family, IL-2, IL-6, interferon (IFN)-γ, TGF-β1 and TGF-β1 receptors (Dickinson, 2008).
This review has recently been updated to discuss the relevance of genome-wide association studies (GWAS), a recent meta-analysis combining GWAS with gene expression microarray data in the field of autoimmune disease and solid organ transplantation, and novel candidate gene polymorphisms, such as SNPs in microRNAs (Dickinson and Norden, 2015) (see table 1.1.4).
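
The allele-level match grades used to stratify the KIR-ligand analysis above (6-7/8 versus 3-5/8 at HLA-A, -B, -C and -DRB1) can be illustrated with a minimal sketch. The counting rule below (shared alleles per locus, two per locus) is a simplifying assumption and not the algorithm of any registry; the function names and the donor/recipient typings are hypothetical and serve only as an example.

```python
from collections import Counter

# The four loci considered at allele level in the analysis described above.
LOCI = ("A", "B", "C", "DRB1")


def match_grade(donor, recipient, loci=LOCI):
    """Return the number of matched alleles, counting up to two per locus."""
    matched = 0
    for locus in loci:
        # Multiset intersection handles homozygous typings correctly.
        shared = Counter(donor[locus]) & Counter(recipient[locus])
        matched += sum(shared.values())
    return matched


def stratum(grade):
    """Map a match grade out of 8 onto the strata analysed separately."""
    if grade == 8:
        return "8/8"
    if grade in (6, 7):
        return "6-7/8"
    if grade in (3, 4, 5):
        return "3-5/8"
    return "outside the analysed strata"


# Hypothetical typings, for illustration only.
donor = {
    "A": ("A*01:01", "A*02:01"),
    "B": ("B*07:02", "B*08:01"),
    "C": ("C*07:01", "C*07:02"),
    "DRB1": ("DRB1*03:01", "DRB1*15:01"),
}
recipient = {
    "A": ("A*01:01", "A*02:01"),
    "B": ("B*07:02", "B*44:02"),
    "C": ("C*07:01", "C*05:01"),
    "DRB1": ("DRB1*03:01", "DRB1*15:01"),
}

grade = match_grade(donor, recipient)
print(f"{grade}/8 matched, stratum {stratum(grade)}")
```

In this example the pair differs by one allele at HLA-B and one at HLA-C, so it falls in the 6-7/8 stratum. A real selection procedure would also have to consider the direction of each mismatch, which this sketch ignores.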

Table 1.1.4 Review of the studies published on non-HLA polymorphisms associated with acute GVHD, chronic GVHD, transplant-related mortality, survival and relapse in HLA matched sibling and matched MUD HSCT (Dickinson and Norden, 2015).

SNP associations were linked to either increased ↑ or decreased ↓ incidence or severity of acute or chronic GVHD (aGVHD or cGVHD), transplant-related mortality (TRM), relapse or overall survival (OS) following HLA-matched sibling transplants (Sib) or matched unrelated donor transplants (MUD). The effect can be carried by either the patient (P) or the donor (D).

Recently, polymorphic chemokine receptor genes, such as CCR5 and CCR9, have been extensively investigated for their potential role in infectious complications and GvHD after HSCT. C-C chemokine receptor type 5 (CCR5) is mainly expressed on T-cells, macrophages, and dendritic cells and plays a role in the trafficking of these cells to the site of inflammation. A very recent study by the Japan Marrow Donor Program has shown that recipient CCR5 variation predicts relapse and survival. In particular, the recipient CCR5-2086A/A genotype affects the induction of the GvL effect without augmenting GVHD, suggesting that CCR5 genotyping in transplant recipients may be useful for evaluating pre-transplantation risks (Horio et al., 2017). Moreover, homozygous carriers of the CCR5 gene variant CCR5-Δ32 are highly resistant to infection with human immunodeficiency virus type 1 (HIV-1), thus representing the preferred stem cell donors for HIV-infected patients. In fact, HSCT with CCR5-Δ32/Δ32 stem cells from an adult donor is currently the only known cure for HIV infection, and cord blood has recently been suggested as the ideal candidate stem cell source (Petz et al., 2013; Duarte et al., 2015). Typing listed donors for CCR5-Δ32 could be a novel strategy for registries to improve their inventories and the utilization of cord blood (Solloch et al., 2017). Several genes related to the innate immune system, particularly those involved in the recognition of pathogens, have recently been studied in the HSCT setting, including the NOD-like receptors (NOD2/CARD15) and the Toll-like receptors (TLR), which have been shown to affect transplant complications including relapse, GvHD and susceptibility to infections (Penack et al., 2010). In particular, host defense and inflammatory gene polymorphisms were reported to correlate with outcome in the 2002 study by Rocha et al., where 107 donor/recipient DNA pairs were genotyped for several gene polymorphisms, including TNF-alpha and -beta, IL-1 receptor antagonist, IL-6, and IL-10, which were found to be informative genetic risk factors for selecting donor/recipient pairs (Rocha et al., 2002). Among the genes related to the adaptive immune response, cytotoxic T-lymphocyte antigen 4 (CTLA-4), also called CD152, has recently been investigated for its potential application as a biomarker and therapeutic target (see also figure 1.1.13). The CTLA-4 gene is located on the long arm of chromosome 2 (2q33.2); it is a member of the immunoglobulin superfamily and encodes a protein that transmits an inhibitory signal to T-cells.
Alternative transcriptional splice variants, encoding different isoforms, have been characterized; CTLA-4 mutations have been associated with insulin-dependent diabetes mellitus, Graves' disease, Hashimoto thyroiditis, celiac disease, systemic lupus erythematosus, thyroid-associated orbitopathy, and other autoimmune diseases. Several SNPs have been described, such as +49A/G in exon 1 and +6230G>A in the 3'UTR (also known in the literature as CT60 G>A or rs3087243, located 3' downstream of exon 4 of the CTLA-4 gene) (see figure 1.1.14). CTLA-4 is an inhibitory co-receptor for CD80 and CD86 that acts as a major negative regulator of T-cell responses. The affinity of CTLA-4 for its natural B7 family ligands, CD80 and CD86, is considerably stronger than that of the stimulatory co-receptor CD28.

Figure 1.1.13 Alterations in cytokine levels (IL-10, IL-6 and IL-2) via immunoregulatory SNPs can lead to altered function of regulatory (Treg) and effector (Teff) T-cells. Binding of CTLA-4 (possibly also via functional SNPs) to the CD80/CD86 (B7) proteins on dendritic cells (DCs) can lead to immunosuppression of Teff; conversely, altered binding of CTLA-4 may lead to reduced immunosuppression via Tregs and subsequent GvHD (Dickinson and Norden, 2015).

Figure 1.1.14 Schematic representation of CTLA-4 gene on the long arm of chromosome 2 (2q33)(top) and two of its main polymorphic nucleotide positions (bottom).

Checkpoint blockade therapy targeting CTLA-4 has been used effectively in both allogeneic and autologous HSCT to restore anti-tumor immunity. In fact, relapse, the principal cause of treatment failure after HSCT, has been reported to involve immune escape, which in turn, at least in some cases, appears to be mediated by increased expression of inhibitory immune checkpoints. Caution is required with this therapeutic approach due to the associated risk of fatal immune-related adverse events and graft-versus-host disease (Merryman and Armand, 2017). A gene modification approach producing a CTLA4-CD28 chimera gene (CTC28) in T-cells manipulated for donor lymphocyte infusion (DLI) has very recently been reported to increase the GvL effect in patients with hematological malignancies, with increased but treatable GvHD (Park et al., 2017). As a biomarker, besides the studies on autoimmune diseases, CTLA-4 has been extensively reported as one of the candidate genes for association studies with acute and chronic GVHD after HSCT. Although a recent CIBMTR analysis reported that the CTLA-4 SNP rs4553808 is not associated with HSCT outcomes in adults with acute myeloid leukemia and advanced myelodysplastic syndrome undergoing a first 8/8 or 7/8 HLA-matched MUD-HSCT (Sengsayadeth et al., 2014), in several single-center studies the AG and GG genotypes of the donor CTLA-4 SNP rs4553808 have been shown to be independent predictors of inferior relapse-free survival (RFS) and overall survival in patients receiving allo-MUD-HSCT, compared with the AA genotype (Vannucchi et al., 2007; Perez-Garcia et al., 2007; Bosch-Vizcaya et al., 2012; Jagasia et al., 2012). In the CBT setting, only one study has been reported, on 115 CBT recipients and their unrelated CB grafts, which were genotyped for TNF-alpha (TNFd3/d3) and IL-10.
The authors showed no correlation between the donor and the recipient risk alleles under investigation and the incidence of acute GvHD grades II to IV (Kögler et al., 2002).

1.2 Human population genetics

Evolution is defined as the change in the inherited characteristics of biological populations over successive generations. The driving forces of this phenomenon are mutation, natural selection, migration and genetic drift. Changes produced in any generation are normally small; they can be advantageous or disadvantageous, or have no influence on the fitness of the carriers. The aim of population genetics is to study the genetic composition of modern populations as a starting point for elucidating the evolutionary events that shaped their structure over millennia, including colonizations, migrations and population expansions. These studies are based on the analysis of genetic polymorphisms, especially those whose phenotypic effects are almost neutral. Besides loci evolving neutrally (or supposedly evolving neutrally), particularly useful are uniparental systems such as the male-specific region of the Y chromosome (MSY, Underhill et al., 2000) and the mitochondrial DNA (mtDNA, Wallace, 1997). The holandric and maternal inheritance, respectively (figure 1.2.1), and the absence of reshuffling from generation to generation allow Y chromosome and mtDNA polymorphisms to trace back paternal and maternal lineages, thereby elucidating male and female genetic history.

Figure 1.2.1 Transmission of uniparental (i.e. mitochondrial DNA, left) and biparental autosomal systems (right). An individual's mtDNA is the same as that of the maternal grandmother, just as the Y chromosome is the same as that of the paternal grandfather. Therefore mtDNA provides information restricted to the maternal lineage only (and the Y chromosome to the paternal one), in contrast to autosomal markers, which carry information on all ancestry.
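
The contrast between uniparental and biparental transmission described in the caption can be sketched on a toy pedigree. This is only an illustration under stated assumptions: the pedigree, the individual names and the helper functions are hypothetical, and the paternal line is meaningful for the Y chromosome only when the starting individual is male.

```python
# Hypothetical three-generation pedigree: person -> (mother, father).
# Founders (the four grandparents) have no entry.
PEDIGREE = {
    "child": ("mother", "father"),
    "mother": ("maternal_grandmother", "maternal_grandfather"),
    "father": ("paternal_grandmother", "paternal_grandfather"),
}


def uniparental_line(person, pedigree, parent_index):
    """Follow only mothers (index 0) or only fathers (index 1) back in time."""
    line = [person]
    while person in pedigree:
        person = pedigree[person][parent_index]
        line.append(person)
    return line


def maternal_line(person, pedigree):
    """Lineage traced by mtDNA."""
    return uniparental_line(person, pedigree, 0)


def paternal_line(person, pedigree):
    """Lineage traced by the Y chromosome (for a male individual)."""
    return uniparental_line(person, pedigree, 1)


def all_ancestors(person, pedigree):
    """Every ancestor, i.e. the ancestry that autosomal markers can draw on."""
    found = set()
    stack = [person]
    while stack:
        current = stack.pop()
        for parent in pedigree.get(current, ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


print("mtDNA:", maternal_line("child", PEDIGREE))
print("Y chromosome:", paternal_line("child", PEDIGREE))
print("autosomes:", sorted(all_ancestors("child", PEDIGREE)))
```

Of the six ancestors in this toy pedigree, the maternal grandfather and the paternal grandmother appear in neither uniparental line, which is the information that only biparental markers can recover.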

The process of molecular differentiation occurred independently along these two lineages during and after the process of dispersal into different regions and continents. Therefore Y chromosome and mtDNA polymorphisms tend to be restricted to particular geographic locations and population groups, and the parallel study of the two genetic systems provides sufficient and reliable information on the genetic ancestry of individuals and/or populations. The average genomic difference between a pair of humans sampled across the world is about one difference per 1000 bp or even less. Even though the inter-population differences are minor compared with within-population differences, it is possible, using only a small number of genetic traits, to distinguish with a certain likelihood individuals of different continental affiliation. Whether the continuum of genetic differences between human populations is smooth or bumpy, geography rather than ethnicity seems to be the driving factor in such patterning. Some of these differences may have arisen as a consequence of random genetic drift of neutrally evolving traits, while others can be ascribed to selective factors. The possibility of discerning genetic differences between populations can help answer the challenging question of how and when they arose. In this regard, two models have been proposed: (i) the multiregional model, which explains the observed differences as the result of a long-term segregation of continental gene pools; (ii) the replacement theory, according to which a small founder group coming from Africa spread into Asia and to the rest of the world, occupying the habitats of the preceding human types and replacing them.
Even though whole-genome approaches are now opening up new ways to answer these questions related to the origin and diversification of our species, mtDNA and the Y chromosome, with their unique pattern of inheritance, continue to be important sources of information. With the technological advances in both genomics and bioinformatics, the simultaneous analysis of large sets of genetic markers is becoming more affordable, and the traditional uniparental systems are often coupled with the more comprehensive genome-wide scans. Indeed, the uniparentally transmitted genetic systems represent less than 2% of the DNA of a cell, and to obtain a comprehensive view of the population patterns of human diversity, autosomal markers need to be evaluated as well (Barbujani and Colonna, 2010) (figure 1.2.2). In this setting, the extensive polymorphism and linkage disequilibrium of Human Leukocyte Antigen (HLA) class I and class II genes, namely the existence of blocks of genes encoding proteins with similar functions that form non-random allele combinations (haplotypes) inherited together, can be profitably applied to population studies (Trowsdale, 2011). Major Histocompatibility Complex (MHC) selection, through mechanisms such as disease resistance (and probably also reproductive fitness, considering the relationship with Killer Immunoglobulin-like Receptors, KIRs), is likely to depend upon MHC variation, not only at the level of the individual, but notably at the population level. Both HLA and KIR systems display considerable population diversity, and their patterns of genetic variation worldwide provide significant information about human geographic expansion, demographic history and cultural diversification. It is increasingly acknowledged that, in addition to mitochondrial DNA, the Y chromosome, microsatellites, SNPs and other genetic markers, immunogenetic polymorphisms represent important and complementary tools for anthropological studies (Fernandez-Vina et al., 2012; Sanchez-Mazas et al., 2011).
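
The non-random association of alleles referred to above as linkage disequilibrium is commonly quantified by the coefficient D (observed haplotype frequency minus the frequency expected under independence) and by its normalized form r². The following is a minimal sketch of these two standard quantities; the function name and the frequencies are invented for illustration and do not come from any population discussed in this thesis.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r2) for two alleles A and B at different loci.

    p_ab -- observed frequency of the haplotype carrying both A and B
    p_a  -- frequency of allele A at the first locus
    p_b  -- frequency of allele B at the second locus
    """
    # D is the excess of the observed haplotype over the product of the
    # allele frequencies, which is the expectation under independence.
    d = p_ab - p_a * p_b
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denominator if denominator > 0 else 0.0
    return d, r2


# Hypothetical frequencies: two alleles found together far more often
# than expected by chance.
d, r2 = linkage_disequilibrium(p_ab=0.10, p_a=0.15, p_b=0.12)
print(f"D = {d:.3f}, r2 = {r2:.3f}")

# Under independence the haplotype frequency equals the product of the
# allele frequencies, so D is (numerically) zero.
d0, r2_0 = linkage_disequilibrium(p_ab=0.15 * 0.12, p_a=0.15, p_b=0.12)
print(f"D = {d0:.3f}, r2 = {r2_0:.3f}")
```

A positive D, as in the first call, means that the two alleles travel together on the same haplotype more often than expected by chance.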

Figure 1.2.2 Comparison of transmission for uniparental and biparental genetic systems. Concerning uniparental markers (mtDNA and Y chromosome), in individuals 7, 8, 9 and 10 we can retrieve only purple O and blue Y, while grey Y, rose O and violet O of the grandparents are lost. Due to LD and the presence of haplotypes (gene blocks inherited together with a very low recombination rate), HLA may be useful to recover information back to the ancestors along both the maternal and paternal lineages, as in individuals 7, 8, 9 and 10 we can retrieve information on the maternal grandfather's and the paternal grandmother's haplotypes. O = mtDNA; Y = Y chromosome; II = autosomes (i.e. HLA region on chromosome 6); LD = linkage disequilibrium.

1.2.1 Mitochondrial DNA

Eukaryotic cells are organized in compartments. Each cell contains a nucleus and a surrounding cytoplasm in which membrane-enclosed organelles are suspended. Among these are the mitochondria, made up of an inner membrane enclosing the matrix and an outer membrane delimiting the intermembrane space. Mitochondria form a budding and fusing network similar to the endoplasmic reticulum (Iborra et al., 2004). Although most of the DNA of a cell is contained in the nucleus, the mitochondrion also has its own independent genome, as well as the machinery for replication, transcription, and protein synthesis. Every cell contains as many as several thousand mitochondria (figure 1.2.3), each with a variable number of mitochondrial DNA (mtDNA) molecules. As a result of this high number of mtDNA copies, the mitochondrial genome, despite its small size, can constitute up to 0.3% of the total DNA of a cell.
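
The order of magnitude of this fraction can be checked with simple arithmetic. The sketch below uses the human mitogenome length of 16,569 bp reported later in this chapter; the diploid nuclear genome size (about 6.4 billion bp) and the copy numbers per cell are assumptions introduced only for illustration, since actual copy numbers vary widely between cell types.

```python
MTDNA_LENGTH_BP = 16_569        # human mitogenome length (Anderson et al., 1981)
NUCLEAR_DIPLOID_BP = 6.4e9      # assumption: approximate diploid nuclear genome size


def mtdna_fraction(copies_per_cell, nuclear_bp=NUCLEAR_DIPLOID_BP):
    """Fraction of total cellular DNA (in base pairs) contributed by mtDNA."""
    mt_bp = copies_per_cell * MTDNA_LENGTH_BP
    return mt_bp / (mt_bp + nuclear_bp)


# Hypothetical copy numbers per cell.
for copies in (100, 1_000, 1_200):
    print(f"{copies:>5} copies: {100 * mtdna_fraction(copies):.3f}% of total DNA")
```

Under these assumptions, about a thousand mtDNA copies per cell already account for roughly a quarter of a percent of the total DNA, which is consistent with the upper bound quoted above.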

Figure 1.2.3 Electron microscope cell section view of numerous mitochondria.

The discovery of mtDNA raised several questions about how eukaryotic cells tolerate more than one genome and why this genome shed many (but not all) of its genes to the point of being no longer self-sufficient for replication and expression. Endosymbiosis (or symbiogenesis) is the most commonly accepted theory explaining the origin of mitochondria. According to this theory, mitochondria descend from free-living bacteria that became symbiotic with eukaryotic cells about 1.5 billion years ago. Originally it was proposed that the nucleus originated in an Archaebacterium and that symbiosis began with a eubacterial progenitor of the modern mitochondrion (Margulis, 1971). Over the years the conventional 'endosymbiosis theory' has been modified, and the revised theory has been labeled the 'hydrogen hypothesis' (Martin and Muller, 1998). It postulates that the eukaryotic nucleus and the mitochondria were created simultaneously through the fusion of a hydrogen-requiring methanogenic Archaebacterium (host) and a hydrogen-producing alpha-proteobacterium (symbiont). Indeed, the eukaryotic nucleus is a chimera of genes, some of clearly archaebacterial origin and others clearly derived from Eubacteria. Regardless of which view is correct, one thing is common to both: the majority of mitochondrial genes that existed in the symbiont genome of the proto-eubacterium have been transferred to the nuclear genome.

1.2.1.1 Mitochondrial genome

The sequencing of an entire mitochondrial genome (mitogenome) was announced for the first time in 1981, when Anderson and colleagues published the sequence and the organization of the human mitogenome (Anderson et al., 1981). This sequence, 16,569 base pairs (bp) long, is the so-called Cambridge Reference Sequence (CRS), later resequenced and named the revised Cambridge Reference Sequence (rCRS) (Andrews et al., 1999). Human mtDNA was only the first of many mitochondrial genomes to be completely sequenced. One year after the publication of the CRS, Anderson's laboratory also produced the Bovine Reference Sequence (BRS) (Anderson et al., 1982), 16,338 bp long. Nowadays, although the mitogenomes of numerous species are available, many others still need to be sequenced. Mitochondrial DNA is organized as a circular, double-stranded molecule. The two strands are denoted H (heavy) and L (light) because their different base compositions confer on them different buoyant densities in a cesium chloride gradient. The H-strand is guanine-rich, while the L-strand is cytosine-rich. Traditionally, the molecule is numbered on the light strand, relative to the first published human mtDNA sequence. Unlike the nuclear genome, the mitogenome has a very compact structure without introns. In humans, it contains 37 genes (28 on the H-strand and 9 on the L-strand), all of which are essential for normal mitochondrial function. Thirteen of these genes encode enzymes involved in oxidative phosphorylation (OX-PHOS), while the remaining ones encode two ribosomal RNAs (12S and 16S rRNAs) and 22 transfer RNAs (tRNAs), required and sufficient for the synthesis of mitochondrial proteins (Anderson et al., 1981; Wallace, 1994; DiMauro and Schon, 2003). Most genes are contiguous, separated by one or two non-coding base pairs, and among those encoding enzymes, MTATP6 and MTATP8, as well as MTND4 and MTND4L, overlap (figure 1.2.4 and table 1.2.1).
This pattern is also found among most metazoans, although in some cases one or more of the 37 genes is absent and the mtDNA length varies between species.

Figure 1.2.4 Map of the human mitochondrial genome. Loci are coloured according to functional groupings. Gene identifiers on the outside of the map are transcribed on the heavy strand and gene identifiers on the inside of the map are transcribed on the light strand. Transfer RNA loci are designated by the single letter code of their specific amino acid. The non-coding D-loop is shown at the top of the map (in black) (Stewart and Chinnery, 2015).

Table 1.2.1 List of animal mtDNA genes and gene products.

Gene designation       Encoded product
COI, COII, COIII       Cytochrome oxidase subunits I, II, and III
Cytb                   Cytochrome b apoenzyme
ND1-6, 4L              NADH dehydrogenase subunits 1 to 6 and 4L
ATP6, ATP8             ATP synthase subunits 6 and 8
lrRNA                  Large ribosomal subunit RNA
srRNA                  Small ribosomal subunit RNA
tRNAs                  18 amino acid-specific transfer RNAs
L(CUN) and L(UUR)      2 leucine tRNAs
S(AGN) and S(UCN)      2 serine tRNAs

Although most of the mtDNA encodes products, there are several non-coding regions interspersed in the molecule, the largest of which is the mtDNA displacement loop (D-loop), or control region, which is involved in the regulation of transcription and replication of the molecule. In human mtDNA, the D-loop extends from np 16024 to np 576 and includes three short regions named HVSI (nps 16024-16400), HVSII (nps 44-340) and HVSIII (nps 438-576) (Brandstätter et al., 2004). The acronym HVS derives from 'hypervariable sequences', as they are highly variable at the population level compared with the rest of the genome. These regions contain the origin of heavy-strand mtDNA replication (OH), the light-strand transcription promoter (LSP) and the heavy-strand promoters (HSP1 and HSP2) (Falkenberg et al., 2007). They appear to be involved in genome replication and transcription, although their function is not completely understood. The structure and the composition of the D-loop are not identical in all animals. For example, the control-region sequence of bovine mtDNA is only slightly homologous to the corresponding region in the human mitochondrial genome (Anderson et al., 1982) and contains only one hypervariable sequence (nps 16042-16313). The replication of mtDNA is independent of the cell cycle. Indeed, unlike nuclear DNA, which replicates only once during each cell cycle, mtDNA is continuously replicated by polymerase γ, even in non-dividing tissues such as skeletal muscle and brain (Bogenhagen and Clayton, 1977; Birky, 2001). The precise mechanism of mtDNA replication is currently a topic of great debate. The traditional 'strand-asymmetric model', proposed in 1982 by Clayton, suggests that mammalian mtDNA molecules replicate unidirectionally from two spatially and temporally distinct strand-specific origins. According to this model, the heavy strand leads the replication cycle, beginning at OH with the synthesis of a primary transcript that continues until the origin of light-strand replication (OL) is exposed.
Only after the replication fork has passed OL is the light (or lagging) strand synthesized in the opposite direction (Clayton, 1982). However, in 2003 experimental evidence supported an alternative 'strand-symmetric', or 'rolling circle', model (Bowmaker et al., 2003). It postulates that the replication of mtDNA begins at several points in a 5.5-kb critical region between the D-loop and the ND4 gene. The replication bubbles then proceed in both directions, stopping at OH and stalling briefly in the region of OL before completing the replication cycle, with the lagging strand catching up by the ligation of Okazaki fragments (figure 1.2.5) (Lightowlers and Chrzanowska-Lightowlers, 2012).
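
Because the molecule is circular and numbered from a conventional origin, the D-loop coordinates given above (np 16024 to np 576) wrap around the end of the reference sequence. A minimal sketch of how the length of such an interval can be computed is given below; the helper name is hypothetical, coordinates are taken as 1-based and inclusive, and the genome length is the 16,569 bp of the human reference sequence.

```python
MTDNA_LENGTH = 16_569  # length of the human reference mitogenome in bp


def circular_span(start, end, genome_length=MTDNA_LENGTH):
    """Inclusive length of the interval start..end on a circular molecule.

    Positions are 1-based. When start is greater than end, the interval
    is assumed to run through the last position and continue from 1.
    """
    if start <= end:
        return end - start + 1
    return (genome_length - start + 1) + end


regions = {
    "D-loop": (16024, 576),   # wraps around the origin of numbering
    "HVSI": (16024, 16400),
    "HVSII": (44, 340),
    "HVSIII": (438, 576),
}

for name, (start, end) in regions.items():
    print(f"{name}: nps {start}-{end}, {circular_span(start, end)} bp")
```

The D-loop interval thus comprises the last 546 positions of the reference numbering plus the first 576.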

Figure 1.2.5 Models of mammalian mtDNA replication. The 'strand asynchronous' or 'strand-displacement' model (A) compared to the 'strand synchronous' and unidirectional replication model (Lightowlers and Chrzanowska-Lightowlers, 2012).

Transcription of mtDNA is 'prokaryotic-like'. In human mtDNA, the transcription of the two strands is initiated from three promoters named heavy strand 1 and 2 (HSP1 and HSP2) and light strand (LSP), which, together with their upstream enhancers, are recognized by the transcription machinery. This consists of a mitochondrial RNA polymerase (POLRMT), mitochondrial transcription factor A (TFAM), and one of two homologous mitochondrial transcription factors, B1 (TFB1M) or B2 (TFB2M) (Falkenberg et al., 2002; Chinnery and Hudson, 2013). Transcription initiated from HSP1 generates a short transcript that terminates at the 16S rRNA, while transcription initiated from HSP2 generates a polycistronic message including both rRNA genes, 12 mRNA genes, and 14 tRNA genes. Light-strand transcription from the LSP generates the ND6 mRNA and 8 tRNAs (Taylor and Turnbull, 2005). The end of the process depends on the mitochondrial transcription termination factors, a group of DNA-binding proteins whose functions and mechanism of action remain to be defined, and yields full-length transcripts that are cut into functional tRNA, rRNA, and mRNA molecules.

The mitochondrial translation machinery works in strict cooperation with the cytoplasmic one, which makes the nuclear-encoded proteins destined for the mitochondrion (Chinnery and Hudson, 2013). Indeed, during transcription, nascent mtRNA is translated by mitochondrial ribosomes (mitoribosomes) that bind both the polymerase (Kornberg, 1992) and the inner mitochondrial membrane (Liu and Spremulli, 2000). The mitoribosomes involved in this process are partly encoded by mtDNA but require a further 81 nuclear DNA (nDNA)-encoded proteins (Chinnery and Hudson, 2013) to be assembled. The initiation factors IF1 and IF3 promote the dissociation of the ribosomal subunits (Koc and Spremulli, 2002), thus allowing the assembly of the initiation complex and the beginning of the translation process (Christian and Spremulli, 2009). Elongation is controlled by nuclear-encoded proteins and continues until a STOP codon is recognized. Once translation ends, peptides are transferred through the mitochondrial double membrane by the translocation machineries TIM (trans inner membrane) and TOM (trans outer membrane). The close proximity of the two sets of translation machinery (cytoplasmic and mitochondrial) on each side of the membranes of the organelle ensures efficient assembly of mitochondrial complexes containing proteins encoded by the nuclear and mitochondrial genomes (Iborra et al., 2004).

1.2.1.2 Genetic code

The genetic code is defined as the basis of heredity and consists of a set of rules by which the genetic information encoded in DNA and RNA sequences is translated into proteins. When the code was deciphered, it was immediately labeled as 'universal', but, less than 15 years later, it was found that in mitochondria some codons differed from the universal code (Barrell et al., 1979). To explain these changes, it was proposed that mitochondria might tolerate changes in the code that would not be acceptable to a larger and more complex genome (such as the nuclear one) (Jukes, 1981). However, the discovery that the code has changed in undamaged organisms made this hypothesis unlikely. It is now recognized that the genetic code evolved in two distinct phases: a first in which the 'canonical' code emerged and a second in which it diverged in numerous nuclear and organelle lineages (Knight et al., 2001). In mitochondria this phenomenon occurred independently in plants and in the other organisms. Plant mitochondria use the universal code, whereas other organisms show many different code changes in their mtDNAs with respect to the universal one, with only one constant: the codon UGA coding for tryptophan instead of a termination signal (Anderson et al., 1981). Further studies have shown that the mitochondrial genetic code is not even universal among non-plant mitochondria (table 1.2.2). This variability in codon usage reflects the variability in the number and composition of anticodons among organisms and organelles.

Table 1.2.2 Mitochondrial genetic code variation for mammals, fruit flies and yeasts.

RNA codon             Nuclear genetic code   mtDNA genetic code
                                             Mammals   Drosophila   Yeasts
UGA                   STOP                   Trp       Trp          Trp
AGA, AGG              Arg                    STOP      Ser          Arg
AUA                   Ile                    Met       Met          Met
AUU                   Ile                    Met       Met          Met
CUU, CUC, CUA, CUG    Leu                    Leu       Leu          Thr
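
The practical consequence of these reassignments can be shown by translating the same RNA sequence with two codon tables. The sketch below builds the standard (nuclear) code and then overrides only the UGA, AGA/AGG and AUA entries of the mammalian column of table 1.2.2; the function name and the test sequence are hypothetical and chosen only to contain the reassigned codons.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, listed with the third codon position varying
# fastest and bases in the order U, C, A, G ('*' marks a STOP codon).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Mammalian mitochondrial reassignments taken from table 1.2.2
# (only the UGA, AGA/AGG and AUA rows are encoded here).
MAMMALIAN_MT_CODE = {**STANDARD_CODE, "UGA": "W", "AGA": "*", "AGG": "*", "AUA": "M"}


def translate(rna, code):
    """Translate an RNA string codon by codon, stopping at the first STOP."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = code[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


rna = "AUGAUAUGAAGA"  # hypothetical sequence: AUG AUA UGA AGA
print("nuclear code:      ", translate(rna, STANDARD_CODE))
print("mitochondrial code:", translate(rna, MAMMALIAN_MT_CODE))
```

Read with the nuclear code, the sequence terminates at UGA after two residues; read with the mammalian mitochondrial code, UGA is translated as tryptophan and termination occurs only at AGA.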

1.2.1.3 Mitochondrial DNA features

1.2.1.3.1 Maternal inheritance and lack of recombination

One of the main features of mammalian mtDNAs is that they are inherited mostly through the maternal line. Transmission along the female lineage, from mother to offspring without any paternal contribution or recombination between the two parental lineages, was already observed in 1980 (Giles et al., 1980). The precise molecular mechanism behind strict maternal transmission in humans remains elusive. It appears that several mechanisms have coevolved to avoid a paternal mtDNA contribution to the embryo (Sato and Sato, 2013). It was originally thought that paternal mitochondria did not enter the oocyte, because of both their low copy number and their location in the midpiece of the sperm tail. However, the detection of paternal mtDNA molecules in early human preimplantation embryos (St John et al., 2000) led to the belief that paternal mitochondria are subjected to prezygotic and zygotic control mechanisms. It was suggested that during spermatogenesis mtDNA replication is down-regulated, which limits the paternal mtDNA copy number (Rantanen et al., 2001), and that the remaining molecules are further eliminated by an active mechanism involving selective ubiquitination (Sutovsky and Schatten, 2000; Sutovsky, 2003). Alternatively, a passive ‘dilution model’ has been proposed, based on the disproportionate amount of paternal vs maternal mtDNAs (ratio of 1:15,860) (Luo et al., 2013; Pyle et al., 2015). Recently, a study on C. elegans envisioned a third mechanism, in which an endonuclease mediates damage to the inner membrane of paternal mitochondria, thus inducing their selective elimination (Zhou et al., 2016). Taken together, these data confirm that under natural conditions maternal inheritance is tightly controlled through the elimination of paternal mtDNA. However, these mechanisms may occasionally fail, potentially leading to maternal/paternal mtDNA mosaicism in an individual. An example is the case of a 28-year-old man affected by a metabolic disorder due to a pathogenic mtDNA deletion that turned out to be of paternal origin (Schwartz and Vissing, 2002). Until now this is the only documented case of paternal transmission in humans. Moreover, high-depth mtDNA sequencing, up to about 1.2 million-fold coverage, revealed no evidence for a paternal contribution to mtDNA inheritance (Pyle et al., 2015). In conclusion, the current opinion is that paternal transmission of mtDNA is exceptionally rare and that, even if it does occur, it is extremely unlikely to result in recombination between paternal and maternal mtDNAs. Therefore, the traditional dogma of maternal inheritance is widely accepted, especially in population genetics studies.

1.2.1.3.2 Homoplasmy and heteroplasmy

Cells contain thousands of mtDNA molecules and in most cases their sequences are identical. This condition, in which all the mtDNAs in a cell (or in a tissue) have the same genome, is known as ‘homoplasmy’. However, sometimes wild-type and mutated molecules coexist, a situation termed ‘heteroplasmy’. The percentage of heteroplasmy can vary widely among individuals and populations (Irwin et al., 2009), but also within the same individual from organ to organ or between cells (Calloway et al., 2000). This mixture of wild-type and mutated mtDNAs is often correlated with clinical expression (Avital et al., 2012; Gasparre et al., 2013; Sobenin et al., 2013). Studies conducted on cybrids (cell lines incorporating mitochondria from another source) containing different amounts of mutated mtDNAs have shown that the proportion of mutant mtDNA must exceed a critical threshold level, which is mutation- and tissue-specific, before a cell expresses a biochemical defect in the respiratory chain (Schon et al., 1997; Wallace et al., 1998). However, heteroplasmies are also present in normal individuals and appear to be more frequent in the control region than in the coding region (Jazin et al., 1996; Santos et al., 2008; Li et al., 2010). It is now possible to detect even a low percentage of heteroplasmy by next generation sequencing (NGS), which allows a particular region to be resequenced thousands of times, thus revealing rare variants as well (Wallace and Chalkia, 2013). Studies using this approach revealed that 25-65% of the general population has at least one heteroplasmy across the entire mitochondrial genome (Li et al., 2010; Sosa et al., 2012). Two mechanisms have been described as responsible for changes in mtDNA in human cells in vivo that occasionally cause the appearance of heteroplasmies. These mechanisms are known as ‘relaxed replication’ and ‘vegetative segregation’.
Replication in mitochondria is considered ‘relaxed’ because it occurs independently of the cell cycle: mtDNA is destroyed and replicated continuously, even in non-dividing tissues (Bogenhagen and Clayton, 1977). Moreover, since individual molecules appear to be randomly selected for destruction and replication, in heteroplasmic cells this process can lead to changes in the proportion of mutant and wild-type mtDNA molecules over time through random intracellular genetic drift (Birky, 1994; Chinnery and Samuels, 1999). Vegetative segregation is the unequal partitioning of mutant and wild-type mtDNA that occurs during cell division and can also lead to changes in the level of heteroplasmy in a proliferative tissue (Birky, 1994), such as blood leucocytes or cells in culture (Lehtinen et al., 2000).
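The two mechanisms can be made concrete with a toy simulation. The Python sketch below is only an illustration under simplifying assumptions that are not taken from the cited studies: a fixed number of mtDNA copies per cell, a Moran-style turnover step for relaxed replication (one random molecule destroyed, one random molecule copied) and random sampling of half of the duplicated molecules at division for vegetative segregation. All function names and parameter values are hypothetical.

```python
import random


def relaxed_replication(mutant, total, events, rng):
    """Random turnover independent of the cell cycle.

    Each event picks, independently and at random, one molecule to destroy
    and one molecule to copy, so the copy number stays constant.
    """
    for _ in range(events):
        destroyed_is_mutant = rng.random() < mutant / total
        copied_is_mutant = rng.random() < mutant / total
        mutant += int(copied_is_mutant) - int(destroyed_is_mutant)
    return mutant


def vegetative_segregation(mutant, total, rng):
    """Duplicate every molecule, then give one daughter cell `total` of them at random."""
    pool = [True] * (2 * mutant) + [False] * (2 * (total - mutant))
    return sum(rng.sample(pool, total))


def simulate_cell(mutant, total, divisions, events_per_cycle, rng):
    """Follow one cell lineage and return its final heteroplasmy level (0 to 1)."""
    for _ in range(divisions):
        mutant = relaxed_replication(mutant, total, events_per_cycle, rng)
        mutant = vegetative_segregation(mutant, total, rng)
    return mutant / total


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so that the run is reproducible
    # 200 lineages, each starting at 50% heteroplasmy with 100 mtDNA copies.
    levels = [simulate_cell(50, 100, divisions=20, events_per_cycle=100, rng=rng)
              for _ in range(200)]
    print(f"min={min(levels):.2f} max={max(levels):.2f}")
```

Starting from identical cells, the lineages end up at different heteroplasmy levels even though no selection is applied, which is the sense in which both processes act as random intracellular drift. A homoplasmic cell (0% or 100% mutant) stays homoplasmic in this model.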

1.2.1.3.3 Mitochondrial genetic bottleneck

Taking into account the previously described features of mtDNAs, one would expect not to find any differences between mother and offspring in mtDNA composition. However, observations indicate that the amount of a variant inherited from a heteroplasmic mother varies in the offspring (Cree et al., 2008; Carling et al., 2011). A theory explaining this phenomenon is the ‘mitochondrial bottleneck’. The reduction in the effective number of mitochondrial genomes occurs during early embryogenesis in the developing female germ line (Jenuth et al., 1996; Marchington et al., 1998) and facilitates the rapid removal of deleterious mtDNA mutations from the population. The exact mechanism by which the bottleneck occurs is hotly debated, but there are currently three theories (figure 1.2.6). One possible explanation identifies the bottleneck with the small number of mtDNA copies (~2,000) present during early germ line development (Cao et al., 2007), compared with the copy number at other stages of the female germ line (~100,000 copies in the mature oocyte). However, 2,000 is too great a number to sustain the observed genetic drift (figure 1.2.6a). Two other hypotheses have been proposed to explain this rapid genetic drift in the germ line: one maintains that the aggregation of mitochondrial DNA molecules into nucleoids causes a tighter bottleneck that accelerates the drift (figure 1.2.6b) (Cao et al., 2007), while the other asserts that a combination of several factors is involved. It has been demonstrated (using specific markers) that mtDNA content during the early development of cell lines varies with time and hits a sharp minimum just before mtDNA replication in the embryo is initiated. This narrowing of the bottleneck leads to an increase in genetic drift (figure 1.2.6c) (Cree et al., 2008). These theories suggest that multiple factors may be involved in the mitochondrial bottleneck (with or without nucleoids) and that further investigation is needed to completely clarify this complex system (Khrapko, 2008).
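Why a bottleneck of about 2,000 copies produces little drift can be seen from simple binomial sampling: if a mother with heteroplasmy level p transmits n randomly chosen segregating units, the variance of the offspring level after one sampling event is p(1 - p)/n. The Python sketch below compares this expectation with a small simulation. It is a simplified illustration, not the model used in the cited studies; the function names are hypothetical and the value of 200 units is a purely illustrative example of a tighter effective bottleneck.

```python
import random
import statistics


def sampling_variance(p, n_units):
    """Binomial variance of the heteroplasmy level after sampling n_units units once."""
    return p * (1 - p) / n_units


def simulate_offspring(p, n_units, offspring, rng):
    """Heteroplasmy levels of `offspring` descendants of a mother with level p."""
    levels = []
    for _ in range(offspring):
        mutant_units = sum(rng.random() < p for _ in range(n_units))
        levels.append(mutant_units / n_units)
    return levels


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so that the run is reproducible
    maternal_level = 0.5
    # 2,000 copies as in the simple model; 200 is a hypothetical tighter bottleneck.
    for n_units in (2000, 200):
        expected = sampling_variance(maternal_level, n_units)
        observed = statistics.pvariance(
            simulate_offspring(maternal_level, n_units, 300, rng))
        print(f"n={n_units}: expected={expected:.6f} observed={observed:.6f}")
```

Under this approximation, reducing the number of segregating units tenfold increases the variance tenfold, which is the intuition behind both the nucleoid and the variable-bottleneck hypotheses.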


Figure 1.2.6 The mitochondrial genetic bottleneck hypothesis. (a) A simple model of the mitochondrial bottleneck contains too many mtDNA molecules and yields lower estimates of genetic drift (solid line) than the experimentally observed genetic drift (dotted line). (b) Assuming aggregation of mtDNA molecules into nucleoids results in a much tighter effective bottleneck and accelerates the estimated drift to fit observations. (c) Alternatively, the observed drift can be accounted for by a variable bottleneck with a temporary reduction of copy number (Khrapko, 2008).

1.2.1.3.4 Mutation rate

Mutation is the ultimate source of genetic variation. It is both the substrate for evolution and the cause of genetic disease (Nachman and Crowell, 2000). In genetics, the mutation rate measures how often mutations arise in an organism or gene per generation. Mutation rates differ between species and even between different regions of the genome of a single species. These rates of nucleotide substitution are measured in ‘substitutions per base pair per generation’. Understanding the key process of human mutation is important for many aspects of medical genetics and human evolution. Human mtDNA is characterized by a much greater evolutionary rate than that of the average nuclear gene (Brown et al., 1979), and this peculiarity makes mtDNA very informative in evolutionary studies. This high mutation rate has several causes, among them a higher frequency of replication than nuclear DNA and the lack of an efficient DNA repair mechanism and of protective proteins such as histones (Clayton et al., 1974; Tao et al., 2014). In addition, mitochondrial DNA is physically associated with the inner mitochondrial membrane, where highly mutagenic oxygen radicals are generated (Shigenaga et al., 1994; Tao et al., 2014). The mutation rate is not uniform across the mtDNA molecule: the overall rate in the non-coding control region (bases 16024-576) is about 10 times higher than that of the coding region (bases 577-16023) (Pakendorf and Stoneking, 2005; Howell et al., 2007; van Oven and Kayser, 2009).

1.2.2 The human Y chromosome

With an approximate size of 60 Mb, the Y chromosome is the third smallest human chromosome as measured by flow cytometry (Harris et al., 1986). It is divided into two distinct parts: the pseudo-autosomal and the Y-specific regions. The pseudo-autosomal region (PAR) consists of three separate portions: PAR1 (2.6 Mb) and PAR2 (0.3 Mb), located at the telomeric ends, and PAR3, located within the X-transposed region (XTR) of the short arm of the chromosome. PAR3 was hypothesized on the basis of a 2013 study (Veerappa et al., 2013) of ten families with several dyslexic members, in which transposed blocks of more than 100 bp from region Yp11.2 were identified in the XTR region located in Xq21.3. As these sequences, probably due to a unique event of unequal recombination, were found within the XTR region, which is not involved in regular X-Y recombination, the researchers hypothesized the presence of a third PAR region, which they called PAR3. PARs participate in homologous recombination with the X chromosome and are therefore required for the proper segregation of the two sex chromosomes during meiosis. Genes located within the PARs are inherited in the same manner as autosomal genes. The Y-specific region represents 95% of the chromosome. It does not pair with the X chromosome and for this reason it was first named Non Recombining Region of chromosome Y (NRY). Later, when it was discovered that it undergoes a form of non-reciprocal homologous recombination, called gene conversion, it was renamed Male Specific region of chromosome Y (MSY). The euchromatic sequences of the Y-specific region span about 23 Mb, 8 Mb on the short arm (Yp) and 14.5 Mb on the long arm (Yq); overall, the region includes heterochromatic and euchromatic sequences (figure 1.2.7; Skaletsky et al., 2003): 1) The heterochromatic sequences are located in the centromeric region (1 Mb), in the euchromatic region (an island of 400 kb) and in the distal Yq (40 Mb) (figure 1.2.7), accounting for two-thirds of the Y chromosome.
The heterochromatin does not contain transcribed sequences and is primarily composed of two large repeated sequences, DYZ1 and DYZ2, representing 50% of all the Y-chromosome DNA (Kunkel et al., 1977). It has been estimated that there are about 500 copies of DYZ1 repeated in tandem. The number of repeated copies varies from individual to individual and determines a length polymorphism of the heterochromatic region of the Y chromosome (Cooke, 1976). Sequence analysis showed a strong heterogeneity among these sequences. DYZ2 is also tandemly repeated and is located mainly in the distal part of the Y chromosome (Cooke et al., 1982). Most of the repeated sequences present in the heterochromatic region are not shared with the X chromosome. 2) The euchromatic region includes 156 euchromatic transcriptional units, half of which (78) code for proteins. Sixty belong to 9 different Y-specific gene families (each characterized by more than 98% homology among its members) and the remaining 18 are single-copy sequences, for a total of 27 distinct proteins or protein families encoded by the MSY region. Among the remaining 78 apparently non-coding transcriptional units, 13 are single-copy sequences and 65 are grouped into 15 MSY-specific families. According to their origin, the euchromatic sequences can be divided into X-transposed, X-degenerate and ampliconic. The X-transposed sequences are 99% identical to DNA sequences in Xq21. Their name reflects their origin in a massive X-to-Y transposition that occurred about 3-4 million years ago (mya), after the separation of the human and chimpanzee lineages. This class exhibits the lowest gene density as well as the highest density of interspersed repeated elements. The X-degenerate sequences contain single-copy genes or pseudogenes with 60% to 96% nucleotide sequence identity to their X homologues, and are probably the surviving relics of the ancient autosomes from which the X and Y chromosomes co-evolved. All twelve constitutive genes of the MSY map in this region whereas, of the eleven genes predominantly expressed in testes, only one, SRY (Sex-determining Region Y), maps here. The ampliconic regions are composed of sequences that exhibit marked similarity to other sequences in the MSY. These amplicons are scattered across the euchromatic long arm and the proximal short arm and together occupy 10.2 Mb. The ampliconic sequences show the highest density of genes among the three classes described above (135 of the 156 transcriptional units described); they contain the 9 Y-specific gene families, which account for the 60 protein-coding units, as well as 75 of the 78 transcriptional units that apparently do not code for proteins.
Unlike the genes of the X-degenerate region, those in the ampliconic region show primarily or exclusively testis-specific expression.

Figure 1.2.7 a) Schematic representation of the Y chromosome. b) Enlarged view of a 24 Mb portion of the MSY, extending from the proximal boundary of the Yp pseudoautosomal region to the proximal boundary of the large heterochromatic region of Yq.


1.2.2.1 Genes

In comparison with the other human chromosomes, the Y chromosome has a limited number of genes: currently only 562 genes have been identified (http://www.ensembl.org/Homo_sapiens/Location/Chromosome?chr=Y). Genes on the MSY can be divided into two classes: the first includes single-copy genes with X homologues that are ubiquitously expressed and control housekeeping cell functions; the second includes multiple-copy genes (SRY being an exception) that are expressed only in testes and have more specialized functions (figure 1.2.8; Lahn and Page, 1997).

Figure 1.2.8 Gene map of NRY region. pter, short-arm telomere; qter, long-arm telomere. Listed immediately above the chromosome are nine NRY genes with functional X homologs (red). Immediately below the chromosome are 11 testis-specific genes or families (blue), some with multiple locations (Lahn and Page, 1997).

1.2.2.2 Origin

The X-Y homology observed in the PAR regions, together with that observed for many genes, supports the hypothesis that X and Y co-evolved from a pair of autosomes (Ohno et al., 1967). According to the “region by region” hypothesis (figure 1.2.9) proposed by Lahn and Page (1999), the differentiation of the two sex chromosomes would have started in region 1 between 240 and 320 mya, shortly after the separation of the mammalian and avian lineages. This would explain why the avian sex chromosomes are completely different from those of mammals. Regions 2 and 3 would have started their differentiation 130-170 and 80-130 mya, respectively, whereas the differentiation of the fourth region would have started only 50 mya, after the separation of the simian and prosimian lineages. The analysis of the genes in region 1 has uncovered another interesting piece of information: the SRY gene and its homologue SOX3 are among the most divergent genes in humans and thus were the first to differentiate. This is in accordance with the idea that two autosomes became sex chromosomes because of a mutation that turned SOX3 into SRY, the gene for male sex determination (Foster and Graves, 1994).


Figure 1.2.9 The Y-chromosome origin according to the “region by region” hypothesis (Lahn and Page, 1999). Four inversions on the Y chromosome are postulated. Each inversion reduced the size of the pseudoautosomal region (black; for simplicity, only one pseudoautosomal region is shown for each chromosome) and enlarged the portions of the X (yellow) and Y (blue) chromosomes that did not recombine during male meiosis. Ongoing decay and loss of Y genes offset these periodic expansions of the NRY. Points of divergence from the sex chromosomes of other mammals are indicated

Some years later, Skaletsky et al. (2003) proposed a more complex theory, which shares Lahn and Page's idea that the modern sex chromosomes originated from a common ancestral pair of autosomes (figure 1.2.10). The idea that the process of divergence affected only the Y chromosome, due to the loss of its capacity to recombine, while its counterpart on the X retained its original and ancestral function, did not explain the main features of the "ampliconic" sequences, such as testis-specific expression, almost perfect palindromic sequences and high similarity with autosomal sequences. To explain this, the "ampliconic" sequences were supposed to have evolved from a wide variety of substrates through different genomic and molecular mechanisms. For example, the VCY (Variably Charged Y chromosome) and RBMY (RNA-Binding Motif gene family on Y) genes, despite being in the "ampliconic" regions, were supposed to be degenerate copies of genes located on the X chromosome, which probably evolved from common ancestors of the two sex chromosomes. The DAZ (Deleted in AZoospermia) gene seemed, on the contrary, to have originated during the evolution of primates by transposition and subsequent amplification of an autosomal transcriptional unit, DAZL, which still exists on chromosome 3. The systematic study of homology between the MSY region and the autosomal regions showed that a whole series of autosomal transpositions, followed by amplification, contributed to the establishment of the ampliconic regions. Finally, an additional evolutionary mechanism was proposed for the CDY (Chromodomain Family Y) gene, supposed to be the result of the retrotransposition and subsequent amplification of a mature mRNA of an autosomal gene.

Figure 1.2.10 Molecular evolutionary pathways and processes that gave rise to genes in three MSY euchromatic sequence classes. X-degenerate genes and pseudogenes (yellow background) derived from an autosomal pair that was ancestral to both the X and Y chromosomes. X-transposed genes (pink background) derived from X-linked genes, which in turn derived from the ancestral autosomal pair. Ampliconic genes (blue background) were derived through three converging processes: amplification of X-degenerate genes (for example, RBMY, VCY ); transposition and amplification of autosomal genes (DAZ); and retroposition and amplification of autosomal genes (CDY). Boxes enumerate dominant themes in X-degenerate (yellow) and ampliconic (blue) gene evolution. The asterisk indicates that Y–Y gene conversion is apparently common in the 61% of ampliconic sequences that exhibit intrachromosomal identities of ≥ 99.9% (Skaletsky et al., 2003).


2. REVIEW OF THE LITERATURE


2.1 HLA genetics and CBT

2.1 The importance of HLA in CBT

In the setting of unrelated allo-HSCT, cord blood represents a source of HSCs alternative to BM and PBSC. The main practical advantages of this stem cell source are the simplicity of procurement, the safety for mothers and donors and the low risk of infection transmission. Furthermore, due to the immaturity of the immune system at birth, fewer alloreactive T-cells are present in the graft and cord blood is considered more predisposed to tolerance. Consequently, the incidence and severity of acute and chronic GvHD are decreased in comparison with other graft sources (Rocha et al., 2000). These are the reasons why a growing number of CBBs have been established worldwide in the last decades. CBBs are repositories where cryopreserved CBUs are readily available for transplantation. To this end, CBBs list their CBUs through national donor registries using the minimum essential data required for the search procedure, and potential donor candidates are sorted according to two main criteria: first, HLA match categories, and then, cell dose. The HLA system, which corresponds to the MHC in humans, is highly polymorphic (the most polymorphic system in the human genome) and plays a primary role in immunity; it therefore represents a major barrier that needs to be overcome to allow allografts between non-HLA-identical individuals. It is well known that the type and number of HLA mismatches are critical for transplantation outcome: immunological complications such as graft rejection and GvHD occur after HSCT and severely affect morbidity and mortality. Due to the tolerogenic properties of cord blood, less stringent matching criteria apply to this HSC source. The current HLA definition in CBBs is based on serological/low resolution antigenic typing for HLA-A and -B and high resolution allelic typing for -DRB1, where up to 1-2 mismatches are permitted (unrelated CBUs with a 6/6, 5/6 or 4/6 match are acceptable donors, and HLA-A and -B mismatches are preferable to -DRB1 ones). This is in contrast to BM and PBSC unrelated donors, who are chosen on the basis of high resolution allelic typing for HLA-A, -B, -C, -DRB1 and -DQB1 (the best choice being a 9/10 or 10/10 donor), with -DPB1 not yet routinely included and considered only in certain circumstances (Ballen et al., 2013). Our knowledge of donor selection strategies has been enhanced considerably over the years by studies analyzing the outcome in very large groups of transplant patients. These have led to a large degree of consensus regarding the selection of the optimal donor, which is evolving alongside the availability of high definition testing methods. Recently, Ruggeri reviewed the impact of immunogenetic factors on CBT outcomes, pointing out the importance of increasing the level of donor-recipient match at class I and class II HLA loci (Ruggeri et al., 2016). Table 2.1.1 shows a summary of the main papers reporting on this topic.
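The search logic described above (match grade out of six over HLA-A, -B and -DRB1, then cell dose) can be sketched in a few lines of Python. This is only an illustration under stated assumptions and not a registry algorithm: matches are counted in a simplified, direction-agnostic way, the preference for HLA-A/-B over -DRB1 mismatches is not modelled, and the function names, field names, unit identifiers and typings are all made up for the example.

```python
from collections import Counter

LOCI = ("A", "B", "DRB1")  # antigen level for A and B, allele level for DRB1


def match_grade(recipient, unit):
    """Count matched typings over HLA-A, -B and -DRB1 (maximum 6).

    Simplification: typings are compared as multisets, ignoring the
    direction of the mismatch (graft-versus-host or host-versus-graft).
    """
    matched = 0
    for locus in LOCI:
        shared = Counter(recipient[locus]) & Counter(unit[locus])
        matched += sum(shared.values())
    return matched


def rank_units(recipient, units, minimum_grade=4):
    """Keep units with at least a 4/6 match, sorted by grade and then by cell dose."""
    graded = [(match_grade(recipient, u["hla"]), u) for u in units]
    acceptable = [(g, u) for g, u in graded if g >= minimum_grade]
    acceptable.sort(key=lambda pair: (pair[0], pair[1]["tnc_per_kg"]), reverse=True)
    return [(u["id"], g) for g, u in acceptable]


# Hypothetical example data (not real patients or cord blood units).
DRB1 = ("DRB1*15:01", "DRB1*04:01")
recipient = {"A": ("A2", "A24"), "B": ("B7", "B44"), "DRB1": DRB1}
units = [
    {"id": "CBU-1", "tnc_per_kg": 2.5e7,
     "hla": {"A": ("A2", "A24"), "B": ("B7", "B44"), "DRB1": DRB1}},
    {"id": "CBU-2", "tnc_per_kg": 4.0e7,
     "hla": {"A": ("A2", "A3"), "B": ("B7", "B44"), "DRB1": DRB1}},
    {"id": "CBU-3", "tnc_per_kg": 3.0e7,
     "hla": {"A": ("A2", "A24"), "B": ("B7", "B8"), "DRB1": DRB1}},
    {"id": "CBU-4", "tnc_per_kg": 5.0e7,
     "hla": {"A": ("A1", "A3"), "B": ("B8", "B35"), "DRB1": DRB1}},
]

if __name__ == "__main__":
    for unit_id, grade in rank_units(recipient, units):
        print(f"{unit_id}: {grade}/6")
```

In this made-up example the fully matched unit is listed first despite its lower cell dose, the two 5/6 units are ordered by cell dose, and the unit with the highest dose is excluded because its match grade falls below 4/6.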


In single unit CBT, studies on the association between cell dose, degree of HLA matching and successful outcome after CBT have been reported since 1989 (Gluckman et al., 1989; Gluckman et al., 1997; Rubinstein et al., 1998). In a 2004 Eurocord analysis of 550 children and adults undergoing CBT for malignant disorders, the number of HLA mismatches correlated with graft failure, suggesting inferior engraftment with increased HLA disparity. Cell dose was also found to be a primary factor correlated with neutrophil and platelet recovery (Gluckman et al., 2004). Then, in 2007, a CIBMTR analysis compared 503 CBTs and 282 BM-HSCTs. Neutrophil and platelet recovery were similar after BM-HSCT and 6/6 CBT, and higher after 4/6 or 5/6 CBT if TNC were > 3 × 10^7/kg, so that increased disparity could be compensated by augmenting cell dose (Eapen et al., 2007). Overall, these results led to the definition of criteria for the choice of CBUs (Rocha et al., 2009). A subsequent analysis of 1061 single CBTs confirmed a better outcome for 6/6 matched CBUs with any cell dose, and defined combinations of mismatches and TNC levels for 5/6 and 4/6 matched CBUs (Barker et al., 2010). In 2011 a collaborative analysis by Eurocord and CIBMTR addressed the importance of HLA-C typing at intermediate resolution, which has been recommended ever since. In fact, NRM was higher in HLA-C mismatched CBT compared with fully matched transplants (8/8) (P=0.018), and also higher for CBT with a single mismatch at HLA-A, -B, or -DRB1 and mismatched at HLA-C compared with CBT with a single mismatch at HLA-A, -B, or -DRB1 but matched at HLA-C (P=0.029) (Eapen et al., 2011). Subsequently, several studies have shown the growing relevance of allelic disparities in the CBT setting, although the definition of CBU matching based on serological/low resolution antigenic typing for HLA-A and -B and high resolution allelic typing for -DRB1 has been maintained (see table 2.1.1). Recently, a collaborative CIBMTR-Eurocord retrospective analysis of 1568 single CBTs reported that allele-level matching at HLA-A, -B, -C, and -DRB1 is associated with the lowest NRM in patients with acute leukemia and myelodysplastic syndrome (Eapen et al., 2014). According to Eapen et al., CBUs that are fully matched or have one or two allele mismatches should be preferred, while the use of CBUs with three or more allele mismatches should be carefully assessed for the risk of graft failure and NRM.


Table 2.1.1 List of the principal papers on high-resolution HLA typing and CBT outcomes (Ruggeri et al., 2016).

In double CBT, the results are still controversial. Double CBT was developed to overcome the cell dose limit and allow the use of CBUs in adults. Donor selection allows up to two HLA disparities between the two CBUs and between each CBU and the patient. After infusion, only one CBU, the dominant unit, is responsible for long-term engraftment. The factors predicting which unit will be dominant remain unexplained, as no association with the degree of HLA matching (either at antigenic or allelic level) has been reported (Avery et al., 2011). Recently, Oran suggested that high resolution typing and the selection of CBUs matched at 5/8 alleles or more may reduce 2-year TRM (39% vs 60% for CBUs with 4/8 or less), with 7-8/8 matched units remaining the best option (no transplant-related deaths) (Oran et al., 2015). However, the analysis by Brunstein reported different results, stating that HLA matching at allele level has no impact on CBT outcome, and even that a higher degree of HLA mismatch is associated with a lower risk of relapse and treatment failure in acute leukemia patients (Brunstein et al., 2016). On these premises, transplant centers nowadays tend to select a CBU on the basis of a higher-resolution level of HLA typing than the current standard (antigenic typing for HLA-A and -B, allelic for -DRB1 only), and this is modifying the algorithm of donor choice (Dahi et al., 2014). Table 2.1.2 outlines the principles for cord blood donor selection, as per the guidelines of the EBMT Cord Blood Committee.


Table 2.1.2 Criteria for the choice of cord blood unit, on behalf of Eurocord and the Cord Blood committee of CTIWP/EBMT (Cellular Therapy & Immunobiology Working Party of the European Society for Blood and Marrow Transplantation)(Ruggeri et al., 2016).

In this scenario, the simplest approach is to perform allele typing for HLA-A, -B, -C, and -DRB1 on a CBU at confirmation of procurement, prior to release for HSCT, giving the transplant physician the information needed for the best donor choice, as recommended by the NetCord FACT standards, 6th edition, 2016 (www.factwebsite.org). Alternatively, HLA typing of CBUs at medium/high resolution level can reasonably be proposed at banking or prior to listing. An increasing number of CBBs have been adopting this strategy as sequencing-based typing techniques develop and become more cost-effective.


2.2 non-HLA genetics and CBT

2.2 The importance of non-HLA genetic factors in CBT

Although the role of non-HLA genotypes has been widely investigated in the setting of HSCT, only one study has been reported on CBT. Table 2.2.1 shows a summary of non-HLA polymorphisms and relevant papers in this regard.

Table 2.2.1 Non-HLA polymorphisms and HSCT (Dickinson and Middleton, 2005).

In the setting of CBT, Kögler et al. investigated the role of cytokine TNF-alpha and IL-10 gene polymorphisms, and of several minor histocompatibility antigen (mHag) mismatches, in the development of GvHD after unrelated CBT. In this study, the cytokine gene polymorphisms TNFd3/d3 and IL-10 -1064, whether of the recipient or of the donor, alone or in combination, were not associated with acute GvHD severity, although a trend towards an association between the presence of both genotypes and the occurrence of GvHD grades I to IV was found (see table 2.2.2). These results were in contrast with those obtained in BM-HSCT, where the same polymorphisms have been significantly associated with the occurrence of GvHD. The need to include larger and more homogeneous cohorts of patients was highlighted as an important factor to be considered in future studies aimed at verifying these findings.

Table 2.2.2 Results of the statistical analysis of the correlation between cytokine gene polymorphisms of patients and acute GvHD (Kögler et al., 2002).


2.3 HLA and population genetics

2.3 Contribution of HLA to population genetics

2.3.1 The evolution of MHC

The Major Histocompatibility Complex (MHC) is a genomic region encoding proteins involved in antigen presentation, and therefore plays a pivotal role in the adaptive immune response. The MHC seems to have originated from an ancestral region (the “proto-MHC”) that acquired the genes involved in the adaptive immune system at the time when jawed vertebrates appeared. MHC organization varies among different groups of vertebrates. However, some characteristics, such as extreme polymorphism and gene clustering, are conserved, pointing to an evolutionary advantage that essentially consists of avoiding population extinction by pathogens (Martinez-Borra and Lopez-Larrea, 2012). Moreover, the study of KIRs (killer-cell immunoglobulin-like receptors), whose role is primarily in innate immunity, and of their relationship with MHC genes in primates revealed an interaction between these two polymorphic systems that is indicative of co-evolution, a mechanism thought to have triggered the rapid evolution of the MHC (in fact, some MHC class I molecules are the only identified ligands for KIR receptors, and KIR genes are known to interact specifically with certain HLA molecules) (Augusto and Petzl-Erler, 2015).

MHC evolution. It is known that the adaptive immune system originated suddenly during the emergence of the jawed vertebrates, in a short period of time in which all its characterizing elements appeared, including the T-cell receptor (TCR), immunoglobulins (Ig) and the MHC. A possible explanation is that evolutionary pressures made MHC genes evolve rapidly compared with other genes. The polymorphism of MHC genes is hypothesized to cause a diverse genetic susceptibility to infection, so that individuals with different class I and II genes select different peptides to activate the immune system. This represents a population advantage, because it avoids the possibility that a single pathogen would drive the entire population to extinction. Many pathogens have developed strategies to interfere with antigen presentation and the immune system. The emergence of natural killer (NK) cells and their receptors is thought to have represented the response of the immune system to this immune evasion. In humans, this recognition occurs through receptors for HLA class I molecules located on the NK cell membrane, called KIRs, which are also very polymorphic. In apes, the interaction of these two polymorphic molecules, KIR and MHC, has revealed another possible mechanism that triggered the rapid evolution of the MHC (Martinez-Borra and Lopez-Larrea, 2012). Features of the MHC worth mentioning are: 1) class I and class II polymorphisms, basically generated by point mutations and recombination, but also by gene duplication. Pathogen-driven positive selection may have promoted the survival of new alleles with differences in their antigen presentation properties, whereas the sequences in other regions have been conserved to maintain the structure of the HLA molecules; 2) the existence of genes that encode proteins with similar functions. Loci producing similar proteins have been generated by gene duplication, such as the human class I genes HLA-A, -B and -C. The existence of several class I or class II genes increases the level of polymorphism and is thus advantageous for an individual. Gene duplication also produces genes that may acquire new functions, as in the case of non-classical class I molecules, which derive from classical class I molecules through changes that no longer maintain their original function; 3) genetic linkage, which is the consequence of the presence of both class I and class II genes on the same chromosome, where they form a cluster. The organization of the MHC region as a gene cluster allowed the co-evolution of different genes involved in the same function to form haplotypes. Thus alleles at different loci can form nonrandom combinations, probably because of their proximity, which facilitates the preservation of a given combination of alleles that is effective in the response to a particular pathogen (Martinez-Borra and Lopez-Larrea, 2012). In humans, multiple MHCs have been completely sequenced and the region represents one of the most analyzed sections of the genome, encompassing ∼4 Mbp and representing about 0.1% of the human genome (but about 0.6% of identified genes). The MHC is the most gene-dense region of the human genome, particularly in class III, where genes are packed with little or no separation. In humans too, the main feature of the MHC is that it is extremely variable, particularly in the peptide-binding grooves of HLA molecules but also in the DNA flanking the groove sequence.
New HLA alleles continue to accumulate, their maintenance in a population being modeled by mechanisms of heterozygote advantage (overdominant or balancing selection), frequency-dependent selection and fluctuating selection, due to continual change in pathogen type and abundance. Another theory advanced to explain the extreme variation of the MHC is the so-called “associative balancing complex” evolution, which proposes that mutations accumulate in the MHC because they are rarely expressed as homozygotes, with recombination rates so low that purifying selection is ineffective. Once fixed in the population, these mutations are thought to act by epistatic selection against recombinants. A combination of different mechanisms is likely to be involved in generating and maintaining the large number of alleles in populations. Linkage disequilibrium is widespread in the human genome, but nowhere is it as extensive as in the MHC. At least in part, the explanation may relate to the clustering of genes whose products closely interact at the functional level. A related phenomenon is that of the so-called “polymorphic frozen blocks”. In brief, blocks of sequence may be identical on some disparate haplotypes, while they may differ dramatically on others, owing to ancient variation that arose before human speciation and was inherited as stretches of sequence that infrequently recombine. However, other factors may also be invoked, such as a recent origin of some haplotypes, with consequently insufficient time for recombination to separate markers (Trowsdale, 2011).

It is generally assumed that the selection pressure fueling the generation of MHC variation is resistance to infection, which is more effective the more MHC loci and alleles are present in the population. In fact, a large number of variants in a population could promote a form of herd resistance that controls the spread of infection. Natural selection would therefore take place at the level of the individual but, over the long term, groups or populations that encompass a range of alleles would have a survival advantage. A potential risk emerges when a population becomes relatively monomorphic, since the whole population could become susceptible to an infectious agent that escapes recognition by the specific MHC. Consequently, mechanisms appear to operate to maintain outbreeding at the MHC, such as MHC-related disassortative mate selection. Other functions of MHC molecules, such as maternal-fetal interaction through NK cell receptors, may also operate in the selection and maintenance of polymorphism in the population (Trowsdale, 2011). MHC selection, through mechanisms such as disease resistance (and probably also reproductive fitness, considering the relationship with KIRs), is therefore likely to depend upon MHC variation, not only at the level of the individual but notably at the level of the population. As a consequence, the extensive polymorphism of HLA class I and class II and their linkage disequilibrium, namely the existence of blocks of genes encoding proteins with similar functions that form non-random allele combinations (haplotypes), can be profitably applied to population studies.
Whereas the molecular evolution of HLA (and also KIR) polymorphisms has most likely depended on natural selection, principally driven by host–pathogen interactions, their genetic variation worldwide shows traces of human geographic expansion, demographic history and cultural diversification. It is therefore increasingly acknowledged that, in addition to mitochondrial DNA, the Y chromosome, microsatellites, SNPs and other genetic markers, immunogenetic polymorphisms represent important and complementary tools for anthropological studies (Fernandez-Vina et al., 2012; Sanchez-Mazas et al., 2011).

KIR and MHC co-evolution. Owing to their importance in disease and reproduction, KIR–HLA combinations are under selective pressure. There is global evidence of co-evolution of KIR with HLA, which can be seen when large datasets from different populations are considered (Augusto and Petzl-Erler, 2015). Since Single et al. (2007) provided the first evidence, others have reported on the co-evolution of these two gene families across worldwide populations, particularly with regard to the evaluation of KIR–HLA combinations and reproductive success. In this setting, Native Americans have been a target population particularly informative about evolution in the course of recent human history, owing to their long-term genetic isolation from populations of other continents, their small population size, and their great inter-population diversity. Different American populations showed similar frequencies of KIR ligands despite having distinct sets of HLA alleles, suggesting that natural selection as well as other mechanisms (such as bottleneck and founder effects) may act to maintain a set of functional ligands that efficiently control KIR function, even in populations that exhibit very different HLA allele frequencies (Augusto and Petzl-Erler, 2015). However, KIR diversity has been investigated in only a few Native American populations so far. An exception is the brilliant study by Gendzekhadze et al. on the co-evolution of KIR and HLA in the Yucpa. The authors determined the complete allele-level KIR haplotypes and their frequencies and were able to demonstrate that only six KIR haplotypes and three HLA epitopes were present in this native population of Venezuela, which may represent the lowest degree of diversity viable for any human population. They also found a KIR variant that replaced the one brought from Asia with the ancient migrations and that may have been favorably implicated in the control of infections, providing evidence that balancing selection appears to operate on KIR as strongly as on HLA class I (Gendzekhadze et al., 2009). Although the effects of KIR and HLA co-evolution can be found in the diversity of both complexes, there is much more evidence for the evolutionary causes and functional consequences of HLA diversity, as the latter has been far more extensively studied.
As HLA and KIR polymorphisms are encoded by independent regions of the human genome, are expressed by different kinds of molecules, and are studied in different sets of populations, they may be complementary in anthropological studies, where HLA represents the present and KIR the future of immunogenetic studies applied to anthropology (this difference relies mainly on the typing technologies used to analyze their variability and on the level of understanding of their diversity in human populations, which are still increasing for HLA and only at the beginning for KIR) (Sanchez-Mazas et al., 2011).

2.3.2 The HLA diversity in different populations

Since the first applications of such polymorphisms in population genetics, it has been observed that the patterns of genetic diversity worldwide tend to exhibit a geographic structure. Population trees generally discriminated populations from different continents, the main controversy being the position of Africans, who segregated either with Europeans within an ‘occidental group’ separated from an ‘oriental group’ of Asian, Amerindian and Oceanian populations, or separately from all the others. Natural selection was probably not the only mechanism, as the patterns of genetic diversity may also have been shaped by the history of human migrations. As a consequence, interest has grown over the years in using these immunogenetic systems as informative markers to reconstruct the history of human peopling (Sanchez-Mazas et al., 2011).

It is assumed that the polymorphism of HLA alleles is mainly functional, because different HLA molecules bind different sets of peptides. High sequence diversity is therefore required in the HLA peptide-binding region to bind a wide variety of pathogen-derived peptides, which are subsequently presented to T-cell receptors. The distribution of HLA alleles in different populations may be a consequence of this functional polymorphism. Individuals heterozygous for HLA alleles may have a wider peptide-binding repertoire and therefore an enhanced capability to respond to more pathogen variants (selection of heterozygotes). The presence of different loci within both class I (A, B and C) and class II (DR, DQ and DP) may also compensate for the deficits of homozygosity. Strong linkage disequilibrium (LD) is responsible for the fact that some HLA alleles are observed together in populations more frequently than expected from their gene frequencies, such as some DRB1 alleles being in strong LD with specific DQA1 and DQB1 alleles. Differences in haplotype combinations of HLA alleles between populations could be explained by the compensatory abilities of the allelic products encoded by the haplotype to bind peptide epitopes from different pathogens, or by past population differentiation or recent admixture. Although the number of known alleles has been growing from year to year, between 10 and 30 alleles are observed per population for most of the HLA loci, the largest number being observed at HLA-B. HLA alleles generally exhibit low to medium frequencies, and many of them are very rare, with the exception of the DPB1 locus (and of populations that underwent rapid genetic drift). In fact, 60–70% of known classical HLA alleles have only been reported up to three times, suggesting that new allele variants are being generated on a regular and ongoing basis.
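
The statement that some alleles are observed together more frequently than expected from their gene frequencies can be made concrete with the usual pairwise disequilibrium coefficient, D = p(AB) - p(A)p(B), and its normalized form D'. The following is a minimal Python sketch of that calculation; the function name and all frequencies are hypothetical values chosen for illustration and are not taken from any of the studies cited here.

```python
def linkage_disequilibrium(p_a, p_b, p_ab):
    """Return (D, D') for two alleles at different loci.

    p_a, p_b -- population frequencies of the two alleles
    p_ab     -- frequency of the haplotype carrying both alleles
    """
    d = p_ab - p_a * p_b  # excess of the haplotype over random association
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    return d, d_prime


# Hypothetical frequencies (assumptions for illustration only)
p_drb1 = 0.20       # a DRB1 allele
p_dqb1 = 0.30       # a DQB1 allele
p_haplotype = 0.18  # haplotype carrying both alleles

d, d_prime = linkage_disequilibrium(p_drb1, p_dqb1, p_haplotype)
print(f"D = {d:.3f}, D' = {d_prime:.3f}")
```

With these illustrative values the haplotype is three times as frequent as expected under random association (0.18 against 0.06), giving D = 0.12 and D' of about 0.86; a D' close to 1 corresponds to the tight DRB1-DQ associations described above.
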
Despite the evidence of natural selection, HLA polymorphism can be highly informative for anthropological studies, as the patterns of HLA genetic variation reveal the spatial and demographic past of human populations. Globally, it has been estimated that the genetic distances between populations based on frequency data for all HLA loci are significantly correlated with geographic distances, leading to the conclusion that human migrations are likely to have been the primary force in the evolution of HLA variation worldwide, in addition to demographic expansions and contractions, which contributed to allelic diversification and population diversification, respectively (Sanchez-Mazas et al., 2011). HLA DNA sequences may advantageously complement HLA allele frequencies as a source of data for exploring the genetic history of human populations, providing new insights into the molecular evolution of the human MHC. In a study by Buhler and Sanchez-Mazas (2011), a large dataset of 2,062 DNA sequences was used to analyze seven HLA genes in 23,500 individuals from about 200 populations worldwide, showing that the global patterns of HLA nucleotide diversity among populations correlated significantly with geography, with some unexpected genetic relationships. The authors found that the investigated populations have accumulated a high proportion of very divergent alleles at all loci except HLA-DPB1, arguing for an advantage of heterozygotes expressing molecularly distant HLA molecules (asymmetric overdominant selection model) (Buhler and Sanchez-Mazas, 2011). DNA sequence diversity among populations therefore also proved important for the interpretation of the observed HLA polymorphisms.

2.3.3 HLA genetic differentiation in America

From the point of view of HLA, Amerindian populations are distant from those of other continents, but also appear to be genetically distant from each other. Their allelic diversity is limited, with only a few alleles exhibiting very high frequencies (e.g. DRB1*04:07, *04:11, *08:02, *14:02 and/or *16:02). The HLA alleles found in Amerindians belong to a subset of the lineages observed in Asia, in accordance with a Beringian origin of the first settlers of the double continent. It is postulated that in both Oceania and the Americas rapid genetic drift has occurred owing to small population sizes. This led to a drop in genetic diversity, while the large molecular differentiation among most HLA alleles might have compensated for this limit, ensuring immunological protection. Of particular interest are the studies of American Indian populations from Mexico and South America, where, despite the restricted number of alleles, all HLA loci with the exception of DPB1 were found to have high levels of heterozygosity (Buhler and Sanchez-Mazas, 2011). In Amerindian populations, very few allelic lineages (namely four HLA-A, seven HLA-B, seven HLA-C, five HLA-DRB1, two HLA-DQA1, two HLA-DQB1 and five HLA-DPB1) are detected, while several alleles of the same lineage are present in each population. Many of these are not found in other outbred populations, and are postulated to have been generated in America as novel alleles by gene conversion events (Fernandez-Vina et al., 2012). In brief, all putative novel alleles may derive from a few founder alleles (those alleles of each lineage that are found in other populations), with the nucleotide sequences donated in the gene conversion events coming from other founder alleles. This is supported by the observation that most of the novel alleles identified differ from other alleles of the same lineage by amino acid substitutions in residues contributing to the peptide-binding groove, leading to potential new peptide-binding capabilities.
Gene conversion events may involve alleles of the same locus. The majority of the putative novel alleles found in these populations belong to the HLA-B locus, which is characterized by the highest degree of diversity. It has been postulated that HLA-B has diversified very rapidly in South American populations. Interestingly, novel HLA-B alleles are present at the highest gene frequencies in many populations, suggesting that they were positively selected, probably because of some selective advantage. In fact, as the founder polymorphism is so limited in these populations, it is likely that any novel allele arising in these environments would enlarge their peptide-binding repertoire. It can therefore be hypothesized that the HLA-B locus diverged more than the HLA-A or -DRB1 loci in South American populations because of its larger number of founder alleles and the consequently higher probability of intra-locus gene conversion. The hypothesis of a pathogen-driven evolution shaping the pattern of global HLA diversity was supported by Prugnolle et al. (2005a), who found a significant correlation between HLA class I heterozygosity levels in populations and pathogen richness at the global level; however, this correlation tends to decrease when Amerindian populations are not taken into account (Sanchez-Mazas et al., 2011). In fact Amerindians, as isolated populations in which significant founder effects restrict the level of polymorphism, show high levels of lineage differentiation that may have been selected to counterbalance environmental factors. In addition to classical population genetics analyses, new approaches using computer simulation are likely to be useful in unraveling the effects of stochastic and deterministic factors on the evolution of HLA polymorphism, improving the interpretation of HLA diversity patterns worldwide in the near future.

The distribution of HLA alleles in various outbred populations was initially examined at the serological level, leading to the observation that the levels of homozygosity displayed by the HLA-A, -B, -C and -DRB1 loci were below those expected for populations evolving under neutral conditions (e.g. genetic drift). The use of molecular typing methods revealed that many serologically indistinguishable subtypes can be observed in the same population. This led to the observation that some alleles of the same serotype or allelic lineage, with limited structural differences, show distinctive frequency distributions in different populations. The HLA nomenclature has therefore evolved over time to capture the definitions achieved through methodological advances.
It has been suggested that both the nucleotide sequence homology and the haplotype constitution of alleles at contiguous loci should be taken into account to elucidate the evolutionary relationships between alleles without erroneous inferences of genetic relatedness between populations (Fernandez-Vina et al., 2012). For instance, considering the alleles DRB1*08:04:02, DRB1*08:04:04, DRB1*08:07 and DRB1*08:11, which are found only in populations from the American continent, and looking closely at the nucleotide sequences that differentiate these alleles within their group, we can deduce that they are evolutionarily related, deriving from the allele DRB1*08:02:01, which has a high frequency in almost all Native American populations. As DRB1*08:02:01 is also found in Asian populations, its presence may mark the founder migrations from Asia to America through the Bering Strait. Studies performed in outbred populations of South America showed the presence of HLA alleles that were common in one group and found uniquely there (being virtually absent in other groups), and of several ethnic-specific HLA alleles in Native Americans (Moraes et al., 1993). In the American Indian tribes, very few allelic lineages were observed (4 HLA-A, 7 -B, 7 -C, 4 -DRB1, 2 -DQA1, 2 -DQB1 and 5 -DPB1), but several alleles of the same lineage were present in each tribe. Many of these were not observed in other outbred populations or tribes, and are supposed to have been generated in the Americas by gene conversion events (novel alleles). These studies have identified large genetic distances between populations from the American continent, which are significantly reduced when alleles are replaced by their corresponding serotypes. This is in contrast to the genetic distances between populations from other continents, which are in general smaller, correlate well with geographical distances, and do not differ significantly whether evaluated by the distribution of alleles or of their corresponding broad serotypes (Fernandez-Vina et al., 2012).

Furthermore, LD patterns between alleles of various HLA loci may provide significant insight into the history of a particular allele and may help elucidate possible evolutionary relationships between alleles. Population studies revealed that DRB1 alleles display tight associations with DQA1 and DQB1 alleles. These shared block associations may mark the evolutionary relationship between some DRB1 alleles, possibly due to a rapid or recent diversification of an allelic lineage, or to selection for specific cis combinations of DRB1 and DQ alleles. LD analysis between alleles of the class I loci showed tight associations between HLA-B and HLA-C alleles but weaker ones between HLA-B and HLA-A alleles. Some alleles with identical protein sequences, distinguished at the nucleotide level only by silent substitutions or by substitutions in non-coding segments, may be related by descent from a common ancestral sequence, but may also have arisen independently (convergent evolution). The occurrence of the same allele in LD with different alleles at neighbouring loci in different, geographically distant populations suggests that such events may have occurred. Alternatively, these alleles may be ancient and may have diverged through recombination that generated new haplotypes.
Undetected convergent evolution events may be confounding in the investigation of population relationships, suggesting erroneously close relations between populations. One example is the case of the B*52:01:01 and B*52:01:02 alleles, which differ by one silent substitution at the third nucleotide of codon 23. B*51:01:01 is present in the same populations in which B*52:01:02 is found, and it can be postulated that B*52:01:02 derives from B*51:01:01 through a gene conversion event introducing a segment present in B*40:02:01. In sub-Saharan Africans, both B*52:01:02 and B*51:01:01 are in LD with C*16:01, while the B*52:01:01 allele is in LD with C*12:02:02 in Asians and Europeans; a de novo generation of HLA-B*52:01:02 may also be postulated in Native American populations (Fernandez-Vina et al., 2012). In many reports, it has been noticed that the genetic distances between open populations correlate well with their geographical locations and, for migrant populations, with their regions of origin. In contrast, the genetic distances measured between inbred populations of the same region are larger than expected, owing to a large number of unique alleles in a small number of lineages as the result of limited founder polymorphism. In these populations, any novel allele may have been positively selected to enlarge the communal peptide-binding repertoire. Conversely, some alleles are found in multiple populations with distinctive haplotypic associations, suggesting that convergent evolution events may have taken place as well. For all these reasons, allelic diversity in HLA should be analysed in the context of HLA haplotypes and blocks, and in conjunction with other genetic markers, to accurately track the migrations of modern humans.
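
The observation that genetic distances between American populations shrink when alleles are replaced by their serotypes can be illustrated with a small Python sketch. It rests on several assumptions that are not part of the cited studies: the frequencies are invented, the first nomenclature field (e.g. DRB1*08) is used as a rough stand-in for the serotype, and the distance is simply half the sum of absolute frequency differences rather than the measure used by the original authors.

```python
from collections import defaultdict


def collapse_to_lineage(freqs):
    """Sum allele frequencies by first field, e.g. DRB1*08:04 -> DRB1*08."""
    lineages = defaultdict(float)
    for allele, freq in freqs.items():
        lineages[allele.split(":")[0]] += freq
    return dict(lineages)


def frequency_distance(pop_x, pop_y):
    """Half the sum of absolute frequency differences.

    0 means identical distributions, 1 means no shared alleles.
    """
    alleles = set(pop_x) | set(pop_y)
    return 0.5 * sum(abs(pop_x.get(a, 0.0) - pop_y.get(a, 0.0)) for a in alleles)


# Hypothetical DRB1 frequencies for two populations (not real data)
pop_x = {"DRB1*08:02": 0.4, "DRB1*04:07": 0.6}
pop_y = {"DRB1*08:04": 0.3, "DRB1*08:02": 0.2, "DRB1*04:11": 0.5}

allele_level = frequency_distance(pop_x, pop_y)
lineage_level = frequency_distance(
    collapse_to_lineage(pop_x), collapse_to_lineage(pop_y)
)
print(f"allele level: {allele_level:.2f}, lineage level: {lineage_level:.2f}")
```

In this toy case the two populations look very different at the allele level (0.80) but almost identical once alleles are grouped by lineage (0.10), which is the pattern expected when many population-specific alleles have arisen within a few shared founder lineages.
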

2.3.3.1 HLA analysis of South America populations: Ecuador and Peru

The Amerindians are unique among populations of similar size or antiquity in that they have evolved in virtual isolation for 15-20 thousand years and, as such, offer the opportunity to analyze genetic diversity in populations that developed within a known time frame and geographic area. The Cayapa Indians of Ecuador speak a language classified as belonging to a branch of the Amerind family, one of the three major groups (Amerind, Na-Dene, and Aleut-Eskimo) into which Native American languages are subdivided, corresponding to the three purported waves of migration from Asia into the American continent (Torroni et al., 1993). Although their origin is still debated, the Cayapa are believed to be the first inhabitants of Ecuador. They may have originated in the Amazonian region and subsequently migrated to the Andes and later to the forest and coastal regions of Ecuador, or they may have originated in the Andean highlands in the north of the country and subsequently moved toward the coastal regions as a result of the Incan expansion in the 15th century and the Spanish invasion in the 16th century. The modern Cayapa are a small population (~3,600 individuals) subsisting on agriculture, hunting, and fishing, with living conditions affected by various infectious diseases such as malaria, tuberculosis, intestinal parasitosis, and onchocerciasis (also known as "river blindness"). The Cayapa have been isolated for a long time and have maintained their genetic integrity, since no genetic admixture is seen between the Cayapa and neighboring African American and European populations. This makes them especially valuable for studies aimed at reconstructing the evolutionary history of HLA polymorphism: with only very limited admixture as a confounding factor, the effects of selection, mutation, and recombination can be observed more clearly.
A 1995 study by Trachtenberg et al. reported on the analysis of HLA class II loci (defined by DNA-based typing) in a sample of Cayapa Indians of Ecuador. The HLA class II region has four primary polymorphic loci, DRB1, DQA1, DQB1, and DPB1, separated by ~60, ~20, and ~400 kb of DNA, respectively. Allelic variation is extensive at these loci. Recombination between them is infrequent, occurring at a rate of about 1% between DQB1 and DPB1, while recombination between DQA1 and DRB1 or between DQA1 and DQB1 has never been observed within families but has been inferred by comparing different DRB1-DQA1-DQB1 haplotypes. The authors defined the Cayapa HLA class II allele frequencies for DRB1, DQA1, DQB1, and DPB1, as summarized in figure 2.3.1.
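
To give a sense of what a recombination rate of about 1% between DQB1 and DPB1 implies, the sketch below applies the textbook expectation for neutral loci in a large, randomly mating population, D_t = D_0 (1 - r)^t. This is an idealized model assumed here for illustration, not the analysis performed by Trachtenberg et al.; the function names are hypothetical and the time span of 500 generations is an illustrative choice.

```python
import math


def ld_remaining(d0, r, generations):
    """Expected disequilibrium after random mating: D_t = D_0 * (1 - r)**t."""
    return d0 * (1.0 - r) ** generations


def ld_half_life(r):
    """Number of generations needed for D to halve at recombination fraction r."""
    return math.log(0.5) / math.log(1.0 - r)


r = 0.01           # recombination fraction between DQB1 and DPB1 (from the text)
generations = 500  # illustrative time span, an assumption of this sketch

fraction_left = ld_remaining(1.0, r, generations)
print(f"fraction of initial D left: {fraction_left:.4f}")
print(f"half-life: {ld_half_life(r):.1f} generations")
```

Under these assumptions disequilibrium halves roughly every 69 generations, and less than 1% of the initial value would remain after 500 generations in the absence of selection, drift or admixture.
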

Figure 2.3.1 Allelic distribution of the HLA class II loci in the Cayapa Indians of Ecuador (Trachtenberg et al., 1995).

At the DRB1 locus 13 alleles were present, with the three most common (DRB1*0901, DRB1*0407, and DRB1*1402) accounting for 67% of the total sample; five DRB1 alleles were rare, having only one copy (DRB1*1503, *0403, *0408, and *0410) or two copies (DRB1*1102). Each of the three DQA1 alleles (DQA1*0301, *0401, and *0501) was relatively common, ranging in frequency from 14.5% to 56%, with DQA1*0301 predominating. The four DQB1 alleles found (DQB1*0301, *0302, *0303, and *0402) showed frequencies ranging from 11% to 41.5%, with DQB1*0302 at the highest frequency. At the DPB1 locus, two alleles (DPB1*0402 and *1401) predominate among the six detected, and constitute 89% of the sample in roughly equal proportions. As previously reported, several DRB1, DQA1, and DQB1 allelic lineages are absent from this and other Amerindian populations. Each DRB1 allele has a single primary (high-frequency) association with DQB1 (see table 2.3.1). For instance, the alleles DRB1*1102 and *1402 are predominantly associated with DQB1*0301. Many of the common DRB1 alleles are found on one or a few additional DQB1 haplotypes besides the most common one. For instance, DRB1*0802, typically associated with DQB1*0402, was also found with DQB1*0302, indicative of recombination between the DR and DQ regions. In contrast, DRB1*0407, the most common allele, was found only on the DQB1*0302 haplotype. The Cayapa lack the DQA1*01 and the DQB1*05 and DQB1*06 allele groups, and have just three DQA1 alleles and four DQB1 alleles, giving a potential total of 12 distinct DQA1-DQB1 haplotypes. Of these 12, the Cayapa have 9, 4 of which are unique to this population (see table 2.3.2). The authors also discovered a new variant present in several Cayapa samples, namely the DRB1*08042 allele, which is supposed to have derived from DRB1*0802.
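
The count of 12 potential DQA1-DQB1 haplotypes is simply the number of pairings between the three DQA1 and four DQB1 alleles listed above. A minimal Python sketch of that enumeration follows; the variable names are illustrative, and because the text does not say which 9 combinations were actually observed, only their number is used.

```python
from itertools import product

dqa1_alleles = ["DQA1*0301", "DQA1*0401", "DQA1*0501"]
dqb1_alleles = ["DQB1*0301", "DQB1*0302", "DQB1*0303", "DQB1*0402"]

# Every DQA1-DQB1 pairing that is possible in principle
potential = [f"{a}-{b}" for a, b in product(dqa1_alleles, dqb1_alleles)]

observed_count = 9  # haplotypes reported in the Cayapa (count taken from the text)
print(len(potential), "potential haplotypes")
print(f"{observed_count / len(potential):.0%} of them observed")
```

That three quarters of the possible pairings are present suggests that DQA1 and DQB1 alleles have been reshuffled repeatedly, in line with the inferred recombinants.
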

Table 2.3.1 Estimated DRB1-DQA1-DQB1-DPB1 haplotype frequencies and significant positive disequilibrium values for DR-DQ-DP haplotypes in the Cayapa (Trachtenberg et al., 1995).

Table 2.3.2 Inferred DRB1-DQA1-DQB1 recombinants among Cayapa haplotypes (Trachtenberg et al., 1995).

The comparison of the class II alleles of the Cayapa and of North American Amerindian samples with Asian, Caucasian, and African distributions indicates a reduced level of polymorphism among Amerindians. The DQA1*01 and *02 and the DQB1*05 and *06 lineages are absent, as are the DRB1*01, *03, *07, and *10 lineages. This reduction in overall class II polymorphism is consistent with a population bottleneck of the ancestral Asian population that migrated over the Bering land bridge to America ~14,000-30,000 years ago, as supported by evidence from mtDNA and other genetic studies (Torroni et al., 1993). Alternatively, it may reflect selective forces. The allelic distributions at all four class II loci examined are more even than expected under the hypothesis of selective neutrality in the Cayapa, and may be indicative of a long history of balancing selection tending to maximize allelic heterozygosity, which is especially apparent at the DQB1 locus. The DPB1 allelic distribution in the Cayapa is dominated by DPB1*0402 and *1401, which together constitute 89% of the alleles sampled. In particular, the DPB1*1401 allele is absent or rare in most human populations and is found at very low frequencies (< 5%) in North American Amerindians and at moderate frequencies (about 10%) in South American Amerindians from Brazil and Argentina. Moreover, the advantage of studying disequilibrium in a highly polymorphic multilocus system with low but measurable recombination is that the recent evolutionary history of the system can be inferred. In this population, each DPB1 allele has one or more haplotypes in significant positive disequilibrium, as does the great majority of DRB1 alleles. Of the common DRB1 alleles, only DRB1*0901 approaches linkage equilibrium with DPB1. Some of the DR-DQ haplotypes in linkage disequilibrium with DPB1 (e.g., DRB1*1602-DQB1*0301) are unique to Amerindians and, therefore, presumably are more recent. The Cayapa DR-DP LD data are thus consistent with a recent origin and with a selective process that Trachtenberg et al. postulated to have occurred recently, after the separation of North American and South American Amerindians (~10,000 years ago, or ~500 generations ago). Moreover, any disequilibrium due to genetic drift (i.e., neutral) would rapidly disappear; thus, selection favoring the particular DR-DP haplotypes observed in high positive disequilibrium appears to have occurred within the past 10,000 years (Trachtenberg et al., 1995).

Quechua-speaking population groups of Peru live in the Andean highlands. The official language of the ancient Inca Empire was Quechua, but the Quechua culture originated in central Peru at least 1000 years before the rise of the Inca Empire in the early 1400s. Currently, 13 million people in Peru, Ecuador, Bolivia, Brazil, Argentina, Colombia, and Chile speak Quechua. In the study by Tsuneto et al. (2003), seven South American Amerindian populations were compared for HLA class II polymorphisms, including the Quechua (n=44), all sampled from the mountainous regions of Peru. In fact, HLA alleles and haplotypes may be excellent markers to understand the genetic relationships between populations and to unravel when and where these alleles emerged. The study of the HLA variability of Native American populations has revealed several alleles specific to one or more of the Latin American indigenous populations. The most frequent alleles and haplotypes are also common in other Amerindian populations, with each HLA-DRB1 allele typically found in combination with just one DQA1-DQB1 haplotype, most likely as a result of random genetic drift and reduced gene flow from non-Amerindians.
Amerindian populations that did not receive gene flow from populations originating in other continents lack alleles of lineages DRB1*01, DRB1*03, DRB1*07, DRB1*10, DRB1*11, DRB1*12, DRB1*13, and DRB1*15. Among the populations analyzed, the Quechua have the highest admixture rate, with 22.7% non-Amerindian DRB1 alleles. Like all Amerindian populations, the Quechua have significantly reduced allelic and haplotypic HLA diversity. However, diversity was highest in the Quechua, who received the most intense gene flow from non-Amerindians. In agreement with the results of previous studies of Amerindian populations, the common alleles of locus HLA-DRB1 belong to the DRB1*04, DRB1*08, DRB1*09, DRB1*14, and DRB1*16 lineages/groups (table 2.3.3). DRB1*0411 is the most widespread allele of lineage DRB1*04 (mean frequency of 25%); however, it does not occur in the Quechua. Conversely, the next most common DRB1*04 allele, DRB1*0407, which is present in three of the seven populations investigated, occurs with the highest frequency in the Quechua (14.8%). As regards the DRB1*08 lineage, DRB1*0804 is absent in the Quechua. As for locus HLA-DQA1, the sum of the frequencies of the alleles DQA1*03, DQA1*0401, and DQA1*0501 exceeds 90% in all seven population samples. In particular, DQA1*03 is the most frequent in the Quechua (36.4%). At the HLA-DQB1 locus, alleles DQB1*0301, DQB1*0302, and DQB1*0402 were found in all populations, with DQB1*0302 the most frequent in the Quechua (31.8%). Allele DRB1*090102 is seen among Amerindians, with the highest observed frequency of 20% in the Cayapa, but is also common in the Quechua (18.2%). DRB1*090102 also occurs in North American Native populations, including the Tsimshian, Na-Dene, and Eskimo, and among Eastern Asians, being one marker of the Asian ancestry of Native American populations. As expected, each HLA-DRB1 allele is found in just one DR-DQ haplotype. Haplotypes DRB1*1602-DQA1*0501-DQB1*0301 and DRB1*14-DQA1*0501-DQB1*0301 may be taken as ethnic markers, because they are unique to the indigenous populations of the Americas (Tsuneto et al., 2003).

Table 2.3.3 HLA-DRB1 allele frequencies (Tsuneto et al., 2003).


2.3.3.2 HLA analysis of South American populations: Venezuela

The area west of Lake Maracaibo, in the northwestern corner of South America on the border between Colombia and Venezuela, is inhabited by two Amerindian populations: the Yucpa in the northern section of the Sierra de Perija and the Bari in the southern section. The Yucpa are one of the seven Venezuelan tribes classified as “Carib” because they speak a language of Cariban affiliation. It has been shown that the Carib tribes of northern South America have a very striking genetic homogeneity compared to tribes belonging to other linguistic phyla. At the time of the conquest the Carib occupied the eastern half of Venezuela and much of the southern territory of the country. They had formed a single linguistic community of preagricultural nomads until approximately 4500 years ago, when they started to separate, migrating possibly in a south-to-north-northwest direction and conquering territories previously occupied by tribes of different affiliation. They developed a Neo-Indian type of culture, based on slash-and-burn agriculture. The Yucpa originated in lower Amazonia, migrating gradually to their present habitat over a period of more than 4000 years. At present they number approximately 4000 individuals living in over 40 settlements. In the study by Layrisse et al. (2001), 73 individuals were selected, including 55 members of 20 family groups and 18 subjects with no relatives, providing a total of 43 or 44 unrelated Yucpa for haplotype frequency estimation. Extended HLA typing was performed on a sequencing-based platform. The Yucpa were thus the first Amerindian tribe of the Cariban stock tested for the HLA system at the allelic level. The Yucpa population shows the expected low number of class I variants reported in other South American tribes. In particular, the authors highlighted the extremely high gene frequencies of B*3905 (0.31) and B*3909 (0.44), in complete linkage disequilibrium with C*0702 (0.75), which is characteristic of the Yucpa population.
Only six DRB1 alleles were found in the sample tested: DRB1*0403, 0407, 0411, 0807, 1402, and 1602, showing tight linkage disequilibrium with DQA1 and DQB1 alleles. The DRB1 allele and DR/DQ haplotype frequencies demonstrate the remarkably high frequency of DR*04 alleles (f = 0.75), the highest ever reported in any human population. The study by Layrisse et al. showed the presence of ancestral and novel class I and class II alleles, which have been considered characteristic of Amerindian populations. As in other isolated indigenous populations, the number of alleles at each locus was found to be limited to 5 or 6, with 1 or 2 at each locus showing frequencies above 30%. The Yucpa have retained 7 of the hypothetical 20 ancestral HLA class I alleles present among the founder population migrating into this continent through Beringia and carry 9 of the novel alleles which have been postulated to have originated in situ, subsequent to the colonization of the Americas. Three of these new alleles, A*0204, B*3905 and B*3909, have frequencies above 30% among the Yucpa and are included in the most frequent extended haplotypes present in this population. Significant linkage


disequilibria are observed both between new alleles (HLA-A*0204-B*3905; C*1503-B*52012; B*52012-DRB1*0807) and between old and new alleles (C*0102-B*1522; B*1522-DRB1*0407) (Layrisse et al., 2001).


2.4 mtDNA and population genetics 2.4 Contribution of mtDNA to population genetics

Historically, mitochondrial DNA has been a tool of choice in the study of population genetics. Human mtDNA is characterized by a high copy number, a much greater evolutionary rate than that of the average nuclear gene and a matrilineal inheritance (Giles et al., 1980) without any paternal contribution or recombination between the two parental lineages (Hagelberg et al., 1999; Macaulay et al., 1999a; Sutovsky and Schatten, 2000; Elson et al., 2001; Rantanen et al., 2001). As a result, mtDNA lineages diverge only through the sequential accumulation of new mutations along maternal lines and, over the course of time, this process of molecular divergence gave rise to monophyletic units called haplogroups. A haplogroup can be defined as a group of phylogenetically related mtDNAs that share mutations derived from a common female ancestor (or MRCA) (Torroni et al., 1993). Because the process of molecular differentiation is relatively fast and occurred mainly during and after the process of human dispersal into different parts of the world, subsets of mtDNA variation usually tend to be restricted to particular geographic areas and populations (Torroni et al., 2006). Thanks to these peculiar features, mitochondrial DNA is a very informative and widely employed instrument to trace the history and migration of the female ancestors who transmitted the mtDNA molecule through the generations. The study of the geographical distribution of the clades (haplogroups) within a phylogeny is called phylogeography (Avise, 2000). This approach requires the combination of three elements: a phylogenetic tree, the geographic distribution of lineages on the tree, and the time depth of lineages, especially those that are restricted to a particular area (Soares et al., 2010). Sequence variation of DNAs sampled in the present can be used to reconstruct the phylogenetic tree that displays the inferred genealogical relationships between individual sequences. The timescale is provided by converting lineage diversity to age estimates by means of a molecular clock.

2.4.1 The molecular clock

The molecular clock is an extremely useful method for estimating evolutionary timescales from changes in molecular traits. It is based on the assumption that DNA and protein sequences evolve at a rate that is relatively constant over time and among different organisms. A direct consequence of this constancy is that the genetic difference between any two organisms is proportional to the time since they last shared a common ancestor (Ho, 2008). Calibrating a molecular clock requires the comparison between a phylogenetic tree (including sequences from different individuals belonging to the same or different species) and an outgroup with respect to which the divergence time is exactly known.
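The proportionality between genetic difference and time can be illustrated with a minimal sketch of a strict (linear) clock. The function name, its parameters and the numbers in the usage note are illustrative assumptions, not values taken from the studies cited here; the sketch also ignores multiple substitutions at the same site and the purifying-selection correction used by time-dependent clocks.

```python
def divergence_time_years(differences: int, sites: int,
                          rate_per_site_per_year: float) -> float:
    """Estimate the time since two sequences shared a common ancestor.

    Strict-clock sketch: assumes one constant substitution rate and no
    multiple hits at the same site (both are simplifying assumptions).
    """
    if sites <= 0 or rate_per_site_per_year <= 0:
        raise ValueError("sites and rate must be positive")
    # Observed pairwise divergence, in substitutions per site.
    divergence = differences / sites
    # Both lineages accumulate substitutions independently after the split,
    # so the pairwise divergence grows at twice the per-lineage rate.
    return divergence / (2.0 * rate_per_site_per_year)
```

For example, with the hypothetical inputs `divergence_time_years(2, 1000, 1e-8)`, two differences over 1000 compared sites at a rate of 1e-8 substitutions per site per year correspond to about 100,000 years since the common ancestor.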


Since timing is decisive for the interpretation of the demographic history of populations, a reliable relation between sequence diversity and the time scale is needed. Uniparental markers, in particular mtDNA, provide phylogenies that can be better time-calibrated than other systems. Over the years, a wide range of molecular clock models and methods, implemented in various statistical phylogenetic settings, have been proposed (Ho and Duchêne, 2014), but the most commonly used for humans is the time-dependent clock that corrects for purifying selection (Soares et al., 2009). Recently another mutation rate, based on the divergence between ancient and modern human mitogenomes, has been introduced; it consists of a linear clock obtained using 66 ancient dated mtDNAs as tip calibration points (Posth et al., 2016).

2.4.2 mtDNA nomenclature

In the early 1990s, the first large-scale population study was performed on Native Americans and it highlighted the presence of four different clusters that included all the samples (Torroni et al., 1993). In this context the universally accepted mtDNA nomenclature was initiated with the definition of the first four branches in the human phylogenetic tree, named alphabetically as A, B, C, and D. Shortly afterward these analyses were also applied to other continental populations, allowing the identification of haplogroups E, F and G in Asia and H, I, J, K, T, U, W, X and V in Europe (figure 2.4.1) (Torroni et al., 1994a; Torroni et al., 1994b; Torroni et al., 1996). To encourage all laboratories producing mtDNA data to follow the same nomenclature, in 1998 the cladistic rules for the hierarchical ordering of haplogroups and sub-haplogroups were established (Richards et al., 1998). This system involves the use of a capital letter to define a main branch of the mtDNA phylogenetic tree, followed by a number to identify a second-level lineage (a sub-clade). Following a hierarchical notation, the next layer of clusters is denoted by a lowercase letter as a suffix and the successive levels by alternating numbers and lowercase letters (e.g., B2b5a1). Since then, the naming of new haplogroups has naturally evolved in a self-organizing way, ushered forward by those who produced novel data, following the rules of the nomenclature system and respecting the published record. Haplotypes that cannot be assigned to a specific known sub-lineage (either because such a lineage has not been defined yet, or because insufficient data were produced for that sample) are placed under the last assignable node in the tree and are represented with a star at the end of their name (e.g., A2*).
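The alternating letter/number notation described above can be unpacked programmatically. The following is a minimal sketch, assuming names that follow the simple alternating pattern (e.g., B2b5a1); the function name `haplogroup_levels` is hypothetical, and composite names containing apostrophes or other special characters are deliberately rejected rather than handled.

```python
import re

def haplogroup_levels(name: str) -> list[str]:
    """Return the nested clades implied by a haplogroup name.

    Simplified sketch: a trailing star (unassigned within the node) is
    dropped, and only capital-letter / number / lowercase-letter tokens
    are accepted.
    """
    core = name.rstrip("*")
    # Split into runs of capital letters, digits, or lowercase letters.
    tokens = re.findall(r"[A-Z]+|\d+|[a-z]+", core)
    if not tokens or "".join(tokens) != core or not tokens[0].isupper():
        raise ValueError(f"unsupported haplogroup name: {name!r}")
    levels = []
    prefix = ""
    for token in tokens:
        prefix += token          # each token adds one hierarchical level
        levels.append(prefix)
    return levels
```

Under these assumptions, `haplogroup_levels("B2b5a1")` yields the chain B, B2, B2b, B2b5, B2b5a, B2b5a1, while `haplogroup_levels("A2*")` yields A and A2, the last assignable node.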



Figure 2.4.1 Simplified mtDNA lineages and their geographical distribution (MITOMAP 2013).

The increasing number of complete mtDNA sequences has greatly improved the capability to define the basal branches of the human phylogenetic tree. On the other hand, the large amount of data may create conflicting information from different papers concerning the same haplogroup. In order to avoid this problem van Oven and Kayser drew an updated phylogeny (http://www.Phylotree.org) by comparing the maximum number of available mtDNA sequences of each haplogroup (van Oven and Kayser, 2009). Since mitogenomes are constantly published, this online tree offers regular updates that allow checking for the latest haplogroup subdivisions and naming, getting the full up-to-date picture of the worldwide human mtDNA tree.

2.4.3 mtDNA reference sequences

Assigning a sequence to a haplogroup requires prior comparison with a reference sequence. The first mitochondrial reference was the CRS (Cambridge reference sequence), corresponding to the first human mitogenome sequenced (Anderson et al., 1981). In 1999 the original mtDNA sample used to obtain the CRS was sequenced again (Andrews et al., 1999) and


this resequencing revealed some discrepancies with respect to the sequence obtained by Anderson et al. These differences include one extra base pair in position 3107, and incorrect assignments of single base pairs due both to sequencing artifacts and to the use of bovine samples to cover the regions technically difficult to sequence in the human sample. This 'new version' of the CRS was called the revised Cambridge reference sequence (rCRS). The rCRS belongs to the European haplogroup H2a2a1 (van Oven and Kayser, 2009) and is deposited in the GenBank NCBI database under the accession number NC_012920. In 2012, Behar and colleagues (Behar et al., 2012) proposed that the rCRS should be replaced by a new reference representing the ancestral genome from which all present mtDNAs descend, placing the root between haplogroups L0 and L1'2'3'4'5'6. This reference was named RSRS (Reconstructed Sapiens Reference Sequence). The RSRS was reconstructed using ~19,000 mitogenomes and includes three spacers (positions 523, 524 and 3107) to preserve the historical CRS position numbering (Table 2.1). The introduction of a new reference triggered opposing opinions. The main criticisms came from forensic and medical genetics, in which a sample is commonly represented as a list of variants in comparison to the reference, and not as a sequence (i.e. FASTA format) (Bandelt et al., 2014). Given that the two references belong to phylogenetically distant haplogroups (RSRS to haplogroup L and rCRS to H2a2a1), the number of differences between the two reference sequences is significant (table 2.4.1), thus creating difficulties in converting a list of mutations scored against one reference into a list scored against the other.

Table 2.4.1 List of SNP differences between the RSRS and the rCRS. Coding-region mutations (nps 577-16023) are shown in bold; control-region mutations (nps 16024-576) are in italic (http://www.Phylotree.org)

Position  RSRS  rCRS    Position  RSRS  rCRS
73        G     A       8701      G     A
146       C     T       8860      G     A
152       C     T       9540      C     T
195       C     T       10398     G     A
247       A     G       10664     T     C
263       G     A       10688     A     G
523       -     A       10810     C     T
524       -     C       10873     C     T
750       G     A       10915     C     T
769       A     G       11719     A     G
825       A     T       11914     A     G
1018      A     G       12705     T     C
1438      G     A       13105     G     A
2706      G     A       13276     G     A
2758      A     G       13506     T     C
2885      C     T       13650     T     C
3594      T     C       14766     T     C
4104      G     A       15326     G     A
4312      T     C       16129     A     G
4769      G     A       16187     T     C
7028      T     C       16189     C     T
7146      G     A       16223     T     C
7256      T     C       16230     G     A
7521      A     G       16278     T     C
8468      T     C       16311     C     T
8655      T     C       16519     C     T
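The conversion difficulty mentioned for variant lists can be made concrete with a minimal sketch that re-expresses a list of substitutions scored against the rCRS as a list scored against the RSRS. It uses only a handful of rows from Table 2.4.1 and ignores the indel positions (523, 524), so it is an illustration of the logic rather than a complete converter; the names `REFERENCE_DIFFS` and `rcrs_to_rsrs`, and the sample variant at position 16126, are assumptions introduced for the example.

```python
# Subset of Table 2.4.1 (substitutions only): position -> (RSRS base, rCRS base).
REFERENCE_DIFFS = {
    73: ("G", "A"), 263: ("G", "A"), 750: ("G", "A"), 1438: ("G", "A"),
    4769: ("G", "A"), 7028: ("T", "C"), 8860: ("G", "A"), 15326: ("G", "A"),
}

def rcrs_to_rsrs(variants: dict[int, str]) -> dict[int, str]:
    """Convert {position: sample base} scored against rCRS to RSRS scoring."""
    converted = {}
    for pos in sorted(set(variants) | set(REFERENCE_DIFFS)):
        if pos in REFERENCE_DIFFS:
            rsrs_base, rcrs_base = REFERENCE_DIFFS[pos]
            # No variant listed against rCRS means the sample carries the rCRS base.
            sample_base = variants.get(pos, rcrs_base)
            if sample_base != rsrs_base:
                converted[pos] = sample_base
        else:
            # Assumption: the two references agree here, so the variant carries over.
            converted[pos] = variants[pos]
    return converted
```

A hypothetical sample listed against the rCRS as `{73: "G", 263: "G", 16126: "C"}` loses positions 73 and 263 (it matches the RSRS there) but gains entries at 750, 1438, 4769, 7028, 8860 and 15326, where it carries the rCRS base. A three-variant list thus becomes a seven-variant list even with this small subset of the table.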

2.4.4 mtDNA worldwide phylogeny

The human mtDNA tree (figure 2.4.2) splits at its core layers into branches that carry exclusively African sequences belonging to the L haplogroup. This clade has been subdivided into 7 main branches (L0-L6) (Kivisild et al., 2004; Behar et al., 2008) and only one of these, haplogroup L3, is shared between Africans and the rest of the world (Torroni et al., 2006). Thus, mtDNAs outside Africa fall into the two main clusters M and N, branching out from the root of haplogroup L3 (which has also given rise to some sub-clades specific to African populations). The number of extant non-African founder haplogroups can however be extended to include a third member, haplogroup R, which is a daughter-clade of N. In the commonly used haplogroup nomenclature (Richards and Macaulay, 2001), branches within M include C, D, E, G, Q, and Z (Friedlaender et al., 2005; Kong et al., 2006; Chandrasekar et al., 2009; Derenko et al., 2010), while the subdivision within N includes A, I, S, W, X, and Y (Reidla et al., 2003; Kivisild et al., 2006; Kong et al., 2006; Derenko et al., 2007; Perego et al., 2009; Palanichamy et al., 2010) and the super-haplogroup R, which also contains B, F, J, P, R0, T, and U (Torroni et al., 1996; Macaulay et al., 1999b; Palanichamy et al., 2004; Friedlaender et al., 2005; Kong et al., 2006; Achilli et al., 2008; Cerný et al., 2011). The majority of the sub-lineages of haplogroup M are found primarily in South and East Asia, while N and its sub-clade R encompass most of the mtDNAs of West Asian and European populations. Haplogroups M and N are both rare in sub-Saharan Africa, where the mtDNAs belong almost exclusively to the macro-haplogroup L. However, the finding of distinct M variants in eastern Africa led to the identification of this region as the source of a migration out of Africa involving the ancestors of current Asian populations (Quintana-Murci et al., 1999; Jobling et al., 2004).
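The nesting of haplogroups described in this paragraph can be represented as a simple parent map. The sketch below is a deliberate simplification: it attaches each listed haplogroup directly to M, N or R, whereas in the full tree several of them sit under intermediate nodes; the names `PARENT` and `lineage` are hypothetical.

```python
# Parent map built from the paragraph above (intermediate nodes omitted).
PARENT = {"M": "L3", "N": "L3", "R": "N"}
for child in ["C", "D", "E", "G", "Q", "Z"]:
    PARENT[child] = "M"
for child in ["A", "I", "S", "W", "X", "Y"]:
    PARENT[child] = "N"
for child in ["B", "F", "J", "P", "R0", "T", "U"]:
    PARENT[child] = "R"

def lineage(haplogroup: str) -> list[str]:
    """Return the path from L3 (or the topmost known node) down to the haplogroup."""
    path = [haplogroup]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]
```

With this map, `lineage("B")` returns `["L3", "N", "R", "B"]` and `lineage("D")` returns `["L3", "M", "D"]`, making explicit that every non-African haplogroup listed here traces back to L3 through either M or N.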



Figure 2.4.2 Schematic representation of the worldwide phylogeny of human mtDNA. The tree was obtained by combining six trees constructed separately, with branch lengths estimated with maximum likelihood and the time-dependent molecular clock. Number and presence of clades is dependent on availability of data and not on worldwide frequencies. Ages are expressed in kilo years ago (Kya) (Soares et al., 2009).



2.4.4.1 The origin of modern humans

In 1871, Charles Darwin postulated in his 'The Descent of Man' that humans originated in Africa. However, this hypothesis was considered speculative until the 1980s, when it was substantiated by a number of studies using present-day mitochondrial DNA, as well as evidence based on the physical anthropology of archaic specimens. The first reconstructions of human genetic ancestry were based on data from mtDNA, MSY, and a small number of nuclear loci, and much of what they revealed remains central to our understanding today (Groucutt et al., 2015). The model envisioned that a small, relatively isolated population of early humans evolved into modern Homo sapiens and succeeded in spreading out of Africa towards Asia and then Europe and elsewhere, replacing all of the Neanderthals and other late archaic Homo sapiens (Stringer and Andrews, 1988). Although a common recent African origin for all modern humans is not the only hypothesis that has been put forward, it currently represents the near-consensus position held by the scientific community (Liu et al., 2006). Concerning uniparental markers, the analysis of mtDNA from African populations suggested that East Africa played an important role in the origin and diversification of modern humans (Gonder et al., 2007). The maternal lineages of all living humans coalesce in a most recent common ancestor (MRCA), also known as "mitochondrial Eve" (Cann et al., 1987), that is typically represented as the root node of the human phylogenetic tree, splitting into macro-haplogroups L0 and L1-L6 (figure 2.4.3). Phylogenetic studies describing the global mitogenome diversity in humans confirmed the single origin of our most recent (female) common ancestor, who lived in East Africa around 200 thousand years ago (Kya), with no findings of Neanderthal or other mitogenomes in modern humans (Ingman et al., 2000; Mishmar et al., 2003; Gonder et al., 2007; Underhill and Kivisild, 2007; Soares et al., 2009).



Figure 2.4.3 Schematic phylogenetic tree of the African L sub-lineages. The L0 and L1-L5 branches are highlighted in light green and tan, respectively. The branches are made up of haplogroups L0–L6 which, in turn, are divided into clades. Khoisan and non-Khoisan clades are shown in blue and purple, respectively. Clades involved in the African exodus are shown in pink. A time scale is given on the left. Approximate time periods for the beginning of African last stone age (LSA) modernization, appearance of African LSA sites, and solidization of LSA throughout Africa are shown by increasing color densities (Behar et al., 2008).



This picture of human origins and dispersal, based on the phylogeographic analysis of human mtDNA, is certainly not the whole story, which has to be based on the full range of human genome markers and as much ancient DNA evidence as possible. Indeed, studies of genome-wide variation have shown that modern humans underwent some interbreeding with earlier human offshoots from Africa. The contribution of Neanderthals and Denisovans is overall rather small in modern subjects (1-2%) (Green et al., 2010; Reich et al., 2010; Meyer et al., 2012), but certain archaic Homo genes (e.g. immune genes) played a major role in the fitness of the expanding populations (Abi-Rached et al., 2011; Mendez et al., 2012; Deschamps et al., 2016). Nevertheless, the “out-of-Africa” model (see below), first proposed three decades ago on the basis of mtDNA phylogeography, remains the consensus view of modern human origins, broadly supported also by paleontological and archeological as well as genome-wide evidence (Mellars et al., 2013).

2.4.4.2 The 'Out of Africa' exit

The reconstruction of human genetic ancestry and the dispersal of modern humans out of Africa are still a matter of ongoing debate. Evidence on the timing and routes of dispersal of Homo sapiens out of Africa comes from fossil, genetic and archaeological data and is consistent with several models (Groucutt et al., 2015). The oldest modern human remains outside Africa were identified at the Skhul and Qafzeh caves (Grün and Stringer, 1991) and in the United Arab Emirates (Jebel Faya site) (Armitage et al., 2011). This archaeological and fossil evidence suggests that humans could have been present outside the African continent 90-120 Kya and demonstrates their presence in eastern Arabia during the last interglacial. In support of these findings, climatic models demonstrated the presence of migration paths out of Africa through resource-rich corridors established during three time windows: one 130-118 Kya (not associated with human migration out of Africa) (Timmermann and Friedrich, 2016), one 106-94 Kya and another 89-73 Kya (deMenocal and Stringer, 2016). If the hypothesis of an early exodus ~100 Kya is true, the presence of fossils and artifacts outside of Africa could be attributed to an earlier, localized and short-lived dispersal, all members of which became extinct and thus left no descendants in the modern human population (Mellars, 2006a; Richards et al., 2006). This hypothesis is supported by evidence from mtDNA studies. All present-day mtDNAs in non-Africans fall within the two branches, M and N, of haplogroup L3 (Behar et al., 2012). This has led to the argument that the group leaving Africa must have been single and very small, given that only the L3 type survived and all other possible founding branches, if any, were lost by genetic drift. The age of L3 is an upper boundary on the exit from Africa and


places the maximum at ~70 Kya, virtually ruling out a successful exit before the Toba volcanic supereruption in Sumatra 74 Kya (Soares et al., 2012) (Figure 2.4). On the other hand, haplogroups M and N most likely originated after the exit from Africa, thus representing a lower bound for this dating. The ages of these two haplogroups, between 50 and 65 Kya, are quite close to that of their L3 ancestor in Africa, suggesting that the expansion of L3 in Eastern Africa and its exit from the continent could be part of the same demographic event (figure 2.4.4) (Soares et al., 2012).

Figure 2.4.4 Schematic representation of major dispersals from Eastern Africa during the Pleistocene (Soares et al., 2012).

Two alternative scenarios have been proposed to explain the presence of the two sub-branches of mtDNA haplogroup L3 in both Europe and Asia. The first postulates a 'Levantine route' from northeast Africa to the Levant across the Sinai Peninsula ~45 Kya (Stringer and Andrews, 1988; Prugnolle et al., 2005b). However, the route along the Levantine corridor does not explain why adjacent Europe was settled thousands of years later than distant Australia (Forster and Matsumura, 2005). This model has therefore been replaced by a 'southern route model', according to which dispersal probably started ~70 Kya from the Horn of Africa to the Persian Gulf and continued along the tropical coast of the Indian Ocean onward to Australasia (figure 2.4.5). This second scenario is strongly supported by palaeoenvironmental evidence, confirming that a northern migration would have been impossible during the glacial period extending from ~70 to 50 Kya (Forster and Matsumura, 2005; Macaulay et al., 2005; Mellars, 2006b; Torroni et al., 2006). Recently, three new studies based on high-coverage whole genome analyses on 270 populations across the globe (Malaspinas et al., 2016; Mallick et


al., 2016; Pagani et al., 2016) provided a high-resolution portrait of human genetic diversity, allowing new inferences to refine and extend current models of historical human migration out of Africa (mainly based on mtDNA data) (Tucci and Akey, 2016). They came to different conclusions about the out-of-Africa dispersal. Pagani and colleagues identified a genetic signature in the genomes of present-day Papuans that suggests human presence outside Africa before the main out-of-Africa split time that involved other Eurasians (~75 Kya) (Pagani et al., 2016), in line with a multiple dispersal model. The other two studies, instead, support the scenario that all contemporary non-Africans branched off from a single ancestral population (Malaspinas et al., 2016; Mallick et al., 2016).

Figure 2.4.5 Hypothetical routes along the Indian Ocean coastline that could have been taken by early humans from Africa. MtDNA data from Malaysians and aboriginal Andaman islanders suggest that human settlements appeared along the Indian Ocean coastline 60 Kya (Forster and Matsumura, 2005).

In conclusion, the debate about the first exit out of Africa is still open. Modern humans could have crossed the Bab al-Mandab more than 100 Kya, a time that predates the upper bound indicated by genetic studies, but from a genetic point of view, in particular from the mtDNA perspective, a single exit about 60-70 Kya is the most plausible scenario to explain the first migratory event that led to the peopling of the planet.

2.4.4.3 Human colonization of the world

2.4.4.3.1 The peopling of Australasia

Soon after the rapid expansion along the coastlines of southern Asia, south-eastern Asia, and Indonesia, the wave of migration reached New Guinea and


Australia (figure 2.4.5), at a time when the lower sea levels joined the two islands into one land mass, necessitating sea travel only across narrow straits (Hudjashov et al., 2007). The deep and specific phylogenetic lineages now found within this former landmass indicate a small founding population size and subsequent isolation of Australia and New Guinea from the rest of the world. Approximately 3.5 Kya, an expansion of Austronesian-speakers arrived in Near Oceania and the descendants of these colonizers spread to the far corners of the Pacific, colonizing Remote Oceania (Duggan et al., 2014; Duggan and Stoneking, 2014). These founder events and the lack of contact could underlie the divergent morphological development seen in the Australian human fossil record and could also help to explain the remarkably restricted range of Pleistocene Australian lithic industries and bone artifacts compared with contemporary cultures elsewhere in the world (Mellars, 2006c). The richest basal variation in the founder haplogroups M, N and R is found along the southern stretch of Asia, particularly in the Indian subcontinent (Chaubey et al., 2008; Sun et al., 2006; Palanichamy et al., 2004), and a similarly high basal diversification is present in Southeast Asia (Hill et al., 2007; Kong et al., 2006; Macaulay et al., 2005). These data suggest a rapid colonization along the southern coast of Asia, reaching Sahul ~60 Kya. The expansion northwards to fill the heartland of the continent occurred only later, ~45 Kya, when a combination of technology and climatic conditions enabled the exploration of the interior of Eurasia. One of the marginal extensions eventually led to the peopling of Europe (Mellars, 2006b).

2.4.4.3.2 The peopling of Europe

The first peopling of Europe by modern humans occurred about 45 Kya (Gamble et al., 2004; Mellars, 2006b; Mellars, 2006d). The genetic pool of present-day Europeans is suggested to derive from the admixture of three ancestral populations: west European Paleolithic hunter-gatherers, ancient north Eurasians (who were closely related to a 24 Ky-old skeleton; Raghavan et al., 2014), and early European farmers (Lazaridis et al., 2014). However, an ongoing debate concerns the relative amount of genetic input into modern Europeans from Paleolithic versus Neolithic waves of settlement. Europeans have a high level of haplogroup diversity within haplogroups N and R (H, HV, N1, J-T, U, I, W, and X) but almost entirely lack haplogroup M (Underhill and Kivisild, 2007; Soares et al., 2010). Members of mtDNA haplogroup U5 probably marked the first Upper Paleolithic entry into Europe from the Near East, while populations bearing U6 (and M1) entered North Africa (Olivieri et al., 2006; Pennarun et al., 2012). Paleolithic events also include the resettlement from southern refugia after the Last Glacial Maximum (LGM) (~18 Kya), marked by lineages that originated in these refugia. The expansion from the Franco-Cantabrian refugium left signatures in haplogroups V (Torroni et al., 1998;


Torroni et al., 2001), H1, H3 (Achilli et al., 2004; Pereira et al., 2005), H5 and U5b1b (Soares et al., 2010; Tambets et al., 2004), while U5b3 marks a dispersal event from the Italian Peninsula (Pala et al., 2009), U4 and U5a from the East European Plain (Malyarchuk et al., 2008; Malyarchuk et al., 2010) and J, T, I and W from the Near Eastern refugia (Pala et al., 2012; Olivieri et al., 2013). The advent of agriculture and pastoralism usually distinguishes Neolithic from earlier Paleolithic or Mesolithic hunting-gathering cultures. The Neolithic period of interest for Europe began around 10 Kya, when farmers moving from the Near East rapidly (~5 Ky) reached most of the European continent. Most of the studies carried out on ancient DNA have led to the prevailing conclusion that Paleolithic and Mesolithic hunter-gatherer European populations differed genetically from early Neolithic farmers, in turn implying that there was a wide-scale replacement across Europe from the Near East in the early Neolithic, with limited assimilation of native Europeans (Pinhasi et al., 2012; Lazaridis et al., 2014; Omrak et al., 2016; Posth et al., 2016). However, according to a recent study on ancient genomes, in the time frame from ~37 to ~14 Kya Paleolithic Europeans derived from a single ancestral population but, starting from 14 Kya (the Bølling-Allerød interstadial, the first significant warming period after the LGM), there was a migration wave from the Near East contributing to the European genetic pool (Fu et al., 2016). The notion of a genetic input from the Near East into and across Europe in the late Pleistocene, prior to the arrival of the Early Neolithic material culture in Greece ~8.5 Kya (Manning et al., 2014), is a novelty in human paleogenomics.
Actually, a Late Glacial/Postglacial recolonization of Europe from the Near East before the migration waves associated with the onset of farming had already been hypothesized in mtDNA studies (Pala et al., 2012; Olivieri et al., 2013; Gandini et al., 2016; Richards et al., 2016). In addition, several haplogroups often assumed to have dispersed from Anatolia only with the advent of the Neolithic, similarly to all non-U8, non-U5 and non-U2 mtDNAs, were already present in Mesolithic Mediterranean Europe, particularly in Italy (Pereira et al., 2017), a suggestion in line with the recent detection of two K1c mitogenomes in Mesolithic Greece (Hofmanová et al., 2016). These data leave open questions concerning the populations involved in the genetic contributions to Paleolithic and Neolithic Europeans.

2.4.4.3.3 The peopling of the Americas

America, the double continent, was the last to be colonized by modern humans, and the study of its peopling represents one of the first and most significant examples of interdisciplinary interaction between archaeology, linguistics and genetics (Greenberg et al., 1986). In recent decades phylogenetic studies have been particularly useful in shedding light on America's first colonizers, particularly regarding the timing of their arrival and the routes they took (Schurr and Sherry, 2004; O'Rourke and Raff, 2010). Many major contributions have come


from mtDNA studies, mainly carried out in modern populations, but also from ancient human remains (Kemp et al., 2007; Gilbert et al., 2008; Raff et al., 2010; Tackney et al., 2015; Fehren-Schmitz et al., 2015; Llamas et al., 2016). The study of mitogenome variability across the Americas has increased the overall number of maternal founding lineages from just four, initially named A, B, C and D (Schurr et al., 1990; Torroni et al., 1992; Torroni et al., 1993), to 16 (figure 2.4.6). Among these, eight (A2, B2, C1b, C1c, C1d, C1d1, D1 and D4h3a) are called pan-American haplogroups, as they are distributed across the double continent, while the others are less frequent and generally show a distribution restricted to specific geographic areas, i.e. North America (A2a, A2b, C4c, D2a, D3, D4e1, X2a and X2g) (Perego et al., 2010). Additional sub-lineages have evolved from the pan-American haplogroups and exhibit limited geographic distribution ranges (Bodner et al., 2012; de Saint Pierre et al., 2012; Achilli et al., 2013). Overall, these studies, together with archaeological and climatic evidence, support the scenario that the migratory event that led to the peopling of the Americas occurred approximately 20 Kya, during the Last Glacial Maximum (LGM). The LGM refers to the time of maximum extent of the ice sheets during the last glacial period, between 13 and 30 Kya, with a peak around 18-20 Kya (Clark et al., 2009). During the peak of the last Ice Age, the sea level was considerably lower than today, and Asia and North America were connected by a massive exposed land bridge known as Beringia, now submerged. One very contentious issue is whether the settlement of the Americas occurred by means of a single or multiple streams of migration.
Many analyses of Native American genetic diversity suggest a single migratory wave from an ancestral population that lived in Beringia during the LGM, probably coming from South-Central Siberia (Zegura et al., 2004; Schroeder et al., 2007; Tamm et al., 2007; Wang et al., 2007; Kemp and Schurr, 2010; Hoffecker et al., 2014; Hoffecker et al., 2016). However, evidence from both mtDNA and genome-wide analyses has identified at least one or two additional source populations, leading to the so-called "tripartite migration model". This model, originally proposed in the 1980s on the basis of anthropometric and linguistic data, postulates that the Americas were settled through three separate population movements whose identity was expressed in linguistic terms as Amerind, Na-Dene, and Eskimo-Aleut speakers (Williams et al., 1985; Greenberg et al., 1986; Greenberg 1987; Reich et al., 2012; Achilli et al., 2013; Raghavan et al., 2014; Raghavan et al., 2015).


Figure 2.4.6 MtDNA tree encompassing the roots of all known founding Native American haplogroups. The distinguishing mutational motifs for the 16 known Native American haplogroups are reported on the branches. Mutations in the control region are in red, while mutations in the coding region are listed in black; they are transitions unless a base is explicitly indicated. The prefix @ designates reversions, while suffixes indicate transversions (to A, G, C, or T) and indels (+, d). Recurrent mutations within the tree are underlined (Perego et al., 2010; Kumar et al., 2011).

Regardless of how many migration events occurred, the reduced genetic diversity found in the Americas is a sign of a significant founder effect, suggesting


that the number of Native American founders was limited. The ancestral Beringian populations probably retreated into refugia during the Ice Age, where new genetic variants evolved through mutation and genetic drift. Here they remained isolated for ~5 Ky, before spreading rapidly throughout the Americas (Reich et al., 2012; Chatters et al., 2014; Raff and Bolnick, 2014; Rasmussen et al., 2015; Raghavan et al., 2015; Llamas et al., 2016). The southward expansion from Beringia to the extreme southern tip of South America, covering a latitude gap of more than 100° (from about 65° North to 54° South) and a distance of more than 15,000 km, possibly occurred in a time span of less than 2 Ky (Kumar et al., 2011; Bodner et al., 2012). It likely followed two entry routes: the Pacific coastal route (deglaciated more than 17 Kya), which probably played the major role in the peopling of the double continent, and the interior ice-free corridor between the Laurentide and Cordilleran ice sheets (opened ~13.5 Kya) (figure 2.4.7), which also had a significant impact, at least on the colonization of North America (Fix, 2005; Fagundes et al., 2008; Perego et al., 2009; Hooshiar Kashani et al., 2012; Kemp et al., 2010; Perego et al., 2010; Achilli et al., 2013).

Figure 2.4.7 The two paths of migration from Beringia. The Pacific coastal route is marked in yellow while the interior route is in light blue.

Likewise, it has been proposed that the migration to South America may have occurred down the Pacific coast and later eastward across the Andean Cordillera, or after a split of the founding population in the northern area of South America, with population groups moving separately across the eastern Andean highlands into the Amazonian basin (Bodner et al., 2012; de Saint Pierre et al., 2012; Reich et al.,


2012; Homburger et al., 2015; Llamas et al., 2016). Intriguingly, some recent genome-wide data have also raised the possibility that some Amazonian Native Americans descend partly from a Native American founding population that carried ancestry more closely related to indigenous Oceanian populations. This signature is apparently not present in Northern and Central Americans, suggesting a more diverse set of founding populations of the Americas than previously accepted (Raghavan et al., 2015; Skoglund et al., 2015).


2.5 Contribution of Y-chromosome to population genetics

2.5.1 Y-chromosome polymorphisms

Any mutational event in a nucleotide sequence generates two allelic forms, the ancestral and the derived one. The new sequence variant is defined as polymorphic if it is observed in at least 1% of a population, and rare if its frequency is lower. The MSY region exhibits patrilineal inheritance (Ellis et al., 1990), and because of this property the analysis of Y-chromosome polymorphisms is a useful tool for evolutionary studies, in addition to its use in forensic and paternity studies (Jobling et al., 1997). The first polymorphism identified was DYS11 in 1985 (Casanova et al., 1985), followed in 1986 by DYS1 (Ngo et al., 1986). Towards the end of 1996, fewer than 60 known polymorphisms were assigned to the MSY region. The majority (80%) were long-range polymorphisms (detectable by pulsed-field gel electrophoresis), Restriction Fragment Length Polymorphisms (RFLPs) or Variable Number of Tandem Repeats (VNTRs). Until 1997, only 11 biallelic polymorphisms that could be analyzed by PCR were known (Seielstad et al., 1994; Hammer and Horai, 1995; Whitfield et al., 1995; Santos et al., 1995; Jobling et al., 1996; Underhill et al., 1996). In 1997, Underhill et al. described 19 new Single Nucleotide Polymorphisms (SNPs) that were discovered by a new method of mutation analysis known as Denaturing High Performance Liquid Chromatography (DHPLC). The use of this technology and the sequencing of entire regions have since rapidly increased the number of Y-specific markers from a dozen to several hundred. Different kinds of variation have been described in the MSY: morphological variations in Yq and in the centromeric region, deletions and inversions of large stretches of DNA, as well as smaller variations such as the Alu insertion, SNPs and VNTRs. Among these, SNPs and VNTRs have been the most widely used in evolutionary and population genetic studies.

2.5.1.1 Single Nucleotide Polymorphisms (SNPs)

Currently 43,227 SNPs of the Y chromosome are reported on the YFull website (https://www.yfull.com/snp-list/), but only a small part of them has been validated and used in the reconstruction of the human Y-chromosome phylogeny. SNPs are often referred to as unique event polymorphisms (UEPs) because, given their very low mutation rate (10^-8 per base pair per generation), the chance of two independent mutational events hitting exactly the same nucleotide position is very low. Specific websites


constantly update the list of newly discovered Y-chromosome polymorphisms: ISOGG (https://isogg.org/tree/), YFull (https://www.yfull.com/tree/), FamilyTreeDNA (http://www.familytreedna.com/).

2.5.1.2 Variable Number of Tandem Repeats (VNTRs)

Tandem repeat loci are defined by blocks or arrays of tandemly repeated DNA sequences, that is, patterns of nucleotides repeated directly adjacent to each other. Depending on the length of the single unit and on the average size of the arrays, they can be grouped into three classes: satellite, minisatellite and microsatellite DNA. The concept of satellite DNA appeared in 1960. It refers to highly repetitive DNA which, in density gradient centrifugation, generates a minor band (satellite band) with a buoyant density different from that of the major band of genomic DNA. This is due to its different nucleotide composition. The first sequences identified were localized to the telomeric and centromeric regions of chromosomes. Subsequently, shorter tandemly repeated sequences, named mini- and microsatellites, were also identified. The Y chromosome carries two major satellites, DYZ1 and DYZ2 (Cooke, 1976). The DYZ1 satellite is composed of 3.56-kb repeat units, each containing a tandem array of pentanucleotides (Nakahori et al., 1986). DYZ2 is organized into 2.47-kb repeat units (Cooke et al., 1982). A minisatellite is a pattern of 10-60 bp that can be repeated more than 1,000 times per locus. In the Y chromosome, the only minisatellite isolated and characterized is MSY1, or DYF155S1 (Jobling et al., 1998). It is composed of 48-144 copies of a 25-bp repeat unit, which is AT rich (75-80%) and predicted to form stable hairpin structures. Microsatellites, also known as short tandem repeats (STRs), are composed of 1-6 bp repeat units and are variable in length from 5 to 200 bp, representing the 10% of the human genome. The number of repetitions of a microsatellite at a given locus can be polymorphic and, by examining several STR loci in one individual, it is possible to obtain a unique genetic profile, or haplotype (Monckton and Jeffreys, 1993).
Unlike SNPs, the monophyletic origin of STR alleles cannot be assumed, as any allele of a given size can be generated by a number of events (additions or subtractions of repeat units) from an entire set of parental alleles. More than 400 microsatellite sites (Kayser et al., 2004; Hanson and Ballantyne, 2006) have been identified and most of them are polymorphic. Information about them can be found on many websites (http://strbase.nist.gov/, http://www.yhrd.org), even though their nomenclature has not been unified yet.

2.5.2 Phylogeny

A haplotype is the allelic combination on one chromosome of closely linked markers, which tend to be inherited together. In the absence of recombination, as in the case of the MSY region, haplotypes can evolve only by


accumulation of mutational events throughout generations. For this reason, according to coalescent theory, it is possible to start from the analysis of present-day variation and go backwards in time through the generations in order to identify the molecule that is ancestral to the existing molecules (the Most Recent Common Ancestor; MRCA). All MSY copies bearing the same allelic variant at a given position can in principle be considered descendants of the same ancestor in which that particular mutational event occurred (i.e. they have a monophyletic origin), and therefore belong to the same haplogroup (a group of chromosomes with the same mutations in the same position and order). Alleles shared by two haplogroups testify that they have common ancestry, whereas alleles that differentiate two haplogroups testify that they belong to lineages that diverged at a certain point in the past and have since accumulated a different series of mutations. Figure 2.5.1 illustrates the phylogeny of human Y chromosomes. It represents the phylogenetic relationships existing among haplogroups, and the root of the tree (MRCA) represents the ancestral state at all positions found to be variable today. The first well-resolved phylogenetic tree of the MSY included 116 binary haplogroups identified by 167 SNPs (Underhill et al., 2000). From then on, the MSY tree has been progressively refined through the discovery and mapping of a large number of SNPs (YCC, 2002; Jobling and Tyler-Smith, 2003). The most recently published MSY phylogeny incorporates 599 mutations in 311 distinct binary haplogroups (Karafet et al., 2008). Haplogroup nomenclature follows the rules defined in 2002 by the Y Chromosome Consortium. Capital letters (from A to T) were used to identify 20 major haplogroups, and two alternative nomenclature systems were then proposed: the hierarchical one, in which sub-lineages are defined by alternating lower case letters and numbers (e.g. J2a), and the mutational one, in which each lineage is defined by the name of the main haplogroup followed by the terminal mutation (e.g. J-M410). Haplogroups that are not yet defined by any downstream mutations are referred to as paragroups, because they are potentially paraphyletic, and are indicated by an "*". In contrast, the evolutionary path that led to the observed STR haplotype diversity cannot be reconstructed in the form of a tree consisting only of divergent branches; reticulated structures such as networks are needed (Bandelt et al., 1999). In the Y-chromosome system the term haplotype refers to the combination of alleles at STR loci. Haplotype nomenclature varies from paper to paper. Out of the 200 STR loci identified, only a limited number are commonly used in both population genetic and forensic studies: DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS385ab, DYS438, DYS439, DYS437, DYS448, DYS456, DYS458, DYS635, YGATAH4 (Kayser et al., 1997; Pascali et al., 1999; Ayub et al., 2000; Redd et al., 2002; Mulero et al., 2006). From 1999 all data have been


deposited at http://www.yhrd.org. This database has become a relevant resource to locate populations of the world harboring haplotypes identical or similar to a particular type, also for evolutionary studies.

Figure 2.5.1 Structure of the Y-chromosome phylogenetic tree. The arrow indicates the root of the tree. The main haplogroups are indicated by a capital letter and a different color. Mutation names are indicated along the branches. The length of the branches is not proportional to the number of mutations or their ages (www.familytreedna.com). Mutations are indicated with a specific letter and number chosen by the discoverers' group (M-Mutation, Underhill et al., 1997; P-Polymorphism, Hammer et al., 2000; V-Variation, Cruciani et al., 2006). The ancestral state is determined by comparison with great apes such as chimpanzees.
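
The assignment of a chromosome to a haplogroup, and the mutational nomenclature described in section 2.5.2, can be illustrated with a minimal sketch. It assumes a toy fragment of haplogroup Q in which only the nesting M242 > L54 > M3 is taken from the text; the data layout, the function names and the use of the "(x...)" suffix for typed downstream markers found in the ancestral state are illustrative choices, not a reference implementation of the YCC rules.

```python
# Toy fragment of haplogroup Q. Only the nesting M242 > L54 > M3 comes from
# the text; the data layout and function names are illustrative assumptions.
PARENT = {"M242": None, "L54": "M242", "M3": "L54"}
MAJOR_CLADE = "Q"


def path_to_root(marker):
    """List a marker followed by all of its upstream markers."""
    path = []
    while marker is not None:
        path.append(marker)
        marker = PARENT[marker]
    return path


def mutational_name(derived, tested):
    """Return a name such as 'Q-M3' or 'Q-L54(xM3)' for one chromosome.

    derived: set of markers carrying the derived allele
    tested:  set of markers that were actually typed
    """
    # Keep markers whose whole upstream path is derived, then take the deepest.
    candidates = [m for m in PARENT if all(u in derived for u in path_to_root(m))]
    if not candidates:
        return None  # chromosome falls outside this toy tree
    terminal = max(candidates, key=lambda m: len(path_to_root(m)))
    # Typed downstream markers found in the ancestral state are reported as (x...).
    excluded = sorted(m for m, p in PARENT.items()
                      if p == terminal and m in tested and m not in derived)
    suffix = f"(x{','.join(excluded)})" if excluded else ""
    return f"{MAJOR_CLADE}-{terminal}{suffix}"


ALL_MARKERS = {"M242", "L54", "M3"}
print(mutational_name({"M242", "L54", "M3"}, ALL_MARKERS))  # Q-M3
print(mutational_name({"M242", "L54"}, ALL_MARKERS))        # Q-L54(xM3)
```

Because each marker is treated as a unique event, a chromosome is placed on the deepest branch for which every upstream marker is derived; a chromosome that is derived at a marker but ancestral at the typed downstream markers corresponds to a paragroup.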


2.5.3 Phylogeography

Phylogeography is the discipline that combines the temporal dimension of phylogeny with the spatial dimension of geography. Thus, the study of the geographical distribution of haplogroups and sub-haplogroups and of their phylogenetic relationships makes it possible to trace ancient population movements (Avise et al., 1987). Figure 2.5.2 illustrates the geographical distribution of the main Y-chromosome haplogroups in the different continents.

Figure 2.5.2 Geographical distribution of the major Y-chromosome haplogroups. Each circle shows a sample population and the area of sectors is proportional to the relative haplogroup frequency.

As shown in figure 2.5.2, human molecular differentiation occurred mainly during and after the process of colonization and diffusion into the different regions and continents; thus haplogroups and sub-haplogroups tend to be restricted to specific geographic areas and population groups.


2.5.4 Reconstructing the routes followed by modern humans

Estimating the microsatellite variation within a haplogroup in different geographic areas makes it possible to reconstruct the direction of past human migratory events. The basis of this approach is that a Y-haplogroup originates when a new binary-marker mutation occurs. This happens on a single chromosome, so there is initially no associated microsatellite variation. As the lineage spreads, microsatellite mutations occur throughout the generations: the longer the time, the greater the accumulated variation (figure 2.5.3). Thus, knowing the mutation rate of the microsatellite loci, it is possible to estimate the time of origin of a mutation or of other demographic events associated with it.

Figure 2.5.3 Accumulation of microsatellite variation throughout time (Jobling and Tyler-Smith, 2003).
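
The dating logic described above can be sketched in a few lines. The sketch below is one possible formulation, not necessarily the one used in the cited studies: it assumes a stepwise mutation model in which the average squared difference in repeat number from the founder haplotype grows linearly with time at the per-locus mutation rate. The founder haplotype, the repeat counts, the mutation rate (0.002 per locus per generation) and the generation time (25 years) are all hypothetical values chosen for illustration.

```python
def asd_to_founder(haplotypes, founder):
    """Average squared difference in repeat number from the founder, over loci."""
    n_loci = len(founder)
    total = 0.0
    for locus in range(n_loci):
        squared = [(h[locus] - founder[locus]) ** 2 for h in haplotypes]
        total += sum(squared) / len(haplotypes)
    return total / n_loci


def age_in_years(haplotypes, founder, mutation_rate, generation_years):
    """Convert accumulated STR variation into an age estimate (assumed linear model)."""
    generations = asd_to_founder(haplotypes, founder) / mutation_rate
    return generations * generation_years


# Hypothetical repeat counts at three STR loci for four chromosomes
# belonging to the same haplogroup.
founder = (14, 13, 24)
sample = [(14, 13, 24), (14, 13, 24), (15, 13, 24), (14, 12, 24)]

asd = asd_to_founder(sample, founder)
years = age_in_years(sample, founder, mutation_rate=0.002, generation_years=25)
print(round(asd, 3), round(years))
```

Comparing such estimates for the same haplogroup across geographic areas is what suggests the direction of a migration: under this model, the area with the greatest accumulated variation is the most likely region of origin.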

This approach, which combines the study of stable and variable markers, has provided important information for reconstructing human history, confirming the African origin and tracing the ancient routes followed by modern humans after the "Out of Africa" exit (figure 2.5.4).

Figure 2.5.4 Map of the Homo sapiens migrations (www.familytreedna.com).


In agreement with the fossil record and mtDNA data, the most ancient haplogroups of the phylogeny (A and B) are Africa-specific, and the microsatellite variation associated with them has been dated to ~150 thousand years ago (kya; Hammer et al., 1998; Thomson et al., 2000). In addition, the haplogroup variation observed in present-day populations of the various continents is in agreement with the "out of Africa" migration towards the various continents following favorable climate conditions. In 2013, a Y chromosome characterized by ancestral alleles for all the identified SNPs was found in an African American individual. This chromosome now characterizes the deepest branch of the tree, which was named A00 (Mendez et al., 2013), and the time of the MRCA (TMRCA) of the Y tree was dated at 338 kya, thus exceeding estimates of the mtDNA TMRCA, as well as those of the age of the oldest anatomically modern human fossils. The extremely ancient age, combined with the rarity of the A00 lineage, which was also found at very low frequency in Central Africa, points to the importance of considering more complex models for the origin of Y-chromosome diversity. These models include ancient population structure and the possibility of archaic introgression of Y chromosomes into anatomically modern humans.

2.5.5 Y-chromosome haplogroups

There are 18 main Y-chromosome haplogroups. Their geographic distribution is illustrated in figure 2.5.5.

2.5.5.1 Native American haplogroups

The current pool of Native American Y chromosomes is a mixture of haplogroups that derive from pre-Columbian dispersals from Siberia and more recent gene flow from Europe and Africa (Grugni et al., 2015; Kimura et al., 2017). Although the identification of the founding lineages has been complicated by the high historical rate (about 16%) of male-mediated admixture in Native Americans (Bosch et al., 2003), two founding lineages, haplogroups C and Q, were described early on as typical of Amerindians (Underhill et al., 1996; Karafet et al., 1997; Karafet et al., 1999). On average, they account for about 5% and 75% of Native Americans, respectively. Until a few years ago, the resolution of these haplogroups had not improved substantially; only a few studies based on full Y-chromosome sequencing included samples belonging to haplogroup Q (Lippold et al., 2014; Karmin et al., 2015), and none of them focused specifically on its phylogeny.


Figure 2.5.5 Y-chromosome haplogroup geographic frequency distribution maps (Chiaroni et al., 2009).

Haplogroup C is phylogenetically one of the most ancient haplogroups. M130 is the mutation that characterizes haplogroup C, which has a wide distribution across Asia and Oceania, whereas it is less frequent in Europe and the Americas and is not observed in Africa (Figure 2.5.7). It has not been detected in sub-Saharan African populations, which suggests an Asian origin after anatomically modern humans migrated out of Africa (~50 Kya). Being a non-African lineage, haplogroup C is highly useful in tracing the migration route of the African exodus in prehistory (Figure 2.5.6) (Zhong et al., 2010). It includes six main branches (C1-C6), each with a specific geographic distribution. C* chromosomes are found on the Indian subcontinent, in Sri Lanka and in parts of South-East Asia. The rare C1 lineage appears to be restricted to Japan. C2 is found predominantly in New Guinea, Melanesia, and Polynesia. The C3 clade


seems to have originated in South-East or Central Asia and then spread in two directions: towards Northern Asia and the Americas, and towards Eastern and Central Europe, where it may represent the record of the westward expansion of the Huns in the early Middle Ages. C4 is found exclusively among aboriginal Australians and is dominant in that population. C5 has a significant presence in India (Karafet et al., 2001; Karafet et al., 2008; Bortolini et al., 2003; Hammer et al., 2006; Kayser et al., 2006; Pakendorf et al., 2006; Regueiro et al., 2006; Sengupta et al., 2006).

Figure 2.5.6 Frequency distribution of haplogroup C in worldwide populations and the inferred migration routes of the African exodus carrying the M130 mutation (Zhong et al., 2010).

Haplogroup Q: M242 is the signature marker of haplogroup Q, which seems to have arisen in Siberia (Altai regions), where it is widely distributed (Zegura et al., 2004), and then to have spread across Northern Eurasia up to the Americas. It is distributed widely in North Eurasia and is found at high frequencies in some Siberian groups and at low frequencies in Northern Europe (Marjanovic et al., 2005; Di Gaetano et al., 2009; Battaglia et al., 2009), the Middle East (Regueiro et al., 2006; Grugni et al., 2012) and South Asia (Deng et al., 2004; Sengupta et al., 2006; Fornarino et al., 2009). It represents the major lineage among Native Americans. In 1997 Underhill and colleagues discovered a SNP that became known as M3 (Underhill et al., 1997). The Q-M3 sub-clade is the most prevalent one in both North and South Native American individuals; according to its distribution and to the associated haplotypes, it is likely to have originated in Central Asia and then spread from middle Siberia, where it is highly frequent nowadays. Q-M3 and Q-M242(xM3), subsequently classified as Q-L54(xM3), have been identified as the two American founding lineages, both containing a complex arrangement of sub-clades (Geppert et al., 2011; Dulik et al., 2012; Battaglia et al., 2013). Very recently Balanovsky


and colleagues provided an updated version of the phylogenetic structure and geographic distribution of haplogroup Q, in which the majority of its sub-haplogroups are observed in Asia, but two of its sub-branches, Q-M3 and Q-Z780, capture the majority of extant Native American Y chromosomes (Zegura et al., 2004; Jota et al., 2016). Balanovsky also presented a detailed phylogenetic reconstruction of the Q-L275 marker, a sub-haplogroup that is confined to West Asia and neighbouring parts of Central and South Asia (mainly Pakistan, West India and also ). Interestingly, a sub-branch of this clade was acquired by a population ancestral to Ashkenazi Jews and grew within this population, reaching up to 5% in present-day Ashkenazi (figure 2.5.7) (Balanovsky et al., 2017).

Figure 2.5.7 Phylogenetic structure of haplogroup Q-M242. The tree nodes are indicated with both types of nomenclature provided by the YCC. The main areas of distribution of the different sub-clades are shown at the bottom (from Balanovsky et al., 2017, modified).


3. MY CONTRIBUTION


3.1 Classical HLA genetics

3.1 Classical HLA genetics: Evaluation of Luminex® xMAP® technology applied to high definition HLA typing of cord blood units prior to listing.

3.1.1 Aim of the research

Allele-level donor-recipient match at the HLA-A, -B, -C and -DRB1 loci affects the outcome after CBT for hematologic malignancies and is modifying the strategy of donor selection. High definition typing of both class I and class II HLA loci at the time of listing is a way to improve the attractiveness of CBB inventories, reducing the time for donor search and procurement and simplifying donor choice, in particular for patients of non-European heritage. We aimed to develop a platform that achieves high definition HLA typing of CBUs at the time of banking in a quick, accurate and cost-effective manner.

3.1.2 Background

In the allo-HSCT setting, CB represents an alternative source of HSCs to BM and PBSC (Ballen et al., 2013). BM and PBSC unrelated donors are chosen on the basis of high resolution allelic typing for HLA-A, -B, -C, -DRB1, -DQB1, the best option being a 9/10 or 10/10 matched donor. In contrast, unrelated CBUs with a 6/6, 5/6 or 4/6 match are considered acceptable donors, due to the tolerogenic properties of this stem cell source. Therefore, current HLA definition of CBUs is based on serological/low resolution antigenic typing for HLA-A and -B and high resolution allelic typing for -DRB1, where up to 1-2 mismatches are permitted and HLA-A and -B mismatches are preferable to -DRB1 ones (Rocha et al., 2009). Recently, Ruggeri reviewed the impact of immunogenetic factors on CBT outcomes, pointing out the importance of increasing the level of donor-recipient match at class I and class II HLA loci (Ruggeri et al., 2016). In single unit CBT, a joint CIBMTR-Eurocord retrospective analysis by Eapen reported that allele-level matching at HLA-A, -B, -C, and -DRB1 is associated with the lowest NRM in patients with acute leukemia and myelodysplastic syndrome (Eapen et al., 2014). CBUs that are fully matched or have one or two allele mismatches should be preferred, while the use of CBUs with three or more allele mismatches should be carefully assessed for the risk of graft failure and NRM. In double CBT, the results are still controversial. Oran suggested that high-resolution typing and the selection of CBUs matched at no fewer than 5/8 alleles may reduce 2-year TRM (39% vs 60% for CBUs


with 4/8 or less), where 7-8/8 matched units remain the best option (no deaths for TRM) (Oran et al., 2015). On the other hand, Brunstein reported no impact of allele-level HLA matching on CBT outcome, and even a lower risk of relapse and treatment failure associated with a higher degree of HLA mismatch in acute leukemia patients (Brunstein et al., 2016). Taken together, these findings mean that transplant centers now tend to select a CBU on the basis of a more refined level of HLA typing than the current standard (antigenic typing for HLA-A and -B, allelic typing for -DRB1 only), modifying the algorithm of donor choice (Dahi et al., 2014). In this scenario, the simplest approach is to perform allele typing for HLA-A, -B, -C, and -DRB1 on a CBU just prior to its release for HSCT, giving the transplant physician the information for the best donor choice at the time of the final decision, as recommended by NetCord-FACT standards [NetCord FACT International standards for cord blood collection, banking, and release for administration, 6th edition, July 2016] (www.factwebsite.org). By contrast, we propose HLA typing of CBUs at medium/high-resolution level prior to listing, to improve the utilization of our CBUs by offering an inventory tailored to the transplant physician's needs. In this setting the Luminex® xMAP® platform (LABScan 100™xPonent, Luminex Corporation, Austin, TX, USA), which essentially consists of a fluorescent reverse PCR-SSO, may represent a credible approach to performing HLA analysis in a quick, accurate and cost-effective manner (Dunbar, 2006; Testi and Andreani, 2015). In this study, we evaluated the impact of introducing the Luminex platform in our routine practice by comparing the results of the 2014 activity (based on Luminex® xMAP® technology) with those of the 2013 activity (based on LiPA revPCR-SSO plus PCR-SSP), taking into consideration time, performance and costs.

3.1.3 The sample

Cord blood was collected by the in utero technique using commercial collection bags containing CPD, according to the procedures of the Pavia CBB, in compliance with NetCord-FACT standards. Cryopreservation and banking occurred within 48 hours after delivery, as per NetCord-FACT standards. At the CBB, from each CBU suitable for banking, a 3 ml sample was taken in a tube containing EDTA and sent to the Immunogenetics Laboratory, accompanied by one sample of maternal blood. All CBUs and samples were collected after the appropriate informed consent had been signed; appropriate procedures were in place to ensure both confidentiality of the donors and traceability. All DNAs were isolated with a fully automated process (Maxwell®16, Promega Instrument, Madison WI, USA) and stored at -20°C until tested. In 2014, the Luminex® xMAP® technology was introduced in our routine and was applied to CBU typing at high definition for HLA-A, -B, -C, and -DRB1. Before 2014, CBU typing was performed by a combined approach relying on LiPA revPCR-SSO plus PCR-SSP. The different steps of the 2013 and 2014 procedures are summarized in figure 3.1.1. In 2013, 120 CBUs were typed for HLA-A and -B at low resolution and for HLA-C and -DRB1 at high resolution by LiPA revPCR-SSO plus PCR-SSP. The 2013 activity was compared with that of 2014, when 113 CBUs were typed for HLA-A, -B, -C, and -DRB1 at medium/high resolution by the Luminex® xMAP® technology. CBU mothers were always typed concurrently for HLA-A, -B and -DR at low resolution, independently of the approach used for CBU typing, to confirm haplotype segregation, as per our policy. Concerning the number of CBUs included in the two groups (those tested in 2013 and those tested in 2014), the two groups are comparable, as assessed by Chi-square test or Fisher exact test, as appropriate. Moreover, as the purpose of our study was to compare the time, cost and average number of repetitions needed to obtain a CBU's complete HLA-A, -B, -C, -DRB1 typing in 2014 vs 2013 (when two different approaches were used), typing the same CBU twice by the methods in use in 2013 and 2014 was not necessary.

Figure 3.1.1 Schematic description of the steps required for Luminex® xMAP® based approach, used for the 2014 activity (right), and for LiPA revPCR-SSO plus PCR-SSP one, used for the 2013 activity (left) (Guarene et al., submitted).


3.1.4 Results and discussion

3.1.4.1 Analysis of the time from banking to listing

First, for each CBU considered in the study, we calculated the time elapsed between the infant donor's date of birth (assumed to correspond to the time of banking) and the test reporting date, to obtain the testing time. Assuming that listing occurred at completion of the testing report, the testing time is indicative of the delay between banking and listing of CBUs, and was found to be significantly shorter in 2014 (mean: 141 [SD: 30] days, range: 72-187) than in 2013 (mean: 181 [SD: 15] days, range: 113-273), even with equal staffing (p-value <0.001; see table 3.1.1). This is not surprising, because the traditional approach (2013 activity) consisted of a two-step typing (low resolution SSO followed by high resolution SSP), while the Luminex® xMAP® technique (2014 activity) directly provides a high definition result in a single step. Moreover, the Luminex platform allows the simultaneous processing of a large number of samples (more than 90) with a reduced quantity of DNA (2 µl at 20 ng/µl) needed for amplifying each genomic locus-specific sequence, thus optimizing the time required.

Table 3.1.1 Comparison of time, test repetitions and cost needed to type CBUs at listing: Luminex® xMAP® based activity (2014) (right) versus LiPA revPCR-SSO plus PCR-SSP approach (2013) (left) (Guarene et al., submitted). Time: data are mean, Standard Deviation (SD) and range in brackets. Test repetitions: data are the total number of repetitions (except those due to confirmation of homozygous status) and, in brackets, percentage of repetitions over the number of testing, namely n=120 in 2013 and n=113 in 2014, respectively. Cost refers to the overall cost for HLA-A, -B, -C and –DRB1 typing per each CBU on average, including the cost for test repetitions. P<0.05 was considered statistically significant. n.s. = not significant

                                              2013 (n=120)                    2014 (n=113)                   2014 vs 2013
                                              LiPA revPCR-SSO plus PCR-SSP    Luminex® xMAP®

Time between banking and listing              181 (SD: 15) days (113-273)     141 (SD: 30) days (72-187)     p<0.001
Total number of test repetitions (HLA-A)      27 (22.5%)                      13 (11.5%)                     p=0.001
Total number of test repetitions (HLA-B)      27 (22.5%)                      21 (18.5%)                     p=n.s.
Total number of test repetitions (HLA-C)      19 (15.8%)                      19 (16.8%)                     p=n.s.
Total number of test repetitions (HLA-DRB1)   28 (23.3%)                      22 (19.5%)                     p=n.s.
Cost                                          395.6 euro                      240.7 euro

3.1.4.2 Analysis of the test repetitions

The proportion of repetitions was lower in CBUs typed in 2014 than in those typed in 2013 for all the HLA loci considered, except for HLA-C. Test repeats decreased from 22.5% to 11.5% for HLA-A (p-value=0.001, OR=2.23), from 22.5% to 18.5% for HLA-B (p-value=n.s., OR=1.53) and from 23.3% to 19.5% for HLA-DRB1 (p-value=n.s., OR=1.61) in 2013 vs 2014, respectively. For HLA-C, the number of repetitions was 19 (15.8%) in 2013 vs 19 (16.8%) in 2014 (p-value=n.s., OR=0.93). However, this finding cannot be evaluated for the purpose of this analysis, because locus C was defined using a different typing approach. In fact, due to the increasing evidence of the impact of HLA-C mismatch on CBT-related mortality, we have implemented HLA-C high resolution typing on all our CBUs and have been routinely performing it on listed CBUs since 2011 (Eapen et al., 2011). Our analysis showed that, after the introduction of Luminex® xMAP®, the overall number of repetitions tended to decrease for all HLA loci except HLA-C, although the difference reached statistical significance for HLA-A only (p-value=0.001), as shown in table 3.1.1. The reasons for test repetitions are detailed in table 3.1.2. For each locus, two main reasons for test repetitions are taken into account, namely de novo testing (test failure) and resolution of ambiguous results. The overall number of repetitions (due to both reasons) was then compared between 2013 and 2014. As test repetitions consist of a second SSP-based test, mainly needed in case of a failed test or to resolve an ambiguous finding, their reduction seems to point to the better performance of Luminex® xMAP® vs the previous method. Another reason for test repetitions is confirmation of the homozygous status, accounting for 13 HLA-A, 6 HLA-B and 4 HLA-DRB1 typings in 2013 vs 10 HLA-A, 6 HLA-B and 11 HLA-DRB1 typings in 2014. However, these repetitions have been excluded from the comparative analysis of the two approaches, as they do not reflect the performance of the method but are simply due to chance.
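As an illustration of how an odds ratio (OR) for test repetitions is obtained from the counts in table 3.1.1, the following minimal sketch computes the crude OR of a repetition in 2013 versus 2014 for two of the loci. The helper name `odds_ratio` is hypothetical, and the crude (unadjusted) calculation is an assumption about how the reported values were derived.

```python
def odds_ratio(events_a, total_a, events_b, total_b):
    """Odds of an event in group A divided by the odds in group B."""
    odds_a = events_a / (total_a - events_a)
    odds_b = events_b / (total_b - events_b)
    return odds_a / odds_b

# Repetition counts from table 3.1.1: (repeats in 2013, repeats in 2014).
repeats = {"HLA-A": (27, 13), "HLA-C": (19, 19)}
N_2013, N_2014 = 120, 113  # CBUs typed in each year

crude_or = {
    locus: round(odds_ratio(r13, N_2013, r14, N_2014), 2)
    for locus, (r13, r14) in repeats.items()
}
print(crude_or)
```

For these two loci the crude values agree with the ORs quoted in the text (2.23 for HLA-A and 0.93 for HLA-C).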

Table 3.1.2 Reasons for test repetitions when typing CBUs: Luminex® xMAP® based activity (2014) is compared to LiPA revPCR-SSO plus PCR-SSP approach (2013) (Guarene et al., submitted). P<0.05 was considered statistically significant. n.s.= not significant

            2013 (LiPA revPCR-SSO plus PCR-SSP) (n=120)             2014 (Luminex® xMAP®) (n=113)                           2014 vs 2013
            De novo          Ambiguous results   Total number       De novo          Ambiguous results   Total number
            (test failure)   resolving           of repetitions     (test failure)   resolving           of repetitions

HLA-A       26               1                   27                 6                7                   13                 p=0.001
HLA-B       23               4                   27                 12               9                   21                 p=n.s.
HLA-DRB1    24               4                   28                 10               12                  22                 p=n.s.

3.1.4.3 Analysis of the costs

Cost analysis revealed that the cost for the complete HLA-A, -B, -C and -DRB1 typing of a CBU was reduced in 2014 vs 2013, amounting to 240.7 versus 395.6 euros per unit on average, respectively (see table 3.1.1). This is not surprising if we consider that the cost of HLA-A, -B, -C and -DRB1 testing per sample includes not only the cost of completing the test, but also that of the repetition test, when needed. In fact, the traditional approach (2013 activity) implied two steps of typing (low resolution SSO followed by high resolution SSP), while the Luminex® xMAP® technique (2014 activity) provides a high definition result in a single step, with evident cost saving. Moreover, the proportion of test failures and repetitions proved to be lower with the 2014 approach. The strategy described in this study was maintained in 2015 (n=60 CBUs) and in 2016 (n=30 CBUs, first semester), with results comparable to 2014 in terms of cost saving and quality of the definition level.

3.1.5 Conclusion and perspectives

Although the current demand for unrelated CBUs is declining as family haploidentical SCT rises, cord blood is still an option, in particular for patients of non-European descent. As the impact of high-resolution, allele-level donor-recipient matching at the HLA-A, -B, -C and -DRB1 loci on post-transplant mortality for hematologic malignancies is modifying the strategy of donor selection, we suggest that Luminex® xMAP®, as a technology widely available among HLA laboratories, has a useful application in CBB programs. Defining allele polymorphisms for the whole panel of class I and II HLA loci at the time of listing will improve the attractiveness of CBB inventories by reducing the time for donor search and procurement and by simplifying donor choice. The results of this study have been reported in a manuscript submitted for publication (Guarene et al., submitted). Prospectively, the impact of the strategy based on high definition typing of CBUs at the time of banking remains to be evaluated in terms of the index of released CBUs and the time between listing and issue for transplantation. Updates of the Luminex® platform, such as the FLEXMAP 3D® application, are expected to improve both the level of definition and the cost-effectiveness. Finally, a strategy based on next-generation sequencing platforms could reasonably be implemented in the CBB setting, although the sustainability of this approach would be better guaranteed if the test were provided on a centralized basis rather than at the level of single CBBs or HLA laboratories.


3.2 non HLA genetics

3.2 Non-HLA genetics: evaluation of the impact of immune response genes on CBT outcome

3.2.1 Aim of the research

The identification of factors affecting CBT outcomes, with the purpose of improving survival and reducing life-threatening complications, is of paramount importance and may have consequences for the current criteria for CBU selection. With the development of human genomics, many studies have shown that SNPs of immune response and drug metabolism genes influence outcomes after HSCT, although only one such study has been reported so far in CBT. Therefore, a retrospective cohort registry-based analysis was conducted in collaboration between Netcord banks and Eurocord, with the aim of studying how SNPs of eight candidate genes implicated in immune response, genotyped on the CBUs and/or on CBT recipients with malignant disorders, can influence transplant outcomes, including engraftment, acute and chronic GvHD, NRM, relapse and survival. The primary objective was to study NRM, defined as death not related to recurrence of the primary disease. Secondary outcomes were defined as follows: (1) overall survival (OS): time interval between transplantation and death due to any cause; (2) disease-free survival (DFS): time of life without relapse of the primary disease; (3) acute and chronic GvHD: diagnosis and grading were assigned by the transplant center using standard criteria (Glucksberg et al., 1974; Martin et al., 2006); (4) relapse of disease: event characterized by recurrence of the primary disease; (5) neutrophil and platelet engraftment: neutrophil count > 0.5 × 10⁹/L for 3 days and platelets > 20 × 10⁹/L for 3 days with no prior transfusion for at least 3 days (Ljungman et al., 2010; Filipovich et al., 2005).

3.2.2 Background

Cord blood units from unrelated donors have been used in the absence of an HLA-matched related or unrelated HSCT donor. CBT outcomes are influenced by CBU-related factors (mainly cell dose and number of HLA disparities) and by recipient-related factors, such as the underlying disease, the conditioning regimen and GvHD prophylaxis (Kollman et al., 2001; Lee et al., 2007; Woolfrey et al., 2011). Identification of these factors is critical to improve the percentage of successful transplants through enhanced criteria for selecting the most appropriate CBU for a patient. Besides better HLA matching based on allele typing, which has recently been shown to have growing importance in CBT (Eapen et al., 2014), genetic factors other than HLA are postulated to be involved in HSCT complications, based on the observation that GvHD and graft rejection may occur even when HLA loci are identical between donor and recipient. For instance, matching of KIR ligands or NIMA has been reported to affect outcomes after CBT and could be used as a criterion for CBU choice when multiple units with comparable HLA disparities and cell dose are available (Willemze et al., 2009; Rocha et al., 2012). Proinflammatory cytokines, their receptors and inhibitors have been implicated in GvHD. A number of SNPs found in the coding regions and in the promoters of cytokine genes (e.g. IL-10, TNF), which lead to variation in the functional level or activity of the corresponding cytokine, are thought to be relevant for post-transplant infectious complications. Moreover, genes related to innate immunity, such as NOD-like receptors and Toll-like receptors, could be implicated in relapse, GvHD and susceptibility to infections (Dickinson, 2008; Penack et al., 2010). Many studies on the influence of SNPs of immune response and drug metabolism genes on HSCT have been published (Dickinson et al., 2004). However, in the CBT setting only one study, including a small and heterogeneous group of 115 CBT recipients, has been described: recipient and donor pairs were genotyped for TNF-alpha and IL-10, as well as for other SNPs, but no significant association with transplant outcomes was observed (Kögler et al., 2002).

3.2.3 The sample

Our laboratory contributed to a multicentric retrospective cohort analysis carried out in collaboration with Eurocord, the Cellular Therapy & Immunobiology Working Party of the European Society for Blood and Marrow Transplantation (EBMT-CTIWP), NetCord and the Ribeirão Preto School of Medicine of São Paulo University (FMRP-USP), Brazil. Table 3.2.1 lists the seven CBBs that provided the CBU samples for genotyping. The Pavia CBB contributed 30 CBUs, accounting for 4.3% of all CBUs enrolled in the study. The inclusion criteria were: 1) recipients of unrelated single CBT with malignant diseases; 2) availability of CBU samples for the CBUs used in the CBT; 3) transplants performed in EBMT centers; and 4) availability of clinical data in the EBMT-Eurocord database. The exclusion criteria were: 1) recipients of two or more CBUs; 2) recipients of a CBU combined with another source of HSCs; 3) CBT with intrabone infusion; and/or 4) CBT with in vitro expanded CBU or any other form of experimental manipulation. From January 1994 to December 2010, a total of 696 CBT recipients met the eligibility criteria. However, only 143 samples from the recipients had been collected before transplant and kept in the CBB (Düsseldorf), and were available for analysis. All DNA samples were shipped on dry ice to the Laboratory of Hematology of the Ribeirão Preto School of Medicine of São Paulo University according to international and Brazilian rules for shipment of biological material, provided that appropriate informed consent to perform genetic studies on CBUs and patients had been obtained. Moreover, the study was approved by the local ethical committee (Comité de protection des personnes Ile-de-France IV) at the Saint-Louis Hospital (Paris, France), where the Eurocord headquarters are located. Conditioning protocols, GvHD prophylaxis, selection of CBU, use of G-CSF, surveillance of CMV reactivation and use of antimicrobial agents followed the guidelines and rules of each transplant center.

Table 3.2.1 List of participating cord blood banks (CBB) and amount of samples shipped (Cunha et al., 2017).

CBB                                                            City         Country   Samples   Percentage

José Carreras Stammzellbank*                                   Düsseldorf   Germany   143       20.5
Banc de Sang i Teixits                                         Barcelona    Spain     279       40.1
Banque de Sang Placentaire - Etablissement Français du Sang    Besançon     France    147       21.1
Centro Regional de Transfusion de Málaga                       Málaga       Spain     28        4.0
Centro de Transfusión de Madrid                                Madrid       Spain     44        6.3
Banca Toscana di Sangue Placentare                             Florence     Italy     25        3.6
Banca del Sangue del Cordone Ombelicale                        Pavia        Italy     30        4.3

Total                                                                                 696       100.0

* Düsseldorf CBB shipped additional 173 samples of recipients stored before transplantation

3.2.4 Results

3.2.4.1 Recipients, donors and transplant characteristics

Patient, disease, donor (CBU) and transplant characteristics of the three cohorts of CBT recipients are listed in table 3.2.2. The first cohort includes 696 patients who met the eligibility criteria of the study. Ten percent of CBTs (n=68) were HLA-identical (6/6), 39% (n=266) were performed with one HLA disparity (5/6) and 51% (n=346) with two or more HLA disparities (4/6 or 3/6). The median infused TNC dose was 3.4 × 10⁷/kg (range 0.6 to 30) and the median CD34+ cell dose was 1.5 × 10⁵/kg (range 0.2 to 39). Median follow-up was 49 (range 0.8 to 195) months. The second cohort includes 305 CBTs for which high-resolution typing of HLA-A, -B, -C and -DRB1 of patients and cord blood units was performed. Only 5% (n=15) of patients were transplanted with an 8/8 CBU graft, whereas 13% (n=40) were 7/8, 25% (n=76) were 6/8, 32% (n=98) were 5/8, 19% (n=59) were 4/8 and 6% (n=17) were 3/8. The third cohort includes 143 CBTs whose recipients' DNA samples were available for genotyping. All recipients' samples were provided by José Carreras Stammzellbank, Düsseldorf, Germany (n=143), since CBBs do not usually keep patients' samples and the other participating CBBs had not done so. Therefore, the third cohort included only patients transplanted with a CBU from the Düsseldorf CBB, of whom 81% (n=117) were transplanted with a 5/6 or 4/6 CBU.
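The allele-level match distribution of cohort 2 quoted above can be checked with a few lines of arithmetic. The sketch below simply tallies the counts given in the text and converts them to rounded percentages; the variable names are chosen here for illustration.

```python
# Number of cohort 2 transplants per HLA match grade (matched alleles out of 8),
# as quoted in the text.
match_counts = {"8/8": 15, "7/8": 40, "6/8": 76, "5/8": 98, "4/8": 59, "3/8": 17}

total = sum(match_counts.values())
percentages = {grade: round(100 * n / total) for grade, n in match_counts.items()}

print(total)
print(percentages)
```

The counts add up to the 305 transplants of cohort 2, and the rounded percentages correspond to those reported (5%, 13%, 25%, 32%, 19% and 6%).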

Table 3.2.2 Recipients, donors and CBT characteristics (Cunha et al., 2017).

                                                     COHORT 1               COHORT 2               COHORT 3
                                                     Recipients with        Recipients with        Recipients with
                                                     available CBU          available CBU          available samples
                                                     samples (n=696)        samples and HR-HLA     (n=143)
                                                                            data (n=305)

Recipients
  Gender (male), n(%)                                380 (56)               176 (58)               88 (61)
  Age (years), median (range)                        17 (0.3-69)            12 (0.7-60)            9.3 (0.3-60)
  Weight (kg), median (range)                        50 (6-120)             43 (6-112)             30 (5.6-119)
  Children (< 18 years), n(%)                        375 (54)               191 (66)               106 (74)
  Positive CMV serology, n(%)                        393 (61)               180 (62)               69 (52)
  Major ABO incompatibility, n(%)                    190 (34)               91 (36)                63 (44)

Diagnosis, n(%)
  ALL                                                274 (39)               160 (53)               62 (44)
  AML                                                230 (33)               109 (36)               36 (25)
  MDS                                                72 (10)                36 (11)                23 (16)
  CML                                                34 (4.9)               -                      9 (6)
  CLL                                                5 (1)                  -                      -
  Lymphoma                                           55 (8)                 -                      6 (4)
  Myeloma                                            7 (1)                  -                      2 (1)
  Histiocytosis                                      18 (3)                 -                      5 (4)
  Aplastic anemia                                    -                      -                      -
  Hemoglobinopathy                                   -                      -                      -
  SCID                                               -                      -                      -
  Metabolic diseases                                 -                      -                      -
  Others                                             1 (0.1)                -                      -

HLA compatibility, n(%)
  8/8                                                -                      15 (5)
  7/8                                                -                      40 (13)
  6/8 or 6/6                                         68 (10)                76 (25)                21 (15)
  5/8 or 5/6                                         266 (39)               98 (32)                65 (45)
  4/8 or 4/6                                         318 (47)               59 (19)                52 (36)
  3/8 or 3/6                                         22 (3)                 17 (6)                 4 (3)
  2/8 or 2/6                                         6 (1)                  -                      1 (1)

Disease status at time of UCBT, n(%)
  Early                                              225 (33)               125 (41)               33 (23)
  Intermediate                                       260 (38)               114 (37)               63 (44)
  Advanced                                           205 (27)               66 (22)                47 (33)

Conditioning, n(%)
  Myeloablative                                      556 (80)               304 (99.7)             124 (87)
  Non myeloablative                                  132 (20)               1 (0.3)                19 (13)

GVHD prophylaxis, n(%)
  CsA + Others                                       597 (91)               276 (90)               16 (11)
  MTX + Others                                       47 (7)                 16 (6)                 117 (82)
  Others                                             13 (2)                 13 (4)                 10 (7)

Use of ATG and/or monoclonal antibody, n(%)
  Yes                                                538 (81)               269 (91)               123 (88)
  No                                                 124 (19)               26 (9)                 20 (12)

Infused TNC dose, × 10⁷/kg, median (range)           3.4 (0.6-30)           3.6 (1.2-20)           3.7 (0.6-25)
Infused CD34+ cells dose, × 10⁵/kg, median (range)   1.5 (0.2-39)           1.7 (0.2-18)           1.6 (0.2-18)
Follow-up, months (range)                            49 (0.8-195)           50 (3-145)             66 (12-195)

CBU, cord blood unit; CBT, cord blood transplantation; CMV, cytomegalovirus; HR-HLA, high resolution-human leukocyte antigen; ALL, acute lymphoblastic leukemia; AML, acute myeloid leukemia; MDS, myelodysplastic syndromes; CML, chronic myeloid leukemia; CLL, chronic lymphocytic leukemia; CsA, cyclosporin A; MTX, methotrexate; ATG, anti-thymocyte globulin; MoAb, monoclonal antibody; TNC, total nucleated cells


3.2.4.2 Analysis of the genetic polymorphism

The genes under investigation are summarized in table 4.4.1, namely the following candidate genes related to immune response: NLRP1 (rs-5862), NLRP2 (rs-043684), NLRP3 (rs-10754558), TIRAP/Mal (rs-8177374), IL10 (rs-1800872), REL (rs-13031237), TNFRSF1B (rs-1061622) and CTLA4 (rs-3087243). The results of CBU and recipient gene polymorphisms and their prevalence are listed in tables 3.2.3 and 3.2.4. No allele or genotype with a frequency lower than 1% was observed in the study. Allele frequencies were similar among groups and were in Hardy-Weinberg equilibrium.
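As an illustration of what agreement with Hardy-Weinberg equilibrium means in practice, the sketch below applies a one-degree-of-freedom chi-square goodness-of-fit test to the CTLA4 genotype counts of cohort 1 reported in table 3.2.3 (AA=162, AG=349, GG=185). The chi-square approach and the function name `hwe_chi_square` are assumptions made for this example; the text does not specify which test was used.

```python
def hwe_chi_square(n_hom1, n_het, n_hom2):
    """Chi-square statistic (1 degree of freedom) for deviation from
    Hardy-Weinberg proportions at a biallelic locus."""
    n = n_hom1 + n_het + n_hom2
    p = (2 * n_hom1 + n_het) / (2 * n)  # frequency of the first allele
    q = 1 - p                           # frequency of the second allele
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom1, n_het, n_hom2)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# CTLA4 genotype counts for cohort 1 (AA, AG, GG), from table 3.2.3.
chi2 = hwe_chi_square(162, 349, 185)
CRITICAL_1DF_5PCT = 3.841  # chi-square critical value, 1 df, alpha = 0.05
print(f"chi2 = {chi2:.3f}, consistent with HWE: {chi2 < CRITICAL_1DF_5PCT}")
```

A statistic well below the critical value indicates that the observed genotype counts do not deviate appreciably from the proportions expected under equilibrium.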

Table 3.2.3 Results of allele and genotype frequencies for recipients of cohort 1 (n=696) (Cunha et al., 2017).

SNP              Allele          n      Ratio    Genotype   n      Ratio

NLRP1 (NALP1)    A               689    0.495    AA         177    0.254
                 G               703    0.505    AG         335    0.481
                                                 GG         184    0.264

NLRP2 (NALP2)    A               790    0.568    AA         239    0.343
                 G               602    0.432    AG         312    0.448
                                                 GG         145    0.208

NLRP3 (NALP3)    C               829    0.596    CC         252    0.362
                 G               563    0.404    CG         325    0.467
                                                 GG         119    0.171

TIRAP-MAL        C               1164   0.836    CC         483    0.694
                 T               228    0.164    CT         198    0.284
                                                 TT         15     0.022

IL10-592         G               1036   0.744    GG         392    0.564
                 T               354    0.254    GT         252    0.363
                 Undetermined    2      0.001    TT         51     0.073

REL (cREL)       G               866    0.622    GG         276    0.397
                 T               526    0.378    GT         314    0.451
                                                 TT         106    0.152

TNFRSF1B         T               1092   0.784    TT         425    0.611
                 G               300    0.216    GT         242    0.348
                                                 GG         29     0.042

CTLA4            G               673    0.483    AA         162    0.233
                 A               719    0.517    AG         349    0.501
                                                 GG         185    0.266

CBU, cord blood unit; SNP, single nucleotide polymorphism

Table 3.2.4 Results of allele and genotype frequencies for recipients of cohort 2 (n=305) (Cunha et al., 2017).

SNP              Allele          n      Ratio    Genotype   n      Ratio

NLRP1 (NALP1)    A               310    0.508    AA         83     0.272
                 G               300    0.492    AG         144    0.472
                                                 GG         78     0.256

NLRP2 (NALP2)    A               323    0.530    AA         93     0.305
                 G               287    0.470    AG         137    0.449
                                                 GG         75     0.246

NLRP3 (NALP3)    C               352    0.577    CC         102    0.334
                 G               258    0.423    CG         148    0.485
                                                 GG         55     0.180

TIRAP-MAL        C               508    0.833    CC         209    0.685
                 T               102    0.167    CT         90     0.295
                                                 TT         6      0.020

IL10-592         G               446    0.731    GG         163    0.536
                 T               162    0.266    GT         120    0.395
                 Undetermined    2      0.003    TT         21     0.069

REL (cREL)       G               367    0.602    GG         119    0.390
                 T               243    0.398    GT         129    0.423
                                                 TT         57     0.187

TNFRSF1B         T               370    0.607    TT         179    0.587
                 G               240    0.393    GT         12     0.039
                                                 GG         114    0.374

CTLA4            G               299    0.490    AA         73     0.239
                 A               311    0.510    AG         153    0.502
                                                 GG         79     0.259

CBU, cord blood unit; SNP, single nucleotide polymorphism


Data are not shown for the third cohort of CBT recipients (n=143), where no statistical association was found between the gene polymorphisms tested in the DNA samples of the patients and CBT outcomes.

3.2.4.3 Analysis of the clinical outcomes

Since no statistical association was found between the patients' polymorphisms and CBT outcomes in the third cohort of CBT recipients (n=143), only outcomes from cohorts 1 and 2 are shown. Details of the univariate analyses are reported in tables 3.2.5, 3.2.6a and 3.2.6b.

NEUTROPHIL AND PLATELET ENGRAFTMENT

Cumulative incidence (CI) of neutrophil recovery was 82% at day 60 post CBT. According to CBU genotype, univariate analysis showed an association between CBU CTLA4 genotype and neutrophil recovery. At day 60, CI of neutrophil recovery for patients transplanted with an AA CTLA4 CBU was 85%, versus 84% and 77% for those transplanted with AG and GG CTLA4 CBU genotypes, respectively. Multivariate analysis confirmed the delayed neutrophil recovery for recipients of CBUs with the GG CTLA4 genotype (HR: 1.24; 95%-CI: 1.09 to 1.40; p=0.03). At day 180, CI of platelet recovery was 57%. According to CBU genotype, univariate analysis indicated that the TNFRSF1B gene polymorphism was associated with platelet recovery. CI of platelet recovery at day 180 was 41% for recipients of CBUs with the GG genotype versus 58% and 58% for the TG and TT genotypes, respectively. In the multivariate model as well, platelet recovery was decreased in recipients of CBUs with the GG TNFRSF1B genotype (HR 1.94; 95%-CI 1.71 to 2.19; p=0.02).

ACUTE AND CHRONIC GRAFT-VERSUS-HOST DISEASE

Cumulative incidence of grade II-IV acute GvHD was 31% at day 100. No statistical association was found between any of the SNPs analyzed and the incidence of acute GvHD. Also, no statistical association between the candidate genes and chronic GvHD was observed. Also in multivariate analysis, the variables associated with acute and chronic GvHD did not include the investigated SNPs.
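Cumulative incidences such as those quoted in this section are commonly estimated in a competing-risks framework, where death without the event of interest prevents the event from occurring. The following minimal sketch shows a nonparametric cumulative incidence estimator of this kind. It is an illustration under that assumption, not a description of the software used in the study, and the follow-up data in the example are entirely hypothetical.

```python
def cumulative_incidence(times, events, horizon):
    """Nonparametric cumulative incidence of the event of interest by `horizon`.

    events: 0 = censored, 1 = event of interest, 2 = competing event.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    event_free = 1.0  # probability of being free of any event just before t
    cif = 0.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        d_interest = d_competing = censored = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            if data[i][1] == 1:
                d_interest += 1
            elif data[i][1] == 2:
                d_competing += 1
            else:
                censored += 1
            i += 1
        cif += event_free * d_interest / at_risk
        event_free *= 1 - (d_interest + d_competing) / at_risk
        at_risk -= d_interest + d_competing + censored
    return cif

# Hypothetical follow-up (days) for five patients: two events of interest,
# one competing event (death without the event) and two censored observations.
toy_times = [10, 20, 30, 40, 50]
toy_events = [1, 2, 0, 1, 0]
ci_day_100 = cumulative_incidence(toy_times, toy_events, horizon=100)
print(f"cumulative incidence at day 100: {ci_day_100:.2f}")
```

Unlike one minus the Kaplan-Meier estimate, this estimator does not treat patients who died of a competing cause as if they could still experience the event later.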


Table 3.2.5 Results of univariate analysis for cohort 1 (n=696), according to gene polymorphisms (Cunha et al., 2017).

SNP             Genotype      n     OS: 40 ± 2          DFS: 35 ± 2         NRM: 37 ± 2         Relapse: 28 ± 2
                                    % (range)   p       % (range)   p       % (range)   p       % (range)   p

REL (cREL)      TT            106   36 ± 5      0.43    33 ± 5      0.77    40 ± 5      0.65    28 ± 5      0.71
                GG            276   42 ± 3              36 ± 3              38 ± 3              27 ± 3
                TG            314   40 ± 3              35 ± 3              35 ± 3              30 ± 3

CTLA4           AA            162   47 ± 4      0.03    43 ± 4      0.03    35 ± 4      <0.01   22 ± 3      0.05
                GG            185   33 ± 4              29 ± 3              47 ± 4              25 ± 3
                AG            349   41 ± 3              35 ± 3              32 ± 3              33 ± 2

TIRAP-MAL       TT            15    33 ± 12     0.23    33 ± 12     0.30    27 ± 12     0.46    40 ± 13     0.40
                CC            483   39 ± 2              34 ± 2              38 ± 2              28 ± 2
                TC            198   43 ± 4              38 ± 4              35 ± 3              27 ± 4

NLRP3 (NALP3)   GG            119   34 ± 4      0.66    33 ± 4      0.68    40 ± 5      0.86    27 ± 4      0.59
                CC            252   43 ± 3              37 ± 3              37 ± 3              27 ± 3
                CG            325   40 ± 3              34 ± 3              36 ± 2              30 ± 3

NLRP2 (NALP2)   GG            145   41 ± 4      0.95    36 ± 4      0.74    34 ± 4      0.34    30 ± 4      0.71
                AA            239   39 ± 3              34 ± 3              41 ± 3              26 ± 2
                AG            312   41 ± 3              36 ± 3              34 ± 2              29 ± 2

NLRP1 (NALP1)   AA            177   43 ± 4      0.57    39 ± 4      0.39    37 ± 4      0.89    24 ± 3      0.12
                GG            184   37 ± 4              32 ± 3              36 ± 4              33 ± 3
                AG            335   40 ± 3              35 ± 3              37 ± 3              26 ± 2

IL10-592        TT            51    43 ± 7      0.62    37 ± 7      0.76    26 ± 6      0.13    36 ± 7      0.21
                GG            392   36 ± 3              34 ± 2              37 ± 3              30 ± 2
                TG            252   42 ± 3              36 ± 3              39 ± 3              25 ± 3
                Undetermined  1     -                   -                   -                   -

TNFRSF1B        GG            29    41 ± 9      0.10    31 ± 9      0.11    31 ± 9      0.68    38 ± 9      0.39
                TT            425   43 ± 3              38 ± 2              36 ± 2              27 ± 2
                TG            242   35 ± 3              31 ± 3              40 ± 3              30 ± 3

OS, overall survival; DFS, disease free-survival; NRM, non-relapse mortality; CBU, cord blood unit


Table 3.2.6a Results of univariate analysis according to CBU genotype and availability of high resolution HLA (n=305), cohort 2 (Cunha et al., 2017).

Variable / Value                n     OS: 44% ± 3       DFS: 39% ± 3      NRM: 31% ± 3      Relapse: 30% ± 2
                                      %         p       %         p       %         p       %         p

Time of CBT
  < 2006                        146   46 ± 4    0.60    41 ± 4    0.49    31 ± 4    0.92    29 ± 4    0.64
  > 2006                        159   43 ± 4            37 ± 4            32 ± 4            31 ± 4

Gender
  Male                          176   47 ± 4    0.23    40 ± 4    0.42    30 ± 4    0.43    30 ± 3    0.98
  Female                        129   40 ± 5            37 ± 1            33 ± 4            30 ± 4

Age at time of CBT
  < 18 years                    153   51 ± 4    <0.01   45 ± 4    0.05    27 ± 3    0.04    29 ± 4    0.84
  > 18 years                    152   30 ± 5            28 ± 5            40 ± 5            32 ± 5

ABO compatibility
  Matched                       93    42 ± 6    0.78    38 ± 5    0.87    30 ± 5    0.93    32 ± 5    0.61
  Minor                         66    48 ± 6            40 ± 6            32 ± 6            30 ± 6
  Major                         91    44 ± 5            41 ± 5            32 ± 5            27 ± 5

Disease status at time of transplantation
  Early                         125   58 ± 5    <0.01   53 ± 5    <0.01   28 ± 4    0.09    20 ± 4    <0.01
  Intermediate                  114   41 ± 5            35 ± 5            30 ± 4            35 ± 3
  Advanced                      66    23 ± 6            19 ± 5            41 ± 6            40 ± 6

CMV serology
  Positive                      180   38 ± 4    <0.01   34 ± 4    0.01    37 ± 4    0.01    29 ± 4    0.88
  Negative                      109   53 ± 5            47 ± 5            24 ± 4            30 ± 5

High resolution HLA compatibility
  HLA matched                   15    48 ± 14   0.43    44 ± 14   0.48    13 ± 9    0.08    37 ± 16   0.35
  HLA matched vs 1-2 allele MM  116   48 ± 5            41 ± 5            28 ± 4            31 ± 4
  HLA matched vs 3-4 allele MM  157   41 ± 4            37 ± 4            37 ± 4            27 ± 4
  HLA matched vs 5 allele MM    17    31 ± 15           35 ± 14           18 ± 10           47 ± 10

Use of ATG or monoclonal antibody
  No                            26    40 ± 10   0.58    33 ± 9    0.30    16 ± 7    0.09    56 ± 10   <0.01
  Yes                           269   44 ± 3            40 ± 3            33 ± 3            27 ± 3

Infused TNC dose, × 10⁷/kg, median
  < 3.4 (10⁷/kg)                152   38 ± 4    0.08    34 ± 4    0.26    35 ± 4    0.30    31 ± 4    0.62
  > 3.4 (10⁷/kg)                152   51 ± 4            44 ± 4            28 ± 4            29 ± 4

Infused CD34+ cells dose, × 10⁵/kg, median
  < 1.5 (10⁵/kg)                144   34 ± 4    0.02    30 ± 4    0.02    34 ± 4    0.61    37 ± 4    0.01
  > 1.5 (10⁵/kg)                145   53 ± 4            46 ± 4            30 ± 4            24 ± 4

CTLA4 genotype
  AA                            73    53 ± 6    0.04    48 ± 6    0.08    27 ± 5    0.05    25 ± 5    0.41
  AG                            153   45 ± 4            39 ± 4            28 ± 4            34 ± 5
  GG                            79    33 ± 6            32 ± 5            42 ± 6            28 ± 4

CBU, cord blood unit; CBT, cord blood transplantation; CMV, cytomegalovirus; HR-HLA, high resolution-human leukocyte antigen; MM, mismatch; ATG, anti-thymocyte globulin; TNC, total nucleated cells

Table 3.2.6b Results of univariate analysis according to CBU genotype and availability of high resolution HLA (n=305), cohort 2 (Cunha et al., 2017).

Variable / Value                n     aGvHD:                  cGvHD:                  Neutrophil engraftment: Platelet engraftment:
                                      29% ± 2                 19% ± 2                 87% ± 2                 64% ± 3
                                      % (range)     p         % (range)     p         % (range)     p         % (range)     p

Time of CBT
  < 2006                        146   23 ± 4        0.04      18 ± 3        0.64      87 ± 3        0.1       66 ± 4        0.91
  > 2006                        159   35 ± 4                  21 ± 3                  89 ± 2                  63 ± 4

Gender
  Male                          176   27 ± 3        0.40      21 ± 3        0.35      88 ± 2        0.4       64 ± 4        0.66
  Female                        129   32 ± 4                  16 ± 3                  88 ± 3                  65 ± 4

Age at time of CBT
  < 18 years                    153   33 ± 4        0.08      15 ± 3        0.03      85 ± 2        <0.01     62 ± 4        0.33
  > 18 years                    152   24 ± 4                  26 ± 4                  92 ± 2                  68 ± 4

ABO compatibility
  Matched                       93    30 ± 5        0.72      17 ± 4        0.16      87 ± 4        0.67      63 ± 5        0.35
  Minor                         66    24 ± 5                  29 ± 6                  85 ± 5                  73 ± 5
  Major                         91    30 ± 5                  17 ± 4                  92 ± 3                  71 ± 5

Disease status at time of transplantation
  Early                         125   33 ± 4        0.47      21 ± 4        0.53      92 ± 2        0.01      67 ± 4        0.29
  Intermediate                  114   26 ± 4                  20 ± 4                  89 ± 3                  66 ± 5
  Advanced                      66    29 ± 6                  14 ± 4                  79 ± 5                  56 ± 6

CMV serology
  Positive                      180   31 ± 3        0.62      21 ± 3        0.47      84 ± 3        0.04      59 ± 4        <0.01
  Negative                      109   27 ± 3                  17 ± 4                  94 ± 2                  75 ± 4

High resolution HLA compatibility
  HLA matched                   15    27 ± 12       0.73      20 ± 11       0.84      87 ± 10       0.60      73 ± 12       0.56
  HLA matched vs 1-2 allele MM  116   30 ± 4                  22 ± 4                  87 ± 3                  66 ± 4
  HLA matched vs 3-4 allele MM  157   30 ± 4                  17 ± 3                  88 ± 2                  61 ± 4
  HLA matched vs 5 allele MM    17    38 ± 10                 22 ± 13                 88 ± 9                  71 ± 12

Use of ATG or monoclonal antibody
  No                            26    35 ± 10       0.47      24 ± 11       0.89      81 ± 8        0.16      58 ± 10       0.31
  Yes                           269   29 ± 3                  19 ± 2                  89 ± 2                  67 ± 3

Infused TNC dose, × 10⁷/kg, median
  < 3.4 (10⁷/kg)                152   24 ± 4        0.03      21 ± 3        0.39      89 ± 2        0.31      67 ± 4        0.63
  > 3.4 (10⁷/kg)                152   35 ± 4                  17 ± 3                  86 ± 3                  62 ± 4

Infused CD34+ cells dose, × 10⁵/kg, median
  < 1.5 (10⁵/kg)                144   27 ± 4        0.22      19 ± 3        0.87      86 ± 3        0.07      61 ± 4        0.05
  > 1.5 (10⁵/kg)                145   33 ± 4                  21 ± 3                  90 ± 2                  68 ± 4

CTLA4 genotype
  AA                            73    32 ± 6        0.74      20 ± 5        0.44      89 ± 4        <0.01     60 ± 6        0.28
  AG                            153   30 ± 4                  21 ± 3                  93 ± 2                  69 ± 4
  GG                            79    26 ± 5                  15 ± 4                  78 ± 5                  59 ± 6

CBU, cord blood unit; CBT, cord blood transplantation; CMV, cytomegalovirus; HR-HLA, high resolution-human leukocyte antigen; MM, mismatch; ATG, anti-thymocyte globulin; TNC, total nucleated cells


NON-RELAPSE MORTALITY, OVERALL SURVIVAL AND CAUSES OF DEATH, RELAPSE AND DISEASE-FREE SURVIVAL

Cumulative incidence of NRM was 37% at 4 years. Univariate analysis according to CBU genotype showed higher NRM for recipients of CBUs with the CTLA4 GG genotype (AA: 35%, AG: 32%, GG: 47%; figure 3.2.1A). Multivariate analysis confirmed these results, demonstrating higher NRM for recipients of CBUs with the GG genotype (HR: 1.44; 95%-CI: 1.07 to 1.94; p=0.02). In the analysis of cohort 2 (acute leukemia patients and CBUs with allele typing of HLA-A, -B, -C and -DRB1), recipients of CBUs with the GG CTLA4 genotype had increased NRM (HR: 1.72; 95%-CI: 1.10 to 2.70; p=0.02): the 4-year CI of NRM was 27% for AA, 28% for AG and 42% for GG genotype, respectively (figure 3.2.1B).
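The relationship between a hazard ratio, its 95% confidence interval and the associated p-value can be made explicit with a short calculation. The sketch below back-calculates an approximate two-sided p-value from the cohort 2 NRM estimate quoted above (HR 1.72, 95%-CI 1.10 to 2.70), assuming a Wald interval that is symmetric on the log scale; the function name `p_from_hr_ci` is hypothetical and the calculation is an illustration, not the procedure used in the study.

```python
import math

def p_from_hr_ci(hr, ci_low, ci_high, z_crit=1.96):
    """Approximate two-sided p-value implied by a hazard ratio and its 95% CI,
    assuming the CI is symmetric on the log scale (Wald interval)."""
    se_log_hr = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(hr) / se_log_hr
    # erfc(|z| / sqrt(2)) equals the two-sided tail area of the standard normal
    return math.erfc(abs(z) / math.sqrt(2))

# Cohort 2, NRM for recipients of CBUs with the GG CTLA4 genotype.
p_value = p_from_hr_ci(1.72, 1.10, 2.70)
print(f"p = {p_value:.3f}")
```

The value obtained rounds to 0.02, in agreement with the p-value reported for this comparison.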

Figure 3.2.1 Non-relapse mortality according to CBU genotype. A) effect of CTLA4 genotype for recipients with available CBU samples (n=696), cohort 1; B) effect of CTLA4 genotype for recipients with available CBU samples and HLA high resolution typing (n=305), cohort 2 (Cunha et al., 2017).

Estimated overall survival (OS) at 4 years was 40%. Univariate analysis by CBU genotype showed inferior OS for recipients of CBUs with the CTLA4 GG genotype (AA: 47%, AG: 41%, GG: 33%), and multivariate analysis confirmed these results (HR: 1.41; 95%-CI: 1.04 to 1.90; p=0.02). In the analysis of cohort 2 (acute leukemia patients and CBUs with allele typing of HLA-A, -B, -C and -DRB1), recipients of CBUs with the GG CTLA4 genotype had inferior OS, but with borderline statistical significance (HR: 1.54; 95%-CI: 1.03-2.41; p=0.06). At 4 years, OS was 53% for AA, 45% for AG and 33% for GG genotype, respectively. Fifty-nine percent (n=412) of recipients died. Sixty-two percent (n=255) of deaths were related to transplant complications and 36% (n=149) to relapse or disease progression, while 2% (n=8) were of unknown cause. Deaths related to transplant complications (n=255) were due to infections (44%, n=113), GvHD (23%, n=59), multiple organ failure (6%, n=16), pulmonary complications (6%, n=13), graft failure (5%, n=12), bleeding disorders (5%, n=12), hepatic veno-occlusive disease (4%, n=11) and other or unknown causes (7%, n=19). Potential associations between SNPs and causes of death were studied by correspondence analysis. Recipients of CBUs with the GG CTLA4 genotype died mainly of transplant complications, in particular GvHD, bleeding disorders and infectious complications (table 3.2.7).

Table 3.2.7 Causes of death of recipients according to CBU CTLA4 genotyping (Cunha et al., 2017).

                            Recipients with available CBU samples (n=696)
Variable                    AA (n=162)    AG (n=349)    GG (n=185)

GVHD                        13            24            22
Infections                  22            55            36
Pulmonary complications     3             5             5
Bleeding disorders          2             6             4
Multiple organ failure      6             4             6
Graft failure               3             3             6
VOD                         4             4             3
Others                      3             14            2

Total                       56            115           84

GVHD, graft-versus-host disease; VOD, hepatic veno-occlusive disease
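To put the totals of table 3.2.7 in proportion to the size of each genotype group, the following sketch computes the crude share of recipients who died of transplant complications per CBU CTLA4 genotype. These are simple proportions, not time-to-event estimates, and are shown only as an illustration of the arithmetic.

```python
# Deaths from transplant complications per CBU CTLA4 genotype (totals of
# table 3.2.7) and number of recipients per genotype in cohort 1.
deaths = {"AA": 56, "AG": 115, "GG": 84}
recipients = {"AA": 162, "AG": 349, "GG": 185}

share = {g: round(100 * deaths[g] / recipients[g], 1) for g in deaths}
print(share)
```

The crude share is highest in the GG group, in the same direction as the higher NRM observed for this genotype.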

Cumulative incidence of relapse was 28% at 4 years. In univariate analysis, CBU CTLA4 genotype was associated with the incidence of relapse: 22% for AA, 33% for AG and 25% for GG genotype, respectively. Multivariate analysis confirmed that recipients of CBUs with the AA CTLA4 genotype had a lower relapse incidence (HR: 0.64; 95%-CI: 0.57 to 0.72; p=0.02). However, in the analysis of cohort 2 (acute leukemia patients and CBUs with allele typing of HLA-A, -B, -C and -DRB1), CBU CTLA4 genotype was not associated with relapse (HR: 0.64; 95%-CI: 0.37 to 1.12; p=0.13), probably due to the smaller number of recipients in this cohort. At 4 years, disease-free survival (DFS) was 35%. According to CBU genotype, univariate analysis demonstrated that CTLA4 has an impact on DFS (AA: 43%, AG: 35% and GG: 29%). Multivariate analysis confirmed these results, showing inferior DFS for recipients of CBUs with the GG genotype (HR: 1.41; 95%-CI: 1.06 to 1.88; p=0.02). In the analysis of cohort 2 (acute leukemia patients and CBUs with allele typing of HLA-A, -B, -C and -DRB1), recipients of CBUs with the GG CTLA4 genotype had inferior DFS, but this association was not statistically significant (HR: 1.49; 95%-CI: 0.96 to 2.30; p=0.07). Table 3.2.8 shows a summary of multivariate analysis results according to CBU genotype.

Table 3.2.8 - Summary of multivariate analysis results according to CBU genotype (Cunha et al., 2017).

Analyzed cohort / Outcomes    SNP                          HR      95% CI (lower)   95% CI (upper)   p

COHORT 1: Recipients with available CBU samples (n=696)
  Neutrophil engraftment      CTLA4 genotype GG            1.24    1.09             1.40             0.03
  Platelets engraftment       TNFRSF1B genotype GG         1.94    1.71             2.19             0.02
  Non-relapse mortality       CTLA4 genotype GG            1.52    1.35             1.72             <0.01
  Overall survival            CTLA4 genotype GG            1.41    1.04             1.90             0.02
  Relapse                     CTLA4 genotype AA            0.64    0.57             0.72             0.02
  Disease free-survival       CTLA4 genotype GG            1.41    1.06             1.88             0.02

COHORT 2: Recipients with available CBU samples and HR-HLA data (n=305)
  Neutrophil engraftment      CTLA4 genotype GG            1.35    0.99             1.82             0.06
  Non-relapse mortality       CTLA4 genotype GG            1.72    1.10             2.7              0.02
  Overall survival            CTLA4 genotype GG            1.54    0.99             2.41             0.06
  Relapse                     CTLA4 genotype AA            1.54    0.89             2.67             0.13
  Disease free-survival       CTLA4 genotype GG            1.49    0.96             2.30             0.07

CBU, cord blood unit; HR-HLA, high resolution human leukocyte antigen

3.2.5 Discussion

As for allo-HSCT from other stem cell sources, many variables related to CBU, recipient, disease or transplantation characteristics also affect the clinical course after CBT. However, the influence of non-HLA genetic factors, such as polymorphisms of immune response genes, has not been well elucidated in this setting. To investigate the association of CBU and recipient genotypes with CBT outcomes, 696 CBUs and 143 recipients were genotyped for 8 SNPs related to the immunological response. The only innate immune response polymorphism associated with outcomes was TNFRSF1B, which was found to correlate with platelet recovery. Although this result was not observed in a previous study in the CBT setting (Kögler et al., 2002), studies on other HSC sources showed that this polymorphism was associated with inferior survival (Pearce et al., 2012; Dickinson et al., 2010) and with increased incidence of acute GvHD and high levels of IL-6 (Stark et al., 2003), whereas its absence has been correlated with decreased survival (Keen et al., 2004). CTLA4 was the only candidate gene of the adaptive immune response evaluated in this study. Interestingly, recipients of CBU carrying the GG CTLA4 genotype had decreased DFS, increased NRM and lower neutrophil recovery. In addition, recipients of CBU carrying the AA CTLA4 genotype showed a lower incidence of relapse. Furthermore, increased NRM was confirmed in recipients of CBU with GG CTLA4 genotype who had acute leukemia, received myeloablative conditioning and had available high resolution HLA typing (cohort 2). CTLA-4 and its genetic variations have been extensively investigated in the field of allo-HSCT, especially regarding GvHD and survival. Because of contradictory results, with the largest available study on a homogeneous cohort yielding negative results, the real impact of CTLA4 on outcomes has not been completely clarified so far (Azarian et al., 2007; Vannucchi et al., 2007; Bosch-Vizcaya et al., 2012; Piccioli et al., 2010; Perez-Garcia et al., 2007), and there is no report in the literature regarding its role in CBT (Sengsayadeth et al., 2014). Nevertheless, our findings are biologically plausible.
In fact, it has been reported that the G allele is associated with lower levels of sCTLA-4 mRNA, whose expression is the functional basis for the observed association between CTLA-4 and autoimmune diseases (Ueda et al., 2003). Moreover, the analysis of sCTLA-4 mRNA expression in healthy blood donors pointed to allele A as responsible for a greater production of sCTLA-4, and allo-HSCT recipients of progenitor cells from AA genotype donors have been reported to have an increased incidence of alloimmune reactions, including acute GvHD. Furthermore, the G allele seems to confer an increased risk of disease relapse (Perez-Garcia et al., 2007). It has been suggested that sCTLA-4 inhibits the B7-flCTLA-4 complex, with a consequent influence on T-cell alloreactivity (Perez-Garcia et al., 2007; Ueda et al., 2003). We can speculate that CBU lymphocytes carrying the GG CTLA4 genotype have lower alloreactivity, which could explain the impact of CTLA4 genotype on engraftment. In our study the recipients of CBU with AA genotype experienced lower relapse, suggesting a superior T-cell response for the AA CBU genotype, in line with this hypothesis. However, the immunobiology of CBU cells needs to be further investigated. Hypotheses based on the immunobiology of adult HSCT cells cannot necessarily be applied to the CBT scenario. In comparison with naïve adult T-cells, CBU T-cells are known to be Th2 biased, with reduced inducible expression of Th1 cytokines. Therefore T-cell reconstitution after CBT is CD4+ biased, a pattern not normally observed after HSCT using a BM or PBSC source (Hiwarkar et al., 2015). Approximately 2 months after infusion, CBU T-cells differentiate into virus-specific T-cells, giving rise to a rapid response to viral infections, such as CMV and adenovirus (Chiesa et al., 2012). In addition, in murine models, CBU T-cells mediate superior anti-tumor responses compared with adult T-cells. The anti-tumor activity correlates with increased tumor-homing of CBU CD8+ T-cells and rapid gain of cytotoxic and Th1 function (Hiwarkar et al., 2015). With regard to CBT immune tolerance, the comparison of CTLA4 gene expression between CBU and HSCs of adult donors demonstrated a reduced expression of CTLA-4 in CBU T-cells, even if these results were not reproduced according to CTLA4 allele genotype (Miller et al., 2002).

3.2.6 Conclusion and perspectives

In conclusion, gene polymorphisms of the immune response, such as CTLA4 and TNFRSF1B, may influence CBT outcomes according to CBU genotype. In particular, the association of the CTLA4 GG genotype of the CBU with lower survival and higher NRM suggests that this polymorphism might be considered for CBU selection when more than one CBU meeting the current suggested selection criteria of cell dose and HLA matching (Hough et al., 2016) is available. Importantly, CTLA4 typing of cord blood units could be easily provided by the CBB without a significant increase in costs or delay in the transplantation schedule, and thus could be a candidate additional criterion to be included in the algorithm of CBU selection. The results of this joint NetCord-Eurocord multicentric study have been recently reported in Blood (Cunha et al., 2017).
Although noteworthy, this study has some limitations, mainly that it is a retrospective registry-based analysis, which includes multiple transplant centers and CBBs, and a heterogeneous population of patients with different diseases, receiving different types of conditioning regimen and transplanted over a wide time period. To overcome these limitations, the final multivariate models were adjusted for these variables and the same analysis was performed in a homogeneous group, represented by cohort 2, which included patients transplanted for acute leukemia, all given a myeloablative conditioning regimen and with available allele-level HLA typing. In this homogeneous but small cohort, CTLA4 was shown to be associated with NRM, but not with other outcome parameters. Therefore, further studies are needed to better elucidate the impact of CTLA4 on survival after CBT and to reproduce our results in a different cohort, in addition to functional studies of the CTLA4 gene and related alleles to clarify the underlying mechanisms.


3.3 population genetics

3.3.1 Contribution of the cord blood banks to population genetics studies

3.3.1.1 Background

Concerns exist about the underutilization and underrepresentation of ethnic minorities in donor registries. Non-European individuals are poorly represented within the total number of donors, leading to unequal access to HSCT for patients of non-European heritage. The likelihood of finding an optimal donor is reported to vary largely among ethnic groups, with the highest probability among whites of European descent (75%) and the lowest among blacks of South or Central American descent (16%) (Gragert et al., 2014; Pidala et al., 2013). In fact, due to genetic diversity in human populations, Europeans and non-Europeans do not share HLA combinations and are therefore infrequently HLA-matched. For these reasons, CBBs and registries are highly committed to recruiting donors with the aim of increasing the number and genetic diversity of those available for search. Initiatives include brochures and promotional materials, translated into several languages, that are distributed to donation centers, as well as the involvement of bilingual translators and cultural mediators. We have also adopted this strategy at our CBB in Pavia since 2006. As HLA-matching requirements are less stringent than for BM and PBSC donors, cord blood may represent an alternative source to overcome the limits of the current ethnic composition of donor registries, leading to an increased chance of finding a compatible HSC donor for minorities. Moreover, it is known that the proportion of non-Europeans in CBB inventories worldwide is higher than in donor registries (see figure 3.3.1). All these reasons support the universal use of cord blood for patients of non-European descent (Barker et al., 2010; Bordoni et al., 2015; Ustun et al., 2014). Reviewing the activity of our CBB for the period 2006-2015, we found that 701 non-Italian donors have donated cord blood and joined our CBB program over the years, including 456 of European and 245 of extra-European origin, respectively.
Among the 245 donors coming from extra-European countries, 86 were from Central and South America (see figure 3.3.2). However, the number of banked CBUs with maternal and/or paternal origin other than Italian, for which DNA is available, is much lower. In fact, one major criterion for banking is a total cell content of not less than 1.5 × 10⁹. This cut-off now leads to a strong selection at the time of banking and excludes some potentially informative samples from our DNA archive: on average, only 10-15% of all CBU collections are suitable for processing, cryopreservation and long-term storage.


Figure 3.3.1 Geographical distribution of cord blood units available worldwide in the cord blood banks network (2016 data from BMDW, www.bmdw.org).

Figure 3.3.2 Review of the data on the enrolment of cord blood donors with non-Italian parents, subdivided according to the origin of the parents (Pavia CBB activity report for the period 2006-2015, unpublished data). The number of collected CBUs is shown in brackets; the number of banked CBUs is in light blue (parents of European descent) or in violet (parents of non-European descent). The number of CBUs with parents from South America is highlighted at the top right (n=86).


Studies based on uniparental markers (mtDNA and MSY) provide information on the relationships among different populations within the human species, which is useful to trace back ancestry and the history of human migrations. In particular, referring to the first peopling of the Americas, there is increasing evidence that the ancestors of Paleo-Indians reached the Americas from Beringia after the Last Glacial Maximum, about 16 Kya, and populated the continent following two main entry ways: the Pacific coastal route and the ice-free corridor passage between the Laurentide and Cordilleran ice sheets (see figures 3.3.3 and 3.3.4) (Perego et al., 2009; Bodner et al., 2012). The counterpart of mtDNA studies is represented by studies on the Y chromosome, an independent source of genetic information. Although the identification of the male founding lineages has been difficult because of the rarity of autochthonous Y chromosomes in modern American populations (as a consequence of the male-mediated post-Columbian colonization by Europeans), two main native haplogroups were described, haplogroups C and Q, accounting for about 6% and 75% of the Native American Y chromosomes, respectively. Their entry into the Americas is likely to have occurred at different times, with haplogroup Q, observed all over the double continent, arriving prior to haplogroup C, which is essentially limited to North America (Battaglia et al., 2013). In addition to mitochondrial DNA and the Y chromosome, it is increasingly acknowledged that the sequence variation of autosomal DNA, including immunogenetic polymorphisms, may represent an important and complementary tool for anthropological studies (Fernandez-Vina et al., 2012; Sanchez-Mazas et al., 2011).
In fact, HLA class I and class II loci are characterized not only by extensive polymorphism, and consequently great diversity between populations, but also by linkage disequilibrium, leading to the presence of blocks of genes with non-random allele combinations (haplotypes) that tend to be inherited together, with patterns of worldwide distribution in different populations that can be very informative in population studies (Trowsdale, 2011). Disease resistance, and probably also reproductive fitness, are thought to have acted as mechanisms for MHC selection not only at the level of the individual but notably at the population level, shaping the distribution of HLA allele and haplotype diversity observed in modern populations, which often overlaps the geographical boundaries of the areas where the different populations live. In parallel with the improvement in typing methods, and consequently in the level of definition obtained, it has recently been reasserted that HLA patterns of genetic variation worldwide can provide significant information about human geographic expansion, demographic history and cultural diversification.


Figure 3.3.3 Distinctive Paleo-Indian migration routes from Beringia (both the Pacific coastal route and the ice-free corridor) marked by two rare mtDNA haplogroups, X2a and D4h3a (Perego et al., 2009).

Figure 3.3.4 The three migration models into the South American continent: A) the incubation of population groups arriving from the north in a northern area of South America and a late split into coastal and continental population groups after the full development of all major D1g and D1j subclades; B) a coastal southward migration followed by the colonization of the continental interior by trans-Andean migrations, with limited later exchange along the cordillera; and C) a coastal southward migration and trans-Andean colonization of the continental interior, followed by extensive trans-Andean migrations, with bidirectional gene flow between west and east, especially in the Southern Cone (Bodner et al., 2012).


3.3.1.2 Aim of the research

As our CBB stores a DNA archive of all its banked CBUs, and the geographical and ethnic origin of the family, including grandparents, can be easily reconstructed, we investigated its potential application in population genetics and possible interactions with other disciplines such as archeology, linguistics and paleoclimatology. The immunogenetic data of our CBUs banked and listed for clinical purposes consist of the typing of class I and II HLA loci, namely low resolution for HLA-A and -B and high resolution (allele level) for HLA-C and -DRB1. In parallel, the mother's HLA typing at low resolution for HLA-A, -B and -DRB1 is defined for each CBU, enabling the identification of the maternal inherited haplotype (MIH) and the deduction of the paternal inherited one (PIH), which provide information about the maternal and paternal lineage, respectively. In this study we have evaluated the contribution of these data in a scenario where uniparental markers are also considered, focusing on Central and South America as the geographical origin of the samples.

3.3.1.3 The sample

We reviewed the data on the ethnic composition of our CBB inventory and were able to find 48 CBU DNA samples stored from 1997 to 2014 and suitable to be enrolled in a study on the origin of South American populations performed in collaboration with the Laboratory of Population Genetics, DBB, University of Pavia (Prof. A. Torroni, Prof. O. Semino). All CBUs and samples were collected after the appropriate informed consent had been signed, and the confidentiality of the donors was always preserved, with protocols approved by the Ethics Committee for Clinical Experimentation of the University of Pavia (Board minutes of the 11th of April 2013). For the 48 cord blood donors (21 males, 27 females) included in this study, the maternal or the paternal geographical origin was documented as Central and South America, namely: Argentina (n=7), Brazil (n=5), Chile (n=2), Cuba (n=5), Dominican Republic (n=8), Ecuador (n=11), El Salvador (n=1), Peru (n=6), Venezuela (n=3). According to the geographical origin reported for the maternal and paternal lineage, respectively, 42 DNA samples were used for mtDNA analysis (18 males, 24 females) and 12 for MSY analysis. Out of the 21 males, 8 were used for both mtDNA and Y-chromosome analysis and 10 for mtDNA analysis only (in fact, 4 out of the 12 males analyzed for the Y chromosome could not be analyzed for mtDNA for the purpose of this study, as the mother was of European descent). Subsequently, sample recruitment continued to include those CBUs cryopreserved in 2015 (El Salvador, n=1, female) and 2016 (Peru, n=1, male). Later on, we modified our strategy to increase the number of informative samples and decided to also include in the study those CBUs not suitable for banking, by storing their DNA if the ethnic background was of interest. Thanks to this approach we obtained 4 further samples (Cuba, n=1, female; Argentina, n=1, female; Peru, n=2, males), increasing the overall sample size of this study. Furthermore, the collaboration has been extended to the other Italian CBBs, which are all connected in a national network (ITCBN) under the coordination of the National Blood Center and the National Transplant Center, and to the Italian Registry (IBMDR). A multicentric national study was launched in 2017 to centralize in Pavia, as far as possible, all DNA of ITCBN CBUs whose origin is in Central and South America, and to contribute to enlarging the existing set of mtDNA, Y chromosome and HLA typing data. So far 4 out of 19 Italian CBBs have formally joined the initiative. ERCBB (Bologna) sent 19 samples (7 males, 12 females); Firenze CBB is ready to send 14 samples (6 males, 8 females); and UniCatt CBB and Calabria CBB are completing the review of their inventory data and retrieving the samples. CBU samples will be centralized to IBMDR for HLA typing of relevant loci, including maternal typing for the reconstruction of MIH and PIH, and for the validation of data for listing. The 19 samples sent by ERCBB were selected on the basis of a documented maternal or paternal geographical origin in Central and South America, namely: Argentina (n=2), Brazil (n=5), Chile (n=1), Cuba (n=2), Colombia (n=1), Dominican Republic (n=1), Ecuador (n=4), Honduras (n=1), Peru (n=1), Venezuela (n=1). The immunogenetic data of the ERCBB CBUs consist of low resolution typing for HLA-A and -B and high resolution typing for -DRB1, as per IBMDR standards, while the mother's HLA typing for the definition of MIH and PIH is not available (as most CBBs perform confirmation of the maternal haplotype not at the time of listing but at CBU release for transplantation).
3.3.1.4 Results and discussion

3.3.1.4.1 mtDNA analysis

For mtDNA, D-loop analysis was first performed to classify the samples according to mitochondrial haplogroups. After this identification and selection by screening the mtDNA control region, direct sequencing of the entire mitogenome was carried out accordingly. Of the original 42 DNA samples suitable for mtDNA analysis, 28 were found to belong to native mitochondrial haplogroups. Out of these 28, 14 candidate mtDNAs were selected because of their origin (9 from Ecuador and 5 from Peru, respectively) and were the first to be completely sequenced. Data on mtDNA sequencing were used for phylogeny construction in a larger dataset of 217 samples (208 from Ecuador and nine from Peru) to evaluate whether mitogenome variation in these countries, located in the northernmost part of the South American continent, may provide new clues on how and when human populations arrived and spread in South America (see section 3.3.2 for more details). The results of this study have been reported in a manuscript submitted for publication in the journal Molecular Biology and Evolution (Brandini et al., in press). Out of the 28 samples found to belong to native mitochondrial haplogroups by D-loop analysis, the 14 samples of non-Ecuadorian, non-Peruvian origin also underwent direct sequencing, with the following results: Argentina (C1c, n=1; D1j1a, n=1), Chile (D1g, n=1; A2, n=1), Cuba (C1c, n=1; A2, n=2), Dominican Republic (A2, n=2; D, n=2), Venezuela (C1c, n=1; B2, n=2). Focusing on the cohort of 16 CBU samples coming from Ecuador (n=11) and Peru (n=5), 12 females and 4 males, the aforementioned 14 (9 from Ecuador and 5 from Peru) turned out to be of Native American ancestry upon mtDNA analysis and were assigned to mtDNA haplogroups as follows: A2 (n=2), A2ac (n=1), B2 (n=3), B2b (n=1), C1b (n=1), C1c (n=1), C1d (n=1), D1 (n=3) and D1f (n=1). The remaining two samples (both from Ecuador) were assigned to the non-native haplogroups U2d3 and L3e2b (see table 3.3.1). mtDNA analysis of the 3 Peruvian samples subsequently recruited in the period 2015-2016 showed native mtDNA haplogroups for all of them, namely B2 (n=2) and D1 (n=1) (see table 3.3.1). Therefore these samples were also used for the largest dataset of the study on mitogenome variation and Paleo-Indian entry into South America detailed in section 3.3.2, and were included in the manuscript by Brandini et al.

Table 3.3.1 Results of mtDNA analysis on 19 CBU samples coming from Ecuador (n= 11) and Peru (n=8), from Pavia CBB.

mtDNA analysis – Pavia CBB

native haplogroup | n (total) | country of origin, and n
A2 | 2 | Ecuador=2
A2ac | 1 | Ecuador=1
B2 | 5 | Ecuador=1 / Peru=4
B2b | 1 | Ecuador=1
C1b | 1 | Peru=1
C1c | 1 | Ecuador=1
C1d1 | 1 | Ecuador=1
D1 | 4 | Ecuador=1 / Peru=3
D1f | 1 | Ecuador=1

non native haplogroup | n (total) | country of origin, and n
U2d3 | 1 | Ecuador=1
L3e2b | 1 | Ecuador=1

Total | 19 | Ecuador=11 / Peru=8

Referring to the multicentric national study, ERCBB (Bologna) sent 19 samples (7 males, 12 females), all of which underwent mtDNA analysis. Seven out of 19 turned out to be of Native American ancestry and were assigned to mtDNA haplogroups as follows: A2 (n=1), B2 (n=3), C1b (n=1), D1 (n=2). Five samples were from Ecuador (n=4; mtDNA haplogroups B2=1, H=1, J1c=1, X2=1) and Peru (n=1; mtDNA haplogroup D1) (see table 3.3.2).

Table 3.3.2 Results of mtDNA analysis on 5 CBU samples coming from Ecuador (n= 4) and Peru (n=1), from ERCBB (Bologna).

mtDNA analysis – ERCBB (Bologna)

native haplogroup | n (total) | country of origin, and n
B2 | 1 | Ecuador=1
D1 | 1 | Peru=1

non native haplogroup | n (total) | country of origin, and n
H | 1 | Ecuador=1
J1c | 1 | Ecuador=1
X2 | 1 | Ecuador=1

Total | 5 | Ecuador=4 / Peru=1

3.3.1.4.2 Y-chromosome analysis

For MSY, RFLP analysis was performed for the signature markers (M130 and M242) of the two main Native American founding lineages, haplogroups C and Q, as well as for other informative markers of major Old World lineages. Out of the 48 DNA samples of the original dataset, 12 males were suitable for MSY analysis. Only one CBU, from El Salvador, was found to belong to Y-chromosome haplogroup Q. Considering the three Peruvian samples recruited in the period 2015-2016, only one male was suitable for MSY analysis, and turned out to be a member of haplogroup Q. Both these samples will be included in future studies of this haplogroup. No sample was found to belong to haplogroup C.


Among the remaining 11 DNA samples of the original dataset, the other Y-chromosome haplogroups were assigned as follows: Argentina (G, n=1; E, n=1), Brazil (R1b, n=1), Dominican Republic (E, n=2; R1b, n=1), Ecuador (E, n=2), Venezuela (R1b, n=1; J, n=1); one sample coming from the Dominican Republic could not be assigned. As for the cohort of 16 CBU samples from Ecuador (n=11) and Peru (n=5), 4 were males but only 2 were suitable for the analysis. As detailed above, both samples, from Ecuador, turned out not to be of Native American ancestry (haplogroup E, n=2). Concerning the 3 Peruvian samples, all males, recruited in the period 2015-2016, one sample turned out to be a member of haplogroup Q, while the other two were unsuitable for the purpose of this analysis as the father was of European descent. Finally, referring to the multicentric national study, 7 out of the 19 samples sent by ERCBB (Bologna) were males. MSY analysis revealed that none was of Native American ancestry. They were classified as follows: haplogroup R1b=4 (Argentina=1, Ecuador=1, Honduras=1, Peru=1) and haplogroup E=1 (Dominican Republic=1), while 2 have not been assigned yet (Argentina=1, Brazil=1). Focusing on the CBU samples coming from Ecuador and Peru only, 2 were males and both were suitable for the analysis. As detailed above, both samples, from Ecuador and Peru, turned out not to be of Native American ancestry (Y-chromosome haplogroup R1b). Table 3.3.3 summarizes the results of the MSY analysis for CB samples from Ecuador and Peru (from both Pavia CBB and ERCBB). Our findings are not surprising, as non-native admixture is observed to be mainly male-mediated in the geographical regions investigated here.

Table 3.3.3 Results of MSY analysis of three CBU samples from Ecuador (n=2) and Peru (n=1), collected at Pavia CBB, and two CBU samples from Ecuador (n=1) and Peru (n=1), collected at the ERCBB.

MSY analysis – Pavia CBB and ERCBB

native haplogroup | n (total) | country of origin, n and CBB
Q | 1 | Peru=1 (PVCBB)

non native haplogroup | n (total) | country of origin, n and CBB
E | 2 | Ecuador=2 (PVCBB)
R1b | 2 | Ecuador=1 / Peru=1 (ERCBB)

Total | 5 | Ecuador=3 (PVCBB=2; ERCBB=1) / Peru=2 (PVCBB=1; ERCBB=1)

ERCBB = Bologna CBB; PVCBB = Pavia CBB


3.3.1.4.3 HLA allele and haplotype analyses

HLA studies in Native American populations showed that the number of allelic lineages is significantly reduced compared with other populations, but with high levels of heterozygosity, with the exception of the DPB1 locus. In the American Indian tribes, very few allelic lineages (4 HLA-A, 7 HLA-B, 7 HLA-C, 4 HLA-DRB1, 2 HLA-DQA1, 2 HLA-DQB1 and 5 HLA-DPB1) have been observed, with several alleles not occurring in other outbred populations or tribes. These alleles might have arisen in the Americas as novel alleles, and thus can be considered characteristic of Native Americans (Fernandez-Vina et al., 2012). Focusing on the cohort of 16 CBU samples coming from Ecuador (n=11) and Peru (n=5), 12 females and 4 males, we looked back at the HLA system, in particular at class II HLA loci. For listing, the current definition of our CBUs relies on HLA typing at low resolution for the HLA-A and -B loci and at high resolution for HLA-C and -DRB1. Therefore DRB1 alleles were already defined. For each CBU, our CBB routinely performed the corresponding mother's HLA typing (HLA-A, -B and -DRB1 loci at low resolution). This approach ensures that the maternal inherited haplotype is known for every CBU (and the paternal one can be easily derived). Low resolution HLA-A and -B and high resolution HLA-C and -DRB1 typing of the CBUs and mothers was performed either by revSSO with Luminex® xMAP® technology or by LiPA revPCR-SSO plus PCR-SSP, while high resolution DQA1 and DQB1 typing was performed by PCR-SSP. It is known that only five of the 13 DRB1 allelic lineages described so far occur in Native Americans (DRB1*04, DRB1*08, DRB1*09, DRB1*14, DRB1*16). We therefore first examined which HLA class II DRB1 alleles were present in our cohort of 16 samples and compared them to the frequencies reported for the native Ecuadorian Cayapa and Peruvian Quechua populations in the Allele Frequencies Net Database (AFND) (www.allelefrequencies.net), the reference databank for population studies on the HLA system.
The AFND is a freely accessible database that stores population frequencies for alleles or genes of the immune system in worldwide populations and is regularly updated (Dos Santos et al., 2016). Among the 16 CBU samples from Ecuador and Peru we found several DRB1 alleles described as characteristic of the Cayapa and Quechua (and rarely detected in Europeans), namely DRB1*04:07 (n=2), DRB1*08:02 (n=2), DRB1*09:01 (n=2) and DRB1*14:02 (n=1), belonging to all the aforementioned native allelic lineages with the exception of DRB1*16 (see figure 3.3.5). As in other Native American populations, DRB1*16:02 is the only allele of the DRB1*16 lineage found in the Cayapa and Quechua, where it is present at low frequencies (5.7% in the Quechua and less than 1% in the Cayapa) (Trachtenberg et al., 1995; Tsuneto et al., 2003).
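The first screening step described above, checking whether a DRB1 allele falls within one of the five lineages reported in Native Americans, can be sketched as follows. This is only an illustration: the function names are hypothetical, the example input list is invented (DRB1*03:01 is included merely as an allele from a lineage outside the five), and lineage membership alone does not make an allele characteristic of Native Americans, so comparison with reference population frequencies remains necessary.

```python
# DRB1 lineages reported in the text as occurring in Native Americans
NATIVE_DRB1_LINEAGES = {"04", "08", "09", "14", "16"}


def drb1_lineage(allele):
    """Return the first-field lineage of an allele name such as 'DRB1*04:07'."""
    locus, _, fields = allele.partition("*")
    if locus != "DRB1" or not fields:
        raise ValueError(f"not a DRB1 allele: {allele!r}")
    return fields.split(":")[0]


def in_native_lineage(allele):
    """True if the allele belongs to one of the five listed lineages."""
    return drb1_lineage(allele) in NATIVE_DRB1_LINEAGES


# Hypothetical input list, for illustration only
observed = ["DRB1*04:07", "DRB1*08:02", "DRB1*09:01", "DRB1*14:02", "DRB1*03:01"]
candidates = [a for a in observed if in_native_lineage(a)]
print(candidates)
```

Alleles retained by this filter are only candidates; whether they are actually characteristic of a given native population still has to be established against the reported frequencies.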


Figure 3.3.5 DRB1 alleles described as characteristic of the Cayapa and Quechua found upon the analysis of the immunogenetic data of 16 CBU samples from Ecuador (n=11) and Peru (n=5). Allele frequency (AF) is also provided. Data from www.allelefrequencies.net.


The HLA class II region contains four primary polymorphic loci (DRB1, DQA1, DQB1 and DPB1), separated by 60, 20 and 400 kb, respectively, with extensive allelic variation and very infrequent recombination. Recombination between DQB1 and DPB1 is reported at a rate of 1%, while recombination between DQA1 and DRB1 or between DQB1 and DRB1 has never been observed. This is the reason why we extended the HLA typing of our samples to include DQA1 and DQB1, and were thus able to define HLA DR-DQ haplotypes. As a result, we confirmed the presence of native HLA DR-DQ haplotypes, namely: HLA-DRB1*04:07-DQA1*03:01-DQB1*03:02 (n=2); HLA-DRB1*08:02-DQA1*04:01-DQB1*04:02 (n=2); HLA-DRB1*09:01-DQA1*03:02-DQB1*03:03 (n=2). All the aforementioned DRB1 alleles described as characteristic of the Cayapa and Quechua (and rarely detected in Europeans) were found to belong to previously reported native HLA DR-DQ haplotypes, with the exception of DRB1*14:02. In Native Americans (Trachtenberg et al., 1995), each DRB1 allele has a single primary (high-frequency) association with DQB1, even if many of the common DRB1 alleles are also found on one or a few additional DQB1 haplotypes besides the most common one. Our findings are in agreement with this previous report where, for instance, DRB1*08:02 is described as typically associated with DQB1*04:02 and DRB1*09:01 with DQB1*03:03 (as in our four cases), and DRB1*04:07, which is the most common allele in the Cayapa people, is described as being found only on the DQB1*03:02 haplotype (as in our two cases). DRB1*04:07 is also reported with high frequency among the Quechua (14.8%), as are DRB1*08:02 (12.5%), DRB1*09:01 (18.2%) and DRB1*14:02 (13.6%) (Tsuneto et al., 2003).

Table 3.3.4 Native HLA DR-DQ haplotypes found upon the analysis of the immunogenetic data of 16 CBU samples from Ecuador (n=11) and Peru (n=5).

Interestingly, in both cases of HLA-DRB1*09:01-DQA1*03:02-DQB1*03:03, the native haplotype was the paternal inherited one (PIH). The first case was a female from Peru with mtDNA haplogroup C1 and an HLA maternal inherited haplotype (MIH) not characteristic of the reference population. A native PIH provides information on the paternal lineage (which cannot be obtained by Y-chromosome analysis without sampling the father); concerning the maternal lineage, the native mtDNA traces back to maternal ancestry and is only apparently in contrast with the non-native MIH (as this could be, for instance, the mother's PIH). The second case was a male from Ecuador (mtDNA haplogroup B2, Y-chromosome haplogroup E), in whom both the maternal (HLA-DRB1*14:02-DQA1*05:03-DQB1*03:01) and the paternal (HLA-DRB1*09:01-DQA1*03:02-DQB1*03:03) haplotypes were native. These data can be interpreted as complementary to, and not in contrast with, those obtained by the analyses of uniparental markers. A native MIH and a native mtDNA haplogroup support a native descent for the maternal lineage, while a native PIH associated with a non-native Y haplogroup may simply mean that the PIH is the father's MIH. The two cases of HLA-DRB1*08:02-DQA1*04:01-DQB1*04:02 were a female from Peru with mtDNA haplogroup B2, in whom HLA-DRB1*08:02-DQA1*04:01-DQB1*04:02 represented the MIH, and another female from Peru with mtDNA haplogroup D1, in whom HLA-DRB1*08:02-DQA1*04:01-DQB1*04:02 represented the PIH and HLA-DRB1*04:07-DQA1*03:01-DQB1*03:02 the MIH. In this case again the native PIH provides information on the paternal lineage that could otherwise be obtained only by testing the father. The two cases of HLA-DRB1*04:07-DQA1*03:01-DQB1*03:02 were the second DRB1*08:02 case described above, in which HLA-DRB1*04:07-DQA1*03:01-DQB1*03:02 is the MIH associated with HLA-DRB1*08:02-DQA1*04:01-DQB1*04:02 as PIH, and a female from Ecuador in whom it represented the MIH and the mtDNA haplogroup was C1d. We found that the native allele HLA-DRB1*14:02 was present in two cases in the HLA-DRB1*14:02-DQA1*05:03-DQB1*03:01 haplotype, where the preferential association DRB1*14:02-DQB1*03:01, previously described as predominant in Native Americans, is maintained together with a previously undescribed association with the DQA1*05:03 allele (instead of the typical DQA1*05:01). However, the DQA1*05:03 allele occurs at a frequency of ~4.4% among the Cayapa while it is not present in Afro-Ecuadorians, supporting the hypothesis that we may have found a previously undetected native haplotype. One case coincides with the first case described above for the HLA-DRB1*09:01-DQA1*03:02-DQB1*03:03 haplotype; the second is the case of a female from Peru with mtDNA haplogroup B2.
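The logic used throughout this section to separate the maternal inherited haplotype (MIH) from the paternal inherited one (PIH) can be sketched as below. This is a simplified illustration under stated assumptions, not the laboratory procedure: alleles are compared at low resolution (first field) because the mother is typed at low resolution, no recombination between loci is assumed, a locus is left unresolved when both or neither of the child's alleles match the mother, and the typing shown is entirely hypothetical.

```python
def first_field(allele):
    """Reduce 'DRB1*04:07' to its low-resolution form 'DRB1*04'."""
    return allele.split(":")[0]


def split_parental_alleles(child, mother):
    """For each locus, order the child's alleles as (maternal, paternal).

    child, mother: dict mapping locus -> tuple of two allele names.
    A locus maps to None when inheritance cannot be resolved from
    these data alone.
    """
    result = {}
    for locus, (a, b) in child.items():
        maternal = {first_field(m) for m in mother[locus]}
        a_in = first_field(a) in maternal
        b_in = first_field(b) in maternal
        if a == b and a_in:
            result[locus] = (a, b)   # homozygous: one copy from each parent
        elif a_in and not b_in:
            result[locus] = (a, b)
        elif b_in and not a_in:
            result[locus] = (b, a)
        else:
            result[locus] = None     # both or neither allele shared
    return result


# Hypothetical typing, for illustration only
child = {"A": ("A*02", "A*24"), "B": ("B*35", "B*39"),
         "DRB1": ("DRB1*08:02", "DRB1*04:07")}
mother = {"A": ("A*02", "A*68"), "B": ("B*39", "B*15"),
          "DRB1": ("DRB1*04", "DRB1*14")}

split = split_parental_alleles(child, mother)
mih = tuple(pair[0] for pair in split.values() if pair)
pih = tuple(pair[1] for pair in split.values() if pair)
print("MIH:", mih)
print("PIH:", pih)
```

In this invented example the paternal haplotype is obtained purely by exclusion, which mirrors why a native PIH can inform on the paternal lineage even when the father has not been sampled.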
Intriguingly, allelic variants of HLA-DQA1 and -DQB1 have been associated with susceptibility/resistance to onchocerciasis, a helminthic infection caused by the nematode Onchocerca volvulus that causes blindness and debilitating skin lesions. Onchocerciasis, introduced by the slave trade, is endemic among the Cayapa, where DQA1*04:01 (allele frequency 24%, much higher than in other Central and South American populations) seems to represent a protective allele (as in Afro-Ecuadorians living in the same area, the Esmeraldas region), while DQA1*03:01 is only suggestive of susceptibility. DQA1 alleles seem to be involved in other infectious diseases such as tuberculosis, leprosy and schistosomiasis, confirming the pathogen-driven selective pressure acting on HLA genes and providing a possible explanation for their allele and haplotype frequencies in the Cayapa (De Angelis et al., 2011). DQA1*04:01 and *03:01 are also very frequent in the Quechua, accounting for 13.6% and 36.4% of all DQA1 alleles, respectively, while the most frequent DQB1 alleles are *03:01 (31.8%), *03:02 (18.2%), *03:03 (18.2%) and *14:02 (13.6%).

In the Cayapa (and, at least in part, in the Quechua) the DQA1*01 and *02 and the DQB1*05 and *06 lineages are absent, as are the DRB1*01, *03, *07 and *10 lineages (Imanishi et al., 1992), consistent with a bottleneck in the ancestral population that reached the double continent (Torroni et al., 1993) or with a long history of balancing selection (Imanishi et al., 1992). In contrast, we found these alleles in our samples as part of non-native HLA haplotypes, indicative of a more admixed population of origin. These alleles are described at low frequencies in several native populations of South America, where they are generally considered markers of gene flow from other continents (Tsuneto et al., 2003).

Looking in more detail at the sample from Ecuador assigned to the non-native mtDNA haplogroup U2d3, it turned out to carry the native DRB1*14:02 in the MIH. However, until DQA1 and DQB1 typing is completed and the entire DR-DQ haplotype is available, this potentially informative observation cannot be considered conclusive. In contrast, the remaining sample from Ecuador assigned to the non-native mtDNA haplogroup L3e2b showed no characteristic DRB1 alleles. As for the three Peruvian samples recruited in 2015-2016 that turned out to harbor native mtDNA haplogroups (B2, n=2; D1, n=1), HLA typing showed the presence of the DRB1*08:02 and DRB1*08:04 alleles and, strikingly, of the unique DRB1*04:17 allele in the PIH of the male sample, which was assigned to the native Y-chromosome haplogroup Q. Complete DQA1 and DQB1 typing, making the entire DR-DQ haplotype available, will allow a more comprehensive evaluation of these findings. DRB1*04:17 most likely originated in South America, and it has so far been found exclusively in the Toba and Wichi populations of Argentina (Zhang et al., 1993).

Two DRB1*0804 alleles have been reported to occur in South Americans. In particular, DRB1*080402 has been described in the Cayapa population, while DRB1*080401 is found in African populations and in populations that received gene flow from sub-Saharan Africans. Subtyping to distinguish between these two alleles and the assessment of the DR-DQ haplotype will clarify the relevance of this finding. The presence of DRB1*080401 in Native Americans has mostly been explained by gene flow from Africans, due to its low frequency and its apparent absence in Eastern Asians. Also, DRB1*080401 is found in a haplotype bearing the DQB1*0301 allele in Africans as well as in some Americans, reinforcing the hypothesis of gene flow. However, among South American native populations, the most frequent haplotype is DRB1*0804-DQA1*0401-DQB1*0402, with a frequency (4.6–8.1%) higher than the estimated admixture with Africans and Europeans. Another hypothesis is that this allele originated from a *08:02 allele (Trachtenberg et al., 1995). Concerning HLA class I loci, the HLA-A*02 allele was found in four cases, but unfortunately A*02:12, which is characteristic of the Cayapa population (60%), was not present in our dataset.


Finally, concerning the multicentric national study, mtDNA analysis has been completed for all 19 samples sent by ERCBB (7 males, 12 females), as has MSY analysis for 5 of the 7 males, while the immunogenetic overview is still ongoing. We examined which HLA class II DRB1 alleles were present in the Bologna cohort of 5 samples (4 from Ecuador and 1 from Peru) and compared them with the frequencies reported for the native Ecuadorian Cayapa and Peruvian Quechua populations in the AFND (www.allelefrequencies.net). In two of the 5 samples we found DRB1 alleles described as characteristic of the Cayapa and Quechua (and rarely detected in Europeans): DRB1*04:07 and DRB1*08:02 (in a female from Ecuador whose mtDNA was assigned to haplogroup B2), and DRB1*04:04 (in a male from Ecuador whose mtDNA and Y chromosome were assigned to haplogroups H and R1b, respectively).

For the first case, DQB1 alleles were typed for the reconstruction of parental haplotypes, and both turned out to be native: HLA-DRB1*04:07-DQB1*03:02 and HLA-DRB1*08:02-DQB1*04:02, even if we cannot distinguish the MIH from the PIH until maternal typing is performed (ERCBB confirms the maternal haplotype not at listing but at CBU release for transplantation). Even without DQA1 typing, given that recombination between the DR and DQ loci is very unlikely, we can consider the above haplotypes as native. This information appears to be complementary to the mtDNA data (native haplogroup B2), as described above for the Pavia CBB samples.

For the second case, DPB1 alleles were typed for the reconstruction of parental haplotypes, and one haplotype turned out to be native: HLA-DRB1*04:04-DPB1*04:02, even if we cannot establish whether it corresponds to the MIH or the PIH until maternal typing is performed. DRB1*04:04 and DPB1*04:02 are present among the Cayapa with allele frequencies of ~4% and 39.5%, respectively; at the DPB1 locus, two alleles (DPB1*04:02 and DPB1*14:02) are reported to predominate, together accounting, in roughly equal proportions, for 89% of all DP alleles. Moreover, among the Cayapa, DRB1*04:04 is found practically only in association with DPB1*04:02, in the DRB1*04:04-DPB1*04:02 haplotype (Trachtenberg et al., 1995). Even though we do not have DQA1 and DQB1 typing, and recombination between the DR and DP loci, although rare, is more likely than between DR and DQ, the disequilibrium between these loci is significant, allowing us to consider the above haplotype as native. This information appears to be complementary to the uniparental data (non-native mtDNA and Y haplogroups H and R1b, respectively), as described above for the Pavia CBB samples. The other alleles and haplotypes from the five Bologna samples do not appear informative for the purpose of this study.

3.3.1.5 Conclusion and perspectives

Our findings indicate that high-resolution definition of HLA class I and II alleles in CBB programs might be advantageous for identifying rarely represented or population-specific alleles. These data, if available when CBUs are listed on the registries, will make it possible to promptly identify HLA-compatible donors for a higher proportion of patients, in particular those of non-European background. Initiatives for enrolling non-resident candidate donors are even more desirable to increase the genetic diversity of the donor pool. Also, high-resolution definition of HLA class I and II alleles, by highlighting the genetic diversity underlying the current level of HLA definition, could be a valuable strategy for inventory requalification.

In parallel with its mission of recruiting donors suitable for allogeneic HSCT, a CBB could also contribute to population studies, including population genetics, by making available its satellite sample collections and donor information. In particular, the study of HLA polymorphism at high resolution proved to be complementary to uniparental markers, especially when the MIH and PIH are available. In fact, maternal HLA typing, which defines the maternal (and indirectly the paternal) inherited haplotype, proved to be a valuable contribution, integrating the information provided by HLA alleles, or even by the CBU haplotypes, alone. For instance, CB data on mtDNA sequencing were used for phylogeny construction in a larger dataset of 217 samples from Ecuador and Peru, to evaluate whether mitogenome variation in these countries may provide new clues on the arrival and spread of Paleo-Indians in South America. The results of this study have been reported in a manuscript that will appear in Molecular Biology and Evolution (Brandini et al., in press).

Concerning HLA class II, the extension of the DR-DQ haplotype to include the DPB1 locus will be particularly interesting. The DPB1 allelic distribution in the Cayapa is dominated by DPB1*0402 and *1402, together accounting for 89% of all alleles. In particular, the DPB1*1401 allele is reported as absent or rare in most human populations (Imanishi et al., 1992), while it is found at very low frequencies (< 5%) in native North Americans, at moderate frequencies (~10%) in native South Americans from Brazil and Argentina, and at unusually high frequencies in the Waorani Indians of Ecuador and in isolated Indian tribes of Colombia (Cerna et al., 1993). Each DPB1 allele has one or more haplotypes in significant positive disequilibrium in the Cayapa, as in other Native American populations. As DRB1 and DPB1 are separated by a recombination interval of 1%, this low but measurable recombination allows the recent evolutionary history of the system to be inferred by studying the frequencies of DR-DP haplotypes and DR-DP linkage disequilibrium data.

Concerning HLA class I loci, HLA-A*24 could be further investigated in our dataset of Ecuadorian and Peruvian samples. In particular, HLA-B alleles, which are the most polymorphic and show typical patterns of allelic variation in Native Americans, could be profitably studied in the future (Fernandez-Vina et al., 1997).
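The DR-DP disequilibrium mentioned above can be illustrated with a short calculation. The Python sketch below is not part of the original analysis. It computes the classical disequilibrium coefficient D = p(AB) - p(A)p(B) and its normalized form D' from the Cayapa allele frequencies quoted in this section (DRB1*04:04 ~4%, DPB1*04:02 39.5%). The haplotype frequency of 0.04 is an assumption, derived from the report that DRB1*04:04 is found practically only together with DPB1*04:02, and the function names are hypothetical. A second function shows how D would be expected to decay by a factor (1 - r) per generation with r = 0.01, under the usual textbook assumption of random mating.

```python
def linkage_disequilibrium(p_a, p_b, p_ab):
    """Return (D, D') for two alleles with frequencies p_a and p_b
    and observed haplotype frequency p_ab."""
    d = p_ab - p_a * p_b
    if d >= 0:
        # Largest positive D compatible with the allele frequencies
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        # Largest (absolute) negative D compatible with the allele frequencies
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    return d, d_prime


def decayed_d(d0, r, generations):
    """Expected D after a number of generations with recombination fraction r
    (random mating assumed)."""
    return d0 * (1 - r) ** generations


# Allele frequencies quoted in the text for the Cayapa
p_drb1_0404 = 0.04
p_dpb1_0402 = 0.395
# ASSUMPTION: the haplotype frequency equals the DRB1*04:04 allele frequency,
# because the allele is reported almost exclusively with DPB1*04:02
p_haplotype = 0.04

d, d_prime = linkage_disequilibrium(p_drb1_0404, p_dpb1_0402, p_haplotype)
print(f"D = {d:.4f}, D' = {d_prime:.2f}")
print(f"D after 100 generations (r = 0.01): {decayed_d(d, 0.01, 100):.4f}")
```

Under these assumptions D' is close to 1, that is, the association is as strong as the allele frequencies allow, which is in line with treating DRB1*04:04-DPB1*04:02 as a single native haplotype.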

In the future, the study of KIR polymorphisms could be reasonably approached, given the evidence of coevolution with HLA in Native Americans (Augusto and Petzl-Erler, 2015). Thanks to the achievements of direct sequencing, a unique variant of the O blood group allele, called O1vG542A, has been identified that is shared among Native Americans but rare in other populations. Therefore, the contribution of other genetic systems, such as the ABO blood group, as complementary information cannot be excluded (Bodmer, 2015; Villanea et al., 2013).

3.3.2 Mitogenome variation in Ecuador and Peru

3.3.2.1 Background

The initial peopling of the Americas is a long-standing topic of debate. The archeological sites of Monte Verde in southern Chile and Pedra Furada in northeastern Brazil have played a major role. Since their preliminary excavations, they have raised the possibility of a human presence in South America, as far south as the Southern Cone, during the final Pleistocene (Guidon and Delibrias, 1986; Dillehay, 1989; Parenti et al., 1990; Parenti et al., 1996; Dillehay and Collins, 1998). In recent years, compelling archeological evidence has attested to human presence about 14.5 Kya at multiple sites in South America (Fraser, 2014; Boëda et al., 2014; Aimola et al., 2014; Dillehay et al., 2017) and to a very early exploitation of extreme high-altitude Andean environments (Rademaker et al., 2014). Studies of mitochondrial DNA (mtDNA) have contributed extensively to the current view that people entered North America approximately 16 Kya (Schurr and Sherry, 2004; Fagundes et al., 2008; O'Rourke and Raff, 2010; Achilli et al., 2013; Llamas et al., 2016), after a period of isolation in Beringia (Tamm et al., 2007; Raghavan et al., 2015; Tackney et al., 2015; Hoffecker et al., 2016) that had a major role in shaping the first settlers' genetic diversity (Perego et al., 2009). As a single maternally inherited locus, mtDNA often does not reflect the whole complexity of past demographic processes, but it allows an extremely detailed reconstruction of the nesting relationships within a phylogeny, a feature that can be extremely informative for dating migration and population separation events, especially when the sequence variation of large datasets of entire mitogenomes is considered (Richards et al., 2016).
Early studies based on RFLPs and mtDNA control-region variation suggested that Native Americans exhibit low variability when compared to other continental contexts, with only four haplogroups, initially named A, B, C and D (Schurr et al., 1990; Torroni et al., 1992; Torroni et al., 1993) and later re-labelled as A2, B2, C1 and D1 (Forster et al., 1996), encompassing the vast majority of mtDNAs in the entire double continent. Subsequent studies, mostly carried out at the level of entire mitogenomes, allowed the phylogenetic dissection of the four major haplogroups and the identification of additional rare haplogroups, bringing the overall number of maternal founding lineages of Asian/Beringian origin to 16 (Tamm et al., 2007; Perego et al., 2010). Among these, eight (A2, B2, C1b, C1c, C1d, C1d1, D1 and D4h3a) are often defined as "pan-American", as they are found across the double continent. The others are either extremely rare (X2g and D4e1) (Perego et al., 2009; Kumar et al., 2011) or generally restricted to the populations of the arctic and subarctic regions of North America.

In addition to the founding haplogroups of Beringian/Asian origin, mitogenome analyses have identified a few sub-haplogroups whose geographical distributions and estimated ages indicate an in situ origin in the Americas shortly after, or within a few millennia from, the initial peopling. Currently they include: B2b, which is shared between North and South America and has been preliminarily dated to 21 Kya on the basis of 14 complete mitogenomes (Taboada-Echalar et al., 2013); B2a, dated to 11-13 Kya, whose distribution is restricted to the US and Mexico, with traces in Canada (Achilli et al., 2013); and four sub-haplogroups (D1g, D1j, C1b13 and B2i2; 11-16 Kya) that have been identified in the Southern Cone of South America (Bodner et al., 2012; de Saint Pierre et al., 2012). The geographical distribution of B2b is best explained by an origin in North American Paleo-Indians (Taboada-Echalar et al., 2013) while they were expanding southward. In this scenario B2b was carried first to Meso-America and then to South America together with A2, B2, C1b, C1c, C1d, C1d1, D1 and D4h3a of Beringian origin. The other five sub-haplogroups instead most likely arose sometime later, B2a in North America and the others in the Southern Cone, after the front of the expansion wave had already passed through. They thus remained mostly confined to the geographic area where each arose, especially if the processes of adaptation to the different environments, as suggested by archeological evidence (Rademaker et al., 2014), and of tribalization (Torroni et al., 1993) of Paleo-Indian settlers began early.

3.3.2.2 Aim of the research

A recent study of mitogenomes in the Mediterranean basin has shown that, if the identification and dating of autochthonous sub-haplogroups that arose in situ shortly after the first peopling event is accompanied by the identification and dating of a close upstream node in the phylogeny (which instead arose somewhere else, outside the area of interest, prior to the colonization event), minimum and maximum times for the presence of autochthonous sub-haplogroups in the area can be estimated (Olivieri et al., 2017). This approach can provide rather narrow time boundaries for the peopling event and was applied in the present study with the aim of shedding light on the entry time of Paleo-Indians into South America from a genetic perspective. In the three years of my PhD studies, I contributed to a study (Brandini et al., in press) that applied this approach to a population sample from South America, hoping to identify and accurately date as many sub-haplogroups as possible, including those that arose in South America early after the peopling event and those, such as B2b, that instead arose in North or Central America after the human entry from Beringia, but prior to the peopling of South America.

3.3.2.3 The sample

A total of 227 modern individuals from the northwestern area of South America (Ecuador and Peru) were sampled and, after a preliminary survey of the mtDNA control region, the 217 mitogenomes of Native American ancestry were completely sequenced. These mitogenomes were then evaluated phylogenetically together with all previously published mitogenomes (both modern and ancient) from the same geographic area and, finally, with all closely related mitogenomes from the entire double continent reported in the literature. These analyses allowed the detection of numerous novel sub-haplogroups belonging to two classes: those that arose in South America early after its peopling and those that instead originated in North or Central America and reached South America with the first settlers. Coalescence age estimates for these sub-haplogroups were found to be in agreement with archaeological evidence attesting to human presence in South America 14.5 Kya and provided time boundaries indicating that early Paleo-Indians probably moved from North to South America over the short time frame of 1.5 Ky.

Of the 227 sampled individuals, 217 were Ecuadorians (93 Native Americans and 124 Mestizos), representative of all major regions of Ecuador (table 3.3.5), and ten were Peruvians (all Mestizos) (table 3.3.6). Ethnicity and genealogical information were ascertained by direct interview. Appropriate written informed consent was obtained from all individuals. Genomic DNA was extracted and purified from buccal swabs (202 Ecuadorians), mouthwash (1 Peruvian and 4 Ecuadorians) or cord blood (9 Peruvians and 11 Ecuadorians) (table 3.3.7), following standard phenol/chloroform methods or by automated device (cord blood). MtDNAs of Native American ancestry were identified and selected through a preliminary survey of the mtDNA control region, from nucleotide position (np) 16024 to np 300, following a standard Sanger protocol (Karachanak et al., 2012). The identified mutational motifs, relative to the revised Cambridge Reference Sequence (rCRS) (Andrews et al., 1999), allowed the classification of mtDNAs into Native American and Old World haplogroups (217 and 10, respectively). The 217 subjects (208 from Ecuador and nine from Peru) harbouring diagnostic mutational motifs of Native American haplogroups then underwent sequencing of the entire mitogenome.
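The sample accounting described above can be checked with a few lines of code. The Python sketch below is only an illustration, not part of the study: it uses exclusively the counts stated in the text (217 Ecuadorians and ten Peruvians sampled, of whom nine and one, respectively, carried Old World haplogroups), and the variable and function names are hypothetical.

```python
# Counts taken from the text: individuals sampled per country and
# mtDNAs classified into Old World haplogroups by the control-region survey
sampled = {"Ecuador": 217, "Peru": 10}
old_world = {"Ecuador": 9, "Peru": 1}


def native_counts(sampled, old_world):
    """Per-country number of mtDNAs with Native American motifs,
    i.e. those selected for complete mitogenome sequencing."""
    return {country: sampled[country] - old_world[country] for country in sampled}


native = native_counts(sampled, old_world)
total_sampled = sum(sampled.values())
total_native = sum(native.values())
total_old_world = sum(old_world.values())

print(native)
print(total_sampled, total_native, total_old_world)
```

The totals reproduce the figures given in the text: 227 individuals sampled, 217 mitogenomes of Native American ancestry (208 from Ecuador and nine from Peru) and ten Old World mtDNAs.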

Table 3.3.5 Geographic origin of the 217 Ecuadorean samples and their subdivision into regions and provinces (Brandini et al., in press). Sample IDs are those reported in the phylogenetic trees, except for the nine mitogenomes (in bold) classified into Old World mtDNA haplogroups.


Regions   Provinces   Sample IDs(a)   No. per province   % per region

North Carchi #010, #062, #063, #064 4 3.69 Ecuador Imbabura #024, #071, #282, #653 4

#009, #011, #012, #014, #018, #042, #043, #052, #070, #166, North- #191, #279, #281, #285, #292, Central Pichincha #293, #294, #302, #304, #305, 34 15.67 Ecuador #319, #322, #334, #415, #426, #462, #463, #469, #487, #489, #490, #578, #579, #654

#008, #015, #022, #051, #072, Chimborazo 11 #130, #313, #314, #427, #483, #524

Central #007, #075, #132, #145, #154, 13.82 Ecuador Cotopaxi #240, #280, #315, #321, #428, 14 #465, #491, #570, #582

Tungurahua #049, #050, #074, #414, #535 5

Azuay #128, #522 2 South- Central Cañar #187 1 2.30 Ecuador Morona-Santiago #399, #458 2

El Oro #053, #169 2

#067, #129, #133, #134, #135, #136, #137, #138, #140, #141, #143, #148, #161, #164, #167, #168, #171, #172, #174, #175, #176, #179, #181, #182, #184, #185, #186, #283, #288, #289, South #291, #295, #296, #297, #306, #308, #316, #317, #320, #326, 36.41 Ecuador Loja 77 #327, #328, #329, #331, #333, #337, #340, #342, #343, #347, #348, #354, #355, #442, #443, #444, #446, #447, #448, #449, #450, #451, #452, #454, #456, #460, #468, #485, #492, #526, #541, #577, #581, #584, #590, #642, #652


Litoral Bolìvar #020, #298, #464 3 2.30 region Los Rìos #044, #371 2

Manabì #069 1 Pacific 0.92 Santo Domingo region #017 1 de los Tsàchilas

#016, #019, #041, #048, #068, #131, #139, #142, #144, #160, #162, #165, #170, #173, #178, #180, #183, #188, #212, #286, #287, #290, #299, #301, #309, Unknown #310, #311, #312, #318, #332, 54 24.88 #338, #339, #341, #353, #445, #453, #455, #457, #466, #473, #494, #525, #569, #583, #586, #587, #591, #592, #648, #649, #650, #651, #655, #656

Table 3.3.6 Ethnic origin of the 217 Ecuadorean samples and the ten individuals from Peru (underlined) (Brandini et al., in press). Individuals are classified as either 'Native Americans' or 'Mestizos'. For Native American individuals, when available, the ethnicity is specified. Sample IDs are those reported in the phylogenetic trees. The ten mtDNAs (in bold) belonging to Old World haplogroups were not included in the phylogenetic analyses.

Ethnicity Sample IDsa N

"Native Americans"

(autochthonous ethnic groups)

Cañari #187 1

Cayambe #064, #279, #463 3

Otavalo #024 1

Panzaleo (Quichua) #049, #132, #154, #570, #582 5

Puruhà #314, #427, #428 3

Quichua #069, #298 2

Quitu-Cara/Cayambe #322, #469 2


Quitu-Cara #487 1

Salasaca #044 1

Saraguro #348, #442, #443 3

Shuar #399, #458 2

Tsàchila #017 1

#067, #072, #128, #133, #134, #135, #136, #137, #138, #139, #141, #142, #143, #145, #148, #160, #161, #162, #171, #172, #173, #174, #176, #179, #180, #181, #182, #183, #184, #185, #283, #286, #287, #289, #290, #295, Others 68 #296, #297, #306, #309, #310, #311, #312, #316, #317, #318, #326, #329, #337, #339, #340, #341, #347, #353, #354, #444, #445, #446, #447, #448, #449, #450, #451, #452, #454, #455, #466, #525

#007, #008, #009, #010, #011, #012, #014, #015, #016, #018, #019, #020, #022, #041, #042, #043, #048, #050, #051, #052, #053, #062, #063, #068, #070, #071, #074, #075, #129, #130, #131, #140, #144, #146, #164, #165, #166, #167, #168, #169, #170, #175, #178, #186, #188, #191, #194, #212, #240, #259, #262, #280, #281, #282, #285, #288, #291, #292, #293, #294, #299, #301, #302, "Mestizos" #304, #305, #307, #308, #313, #315, #319, #320, #321, 134 #327, #328, #331, #332, #333, #334, #338, #342, #343, #355, #371, #414, #415, #426, #453, #456, #457, #460, #462, #464, #465, #468, #473, #483, #485, #489, #490, #491, #492, #494, #503, #522, #524, #526, #535, #541, #569, #577, #578, #579, #581, #583, #584, #586, #587, #590, #591, #592, #598, #623, #624, #642, #648, #649, #650, #651, #652, #653, #654, #655, #656, #657

Table 3.3.7 Haplogroup frequencies of mtDNAs from Ecuador (Brandini et al., in press). Only mtDNAs sequenced in this study were included.

Haplogroups     Native Americans (%)   Mestizos (%)   All Ecuadoreans (%)
                N=93                   N=124          N=217
Pan-American    93 (100)               115 (92.74)    208 (95.85)
  A2            9 (9.68)               28 (22.58)     37 (17.05)
  B2            61 (65.59)             52 (41.94)     113 (52.07)
  C1b           20 (21.51)             14 (11.29)     34 (15.67)
  C1c           0 (0.00)               2 (1.61)       2 (0.92)
  C1d           1 (1.08)               6 (4.84)       7 (3.23)
  D1            1 (1.08)               12 (9.68)      13 (5.99)
  D4h3a         1 (1.08)               1 (0.81)       2 (0.92)
Old World       0 (0.00)               9 (7.26)       9 (4.15)
  L2a1          0 (0.00)               1 (0.81)       1 (0.46)
  L3e2b         0 (0.00)               4 (3.23)       4 (1.84)
  R0a           0 (0.00)               2 (1.61)       2 (0.92)
  U2d3          0 (0.00)               1 (0.81)       1 (0.46)
  U5b3f         0 (0.00)               1 (0.81)       1 (0.46)
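The percentages in table 3.3.7 are simple proportions of the counts over the group sizes. As a minimal sketch (the names `COUNTS`, `pct` and `frequencies` are hypothetical and not taken from the study), the following Python code recomputes the frequencies of the pan-American haplogroups from the counts listed in the table.

```python
# Group sizes and haplogroup counts as listed in table 3.3.7
N_NATIVE, N_MESTIZO = 93, 124
COUNTS = {  # haplogroup: (Native Americans, Mestizos)
    "A2": (9, 28),
    "B2": (61, 52),
    "C1b": (20, 14),
    "C1c": (0, 2),
    "C1d": (1, 6),
    "D1": (1, 12),
    "D4h3a": (1, 1),
}


def pct(count, total):
    """Frequency as a percentage, rounded to two decimals as in the table."""
    return round(100 * count / total, 2)


def frequencies(counts, n_native, n_mestizo):
    """Return {haplogroup: (% in Native Americans, % in Mestizos, % overall)}."""
    table = {}
    for haplogroup, (native, mestizo) in counts.items():
        table[haplogroup] = (
            pct(native, n_native),
            pct(mestizo, n_mestizo),
            pct(native + mestizo, n_native + n_mestizo),
        )
    return table


FREQS = frequencies(COUNTS, N_NATIVE, N_MESTIZO)
for haplogroup, row in FREQS.items():
    print(haplogroup, row)
```

The same function can be applied to the Old World haplogroups by adding their counts to the dictionary.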

Table 3.3.8 Frequencies of the sub-haplogroups identified in the Ecuadorian and Peruvian modern populations (Brandini et al., in press). Subjects with Old World mitogenomes were excluded. Only mitogenomes of Native American ancestry from this study, HGDP and Tito et al., (2012) were included because they were randomly selected. This led to the exclusion of 13 A2 (Tamm et al., 2007; Cardoso et al., 2012), one B2 (Tamm et al., 2007), one C1b (Greenspan, direct submission 2015), nine C1d (Perego et al., 2010) and 12 D4h3a (Perego et al., 2009) mitogenomes.

Haplogroup/        Ecuador (%)    Peru (%)
Sub-haplogroup     N=208          N=335
A2                 37 (17.79)     68 (20.30)
A2k                4 (1.92)       7 (2.09)
A2y                2 (0.96)       2 (0.60)
A2z                0 (0.00)       4 (1.19)
A2aa               0 (0.00)       5 (1.49)
A2ac               13 (6.25)      0 (0.00)
A2am               0 (0.00)       2 (0.60)
A2ar               2 (0.96)       2 (0.60)
A2as               0 (0.00)       4 (1.19)
A2at               0 (0.00)       8 (2.39)
A2au               0 (0.00)       5 (1.49)
A2av               6 (2.88)       3 (0.90)
A2aw               3 (1.44)       0 (0.00)
Other A2           7 (3.37)       26 (7.76)
B2                 113 (54.33)    147 (43.88)
B2b                64 (30.77)     61 (18.21)
>B2b5              22 (10.58)     2 (0.60)
>B2b6              26 (12.50)     5 (1.49)
>B2b7              6 (2.88)       0 (0.00)
>B2b8              3 (1.44)       1 (0.30)
>B2b9              1 (0.48)       10 (2.99)
>B2b10             0 (0.00)       10 (2.99)
>B2b11             0 (0.00)       11 (3.28)
>B2b12             1 (0.48)       6 (1.79)
>B2b13             0 (0.00)       3 (0.90)
>Other B2b         5 (2.40)       13 (3.88)
B2l                8 (3.85)       0 (0.00)
B2o                0 (0.00)       0 (0.00)
B2q                17 (8.17)      1 (0.30)
B2y                0 (0.00)       7 (2.09)
B2z                14 (6.73)      0 (0.00)
B2aa               1 (0.48)       6 (1.79)
B2ab               0 (0.00)       14 (4.18)
B2ac               0 (0.00)       5 (1.49)
B2ad               0 (0.00)       3 (0.90)
B2ae               0 (0.00)       3 (0.90)
B2ag               0 (0.00)       7 (2.09)
B2ah               0 (0.00)       3 (0.90)
Other B2           9 (4.33)       37 (11.04)
C1                 43 (20.67)     67 (20.00)
C1b                34 (16.35)     62 (18.51)
>C1b2              0 (0.00)       10 (2.99)
>C1b16             0 (0.00)       5 (1.49)
>C1b21             0 (0.00)       3 (0.90)
>C1b23             16 (7.69)      0 (0.00)
>C1b24             0 (0.00)       4 (1.19)
>C1b25             0 (0.00)       3 (0.90)
>C1b26             1 (0.48)       4 (1.19)
>C1b28             3 (1.44)       0 (0.00)
>C1b29             4 (1.92)       0 (0.00)
>Other C1b         10 (4.81)      33 (9.85)
C1c                2 (0.96)       5 (1.49)
C1d                7 (3.37)       0 (0.00)
D4h3a              2 (0.96)       3 (0.90)
D1                 13 (6.25)      50 (14.93)
D1f                9 (4.33)       3 (0.90)
D1k                0 (0.00)       5 (1.49)
D1o                0 (0.00)       4 (1.19)
D1p                0 (0.00)       3 (0.90)
D1q                0 (0.00)       4 (1.19)
D1r                0 (0.00)       6 (1.79)
D1s                0 (0.00)       4 (1.19)
D1t                0 (0.00)       3 (0.90)
D1u                0 (0.00)       5 (1.49)
Other D1           4 (1.92)       13 (3.88)

3.3.2.4 Results

3.3.2.4.1 The mitogenome variation

To survey mitogenome variation in northwestern South America, DNA was obtained from 217 Ecuadorians (93 Native Americans and 124 Mestizos), representative of all major regions of Ecuador, and from ten Peruvian individuals, all Mestizos (tables 3.3.5 and 3.3.6). A preliminary survey of the mtDNA control region showed that 208 (96%) of the Ecuadorians harbour mitogenomes of Native American ancestry. The founder pan-American haplogroup B2 is very common in northwestern South America (52.1%), but all the others (A2, C1b, C1c, C1d, D1 and D4h3a) were also detected. Only nine mitogenomes, all from Mestizos, were members of Old World haplogroups: L2a1 (N=1), L3e2b (N=4), R0a (N=2), U2d3 (N=1) and U5b3f (N=1). These findings are consistent with the multiple ancestral sources (Native Americans, Europeans and Africans) that have contributed to the formation of the modern Ecuadorian population (González-Andrade et al., 2007; Santangelo et al., 2017). As for Peru, one of the ten mtDNA control regions was classified into an Old World haplogroup, the European U5a1a1. The 217 mitogenomes of Native American ancestry mentioned above (208 from Ecuador and nine from Peru) underwent complete sequencing and were employed, together with 430 previously published Ecuadorian and Peruvian mitogenomes from both modern and ancient samples, to reconstruct the phylogenies of the pan-American haplogroups A2, B2, C1b, C1c, C1d, D1 and D4h3a in northwestern South America.

3.3.2.4.2 Phylogenetic analysis

The phylogenetic relationships of the 647 mitogenomes showed that within each of the pan-American haplogroups, we can detect extensive differentiation into derived branches, including a large number (N=48) of novel sub-haplogroups, often branching into further sub-clades.


The frequencies of sub-haplogroups in modern Ecuadorians and Peruvians were determined by excluding mitogenomes from non-random population surveys (table 3.3.8). Despite some differences, in particular for D1 (6.3% in Ecuador vs. 14.9% in Peru), frequencies at the level of founding pan-American haplogroups are rather similar in the two countries. We also assessed some mitogenome diversity parameters in the two geographic areas (table 3.3.9) employing the same dataset used to assess haplogroup frequencies. Also in this case we did not detect major differences.

Table 3.3.9 Mitogenome diversity in Ecuador and Peru. Only subjects with Native American mitogenomes were considered. Only modern mitogenomes from this study, HGDP and Tito et al., (2012) were included here because they were randomly selected; indels and heteroplasmies were not considered.

We also included 68 previously published ancient mitogenomes in our analyses, all from Peru (Gómez-Carballa et al., 2015; Fehren-Schmitz et al., 2015; Llamas et al., 2016). Similar to present-day mitogenomes, they encompassed all founding pan-American haplogroups, except for the relatively rare D4h3a. Considering the phylogenetic relationships between ancient mitogenomes and those from modern Ecuadorians and Peruvians, only one ancient Peruvian B2 mitogenome (#242), dated at 1,639±275 years ago, turned out to be completely identical to a modern (Peruvian) mitogenome (#241).

After having classified modern and ancient mitogenomes from Ecuador and Peru into numerous sub-haplogroups, we searched for their diagnostic mutational motifs in our in-house database, which encompasses more than 1,700 published mitogenomes of Native American origin. We identified 85 mitogenomes belonging or phylogenetically closely related to the sub-haplogroups found in Ecuador and Peru. Their phylogenetic and geographical evaluation revealed two classes of sub-haplogroups: those restricted to South America and those with representatives also in the northern part of the double continent. It should be underscored that the distinguishing mutational motifs of the sub-haplogroups in both classes are restricted to the Americas. This implies that they arose somewhere in the Americas during the (long) time frame that ranges from the initial human entry into the continent to very recent times.

To estimate the minimum ages of these sub-haplogroups, we calculated coalescence times with Maximum Likelihood (ML) computation. As expected, sub-haplogroup ages varied widely, with some close to the postulated entry time of Paleo-Indians into North America and others that are extremely young (< 1 Ky). The sub-haplogroups with ML ages equal to or older than 14 Ky are listed in table 3.3.10; they include most (9 out of 11) of those also found in the northern part of the double continent and many sub-haplogroups detected only in South America.

Table 3.3.10 Distribution of sub-haplogroups with ML age estimates equal to or older than 14 Ky (Brandini et al., in press). (a) The underlined sub-haplogroups are restricted to South America. (b) Number of all mitogenomes included in the analysis (both modern and ancient). The number of ancient mitogenomes is in brackets. (c) ML age estimates are from the dataset including both modern and ancient mitogenomes (Supplementary table S5). (d) The sub-haplogroup nomenclature differs from that reported in PhyloTree (http://www.phylotree.org). (e) Sub-haplogroups defined for the first time in this study.

Sub-haplogroups(a)   N(b)       ML age estimates ± SE (Ky)(c)   Geographical distribution
A2k(d)               20         15.32 ± 1.97                    USA, Mexico, Venezuela, Wayuu (Colombia/Venezuela), Ecuador, Peru
A2y(d)               15         14.38 ± 1.64                    Ecuador, Peru
A2z(d)               13         14.43 ± 2.00                    USA, Puerto Rico, Peru
A2ar                 5          15.20 ± 1.37                    Guatemala, Ecuador, Peru
A2as(e)              6 (2)      15.88 ± 1.56                    Peru
A2at(e)              10 (2)     15.89 ± 1.41                    Peru
B2b                  158 (11)   15.99 ± 0.92                    USA, Puerto Rico, Mexico, Colombia, Bolivia, Brazil, Venezuela, Ecuador, Peru
>B2b2                4          14.76 ± 1.19                    USA, Bolivia
>B2b3                8          14.77 ± 1.09                    USA, Puerto Rico, Brazil, Venezuela
>B2b6(e)             31         14.56 ± 1.11                    Ecuador, Peru
>B2b11(e)            15 (4)     13.99 ± 1.17                    Peru
B2l(e)               10         15.64 ± 1.31                    Mexico, Ecuador
>B2l1                9          14.23 ± 1.38                    Mexico, Ecuador
B2o1(d)              5          14.19 ± 1.96                    Bolivia, Ecuador
B2q                  22 (1)     14.52 ± 1.86                    USA, Mexico, Ecuador, Peru
B2aa(e)              15 (5)     14.60 ± 1.63                    Mexico, Ecuador, Peru
B2ab(e)              15 (1)     14.28 ± 1.13                    Bolivia, Peru
B2ac(e)              5          16.07 ± 1.27                    Peru
C1b21                5 (2)      14.53 ± 1.71                    Peru
C1b24(e)             4          16.27 ± 1.57                    Peru
C1b26(e)             5          16.66 ± 1.21                    Ecuador, Peru
>C1b26a(e)           4          15.55 ± 1.23                    Ecuador, Peru
>>C1b26a1(e)         3          14.47 ± 1.26                    Peru
C1b29(e)             5          14.64 ± 2.38                    Ecuador
D4h3a11              3          15.67 ± 1.81                    Peru
D1f                  27         18.05 ± 1.38                    USA, Mexico, Brazil, Colombia, Venezuela, Ecuador, Peru
D1k                  9          20.57 ± 2.76                    USA, Mexico, Peru
>D1k1(e)             5          15.81 ± 2.40                    USA, Peru
D1o(e)               5 (1)      17.92 ± 1.63                    Peru
D1q(e)               4          17.15 ± 1.52                    Peru
D1r                  6          17.86 ± 1.93                    Peru
D1t(e)               3          18.81 ± 1.43                    Peru

SOUTH AMERICAN-SPECIFIC SUB-HAPLOGROUPS

Most sub-branches within the pan-American haplogroups A2, B2, C1b, C1c, C1d, D1 and D4h3a turned out to encompass only mitogenomes from South America. Some of these sub-haplogroups are apparently restricted to Ecuador (A2ac2, A2av1a, A2aw, B2b5a, B2b5b1a, B2b6a1a, B2b7, B2b8a, B2l1a, B2z, C1b23, C1b28, C1b29 and C1d1f), some to Peru (A2z2, A2as, A2at, A2au, B2b9a, B2b9b, B2b10, B2b11, B2b12b, B2b13, B2aa1a, B2ab1a1, B2ac, B2ad, B2ae, B2y2, B2ag, B2ah, C1b16, C1b19, C1b21, C1b24, C1b25, C1b26a1, C1b27, C1d1e, D4h3a11, D1k1a, D1o, D1p, D1q, D1r, D1s, D1t and D1u), others are detected in both countries (A2y, A2av1, B2b6, B2b8, B2b9, B2b12, B2q1, C1b26), and some also harbour representatives from elsewhere in South America (B2ab, B2b5, B2o1). These geographical distributions indicate that their distinguishing mutational motifs most likely arose in situ (in South America) sometime during or after the process of human entry and spread into the southern sub-continent. It is interesting to note that, among the 647 mitogenomes from Ecuador and Peru, we did not detect any belonging to the sub-haplogroups (B2i2, C1b13, D1g and D1j) previously identified in the Southern Cone of South America (de Saint-Pierre et al., 2012; Bodner et al., 2012). This observation not only supports the view that B2i2, C1b13, D1g and D1j indeed arose in the Southern Cone in the terminal phase of the migration process from north to south, but also reveals that there was very limited maternal gene flow, if any, from south to north along the Pacific in the following millennia. Many of the sub-haplogroups observed in Ecuador and Peru encompass only a rather small number of mitogenomes, thus their estimated ages should be considered with some caution. However, among the oldest (table 3.3.10) there are four, each represented by at least 15 mitogenomes, with virtually identical coalescence ages: A2y (14.4 ± 1.6 Ky), B2b6 (14.6 ± 1.1 Ky), B2b11 (14.0 ± 1.2 Ky) and B2ab (14.3 ± 1.1 Ky).

SUB-HAPLOGROUPS FOUND ALSO IN NORTH AND CENTRAL AMERICA

This second class is made up of eleven sub-haplogroups (A2k, A2z, A2ac, A2ar, A2av, B2b, B2l, B2q, B2aa, D1f and D1k) that are also detected in North or Central America. Three explanations can be envisioned for their geographical distributions: 1) they arose in North or Central America, prior to the arrival of humans in South America, and reached the southern continent with the first colonizers of the sub-continent; 2) they arose in North or Central America but their presence in South America is accounted for by later migratory events; 3) they arose in South America after human arrival in the subcontinent and later spread towards the north. Most of these sub-haplogroups encompass only a limited number of mitogenomes, but they are characterized by some informative phylogenetic features that help to discriminate between the three possibilities listed above. For instance, B2q, B2aa, B2l and D1k show sub-branches (or haplotypes) that depart directly from the sub-haplogroup root and are detected either only in Mexico (and the US) or only in Ecuador and/or Peru (figure 3.3.6), a feature in agreement with the scenario that they arose in North America. Moreover, their coalescence times are all >14 Ky (table 3.3.10), indicating that they most likely reached South America with the first entry into the subcontinent.
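The statement that most of this second class (9 out of 11) appears among the sub-haplogroups aged 14 Ky or more can be checked directly against table 3.3.10. The following is only a minimal sketch: the ages are transcribed by hand from the table, the variable names are illustrative, and sub-haplogroups missing from the dictionary are simply those not listed in the table.

```python
# Sub-haplogroups of the second class, as listed in the text.
northern_class = ["A2k", "A2z", "A2ac", "A2ar", "A2av", "B2b",
                  "B2l", "B2q", "B2aa", "D1f", "D1k"]

# ML age estimates (Ky) transcribed from table 3.3.10.
# Sub-haplogroups not listed in the table are absent from this dict.
ml_age_ky = {"A2k": 15.32, "A2z": 14.43, "A2ar": 15.20, "B2b": 15.99,
             "B2l": 15.64, "B2q": 14.52, "B2aa": 14.60, "D1f": 18.05,
             "D1k": 20.57}

THRESHOLD_KY = 14.0  # cut-off used for table 3.3.10

# Keep the members of the class whose tabulated age reaches the cut-off.
old = [h for h in northern_class if ml_age_ky.get(h, 0.0) >= THRESHOLD_KY]
not_listed = [h for h in northern_class if h not in old]

print(len(old), "of", len(northern_class), "at or above", THRESHOLD_KY, "Ky")
print("not listed in table 3.3.10:", not_listed)
```

Under this transcription, A2ac and A2av are the two members of the class that do not appear in the table.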


Figure 3.3.6 Phylogenetic relationships and geographical distributions of B2q (A), B2aa (B), B2l (C) and D1k (D) modern mitogenomes (Brandini et al., in press). Each colour corresponds to a country, thus the colours of mitogenomes (squares) in the schematic trees correspond to their (maternal) geographical origin, except for white, which indicates an unknown source. In the tree, each square corresponds to one mitogenome unless otherwise indicated by the number next to or below the square. Each black dot on a branch indicates a mutation. The red dots in the sub-branch D1k1 (panel D) are all reversions. Their reliability is dubious and they might be the cause of the high age estimate of D1k. In the map (drawn by hand), the number of mitogenomes for each haplogroup is shown per country and the sizes of circles are proportional to the numbers of mitogenomes.

A similar branching pattern is also observed for sub-haplogroup A2ar (15.2 ± 1.4 Ky), with the difference that here the northern mitogenome departing from the root is not from Mexico or the US but from Guatemala (figure 3.3.7, panel A).


Figure 3.3.7 Phylogenetic relationships and geographical distributions of A2ar (A), D1f (B), A2z (C) and A2k (D) modern mitogenomes (Brandini et al., in press). Each color corresponds to a country, thus the colors of mitogenomes (squares) in the schematic trees correspond to their (maternal) geographical origin, except for white, which indicates an unknown source. In the tree, each square corresponds to one mitogenome unless otherwise indicated by the number next to or below the square. Each black dot on a branch indicates a mutation. In the map, the number of mitogenomes for each haplogroup is shown per country and the sizes of circles are proportional to the numbers of mitogenomes.

Not all members of this class of sub-haplogroups harbor only a few representatives. B2b, as mentioned above, is extremely frequent both in Ecuador and Peru. A phylogeny of B2b has been previously proposed (Taboada-Echalar et al., 2013). With the addition to the phylogeny of 134 novel B2b modern mitogenomes, we were able to identify nine new internal sub-clades (B2b5-B2b13), some already mentioned above, and obtained a more accurate estimate of its age. Among the 147 B2b mitogenomes included in figure 3.3.8, nine are from North and Central America. The remaining B2b mitogenomes (N=138) are from South America but, unlike the sub-haplogroups mentioned above, they were found not only in the countries along the Pacific coast, but also in the Atlantic regions of the subcontinent (figure 3.3.8).

Figure 3.3.8 Phylogenetic relationships and geographical distributions of B2b modern mitogenomes (Brandini et al., in press). Each colour corresponds to a country, thus the colours of mitogenomes (squares) in the schematic tree correspond to their (maternal) geographical origin. In the tree, each square corresponds to one mitogenome unless otherwise indicated by the number next to or below the square. Each black dot on a branch indicates a mutation. In the map, the number of mitogenomes is shown per country and the sizes of circles are proportional (except for Ecuador and Peru) to the numbers of mitogenomes.

Many of the mitogenomes from the US belonging to B2b and other sub-haplogroups are from individuals classified as "Hispanics", a rather generic term that refers to individuals of Cuban, Mexican, Puerto Rican, South or Central American origin. Thus, their ancestral ethnogeographic source remains essentially undefined unless genealogical information is available or an accurate evaluation of the phylogenetic relationships between members of the sub-haplogroup is performed. Figure 3.3.8 shows that two of the mitogenomes from the US, including the one from the Pomo (North California), and the one from Mexico depart directly from the root of B2b, in agreement with the scenario that B2b originated in North America as well. Moreover, its age estimate (16.0 ± 0.9 Ky) indicates an early occurrence after the human entry from Beringia. Another of the US mitogenomes clusters within the sub-branch B2b2 together with three mitogenomes from Bolivia. It should be noted that B2b2 (figure 3.3.8) is one of the oldest sub-branches (14.8 ± 1.2 Ky) of B2b and that also in this case the mitogenome from the US diverges directly from the ancestral root. This suggests that not only B2b but also B2b2 might have arisen in North or Central America. The phylogenetic connections of the remaining B2b mitogenome from the US (KM102108), a member of B2b3 (figure 3.3.8), are instead clearly indicative of Puerto Rico as an ancestral source. Indeed, it is a direct derivative of a haplotype found in two subjects from Puerto Rico, in turn deriving from another one found in both Puerto Rico and Venezuela that is distantly related to mitogenomes from Brazil (Kayapo and Yanomama). These phylogenetic links become clearer when considering the strong Taino mtDNA component in modern Puerto Ricans and that the Tainos were the final outcome of migratory events from the Orinoco river basin, which eventually reached the Greater Antilles about 5 Kya (Martínez-Cruzado et al., 2001). Thus, B2b3 most likely represents an additional ancient branch (14.8 ± 1.1 Ky) that arose in South America at a very early stage of human spread into the southern subcontinent, a branch now found in the Greater Antilles because of ancient migrations, and in the US because of recent gene flow from Puerto Rico. It should also be mentioned that Martínez-Cruzado has raised the possibility, based also on the apparent lack (at least so far) of B2 mitogenomes in ancient DNA studies in the Caribbean, that most B2 mitogenomes in Puerto Ricans might have arrived during the post-contact era, a scenario that could also apply to B2b3. While we cannot rule out this scenario at the moment, we can say that even if B2b3 arrived in Puerto Rico after the Taino period, its ancestral source remains unchanged (the Orinoco river basin). Phylogeographic data suggest a similar Puerto Rican origin also for the US mitogenomes of sub-haplogroup A2z (figure 3.3.7), which is split into A2z1 and A2z2, with the former branch already identified in Puerto Rico and Cuba by assessing mtDNA control-region variation (Vilar et al., 2014).
The phylogeny shows that all six US mitogenomes are members of A2z1, a branch that they share with three Puerto Rican mitogenomes and one previously published from Peru. Taking into account that three of the US mitogenomes are identical to two from Puerto Rico, it is likely that the six US members of A2z1 are indeed all from Puerto Rico, and that A2z (14.4 ± 2.0 Ky), similarly to B2b3, also arose in South America. Finally, there are two additional sub-haplogroups, D1f and A2k, with phylogeographic features that are suggestive of a North American origin. Despite their much lower frequency, they both show geographic distributions very similar to that of B2b, with mitogenomes not only from North America and northwestern South America (Ecuador and Peru), but also from South American countries of the Atlantic area (figure 3.3.7).

3.3.2.5 Discussion

To date, the nature and extent of mtDNA variation in northwestern South America is still poorly investigated at the level of entire mitogenomes. In this study, we analyzed 647 mitogenomes from Ecuador and Peru, detecting all pan-American haplogroups (A2, B2, C1b, C1c, C1d, D1 and D4h3a) that, according to previous studies (Tamm et al., 2007; Perego et al., 2009; Achilli et al., 2013), entered North America from Beringia along the Pacific coast. This confirms that the founding haplotypes of these haplogroups were also involved in the human entry into South America. As expected, we did not detect representatives of the other founding Native American haplogroups for which either a different entry route (ice-free corridor) (X2a, X2g and C4c) or a later arrival from Beringia/Alaska (A2a, A2b, D2a and D3) has been proposed (Perego et al., 2009; Hooshiar Kashani et al., 2012; Achilli et al., 2013). Notably, we also did not detect any mitogenomes of either Oceanian or South East Asian origin, a scenario that has recently been re-opened by two genome-wide studies that apparently identified signals of shared ancestry with Australo-Melanesians in some Amazonian populations (Surui, Karitiana, Xavante) of Brazil (Raghavan et al., 2015; Skoglund et al., 2015). Within each of the pan-American haplogroups, we observed extensive differentiation into derived branches, including 48 newly defined sub-haplogroups. When we searched for the diagnostic mutational motifs of these derived branches in a database of about 1,700 previously published mitogenomes of Native American origin, we identified 85 mitogenomes belonging or closely related to these sub-haplogroups. An overall phylogeographic assessment revealed two classes of sub-haplogroups: those restricted to South America and those with representatives also in the northern part of the double continent. The geographical distributions of sub-haplogroups that we observed in this study might in some cases be inaccurate.
The first reason for this limitation is that the number of published mitogenomes (about 1,700) available for comparison is not large enough to provide a good representation of the presence/absence of each sub-haplogroup in the different regions of the double continent. The second is that many Native American populations experienced a dramatic size reduction at the time of European contact, with genetic drift playing a major role in shaping the remaining genetic variation. Thus, geographical distributions of sub-haplogroups prior to the arrival of Europeans might be very different from the current ones. We cannot exclude, for instance, that some of the sub-haplogroups that we now consider as restricted to South America might indeed have unsampled representatives in living individuals from North and/or Central America, or that they were present in North or Central America at the time of European arrival and then went extinct. Taking into account the limitations mentioned above, the simplest explanation, at least for the moment, for sub-haplogroups with a geographical distribution restricted to South America is that they arose in South America. If so, the oldest of these sub-haplogroups would have been the first to arise and their coalescence ages would provide a lower boundary for human presence in the southern sub-continent.


Interestingly, among these, the four most represented (A2y, B2b6, B2b11, B2ab; each ≥ 15 mitogenomes) harbor virtually identical ages (table 3.3.10), indicating that humans were already in South America by 14.0-14.6 Kya. This time frame is further supported by the ages of A2z (14.4 ± 2.0 Ky) and B2b3 (14.8 ± 1.1 Ky), which most likely also arose in South America at a very early stage of human spread into the southern subcontinent and are now also found in the Greater Antilles because of ancient migrations and in the US because of recent gene flow from Puerto Rico. The sub-haplogroups found also in the northern part of the double continent are not as numerous as those restricted to South America, but almost all harbor old coalescence ages (figures 3.3.6-3.3.8), as would be expected in the scenario that they arose in North or Central America prior to the spread of Paleo-Indians into South America. These include B2q, B2aa, B2l, D1k, A2ar, D1f and A2k (figures 3.3.6 and 3.3.7) and B2b (figure 3.3.8). These most likely arrived together with the founder haplotypes of haplogroups A2, B2, C1b, C1c, C1d, D1 and D4h3a, thus indicating that overall at least 15 different founding mtDNA haplotypes arrived in South America at the time of human entry into the sub-continent and could have exploited the selective advantage of being part of an expanding wave front (Moreau et al., 2011). Among these, B2b and its derivatives are extremely informative.
They reveal that the early Paleo-Indian carriers of B2b probably moved from North America to the area corresponding to modern Ecuador and Peru over the short time frame of about 1.5 Ky between 16.0 ± 0.9 Kya and 14.6 ± 1.1 Kya, corresponding to the ages of B2b and its oldest South American branch in the northwestern part of the sub-continent, respectively. This finding fits with archaeological evidence attesting to human presence at the Monte Verde site in southern Chile at least 14.5 Kya (Dillehay et al., 2015) and with the conclusions of some earlier mitogenome studies (Bodner et al., 2012; de Saint-Pierre et al., 2012). Finally, some of the sub-haplogroups that arose in North or Central America and later spread into South America also provide valuable information concerning the routes of diffusion of the first South Americans. In particular, the geographical distributions of the A2k, B2b and D1f mitogenomes (figures 3.3.7 and 3.3.8) indicate that the first settler population(s) might have undergone an early split in the northern part of South America (Wang et al., 2007), followed by diffusion along both the Pacific and Atlantic coastal regions (figure 3.3.9).
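To make the notion of "virtually identical ages" more concrete, the four ML estimates for A2y, B2b6, B2b11 and B2ab can be pooled with an inverse-variance weighted mean. This is only an illustrative sketch and not part of the analysis reported here: it treats the four estimates (values from table 3.3.10) as independent, which is an assumption, since B2b6 and B2b11 are nested in the same B2b tree.

```python
import math

# (ML age, SE) in Ky, transcribed from table 3.3.10.
ages = {
    "A2y": (14.38, 1.64),
    "B2b6": (14.56, 1.11),
    "B2b11": (13.99, 1.17),
    "B2ab": (14.28, 1.13),
}

def weighted_mean(estimates):
    """Inverse-variance weighted mean and its standard error.

    Assumes the estimates are independent (a simplification here).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    mean = sum(w * age for w, (age, _) in zip(weights, estimates)) / total
    return mean, math.sqrt(1.0 / total)

mean_ky, se_ky = weighted_mean(list(ages.values()))

# Range of the point estimates, for comparison with the individual SEs.
point_estimates = [age for age, _ in ages.values()]
spread_ky = max(point_estimates) - min(point_estimates)

print(f"pooled age: {mean_ky:.2f} ± {se_ky:.2f} Ky; spread: {spread_ky:.2f} Ky")
```

The spread of the four point estimates is smaller than any of their individual standard errors, which is the sense in which the ages can be read as indistinguishable.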


Figure 3.3.9 Diffusion routes of the first South Americans as suggested by the geographical distributions of A2k, B2b and D1f mitogenomes (Brandini et al., in press). The number of mitogenomes belonging to haplogroups A2k, B2b and D1f is reported per country and the sizes of circles are proportional (except for B2b in Ecuador and Peru) to the numbers of mitogenomes.

It should be underlined that such a scenario is also compatible with an early finding that until now has not been fully evaluated. One of the previously published mitogenomes (GenBank FJ68754) belonging to D4h3a, a haplogroup so far considered a marker of the Paleo-Indian spread along the Pacific coast, has been found in the northeastern part of Brazil (Maranhão state) (Perego et al., 2009). We now know that this mitogenome does not belong to any of the several D4h3a sub-branches that characterize the western part of South America (Lindo et al., 2017); instead, it departs directly from the haplogroup root. Therefore, a post-peopling event of gene flow from the western part of the sub-continent is a rather unlikely explanation for its presence in northeastern Brazil. In contrast, its detection there is fully compatible with the scenario that D4h3a is an additional haplogroup that, similar to A2k, B2b and D1f, was present in both population subsets that moved along the Pacific and Atlantic coasts after the initial split in the northern part of South America.

3.3.2.6 Conclusion

According to genetic evidence, human entry into North America from Beringia most likely occurred ~16 Kya. Recent archaeological evidence attests to human presence ~14.5 thousand years ago (Kya) at multiple sites in South America and at extremely high altitude in the Andes, thus implying an extremely rapid spread along the double continent. To shed light on this issue, we completely sequenced 217 novel modern mitogenomes of Native American ancestry from the northwestern area of South America (Ecuador and Peru) and evaluated them phylogenetically together with other available mitogenomes, both modern and ancient, from the same geographic area, and with all closely related mitogenomes from the entire double continent. We detected a large number of novel sub-haplogroups, often branching into further sub-clades, belonging to two classes: those that arose in South America early after its peopling and those that instead originated in North or Central America and reached South America with the first settlers. Coalescence age estimates provide time boundaries indicating that early Paleo-Indians probably moved from North America to the area corresponding to modern Ecuador and Peru over the short time frame of ~1.5 Ky between 16.0 and 14.6 Kya, confirming the rapid spread of founding groups along the American continent. The results of this study have been reported in a manuscript that will appear in Molecular Biology and Evolution (Brandini et al., in press).
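The width of this time window, and the uncertainty attached to it, can be sketched from the two bounding ages given in the discussion (16.0 ± 0.9 Ky for B2b and 14.6 ± 1.1 Ky for its oldest northwestern South American branch). The snippet below is a back-of-the-envelope illustration, not the method used in the study: it assumes the two estimates are independent and combines their standard errors in quadrature, which likely misstates the true uncertainty for nested clades.

```python
import math

def interval_between(older, younger):
    """Difference between two (age, SE) estimates and its propagated SE.

    Standard errors are added in quadrature, assuming independence.
    """
    (t_old, se_old), (t_young, se_young) = older, younger
    return t_old - t_young, math.sqrt(se_old ** 2 + se_young ** 2)

b2b_root = (16.0, 0.9)          # Ky, age of B2b
oldest_sa_branch = (14.6, 1.1)  # Ky, oldest South American B2b branch

delta_ky, delta_se = interval_between(b2b_root, oldest_sa_branch)
print(f"time frame: {delta_ky:.1f} ± {delta_se:.1f} Ky")
```

With the rounded inputs the window comes out close to the ~1.5 Ky quoted in the text, but with an uncertainty of comparable size, so the figure is best read as an indication of a short interval rather than as a precise duration.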


3.4 Additional projects

3.4.1 Non-classical HLA genetics: Analysis of rs1233334 (HLA-G -725 G/C/A) and rs1063320 (HLA-G +3142 C/G) in cord blood units

3.4.1.1 Background

HLA-G is a non-classical HLA class Ib gene which expresses seven isoforms derived from alternative splicing of a primary transcript: the membrane-bound molecules G1, G2, G3, G4 and the soluble forms G5, G6, G7. The main features of HLA-G are its low polymorphism and restricted tissue expression. Although the HLA-G gene is poorly polymorphic in its coding sequences, the unexpectedly high rate of variation in the 5′URR and in the 3′UTR suggests that the regulation of HLA-G cell surface expression and protein release is of great importance. During pregnancy, both transmembrane and soluble HLA-G molecules (sHLA-G) are physiologically expressed in placental and trophoblast tissues and down-regulate the maternal immune response. HLA-G is contained in cord blood collected at birth, where it has recently been reported to be produced by CD34+ cell progenies (Buzzi et al., 2012), besides cord blood mesenchymal cells, as previously demonstrated (Avanzini et al., 2009). To investigate whether or not HLA-G is related to the induction of tolerance in CBT, genotyping of the CBUs for known HLA-G polymorphisms could be useful to predict the content of sHLA-G in CBUs, and may have an impact on donor selection in the future. Considering the HLA-G 14bp insertion/deletion (INS/DEL) polymorphism in the 3′UTR of the gene, which affects mRNA stability and protein expression, we previously demonstrated in 85 healthy cord blood donors a statistically significant correlation between sHLA-G and CD34+ cell concentrations in the group of HLA-G 14bp INS/INS carriers (r = 0.5662, p-value = 0.0060) (Capittini et al., 2014). Two further polymorphisms known to be involved in sHLA-G production can be considered. One is the HLA-G -725 G/C/A polymorphism (rs1233334), which is located in the promoter region of the HLA-G gene, closely flanking an IRF-1 (interferon response factor-1) binding motif.
Several authors have previously hypothesized differences in the transcriptional properties of the HLA-G -725 G and -725 C alleles, although without the support of functional data. The introduction of an additional methylated cytosine at a CpG dinucleotide in the presence of the -725 C allele may down-regulate transcription of the HLA-G gene itself (Ober et al., 2003). The SNP +3142 C/G in the 3′UTR has also been suggested to affect HLA-G expression. This site was proposed to be targeted by miRNAs, so that the presence of G instead of C would favor mRNA degradation by miRNAs (Veit et al., 2009).

3.4.1.2 Aim of the research

Taking into account that cord blood derives from pregnancy and that HLA-G molecules act in both pregnancy and transplantation as immune modulators, we aimed to identify the best sHLA-G producers among our cord blood donors by defining HLA-G genotypes for the -725 G/C/A and +3142 C/G polymorphisms. An association between HLA-G SNPs and sHLA-G levels would offer a possible explanation (at least in part) of the tolerogenic properties of cord blood, besides providing a new criterion for improved donor selection.

3.4.1.3 The sample

To investigate the role of the -725 G/C/A and +3142 C/G polymorphisms in cord blood sHLA-G levels, rs1233334 and rs1063320 genotyping was performed on 81 and 79 DNA samples from CBUs stored at the Pavia Cord Blood Bank by means of PCR-SBT and TaqMan SNP Genotyping Assay, respectively. sHLA-G levels were determined by ELISA (Exbio) in the plasma samples of the corresponding CBUs.

3.4.1.4 Results and discussion

HLA-G -725 G/C/A and HLA-G +3142 C/G allelic frequencies determined in CBUs were comparable to those previously described for controls of European descent. No statistically significant differences in sHLA-G production were found among the different genotypes of either polymorphism. Mean sHLA-G levels (ng/mL) and 95% CI per genotype were as follows.

rs1233334 (HLA-G -725 G/C/A):
CC  23.0677 (95% CI 8.32–37.8154)
GC  18.0214 (95% CI 10.1044–25.9383)
GG  13.5540 (95% CI 8.8047–18.3032)

rs1063320 (HLA-G +3142 C/G):
CC  12.1036 (95% CI 6.1365–18.0708)
GC  25.9991 (95% CI 16.7273–35.2708)
GG  20.3673 (95% CI 9.7490–30.9868)

3.4.1.5 Conclusion and perspectives

This very preliminary survey suggests the absence of correlation between rs1233334 or rs1063320 and the levels of sHLA-G detected in CBUs. However, a better assessment of the role of the HLA-G -725 G/C/A and HLA-G +3142 C/G polymorphisms requires a larger sample size. Future perspectives include the correlation of these data with those of other polymorphisms involved in HLA-G expression, for instance HLA-G 14bp INS/DEL. Furthermore, HLA-G genotypes and SNPs can be used to calculate the allelic and haplotypic frequencies of the subset population (mainly consisting of healthy European-descent CB donors), providing the reference group for future analyses (for instance in the reproduction and autoimmunity settings) and the basis for comparison with published data from populations of both European and non-European background that may reflect information on their ethnic history, including other non-classical HLA genes such as HLA-E and HLA-F (Ober et al., 2003; Larsen and Hviid, 2009; Castelli et al., 2011; Carvalho dos Santos et al., 2013; Manvailer et al., 2014; Tureck et al., 2014).
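As a coarse illustration of why the genotype means reported in section 3.4.1.4 do not separate, their 95% confidence intervals can be compared pairwise. This sketch is not the statistical test used in the study (which is not specified here); interval overlap is only a rough heuristic, and the dictionary layout and function names are illustrative.

```python
from itertools import combinations

# genotype: (mean, CI lower, CI upper) in ng/mL, from section 3.4.1.4.
shla_g = {
    "rs1233334": {
        "CC": (23.0677, 8.32, 37.8154),
        "GC": (18.0214, 10.1044, 25.9383),
        "GG": (13.5540, 8.8047, 18.3032),
    },
    "rs1063320": {
        "CC": (12.1036, 6.1365, 18.0708),
        "GC": (25.9991, 16.7273, 35.2708),
        "GG": (20.3673, 9.7490, 30.9868),
    },
}

def intervals_overlap(a, b):
    """True if the confidence intervals of two (mean, low, high) tuples overlap."""
    return a[1] <= b[2] and b[1] <= a[2]

def non_overlapping_pairs(groups):
    """Genotype pairs whose 95% CIs do not overlap at all."""
    return [(g1, g2) for g1, g2 in combinations(groups, 2)
            if not intervals_overlap(groups[g1], groups[g2])]

results = {snp: non_overlapping_pairs(groups) for snp, groups in shla_g.items()}
for snp, pairs in results.items():
    print(snp, "non-overlapping genotype pairs:", pairs)
```

For both SNPs every pair of intervals overlaps, which is in line with the lack of significant differences reported above, although overlap by itself does not prove the absence of an effect.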

3.4.2 Study of the HLA characteristics of 46 families from Venezuela

3.4.2.1 Background

The patterns of HLA genetic variation worldwide provide significant information about population diversity related to human geographic expansion, demographic history and cultural diversification. Building on our previous analysis of DNA samples from the CBB (see section 3.3.1), and considering the analogous potential contribution of the DNA archive of bone marrow donors (where the current level of definition is high resolution for both HLA class I and II loci, including the DQ locus), we reviewed the data on the ethnic composition of our inventory and were able to find 92 family donors from Venezuela suitable for enrollment (92 for mtDNA and 44 for MSY) in population studies on the origin of South American populations. Moreover, as they are family donors, parental and/or sibling typings enable the reconstruction of both inherited parental HLA haplotypes (maternal and paternal inherited haplotypes, MIH and PIH) and often also of the non-inherited ones (non-inherited maternal and paternal antigens, NIMA and NIPA). The contemporary population of Venezuela is the result of admixture of Native Americans, Europeans, and Africans, through a process that is not homogeneous throughout the country. In the area of Caracas, it has been reported that the Native American contribution is almost entirely due to females, with scarce representation of non-European and non-African Y-chromosomes (Martinez et al., 2007). Among the indigenous populations inhabiting the area west of Lake Maracaibo, including Sierra de Perija, the Yucpa and Bari were the first Native American tribes of Caribbean stock (speaking a language of Cariban affiliation) to be tested for the HLA system at allelic level (see figure 3.4.1) (Layrisse et al., 2001). In the Yucpa, the presence of ancestral and novel HLA class I and II alleles considered characteristic of Native American populations has been reported.

Figure 3.4.1 Geographical distribution of native ethnic groups of Venezuela.

3.4.2.2 Aim of the research

As allele definition of HLA-A, -B, -C, -DRB1, -DQA1, -DQB1, and optionally -DPB1, as well as full A-B-DR-DQ haplotypes were available for all samples, including the definition of maternal/paternal inherited and non-inherited haplotypes (MIH/PIH and NIMA/NIPA), we examined the HLA alleles and haplotypes of our sample in comparison with the Sierra de Perija Yucpa and Bari, as reference Venezuelan native populations in AFND (www.allelefrequencies.net) (Layrisse et al., 2001). Besides the contribution of HLA alleles and haplotypes, we also took into consideration uniparental markers (mtDNA and MSY) to obtain a more comprehensive view of the genetic composition and ancestry in our dataset (Lee et al., 2015; Martinez et al., 2007). Furthermore, as both parents' typing was available for all these families, besides the maternal and paternal inherited haplotypes (MIH and PIH), the non-inherited ones (NIMA and NIPA) can also be easily deduced and may provide a more comprehensive outline of the maternal and paternal lineages.

3.4.2.3 The sample

We analyzed the offspring DNA samples of 46 families from Venezuela, testing 32 and 20 unrelated individuals for mtDNA and for MSY, respectively. For all 46 samples, high resolution typing was performed for HLA-A, -B, -C, -DRB1, -DQA1, -DQB1 and -DPB1 by Luminex-based revSSO and LIPA revSSO plus SSP-PCR, including the definition of complete maternal/paternal inherited and non-inherited HLA-DR-DP haplotypes (MIH/PIH and NIMA/NIPA).

3.4.2.4 Results and discussion

Concerning mtDNA analysis, 25 out of 32 individuals were assigned to Native American haplogroups (as shown in table 3.4.1) and seven to European and African mtDNA haplogroups (namely: H, n=5; L2a1a2, n=1 and T2b, n=1).

Table 3.4.1 Summary of the individuals from Venezuela assigned to native mtDNA haplogroups.


Concerning MSY analysis, one out of 20 Y-chromosomes belonged to the Native American haplogroup Q, while the others were assigned to non-native haplogroups, namely R1b (n=10), E (n=6) and J (n=3). Given the extensive male-mediated European and African inputs in Venezuelans, similar to other South American populations, these data are not surprising. Concerning the HLA system, the HLA alleles and haplotypes of our Venezuelan samples were all examined and compared to those reported for the Sierra de Perija Yucpa and Bari in AFND (www.allelefrequencies.net). As expected for native populations of South America, the Yucpa population shows a low number of class I variants, which could be explained by random genetic drift after recent bottlenecks or by reduced genetic variability of the founder population. The Yucpa have retained 7 of the hypothetical 20 ancestral HLA class I alleles of the founder population that migrated into this continent through Beringia and present 9 of the novel alleles thought to have originated in situ via recombination or gene conversion under selection pressure. In our Venezuelan families we found several HLA class I alleles reported as frequent in Venezuelan Native Americans, such as A*02:04 in n=3 (allele frequency, AF=31.4% among the Yucpa, the only population in AFND with AF>1%) and B*35:43 in n=4 (AF=15.9% among the Bari, the only population in AFND with AF>1%). A*02:04 is one of the three new alleles postulated to have originated in situ among the Yucpa, where these novel alleles show frequencies above 30% and are included in the most frequent extended haplotypes. We also found the A*02:13 allele in n=2. Although infrequent in the Yucpa (AF=1.2%), in AFND this allele is reported in only one other native Central-South American population, namely the Cayapa people from Ecuador (AF=2.4%) (see figure 3.4.2).
Interestingly, the Yucpa, Ticuna and Cayapa populations, which are culturally different and geographically distant, share two other rare alleles besides A*02:13: B*15:22 is shared by the Yucpa and the Cayapa, and DRB1*08:07 by the Yucpa and the Ticuna. This is indicative of some relationship between these tribes and points to a possible origin from lower Amazonia (Layrisse et al., 2001). Concerning class II, only six DRB1 alleles have previously been reported among the Yucpa, namely DRB1*04:03, *04:07, *04:11, *08:07, *14:02 and *16:02, in tight linkage disequilibrium with specific DQA1 and DQB1 alleles (Layrisse et al., 2001). As DPB1 was also tested in the paper by Layrisse et al., we focused on this locus and found the characteristic DPB1*04:02 in n=15 (AF=78.6% in the Yucpa) and, interestingly, DPB1*14:01 in n=7. DPB1*14:01 is not considered a typical Native American DPB1 allele, but it has previously been described with AF=21.4% in the Yucpa and is only present in DR-DP haplotypes comprising the DRB1*08:07, *14:02 and *04:01 alleles, in decreasing order (Layrisse et al., 2001) (see figure 3.4.2).
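The contrast between the maternal and paternal uniparental markers reported above (25 of 32 native mtDNAs versus 1 of 20 native Y-chromosomes) can be put on a common scale by computing the native fraction of each marker with a confidence interval. The sketch below uses the Wilson score interval, which is an illustrative choice for small samples and not a method taken from this study; the function name and the 1.96 critical value are assumptions.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (approx. 95% for z=1.96)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half

# Counts reported in the text.
mtdna_native, mtdna_n = 25, 32  # Native American mtDNA haplogroups
msy_native, msy_n = 1, 20       # Native American Y haplogroup Q

mt_low, mt_high = wilson_interval(mtdna_native, mtdna_n)
y_low, y_high = wilson_interval(msy_native, msy_n)

print(f"mtDNA native fraction: {mtdna_native / mtdna_n:.2f} "
      f"(CI {mt_low:.2f}-{mt_high:.2f})")
print(f"MSY native fraction:   {msy_native / msy_n:.2f} "
      f"(CI {y_low:.2f}-{y_high:.2f})")
```

Even with these small samples the two intervals do not overlap, consistent with the predominantly male-mediated non-native input described above.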

3.4 Additional projects

Figure 3.4.2 Allele frequencies for A*02:13, A*02:04, B*35:43, DPB1*04:02 and DPB1*14:01 for the Yucpa and Bari reference populations (AFND, www.allelefrequencies.net).

Concerning HLA class II haplotypes, we found the native DQA1*03:01-DQB1*03:02 haplotype in 12 samples (AF=60.6% in Venezuelan Native Americans), and the native DQA1*04:01-DQB1*04:02 haplotype (AF=12.8% in Venezuelan Native Americans) associated with DRB1*08 in 7 samples (see table 3.4.2). Interestingly, in the paper by Layrisse et al., DQA1*03:01 is only reported in association with DQB1*03:02, just as DQA1*04:01 is only reported in association with DQB1*04:02, forming the two aforementioned DQA-DQB haplotypes. Furthermore, in the same paper, DQA1*04:01-DQB1*04:02 is only reported in a haplotype comprising DRB1*08:07, confirming the low variation of DR and DQ alleles and their preferential association in one specific haplotype, as known for Native American populations, sustained by strong LD and balancing selection.

Table 3.4.2 Summary of the Venezuelan individuals assigned to native HLA haplotypes.

Interestingly, A*02:04-B*35:43-DRB1*04:07 in the PIH was associated with the non-native MSY haplogroup R1b, integrating the information on the paternal lineage that can be derived from uniparental markers alone. Similarly, the sample assigned to the non-native mtDNA haplogroup H showed the native DRB1*04:03-DQA1*03:01-DQB1*03:02 haplotype (11.7% in the Yucpa) in the MIH, which integrates the information on the maternal lineage that can be obtained from uniparental markers alone. Finally, the sample belonging to MSY haplogroup Q showed a non-native HLA PIH, which may correspond to the father's MIH (thus not contradicting the information derived from uniparental marker analyses). Outside inbred populations, as in previous reports on HLA allele and haplotype frequencies in Venezuelans, our data show several European alleles, such as A*02, B*51, DRB1*15 and DRB1*07, and African American ones, such as A*68, B*07 and B*15, indicating that our population is more admixed than the reference ones, in agreement with the known great ethnic admixture of the Venezuelan population (del Pilar Fortes et al., 2012).

3.4.2.5 Conclusion and perspectives

As previously demonstrated for the immunogenetic data from the CBB, our data on family donors from Venezuela, although preliminary, indicate that HLA polymorphisms could be complementary to uniparental markers, especially when complete parental inherited haplotypes are defined, providing a useful tool for anthropological studies. A very intriguing perspective of the present study is to investigate the associations between KIR and HLA, since for all our samples HLA-A, -B and -C alleles (which represent all the known ligands of KIR molecules) are defined and KIR genotypes can be determined. KIR diversity has been poorly investigated in Native American populations so far, although they represent ideal target populations due to long-term genetic isolation from populations of other continents, small population size, and great inter-population diversity (Augusto and Petzl-Erler, 2015). A noteworthy exception is the study by Gendzekhadze et al. in the Yucpa people, where the authors were able to determine that only six KIR haplotypes and three HLA epitopes are present in this native population of Venezuela, which may represent the lowest degree of diversity acceptable for any human population (Gendzekhadze et al., 2009).

4. MATERIALS AND METHODS

Cord blood collection was performed by the in utero technique using commercial collection bags containing CPD, according to the procedures of the Pavia CBB, in compliance with NetCord-FACT standards, 6th ed. 2016.

At the CBB, from each CBU suitable for banking, a 3 ml sample was taken in a tube containing EDTA and sent to the Immunogenetics Laboratory, accompanied by one sample of maternal blood. All CBUs and samples were collected after the appropriate informed consent had been signed, and procedures were in place to ensure both confidentiality of the donors and traceability.

4.1 DNA Extraction

Human DNA was obtained from 300 µl of EDTA-anticoagulated cord blood, which first underwent a cell lysis step by adding lysis buffer (300 µl) plus proteinase K (30 µl). After this initial step, the sample was mixed on a vortex mixer at 2200 rpm and incubated at 56°C for 20 minutes. DNA extraction was then performed by a fully automated method (Maxwell®16 System, Promega Instrument, Madison WI, USA) and the DNA obtained was eluted in 100 µl of elution buffer. The DNA solutions were then stored at -20°C in 1.5 ml Eppendorf® tubes.

DNA can also be obtained from chorionic villi (stored at -80°C in 2 ml tubes) as an alternative source of genetic material of the foetus/newborn. After thawing, the chorionic villi sample is cut into small pieces and resuspended in a saline solution; DNA extraction is then performed according to the same procedure as described above.

4.2 DNA Quantification

The concentration of the extracted DNA was determined using the NanoVue™ Plus (VWR International PBI Srl, Milano, Italia) spectrophotometer. The NanoVue is a full-spectrum (220-750 nm) spectrophotometer that measures 3 μl samples with high accuracy and reproducibility. It utilizes a sample retention technology that employs surface tension alone to hold the sample in place. In addition, the NanoVue Plus has the capability to measure highly concentrated samples without dilution.

A 3 μl sample was pipetted onto the end of a fiber optic cable (the receiving fiber). A second fiber optic cable (the source fiber) was then brought into contact with the liquid sample, causing the liquid to bridge the gap between the fiber optic ends. The gap was controlled to both 1 mm and 0.2 mm paths. A pulsed xenon flash lamp provided the light source, and a spectrometer exploiting a linear CCD (charge-coupled device) array was used to analyze the light passed through the sample.

Sample concentration in ng/μl was calculated following the Beer-Lambert equation, based on absorbance at 260 nm (that of nucleic acids):

A = E * b * c

where A is the absorbance represented in absorbance units, E is the wavelength-dependent molar extinction coefficient with units of M⁻¹ cm⁻¹, b is the path length in cm (defined by the gap between the two optical surfaces), and c is the analyte concentration in moles/liter or molarity (M).

At the end of each measurement, the sample concentration in ng/μl is shown directly on the instrument display, together with the A260/A280 ratio (index of protein contamination, as proteins absorb at 280 nm) and the A260/A230 ratio (index of contamination by carbohydrates, phenols, peptides or aromatic molecules). Both ratios are useful to evaluate sample purity. When the A260/A280 ratio is between 1.6 and 1.8, the spectrophotometer reading corresponds, with good probability, to the nucleic acid concentration, so the DNA concentration measured by the NanoVue can be considered acceptable. The optimal value for the A260/A230 ratio is 2.2, which likewise indicates that the measured DNA concentration can be considered acceptable.
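As a minimal sketch of the calculation described above, the following Python snippet converts an A260 reading into a double-stranded DNA concentration and checks the A260/A280 window quoted in the text. It assumes the widely used convention that one absorbance unit at 260 nm over a 1 cm path corresponds to about 50 ng/µl of double-stranded DNA; the function names and the example reading are illustrative and are not taken from the instrument software.

```python
# Assumption: 1 A260 unit over a 1 cm path ~ 50 ng/µl of double-stranded DNA.
DSDNA_NG_PER_UL_PER_A260 = 50.0


def dsdna_concentration(a260: float, path_cm: float = 1.0) -> float:
    """Return the dsDNA concentration in ng/µl from the absorbance at 260 nm.

    Beer-Lambert (A = E * b * c) is rearranged as c = A / (E * b); the
    extinction term is folded into the 50 ng/µl conversion factor.
    """
    if path_cm <= 0:
        raise ValueError("path length must be positive")
    return a260 / path_cm * DSDNA_NG_PER_UL_PER_A260


def a260_a280_acceptable(a260: float, a280: float,
                         low: float = 1.6, high: float = 1.8) -> bool:
    """Check the A260/A280 ratio against the 1.6-1.8 window given in the text."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return low <= a260 / a280 <= high


# Hypothetical reading taken over a 0.1 cm (1 mm) path.
concentration = dsdna_concentration(a260=0.25, path_cm=0.1)
print(f"{concentration:.1f} ng/µl")
```

Normalizing by the path length is what allows readings taken over short paths to be reported on the same ng/µl scale.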

The information obtained (ng/µl) was used to prepare working solutions at the desired concentration, usually 50 ng/µl.
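The preparation of a working solution follows the usual dilution relation C1 × V1 = C2 × V2. The sketch below is illustrative only: the function name and the example stock concentration are hypothetical, and the 50 ng/µl default simply mirrors the working concentration mentioned above.

```python
def working_solution(stock_ng_per_ul: float, final_volume_ul: float,
                     target_ng_per_ul: float = 50.0) -> tuple[float, float]:
    """Return (stock volume, diluent volume) in µl, from C1 * V1 = C2 * V2."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is more dilute than the target concentration")
    stock_volume = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_volume, final_volume_ul - stock_volume


# Hypothetical stock at 200 ng/µl, 100 µl of working solution required.
stock_ul, diluent_ul = working_solution(200.0, 100.0)
print(f"{stock_ul:.1f} µl of stock + {diluent_ul:.1f} µl of diluent")
```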

Before each working session, the instrument is calibrated using a sample containing buffer solution (blank) and a DNA sample at known concentration (usually 20 ± 2 ng/µl).

4.3 Analysis of HLA loci

In 2014, Luminex® xMAP® technology was introduced in our HLA laboratory routine and was applied to CBU typing at high definition for HLA-A, -B, -C and -DRB1. Before 2014, CBU typing was performed by a combined approach relying on LiPA revPCR-SSO plus PCR-SSP.

CBU mothers were always typed concurrently for HLA-A, -B and -DR at low resolution, providing the confirmation of the haplotype segregation, as per our CBB policy.

4.3.1 Molecular analysis by revPCR-SSO and PCR-SSP

Reverse Polymerase Chain Reaction - Sequence Specific Oligonucleotide Probe, or revPCR-SSO, is a probe assay based on the reverse-hybridization principle: amplified and biotinylated DNA is chemically denatured, and the single strands are hybridized with specific oligonucleotide probes immobilized on membrane-based strips (Jordan et al., 1995).

revPCR-SSO is performed using INNO-LiPA Kits (Innogenetics, Ghent, Belgium), consisting of a line probe assay, for in vitro use, designed for the molecular typing of HLA alleles at the allele group level (that is the first two digits after the asterisk in an allele name when following standard HLA nomenclature, e.g. HLA-DRB1*01). The INNO-LiPA HLA typing tests consist of revPCR-SSO assays, based on the reverse hybridization principle. Amplified biotinylated DNA material is chemically denatured, and the separated strands are hybridized with specific oligonucleotide probes immobilized as parallel lines on membrane-based strips, followed by a stringent wash step to remove any mismatched amplified material. Then, streptavidin conjugated with alkaline phosphatase is added and bound to any biotinylated hybrid previously formed. Incubation with a substrate solution containing a chromogen results in a purple/brown precipitate. The reaction is stopped by a wash step, and the reactivity pattern of the probes is recorded.

The steps of INNO-LiPA HLA assays are as follows:

step 1. Amplification of the alleles of an HLA locus. The samples required are: the test sample (10 μl of amplified product), 10 μl of LiPA Control sample and 10 μl of blank amplified control sample (negative control). For this purpose, a specific amplification kit (INNO-LiPA Amp kit) is used.

step 2. Hybridization and stringent wash with the probes immobilized on one INNO-LiPA HLA strip (56°C).

Denaturation and hybridization. A shaking water bath is heated to 56°C ± 0.5°C, checking and adjusting the temperature with a calibrated thermometer and without exceeding the indicated temperature. The Hybridization Solution (saline sodium phosphate EDTA, SSPE, buffer containing 0.5% sodium dodecyl sulphate, SDS) and the Stringent Wash Solution (SSPE buffer containing 0.1% SDS) are pre-warmed in a water bath at a temperature of at least 37°C and not exceeding 56°C, and are mixed before use, ensuring that all crystals are dissolved.

Using forceps, the required number of INNO-LiPA strips is removed from the tube (one strip per test sample plus one strip for the LiPA Control sample and for the blank amplified control sample). Then the required number of test troughs (one trough per test sample) is placed into the tray and 10 μl of Denaturation Solution (an alkaline solution containing EDTA) is pipetted into the upper corner of each trough, allowing denaturation to proceed for 5 minutes at 20-25°C. The pre-warmed Hybridization Solution is shaken and a volume of 2 ml is added to the denatured amplified product in each trough, mixing by gentle shaking and taking care not to contaminate neighbouring troughs during pipetting. The strip, with the marked side of the membrane up, is immediately placed into the trough, completely submerged in the solution. Then the tray is placed into the 56°C ± 0.5°C shaking water bath (80 rpm), the lid is closed, and the tray is incubated for 30 minutes. The water level is adjusted between 1/3 and 1/2 of the height of the trough and the tray is immobilized between two heavy weights to prevent it from sliding, avoiding splashing water from the water bath into the trough.

Stringent wash. After hybridization, the tray is removed from the water bath and held at a low angle while the liquid is aspirated from the trough with a pipette, preferably attached to a vacuum aspirator. 2 ml of pre-warmed Stringent Wash Solution are added to each trough, and rinsing is performed by rocking the tray briefly (10-20 seconds) at 20-25°C. The solution is then aspirated from each trough, and this brief washing step is repeated once. Finally, the solution is aspirated and each strip is incubated in 2 ml of pre-warmed Stringent Wash Solution in the shaking water bath at 56°C ± 0.5°C for 10 minutes, with the lid of the water bath closed. Before incubation, the temperature of the water bath is checked using a calibrated thermometer and adjusted if necessary. Both the concentrated Rinse Solution 5x and the Conjugate 100x are diluted during the stringent wash.

step 3. Color development. All subsequent incubations are carried out at 20-25°C on a shaker. During the incubations, the liquid and test strips should move back and forth in the trough for homogeneous staining. Each strip is washed twice for 1 minute using 2 ml of the Rinse working solution (phosphate buffer containing NaCl, Triton®, and 0.05% MIT/0.48% chloroacetamide, CAA, as preservative, diluted 1/5 (1 part + 4 parts) in distilled or deionized water before use). Then 2 ml of the Conjugate working solution are added to each trough and incubated for 30 minutes while agitating the tray on the shaker. The Conjugate working solution is obtained before use by diluting 1/100 the conjugate (streptavidin labeled with alkaline phosphatase in Tris buffer containing protein stabilizers and 0.01% MIT/0.098% CAA as preservative) in Conjugate Diluent (phosphate buffer containing NaCl, Triton®, protein stabilizers, and 0.01% MIT/0.1% CAA as preservative). Each strip is then washed twice for 1 minute using 2 ml of the Rinse working solution and once more using 2 ml of Substrate Buffer (Tris buffer containing NaCl, MgCl2 and 0.01% MIT/0.1% CAA as preservative). Next, 2 ml of the diluted Substrate working solution are added to each trough and incubated for 30 minutes while agitating the tray on the shaker. The Substrate working solution is obtained before use by diluting 1/100 the substrate (5-bromo-4-chloro-3-indolyl phosphate p-toluidine salt, BCIP, and 4-nitroblue tetrazolium, NBT, in dimethylformamide, DMF) in Substrate Buffer. Color development is stopped by washing the strips twice in 2 ml of distilled water while agitating the tray on the shaker for at least 3 minutes. Using forceps, the strips are removed from the troughs and placed on absorbent paper, letting them dry completely before reading the results. Once developed and dried, the strips are stored in the dark.

step 4. Reading and interpretation of the probe reactivity pattern. A line is considered positive when a clear purple/brown band appears at the end of the test procedure. After checking for the correct reactivity pattern of the LiPA Control sample, the conjugate control line on the strip should be lined up with the conjugate control line on the plastic reading card. Then the positivity of the control lines (first two lines) is checked in order to validate each individual strip. Typing results are based on the reactivity of the probes in the kit: after identification of all probe numbers that are positive on the INNO-LiPA strip, the HLA-locus type is deduced by using the INNO-LiPA typing table, provided with the kit, or the LiPA interpretation software (LiRAS™) (figure 4.3.1).

Figure 4.3.1. Location of the marker line (black), the conjugate control line (conj.control), the HLA-DRB1 Plus control line (HLA-DRB1 Control) and the 37 sequence-specific DNA probes on the INNO-LiPA HLA-DRB1 Plus strip.
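The interpretation in step 4 can be illustrated with a small Python sketch. It is not the LiRAS™ algorithm: it simply assumes that the observed pattern is the union of the probe sets of the two alleles carried by the sample, and it searches a typing table for allele-group pairs that reproduce that pattern. The typing table below is entirely hypothetical (real tables are lot-specific and are supplied with the kit), as are the function and variable names.

```python
from itertools import combinations_with_replacement

# Hypothetical typing table: allele group -> probe lines expected to be positive.
# The probe numbers are invented for illustration only.
EXAMPLE_TYPING_TABLE = {
    "DRB1*01": {3, 7},
    "DRB1*03": {4, 7, 9},
    "DRB1*04": {5, 9},
}


def interpret_strip(positive_probes, controls_positive, typing_table):
    """Return the allele-group pairs whose combined probe pattern equals the observed one."""
    if not controls_positive:
        # The control lines must be positive before a strip can be read.
        raise ValueError("control lines negative: the strip cannot be validated")
    observed = set(positive_probes)
    candidates = []
    # Pairs include homozygous combinations (the same allele group twice).
    for first, second in combinations_with_replacement(sorted(typing_table), 2):
        if typing_table[first] | typing_table[second] == observed:
            candidates.append((first, second))
    return candidates


print(interpret_strip({3, 4, 7, 9}, True, EXAMPLE_TYPING_TABLE))
```

More than one pair may explain the same pattern, which is why the function returns a list of candidates rather than a single result.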

Incubations at 56°C ± 0.5°C during hybridization and stringent wash are the most critical steps in avoiding false positive (temperature too low) or false negative/very weak signals (temperature too high). For this purpose, a shaking water bath with shaking platform (80 rpm) and inclined lid, provided with a calibrated thermometer, allows strict control of temperature variations; the lid of the water bath is kept closed during incubations in order to avoid false positive signals. The water level is also adjusted between 1/3 and 1/2 of the height of the troughs, making sure that they do not float on the water and that the water is in direct contact with them. The use of a hot air shaker, by contrast, shall be avoided.

Incubation steps for the color development should be carried out between 20 and 25°C. If the temperature is below 20°C, weaker results may be obtained. If the temperature is above 25°C, high background and/or false positive signals may be obtained. It is also recommended to always incubate for exactly the duration stated in the protocol.

The amplitude of the motion generated by both the shaking water bath (hybridization procedure) and the shaker (color development procedure) is critical in achieving maximum sensitivity and homogeneous staining: the strip surface should be completely submerged and the amplitude should be as high as possible. However, shaking during incubation of the strips should be performed in such a way that both the liquid and the test strips move back and forth, without liquid spillage over the edge of the troughs, as this can lead to cross-contamination and invalid results. During hybridization and stringent wash incubations, the troughs can be left uncovered in the water bath, as covering the troughs with microplate sealers may also cause cross-contamination.

Polymerase Chain Reaction-Sequence-Specific Primers, or PCR-SSP, is based on the use of sequence-specific primers designed to be complementary or not to the target DNA sequence (Welsh et al., 1999).

The PCR-SSP methodology for HLA was originally described in 1991 and 1992 (Olerup and Zetterquist, 1991; Olerup and Zetterquist, 1992) and is based on the principle that completely or almost completely matched oligonucleotide primers without 3'-end mismatches are used more efficiently in the PCR reaction than mismatched primers by thermostable DNA polymerases without proofreading properties. Primer pairs are designed to match with single alleles or group(s) of alleles depending upon the degree of typing resolution required. With strictly controlled PCR conditions, matched or almost completely matched primer pairs allow amplification to occur, leading to a positive result, whereas mismatched primer pairs do not allow amplification to occur, leading to a negative result. After the PCR process, the amplified DNA fragments are size-separated by agarose gel electrophoresis, visualized by staining with ethidium bromide and exposure to ultraviolet light, documented by photography and interpreted. Interpretation of PCR-SSP results is based on the presence or absence of specific PCR products. The relative sizes of the specific PCR products may be helpful for the interpretation of the results.
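The 3'-end principle can be made concrete with a deliberately simplified Python sketch. It treats a primer as extendable only when its 3'-terminal base matches the template and at most one internal mismatch is present; this threshold, the sequences and the function names are hypothetical and serve only to illustrate the logic, not to model real primer thermodynamics. Both primer and binding site are written 5' to 3' in the same sense, so a match is a simple character identity.

```python
def primer_extends(primer: str, site: str, max_internal_mismatches: int = 1) -> bool:
    """Return True if the primer is expected to be extended on this binding site."""
    if len(primer) != len(site):
        raise ValueError("primer and binding site must have the same length")
    # A mismatch at the 3' terminal base blocks extension by a
    # polymerase lacking proofreading activity.
    if primer[-1] != site[-1]:
        return False
    internal = sum(p != s for p, s in zip(primer[:-1], site[:-1]))
    return internal <= max_internal_mismatches


def pair_amplifies(forward: str, forward_site: str, reverse: str, reverse_site: str) -> bool:
    """Amplification requires both primers of the pair to be extended."""
    return primer_extends(forward, forward_site) and primer_extends(reverse, reverse_site)


# Hypothetical 8-mer primer tested against three allele-specific sites.
print(primer_extends("ACGTTGCA", "ACGTTGCA"))  # perfect match
print(primer_extends("ACGTTGCA", "ACGTTGCT"))  # 3'-end mismatch
print(primer_extends("ACGTTGCA", "AGGTTGCA"))  # one internal mismatch
```

Because both primers of a pair must be extended, each pair tests two sequence motifs on the same DNA molecule.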

In most PCR-based techniques, the PCR process is used only as an amplification step of the needed target DNA, and a post-amplification step to discriminate between the different alleles is required. In contrast, in the PCR-SSP methodology, the discrimination between the different alleles takes place during the PCR process. This shortens and simplifies the post-amplification step to a simple gel electrophoresis detection step. The SSP test results are either positive or negative, which abolishes the need for complicated interpretation of results. In addition, the typing resolution of PCR-SSP is higher than for other PCR-based typing techniques, as each primer pair defines two sequence motifs located in cis, i.e. on the same chromosome.

Since the PCR process may be adversely affected by various factors (e.g. pipetting errors, too low DNA concentration, poor DNA quality, presence of PCR inhibitors, thermal cycler accuracy) an internal positive control primer pair is included in each PCR reaction (Olerup and Zetterquist, 1992). The internal positive control primer pair matches conserved regions of the human growth hormone gene, which is present in all human DNA samples. The amplicons generated by the specific HLA primer pairs are shorter than the amplicons of the internal positive control primer pair but larger than unincorporated primers.

PCR-SSP is performed using reagents from Olerup SSP® HLA Typing Kits (Olerup SSP AB, Stockholm, Sweden), which are qualitative in vitro diagnostic kits for the DNA typing of HLA Class I and HLA Class II alleles for the purpose of determining HLA phenotype. The Olerup SSP® typing kits contain dried, pre-optimized sequence-specific primers for PCR amplification of HLA alleles and of the human growth hormone gene and PCR Master Mix including Taq polymerase (“Master Mix”). The same Master Mix is used for all Olerup SSP® kits.

The primer solutions are pre-aliquoted and dried in different 0.2 ml wells of cut, thin-walled PCR trays. Each well of the tray contains a dried primer solution consisting of a specific primer mix, i.e. allele- and group-specific HLA primers as well as an internal positive control primer pair matching non-allelic sequences, and is ready for the addition of DNA sample, Master Mix, and H2O. Each tray includes a negative control reaction well that detects the presence of PCR products generated by more than 95% of the Olerup SSP® HLA Class I, DRB, DQB1, DPB1 and DQA1 amplicons as well as the amplicons generated by internal positive control primer pairs. The primers are designed for optimal PCR amplification when using the Master Mix and the recommended DNA cycling program. The same PCR Cycling Parameters are used for all the Olerup SSP® kits (table 4.3.1).

Extracted, highly pure DNA is needed for SSP typings. The A260/A280 ratio should be 1.6-2.0 as determined by UV spectrophotometry for optimal band visualization during electrophoresis. DNA samples to be used for PCR-SSP HLA typing should be re-suspended in dH2O (recommended DNA concentration 15-30 ng/µl). If necessary, the extracted DNA is diluted in dH2O, as concentrations exceeding 50 ng/µl will increase the risk of nonspecific amplifications and weak extra bands, especially for HLA Class I high resolution SSP typings. DNA samples should not be re-suspended in solutions containing chelating agents, such as EDTA, above 0.5 mM in concentration. DNA samples may be used immediately after extraction or stored at +4°C or at -20°C or colder. The purity and concentration of extracted DNA samples that have been stored for a prolonged period are tested for acceptability prior to HLA typing.

After purification of genomic DNA, PCR amplification of the purified DNA sample is performed using an Olerup SSP® typing tray. A thermocycler is programmed to run the Olerup SSP® PCR program according to the PCR Cycling Parameters (table 4.3.1), and the electrophoresis is prepared using a high-quality electrophoresis-grade agarose capable of resolving DNA fragments ranging from 50 to 1100 base pairs in size.

Table 4.3.1 Steps for programming the thermocycler before starting (as required for Olerup SSP® kits). Total reaction volume in each well, 10 µL.

The steps of Olerup SSP® typing assays are as follows:

step 1. After thawing at room temperature (20 to 25°C) the appropriate number of DNA samples, the primer trays and the volume of Master Mix needed for the selected DNA samples/primer trays, DNA samples are mixed briefly by vortexing. Using a manual single-channel pipette, Master Mix and dH2O are added at room temperature into a 0.5 ml or a 1.5 ml tube. Then the tube undergoes a pulse-spin in a microcentrifuge to bring all liquid down from the sides of the tube. Using a manual single-channel pipette, 8 μl of the Master Mix-dH2O mixture and 2 μl of dH2O are added into the negative control well, i.e. the well with the negative control primer pairs, of the primer tray. Using a manual single-channel pipette, the DNA sample is added at room temperature to the remaining Master Mix-dH2O mixture and the tube undergoes a pulse-spin in a microcentrifuge to bring all liquid down from the sides of the tube. Using an electronic single-channel dispenser, 10 μl of the sample reaction mixture is aliquoted into each well of the primer tray, except the negative control well.

step 2. The sample is applied above the primers (dried at the bottom of each well of the primer tray) to avoid cross-contamination between wells, touching the inside wall of the well with the pipette tip to allow the sample to slide down to the bottom of the well. After checking that all samples have dropped to the bottom of each well, the primer tray is covered with the provided adhesive PCR seals to prevent evaporative loss during PCR amplification and is placed in the thermocycler with a suitable tube-tray adapter. Thermal cycling is started within 5 minutes of PCR setup, according to the Olerup SSP® program number, specifying a 10 μl reaction volume. After about 1 hour and 20 minutes, the program is completed.

step 3. Gel electrophoresis. After orienting the primer tray and gel box so that the order of the wells is from left to right and top to bottom, the strip lids are removed without splashing the PCR products, and the PCR products are loaded in sequence onto the 2% agarose gel using an 8-channel pipette. Then a DNA size marker is loaded in one well per row and the gel box is covered with the gel box lid. After the gel is electrophoresed in 0.5x TBE buffer, without re-circulation of the buffer, for 15-20 minutes at 8-10 V/cm, the gel tray is transferred to a UV transilluminator and the gel is photographed.

step 4. Data analysis. The provided lot-specific Specificity and Interpretation tables or worksheet for the specific HLA alleles amplified by each primer mix shall be used. In brief, the gel photo is carefully examined to determine the positive lanes (figure 4.3.2). A faster-migrating, shorter band seen in a gel lane means that a specific HLA allele has been amplified. This indicates a positive test result. The presence or absence of specific PCR products is recorded, and the pattern of gel lanes with specific PCR products is matched with the information in the lot-specific interpretation and specificity tables to obtain the HLA typing of the sample DNA. An internal positive control band, slower-migrating and longer, should be visible in all gel lanes, except in the negative control gel lane, as a control of successful amplification. The internal positive control band may be weak or absent in positive gel lanes. Absence of the internal positive control band with no specific PCR product indicates a failed PCR reaction. The presence of a specific PCR product or internal positive control band in negative control lanes indicates contamination with PCR products and voids all test results. Primer oligomers ranging from 40 to 60 base pairs in size might be observed in the negative control lanes, not representing contamination. Unused primers will form a diffuse band shorter than 50 base pairs; primer oligomer artefacts that are longer than the primer band but shorter than the specific bands might also be observed.

Figure 4.3.2 Gel interpretation of PCR-SSP.
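The lane-reading rules given in step 4 can be summarised as a small decision procedure. The Python sketch below is only an illustration of those rules: the data layout (one pair of booleans per lane) and the function names are assumptions, and the subsequent lookup in the lot-specific interpretation tables is not modelled.

```python
def classify_lane(specific_band: bool, control_band: bool) -> str:
    """Classify one gel lane from the presence of its two possible bands."""
    if specific_band:
        # The internal control may be weak or absent in positive lanes.
        return "positive"
    if control_band:
        return "negative"
    # Neither band: the PCR reaction in this well failed.
    return "failed"


def read_gel(lanes: dict, negative_control: tuple) -> dict:
    """Classify all lanes; any band in the negative control voids the run.

    lanes maps a well label to (specific_band, control_band).
    negative_control is the same pair for the negative control lane;
    primer oligomers below the specific bands are not counted as bands here.
    """
    if any(negative_control):
        raise ValueError("band in negative control lane: contamination, results void")
    return {well: classify_lane(*bands) for well, bands in lanes.items()}


# Hypothetical four-lane gel.
example = {"1": (True, True), "2": (False, True), "3": (True, False), "4": (False, False)}
print(read_gel(example, negative_control=(False, False)))
```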

4.3.2 Molecular analysis by revPCR-SSO and Luminex® xMAP® technology

Luminex® xMAP® technology relies on a single-step reverse PCR-SSO, and was used with LABType® SSO HD commercial kits for class I A, B and C loci and class II DRB1 locus (One Lambda Inc., Canoga Park, CA, USA) (Dunbar, 2006; Heinemann et al., 2009).

The LABType™ SSO DNA typing system provides sequence-specific oligonucleotide probes immobilized on microspheres for identification of HLA alleles in amplified genomic DNA samples through a controlled DNA hybridization reaction, followed by flow analysis using the LABScan™ 100 flow analyzer. In brief, target DNA is PCR amplified using a group-specific primer. Then, the PCR product is biotinylated, which allows detection using R-Phycoerythrin-conjugated Streptavidin (SAPE). The PCR product is denatured and allowed to rehybridize to complementary DNA probes conjugated to fluorescently coded microspheres. A flow analyzer identifies the fluorescent intensity of PE (phycoerythrin) on each microsphere, and the HLA typing is assigned on the basis of the reaction pattern compared to patterns associated with published HLA gene sequences. The test results are either positive or negative, abolishing the need for complicated interpretation of results. The introduction of a step to amplify the target DNA by PCR, coupled with hybridization and detection in a single reaction mixture, makes this method suitable for both small- and large-scale testing.

PCR amplification of targeted regions with HLA locus-specific biotinylated primers is followed by a process of probing the amplicon with fluorescence-labeled microscopic polystyrene beads (about 6 microns wide). In particular, prior to the analysis, each sample undergoes denaturation, neutralization, hybridization, incubation and alternating steps of labeling and washing, according to the manufacturer's instructions. Each bead is internally dyed with a unique combination of red and infrared dye; the intensities of the two dyes are differently combined so that a unique signature can identify each bead when excited by a laser beam (figure 4.3.3) (Heinemann et al., 2009). The beads are chemically coated with different oligonucleotide probes, each complementary to HLA-specific genomic sequences. Beads presenting probes complementary to the amplicon hybridize with it, then amplicons annealed to probes are detected via SAPE (Streptavidin Phycoerythrin) chemistry. The Luminex® platform uses the principles of flow cytometry to excite the beads, which stream in single file through a pair of lasers. The first laser, the red one, is used to excite and therefore identify the specific bead, while the green laser excites the dyes captured on the beads during the assay, highlighting the SAPE-mediated fluorescence.

Figure 4.3.3 HLA class I and class II typing using Luminex™ multiplex technology (Heinemann et al., 2009).

Luminex® xMAP® technology is a multiplex system, where many readings are made on each bead set (up to 100 unique assays within a single sample); the oligonucleotides coating the 100 beads are selected in such a way that unique reaction patterns of the beads identify individual responses. The presence of specific alleles is inferred from the fluorescence emitted by the beads, and the reaction pattern provides the HLA typing once compared to patterns associated with published sequences. The analysis of the results is performed by HLA-Fusion® software (One Lambda Inc., Canoga Park, CA, USA).

The microsphere mixture consists of a set of fluorescently labeled microspheres that bear unique sequence-specific oligonucleotide probes for HLA alleles. Each microsphere mixture includes negative and positive control microspheres for subtraction of non-specific background signals and normalization of raw data to adjust for possible variation in sample quantity and reaction efficiency. The HLA locus-specific primer mixes are pre-optimized for amplification of specific HLA genes from 40 ng of purified genomic DNA in a 20 µl volume when used in conjunction with D-mix, the prescribed amount of recombinant Taq polymerase, and the specific PCR reaction profile provided with the manufacturer's instructions. For each lot, a worksheet for the specific HLA alleles that can be identified by each probe is provided. LABType™ SSO beads can settle and aggregate if left in a tube; therefore, before use, they are mixed vigorously by pipetting several times or by vortexing in horizontal position for 10 to 30 seconds to obtain a fully homogeneous mixture. To help prevent bead aggregation, immediately after removal of the supernatant, the tray is inverted and very gently tapped on a dry paper towel to remove as much liquid as possible, then is sealed and mixed thoroughly by vortex at low speed to loosen the pellets. LABType™ SSO beads are packaged in an aluminium foil bag and shall not be removed until ready to use. To avoid photobleaching, the beads are protected from light during usage and storage at -20ºC in the tightly capped tube. Beads are covered with aluminium foil or equivalent also during the assay.
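The background subtraction and normalization performed with the control microspheres can be sketched as follows. This is an assumption-laden illustration rather than the HLA-Fusion® algorithm: it assumes the commonly described percent-of-positive-control form, 100 × (probe − negative) / (positive − negative), and the cut-off values, probe names and fluorescence readings below are hypothetical (real cut-offs are lot-specific).

```python
def normalized_signal(probe_mfi: float, negative_mfi: float, positive_mfi: float) -> float:
    """Express a probe signal as a percentage of the background-corrected positive control."""
    span = positive_mfi - negative_mfi
    if span <= 0:
        raise ValueError("positive control must exceed the negative control")
    return 100.0 * (probe_mfi - negative_mfi) / span


def call_probes(raw_mfi: dict, negative_mfi: float, positive_mfi: float, cutoffs: dict) -> dict:
    """Return a positive/negative call per probe, using per-probe cut-offs (in percent)."""
    return {
        probe: normalized_signal(mfi, negative_mfi, positive_mfi) >= cutoffs[probe]
        for probe, mfi in raw_mfi.items()
    }


# Hypothetical median fluorescence intensities for two probes.
raw = {"probe_A": 1200.0, "probe_B": 300.0}
cutoffs = {"probe_A": 20.0, "probe_B": 20.0}
print(call_probes(raw, negative_mfi=200.0, positive_mfi=2200.0, cutoffs=cutoffs))
```

Dividing by the positive control is what compensates for differences in sample quantity and reaction efficiency between wells.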

The steps of LABType® SSO HD typing kits are listed as follows:

step 1. The DNA sample used for Luminex assay should be re-suspended in sterile water or in 10 mM Tris-HCl, pH 8.0-9.0 at an optimal concentration of 20 ng/µl with the A260/A280 ratio of 1.65-1.80. The samples should be free from any inhibitors of DNA polymerase, and should not be re-suspended in solutions containing chelating agents, such as EDTA, above 0.5 mM in concentration.

step 2. Amplification. Thawed DNA, Amplification Primers, and D-mix are kept on ice until use. The concentration of genomic DNA is adjusted to 20 ng/µl using sterile water. D-mix and Amplification Primer are mixed by vortex for 15 seconds for 3-5 seconds. Using the table provided in the manufacturer's instructions, the indicated volumes of D-mix and Primers are mixed by vortex for 15 seconds and placed on ice. For accurate pipetting of Taq polymerase, it is recommended to prepare master mix for at least 10 reactions. 2 µl of DNA (at 20 ng/µl) are pipetted into the bottom of a tube (for a final volume of 20 µl per PCR reaction), partially covered to prevent evaporation and contamination. An appropriate amount of Taq polymerase (e.g., 0.2 µl, typically at 5 U/µl, per 20 µl reaction) is added to the Amplification Mixture, then 18 µl of Amplification Mixture are aliquoted into each well containing DNA, without touching the pre-aliquoted DNA at the bottom, to prevent cross-contamination. After placing a PCR Pad appropriate for the thermal cycler on the tray and closing the lid, the “LABType™ SSO PCR Program” (see table 4.3.2) is run. If the amplified product is not used immediately, the covered DNA tray can be stored at -80°C to -20°C for up to one month.

Table 4.3.2 LABType™ SSO PCR Program

step 3. Denaturation/Neutralization. 2.5 µl of Denaturation Buffer are transferred into a well of a clean 96-well plate, then 5 µl of each amplified DNA are added, mixed thoroughly (preferably by pipetting up and down) and incubated at room temperature (20-25°C) for 10 minutes. Then 5 µl of Neutralization Buffer are added with a pipette and mixed thoroughly (preferably by pipetting up and down): the color should change from bright pink to pale yellow or clear. The PCR plate is placed on the ice bath, avoiding contamination of the PCR product with water.

step 4. Hybridization. After combining appropriate volumes of Bead Mixture and Hybridization Buffer to prepare the Hybridization Mixture, 38 µl of Hybridization Mixture are added to each well; the tray is then covered with a tray seal, vortexed thoroughly at low speed and placed into the pre-warmed thermal cycler (60°C). After incubation for 15 minutes, 100 µl of Wash Buffer are quickly added to each well and the tray is centrifuged for 5 minutes at 1000-1300 g. After removal of the wash buffer, the wash is repeated for a total of three wash steps.

step 5. Labeling. 50 µl of 1X SAPE solution are added to each well and vortexed thoroughly at low speed. The tray is then placed in the pre-heated thermal cycler (60°C) and incubated for 5 minutes. After removing the seal, 100 µl of Wash Buffer are quickly added to each well and the tray is centrifuged for 5 minutes at 1000-1300 g. After removal of the supernatant, 70 µl of Wash Buffer are added to each well, gently mixed by pipetting and transferred to the reading plate using an 8- or 12-channel pipette, avoiding sample-to-sample contamination. The final volume should be at least 80 µl. For best results, samples should be read as soon as possible, as prolonged storage (more than 4 hours) may result in loss of signal. However, samples may be stored overnight at 4°C in the dark under a tray seal, provided they are mixed thoroughly immediately before reading.

step 6. Data acquisition, calculation and analysis. LABScan™ 100 is turned on and set up for sample acquisition and calibration according to the Luminex User's Manual for the software version currently in use. The template/protocol is chosen according to the product catalog ID and lot number, also downloadable via the One Lambda website. After loading the plate onto the XY platform and filling the reservoir with sheath fluid, the session is initiated. Once the samples have been run, the data output should be saved in a .csv file.

The mean fluorescence intensity (MFI) generated by the Luminex® software contains the FI for each bead (or probe bound to the bead) per sample. The percent positive value is calculated as:

Percent Positive Value = 100 x [MFI (Probe n) - MFI (Probe Negative Control)] / [MFI (Probe Positive Control) - MFI (Probe Negative Control)]

where a reaction is defined as positive when the percent positive value for the probe is higher than the pre-set cut-off value for that probe, and as negative when it is lower than the cut-off value. The MFI for the negative control is typically 0-100 and can vary between lots and locus-specific products: signals outside this range may indicate inadequate control of assay parameters such as sample quantity and/or quality, technique, instrument calibration, and the state of all reagents, including amplified DNA, buffers, SAPE and the bead mixture.

Comparing the calculated percent positive values with the pre-determined cut-off values for each test probe allows a positive attribute to be assigned to probes with a percent positive value above the cut-off and a negative attribute to those below it. The MFI of the positive control should be within 1200-7000 MFI, although it may fall outside this range and varies for each positive control probe and lot. The MFI of each probe is normalized against the positive control MFI and is expressed as a percentage of the positive control MFI. The pre-set cut-off value for each probe was established using a 100- to 200-sample DNA panel. Finally, the HLA allele (or allele group) of the sample is determined by matching the pattern of positive and negative bead IDs with the information in the LABType™ SSO worksheet or by using HLA Fusion™ Software.
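The calculation described above can be sketched in a few lines of Python. This is only an illustrative sketch and not the algorithm implemented in the Luminex® or HLA Fusion™ software: the function names, the probe identifiers and all MFI and cut-off values are hypothetical.

```python
def percent_positive(mfi_probe, mfi_negative, mfi_positive):
    """Percent positive value of one probe, normalized to the control beads."""
    span = mfi_positive - mfi_negative
    if span <= 0:
        raise ValueError("positive control MFI must exceed negative control MFI")
    return 100.0 * (mfi_probe - mfi_negative) / span


def call_probes(mfi_by_probe, mfi_negative, mfi_positive, cutoffs):
    """Return {probe: True/False}, True when the value is above the probe cut-off."""
    return {
        probe: percent_positive(mfi, mfi_negative, mfi_positive) > cutoffs[probe]
        for probe, mfi in mfi_by_probe.items()
    }


# Hypothetical readings for two probes of one sample
readings = {"probe_A": 1550, "probe_B": 200}
cutoffs = {"probe_A": 20.0, "probe_B": 15.0}   # assumed lot-specific cut-offs
calls = call_probes(readings, mfi_negative=50, mfi_positive=3050, cutoffs=cutoffs)
print(calls)
```

The resulting pattern of positive and negative probes is what is then compared with the lot-specific worksheet to assign the typing.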

In order to obtain valid data, two parameters, count and Mean Fluorescence Intensity (MFI), must be monitored for each data acquisition. Count represents the total number of beads that have been analyzed, and should be above 100 ±25%. A significant reduction in the count suggests bead loss during sample acquisition or assay and can void test results. MFI represents the PE signal detected within the counted beads, and varies according to the reaction outcome. The MFI for the positive control probe can vary from lot to lot, and also with sample quantity and/or quality, technique, instrument calibration, and the state of all reagents, including amplified DNA, buffers, SAPE and the bead mixture.

4.3.3 Test repetition

In case of a failed test, or to resolve ambiguous findings or confirm a homozygous status, a second SSP-based test is performed. In particular, a low resolution SSP-based typing (HLA-Ready Gene® kits, Inno-train Diagnostik GmbH, Kronberg, Germany) is performed, using HLA-A, -B, -C and -DRB1 locus-specific kits, according to the manufacturer's instructions. Moreover, in order to split the HLA-B*14, B*15 and B*40 serological variants, a second step consisting of a high resolution SSP-based typing (AllSet+™ Gold SSP, Invitrogen by Life Technologies, Thermo Fisher Scientific, Waltham, MA, USA) is performed, according to the manufacturer's instructions.

4.3.4 Statistical analysis

For the statistical analysis, the results of the 2014 activity (based on Luminex® xMAP® technology) were compared with those of 2013 (based on LiPA revPCR-SSO plus PCR-SSP).

The results are presented as mean and standard deviation (SD) for typing time, and as percentage frequencies and odds ratio (OR) for the number of test repetitions. The comparisons for each HLA allele between the two years were performed using the Chi square test or the Fisher exact test, as appropriate. The average cost per sample tested was calculated as the overall cost of the de novo tests and of the repetition tests, if needed, divided by the number of tests.

Chi square test was used to compare qualitative variables, while the comparison between means was done using a t test for independent data. P<0.05 was considered statistically significant. All tests were two-sided.

The data analysis was performed with the STATA statistical package (release 14.0, 2015, Stata Corporation, College Station, Texas, USA).
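As an illustration of the comparison of test repetitions between the two years, the odds ratio and the Pearson Chi square statistic of a 2x2 table can be computed as sketched below. The analysis in this thesis was run in STATA; the Python code is only a sketch of the underlying arithmetic, the counts are invented for the example, and no continuity correction is applied (an assumption).

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio of the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def chi_square_2x2(a, b, c, d):
    """Pearson Chi square (1 degree of freedom, no continuity correction)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the Chi square distribution with 1 df
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: rows = year, columns = (repeated, not repeated)
repeated_2013, single_2013 = 30, 70
repeated_2014, single_2014 = 10, 90

or_value = odds_ratio(repeated_2013, single_2013, repeated_2014, single_2014)
stat, p_value = chi_square_2x2(repeated_2013, single_2013, repeated_2014, single_2014)
print(f"OR = {or_value:.2f}, chi2 = {stat:.2f}, p = {p_value:.4f}")
```

With small expected counts the Fisher exact test would be preferred, as stated above.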

4.4 Analysis of the polymorphisms of immune response genes and CBT outcomes

From January 1994 to December 2010, a total of 696 CBT recipients met the eligibility criteria for inclusion in a multicentric retrospective analysis promoted by Eurocord. However, only 143 recipient samples had been collected before transplant, stored in the CBB and were available for analysis. Samples were centralized and shipped on dry ice to the Laboratory of Hematology of Ribeirão Preto School of Medicine of São Paulo University according to International and Brazilian rules for the shipment of biological material. Informed consent for genetic studies on CBUs and patients followed the rules of the ethical committee of each participating CBB. Informed consent for CBU collection and CBT was obtained in accordance with the Declaration of Helsinki. This study was approved by the local ethical committee (Comité de protection des personnes Ile-de-France IV) located in Saint-Louis Hospital, Paris, France, where Eurocord is hosted.

Conditioning protocols, GvHD prophylaxis, selection of CBU, use of G-CSF, surveillance of CMV reactivation and use of antimicrobial agents followed the guidelines and rules of each transplant center.

4.4.1 Genetic polymorphism

Biological samples were shipped as extracted DNA, umbilical cord blood sample or cord fragment tissue. The DNA samples of the CBUs from the Pavia CBB were shipped as extracted DNA, obtained following the procedure illustrated in paragraph 4.1. The concentration and purity of all extracted DNAs were determined using the NanoVue™ Plus, as detailed in paragraph 4.2.

For samples shipped as cord blood or cord tissue samples, genomic DNA was extracted with extraction kits (FlexiGene DNA Kit [51206]-Qiagen® for CBU sample and Promega DNA Purification Kit [A1125]-Wizard® for umbilical fragment tissue), and stored in 1.5 ml Eppendorf tubes at -20°C until use.

DNA samples were genotyped by Real-time PCR assay for the following candidate genes related to immune response: NLRP1 (rs5862), NLRP2 (rs1043684), NLRP3 (rs10754558), TIRAP/Mal (rs8177374), IL10 (rs1800872), REL (rs13031237), TNFRSF1B (rs1061622) and CTLA4 (rs3087243). The complete list of the SNPs studied and of the probes used for genotyping is reported in table 4.4.1.

Table 4.4.1 List of SNPs and probes used for genotyping candidate genes involved in immune response (Cunha et al., 2017).

Gene | SNP | Chromosome | Name | Probes (VIC/FAM)
NLRP1 (NALP1) | rs5862 | 17:5499637 | LOC728392MIS12, MIND kinetochore complex component, homolog (S. pombe) | GTCTTAAGATGACAAATCCCTAGGG[A/G]TCAGGTGGTTTTCCCGCACGAACTC
NLRP2 (NALP2) | rs1043684 | 19:55001063 | NLR family; pyrin domain containing 2 | CTGCCTCTGTTTTATACCTGCACAC[A/G]TCCTTATCTTTGTTACATATGAAAT
NLRP3 (NALP3) | rs10754558 | 1:247448734 | NLR family; pyrin domain containing 3 | GACAATGACAGCATCGGGTGTTGTT[C/G]TCATCACAGCGCCTCAGTTAGAGGA
TIRAP-MAL | rs8177374 | 11:126292948 | Toll-interleukin 1 receptor (TIR) domain containing adaptor protein | GAGGGCTGCACCATCCCCCTGCTGT[C/T]GGGCCTCAGCAGAGCTGCCTACCCA
IL10-592 | rs1800872 | 1:206773062 | Interleukin 10 | CTTTCCAGAGACTGGCTTCCTACAG[T/G]ACAGGCGGGGTCACAGGATGTGTTC
REL (cREL) | rs13031237 | 2:60908994 | V-rel reticuloendotheliosis viral oncogene homolog (avian) | TAAAGTTTGAAAAAATGGCTCATGT[G/T]TACTTCATTGTCCTTTCTTTATTGC
TNFRSF1B | rs1061622 | 1:12192898 | Tumor necrosis factor receptor superfamily; member 1B | GTGGCCATCCCTGGGAATGCAAGCA[G/T]GGATGCAGTCTGCACGTCCACGTCC
CTLA4 | rs3087243 | 2:203874196 | Cytotoxic T-lymphocyte associated protein 4 | TCTTCACCACTATTTGGGATATAAC[A/G]TGGGTTAACACAGACATAGCAGTCC

Abbreviations: SNP, single nucleotide polymorphism.

Real-time PCR was performed by the allelic discrimination method in a 7300 Real Time PCR System, using TaqMan® SNP genotyping assays and TaqMan® genotyping Master Mix reagent (Applied Biosystems™, Foster City, CA, USA), according to the manufacturer's instructions. Graphical interpretation of the results was processed and supplied by ABI 7500 System SDS Software® (Applied Biosystems™, Foster City, CA, USA).

4.4.2 Statistical analysis

Preliminary analyses of Hardy-Weinberg equilibrium and minor allele frequency (MAF) of the genotype distribution were performed (data not shown). The Chi-square test was used to measure differences between groups (Consonni et al., 2011; Ball et al., 2013). Independent risk factors for DFS and OS were assessed by univariate and multivariate analysis with the log-rank test and Cox proportional hazards models, respectively (Kaplan and Meier, 1958). Prognostic factors for neutrophil and platelet engraftment, acute and chronic GvHD, and NRM were analyzed in a competing-risk setting using Fine and Gray proportional hazards models, with death as the competing event (Fine and Gray, 1999). These models were built using variables with p < 0.20 in univariate analysis or variables with clinical relevance. Variables with p > 0.05 were then removed from the multivariate model. The type I error was set at 0.01 to correct for multiple testing. Statistical analyses were performed with SPSS® 18.0 (SPSS Inc., Chicago, IL, USA), Splus2000® (MathSoft, Seattle, WA) and The R Project for Statistical Computing (http://www.r-project.org) (Everitt, 2005). A subset analysis was also performed using a homogeneous group of patients with acute leukemia, given a myeloablative conditioning regimen and with allele typing of HLA-A, -B, -C and -DRB1 (cohort 2). This cohort of patients is part of a previous study on the impact of HLA high resolution typing on CBT outcomes, which included a total of 1568 single CBT (Eapen et al., 2014).

4.5 MtDNA analysis

4.5.1 Long range PCR for Illumina sequencing

Most of the mitogenomes analyzed in this thesis have been sequenced using a Next Generation Sequencing (NGS) platform (MiSeq desktop sequencer, Illumina®) starting from two overlapping fragments, obtained with a long range PCR technique, covering the whole mtDNA sequence. The long range PCR uses an optimized polymerase that can amplify DNA lengths up to 30 Kb and beyond, that cannot typically be amplified using routine PCR methods or reagents.

One of the two overlapping PCR fragments was about 8 Kb in length, while the second was more than 9 Kb long (table 4.5.1). The first extends from np 5871 to np 13829, while the second from np 13477 to 6151 and includes the control region. In order to verify the amount and the specificity of PCR products, 2-5 μl of amplification reaction, mixed with gel-loading buffer, were loaded on agarose gels, at different concentrations depending on the size of the PCR fragments, containing ethidium bromide at the final concentration of 0.5 μg/ml. Electrophoresis was carried out in 1X TBE.
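Because the mitogenome is circular, the length of the second fragment has to be computed across the origin of the numbering. The short Python sketch below reproduces the two amplicon lengths from the primer positions given above, assuming the rCRS length of 16,569 bp and taking each position as the outermost (5') nucleotide of the corresponding primer; the function name is hypothetical.

```python
MTDNA_LENGTH = 16569  # length of the rCRS reference sequence (bp)


def amplicon_length(start, end, genome_length=MTDNA_LENGTH):
    """Length of a fragment running from start to end (both inclusive)
    on a circular genome, wrapping past the last position if needed."""
    if end >= start:
        return end - start + 1
    return (genome_length - start + 1) + end


fragment_1 = amplicon_length(5871, 13829)   # does not cross the origin
fragment_2 = amplicon_length(13477, 6151)   # crosses the origin (control region)

# Overlap between the two fragments at both junctions
overlap = fragment_1 + fragment_2 - MTDNA_LENGTH
print(fragment_1, fragment_2, overlap)
```

The two fragments together exceed the genome length by the total overlap shared at their two junctions (np 13477-13829 and np 5871-6151).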

Table 4.5.1 Oligonucleotides used to amplify the human mitogenome in two fragments. (a) It corresponds to the first nucleotide position (at 5') in the primer, numbered according to the reference sequence rCRS (Andrews et al., 1999). (b) Melting temperature.

Fragment | Name (a) | Length (nt) | Sequence (5'-3') | Tm (°C) (b) | PCR product length (bp)
1 | 5871 for | 22 | gcttcactcagccattttacct | 59.79 | 7959
1 | 13829 rev | 22 | agtcctaggaaagtgacagcga | 60.44 |
2 | 13477 for | 22 | gcaggaatacctttcctcacag | 60.13 | 9244
2 | 6151 rev | 22 | actagtcagttgccaaagcctc | 59.95 |

Amplification reactions were performed with 10-50 ng of template DNA in 50 µl of reaction mix for each fragment containing 1X GoTaq® LongPCR Master Mix (Promega) and 0.2 µM of each primer (tables 4.5.2 and 4.5.3).

Table 4.5.2 Long range amplification protocol.

Reagent | Initial concentration | Volume (µl) | Final concentration
H2O | | 22 |
GoTaq® LongPCR Master Mix | 2X | 25 | 1X
Forward primer | 10 µM | 1 | 0.2 µM
Reverse primer | 10 µM | 1 | 0.2 µM
DNA | ~25 ng/µl | 1 | ~0.5 ng/µl (~25 ng total)
Final volume | | 50 |
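The final concentrations in the reaction mix follow from the dilution relation C1 x V1 = C2 x V2. The Python sketch below checks the values of table 4.5.2; the function name and the dictionary layout are hypothetical and serve only to illustrate the arithmetic. For the template, 1 µl of a ~25 ng/µl stock in 50 µl corresponds to ~25 ng in total, i.e. ~0.5 ng/µl.

```python
def final_concentration(stock, volume_ul, total_volume_ul):
    """Concentration after dilution, in the same unit as the stock."""
    return stock * volume_ul / total_volume_ul


# (stock concentration, volume in µl) for each component of table 4.5.2
components = {
    "master_mix_X": (2.0, 25.0),
    "forward_primer_uM": (10.0, 1.0),
    "reverse_primer_uM": (10.0, 1.0),
    "dna_ng_per_ul": (25.0, 1.0),
}
water_ul = 22.0

total_ul = water_ul + sum(volume for _, volume in components.values())
final = {
    name: final_concentration(stock, volume, total_ul)
    for name, (stock, volume) in components.items()
}
print(total_ul, final)
```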

Table 4.5.3 Long range PCR reaction protocol used for mtDNA.

Step | Temperature (°C) | Time
Initial denaturation | 94 | 2 min
Denaturation | 94 | 30 sec
Annealing | 55 | 30 sec
Extension | 65 | 9 min
Repeat from denaturation to extension for 30 cycles
Final extension | 72 | 10 min

4.5.2 Next Generation Sequencing: sequence analysis

Long range PCR products were purified with the Wizard® SV Gel and PCR Clean-Up System (Promega) according to the manufacturer's instructions and quantified using a Quantus™ Fluorometer (Promega).

A total amount of 1.5 ng of PCR product (0.75 ng for each PCR) was used to set up a sequencing library with the Nextera® XT DNA sample preparation kit (Illumina), following the manufacturer's protocol. The steps of this protocol are briefly described below:

• Tagmentation of DNA: input DNA is tagged and enzymatically fragmented by the Nextera XT transposome, which simultaneously fragments the DNA and adds adapter sequences to the ends.

• Amplification of tagmented DNA: the tagmented DNA is amplified via a limited-cycle PCR program. The PCR step also adds index 1 (i7) and index 2 (i5), oligos that bind the DNA segments at both ends. These oligos are complementary to those present in the flow cell, allowing hybridization, clustering and subsequent amplification when the samples are loaded in the MiSeq. The use of two different indexes allows hybridization to occur first between index 1 and the complementary oligo in the flow cell, followed by the formation of 'bridges' through hybridization between index 2 and its complementary oligo.

• PCR Clean-Up: the library DNA is purified with AMPure XP beads, a step that also provides a size selection by removing very short library fragments from the population. These beads are paramagnetic (magnetic only in a magnetic field), which prevents them from clumping and falling out of solution. Each bead is made of polystyrene surrounded by a layer of magnetite, which is coated with carboxyl molecules. This composition allows the reversible binding of negatively charged DNA to the carboxyl groups on the bead surface.

• Library normalization: the quantity of each library is normalized to ensure more equal library representation in the pooled sample. In preparation for sequencing, equal volumes of normalized library are combined, diluted in hybridization buffer, and heat denatured prior to MiSeq sequencing. The final normalized sample is then loaded on the MiSeq flow cell.

Sequencing reactions were carried out on a MiSeq (Illumina) by using the MiSeq Reagent Nano Kit, v2 (300 cycles). The on-board software produced results in FASTQ format, which were analyzed with the Geneious software (version 8.1). This software was used to compare mitogenome sequences with both the Reconstructed Sapiens Reference Sequence (RSRS) (Behar et al., 2012) and the rCRS (Andrews et al., 1999) and to create a report of sequence variants (nucleotide substitutions and indels). The threshold used to detect heteroplasmies was 20% of mutated bases, and the average depth of the obtained reads was about 4000X.
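The 20% threshold can be illustrated with a minimal Python sketch that flags a position as heteroplasmic from its per-base read counts. This is not the Geneious variant caller: the function names, the use of the second most frequent base as the "mutated" fraction, the inclusive comparison and the read counts are all assumptions made for the example.

```python
def minor_allele_fraction(base_counts):
    """Fraction of reads carrying the second most frequent base at a position."""
    depth = sum(base_counts.values())
    if depth == 0:
        raise ValueError("no reads cover this position")
    ranked = sorted(base_counts.values(), reverse=True)
    second = ranked[1] if len(ranked) > 1 else 0
    return second / depth


def is_heteroplasmic(base_counts, threshold=0.20):
    """True when the minor base reaches the chosen threshold (assumed inclusive)."""
    return minor_allele_fraction(base_counts) >= threshold


# Hypothetical positions covered at a depth of about 4000X
position_a = {"A": 3000, "G": 1000}   # 25% minor base
position_b = {"A": 3900, "G": 100}    # 2.5% minor base
print(is_heteroplasmic(position_a), is_heteroplasmic(position_b))
```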

4.5.3 DNA amplification for Sanger sequencing

Some of the human mitogenome sequences included in this dissertation were instead obtained by Sanger sequencing.

In this case, amplification reactions were performed on 10-50 ng of template DNA in 25 µl of reaction mix in a thermocycler. The amplification of human mtDNA was carried out in the reaction mix reported in table 4.5.4, following the PCR conditions shown in table 4.5.5.

Table 4.5.4 Amplification protocol of human mtDNA. (a) Promega GoTaq® DNA polymerase.

Reagent | Initial concentration | Volume (µl) | Final concentration
H2O | | 13.3 |
Buffer | 5X | 5 | 1X
dNTP mix | 1.25 mM (each) | 4 | 200 µM (each)
Forward primer | 10 µM | 0.75 | 0.2 µM
Reverse primer | 10 µM | 0.75 | 0.2 µM
Taq polymerase (a) | 5000 U/ml | 0.2 | 40 U/ml
DNA | ~25 ng/µl | 1 | ~1 ng/µl (~25 ng total)
Final volume | | 25 |

Table 4.5.5 PCR reaction protocol for human mtDNA. (a) Variable temperature according to the pair of primers.

Step | Temperature (°C) | Time
Initial denaturation | 95 | 2 min
Denaturation | 95 | 30 sec
Annealing | 55 (a) | 30 sec
Extension | 72 | 70 sec
Repeat from denaturation to extension for 35 cycles
Final extension | 72 | 10 min

Reactions to amplify either the whole mitochondrial genome or the mtDNA control region were performed following a well-established protocol (Torroni et al., 2001). The whole mitochondrial genome was amplified in 11 overlapping PCR fragments. The primers used for the amplification of the complete mitochondrial genome, and those used for the control-region amplification (at the bottom), are listed in table 4.5.6.

Table 4.5.6 Oligonucleotides used to amplify the entire human mtDNA in eleven fragments (complete sequence PCRs) and a portion of the control region (control-region PCR). (a) It corresponds to the first nucleotide position (at 5') in the primer, numbered according to the reference sequence rCRS (Andrews et al., 1999). (b) Melting temperature.

PCR ID | Fragment length (bp) | Oligonucleotide name (a) | Length (nt) | Sequence (5'-3') | Tm (°C) (b)

Complete sequence PCRs
1 | 1760 | 14900 for | 20 | gccatgcactactcaccaga | 59.96
1 | | 90 rev | 20 | aatgctatcgcgtgcatacc | 61.02
2 | 1682 | 16458 for | 20 | cccataacacttgggggtag | 59.17
2 | | 1570 rev | 20 | tgtaagttgggtgctttgtgtt | 58.72
3 | 1832 | 1404 for | 22 | acttaagggtcgaaggtggatt | 60.23
3 | | 3235 rev | 22 | cttaacaaaccctgttcttggg | 59.90
4 | 1607 | 2932 for | 20 | gggataacagcgcaatccta | 59.90
4 | | 4538 rev | 20 | gcttagcgctgtgatgagtg | 60.13
5 | 1659 | 4366 for | 22 | aaaattctccgtgccacctatc | 61.53
5 | | 6024 rev | 22 | ttatgttgtttatgcggggaaa | 61.37
6 | 1747 | 5871 for | 22 | gcttcactcagccattttacct | 59.79
6 | | 7617 rev | 22 | tcttgtagacctacttgcgctg | 59.72
7 | 1822 | 7356 for | 24 | gtagaagaaccctccataaacctg | 59.46
7 | | 9177 rev | 24 | tagaagtgtgaaaacgtaggcttg | 59.88
8 | 1740 | 8896 for | 20 | gccctagcccacttcttacc | 60.10
8 | | 10728 rev | 21 | ggccatatgtgttggagattg | 60.21
9 | 1774 | 10466 for | 20 | ccaaatgcccctcatttaca | 60.04
9 | | 12240 rev | 20 | ggggcatgagttagcagttc | 59.57
10 | 1816 | 12014 for | 22 | ctcacccaccacattaacaaca | 60.70
10 | | 13829 rev | 22 | agtcctaggaaagtgacagcga | 60.44
11 | 1804 | 13504 for | 22 | tactccaaagaccacatcatcg | 59.99
11 | | 15307 rev | 22 | gaagggcaagatgaagtgaaag | 60.24

Control-region PCR
D-loop | 1420 | 15877 for | 20 | caaatgggcctgtccttgta | 60.88
D-loop | | 727 rev | 20 | agggtgaactcactggaacg | 60.15

After the PCR reaction, fragments were purified using the ExoSAP-IT enzymatic system (Exonuclease I and Shrimp Alkaline Phosphatase, GE Healthcare). An ABI 3730 sequencer with 96 capillaries was employed for separation of the sequencing ladders. The sequencing was performed by BMR Genomics (http://www.bmr-genomics.it/) or GATC Biotech (http://www.gatc-biotech.com/). The sequences obtained were aligned, assembled, and compared with the references using the software Sequencher™ 5.0 (Gene Codes). Since the traces were usually unambiguous, it was generally necessary to sequence only one strand.

The oligonucleotides used for the Sanger sequencing of human samples are listed in table 4.5.7, with the protocol applied to the control region reported at the bottom. Generally only the forward primer 15981 was employed for D-loop sequencing, while the 305 rev primer was used for mtDNAs harbouring the T16189C transition. This transition creates a poly-C stretch (Bendall and Sykes, 1995) that causes the sequencing reaction to fail and the signal to be lost. The reverse primer solves the problem by completing the sequence information from the reverse side. For the same reason, the sequencing protocol for the complete human mtDNA also includes a reverse primer (58 rev). Moreover, if length variation in the C tracts at np 309 is present, another poly-C stretch may be created, again causing loss of the sequence signal. In this case an additional primer should be used to cover the entire sequence in that range. This oligonucleotide, named 653 rev, lies within PCR fragment 2. However, the poly-C at np 309 is much less common than the one near np 16189, thus the primer 653 rev has not been included among the primers of the standard protocol for complete mtDNA sequencing.

Table 4.5.7 Oligonucleotides used for Sanger sequencing of the entire human mtDNA. An additional primer used in case of poly-C near np 309 (additional oligonucleotide for complete sequencing) is also included. (a) Numbers in column correspond to the 11 PCR fragments listed in Table 4.5.6. (b) It corresponds to the first nucleotide position (at 5') in the primer, numbered according to the reference sequence rCRS (Andrews et al., 1999). (c) Melting temperature.

Template PCR ID (a) | Sequencing oligonucleotide name (b) | Length (nt) | Sequence (5'-3') | Tm (°C) (c)

Complete sequencing
1 | 14948 for | 20 | cacatcactcgagacgtaaa | 54.92
1 | 15564 for | 20 | atttcctattcgcctacaca | 54.93
1 | 58 rev | 20 | aataccaaatgcatggagag | 55.17
2 | 16522 for | 20 | taaagcctaaatagcccaca | 55.27
2 | 584 for | 20 | tagcttacctcctcaaagca | 55.46
2 | 1060 for | 20 | aagacccaaactgggattag | 55.74
3 | 1445 for | 20 | gagtgcttagttgaacaggg | 55.02
3 | 2047 for | 20 | tttaaatttgcccacagaac | 55.39
3 | 2509 for | 20 | atcacctctagcatcaccag | 55.23
4 | 3067 for | 20 | tgagttcagaccggagtaat | 54.76
4 | 3540 for | 20 | tctcaccatcgctcttctac | 55.54
4 | 4010 for | 20 | acaccctcaccactacaatc | 54.77
5 | 4410 for | 20 | cagctaaataagctatcggg | 54.58
5 | 5014 for | 20 | cctcaattacccacatagga | 55.02
5 | 5544 for | 20 | tcaaagccctcagtaagttg | 55.63
6 | 6041 for | 20 | ccttctaggtaacgaccaca | 55.33
6 | 6473 for | 20 | cacagcagtcctacttctcc | 55.00
6 | 7027 for | 20 | cccacttccactatgtccta | 55.08
7 | 7416 for | 20 | ttcgaagaacccgtatacat | 54.77
7 | 7987 for | 20 | actccttgacgttgacaatc | 55.09
7 | 8505 for | 20 | ataacaaaccctgagaacca | 54.62
8 | 8975 for | 18 | tcattcaaccaatagccc | 54.27
8 | 9589 for | 20 | aagtcccactcctaaacaca | 54.68
8 | 10147 for | 20 | acatagaaaaatccacccct | 55.09
9 | 10498 for | 22 | tagcatttaccatctcacttct | 53.48
9 | 11081 for | 20 | ataacattcacagccacaga | 54.03
9 | 11644 for | 20 | cctcgtagtaacagccattc | 54.99
10 | 12114 for | 19 | acatcattaccgggttttc | 54.81
10 | 12611 for | 20 | attcatccctgtagcattgt | 54.75
10 | 13134 for | 20 | agcagaaaatagcccactaa | 54.42
11 | 13569 for | 20 | cgcctatagcactcgaataa | 55.85
11 | 14115 for | 20 | cccactcatcctaaccctac | 56.03
11 | 14646 for | 20 | cactcaacagaaacaaagca | 54.98

Control-region sequencing
D-loop | 15981 for | 19 | ccattagcacccaaagcta | 56.44
D-loop | 305 rev | 20 | gggtttggtggaaattttt | 55.37

Additional oligonucleotide for complete sequencing
2 | 653 rev | 20 | cctatttgtttatggggtga | 55.04

4.5.4 Phylogenetic (and other) analyses

We built maximum-parsimony (MP) trees, one each for macro-haplogroups A2, B2 (without B2b), C1 (including C1b, C1c and C1d), and D4 (including D4h3a and D1), and one for haplogroup B2b. These trees encompassed both the new and previously published Ecuadorian (225, all modern) and Peruvian (68 ancient and 354 modern) mitogenomes. The MP phylogenetic trees were obtained by using the mtPhyl software (http://eltsov.org/mtphyl.aspx) and hand-corrected with reference to PhyloTree (van Oven and Kayser, 2009). New haplogroups/sub-haplogroups were defined when encompassing a minimum of three different haplotypes sharing at least one stable mutation (not recurrent in the tree), and were named following the nomenclature of the PhyloTree database build 17 (at http://www.phylotree.org/) (van Oven and Kayser, 2009). In some cases, the presence of new mitogenomes branching prior to a previously defined haplogroup node (i.e. A2k, A2y, A2z and B2o1) forced us to redefine the nomenclature of the branches as well as their diagnostic mutational motifs.

Coalescence times were estimated using two methods: Maximum Likelihood (ML) and BEAST (Bayesian Evolutionary Analysis Sampling Trees). We performed these calculations considering all substitutions except those at nps 16182, 16183 and 16519. ML estimations were performed using the software PAMLX 1.3.1 (Yang 1997), assuming the HKY85 mutation model (two parameters in the model of DNA evolution) with gamma-distributed rates (approximated by a discrete distribution with 32 categories). They were performed on two datasets, one including only modern mitogenomes and the other with both modern and ancient mitogenomes. The estimated ages of macro-haplogroups M and N reported in Behar et al. (2012) were used as fixed priors for both datasets, while the dates of the ancient samples reported in table S1 of Llamas et al. (2016) were used as tip calibration points in the dataset including ancient samples.

To calculate Bayesian age estimates, we employed BEAST 1.8.3 (Drummond and Rambaut, 2007) on both datasets used for the ML analyses. As in the ML analyses, the estimated ages of macro-haplogroups M and N were used as fixed priors for both datasets, and the dates of the ancient samples reported in table S1 of Llamas et al. (2016) were used as tip calibration points in the dataset including ancient samples. The program was run under the HKY substitution model (gamma-distributed rates plus invariant sites) with a fixed molecular clock (Olivieri et al., 2017). Taking into account that the clock rate is linear in BEAST and that the timeframe for the appearance of the Native American sub-haplogroups is less than 20 Ky, the corrected molecular clock (Soares et al., 2009) was set at 2.33 ± 0.2 x 10⁻⁸ base substitutions per nucleotide per year over the entire mitogenome, which corresponds to ~2,650 years for a mutation to happen. The chain length was set at 50,000,000 iterations, with samples drawn every 10,000 Markov chain Monte Carlo (MCMC) steps, after a discarded burn-in of 5,000,000 steps, as in previous studies (Olivieri et al., 2017).

4.6 Y-chromosome analysis

Every analysis of Y-chromosome markers is PCR based. The amplified product can be analyzed in different ways, according to the type of marker to be considered. In the case of Amplified Fragment Length Polymorphism (AFLP), the allelic state, determined by insertion or deletion of a fragment can be directly obtained by electrophoresis. In the case of SNP, if the variation creates or abolishes a site for a restriction enzyme, it can be investigated by restriction analysis followed by electrophoretic separation of obtained fragments (RFLP). When the analysis cannot be carried out with these simple single procedures, the most used methodologies are DHPLC and sequence analysis.

PCR primers are designed to have an annealing temperature between 56°C and 63°C in order to use a touch down PCR program described below. In RFLP analyses, the variation may be present in a potential restriction site. In this case, the primers were modified in order to create the restriction site themselves (called MM primers, i. e. modified with MisMatch).
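The annealing profile of the touch down program (table 4.6.2) can be written out cycle by cycle with the short Python sketch below: fourteen initial cycles starting at 63°C and decreasing by 0.5°C per cycle, followed by thirty-five cycles at 56°C. The function name and its parameters are hypothetical; only the temperatures and cycle numbers are taken from table 4.6.2.

```python
def touchdown_annealing(start_c=63.0, step_c=0.5, touchdown_cycles=14,
                        plateau_c=56.0, plateau_cycles=35):
    """Annealing temperature (°C) of every cycle of a touch down PCR."""
    # Touch down phase: the temperature drops by step_c at each cycle
    temperatures = [start_c - step_c * cycle for cycle in range(touchdown_cycles)]
    # Plateau phase: constant annealing temperature
    temperatures += [plateau_c] * plateau_cycles
    return temperatures


schedule = touchdown_annealing()
print(schedule[:14])    # touch down phase
print(len(schedule))    # total number of amplification cycles
```

With these values the last touch down cycle anneals at 56.5°C, so the plateau at 56°C continues the same 0.5°C decrement.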

4.6.1 Biallelic markers PCR amplification reactions and conditions

Eleven binary genetic markers, selected to identify the continental ancestry of sampled Y chromosomes, were hierarchically genotyped by AFLP or RFLP at the conditions previously described.

AFLP: YAP (Hammer and Horai, 1995)

RFLP: P15 (modified from Hammer et al., 2000); M198 (Underhill et al., 2001); M172 (Flores et al., 2003); M242 (Cinnioglu et al., 2004); M497 (King et al., 2011); P128 (Karafet et al., 2008); M269 (Balaresque et al., 2010); P260 (Karafet et al., 2008); PAGE94 (Rootsi et al., 2012); S6 (Yfull tree).

The nomenclature used for haplogroup labeling is in agreement with YCC conventions (2002) and recent updating of the Y-DNA Haplogroup Tree 2017 (https://isogg.org/tree/ISOGG_YDNATreeTrunk.html).

Concentrations and volumes of reagents used in the amplification reactions for every Y-chromosome marker analyzed are represented in table 4.6.1. Conditions of PCR reactions for every Y-chromosome marker analyzed are reported in table 4.6.2. Reactions were carried out in a 9700 Perkin Elmer® thermal cycler.

Table 4.6.1 Solutions and their concentrations in PCR reactions performed with GoTaq® DNA Polymerase (Promega).

Stock solutions                   Reaction volume                   Final concentration
                                  (in a final volume of 25 μl)
5X GoTaq Reaction Buffer          5.0 μl                            1X (1.5 mM MgCl2)
dNTP mix (1.25 mM each)           2.0 μl                            100 μM
Forward Primer (100 μM)           0.05 μl                           0.2 μM
Reverse Primer (100 μM)           0.05 μl                           0.2 μM
GoTaq DNA Polymerase (5 U/μl)     0.1 μl                            0.5 U/tot
DNA (50 ng/μl)                    1.0 μl                            2 ng/μl
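The final concentrations in table 4.6.1 follow from the dilution relation C_final = C_stock × V_added / V_total. The short Python sketch below recomputes them as a consistency check; the function and dictionary names are illustrative only, and the stock values are those listed in the table.

```python
def final_concentration(stock, volume_ul, total_ul=25.0):
    """Dilution relation: C_final = C_stock * V_added / V_total.

    The result is expressed in the same unit as the stock concentration.
    """
    return stock * volume_ul / total_ul


# (stock concentration, volume added in μl), taken from table 4.6.1
mix = {
    "dNTP mix (μM each)": (1250.0, 2.0),    # 1.25 mM = 1250 μM
    "forward primer (μM)": (100.0, 0.05),
    "reverse primer (μM)": (100.0, 0.05),
    "template DNA (ng/μl)": (50.0, 1.0),
}

finals = {name: final_concentration(stock, vol)
          for name, (stock, vol) in mix.items()}

for name, value in finals.items():
    print(f"{name}: {value:.2f}")

# Polymerase is reported as total units per reaction: 5 U/μl * 0.1 μl
polymerase_units = 5.0 * 0.1
```

The recomputed values agree with the last column of the table (100 μM dNTPs, 0.2 μM of each primer, 2 ng/μl DNA and 0.5 U of polymerase per reaction).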


Table 4.6.2 Scheme of the touch down PCR method used in reaction performed with GoTaq® DNA Polymerase (Promega).

Step                                      Hold time      Temperature
Activation                                2 minutes      95°C

Fourteen initial amplification cycles:
  Denaturation                            20 seconds     94°C
  Annealing                               1 minute       63°C→56°C (a)
  Extension                               1 minute       72°C

Thirty-five amplification cycles:
  Denaturation                            20 seconds     94°C
  Annealing                               45 seconds     56°C
  Extension                               90 seconds     72°C

Final extension                           10 minutes     72°C
Maintenance                               Forever        15°C

(a) The annealing temperature was decreased by 0.5°C per cycle.

4.6.2 AFLP and RFLP analysis

Among the Y chromosomes analyzed in this thesis, one AFLP was surveyed, the YAP marker (Hammer and Horai, 1995). In this case electrophoresis (see section MtDNA Analysis) is sufficient for allelic discrimination, because the variation is an insertion of ~300 bp. Amplification of the two allelic forms generates fragments of 150 bp (YAP-) or 450 bp (YAP+), both detectable by 2% agarose-gel electrophoresis.

When the presence or absence of the mutation under study creates or abolishes a restriction site for a specific restriction endonuclease, an RFLP analysis can be carried out. The specific enzyme cuts the amplified fragment, producing an expected restriction pattern. The scheme of the digestion reaction, the same for all RFLPs analyzed, is reported in table 4.6.3. Incubation time and temperature depended on the enzyme. Reaction conditions and amplicon concentration were chosen to obtain 100% activity of the specific enzyme and to avoid partial digestions, which could produce ambiguous electrophoretic patterns.

Fragments obtained from the digestion reaction were separated on agarose gel, at a concentration depending on the length of the fragments to be separated. The electrophoretic run was carried out under the conditions described in the previous sections. The presence or absence of the mutation determines the electrophoretic pattern. Sometimes the amplified fragment already contains restriction sites elsewhere in the sequence; the acquisition or loss of a restriction site caused by the mutation then changes the electrophoretic pattern, making its detection possible. To distinguish between the different restriction patterns, a 3% or 4% agarose gel was used.
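The logic of RFLP typing can be sketched in a few lines of Python: an in-silico digest returns the fragment lengths expected for each allele, which correspond to the band pattern read on the gel. The sequences below are hypothetical toy amplicons and do not represent any of the markers analyzed here; EcoRI (G^AATTC) is used only as a familiar example enzyme, and the function name is illustrative.

```python
def digest(amplicon, site, cut_offset):
    """Return the fragment lengths obtained by cutting `amplicon` at every
    occurrence of the recognition `site`.

    cut_offset is the number of bases of the site that stay on the left
    fragment (1 for EcoRI, which cuts G^AATTC). Only the top strand is
    scanned, which is sufficient for palindromic sites.
    """
    amplicon = amplicon.upper()
    cuts = []
    pos = amplicon.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = amplicon.find(site, pos + 1)
    edges = [0] + cuts + [len(amplicon)]
    return [right - left for left, right in zip(edges, edges[1:])]


# Hypothetical 100-bp amplicons differing by one base inside the site
with_site = "A" * 40 + "GAATTC" + "T" * 54
without_site = "A" * 40 + "GAGTTC" + "T" * 54

print(digest(with_site, "GAATTC", 1))     # two fragments: site present
print(digest(without_site, "GAATTC", 1))  # uncut amplicon: site abolished
```

In this toy case the allele carrying the site yields two bands (41 and 59 bp) and the other allele a single uncut band of 100 bp, which is the kind of difference a 3% or 4% agarose gel is meant to resolve.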

Table 4.6.3 Digestion reaction scheme.

Stock solutions             Reaction volume                   Final concentration
                            (in a final volume of 30 μl)
10X Enzyme Buffer           3.0 μl                            1X
100X BSA (a) (10 mg/ml)     0.3 μl                            1X (3 ng)
Enzyme (10 U/μl)            0.3 μl                            3 U/reaction
DNA (~200 ng/μl)            20.0 μl                           ~100 ng/μl

(a) Bovine Serum Albumin.

4.7 HLA-G polymorphisms analysis

The HLA-G gene polymorphism -725 G/C/A (rs1233334) was determined by means of PCR-SBT (PCR-Sequence Based Typing) on a CEQ8800 system, using the following primers: HLA-G -725 F: 5'-ACCCCTGAATGATCAGGAATCT-3'; HLA-G -725 R: 5'-AAAGTTTGTGCTGGCTCCTG-3'. Sequences were analyzed using BioEdit software (version 7.0.5.3) and compared with the reference sequence. Typing of the HLA-G +3142 C/G polymorphism (rs1063320) was performed by Real-Time PCR Endpoint genotyping on a LightCycler 480, using a Custom TaqMan® SNP Genotyping Assay with the following primers and probes: primer F: 5'-GGATCCTGAGCAATCA-3'; primer R: 5'-TTACCGATCTTAATAA-3'; Probe 1 (“G” allele): VIC-TTTCGCTGGCGTGAAG-NFQ; Probe 2 (“C” allele): FAM-TCGCTGGCATGAAG-NFQ.

4.7.1 Sequencing method

Sequencing gives resolution down to the level of single base pairs and is useful for discovering new mutations and detecting SNPs. For the HLA-G gene polymorphism -725 G/C/A (rs1233334), sequencing analysis was carried out to identify the G, C or A allele at position -725 in the 5' URR (Upstream Regulatory Region) of the HLA-G gene. After PCR with the forward and reverse primers HLA-G -725 F: 5'-ACCCCTGAATGATCAGGAATCT-3' and HLA-G -725 R: 5'-AAAGTTTGTGCTGGCTCCTG-3' (or, alternatively, HLA-G -725 F: 5'-AGCATAACCTTGGTAACCCCTG-3' and HLA-G -725 R: 5'-GCACTAGTGAGGGGCATTGT-3'), a first purification step was necessary before proceeding to the sequencing analysis.

An aliquot of each amplified fragment was purified using the AGENCOURT® AMPURE® XP kit (Beckman Coulter Inc.), which uses Agencourt's solid-phase paramagnetic bead technology for high-throughput purification of PCR amplicons. An optimized buffer selectively binds PCR amplicons of 100 bp and larger to paramagnetic beads; excess primers, nucleotides, salts and enzymes are then removed by a simple washing procedure. The resulting purified PCR product, essentially free of contaminants, was amplified for SBT using the forward and the reverse primer, respectively, and the Dye Terminator Cycle Sequencing (DTCS) Quick Start kit (Beckman Coulter Inc.). A second purification step was performed with the Agencourt CleanSEQ system (Beckman Coulter Inc.), a rapid, high-performance dye-terminator removal process based on Solid Phase Reversible Immobilization (SPRI) technology.

The analysis was then carried out on an eight-capillary automated sequencer (CEQ8800 system, Beckman Coulter Inc.), in accordance with Sanger's method. After the run, electropherograms were provided as *.scf output files and analyzed using BioEdit software (version 7.0.5.3). Sample sequences were aligned and compared with each other and, afterwards, with the reference sequence containing the wild-type allele at the locus of interest. In this way, every mutation/SNP was highlighted (figure 4.7.1). For the polymorphism investigated here, C is the minor allele.
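The comparison of a sample sequence with the reference can be illustrated by a minimal Python sketch. It assumes that the two sequences are already aligned and of equal length (no insertions or deletions) and that each position carries a single base call; the sequences and the function name are hypothetical, and the sketch does not reproduce what BioEdit does internally.

```python
def find_variants(reference, sample):
    """List the positions (1-based) where an aligned sample differs from
    the reference, as (position, reference_base, sample_base) tuples.

    Assumption: both sequences are pre-aligned and have the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (index + 1, ref_base, sample_base)
        for index, (ref_base, sample_base)
        in enumerate(zip(reference.upper(), sample.upper()))
        if ref_base != sample_base
    ]


# Hypothetical toy sequences, not the real HLA-G 5' URR
reference = "ACGTACGT"
sample = "ACGTCCGT"
print(find_variants(reference, sample))
```

Heterozygous positions, which appear as double peaks in the electropherogram, would need ambiguity codes and are deliberately left out of this sketch.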

Figure 4.7.1 Electropherogram. One sample (top) carries the -725G allele, while the other two show the C allele (middle) and the A allele (bottom), respectively.


4.7.2 Real-Time PCR

Real-time PCR is a variation of the standard PCR technique that is commonly used to quantify DNA (or RNA) in a sample. In traditional PCR, detection and quantification of the amplified sequence are performed at the end of the reaction, after the last PCR cycle, and involve post-PCR manipulations such as gel electrophoresis and image analysis. In real-time PCR, the amount of DNA is measured after each cycle via fluorescent dyes whose signal increases in direct proportion to the number of PCR product molecules (amplicons) generated. Thus, using sequence-specific primers, the number of copies of a particular DNA (or RNA) sequence can be determined. If the sequence is abundant in the sample, amplification is observed in earlier cycles; if it is scarce, amplification is observed in later cycles. Quantification of the amplified product relies on fluorescent probes or fluorescent DNA-binding dyes and on real-time PCR instruments that measure fluorescence while performing the thermal cycling needed for the PCR reaction.

For the HLA-G +3142 C/G polymorphism (rs1063320), Real-Time PCR Endpoint genotyping was performed on a LightCycler 480 (Roche Molecular Diagnostics, Basel, Switzerland), using a custom TaqMan® SNP Genotyping Assay (Applied Biosystems) with the following primers and probes: primer Forward: 5'-GGATCCTGAGCAATCA-3'; primer Reverse: 5'-TTACCGATCTTAATAA-3'; Probe 1 (“G” allele): VIC-TTTCGCTGGCGTGAAG-NFQ; Probe 2 (“C” allele): FAM-TCGCTGGCATGAAG-NFQ. Each probe consists of a single-stranded DNA oligonucleotide with a fluorophore at the 5' end, such as FAM (6-carboxyfluorescein), called the “reporter”, and an NFQ (Non Fluorescent Quencher) group at the 3' end, called the “quencher”. Before PCR starts, the two dyes (reporter and quencher) are in proximity, so no fluorescence is produced by the reporter.

During PCR, the primers and the probe anneal to the target. DNA polymerase extends the primer upstream of the probe and, if the probe is bound to the correct target sequence, the 5' nuclease activity of the polymerase cleaves the probe, releasing a fragment containing the reporter dye. Once cleavage has taken place, the reporter and quencher dyes are no longer in proximity; the quencher can no longer absorb the energy emitted by the released reporter, and the fluorescence is recorded by the system (see figure 4.7.2). After each cycle, the number of probes bound to the target sequence and cleaved by DNA polymerase increases, leading to an increase in the fluorescence signal. The change in fluorescence over the course of the reaction is measured by the real-time PCR instrument, which combines thermal cycling with fluorescent dye scanning. By plotting fluorescence against cycle number, the instrument generates an amplification plot that represents the accumulation of product over the entire PCR reaction. In this way, amplification can be monitored at each cycle, and the selectivity of the probe ensures that only specific amplicons are detected.
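Endpoint genotyping with two allele-specific probes amounts to comparing the final VIC and FAM signals of each sample. The Python sketch below shows one simple way to turn the two intensities into a genotype call, with VIC reporting the G allele and FAM the C allele as in the probe design above. The thresholds and the function name are hypothetical illustrations; the instrument software uses its own clustering of the endpoint signals, which is not reproduced here.

```python
def call_genotype(vic, fam, min_signal=0.5, ratio=3.0):
    """Call the +3142 genotype from endpoint fluorescence intensities.

    vic: endpoint signal of the G-allele probe (VIC)
    fam: endpoint signal of the C-allele probe (FAM)
    min_signal and ratio are arbitrary illustrative thresholds.
    """
    if max(vic, fam) < min_signal:
        return "no call"       # neither probe was cleaved appreciably
    if vic >= ratio * fam:
        return "G/G"           # only the G-allele probe gives signal
    if fam >= ratio * vic:
        return "C/C"           # only the C-allele probe gives signal
    return "C/G"               # both probes give comparable signal


# Hypothetical endpoint readings (arbitrary fluorescence units)
samples = {"S1": (2.0, 0.2), "S2": (0.2, 2.0), "S3": (1.5, 1.4), "NTC": (0.1, 0.1)}
calls = {name: call_genotype(vic, fam) for name, (vic, fam) in samples.items()}
print(calls)
```

In practice the thresholds would be set from control samples of known genotype and from no-template controls run on the same plate.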

Figure 4.7.2 Schematic representation of the real-time PCR mechanism using TaqMan probes. TaqMan probes are oligonucleotide probes labeled with a reporter (R) fluorophore at the 5' end and a quencher (Q) at the 3' end. Prior to PCR, the two dyes are in proximity and no fluorescence is produced by the reporter (top left). Once PCR starts, the primer and the probe anneal to the template DNA (top right); the 5' nuclease activity of the polymerase (Taq, Thermus aquaticus polymerase) then cleaves the probe, releasing nucleotides from the TaqMan probe, including a fragment containing the reporter dye (bottom), whose fluorescence can be detected by the system.


References

Abi-Rached L, Jobin MJ, Kulkarni S, McWhinnie A, Dalva K, Gragert L, Babrzadeh F, Gharizadeh B, Luo M, Plummer FA, Kimani J, Carrington M, Middleton D, Rajalingam R, Beksac M, et al. The shaping of modern human immune systems by multiregional admixture with archaic humans. Science. 2011; 334:89-94

Achilli A, Rengo C, Magri C, Battaglia V, Olivieri A, Scozzari R, Cruciani F, Zeviani M, Briem E, Carelli V, Moral P, Dugoujon JM, Roostalu U, Loogväli EL, Kivisild T, et al. The molecular dissection of mtDNA haplogroup H confirms that the Franco-Cantabrian glacial refuge was a major source for the European gene pool. Am J Hum Genet. 2004;75:910-918

Achilli A, Perego UA, Bravi CM, Coble MD, Kong QP, Woodward SR, Salas A, Torroni A, Bandelt HJ. The phylogeny of the four pan-American mtDNA haplogroups: implications for evolutionary and disease studies. PLoS One. 2008;3:e1764

Achilli A, Perego UA, Lancioni H, Olivieri A, Gandini F, Hooshiar Kashani B, Battaglia V, Grugni V, Angerhofer N, Rogers MP, Herrera RJ, Woodward SR, Labuda D, Smith DG, Cybulski JS, et al. Reconciling migration models to the Americas with the variation of North American native mitogenomes. Proc Natl Acad Sci U S A. 2013;110:14308-14313

Aimola G, Andrade C, Mota L, Parenti F. Final Pleistocene and Early Holocene at Sitio do Meio, Piauí-Brazil: Stratigraphy and comparison with Pedra Furada. J Lithic Studies. 2014;1:5–24

Anderson S, Bankier AT, Barrell BG, de Bruijn MH, Coulson AR, Drouin J, Eperon IC, Nierlich DP, Roe BA, Sanger F, Schreier PH, Smith AJH, Staden R, Young G. Sequence and organization of the human mitochondrial genome. Nature. 1981;290:457-465

Anderson S, de Bruijn MH, Coulson AR, Eperon IC, Sanger F, Young IG. Complete sequence of bovine mitochondrial DNA. Conserved features of the mammalian mitochondrial genome. J Mol Biol. 1982;156:683-717

Andrews RM, Kubacka I, Chinnery PF, Lightowlers RN, Turnbull DM, Howell N. Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA. Nat Genet. 1999;23:147

Armitage SJ, Jasim SA, Marks AE, Parker AG, Usik VI, Uerpmann HP. The southern route "out of Africa": evidence for an early expansion of modern humans into Arabia. Science. 2011;331:453-456

Arora M, Weisdorf DJ, Spellman SR, Haagenson MD, Klein JP, Hurley CK, Selby GB, Antin JH, Kernan NA, Kollman C, Nademanee A, McGlave P, Horowitz MM, Petersdorf EW. HLA-identical sibling compared with 8/8 matched and mismatched unrelated donor bone marrow transplant for chronic phase chronic myeloid leukemia. J Clin Oncol. 2009;27:1644–1652

Augusto DG, Petzl-Erler ML. KIR and HLA under pressure: evidences of coevolution across worldwide populations. Hum Genet. 2015;134:929-940

Avanzini MA, Bernardo ME, Cometa AM, Perotti C, Zaffaroni N, Novara F, Visai L, Moretta A, Del Fante C, Villa R, Ball LM, Fibbe WE, Maccario R, Locatelli F. Generation of mesenchymal stromal cells in the presence of platelet lysate: a phenotypic and functional comparison of umbilical cord blood- and bone marrow-derived progenitors. Haematologica. 2009;94:1649-1660

Aversa F, Tabilio A, Velardi A, Cunningham I, Terenzi A, Falzetti F, Ruggeri L, Barbabietola G, Aristei C, Latini P, Reisner Y, Martelli MF. Treatment of high-risk acute leukemia with T-cell depleted stem cells from related donors with one fully mismatched HLA haplotype. N Engl J Med. 1998;339:1186–1193

Avery S, Voss MH, Gonzales AM, Lubin M, Castro-Malaspina H, Giralt S, Kernan NA, Scaradavou A, Hedvat CV, Stevens CE, Barker JN. Importance of day 21 BM chimerism in sustained neutrophil engraftment following double-unit cord blood transplantation. Bone Marrow Transplant. 2012;47:1056-1060

Avise JC, Arnold J, Ball RM, Bermingham E, Lamb T, Neigel JE, Reeb CA, Saunders NC. Intraspecific phylogeography: the mitochondrial DNA bridge between population genetics and systematics. Annu Rev Ecol Syst. 1987;18:489-522

Avise JC. Phylogeography: the history and formation of species. Harvard University Press, Cambridge. 2000

Avital G, Buchshtav M, Zhidkov I, Tuval Feder J, Dadon S, Rubin E, Glass D, Spector TD, Mishmar D. Mitochondrial DNA heteroplasmy in diabetes and normal adults: role of acquired and inherited mutational patterns in twins. Hum Mol Genet. 2012;21:4214-4224

Ayub Q, Mohyuddin A, Qamar R, Mazhar K, Zerjal T, Mehdi SQ, Tyler-Smith C. Identification and characterisation of novel human Y-chromosomal microsatellites from sequence database information. Nucleic Acids Res. 2000;28:e8

Azarian M, Busson M, Lepage V, Charron D, Toubert A, Loiseau P, de Latour RP, Rocha V, Socié G. Donor CTLA-4 +49 A/G*GG genotype is associated with chronic GVHD after HLA-identical haematopoietic stem-cell transplantations. Blood. 2007;110:4623-4624

Balanovsky O, Gurianov V, Zaporozhchenko V, Balaganskaya O, Urasin V, Zhabagin M, Grugni V, Canada R, Al-Zahery N, Raveane A, Wen SQ, Yan S, Wang X, Zalloua P, Marafi A, et al. Phylogeography of human Y-chromosome haplogroup Q3-L275 from an academic/citizen science collaboration. BMC Evol Biol. 2017;17:18

Balaresque P, Bowden GR, Adams SM, Leung HY, King TE, Rosser ZH, Goodwin J, Moisan JP, Richard C, Millward A, Demaine AG, Barbujani G, Previderè C, Wilson IJ, Tyler-Smith C, et al. A predominantly neolithic origin for European paternal lineages. PLoS Biol. 2010;8:e1000285

Baldomero H, Gratwohl M, Gratwohl A, Tichelli A, Niederwieser D, Madrigal A, Frauendorfer K; European Group for Blood and Marrow Transplantation EBMT. The EBMT activity survey 2009: Trends over the past 5 years. Bone Marrow Transplant. 2011;46:485–501

Ball RD. Statistical analysis of genomic data. Methods in molecular biology (Clifton, NJ) 2013;1019:171-192

Ballen KK, Gluckman E, Broxmeyer HE. Umbilical cord blood transplantation: the first 25 years and beyond. Blood. 2013;122:491-498

Bandelt HJ, Forster P, Röhl A. Median-joining networks for inferring intraspecific phylogenies. Mol Biol Evol. 1999;16:37-48

Bandelt HJ, Kloss-Brandstätter A, Richards MB, Yao YG, Logan I. The case for the continuing use of the revised Cambridge Reference Sequence (rCRS) and the standardization of notation in human mitochondrial DNA studies. J Hum Genet. 2014;59:66-77

Barbujani G, Colonna V. Human genome diversity: frequently asked questions. Trends Genet. 2010;26:285-295

Barker JN, Scaradavou A, Stevens CE. Combined effect of total nucleated cell dose and HLA match on transplantation outcome in 1061 cord blood recipients with hematologic malignancies. Blood. 2010;115:1843-1849

Barker JN, Byam C, Scaradavou A. How we search: A guide to the selection and acquisition of unrelated cord blood grafts. Blood. 2011;117:3277–3285

Barrell BG, Bankier AT, Drouin J. A different genetic code in human mitochondria. Nature. 1979;282:189-194

Battaglia V, Fornarino S, Al-Zahery N, Olivieri A, Pala M, Myres NM, King RJ, Rootsi S, Marjanovic D, Primorac D, Hadziselimovic R, Vidovic S, Drobnic K, Durmishi N, Torroni A, Santachiara-Benerecetti AS, Underhill PA, Semino O. Y-chromosomal evidence of the cultural diffusion of agriculture in Southeast Europe. Eur J Hum Genet. 2009;17:820-830

Battaglia V, Grugni V, Perego UA, Angerhofer N, Gomez-Palmieri JE, Woodward SR, Achilli A, Myres N, Torroni A, Semino O. The first peopling of South America: new evidence from Y-chromosome haplogroup Q. PLoS One. 2013;8:e71390

Behar DM, Villems R, Soodyall H, Blue-Smith J, Pereira L, Metspalu E, Scozzari R, Makkan H, Tzur S, Comas D, Bertranpetit J, Quintana-Murci L, Tyler-Smith C, Wells RS, Rosset S; Genographic Consortium. The dawn of human matrilineal diversity. Am J Hum Genet. 2008;82:1130-1140

Behar DM, van Oven M, Rosset S, Metspalu M, Loogväli EL, Silva NM, Kivisild T, Torroni A, Villems R. A "Copernican" reassessment of the human mitochondrial DNA tree from its root. Am J Hum Genet. 2012;90:675-684

Birky CW. Relaxed and stringent genomes: why cytoplasmic genes don't obey Mendel's Laws. J Hered. 1994;85:355-365

Birky CW. The inheritance of genes in mitochondria and chloroplasts: laws, mechanisms, and models. Annu Rev Genet. 2001;35:125-148

Bodmer W. Genetic characterization of human populations: from ABO to a genetic map of the British people. Genetics. 2015;199:267-279

Bodner M, Perego UA, Huber G, Fendt L, Röck AW, Zimmermann B, Olivieri A, Gómez-Carballa A, Lancioni H, Angerhofer N, Bobillo MC, Corach D, Woodward SR, Salas A, Achilli A, et al. Rapid coastal spread of first Americans: novel insights from South America's Southern Cone mitochondrial genomes. Genome Res. 2012;22:811-820

Boëda E, Clemente-Conte I, Fontugne M, Lahaye C, Pino M, Daltrini Felice G, Guidon N, Hoeltz S, Lourdeau A, Pagli M, et al. A new late Pleistocene archaeological sequence in South America: the Vale da Pedra Furada (Piaui, Brazil). Antiquity. 2014;88:927–955

Bogenhagen D, Clayton DA. Mouse L cell mitochondrial DNA molecules are selected randomly for replication throughout the cell cycle. Cell. 1977;11:719-727

Bordoni C, Magalon J, Gilbertas C, Gamerre M, Le Coz P, Berthomieu M, Chabannon C, Di Cristofaro J, Picard C. Cord blood collection and banking from a population with highly diverse geographic origins increase HLA diversity in the registry and do not lower the proportion of validated cord blood units: experience of the Marseille Cord Blood Bank. Bone Marrow Transplant. 2015;50:531-535

Bortolini MC, Salzano FM, Thomas MG, Stuart S, Nasanen SP, Bau CH, Hutz MH, Layrisse Z, Petzl-Erler ML, Tsuneto LT, Hill K, Hurtado AM, Castro-de-Guerra D, Torres MM, Groot H, et al. Y-chromosome evidence for differing ancient demographic histories in the Americas. Am J Hum Genet. 2003;73:524-539

Bosch E, Calafell F, Rosser ZH, Nørby S, Lynnerup N, Hurles ME, Jobling MA. High level of male-biased Scandinavian admixture in Greenlandic Inuit shown by Y-chromosomal analysis. Hum Genet. 2003;112:353-363

Bosch-Vizcaya A, Pérez-García A, Brunet S, Solano C, Buño I, Guillem V, Martínez-Laperche C, Sanz G, Barrenetxea C, Martínez C, Tuset E, Lloveras N, Coll R, Guardia R, González Y, et al. GvHD/Immunotherapy committee of the Spanish Group for Hematopoietic Transplant (GETH). Donor CTLA-4 genotype influences clinical outcome after T-cell-depleted allogeneic hematopoietic stem cell transplantation from HLA-identical sibling donors. Biol Blood Marrow Transplant. 2012;18:100-105

Bowmaker M, Yang MY, Yasukawa T, Reyes A, Jacobs HT, Huberman JA, Holt IJ. Mammalian mitochondrial DNA replicates bidirectionally from an initiation zone. J Biol Chem. 2003;278:50961-50969

Brandini S, Bergamaschi P, Cerna MF, Gandini F, Bastaroli F, Bertolini E, Cereda C, Ferretti L, Gómez-Carballa A, Battaglia V, Salas A, Semino O, Achilli A, Olivieri A, and Torroni A. The Paleo-Indian entry into South America according to mitogenomes. Mol Biol Evol. [in press]

Brandstätter A, Peterson CT, Irwin JA, Mpoke S, Koech DK, Parson W, Parsons TJ. Mitochondrial DNA control region sequences from Nairobi (): inferring phylogenetic parameters for the establishment of a forensic database. Int J Legal Med. 2004;118:294-306

Brown WM, George M, Wilson AC. Rapid evolution of animal mitochondrial DNA. Proc Natl Acad Sci U S A. 1979;76:1967-1971

Brunstein CG, Fuchs EJ, Carter SL, Karanes C, Costa LJ, Wu J, Devine SM, Wingard JR, Aljitawi OS, Cutler CS, Jagasia MH, Ballen KK, Eapen M, O'Donnell PV; Blood and Marrow Transplant Clinical Trials Network. Alternative donor transplantation after reduced intensity conditioning: Results of parallel phase 2 trials using partially HLA-mismatched related bone marrow or unrelated double umbilical cord blood grafts. Blood. 2011;118:282–288

Brunstein CG, Petersdorf EW, DeFor TE, Noreen H, Maurer D, MacMillan ML, Ustun C, Verneris MR, Miller JS, Blazar BR, McGlave PB, Weisdorf DJ, Wagner JE. Impact of allele-level HLA mismatch on outcomes in recipients of double umbilical cord blood transplantation. Biol Blood Marrow Transplant. 2016;22:487–492

Buhler S, Sanchez-Mazas A. HLA DNA sequence variation among human populations: molecular signatures of demographic and selective events. PLoS One. 2011;6:e14643

Buzzi M, Alviano F, Campioni D, Stignani M, Melchiorri L, Rotola A, Tazzari P, Ricci F, Vaselli C, Terzi A, Pagliaro PP, Cuneo A, Lanza F, Bontadini A, Baricordi OR, et al. Umbilical cord blood CD34(+) cell-derived progeny produces human leukocyte antigen-G molecules with immuno-modulatory functions. Hum Immunol 2012;73:150-155

Calloway CD, Reynolds RL, Herrin GL Jr, Anderson WW. The frequency of heteroplasmy in the HVII region of mtDNA differs across tissue types and increases with age. Am J Hum Genet. 2000;66:1384–1397

Cann RL, Stoneking M, Wilson AC. Mitochondrial DNA and human evolution. Nature. 1987;325:31-36

Cao L, Shitara H, Horii T, Nagao Y, Imai H, Abe K, Hara T, Hayashi J, Yonekawa H. The mitochondrial bottleneck occurs without reduction of mtDNA content in female mouse germ cells. Nat Genet. 2007;39:386-390

Capittini C, Bergamaschi P, Sachetto S, Truglio M, Viola M, Marchesi A, Genovese V, Romano B, Guarene M, Poma R, Martinetti M, Tinelli C, Salvaneschi L. The plasma levels of soluble HLA-G molecules correlate directly with CD34+ cell concentration and HLA-G 14bp insertion/insertion polymorphism in cord blood donors. Blood Transfusion. 2014;23:1-6

Cardoso S, Alfonso-Sánchez MA, Valverde L, Sánchez D, Zarrabeitia MT, Odriozola A, Martínez-Jarreta B, de Pancorbo MM. Genetic uniqueness of the Waorani tribe from the Ecuadorian Amazon. Heredity (Edinb). 2012;108:609-615

Carling PJ, Cree LM, Chinnery PF. The implications of mitochondrial DNA copy number regulation during embryogenesis. Mitochondrion. 2011;11:686-692

Carvalho dos Santos L, Tureck LV, Wowk PF, Mattar SB, Gelmini GF, Magalhães JC, Bicalho Mda G, Roxo VM. HLA-E polymorphisms in an Afro-descendant Southern Brazilian population. Hum Immunol. 2013;74:199-202

Casanova M, Leroy P, Boucekkine C, Weissenbach J, Bishop C, Fellous M, Purrello M, Fiori G, Siniscalco M. A human Y-linked DNA polymorphism and its potential for estimating genetic and evolutionary distance. Science. 1985;230:1403-1406

Castelli EC, Mendes-Junior CT, Veiga-Castelli LC, Roger M, Moreau P, Donadi EA. A comprehensive study of polymorphic sites along the HLA-G gene: implication for gene regulation and evolution. Mol Biol Evol. 2011;28:3069-3086

Cerna M, Falco M, Friedman H, Raimondi E, Maccagno A, Fernandez-Viña M, Stastny P. Differences in HLA class II alleles of isolated South American Indian populations from Brazil and Argentina. Hum Immunol. 1993;37:213-220

Cerný V, Mulligan CJ, Fernandes V, Silva NM, Alshamali F, Non A, Harich N, Cherni L, El Gaaied AB, Al-Meeri A, Pereira L. Internal diversification of mitochondrial haplogroup R0a reveals post-last glacial maximum demographic expansions in South Arabia. Mol Biol Evol. 2011;28:71-78

Chandrasekar A, Kumar S, Sreenath J, Sarkar BN, Urade BP, Mallick S, Bandopadhyay SS, Barua P, Barik SS, Basu D, Kiran U, Gangopadhyay P, Sahani R, Prasad BV, Gangopadhyay S, et al. Updating phylogeny of mitochondrial DNA macrohaplogroup M in India: dispersal of modern human in South Asian corridor. PLoS One. 2009;4:e7447

Chatters JC, Kennett DJ, Asmerom Y, Kemp BM, Polyak V, Blank AN, Beddows PA, Reinhardt E, Arroyo-Cabrales J, Bolnick DA, Malhi RS, Culleton BJ, Erreguerena PL, Rissolo D, Morell-Hart S, et al. Late Pleistocene human skeleton and mtDNA link Paleoamericans and modern Native Americans. Science. 2014;344:750-754

Chaubey G, Karmin M, Metspalu E, Metspalu M, Selvi-Rani D, Singh VK, Parik J, Solnik A, Naidu BP, Kumar A, Adarsh N, Mallick CB, Trivedi B, Prakash S, Reddy R, et al. Phylogeography of mtDNA haplogroup R7 in the Indian peninsula. BMC Evol Biol. 2008;8:227

Chien JW, Zhang XC, Fan W, Wang H, Zhao LP, Martin PJ, Storer BE, Boeckh M, Warren EH, Hansen JA. Evaluation of published single nucleotide polymorphisms associated with acute GVHD. Blood. 2012;119:5311-5319

Chiesa R, Gilmour K, Qasim W, Adams S, Worth AJ, Zhan H, Montiel-Equihua CA, Derniame S, Cale C, Rao K, Hiwarkar P, Hough R, Saudemont A, Fahrenkrog CS, Goulden N, et al. Omission of in vivo T-cell depletion promotes rapid expansion of naive CD4+ cord blood lymphocytes and restores adaptive immunity within 2 months after unrelated cord blood transplant. Br J Haematol. 2012;156:656-666

Chinnery PF, Samuels DC. Relaxed replication of mtDNA: A model with implications for the expression of disease. Am J Hum Genet. 1999;64:1158-1165

Chinnery PF, Hudson G. Mitochondrial genetics. Br Med Bull. 2013;106:135-159

Christian BE, Spremulli LL. Evidence for an active role of IF3mt in the initiation of translation in mammalian mitochondria. Biochemistry. 2009;48:3269-3278

Clark PU, Dyke AS, Shakun JD, Carlson AE, Clark J, Wohlfarth B, Mitrovica JX, Hostetler SW, McCabe AM. The Last Glacial Maximum. Science. 2009;325:710-714

Clayton DA, Doda JN, Friedberg EC. The absence of a pyrimidine dimer repair mechanism in mammalian mitochondria. Proc Natl Acad Sci U S A. 1974;71:2777-2781

Clayton DA. Replication of animal mitochondrial DNA. Cell. 1982;28:693-705

Consonni G, Moreno E, Venturini S. Testing Hardy-Weinberg equilibrium: an objective Bayesian analysis. Statistics in Medicine. 2011;30:62-74

Cooke H. Repeated sequence specific to human males. Nature. 1976;262:182-186

Cooke HJ, Schmidtke J, Gosden JR. Characterisation of a human Y chromosome repeated sequence and related sequences in higher primates. Chromosoma. 1982;87:491-502

Cree LM, Samuels DC, de Sousa Lopes SC, Rajasimha HK, Wonnapinij P, Mann JR, Dahl HH, Chinnery PF. A reduction of mitochondrial DNA molecules during embryogenesis explains the rapid segregation of genotypes. Nat Genet. 2008;40:249-254

Cruciani F, La Fratta R, Torroni A, Underhill PA, Scozzari R. Molecular dissection of the Y chromosome haplogroup E-M78 (E3b1a): a posteriori evaluation of a microsatellite-network-based approach through six new biallelic markers. Hum Mutat. 2006;27:831-832

Cunha R, Zago MA, Querol S, Volt F, Ruggeri A, Sanz G, Pouthier F, Koegler G, Vicario JL, Bergamaschi P, Saccardi R, Lamas CH, Heredia C, Michel G, Bittencourt H, et al. An analysis on behalf of Eurocord, Cord Blood Committee Cellular Therapy - Immunobiology Working Party of EBMT, Netcord and Faculdade de Medicina de Ribeirão Preto - Faculdade de Medicina de São Paulo, Universidade de São Paulo. Impact of CTLA4 genotype and other immune response gene polymorphisms on outcomes after single umbilical cord blood transplantation. Blood. 2017;129:525-532

Dahi PB, Ponce DM, Devlin S, Evans KL, Lubin M, Gonzales AM, Byam C, Sideroff M, Wells D, Giralt S, Kernan NA, Scaradavou A, Barker JN. Donor-recipient allele-level HLA matching of unrelated cord blood units reveals high degrees of mismatch and alters graft selection. Bone Marrow Transplant. 2014;49:1184-1186

Dausset J. Iso-leuko-antibodies. Acta Haematol. 1958;20:156–166

de Saint Pierre M, Gandini F, Perego UA, Bodner M, Gómez-Carballa A, Corach D, Angerhofer N, Woodward SR, Semino O, Salas A, Parson W, Moraga M, Achilli A, Torroni A, Olivieri A. Arrival of Paleo-Indians to the Southern Cone of South America: new clues from mitogenomes. PLoS One. 2012;7:e51311

del Pilar Fortes M, Gill G, Paredes ME, Gamez LE, Palacios M, Blanca I, Tassinari P. Allele and haplotype frequencies at Human leukocyte antigen class I and II genes in Venezuela's population. Ann Biol Clin. 2012;70:175-181

deMenocal PB, Stringer C. Human migration: Climate and the peopling of the world. Nature. 2016;538:49-50

Deng W, Shi B, He X, Zhang Z, Xu J, Li B, Yang J, Ling L, Dai C, Qiang B, Shen Y, Chen R. Evolution and migration history of the Chinese population inferred from Chinese Y-chromosome evidence. J Hum Genet. 2004;49:339-348

Derenko M, Malyarchuk B, Grzybowski T, Denisova G, Dambueva I, Perkova M, Dorzhu C, Luzina F, Lee HK, Vanecek T, Villems R, Zakharov I. Phylogeographic analysis of mitochondrial DNA in northern Asian populations. Am J Hum Genet. 2007;81:1025-1041

Derenko M, Malyarchuk B, Grzybowski T, Denisova G, Rogalla U, Perkova M, Dambueva I, Zakharov I. Origin and post-glacial dispersal of mitochondrial DNA haplogroups C and D in northern Asia. PLoS One. 2010;5:e15214

Deschamps M, Laval G, Fagny M, Itan Y, Abel L, Casanova JL, Patin E, Quintana-Murci L. Genomic signatures of selective pressures and introgression from archaic hominins at human innate immunity genes. Am J Hum Genet. 2016;98:5-21

Di Gaetano C, Cerutti N, Crobu F, Robino C, Inturri S, Gino S, Guarrera S, Underhill PA, King RJ, Romano V, Cali F, Gasparini M, Matullo G, Salerno A, Torre C, Piazza A. Differential Greek and northern African migrations to Sicily are supported by genetic evidence from

Dickinson AM, Middleton PG, Rocha V, Gluckman E, Holler E, Eurobank members. Genetic polymorphisms predicting the outcome of bone marrow transplants. Br J Haematol. 2004;127:479-490

Dickinson AM, Middleton PG. Beyond the HLA typing age: genetic polymorphisms predicting transplant outcome. Blood Rev. 2005;19:333-340

Dickinson AM. Non-HLA genetics and predicting outcome in HSCT. Int J Immunogen. 2008;35:375-380

Dickinson AM, Pearce KF, Norden J, O'Brien SG, Holler E, Bickeböller H, Balavarca Y, Rocha V, Kolb HJ, Hromadnikova I, Sedlacek P, Niederwieser D, Brand R, Ruutu T, Apperley J, et al. Impact of genomic risk factors on outcome after hematopoietic stem cell transplantation for patients with chronic myeloid leukemia. Haematologica. 2010;95:922-927

Dickinson AM, Norden J. Non-HLA genomics: does it have a role in predicting haematopoietic stem cell transplantation outcome? Int J Immunogenet. 2015;42:229-238

Dillehay TD. Monte Verde: A Late Pleistocene Settlement in Chile. 1989;Volume I: The Paleo-environment and Site Context. Washington, D.C.: Smithsonian Institution Press

Dillehay TD, Collins MB. Early cultural evidence from Monte Verde in Chile. Nature. 1998;332:150-152

Dillehay TD, Ocampo C, Saavedra J, Sawakuchi AO, Vega RM, Pino M, Collins MB, Scott Cummings L, Arregui I, Villagran XS, Hartmann GA, Mella M, González A, Dix G. New Archaeological Evidence for an Early Human Presence at Monte Verde, Chile. PLoS One. 2015;10:e0141923

Dillehay TD, Goodbred S, Pino M, Vásquez Sánchez VF, Tham TR, Adovasio J, Collins MB, Netherly PJ, Hastorf CA, Chiou KL, Piperno D, Rey I, Velchoff N. Simple technologies and diverse food strategies of the Late Pleistocene and Early Holocene at Huaca Prieta, Coastal Peru. Sci Adv. 2017;3:e1602778

DiMauro S, Schon EA. Mitochondrial respiratory-chain diseases. N Engl J Med. 2003;348:2656-2668

Dos Santos EJ, McCabe A, Gonzalez-Galarza FF, Jones AR, Middleton D. Allele Frequencies Net Database: Improvements for storage of individual genotypes and analysis of existing data. Hum Immunol. 2016;77:238-248

Drummond AJ, Rambaut A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol. 2007;7:214

Duarte RF, Salgado M, Sánchez-Ortega I, Arnan M, Canals C, Domingo-Domenech E, Fernández-de-Sevilla A, González-Barca E, Morón-López S, Nogues N, Patiño B, Puertas MC, Clotet B, Petz LD, Querol S, et al. CCR5 Δ32 homozygous cord blood allogeneic transplantation in a patient with HIV: a case report. Lancet HIV. 2015;2:e236-242

Duggan AT, Evans B, Friedlaender FR, Friedlaender JS, Koki G, Merriwether DA, Kayser M, Stoneking M. Maternal history of Oceania from complete mtDNA genomes: contrasting ancient diversity with recent homogenization due to the Austronesian expansion. Am J Hum Genet. 2014;94:721-733

Duggan AT, Stoneking M. Recent developments in the genetic history of East Asia and Oceania. Curr Opin Genet Dev. 2014;29:9-14

Dulik MC, Zhadanov SI, Osipova LP, Askapuli A, Gau L, Gokcumen O, Rubinstein S, Schurr TG. Mitochondrial DNA and Y chromosome variation provides evidence for a recent common ancestry between Native Americans and Indigenous Altaians. Am J Hum Genet. 2012;90: 573

Dunbar SA. Applications of Luminex xMAP technology for rapid, high-throughput multiplexed nucleic acid detection. Clin. Chim. Acta. 2006;363:71-82

Eapen M, Rubinstein P, Zhang MJ, Stevens C, Kurtzberg J, Scaradavou A, Loberiza FR, Champlin RE, Klein JP, Horowitz MM, Wagner JE. Outcomes of transplantation of unrelated donor umbilical cord blood and bone marrow in children with acute leukaemia: A comparison study. Lancet. 2007;369:1947–1954

Eapen M, Rocha V, Sanz G, Scaradavou A, Zhang MJ, Arcese W, Sirvent A, Champlin RE, Chao N, Gee AP, Isola L, Laughlin MJ, Marks DI, Nabhan S, Ruggeri A, et al. Center for International Blood and Marrow Transplant Research; Acute Leukemia Working Party Eurocord (the European Group for Blood Marrow Transplantation); National Cord Blood Program of the New York Blood Center. Effect of graft source on unrelated donor haematopoietic stem-cell transplantation in adults with acute leukaemia: A retrospective analysis. Lancet Oncol. 2010;11:653–660

Eapen M, Klein JP, Sanz GF, Spellman S, Ruggeri A, Anasetti C, Brown M, Champlin RE, Garcia-Lopez J, Hattersely G, Koegler G, Laughlin MJ, Michel G, Nabhan SK, Smith FO, et al. Eurocord-European Group for Blood and Marrow Transplantation; Netcord; Center for International Blood and Marrow Transplant Research. Effect of donor-recipient HLA matching at HLA A, B, C, and DRB1 on outcomes after umbilical-cord blood transplantation for leukaemia and myelodysplastic syndrome: a retrospective analysis. Lancet Oncol. 2011;12:1214-1221

Eapen M, Klein JP, Ruggeri A, Spellman S, Lee SJ, Anasetti C, Arcese W, Barker JN, Baxter-Lowe LA, Brown M, Fernandez-Vina MA, Freeman J, He W, Iori AP, Horowitz MM, et al. Center for International Blood and Marrow Transplant Research, Netcord, Eurocord, and the European Group for Blood and Marrow Transplantation. Impact of allele-level HLA matching on outcomes after myeloablative single unit umbilical cord blood transplantation for hematologic malignancy. Blood. 2014;123:133-140

Ellis N, Taylor A, Bengtsson BO, Kidd J, Rogers J, Goodfellow P. Population structure of the human pseudoautosomal boundary. Nature. 1990;344: 663-665

Elson JL, Andrews RM, Chinnery PF, Lightowlers RN, Turnbull DM, Howell N. Analysis of European mtDNAs for recombination. Am J Hum Genet. 2001;68:145-153

Everitt BS. An R and S-Plus® Companion to Multivariate Analysis; 2005

Fagundes NJ, Kanitz R, Eckert R, Valls AC, Bogo MR, Salzano FM, Smith DG, Silva WA Jr, Zago MA, Ribeiro-dos-Santos AK, Santos SE, Petzl-Erler ML, Bonatto SL. Mitochondrial population genomics supports a single pre-Clovis origin with a coastal route for the peopling of the Americas. Am J Hum Genet. 2008;82:583-592

Falkenberg M, Gaspari M, Rantanen A, Trifunovic A, Larsson NG, Gustafsson CM. Mitochondrial transcription factors B1 and B2 activate transcription of human mtDNA. Nat Genet. 2002;31:289-294

Falkenberg M, Larsson NG, Gustafsson CM. DNA replication and transcription in mammalian mitochondria. Annu Rev Biochem. 2007;76:679-699

Fehren-Schmitz L, Llamas B, Lindauer S, Tomasto-Cagigao E, Kuzminsky S, Rohland N, Santos FR, Kaulicke P, Valverde G, Richards SM, Nordenfelt S, Seidenberg V, Mallick S, Cooper A, Reich D, Haak W. A re-appraisal of the early Andean human remains from Lauricocha in Peru. PLoS One. 2015;10:e0127141

Fernandez-Vina MA, Lázaro AM, Marcos CY, Nulf C, Raimondi E, Haas EJ, Stastny P. Dissimilar evolution of B-locus versus A-locus and class II loci of the HLA region in South American Indian tribes. Tissue Antigens 1997;50:233–250

Fernandez-Vina MA, Hollenbach JA, Lyke KE, Sztein MB, Maiers M, Klitz W, Cano P, Mack S, Single R, Brautbar C, Israel S, Raimondi E, Khoriaty E, Inati A, Andreani M, et al. Tracking human migrations by the analysis of the distribution of HLA alleles, lineages and haplotypes in closed and open populations. Philos Trans R Soc Lond B Biol Sci. 2012;367:820-829

Filipovich AH, Weisdorf D, Pavletic S, Socie G, Wingard JR, Lee SJ, Martin P, Chien J, Przepiorka D, Couriel D, Cowen EW, Dinndorf P, Farrell A, Hartzman R, Henslee-Downey J, et al. National Institutes of Health consensus development project on criteria for clinical trials in chronic graft-versus-host disease: I. Diagnosis and staging working group report. Biol Blood Marrow Transplant. 2005;11:945-956

Fine JP, Gray RJ. A Proportional Hazards Model for the Subdistribution of a Competing Risk. 1999.

Fix AG. Rapid deployment of the five founding Amerind mtDNA haplogroups via coastal and riverine colonization. Am J Phys Anthropol. 2005;128:430-436

Fleischhauer K, Zino E, Mazzi B, Sironi E, Servida P, Zappone E, Benazzi E, Bordignon C. Peripheral blood stem cell allograft rejection mediated by CD4(+) T lymphocytes recognizing a single mismatch at HLA-DP beta 1*0901. Blood 2001;98:1122–1126

Flomenberg N, Baxter-Lowe LA, Confer D, Fernandez-Vina M, Filipovich A, Horowitz M, Hurley C, Kollman C, Anasetti C, Noreen H, Begovich A, Hildebrand W, Petersdorf E, Schmeckpeper B, Setterholm M, et al. Impact of HLA class I and class II high resolution matching on outcomes of unrelated donor bone marrow transplantation: HLA-C mismatching is associated with a strong adverse effect on transplantation outcome. Blood. 2004;104:1923–1930

Flores C, Maca-Meyer N, Pérez JA, González AM, Larruga JM, Cabrera VM. A predominant European ancestry of paternal lineages from Canary Islanders. Ann Hum Genet 2003;67: 138-152

Fornarino S, Pala M, Battaglia V, Maranta R, Achilli A, Modiano G, Torroni A, Semino O, Santachiara-Benerecetti SA. Mitochondrial and Y-chromosome diversity of the Tharus (Nepal): a reservoir of genetic variation. BMC Evol Biol. 2009;9: 54-169

Forster P, Harding R, Torroni A and Bandelt HJ. Origin and evolution of Native American mtDNA variation: a reappraisal. Am J Hum Genet, 1996;59:935-945

Forster P, Matsumura S. Evolution. Did early humans go north or south? Science. 2005;308:965-966

Foster JW, Graves JA. An SRY-related sequence on the marsupial X chromosome: implications for the evolution of the mammalian testis-determining gene. Proc Natl Acad Sci U S A 1994;91:1927-1931

Fraser B. The first South Americans: Extreme living. Nature 2014;514:24-26

Friedlaender J, Schurr T, Gentz F, Koki G, Friedlaender F, Horvat G, Babb P, Cerchio S, Kaestle F, Schanfield M, Deka R, Yanagihara R, Merriwether DA. Expanding Southwest Pacific mitochondrial haplogroups P and Q. Mol Biol Evol. 2005;22:1506-1517

Fu Q, Posth C, Hajdinjak M, Petr M, Mallick S, Fernandes D, Furtwängler A, Haak W, Meyer M, Mittnik A, Nickel B, Peltzer A, Rohland N, Slon V, Talamo S, et al. The genetic history of Ice Age Europe. Nature. 2016;534:200-205

Gamble C, Davies W, Pettitt P, Richards M. Climate change and evolving human diversity in Europe during the last glacial. Philos Trans R Soc Lond B Biol Sci. 2004;359:243-253

Gandini F, Achilli A, Pala M, Bodner M, Brandini S, Huber G, Egyed B, Ferretti L, Gómez-Carballa A, Salas A, Scozzari R, Cruciani F, Coppa A, Parson W, Semino O, et al. Mapping human dispersals into the Horn of Africa from Arabian Ice Age refugia using mitogenomes. Sci Rep. 2016;6:25472

Gasparre G, Porcelli AM, Lenaz G, Romeo G. Relevance of mitochondrial genetics and metabolism in cancer development. Cold Spring Harb Perspect Biol. 2013;5.pii:a011411

Gendzekhadze K, Norman PJ, Abi-Rached L, Graef T, Moesta AK, Layrisse Z, Parham P. Co-evolution of KIR2DL3 with HLA-C in a human population retaining minimal essential diversity of KIR and HLA class I ligands. Proc Natl Acad Sci USA. 2009;106:18692–18697

Geppert M, Baeta M, Núñez C, Martínez-Jarreta B, Zweynert S, Cruz OW, González-Andrade F, González-Solorzano J, Nagy M, Roewer L. Hierarchical Y-SNP assay to study the hidden diversity and phylogenetic relationship of native populations in South America. Forensic Sci Int Genet. 2011;5: 100-104

Gilbert MT, Jenkins DL, Götherstrom A, Naveran N, Sanchez JJ, Hofreiter M, Thomsen PF, Binladen J, Higham TF, Yohe RM 2nd, Parr R, Cummings LS, Willerslev E. DNA from pre-Clovis human coprolites in Oregon, North America. Science. 2008;320:786-789

Giles RE, Blanc H, Cann HM, Wallace DC. Maternal inheritance of human mitochondrial DNA. Proc Natl Acad Sci U S A. 1980;77:6715-6719

Gluckman E, Broxmeyer HE, Auerbach AD, Friedman HS, Douglas GW, Devergie A, Esperou H, Thierry D, Socie G, Lehn P, Cooper S, English D, Kurtzberg J, Bard J and Boyse EA. Hematopoietic reconstitution in a patient with Fanconi's anemia by means of umbilical-cord blood from an HLA-identical sibling. N Engl J Med. 1989;321:1174-1178

Gluckman E, Rocha V, Boyer-Chammard A, Locatelli F, Arcese W, Pasquini R, Ortega J, Souillet G, Ferreira E, Laporte JP, Fernandez M, Chastang C. Outcome of cord-blood transplantation from related and unrelated donors. Eurocord Transplant Group and the European Blood and Marrow Transplantation Group. N Engl J Med. 1997;337:373–381

Gluckman E, Rocha V, Arcese W, Michel G, Sanz G, Chan KW, Takahashi TA, Ortega J, Filipovich A, Locatelli F, Asano S, Fagioli F, Vowels M, Sirvent A, Laporte JP, et al. Eurocord Group. Factors associated with outcomes of unrelated cord blood transplant: guidelines for donor choice. Exp Hematol. 2004;32:397-407

Gluckman E, Rocha V. Cord blood transplant: strategy of alternative donor search. Springer Semin Immunopathol. 2004;26:143-154

Gluckman E, Rocha V. Donor selection for unrelated cord blood transplants. Curr Opin Immunol. 2006;18:565-570

Gluckman E. History of cord blood transplantation. Bone Marrow Transplant 2009;4:621–626

Gluckman E, Ruggeri A, Volt F, Cunha R, Boudjedir K, Rocha V. Milestones in umbilical cord blood transplantation. Br J Haematol. 2011;154:441-447

Glucksberg H, Storb R, Fefer A, Buckner CD, Neiman PE, Clift RA, Lerner KG, Thomas ED. Clinical manifestations of graft-versus-host disease in human recipients of marrow from HLA-matched sibling donors. Transplantation. 1974; 18: 295-304

Gómez-Carballa A, Catelli L, Pardo-Seco J, Martinón-Torres F, Roewer L, Vullo C and Salas A. The complete mitogenome of a 500-year-old Inca child mummy. Sci Rep. 2015;5:16462

Gonder MK, Mortensen HM, Reed FA, de Sousa A, Tishkoff SA. Whole-mtDNA genome sequence analysis of ancient African lineages. Mol Biol Evol. 2007;24:757-768

Gonzalez A, Alegre E, Torres MI, Díaz-Lagares A, Lorite P, Palomeque T, Arroyo A. Evaluation of HLA-G5 plasmatic levels during pregnancy and relationship with the 14-bp polymorphism. Am J Reprod Immunol. 2010;64:367-374

González-Andrade F, Sánchez D, González-Solórzano J, Gascón S, Martínez-Jarreta B. Sex-specific genetic admixture of Mestizos, Amerindian Kichwas, and Afro-Ecuadorans from Ecuador. Hum Biol. 2007;79:51-77

Gragert L, Eapen M, Williams E, Freeman J, Spellman S, Baitty R, Hartzman R, Rizzo JD, Horowitz M, Confer D, Maiers M. HLA match likelihoods for hematopoietic stem-cell grafts in the U.S. registry. N Engl J Med. 2014;371:339-348

Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH, Hansen NF, Durand EY, Malaspinas AS, Jensen JD, Marques-Bonet T, et al. A draft sequence of the Neandertal genome. Science. 2010;328:710-722

Greenberg JH, Christy GTI, Zegura SL. The settlement of the Americas: A comparison of the linguistic, dental, and genetic evidence. Curr Anthropol. 1986;27:477–497

Greenberg JH. Language in the Americas. Stanford, CA: Stanford University Press. 1987

Greenspan B. Direct submission. 2015. Family Tree DNA.

Groucutt HS, Petraglia MD, Bailey G, Scerri EM, Parton A, Clark-Balzan L, Jennings RP, Lewis L, Blinkhorn J, Drake NA, Breeze PS, Inglis RH, Devès MH, Meredith-Williams M, Boivin N, et al. Rethinking the dispersal of Homo sapiens out of Africa. Evol Anthropol. 2015;24:149-164

Grugni V, Battaglia V, Hooshiar Kashani B, Parolo S, Al-Zahery N, Achilli A, Olivieri A, Gandini F, Houshmand M, Sanati MH, Torroni A, Semino O. Ancient migratory events in the Middle East: new clues from the Y-chromosome variation of modern Iranians. PLoS One. 2012;7:e41252

Grugni V, Battaglia V, Perego UA, Raveane A, Lancioni H, Olivieri A, Ferretti L, Woodward SR, Pascale JM, Cooke R, Myres N, Motta J, Torroni A, Achilli A, Semino O. Exploring the Y Chromosomal Ancestry of Modern Panamanians. PLoS One. 2015;10:e0144223

Grün R, Stringer CB. Electron spin resonance dating and the evolution of modern humans. Archaeometry. 1991;33:153-199

Guarene M, Badulli C, Cremaschi AL, Sbarsi I, Cacciatore R, Tinelli C, Pasi A, Bergamaschi P, and Perotti CG. Luminex® xMAP® technology is an effective strategy for high definition HLA typing of cord blood units prior to listing. [submitted]

Guidon N and Delibrias G. Carbon-14 dates point to man in the Americas 32,000 years ago. Nature 1986;321:769-771

Hagelberg E, Goldman N, Lio P, Whelan S, Schiefenhovel W, Clegg JB, Bowden DK. Evidence for mitochondrial DNA recombination in a human population of island Melanesia. Proc Biol Sci. 1999;266:485-492

Hammer MF, Horai S. Y chromosomal DNA variation and the peopling of Japan. Am J Hum Genet. 1995;56:951-962

Hammer MF, Karafet T, Rasanayagam A, Wood ET, Altheide TK, Jenkins T, Griffiths RC, Templeton AR, Zegura SL. Out of Africa and back again: nested cladistic analysis of human Y chromosome variation. Mol Biol Evol. 1998;15:427-441

Hammer MF, Redd AJ, Wood ET, Bonner MR, Jarjanazi H, Karafet T, Santachiara-Benerecetti S, Oppenheim A, Jobling MA, Jenkins T, Ostrer H, Bonne-Tamir B. Jewish and Middle Eastern non-Jewish populations share a common pool of Y-chromosome biallelic haplotypes. Proc Natl Acad Sci U S A. 2000;97:6769-6774

Hammer MF, Karafet TM, Park H, Omoto K, Harihara S, Stoneking M, Horai S. Dual origins of the Japanese: common ground for hunter-gatherer and farmer Y chromosomes. J Hum Genet. 2006;51:47-58

Hanson EK, Ballantyne J. Comprehensive annotated STR physical map of the human Y chromosome: Forensic implications. Leg Med (Tokyo). 2006;8:110-120

Harris P, Boyd E, Young BD, Ferguson-Smith MA. Determination of the DNA content of human chromosomes by flow cytometry. Cytogenet Cell Genet. 1986;41:14-21

Heinemann FM. HLA genotyping and antibody characterization using the Luminex™ Multiplex Technology. Transfus. Med. Hemother. 2009; 362:273-278

Hill C, Soares P, Mormina M, Macaulay V, Clarke D, Blumbach PB, Vizuete-Forster M, Forster P, Bulbeck D, Oppenheimer S, Richards M. A mitochondrial stratigraphy for island southeast Asia. Am J Hum Genet. 2007;80:29-43

Hiwarkar P, Qasim W, Ricciardelli I, Gilmour K, Quezada S, Saudemont A, Amrolia P, Veys P. Cord blood T cells mediate enhanced antitumor effects compared with adult peripheral blood T cells. Blood. 2015;126:2882-2891

Ho SY, Duchêne S. Molecular-clock methods for estimating evolutionary rates and timescales. Mol Ecol. 2014;23:5947-5965

Ho SY. The molecular clock and estimating species divergence. Nature Education 2008;1:168

Hoffecker JF, Elias SA, O'Rourke DH. Anthropology. Out of Beringia? Science. 2014;343:979-980

Hoffecker JF, Elias SA, O'Rourke DH, Scott GR, Bigelow NH. Beringia and the global dispersal of modern humans. Evol Anthropol. 2016;25:64-78

Hofmanová Z, Kreutzer S, Hellenthal G, Sell C, Diekmann Y, Díez-Del-Molino D, van Dorp L, López S, Kousathanas A, Link V, Kirsanow K, Cassidy LM, Martiniano R, Strobel M, Scheu A, et al. Early farmers from across Europe directly descended from Neolithic Aegeans. Proc Natl Acad Sci U S A. 2016;113:6886-6891

Homburger JR, Moreno-Estrada A, Gignoux CR, Nelson D, Sanchez E, Ortiz-Tello P, Pons-Estel BA, Acevedo-Vasquez E, Miranda P, Langefeld CD, Gravel S, Alarcón-Riquelme ME, Bustamante CD. Genomic Insights into the Ancestry and Demographic History of South America. PLoS Genet. 2015;11:e1005602

Hooshiar Kashani B, Perego UA, Olivieri A, Angerhofer N, Gandini F, Carossa V, Lancioni H, Semino O, Woodward SR, Achilli A, Torroni A. Mitochondrial haplogroup C4c: a rare lineage entering America through the ice-free corridor? Am J Phys Anthropol. 2012;147:35-39

Horio T, Mizuno S, Uchino K, Mizutani M, Hanamura I, Espinoza JL, Onizuka M, Kashiwase K, Morishima Y, Fukuda T, Kodera Y, Doki N, Miyamura K, Mori T, Takami A. The recipient CCR5 variation predicts survival outcomes after bone marrow transplantation. Transpl Immunol. 2017;42:34-39

Hough R, Danby R, Russell N, Marks D, Veys P, Shaw B, Wynn R, Vora A, Mackinnon S, Peggs KS, Crawley C, Craddock C, Pagliuca A, Cook G, Snowden JA, Clark A, Marsh J, Querol S, Parkes G, Braund H, Rocha V. Recommendations for a standard UK approach to incorporating umbilical cord blood into clinical transplantation practice: an update on cord blood unit selection, donor selection algorithms and conditioning protocols. Br J Haematol. 2016;172:360-370

Howell N, Elson JL, Howell C, Turnbull DM. Relative rates of evolution in the coding and control regions of African mtDNAs. Mol Biol Evol. 2007;24:2213-2221

Hudjashov G, Kivisild T, Underhill PA, Endicott P, Sanchez JJ, Lin AA, Shen P, Oefner P, Renfrew C, Villems R, Forster P. Revealing the prehistoric settlement of Australia by Y chromosome and mtDNA analysis. Proc Natl Acad Sci U S A. 2007;104:8726-8730

Hwang WY, Samuel M, Tan D, Koh LP, Lim W, Linn YC. A meta analysis of unrelated donor cord blood transplantation versus unrelated donor bone marrow transplantation in adult and pediatric patients. Biol Blood Marrow Transplant. 2007;13: 444–453

Iborra FJ, Kimura H, Cook PR. The functional organization of mitochondrial genomes in human cells. BMC Biol. 2004;2:9

Imanishi T, Tatsuya A, Kimura A, Tokunaga K, Gojobori T. Allele and haplotype frequencies for the HLA and complement loci in various ethnic groups. In: Tsuji K, Aizawa M, Sasazuki (eds) Proceedings of the Eleventh International Histocompatibility Workshop and Conference 1991. New York: Oxford Science 1992:1065–1127

Ingman M, Kaessmann H, Paabo S, Gyllensten U. Mitochondrial genome variation and the origin of modern humans. Nature. 2000;408:708-713

Irwin JA, Saunier JL, Niederstätter H, Strouss KM, Sturk KA, Diegoli TM, Brandstätter A, Parson W, Parsons TJ. Investigation of heteroplasmy in the human mitochondrial DNA control region: a synthesis of observations from more than 5000 global population samples. J Mol Evol. 2009;68:516-527

Jagasia M, Clark WB, Brown-Gentry KD, Crawford DC, Fan KH, Chen H, Kassim A, Greer JP, Engelhardt BG, Savani BN. Genetic variation in donor CTLA-4 regulatory region is a strong predictor of outcome after allogeneic hematopoietic cell transplantation for hematologic malignancies. Biol Blood Marrow Transplant. 2012;18:1069-1075

Jazin EE, Cavelier L, Eriksson I, Oreland L, Gyllensten U. Human brain contains high levels of heteroplasmy in the noncoding regions of mitochondrial DNA. Proc Natl Acad Sci U S A. 1996; 3:12382–12387

Jenuth JP, Peterson AC, Fu K, Shoubridge EA. Random genetic drift in the female germline explains the rapid segregation of mammalian mitochondrial DNA. Nat Genet. 1996;14:146-151

Jobling MA, Hurles M, Tyler-Smith C. Human Evolutionary Genetics: origins, peoples and disease. London/New York: Garland Science Publishing. 2004; pp 523

Jordan F, McWhinnie AJ, Turner S, Gavira N, Calvert AA, Cleaver SA, Holman RH, Goldman JM, Madrigal JA. Comparison of HLA-DRB1 typing by DNA-RFLP, PCR-SSO and PCR-SSP methods and their application in providing matched unrelated donors for bone marrow transplantation. Tissue Antigens. 1995;45:103-110

Jobling MA, Samara V, Pandya A, Fretwell N, Bernasconi B, Mitchell RJ, Gerelsaikhan T, Dashnyam B, Sajantila A, Salo PJ, Nakahori Y, Disteche CM, Thangaraj K, Singh L, Crawford MH et al. Recurrent duplication and deletion polymorphisms on the long arm of the Y chromosome in normal males. Hum Mol Genet. 1996;5:1767-1775

Jobling MA, Pandya A, Tyler-Smith C. The Y chromosome in forensic analysis and paternity testing. Int J Legal Med. 1997;110:118-124

Jobling MA, Williams GA, Schiebel GA, Pandya GA, McElreavey GA, Salas GA, Rappold GA, Affara NA, Tyler-Smith C. A selective difference between human Y-chromosomal DNA haplotypes. Curr Biol. 1998;8:1391-1394

Jobling MA, Tyler-Smith C. The human Y chromosome: an evolutionary marker comes of age. Nat Rev Genet. 2003;4:598-612

Jota MS, Lacerda DR, Sandoval JR, Vieira PP, Ohasi D, Santos-Júnior JE, Acosta O, Cuellar C, Revollo S, Paz-Y-Miño C, Fujita R, Vallejo GA, Schurr TG, Tarazona-Santos EM, Pena SDj, Ayub Q, Tyler-Smith C, Santos FR; Genographic Consortium. New native South American Y chromosome lineages. J Hum Genet. 2016;61:593-603

Jukes TH. Amino acid codes in mitochondria as possible clues to primitive codes. J Mol Evol. 1981;18:15-17

Kaplan EL, Meier P. Nonparametric Estimation from Incomplete Observations. 1958

Karafet T, Zegura SL, Vuturo-Brady J, Posukh O, Osipova L, Wiebe V, Romero F, Long JC, Harihara S, Jin F, Dashnyam B, Gerelsaikhan T, Omoto K, Hammer MF. Y chromosome markers and Trans-Bering Strait dispersals. Am J Phys Anthropol. 1997;102:301-314

Karafet TM, Zegura SL, Posukh O, Osipova L, Bergen A, Long J, Goldman D, Klitz W, Harihara S, de Knijff P, Wiebe V, Griffiths RC, Templeton AR, Hammer MF. Ancestral Asian source(s) of new world Y-chromosome founder haplotypes. Am J Hum Genet. 1999;64:817-831

Karafet T, Xu L, Du R, Wang W, Feng S, Wells RS, Redd AJ, Zegura SL, Hammer MF. Paternal population history of East Asia: sources, patterns, and microevolutionary processes. Am J Hum Genet. 2001;69: 615-628

Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL, Hammer MF. New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res. 2008;18:830-838

Kayser M, de Knijff P, Dieltjes P, Krawczak M, Nagy M, Zerjal T, Pandya A, Tyler-Smith C, Roewer L. Applications of microsatellite-based Y chromosome haplotyping. Electrophoresis. 1997;18:1602-1607

Kayser M, Kittler R, Erler A, Hedman M, Lee AC, Mohyuddin A, Mehdi SQ, Rosser Z, Stoneking M, Jobling MA, Sajantila A, Tyler-Smith C. A comprehensive survey of human Y-chromosomal microsatellites. Am J Hum Genet. 2004;74:1183-1197

Kayser M, Brauer S, Cordaux R, Casto A, Lao O, Zhivotovsky LA, Moyse-Faurie C, Rutledge RB, Schiefenhoevel W, Gil D, Lin AA, Underhill PA, Oefner PJ, Trent RJ, Stoneking M. Melanesian and Asian origins of Polynesians: mtDNA and Y chromosome gradients across the Pacific. Mol Biol Evol. 2006;23:2234-2244

Keen LJ, DeFor TE, Bidwell JL, Davies SM, Bradley BA, Hows JM. Interleukin-10 and tumor necrosis factor alpha region haplotypes predict transplant-related mortality after unrelated donor stem cell transplantation. Blood. 2004;103:3599-3602

Kemp BM, Malhi RS, McDonough J, Bolnick DA, Eshleman JA, Rickards O, Martinez-Labarga C, Johnson JR, Lorenz JG, Dixon EJ, Fifield TE, Heaton TH, Worl R, Smith DG. Genetic analysis of early Holocene skeletal remains from Alaska and its implications for the settlement of the Americas. Am J Phys Anthropol. 2007;132:605-621

Kemp BM, Schurr TG. Ancient and modern genetic variation in the Americas. In: Auerbach BM (Ed.) Human variation in the Americas. Occasional paper No. 38. Carbondale, IL: Center for Archaeological Investigations, Southern Illinois University. 2010; pp.12-50

Kemp BM, González-Oliver A, Malhi RS, Monroe C, Schroeder KB, McDonough J, Rhett G, Resendéz A, Peñaloza-Espinosa RI, Buentello-Malo L, Gorodesky C, Smith DG. Evaluating the farming/language dispersal hypothesis with genetic variation exhibited by populations in the Southwest and Mesoamerica. Proc Natl Acad Sci U S A. 2010;107:6759-6764

Khrapko K. Two ways to make an mtDNA bottleneck. Nat Genet. 2008;40:134-135

Kimura L, Nunes K, Macedo-Souza LI, Rocha J, Meyer D, Mingroni-Netto RC. Inferring paternal history of rural African-derived Brazilian populations from Y chromosomes. Am J Hum Biol. 2017;29. doi: 10.1002/ajhb.22930

King RJ, Dicristofaro J, Kouvatsi A, Triantaphyllidis C, Scheidel W, Myres NM, Lin AA, Eissautier A, Mitchell M, Binder D, Semino O, Novelletto A, Underhill PA, Chiaroni J. The coming of the Greeks to Provence and Corsica: Y-chromosome models of archaic Greek colonization of the western Mediterranean. BMC Evol Biol 2011;11:69

Kivisild T, Reidla M, Metspalu E, Rosa A, Brehm A, Pennarun E, Parik J, Geberhiwot T, Usanga E, Villems R. Ethiopian mitochondrial DNA heritage: tracking gene flow across and around the gate of tears. Am J Hum Genet. 2004;75:752-770

Kivisild T, Shen P, Wall DP, Do B, Sung R, Davis K, Passarino G, Underhill PA, Scharfe C, Torroni A, Scozzari R, Modiano D, Coppa A, de Knijff P, Feldman M, et al. The role of selection in the evolution of human mitochondrial genomes. Genetics. 2006;172:373-387

Knight RD, Freeland SJ, Landweber LF. Rewiring the keyboard: evolvability of the genetic code. Nat Rev Genet. 2001;2:49-58

Koc EC, Spremulli LL. Identification of mammalian mitochondrial translational initiation factor 3 and examination of its role in initiation complex formation with natural mRNAs. J Biol Chem. 2002;277:35541-35549

Kögler G, Middleton PG, Wilke M, Rocha V, Esendam B, Enczmann J, Wernet P, Gluckman E, Querol S, Lecchi L, Goulmy E, Dickinson AM. Recipient cytokine genotypes for TNF-alpha and IL-10 and the minor histocompatibility antigens HY and CD31 codon 125 are not associated with occurrence or severity of acute GVHD in unrelated cord blood transplantation: a retrospective analysis. Transplantation. 2002;74:1167-75

Kollman C, Howe CW, Anasetti C, Antin JH, Davies SM, Filipovich AH, Hegland J, Kamani N, Kernan NA, King R, Ratanatharathorn V, Weisdorf D, Confer DL. Donor characteristics as risk factors in recipients after transplantation of bone marrow from unrelated donors: the effect of donor age. Blood. 2001; 98:2043-2051

Kong QP, Bandelt HJ, Sun C, Yao YG, Salas A, Achilli A, Wang CY, Zhong L, Zhu CL, Wu SF, Torroni A, Zhang YP. Updating the East Asian mtDNA phylogeny: a prerequisite for the identification of pathogenic mutations. Hum Mol Genet. 2006;15:2076-86

Korbling M, Freireich EJ. Twenty-five years of peripheral blood stem cell transplantation. Blood. 2011;117: 6411-6416

Kornberg A, Baker TA. DNA replication. New York: WH Freeman and Company. 1992

Kumar S, Bellis C, Zlojutro M, Melton PE, Blangero J, Curran JE. Large scale mitochondrial sequencing in Mexican Americans suggests a reappraisal of Native American origins. BMC Evol Biol. 2011;11:293

Kunkel LM, Smith KD, Boyer SH, Borgaonkar DS, Wachtel SS, Miller OJ, Breg WR, Jones HW, Rary JM. Analysis of human Y-chromosome-specific reiterated DNA in chromosome variants. Proc Natl Acad Sci U S A. 1977;74:1245-1249

Karachanak S, Carossa V, Nesheva D, Olivieri A, Pala M, Hooshiar Kashani B, Grugni V, Battaglia V, Achilli A, Yordanov Y, et al. Bulgarians vs the other European populations: a mitochondrial DNA perspective. Int J Legal Med. 2012;126:497-503

Karmin M, Saag L, Vicente M, Wilson Sayres MA, Järve M, Talas UG, Rootsi S, Ilumäe AM, Mägi R, Mitt M, Pagani L, Puurand T, Faltyskova Z, Clemente F, Cardona A, et al. A recent bottleneck of Y chromosome diversity coincides with a global change in culture. Genome Res. 2015;25:459-466.

Lahn BT, Page DC. Functional coherence of the human Y chromosome. Science. 1997;278:675-680

Lahn BT, Page DC. Four evolutionary strata on the human X chromosome. Science. 1999;286: 964-967

Larsen MH, Hviid TV. Human leukocyte antigen-G polymorphism in relation to expression, function, and disease. Hum Immunol. 2009;70:1026-1034

Laughlin MJ, Eapen M, Rubinstein P, Wagner JE, Zhang MJ, Champlin RE, Stevens C, Barker JN, Gale RP, Lazarus HM, Marks DI, van Rood JJ, Scaradavou A, Horowitz MM. Outcomes after transplantation of cord blood or bone marrow from unrelated donors in adults with leukemia. N Engl J Med. 2004;351:2265-2275

Layrisse Z, Guedez Y, Domínguez E, Paz N, Montagnani S, Matos M, Herrera F, Ogando V, Balbas O, Rodríguez-Larralde A. Extended HLA haplotypes in a Carib Amerindian population: the Yucpa of the Perija Range. Hum Immunol. 2001;62:992-1000

Lazaridis I, Patterson N, Mittnik A, Renaud G, Mallick S, Kirsanow K, Sudmant PH, Schraiber JG, Castellano S, Lipson M, Berger B, Economou C, Bollongino R, Fu Q, Bos KI, et al. Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature. 2014;513:409-413

Le Maux A, Noël G, Birebent B, Grosset JM, Vu N, De Guibert S, Bernard M, Semana G, Amiot L. Soluble human leucocyte antigen-G molecules in peripheral blood haematopoietic stem cell transplantation: a specific role to prevent acute graft-versus-host disease and a link with regulatory T cells. Clin Exp Immunol. 2008;152:50-56

Lechler RI, Lombardi G, Batchelor JR, Reinsmoen N, Bach FH. The molecular basis of alloreactivity. Immunol Today 1990;11:83–88

Lee SJ, Klein J, Haagenson M, Baxter-Lowe LA, Confer DL, Eapen M, Fernandez-Vina M, Flomenberg N, Horowitz M, Hurley CK, Noreen H, Oudshoorn M, Petersdorf E, Setterholm M, Spellman S, et al. High-resolution donor-recipient HLA matching contributes to the success of unrelated donor marrow transplantation. Blood. 2007;110:4576-4583

Lee EJ, Merriwether DA. Identification of Whole Mitochondrial Genomes from Venezuela and Implications on Regional Phylogenies in South America. Hum Biol. 2015;87:29-38

Lindo J, Achilli A, Perego UA, Archer D, Valdiosera C, Petzelt B, Mitchell J, Worl R, Dixon EJ, Fifield TE, et al. Ancient individuals from the North American Northwest Coast reveal 10,000 years of regional genetic continuity. Proc Natl Acad Sci U S A. 2017;114:4093-4098

Lippold S, Xu H, Ko A, Li M, Renaud G, Butthof A, Schröder R, Stoneking M. Human paternal and maternal demographic histories: insights from high-resolution Y chromosome and mtDNA sequences. Investig Genet. 2014;5:13

Lehtinen SK, Hance N, El Meziane A, Juhola MK, Juhola KM, Karhu R, Spelbrink JN, Holt IJ, Jacobs HT. Genotypic stability, segregation and selection in heteroplasmic human cell lines containing np 3243 mutant mtDNA. Genetics. 2000;154:363-80

Li M, Schönberg A, Schaefer M, Schroeder R, Nasidze I, Stoneking M. Detecting heteroplasmy from high-throughput sequencing of complete human mitochondrial DNA genomes. Am J Hum Genet. 2010;87:237-249

Lightowlers RN, Chrzanowska-Lightowlers ZM. Exploring our origins—the importance of OriL in mtDNA maintenance and replication. EMBO Rep. 2012;13:1038-1039

Liu H, Prugnolle F, Manica A, Balloux F. A geographically explicit genetic model of worldwide human-settlement history. Am J Hum Genet. 2006;79:230-237

Liu M, Spremulli L. Interaction of mammalian mitochondrial ribosomes with the inner membrane. J Biol Chem. 2000;275:29400-29406

Ljungman P, Bregni M, Brune M, Cornelissen J, de Witte T, Dini G, Einsele H, Gaspar HB, Gratwohl A, Passweg J, Peters C, Rocha V, Saccardi R, Schouten H, Sureda A, et al. European Group for Blood and Marrow Transplantation. Allogeneic and autologous transplantation for haematological diseases, solid tumours and immune disorders: current practice in Europe 2009. Bone Marrow Transplant. 2010;45:219-34

Llamas B, Fehren-Schmitz L, Valverde G, Soubrier J, Mallick S, Rohland N, Nordenfelt S, Valdiosera C, Richards SM, Rohrlach A, Romero MI, Espinoza IF, Cagigao ET, Jiménez LW, Makowski et al. Ancient mitochondrial DNA provides high-resolution time scale of the peopling of the Americas. Sci Adv. 2016;2:e1501385

Luo SM, Ge ZJ, Wang ZW, Jiang ZZ, Wang ZB, Ouyang YC, Hou Y, Schatten H, Sun QY. Unique insights into maternal mitochondrial inheritance in mice. Proc Natl Acad Sci U S A. 2013;110:13038-13043

Macaulay V, Richards M, Sykes B. Mitochondrial DNA recombination-no need to panic. Proc Biol Sci. 1999a;266:2037-2042

Macaulay V, Richards M, Hickey E, Vega E, Cruciani F, Guida V, Scozzari R, Bonné-Tamir B, Sykes B, Torroni A. The emerging tree of West Eurasian mtDNAs: a synthesis of control-region sequences and RFLPs. Am J Hum Genet. 1999b;64:232-249

Macaulay V, Hill C, Achilli A, Rengo C, Clarke D, Meehan W, Blackburn J, Semino O, Scozzari R, Cruciani F, Taha A, Shaari NK, Raja JM, Ismail P, Zainuddin Z, et al. Single, rapid coastal settlement of Asia revealed by analysis of complete mitochondrial genomes. Science. 2005;308:1034-1036

Malaspinas AS, Westaway MC, Muller C, Sousa VC, Lao O, Alves I, Bergström A, Athanasiadis G, Cheng JY, Crawford JE, Heupink TH, Macholdt E, Peischl S, Rasmussen S, Schiffels S, et al. A genomic history of Aboriginal Australia. Nature. 2016;538:207-214

Mallick S, Li H, Lipson M, Mathieson I, Gymrek M, Racimo F, Zhao M, Chennagiri N, Nordenfelt S, Tandon A, Skoglund P, Lazaridis I, Sankararaman S, Fu Q, Rohland N, et al. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature. 2016;538:201-206

Malyarchuk B, Grzybowski T, Derenko M, Perkova M, Vanecek T, Lazur J, Gomolcak P, Tsybovsky I. Mitochondrial DNA phylogeny in Eastern and Western Slavs. Mol Biol Evol. 2008;25:1651-1658

Malyarchuk B, Derenko M, Grzybowski T, Perkova M, Rogalla U, Vanecek T, Tsybovsky I. The peopling of Europe from the mitochondrial haplogroup U5 perspective. PLoS One. 2010;5:e10285

Manning K, Timpson A, Colledge S, Crema E, Edinborough K, Kerig T, Shennan S. The chronology of culture: a comparative assessment of European Neolithic dating approaches. Antiquity. 2014;88:1065–1080

Marchington DR, Macaulay V, Hartshorne GM, Barlow D, Poulton J. Evidence from human oocytes for a genetic bottleneck in an mtDNA disease. Am J Hum Genet. 1998;63:769-775

Margulis L. The origin of plant and animal cells. Am Sci. 1971;59:230-235

Marjanovic D, Fornarino S, Montagna S, Primorac D, Hadziselimovic R, Vidovic S, Pojskic N, Battaglia V, Achilli A, Drobnic K, Andjelinovic S, Torroni A, Santachiara-Benerecetti AS, Semino O. The peopling of modern Bosnia-Herzegovina: Y-chromosome haplogroups in the three main ethnic groups. Ann Hum Genet. 2005;69:757-763

Marsh SGE, Parham P, Barber LD. The HLA FactsBook. London: Academic Press; 2000

Martin PJ, Weisdorf D, Przepiorka D, Hirschfeld S, Farrell A, Rizzo JD, Foley R, Socie G, Carter S, Couriel D, Schultz KR, Flowers ME, Filipovich AH, Saliba R, Vogelsang GB, et al. Design of Clinical Trials Working Group. National Institutes of Health Consensus Development Project on Criteria for Clinical Trials in Chronic Graft-versus-Host Disease: VI. Design of Clinical Trials Working Group report. Biol Blood Marrow Transplant. 2006;12:491-505

Martin W, Muller M. The hydrogen hypothesis for the first eukaryote. Nature. 1998;392:37-41

Martínez H, Rodríguez-Larralde A, Izaguirre MH, De Guerra DC. Admixture estimates for Caracas, Venezuela, based on autosomal, Y-chromosome, and mtDNA markers. Hum Biol. 2007;79:201-213

Martinez-Borra J, Lopez-Larrea C. The emergence of the major histocompatibility complex. Adv Exp Med Biol. 2012;738:277-289

Martínez-Cruzado JC. The history of Amerindian mitochondrial DNA lineages in Puerto Rico. In: Island shores, distant pasts: archaeological and biological approaches to the Pre-Columbian settlement of the Caribbean. Fitzpatrick SM, Ross AH, editors. Gainesville, Florida, U.S.A: University Press of Florida. 2010;54-80

Mathé G, Amiel H, Schwarzenberg L, Cattan A, Schneider M. Adoptive immunotherapy of acute leukemia: experimental and clinical results. Cancer Res. 1965;25:1525-1531

Medawar PB. Some immunological and endocrinological problems raised by the evolution of viviparity in vertebrates. Symp Soc Exp Biol. 1953;7:320–338

Mellars P. Why did modern human populations disperse from Africa ca. 60,000 years ago? A new model. Proc Natl Acad Sci U S A. 2006a;103:9381-9386

Mellars P. A new radiocarbon revolution and the dispersal of modern humans in Eurasia. Nature. 2006b;439:931-935

Mellars P. Going East: new genetic and archaeological perspectives on the modern human colonization of Eurasia. Science. 2006c;313:796-800

Mellars P. Archeology and the dispersal of modern humans in Europe: Deconstructing the “Aurignacian”. Evol Anthropol. 2006d;15:167-182

Mellars P, Gori KC, Carr M, Soares PA, Richards MB. Genetic and archaeological perspectives on the initial modern human colonization of southern Asia. Proc Natl Acad Sci U S A. 2013;110:10699-10704

Mendez FL, Watkins JC, Hammer MF. A haplotype at STAT2 Introgressed from neanderthals and serves as a candidate of positive selection in Papua New Guinea. Am J Hum Genet. 2012;91:265-274

Mendez FL, Krahn T, Schrack B, Krahn AM, Veeramah KR, Woerner AE, Fomine FL, Bradman N, Thomas MG, Karafet TM, Hammer MF. An African American paternal lineage adds an extremely ancient root to the human Y chromosome phylogenetic tree. Am J Hum Genet. 2013;92:454-459

Merryman RW, Armand P. Immune Checkpoint Blockade and Hematopoietic Stem Cell Transplant. Curr Hematol Malig Rep. 2017;12:44-50

Meyer M, Kircher M, Gansauge MT, Li H, Racimo F, Mallick S, Schraiber JG, Jay F, Prüfer K, de Filippo C, Sudmant PH, Alkan C, Fu Q, Do R, Rohland N, et al. A high-coverage genome sequence from an archaic Denisovan individual. Science. 2012;338:222-226

Miller RE, Fayen JD, Mohammad SF, Stein K, Kadereit S, Woods KD, Sramkoski RM, Jacobberger JW, Templeton D, Shurin SB, Laughlin MJ. Reduced CTLA-4 protein and messenger RNA expression in umbilical cord blood T lymphocytes. Exp Hematol 2002;30:738-744

Mishmar D, Ruiz-Pesini E, Golik P, Macaulay V, Clark AG, Hosseini S, Brandon M, Easley K, Chen E, Brown MD, Sukernik RI, Olckers A, Wallace DC. Natural selection shaped regional mtDNA variation in humans. Proc Natl Acad Sci U S A. 2003;100:171-176

Monckton DG, Jeffreys AJ. DNA profiling. Curr Opin Biotechnol. 1993;4:660-664

Manvailer LF, Wowk PF, Mattar SB, da Siva JS, da Graça Bicalho M, Roxo VM. HLA-F polymorphisms in a Euro-Brazilian population from Southern Brazil. Tissue Antigens. 2014; 84:554-559

Moraes ME, Fernandez-Vina M, Salatiel I, Tsai S, Moraes JR, Stastny P. HLA class II DNA typing in two Brazilian populations. Tissue Antigens. 1993;41:238–242

Moreau C, Bhérer C, Vézina H, Jomphe M, Labuda D, Excoffier L. Deep human genealogies reveal a selective advantage to be on an expanding wave front. Science. 2011;334:1148-1150

Morishima Y, Kawase T, Malkki M, Petersdorf EW. Effect of HLA-A2 allele disparity on clinical outcome in hematopoietic cell transplantation from unrelated donors. Tissue Antigens. 2007;69:31–35

Mulero JJ, Chang CW, Calandro LM, Green RL, Li Y, Johnson CL, Hennessy LK. Development and validation of the AmpFlSTR Yfiler PCR amplification kit: a male specific, single amplification 17 Y-STR multiplex system. J Forensic Sci. 2006;51:64-75

Nachman MW, Crowell SL. Estimate of the mutation rate per nucleotide in humans. Genetics. 2000;156:297-304

Nakahori Y, Mitani K, Yamada M, Nakagome Y. A human Y-chromosome specific repeated DNA family (DYZ1) consists of a tandem array of pentanucleotides. Nucleic Acids Res. 1986;14:7569-7580

Ngo KY, Vergnaud G, Johnsson C, Lucotte G, Weissenbach J. A DNA probe detecting multiple haplotypes of the human Y chromosome. Am J Hum Genet. 1986;38:407-418

Nilsson LL, Djurisic S, Hviid TVF. Controlling the immunological crosstalk during conception and pregnancy: HLA-G in reproduction. Front Immunol. 2014;5:198

Ober C, Aldrich CL, Chervoneva I, Billstrand C, Rahimov F, Gray HL, Hyslop T. Variation in the HLA-G promoter region influences miscarriage rates. Am J Hum Genet. 2003;72:1425-1435

Ohno S. Sex chromosome and sex-linked genes. Berlin: Springer, 1967

Olerup O, Zetterquist H. HLA-DRB1*01 subtyping by allele-specific PCR-amplification: A sensitive, specific and rapid technique. Tissue Antigens. 1991;37:197-204

Olerup O, Zetterquist H. HLA-DR typing by PCR amplification with sequence-specific primers (PCR-SSP) in 2 hours: An alternative to serological DR typing in clinical practice including donor-recipient matching in cadaveric transplantations. Tissue Antigens. 1992;39:225-235

Olivieri A, Achilli A, Pala M, Battaglia V, Fornarino S, Al-Zahery N, Scozzari R, Cruciani F, Behar DM, Dugoujon JM, Coudray C, Santachiara-Benerecetti AS, Semino O, Bandelt HJ, Torroni A. The mtDNA legacy of the Levantine early Upper Palaeolithic in Africa. Science. 2006;314:1767-1770

Olivieri A, Pala M, Gandini F, Hooshiar Kashani B, Perego UA, Woodward SR, Grugni V, Battaglia V, Semino O, Achilli A, Richards MB, Torroni A. Mitogenomes from two uncommon haplogroups mark late glacial/postglacial expansions from the Near East and neolithic dispersals within Europe. PLoS One. 2013;8:e70492

Olivieri A, Sidore C, Achilli A, Angius A, Posth C, Furtwängler A, Brandini S, Capodiferro MR, Gandini F, Zoledziewska M, Pitzalis M, Maschio A, Busonero F, Lai L, Skeates R, et al. Mitogenome diversity in Sardinians: a genetic window onto an island's past. Mol Biol Evol. 2017;34:1230-1239

Omrak A, Günther T, Valdiosera C, Svensson EM, Malmström H, Kiesewetter H, Aylward W, Storå J, Jakobsson M, Götherström A. Genomic evidence establishes Anatolia as the source of the European Neolithic gene pool. Curr Biol 2016;26:270-275

Oran B, Cao K, Saliba RM, Rezvani K, de Lima M, Ahmed S, Hosing CM, Popat UR, Carmazzi Y, Kebriaei P, Nieto Y, Rondon G, Willis D, Shah N, Parmar S, et al. Better allele-level matching improves transplant-related mortality after double cord blood transplantation. Haematologica. 2015; 100:1361-1370

O'Rourke DH, Raff JA. The human genetic history of the Americas: the final frontier. Curr Biol. 2010;20:R202-207

Pagani L, Lawson DJ, Jagoda E, Mörseburg A, Eriksson A, Mitt M, Clemente F, Hudjashov G, DeGiorgio M, Saag L, Wall JD, Cardona A, Mägi R, Sayres MA, Kaewert S, et al. Genomic analyses inform on migration events during the peopling of Eurasia. Nature. 2016;538:238-242

Pakendorf B, Stoneking M. Mitochondrial DNA and human evolution. Annu Rev Genomics Hum Genet. 2005;6:165-183

Pakendorf B, Novgorodov IN, Osakovskij VL, Danilova AP, Protod'jakonov AP, Stoneking M. Investigating the effects of prehistoric migrations in Siberia: genetic variation and the origins of Yakuts. Hum Genet. 2006;120:334-353

Pala M, Achilli A, Olivieri A, Hooshiar Kashani B, Perego UA, Sanna D, Metspalu E, Tambets K, Tamm E, Accetturo M, Carossa V, Lancioni H, Panara F, Zimmermann B, Huber G. Mitochondrial haplogroup U5b3: a distant echo of the Epipaleolithic in Italy and the legacy of the early Sardinians. Am J Hum Genet. 2009;84:814-821

Pala M, Olivieri A, Achilli A, Accetturo M, Metspalu E, Reidla M, Tamm E, Karmin M, Reisberg T, Hooshiar Kashani B, Perego UA, Carossa V, Gandini F, Pereira JB, Soares P, et al. Mitochondrial DNA signals of late glacial recolonization of Europe from Near Eastern refugia. Am J Hum Genet. 2012;90:915-924

Palanichamy MG, Sun C, Agrawal S, Bandelt HJ, Kong QP, Khan F, Wang CY, Chaudhuri TK, Palla V, Zhang YP. Phylogeny of mitochondrial DNA macrohaplogroup N in India, based on complete sequencing: implications for the peopling of South Asia. Am J Hum Genet. 2004;75:966-978

Palanichamy MG, Zhang CL, Mitra B, Malyarchuk B, Derenko M, Chaudhuri TK, Zhang YP. Mitochondrial haplogroup N1a phylogeography, with implication to the origin of European farmers. BMC Evol Biol. 2010;10:304

Parenti F, Mercier N and Valladas H. The oldest hearths of Pedra Furada, Brasil: thermoluminescence analysis of heated stones. Current Research in the Pleistocene. 1990; 7:36-37

Parenti F, Fontugne M, Guérin C. Pedra Furada in Brazil and its "presumed" evidence: limitations and potential of the available data. Antiquity. 1996;70:416-421

Park HB, Lee JE, Oh YM, Lee SJ, Eom HS, Choi K. CTLA4-CD28 chimera gene modification of T cells enhances the therapeutic efficacy of donor lymphocyte infusion for hematological malignancy. Exp Mol Med. 2017;49:e360. doi: 10.1038/emm.2017.104

Pascali VL, Dobosz M, Brinkmann B. Coordinating Y-chromosomal STR research for the Courts. Int J Legal Med. 1999;12:1

Pearce KF, Lee SJ, Haagenson M, et al. Analysis of non-HLA genomic risk factors in HLA-matched unrelated donor hematopoietic cell transplantation for chronic myeloid leukemia. Haematologica. 2012;97:1014-1019

Penack O, Holler E, van den Brink MR. Graft-versus-host disease: regulation by microbe-associated molecules and innate immune receptors. Blood. 2010;115:1865-1872

Pennarun E, Kivisild T, Metspalu E, Metspalu M, Reisberg T, Moisan JP, Behar DM, Jones SC, Villems R. Divorcing the Late Upper Palaeolithic demographic histories of mtDNA haplogroups M1 and U6 in Africa. BMC Evol Biol. 2012;12:234

Perego UA, Achilli A, Angerhofer N, Accetturo M, Pala M, Olivieri A, Hooshiar Kashani B, Ritchie KH, Scozzari R, Kong QP, Myres NM, Salas A, Semino O, Bandelt HJ, Woodward SR, Torroni A. Distinctive Paleo-Indian migration routes from Beringia marked by two rare mtDNA haplogroups. Curr Biol. 2009;19:1-8

Perego UA, Angerhofer N, Pala M, Olivieri A, Lancioni H, Hooshiar Kashani B, Carossa V, Ekins JE, Gómez-Carballa A, Huber G, Zimmermann B, Corach D, Babudri N, Panara F, Myres NM, Parson W, Semino O, Salas A, Woodward SR, Achilli A, Torroni A. The initial peopling of the Americas: a growing number of founding mitochondrial genomes from Beringia. Genome Res. 2010;20:1174-1179

Pereira JB, Costa MD, Vieira D, Pala M, Bamford L, Harich N, Cherni L, Alshamali F, Hatina J, Rychkov S, Stefanescu G, King T, Torroni A, Soares P, Pereira L, Richards MB. Reconciling evidence from ancient and contemporary genomes: a major source for the European Neolithic within Mediterranean Europe. Proc Biol Sci. 2017;284(1851)

Pereira L, Richards M, Goios A, Alonso A, Albarrán C, Garcia O, Behar DM, Gölge M, Hatina J, Al-Gazali L, Bradley DG, Macaulay V, Amorim A. High-resolution mtDNA evidence for the late-glacial resettlement of Europe from an Iberian refugium. Genome Res. 2005;15:19-24

Perez-Garcia A, De la Camara R, Roman-Gomez J, et al. CTLA-4 polymorphisms and clinical outcome after allogeneic stem cell transplantation from HLA-identical sibling donors. Blood 2007;110:461-467

Persson G, Melsted WN, Nilsson LL, Hviid TVF. HLA class Ib in pregnancy and pregnancy-related disorders. Immunogenetics. 2017;69:581-595

Petersdorf EW, Anasetti C, Martin PJ, Gooley T, Radich J, Malkki M, Woolfrey A, Smith A, Mickelson E, Hansen JA. Limits of HLA mismatching in unrelated hematopoietic cell transplantation. Blood 2004;104:2976–2980

Petersdorf EW. Risk assessment in haematopoietic stem cell transplantation: histocompatibility. Best Pract Res Clin Haematol. 2007;20:155-170

Petz LD, Redei I, Bryson Y, Regan D, Kurtzberg J, Shpall E, Gutman J, Querol S, Clark P, Tonai R, Santos S, Bravo A, Spellman S, Gragert L, Rossi J, et al. Hematopoietic cell transplantation with cord blood for cure of HIV infections. Biol Blood Marrow Transplant. 2013;19:393-397

Piccioli P, Balbi G, Serra M, et al. CTLA-4 +49A>G polymorphism of recipients of HLA-matched sibling allogeneic stem cell transplantation is associated with survival and relapse incidence. Ann Hematol. 2010;89:613-618

Pidala J, Kim J, Schell M, Lee SJ, Hillgruber R, Nye V, Ayala E, Alsina M, Betts B, Bookout R, Fernandez HF, Field T, Locke FL, Nishihori T, Ochoa JL, et al. Race/ethnicity affects the probability of finding an HLA-A, -B, -C and -DRB1 allele-matched unrelated donor and likelihood of subsequent transplant utilization. Bone Marrow Transplant. 2013;48:346-350

Pinhasi R, Thomas MG, Hofreiter M, Currat M, Burger J. The genetic history of Europeans. Trends Genet. 2012;28:496-505

Posth C, Renaud G, Mittnik A, Drucker DG, Rougier H, Cupillard C, Valentin F, Thevenet C, Furtwängler A, Wißing C, Francken M, Malina M, Bolus M, Lari M, Gigli E, et al. Pleistocene mitochondrial genomes suggest a single major dispersal of non-Africans and a Late Glacial population turnover in Europe. Curr Biol. 2016;26:827-833

Prugnolle F, Manica A, Charpentier M, Guegan JF, Guernier V, Balloux F. Pathogen driven selection and worldwide HLA class I diversity. Curr Biol 2005a;15:1022–1027

Prugnolle F, Manica A, Balloux F. Geography predicts neutral genetic diversity of human populations. Curr Biol. 2005b;15:R159-60

Pyle A, Hudson G, Wilson IJ, Coxhead J, Smertenko T, Herbert M, Santibanez-Koref M, Chinnery PF. Extreme-depth re-sequencing of mitochondrial DNA finds no evidence of paternal transmission in humans. PLoS Genet. 2015;11:e1005040

Quintana-Murci L, Semino O, Bandelt HJ, Passarino G, McElreavey K, Santachiara-Benerecetti AS. Genetic evidence of an early exit of Homo sapiens sapiens from Africa through eastern Africa. Nat Genet. 1999;23:437-441

Rademaker K, Hodgins G, Moore K, Zarrillo S, Miller C, Bromley GRM, Leach P, Reid DA, Yépez Álvarez W, Sandweiss DH. Paleoindian settlement of the high-altitude Peruvian Andes. Science. 2014;346:466-469

Raff JA, Tackney J, O'Rourke DH. South from Alaska: a pilot aDNA study of genetic history on the Alaska Peninsula and the eastern Aleutians. Hum Biol. 2010;82:677-693

Raff JA, Bolnick DA. Palaeogenomics: genetic roots of the first Americans. Nature. 2014;506:162-163

Raghavan M, DeGiorgio M, Albrechtsen A, Moltke I, Skoglund P, Korneliussen TS, Grønnow B, Appelt M, Gulløv HC, Friesen TM, Fitzhugh W, Malmström H, Rasmussen S, Olsen J, Melchior L, et al. The genetic prehistory of the New World Arctic. Science. 2014;345:1255832

Raghavan M, Steinrücken M, Harris K, Schiffels S, Rasmussen S, DeGiorgio M, Albrechtsen A, Valdiosera C, Ávila-Arcos MC, Malaspinas AS, Eriksson A, Moltke I, Metspalu M, et al. POPULATION GENETICS. Genomic evidence for the Pleistocene and recent population history of Native Americans. Science. 2015;349:aab3884

Rantanen A, Jansson M, Oldfors A, Larsson NG. Downregulation of Tfam and mtDNA copy number during mammalian spermatogenesis. Mamm Genome. 2001;12:787-792

Rasmussen M, Sikora M, Albrechtsen A, Korneliussen TS, Moreno-Mayar JV, Poznik GD, Zollikofer CP, Ponce de León MS, Allentoft ME, Moltke I, Jónsson H, Valdiosera C, Malhi RS, Orlando L, Bustamante CD, et al. The ancestry and affiliations of Kennewick Man. Nature. 2015;523:455-458

Redd AJ, Agellon AB, Kearney VA, Contreras VA, Karafet T, Park H, De Knijff P, Butler JM, Hammer MF. Forensic value of 14 novel STRs on the human Y chromosome. Forensic Sci Int. 2002;130:97-111

Regueiro M, Cadenas AM, Gayden T, Underhill PA, Herrera RJ. Iran: tricontinental nexus for Y-chromosome driven migration. Hum Hered. 2006;61:132-143

Reich D, Green RE, Kircher M, Krause J, Patterson N, Durand EY, Viola B, Briggs AW, Stenzel U, Johnson PL, Maricic T, Good JM, Marques-Bonet T, Alkan C, Fu Q, et al. Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature. 2010;468:1053-1060

Reich D, Patterson N, Campbell D, Tandon A, Mazieres S, Ray N, Parra MV, Rojas W, Duque C, Mesa N, García LF, Triana O, Blair S, Maestre A, Dib JC, et al. Reconstructing Native American population history. Nature. 2012;488:370-374

Reidla M, Kivisild T, Metspalu E, Kaldma K, Tambets K, Tolk HV, Parik J, Loogväli EL, Derenko M, Malyarchuk B, Bermisheva M, Zhadanov S, Pennarun E, Gubina M, Golubenko M, et al. Origin and diffusion of mtDNA haplogroup X. Am J Hum Genet. 2003;73:1178-1190

Richards MB, Macaulay VA, Bandelt HJ, Sykes BC. Phylogeography of mitochondrial DNA in western Europe. Ann Hum Genet. 1998;62:241-260

Richards M, Macaulay V. The mitochondrial gene tree comes of age. Am J Hum Genet. 2001;68:1315-1320

Richards MB, Bandelt H-J, Kivisild T, Oppenheimer S. A Model for the dispersal of modern humans out of Africa. In: Bandelt H-J, Macaulay V, Richards M, editors. Human Mitochondrial DNA and the Evolution of Homo sapiens: Springer Berlin Heidelberg. 2006

Richards MB, Soares P, Torroni A. Palaeogenomics: mitogenomes and migrations in Europe's past. Curr Biol. 2016;26:R243-6

Rocha V, Wagner JE Jr, Sobocinski KA et al. Graft-versus-host-disease in children who have received a cord-blood or bone marrow transplant from an HLA-identical sibling. Eurocord and International Bone Marrow Transplant Registry Working Committee on Alternative Donor and Stem Cell Sources. N Engl J Med 2000;342:1846–54

Rocha V, Franco RF, Porcher R, Bittencourt H, Silva WA Jr, Latouche A, Devergie A, Esperou H, Ribaud P, Socie G, Zago MA, Gluckman E. Host defense and inflammatory gene polymorphisms are associated with outcomes after HLA-identical sibling bone marrow transplantation. Blood 2002;100:3908-3918

Rocha V, Labopin M, Sanz G et al. Transplants of umbilical-cord blood or bone marrow from unrelated donors in adults with acute leukemia. N Engl J Med 2004;351:2276–2285

Rocha V, Gluckman E. Eurocord-Netcord registry and European Blood and Marrow Transplant group. Improving outcomes of cord blood transplantation: HLA matching, cell dose and other graft- and transplantation-related factors. Br J Haematol. 2009;147:262-274

Rocha V, Spellman S, Zhang MJ, Ruggeri A, Purtill D, Brady C, Baxter-Lowe LA, Baudoux E, Bergamaschi P, Chow R, Freed B, Koegler G, Kurtzberg J, Larghero J, Lecchi L, et al.; Eurocord-European Blood and Marrow Transplant Group and the Center for International Blood and Marrow Transplant Research. Effect of HLA-matching recipients to donor noninherited maternal antigens on outcomes after mismatched umbilical cord blood transplantation for hematologic malignancy. Biol Blood Marrow Transplant. 2012;18:1890-1896

Rocha V, Ruggeri A, Spellman S, Wang T, Sobecks R, Locatelli F, Askar M, Michel G, Arcese W, Iori AP, Purtill D, Danby R, Sanz GF, Gluckman E, Eapen M; Eurocord, Cord Blood Committee Cellular Therapy Immunobiology Working Party of the European Group for Blood and Marrow Transplantation, Netcord, and the Center for International Blood and Marrow Transplant Research. Killer Cell Immunoglobulin-Like Receptor-Ligand Matching and Outcomes after Unrelated Cord Blood Transplantation in Acute Myeloid Leukemia. Biol Blood Marrow Transplant. 2016;22:1284-1289

Rootsi S, Myres NM, Lin AA, Järve M, King RJ, Kutuev I, Cabrera VM, Khusnutdinova EK, Varendi K, Sahakyan H, Behar DM, Khusainova R, Balanovsky O, Balanovska E, Rudan P, et al. Distinguishing the co-ancestries of haplogroup G Y-chromosomes in the populations of Europe and the Caucasus. Eur J Hum Genet. 2012;20:1275-82

Rubinstein P, Carrier C, Scaradavou A, Kurtzberg J, Adamson J, Migliaccio AR, Berkowitz RL, Cabbad M, Dobrila NL, Taylor PE, Rosenfield RE, Stevens CE. Outcomes among 562 recipients of placental-blood transplants from unrelated donors. N Engl J Med. 1998;339:1565–1577

Rubinstein P. Cord blood banking for clinical transplantation. Bone Marrow Transplant. 2009;44:635–642

Ruggeri A, Paviglianiti A, Gluckman E, Rocha V. Impact of HLA in cord blood transplantation outcomes. HLA 2016;87:413-421

Sanchez-Mazas A, Fernandez-Viña M, Middleton D, Hollenbach JA, Buhler S, Di D, Rajalingam R, Dugoujon JM, Mack SJ, Thorsby E. Immunogenetics as a tool in anthropological studies. Immunology. 2011;133:143–164

Santangelo R, González-Andrade F, Børsting C, Torroni A, Pereira V, Morling N. Analysis of Ancestry Informative Markers in three main ethnic groups from Ecuador supports a trihybrid origin of Ecuadorians. Forensic Sci Int Genet. 2017;31:29-33

Santos C, Montiel R, Arruda A, Alvarez L, Aluja MP, Lima M. Mutation patterns of mtDNA: empirical inferences for the coding region. BMC Evol Biol. 2008;8:167

Santos FR, Pena SD, Tyler-Smith C. PCR haplotypes for the human Y chromosome based on alphoid satellite DNA variants and heteroduplex analysis. Gene. 1995;165:191-198

Sato M, Sato K. Maternal inheritance of mitochondrial DNA by diverse mechanisms to eliminate paternal mitochondrial DNA. Biochim Biophys Acta. 2013;1833:1979-1984

Schon EA, Bonilla E, DiMauro S. Mitochondrial DNA mutations and pathogenesis. J Bioenerg Biomembr. 1997;29:131-149

Schroeder KB, Schurr TG, Long JC, Rosenberg NA, Crawford MH, Tarskaia LA, Osipova LP, Zhadanov SI, Smith DG. A private allele ubiquitous in the Americas. Biol Lett. 2007;3:218-223

Schurr TG, Ballinger SW, Gan YY, Hodge JA, Merriwether DA, Lawrence DN, Knowler WC, Weiss KM, Wallace DC. Amerindian mitochondrial DNAs have rare Asian mutations at high frequencies, suggesting they derived from four primary maternal lineages. Am J Hum Genet. 1990;46:613-623

Schurr TG, Sherry ST. Mitochondrial DNA and Y chromosome diversity and the peopling of the Americas: evolutionary and demographic evidence. Am J Hum Biol. 2004;16:420-39

Schwartz M, Vissing J. Paternal inheritance of mitochondrial DNA. N Engl J Med. 2002;347:576-580

Seielstad MT, Hebert JM, Lin AA, Underhill PA, Ibrahim M, Vollrath D, Cavalli-Sforza LL. Construction of human Y-chromosomal haplotypes using a new polymorphic A to G transition. Hum Mol Genet. 1994;3:2159-2161

Sengsayadeth S, Wang T, Lee SJ, Haagenson MD, Spellman S, Fernandez Viña MA, Muller CR, Verneris MR, Savani BN, Jagasia M. Cytotoxic T-lymphocyte antigen-4 single nucleotide polymorphisms are not associated with outcomes after unrelated donor transplantation: a center for international blood and marrow transplant research analysis. Biol Blood Marrow Transplant. 2014;20:900-903

Sengupta S, Zhivotovsky LA, King R, Mehdi SQ, Edmonds CA, Chow CE, Lin AA, Mitra M, Sil SK, Ramesh A, Usha Rani MV, Thakur CM, Cavalli-Sforza LL, Majumder PP, Underhill PA. Polarity and temporality of high-resolution Y-chromosome distributions in India identify both indigenous and exogenous expansions and reveal minor genetic influence of Central Asian pastoralists. Am J Hum Genet. 2006;78:202-221

Shaw BE, Arguello R, Garcia-Sepulveda CA, Madrigal JA. The impact of HLA genotyping on survival following unrelated donor haematopoietic stem cell transplantation. Br J Haematol 2010;150:251–258

Shigenaga MK, Hagen TM, Ames BN. Oxidative damage and mitochondrial decay in aging. Proc Natl Acad Sci U S A. 1994;91:10771-10778

Single RM, Martin MP, Gao X, Meyer D, Yeager M, Kidd JR, Kidd KK, Carrington M. Global diversity and evidence for coevolution of KIR and HLA. Nat Genet. 2007;39:1114-1119

Skaletsky H, Kuroda-Kawaguchi T, Minx PJ, Cordum HS, Hillier L, Brown LG, Repping S, Pyntikova T, Ali J, Bieri T, Chinwalla A, Delehaunty A, Delehaunty K, Du H, Fewell G, et al. The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. Nature. 2003;423:825-837

Skoglund P, Mallick S, Bortolini MC, Chennagiri N, Hünemeier T, Petzl-Erler ML, Salzano FM, Patterson N, Reich D. Genetic evidence for two founding populations of the Americas. Nature. 2015; 525:104-108

Soares P, Ermini L, Thomson N, Mormina M, Rito T, Röhl A, Salas A, Oppenheimer S, Macaulay V, Richards MB. Correcting for purifying selection: an improved human mitochondrial molecular clock. Am J Hum Genet. 2009; 84:740-759

Soares P, Achilli A, Semino O, Davies W, Macaulay V, Bandelt HJ, Torroni A, Richards MB. The archaeogenetics of Europe. Curr Biol. 2010; 20:R174-83

Soares P, Alshamali F, Pereira JB, Fernandes V, Silva NM, Afonso C, Costa MD, Musilová E, Macaulay V, Richards MB, Cerny V, Pereira L. The expansion of mtDNA haplogroup L3 within and out of Africa. Mol Biol Evol. 2012;29:915-927

Sobenin IA, Chistiakov DA, Bobryshev YV, Postnov AY, Orekhov AN. Mitochondrial mutations in atherosclerosis: new solutions in research and possible clinical applications. Curr Pharm Des. 2013;19:5942-5953

Solloch UV, Lang K, Lange V, Böhme I, Schmidt AH, Sauter J. Frequencies of gene variant CCR5-Δ32 in 87 countries based on next-generation sequencing of 1.3 million individuals sampled from 3 national DKMS donor centers. Hum Immunol. 2017 [Epub ahead of print]

Sosa MX, Sivakumar IK, Maragh S, Veeramachaneni V, Hariharan R, Parulekar M, Fredrikson KM, Harkins TT, Lin J, Feldman AB, Tata P, Ehret GB, Chakravarti A. Next-generation sequencing of human mitochondrial reference genomes uncovers high heteroplasmy frequency. PLoS Comput Biol. 2012;8:e1002737

St John J, Sakkas D, Dimitriadi K, Barnes A, Maclin V, Ramey J, Barratt C, De Jonge C. Failure of elimination of paternal mitochondrial DNA in abnormal embryos. Lancet. 2000;355:200

Stark GL, Dickinson AM, Jackson GH, Taylor PR, Proctor SJ, Middleton PG. Tumour necrosis factor receptor type II 196M/R genotype correlates with circulating soluble receptor levels in normal subjects and with graft-versus-host disease after sibling allogeneic bone marrow transplantation. Transplantation. 2003;76:1742-1749

Stewart JB, Chinnery PF. The dynamics of mitochondrial DNA heteroplasmy: implications for human health and disease. Nat Rev Genet. 2015;16:530-542

Stringer C, Andrews P. In reply: modern human origins. Science. 1988;241:773-774

Sun C, Kong QP, Palanichamy MG, Agrawal S, Bandelt HJ, Yao YG, Khan F, Zhu CL, Chaudhuri TK, Zhang YP. The dazzling array of basal branches in the mtDNA macrohaplogroup M from India as inferred from complete genomes. Mol Biol Evol. 2006;23:683-690

Sutovsky P, Schatten G. Paternal contributions to the mammalian zygote: fertilization after sperm-egg fusion. Int Rev Cytol. 2000;195:1-65

Sutovsky P. Ubiquitin‐dependent proteolysis in mammalian spermatogenesis, fertilization, and sperm quality control: Killing three birds with one stone. Microsc Res Tech. 2003;61:88-102

Taboada-Echalar P, Alvarez-Iglesias V, Heinz T, Vidal-Bralo L, Gómez-Carballa A, Catelli L, Pardo-Seco J, Pastoriza A, Carracedo A, Torres-Balanza A, et al. The genetic legacy of the pre-colonial period in contemporary Bolivians. PLoS One. 2013;8:e58980

Tackney JC, Potter BA, Raff J, Powers M, Watkins WS, Warner D, Reuther JD, Irish JD, O'Rourke DH. Two contemporaneous mitogenomes from terminal Pleistocene burials in eastern Beringia. Proc Natl Acad Sci U S A. 2015;112:13833-13838

Takami A. Role of non-HLA gene polymorphisms in graft-versus-host disease. Int J Hematol. 2013;98:309-18. doi: 10.1007/s12185-013-1416-7. Epub 2013 Aug 15

Tambets K, Rootsi S, Kivisild T, Help H, Serk P, Loogväli EL, Tolk HV, Reidla M, Metspalu E, Pliss L, Balanovsky O, Pshenichnov A, Balanovska E, Gubina M, Zhadanov S, et al. The western and eastern roots of the Saami--the story of genetic "outliers" told by mitochondrial DNA and Y chromosomes. Am J Hum Genet. 2004;74:661-682

Tamm E, Kivisild T, Reidla M, Metspalu M, Smith DG, Mulligan CJ, Bravi CM, Rickards O, Martinez-Labarga C, Khusnutdinova EK, Fedorova SA, Golubenko MV, Stepanov VA, Gubina MA, Zhadanov SI, et al. Beringian standstill and spread of Native American founders. PLoS One. 2007;2:e829

Tao M, You CP, Zhao RR, Liu SJ, Zhang ZH, Zhang C, Liu Y. Animal mitochondria: evolution, function, and disease. Curr Mol Med. 2014;14:115-124

Taylor RW, Turnbull DM. Mitochondrial DNA mutations in human disease. Nat Rev Genet. 2005;6:389-402

Testi M., Andreani M. Luminex-Based Methods in High-Resolution HLA Typing. Methods Mol. Biol. 2015; 1310:231-245

Thomas ED, Lochte HL, Lu WC, Ferrebee JW. Intravenous infusion of bone marrow in patients receiving radiation and chemotherapy. New Engl J Med. 1957;257:491–496

Thomson R, Pritchard JK, Shen P, Oefner PJ, Feldman MW. Recent common ancestry of human Y chromosomes: evidence from DNA sequence data. Proc Natl Acad Sci U S A 2000;97:7360-7365

Timmermann A, Friedrich T. Late Pleistocene climate drivers of early human migration. Nature. 2016;538:92-95

Tito RY, Polo SI and Lewis CM. 2012. Direct Submission

Torroni A, Schurr TG, Yang CC, Szathmary EJ, Williams RC, Schanfield MS, Troup GA, Knowler WC, Lawrence DN, Weiss KM. Native American mitochondrial DNA analysis indicates that the Amerind and the Nadene populations were founded by two independent migrations. Genetics. 1992;130:153-162

Torroni A, Schurr TG, Cabell MF, Brown MD, Neel JV, Larsen M, Smith DG, Vullo CM, Wallace DC. Asian affinities and continental radiation of the four founding Native American mtDNAs. Am J Hum Genet. 1993;53:563-590

Torroni A, Lott MT, Cabell MF, Chen YS, Lavergne L, Wallace DC. mtDNA and the origin of Caucasians: identification of ancient Caucasian-specific haplogroups, one of which is prone to a recurrent somatic duplication in the D-loop region. Am J Hum Genet. 1994a;55:760-776

Torroni A, Miller JA, Moore LG, Zamudio S, Zhuang J, Droma T, Wallace DC. Mitochondrial DNA analysis in Tibet: implications for the origin of the Tibetan population and its adaptation to high altitude. Am J Phys Anthropol. 1994b;93:189-199

Torroni A, Huoponen K, Francalacci P, Petrozzi M, Morelli L, Scozzari R, Obinu D, Savontaus ML, Wallace DC. Classification of European mtDNAs from an analysis of three European populations. Genetics. 1996;144:1835-1850

Torroni A, Bandelt HJ, D'Urbano L, Lahermo P, Moral P, Sellitto D, Rengo C, Forster P, Savontaus ML, Bonné-Tamir B, Scozzari R. mtDNA analysis reveals a major late Paleolithic population expansion from southwestern to northeastern Europe. Am J Hum Genet. 1998;62:1137-1152

Torroni A, Bandelt HJ, Macaulay V, Richards M, Cruciani F, Rengo C, Martinez-Cabrera V, Villems R, Kivisild T, Metspalu E, Parik J, Tolk HV, Tambets K, Forster P, Karger B, Francalacci P, et al. A signal, from human mtDNA, of postglacial recolonization in Europe. Am J Hum Genet. 2001;69:844-852

Torroni A, Achilli A, Macaulay V, Richards M, Bandelt HJ. Harvesting the fruit of the human mtDNA tree. Trends Genet. 2006;22:339-345

Trachtenberg EA, Erlich HA, Rickards O, De Stefano F, Klitz W. HLA class II linkage disequilibrium and haplotype evolution in the Cayapa Indians of Ecuador. Am J Hum Genet. 1995;57:415–424

Trowsdale J. MHC, disease and selection. Immunol Lett 2011;137:1-8

Tsuneto LT, Probst CM, Hutz MH, Salzano FM, Rodriguez-Delfin LA, Zago MA, Hill K, Hurtado AM, Ribeiro-dos-Santos AK, Petzl-Erler ML. HLA class II diversity in seven Amerindian populations. Clues about the origins of the Aché. Tissue Antigens. 2003;62:512-526

Tucci S, Akey JM. Population genetics: A map of human wanderlust. Nature. 2016;538:179-180

Tureck LV, Santos LC, Wowk PF, Mattar SB, Silva JS, Magalhães JC, Roxo VM, Bicalho MG. HLA-G 5' URR SNPs and 3' UTR 14-bp insertion/deletion polymorphism in an Afro-Brazilian population from Paraná State. Int J Immunogenet. 2014;41:29-33

Ueda H, Howson JM, Esposito L, et al. Association of the T-cell regulatory gene CTLA4 with susceptibility to autoimmune disease. Nature. 2003;423:506-511

Underhill PA, Jin L, Zemans R, Oefner PJ, Cavalli-Sforza LL. A pre-Columbian Y chromosome-specific transition and its implications for human evolutionary history. Proc Natl Acad Sci U S A. 1996;93:196-200

Underhill PA, Jin L, Lin AA, Mehdi SQ, Jenkins T, Vollrath D, Davis RW, Cavalli-Sforza LL, Oefner PJ. Detection of numerous Y chromosome biallelic polymorphisms by denaturing high-performance liquid chromatography. Genome Res. 1997;7:996-1005

Underhill PA, Shen P, Lin AA, Jin L, Passarino G, Yang WH, Kauffman E, Bonné-Tamir B, Bertranpetit J, Francalacci P, Ibrahim M, Jenkins T, Kidd JR, Mehdi SQ, Seielstad MT, et al. Y chromosome sequence variation and the history of human populations. Nat Genet. 2000;26:358-361

Underhill PA, Passarino G, Lin AA, Shen P, Mirazón Lahr M, Foley RA, Oefner PJ, Cavalli-Sforza LL. The phylogeography of Y chromosome binary haplotypes and the origins of modern human populations. Ann Hum Genet. 2001;65:43-62

Underhill PA, Kivisild T. Use of Y chromosome and mitochondrial DNA population structure in tracing human migrations. Annu Rev Genet. 2007;41:539-564

Ustun C, Bachanova V, Shanley R, MacMillan ML, Majhail NS, Arora M, Brunstein C, Wagner JE, Weisdorf DJ. Importance of donor ethnicity/race matching in unrelated adult and cord blood allogeneic hematopoietic cell transplant. Leuk Lymphoma. 2014;55:358-364

van Oven M, Kayser M. Updated comprehensive phylogenetic tree of global human mitochondrial DNA variation. Hum Mutat. 2009;30:E386-394

van Rood JJ. The detection of transplantation antigens in leukocytes. Semin Hematol. 1968;2:187–214

Vannucchi AM, Guidi S, Guglielmelli P, et al. Significance of CTLA-4 and CD14 genetic polymorphisms in clinical outcome after allogeneic stem cell transplantation. Bone marrow transplantation. 2007;40:1001-1002

Veit TD, Chies JA. Tolerance versus immune response -- microRNAs as important elements in the regulation of the HLA-G gene expression. Transpl Immunol. 2009;20:229-231

Veerappa AM, Padakannaya P, Ramachandra NB. Copy number variation-based polymorphism in a new pseudoautosomal region 3 (PAR3) of a human X-chromosome-transposed region (XTR) in the Y chromosome. Funct Integr Genomics. 2013;13:285-293

Vilar MG, Melendez C, Sanders AB, Walia A, Gaieski JB, Owings AC and Schurr TG and Genographic Consortium. Genetic diversity in Puerto Rico and its implications for the peopling of the Island and the West Indies. Am J Phys Anthropol. 2014;3:352-368

Villanea FA, Bolnick DA, Monroe C, Worl R, Cambra R, Leventhal A, Kemp BM. Brief communication: Evolution of a specific O allele (O1vG542A) supports unique ancestry of Native Americans. Am J Phys Anthropol. 2013;151:649-57

Wallace DC. Mitochondrial DNA sequence variation in human evolution and disease. Proc Natl Acad Sci U S A. 1994;91:8739-8746

Wallace DC. Mitochondrial DNA in aging and disease. Sci Am. 1997;277:40-47

Wallace DC, Brown MD, Melov S, Graham B, Lott M. Mitochondrial biology, degenerative diseases and aging. Biofactors. 1998;7:187-90

Wallace DC, Chalkia D. Mitochondrial DNA genetics and the heteroplasmy conundrum in evolution and disease. Cold Spring Harb Perspect Biol. 2013;5:a021220

Wang S, Lewis CM, Jakobsson M, Ramachandran S, Ray N, Bedoya G, Rojas W, Parra MV, Molina JA, Gallo C, Mazzotti G, Poletti G, Hill K, Hurtado AM, et al. Genetic variation and population structure in Native Americans. PLoS Genet. 2007;3:e185

Welsh K., Bunce M. Molecular typing for the MHC with PCR-SSP. Rev. Immunogenet, 1999; 1:157-176

Welte K, Foeken L, Gluckman E, Navarrete C. International exchange of cord blood units: The registry aspects. Bone Marrow Transplant 2010;45:825–831

Whitfield LS, Sulston JE, Goodfellow PN. Sequence variation of the human Y chromosome. Nature. 1995;378:379-380

Willemze R, Rodrigues CA, Labopin M, Sanz G, Michel G, Socié G, Rio B, Sirvent A, Renaud M, Madero L, Mohty M, Ferra C, Garnier F, Loiseau P, Garcia J, et al. Eurocord-Netcord and Acute Leukaemia Working Party of the EBMT. KIR-ligand incompatibility in the graft-versus-host direction improves outcomes after umbilical cord blood transplantation for acute leukemia. Leukemia. 2009;23:492-500

Williams RC, Steinberg AG, Gershowitz H, Bennett PH, Knowler WC, Pettitt DJ, Butler W, Baird R, Dowda-Rea L, Burch TA, Morse HG, Smith CG. GM allotypes in Native Americans: evidence for three distinct migrations across the Bering land bridge. Am J Phys Anthropol. 1985;66:1-19

Woolfrey A, Klein JP, Haagenson M, Spellman S, Petersdorf E, Oudshoorn M, Gajewski J, Hale GA, Horan J, Battiwalla M, Marino SR, Setterholm M, Ringden O, Hurley C, Flomenberg N, et al. HLA-C antigen mismatch is associated with worse outcome in unrelated donor peripheral blood stem cell transplantation. Biol Blood Marrow Transplant. 2011;17:885-92

YCC A nomenclature system for the tree of human Y-chromosomal binary haplogroups. Genome Res. 2002;12: 339-348

Zegura SL, Karafet TM, Zhivotovsky LA, Hammer MF. High-resolution SNPs and microsatellite haplotypes point to a single, recent entry of Native American Y chromosomes into the Americas. Mol Biol Evol. 2004;21:164-175

Zhang S, Fernandez-Viña M, Falco M, Cerná M, Raimondi E, Stastny P. A novel HLA-DRB1 allele (DRB1*0417) in South American Indians. Immunogenetics. 1993;38:463

Zhong H, Shi H, Qi XB, Xiao CJ, Jin L, Ma RZ, Su B. Global distribution of Y-chromosome haplogroup C reveals the prehistoric migration routes of African exodus and early settlement in East Asia. J Hum Genet. 2010;55:428-435

Zhou Q, Li H, Li H, Nakagawa A, Lin JL, Lee ES, Harry BL, Skeen-Gaar RR, Suehiro Y, William D, Mitani S, Yuan HS, Kang BH, Xue D. Mitochondrial endonuclease G mediates breakdown of paternal mitochondria upon fertilization. Science. 2016;353:394-399

Zino E, Frumento G, Marktel S et al. A T-cell epitope encoded by a subset of HLA-DPB1 alleles determines nonpermissive mismatches for hematologic stem cell transplantation. Blood. 2004; 103:1417–1424

List of original manuscripts

Guarene M, Badulli C, Cremaschi AL, Sbarsi I, Cacciatore R, Tinelli C, Pasi A, Bergamaschi P, and Perotti CG Luminex® xMAP® technology is an effective strategy for high definition HLA typing of cord blood units prior to listing. [submitted]

Brandini S, Bergamaschi P, Cerna MF, Gandini F, Bastaroli F, Bertolini E, Cereda C, Ferretti L, Gómez-Carballa A, Battaglia V, Salas A, Semino O, Achilli A, Olivieri A, and Torroni A The Paleo-Indian entry into South America according to mitogenomes. Mol Biol Evol. [in press]

Cunha R, Zago MA, Querol S, Volt F, Ruggeri A, Sanz G, Pouthier F, Koegler G, Vicario JL, Bergamaschi P, Saccardi R, Lamas CH, Heredia C, Michel G, Bittencourt H, Tavella M, Panepucci RA, Fernandes F, Pavan J, Gluckman E, Rocha V - An analysis on behalf of Eurocord, Cord Blood Committee Cellular Therapy - Immunobiology Working Party of EBMT, Netcord and Faculdade de Medicina de Ribeirão Preto - Faculdade de Medicina de São Paulo, Universidade de São Paulo (2017) Impact of CTLA4 genotype and other immune response gene polymorphisms on outcomes after single umbilical cord blood transplantation. Blood 129:525-532.

From www.bloodjournal.org by guest on September 25, 2017. For personal use only. Regular Article

TRANSPLANTATION Impact of CTLA4 genotype and other immune response gene polymorphisms on outcomes after single umbilical cord blood transplantation

Renato Cunha,1-4 Marco A. Zago,4 Sergio Querol,5 Fernanda Volt,1-3 Annalisa Ruggeri,1-3,6 Guillermo Sanz,7 Fabienne Pouthier,8 Gesine Kogler,9 José L. Vicario,10 Paola Bergamaschi,11-13 Riccardo Saccardi,14 Carmen H. Lamas,15 Cristina Díaz-de-Heredia,16 Gerard Michel,17 Henrique Bittencourt,18 Marli Tavella,4 Rodrigo A. Panepucci,4 Francisco Fernandes,19 Julia Pavan,19 Eliane Gluckman,1-3 and Vanderson Rocha,1-3,20,21 on behalf of Eurocord, Cord Blood Committee Cellular Therapy–Immunobiology Working Party of the European Society for Blood and Marrow Transplantation, Netcord and Faculdade de Medicina de Ribeirão Preto–Faculdade de Medicina de São Paulo, Universidade de São Paulo

1Eurocord, Hôpital Saint Louis, Assistance Publique–Hôpitaux de Paris (AP-HP), Paris, France; 2University Paris-Diderot, Paris, France; 3Monacord, Centre Scientifique de Monaco, Monaco; 4Clinical Hospital, Ribeirão Preto School of Medicine, São Paulo University, Ribeirão Preto, Brazil; 5Barcelona Cord Blood Bank, Barcelona, Spain; 6Service d’Hématologie et Thérapie Cellulaire, Hôpital Saint Antoine, AP-HP, Paris, France; 7La Fe University Hospital, Valencia, Spain; 8Besançon Cord Blood Bank, Besançon, France; 9José Carreras Cord Blood Bank, Düsseldorf, Germany; 10Madrid Cord Blood Bank, Madrid, Spain; 11Pavia Cord Blood Bank, Pavia, Italy; 12Service of Immunohematology and Transfusion Medicine, Policlinico San Matteo, Fondazione Istituto di Ricovero e Cura a Carattere Scientifico, Pavia, Italy; 13Department of Biology and Biotechnology, University of Pavia, Pavia, Italy; 14Firenze Cord Blood Bank, Firenze, Italy; 15Malaga Cord Blood Bank, Malaga, Spain; 16Vall d’Hebron Hospital, Barcelona, Spain; 17Timone Children Hospital, Marseille, France; 18Sainte-Justine University Health Center, Montreal, Canada; 19Mathematics Institute, São Paulo University, São Paulo, Brazil; 20Oxford University, Oxford, United Kingdom; and 21Serviço de Hematologia, Hemoterapia e Terapia Celular, São Paulo University, São Paulo, Brazil

Key Points

• Gene polymorphism of the immune response as CTLA4 was shown to impact CBT outcomes according to CBU genotype.
• CTLA4-CBU genotype might be considered for CBU selection when >1 CBU meeting the current suggested selection criteria is available.

We evaluated the impact of recipient and cord blood unit (CBU) genetic polymorphisms related to immune response on outcomes after unrelated cord blood transplantations (CBTs). Pretransplant DNA samples from 696 CBUs with malignant diseases were genotyped for NLRP1, NLRP2, NLRP3, TIRAP/Mal, IL10, REL, TNFRSF1B, and CTLA4. HLA compatibility was 6 of 6 in 10%, 5 of 6 in 39%, and ≤4 of 6 in 51% of transplants. Myeloablative conditioning was used in 80%, and in vivo T-cell depletion in 81%, of cases. The median number of total nucleated cells infused was 3.4 × 10⁷/kg. In multivariable analysis, patients receiving CBUs with GG-CTLA4 genotype had poorer neutrophil recovery (hazard ratio [HR], 1.33; P = .02), increased nonrelapse mortality (NRM) (HR, 1.50; P < .01), and inferior disease-free survival (HR, 1.41; P = .02). We performed the same analysis in a more homogeneous subset of cohort 1 (cohort 2, n = 305) of patients who received transplants for acute leukemia, all given a myeloablative conditioning regimen, and with available allele HLA typing (HLA-A, -B, -C, and -DRB1). In this more homogeneous but smaller cohort, we were able to demonstrate that GG-CTLA4-CBU was associated with increased NRM (HR, 1.85; P = .01). Use of GG-CTLA4-CBU was associated with higher mortality after CBT, which may be a useful criterion for CBU selection, when multiple CBUs are available. (Blood. 2017;129(4):525-532)

Introduction

Cord blood grafts from an unrelated donor have been frequently used in the absence of a HLA-matched related or unrelated hematopoietic stem cell (HSC) transplant (HSCT) donor. To date, >35 000 unrelated cord blood transplantations (CBTs) have been performed worldwide and >630 000 cord blood units (CBUs) are available for transplantation.1 CBT outcomes depend on characteristics of CBUs (mainly cell dose and number of HLA disparities) and other factors related to recipients, underlying disease, and transplantation technique (such as conditioning regimen and graft-versus-host disease [GVHD] prophylaxis).2-4 Identification of factors to improve outcomes is of paramount importance, and may impact the current criteria for CBU selection. Recently, better HLA matching based on allele typing has been shown to improve outcomes after CBT.5 Also, it has been described that non-HLA factors of CBUs, such as matching of killer immunoglobulin-like receptor ligand or noninherited maternal antigens, may also affect outcomes after CBT and could be used as criteria for CBU selection.6,7

Submitted 14 June 2016; accepted 17 October 2016. Prepublished online as Blood First Edition paper, 3 November 2016; DOI 10.1182/blood-2016-06-722249.

The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.

The online version of this article contains a data supplement. © 2017 by The American Society of Hematology

BLOOD, 26 JANUARY 2017 • VOLUME 129, NUMBER 4

With the development of human genomics, several studies have attempted to correlate genetic polymorphisms of the innate and adaptive immune response to hematopoietic cell transplantation (HCT) outcomes.8-16 However, in the CBT setting, only 1 study including a small and heterogeneous group of 115 CBT recipients has been described. In that study, pairs of genotyped samples (recipients and donors) for tumor necrosis factor α and interleukin 10, as well as for minor histocompatibility antigen, HY, histocompatibility antigen 1, and CD31 codon 125 were evaluated, and no significant association between polymorphisms and transplant outcomes was observed.17

Therefore, we conducted a retrospective registry-based cohort study with the aim of studying the influence of genetic polymorphisms of CBU on transplant outcomes, namely myeloid engraftment, acute and chronic GVHD, nonrelapse mortality (NRM), relapse, and survival for patients with hematological malignancies. We selected 8 candidate genes (NLRP1, NLRP2, NLRP3, TIRAP/Mal, IL10, REL, TNFRSF1B, and CTLA4), which are genes encoding main cytokines and other immune proteins involved in the development of innate and adaptive immunological responses that have been associated with outcomes after HCT in previous studies. TIRAP/Mal mutation has been associated with NRM, whereas IL10 and TNFRSF1B have been associated with GVHD and survival. The NALP genes (NLRP1, NLRP2, NLRP3) have been associated with fungal infection and, as CTLA4, have also been associated with relapse and mortality.18-35 The association of CTLA4 with GVHD has also been previously observed. REL, a subunit of NF-κB, has been implicated with the inflammatory response in autoimmune disease and septic shock outcomes but, to our knowledge, our study is the first to analyze REL in the transplant setting.36,37

Patients and methods

Study design

This is a retrospective cohort study performed in collaboration with Eurocord, Cellular Therapy and Immunobiology Working Party of the European Society for Blood and Marrow Transplantation (EBMT-CTIWP), NetCord and Ribeirão Preto School of Medicine of São Paulo University (FMRP-USP).

Inclusion criteria were: (1) availability of CBU samples used in the CBT; (2) recipients of unrelated single CBT with malignant diseases; (3) transplants performed by EBMT centers; and (4) availability of clinical data at EBMT-Eurocord databases. Exclusion criteria were: (1) recipients of 2 or more CBUs; (2) recipients of CBU combined with another source of HSCs; (3) CBT with intrabone infusion; and/or (4) CBT with expanded in vitro CBU or any other form of experimental manipulations.

From January 1994 to December 2010, CBU samples were available for all 696 CBTs, and recipient samples were available for only 143 of them. Samples were shipped in dry ice to the Laboratory of Hematology (Ribeirão Preto School of Medicine, São Paulo University) according to international and Brazilian regulations for shipment of biological material. Informed consent procedures for performing genetic studies in CBUs and patients followed the ethical committee rules of each cord blood bank (CBB). Informed consents for CBU collection and CBT were obtained in accordance with Declaration of Helsinki. This study was approved by the local ethical committee (Comité de Protection des Personnes Ile-de-France IV) located in Saint-Louis Hospital, Paris, France.

Conditioning protocols, GVHD prophylaxis, selection of CBU, use of granulocyte-colony-stimulating factor, reactivation of cytomegalovirus (CMV) surveillance, and use of antimicrobial agents followed guidelines and rules of each transplant center.

Genetic polymorphism

Biological samples were shipped as extracted DNA, umbilical cord blood sample, or cord fragment tissue. Genomic DNA was extracted with the FlexiGene DNA kit (Qiagen) for CBU samples and the Wizard DNA Purification kit (Promega) for umbilical fragment tissue and stored in 1.5-mL Eppendorf tubes at a temperature of −20°C. DNA samples were genotyped by real-time polymerase chain reaction (PCR) assay for the following candidate genes related to immune response: NLRP1 (rs-5862), NLRP2 (rs-043684), NLRP3 (rs-10754558), TIRAP/Mal (rs-8177374), IL10 (rs-1800872), REL (rs-13031237), TNFRSF1B (rs-1061622), and CTLA-4 (rs-3087243). A complete list of all single-nucleotide polymorphisms (SNPs) studied and probes used for genotyping samples are described in supplemental Table 1 (available on the Blood Web site). Real-time PCR was performed by allelic discrimination method in the 7300 Real-Time PCR system, using TaqMan SNP genotyping assays and TaqMan genotyping Master Mix reagent (Applied Biosystems). Graphical interpretation of results was processed and supplied by ABI 7500 System SDS software (Applied Biosystems).

Statistical analysis

The primary objective of the study, NRM, was defined as death not related to recurrence of primary disease. Secondary outcomes were defined as follows: (1) overall survival (OS): time interval between transplantation and death due to any cause; (2) disease-free survival (DFS): time of life without relapse of the primary disease; (3) acute and chronic GVHD: diagnosis and grading were assigned by the transplant center using standard criteria38,39; (4) relapse of disease: event characterized by recurrence of the primary disease; (5) neutrophil and platelet engraftment: neutrophil count >0.5 × 10⁹/L for 3 days and platelets >20 × 10⁹/L for 3 days with no prior transfusion for at least 7 days.40,41

Disease status was classified according to criteria from the Center for International Blood and Marrow Transplant Research.42 Myeloablative conditioning was defined as a regimen containing either total body irradiation with a dose of >6 Gy, a dose of oral busulfan of >8 mg/kg, a dose of IV busulfan of >6.4 mg/kg, or a dose of treosulfan of at least 12 g/m².

Recipients and CBU were typed for HLA-A and -B at the antigenic level and for -DRB1 at the allelic level. A subset analysis using a homogenous group of patients with acute leukemia, given a myeloablative conditioning regimen and with available allele typing of HLA-A, -B, -C, and -DRB1, was performed (this subset of cohort 1 was defined as cohort 2). This cohort of patients was part of a previous study on the impact of HLA high-resolution typing on CBT outcomes, which has included a total of 1568 single CBTs.5

Preliminary analyses of Hardy-Weinberg equilibrium and minimum allele frequency of genotype distribution were performed. The χ² test was used for measuring difference between groups.43,44 Independent risk factors for DFS and OS were assessed by univariate and multivariate survival analyses with log-rank and Cox proportional hazards tests, respectively.45 Prognostic factors for neutrophil and platelet engraftment, acute and chronic GVHD, NRM were analyzed in a competitive risk scenario using Fine and Gray proportional hazards models, death being a competitive event.46 Multivariate models were constructed using variables that reached a P < .20 in univariate analysis and other variables with clinical relevance. Estimated type I error was set at 0.05. Statistical analyses were performed with SPSS 18.0 (SPSS Inc), Splus2000 (MathSoft), and The R Project for Statistical Computing (http://www.r-project.org).47,48

Results

Recipients, donors, and transplant characteristics

Patients, disease, donor (CBU), and transplant characteristics of the 3 cohorts of CBT recipients are listed in Table 1. The first cohort includes 696 patients who met the eligibility criteria of the study. The second cohort is a subset group of patients that includes 305 CBT recipients for whom HLA high-resolution typing of HLA-A, -B, -C, and -DRB1 of patients and CBUs were available. This cohort was used to confirm the results found in cohort 1. Supplemental Table 2 lists the 7 CBBs that provided the CBU samples for genotyping.

Of the first cohort, 56% of patients were male (n = 380), 54% children (n = 375), 61% had a positive CMV serology (n = 393), and 34%
received a CBT with a major ABO incompatibility (n = 190). Ten percent of CBT were HLA-identical (6 of 6) (n = 68), 39% were transplanted with 1 HLA disparity (5 of 6) (n = 266), 51% were transplanted with 2 or more HLA disparities (4 of 6 or 3 of 6) (n = 346). At time of transplantation, recipients had early disease status in 33% (n = 225), intermediate disease in 38% (n = 260), and advanced disease in 27% (n = 205) of the cases. GVHD prophylaxis was cyclosporine-based in 91% of patients (n = 597). Myeloablative conditioning was used in 80% (n = 556) and antithymocyte immunoglobulin (ATG) or monoclonal antibody in 81% (n = 538) of the cases. Infused median total nucleated cells (TNCs) were 3.4 × 10⁷/kg (range, 0.6-30) and median CD34⁺ cells were 1.5 × 10⁵/kg (range, 0.2-39). Median follow-up was 49 (range, 0.8-195) months. Table 1 describes recipients, donors, and CBT characteristics of cohort 1 according to CTLA4-CBU genotyping.

Table 1. Recipients, donors, and CBT characteristics with available CBU samples and according to CTLA4-CBU genotyping (n = 696)

Recipients                                          AA-CBU          AG-CBU          GG-CBU
                                                    genotyping,     genotyping,     genotyping,
                                                    n = 162         n = 349         n = 185
Male sex, n                                         79              200             110
Age, median (range), y                              17 (0.4-64)     16 (0.3-67)     16 (0.6-69)
Weight, median (range), kg                          49 (5.5-95)     50 (6-112)      52 (6-119)
Children ≤18 y, n                                   85              188             102
Positive CMV serology, n                            56              125             76
Major ABO incompatibility, n                        31              99              60
Diagnostic, n
  ALL                                               63              135             76
  AML                                               53              123             54
  MDS                                               18              32              22
  CML                                               9               16              9
  CLL                                               3               2               0
  Lymphoma                                          9               29              17
  Myeloma                                           0               5               2
  Histiocytosis                                     6               7               5
  Others                                            1               0               0
HLA compatibility, n
  6/6                                               19              35              14
  5/6                                               59              136             71
  4/6                                               75              155             88
  3/6                                               3               13              6
  2/6                                               2               1               3
Disease status at time of CBT, n
  Early                                             56              119             50
  Intermediate                                      58              129             73
  Advanced                                          162             346             182
Conditioning, n
  Myeloablative                                     130             283             143
  Nonmyeloablative                                  32              62              38
GVHD prophylaxis, n
  CsA ± others                                      143             295             158
  MTX ± others                                      10              10              10
  Others                                            2               7               4
Use of ATG and/or monoclonal antibody, n
  Yes                                               121             281             136
  No                                                34              52              38
Infused TNC dose, median (range), ×10⁷/kg           3.5 (0.9-30)    3.3 (0.6-22)    3.4 (0.7-21)
Infused CD34⁺ cell dose, median (range), ×10⁵/kg    1.5 (0.2-20)    1.5 (0.2-39)    1.4 (0.3-18)
Follow-up (range), mo                               40 (4-181)      55 (1-160)      48 (1-195)

ALL, acute lymphoblastic leukemia; AML, acute myeloid leukemia; CLL, chronic lymphocytic leukemia; CML, chronic myeloid leukemia; CsA, cyclosporin A; MDS, myelodysplastic syndrome; MTX, methotrexate.

In the second cohort, all patients had acute leukemia and received a myeloablative conditioning regimen, 66% were children (n = 191), and 91% received ATG (n = 269). Only 5% of patients were transplanted with an 8 of 8 CBU graft (n = 15), whereas 13% were 7 of 8 (n = 40), 25% were 6 of 8 (n = 76), 32% were 5 of 8 (n = 98), 19% were 4 of 8 (n = 59), and 6% were 3 of 8 (n = 17). Table 2 describes recipients, donors, and CBT characteristics of cohort 2 according to CTLA4-CBU genotyping.

Genetic polymorphism

Results of CBU and recipient gene polymorphisms and their prevalence are listed in supplemental Tables 3 and 4. Alleles or genotypes with <1% of frequency were not observed and frequencies were similar among groups. In addition, group allelic frequencies were in Hardy-Weinberg equilibrium.

Outcomes

Neutrophil and platelet engraftment. Cumulative incidence of neutrophil recovery was 82% (95% confidence interval [CI], 79%-85%) at day 60 post-CBT. According to CBU genotypes, univariate analysis showed an association of CTLA4-CBU and neutrophil recovery. In fact, at day 60, cumulative incidence of neutrophil recovery for patients transplanted with an AA-CTLA4-CBU was 85% (95% CI, 79%-90%), whereas it was 84% (95% CI, 80%-87%) for AG, and 77% (95% CI, 70%-83%) for GG genotypes, respectively. Multivariate analysis confirmed a delayed neutrophil recovery for recipients of CBU with GG-CTLA4 genotype (hazard ratio [HR], 1.33; 95% CI, 1.04-1.70; P = .02). Other factors independently associated with neutrophil recovery were: age >18 years (HR, 1.58; 95% CI, 1.28-1.96; P < .01), advanced status of disease at time of CBT (HR, 1.27; 95% CI, 1.03-1.54; P = .02), and median infused TNC (>median; HR, 0.71; 95% CI, 0.66-0.93; P = .01). Details of multivariate analysis for neutrophil engraftment are shown in supplemental Tables 5 and 6.

At day 180, cumulative incidence of platelet recovery was 57% (95% CI, 53%-61%). In the multivariate model, platelet recovery was independently associated with: age >18 years (HR, 1.38; 95% CI, 1.06-1.79; P = .02), advanced disease at time of CBT (HR, 1.32; 95% CI, 1.02-1.69; P = .03), and recipients with negative CMV serology status (HR, 0.77; 95% CI, 0.62-0.94; P = .01). Details of multivariate analysis for platelet engraftment are shown in supplemental Tables 7 and 8.

Acute and chronic GVHD. Cumulative incidence of grade II-IV acute GVHD was 31% (95% CI, 28%-35%) at day 100. No statistical association was found between any SNP analyzed and the incidence of acute GVHD. In multivariate analysis, variables associated with acute GVHD were: age >18 years (HR, 1.69; 95% CI, 1.51-1.92; P < .01), myeloablative conditioning (HR, 1.77; 95% CI, 1.57-2.01; P < .01), use of ATG or monoclonal antibody (HR, 0.57; 95% CI, 0.50-0.64; P < .01), and advanced disease at CBT (HR, 1.39; 95% CI, 1.23-1.57; P = .03). At 4 years, cumulative incidence of chronic GVHD was 16% (95% CI, 13%-19%). No statistical association between the candidate genes and chronic GVHD was observed. In multivariate analysis, only advanced disease at time of CBT was associated with a higher incidence of chronic GVHD (HR, 1.74; 95% CI, 1.54-1.96; P = .03). Details of multivariate analysis for acute or chronic GVHD are shown in supplemental Tables 9-12.

NRM, OS, and causes of death. Cumulative incidence of NRM was 37% (95% CI, 33%-41%) at 4 years. Univariate analysis according to CBU genotype showed higher NRM for recipients of GG-CTLA4-CBU genotype (AA, 35% [95% CI, 29%-43%]; AG, 32% [95% CI, 27%-37%]; GG, 47% [95% CI, 40%-54%]) (Figure 1A). Multivariate analysis confirmed these results demonstrating higher NRM for recipients of GG-CBU genotype (HR, 1.50; 95% CI, 1.11-2.03;
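P < .01). Other factors independently associated with NRM were: previous HCT (HR, 1.73; 95% CI, 1.20-2.49; P < .01), recipients with negative CMV serology status (HR, 0.63; 95% CI, 0.47-0.85; P < .01), use of ATG or monoclonal antibody (HR, 1.69; 95% CI, 1.07-2.67; P = .03), and advanced disease at time of CBT (HR, 1.72; 95% CI, 1.30-2.27; P < .01). In the analysis for cohort 2, recipients of CBU GG CTLA4 genotype had increased NRM (HR, 1.85; 95% CI, 1.14-3.00; P = .01).

Table 2. Recipients, donors, and CBT characteristics with available CBU samples, HLA-HR data, and according to CTLA4-CBU genotyping (n = 305)

Recipients                                          AA-CBU          AG-CBU          GG-CBU
                                                    genotyping,     genotyping,     genotyping,
                                                    n = 73          n = 153         n = 79
Male sex, n                                         32              93              51
Age, median (range), y                              13 (0.7-56)     13 (0.8-60)     11 (0.7-48)
Weight, median (range), kg                          39 (7-93)       50 (7-112)      36 (6-94)
Children ≤18 y, n                                   45              91              55
Positive CMV serology, n                            43              91              46
Major ABO incompatibility, n                        13              49              29
Diagnostic, n
  ALL                                               36              82              42
  AML                                               28              57              24
  MDS                                               9               14              13
HLA compatibility, n
  8/8                                               5               8               2
  7/8                                               9               17              14
  6/8                                               18              40              18
  5/8                                               21              45              32
  4/8                                               13              34              12
  3/8                                               7               9               1
Disease status at time of CBT, n
  Early                                             32              66              27
  Intermediate                                      28              56              30
  Advanced                                          13              31              22
GVHD prophylaxis, n
  CsA ± others                                      68              137             71
  MTX ± others                                      4               9               3
  Others                                            0               2               3
Use of ATG and/or monoclonal antibody, n
  Yes                                               8               9               9
  No                                                62              139             68
Infused TNC dose, median (range), ×10⁷/kg           3.7 (1-17)      3.4 (1-19)      3.9 (1-20)
Infused CD34⁺ cell dose, median (range), ×10⁵/kg    1.8 (0.3-18)    1.7 (0.2-9)     1.6 (0.3-17)
Follow-up (range), mo                               59 (4-125)      60 (3-145)      40 (6-86)

This cohort is a subset of recipients with available CBU (n = 696). HR-HLA, high-resolution HLA. Other abbreviations are explained in Table 1.

1.01-1.58; P = .04). In the analysis of cohort 2, recipients of GG-CTLA4-CBU genotype had inferior OS, but with borderline statistical significance (HR, 1.54; 95% CI, 1.03-2.41; P = .06). At 4 years, OS was 53% (95% CI, 41%-64%) for AA, 45% (95% CI, 37%-53%) for AG, and 33% (95% CI, 23%-46%) for GG. Details of multivariate analysis for OS are shown in supplemental Tables 15 and 16.

Fifty-nine percent of recipients died (n = 412). Sixty-two percent of deaths were related to transplant complications (n = 255), 36% to relapse or disease progression (n = 149), whereas 2% were of unknown causes (n = 8). Deaths related to transplant complications (n = 255) were due to: infections, 44% (n = 113); GVHD, 23% (n = 59); multiple organ failure, 6% (n = 16); pulmonary complications, 6% (n = 13); graft failure, 5% (n = 12); bleeding disorders, 5% (n = 12); hepatic veno-occlusive disease, 4% (n = 11); and other or unknown causes, 7% (n = 19). Associations between SNPs and causes of death were studied by correspondence analysis. Recipients of CBU with GG-CTLA4 genotype died mainly of transplant complications, in particular, GVHD and infectious complications (Table 3).

Relapse. Cumulative incidence of relapse was 28% (95% CI, 24%-32%) at 4 years. In univariate analysis, CTLA4 genotype was associated with higher incidence of relapse. It was 22% (95% CI, 17%-28%) for AA, 33% (95% CI, 29%-37%) for AG, and 25% (95% CI, 20%-31%) for GG. However, multivariate analysis did not confirm this result. Other variables independently associated with relapse were: use of ATG or monoclonal antibody (HR, 1.75; 95% CI, 1.15-2.70; P < .01) and advanced disease at time of CBT (HR, 1.83; 95% CI, 1.32-2.56; P < .01). In cohort 2, CTLA4-CBU genotype was not associated with relapse. Details of multivariate analysis for NRM are shown in supplemental Tables 17 and 18.

Disease-free survival. At 4 years, DFS was 35% (95% CI, 31%-39%). According to CBU genotype, univariate analysis demonstrated an impact for CTLA4 on DFS (AA, 43% [95% CI, 35%-51%]; AG, 35% [95% CI, 29%-41%]; and GG, 29% [95% CI, 24%-35%]). Multivariate analysis confirmed these results showing inferior DFS of recipients receiving GG genotype CBU (HR, 1.41; 95% CI, 1.06-1.88; P = .02). Other variables associated with DFS were: recipients with negative CMV serology status (HR, 0.74; 95% CI, 0.60-0.92; P = .01), intermediate disease at time of CBT (HR, 1.46; 95% CI, 1.13-1.89; P < .01), and advanced disease at time of CBT (HR, 2.17; 95% CI, 1.67-2.82; P < .01). In cohort 2, recipients of GG-CTLA4-CBU genotype had inferior DFS, but it was not statistically significant (HR, 1.49; 95% CI, 0.96-2.30; P = .07). Details of multivariate analysis for NRM are shown in supplemental Tables 19 and 20.

Table 4 shows a summary of the multivariate analysis results according to CBU genotype.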
Details of multivariate analysis for NRM are shown in Others 0 2 3 supplemental Tables 17 and 18. Use of ATG and /or Disease-free survival. At 4 years, DFS was 35% (95% CI, 31%- monoclonal antibody, n 39%). According to CBU genotype, univariate analysis demonstrated Yes 8 9 9 an impact for CTLA4 on DFS (AA, 43% [95% CI, 35%-51%]; AG, No 62 139 68 35% [95% CI, 29%-41%]; and GG, 29% [95% CI, 24%-35%]). Infused TNC dose, 3.7 (1-17) 3.4 (1-19) 3.9 (1-20) Multivariate analysis confirmed these results showing inferior DFS of median (range), 3107/kg 1 recipients receiving GG genotype CBU (HR, 1.41; 95% CI, 1.06-1.88; Infused CD34 cell dose, 1.8 (0.3-18) 1.7 (0.2-9) 1.6 (0.3-17) P 5 median (range), 3105/kg .02). Other variables associated with DFS were: recipients with P 5 Follow-up (range), mo 59 (4-125) 60 (3-145) 40 (6-86) negative CMV serology status (HR, 0.74; 95% CI, 0.60-0.92; .01), intermediate disease at time of CBT (HR, 1.46; 95% CI, 1.13-1.89; This cohort is a subset of recipients with available CBU (n 5 696). P , .01), and advanced disease at time of CBT (HR, 2.17; 95% CI, HR-HLA, high-resolution HLA. Other abbreviations are explained in Table 1. 1.67-2.82; P , .01). In cohort 2, recipients of GG-CTLA4-CBU genotype had inferior DFS, but it was not statically significant (HR, P , .01). Other factors independently associated with NRM were: 1.49; 95% CI, 0.96-2.30; P 5 .07). Details of multivariate analysis for previous HCT (HR, 1.73; 95% CI, 1.20-2.49; P , .01), recipients with NRMareshowninsupplementalTables19and20. negative CMV serology status (HR, 0.63; 95% CI, 0.47-0.85; P , Table 4 shows a summary of the multivariate analysis results .01), use of ATG or monoclonal antibody (HR, 1.69; 95% CI, 1.07- according to CBU genotype. 2.67; P 5 .03), and advanced disease at time of CBT (HR, 1.72; 95% CI, 1.30-2.27; P , .01). In the analysis for cohort 2, recipients of CBU GG CTLA4 genotype had increased NRM (HR, 1.85; 95% CI 1.14- 3.00; P 5 .01). 
Four-year cumulative incidence of NRM was 27% Discussion (95% CI, 18%-38%) for AA, 28% (95% CI, 21%-36%) for AG, and 42% (95% CI, 31%-54%) for GG (Figure 1B). Details of multivariate CBT is a valuable option for patients with malignant diseases lacking an analysis for NRM are shown in supplemental Tables 13 and 14. HLA-identical donor. Since the first CBT in 1988, considerable Estimated OS at 4 years was 40% (95% CI, 36%-44%). Univariate progress has been made in this field.49 As in allogeneic HSCT (allo- analysis by CBU genotype showed inferior OS for recipients of GG- HSCT), many variables, mostly related to CBU, recipients, disease, CTLA4-CBU genotype (AA, 47% [95% CI, 39%-55%]; AG, 41% or transplantation characteristics, influence the occurrence of CBT [95% CI, 35%-47%]; GG, 33% [95% CI, 26%-42%]), and multivariate complications, such as transplantation toxicities, GVHD, infections, analysis confirmed these results (HR, 1.41; 95% CI, 1.04-1.90; and relapse. In addition, CBT outcomes are comparable to other HSC P 5 .02). Other factors independently associated with OS were: sources when CBT is performed with ,2 of 6 HLA mismatches and negative CMV serology status (HR, 0.69; 95% CI, 0.55-0.87; P , .01), adequate cell dose.50-54 However, the influence of non-HLA genetic intermediate or advanced disease at time of CBT (HR, 1.50; 95% factors, such as genetic polymorphism of immune response, on CBT CI, 1.13-1.94; P , .01 and HR, 2.20; 95% CI, 1.67-2.89; P , .01, outcomes has not been well investigated. The purpose of the current respectively), and .2 HLA incompatibility (HR, 1.26; 95% CI, study was to investigate the association between CBU and recipient From www.bloodjournal.org by guest on September 25, 2017. For personal use only.


[Figure 1, panels A and B: cumulative incidence of non-relapse mortality (y-axis, 0.0-1.0) versus time in months (x-axis, 0-60). Panel A: GG-CBU (n = 185), 47 ± 4%; AA-CBU (n = 162), 35 ± 4%; AG-CBU (n = 349), 32 ± 3%. Panel B: GG-CBU (n = 79), 42 ± 6%; AG-CBU (n = 153), 28 ± 4%; AA-CBU (n = 73), 27 ± 5%.]

Figure 1. NRM according to CBU genotype. (A) CTLA4 for recipients with available CBU samples (n = 696). (B) CTLA4 for recipients with available CBU samples and HLA high-resolution typing (n = 305).

genotypes with CBT outcomes. With this aim, CBUs (n = 696) were genotyped for 8 SNPs related to immunological response. Before establishing any association between genotypes and phenotypes, preliminary analysis ruled out genotyping errors by minimum allelic frequency test (data not shown).

In this analysis, polymorphisms of innate immune response were not associated with CBT outcomes. CTLA4 was the only candidate gene of adaptive immune response evaluated in this study. Interestingly, recipients of CBU carrying GG-CTLA4 genotype had lower neutrophil recovery, increased NRM, and decreased DFS. In addition, recipients of CBU carrying the AA-CTLA4 genotype showed a lower incidence of relapse. Furthermore, increased NRM was confirmed in recipients of CBU with GG-CTLA4 with acute leukemia, receiving myeloablative conditioning, and with available high-resolution HLA typing.

CTLA4 and its genetic variations have been largely investigated in the field of allo-HSCT, especially regarding GVHD and survival. However, because of contradictory results, the real impact of CTLA4 on outcomes is difficult to evaluate.28,29,31-33 The largest study available, with a homogeneous cohort, yielded negative results. Nevertheless, there is no report in the literature describing the role of CTLA4 in CBT.13

The impact of CTLA4 presented in our study is biologically plausible. It has been suggested that the G allele of CT60 produces lower messenger RNA of soluble CTLA4 (sCTLA4) whose expression is the functional basis for the observed association between autoimmune diseases and CTLA4.55 Others have analyzed messenger RNA expression of sCTLA4 in 60 healthy blood donors and have identified allele A as being responsible for greater production of sCTLA4. This study also reported that allo-HSCT recipients of progenitor cells from AA genotype donors had a higher number of alloimmune reactions, as indicated by the higher incidence of acute GVHD. Furthermore, the G allele conferred increased risk of disease relapse.32 Moreover, it has been suggested that sCTLA4 inhibits the B7-flCTLA4 complex, with consequent influence on T-cell alloreactivity.32,55 These previous findings, taken together, are in agreement with the results observed in our study.

Table 3. Cause of deaths of recipients according to CTLA4-CBU genotyping
Recipients with available CBU samples, N = 696

Variable                   AA, n = 162   AG, n = 349   GG, n = 185
GVHD                       13            24            22
Infections                 22            55            36
Pulmonary complications    3             5             5
Bleeding disorders         2             6             4
Multiple organ failure     6             4             6
Graft failure              3             3             6
VOD                        4             4             3
Others                     3             14            2
Total                      56            115           84

VOD, hepatic veno-occlusive disease.

In the current study, CBU with GG genotype was associated with decreased neutrophil engraftment, increased NRM, and decreased survival rates. Higher cell dose and fewer HLA disparities of the CBU graft with the recipient are known factors enhancing engraftment after CBT.1 The number of hematopoietic progenitor cells and lymphocytes in CBUs is, usually, 1 log less than in other HSC grafts, and both of these types of cells have an important role in engraftment after allo-HSCT. Therefore, we could speculate that lymphocytes carrying the GG-CTLA4 genotype of CBU have lower alloreactivity, which could, in turn, explain the impact of CTLA4 on engraftment. Analyzing CBT immune tolerance, Miller et al compared CTLA4 gene expression of CBU and HSCs of adult donors and demonstrated reduced expression of CTLA4 in CBU T cells, but these results were not reproduced in accordance with CTLA4 allele genotype.56 Another interesting finding in our study was that recipients of CBU AA genotype experienced lower relapse. Again, this result suggests superior T-cell response for AA-CBU genotype.

Although our findings are based on solid research and are biologically likely, there are some limitations to our study. First, this has a retrospective registry-based nature, with multiple transplant centers and CBBs. Second, the effect of recipient CTLA4 genotype could not be studied as recipient samples were not available. Finally, the study population is heterogeneous, with patients with different diseases receiving different types of conditioning regimen and transplanted over a wide period. To overcome the limitations, the final multivariate models were adjusted for these variables; however, caution is warranted in drawing conclusions when multiple tests are performed. Therefore, with the aim of circumventing these limitations, we performed the same analysis in a


Table 4. Summary of multivariate analysis results according to CBU genotype

Outcome                   CTLA4 genotype   HR                95% CI       P

Recipients with available CBU samples, n = 696
Neutrophil engraftment    AA               Reference group                .08
                          AG               0.85              0.69-1.05    .14
                          GG               1.33              1.04-1.70    .02
NRM                       AA               Reference group                .02
                          AG               0.68              0.47-0.98    .04
                          GG               1.50              1.11-2.03    <.01
OS                        AA               Reference group                .09
                          AG               1.19              0.91-1.56    .21
                          GG               1.41              1.04-1.90    .02
Relapse                   AA               Reference group                .07
                          AG               1.35              0.83-2.17    .22
                          GG               1.19              0.82-1.27    .35
DFS                       AA               Reference group                .06
                          AG               1.21              0.93-1.57    .15
                          GG               1.41              1.06-1.88    .02

Recipients with available CBU samples and HR-HLA data (n = 305)*
Neutrophil engraftment    AA               Reference group                .16
                          AG               0.74              0.54-1.01    .07
                          GG               1.33              0.92-1.92    .13
NRM                       AA               Reference group                .05
                          AG               0.68              0.38-1.2     .18
                          GG               1.85              1.14-3.00    .01
OS                        AA               Reference group                .07
                          AG               1.05              0.70-1.59    .81
                          GG               1.54              0.99-2.41    .06
Relapse                   AA               Reference group                .25
                          AG               1.49              0.75-2.94    .66
                          GG               1.12              0.65-1.96    .25
DFS                       AA               Reference group                .17
                          AG               1.16              0.78-1.73    .46
                          GG               1.49              0.96-2.30    .07
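As a reading aid for Table 4, the sketch below shows how a reported hazard ratio and its 95% CI can be turned back into an approximate standard error, z statistic, and two-sided P value. It assumes the interval is a symmetric Wald interval on the log scale, which the source does not state, so the recovered P values are only a rough consistency check against the published ones. The function name `wald_from_ci` and the row labels are hypothetical; they are introduced here for illustration only.

```python
import math

def wald_from_ci(hr, lo, hi, z_crit=1.959964):
    """Approximate SE(log HR), z and two-sided p from an HR and its 95% CI.

    Assumption (not stated in the source): the CI is a symmetric Wald
    interval on the log scale, i.e. log(HR) +/- z_crit * SE.
    """
    se = (math.log(hi) - math.log(lo)) / (2.0 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided standard normal tail
    return se, z, p

# GG versus AA rows as printed in Table 4 (HR, lower limit, upper limit)
rows = {
    "NRM, n = 696": (1.50, 1.11, 2.03),
    "OS, n = 696": (1.41, 1.04, 1.90),
    "NRM, n = 305": (1.85, 1.14, 3.00),
}

for label, (hr, lo, hi) in rows.items():
    se, z, p = wald_from_ci(hr, lo, hi)
    print(f"{label}: SE(log HR) = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

Because the published limits are rounded to two decimals, small differences between the recovered and the reported P values are expected; a large difference would instead point to a transcription problem in a row.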

Table 4 notes: Abbreviations are explained in Table 2. *This cohort is a subset of recipients with available CBU (n = 696).

homogeneous subset of cohort 1 (cohort 2) to have a more homogeneous population, with only patients transplanted for acute leukemia all given a myeloablative conditioning regimen and with available allele HLA typing. In this cohort, we were able to demonstrate that CTLA4 was associated with NRM, but not with other outcomes. The absence of association with other outcomes may have been related to the limited sample size. Nevertheless, caution should be taken when interpreting the conclusions, as when performing multiple tests, some of the results may have arisen by chance. In addition, studies on functional analysis of the CTLA4 gene and the impact of HLA-DP on CBT outcomes are also needed.57

In many retrospective studies, the number of HLA disparities and cell dose are important factors associated with outcomes after CBT and are frequently used to select the best CBUs.1,53 However, when multiple CBUs are available for the same patient, CTLA4-CBU genotype could be tested and used as an additional criterion to select the CBU, with the aim of improving outcomes after CBT.

In conclusion, gene polymorphisms of the immune response may influence CBT outcomes. CTLA4 was shown to impact CBT outcomes according to CBU genotype. The association of GG-CTLA4 of the CBU with lower survival and higher NRM suggests that this polymorphism might be considered for CBU selection when >1 CBU meeting the current suggested selection criteria of cell dose and HLA matching is available. Importantly, CTLA4 typing of CBUs should not significantly increase costs or delay transplantation, and it could be provided by the CBBs.

Acknowledgments

The authors thank all participating cord blood banks and transplant centers. The authors also thank Dalva Tereza Catto for valuable contribution on samples shipment from Europe to the Laboratory of Hematology of Ribeirão Preto School of Medicine of São Paulo University.

V.R. was supported by National Health Service Blood and Transplants and funded by the National Institute for Health Research Biomedical Research Centers funding scheme, Oxford, United Kingdom, and by Fundação de Amparo à Pesquisa do Estado de São Paulo grant 2013/02162-8, São Paulo, Brazil. This work was supported by the Center for Cell-Based Therapy of Ribeirão Preto School of Medicine of São Paulo University.

Authorship

Contribution: R.C., M.A.Z., E.G., and V.R. designed research; R.C., R.A.P., and M.T. performed experiments; R.C., F.V., and A.R. verified clinical data; S.Q., F.P., G.K., J.L.V., P.B., R.S., C.H.L., and C.D.-d.-H. provided biological samples; G.S., G.M., and H.B. provided clinical data; R.C., F.F., J.P., and V.R. analyzed the data and performed statistical analysis; and R.C., E.G., and V.R. wrote the manuscript.


Conflict-of-interest disclosure: The authors declare no competing financial interests.

A complete list of the members of the Eurocord, Cord Blood Committee Cellular Therapy–Immunobiology Working Party of the European Society for Blood and Marrow Transplantation, Netcord and Faculdade de Medicina de Ribeirão Preto–Faculdade de Medicina de São Paulo, Universidade de São Paulo appears in “Appendix.”

Correspondence: Renato Cunha, Bone Marrow Transplantation Unit, Clinical Hospital of Ribeirão Preto School of Medicine of São Paulo University, São Paulo, Brazil; e-mail: [email protected].

Appendix: study group members

The study group members are as follows: Eurocord: R.C., F.V., A.R., E.G., V.R.; Cord Blood Committee Cellular Therapy–Immunobiology Working Party of the European Society for Blood and Marrow Transplantation: A.R.; Netcord: S.Q., F.P., G.K., J.L.V., P.B., R.S., C.H.L.; Faculdade de Medicina de Ribeirão Preto–Faculdade de Medicina de São Paulo, Universidade de São Paulo: R.C., M.A.Z., M.T., R.A.P., V.R.

References

1. Hough R, Danby R, Russell N, et al. Recommendations for a standard UK approach to incorporating umbilical cord blood into clinical transplantation practice: an update on cord blood unit selection, donor selection algorithms and conditioning protocols. Br J Haematol. 2016;172(3):360-370.
2. Kollman C, Howe CW, Anasetti C, et al. Donor characteristics as risk factors in recipients after transplantation of bone marrow from unrelated donors: the effect of donor age. Blood. 2001;98(7):2043-2051.
3. Lee SJ, Klein J, Haagenson M, et al. High-resolution donor-recipient HLA matching contributes to the success of unrelated donor marrow transplantation. Blood. 2007;110(13):4576-4583.
4. Woolfrey A, Klein JP, Haagenson M, et al. HLA-C antigen mismatch is associated with worse outcome in unrelated donor peripheral blood stem cell transplantation. Biol Blood Marrow Transplant. 2011;17(6):885-892.
5. Eapen M, Klein JP, Ruggeri A, et al; Center for International Blood and Marrow Transplant Research, Netcord, Eurocord, and the European Group for Blood and Marrow Transplantation. Impact of allele-level HLA matching on outcomes after myeloablative single unit umbilical cord blood transplantation for hematologic malignancy. Blood. 2014;123(1):133-140.
6. Rocha V, Ruggeri A, Spellman S, et al. Killer cell immunoglobulin-like receptor-ligand matching and outcomes after unrelated cord blood transplantation in acute myeloid leukemia. Biol Blood Marrow Transplant. 2016;22(7):1284-1289.
7. Rocha V, Spellman S, Zhang MJ, et al. Effect of HLA-matching recipients to donor noninherited maternal antigens on outcomes after mismatched umbilical cord blood transplantation for hematologic malignancy. Biol Blood Marrow Transplant. 2012;18(12):1890-1896.
8. Dickinson AM, Middleton PG, Rocha V, Gluckman E, Holler E; Eurobank members. Genetic polymorphisms predicting the outcome of bone marrow transplants. Br J Haematol. 2004;127(5):479-490.
9. Middleton PG, Taylor PR, Jackson G, Proctor SJ, Dickinson AM. Cytokine gene polymorphisms associating with severe acute graft-versus-host disease in HLA-identical sibling transplants. Blood. 1998;92(10):3943-3948.
10. Malkki M, Gooley T, Dubois V, Horowitz M, Petersdorf EW. Immune response gene polymorphisms in unrelated donor hematopoietic cell transplantation. Tissue Antigens. 2007;69(suppl 1):50-53.
11. Petersdorf EW. Risk assessment in haematopoietic stem cell transplantation: histocompatibility. Best Pract Res Clin Haematol. 2007;20(2):155-170.
12. Mullighan CG, Petersdorf EW. Genomic polymorphism and allogeneic hematopoietic transplantation outcome. Biol Blood Marrow Transplant. 2006;12(suppl 1):19-27.
13. Sengsayadeth S, Wang T, Lee SJ, et al. Cytotoxic T-lymphocyte antigen-4 single nucleotide polymorphisms are not associated with outcomes after unrelated donor transplantation: a Center for International Blood and Marrow Transplant Research analysis. Biol Blood Marrow Transplant. 2014;20(6):900-903.
14. Pasquini MC, Wang Z. Current use and outcome of hematopoietic stem cell transplantation: CIBMTR summary slides. Available at: https://www.cibmtr.org/ReferenceCenter/SlidesReports/SummarySlides/Pages/index.aspx#CiteSummarySlides. Accessed 15 November 2015.
15. Rocha V, Franco RF, Porcher R, et al. Host defense and inflammatory gene polymorphisms are associated with outcomes after HLA-identical sibling bone marrow transplantation. Blood. 2002;100(12):3908-3918.
16. Rocha V, Porcher R, Fernandes JF, et al. Association of drug metabolism gene polymorphisms with toxicities, graft-versus-host disease and survival after HLA-identical sibling hematopoietic stem cell transplantation for patients with leukemia. Leukemia. 2009;23(3):545-556.
17. Kögler G, Middleton PG, Wilke M, et al. Recipient cytokine genotypes for TNF-alpha and IL-10 and the minor histocompatibility antigens HY and CD31 codon 125 are not associated with occurrence or severity of acute GVHD in unrelated cord blood transplantation: a retrospective analysis. Transplantation. 2002;74(8):1167-1175.
18. Nordlander A, Uzunel M, Mattsson J, Remberger M. The TNFd4 allele is correlated to moderate-to-severe acute graft-versus-host disease after allogeneic stem cell transplantation. Br J Haematol. 2002;119(4):1133-1136.
19. Takahashi H, Furukawa T, Hashimoto S, et al. Contribution of TNF-alpha and IL-10 gene polymorphisms to graft-versus-host disease following allo-hematopoietic stem cell transplantation. Bone Marrow Transplant. 2000;26(12):1317-1323.
20. Mullighan C, Heatley S, Doherty K, et al. Non-HLA immunogenetic polymorphisms and the risk of complications after allogeneic hemopoietic stem-cell transplantation. Transplantation. 2004;77(4):587-596.
21. Bettens F, Passweg J, Gratwohl A, et al. Association of TNFd and IL-10 polymorphisms with mortality in unrelated hematopoietic stem cell transplantation. Transplantation. 2006;81(9):1261-1267.
22. Keen LJ, DeFor TE, Bidwell JL, Davies SM, Bradley BA, Hows JM. Interleukin-10 and tumor necrosis factor alpha region haplotypes predict transplant-related mortality after unrelated donor stem cell transplantation. Blood. 2004;103(9):3599-3602.
23. Ishikawa Y, Kashiwase K, Akaza T, et al. Polymorphisms in TNFA and TNFR2 affect outcome of unrelated bone marrow transplantation. Bone Marrow Transplant. 2002;29(7):569-575.
24. Lin MT, Storer B, Martin PJ, et al. Relation of an interleukin-10 promoter polymorphism to graft-versus-host disease and survival after hematopoietic-cell transplantation. N Engl J Med. 2003;349(23):2201-2210.
25. Stark GL, Dickinson AM, Jackson GH, Taylor PR, Proctor SJ, Middleton PG. Tumour necrosis factor receptor type II 196M/R genotype correlates with circulating soluble receptor levels in normal subjects and with graft-versus-host disease after sibling allogeneic bone marrow transplantation. Transplantation. 2003;76(12):1742-1749.
26. Kesh S, Mensah NY, Peterlongo P, et al. TLR1 and TLR6 polymorphisms are associated with susceptibility to invasive aspergillosis after allogeneic stem cell transplantation. Ann N Y Acad Sci. 2005;1062:95-103.
27. Granell M, Urbano-Ispizua A, Pons A, et al. Common variants in NLRP2 and NLRP3 genes are strong prognostic factors for the outcome of HLA-identical sibling allogeneic stem cell transplantation. Blood. 2008;112(10):4337-4342.
28. Piccioli P, Balbi G, Serra M, et al. CTLA-4 +49A>G polymorphism of recipients of HLA-matched sibling allogeneic stem cell transplantation is associated with survival and relapse incidence. Ann Hematol. 2010;89(6):613-618.
29. Vannucchi AM, Guidi S, Guglielmelli P, et al. Significance of CTLA-4 and CD14 genetic polymorphisms in clinical outcome after allogeneic stem cell transplantation. Bone Marrow Transplant. 2007;40(10):1001-1002.
30. Sellami MH, Bani M, Torjemane L, et al. Effect of donor CTLA-4 alleles and haplotypes on graft-versus-host disease occurrence in Tunisian patients receiving a human leukocyte antigen-identical sibling hematopoietic stem cell transplant. Hum Immunol. 2011;72(2):139-143.
31. Bosch-Vizcaya A, Perez-Garcia A, Brunet S, et al. Donor CTLA-4 genotype influences clinical outcome after T cell-depleted allogeneic hematopoietic stem cell transplantation from HLA-identical sibling donors. Biol Blood Marrow Transplant. 2012;18(1):100-105.
32. Pérez-García A, De la Cámara R, Román-Gómez J, et al; GVHD/Immunotherapy Committee of the Spanish Group of Hematopoietic Stem Cell Transplantation. CTLA-4 polymorphisms and clinical outcome after allogeneic stem cell transplantation from HLA-identical sibling donors. Blood. 2007;110(1):461-467.
33. Azarian M, Busson M, Lepage V, et al. Donor CTLA-4 +49 A/G*GG genotype is associated with chronic GVHD after HLA-identical haematopoietic stem-cell transplantations. Blood. 2007;110(13):4623-4624.
34. Jagasia M, Clark WB, Brown-Gentry KD, et al. Genetic variation in donor CTLA-4 regulatory region is a strong predictor of outcome after allogeneic hematopoietic cell transplantation for hematologic malignancies. Biol Blood Marrow Transplant. 2012;18(7):1069-1075.


35. Balavarca Y, Pearce K, Norden J, et al. Predicting survival using clinical risk scores and non-HLA immunogenetics. Bone Marrow Transplant. 2015;50(11):1445-1452.
36. Toubiana J, Courtine E, Tores F, et al. Association of REL polymorphisms and outcome of patients with septic shock. Ann Intensive Care. 2016;6(1):28.
37. Zhou XJ, Lu XL, Nath SK, et al; International Consortium on the Genetics of Systemic Lupus Erythematosus. Gene-gene interaction of BLK, TNFSF4, TRAF1, TNFAIP3, and REL in systemic lupus erythematosus. Arthritis Rheum. 2012;64(1):222-231.
38. Glucksberg H, Storb R, Fefer A, et al. Clinical manifestations of graft-versus-host disease in human recipients of marrow from HL-A-matched sibling donors. Transplantation. 1974;18(4):295-304.
39. Martin PJ, Weisdorf D, Przepiorka D, et al. National Institutes of Health Consensus Development Project on Criteria for Clinical Trials in Chronic Graft-versus-Host Disease: VI. Design of Clinical Trials Working Group report. Biol Blood Marrow Transplant. 2006;12(5):491-505.
40. Ljungman P, Bregni M, Brune M, et al; European Group for Blood and Marrow Transplantation. Allogeneic and autologous transplantation for haematological diseases, solid tumours and immune disorders: current practice in Europe 2009. Bone Marrow Transplant. 2010;45(2):219-234.
41. Filipovich AH, Weisdorf D, Pavletic S, et al. National Institutes of Health Consensus Development Project on Criteria for Clinical Trials in Chronic Graft-Versus-Host Disease: I. Diagnosis and Staging Working Group report. Biol Blood Marrow Transplant. 2005;11(12):945-956.
42. Szydlo R, Goldman JM, Klein JP, et al. Results of allogeneic bone marrow transplants for leukemia using donors other than HLA-identical siblings. J Clin Oncol. 1997;15(5):1767-1777.
43. Consonni G, Moreno E, Venturini S. Testing Hardy-Weinberg equilibrium: an objective Bayesian analysis. Stat Med. 2011;30(1):62-74.
44. Ball RD. Statistical analysis of genomic data. Methods Mol Biol. 2013;1019:171-192.
45. Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Am Stat Assoc. 1958;53(282):457-481.
46. Fine JP, Gray RJ. A proportional hazards model for the subdistribution of a competing risk. J Am Stat Assoc. 1999;94(446):496-509.
47. Everitt BS. An R and S-Plus® Companion to Multivariate Analysis. Berlin, Germany: Springer; 2005.
48. R: a language and environment for statistical computing. Available at: http://www.gbif.org/resources/2585. Accessed 15 November 2015.
49. Gluckman E, Broxmeyer HA, Auerbach AD, et al. Hematopoietic reconstitution in a patient with Fanconi’s anemia by means of umbilical-cord blood from an HLA-identical sibling. N Engl J Med. 1989;321(17):1174-1178.
50. Laughlin MJ, Eapen M, Rubinstein P, et al. Outcomes after transplantation of cord blood or bone marrow from unrelated donors in adults with leukemia. N Engl J Med. 2004;351(22):2265-2275.
51. Rocha V, Labopin M, Sanz G, et al; Acute Leukemia Working Party of European Blood and Marrow Transplant Group; Eurocord-Netcord Registry. Transplants of umbilical-cord blood or bone marrow from unrelated donors in adults with acute leukemia. N Engl J Med. 2004;351(22):2276-2285.
52. Barker JN, Scaradavou A, Stevens CE. Combined effect of total nucleated cell dose and HLA match on transplantation outcome in 1061 cord blood recipients with hematologic malignancies. Blood. 2010;115(9):1843-1849.
53. Rocha V, Gluckman E; Eurocord-Netcord registry and European Blood and Marrow Transplant group. Improving outcomes of cord blood transplantation: HLA matching, cell dose and other graft- and transplantation-related factors. Br J Haematol. 2009;147(2):262-274.
54. Gluckman E, Ruggeri A, Volt F, Cunha R, Boudjedir K, Rocha V. Milestones in umbilical cord blood transplantation. Br J Haematol. 2011;154(4):441-447.
55. Ueda H, Howson JM, Esposito L, et al. Association of the T-cell regulatory gene CTLA4 with susceptibility to autoimmune disease. Nature. 2003;423(6939):506-511.
56. Miller RE, Fayen JD, Mohammad SF, et al. Reduced CTLA-4 protein and messenger RNA expression in umbilical cord blood T lymphocytes. Exp Hematol. 2002;30(7):738-744.
57. Petersdorf EW, Malkki M, O’hUigin C, et al. High HLA-DP expression and graft-versus-host disease. N Engl J Med. 2015;373(7):599-609.

Acknowledgements

I am profoundly grateful to Prof. Antonio Torroni, whom I hold in great esteem, for patiently guiding, inspiring and motivating me during my doctoral studies, one of the best opportunities of my life, and to Prof. Laura Salvaneschi, the mentor who shaped my professional career and to whom I owe much of what I know. My gratitude also goes to Prof. Ornella Semino and Prof. Chiara Mondello for the kind words of encouragement and advice with which they always supported me. My sincere thanks go to Viola, Anna and Stefania, without whom this thesis would not have gone to print, and to Alessandro, Marco, Alessandro and Enza for welcoming me as one of them. My warm thanks go to Ilaria, Marco, Annalù, Rosalia, Carla and Annamaria, whose help enabled me to achieve this goal. Finally, I wish to mention my family, Sergio, and my little Bianca, who is the apple of my eye, and Barbara, a new friend of mine. Last but not least, my two feline friends, unique companions in the hours of studying.