BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from

BMJ Open is committed to open peer review. As part of this commitment we make the peer review history of every article we publish publicly available.

When an article is published we post the peer reviewers’ comments and the authors’ responses online. We also post the versions of the paper that were used during peer review. These are the versions that the peer review comments apply to.

The versions of the paper that follow are the versions that were submitted during the peer review process. They are not the versions of record or the final published versions. They should not be cited or distributed as the published version of this manuscript.

BMJ Open is an open access journal and the full, final, typeset and author-corrected version of record of the manuscript is available on our site with no access controls, subscription charges or pay-per-view fees (http://bmjopen.bmj.com).

If you have any questions on BMJ Open’s open peer review process please email [email protected] http://bmjopen.bmj.com/ on September 29, 2021 by guest. Protected copyright. BMJ Open BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from

Cohort profile: DBDS Genomic Cohort, a prospective and comprehensive resource for integrative and temporal analysis of genetic, environmental, and lifestyle factors affecting health of blood donors.

ForJournal: peerBMJ Open review only Manuscript ID bmjopen-2018-028401

Article Type: Cohort profile

Date Submitted by the 06-Dec-2018 Author:

Complete List of Authors: Hansen, Thomas Folkmann; Rigshospitalet Glostrup, Danish Headache Center; Kobenhavns Universitet Sundhedsvidenskabelige Fakultet, Center for protein research Banasik, Karina; Kobenhavns Universitet Sundhedsvidenskabelige Fakultet, Centre for Protein research Erikstrup, Christian; Aarhus Universitetshospital, Immunology Pedersen, Ole; Neastved hospital, Department of immunology Westergaard , David ; Kobenhavns Universitet Sundhedsvidenskabelige Fakultet, Center for protein research Chmura , Piotr; Kobenhavns Universitet Sundhedsvidenskabelige Fakultet, Center for protein research http://bmjopen.bmj.com/ Nielsen, Kaspar; Aalborg university hospital, immunology Thørner, Lise; Rigshospitalet, immunology Hjalgrim, Henrik; Statens Serum Institut Paarup, Helene; Universitetshospital, Immunology Larsen , Margit ; Rigshospitalet, immunology Petersen, Mikkel; Aarhus Universitetshospital, Immunology Jennum, Poul; Glostrup Hostpital, Danish Center for Sleep Medicine, Department of Clinical Neurophysiology; University of Copenhagen,

Center for Healthy Ageing on September 29, 2021 by guest. Protected copyright. Andersen, Steffen; Copenhagen Business School, Economics Nyegaard, Mette; Aarhus Universitet Health, Biomedicine Jemec, Gregor; Roskilde University Hospital, Dermatology Olesen, Jes; Rigshospitalet Glostrup, Danish Headache Center Werge, Thomas; Kobenhavns Universitet Sundhedsvidenskabelige Fakultet, Clinical medicine Johansson, Pär I; Copenhagen University Hospital, Rigshospitalet, Department of Clinical Immunology Sørensen, Erik; Rigshospitalet, immunology Brunak, S; University of Copenhagen Ullum, Henrik; Rigshospitalet, immunology Burgdorf, Kristoffer ; Rigshospitalet, immunology

Blood bank & transfusion medicine < PATHOLOGY, GENETICS, Keywords: PREVENTIVE MEDICINE, PUBLIC HEALTH

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml Page 1 of 18 BMJ Open BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 For peer review only 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

32 http://bmjopen.bmj.com/ 33 34 35 36 37 38 39

40 on September 29, 2021 by guest. Protected copyright. 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml BMJ Open Page 2 of 18

1 2 3 4 Cohort profile: DBDS Genomic Cohort, a prospective and comprehensive resource for BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 integrative and temporal analysis of genetic, environmental, and lifestyle factors affecting 8 9 health of blood donors. 10 11 Thomas Folkmann Hansen*1,2 Karina Banasik*2, Christian Erikstrup3, Ole Birger Pedersen4, David 12 13 2 2 5 6 14 Westergaard , Piotr Jaroslaw Chmura , Kaspar René Nielsen , Lise Wegner Thørner , Henrik 15 16 Hjalgrim7, Helene M. Paarup8, Margit Hørup Larsen6, Mikkel Steen Petersen3, Poul Jørgen Jennum9, 17 18 Steffen Andersen10, MetteFor Nyegaard peer11, Gregor review B. Jemec4,12 only, Jes Olesen1, Thomas Werge13, Pär 19 20 6 6 2 6 6 21 Ingemar Johansson , Erik Sørensen , Søren Brunak , Henrik Ullum , Kristoffer Sølvsten Burgdorf 22 23 24 25 Affiliations: 26 27 28 1Danish Headache Center, Department of Neurological department, Copenhagen University Hospital, 29 30 31 Glostrup, . 32 33 2Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, 34 35 University of Copenhagen, Copenhagen, Denmark.

36 http://bmjopen.bmj.com/ 37 3 38 Department of Clinical Immunology, Aarhus University Hospital, Aarhus, Denmark 39 40 4Department of Clinical Immunology, Naestved Hospital, Naestved, Denmark. 41 42 5Department of Clinical Immunology, Aalborg University Hospital, Aalborg, Denmark 43

44 6Department of Clinical Immunology, Copenhagen University Hospital, Rigshospitalet, Copenhagen, on September 29, 2021 by guest. Protected copyright. 45 46 47 Denmark 48 49 7Department of Epidemiology Research, Statens Serum Institut, Copenhagen, Denmark, Department 50 51 of Hematology & Copenhagen University Hospital, Copenhagen, Denmark 52 53 8 54 Department of Clinical Immunology, Hospital, Odense, Denmark 55 56 9Danish Center for Sleep Medicine, Department of Clinical Neurophysiology, Copenhagen 57 58 University Hospital, Copenhagen, Denmark, 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml Page 3 of 18 BMJ Open

1 2 3 4 10Department of Economics, Copenhagen Business School BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 11 7 Department of Biomedicine, Aarhus University, Aarhus, Denmark 8 9 12Department of Dermatology, Zealand University Hospital, Roskilde; Faculty of Health and Medical 10 11 Sciences, University of Copenhagen, Denmark 12 13 13 14 Institute of Biological Psychiatry, Mental Health Centre Sct. Hans, Copenhagen University 15 16 Hospital, Roskilde, Denmark & Department of Clinical Medicine, University of Copenhagen, 17 18 Copenhagen, DenmarkFor peer review only 19 20 *Both authors contributed equally 21 22 23 24 25 26 27 Corresponding author: 28 29 30 Thomas Folkmann Hansen, [email protected] 31 32 33 34 35

36 http://bmjopen.bmj.com/ 37 38 39 40 41 42 43

44 on September 29, 2021 by guest. Protected copyright. 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml BMJ Open Page 4 of 18

1 2 3 4 Abstract BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 Purpose: To establish a cohort that enables identification of genomic factors that influence human 8 9 health and empower increased blood donor health and safe blood transfusions. 10 11 Human health is complex and involves several factors, a major one being the genomic aspect. The 12 13 14 genomic era has resulted in many consortia encompassing large samples sizes, which has proven 15 16 successful for identifying genetic factors associated with specific traits. However, it remains a big 17 18 challenge to establish largeFor cohorts peer that facilitate review studies of theonly interaction between genetic factors, 19 20 environmental and life-style factors as these change over the course of life. A major obstacle to such 21 22 23 endeavors is that it is difficult to revisit participants to retrieve additional information and obtain 24 25 longitudinal, consecutive measurements. 26 27 28 29 30 Participants: Blood donors (n=110,000) have given consent to participate in the Danish Blood Donor 31 32 Study. The study utilizes the infrastructure of the Danish blood banks. 33 34 35

36 http://bmjopen.bmj.com/ 37 Findings to date: The cohort comprises extensive phenotype data and whole genome genotyping data. 38 39 Further, it is possible to retrieve additional phenotype data from national registries as well as from 40 41 the donors at future visits, including consecutive measurements. 42 43

44 on September 29, 2021 by guest. Protected copyright. 45 46 Future plans: To provide new knowledge on factors influencing our health. Thus, provide a platform 47 48 for studying the influence of genomic factors on human health, in particular the interaction between 49 50 environmental and genetic factors. 51 52 53 54 55 56 57 58 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml Page 5 of 18 BMJ Open

1 2 3 4 INTRODUCTION BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 A person's health is determined by complex interactions between genetic, environmental, and lifestyle 8 9 factors. Analyzing these factors collectively and prospectively is preferable. However, this is usually 10 11 only possible using birth-cohorts and large population-based cohorts, and due to the extensive effort 12 13 involved in establishing such cohorts, they are rare. The Danish Blood Donor Study (DBDS; dbds.dk) 14 15 16 is a large prospective cohort of blood donors aiming at identifying predictors of healthy donors. As 17 18 part of this cohort, weFor have now peer established review DBDS Genomic only Cohort assessing common SNPs in 19 20 110,000 donors. Thus, the DBDS Genomic Cohort provides a comprehensive catalog for large-scale 21 22 23 genetic analyses in relation to numerous environmental and lifestyle factors affecting donor’s health. 24 25 26 The evaluation of blood donor’s health is important for several reasons. It is crucial for both the donor 27 28 and the blood recipient that a healthy donor population is maintained with a high donation rate and a 29 30 low dropout rate, thereby ensuring a steady blood supply. Evidence-based guidelines for donor 31 32 33 recruitment, care, and retention are needed to ensure that donor recruitment can focus on individuals 34 35 who are likely to remain healthy and donate frequently in the long term. One obvious relevant

36 http://bmjopen.bmj.com/ 37 influential parameter relates to iron metabolism: We know that hundreds of genes impact the 38 39 1 40 generation and regulation of blood cells and also influence phenotype variations of iron absorption 41 42 and metabolism2-4. Genome-wide SNP information is expected to provide knowledge enabling us to 43

44 evaluate to whom donating blood will be unproblematic, thus facilitating retention of a stable blood on September 29, 2021 by guest. Protected copyright. 45 46 donor population. Another parameter is altruism. Altruism as part of a prosocial behavior, the selfless 47 48 5 6 49 concern for the welfare of others , is generally considered a typical blood donor characteristic . 50 51 However, altruism in the context of voluntary blood donation has also been shown to be a very 52 53 complex phenotype6. In a previous study, we found a substantially larger genetic influence on blood 54 55 56 donor behavior compared to most previous twin studies on altruism, which further highlights the 57 58 heterogeneity of the blood donor personality7. We expect that genome-wide SNP information will 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml BMJ Open Page 6 of 18

1 2 3 4 provide knowledge that can aid in the identification of long-term and steady donors. Further, we will BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 test for association between genotype SNP information and prodromal symptoms of somatic and 8 9 psychiatric disease or illnesses. 10 11 12 The DBDS Genomic Cohort offers the possibility to assess the impact of heterogeneous exposures in 13 14 a broad range of phenotypes, such as mental state, risk-taking behavior, and characterization of blood 15 16 17 components, and immune defense. The setup of the study allows for researchers to assess the genetic 18 For peer review only 19 association in a) cross sectional studies to investigate e.g. the variation of phenotypic characteristic, 20 21 clinical, and biochemical measurements, b) retrospective studies of e.g. rehabilitation capacity, and 22 23 24 c) prospective studies e.g. analyzing the variation of phenotypes and clinical measurements over time 25 26 and even identify prodromal symptoms. 27 28 29 COHORT DESCRIPTION 30 31 32 Population: The nation-wide blood donor population in Denmark consists of more than 230,000 33 34 donors giving more than 300,000 blood donations annually (www.bloddonor.dk). Blood donation in 35

36 http://bmjopen.bmj.com/ 37 Denmark is voluntary and unpaid. This means that the donation is based on the desire to help others, 38 39 who need it and not the desire of an economic benefit. Blood donors must be physically well, aged 40 41 17 and 67 years, and weigh more than 50 kilos. Individuals in chronic medical treatment or frequent 42 43 travelers to countries with high prevalence of blood disease are not allowed to participate. The deferral 44 on September 29, 2021 by guest. Protected copyright. 45 46 rules can be seen at www.bloddonor.dk. Blood donors from foreign countries must have lived in 47 48 Denmark for a minimum of one year, have a Danish social security number and have learned the 49 50 51 Danish language to prevent misunderstandings between the donor and the blood bank professionals. 52 53 54 The nationwide Danish blood bank is an integrated part of the Danish healthcare system financed by 55 56 local and state taxes. The Danish healthcare system is administrated by democratically elected 57 58 assemblies from national state institutions, regions and municipalities. The Danish blood banks are 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml Page 7 of 18 BMJ Open

1 2 3 4 non-profit organizations owned and operated by each of the five regions in Denmark. The blood banks BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 have a national board to structure collaboration across regions on recruiting donors, processing and 8 9 distributing the blood for the Danish population. The DBDS is building upon the structured Danish 10 11 blood bank system in the regions responsible for administrating donation sites at 27 hospitals in 12 13 14 addition to five mobile donation units using 180 selected sites nationally (e.g., large companies, sports 15 16 centers and universities) 17 18 For peer review only 19 At the blood donation centers and attached laboratories, the entire necessary infrastructure needed for 20 21 the collection of biological samples and structured data is in place. Both blood plasma and whole 22 23 24 blood for DNA extraction are available from all donors. The blood bank infrastructure already has 25 26 laboratory facilities with educated staff (nurses, technicians, IT specialists and physicians). In 27 28 addition, the blood bank professionals facilitate the testing of the blood for a variety of biomarkers 29 30 31 and holds expertise in large scale storage of biological material. 32 33 34 Contact to participants at the blood banks is fundamental to our study. Blood donors are asked to 35

36 participate and sign an informed consent when they visit the blood bank to donate blood. This consent http://bmjopen.bmj.com/ 37 38 allows us to use the blood samples from their past and future donations to study the impact of genetic 39 40 41 and immunological factors on current and future health and disease. The inclusion and exclusion 42 43 criteria for blood donation and participation in DBDS are the same with 95% of the blood donors

44 on September 29, 2021 by guest. Protected copyright. 45 who are invited agrees to participate in DBDS8. 46 47 48 Questionnaires: From March 2010 until July 2015, all participating donors had to complete a four- 49 50 51 page paper-based questionnaire with questions of self-experienced physical and mental health 52 53 including the 12-item short form (SF-12®) standardized health survey, smoking habits, alcohol 54 55 intake, exercise, food intake, supplemental iron intake, height, weight, and waist circumference. 56 57 85,000 individuals filled out the paper questionnaire. As a follow-up to the initial paper-based 58 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml BMJ Open Page 8 of 18

1 2 3 4 questionnaire, we have developed and implemented a digital and flexible tablet-based questionnaire BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 9 7 platform, using the open source survey software tool LimeSurvey . This enables a rapid, easy, and 8 9 cost-effective procedure to collect self-reported data on health traits from the participating donors at 10 11 the donor sites at multiple time-points. The first digital questionnaire was implemented and used from 12 13 14 July 2015 until May 2018. The questionnaire was focused the following research questions: allergy, 15 16 ADHD, migraine, hidradenitis, depression, and Restless Legs Syndrome. It also contains questions 17 18 from the paper-based Forquestionnaire; peer SF-12, review smoking habits, onlyalcohol intake, height and weight. In 19 20 total, 48,000 DBDS participants completed the first digital questionnaire. The second digital 21 22 23 questionnaire started June 2018. It includes questions on: Sleep patterns, anxiety, migraine, stress, 24 25 skin diseases, endometriosis, pain, learning difficulties, SF-12, smoking habits, alcohol intake, height 26 27 and weight. 28 29 30 8 31 DBDS Organization: The DBDS itself is described in detail by Pedersen et al. . Briefly, DBDS is 32 33 governed by a steering committee with a scientific advisory board. All projects are managed by the 34 35 DBDS steering committee. Genetic projects involving genetic data in DBDS are run in collaboration

36 http://bmjopen.bmj.com/ 37 between DBDS Genomic Consortium that consists of the DBDS steering committee, deCODE 38 39 ® 40 Genetics , and scientific collaborators. 41 42 43

44 Genotyping: DNA is purified from whole blood and subsequent stored at -20ºC. All samples are then on September 29, 2021 by guest. Protected copyright. 45 46 47 genotyped in two batches at deCODE genetics using the Global Screening Array (GSA) by Illumina 48 49 (batch 1 n=85,000 and batch 2 n=25,000). The array has a very rich up-to-date content of >650.000 50 51 SNPs with custom chip content optimized for comparison with the Illumina Omni Express chip. All 52 53 54 genotype data are processed simultaneously for genotype calling, quality control and imputation. 55 56 Initially, individuals or SNPs with more than 10% missing data are excluded, as are individuals 57 58 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml Page 9 of 18 BMJ Open

1 2 3 4 deviating more than 3 standard deviations from the population heterozygosity (correcting for BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 individuals carrying large copy number variations (>100Kbp)). 8 9 Imputation 10 11 The genotyping data is imputed using a reference panel backbone consisting of 1) UK 1KG phase 3 12 13 14 and HapMap reference to predict non-genotyped SNPs with MAF>1%, and 2) an in-house dataset 15 16 consisting of n>6,000 Danish whole genome sequences to improve the prediction of variations with 17 18 a MAF down to aroundFor 0.01%. peer Variants listed review in the American only College of Medical Genetics and 19 20 Genomics guidelines are not predicted10. 21 22 23 Copy number variations 24 25 Using the genotype of b-allele count and log Ratio, copy number variations (CNVs) are called using 26 27 pennCNV11. CNVs called using <20 SNPs are excluded and the remaining CNVs are visually 28 29 30 inspected to exclude false positives. 31 32 Statistical design: All data are stored and analyzed on a specialized, secure section of the 16,000 core 33 34 Danish National Supercomputer for Life Sciences - Computerome (www.computerome.dtu.dk). Data 35

36 http://bmjopen.bmj.com/ 37 storage and computational analysis is performed on a protected, private cloud environment built using 38 39 industry-standard components. The environment has been designed to allow administrators to control 40 41 user access down to a very granular (conditional per column) level. The data security and resilience 42 43 are handled by an enterprise-grade HIPAA-compliant storage system configured to perform

44 on September 29, 2021 by guest. Protected copyright. 45 46 automatic hourly backups. The analysis environment is capable of dynamic scaling and has been 47 48 successfully tested in a composition of over 100 servers totaling more than 300 CPUs, over 13TB of 49 50 RAM and has access to up to 5.7PB (5,700TB) of disk space. The cluster comes with a preconfigured 51 52 53 queueing system, possibility to run Virtual Machines and containers (e.g. Docker, Singularity), a set 54 55 of over 900 preinstalled tools and packages and a possibility to add GPU servers optimized for 56 57 Machine Learning and specialized big memory systems (1-8TB of RAM). In terms of physical 58 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml BMJ Open Page 10 of 18

1 2 3 4 security, the Computerome site has a guard station with 24/7 active security with thermal imaging BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 cameras and a direct connection to law enforcement including K-9 units. All access in and out of the 8 9 site is strictly regulated including requirement of presenting a photo IDs and having a pre-approved 10 11 supervisor on site at all times. 12 13 14 For each hypothesis tested, a synopsis is provided including a detailed analysis plan. Information on 15 16 17 each synopsis will be published either as link to published articles describing the results or as 18 For peer review only 19 summary statistic on the study website: www.dbds.dk. 20 21 22 Samples: During each visit to a blood donor facility (up to four times per year for whole blood donors 23 24 or up to ten times if plasma donors), every participant donates one EDTA plasma sample. At inclusion 25 26 27 in DBDS one whole blood sample is also taken. Plasma samples taken prior to the inclusion date are 28 29 stored for quality assessments and will also be accessible for future analyses. All samples are frozen 30 31 within 6 hours of donation and stored in the primary collection tubes until processing. 32 33 34 Routine blood measurements including e.g. blood group, red and white blood cell counts, hemoglobin 35

36 concentration, and hematocrit are obtained at each donation. Besides routine measures, project- http://bmjopen.bmj.com/ 37 38 related measurements are available e.g. subgroups of patients are assessed for ferritin levels, infection 39 40 41 status (Cytomegalovirus, Toxoplasma gondii, and Herpes Simplex Virus), HLA-typing, and other 42 43 selected markers of infection (circulating cytokines, C-reactive protein, etc.).

44 on September 29, 2021 by guest. Protected copyright. 45 46 47 General Data Protection and Ethical Issues and Principles: DBDS has secured necessary permissions 48 49 50 and approval from the Danish Data Protection Agency (2007-58-0015) and the Scientific Ethical 51 52 Committee system (M-20090237). New projects within the DBDS Genomic consortium will require 53 54 additional approval by the National Committee on Health Research Ethics. DBDS will be responsible 55 56 57 for the continued contact with, and securing future permissions from, relevant Danish authorities 58 59 regarding research on DBDS samples. 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml Page 11 of 18 BMJ Open

1 2 3 4 The study will adhere to the FAIR (http://datafairport.org/: Findable, Accessible, Interoperable and BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 Reusable) concepts. Within this legal framework DBDS Genomic Consortium Board can thus decide 8 9 how and under which conditions the data can be shared. Generally, relevant summary data will be 10 11 publicly available via repositories 3 months after acceptance for publication (H2020 open-access 12 13 14 policy). 15 16 17 18 FINDINGS TO DATEFor peer review only 19 20 Initial quality measures that have been assessed: 21 22 23 Kinship 24 25 As described, giving blood often runs in families and the heritability has been estimated to be >53%7. 26 27 It is clear from the estimated kinship (Figure 1, Table 1), that there is a considerable 1st, 2nd and 3rd 28 29 30 degree relatives among the participants in the DBDS Genomic consortium. 31 32 Ethnicity 33 34 Based on ~15K overlapping SNPs from the genotyped data and the 1000 Genomes samples, we 35

36 http://bmjopen.bmj.com/ 37 confirm the expected population structure of the DBDS cohort; most participants are of European 38 39 ancestry (99%) and the following two ethnicity groups are of South Asian (0.4%) and East Asian 40 41 (0.2%) ancestry, respectively. The proportion of participants with recent African ancestry is 42 43 extremely low (0.002%) which is expected given the strict donor travel quarantine rules. Ethnicity

44 on September 29, 2021 by guest. Protected copyright. 45 12 46 was evaluated using FlashPCA2 . 47 48 Minor allele frequencies 49 50 The distribution of the minor allele frequency (MAF), shown that majority of SNPs (84%) is above 51 52 53 1% (Figure 2) as expected, which provides solid basis for genotype imputation. 54 55 56 57 STRENGTH AND LIMITATIONS 58 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml BMJ Open Page 12 of 18

1 2 3 4 Consecutive measurements: A unique feature of this large blood donor cohort is the ability to do BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 consecutive assessments. In standard settings, participants are typically recruited at a baseline time- 8 9 point and are invited for follow-up studies once or twice in the following years. The blood bank 10 11 represents an advantage because most donors have a long-term committed relationship for blood 12 13 13 14 donation and are seen one to four times annually . It is therefore possible to collect several yearly 15 16 and consecutive biological samples and questionnaire information over decades for a large number 17 18 of participants. Again, Forsubgroups peer and samples review from specific time-points only can be used in a retrospective 19 20 manner. 21 22 23 24 National-based registries: Denmark has several comprehensive national registries, which include both 25 26 health information and socio-demographical measures on an individual level. The informed consent 27 28 allows for combining information obtained from the DBDS participants and the national registries; 29 30 31 the Danish National Patient Registry (since 1977), the Danish Cancer (since 1943) and Diabetes 32 33 Registries (since 1992-2012), the Danish Registry of Medicinal Product Statistics (since 1994), the 34 35 Civil Registration Registry (since 1968 vital status, number of children, birthplace, address, relocation

36 http://bmjopen.bmj.com/ 37 and more), the Integrated Database for Labor Market Research (since 1982, e.g. educational level, 38 39 40 occupation status, income, social status, and other related parameters). This facilitates retrospective, 41 42 cross-sectional, or prospective studies using registry-based measurements in combination with 43

44 questionnaire-derived data. on September 29, 2021 by guest. Protected copyright. 45 46 47 Donor selection: Although, DBDS participants resemble the Danish population a few limitations in 48 49 14 50 the study design may affect the generalizability of the results . The blood donor exclusion criteria 51 52 dismiss individuals with: infections or diseases that are transmittable through blood, weight below 50 53 54 kg, hemoglobin (Hb) levels below 12.9 g/dL in males and 12.0 g/dL in females, and curious 55 56 57 behaviors: individuals with high travel rates to countries with high risk of hepatitis and HIV, men 58 59 who have sex with men, individuals who have previously worked as prostitutes, those who have used 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml Page 13 of 18 BMJ Open

1 2 3 4 intravenous substances, and pregnant women. Comparing socio-demographical parameters of blood BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 donors with that of the total Danish population, we know that very low- and high-income individuals 8 9 are underrepresented among blood donors14. In this respect, we acknowledge that DBDS lacks 10 11 coverage of certain parts of the general population in contrast to traditional population-based studies. 12 13 15 14 Similarly, the population based UK Biobank study have also reported a “healthy volunteer” bias . 15 16 17 DISCUSSION 18 For peer review only 19 20 The extension of the DBDS with a genomic cohort will profoundly impact the usability and empower 21 22 studies on genetic, environmental, and lifestyle factors that influence blood donor health. 23 24 Furthermore, the study provides a unique platform that facilitates analysis of common phenotypes 25 26 27 not otherwise found in the national health registries, disease resilience factors, and interactions 28 29 between genes and environment. At finally, such a large healthy cohort holds a huge potential for 30 31 providing crucial information for future precision medicine initiatives and similar efforts have been 32 33 34 started, e.g. “All of US” (by NIH, US). In Denmark we have the advantage of a collection of 35

36 extensive, national health registries that facilitate epidemiological studies on specific http://bmjopen.bmj.com/ 37 38 diseases/outcomes in such a large cohort. For phenotypes and symptoms not monitored systematically 39 40 41 in these health registries; e.g. lifestyle factors such as smoking habits, sleep patterns, and self- 42 43 perceived health; large epidemiological studies are needed but typical difficult to conduct. The DBDS

44 on September 29, 2021 by guest. Protected copyright. 45 Genomic Cohort can facilitate such studies. As described above, the DBDS Genomic Cohort exploits 46 47 an existing blood donor platform with an extremely high participation rate (>95%), which facilitates 48 49 50 a straightforward evaluation of donor health in large epidemiological studies. Furthermore, the 51 52 electronic questionnaire platform allows for easy and fast implementation of new, targeted 53 54 investigations in subgroups of the donor population. Together with outcomes from the Danish health 55 56 57 registries and the millions of retrospective plasma samples stored in easily accessible freezers, DBDS 58 59 represents a solid phenotyping platform that can be used for both cross-sectional epidemiological 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml BMJ Open Page 14 of 18

1 2 3 4 studies and for retrospective biomarker studies. We believe that these strengths make the DBDS BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 Genomic Cohort a strong competitive player in the field of precision medicine. 8 9 10 The DBDS Genomic Cohort allows us to study gene-environment interaction that are otherwise 11 12 difficult to study: testing disease development hypotheses; e.g. a) cognitive performance in 13 14 interaction with genetic factors and the risk of dementia, and b) determining the contribution of 15 16 17 genetic and environmental factors to a phenotype such as sleep pattern. The DBDS Genomic Cohort 18 For peer review only 19 is particularly valuable for studying disease resistance in individuals exposed to one or more known 20 21 disease risk factors and yet do not proceed to develop the disease. One such example could be 22 23 24 participants carrying a high load of a highly inheritable trait like psychiatric illness in the family or 25 26 known genetic risk factors, who do not have psychiatric illness themselves. Lastly, DBDS also 27 28 provides sequential storage of plasma samples, which allows for sequential blood measurements. 29 30 31 Such measurements could be used to investigate health markers like suPAR (soluble urokinase 32 33 plasminogen activating receptor) and the variation associated with the donor’s general health16. 34 35

36 In short, the DBDS genomic cohort facilitates the investigation of the impact of genomic factors on http://bmjopen.bmj.com/ 37 38 health traits and states. 39 40 41 Integrative analysis of different ‘omics, i.e. multi-omics analysis will be possible in the large DBDS 42 43 genomic cohort, which adds tremendously to its value as a resource for studying the health of blood 44 on September 29, 2021 by guest. Protected copyright. 45 46 donors and correlation between blood related traits and states. We expect that the DBDS genomic 47 48 cohort will contribute to discovery and validation of prodromal symptoms and biomarkers of disease, 49 50 51 thus providing a better understanding of the disease pathologies and suggesting new drug targets. 52 53 54 COLLABORATION 55 56 57 We encourage scientific collaborations using the data generated in the DBDS Genetic Consortium. 58 59 Published summarized data is available om request. Otherwise request of data necessitate first 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml Page 15 of 18 BMJ Open

1 2 3 4 approval by the DBDS steering committee and if the request is considered outside the aim of DBDS, BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 application to the national scientific ethical committee is obligatory. Additionally, material transfer 8 9 and data protection agreement need to be acquired. Please visit www.dbds.dk. 10 11 12 Acknowledgements 13 14 We wish to express a special thanks to the staffs of the Danish blood banks whose continued inclusion 15 16 17 of blood-donors into the DBDS makes this research possible. 18 For peer review only 19 20 Contributorship statement 21 22 TH, KB, and KSB conceived and planned the experiments. KB carried out the analyses. OBP, HH, 23 HMP, KRN, CE,HU, PJJ, TW, JO, GBJ, MN, SA, PIJ,ES and XX contributed to cohort and research 24 25 design. DW, PJC, KB and SB led to data infrastructure design. CE, KSB, MHL, ES and MSP 26 27 contributed to data capture. TH, KB and KSB contributed to the interpretation of the results. TH, KB, 28 and KSB took the lead in writing the manuscript. All authors provided critical feedback and helped 29 30 shape the analysis and manuscript and approaved the final version. 31 32 33 34 Competing interests 35 The authors declare no competing interest

36 http://bmjopen.bmj.com/ 37 38 39 Funding 40 41 We also wish to thank the Lundbeck Foundation, Denmark under Grant (R209-2015-3500 to 42 Kristoffer Burgdorf), The Danish Administrative Regions, The Danish Blood Donor Research 43

44 Foundation (Bloddonorernes Forskningsfond), Rigshospitalets Research Foundation, The Novo on September 29, 2021 by guest. Protected copyright. 45 46 Nordisk Foundation (NNF14CC0001 and NNF17OC0027594), and CANDY foundation 47 (CEHEAD). 48 49 50 51 References 52 53 54 1. van der Harst P, Zhang W, Mateo Leach I, et al. Seventy-five genetic loci influencing the human red blood 55 56 cell. Nature 2012;492(7429):369-75. doi: 10.1038/nature11677 57 2. McLaren CE, Garner CP, Constantine CC, et al. Genome-wide association study identifies genetic loci 58 associated with iron deficiency. PLoS One 2011;6(3):e17390. doi: 10.1371/journal.pone.0017390 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml BMJ Open Page 16 of 18

1 2 3

4 3. Tanaka T, Roy CN, Yao W, et al. A genome-wide association analysis of serum iron concentrations. Blood BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 2010;115(1):94-6. doi: 10.1182/blood-2009-07-232496 6 7 4. Sorensen E, Grau K, Berg T, et al. A genetic risk factor for low serum ferritin levels in Danish blood donors. 8 Transfusion 2012;52(12):2585-9. doi: 10.1111/j.1537-2995.2012.03629.x 9 5. Warneken F, Tomasello M. The roots of human altruism. Br J Psychol 2009;100(Pt 3):455-71. doi: 10 10.1348/000712608X379061 11 6. Steele WR, Schreiber GB, Guiltinan A, et al. role of altruistic behavior, empathetic concern, and social 12 responsibility motivation in blood donation behavior. Transfusion 2008;48(1):43-54. doi: 13 10.1111/j.1537-2995.2007.01481.x 14 7. Pedersen OB, Axel S, Rostgaard K, et al. The heritability of blood donation: a population-based nationwide 15 16 twin study. Transfusion 2015;55(9):2169-74; quiz 68. doi: 10.1111/trf.13086 17 8. Pedersen OB, Erikstrup C, Kotze SR, et al. The Danish Blood Donor Study: a large, prospective cohort and 18 biobank for medicalFor research. peer Vox Sang review 2012;102(3):271. doi:only 10.1111/j.1423-0410.2011.01553.x 19 9. Burgdorf KS, Felsted N, Mikkelsen S, et al. Digital questionnaire platform in the Danish Blood Donor Study. 20 Comput Methods Programs Biomed 2016;135:101-4. doi: 10.1016/j.cmpb.2016.07.023 21 10. Kalia SS, Adelman K, Bale SJ, et al. Recommendations for reporting of secondary findings in clinical exome 22 and genome sequencing, 2016 update (ACMG SF v2. 0): a policy statement of the American College 23 of Medical Genetics and Genomics. Genetics in Medicine 2017;19(2):249. 24 11. Wang K, Li M, Hadley D, et al. PennCNV: an integrated hidden Markov model designed for high-resolution 25 26 copy number variation detection in whole-genome SNP genotyping data. Genome research 27 2007;17(11):1665-74. 28 12. Abraham G, Qiu Y, Inouye M. FlashPCA2: principal component analysis of Biobank-scale genotype 29 datasets. Bioinformatics 2017;33(17):2776-78. doi: 10.1093/bioinformatics/btx299 [published 30 Online First: 2017/05/06] 31 13. Burgdorf K, Pedersen O, Sørensen E, et al. Extending the gift of donation: blood donor public health 32 studies. ISBT Science Series 2015;10(S1):225-30. 33 14. Burgdorf KS, Simonsen J, Sundby A, et al. Socio-demographic characteristics of Danish blood donors. PLoS 34 35 One 2017;12(2):e0169112. doi: 10.1371/journal.pone.0169112 15. Fry A, Littlejohns TJ, Sudlow C, et al. Comparison of Sociodemographic and Health-Related Characteristics 36 http://bmjopen.bmj.com/ 37 of UK Biobank Participants with the General Population. Am J Epidemiol 2017 doi: 38 10.1093/aje/kwx246 39 16. Nielsen J, Roge R, Pristed SG, et al. Soluble urokinase-type plasminogen activator receptor levels in 40 patients with schizophrenia. Schizophr Bull 2015;41(3):764-71. doi: 10.1093/schbul/sbu118 41 17. Manichaikul A, Mychaleckyj JC, Rich SS, et al. Robust relationship inference in genome-wide association 42 studies. Bioinformatics 2010;26(22):2867-73. doi: 10.1093/bioinformatics/btq559 [published Online 43 First: 2010/10/12]

44 on September 29, 2021 by guest. Protected copyright. 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml Page 17 of 18 BMJ Open

1 2 3

4 BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 Figure 1. Relationships for DBDS donors genotyped as part of the first batch(n=85,000). Each point 8 represents a pair of related individuals and the colors indicate the degree of relatedness (InfType): 9 monozygotic twins(MZ) and technical duplicates(DUP) in green (in the upper left corner), 1st degree 10 nd rd 11 relatives as parent-child pairs (PO, dark blue) and full siblings (FS, blue), 2 and 3 degree relatives 12 in blue, red and light red, respectively. The y-axis shows the estimated kinship coefficient, defined as 13 the probability that two alleles sampled at random (one from each individual) are identical by descent. 14 The x-axis shows the proportion of zero identity-by-state (IBS0), defined as the proportion of SNPs 15 16 at which two samples share no alleles. KING’s criteria17 was used to estimate the degree of 17 relatedness. 18 For peer review only 19 20 21 Figure 2. Minor allele frequency (MAF) distribution of the genotyped SNPs prior to quality 22 23 control. 24 25 Table 1. Related pairs (3rd degree or closer) for batch 1 DBDS donors genotyped (N=85,000). 26 Counts are derived from the relationship information presented in Figure 1. 27 28 29 nd rd 30 Relationship Monozygotic Parent- Full siblings 2 degree 3 degree 31 twins offspring 32 Pairs 51 4,246 3,309 4,375 11,433 33 34 35

36 http://bmjopen.bmj.com/ 37 38 39 40 41 42 43

44 on September 29, 2021 by guest. Protected copyright. 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml BMJ Open Page 18 of 18 BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 For peer review only 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

32 http://bmjopen.bmj.com/ 33 34 35 36 37 38 39 Caption : Minor allele frequency (MAF) distribution of the genotyped SNPs prior to quality control.

40 on September 29, 2021 by guest. Protected copyright. 127x127mm (300 x 300 DPI) 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml Page 19 of 18 BMJ Open BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 For peer review only 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

32 http://bmjopen.bmj.com/ 33 34 35 36 37 38 39 Minor allele frequency (MAF) distribution of the genotyped SNPs prior to quality control.

40 on September 29, 2021 by guest. Protected copyright. 127x127mm (300 x 300 DPI) 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml BMJ Open BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from

Cohort profile: DBDS Genomic Cohort, a prospective and comprehensive resource for integrative and temporal analysis of genetic, environmental, and lifestyle factors affecting health of blood donors.

ForJournal: peerBMJ Open review only Manuscript ID bmjopen-2018-028401.R1

Article Type: Cohort profile

Date Submitted by the 19-Feb-2019 Author:

Complete List of Authors: Hansen, Thomas Folkmann; Rigshospitalet Glostrup, Danish Headache Center; Kobenhavns Universitet Sundhedsvidenskabelige Fakultet, Center for protein research Banasik, Karina; Kobenhavns Universitet Sundhedsvidenskabelige Fakultet, Centre for Protein research Erikstrup, Christian; Aarhus Universitetshospital, Immunology Pedersen, Ole; Neastved hospital, Department of immunology Westergaard , David ; Kobenhavns Universitet Sundhedsvidenskabelige Fakultet, Center for protein research Chmura , Piotr; Kobenhavns Universitet Sundhedsvidenskabelige Fakultet, Center for protein research http://bmjopen.bmj.com/ Nielsen, Kaspar; Aalborg university hospital, immunology Thørner, Lise; Rigshospitalet, immunology Hjalgrim, Henrik; Statens Serum Institut Paarup, Helene; Odense Universitetshospital, Immunology Larsen , Margit ; Rigshospitalet, immunology Petersen, Mikkel; Aarhus Universitetshospital, Immunology Jennum, Poul; Glostrup Hostpital, Danish Center for Sleep Medicine, Department of Clinical Neurophysiology; University of Copenhagen,

Center for Healthy Ageing on September 29, 2021 by guest. Protected copyright. Andersen, Steffen; Copenhagen Business School, Economics Nyegaard, Mette; Aarhus Universitet Health, Biomedicine Jemec, Gregor; Roskilde University Hospital, Dermatology Olesen, Jes; Rigshospitalet Glostrup, Danish Headache Center Werge, Thomas; Kobenhavns Universitet Sundhedsvidenskabelige Fakultet, Clinical medicine Johansson, Pär I; Copenhagen University Hospital, Rigshospitalet, Department of Clinical Immunology Sørensen, Erik; Rigshospitalet, immunology Brunak, S; University of Copenhagen Ullum, Henrik; Rigshospitalet, immunology Burgdorf, Kristoffer ; Rigshospitalet, immunology

Primary Subject Genetics and genomics Heading:

Secondary Subject Heading: Public health

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml Page 1 of 20 BMJ Open BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 1 2 3 Blood bank & transfusion medicine < PATHOLOGY, GENETICS, 4 Keywords: PREVENTIVE MEDICINE, PUBLIC HEALTH 5 6 7 8 9 10 11 12 13 14 15 16 For peer review only 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

32 http://bmjopen.bmj.com/ 33 34 35 36 37 38 39

40 on September 29, 2021 by guest. Protected copyright. 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml BMJ Open Page 2 of 20

1 2 3 4 Cohort profile: DBDS Genomic Cohort, a prospective and comprehensive resource for BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 integrative and temporal analysis of genetic, environmental, and lifestyle factors affecting 8 9 health of blood donors. 10 11 Thomas Folkmann Hansen*1,2 Karina Banasik*2, Christian Erikstrup3, Ole Birger Pedersen4, David 12 13 2 2 5 6 14 Westergaard , Piotr Jaroslaw Chmura , Kaspar René Nielsen , Lise Wegner Thørner , Henrik 15 16 Hjalgrim7, Helene M. Paarup8, Margit Hørup Larsen6, Mikkel Steen Petersen3, Poul Jørgen Jennum9, 17 18 Steffen Andersen10, MetteFor Nyegaard peer11, Gregor review B. Jemec4,12 only, Jes Olesen1, Thomas Werge13, Pär 19 20 6 6 2 6 6 21 Ingemar Johansson , Erik Sørensen , Søren Brunak , Henrik Ullum , Kristoffer Sølvsten Burgdorf 22 23 24 25 Affiliations: 26 27 28 1Danish Headache Center, Department of Neurological department, Copenhagen University Hospital, 29 30 31 Glostrup, Denmark. 32 33 2Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, 34 35 University of Copenhagen, Copenhagen, Denmark.

36 http://bmjopen.bmj.com/ 37 3 38 Department of Clinical Immunology, Aarhus University Hospital, Aarhus, Denmark 39 40 4Department of Clinical Immunology, Naestved Hospital, Naestved, Denmark. 41 42 5Department of Clinical Immunology, Aalborg University Hospital, Aalborg, Denmark 43

44 6Department of Clinical Immunology, Copenhagen University Hospital, Rigshospitalet, Copenhagen, on September 29, 2021 by guest. Protected copyright. 45 46 47 Denmark 48 49 7Department of Epidemiology Research, Statens Serum Institut, Copenhagen, Denmark, Department 50 51 of Hematology & Copenhagen University Hospital, Copenhagen, Denmark 52 53 8 54 Department of Clinical Immunology, Odense University Hospital, Odense, Denmark 55 56 9Danish Center for Sleep Medicine, Department of Clinical Neurophysiology, Copenhagen 57 58 University Hospital, Copenhagen, Denmark, 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml Page 3 of 20 BMJ Open

1 2 3 4 10Department of Economics, Copenhagen Business School BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 11 7 Department of Biomedicine, Aarhus University, Aarhus, Denmark 8 9 12Department of Dermatology, Zealand University Hospital, Roskilde; Faculty of Health and Medical 10 11 Sciences, University of Copenhagen, Denmark 12 13 13 14 Institute of Biological Psychiatry, Mental Health Centre Sct. Hans, Copenhagen University 15 16 Hospital, Roskilde, Denmark & Department of Clinical Medicine, University of Copenhagen, 17 18 Copenhagen, DenmarkFor peer review only 19 20 *Both authors contributed equally 21 22 23 24 25 26 27 Corresponding author: 28 29 30 Thomas Folkmann Hansen, [email protected] 31 32 33 34 35

36 http://bmjopen.bmj.com/ 37 38 39 40 41 42 43

44 on September 29, 2021 by guest. Protected copyright. 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml BMJ Open Page 4 of 20

1 2 3 4 Abstract BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 Purpose: To establish a cohort that enables identification of genomic factors that influence human 8 9 health and empower increased blood donor health and safe blood transfusions. 10 11 Human health is complex and involves several factors, a major one being the genomic aspect. The 12 13 14 genomic era has resulted in many consortia encompassing large samples sizes, which has proven 15 16 successful for identifying genetic factors associated with specific traits. However, it remains a big 17 18 challenge to establish largeFor cohorts peer that facilitate review studies of theonly interaction between genetic factors, 19 20 environmental and life-style factors as these change over the course of life. A major obstacle to such 21 22 23 endeavors is that it is difficult to revisit participants to retrieve additional information and obtain 24 25 longitudinal, consecutive measurements. 26 27 28 29 30 Participants: Blood donors (n=110,000) have given consent to participate in the Danish Blood Donor 31 32 Study. The study utilizes the infrastructure of the Danish blood banks. 33 34 35

36 http://bmjopen.bmj.com/ 37 Findings to date: The cohort comprises extensive phenotype data and whole genome genotyping data. 38 39 Further, it is possible to retrieve additional phenotype data from national registries as well as from 40 41 the donors at future visits, including consecutive measurements. 42 43

44 on September 29, 2021 by guest. Protected copyright. 45 46 Future plans: To provide new knowledge on factors influencing our health. Thus, provide a platform 47 48 for studying the influence of genomic factors on human health, in particular the interaction between 49 50 environmental and genetic factors. 51 52 53 54 55 56 57 58 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml Page 5 of 20 BMJ Open

1 2 3 4 INTRODUCTION BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 A person's health is determined by complex interactions between genetic, environmental, and lifestyle 8 9 factors. Analyzing these factors collectively and prospectively is preferable. However, this is usually 10 11 only possible using birth-cohorts and large population-based cohorts, and due to the extensive effort 12 13 involved in establishing such cohorts, they are rare. The Danish Blood Donor Study (DBDS; dbds.dk) 14 15 16 is a large prospective cohort of blood donors aiming at identifying predictors of healthy donors. As 17 18 part of this cohort, weFor have nowpeer established review DBDS Genomic only Cohort assessing common single 19 20 nucleotide polymorphisms (SNPs) in 110,000 donors. Thus, the DBDS Genomic Cohort provides a 21 22 23 comprehensive catalog for large-scale genetic analyses in relation to numerous environmental and 24 25 lifestyle factors affecting donor’s health. A description of summary statistics on phenotypes and data 26 27 coverage are provided in Table 1. A detailed sociodemographic description of Danish blood donors, 28 29 30 including sex and age distribution, ethnicity, education, employment and level of urbanization are 31 32 found in Burgdorf et.al 20171. 33 34 35 The evaluation of blood donor’s health is important for several reasons. It is crucial for both the donor

36 http://bmjopen.bmj.com/ 37 and the blood recipient that a healthy donor population is maintained with a high donation rate and a 38 39 40 low dropout rate, thereby ensuring a steady blood supply. Evidence-based guidelines for donor 41 42 recruitment, care, and retention are needed to ensure that donor recruitment can focus on individuals 43

44 who are likely to remain healthy and donate frequently in the long term. One obvious relevant on September 29, 2021 by guest. Protected copyright. 45 46 influential parameter relates to iron metabolism: We know that hundreds of genes impact the 47 48 2 49 generation and regulation of blood cells and also influence phenotype variations of iron absorption 50 51 and metabolism3-5. Genome-wide SNP information is expected to provide knowledge enabling us to 52 53 evaluate to whom donating blood will be unproblematic, thus facilitating retention of a stable blood 54 55 56 donor population. Another parameter is altruism. Altruism as part of a prosocial behavior, the selfless 57 58 concern for the welfare of others6, is generally considered a typical blood donor characteristic7. 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml BMJ Open Page 6 of 20

1 2 3 4 However, altruism in the context of voluntary blood donation has also been shown to be a very BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 7 complex phenotype . In a previous study, we found a substantially larger genetic influence on blood 8 9 donor behavior compared to most previous twin studies on altruism, which further highlights the 10 11 heterogeneity of the blood donor personality8. The considerable amount of kinship (twins and 12 13 14 siblings) in the cohort in combination with socioeconomic variables from the Danish registries will 15 16 enable us to further differentiate between genetic impact and social basis of altruism in blood 17 18 donation. The genome-wideFor SNPpeer information review will provide only knowledge that can aid in the 19 20 identification of long-term and steady donors. Further, we will test for association between genotype 21 22 23 SNP information and prodromal symptoms of somatic and psychiatric disease or illnesses. 24 25 26 The DBDS Genomic Cohort offers the possibility to assess the impact of heterogeneous exposures in 27 28 a broad range of phenotypes, such as mental state, risk-taking behavior, and characterization of blood 29 30 31 components, and immune defense. The setup of the study allows for researchers to assess the genetic 32 33 association in a) cross sectional studies to investigate e.g. the variation of phenotypic characteristic, 34 35 clinical, and biochemical measurements, b) retrospective studies of e.g. rehabilitation capacity, and

36 http://bmjopen.bmj.com/ 37 c) prospective studies e.g. analyzing the variation of phenotypes and clinical measurements over time 38 39 40 and even identify prodromal symptoms. 41 42 43 COHORT DESCRIPTION

44 on September 29, 2021 by guest. Protected copyright. 45 46 Population: The nation-wide blood donor population in Denmark consists of more than 230,000 47 48 donors giving more than 300,000 blood donations annually (www.bloddonor.dk). Blood donation in 49 50 51 Denmark is voluntary and unpaid. This means that the donation is based on the desire to help others, 52 53 who need it and not the desire of an economic benefit. Blood donors must be physically well, aged 54 55 17 and 67 years, and weigh more than 50 kilos. Individuals in chronic medical treatment or frequent 56 57 travelers to countries with high prevalence of blood disease are not allowed to participate. The deferral 58 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml Page 7 of 20 BMJ Open

1 2 3 4 rules can be seen at www.bloddonor.dk. Blood donors from foreign countries must have lived in BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 Denmark for a minimum of one year, have a Danish social security number and have learned the 8 9 Danish language to prevent misunderstandings between the donor and the blood bank professionals. 10 11 12 The nationwide Danish blood bank is an integrated part of the Danish healthcare system financed by 13 14 local and state taxes. The Danish healthcare system is administrated by democratically elected 15 16 17 assemblies from national state institutions, regions and municipalities. The Danish blood banks are 18 For peer review only 19 non-profit organizations owned and operated by each of the five regions in Denmark. The blood banks 20 21 have a national board to structure collaboration across regions on recruiting donors, processing and 22 23 24 distributing the blood for the Danish population. The DBDS is building upon the structured Danish 25 26 blood bank system in the regions responsible for administrating donation sites at 27 hospitals in 27 28 addition to five mobile donation units using 180 selected sites nationally (e.g., large companies, sports 29 30 31 centers and universities) 32 33 34 At the blood donation centers and attached laboratories, the entire necessary infrastructure needed for 35

36 the collection of biological samples and structured data is in place. Both blood plasma and whole http://bmjopen.bmj.com/ 37 38 blood for DNA extraction are available from all donors. The blood bank infrastructure already has 39 40 41 laboratory facilities with educated staff (nurses, technicians, IT specialists and physicians). In 42 43 addition, the blood bank professionals facilitate the testing of the blood for a variety of biomarkers

44 on September 29, 2021 by guest. Protected copyright. 45 and holds expertise in large scale storage of biological material. 46 47 48 Contact to participants at the blood banks is fundamental to our study. Blood donors are asked to 49 50 51 participate and sign an informed consent when they visit the blood bank to donate blood. This consent 52 53 allows us to use the blood samples from their past and future donations to study the impact of genetic 54 55 and immunological factors on current and future health and disease. The inclusion and exclusion 56 57 58 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml BMJ Open Page 8 of 20

1 2 3 4 criteria for blood donation and participation in DBDS are the same with 95% of the blood donors BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 9 7 who are invited agrees to participate in DBDS . 8 9 10 Questionnaires: From March 2010 until July 2015, all participating donors had to complete a four- 11 12 page paper-based questionnaire with questions of self-experienced physical and mental health 13 14 including the 12-item short form (SF-12®) standardized health survey, smoking habits, alcohol 15 16 17 intake, exercise, food intake, supplemental iron intake, height, weight, and waist circumference. 18 For peer review only 19 85,000 individuals filled out the paper questionnaire. As a follow-up to the initial paper-based 20 21 questionnaire, we have developed and implemented a digital and flexible tablet-based questionnaire 22 23 10 24 platform, using the open source survey software tool LimeSurvey . This enables a rapid, easy, and 25 26 cost-effective procedure to collect self-reported data on health traits from the participating donors at 27 28 the donor sites at multiple time-points. The first digital questionnaire was implemented and used from 29 30 31 July 2015 until May 2018. The questionnaire was focused the following research questions: allergy, 32 33 ADHD, migraine, hidradenitis, depression, and Restless Legs Syndrome. It also contains questions 34 35 from the paper-based questionnaire; SF-12, smoking habits, alcohol intake, height and weight. In

36 http://bmjopen.bmj.com/ 37 total, 48,000 DBDS participants completed the first digital questionnaire. The second digital 38 39 40 questionnaire started June 2018. It includes questions on: Sleep patterns, anxiety, migraine, stress, 41 42 skin diseases, endometriosis, pain, learning difficulties, SF-12, smoking habits, alcohol intake, height 43

44 and weight. Using the questionnaire data, several studies have already assessed the factors describing on September 29, 2021 by guest. Protected copyright. 45 46 11 12 47 the general health of blood donors, e.g. hidradentis , risk of infection , migraine, restless legs, and 48 49 depression13 14. Also, by revisiting samples it has been possible to assess wether infections correlates 50 51 with obesity15 and how iron deficiency and hemoglobin levels affect donor health in general16 17. 52 53 54 Registry-based data: There is and has always been equal access to the healthcare system in Denmark, 55 56 57 hence, the centralized national civil and health registries are unbiased and comprehensive resources 58 59 of healthcare data. The registries include the National Patient Registry containing all hospital- 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml Page 9 of 20 BMJ Open

1 2 3 4 registered diagnoses since 197518, as well as other specialized registries, e.g. the Danish Medical BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 Birth Register, the Danish Register of Causes of Death, and Statistics Denmark monitoring e.g. socio- 8 9 economic data. We have already used the DBDS cohort in epidemiological studies assessing e.g. the 10 11 mortality 19 of donors and the effect of blood donation on offspring birth weight20. 12 13 14 DBDS Organization: The DBDS itself is described in detail by Pedersen et al.9. Briefly, DBDS is 15 16 17 governed by a steering committee with a scientific advisory board. All projects are managed by the 18 For peer review only 19 DBDS steering committee. Genetic projects involving genetic data in DBDS are run in collaboration 20 21 between DBDS Genomic Consortium that consists of the DBDS steering committee, deCODE 22 23 ® 24 Genetics , and scientific collaborators. 25 26 27 28 Genotyping: DNA is purified from whole blood and subsequent stored at -20ºC. All samples are then 29 30 31 genotyped in two batches at deCODE genetics using the Global Screening Array (GSA) by Illumina 32 33 (batch 1 n=85,000 and batch 2 n=25,000). The array has a very rich up-to-date content of >650.000 34 35 SNPs with custom chip content optimized for comparison with the Illumina Omni Express chip. All

36 http://bmjopen.bmj.com/ 37 genotype data are processed simultaneously for genotype calling, quality control and imputation. 38 39 40 Initially, individuals or SNPs with more than 10% missing data are excluded, as are individuals 41 42 deviating more than 3 standard deviations from the population heterozygosity (correcting for 43

44 individuals carrying large copy number variations (>100Kbp)). on September 29, 2021 by guest. Protected copyright. 45 46 47 Imputation 48 49 The genotyping data is imputed using a reference panel backbone consisting of 1) UK 1KG phase 3 50 51 and HapMap reference to predict non-genotyped SNPs with minor allele frequency (MAF)>1%, and 52 53 54 2) an in-house dataset consisting of n>6,000 Danish whole genome sequences to improve the 55 56 prediction of variations with a MAF down to around 0.01%21. Variants listed in the American College 57 58 of Medical Genetics and Genomics guidelines are currently not predicted, due to restrictions from the 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml BMJ Open Page 10 of 20

1 2 3 4 Danish National Ethics Committee 22. For future collaborative studies the Ethics Committee will BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 approve analysis of these variants on a case by case basis. 8 9 Copy number variations 10 11 Using the genotype of b-allele count and log Ratio, copy number variations (CNVs) are called using 12 13 23 14 pennCNV . CNVs called using <20 SNPs are excluded and the remaining CNVs are visually 15 16 inspected to exclude false positives. 17 18 Statistical design: All dataFor are stored peer and analyzed review on a specialized, only secure section of the 16,000 core 19 20 Danish National Supercomputer for Life Sciences - Computerome (www.computerome.dtu.dk). Data 21 22 23 storage and computational analysis is performed on a protected, private cloud environment. The 24 25 analysis environment is capable of dynamic scaling and has been successfully tested in a composition 26 27 of over 100 servers totaling more than 300 CPUs, over 13TB of RAM and has access to up to 5.7PB 28 29 30 (5,700TB) of disk space. The cluster comes with a preconfigured queueing system, possibility to run 31 32 Virtual Machines and containers (e.g. Docker, Singularity), a set of over 900 preinstalled tools and 33 34 packages and a possibility to add GPU servers optimized for Machine Learning and specialized big 35

36 http://bmjopen.bmj.com/ 37 memory systems (1-8TB of RAM). 38 39 40 For each hypothesis tested, a synopsis is provided including a detailed analysis plan. Information on 41 42 each synopsis will be published either as link to published articles describing the results or as 43

44 summary statistic on the study website: www.dbds.dk. on September 29, 2021 by guest. Protected copyright. 45 46 47 Samples: During each visit to a blood donor facility (up to four times per year for whole blood donors 48 49 50 or up to ten times if plasma donors), every participant donates one EDTA plasma sample. At inclusion 51 52 in DBDS one whole blood sample is also taken. Plasma samples taken prior to the inclusion date are 53 54 stored for quality assessments and will also be accessible for future analyses. All samples are frozen 55 56 57 within 6 hours of donation and stored in the primary collection tubes until processing. 58 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml Page 11 of 20 BMJ Open

1 2 3 4 Routine blood measurements including e.g. blood group, red and white blood cell counts, hemoglobin BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 concentration, and hematocrit are obtained at each donation. Besides routine measures, project- 8 9 related measurements are available e.g. subgroups of patients are assessed for ferritin levels, infection 10 11 status (Cytomegalovirus, Toxoplasma gondii, and Herpes Simplex Virus), HLA-typing, and other 12 13 14 selected markers of infection (circulating cytokines, C-reactive protein, etc.). 15 16 17 18 General Data ProtectionFor and Ethical peer Issues andreview Principles: DBDS only has secured necessary permissions 19 20 and approval from the Danish Data Protection Agency (2007-58-0015) and the Scientific Ethical 21 22 23 Committee system (M-20090237). New projects within the DBDS Genomic consortium will require 24 25 additional approval by the National Committee on Health Research Ethics. DBDS will be responsible 26 27 for the continued contact with, and securing future permissions from, relevant Danish authorities 28 29 30 regarding research on DBDS samples. 31 32 33 The study will adhere to the FAIR (http://datafairport.org/: Findable, Accessible, Interoperable and 34 35 Reusable) concepts. Within this legal framework DBDS Genomic Consortium Board can thus decide

36 http://bmjopen.bmj.com/ 37 how and under which conditions the data can be shared. Generally, relevant summary data will be 38 39 40 publicly available via repositories 3 months after acceptance for publication (H2020 open-access 41 42 policy). 43

44 Patient and Public Involvement: Patients and public were not involved in the design of this study. on September 29, 2021 by guest. Protected copyright. 45 46 47 Data availability statement: No additional data available 48 49 50 51 FINDINGS TO DATE 52 53 54 Initial quality measures that have been assessed: 55 56 Kinship 57 58 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml BMJ Open Page 12 of 20

1 2 3 4 As described, giving blood often runs in families and the heritability has been estimated to be >53%8. BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 It is clear from the estimated kinship based on the first batch of participants (n=85.000) (Figure 1, 8 9 Table 2), that there is a considerable 1st, 2nd and 3rd degree relatives among the participants in the 10 11 DBDS Genomic cohort. 12 13 14 Ethnicity 15 16 Based on ~15K overlapping SNPs from the genotyped data and the 1000 Genomes samples, we 17 18 confirm the expected populationFor peer structure ofreview the DBDS cohort; only most participants are of European 19 20 ancestry (99%) and the following two ethnicity groups are of South Asian (0.4%) and East Asian 21 22 23 (0.2%) ancestry, respectively. The proportion of participants with recent African ancestry is 24 25 extremely low (0.002%) which is expected given the strict donor travel quarantine rules. Ethnicity 26 27 was evaluated using FlashPCA224. 28 29 30 Minor allele frequencies 31 32 The distribution of the MAF, shown that majority of SNPs (84%) is above 1% (Figure 2) as expected, 33 34 which provides solid basis for genotype imputation. 35

36 http://bmjopen.bmj.com/ 37 38 39 STRENGTH AND LIMITATIONS 40 41 Consecutive measurements: A unique feature of this large blood donor cohort is the ability to do 42 43 consecutive assessments. In standard settings, participants are typically recruited at a baseline time-

44 on September 29, 2021 by guest. Protected copyright. 45 46 point and are invited for follow-up studies once or twice in the following years. The blood bank 47 48 represents an advantage because most donors have a long-term committed relationship for blood 49 50 donation and are seen one to four times annually25. It is therefore possible to collect several yearly 51 52 53 and consecutive biological samples and questionnaire information over decades for a large number 54 55 of participants. Again, subgroups and samples from specific time-points can be used in a retrospective 56 57 manner. 58 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml Page 13 of 20 BMJ Open

1 2 3 4 National-based registries: Denmark has several comprehensive national registries, which include both BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 health information and socio-demographical measures on an individual level. The informed consent 8 9 allows for combining information obtained from the DBDS participants and the national registries; 10 11 the Danish National Patient Registry (since 1977), the Danish Cancer (since 1943) and Diabetes 12 13 14 Registries (since 1992-2012), the Danish Registry of Medicinal Product Statistics (since 1994), the 15 16 Civil Registration Registry (since 1968 vital status, number of children, birthplace, address, relocation 17 18 and more), the IntegratedFor Database peer for Labor review Market Research only (since 1982, e.g. educational level, 19 20 occupation status, income, social status, and other related parameters). This facilitates retrospective, 21 22 23 cross-sectional, or prospective studies using registry-based measurements in combination with 24 25 questionnaire-derived data. 26 27 28 Donor selection: Although, DBDS participants resemble the Danish population a few limitations in 29 30 1 31 the study design may affect the generalizability of the results . The blood donor exclusion criteria 32 33 dismiss individuals with: infections or diseases that are transmittable through blood, weight below 50 34 35 kg, hemoglobin (Hb) levels below 12.9 g/dL in males and 12.0 g/dL in females, and curious

36 http://bmjopen.bmj.com/ 37 behaviors: individuals with high travel rates to countries with high risk of hepatitis and HIV, men 38 39 40 who have sex with men, individuals who have previously worked as prostitutes, those who have used 41 42 intravenous substances, and pregnant women. Comparing socio-demographical parameters of blood 43

44 donors with that of the total Danish population, we know that very low- and high-income individuals on September 29, 2021 by guest. Protected copyright. 45 46 1 47 are underrepresented among blood donors . In this respect, we acknowledge that DBDS lacks 48 49 coverage of certain parts of the general population in contrast to traditional population-based studies. 50 51 Similarly, the population based UK Biobank study have also reported a “healthy volunteer” bias26. 52 53 54 DISCUSSION 55 56 57 58 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml BMJ Open Page 14 of 20

1 2 3 4 The extension of the DBDS with a genomic cohort will profoundly impact the usability and empower BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 studies on genetic, environmental, and lifestyle factors that influence blood donor health. 8 9 Furthermore, the study provides a unique platform that facilitates analysis of common phenotypes 10 11 not otherwise found in the national health registries, disease resilience factors, and interactions 12 13 14 between genes and environment. At finally, such a large healthy cohort holds a huge potential for 15 16 providing crucial information for future precision medicine initiatives and similar efforts have been 17 18 started, e.g. “All of US”For (by NIH,peer US). Inreview Denmark we haveonly the advantage of a collection of 19 20 extensive, national health registries that facilitate epidemiological studies on specific 21 22 23 diseases/outcomes in such a large cohort. For phenotypes and symptoms not monitored systematically 24 25 in these health registries; e.g. lifestyle factors such as smoking habits, sleep patterns, and self- 26 27 perceived health; large epidemiological studies are needed but typical difficult to conduct. The DBDS 28 29 30 Genomic Cohort can facilitate such studies. As described above, the DBDS Genomic Cohort exploits 31 32 an existing blood donor platform with an extremely high participation rate (>95%), which facilitates 33 34 a straightforward evaluation of donor health in large epidemiological studies. Furthermore, the 35

36 http://bmjopen.bmj.com/ 37 electronic questionnaire platform allows for easy and fast implementation of new, targeted 38 39 investigations in subgroups of the donor population. Together with outcomes from the Danish health 40 41 registries and the millions of retrospective plasma samples stored in easily accessible freezers, DBDS 42 43 represents a solid phenotyping platform that can be used for both cross-sectional epidemiological

44 on September 29, 2021 by guest. Protected copyright. 45 46 studies and for retrospective biomarker studies. We believe that these strengths make the DBDS 47 48 Genomic Cohort a strong competitive player in the field of precision medicine. 49 50 51 The DBDS Genomic Cohort allows us to study gene-environment interaction that are otherwise 52 53 54 difficult to study: testing disease development hypotheses; e.g. a) cognitive performance in 55 56 interaction with genetic factors and the risk of dementia, and b) determining the contribution of 57 58 genetic and environmental factors to a phenotype such as sleep pattern. One way to study this is by 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml Page 15 of 20 BMJ Open

1 2 3 4 using multivariate regression models. As an example, Fan et al. have recently portrayed an advanced BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 example of gene-environment interaction analysis incorporating temporal and spatial 8 9 considerations27, an analysis based on data from the Danish civil registries. The DBDS Genomic 10 11 Cohort is particularly valuable for studying disease resistance in individuals exposed to one or more 12 13 14 known disease risk factors and yet do not proceed to develop the disease. One such example could be 15 16 participants carrying a high load of a highly inheritable trait like psychiatric illness in the family or 17 18 known genetic risk factors,For who peer do not have review psychiatric illnessonly themselves. Lastly, DBDS also 19 20 provides sequential storage of plasma samples, which allows for sequential blood measurements. 21 22 23 Such measurements could be used to investigate health markers like suPAR (soluble urokinase 24 25 plasminogen activating receptor) and the variation associated with the donor’s general health28. 26 27 28 In short, the DBDS genomic cohort facilitates the investigation of the impact of genomic factors on 29 30 31 health traits and states. 32 33 34 Integrative analysis of different ‘omics, i.e. multi-omics analysis will be possible in the large DBDS 35

36 genomic cohort, which adds tremendously to its value as a resource for studying the health of blood http://bmjopen.bmj.com/ 37 38 donors and correlation between blood related traits and states. We expect that the DBDS genomic 39 40 41 cohort will contribute to discovery and validation of prodromal symptoms and biomarkers of disease, 42 43 thus providing a better understanding of the disease pathologies and suggesting new drug targets.

44 on September 29, 2021 by guest. Protected copyright. 45 46 COLLABORATION 47 48 49 We encourage scientific collaborations using the data generated in the DBDS Genetic Consortium. 50 51 Published summarized data is available om request. Otherwise request of data necessitate first 52 53 54 approval by the DBDS steering committee and if the request is considered outside the aim of DBDS, 55 56 application to the national scientific ethical committee is obligatory. Additionally, material transfer 57 58 and data protection agreement need to be acquired. Please visit www.dbds.dk. 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml BMJ Open Page 16 of 20

1 2 3 4 Acknowledgements BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 We wish to express a special thanks to the staffs of the Danish blood banks whose continued inclusion 8 9 of blood-donors into the DBDS makes this research possible. 10 11 12 Contributorship statement 13 14 TH, KB, and KSB conceived and planned the experiments. KB carried out the analyses. OBP, HH, 15 16 HMP, KRN, CE,HU, PJJ, TW, JO, GBJ, MN, SA, PIJ,ES and LWT contributed to cohort and research 17 design. DW, PJC, KB and SB led to data infrastructure design. CE, KSB, MHL, ES and MSP 18 For peer review only 19 contributed to data capture. TH, KB and KSB contributed to the interpretation of the results. TH, KB, 20 21 and KSB took the lead in writing the manuscript. All authors provided critical feedback and helped 22 shape the analysis and manuscript and approaved the final version. 23 24 25 26 Competing interests 27 28 The authors declare no competing interest 29 30 31 Funding 32 33 We also wish to thank the Lundbeck Foundation, Denmark under Grant (R209-2015-3500 to 34 35 Kristoffer Burgdorf), The Danish Administrative Regions, The Danish Blood Donor Research

36 Foundation (Bloddonorernes Forskningsfond), Rigshospitalets Research Foundation, The Novo http://bmjopen.bmj.com/ 37 38 Nordisk Foundation (NNF14CC0001 and NNF17OC0027594), and CANDY foundation 39 40 (CEHEAD). 41 42 43 References

44 on September 29, 2021 by guest. Protected copyright. 45 46 47 1. Burgdorf KS, Simonsen J, Sundby A, et al. Socio-demographic characteristics of Danish blood donors. PLoS 48 One 2017;12(2):e0169112. doi: 10.1371/journal.pone.0169112 49 2. van der Harst P, Zhang W, Mateo Leach I, et al. Seventy-five genetic loci influencing the human red blood 50 cell. Nature 2012;492(7429):369-75. doi: 10.1038/nature11677 51 3. McLaren CE, Garner CP, Constantine CC, et al. Genome-wide association study identifies genetic loci 52 associated with iron deficiency. PLoS One 2011;6(3):e17390. doi: 10.1371/journal.pone.0017390 53 4. Tanaka T, Roy CN, Yao W, et al. A genome-wide association analysis of serum iron concentrations. Blood 54 55 2010;115(1):94-6. doi: 10.1182/blood-2009-07-232496 56 5. Sorensen E, Grau K, Berg T, et al. A genetic risk factor for low serum ferritin levels in Danish blood donors. 57 Transfusion 2012;52(12):2585-9. doi: 10.1111/j.1537-2995.2012.03629.x 58 6. Warneken F, Tomasello M. The roots of human altruism. Br J Psychol 2009;100(Pt 3):455-71. doi: 59 10.1348/000712608X379061 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml Page 17 of 20 BMJ Open

1 2 3

4 7. Steele WR, Schreiber GB, Guiltinan A, et al. role of altruistic behavior, empathetic concern, and social BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 responsibility motivation in blood donation behavior. Transfusion 2008;48(1):43-54. doi: 6 7 10.1111/j.1537-2995.2007.01481.x 8 8. Pedersen OB, Axel S, Rostgaard K, et al. The heritability of blood donation: a population-based nationwide 9 twin study. Transfusion 2015;55(9):2169-74; quiz 68. doi: 10.1111/trf.13086 10 9. Pedersen OB, Erikstrup C, Kotze SR, et al. The Danish Blood Donor Study: a large, prospective cohort and 11 biobank for medical research. Vox Sang 2012;102(3):271. doi: 10.1111/j.1423-0410.2011.01553.x 12 10. Burgdorf KS, Felsted N, Mikkelsen S, et al. Digital questionnaire platform in the Danish Blood Donor Study. 13 Comput Methods Programs Biomed 2016;135:101-4. doi: 10.1016/j.cmpb.2016.07.023 14 11. Theut Riis P, Pedersen O, Sigsgaard V, et al. Prevalence of patients with self-reported hidradenitis 15 16 suppurativa in a cohort of Danish blood donors: a cross-sectional study. British Journal of 17 Dermatology 2018 18 12. Kotzé SR, Pedersen OB,For Petersen peer MS, et al. Deferralreview for low hemoglobin only is not associated with increased 19 risk of infection in Danish blood donors. Transfusion 2017;57(3):571-77. 20 13. Didriksen M, Allen RP, Burchell BJ, et al. Restless legs syndrome is associated with major comorbidities in 21 a population of Danish blood donors. Sleep Med 2018;45:124-31. doi: 10.1016/j.sleep.2018.02.007 22 [published Online First: 2018/04/24] 23 14. Didriksen M, Hansen TF, Thørner LW, et al. Restless legs syndrome is associated with increased risk of 24 migraine. Cephalalgia Reports 2018;1:2515816318780743. 25 26 15. Kaspersen KA, Pedersen OB, Petersen MS, et al. Obesity and risk of infection: results from the Danish 27 Blood Donor Study. Epidemiology 2015;26(4):580-89. 28 16. Kotze SR, Pedersen OB, Petersen MS, et al. Predictors of hemoglobin in Danish blood donors: results from 29 the Danish Blood Donor Study. Transfusion 2015;55(6):1303-11. doi: 10.1111/trf.13011 [published 30 Online First: 2015/02/04] 31 17. Rigas AS, Pedersen OB, Magnussen K, et al. Iron deficiency among blood donors: experience from the 32 Danish Blood Donor Study and from the Copenhagen ferritin monitoring scheme. Transfus Med 2017 33 doi: 10.1111/tme.12477 [published Online First: 2017/10/13] 34 35 18. Schmidt M, Schmidt SAJ, Sandegaard JL, et al. The Danish National Patient Registry: a review of content, data quality, and research potential. Clinical epidemiology 2015;7:449. 36 http://bmjopen.bmj.com/ 37 19. Ullum H, Rostgaard K, Kamper-Jørgensen M, et al. Blood donation and blood donor mortality after 38 adjustment for a healthy donor effect. Transfusion 2015;55(10):2479-85. 39 20. Rigas AS, Pedersen OB, Sørensen E, et al. Frequent blood donation and offspring birth weight—a 40 next-generation association? Transfusion 2018 41 21. Eggertsson HP, Jonsson H, Kristmundsdottir S, et al. Graphtyper enables population-scale genotyping 42 using pangenome graphs. Nature genetics 2017;49(11):1654. 43 22. Kalia SS, Adelman K, Bale SJ, et al. Recommendations for reporting of secondary findings in clinical exome

44 on September 29, 2021 by guest. Protected copyright. 45 and genome sequencing, 2016 update (ACMG SF v2. 0): a policy statement of the American College 46 of Medical Genetics and Genomics. Genetics in Medicine 2017;19(2):249. 47 23. Wang K, Li M, Hadley D, et al. PennCNV: an integrated hidden Markov model designed for high-resolution 48 copy number variation detection in whole-genome SNP genotyping data. Genome research 49 2007;17(11):1665-74. 50 24. Abraham G, Qiu Y, Inouye M. FlashPCA2: principal component analysis of Biobank-scale genotype 51 datasets. Bioinformatics 2017;33(17):2776-78. doi: 10.1093/bioinformatics/btx299 [published 52 Online First: 2017/05/06] 53 54 25. Burgdorf K, Pedersen O, Sørensen E, et al. Extending the gift of donation: blood donor public health 55 studies. ISBT Science Series 2015;10(S1):225-30. 56 26. Fry A, Littlejohns TJ, Sudlow C, et al. Comparison of Sociodemographic and Health-Related Characteristics 57 of UK Biobank Participants with the General Population. Am J Epidemiol 2017 doi: 58 10.1093/aje/kwx246 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml BMJ Open Page 18 of 20

1 2 3

4 27. Fan CC, McGrath JJ, Appadurai V, et al. Spatial gene-by-environment mapping for schizophrenia reveals BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 locale of upbringing effects beyond urban-rural differences. bioRxiv 2018:315820. 6 7 28. Nielsen J, Roge R, Pristed SG, et al. Soluble urokinase-type plasminogen activator receptor levels in 8 patients with schizophrenia. Schizophr Bull 2015;41(3):764-71. doi: 10.1093/schbul/sbu118 9 10 11 12 13 14 15 16 17 18 For peer review only 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

36 http://bmjopen.bmj.com/ 37 38 39 40 41 42 43

44 on September 29, 2021 by guest. Protected copyright. 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml Page 19 of 20 BMJ Open

1 2 3

4 BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 Figure 1. Minor allele frequency (MAF) distribution of the genotyped SNPs prior to quality 8 control. 9 10 Figure 2. Relationships for DBDS donors genotyped as part of the first batch(n=85,000). Each point 11 represents a pair of related individuals and the colors indicate the degree of relatedness ("InfType” in 12 13 KING IBD segment inference): monozygotic twins and technical duplicates(MZ/DUP) in pink (in 14 the upper left corner), 1st degree relatives as parent-offspring pairs (PO, dark green) and full siblings 15 (FS, light green), 2nd and 3rd degree relatives in dark and light purple, respectively. The y-axis shows 16 17 the estimated kinship coefficient, defined as the probability that two alleles sampled at random (one 18 from each individual) areFor identical peer by descent. review The x-axis shows only the proportion of zero identity-by- 19 state (IBS0), defined as the proportion of SNPs at which two samples share no alleles. KING’s criteria 20 21 was used to estimate the degree of relatedness (--related command in KING). We used a set of 22 independent high-quality markers (excluding palindromic and non-autosomal markers, markers with 23 MAF<1%, low call-rate (<99%) and markers in regions with high Linkage Disequilibrium) for the 24 25 relatedness calculation. 26 27 Table 1. Prevalence and missingness of enviroenmental variables 28 Participants Missingness 29 Men 48.8 % 0% 30 Women 51.2 % 0% 31 Age* 41 (29-50) * 0% 32 Height (cm) * 175.6 (169-182) 0.53% 33 Weight (kg) * 78.1 (67-87) 0.82% 34 BMI 25.3 (22-27) 1.2% 35 Smoking 11.9% 1.3%

36 http://bmjopen.bmj.com/ Alcohol consumption (daily) 3.0% 4.7% 37 38 39 Questionnaire 1 87.000 40 Questionnaire 2 50.000 41 * Mean (Quartiles). 42 43

44 on September 29, 2021 by guest. Protected copyright. 45 Table 2. Related pairs (3rd degree or closer) for batch 1 DBDS donors genotyped (N=85,000). 46 47 48 Relationship Monozygotic Parent- Full siblings 2nd degree 3rd degree 49 twins offspring 50 51 Pairs 51 4,246 3,309 4,375 11,433 52 53 54 55 56 57 58 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml BMJ Open Page 20 of 20 BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 For peer review only 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

32 http://bmjopen.bmj.com/ 33 34 35 36 37 38 39 Caption : Minor allele frequency (MAF) distribution of the genotyped SNPs prior to quality control.

40 on September 29, 2021 by guest. Protected copyright. 127x127mm (300 x 300 DPI) 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml Page 21 of 20 BMJ Open BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 For peer review only 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

32 http://bmjopen.bmj.com/ 33 Relationships for DBDS donors genotyped as part of the first batch(n=85,000). Each point represents a pair 34 of related individuals and the colors indicate the degree of relatedness ("InfType” in KING IBD segment 35 inference): monozygotic twins and technical duplicates(MZ/DUP) in pink (in the upper left corner), 1st 36 degree relatives as parent-offspring pairs (PO, dark green) and full siblings (FS, light green), 2nd and 3rd 37 degree relatives in dark and light purple, respectively. The y-axis shows the estimated kinship coefficient, 38 defined as the probability that two alleles sampled at random (one from each individual) are identical by 39 descent. The x-axis shows the proportion of zero identity-by-state (IBS0), defined as the proportion of SNPs at which two samples share no alleles. KING’s criteria was used to estimate the degree of relatedness (-- 40 related command in KING). We used a set of independent high-quality markers (excluding palindromic and on September 29, 2021 by guest. Protected copyright. 41 non-autosomal markers, markers with MAF<1%, low call-rate (<99%) and markers in regions with high 42 Linkage Disequilibrium) for the relatedness calculation. 43 44 152x127mm (300 x 300 DPI) 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml BMJ Open BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from

Cohort profile: DBDS Genomic Cohort, a prospective and comprehensive resource for integrative and temporal analysis of genetic, environmental, and lifestyle factors affecting health of blood donors.

ForJournal: peerBMJ Open review only Manuscript ID bmjopen-2018-028401.R2

Article Type: Cohort profile

Date Submitted by the 19-Mar-2019 Author:

Complete List of Authors: Hansen, Thomas Folkmann; Rigshospitalet Glostrup, Danish Headache Center; Kobenhavns Universitet Sundhedsvidenskabelige Fakultet, Center for protein research Banasik, Karina; Kobenhavns Universitet Sundhedsvidenskabelige Fakultet, Centre for Protein research Erikstrup, Christian; Aarhus Universitetshospital, Immunology Pedersen, Ole; Neastved hospital, Department of immunology Westergaard , David ; Kobenhavns Universitet Sundhedsvidenskabelige Fakultet, Center for protein research Chmura , Piotr; Kobenhavns Universitet Sundhedsvidenskabelige Fakultet, Center for protein research http://bmjopen.bmj.com/ Nielsen, Kaspar; Aalborg university hospital, immunology Thørner, Lise; Rigshospitalet, immunology Hjalgrim, Henrik; Statens Serum Institut Paarup, Helene; Odense Universitetshospital, Immunology Larsen , Margit ; Rigshospitalet, immunology Petersen, Mikkel; Aarhus Universitetshospital, Immunology Jennum, Poul; Glostrup Hostpital, Danish Center for Sleep Medicine, Department of Clinical Neurophysiology; University of Copenhagen,

Center for Healthy Ageing on September 29, 2021 by guest. Protected copyright. Andersen, Steffen; Copenhagen Business School, Economics Nyegaard, Mette; Aarhus Universitet Health, Biomedicine Jemec, Gregor; Roskilde University Hospital, Dermatology Olesen, Jes; Rigshospitalet Glostrup, Danish Headache Center Werge, Thomas; Kobenhavns Universitet Sundhedsvidenskabelige Fakultet, Clinical medicine Johansson, Pär I; Copenhagen University Hospital, Rigshospitalet, Department of Clinical Immunology Sørensen, Erik; Rigshospitalet, immunology Brunak, S; University of Copenhagen Ullum, Henrik; Rigshospitalet, immunology Burgdorf, Kristoffer ; Rigshospitalet, immunology

Primary Subject Genetics and genomics Heading:

Secondary Subject Heading: Public health

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml Page 1 of 20 BMJ Open BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 1 2 3 Blood bank & transfusion medicine < PATHOLOGY, GENETICS, 4 Keywords: PREVENTIVE MEDICINE, PUBLIC HEALTH 5 6 7 8 9 10 11 12 13 14 15 16 For peer review only 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

32 http://bmjopen.bmj.com/ 33 34 35 36 37 38 39

40 on September 29, 2021 by guest. Protected copyright. 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml BMJ Open Page 2 of 20

1 2 3 4 Cohort profile: DBDS Genomic Cohort, a prospective and comprehensive resource for BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 integrative and temporal analysis of genetic, environmental, and lifestyle factors affecting 8 9 health of blood donors. 10 11 Thomas Folkmann Hansen*1,2 Karina Banasik*2, Christian Erikstrup3, Ole Birger Pedersen4, David 12 13 2 2 5 6 14 Westergaard , Piotr Jaroslaw Chmura , Kaspar René Nielsen , Lise Wegner Thørner , Henrik 15 16 Hjalgrim7, Helene M. Paarup8, Margit Hørup Larsen6, Mikkel Steen Petersen3, Poul Jørgen Jennum9, 17 18 Steffen Andersen10, MetteFor Nyegaard peer11, Gregor review B. Jemec4,12 only, Jes Olesen1, Thomas Werge13, Pär 19 20 6 6 2 6 6 21 Ingemar Johansson , Erik Sørensen , Søren Brunak , Henrik Ullum , Kristoffer Sølvsten Burgdorf 22 23 24 25 Affiliations: 26 27 28 1Danish Headache Center, Department of Neurological department, Copenhagen University Hospital, 29 30 31 Glostrup, Denmark. 32 33 2Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, 34 35 University of Copenhagen, Copenhagen, Denmark.

36 http://bmjopen.bmj.com/ 37 3 38 Department of Clinical Immunology, Aarhus University Hospital, Aarhus, Denmark 39 40 4Department of Clinical Immunology, Naestved Hospital, Naestved, Denmark. 41 42 5Department of Clinical Immunology, Aalborg University Hospital, Aalborg, Denmark 43

44 6Department of Clinical Immunology, Copenhagen University Hospital, Rigshospitalet, Copenhagen, on September 29, 2021 by guest. Protected copyright. 45 46 47 Denmark 48 49 7Department of Epidemiology Research, Statens Serum Institut, Copenhagen, Denmark, Department 50 51 of Hematology & Copenhagen University Hospital, Copenhagen, Denmark 52 53 8 54 Department of Clinical Immunology, Odense University Hospital, Odense, Denmark 55 56 9Danish Center for Sleep Medicine, Department of Clinical Neurophysiology, Copenhagen 57 58 University Hospital, Copenhagen, Denmark, 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml Page 3 of 20 BMJ Open

1 2 3 4 10Department of Economics, Copenhagen Business School BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 11 7 Department of Biomedicine, Aarhus University, Aarhus, Denmark 8 9 12Department of Dermatology, Zealand University Hospital, Roskilde; Faculty of Health and Medical 10 11 Sciences, University of Copenhagen, Denmark 12 13 13 14 Institute of Biological Psychiatry, Mental Health Centre Sct. Hans, Copenhagen University 15 16 Hospital, Roskilde, Denmark & Department of Clinical Medicine, University of Copenhagen, 17 18 Copenhagen, DenmarkFor peer review only 19 20 *Both authors contributed equally 21 22 23 24 25 26 27 Corresponding author: 28 29 30 Thomas Folkmann Hansen, [email protected] 31 32 33 34 35

36 http://bmjopen.bmj.com/ 37 38 39 40 41 42 43

44 on September 29, 2021 by guest. Protected copyright. 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml BMJ Open Page 4 of 20

1 2 3 4 Abstract BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 Purpose: To establish a cohort that enables identification of genomic factors that influence human 8 9 health and empower increased blood donor health and safe blood transfusions. 10 11 Human health is complex and involves several factors, a major one being the genomic aspect. The 12 13 14 genomic era has resulted in many consortia encompassing large samples sizes, which has proven 15 16 successful for identifying genetic factors associated with specific traits. However, it remains a big 17 18 challenge to establish largeFor cohorts peer that facilitate review studies of theonly interaction between genetic factors, 19 20 environmental and life-style factors as these change over the course of life. A major obstacle to such 21 22 23 endeavors is that it is difficult to revisit participants to retrieve additional information and obtain 24 25 longitudinal, consecutive measurements. 26 27 28 29 30 Participants: Blood donors (n=110,000) have given consent to participate in the Danish Blood Donor 31 32 Study. The study utilizes the infrastructure of the Danish blood banks. 33 34 35

36 http://bmjopen.bmj.com/ 37 Findings to date: The cohort comprises extensive phenotype data and whole genome genotyping data. 38 39 Further, it is possible to retrieve additional phenotype data from national registries as well as from 40 41 the donors at future visits, including consecutive measurements. 42 43

44 on September 29, 2021 by guest. Protected copyright. 45 46 Future plans: To provide new knowledge on factors influencing our health. Thus, provide a platform 47 48 for studying the influence of genomic factors on human health, in particular the interaction between 49 50 environmental and genetic factors. 51 52 53 54 55 56 57 58 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml Page 5 of 20 BMJ Open

1 2 3 4 INTRODUCTION BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 A person's health is determined by complex interactions between genetic, environmental, and lifestyle 8 9 factors. Analyzing these factors collectively and prospectively is preferable. However, this is usually 10 11 only possible using birth-cohorts and large population-based cohorts, and due to the extensive effort 12 13 involved in establishing such cohorts, they are rare. The Danish Blood Donor Study (DBDS; dbds.dk) 14 15 16 is a large prospective cohort of blood donors aiming at identifying predictors of healthy donors. As 17 18 part of this cohort, weFor have nowpeer established review DBDS Genomic only Cohort assessing common single 19 20 nucleotide polymorphisms (SNPs) in 110,000 donors. Thus, the DBDS Genomic Cohort provides a 21 22 23 comprehensive catalog for large-scale genetic analyses in relation to numerous environmental and 24 25 lifestyle factors affecting donor’s health. A description of summary statistics on phenotypes and data 26 27 coverage are provided in Table 1. A detailed sociodemographic description of Danish blood donors, 28 29 30 including sex and age distribution, ethnicity, education, employment and level of urbanization are 31 32 found in Burgdorf et.al 20171. 33 34 35 The evaluation of blood donor’s health is important for several reasons. It is crucial for both the donor

36 http://bmjopen.bmj.com/ 37 and the blood recipient that a healthy donor population is maintained with a high donation rate and a 38 39 40 low dropout rate, thereby ensuring a steady blood supply. Evidence-based guidelines for donor 41 42 recruitment, care, and retention are needed to ensure that donor recruitment can focus on individuals 43

44 who are likely to remain healthy and donate frequently in the long term. One obvious relevant on September 29, 2021 by guest. Protected copyright. 45 46 influential parameter relates to iron metabolism: We know that hundreds of genes impact the 47 48 2 49 generation and regulation of blood cells and also influence phenotype variations of iron absorption 50 51 and metabolism3-5. Genome-wide SNP information is expected to provide knowledge enabling us to 52 53 evaluate to whom donating blood will be unproblematic, thus facilitating retention of a stable blood 54 55 56 donor population. Another parameter is altruism. Altruism as part of a prosocial behavior, the selfless 57 58 concern for the welfare of others6, is generally considered a typical blood donor characteristic7. 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml BMJ Open Page 6 of 20

1 2 3 4 However, altruism in the context of voluntary blood donation has also been shown to be a very BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 7 complex phenotype . In a previous study, we found a substantially larger genetic influence on blood 8 9 donor behavior compared to most previous twin studies on altruism, which further highlights the 10 11 heterogeneity of the blood donor personality8. The considerable amount of kinship (twins and 12 13 14 siblings) in the cohort in combination with socioeconomic variables from the Danish registries will 15 16 enable us to further differentiate between genetic impact and social basis of altruism in blood 17 18 donation. The genome-wideFor SNPpeer information review will provide only knowledge that can aid in the 19 20 identification of long-term and steady donors. Further, we will test for association between genotype 21 22 23 SNP information and prodromal symptoms of somatic and psychiatric disease or illnesses. 24 25 26 The DBDS Genomic Cohort offers the possibility to assess the impact of heterogeneous exposures in 27 28 a broad range of phenotypes, such as mental state, risk-taking behavior, and characterization of blood 29 30 31 components, and immune defense. The setup of the study allows for researchers to assess the genetic 32 33 association in a) cross sectional studies to investigate e.g. the variation of phenotypic characteristic, 34 35 clinical, and biochemical measurements, b) retrospective studies of e.g. rehabilitation capacity, and

36 http://bmjopen.bmj.com/ 37 c) prospective studies e.g. analyzing the variation of phenotypes and clinical measurements over time 38 39 40 and even identify prodromal symptoms. 41 42 43 COHORT DESCRIPTION

44 on September 29, 2021 by guest. Protected copyright. 45 46 Population: The nation-wide blood donor population in Denmark consists of more than 230,000 47 48 donors giving more than 300,000 blood donations annually (www.bloddonor.dk). Blood donation in 49 50 51 Denmark is voluntary and unpaid. This means that the donation is based on the desire to help others, 52 53 who need it and not the desire of an economic benefit. Blood donors must be physically well, aged 54 55 17 and 67 years, and weigh more than 50 kilos. Individuals in chronic medical treatment or frequent 56 57 travelers to countries with high prevalence of blood disease are not allowed to participate. The deferral 58 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml Page 7 of 20 BMJ Open

1 2 3 4 rules can be seen at www.bloddonor.dk. Blood donors from foreign countries must have lived in BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 Denmark for a minimum of one year, have a Danish social security number and have learned the 8 9 Danish language to prevent misunderstandings between the donor and the blood bank professionals. 10 11 12 The nationwide Danish blood bank is an integrated part of the Danish healthcare system financed by 13 14 local and state taxes. The Danish healthcare system is administrated by democratically elected 15 16 17 assemblies from national state institutions, regions and municipalities. The Danish blood banks are 18 For peer review only 19 non-profit organizations owned and operated by each of the five regions in Denmark. The blood banks 20 21 have a national board to structure collaboration across regions on recruiting donors, processing and 22 23 24 distributing the blood for the Danish population. The DBDS is building upon the structured Danish 25 26 blood bank system in the regions responsible for administrating donation sites at 27 hospitals in 27 28 addition to five mobile donation units using 180 selected sites nationally (e.g., large companies, sports 29 30 31 centers and universities) 32 33 34 At the blood donation centers and attached laboratories, the entire necessary infrastructure needed for 35

36 the collection of biological samples and structured data is in place. Both blood plasma and whole http://bmjopen.bmj.com/ 37 38 blood for DNA extraction are available from all donors. The blood bank infrastructure already has 39 40 41 laboratory facilities with educated staff (nurses, technicians, IT specialists and physicians). In 42 43 addition, the blood bank professionals facilitate the testing of the blood for a variety of biomarkers

44 on September 29, 2021 by guest. Protected copyright. 45 and holds expertise in large scale storage of biological material. 46 47 48 Contact to participants at the blood banks is fundamental to our study. Blood donors are asked to 49 50 51 participate and sign an informed consent when they visit the blood bank to donate blood. This consent 52 53 allows us to use the blood samples from their past and future donations to study the impact of genetic 54 55 and immunological factors on current and future health and disease. The inclusion and exclusion 56 57 58 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml BMJ Open Page 8 of 20

1 2 3 4 criteria for blood donation and participation in DBDS are the same with 95% of the blood donors BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 9 7 who are invited agrees to participate in DBDS . 8 9 10 Questionnaires: From March 2010 until July 2015, all participating donors had to complete a four- 11 12 page paper-based questionnaire with questions of self-experienced physical and mental health 13 14 including the 12-item short form (SF-12®) standardized health survey, smoking habits, alcohol 15 16 17 intake, exercise, food intake, supplemental iron intake, height, weight, and waist circumference. 18 For peer review only 19 85,000 individuals filled out the paper questionnaire. As a follow-up to the initial paper-based 20 21 questionnaire, we have developed and implemented a digital and flexible tablet-based questionnaire 22 23 10 24 platform, using the open source survey software tool LimeSurvey . This enables a rapid, easy, and 25 26 cost-effective procedure to collect self-reported data on health traits from the participating donors at 27 28 the donor sites at multiple time-points. The first digital questionnaire was implemented and used from 29 30 31 July 2015 until May 2018. The questionnaire was focused the following research questions: allergy, 32 33 ADHD, migraine, hidradenitis, depression, and Restless Legs Syndrome. It also contains questions 34 35 from the paper-based questionnaire; SF-12, smoking habits, alcohol intake, height and weight. In

36 http://bmjopen.bmj.com/ 37 total, 48,000 DBDS participants completed the first digital questionnaire. The second digital 38 39 40 questionnaire started June 2018. It includes questions on: Sleep patterns, anxiety, migraine, stress, 41 42 skin diseases, endometriosis, pain, learning difficulties, SF-12, smoking habits, alcohol intake, height 43

44 and weight. Using the questionnaire data, several studies have already assessed the factors describing on September 29, 2021 by guest. Protected copyright. 45 46 11 12 47 the general health of blood donors, e.g. hidradentis , risk of infection , migraine, restless legs, and 48 49 depression13 14. Also, by revisiting samples it has been possible to assess wether infections correlates 50 51 with obesity15 and how iron deficiency and hemoglobin levels affect donor health in general16 17. 52 53 54 Registry-based data: There is and has always been equal access to the healthcare system in Denmark, 55 56 57 hence, the centralized national civil and health registries are unbiased and comprehensive resources 58 59 of healthcare data. The registries include the National Patient Registry containing all hospital- 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml Page 9 of 20 BMJ Open

1 2 3 4 registered diagnoses since 197518, as well as other specialized registries, e.g. the Danish Medical BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 Birth Register, the Danish Register of Causes of Death, and Statistics Denmark monitoring e.g. socio- 8 9 economic data. We have already used the DBDS cohort in epidemiological studies assessing e.g. the 10 11 mortality 19 of donors and the effect of blood donation on offspring birth weight20. 12 13 14 DBDS Organization: The DBDS itself is described in detail by Pedersen et al.9. Briefly, DBDS is 15 16 17 governed by a steering committee with a scientific advisory board. All projects are managed by the 18 For peer review only 19 DBDS steering committee. Genetic projects involving genetic data in DBDS are run in collaboration 20 21 between DBDS Genomic Consortium that consists of the DBDS steering committee, deCODE 22 23 ® 24 Genetics , and scientific collaborators. 25 26 27 28 Genotyping: DNA is purified from whole blood and subsequent stored at -20ºC. All samples are then 29 30 31 genotyped in two batches at deCODE genetics using the Global Screening Array (GSA) by Illumina 32 33 (batch 1 n=85,000 and batch 2 n=25,000). The array has a very rich up-to-date content of >650.000 34 35 SNPs with custom chip content optimized for comparison with the Illumina Omni Express chip. All

36 http://bmjopen.bmj.com/ 37 genotype data are processed simultaneously for genotype calling, quality control and imputation. 38 39 40 Initially, individuals or SNPs with more than 10% missing data are excluded, as are individuals 41 42 deviating more than 3 standard deviations from the population heterozygosity (correcting for 43

44 individuals carrying large copy number variations (>100Kbp)). on September 29, 2021 by guest. Protected copyright. 45 46 47 Imputation 48 49 The genotyping data is imputed using a reference panel backbone consisting of 1) UK 1KG phase 3 50 51 and HapMap reference to predict non-genotyped SNPs with minor allele frequency (MAF)>1%, and 52 53 54 2) an in-house dataset consisting of n>6,000 Danish whole genome sequences to improve the 55 56 prediction of variations with a MAF down to around 0.01%21. Variants listed in the American College 57 58 of Medical Genetics and Genomics guidelines are currently not predicted, due to restrictions from the 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml BMJ Open Page 10 of 20

1 2 3 4 Danish National Ethics Committee 22. For future collaborative studies the Ethics Committee will BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 approve analysis of these variants on a case by case basis. 8 9 Copy number variations 10 11 Using the genotype of b-allele count and log Ratio, copy number variations (CNVs) are called using 12 13 23 14 pennCNV . CNVs called using <20 SNPs are excluded and the remaining CNVs are visually 15 16 inspected to exclude false positives. 17 18 Statistical design: All dataFor are stored peer and analyzed review on a specialized, only secure section of the 16,000 core 19 20 Danish National Supercomputer for Life Sciences - Computerome (www.computerome.dtu.dk). Data 21 22 23 storage and computational analysis is performed on a protected, private cloud environment. The 24 25 analysis environment is capable of dynamic scaling and has been successfully tested in a composition 26 27 of over 100 servers totaling more than 300 CPUs, over 13TB of RAM and has access to up to 5.7PB 28 29 30 (5,700TB) of disk space. The cluster comes with a preconfigured queueing system, possibility to run 31 32 Virtual Machines and containers (e.g. Docker, Singularity), a set of over 900 preinstalled tools and 33 34 packages and a possibility to add GPU servers optimized for Machine Learning and specialized big 35

36 http://bmjopen.bmj.com/ 37 memory systems (1-8TB of RAM). 38 39 40 For each hypothesis tested, a synopsis is provided including a detailed analysis plan. Information on 41 42 each synopsis will be published either as link to published articles describing the results or as 43

44 summary statistic on the study website: www.dbds.dk. on September 29, 2021 by guest. Protected copyright. 45 46 47 Samples: During each visit to a blood donor facility (up to four times per year for whole blood donors 48 49 50 or up to ten times if plasma donors), every participant donates one EDTA plasma sample. At inclusion 51 52 in DBDS one whole blood sample is also taken. Plasma samples taken prior to the inclusion date are 53 54 stored for quality assessments and will also be accessible for future analyses. All samples are frozen 55 56 57 within 6 hours of donation and stored in the primary collection tubes until processing. 58 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml Page 11 of 20 BMJ Open

1 2 3 4 Routine blood measurements including e.g. blood group, red and white blood cell counts, hemoglobin BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 concentration, and hematocrit are obtained at each donation. Besides routine measures, project- 8 9 related measurements are available e.g. subgroups of patients are assessed for ferritin levels, infection 10 11 status (Cytomegalovirus, Toxoplasma gondii, and Herpes Simplex Virus), HLA-typing, and other 12 13 14 selected markers of infection (circulating cytokines, C-reactive protein, etc.). 15 16 17 18 General Data ProtectionFor and Ethical peer Issues andreview Principles: DBDS only has secured necessary permissions 19 20 and approval from the Danish Data Protection Agency (2007-58-0015) and the Scientific Ethical 21 22 23 Committee system (M-20090237). New projects within the DBDS Genomic consortium will require 24 25 additional approval by the National Committee on Health Research Ethics. DBDS will be responsible 26 27 for the continued contact with, and securing future permissions from, relevant Danish authorities 28 29 30 regarding research on DBDS samples. 31 32 33 Patient and Public Involvement: Patients and public were not involved in the design of this study. 34 35 Data availability statement: The study will adhere to the FAIR (http://datafairport.org/: Findable,

36 http://bmjopen.bmj.com/ 37 Accessible, Interoperable and Reusable) concepts. Within this legal framework DBDS Genomic 38 39 40 Consortium Board can thus decide how and under which conditions the data can be shared. Generally, 41 42 relevant summary data will be publicly available via repositories 3 months after acceptance for 43

44 publication (H2020 open-access policy). on September 29, 2021 by guest. Protected copyright. 45 46 47 48 49 50 51 FINDINGS TO DATE 52 53 54 Initial quality measures that have been assessed: 55 56 Kinship 57 58 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml BMJ Open Page 12 of 20

1 2 3 4 As described, giving blood often runs in families and the heritability has been estimated to be >53%8. BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 It is clear from the estimated kinship based on the first batch of participants (n=85.000) (Figure 1, 8 9 Table 2), that there is a considerable 1st, 2nd and 3rd degree relatives among the participants in the 10 11 DBDS Genomic cohort. 12 13 14 Ethnicity 15 16 Based on ~15K overlapping SNPs from the genotyped data and the 1000 Genomes samples, we 17 18 confirm the expected populationFor peer structure ofreview the DBDS cohort; only most participants are of European 19 20 ancestry (99%) and the following two ethnicity groups are of South Asian (0.4%) and East Asian 21 22 23 (0.2%) ancestry, respectively. The proportion of participants with recent African ancestry is 24 25 extremely low (0.002%) which is expected given the strict donor travel quarantine rules. Ethnicity 26 27 was evaluated using FlashPCA224. 28 29 30 Minor allele frequencies 31 32 The distribution of the MAF, shown that majority of SNPs (84%) is above 1% (Figure 2) as expected, 33 34 which provides solid basis for genotype imputation. 35

36 http://bmjopen.bmj.com/ 37 38 39 STRENGTH AND LIMITATIONS 40 41 Consecutive measurements: A unique feature of this large blood donor cohort is the ability to do 42 43 consecutive assessments. In standard settings, participants are typically recruited at a baseline time-

44 on September 29, 2021 by guest. Protected copyright. 45 46 point and are invited for follow-up studies once or twice in the following years. The blood bank 47 48 represents an advantage because most donors have a long-term committed relationship for blood 49 50 donation and are seen one to four times annually25. It is therefore possible to collect several yearly 51 52 53 and consecutive biological samples and questionnaire information over decades for a large number 54 55 of participants. Again, subgroups and samples from specific time-points can be used in a retrospective 56 57 manner. 58 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml Page 13 of 20 BMJ Open

1 2 3 4 National-based registries: Denmark has several comprehensive national registries, which include both BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 health information and socio-demographical measures on an individual level. The informed consent 8 9 allows for combining information obtained from the DBDS participants and the national registries; 10 11 the Danish National Patient Registry (since 1977), the Danish Cancer (since 1943) and Diabetes 12 13 14 Registries (since 1992-2012), the Danish Registry of Medicinal Product Statistics (since 1994), the 15 16 Civil Registration Registry (since 1968 vital status, number of children, birthplace, address, relocation 17 18 and more), the IntegratedFor Database peer for Labor review Market Research only (since 1982, e.g. educational level, 19 20 occupation status, income, social status, and other related parameters). This facilitates retrospective, 21 22 23 cross-sectional, or prospective studies using registry-based measurements in combination with 24 25 questionnaire-derived data. 26 27 28 Donor selection: Although, DBDS participants resemble the Danish population a few limitations in 29 30 1 31 the study design may affect the generalizability of the results . The blood donor exclusion criteria 32 33 dismiss individuals with: infections or diseases that are transmittable through blood, weight below 50 34 35 kg, hemoglobin (Hb) levels below 12.9 g/dL in males and 12.0 g/dL in females, and curious

36 http://bmjopen.bmj.com/ 37 behaviors: individuals with high travel rates to countries with high risk of hepatitis and HIV, men 38 39 40 who have sex with men, individuals who have previously worked as prostitutes, those who have used 41 42 intravenous substances, and pregnant women. Comparing socio-demographical parameters of blood 43

44 donors with that of the total Danish population, we know that very low- and high-income individuals on September 29, 2021 by guest. Protected copyright. 45 46 1 47 are underrepresented among blood donors . In this respect, we acknowledge that DBDS lacks 48 49 coverage of certain parts of the general population in contrast to traditional population-based studies. 50 51 Similarly, the population based UK Biobank study have also reported a “healthy volunteer” bias26. 52 53 54 DISCUSSION 55 56 57 58 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml BMJ Open Page 14 of 20

1 2 3 4 The extension of the DBDS with a genomic cohort will profoundly impact the usability and empower BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 studies on genetic, environmental, and lifestyle factors that influence blood donor health. 8 9 Furthermore, the study provides a unique platform that facilitates analysis of common phenotypes 10 11 not otherwise found in the national health registries, disease resilience factors, and interactions 12 13 14 between genes and environment. At finally, such a large healthy cohort holds a huge potential for 15 16 providing crucial information for future precision medicine initiatives and similar efforts have been 17 18 started, e.g. “All of US”For (by NIH,peer US). Inreview Denmark we haveonly the advantage of a collection of 19 20 extensive, national health registries that facilitate epidemiological studies on specific 21 22 23 diseases/outcomes in such a large cohort. For phenotypes and symptoms not monitored systematically 24 25 in these health registries; e.g. lifestyle factors such as smoking habits, sleep patterns, and self- 26 27 perceived health; large epidemiological studies are needed but typical difficult to conduct. The DBDS 28 29 30 Genomic Cohort can facilitate such studies. As described above, the DBDS Genomic Cohort exploits 31 32 an existing blood donor platform with an extremely high participation rate (>95%), which facilitates 33 34 a straightforward evaluation of donor health in large epidemiological studies. Furthermore, the 35

36 http://bmjopen.bmj.com/ 37 electronic questionnaire platform allows for easy and fast implementation of new, targeted 38 39 investigations in subgroups of the donor population. Together with outcomes from the Danish health 40 41 registries and the millions of retrospective plasma samples stored in easily accessible freezers, DBDS 42 43 represents a solid phenotyping platform that can be used for both cross-sectional epidemiological

44 on September 29, 2021 by guest. Protected copyright. 45 46 studies and for retrospective biomarker studies. We believe that these strengths make the DBDS 47 48 Genomic Cohort a strong competitive player in the field of precision medicine. 49 50 51 The DBDS Genomic Cohort allows us to study gene-environment interaction that are otherwise 52 53 54 difficult to study: testing disease development hypotheses; e.g. a) cognitive performance in 55 56 interaction with genetic factors and the risk of dementia, and b) determining the contribution of 57 58 genetic and environmental factors to a phenotype such as sleep pattern. One way to study this is by 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml Page 15 of 20 BMJ Open

1 2 3 4 using multivariate regression models. As an example, Fan et al. have recently portrayed an advanced BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 example of gene-environment interaction analysis incorporating temporal and spatial 8 9 considerations27, an analysis based on data from the Danish civil registries. The DBDS Genomic 10 11 Cohort is particularly valuable for studying disease resistance in individuals exposed to one or more 12 13 14 known disease risk factors and yet do not proceed to develop the disease. One such example could be 15 16 participants carrying a high load of a highly inheritable trait like psychiatric illness in the family or 17 18 known genetic risk factors,For who peer do not have review psychiatric illnessonly themselves. Lastly, DBDS also 19 20 provides sequential storage of plasma samples, which allows for sequential blood measurements. 21 22 23 Such measurements could be used to investigate health markers like suPAR (soluble urokinase 24 25 plasminogen activating receptor) and the variation associated with the donor’s general health28. 26 27 28 In short, the DBDS genomic cohort facilitates the investigation of the impact of genomic factors on 29 30 31 health traits and states. 32 33 34 Integrative analysis of different ‘omics, i.e. multi-omics analysis will be possible in the large DBDS 35

36 genomic cohort, which adds tremendously to its value as a resource for studying the health of blood http://bmjopen.bmj.com/ 37 38 donors and correlation between blood related traits and states. We expect that the DBDS genomic 39 40 41 cohort will contribute to discovery and validation of prodromal symptoms and biomarkers of disease, 42 43 thus providing a better understanding of the disease pathologies and suggesting new drug targets.

44 on September 29, 2021 by guest. Protected copyright. 45 46 COLLABORATION 47 48 49 We encourage scientific collaborations using the data generated in the DBDS Genetic Consortium. 50 51 Published summarized data is available om request. Otherwise request of data necessitate first 52 53 54 approval by the DBDS steering committee and if the request is considered outside the aim of DBDS, 55 56 application to the national scientific ethical committee is obligatory. Additionally, material transfer 57 58 and data protection agreement need to be acquired. Please visit www.dbds.dk. 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml BMJ Open Page 16 of 20

1 2 3 4 Acknowledgements BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 We wish to express a special thanks to the staffs of the Danish blood banks whose continued inclusion 8 9 of blood-donors into the DBDS makes this research possible. 10 11 12 Contributorship statement 13 14 TH, KB, and KSB conceived and planned the experiments. KB carried out the analyses. OBP, HH, 15 16 HMP, KRN, CE,HU, PJJ, TW, JO, GBJ, MN, SA, PIJ,ES and LWT contributed to cohort and research 17 design. DW, PJC, KB and SB led to data infrastructure design. CE, KSB, MHL, ES and MSP 18 For peer review only 19 contributed to data capture. TH, KB and KSB contributed to the interpretation of the results. TH, KB, 20 21 and KSB took the lead in writing the manuscript. All authors provided critical feedback and helped 22 shape the analysis and manuscript and approaved the final version. 23 24 25 26 Competing interests 27 28 The authors declare no competing interest 29 30 31 Funding 32 33 We also wish to thank the Lundbeck Foundation, Denmark under Grant (R209-2015-3500 to 34 35 Kristoffer Burgdorf), The Danish Administrative Regions, The Danish Blood Donor Research

36 Foundation (Bloddonorernes Forskningsfond), Rigshospitalets Research Foundation, The Novo http://bmjopen.bmj.com/ 37 38 Nordisk Foundation (NNF14CC0001 and NNF17OC0027594), and CANDY foundation 39 40 (CEHEAD). 41 42 43 References

44 on September 29, 2021 by guest. Protected copyright. 45 46 47 1. Burgdorf KS, Simonsen J, Sundby A, et al. Socio-demographic characteristics of Danish blood donors. PLoS 48 One 2017;12(2):e0169112. doi: 10.1371/journal.pone.0169112 49 2. van der Harst P, Zhang W, Mateo Leach I, et al. Seventy-five genetic loci influencing the human red blood 50 cell. Nature 2012;492(7429):369-75. doi: 10.1038/nature11677 51 3. McLaren CE, Garner CP, Constantine CC, et al. Genome-wide association study identifies genetic loci 52 associated with iron deficiency. PLoS One 2011;6(3):e17390. doi: 10.1371/journal.pone.0017390 53 4. Tanaka T, Roy CN, Yao W, et al. A genome-wide association analysis of serum iron concentrations. Blood 54 55 2010;115(1):94-6. doi: 10.1182/blood-2009-07-232496 56 5. Sorensen E, Grau K, Berg T, et al. A genetic risk factor for low serum ferritin levels in Danish blood donors. 57 Transfusion 2012;52(12):2585-9. doi: 10.1111/j.1537-2995.2012.03629.x 58 6. Warneken F, Tomasello M. The roots of human altruism. Br J Psychol 2009;100(Pt 3):455-71. doi: 59 10.1348/000712608X379061 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml Page 17 of 20 BMJ Open

1 2 3

4 7. Steele WR, Schreiber GB, Guiltinan A, et al. role of altruistic behavior, empathetic concern, and social BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 responsibility motivation in blood donation behavior. Transfusion 2008;48(1):43-54. doi: 6 7 10.1111/j.1537-2995.2007.01481.x 8 8. Pedersen OB, Axel S, Rostgaard K, et al. The heritability of blood donation: a population-based nationwide 9 twin study. Transfusion 2015;55(9):2169-74; quiz 68. doi: 10.1111/trf.13086 10 9. Pedersen OB, Erikstrup C, Kotze SR, et al. The Danish Blood Donor Study: a large, prospective cohort and 11 biobank for medical research. Vox Sang 2012;102(3):271. doi: 10.1111/j.1423-0410.2011.01553.x 12 10. Burgdorf KS, Felsted N, Mikkelsen S, et al. Digital questionnaire platform in the Danish Blood Donor Study. 13 Comput Methods Programs Biomed 2016;135:101-4. doi: 10.1016/j.cmpb.2016.07.023 14 11. Theut Riis P, Pedersen O, Sigsgaard V, et al. Prevalence of patients with self-reported hidradenitis 15 16 suppurativa in a cohort of Danish blood donors: a cross-sectional study. British Journal of 17 Dermatology 2018 18 12. Kotzé SR, Pedersen OB,For Petersen peer MS, et al. Deferralreview for low hemoglobin only is not associated with increased 19 risk of infection in Danish blood donors. Transfusion 2017;57(3):571-77. 20 13. Didriksen M, Allen RP, Burchell BJ, et al. Restless legs syndrome is associated with major comorbidities in 21 a population of Danish blood donors. Sleep Med 2018;45:124-31. doi: 10.1016/j.sleep.2018.02.007 22 [published Online First: 2018/04/24] 23 14. Didriksen M, Hansen TF, Thørner LW, et al. Restless legs syndrome is associated with increased risk of 24 migraine. Cephalalgia Reports 2018;1:2515816318780743. 25 26 15. Kaspersen KA, Pedersen OB, Petersen MS, et al. Obesity and risk of infection: results from the Danish 27 Blood Donor Study. Epidemiology 2015;26(4):580-89. 28 16. Kotze SR, Pedersen OB, Petersen MS, et al. Predictors of hemoglobin in Danish blood donors: results from 29 the Danish Blood Donor Study. Transfusion 2015;55(6):1303-11. doi: 10.1111/trf.13011 [published 30 Online First: 2015/02/04] 31 17. Rigas AS, Pedersen OB, Magnussen K, et al. Iron deficiency among blood donors: experience from the 32 Danish Blood Donor Study and from the Copenhagen ferritin monitoring scheme. Transfus Med 2017 33 doi: 10.1111/tme.12477 [published Online First: 2017/10/13] 34 35 18. Schmidt M, Schmidt SAJ, Sandegaard JL, et al. The Danish National Patient Registry: a review of content, data quality, and research potential. Clinical epidemiology 2015;7:449. 36 http://bmjopen.bmj.com/ 37 19. Ullum H, Rostgaard K, Kamper-Jørgensen M, et al. Blood donation and blood donor mortality after 38 adjustment for a healthy donor effect. Transfusion 2015;55(10):2479-85. 39 20. Rigas AS, Pedersen OB, Sørensen E, et al. Frequent blood donation and offspring birth weight—a 40 next-generation association? Transfusion 2018 41 21. Eggertsson HP, Jonsson H, Kristmundsdottir S, et al. Graphtyper enables population-scale genotyping 42 using pangenome graphs. Nature genetics 2017;49(11):1654. 43 22. Kalia SS, Adelman K, Bale SJ, et al. Recommendations for reporting of secondary findings in clinical exome

44 on September 29, 2021 by guest. Protected copyright. 45 and genome sequencing, 2016 update (ACMG SF v2. 0): a policy statement of the American College 46 of Medical Genetics and Genomics. Genetics in Medicine 2017;19(2):249. 47 23. Wang K, Li M, Hadley D, et al. PennCNV: an integrated hidden Markov model designed for high-resolution 48 copy number variation detection in whole-genome SNP genotyping data. Genome research 49 2007;17(11):1665-74. 50 24. Abraham G, Qiu Y, Inouye M. FlashPCA2: principal component analysis of Biobank-scale genotype 51 datasets. Bioinformatics 2017;33(17):2776-78. doi: 10.1093/bioinformatics/btx299 [published 52 Online First: 2017/05/06] 53 54 25. Burgdorf K, Pedersen O, Sørensen E, et al. Extending the gift of donation: blood donor public health 55 studies. ISBT Science Series 2015;10(S1):225-30. 56 26. Fry A, Littlejohns TJ, Sudlow C, et al. Comparison of Sociodemographic and Health-Related Characteristics 57 of UK Biobank Participants with the General Population. Am J Epidemiol 2017 doi: 58 10.1093/aje/kwx246 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml BMJ Open Page 18 of 20

1 2 3

4 27. Fan CC, McGrath JJ, Appadurai V, et al. Spatial gene-by-environment mapping for schizophrenia reveals BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 locale of upbringing effects beyond urban-rural differences. bioRxiv 2018:315820. 6 7 28. Nielsen J, Roge R, Pristed SG, et al. Soluble urokinase-type plasminogen activator receptor levels in 8 patients with schizophrenia. Schizophr Bull 2015;41(3):764-71. doi: 10.1093/schbul/sbu118 9 10 11 12 13 14 15 16 17 18 For peer review only 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

36 http://bmjopen.bmj.com/ 37 38 39 40 41 42 43

44 on September 29, 2021 by guest. Protected copyright. 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml Page 19 of 20 BMJ Open

1 2 3

4 BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 5 6 7 Figure 1. Relationships for DBDS donors genotyped as part of the first batch(n=85,000). Each 8 point represents a pair of related individuals and the colors indicate the degree of relatedness 9 ("InfType” in KING IBD segment inference): monozygotic twins and technical 10 st 11 duplicates(MZ/DUP) in pink (in the upper left corner), 1 degree relatives as parent-offspring pairs 12 (PO, dark green) and full siblings (FS, light green), 2nd and 3rd degree relatives in dark and light 13 purple, respectively. The y-axis shows the estimated kinship coefficient, defined as the probability 14 that two alleles sampled at random (one from each individual) are identical by descent. The x-axis 15 16 shows the proportion of zero identity-by-state (IBS0), defined as the proportion of SNPs at which 17 two samples share no alleles. KING’s criteria was used to estimate the degree of relatedness (-- 18 related command in KING).For We peer used a set ofreview independent high-quality only markers (excluding 19 20 palindromic and non-autosomal markers, markers with MAF<1%, low call-rate (<99%) and 21 markers in regions with high Linkage Disequilibrium) for the relatedness calculation. 22 23 Figure 2. Minor allele frequency (MAF) distribution of the genotyped SNPs prior to quality control. 24 25 Table 1. Prevalence and missingness of enviroenmental variables 26 Participants Missingness 27 28 Men 48.8 % 0% 29 Women 51.2 % 0% 30 Age* 41 (29-50) * 0% 31 Height (cm) * 175.6 (169-182) 0.53% 32 Weight (kg) * 78.1 (67-87) 0.82% 33 BMI 25.3 (22-27) 1.2% 34 Smoking 11.9% 1.3% 35 Alcohol consumption (daily) 3.0% 4.7%

36 http://bmjopen.bmj.com/ 37 Questionnaire 1 87.000 38 Questionnaire 2 50.000 39 * Mean (Quartiles). 40 41 42 43 Table 2. Related pairs (3rd degree or closer) for batch 1 DBDS donors genotyped (N=85,000).. 44 on September 29, 2021 by guest. Protected copyright. 45 46 47 Relationship Monozygotic Parent- Full siblings 2nd degree 3rd degree 48 twins offspring 49 Pairs 51 4,246 3,309 4,375 11,433 50 51 52 53 54 55 56 57 58 59 60

For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml BMJ Open Page 20 of 20 BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 For peer review only 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

32 http://bmjopen.bmj.com/ 33 Relationships for DBDS donors genotyped as part of the first batch(n=85,000). Each point represents a pair 34 of related individuals and the colors indicate the degree of relatedness ("InfType” in KING IBD segment 35 inference): monozygotic twins and technical duplicates(MZ/DUP) in pink (in the upper left corner), 1st 36 degree relatives as parent-offspring pairs (PO, dark green) and full siblings (FS, light green), 2nd and 3rd 37 degree relatives in dark and light purple, respectively. The y-axis shows the estimated kinship coefficient, 38 defined as the probability that two alleles sampled at random (one from each individual) are identical by 39 descent. The x-axis shows the proportion of zero identity-by-state (IBS0), defined as the proportion of SNPs at which two samples share no alleles. KING’s criteria was used to estimate the degree of relatedness (-- 40 related command in KING). We used a set of independent high-quality markers (excluding palindromic and on September 29, 2021 by guest. Protected copyright. 41 non-autosomal markers, markers with MAF<1%, low call-rate (<99%) and markers in regions with high 42 Linkage Disequilibrium) for the relatedness calculation 43 44 152x127mm (300 x 300 DPI) 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml Page 21 of 20 BMJ Open BMJ Open: first published as 10.1136/bmjopen-2018-028401 on 9 June 2019. Downloaded from 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 For peer review only 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

32 http://bmjopen.bmj.com/ 33 34 35 36 37 38 39 Caption : Minor allele frequency (MAF) distribution of the genotyped SNPs prior to quality control.

40 on September 29, 2021 by guest. Protected copyright. 127x127mm (300 x 300 DPI) 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 For peer review only - http://bmjopen.bmj.com/site/about/guidelines.xhtml