ARTICLE

DOI: 10.1038/s41467-017-01261-5 OPEN

Functional mapping and annotation of genetic associations with FUMA

Kyoko Watanabe1, Erdogan Taskesen1,2, Arjen van Bochoven3 & Danielle Posthuma1,4

A main challenge in genome-wide association studies (GWAS) is to pinpoint possible causal variants. Results from GWAS typically do not directly translate into causal variants because the majority of hits are in non-coding or intergenic regions, and the presence of linkage disequilibrium leads to effects being statistically spread out across multiple variants. Post-GWAS annotation facilitates the selection of most likely causal variant(s). Multiple resources are available for post-GWAS annotation, yet these can be time consuming and do not provide integrated visual aids for data interpretation. We, therefore, develop FUMA: an integrative web-based platform using information from multiple biological resources to facilitate functional annotation of GWAS results, prioritization and interactive visualization. FUMA accommodates positional, expression quantitative trait loci (eQTL) and chromatin interaction mappings, and provides gene-based, pathway and tissue enrichment results. FUMA results directly aid in generating hypotheses that are testable in functional experiments aimed at proving causal relations.

1 Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, VU University Amsterdam, Amsterdam 1081 HV, The Netherlands. 2 VU University Medical Center (VUMC), Alzheimercentrum, Amsterdam 1081 HV, The Netherlands. 3 Faculty of Science, VU University Amsterdam, Amsterdam 1081 HV, The Netherlands. 4 Department of Clinical Genetics, VU University Medical Center, Amsterdam Neuroscience, Amsterdam 1081 HV, The Netherlands. Correspondence and requests for materials should be addressed to D.P. (email: [email protected])

NATURE COMMUNICATIONS | 8: 1826 | DOI: 10.1038/s41467-017-01261-5 | www.nature.com/naturecommunications 1 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/s41467-017-01261-5

In the past decade, more than 2500 genome-wide association studies (GWAS) have identified thousands of genetic loci for hundreds of traits1. The past 3 years have seen an explosive increase in GWAS sample sizes2–4, and these are expected to increase even further to 0.5–1 million in the next year and beyond5. These well-powered GWAS will not only lead to more reliable results but also to an increase in the number of detected disease-associated genetic loci. To benefit from these results, it is crucial to translate genetic loci into actionable variants that can guide functional genomics experimentation and drug target testing6. However, since the majority of GWAS hits are located in non-coding or intergenic regions7, direct inference from significantly associated single-nucleotide polymorphisms (SNPs) rarely yields functional variants. More commonly, GWAS hits span a genomic region (“GWAS risk loci”) that is characterized by multiple correlated SNPs, and may cover multiple closely located genes. Some of these genes may be relevant to the disease, while others are not, yet due to the correlated nature of closely located genetic variants, distinguishing relevant from non-relevant genes is often not possible based on association P-values alone. Pinpointing the most likely relevant, causal genes and variants requires integrating available information about regional linkage disequilibrium (LD) patterns and functional consequences of correlated SNPs, such as deleteriousness of variants, but also their effects on gene expression as well as their role in chromatin interaction sites. Ideally, functional inferences obtained from different repositories are integrated, and annotated SNP effects are interpreted in the broader context of genes and molecular pathways. For example, consider a genomic risk locus with one lead SNP associated with an increased risk for a disease, and several dozen other SNPs in LD with the lead SNP that also show a low association P-value, spanning multiple genes. If none of these tested SNPs and none of the other (not tested but known) SNPs in LD with the lead SNP are known to have a functional consequence (i.e., altering expression of a gene, affecting a binding site or violating the protein structure), no causal gene can be indicated. However, if one or several of the SNPs are known to affect the function of one of the genes in the area, but not the other genes, then that single gene has a higher probability of being functionally related to the disease. Pinpointing which and how genes are affected by SNPs associated with a trait is crucial in increasing our insight into the biological mechanisms underlying that trait. Interpreting SNP-trait associations requires adding functional information from several resources and repositories such as the Genotype-Tissue Expression (GTEx)8, Encyclopedia of DNA Elements (ENCODE)9, Roadmap Epigenomics Project10, or chromatin interaction information11.

In practice, the extraction and interpretation of the relevant biological information from available repositories is not always straightforward, and can be time consuming as well as error prone. We have, therefore, developed FUMA, which functionally annotates GWAS findings and prioritizes the most likely causal SNPs and genes using information from 18 biological data repositories and tools. Gene prioritization is based on a combination of positional mapping, expression quantitative trait loci (eQTL) mapping and chromatin interaction mapping. Results are visualized to facilitate quick insight into the implicated molecular functions. FUMA is available as an online tool at http://fuma.ctglab.nl, where users can customize settings to, for example, only use exonic SNPs for annotation, or only use SNPs that are eQTLs in specific tissues for the annotation based on expression data. As input, FUMA requires GWAS summary statistics, and outputs include multiple tables and figures containing extensive information on, e.g., functionality of SNPs in genomic risk loci, including protein-altering consequences, gene-expression influences, open-chromatin states as well as three-dimensional (3D) chromatin interactions. The online tool includes interactive figures that can be used to explore associations in more depth and aid, e.g., in identifying multiple lines of evidence pointing to the same prioritized gene, or in connecting several hits via biological pathways.

Results
Overview of FUMA web application. FUMA incorporates 18 biological data repositories and tools to process GWAS summary statistics and provide a variety of annotations (Supplementary Table 1). To accomplish this task, FUMA consists of two separate processes described in detail below.

The core function of FUMA is the SNP2GENE process (Fig. 1) in which SNPs are annotated with their biological functionality and mapped to genes based on positional, eQTL and chromatin interaction information of SNPs. First, based on the provided summary statistics (input format is available in Supplementary Note 1), FUMA identifies independent significant SNPs and their surrounding genomic loci depending on LD structure, and defines lead SNPs and genomic risk loci (Methods). Independent significant SNPs and SNPs that are in LD with the independent significant SNPs are then annotated for functional consequences on gene functions (based on Ensembl genes (build 85) using ANNOVAR12), deleteriousness score (CADD score13), potential regulatory functions (RegulomeDB score14 and 15-core chromatin state predicted by ChromHMM15 for 127 tissue/cell types9,10), effects on gene expression using eQTLs of various tissue types and 3D structure of chromatin interactions with Hi-C data (Methods). In addition, independent significant SNPs and correlated SNPs are also linked to the GWAS catalog1 to provide insight into previously reported associations of the SNPs in the risk loci with a variety of phenotypes.

Functionally annotated SNPs are subsequently mapped to genes based on functional consequences on genes by (i) physical position on the genome (positional mapping), (ii) eQTL associations (eQTL mapping), and (iii) 3D chromatin interactions (chromatin interaction mapping). Gene mapping can be controlled by setting several parameters (Supplementary Table 2) that allow the user to include or exclude specific functional categories of SNPs (Supplementary Fig. 1). Positional mapping is used to map SNPs to genes based on being physically located inside a gene using a default of 10 kb windows, yet custom windows around a gene can be set by the user. Users can select to only use SNPs that have specified functional consequences, such as coding or splicing SNPs, to limit the positional mapping to functionally relevant SNPs. Thus, by selecting to exclude intronic SNPs from the positional mapping function, genes that contain only intronic SNPs in LD with independent significant SNPs will not be prioritized by FUMA. eQTL mapping is used to map SNPs to genes with which they show a significant eQTL association (i.e., the expression of that gene is associated with allelic variation at the SNP). eQTL mapping uses information from 4 data repositories (GTEx8, Blood eQTL browser16, BIOS QTL browser17 and BRAINEAC18), and is currently based on cis-eQTLs which can map SNPs to genes up to 1 Mb apart. Users can select tissue/cell types that are relevant to the phenotype of interest, and eQTLs can be filtered either by nominal P-value or FDR provided by the original data sources (Methods and Supplementary Note 2). Chromatin interaction mapping is used to map SNPs to genes when there is a significant chromatin interaction between the disease-associated regions and nearby or distant genes. Chromatin interaction mapping can involve long-range interactions as it does not have a distance boundary as in eQTL mapping. FUMA currently contains Hi-C data of 14 tissue types and seven cell lines from the study of Schmitt et al.11, yet new chromatin interaction data will be added when it becomes available and FUMA also allows users to upload


[Figure 1: workflow diagram with example screenshots of the FUMA web application. Input: GWAS summary statistics. SNP2GENE: Step 1, characterize genomic loci (identification of independent significant SNPs and candidate SNPs in LD; defining lead SNPs; defining genomic risk loci); Step 2, annotation of candidate SNPs in genomic loci (functional consequences on genes (ANNOVAR), CADD score, RegulomeDB score, 15-core chromatin state (127 tissue/cell types), eQTLs, 3D chromatin interactions (Hi-C), GWAS catalog); Step 3, functional gene mapping (positional mapping, eQTL mapping, chromatin interaction mapping); genome-wide analyses (MAGMA gene analysis and gene-set analysis). Outputs: result tables, Manhattan and QQ plots, interactive regional plots with annotations. GENE2FUNC: interactive heatmap of gene expression, tissue specificity (DEG), overrepresentation in gene sets, general biological functions of genes (OMIM, DrugBank).]

Fig. 1 Overview of FUMA. FUMA includes two core processes, SNP2GENE and GENE2FUNC. The input is GWAS summary statistics. SNP2GENE prioritizes functional SNPs and genes, outputs tables (blue boxes), and creates Manhattan, quantile–quantile (QQ) and interactive regional plots (box at right bottom). GENE2FUNC provides four outputs; a gene expression heatmap, enrichment of differentially expressed gene (DEG) sets in a certain tissue compared to all other tissue types, overrepresentation of gene sets, and links to external biological information of input genes. All results are downloadable as text files or high-resolution images
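The positional mapping step of SNP2GENE can be illustrated with a minimal sketch. It assumes that the default 10 kb window extends each gene body on both sides and that a SNP maps to every gene whose extended interval contains it; the class names, field names and the `positional_mapping` function are hypothetical and are not taken from the FUMA code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    symbol: str
    chrom: str
    start: int  # first base of the gene body (inclusive)
    end: int    # last base of the gene body (inclusive)


@dataclass(frozen=True)
class Snp:
    rsid: str
    chrom: str
    pos: int
    consequence: str  # e.g. "exonic", "intronic", "intergenic"


def positional_mapping(snps, genes, window_bp=10_000, allowed_consequences=None):
    """Map SNPs to genes whose body, extended by window_bp on each side, contains them.

    If allowed_consequences is given, SNPs with any other functional
    consequence are skipped before mapping (assumed filter behaviour).
    Returns a dict: gene symbol -> list of rsIDs, in input order.
    """
    mapped = {}
    for snp in snps:
        if allowed_consequences is not None and snp.consequence not in allowed_consequences:
            continue
        for gene in genes:
            if gene.chrom != snp.chrom:
                continue
            if gene.start - window_bp <= snp.pos <= gene.end + window_bp:
                mapped.setdefault(gene.symbol, []).append(snp.rsid)
    return mapped


# Hypothetical toy data, for illustration only.
example_genes = [
    Gene("GENE_A", "1", 100_000, 120_000),
    Gene("GENE_B", "1", 200_000, 250_000),
]
example_snps = [
    Snp("rs_a", "1", 95_000, "intergenic"),   # within 10 kb upstream of GENE_A
    Snp("rs_b", "1", 110_000, "exonic"),      # inside GENE_A
    Snp("rs_c", "1", 230_000, "intronic"),    # inside GENE_B
    Snp("rs_d", "1", 150_000, "intergenic"),  # more than 10 kb from both genes
]

all_mapped = positional_mapping(example_snps, example_genes)
exonic_only = positional_mapping(
    example_snps, example_genes, allowed_consequences={"exonic"}
)
```

With the toy data, restricting the mapping to exonic SNPs leaves only GENE_A, which mirrors the behaviour described in the Results: a gene that contains only intronic SNPs is not prioritized once intronic SNPs are excluded.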


Table 1 Feature comparison of bioinformatics tools and data sources

Columns: Tools; Format; GWAS summary statistics; LD; Functional consequences on genes; Regulatory elements; eQTLs; 3D chromatin interactions; Prioritize SNPs; Map SNPs to genes; Gene expression; Pathways and gene sets; Prioritize genes; Visualization

LD calculation
  PLINK; St; x x
Variant annotations
  ANNOVAR; St; x x x x
  VEP; St; x x x x
  SCAN; Web; x x x x
  RegulomeDB; Web; x x x x
  HaploReg; Web; x x x x
Gene-based test/Gene-set analyses
  VEGAS; St; x x x
  MAGMA; St; x x x x
  Pascal; St; x x x x
  MAGENTA; St; x x x x
  INRICH; St; x x x
  DEPICT; St; x x x x
Visualization tools
  LocusZoom; St/Web; x x
  LocusTrack; St/Web; x x x
  3D genome browser; Web; x x
FUMA; Web; x x x x x x x x x x x x

St, standalone software; Web, web-based application

their own chromatin interaction matrices, which is not limited to Hi-C, but also accommodates ChIA-PET, 5C or Capture Hi-C data (Methods and Supplementary Note 3). Since chromatin interactions are often defined in a certain resolution (as a genomic region), such as 40 kb, an interacting region may span multiple genes. To further prioritize candidate genes from chromatin interaction mapping, information on tissue/cell type specific enhancer and promoter regions from the Roadmap Epigenomics Project10 can be optionally integrated with interacting regions to filter SNPs and target genes (see Methods for details).

For each of these three mapping strategies, additional filtering of SNPs based on functional annotations (i.e., CADD, RegulomeDB, and 15-core chromatin state) is optionally available (Methods and Supplementary Table 2). For example, setting a CADD score threshold will cause FUMA to use only highly deleterious SNPs, while filtering SNPs by RegulomeDB score or open chromatin state prioritizes SNPs that are likely to affect regulatory elements, for each of the mapping strategies.

The three mapping strategies (positional, eQTL and chromatin interaction mapping) result in a set of prioritized genes, based on the GWAS input and specific user-defined filter settings. Both eQTL and chromatin interaction mapping may lead to prioritized genes that are not necessarily themselves located inside a genomic risk locus, although they are linked to SNPs within a genomic risk locus. The combination of positional mapping of deleterious coding SNPs, eQTL mapping, and chromatin interaction mapping across (relevant) tissue types may reveal multiple lines of evidence pointing towards the same genes and makes it possible to prioritize genes that are highly likely to be involved in the trait of interest.

To obtain insight into putative biological mechanisms of prioritized genes, the GENE2FUNC process annotates these genes in biological context (Fig. 1; see Methods for details). Specifically, biological information for each input gene is provided to gain insight into previously associated diseases as well as drug targets by mapping OMIM19 ID and DrugBank20 ID. Tissue specific expression patterns based on GTEx v6 RNA-seq data8 for each gene are visualized as an interactive heatmap. Besides the single gene level analyses, overrepresentation in sets of differentially expressed genes (DEG; sets of genes which are more (or less) expressed in a specific tissue compared to other tissue types) for each of 53 tissue types based on GTEx v6 RNA-seq data8 is also provided to identify tissue specificity of prioritized genes (Methods; Supplementary Table 3). Enrichment of prioritized genes in biological pathways and functional categories is tested using the hypergeometric test against gene sets obtained from MsigDB21 and WikiPathways22. The proportions of overlapping genes, enrichment P-value and which input genes are overlapping with the tested gene sets are visualized in plots as well as tables, which provides a quick overview of the shared biological functions of prioritized genes.

The results of SNP2GENE and GENE2FUNC processes are displayed as either interactive tables or plots on the web application. Additionally, tables are downloadable as plain text files (Supplementary Note 1) and plots are downloadable as high-quality images in several formats (PNG, JPEG, PDF, and SVG).

FUMA covers various features of existing tools. As a variety of bioinformatics tools have been developed to obtain insights in GWAS results23–25, we compared the list of features available in FUMA with the features available in other tools, and describe these further below (Table 1).

LD calculation is the first step to characterize risk loci of GWAS by computing population specific LD structure, so called clumping, which identifies independent significant SNPs and defines the genomic risk loci. PLINK26 is the most widely used software for this task, which takes GWAS summary statistics (requiring a reference panel) or genotype data as input. In FUMA, this task is automated by using pairwise LD (r2) of SNPs in the reference panel (1000 Genomes Project phase 3 (ref. 27)) pre-computed by PLINK, resulting in a list of independent significant SNPs, lead SNPs and genomic risk loci based on the GWAS input file. FUMA also adds SNPs to the identified risk loci that do not have a P-value (i.e., they were not available in the GWAS input file), but that are LD proxies of the identified lead SNPs, as these SNPs might be causally relevant. Alternatively, users can pre-compute lead SNPs or risk loci and upload these to FUMA.

Variant annotation is required to obtain information on biological consequences of SNPs in the risk loci. There are several tools such as ANNOVAR12 and VEP28 which annotate functional consequences on genes, and variant scores such as deleteriousness and phylogenetic conservations (extensive review is available in Hou and Zhang29). Particularly for non-coding SNPs, SCAN30, RegulomeDB14 and HaploReg31 annotate regulatory information, such as eQTLs, enhancer/promoter regions, and transcription factor binding sites (see Tak and Farnham32 for extensive overview). Although SCAN and HaploReg correct for LD, the input of the tools mentioned above is a list of SNPs of interest which does not take genetic associations into account and thus requires pre-processing of GWAS results by the user. FUMA performs annotation of SNPs that are in LD with independent significant SNPs in a single flow, and does not require additional data preformatting.

[Figure 2: flow diagram with Venn diagrams. BMI GWAS: 77 genomic loci, 95 lead SNPs, 223 independent significant SNPs, 10,367 lead SNPs + correlated SNPs (4072 GWAS tagged SNPs + 6295 SNPs from 1000G). 400 genes prioritized by FUMA through positional mapping of deleterious coding SNPs, eQTL mapping of 44 tissue types, and chromatin interaction mapping with Hi-C of 14 tissue types. 116 genes (reported genes in original study); 151 genes (positional + eQTL mappings); 400 genes (positional + eQTL + chromatin interaction mappings).]

Fig. 2 Overview of prioritized genes from BMI GWAS by FUMA. Starting from the BMI GWAS summary statistics, boxes represent results of the SNP2GENE process. The annotated SNPs include all independent lead SNPs and SNPs which are in LD with these lead SNPs. Prioritized genes are divided into three categories: genes that are implicated by deleterious coding SNPs (colored pink), by eQTLs for these genes (colored blue), or by chromatin interactions (colored green). The prioritized genes are further categorized into previously reported genes (blue) and novel genes (red) prioritized by FUMA. *These genes were not prioritized by FUMA since they do not have either deleterious coding SNPs, eQTLs or chromatin interactions, although they are located within GWAS risk loci

Gene-based test/gene-set analyses are methods that summarize SNP associations at the gene level and associate the set of genes to biological pathways. For instance, VEGAS performs permutation based simulation33,34, MAGMA employs multiple linear regression35 and Pascal computes sum and maximum of chi-squared statistics36 to obtain gene-based P-values. Additionally, there are several tools that perform not only gene-based tests but also gene-set analyses using the full distribution of genetic associations (e.g., MAGMA35, MAGENTA37, INRICH38, and DEPICT39). FUMA implements MAGMA gene-based analysis and gene-set analysis on the full GWAS input data. In addition, genes prioritized by SNP2GENE or by the user are also tested for overrepresentation in various gene sets in the GENE2FUNC process.

Visualization is one of the essential features that allows (quick) insights into the GWAS results, e.g., summarizing annotated information of SNPs and genes. LocusZoom is one of the most widely used visualization tools for GWAS results, which plots LD structure of a risk locus, gene locations as well as SNP association values40. LocusTrack is an extension of LocusZoom which also plots additional information such as ChIP-seq and chromatin state41. 3D Genome Browser is a recently developed web application which contains comprehensive 3D chromatin interaction datasets such as Hi-C and ChIA-PET42, though it does not integrate with GWAS summary statistics. These tools are primarily focused on visualization of a subset of functionally relevant data sources. FUMA integrates results from multiple lines of evidence and provides interactive visualization of results, facilitating rapid interpretation.
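The overrepresentation test that GENE2FUNC applies to prioritized genes is, according to the Results, a hypergeometric test against predefined gene sets. A minimal sketch of such a one-sided test is given below, using only the Python standard library. The function name, its parameters and the choice of background are assumptions made for illustration; the sketch does not cover any multiple-testing correction and is not a description of FUMA's own implementation.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_gene_set, n_input, n_overlap):
    """One-sided hypergeometric P-value, P(X >= n_overlap).

    n_background: number of genes in the background (assumed, e.g. all
                  protein-coding genes considered)
    n_gene_set:   number of background genes that belong to the tested gene set
    n_input:      number of prioritized (input) genes within the background
    n_overlap:    number of input genes that belong to the gene set
    """
    if not (0 <= n_gene_set <= n_background and 0 <= n_input <= n_background):
        raise ValueError("gene set and input sizes must lie within the background size")
    max_overlap = min(n_gene_set, n_input)
    if n_overlap > max_overlap:
        return 0.0
    first = max(n_overlap, 0)
    total = comb(n_background, n_input)
    # math.comb(n, k) returns 0 when k > n, so impossible overlaps add nothing.
    tail = sum(
        comb(n_gene_set, k) * comb(n_background - n_gene_set, n_input - k)
        for k in range(first, max_overlap + 1)
    )
    return tail / total


# Toy numbers, for illustration only: 10 background genes, 4 of them in the
# gene set, 3 input genes, at least 2 of which fall in the gene set.
# (C(4,2)*C(6,1) + C(4,3)*C(6,0)) / C(10,3) = (36 + 4) / 120 = 1/3
p_toy = hypergeom_enrichment_p(10, 4, 3, 2)
```

Exact integer arithmetic with `math.comb` avoids floating-point underflow for large backgrounds; the final division converts the ratio to a float only once.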
The current lack of a single platform that integrates all possible resources for post-GWAS annotation hampers our understanding of GWAS results, as different GWAS studies may use a different selection of queried resources, rendering their post-GWAS interpretation incomplete and difficult to compare. FUMA provides a central place for a wide variety of post-GWAS annotation strategies and to our knowledge is the most versatile tool in doing so.

Application to GWAS of body mass index. To validate the utility of FUMA, we applied it to summary statistics of the most recent GWAS for body mass index (BMI; 236,231 individuals)43. FUMA identified 95 lead SNPs (from 223 independent significant SNPs) across 77 genomic risk loci (Fig. 2 and Supplementary Data 1–3), in accordance with the original study. We first conducted positional mapping of deleterious coding SNPs and eQTL mapping (Methods), which prioritized 151 unique genes: 23 genes with deleterious coding SNPs (positional mapping), and 144 genes with eQTLs that potentially alter expression of these genes (eQTL mapping), including 16 genes that had both deleterious coding SNPs and eQTLs (Supplementary Data 4). The 151 genes consist of 55 genes that were also reported in the original study43 and 96 novel genes implicated by FUMA, including 45 genes which are located outside the risk loci. These novel candidates have shared biological functions with the 55 previously known candidate genes such as “metabolism of carbohydrate”, “metabolism of lipid and lipoprotein”, “immune system”, and “calcium signaling” (Supplementary Data 5). In addition, FUMA results showed that, although several genomic loci for BMI included multiple prioritized genes, a single gene was prioritized in 22 out of 43 loci which contain at least one prioritized gene (Supplementary Fig. 2), suggesting that these 22 genes have a high probability of being the causal gene in that region. The 22 “highly likely causal genes” include several well-known genes for BMI such as NEGR1, TOMM40, and TMEM18. The strongest GWAS association signal for BMI was on 16q12.2 where three genes were prioritized: FTO, RBL2, and IRX3 (Fig. 3). These three genes were only prioritized by eQTL mapping as the positional mapping showed no deleterious coding SNPs located in these genes. The original study43 only mentioned FTO, because the associated SNPs were located in this gene; however, none of the associated SNPs has a potential direct effect on FTO, such as that of a coding SNP. Two of the genes prioritized by FUMA (RBL2 and IRX3) are physically located outside the genomic locus and are missed when using conventional approaches that prioritize genes located in the locus of interest based on LD around the top SNP. Although the IRX3 gene was not reported in the original study43, recent functional work has indeed validated this as the causal gene whose expression is affected by SNPs in the 16q12.2 locus44.

We then performed chromatin interaction mapping using Hi-C data of 14 tissue types (Methods). FUMA prioritized 310


[Figure 3. Panel a: GWAS –log10 P-value of 1000G SNPs (colored by r2; top lead SNP, lead SNPs, and independent significant SNPs marked) against position on chromosome 16 (53,600,000–54,200,000), with a gene track showing RBL2, RPGRIP1L, FTO, IRX3, AKTIP, and FTO-IT1 as mapped genes, non-mapped protein coding genes, or non-mapped non-coding genes. Panel b: GWAS –log10 P-value, CADD score (exonic SNPs vs. other SNPs), RegulomeDB score (1a–7), and eQTL –log10 P-value for RBL2, IRX3, and FTO by GTEx tissue, against position on chromosome 16 (53,600,000–53,800,000).]

Fig. 3 Regional plot of the locus 16q12.2 of BMI GWAS. a Extended region of the FTO locus, which includes prioritized genes RBL2 and IRX3. Genes prioritized by FUMA are highlighted in red. b Zoomed-in regional plot of the FTO locus with, from the top, GWAS P-value (SNPs are colored based on r2), CADD score, RegulomeDB score and eQTL P-value. Non-GWAS-tagged SNPs are shown at the top of the plot as rectangles since they do not have a P-value from the GWAS, but they are in LD with the lead SNP. eQTLs are plotted per gene and colored based on tissue types. In the plots of CADD score, RegulomeDB score and eQTLs, SNPs which are not mapped to any gene are colored gray


Table 2 Summary of FUMA application to three GWAS summary statistics

GWAS | Risk loci | Reported genes in the original study | Positional mapping | eQTL mapping | Chromatin interaction mapping | Total* | Genes located outside the risk loci | Novel candidates | Loci containing prioritized genes
BMI | 77 | 117 | 23 | 144 | 310 | 400 | 263 | 330 | 67
CD | 71 | 115 | 39 | 69 | 199 | 276 | 161 | 215 | 55
SCZ | 109 | 349 | 36 | 54 | 33 | 113 | 26 | 35 | 45

*The number of unique genes mapped by one of the positional, eQTL and chromatin interaction mappings

genes (Supplementary Data 4), of which 61 genes overlap with the genes prioritized by positional and/or eQTL mappings and 232 genes are located outside of the genomic risk loci (Fig. 2). That resulted in a total of 400 prioritized genes by combining three mapping strategies, including 330 novel candidates which were not reported in the original study (Table 2 and Supplementary Data 4). These novel candidates further supported shared biological functions with previously reported known genes, such as lipid and lipoprotein metabolism, homeostatic process and various metabolic pathways, with a greater number of genes compared to the mappings without Hi-C data (Supplementary Data 5). Out of 400 prioritized genes, 59 genes are mapped by both eQTLs and chromatin interactions, including IRX3 on the 16q12.2 locus (Fig. 4), which further supports the hypothesis that these genes are involved in the risk of BMI. Beyond the 48 loci that contained at least one prioritized gene from positional and eQTL mappings, chromatin interaction mapping identified candidate genes in an additional 18 loci (Supplementary Fig. 2), including loci mapped to known genes associated with BMI such as MC4R, FOXO3, and ADCY9. The 400 prioritized genes showed enrichment in 9 GO terms, such as “response to zinc ion” and “oligopeptide binding”, overlapping with multiple metallothionein and glutathione S-transferase genes whose association with obesity risk has been reported45,46 (Supplementary Data 6).

Thus, using BMI summary statistics, FUMA confirmed known genes but also prioritized novel genes, including potential causal genes located outside the GWAS risk loci of BMI, which were missed in the original study.

Application to Crohn’s disease GWAS. To further illustrate its utility, we applied FUMA to the summary statistics of Crohn’s disease47 (CD; 6333 cases and 15,056 controls). With FUMA, 95 lead SNPs from 184 independent significant SNPs across 71 genomic loci were identified for CD (Supplementary Fig. 3 and Supplementary Data 7–10). First, describing the results of positional mapping of deleterious coding SNPs and eQTL mapping, FUMA prioritized 95 unique genes from 32 loci (Supplementary Fig. 4), of which 39 genes were implicated by deleterious coding SNPs and 69 were implicated by eQTLs influencing expression of these genes (12 genes had both deleterious coding SNPs and eQTLs; Table 2 and Supplementary Data 11). The prioritized 95 genes include 37 known candidate genes that were also reported in the original study47, including well-known CD-related genes such as NOD2, IL23R, and SLC22A5, while 58 genes were novel (Supplementary Fig. 3; see Supplementary Note 4 and Supplementary Figs. 5–7 for detailed results). These novel candidates include 18 genes that are physically located outside the GWAS risk loci, and the novel candidates mainly share immune system related biological functions with 37 previously known genes (Supplementary Data 12).

Chromatin interaction mapping using Hi-C data in small bowel and liver prioritized 199 genes, of which 18 genes overlap with genes prioritized by positional and/or eQTL mappings and 149 genes are located outside of the genomic risk loci (Supplementary Data 11). That resulted in a total of 276 prioritized genes, including 215 novel candidates which were not reported in the original study (Table 2 and Supplementary Fig. 3). Beyond the 32 loci which are mapped to at least one gene by positional and eQTL mappings, an additional 23 loci are mapped to candidate genes by chromatin interaction mapping, in which several of the genes prioritized from those loci are involved in immune system and cytokine signaling pathways (Supplementary Fig. 4 and Supplementary Data 12). One of these 23 risk loci, the 17q12 locus, is mapped to six chemokine ligands by Hi-C in liver: CCL1, CCL2, CCL7, CCL8, CCL11, and CCL13. Additionally, prioritized genes include 11 cytokines (IL4, IL5, IL10, IL19, IL23R, IL24, IL27, IL33, IL1RL1, IL18R1, and IL18RAP), wherein IL18R1 and IL18RAP are also mapped by eQTLs in whole blood and IL23R and IL27 are also mapped by deleterious coding SNPs, which further supports the involvement of these cytokine genes in CD. The role of these chemokines and cytokines in inflammatory disease has been widely studied48 and yet, chromatin interaction mapping identified additional relevant candidates from the risk loci. The prioritized 276 genes showed enrichment in 123 canonical pathways such as immune system and cytokine related pathways, which are known to be highly relevant to CD49 (Supplementary Data 13).

Application to schizophrenia GWAS. We also applied FUMA to the most recent Schizophrenia (SCZ; 36,989 cases and 113,075 controls) GWAS summary statistics3, and 128 lead SNPs from 269 independent significant SNPs across 109 genomic loci were identified (Supplementary Note 5, Supplementary Fig. 8 and Supplementary Data 14–17). Positional mapping of deleterious coding SNPs and eQTL mapping prioritized 84 unique genes, of which 36 genes were implicated by deleterious coding SNPs and 54 were implicated by eQTLs influencing expression of these genes (six genes had both deleterious coding SNPs and eQTLs; Supplementary Data 18). The prioritized 84 genes include 65 genes which were previously reported as candidates in the original study3, while 19 genes were novel (Table 2), including 11 genes which are physically located outside the GWAS risk loci. These 19 novel candidates have several shared biological functions with 65 previously known genes, such as “matrisome” and “neuronal system” (Supplementary Data 19). Out of 84 prioritized genes, 60 of them were also identified by the recent TWAS50 and Hi-C51 studies, including 10 genes which are physically located outside the risk loci. The prioritized genes cover 34 genomic loci out of 109, of which 20 loci are mapped to a single prioritized gene (Supplementary Fig. 9; see Supplementary Note 5 and Supplementary Fig. 10 for detailed results). These 20 genes are highly likely to drive the association signal in the genomic loci. These genes include CACNA1C, LRP1, PLCB2, GRIN2A, and NMUR2, which are involved in pathways such as Alzheimer’s disease, long-term potentiation, calcium signaling, and transmission across chemical synapses.

Chromatin interaction mapping using Hi-C data in hippocampus and prefrontal cortex prioritized 33 genes, of which DPYD and WBP1L are also mapped by a deleterious coding SNP, and VPS45 and PITPNM2 are also mapped by eQTL in the brain


[Figure 4: circos plot of chromosome 16 (coordinates in Mb) with labeled top SNPs per locus (rs758747, rs879620, rs1558902, rs12446632, rs9925964, rs3888190) and an inner ring of genes mapped by Hi-C and/or eQTLs, including FTO, IRX3, RBL2, ADCY9, SH2B1, ATXN2L, TUFM, and GPRC5B.]

Fig. 4 Chromatin interactions and eQTLs of BMI risk loci on chr. 16. The outermost layer is the Manhattan plot displaying SNPs with P-value < 0.05. Candidate SNPs are colored based on the highest r2 to one of the independent significant SNPs (red: r2 > 0.8, orange: r2 > 0.6). Other SNPs are colored in gray. rsIDs of the top SNPs per locus are labeled. The outer circle is the chromosome coordinate and genomic risk loci are highlighted in blue. Genes mapped by either Hi-C or eQTLs are shown on the inner circle. Genes mapped by Hi-C or eQTLs are colored orange and green, respectively. Genes mapped by both are colored red. Chromatin interactions and eQTLs are shown as links colored orange and green, respectively

(Supplementary Data 18). Out of these 33 genes, 15 are located outside of the genomic risk loci. Together with positional and eQTL mapping, this resulted in a total of 113 candidate genes, including 35 novel candidates which are not reported in the original study (Table 2 and Supplementary Fig. 8). The 29 genes prioritized only by chromatin interactions have shared functions with other genes such as “regulation of response to stress” (RWDD3), “intracellular signal transduction” (SGSM3), and several functions involved in regulation of transcription (OTUD7B and ZBTB18; Supplementary Data 19).

Enrichment was seen in several brain-system related pathways, such as nicotinic acetylcholine receptors (nAChR), long-term potentiation and neurotransmitter receptor binding (Supplementary Data 20). nAChR is an important neuronal receptor in which one of the subunits, alpha-7 (CHRNA7), has recently been studied as a new schizophrenia drug target52,53. nAChR was also identified as an enriched pathway in the recent study using Hi-C in human cerebral cortex51, which suggests potential involvement of the nAChR pathway in SCZ risk.

Discussion
We introduce a web application named FUMA that allows users to process GWAS summary statistics, and annotate, prioritize SNPs

and genes and facilitates interpretation by providing interactive visualizations. FUMA provides a single platform that is built on the most popular tools for post-GWAS annotation and includes a rich collection of data repositories to bring insights into the phenotype of interest, and annotation in FUMA typically takes only about 30 min. For every prioritized gene, FUMA provides the rationale for pinpointing this gene, for example when the expression of the prioritized gene is altered by a SNP that is associated with the disease of interest. Interactive regional plots (Fig. 3 and Supplementary Figs. 5–7, 10) show which genes in a genomic risk locus are prioritized and which genes are not, and the annotated SNPs in the prioritized genes facilitate the generation of hypotheses for functional validation experiments. For example, if a gene is prioritized because of an associated loss-of-function SNP, follow-up validation experiments focusing on a knock-out of this gene may provide disease relevant functional information. On the other hand, if a gene is prioritized because a risk associated allele of a SNP increases expression of this gene in brain, then an overexpression experiment of this gene in neuronal cell cultures would be a more relevant experiment.

The availability of biological resources that can aid in the interpretation of GWAS results, such as Hi-C and ChIA-PET, has dramatically increased recently and several studies have identified novel candidates from GWAS risk loci by integrating their results, for example, with chromatin interactions51,54–57. These technologies have the potential to identify distal interactions of promoters and enhancers. Especially for risk loci for which it has been difficult to identify target genes due to the presence of gene deserts, distal interactions might point to causal genes. Indeed, we identified additional putative causal genes by performing chromatin interaction mapping on outcomes from three GWAS studies (BMI, CD, and SCZ), and the additionally identified genes based on chromatin interaction information were mostly located outside of the risk loci, and were shown to have shared function with known candidates. Although chromatin interactions are highly tissue/cell type specific, as well as time dependent, and currently available data is still limited in those aspects, FUMA provides an option to upload custom interaction matrices. Additionally, FUMA is built in such a way that newly published data including 3D chromatin interactions, eQTLs and other variant annotations can easily be included in the SNP2GENE process. This makes FUMA a flexible web tool which can be utilized not only for new GWAS results but also for previously published GWAS to re-annotate risk loci with the latest biological data sources.

In summary, FUMA provides an easy-to-use tool to functionally annotate, visualize, and interpret results from genetic association studies and to quickly gain insight into the directional biological implications of significant genetic associations. FUMA combines information of state-of-the-art biological data sources in a single platform to facilitate the generation of hypotheses for functional follow-up analysis aimed at proving causal relations between genetic variants and diseases.

Methods
Data pre-processing. All genetic data sets used in this study are based on the hg19 human assembly and rsIDs were mapped to dbSNP build 146 if necessary. To compute minor allele frequencies and LD structure, we used the data from the 1000 Genomes Project27 phase 3 (1000G). Minor allele frequency and r2 of pairwise SNPs (minimum r2 = 0.05 and maximum distance between a pair of SNPs is 1 Mb) were pre-computed using PLINK26 for each of the available populations (AFR, AMR, EAS, EUR, and SAS). Functional annotations of SNPs were obtained from the following three repositories: CADD13, RegulomeDB14, and the core 15-state model of chromatin9,10,15. Cis-eQTL information was obtained from the following four data repositories: GTEx portal v6 (ref. 8), Blood eQTL browser16, BIOS QTL Browser17, and BRAINEAC18, and genes were mapped to Ensembl gene ID if necessary (Supplementary Note 2). Pre-processed Hi-C data for 14 tissue types and seven cell lines were obtained from GSE87112 (ref. 11; Supplementary Note 3). Predicted enhancer and promoter regions for 111 epigenomes were obtained from the Roadmap Epigenomics Projects10. Genomic coordinates of SNPs reported in the GWAS catalog1 were lifted over from hg38 to hg19 using the liftOver software. Normalized gene expression data (RPKM, Read Per Kilobase per Million) from GTEx portal v6 (ref. 8) for 53 tissue types were processed for different purposes. The details are described in the “GTEx Gene Expression Data Set” section. Curated pathways and gene sets from MsigDB v5.2 (ref. 21) and WikiPathways22 which are assigned ID.

Characterization of genomic risk loci based on GWAS. To define genomic loci of interest to the trait based on provided GWAS summary statistics, pre-calculated LD structure based on 1000G of the relevant reference population (EUR for BMI, CD and SCZ) is used. First of all, independent significant SNPs with a genome-wide significant P-value (< 5e-8) and independent from each other at r2 < 0.6 are identified. For each independent significant SNP, all known (i.e., regardless of being available in the GWAS input) SNPs that have r2 ≥ 0.6 with one of the independent significant SNPs are included for further annotation (candidate SNPs). These SNPs may thus include SNPs that were not available in the GWAS input, but are available in the 1000G reference panel and are in LD with an independent significant SNP. Candidate SNPs can be filtered based on a user-defined minor allele frequency (MAF, ≥0.01 by default).

Based on the identified independent significant SNPs, independent lead SNPs are defined if they are independent from each other at r2 < 0.1. Additionally, if LD blocks of independent significant SNPs are closely located to each other (< 250 kb based on the most right and left SNPs from each LD block), they are merged into one genomic locus. Each genomic locus can thus contain multiple independent significant SNPs and lead SNPs.

Besides using FUMA to determine lead SNPs based on GWAS summary statistics, users can provide a list of pre-defined lead SNPs. In addition, users can provide a list of pre-defined genomic regions to limit all annotations carried out by FUMA to those regions.

Annotation of candidate SNPs in genomic risk loci. Functional consequences of SNPs on genes are obtained by performing ANNOVAR12 (“gene-based annotation”) using Ensembl genes (build 85). Note that SNPs can be annotated to more than one gene in case of intergenic SNPs, which are annotated to the two closest up- and down-stream genes. CADD scores, RegulomeDB scores and 15-core chromatin state are annotated to all SNPs in 1000G phase 3 by matching chromosome, position, reference, and alternative alleles. eQTLs are also extracted by matching chromosome, position and alleles of all independent significant SNPs and SNPs which are in LD with one of the independent significant SNPs for each user-selected tissue type, wherein SNPs can have multiple eQTLs for distinct genes and tissue types (Supplementary Note 2). Information on previously known SNP-trait associations reported in the GWAS catalog is also retrieved for all SNPs of interest by matching chromosome and position.

Gene mapping. Gene annotation is based on Ensembl genes (build 85). To match external gene IDs, ENSG ID is mapped to entrez ID, yielding 35,808 genes which consist of 19,436 protein-coding genes, 9249 non-coding RNA, and 7123 other genes (e.g., pseudogenes, processed transcripts, immunoglobulin genes, and T-cell receptor genes).

Positional mapping is performed based on annotations obtained from ANNOVAR12. Two optional filters are provided to control the maximum distance from SNPs to genes and to select specific functional consequences of SNPs on genes. When the former option is defined, FUMA maps SNPs to genes based on ANNOVAR annotation and a user-defined maximum distance is applied for intergenic SNPs. When the latter option is provided, FUMA maps only SNPs which have selected annotations annotated by ANNOVAR (e.g., coding or splicing SNPs).

For eQTL mapping, all independent significant SNPs and SNPs in LD with them are mapped to eQTLs in user-defined tissue types. By default, only significant SNP–gene pairs (false discovery rate (FDR) ≤ 0.05) are used. Optionally, eQTLs can be filtered based on a user-defined P-value. eQTL mapping maps SNPs to genes up to 1 Mb apart (cis-eQTLs).

Chromatin interaction mapping is performed by overlapping independent significant SNPs and SNPs in LD with them with one end of significantly interacting regions in user-selected tissue/cell types. These SNPs are then mapped to genes whose promoter regions (250 bp up- and 500 bp down-stream of the transcription start site by default) overlap with the other end of the significant interactions. Optionally, SNPs can be filtered for those overlapping with predicted enhancer regions of the user-selected epigenomes. Similarly, mapped genes can also be filtered for having promoter regions that overlap with predicted promoter regions of the user-selected epigenomes.

Optional filtering of SNPs based on functional annotations obtained in step 2 of SNP2GENE (i.e., CADD score, RegulomeDB score, 15-core chromatin state) can be performed for positional, eQTL and chromatin interaction mappings separately. When any of these filters is activated, candidate SNPs are filtered prior to gene mapping. Note that this filtering of SNPs based on functional annotations for a certain mapping does not affect other mappings, e.g., when SNPs are filtered by CADD score in positional mapping but not in eQTL mapping, SNPs are filtered prior to positional mapping but eQTL mapping uses the original set of candidate SNPs.
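The locus definition described under "Characterization of genomic risk loci based on GWAS" can be illustrated with a minimal sketch. It is not FUMA's implementation: the greedy selection ordered by P-value, the `Snp` record, the dictionary-based LD lookup, and all function names are assumptions made for illustration, and the sketch only sees SNPs passed to it (FUMA additionally draws candidate SNPs from the 1000G reference panel). Only the thresholds (P < 5e-8, r2 < 0.6, r2 < 0.1, 250 kb) are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    rsid: str
    pos: int    # base-pair position; all SNPs assumed to be on one chromosome
    p: float    # GWAS P-value


def r2(ld, a, b):
    """Pairwise r2 lookup; pairs missing from the table are treated as r2 = 0."""
    if a == b:
        return 1.0
    return ld.get(frozenset((a, b)), 0.0)


def select_independent(snps, ld, r2_max):
    """Assumed greedy rule: walk SNPs by ascending P-value and keep a SNP
    only if its r2 with every SNP kept so far is below r2_max."""
    kept = []
    for snp in sorted(snps, key=lambda s: s.p):
        if all(r2(ld, snp.rsid, k.rsid) < r2_max for k in kept):
            kept.append(snp)
    return kept


def define_loci(snps, ld, p_max=5e-8, r2_sig=0.6, r2_lead=0.1, merge_bp=250_000):
    significant = [s for s in snps if s.p < p_max]
    ind_sig = select_independent(significant, ld, r2_sig)   # independent significant SNPs
    lead = select_independent(ind_sig, ld, r2_lead)         # independent lead SNPs
    # LD block of each independent significant SNP: span of SNPs with r2 >= r2_sig
    blocks = []
    for s in ind_sig:
        positions = [t.pos for t in snps if r2(ld, s.rsid, t.rsid) >= r2_sig]
        blocks.append((min(positions), max(positions), s.rsid))
    blocks.sort()
    # Merge LD blocks whose gap is smaller than merge_bp into one genomic locus
    loci = []
    for start, end, rsid in blocks:
        if loci and start - loci[-1]["end"] < merge_bp:
            loci[-1]["end"] = max(loci[-1]["end"], end)
            loci[-1]["ind_sig"].append(rsid)
        else:
            loci.append({"start": start, "end": end, "ind_sig": [rsid]})
    return ind_sig, lead, loci


# Hypothetical toy data, not taken from any GWAS
demo_snps = [
    Snp("rsA", 1_000_000, 1e-20),
    Snp("rsB", 1_010_000, 1e-10),
    Snp("rsC", 1_100_000, 1e-9),
    Snp("rsD", 2_000_000, 1e-12),
    Snp("rsE", 1_020_000, 0.01),
]
demo_ld = {
    frozenset(("rsA", "rsB")): 0.9,
    frozenset(("rsA", "rsC")): 0.3,
    frozenset(("rsA", "rsE")): 0.7,
    frozenset(("rsB", "rsC")): 0.2,
}
demo_ind_sig, demo_lead, demo_loci = define_loci(demo_snps, demo_ld)
```

In this toy input, rsC is independent of rsA at the r2 < 0.6 threshold but not at r2 < 0.1, so it counts as an independent significant SNP without becoming a lead SNP, and its LD block is merged with that of rsA because the two blocks lie less than 250 kb apart.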


For mapped genes, two scores of intolerance to functional mutations are annotated: probability of being loss-of-function intolerant (pLI)58 and non-coding residual variation intolerance score (ncRVIS)59.

MAGMA for gene analysis and gene set analysis. FUMA uses input GWAS summary statistics to compute gene-based P-values (gene analysis) and gene set P-values (gene set analysis) using the MAGMA35 tool. For gene analysis, the gene-based P-value is computed for protein-coding genes by mapping SNPs to genes if SNPs are located within the genes. For gene set analysis, the gene set P-value is computed using the gene-based P-value for 4728 curated gene sets (including canonical pathways) and 6166 GO terms obtained from MsigDB v5.2. For both analyses, the default MAGMA settings (SNP-wise model for gene analysis and competitive model for gene set analysis) are used, and the Bonferroni correction (gene) or FDR (gene-set) was used to correct for multiple testing. 1000G phase 3 (ref. 27) is used as a reference panel to calculate LD across SNPs and genes.

GTEx gene expression data set. Normalized gene expressions (reads per kilo base per million, RPKM) of 53 tissue types were obtained from GTEx (Supplementary Table 3). A total of 56,320 genes was available in GTEx, which we filtered on an average RPKM per tissue greater than or equal to 1 in at least one tissue type. This resulted in transcripts of 28,520 genes, of which 22,146 were mapped to entrez ID (see “Gene Mapping” section for details). In GENE2FUNC, the heatmap of prioritized genes displays two expression values: (i) the average log2(RPKM+1) per tissue per gene, in which RPKM is winsorized at 50, allowing comparison of expression level across genes and tissue types and (ii) the average of the normalized expression (zero mean of log2(RPKM+1)) per tissue per gene, allowing comparison of expression level across tissue types within a gene.

To obtain differentially expressed gene sets (DEG; genes which are significantly more or less expressed in a given tissue compared to others) for each of the 53 tissue types, the normalized expression (zero mean of log2(RPKM+1)) is used. Two-sided Student’s t-tests are performed per gene per tissue against all other tissues. After the Bonferroni correction, genes with corrected P-value < 0.05 and absolute log fold change ≥ 0.58 are defined as a DEG set in a given tissue, i.e., for these genes, expression in the given tissue had the largest discrepancy with expression in all other tissues. In addition, we distinguish between genes that are upregulated and downregulated in a specific tissue compared to other tissues, by taking the sign of the t-score into account. In GENE2FUNC, genes are tested against those DEG sets by hypergeometric tests to evaluate if the prioritized genes (or a list of genes of interest) are overrepresented in DEG sets in specific tissue types.

Gene set enrichment test. To test for overrepresentation of biological functions, the prioritized genes (or a list of genes of interest) are tested against gene sets obtained from MsigDB (i.e., hallmark gene sets, positional gene sets, curated gene sets, motif gene sets, computational gene sets, GO gene sets, oncogenic signatures, and immunologic signatures) and WikiPathways, using hypergeometric tests. The set of background genes (i.e., the genes against which the set of prioritized genes is tested) is 19,283 protein-coding genes. Background genes can also be selected from gene types as described in the “Gene Mapping” section. Custom sets of background genes can also be provided by the users. Multiple testing correction (i.e., Benjamini–Hochberg by default) is performed per data source of tested gene sets (e.g., canonical pathways, GO biological processes, hallmark genes). FUMA reports gene sets with adjusted P-value ≤ 0.05 and the number of genes that overlap with the gene set > 1 by default.

FUMA parameters for application to GWAS summary statistics. In the described applications, three mapping strategies were applied to GWAS summary statistics with the following settings: positional mapping was performed by selecting exonic and splicing SNPs with CADD score ≥ 12.37 (defined by Kircher et al.13) to restrict the mapping to deleterious coding SNPs. eQTL mapping was performed using GTEx eQTLs with FDR < 0.05. Chromatin interaction mapping was performed using Hi-C data from Schmitt et al.11 and interactions were filtered by FDR < 1e-6. Tissue types used for eQTLs and chromatin interaction mappings are described in the following section for each of three phenotypes. Other parameters not mentioned here were kept as default (Supplementary Table 2).

Application to BMI GWAS. Parameters were set as described in the above section and we used eQTLs in 44 tissue types from GTEx. For chromatin interaction mapping, Hi-C data of 14 tissue types (Adrenal, Aorta, Bladder, Dorsolateral Prefrontal Cortex, Hippocampus, Left Ventricle, Liver, Lung, Ovary, Pancreas, Psoas, Right Ventricle, Small Bowel and Spleen) from GSE87112 was used. Indels were excluded. rsID was mapped to dbSNP build 146 and chromosome and positions were extracted based on the hg19 reference. Only protein-coding genes were used in gene mapping and enrichment of DEG in 53 tissue types, Canonical Pathways and GO terms were tested.

Application to CD GWAS. We set parameters as described above and we used eQTLs in five tissue types from GTEx which are relevant to CD, i.e., Small Intestine, Colon Sigmoid, Colon Transverse, Stomach, and Whole Blood. Chromatin interaction mapping was performed using Hi-C data of two tissue types, Liver and Small Bowel, from GSE87112. The MHC region and indels were excluded from the analysis. Since the input GWAS summary statistics only contained results from the discovery phase, we manually submitted the 71 reported lead SNPs to FUMA in addition to the independent lead SNPs that were identified as described above (Supplementary Data 7). Only protein-coding genes were used in mappings and enrichment of DEG in 53 tissue types, Canonical Pathways and GO terms were tested.

Application to SCZ GWAS. Parameters were set as described above and eQTLs in 10 brain tissues from GTEx were used. Chromatin interaction mapping was performed using Hi-C data of two brain regions, hippocampus and prefrontal cortex. The extended MHC region (25–34 Mb), Chromosome X and indels were excluded from this analysis. The input GWAS summary statistics are based on the discovery phase and not all reported lead SNPs from the combined results of discovery and replication phases reached genome-wide significance. To include all reported lead SNPs, 111 non-indel lead SNPs were provided to FUMA and additional independent lead SNPs were identified at P < 5e-8 (Supplementary Data 14). Only protein-coding genes were used in mappings and enrichment of DEG in 53 tissue types, Canonical Pathways and GO terms were tested.

Code availability. Source code of the FUMA web application is available through a git repository at https://github.com/Kyoko-wtnb/FUMA-webapp/.

Data availability. Data and tools used in FUMA are all publicly available from the following links (details are in Supplementary Table 1). dbSNP build 146 rsID archive: ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b146_grch137p13/database/organism_data/RsMergeArch.bcp.gz, 1000 genome phase 3 reference panel: ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/, CADD: http://cadd.gs.washington.edu/download, RegulomeDB: http://www.regulomedb.org/downloads, 15-core chromatin state: http://egg2.wustl.edu/roadmap/data/byFileType/chromhmmSegmentations/ChmmModels/coreMarks/jointModel/final/, GWAS catalog: https://www.ebi.ac.uk/gwas/, GTEx v6: http://www.gtexportal.org/home/, Blood eQTL Browser: http://genenetwork.nl/bloodeqtlbrowser/, BIOS QTL Browser: http://genenetwork.nl/biosqtlbrowser/, BRAINEAC: http://www.braineac.org/, HiC (GSE87112): https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE87112, promoter/enhancer regions: http://egg2.wustl.edu/roadmap/data/byDataType/dnase/, pLI score: ftp.broadinstitute.org/pub/ExAC_release/release0.3.1/functional_gene_constraint, ncRVIS score: http://journals.plos.org/plosgenetics/article/file?type=supplementary&id=info:doi/10.1371/journal.pgen.1005492.s011, MsigDB: http://software.broadinstitute.org/gsea/msigdb/, WikiPathways: http://wikipathways.org/index.php/WikiPathways, ANNOVAR: http://annovar.openbioinformatics.org/en/latest/, and MAGMA: https://ctg.cncr.nl/software/magma. GWAS summary statistics used in this study are available from the following: BMI: http://portals.broadinstitute.org/collaboration/giant/index.php/GIANT_consortium_data_files, CD: ftp.sanger.ac.uk/pub/consortia/ibdgenetics/, SCZ: http://www.med.unc.edu/pgc/results-and-downloads.

Received: 3 March 2017 Accepted: 30 August 2017

References
1. Welter, D. et al. The NHGRI GWAS catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 42, D1001–D1006 (2014).
2. Wood, A. R. et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat. Genet. 46, 1173–1186 (2014).
3. Ripke, S. et al. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).
4. Okbay, A. et al. Genome-wide association study identifies 74 loci associated with educational attainment. Nature 533, 539–542 (2016).
5. Sudlow, C. et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, 1–10 (2015).
6. Breen, G. et al. Translating genome-wide association findings into new therapeutics for psychiatry. Nat. Neurosci. 19, 1392–1396 (2016).
7. Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).
8. The GTEx Consortium. The genotype-tissue expression (GTEx) pilot analysis: Multitissue gene regulation in humans. Science 348, 648–660 (2015).
9. The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
10. Roadmap Epigenomics Consortium. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
11. Schmitt, A. D. et al. A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Rep. 17, 2042–2059 (2016).


12. Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).
13. Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46, 310–315 (2014).
14. Boyle, A. P. et al. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res. 22, 1790–1797 (2012).
15. Ernst, J. & Kellis, M. ChromHMM: automating chromatin-state discovery and characterization. Nat. Methods 9, 215–216 (2012).
16. Westra, H.-J. et al. Systematic identification of trans eQTLs as putative drivers of known disease associations. Nat. Genet. 45, 1238–1243 (2013).
17. Zhernakova, D. V. et al. Identification of context-dependent expression quantitative trait loci in whole blood. Nat. Genet. 49, 139–145 (2016).
18. Ramasamy, A. et al. Genetic variability in the regulation of gene expression in ten regions of the human brain. Nat. Neurosci. 17, 1418–1428 (2014).
19. Amberger, J. S., Bocchini, C. A., Schiettecatte, F., Scott, A. F. & Hamosh, A. OMIM.org: online mendelian inheritance in man (OMIM), an online catalog of human genes and genetic disorders. Nucleic Acids Res. 43, D789–D798 (2015).
20. Wishart, D. S. et al. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. 34, D668–D672 (2006).
21. Liberzon, A. et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics 27, 1739–1740 (2011).
22. Kutmon, M. et al. WikiPathways: capturing the full diversity of pathway knowledge. Nucleic Acids Res. 44, D488–D494 (2016).
23. Edwards, S. L., Beesley, J., French, J. D. & Dunning, M. Beyond GWASs: illuminating the dark road from association to function. Am. J. Hum. Genet. 93, 779–797 (2013).
24. Marjoram, P., Zubair, A. & Nuzhdin, S. V. Post-GWAS: where next? More samples, more SNPs or more biology? Heredity (Edinb.) 112, 79–88 (2014).
25. Visscher, P. M., Brown, M. A., McCarthy, M. I. & Yang, J. Five years of GWAS discovery. Am. J. Hum. Genet. 90, 7–24 (2012).
26. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
27. Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
28. McLaren, W. et al. The ensembl variant effect predictor. Genome Biol. 17, 122 (2016).
29. Hou, L. & Zhao, H. A review of post-GWAS prioritization approaches. Front. Genet. 4, 2009–2014 (2013).
30. Gamazon, E. R. et al. SCAN: SNP and copy number annotation. Bioinformatics 26, 259–262 (2010).
31. Ward, L. D. & Kellis, M. HaploReg v4: Systematic mining of putative causal variants, cell types, regulators and target genes for human complex traits and disease. Nucleic Acids Res. 44, D877–D881 (2016).
32. Tak, Y. G. & Farnham, P. J. Making sense of GWAS: using epigenomics and genome engineering to understand the functional relevance of SNPs in non-coding regions of the human genome. Epigenet. Chromatin 8, 57 (2015).
33. Liu, J. Z. et al. A versatile gene-based test for genome-wide association studies. Am. J. Hum. Genet. 87, 139–145 (2010).
34. Mishra, A. & Macgregor, S. VEGAS2: Software for more flexible gene-based testing. Twin Res. Hum. Genet. 18, 1–6 (2014).
35. de Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput. Biol. 11, 1–19 (2015).
36. Lamparter, D., Marbach, D., Rueedi, R., Kutalik, Z. & Bergmann, S. Fast and rigorous computation of gene and pathway scores from SNP-based summary statistics. PLOS Comput. Biol. 12, e1004714 (2016).
37. Ayellet, V. S., Groop, L., Mootha, V. K., Daly, M. J. & Altshuler, D. Common inherited variation in mitochondrial genes is not enriched for associations with type 2 diabetes or related glycemic traits. PLoS Genet. 6, e1001058 (2010).
38. Lee, P. H., O’dushlaine, C., Thomas, B. & Purcell, S. M. INRICH: Interval-based enrichment analysis for genome-wide association studies. Bioinformatics 28, 1797–1799 (2012).
39. Pers, T. H. et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat. Commun. 6, 5890 (2015).
40. Pruim, R. J. et al. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics 27, 2336–2337 (2011).
41. Cuellar-Partida, G., Renteria, M. E. & MacGregor, S. LocusTrack: integrated visualization of GWAS results and genomic annotation. Source Code Biol. Med. 10, 1 (2015).
42. Wang, Y. et al. The 3D genome browser: a web-based browser for visualizing 3D genome organization and long-range chromatin interactions. Preprint at http://biorxiv.org/content/early/2017/02/27/112268 (2017).
43. Locke, A. E. et al. Genetic studies of body mass index yield new insights for obesity biology. Nature 518, 197–206 (2015).
44. Claussnitzer, M. et al. FTO obesity variant circuitry and adipocyte browning in humans. N. Engl. J. Med. 373, 895–907 (2015).
45. Sato, M., Kawakami, T., Kadota, Y., Mori, M. & Suzuki, S. Obesity and metallothionein. Curr. Pharm. Biotechnol. 14, 432–440 (2013).
46. Raza, S. et al. Association of glutathione-S-transferase (GSTM1 and GSTT1) and FTO gene polymorphisms with type 2 diabetes mellitus cases in Northern India. Balkan J. Med. Genet. 17, 47–54 (2014).
47. Franke, A. et al. Genome-wide meta-analysis increases to 71 the number of confirmed Crohn’s disease susceptibility loci. Nat. Genet. 42, 1118–1125 (2010).
48. Turner, M. D., Nedjai, B., Hurst, T. & Pennington, D. J. Cytokines and chemokines: at the crossroads of cell signalling and inflammatory disease. Biochim. Biophys. Acta 1843, 2563–2582 (2014).
49. Braat, H., Peppelenbosch, M. P. & Hommes, D. W. Immunology of Crohn’s disease. Ann. N. Y. Acad. Sci. 1072, 135–154 (2006).
50. Gusev, A. et al. Transcriptome-wide association study of schizophrenia and chromatin activity yields mechanistic disease insights. Preprint at http://biorxiv.org/content/early/2016/08/02/067355 (2016).
51. Won, H. et al. Chromosome conformation elucidates regulatory relationships in developing human brain. Nature 538, 523–527 (2016).
52. Martin, L. F. & Freedman, R. Schizophrenia and the alpha 7 nicotinic acetylcholine receptor. Int. Rev. Neurobiol. 78, 225–246 (2007).
53. Freedman, R. α7-Nicotinic acetylcholine receptor agonists for cognitive enhancement in schizophrenia. Annu. Rev. Med. 65, 245–261 (2014).
54. Mifsud, B. et al. Mapping long-range promoter contacts in human cells with high-resolution capture Hi-C. Nat. Genet. 47, 598–606 (2015).
55. Martin, P. et al. Capture Hi-C reveals novel candidate genes and complex long-range interactions with related autoimmune risk loci. Nat. Commun. 6, 10069 (2015).
56. Mcgovern, A. et al. Capture Hi-C identifies a novel causal gene, IL20RA, in the pan-autoimmune genetic susceptibility region 6q23. Genome Biol. 17, 212 (2016).
57. Promoters, T. G. et al. Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters. Cell 167, 1369–1384 (2016).
58. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
59. Petrovski, S. et al. The intolerance of regulatory sequence to genetic variation predicts gene dosage sensitivity. PLoS Genet. 11, 1–25 (2015).

Acknowledgements
This work was funded by The Netherlands Organization for Scientific Research (NWO VICI 453-14-005) and Ingrosyl. We thank the GIANT consortium, WTCCC and PGC for providing GWAS summary statistics and GTEx Portal for RNA-seq and eQTL data. We also thank Prof Patrick Sullivan for discussion of 3D chromatin interaction data.

Author contributions
D.P. conceived the study. K.W. and A.v.B. developed the web application. K.W. performed analyses and drafted the manuscript. K.W., E.T., and D.P. participated in the discussions, interpretation of the results, and editing of the manuscript. All authors provided relevant input at different stages of the project and approved the final manuscript.

Additional information
Supplementary Information accompanies this paper at doi:10.1038/s41467-017-01261-5.

Competing interests: The authors declare no competing financial interests.

Reprints and permission information is available online at http://npg.nature.com/reprintsandpermissions/

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2017
