Disentangling the Genetics of Sarcopenia: prioritization of NUDT3 and KLF5 as for lean mass and HLA-DQB1-AS1 for hand grip strength based on associated SNPs

Abhishek N. Singh (  [email protected] )

Research article

Keywords: Genetics of Sarcopenia, prioritization of NUDT3 and KLF5, genes for lean mass and HLA- DQB1-AS1

Posted Date: October 16th, 2019

DOI: https://doi.org/10.21203/rs.2.16139/v1

License:   This work is licensed under a Creative Commons Attribution 4.0 International License. Read Full License

Version of Record: A version of this preprint was published on February 24th, 2020. See the published version at https://doi.org/10.1186/s12881-020-0977-6.

Page 1/16 Abstract

Background: Sarcopenia is a skeletal muscle disease of clinical importance that occurs commonly in old age and in various disease sub-categories. Widening the scope of knowledge of the genetics of muscle mass and strength is important because may allows us to identify new genetic markers or identify patients with an increased risk to develop a specifc musculoskeletal diseases or condition such as sarcopenia. It might also allow us to identify drugs that affect muscle in ways unknown before and therefore to reposition drugs to other uses, in accordance to their newly found target. We used bioinformatics tools to identify loci responsible for regulating muscle strength and lean muscle mass, which can then be a target for downstream lab experimentation validation. SNPs associated with various disease traits for the muscles and specifc loci were chosen according to their muscle phenotype association p-value, as traditionally done in the GWASs. We developed and applied a combination of expression quantitative trait loci study (eQTLs) and GWAS summary information, to prioritize causative SNP and point out the unique genes associated in the tissues of interest (muscle).

Results: We found NUDT3 and KLF5 for lean mass and HLA-DQB1-AS1 for hand grip strength as candidate genes to target for these phenotypes. The associated regulatory SNPs are rs464553, rs1028883 and rs3129753 respectively.

Conclusion: TWAS approaches of combining GWAS and eQTL summary statistics proved helpful in statistically prioritizing genes and their associated SNPs for the disease phenotype of study, here, Sarcopenia. Potentially regulatory SNPs associated with these genes can be analyzed with respect to TADs and then targeted for knock out in either C2C12 mouse myoblast cells, adipocytes or any other relevant cell, depending on the phenotype it is hypothesized to affect, as a downstream experimental validation plan.

Background

Many diseases known to man originate from more than one genetic locus. A GWAS (genome-wide association study) of lean muscle mass (LM) or hand grip strength (HG) was done in the studies by Karasik et al.[1], Zilikens et al.[2], Willems et al.[3] and Tikkanen et al.[4], in various large human populations. Various sample GWAS plots for LM and HG are also provided at the author’s webpage (www.tinyurl.com/abinarain and then navigate to “Educational Stuffs”). As such, genetic inheritance studies, although outlining genetic architecture, were unable to determine the specifc loci responsible for these diseases[5].

Since 2006, GWASs have allowed us to trace the multiple genetic factors for such diseases using statistical tools that can lead to a more effective research of specifc locus of interest[6]. The data produced by these studies, which now rank in the thousands, are available online so further downstream research can be conducted, and new results can be incorporated. This is indeed valuable, since, for

Page 2/16 example, musculoskeletal diseases are one of the leading causes of disability in the world [7]; treatment of these diseases costs the world medical industry around 125 billion dollars annually[7].

As shown in the scientifc literature[8], muscle strength, hand grip (HG) in our case, and lean muscle mass (LM) are two of the most important factors that can predict muscle health. These are especially important in the context of sarcopenia, as they allow evaluating its progression. Furthermore, research has shown that with age these two factors tend to dissociate and thus need to be measured independently, as one cannot predict the other[8]. The discovery of new therapeutic targets which might increase LM or HG or both may be useful in the management or treatment of otherwise incurable diseases.

Previous studies have identifed potential targets for research[9]. These targets were overlaid with eQTL and GWAS p values. These p values can later be added to data collected from other reliable online databases and summed into a grading system (scale), which can allow us to choose the most relevant genes to be knocked out, for example, in the mouse C2C12 myoblast cell line using the siRNA technique. However, for this paper, we present only the combination of summary level data from GWASs and publicly available eQTLs such as those from studies by GTEx[21] and Westra et. al. [20], and then plot the topologically active domains (TADs), at the loci of interest generated by this combination approach of phenotype-associated SNPs (Single Nucleotide Polymorphism) and tissue-relevant gene-associated SNPs. A TAD is a self-interacting genomic region, meaning that DNA sequences within a TAD physically interact with each other more frequently than with sequences outside the TAD[22]. The function of TADs is not fully understood yet, although disrupting the TADs e.g. because of SNPs or InDels (insertions deletions) can lead to chromosomal structure disruption and could lead to diseases by affecting gene regulation.

Method

The results of GWAS for lean mass (LM) and hand grip strength (HG) was published in studies by Karasik et al,.[1], Zilikens et al.[2], Willems et al.[3] and Tikkanen et al.[4], in various large human populations. The summary of eQTL data was obtained by studies by Westra et. al. [20] and by the GTEx21 consortium for various tissues. With these data, we executed “Summary-data-based Mendelian randomization” (SMR) [23] analysis. We chose not to investigate the SNPs for further categorization of pleiotropy or causality, which can be done using the Heidi test. For the case of GTEx eQTL summary data, the execution was done for all tissues, and then we observed for genes which were enriched specifcally in skeletal muscle tissue or specifcally compared to the aggregate of all other tissue types. For the genes of interest as described in the analysis section, we went on to examine the TAD plots for the corresponding skeletal muscle tissues such as the psoas and bladder from the study of Schmitt et. al. [24]

Results

Our GTEx tissue analysis found that for lean mass, two genes, NUDT3 and KLF5, were enriched in skeletal muscles (Figure 1, 2), and they were also found in Westra et. al. [20] (Westra) eQTL analysis Page 3/16 (Figure 4). In the GTEx tissue analysis for the hand grip trait, we found one gene, HLA-DQB1-AS1, which was specifcally enriched in skeletal tissue compared to other tissues (Figure 12), with the associated SNP as rs3129753. Many other genes found to be enriched in skeletal muscle tissues and other tissues in common intersection as in Figures 1, 2, 12, were also found in Westra eQTL analysis with our GWAS summary dataset. We propose that those genes that are found to be exclusively enriched only in skeletal muscle tissues by both GTEx and Westra eQTL summary dataset combined with our GWAS summary dataset using the Mendelian randomization approach be given the frst priority for further experimental validation, and least priority should be given to those genes which are found to be not enriched in skeletal muscle tissue in either the Westra or GTEx analysis. The second priority should be given to the genes found to be enriched in skeletal muscle tissue as well as any other tissue. Clearly, NUDT3 and KLF5 are very strong candidate genes for lean mass study, and their associated regulating SNP are rs464553 and rs1028883 respectively. Most importantly, the associative SNPs for these genes are also listed in fgures 3–11, and RT-qPCR (Reverse Transcriptase quantitative Polymerase Chain Reaction) can be possibly performed on skeletal muscle samples from people with and without these SNP genotypes, to confrm the enrichment of these genes.

Chromosomes separate active and inactive chromatin into compartments A and B, respectively where compartment A correlates with high , active histone marks, and early replication timing, whereas the compartment B replicates late, is enriched with repressive histone modifcations and has low gene expression. Compartments can be further subdivided into megabase-sized genomic regions known as topologically associating domains (TADs)[25][26], which act as regulatory scaffolds and are demarcated by binding sites of the architectural protein CTCF. Disruption of TAD boundaries results in the establishment of novel inter-TAD interactions. These have been shown to be associated with misexpression of Hox genes [27], up-regulation of proto-oncogenes [28], and developmental disorders [29]. TAD plots for the psoas and bladder tissues (which are skeletal and smooth muscle types, respectively) were also plotted and one of them is shown as Figure 13 for KLF5 bladder tissue. Clearly, these sites can be further investigated for their corresponding SNP positions and whether they are in TAD boundaries or not by knock down of the SNPs and observing for any disruption in cellular activity. TAD plots for other relevant genes using raw and normalized data can be found at the author’s webpage www.tinyurl.com/abinarain and then navigating to the “Educational Section”.

Discussion

Apart from the combination of GWAS and eQTL summaries for the tissue of interest and searching for exclusivity of enhancement of gene and the presence of SNP regulation near TAD boundaries, as discussed above, we also incorporated other factors, such as gene score for relevant diseases, phenotype also known in mice to be tested, coexpression of the gene in question in relevant databases, known epigenetic factors, and medical implications for known drugs. Though the results for this study will still need time to

Page 4/16 be substantiated for publication, we feel that it would make sense to state the method for the sake of completeness.

Potential genes were obtained from the work of Zillikens [2], Karasik[1] for LM and for HG, Willems[3] and Tikkanen[10]. Genes provided by Karasik, Zillikens and Willems were graded as frst tier genes, while genes provided by Tikkanen were graded as second tier genes. The reason behind this was that the Tikkanen research [10] was published at the end of the year 2018, while the database was already being collected. The list of SNPs was mined with cis Expression quantitative trait loci analysis (eQTLs) for transcripts within 2 Mb of the SNP position was carried out as described by Zillikens et al [2]. Other similar datasets were scrutinized, and genes in proximity of SNPs were scaled according to a specifcally developed scoring system with the following components: GWASs, eQTLs, Malacards, OMIM, TAD and epigenetic data.

For GWASs: 1 point for each signifcant P value for a locus in any published muscle-related GWAS. For eQTLs, 1 point if the locus has a muscle-specifc signifcant P value (is an eQTL), and 0 if not. Data were provided by three eQTL studies [2,3,10]. Scoring was as follows: 1 point for each time the gene is expressed in relevant tissues (muscle, tendon or the tibial nerve). Data were also extracted from the Ensembl data base (build 95) [11]. One point for was given for each relevant disease the gene was associated with according to “Malacards 1” version 4.9.0.20 and OMIM. The Malacards scoring was on a scale of 0 to infnity. For this study, each disease with a score of more than 3 received one point in the score. OMIM was used for verifcation of the Malacards data in mice. One point was given for each time the gene was associated with a muscle-related phenotype in mice, in accordance with the mice genome informatics database version 6.3.0.14. One point was subtracted each time the gene was shown to be co- expressed with another gene that has a relevant effect or phenotype according to the COXPRESdb gene co-expression database (version 7.1) [15]

The Malacards relevance score algorithm can be found on this webpage: https://www.malacards.org/pages/searchguide available datasets for Homo sapiens). Co-expression reduces the chance that a specifc gene is responsible for a given trait; thus a point was subtracted. Genes of interest were also examined in relation to the active TAD to which they are mapped, to reveal whether a gene is in euchromatin and whether it interacts with other genes which might carry out a similar function or have a similar phenotype as the gene of interest. TADs were analyzed from samples of psoas muscle (striated) and bladder muscle (m. detrusor, smooth muscle) For epigenetic data: 1 point was awarded each time the SNP appeared to be an enhancer in a relevant tissue, according to HaploReg[16].

Proxies for every variant in the SNP table was extracted using Ldlink[17], the output was fltered according to the following criteria: LD R² = 0.8, location within the region of the gene of interest and a score of 5 or less according to RegulomDB[18], which means that the proxies had sufcient data validating their annotation.

Page 5/16 In case two genes received the same score, biological data will be used to determine which one is more relevant to the study. Relevant loci will be then knocked out in the C2C12 mouse myoblast cell line. Identifying new loci responsible for LM or HG may be used by the pharmaceutical industry as new targets for pharmaceutical products or repositioning of existing drugs in accordance to new data on the activity of these drugs. The greatest hurdle with drug repositioning today is lack of solid databases needed to conduct efcient repositioning [9]. This study of future can serve as a test to whether this approach to gene prioritization can resolve this problem.

Conclusions

The current work focused primarily on the combined bioinformatic approaches using GWASs and eQTLs for summary-data based Mendelian randomization (SMR). The results of exclusivity of the tissues of interest were further classifed for their importance based on Venn diagrams and their corresponding TAD plots to look for the TAD boundaries where the associated regulating SNPs could be localized. NUDT3 and KLF5 for lean mass and HLA-DQB1-AS1 for hand grip strength and their associated SNPs (rs464553, rs1028883 and rs3129753) had the highest priority as candidate targets for new or repositioned drugs.

We propose conducting RT-qPCR to further ascertain the association of enhancement of a gene for patients with a SNP genotype that was associated positively with gene enrichment in the current study. We also proposed further steps that can be taken to further prioritize candidate genes as targets for new drugs or for drug repurposing. One limitation of this study is that the eQTL analysis was not done on trans-association SNPs.

Declarations

SNPs—Single Nucleotide Polymorphism, GWAS—Genome wide association study, eQTL—expression quantitative trait loci, TADs—Topologically Associated Domains, LM—Lean Mass, LD—Linkage Disequilibrium, HG—Hand Grip strength, TWAS—Transcriptome Wide Association Study

Ethical Approval and Consent to participate

No individual level data was used for the analysis, and thus consent for publication was not needed.

CONSENT FOR PUBLICATION

No individual level data has been used for this study, and thus consent for publication was not needed.

AVAILABILITY OF DATA AND MATERIAL

www.tinyurl.com/abinarain and then navigate to Educational Section

COMPETING INTERESTS

Page 6/16 None to be declared

FUNDING

The publication costs of this project were funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 802825 to Principal Investigator Minna U Kaikkonen)

ACKNOWLEDGEMENT

The author is thankful for the various software tools that have been made available for academic purposes such as SMR. The author is thankful to Bili Gasman of Bar Ilan University who wrote the gene knockout strategy as a future course of action, and David Karasik of Bar Ilan University who provided GWAS data that he carried out for hand grip and lean mass study for his previous research work. David Laaksonen of UEF helped in contribution of the written content of the manuscript.

References

1. Karasik D, Zillikens MC, Hsu YH, et al. Disentangling the genetics of lean mass. Am J Clin Nutr 2019;109(2):276–8. 2. Zillikens MC, Demissie S, Hsu Y-H, et al. Large meta-analysis of genome-wide association studies identifes fve loci for lean body mass. Nat Commun [Internet] 2017 [cited 2019 Feb 11];8(1):80. Available from: http://www.nature.com/articles/s41467-017-00031-7 3. Willems SM, Wright DJ, Day FR, et al. Large-scale GWAS identifes multiple loci for hand grip strength providing biological insights into muscular ftness. Nat Commun [Internet] 2017 [cited 2019 Feb 11];8:16015. Available from: http://www.nature.com/doifnder/10.1038/ncomms16015 4. Tikkanen E, Gustafsson S, Amar D, et al. Biological Insights Into Muscular Strength: Genetic Findings in the UK Biobank. Sci Rep [Internet] 2018 [cited 2019 Feb 11];8(1):6451. Available from: http://www.nature.com/articles/s41598-018-24735-y 5. Risch N, Merikangas K. The Future of Genetic Studies of Complex Human Diseases. Science (80- ) [Internet] 1996 [cited 2019 Jan 12];273(5281):1516–7. Available from: http://www.sciencemag.org/cgi/doi/10.1126/science.273.5281.1516 6. Haines JL. Complement Factor H Variant Increases the Risk of Age-Related Macular Degeneration. Science (80- ) [Internet] 2005 [cited 2019 Jan 12];308(5720):419–21. Available from: http://www.sciencemag.org/cgi/doi/10.1126/science.1110359 7. Barbe MF, Gallagher S, Massicotte VS, Tytell M, Popoff SN, Barr-Gillespie AE. The interaction of force and repetition on musculoskeletal and neural tissue responses and sensorimotor behavior in a rat model of work-related musculoskeletal disorders. BMC Musculoskelet Disord [Internet] 2013 [cited

Page 7/16 2019 Jan 26];14(1):303. Available from: http://bmcmusculoskeletdisord.biomedcentral.com/articles/10.1186/1471-2474-14-303 8. Heymsfeld SB, Cristina Gonzalez M, Lu J, Jia G, Zheng J. Skeletal muscle mass and quality: evolution of modern measurement concepts in the context of sarcopenia. [cited 2019 Feb 13];Available from: https://doi.org/10.1017/S0029665115000129 9. Doostparast Torshizi A, Wang K. Next-generation sequencing in drug development: target identifcation and genetically stratifed clinical trials. Drug Discov Today [Internet] 2018 [cited 2019 Feb 11];23(10):1776–83. Available from: https://linkinghub.elsevier.com/retrieve/pii/S1359644618300400 10. Tikkanen E, Gustafsson S, Amar D, et al. Biological Insights Into Muscular Strength: Genetic Findings in the UK Biobank OPEN. [cited 2019 Jan 23];Available from: www.nature.com/scientifcreports 11. Zerbino DR, Achuthan P, Akanni W, et al. Ensembl 2018. Nucleic Acids Res [Internet] 2018 [cited 2019 Feb 11];46. Available from: http://www.ensembl.org 12. Rappaport N, Twik M, Plaschkes I, et al. MalaCards: an amalgamated human disease compendium with diverse clinical and genetic annotation and structured search. Nucleic Acids Res [Internet] 2017 [cited 2019 Feb 11];45:877–87. Available from: https://academic.oup.com/nar/article- abstract/45/D1/D877/2572056 13. Stelzer G, Rosen N, Plaschkes I, et al. The GeneCards Suite: From Gene Data Mining to Disease Genome Sequence Analyses [Internet]. In: Current Protocols in Bioinformatics. Hoboken, NJ, USA: John Wiley & Sons, Inc.; 2016 [cited 2019 Feb 11]. p. 1.30.1-1.30.33.Available from: http://doi.wiley.com/10.1002/cpbi.5 14. Finger JH, Smith CM, Hayamizu TF, et al. The mouse Gene Expression Database (GXD): 2017 update. Nucleic Acids Res [Internet] 2017 [cited 2019 Feb 11];45. Available from: http://www.informatics.jax.org/gxdlit 15. Obayashi T, Hayashi S, Shibaoka M, Saeki M, Ohta H, Kinoshita K. COXPRESdb: a database of coexpressed gene networks in mammals. Nucleic Acids Res [Internet] 2008 [cited 2019 Feb 11];36:77–82. Available from: http://coxpresdb. 16. Ward LD, Kellis M. HaploReg: A resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucleic Acids Res 2012;40(D1):930–4. 17. Machiela MJ, Chanock SJ. Genetics and population analysis LDlink: a web-based application for exploring population-specifc haplotype structure and linking correlated alleles of possible functional variants. [cited 2019 Mar 21];Available from: https://academic.oup.com/bioinformatics/article- abstract/31/21/3555/195027 18. Boyle AP, Hong EL, Hariharan M, et al. Annotation of functional variation in personal genomes using RegulomeDB. [cited 2019 Mar 21];Available from: https://genome.cshlp.org/content/22/9/1790.full.pdf 19. Shiori TN, Ryo N, Kiyoko O, Eto NN. A novel in vitro model of sarcopenia using BubR1 hypomorphic C2C12 myoblasts. Cytotechnology [Internet] [cited 2019 Feb 13];68. Available from:

Page 8/16 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5023544/pdf/10616_2015_Article_9920.pdf 20. Westra HJ et. al., Systematic identifcation of trans eQTLs as putative drivers of known disease associations. Nat Genet. 2013 Oct;45(10):1238-1243. doi: 10.1038/ng.2756. Epub 2013 Sep 8. Available from: https://www.ncbi.nlm.nih.gov/pubmed/24013639 21. GTEx Consortium, Genetic effects on gene expression across human tissues, Nature volume550, pages204–213 (12 October 2017). Available from: https://www.nature.com/articles/nature24277 22. Pombo, A; Dillon, N (April 2015). "Three-dimensional genome architecture: players and mechanisms". Nature Reviews Molecular Cell Biology. 16 (4): 245–57. doi:10.1038/nrm3965. PMID 25757416 23. Zhihong Zhu, Futao Zhang, Han Hu, Andrew Bakshi, Matthew R Robinson, Joseph E Powell, Grant W Montgomery, Michael E Goddard, Naomi R Wray, Peter M Visscher & Jian Yang Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nature Genetics volume 48, pages 481–487 (2016) 24. Schmitt, A., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C., Li, Y., Lin, S., Lin, Y., Barr, C. & Ren, B. (2016). A Compendium of Chromatin Contact Maps Reveals Spatially Active Regions in the . Cell Reports, 17(8), 2042-2059. 25. Dixon J.R., Selvaraj S., Yue F. , Kim A. , Li Y. , Shen Y. , Hu M. , Liu J.S. , Ren B. , Topological domains in mammalian genomes identifed by analysis of chromatin interactions., Nature. 2012; 485: 376-380 26. Nora E.P. , Lajoie B.R. , Schulz E.G. , Giorgetti L. , Okamoto I. , Servant N. , Piolot T. van Berkum N.L. , Meisig J. , Sedat J. , et al. Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature. 2012; 485: 381-385 27. Narendra V. , Rocha P.P. , An D. , Raviram R. , Skok J.A. , Mazzoni E.O. , Reinberg D. , CTCF establishes discrete functional chromatin domains at the Hox clusters during differentiation. Science. 2015; 347: 1017-1021 28. Flavahan W.A., Drier Y. , Liau B.B. , Gillespie S.M. , Venteicher A.S. , Stemmer-Rachamimov A.O. , Suvà M.L. , Bernstein B.E. , Insulator dysfunction and oncogene activation in IDH mutant gliomas., Nature. 2016; 529: 110-114 29. Lupiáñez D.G. , Kraft K. , Heinrich V. , Krawitz P. , Brancati F. , Klopocki E. , Horn D. , Kayserili H. , Opitz J.M. , Laxova R. , et al., Disruptions of topological chromatin domains cause pathogenic rewiring of gene-enhancer interactions. Cell. 2015; 161: 1012-1025

Figures

Page 9/16 Figure 1

Venn-Diagram Genes found to be eQTL associated as well as GWAS associated (Lean Mass: meta- analysis of the whole body data) as by Summary based Mendelian Randomization approach for GTEx v7 skeletal muscular tissue and union set of other tissues. Clearly Genes NUDT3 and KLF5 seems to be exclusive enrichment targets for SNP association in skeletal muscle

Page 10/16 Figure 2

Venn-Diagram Genes found to be eQTL associated as well as GWAS associated (Lean Mass: meta- analysis of the appendicular data) as by Summary based Mendelian Randomization approach for GTEx v7 skeletal muscular tissue and union set of other tissues. Clearly Genes NUDT3 and KLF5 seem to be exclusive enrichment targets for SNP association in skeletal muscle tissues.

Figure 3

Westra et. al. eQTLs compared with hand-grip strength GWAS summary combined Mendelian Randomization output to prioritize gene enrichment for corresponding SNPs.

Page 11/16 Figure 4

Westra et. al., studies of eQTL compared with lean mass (metal append) GWAS summary combined in Mendelian Randomization output to prioritize gene enrichment for corresponding SNPs.

Figure 5

Westra et. al., studies of eQTL compared with lean mass (metal whole body) GWAS summary combined in Mendelian Randomization output to prioritize gene enrichment for corresponding SNPs.

Figure 6

GTEx consortium, studies of eQTL compared with lean mass (metal append) GWAS summary combined in Mendelian Randomization output to prioritize gene enrichment for corresponding SNPs for aggregated all tissues (sample results).

Page 12/16 Figure 7

GTEx consortium, studies of eQTL compared with lean mass (appendicular) GWAS summary combined in Mendelian Randomization output to prioritize gene enrichment for corresponding SNPs for skeletal muscle only.

Figure 8

GTEx consortium, studies of eQTL compared with lean mass (whole body) GWAS summary combined in Mendelian Randomization output to prioritize gene enrichment for corresponding SNPs aggregated for all tissues (sample results).

Figure 9

GTEx consortium, studies of eQTL compared with lean mass (whole body) GWAS summary combined in Mendelian Randomization output to prioritize gene enrichment for corresponding SNPs in skeletal muscle tissues (sample results).

Page 13/16 Figure 10

GTEx consortium Handgrip GWAS data with eQTL from Skeletal muscle summary Mendelian randomization result

Figure 11

GTEx consortium Handgrip GWAS data with eQTL aggregated summary Mendelian randomization

Page 14/16 Figure 12

Venn-Diagram Handgrip GWAS summary with eQTL enrichment analysis for GTEx tissues for skeletal muscle tissue vs rest of the tissues.

Page 15/16 Figure 13

TAD plot of KLF5 gene region in 13 for Bladder.

Page 16/16