Structural Forms of the Human Amylase Locus and Their Relationships to Snps, Haplotypes, and Obesity
Total Page:16
File Type:pdf, Size:1020Kb
Structural Forms of the Human Amylase Locus and Their Relationships to SNPs, Haplotypes, and Obesity The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters Citation Usher, Christina Leigh. 2015. Structural Forms of the Human Amylase Locus and Their Relationships to SNPs, Haplotypes, and Obesity. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences. Citable link http://nrs.harvard.edu/urn-3:HUL.InstRepos:17467224 Terms of Use This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http:// nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of- use#LAA Structural forms of the human amylase locus and their relationships to SNPs, haplotypes, and obesity A dissertation presented by Christina Leigh Usher to The Division of Medical Sciences in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the subject of Genetics and Genomics Harvard University Cambridge, Massachusetts March 2015 © 2015 Christina Leigh Usher All rights reserved. Dissertation Advisor: Professor Steven McCarroll Christina Leigh Usher Structural forms of the human amylase locus and their relationships to SNPs, haplotypes, and obesity Abstract Hundreds of human genes reside in structurally complex loci that elude molecular analysis and assessment in genome-wide association studies (GWAS). One such locus contains the three different amylase genes (AMY2B, AMY2A, and AMY1) responsible for digesting starch into sugar. The copy number of AMY1 is reported to be the genome’s largest influence on obesity, yet has gone undetected in GWAS. Using droplet digital PCR (ddPCR), sequence analysis, and optical mapping, we characterized eight common structural forms of the amylase locus, their mutational histories, and their relationships to SNPs. We found that AMY1 copy number has a unique distribution undetectable to earlier methods that can be understood from an underlying set of structural forms and their allele frequencies. Despite a history of recurrent structural mutations, AMY1 copy number has maintained partial correlations to nearby SNPs; these SNPs do not associate with body mass index (BMI). To directly test for association, we measured amylase gene copy number using ddPCR in 1,000 Estonians selected for being either obese or lean and in two cohorts totaling ~3,500 individuals using sequence analysis. We had 99% power to detect even the lower bound of the reported effects on BMI and obesity, yet found no association. This study model of using multiple methods to analyze the copy number, structural haplotypes, and surrounding SNP haplotypes of multi-allelic variants will likely facilitate more robust disease association results in future studies. iii Acknowledgements My graduate school experience has been a long journey that would have been even longer without the help of my mentors, friends, and family. I would first like to thank Steve McCarroll for being my mentor. It has been an honor being his first graduate student, and quite possibly, his most stubborn. Yet, despite my greatest protestations, he never stopped giving me good advice that I would secretly follow and good ideas that drove my projects forward. His best gift has been his whole-hearted approval of my career aspirations and the connections to get me there. I would like to thank the members of the McCarroll lab for being fun, supportive, and knowledgeable. I would like to thank the first generation members Cindy Liu, Tom Mullen, and Elizabeth Fels for being a friend, a motivator, and a ridiculously good assistant, respectively. I’d like to thank the newer folks Curtis, Vanessa, Heather, Evan, Dominique, Arpiar, and Michelle for keeping the good vibes going. Of course, I have to give shout outs to Bob Handsaker and Jim Nemesh for being our omnipotent computational specialists. And last, but not least, I have to thank my officemates Linda Boettger, Aswin Sekar, Nolan Kamitaki, and Avery Davis for being more friends than coworkers. Know that even though I grumpily wore my headphones every time you guys were talking, I was still listening. iv I am grateful for having had the opportunity to work with and learn from a wonderful group of collaborators: the labs of Joel Hirschhorn, Timothy Frayling, Charles Lee, and BioNano. It was also a privilege to be guided by my Dissertation Advisory Committee Jesse Gray, Mark Daly, Alkes Price, and James Gusella. I would also like to thank Mark Daly, Terence Capellini, Matt Warman, and John Armour for kindly agreeing to be a part of my dissertation examination committee. And lastly, I’d like to thank the members of my undergraduate lab David Bloom, Nicole Giordani, Dacia Kwiatkowski, Cameron Lilly, and Levi Watson for making science look so fun that I wanted to pursue it in graduate school. I am also indebted to my family (Susan, Michael, Kevin, David, and Joyce Usher, and Heinz and Helga Meincke) for keeping me on track by asking me when I’m going to graduate even more times than the BBS administrative office. But honestly, my life wouldn’t have been as successful without them shipping up to Boston for condo repairs and keeping up long-distance support through cell phones and Facebook. v Table of Contents Introduction ................................................................................................................. 1! Multi-allelic CNVs ................................................................................................... 2! The rationale of the dissertation ............................................................................ 8! The amylase locus .................................................................................................... 9! Chapter 1. Measuring the amylase locus ................................................................ 21! Optimizing the measurement of AMY1 ............................................................... 23! The distribution of amylase gene copy number ................................................. 29! The structural alleles of the amylase locus .......................................................... 31! Chapter 2. Assembling the alleles of the amylase locus ......................................... 35! Published assemblies ............................................................................................. 35! Novel assemblies .................................................................................................... 39! Chapter 3. SNP correlations to the amylase locus ................................................. 45! SNPs capturing structural haplotypes ................................................................ 47! SNPs capturing copy number .............................................................................. 48! Chapter 4. Associating the amylase locus to obesity .............................................. 49! Discussion .................................................................................................................. 54! Discussion of the previous report ......................................................................... 54! Discussion of my proposed study model .............................................................. 57! Future Directions ................................................................................................... 61! Appendix A. Dissecting the timing & origin of gene expression variation .......... 64! Appendix B. Experimental Methods ....................................................................... 87! vi Amylase .................................................................................................................. 87! Het-Seq ................................................................................................................. 107! Appendix C. Protocols for amylase genotyping using ddPCR ........................... 117! References ................................................................................................................ 140! vii Introduction Human genomes have thousands of deletion and duplication polymorphisms. These so-called copy number variants (CNVs) cause as many as ~0.78% of base pairs to differ in copy number between any two individuals’ genomes1. The first indication that these copy number alterations could influence human phenotypes came from sporadic diseases, now termed genomic disorders, caused by de novo structural alterations2. The dominant mutations responsible tended to alter gene dosage and occur in regions of genomic instability caused by segmental duplications and low copy repeats3. It was not long until some inherited genomic disorders and mendelian diseases were linked to inherited CNVs in linkage studies4-6. However, despite the evidence of inheritance, early studies of CNV in normal individuals still referred to common CNVs as recurring, as if these CNVs had independent origins7 and were affecting as yet undefined phenotypes. Though rare and de novo CNVs have well-known roles in disease, with many having strong odds ratios of 2–308-10, we now know from microarray-based genome-wide CNV studies in phenotypically normal populations that most of the CNV in any individual’s genome arises from a reservoir of polymorphisms that are common11, stably 1 inherited11, and mostly benign12. Most deletions and simple duplications are now routinely assessed in genome-wide association