bioRxiv preprint doi: https://doi.org/10.1101/212431; this version posted November 1, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. Genetic variation across the repertoire alters odor perception

Casey Trimmer1,*, Andreas Keller2, Nicolle R. Murphy1, Lindsey L. Snyder1, Jason R. Willer3, Maira Nagai4,5, Nicholas Katsanis3, Leslie B. Vosshall2,6,7, Hiroaki Matsunami4,8, and Joel D. Mainland1,9

1Monell Chemical Senses Center, Philadelphia, Pennsylvania, USA 2Laboratory of Neurogenetics and Behavior, The Rockefeller University, New York, New York, USA 3Center for Human Disease Modeling, Duke University Medical Center, Durham, North Carolina, USA 4Department of Molecular Genetics and Microbiology, Duke University Medical Center, Durham, North Carolina, USA 5Department of Biochemistry, University of Sao Paulo, Sao Paulo, Brazil 6Howard Hughes Medical Institute, New York, New York, USA 7Kavli Neural Systems Institute, New York, New York, USA 8Department of Neurobiology and Duke Institute for Brain Sciences, Duke University Medical Center, Durham, North Carolina, USA 9Department of Neuroscience, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, USA *[email protected]

ABSTRACT

The human olfactory receptor repertoire is characterized by an abundance of genetic variation that affects receptor response, but the perceptual effects of this variation are unclear. To address this issue, we sequenced the OR repertoire in 332 individuals and examined the relationship between genetic variation and 276 olfactory phenotypes, including the perceived intensity and pleasantness of 68 odorants at two concentrations, detection thresholds of three odorants, and general olfactory acuity. Genetic variation in a single OR frequently associated with odorant perception, and we validated 10 cases in which in vitro OR function correlated with in vivo odorant perception using a functional assay. This more than doubles the published examples of this phenomenon. For eight of these 10 cases, reduced receptor function associated with reduced intensity perception. In addition, we used participant genotypes to quantify genetic ancestry and found that, in combination with single OR genotype, age and gender, we can explain between 10 and 20% of the perceptual variation in 15 olfactory phenotypes, highlighting the importance of single OR genotype, ancestry, and demographic factors in variation of olfactory perception.

Introduction ceptor variation can now be matched to individuals and receptor responses can be directly observed. Understanding how the olfactory system detects odor- have approximately 400 OR that are intact in ants and translates their features into perceptual infor- at least part of the population, but individuals have dif- mation is one of the fundamental questions in olfaction. ferent repertoires of pseudogenes, copy number vari- Although early color vision researchers were unable to ations (CNVs), and single nucleotide polymorphisms directly observe receptor responses, perceptual deficits (SNPs) that can alter receptor responses6–8. While non- caused by genetic variation, i.e. colorblindness, helped functional genes are rare in the (on average to show that color vision is mediated by three receptors 100 heterozygous and 20 homozygous pseudogenes responding to different wavelengths of light1, 2. Guillot, in an individual), they are significantly enriched in OR and then Amoore, extended this idea to olfaction, and genes9. This provides a useful set of “natural knock- proposed that cataloging specific anosmias, the inabil- outs” to examine the role of a single OR in olfactory ity to perceive a particular odorant, may provide similar perception—does loss of function in one OR affect odor- clues linking and perception3–5. Early applications ant detection threshold, intensity, pleasantness, or char- of this idea failed, presumably because olfaction relies acter? Alternatively, the large set of ORs may redun- on hundreds of receptors, and without direct observa- dantly encode odorant representations, such that loss tion of their responses, psychophysical tests could not of function of a single OR only rarely has perceptual con- untangle the fundamental rules of odor coding. sequences. Recent work suggests functional changes With the advent of next-generation genome sequenc- in a single receptor can have significant perceptual con- ing to profile olfactory receptor (OR) genes and cell- sequences, but data linking perceptual and genetic vari- 10–14 based assays to identify ligands for ORs, however, re- ation exist for only five ORs . bioRxiv preprint doi: https://doi.org/10.1101/212431; this version posted November 1, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

To answer this question, we collected a larger dataset and high-throughput sequencing (Supplementary Fig. under the premise that studies of natural perceptual vari- 1), potentially reflecting errors in variant calling due to ation will improve our understanding of the normal trans- read mismapping. We therefore used Sanger sequenc- lation of OR activity to odor perception. By taking advan- ing for OR10G4 and OR10G9. These results indicate tage of high-throughput sequencing, which permits low- that for ORs with high sequence similarity, alternate cost, high-coverage sequencing of the OR subgenome, sequencing methods may prove necessary. However, we examined how perception is affected by genetic vari- high-throughput sequencing successfully identified se- ation in the entire OR repertoire. To validate the func- quence variants with greater than 95% accuracy for tional link between genetic variation and perception, we eight of the 10 ORs examined and can be expected to used a cell-based assay to examine receptor response perform with similar accuracy on ORs where sequenc- to associated odorants. ing reads map with high confidence (90% of ORs have Here, we identified associations between genetic vari- a MAPQ > 30, mapping confidence = 99.9%). ation in 418 OR genes and 276 olfactory phenotypes, used cell-based functional assays to provide a mecha- Genetic variation in single ORs frequently associates nistic basis underlying the associations, and examined with odor perception the contribution of single OR genotype, genetic ances- We first examined the association between the geno- try, age, and sex to variation in olfactory perception. type of single ORs and 276 different phenotypes (Fig. 1 and Supplementary Table 2). Eight odorant per- Results ception phenotypes (12% of the tested odors) signifi- cantly correlated with variation in a single OR (p We carried out high-throughput sequencing of the en- < 0.05 following false discovery rate correction). Note tire OR subgenome in a cohort of 332 participants that multiple ORs in a single locus commonly associated previously phenotyped for their . This with a phenotype due to the structure of the olfactory dataset includes ratings of the perceived intensity and subgenome (see below). These results indicate that al- pleasantness of 68 odorants at two concentrations though a given odorant typically activates multiple ORs, (Supplementary Table 1), detection thresholds of three variation in a single OR frequently associated with per- odorants, and overall olfactory acuity15. Participants ceptual features. For these top associations, OR varia- rated each stimulus twice, and the median within- tion was more likely to associate with the perceived in- subject correlation was 0.63 for intensity rating and 0.57 tensity (88% of eight significant associations) than the for pleasantness. In addition, within-participant variabil- perceived pleasantness of an odorant (p = 0.07 via a ity was similar when ratings were collected either thirty Binomial Test). minutes or one year apart, indicating variability is stable over time15. In vitro assays confirm genetic associations High-throughput sequencing of the olfactory receptor Next ,we used a functional assay to search for a mech- gene family anistic explanation for the associations between OR ge- We used Illumina short-read DNA sequencing to ana- netic variation and perception. Because ORs are com- lyze a target region consisting of 418 ORs and 256 ol- monly found in clusters in the genome17, our associ- factory related genes (approximately 800 kilobases), ob- ation analysis had the resolution to identify regions in taining a minimum of 15x coverage for 96% of targeted the genome associated with odorant perception, but bases, and identified 19,535 variants in open reading could not discriminate between the causal mutation and frames and contiguous regions. We validated a subset nearby SNPs in linkage disequilibrium with the causal of these by comparing them to variants identified from mutation18–20. Sanger sequencing data for 10 ORs. For eight ORs, To illustrate this point, three different ORs in close we found greater than 95% concordance between the proximity on 6 significantly associated two sequencing methods (n ≥ 68 subjects). In contrast with the perceived intensity of 2-ethylfenchol: OR11A1, concordance between Sanger and high-throughput se- OR12D2, and OR10C1 (Fig. 2a). The top-associated quencing was 40% for OR10G4 and 81% for OR10G9. SNP (Supplementary Table 3), found in OR10C1, is in These ORs share a high degree of sequence similarity high linkage disequilibrium with SNPs in OR11A1 and (OR10G4 is 96% identical to OR10G9 and 95% iden- OR12D2 (correlation > 60% in the 1000 EUR tical to OR10G7 at the nucleotide level according to data21, 22). BLAST (basic local alignment search tool)16), decreas- To investigate these associations, we cloned all ma- ing the ability to reliably map a sequencing read to a jor haplotypes in this locus with a frequency greater genomic location. Lower confidence in mapping (rep- than 5% in our participant cohort and tested their re- resented by lower median mapping quality (MAPQ)) sponse to increasing doses of 2-ethylfenchol using a associated with lower concordance between Sanger cell-based luciferase reporter gene assay. The refer-

2/22 bioRxiv preprint doi: https://doi.org/10.1101/212431; this version posted November 1, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

ence haplotype of OR11A1 (MA) was previously demon- in cell culture, with a total of 14 responsive ORs (Fig. strated to respond to 2-ethylfenchol (Fig. 2b)14. We 2a, 3). Three OR/odorant pairs (OR7D4/androstenone, found that two additional haplotypes of OR11A1, MT OR7D4/androstadienone, and OR10G4/guaiacol) were and TA, each with a single amino acid change from the previously identified using this dataset with Sanger se- reference, responded to 2-ethylfenchol with similar sen- quencing11, 14 and confirmed here via Illumina sequenc- sitivities (log(EC50 of reference MA) = -6.8, log(EC50 of ing. For five of these 14 responsive ORs, OR geno- MT) = -6.3, log(EC50 of TA) = -6.2). However, the MT type associated with perceived intensity, and subjects and TA haplotypes had lower maximum responses that with less responsive haplotypes tended to rate the per- differed significantly from the reference MA haplotype ceived intensity of the receptors’ ligands to be less in- (sum of squares test MT against MA, F(3,42) = 529.8, tense than subjects with more responsive haplotypes p < 0.0001; sum of squares TA against MA, F(3,42) = (Fig. 2b,c, 3c,h,i, and a previously published corre- 418.9, p < 0.0001). In contrast, OR12D2 and OR10C1 lation between OR7D4 function and the perceived in- did not respond to 2-ethylfenchol in the in vitro assay. tensity of androstenone11). In contrast for one OR, OR Furthermore, OR11A1 genotype explained 13.1% of genotype associated with perceived intensity, and sub- the variance in the perceived intensity of 2-ethylfenchol jects with the more responsive haplotype rated the re- (p < 0.0001 following FDR correction). Participants ceptor’s ligand to be less intense (Fig. 3d). For two of rated 2-ethylfenchol to be 8 ranks more intense (out of these 14 ORs, OR genotype associated with perceived 68) for each reference MA haplotype they possessed, pleasantness, and subjects with less responsive hap- and 5 and 5.2 ranks less intense for each TA or MT lotypes rated the perceived pleasantness of the asso- haplotype they possessed, respectively (Fig. 2c). In ciated odorant to be higher (isoeugenol) or lower (fen- addition, participants with the reference MA haplotype chone) than subjects with more responsive haplotypes tended to rate 2-ethylfenchol to be less pleasant than TA (Fig. 3e,f). For three ORs, OR genotype associated or MT participants (p > 0.05 following FDR correction). with both intensity and pleasantness, and subjects with These results indicate that of the three ORs found to less responsive haplotypes rated the odorant as less in- significantly associate with the perceived intensity of 2- tense and more pleasant than subjects with more func- ethylfenchol in our analysis, OR11A1 was the best can- tional haplotypes (Fig. 3a, and the previously published didate for a causal receptor at this locus. correlations between OR7D4 and OR10G4 function and the perception of androstadienone11 and guaiacol14, re- spectively). Hyporesponsive haplotypes are associated with changes in perceived intensity or valence For three ORs, haplotype function did not clearly as- We performed a similar analysis for our top 50 associ- sociate with perception: similar in vitro function was as- ated OR/odorant phenotype pairs (colored points in Fig. sociated with changes in perception (Fig. 3b,j) or differ- 1), relating the response of associated and linked ORs ences in function were associated with little change in to odorant perception in our participants. Note that al- perception (Fig. 3g). These ORs may be responding to though several odor mixtures were included in the psy- the associated odorant, while having only a small effect chophysical stimulus set, we only examined associa- on perception. Overall, in vitro functional variation in a tions with monomolecular odorants (gray points in Fig. single OR predicted intensity or pleasantness percep- 1). After examining linkage disequilibrium at each locus tion, or both, for ten different OR/odorant pairs in this 11, 14 (Supplementary Fig. 2) and removing cases where dataset, three of which were previously published . multiple ORs from a single locus associated with the Finally, in four cases phenotypes significantly associ- same odorant, we identified a total of 36 unique OR ated with genetic variation in a single OR locus (p < 0.05 loci/odorant associations. Based on the false discov- following multiple comparisons correction), but we were ery rate for these top associations, we expected 66% unable to identify an OR which responded to the asso- to be false positives23. To discriminate false and real ciated odorant in cell culture (Fig. 4). Not all ORs have positive hits in our association analysis, we cloned all been functionally expressed in in vitro assay systems24. major haplotypes for associated ORs, in addition to at In order to determine if our inability to identify a causal least one haplotype for any OR linked to the associated OR was due to technical limitations, we modified the receptor (SNP correlation > 0.6, corresponding to 200 ORs in Figure 4 using conserved residues in rodent and clones in total, Supplementary Table 4), and tested primate orthologs as a guide and tested their response their response to the associated odorant in our heterol- to the associated odorant. Modified OR6Y1 had nine ogous assay. We tested clones that accounted for, on amino acid changes from human OR6Y1 and was found average, 74% of the haplotypes found in our participant in three primate species (Supplementary Fig. 3a); it population. responded to diacetyl, but not to 2-butanone, provid- We found that 11 (31%) of these loci had at least ing some support for the idea that OR6Y1 is the causal one OR which responded to the associated odorant receptor for the association with diacetyl, although we

3/22 bioRxiv preprint doi: https://doi.org/10.1101/212431; this version posted November 1, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

were unable to observe its response with our assay sys- lated with OR pseudogene frequency (p > 0.05 follow- tem (Supplementary Fig. 3b). ing FDR, data not shown), indicating that number of pseudogenes did not associate with changes in detec- tion threshold, perceived intensity, or perceived pleas- Genetic and demographic influences on human odor antness of single odorants, or our measurement of over- perception all olfactory acuity. We then determined how much phenotypic variance in odorant perception could be explained by various ge- Finally, we built a model to determine how much netic and demographic factors. First, we examined the phenotypic variance in odorant perception could be ex- relationship between genetic ancestry and odorant per- plained by four genetic and demographic factors: the ception in our participant cohort by performing a princi- genotype of the single OR that explained the most vari- pal component (PC) analysis on all genetic data avail- ance for each phenotype, genetic ancestry, gender, and able for our subjects. Self-reported ancestry clustered age (Supplementary Table 6). The genotype of a sin- when participants were plotted along the axes of the gle OR and genetic ancestry were significant contribu- first two PCs (Supplementary Fig. 4a). This clustering tors to several phenotypes (Fig. 5a). Age explained was not dependent on OR genes (correlation between greater than 4% of the variance in several different phe- PC1 from all genetic data and PC1 from non-OR ge- notypes (p < 0.0001). The relative perceived intensity netic data = 0.99, Supplementary Fig. 4b). PC1 sepa- of nonyl aldehyde, linalool, and isobutyraldehyde de- rated Caucasians and Asians from African-Americans creased with age, while the relative perceived pleas- (self-reported), while PC2 separated Caucasians and antness of decyl aldehyde and the detection threshold African-Americans from Asians (self-reported). of isovaleric acid increased with age. Note that these We next examined the correlation of each PC with were changes in relative ranking, not overall decreases all 276 phenotypes. PC1 explained greater than 4% of in sensitivity due to age. Gender explained a small the variance in six different phenotypes (p < 0.01 fol- percentage of variance for some olfactory phenotypes. lowing FDR correction) (Supplementary Fig. 4c and Men tended to rate the relative perceived intensity of Supplementary Table 5). For example, PC1 signif- terpineol, banana, and citral to be higher, the perceived icantly correlated with the perceived pleasantness of intensity of fir to be lower, and had a lower overall olfac- vanillin (r = 0.28, p < 0.0001 following FDR), with self- tory acuity than women (p < 0.002). reported Caucasians and Asians rating vanillin as more The median percentage of total phenotypic variance pleasant than African-Americans (Supplementary Fig. we could explain using single OR genotype, genetic an- 4d). PC2 had a smaller effect on olfactory pheno- cestry, gender, and age was 4.77% (Fig. 5b). We then types (Supplementary Fig. 4e, p > 0.05 for all phe- assessed the relative contribution of all four factors to notypes) and explained at most 4% of the variance the 25 phenotypes for which we could explain the most in spearmint perception (r = -0.20, p = 0.054 follow- variance (p < 0.0001) (blue in Fig. 5b and shown in ing FDR), which self-reported Asians tended to rate as more detail in Fig. 5c). The test-retest correlation for more pleasant than African-Americans and Caucasians a phenotype provided an upper bound for the amount (Supplementary Fig. 4f and Supplementary Table of perceptual variance we could explain (shown in gray 5). These results indicate that genetic ancestry is an Fig. 5c)25. Note we did not measure test-retest cor- important contributor to variation in some olfactory phe- relation values for detection thresholds, as these mea- notypes. surements were only collected once. For some percep- In addition, we examined the frequency of genes tual phenotypes our model accounted for over 80% of that are pseudogenized in a subset of our cohort. We the explainable variance, such as the perceived inten- designated OR haplotypes with nonsense or frameshift sity of diacetyl, where single OR genotype is the main mutations as non-functional. The median number of contributor, and the perceived intensity of nonyl alde- pseudogenized ORs in an individual was 34 out of 418 hyde, where age is the main explanatory variable. For ORs that are intact in at least part of the population other phenotypes, such as the perceived intensity of 2- (Supplementary Fig. 5a), and pseudogene frequency ethylfenchol and linalool, OR genotype, genetic ances- significantly correlated with PC1, as subjects who self- try, age, and gender accounted for less than half of the reported as African-American tended to have a higher explainable variance. These results indicate that while number of non-functional ORs than Caucasian and genetic variation in a single OR is an important contrib- Asian subjects (r = -0.36, p < 0.0001) (Supplementary utor to perceptual variance, methods that consider mul- Fig. 5b). These observations are in accordance with tiple ORs may allow us to account for more of the ex- previous work demonstrating greater pseudogene fre- plainable variance in a particular phenotype. quency in the entire genome in African versus Cau- casian and Asian populations in the 1000 Genomes cohort9. No perceptual phenotypes significantly corre-

4/22 bioRxiv preprint doi: https://doi.org/10.1101/212431; this version posted November 1, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Discussion perceived pleasantness of the ligand in four out of ten cases. Unlike the correlation with perceived intensity, We performed a large-scale examination of the rela- loss of function led to reduced pleasantness in one tionship among OR genetic variation, OR activation, case (isoeugenol) and increased pleasantness in three and odorant perception. We examined 36 different OR cases (fenchone, guaiacol, and androstadienone). For loci/odorant pairs in cell culture, demonstrated that at most associations (nine out of ten), OR variation corre- least one OR from 11 of these loci responded to the lated with changes in both intensity and pleasantness, associated odorant, and found that OR response in although the effect on the latter was usually smaller (be- vitro matched the perceived intensity or pleasantness low the top 50 associations). Rarely (one case out of of the associated odorant for 10 different OR/odorant ten) was variation in OR function related to pleasant- pairs. As three of these pairs have been previously pub- ness changes with no associated changes in perceived lished11, 14, here we describe seven new cases where intensity (isoeugenol in Fig. 3f). One possibility is that genetic variation in a single OR predicts intensity or the change in intensity is driving pleasantness, but with- pleasantness perception. To our knowledge, this is the out knowing how pleasantness changes across a full first large-scale examination of the association between range of concentrations, we cannot test this hypothesis the entire OR subgenome and a large group of olfactory here. phenotypes. Our results have important implications for olfactory Finally, we found that genetic variation in a single coding. First, we demonstrated that the perception of receptor had a larger effect on intensity and pleasant- 13% of the 68 odors tested were tied to genetic and func- ness than on detection threshold. Previous studies fo- tional variation in a single OR. This increases the known cused on the correlation between OR genetic variation 10, 12, 28 cases directly linking OR activation to odorant percep- and differences in detection threshold , but in our tion from five to 1210–14 and places a constraint on the dataset, no single OR associated with the detection amount of redundancy in the combinatorial code. These threshold of vanillin, isovaleric acid, or pentadecalac- cases provide valuable tools to explore and manipulate tone at a significance level that warranted testing in the olfactory code using agonists or antagonists and to cell culture. Poor genotyping frequency prevented us examine the contribution of the activation of individual from testing a published association between OR11H7 10 ORs to the coding of odorant information. and isovaleric acid detection threshold (Fig. 1). How- Second, loss of function of an OR correlated with a ever, the genotype of OR56A4 significantly associated decrease in the perceived intensity of the ligand in eight with the perceived pleasantness of isovaleric acid, but out of ten cases in this dataset, in accordance with previ- not its detection threshold. Similarly, OR7D4 geno- 11, 13, 14 type explains more variance in the perceived intensity ous studies . The specificity of the effect is incon- 11 sistent with a proposed model where the bulk of ORs en- of androstenone than its detection threshold . Differ- code odorant identity and a subset of broadly tuned re- ences in phenotype measurements may also account ceptors encode odorant intensity26. Another proposed for our failure to find a previously identified association model of intensity coding is that the total number of ORs between OR2J3 genotype and cis-3-hexenol percep- tion, as the original study examined detection threshold activated at a given concentration encodes odorant in- 12 tensity. While the number of functional ORs correlated and here we measured perceived intensity (Fig. 1). with the relative intensity within one odorant, this may The high linkage disequilibrium that characterizes not apply across odorants as several odorants that acti- many OR loci makes identifying causal ORs difficult vate a large number of OR types in vitro are not particu- using association analyses of the scale discussed larly intense (eugenol), while odorants that appeared to here17, 29. High resolution analyses capable of dis- activate only a few receptors are particularly potent (thi- secting these loci would require perceptual data from ols). However, these observations may be explained by an extremely large number of subjects. For example, difficulties in comparing concentrations across in vitro genome-wide association studies (GWAS) examining and in vivo studies. The repertoire of activated ORs asparagus urine odor anosmia and cilantro preference changes with odorant concentration, and therefore ge- in 4,742 and 26,455 subjects, respectively, were still un- netic variation in an OR will be perceptually relevant able to narrow down causal regions beyond a cluster of only at certain concentrations. We tested two concentra- ORs30, 31. To overcome these limitations, we used a cell- tions of each odorant, and found that perceived intensity based assay to identify the response of the associated was associated with genetic variation in a particular OR and linked ORs to odorants. more strongly for one concentration. This is consistent Heterologous assays have identified ligands for only with work showing that different receptors in Drosophila 12% of the roughly 400 intact human ORs to date32, melanogaster are necessary for perception in different suggesting that these assays need to be improved to concentration ranges27. functionally express all human ORs. Based on this suc- Third, loss of function in an OR correlated with the cess rate, the expected rate of identifying a causal OR

5/22 bioRxiv preprint doi: https://doi.org/10.1101/212431; this version posted November 1, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

underlying an association was around 12%. Because genetic data available for our subjects (roughly 19,000 the false discovery rate for our top 50 associations was SNPs in both ORs and 256 additional genes), bypass- 66%, among the 36 OR/odorant pairs examined, we ex- ing self-reporting and allowing us to incorporate sub- pected that 12 would be true positives and 24 would be jects who self-reported their ancestry as “Other”. Al- false positives. We in fact found 11 cases where the though we found no significant associations between associated OR/odorant pair was active in vitro, which SNPs in the 256 non-OR genes and the tested pheno- was much higher than the expected 12% (approximately types (Supplementary Table 3), the SNPs were useful one case) given previous work in heterologous assays. in establishing ancestry. We found that ancestry was These results suggest that the receptors that are rele- able to explain a significant portion of the variance in un- vant to perception are enriched in the set of receptors decanal and 2-decenal and the perceived pleasantness that respond in the in vitro assay. ORs which are func- and detection threshold of vanillin (also shown in15), tional in vitro are also more likely to be found in the among others. From these data alone, we were unable set of receptors expressed in the human olfactory ep- to determine if these differences in odorant perception ithelium (OE)33. Furthermore, deorphanized (as listed were due to unknown genetic, cultural, or social factors in33) or perceptually relevant (10–14, Fig. 2, 3) ORs are that co-segregate with ancestry. In conjunction with our expressed at levels 1.5 times higher than other intact finding that OR pseudogene frequency was higher in ORs34 (p < 0.0001 for both via a binomial test). One pos- subjects who self-identified as African-American, this sibility is that a large fraction of intact human ORs are work demonstrates the importance of considering an- nonfunctional in vivo and are not expressed in the OE. cestry when studying odorant perception in diverse pop- In summary, our results indicate that cell-based assays ulations. Models incorporating OR genotype, ances- are a useful proxy for identifying behaviorally relevant try, age, and gender accounted for over 70% of the ex- ORs that are expressed in the OE and whose activation plainable variance (test-retest correlation) for some ol- can be directly tied to perception. Despite these suc- factory phenotypes (guaiacol, diacetyl, and nonyl alde- cesses, there are certainly cases where in vitro results hyde) and less than half of the explainable variance for do not predict perception—in vitro assays lack critical others (2-ethylfenchol, linalool, and androstadienone). components of the OE, including in the mucous These results indicate that considering the contribution layer which transport and modify odorant molecules24. of multiple ORs may be useful in explaining more of Cell-based assays were unable to identify a responsive the variance for some olfactory phenotypes. While the OR for some loci for which we have strong association contribution of genetic and demographic factors to vari- data (Fig. 4a), potentially due to either the association ation in odorant perception has been previously exam- being spurious or the receptor failing to function in the ined, this is the first investigation to examine the entire cell-based assay. In one instance, a modified version OR subgenome, quantify ancestry using genetic varia- of OR6Y1 derived from rodent and primate orthologs re- tion, and relate these and other demographic factors to sponded to diacetyl, supporting the idea that, although a large group of olfactory phenotypes. This work also the human receptor did not function properly in the in highlights the necessity of considering the contribution vitro assay (Supplementary Fig. 3), it was the causal of multiple ORs to odorant perception. receptor underlying this association. Examining cases where the in vitro assay does not match perceptual out- Although we know the OR gene family is character- comes may provide guidance on how to improve these ized by a large amount of genetic and functional varia- assays in the future. tion, the combinatorial nature of the olfactory code and our limited knowledge of OR/odorant pairs makes it dif- We found that several olfactory phenotypes were sig- ficult to translate this variation to differences in percep- nificantly influenced by ancestry. Ancestry is a common tion. Here we focused on the perceptual consequences confounding factor in association studies that incorpo- of loss-of-function of individual ORs, demonstrating that rate different subpopulations35. Although most previous intensity and pleasantness coding for some odors is not studies examining the association between OR geno- redundant and that loss of function in a receptor reduces type and odorant perception circumvented this problem the perceived intensity of the receptor’s ligand. Simi- by using a homogenous participant cohort10, 28, 36, stud- lar studies on colorblindness, a condition in which ge- ies conducted in diverse populations indicate that an- netic variation alters color perception, helped determine cestry may be a significant factor in odorant percep- the tuning of the three photoreceptors to different wave- tion29. A previous examination of the relationship be- lengths of light. Deciphering the quantitative represen- tween participant demographics and odorant perception tation of color by photoreceptors allowed us to digitize in our cohort demonstrated that self-reported ancestry color information so that it can be sent and stored with- (African-American, Caucasian, Asian) significantly cor- out degradation, as well as develop representations of related with some perceptual phenotypes15. Here, we color space that outline how wavelengths of light can extended these results by quantifying ancestry using all be combined to make novel colors. Understanding how

6/22 bioRxiv preprint doi: https://doi.org/10.1101/212431; this version posted November 1, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

the olfactory receptors encode odors should lead to sim- the solvent. All three measurements were ranked on a ilar advances in olfaction: namely digitizing odors and per-task basis, the ranks were averaged for each sub- identifying agonists or antagonists of receptors that can ject, and finally this average was expressed as a rank produce any desired olfactory percept from a small set among all subjects from 1 (lowest acuity) to 391 (highest of primary odors. acuity). Therefore, 276 different phenotypes were ex- amined: perceived intensity and pleasantness rank of 66 odorants at two concentrations, two solvents at one Methods concentration but included in the ranking for both high and low odorant concentration, detection threshold rank Psychophysical testing of three odorants, and overall olfactory acuity. Collection of psychophysical data was previously re- ported by Keller et al. and approved by the Rocke- feller University Institutional Review Board11, 15. Briefly, Sequencing sample preparation and genotyping 391 subjects rated both the intensity and pleasantness Genomic DNA was prepared from venous blood sam- of 66 odors at two concentrations (designated “high” ples with the PAXgene Blood DNA kit (Qiagen). DNA and “low”) and two solvents on a scale of 1 to 7, 1 be- was sheared (Covaris) and ligated to adapters neces- ing “extremely weak” or “extremely unpleasant” and 7 sary for both sequencing and barcoding samples using being “very strong” or “extremely pleasant”. The high the TruSeq kit (Illumina). The OR subgenome was cap- and low odorant concentrations were intensity-matched tured using an Agilent SureSelect Target Enrichment kit to 1/1,000 and 1/10,000 dilutions of 1-butanol, respec- custom-designed to enrich for the open reading frame tively. For six odorants, pure odorant was rated as of ORs and human orthologs of 256 additional genes less intense than the 1/1000 dilution of 1-butanone, found to be highly expressed in mouse olfactory sensory and these were not diluted for testing (odors described neurons (unpublished data) (SureSelect ELID 0352781, in Supplementary Table 1). In addition, the detec- Supplementary Table 7)37. Paired-end sequencing tion thresholds for three odorants (pentadecalactone, was carried out on 332 participants using an Illumina vanillin, and isovaleric acid) were determined for each GAIIx with a read length of 2x75 basepairs. Each sam- subject. ple was individually barcoded and twelve samples were Subjects rated the intensity and pleasantness of each multiplexed per lane. odorant/concentration twice. Within-subject variability Sequence variants were identified using a custom- in odorant rating was determined by calculating the cor- made pipeline that followed the current best practices relation between the first and second rating of all odor- recommended for variant detection by the Broad Insti- ants. Test-retest correlation was calculated by examin- tute38, 39. Reads were aligned to the hg19 reference ing the correlation between the first and second rating genome using BWA40, and the Genome Analysis Toolkit for each olfactory phenotype where duplicate trials were (GATK)41 was used to remove PCR duplicates, realign run. For each subject, the average intensity and pleas- reads around insertions and deletions (indels), recali- antness ratings at each odorant concentration (low and brate base quality scores, and genotype variant sites high) were ranked from 1 to 68, such that the odorant (SNPs and indels) across all 332 subjects simultane- with the highest rated intensity for a concentration was ously using variant quality score recalibration (SNPs) ranked as 68 and the odorant with the lowest rated in- or standard hard filtering (indels). SNPs were phased tensity was ranked as 1. Solvents (propylene glycol and with SHAPEIT42 (excluding SNPs genotyped at a fre- paraffin oil) were rated three times at a single concen- quency < 95% and SNPs which deviated from Hardy- tration, which was averaged and included in the ranking Weinberg equilibrium (p < 0.00001))20, and a custom- with the other 66 odorants at both concentrations. Rank- written R43 script was used to translate the phased vari- ing on a per-subject basis controlled for the different use ant call file into 836 full-length haplotypes (418 ORs x of the rating scale among subjects. Detection thresh- a maternal and a paternal haplotype) for each subject. olds were ranked on a per-odorant basis, such that the Finally, Sanger sequencing was used to confirm the se- subject with the highest detection threshold for a partic- quence of 10 ORs in at least 68 subjects. Genomic ular odorant received a ranking of 1 and the subject with DNA was amplified with Phusion DNA Polymerase the lowest detection threshold for an odorant received (Thermo Fisher Scientific) or EmeraldAmp PCR Mas- a ranking of 391. Three measurements were used to ter Mix (Clontech) using primers up- and downstream calculate general olfactory acuity: percentage of odor- of the OR’s open reading frame (Supplementary Ta- ants where the high concentration was rated as more ble 8). PCR products were purified by ultrafiltration on a intense than the low concentration, percentage of odor- vacuum manifold (NucleoFast 96 PCR, Machery Nagel) ants where the high concentration was rated as more in- and sequenced (ABI 3730XL) at the University of Penn- tense than solvent, and percentage of odorants where sylvania DNA Sequencing Facility. the low concentration was rated as more intense than Putative OR pseudogenes were identified by annotat-

7/22 bioRxiv preprint doi: https://doi.org/10.1101/212431; this version posted November 1, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

ing variants using Variant Effect Predictor to determine and p-values were corrected for multiple comparisons type and impact of mutations44. The sum of OR haplo- using false discovery rate. The total contribution of sin- types with nonsense or frameshift mutations was deter- gle OR genotype, ancestry, age, and gender to pheno- mined for each individual. type variability was calculated by regressing the count of individual OR haplotypes against all 276 phenotypes Association Analysis and incorporating the first two PCs calculated from all The association between olfactory phenotypes and genetic data, age (in years), and gender as covariates single OR genotype was analyzed using multiple (full model). P-values were corrected for multiple com- linear regression to regress the haplotype count (0, parisons using false discovery rate. To calculate the in- 2 1, or 2) of individual ORs against all 276 phenotype dividual contribution (r ) of single OR genotype, ances- measurements using the R statistical package. The try, age, and gender to each phenotype, the percentage analysis was limited to haplotypes found at a frequency of variance explained by a linear model in which each greater than 5% in our cohort and ORs with low covariate was removed was compared to the full linear genotype frequency were removed (29 ORs: OR4F5, model. To determine if a covariate significantly altered OR2T8, OR2T7, OR2T5, OR2T29, OR2T35, OR2T27, the linear model, a one-way analysis of variance was OR5AC1, OR5H8, OR2A4, OR10AC1, OR2A42, used to compare models with and without the covariate. OR2A7, OR2A1, OR4F21, OR51A2, OR52Z1, Consensus odorant receptor OR52N1, OR5G3, OR8G1, OR11H12, OR11H2, The online version of MAFFT version 7 was utilized to OR4Q2, OR11H7. OR4E1, OR4F4, OR1D5, OR1D4, create OR6Y1, OR6B2, and OR56A4 consensus pro- OR4F17). Note the ORs eliminated did not have teins based on the alignment of the orthologs found any single SNPs which significantly associated with in Homo sapiens, Gorilla gorilla gorilla, Pan paniscus, odorant perception (see below for details on SNP Pan troglodytes, Pongo abelii, Macaca mulatta, Man- association analysis). To correct for population struc- drillus leucophaeus, Callithrix jacchus, Microcebus mur- ture (ancestry), the first two PCs calculated from all inus, Rattus norvegicus, and Mus musculus48, 49. The genetic data were incorporated as covariates in the consensus genes were then designed for expression in linear model35. Principal components were calculated human cells using the IDT Codon Optimization Tool and using the SNPRelate45 package in R (after removing synthesized as a standard IDT gBlocks Gene Fragment. SNPs genotyped at a rate < 95% and SNPs in linkage disequilibrium > 0.5)20. P-values were corrected for OR cloning multiple comparisons using false discovery rate23. OR haplotyes for functional testing in cell culture were Although our ranked data is not normally distributed, generated by cloning the respective sequence from linear regression was used to find the coefficients pooled genomic DNA, from a specific subject, or by for each haplotype. Genotype/phenotype association generating polymorphic SNPs by site-directed mutage- was also analyzed using a Kruskal-Wallis test, which nesis using overlap extension PCR50. The open read- demonstrated that p-values between the parametric ing frame of each OR was amplified with Phusion poly- and non-parametric test were significantly correlated merase and cloned into the pCI vector (Promega) con- (r2 = 0.77, p < 0.0001). taining the first 20 amino acids of human rhodopsin51. To analyze phenotype association with the genotype Cloned sequences were confirmed by Sanger sequenc- of single SNPs, individual SNP counts were regressed ing (ABI 3730XL) at the University of Pennsylvania DNA against phenotype measurements, and the first 2 PCs Sequencing Facility. were incorporated as covariates to correct for ances- 46, 47 try in the study population using PLINK . The analy- Luciferase assay sis was limited to variants with minor allele frequencies The Dual-Glo Luceriferase Assay System (Promega) greater than 5% that did not significantly deviate from 20 was used to measure in vitro OR activity as described Hardy-Weinberg equilibrium (p > 0.00001) . P-values previously52, 53. Hana3A cells were co-transfected with were corrected for multiple comparisons using false dis- OR, a short from of receptor transporter 1 covery rate. To examine linkage disequilibrium among (RTP1S)54, the type 2 muscarinic acetylcholine recep- SNPs in OR loci, SNP genotype correlations were calcu- tor (M3-R)55, Renilla luciferase driven by an SV40 pro- lated from 1000 Genomes data (European population) 21, 22 moter, and firefly luciferase driven by a cyclic AMP re- using LocusZoom . sponse element. 18-24 hours post-transfection, ORs were treated with medium or serial dilutions of odorants Contribution of ancestry, age, and gender to percep- spanning 1nM to 1mM in triplicate. Four hours after tion odorant stimulation, luciferase activity was measured First, the correlation between olfactory phenotypes and using the Synergy 2 (BioTek). Normalized luciferase ac- the first two PCs of genetic variation was calculated, tivity was calculated by dividing firefly luciferase values

8/22 bioRxiv preprint doi: https://doi.org/10.1101/212431; this version posted November 1, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

by Renilla luciferase values for each well. Results repre- URL http://www.ncbi.nlm.nih.gov/pubmed/ sent mean response (for 3 wells) +/- s.e.m. Responses 22344438%5Cnhttp://www.pubmedcentral.nih. were fit to a three-parameter sigmoidal curve. An odor- gov/articlerender.fcgi?artid=PMC3299548% ant was considered an agonist if the standard error of 5Cnhttp://www.sciencemag.org/cgi/doi/ the logEC50 was less than 1 log unit, the 95% confi- 10.1126/science.1215040. DOI 10.1126/sci- dence intervals for the top and bottom parameters of the ence.1215040. curve did not overlap, and the extra sum-of-squares test 10. Menashe, I. et al. Genetic elucidation of human confirmed that the odorant activated OR-transfected hyperosmia to isovaleric acid. PLoS biology 5, cells more than empty-vector-transfected cells. An ex- e284 (2007). URL http://www.pubmedcentral. tra sum-of-squares was also used to determine if one nih.gov/articlerender.fcgi?artid=2043052& model fit the data from two haplotypes better than two tool=pmcentrez&rendertype=abstract. DOI separate models. Data were analyzed with GraphPad 10.1371/journal.pbio.0050284. Prism 6. 11. Keller, A., Zhuang, H., Chi, Q., Vosshall, L. B. & Mat- sunami, H. Genetic variation in a human odorant References receptor alters odour perception. Nat. 449, 468– 1. Rushton, W. A cone pigments in the protanope. J 72 (2007). URL http://www.ncbi.nlm.nih.gov/ Physiol 168, 345–359 (1963). /17873857. DOI 10.1038/nature06162. 2. Nathans, J., Piantanida, T., Eddy, R., Shows, T. & 12. McRae, J. F. et al. Genetic variation in the Hogness, D. Molecular genetics of inherited vari- odorant receptor OR2J3 is associated with ation in human color vision. Sci. 232, 203–210 the ability to detect the ”grassy” smelling (1986). DOI DOI: 10.1126/science.3485310. odor, cis-3-hexen-1-ol. Chem. senses 37, 585–93 (2012). URL http://proxy.library. 3. Guillot, M. Physiologie des sensations–anosmies upenn.edu:2626/content/37/7/585.full. DOI partielles et odeurs fondamentales. C.R. Hebd. 10.1093/chemse/bjs049. Acad. Sci. 226, 1307–09 (1948). 13. Jaeger, S. et al. A Mendelian Trait for Ol- 4. Amoore, J. E. Specific Anosmia: a Clue factory Sensitivity Affects Odor Experience to the Olfactory Code. Nat. 214, 1095–1098 and Food Selection. Curr. Biol. 1–5 (2013). (1967). URL http://www.nature.com/nature/ URL http://linkinghub.elsevier.com/ journal/v214/n5093/pdf/2141095a0.pdf. DOI retrieve/pii/S0960982213008531. DOI 10.1038/2141095a0. 10.1016/j.cub.2013.07.030. 5. Amoore, J. E. Specific Anosmia and the Concept 14. Mainland, J. D. et al. The missense of smell: of Primary Odors. Chem. Senses 2, 267–281 functional variability in the human odorant re- (1977). URL http://chemse.oxfordjournals. ceptor repertoire. Nat. neuroscience 17, 114– org/content/2/3/267.abstract. DOI 20 (2014). URL http://www.ncbi.nlm.nih.gov/ 10.1093/chemse/2.3.267. pubmed/24316890. DOI 10.1038/nn.3598. 6. Menashe, I., Man, O., Lancet, D. & Gilad, Y. Dif- 15. Keller, A., Hempstead, M., Gomez, I. A., ferent noses for different people. Nat. genetics 34, Gilbert, A. N. & Vosshall, L. B. An olfac- 143–4 (2003). URL http://www.ncbi.nlm.nih. tory demography of a diverse metropolitan gov/pubmed/12730696. DOI 10.1038/ng1160. population. BMC neuroscience 13, 122 7. Hasin-Brumshtein, Y., Lancet, D. & Olender, T. Hu- (2012). URL http://www.pubmedcentral.nih. man olfaction: from genomic variation to pheno- gov/articlerender.fcgi?artid=3493268& typic diversity. Trends genetics : TIG 25, 178– tool=pmcentrez&rendertype=abstract. DOI 84 (2009). URL http://www.ncbi.nlm.nih.gov/ 10.1186/1471-2202-13-122. pubmed/19303166. DOI 10.1016/j.tig.2009.02.002. 16. Altschul, S. F., Gish, W., Miller, W., Myers, E. & Lip- 8. Olender, T. et al. Personal receptor reper- man, D. Basic local alignment search tool. J Mol toires: olfaction as a model. BMC genomics 13, Biol 215, 403–410 (1990). 414 (2012). URL http://www.pubmedcentral. 17. Glusman, G., Yanai, I., Rubin, I. & Lancet, D. The nih.gov/articlerender.fcgi?artid=3462693& complete human olfactory subgenome. Genome tool=pmcentrez&rendertype=abstract. DOI research 11, 685–702 (2001). URL http:// 10.1186/1471-2164-13-414. genome.cshlp.org/content/11/5/685.full. DOI 9. MacArthur, D. G. et al. A Systematic Survey 10.1101/gr.171001. of Loss-of-Function Variants in Human Protein- 18. Daly, M. J., Rioux, J. D., Schaffner, S. F., Hudson, Coding Genes. Sci. 335, 823–828 (2012). T. J. & Lander, E. S. High-resolution haplotype

9/22 bioRxiv preprint doi: https://doi.org/10.1101/212431; this version posted November 1, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

structure in the . Nat. genetics 29, tool=pmcentrez&rendertype=abstract. DOI 229–32 (2001). URL http://www.ncbi.nlm.nih. 10.1186/jbiol108. gov/pubmed/11586305. DOI 10.1038/ng1001-229. 28. McRae, J. et al. Identification of Regions Associ- 19. Gabriel, S. B. et al. The structure of haplotype ated with Variation in Sensitivity to Food- Related blocks in the human genome. Sci. (New York, Odors in the Human Genome. Curr. Biol. 1–5 N.Y.) 296, 2225–9 (2002). URL http://www.ncbi. (2013). URL http://linkinghub.elsevier. nlm.nih.gov/pubmed/12029063. DOI 10.1126/sci- com/retrieve/pii/S0960982213008543. DOI ence.1069424. 10.1016/j.cub.2013.07.031. 20. Anderson, C. a. et al. Data quality control in 29. Jaeger, S. R., McRae, J. F., Salzman, Y., Williams, genetic case-control association studies. Nat. L. & Newcomb, R. D. A preliminary investiga- protocols 5, 1564–73 (2010). URL http://www. tion into a genetic basis for cis-3-hexen-1-ol nature.com/nprot/journal/v5/n9/abs/nprot. odour perception: A genome-wide association 2010.116.html%5Cnhttp://www.pubmedcentral. approach. Food Qual. Prefer. 21, 121–131 nih.gov/articlerender.fcgi?artid=3025522& (2010). URL http://linkinghub.elsevier. tool=pmcentrez&rendertype=abstract. DOI com/retrieve/pii/S0950329309001268. DOI 10.1038/nprot.2010.116. 10.1016/j.foodqual.2009.08.011. 21. Pruim, R. J. et al. LocusZoom: Regional visualiza- 30. Eriksson, N. et al. Web-based, participant- tion of genome-wide association scan results. Bioin- driven studies yield novel genetic associations forma. 27, 2336–2337 (2011). DOI 10.1093/bioin- for common traits. PLoS genetics 6, e1000993 formatics/btq419. (2010). URL http://www.pubmedcentral.nih. 22. Abecasis, G. R. et al. A map of human gov/articlerender.fcgi?artid=2891811& genome variation from population-scale tool=pmcentrez&rendertype=abstract. DOI sequencing. Nat. 467, 1061–73 (2010). 10.1371/journal.pgen.1000993. URL http://www.pubmedcentral.nih.gov/ 31. Eriksson, N. et al. A genetic variant near olfac- articlerender.fcgi?artid=3042601&tool= tory receptor genes influences cilantro preference. pmcentrez&rendertype=abstract. DOI Flavour 1, 1–9 (2012). 10.1038/nature09534. 32. de March, C. A., Ryu, S., Sicard, G., Moon, C. & 23. Benjamini, Y. & Hochberg, Y. Controlling the false Golebiowski, J. Structure-odour relationships re- discovery rate: A practical and powerful approach viewed in the postgenomic era. Flavour Fragr. J. to multiple testing. J. Royal Stat. Soc. Ser. B 289– 30, 342–361 (2015). 300 (1995). 33. Verbeurgt, C. et al. Profiling of olfactory receptor 24. Peterlin, Z., Firestein, S. & Rogers, M. E. The gene expression in whole human olfactory mucosa. state of the art of odorant receptor deorphaniza- PLoS ONE 9, 21–26 (2014). DOI 10.1371/jour- tion: A report from the orphanage. The J. nal.pone.0096333. general physiology 1–16 (2014). URL http:// 34. Olender, T. et al. The human olfactory tran- www.ncbi.nlm.nih.gov/pubmed/24733839. DOI scriptome. BMC Genomics 17, 619 (2016). 10.1085/jgp.201311151. URL http://bmcgenomics.biomedcentral.com/ 25. Keller, A. et al. Predicting human olfactory articles/10.1186/s12864-016-2960-3. DOI perception from chemical features of odor 10.1186/s12864-016-2960-3. molecules. Sci. 355, 820–826 (2017). URL 35. Price, A. L., Zaitlen, N. A., Reich, D. & Patter- http://www.sciencemag.org/lookup/doi/ son, N. New approaches to population stratifica- 10.1126/science.aal2014. DOI 10.1126/sci- tion in genome-wide association studies. Nat. Rev. ence.aal2014. 11, 459–463 (2010). URL http://dx.doi.org/10. 26. Firestein, S. How the olfactory system makes 1038/nrg2813. DOI 10.1038/nrg2813. sense of scents. Nat. 413, 211–8 (2001). URL http: 36. Knaapila, A. et al. A genome-wide study on //www.ncbi.nlm.nih.gov/pubmed/11557990. DOI the perception of the odorants androstenone 10.1038/35093026. and galaxolide. Chem. senses 37, 541–52 27. Asahina, K., Louis, M., Piccinotti, S. & Vosshall, (2012). URL http://www.pubmedcentral.nih. L. B. A circuit supporting concentration-invariant gov/articlerender.fcgi?artid=3452230& odor perception in Drosophila. J. Biol. 8, 9 tool=pmcentrez&rendertype=abstract. DOI (2009). URL http://jbiol.com/content/ 10.1093/chemse/bjs008. 8/1/9%5Cnhttp://www.pubmedcentral.nih. 37. Mainland, J. D., Willer, J. R., Matsunami, H. gov/articlerender.fcgi?artid=2656214& & Katsanis, N. Next-Generation Sequencing

10/22 bioRxiv preprint doi: https://doi.org/10.1101/212431; this version posted November 1, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

of the Human Olfactory Receptors. Olfac- 48. Katoh, K. & Standley, D. M. MAFFT multiple se- tory Recept. Methods Protoc. 1003, 133–147 quence alignment software version 7: Improve- (2013). URL http://link.springer.com/10. ments in performance and usability. Mol. Biol. Evol. 1007/978-1-62703-377-0. DOI 10.1007/978-1- 30, 772–780 (2013). DOI 10.1093/molbev/mst010. 62703-377-0. 49. Katoh, K. MAFFT version 7. 38. Auwera, G. A. V. D. et al. From FastQ data 50. Heckman, K. L. & Pease, L. R. Gene splicing to high confidence varant calls: the Genonme and mutagenesis by PCR-driven overlap exten- Analysis Toolkit best practices pipeline. In sion. Nat. protocols 2, 924–32 (2007). URL http: Bateman, A., Pearson, W. R., Stein, L. D., //www.ncbi.nlm.nih.gov/pubmed/17446874. DOI Stormo, G. D. & Yates, J. R. (eds.) Curr 10.1038/nprot.2007.132. Protoc Bioinformatics, vol. 11 (2014). DOI 10.1002/0471250953.bi1110s43.From. 51. Krautwurst, D., Yau, K.-w., Reed, R. R. & Hughes, H. Identification of Ligands for Olfactory Receptors 39. DePristo, M. A. et al. A framework for variation dis- by Functional Expression of a Receptor library. Cell covery and genotyping using next-generation DNA 95, 917–926 (1998). sequencing data. Nat. genetics 43, 491–8 (2011). URL http://dx.doi.org/10.1038/ng.806. DOI 52. Trimmer, C., Snyder, L. L. & Mainland, J. D. High- 10.1038/ng.806. throughput analysis of mammalian olfactory recep- tors: measurement of receptor activation via lu- 40. Li, H. & Durbin, R. Fast and accurate short read ciferase activity. J. visualized experiments : JoVE alignment with Burrows-Wheeler transform. Bioin- 1–10 (2014). URL http://www.ncbi.nlm.nih. forma. 25, 1754–1760 (2009). DOI 10.1093/bioin- gov/pubmed/24961834. DOI 10.3791/51640. formatics/btp324. 53. Zhuang, H. & Matsunami, H. Evaluating cell- 41. McKenna, A. et al. The Genome Analysis surface expression and measuring activation of Toolkit: a MapReduce framework for analyzing mammalian odorant receptors in heterologous next-generation DNA sequencing data. Genome cells. Nat. protocols 3, 1402–13 (2008). URL research 20, 1297–303 (2010). URL http: http://proxy.library.upenn.edu:2219/nprot/ //genome.cshlp.org/content/20/9/1297.full. journal/v3/n9/full/nprot.2008.120.html. DOI DOI 10.1101/gr.107524.110. 10.1038/nprot.2008.120. 42. Delaneau, O., Marchini, J. & Zagury, J.-F. A lin- 54. Zhuang, H. & Matsunami, H. Synergism of ac- ear complexity phasing method for thousands of cessory factors in functional expression of mam- genomes. Nat. methods 9, 179–81 (2012). URL malian odorant receptors. The J. biological http://dx.doi.org/10.1038/nmeth.1785. DOI chemistry 282, 15284–93 (2007). URL http:// 10.1038/nmeth.1785. www.ncbi.nlm.nih.gov/pubmed/17387175. DOI 10.1074/jbc.M700386200. 43. R Development Core Team. R: A language and en- vironment for statistical computing. R Foundation 55. Li, Y. R. & Matsunami, H. Activation state of the M3 for Stat. Comput. Vienna, Austria (2008). muscarinic acetylcholine receptor modulates mam- malian odorant receptor signaling. Sci. signaling 44. McLaren, W. et al. Deriving the consequences of 4, ra1 (2011). URL http://www.pubmedcentral. genomic variants with the Ensembl API and SNP nih.gov/articlerender.fcgi?artid=3034603& Effect Predictor. Bioinforma. 26, 2069–2070 (2010). tool=pmcentrez&rendertype=abstract. DOI DOI 10.1093/bioinformatics/btq330. 10.1126/scisignal.2001230. 45. Zheng, X. et al. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinforma. 28, 3326–3328 Acknowledgements (2012). DOI 10.1093/bioinformatics/bts606. This work was supported by R03 DC011373 and 46. Purcell, S. et al. PLINK: a tool set for whole- R01 DC013339 to J.D.M; T32 DC000014 and F32 genome association and population-based linkage DC014202 to C.T.; DC005782, DC012095 and analyses. Am. journal human genetics 81, 559– DC014423 to H.M.; and 2017/00726-2 to M.N. from the 75 (2007). URL http://www.pubmedcentral. São Paulo Research Foundation (FAPESP). A portion nih.gov/articlerender.fcgi?artid=1950838& of the work was performed using the Monell Chemosen- tool=pmcentrez&rendertype=abstract. DOI sory Receptor Signaling Core and Genotyping and 10.1086/519795. DNA/RNA Analysis Core, which were supported, in 47. Purcell, S. & Chang, C. PLINK 1.07. part, by funding from the US National Institutes of

11/22 bioRxiv preprint doi: https://doi.org/10.1101/212431; this version posted November 1, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Health NIDCD Core Grant P30 DC011735. Collec- tion of psychophysical data was supported by UL1 TR000043 from the Clinical and Translational Science Award program at the National Center for Advancing Translational Sciences. L.B.V. is an investigator of the Howard Hughes Medical Institute.

Author Contributions J.D.M., A.K., H.M., and L.B.V. conceived and designed the project. C.T., N.R.M., L.L.S., J.R.W., M.N., N.K., and J.D.M. performed research. A.K. collected the psy- chophysical data and provided DNA samples under the supervision of L.B.V. C.T. carried out the analysis and wrote the paper with input from all authors. J.D.M. su- pervised the project.

Competing Financial Interests J.D.M is on the scientific advisory board of Aromyx and receives compensation for these activities. L.B.V is on the scientific advisory board of International Flavors and Fragrances, Inc. and receives compensation for these activities.

12/22 bioRxiv preprint doi: https://doi.org/10.1101/212431; this version posted November 1, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

OR10R2OR6Y1OR10Z1OR10J5OR13G1OR1C1OR2W3 OR2T10OR6B2 OR2Y1OR14J1OR12D2OR11A1OR10C1OR2A25 OR13C8 OR5C1 OR52A4 OR56A4OR56A1OR6A2 OR4A16OR4C15OR5W2 OR10Q1OR5A2OR4D6OR8D4OR4D5OR10G4OR10G9OR10D3OR6C1 OR4K15 OR4F15OR1A1OR3A2 OR7D4 OR11H1 O O Intensity [High] OCH3 H OH Intensity [Low] 10 OH Pleasantness [High] Diacetyl O Pleasantness [Low] Acuity Guaiacol O H Detection O O

8 2−Ethylfenchol H

HH O S O O Isobutyraldehyde O O O O H HO O O H O Isovaleric acid

H Orange HO O O OH O H

O H O H O H O Bourgeonal OH O O O O O 6 OH O H O

O O H O O H O Pass FDR p = 0.05 Diallyl sulfide Androstenone Androstadienone Linalool 2−Butanone Isoeugenol Fenchone Citral Citral Linalool O r−Limonene Nonyl aldehyde Octyl aldehyde (−)−Menthol HO Citronella Cineole Galaxolide 4−Methylvaleric acid Heptyl acetate Orange Lime Caproic acid Spearmint Eugenol methyl ether Butyric acid Caproic acid −log10(p-value) Heptyl acetate Butyl acetate Isobutyric acid 2−Ethylfenchol 4 Isovaleric acid Paraffin oil Top 50 p = 0.66

2

OR2J3/Cis-3-hexen-1-ol OR11H7/Isovaleric acid 0 1 2 3 5 6 7 8 9 10 11 12 14 151617 1922X Chromosome

Figure 1. Association between OR haplotype and odorant perception. ORs are plotted by chromosomal position against association with an olfactory phenotype (-log10 p-values): perceived intensity (circles) or pleasantness (squares) rank of 68 odorants at high (closed) and low (open) concentrations, detection threshold rank of 3 odorants (triangle), or general olfactory acuity (diamond). The gray line represents p = 0.05 following multiple comparisons correction (FDR). The black line represents p = 0.66 (following FDR), the significance cutoff of the top 50 associations which were tested in cell culture (colored points). Associations with mixtures above the black line are shown in gray (spearmint, citronella, orange, and lime), because we did not analyze mixtures as ligands in our heterologous expression experiments. Points are colored by associated odorant. For loci where multiple ORs associate with an odorant, only the top association is labeled. Associated ORs are identified at the top of the graph. Previously published associations are shown in white (different dataset10, 12) or indicated by black arrows (same dataset11,14).

13/22 bioRxiv preprint doi: https://doi.org/10.1101/212431; this version posted November 1, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. a b 150

OR12D2OR11A1OR10C1 50 OR11A1 MA (79.5%)* 8 2 r 80 OR11A1 TA (4.8%) -50 OR11A1 MT (12.0%) 0.8 Vector control 0.6 -150 0.4 Recombination Rate (cM/Mb) 6 0.2 60

Normalized Luciferase Value -250 -8 -6 -4 -2 [2-Ethylfenchol] (log M) alue) c Intensity Pleasantness 4 40 2−Et h yl f -log10 (p− v enchol [High] 2 20

0 0 p < 0.0001 29.3 29.4 −10 0 10 −10 0 10 Position on (Mb) 29.5 Change in Odor Rank

Figure 2. Functional variation in OR11A1 correlates with the perception of 2-ethylfenchol. (a) Association (-log10 p-values) between SNPs in a locus on chromosome 6 and the perceived intensity of 2-ethylfenchol. The most highly associated SNP is shown in purple, and flanking SNPs are colored according to their linkage disequilibrium with the best-associated SNP (pairwise r2 values from 1000 Genomes EUR data21, 22). Recombination rates in this locus are shown in blue (right axis). ORs labeled at the top of the plot were tested in cell culture for their response to 2-ethylfenchol. Unlabeled lines are non-OR genes or ORs that were not tested in cell culture due to low correlation with the top SNP. (b) Response of OR11A1 haplotypes to increasing doses of 2-ethylfenchol. Error bars, s.e.m. of three replicates. Y-axis values are normalized to the baselined response of the reference haplotype. Letters indicate the amino acids which differ from hg19 reference haplotype (marked with an asterisk). (c) Changes in perceived intensity and pleasantness rank elicited by a single copy of each OR11A1 haplotype (intensity: p < 0.0001, pleasantness: p > 0.05 following FDR).

14/22 bioRxiv preprint doi: https://doi.org/10.1101/212431; this version posted November 1, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

a Intensity Pleasantness f Intensity Pleasantness 150 OR10G9 ATVMRYVLR 150 OR10G4 APMERK OR10G9 ATAMQYVLH* OR10G4 ALMVRK* Isoeugenol [Low]

OR10G9 ATTVQYVLH Guaiacol [High] OR10G4 VLVEGQ 100 OR10G9 VIAMRCLVR 100 Vector control Vector control

50 50

0 0 Normalized Luciferase Value Normalized Luciferase Value -50 -50 -8 -6 -4 -2 -8 -6 -4 -2 [Guaiacol] (log M) −10 0 10 −10 0 10 [Isoeugenol] (log M) −10 0 10 −10 0 10 Change in Odor Rank Change in Odor Rank b Intensity Pleasantness g Intensity Pleasantness 150 OR10G7 STVQAI 150 OR10G7 STVQAI OR10G7 TTAPGI* OR10G7 TTAPGI*

OR10G7 STAPGI Isoeugenol [Low]

OR10G7 STAPGI Guaiacol [High] 100 OR10G7 TMAPGI 100 OR10G7 TMAPGI OR10G7 STAPGV OR10G7 STAPGV Vector control Vector control 50 50

0 0 Normalized Luciferase Value -50 Normalized Luciferase Value -50 -8 -6 -4 -2 -8 -6 -4 -2 [Guaiacol] (log M) −10 0 10 −10 0 10 [Isoeugenol] (log M) −10 0 10 −10 0 10 Change in Odor Rank Change in Odor Rank c Intensity Pleasantness h Intensity Pleasantness 150 OR10J5 R* 150 OR1C1 A* OR10J5 W OR1C1 V Vector control Bourgeonal [Low] Vector control 100 100 Linalool [Low]

50 50

0 0 -8 -6 -4 Normalized Luciferase Value -50 Normalized Luciferase Value -50 -8 -6 -4 -2 -8 -6 -4 -2 [Bourgeonal] (log M) −10 0 10 −10 0 10 [Linalool] (log M) −10 0 10 −10 0 10 Change in Odor Rank Change in Odor Rank d Intensity Pleasantness i Intensity Pleasantness 150 OR2A25 SVA* 150 OR1A1 RMP OR2A25 NAP OR1A1 RVP* Caproic acid [Low] OR2A25 NVP OR1A1 HVP Linalool [High] 100 OR2A25 NVA 100 OR1A1 RVS OR2A25 SVP Vector control Vector control 50 50

0 0 Normalized Luciferase Value Normalized Luciferase Value -50 -50 -8 -6 -4 -2 -8 -6 -4 -2 [Linalool] (log M) −20 −10 0 10 20−20 −10 0 10 20 [Caproic acid] (log M) −10 0 10 −10 0 10 Change in Odor Rank Change in Odor Rank e Intensity Pleasantness j Intensity Pleasantness 150 OR2W1 N 150 OR11A1 MA* OR2W1 D*

OR11A1 TA Vector control Caproic acid [High]

OR11A1 MT F 100 100 Vector control enchone [High] 50 50 0 0 -50 -50

Normalized Luciferase Value -100

Normalized Luciferase Value -100 -8 -6 -4 -2 -8 -6 -4 -2 [Fenchone] (log M) −10 0 10 −10 0 10 [Caproic acid] (log M) −10 0 10 −10 0 10 Change in Odor Rank Change in Odor Rank Less intense More intense Less pleasant More pleasant Less intense More intense Less pleasant More pleasant

p < 0.05 following FDR Top 50 associations Below top 50 associations

Figure 3. Functional variation in vitro associates with changes in perceived intensity and pleasantness. The correlation between genetic variation, OR response to odorants in vitro, and perceptual rankings for all OR/odorant associations with at least one responsive OR (excluding previously published OR/odorant pairs11, 14). Left panels show the response of different OR haplotypes to increasing doses of odorant. Error bars, s.e.m. of three replicates. Y-axis values are normalized to the baselined response of either the reference or the most responsive haplotype. Right panels show the change in perceived intensity and pleasantness rank elicited by a single copy of the haplotype. Letters indicate the amino acids that differ from hg19 reference haplotype (marked with an asterisk). Pink labels indicate associations that fall within the top 50, and darker pink labels indicate associations that are significant (p < 0.05) following FDR correction.

15/22 bioRxiv preprint doi: https://doi.org/10.1101/212431; this version posted November 1, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

a Intensity Pleasantness

OR6Y1 V*

Diacetyl [Low] OR6Y1 I

−10 0 10 −10 0 10 Change in Odor Rank

b Intensity Pleasantness

OR6B2 RC* yde [Low] alde h

uty r OR6B2 CR Iso b

−10 0 10 −10 0 10 Change in Odor Rank

c Intensity Pleasantness OR56A4 ELINITK OR56A4 ALINIAK OR56A4 ELINIAK* OR56A4 ELVNIAK

ic acid [Low] OR56A4 ELINIAR ale r

v OR56A4 EHINIAK

Is o OR56A4 ELINTAK OR56A4 ELISIAK −40 0 40 −40 0 40 Change in Odor Rank d Intensity Pleasantness

OR6Y1 V* utanone [Low] OR6Y1 I 2− B

−10 0 10 −10 0 10 Change in Odor Rank Less intense More intense Less pleasant More pleasant

p < 0.05 following FDR Below top 50 associations

Figure 4. Causal ORs were not identified for all significant associations. Significant associations between genetic variation and perceptual rankings for which we were unable to identify a causal receptor. The change in perceived intensity and pleasantness rank elicited by a single copy of the haplotype is shown for OR6Y1 and diacetyl (a), OR6B2 and isobutyraldehyde (b), OR56A4 and isovaleric acid (c), and OR6Y1 and 2-butanone (d). Panels are ordered by significance. Letters indicate the amino acids that differ from hg19 reference sequence (marked with an asterisk). Pink labels indicate associations that fall within the top 50, and darker pink labels indicate associations which are significant following FDR correction.

16/22 bioRxiv preprint doi: https://doi.org/10.1101/212431; this version posted November 1, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

a High Odorant Concentration 15 Low Odorant Concentration Guaiacol Intensity 2−Ethylfenchol Intensity

Diacetyl Intensity

10 Isobutyraldehyde Intensity Linalool Intensity Undecanal Intensity Vanillin Pleasantness iance Explained a r Nonyl aldehyde Intensity Linalool Intensity Vanillin Detection 5 Octyl acetate Decyl aldehyde Pleasantness Isobutyraldehyde Intensity Pleasantness Spearmint Pleasantness Isovaleric acid Detection − Terpineol Intensity Fir Intensity Overall acuity Banana Intensity Citral Intensity % Phenotypic V

0 − − −

Top OR Genetic Age Gender Genotype Ancestry b 30

25

20

15

10 Phenotype Count

5

0 0 5 10 15 20 % Phenotypic Variance Explained c

p < ns0.05 40 Single OR Genotype Genetic Ancestry Age Sex Difference between r2 of full model and total r2 of individual covariates Test-retest r2 High Odorant Concentration 30 Low Odorant Concentration iance Explained

a r 20

10 % Phenotypic V OR10J5 OR10G9 OR6Y1 OR12D2 OR4D6 OR14J1 OR2L5 OR10G7 OR56A4 OR13C8 OR7D4 OR3A3 OR6C68 OR2T10 OR9A2 OR2A25 OR51D1 OR8A1 OR6Y1 OR1C1 OR10C1 OR14I1 OR6B2 OR10G4 OR11A1 0 al Intensity yde Intensity yde Intensity ate Intensity Overall acuity Cit r anillin Detection affin oil Intensity ic acid Detection 2M4MP Intensity Linalool Intensity Diacety /Intensity enchol Intensity utanone Intensity V uty r alde h anillin Pleasantness Linalool Intensity ic acid Pleasantness Anise Pleasantness Guaiacol Intensity V Pa r yl f 2−Decenal Intensity Linalool Pleasantness ale r yl alde h Bourgeonal Intensity 2− B Undecanal Intensity v uty r xyl b ale r Isoeugenol Pleasantness v Is o No n H e 2−Et h Iso b Is o Octyl acetate Pleasantness Androstadienone Intensity

Figure 5. Contribution of single OR genotype, genetic ancestry, age and gender to odorant perception. (a) Individual contributions of single OR genotype, genetic ancestry, age, and gender to 276 odorant perception phenotypes, focusing on the OR able to explain the most variance in a particular phenotype (“Top OR Genotype”). The width of the violin plot indicates the number of phenotypes for which each factor explains that particular percentage of variance, and the line indicates the median percentage of variance explained. Top five phenotypes are labeled for each factor. (b) Distribution of the total percentage phenotypic variance explained by the full model. Bars in blue illustrate the top 25 perceptual phenotypes (p < 0.0001 following FDR), and the relative contribution of all 4 genetic and demographic factors to variance in these phenotypes is shown in (c). Covariates that significantly alter the linear model are shown in full color (p < 0.05), as determined by a one-way analysis of variance comparing the complete model (with all four covariates) to a model excluding the covariate. Gray bars indicate the total explainable variance for each phenotype, as determined by its test-retest value. Bold labels indicate high odorant concentration, and plain labels indicate low odorant concentration.

17/22 bioRxiv preprint doi: https://doi.org/10.1101/212431; this version posted November 1, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

350 n=6 100 318− n=2 300 − 80 Mean % Concordance 250 n=2− 60 200

150 40 OR Count

100 20 50 18 OR10G4 0 0 0 5 10 15 20 25 30 35 40 45 50 55 60 Median Mapping Quality

Supplementary Figure 1. Median mapping quality for each OR. Distribution of the median mapping quality (MAPQ) calculated for all reads for 100 subjects (left axis), and the average percent concordance between Illumina and Sanger sequencing for 10 ORs of various mapping qualities (right axis).

18/22 bioRxiv preprint doi: https://doi.org/10.1101/212431; this version posted November 1, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

a b c d e f g

OR10J1 OR10K1OR10R2 OR6Y1OR6P1OR10X1OR10Z1 OR8D4 OR4D5OR6T1 OR10S1OR10G6OR10G4OR10G9OR10G8OR10G7 OR6B2OR6B3 OR56A4OR56A1 OR10J5 OR4K15 OR7D2 OR7D4

2 *Diacetyl Intensity 2 *Guaiacol Intensity 2 Isobutyraldehyde Intensity 2 *Isovaleric acid Pleasantness 2 Bourgeonal2 Intensity 2 *Diallyl sulfide Pleasantness2 2 *Androstenone Intensity r r r r r r r r r 10 2-Butanone Intensity Guaiacol Pleasantness Isobutyric acid Pleasantness Androstadienone Intensity 100 Isoeugenol Pleasantness 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8 Androstadienone r-Limonene Intensity 0.8 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 Pleasantness 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 8 80 Recombination Rate (cM/Mb)

6 60

-log10 (p−value) 4 40

2 20

0 0

158.5 158.6 123.8 123.9 240.9 241 6 6.1 159.3 159.4 159.520.2 159.6 159.720.4 20.5 9.2 20.79.3 9.4 Position on (Mb) Position on (Mb) Position on Chromsome 2 (Mb) Position on Chromosome 11 (Mb) Position on Chromosome 1 (Mb) Position on Chromosome 14 (Mb) Position on (Mb)

h i j k l m n

OR6B1 OR2A5OR2A25OR2A12OR2A2OR2A14 OR13F1OR13C4OR13C3OR13C8OR13C5OR13C2OR13C9 OR1L4 OR1L6 OR5C1OR1K1 OR13G1OR6F1OR14A2OR1C1 OR14A16OR11L1 OR2W3 OR52E2 OR52A4OR52A5OR52A1 OR2T29OR2T34OR2T10OR2T11OR2T35OR2T27OR14I1 OR5D14OR5L1OR5L2OR5D16 OR5W2OR5I1

2 2 Citral2 Intensity 2 2 Citral2 Intensity 2 Linalool Intensity 2 Octyl aldehyde Pleasantness 2 Nonyl2 aldehyde Intensity2 2 (-)-Menthol Intensity *Linalool Intensity r r 10 r r r r r r r *4-Methylvaleric acid Intensity r r r 100 0.8 0.8 0.8 0.8 0.8 0.8 0.8 Butyric acid Pleasantness 0.8 0.8 0.8 0.8 0.8 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 8 80 Recombination Rate (cM/Mb)

6

alue) 60

4

-log10 (p− v 40

2 20

0 0 143.6 143.7 143.8 107.2 143.9 107.3 125.4107.4 125.5107.5 247.7125.6 125.7247.9 248 5 248.25.1 248.6 5.2 248.7 5.3 248.8 55.5 55.6 55.7 Position on (Mb) Position on (Mb) Position on Chromosome 9 (Mb) Position on Chromosome 1 (Mb) Position on Chromosome 11 (Mb) Position on Chromosome 1 (Mb) Position on Chromosome 11 (Mb)

o p q r s t u

OR2AG2OR2AG1OR6A2 OR10A5OR10A2OR10A4OR2D2 OR6C1OR6C3 OR6C75 OR6C65OR6C76OR6C2OR6C70OR6C68 OR4A16OR4A15 OR4C15OR4C11 OR5AN1 OR5A2OR5A1OR4D6OR4D10 OR2W1 OR2B3OR2J3 OR2J2 OR14J1 OR1A2OR1A1OR1D4 OR3A2OR3A1OR3A4 OR2Y1

2 2 Cineole Pleasantness 2 Heptyl acetate Intensity 2 Eugenol methyl ether 2 Galaxolide Intensity 2 Caproic acid Intensity 2 *Caproic acid Intensity 2 2 Heptyl acetate Pleasantness r r r r2 r r r r2 r r r Intensity 10 100 Androstadienone 10 0.8 0.8 0.8 0.8 0.8 Pleasantness 0.80.8 0.8 0.8 0.8 0.8 100 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.60.6 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.40.4 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.20.2 8 80 8 Recombination r 80 Recombination Rate (cM/Mb)

alue) 6 60 6

alue) 60 ate (cM/Mb) 4 40

4 -log10 (p− v

-log10 (p− v 40

2 20 2 20

0 0 0 0 6.7 6.8 55.6 6.9 7 54.955.8 55 55.1 55.2 55.3 59.2 29.1 59.4 29.2 3 3.129.4 29.51803.2 180.13.3 180.2 180.3 Position on Chromosome 11 (Mb) Position on (Mb) Position on Chromosome 11 (Mb) Position on Chromosome 11 (Mb) Position on Chromosome 6 (Mb) Position on (Mb) Position on (Mb)

v w x y z

OR4F6OR4F15 OR11H1 OR4C15 OR5D14 OR10Q1 OR10W1 OR5B17 OR10D3OR8G2OR8G1OR8G5OR8D1OR8D2 OR8B2OR8B3

2 Butyl acetate Intensity 2 2 Isovaleric acid Intensity 2 22 Paraffin oil Pleasantness 2 22-Ethylfenchol Intensity 2 Isoeugenol Intensity r r r r rr 10r r r 100 0.8 0.8 0.8 0.8 0.80.8 0.8 0.8 0.8 10 0.6 0.6 0.6 0.6 0.60.6 0.6 0.6 0.6 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 8 80 Recombination Rate (cM/Mb) 8

6 60 alue) 6 alue) (p− v 10 o g l 4 40 −

4 -log10 (p− v

2 20

2

0 0

0 102.2 102.3 102.416.3 102.516.455.1 55.216.5 16.657.855.4 55.557.9 58123.8 123.958.1 124 124.1 124.2 124.3 Position on (Mb) Position on (Mb) Position on Chromosome 11 (Mb) Position on Chromosome 11 (Mb) Position on Chromosome 11 (Mb)

Supplementary Figure 2. Genetic variation in OR clusters associates with odorant perception. Association (-log10 p-values) between SNPs in 27 different OR loci and the 38 perceptual phenotypes shown in the top right corner of each panel (note one locus associated with two phenotypes is shown in Fig. 2a). The most highly associated SNP is shown in purple, and flanking SNPs are colored according to their linkage disequilibrium with the best-associated SNP (pairwise r2 values from 1000 Genomes EUR data21, 22). Recombination rates in each locus are shown in blue (right axis). ORs labeled at the top of the plot were tested in cell culture for their response to the associated odorant, and the associated ORs are shown in bold. Unlabeled lines are non-OR genes or ORs that were not tested in cell culture due to low correlation with the top SNP.

19/22 bioRxiv preprint doi: https://doi.org/10.1101/212431; this version posted November 1, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

a Amino acid # 11 16 32 35 122 176 197 202 259 Homo sapien H R F I I M A M M Gorilla gorilla R H F I I T A L T Pan paniscus R H F I I T A L T Pan troglodytes R H F I I T A L T Pongo abelii R H F I I T A L T Macaca mulatta R H L V V T S L T Mandrillus leucophaeus R H L V V T S L T Microcebus murinus R H L V V T S L T Callithrix jacchus G H L V V T S L T Rattus norvegicus W H L V V T S L T Mus musculus W H L V V T S L T

Consensus R H L V V T S L T b 150 Consensus OR6Y1 Human OR6Y1 Vector control 100

50

0

Normalized Luciferase Value -50 -4 -3 -2 -1 0 [Diacetyl] (log M)

Supplementary Figure 3. Modified OR6Y1 responds to diacetyl. (a) Comparison of human OR6Y1 to orthologous receptors from 10 species. The most common amino acid for each position, highlighted in gray, was used to make a consensus receptor with 9 amino acid changes from human OR6Y1. (b) Dose-response curves for human and modified OR6Y1. Responses are from cells transfected with either a plasmid encoding the indicated odorant receptor or an empty vector stimulated with diacetyl. Error bars, s.e.m. of three replicates.

20/22 bioRxiv preprint doi: https://doi.org/10.1101/212431; this version posted November 1, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

a b African−American Caucasian 0.05 Asian Native American Other 0.2 Do Not Wish To Specify

0.00

0.1 −0.05 incipal Component 2 P r 0.0 −0.10 incipal Component 1: non−OR SNPs 2 P r r = 0.98, p < 0.0001 −0.10 −0.05 0.00 0.05 −0.10 −0.05 0.00 0.05 Principal Component 1 Principal Component 1: All SNPs

c 8 d Vanillin Pleasantness 2−Decenal Intensity Vanillin Detection 60 Undecanal Intensity

y PC1 6 Octyl acetate Pleasantness

Overall acuity More pleasant 40 4 w] Pleasantness Rank

iance Explained b 2 20 a r anillin [L o % V V Less pleasant 0 High Odorant Concentration 2 Low Odorant Concentration 0 r = 0.077, p < 0.0001 0 50 100 150 200 250 −0.10 −0.05 0.00 0.05 Phenotype Principal Component 1 e f Spearmint Pleasantness 4 60 Methyl salicylate Intensity

y PC2 Anise Pleasantness 3 Undecanal Intensity Decyl aldehyde Intensity More pleasant 40

2

20 iation Explained b 1 a r mint [High] Pleasantness Rank % V Less pleasant Spea r High Odorant Concentration 2 0 Low Odorant Concentration 0 r = 0.042, p = 0.054 0 50 100 150 200 250 0.0 0.1 0.2 Phenotype Principal Component 2

Supplementary Figure 4. Genetic ancestry correlates with olfactory perception. (a) Self-reported ancestry clustered when participants were plotted according to their eigenvectors for the first two principal components (PCs) calculated from all available genotype data for our subject cohort. (b) Correlation between the first PC of genetic variation calculated using SNPs from all targeted genes (OR and non-OR genes) and the first PC calculated using SNPs identified in non-OR genes only (r2 = 0.98, p < 0.0001). (c) Percent variance explained by PC1 for all 276 phenotypes (ordered by percent variance explained). PC1 explains greater than 4% of the variance for 6 phenotypes (labeled) (p < 0.01 following FDR). Bold labels indicate high odorant concentration, and plain labels indicate low odorant concentration. (d) Correlation between PC1 and the perceptual ranking for the pleasantness of vanillin (r2 = 0.077, p = 0.0001). (e) Percent variance explained by PC2 for all 276 phenotypes (ordered by percent variance explained). The top 5 phenotypes are labeled (r2 > 0.027, p > 0.05 following FDR). Bold labels indicate high odorant concentration, and plain labels indicate low odorant concentration. (f) Correlation between PC2 and the perceived pleasantness of spearmint (r2 = 0.042, p = 0.054).

21/22 bioRxiv preprint doi: https://doi.org/10.1101/212431; this version posted November 1, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

a 50

40

30 Count 20

10

0 20 30 40 50 Total non−functional OR haplotypes b 50

40

30 otal non−functional OR haplotypes T r2 = 0.13, p < 0.0001 −0.10 −0.05 0.00 0.05 Principal Component 1 African−American Caucasian Asian Native American Other Do Not Wish To Specify

Supplementary Figure 5. Pseudogene frequency in the participant population. (a) Distribution of frequency of OR haplotypes with either nonsense or frameshift mutations. (b) Correlation between ancestry (represented using PC1 calculated from all genetic data) and pseudogene frequency (r2 = 0.13, p < 0.0001).

22/22