
SELECTIVE GENOTYPING REVEALS ASSOCIATION BETWEEN THE EPITHELIAL SODIUM CHANNEL γ-SUBUNIT AND SYSTOLIC BLOOD PRESSURE

Cara J Büsst, Katrina J Scurrah PhD, Justine A Ellis PhD, Stephen B Harrap FRACP, PhD

Department of Physiology, The University of Melbourne, Parkville, Vic 3010, Australia

Online Methods

Subject recruitment and phenotype measurement

Subjects were drawn from the Victorian Family Heart Study (VFHS) cohort of 2911 healthy adults recruited between 1991 and 1996.1 The cohort comprised 767 families, each consisting of 2 parents (40-70 yr old) and at least 1 natural offspring (18-30 yr old). Recruitment was limited to Caucasian families. A history of heart disease was not relevant to recruitment, the aim being to enrol a representative sample of subjects exhibiting a broad cross-section of cardiovascular risk factor levels.1

Participants attended research clinics where trained research nurses obtained relevant information regarding drug treatment and smoking, measured cardiovascular phenotypes and took blood samples as detailed previously.1 After resting for 10 minutes, 3 measurements of SBP and DBP were taken with the participant supine; the 3 measurements were repeated after the participant had been standing for 2 minutes.1 The first blood pressure measurements in the supine and standing positions were discarded. We calculated the combined mean of the last two measurements of SBP and DBP in the supine and standing positions.1
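The averaging rule described above can be illustrated with a short sketch. It assumes that the "combined mean" is the simple average of the four retained readings (the last two supine and the last two standing); the function name and the example readings are hypothetical.

```python
from statistics import mean

def combined_mean(supine, standing):
    """Average the last two supine and the last two standing readings.

    Each argument is a list of three readings in measurement order;
    the first reading in each posture is discarded.
    """
    if len(supine) != 3 or len(standing) != 3:
        raise ValueError("expected three readings per posture")
    return mean(supine[1:] + standing[1:])

# Hypothetical SBP readings (mmHg) for one participant.
sbp = combined_mean(supine=[128, 124, 122], standing=[130, 126, 124])
```

The same function would be applied separately to the DBP readings.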

Logistic regression analysis

Logistic regression models were also fitted to assess associations while adjusting for the covariates age, sex and body mass index (BMI). The main analysis assumed an additive genetic model and each SNP was included in the model as a continuous covariate with value equal to the number of minor alleles in each individual’s genotype (i.e. 0, 1 or 2). Secondary analyses aimed to investigate possible underlying genetic models in more detail, by including each SNP in a separate logistic regression model as a categorical variable, with minor allele homozygotes and heterozygotes compared separately with major allele homozygotes (the reference group). This approach makes few assumptions about the underlying genetic model and allows the genotype data to suggest the most likely model. For SNPs with small numbers of minor allele homozygotes (< 5), a dominant genetic model was fitted instead, by combining the minor allele homozygote and heterozygote groups and comparing this larger group with the major allele homozygotes.
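The genotype codings described above can be sketched as follows. This is only an illustration of how the covariates might be built before being passed to a logistic regression routine; the function name, the dictionary keys and the example data are hypothetical, and the model fitting itself is not shown.

```python
def code_snp(minor_counts, min_homozygotes=5):
    """Return covariate codings for one SNP.

    minor_counts: per-individual number of minor alleles (0, 1 or 2).
    """
    additive = list(minor_counts)  # main analysis: 0, 1 or 2
    n_hom = sum(1 for g in minor_counts if g == 2)
    if n_hom < min_homozygotes:
        # Too few minor homozygotes: dominant model, carriers vs. reference.
        secondary = {"carrier": [int(g >= 1) for g in minor_counts]}
    else:
        # Categorical model: two indicators, major homozygotes as reference.
        secondary = {
            "het": [int(g == 1) for g in minor_counts],
            "hom_minor": [int(g == 2) for g in minor_counts],
        }
    return additive, secondary

genotypes = [0, 1, 2, 1, 0, 0, 1, 2]  # hypothetical data
additive, secondary = code_snp(genotypes)
```

With only two minor allele homozygotes in this toy sample, the secondary coding falls back to the dominant model.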

Haplotype analysis

Haplotype analyses were performed within the Haplo.Stats package (version 1.2.1; Mayo Clinic/Foundation, Rochester, Minn., USA, http://www.mayo.edu/hsr/people/schaid.html), which estimates haplotypes by computing maximum likelihood estimates of haplotype probabilities. Haplotype association testing was performed using the haplo.score function2 within the Haplo.Stats package and confirmed using logistic regression analyses within the same package (haplo.glm). Analyses of the specific haplotypes were undertaken by defining haplotype blocks comprising 3 adjacent SNPs. These were defined by the sliding “window” approach.2 The first window comprised the first 3 adjacent SNPs (1, 2 & 3); the next window comprised SNPs 2, 3 & 4, the next 3, 4 & 5 and so on. The association with SBP of all estimated specific haplotypes in this window was assessed by the global P value calculated by the function haplo.score within Haplo.Stats.2 These analyses were completed for each window. All haplotype analyses included adjustments for the covariates age, sex and BMI.
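The sliding-window definition can be made concrete with a minimal sketch. It only enumerates the SNP indices in each window and does not perform any haplotype estimation; the function name is hypothetical, and the count of 25 SNPs is taken from the permutation testing section.

```python
def sliding_windows(n_snps, width=3):
    """Return 1-based SNP indices for each window of adjacent SNPs."""
    return [tuple(range(start, start + width))
            for start in range(1, n_snps - width + 2)]

windows = sliding_windows(25)
print(len(windows), windows[0], windows[-1])
```

For 25 SNPs and a width of 3 this gives 23 windows, which matches the number of haplotype analyses reported in the permutation testing section.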

Permutation testing

To adjust for multiple testing, we chose a permutation approach, with the initial aims of comparing permutation p-values (Pperm) with the asymptotic p-values reported in Haplo.Stats output, and estimating empirical correlations between test statistics in the logistic regression and haplotype analyses. Although correlations between parameter estimates were expected to be reasonably similar to measures of linkage disequilibrium (LD), they have the advantage of being based on the specific statistical models applied rather than on raw genotype data, and should therefore provide a more accurate estimate of the number of “independent” tests for use in a Bonferroni adjustment. The permutation approach also allowed assessment of the significance of SNP and haplotype analyses simultaneously. At each iteration, genotypes were permuted among individuals, while keeping the high/low SBP status and covariates fixed for each individual.

All 25 additive logistic regression models and all 23 sliding-window haplotype analyses were then applied to the permuted dataset, and the 25 Z statistics (log odds ratios divided by standard errors) and 23 global score statistics were saved. This process was repeated 10,000 times.

The permutation p-value for each SNP or haplotype was estimated as the proportion of permuted datasets with a Z or score statistic (respectively) at least as extreme as that observed for the actual dataset. The empirical correlations between each pair of SNP or haplotype test statistics were estimated as the correlation of the 10,000 pairs of statistics from the permuted datasets. These correlations were then used to estimate the approximate number of “independent” SNP or haplotype tests in order to apply a Bonferroni correction. We also calculated the proportion of permutations for which 3 adjacent SNP models and the associated haplotype analysis simultaneously produced more extreme results than were observed for the real dataset, and the frequency with which 4 or more SNP logistic regression analyses were significant at p = 0.05 in the permuted datasets.
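The permutation p-value described above can be sketched for a single SNP. This is a simplified illustration, not the analysis pipeline itself: it uses the difference in mean minor allele count between the high and low SBP groups as a stand-in for the logistic regression Z statistic, it ignores covariates, and all names and data are hypothetical.

```python
import random

def mean_diff(genotypes, high_sbp):
    """Stand-in statistic: difference in mean minor-allele count
    between the high- and low-SBP groups."""
    high = [g for g, h in zip(genotypes, high_sbp) if h]
    low = [g for g, h in zip(genotypes, high_sbp) if not h]
    return sum(high) / len(high) - sum(low) / len(low)

def permutation_p(genotypes, high_sbp, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for one SNP."""
    rng = random.Random(seed)
    observed = abs(mean_diff(genotypes, high_sbp))
    shuffled = list(genotypes)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute genotypes among individuals
        # SBP status stays attached to each individual.
        if abs(mean_diff(shuffled, high_sbp)) >= observed:
            extreme += 1
    return extreme / n_perm

# Hypothetical data: minor-allele counts and high (1) / low (0) SBP status.
genotypes = [2, 1, 1, 2, 0, 1, 0, 0, 1, 0, 0, 1]
high_sbp = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
p_perm = permutation_p(genotypes, high_sbp, n_perm=2000)
```

In a multi-SNP setting, each individual's full set of genotypes would be shuffled together so that the statistics for all SNPs and windows come from the same permuted dataset, which is what makes the empirical correlations between test statistics available.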

Effective sample size

We do not believe that the effective sample size of the VFHS cohort for our study is simply the number of families (767), as this ignores the individual-specific sources of variation that we have shown to exist. Nor is it the full 2911 individuals, because these subjects fall into families within which phenotypes such as SBP are correlated.

A standard formula for estimating effective sample size (ESS) from clustered designs exists.3 This formula states that:

ESS = mn/(1 + rho(m - 1))

where n is the number of clusters (767 families), m is the average cluster size (2911/767 ≈ 3.8 individuals per family), and rho is the average within-cluster correlation for SBP between relatives (approximately 0.25 from our previous analyses). Applying this formula gives an effective sample size of 1714 individuals.

References

1. Harrap SB, Stebbing M, Hopper JL, Hoang HN, Giles GG. Familial patterns of covariation for cardiovascular risk factors in adults: The Victorian Family Heart Study. Am J Epidemiol. 2000;152:704-715.

2. Schaid DJ, Rowland CM, Tines DE, Jacobson RM, Poland GA. Score tests for association between traits and haplotypes when linkage phase is ambiguous. Am J Hum Genet. 2002;70:425-434.

3. Donner A, Klar N. Design and analysis of cluster randomization trials in health research. New York: Oxford University Press; 2000.

4. Giegerich R, Meyer F, Schleiermacher C. GeneFisher--software support for the detection of postulated genes. Proc Int Conf Intell Syst Mol Biol. 1996;4:68-77.

5. Morris B, Benjafield A, Ishikawa K, Iwai N. Polymorphism (-173G>A) in promoter of human epithelial sodium channel gamma subunit gene (SCNN1G) and association analysis in essential hypertension. Hum Mutat. 2001;17:157.

6. Iwai N, Baba S, Mannami T, Katsuya T, Higaki J, Ogihara T, Ogata J. Association of sodium channel gamma-subunit promoter variant with blood pressure. Hypertension. 2001;38:86-89.

7. Persu A, Coscoy S, Houot AM, Corvol P, Barbry P, Jeunemaitre X. Polymorphisms of the gamma subunit of the epithelial Na+ channel in essential hypertension. J Hypertens. 1999;17:639-645.

Table SI. Primers used for the genotyping of SCNN1G SNPs

______

No. SNP         Forward primer        Reverse primer        Extension primer

______

1   rs5718*†    CTTGGGGTAAGTGCCGCAGA  AGCGGGCCACGACTTCACA   TGCGCGGTGGCCCAGGAAG
2   rs4073289   TAGGACATATCCTAGGGCAT  CCCAGCTTCCAATTTTAGC   GCACATCGAAGTCACAGACA
3   rs5732      ACCTGCTTCTCTTCTTTGC   ACGGGCAGATTCTTCTTG    ACTTTGAGGACGGGACTC
4   rs5733      TCAACACCAACACCCATG    GTGGACTTTGATGGAAACTG  CCGCATCGTGGTGTCCCGC
5   rs4967948   ATCAGGCAGACTTTGTCAG   GAAAGAAGAGGAAGACATGG  AAACTAACTTGTCCAGAATC
6   rs4365290‡  ACTTGGAACAGGAGACCA    TTGCCCTTCTCATCCTGA    TTGCCCTTCTCATCCTGA
7   rs5735*‡    CTGGAACTCCGTCTCAGA    GACTCGATGTGCATGACA    CCTAGATTCTCCCACCGGAT
8   rs5737‡     GCTGATCTTTGATCAGGATG  GCAGTGAAAGAATGAGAAGG  GCCTTGTGAATGCTACC
9   rs4302034   GAGCAAGACCCTATCTCAG   TGCCCTTACATGACTTTGG   AAATACAATACAATACAATA
10  rs4247210   GTAAGGAAACCTGTGCCAA   CAGCATGAACACTGATGGA   AGTGGACATGGGGGCAGCAG
11  rs7200952   AGTCTAAGATCAAGGCACAG  ATGTCCTGATCAGAAGAGGA  TCTGGTGAGGCTTCTCTCTT
12  rs13331086  GAAGTGAATTGCTGGAGGT   CATCACTGGCCCAACTTC    CATCACTGGCCCAACTTC
13  rs11074553  CCACACCAGCAATATGAGA   TTGGGCTCATCTTCTAGAGA  N/A
14  rs4299163   AGAGTCCAGCAGAGAGGA    AGCCTTCCTTGAACCTCA    CATTTCTTCTTCATAGCA
15  rs12446988  GAGACAGGAGAATGGCAT    AGATCACTGAGGCAGAGA    AGATCACTGAGGCAGAGA
16  rs4401050   GCAGGATGAATGTAGCTGA   AGCCAGTAATCCCAAGTCA   TGGGAAACCAGCCTGCTTAC
17  rs4260062,  CTCACTGCAGCCTTGAC     CTTTGCAAGGAGCACCA     N/A
18  rs4411498,
19  rs4427805
20  rs5740      CATCTACAACGCTGCCTA    TGTTCTTGAGGTCTCTGGA   GCTCCAGGTAACAGATTGGC
21  rs4281710,  ATAGCTTCTCAGCTCCCAA   AAACCAGGTATGCCACCT    N/A
22  rs4499238,
23  rs4516235,
24  rs4470152
25  rs5723‡     CCTTTTCCAACCAGCTCA    GAGTATCTGAAAAGCCCAGA  ACCCAGATGCTGGATGAGCT
26  rs5728      CAGATGCCAAAGATAGGAGA  GCACTCTGATCTCCACCA    GCAGCAGAGAACTGGCCCAG

______

PCR primer design was assisted by GeneFisher software.4

† Previously studied by Morris et al.5

* Previously studied by Iwai et al.6

‡ Previously studied by Persu et al.7