The Genetics of Variation in Gene Expression

Chris J. Cotsapas

A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy

University of New South Wales

2005

Abstract

The majority of genetic differences between species and individuals have been hypothesised to impact on the regulation, rather than the structure, of genes. As the details of genetic variation are uncovered by the various genome sequencing projects, understanding the functional effects on gene regulation will be key to uncovering the molecular mechanisms underlying the genesis and inheritance of common phenotypes, such as complex human disease and commercially important traits in plants and animals. Unlike coding sequence polymorphisms, genetic variants affecting gene expression will reside in the transcriptional machinery and its regulatory inputs. As these are largely specific to cell– or tissue–types, we would expect that regulatory variants will also affect final mRNA levels in a tissue specific manner. Genetic variation between individuals may therefore be more complex than the sum total of sequence differences between them. Demonstrating this hypothesis is the main focus of this thesis. We use microarrays to measure mRNA levels of approximately 22,000 transcripts in inbred and recombinant inbred strains of mice, and present compelling evidence that the genetic influences on these levels are tissue–specific in at least 85% of cases. We uncover two loci which apparently influence transcript levels of multiple genes in a tissue–specific manner. We also present evidence that failure of microarray data normalisation may cause spurious linkage of expression phenotypes leading to erroneous biological conclusions, and detail a novel, extensible mathematical framework for performing tailored normalisation which can remove such systematic bias. The wider context of these results is then discussed.

Contents

1 Introduction
    1.1 Summary
    1.2 Detection strategies
        1.2.1 Allelic discrimination
        1.2.2 Genetical genomics
    1.3 Genetics of regulatory variation
        1.3.1 Extent of genetic effects
        1.3.2 cis, trans, and master regulators
        1.3.3 Heritability, epistasis, and the number of determinants
        1.3.4 Tissue specificity
    1.4 Biological implications
    1.5 Outline

2 Microarray normalisation for genetical genomics
    2.1 Introduction
        2.1.1 A note on microarray data visualisation
    2.2 Normalisation – mathematical bias removal
        2.2.1 Scaling
        2.2.2 Analysis of Variance
        2.2.3 Principal Components Analysis
        2.2.4 Intensity–dependent smoothing
    2.3 Correcting multiple non–linear biases in microarray data
        2.3.1 Non–linear artefacts
        2.3.2 Additive model normalisation
    2.4 Failure of normalisation in genetical genomics experiments
        2.4.1 Experimental design
        2.4.2 Lack of agreement between normalisation results
    2.5 Conclusions
    2.6 Materials and Methods
        2.6.1 Sample handling
        2.6.2 Expression profiling
        2.6.3 Normalisation
        2.6.4 Linkage analysis

3 Genetic influence on mRNA levels is tissue specific
    3.1 Introduction
    3.2 Experimental design
    3.3 Tissue specificity of influences on gene expression
        3.3.1 The majority of genetic influences are tissue specific
        3.3.2 Expression levels do not reflect complexity of influences
    3.4 Functional bias in influenced transcripts
        3.4.1 Over–representation of functional themes
    3.5 Discussion
    3.6 Materials and Methods
        3.6.1 RNA preparation
        3.6.2 Microarray hybridisation and washing
        3.6.3 Data processing
        3.6.4 Overlap analysis
        3.6.5 Detecting changes in expression between strains
        3.6.6 Gene Ontology analysis

4 Dissection of genetic influences on mRNA levels in a Recombinant Inbred panel
    4.1 Introduction
    4.2 Experimental design
        4.2.1 Moderated linkage statistics
    4.3 Independent tissue analysis
        4.3.1 Linkage complexity
    4.4 Expression level correlation analysis detects biological themes under genetic influence
        4.4.1 Correlation analysis of genetically variant expression levels identifies biological pathways
        4.4.2 Correlated clusters have common genetic determinants
    4.5 Discussion
    4.6 Materials and Methods
        4.6.1 RNA preparation
        4.6.2 Microarray hybridisation and washing
        4.6.3 Data processing
        4.6.4 Linkage analysis

5 Resolvable genetic determinants of mRNA levels are tissue specific
    5.1 Introduction
    5.2 Experimental design
    5.3 Tissue specificity of parentally influenced genes
    5.4 Transgressive segregation of mRNA levels
    5.5 Some loci influence multiple transcript mRNA levels
        5.5.1 A region on chromosome 1 influences multiple transcripts in brain
        5.5.2 A region on chromosome 8 influences transcripts in all tissues
    5.6 Discussion
    5.7 Materials and Methods
        5.7.1 RNA preparation
        5.7.2 Microarray hybridisation and washing
        5.7.3 Data processing
        5.7.4 Linkage analysis
        5.7.5 Overlap analysis

6 Discussion
    6.1 Results summary
    6.2 Defining regulatory circuits
        6.2.1 Regulatory interactions as networks
        6.2.2 Tissue specific regulatory interactions
        6.2.3 Mapping regulatory metaphenotypes
    6.3 The implications of tissue specificity
        6.3.1 Understanding relationships between individuals

Literature cited

A Significant linkages in the BxD panel

B Over–represented GO terms in correlation clusters

List of Tables

1.1 Summary of genetic influences on gene expression levels

2.1 Effect of normalisation on gene identification
2.2 Effect of normalisation on locus identification

3.1 Genetic influences on mRNA levels in each tissue
3.2 Expression of genetically influenced genes across tissues
3.3 Extrapolating genetic influence between tissues
3.4 Enriched Gene Ontology terms for genetically influenced genes

4.1 Linkage results in three tissues
4.2 Complexity of expression variation
4.3 Linkage aggregation in RI brain
4.4 Linkage aggregation in RI kidney
4.5 Linkage aggregation in RI liver

5.1 Linkage results for genetically influenced genes
5.2 cis and trans effects on gene expression
5.3 Genetic dissection of transgressive mRNA levels
5.4 cis and trans effects on transgressively segregating genes
5.5 A locus affecting multiple transcripts in brain
5.6 Loci affecting transcript levels in multiple tissues
5.7 Linkage significances for genes affected by chromosome 8

A.1 Gene linkages in brain
A.2 Gene linkages in kidney
A.3 Gene linkages in liver

B.1 Over–represented GO terms in BxD brain
B.2 Over–represented GO terms in BxD kidney
B.3 Over–represented GO terms in BxD liver

List of Figures

2.1 Basic microarray visualisation
2.2 Non–linear biases in microarray data
2.3 Systematic biases in microarrays
2.4 Systematic biases in microarrays
2.5 Deposition order biases in each channel

3.1 Tissue–specific genetic influence on mRNA levels

4.1 Significance comparisons for linkage analysis

5.1 Overlap of linkage results for genetically influenced genes
5.2 Overlap for transgressive genes
5.3 Effects mapping to chromosome 1
5.4 Linkage scan showing effects in all tissues

Acknowledgments

This thesis is the culmination of four years of work with Professor Peter Little, without whom it would have been impossible. I owe him a great debt of gratitude for years of support and friendship, but most of all for the patience he exhibited whilst I transformed myself from dilettante to scientist. The trust and friendship he has extended to me have been an honour and a privilege, and I can only hope they have not been misplaced.

Dr. Rohan Williams has played a central role in this work, as collaborator, office–mate, food buddy, aesthetic sounding–board and purveyor of exotic quantitative methods. His broad knowledge and burning curiosity made our wide–ranging scientific conversations fascinating, and he has taught me much. The Gene Ontology analyses presented in Chapter 3 are his work, and many points in the Discussion have arisen as a consequence of brainstorming sessions.

The statistical aspects of this work are the fruit of a collaboration with the School of Mathematics, UNSW, with Professors William Dunsmuir and Matt Wand, and especially Dr. David Nott. David has gently tutored me into at least passing competence with elementary statistics; the B–statistic modifications in Chapter 3 are his work, and he has had a major influence on the normalisation methods presented in Chapter 2. He has kindly proof–read and corrected some parts of this work. The additive model normalisation procedure in Chapter 2 was devised by Professor Matt Wand, who also wrote an early implementation.

The material presented here rests, directly or indirectly, on the work of other members of our laboratory. The expression data in Chapter 3 were generated by Jeremy Pulvers as part of his Honours project; Eva Chan kindly provided genetic map data; and Mark Cowley has contributed program code for several analyses. They have provided a stimulating environment to work in, and I am fortunate to count them as friends and colleagues.

Bronwyn Robertson and Geoff Kornfeld of the Ramaciotti Centre for Gene Function Analysis, UNSW, provided microarrays and facilities for the experiments described here. A/Prof. Russell Standish of the High Performance Computing Support Unit at UNSW wrote an optimised implementation of the bootstrapped Student’s t–test used in Chapter 4, and kindly arranged for compute time on the Barossa cluster. John Schimenti (Jackson Laboratories) and Maja Bucan (University of Pennsylvania) and their laboratories kindly assisted in obtaining BxD mouse strains. They all have my sincere thanks.

On a personal note, I would never have embarked on so ambitious a move without the support of my family. They had always taught me that everything was possible, and although I suspect this is not what they had in mind, were directly responsible for me moving hemispheres on a whimsical decision to follow science. Friends I have made in Australia made me welcome and provided respite and refreshment over the last four years: Neil Saunders, Stephen Harrop, and Greg Tyrelle as Team Linux; Kai and Kerry Schindlmayr for endless friendship and hospitality; Julie Lim, Emma Collinson, and Yael Azriel were my female consciences; Jo Gibson forbore to judge, and Laura Coleman taught me I was not alone. All have my thanks and love.

Chapter 1

Introduction


1.1 Summary

The majority of genetic differences between species and individuals impact on the regulation, rather than the structure, of genes [King and Wilson, 1975; Paigen, 1979]. As the details of genetic variation are uncovered by the various genome sequencing projects, understanding the functional effects on gene regulation will be key to uncovering the molecular mechanisms underlying the genesis and inheritance of common phenotypes, such as complex human disease [Gabellini et al., 2004; Theuns and Van Broeckhoven, 2000] and commercially important traits in plants and animals. The transcriptome is presently the most amenable level at which to study gene regulation and its genetics at a global level, although we still know little of the regulatory forces which shape these processes. This is due in part to the difficulty in predicting regulatory variants from sequence, and in part to the absence until recently of high–throughput quantitative methods of assaying gene expression [Brenner et al., 2000; Schena et al., 1995; Velculescu et al., 1995]. As the latter become available, established methods and genetic resources may be used to dissect the genetics of regulation, by treating expression levels as quantitative traits with genetic determinants. Unlike coding sequence polymorphisms, genetic variants affecting gene expression will reside in the transcriptional machinery and its regulatory inputs. As these are largely specific to cell– or tissue–types [reviewed by Wray et al., 2003], we would expect that regulatory variants will also affect expression levels in a tissue specific manner. It is therefore probable that the effects of genetic variation between individuals are more complex than the sum total of sequence differences would imply. Demonstrating the validity of this hypothesis is the main focus of this thesis.
In this chapter we review the methods and results of large–scale quantitative studies of genetic effects on gene expression levels, followed by a consideration of their phenotypic consequences and tissue specificity. We then describe the experimental designs used throughout this work. In following chapters we present a series of experiments using mRNA abundances from inbred and recombinant inbred mice as molecular phenotypes in genetic mapping experiments, demonstrating the tissue specificity of heritable modulations on gene expression. We also describe a new method for normalising microarray data which removes biases capable of generating artefactual genetic signal in these experiments. We conclude by arguing that this evidence supports the rethinking of genetic differences between individuals as a highly contextual measure of functional changes, as opposed to a simple proportion of sequence divergence, and by suggesting that this concept may also be useful in characterising between–species differences.

1.2 Detection strategies

A genetic variant affecting transcriptional regulation will alter either the effective concentration of mRNA transcribed from a gene, or the spatio–temporal distribution of message. Stable differences of gene expression can therefore be used to infer the presence of such variants. This has been done either by looking for differences in transcription between alleles in heterozygous individuals (allelic discrimination), or by treating overall expression levels as heritable traits in genetically distinct individuals in a population (genetical genomics [Jansen and Nap, 2001]). In the latter case, a natural population may be used to “phenotype” for expression levels, and a segregating population to map their genetic determinants in the genome. This approach is equivalent to quantitative trait locus (QTL) analysis, from which analytical methods have been extensively borrowed.

1.2.1 Allelic discrimination

Polymorphisms within expressed sequence can be used to quantitate the expression levels of alleles in heterozygous individuals [Cowles et al., 2002; Pastinen et al., 2004; Wittkopp et al., 2004; Yan et al., 2002], polyploids [Schaart et al., 2005], or natural populations [Lo et al., 2003], by adapting genotyping technologies such as single–base primer extension [Cowles et al., 2002; Pastinen et al., 2004; Yan et al., 2002], pyrosequencing [Schaart et al., 2005; Wittkopp et al., 2004], genotyping arrays [Lo et al., 2003], and mass spectrometry [Knight et al., 2003] for use with cDNA. The method lends itself to detection of effects in cis (for example, variants in promoter binding sequences), which will be in linkage disequilibrium with the expressed marker. However, effects in trans can only be formally excluded in obligatory heterozygotes, such as intercross [Cowles et al., 2002] or interspecific [Wittkopp et al., 2004] individuals derived from homozygous backgrounds. They can be inferred by comparison to the parental strains [Wittkopp et al., 2004]: an allelic imbalance between the parentals, but not within the F1 generation, is indicative of a trans effect. In outbred populations such as the CEPH human reference pedigrees [Dausset et al., 1990], where individuals are not necessarily heterozygous at all loci, the assumption that effects are due to cis variants cannot be made without testing for either association or haplotype transmission within pedigrees [Pastinen et al., 2004; Yan et al., 2002; cf. Lo et al., 2003].

These techniques generally allow resolution of small relative differences between allele abundances: Wittkopp et al. [2004] resolve 1.1–fold or better differences in 29 genes in interspecific Drosophila melanogaster × D. simulans crosses; Knight et al. [2003] measure a 1.3–fold allelic expression imbalance for LTA in humans; and Cowles et al. [2002] measure 1.5–fold or better differences for 69 genes in F1 mouse inbred strain crosses. Lo et al. [2003] sacrifice sensitivity for throughput, screening over 1000 polymorphisms in human tissues using array–based formats, but resolving only 2–fold or greater changes in expression. This tradeoff becomes a recurrent theme, as the other, more sensitive approaches have limited throughput due to format constraints.

The main limitation of the allelic discrimination approach is the reliance on expressed polymorphisms, which implies knowledge of coding sequence variations within the study populations. Whilst such data are being generated for humans [The International HapMap Consortium, 2003] and mice [Wade et al., 2002; Wiltshire et al., 2003], they are not yet available for many other organisms, and are unlikely ever to be for those not used as genetic models. This raises the problem of de novo identification of suitable polymorphisms for each gene under study, which can be expensive and time consuming. Furthermore, cis–acting variants may be in elements some distance from the expressed polymorphism, and recombinations in this interval will reduce power in detecting effects. As each gene must be tested separately, massively–parallel execution is also problematic. Further, the approach does not distinguish between genetic and epigenetic effects, so that prior knowledge, such as imprinting status, is also required. These problems make allelic discrimination cumbersome at the whole–transcriptome level, but the high specificity and sensitivity make these methods ideal for focussing on particular sets of genes.
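The parental versus F1 comparison used to separate cis from trans effects can be illustrated with a minimal sketch. The function name, the input values and the 1.5–fold cutoff are all hypothetical choices made for illustration; a real analysis would test replicate measurements for significance rather than apply a fixed fold–change threshold.

```python
import math


def classify_regulatory_effect(parent_a, parent_b, f1_allele_a, f1_allele_b,
                               min_fold=1.5):
    """Classify an expression difference as cis, trans, both, or neither.

    parent_a, parent_b       -- expression of the gene in each homozygous parent
    f1_allele_a, f1_allele_b -- allele-specific expression within the F1 hybrid
    min_fold                 -- hypothetical fold-change cutoff (an assumption)
    """
    cutoff = math.log2(min_fold)
    # The difference between parents reflects cis and trans effects combined.
    parental = math.log2(parent_a / parent_b)
    # Within the F1 both alleles share one trans environment, so any
    # remaining allelic imbalance is attributed to cis-acting variation.
    cis = math.log2(f1_allele_a / f1_allele_b)
    # Whatever the F1 imbalance does not account for is attributed to trans.
    trans = parental - cis

    has_cis = abs(cis) >= cutoff
    has_trans = abs(trans) >= cutoff
    if has_cis and has_trans:
        return "cis and trans"
    if has_cis:
        return "cis"
    if has_trans:
        return "trans"
    return "no detectable effect"
```

For example, a two–fold difference between parents that disappears between the two alleles in the F1 would be classed as a trans effect, whereas one that persists unchanged in the F1 would be classed as cis.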

1.2.2 Genetical genomics

The second approach is to treat overall expression levels in a sample as a quantitative trait, without differentiating between alleles. There is no requirement for prior knowledge of the genes under consideration, so generic expressed sequence or cDNA resources may be used in high–throughput expression formats, such as microarrays [Schena et al., 1995], SAGE [Velculescu et al., 1995] or MPSS [Brenner et al., 2000], widening the scope to include non–model organisms. Extensive genetic and physical maps are already available for common model organisms, as is genotypic information for stable populations such as recombinant inbred mouse strains [Bailey, 1971] and CEPH family cell lines [Dausset et al., 1990]. The availability of high–throughput genotyping platforms also makes these experiments feasible in transient intercross or backcross populations [e.g. Schadt et al., 2003]. As most or all the transcriptome may be queried, there is no bias from preferential target gene selection.

Two experimental designs are possible: comparing individuals within a population [Jin et al., 2001; Kluger et al., 2004; Oleksiak et al., 2002] to determine heritability of differences in gene expression levels, or mapping such differences in a segregating population [Brem et al., 2002; Morley et al., 2004; Schadt et al., 2003; Yvert et al., 2003] as QTLs (eQTL [Schadt et al., 2003]). Sampling from an outbred population captures a large proportion of the natural variation of an organism. However, the absence of defined relationships between individuals, together with potential stratification effects, allows little resolution power beyond estimating the proportion of affected genes and a rough measure of heritability [Falconer, 1989]. A segregating population offers the advantage of dissecting the number and relative contribution of determinants, at the expense of decreasing the proportion of total genetic variation sampled. The two strategies may be used in combination, for example to limit mapping attempts to transcripts with highly heritable expression levels [Morley et al., 2004]. Whilst this decreases the multiple testing involved in mapping thousands of expression phenotypes, it will exclude the apparently common transgressive segregation events [e.g. Brem and Kruglyak, 2005; Chesler et al., 2005].

High–throughput expression assays allow us to look for effects at the level of molecular processes, rather than of single genes, which may be more biologically relevant. Clustering [Eisen et al., 1998; Gasch and Eisen, 2002], dimensionality reduction [Alter et al., 2000, 2003], and information theory [Basso et al., 2005] amongst other computational approaches [Deng et al., 2005; Lipan and Wong, 2005; Perrin et al., 2003; Qian et al., 2003; Tavazoie et al., 1999] have already been applied to large expression datasets, to detect correlations in patterns of gene expression across conditions indicative of group co–regulation and functional relationship. These methods allow us to detect novel links between groups of genes, which is particularly valuable for the majority of transcripts in large expressed collections, for which little information is available. In a genetical genomics setting, Yvert et al. [2003] use mean expression levels of groups identified by hierarchical clustering [Eisen et al., 1998] to represent overall trends in co–regulated genes; and Kluger et al. [2004] use principal components to collapse expression of genes in biological pathways from KEGG [Kanehisa et al., 2004] and BioCarta, rather than interrogate expression levels per se. Downstream applications of such approaches [Bing and Hoeschele, 2005; Li et al., 2005] may reveal any genetic perturbations of transcriptome subnetworks, which may be more informative in terms of phenotypic consequences than changes to individual genes. At a methodological level, mapping trends in groups of genes rather than individual gene expression profiles will substantially reduce the multiple testing problem [Brem et al., 2002; Yvert et al., 2003] inherent in genetical genomics.

The primary limitation of segregating populations is the obligatory use of limited genetic backgrounds, so that only a proportion of variants present in a species may be assayed. Solutions to address this problem, such as recombinant inbred panels derived from multiple genetic backgrounds [Churchill et al., 2004], are being developed, although these are likely to be limited to common model organisms for the foreseeable future and are inapplicable to humans.

There are several aspects of methodology that are far from resolved. Data normalisation, particularly for microarrays, to remove systematic artefacts has mostly relied on simplistic data transformation such as mean standardising or scaling, which are known to be inadequate even for simple outlier detection experiments [Williams et al., 2006, in press]. Residual structure within the data is therefore often mapped, in the mistaken belief that it represents large–scale biological phenomena. This artefactual signal detection is exacerbated by use of established linkage statistics without a thorough examination of their applicability to such datasets. The use of the non–significant “best” likelihood ratio statistic (LRS) score [Churchill and Doerge, 1994; Doerge and Churchill, 1996] by Chesler et al. [2005] as a measure of effects in trans (discussed in the next section), for example, promotes the overestimation of such phenomena. Determination of significance may also be inaccurate when using nominal p–values for certain statistical tests, which ignore the small sample size and non–normal nature of the underlying data. This may be particularly damaging in combination with the universal lack of minimum intensity data thresholding based on negative control sequences present on arrays, as the high variance observed at low intensities will inflate test statistics [Smyth, 2004]. These issues are discussed in detail in Chapter 2, where we present an extensible normalisation framework and associated methodology to address these problems.
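To make concrete the treatment of an expression level as a quantitative trait, the following sketch scans one transcript against every marker by single–marker regression and derives an empirical genome–wide threshold by permuting the phenotype, in the spirit of the permutation approach cited above. It is an illustration under stated assumptions, not the procedure used in this thesis or in any cited study: genotypes are assumed to be coded 0/1 as in a recombinant inbred panel, the statistic is the regression likelihood ratio n ln(RSS0/RSS1), and all function names and parameter defaults are hypothetical.

```python
import numpy as np


def lrs_scan(expression, genotypes):
    """Likelihood ratio statistic for one transcript at every marker.

    expression -- array of shape (n_strains,), one expression value per strain
    genotypes  -- array of shape (n_markers, n_strains), coded 0/1 (assumption)
    """
    y = expression - expression.mean()
    n = y.size
    rss_null = np.sum(y ** 2)                      # intercept-only model
    g = genotypes - genotypes.mean(axis=1, keepdims=True)
    sxy = g @ y
    sxx = np.sum(g ** 2, axis=1)
    # Sum of squares explained by each marker; monomorphic markers explain none.
    with np.errstate(divide="ignore", invalid="ignore"):
        explained = np.where(sxx > 0, sxy ** 2 / sxx, 0.0)
    rss_alt = rss_null - explained                 # one-marker model
    return n * np.log(rss_null / rss_alt)


def permutation_threshold(expression, genotypes, n_perm=1000, alpha=0.05, seed=0):
    """Empirical genome-wide threshold from the maximum LRS of permuted data."""
    rng = np.random.default_rng(seed)
    max_lrs = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffling strain labels breaks any genotype-phenotype association
        # while preserving the correlation structure of the genetic map.
        max_lrs[i] = lrs_scan(rng.permutation(expression), genotypes).max()
    return np.quantile(max_lrs, 1.0 - alpha)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    genotypes = rng.integers(0, 2, size=(50, 30)).astype(float)
    # Simulated transcript whose level depends on the genotype at marker 10.
    expression = 2.0 * genotypes[10] + rng.normal(0.0, 0.5, 30)
    lrs = lrs_scan(expression, genotypes)
    print("best marker:", int(np.argmax(lrs)))
    print("threshold:", permutation_threshold(expression, genotypes, n_perm=200))
```

Because the threshold is taken from the distribution of the genome–wide maximum, a “best” score that falls below it is not evidence of linkage, which is the distinction drawn in the text above.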


1.3 Genetics of regulatory variation

The methods described above allow us to begin dissecting the genetic component of gene expression levels, and interpreting these results will inform our understanding of transcriptome regulation. A summary of the current literature is given in Table 1.1. Results from Wittkopp et al. [2004] are included for completeness, although it should be noted that they compared two species of drosophilids, D. melanogaster and D. simulans.

1.3.1 Extent of genetic effects

The primary observation is the proportion of transcripts whose expression is under clear genetic control. There is a striking difference between results from allelic discrimination and genetical genomics approaches, the former indicating that more than half the genes investigated are influenced. Whilst this may be due to the increased sensitivity of the underlying experimental methods, these studies generally examine a small number of carefully selected genes, and will therefore tend to overestimate such effects. This notion is reinforced by the observation that, with the exception of the results of Lo et al. [2003], there appears to be a decreasing trend in the proportion of affected genes within species, as the number of genes surveyed increases (Table 1.1).

The more inclusive genetical genomics studies suggest that 10 – 28% of transcripts are subject to genetic modulation. The false discovery rates (FDR) vary between these reports, however, as does the method used (if any) to filter transcripts considered for analysis. Monks et al. [2004], for example, consider genes expressed at reliable detection levels; Morley et al. [2004] require differential expression in grandparents of CEPH families before considering parent–children heritability; and Chesler et al. [2005] restrict themselves to transcripts with a certain heritability. A rough estimation from all these results would therefore indicate that one third to one half of transcripts will be under some genetic expression influence, subject to the caveats of tissue specificity and temporal modulation of these effects. A comparison to the number of genes with heritable expression levels, however, suggests that this is still an underestimate (discussed in section 1.3.3).

Species†     Genes    Affected     FDR   cis       trans       Refs
(Modality‡)  studied  (%)          %     (%)       (%)
Dm (AD)*#    29       29 (100)     N/A   28 (97)   16 (55)     Wittkopp et al., 2004
Hs (AD)      13       6 (46)       N/A   6 (100)   —           Yan et al., 2002
Hs (AD)      15       7 (47)       N/A   7 (100)   —           Bray et al., 2003
Hs (AD)§     602      326 (54)     N/A   —         —           Lo et al., 2003
Hs (AD)      129      23 (18)      N/A   23 (100)  —           Pastinen et al., 2004
Mm (AD)      69       4 (6)        N/A   4 (100)   —           Cowles et al., 2002
Mm (NP)      7,169    73 (1)             —         —           Sandberg et al., 2000
Dm (NP)      3,931    ()                 —         —           Jin et al., 2001
Fh (NP)      907      161 (18)           —         —           Oleksiak et al., 2002
Sc (NP)      5,908    433 (7)            —         —           Townsend et al., 2003
Fh (NP)      192      92 (48)            —         —           Whitehead and Crawford, 2005
Hs (SP)      7,861    2,123 (27)         ()        ()          Schadt et al., 2003
Hs (SP)*     3,554    142 (4)            32 (23)   115 (81)    Morley et al., 2004
Hs (SP)      3,554    984 (28)           ?? ()     ?? ()       Morley et al., 2004
Hs (SP)      2,340    762 (31)     5     8 (26)    12 (39)     Monks et al., 2004
Sc (SP)      1,528    570 (37)           205 (36)  365 (64)    Brem et al., 2002
Sc (SP)      6,215    1,716 (28)         (20)      992* (75)   Yvert et al., 2003
Mm (SP)      12,422   1,218 (10)         162 (13)  1,056 (87)  Bystrykh et al., 2005
Mm (SP)      608      101 (17)     25    92 (91)   9 (9)       Chesler et al., 2005
Rn1 (SP)°    15,923   1,833 (12)   75    622 (34)  1,211 (66)  Hubner et al., 2005
Rn2 (SP)°    15,923   2,051 (13)   64    800 (39)  1,251 (61)  Hubner et al., 2005

Table 1.1: Study summary of significant genetic influences on gene expression within a population. Only effects demonstrating linkage are reported for genetical genomics studies.
† Fh: Fundulus heteroclitus; Dm: Drosophila melanogaster; Hs: Homo sapiens; Mm: Mus musculus; Rn: Rattus norvegicus; Sc: Saccharomyces cerevisiae.
‡ AD: allelic discrimination; NP: natural population; SP: segregating population.
* These studies report both cis and trans influences on transcript levels, so that the sum of effects is greater than the number of genes influenced.
# This is an inter–, rather than intra–, specific population, between D. melanogaster and D. simulans.
§ Lo et al. [2003] cannot distinguish between cis and trans effects.
° Hubner et al. [2005] provide results on fat tissue (Rn1) and kidney (Rn2) at several genome–wide thresholds: P < 0.05 is used here for consistency with other studies [Hubner et al., 2005].

Transgressive segregation [Brem and Kruglyak, 2005; Chesler et al., 2005] appears common for gene expression levels. In yeast, Brem and Kruglyak [2005] estimate that up to 59% of transcripts display expression differences in segregants of a cross between laboratory and wild strains, and Chesler et al. [2005] report that the phenomenon is common in laboratory mouse crosses. This suggests that multiple determinants of gene expression are common, so that they may be regarded as complex quantitative traits.
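Transgressive segregation refers to segregants whose phenotype falls outside the range spanned by the two parents. A minimal sketch of how such segregants might be flagged for one transcript is given below. The margin of two pooled standard deviations, the function name and the example values are assumptions made for illustration; the cited studies use their own formal tests, which are not reproduced here.

```python
import numpy as np


def transgressive_segregants(parent_a, parent_b, segregant_means, n_sd=2.0):
    """Flag segregants whose expression lies outside the parental range.

    parent_a, parent_b -- replicate expression values for each parental strain
    segregant_means    -- mean expression value for each segregant strain
    n_sd               -- hypothetical margin, in pooled parental SDs
    """
    a = np.asarray(parent_a, dtype=float)
    b = np.asarray(parent_b, dtype=float)
    seg = np.asarray(segregant_means, dtype=float)

    # Pooled within-parent variance estimates measurement noise.
    pooled_var = ((a.size - 1) * a.var(ddof=1) + (b.size - 1) * b.var(ddof=1)) \
        / (a.size + b.size - 2)
    margin = n_sd * np.sqrt(pooled_var)

    low = min(a.mean(), b.mean()) - margin
    high = max(a.mean(), b.mean()) + margin
    return (seg < low) | (seg > high)
```

With parental replicates of 8.0, 8.2, 7.8 and 9.0, 9.2, 8.8 (log–scale values chosen arbitrarily), segregants at 7.0 and 10.1 would be flagged, while those at 8.5 and 9.3 would not.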

1.3.2 cis, trans, and master regulators

In segregating population experiments, we can dissect the aetiology of regulatory variation by estimating the proportions of variants genetically in cis and in trans to the affected genes. The former will reside in regulatory elements, such as promoter/enhancer binding sites or transcript stability motifs [Cartegni et al., 2002]; trans–acting factors are generally thought of as transcription factor alleles [Chesler et al., 2005], although they may be co–factors, accessory proteins, or reside in more indirect regulatory components [Yvert et al., 2003], for example altering propagation efficiency in a signalling cascade resulting in transcriptional changes.

Of nine studies, eight report 60 – 90% trans–acting determinants; Chesler et al. [2005] alone report 91% cis effects, which may be due to the comparatively small number of genes passing their filtering process. This suggests that the most common mechanism for expression level variation is through regulatory factors affecting transcription, rather than changes to the control sequences of a transcript. Since multiple transcripts will be modulated by the same regulatory machinery, the number of cis and trans variants may be the same, the discrepancy in effect frequency being attributable to the pleiotropy of the latter.

Pleiotropic effects can be detected by examining the map locations of trans effects on all transcripts, with those coincident identifying loci containing “master regulators” of transcription [Morley et al., 2004]. Chesler et al. [2005] go on to show that these loci also give “best” (i.e. non–significant) linkage scores for a large number of other transcripts irrespective of their heritability, suggesting that up to 10% of the transcriptome may be modulated by any one of these loci. Schadt et al. [2003] indicate seven such linkage “hotspots” containing more than 1% of approximately 4,300 QTLs detected. In contrast, Morley et al. [2004] find two hotspots affecting more than six transcripts each: plausible indications of, for example, a variant transcription factor. Monks et al. [2004] find 12 such loci, none influencing more than six transcripts, in fifteen human pedigrees. The concept of master regulators influencing thousands of transcripts must therefore be treated with scepticism, particularly where a lack of significance coupled with low heritability may suggest that many of these linkages are spurious. Observations from our laboratory further suggest that even highly significant linkage may be artefactual: failure of normalisation allows subtle data structure to remain in expression value matrices, which will by chance be described by some genotype pattern in the genetic map and therefore produce “linkage” signal [Chapter 2 and Williams et al., 2006, in press]. This possibility would also explain the inconsistency of locations for these master regulators between experiments (e.g. Schadt et al. [2003] cf. Chesler et al. [2005]).
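The search for coincident map locations can be sketched as a simple binning exercise: linkages are assigned to fixed–width genomic intervals, and any interval holding more than a chosen fraction of all linkages is reported. The bin width, the function name and the input format are assumptions for illustration; the 1% default only echoes the criterion quoted above and is not a recommendation.

```python
from collections import Counter


def linkage_hotspots(qtl_positions, bin_size_mb=10.0, min_fraction=0.01):
    """Return genomic bins containing more than min_fraction of all linkages.

    qtl_positions -- iterable of (chromosome, position_mb) pairs, one per linkage
    bin_size_mb   -- hypothetical bin width in megabases
    min_fraction  -- fraction of all linkages a bin must exceed to be reported
    """
    counts = Counter(
        (chrom, int(pos // bin_size_mb)) for chrom, pos in qtl_positions
    )
    cutoff = min_fraction * sum(counts.values())
    # Keys are (chromosome, bin index); values are linkage counts.
    return {bin_key: n for bin_key, n in counts.items() if n > cutoff}
```

A count of this kind says nothing about whether the underlying linkages are individually significant, nor whether residual structure in the expression data has produced them, so it identifies candidates for scrutiny rather than regulators.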

1.3.3 Heritability, epistasis, and the number of determinants

The ability to map determinants of a quantitative trait is dependent on both its heritability (the proportion of variance attributable to genetic factors), and its complexity (the number of contributing factors) [Falconer, 1989]. Classical QTLs [Korstanje and Paigen, 2002] are thought to follow an exponential distribution (the so–called –model [Morton, 1998]) rather than Fisher’s original proposition of an infinitesimal model [Fisher, 1930], such that a few loci of large effect account for the majority of genetic variance, with tens or hundreds supplying the balance [reviewed in Farrall, 2004]. A complication occurs if two loci interact to produce an overall effect, in a process called epistasis [Falconer, 1989; Lynch and Walsh, 1998]. Since the effects are, by definition, non–additive, each locus will, when considered in isolation, contribute only a small fraction of the heritable variance. QTL of this nature will tend to follow an infinitesimal model, and will not show statistically significant linkage.

The observation that a substantial proportion of highly heritable expression levels do not show robust evidence of linkage may therefore be explained, once lack of power has been taken into account, by either an infinitesimal or an epistatic argument. Chesler et al. [2005] indicate that only 17% of highly heritable transcripts show linkage in mice, Monks et al. [2004] report the same number for humans, and Morley et al. [2004] 4% in humans. Other studies either do not calculate heritability, or report similarly low figures. In an explicit study of the properties of eQTL, Brem and Kruglyak [2005] show that approximately half the expression levels in yeast are best explained by models containing more than five additive loci; through simulation, they show that up to 61% may be accounted for by ten–locus models. We can therefore conclude that a substantial proportion of heritable transcription levels are either influenced by many loci, perhaps approaching an infinitesimal, rather than exponential, model of effect; or that they may be explained by a relatively small number of epistatic interactions between causative variants. A recent study in yeast [Storey et al., 2005] indicates that 14% of transcription level differences are explainable by epistasis between two loci. However, the majority of studies report a small percentage of multiple linkages, which implies that at least some expression levels are the result of additive effects following an exponential model. Whether these represent extremes of a continuum of complexity for expression traits, or two discrete modes of regulatory variation, is still unclear.
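The epistatic argument can be made concrete with a toy calculation. The sketch below assumes a hypothetical, deliberately extreme trait that depends only on the interaction between two homozygous loci in a balanced panel; the genotype coding, the panel size and the helper `variance_explained` are illustrative assumptions, not data or methods from any of the studies cited. Each locus considered alone explains none of the trait variance, whereas the two–locus genotype explains all of it.

```python
import numpy as np

# Hypothetical balanced panel: every combination of two homozygous loci
# (coded 0/1 for the two parental alleles) occurs equally often.
locus1 = np.array([0, 0, 1, 1] * 8)
locus2 = np.array([0, 1, 0, 1] * 8)

# Purely epistatic toy trait: high only when the two loci carry alleles
# from different parents, so there is no additive component at all.
trait = (locus1 != locus2).astype(float)

def variance_explained(groups, y):
    """Fraction of var(y) explained by the group means (one-way R^2)."""
    fitted = np.zeros_like(y)
    for g in np.unique(groups):
        mask = groups == g
        fitted[mask] = y[mask].mean()
    return 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)

r2_locus1 = variance_explained(locus1, trait)               # single-locus scan
r2_locus2 = variance_explained(locus2, trait)               # single-locus scan
r2_joint = variance_explained(2 * locus1 + locus2, trait)   # two-locus genotype
print(r2_locus1, r2_locus2, r2_joint)
```

Real interactions are rarely this clean, so single–locus scans would be expected to recover a small, rather than zero, share of the variance.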

1.3.4 Tissue specicity

Despite several studies reporting results from multiple tissues in the same populations, there has, as yet, been no systematic comparison between cell types beyond observations of low overlaps in affected genes. Hubner et al. [2005] report 7 – 22% overlap of effects in recombinant inbred rat fat and kidney tissues, with far fewer effects in trans detected at more stringent thresholds. Chesler et al. [2005] and Bystrykh et al. [2005] report 50% cis effects shared between brain and hematopoietic stem cells from recombinant inbred mice, and only four corresponding trans effects, where the number of influenced genes is not reported. It is not clear from these reports whether the proportion of overlap is due to the absence of expression, or variants in tissue–specific regulatory components. Given the predominantly tissue–specific nature of the gene regulation machinery [Wray et al., 2003], the latter is likely to be a significant contributor of tissue–specific effects. Contrary to published claims [Chesler et al., 2005], the low proportion of overlap makes extrapolating regulatory effects between tissues untenable. It further suggests that the full spectrum of variation between individuals can only be understood by extensive tissue sampling. An interesting implication is that genetic similarity is a tissue–specific quantity, which would add further complexity to population structure, genetic epidemiology, and species evolution.

1.4 Biological implications

The first demonstration of functional regulatory variation between individuals came over fifty years ago [Law et al., 1952], and the general importance of regulatory variants both between and within species was already an established concept by the 1970s [reviewed in King and Wilson, 1975; Paigen, 1979]. The field of evolutionary developmental biology has since coalesced around the theory that changes to gene regulation during development are responsible for differences in body plans, and ultimately speciation [Carroll, 2005; Levine and Tjian, 2003]. Unsurprisingly, the consequences of more subtle gene regulatory differences between individuals are as yet unclear, as these differences are themselves only now being elucidated.

Gene expression changes in perturbed systems or disease states have been used as molecular phenotype surrogates in complex disease dissection [Eaves et al., 2002; Karp et al., 2000], and as a signature in cancer diagnosis [Sorlie et al., 2001]. In this context, there is no requirement for them to be causative: they must simply be indicative of a state alteration to be useful. Genetical genomics experiments are now being used in a similar fashion to relate heritable differences in gene expression levels to variation in physiological traits. Hubner et al. [2005] find that cis eQTLs in a recombinant inbred rat panel derived from spontaneously hypertensive and normal progenitor strains correspond to previously mapped hypertension–related trait QTLs (“pQTLs”). These results suggest not only that at least some physiological trait differences are caused by changes in expression, but also the identity of causative genes, a major stumbling block in classical quantitative trait analysis [Nadeau and Frankel, 2000]. Schadt et al. [2005] use genes differentially expressed between lean and obese F2 intercross mice fed an atherogenic diet [Drake et al., 2001; Schadt et al., 2003] to identify causative, rather than resultant, expression differences with genetic modelling methods capable of assigning directionality to a correlation.

The examples above attempt to identify candidate causative genes for a trait, by focussing on a particular study population chosen for maximal trait differences. Whilst of obvious utility in dissecting known phenotypes, this approach ignores other dimensions of variation between individuals, such as the possibility that differences in the topology of the transcriptome may be important. The network of interactions between gene products displays a hierarchical structure of subnetworks connected by hubs, which are themselves connected together [Han et al., 2004]. These hub proteins often have regulatory effects on their nodes, particularly at the transcriptional level, so that interactome topology will reflect the structure of the transcriptome. We may speculate that changes to the structure or dynamics of these subnetworks induced by regulatory variants could have pervasive effects on the gross structure of the transcriptome. Such subtle changes will either be undetected or their scale will not be reflected in gene–by–gene approaches.

The phenotypic consequences of such second order effects are presently unknown, but may yield unexpected insights into molecular phenotypes, particularly little–understood phenomena such as background modifier activity [Nadeau, 2001] or nuclear spatial organisation [Casolari et al., 2005; Parada et al., 2004]. A recent study [Denver et al., 2005] suggests that mutational changes to the transcriptome accumulate rapidly; it is tempting to speculate that new combinations of alleles brought together at each meiosis may have a similar, if smaller, effect on the transcriptome: a constant revelation of “cryptic” variation [Gibson and Dworkin, 2004] between individuals.


1.5 Outline

We investigate genetic influences on gene expression using two inbred mouse strains and a panel of thirty–one recombinant inbred strains derived from them. We use microarrays to measure expression levels for approximately 22,000 transcripts, representing the majority of the murine gene complement.

In Chapter 2 we present a novel, extensible framework for microarray data normalisation using additive models to eliminate subtle systematic biases. We show that this method is appropriate to genetical genomics applications and describe the effects of failure of normalisation.

In Chapter 3, we compare mRNA levels in three tissues of two inbred mouse strains, and conclude that the majority of differences are tissue–specific. We further show that genes whose products are involved in transcription and its regulation are particularly susceptible to differences in mRNA levels across genetic backgrounds, suggesting that many variants influencing these levels may reside in molecules only indirectly associated with transcription. We conclude that extrapolation between tissues of genetic effects on mRNA levels is not possible, and that the influence of genetic variation on gene regulation is tissue–specific.

In Chapter 4, we measure mRNA levels in a panel of thirty–one recombinant inbred strains, and map genetic determinants of these molecular phenotypes by assessing linkage to a dense genetic map. We show that only approximately 10% of genes with altered expression in the parental strains have resolvable determinants, but many genes not found altered in the parentals exhibit genetic linkage. We also show that two loci influence mRNA levels in a tissue–specific manner, and conclude that gene regulation is genetically complex and largely tissue–specific.

In Chapter 5, we discuss the implications of these findings, particularly the notion that alterations to the regulation of a core set of genes may be responsible for tissue–specificity, rather than tissue–specific suites of regulators.

Chapter 2

Microarray normalisation for genetical genomics


2.1 Introduction

The elimination of random and systematic noise from data prior to analysis is a key step in experimental science. As microarray technology has matured, a number of sources of noise have been identified [Yang et al., 2002], and both experimental and theoretical methods to account for them have been proposed. These sources may be divided into three broad categories:

(i) Manufacturing practices: non–random spacing of gene probes in an array leading to areas of high/low signal; time–dependent chemical differences in slide surface.
(ii) Pre–processing methods: biases introduced by feature selection and background estimation algorithms.
(iii) Experimental artefacts: background signal generated by the hybridisation process; differences in fluorescence properties between dyes; differential brightness–induced scale and range differences.

Progress has been made in addressing all three sources of bias with a combination of practical and theoretical approaches. It is notable that the latter has focused heavily on borrowing methodology from other fields, particularly applications of multivariate statistics.

The first efforts to improve microarray signal revolved around changes to manufacturing practices and experimental protocols. Manufacturing was improved by clone and control selection [Loftus et al., 1999; Schuchhardt et al., 2000], particularly after so–called housekeeping genes were shown to fluctuate dramatically in expression across cell types [Lee et al., 2002]; selection of appropriate oligonucleotide sequences representing transcripts to minimise cross–hybridisation and provide cleaner signal; and the improvement of slide–coating substrates to minimise background fluorescence. A significant improvement to experimental protocol was the adoption of indirect fluorescent labelling: here, amino allyl–modified nucleotides are incorporated during cDNA synthesis followed by dye coupling, rather than direct labelling with dye–nucleotide complexes. Such complexes can have markedly different steric properties leading to differential incorporation rates and hence strong channel intensity bias, which is decreased with indirect labelling. The experiments described in this thesis have been carried out with this labelling strategy.

Advances in image processing software have also been made, particularly in the estimation of background intensities for each spot. The process of spot identification in a scan image (segmentation) has evolved from overlaying grids of fixed–diameter perfect circles, to adaptive fitting algorithms such as seeded region growing, capable of accounting for deviant spot morphology and thus more accurately capturing real signal [Smyth et al., 2003]. Background fluorescence estimation has progressed from subtracting an average for the whole slide, to local sampling of the inter–spot spaces, to two–dimensional smooth imputation of background under the spot itself by the techniques of morphological opening [Smyth et al., 2003; Soille, 1999].

Whilst these improvements have greatly increased the quality of microarray data being collected, further biases remain, and must be removed post hoc by mathematical manipulation: this process is generally referred to as normalisation [Smyth and Speed, 2003]. The nature of the remaining biases may vary across laboratories, so investigation and tailoring of methods is warranted. This tailoring is especially cogent in genetical genomics, where the goal is not to identify large expression level ratios as in typical microarray experiments, but to use them as quantitative traits. Biases must therefore be carefully removed to avoid spurious inference of genetic influence, a phenomenon more prevalent than previously thought (discussed below in section 2.4). The relatively modest magnitude of heritable expression changes makes this data treatment all the more important.

2.1.1 A note on microarray data visualisation

Visual inspection is key to exploratory analysis [Cleveland, 1993, 1994; Tufte, 1990, 1997], but direct plots of microarray intensities are rarely informative (Figure 2.1A), even when log2 transformed to decrease the scale (Figure 2.1B). Mean difference plots [Bland and Altman, 1986, 1999] are generally used to expose more subtle data structure by contrasting the ratio of measurements to the average measurement (Figure 2.1C). For two–colour microarray data, the notation proposed by Yang et al. [2002] is often used: intensity ratio $M = \log_2(R/G)$; and geometric mean intensity $A = \tfrac{1}{2}\log_2(RG)$, where R and G are the background subtracted red and green channel spot intensities, respectively. I shall use this notation throughout this chapter, and refer to the mean difference plot as the MA plot.
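As a small illustration of the MA notation, the following sketch computes M and A from paired channel intensities. It assumes the usual definitions attributed to Yang et al. [2002], with M the base 2 log ratio and A the base 2 log of the geometric mean of the two channels; the function name `ma_transform` and the example intensities are hypothetical.

```python
import numpy as np

def ma_transform(red, green):
    """Return (M, A) for background-subtracted red and green intensities."""
    red = np.asarray(red, dtype=float)
    green = np.asarray(green, dtype=float)
    m = np.log2(red) - np.log2(green)          # log ratio, log2(R / G)
    a = 0.5 * (np.log2(red) + np.log2(green))  # log2 of the geometric mean
    return m, a

# Hypothetical spot intensities; all values must be positive after
# background subtraction for the logarithm to be defined.
red = np.array([1024.0, 512.0, 4096.0])
green = np.array([256.0, 512.0, 1024.0])
m, a = ma_transform(red, green)
print(m, a)
```

Spots with non–positive background subtracted intensities have no defined M or A under this transformation and would need to be treated as missing values.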

[Figure 2.1 panels. A: Raw (Green against Red); B: Logged (log2 Green against log2 Red); C: Mean difference (M against mean intensity A).]

Figure 2.1: basic visualisation of microarray data. An array representing approximately 15,200 transcript cDNAs is represented as A: background subtracted intensities; B: log2–transformed background subtracted intensities; C: mean difference [Bland and Altman, 1986, 1999], or MA [Yang et al., 2002], plot. Subtle differences, such as a tendency towards higher expression ratios M at low mean intensity A, are more obvious in the last of these.

2.2 Normalisation – mathematical bias removal

A bewildering number of statistical methods have been developed to remove bias from microarray data, for both single–channel and dual–channel platforms. The main approaches for the latter can be divided into four broad categories, reviewed below. With the exception of methods based on smoothing, all assume that bias is linear in microarray data, a flawed assumption, as shown below. Microarray data is assumed log–normal, so logarithmic transformation, usually to base 2, is universal in microarray analysis.


2.2.1 Scaling

Scale adjustments by linear transformation are the simplest normalisation strategy used, the goal being to make M comparable across slides which may differ in scale and/or range of values. They do not, however, address within–slide biases. Division by the slide–wise mean or median M [Hughes et al., 2000; Monks et al., 2004; Schadt et al., 2003] and other forms of standardisation have been used. Yang et al. [2002] suggest scaling by the median absolute deviation (MAD), which is robust to outlying values.

Bolstad et al. [2003] and Yang and Thorne [2003] have suggested an alternative to these transformations, dubbed quantile normalisation. Average values are computed for each of N genes in a dataset, and N quantiles are calculated for the average distribution. For each slide, genes are then ranked, and their values replaced with the N quantile values in ascending order. So, the gene with the lowest/most negative M value in each slide is assigned the first quantile value, the second lowest/most negative gene assigned the second quantile value, and so on, irrespective of gene identity. Thus each slide has exactly the same scale, since the expression values are now drawn from the average distribution. The process is analogous to using ranks of genes rather than expression levels; the quantiles, however, mirror any density changes in the average distribution, whereas ranks are of necessity integers.

The greatest weakness of this method is the inability to account for missing values (caused by e.g. background subtraction resulting in negative intensities): during the substitution process, some quantile values will have to be ignored, but how these are to be selected is arbitrary. As the number of missing values grows large, the shape of the quantile distribution will change, negating the aim of the method. One solution is to impute the original value of the gene from other slides in the dataset, for which several methods have been proposed [Kim et al., 2005; Ouyang et al., 2004; Troyanskaya et al., 2001].
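The rank–and–substitute step of quantile normalisation can be sketched in a few lines. This is a minimal illustration, assuming a complete genes × slides matrix with no missing values and no ties; it builds the reference distribution as the mean of the k–th smallest value across slides, which is one common formulation and may differ in detail from the implementations cited above. The function name and example matrix are hypothetical.

```python
import numpy as np

def quantile_normalise(values):
    """Quantile-normalise a genes x slides matrix with no missing values."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, axis=0)                 # per-slide ranking of genes
    reference = np.sort(values, axis=0).mean(axis=1)   # mean of the k-th smallest values
    normalised = np.empty_like(values)
    for slide in range(values.shape[1]):
        # The gene ranked k-th on this slide receives the k-th reference quantile.
        normalised[order[:, slide], slide] = reference
    return normalised

# Hypothetical M values: four genes (rows) measured on three slides (columns).
values = np.array([[5.0, 4.0, 3.0],
                   [2.0, 1.0, 4.0],
                   [3.0, 3.0, 6.0],
                   [4.0, 2.0, 8.0]])
normalised = quantile_normalise(values)
print(normalised)
```

After normalisation every slide contains exactly the same set of values, while the ordering of genes within each slide is unchanged. A missing value on any slide would leave one reference quantile unassigned, which is the weakness noted above.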


2.2.2 Analysis of Variance

Kerr et al. [2000] construct an analysis of variance (ANOVA) model of microarray data of the form

$$\log(y_{ijkg}) = \mu + A_i + D_j + V_k + G_g + (AG)_{ig} + (VG)_{kg} + \epsilon_{ijkg}$$

where $y_{ijkg}$ is the measurement for array i, dye j, variety k and gene g, and $\epsilon_{ijkg}$ is the error term. $\mu$ is the overall average signal; $A_i$ accounts for gross array differences such as hybridisation success; $D_j$ for dye–specific effects such as differential incorporation rates; $V_k$ for sample (variety) effects, and $G_g$ for gene effects. The array × gene interaction term $(AG)_{ig}$ represents bias such as spatial inconsistencies or deformations on a particular slide. The $(VG)_{kg}$ term is the quantity of interest, as this represents alterations to expression associated with a particular variety: genuine biological signal.

The ANOVA approach unifies normalisation and analysis into one procedure, and the model–based approach allows the addition of terms describing other artefacts as appropriate. However, the current model does not account for non–linear bias within a slide, and it is difficult to see how this could be achieved without positing complex interaction terms between parameters. Degrees of freedom soon become limiting in linear model approaches where the number of samples is small, as in most microarray experimental designs. There is therefore a limit to the number of terms one can include in the model, particularly if several degrees of freedom are to be retained for error estimation, as is common [Kerr et al., 2000; Simonoff, 1996].

2.2.3 Principal Components Analysis

Alter et al. [2000] use principal components analysis (PCA) calculated by the singular value decomposition (SVD) to analyse a microarray time–series experiment charting approximately one period of the Saccharomyces cerevisiae cell cycle at 30 minute intervals for 390 minutes [Spellman et al., 1998]. PCA is a dimensionality reduction technique: its aim is to capture the maximum amount of variance in a dataset by transforming to a small number of new, uncorrelated variables, or principal components (PCs). It is thus essentially a remapping of data along a limited number of axes of variance [Jolliffe, 2002]. Vectors describing these new axes in terms of the original dataspace axes are termed eigenvectors, and can be calculated in a number of ways, SVD being a common choice [Jolliffe, 2002]. The decomposition can be reversed to reconstitute the original data.

An important implication of the independence of the PCs is that they capture different variance trends. Any one of these can be removed by eliminating the relevant component (eigenvector): this collapses the data in that dimension, removing that variance trend, but not altering the data on other axes. We thus have a mechanism for removing given variance trends (e.g. artefactual signal) from a dataset without affecting other trends. Alter et al. [2000] use just this strategy, eliminating an eigenvector describing an upward tendency inconsistent with the expected periodicity of cell cycle phenomena. They assume that this tendency is therefore artefactual, and normalise their data by reconstituting without that eigenvector. Other eigenvectors are then found to describe two periodic trends across the data correlating with cell cycle phase changes, and these are interpreted as expression “signatures” of biological origin, corresponding to periodicities in the cell cycle. In later work, Alter et al. [2003] use a generalised version of this approach to make comparisons between the yeast and human cell cycles.

The main limitation of PCA is one of interpretation. Eigenvectors do not necessarily correspond to discrete physical processes: they simply describe trends of variance in data. There is thus no guarantee that a single eigenvector captures a single experimental variable. In the experiment described by Alter et al. [2000], biological variance is expected to be periodic, due to regular changes in expression at different points of the cell cycle. They therefore assume that non–periodic variance trends can be dismissed as artefact. In a less well–defined experiment, there would be no a priori method of distinguishing eigenvectors describing biological signal from those capturing artefact. Interpretation would then have to proceed by correlating eigenvectors back to experimental variables in order to deduce their provenance, and so decide if they should be removed.
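The idea of removing one variance trend by reconstituting the data without a chosen component can be sketched with the SVD. The example below is a toy construction, not the analysis of Alter et al. [2000]: it assumes a hypothetical genes × arrays matrix built from a dominant linear drift plus a weaker periodic signal, arranged so that the two trends are exactly orthogonal and therefore fall into separate components. The function name `remove_component` is illustrative.

```python
import numpy as np

def remove_component(data, index):
    """Reconstitute `data` from its SVD with one singular component set to zero."""
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    s_filtered = s.copy()
    s_filtered[index] = 0.0          # collapse the data along the chosen axis
    return (u * s_filtered) @ vt

# Toy data: 4 genes (rows) by 12 arrays (columns).
time = np.arange(12)
centred = time - time.mean()                     # steady drift across arrays
periodic = np.cos(2 * np.pi * centred / 12)      # one cycle, orthogonal to the drift
drift_loading = np.full(4, 3.0)                  # every gene drifts equally
periodic_loading = np.array([1.0, -1.0, 1.0, -1.0])
data = np.outer(drift_loading, centred) + np.outer(periodic_loading, periodic)

# The drift dominates the variance, so it is captured by the first component.
cleaned = remove_component(data, 0)
print(np.round(cleaned, 3))
```

In real data the trends are seldom orthogonal, so a single component may mix artefact and biological signal, which is the interpretative difficulty described above.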


2.2.4 Intensity–dependent smoothing

Dudoit et al. [2000] re–analysed a microarray dataset comparing ApoA1 knockout and SR-BI transgenic mice to inbred controls in an investigation of low HDL cholesterol models [Callow et al., 2000]. They found a non–linear dependence of log ratio M on mean channel intensity A; in other words, expression ratio changes as a function of mean intensity, and this function is not linear (shown for our own data in the next section). Dudoit et al. [2000], and later Yang et al. [2002] and Smyth and Speed [2003], point out that scaling of one channel to another, a linear transformation, cannot adequately remove such non–linear bias. These authors use a robust local regression method, loess [Cleveland and Devlin, 1988; Cleveland et al., 1992], to account for the non–linearity, and adjust the log ratio M by taking the residuals of the local fit. The procedure fits a locally linear regression to the data over a sliding window of defined width [Simonoff, 1996], which amounts to fitting a smooth curve through the data. Normalisation by taking residuals of this function effectively alters the data such that the smooth function is now a straight line y = 0.

All these authors further report that this intensity–dependence can be different for each subgrid of spots, deposited by a single print–tip. Microarray slides are robotically manufactured in a process where hollow metallic pins dip into microtitre plates containing oligonucleotide solutions, which are then deposited onto treated–surface glass slides. Thus, each pin will deposit a sub–grid or block of spots in the same area of the microarray, which is in fact a grid of spot grids. Since each pin has subtly different proportions, the capillary and surface tension forces which draw up and deposit oligonucleotide solution will vary slightly, leading to changes in spot morphology. The proposed solution is to normalise data from each subgrid independently by fitting print–tip specific loess curves; however, this solution, like any smooth fit, may be hindered by overfitting the function [Simonoff, 1996]. Wilson et al. [2003] offer a slightly different approach to this problem, by spatially smoothing the residuals of the loess fit to adjust for variations in median intensity across a slide. This eliminates the awkward transition of values between subgrids, but is susceptible to overfitting caused by abrupt changes in the smoothness of data [Simonoff, 1996].

Analogous approaches are offered by Finkelstein et al. [2001], who iteratively apply linear regression for each subgrid, removing outliers until the regression stabilises, giving a linear data transformation; Sapir and Churchill [2000], who use the orthogonal residuals from a robust regression of red intensity R on green intensity G in place of the intensities themselves; and Kepler et al. [2002], who attempt to find a core set of invariant genes, by iteratively reweighted least–squares, against which they calculate a normalisation constant. All these approaches, like the ANOVA described above, assume that the majority of genes have invariant expression across samples, so that M → 0 and hence the overall relationship of the two channels should be linear. A particular strength is the ability to differentially weight spots in all these schemes, so that, for instance, constant control spots can be exploited. However, the sparsity of controls in most microarrays (of the order of several per subgrid) usually precludes their exclusive use for normalisation.

Smooth approaches can be applied iteratively to other non–linear biases, so the data is progressively smoothed for multiple effects. However, the interaction between such iterative applications, if any, is not clear, and may generate new biases.
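Intensity–dependent normalisation by local regression can be sketched as follows. This is a simplified stand–in for loess, assuming a tricube–weighted local linear fit without the robustness iterations of the published method; the function name `local_linear_smooth`, the span value and the simulated bias curve are all hypothetical. Normalised values are the residuals of M about the fitted curve.

```python
import numpy as np

def local_linear_smooth(x, y, span=0.4):
    """Tricube-weighted local linear fit of y on x, evaluated at each x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = max(3, int(np.ceil(span * n)))        # number of points in each window
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        h = np.sort(dist)[k - 1]              # window half-width: k-th nearest point
        w = np.clip(1.0 - (dist / h) ** 3, 0.0, None) ** 3
        design = np.column_stack([np.ones(n), x - x[i]])
        wd = design * w[:, None]
        beta = np.linalg.solve(wd.T @ design, wd.T @ y)
        fitted[i] = beta[0]                   # local intercept is the fit at x[i]
    return fitted

# Simulated slide: a curved intensity-dependent bias plus random noise.
rng = np.random.default_rng(1)
a = np.sort(rng.uniform(6.0, 16.0, 500))
bias = 1.5 * np.exp(-(a - 6.0) / 2.0) - 0.3
m = bias + rng.normal(0.0, 0.2, a.size)

fitted = local_linear_smooth(a, m, span=0.3)
normalised = m - fitted                       # residuals are the normalised M
print(np.std(m), np.std(normalised))
```

A print–tip specific version would apply the same function separately to the spots of each subgrid, at the cost of fewer points per fit and hence a greater risk of overfitting.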

2.3 Correcting multiple non–linear biases in microarray data

As discussed above, a non–linear systematic relationship between mean intensity and log ratio [Smyth and Speed, 2003; Yang et al., 2002] may exist in two–colour microarray data. At least one other non–linear bias has also been reported, where a change in intensity correlates with the order in which spots within a subgrid are deposited onto the slide surface. Balazsi et al. [2003] show a time–dependent trend associated with atmospheric exposure during the slide printing process, and Mary-Huard et al. [2004] report a periodic bias in expression ratios, consistent with a spot deposition order bias. Investigation of other possible sources of systematic bias is therefore warranted, as is the development of a normalisation method capable of handling multiple non–linear effects. In this section we shall show that both an intensity and a deposition order dependence exist in data generated in our laboratory, and present a novel method for removing them, based on Generalised Additive Models (GAMs) [Ruppert et al., 2003; Simonoff, 1996].

2.3.1 Non–linear artefacts

Figure 2.2A shows a mean–difference (MA) plot for a typical microarray experiment from our laboratory. The array contains approximately 15,200 cDNAs from the NIA 15k mouse clone set [Kargul et al., 2001]. The smooth line describes a loess function after Yang et al. [2002], above, illustrating a tendency for the value of M to change according to the magnitude of A. This tendency is clearly not constant across the intensity range, as would be required for a linear relationship; it is non–linear, resulting in a smooth curve. Figure 2.2B shows the same plot, but with a total of thirty–two loess lines fitted, each corresponding to a subgrid of spots on the array as described above. As previously reported [Smyth and Speed, 2003; Yang et al., 2002], the bias in each subgrid has slightly different properties, suggesting a subgrid–specific procedure is appropriate for normalisation.

This bias appears to be due to inherent dye properties. It is present in self–self hybridisations, where the same RNA sample is used in both channels of the array [Dudoit et al., 2002; Pulvers, 2004]. There should therefore be no difference in efficiency of any step of the process, apart from cDNA labelling and channel–specific scanning parameters. The use of alternatives to cyanine–based fluorophores does not completely remove the problem [Pulvers, 2004], suggesting that the bias may actually be due to fluor–specific physical properties.

Figure 2.2C shows the same M data, plotted in order of spot deposition. There is a clear trend for spots laid down towards the end of the printing process to have slightly higher M values than those deposited earlier. The relationship is non–linear, and Figure 2.2D shows that it can vary across subgrids in the same way as intensity–dependence. It corresponds to the deposition–order artefact reported by Balazsi et al. [2003], who show that the order of deposition of spots within a subgrid affects intensities in a time–dependent fashion. Microarrays are produced over a series of days, during which the slides are exposed to fluctuations in light, temperature, and humidity; this exposure appears to create differences in signal intensity. A similar bias is described by Smyth and Speed [2003], who find much sharper gradations to the bias: these authors attribute the differences in expression ratios to differences in the quality of the cDNA libraries used to assemble the arrays they examine [Callow et al., 2000], and use scale adjustment to correct the bias.

We therefore have two non–linear biases in this data, which must be accounted for prior to downstream analysis. Since none of the normalisation methods reviewed in the previous section are capable of dealing with more than one non–linearity, a new method is called for.

2.3.2 Additive model normalisation

We may incorporate the non–linearities described above, along with any linear effects, into an Additive Model (AM) [Hastie and Tibshirani, 1990; Simonoff, 1996]. AMs are extensions of linear models, where at least one of the terms in the mean is expressed as a smooth function of the predictor [Hastie and Tibshirani, 1990; Ruppert et al., 2003]. We can therefore use this framework to integrate multiple non–linear biases into a single expression. The residuals of the model, once the non–linear mean has been accounted for, will then be the normalised expression values.

We can describe gene g’s observed expression ratio $M_g$ as

$$M_g = \mu + f_1(A_g) + f_2(D_g) + \epsilon_g$$

where $\mu$ is an intercept term, $f_1(A_g)$ and $f_2(D_g)$ are (smooth) functions of mean intensity and deposition order, respectively, and $\epsilon_g$ is the error term.
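A model of this form, with an intercept and two smooth terms, can be sketched as a single penalised least–squares fit. The code below is an illustration under stated assumptions rather than the implementation used in this thesis: each smooth term is represented by a hypothetical linear truncated–power spline basis, the penalty value is fixed by hand, and the data are simulated with an intensity–dependent curve and a periodic deposition order effect. The function names are illustrative.

```python
import numpy as np

def spline_basis(x, n_knots=20):
    """Centred linear truncated-power basis for one smooth term."""
    x = np.asarray(x, dtype=float)
    x = (x - x.min()) / (x.max() - x.min())          # rescale predictor to [0, 1]
    knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])
    basis = np.column_stack([x] + [np.clip(x - k, 0.0, None) for k in knots])
    return basis - basis.mean(axis=0)                # centre so the intercept is identifiable

def additive_model_residuals(m, a, d, penalty=1e-3):
    """Fit M = intercept + f1(A) + f2(D) + error; return the residuals."""
    m = np.asarray(m, dtype=float)
    b1, b2 = spline_basis(a), spline_basis(d)
    design = np.column_stack([np.ones(len(m)), b1, b2])
    # Penalise only the knot coefficients; intercept and linear terms stay free.
    pen = np.ones(design.shape[1])
    pen[[0, 1, 1 + b1.shape[1]]] = 0.0
    coef = np.linalg.solve(design.T @ design + penalty * np.diag(pen), design.T @ m)
    return m - design @ coef

# Simulated slide with two non-linear biases (all values hypothetical).
rng = np.random.default_rng(2)
n = 2000
a = rng.uniform(6.0, 16.0, n)                        # mean intensity A
d = np.arange(n, dtype=float)                        # deposition order D
noise = rng.normal(0.0, 0.2, n)
m = 0.1 + 1.2 * np.exp(-(a - 6.0) / 2.0) + 0.15 * np.sin(2 * np.pi * d / 500.0) + noise

normalised = additive_model_residuals(m, a, d)
print(np.std(m), np.std(normalised))
```

Because both smooth terms are estimated together, the fit does not depend on the order in which the biases are considered, in contrast to the iterative application of separate smoothers.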


[Figure 2.2 panels. A and B: M against mean intensity A; C and D: M against spot deposition order.]

Figure 2.2: systematic non–linearities in microarray data from our laboratory, revealed by loess smooth curve fitting. Top: expression ratios are dependent on intensity (left), and this relationship can vary for each subgrid on an array (right). Bottom: expression ratios are also subject to spot deposition order bias (left), which can vary between subgrids (right).

An immediate problem is the possibility of over–fitting the model: if the smooth functions are over–sensitive to data fluctuations, they will tend to fit too closely to the local changes in data, resulting in a very “wiggly” smooth curve. In contrast, under–sensitivity will fail to capture local trends in data, negating the utility of using a curve, rather than a straight line, to capture local data variations. Smoothness is controlled by penalising roughness in the fitting procedure, which can be done in a variety of ways. In this case, the smooth functions used are penalised regression splines. Over– and under–fitting are controlled by adjusting the level of penalisation using a smoothing parameter. Calculating this parameter is problematic, as we are trading off smoothness against accuracy of fit. If the parameter is too small, we overfit, and if too large, the model is insensitive to data fluctuation.
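The trade–off governed by the smoothing parameter can be illustrated numerically. The sketch assumes a hypothetical penalised linear spline with a simple ridge penalty on the knot coefficients, which is one of several possible roughness penalties and not necessarily the one used here; the data are simulated. As the penalty grows, the effective degrees of freedom (the trace of the smoothing matrix) fall and the residual sum of squares rises.

```python
import numpy as np

def penalised_spline_fit(x, y, penalty, n_knots=20):
    """Return fitted values and effective degrees of freedom of a penalised linear spline."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])
    design = np.column_stack(
        [np.ones_like(x), x] + [np.clip(x - k, 0.0, None) for k in knots]
    )
    # Roughness penalty on the knot coefficients only; the straight line is unpenalised.
    pen = np.diag([0.0, 0.0] + [1.0] * n_knots)
    hat = design @ np.linalg.solve(design.T @ design + penalty * pen, design.T)
    return hat @ y, np.trace(hat)

# Simulated smooth trend with noise (values hypothetical).
rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0.0, 1.0, 300))
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, x.size)

penalties = [1e-3, 1.0, 1e3]
results = [penalised_spline_fit(x, y, lam) for lam in penalties]
edf = [float(df) for _, df in results]
rss = [float(np.sum((y - fit) ** 2)) for fit, _ in results]
for lam, df, r in zip(penalties, edf, rss):
    print(lam, round(df, 2), round(r, 2))
```

With a very small penalty the fit approaches an unpenalised spline using all 22 coefficients, whereas a very large penalty shrinks it towards a straight line with two.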


Here, smoothing parameters are chosen using generalised cross–validation (GCV). Cross–validation is a leave–one–out procedure, where each data point is iteratively left out of the smoothing fit, and the squared error of the fit to the point left out is computed. The objective is then to minimise the sum of the squared errors. The generalised case replaces the individual leverage of each point in the leave–one–out score by the average leverage over all points, so that the score can be computed from a single fit to the full data. Once again, this non–parametric approach allows us to avoid external assumptions about the form of the smooth function, which may compromise the solution. Taking the residuals of $M_g$ from this model then provides normalised expression values.

Figure 2.3 shows the spline fits from this model for the slide used above. The top panel describes intensity dependence, and is similar to the trend revealed by loess (cf. Figure ??A): the effect varies over approximately 2.5 M units, or 34% of the total range of M. The lower panel shows the bias due to deposition order, of the order of 0.3 M units (5% of the range of raw M). There is a clear periodicity in M, corresponding to the four days of the printing process. It would appear that as printing progresses on each day, there is a commensurate increase in M. This pattern is common to all slides in the data set: another example is shown in Figure 2.4.

The source of this artefact is uncertain, but looking at the raw foreground and background intensities provides a clue: the periodicity exists in both foreground estimates and the Cy5 (red) background (top three panels, Figure 2.5), but not the Cy3 (green) background intensity (bottom panel, Figure 2.5). These trends exist in the other slides in this dataset (data not shown), and suggest that signal, rather than background, may be increasing. There seems to be a cycling over the four days of printing, which is performed at near–ambient temperature (25°C) but high humidity (> 50%). Martinez et al. [2003] show a channel–specific bias, abrogated by exposure of slides to ambient conditions prior to printing; this demonstrates that atmospheric conditions may affect measured intensity signal. It appears that a similar effect occurs in our microarrays, such that increased hydration of the slide surface during prolonged exposure to high humidity tends to increase the signal in some fashion. This may be due to increased efficiency of oligonucleotide binding to the slide surface at elevated hydration levels,

Figure 2.3: Penalised regression spline fits for two non–linear biases in a typical microarray experiment, modelled as a GAM. Top: intensity dependence. Bottom: spot deposition order dependence.

Figure 2.4: Penalised regression spline fits for two non–linear biases in a typical microarray experiment, modelled as a GAM. Top: intensity dependence. Bottom: spot deposition order dependence.

which would increase foreground signal but not background levels. Irrespective of source, this is a clear data artefact, and should be removed.

Figure 2.5: Deposition order biases in red and green (A,B) foreground and (C,D) background intensities for a typical microarray. Curves obtained by fitting a GAM to each set of intensities with deposition order as the single smooth predictor. All but the green background would appear to have an embedded periodicity corresponding to the diurnal cycles of the printing process.

The detection of this time–dependent periodicity in our data is a good example of the possibility of underfitting [Ruppert et al., 2003; Simonoff, 1996]. The lower panels of Figure 2.2 demonstrate that a loess function with a default parameter set fails to adequately describe the periodicity subsequently uncovered through GAM spline fits, which are parameterised from the data using GCV. A smooth function requires, amongst other parameters, a local area span to be specified. This controls the area of data around which local fits are made when estimating the smooth curve (best visualised as the size of a scrolling window across the data). The loess default of 2/3 is simply too large to allow detection of a four–cycle effect, leading to an inadequately fitted data model. This observation does not reflect the superiority of one smooth function over another; rather, it reveals the importance of data–driven parameter estimation, with other properties (such as robustness to outliers) being of secondary importance. Cross–validation is a powerful method for arriving at such estimates [Wood, 2004], making the overall process of normalisation using GAMs more sensitive to data trends.

Generalised Additive Models are therefore a robust framework for removing multiple non–linear biases from microarray data. Multiple such trends, of which at least two have been reported, can be accommodated as described, and the application of cross–validation techniques to parametrise the smooth terms within the GAM provides sensitivity to data alterations. This methodology can be applied to the data from each subgrid of an array in the same way as the print–tip loess procedure. It provides a strong alternative to current methods, which fail to adequately account for the multiple systematic biases in microarray data.
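The point about data–driven parameter estimation can be illustrated with a small sketch. It is not the procedure used in this chapter: a Gaussian kernel smoother stands in for the splines and loess discussed above, ordinary leave–one–out cross–validation stands in for GCV, and the toy data (a four–cycle signal over 200 positions with added noise), the candidate bandwidths and all function names are assumptions made for illustration.

```python
import math
import random

def kernel_smooth(x, y, x0, bandwidth, skip=None):
    """Gaussian-kernel weighted mean of y at x0, optionally leaving out point `skip`."""
    num = den = 0.0
    for i, (xi, yi) in enumerate(zip(x, y)):
        if i == skip:
            continue
        w = math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2)
        num += w * yi
        den += w
    return num / den

def loocv_score(x, y, bandwidth):
    """Sum of squared errors when each point is predicted from all the others."""
    return sum((y[i] - kernel_smooth(x, y, x[i], bandwidth, skip=i)) ** 2
               for i in range(len(x)))

def choose_bandwidth(x, y, candidates):
    """Pick the candidate bandwidth with the smallest leave-one-out score."""
    return min(candidates, key=lambda h: loocv_score(x, y, h))

# Toy data (assumed): four cycles across 200 positions, plus Gaussian noise.
rng = random.Random(1)
x = list(range(200))
y = [0.3 * math.sin(2 * math.pi * 4 * xi / 200) + rng.gauss(0, 0.05) for xi in x]

candidates = [2.0, 5.0, 20.0, 130.0]   # hypothetical window widths
best = choose_bandwidth(x, y, candidates)
print("selected bandwidth:", best)
```

In this toy setting the widest window averages the cycles away, so its leave–one–out error is large and cross–validation selects a narrower one; a fixed wide default would have hidden the periodicity.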

2.4 Failure of normalisation in genetical genomics experiments

The artefacts described above appear to pervade microarray data. Perhaps surprisingly, there has not yet been a systematic investigation of the effect of normalisation methods on expression level linkage results. This is possibly due to an implicit assumption that incidental systematic noise from any one slide will not be able to generate artefacts which will affect linkage analysis. In this section, we explore the effects of normalisation method on linkage results from a genetical genomics experiment using a panel of sixteen BxD Recombinant Inbred (RI) strains [Bailey, 1971]. We compare linkage results obtained with raw data, and after median adjustment, whole–slide and subgrid loess, and the GAM normalisation described above.

Studies using two–colour microarrays have generally used linear transformations to scale between arrays: Brem et al. [2002] and Yvert et al. [2003] average ratios, assuming log–normality (after Fazzio et al. [2001]), although Brem and Kruglyak [2005] report that linkage results from these experiments in a yeast segregating population are robust whether scaling or ANOVA normalisation is used; and Schadt et al. [2003] and Monks et al. [2004] scale channel intensities by mean intensity division prior to ratio calculation (after Hughes et al. [2000]). Other investigators, using single channel Affymetrix GeneChips, generally use the default trimmed median scaling provided in the Affymetrix MAS data analysis suite [for example, Bystrykh et al., 2005; Hubner et al., 2005]. Chesler et al. [2005] report that their linkage results from BxD RI mouse lines are similar when either the default normalisation method or a robust variant, RMA [Irizarry et al., 2003], is used; this conclusion is based on the visual inspection of summary plots for the whole genome. It is, however, misleading, as closer examination shows that only 35% of transcripts are identified in data from both methods as having significant genetic determinants in the genome [RBH Williams, CJ Cotsapas, et al, submitted].

2.4.1 Experimental design

Each of sixteen RI strains was represented by three age– and sex–matched individuals. Pooled total RNA from each strain was reverse transcribed and co–hybridised with a common reference to microarrays containing the NIA 15K set [Kargul et al., 2001], giving a dataset of sixteen slides. After image extraction, background correction and removal of control spot values, expression ratios were calculated as $M = \log_2(R) - \log_2(G)$ as described in section 2.1.1, where R is the intensity of the red channel (RI sample), and G that of the green channel (reference sample). The data was normalised in each of the following ways: no normalisation, median scaling, loess smoothing applied to the whole slide [Smyth and Speed, 2003], loess smoothing applied separately to each subgrid [Smyth and Speed, 2003; Yang et al., 2002], and GAM smoothing applied to each subgrid as described above. The latter three methods are followed by median absolute deviation (MAD) between–array scaling: global loess–treated data is scaled as whole slides; the other two treatments are scaled per subgrid, as described in the Materials and Methods section for this chapter, and in Yang et al. [2002]. I shall abbreviate these treatments to Raw, Median, Loess, Print–tip, and Gam, respectively.

The sixteen M values across the panel for each gene are then used as expression phenotypes in a linkage analysis to determine genetic influences on gene expression. Linkage, to a genetic map comprising 387 markers spanning all autosomes and the X chromosome, was assessed for each of the five datasets using a bootstrapped t–test. This is equivalent to the more common regression–based methods for linkage analysis when there are only two genotypes: RI lines are obligatory homozygotes at all loci, so only the two homozygous genotypes are considered. Significance was defined as either P ≤ 0.00013 or P ≤ 0.000025 (genome–wide Bonferroni corrected p ≤ 0.05 and p ≤ 0.01 over the 387 markers, respectively) for association to any marker.
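The expression ratio and the simplest treatments listed above can be sketched as follows. This is an illustration rather than the limma code actually used: the function names are hypothetical, the mean log–intensity A is assumed to be the average of the two log2 channel intensities, and the MAD scaling shown (rescaling every array to the geometric mean of the MADs) is one common formulation that is assumed here.

```python
import math
from statistics import median

def log_ratio(red, green):
    """Expression ratio M = log2(R) - log2(G)."""
    return math.log2(red) - math.log2(green)

def mean_log_intensity(red, green):
    """Mean log-intensity A (assumed definition: average of the log2 channels)."""
    return 0.5 * (math.log2(red) + math.log2(green))

def median_centre(m_values):
    """Median scaling: shift the M values of one array so their median is zero."""
    centre = median(m_values)
    return [m - centre for m in m_values]

def mad(values):
    """Median absolute deviation from the median."""
    centre = median(values)
    return median(abs(v - centre) for v in values)

def mad_scale(arrays):
    """Rescale each array of M values so all share the geometric-mean MAD.

    Assumes every array has a non-zero MAD.
    """
    mads = [mad(a) for a in arrays]
    target = math.exp(sum(math.log(s) for s in mads) / len(mads))
    return [[m * target / s for m in a] for a, s in zip(arrays, mads)]

# Toy values (assumed): one spot, then two small "arrays" with different spread.
print(log_ratio(8, 2), mean_log_intensity(8, 2))
print(median_centre([1.0, 2.0, 6.0]))
scaled = mad_scale([[-1.0, 0.0, 1.0], [-4.0, 0.0, 4.0]])
print([round(mad(a), 6) for a in scaled])
```

Median centring only shifts each array, which is why it cannot remove intensity– or order–dependent curvature; MAD scaling only equalises spread between arrays after the within–array correction has been made.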

2.4.2 Lack of agreement between normalisation results

If more conservative normalisation methods simply remove artefactual linkage signal, we would expect to see a gradual decrease in the number of genes identified with progressively more conservative treatments. There should, however, be many genes in common between analyses, which are presumably under genuine genetic influence. Contrary to this prediction, Table 2.1 shows that the number of genes identified increases, but there is very little agreement across all the data treatments. This is true for both genome–wide corrected p ≤ 0.05 and p ≤ 0.01.

Perhaps unsurprisingly, there is strong agreement between results from untreated and median scaled data. Since median scaling is a first–order adjustment, any spurious linkage due to the non–linear biases discussed above will not be removed. The lack of concordance with data treatments capable of removing second–order structure suggests that virtually all linkage identified with the more permissive methods is spurious. The somewhat higher agreement between the three conservative methods (L, P, and G in Table 2.1) indicates that increasingly sensitive removal of subtle bias begins to stabilise linkage results. These two groups of overlap, within but not between first– and second–order data corrections, suggest that these results are not due to low power, but to fundamental changes in internal data structure.

            R (%)      M (%)      L (%)      P (%)      Total
p ≤ 0.05
  R         –          –          –          –          295
  M         248 (70)   –          –          –          308
  L         16 (2)     20 (3)     –          –          400
  P         12 (2)     13 (2)     198 (35)   –          366
  G         9 (1)      13 (2)     97 (14)    105 (16)   409
p ≤ 0.01
  R         –          –          –          –          60
  M         38 (46)    –          –          –          60
  L         4 (2)      4 (2)      –          –          110
  P         4 (3)      5 (4)      33 (21)    –          82
  G         4 (3)      3 (2)      17 (10)    18 (13)    77

Table 2.1: Effect of normalisation method on the identification of genes under genetic influence. Linkage significant at top: p ≤ 0.05; and bottom: p ≤ 0.01. Proportions are calculated as the common fraction of unique genes in two analyses (i.e. intersect/union).
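The intersect/union proportion used in Table 2.1 can be computed either from the three counts or directly from two gene lists. The sketch below uses hypothetical function names; the worked figures are the Raw versus Median and Loess versus Print–tip entries at the less stringent threshold.

```python
def overlap_proportion(n_first, n_second, n_shared):
    """Shared genes as a fraction of the unique genes in two analyses."""
    union = n_first + n_second - n_shared
    return n_shared / union

def overlap_from_sets(first, second):
    """Same proportion, computed from two collections of gene identifiers."""
    first, second = set(first), set(second)
    return len(first & second) / len(first | second)

# Raw (295 genes) vs Median (308 genes), 248 in common.
print(round(100 * overlap_proportion(295, 308, 248)))
# Loess (400 genes) vs Print-tip (366 genes), 198 in common.
print(round(100 * overlap_proportion(400, 366, 198)))
```

Because the denominator is the union, the proportion penalises disagreement in either direction: two analyses that each find several hundred genes but share only a handful score close to zero, however many genes each finds.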

It should be noted, however, that the false discovery rate due to multiple testing in this experiment is crippling: at p ≤ 0.05, we expect $15206 \times 0.05 \approx 760$ genes by chance, and at p ≤ 0.01, we expect $15206 \times 0.01 \approx 150$. The overlaps within the two groups (none/permissive, and conservative) of treatments suggest that at least some of the identifications reflect signal within the data, rather than false positives. To test this suggestion, we have compared the overlap in loci being identified as exerting genetic influence in the five analyses. If complex artefacts have no significant effect on linkage results, but study power is low, we might expect that there will be little overlap in the individual genes identified (as above), but the loci identified would be reasonably similar. Furthermore, we can ameliorate the false discovery rate by only looking at loci appearing to influence multiple genes. Such loci are biologically interesting as they would indicate that the affected genes are functionally related (“regulons” [Cotsapas et al., 2003]). The expected false positive number of loci appearing to influence a single gene at p ≤ 0.05 is still $15206 \times 0.05 \approx 760$; however, the number of loci influencing $n$ genes by chance is now $15206 \times 0.05^n$, which is approximately 2 for $n = 3$. Similarly, at p ≤ 0.01 we would expect 152 and 0.02 for $n = 1$ and $n = 3$, respectively.
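The chance expectations quoted above follow from a single expression, which can be checked with a few lines. The function name is hypothetical; the formula is the one given in the text (number of genes multiplied by the significance level raised to the number of genes a locus must appear to influence).

```python
def expected_false_loci(n_genes, alpha, n_linked):
    """Expected number of chance findings when n_linked genes must share a locus."""
    return n_genes * alpha ** n_linked

for alpha in (0.05, 0.01):
    for n_linked in (1, 3):
        value = expected_false_loci(15206, alpha, n_linked)
        print(f"alpha={alpha}, n={n_linked}: {value:.3f}")
```

Requiring three genes per locus lowers the chance expectation by a factor of alpha squared relative to a single gene, which is what makes the locus–level comparison informative despite the poor gene–level false discovery rate.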

            R (%)      M (%)      L (%)      P (%)      Total
p ≤ 0.05
  R         –          –          –          –          16
  M         15 (79)    –          –          –          18
  L         6 (14)     7 (16)     –          –          32
  P         6 (12)     6 (12)     23 (47)    –          40
  G         8 (12)     9 (14)     25 (39)    27 (39)    57
p ≤ 0.01
  R         –          –          –          –          7
  M         6 (62)     –          –          –          6
  L         2 (22)     2 (25)     –          –          4
  P         2 (17)     3 (18)     6 (57)     –          7
  G         2 (18)     2 (20)     5 (67)     5 (62)     6

Table 2.2: Effect of normalisation on identification of loci influencing at least three transcript levels. Top: p ≤ 0.05; bottom: p ≤ 0.01. Proportions are calculated as the common fraction of unique genes in two analyses (i.e. intersect/union).

Table 2.2 summarises the overlaps between loci apparently influencing at least three transcripts at the two genome–wide significance levels. The same pattern of overlaps as seen previously emerges: there is a clear correspondence within, but not between, the two groups of data treatments. Since the numbers of loci detected are much higher than those expected by chance, we may conclude that these results reflect structure, of either biological or systematic origin, in the underlying data. The agreement between the three smoothing techniques, particularly at the more stringent significance level, shows that these approaches are capable of identifying at least some of the same effects. In contrast, the lack of agreement with the two permissive treatments indicates that the majority of linkage identified in data processed with the latter is not robust to removal of known data bias, and therefore should be regarded with suspicion.

2.5 Conclusions

We have shown that multiple non–linear biases may exist in microarray data, and described an extensible mathematical framework to remove them, based on generalised additive models. This approach benefits from robust methods for parameter estimation, ensuring that the models describe the data as accurately as possible. We have further shown that normalisation method can have profound effects on linkage analyses performed with the resulting expression measurements. The implication here is inescapable: microarray data contains complex systematic biases which, if not removed, generate linkage signal which appears biologically plausible, but which must be regarded as artefactual, since it is not robust to further bias removal. A normalisation strategy tailored to each dataset is therefore a prerequisite for avoiding spurious inference of genetic influences on gene expression levels.

Other evidence also suggests that this problem exists in studies on larger populations and on different platforms [RBH Williams, CJ Cotsapas, et al, submitted]: we have shown that a similar lack of concordance between data treatments pertains to an independent study of thirty–two BxD strains [Chesler et al., 2005], so such effects are not unique to the data presented, or a consequence of small population size. These results lead to the uncomfortable conclusion that many of the observations reported in the literature are suspect, and may be artefactual. However, tailoring normalisation methods should resolve any such problems, so that re–analysis of previous results is a viable option.


2.6 Materials and Methods

2.6.1 Sample handling

Three eight–week old males from each of BxD strains 1, 2, 6, 9, 11, 12, 13, 14, 16, 18, 19, 21, 24, 29, 31, and 32 were obtained from the Jackson Laboratory, Bar Harbor, Maine. Animals were housed in standard conditions for one week to acclimatise, and then sacrificed by cervical dislocation. Whole brains, livers, kidneys, spleens and testes were harvested immediately and snap frozen in liquid nitrogen. Ten C57Bl/6J males were processed in the same way, to provide a reference sample.

Total RNA was extracted from whole brains with TriZol reagent (Invitrogen, Carlsbad, CA) as per the manufacturer’s protocols. Quality of RNA was assessed by spectrophotometry (A260/A280 absorbance ratios of > 2) and electrophoresis (rRNA bands visible on 1% agarose gels). Pools for each strain were then created by mixing equal amounts of RNA from each individual.

2.6.2 Expression profiling

50 µg RNA from each strain pool was reverse transcribed and indirectly labelled using a commercial kit (Invitrogen, Sydney, Australia) as per the manufacturer’s instructions. BxD samples were labelled with Cyanine 5 dye (Invitrogen, Sydney, Australia), and the C57Bl/6J reference samples with Cyanine 3 dye. Each sample/reference pair was concentrated to 2–3 µl and resuspended in 50 µl DIG Easy buffer (Roche, Paris, France) containing 5 µl each of 10 mg/ml yeast tRNA and 10 mg/ml calf thymus DNA (Sigma, Sydney, Australia). This mixture was then applied to microarrays printed with the NIA 15K set (Clive and Vera Ramaciotti Centre, UNSW, Sydney, Australia) and hybridised under a coverslip at 37°C for 15 hours. Coverslips were removed by immersion in 1xSSC; slides were then washed three times for 15 mins at 50°C with 1xSSC, 0.1% SDS, rinsed three times in 1xSSC, and dried by centrifugation. Slides were scanned in an ArrayWorx (Applied Precision) microarray scanner for 0.4s (Cy3) and 0.5s (Cy5). The resultant TIFF images were then extracted with Spot v. 2 (CSIRO, Australia; http://experimental.act.cmis.csiro.au/Spot/index.php), and expression ratios were calculated after control removal and morphological opening background subtraction.

2.6.3 Normalisation

Median, global loess, and print–tip loess normalisations were carried out as implemented in the limma v. 1.8.6 package of Bioconductor [Gentleman et al., 2004], as described in Smyth [2004]. Generalised Additive Models were fitted using the mgcv package [Wood, 2001] for the R programming language [R Development Core Team, 2005], using default parameters. A GAM is fitted to each subgrid of each array, with smooth functions of mean intensity A and deposition order D used as predictors for expression ratio M. The residuals of the model are then taken as normalised M values. For all three smoothing–based methods, scaling is then performed using the limma library.
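The idea of taking residuals from an additive fit of M on A and D can be sketched without mgcv. The code below is not the implementation used here: it uses the classical backfitting algorithm with a running–mean smoother in place of penalised regression splines, a fixed window in place of a cross–validated smoothing parameter, and simulated data; all names and values are assumptions for illustration.

```python
import math
import random

def running_mean_smooth(x, r, half_width):
    """Smooth r against x using a running mean over neighbours in x order."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    fitted = [0.0] * len(x)
    for rank, i in enumerate(order):
        lo = max(0, rank - half_width)
        hi = min(len(order), rank + half_width + 1)
        window = [r[j] for j in order[lo:hi]]
        fitted[i] = sum(window) / len(window)
    return fitted

def centre(values):
    """Subtract the mean so each smooth term is identifiable."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def backfit_additive(m, a, d, half_width=10, n_iter=20):
    """Fit M = intercept + f1(A) + f2(D) + error by backfitting; return residuals."""
    n = len(m)
    intercept = sum(m) / n
    f1 = [0.0] * n
    f2 = [0.0] * n
    for _ in range(n_iter):
        partial = [m[i] - intercept - f2[i] for i in range(n)]
        f1 = centre(running_mean_smooth(a, partial, half_width))
        partial = [m[i] - intercept - f1[i] for i in range(n)]
        f2 = centre(running_mean_smooth(d, partial, half_width))
    residuals = [m[i] - intercept - f1[i] - f2[i] for i in range(n)]
    return residuals, f1, f2

# Simulated subgrid (assumed): intensity-dependent curve plus a four-cycle
# deposition-order effect plus noise.
rng = random.Random(0)
n = 400
d = list(range(n))
a = [rng.uniform(6, 14) for _ in range(n)]
m = [0.5 * math.sin(a[i]) + 0.3 * math.sin(2 * math.pi * 4 * d[i] / n)
     + rng.gauss(0, 0.1) for i in range(n)]

residuals, f1, f2 = backfit_additive(m, a, d)
print(round(variance(m), 4), round(variance(residuals), 4))
```

The residuals play the role of the normalised M values: both systematic trends are removed in one model rather than by applying two separate corrections in sequence.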

2.6.4 Linkage analysis

A genetic map of 387 informative mouse markers spanning all autosomes and the X chromosome was compiled from publicly available information on the BXD strains at http://www.nervenet.org [Eva Chan, UNSW, pers. comm.]. Markers with missing genotypes were excluded, as were those with redundant Strain Distribution Patterns (genotype strings across the panel; SDPs). Those with fewer than two of either genotype were considered uninformative, and also excluded.

For each marker, a Student’s t–test was calculated for each gene by separating the 16 expression ratios into two groups by genotype. Expression ratios were then randomly resampled with replacement to create new groups of the same numbers of observations, from which a new t–test was calculated. This process was repeated 15,000 times to give a distribution of permuted t–tests for each gene. The P value is then the proportion of permuted t–tests greater in magnitude than the observed statistic for the gene. These P values are therefore limited to a minimum value of 1/10000, or $1 \times 10^{-5}$.
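The resampling test described above can be sketched for a single gene and a single marker as follows. This is an illustration under stated assumptions, not the code used for the analysis: the pooled–variance form of the t statistic, the genotype labels "B" and "D", the function names and the toy data are all hypothetical.

```python
import math
import random

def t_statistic(group_a, group_b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(group_a), len(group_b)
    mean_a = sum(group_a) / na
    mean_b = sum(group_b) / nb
    ss_a = sum((v - mean_a) ** 2 for v in group_a)
    ss_b = sum((v - mean_b) ** 2 for v in group_b)
    pooled = (ss_a + ss_b) / (na + nb - 2)
    diff = mean_a - mean_b
    if pooled == 0.0:
        # Degenerate resample: no within-group spread at all.
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(pooled * (1 / na + 1 / nb))

def bootstrap_p_value(ratios, genotypes, n_resamples=15000, seed=0):
    """Proportion of resampled |t| values exceeding the observed |t|."""
    rng = random.Random(seed)
    group_a = [m for m, g in zip(ratios, genotypes) if g == "B"]
    group_b = [m for m, g in zip(ratios, genotypes) if g == "D"]
    observed = abs(t_statistic(group_a, group_b))
    na = len(group_a)
    exceed = 0
    for _ in range(n_resamples):
        # Resample all ratios with replacement, then split into groups
        # of the original sizes, ignoring genotype.
        resample = rng.choices(ratios, k=len(ratios))
        if abs(t_statistic(resample[:na], resample[na:])) > observed:
            exceed += 1
    return exceed / n_resamples

genotypes = ["B"] * 8 + ["D"] * 8
linked = [1.0 + 0.01 * i for i in range(8)] + [-1.0 - 0.01 * i for i in range(8)]
unlinked = [0.1, -0.2, 0.3, -0.1, 0.2, -0.3, 0.0, 0.1] * 2

p_linked = bootstrap_p_value(linked, genotypes, n_resamples=2000)
p_unlinked = bootstrap_p_value(unlinked, genotypes, n_resamples=2000)
print(p_linked, p_unlinked)
```

The smallest non–zero P value such a procedure can report is one over the number of resamples, so the number of resamples sets the resolution available for multiple–testing corrections.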

Chapter 3

Genetic influence on mRNA levels is tissue specific

3.1 Introduction

One of the more significant insights in modern genetics has been the realisation that phenotypic diversity does not necessarily reflect a commensurate level of genetic difference. Thus, genetically closely related species can in fact differ substantially in morphology, behaviour and cognition, and biochemistry [for example, Gompel et al., 2005; Hunter et al., 2005]. It has been suggested that there is therefore insufficient genetic variation to explain such divergence in terms of coding sequence polymorphisms that lead to alterations in gene product activity, and therefore variants which alter gene regulation may have significant roles in the generation of diversity [King and Wilson, 1975].

The general principle of phenotype:genotype variation imbalance can also be applied to the differences between individuals of a species. The recent demonstrations of substantial heritability in, and genetic influences on, many mRNA levels in yeast [Brem et al., 2002; Yvert et al., 2003], rodents [Bystrykh et al., 2005; Chesler et al., 2005; Hubner et al., 2005], and humans [Monks et al., 2004; Morley et al., 2004; Schadt et al., 2003] reinforce this notion, suggesting that regulatory variants are a major mechanism of phenotypic variation between individuals. By analogy to sequence polymorphisms, it has generally been assumed that a significant proportion of regulatory polymorphisms between individuals can be detected by extrapolation from a single tissue or state [Chesler et al., 2005]. Since such polymorphisms must reside in regulatory mechanism components, this would imply that these components are common between tissues, despite extensive evidence that both cis–acting functional elements and trans–acting transcription effectors are tissue specific [Wray et al., 2003], as are at least some expression differences between individuals [Bystrykh et al., 2005; Chesler et al., 2005; Cowles et al., 2002]. The total incidence of regulatory variation is therefore unknown, even within the well characterised genetic environments of segregating experimental populations.

Here, we present an experiment using microarrays to measure differences in 22,000 gene expression levels between three tissues in two common inbred strains of mice, C57BL/6J and DBA/2J. The experiment is performed using pools of total RNA from many individuals raised in a constant environment, so that any differences between strains may be considered of genetic origin. The strains are amongst the oldest extant mouse strains [Beck et al., 2000], having been developed at the beginning of the last century [Festing, 1998]. They have a significant number of strain–specific polymorphisms, and differ in many physiological and behavioural phenotypes, such as responses to ethanol, sugar preference, and stress–related behaviours [Festing, 1998]. In these experiments, we show that >95% of expression level differences are tissue–specific in transcripts detectable in all three tissues, and that these are mostly genes involved in transcription and associated metabolic pathways. Finally, we demonstrate that extrapolation between tissues misidentifies the vast majority of effects in the predicted tissue, a result which has implications for experimental design, particularly in human genetics.

3.2 Experimental design

We wished to identify genes whose mRNA levels are influenced by genetic variation between the two inbred mouse strains, and ascertain whether these influences occur across multiple tissues or are tissue–specific. However, not all genes on our arrays are expressed in any or all of the three tissues: we therefore define expressed genes as those reliably detected in each tissue, i.e. having a mean intensity greater than the 95th percentile of negative control values. Our experiment compares multiple age– and sex–matched individuals from the two strains, raised in identical environmental conditions and processed in the same way. We therefore considered significant changes in mRNA levels of reliably detected genes to be the result of genetic variation, and term such genes genetically influenced. Genes can therefore be classified as:

expressed in all tissues, and

1. genetically influenced in all tissues, or
2. genetically influenced in a subset of tissues, or
3. not genetically influenced;

expressed in a subset of tissues, and

1. genetically influenced in those tissues, or
2. genetically influenced in a further subset of tissues, or
3. not genetically influenced;

not expressed.
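The classification above can be written as a small decision function. The sketch below is illustrative only: the function names, the use of a nearest–rank percentile and the tissue labels are assumptions, and the per–tissue flags would in practice come from the detection and significance criteria described in this section.

```python
def percentile_95(values):
    """95th percentile by the nearest-rank definition (an assumed convention)."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100   # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]

def is_expressed(mean_intensity, negative_controls):
    """Reliably detected: mean intensity above the 95th percentile of controls."""
    return mean_intensity > percentile_95(negative_controls)

def classify_gene(expressed, influenced):
    """Classify one gene from dicts mapping tissue name -> bool."""
    tissues_expressed = {t for t, flag in expressed.items() if flag}
    if not tissues_expressed:
        return "not expressed"
    if len(tissues_expressed) == len(expressed):
        scope = "all tissues"
    else:
        scope = "a subset of tissues"
    gi = {t for t in tissues_expressed if influenced.get(t, False)}
    if not gi:
        status = "not genetically influenced"
    elif gi == tissues_expressed:
        status = "genetically influenced in all of those tissues"
    else:
        status = "genetically influenced in a subset of those tissues"
    return f"expressed in {scope}; {status}"

example = classify_gene(
    {"brain": True, "kidney": True, "liver": True},
    {"brain": True, "kidney": False, "liver": False},
)
print(example)
```

Only tissues in which a gene is expressed are allowed to contribute to its genetically influenced status, which mirrors the requirement that changes be assessed in reliably detected genes.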

Further, we can classify gene expression across tissues as either at constant or variable levels. A gene can thus have mRNA levels under genetic influence in one or more tissues, and either constant or variable expression across tissues. The former is a result of genetic variation between two strains, whereas the latter is independent of strain differences.

Separate pools of total RNA were created from ten whole brain, ten kidney, and nine liver preparations for the two strains of mice [Pulvers, 2004]. The tissue pools from each strain were directly compared on dual colour spotted oligo microarrays representing 22,000 transcripts in sixfold technical replicates. Data was normalised using the additive model framework described in the previous chapter, followed by between–array quantile normalisation for each set of replicates [Yang and Thorne, 2003]. Reliable detection was determined as an intensity at least 95% of the maximum negative control intensity in each set of six replicate arrays hybridised for each tissue.

Differences in mRNA levels between the strains were assessed using the B–statistic [Lonnstedt and Speed, 2002; Smyth, 2004], which is essentially a modification of a two–sample Student’s t–statistic incorporating changes to allow for the small variance values common in microarray data. A two–sample t–test divides the difference between the means of two groups by the standard error of that difference, which is derived from the variances of the two groups, thus capturing the difference as a proportion of the variability in each group. However, as variance estimates tend toward zero, the denominator incorporating these estimates becomes much smaller than the mean difference numerator. This leads to an inflationary effect where test statistics on the order of hundreds are common, even for small sample mean differences which do not reflect believable changes in mRNA levels. The solution proposed by Lonnstedt and Speed [2002] and modified by Smyth [2004] is to adopt a Bayesian inference approach to variance estimation, and calculate a log of posterior odds (B–statistic) for unequal vs. equal expression in the two samples (i.e. a log odds expression), given the data. These probabilities are derived by considering various properties of the data in a Bayesian framework, where estimation of the prior hyperparameters can be approached as a variable selection problem. Small gene–specific variances can be stabilised by “borrowing strength” from across the dataset within this framework to calculate each of the two probabilities. The result is a robust LOD score which reflects the probability of a gene having altered mRNA levels between the two strains. A value of at least three (i.e. $P \leq 10^{-3}$) was considered significant evidence of genetic influence on mRNA levels between the two strains. Full details of these procedures are given in the Materials and Methods section at the end of this chapter.
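The inflation caused by very small variance estimates, and the effect of “borrowing strength”, can be seen in a toy calculation. The sketch below is not the B–statistic itself: it shows only the variance–shrinkage step that underlies moderated statistics of this kind, applied to six replicate log–ratios for one gene. The prior variance and prior degrees of freedom are invented values; in practice they would be estimated from all genes on the array.

```python
import math

def sample_variance(ratios):
    mean = sum(ratios) / len(ratios)
    return sum((r - mean) ** 2 for r in ratios) / (len(ratios) - 1)

def ordinary_t(ratios):
    """One-sample t statistic for the mean log-ratio being zero."""
    n = len(ratios)
    mean = sum(ratios) / n
    return mean / math.sqrt(sample_variance(ratios) / n)

def moderated_t(ratios, prior_var, prior_df):
    """t statistic using a variance shrunk towards a prior value."""
    n = len(ratios)
    mean = sum(ratios) / n
    df = n - 1
    shrunk = (prior_df * prior_var + df * sample_variance(ratios)) / (prior_df + df)
    return mean / math.sqrt(shrunk / n)

# Six replicate log-ratios with a tiny mean and an even tinier spread (toy data).
ratios = [0.050, 0.051, 0.049, 0.050, 0.052, 0.048]
print(round(ordinary_t(ratios), 1))
print(round(moderated_t(ratios, prior_var=0.04, prior_df=4), 2))
```

With these values the ordinary statistic is very large even though the mean log–ratio of 0.05 corresponds to a change of only a few percent, whereas the shrunken variance brings the statistic back to an unremarkable value.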

3.3 Tissue specificity of influences on gene expression

The number of genes expressed (i.e. reliably detected) in each tissue, and the number of those genetically influenced, are presented in Table 3.1. More genes are expressed in the brain, consistent with observations that it is transcriptionally more complex than other tissues. There appear to be imbalances in the directionality of the genetic influences (one strain having higher mRNA levels than the other for the majority of genetically influenced genes), particularly in kidney and brain (Table 3.1, third and fourth columns). It should be noted that DBA/2J is known to have low brain weight [Roderick et al., 1973; Storer, 1967; Wahlsten et al., 1975], and C57BL/6J to have low kidney weight [Schlager, 1968] compared to its body mass. However, the imbalances in directionality do not reflect these relative differences, suggesting that the genetically influenced mRNA levels described in Table 3.1 are not due to gross physiological differences, such as changes to proportions of cell–types within tissues. They may, rather, reflect underlying transcription regulatory differences between the strains.

            Expressed       G.I.            Directionality*
Tissue      genes (%†)      genes (%‡)      D < B (%)     D > B (%)
Brain       10,932 (50)     247 (2)         76 (31)       171 (69)
Kidney      8,146 (37)      667 (8)         374 (56)      293 (44)
Liver       9,388 (43)      411 (4)         281 (64)      130 (36)

Table 3.1: Proportions of expressed (reliably detected) genes having genetically influenced (GI) mRNA levels between C57BL/6J (abbr. B) and DBA/2J (abbr. D). * Proportions of overexpressed genes in each strain. † Percentage of the 22,000 transcripts represented on the microarrays. ‡ Percentage of expressed genes.

3.3.1 The majority of genetic influences are tissue specific

To study the tissue specificity of genetic influences on mRNA levels, we first confined our analysis to the 6,522 transcripts expressed in all three tissues. A total of 755 unique transcripts are genetically influenced in at least one tissue, but only 2% of these are so in all three tissues, whereas 85.4% are genetically influenced only in one of the three ($P < 10^{-4}$; Figure 3.1). This result shows that the majority of genetic influences on mRNA levels are tissue specific, a result unbiased by lack of detection across all three tissues.

This observation may, however, be the result of low statistical power: if we have little power to detect a real genetic inuence on mRNA levels in any tissue, then our power of detecting the inuence in all tissues is even lower. Power is the probability of correctly rejecting the null hypothesis of a gene not being genetically inuenced: therefore, the probability of correctly reject the null independently in all three tissues is the product of the probabilities in each tissue. If, for example, the power in each tissue is the same Pt = 0.5, 3 then the power to detect a genetic inuence in all three tissues Pa = 0.5 =

54 3.3. TISSUE SPECIFICITY OF INFLUENCES ON GENE EXPRESSION

B statistic 2x mod. B 1.2x mod. B

B K B K B K

128 21 337 17 5 83 1694 575 2966

15 5 442 9 65 0 10 470 913

180 40 1688

L 755 L 160 L 8748

Figure 3.1: Overlaps between sets of genetically inuenced genes in three tissues. The numbers of genes whose expression is genetically inuenced (GI) in one or more tissues identied by the B-statistic and its modication as detailed in the text, for thresholds of 2– and 1.2– fold expression changes.

0.125. At low powers, therefore, we might expect to see the majority of genetically influenced genes appearing to be influenced in only one tissue.

To test whether the observed lack of overlap is due to such a power artefact in the above analysis, we developed a probabilistic approach to estimating the proportion of genetically influenced genes that properly accounts for the uncertainties involved. We assess the probability of genetic influence using a modified version of the B-statistic [Lönnstedt and Speed, 2002, see Materials and Methods]. Briefly, we calculate the posterior probability of any gene having an absolute mean expression ratio greater than a chosen threshold, and so being genetically influenced with an expression difference greater than a certain size [Nott et al., 2006, accepted]. Summing these probabilities gives the posterior expected number of genes whose mRNA levels are genetically influenced with effect greater in magnitude than the cutoff. Similarly, we can calculate probabilities of genetic influence in just one, just two or all three tissues for each gene and, by summing, calculate posterior expected values. It is important to realise that this analysis does not identify the individual genes under genetic influence, but rather the proportion of genes that are likely to be. Figure 3.1 summarises the results for two thresholds corresponding to a 2-fold and a 1.2-fold change in expression level: these define 160 and 8746 genes as being altered in at least one tissue, respectively, of which only 3-5% are genetically influenced in all three tissues.

Taken together, these results show that genetic variation overwhelmingly has a tissue specific influence on mRNA levels, and that this result is not an artefact of low statistical power or experimental noise. Of particular note is that using the less conservative, 1.2–fold threshold in the modified B–statistic analysis has a very limited impact upon the proportion of genes declared as genetically influenced in all three tissues, indicating that genetic influences between the strains tend to result in large differences in mRNA levels.
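The power artefact discussed above can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that a gene is truly influenced in all three tissues and is detected independently in each tissue with the same power; under that assumption a power of 0.5 gives a 0.5^3 = 0.125 chance of detection in all three tissues. The function names and the power values are hypothetical and are not taken from the analysis itself.

```python
from math import comb


def detection_pattern(power, n_tissues=3):
    """Probability of detecting a gene in exactly k tissues (k = 0..n_tissues).

    Assumes the gene is truly influenced in every tissue and that detection
    is independent between tissues with identical power (an illustrative
    simplification, not the model used in the thesis).
    """
    return [
        comb(n_tissues, k) * power**k * (1 - power) ** (n_tissues - k)
        for k in range(n_tissues + 1)
    ]


def share_single_tissue(power, n_tissues=3):
    """Among genes detected at least once, the fraction seen in one tissue only."""
    probs = detection_pattern(power, n_tissues)
    detected_somewhere = 1 - probs[0]
    return probs[1] / detected_somewhere


if __name__ == "__main__":
    for power in (0.9, 0.5, 0.3):  # hypothetical per-tissue powers
        print(power, detection_pattern(power), share_single_tissue(power))
```

As the assumed power falls, the single-tissue share rises even though every simulated gene is influenced everywhere, which is the artefact the probabilistic analysis is designed to rule out.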

3.3.2 Expression levels across tissues do not reflect complexity of regulation

The simplest explanation for these observations is that the causative genetic variations reside in regulatory machinery which is itself tissue specific. If this is the case, regulation of expression in each tissue will be mediated by different components of the regulatory machinery. We might therefore expect that genes with mRNA levels genetically influenced in only one tissue will tend to have variable expression levels across tissues. We show here that this hypothesis is incorrect, by calculating the proportion of the 755 transcripts identified in the previous section (those expressed in all tissues but overwhelmingly under genetic influence in only one tissue) which are expressed at variable levels in the three tissues. A gene is classified as variably expressed if the means of normalised intensities A for each of the three tissues vary by more than 10%. Up to 60% of genes with genetically influenced mRNA levels show constant expression across tissues (Table 3.2), indicating that although the influence of genetic variation is associated with variable expression (particularly in brain), lack of variable expression across tissues is not an accurate predictor of whether the underlying regulatory machinery is common between those tissues.

To assess the accuracy of extrapolation between tissues, each tissue was used to predict which genes would be genetically influenced in any of the other tissues (Table 3.3). For example, sampling liver to assess genetic variation in the brain leads to the identification of only 24 of the 247 genes that would have been detected by the direct brain analysis, and the erroneous identification of a further 387 which are genetically influenced in liver but not brain. These results cumulatively show that both the false positive and false negative prediction rates are extremely high between tissues. Furthermore, irrespective of expression in other tissues, 988 genes are influenced by genetic variation in at least one of the three tissues; we would identify only 19.2% if we were to use only brain, 27.4% if only liver and 53.3% if only kidney. Therefore, extrapolation of genetic influences between tissues misidentifies the vast majority of effects, and analysis of a single tissue will grossly underestimate the genetic influences on gene expression between two organisms.

Tissue                Constant       Variable       P–value†
(genes considered)    expression     expression
Brain (128)           25 (20%)       103 (80%)      < 2.2 × 10^-16 §
Kidney (337)          206 (61%)      131 (39%)      6.9 × 10^-03
Liver (180)           97 (54%)       83 (46%)       6.4 × 10^-04
Background‡           4436 (68%)     2086 (32%)

Table 3.2: Proportions of genetically influenced genes whose expression levels vary across tissues. The analysis is limited to genes expressed in all three tissues but whose expression is only genetically influenced in one. Genes are classified into those with and without substantially different expression levels between tissues, defined as a 10% range from mean intensity (see text for details). There is a substantial proportion of genetically influenced genes in each tissue whose expression across tissues is equal, suggesting that similar expression levels in multiple tissues are not a good indicator of regulatory homogeneity. † P–values determined by the Fisher-Irwin test. ‡ The background set of mRNA levels not under genetic influence to which proportions are compared. § Smallest floating point number resolvable.
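The liver-to-brain example quoted in the text can be checked against the corresponding entries of Table 3.3 with a few lines of set arithmetic. The sketch below assumes that the false negative rate is the share of the target tissue's genes that the predicting tissue misses, and that the false positive rate is the share of the predicting tissue's genes that are not influenced in the target; these definitions are inferred from the quoted counts rather than stated explicitly in the source. The gene identifiers are placeholders.

```python
def extrapolation_error_rates(target_genes, predicting_genes):
    """Return (false negative rate, false positive rate) for one tissue pair.

    FN: fraction of target-tissue genes not found in the predicting tissue.
    FP: fraction of predicting-tissue genes not influenced in the target.
    Both definitions are assumptions inferred from the counts in the text.
    """
    shared = target_genes & predicting_genes
    false_negative = 1 - len(shared) / len(target_genes)
    false_positive = 1 - len(shared) / len(predicting_genes)
    return false_negative, false_positive


# Placeholder identifiers reproducing the counts quoted in the text:
# 247 brain genes, 24 of them shared with liver, 387 liver-only genes.
brain = set(range(247))
liver = set(range(24)) | set(range(1000, 1387))

fn, fp = extrapolation_error_rates(target_genes=brain, predicting_genes=liver)
print(f"FN = {fn:.1%}, FP = {fp:.1%}")
```

With these counts the two rates round to the 90.3% and 94.2% listed for liver predicting brain.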

3.4 Functional bias in influenced transcripts

                       Predicting tissue
                  Brain             Kidney            Liver
Target tissue     FN       FP       FN       FP       FN       FP
Brain             –        –        81.0%    93.0%    90.3%    94.2%
Kidney            93.0%    81.0%    –        –        84.4%    74.7%
Liver             94.2%    90.3%    74.7%    84.4%    –        –

Table 3.3: The accuracy of assignment of genetic influence by extrapolation between tissues. These figures summarize our ability to predict genes whose expression is under genetic influence between the two strains in one tissue (target tissue) by extrapolating from another tissue (predicting tissue), as false negatives (FN) and false positives (FP). We would correctly identify less than a quarter of genetic effects, due to the extreme lack of overlap between genetically influenced genes in different tissues.

If the observed differences in expression between the strains are genetically encoded, we might expect that genes possessing certain functions might be more likely to differ than others. This could be due to co–regulation, where a polymorphism in an effector of transcription alters the expression levels of many genes of similar functionality (“regulons”, Cotsapas et al. [2003]); or may be a reflection of the different selection pressures placed on these strains in the interval since their inception. In either case, we would expect that the set of genes with genetically influenced mRNA levels would tend to have related functionality.

Here, this is assessed by an over–representation analysis of Gene Ontology [Ashburner et al., 2000] terms associated with these genes. Although GO is a controlled vocabulary, rather than an annotational tool per se, it encapsulates functional information in a triple hierarchy. This tripartite structure is particularly useful, as it allows us to look at a set of transcripts from three distinct points of view: Biological Process (BP), describing the context in which a gene product acts; Molecular Function (MF), detailing the exact mechanism of action of the transcript’s product; and Cellular Component (CC), which describes the site of action of a gene product in the cell [Ashburner et al., 2000]. Given the GO annotations for a set of transcripts, therefore, we can deduce functional “themes” within the set by looking for terms more common than expected by chance.

Here, we examine over–representation analyses for terms associated with two groups of transcripts: 1,165 which exhibit genetically influenced mRNA levels in at least one tissue; and a subset of these, the 755 transcripts described above which are also expressed in all three tissues.

3.4.1 Functions associated with transcription and signalling are over–represented

Table 3.4 presents GO terms which are significantly over–represented (P < 0.001) in the superset of 1,165 genetically influenced genes described above (columns labelled GV), and the subset of 755 genes expressed in all tissues (labelled GV & exp). Not all transcripts have associated GO terms, and so the analyses are reduced to the genes for which terms could be recovered. Root terms for each GO category, which are ubiquitous by definition, and terms with fewer than five occurrences in the subset, which may give spuriously low P–values, have been excluded. Term over–representation is calculated by resampling from the sets from which the genes were drawn. This prevents terms with high underlying frequencies from being mistakenly identified as over–represented.

Two functional themes emerge as being over–represented: transcriptional regulation, and (intra–cellular) signalling. The former is supported by the numerous over–represented BP terms involving transcription and nucleic acid metabolism, and their associated regulation, and the nucleic acid binding terms in MF. The latter is evidenced primarily by the kinase/transferase–associated MF terms, particularly involving phosphate groups, and by the BP terms relating to phosphorus metabolism. Phosphorylation is a classic mechanism of protein activation, and is a standard mechanism of intra–cellular signal transduction, for example in the Ras signalling pathway [Alberts et al., 2002], where MAP kinases phosphorylate one another in a sequential cascade, propagating a signal received at the cell surface into the nucleus in the form of activated transcription factors.

It is striking that the terms suggestive of signalling functionalities are not over–represented in the subset of 755 genes expressed in all tissues. This implies that the majority of genes underlying the over–representation are expressed in a tissue–specific manner. In contrast, the majority of genes underlying the transcriptional regulation theme appear to be within the subset of 755 genes. In the previous section, we saw that the subset of 755 genes, primarily implicated in transcriptional regulation, are overwhelmingly genetically influenced in only one tissue. We therefore posit a scenario where different parts of the transcriptional regulatory machinery are being affected in different tissues, by genetically encoded quantitative changes in signal transducer concentrations.

GO              GV                       GV & exp
Class†  Term    N‡    P–value            N‡    P–value            Annotation
BP      06139   81    3.88 × 10^-6       50    2.24 × 10^-5       nucleobase, nucleoside, nucleotide and nucleic acid metabolism
        06350   45    2.03 × 10^-6       28    5.32 × 10^-5       transcription
        19219   45    8.63 × 10^-6       28    1.41 × 10^-4       regulation of nucleobase, nucleoside, nucleotide and nucleic acid metabolism
        45449   45    1.13 × 10^-5       28    1.76 × 10^-4       regulation of transcription
        50791   55    6.08 × 10^-5       32    1.84 × 10^-4       regulation of physiological process
        06351   44    7.98 × 10^-6       28    1.89 × 10^-4       DNA–dependent transcription
        19222   54    1.03 × 10^-4       32    3.00 × 10^-4       regulation of metabolism
        06355   44    1.7 × 10^-5        28    3.60 × 10^-4       DNA–dependent regulation of transcription
        50789   78    1.78 × 10^-5       50    6.70 × 10^-4       regulation of biological process
        06793   20    2.59 × 10^-4       15    0.04               phosphorus metabolism
        06796   20    2.59 × 10^-4       15    0.04               phosphate metabolism
        08151   144   6.82 × 10^-4       92    0.05               cell growth and/or maintenance
        09987   248   2.89 × 10^-4       157   0.12               cellular process
MF      03677   12    3.01 × 10^-7       10    3.21 × 10^-5       DNA binding
        03676   29    1.38 × 10^-5       24    8.21 × 10^-4       nucleic acid binding
        05488   119   8.39 × 10^-6       82    3.65 × 10^-3       binding
        16772   9     2.00 × 10^-4       6     0.01               transferase activity, transferring phosphorus–containing groups
        16301   8     1.36 × 10^-4       6     0.02               kinase activity
        04672   7     4.87 × 10^-4       5     0.03               protein kinase activity
        16773   8     4.32 × 10^-4       6     0.03               phosphotransferase activity, alcohol group as acceptor
CC      05634   100   5.38 × 10^-6       72    9.72 × 10^-4       nucleus

Table 3.4: Over–representation analysis for Gene Ontology terms associated with genetically influenced genes. These terms occur more frequently than expected by chance in the 1,165 genes influenced in at least one tissue (GV), and in the subset of 755 expressed in all three tissues (GV & exp). Terms have been restricted to those with at least five instances in the latter set, to avoid spuriously low P–values. † GO category: Biological Process (BP), Molecular Function (MF), and Cellular Component (CC). ‡ Number of observed annotations.

3.5 Discussion

These analyses indicate that in inbred mice, the majority of genetically influenced mRNA level variation is tissue specific. Furthermore, it appears that this is at least partially a result of genetic variation affecting tissue–specific signalling pathways, which have quantitative effects on the transcriptional regulatory machinery in only one tissue. This is a somewhat counter–intuitive result: the simplest way to think of genetic influences on mRNA levels is as variants occurring in the transcriptional machinery itself, which then cause changes in expression of other genes. In contrast, the results presented here suggest that at least some of the causative variations reside elsewhere and may act to change the expression of transcriptional machinery components: this presumably has effects on the expression levels of genes regulated by those components. This is certainly consistent with previous observations: Yvert et al. [2003] show in a yeast haploid population that the expression levels of clusters of co–regulated genes are modulated by certain loci in the genome, but these loci do not contain an over–abundance of known transcription factors; this observation suggests that the causative variations may well reside in genes which are not directly involved in transcription.

Overall, these results show that mRNA levels in cells are set by complex mechanisms, not necessarily restricted to functions directly related to gene expression such as transcription, mRNA processing, export and translation. It is clear that these processes tend to be controlled by both basal and tissue specific factors [Wray et al., 2003] which may be subject to genetic variation. Further, the results suggest that lack of differential expression across tissues cannot reliably indicate a single control mechanism, as a significant proportion of transcripts genetically influenced only in a single tissue are expressed uniformly in all three tissues. We must therefore conclude that mechanisms of mRNA control are unexpectedly complex and that invariant expression across major tissues does not equate to a single underlying mechanism for mRNA level control.

The genetic influences on gene expression may be dissected by treating expression levels as phenotypes in a segregating population, and attempting to map genetic determinants of these phenotypes (“genetical genomics” [Jansen and Nap, 2001]). Such an experiment in a panel of 31 BxD Recombinant Inbred strains, derived from C57BL/6J and DBA/2J, is presented in the next chapter, where the tissue specificity of genetic influences on expression is discussed.

Perhaps the most difficult conclusion is that gauging the extent of regulatory genetic variation between individuals requires extensive tissue profiling, and that extrapolation between tissues is not possible. Ultimately, understanding the relevance of such variation to complex phenotypes, particularly disease status, will require extensive and accurate tissue models, and has profound implications for experimental design. This is particularly the case in human genetic research, where the practical and ethical difficulties in acquiring adequate samples of diverse tissues from individuals usually result in the use of cell lines derived from easily sourced cell types.


3.6 Materials and Methods

3.6.1 RNA preparation

Mus musculus strains C57BL/6J and DBA/2J were obtained from the Biological Resources Centre, UNSW (Sydney, Australia). Whole brain, kidney and liver tissues were harvested according to protocols approved by the University of New South Wales Animal Care and Ethics Committee (Ethics Code ACEC 01/43), and snap frozen in liquid N2. Total RNA was extracted with Trizol Reagent (Invitrogen, Mt. Waverley, Vic, Australia) according to the manufacturer’s instructions; purity was assessed by OD260/OD280 readings greater than 2, and integrity by intact rRNA bands on Agilent Bioanalyzer analysis (Agilent, Forest Hills, Vic, Australia). Total RNA from the tissues of 10 individuals was pooled for each strain (9 for liver) to remove individual variation in gene expression; 20 µg of pooled RNA and 2 µg of Lucidea Universal Scorecard Spike-in (Amersham Biosciences, Castle Hill, NSW, Australia) were reverse transcribed using the SuperScript III Indirect cDNA Labeling System (Invitrogen, Mt. Waverley, Vic, Australia) and fluorescently labeled with Alexa Fluor 555 for C57BL/6J and Alexa Fluor 647 for DBA/2J (Invitrogen, Mt. Waverley, Vic, Australia).

3.6.2 Microarray hybridisation and washing

For each tissue, labeled cDNA was directly compared on 6 replicate glass slide two-color microarrays containing the Compugen Mouse OligoLibrary representing 21,997 genes and Lucidea Universal ScoreCard (Clive and Vera Ramaciotti Center for Gene Function Analysis, UNSW, Sydney, Australia), in 100 µl of DIGEasy buffer (Roche, Basel, Switzerland) with 5 µl each of yeast tRNA and calf thymus DNA as blockers (Invitrogen, Mt. Waverley, Vic, Australia). Utility controls from the Lucidea Scorecard were not used, and therefore served as additional negative controls. Hybridized microarrays were washed in 1x SSC, three times in 1x SSC, 0.1% SDS at 50 °C, and three times in 1x SSC, dried by centrifugation, and scanned with the GenePix 4000B microarray scanner (Axon Instruments, Union City, CA, USA).


3.6.3 Data processing

Image analysis was performed with the Spot image analysis software version 2 (CSIRO, Australia, http://experimental.act.cmis.csiro.au/Spot/index.php). All further data processing and statistical analyses were performed using R version 2.0.0 [R Development Core Team, 2005]. Gene expression data were morph background corrected and log2 transformed. Data for controls and the 232 replicated spots of the housekeeping gene Gapd (NM_008084) were removed prior to normalization to avoid bias. All 18 slides were then normalized for intensity and spatial bias, and then quantile adjusted to account for the differing scale of measurements across arrays [Yang et al., 2002]. Log2 ratios of intensities, or M values, were subsequently used as expression measurements.

3.6.4 Overlap analysis

We classified genes as reliably detected if their log mean intensity, A = 0.5(log2 R + log2 G), was greater than the 95th percentile of negative controls present on our arrays. B statistics were then calculated for all genes, using default parameters in the R limma library version 1.8.6, part of the Bioconductor project [Gentleman et al., 2004]; genes were classified as genetically influenced if they had both a LOD score > 3 and an A value greater than the intensity threshold. Overlaps were calculated by comparing the Genbank identifiers of the relevant transcripts.
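The M and A values and the detection rule described above can be sketched as follows. This is an illustration only, not the analysis code: the B statistic is taken as a given input (the limma calculation is not reproduced), the percentile uses a nearest-rank convention chosen here for simplicity, and all function names and example numbers are hypothetical.

```python
import math


def m_value(red, green):
    """Log2 ratio of the two channel intensities (the expression measurement)."""
    return math.log2(red) - math.log2(green)


def a_value(red, green):
    """Log mean intensity, A = 0.5 * (log2 R + log2 G)."""
    return 0.5 * (math.log2(red) + math.log2(green))


def percentile(values, pct):
    """Nearest-rank percentile; one of several conventions (an assumption here)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def is_genetically_influenced(b_stat, a, intensity_threshold, b_cutoff=3.0):
    """Apply both criteria from the text: B (LOD) above 3 and A above threshold."""
    return b_stat > b_cutoff and a > intensity_threshold


# Hypothetical A values for 20 negative control spots.
negative_control_a = [6.0 + 0.1 * i for i in range(20)]
threshold = percentile(negative_control_a, 95)

a = a_value(red=1024.0, green=256.0)
m = m_value(red=1024.0, green=256.0)
print(m, a, threshold, is_genetically_influenced(3.5, a, threshold))
```

A gene with a large B statistic but an A value below the negative control threshold is rejected, which keeps poorly detected spots out of the overlap analysis.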

3.6.5 Detecting changes in expression between strains

NOTE: this section was written by David Nott, School of Mathematics, UNSW, for a co–authored manuscript describing the results presented in this chapter, and is used with permission.

Adopting the model of Lönnstedt and Speed [2002] for the M–values, the posterior probability of the true mean $\mu_{gj}$ of the M–values for gene $g$ and tissue $j$ being non–zero is

\[
\Pr(\mu_{gj} \neq 0 \mid M) = \frac{1}{1 + \exp(-B_{gj})}
\]

where $B_{gj}$ is the B–statistic. The validity of this expression can be derived under weaker assumptions than those in Lönnstedt and Speed [2002], as described in Smyth [2004], where essentially only approximate normality of the estimator of the mean parameter, an approximate scaled $\chi^2$ distribution for an estimator of the variance parameter, and independence between the estimators of mean and variance parameters are assumed. In calculating $B_{gj}$ we have used previously described prior distributions [Smyth, 2004]. All prior hyperparameters, including $p_j = \Pr(\mu_{gj} \neq 0)$, are estimated as in Smyth [2004]. Our prior hyperparameters are specific to each tissue, although for simplicity this dependence is suppressed in the notation below. In this model the posterior distribution of $\mu_{gj} \mid \mu_{gj} \neq 0, M$ is a t–density,

\[
\mu_{gj} \mid \mu_{gj} \neq 0, M \;\sim\; t_{d_0 + n_g}\!\left( \frac{c n_g}{1 + c n_g}\,\bar{M}_{gj},\;\; \frac{c n_g}{1 + c n_g} \cdot \frac{d_0 s_0^2 + \sum_k M_{gjk}^2 - n_g \bar{M}_{gj}^2\,\dfrac{c n_g}{1 + c n_g}}{n_g (d_0 + n_g)} \right)
\]

where $n_g = d_g + 1$; $d_0$, $d_g$, $c$, and $s_0^2$ are defined as in Smyth [2004]; $\bar{M}_{gj}$ is the sample mean of M–values for gene $g$ in tissue $j$; $M_{gjk}$ is the M–value for gene $g$, tissue $j$ and replicate $k$; and $t_{\nu}(\mu, \sigma^2)$ denotes the t–density with $\nu$ degrees of freedom, mean $\mu$ and scale parameter $\sigma^2$. For a given cutoff $k$, the posterior probability of $|\mu_{gj}| \geq k$ is

\[
\Pr(|\mu_{gj}| \geq k \mid M) = \Pr(\mu_{gj} \neq 0 \mid M)\,\Pr(|\mu_{gj}| \geq k \mid M, \mu_{gj} \neq 0)
\]

where, as we have seen, the first term is calculated from $B_{gj}$ and the second term can be calculated as a t–probability as described above. An estimate of the number of genes with nonzero mean M–values larger than a threshold for a given tissue is obtained by summing these probabilities over genes. This represents a so–called direct posterior probability approach to inference [Newton et al., 2004]. Estimates of numbers of genes whose expression is influenced in just one, just two or all three tissues are obtained in a similar way, by obtaining gene specific probabilities using posterior independence between tissues and summing over genes to get posterior expected values.
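The summation step described above can be sketched in a few lines. The example assumes that the per-tissue probability for each gene has already been obtained (the product of the B-statistic term and the t tail probability; the tail probability is passed in as a number rather than computed), and it combines tissues under the posterior independence stated in the text. Function names and the example probabilities are hypothetical.

```python
import math


def prob_nonzero(b_stat):
    """Posterior probability of a non-zero mean, 1 / (1 + exp(-B))."""
    if b_stat >= 0:
        return 1.0 / (1.0 + math.exp(-b_stat))
    odds = math.exp(b_stat)  # numerically stable form for negative B
    return odds / (1.0 + odds)


def prob_exceeds_cutoff(b_stat, tail_prob):
    """Pr(|mu| >= k | M): the B-statistic term times a supplied t tail probability."""
    return prob_nonzero(b_stat) * tail_prob


def tissue_count_distribution(probs):
    """Pr(gene influenced in exactly 0, 1, ..., n tissues), tissues independent."""
    dist = [1.0]
    for p in probs:
        updated = [0.0] * (len(dist) + 1)
        for count, mass in enumerate(dist):
            updated[count] += mass * (1.0 - p)
            updated[count + 1] += mass * p
        dist = updated
    return dist


def expected_counts(per_gene_probs):
    """Posterior expected number of genes influenced in exactly k tissues."""
    n_tissues = len(per_gene_probs[0])
    totals = [0.0] * (n_tissues + 1)
    for probs in per_gene_probs:
        for count, mass in enumerate(tissue_count_distribution(probs)):
            totals[count] += mass
    return totals


# Hypothetical per-tissue probabilities (brain, kidney, liver) for three genes.
genes = [(0.9, 0.1, 0.05), (0.5, 0.5, 0.5), (0.0, 0.0, 1.0)]
print(expected_counts(genes))
```

The totals are expected values, so no individual gene is declared influenced, in keeping with the caveat made in the main text.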

65 3.6. MATERIALS AND METHODS

3.6.6 Gene Ontology analysis

NOTE: this section is based on a methodological description written by Rohan Williams, School of BABS, UNSW, for a co–authored manuscript describing the results presented in this chapter, and is used with permission.

In order to test for over–representation of a GO term, we tested whether genes of interest were mapped to the term at a level greater than chance expectation (defined as the observable proportion of genes mapping to the term in the set of expressed genes in the experiment) using sampling without replacement from the hypergeometric distribution.
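As an illustration of the chance expectation used here, the sketch below computes the exact upper tail of the hypergeometric distribution, which is the analytic counterpart of sampling without replacement. It is not the resampling procedure used for the analysis, and the counts in the example are hypothetical rather than taken from Table 3.4.

```python
from math import comb


def hypergeom_upper_tail(n_hits, n_drawn, n_annotated, n_total):
    """Pr(X >= n_hits) when n_drawn genes are sampled without replacement
    from n_total genes, of which n_annotated carry the GO term."""
    upper = min(n_drawn, n_annotated)
    favourable = sum(
        comb(n_annotated, i) * comb(n_total - n_annotated, n_drawn - i)
        for i in range(n_hits, upper + 1)
    )
    return favourable / comb(n_total, n_drawn)


# Hypothetical example: 45 of 755 genes of interest carry a term that is
# attached to 300 of 12,000 expressed genes.
p_value = hypergeom_upper_tail(n_hits=45, n_drawn=755, n_annotated=300, n_total=12000)
print(p_value)
```

Because the background is the set of expressed genes, a term that is simply common on the array does not score as over-represented.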

Chapter 4

Dissection of genetic influences on mRNA levels in a Recombinant Inbred panel


4.1 Introduction

Gene expression levels have been shown to be heritable in yeast [Brem et al., 2002; Yvert et al., 2003], plants [Schaart et al., 2005; Schadt et al., 2003], fruit flies [Wittkopp et al., 2004], mice [Bystrykh et al., 2005; Chesler et al., 2005], rats [Hubner et al., 2005], and humans [Monks et al., 2004; Morley et al., 2004; Schadt et al., 2003]. The principle of stably inherited expression levels was first proposed by King and Wilson [1975], and elegantly demonstrated by de Vienne et al. [1988] and Damerval et al. [1994], who showed that some protein levels are heritable. The later studies cited above leveraged the power of genome–wide transcription level assays to demonstrate that a significant proportion of genes have partially heritable expression levels. By treating expression levels as quantitative traits in genetic mapping approaches, these studies have also collectively shown that this heritability is complex in origin, being detectable as multiple genetic effects both in cis and in trans to transcripts.

The complexity of genetic effects on gene expression reported in many of the above studies is perhaps surprising, given the modest (by classical QTL analysis standards) sample sizes used. In an explicit investigation of epistasis, Storey et al. [2005] found that 37% of yeast gene expression levels measured in 112 F1 segregants show simultaneous linkage to two independent loci. In a different approach, Wittkopp et al. [2004] show that effects in cis and in trans appear to compensate each other in fruit flies, such that offspring display differential expression of genes which are not different between the parents, due to a reassortment of alleles. This phenomenon of transgressive segregation has also been reported in yeast [Brem and Kruglyak, 2005], plants [Damerval et al., 1994], and mice [Chesler et al., 2005]. It is worth noting, however, that very strong heritabilities have been reported alongside these results: Brem et al. [2002] report a median of 84% in the yeast population analysed by Storey et al. [2005], and Monks et al. [2004] a median of 34% in 15 human pedigrees.

From a mechanistic point of view, this complexity is expected: a large number of molecules is involved in the transcription of any gene [Wray et al., 2003]; genetic variation in the effect of more than one of these molecules would result in the genetically complex inheritance of transcript levels. Similarly, variations in the genesis of regulatory input to the transcriptional process (for example, in multiple members of intracellular signalling cascades or ligand binding at the cell surface) could result in such genetic complexity.

In this chapter, we use a panel of 31 Recombinant Inbred strains derived from the two strains used in the previous chapter to show that genetic influences on transcript levels are mappable as quantitative traits in three tissues, that they can be of complex genetic origin, and that some biological functionalities are more susceptible to such effects. We consider each tissue independently; comparisons across tissues are presented in the following chapter.

4.2 Experimental design

Segregating populations offer a powerful method of dissecting genetic determinants of mRNA levels. The ability to definitively identify a genetic component to such molecular traits offers compelling proof of genetically encoded differences, in contrast to our somewhat informal definition of strain differences in the previous chapter. Recombinant Inbred (RI) panels of strains, derived from a single cross between two progenitor (parental) strains and bred to homozygosity as distinct strains, are a particularly useful resource in this context [Abiola et al., 2003]. Individuals do not require genotyping as the genetic background is known; they may be pooled to preclude individual, non–genetic expression fluctuations; and, as they are homozygous, only two genotypes are possible, simplifying the detection of linkage.

Our strategy was to dissect genetic influences on gene expression in whole brain, kidney, and liver, by mapping determinants of mRNA levels in these tissues in a Recombinant Inbred mouse panel comprising 31 BxD strains, originally derived from a cross between C57BL/6J (B) and DBA/2J (D), the strains analysed in the preceding chapter. We used a common reference design on a two–colour microarray platform, comparing each BxD strain to a C57BL/6J reference sample of ten individuals. We pooled equal amounts of total RNA from three individuals of each BxD strain to avoid changes in mRNA levels of individual origin (details are provided in the Materials and Methods section at the end of this chapter).

4.2.1 Moderated linkage statistics

We tested for linkage against a dense genetic map of 652 markers, covering all autosomes and the X chromosome. Linkage is generally assessed by asking whether the trait values, grouped by genotype at a particular locus, exhibit significant mean differences. Our population is limited to two possible genotypes, B and D, so that a two–sample Student’s t–test may be employed as a linkage statistic. We found significance assessment problematic: as many expression values tend towards zero, the variance estimates used in the denominator of a standard t–test also tend towards zero, producing greatly inflated statistics, with magnitudes in the hundreds being common. These statistics would give exceptionally high significances based on standard t distributions, but the underlying data clearly do not support the conclusion of mRNA level differences [data not shown].

We addressed this problem in the first place by assessing significance empirically, by bootstrapping [Efron and Tibshirani, 1993] each calculated statistic. Bootstrapping involves randomly resampling data (with replacement) and recalculating statistics; significance is then assessed as the proportion of resampled statistics having greater magnitude than the observed statistic. The fewer these are, the more unlikely the observed statistic is, and hence the greater its significance. This approach is extremely robust, as no distributional assumptions are made about the data; however, the empirical p–values obtained are limited by the number of resamples performed. In the context of genetical genomics, this limitation becomes important, as thousands of expression phenotypes are being tested against hundreds of markers. The multiple testing problem implicit in such numbers of comparisons requires very stringent significance thresholding, which in turn requires large numbers of resamples, making the problem computationally intractable. For example, in order to resolve point–wise p values of $10^{-4}$ empirically, we must resample each statistic 10,000 times. Doing so for 12,000 genes each at 652 markers takes approximately 24 hrs on a five–node compute cluster using highly optimised code. This translates to genome–wise p = 0.0652, at which we would expect > 1400 false positives. As computation time scales approximately linearly, increasing resolution soon becomes impractical.

[Figure 4.1: genome–wide linkage scan for a single gene; x–axis: chromosome (1 to 19 and X), y–axis: −log10 P.]

Figure 4.1: Empirical (solid) vs. nominal (dotted) LOD scores (−log10 P) for a gene displaying significant linkage. Empirical p–values are from a t–test bootstrapped 10,000 times, and nominal P–values from a moderated F–statistic. Note that empirical p–values cannot resolve the linkage to Mmu 1 beyond LOD = 4, due to resampling limitations.
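The resolution limit of empirical p-values can be made concrete with a small sketch. The resampling scheme below (drawing both genotype groups with replacement from the pooled trait values, then counting resampled statistics at least as large in magnitude as the observed one) is an assumption made for illustration; the exact scheme used for the analysis is not specified here. The trait values, function names and the default of 1,000 resamples are likewise hypothetical.

```python
import math
import random
from statistics import mean, variance


def t_statistic(group_b, group_d):
    """Two-sample Student's t with pooled variance."""
    n_b, n_d = len(group_b), len(group_d)
    pooled = ((n_b - 1) * variance(group_b) + (n_d - 1) * variance(group_d)) / (
        n_b + n_d - 2
    )
    diff = mean(group_b) - mean(group_d)
    if pooled == 0.0:  # degenerate resample: no spread within either group
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(pooled * (1 / n_b + 1 / n_d))


def bootstrap_p_value(group_b, group_d, n_resamples=1000, seed=0):
    """Proportion of resampled |t| values at least as large as the observed |t|."""
    rng = random.Random(seed)
    observed = abs(t_statistic(group_b, group_d))
    pooled_values = list(group_b) + list(group_d)
    hits = 0
    for _ in range(n_resamples):
        resampled_b = rng.choices(pooled_values, k=len(group_b))
        resampled_d = rng.choices(pooled_values, k=len(group_d))
        if abs(t_statistic(resampled_b, resampled_d)) >= observed:
            hits += 1
    return hits / n_resamples


def genome_wise_p(point_wise_p, n_markers):
    """Point-wise threshold scaled by the number of markers tested."""
    return min(1.0, point_wise_p * n_markers)


# Hypothetical trait values for strains carrying the B or D allele at one marker.
b_allele = [1.0, 1.1, 0.9, 1.2, 0.8, 1.05, 0.95, 1.0]
d_allele = [3.0, 3.1, 2.9, 3.2, 2.8, 3.05, 2.95, 3.0]

p_empirical = bootstrap_p_value(b_allele, d_allele)
print(p_empirical, genome_wise_p(1e-4, 652))
```

With n resamples the smallest non-zero p-value that can be reported is 1/n, so each extra decimal place of resolution costs ten times the computation.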

We therefore decided to use moderated F–statistics [Smyth, 2004] to assess linkage. These rely on stabilising variance estimates to counteract the inflationary effects of considering small values. The approach adopted by Smyth [2004] builds on the Bayesian approach to estimating variance proposed by Lönnstedt and Speed [2002], but recasts the posterior odds statistic (the B–statistic used in the previous chapter) as a moderated t–statistic using the posterior residual standard deviations. This approach can be generalised to moderated F–statistics: Smyth [2004] employs it in the context of generalised linear models to make contrasts between groups of expression values specified as parameters to a model [Smyth et al., 2004]. This allows analysis of a common reference design experiment such as ours, where the samples are not directly compared on each slide. Significance may then be assessed from a standard F distribution, since the variance inflation phenomenon has been controlled. These nominal P–values are not limited in resolution, allowing us to adjust for multiple testing. A comparison of empirical p–values obtained with the bootstrapped t–test and nominal P–values obtained with the moderated F–statistic shows very high correlation (> 0.95): visual inspection shows that linkage scans from the two methods are highly superimposable (Figure 4.1). We are therefore confident that using moderated F–statistics to assess linkage is a fast (approximately 6 hrs on a desktop computer) and accurate solution.

We believe that our results are generally robust to the kind of artefactual linkage that arises from normalisation failure (Chapter 2, section 4), as our normalisation procedures are tailored for the dataset. However, it is still possible that poor quality data may give rise to artefactual linkage: a set of poor hybridisations, for example, might give markedly different expression measurements for some genes for reasons of data quality rather than biology. To test this possibility, we removed data from strains with least correlation to all other strains (median correlation coefficients < 0.5) and recalculated linkage statistics for the subsets of 27, 28, and 26 remaining strains in brain, kidney and liver, respectively. We obtained essentially identical linkage significances for 200 randomly selected genes in each tissue, suggesting that data quality is not a significant driver of results [data not shown].
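The variance stabilisation behind the moderated statistics can be sketched for the two-genotype case, where the moderated F-statistic is simply the square of a moderated t-statistic. The sketch shrinks the pooled sample variance towards a prior value using the posterior variance of Smyth [2004], (d0 s0^2 + d s^2) / (d0 + d). In the real method the prior degrees of freedom d0 and prior variance s0^2 are estimated from all genes; here they are supplied as hypothetical numbers, and the function names are illustrative.

```python
import math
from statistics import mean, variance


def pooled_variance(group_b, group_d):
    """Pooled sample variance and its residual degrees of freedom."""
    n_b, n_d = len(group_b), len(group_d)
    df = n_b + n_d - 2
    s2 = ((n_b - 1) * variance(group_b) + (n_d - 1) * variance(group_d)) / df
    return s2, df


def moderated_variance(s2, df, prior_s2, prior_df):
    """Posterior variance: a weighted average of prior and sample variances."""
    return (prior_df * prior_s2 + df * s2) / (prior_df + df)


def moderated_t(group_b, group_d, prior_s2, prior_df):
    """Two-group moderated t-statistic; its square is the moderated F."""
    s2, df = pooled_variance(group_b, group_d)
    shrunk = moderated_variance(s2, df, prior_s2, prior_df)
    scale = math.sqrt(shrunk * (1 / len(group_b) + 1 / len(group_d)))
    return (mean(group_b) - mean(group_d)) / scale


# A gene with no within-group spread: the ordinary t-statistic is undefined
# (zero denominator), while the moderated version stays finite.
b_allele = [1.0, 1.0, 1.0]
d_allele = [0.0, 0.0, 0.0]
t_mod = moderated_t(b_allele, d_allele, prior_s2=0.04, prior_df=4)
print(t_mod, t_mod**2)
```

The moderated statistic is referred to a distribution with d0 + d degrees of freedom, so borrowing information across genes also increases the effective degrees of freedom available for each test.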

4.3 Independent tissue analysis

We began by asking which genes were reliably detected in each tissue, by using the intensities of negative controls present on our arrays as minimum thresholds as described in the previous chapter. As we are treating each tissue independently in this analysis, we did not require that transcripts be detected in all tissues (Table 4.1, second column). As reported in section 3 of the previous chapter, more transcripts are detected in brain than kidney or liver, reflecting the greater transcriptional complexity of that tissue. The number of transcripts detected in the latter two tissues is much smaller than that reported previously (Table 3.1): this appears to be due both to the large number of distinct biological samples used here, compared to the technical replicates of pooled samples in the previous chapter, and the lack of replicate hybridisations per data point.

Tissue    Detected    Influenced    Cis    Trans    Both
Brain     12025       1089          343    725      21
Kidney    3802        226           173    52       1
Liver     3276        165           101    59       5

Table 4.1: Genetic influences on gene expression levels in three mouse tissues.

We then asked which transcripts had mappable genetic determinants, and whether these were in cis, in trans, or both (Table 4.1). We define cis–acting influences as those mapping to the marker closest to the transcript (the average spacing is one marker every 4 Mb), and trans influences as those at least 10 Mb away or on a different chromosome. Surprisingly, the proportions of influences differ between brain on the one hand, and kidney and liver on the other. The former exhibits a wealth of trans effects (69% of linkages) compared to the latter two tissues (23% and 39%, respectively). These results cumulatively provide further evidence that transcript levels are under genetic influence across multiple tissues, and that the variants underlying this influence may reside throughout the genome.
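As a quick arithmetic check, the percentages quoted above appear to correspond to the share of influenced transcripts with a trans component, that is (Trans + Both) / Influenced in Table 4.1. This reading is our assumption rather than something the text states; the short Python sketch below simply recomputes the shares from the tabulated counts.

```python
# Counts copied from Table 4.1: (cis only, trans only, both).
TABLE_4_1 = {
    "Brain": (343, 725, 21),
    "Kidney": (173, 52, 1),
    "Liver": (101, 59, 5),
}


def trans_share(cis, trans, both):
    """Fraction of influenced transcripts with a trans component.

    ASSUMPTION: transcripts in the 'Both' column are counted as trans.
    """
    return (trans + both) / (cis + trans + both)


for tissue, counts in TABLE_4_1.items():
    print(f"{tissue}: {100 * trans_share(*counts):.0f}% trans")
```

Under this assumption the sketch gives 69%, 23%, and 39% for brain, kidney, and liver, matching the figures in the text.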

4.3.1 Linkage complexity

The diversity of genetic effects suggested by the above results, particularly the presence of both cis and trans influences on some genes (Table 4.1, sixth column), led us to investigate multiple effects further. We tabulated the number of transcripts under one or more genetic influences, and found that, whilst the majority exhibit a single significant linkage, a substantial fraction exhibit two or more, with up to five distinct genetic determinants being resolved in kidney, and eight in brain and liver (Table 4.2; all significant linkage results are tabulated in Appendix A). There is a curious trend in these results: there appears to be increasing complexity in genetic influences on regulation commensurate with the complexity of cell types found in each tissue. Put another way, it would appear that simple tissues have simple regulatory mechanisms, and complex ones more complicated ones. However, the observation is more accurately explained by the heterogeneity of cell populations included in each tissue. Regulation need be no more complex in any one cell derived from brain tissue than it is in a cell derived from kidney: there are simply more cell types in brain than kidney, and hence proportionally more genetic influences on regulatory mechanisms are being detected. The presence of multiple linkages does, however, hint at the complexity of the effects of genetic variation on gene regulation; this is particularly notable given we are comparing only two genetic backgrounds.

Linkages   Brain   Kidney   Liver
1            717      205     141
2            258       11      11
3             75        8       7
4             19        1       2
5              8        1       1
6              7        0       1
7              4        0       1
8              1        0       1
Total       1089      226     165

Table 4.2: Number of transcripts exhibiting one or more significant linkages in each tissue.

4.4 Expression level correlation analysis detects biological themes under genetic influence

Genetical genomics experiments are designed to generate, rather than test, hypotheses, and therefore the expectation is that any transcripts showing significant linkage are of potential interest. Biological interpretation of these results is, however, often the limiting factor. Given practical constraints prohibiting follow–up on all mappable transcripts (over 2000 in our case), some level of interpretation is required in order to make general inferences, and prioritise candidates for further study. Several strategies may be used: ranking by linkage or association strength and/or complexity; manual curation and intuitive hypothesis ranking; prior biological hypotheses of particular interest to investigators; incorporation of external annotations or sequence information; and experimental validation. For example, Hubner et al. [2005] overlaid cis–acting expression QTLs in rat fat and kidney with human QTLs for blood pressure and hypertension, to find 73 candidate genes for the human QTLs which were their primary research interest. Using a validation strategy, Doss et al. [2005] find that 64% of cis–acting eQTLs detected in an F2 cross between C57Bl/6J and DBA/2J mice [Schadt et al., 2003] replicate in an F1 cross; and Pierce et al. [2006] use F2 crosses to validate 70% of cis, but no trans, effects discovered in a BXD RI panel.

Here, we adopt a strategy similar to that described by ?, where we define putative co–regulated sets of genes based on similarity of expression across RI strains. As we are interested in differential genetic influences on such regulons, we define them as the sets of genes whose expression correlates with that of each mappable transcript (i.e., those in Table 4.1). We then annotate these clusters using the GO term over–representation technique described in Chapter 3, section 4, to determine the potential functional significance of differential regulation.

4.4.1 Correlation analysis of genetically variant expression levels identifies biological pathways

We have defined the members of each cluster as the 5% of detectable transcripts most correlated to the seed transcript of the cluster. This process has two consequences: it obviates the need for a hard correlation threshold, whose definition may be problematic; and it makes each cluster comparable in size, making over–representation analyses more comparable. Of the 1089, 226, and 165 clusters defined in brain, kidney, and liver respectively, we find 24, 16, and 31 have over–represented GO terms associated with them (presented in full in Appendix B). Several biological themes emerge in each tissue: transcriptional regulation, ATP binding, and signal transduction in brain; transcriptional regulation, cellular organisation, and ion binding in kidney; and transcriptional regulation, signal transduction, and metabolism in liver. Strikingly, the regulation of transcription is a common theme across all three tissues, echoing the results of a similar analysis in the parental strains (Chapter 3, section 3). We note that the transcriptional machinery appears to be the target, rather than the source, of variation between the RI strains, which suggests that the majority of genetic variation resides elsewhere in the functional landscape of the genome, consistent with our previous findings (Chapter 3).

Seed        CoM†   P‡               Seed        CoM†   P‡
NM_009475   284    < 1 x 10^-16     AK008491    240    < 1 x 10^-16
AK018586     45    < 1 x 10^-16     NM_019960   239    < 1 x 10^-16
AF272844    263    < 1 x 10^-16     NM_017390    31    < 1 x 10^-16
NM_025623    33    < 1 x 10^-16     D29939      185    < 1 x 10^-16
BC011420    256    < 1 x 10^-16     AF154571    275    < 1 x 10^-16
NM_010371   251    < 1 x 10^-16     NM_025911   254    < 1 x 10^-16
AJ279846    263    < 1 x 10^-16     D82866       34    < 1 x 10^-16
AK011831    106    < 1 x 10^-16     AK008614    238    < 1 x 10^-16
NM_016806   220    < 1 x 10^-16     NM_029565   180    < 1 x 10^-16
AK016990    273    < 1 x 10^-16     AK009532    274    < 1 x 10^-16
AK013835    257    < 1 x 10^-16     AK015300     37    < 1 x 10^-16
NM_019976   262    < 1 x 10^-16     L07051       32    < 1 x 10^-16

Table 4.3: Significant aggregation of linkages in correlation clusters of transcripts expressed in brain (mean cluster size of 600 transcripts). All clusters with associated over–represented GO terms show marked linkage aggregation, suggesting genetic influence. † Co–incident mappings: largest number of linkages to a single marker. ‡ Assessed by a χ² test, one degree of freedom.
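The cluster definition used in Section 4.4.1 (the 5% of detectable transcripts most correlated with a seed transcript) can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the thesis: the function names are hypothetical, and the use of signed Pearson correlation and the exclusion of the seed from its own cluster are assumptions, since the text does not specify either.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = (sum(a * a for a in dx) * sum(b * b for b in dy)) ** 0.5
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0


def correlation_cluster(profiles, seed, fraction=0.05):
    """Return the transcripts most correlated with `seed`, best first.

    profiles: dict mapping transcript ID -> list of expression values, one per strain.
    The cluster size is a fixed fraction of all detectable transcripts,
    so no correlation threshold has to be chosen.
    """
    size = max(1, int(fraction * len(profiles)))
    scores = {
        name: pearson(profiles[seed], values)
        for name, values in profiles.items()
        if name != seed  # ASSUMPTION: the seed is not counted as a member
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:size]
```

With 3802 detectable transcripts in kidney, a 5% fraction gives clusters of 190 members, in line with the mean cluster size quoted for that tissue.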

4.4.2 Correlated clusters have common genetic determinants

Our clustering analysis does not explicitly address genetic determinants of clusters, but merely suggests that, as the members' expression levels are correlated, they may be co–regulated. We therefore wished to ask whether clusters with over–represented GO terms had indications of common genetic influence between the members, by looking for coincident linkages within the clusters. As the majority of cluster members do not have significant linkages at the genome–wide level, we used the best linkage, irrespective of strength, as indicative of a possible genetic effect.

Seed        CoM†   P‡               Seed        CoM†   P‡
NM_009466    16    < 1 x 10^-16     NM_009574    40    < 1 x 10^-16
NM_010371    20    < 1 x 10^-16     AK005218     49    < 1 x 10^-16
NM_009093    11    < 1 x 10^-16     NM_026612     6    5.73 x 10^-7
AK003956     24    < 1 x 10^-16     NM_025797    16    < 1 x 10^-16
NM_025699    33    < 1 x 10^-16     AF176530     37    < 1 x 10^-16
AK018430     12    < 1 x 10^-16     NM_008218    11    < 1 x 10^-16
NM_008303    12    < 1 x 10^-16     NM_007409    12    < 1 x 10^-16
AK014238     18    < 1 x 10^-16     BC009153     16    < 1 x 10^-16

Table 4.4: Significant aggregation of linkages in correlation clusters of transcripts expressed in kidney (mean cluster size of 190 transcripts). All clusters with associated over–represented GO terms show marked linkage aggregation, suggesting genetic influence. † Largest number of linkages to a single marker. ‡ Assessed by a χ² test, one degree of freedom.

We present the largest accretions of coincident linkage within each cluster in Tables 4.3, 4.4, and 4.5. These results show that all clusters with over–represented GO terms (i.e. those detailed in Appendix B) exhibit significantly coincident linkage in brain (24 clusters), kidney (16), and liver (31). Furthermore, secondary accretions of linkage are observed in many clusters, and these are also significant by the same metric [data not shown]. We note that, although the individual linkages themselves are generally not statistically significant, the overlap between such "best hits" is striking, and we interpret this as suggestive of common genetic determinants of transcript levels and hence evidence of putative co–regulation patterns.
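The "co–incident mappings" count reported in Tables 4.3 to 4.5 can be sketched as follows. The first function follows the definition in the table footnotes (the largest number of best linkages falling on a single marker). The second part is only one plausible form of a one–degree–of–freedom χ² test: the thesis does not state the null model, so the uniform null used here is an assumption, the function names are hypothetical, and the sketch is not expected to reproduce the tabulated P values.

```python
import math
from collections import Counter


def coincident_mappings(best_marker):
    """Largest number of cluster members whose best linkage is to one marker.

    best_marker: dict mapping transcript ID -> marker giving its best linkage.
    Returns (marker, count).
    """
    return Counter(best_marker.values()).most_common(1)[0]


def chi2_sf_1df(stat):
    """Upper-tail probability of a chi-squared variate with one degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def aggregation_test(count, n_members, n_markers):
    """Chi-squared statistic (1 df) and P value for `count` best hits at one marker.

    ASSUMPTION: under the null, a best hit is equally likely at any marker,
    so the two cells compared are 'this marker' versus 'all other markers'.
    """
    expected = n_members / n_markers
    other_obs = n_members - count
    other_exp = n_members - expected
    stat = (count - expected) ** 2 / expected + (other_obs - other_exp) ** 2 / other_exp
    return stat, chi2_sf_1df(stat)
```

For example, `aggregation_test(10, 100, 50)` compares 10 coincident best hits against the 2 expected if 100 cluster members were spread evenly over 50 markers.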

4.5 Discussion

In this chapter, we have presented a high–level overview of the genetic landscape of gene expression in brain, kidney, and liver. We have shown that many transcript levels appear to have genetic determinants, that these are genetically complex, and that certain biological functionalities appear to be particularly susceptible to genetic influence. Our analyses also suggest that incorporating the correlation structure of transcript levels across samples enhances detection of co–regulated clusters of genes under genetic influence.

Seed        CoM†   P‡               Seed        CoM†   P‡
AB010331     19    < 1 x 10^-16     NM_021384    16    < 1 x 10^-16
Z12419       23    < 1 x 10^-16     AK004313     14    < 1 x 10^-16
NM_008317    12    < 1 x 10^-16     AK007274     34    < 1 x 10^-16
AK016223     19    < 1 x 10^-16     NM_010739    16    < 1 x 10^-16
AF179403     16    < 1 x 10^-16     NM_008953     9    1.22 x 10^-15
AK005665     19    < 1 x 10^-16     AK017880     24    < 1 x 10^-16
AK011390      9    1.22 x 10^-15    AK019620     19    < 1 x 10^-16
AK007715     16    < 1 x 10^-16     NM_008280    16    < 1 x 10^-16
NM_021535    19    < 1 x 10^-16     NM_008183     7    1.97 x 10^-9
AK003192     15    < 1 x 10^-16     NM_021877    16    < 1 x 10^-16
AK013045     12    < 1 x 10^-16     NM_008946    29    < 1 x 10^-16
BC008111     35    < 1 x 10^-16     AK016507     10    < 1 x 10^-16
M28684       26    < 1 x 10^-16     AK006223     28    < 1 x 10^-16
AK005861     17    < 1 x 10^-16     NM_010402    14    < 1 x 10^-16
NM_030724    10    < 1 x 10^-16     NM_020600    19    < 1 x 10^-16
NM_019871    14    < 1 x 10^-16

Table 4.5: Significant aggregation of linkages in correlation clusters of transcripts expressed in liver (mean cluster size of 163 transcripts). All clusters with associated over–represented GO terms show marked linkage aggregation, suggesting genetic influence. † Largest number of linkages to a single marker. ‡ Assessed by a χ² test, one degree of freedom.

We note that one of the more subtle limitations of array technology is that only relatively highly expressed transcripts are considered; however, failure to detect a transcript does not imply that it is not expressed: it merely cannot be detected by our technology. This resolution limit is the compromise made in assaying the transcriptome (nearly) as a whole, rather than individually assaying each transcript. A seldom discussed corollary to this limit is that a particular class of biological functionalities may be selectively ignored: transcripts encoding rate–limiting molecules which are generally expressed at low levels will almost invariably not be detected.

The finding that the transcriptional machinery and signal propagation apparatus are preferentially under genetic influence is particularly striking: this suggests to us that the majority of genetic changes to the transcriptional landscape do not reside in cellular components directly involved in transcription. Consequently, variants throughout the genome may act on the generic transcriptional machinery to induce changes to transcript levels either directly or indirectly. This observation in turn suggests that transcript level variation may be a generic way of producing changes between individuals which lead to phenotypic differences. If this is the case, a further axis of variation may be the tissue–specificity of this variation, as seen in the progenitor strains (Chapter 3). We investigate this possibility in the next chapter.

4.6 Materials and Methods

4.6.1 RNA preparation

Three eight–week old males from Mus musculus BXD/TyJ strains 1, 2, 5, 6, 8, 9, 11–16, 18–24, 27–34, 36, 38, 39, 40, and 42 were obtained from the Jackson Laboratories (Bar Harbor, ME, USA). Whole brain, kidney and liver tissues were harvested according to protocols approved by the University of New South Wales Animal Care and Ethics Committee (Ethics Code ACEC 01/43), and snap frozen in liquid N2. Total RNA was extracted according to the manufacturer's instructions with Qiagen RNEasy RNA extraction systems (Qiagen, Doncaster, Vic, Australia). Purity was assessed by OD260/OD280 readings greater than 2; integrity was assessed by visual inspection for intact rRNA bands after electrophoresis on 1% agarose–TAE gels containing 1 µg/100 ml ethidium bromide. Equal amounts of total RNA from each strain were mixed to give tissue pools representative of the genetic backgrounds. A common reference sample was created for each tissue from total RNA extracted from ten eight–week–old male C57Bl/6J mice obtained from the Biological Resources Centre, UNSW (Sydney, Australia), as described above. 20 µg of pooled RNA and 2 µg of Lucidea Universal Scorecard Spike-in (Amersham Biosciences, Castle Hill, NSW, Australia) were reverse transcribed using the SuperScript III Indirect cDNA Labeling System (Invitrogen, Mt. Waverley, Vic, Australia) and fluorescently labeled with Alexa Fluor 555 for C57BL/6J and Alexa Fluor 647 for BxD strain samples (Invitrogen, Mt. Waverley, Vic, Australia).

4.6.2 Microarray hybridisation and washing

Each sample:reference cDNA pair was hybridised to glass slide two-color microarrays containing the Compugen Mouse OligoLibrary representing 21,997 genes and Lucidea Universal ScoreCard (Clive and Vera Ramaciotti Center for Gene Function Analysis, UNSW, Sydney, Australia), in 100 µl of DIGEasy buffer (Roche, Basel, Switzerland) with 5 µl each of yeast tRNA and calf thymus DNA as blockers (Invitrogen, Mt. Waverley, Vic, Australia). Utility controls from the Lucidea Scorecard were not used, and therefore served as additional negative controls. Hybridized microarrays were washed in 1xSSC, three times in 1xSSC, 0.1% SDS at 50 °C, and three times in 1xSSC, dried by centrifugation, and scanned with the GenePix 4000B microarray scanner (Axon Instruments, Union City, CA, USA).

4.6.3 Data processing

Images were analysed with the Spot image analysis suite, v. 2 (CSIRO, Australia, http://experimental.act.cmis.csiro.au/Spot/index.php). All further data processing and statistical analyses were performed using R version 2.0.0 or later [R Development Core Team, 2005]. Gene expression data were morph background corrected and log2 transformed. Data for controls and the 232 replicated spots of the housekeeping gene Gapdh (NM_008084) were removed prior to normalization to avoid bias. All slides were then normalized for intensity, spatial and deposition bias using the additive model method described in Chapter 2; the 31 slides in each tissue group were then quantile adjusted to allow for the differing scale of measurements across arrays [Yang et al., 2002]. Log2 ratios of intensities, or M values, were subsequently used as expression "phenotypes" for linkage analysis.


4.6.4 Linkage analysis

A genetic map of 652 informative mouse markers spanning all autosomes and the X chromosome was compiled from publicly available information on the BXD strains at http://www.nervenet.org [Eva Chan, UNSW, pers. comm.]. Markers with missing genotypes were excluded, as were those with redundant Strain Distribution Patterns (genotype strings across the panel; SDPs). Those with less than two of either genotype were considered uninformative, and also excluded. Two approaches were used to assess significant linkage: a two–sample Student's t–test which was resampled 10,000 times for each marker, to generate empirical p values; and an adaptation of the B–statistic [Lonnstedt and Speed, 2002; Smyth, 2004], with significance assessed by nominal P values. For each marker, a Student's t–test was calculated for each gene by separating the 31 expression ratios into two groups by genotype. Expression ratios were then randomly resampled with replacement to create new groups of the same numbers of observations, from which a new t–test was calculated. This process was repeated 15,000 times to give a distribution of permuted t–tests for each gene. The p value is then the proportion of permuted t–tests greater in magnitude than the observed statistic for the gene. These p values are therefore limited to a minimum value of 1/(number of bootstraps). B–statistics were calculated using the implementation in the limma package version 1.8.6 from the Bioconductor project [Gentleman et al., 2004]. For each marker, a contrast matrix was generated according to the distribution of genotypes across the strains, creating two groups of expression values as for the Student's t–test procedure above. Since the B–statistic is essentially a t–test with robust variance estimation, this method is equivalent to the bootstrapped t–test, but does not require permutation to assess significance.
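The resampled t–test described above can be sketched for a single transcript and a single marker as follows. This is an illustrative Python sketch rather than the R code used for the thesis: the function names, the `seed` argument, and the pooled–variance form of the t statistic are assumptions, and the number of resamples is left as a parameter. The floor of 1/(number of resamples) reflects our reading of the statement that p values are limited to that minimum.

```python
import math
import random


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    ss = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
    pooled = ss / (na + nb - 2)
    if pooled == 0.0:
        # Degenerate resample: no within-group variation at all.
        if mean_a == mean_b:
            return 0.0
        return math.copysign(math.inf, mean_a - mean_b)
    return (mean_a - mean_b) / math.sqrt(pooled * (1 / na + 1 / nb))


def resampled_p(values, genotypes, n_resamples=10000, seed=0):
    """Empirical two-sided p value for linkage of one transcript to one marker.

    values: expression ratios (M values), one per strain.
    genotypes: 'B' or 'D' for each strain at the marker.
    """
    group_b = [v for v, g in zip(values, genotypes) if g == "B"]
    group_d = [v for v, g in zip(values, genotypes) if g == "D"]
    observed = abs(student_t(group_b, group_d))
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_resamples):
        # Resample with replacement, keeping the original group sizes.
        new_b = rng.choices(values, k=len(group_b))
        new_d = rng.choices(values, k=len(group_d))
        if abs(student_t(new_b, new_d)) > observed:
            exceed += 1
    # The smallest reportable value is 1 / n_resamples.
    return max(exceed, 1) / n_resamples
```

Because every marker and transcript pair needs its own set of resamples, the cost grows with markers × transcripts × resamples, which is the motivation for the moderated statistic as a faster alternative.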

Chapter 5

Resolvable genetic determinants of mRNA levels are tissue specific


5.1 Introduction

The recent demonstrations of substantial genetic modulations on mRNA levels in yeast [Brem et al., 2002; Fay et al., 2004; Yvert et al., 2003], flies [Wittkopp et al., 2004], teleost fish [Oleksiak et al., 2002, 2005], rodents [Bystrykh et al., 2005; Chesler et al., 2005; Hubner et al., 2005; Schadt et al., 2003] and humans [Monks et al., 2004; Morley et al., 2004; Schadt et al., 2003] provide ample evidence that genetic variation causes quantitative changes to gene regulation. Although gene regulation is known to be highly tissue–specific [Wray et al., 2003], there has been little discussion on the tissue specificity of these effects in multicellular organisms.

Segregating populations offer a powerful method of dissecting genetic determinants of mRNA levels. The ability to definitively identify a genetic component to such molecular traits offers compelling proof of genetically encoded differences, in contrast to our somewhat informal definition of strain differences in the previous chapter. Recombinant Inbred (RI) panels of strains, derived from a single cross between two progenitor (parental) strains and bred to homozygosity as distinct strains, are a particularly useful resource in this context [Abiola et al., 2003]. Individuals do not require genotyping as the genetic background is known; they may be pooled to preclude individual, non–genetic expression fluctuations; and, as they are homozygous, only two genotypes are possible, simplifying the detection of linkage.

Here we use a panel of 31 mouse RI BxD strains derived from two common progenitor strains, C57Bl/6J and DBA/2J, to map genetic determinants of transcript levels, measured with microarrays, in three tissues. We have previously shown (Chapter 3) that these strains exhibit numerous genetic influences on mRNA levels, and that these are overwhelmingly tissue–specific. Here we show that these findings are corroborated; we further report that a significant number of genes display transgressive segregation, indicating that control of their mRNA levels is complex. We identify at least two loci which influence multiple transcripts, both of which are predominantly tissue–specific, and suggest that genetic mapping may provide a mechanism for discovering new functional relationships between genes.

5.2 Experimental design

Our main objective is to investigate the tissue specificity, if any, of genetic influences on gene expression. We use the data presented in the previous chapter from whole brain, kidney, and liver of 31 BXD RI strains, but only consider the genes detected in the progenitor strains (Chapter 3). Our linkage detection strategy remains otherwise unchanged.

5.3 Genetic determinants of genes found genetically influenced in parental strains are tissue specific

We first asked whether transcripts with genetically influenced mRNA levels in the parental strains C57Bl/6J and DBA/2J had genetic determinants of those levels resolvable in the RI panel. In this analysis, we restrict ourselves to the 755 genes identified in Chapter 3, section 3. A small, but significant, proportion (Table 5.1) of these transcripts appear to have resolvable genetic determinants of mRNA levels.

Tissue    G.I. genes   P ≤ 0.05 (FP)   P ≤ 0.01 (FP)   P ≤ 0.001 (FP)
Brain            247         24 (12)          15 (2)            6 (0)
Kidney           667         47 (33)          26 (7)            9 (1)
Liver            411         32 (21)          14 (4)            7 (0)
Total†           755         94               48               20

Table 5.1: Linkage results for the genes with genetically influenced (G.I.) mRNA levels in the parental strains C57Bl/6J and DBA/2J (Chapter 3), at three genome–wise significance levels. It is striking that only a small proportion of genes identified in Chapter 3 have mappable genetic determinants. FP: expected number of false positives. † Unique number of genes in the three analyses.
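The FP entries in Table 5.1 are consistent with the usual expectation for multiple testing: the number of genes tested multiplied by the genome–wise threshold. That interpretation is ours (the text does not give the formula), and the helper name below is hypothetical; the sketch simply recomputes the expectation from the gene counts in the table.

```python
# Number of genetically influenced (G.I.) genes tested per tissue, from Table 5.1.
GENES_TESTED = {"Brain": 247, "Kidney": 667, "Liver": 411}
THRESHOLDS = (0.05, 0.01, 0.001)


def expected_false_positives(n_tests, alpha):
    """Expected number of null tests passing a threshold of `alpha`.

    ASSUMPTION: each gene contributes one test at the genome-wise level.
    """
    return n_tests * alpha


for tissue, n in GENES_TESTED.items():
    rounded = [round(expected_false_positives(n, a)) for a in THRESHOLDS]
    print(tissue, rounded)
```

Rounded to whole genes, this gives 12, 2, 0 for brain, 33, 7, 1 for kidney, and 21, 4, 0 for liver, matching the bracketed values in the table.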


In the parental strains, the majority of these genes appeared to have mRNA levels under tissue–specific genetic influence (Chapter 3, section 3), a result which we found not to be an artefact of poor power. To corroborate this finding, we asked whether genetic determinants of mRNA levels for these genes were also tissue–specific. The results, presented in Figure 5.1, strongly support the hypothesis that genetic influences on mRNA levels are tissue–specific. Only a single transcript is influenced in all three tissues (see also Figure 5.4), with the determinant localising to within 3.5 Mb of the gene, suggesting an effect in cis.

[Figure: three Venn diagrams, one per genome–wise threshold (P < 0.05, P < 0.01, P < 0.001), showing the overlap between Brain (B), Kidney (K), and Liver (L); the panels contain 94, 48, and 20 transcripts in total, respectively.]

Figure 5.1: overlap of linkage results for the transcripts presented in Table 5.1. The vast majority of transcripts only have mRNA level determinants in one of the three tissues: Brain, Kidney, or Liver.

A further question is whether these determinants act predominantly in cis or in trans. Table 5.2 suggests that the majority of effects occur in trans, defined here as linkage to a locus more than 5 Mb away from the gene's location. Varying this definition does not alter these figures substantially, as the majority of determinants reside on different chromosomes to their targets. The majority of cis–acting determinants are detectable at the most stringent threshold of genome–wise P ≤ 0.001, consistent with the notion that these may be monogenic effects, as opposed to the potential complexity of those in trans. Other reports also observe this finding: for example, Hubner et al. [2005] find that cis–acting determinants in a rat RI panel give higher significances than those acting in trans.

As the analyses of the parental strains and the RI strains comprising the mapping panel rest on independent experiments, these results support the conclusions of Chapter 3, that genetic influence on gene expression is predominantly tissue–specific. The small number of genetic determinants (< 10% at P ≤ 0.05) suggests that genetic influence is complex, so that each contributing locus only accounts for a small proportion of the final expression phenotype, although this could also be a result of the modest size of the RI panel. This is consistent with previous reports of small proportions of genes with highly heritable mRNA levels having resolvable genetic determinants in segregating populations (see Introduction, Table 1). It is particularly congruent with the simulation studies of Brem and Kruglyak [2005], who found that the majority of expression phenotypes in a haploid yeast population derived from a cross between laboratory and wild strains are best explained by models containing more than five contributing loci. However, we are formally unable to rule out the possibility that some of the genes for which we are unable to map determinants have mRNA level differences between strains which are of non–genetic origin. Irrespective of this objection, as the majority of determinants of mRNA levels for genes expressed in all three tissues are only observable in a single tissue, we conclude that the great majority of genetic influences on mRNA levels are tissue–specific.

           P ≤ 0.05              P ≤ 0.01              P ≤ 0.001
Tissue     cis   trans   Total   cis   trans   Total   cis   trans   Total
Brain        3      21      24     2      13      15     2       4       6
Kidney      11      33      47    11      14      26     6       3       9
Liver        4      25      32     3       9      14     2       4       7

Table 5.2: numbers of determinants of mRNA levels acting in cis and in trans. Effects are in cis if they map to a locus within 5 Mb of the transcript genomic location. The majority of effects are in trans, but those in cis tend to be stronger (give lower P–values) and are therefore more reliably detected at more stringent thresholds. The location of some transcripts is ambiguous, so not all determinants can be classified as either in cis or in trans.
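The cis/trans rule used for Table 5.2 can be written down compactly. The Python sketch below is a minimal illustration under stated assumptions: the function name and argument layout are hypothetical, positions are taken to be in Mb on a common genome build, and a transcript with an ambiguous location is represented by `None`.

```python
CIS_WINDOW_MB = 5.0  # window used in this chapter; Chapter 4 used a different rule


def classify_determinant(gene_chrom, gene_mb, marker_chrom, marker_mb,
                         window_mb=CIS_WINDOW_MB):
    """Classify a linkage as 'cis', 'trans', or 'unclassified'.

    A determinant is cis if the marker lies on the same chromosome as the
    transcript and within `window_mb` of it; otherwise it is trans.
    Transcripts without an unambiguous location cannot be classified.
    """
    if gene_chrom is None or gene_mb is None:
        return "unclassified"
    if gene_chrom == marker_chrom and abs(gene_mb - marker_mb) <= window_mb:
        return "cis"
    return "trans"
```

The "unclassified" outcome is why the cis and trans columns of the table do not always sum to the Total column (for example, 11 + 33 against a total of 47 in kidney at P ≤ 0.05).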


5.4 Transgressive segregation of mRNA levels

Several genetical genomics studies (reviewed in the Introduction, section 3) have reported that mRNA levels which are invariant in parental genotypes can vary in a segregating population, and that genetic determinants of this variation are resolvable [e.g. Brem and Kruglyak, 2005; Chesler et al., 2005; Montooth et al., 2003]. This observation has been ascribed to the phenomenon of transgression, where additive QTLs have opposite effects, thus cancelling out the phenotype difference in the parental genotypes. Reassortment of alleles in recombinants, however, breaks this balance, so that the phenotypes diverge in recombinant offspring.

To investigate the occurrence of transgressive segregation in the BxD RI panel, we sought genetic linkage for expressed transcripts (i.e., those with intensities higher than negative controls, as defined in Chapter 3) which are not genetically variant in the parental strains. We first considered each tissue separately (Table 5.3, top panel), and then restricted our analysis to the 1870 genes expressed in all three tissues (Table 5.3, bottom panel). From these results, it appears that transgressive segregation is limited to brain, with perhaps a few instances in liver.

Tissue    Expressed genes   P ≤ 0.05 (FP)   P ≤ 0.01 (FP)   P ≤ 0.001 (FP)
Top panel: genes expressed in each tissue
Brain               11846       877 (592)       327 (118)          67 (12)
Kidney               3512        52 (176)          9 (35)            1 (4)
Liver                3086        79 (154)         23 (31)            7 (3)
Bottom panel: genes expressed in all three tissues
Brain                1870        189 (93)         73 (19)            8 (2)
Kidney               1870         29 (93)          5 (19)            1 (2)
Liver                1870         37 (93)          7 (19)            4 (2)
Total†               1870        244               84               13

Table 5.3: linkage results for genes displaying transgressive segregation, i.e. those not found genetically influenced in parental strains. Top: tissue–wise analysis of reliably detected (expressed) genes, as defined in Chapter 3. Bottom: analysis of genes found expressed in all three tissues. These figures do not include the genes in Table 5.1. FP: expected false positives. † Unique number of genes presented in the three analyses.

Figure 5.2 shows the overlap between the transcripts with genetic determinants identified in Table 5.3. As with genes having genetically influenced mRNA levels in the parental strains (previous section), there is minimal overlap between tissues. This result further supports the notion that the overwhelming majority of genetic influences on mRNA levels are restricted to a single tissue.

[Figure: three Venn diagrams, one per genome–wise threshold (P < 0.05, P < 0.01, P < 0.001), showing the overlap between Brain (B), Kidney (K), and Liver (L); the panels contain 244, 84, and 13 transcripts in total, respectively.]

Figure 5.2: overlap of transgressively segregating genes in Brain, Kidney, and Liver. As seen for genes with genetically influenced mRNA levels in the parental strains (above), there is essentially no overlap in the identities of genes affected, suggesting that genetic determinants of mRNA levels are tissue–specific.

Much like effects on mRNA levels detectable in the parental strains, transgressive genetic effects generally act in trans to the genes they affect (Table 5.4). This is unsurprising, as, by definition, they act in concert with at least one other trait determinant. This implies that one of the determinants will have to reside elsewhere in the genome to the affected gene(s). The lack of credible effects in cis suggests that transgression is occurring, rather than being the result of failure to detect an actual difference in the parental strains due to low statistical power. This result reinforces the previous suggestion that genetic influences on mRNA levels are complex. The finding that transgression is more common in the brain is intriguing: it may reflect the relative complexity of the tissue, both in terms of cell–type populations and distinct transcriptional profiles [e.g. Sandberg et al., 2000]; or it may be a consequence of the divergence of the two parental strains since their inception.


           P ≤ 0.05              P ≤ 0.01              P ≤ 0.001
Tissue     cis   trans   Total   cis   trans   Total   cis   trans   Total
Top panel: transcripts expressed in each tissue
Brain        6     807     877     1     304     327     0      64      67
Kidney       1      46      52     0       9       9     0       1       1
Liver        2      71      79     0      23      23     0       7       7
Bottom panel: transcripts expressed in all tissues
Brain        1     174     189     1      66      73     0       7       8
Kidney       0      27      29     0       5       5     0       1       1
Liver        0      33      37     0       7       7     0       4       4

Table 5.4: numbers of determinants of transgressively segregating mRNA levels acting in cis and in trans. Effects are in cis if they map to a locus within 5 Mb of the transcript genomic location. Top panel: segregating transcripts expressed in each tissue. Bottom panel: segregating transcripts expressed in all tissues (see Table 5.3). The majority of effects are in trans; unlike those in Table 5.2, effects in cis do not tend to be stronger. The location of some transcripts is ambiguous, so not all determinants can be classified as either in cis or in trans.

We therefore conclude that mRNA levels are under considerable tissue–specific genetic influence, which appears to be complex. Furthermore, the absence of mRNA level differences between two genetic backgrounds is not necessarily an indication of lack of genetic influences, but may rather be a compensation by determinants with opposite effects. We further speculate that the presence of such phenomena in only a single tissue may be a reflection of the modest size of our segregating population; others of smaller effect may exist in the other tissues.
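The compensation argument can be made concrete with a toy additive model of two loci whose parental alleles act in opposite directions. The Python sketch below is purely illustrative: the locus names and effect sizes are hypothetical and are not estimates from our data.

```python
from itertools import product

# Hypothetical additive effects (log2 units) of the B and D alleles at two loci.
# The parental alleles act in opposite directions at the two loci.
EFFECTS = {
    "locus1": {"B": +0.5, "D": -0.5},
    "locus2": {"B": -0.5, "D": +0.5},
}


def expression(alleles, baseline=0.0):
    """Additive expression level for a homozygous two-locus genotype, e.g. ('B', 'D')."""
    return baseline + sum(
        EFFECTS[locus][allele] for locus, allele in zip(EFFECTS, alleles)
    )


for alleles in product("BD", repeat=2):
    kind = "parental" if alleles[0] == alleles[1] else "recombinant"
    print(alleles, kind, expression(alleles))
```

Both parental genotypes (B at both loci, or D at both loci) give the baseline level, so no strain difference is visible, whereas the two recombinant classes sit one unit above and one unit below it and therefore segregate in the RI panel.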

5.5 Some loci influence multiple transcript mRNA levels

We next wished to investigate the properties of the loci exerting these influences. We are particularly interested in two categories of loci: those influencing multiple genes (defined here as more than ten at P ≤ 0.05); and those influencing genes in all tissues. The former implies co–regulation of the influenced transcripts (i.e. a "regulon" [Cotsapas et al., 2003]), which may also indicate a functional link between these genes. The latter category represents particularly credible effects, as they have been detected in three independent experiments. As above, we only considered genes expressed in all tissues.

5.5.1 A region on chromosome 1 influences multiple transcripts in brain

Table 5.5 lists all genetic markers to which linkage was obtained for more than ten transcripts at P ≤ 0.05 in brain. No such effects were observed in the other two tissues. A region of approximately 14 Mb on chromosome 1 is particularly startling, exerting a trans influence, in brain only, on more than 200 transcripts at the lowest genome–wise significance, many of which show linkage to several markers in the region. However, the number of transcripts drops precipitously to eleven at P ≤ 0.001, so that many of these effects must be treated with caution.

Marker      Chromosome   Mb        P ≤ 0.05   P ≤ 0.01   P ≤ 0.001
D1Mit128    1            72.957    11         –          –
D1Mit19     1            74.126    12         –          –
D1Mit216    1            80.319    113        49         –
D1Mit134    1            80.781    125        60         11
D1Mit83     1            86.032    22         –          –
D8Mit124    8            14.758    22         –          –
D19Mit28    19           15.176    15         –          –

Table 5.5: genetic markers giving significant linkage to multiple transcripts. This is defined as more than ten linkages at P ≤ 0.05. Note the abrupt decline in numbers of influenced transcripts. Multiple linkages were only observed in brain.
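The tallies in Table 5.5 amount to counting, for each marker, the transcripts whose genome–wide P value falls at or below each threshold, and then keeping markers with more than ten linkages at the least stringent level. A minimal sketch follows, assuming the P values are held in a transcripts-by-markers array; the function names and the array layout are illustrative and not taken from the thesis.

```python
import numpy as np


def count_linkages(pvals, thresholds=(0.05, 0.01, 0.001)):
    """Count linked transcripts per marker at each significance threshold.

    pvals: array of shape (n_transcripts, n_markers) of genome-wide P values
    (assumed layout). Returns {threshold: counts per marker}.
    """
    pvals = np.asarray(pvals, dtype=float)
    return {t: (pvals <= t).sum(axis=0) for t in thresholds}


def multi_transcript_markers(pvals, marker_names, threshold=0.05, min_count=10):
    """Names of markers linked to more than `min_count` transcripts."""
    counts = (np.asarray(pvals, dtype=float) <= threshold).sum(axis=0)
    return [name for name, c in zip(marker_names, counts) if c > min_count]


# Toy data: 12 transcripts, 2 markers (values invented for illustration)
toy = np.full((12, 2), 0.5)
toy[:11, 0] = 0.01          # 11 transcripts link to the first marker
counts = count_linkages(toy)
hits = multi_transcript_markers(toy, ["markerA", "markerB"])
```

Missing P values stored as NaN compare as not significant, so they are simply not counted.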

The two markers displaying most significant linkages are in a region of non–shared origin between the two parental strains (Figure 5.3), as determined by SNP polymorphisms [Pletcher et al., 2004]. This region is approximately 2 Mb in size, and is flanked by 2 Mb and 5 Mb of shared origin sequence with little SNP polymorphism. Examination of this region in the UCSC Genome Browser [03/2005 build, Kent et al., 2002] reveals ten RefSeq genes; of these, serpine2, a serine protease inhibitor (RefSeq ID: NM_009255), is the only one known to be highly expressed in brain but not kidney or liver. It is an extracellular protein involved in follicular development, but originally derived from glia [Bedard et al., 2003]. Interestingly, Chesler et al. [2005] report that a locus centred about the marker Mtap2, itself highly polymorphic between the parental strains, on chromosome 1 at 66.7 Mb gives maximum (but non–significant) linkage for more than 400 transcripts in a similar panel of 32 BxD strains. These results indicate that a variant residing on chromosome 1 is responsible for the modulation of multiple mRNA levels, and that this may indicate a functional link between the genes affected.

Figure 5.3: Numbers of transcripts mapping to chromosome 1 at P ≤ 0.05 (dotted line), P ≤ 0.01 (dashed line), and P ≤ 0.001 (solid line). Gray boxes indicate regions of SNP polymorphism.

We conclude that the region of chromosome 1 between 73 Mb and 86 Mb contains a brain–specific mRNA level effector which influences at least eleven transcripts. Such an effector would not have been detected in analyses of other tissues, confirming our previous finding (Chapter 3, section 3) that the majority of genetic influences on mRNA levels are tissue–specific, and that consideration of a single tissue will dramatically underestimate the total number of regulatory variations between two genetic backgrounds.


5.5.2 A region on chromosome 8 influences transcripts in all tissues

We next looked for genetic markers displaying linkage to transcripts in all three tissues. The five markers satisfying this criterion are listed in Table 5.6, and describe a contiguous locus on chromosome 8. All markers show linkage to a single transcript in all three tissues: AK007639, part of the hypothetical protein Rnf122, residing on chromosome 8 at 29.9 Mb (ring finger protein 122, RefSeq NM_175136.1; Figure 5.4). Interestingly, this gene is differentially expressed between the parental strains in all three tissues.

                                   P ≤ 0.05          P ≤ 0.01          P ≤ 0.001
Marker     Chromosome   Mb         3/3   2/3   1/3   3/3   2/3   1/3   3/3   2/3   1/3
D8Mit289   8            27.627     1     –     6     1     –     4     1     –     1
D8Mit189   8            33.233     1     1     4     1     1     1     1     –     2
D8Mit24    8            33.602     1     1     1     1     –     2     1     –     2
D8Mit294   8            38.471     1     1     5     1     1     1     1     1     1
D8Mit339   8            39.420     1     1     3     1     –     2     1     –     –

Table 5.6: loci affecting transcript levels in multiple tissues. There appears to be a single locus, on chromosome 8, which influences a number of transcripts in at least one tissue. Dashes indicate zero.
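One way to read the 3/3, 2/3 and 1/3 columns is as the number of transcripts linked to a marker in three, two or one of the tissues. This reading is an interpretation rather than something the caption states, but it reproduces the D8Mit289 row at the 0.05 level from the per-tissue entries of Table 5.7. The sketch below tallies such counts for one marker; the function name and input layout are assumptions made for illustration.

```python
from collections import Counter


def tissue_sharing_counts(linked_by_tissue):
    """Tally transcripts by the number of tissues in which they are linked.

    linked_by_tissue: dict mapping tissue name -> set of transcript IDs linked
    to a single marker in that tissue (assumed layout).
    Returns a Counter mapping number of tissues -> number of transcripts.
    """
    tissues_per_transcript = Counter()
    for transcripts in linked_by_tissue.values():
        tissues_per_transcript.update(set(transcripts))
    return Counter(tissues_per_transcript.values())


# Transcripts linked to D8Mit289 at the 0.05 level, read off Table 5.7
d8mit289 = {
    "brain": {"AK007639"},
    "kidney": {"AK007639", "U58670", "NM.010564", "X81716", "AK013716"},
    "liver": {"AK007639", "BC006779", "AK016395"},
}
sharing = tissue_sharing_counts(d8mit289)
```

Here `sharing[3]`, `sharing[2]` and `sharing[1]` correspond to the 3/3, 2/3 and 1/3 columns respectively.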

Multiple other transcripts are influenced from this region in kidney and liver, but not brain (Tables 5.6 and 5.7). As all the genes considered here are expressed in all three tissues, the actions of the locus must be tissue specific, suggesting that it participates in the regulation of the affected genes in some tissues but not others. This is supported by the fact that the locus appears to encode a ring finger protein, which is a class of zinc finger and therefore a possible transcription factor. There is the interesting possibility that AK007639 represents the gene harbouring the causative variation: a cis–acting effect changing its expression level could have “knock–on” effects in trans if it is involved in the regulation of other genes. The transcript localises to a region of chromosome 8 which is highly polymorphic between the parental strains [Figure 5.4 and Pletcher et al., 2004], supporting this possibility. Although speculative, this proposition is strengthened by the independent identification of the locus in different tissues.

            D8Mit289          D8Mit189          D8Mit24           D8Mit294          D8Mit339
            27.6 Mb           33.2 Mb           33.6 Mb           38.5 Mb           39.4 Mb
Gene        B     K     L     B     K     L     B     K     L     B     K     L     B     K     L
NM.009078   –     –     –     –     –     –     –     –     –     –     1.7   –     –     1.3   –
AK007639    3.8   6     5.3   7.7   10.9  12.3  6.7   7.4   9.3   5.7   7     9.7   3.6   5.1   5.8
M73678      –     –     –     –     1.4   –     –     –     –     –     1.4   –     –     –     –

U58670      –     1.5   –     –     –     –     –     –     –     –     1.4   –     –     1.9   –
BC006779    –     –     2.7   –     –     6.9   –     –     4.4   –     –     4.7   –     –     2.6
NM.010564   –     2     –     –     1.5   –     –     –     –     –     –     –     –     –     –
AK019407    –     –     –     –     –     –     –     –     –     –     1.4   –     –     –     –
X81716      –     2     –     –     –     –     –     –     –     –     –     –     –     –     –
AK016395    –     –     3.4   –     2.5   8.7   –     1.8   7.2   –     3.1   4.8   –     1.5   2.3
AB024497    –     –     –     –     1.6   –     –     –     –     –     –     –     –     –     –
AK013716    –     1.5   –     –     –     –     –     –     –     –     –     –     –     –     –

Table 5.7: Genome–wide significance levels (−log10 P) for genes influenced by a region of chromosome 8. Thus, 2 corresponds to P = 0.01, 2.3 to P = 0.005, 3 to P = 0.001, etc. Dashes represent non–significant linkage (P > 0.05). B, K and L denote brain, kidney and liver. The three genes in the top panel reside on chromosome 8 at 19.4, 29.9 and 86.1 Mb, respectively; the transcripts in the bottom panel localise to other chromosomes. AK007639 is the most strongly influenced transcript, probably under cis–acting influence.

Figure 5.4: Linkage scans (−log10 P) for AK007639 in brain (solid line), liver (dashed) and kidney (dotted). The transcript is a RIKEN cDNA originally isolated from C57Bl/6J pancreas, part of the hypothetical protein Rnf122 (ring finger protein 122, RefSeq NM_175136.1). The nucleotide sequence localises to chromosome 8 at 29.93 Mb, indicated by the “T”.

This region (25 – 40 Mb, chromosome 8) is primarily of non–shared origin in the parental strains. We cannot, therefore, exclude any of the approximately sixty genes in this region as candidate effectors, in contrast to the region on chromosome 1 discussed in the previous section. Analyses in other inbred strains with different patterns of shared origin regions may reduce this interval, making candidate selection easier [Wade et al., 2002].

These results lead us to conclude that the chromosome 8 region defined by the markers in Table 5.6 harbours a transcriptional effector which influences transcripts in a highly tissue–specific fashion. The genes under its influence are, by definition, co–regulated, and may therefore share a common biological functionality. The number of genes we find influenced, however, is too small to obtain reliable results with the Gene Ontology term over–representation analysis employed in the previous chapter. We note that, as the effector acts in a tissue–specific way, analysis of a single tissue would under–estimate its impact on mRNA levels.

5.6 Discussion

We have shown that a significant number of mRNA levels have genetic determinants susceptible to genetic dissection in a segregating population. The results presented here confirm our previous findings (Chapter 3), that the majority of these genetic influences are tissue–specific, even when the affected genes are expressed in all tissues analysed. This is consistent with the findings of Hubner et al. [2005], who find only 15% of (mostly in cis) determinants to be common between kidney and adipose tissue in a rat RI panel, and of Chesler et al. [2005] and Bystrykh et al. [2005] who find 50% of cis–acting effects common between forebrain and hematopoietic stem cells.

We identified a large number of genes as genetically influenced in the parental strains (Chapter 3), for which no determinants could be found in any tissue. There could be several genetic or technical reasons for this: our mapping population is very small by canonical QTL analysis standards (31 strains vs. at least several hundred individuals commonly employed); there may be multiple determinants for each expression phenotype, further eroding our power of detection; the microarray technology on which these studies rest may simply not have the requisite resolution to measure small changes in expression levels; and our analysis does not consider purely epistatic determinants which have little marginal effect, recently shown to underlie 14% of yeast mRNA level variations [Storey et al., 2005]. However, we point out that our definition of genetic influence in the parental strains is informal, as we assume that any observed differences in mRNA levels have a genetic component, which is a maximum estimate of genetic influences.

The two loci found to affect multiple transcripts represent two distinct classes of effector: the first is completely tissue–specific, only exerting influence in a single tissue, and the second influences different transcripts in each tissue. These findings are in line with the common perception that gene regulation is predominantly tissue–specific, and that gene expression may be regulated by different mechanisms in each tissue [Wray et al., 2003]. This implies that extrapolation of genetic influence on gene expression across tissues is not possible, confirming our finding in the parental strains. The existence of such loci implies co–regulation of the affected genes, at least in some tissues. We suggest that discovery of such regulons by genetic dissection may provide a powerful discovery method both for regulatory interactions and gene function. We further note that the discovery of two such loci in a small segregating population derived from only two genetic backgrounds suggests that recent efforts to expand such resources [Peirce et al., 2004] and on–going efforts to establish large mapping populations from multiple backgrounds [Churchill et al., 2004] will substantially increase our power to detect quantitative genetic effects on gene expression.

There is presently a discrepancy in the genetical genomics literature as to the existence of “master modulatory loci” [Chesler et al., 2005] which affect the levels of very large numbers of transcripts. Brem et al. [2002] find ten loci in a yeast haploid population influencing between 8 and 87 transcripts each, as did Yvert et al. [2003]; and Schadt et al. [2003] and Chesler et al. [2005] each report several loci influencing hundreds or thousands of transcripts each in mouse populations. In contrast, Morley et al. [2004] report two loci influencing more than seven transcripts, and Monks et al. [2004] detail twelve loci influencing more than three transcripts each, with none affecting more than six. Our own results seem to corroborate the latter findings, with at most several tens of transcripts influenced in at least one tissue by any locus at stringent significance thresholds. We have previously shown (Chapter 2) that failure of normalisation can have dramatic effects on linkage analysis of expression phenotypes, which may be a confounding variable in the above discrepancy. We suggest that our analyses in multiple tissues presented here, the first to be validated on a large scale (albeit in different tissues from the same animals), provide cautionary evidence against the existence of loci affecting hundreds of transcript levels, assuming the discrepancies in findings are not due to limited statistical power arising from small sample size. However, given that our study is equivalent in size to studies reporting such results [Bystrykh et al., 2005; Chesler et al., 2005], we do not believe that lack of power can explain this discrepancy.

These analyses provide further evidence that estimating the extent of regulatory genetic variation will require extensive tissue profiling, as extrapolation between tissues is unacceptably inaccurate. This makes it extremely difficult to directly assess such effects in human populations due to ethical and practical limitations: a new generation of model organism genetic resources will therefore be required to model the human condition, particularly in contexts where quantitative variations in gene regulation are key, such as complex genetic disease.


5.7 Materials and Methods

5.7.1 RNA preparation

5.7.2 Microarray hybridisation and washing

thymus DNA as blockers (Invitrogen, Mt. Waverley, Vic, Australia). Utility controls from the Lucidea Scorecard were not used, and therefore served as additional negative controls. Hybridized microarrays were washed in 1xSSC, three times in 1xSSC, 0.1% SDS at 50 °C, and three times in 1xSSC, dried by centrifugation, and scanned with the GenePix 4000B microarray scanner (Axon Instruments, Union City, CA, USA).

5.7.3 Data processing

5.7.4 Linkage analysis

nominal P values. For each marker, a Student’s t–test was calculated for each gene by separating the 31 expression ratios into two groups by genotype. Expression ratios were then randomly resampled with replacement to create new groups of the same numbers of observations, from which a new t–test was calculated. This process was repeated 15,000 times to give a distribution of permuted t–tests for each gene. The p value is then the proportion of permuted t–tests greater in magnitude than the observed statistic for the gene. These p values are therefore limited to a minimum value of 1/(number of bootstraps).

B–statistics were calculated using the implementation in the limma package version 1.8.6 from the Bioconductor project [Gentleman et al., 2004]. For each marker, a contrast matrix was generated according to the distribution of genotypes across the strains, creating two groups of expression values as for the Student’s t–test procedure above. Since the B–statistic is essentially a t–test with robust variance estimation, this method is equivalent to the bootstrapped t–test, but does not require permutation to assess significance.
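The resampled t–test described above can be sketched for a single gene and marker as follows. This is a reconstruction under stated assumptions, not the code used for the thesis: the pooled–variance form of the statistic, the function names, the fixed random seed, and the way the 1/(number of resamples) floor is applied are all choices made here for illustration.

```python
import numpy as np


def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance (assumed form)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    # Zero variance in a resample gives inf or nan; warnings are silenced here.
    with np.errstate(divide="ignore", invalid="ignore"):
        return (a.mean() - b.mean()) / np.sqrt(pooled_var * (1.0 / na + 1.0 / nb))


def resampled_t_pvalue(ratios, genotypes, n_resamples=15000, seed=0):
    """Proportion of resampled |t| exceeding the observed |t| for one gene."""
    ratios = np.asarray(ratios, dtype=float)
    genotypes = np.asarray(genotypes)
    labels = np.unique(genotypes)
    if len(labels) != 2:
        raise ValueError("expected exactly two genotype classes")
    in_first = genotypes == labels[0]
    n_first = int(in_first.sum())
    observed = abs(pooled_t(ratios[in_first], ratios[~in_first]))

    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_resamples):
        # Resample with replacement, then split into groups of the original sizes.
        sample = rng.choice(ratios, size=len(ratios), replace=True)
        if abs(pooled_t(sample[:n_first], sample[n_first:])) > observed:
            exceed += 1
    # Assumption: a count of zero is reported as the 1/n_resamples floor.
    return max(exceed, 1) / n_resamples
```

In use, `ratios` would hold the 31 expression ratios for one gene and `genotypes` the parental allele carried by each strain at the marker; repeating the call over all markers and genes yields the table of nominal values.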

5.7.5 Overlap analysis

Chapter 6

Discussion


6.1 Results summary

We have measured genetic influences on approximately 22,000 mRNA levels in whole brain, kidney, and liver of two divergent inbred mouse strains, and recombinant inbred strains derived from them. We found that genetically encoded changes in expression are predominantly tissue–specific, and that they appear to be of complex genetic origin. We have also presented a new, extensible framework for microarray data normalisation that is suitable for genetical genomics applications, and which allows extensive tailoring to any data set with little effort. Our experimental findings fall into two categories:

1. Technical achievements

– Processing of microarray data has a profound effect on estimating linkage for expression “phenotypes”. Failure of normalisation may leave residual systematic biases which dramatically alter the outcome of linkage analyses and subsequent biological interpretations.

2. Biological achievements

– Genetic influences on gene expression levels are tissue specific in mice. We have shown that mRNA level changes for genes detectable in multiple tissues in two strains of mice are overwhelmingly tissue specific. We have also shown, in a panel of recombinant inbred strains, that genetic determinants for these changes are tissue specific and appear to be of complex origin.

– Extrapolation of genetic influences between tissues is highly inaccurate, resulting in approximately 2% correct assignments. The tissue specificity of genetic effects effectively precludes prediction of influence from one tissue to another. This also means that profiling a single tissue will greatly under–estimate the total number of genetic effects on mRNA levels.

– At least two loci appear to modulate the expression levels of groups of transcripts in a tissue–specific manner. These loci act in different ways: one is active in only a single tissue, whereas the other influences different genes in each of three tissues.

– Genes whose products are involved in transcription appear to be particularly susceptible to genetic influences on gene expression, suggesting that such influences reside in molecules not directly involved in transcription.

6.2 Defining regulatory circuits

We have shown that a large proportion of genetic determinants of mRNA levels act in trans to the affected genes. These variations must therefore reside in molecules which have an impact on the process of transcription itself, on the regulation of transcription, or on subsequent processes such as mRNA maturation, export, and turnover. The first possibility is conceptually the simplest: variation in the core transcriptional machinery and/or associated factors directly alters the rate of transcription of affected genes, producing a change in final mRNA levels. This is perhaps best illustrated by the example of AK007639 (Chapter 4, section 5), a putative ring finger transcription factor and a highly plausible candidate for a trans effector of multiple genes. This possibility can be conflated with that of variations in post–transcriptional machinery, as these processes are known to be intimately connected [Maniatis and Reed, 2002].

However, our analysis of Gene Ontology terms associated with genes exhibiting genetically modulated transcript levels (Chapter 3, section 4) suggests that this is not the only possible model: genes with functions associated with transcription and its regulation seem to be more susceptible to genetic alterations of transcript levels. Whilst “knock–on” effects of one transcription factor on another may be invoked to explain this, our observation of intra–cellular signalling–associated genes being affected tissue–specifically in this experiment suggests that this is not the case. Rather, variations in molecules which control transcript levels more indirectly may be involved, both those affecting the process of transcription and those affecting post–transcriptional processes which impact on final mRNA levels. Any definition of regulatory genetic circuits must therefore provide for a large proportion of genes which are not involved directly in the mechanics of transcription.

Genetical genomics experiments can be used to infer networks of regulatory interactions [e.g. Li et al., 2005], subject to the above proviso. However, we note that the results presented both here and in other reports (see Introduction, Table 1 for a summary) stem from only a limited number of genetic backgrounds. Phenomena such as background effects [Nadeau, 2005] and modifier loci [Nadeau, 2001] will modulate these effects, and it is impossible to predict the full range of possible alleles for any given gene [Davis and Justice, 1998]. We may thus expect far more information as a wider range of variants is considered (e.g. with resources such as the Collaborative Cross [Churchill et al., 2004]) to enrich our understanding of the scope of regulatory genetic networks.

6.2.1 Regulatory interactions as networks

Our results show that the majority of genetic effects act in trans, which implies regulatory interactions between genes. Our experiment cannot distinguish whether these interactions are direct, or if intermediates are involved. If the latter case is true, we may adopt a graph–theoretical framework for gene regulation, where genes are nodes and regulatory interactions between genes are (directed) edges connecting the nodes [e.g. Lee et al., 2002]. An effect in trans would then alter the strength of an interaction, expressed as a weight on the edge connecting the two nodes. Such network representations can be constructed in several ways: using existing protein–protein interaction data [Bork et al., 2004] (e.g. from yeast two–hybrid experiments); mining published literature for reported interactions [Friedman et al., 2001]; inferring co–regulation from correlations in expression levels across states [e.g. Basso et al., 2005]; using putative regulatory motif presence to construct a hierarchical interaction network [Segal et al., 2003]; or, most likely, some combination of all approaches [e.g. Scholtens et al., 2005].


An effect in cis would be a weight alteration on a self edge, beginning and terminating at the same node. The size of the effect will vary, depending on factors such as the stochasticity of the regulatory interaction, its stoichiometric requirements, and so on. Further, nodes will tend to have more than one incoming edge (the number of proteins directly involved in transcription of any gene is likely to be in the hundreds [Maniatis and Reed, 2002]), so that the overall effect is buffered [Wittkopp et al., 2004]. Such buffering would also explain the tendency for effects in cis to be stronger (i.e. give better linkage), as they do not have to filter through other interactions. If an affected node itself affects the expression of other genes, then by the same argument the effect will propagate through the network, gradually being absorbed by the buffering at each node until the effect is lost. A variant may thus cause a local disturbance in the network of interactions, which is eventually absorbed by the rest of the structure. Such a scenario may explain the substantial phenotypic differences being generated by limited genetic variation, through pleiotropic effects on gene regulation [King and Wilson, 1975].
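The qualitative picture of a disturbance spreading through a weighted, directed network and being buffered at each node can be made concrete with a toy calculation. The sketch below is not a model proposed in this thesis: dividing an incoming change by the number of regulators of the target is an invented stand–in for buffering, and the function name, the tolerance and the example network are all hypothetical.

```python
from collections import defaultdict


def propagate_effect(edges, source, delta, tol=0.01, max_depth=20):
    """Spread an expression change from `source` through a regulatory graph.

    edges: dict mapping (regulator, target) -> edge weight.
    Toy buffering rule (an assumption): a change arriving at a target is
    scaled by the edge weight and divided by the target's in-degree.
    Changes smaller than `tol` are treated as absorbed and not passed on.
    """
    in_degree = defaultdict(int)
    out_edges = defaultdict(list)
    for (regulator, target), weight in edges.items():
        in_degree[target] += 1
        out_edges[regulator].append((target, weight))

    effects = defaultdict(float)
    effects[source] = delta
    frontier = {source: delta}
    for _ in range(max_depth):  # depth cap guards against cycles
        arriving = defaultdict(float)
        for node, change in frontier.items():
            for target, weight in out_edges[node]:
                arriving[target] += change * weight / in_degree[target]
        frontier = {n: c for n, c in arriving.items() if abs(c) >= tol}
        if not frontier:
            break
        for node, change in frontier.items():
            effects[node] += change
    return dict(effects)


# Hypothetical network: B has two regulators, C has three
network = {("A", "B"): 1.0, ("X", "B"): 1.0,
           ("B", "C"): 1.0, ("Y", "C"): 1.0, ("Z", "C"): 1.0}
result = propagate_effect(network, "A", delta=1.0)
```

In this example a unit change at A (for instance a cis effect on A itself) arrives at B halved and at C reduced to a sixth, illustrating why effects acting through several intermediates would be expected to give weaker linkage.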

6.2.2 Tissue specific regulatory interactions

We have demonstrated that biological systems common to multiple tissues can exhibit tissue–specific genetic influences. This observation suggests that tissue–specificity is not necessarily caused by the use of different regulators in different tissues. Rather, the interactions in which these regulators are involved may change between tissue types, so that a common set of genes is involved in different conformations of regulatory interaction in each tissue. This observation implies that tissue–specific factors may not be the only explanation for tissue–specificity of genetic variation. Thus the same variants may have different effects in each tissue, providing an interesting insight into the mechanisms underlying pleiotropy.

This argument may also be recast as a graph or network representation. Since we have only considered genes expressed in all tissues under investigation, our graph would comprise the same nodes in each tissue: a “core” transcriptome common to at least three tissues. The differences in regulatory interactions across tissues can then be expressed as changes in the connections within the network. Genes expressed only in one tissue can then be added as areas of the network with different properties. This “alternate wiring” principle of tissue specificity may be useful in understanding not only the generation of phenotypic diversity from a small pool of genetic polymorphisms through pleiotropy, but also cell differentiation processes and evolutionary changes in development (“evo–devo”). Although a review of evolutionary developmental biology is beyond the scope of this discussion, it should be pointed out that many of the ideas discussed in this thesis stem from this area (see Carroll [2005] for an overview). The existence of different hubs in tissue–specific network arrangements may create alternate foci for both pleiotropic individual variation and inter–species differences.

6.2.3 Mapping regulatory metaphenotypes

Consideration of more complex representations of gene expression regulation brings us to the question of whether using transcript levels per se is appropriate in genetical genomics studies, and if not, what other measures could be used in their place. Gene expression levels are the most basic form of subphenotype we can consider in quantitative dissection of gene regulation. Whilst genetical genomics approaches offer a powerful method to study the complexity of gene regulation, a more fundamental question is the nature of ultimate phenotypic effects associated with genetically encoded changes in expression. Complex physiological or behavioural differences, for example, do not necessarily translate into discrete gene expression states which may then be genetically dissected. Rather, the genetic basis of quantitative traits may be due to shifts in the structure of the transcriptome, which do not necessarily imply large changes in expression levels of any one gene.

If this is the case, then a more appropriate approach would be to abstract functional modules within a set of regulatory interactions, and attempt to map determinants on these modules as a whole. In the context of graph representations, we would have to identify sub–graphs representing a biological functionality, and then abstract a quantity representing regulatory interactions to use as a meta–phenotype. Initial efforts have been made by Yvert et al. [2003], who use hierarchical clustering to delineate groups of transcripts and then use the mean expression levels of the clusters as trait values in a haploid yeast population; and by Kluger et al. [2004], who calculate principal components (see Chapter 2, section 2 for a discussion) for groups of genes sharing Gene Ontology terms to distinguish lineages of human hematopoietic cells. However, these approaches do not allow true integration of multiple forms of data describing functions, interaction, sequence differences, evolutionary constraints on each gene, etc.; nor do they allow for information implicit in more complex representations of regulatory interactions such as graphs (e.g. directionality of interactions). New approaches to abstract this complex information are therefore required.

This problem is not new, and is certainly not unique to genetical genomics: as a recent example, it has been considered by Klingenberg and Monteiro [2005] in the context of abstracting information from morphological measurements into metaphenotypes which are more biologically meaningful than the raw measurements themselves. We therefore return to one of the central problems in quantitative genetics: choosing quantities which adequately capture the complex events in which we are interested, without being so general as to conflate several discrete processes.
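The two published strategies mentioned above, the mean expression of a cluster and a principal component of a gene group, can both be written as simple summaries of a genes-by-strains matrix. The sketch below is a generic illustration of those two summaries and not the procedure of either cited study; the function name, the matrix orientation and the toy values are assumptions.

```python
import numpy as np


def module_metaphenotypes(expr):
    """Summarise one gene module as two candidate trait vectors across strains.

    expr: array of shape (n_genes_in_module, n_strains) (assumed orientation).
    Returns (mean_profile, pc1_scores), each of length n_strains.
    """
    expr = np.asarray(expr, dtype=float)
    # Summary 1: mean expression of the module in each strain.
    mean_profile = expr.mean(axis=0)
    # Summary 2: first principal component, treating genes as variables and
    # strains as observations; each gene is centred across strains first.
    centred = expr - expr.mean(axis=1, keepdims=True)
    _, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
    pc1_scores = singular_values[0] * vt[0]  # sign of a component is arbitrary
    return mean_profile, pc1_scores


# Toy module of three co-varying genes measured in four strains
toy_module = np.array([[1.0, 2.0, 3.0, 4.0],
                       [2.0, 4.0, 6.0, 8.0],
                       [0.0, 1.0, 2.0, 3.0]])
mean_trait, pc1_trait = module_metaphenotypes(toy_module)
```

Either vector could then be mapped as a trait in place of individual transcript levels, although neither captures the directionality of interactions discussed above.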

6.3 The implications of tissue specificity

The tissue specific nature of genetic variation suggested by our results, and in particular the alternate wiring hypothesis, has serious implications for our understanding of genetic regulation and, more generally, of genetic variation itself.


6.3.1 Understanding relationships between individuals

If the effects of genetic variation differ across tissues due to the tissue–specificity of gene regulation, it follows that differences between individuals are also tissue–specific. As we showed in Chapter 3, extrapolation of genetic effects between tissues is largely inaccurate, supporting this notion. The implication is clear: we cannot predict gene regulation from any one tissue, and neither can we predict the significance of genetic variations as these may differ between tissues. We must therefore address quantitative regulatory changes in phenotypically relevant tissues, a conclusion which has profound and difficult implications for the conduct of genetic studies in human complex diseases.

Even more fundamental are the implications of this tissue–specificity for understanding genetic distance between individuals. We are accustomed to measuring genetic relatedness between individuals in terms of sequence polymorphisms, which are the same in all tissues, albeit more extensive than previously thought [Tuzun et al., 2005]. However, if the action of such polymorphisms is tissue–specific, then the functional differences between individuals must also be a property of individual tissues. The degree of relatedness between individuals may therefore be a function of the degree of similarity between transcriptomes in each tissue, and inexpressible in terms of a single number representing whole organisms.

Extrapolating this notion brings us to species–level differences. Is it possible that differences between species are in fact highly tissue specific? A recent effort to show this effect as changes in distances between primate species and between murid species [Enard et al., 2002] has been discredited by re–analysis [Hsieh et al., 2003]. However, the authors of the original report used a somewhat crude measure of distance (sum of expression distances) which does not take into account any of the network–based considerations discussed above. More sensitive methods may be required to uncover such phenomena, as has been the case between individuals presented here. These speculations, although preliminary, will provide fertile ground for extending our understanding of genetic individuality and perhaps even the concept of species. We may well have to revise our definitions to accommodate this new evidence.

Bibliography

O. Abiola, JM. Angel, P. Avner, AA. Bachmanov, JK. Belknap, B. Bennett, EP. Blankenhorn, DA. Blizard, V. Bolivar, GA. Brockmann, KJ. Buck, JF. Bureau, WL. Casley, EJ. Chesler, JM. Cheverud, GA. Churchill, M. Cook, JC. Crabbe, WE. Crusio, A. Darvasi, G. de Haan, P. Dermant, RW. Doerge, RW. Elliot, CR. Farber, L. Flaherty, J. Flint, H. Gershenfeld, JP. Gibson, J. Gu, W. Gu, H. Himmelbauer, R. Hitzemann, HC. Hsu, K. Hunter, FF. Iraqi, RC. Jansen, TE. Johnson, BC. Jones, G. Kempermann, F. Lammert, L. Lu, KF. Manly, DB. Matthews, JF. Medrano, M. Mehrabian, G. Mittlemann, BA. Mock, JS. Mogil, X. Montagutelli, G. Morahan, JD. Mountz, H. Nagase, RS. Nowakowski, BF. O’Hara, AV. Osadchuk, B. Paigen, AA. Palmer, JL. Peirce, D. Pomp, M. Rosemann, GD. Rosen, LC. Schalkwyk, Z. Seltzer, S. Settle, K. Shimomura, S. Shou, JM. Sikela, LD. Siracusa, JL. Spearow, C. Teuscher, DW. Threadgill, LA. Toth, AA. Toye, C. Vadasz, G. Van Zant, E. Wakeland, RW. Williams, HG. Zhang, and F. Zou. The nature and identification of quantitative trait loci: a community’s view. Nat Rev Genet, 4(11):911–916, 2003.

B. Alberts, A. Johnson, J. Lewis, M. Raff, K. Roberts, and P. Walter. Molecular Biology of the Cell. New York: Garland Publishing, 4th edition, 2002.

O. Alter, PO. Brown, and D. Botstein. Singular value decomposition for genome-wide expression data processing and modeling. Proc Natl Acad Sci U S A, 97(18):10101–10106, 2000.

O. Alter, PO. Brown, and D. Botstein. Generalized singular value decomposition for comparative analysis of genome-scale expression data sets of two different organisms. Proc Natl Acad Sci U S A, 100(6):3351–3356, 2003.

M. Ashburner, CA. Ball, JA. Blake, D. Botstein, H. Butler, JM. Cherry, AP. Davis, K. Dolinski, SS. Dwight, JT. Eppig, MA. Harris, DP. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis, JC. Matese, JE. Richardson, M. Ringwald, GM. Rubin, and G. Sherlock. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet, 25(1):25–29, 2000.

DW. Bailey. Recombinant-inbred strains. An aid to finding identity, linkage, and function of histocompatibility and other genes. Transplantation, 11(3):325–327, 1971.

G. Balazsi, KA. Kay, AL. Barabasi, and ZN. Oltvai. Spurious spatial periodicity of co-expression in microarray data due to printing design. Nucleic Acids Res, 31(15):4425–4433, 2003.

K. Basso, AA. Margolin, G. Stolovitzky, U. Klein, R. Dalla-Favera, and A. Califano. Reverse engineering of regulatory networks in human B cells. Nat Genet, 37(4):382–390, 2005.

JA. Beck, S. Lloyd, M. Hafezparast, M. Lennon-Pierce, JT. Eppig, MF. Festing, and EM. Fisher. Genealogies of mouse inbred strains. Nat Genet, 24(1):23–25, 2000.

J. Bedard, S. Brule, CA. Price, DW. Silversides, and JG. Lussier. Serine protease inhibitor-e2 (serpine2) is differentially expressed in granulosa cells of dominant follicle in cattle. Mol Reprod Dev, 64(2):152–165, 2003.

N. Bing and I. Hoeschele. Genetical genomics analysis of a yeast segregant population for transcription network inference. Genetics, 170(2):533–542, 2005.

JM. Bland and DG. Altman. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, 1(8476):307–310, 1986.

JM. Bland and DG. Altman. Measuring agreement in method comparison studies. Stat Methods Med Res, 8(2):135–160, 1999.

BM. Bolstad, RA. Irizarry, M. Astrand, and TP. Speed. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics, 19(2):185–193, 2003.

P. Bork, LJ. Jensen, C. von Mering, AK. Ramani, I. Lee, and EM. Marcotte. Protein interaction networks from yeast to human. Curr Opin Struct Biol, 14(3):292–299, 2004.

NJ. Bray, PR. Buckland, MJ. Owen, and MC. O’Donovan. Cis-acting variation in the expression of a high proportion of genes in human brain. Hum Genet, 113(2):149–153, 2003.

RB. Brem and L. Kruglyak. The landscape of genetic complexity across 5,700 gene expression traits in yeast. Proc Natl Acad Sci U S A, 102(5):1572–1577, 2005.

RB. Brem, G. Yvert, R. Clinton, and L. Kruglyak. Genetic dissection of transcriptional regulation in budding yeast. Science, 296(5568):752–5, 2002.

S. Brenner, M. Johnson, J. Bridgham, G. Golda, DH. Lloyd, D. Johnson, S. Luo, S. McCurdy, M. Foy, M. Ewan, R. Roth, D. George, S. Eletr, G. Albrecht, E. Vermaas, SR. Williams, K. Moon, T. Burcham, M. Pallas, RB. DuBridge, J. Kirchner, K. Fearon, J. Mao, and K. Corcoran. Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nat Biotechnol, 18(6):630–634, 2000.

L. Bystrykh, E. Weersing, B. Dontje, S. Sutton, MT. Pletcher, T. Wiltshire, AI. Su, E. Vellenga, J. Wang, KF. Manly, L. Lu, EJ. Chesler, R. Alberts, RC. Jansen, RW. Williams, MP. Cooke, and G. de Haan. Uncovering regulatory pathways that affect hematopoietic stem cell function using ’genetical genomics’. Nat Genet, 37(3):225–232, 2005.

MJ. Callow, S. Dudoit, EL. Gong, TP. Speed, and EM. Rubin. Microarray expression profiling identifies genes with altered expression in HDL-deficient mice. Genome Res, 10(12):2022–2029, 2000.

SB. Carroll. Evolution at two levels: on genes and form. PLoS Biol, 3(7), 2005.

L. Cartegni, SL. Chew, and AR. Krainer. Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nat Rev Genet, 3(4):285–298, 2002.

JM. Casolari, CR. Brown, DA. Drubin, OJ. Rando, and PA. Silver. Developmentally induced changes in transcriptional program alter spatial organization across chromosomes. Genes Dev, 19(10):1188–1198, 2005.

EJ. Chesler, L. Lu, S. Shou, Y. Qu, J. Gu, J. Wang, HC. Hsu, JD. Mountz, NE. Baldwin, MA. Langston, DW. Threadgill, KF. Manly, and RW. Williams. Complex trait analysis of gene expression uncovers polygenic and pleiotropic networks that modulate nervous system function. Nat Genet, 37(3):233–242, 2005.

GA. Churchill, DC. Airey, H. Allayee, JM. Angel, AD. Attie, J. Beatty, WD. Beavis, JK. Belknap, B. Bennett, W. Berrettini, A. Bleich, M. Bogue, KW. Broman, KJ. Buck, E. Buckler, M. Burmeister, EJ. Chesler, JM. Cheverud, S. Clapcote, MN. Cook, RD. Cox, JC. Crabbe, WE. Crusio, A. Darvasi, CF. Deschepper, RW. Doerge, CR. Farber, J. Forejt, D. Gaile, SJ. Garlow, H. Geiger, H. Gershenfeld, T. Gordon, J. Gu, W. Gu, G. de Haan, NL. Hayes, C. Heller, H. Himmelbauer, R. Hitzemann, K. Hunter, HC. Hsu, FA. Iraqi, B. Ivandic, HJ. Jacob, RC. Jansen, KJ. Jepsen, DK. Johnson, TE. Johnson, G. Kempermann, C. Kendziorski, M. Kotb, RF. Kooy, B. Llamas, F. Lammert, JM. Lassalle, PR. Lowenstein, L. Lu, A. Lusis, KF. Manly, R. Marcucio, D. Matthews, JF. Medrano, DR. Miller, G. Mittleman, BA. Mock, JS. Mogil, X. Montagutelli, G. Morahan, DG. Morris, R. Mott, JH. Nadeau, H. Nagase, RS. Nowakowski, BF. O’Hara, AV. Osadchuk, GP. Page, B. Paigen, K. Paigen, AA. Palmer, HJ. Pan, L. Peltonen-Palotie, J. Peirce, D. Pomp, M. Pravenec, DR. Prows, Z. Qi, RH. Reeves, J. Roder, GD. Rosen, EE. Schadt, LC. Schalkwyk, Z. Seltzer, K. Shimomura, S. Shou, MJ. Sillanpaa, LD. Siracusa, HW. Snoeck, JL. Spearow, K. Svenson, LM. Tarantino, D. Threadgill, LA. Toth, W. Valdar, FP. de Villena, C. Warden, S. Whatley, RW. Williams, T. Wiltshire, N. Yi, D. Zhang, M. Zhang, and F. Zou. The Collaborative Cross, a community resource for the genetic analysis of complex traits. Nat Genet, 36(11):1133–1137, 2004.

GA. Churchill and RW. Doerge. Empirical threshold values for quantitative trait mapping. Genetics, 138(3):963–971, 1994.

WS. Cleveland. Visualising data. Summit, NJ: Hobart Press, 1993.

WS. Cleveland. The Elements of Graphing Data. Summit, NJ: Hobart Press, 1994.

WS. Cleveland and SJ. Devlin. Locally weighted regression: an approach to regression analysis by local fitting. J. Amer. Statist. Ass., 83:596–610, 1988.

WS. Cleveland, E. Grosse, and WM. Shyu. Statistical Models in S, chapter 8. Chapman and Hall, 1992.

CJ Cotsapas, EKF Chan, MF Kirk, M Tanaka, and PFR Little. Genetic variation and the control of transcription. Cold Spring Harbor Symposia on Quantitative Biology, 68:109–114, 2003.

CR Cowles, JN Hirschhorn, D Altshuler, and ES Lander. Detection of regulatory variation in mouse genes. Nat Genet, 32(3):432–7, 2002.

C Damerval, A Maurice, J M Josse, and D de Vienne. Quantitative trait loci underlying gene product variation: a novel perspective for analyzing regulation of genome expression. Genetics, 137(1):289–301, May 1994.

J. Dausset, H. Cann, D. Cohen, M. Lathrop, JM. Lalouel, and R. White. Centre d’etude du polymorphisme humain (CEPH): collaborative genetic mapping of the human genome. Genomics, 6(3):575–577, 1990.

AP. Davis and MJ. Justice. Mouse alleles: if you’ve seen one, you haven’t seen them all. Trends Genet, 14(11):438–441, 1998.

D de Vienne, A Leonardi, and C Damerval. Genetic aspects of variation of protein amounts in maize and pea. Electrophoresis, 9(11):742–750, Nov 1988.

X. Deng, H. Geng, and H. Ali. EXAMINE: A computational approach to reconstructing gene regulatory networks. Biosystems, 81(2):125–136, 2005.

DR. Denver, K. Morris, JT. Streelman, SK. Kim, M. Lynch, and WK. Thomas. The transcriptional consequences of mutation and natural selection in Caenorhabditis elegans. Nat Genet, 37(5):544–548, 2005.

RW. Doerge and GA. Churchill. Permutation tests for multiple loci affecting a quantitative character. Genetics, 142(1):285–294, 1996.

S. Doss, EE. Schadt, TA. Drake, and AJ. Lusis. Cis-acting expression quantitative trait loci in mice. Genome Res, 15(5):681–691, 2005.

TA. Drake, E. Schadt, K. Hannani, JM. Kabo, K. Krass, V. Colinayo, LE. Greaser, J. Goldin, and AJ. Lusis. Genetic loci determining bone density in mice with diet-induced atherosclerosis. Physiol Genomics, 5(4):205–215, 2001.

S. Dudoit, YH. Yang, MJ. Callow, and TP. Speed. Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Technical Report 578, Stanford University, August 2000.

S. Dudoit, YH. Yang, MJ. Callow, and TP. Speed. Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Statistica Sinica, 12:111–139, 2002.

IA. Eaves, LS. Wicker, G. Ghandour, PA. Lyons, LB. Peterson, JA. Todd, and RJ. Glynne. Combining mouse congenic strains and microarray gene expression analyses to study a complex trait: the NOD model of type 1 diabetes. Genome Res, 12(2):232–243, 2002.

B. Efron and RJ. Tibshirani. An introduction to the Bootstrap. New York: Chapman and Hall, 1993.

MB. Eisen, PT. Spellman, PO. Brown, and D. Botstein. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A, 95(25):14863–14868, 1998.

W. Enard, P. Khaitovich, J. Klose, S. Zollner, F. Heissig, P. Giavalisco, K. Nieselt-Struwe, E. Muchmore, A. Varki, R. Ravid, GM. Doxiadis, RE. Bontrop, and S. Paabo. Intra- and interspecific variation in primate gene expression patterns. Science, 296(5566):340–343, 2002.

DS. Falconer. Introduction to Quantitative Genetics, 3rd edition. London: Longman Scientific and Technical, 1989.

M. Farrall. Quantitative genetic variation: a post-modern view. Hum Mol Genet, 13 Spec No 1:1–7, 2004.

JC. Fay, HL. McCullough, PD. Sniegowski, and MB. Eisen. Population genetic variation in gene expression is associated with phenotypic variation in Saccharomyces cerevisiae. Genome Biol, 5(4):R26, 2004.

TG. Fazzio, C. Kooperberg, JP. Goldmark, C. Neal, R. Basom, J. Delrow, and T. Tsukiyama. Widespread collaboration of Isw2 and Sin3-Rpd3 chromatin remodeling complexes in transcriptional repression. Mol Cell Biol, 21(19):6450–6460, 2001.

MFW. Festing. Inbred strains of mice, via Mouse Genome Informatics, 1998. URL http://www.informatics.jax.org/external/festing/mouse/. Accessed Sept 2005.

DB. Finkelstein, J. Gollub, R. Ewing, F. Sterky, S. Somerville, and JM. Cherry. Iterative linear regression by sector: renormalization of cDNA microarray data and cluster analysis weighted by cross homology. Methods of Microarray Data Analysis, 2001.

RA. Fisher. The Genetical Theory of Natural Selection. Oxford: Oxford University Press, 1930.

C. Friedman, P. Kra, H. Yu, M. Krauthammer, and A. Rzhetsky. GENIES: a natural-language processing system for the extraction of molecular pathways from journal articles. Bioinformatics, 17 Suppl 1:74–82, 2001.

D. Gabellini, MR. Green, and R. Tupler. When enough is enough: genetic diseases associated with transcriptional derepression. Curr Opin Genet Dev, 14(3):301–307, 2004.

AP. Gasch and MB. Eisen. Exploring the conditional coregulation of yeast gene expression through fuzzy k-means clustering. Genome Biol, 3(11), 2002.

RC. Gentleman, VJ. Carey, DM. Bates, BM. Bolstad, M. Dettling, S. Dudoit, B. Ellis, L. Gautier, Y. Ge, J. Gentry, K. Hornik, T. Hothorn, W. Huber, S. Iacus, RA. Irizarry, F. Leisch, C. Li, M. Maechler, AJ. Rossini, G. Sawitzki, C. Smith, G. Smyth, L. Tierney, JYH. Yang, and J. Zhang. Bioconductor: Open software development for computational biology and bioinformatics. Genome Biology, 5:R80, 2004.

G. Gibson and I. Dworkin. Uncovering cryptic genetic variation. Nat Rev Genet, 5(9):681–690, 2004.

N. Gompel, B. Prud’homme, PJ. Wittkopp, VA. Kassner, and SB. Carroll. Chance caught on the wing: cis-regulatory evolution and the origin of pigment patterns in Drosophila. Nature, 433(7025):481–487, 2005.

JD. Han, N. Bertin, T. Hao, DS. Goldberg, GF. Berriz, LV. Zhang, D. Dupuy, AJ. Walhout, ME. Cusick, FP. Roth, and M. Vidal. Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature, 430(6995):88–93, 2004.

T. Hastie and R. Tibshirani. Generalised Additive Models. Chapman and Hall, 1990.

WP. Hsieh, TM. Chu, RD. Wolfinger, and G. Gibson. Mixed-model reanalysis of primate data suggests tissue and species biases in oligonucleotide-based gene expression profiles. Genetics, 165(2):747–757, 2003.

N. Hubner, CA. Wallace, H. Zimdahl, E. Petretto, H. Schulz, F. Maciver, M. Mueller, O. Hummel, J. Monti, V. Zidek, A. Musilova, V. Kren, H. Causton, L. Game, G. Born, S. Schmidt, A. Muller, SA. Cook, TW. Kurtz, J. Whittaker, M. Pravenec, and TJ. Aitman. Integrated transcriptional profiling and linkage analysis for identification of genes underlying disease. Nat Genet, 37(3):243–253, 2005.

TR. Hughes, CJ. Roberts, H. Dai, AR. Jones, MR. Meyer, D. Slade, J. Burchard, S. Dow, TR. Ward, MJ. Kidd, SH. Friend, and MJ. Marton. Widespread aneuploidy revealed by DNA microarray expression profiling. Nat Genet, 25(3):333–337, 2000.

RG. Hunter, MM. Lim, KB. Philpot, LJ. Young, and MJ. Kuhar. Species differences in brain distribution of CART mRNA and CART peptide between prairie and meadow voles. Brain Res, 1048(1-2):12–23, 2005.

RA. Irizarry, BM. Bolstad, F. Collin, LM. Cope, B. Hobbs, and TP. Speed. Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res, 31(4):e15, 2003.

RC. Jansen and JP. Nap. Genetical genomics: the added value from segregation. Trends Genet, 17(7):388–391, 2001.

W. Jin, RM. Riley, RD. Wolfinger, KP. White, G. Passador-Gurgel, and G. Gibson. The contributions of sex, genotype and age to transcriptional variance in Drosophila melanogaster. Nat Genet, 29(4):389–395, 2001.

IT. Jolliffe. Principal Components Analysis. Springer Series in Statistics. New York: Springer, 2002.

M. Kanehisa, S. Goto, S. Kawashima, Y. Okuno, and M. Hattori. The KEGG resource for deciphering the genome. Nucleic Acids Res, 32 (Database issue):277–280, 2004.

GJ. Kargul, DB. Dudekula, Y. Qian, MK. Lim, SA. Jaradat, TS. Tanaka, MG. Carter, and MS. Ko. Verification and initial annotation of the NIA mouse 15K cDNA clone set. Nat Genet, 28(1):17–18, 2001.

CL. Karp, A. Grupe, E. Schadt, SL. Ewart, M. Keane-Moore, PJ. Cuomo, J. Kohl, L. Wahl, D. Kuperman, S. Germer, D. Aud, G. Peltz, and M. Wills-Karp. Identification of complement factor 5 as a susceptibility locus for experimental allergic asthma. Nat Immunol, 1(3):221–226, 2000.

WJ. Kent, CW. Sugnet, TS. Furey, KM. Roskin, TH. Pringle, AM. Zahler, and D. Haussler. The human genome browser at UCSC. Genome Res., 12(6):996–1006, 2002. URL http://genome.ucsc.edu/.

TB. Kepler, L. Crosby, and KT. Morgan. Normalization and analysis of DNA microarray data by self-consistency and local regression. Genome Biol, 3(7):research0037.1, 2002.

MK. Kerr, M. Martin, and GA. Churchill. Analysis of variance for gene expression microarray data. J Comput Biol, 7(6):819–837, 2000.

H. Kim, GH. Golub, and H. Park. Missing value estimation for DNA microarray gene expression data: local least squares imputation. Bioinformatics, 21(2):187–198, 2005.

MC. King and AC. Wilson. Evolution at two levels in humans and chimpanzees. Science, 188(4184):107–116, 1975.

CP. Klingenberg and LR. Monteiro. Distances and directions in multidimensional shape spaces: implications for morphometric applications. Syst Biol, 54(4):678–688, 2005.

Y. Kluger, DP. Tuck, JT. Chang, Y. Nakayama, R. Poddar, N. Kohya, Z. Lian, A. Ben Nasr, HR. Halaban, DS. Krause, X. Zhang, PE. Newburger, and SM. Weissman. Lineage specificity of gene expression patterns. Proc Natl Acad Sci U S A, 101(17):6508–6513, 2004.

JC. Knight, BJ. Keating, KA. Rockett, and DP. Kwiatkowski. In vivo characterization of regulatory polymorphisms by allele-specific quantification of RNA polymerase loading. Nat Genet, 33(4):469–475, 2003.

R. Korstanje and B. Paigen. From QTL to gene: the harvest begins. Nat Genet, 31(3):235–236, 2002.

LW. Law, AG. Morrow, and EM. Greenspan. Inheritance of low liver glucuronidase activity in the mouse. J Natl Cancer Inst, 12(4):909–916, 1952.

PD. Lee, R. Sladek, CM. Greenwood, and TJ. Hudson. Control genes and variability: absence of ubiquitous reference transcripts in diverse mammalian expression studies. Genome Res, 12(2):292–297, 2002.

M. Levine and R. Tjian. Transcription regulation and animal diversity. Nature, 424(6945):147–151, 2003.

H. Li, L. Lu, KF. Manly, EJ. Chesler, L. Bao, J. Wang, M. Zhou, RW. Williams, and Y. Cui. Inferring gene transcriptional modulatory relations: a genetical genomics approach. Hum Mol Genet, 14(9):1119–1125, 2005.

O. Lipan and WH. Wong. The use of oscillatory signals in the study of genetic networks. Proc Natl Acad Sci U S A, 102(20):7063–7068, 2005.

HS. Lo, Z. Wang, Y. Hu, HH. Yang, S. Gere, KH. Buetow, and MP. Lee. Allelic variation in gene expression is common in the human genome. Genome Res, 13(8):1855–1862, 2003.

KS. Loftus, Y. Chen, G. Gooden, JF. Ryan, G. Birznieks, M. Hillard, DA. Baxevanis, M. Bittner, P. Meltzer, J. Trent, and W. Pavan. Informatic selection of a neural crest-melanocyte cDNA set for microarray analysis. Proc. Natl Acad. Sci. USA, 96:9277–9280, 1999.

I. Lonnstedt and T.P. Speed. Replicated microarray data. Statistica Sinica, 12:31–46, 2002.

M. Lynch and B. Walsh. Genetics and Analysis of Quantitative Traits. Sinauer Associates, 1998.

T. Maniatis and R. Reed. An extensive network of coupling among gene expression machines. Nature, 416(6880):499–506, 2002.

MJ. Martinez, AD. Aragon, AL. Rodriguez, JM. Weber, JA. Timlin, MB. Sinclair, DM. Haaland, and M. Werner-Washburne. Identification and removal of contaminating fluorescence from commercial and in-house printed DNA microarrays. Nucleic Acids Res, 31(4):e18, 2003.

T. Mary-Huard, JJ. Daudin, S. Robin, F. Bitton, E. Cabannes, and P. Hilson. Spotting effect in microarray experiments. BMC Bioinformatics, 5:63–63, 2004.

S.A. Monks, A. Leonardson, H. Zhu, P. Cundiff, P. Pietrusiak, S. Edwards, J.W. Phillips, A. Sachs, and E.E. Schadt. Genetic inheritance of gene expression in human cell lines. Am J Hum Genet, 75:1094–1105, 2004.

KL. Montooth, JH. Marden, and AG. Clark. Mapping determinants of variation in energy metabolism, respiration and flight in Drosophila. Genetics, 165(2):623–635, 2003.

M. Morley, CM. Molony, TM. Weber, JL. Devlin, KG. Ewens, RS. Spielman, and VG. Cheung. Genetic analysis of genome-wide variation in human gene expression. Nature, 430(7001):743–747, 2004.

NE. Morton. Significance levels in complex inheritance. Am J Hum Genet, 62(3):690–697, 1998.

JH. Nadeau. Modifier genes in mice and humans. Nat Rev Genet, 2(3):165–174, 2001.

JH. Nadeau. Listening to genetic background noise. N Engl J Med, 352(15): 1598–1599, 2005.

JH. Nadeau and WN. Frankel. The roads from phenotypic variation to gene discovery: mutagenesis versus QTLs. Nat Genet, 25(4):381–384, 2000.

M.A. Newton, A. Noueiry, D. Sarkar, and P. Ahlquist. Detecting differential gene expression with a semiparametric hierarchical mixture method. Biostat., 5:155–176, 2004.

DJ. Nott, Z. Yu, EKF. Chan, CJ. Cotsapas, MJ. Cowley, JN. Pulvers, RBH. Williams, and PFR. Little. Hierarchical Bayes variable selection and microarray experiments. 2006, accepted.

MF. Oleksiak, GA. Churchill, and DL. Crawford. Variation in gene expression within and among natural populations. Nat Genet, 32(2):261–266, 2002.

MF. Oleksiak, JL. Roach, and DL. Crawford. Natural variation in cardiac metabolism and gene expression in Fundulus heteroclitus. Nat Genet, 37 (1):67–72, 2005.

M. Ouyang, WJ. Welsh, and P. Georgopoulos. Gaussian mixture clustering and imputation of microarray data. Bioinformatics, 20(6):917–923, 2004.

K. Paigen. Acid hydrolases as models of genetic control. Annu Rev Genet, 13:417–466, 1979.

LA. Parada, S. Sotiriou, and T. Misteli. Spatial genome organization. Exp Cell Res, 296(1):64–70, 2004.

T. Pastinen, R. Sladek, S. Gurd, A. Sammak, B. Ge, P. Lepage, K. Lavergne, A. Villeneuve, T. Gaudin, H. Brandstrom, A. Beck, A. Verner, J. Kingsley, E. Harmsen, D. Labuda, K. Morgan, MC. Vohl, AK. Naumova, D. Sinnett, and TJ. Hudson. A survey of genetic and epigenetic variation affecting human gene expression. Physiol Genomics, 16(2):184–193, 2004.

JL. Peirce, L. Lu, J. Gu, LM. Silver, and RW. Williams. A new set of BXD recombinant inbred lines from advanced intercross populations in mice. BMC Genet, 5(1):7–7, 2004.

BE. Perrin, L. Ralaivola, A. Mazurie, S. Bottani, J. Mallet, and F. D’Alche-Buc. Gene networks inference using dynamic Bayesian networks. Bioinformatics, 19 Suppl 2:138–138, 2003.

J. Peirce, L. Lu, H. Li, J. Wang, K. Manly, R. Hitzemann, J. Belknap, and R. Williams. How replicable are mRNA expression QTLs? Mammalian Genome, 17(6):643–656, 2006.

MT. Pletcher, P. McClurg, S. Batalov, AI. Su, SW. Barnes, E. Lagler, R. Korstanje, X. Wang, D. Nusskern, MA. Bogue, RJ. Mural, B. Paigen, and T. Wiltshire. Use of a dense single nucleotide polymorphism map for in silico mapping in the mouse. PLoS Biol, 2(12):e393, 2004.

JN. Pulvers. The tissue specificity of transcriptional regulatory mechanisms and gene expression level variation. Master’s thesis, University of New South Wales, 2004.

J. Qian, J. Lin, NM. Luscombe, H. Yu, and M. Gerstein. Prediction of regulatory networks: genome-wide identification of transcription factor targets from gene expression data. Bioinformatics, 19(15):1917–1926, 2003.

R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2005. URL http://www.R-project.org.

TH. Roderick, RE. Wimer, CC. Wimer, and PA. Schwartzkroin. Genetic and phenotypic variation in weight of brain and spinal cord between inbred strains of mice. Brain Res., 64:345–353, 1973.

DR. Ruppert, MP. Wand, and RJ. Carroll. Semiparametric Regression. Cambridge Series in Statistical and Probabilistic Mathematics. New York: Cambridge University Press, 2003.

R. Sandberg, R. Yasuda, D.G. Pankratz, T.A. Carter, J.A. Del Rio, L. Wodicka, M. Mayford, D.J. Lockhart, and C. Barlow. Regional and strain-specific gene expression mapping in the adult mouse brain. Proc Natl Acad Sci U S A, 97(20):11038–43, 2000.

M. Sapir and GA. Churchill. Estimating the posterior probability of differential gene expression from microarray data. Technical report, Jackson Laboratories, 2000.

JG. Schaart, L. Mehli, and HJ. Schouten. Quantification of allele-specific expression of a gene encoding strawberry polygalacturonase-inhibiting protein (PGIP) using Pyrosequencing. Plant J, 41(3):493–500, 2005.

EE. Schadt, J. Lamb, X. Yang, J. Zhu, S. Edwards, D. Guhathakurta, SK. Sieberts, S. Monks, M. Reitman, C. Zhang, PY. Lum, A. Leonardson, R. Thieringer, JM. Metzger, L. Yang, J. Castle, H. Zhu, SF. Kash, TA. Drake, A. Sachs, and AJ. Lusis. An integrative genomics approach to infer causal associations between gene expression and disease. Nat Genet, 37(7):710–717, 2005.

EE. Schadt, SA. Monks, TA. Drake, AJ. Lusis, N. Che, V. Colinayo, TG. Ruff, SB. Milligan, JR. Lamb, G. Cavet, PS. Linsley, M. Mao, RB. Stoughton, and SH. Friend. Genetics of gene expression surveyed in maize, mouse and man. Nature, 422(6929):297–302, 2003.

M. Schena, D. Shalon, RW. Davis, and PO. Brown. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science, 270(5235):467–470, 1995.

G. Schlager. Kidney weight in mice: strain differences and genetic determination. J. Hered., 59:171–174, 1968.

D. Scholtens, M. Vidal, and R. Gentleman. Local modeling of global interactome networks. Bioinformatics, 21(17):3548–3557, 2005.

J. Schuchhardt, D. Beule, A. Malik, E. Wolski, H. Eickhoff, H. Lehrach, and H. Herzel. Normalization strategies for cDNA microarrays. Nucleic Acids Res, 28(10), 2000.

E. Segal, M. Shapira, A. Regev, D. Pe’er, D. Botstein, D. Koller, and N. Friedman. Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nat Genet, 34(2):166–176, 2003.

JS. Simonoff. Smoothing Methods in Statistics. Springer Series in Statistics. New York: Springer–Verlag, 1996.

G.K. Smyth. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology, 3:3, 2004.

GK. Smyth and TP. Speed. Normalization of cDNA microarray data. Methods, 31:265–273, 2003.

GK. Smyth, N. Thorne, and J. Wettenhall. limma: Linear Models for Microarray Data User’s Guide. The Walter and Eliza Hall Institute of Medical Research, October 2004.

GK. Smyth, YH. Yang, and T. Speed. Statistical issues in cDNA microarray data analysis, volume 224, chapter 9, pages 111–136. New York: Humana Press, Inc., 2003.

P. Soille. Morphological Image Analysis: Principles and Applications. New York: Springer, New York, 1999.

T. Sorlie, CM. Perou, R. Tibshirani, T. Aas, S. Geisler, H. Johnsen, T. Hastie, MB. Eisen, M. van de Rijn, SS. Jeffrey, T. Thorsen, H. Quist, JC. Matese, PO. Brown, D. Botstein, P. Eystein Lonning, and AL. Borresen-Dale. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci U S A, 98(19):10869–10874, 2001.

PT. Spellman, G. Sherlock, MQ. Zhang, VR. Iyer, K. Anders, MB. Eisen, PO. Brown, D. Botstein, and B. Futcher. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell, 9(12):3273–3297, 1998.

JB. Storer. Relation of lifespan to brain weight, body weight and metabolic rate among inbred mouse strains. Exp. Gerontol., 2:173–182, 1967.

JD. Storey, JM. Akey, and L. Kruglyak. Multiple locus linkage analysis of genomewide expression in yeast. PLoS Biology, 3(8), 2005.

S. Tavazoie, JD. Hughes, MJ. Campbell, RJ. Cho, and GM. Church. Systematic determination of genetic network architecture. Nat Genet, 22(3):281–285, 1999.

The International HapMap Consortium. The International HapMap Project. Nature, 426(6968):789–796, 2003.

J. Theuns and C. Van Broeckhoven. Transcriptional regulation of Alzheimer’s disease genes: implications for susceptibility. Hum Mol Genet, 9(16):2383–2394, 2000.

JP. Townsend, D. Cavalieri, and DL. Hartl. Population genetic variation in genome-wide gene expression. Mol Biol Evol, 20(6):955–963, 2003.

O. Troyanskaya, M. Cantor, G. Sherlock, P. Brown, T. Hastie, R. Tibshirani, D. Botstein, and RB. Altman. Missing value estimation methods for DNA microarrays. Bioinformatics, 17(6):520–525, 2001.

ER. Tufte. Envisioning Information. Cheshire, Conn.: Graphics Press, 1990.

ER. Tufte. Visual Explanations. Cheshire, Conn.: Graphics Press, 1997.

E. Tuzun, AJ. Sharp, JA. Bailey, R. Kaul, VA. Morrison, LM. Pertz, E. Haugen, H. Hayden, D. Albertson, D. Pinkel, MV. Olson, and EE. Eichler. Fine-scale structural variation of the human genome. Nat Genet, 37(7):727–732, 2005.

VE. Velculescu, L. Zhang, B. Vogelstein, and KW. Kinzler. Serial analysis of gene expression. Science, 270(5235):484–487, 1995.

CM. Wade, EJ. Kulbokas, AW. Kirby, MC. Zody, JC. Mullikin, ES. Lander, K. Lindblad-Toh, and MJ. Daly. The mosaic structure of variation in the laboratory mouse genome. Nature, 420(6915):574–578, 2002.

D. Wahlsten, WJ. Hudspeth, and K. Bernhardt. Implications of genetic variation in mouse brain structure for electrode placement by stereotaxic surgery. J. Comp. Neurol., 162:519–532, 1975.

A. Whitehead and DL. Crawford. Variation in tissue-specific gene expression among natural populations. Genome Biol, 6(2):R13, 2005.

RBH. Williams, CJ. Cotsapas, MJ. Cowley, EKF. Chan, DJ. Nott, and PFR. Little. Influence of microarray normalisation procedures on detection of linkage signal in genetical-genomics experiments. Nature Genetics, 2006, in press.

DL. Wilson, MJ. Buckley, CA. Helliwell, and IW. Wilson. New normalization methods for cDNA microarray data. Bioinformatics, 19(11):1325–1332, 2003.

T. Wiltshire, MT. Pletcher, S. Batalov, SW. Barnes, LM. Tarantino, MP. Cooke, H. Wu, K. Smylie, A. Santrosyan, NG. Copeland, NA. Jenkins, F. Kalush, RJ. Mural, RJ. Glynne, SA. Kay, MD. Adams, and CF. Fletcher. Genome-wide single-nucleotide polymorphism analysis defines haplotype patterns in mouse. Proc Natl Acad Sci U S A, 100(6):3380–3385, 2003.

PJ. Wittkopp, BK. Haerum, and AG. Clark. Evolutionary changes in cis and trans gene regulation. Nature, 430(6995):85–88, 2004.

SN. Wood. mgcv: GAMs and generalized ridge regression for R. R News, 1(2):20–25, 2001.

SN. Wood. Stable and efficient multiple smoothing parameter estimation for generalized additive models. J. Amer. Statist. Ass., 99:637–686, 2004.

G.A. Wray, M.W. Hahn, E. Abouheif, J.P. Balhoff, M. Pizer, M.V. Rockman, and L.A. Romano. The evolution of transcriptional regulation in eukaryotes. Mol. Biol. Evol., 20:1377–1419, 2003.

H. Yan, W. Yuan, VE. Velculescu, B. Vogelstein, and KW. Kinzler. Allelic variation in human gene expression. Science, 297(5584):1143–1143, 2002.

Y.H. Yang, S. Dudoit, P. Luu, D.M. Lin, V. Peng, J. Ngai, and T.P. Speed. Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucl. Acids. Res., 30:e15, 2002.

YH. Yang and NP. Thorne. Normalization for two-color cDNA microarray data., volume 40 of Monograph Series, pages 403–418. IMS Lecture Notes, 2003.

G. Yvert, RB. Brem, J. Whittle, JM. Akey, E. Foss, EN. Smith, R. Mackelprang, and L. Kruglyak. Trans-acting regulatory variation in Saccharomyces cerevisiae and the role of transcription factors. Nat Genet, 35(1):57–64, 2003.

Appendix A

Significant linkages in the BxD panel

Table A.1: Linkage hits in cis and in trans for expression levels measured in BXD whole brain preparations. Significant linkage is defined as P < 0.05, with genome–wise adjustment for trans effects. Genomic location is given as cumulative megabases across the mouse genome.

Transcript Locus Chr Location (Mb) Cis/Trans P NM.019460 D19Mit68 19 2369.61 T 3.28E05 AB015425 D8Mit128 8 1185.17 T 4.26E05 NM.021565 D3Mit19 3 535.89 C 4.93E02 NM.021565 D1Mit216 1 80.32 T 5.71E05 NM.011732 D16Mit86 16 2176.22 C 4.32E02 NM.009475 D1Mit216 1 80.32 T 5.70E06 NM.009475 D1Mit134 1 80.78 T 3.94E06 AK016070 D1Mit216 1 80.32 T 1.66E05 AK016070 D1Mit134 1 80.78 T 9.70E06 AK008434 D13Mit35 13 1862.73 C 3.12E02 AK020104 D3Mit347 3 499.50 T 2.76E05 AK020930 D2Mit286 2 350.45 T 1.57E05 NM.011453 D1Mit216 1 80.32 T 7.00E05 NM.011453 D1Mit134 1 80.78 T 2.14E05 M16360 D19Mit103 19 2419.31 T 8.52E06 AK020518 D3Mit19 3 535.89 C 2.42E02 AK007236 D2Mit457 2 376.81 C 4.95E02 NM.025498 D1Mit87 1 114.16 T 7.38E05 AK019615 05.142.105 5 835.33 C 3.95E02 AK020850 D19Mit127 19 2377.75 T 1.32E05 AK013632 D8Mit124 8 1139.72 T 4.17E05 AK013632 D19Mit127 19 2377.75 T 2.84E05 AK013632 D19Mit61 19 2380.97 T 3.55E05 AK013632 D19Mit28 19 2381.35 T 1.16E05 Continued on next page

129 Signicant linkages in the BxD panel

Table A.1 – continued from previous page Transcript Locus Chr Location (Mb) Cis/Trans P AK013632 D19Mit80 19 2388.76 T 1.59E05 AF016695 D1Mit216 1 80.32 T 1.90E05 AF016695 D1Mit134 1 80.78 T 1.21E05 X14387 D7Mit259 7 1123.02 C 1.24E02 BC013541 D3Mit21 3 414.22 T 2.15E05 BC013541 03.034.300 3 414.52 T 3.77E06 BC013541 D3Mit63 3 418.08 T 2.34E05 AK008491 D1Mit216 1 80.32 T 6.96E05 AK008491 D1Mit134 1 80.78 T 6.76E05 AB049821 D17Mit221 17 2270.31 C 4.21E02 NM.010407 D9Mit151 9 1375.37 C 1.59E02 M16810 D13Mit35 13 1862.73 C 4.42E02 AK005929 D1Mit216 1 80.32 T 7.34E05 AK005929 D1Mit134 1 80.78 T 4.73E05 AK011185 D9Mit151 9 1375.37 C 2.23E02 AK020637 D11Mit337 11 1627.76 C 4.88E02 AK020637 D8Mit124 8 1139.72 T 6.04E05 AK015461 D8Mit124 8 1139.72 T 2.36E05 NM.011406 D11Mit337 11 1627.76 C 3.98E02 M32004 D12Mit150 12 1742.65 C 3.38E02 AK006812 D1Mit216 1 80.32 T 5.63E05 AK019864 D7Mit259 7 1123.02 C 3.77E02 NM.025485 D19Mit127 19 2377.75 T 2.40E05 AK013755 D11Mit337 11 1627.76 C 1.59E02 AK004904 D12Mit150 12 1742.65 C 4.24E02 AK013743 D1Mit134 1 80.78 T 5.83E05 AK007287 D1Mit134 1 80.78 T 7.12E05 AK017229 D8Mit156 8 1253.28 C 3.75E02 AF296075 D8Mit25 8 1188.52 T 3.00E05 Continued on next page

130 Signicant linkages in the BxD panel

Table A.1 – continued from previous page Transcript Locus Chr Location (Mb) Cis/Trans P NM.010321 D14Mit131 14 1975.00 C 2.69E02 NM.010321 D7Mit117 7 1009.47 T 4.73E06 NM.010321 D7Mit246 7 1009.87 T 2.43E05 AF229637 D19Mit109 19 2372.17 T 3.69E05 M81747 D10Mit122 10 1481.94 T 1.13E05 AK018586 D9Mit151 9 1375.37 C 2.49E02 AK021192 D19Mit68 19 2369.61 T 5.23E05 AK016975 D3Mit19 3 535.89 C 1.05E02 AK019334 D19Mit28 19 2381.35 T 4.33E05 U37222 D1Mit216 1 80.32 T 7.63E06 U37222 D1Mit134 1 80.78 T 8.01E06 AF148688 D8Mit124 8 1139.72 T 3.49E05 NM.011757 D13Mit35 13 1862.73 C 4.33E02 NM.016810 DXMit105 X 2469.60 T 1.66E05 NM.016810 DXMit144 X 2482.82 T 1.23E05 AF035526 D1Mit216 1 80.32 T 5.58E06 AF035526 D1Mit134 1 80.78 T 3.01E05 NM.011498 D9Mit151 9 1375.37 C 3.91E02 AF033666 D2Mit457 2 376.81 C 4.26E02 Z12284 D11Mit337 11 1627.76 C 1.01E02 Z12270 D1Mit216 1 80.32 T 3.57E05 AJ249820 D1Mit216 1 80.32 T 6.85E05 AJ249820 D1Mit134 1 80.78 T 8.10E06 AF041889 D1Mit216 1 80.32 T 6.41E05 AF041889 D1Mit134 1 80.78 T 6.31E05 AK017582 D6Mit14 6 986.00 C 4.33E02 AK009847 D1Mit19 1 74.13 T 7.04E05 AK009847 D1Mit216 1 80.32 T 2.59E05 AK018246 D16Mit47 16 2144.43 T 4.73E05 Continued on next page

Transcript Locus Chr Location (Mb) Cis/Trans P
AK020358 D1Mit216 1 80.32 T 1.89E-05
AK005503 D1Mit134 1 80.78 T 3.98E-05
AK011684 D1Mit134 1 80.78 T 6.02E-05
AY033650 D1Mit134 1 80.78 T 4.74E-05
AK008259 16.052.225 16 2137.52 T 3.74E-05
AK008259 D16Mit47 16 2144.43 T 3.18E-05
NM_026189 D1Mit216 1 80.32 T 1.27E-05
NM_026189 D1Mit134 1 80.78 T 6.91E-05
NM_025421 D1Mit216 1 80.32 T 1.44E-05
NM_025421 D1Mit134 1 80.78 T 2.52E-05
NM_026640 D1Mit328 1 68.25 T 7.48E-05
NM_026640 D1Mit178 1 69.13 T 5.29E-05
NM_026640 D1Mit128 1 72.96 T 7.42E-05
NM_026640 D1Mit19 1 74.13 T 7.33E-05
NM_026640 D1Mit216 1 80.32 T 8.68E-06
NM_026640 D1Mit134 1 80.78 T 7.51E-07
AK014175 D5Mit309 5 769.16 T 4.08E-06
AF084548 D4Mit344 4 689.95 C 1.96E-02
NM_009129 D4Mit344 4 689.95 C 4.30E-02
NM_008801 D7Mit259 7 1123.02 C 3.36E-02
NM_019960 D1Mit216 1 80.32 T 6.12E-05
NM_019960 D1Mit134 1 80.78 T 5.34E-05
NM_009393 D2Mit457 2 376.81 C 1.08E-02
NM_009920 D3Mit19 3 535.89 C 3.28E-02
M25567 D1Mit19 1 74.13 T 4.37E-05
M25567 D1Mit216 1 80.32 T 3.02E-05
M25567 D1Mit134 1 80.78 T 4.16E-05
AF342896 D7Mit62 7 1062.34 T 2.36E-05
NM_026318 D9Mit151 9 1375.37 C 1.82E-02

Transcript Locus Chr Location (Mb) Cis/Trans P
NM_023630 D1Mit134 1 80.78 T 7.52E-05
AK012911 D14Mit131 14 1975.00 C 4.86E-02
AJ409503 X.023.015 X 2457.98 T 6.53E-05
AK012210 D19Mit28 19 2381.35 T 5.96E-05
AK018581 D13Mit35 13 1862.73 C 3.83E-02
AK006709 DXMit89 X 2436.00 T 2.60E-05
BC003960 D1Mit216 1 80.32 T 5.04E-05
BC003960 D1Mit134 1 80.78 T 3.41E-05
BC005651 D14Mit129 14 1901.70 T 7.05E-05
AF190730 D6Mit14 6 986.00 C 4.86E-02
NM_011256 D6Mit14 6 986.00 C 4.07E-02
NM_011029 D8Mit156 8 1253.28 C 1.68E-02
NM_008485 D7Mit259 7 1123.02 C 2.13E-03
AJ290947 D14Mit99 14 1876.58 T 7.56E-05
AF012133 D1Mit216 1 80.32 T 6.96E-06
AF012133 D1Mit134 1 80.78 T 1.89E-06
AF012133 D1Mit83 1 86.03 T 3.43E-05
Z12206 D1Mit134 1 80.78 T 4.37E-05
NM_025989 02.151.240 2 340.66 T 6.89E-05
NM_025989 D2Mit282 2 344.48 T 4.15E-05
NM_025989 D2Mit493 2 349.91 T 8.39E-06
NM_025989 Src 2 353.40 T 2.72E-05
NM_025989 D2Mit411 2 355.52 T 3.57E-06
NM_025989 D2Mit412 2 358.28 T 2.06E-05
NM_025989 D2Mit51 2 359.08 T 5.53E-05
AK007603 D11Mit337 11 1627.76 C 2.98E-02
AK004313 D3Mit19 3 535.89 C 1.07E-02
AK009512 D1Mit155 1 194.32 C 2.66E-02
AK008392 D14Mit131 14 1975.00 C 2.96E-02

Transcript Locus Chr Location (Mb) Cis/Trans P
NM_024177 D12Mit150 12 1742.65 C 1.00E-02
NM_025310 D5Mit155 5 787.90 T 4.74E-05
NM_025310 D5Mit157 5 788.85 T 7.28E-05
NM_026253 D1Mit216 1 80.32 T 2.61E-05
NM_026253 D1Mit134 1 80.78 T 1.07E-05
NM_026297 D1Mit328 1 68.25 T 5.44E-05
NM_026297 D1Mit134 1 80.78 T 3.69E-05
BC011191 08.062.280 8 1186.59 T 5.09E-05
NM_010419 D14Mit129 14 1901.70 T 4.51E-05
AB041547 01.059.350 1 62.23 T 4.49E-05
AF035948 D17Mit221 17 2270.31 C 4.14E-02
NM_011756 D2Mit5 2 205.35 T 3.27E-05
AF039417 D17Mit221 17 2270.31 C 4.48E-02
AJ223207 D1Mit216 1 80.32 T 3.83E-06
X78885 D1Mit216 1 80.32 T 4.06E-05
X78885 D1Mit134 1 80.78 T 1.36E-05
AF041887 D1Mit128 1 72.96 T 5.05E-05
AF041887 01.070.445 1 73.31 T 2.34E-05
AF041893 D19Mit61 19 2380.97 T 1.77E-05
AF041893 D19Mit28 19 2381.35 T 4.36E-06
AK006746 D3Mit19 3 535.89 C 1.32E-02
AK016547 D19Mit68 19 2369.61 T 2.20E-05
AK007149 D17Mit221 17 2270.31 C 3.86E-02
D30730 D1Mit216 1 80.32 T 2.03E-05
D30730 D1Mit134 1 80.78 T 5.89E-06
AK018342 D1Mit216 1 80.32 T 3.73E-05
AK020709 D1Mit216 1 80.32 T 6.39E-05
AK020709 D1Mit134 1 80.78 T 3.02E-05
AK019273 D6Mit14 6 986.00 C 1.16E-02

Transcript Locus Chr Location (Mb) Cis/Trans P
NM_024201 D6Mit14 6 986.00 C 1.38E-02
NM_025404 D4Mit344 4 689.95 C 4.56E-02
AK003844 D14Mit131 14 1975.00 C 4.62E-02
AK009349 D6Mit14 6 986.00 C 2.77E-02
NM_007498 D7Mit62 7 1062.34 T 5.83E-06
NM_008781 D1Mit134 1 80.78 T 3.43E-05
AB017104 D14Mit131 14 1975.00 C 4.34E-02
NM_009466 D1Mit216 1 80.32 T 1.51E-05
NM_009466 D1Mit134 1 80.78 T 1.97E-05
NM_011710 D19Mit61 19 2380.97 T 4.16E-05
NM_011710 D19Mit28 19 2381.35 T 2.77E-05
NM_011487 D15Mit35 15 2082.50 C 4.34E-02
AF272844 D1Mit216 1 80.32 T 3.55E-05
AF272844 D1Mit134 1 80.78 T 6.91E-05
NM_007884 D2Mit457 2 376.81 C 2.63E-02
AF012147 D1Mit216 1 80.32 T 1.07E-05
AF012147 D1Mit134 1 80.78 T 4.90E-05
AF041969 D19Mit61 19 2380.97 T 1.84E-05
AF041969 D19Mit28 19 2381.35 T 9.90E-06
AK018120 D16Mit151 16 2155.89 T 5.17E-05
AK018120 16.084.925 16 2170.77 T 8.58E-06
AK020397 D12Mit242 12 1655.75 T 5.39E-05
AK020397 D12Mit136 12 1657.11 T 7.33E-05
AK013759 D1Mit216 1 80.32 T 6.24E-05
BC006073 D1Mit216 1 80.32 T 8.23E-06
BC006073 D1Mit134 1 80.78 T 8.28E-06
AK006081 D1Mit178 1 69.13 T 3.92E-05
AK006081 D1Mit216 1 80.32 T 4.97E-05
AK006081 D1Mit134 1 80.78 T 2.55E-06

Transcript Locus Chr Location (Mb) Cis/Trans P
AK008856 D2Mit304 2 323.95 T 7.50E-05
AK010668 D2Mit286 2 350.45 T 1.19E-05
AK016003 D1Mit216 1 80.32 T 2.55E-05
AK016003 D1Mit134 1 80.78 T 2.91E-05
NM_021536 D1Mit128 1 72.96 T 6.35E-05
NM_021536 D1Mit19 1 74.13 T 4.00E-05
NM_021536 D1Mit216 1 80.32 T 3.91E-05
NM_011474 D1Mit216 1 80.32 T 3.19E-06
NM_011474 D1Mit134 1 80.78 T 3.69E-05
AF159256 D1Mit216 1 80.32 T 9.06E-07
AF159256 D1Mit134 1 80.78 T 1.35E-06
AB045322 D1Mit216 1 80.32 T 3.92E-05
NM_011028 D8Mit124 8 1139.72 T 6.72E-05
NM_008094 D1Mit216 1 80.32 T 7.66E-06
NM_008094 D1Mit134 1 80.78 T 5.70E-05
NM_016812 D1Mit134 1 80.78 T 3.04E-05
NM_016812 D1Mit83 1 86.03 T 2.18E-05
AK012351 D7Mit259 7 1123.02 C 4.13E-02
AK009110 D11Mit337 11 1627.76 C 2.28E-02
AK013511 D1Mit216 1 80.32 T 9.31E-07
AK013511 D1Mit134 1 80.78 T 3.38E-05
L17335 D1Mit216 1 80.32 T 3.56E-05
AK020954 D12Mit150 12 1742.65 C 4.32E-03
BC004705 D1Mit134 1 80.78 T 3.73E-05
AK018200 D6Mit14 6 986.00 C 2.66E-02
NM_026336 D14Mit131 14 1975.00 C 2.65E-02
AK015510 D7Mit259 7 1123.02 C 4.92E-02
NM_025934 D1Mit216 1 80.32 T 7.51E-07
NM_025934 D1Mit134 1 80.78 T 2.44E-06

Transcript Locus Chr Location (Mb) Cis/Trans P
AK006138 D1Mit216 1 80.32 T 3.37E-05
AK006138 D1Mit134 1 80.78 T 6.17E-05
NM_009013 D1Mit216 1 80.32 T 5.98E-05
NM_019835 D19Mit127 19 2377.75 T 5.41E-05
NM_009720 D1Mit216 1 80.32 T 1.46E-05
NM_009720 D1Mit134 1 80.78 T 2.09E-05
NM_009720 D8Mit124 8 1139.72 T 7.13E-05
AB048947 D3Mit19 3 535.89 C 3.58E-02
X03766 D1Mit216 1 80.32 T 3.00E-05
X03766 D1Mit134 1 80.78 T 5.49E-06
NM_021439 D1Mit155 1 194.32 C 1.09E-02
X70398 11.089.780 11 1591.88 T 8.74E-07
X70398 D11Mit41 11 1597.84 T 1.58E-07
X70398 D11Mit179 11 1598.60 T 1.04E-06
AK014508 D19Mit28 19 2381.35 T 4.26E-05
X77486 D19Mit28 19 2381.35 T 6.98E-05
AK005733 D11Mit208 11 1567.30 T 7.40E-05
AK017731 D1Mit216 1 80.32 T 5.77E-05
AF179403 D1Mit145 1 167.32 T 5.88E-05
AK014046 D19Mit127 19 2377.75 T 6.32E-05
AK020556 D1Mit216 1 80.32 T 2.51E-05
AK020556 D1Mit134 1 80.78 T 2.50E-05
AK021118 D1Mit216 1 80.32 T 3.01E-05
AF268912 D1Mit134 1 80.78 T 8.88E-06
AK014853 D1Mit216 1 80.32 T 5.41E-05
AK004098 D17Mit221 17 2270.31 C 1.85E-02
AK020339 D1Mit134 1 80.78 T 6.99E-05
NM_010469 D12Mit150 12 1742.65 C 2.87E-02
X57277 D1Mit216 1 80.32 T 3.89E-05

Transcript Locus Chr Location (Mb) Cis/Trans P
X57277 D1Mit134 1 80.78 T 1.75E-05
X57277 D8Mit124 8 1139.72 T 6.67E-05
NM_007712 D7Mit259 7 1123.02 C 4.24E-02
NM_020574 D1Mit216 1 80.32 T 6.48E-05
NM_020574 D1Mit134 1 80.78 T 8.55E-06
AJ130977 D1Mit328 1 68.25 T 3.50E-05
AJ130977 D1Mit178 1 69.13 T 3.50E-05
AJ130977 D1Mit19 1 74.13 T 7.33E-05
AJ130977 D19Mit103 19 2419.31 T 2.26E-05
NM_009048 D8Mit124 8 1139.72 T 7.59E-05
X80435 D14Mit99 14 1876.58 T 6.61E-05
NM_017390 D8Mit124 8 1139.72 T 3.99E-05
AK005581 D11Mit34 11 1587.97 T 6.97E-05
AF041918 D8Mit124 8 1139.72 T 5.58E-05
X75096 D2Mit6 2 216.82 T 2.42E-05
X03219 D19Mit103 19 2419.31 T 6.62E-05
AK021084 D1Mit216 1 80.32 T 2.43E-05
AK012296 D1Mit83 1 86.03 T 5.15E-06
AK014899 D15Mit35 15 2082.50 C 4.03E-02
AK016475 D1Mit216 1 80.32 T 1.05E-05
AK016475 D1Mit134 1 80.78 T 2.19E-06
AK014093 D1Mit134 1 80.78 T 3.60E-05
AK013418 D1Mit216 1 80.32 T 5.84E-05
AK010984 D1Mit216 1 80.32 T 1.18E-05
BC005637 D8Mit128 8 1185.17 T 5.43E-05
AK007690 D1Mit216 1 80.32 T 4.24E-05
AK007690 D1Mit134 1 80.78 T 4.73E-05
AK018641 D4Mit344 4 689.95 C 2.02E-02
AK006934 D19Mit28 19 2381.35 T 7.48E-05

Transcript Locus Chr Location (Mb) Cis/Trans P
AF278712 D11Mit318 11 1570.91 T 5.53E-05
NM_010938 D1Mit216 1 80.32 T 1.78E-05
NM_010938 D1Mit134 1 80.78 T 1.09E-05
NM_008701 DXMit186 X 2582.74 C 2.17E-02
NM_018779 D1Mit216 1 80.32 T 4.72E-05
NM_021792 D1Mit216 1 80.32 T 7.44E-05
AF301619 05.142.105 5 835.33 C 2.47E-02
NM_010171 D1Mit216 1 80.32 T 9.33E-06
NM_010171 D1Mit134 1 80.78 T 9.68E-06
NM_010935 D11Mit34 11 1587.97 T 5.63E-05
NM_008448 D6Mit14 6 986.00 C 2.24E-02
AF029748 D8Mit156 8 1253.28 C 2.28E-02
AK014035 D12Mit150 12 1742.65 C 4.29E-02
AK015335 D3Mit19 3 535.89 C 1.48E-02
AF357240 D1Mit216 1 80.32 T 1.80E-05
AF357240 D1Mit134 1 80.78 T 5.59E-06
AK013442 D19Mit53 19 2410.66 T 6.37E-05
AK017316 D7Nds3 7 1052.85 T 4.03E-05
AK003388 D1Mit216 1 80.32 T 1.53E-05
AK003388 D1Mit134 1 80.78 T 1.87E-05
AK019918 D1Mit134 1 80.78 T 2.04E-06
AK019918 D1Mit83 1 86.03 T 4.93E-05
AK015028 D1Mit216 1 80.32 T 5.14E-05
AK015028 D1Mit134 1 80.78 T 5.00E-05
AK019742 DXMit186 X 2582.74 C 2.06E-02
NM_026515 D1Mit19 1 74.13 T 7.21E-05
AK005397 D1Mit216 1 80.32 T 2.32E-05
AK005397 D1Mit134 1 80.78 T 1.61E-05
AK007157 D1Mit155 1 194.32 C 3.75E-02

Transcript Locus Chr Location (Mb) Cis/Trans P
AK005857 D1Mit216 1 80.32 T 6.46E-05
AK005857 D1Mit134 1 80.78 T 5.14E-05
AK017599 D15Mit35 15 2082.50 C 4.43E-02
AK018267 D14Mit99 14 1876.58 T 7.17E-05
AK012404 D1Mit216 1 80.32 T 4.63E-05
AF315590 D1Mit216 1 80.32 T 4.10E-06
AF315590 D1Mit134 1 80.78 T 3.51E-06
X03040 D14Mit99 14 1876.58 T 4.48E-05
NM_013568 05.142.105 5 835.33 C 4.58E-02
U06665 D1Mit134 1 80.78 T 1.74E-05
NM_013867 D4Mit344 4 689.95 C 4.13E-02
NM_009750 D1Mit216 1 80.32 T 3.23E-05
NM_020036 D9Mit151 9 1375.37 C 1.19E-02
BC010799 D1Mit216 1 80.32 T 6.79E-05
BC010799 D1Mit134 1 80.78 T 6.95E-05
M26423 D1Mit134 1 80.78 T 2.61E-05
AF303827 D1Mit216 1 80.32 T 6.55E-05
NM_008142 D13Mit35 13 1862.73 C 4.24E-02
AK020264 D19Mit61 19 2380.97 T 2.39E-05
AK020264 D19Mit28 19 2381.35 T 4.75E-05
NM_025623 D19Mit127 19 2377.75 T 6.86E-05
NM_025623 D19Mit28 19 2381.35 T 3.19E-05
AK020125 D1Mit216 1 80.32 T 5.57E-05
AK020125 D1Mit134 1 80.78 T 2.63E-05
AK016760 D15Mit35 15 2082.50 C 2.14E-02
AJ409496 D1Mit19 1 74.13 T 3.00E-05
AJ409496 D1Mit134 1 80.78 T 4.89E-06
AK021364 D18Mit14 18 2314.24 T 2.67E-07
AK021364 D18Mit17 18 2315.06 T 3.17E-07

Transcript Locus Chr Location (Mb) Cis/Trans P
AK021364 D18Mit149 18 2320.57 T 2.27E-08
AK016455 D19Mit61 19 2380.97 T 3.69E-05
AK016455 D19Mit28 19 2381.35 T 6.11E-05
AK017799 D1Mit19 1 74.13 T 5.90E-05
AK017799 D1Mit216 1 80.32 T 1.44E-05
AK017799 D1Mit134 1 80.78 T 3.55E-05
AK014557 D6Mit14 6 986.00 C 3.20E-02
NM_008872 D1Mit128 1 72.96 T 1.33E-05
NM_008872 D1Mit19 1 74.13 T 3.09E-05
NM_008872 D1Mit216 1 80.32 T 9.09E-06
NM_008872 D1Mit134 1 80.78 T 9.74E-06
NM_009275 D1Mit216 1 80.32 T 6.98E-05
NM_009275 D1Mit134 1 80.78 T 1.53E-05
NM_013842 X.023.015 X 2457.98 T 9.13E-06
NM_007377 D4Mit344 4 689.95 C 2.87E-02
NM_013781 D1Mit216 1 80.32 T 2.34E-05
NM_013781 D1Mit134 1 80.78 T 4.08E-05
AK006283 D6Mit14 6 986.00 C 2.04E-02
AK006283 D1Mit216 1 80.32 T 2.96E-05
AK006283 D1Mit134 1 80.78 T 3.05E-05
NM_007815 D1Mit128 1 72.96 T 2.08E-05
AK008511 D12Mit150 12 1742.65 C 1.71E-02
AK013837 D1Mit134 1 80.78 T 3.47E-05
AK005804 D1Mit134 1 80.78 T 1.37E-05
NM_025597 D1Mit19 1 74.13 T 4.72E-06
NM_025597 D1Mit216 1 80.32 T 4.64E-06
NM_025597 D1Mit134 1 80.78 T 6.79E-06
AK011671 D1Mit216 1 80.32 T 1.06E-05
AK011671 D1Mit134 1 80.78 T 9.13E-06

Transcript Locus Chr Location (Mb) Cis/Trans P
AK020996 D10Mit179 10 1495.44 C 2.72E-02
AK014697 D1Mit155 1 194.32 C 1.68E-02
BC007167 D1Mit216 1 80.32 T 2.09E-05
BC007167 D1Mit134 1 80.78 T 2.41E-05
NM_008083 D2Mit286 2 350.45 T 4.03E-05
U52197 05.142.105 5 835.33 C 3.12E-02
AF074266 D1Mit216 1 80.32 T 1.00E-05
AF074266 D1Mit134 1 80.78 T 1.38E-06
AF074266 D1Mit83 1 86.03 T 6.27E-05
NM_021483 D2Mit286 2 350.45 T 4.42E-05
NM_009635 D2Mit457 2 376.81 C 2.01E-02
NM_009747 D1Mit216 1 80.32 T 8.06E-06
NM_009747 D1Mit134 1 80.78 T 2.66E-06
AB041555 DXMit186 X 2582.74 C 2.85E-02
U44941 D1Mit128 1 72.96 T 1.01E-05
U44941 01.070.445 1 73.31 T 3.30E-05
U44941 D1Mit19 1 74.13 T 2.67E-06
U44941 D1Mit181 1 74.94 T 8.99E-06
BC004613 D6Mit14 6 986.00 C 4.73E-02
M34893 D19Mit28 19 2381.35 T 1.44E-05
AK012559 D5Mit309 5 769.16 T 1.59E-05
AF285574 D1Mit134 1 80.78 T 2.33E-05
BC005487 D1Mit216 1 80.32 T 7.52E-05
BC005487 D1Mit134 1 80.78 T 4.06E-05
AK006713 D8Mit128 8 1185.17 T 6.79E-05
D29939 D11Mit34 11 1587.97 T 6.43E-05
AK006294 D19Mit61 19 2380.97 T 3.18E-05
AK006294 D19Mit28 19 2381.35 T 3.44E-05
BC011420 D1Mit216 1 80.32 T 1.87E-05

Transcript Locus Chr Location (Mb) Cis/Trans P
BC011420 D1Mit134 1 80.78 T 4.56E-06
AK007311 D19Mit28 19 2381.35 T 7.44E-05
BC013520 D8Mit124 8 1139.72 T 5.06E-05
AK015180 D8Mit124 8 1139.72 T 4.12E-05
NM_010030 D1Mit134 1 80.78 T 1.02E-05
NM_007433 D6Mit14 6 986.00 C 3.70E-02
NM_016739 D11Mit337 11 1627.76 C 4.78E-02
NM_019513 D1Mit216 1 80.32 T 6.83E-05
NM_019513 D1Mit134 1 80.78 T 6.65E-05
AF133913 D11Mit337 11 1627.76 C 3.67E-02
NM_010676 D19Mit28 19 2381.35 T 5.11E-05
X58586 D1Mit216 1 80.32 T 4.12E-05
NM_027288 D1Mit134 1 80.78 T 6.39E-05
AK015669 D1Mit216 1 80.32 T 1.35E-06
AK015669 D1Mit134 1 80.78 T 3.80E-06
AF363030 D1Mit216 1 80.32 T 6.48E-05
AF363030 D1Mit134 1 80.78 T 6.75E-05
AK014848 D1Mit216 1 80.32 T 5.93E-05
AK014848 D1Mit134 1 80.78 T 2.36E-05
AK012713 D1Mit216 1 80.32 T 2.45E-06
AK012713 D1Mit134 1 80.78 T 2.78E-07
AK012713 D1Mit83 1 86.03 T 3.16E-05
AK002462 D1Mit216 1 80.32 T 3.17E-06
AK002462 D1Mit134 1 80.78 T 5.66E-05
AK017509 02.125.650 2 323.03 T 3.45E-05
BC008273 D8Mit156 8 1253.28 C 2.58E-02
AK007073 D8Mit156 8 1253.28 C 2.77E-02
NM_026410 D1Mit178 1 69.13 T 6.26E-05
NM_026410 D1Mit216 1 80.32 T 3.40E-06

Transcript Locus Chr Location (Mb) Cis/Trans P
NM_026410 D1Mit134 1 80.78 T 1.12E-05
AK005591 D2Mit436 2 275.87 T 4.39E-05
AK015698 D1Mit216 1 80.32 T 2.14E-05
AK015698 D1Mit134 1 80.78 T 1.21E-05
AK007964 D19Mit6 19 2426.62 C 2.11E-03
Z78161 D1Mit216 1 80.32 T 5.37E-05
Z78161 D1Mit134 1 80.78 T 1.90E-05
M31775 D14Mit99 14 1876.58 T 4.74E-05
AB039919 D1Mit216 1 80.32 T 6.68E-05
NM_007424 D1Mit216 1 80.32 T 2.94E-06
NM_007424 D1Mit134 1 80.78 T 1.37E-06
NM_019512 D1Mit134 1 80.78 T 6.22E-05
NM_009269 D1Mit216 1 80.32 T 6.57E-05
NM_010091 D12Mit150 12 1742.65 C 8.86E-03
AB020542 D2Mit304 2 323.95 T 7.22E-05
NM_011839 D6Mit14 6 986.00 C 3.91E-02
NM_013692 D1Mit216 1 80.32 T 4.25E-05
NM_011066 D14Mit131 14 1975.00 C 2.65E-02
AF041908 D12Mit150 12 1742.65 C 4.45E-02
AK018128 D1Mit178 1 69.13 T 2.90E-05
AK018128 D1Mit19 1 74.13 T 1.01E-05
AK018128 D1Mit181 1 74.94 T 4.93E-05
AK018128 D1Mit216 1 80.32 T 6.72E-06
AK018128 D1Mit134 1 80.78 T 1.86E-05
AK018128 D1Mit83 1 86.03 T 1.86E-05
NM_030251 D1Mit216 1 80.32 T 1.73E-05
NM_030251 D1Mit134 1 80.78 T 1.08E-05
NM_030251 D1Mit83 1 86.03 T 1.05E-05
AK005808 DXMit1 X 2491.85 T 6.73E-05

Transcript Locus Chr Location (Mb) Cis/Trans P
AK014244 D16Mit86 16 2176.22 C 4.53E-02
AK014244 DXMit1 X 2491.85 T 1.10E-05
NM_033145 DXMit1 X 2491.85 T 2.57E-05
AK009638 D1Mit216 1 80.32 T 9.50E-06
AK009638 D1Mit134 1 80.78 T 5.22E-06
BC003334 D1Mit216 1 80.32 T 5.85E-05
BC002199 D1Mit134 1 80.78 T 5.81E-05
L17336 D1Mit216 1 80.32 T 2.45E-05
L17336 D1Mit134 1 80.78 T 3.85E-05
L17336 D8Mit124 8 1139.72 T 4.35E-05
AK013552 D1Mit216 1 80.32 T 1.03E-05
AK013552 D1Mit134 1 80.78 T 4.17E-06
AK009235 D1Mit178 1 69.13 T 7.15E-05
AK009235 D1Mit19 1 74.13 T 8.40E-06
AK009235 D1Mit181 1 74.94 T 1.47E-05
AK009235 D1Mit216 1 80.32 T 3.41E-06
AK009235 D1Mit134 1 80.78 T 7.13E-06
AK015465 02.125.650 2 323.03 T 3.06E-06
AK015465 D2Mit304 2 323.95 T 6.37E-06
AF154571 D1Mit216 1 80.32 T 7.01E-06
AF154571 D1Mit134 1 80.78 T 4.08E-06
AB017136 D1Mit216 1 80.32 T 3.11E-06
AB017136 D1Mit134 1 80.78 T 3.27E-06
AK006648 D3Mit19 3 535.89 C 9.17E-03
U75374 D1Mit216 1 80.32 T 1.73E-05
NM_025946 05.142.105 5 835.33 C 1.45E-02
AK020003 11.059.515 11 1566.46 T 6.00E-05
AK020003 D11Mit208 11 1567.30 T 2.61E-05
AK020003 D11Mit318 11 1570.91 T 6.25E-05

Transcript Locus Chr Location (Mb) Cis/Trans P
BC004701 D1Mit216 1 80.32 T 1.44E-05
BC004701 D1Mit134 1 80.78 T 5.91E-06
Z12259 DXMit186 X 2582.74 C 3.15E-02
BC011329 D1Mit134 1 80.78 T 7.02E-06
BC011329 D1Mit83 1 86.03 T 3.63E-05
AK010385 D1Mit216 1 80.32 T 1.39E-05
BC014835 D1Mit216 1 80.32 T 1.16E-05
BC014835 D1Mit134 1 80.78 T 1.41E-05
AK004344 D12Mit150 12 1742.65 C 3.60E-02
NM_010371 D1Mit216 1 80.32 T 2.03E-06
NM_010371 D1Mit134 1 80.78 T 6.48E-06
AK021075 D4Mit12 4 660.71 T 1.08E-05
AK021075 D4Mit147 4 661.24 T 4.99E-05
AK014864 D6Mit14 6 986.00 C 4.90E-02
D50393 D1Mit216 1 80.32 T 5.69E-05
D50393 D1Mit134 1 80.78 T 5.49E-05
AK005218 D6Mit14 6 986.00 C 1.15E-02
AK017517 01.059.350 1 62.23 T 4.02E-05
AK017517 Mtap2 1 66.72 T 6.64E-06
AK017517 D1Mit328 1 68.25 T 4.03E-06
AK017517 D1Mit178 1 69.13 T 1.78E-05
AK017517 D1Mit19 1 74.13 T 4.92E-05
AK017517 D1Mit216 1 80.32 T 7.42E-05
AK017517 D1Mit134 1 80.78 T 3.21E-05
AK012889 D1Mit216 1 80.32 T 2.81E-05
AK012889 D1Mit134 1 80.78 T 1.30E-05
AK012889 D1Mit83 1 86.03 T 7.05E-05
AK007024 D8Mit124 8 1139.72 T 6.79E-05
AK014468 D1Mit134 1 80.78 T 4.16E-05

Transcript Locus Chr Location (Mb) Cis/Trans P
NM_023231 D6Mit14 6 986.00 C 3.59E-02
NM_025569 D7Mit334 7 1116.49 T 5.04E-05
AF156890 D1Mit128 1 72.96 T 6.87E-05
NM_008635 D1Mit216 1 80.32 T 1.33E-05
NM_019765 D1Mit216 1 80.32 T 4.23E-05
NM_019765 D1Mit134 1 80.78 T 5.80E-06
NM_016853 D9Mit227 9 1295.44 T 5.49E-05
NM_008770 D8Mit156 8 1253.28 C 2.27E-02
NM_009649 D3Mit19 3 535.89 C 2.82E-02
NM_008333 D1Mit216 1 80.32 T 4.73E-05
NM_008333 D1Mit134 1 80.78 T 5.82E-05
X56716 D6Mit14 6 986.00 C 1.93E-02
AK018920 D1Mit216 1 80.32 T 9.89E-07
AK018920 D1Mit134 1 80.78 T 1.96E-07
AK018920 D1Mit83 1 86.03 T 1.72E-05
AK017885 D1Mit216 1 80.32 T 1.52E-05
AK017885 D1Mit134 1 80.78 T 2.17E-06
AK017885 D1Mit83 1 86.03 T 7.48E-05
Z22076 D4Mit344 4 689.95 C 1.66E-03
AK007715 D1Mit155 1 194.32 C 2.13E-02
AK015100 D14Mit99 14 1876.58 T 5.90E-05
NM_033041 D8Mit128 8 1185.17 T 7.64E-05
NM_033041 08.062.280 8 1186.59 T 2.70E-05
AK007262 D8Mit128 8 1185.17 T 2.51E-05
AK007262 08.062.280 8 1186.59 T 7.00E-05
NM_026156 D1Mit216 1 80.32 T 3.85E-05
AK002857 D1Mit216 1 80.32 T 7.08E-06
AK002857 D1Mit134 1 80.78 T 1.63E-06
AK002857 D8Mit124 8 1139.72 T 3.21E-05

Transcript Locus Chr Location (Mb) Cis/Trans P
BC002274 D1Mit216 1 80.32 T 1.81E-05
BC002274 D1Mit134 1 80.78 T 3.45E-05
AK012619 D1Mit216 1 80.32 T 3.95E-06
AK009355 D19Mit103 19 2419.31 T 7.08E-05
BC004722 D19Mit68 19 2369.61 T 4.91E-05
AK013135 D1Mit216 1 80.32 T 4.42E-05
AK013135 D1Mit83 1 86.03 T 6.84E-05
AK018473 D1Mit83 1 86.03 T 7.42E-05
NM_008092 DXMit186 X 2582.74 C 7.60E-03
AF229635 D1Mit134 1 80.78 T 4.32E-05
U66620 D12Mit101 12 1729.13 T 3.00E-05
NM_019959 D1Mit216 1 80.32 T 5.09E-06
NM_019959 D1Mit134 1 80.78 T 6.12E-06
U29500 D8Mit128 8 1185.17 T 5.55E-05
NM_021519 D1Mit155 1 194.32 C 5.97E-03
NM_009593 D1Mit216 1 80.32 T 2.92E-05
NM_009593 D1Mit134 1 80.78 T 4.44E-05
NM_007677 D7Mit350 7 1061.29 T 7.64E-05
BC003331 D1Mit216 1 80.32 T 4.15E-05
X96700 D1Mit181 1 74.94 T 7.49E-05
X96700 D1Mit134 1 80.78 T 5.03E-05
BC003881 D1Mit216 1 80.32 T 3.44E-05
BC004773 D1Mit216 1 80.32 T 3.87E-05
BC004773 D1Mit134 1 80.78 T 3.56E-05
AK016837 D12Mit3 12 1703.22 T 1.64E-05
AK021050 D1Mit216 1 80.32 T 1.74E-06
AK021050 D1Mit134 1 80.78 T 3.35E-05
AK020414 D10Mit179 10 1495.44 C 1.95E-02
BC007186 D8Mit156 8 1253.28 C 4.75E-02

Transcript Locus Chr Location (Mb) Cis/Trans P
AK020526 D11Mit337 11 1627.76 C 4.52E-02
AK019418 05.142.105 5 835.33 C 4.92E-02
AK005865 D1Mit216 1 80.32 T 4.40E-05
BC005485 D1Mit216 1 80.32 T 1.25E-05
BC005485 D1Mit134 1 80.78 T 7.22E-05
NM_025911 D1Mit216 1 80.32 T 3.78E-05
AK015660 D1Mit134 1 80.78 T 2.08E-05
NM_028562 D1Mit216 1 80.32 T 4.34E-05
NM_028562 D1Mit134 1 80.78 T 4.34E-05
AK020180 D1Mit216 1 80.32 T 5.75E-07
AK020180 D1Mit134 1 80.78 T 2.21E-05
AK018993 D13Mit35 13 1862.73 C 2.92E-02
AK008501 D2Mit102 2 309.94 T 7.49E-05
AK008501 D18Mit14 18 2314.24 T 1.57E-05
AK008501 D18Mit17 18 2315.06 T 8.81E-06
AK008501 D18Mit149 18 2320.57 T 1.62E-06
AK020395 D11Mit34 11 1587.97 T 2.24E-05
AK011377 D14Mit131 14 1975.00 C 2.85E-02
NM_019566 D6Mit14 6 986.00 C 3.12E-02
NM_011477 D1Mit216 1 80.32 T 5.95E-06
NM_011477 D1Mit134 1 80.78 T 3.99E-06
Z71173 D14Mit99 14 1876.58 T 2.74E-05
Z71173 D14Mit50 14 1879.96 T 7.46E-05
NM_008098 D8Mit124 8 1139.72 T 7.43E-05
NM_021310 D2Mit372 2 231.93 T 6.61E-05
M91458 D11Mit337 11 1627.76 C 4.54E-03
M31585 D14Mit99 14 1876.58 T 5.05E-05
NM_009675 D1Mit128 1 72.96 T 6.84E-05
NM_009675 D1Mit19 1 74.13 T 2.16E-05

Transcript Locus Chr Location (Mb) Cis/Trans P
NM_009675 D1Mit216 1 80.32 T 1.29E-06
NM_009675 D1Mit134 1 80.78 T 4.57E-07
NM_009487 D10Mit179 10 1495.44 C 4.27E-02
AF237770 D3Mit19 3 535.89 C 8.80E-03
NM_010800 D9Mit151 9 1375.37 C 4.31E-02
AF039601 D1Mit216 1 80.32 T 1.72E-05
AJ279846 D4Mit344 4 689.95 C 1.39E-02
M16357 D10Mit179 10 1495.44 C 3.41E-02
AK020462 D1Mit216 1 80.32 T 3.30E-05
AK020462 D1Mit134 1 80.78 T 2.38E-05
NM_025919 D3Mit19 3 535.89 C 2.51E-02
AK018070 D1Mit134 1 80.78 T 4.84E-05
AK003083 D1Mit216 1 80.32 T 3.61E-05
AK003083 D1Mit134 1 80.78 T 2.77E-05
AK020737 D1Mit216 1 80.32 T 4.45E-07
AK020737 D1Mit134 1 80.78 T 6.12E-07
AK020737 D1Mit83 1 86.03 T 7.46E-05
AK017325 D12Mit150 12 1742.65 C 2.07E-02
AK019737 D1Mit216 1 80.32 T 3.30E-05
AK019737 D1Mit134 1 80.78 T 4.92E-05
AK014695 D4Mit344 4 689.95 C 1.08E-02
AK019076 D7Mit259 7 1123.02 C 1.26E-02
NM_019703 D1Mit134 1 80.78 T 3.95E-05
NM_009187 D1Mit216 1 80.32 T 2.30E-05
NM_009187 D1Mit134 1 80.78 T 2.31E-05
AF092507 D1Mit134 1 80.78 T 6.50E-05
NM_007621 D1Mit216 1 80.32 T 6.83E-05
NM_007621 D1Mit134 1 80.78 T 6.64E-05
NM_007621 D19Mit68 19 2369.61 T 3.41E-06

Transcript Locus Chr Location (Mb) Cis/Trans P
NM_007621 D19Mit109 19 2372.17 T 4.05E-05
M35732 D1Mit216 1 80.32 T 4.42E-05
M35732 D1Mit134 1 80.78 T 1.03E-05
NM_007544 D15Mit13 15 1981.79 T 6.94E-08
NM_007544 15.039.230 15 2020.21 T 8.87E-06
NM_009412 D7Mit259 7 1123.02 C 3.86E-02
D82866 05.142.105 5 835.33 C 1.98E-02
NM_020276 D19Mit6 19 2426.62 C 5.19E-03
NM_020276 D19Mit28 19 2381.35 T 2.93E-05
U06124 D9Mit151 9 1375.37 C 6.70E-03
AK014977 D1Mit134 1 80.78 T 5.62E-05
AK020780 D1Mit134 1 80.78 T 2.15E-05
AK011939 D1Mit216 1 80.32 T 4.47E-05
AK011939 D1Mit134 1 80.78 T 1.56E-05
AK019785 D6Mit254 6 966.58 T 6.73E-05
AY042191 D1Mit216 1 80.32 T 1.51E-05
AY042191 D1Mit134 1 80.78 T 8.80E-06
AK003596 D11Mit337 11 1627.76 C 3.41E-02
AK004153 D1Mit216 1 80.32 T 1.73E-06
AK004153 D1Mit134 1 80.78 T 2.96E-05
AK006811 D10Mit179 10 1495.44 C 8.77E-04
AK016150 D1Mit216 1 80.32 T 3.25E-05
AK016150 D1Mit134 1 80.78 T 1.07E-05
AF370121 D2Mit304 2 323.95 T 7.65E-05
M37596 D19Mit6 19 2426.62 C 4.38E-02
AK021166 D1Mit216 1 80.32 T 5.11E-05
AK021166 D1Mit134 1 80.78 T 5.01E-06
BC004803 D1Mit216 1 80.32 T 6.52E-05
NM_026634 D1Mit216 1 80.32 T 8.50E-06

Transcript Locus Chr Location (Mb) Cis/Trans P
NM_026634 D1Mit134 1 80.78 T 1.96E-05
AK016513 D8Mit128 8 1185.17 T 6.87E-05
NM_008700 08.062.280 8 1186.59 T 4.08E-05
NM_021383 D6Mit14 6 986.00 C 1.70E-02
AF041862 D2Mit457 2 376.81 C 4.11E-03
NM_007556 D1Mit216 1 80.32 T 6.66E-05
NM_007556 D1Mit134 1 80.78 T 2.10E-05
NM_007556 D8Mit124 8 1139.72 T 3.34E-05
NM_018887 D1Mit216 1 80.32 T 1.88E-05
NM_008740 D14Mit131 14 1975.00 C 3.15E-02
NM_009528 D1Mit134 1 80.78 T 1.43E-05
NM_009528 D1Mit83 1 86.03 T 4.91E-05
NM_019392 D10Mit179 10 1495.44 C 9.96E-03
AF151110 D1Mit216 1 80.32 T 4.71E-06
AF151110 D1Mit134 1 80.78 T 5.93E-05
AF151110 D11Mit208 11 1567.30 T 5.20E-05
M73551 D3Mit46 3 407.06 T 7.61E-05
AK011047 D8Mit124 8 1139.72 T 1.57E-05
AB049453 D1Mit155 1 194.32 C 4.84E-02
AK010610 D8Mit156 8 1253.28 C 1.35E-02
AK010610 D7Mit301 7 1069.38 T 4.92E-05
AK004244 D1Mit134 1 80.78 T 4.52E-05
NM_030564 D1Mit83 1 86.03 T 5.02E-05
AK007786 D10Mit179 10 1495.44 C 1.52E-02
BC006601 D1Mit128 1 72.96 T 3.10E-05
BC006601 D1Mit19 1 74.13 T 1.11E-05
BC006601 D1Mit216 1 80.32 T 4.57E-06
BC006601 D1Mit134 1 80.78 T 2.23E-06
AK011828 D1Mit216 1 80.32 T 1.33E-05

Transcript Locus Chr Location (Mb) Cis/Trans P
AK011828 D1Mit134 1 80.78 T 5.62E-06
AK011828 D8Mit124 8 1139.72 T 4.19E-05
AK015366 D1Mit134 1 80.78 T 7.15E-05
AK012078 D1Mit134 1 80.78 T 4.14E-05
AK006079 D1Mit216 1 80.32 T 1.19E-05
AK006079 D1Mit134 1 80.78 T 1.38E-06
AK006079 D1Mit83 1 86.03 T 6.60E-05
BC005630 D18Mit144 18 2361.14 T 7.63E-05
AK010018 D14Mit99 14 1876.58 T 2.78E-05
NM_029813 D19Mit6 19 2426.62 C 4.92E-02
NM_019689 D07Msw060 7 1048.12 T 2.41E-05
NM_019689 D7Nds3 7 1052.85 T 3.84E-05
NM_019689 07.073.835 7 1057.91 T 9.34E-06
NM_019689 D7Mit62 7 1062.34 T 3.36E-05
NM_008080 D1Mit216 1 80.32 T 2.87E-05
NM_008080 D1Mit134 1 80.78 T 6.59E-05
NM_020561 D1Mit216 1 80.32 T 1.63E-05
NM_020561 D1Mit134 1 80.78 T 5.17E-06
NM_021877 D4Mit344 4 689.95 C 7.66E-03
AB024499 D1Mit216 1 80.32 T 4.77E-05
AB024499 D1Mit134 1 80.78 T 6.03E-05
NM_007460 D11Mit337 11 1627.76 C 1.70E-03
NM_011930 D6Mit14 6 986.00 C 9.48E-03
NM_008067 D1Mit216 1 80.32 T 7.26E-08
NM_008067 D1Mit134 1 80.78 T 2.45E-05
NM_007389 D11Mit337 11 1627.76 C 3.52E-02
NM_015763 D1Mit216 1 80.32 T 3.28E-07
NM_015763 D1Mit134 1 80.78 T 2.70E-06
NM_015763 D1Mit83 1 86.03 T 1.92E-05

Transcript Locus Chr Location (Mb) Cis/Trans P
NM_021380 D2Mit457 2 376.81 C 1.09E-02
NM_019820 D3Mit19 3 535.89 C 3.81E-02
AF132083 D9Mit151 9 1375.37 C 1.59E-02
NM_008119 D1Mit216 1 80.32 T 8.51E-06
NM_008119 D1Mit134 1 80.78 T 4.65E-06
Y13988 06.100.540 6 942.66 T 5.10E-05
AK005953 D1Mit216 1 80.32 T 1.65E-05
AK005953 D1Mit134 1 80.78 T 2.32E-05
AK012892 D1Mit216 1 80.32 T 1.67E-05
AK011831 D1Mit216 1 80.32 T 1.88E-05
AK011831 D1Mit134 1 80.78 T 7.13E-06
AK007177 D8Mit156 8 1253.28 C 2.03E-02
BC005558 D18Mit144 18 2361.14 C 1.14E-02
AK006239 D11Mit337 11 1627.76 C 3.77E-02
AK008614 D1Mit216 1 80.32 T 3.62E-05
AK008614 D1Mit134 1 80.78 T 1.66E-05
AK019801 D6Mit14 6 986.00 C 6.59E-03
AK017000 02.125.650 2 323.03 T 2.21E-05
AK017000 D2Mit304 2 323.95 T 1.80E-05
AK006638 02.125.650 2 323.03 T 1.04E-05
AK006638 D2Mit304 2 323.95 T 1.28E-05
AK018337 D15Mit35 15 2082.50 C 9.51E-03
AK017073 D1Mit216 1 80.32 T 1.02E-05
AK017073 D1Mit134 1 80.78 T 1.33E-05
NM_007747 D1Mit134 1 80.78 T 1.29E-05
AY007202 D1Mit155 1 194.32 C 1.72E-02
NM_009060 D1Mit216 1 80.32 T 1.05E-05
NM_009060 D1Mit134 1 80.78 T 7.01E-06
AY008297 D15Mit35 15 2082.50 C 3.08E-02

Transcript Locus Chr Location (Mb) Cis/Trans P
AF090374 D19Mit68 19 2369.61 T 6.47E-05
AF090374 D19Mit53 19 2410.66 T 1.55E-05
AF090374 D19Mit10 19 2412.67 T 2.61E-05
AB041997 D8Mit156 8 1253.28 C 3.69E-02
NM_019883 D1Mit216 1 80.32 T 2.06E-05
NM_019883 D1Mit134 1 80.78 T 1.23E-05
AF189817 D15Mit35 15 2082.50 C 1.02E-02
NM_007561 D15Mit35 15 2082.50 C 2.15E-03
NM_007561 D1Mit216 1 80.32 T 3.52E-05
NM_007561 D1Mit134 1 80.78 T 3.18E-05
AK009454 D13Mit35 13 1862.73 C 3.05E-02
AJ002200 D1Mit216 1 80.32 T 6.00E-06
AJ002200 D1Mit134 1 80.78 T 4.73E-06
X06517 D1Mit216 1 80.32 T 1.05E-05
M34608 D1Mit216 1 80.32 T 2.76E-06
M34608 D1Mit134 1 80.78 T 1.12E-06
AK018845 D1Mit134 1 80.78 T 7.38E-05
AK013576 D2Mit436 2 275.87 T 4.22E-05
AK006044 D1Mit134 1 80.78 T 3.71E-05
AK013609 D13Mit35 13 1862.73 C 4.46E-02
AK008774 D1Mit216 1 80.32 T 6.64E-05
AK013406 D1Mit128 1 72.96 T 2.37E-05
AK013406 01.070.445 1 73.31 T 6.24E-05
AK013406 D1Mit19 1 74.13 T 9.19E-06
AK013406 D1Mit181 1 74.94 T 2.38E-05
AK013406 D1Mit216 1 80.32 T 4.16E-07
AK013406 D1Mit134 1 80.78 T 9.66E-07
NM_016661 D1Mit216 1 80.32 T 5.64E-05
NM_015820 D7Mit259 7 1123.02 C 3.98E-04

Transcript Locus Chr Location (Mb) Cis/Trans P
NM_007802 D3Mit155 3 464.91 T 5.13E-05
NM_010121 14.114.540 14 1973.97 T 6.46E-05
U06666 D19Mit28 19 2381.35 T 1.89E-05
NM_011908 D14Mit99 14 1876.58 T 7.19E-06
NM_011504 D1Mit216 1 80.32 T 6.96E-05
NM_013652 D6Mit149 6 947.00 T 3.24E-05
NM_018820 08.062.280 8 1186.59 T 3.00E-05
BC012232 D1Mit216 1 80.32 T 5.36E-05
BC012232 D1Mit134 1 80.78 T 4.62E-05
M97159 D1Mit134 1 80.78 T 3.56E-05
AB041653 D1Mit134 1 80.78 T 7.61E-05
AB041804 08.062.280 8 1186.59 T 5.77E-05
Z12406 08.062.280 8 1186.59 T 4.62E-05
AF041950 D19Mit61 19 2380.97 T 2.25E-05
AF041950 D19Mit28 19 2381.35 T 1.57E-05
AK010005 D1Mit216 1 80.32 T 1.16E-05
AK010005 D1Mit134 1 80.78 T 9.56E-06
AK009616 D10Mit179 10 1495.44 C 4.01E-02
Z12292 D1Mit134 1 80.78 T 2.50E-05
Z12292 D1Mit83 1 86.03 T 5.75E-06
AK016622 D8Mit156 8 1253.28 C 6.13E-03
AK016407 01.059.350 1 62.23 T 4.37E-05
AK016407 Mtap2 1 66.72 T 3.83E-05
AK015631 DXMsw076 X 2512.01 T 3.82E-05
NM_026457 D2Mit457 2 376.81 C 1.45E-02
AK019066 D1Mit155 1 194.32 C 3.28E-02
AK021090 D11Mit41 11 1597.84 T 6.43E-05
NM_009442 D1Mit216 1 80.32 T 1.74E-07
NM_009442 D1Mit134 1 80.78 T 3.22E-06

Transcript Locus Chr Location (Mb) Cis/Trans P
NM_009667 D1Mit216 1 80.32 T 2.71E-05
NM_009667 D1Mit134 1 80.78 T 2.11E-05
NM_018885 D8Mit124 8 1139.72 T 5.62E-05
AL355706 D16Mit86 16 2176.22 C 4.40E-02
NM_008955 D7Mit62 7 1062.34 T 5.17E-05
NM_021387 D1Mit216 1 80.32 T 6.81E-05
NM_021387 D1Mit134 1 80.78 T 1.21E-05
NM_021387 D1Mit83 1 86.03 T 8.55E-06
NM_016705 D7Mit259 7 1123.02 C 2.05E-02
NM_019773 D1Mit216 1 80.32 T 2.98E-06
NM_019773 D1Mit134 1 80.78 T 2.66E-06
NM_008844 D14Mit99 14 1876.58 T 1.45E-05
NM_013863 D6Mit298 6 878.70 T 3.89E-05
NM_013863 D19Msw029 19 2394.69 T 5.66E-05
NM_019988 D10Mit179 10 1495.44 C 1.38E-02
Y11221 D1Mit216 1 80.32 T 5.32E-05
Y11221 D1Mit134 1 80.78 T 8.74E-06
NM_025760 D1Mit216 1 80.32 T 1.69E-05
NM_025760 D1Mit134 1 80.78 T 2.56E-06
NM_025760 D1Mit83 1 86.03 T 7.41E-05
AF362750 D9Mit227 9 1295.44 T 2.76E-05
AK006096 D1Mit216 1 80.32 T 3.35E-05
AK006096 D1Mit134 1 80.78 T 9.12E-06
AK006096 D8Mit124 8 1139.72 T 3.05E-05
AK018190 D6Mit14 6 986.00 C 4.18E-02
AK018190 D1Mit134 1 80.78 T 2.55E-05
AK006995 D1Mit216 1 80.32 T 1.02E-06
AK006995 D1Mit134 1 80.78 T 1.57E-08
AK006995 D1Mit83 1 86.03 T 4.69E-06

Transcript Locus Chr Location (Mb) Cis/Trans P
BC012284 D12Mit150 12 1742.65 C 2.72E-02
NM_025903 D1Mit134 1 80.78 T 5.42E-05
AK009370 D2Mit372 2 231.93 T 6.30E-05
AK020077 D1Mit216 1 80.32 T 4.02E-05
AK020077 D1Mit134 1 80.78 T 2.66E-05
AK016186 D1Mit216 1 80.32 T 2.28E-05
AK016186 D1Mit134 1 80.78 T 3.70E-05
AK015575 D2Mit457 2 376.81 C 4.97E-02
NM_025933 D1Mit216 1 80.32 T 1.42E-05
NM_025933 D1Mit134 1 80.78 T 3.02E-05
NM_025933 D1Mit83 1 86.03 T 4.62E-05
AB029329 D8Mit124 8 1139.72 T 3.88E-05
NM_011305 D1Mit216 1 80.32 T 3.75E-05
NM_011305 D1Mit134 1 80.78 T 3.60E-05
NM_021455 D1Mit216 1 80.32 T 1.11E-05
NM_021455 D1Mit134 1 80.78 T 4.88E-05
NM_010885 D1Mit216 1 80.32 T 1.76E-05
NM_010885 D1Mit134 1 80.78 T 6.10E-06
X75927 D11Mit337 11 1627.76 C 4.30E-02
NM_010564 D19Mit28 19 2381.35 T 3.72E-05
X79508 D6Mit14 6 986.00 C 4.07E-02
AF041965 D3Mit189 3 478.21 T 6.37E-05
M27352 D16Mit86 16 2176.22 C 4.27E-02
M27352 D1Mit218 1 127.28 T 2.67E-05
AK015215 D19Mit68 19 2369.61 T 6.89E-05
AK015215 D19Mit109 19 2372.17 T 1.56E-05
M20876 D17Mit221 17 2270.31 C 4.86E-02
AK011125 D19Mit53 19 2410.66 T 6.31E-05
AK011125 D19Mit10 19 2412.67 T 3.03E-05

AK019055 D19Msw052a 19 2416.54 T 1.61E-05
AK019055 D19Mit103 19 2419.31 T 1.26E-05
AK002328 DXMit186 X 2582.74 C 4.35E-02
AK014125 D7Mit350 7 1061.29 T 1.32E-05
BC008111 D10Mit179 10 1495.44 C 4.95E-02
AK007111 D1Mit134 1 80.78 T 2.27E-05
NM.033264 DXMit1 X 2491.85 T 9.15E-06
AK020116 D1Mit216 1 80.32 T 1.30E-06
AK020116 D1Mit134 1 80.78 T 5.70E-06
BC002161 D11Mit337 11 1627.76 C 2.98E-02
NM.011037 D10Mit179 10 1495.44 C 3.84E-02
NM.015735 02.125.650 2 323.03 T 3.06E-05
NM.010059 D2Mit457 2 376.81 C 1.11E-02
NM.011478 D1Mit216 1 80.32 T 5.28E-06
NM.011478 D1Mit134 1 80.78 T 1.14E-06
NM.008778 D1Mit216 1 80.32 T 2.60E-06
AJ297743 D15Mit35 15 2082.50 C 1.83E-02
NM.011126 D8Mit156 8 1253.28 C 3.95E-02
NM.013769 D1Mit216 1 80.32 T 1.12E-05
NM.013769 D1Mit134 1 80.78 T 1.70E-05
NM.010923 D3Mit19 3 535.89 C 6.08E-03
NM.011619 D7Mit259 7 1123.02 C 4.21E-02
AF203899 D2Mit457 2 376.81 C 2.04E-02
X04097 D7Mit343 7 1010.09 T 3.60E-06
X04097 D7Mit155 7 1010.71 T 1.05E-05
X04097 D7Mit225 7 1014.32 T 4.11E-05
X04097 D7Mit227 7 1015.57 T 6.69E-05
NM.019436 Mtap2 1 66.72 T 7.31E-05
NM.019436 D1Mit328 1 68.25 T 4.84E-06

NM.019436 D1Mit178 1 69.13 T 2.46E-05
NM.019436 D1Mit128 1 72.96 T 5.58E-05
NM.019436 D1Mit19 1 74.13 T 3.83E-05
NM.019436 D1Mit216 1 80.32 T 2.86E-06
NM.019436 D1Mit134 1 80.78 T 3.28E-05
NM.025631 D6Mit14 6 986.00 C 3.38E-02
Z12503 D1Mit216 1 80.32 T 4.37E-05
NM.026670 D9Mit151 9 1375.37 C 3.24E-02
AK014103 D1Mit216 1 80.32 T 4.11E-05
AK013280 D1Mit216 1 80.32 T 4.52E-07
AK013280 D1Mit134 1 80.78 T 8.95E-06
AK020475 D12Mit150 12 1742.65 C 1.66E-02
AK013684 D1Mit155 1 194.32 C 2.44E-02
NM.028850 D1Mit216 1 80.32 T 3.60E-07
AK007896 D1Mit216 1 80.32 T 6.99E-05
S76094 D1Mit216 1 80.32 T 6.13E-06
AK014750 D1Mit216 1 80.32 T 8.33E-06
AK014750 D1Mit134 1 80.78 T 2.32E-05
AK018565 D4Mit344 4 689.95 C 3.24E-02
AK007264 D17Mit221 17 2270.31 C 2.48E-02
AK014892 D1Mit134 1 80.78 T 4.37E-05
AK020972 D15Mit35 15 2082.50 C 5.37E-03
AK006576 06.100.540 6 942.66 T 5.20E-05
AK013928 D7Nds3 7 1052.85 T 2.90E-05
AK013928 D7Mit350 7 1061.29 T 1.03E-05
NM.019993 D1Mit216 1 80.32 T 5.66E-06
NM.019993 D1Mit134 1 80.78 T 2.88E-06
NM.019993 D8Mit124 8 1139.72 T 5.87E-05
NM.009177 D1Mit134 1 80.78 T 1.68E-05

NM.019667 D1Mit216 1 80.32 T 1.45E-05
J04695 D1Mit216 1 80.32 T 2.82E-05
J04695 D1Mit134 1 80.78 T 8.21E-06
AF090691 D17Mit221 17 2270.31 C 1.87E-02
Z12461 D11Mit34 11 1587.97 T 7.48E-05
AK016214 D1Mit216 1 80.32 T 2.99E-05
AK016214 D1Mit134 1 80.78 T 1.08E-05
BC004650 D7Mit259 7 1123.02 C 8.59E-03
AK016598 D1Mit178 1 69.13 T 6.58E-05
AK016598 D1Mit128 1 72.96 T 3.92E-05
AK016598 D1Mit19 1 74.13 T 4.46E-06
AK016598 D1Mit181 1 74.94 T 1.54E-05
AK016598 D1Mit216 1 80.32 T 7.22E-06
AK016598 D1Mit134 1 80.78 T 9.61E-06
NM.025938 D3Mit19 3 535.89 C 2.54E-02
AY042194 D1Mit134 1 80.78 T 5.10E-05
BC002097 D17Mit221 17 2270.31 C 1.22E-02
AK004762 D1Mit216 1 80.32 T 5.58E-05
AK020969 D3Mit19 3 535.89 C 4.42E-02
AK018000 D1Mit216 1 80.32 T 8.34E-06
AK018000 D1Mit134 1 80.78 T 3.87E-06
AK020940 D1Mit216 1 80.32 T 7.43E-05
AK020940 D1Mit134 1 80.78 T 6.77E-05
NM.024250 D11Mit41 11 1597.84 T 5.11E-05
NM.010073 D6Mit14 6 986.00 C 2.95E-03
NM.011289 D1Mit216 1 80.32 T 7.37E-05
NM.009500 D1Mit134 1 80.78 T 1.79E-05
NM.010385 D19Mit61 19 2380.97 T 2.37E-05
NM.010385 D19Mit28 19 2381.35 T 3.05E-05

NM.009222 DXMit186 X 2582.74 C 3.11E-02
U94828 D11Mit5 11 1575.89 T 6.37E-05
NM.011861 D8Mit124 8 1139.72 T 2.84E-05
AF322238 D1Mit216 1 80.32 T 1.26E-06
AF322238 D1Mit134 1 80.78 T 1.02E-06
AF322238 D1Mit83 1 86.03 T 7.58E-05
AK012317 D1Mit155 1 194.32 C 2.45E-03
AK004346 D19Mit6 19 2426.62 C 4.97E-02
AK017313 D8Mit124 8 1139.72 T 6.55E-05
AK015795 D8Mit124 8 1139.72 T 5.57E-06
BC005647 D4Mit344 4 689.95 C 6.43E-03
BC003975 D1Mit134 1 80.78 T 2.16E-05
BC003975 D1Mit83 1 86.03 T 5.80E-05
AK009018 D1Mit216 1 80.32 T 3.97E-07
AK009018 D1Mit134 1 80.78 T 9.42E-06
AK009018 D1Mit83 1 86.03 T 1.56E-05
AK015896 D1Mit134 1 80.78 T 1.09E-05
AK018230 D1Mit216 1 80.32 T 4.87E-05
AK018230 D1Mit134 1 80.78 T 5.75E-05
AK017042 D15Mit35 15 2082.50 C 4.96E-02
BC013849 D1Mit216 1 80.32 T 6.28E-06
BC013849 D1Mit134 1 80.78 T 4.81E-05
AK016793 D1Mit216 1 80.32 T 3.97E-05
AK016653 D1Mit134 1 80.78 T 2.27E-05
AK016014 D8Mit124 8 1139.72 T 6.86E-05
AK013946 D14Mit99 14 1876.58 T 6.54E-05
NM.009011 D1Mit216 1 80.32 T 2.36E-05
NM.010489 D8Mit124 8 1139.72 T 6.36E-05
M76131 D1Mit216 1 80.32 T 5.66E-05

AJ279835 D8Mit124 8 1139.72 T 6.97E-05
NM.013762 D1Mit216 1 80.32 T 5.72E-05
NM.016851 D1Mit216 1 80.32 T 4.92E-05
X62895 D1Mit216 1 80.32 T 6.56E-05
NM.009074 D1Mit216 1 80.32 T 1.64E-08
NM.009074 D1Mit134 1 80.78 T 5.39E-07
NM.009074 D1Mit83 1 86.03 T 4.09E-05
NM.010752 D1Mit216 1 80.32 T 4.99E-05
BC002253 D1Mit216 1 80.32 T 6.83E-05
NM.031397 D19Mit68 19 2369.61 T 6.33E-05
BC003908 D19Mit28 19 2381.35 T 5.00E-05
AF313800 D19Mit53 19 2410.66 T 5.03E-05
Z12549 D1Mit216 1 80.32 T 1.37E-05
AF289189 D16Mit86 16 2176.22 C 4.58E-02
AK008235 D9Mit151 9 1375.37 C 4.64E-02
AK016370 D1Mit134 1 80.78 T 6.36E-05
NM.025816 DXMit186 X 2582.74 C 1.47E-03
AK011611 D1Mit216 1 80.32 T 3.84E-05
AK006195 D1Mit216 1 80.32 T 8.66E-07
AK006195 D1Mit134 1 80.78 T 1.39E-05
AK016410 D12Mit150 12 1742.65 C 4.11E-02
AK013507 D19Mit103 19 2419.31 T 1.08E-05
AK005480 D19Mit6 19 2426.62 C 6.68E-03
AK017054 D1Mit178 1 69.13 T 4.51E-05
AK017054 D1Mit128 1 72.96 T 3.61E-05
AK017054 01.070.445 1 73.31 T 4.88E-05
AK021134 02.125.650 2 323.03 T 3.75E-05
AK017720 D9Mit151 9 1375.37 C 1.27E-02
NM.023196 D1Mit216 1 80.32 T 3.40E-06

NM.009210 D1Mit216 1 80.32 T 1.16E-05
NM.009210 D1Mit134 1 80.78 T 3.38E-05
NM.010003 D1Mit134 1 80.78 T 3.21E-05
AJ010752 D1Mit216 1 80.32 T 2.08E-05
AJ010752 D1Mit134 1 80.78 T 6.22E-06
NM.016806 D1Mit216 1 80.32 T 2.10E-05
AJ271055 08.062.280 8 1186.59 T 7.19E-05
NM.011187 D6Mit14 6 986.00 C 4.87E-02
NM.009671 D1Mit216 1 80.32 T 1.63E-05
AK018430 D4Mit344 4 689.95 C 4.66E-02
AK017551 D1Mit216 1 80.32 T 4.64E-06
AK017551 D1Mit134 1 80.78 T 1.25E-06
AK017551 D1Mit83 1 86.03 T 5.06E-05
AK019407 D1Mit216 1 80.32 T 8.75E-06
AK019407 D1Mit134 1 80.78 T 1.62E-06
AB049641 D1Mit19 1 74.13 T 3.48E-05
AF220039 D1Mit216 1 80.32 T 4.82E-05
NM.029565 D12Mit101 12 1729.13 T 2.43E-05
NM.029565 D12Mit289 12 1729.64 T 4.44E-05
NM.024229 D1Mit216 1 80.32 T 5.07E-05
NM.024229 D1Mit134 1 80.78 T 2.67E-05
AK005782 D1Mit216 1 80.32 T 1.19E-05
AK005782 D1Mit134 1 80.78 T 1.74E-05
NM.024178 D1Mit216 1 80.32 T 7.07E-06
NM.024178 D1Mit134 1 80.78 T 3.38E-05
AK014837 D4Mit344 4 689.95 C 8.59E-03
BC013803 D19Mit6 19 2426.62 C 3.11E-02
AK006668 D19Mit61 19 2380.97 T 9.51E-06
AK006668 D19Mit28 19 2381.35 T 3.69E-05

AK016520 D8Mit156 8 1253.28 C 3.01E-02
AK017619 D1Mit216 1 80.32 T 4.95E-05
AK014149 D1Mit134 1 80.78 T 3.16E-05
AK006264 D1Mit216 1 80.32 T 8.99E-06
AK006264 D1Mit134 1 80.78 T 5.09E-05
AK009212 D1Mit216 1 80.32 T 1.57E-05
AK018872 D1Mit216 1 80.32 T 1.74E-05
AK018872 D1Mit134 1 80.78 T 3.69E-06
AK013590 D1Mit216 1 80.32 T 7.69E-06
AK013590 D1Mit134 1 80.78 T 4.01E-05
NM.009658 D1Mit216 1 80.32 T 5.96E-06
NM.009658 D1Mit134 1 80.78 T 2.36E-06
NM.009096 D1Mit134 1 80.78 T 3.94E-05
NM.009091 D1Mit216 1 80.32 T 4.17E-05
NM.009091 D1Mit134 1 80.78 T 1.62E-05
NM.012006 D1Mit178 1 69.13 T 2.18E-05
NM.012006 D1Mit128 1 72.96 T 1.05E-05
NM.012006 D1Mit19 1 74.13 T 6.72E-06
NM.012006 D1Mit216 1 80.32 T 3.39E-07
NM.012006 D1Mit134 1 80.78 T 2.41E-07
AF290472 D1Mit128 1 72.96 T 6.23E-05
AF290472 D1Mit19 1 74.13 T 1.90E-05
AF290472 D1Mit216 1 80.32 T 2.46E-05
NM.016982 D19Mit28 19 2381.35 T 3.72E-05
M92418 D1Mit128 1 72.96 T 2.71E-05
M92418 D1Mit19 1 74.13 T 5.90E-06
M92418 D1Mit181 1 74.94 T 2.75E-05
M92418 D1Mit216 1 80.32 T 2.48E-06
M92418 D1Mit134 1 80.78 T 3.74E-06

NM.009554 D1Mit134 1 80.78 T 4.60E-05
NM.009554 D1Mit83 1 86.03 T 5.19E-06
NM.021367 D1Mit128 1 72.96 T 2.30E-05
NM.021367 D1Mit19 1 74.13 T 3.17E-05
NM.021367 D1Mit216 1 80.32 T 6.42E-05
X99751 D19Mit28 19 2381.35 T 3.36E-05
Z12560 D1Mit216 1 80.32 T 2.80E-05
AF041930 D12Mit150 12 1742.65 C 2.19E-02
AJ310531 D11Mit337 11 1627.76 C 4.06E-02
U01885 D1Mit216 1 80.32 T 1.88E-05
U01885 D1Mit134 1 80.78 T 5.71E-06
AK018052 D14Mit131 14 1975.00 C 3.43E-02
AK019108 D6Mit14 6 986.00 C 3.64E-02
AK016727 D1Mit19 1 74.13 T 3.57E-05
AK016727 D1Mit216 1 80.32 T 3.95E-05
AK016727 D1Mit134 1 80.78 T 1.53E-05
AK009576 D16Mit86 16 2176.22 C 3.62E-02
AF322375 D4Mit344 4 689.95 C 2.02E-02
AK016408 D8Mit124 8 1139.72 T 6.22E-05
BC003886 D3Mit19 3 535.89 C 1.22E-02
AK008128 D12Mit150 12 1742.65 C 1.12E-02
AK018330 D13Mit30 13 1844.60 T 3.30E-05
BC009169 D11Mit337 11 1627.76 C 2.78E-02
BC011321 D6Mit230 6 939.49 T 5.56E-05
BC004774 D17Mit221 17 2270.31 C 1.28E-02
AK015296 D17Mit221 17 2270.31 C 4.75E-02
AK005840 D1Mit178 1 69.13 T 2.51E-06
AK005840 D1Mit128 1 72.96 T 7.17E-06
AK005840 D1Mit19 1 74.13 T 8.36E-07

AK005840 D1Mit181 1 74.94 T 3.29E-05
AK005840 D1Mit216 1 80.32 T 4.24E-06
AK005840 D1Mit134 1 80.78 T 7.83E-07
AK010648 D4Mit344 4 689.95 C 3.56E-02
AK015163 D1Mit216 1 80.32 T 7.60E-05
BC006621 D1Mit216 1 80.32 T 1.51E-06
NM.008726 D1Mit216 1 80.32 T 3.22E-05
NM.008726 D1Mit134 1 80.78 T 4.79E-06
NM.008726 D8Mit124 8 1139.72 T 1.50E-05
NM.009872 D1Mit181 1 74.94 T 5.09E-05
NM.009601 D1Mit19 1 74.13 T 5.31E-05
NM.009601 D1Mit134 1 80.78 T 2.13E-06
X58643 D14Mit131 14 1975.00 C 4.98E-02
NM.010908 D12Mit150 12 1742.65 C 3.69E-02
NM.025806 D1Mit328 1 68.25 T 4.10E-05
NM.025806 D1Mit178 1 69.13 T 7.36E-05
NM.025806 D1Mit128 1 72.96 T 2.34E-05
NM.025806 D1Mit19 1 74.13 T 1.95E-05
AK016990 D1Mit178 1 69.13 T 5.87E-05
AK016990 D1Mit19 1 74.13 T 2.77E-05
AK016990 D1Mit216 1 80.32 T 3.13E-07
AK016990 D1Mit134 1 80.78 T 3.58E-07
AK016990 D1Mit83 1 86.03 T 3.32E-06
NM.031873 D8Mit124 8 1139.72 T 3.89E-05
NM.025820 D1Mit216 1 80.32 T 6.33E-05
NM.025820 D1Mit134 1 80.78 T 2.16E-05
AK004622 D1Mit216 1 80.32 T 9.96E-06
AK004622 D1Mit134 1 80.78 T 6.92E-06
AK004622 D8Mit124 8 1139.72 T 3.17E-05

AY028460 D1Mit19 1 74.13 T 4.12E-05
AY028460 D1Mit216 1 80.32 T 1.41E-05
AY028460 D1Mit134 1 80.78 T 1.21E-05
AY028460 D1Mit83 1 86.03 T 6.00E-05
BC013497 D1Mit216 1 80.32 T 3.62E-06
AB054591 D17Mit221 17 2270.31 C 3.07E-02
AK016962 D1Mit134 1 80.78 T 5.66E-05
BC003914 D1Mit216 1 80.32 T 1.00E-05
BC003914 D1Mit134 1 80.78 T 1.12E-05
BC003914 D8Mit124 8 1139.72 T 4.09E-05
AK009534 D1Mit216 1 80.32 T 1.09E-05
AK009534 D1Mit134 1 80.78 T 2.17E-06
AK009534 D8Mit124 8 1139.72 T 4.74E-05
AK013730 D8Mit124 8 1139.72 T 2.77E-05
AK009010 D1Mit134 1 80.78 T 3.34E-05
AK016210 D15Mit35 15 2082.50 C 4.72E-02
NM.021391 D6Mit14 6 986.00 C 2.56E-02
U89409 D11Mit337 11 1627.76 C 4.25E-02
AF283667 D17Mit221 17 2270.31 C 3.87E-02
NM.010926 D1Mit134 1 80.78 T 4.74E-05
NM.010926 D8Mit124 8 1139.72 T 7.59E-05
NM.008708 D14Mit131 14 1975.00 C 5.76E-03
NM.011770 D2Mit457 2 376.81 C 2.72E-02
NM.016703 D1Mit216 1 80.32 T 1.06E-06
NM.016703 D1Mit134 1 80.78 T 2.11E-05
AF043120 D2Mit457 2 376.81 C 4.90E-02
NM.019718 D1Mit216 1 80.32 T 4.78E-06
NM.019758 D1Mit216 1 80.32 T 7.28E-05
NM.010424 D18Mit144 18 2361.14 C 2.83E-02

AF012178 D1Mit216 1 80.32 T 2.38E-05
L16982 D1Mit216 1 80.32 T 4.49E-05
Z12216 D1Mit216 1 80.32 T 1.50E-05
Z12216 D1Mit134 1 80.78 T 4.61E-05
AK005418 D1Mit83 1 86.03 T 4.79E-05
AK018361 D11Mit337 11 1627.76 C 3.93E-02
AK017764 D16Mit86 16 2176.22 C 2.63E-02
AK015544 D1Mit134 1 80.78 T 4.36E-05
AK015544 D1Mit83 1 86.03 T 5.81E-05
AK018741 D11Mit337 11 1627.76 C 2.42E-02
AK018741 D1Mit5 1 64.21 T 7.26E-05
AK015114 D1Mit216 1 80.32 T 6.36E-05
AK019877 D1Mit216 1 80.32 T 6.96E-05
AK002745 D1Mit216 1 80.32 T 6.82E-05
AK002745 D1Mit134 1 80.78 T 5.11E-05
BC008158 D1Mit134 1 80.78 T 4.44E-05
AK008754 D1Mit216 1 80.32 T 9.06E-06
AK008754 D1Mit134 1 80.78 T 9.94E-06
NM.011958 D1Mit178 1 69.13 T 9.43E-06
NM.011958 D1Mit134 1 80.78 T 4.21E-05
NM.011258 DXMit1 X 2491.85 T 6.72E-05
NM.011741 D1Mit216 1 80.32 T 4.84E-06
NM.011741 D1Mit134 1 80.78 T 8.79E-07
NM.011287 D1Mit216 1 80.32 T 6.93E-05
NM.011287 D1Mit134 1 80.78 T 2.61E-05
NM.007927 D7Mit259 7 1123.02 C 3.40E-02
AB010345 D1Mit134 1 80.78 T 3.90E-05
M29394 D6Mit14 6 986.00 C 2.86E-02
NM.009271 D1Mit216 1 80.32 T 3.89E-05

NM.007540 D10Mit179 10 1495.44 C 8.68E-03
NM.010068 D1Mit216 1 80.32 T 1.05E-05
NM.010068 D1Mit134 1 80.78 T 3.00E-05
AB023957 D16Mit86 16 2176.22 C 4.90E-02
X59199 D19Mit68 19 2369.61 T 1.72E-05
AK005212 D4Mit344 4 689.95 C 1.53E-02
AK014971 D7Mit259 7 1123.02 C 4.82E-02
AK021201 05.142.105 5 835.33 C 1.00E-02
NM.029716 D12Mit150 12 1742.65 C 1.07E-02
AK010782 D3Mit19 3 535.89 C 2.72E-02
NM.026461 D15Mit35 15 2082.50 C 4.49E-02
NM.026461 D1Mit216 1 80.32 T 5.78E-05
AK006553 D1Mit19 1 74.13 T 5.27E-05
AK006553 D1Mit216 1 80.32 T 1.07E-05
AK020369 D1Mit216 1 80.32 T 2.14E-05
BC003938 D1Mit134 1 80.78 T 2.85E-05
AK020290 D8Mit124 8 1139.72 T 2.74E-05
AK019118 D1Mit216 1 80.32 T 1.61E-05
AK019118 D1Mit134 1 80.78 T 4.61E-05
AK006783 D1Mit178 1 69.13 T 1.77E-05
AK006783 D1Mit19 1 74.13 T 2.08E-05
AK006783 D1Mit216 1 80.32 T 1.26E-05
NM.010402 D14Mit131 14 1975.00 C 3.16E-02
NM.013550 D1Mit216 1 80.32 T 6.17E-06
NM.013550 D1Mit134 1 80.78 T 3.04E-06
NM.011418 D17Mit221 17 2270.31 C 1.82E-02
NM.008826 D13Mit35 13 1862.73 C 3.42E-02
NM.020046 D1Mit134 1 80.78 T 6.11E-05
NM.007436 D1Mit216 1 80.32 T 2.15E-05

NM.007436 D1Mit134 1 80.78 T 1.38E-05
NM.010730 D1Mit134 1 80.78 T 1.63E-05
AF284437 D9Mit227 9 1295.44 T 7.47E-05
NM.009847 D11Mit337 11 1627.76 C 3.40E-02
NM.020260 D1Mit216 1 80.32 T 7.17E-05
NM.009761 D1Mit216 1 80.32 T 6.04E-05
NM.007921 DXMit186 X 2582.74 C 7.53E-03
NM.009536 D1Mit425 1 156.62 T 6.79E-05
BC013553 D13Mit35 13 1862.73 C 1.97E-02
AF204174 D1Mit178 1 69.13 T 5.18E-05
AF204174 D1Mit128 1 72.96 T 3.07E-05
AF204174 D1Mit19 1 74.13 T 4.91E-06
AF204174 D1Mit181 1 74.94 T 2.75E-05
AF204174 D1Mit216 1 80.32 T 1.14E-05
AF204174 D1Mit134 1 80.78 T 1.25E-06
AF204174 D1Mit83 1 86.03 T 7.09E-05
AK004849 D1Mit216 1 80.32 T 3.82E-05
AK020197 D1Mit216 1 80.32 T 1.36E-06
AK020197 D1Mit134 1 80.78 T 2.17E-06
AK002831 D9Mit227 9 1295.44 T 5.05E-05
AK002831 D9Mit191 9 1300.68 T 1.89E-05
AK002831 D9Mit4 9 1306.17 T 4.34E-05
NM.026508 D6Mit254 6 966.58 T 7.03E-05
BC014688 D1Mit19 1 74.13 T 7.45E-05
BC014688 D1Mit216 1 80.32 T 1.81E-05
BC014688 D1Mit134 1 80.78 T 3.59E-05
AK006182 D1Mit134 1 80.78 T 9.36E-06
AK021087 D7Mit297 7 1043.33 T 1.65E-05
AK021087 D07Msw058 7 1046.11 T 5.61E-05

AK021087 D7Nds3 7 1052.85 T 3.59E-06
AK021087 07.073.835 7 1057.91 T 3.93E-06
AK021087 D7Mit350 7 1061.29 T 2.71E-06
AK021087 D7Mit62 7 1062.34 T 1.56E-05
NM.025601 D1Mit216 1 80.32 T 4.64E-05
NM.025601 D1Mit134 1 80.78 T 4.34E-05
AK007200 D19Mit28 19 2381.35 T 4.09E-05
AK010187 D1Mit216 1 80.32 T 2.73E-05
AK010187 D1Mit134 1 80.78 T 9.03E-06
AK019331 05.142.105 5 835.33 C 4.76E-02
AF336862 D1Mit216 1 80.32 T 3.58E-05
AF336862 D1Mit134 1 80.78 T 7.97E-06
AK005859 D16Mit86 16 2176.22 C 4.02E-02
AK008243 D1Mit178 1 69.13 T 5.42E-05
AK008243 D1Mit19 1 74.13 T 2.70E-05
AK008243 D1Mit216 1 80.32 T 5.45E-05
AK008243 D1Mit134 1 80.78 T 2.29E-05
AK017934 D9Mit151 9 1375.37 C 4.35E-02
NM.026731 D1Mit216 1 80.32 T 6.26E-06
NM.026731 D1Mit134 1 80.78 T 3.61E-05
BC005638 D8Mit124 8 1139.72 T 3.04E-05
NM.011972 D1Mit128 1 72.96 T 6.49E-06
NM.011972 01.070.445 1 73.31 T 4.24E-05
NM.011045 D1Mit128 1 72.96 T 5.49E-05
AB051827 D1Mit216 1 80.32 T 7.42E-06
AB051827 D1Mit134 1 80.78 T 1.58E-06
AB051827 D8Mit124 8 1139.72 T 3.51E-05
NM.009438 D1Mit216 1 80.32 T 1.95E-06
NM.009438 D1Mit134 1 80.78 T 2.61E-06

NM.009438 D1Mit83 1 86.03 T 7.28E-05
NM.009261 D1Mit216 1 80.32 T 3.70E-05
NM.009261 D1Mit134 1 80.78 T 3.14E-06
AB010357 D1Mit216 1 80.32 T 1.60E-05
NM.011295 D19Mit6 19 2426.62 C 3.04E-02
NM.009779 D1Mit216 1 80.32 T 5.28E-05
NM.009779 D1Mit134 1 80.78 T 2.51E-06
NM.007783 D3Mit19 3 535.89 C 3.89E-02
NM.021381 D1Mit178 1 69.13 T 2.55E-05
NM.021381 D1Mit128 1 72.96 T 6.50E-06
NM.021381 01.070.445 1 73.31 T 6.95E-05
NM.021381 D1Mit19 1 74.13 T 1.31E-06
NM.021381 D1Mit181 1 74.94 T 1.74E-05
NM.021381 D1Mit216 1 80.32 T 1.56E-07
NM.021381 D1Mit134 1 80.78 T 4.51E-07
NM.021381 D1Mit83 1 86.03 T 5.36E-05
NM.009365 D1Mit128 1 72.96 T 4.67E-06
NM.009365 01.070.445 1 73.31 T 5.29E-05
NM.009365 D1Mit19 1 74.13 T 5.33E-05
NM.010028 D1Mit134 1 80.78 T 9.40E-07
AK009485 D5Mit309 5 769.16 T 4.65E-05
AK004121 D1Mit19 1 74.13 T 6.00E-05
BC011059 D7Mit350 7 1061.29 T 7.03E-05
AK020496 D8Mit124 8 1139.72 T 3.25E-05
BC003861 D1Mit178 1 69.13 T 2.38E-05
BC003861 D1Mit128 1 72.96 T 1.81E-05
BC003861 D1Mit19 1 74.13 T 3.21E-06
BC003861 D1Mit181 1 74.94 T 1.71E-05
BC003861 D1Mit216 1 80.32 T 1.72E-06

BC003861 D1Mit134 1 80.78 T 1.39E-06
AK009532 D1Mit216 1 80.32 T 4.90E-05
AK009532 D1Mit134 1 80.78 T 3.16E-06
AK009532 D1Mit83 1 86.03 T 6.19E-05
AK012898 D7Mit62 7 1062.34 T 6.82E-05
AK013671 D6Mit14 6 986.00 C 2.22E-02
NM.009326 D14Mit131 14 1975.00 C 4.20E-03
M74012 D1Mit216 1 80.32 T 6.05E-05
AF304351 D1Mit216 1 80.32 T 4.42E-05
AF304351 D1Mit134 1 80.78 T 7.10E-05
NM.013721 D15Mit35 15 2082.50 C 1.06E-02
NM.016668 D12Mit150 12 1742.65 C 3.18E-02
AB041591 D1Mit216 1 80.32 T 5.35E-06
NM.007725 D1Mit178 1 69.13 T 4.65E-05
NM.007725 D1Mit216 1 80.32 T 9.60E-07
NM.019726 D6Mit149 6 947.00 T 7.50E-05
M92403 D19Mit103 19 2419.31 T 3.82E-05
AK017957 D2Mit457 2 376.81 C 1.82E-02
NM.025710 D2Mit398 2 321.34 T 1.93E-05
NM.025710 02.125.650 2 323.03 T 5.04E-06
AK019450 D1Mit216 1 80.32 T 5.14E-05
AK020973 D1Mit216 1 80.32 T 2.42E-05
AK020338 D10Mit179 10 1495.44 C 5.54E-03
AK017382 D1Mit216 1 80.32 T 3.27E-06
NM.025366 D4Mit344 4 689.95 C 3.63E-02
NM.025366 D1Mit216 1 80.32 T 2.61E-05
BC006711 D1Mit216 1 80.32 T 1.98E-05
BC006711 D1Mit134 1 80.78 T 3.85E-05
AK020658 D3Mit19 3 535.89 C 3.53E-02

AK015269 D13Mit35 13 1862.73 C 3.26E-02
AK016224 D3Mit19 3 535.89 C 3.73E-02
AK007272 D1Mit216 1 80.32 T 9.75E-06
AK021039 D8Mit124 8 1139.72 T 7.14E-05
AK007899 D8Mit124 8 1139.72 T 4.42E-05
D29965 D1Mit216 1 80.32 T 2.74E-05
D29965 D1Mit83 1 86.03 T 2.44E-05
Z78162 D11Mit337 11 1627.76 C 9.93E-03
NM.008045 D4Mit344 4 689.95 C 3.61E-02
NM.008045 D1Mit216 1 80.32 T 2.81E-06
NM.008045 D1Mit134 1 80.78 T 7.09E-07
NM.008045 D1Mit83 1 86.03 T 3.89E-05
NM.007803 D1Mit216 1 80.32 T 4.98E-06
NM.013745 D1Mit134 1 80.78 T 1.76E-05
X80437 D1Mit216 1 80.32 T 2.24E-05
NM.011969 D6Mit14 6 986.00 C 1.88E-02
NM.011184 D1Mit216 1 80.32 T 3.28E-05
NM.008445 D17Mit221 17 2270.31 C 4.28E-02
NM.019771 D16Mit86 16 2176.22 C 1.02E-02
NM.010716 D5Mit139 5 813.88 T 1.56E-05
NM.010716 D5Mit370 5 815.22 T 2.99E-05
NM.010716 D5Mit371 5 816.52 T 2.47E-05
NM.009133 D19Mit6 19 2426.62 C 7.41E-03
NM.007394 D9Mit151 9 1375.37 C 7.51E-03
AK018352 D1Mit216 1 80.32 T 6.47E-05
AK018352 D1Mit134 1 80.78 T 1.55E-05
AK011474 D9Mit151 9 1375.37 C 4.37E-02
AF141311 D1Mit178 1 69.13 T 7.50E-05
AF141311 D1Mit134 1 80.78 T 1.98E-05

AK005984 D1Mit218 1 127.28 T 4.27E-05
AK020554 D1Mit216 1 80.32 T 2.30E-05
AK020554 D1Mit134 1 80.78 T 2.39E-06
AK019874 D9Mit253 9 1294.68 T 5.55E-05
AK005367 DXMit89 X 2436.00 T 5.82E-06
X64997 D6Mit14 6 986.00 C 4.27E-03
AK011867 D1Mit134 1 80.78 T 5.12E-05
AK012833 D1Mit216 1 80.32 T 8.14E-06
AK012833 D1Mit134 1 80.78 T 2.94E-06
BC010790 D1Mit216 1 80.32 T 3.13E-05
NM.030707 D1Mit216 1 80.32 T 4.29E-06
AK015001 D1Mit216 1 80.32 T 4.86E-05
AK020839 D1Mit178 1 69.13 T 5.99E-05
AK020839 D1Mit216 1 80.32 T 4.86E-05
AK020839 D1Mit134 1 80.78 T 7.13E-05
S76975 D1Mit216 1 80.32 T 5.25E-05
AK014196 D1Mit216 1 80.32 T 3.97E-05
AK019444 D1Mit216 1 80.32 T 4.74E-05
AK013826 D6Mit230 6 939.49 T 3.63E-05
AK009339 D10Mit179 10 1495.44 C 4.58E-02
AK017533 D6Mit230 6 939.49 T 4.38E-05
NM.019488 D6Mit14 6 986.00 C 2.56E-02
NM.019488 D1Mit216 1 80.32 T 4.34E-06
NM.019488 D1Mit134 1 80.78 T 4.22E-07
NM.019488 D1Mit83 1 86.03 T 2.36E-05
NM.018796 05.142.105 5 835.33 C 6.04E-03
NM.019647 D19Mit28 19 2381.35 T 3.24E-05
AF106620 D19Mit6 19 2426.62 C 4.49E-02
AB019618 D4Mit344 4 689.95 C 4.01E-02

U20304 D1Mit134 1 80.78 T 2.88E-05
U55677 D3Mit12 3 477.70 T 6.36E-05
U55677 03.106.500 3 484.85 T 3.17E-05
Z12287 D1Mit134 1 80.78 T 1.31E-05
Z12287 D1Mit83 1 86.03 T 2.29E-05
AK014529 D1Mit134 1 80.78 T 3.97E-05
NM.031180 D1Mit19 1 74.13 T 7.47E-05
NM.031180 D1Mit216 1 80.32 T 2.51E-05
BC009153 D1Mit178 1 69.13 T 7.53E-05
BC009153 D1Mit216 1 80.32 T 5.55E-05
BC009153 D1Mit134 1 80.78 T 7.45E-06
NM.026446 D1Mit216 1 80.32 T 2.47E-06
AK006678 D17Mit221 17 2270.31 C 9.44E-03
AK019950 D2Mit436 2 275.87 T 3.98E-06
AK008598 D2Mit436 2 275.87 T 4.95E-05
AK008598 D13Mit57 13 1762.78 T 1.12E-05
AK006719 D14Mit131 14 1975.00 C 4.34E-02
BC005613 D1Mit216 1 80.32 T 6.97E-05
AK014689 D6Mit14 6 986.00 C 7.94E-03
AJ278128 D1Mit216 1 80.32 T 4.24E-05
AJ278128 D1Mit134 1 80.78 T 7.59E-06
AJ278128 D8Mit124 8 1139.72 T 4.31E-05
NM.010434 D17Mit221 17 2270.31 C 9.97E-03
NM.011395 D1Mit216 1 80.32 T 3.38E-05
M13446 D12Mit150 12 1742.65 C 4.96E-02
NM.013590 D13Mit35 13 1862.73 C 1.79E-03
NM.013590 D6Mit67 6 938.74 T 8.15E-06
NM.013590 D6Mit230 6 939.49 T 1.42E-05
NM.019767 D1Mit128 1 72.96 T 3.01E-05

NM.018875 05.142.105 5 835.33 C 2.91E-02
AF022770 D1Mit216 1 80.32 T 2.41E-05
AF022770 D1Mit134 1 80.78 T 1.16E-06
AF022770 D1Mit83 1 86.03 T 6.39E-05
AB049731 D13Mit35 13 1862.73 C 2.57E-02
AB031081 D2Mit457 2 376.81 C 4.61E-02
NM.016678 D3Mit19 3 535.89 C 4.82E-02
AK017256 D6Mit14 6 986.00 C 4.19E-02
AF007230 D2Mit286 2 350.45 T 6.13E-05
AF041935 D1Mit134 1 80.78 T 8.07E-06
AF041935 D19Mit44 19 2373.53 T 2.77E-05
Z12474 D2Mit286 2 350.45 T 4.00E-05
AF367247 D1Mit216 1 80.32 T 2.43E-05
AF367247 D1Mit134 1 80.78 T 1.62E-05
AK017151 D1Mit216 1 80.32 T 1.88E-05
AK017151 D1Mit134 1 80.78 T 2.09E-06
AK004552 D1Mit216 1 80.32 T 1.42E-05
AK017808 D1Mit216 1 80.32 T 5.30E-05
AK017808 D1Mit134 1 80.78 T 4.61E-05
BC004049 D14Mit131 14 1975.00 C 2.65E-02
AK004503 05.142.105 5 835.33 C 2.95E-02
L17333 DXMit1 X 2491.85 T 6.53E-05
AK009988 D1Mit216 1 80.32 T 1.26E-05
AK002303 05.142.105 5 835.33 C 4.83E-02
AK017294 D1Mit134 1 80.78 T 1.88E-05
AK007459 D1Mit216 1 80.32 T 2.95E-06
AK007459 D1Mit134 1 80.78 T 1.04E-07
AK007459 D1Mit83 1 86.03 T 7.10E-05
AK017126 D1Mit134 1 80.78 T 6.35E-06

AK017126 D1Mit83 1 86.03 T 5.21E-05
AK021131 D1Mit134 1 80.78 T 5.41E-05
AK012340 D1Mit216 1 80.32 T 1.16E-05
AK012340 D1Mit134 1 80.78 T 1.47E-06
AK016457 D1Mit216 1 80.32 T 1.57E-05
AK015963 D1Mit155 1 194.32 C 4.94E-02
AF303106 D1Mit134 1 80.78 T 7.41E-05
NM.009996 D2Mit457 2 376.81 C 4.19E-02
NM.009945 D1Mit216 1 80.32 T 7.74E-06
NM.009945 D1Mit134 1 80.78 T 4.84E-06
NM.009306 D11Mit337 11 1627.76 C 1.61E-02
NM.009163 D1Mit216 1 80.32 T 2.14E-05
NM.009163 D1Mit134 1 80.78 T 2.85E-05
NM.007381 D1Mit216 1 80.32 T 2.48E-05
NM.010665 D1Mit178 1 69.13 T 3.80E-05
NM.010665 D1Mit216 1 80.32 T 3.65E-06
NM.010665 D1Mit134 1 80.78 T 6.31E-07
NM.009402 D1Mit216 1 80.32 T 2.96E-05
NM.009402 D1Mit134 1 80.78 T 5.44E-05
NM.013664 D19Mit6 19 2426.62 C 1.04E-02
NM.013747 D6Mit230 6 939.49 T 1.23E-05
NM.013747 06.100.540 6 942.66 T 1.73E-05
NM.008484 D10Mit179 10 1495.44 C 2.43E-02
NM.020018 D4Mit344 4 689.95 C 2.64E-02
NM.020018 D1Mit178 1 69.13 T 6.52E-05
NM.020018 D1Mit216 1 80.32 T 6.30E-06
NM.020018 D1Mit134 1 80.78 T 4.52E-05
NM.016760 D1Mit216 1 80.32 T 7.51E-06
AF120321 D1Mit216 1 80.32 T 4.71E-05

D12725 D16Mit188 16 2159.79 T 4.25E-05
AK002893 D12Mit150 12 1742.65 C 3.22E-02
AK015256 D1Mit216 1 80.32 T 2.92E-05
AK015058 D11Mit337 11 1627.76 C 4.61E-02
NM.024169 D19Mit6 19 2426.62 C 2.98E-02
AK005991 D1Mit134 1 80.78 T 3.37E-05
BC011128 D1Mit216 1 80.32 T 1.74E-05
BC011128 D1Mit134 1 80.78 T 4.93E-06
AK015871 D11Mit337 11 1627.76 C 1.95E-02
AK019717 D1Mit216 1 80.32 T 9.54E-06
AK012454 D1Mit216 1 80.32 T 1.09E-06
AK014187 D1Mit216 1 80.32 T 1.97E-05
AK013835 D1Mit216 1 80.32 T 4.92E-06
AK013835 D1Mit134 1 80.78 T 4.33E-06
AK013835 D1Mit83 1 86.03 T 2.33E-05
AK004391 D1Mit216 1 80.32 T 2.28E-05
AK004391 D1Mit134 1 80.78 T 3.63E-05
AK013838 D1Mit216 1 80.32 T 1.44E-06
AK013838 D1Mit134 1 80.78 T 6.44E-06
AK016404 D8Mit124 8 1139.72 T 6.65E-05
AK018459 D1Mit134 1 80.78 T 6.14E-05
AK016174 D1Mit134 1 80.78 T 2.45E-05
AK012318 D8Mit156 8 1253.28 C 3.38E-02
AK009255 D19Mit61 19 2380.97 T 4.90E-05
NM.025369 D10Mit179 10 1495.44 C 4.87E-02
AF176521 D7Mit259 7 1123.02 C 1.88E-02
NM.007690 D19Mit28 19 2381.35 T 1.98E-05
NM.016877 D1Mit134 1 80.78 T 7.20E-05
NM.019458 D1Mit216 1 80.32 T 1.29E-05

NM.019458 D1Mit134 1 80.78 T 7.03E-06
U38502 D1Mit216 1 80.32 T 2.94E-05
NM.007990 D19Mit28 19 2381.35 T 5.26E-05
NM.010324 D4Mit344 4 689.95 C 2.36E-02
NM.007494 D15Mit35 15 2082.50 C 6.45E-03
NM.013876 D11Mit351 11 1576.22 T 6.21E-05
NM.010163 D11Mit337 11 1627.76 C 1.21E-02
NM.010163 D1Mit216 1 80.32 T 4.81E-05
NM.010163 D1Mit134 1 80.78 T 6.35E-05
NM.009076 DXMit186 X 2582.74 C 3.93E-02
NM.019480 D19Mit6 19 2426.62 C 2.13E-02
NM.008958 DXMit1 X 2491.85 T 6.14E-05
D87910 D1Mit216 1 80.32 T 7.58E-05
X15052 DXMit25 X 2490.22 T 3.07E-05
X15052 DXMit1 X 2491.85 T 2.38E-05
X15052 DXMsw076 X 2512.01 T 7.41E-05
NM.007878 D13Mit35 13 1862.73 C 8.50E-04
AK015955 D1Mit171 1 37.07 T 6.73E-05
AF230110 D7Mit259 7 1123.02 C 4.93E-02
AB041803 D1Mit216 1 80.32 T 1.09E-05
AB041803 D1Mit134 1 80.78 T 9.63E-06
AK015300 D11Mit337 11 1627.76 C 2.11E-02
NM.026142 D7Mit301 7 1069.38 T 6.78E-05
Z25851 D13Mit35 13 1862.73 C 2.28E-02
AK016921 D6Mit67 6 938.74 T 1.43E-05
AK016921 D6Mit230 6 939.49 T 1.85E-07
AK016921 06.100.540 6 942.66 T 9.63E-08
AK020788 05.142.105 5 835.33 T 3.77E-05
AK006542 D10Mit179 10 1495.44 C 1.42E-02

AK013817 D19Mit61 19 2380.97 T 2.58E-05
AK013817 D19Mit28 19 2381.35 T 5.38E-05
AK013817 D19Mit128 19 2382.70 T 4.13E-05
AK021233 D1Mit216 1 80.32 T 1.28E-05
AK021233 D1Mit134 1 80.78 T 4.88E-05
AK007969 D12Mit150 12 1742.65 C 3.14E-02
BC008116 D1Mit216 1 80.32 T 2.76E-06
BC008116 D1Mit134 1 80.78 T 3.44E-06
AK015867 D2Mit457 2 376.81 C 5.02E-03
NM.010271 D4Mit344 4 689.95 C 4.42E-02
NM.010271 D8Mit128 8 1185.17 T 7.52E-05
NM.010271 08.062.280 8 1186.59 T 5.97E-05
NM.007438 D1Mit134 1 80.78 T 4.77E-05
NM.011921 D1Mit216 1 80.32 T 2.88E-06
NM.011921 D1Mit134 1 80.78 T 3.23E-07
NM.008570 01.070.445 1 73.31 T 6.98E-05
NM.009534 D1Mit134 1 80.78 T 5.83E-06
NM.007457 D1Mit19 1 74.13 T 6.54E-05
NM.007457 D1Mit216 1 80.32 T 7.25E-07
NM.007457 D1Mit134 1 80.78 T 2.82E-06
NM.007457 D1Mit83 1 86.03 T 2.50E-05
U69600 D7Mit259 7 1123.02 C 3.81E-02
NM.007470 D1Mit134 1 80.78 T 3.24E-05
NM.021791 D1Mit134 1 80.78 T 6.38E-05
AF322069 D1Mit216 1 80.32 T 3.87E-06
AF322069 D1Mit134 1 80.78 T 1.52E-07
AF322069 D1Mit83 1 86.03 T 2.29E-06
NM.013886 D1Mit216 1 80.32 T 1.00E-06
NM.013886 D1Mit134 1 80.78 T 5.56E-07

U18869 D4Mit344 4 689.95 C 2.93E-02
NM.019453 D1Mit178 1 69.13 T 1.71E-05
NM.019453 D1Mit128 1 72.96 T 4.90E-05
NM.019453 D1Mit19 1 74.13 T 1.07E-05
NM.019453 D1Mit181 1 74.94 T 4.71E-05
NM.019453 D1Mit134 1 80.78 T 3.95E-07
NM.008972 D18Mit144 18 2361.14 C 1.59E-02
NM.008972 D19Mit109 19 2372.17 T 4.61E-05
NM.013488 D1Mit216 1 80.32 T 2.34E-05
NM.013488 D1Mit134 1 80.78 T 1.84E-05
AF258602 D1Mit19 1 74.13 T 4.13E-05
AF258602 D1Mit216 1 80.32 T 5.58E-06
AF258602 D1Mit134 1 80.78 T 1.20E-08
AF258602 D1Mit83 1 86.03 T 1.30E-05
NM.008405 D1Mit216 1 80.32 T 5.25E-05
NM.013897 D19Mit103 19 2419.31 T 1.21E-05
AF253540 D1Mit155 1 194.32 C 3.45E-02
AK019788 D8Mit124 8 1139.72 T 4.75E-05
M18500 D1Mit216 1 80.32 T 8.88E-08
M18500 D1Mit134 1 80.78 T 4.73E-05
Z37502 D4Mit344 4 689.95 C 1.17E-02
NM.024230 D13Mit35 13 1862.73 C 5.93E-04
AK009014 D1Mit134 1 80.78 T 2.96E-05
AK019568 D1Mit216 1 80.32 T 7.25E-05
NM.025639 D19Mit103 19 2419.31 T 8.25E-06
AK021018 D1Mit134 1 80.78 T 1.33E-05
AK008900 D1Mit216 1 80.32 T 6.30E-05
AK008900 D1Mit134 1 80.78 T 4.42E-05
AK011360 D1Mit134 1 80.78 T 4.47E-07

AK011360 D1Mit83 1 86.03 T 1.62E-06
AK017285 D1Mit216 1 80.32 T 2.25E-06
AK017285 D1Mit134 1 80.78 T 1.19E-06
AK006038 D10Mit179 10 1495.44 C 4.28E-02
AK003103 D10Mit179 10 1495.44 C 2.59E-02
AK003103 D1Mit128 1 72.96 T 3.98E-05
AK003103 D1Mit19 1 74.13 T 2.67E-05
AK017242 13.043.620 13 1791.43 T 4.13E-05
AK005756 D1Mit216 1 80.32 T 2.13E-05
AK017332 D1Mit134 1 80.78 T 9.01E-06
NM.011131 D1Mit134 1 80.78 T 7.13E-05
NM.019976 D1Mit216 1 80.32 T 5.08E-06
NM.019976 D1Mit134 1 80.78 T 5.34E-06
NM.021506 D1Mit216 1 80.32 T 1.32E-05
NM.021506 D1Mit134 1 80.78 T 3.16E-06
AB010330 D8Mit124 8 1139.72 T 6.24E-05
NM.008935 D8Mit124 8 1139.72 T 1.79E-05
NM.009913 D1Mit19 1 74.13 T 2.77E-05
NM.009913 D1Mit216 1 80.32 T 2.35E-06
NM.009913 D1Mit134 1 80.78 T 3.11E-07
NM.009913 D1Mit83 1 86.03 T 3.39E-05
L07051 D1Mit216 1 80.32 T 2.54E-05
L07051 D1Mit134 1 80.78 T 2.80E-05
L07051 D1Mit83 1 86.03 T 2.79E-05
NM.016912 D19Mit68 19 2369.61 T 3.04E-05
AF061933 D1Mit178 1 69.13 T 6.39E-05
AF061933 D1Mit128 1 72.96 T 5.48E-05
AF061933 D1Mit19 1 74.13 T 1.63E-05
NM.008367 D1Mit218 1 127.28 T 2.45E-05

J04847 D1Mit128 1 72.96 T 6.01E-05
J04847 D1Mit19 1 74.13 T 3.33E-05
J04847 D1Mit134 1 80.78 T 1.16E-05
NM.013641 11.085.885 11 1587.95 T 3.36E-05
NM.013641 D11Mit34 11 1587.97 T 3.08E-05
NM.008154 D1Mit178 1 69.13 T 1.21E-05
NM.008154 D1Mit128 1 72.96 T 1.59E-06
NM.008154 01.070.445 1 73.31 T 2.66E-05
NM.008154 D1Mit19 1 74.13 T 3.28E-08
NM.008154 D1Mit181 1 74.94 T 1.19E-06
AK018143 D1Mit19 1 74.13 T 4.89E-05
BC007170 D1Mit128 1 72.96 T 6.27E-05
BC007170 D1Mit19 1 74.13 T 2.33E-05
BC007170 D1Mit216 1 80.32 T 1.55E-06
BC007170 D1Mit134 1 80.78 T 1.86E-07
AK010983 D1Mit134 1 80.78 T 1.05E-05
AK010715 D1Mit83 1 86.03 T 5.49E-05
BC011338 D19Mit109 19 2372.17 T 4.38E-05
BC007160 D10Mit122 10 1481.94 T 4.83E-05
BC008274 D1Mit216 1 80.32 T 3.15E-05
BC008274 D1Mit134 1 80.78 T 3.41E-05
M25571 D17Mit221 17 2270.31 C 4.75E-02
AK012475 D1Mit134 1 80.78 T 2.90E-05
AK015594 D1Mit216 1 80.32 T 4.28E-05
AK015594 D1Mit134 1 80.78 T 1.35E-05
AK003495 D1Mit216 1 80.32 T 1.23E-05
AK003495 D1Mit134 1 80.78 T 6.55E-06
AK007473 D12Mit150 12 1742.65 C 2.34E-02
AK003662 D1Mit178 1 69.13 T 6.73E-05

185 Signicant linkages in the BxD panel

Table A.1 – continued from previous page Transcript Locus Chr Location (Mb) Cis/Trans P AK003662 D1Mit128 1 72.96 T 7.66E05 AK003662 D1Mit19 1 74.13 T 2.50E05 AK003662 D1Mit216 1 80.32 T 2.47E05 AK003662 D1Mit134 1 80.78 T 6.73E06

Table A.2: Linkage hits in cis and in trans for expression levels measured in BXD kidney preparations. Significant linkage is defined as P < 0.05, with genome–wise adjustment for trans effects.

Transcript  Locus       Chr  Location (Mb)  Cis/Trans  P
NM_009902   D11Mit337   11   1627.76        C          2.66E-03
NM_015729   DXMit186    X    2582.74        C          1.24E-03
AF041970    D4Mit344    4    689.95         C          1.04E-02
AB053181    D8Mit156    8    1253.28        C          3.09E-02
NM_009008   14.035.300  14   1898.47        T          5.31E-05
NM_008761   D12Mit150   12   1742.65        C          2.70E-02
AK013743    D6Mit14     6    986.00         C          2.03E-02
AK008891    D2Mit457    2    376.81         C          4.47E-02
NM_008097   D17Mit221   17   2270.31        C          2.38E-02
AK013368    D16Mit47    16   2144.43        T          4.08E-05
AK012465    DXMsw087    X    2522.85        T          5.16E-05
AK002376    D8Mit156    8    1253.28        C          9.37E-03
AF033665    D10Mit179   10   1495.44        C          4.96E-02
NM_026318   05.142.105  5    835.33         C          4.44E-02
M25572      D9Mit151    9    1375.37        C          3.60E-03
AK006597    D9Mit151    9    1375.37        C          1.32E-02
AK013917    D9Mit151    9    1375.37        C          4.14E-02
NM_009466   D8Mit287    8    1148.94        T          2.50E-05
NM_015749   D17Mit187   17   2259.38        T          3.56E-05
NM_013697   D15Mit35    15   2082.50        C          4.20E-02
NM_008530   D16Mit86    16   2176.22        C          4.91E-03
AK006546    D8Mit124    8    1139.72        T          5.37E-05
AK018157    D12Mit150   12   1742.65        C          2.15E-02
NM_016768   D6Mit14     6    986.00         C          1.72E-03
NM_011186   D8Mit189    8    1158.19        T          4.90E-05
NM_011186   D8Mit24     8    1158.56        T          6.20E-05
NM_011186   D8Mit294    8    1163.43        T          6.23E-05
BC003922    D2Mit12     2    298.86         T          4.77E-05
NM_024197   D18Mit144   18   2361.14        C          4.92E-02
NM_013489   D1Mit155    1    194.32         C          1.12E-02
NM_009574   D9Mit297    9    1287.93        T          3.90E-05
AK020768    D10Mit179   10   1495.44        C          4.04E-02
M17474      D7Mit259    7    1123.02        C          4.42E-03
AK014763    D2Mit457    2    376.81         C          2.90E-02
BC004679    D17Mit221   17   2270.31        C          3.53E-02
BC005637    D17Mit221   17   2270.31        C          3.31E-02
NM_018798   D13Mit35    13   1862.73        C          3.67E-02
AK005397    D8Mit124    8    1139.72        T          1.81E-05
AK011901    D6Mit14     6    986.00         C          9.94E-03
AL359935    D15Mit35    15   2082.50        C          4.54E-02
AK016182    D14Mit131   14   1975.00        C          1.38E-02
AF133912    D8Mit156    8    1253.28        C          1.99E-02
NM_025961   D1Mit155    1    194.32         C          3.39E-02
AK016188    D8Mit289    8    1152.59        T          5.31E-05
NM_010933   D3Mit19     3    535.89         C          1.98E-02
U30239      D4Mit344    4    689.95         C          3.69E-02
Z12266      05.142.105  5    835.33         C          4.77E-02
D29931      D17Mit221   17   2270.31        C          3.08E-02
AK012261    D9Mit297    9    1287.93        T          1.62E-05
NM_008066   D2Mit457    2    376.81         C          3.48E-02
AK012602    D9Mit151    9    1375.37        C          1.82E-02
AK011952    D6Mit14     6    986.00         C          3.87E-02
AK018196    D17Mit221   17   2270.31        C          3.96E-02
AK014501    D18Mit144   18   2361.14        T          6.70E-05
NM_008489   D11Mit337   11   1627.76        C          2.82E-02
NM_009841   D18Mit144   18   2361.14        C          1.13E-02
NM_011797   16.018.905  16   2103.78        T          2.86E-05
AK015562    D12Mit150   12   1742.65        C          2.39E-02
AK002929    D6Mit14     6    986.00         C          3.96E-02
NM_010371   D6Mit14     6    986.00         C          3.21E-02
D50393      D16Mit86    16   2176.22        C          1.28E-02
AK005218    D8Mit156    8    1253.28        C          1.75E-02
AK008255    D11Mit337   11   1627.76        C          1.64E-02
AK002535    D2Mit457    2    376.81         C          3.82E-02
NM_013836   D2Mit457    2    376.81         C          1.03E-02
Y15910      D6Mit14     6    986.00         T          3.89E-05
NM_008770   D9Mit297    9    1287.93        T          6.04E-05
AK004803    D15Mit13    15   1981.79        T          4.70E-05
NM_019484   10.104.590  10   1483.68        T          3.77E-05
AK013272    D1Mit155    1    194.32         C          3.42E-02
AF138745    D9Mit151    9    1375.37        C          4.02E-02
X96700      D1Mit155    1    194.32         C          2.64E-02
X96700      10.104.590  10   1483.68        T          6.94E-05
AF041890    D9Mit151    9    1375.37        C          3.93E-02
AK010122    D4Mit344    4    689.95         C          4.07E-02
NM_009121   D4Mit344    4    689.95         C          2.20E-02
NM_008098   D2Mit238    2    229.95         T          3.46E-06
AB036742    D1Mit128    1    72.96          T          9.51E-07
AB036742    01.070.445  1    73.31          T          2.84E-05
AB036742    D1Mit19     1    74.13          T          1.08E-05
AF217545    D8Mit335    8    1144.74        T          2.37E-05
U58670      D8Mit289    8    1152.59        T          5.13E-05
U58670      D8Mit294    8    1163.43        T          6.72E-05
U58670      D8Mit339    8    1164.38        T          2.04E-05
AK019605    05.142.105  5    835.33         C          4.46E-02
NM_025461   D12Mit150   12   1742.65        C          1.06E-02
AK015084    D16Mit86    16   2176.22        C          1.88E-02
NM_016876   D12Mit36    12   1687.81        T          5.93E-05
NM_007570   D2Mit457    2    376.81         C          7.46E-03
M35732      D10Mit179   10   1495.44        C          3.09E-02
NM_007544   D15Mit13    15   1981.79        T          1.22E-05
AK008190    D12Mit150   12   1742.65        C          3.93E-02
M83099      D11Mit337   11   1627.76        C          2.35E-02
AK014977    D9Mit151    9    1375.37        C          3.16E-02
AK003448    D8Mit124    8    1139.72        T          5.34E-05
AK003448    D8Mit287    8    1148.94        T          5.56E-05
AK003937    D1Mit19     1    74.13          T          6.93E-05
AK008476    D1Mit128    1    72.96          T          7.10E-05
AK008476    D1Mit19     1    74.13          T          7.15E-05
BC013667    D1Mit128    1    72.96          T          6.79E-05
BC013667    01.070.445  1    73.31          T          6.71E-05
AK003596    D1Mit155    1    194.32         C          4.51E-02
NM_023149   D15Mit35    15   2082.50        C          9.01E-03
NM_025829   DXMit186    X    2582.74        C          2.50E-02
AK007640    DXMit186    X    2582.74        C          4.07E-02
BC002090    D13Mit35    13   1862.73        C          3.99E-02
NM_009557   D6Mit14     6    986.00         C          4.21E-02
AF144627    D8Mit156    8    1253.28        C          4.00E-02
AF041974    D1Mit155    1    194.32         C          4.09E-02
Z70662      D16Mit86    16   2176.22        C          1.11E-02
AK003303    D6Mit14     6    986.00         C          4.09E-02
AK016323    D11Mit337   11   1627.76        C          3.57E-02
AK021043    D8Mit156    8    1253.28        C          1.10E-02
BC005630    D6Mit14     6    986.00         C          3.09E-02
NM_009093   02.151.240  2    340.66         T          8.24E-06
NM_009093   D2Mit282    2    344.48         T          1.61E-05
NM_009093   D2Mit493    2    349.91         T          8.73E-06
NM_008067   D7Mit259    7    1123.02        C          4.50E-02
NM_019811   D13Mit35    13   1862.73        C          3.82E-02
D90156      D4Mit344    4    689.95         C          1.82E-02
AF041902    D9Mit151    9    1375.37        C          4.80E-02
NM_026612   D11Mit337   11   1627.76        C          4.60E-02
AK006508    D8Mit156    8    1253.28        C          3.10E-02
NM_007747   D15Mit71    15   2058.42        T          5.58E-05
NM_007747   D15Mit158   15   2059.11        T          6.11E-06
Z12568      D11Mit337   11   1627.76        C          4.75E-02
BC010822    D6Mit14     6    986.00         C          4.07E-02
AK017171    D8Mit124    8    1139.72        T          3.52E-06
AK017171    D8Mit3      8    1148.38        T          7.30E-05
AK017171    D8Mit287    8    1148.94        T          7.00E-05
BC003429    D9Mit297    9    1287.93        T          3.65E-05
BC003429    D9Mit91     9    1291.14        T          1.11E-05
AK015488    D13Mit35    13   1862.73        C          1.59E-02
NM_009984   D18Mit144   18   2361.14        C          4.72E-02
AF041950    D4Mit344    4    689.95         C          4.61E-03
AK005134    D3Mit19     3    535.89         C          4.34E-02
AK003956    D16Mit86    16   2176.22        C          1.78E-02
NM_026308   D9Mit297    9    1287.93        T          2.01E-05
NM_026308   D9Mit91     9    1291.14        T          3.59E-05
NM_017395   D12Mit150   12   1742.65        C          3.51E-02
AF229643    D9Mit151    9    1375.37        C          6.66E-03
NM_008625   D1Mit155    1    194.32         C          7.51E-04
NM_009291   D10Mit179   10   1495.44        C          3.09E-02
AK015388    D4Mit344    4    689.95         C          4.66E-02
Z12253      05.142.105  5    835.33         C          3.65E-02
NM_025797   D11Mit337   11   1627.76        C          1.68E-02
AK014240    D4Mit344    4    689.95         C          2.42E-02
NM_021455   D11Mit337   11   1627.76        C          2.02E-02
NM_010564   08.025.220  8    1150.47        T          3.85E-05
NM_010564   D8Mit289    8    1152.59        T          1.51E-05
NM_010564   D8Mit189    8    1158.19        T          4.87E-05
NM_028595   D12Mit150   12   1742.65        C          1.33E-02
X04097      D7Mit343    7    1010.09        T          2.91E-06
X04097      D7Mit155    7    1010.71        T          1.05E-07
X04097      D7Mit225    7    1014.32        T          1.54E-10
X04097      D7Mit227    7    1015.57        T          8.60E-10
X04097      D7Mit228    7    1017.99        T          6.72E-05
NM_011475   D1Mit155    1    194.32         C          1.48E-02
AK018565    D6Mit14     6    986.00         C          9.81E-03
AK016214    D3Mit19     3    535.89         C          2.90E-02
BC011259    D1Mit128    1    72.96          T          6.48E-05
BC011259    D1Mit19     1    74.13          T          3.03E-05
BC011259    D2Mit14     2    281.66         T          7.38E-05
AY035213    D10Mit179   10   1495.44        C          1.87E-03
NM_025699   DXMit186    X    2582.74        C          4.09E-03
AK014992    DXMit186    X    2582.74        C          5.51E-03
AF155355    D18Mit144   18   2361.14        C          4.59E-02
AK010079    D11Mit337   11   1627.76        C          1.33E-02
Z17401      D12Mit235   12   1669.88        T          1.49E-05
AK015610    D11Mit337   11   1627.76        C          4.77E-02
BC003975    D11Mit337   11   1627.76        C          3.05E-02
AK009575    D12Mit150   12   1742.65        C          4.27E-02
NM_021481   D14Mit131   14   1975.00        C          4.41E-02
AF396656    D9Mit297    9    1287.93        T          2.63E-05
AF396656    D9Mit91     9    1291.14        T          1.91E-05
AK009373    D9Mit151    9    1375.37        C          4.47E-02
NM_009280   D9Mit151    9    1375.37        C          8.95E-04
AF176530    D14Mit131   14   1975.00        C          1.19E-02
AK018430    D12Mit150   12   1742.65        C          1.04E-02
AK019407    D8Mit294    8    1163.43        T          6.46E-05
BC013803    D1Mit155    1    194.32         C          7.45E-03
X81716      D8Mit289    8    1152.59        T          1.51E-05
NM_013496   D17Mit221   17   2270.31        C          4.69E-02
NM_008623   D19Mit6     19   2426.62        C          3.10E-02
AK016507    D12Mit150   12   1742.65        C          4.37E-02
BC012974    D10Mit179   10   1495.44        C          3.11E-02
AK005379    D8Mit287    8    1148.94        T          5.44E-05
X12905      D11Mit337   11   1627.76        C          1.47E-02
BC003974    D8Mit156    8    1253.28        C          2.03E-02
AK013992    D6Mit14     6    986.00         C          4.68E-02
X67789      D15Mit29    15   2053.03        T          7.22E-06
AK010782    D7Mit259    7    1123.02        C          3.31E-02
AK018387    D3Mit19     3    535.89         C          2.14E-03
AK007278    D2Mit457    2    376.81         C          1.52E-02
NM_008218   Src         2    353.40         T          4.61E-05
NM_008218   D2Mit411    2    355.52         T          4.15E-05
NM_011972   D11Mit337   11   1627.76        C          6.60E-03
NM_008303   D3Mit76     3    472.32         T          4.43E-05
NM_008303   D3Mit189    3    478.21         T          7.36E-05
AF155353    D6Mit14     6    986.00         C          3.97E-02
Z22098      D11Mit337   11   1627.76        C          2.82E-02
NM_023455   05.142.105  5    835.33         C          4.69E-02
AK020496    D1Mit155    1    194.32         C          2.01E-02
NM_021484   D14Mit131   14   1975.00        C          3.70E-02
NM_007409   D3Mit189    3    478.21         T          3.58E-05
NM_008437   DXMsw076    X    2512.01        T          6.08E-05
BC015283    05.142.105  5    835.33         C          3.69E-02
NM_007823   D16Mit86    16   2176.22        C          2.98E-02
NM_008968   D18Mit144   18   2361.14        C          1.90E-02
NM_009776   D12Mit150   12   1742.65        C          4.34E-02
AK013398    D14Mit131   14   1975.00        C          4.19E-02
NM_033174   D13Mit35    13   1862.73        C          1.92E-02
AK005159    D13Mit35    13   1862.73        C          4.92E-02
AK014238    D2Mit457    2    376.81         C          3.69E-03
AK004474    D13Mit35    13   1862.73        C          3.79E-02
AK019638    D18Mit144   18   2361.14        T          2.65E-05
NM_019472   D6Mit14     6    986.00         C          9.29E-03
AK014529    D3Mit189    3    478.21         T          7.12E-05
BC009153    D14Mit131   14   1975.00        C          2.35E-02
AK002997    D9Mit151    9    1375.37        C          1.94E-03
AK016395    D8Mit189    8    1158.19        T          4.67E-06
AK016395    D8Mit24     8    1158.56        T          2.38E-05
AK016395    D8Mit294    8    1163.43        T          1.09E-06
AK016395    D8Mit339    8    1164.38        T          4.77E-05
AK006240    D19Mit6     19   2426.62        C          4.69E-02
AK008598    D2Mit457    2    376.81         C          2.92E-02
M13446      D1Mit155    1    194.32         C          3.02E-02
NM_008357   D9Mit151    9    1375.37        C          2.15E-02
NM_008059   D2Mit457    2    376.81         C          4.17E-02
AF041952    D7Mit259    7    1123.02        C          3.86E-02
AF367247    D3Mit19     3    535.89         C          4.50E-02
AK004503    D9Mit297    9    1287.93        T          3.02E-06
AK004503    D9Mit91     9    1291.14        T          1.96E-05
AK019624    D12Mit150   12   1742.65        C          4.37E-02
AK014037    D10Mit179   10   1495.44        C          1.99E-02
AK015767    D13Mit35    13   1862.73        C          4.17E-02
AK011170    D14Mit121   14   1903.96        T          3.78E-05
AK011170    D14Mit214   14   1905.14        T          4.16E-05
AK011170    D14Mit5     14   1914.38        T          4.37E-05
AK009774    D12Mit150   12   1742.65        C          3.91E-02
NM_007381   D18Mit144   18   2361.14        C          1.36E-02
NM_011616   D2Mit457    2    376.81         C          2.48E-02
NM_026616   D1Mit155    1    194.32         C          3.63E-02
AK016404    D2Mit457    2    376.81         C          6.32E-03
AK005643    D3Mit19     3    535.89         C          1.13E-02
AK015044    D10Mit179   10   1495.44        C          3.75E-02
AK013716    D8Mit289    8    1152.59        T          4.88E-05
Y17471      D6Mit14     6    986.00         C          1.35E-02
AK017333    D16Mit86    16   2176.22        C          2.33E-02
AK015576    D14Mit131   14   1975.00        C          3.01E-02
AK017574    D1Mit155    1    194.32         C          2.63E-02
AK016298    DXMit186    X    2582.74        C          2.14E-02
AF232011    D16Mit86    16   2176.22        C          1.55E-02
AK020281    D7Mit228    7    1017.99        T          5.48E-05
NM_026128   D3Mit19     3    535.89         C          3.26E-02
AK017831    D11Mit337   11   1627.76        C          2.04E-02
AK004787    D6Mit14     6    986.00         C          5.00E-02
NM_013868   D7Mit259    7    1123.02        C          3.57E-02
AK007104    D9Mit151    9    1375.37        C          6.45E-03
BC012230    D3Mit19     3    535.89         C          3.05E-02

Table A.3: Linkage hits in cis and in trans for expression levels measured in BXD liver preparations. Significant linkage is defined as P < 0.05, with genome–wise adjustment for trans effects.

Transcript  Locus       Chr  Location (Mb)  Cis/Trans  P
NM_017370   18.012.470  18   2290.74        T          7.38E-06
NM_008003   D6Mit298    6    878.70         T          5.82E-05
NM_021610   D13Mit106   13   1835.88        T          6.10E-05
X59306      D19Mit6     19   2426.62        C          1.43E-02
NM_009902   D9Mit227    9    1295.44        T          6.76E-05
BC008264    D5Mit139    5    813.88         T          1.74E-05
BC008264    D5Mit370    5    815.22         T          1.37E-05
U35650      D4Mit237    4    580.06         T          2.31E-05
U35650      D4Mit214    4    582.87         T          3.02E-09
U35650      D4Mit111    4    590.76         T          7.00E-06
U35650      D4Mit288    4    594.00         T          4.21E-06
U35650      D4Mit17     4    599.66         T          1.50E-05
AK019334    D11Mit34    11   1587.97        T          1.57E-05
NM_009854   D19Mit6     19   2426.62        C          4.17E-02
Z12284      D4Mit171    4    560.38         T          6.53E-05
S60870      D11Mit337   11   1627.76        C          8.13E-03
AB010331    D6Mit298    6    878.70         T          5.38E-05
NM_021384   D19Mit6     19   2426.62        C          3.72E-02
NM_018739   D9Mit227    9    1295.44        T          5.77E-05
Z12419      D2Mit457    2    376.81         C          2.01E-02
M23894      D17Mit175   17   2212.04        T          5.46E-05
AK015651    D16Mit86    16   2176.22        C          3.72E-02
AK015651    D11Mit154   11   1560.78        T          2.26E-05
AK015651    D11Mit131   11   1564.89        T          7.05E-05
AB018421    D1Mit155    1    194.32         C          2.57E-02
NM_007762   D12Mit231   12   1727.16        T          1.21E-05
NM_007762   D12Mit101   12   1729.13        T          8.41E-06
NM_007762   D12Mit289   12   1729.64        T          7.20E-06
M16073      16.044.855  16   2130.05        T          4.01E-05
AK004313    D17Mit221   17   2270.31        C          2.17E-02
AK020767    D15Mit35    15   2082.50        C          1.91E-02
BC005803    D16Mit86    16   2176.22        C          3.49E-02
NM_010924   D2Mit436    2    275.87         T          2.17E-05
NM_010924   D9Mit4      9    1306.17        T          6.51E-05
NM_010924   D9Mit21     9    1311.77        T          7.58E-05
AK003943    D19Mit53    19   2410.66        T          2.48E-05
AK003943    D19Mit103   19   2419.31        T          9.11E-06
NM_020009   D18Mit144   18   2361.14        C          1.28E-02
NM_008317   D15Mit35    15   2082.50        C          4.29E-02
M93980      D4Mit249    4    662.10         T          3.78E-05
NM_013697   D4Mit12     4    660.71         T          6.33E-05
AK006931    D3Mit19     3    535.89         C          4.13E-02
AK018636    D4Mit344    4    689.95         C          2.10E-02
NM_024215   D11Mit337   11   1627.76        C          3.28E-02
AK007274    05.142.105  5    835.33         C          2.59E-02
AK016223    D11Mit337   11   1627.76        C          3.66E-05
NM_011430   D6Mit14     6    986.00         C          3.04E-02
AK020193    D12Mit289   12   1729.64        T          4.01E-06
AK015950    D11Mit337   11   1627.76        C          3.43E-02
AK013783    D12Mit150   12   1742.65        C          1.92E-02
NM_010739   D4Mit344    4    689.95         C          1.92E-02
NM_009125   D6Mit14     6    986.00         C          5.50E-03
AF179403    D14Mit131   14   1975.00        C          7.38E-03
AK014359    11.059.515  11   1566.46        T          3.58E-05
NM_010225   D17Mit221   17   2270.31        C          5.00E-02
AK005784    D10Mit179   10   1495.44        C          4.00E-02
U04353      D9Mit151    9    1375.37        C          3.83E-02
NM_008688   DXMit186    X    2582.74        C          4.12E-02
NM_008953   D13Mit35    13   1862.73        C          2.60E-02
NM_010187   D2Mit457    2    376.81         C          3.44E-03
Z12440      D9Mit151    9    1375.37        C          3.34E-02
AK019572    D6Mit14     6    986.00         C          4.29E-02
NM_010933   D7Mit259    7    1123.02        C          3.90E-02
NM_019414   D7Mit246    7    1009.87        T          6.56E-05
AB041555    D11Mit34    11   1587.97        T          3.45E-05
AF041884    DXMit186    X    2582.74        C          4.87E-02
AK005665    D1Mit155    1    194.32         C          4.10E-02
AF387322    D11Mit337   11   1627.76        C          5.67E-03
NM_008884   D19Mit6     19   2426.62        C          3.39E-02
L04150      D18Mit83    18   2296.35        T          6.22E-05
NM_008836   D11Mit337   11   1627.76        C          4.00E-02
AK004494    D12Mit101   12   1729.13        T          4.60E-05
NM_008898   D11Mit34    11   1587.97        T          5.95E-05
AK017880    D4Mit344    4    689.95         C          4.74E-02
AK011390    D15Mit13    15   1981.79        T          6.74E-05
NM_009209   D4Mit344    4    689.95         C          1.93E-02
AK008255    D19Mit6     19   2426.62        C          4.51E-02
AK019620    D1Mit155    1    194.32         C          4.40E-02
NM_013836   05.142.105  5    835.33         C          2.93E-02
AK013272    D11Mit337   11   1627.76        C          1.29E-02
AK007715    D2Mit457    2    376.81         C          4.95E-02
AK020041    D11Mit337   11   1627.76        C          2.54E-03
NM_008280   D6Mit14     6    986.00         C          3.42E-02
NM_010918   D12Mit150   12   1742.65        C          1.58E-02
AB035322    D1Mit145    1    167.32         T          7.37E-05
NM_021535   D11Mit337   11   1627.76        C          1.51E-02
NM_008183   D18Mit60    18   2308.08        T          2.94E-05
AF093591    D12Mit150   12   1742.65        C          1.74E-02
AK003937    D11Mit34    11   1587.97        T          5.36E-05
BC013667    D11Mit245   11   1585.67        T          7.09E-05
BC013667    11.085.885  11   1587.95        T          2.04E-06
BC013667    D11Mit34    11   1587.97        T          8.55E-07
AK016150    D4Mit344    4    689.95         C          3.47E-02
NM_007591   D1Mit134    1    80.78          T          7.10E-05
NM_010024   D4Mit214    4    582.87         T          1.20E-05
NM_010024   D4Mit111    4    590.76         T          1.32E-05
NM_011514   D5Mit139    5    813.88         T          3.94E-05
NM_011514   D5Mit370    5    815.22         T          7.01E-05
NM_011514   D8Mit124    8    1139.72        T          1.55E-05
NM_011514   D8Mit335    8    1144.74        T          4.56E-05
AK017586    D12Mit114   12   1697.30        T          9.70E-06
AK017586    D12Mit251   12   1698.22        T          1.05E-05
AK003192    D6Mit14     6    986.00         C          4.68E-02
NM_021877   D1Mit155    1    194.32         C          4.66E-02
AJ272393    D7Mit259    7    1123.02        C          1.35E-02
BC005588    D15Mit35    15   2082.50        C          4.30E-02
NM_007618   D3Mit19     3    535.89         C          4.52E-02
NM_009255   D16Mit86    16   2176.22        C          4.67E-02
AK013045    D11Mit154   11   1560.78        T          6.29E-05
NM_028602   D13Mit35    13   1862.73        C          2.41E-02
AF190624    D12Mit289   12   1729.64        T          3.12E-07
AF190624    D12Mit167   12   1731.75        T          4.51E-05
AF190624    D12Msw102   12   1734.49        T          5.39E-06
AF190624    D12Mit280   12   1737.88        T          1.79E-06
AK004654    04.147.400  4    684.00         T          3.62E-05
AK020868    D16Mit86    16   2176.22        C          1.85E-02
BC006779    D8Mit335    8    1144.74        T          9.80E-07
BC006779    D8Mit3      8    1148.38        T          1.83E-05
BC006779    D8Mit287    8    1148.94        T          9.32E-06
BC006779    08.025.220  8    1150.47        T          1.30E-05
BC006779    D8Mit289    8    1152.59        T          2.76E-06
BC006779    D8Mit189    8    1158.19        T          2.07E-10
BC006779    D8Mit24     8    1158.56        T          6.15E-08
BC006779    D8Mit294    8    1163.43        T          2.97E-08
BC006779    D8Mit339    8    1164.38        T          4.03E-06
NM_008946   D17Mit221   17   2270.31        C          3.98E-02
NM_008946   18.012.470  18   2290.74        T          2.09E-05
U60001      D6Mit14     6    986.00         C          4.98E-02
AK008378    D19Mit6     19   2426.62        C          4.14E-02
AK009021    D8Mit124    8    1139.72        T          4.58E-05
NM_010885   D6Mit14     6    986.00         C          5.32E-03
BC008111    D4Mit344    4    689.95         C          2.86E-02
AF398969    D7Mit259    7    1123.02        C          3.30E-02
AK007896    D18Mit144   18   2361.14        C          3.85E-02
AF414080    D12Mit289   12   1729.64        T          6.76E-06
AK013887    D1Mit216    1    80.32          T          3.81E-05
AK003534    D17Mit100   17   2206.37        T          1.36E-05
AK012518    04.150.225  4    686.80         T          6.04E-05
AK015045    D4Mit344    4    689.95         C          1.83E-02
AK015795    D1Mit83     1    86.03          T          2.21E-05
AK014176    D2Mit457    2    376.81         C          3.06E-02
AK018150    03.106.500  3    484.85         T          5.72E-06
AK018150    D3Mit106    3    489.72         T          2.86E-05
NM_013645   D9Mit151    9    1375.37        C          2.29E-02
AK007993    D16Mit86    16   2176.22        C          7.96E-03
AK004777    D6Mit14     6    986.00         C          2.23E-02
NM_009303   D7Mit259    7    1123.02        C          3.81E-02
NM_013602   D11Mit337   11   1627.76        C          4.19E-02
AK007270    D1Mit155    1    194.32         C          2.32E-03
BC012437    D4Mit344    4    689.95         C          6.66E-03
AK006264    D16Mit86    16   2176.22        C          5.00E-02
AK016507    D9Mit359    9    1362.13        T          1.13E-06
AK016507    D9Mit212    9    1362.53        T          3.70E-06
AK013668    17.004.660  17   2188.24        T          6.36E-05
NM_008439   D1Mit216    1    80.32          T          9.67E-06
NM_008439   D1Mit134    1    80.78          T          1.30E-05
NM_009474   03.106.500  3    484.85         T          2.43E-05
AK009010    D9Mit227    9    1295.44        T          9.79E-06
M28684      D3Mit19     3    535.89         C          3.80E-02
NM_025663   D15Mit35    15   2082.50        C          4.67E-02
AK006223    D16Mit86    16   2176.22        C          3.49E-02
AK011881    D15Mit35    15   2082.50        C          3.75E-02
AK011881    D1Mit91     1    126.29         T          6.92E-05
NM_007900   D4Mit249    4    662.10         T          1.95E-05
NM_008056   D1Mit113    1    171.88         T          1.30E-05
AK005861    D3Mit155    3    464.91         T          4.30E-05
NM_010402   D12Mit167   12   1731.75        T          5.57E-05
NM_030724   D12Mit150   12   1742.65        C          4.07E-02
AK019990    D1Mit216    1    80.32          T          7.22E-05
M23016      D7Mit259    7    1123.02        C          2.23E-02
AK020023    D1Mit83     1    86.03          T          4.27E-05
NM_020600   D2Mit457    2    376.81         C          4.25E-02
NM_021454   D7Mit259    7    1123.02        C          1.90E-02
NM_007982   D15Mit35    15   2082.50        C          8.28E-03
NM_019871   D3Mit19     3    535.89         C          4.93E-02
AK003393    D6Mit14     6    986.00         C          3.30E-02
NM_008224   D14Mit131   14   1975.00        C          4.79E-02
AF204959    D14Mit131   14   1975.00        C          7.79E-04
NM_008968   D8Mit156    8    1253.28        C          3.69E-02
BC010790    D7Mit259    7    1123.02        C          1.83E-02
M60493      D18Mit83    18   2296.35        T          5.61E-05
NM_019472   D2Mit457    2    376.81         C          1.25E-02
NM_008039   D11Mit337   11   1627.76        C          1.97E-02
NM_008039   D3Mit348    3    504.81         T          1.09E-05
NM_008039   D3Mit14     3    509.87         T          1.93E-11
NM_008039   D3Mit291    3    514.22         T          5.41E-10
NM_008039   D3Mit351    3    517.45         T          4.94E-08
NM_008039   D3Mit127    3    521.04         T          2.67E-05
AK002717    D2Mit296    2    227.15         T          4.58E-05
AK016395    D8Mit335    8    1144.74        T          2.97E-05
AK016395    08.025.220  8    1150.47        T          2.32E-05
AK016395    D8Mit289    8    1152.59        T          5.74E-07
AK016395    D8Mit189    8    1158.19        T          3.19E-12
AK016395    D8Mit24     8    1158.56        T          9.57E-11
AK016395    D8Mit294    8    1163.43        T          2.56E-08
AK016395    D8Mit339    8    1164.38        T          6.90E-06
NM_008028   D13Mit35    13   1862.73        C          1.06E-02
NM_019498   D15Mit35    15   2082.50        C          3.44E-02
NM_008249   13.043.620  13   1791.43        T          7.44E-05
NM_008249   D13Mit91    13   1792.73        T          4.78E-05
NM_008249   D13Mit94    13   1793.63        T          3.47E-05
AY035893    D13Mit254   13   1818.85        T          5.19E-05
AK006696    D7Mit259    7    1123.02        C          4.88E-02
Z25469      D2Mit296    2    227.15         T          5.28E-05
Z25469      D18Mit15    18   2310.58        T          2.41E-05
Z25469      D18Mit94    18   2313.66        T          5.56E-06
AK004859    13.069.159  13   1812.67        T          6.71E-05
NM_025994   D7Mit259    7    1123.02        C          4.64E-03
NM_025994   D12Mit3     12   1703.22        T          6.33E-06
AK010344    D8Mit3      8    1148.38        T          7.44E-05
AK010344    D11Mit245   11   1585.67        T          5.82E-05
NM_010317   D1Mit83     1    86.03          T          3.77E-05
NM_010317   D1Mit84     1    91.83          T          2.45E-05
NM_010317   D1Mit45     1    92.95          T          5.42E-05
L10894      D7Mit259    7    1123.02        C          1.72E-02
AK008010    D7Mit259    7    1123.02        C          5.49E-03
AK013456    D14Mit131   14   1975.00        C          1.33E-02
AK003662    D12Mit150   12   1742.65        C          7.43E-03
AK017528    D4Mit344    4    689.95         C          9.03E-03

Appendix B

Over–represented GO terms in correlation clusters

Table B.1: Clusters of correlated transcripts with over–represented GO terms. Each of 1089 transcripts with mappable expression levels (Chapter 4) was used to seed a cluster; membership is determined as an absolute correlation coefficient greater than the 95th quantile of coefficients. Terms are significantly over–represented at P < 0.05, adjusted for the number of clusters tested. Cluster numbering is arbitrary.

Cluster  Over–represented GO terms
5        kinase activity
21       kinase activity
43       ATP binding
73       transferase activity
130      regulation of nucleobase, nucleoside, nucleotide and nucleic acid metabolism
186      cellular process
240      cellular process
277      regulation of nucleobase, nucleoside, nucleotide and nucleic acid metabolism
         regulation of transcription
         regulation of transcription, DNA-dependent
         regulation of cellular process
279      phosphotransferase activity, alcohol group as acceptor
         kinase activity
327      regulation of nucleobase, nucleoside, nucleotide and nucleic acid metabolism
339      kinase activity
393      transferase activity, transferring phosphorus-containing groups
         transferase activity
         kinase activity
413      transferase activity
431      signal transduction
         cell communication
490      catabolism
         cellular catabolism
494      kinase activity
687      cell adhesion
696      negative regulation of cellular process
747      kinase activity
         transferase activity, transferring phosphorus-containing groups
857      kinase activity
         transferase activity
         transferase activity, transferring phosphorus-containing groups
990      kinase activity
1018     adenyl nucleotide binding
         ATP binding
1065     kinase activity
1070     adenyl nucleotide binding
         ATP binding
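
The membership rule stated in the caption of Table B.1 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used in the thesis: the function names (`pearson`, `quantile`, `seed_cluster`) and the toy profiles are hypothetical, Pearson correlation is assumed as the coefficient, and the 95th quantile is assumed to be taken over the absolute coefficients between the seed and every other transcript.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def quantile(values, q):
    """Empirical quantile, interpolating linearly between order statistics."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def seed_cluster(seed, expression, q=0.95):
    """Transcripts whose |r| with the seed exceeds the q-th quantile of |r|.

    Assumption: the quantile is computed over the seed's own coefficients.
    """
    abs_r = {
        name: abs(pearson(expression[seed], profile))
        for name, profile in expression.items()
        if name != seed
    }
    cutoff = quantile(list(abs_r.values()), q)
    return sorted(name for name, r in abs_r.items() if r > cutoff)

# Hypothetical toy profiles (five samples each), for illustration only.
toy = {
    "seed": [1, 2, 3, 4, 5],
    "a": [2, 4, 6, 8, 10],
    "b": [1, 3, 2, 5, 4],
    "c": [3, 1, 4, 1, 5],
    "d": [2, 1, 2, 1, 2],
}
print(seed_cluster("seed", toy))
```

Because the rule uses the absolute coefficient, strongly anti-correlated transcripts join a cluster on the same footing as positively correlated ones. Each resulting cluster would then be tested for over-represented GO terms.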

Table B.2: Clusters of correlated transcripts with over–represented GO terms in BXD kidney. Clusters were derived from 226 mappable genes, as described in the caption for Table B.1.

Cluster  Over–represented GO terms
18       cellular physiological process
         cellular process
29       cell organization and biogenesis
         organelle organization and biogenesis
58       ion binding
         metal ion binding
60       structural molecule activity
105      organismal physiological process
110      DNA binding
121      biosynthesis
129      transcription, DNA-dependent
         regulation of transcription, DNA-dependent
140      physiological process
         cellular physiological process
         metabolism
152      structural molecule activity
153      organelle organization and biogenesis
         cell organization and biogenesis
169      catabolism
         cellular catabolism
171      transcription, DNA-dependent
         regulation of transcription, DNA-dependent
         transcription regulator activity
177      DNA binding
186      nucleobase, nucleoside, nucleotide and nucleic acid metabolism
         regulation of metabolism
         regulation of cellular metabolism
         transcription
191      nucleic acid binding

Table B.3: Clusters of correlated transcripts with over–represented GO terms in BXD liver. Clusters were derived from 226 mappable genes, as described in the caption for Table B.1.

Cluster  Over–represented GO terms
12       protein binding
13       cellular biosynthesis
         biosynthesis
15       cellular physiological process
         cellular metabolism
21       cellular process
27       signal transduction
33       regulation of cellular physiological process
         nucleic acid binding
34       cellular biosynthesis
39       biosynthesis
         cellular biosynthesis
41       biosynthesis
47       signal transducer activity
         DNA binding
         binding
55       protein binding
62       cellular physiological process
63       adenyl nucleotide binding
         ATP binding
66       biosynthesis
69       cellular biosynthesis
71       nucleobase, nucleoside, nucleotide and nucleic acid metabolism
         transcription
         protein binding
         DNA binding
         transcription regulator activity
74       cellular biosynthesis
         biosynthesis
75       nucleobase, nucleoside, nucleotide and nucleic acid metabolism
84       metabolism
         cellular metabolism
         regulation of metabolism
         regulation of cellular metabolism
         primary metabolism
         transcription
         regulation of nucleobase, nucleoside, nucleotide and nucleic acid metabolism
         transcription, DNA-dependent
         regulation of transcription
         regulation of transcription, DNA-dependent
         DNA binding
85       organ development
         organogenesis
         regulation of biological process
         regulation of cellular process
         morphogenesis
         DNA binding
         transcription regulator activity
90       physiological process
         cellular physiological process
         biosynthesis
96       protein binding
101      regulation of cellular process
120      cellular biosynthesis
125      ion binding
         metal ion binding
127      physiological process
131      protein binding
132      cellular biosynthesis
         biosynthesis
133      localization
         establishment of localization
         transport
137      protein binding
140      signal transducer activity
         protein binding
