Supporting Information

Hübner et al. 10.1073/pnas.1321533111
www.pnas.org/cgi/content/short/1321533111

SI Materials and Methods

Sampling and DNA Extractions. Single nucleotide polymorphism was tested in Drosophila melanogaster isofemale lines established from females inseminated in nature and collected on the opposite slopes of Nahal Oren canyon (Carmel massif, Israel) in October 2010. All flies were reared on standard instant Drosophila food medium (Carolina Biological Supply Co.) in half-pint milk bottles at a temperature of 24 ± 1 °C and on a 12:12 light/dark cycle. For the molecular experiments, we used flies from 32 isofemale lines [16 from north facing slope (NFS) and 16 from south facing slope (SFS)]. DNA was extracted from 100 males of each line using DNAzol Genomic DNA Isolation Reagent (Molecular Research Center Inc.) according to the manufacturer's protocol. Equal amounts of ∼1 μg of DNA from each line were pooled to make slope representations. Illumina paired-end libraries were constructed and sequenced with HiSeq, 100-cycle, at ∼40× coverage per population.

Mapping Reads and Data Processing. Paired-end reads were filtered for a minimum average base quality score of 20 and a minimum length of 50 bp using PoPoolation (1). Trimmed reads were mapped to the D. melanogaster reference genome using the Burrows–Wheeler aligner (BWA) v0.6.2 (2) with the following parameters: -n 0.01 -l 100 -o 1 -d 12 -e 12. Paired-end data were merged to single files in sam format with the "sampe" option of BWA, converted to binary alignment/map format, and filtered for a minimum mapping quality of 20 using SAMtools v1.18 (3). Files of both NFS and SFS populations were further converted to a pileup file and synchronized for downstream analysis of allele frequency differences between the slopes of the canyon. For each detected SNP, differentiation between populations was calculated using both Fst and the Fisher exact test as implemented in PoPoolation2 (4).

For each of the two populations, footprints of natural selection were detected using Tajima's D calculated over nonoverlapping 10-kb windows along chromosome arms. Trimmed files of each population were converted to a pileup format separately and used to calculate population measures over a nonoverlapping window of size 10 kb in PoPoolation using a minimum allele frequency of 2, minimum quality of 20, minimum coverage of 6, and maximum coverage of 400.

Detection of Interslope Selective Differentiated Genomic Regions. To reveal genomic regions with significant interslope differentiation, we used Hidden Markov Model (HMM) analysis (5–7). HMM was used to discriminate between the distributions of three hidden states corresponding to high, moderate, and low interslope differentiation and to assign each SNP to a corresponding state. We thus applied an HMM as implemented in the R package "HiddenMarkov" (8) using the interslope Fst values obtained for each SNP from the PoPoolation2 software. Additional parameters for the HMM are the transition probability matrix between states, emission probabilities, and state distribution parameters. Evaluation of these parameters was made within the HMM framework using the expectation maximization Baum–Welch algorithm (9) for each chromosome arm separately, by enabling a maximum of 1,000 iterations to reach convergence. Initial parameters were defined from the data using a maximum likelihood approach by launching the Baum–Welch algorithm 100 times with randomly sampled values from a predefined parameter space. Single-SNP Fst values were found to follow a gamma distribution, with rate and shape parameters extending between 0–300 and 0–60, respectively. A constrained transition matrix was imposed, disallowing a direct transition between low and high states (7). For each arm, the parameters with the highest log-likelihood were used in the HMM. Assignment of SNPs to each of the detected "hidden" distributions was conducted by local decoding of each point to its a posteriori most probable state. A local decoding approach was preferred over the Viterbi global decoding algorithm (10) because the latter maximizes the joint distribution of the hidden states even at the cost of local false assignments.

To search for high-differentiation regions along chromosome arms, we used 10-kb nonoverlapping windows. For each window, the level of differentiation was represented by two parameters, LI = n_L/n_T ("low island") and HI = n_H/n_T ("high island"), where n_T is the total number of SNPs within each window whereas n_H and n_L are the numbers of SNPs assigned in the HMM step to either high- or low-differentiation states, respectively. Following this step, a permutation test was conducted to detect genomic regions with significant enrichment of HI scores. For each permutation, Fst values were taken randomly from the whole set of SNPs within a chromosome arm together with their corresponding assignment to the high-differentiation hidden state. The high-differentiation score HI was then calculated for each window. This reshuffling process was repeated 10,000 times, allowing only significantly differentiated regions to be compared with random expectations.

Significantly differentiated genomic regions between slopes were then inspected for the type and strength of selection using the Tajima's D scores for each window as obtained with PoPoolation (1). Trimmed files of each population were converted to a pileup format separately and used to calculate population measures over a nonoverlapping window of size 10 kb in PoPoolation. A score of D < −2 is indicative of a recent selective sweep (fixation of novel mutation) followed by a slow recovery of variation, thus an excess of rare alleles. On the other hand, D > 2 is indicative of small allele frequency differences due to balancing selection. Thus, combining both differentiation and selection scores can be used for detection of genomic regions corresponding to interslope differentiation caused by alternative selection (small D and significant HI).

Gene Ontology Term Analysis. To study the biological significance of genes under diversifying selection, an enrichment analysis of Gene Ontology (GO) terms was conducted with the Bioconductor package goseq (11), which accounts for the gene-length bias. Genes located in genomic regions found to be under differentiation between slopes (HI) were further tested for their association with the detected differentiation. For each SNP within each gene, a Fisher exact test was performed, and the proportion of significantly differentiated SNPs (P < 0.01) was recorded for each gene. Genes with a significantly high proportion of differentiated SNPs (P adjusted < 0.0001) were selected for downstream GO enrichment analysis. A list of all genes (significant and nonsignificant) and their corresponding sequence length was used to generate the probability weighting function, which enables one to correct for the gene-length bias. For time efficiency, the Wallenius distribution method (11) was used as an approximation for the H0 distribution. Gene length was calculated for each gene using FlyBase resources (batch download tool), based on sequence minimum and maximum locations (available for D. melanogaster reference genome v5.31). We generated a list of GO annotations for the whole genome using AmiGO v1.8 software, the official web-based set of tools for searching and browsing the Gene Ontology database (12), which served to determine significant GO terms based on the chosen criteria. Significantly overrepresented GO terms were further corrected for multiple comparisons using the Benjamini–Hochberg method (13). The Functional Annotation Clustering tool in DAVID (v. 6.7) (14, 15) was used for an in-depth GO enrichment analysis and clustering. We performed functional annotation clustering for GO terms with high-classification stringency.

Effects of Recombination. In the current study, we tested, albeit indirectly, the effect of recombination on population polymorphism in the canyon for SNP sites. The effect of recombination was tested based on the local recombination rates using data from Marais et al. (16), represented by the score r, cM/Mb, and the physical position of the gene loci in the chromosome sequence relative to the centromere (dist, Mb), known as a strong suppressor of recombination ("centromeric effect") (for a review, see ref. 17). The two variables, r and dist, allow taking into account the centromeric effect and local variation of recombination superimposed on the centromeric effect. The effect of recombination was tested for each of the large chromosomal arms using rank correlation of r and dist, with the following parameters calculated for the components of gene sequence [coding DNA sequences (CDSs), introns, 5′ UTR and 3′ UTR, and the total gene sequence]: density of polymorphic SNPs per kb (parameter all/kb) and density of SNPs significantly differentiated between the slopes (parameter sig/kb).
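The local decoding step described under "Detection of Interslope Selective Differentiated Genomic Regions" can be illustrated with a minimal Python sketch. It is not the authors' R code: the study used the "HiddenMarkov" package, whereas this sketch reimplements the scaled forward-backward recursions directly. The function names, the gamma (shape, rate) values, the transition probabilities, and the Fst series are all hypothetical, chosen only to show a three-state model with no direct low-to-high transition.

```python
import math


def gamma_pdf(x, shape, rate):
    """Gamma density for x > 0 (an Fst of exactly 0 would need special handling)."""
    return math.exp(shape * math.log(rate) + (shape - 1) * math.log(x)
                    - rate * x - math.lgamma(shape))


def local_decode(obs, init, trans, emis_params):
    """Assign each observation to its a posteriori most probable state."""
    n, k = len(obs), len(init)
    dens = [[gamma_pdf(x, *emis_params[s]) for s in range(k)] for x in obs]

    # Forward pass, rescaled at every step to avoid numerical underflow.
    alpha, scale = [], []
    for t in range(n):
        if t == 0:
            unnorm = [init[s] * dens[0][s] for s in range(k)]
        else:
            unnorm = [sum(alpha[t - 1][i] * trans[i][j] for i in range(k)) * dens[t][j]
                      for j in range(k)]
        c = sum(unnorm)
        scale.append(c)
        alpha.append([u / c for u in unnorm])

    # Backward pass, reusing the forward scaling constants.
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        beta[t] = [sum(trans[i][j] * dens[t + 1][j] * beta[t + 1][j] for j in range(k))
                   / scale[t + 1] for i in range(k)]

    # Posterior state probabilities per SNP; each row sums to 1.
    post = [[alpha[t][s] * beta[t][s] for s in range(k)] for t in range(n)]
    states = [max(range(k), key=lambda s: row[s]) for row in post]
    return states, post


# Hypothetical parameters: states 0/1/2 = low/moderate/high differentiation.
emis = [(1.0, 50.0), (2.0, 20.0), (6.0, 15.0)]   # (shape, rate), illustrative only
trans = [[0.90, 0.10, 0.00],                     # low -> high disallowed
         [0.05, 0.90, 0.05],
         [0.00, 0.10, 0.90]]                     # high -> low disallowed
init = [1 / 3, 1 / 3, 1 / 3]
fst = [0.01, 0.02, 0.01, 0.10, 0.45, 0.50, 0.40, 0.08, 0.01]  # made-up values

states, post = local_decode(fst, init, trans, emis)
```

Unlike Viterbi decoding, which returns the single most probable state path, this picks the best state at each SNP independently from its marginal posterior, which is the property the text gives as the reason for preferring local decoding.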

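The window scores HI = n_H/n_T and the permutation test from the Methods can likewise be sketched in Python. This is one reading of the procedure, under the assumption that each permutation reassigns the high-state labels at random among the SNP positions of one chromosome arm. The function names and the add-one correction to the empirical P value are illustrative choices, not taken from the paper.

```python
import random
from collections import defaultdict

WINDOW = 10_000  # 10-kb nonoverlapping windows, as in the Methods


def window_hi(positions, is_high, window=WINDOW):
    """Return {window_index: HI}, with HI = n_H / n_T for each window."""
    n_total = defaultdict(int)
    n_high = defaultdict(int)
    for pos, high in zip(positions, is_high):
        w = pos // window
        n_total[w] += 1
        n_high[w] += int(high)
    return {w: n_high[w] / n_total[w] for w in n_total}


def permutation_pvalues(positions, is_high, n_perm=10_000, seed=0, window=WINDOW):
    """Empirical one-sided P value per window for enrichment of HI."""
    rng = random.Random(seed)
    observed = window_hi(positions, is_high, window)
    exceed = {w: 0 for w in observed}
    labels = list(is_high)
    for _ in range(n_perm):
        rng.shuffle(labels)  # shuffle state assignments across the arm
        permuted = window_hi(positions, labels, window)
        for w, hi in observed.items():
            if permuted[w] >= hi:
                exceed[w] += 1
    # Add-one correction (an assumption here) keeps every P value above zero.
    return {w: (exceed[w] + 1) / (n_perm + 1) for w in observed}


# Made-up SNP positions on one arm and their HMM high-state flags.
positions = [100, 200, 300, 10_100, 10_200, 20_500]
is_high = [True, True, False, False, False, True]
hi_scores = window_hi(positions, is_high)
pvals = permutation_pvalues(positions, is_high, n_perm=200)
```

The LI score follows from the same function by passing the low-state flags instead of the high-state flags.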
1. Kofler R, et al. (2011) PoPoolation: A toolbox for population genetic analysis of next generation sequencing data from pooled individuals. PLoS ONE 6(1):e15925.
2. Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25(14):1754–1760.
3. Li H, et al.; 1000 Genome Project Data Processing Subgroup (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25(16):2078–2079.
4. Kofler R, Pandey RV, Schlötterer C (2011) PoPoolation2: Identifying differentiation between populations using sequencing of pooled DNA samples (Pool-Seq). Bioinformatics 27(24):3435–3436.
5. Boitard S, Schlötterer C, Futschik A (2009) Detecting selective sweeps: A new approach based on hidden Markov models. Genetics 181(4):1567–1578.
6. Kern AD, Haussler D (2010) A population genetic hidden Markov model for detecting genomic regions under selection. Mol Biol Evol 27(7):1673–1685.
7. Hofer T, Foll M, Excoffier L (2012) Evolutionary forces shaping genomic islands of population differentiation in humans. BMC Genomics 13:107.
8. Harte D (2009) HiddenMarkov: Hidden Markov models (Statistics Research Associates, Wellington, New Zealand).
9. Baum LE, Petrie T, Soules G, Weiss N (1970) A maximization technique occurring in statistical analyses of probabilistic functions of Markov chains. Ann Math Stat 41:164–171.
10. Viterbi AJ (1967) Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans Inf Theory 13:260–269.
11. Young MD, Wakefield MJ, Smyth GK, Oshlack A (2010) Gene ontology analysis for RNA-seq: Accounting for selection bias. Genome Biol 11(2):R14.
12. Carbon S, et al.; AmiGO Hub; Web Presence Working Group (2009) AmiGO: Online access to ontology and annotation data. Bioinformatics 25(2):288–289.
13. Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Stat Soc, B 57:289–300.
14. Huang W, Sherman BT, Lempicki RA (2009) Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 4(1):44–57.
15. Huang W, Sherman BT, Lempicki RA (2009) Bioinformatics enrichment tools: Paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res 37(1):1–13.
16. Marais G, Mouchiroud D, Duret L (2001) Does recombination improve selection on codon usage? Lessons from nematode and fly complete genomes. Proc Natl Acad Sci USA 98(10):5688–5692.
17. Korol AB (2013) Recombination. Encyclopedia of Biodiversity, ed Levin SA (Elsevier, Amsterdam), 2nd Ed.

Table S1. Genes involved in the determined BP (biological processes) obtained using GORILLA analysis of 106 slope-divergent genes

Biological process: Genes involved in the process

Cell death: Ef1gamma, p53, bun, Eip74EF, morgue, Nc, Dif, l(1)G0148, snama, Traf4, W, drpr, CycE, Traf4, pnt, L, mod(mdg4), Apc
Cell adhesion: beat-VI, beat-VII, beat-IIa, if, beat-IIIb, pnt, beat-IIIc, beat-IV, beat-Vb, brn, Apc, drpr, CycE
Regulation of developmental process: L, p53, CycA, bun, Eip74EF, Nc, CycT, Dif, hth, pnt, W, Apc, fwd, morgue, CycE, if, fj, Axn
Response to stimulus: cact, L, Mipp2, Dif, ald, W, for, beat-IIIc, fwd, to, Ir52a, Ir94b, brn, Ir94c, Aplip1, Traf4, spri, unc-104, fj, Ir54a, Dll, p53, bun, CycT, Ir94e, Fur2, Pole2, pnt, sima, Pi3K68D, drpr, Trx-2, amon, CycE, Myd88, heph, TotZ, Axn, Ir94a
Reproductive process: ste24b, cact, Dll, p53, Eip74EF, bun, Nc, pnt, sima, mod(mdg4), fwd, to, A2bp1, brn, Cct1, CycE, heph, Axn
Eye and organ development: Gug, L, bun, Nc, Cct1, eyg, hth, Traf4, cact, Dll, pnt, sima, Apc, CycE, if, Axn, fj
Gametogenesis: cact, ste24b, p53, Eip74EF, bun, sima, mod(mdg4), fwd, brn, A2bp1, Cct1, CycE, Axn
Induction of apoptosis: p53, bun, morgue, Nc, CycE, Traf4, pnt, W, mod(mdg4), snama
Response to DNA damage stimulus: cact, p53, Dif, CycE, Pole2, heph, W, sima, Pi3K68D, Trx-2
Determination of adult lifespan: p53, bun, Nc, Cct1, Indy (i'm not dead yet), fwd, Trx-2
Stress response regulation: p53, L, Aplip1, heph, Traf4, trx
Larval behavior: amon, for, unc-104, drpr
Regulation of antimicrobial peptide production and immune response: Ntf-2, cact, Dif, Myd88
Axonal defasciculation: tok, beat-IIa, if
Anterograde synaptic vesicle transport: Aplip1, unc-104

Table S2. Genes from the list of 106 slope-divergent genes falling into significantly differentiated intervals

Chromosome: Genes falling into significantly differentiated intervals (Fig. 2)

X: CG2990, involved in DNA replication, and Okazaki fragment processing; CG1673, contributing to branched-chain amino acid biosynthetic process; l(1)G0196, biological processes and biological process in which it is involved are not known
2L chromosome: CycE, known to contribute in developmental process and response to DNA damage stimulus; cact, involved in several processes such as response to DNA damage stimulus, nervous system development, oogenesis, antifungal and antimicrobial humoral response, positive regulation of antifungal peptide biosynthetic process, phagocytosis; Dif, contributing to response to stress, response to DNA damage stimulus, defense response to bacterium, biological regulation, single-organism developmental process, defense response to fungus, immune system process, positive regulation of cellular biosynthetic process, immune response
2R chromosome: muskelin, contributing to regulation of cell-matrix adhesion; spt4, participating in chromatin modification; picot, reported to be involved in phosphate ion transport; unc-104, involved in regulation of larval locomotory behavior, vesicle maturation and transport, and axon guidance; Ir54a, involved in detection of chemical stimulus; ste24b, involved in spermatogenesis; ste24a, providing proteolysis and CAAX-box processing; fj, establishing planar polarity, epithelial cell apical/basal polarity and imaginal disk-derived wing hair orientation, cell–cell signaling, imaginal disk-derived wing vein specification, regulation of tube length and open tracheal system, imaginal disk growth, imaginal disk-derived leg joint morphogenesis, Wnt receptor signaling pathway
3L chromosome: No such genes
3R chromosome: CG31524 associated with oxidation-reduction process; Ir94b, Ir94c and CG31464 contributing to detection of chemical stimulus; Axn involved in a number of processes (oogenesis, imaginal disk pattern formation, heart development, phagocytosis); amon responsible for hatching behavior; mod(mdg4) regulating apoptosis; sima, heph, trx involved in response to DNA damage

Table S3. Genes belonging to regions of high interslope differentiation (HMM pHI < 0.01) and low polymorphism (Tajima's D < −1.5)

Gene (symbol)  FlyBase ID   Position

CG9008         FBgn0028540  2L: 13,774,340..13,780,163
p38b           FBgn0024846  2L: 13,780,695..13,782,611
CG16890        FBgn0028932  2L: 13,782,231..13,783,457
CG31731        FBgn0028539  2L: 13,784,089..13,790,562
kek3           FBgn0028370  2L: 15,552,723..15,583,579
crp            FBgn0001994  2L: 16,264,061..16,287,728
CadN           FBgn0015609  2L: 17,647,053..17,778,340
PNGase         FBgn0033050  2R: 1,905,244..1,908,200
dream          FBgn0033051  2R: 1,907,816..1,910,649
EcR            FBgn0000546  2R: 1,975,378..2,056,592
dve            FBgn0020307  2R: 18,131,468..18,173,974
jing           FBgn0086655  2R: 2,389,511..2,509,297
Gprk1          FBgn0260798  2R: 209,942..358,565
Pka-R2         FBgn0022382  2R: 5,881,395..5,913,547
CG9238         FBgn0036428  3L: 14,534,037..14,544,075
CG42247        FBgn0259099  3L: 14,896,905..14,936,007
CG13465        FBgn0040809  3L: 14,936,830..14,937,431
BobA           FBgn0040487  3L: 14,938,599..14,939,225
Chd3           FBgn0023395  3L: 19,440,273..19,443,338
Ptpmeg         FBgn0035133  3L: 328,097..356,050
Glut1          FBgn0025593  3L: 914,074..993,687
l(3)L1231      FBgn0086686  3R: 10,470,391..10,481,436
CG3505         FBgn0038250  3R: 10,483,179..10,484,950
CG3508         FBgn0038251  3R: 10,485,163..10,486,899
Rad17          FBgn0025808  3R: 10,486,898..10,488,914
CG3509         FBgn0038252  3R: 10,489,165..10,490,544
CG14303        FBgn0038633  3R: 14,416,749..14,475,859
CG31224        FBgn0051224  3R: 14,475,843..14,484,963
Dys            FBgn0260003  3R: 15,286,807..15,423,010
Ir94d          FBgn0259193  3R: 19,228,530..19,230,861
Ir94e          FBgn0259194  3R: 19,231,231..19,233,124
CG4374         FBgn0039078  3R: 19,234,406..19,297,805
mod            FBgn0002780  3R: 27,877,782..27,880,696
Map205         FBgn0002645  3R: 27,881,001..27,894,163
DppIII         FBgn0037580  3R: 4,170,122..4,173,187
CG7352         FBgn0037581  3R: 4,173,301..4,175,018
Atg13          FBgn0261108  3R: 4,175,537..4,178,539
mRpL40         FBgn0037892  3R: 7,233,640..7,234,47
CG6719         FBgn0037893  3R: 7,234,514..7,235,597
Irbp           FBgn0011774  3R: 7,235,769..7,238,162
Ranbp9         FBgn0037894  3R: 7,238,244..7,241,864
Tim17a1        FBgn0038018  3R: 8,189,389..8,190,402
GstD10         FBgn0042206  3R: 8,190,496..8,191,440
GstD9          FBgn0038020  3R: 8,191,902..8,193,286
GstD1          FBgn0001149  3R: 8,193,269..8,194,987
GstD2          FBgn0010038  3R: 8,197,693..8,198,426
GstD3          FBgn0010039  3R: 8,198,790..8,199,535
GstD4          FBgn0010040  3R: 8,199,824..8,200,546
CG9940         FBgn0030512  X: 13,578,586..13,612,316
Flo-2          FBgn0024753  X: 14,733,407..14,827,616
Ca-alpha1T     FBgn0029846  X: 5,998,118..6,086,516
CG32719        FBgn0052719  X: 7,402,684..7,442,025
CG2120         FBgn0030005  X: 8,029,857..8,031,079
Smox           FBgn0025800  X: 8,031,074..8,036,312
CG17982        FBgn0030006  X: 8,036,583..8,037,638
CG2263         FBgn0030007  X: 8,037,937..8,039,635
CG2129         FBgn0030008  X: 8,039,872..8,041,888
Dsor1          FBgn0010269  X: 9,141,375..9,144,074
CG17754        FBgn0030114  X: 9,144,453..9,149,986
amx            FBgn0000077  X: 9,139,948..9,141,369

Table S4. Spearman rank correlation between recombination-related parameters, local density of recombination (r), and physical position on chromosome (dist) and SNP-based scores, the density of interslope significant SNP (sig/kb), maxHI_{i,i+9} scores (maximal scores over 10 adjacent 10-kb intervals), and max [−log p(HI_{i,i+9})]

Parameter                       X                 2L                2R                3L                      3R

dist vs. sig/kb
  CDS                           0.052, ns         0.083, ns         0.002, ns         0.083, ns               0.069, ns
  Intron                        0.009, ns         0.224, P < 0.001  0.240, P < 0.001  0.240, P < 0.001        0.218, P < 0.001
r vs. sig/kb
  CDS                           0.072, ns         0.074, ns         0.029, ns         0.124, P < 0.02         0.023, ns
  Intron                        0.060, ns         0.209, P < 0.001  0.196, P < 0.001  0.196, P < 0.001        0.107, P < 0.01
dist vs. differentiation parameters
  maxHI_{i,i+9}                 0.062, ns         0.126, ns         0.052, ns         0.162, P < 0.02         0.219, P < 3 × 10^−4
  max [−log p(HI_{i,i+9})]      0.088, ns         0.013, ns         0.209, P < 0.003  0.416, P < 2 × 10^−11   0.347, P < 3 × 10^−9
r vs. differentiation parameters
  maxHI_{i,i+9}                 0.116, ns         0.001, ns         0.169, P < 0.02   0.041, ns               0.302, P < 3 × 10^−7
  max [−log p(HI_{i,i+9})]      0.224, P < 0.001  0.079, ns         0.160, P < 0.02   0.323, P < 3 × 10^−7    0.186, P < 0.002

Note: the analysis for sig/kb scores was conducted for genes (n ∼ 3 × 10^3 per arm) whereas, for HI and p(HI) scores, the analysis included windows each including 10 HI intervals (n ∼ 2 × 10^2 per arm). ns, not significant (P > 0.05).
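For readers who want to reproduce the kind of coefficient reported in Table S4, the following is a minimal Python sketch of a Spearman rank correlation (Pearson correlation of average ranks, with ties sharing their mean rank). The function names and the example values are hypothetical. The sketch returns only the coefficient, not the P values shown in the table, and the paper does not state which software was used for this step.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman coefficient as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


# Made-up per-gene values: local recombination rate r (cM/Mb) and sig/kb.
r_local = [0.5, 1.2, 2.0, 2.8, 3.5]
sig_per_kb = [0.1, 0.0, 0.4, 0.3, 0.9]
rho = spearman_rho(r_local, sig_per_kb)
```

Replacing `r_local` with the distance from the centromere (dist, Mb) gives the "dist vs. sig/kb" rows in the same way.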
