Supplemental Data Supplemental Figures and legends

[Figure S1, panels A–D: stacked bar charts of haplotype frequency (x-axis, 0–100%) for the N.JPN, S.JPN, E.KOR, W.KOR and related-species groups, each annotated with its sample size (n). Legend entries: oCYP1A_H01–H13 (A), oCYP1B1_H01–H11 (B), oCYP5A1_H01–H03 (C) and oCYP20A1_H01–H10 (D).]

(E) Human names
CYP1A1   CYP4V2   CYP8A1    CYP20A1   CYP26B1
CYP1B1   CYP5A1   CYP8B1    CYP21A2   CYP27B1
CYP2R1   CYP7A1   CYP11A1   CYP24A1   CYP27C1
CYP2U1   CYP7B1   CYP17A1   CYP26A1   CYP51A1

[Figure S1, panel F: bar chart of RLU (relative light units; y-axis, 0–2500) for GFP, Tanabe and Maegok, annotated with p = 1.6 × 10^-2 and p = 1.1 × 10^-3.]

Figure S1: Haplotype diversity of medaka CYPs, related to Figure 1. Haplotype frequencies based on CYP1A (A), CYP1B1 (B), CYP5A1 (C) and CYP20A1 (D) amino acid sequences in medaka local populations. Only homozygotes for each amino acid change were included. Colors represent haplotypes in local wild populations; n indicates the number of . (E) Table showing 20 CYP orthologs that were detected in the medaka genome based on searches of the . (F) activity of medaka CYP1B1 with HA-tag, and western blotting. CYP1B1 were expressed at comparable levels. The x-axis represents the names of wild medaka populations. Each bar represents the mean ± S.D. from multiple independent samples (n = 3). Statistical comparisons were performed using the Tukey–Kramer test. Each column was compared against values from the Tanabe population; * not analyzed.

(A)
Model                 p*   lnL†       κ‡      ω1§     ω2||    ω3¶
A. one: ω1=ω2=ω3      9    -2452.45   2.582   0.134   0.134   0.134
B. two: ω1=ω2, ω3     10   -2449.61   2.596   0.118   0.118   999.000
C. free: ω1, ω2, ω3   15   -2445.81   2.590   0.000   0.031   999.000

Null hypothesis   Alternative hypothesis   Model     χ2**     d.f.††   P‡‡
ω1=ω2=ω3          ω1, ω2, ω3               A vs. C   13.273   6        0.039
ω1=ω2=ω3          ω1=ω2, ω3                A vs. B   5.681    1        0.017

[Figure S2, panel B: phylogenetic tree with tips TN, KS, SH, MG and LZ (outgroup), internal labels S.JPN, W.KOR and Common Ancestor, and a scale bar of 0.01. Branches carry paired numeric labels (0.0/4.3, 1.1/8.8, 1.0/1.1, 2.2/5.4, 3.1/0.0, 2.0/6.5, 18.0/24.6) and the amino acid substitutions T395P, K69Q, D420E, D507A, V171L, I246V, V329I, T38I and I261V, classed as conservative or non-conservative.]

[Figure S2, panel C: bar chart of relative enzyme activity (y-axis, 0–1.25); x-axis labels as extracted: Maegok T395P, Tanabe, Maegok, oCYP1B1_H03, oCYP1B1_H08, Ancestral CYP1B1; annotated with p = 0.3948, 0.3139, 0.0035, 0.0299 and 0.0028.]

Figure S2: Reconstruction of ancestral CYP1B1, related to Figure 4. (A) Log likelihood values (top) and likelihood ratio tests (2ΔlnL; bottom) were estimated under different models for the two hypotheses. (B) Phylogenetic tree based on the nucleotide sequences of CYP1B1; LZ was used as the outgroup. The ancestral sequences of medaka CYP1B1 were inferred based on this tree. Amino acid substitutions are indicated under each branch. The numbers on each branch show dN/dS (ω) in Figure 4. The orange arrow shows the amino acids replaced in subsequent mutagenesis analyses. (C) Comparison of enzyme activities between wild-type and mutant . Constructs were generated using site-directed mutagenesis. This figure indicates that substitution of the 395th amino acid had no effect on CYP1B1 enzyme activity. Maegok T395P was also generated to confirm the effect of the amino acid change. Each bar represents the mean ± S.D. from multiple independent samples (n = 3). Data were analyzed for statistical significance using the Tukey–Kramer test. * number of parameters, † log likelihood value, ‡ transition/transversion rate ratio, § dN/dS for (TN, KS)–TN branch, || dN/dS for ((TN, KS), (SH, MG))–(TN, KS) branch, ¶ dN/dS for ((TN, KS), (SH, MG))–(SH, MG) branch, ** log likelihood difference (2ΔlnL), †† degrees of freedom, ‡‡ probability values
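The likelihood ratio tests in panel A can be checked by hand from the reported parameter counts and log likelihoods. The sketch below is only an illustration, not the PAML computation itself: the function names are hypothetical, and the chi-square tail probability uses the closed form for integer degrees of freedom so that only the Python standard library is needed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df = 2k: exp(-x/2) * sum_{i<k} (x/2)^i / i!
        return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))
    # Odd df = 2k+1: erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1..k} (x/2)^(i-1/2) / Gamma(i+1/2)
    k = (df - 1) // 2
    series = sum(half**(i - 0.5) / math.gamma(i + 0.5) for i in range(1, k + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * series

# (number of parameters, log likelihood) as listed in panel A
MODELS = {"A": (9, -2452.45), "B": (10, -2449.61), "C": (15, -2445.81)}

def likelihood_ratio_test(null, alt):
    """Return (2*delta lnL, degrees of freedom, P) for nested models."""
    p0, lnl0 = MODELS[null]
    p1, lnl1 = MODELS[alt]
    stat = 2.0 * (lnl1 - lnl0)
    df = p1 - p0
    return stat, df, chi2_sf(stat, df)

for null, alt in [("A", "C"), ("A", "B")]:
    stat, df, p = likelihood_ratio_test(null, alt)
    print(f"{null} vs. {alt}: 2*dlnL = {stat:.2f}, d.f. = {df}, P = {p:.3f}")
```

Because the log likelihoods in the table are rounded to two decimals, the recomputed statistics agree with the tabulated χ2 values only to about two decimal places.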

[Figure S3, panel A: stacked bars of haplotype frequency (x-axis, 0–100%) for African, European and Asian populations; legend entries CYP1B1*1 to CYP1B1*6 and residual. Panel B: bars of Mahalanobis’ generalized distance (x-axis, 0–1.8) for African, European and Asian populations.]

Figure S3: CYP1B1 and sexual dimorphism in human populations, related to Results and Discussion. (A) CYP1B1 haplotype frequencies based on HapMap and 1,000 genome data. Africa includes YRI: Yoruban in Ibadan; Europe includes CEU: Utah residents with Northern and Western European ancestry from the CEPH collection; Asia includes JPT: Japanese in Tokyo, Japan and CHB: Han Chinese in Beijing, China. (B) Mahalanobis’ generalized distances (D²) between males and females based on tooth-crown size (modified from Hanihara 1978).

Supplemental Tables

Table S1: Spearman’s rank correlation coefficients, related to Figure 1. Spearman’s Rho values are shown in the lower-left portion of the matrix, and p values are shown in the upper-right portion. Variables for correlation tests are shown below. “Mated male” indicates a dummy variable: a female with a Tanabe male (1) or female with a Maegok male (0). Likewise, “mated female” indicates a dummy variable: a female from Tanabe (1) or a female from Maegok (0) populations. ΔSL, ΔA-AFL, ΔP-AFL, and ΔAFL are differences between Tanabe and Maegok males’ morphometric data. Significant correlations after Holm corrections for multiple comparisons are indicated in bold.

               Mated male   Mated female   ΔSL      ΔA-AFL   ΔP-AFL   ΔAFL ratio
Mated male     -            1.000          1.000    1.000    0.004    0.004
Mated female   0.045        -              1.000    1.000    1.000    1.000
ΔSL            -0.146       0.047          -        0.000    0.008    0.000
ΔA-AFL         -0.146       0.183          0.715    -        1.000    0.015
ΔP-AFL         0.470        -0.167         -0.441   -0.044   -        0.000
ΔAFL ratio     0.470        0.089          -0.647   -0.414   0.827    -

Table S2. Mahalanobis’ generalized distances based on C scores from three variables of medaka morphology, related to Figure 3. Mahalanobis’ distances are shown in the lower-left portion of the matrix, and p values are shown in the upper-right portion; TT, homozygote of Tanabe CYP1B1; TM, heterozygote of Tanabe and Maegok CYP1B1; MM, homozygote of Maegok CYP1B1. Significance was assessed using F-statistics; p values less than 0.001 are shown in bold.

      TT♂      TM♂      MM♂      TT♀     TM♀     MM♀
TT♂   -        0.185    0.737    0.000   0.000   0.000
TM♂   0.693    -        0.655    0.000   0.000   0.000
MM♂   0.201    0.158    -        0.000   0.000   0.000
TT♀   13.130   10.371   11.960   -       0.058   0.633
TM♀   8.203    5.262    6.763    1.262   -       0.132
MM♀   9.253    7.221    8.406    0.357   0.803   -

Table S3. Estimation of time since the most recent common ancestor (TMRCA) using whole mitochondrial genomes, related to Results and Discussion.

Number of segregating sites   θ̂W         Ne          TMRCA mean   TMRCA 95% CI (lower)   TMRCA 95% CI (upper)
3,289                         1342.449   3,015,384   4,987,445    3,527,999              6,489,106

Supplemental Movie Movie S1. Mating behavior in medaka. We put two males (from Tanabe and Maegok, respectively) and one female medaka (from Tanabe or Maegok) into a tank, although the Maegok male cannot be seen in the video. The Tanabe male medaka has larger anal fins than the Maegok male, and grasps the Tanabe female using his anal fin. After slowly descending together, the male and female vibrate their bodies to stimulate the release of sperm and eggs, respectively. The eggs are fertilized at this moment.

Supplemental Experimental Procedures 1. Medaka SNP screening Identification of orthologous To identify CYP orthologs of humans in medaka, we reconstructed phylogenetic trees and investigated genome syntenies. The following amino acid sequences from seven species were obtained from the Ensembl database: medaka (Oryzias latipes), human (Homo sapiens), chimpanzee (Pan troglodytes), macaque (Macaca mulatta), zebrafish (Danio rerio), fugu (Takifugu rubripes) and tetraodon (Tetraodon nigroviridis). Phylogenetic analyses were performed using the program MEGA4 [S1]. Pairwise sequence divergences were calculated using the Poisson correction method, and phylogenetic trees were reconstructed using the neighbor-joining (NJ) method [S2]. Statistical reliability of tree branches was evaluated using 1,000 bootstrap replicates [S3]. Genome synteny was examined using Ensembl and Genomicus (http://www.dyogen.ens.fr/genomicus-64.01/cgi-bin/search.pl) [S4].
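As an illustration of the distance step, the Poisson correction converts the observed proportion of differing amino acid sites, p, into a distance d = −ln(1 − p). The sketch below is not MEGA4's implementation; the function names and the two short peptide strings are hypothetical, and alignment columns with a gap in either sequence are simply skipped (an assumption).

```python
import math

def p_distance(seq_a, seq_b):
    """Proportion of differing residues over aligned columns without gaps."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)

def poisson_distance(seq_a, seq_b):
    """Poisson-corrected amino acid distance d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        raise ValueError("sequences are saturated (p = 1); distance undefined")
    return -math.log(1.0 - p)

# Hypothetical aligned peptides differing at 1 of 10 sites
seq_1 = "MKTAYIAKQR"
seq_2 = "MKTAYLAKQR"
print(f"p = {p_distance(seq_1, seq_2):.2f}, d = {poisson_distance(seq_1, seq_2):.4f}")
```

A matrix of such pairwise distances is the input that a neighbor-joining routine would then cluster into a tree.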

Samples We used laboratory stocks of medaka derived from wild populations of the Northern Japanese (N.JPN), Southern Japanese (S.JPN), Eastern Korean (E.KOR) and Western Korean/Chinese (W.KOR) groups, which are distinguished by mitochondrial DNA sequences [13] (Figure 1A). These medaka strains were maintained for many generations as closed colonies in the Graduate School of Frontier Sciences, University of Tokyo [S5]. A total of 28 individuals were analyzed from 26 locations, including northern Japan: Niigata, Ryotsu, Kaga and Odate (n = 2); southern Japan: Tanabe, Takamatsu, Tessei, Kasumi, Urizura, Iwaki, Mishima, Hagi, Okewaki, Kikai, Nago (n = 2), Kochi, Yamaguchi, Akishima and Gushikami; east Korea: Yongchon and Sajin; west Korea: Maegok and Bugang; China: Shanghai; and related species: Luzon (O. luzonensis) and Hainan (O. curvinotus). The original habitat of each population is described in Katsumura et al. 2009 [13].

DNA extraction Whole medaka were dissolved in a solution containing 10% SDS, 0.5 M EDTA and proteinase K. Total genomic DNA was extracted and purified using phenol–chloroform and isopropanol precipitation. Isolated DNA samples were resuspended in 10–100 µl TE buffer.

PCR direct sequencing PCR primers were designed from medaka genomic sequences [15] that corresponded to human sequences in which population-differentiated SNPs were identified (see primer list). Sequenced lengths excluding introns of each CYP were as follows: CYP1A, 1176 bp; CYP1B1, 1185 bp; CYP5A1, 237 bp and CYP20A1, 390 bp. Individuals with heterozygous sites were excluded from subsequent analyses because CYP haplotypes could not be determined by PCR direct sequencing. Screened nucleotide sequences were translated into amino acid sequences and CYP haplotype frequencies were calculated. Approximately 50 ng of genomic DNA was used as PCR template in 30 µl solutions containing 0.2 mM dNTPs, 0.08 µM forward and reverse primers, 0.75 U Ex Taq polymerase Hot Start (TaKaRa Bio Inc.) and 10× reaction buffer. Reactions were performed in a DNA Engine PTC-200 instrument (Bio-Rad Laboratories, Inc.) with an initial denaturing step at 95°C for 2 min, 40 cycles of 95°C for 30 s, 60°C for 30 s and 72°C for 30–90 s (depending on amplicon length), and a final extension step at 72°C for 5 min. PCR products were purified using 30% polyethylene glycol 6000 (PEG) precipitation and were used as templates in direct sequencing reactions according to the manufacturer’s instructions. Samples were then analyzed on an ABI 3130 DNA Sequencer (Applied Biosystems).
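The haplotype-calling step described above (drop individuals with heterozygous sites, translate, count amino acid haplotypes) can be sketched as follows. This is an illustrative reconstruction, not the authors' script: it assumes heterozygous sites appear as IUPAC ambiguity codes in the base calls, uses the standard genetic code, and the individual names and short sequences are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(cds):
    """Translate an in-frame coding sequence; trailing partial codons are ignored."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))

def haplotype_frequencies(reads):
    """reads: dict of individual -> coding sequence from direct sequencing."""
    # Keep only individuals whose sequence has no ambiguity code (assumed homozygous)
    homozygous = [s.upper() for s in reads.values() if set(s.upper()) <= set("ACGT")]
    counts = Counter(translate(s) for s in homozygous)
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}

reads = {                      # hypothetical toy data
    "ind1": "ATGACCAAG",       # M T K
    "ind2": "ATGACAAAG",       # M T K (synonymous change)
    "ind3": "ATGCCCAAG",       # M P K
    "ind4": "ATGMCCAAG",       # M = A/C heterozygote, excluded
}
freqs = haplotype_frequencies(reads)
print(freqs)
```

Counting on translated sequences means that synonymous differences, as in ind1 and ind2 here, collapse into the same amino acid haplotype.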

Search for candidate genes focusing on common genetic polymorphisms between medaka and humans Our previous study showed that genes that are highly differentiated among human populations and have related physiological mechanisms tend to have orthologs that are highly differentiated among medaka populations [S31]. Using phylogenetic and genome synteny analyses, we identified 20 CYP orthologs in medaka and humans (Figure S1E). To screen medaka SNPs in cytochrome P450s (CYPs), we focused on CYP1A1, CYP1B1, CYP5A1 and CYP20A1 because in the HapMap database [S12], CYP1A1, CYP1B1 and CYP20A1 show higher polymorphic frequencies than CYP5A1. Reflecting the polymorphic patterns of CYPs in human populations, various non-synonymous mutations in genes encoding CYP1A, CYP1B1 and CYP20A1 were found in various medaka populations, and amino acid sequence haplotypes of these four CYPs cover a spectrum of frequencies across mitochondrial groups (Figure S1). Among the four groups, 10 CYP1A haplotypes were found, and 2 were found among related species. Six haplotypes were identified in S.JPN, and the major haplotype (H02) in Oryzias was found with a frequency of 25%. Only one haplotype was detected in N.JPN (H01), and no haplotypes were shared among the groups. CYP1B1 carried a haplotype composition similar to CYP1A, and the CYP20A1 haplotype H03 was common to all four groups. In contrast, genetic variations in CYP5A1 were low in each mitochondrial group, indicating that this gene may be more conserved than the other CYPs, similar to human homologues. In particular, CYP5A1 exhibited no variations within the group, although the possibility that this pattern reflects relatively short sequences of CYP5A1 cannot be excluded. Although the frequencies of the four CYP haplotypes in E.KOR and W.KOR were nearly homogeneous, this may also reflect small sample sizes from both populations. Additional samples from these and S.JPN or N.JPN populations may reveal more variants.
Because CYP1A and CYP1B1 were highly polymorphic and differentiated across medaka populations and were also involved in estrogen metabolism in fish [22], we examined whether these genes were associated with sexual dimorphisms.

2. Functional analysis of cytochrome P450 RNA extraction Whole medaka from laboratory stocks were dissolved in solutions containing 1 ml Sepasol RNA1 (Nakarai), 0.01 ml acetic acid and 0.2 ml chloroform. Following centrifugation, supernatants were purified by isopropanol precipitation. Total RNA was resuspended in 90 µl of DEPC-treated water and combined with 1 µl DNase I (TaKaRa Bio Inc.) and 10 µl 10× DNase I buffer. Total RNA was then extracted and purified using phenol–chloroform and isopropanol precipitation. RNA samples were resuspended in 20 µl DEPC-treated water.

Rapid amplification of 5′cDNA ends The medaka CYP1B1 cDNA sequence in the Ensembl database lacks the 5′ portion of the full-length cDNA. This region was obtained using the 5′ rapid amplification of cDNA ends (RACE) technique with a SMART-RACE cDNA amplification kit (Clontech). First-strand cDNA templates were synthesized from total RNA that was extracted from the gills of adult medaka (Hd-rR strain [S6], female). Primers were designed from the CYP1B1 cDNA sequence of Hd-rR medaka in the Ensembl database. The first 5′ RACE product was amplified using the oCYP1B1EX3F1 and UPM primers supplied by the manufacturer. Nested PCR was subsequently performed using the initial PCR products and the oCYP1B1EX2F1 and UPM primers. The products were sub-cloned into pGEM-T easy (Promega) vectors and then sequenced. The 5′ RACE products and the CYP1B1 cDNA sequences from Ensembl were combined to generate full-length cDNAs. Primer sequences are presented in the primer list.

Molecular cloning of CYP1A and CYP1B1 Using total RNA isolated from adult whole medaka (Tanabe, Kasumi, Shanghai, Maegok, Luzon), first-strand cDNAs were synthesized using poly (dT) primers (5′-AAG CAG TGG TAA CAA CGC AGA GTA C[T]30VN-3′ [V: A, G or C; N: A, C, G or T; [T]30: series of 30 T bases]). CYP genes were amplified using PCR with primer pairs (see Primer list) designed to cover sequences from upstream of the corresponding CYP start codons to downstream of the stop codons. PCR products were cloned into pBluescript II (SK-) or pGEM-T easy (Promega) vectors and then sequenced. Complete coding regions of cDNAs were further amplified from the pBluescript II (SK-) or pGEM-T easy cDNA clones using primers (see Primer list) designed to cover sequences between 5′ and 3′ coding regions and to install restriction enzyme sites for cloning.

UAS vector construction CYP overexpression was performed using the GAL4/UAS system [S7]. Constructs of CYP1A or CYP1B1 without hemagglutinin (HA)-tags were generated by ligating EcoRI/XhoI or BglII/XhoI fragments, respectively. These fragments were isolated from pBluescript II (SK-) or pGEM-T easy clones containing full-length cDNAs of CYP1A or CYP1B1 and then inserted at EcoRI/XhoI or BglII/XhoI sites of the pUAST vector, respectively. To generate HA-tagged CYP1B1, a BglII site at the 5′ end and a NotI site at the 3′ end of the cDNA encoding the entire ORF were introduced by PCR (see Primer list). Each ORF region was excised with BglII/NotI enzymes, and the corresponding fragments were ligated into pUAST vectors encoding three tandem 3′ HA tags.

Transfection of Drosophila S2 cells Previously we established a cell-based assay for functional analyses of CYPs using Drosophila S2 cells [S8]. UAS-GFP, which comprises a GFP coding sequence cloned into the pUAST vector, was used as a negative control. The constructed pUAST vectors were transfected with actin5c–GAL4 constructs (provided by Prof. Yasushi Hiromi PhD at the National Institute of Genetics). S2 cells were cultured in Schneider’s Drosophila medium (Invitrogen) containing 10% heat-inactivated fetal bovine serum (Nichirei) and 1% penicillin–streptomycin solution (Nakarai). S2 cells were aliquoted into 24-well plates and transfected using Effectene transfection reagent (Qiagen) according to the manufacturer’s protocol.

Measurement of enzyme activity Medaka CYP1A and CYP1B1 activities were quantified using P450-Glo CYP1A1 and P450-Glo CYP1B1 assay kits (Promega), respectively [S9]. Two days (48 h) after transfection, the medium was replaced with fresh medium containing the luminogenic substrate luciferin 6′-chloroethyl ether (luciferin-CEE; 100 µM). After 3 h at 25°C, the medium was aspirated and CYP activity was measured as D-luciferin luminescence using a microplate reader (PerkinElmer). Medaka CYP1A and CYP1B1 alleles and the negative control (GFP-expressing cells) were assayed in triplicate. Luminescence values for the negative control were subtracted from those of experimental samples, and enzyme activities were calculated as luminescence ratios relative to that produced by the Tanabe allele. In addition, HA-tagged CYP activities corroborated those of non-tagged CYP proteins (Figure 2B) and no differences in protein expression levels were observed (Figure S1F).
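The normalization described above (subtract the GFP background, then express each allele relative to Tanabe) can be written out as a short calculation. The sketch is illustrative only: the RLU readings are hypothetical, the function name is invented, and subtracting the mean of the control triplicate (rather than pairing wells) is an assumption.

```python
from statistics import mean, stdev

def relative_activity(rlu, control="GFP", reference="Tanabe"):
    """Return {allele: (mean ratio, S.D. of ratio)} relative to the reference allele."""
    background = mean(rlu[control])          # mean luminescence of the negative control
    corrected = {
        name: [value - background for value in values]
        for name, values in rlu.items() if name != control
    }
    ref_mean = mean(corrected[reference])
    return {
        name: (mean(values) / ref_mean, stdev(values) / ref_mean)
        for name, values in corrected.items()
    }

rlu = {                                      # hypothetical triplicate readings
    "GFP": [100.0, 110.0, 90.0],
    "Tanabe": [2100.0, 2000.0, 2200.0],
    "Maegok": [1100.0, 1000.0, 1200.0],
}
for allele, (ratio, sd) in relative_activity(rlu).items():
    print(f"{allele}: {ratio:.2f} ± {sd:.2f}")
```

By construction the reference allele has a mean ratio of 1, so the other alleles read directly as fractions of Tanabe activity.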

Western blotting Cells expressing HA-tagged CYP1B1 protein were washed twice in ice-cold phosphate-buffered saline (PBS) and harvested in ice-cold lysis buffer containing 20 mM Tris–HCl (pH 8.0), 10% glycerol, 150 mM NaCl, 2 mM EDTA, 1 mM Na3VO4 and 10 mM CHAPS. After 15 min at 4°C, sample buffer containing 0.5 M Tris–HCl (pH 6.8) and 10% SDS, glycerol, β-mercaptoethanol and bromophenol blue were added. Electrophoresis of the indicated amounts of protein lysate was performed through a 12% SDS-PAGE. Proteins were transferred onto PVDF membranes, and nonspecific binding sites were blocked for 1 h at room temperature with 5% powdered skim milk solution containing 1 M Tris–HCl (pH 7.5), 1.5 M NaCl and 0.1% Tween80 (TBST). Membranes were then washed twice (5 min each) in TBST and were incubated overnight at 4°C in 5% skim milk–TBST containing rabbit polyclonal antibodies (1:10,000) against the HA-tag. The membranes were washed twice (5 min each) in TBST and were incubated for 1 h at room temperature with anti-rabbit IgG antibodies conjugated to alkaline phosphatase (1:5,000) in 5% skim milk–TBST. The membranes were then washed three times (5 min each) in TBST. Finally, membranes were incubated in 10 mL alkaline phosphatase buffer containing 5-bromo-4-chloro-3-indolyl phosphate (Promega) and nitro blue tetrazolium (Promega) until bands appeared.

3. Analysis of medaka CYP genotypes and sexual dimorphism Medaka populations Eighty-one medaka populations were collected from East Asia and were maintained and bred for up to four generations at the University of Tokyo. In addition, several related species from South-East Asia were also maintained. Previously, we investigated genetic diversity among these medaka populations, showing abundant genetic diversity and a high degree of differentiation between populations, and providing an appropriate model for examinations of genetic polymorphisms and differentiation [12, 13]. Approximately 30 individuals were collected for breeding from the Tanabe and Maegok populations. The numbers of individuals in experimental populations were increased to approximately 150 in the laboratory. Medaka were bred at 28°C with a 14 h light/10 h dark cycle. Forty-two (22 males and 20 females) and 50 (24 males and 26 females) breeding individuals were chosen randomly from Tanabe and Maegok populations, respectively, for subsequent experiments.

Cross experiments Tanabe and Maegok individuals were crossed in the laboratory. F1 individuals were generated from two different sets of parents comprising a Tanabe male and three Maegok females, and a Maegok male and three Tanabe females. F1 individuals were then randomly crossed in pairs to yield 129 F2 individuals.

Measurements of anal fin rays Photos of Tanabe, Maegok and F2 individuals with body lengths >10 mm were taken using a digital camera (Eos Kiss, Canon) with a micro lens (Ultrasonic EF 100 mm, Canon). Standard body lengths (SL), anterior anal fin lengths (A-AFL) and posterior anal fin lengths (P-AFL) were then measured from photographs and the straight-line distance from the root to the tip of anal fins was estimated using Adobe Illustrator CS5 (Adobe). A-AFL and P-AFL were defined as the lengths of the fourth and second anal fin rays from the anterior and posterior, respectively. The sex of Tanabe and Maegok individuals was identified using several secondary sexual characteristics [S10] and that of F2 individuals was identified using genetic markers (see Genotypic sex determination). Anal fin morphology was compared using boxplots representing five-number summary statistics for each group: the lower and upper whiskers indicate the minimum and maximum observations, the bottom and top of each box represent the first and third quartiles, respectively, and the middle bar represents the median.
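The five-number summary behind each boxplot can be computed directly. The sketch below uses Python's `statistics.quantiles` with the inclusive method; which quartile convention the original plotting software used is not stated, so that choice is an assumption, and the fin lengths are hypothetical.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Hypothetical anterior anal fin lengths (mm) for one group
a_afl = [3.1, 3.4, 3.6, 3.9, 4.2]
labels = ("min", "Q1", "median", "Q3", "max")
for label, value in zip(labels, five_number_summary(a_afl)):
    print(f"{label}: {value:.2f}")
```

Different quartile conventions give slightly different box edges for small samples, so values from another package may not match these exactly.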

DNA extraction from the caudal fin Caudal fins from F2 individuals were excised and dissolved in SDS lysis buffer containing 10 mM Tris–HCl (pH 7.4–8.0), 1 mM EDTA, 1% SDS and proteinase K, and total genomic DNAs were extracted and purified using phenol/chloroform and ethanol precipitation. DNA samples were resuspended in 10–30 µl TE buffer.

CYP1B1 genotyping A 478-bp region of CYP1B1 was amplified using PCR with the primers described for the SNP screen (oCYP1B1EX3F1 and oCYP1B1EX3R1; see Primer list). Approximately 50 ng genomic DNA was used as the PCR template in 30 µl solutions containing 0.2 mM dNTPs, 0.08 µM of each primer, 0.75 U Ex Taq polymerase Hot Start (TaKaRa Bio Inc.) and 10× reaction buffer. Reactions were performed using a DNA Engine PTC-200 instrument (Bio-Rad Laboratories, Inc.) with an initial denaturing step at 95°C for 2 min, 40 cycles of 95°C for 30 s, 60°C for 30 s and 72°C for 30 s and a final extension step at 72°C for 5 min. PCR products were diluted 20-fold and used as templates in direct sequencing reactions according to the manufacturer’s protocol. Samples were then analyzed on an ABI3130 DNA Sequencer (Applied Biosystems). The 478-bp nucleotide sequences from CYP1B1 exon 3 were determined using forward primer alone or both primers concurrently. It was assumed that recombination did not occur in the CYP1B1 coding region, and CYP1B1 genotypes of F2 individuals were estimated from CYP1B1 DNA sequences of exon 3 after measuring fin lengths (blind test).

Genotypic sex determination Similar to mammals, medaka have an XY/XX sex determination system [S11]. Sex was genotyped according to the presence or absence of the sex-determining gene DMY using PCR with genomic DNA templates [15]. PCR products from male medaka comprise amplicons of two different lengths: the male-specific DMY fragment and a DMRT1 fragment, which is highly similar to DMY but differs in amplicon length and is present in both male and female genomes. Sex was therefore determined by agarose gel electrophoresis of the PCR products: individuals with both DMY and DMRT1 bands are XY, whereas those with only DMRT1 bands are XX. PCR was performed using the following primers for DMY and DMRT1: oDMY-F: 5′-CTG ACA TGA GCA AGG AGA AGC AG-3′, oDMY-R: 5′-TGC GGC AGA CAG AGG ATT GG-3′. PCR was performed with an initial denaturing step at 95°C for 2 min, followed by 40 cycles of 95°C for 30 s, 60°C for 30 s and 72°C for 90 s and a final extension step at 72°C for 5 min. Electrophoresis was performed on a 1% agarose gel.

Haplotype analysis in human populations Human CYP1B1 haplotypes (allele combinations of each SNP) were estimated based on SNP frequencies in the HapMap database [S12] and data from the 1,000 genome project [S13] using the program PHASE v.2.1 [S14, S15] with default parameters. CYP1B1*1, CYP1B1*2 and CYP1B1*4 had differing enzyme activities compared with those of CYP1B1*3 [27]. These alleles produce four amino acid changes and one silent mutation and define haplotype combinations of the five SNPs rs10012, rs1056826, rs1056836, rs1056837 and rs1800440. Using HapMap and data from the 1,000 genome project, frequencies of the four haplotypes CYP1B1*1, CYP1B1*2, CYP1B1*3 and CYP1B1*4 were estimated for seven populations using these five SNPs.

Morphological analysis Mahalanobis’ generalized distances (D²) were based on C scores that were calculated from the three variables SL, A-AFL and P-AFL. This biological distance analysis was employed to represent differences in sexual dimorphism between CYP1B1 genotypes. Magnitudes of variables were converted into Z scores, which were used to generate C scores as defined by Howells (1989) [S16]. The advantage of C scores over simple ratios is that they reflect the relative size of a given feature in comparison to the size of all other analyzed traits [S17]. Standard deviations were calculated from pooled variances of the same samples. Mahalanobis’ generalized distances were calculated from the C score dataset. Following the two-step procedure, the effects of size and the intercorrelation of variables were removed from the final distance matrix [S18]. These were calculated using programs written by one of us (TH) using BASIC.
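The Z-score to C-score conversion can be illustrated with a few lines of code. This is a minimal sketch under stated assumptions, not the original BASIC program: it takes the C score to be each Z score minus the individual's mean Z score across traits (the usual reading of Howells' definition), standardizes with the sample standard deviation of all rows rather than pooled within-group variances, and uses hypothetical measurements.

```python
from statistics import mean, stdev

TRAITS = ("SL", "A-AFL", "P-AFL")

def c_scores(rows):
    """Convert raw measurements to C scores (size-adjusted Z scores)."""
    mu = {t: mean([r[t] for r in rows]) for t in TRAITS}
    sd = {t: stdev([r[t] for r in rows]) for t in TRAITS}
    result = []
    for r in rows:
        z = {t: (r[t] - mu[t]) / sd[t] for t in TRAITS}    # Z score per trait
        size = mean(list(z.values()))                       # individual's overall size
        result.append({t: z[t] - size for t in TRAITS})     # C score = Z minus size
    return result

fish = [                                   # hypothetical measurements (mm)
    {"SL": 28.0, "A-AFL": 4.1, "P-AFL": 3.0},
    {"SL": 30.5, "A-AFL": 4.9, "P-AFL": 3.8},
    {"SL": 26.2, "A-AFL": 3.5, "P-AFL": 2.4},
    {"SL": 29.1, "A-AFL": 4.0, "P-AFL": 3.5},
]
scores = c_scores(fish)
for s in scores:
    print({t: round(v, 3) for t, v in s.items()})
```

Under this definition the C scores of each individual sum to zero, so their covariance matrix is rank-deficient; a distance calculation built on them has to account for that, and the text does not say how the original programs did so.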

Statistical analysis Statistical analyses of morphometric data and anal fin ratios were performed using the Kruskal–Wallis test. Scheffé-type multiple comparisons for the non-parametric tests were run in R v.2.11.1 using Aoki’s R script (http://aoki2.si.gunma-u.ac.jp/R/src/kruskal_wallis.R, encoding="euc-jp").

4. Anal fin morphology and sexual selection Mating experience To test whether males with larger anal fins were more likely to mate with females, we placed two medaka males, one from Tanabe and one from Maegok, and one female from Tanabe or Maegok in a tank to observe competitive mating (see Movie S1). We collected fertile eggs after 55 mating events (total of 198 eggs) of eight groups (Figure 1D). Subsequently, we genotyped the LG22-1 (MF01SSA023H11) locus, one of 48 loci reported as M markers in 2003 [S19], which distinguish between southern and northern medaka using fragment length polymorphisms in PCR products. Among the M markers, we found that LG22-1 distinguished between Southern Japanese and Western Korean medaka by agarose gel electrophoresis. To determine LG22-1 genotypes in the 198 fertile eggs, the 1,818-bp fragment was PCR-amplified using the primer pair 5′-CAG AGG ACT ACA GCA CCT AC-3′ and 5′-AGG AGA GAC CAC ATA GTT TCC-3′. PCR templates were prepared from fertile eggs using an alkaline lysis method (see the manufacturer’s protocol for KOD FX Neo (TOYOBO)). In brief, eggs were incubated at 95°C for 10 min in 180 µl of 50 mM NaOH, and the lysate was then mixed with 20 µl of 1 M Tris–HCl (pH 8.0). Lysates of 2.5 µl were used as PCR templates in 22.5 µl solutions containing 0.4 mM dNTPs, 0.3 µM of each primer, 0.5 U KOD FX Neo and 2× PCR buffer for KOD FX Neo (TOYOBO). Reactions were performed using a DNA Engine PTC-200 (Bio-Rad Laboratories, Inc.) instrument, with an initial denaturing step at 94°C for 2 min, followed by 35 cycles of 95°C for 10 s, 51.8°C for 30 s and 68°C for 30 s. Electrophoresis was performed using a 1% agarose gel. To estimate the degree of association between the six variables, Spearman’s rank correlation coefficients were calculated (Table S1). Significance was calculated by applying the Holm method for post hoc multiple pairwise comparisons using the program R v.2.11.1.
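For readers unfamiliar with the Holm method, the adjustment applied to the pairwise p values can be sketched as below. The function mirrors the step-down rule used by R's `p.adjust(method = "holm")`; the function name and the example p values are hypothetical and are not the values behind Table S1.

```python
def holm_adjust(p_values):
    """Holm step-down adjustment; returns adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # indices by ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p value is multiplied by (m - k + 1), capped at 1
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted values never decrease along the ranking
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

raw = [0.01, 0.04, 0.03, 0.005]            # hypothetical raw p values
print([round(p, 3) for p in holm_adjust(raw)])
```

Many adjusted values end up capped at 1 when the number of comparisons is large, which is the pattern visible in the upper-right portion of Table S1.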

Estimation of TMRCA using complete mtDNA genomes

We calculated TMRCA among medaka populations using a rejection-sampling method [S20]. Seven complete mtDNA genome sequences were downloaded from GenBank (http://www.ncbi.nlm.nih.gov/genbank/). These were W.KOR (n = 2, AP008947 and AP008948), N.JPN (n = 2, AB498066 and AP008941) and S.JPN (n = 3, AB498065, AP008946 and AP004421). Subsequently, we counted the number of segregating sites (Seg) in each sequence. To follow the infinitely-many-site model, in which every mutation occurs at a site different from the sites of the previous mutations [S21], we excluded tri- or tetra-allelic SNP sites and focused on bi-allelic SNP sites. Given the summary statistics of n samples, the posterior distribution of coalescent times $T = (T_n, T_{n-1}, \ldots, T_2)$ is given as $P(T \mid Seg) \propto f(Seg \mid T)\,\pi(T)$ [S22] by Bayes’ theorem.

The likelihood $f(Seg \mid T)$ follows a Poisson distribution, and $T_i$, which is the waiting time of coalescence among $i$ sequences, follows an exponential distribution with the parameter $i(i-1)/2$. Under the infinitely-many-site model, we obtained the posterior estimate of TMRCA given Seg using the following algorithm [S23]:

for i = 1 to 1,000,000 do
  repeat
    1. Generate $t = (t_n, t_{n-1}, \ldots, t_2)$ from $\pi(T)$.
    2. Calculate the length of the genealogy, $l = 2t_2 + \cdots + (n-1)t_{n-1} + n t_n$.
  until accepting $t$ with a probability $u = \mathrm{Po}(l\hat{\theta}_W/2)\{Seg\} / \mathrm{Po}(Seg)\{Seg\}$
  3. Keep $t_i = t$.
end for

Here the Poisson point probability is $\mathrm{Po}(\lambda)\{Seg\} = e^{-\lambda}\lambda^{Seg}/Seg!$, and $\hat{\theta}_W = Seg/(1 + 1/2 + \cdots + 1/(n-1))$.

We estimated the effective population size ($N_e$) using Watterson’s estimate of the population mutation rate $\hat{\theta}_W = N_e\mu$, where $\hat{\theta}_W = Seg/\{1 + 1/2 + \cdots + 1/(n-1)\}$ and $\mu$ is the mutation rate per site per generation. We assumed that $\mu = 2.8 \times 10^{-8}$ [S24] and that the generation time was one year. Chronological times of TMRCA were calculated from $N_e t$. Based on the posterior distribution of TMRCA given Seg, we calculated the mean and the 95% credible intervals for the posterior estimate (Table S3).
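The rejection-sampling procedure described above can be sketched in a few lines. This is a minimal illustration rather than the authors' program: function names are hypothetical, the Poisson probabilities are evaluated in log space (an implementation choice to avoid overflow with Seg = 3,289), only 200 samples are kept instead of 1,000,000, and the value of Ne is taken from Table S3 rather than re-derived, since the number of sites analyzed is not stated.

```python
import math
import random

def log_poisson_pmf(k, lam):
    """log of Po(lam){k} = e^(-lam) * lam^k / k!, computed in log space."""
    return -lam + k * math.log(lam) - math.lgamma(k + 1)

def watterson_theta(seg, n):
    """theta_W = Seg / (1 + 1/2 + ... + 1/(n-1))."""
    return seg / sum(1.0 / j for j in range(1, n))

def sample_tmrca(seg, n, n_keep, rng):
    """Return n_keep accepted TMRCA values in coalescent time units."""
    theta = watterson_theta(seg, n)
    log_max = log_poisson_pmf(seg, seg)              # Po(Seg){Seg}
    kept = []
    while len(kept) < n_keep:
        # Step 1: waiting times t_i ~ Exponential(rate i(i-1)/2), i = 2..n
        t = {i: rng.expovariate(i * (i - 1) / 2) for i in range(2, n + 1)}
        # Step 2: total genealogy length l = sum of i * t_i
        length = sum(i * t_i for i, t_i in t.items())
        # Accept with probability u = Po(l * theta / 2){Seg} / Po(Seg){Seg}
        u = math.exp(log_poisson_pmf(seg, length * theta / 2) - log_max)
        if rng.random() < u:
            kept.append(sum(t.values()))             # TMRCA = t_n + ... + t_2
    return kept

SEG, N_SEQ = 3289, 7          # segregating sites (Table S3), seven mtDNA genomes
NE = 3_015_384                # effective population size as reported in Table S3
samples = sample_tmrca(SEG, N_SEQ, n_keep=200, rng=random.Random(1))
mean_t = sum(samples) / len(samples)
print(f"theta_W = {watterson_theta(SEG, N_SEQ):.3f}")
print(f"mean TMRCA = {mean_t:.2f} coalescent units, i.e. about {NE * mean_t:,.0f} years")
```

With so few retained samples the mean is only a rough estimate; the 95% credible interval would be read from the 2.5% and 97.5% quantiles of a much larger set of accepted draws.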

Estimation of dN/dS (ω) and assessment of ancestral CYP1B1 enzyme activities Assuming that sexual selection operated on anal fin morphology, we expect a detectable influence of such selection on CYP1B1 nucleotide sequences. Hence, we expect that in the lineages preceding the Tanabe population, and in the common ancestor of S.JPN and W.KOR, the ratio of non-synonymous SNPs over synonymous SNPs (dN/dS = ω) is much less than 1. Thus, functional mutations did not occur in CYP1B1. We assumed a tree topology ((Tanabe, Kasumi), (Maegok, Shanghai), Luzon) and estimated ω values for each branch using maximum likelihood calculations (Figure S2A and B). We then used a likelihood ratio test to evaluate the following two hypotheses: (1) the model with all different branch rates is significantly more likely than a model in which all branches have the same ω values and (2) the model with W.KOR ancestor-specific rates is significantly more likely than the model with constant ω at all branches. Log likelihood and ω values at branches were calculated using the program PAML [23]. To estimate the ancestral CYP1B1 amino acid sequences in O. latipes, we aligned Luzon medaka (O. luzonensis) CYP1B1 cDNA sequences with the four medaka cDNA sequences from Tanabe, Kasumi, Maegok and Shanghai using CLUSTAL W [S25]. Pairwise sequence divergences were calculated under the Kimura 2-parameter model [S26], and phylogenetic trees were reconstructed using the NJ method [S2]. Based on this tree topology (initial tree) from cDNA sequences, we inferred the ancestral CYP1B1 sequences using the Maximum Likelihood method [S27] under the JTT matrix-based model [S28]. Reconstruction of the ancestral sequences was performed using the program MEGA5 [S29]. Subsequently, we generated the putative ancestral CYP1B1 amino acid sequence using PCR-based site-directed mutagenesis.
The point mutation 1192 C>A, Pro395Thr, was introduced into the cDNA of Tanabe CYP1B1, cloned into the pBluescript II (SK-) vector, using the following mutagenesis primer pair: oCYP1B1EX3_P395T_F: 5′-CTA TCA TGA GCT ACA CCA TCC CCA AGA AC-3′, oCYP1B1EX3_P395T_R: 5′-GTT CTT GGG GAT GGT GTA GCT CAT GAT AG-3′. Approximately 5 ng of plasmid DNA was used as PCR template in a 50 µl solution containing 0.2 mM dNTPs, 0.2 µM of each primer, 1.25 U PrimeSTAR HS DNA polymerase (TaKaRa Bio Inc.) and 5× PrimeSTAR buffer. Reactions were performed using a DNA Engine PTC-200 instrument (Bio-Rad Laboratories, Inc.) with an initial denaturing step at 98°C for 2 min, followed by 12 cycles of 98°C for 10 s, 55°C for 15 s and 72°C for 4.5 min. After the 12 cycles, the reaction was held at 15°C. DpnI digestion was performed to degrade the methylated parental strand, and the resulting plasmid was used to transform competent cells. Plasmid DNA was isolated from transformed cells, and the presence of the mutation was verified using a BigDye Terminator v3.1 Cycle Sequencing Kit and an ABI3130 DNA Sequencer (Applied Biosystems). The mutant plasmid sequence was identical to that of the wild type except at the codon corresponding to residue 395, which was changed from CCC (proline) to ACC (threonine). The mutant cDNA was re-cloned into the pUAST expression vector. We also generated a construct containing the reciprocal point mutation (1192 A>C, Thr395Pro) in the cDNA of Maegok CYP1B1, cloned into the plasmid pGEM-T Easy, using the following mutagenesis primer pair: oCYP1B1EX3_T395P_F: 5′-CTA TCA TGA GCT ACC CCA TCC CCA AGA AC-3′, oCYP1B1EX3_T395P_R: 5′-GTT CTT GGG GAT GGG GTA GCT CAT GAT AG-3′. Subsequently, we measured the activity of each CYP1B1 using the previously described method (see 2. Functional analysis of cytochrome P450).
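The mutagenesis primers listed above can be sanity-checked with a short script. The sketch below, in which all function and variable names are illustrative, confirms that each reverse primer is the reverse complement of its forward partner, that the two forward primers differ at a single base, and that the affected codon reads ACC (Thr) in the P395T primer and CCC (Pro) in the T395P primer. The two-entry codon lookup covers only the codons named in the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(primer):
    """Strip the triplet spacing used in the text and upper-case the sequence."""
    return primer.replace(" ", "").upper()


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(a, b):
    """List (0-based index, base in a, base in b) where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Primer sequences copied from the text (5' to 3').
P395T_F = clean("CTA TCA TGA GCT ACA CCA TCC CCA AGA AC")
P395T_R = clean("GTT CTT GGG GAT GGT GTA GCT CAT GAT AG")
T395P_F = clean("CTA TCA TGA GCT ACC CCA TCC CCA AGA AC")
T395P_R = clean("GTT CTT GGG GAT GGG GTA GCT CAT GAT AG")

# Only the two codons mentioned in the text are needed here.
CODONS = {"CCC": "Pro", "ACC": "Thr"}

diff = mismatches(P395T_F, T395P_F)
site = diff[0][0]  # the mutated base is the first position of the codon

print("P395T pair complementary:", revcomp(P395T_F) == P395T_R)
print("T395P pair complementary:", revcomp(T395P_F) == T395P_R)
print("differences between forward primers:", diff)
print("P395T codon:", CODONS[P395T_F[site:site + 3]])
print("T395P codon:", CODONS[T395P_F[site:site + 3]])
```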

Genetic diversity of medaka CYP1B1 in S.JPN and W.KOR populations

We examined the diversity of medaka CYP1A and CYP1B1 using samples from 18 local populations (Figure 1A). GE, GH, IL, KM and SS were maintained by the National BioResource Project (NBRP) Medaka, and the others were maintained at the University of Tokyo. We sequenced 1028- and 1185-bp fragments of the CYP1A and CYP1B1 coding regions, respectively, using the previously described PCR direct sequencing method (see 2. Functional analysis of cytochrome P450). Nucleotide sequences were aligned using CLUSTAL W [S25] with the program MEGA4 [S1]. Nucleotide diversity was calculated using the program DnaSP, v. 5.10.01 [S30].
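For orientation, nucleotide diversity (π) is the average number of pairwise nucleotide differences per site among aligned sequences. The sketch below shows that basic definition only; it is not a reimplementation of DnaSP, which handles gaps, missing data and corrections in ways not reproduced here. The function name and the toy alignment are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site for equal-length aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    pairs = list(combinations(seqs, 2))
    # Count mismatched sites over every pair of sequences.
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / (len(pairs) * length)


# Toy alignment, for illustration only (not data from the study).
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]
print(f"pi = {nucleotide_diversity(toy_alignment):.4f}")
```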

Primer list

Purposes of use     Genes     Primer names          Sequences (5′–3′)

SNP screening       CYP1A     oCYP1A1EX3F1          GAGATCATTGACGATGCCGACTACTTTT
                              oCYP1A1EX2R1          GTACTGCCATTCATCGGTCCTCTGTC
                              oCYP1A1EX7F1          GCAGCGCTTGTGCTTCATTGTG
                              oCYP1A1EX4R1          CTGGGCAGTGGGGTATTTGGTG
                    CYP1B1    oCYP1B1EX2F1          CCAAGCGTCGGAACCACATAGTC
                              oCYP1B1EX2R1          CTGGACGAAGGAGCTCTCAGAAATG
                              oCYP1B1EX3F1          GCATCTTGCCGCAGAGACACTG
                              oCYP1B1EX3R1          AGCGCCTCCCCTCCATTGAA
                    CYP5A1    oCYP5A1EX6F1          CGAGCGGTGGAGGAGAGTGAGGAG
                              oCYP5A1EX9R1          ATCTCGCCACGGGAGCCATCAA
                    CYP20A1   oCYP20A1EX5F1         AATGACCTCGGCGTCGTCTTTGAACC
                              oCYP20A1EX1R1         TCGCCGTCACGTTTGTCGTCATTTTAG

Full-length cDNA    CYP1A     oCYP1A1EX1F1          TTTTGTTTAATTTTGGCAAGCTCCTCTG
                              oCYP1A1EX7R2          CCTGTAATGCCATCGAATCTGACACA
                    CYP1B1    oCYP1B1EX2F3          CTGGAGACAAAATCCGCTGTGGTTCT
                              oCYP1B1EX3R2          TCGGCCTTCATGTTTGAGTCTTTCTGTAG

                    CYP1A     oCYP1A1EX1F2-EcoRI    CGACGAATTCGTCATCATGGCATTAATGGT
                              oCYP1A1EX7R3-XhoI     ATTTCTCGAGAGCTTCAGTGTCCATCCTT
                    CYP1B1    oCYP1B1EX2F3-BglII    GGATAAGATCTAGGATCATGGATGTGACAGC
                              oCYP1B1EX2F4-BglII    GGATAAGATCTAGGATCATGGGTGTGACAG
                              oCYP1B1EX2F5-BglII    GGATAAGATCTTTCCTACTGATGTGCTGGA
                              oCYP1B1EX3R3-XhoI     ATTTTCTCGAGACTTCAACCCTCACGGC
                              oCYP1B1EX3R4-NotI     ATTTTGCGGCCGCGCGGCGCCGCTGTG
                              oCYP1B1EX3R5-NotI     ATTTTGCGGCCGCGAGGGGGCGCTGTG

Supplemental References

S1. Tamura, K., Dudley, J., Nei, M., and Kumar, S. (2007). MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Molecular Biology and Evolution 24, 1596–1599.

S2. Saitou, N., and Nei, M. (1987). The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution 4, 406–425.

S3. Felsenstein, J. (1985). Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39, 783–791.

S4. Muffato, M., Louis, A., Poisnel, C. E., and Crollius, H. R. (2010). Genomicus: a database and a browser to study gene synteny in modern and ancestral genomes. Bioinformatics 26, 1119–1121.

S5. Shima, A., Shimada, A., Sakaizumi, M., and Egami, N. (1985). First listing of wild stocks of the medaka Oryzias latipes currently kept by the Zoological Institute, Faculty of Science, University of Tokyo. Journal of the Faculty of Science, Imperial University of Tokyo Sect IV, Zoology 16, 27–35.

S6. Hyodo-Taguchi, Y. (1996). Inbred strains of the medaka, Oryzias latipes (Development of Medaka Biology in Japan-Part I). The Fish Biology Journal Medaka 8, 11–14.

S7. Brand, A. H., and Perrimon, N. (1993). Targeted gene expression as a means of altering cell fates and generating dominant phenotypes. Development 118, 401–415.

S8. Yoshiyama-Yanagawa, T., Enya, S., Shimada-Niwa, Y., Yaguchi, S., Haramoto, Y., Matsuya, T., Shiomi, K., Sasakura, Y., Takahashi, S., Asashima, M., et al. (2011). The Conserved Rieske Oxygenase DAF-36/Neverland Is a Novel Cholesterol-metabolizing Enzyme. J. Biol. Chem. 286, 25756–25762.

S9. Cali, J. J., Ma, D., Sobol, M., Simpson, D. J., Frackman, S., Good, T. D., Daily, W. J., and Liu, D. (2006). Luminogenic cytochrome P450 assays. Expert Opin. Drug Metab. Toxicol. 2, 629–645.

S10. Kinoshita, M., Murata, K., Naruse, K., and Tanaka, M. (2009). Medaka: biology, management, and experimental protocols.

S11. Aida, T. (1921). On the Inheritance of Color in a Fresh-Water Fish, APLOCHEILUS LATIPES Temmick and Schlegel, with Special Reference to Sex-Linked Inheritance. Genetics 6, 554.

S12. The International HapMap Consortium (2005). A haplotype map of the human genome. Nature 437, 1299–1320.

S13. The 1000 Genomes Project Consortium (2010). A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073.

S14. Stephens, M., Smith, N. J., and Donnelly, P. (2001). A new statistical method for haplotype reconstruction from population data. Am. J. Hum. Genet. 68, 978–989.

S15. Stephens, M., and Scheet, P. (2005). Accounting for decay of linkage disequilibrium in haplotype inference and missing-data imputation. Am. J. Hum. Genet. 76, 449–462.

S16. Howells, W. W. (1989). Skull Shapes and the Map (Peabody Museum of Archaeology &).

S17. Brace, C. L., Nelson, A. R., and Qifeng, P. (2004). Peopling of the New World: a comparative craniofacial view. In The Settlement of the American Continents: A Multidisciplinary Approach to Human Biogeography, C. Barton, G. Clark, D. Yesner, and G. Pearson, eds. (Tucson: Arizona University Press), pp. 28–38.

S18. Hanihara, T. (2006). Interpretation of craniofacial variation and diversification of East and Southeast Asians. In Bioarchaeology of Southeast Asia, M. Oxenham and N. Tayles, eds. (Cambridge: Cambridge University Press), pp. 91–111.

S19. Kimura, T., Jindo, T., Narita, T., Naruse, K., Kobayashi, D., Shin-I, T., Kitagawa, T., Sakaguchi, T., Mitani, H., Shima, A., et al. (2004). Large-scale isolation of ESTs from medaka embryos and its application to medaka developmental genetics. Mechanisms of Development 121, 915–932.

S20. Ripley, B. D. (2009). Stochastic Simulation (Wiley).

S21. Watterson, G. A. (1975). On the number of segregating sites in genetical models without recombination. Theor Popul Biol 7, 256–276.

S22. Tavaré, S., Balding, D. J., Griffiths, R. C., and Donnelly, P. (1997). Inferring coalescence times from DNA sequence data. Genetics 145, 505–518.

S23. Tavaré, S. (2004). Part I: Ancestral inference in population genetics. Lectures on probability theory and statistics.

S24. Takehana, Y., Nagai, N., Matsuda, M., Tsuchiya, K., and Sakaizumi, M. (2003). Geographic variation and diversity of the cytochrome b gene in Japanese wild populations of medaka, Oryzias latipes. Zool. Sci. 20, 1279–1291.

S25. Thompson, J. D., Higgins, D. G., and Gibson, T. J. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research 22, 4673–4680.

S26. Kimura, M. (1980). A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol 16, 111–120.

S27. Nei, M., and Kumar, S. (2000). Molecular Evolution and Phylogenetics (Oxford University Press, USA).

S28. Jones, D. T., Taylor, W. R., and Thornton, J. M. (1992). The rapid generation of mutation data matrices from protein sequences. Computer applications in ….

S29. Tamura, K., Peterson, D., Peterson, N., Stecher, G., Nei, M., and Kumar, S. (2011). MEGA5: Molecular Evolutionary Genetics Analysis Using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods. Molecular Biology and Evolution 28, 2731–2739.

S30. Librado, P., and Rozas, J. (2009). DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25, 1451–1452.

S31. Matsumoto, Y., Oota, H., Asaoka, Y., Nishina, H., Watanabe, K., Bujnicki, J. M., Oda, S., Kawamura, S., and Mitani, H. (2009). Medaka: a promising model animal for comparative population genomics. BMC Res Notes 2, 88.