Western Alaska Salmon Stock Identification Program Technical Document:1 4 1 2 Title: Status of the SNP Baseline for Chum Salmon Version : 1.0 3 Authors: J
Total Page:16
File Type:pdf, Size:1020Kb
Western Alaska Salmon Stock Identification Program Technical Document:1 4 1 2 Title: Status of the SNP baseline for chum salmon Version : 1.0 3 Authors: J. Jasper, N. DeCovich, W. Templin 4 Date: August 25, 2009 5 6 7 Introduction 8 9 Under the Western Alaska Salmon Stock Identification Program (WASSIP), mixed stock 10 analysis (MSA) to estimate the relative stock contribution of catches will be accomplished using 11 the single nucleotide polymorphism (SNP) baseline for chum salmon. Original MSA analyses 12 of harvests in this area were accomplished with a coastwide baseline of allozyme data that was 13 developed in a multi-laboratory effort (Kondzela et al. 1994, Seeb et al. 2004), but this baseline 14 has been replaced with ones based on newer markers, which provide improved resolution and 15 greater laboratory efficiency. A coastwide microsatellite baseline has been recently completed 16 (Beacham et al. 2009), however, early in the process the decision was made to pursue a baseline 17 using SNP markers. This decision was based on the automatic standardization of SNP markers, 18 high throughput capabilities available through the infrastructure in the ADFG laboratory, relative 19 genotyping costs, and the ability to access more of the genome than is available through 20 microsatellites. The baseline of SNP markers has been in a state of continual development for 21 more than five years and through the WASSIP project it is expected that it will become a fully 22 functioning, coastwide replacement for the previous allozyme baseline. 23 24 The suite of SNP markers screened for the baseline has changed through time and will continue 25 to grow or change as more markers become available. We currently screen for 60 nuclear and 26 three mitochondrial markers, but the WASSIP Advisory Panel has requested that 96 SNP 27 markers be incorporated into the baseline to improve the precision and accuracy of stock 1 This document serves as a record of communication between the Alaska Department of Fish and Game Commercial Fisheries Division and the Western Alaska Salmon Stock Identification Program Technical Committee. As such, these documents serve diverse ad hoc information purposes and may contain basic, uninterpreted data. The contents of this document have not been subjected to review and should not be cited or distributed without the permission of the authors or the Commercial Fisheries Division. WASSIP Technical Document 4: Chum salmon baseline 28 composition estimates. To meet this request, we are contracting the development of at least 33 29 novel SNP markers that are targeted to differentiate among chum salmon populations spawning 30 within western Alaska and the Alaska Peninsula drainages (Technical Document 6). These new 31 SNP markers will be assessed after screening a fraction of the baseline and the best-performing 32 SNP markers will be added to the baseline during the winter of 2009/2010. 33 34 Here we present the current state of the chum salmon baseline based on samples collected 35 through the 2008 collection season and genotyped for the currently available SNP markers. This 36 analysis is not as developed as the analysis of the sockeye baseline (Technical Document 5) for 37 several reasons. First, much of the sockeye baseline needed to be analyzed and tested in 38 preparation for ongoing MSA applications in the Bristol Bay and North Peninsula fisheries. 39 Second, improvements to the chum salmon baseline are generally hindered by the lack of 40 resolution among population groups in western Alaska. The resolving power of the current set of 41 SNP markers is demonstrated in this document, but it will be more efficient to hold more in- 42 depth analyses of population structure until after the new SNP markers have been developed and 43 applied. 44 45 Methods 46 47 Tissue Sampling 48 49 Baseline samples for SNP analyses were collected from spawning populations or obtained from 50 existing agency archives from throughout the range of chum salmon in the Pacific Rim. Many of 51 the available samples were available from the samples used in the published survey of allozyme 52 variation (Seeb et al. 2004). Target sample size for baseline collections was 100 individuals 53 across all years for each population to achieve acceptable precision for the allele frequency 54 estimates (Allendorf and Phelps 1981; Waples 1990a). 55 56 Laboratory Analysis 57 58 Assaying genotypes 2 WASSIP Technical Document 4: Chum salmon baseline 59 60 Genomic DNA was extracted using a DNeasy® 96 Tissue Kit by QIAGEN® (Valencia, CA). 61 While 61 SNP markers were available, some of these markers were excluded from this analysis 62 because they were either not screened for the complete set of populations, were found to be out 63 of Hardy-Weinberg equilibrium, or were linked to other markers that were included in the 64 analysis. These issues resulted in a reduced set of fifty-three chum salmon SNP markers used in 65 this analysis (Table 2); two mitochondrial DNA (mtDNA) and 51 nuclear DNA (nDNA). 66 Laboratory methods followed the 5‟ nuclease methods described in Seeb et al. (2009). Thirty 67 assays originated from Elfstrom et al. (2007), sixteen from Smith et al. (2005a), and seven from 68 Smith et al. (2005b). 69 70 Baseline population samples were genotyped using uniplex SNP genotyping performed in 384- 71 well reaction plates and also by using the 48.48 array (Fluidigm Corporation) where 48 of the 52 72 markers were assayed in sets of 48 fish and the remaining markers were assayed on the 384-well 73 platform. With either platform, genotypes from generally 384 fish were visualized using the 74 GeneMapper (uniplex platform; Applied Biosystems) and BioMark (array platform; Fluidigm 75 Corporation) software programs and scored for each marker by two people simultaneously. 76 Scores were entered and archived in the Gene Conservation Laboratory Oracle database, LOKI. 77 78 Quality control 79 80 Three measures were taken to ensure quality control of the baseline data: 81 1. Re-genotyping of samples – Eight percent of each collection was re-genotyped for all 82 markers to ensure that genotypes were reproducible, to identify laboratory errors, and to 83 measure rates of inconsistencies during repeated analyses on the uniplex and array 84 platforms. We report error rates for a representative baseline project which consisted of 85 38 baseline collections comprising 3,886 individuals (~ 24% of current baseline). 86 87 2. Exclusion of individuals with an excessive drop-out rates – A threshold of 80% scorable 88 loci per individual was established and all individuals that did not meet this threshold 89 were excluded from statistical analysis and use in the baseline. This threshold was set to 3 WASSIP Technical Document 4: Chum salmon baseline 90 exclude individuals with poor quality DNA. Poor quality DNA leads to lower 91 reproducibility and therefore adds error to the allele frequency estimates. The value of 92 80% was chosen based upon the observation that many individuals with high quality 93 DNA had some dropouts, but generally less than 20% of markers, while those with poor- 94 quality DNA had higher drop-out rates. As a result, there was little difference in which 95 individuals were excluded from analysis when picking the threshold as long as it was 96 within the 70% to 90% range. 97 98 This rule (referred to as the “80% rule”) will also be used for samples from fishery 99 harvests to decrease errors and estimate variances caused by poor quality DNA and 100 missing data. This approach is an attempt to balance the benefits from better data with the 101 loss of power to accurately and precisely estimate stock proportions due to smaller 102 sample sizes. One other potential disadvantage of this approach is the potential to 103 introduce another form of bias if fish that are removed from analyses are not randomly 104 distributed in the mixture. Heterogeneity in sample removal may introduce bias in 105 subsequent estimates of stock proportions when samples with quality genotypic data are 106 not representative of the entire harvest being sampled. We anticipate that bias will only 107 be a concern if significant proportions of mixtures are excluded. 108 109 3. Finally, we searched for suspected duplicate fish within collections by identifying pairs 110 of individuals that had identical multi-locus genotypes at 38 or more loci. If suspected 111 duplicates were found, the second individual in each matching pair was removed from 112 further analyses. 113 114 Statistical analysis 115 116 Heterozygosity and FST 117 118 Genotypic data were retrieved from LOKI database and were used to calculate allele frequencies. 119 Observed heterozygosity, expected heterozygosity, and FST (Weir and Cockerham 1984) were 120 calculated for all markers using the program GDA v1.1 (Lewis and Zaykin 2001). 4 WASSIP Technical Document 4: Chum salmon baseline 121 122 Linkage disequilibrium 123 124 All pairs of nuclear markers were tested for gametic disequilibrium within each collection using 125 GDA. We defined a pair of markers to be significantly out of gametic equilibrium if tests for 126 gametic disequilibrium were significant (P < 0.01) for greater than half of all collections. When 127 gametic linkage was significant, the SNP with the lowest FST in the pair was dropped. All 128 mtDNA markers were combined into a single locus. Markers that did not exhibit gametic 129 disequilibrium with any other markers, retained markers from marker pairs that exhibited 130 gemetic disequilibrium, and the combined mtDNA markers were defined as loci for the 131 remaining analyses. 132 133 Pooling collections into populations 134 135 Collections taken at the same location at similar calendar days in different years were pooled as 136 suggested by Waples (1990b).