Supplemental Information File
Jaffe et al. CRISPRiSeq Manuscript

Contents:
• Supplemental Methods and References
• Supplemental Figures S1-S16
• Supplemental Table S4
• Supplemental Note 1

Supplemental Methods

Isolation and pooling of single CRISPRi strains

A total of 760 strains with previously established anhydrotetracycline (ATc)-dependent growth defects were cherry-picked from a published collection of ~9,000 single CRISPRi strains (Smith et al. 2017) using the Stinger extension of the Rotor HDA robot (Singer Instruments). All 760 strains were cultured individually for 2 days at 30°C in 1 mL of YPD using eight 96-deep-well plates, and then combined to generate a starting pool (hereafter referred to as Sublib1); aliquots of this pool were stored in 17% glycerol at -80°C.

To generate additional starting pools carrying non-targeting control guides, oligonucleotides carrying homology to the integration site and 20 random nucleotides (nt), instead of 20 nt of a PAM-adjacent targeting sequence, were transformed into the same ancestral strain that was used to generate the single CRISPRi collection (Smith et al. 2017). DNA was isolated from 40 clones by colony lysis, and the gRNA locus was then amplified using primers P82 and P22 (Supplemental Table S1) and OneTaq 2x Master Mix with Standard Buffer (NEB). ExoSAP-IT (Thermo Fisher) was used to digest excess primers before Sanger sequencing with primer P82. The e-crisp web tool (www.e-crisp.org) was used to verify that the random 20-base sequences had no genomic target (Heigwer et al. 2014). These strains carrying non-targeting guides were grown individually and combined as described above to create two control pools (CCP2 and CCP3, consisting of 5 and 19 strains, respectively). Supplemental Table S1 lists all strains and their respective 20-base gRNA targeting sequences within each of the three generated pools (Sublib1, CCP2 and CCP3).
Two of the 19 control strains in CCP3 (CC3 and CC33) were removed from subsequent analysis, as they exhibited a growth defect in a preliminary fitness assay (Supplemental Fig. S15).

Pooled growth for fitness assay

Aliquots from all double CRISPRi pooled transformations were thawed and mixed such that each double CRISPRi strain would be represented at an approximately equal starting frequency in the pool. After pooling, to recover the cells from the freezer, cells were grown at a starting density of 2 x 10^7 cells/mL for 5 hr in YPD at 30°C. This outgrowth was then used to inoculate 3 replicate cultures each of YPD+ATc (250 ng/µL) and YPEG+ATc (250 ng/µL) at a starting density of 5 x 10^7 cells/mL in 5 mL. The remaining outgrowth was split in half and either stored at -80°C in 17% glycerol for inoculating the second batch of growth conditions, or spun down and stored at -20°C with the supernatant removed for DNA extraction and barcode locus sequencing of time zero (T0).

Batch serial growth and transfer were performed as follows: the 5 mL growth cultures were rotated at 30°C for 24 hr, and then 1.25 mL of the culture was transferred to the next growth cycle by first spinning at 5,000 rpm, then re-suspending in 5 mL of fresh growth media and returning to 30°C. This 1:4 dilution factor was chosen so that the population goes through approximately 2 generations per growth cycle. The remaining culture was split in two and saved at -20°C with the supernatant removed for DNA extraction. At each transfer, cellular density was also measured using the Coulter Counter (Beckman), and the culture was observed under the microscope for contamination. A total of 7 time points were collected, and at the final time point, to detect any significant loss of the construct carried at the YBR209W guide locus, URA- colonies were quantified by plating for single colonies on YPD and then replica plating to SC-URA. See Supplemental Fig.
S2 for a diagram of cell growth, measurements of cell density at each transfer, and measurements of percent URA- colonies at the end of the pooled fitness assay.

For the second batch of conditions, the cells saved from the outgrowth for the first batch were recovered from the freezer by growing for 5 hr at 30°C in YPD at a starting density of 2 x 10^7 cells/mL, after which an aliquot was saved for T0 sequencing. Next, 3 replicate cultures of YPD+ATc (250 ng/µL, to be grown at 37°C) and 3 replicate cultures of SC-URA+ATc (250 ng/µL, to be grown at 30°C) were each inoculated at a starting density of 5 x 10^7 cells/mL in 5 mL and transferred, verified and saved as described above, with a total of 4 time points collected. At the start of the second batch of conditions, an additional 3 replicates of YPD+ATc (250 ng/µL) were inoculated at 2.5 x 10^7 cells/mL, and 625 µL were transferred every 48 hr, so that the population went through approximately 3 generations per 48 hr cycle.

Batch serial culture was performed in order to have sufficient cellular material at each transfer to perform amplicon sequencing, and so that phases of the growth cycle other than exponential growth were captured (i.e., saturation and recovery from saturation). Sampling multiple time points should also decrease the noise in fitness estimates compared with using only the first and last time points.

DNA extraction and barcode locus sequencing for fitness assay

For the first three time points for all replicates of each growth condition, as well as both T0 outgrowth samples, genomic DNA was isolated from frozen cell pellets using the YeaStar Genomic DNA Kit (Zymo Research), and yields were quantified with the Qubit dsDNA HS Assay Kit (Invitrogen). To generate sequencing libraries, for each sample a total of 150 ng of DNA (~11 million genomes, ~600 genomes per strain) was split between four identical PCRs and amplified using Q5 polymerase with 22 cycles.
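
The generation counts and the genome-copy estimate quoted above can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the haploid genome size (12.1 Mb) and average base-pair mass (650 g/mol) are assumed textbook values, not figures taken from this manuscript.

```python
import math

AVOGADRO = 6.022e23           # molecules per mole
BP_MASS_G_PER_MOL = 650.0     # assumption: average mass of one double-stranded base pair
YEAST_GENOME_BP = 12.1e6      # assumption: haploid S. cerevisiae genome size


def generations_per_cycle(transfer_volume_ml, culture_volume_ml):
    """Doublings needed to regrow to the pre-transfer density after dilution."""
    dilution_factor = culture_volume_ml / transfer_volume_ml
    return math.log2(dilution_factor)


def genomes_in_sample(dna_ng, genome_bp=YEAST_GENOME_BP):
    """Approximate number of haploid genome copies in a given mass of DNA."""
    genome_mass_g = genome_bp * BP_MASS_G_PER_MOL / AVOGADRO
    return dna_ng * 1e-9 / genome_mass_g


# 1.25 mL carried into 5 mL of fresh media every 24 hr (1:4 dilution)
print(generations_per_cycle(1.25, 5.0))
# 625 µL carried into 5 mL of fresh media every 48 hr (1:8 dilution)
print(generations_per_cycle(0.625, 5.0))
# 150 ng of genomic DNA used as PCR input
print(f"{genomes_in_sample(150):.2e}")
```

Under these assumptions the two transfer schemes give 2 and 3 doublings per cycle, and 150 ng corresponds to roughly 11 million genome copies, in line with the values stated in the text.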
The forward and reverse primers (Supplemental Table S1) amplified a 942 nt amplicon of the YBR209W guide locus carrying the barcode identifying the query guide and the 20 nt PAM-adjacent targeting sequence identifying the guide derived from the starting pool. Note that future screens could decrease this relatively large amplicon size by designing each starting pool strain such that it carries its own unique DNA barcode adjacent to the loxP site. Subsequent PCR of the resulting ~300 nt double barcode locus could be used to identify corresponding genetic elements (Jaffe et al. 2017). The PCR used here also added Illumina adaptors and a 0 to 6 nt multiplexing tag on each side, used to identify each sample library within the sequencing data. All 4 reactions were purified on one Qiagen PCR purification column, and then purified again on E-Gel SizeSelect 2% Agarose Gels (Invitrogen). Yields were quantified with the Qubit dsDNA HS Assay Kit (Invitrogen), and the 50 sample libraries were each pooled at equimolar ratios into one of three sequencing libraries, consisting of 14, 18 and 18 samples, respectively. The Bioanalyzer High-Sensitivity DNA Assay (Agilent) was used to verify library quality before sequencing on an Illumina HiSeq at a core facility with paired-end 2 x 101 nt reads, aiming to generate >100x coverage per strain per time point.

Data analysis

Parsing raw sequencing data and removing chimeric reads

Raw fastq files were parsed via custom python scripts that assigned each read pair to the correct sequencing library and strain based on the multiplexing tags, barcode and 20 nt PAM-adjacent site-directed sequence (SDS) it carried. Up to 1 mismatch was allowed for all identifiers except the primer sequence that was used as a reference point in the read to locate the other identifiers. For the primer sequence, 1 mismatch was allowed anywhere in its sequence in addition to allowing a mismatch at the first position.
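
The mismatch rules described above can be illustrated with a small sketch. This is not the authors' parsing script: the function names and example sequences are hypothetical, and the primer rule is implemented under one interpretation of the text, namely that the first primer position is ignored and at most 1 mismatch is tolerated across the remaining positions.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def match_identifier(observed, known, max_mismatches=1):
    """Return the single known sequence within max_mismatches of observed, else None."""
    hits = [k for k in known
            if len(k) == len(observed) and hamming(observed, k) <= max_mismatches]
    # Ambiguous (more than one hit) and unmatched reads are both rejected.
    return hits[0] if len(hits) == 1 else None


def find_primer(read, primer, max_mismatches=1):
    """Return the start index of the primer within the read, or None.

    Assumption: the first primer position is not scored, and up to
    max_mismatches are allowed across the remaining positions.
    """
    n = len(primer)
    for start in range(len(read) - n + 1):
        window = read[start:start + n]
        if hamming(window[1:], primer[1:]) <= max_mismatches:
            return start
    return None


# Hypothetical identifiers, for illustration only
known_barcodes = ["ACGTACGT", "TTTTCCCC"]
print(match_identifier("ACGTACGA", known_barcodes))  # one mismatch: accepted
print(match_identifier("ACGAACGA", known_barcodes))  # two mismatches: rejected
print(find_primer("GGTCGTAGCACC", "ACGTTGCA"))       # primer located despite mismatches
```

Once the primer position is known, the multiplexing tag, barcode and SDS can be sliced out at fixed offsets relative to it and passed to a matcher such as `match_identifier`.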
An estimate of the percentage of chimeric reads was calculated for each library by plotting the observed read counts for query barcode (BC) and starting pool SDS combinations that were not present in the experimental pool against the expected frequency of observation based on the observed read count for each individual component, i.e., the SDS and barcode sequences (Schlecht et al. 2017). A linear fit was made to these data using the lm() function in R, and this model was used to subtract an estimated proportion of chimeric reads from the observed read counts for each BC/SDS combination that did exist in the experimental pool.

Fitness Estimation

For each strain, we normalized each count by the total counts at that respective time point. We then required a threshold frequency of 5 x 10^-6 at the first time point (~30-40 reads) and 1 x 10^-6 at the third time point (~5-8 reads) for a fitness estimate to be made. For each time point ($t_n$), frequencies were normalized to the change in frequency, at that time point, of the 100 “WT” control strains by multiplying by the following factor:

$$\frac{f_{\mathrm{WT},t_0}}{f_{\mathrm{WT},t_n}}$$

(the frequency of the WT control strains at the first time point, $t_0$, divided by their frequency at $t_n$), and then by dividing by the strain's frequency at the first time point. We then fit a linear model to the change in log(normalized frequency) over the number of generations the pool was grown, and used the slope + 1 of the fitted line as each strain’s fitness. Because our pooled fitness assay was performed over relatively few generations (up to 6 generations in the condition with the most growth), we expect the mean fitness of the population to be relatively constant, such that a linear model is a reasonable approach to estimate fitness.
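
The fitness calculation described above can be sketched in a few lines. This is a minimal illustration rather than the analysis code used for the manuscript: the function name and example values are hypothetical, the natural logarithm is assumed (the base is not stated), and the WT correction is taken to be the ratio of the WT control frequency at the first time point to that at each later time point.

```python
import math


def estimate_fitness(strain_freq, wt_freq, generations,
                     min_first=5e-6, min_third=1e-6):
    """Estimate fitness as (slope of log normalized frequency vs. generations) + 1.

    strain_freq, wt_freq: frequencies at each sequenced time point (at least 3).
    generations: cumulative generations elapsed at each time point.
    Returns None when the strain fails the frequency thresholds.
    """
    if strain_freq[0] < min_first or strain_freq[2] < min_third:
        return None

    # Correct for the change in WT control frequency, then scale to the first time point.
    log_norm = [
        math.log(f * (wt_freq[0] / w) / strain_freq[0])
        for f, w in zip(strain_freq, wt_freq)
    ]

    # Ordinary least-squares slope of log_norm against generations.
    mean_x = sum(generations) / len(generations)
    mean_y = sum(log_norm) / len(log_norm)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(generations, log_norm))
    sxx = sum((x - mean_x) ** 2 for x in generations)
    return sxy / sxx + 1.0


# Hypothetical example: three time points spaced 2 generations apart
gens = [0, 2, 4]
wt = [0.01, 0.01, 0.01]
neutral = [1e-4, 1e-4, 1e-4]
sick = [1e-4 * math.exp(-0.1 * g) for g in gens]
print(estimate_fitness(neutral, wt, gens))
print(estimate_fitness(sick, wt, gens))
```

In this sketch a strain that tracks the WT controls has a fitness of 1, and a strain whose normalized frequency declines has a fitness below 1. Fitting across all sequenced time points, rather than only the first and last, follows the rationale given in the pooled growth section.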